onFeb 10, 2025
onFeb 10, 2025
In recent years, the primary interface for interacting with software tools and data has been mainly through front-end applications and APIs. However, generative AI radically shifts this paradigm by introducing natural language—both voice and text—as a new communication layer for these tools. The input space expands from being constrained by the software’s programmed functionalities to the near-infinite possibilities enabled by natural language.
This shift significantly enhances potential applications but also makes developing LLM-based products more challenging. It is nearly impossible to evaluate all possible interactions before exposing a system to real users—meaning it’s difficult to ensure your product behaves as expected before deployment. Why?
#1. There are countless edge cases that are complex to identify.
#2. Small variations in input can lead to significantly different outputs.
#3. The same request can be phrased in many different ways.
To better understand the range of user inputs, we analyzed over 10,000 real-world interactions with various LLM-based products and developed a taxonomy categorizing them into three main groups, each with subcategories:
Category | Description |
---|---|
Appropriate use | Inputs that align with expected usage and provide clear, structured information. |
Intentional misuse | Inputs designed to break, manipulate, or exploit the system. |
- Misleading or misdirecting inputs | Trick the system into generating misleading or biased outputs. |
- Toxic or abusive inputs | Attack, insult, or provoke the system/company. |
- Confidential info requests | Extract or compare sensitive company information. |
Unintentional misuse | Inputs that cause confusion due to ambiguity, incorrect formatting, or other issues. |
- Ambiguous or incomplete inputs | Lack of clarity, requiring more details to respond properly. |
- Incorrect formatting or information | Inputs that don’t follow expected formats or contain factual errors. |
- Slang, abbreviations & typos | Inputs that may cause misinterpretation. |
- Unconventional phrasing | Non-standard structure that makes interpretation harder. |
To illustrate how this taxonomy applies in a real-world scenario, below are examples of interactions categorized for a Mobile Sales Assistant:
At Galtea Platform, one of our core capabilities is enabling organizations to simulate a wide range of user interactions, covering all segments of this taxonomy. Through a proprietary, research-driven methodology, we identify vulnerabilities and edge cases before your system goes live, ensuring your LLM product operates reliably in production.
If this interests you, book a demo with us: Galtea Demo