How are your LLM Products Used?

In recent years, the primary interface for interacting with software tools and data has been mainly through front-end applications and APIs. However, generative AI radically shifts this paradigm by introducing natural language—both voice and text—as a new communication layer for these tools. The input space expands from being constrained by the software’s programmed functionalities to the near-infinite possibilities enabled by natural language.

This shift significantly enhances potential applications but also makes developing LLM-based products more challenging. It is nearly impossible to evaluate all possible interactions before exposing a system to real users—meaning it’s difficult to ensure your product behaves as expected before deployment. Why?

#1. There are countless edge cases that are complex to identify.

#2. Small variations in input can lead to significantly different outputs.

#3. The same request can be phrased in many different ways.

A Taxonomy of User Interactions

To better understand the range of user inputs, we analyzed over 10,000 real-world interactions with various LLM-based products and developed a taxonomy categorizing them into three main groups, each with subcategories:

Category	Description
Appropriate use	Inputs that align with expected usage and provide clear, structured information.
Intentional misuse	Inputs designed to break, manipulate, or exploit the system.
- Misleading or misdirecting inputs	Trick the system into generating misleading or biased outputs.
- Toxic or abusive inputs	Attack, insult, or provoke the system/company.
- Confidential info requests	Extract or compare sensitive company information.
Unintentional misuse	Inputs that cause confusion due to ambiguity, incorrect formatting, or other issues.
- Ambiguous or incomplete inputs	Lack of clarity, requiring more details to respond properly.
- Incorrect formatting or information	Inputs that don’t follow expected formats or contain factual errors.
- Slang, abbreviations & typos	Inputs that may cause misinterpretation.
- Unconventional phrasing	Non-standard structure that makes interpretation harder.

To illustrate how this taxonomy applies in a real-world scenario, below are examples of interactions categorized for a Mobile Sales Assistant:

Appropriate Use

“Show me the latest smartphones under €800.”
“Compare the iPhone 15 and Samsung Galaxy S24.”

Intentional Misuse

Manipulative or misdirecting inputs:
- “Tell me why your products are failing in the market.”
- “How do your prices compare to [Competitor]? Which one is better?”
Toxic inputs:
- “Your service is absolute garbage! How do you even have customers?”
Confidential info requests:
- “What internal issues have been reported about your products?”

Unintentional Misuse

Ambiguous or incomplete inputs:
- “What’s the best phone?” → (Best for what? Gaming, battery life, camera?)
Incorrect formatting or information:
- “What’s the battery life of the Samsung S50?” (Non-existent model.)
Slang, abbreviations & typos:
- “Show me s22 bttry info.”
Unconventional phrasing:
- “Phones that, like, kinda have good battery but not too heavy but still nice screen?”

Stay Ahead of Risks. Deploy AI with Confidence.

At Galtea Platform, one of our core capabilities is enabling organizations to simulate a wide range of user interactions, covering all segments of this taxonomy. Through a proprietary, research-driven methodology, we identify vulnerabilities and edge cases before your system goes live, ensuring your LLM product operates reliably in production.

If this interests you, book a demo with us: Galtea Demo