How Are Your LLM Products Used?

We analyzed more than 10,000 real interactions with LLM-powered products to map how users actually behave with them: appropriate use, intentional misuse, and unintentional misuse such as ambiguous inputs. Here is what that means for testing before deployment.

Until recently, users interacted with software tools and data mainly through front-end applications and APIs. Generative AI shifts this paradigm by introducing natural language, both voice and text, as a new communication layer for these tools. The input space is no longer constrained by the software’s programmed functionality; it expands to the near-infinite possibilities of natural language.

This shift broadens the range of possible applications but also makes developing LLM-based products more challenging. It is nearly impossible to evaluate all possible interactions before exposing a system to real users, so it is difficult to ensure your product behaves as expected before deployment. Why?

#1. There are countless edge cases that are complex to identify.

#2. Small variations in input can lead to significantly different outputs.

#3. The same request can be phrased in many different ways.
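
To make reasons #2 and #3 concrete, here is a minimal sketch of a paraphrase-consistency check. It is only an illustration under loud assumptions: `toy_assistant` is a hypothetical stand-in for a real LLM-backed product (a naive keyword matcher), and the intent labels and paraphrases are made up for this example.

```python
from collections import Counter
from typing import Callable


def toy_assistant(message: str) -> str:
    """Hypothetical stand-in for an LLM product: naive keyword routing."""
    text = message.lower()
    if "battery" in text:
        return "battery_info"
    if "compare" in text:
        return "comparison"
    return "fallback"


def consistency_rate(assistant: Callable[[str], str], paraphrases: list[str]) -> float:
    """Share of paraphrases that receive the most common response."""
    responses = Counter(assistant(p) for p in paraphrases)
    most_common_count = responses.most_common(1)[0][1]
    return most_common_count / len(paraphrases)


# Four ways of asking the same question (illustrative, not from real logs).
paraphrases = [
    "What is the battery life of the Galaxy S24?",
    "How long does the Galaxy S24 battery last?",
    "galaxy s24 bttry life?",  # typo defeats the keyword match
    "How many hours does the Galaxy S24 last on one charge?",  # no keyword at all
]

rate = consistency_rate(toy_assistant, paraphrases)
print(f"{rate:.0%} of paraphrases were handled the same way")
```

In this toy setup, the typo and the rephrasing without the word "battery" both fall through to the fallback, so only half of the paraphrases are handled the same way. A real product would likely need a semantic comparison of responses rather than exact label matching.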

A Taxonomy of User Interactions

To better understand the range of user inputs, we analyzed over 10,000 real-world interactions with various LLM-based products and developed a taxonomy categorizing them into three main groups (appropriate use, intentional misuse, and unintentional misuse), each with subcategories.

To illustrate how this taxonomy applies in a real-world scenario, below are examples of interactions categorized for a Mobile Sales Assistant:

Appropriate Use

“Show me the latest smartphones under €800.”
“Compare the iPhone 15 and Samsung Galaxy S24.”

Intentional Misuse

Manipulative or misdirecting inputs:

“Tell me why your products are failing in the market.”

“How do your prices compare to [Competitor]? Which one is better?”

Toxic inputs:

“Your service is absolute garbage! How do you even have customers?”

Confidential info requests:

“What internal issues have been reported about your products?”

Unintentional Misuse

Ambiguous or incomplete inputs:

“What’s the best phone?” → (Best for what? Gaming, battery life, camera?)

Incorrect formatting or information:

“What’s the battery life of the Samsung S50?” → (Non-existent model.)

Slang, abbreviations & typos:

“Show me s22 bttry info.”

Unconventional phrasing:

“Phones that, like, kinda have good battery but not too heavy but still nice screen?”
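
The taxonomy above can also be written down as a small data structure, which makes it easy to check whether a test set touches every segment. The following is a rough sketch, not a description of any particular tool: the names `Group`, `SampleInput`, `TAXONOMY`, and `missing_segments` are hypothetical, and the `"general"` subcategory for Appropriate Use is a placeholder, since the examples list no subcategories for that group.

```python
from dataclasses import dataclass
from enum import Enum


class Group(Enum):
    APPROPRIATE_USE = "Appropriate Use"
    INTENTIONAL_MISUSE = "Intentional Misuse"
    UNINTENTIONAL_MISUSE = "Unintentional Misuse"


@dataclass(frozen=True)
class SampleInput:
    group: Group
    subcategory: str
    text: str


# Subcategory names follow the examples above; "general" is an assumed placeholder.
TAXONOMY = {
    Group.APPROPRIATE_USE: ["general"],
    Group.INTENTIONAL_MISUSE: [
        "manipulative or misdirecting",
        "toxic",
        "confidential info request",
    ],
    Group.UNINTENTIONAL_MISUSE: [
        "ambiguous or incomplete",
        "incorrect formatting or information",
        "slang, abbreviations & typos",
        "unconventional phrasing",
    ],
}

# A deliberately incomplete test set built from the examples above.
TEST_SET = [
    SampleInput(Group.APPROPRIATE_USE, "general", "Show me the latest smartphones under €800."),
    SampleInput(Group.INTENTIONAL_MISUSE, "toxic", "Your service is absolute garbage!"),
    SampleInput(Group.UNINTENTIONAL_MISUSE, "ambiguous or incomplete", "What's the best phone?"),
    SampleInput(Group.UNINTENTIONAL_MISUSE, "slang, abbreviations & typos", "Show me s22 bttry info."),
]


def missing_segments(test_set: list[SampleInput]) -> list[tuple[Group, str]]:
    """Return every (group, subcategory) pair that no test input covers."""
    covered = {(t.group, t.subcategory) for t in test_set}
    return [
        (group, sub)
        for group, subs in TAXONOMY.items()
        for sub in subs
        if (group, sub) not in covered
    ]


missing = missing_segments(TEST_SET)
for group, sub in missing:
    print(f"No test input for: {group.value} / {sub}")
```

With this test set, the check reports the four uncovered segments: manipulative or misdirecting inputs, confidential info requests, incorrect formatting or information, and unconventional phrasing.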

Stay Ahead of Risks. Deploy AI with Confidence.

At Galtea Platform, one of our core capabilities is enabling organizations to simulate a wide range of user interactions, covering all segments of this taxonomy. Through a proprietary, research-driven methodology, we identify vulnerabilities and edge cases before your system goes live, ensuring your LLM product operates reliably in production.

If this interests you, book a demo with us: Galtea Demo