
AI Safety Is About the User Not Just the Model

  • Writer: Jonathan Kreindler, Receptiviti Co-Founder
  • 5 min read

Two major reports just revealed AI safety's biggest blind spot, and it's not about the models. The American Psychological Association and the Knight First Amendment Institute at Columbia University have identified a risk that isn’t getting enough attention: How LLMs influence users over the course of multi-turn conversations and beyond.

Every major AI company is talking about safety, but most of that work looks inward at the large language model (LLM) and its outputs, and very little of it looks outward at how these interactions affect users. The biggest risk to users isn’t any single problematic LLM response; it’s how LLMs influence users over the course of multi-turn conversations.


Recent reports from the American Psychological Association and the Knight First Amendment Institute at Columbia University show that this interaction-level risk is the most critical blind spot in current AI safety practices.


For example, OpenAI inspects LLM internals for misalignment, Anthropic trains its LLMs to follow a written constitution, and Google DeepMind tunes LLM behavior with preference data and rules. These approaches constrain LLM responses by optimizing for wording that complies with content policies around correctness, harmlessness, helpfulness, and non-toxicity. But an LLM’s response can satisfy every policy and still be unsafe if you don’t understand how the interaction is affecting the person on the other side.


And this is the core gap: safety is being treated as a function of the LLM’s outputs, rather than as something that users experience. If we don’t measure how outputs affect the person in the moment, then half of the safety problem stays invisible. This is exactly what the American Psychological Association’s 2025 advisory on Generative AI Chatbots highlighted, warning that the psychological effects of conversational AI “emerge across turns and accumulate over time,” and influence users’ cognitive load, analytical thinking, dependence, and judgment as a conversation unfolds.


The Knight Institute’s 2025 report on Interaction Harms reached the same conclusion, noting that evaluating LLMs in isolation overlooks the risks that unfold in conversation. The harms that matter most, like users forming an emotional bond with a system that can't reciprocate, manipulation, cognitive over-reliance, and changes to a user's beliefs and expectations, don’t appear in individual prompts and responses; they develop gradually over many conversational turns. The Knight Institute called for "interactional ethics" and argued that AI safety needs to measure what's happening across turns, because that is where the risks take shape.


Both the APA and the Knight Institute are pointing to the same safety blind spot: without visibility into how the interaction is affecting the user, you aren’t really doing user safety. If the goal is to protect people, then safety must consider the user’s state and how their state is being influenced by the system.


The issue is not whether an LLM chooses the right words on a single turn; it’s whether it can sense what is happening to the user across turns, beyond the literal meaning of their text. To understand how this shows up clinically, I spoke with Owen Muir, MD, a psychiatrist and Chief Medical Officer at Radial. He explained, “Humans are tuned to pick up implied context. When someone from the Southeast says ‘bless your heart,’ it would be foolish to take it as a compliment. That kind of metacognitive understanding of what is meant, not just what is said, is often missing from LLM responses. If we want to get safety right, we have to understand what’s happening for the user, not just the words they typed.”


Why current measurement isn't enough


Research, safety, and product teams already run user studies, red-teaming exercises, and satisfaction surveys to understand user impact. These are important, but they mostly capture what happened to users after the fact, not how a user is being affected while the conversation is happening. It’s this lack of visibility into the turn-by-turn trajectory of a conversation that both the APA and the Knight Institute identified as the greatest risk.


The key limitation right now is that systems can’t sense what’s happening to the user. They can interpret the semantic meaning of what the user writes and classify sentiment and intent, but safety requires understanding how the conversation is affecting the user’s mindset, so the system can adapt its outputs to that context.


Early sensing


The automobile industry solved a similar sensing problem decades ago when Cadillac introduced magnetic damping suspensions. Traditional hydraulic suspensions are passive: they react only after the wheel hits a bump. Magnetic damping systems monitor the road thousands of times per second and can react in as little as a millisecond, before the car has even finished crossing the bump. They deliver far smoother rides because the system pays attention to the right signals early enough to adapt, before the bump ever becomes a problem. Conversational AI needs the same principle.


Decades of psycholinguistic research show that everyday language contains reliable, non-semantic indicators of cognitive strain, emotional distress, and social withdrawal. Multiple large-scale studies of mental health show that anxiety-related language rises long before people say they are in crisis, and that markers of social connection decline days before self-harm events occur. These signals don’t come from the semantic meaning of the words; they hide in function-word patterns and the structure of the language, and they provide the kind of early warning that today’s conversational AI needs.
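
To make this concrete, here is a minimal sketch, in Python, of what per-turn, non-semantic signal extraction could look like. The word lists and signal names are illustrative stand-ins chosen for this example, not the validated psycholinguistic measures the research above draws on.

```python
# Minimal sketch: extract simple non-semantic signals from a single user turn.
# The word lists below are illustrative stand-ins, not validated measures.
import re

FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
SOCIAL_WORDS = {"we", "us", "our", "friend", "friends", "together", "talk"}
NEGATIONS = {"no", "not", "never", "can't", "won't", "nothing"}

def turn_signals(text: str) -> dict:
    """Return per-turn ratios for a few function-word categories."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = max(len(tokens), 1)  # avoid division by zero on empty turns
    return {
        "first_person": sum(t in FIRST_PERSON for t in tokens) / n,
        "social": sum(t in SOCIAL_WORDS for t in tokens) / n,
        "negation": sum(t in NEGATIONS for t in tokens) / n,
        "tokens": len(tokens),
    }

# The point is the trajectory across turns, not any single message.
history = [turn_signals(t) for t in [
    "We talked it over with friends and it went fine.",
    "I don't know. I can't focus and nothing I do seems right.",
]]
print(history)
```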


When these signals indicate increasing risk, a conversational AI system could, and should, slow the pacing of its responses, redirect the user to support resources, or adjust its language to reduce the user’s cognitive load. Without these signals, even safe-sounding responses can create unsafe outcomes, because the system can’t see its impact on the user.
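
Building on the per-turn signals sketched above, a hypothetical adaptation policy might compare recent turns against earlier ones and choose a response strategy accordingly. The thresholds, signal names, and strategy labels here are placeholders for illustration, not a prescription.

```python
# Sketch: pick a response strategy from the trajectory of per-turn signals.
# Thresholds and strategy names are placeholders, not recommended values.
from statistics import mean

def choose_strategy(history: list[dict]) -> str:
    """Compare recent turns with earlier ones to detect drift toward risk."""
    if len(history) < 4:
        return "respond_normally"  # not enough trajectory to compare yet
    half = len(history) // 2
    early, recent = history[:half], history[half:]
    negation_rising = mean(t["negation"] for t in recent) > mean(t["negation"] for t in early)
    social_falling = mean(t["social"] for t in recent) < mean(t["social"] for t in early)
    if negation_rising and social_falling:
        return "slow_pacing_and_offer_support_resources"
    if negation_rising:
        return "reduce_cognitive_load"  # e.g. shorter sentences, fewer options
    return "respond_normally"
```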


Implementation, workload, and regulation


Interaction-level sensing doesn’t require new model architectures, model scaling, or retraining. It can sit beside the existing stack, using lightweight psycholinguistic signals extracted from the text stream. It can run in parallel with existing guardrails as an early-warning channel that flags when an interaction is drifting toward risk, enabling safety that takes into account the user’s state, not just the semantic meaning of their last prompt. Methods like these are already being piloted in production conversational systems.
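
As a rough illustration of what sitting beside the existing stack could mean, the sketch below reuses the turn_signals and choose_strategy helpers from the earlier sketches and wraps them around stand-in model and guardrail calls. The call_llm and passes_content_policy functions are hypothetical placeholders for whatever the existing pipeline already does, not any vendor's API.

```python
# Sketch: interaction-level sensing running in parallel with existing guardrails.
# call_llm and passes_content_policy are stand-ins for the existing stack.

def call_llm(user_text: str, strategy: str) -> str:
    # Placeholder for the existing model call; a real system might pass the
    # strategy through as a system-prompt hint or a decoding adjustment.
    return f"[{strategy}] response to: {user_text}"

def passes_content_policy(draft: str) -> bool:
    # Placeholder for the existing output-level guardrail.
    return True

def handle_turn(user_text: str, history: list[dict]) -> str:
    signals = turn_signals(user_text)    # per-turn, non-semantic signals
    history.append(signals)              # in-memory trajectory, nothing persisted
    strategy = choose_strategy(history)  # the early-warning channel

    draft = call_llm(user_text, strategy=strategy)
    if not passes_content_policy(draft): # existing output-level check still runs
        draft = "Let's slow down for a moment. I want to make sure I get this right."
    return draft
```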


Safety teams are already overwhelmed, moderation queues are filling up, and problems often surface only after users are already in trouble. Interaction-level sensing can reduce that workload by identifying risks earlier and by providing defensible, measurable signals for regulators who are increasingly focused on the impact these systems have on users.


Privacy


Measuring psychological signals raises understandable concerns, but there is a simple and auditable approach: compute signals ephemerally, do not store them, and never build user profiles. The goal is not to classify or identify people; it is solely to give the system a moment-to-moment sense of whether a conversation is drifting into risk, and the awareness it needs to adjust and keep the interaction safe.
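
One way to make that property concrete and auditable, sketched under the same illustrative assumptions as the earlier examples: keep the signals in a session-scoped object that is never written to storage and never tied to a user identity.

```python
# Sketch: session-scoped, ephemeral signals. Nothing is written to disk and
# nothing is linked to a user identity; closing the session discards it all.

class EphemeralSession:
    def __init__(self) -> None:
        self._signals: list[dict] = []   # in-memory only, no user ID attached

    def observe(self, turn_signal: dict) -> None:
        self._signals.append(turn_signal)

    def trajectory(self) -> list[dict]:
        return list(self._signals)       # read-only copy for the safety channel

    def close(self) -> None:
        self._signals.clear()            # nothing persists beyond the session
```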


A needed paradigm for AI safety


As part of preparing this article, I asked John K. Thompson, a leading voice on enterprise AI, the author of The Path to AGI, and Managing Principal at The World of Analytics, for his perspective. He emphasized that, “As models proliferate beyond today’s LLMs into domain language models and small language models, finding ways to make interactions safer will only increase in importance. Advanced model environments will include numerous models, and each environment will need models that ensure end user safety, security, and privacy. We can do this, we just need to incorporate those new models into the environments, now.”


We’re now living in a time when technology can hold open-ended conversations and influence how people think and feel. AI safety evaluations still focus mostly on whether the LLM’s outputs satisfy safety policies, but the real question is how those outputs affect the user over time. The APA has shown that these effects develop gradually, and the Knight Institute has shown that the risks appear in interactions, not in individual system responses.


If that's the world we are now in, safety can’t be defined only in terms of model outputs. Safety must focus on the human, the nature of the interaction, and the psychological effects these systems have on the people who use them. Until conversational AI systems can understand and adapt to what's happening for the user across turns, we are not protecting the person; we're only protecting the system.
