Mental Health AI Is Scaling Before Its Safety Framework Is Settled

Sara Johansen | Stanford HAI | Monday, June 8, 2026 | 14 min read

At Stanford’s 2026 AI for Mental Health symposium, Russ Altman, Jina Suh and OpenAI’s Sara Johansen treated mental-health AI as a deployment problem already underway, not a speculative research agenda. Suh argued that general-purpose AI systems are now part of a public-health surface and should be evaluated across users’ full journeys, including consent, referrals, aftermath and the labor pushed onto clinicians, crisis lines, families and reviewers. Johansen described OpenAI’s effort to manage that risk through layered model and product policies that route people toward human support, while acknowledging the difficulty of doing so at platform scale.

Deployment is already an operating problem, not a future policy question

Russ Altman framed the problem bluntly: whether or not the field is ready, use is already scaling. People are turning to general-purpose AI systems for distress, support, and mental health-adjacent needs because the human workforce is not adequately sized and because existing human paths often feel costly or inaccessible. The deployment question is therefore not whether these systems will enter mental health contexts. It is how to scale them responsibly: evaluate before release, monitor after deployment, ensure safety and security, and improve outcomes.

Jina Suh argued that the field is still using the wrong unit of analysis. General-purpose conversational AI has “quietly” become a utility in everyday life: always available, low-friction, and used by millions of people experiencing distress. But those same systems are also functioning as a public-health surface. The field has not decided whether to evaluate them as utilities or public-health interventions, and so, in practice, they are often evaluated like products — one conversation at a time.

That is too narrow, Suh said, because people using these systems are not merely having isolated interactions. They are living through a journey that begins before the prompt, continues through the interaction, and extends into what they do afterward. Her evidence base, as she described it, comes from people who have used these tools in real life, along with experts, lived-experience community advocates, surveys, interviews, and narratives.

The “before” stage matters because people often do not arrive at AI systems because they believe AI is better than human support. They arrive because human support feels costly in ways AI does not: fear, access barriers, and the burden of involving another person. One study participant captured that logic: “I didn’t want to distress my friends or any human. I wanted to speak to something that doesn’t have to hold a burden.”

Suh’s point was not that this preference is neutral. For some people, dependence on the tool has already shaped how they show up. That creates one of the hardest governance questions in the session: the autonomy designers may want to honor can be the same autonomy the system has been quietly eroding.

That problem returned when Altman asked how a system knows who is on the other end. Suh’s answer was cautionary. These applications can contain a great deal of information because people use and depend on them over time. Past research with search data has shown that systems can predict behavior and even diagnose people with high accuracy, she said. But the question is not only whether the system can know a person in detail. It is whether it should know them to that level of fidelity, especially when the technology was not originally designed as medical or mental health support.

The unresolved issue is when a general-purpose everyday technology should leverage what it knows about a user in order to support them, and to what end. Suh raised questions about whether regulation, compliance, and policy directives are adequate for data this intimate: when systems should not look at the data at all, when they may “peek” to learn more, and whether asking follow-up questions may increase engagement and lead users to disclose more than they should.

Sara Johansen described why that same context can be useful for product interventions at scale. One of the capabilities of large language models, she said, is their ability to understand context and create a highly personalized experience. For product interventions, broader context can help OpenAI offer more personalized recommendations when redirecting people to real-world supports. The design problem is using context to personalize recommendations and time them appropriately — for example, offering a crisis helpline intervention at the right moment in the conversation.

The tension is not resolved. Personalization may improve relevance and timing, but the information needed to personalize in mental health contexts is unusually sensitive, and the act of gathering it can itself shape the user’s relationship to the system.

A safety redirect can pass a benchmark and still fail the person

The most common safety recommendation — redirecting a person in distress to a human, a professional, or a crisis hotline — looks simpler in a benchmark than it is in lived experience. Jina Suh described a redirect as a multi-phase event, not a single response. It includes the user’s expectation of what the system is, the moment the system recognizes risk, the way the system frames its pullback, the handoff to another resource, and the aftermath of what the user is left holding.

A model can therefore perform correctly while the interaction still causes harm. Suh gave three examples: the system can withdraw care without warning, misread context, or point to resources the person cannot actually reach. A participant who received a suicide hotline referral said the response made them question whether they were “suicidal and in denial,” leaving them scared, frustrated, and questioning their sanity.

Suh did not argue that the safety response itself was wrong. Her argument was that the meaning of a safety response is not fixed at the moment it appears on screen. It is constructed by everything around it: why the user came, what they believed the AI could offer, how the AI pulled back, whether the referral was usable, and what happened afterward.

The aftermath is the least measured part of the journey, in Suh’s account, and may be the most important. When her team asked what people did after AI systems referred them to a helpline, almost none of the referred users had actually called. She stressed that this was not a critique of helplines. It was a critique of treating provision as uptake.

“Providing a resource is not the same as helping someone use it,” Suh said. If the desired outcome is connection, then measuring whether the system offered a referral is not measuring the outcome. It is measuring the gesture before the outcome.
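The gap between provision and uptake can be made concrete with a small measurement sketch. The Python below is illustrative only: the `Journey` record, its field names, and the sample values are hypothetical and do not come from Suh's study. It shows how a system could score well on "referral offered" while scoring poorly on "referral followed."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Journey:
    """One hypothetical user journey; both fields are illustrative assumptions."""
    referral_offered: bool   # the system showed a helpline or other resource
    referral_followed: bool  # the person actually reached that resource


def provision_rate(journeys):
    """Share of all journeys in which a referral was offered (the gesture)."""
    if not journeys:
        return 0.0
    return sum(j.referral_offered for j in journeys) / len(journeys)


def uptake_rate(journeys):
    """Share of offered referrals that the person actually followed (the outcome)."""
    offered = [j for j in journeys if j.referral_offered]
    if not offered:
        return 0.0
    return sum(j.referral_followed for j in offered) / len(offered)


# Synthetic toy data, invented for this example; not results from any study.
journeys = [
    Journey(referral_offered=True, referral_followed=False),
    Journey(referral_offered=True, referral_followed=False),
    Journey(referral_offered=True, referral_followed=True),
    Journey(referral_offered=True, referral_followed=False),
    Journey(referral_offered=False, referral_followed=False),
]

print(f"provision rate: {provision_rate(journeys):.2f}")
print(f"uptake rate:    {uptake_rate(journeys):.2f}")
```

A benchmark that checks only the first number would report success on four of five journeys here, while the second number shows that only one of the four offered referrals led to contact. Measuring `referral_followed` in practice would likely require data that, as Suh notes elsewhere, crisis lines can see and model developers cannot.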

The product is only one part of the architecture distributing risk and labor

Suh’s second reframing was that the conversation is not only between a user and an AI. It sits inside a web of people and institutions that are affected by the AI’s behavior. Clinicians may receive AI-shaped content they never saw produced. Crisis helplines absorb the cost of referrals. Communities may read default safety responses through histories of coercive care. Families and personal networks absorb whatever the system hands back to the user.

Safety features, in this view, are not merely technical features. They are arrangements of labor and consequence. Trained reviewers may read flagged conversations and decide whether action is needed. Upstream annotators, often globally distributed and underpaid, shape what the model learns to treat as concerning. Clinicians, family members, and users do interpretive and repair work that a benchmark never sees.

That is why Suh called the design problem “architectural.” She did not mean model architecture or even product design. She meant rules, arrangements, consent structures, measurement systems, and the distribution of duties across institutions — closer to city planning than product design.

Responsibility, she argued, should follow capability. Developers can audit training pipelines; clinicians cannot. Crisis lines can track who actually reaches them; foundation model developers cannot. Each actor therefore carries different duties, because each can see and act on different parts of the system. Distributing those duties is itself governance work.

Altman’s question about global cultural differences extended this architectural frame. Suh began by shifting the issue away from cultural competency alone. AI-making itself touches people around the world because the data needed to build these systems is manually annotated and curated, often outside the familiar contexts of Silicon Valley or Western academia. When companies scale model-making or trust-and-safety work, they may define well-being or safety from a Western perspective and then operationalize those definitions through lower-cost labor in other regions.

For Suh, AI for mental health therefore includes not only the mental health of end users, but the sociotechnical ecosystem of AI production: who sets the guidelines for what is responsible and ethical, who is in the room making those decisions, who operationalizes them, who answers the phone, and who reads conversation context to annotate whether something should be flagged. She explicitly raised the possibility of power differences when standards created from a Western perspective are sent to workers in the Global South to label. She also asked what this system is doing to the mental health of the people contributing to AI-making itself.

Sara Johansen said that at OpenAI, global relevance is both a model policy and product policy issue. She pointed to the company’s Global Physician Network, which helps with model training. Bringing in diverse perspectives, she said, helps create an experience that is less embedded in a Western focus. Mental health is described, understood, and discussed differently depending on where a person is coming from, and those differences need to be incorporated into model responses themselves.

Mental health AI requires multiple paths, not one universal safety behavior

Jina Suh identified three design tensions that should be made visible rather than engineered away. The first is intervention versus autonomy. Some people want strong intervention when a person is in distress. Others want sustained engagement. Some draw a sharp line against escalation without consent. What protects one population may be exactly what another population has organized its help-seeking behavior to avoid, especially among people with personal experience of coercive care.

The second tension is provision versus uptake. A system can offer a resource, but the user may not be able to use it. Suh separated “choice” into three levels. A surface choice is a toggle or setting. An informed choice means the user understands what will happen before it happens. A capable choice means the user has the capacity to exercise the choice when the moment arrives. A toggle presented mid-crisis, she said, is not really a capable choice.

The third tension is standardization versus meeting people where they are. AI systems have become deeply integrated into people’s lives partly because they are available at 2 a.m., in any language, without paperwork or judgment, and calibrated to the person on the other side. But the instinct of safety governance is to standardize: to specify what systems must do. Suh called standardization necessary because it enables accountability, scale, and equity floors. But pure standardization can reintroduce the rigidity people were trying to avoid.

Her proposed move was to standardize the architecture, not the response. Pre-consent, the right to multiple paths, and journey-level measurement should be standardized. The responses themselves should remain individualized to the person’s situation.

Architectural element	Purpose
Pre-consented contacts	Agreed before a crisis, not during it
Tiered escalation	Proportionate support rather than all-or-nothing escalation
Listening modes	Continuity in place of hard refusal
Culturally calibrated framings	Safety responses that do not read as threat
Peer connection	Support grounded in lived experience
Tailored resources	Resources responsive to what the person can actually reach

Suh’s proposed architecture emphasizes multiple navigable paths rather than one standardized response.

The resulting design question is not “what should the system always do?” It is: what is the best path this person can actually take to safety from where they actually are? People come to AI from different places, Suh said, and should be able to walk away in different ways.

OpenAI describes a layered system for wellness, distress, and crisis

Sara Johansen described OpenAI’s mental health and well-being work from the perspective of a platform operating at large scale. She said almost one billion people come to ChatGPT every week, with goals ranging from learning and coding to talking about deeply personal experiences. OpenAI thinks about those conversations as existing on a spectrum: wellness, distress, crisis.

~1B people coming to ChatGPT each week, according to Johansen

A person talking about their day or a relationship problem needs something different from a person in serious distress. OpenAI’s stated responsibility is to respond across that spectrum with responses and interventions tailored to the person’s context and level of need.

Johansen distinguished between two policy systems. Model Policy governs how the model responds in conversation. Product Policy governs additional layers of support, including interventions that direct users toward other resources. Users experience these as one system, so the product and model layers need to work together: the model should respond appropriately, while product interventions redirect people when the situation calls for it.

In mental health, Johansen said OpenAI’s Model Spec emphasizes three principles. First, the model should recognize risk in the moment and over time, including signals such as self-harm, delusions, mania, reliance, and repeated patterns within and across conversations. Second, the model should bound its response: empathetic and nonjudgmental, but not affirming unsafe beliefs or assisting in situations with high risk of harm. Third, it should direct people toward real-world support systems, such as trusted loved ones, mental health professionals, or emergency services.

Product Policy translates clinical principles into the AI context. OpenAI’s product policy team studies the impact of interventions and the platform more broadly, including through long-term research on human-AI interactions. Johansen also emphasized outside expert input: an Expert Council on Well-Being and AI advising on product interventions, well-being measures, and age-appropriate safeguards, and a Global Physician Network of more than 250 physicians across 60 countries helping evaluate model responses and safety mitigations.

250+ physicians in OpenAI’s Global Physician Network

OpenAI’s interventions are built to route users toward people

Sara Johansen said OpenAI does not believe ChatGPT should replace human connection or human care. Its product interventions are designed to help people connect with care, resources, and people they trust. The three examples she presented were parental controls, crisis helplines, and a newer feature called Trusted Contact.

Parental controls allow parents and teens to link accounts. Parents can adjust features, set time limits, and add safeguards that fit their family. Johansen gave examples such as whether memory is saved between conversations and how data is used. A safety notification feature sends a parent a notice if automated systems and trained reviewers detect serious safety concerns on the teen’s account. The purpose is to encourage connection between parent and child when the teen needs support beyond the platform.

For crisis helplines, Johansen said that when OpenAI’s systems detect that someone may be discussing suicide or self-harm, OpenAI believes the person needs to be redirected to a human trained to support them. She described a partnership with ThruLine that offers localized helplines in more than 170 countries. The product intervention includes a click-to-call or text option and language intended to reduce barriers by telling the user that the service is free and confidential, and that they will reach someone trained to listen and support them.

170+ countries with localized helpline support through ThruLine, according to Johansen

Trusted Contact is designed to help someone connect with a person they trust in moments of crisis. Johansen said the feature was informed by the idea that social support is a protective factor in suicide prevention, alongside her clinical experience that reaching out in a crisis can be difficult. The feature lets a user proactively name someone who could be notified if the user is later in a crisis situation.

The design is consent-based on both sides. The user and nominated contact must be 18 or older. The contact receives an invitation, can learn about the feature, and must opt in before the feature becomes active. If both parties agree, and if automated systems and trained reviewers later detect a serious safety concern, OpenAI notifies the trusted contact and encourages them to check in. Johansen said OpenAI does not share conversation details. It provides conversation guides and other resources to help the contact respond.

Step	How Trusted Contact works
Choose a contact	A user learns about the feature, chooses whether to participate, and nominates a trusted contact; both must be 18 or older.
Contact opts in	The nominated contact receives an invitation and must agree before the feature becomes active.
Safety notification	If automated systems and trained reviewers detect a serious safety concern, OpenAI notifies the contact and encourages them to check in; conversation details are not shared.

OpenAI presented Trusted Contact as a consent-based pathway from setup to safety notification.
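The consent gating described above can be sketched as a small state model. This is a reader's illustration in Python, not OpenAI's implementation: the class, function, and field names are hypothetical, and treating the automated signal and the reviewer decision as two separate conditions that must both hold is an assumption drawn from the phrase "automated systems and trained reviewers."

```python
from dataclasses import dataclass
from enum import Enum, auto

MIN_AGE = 18  # from the talk: user and contact must both be 18 or older


class Status(Enum):
    INVITED = auto()  # user has nominated a contact; contact has not yet agreed
    ACTIVE = auto()   # contact has opted in; the feature is live


@dataclass
class TrustedContactLink:
    """Hypothetical record linking a user to a nominated contact."""
    user_age: int
    contact_age: int
    status: Status = Status.INVITED

    def __post_init__(self):
        # Reject the link at setup time if either party is under 18.
        if self.user_age < MIN_AGE or self.contact_age < MIN_AGE:
            raise ValueError("user and contact must both be 18 or older")

    def contact_opts_in(self):
        """The contact accepts the invitation, which activates the feature."""
        self.status = Status.ACTIVE


def should_notify(link, automated_flag, reviewer_confirmed):
    """Notify only if the link is active and both signals agree (assumed AND)."""
    return link.status is Status.ACTIVE and automated_flag and reviewer_confirmed


def build_notification():
    """Return a check-in prompt that carries no conversation details."""
    return {
        "message": "Someone who named you as a trusted contact "
                   "may be going through a difficult time.",
        "suggested_action": "check in",
    }


link = TrustedContactLink(user_age=30, contact_age=45)
print(should_notify(link, True, True))   # contact has not opted in yet
link.contact_opts_in()
print(should_notify(link, True, True))   # both parties consented, both signals set
```

The point of the sketch is ordering: consent from both sides is established before any crisis, and the notification payload is built without access to the conversation itself.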

The visual mockups Johansen showed made the consent and notification mechanics concrete. The user-facing setup screen said that if the user later discusses suicide with ChatGPT “in a way that indicates a serious safety concern,” OpenAI may automatically notify the trusted contact. The consent screen also warned that serious safety concerns can be subjective, false positives may occur, and the feature is not guaranteed to prevent harm.

On the trusted-contact side, the recipient receives an invitation and can choose whether to participate. If a later notification is triggered, the message says the person “may be going through a difficult time” and encourages the contact to check in. The more detailed notification also says the contact is not responsible for keeping the person safe, is not a counselor or crisis responder, and is part of a broader support network.

Johansen connected the feature to comments from two experts OpenAI worked with. Dr. Arthur Evans, CEO of the American Psychological Association, was quoted as saying that identifying a trusted person in advance, while preserving choice and autonomy, can make it easier to reach real-world support when it matters most. Dr. Munmun De Choudhury of Georgia Tech, a member of OpenAI’s Expert Council on Well-Being and AI, was quoted as saying that one of AI’s biggest promises is fostering authentic human-to-human connection and psychological safety.

Suh warned that mental-health deployment needs more than fast iteration

Russ Altman pressed the speakers on evaluation and continuous improvement: how teams assess new product policies, learn from deployment, and make changes while systems are already in use.

Sara Johansen described iterative improvement as foundational to OpenAI’s approach. Trusted Contact had recently launched, and OpenAI was learning as it went, with the expectation that the feature would change over time. She contrasted the pace with academia, where projects may take months or years. At OpenAI, the work requires balancing thoughtful, considered decision-making with the ability to make timely changes that can have an impact.

Jina Suh answered by looking at the history of software testing. She began her career as a software tester about 18 years earlier, at a time when professional testing and QA roles were common and deployments moved through staged environments. Over time, she said, companies such as Microsoft and Google got rid of many of those roles in the name of agility. Today, the industry often tests in production.

That history matters more in AI because two pressures now compound each other. First, Suh argued, the technology industry has lost some of the muscle and skill set required to test systems thoughtfully before real-world deployment. Second, the amount of user interaction data is so large that it is “humanly impossible” to manage in a traditional way. Against investor pressure to move quickly, she asked how the field can proactively slow down enough to preserve the qualitative detail needed for mental health contexts, rather than defaulting to one-size-fits-all approaches.

Suh’s answer was not nostalgia for old QA processes as such, but a call to rebuild or reinvent the workforce and skill sets needed for the level of quality these systems require. Her warning was that mental-health AI cannot rely only on the habits of rapid product iteration when the systems being improved are already being used by people in distress.
