AI Chat Needs Shared Sessions, Not Single Response Streams

Mike Christensen | AI Engineer | Sunday, May 17, 2026 | 11 min read

Mike Christensen of Ably argues that many AI chat interfaces fail because they tie the user experience to a single streaming connection, not because the underlying model is inadequate. In his account, Server-Sent Events make common product behaviors such as refresh, reconnect, cancellation, multi-tab use and device switching brittle or ambiguous. Christensen’s proposed fix is to treat the AI session as a durable shared resource: clients and agents subscribe to and write into the session, so connections can drop, agents can run concurrently, and humans can join without losing context.

The failure mode is the session architecture, not the model

Mike Christensen’s core claim is that many AI chat products inherit a brittle user experience from the default way they stream model output. The familiar architecture is simple: a browser sends an HTTP request to an agent server, the agent invokes an LLM, receives a stream of events, and pipes those events back to the browser over Server-Sent Events. Christensen described this as the default pattern used by popular frameworks, including Vercel’s AI SDK, and acknowledged why teams reach for it: it is easy to get working.

The problem is that the pattern is organized around “a single client establishing a single connection to a single agent.” That assumption becomes a constraint as soon as the product needs to behave like modern software rather than a one-off demo. The user refreshes a page, moves from WiFi to cellular, opens the same session in another tab, switches to a phone, or wants to interrupt the agent while it is working. In the default model, the live stream is tied to the health and identity of one connection.

Christensen argued that the best AI products invest in three capabilities that separate a fragile demo from a reliable product experience. The first is resilient delivery: streams should survive disconnections and resume where they left off. The second is continuity across surfaces: the same conversation session should remain in sync across tabs, devices, and clients, including live activity. The third is live control: users should be able to see what an agent is doing and communicate with it while it works, not merely wait for a sequential request-response turn to finish.

Those requirements sound like UX expectations, but Christensen treated them as architectural requirements. If the response stream exists only as a private pipe between one browser connection and one agent process, then every richer interaction becomes special-case plumbing.

Christensen grounded his authority partly in Ably’s operating scale. He said Ably builds SDKs and APIs for live and interactive experiences, including AI experiences, and handles traffic for more than two billion devices, more than 30 billion monthly connections, and more than two trillion API operations each month.

According to Christensen, Ably spoke with more than 40 AI-driven companies about agentic AI at scale.

He said those conversations spanned 10 industries and included companies shipping agents, copilots, and assistants to millions of users.

A dropped connection exposes what the agent is being forced to manage

In Christensen’s first failure case, the client sends a message, the agent begins streaming a response, and the browser’s connection drops halfway through. The LLM may still be generating events, but the events now have nowhere to go. If the desired behavior is to let the client reconnect and resume from the exact point of interruption, the system has to preserve and replay the missing stream.

Under direct HTTP streaming, Christensen said the agent has to take responsibility for that. It must store generated events, perhaps in an in-memory store such as Redis. It must attach sequence numbers so the events can be ordered. It must expose some explicit resume handler on the backend. When the client reconnects, that handler must determine exactly which events the client missed and replay those events in order.

That work gets worse in multi-client cases. Different clients can disconnect at different times, so each reconnecting client may need a different replay window. The agent is no longer only coordinating model work and tool use. It is also managing delivery state for every client connection.
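
The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not any framework's API: the names AgentReplayBuffer, record and resume are hypothetical, and a plain dictionary stands in for an external store such as Redis.

```python
from dataclasses import dataclass, field


@dataclass
class AgentReplayBuffer:
    """Agent-owned event store (hypothetical); a dict stands in for a store such as Redis."""

    events: dict[str, list[tuple[int, str]]] = field(default_factory=dict)

    def record(self, session_id: str, payload: str) -> int:
        """Store one generated event and return its sequence number."""
        log = self.events.setdefault(session_id, [])
        seq = len(log) + 1  # sequence numbers start at 1 and increase by 1
        log.append((seq, payload))
        return seq

    def resume(self, session_id: str, last_seen_seq: int) -> list[tuple[int, str]]:
        """Resume handler: return, in order, every event after last_seen_seq."""
        return [e for e in self.events.get(session_id, []) if e[0] > last_seen_seq]


buffer = AgentReplayBuffer()
for token in ["The", " flight", " is", " booked", "."]:
    buffer.record("session-1", token)

# Two clients dropped at different points, so each needs its own replay window.
laptop_missed = buffer.resume("session-1", last_seen_seq=2)
phone_missed = buffer.resume("session-1", last_seen_seq=4)
print(laptop_missed)
print(phone_missed)
```

Even in this toy form, the agent process owns storage, ordering and per-client replay, which is the delivery burden Christensen wants to move out of the agent.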

Christensen’s alternative is to make the session itself durable. The agent writes generated events to a persistent shared session, not directly to a particular browser connection. Clients connect to that session and read from it. If a client drops, it reconnects to the session and resumes from the session’s event history. The agent does not need to implement replay logic for each client because the stream is no longer owned by the agent’s point-to-point connection.

In the diagram Christensen used, the shift is small but decisive: instead of “Client → Agent → LLM → SSE response,” the agent writes to a durable session and the client reads from it. The session becomes the stateful medium between agents and users.
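
A minimal sketch of that shift, assuming an in-memory append-only log in place of a real durable store. DurableSession and Client are hypothetical names chosen for illustration. The point is that the agent only appends, while each client tracks its own read position.

```python
class DurableSession:
    """Append-only event log that outlives any single client connection."""

    def __init__(self) -> None:
        self._log: list[str] = []

    def append(self, event: str) -> None:
        self._log.append(event)

    def read_from(self, offset: int) -> list[str]:
        """Return every event at or after offset."""
        return self._log[offset:]


class Client:
    """Tracks its own position, so the agent never holds per-client state."""

    def __init__(self, session: DurableSession) -> None:
        self.session = session
        self.offset = 0
        self.received: list[str] = []

    def sync(self) -> None:
        new_events = self.session.read_from(self.offset)
        self.received.extend(new_events)
        self.offset += len(new_events)


session = DurableSession()
client = Client(session)

session.append("Checking")   # the agent writes to the session, not to the client
client.sync()                # the client is connected and sees the first event
session.append(" order")     # the client is offline while these are written
session.append(" status")
client.sync()                # the client reconnects and catches up from its offset

print("".join(client.received))
```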

SSE turns cancel and resume into competing meanings

The sharpest limitation Christensen identified is not merely that Server-Sent Events can disconnect. It is that SSE is one-way. The server can stream events to the client, but the client does not have an upstream channel over the same pipe to send control signals back to the agent.

That becomes visible in something as basic as a stop button. If the agent is streaming a response and the user clicks stop, the client needs to tell the agent to cancel the in-progress generation. But over an SSE connection, Christensen said, the only signal available to the client is to close the connection.

That closure is ambiguous. The backend cannot infer, from the closed pipe alone, whether the user intentionally canceled generation or whether the network dropped and the user will reconnect. If the backend assumes a disconnect, it may keep the LLM running and buffer events so the user can resume later. If it assumes cancellation, it may stop the LLM to avoid spending tokens on unwanted output. The same technical event has two incompatible product meanings.

Christensen summarized the consequence directly: “resume and cancel are mutually exclusive when you're using SSE in particular.”

He pointed to Vercel’s AI SDK documentation as explicit evidence of the tradeoff. The on-screen documentation said the useChat and useCompletion hooks provide a stop helper to cancel a stream from the client side to the server, but that “stream abort functionality is not compatible with stream resumption.” It added: “If you're using resume: true in useChat, the abort functionality will break the resumption mechanism. Choose either abort or resume functionality, but not both.”

His conclusion was that richer AI interactions require bidirectional control. Replacing SSE with a bidirectional transport such as WebSockets can create an upstream path for clients to send control messages. But Christensen did not present WebSockets alone as the full answer. A bidirectional pipe can still be a point-to-point pipe, and the point-to-point nature is the deeper constraint.
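
The ambiguity can be modeled in a few lines. The sketch below is an illustration under stated assumptions, not a description of SSE or WebSocket internals: observed_signal and backend_action are hypothetical helpers, and the action names are invented labels.

```python
def observed_signal(user_action: str, upstream_channel: bool) -> str:
    """What the backend can actually observe for a given user action."""
    if user_action == "click_stop" and upstream_channel:
        return "cancel_message"  # explicit intent sent upstream
    # A stop without an upstream path and a network drop look identical.
    return "connection_closed"


def backend_action(signal: str, treat_close_as_cancel: bool) -> str:
    """Decide what to do with the in-progress generation."""
    if signal == "cancel_message":
        return "stop_generation"
    # A bare close carries no intent, so the backend applies one fixed policy.
    return "stop_generation" if treat_close_as_cancel else "keep_generating_and_buffer"


outcomes = {}
for upstream in (False, True):
    stop = backend_action(observed_signal("click_stop", upstream), treat_close_as_cancel=False)
    drop = backend_action(observed_signal("network_drop", upstream), treat_close_as_cancel=False)
    outcomes[upstream] = (stop, drop)
    print(f"upstream={upstream}: stop -> {stop}, drop -> {drop}")
```

Without an upstream path, a resume-friendly policy ignores the stop button, and a cancel-friendly policy would abandon reconnecting users. With an explicit cancel message, the two cases no longer share a signal.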

A second tab should not be blind to live work

Christensen’s second class of examples focused on multi-surface use. Suppose a user sends a message from one tab and the agent begins streaming a response over a WebSocket connection to that tab. The user then opens the same session in a second tab. Even though WebSockets are bidirectional, the second tab did not initiate the request and does not own the connection carrying the live response. In Christensen’s words, “the second tab doesn't see anything” while the response is being streamed.

The same issue appears when the user wants to steer the agent from a different device. Christensen used a flight-booking example: the user asks from one client, “Book me a flight for next Tuesday.” The agent starts working. The user switches to a phone and realizes the date should be Wednesday. The phone has neither live visibility into the work already underway nor an upstream channel into the particular agent turn that was started elsewhere.

A durable session changes the routing model. Every client maintains a persistent connection to the session, not merely to the agent invocation it personally initiated. The connection is always active, so clients see ongoing activity in the session. Because the session is shared, the agent also has visibility into activity written by any client. A message from a phone, a cancellation from a second tab, or a follow-up instruction can be routed through the session and observed by the working agent.

Christensen’s point was not that multi-tab sync is a cosmetic convenience. It is a precondition for live control across devices. If the user’s session is the shared object, then clients can join, observe, and act. If the request connection is the shared object, only the originating client has a natural place in the interaction.
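
A rough sketch of that routing model, using the flight-booking example. SharedSession and BookingAgent are hypothetical, the event fields are assumptions, and synchronous callbacks stand in for persistent network connections.

```python
from typing import Callable

Event = dict
Subscriber = Callable[[Event], None]


class SharedSession:
    """In-memory stand-in for a shared session that every participant joins."""

    def __init__(self) -> None:
        self.history: list[Event] = []
        self._subscribers: list[Subscriber] = []

    def subscribe(self, callback: Subscriber) -> None:
        # Replay history so a late joiner (a second tab, a phone) sees earlier activity.
        for event in self.history:
            callback(event)
        self._subscribers.append(callback)

    def publish(self, event: Event) -> None:
        self.history.append(event)
        for callback in list(self._subscribers):
            callback(event)


class BookingAgent:
    def __init__(self, session: SharedSession) -> None:
        self.session = session
        self.day = None
        session.subscribe(self.on_event)

    def on_event(self, event: Event) -> None:
        # The agent sees messages from any client, not only the one that started the turn.
        if event["type"] == "user_message":
            self.day = event["day"]

    def work(self) -> None:
        self.session.publish(
            {"type": "progress", "source": "agent", "text": f"Searching flights for {self.day}"}
        )


session = SharedSession()
agent = BookingAgent(session)

laptop_seen: list[Event] = []
session.subscribe(laptop_seen.append)
session.publish({"type": "user_message", "source": "laptop", "day": "Tuesday"})
agent.work()

phone_seen: list[Event] = []
session.subscribe(phone_seen.append)  # the phone joins mid-task and sees work in progress
session.publish({"type": "user_message", "source": "phone", "day": "Wednesday"})
agent.work()

print(laptop_seen[-1]["text"])
```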

Multi-agent systems should not make the orchestrator a progress proxy

The same coupling problem appears in multi-agent architectures. Christensen described a pattern where a user sends a request to an orchestrator agent, which delegates subtasks to specialized agents such as a research agent and a writing agent. The product may want to show granular progress from each sub-agent: what the research agent is doing, what the writing agent is drafting, which tools are being called, and how the work is evolving.

With point-to-point streaming, the orchestrator sits on the user-facing connection. That forces it into a dual role. It must orchestrate and delegate the task, but it must also proxy every granular update from every sub-agent back to the user. Christensen argued that this adds unnecessary architectural complexity. The orchestrator may only care about final results from sub-agents, but the UI still wants live progress.

In a durable session model, every participating agent can write independently to the same session. The research agent can publish its own progress. The writing agent can publish its own progress. The user’s clients subscribe to the session once and receive activity from all participants. There is no need to route every intermediate update through a centralized orchestrator just so the user interface can see it.

Christensen said this pattern can “drastically simplify” the architecture because the shared session becomes the integration point. Clients do not need to know how many agents are working or where each stream originates. Agents do not need a direct private connection to each client. They read and write to the durable session.
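
One way to picture the difference is the sketch below, in which two sub-agents publish their own progress while the orchestrator collects only final results. Session, run_agent and orchestrate are hypothetical names, and asyncio tasks stand in for separately running agents.

```python
import asyncio


class Session:
    """Shared session that any agent can write to directly."""

    def __init__(self) -> None:
        self.events: list[tuple[str, str]] = []

    def publish(self, source: str, text: str) -> None:
        self.events.append((source, text))


async def run_agent(session: Session, name: str, steps: list[str]) -> str:
    """A sub-agent publishes its own progress, then returns only its final result."""
    for step in steps:
        session.publish(name, step)
        await asyncio.sleep(0)  # yield so the other agent can make progress
    return f"{name} done"


async def orchestrate(session: Session) -> list[str]:
    """The orchestrator delegates and collects results; it never relays progress."""
    return await asyncio.gather(
        run_agent(session, "research", ["searching sources", "summarizing findings"]),
        run_agent(session, "writing", ["outlining", "drafting"]),
    )


session = Session()
results = asyncio.run(orchestrate(session))

research_steps = [text for source, text in session.events if source == "research"]
writing_steps = [text for source, text in session.events if source == "writing"]

print(results)
for source, text in session.events:
    print(f"[{source}] {text}")
```

A client subscribed to the session would see all four progress events tagged by source, while the orchestrator's code path handles only the two return values.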

Ably’s implementation treats the session like pub-sub

Mike Christensen connected the durable-session pattern to pub-sub. At Ably, the underlying primitive is a channel: publishers and subscribers communicate through a shared resource rather than directly with one another. In this framing, agents and clients are both participants. They publish messages to a channel and subscribe to messages from it.

He identified three properties of Ably channels that matter for durable AI sessions. First, they are independently addressable. A client or agent can connect to the session by specifying the right channel name. Second, they are persistent. Messages on the channel outlive the lifecycle of any individual connection, device, or agent instance. Third, they are resumable. If a client drops its connection, it can reconnect to the channel and receive events from where it left off.

Christensen said Ably has seen customers use this pattern to build resilient, multi-surface AI experiences. He then introduced Ably AI Transport, which he described as a new SDK for building the durable-session pattern on top of Ably channels. The product claim was narrow and explicit: it is meant to sit as a transport layer for AI apps, rather than require teams to replace their model provider, agent framework, or event stream format.

According to Christensen, Ably AI Transport plugs into any event stream format. Under the hood, it uses Ably channels as the durable session layer. The transport materializes events in the channel: for example, streamed text chunks from an LLM can be accumulated into the complete response. It also handles automatic resumability, multiplexing for concurrent activity, multi-client and multi-device fan-out, and bidirectional control.
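
The idea of materializing chunks can be illustrated with a small fold over an event list. This is a sketch under assumptions, not Ably AI Transport's actual event format or API: the text_chunk type and the response_id and text fields are invented for the example.

```python
def materialize(events: list[dict]) -> dict[str, str]:
    """Fold streamed chunk events into one accumulated text per response."""
    responses: dict[str, str] = {}
    for event in events:
        if event["type"] != "text_chunk":
            continue  # other event types are ignored in this sketch
        response_id = event["response_id"]
        responses[response_id] = responses.get(response_id, "") + event["text"]
    return responses


# Chunks from two concurrent responses arrive interleaved on the same session.
stream = [
    {"type": "text_chunk", "response_id": "order", "text": "Payment "},
    {"type": "text_chunk", "response_id": "return", "text": "Label "},
    {"type": "tool_call", "response_id": "order", "name": "check_stock"},
    {"type": "text_chunk", "response_id": "order", "text": "processed."},
    {"type": "text_chunk", "response_id": "return", "text": "generated."},
]

materialized = materialize(stream)
print(materialized)
```

A client that joins late could then be handed the accumulated text for each response without replaying every individual chunk.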

He also listed adjacent capabilities that become relevant once an AI interaction is treated as a durable session. Push notifications matter when an agent performs background asynchronous work and the user needs to know when it completes. Shared or subscribable data objects matter when agents and users collaborate over shared data in real time.

The channel, in this account, is not merely a delivery mechanism. It is the persistent coordination surface for the interaction.

The demo tested whether the session survives real product behaviors

Christensen’s demo used a support chat for an electronics shop. The ordinary chat setup was familiar: the agent could call a client-side tool to get the user’s location, call a server-side tool to find nearby stores, look up an order, and display product information. The support scenario mattered because it forced the same AI session to remain live across tabs, disconnects, concurrent agents, and a human handoff.

The first behavior was multi-tab sync. The same support session was open in two browser windows, and both showed the same chat history and live responses. When the page was refreshed, the session stayed synchronized. Christensen emphasized that this did not require additional agent logic; the clients were simply subscribing to the channel and consuming events from it.

The second behavior was forced reconnect. Christensen opened developer tools and set the browser’s network throttling to offline. The client disconnected and reconnected, and the stream continued automatically. That demonstrated the resilient-delivery claim in concrete terms: the session continued to exist even when an individual client connection did not.

The third behavior was live control from another surface. A product research task was started from one tab and displayed as granular progress in both tabs, with steps such as searching the product catalog, comparing specifications, and analyzing customer reviews. Christensen then canceled the task from the other tab. The cancellation was not tied to the client that initiated the work.

The fourth behavior was concurrent agent activity. The user decided to purchase a new pair of headphones while also canceling an existing order. Two tasks ran in the same session at the same time. The order flow checked stock, reserved the item, processed payment, generated a confirmation, and sent an email. The returns flow looked up the existing order, checked eligibility, verified policy, processed the return request, generated a prepaid shipping label, and sent a confirmation. Christensen said the specialized agents were writing events directly, with no centralized orchestrator managing the updates.

The final behavior was human handoff. The customer was unhappy with the refund amount and asked to speak to a human. A support-agent view joined the same session and already had visibility into the full interaction history between the customer and the AI agent. The human could then send a message into the same conversation and continue directly with the customer.

That last example extended the durable-session idea beyond tabs and agents. The session could admit another human participant with context intact. The user experience was not a handoff from one disconnected system to another; it was another participant joining a shared stateful interaction.
