An Event-Sourced Agent Harness Separates State Replay From Side Effects
Jonas Templestein of Iterate argues that an agent harness can be reduced to an append-only event stream plus processors: synchronous reducers to derive state, and post-append hooks to perform side effects. His design puts model chunks, tool calls, errors, schedules, subscriptions and even processor deployment into the log, so a restarted agent can replay state without replaying old LLM calls. The larger claim is that agents and third-party services can compose by reading and appending to the same durable stream, with bounded waits and circuit breakers replacing tighter, blocking plugin interfaces.

The harness is an event log with processors attached
Jonas Templestein’s proposed agent harness has one primitive: an append-only stream of events addressed by a path. The working claim is that agent systems already behave as if they have a log — user messages, model deltas, tool calls, errors, state changes, scheduler ticks — but most harnesses leave some consequential behavior outside that log, visible later only in traces or side effects. Templestein’s preference is to make every meaningful occurrence an event, so the agent becomes debuggable by construction.
I’m kind of like a one abstraction kind of guy, right? Like, this should be like a, like web standards based, this should be, there should only be one thing, which is an event.
The system shape is small enough to state directly:
- a durable stream API that can append JSON events to a path and stream events back from that path;
- a processor contract with a synchronous reduce(event, state) function and an afterAppend(append, event, state) hook;
- ways to run processors locally, through push subscriptions, as ordinary deployed services, or as dynamic workers configured by appending source code to a stream;
- failure controls such as idempotency keys, pause and resume events, and a circuit breaker for streams producing too many events too quickly.
The split between reducer and hook is the central design constraint. The reducer derives state from the event stream. The hook performs side effects after an event has been appended. If a local processor sleeps, 100 events arrive, and the processor restarts, it should be able to catch up its state without reissuing 100 old LLM calls. The reducer can replay history safely because it is synchronous and side-effect-free; side effects belong in afterAppend, where they can be tied to the current processing moment rather than historical replay.
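A minimal TypeScript sketch makes that split concrete; the type names below are illustrative assumptions rather than the SDK's actual exports:
type Append = (event: { type: string; payload?: unknown }) => Promise<void>;

interface Processor<State, Event = { type: string; payload?: unknown }> {
  initialState: State;
  // Synchronous and side-effect-free, so it can be replayed over old events.
  reduce(event: Event, state: State): State;
  // Runs only after an event has been durably appended: the home for side effects
  // such as LLM calls, so replaying history never reissues them.
  afterAppend?(append: Append, event: Event, state: State): Promise<void>;
}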
Templestein framed the stream service, events.iterate.com, as a proof of concept rather than a production deployment. It had no authentication, and he repeatedly warned that secrets did not belong in event payloads. But the lack of ceremony was also part of the point: a participant could curl a JSON object to a stream path and immediately see it become part of the durable stream.
The initial event envelope was intentionally plain. A posted event needed a type; most events also had a payload. The server supplied the streamPath, an auto-incrementing offset, and createdAt. A stream initialized itself with an initialization event at offset 1. Subsequent posts appended at later offsets. Event types could be short strings like hello-world, but Iterate’s own event types were URLs such as https://events.iterate.com/events/stream/initialized, partly because the URL can point to documentation while still functioning as an opaque type string.
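Put together, an appended event as the server might return it would look roughly like this; the field names are the ones described above, and the values are invented for illustration:
// Hypothetical example of a server-returned event envelope (values invented).
const exampleEvent = {
  type: "hello-world",
  payload: { message: "hi" },                  // client-supplied
  streamPath: "/jonastemplestein/hello-world", // server-supplied
  offset: 2,                                   // server-supplied; offset 1 is the initialization event
  createdAt: "2026-01-30T12:00:00.000Z",       // server-supplied
};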
The stream service was tolerant of malformed input in a way that reinforced the event-sourcing premise. When Templestein posted JSON without a valid type, the server did not silently reject it as an invisible failure. It appended an invalid-event-appended event carrying the validation reason. The error became part of the stream, because the error was also something that happened.
There was one exception he called out: if a stream is paused, further disallowed appends return an error rather than appending an error event. Otherwise, a paused or looping stream could produce an unbounded number of error events. Even in an event-sourced system, the service still needs pre-append enforcement for a small class of invariants.
Paths make agents and subagents addressable like files
The stream namespace was hierarchical. Every stream path began with a slash, and Templestein described the model as file-system-like: /jonastemplestein/hello-world, /jonas/example, child paths, parent paths. Streams are created implicitly: post to a path under the streams API and the service creates the stream.
That path-addressability is not only naming. It is part of the agent composition model. A parent stream can observe child-stream creation. A processor can append to the current stream, a child stream, or a parent path. Templestein’s example for subagents was deliberately informal: append to ./Boris, treat Boris as a subagent, and subscribe to whatever result events Boris emits. The parent does not need a special subagent API if subagent interaction is just more events at related paths.
He also argued that an agent should become publicly routable as soon as it exists. In his framing, a digital intelligent entity that is not a robot is basically an internet-connected server program that speaks HTTP. Giving the agent a URL avoids later inventing connector abstractions just to get a Slack webhook, a web form submission, or another HTTP-originating event into the system.
This design choice came with an explicit safety warning. The service had no authentication; participants could see each other’s streams; secrets did not belong in event payloads. Templestein treated those as solvable concerns, not as reasons to hide the agent behind a local-only interface. His preferred shape was still: the moment an agent exists, it has a URL.
The service also had namespace escape hatches. The UI exposed a “project slug,” and Templestein said a different project slug could effectively create a new database or namespace. During live use, that could be passed as an HTTP header if people stepped on each other’s paths.
The raw stream already has enough machinery to look like a runtime
Before any SDK, the stream could already do runtime-like work over plain HTTP. A client could append JSON to a stream, fetch the stream, or hold open a server-sent-events connection. With ?live=true, the connection stayed open and emitted future events as they arrived. Piping the SSE output through sed and jq made the stream readable in a terminal.
That meant a minimal agent shape existed before any framework. If one process appended hello-world and another process watched the stream and appended responses, the stream was already mediating agent behavior. Templestein described this as trivial enough that he spends substantial time “just curling it.”
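A TypeScript sketch of that pre-SDK shape, assuming a POST-to-append endpoint and the ?live=true SSE query described above (the exact endpoint path is an assumption):
// Minimal sketch, assuming Node 18+ fetch; endpoint layout is an assumption.
const stream = "https://events.iterate.com/streams/jonastemplestein/hello-world";

// One process appends an event.
await fetch(stream, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ type: "hello-world", payload: { message: "hi" } }),
});

// Another process watches the stream live and could append responses.
const res = await fetch(`${stream}?live=true`, {
  headers: { Accept: "text/event-stream" },
});
for await (const chunk of res.body!) {
  // Raw SSE frames; parse the "data:" lines to recover events.
  console.log(new TextDecoder().decode(chunk));
}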
Several stream features were implemented as events:
- An idempotencyKey prevented duplicate appends when the same key was used twice.
- A stream could be paused by appending a stream-paused event with a reason.
- A stream could be resumed by appending a resumed event.
- A scheduled append could create recurring or future events, such as a heartbeat every five seconds.
- A subscription configuration event could tell the stream to push matching future events to a callback URL.
- A JSONata transformer configuration could observe certain events, rewrite them, and append derived events.
The scheduling example was especially literal. To create a heartbeat, Templestein appended an event whose payload contained another event — { type: "Heartbeat" } — and a schedule such as “every five seconds.” Scheduling was not a special side channel. It was an event instructing the stream to append another event in the future.
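A sketch of such a scheduled append, with the outer event type and field spellings assumed for illustration:
// Sketch of the scheduled append described above: an event whose payload wraps
// the event to emit plus a schedule. Outer type URL and field names are assumptions.
const heartbeatSchedule = {
  type: "https://events.iterate.com/events/stream/scheduled-append-configured",
  payload: {
    event: { type: "Heartbeat" }, // the event the stream should append
    schedule: "every 5 seconds",  // recurring; could equally be a one-off future time
  },
};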
The same applied to push subscriptions. Templestein contrasted the terminal SSE connection as a pull subscription — an HTTP client connects and receives events — with a configured callback where the stream pushes selected events to another server, Slack endpoint, or third-party URL. The push subscription itself was configured by appending an event.
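A push subscription configured the same way might look roughly like this; the type URL and payload fields are assumptions rather than the service's documented schema:
// Sketch of a push-subscription configuration event (field names assumed).
const pushSubscription = {
  type: "https://events.iterate.com/events/stream/subscription-configured",
  payload: {
    callbackUrl: "https://example.com/hooks/agent-events", // where matching events get pushed
    matchTypes: ["https://events.iterate.com/events/manual-event-appended"],
  },
};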
This is the pattern that carried through the architecture: infrastructure capabilities appear as processors reacting to stream events, not as a separate management plane.
Reducers materialize state; hooks enact consequences
The TypeScript SDK and examples were meant to turn the raw curl shape into stream processors. Templestein described a processor runtime that consumes the stream, runs the reducer, and then runs the afterAppend hook. The simplest processor counted how many hello-world events had been seen. A slightly less trivial one would watch for ping and append pong.
The reducer pattern begins with the state required by a feature. For a basic LLM agent, that state might include a model string, a system prompt, a message history, whether compaction is currently happening, or whether a tool call is in progress.
The initial state for a simple agent could be an empty history, a system prompt, and a default model. Events then transform that state. If an agent-input-added event arrives, the reducer appends a user message to history. If an event represents model output, the reducer incorporates that output into whatever state shape the agent needs.
Templestein used the OpenAI Responses API as the concrete example. OpenAI expects messages in its own format, with a user role and content field. The local stream’s event might be deliberately simpler: agent-input-added with a string. The reducer becomes the translation layer between the event-sourced view of the world and the API-specific representation needed for a model request.
The model string and prompt also become natural event-sourced state. Templestein suggested a future model changed event: a tiny reducer branch could update the model field, and the next LLM request would use the new model. The same pattern would apply to other harness capabilities. A feature is not a bespoke control path; it is another event type plus a small reducer and, if necessary, an afterAppend consequence.
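A sketch of that reducer, with illustrative event types, payload fields, and default values rather than the exact names used in the examples repository:
// Sketch of a reducer for the simple agent state described above.
type AgentState = {
  model: string;
  systemPrompt: string;
  messages: { role: "user" | "assistant"; content: string }[]; // OpenAI-style shape
};

const initialState: AgentState = {
  model: "gpt-4.1",                           // assumed default
  systemPrompt: "You are a helpful agent.",   // assumed default
  messages: [],
};

function reduce(event: { type: string; payload?: any }, state: AgentState): AgentState {
  switch (event.type) {
    case "agent-input-added":
      // Translate the plain event into the API-specific message format.
      return {
        ...state,
        messages: [...state.messages, { role: "user", content: event.payload.text }],
      };
    case "model-changed":
      // A tiny branch is enough to make the model an event-sourced setting.
      return { ...state, model: event.payload.model };
    default:
      return state;
  }
}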
The fuller local TypeScript agent did not establish its claim by running cleanly onstage: Templestein’s editor and dependencies misbehaved, and examples had been refactored shortly before the session. The technical point still came through in the code walkthrough. The demonstrated runtime split was not “an OpenAI call happens in the reducer”; it was “events derive state, and the hook decides which current event should trigger the next consequence.”
Materialization can happen anywhere because reducers are cheap
An audience question pressed on where model streaming events become something clients can consume. If OpenAI or Anthropic produces output-item events, text deltas, and an end event, is the stream processor squashing those into materialized messages? Are the materialized events persisted back to the stream, or is materialization client-side?
Templestein’s answer was that every streaming chunk can be an event in the stream, and reducers can run in many different places. A processor may reduce raw model deltas into a message view. A UI may reduce raw events into feed items. A service-internal circuit breaker may reduce timestamps into a “too many events” decision. The event stream remains the shared substrate; materialized views are projections over it.
He used the stream UI itself as an example. The UI displays nicer “feed items,” but those are not the same as events. They are derived from events by a reducer that decides which event types are interesting and how to render them. In that sense, building a UI for the system is also stream processing.
The circuit breaker was the production example. Its initial state is paused: false and pausedReason: null. Its reducer accumulates timestamps for recent events. If the last 100 timestamps span less than one second, the processor’s afterAppend function throws the circuit breaker by appending a pause event.
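A sketch of that processor, assuming the pause event type and payload shape (the talk described the behavior, not the exact schema):
// Sketch of the circuit breaker: the reducer only accumulates timestamps,
// and afterAppend throws the breaker by appending a pause event.
type BreakerState = { paused: boolean; pausedReason: string | null; recent: number[] };

const circuitBreaker = {
  initialState: { paused: false, pausedReason: null, recent: [] } as BreakerState,

  reduce(event: { type: string; createdAt: string; payload?: any }, state: BreakerState): BreakerState {
    // Pause event type URL is an assumption.
    if (event.type === "https://events.iterate.com/events/stream/paused") {
      return { ...state, paused: true, pausedReason: event.payload?.reason ?? null };
    }
    // Keep only the timestamps of the last 100 events.
    return { ...state, recent: [...state.recent, Date.parse(event.createdAt)].slice(-100) };
  },

  async afterAppend(append: (e: object) => Promise<void>, _event: unknown, state: BreakerState) {
    if (state.paused || state.recent.length < 100) return;
    const oldest = state.recent[0];
    const newest = state.recent[state.recent.length - 1];
    // 100 events whose timestamps span less than one second: pause the stream.
    if (newest - oldest < 1000) {
      await append({
        type: "https://events.iterate.com/events/stream/paused",
        payload: { reason: "circuit breaker: too many events too quickly" },
      });
    }
  },
};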
The pause feature is therefore implemented in the same processor style, except that the service’s built-in processors also have privileges third-party processors do not. Templestein said built-in processors can stop some events before append, which is why the pause mechanism could not be implemented purely by an unprivileged third party.
He stressed that this cheapness of reducers changes composition. If another agent harness, processor, or plugin publishes a reducer for its event types, a different processor can import that reducer and run it. Because it is synchronous and side-effect-free, using it is “practically free,” in his words. That lets abstractions build on other abstractions without requiring every component to share a process, language, or deployment environment.
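A sketch of what that composition could look like, with a hypothetical package and reducer export standing in for the published abstraction:
// Sketch of reducer composition: reuse a published, side-effect-free reducer
// from another harness to derive part of your own state.
// The package name and exports below are hypothetical.
import { reduceMessages, type MessagesState } from "some-other-harness/reducers";

type MyState = { messages: MessagesState; toolCallInFlight: boolean };

function reduce(event: { type: string; payload?: any }, state: MyState): MyState {
  return {
    // Running someone else's reducer is "practically free": synchronous, no side effects.
    messages: reduceMessages(event, state.messages),
    toolCallInFlight:
      event.type === "tool-call-started" ? true
      : event.type === "tool-call-finished" ? false
      : state.toolCallInFlight,
  };
}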
Dynamic workers turn deployment into another event
The dynamic-worker mechanism made the deployment claim concrete. In the Iterate UI, Templestein appended an event of type https://events.iterate.com/events/stream/dynamic-worker/configured. Its payload contained a slug, ping-pong, and a JavaScript string exporting a processor. In other words, the processor was not deployed through a separate control plane and then connected to the stream; the stream received an event that configured the processor.
The on-screen payload showed the processor embedded directly inside the event:
export default {
  slug: "ping-pong",
  initialState: {},
  reducer(state) {
    return state;
  },
  async afterAppend(append, event) {
    if (event.type === "https://events.iterate.com/events/manual-event-appended") {
      if (event.payload.message === "ping") {
        await append({
          type: "https://events.iterate.com/events/manual-event-appended",
          payload: {
            message: "pong"
          },
          metadata: event.metadata ?? null,
        });
      }
    }
    return;
  }
}
The detail matters: this particular dynamic worker watched for manual-event-appended events whose payload message was ping, then appended another manual-event-appended event whose message was pong. That was separate from the broader agent-input pattern Templestein had described earlier, where a simple LLM agent might react to agent-input-added. The demo used a manual event type for the ping-pong proof; the architecture treats both as ordinary event types that a processor can reduce over or react to.
Once that configuration event was appended to a stream, the stream responded to ping with pong. The UI showed the added input followed by the derived pong event.
That was the deployment argument in its most compact form. A stream knew nothing about ping-pong behavior. A single event containing JavaScript configured a processor. The processor then ran against subsequent events and appended its own derived events. Templestein said the implementation used Cloudflare dynamic workers: the code is evaluated and run in a small dynamic worker in the backend.
He described this as the most exciting part of the system, though it requires several prior concepts before the payoff is legible. His broader claim was that a processor file exporting a defined processor could be bundled into an event. A basic AI agent — perhaps 40 lines of code — could then be appended to any stream, and that stream would become an AI agent.
You can write like the 40 lines of code required for a basic AI agent and then you can append that to any stream and then that stream becomes an AI agent.
Secrets were the immediate complication. Templestein did not want an OpenAI API key embedded in the event stream. The version he had running used environment variables outside the stream, with a mechanism that substitutes the secret into a fetch request header. The source code or bundled processor can live as an event; secrets need a separate handling path.
The dynamic worker also raised versioning and overwriting questions. The event type is “configured” rather than “created” because processors have slugs and can be overwritten. Templestein said an AI agent could, in principle, alter its own functionality by appending a new dynamic-worker configuration event with different JavaScript.
One audience member asked what happens if the pushed code has an error. Templestein’s answer followed the system’s rule: the only thing that can happen is an event. An error should produce an error event. Another speaker added that since the demo evaluated JavaScript, there was no TypeScript type-checking step as implemented, but a TypeScript compilation step could itself be added: compiler errors could be emitted as events, an LLM could attempt to fix them, and a new event could carry the fixed code.
The bundling problem was similarly treated as another processor opportunity. If dynamic workers cannot import arbitrary NPM packages unless bundled into the script string, Templestein speculated that another event type could carry an unbundled script plus package.json. A processor could observe that, bundle it, and append the “fat” bundled event. In his framing, that could become a way to hack on an agent harness without separately installing dependencies, deploying servers, or managing a conventional runtime.
Distribution is the feature and the hazard
Templestein wanted the harness to be distributed in a stronger sense than “the server is remote.” A processor could run on one computer, another plugin on another server, another in a dynamic worker, one written in TypeScript, another in Rust. If they all consume and append to the same stream, they can participate in the same agent system.
The hazard is also obvious: race conditions and loops. Two plugins can respond to one another forever, appending an endless stream of events. Templestein treated this as a double-edged sword but not as a new problem. In his view, most agent harnesses already face these issues; making the system explicitly stream-based at least lets the infrastructure preempt them with mechanisms like pause events and circuit breakers.
This distributed design also informed his skepticism of “before hooks.” Built-in processors can stop certain events before append — pausing requires that kind of pre-append enforcement — but Templestein was broadly against third-party before hooks in agent loops. He pointed to risks such as performance regressions, cost increases, and breaking context caching. His preference was eventual consistency: processors can inject context or warnings, but the core loop should not become hostage to every hook.
The concrete alternative he proposed was a bounded wait. Before making a new LLM request, an agent might wait up to 200 milliseconds for safety checkers or context injectors to append relevant events. If they arrive in time, the agent uses them. If they do not, the agent proceeds anyway. The result is resilient composition rather than synchronous blocking.
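A sketch of that bounded wait, assuming a hypothetical onceEvent helper that resolves when a matching event is appended to the stream:
// Sketch of the bounded wait described above. onceEvent and the event type
// strings are hypothetical; only the 200 ms budget comes from the talk.
declare function onceEvent(type: string): Promise<object>;

function withTimeout<T>(p: Promise<T>, ms: number): Promise<T | null> {
  return Promise.race([
    p,
    new Promise<null>((resolve) => setTimeout(() => resolve(null), ms)),
  ]);
}

// Before the next LLM request, wait at most 200 ms for just-in-time events,
// then proceed regardless of whether they arrived.
const [safety, context] = await Promise.all([
  withTimeout(onceEvent("safety-warning"), 200),
  withTimeout(onceEvent("context-injected"), 200),
]);
// If either is null, the agent carries on; otherwise it folds them into the request.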
That model applies to prompt-injection protection, RAG, and other just-in-time services. A Notion indexer could watch the stream, decide whether it has relevant context, and try to append it before the model call. A safety checker could observe that an agent is about to act and append a “desist” or warning event. The safety checker’s implementation, language, deployment, and business model need not be embedded inside the agent harness. It only needs to participate in the stream.
Templestein contrasted this with current plugin shapes such as MCP tools or CLI integrations. If someone wanted to build a Claude Code plugin business, he argued, they would have to get Claude Code to proactively call the right MCP tool, or install and authenticate a CLI, or otherwise fit into a narrower extension point. In the stream model, the plugin can plug into the agent’s event stream directly: somebody notices Claude is about to do something and appends more events.
An audience member summarized the implication as just-in-time services offered by other agents or processors, potentially chargeable. Templestein agreed, with the caveat that humans would probably be charged first. The larger point was not billing mechanics, but that a distributed stream gives outside processors a way to intervene without being inside the same process or request path.
The primitive sits between queue, pub-sub, stream database, and code runner
When asked whether everyone would host their own Iterate instance or use a shared one, Templestein did not prescribe a single deployment model. He called the core service a streaming database and said the approach could work on almost anything “durable stream shaped.” The service he showed was built for the exercise.
He described the primitive as a combination of a queue, a pub-sub system, a streaming database, and something that can run code. Durable streams, in his shorthand, are append-only event logs with offset tracking. Some systems expect the client to track the last consumed offset, as in Kafka-like models. Other pub-sub systems, including cloud offerings, track offsets server-side and deliver the next event. The Iterate service was close to the Durable Streams API specification he referenced, though not exactly compliant.
The additional piece he found interesting was push subscription out of the stream. A stream can be consumed by a live HTTP client, but it can also notify processors or external services when events arrive. That is what makes distributed processors practical: the stream can wake up code elsewhere and say there is work to do.
Authentication and abuse control remained outside the demo but not outside the design discussion. An audience member asked about exposing a stream-backed image generator, where anyone could pass a stream to a local LLM that generates images. Templestein’s answer was direct: it would need authentication. The public, unauthenticated namespace was suitable for a disposable playground cleared aggressively, but not for metered or resource-intensive services.
He sketched, rather than fully specified, an authorization model that could still be expressed through events. Each event might carry client provenance: who submitted it, what they were allowed to do, and how confident the service was that the client was entitled to perform the operation. A stream might become public because an event made it public. Until then, perhaps only the creator could write to it. Templestein described the details as “gnarly,” while treating the general problem as solvable.