Agent Swarms Need a Coordination Layer, Not Another Runtime

Lou Bichard · AI Engineer · Saturday, May 23, 2026 · 12 min read

Lou Bichard of Ona argues that companies building fleets of background coding agents are repeatedly recreating the same missing infrastructure. In his account, runtimes, orchestration and triggers are increasingly solved; the unresolved primitive is coordination — the layer that lets agents track state, hand off work, enforce gates and know when they can move through the software development lifecycle. GitHub, Linear and CI can expose artifacts and signals, Bichard says, but they are not agent-native coordination systems; he suggests the missing layer may need to take the form of a CLI gateway that local and remote agents can call.

The missing layer is not where most teams are looking

Lou Bichard frames the current agent-infrastructure problem around a goal he calls the “software factory”: not simply one engineer running many coding agents in parallel, but an incremental movement of the human from “in the loop” to “on the loop” across the software development lifecycle. In that version of the system, work can progress proactively while the human is not at the computer.

Much of the infrastructure for this is either already available or close to settled. Agents need somewhere to run. They need orchestration. They need triggers. But once teams start running fleets or swarms of agents, the gap appears elsewhere: coordination.

Coordination, in Bichard’s definition, is the layer that governs what work should happen, tracks progress, enforces gates, manages state, and lets agents interact with each other. It is the missing primitive beneath practical agent swarms. Agents need to pick up tasks from each other, pass messages, collaborate, and determine whether they have satisfied the requirements of a given stage before moving forward. Existing human tools do not provide that cleanly.

“GitHub is not a coordination layer for agents.” (Lou Bichard)

GitHub can host the artifacts agents produce: pull requests, reviews, CI failures, merge conflicts, labels. But it becomes overwhelming as a coordination surface. A swarm of agents can raise pull requests, attempt fixes, respond to failures, and continue iterating, but the resulting GitHub activity is too noisy for a human trying to decide where to intervene. Linear, Bichard said, has a similar problem as used in Symphony: it can serve as a substrate, but it is still a human workflow tool being repurposed in awkward ways for agents.

That distinction matters because the blocker is not presented as the absence of one more model capability. The practical bottleneck in building software factories is infrastructure: runtime, orchestration, triggers, and especially coordination.

Primitive	Role	Bichard’s status
Agent runtimes	Where agents execute: environments, filesystems, and tools	Mostly solved
Orchestration	Kick off agents and manage workflow runs	Effectively solved
Triggers	Start sessions from PR events, cron, webhooks, or manual invocation	Solved
Coordination	Govern work, track progress, enforce gates, and manage state	Missing layer

Bichard’s four infrastructure primitives for a software factory

Software factories require more than parallel coding agents

Bichard distinguishes between several patterns for running coding agents at scale. A swarm starts with one intent, fans it out to multiple agents, and funnels their work back into a single result such as a pull request or task. Fleets are different: they fan agents out across repositories, often to perform independent work at organizational scale. Event-driven agents come online in response to pull requests, CI failures, webhooks, monitoring alerts, or similar events. Scheduled agents run on recurring jobs such as nightly updates, weekly audits, or daily triage.
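The difference between a swarm and a fleet can be sketched in a few lines. This is a minimal illustration, not Ona's implementation: `run_agent`, `run_swarm`, and `run_fleet` are hypothetical names, and the "agent" is a stub that returns a string instead of doing real work.

```python
from concurrent.futures import ThreadPoolExecutor


def run_agent(subtask: str) -> str:
    # Hypothetical stand-in for a real coding agent session.
    return f"patch for {subtask}"


def run_swarm(intent: str, subtasks: list[str]) -> dict:
    """Swarm: fan one intent out to several agents, then funnel into one result."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        patches = list(pool.map(run_agent, subtasks))  # map preserves input order
    return {"intent": intent, "patches": patches}


def run_fleet(task: str, repos: list[str]) -> dict[str, str]:
    """Fleet: fan the same task across repositories; results stay independent."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = pool.map(lambda repo: run_agent(f"{task} in {repo}"), repos)
        return dict(zip(repos, results))
```

The swarm returns a single combined result (for example, one pull request), while the fleet returns one result per repository with nothing merged.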

Those patterns are useful, but in Bichard’s definition they are not themselves a software factory. A factory is a more ambitious control system for moving work across the SDLC with decreasing human initiation. Humans still supervise, approve, and set direction, but they are not manually driving each individual step.

Several public examples show large companies building versions of this infrastructure internally. Stripe’s Minions system appeared in Bichard’s material as “one-shot, end-to-end coding agents,” producing roughly 1,300 pull requests per week, with human-reviewed but “zero human-written code.” The slide emphasized isolated devboxes per run, fast spin-up, the same tools and CI as humans, agentic loops combined with deterministic steps such as linters, tests, and git, and more than 400 MCP tools.

Ramp’s Inspect system was another example. The slide described it as a background agent with full development context, responsible for roughly 30% of merged PRs. It runs in sandboxed VMs on Modal, with full-stack sessions including Vite, Postgres, and Temporal, and is wired into Sentry, Datadog, LaunchDarkly, Buildkite, GitHub, and Slack.

The point was not that Stripe, Ramp, and Ona have built identical systems. It was that large companies are independently building internal infrastructure for background agents. Stripe called theirs Minions. Ramp called theirs Inspect. Ona has been building its own platform for background agents on top of development environments. The repetition matters because it shows the same class of infrastructure emerging in multiple places, even as one primitive — coordination — remains unresolved.

Harness engineering turns repository artifacts into feedback for agents

Lou Bichard tied the software-factory idea to what OpenAI has called harness engineering. He credited OpenAI and Ryan for a blog post that captured the mindset: take knowledge that agents otherwise cannot see and encode it into the repository, context files, and AGENTS.md so that agents can move through development work more reliably.

For Bichard, harness engineering is an extension of context engineering. It includes the repository artifacts that tell an agent what the system is and how work should be done: reusable prompt workflows, architecture decision records, runbooks, API contracts, setup instructions, changelogs, on-call playbooks, and ownership files. It also includes the mechanisms that give the agent executable feedback: linters, formatters, type checks, pre-commit hooks, branch protection, required CI checks, scanners, tests, coverage thresholds, deterministic fixtures, preview environments, logs, traces, feature flags, migrations, and reproducible builds.

The important point is not the size of the checklist. It is that these artifacts become the operating environment for agents. Documentation and context files tell the model what it cannot infer. Tests, linters, scanners, and CI checks tell it when it is wrong. Environment conventions let it reproduce what a human engineer would run. The more of that knowledge is encoded in durable, inspectable form, the less an agent must rely on a one-off prompt or a human nudge.

The loop Bichard described is iterative. Let the agent attempt work. Watch where it gets lost. Encode that missing knowledge back into the repository or context. Then repeat, with the goal of making the agent flow farther through the software factory without manual correction.

This is also where he locates a major source of difficulty. Context is the hard part of multi-agent work. Long sessions degrade agent performance; he called this “context rot,” echoing an earlier speaker. Agents lose track as context windows fill. They skip steps, end early, and need nudging. They can also optimize for approval rather than correctness, becoming sycophantic in the way they report completion or satisfy a prompt.

The practical response is not simply to stuff more into context. Bichard points to decomposing work into smaller tasks, using antagonistic agents, spawning sub-agents, and adding explicit gates. The system has to prevent agents from treating a high-level SDLC label as if it were a complete operational plan.

The SDLC becomes fractal once agents must execute it

A human can look at “plan, develop, test, review, ship” and infer a large number of implicit steps. Agents do not reliably do that. The apparent five-step SDLC hides a much finer-grained process, and each transition may require an explicit gate.

Lou Bichard used the planning stage as the example. “Plan” might contain at least three micro-steps: break down the ticket, identify dependencies, and estimate scope. Each has a corresponding gate: do acceptance criteria exist, are dependencies available, and does the work fit in one pull request? A human may perform these checks implicitly. An agent needs the gate spelled out.
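The planning example can be written down as data. The sketch below pairs each micro-step with its gate; the `Ticket` fields and the predicate logic are assumptions made for illustration, not something Bichard specified.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Ticket:
    # Hypothetical ticket fields, chosen only to make the gates checkable.
    acceptance_criteria: list[str]
    unavailable_dependencies: list[str]
    estimated_prs: int


@dataclass(frozen=True)
class MicroStep:
    name: str
    gate: str
    check: Callable[[Ticket], bool]


# The three planning micro-steps and gates named in the talk.
PLAN_STAGE = [
    MicroStep("break down the ticket", "acceptance criteria exist",
              lambda t: len(t.acceptance_criteria) > 0),
    MicroStep("identify dependencies", "dependencies are available",
              lambda t: not t.unavailable_dependencies),
    MicroStep("estimate scope", "work fits in one pull request",
              lambda t: t.estimated_prs <= 1),
]


def failed_gates(ticket: Ticket) -> list[str]:
    """Return the gates this ticket has not yet satisfied, in stage order."""
    return [step.gate for step in PLAN_STAGE if not step.check(ticket)]
```

An empty list from `failed_gates` means the agent may leave the planning stage; anything else tells it exactly which implicit human check it skipped.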

The same applies through testing, review, and shipping. If a team wants agents to traverse the SDLC, it must decompose those broad phases into micro-steps and define the conditions for moving from one to the next. Otherwise, agents can skip tests, declare completion prematurely, or proceed without satisfying organizational policy.

This is why Bichard emphasizes coordination over CI alone. CI can be part of the feedback system, but it is not sufficient as the answer to agent coordination. The missing layer is about modeling the lifecycle of agent work as explicit states with ownership, transitions, retries, approvals, and durable state. A local coding agent needs some way to know whether it has completed the “plan” phase, whether it is allowed to proceed, and which agent or workflow owns the next action.

The coordination layer he imagines would model agent work more like state machines than labels on pull requests. It would include durable execution patterns for long-running workflows that survive restarts, with retry, timeout, and checkpointing. It would build in gates: human approval points, quality checks, and policy enforcement. And it would be packaged in a way agents can call from wherever they run.
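A rough sketch of that idea, under stated assumptions: the stage order, the JSON checkpoint file, and the `Workflow` class are all hypothetical, and timeouts are left out for brevity. It shows state that survives a restart, bounded retries, and escalation to a human when a gate keeps failing.

```python
import json
from pathlib import Path

# Hypothetical stage order; a real organization would define its own.
TRANSITIONS = {"plan": "develop", "develop": "test", "test": "review", "review": "ship"}


class Workflow:
    def __init__(self, checkpoint: Path):
        self.checkpoint = checkpoint
        if checkpoint.exists():
            # Resume from the last durable checkpoint after a restart.
            self.state = json.loads(checkpoint.read_text())
        else:
            self.state = {"stage": "plan", "attempts": 0}

    def _save(self) -> None:
        self.checkpoint.write_text(json.dumps(self.state))

    def advance(self, gate_passed: bool, max_attempts: int = 3) -> str:
        stage = self.state["stage"]
        if stage not in TRANSITIONS:
            return "done"  # final stage reached, nothing left to transition to
        if gate_passed:
            self.state = {"stage": TRANSITIONS[stage], "attempts": 0}
            result = "advanced"
        else:
            self.state["attempts"] += 1
            # Retry a bounded number of times, then hand off to a human.
            result = "retry" if self.state["attempts"] < max_attempts else "needs_human"
        self._save()
        return result
```

Because every call persists state before returning, a second `Workflow` opened on the same file picks up at the same stage and attempt count.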

Runtime is increasingly solved, but Bichard prefers full development environments

Lou Bichard separates the runtime question from the coordination question. A runtime is where the agent executes: the environment, filesystem, tools, and isolation boundary. In his view, this part of the stack is mostly solved, even if teams choose different approaches.

He laid out several runtime patterns. Agents can run in separate threads, with no real isolation and shared memory. They can use git worktrees, which provide git-level isolation while still depending on host tools and shared secrets. Containers and microVMs provide process and filesystem isolation, with lifecycle managed by a container runtime and secrets often requiring manual injection. Ona’s preferred abstraction is the cloud development environment: a full VM-level environment with the same development tooling a human engineer would use, pre-agent provisioning of secrets, IDE access, and platform-managed lifecycle.

Bichard was explicit that Ona believes “proper development tasks” should run inside virtual machines. His reasons were security and performance isolation. A container, he said, is not a bulletproof isolation boundary, so securing an agent that runs inside one is difficult. Containers run as Kubernetes pods can also be bursty, with noisy-neighbor problems and compute contention across workloads. Full VM isolation, in his view, is the correct primitive for serious development tasks.

That position is also part of Ona’s product architecture. Bichard described Ona as a platform that gives each agent a full cloud development environment, the same workspace a human engineer would use. Agents can run in isolated environments per task, work in parallel in the cloud, and be triggered by pull-request events, schedules, webhooks, or manual invocation. Ona builds on the dev-container standard so humans and agents can share configuration.

Ona’s swarm demo made the abstraction operational

Lou Bichard used an Ona demo to make the swarm abstractions concrete. In the interface, he showed two tasks. One asked Ona to implement Symphony from a detailed repository spec. That task used process-based sub-agents: a parent agent ran inside one VM and spawned sub-agents as processes within that same environment.

The second task asked Ona to run a fleet with multiple VMs. In that version, the agent could create other VMs inside the platform. Scaling was described as theoretically limited by what the user is willing to pay and what the cloud provider can support. The parent agent controls the task, while sub-agents receive smaller pieces of context and individual tasks, then pass messages back to the parent agent.
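The parent and sub-agent relationship described here can be sketched with threads and a queue. This is only an illustration of the message-passing shape: `sub_agent` and `parent` are hypothetical names, threads stand in for processes or VMs, and each sub-agent simply reports how much context it was given.

```python
import queue
import threading


def sub_agent(task: str, context: list[str], inbox: queue.Queue) -> None:
    # Hypothetical worker: it sees only its own slice of context,
    # then reports back to the parent through the shared inbox.
    inbox.put({"task": task, "status": "done", "context_size": len(context)})


def parent(tasks: dict[str, list[str]]) -> list[dict]:
    """Hand each sub-agent one task plus a context slice, then collect messages."""
    inbox: queue.Queue = queue.Queue()
    workers = [
        threading.Thread(target=sub_agent, args=(task, context, inbox))
        for task, context in tasks.items()
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    messages = [inbox.get() for _ in workers]
    # Sort so the result does not depend on which sub-agent finished first.
    return sorted(messages, key=lambda m: m["task"])
```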

The demo also exposed a UX problem. As tasks become more complicated and agents spawn more sub-agents, the interface has to help humans understand what is running, what each agent owns, and where intervention is needed. In Ona’s UI, VM-level sub-agents appeared as separate environments that came online, progressed through work, and were terminated when complete. Process-level sub-agents appeared inside a single agent window, with each sub-agent opening into its own chat context.

The operational questions under the diagram are the real point: where each agent runs, how isolated it is, how context is divided, how message passing works, what happens when work completes, and how the human sees the state of the system.

The Bank of Ona showed today’s agents can build, but not without new control surfaces

Lou Bichard said he has tried to build a software factory himself. The example he presented was “The Bank of Ona,” a full banking application built with Java 21, Spring Boot, PostgreSQL, and React. His slide described it as built entirely by autonomous AI agents, with 575 pull requests and zero human-written application code.

575 pull requests in the Bank of Ona autonomous-agent case study

The architecture of that experiment followed the software-factory loop he had been describing. A product agent wrote issues from a roadmap. An implementation agent picked them up and delivered pull requests with tests. A code-review agent reviewed every pull request, fixed CI failures, resolved merge conflicts, and labeled work for human approval. A triage agent ran hourly, checked project state, and dispatched the appropriate agent. QA and security agents closed the loop.
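The triage agent's role amounts to a dispatch function over project state. The sketch below is an assumption about how such a dispatcher might look: the state keys, the agent names, and especially the priority order are hypothetical, not taken from the Bank of Ona setup.

```python
def triage(project: dict) -> str:
    """Pick which agent to dispatch next, given a snapshot of project state."""
    # Assumed priority: unblock existing pull requests before starting new work.
    if project.get("failing_ci_prs") or project.get("unreviewed_prs"):
        return "code-review"
    if project.get("open_issues"):
        return "implementation"
    if project.get("roadmap_items"):
        return "product"
    return "idle"
```

Run on a schedule (hourly, in the case study), a function like this is what turns a set of independent agents into a loop.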

The conclusion was qualified. The technology exists today to automate a surprising amount of the process. But the hard parts appear in the handoffs, gates, and state management. Agents can produce code and raise pull requests. What is harder is making them reliably follow the micro-steps of the SDLC, maintain enough context, know when they have completed a stage, and coordinate with other agents without burying humans in tool noise.

This is where the argument returns to primitives. Runtime alone does not create a factory. Orchestration alone does not create a factory. Event triggers alone do not create a factory. A factory needs a coordination layer that can encode the organization’s SDLC as executable, inspectable workflow state.

The coordination primitive may look like a CLI gateway

In the question period, an audience member asked Lou Bichard to make the coordination-layer proposal more concrete. His answer centered on form factor: one plausible solution is a CLI.

He acknowledged other approaches, including graph-based workflow systems where prompts and transitions are defined visually or declaratively, similar in spirit to Mermaid-style diagrams or n8n-style workflows. But he argued that the coordination layer ultimately needs to be packaged in a way local coding agents can invoke.

The key interaction he described is simple: a locally running agent, such as Claude Code or another CLI-based coding agent, should be able to call a tool and ask, in effect, “Have I achieved this part of my SDLC, and can I now proceed to the next part?” That CLI would act as a gateway to workflow state, gates, and policy.

That form factor matters because the coordination problem spans local and remote execution. The proposed layer is not merely another CI check after the fact. It is a callable interface during the agent’s work: a way for the agent to consult the state machine, learn whether a gate has been satisfied, and determine the next valid transition. Packaged as a CLI, it would be composable, scriptable, and version-controlled, and could run in a local development environment as well as in CI or another remote execution context.
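To make the form factor concrete, here is a minimal sketch of what such a gateway could look like. Everything in it is hypothetical: the `sdlc-gate` program name, the `can-proceed` subcommand, the gate names, and the in-memory `STATE` (a real tool would read durable workflow state from a shared service or file). Bichard's own prototype and spec are not public in this account, so this is not a reconstruction of them.

```python
import argparse
import json

# Hypothetical stage order and workflow state, hard-coded for illustration.
NEXT_STAGE = {"plan": "develop", "develop": "test", "test": "review", "review": "ship"}
STATE = {
    "stage": "plan",
    "gates": {
        "acceptance_criteria": True,
        "dependencies_available": True,
        "fits_one_pr": False,
    },
}


def can_proceed(state: dict) -> dict:
    """Answer: has the current stage been achieved, and what comes next?"""
    unmet = sorted(name for name, passed in state["gates"].items() if not passed)
    return {
        "stage": state["stage"],
        "can_proceed": not unmet,
        "unmet_gates": unmet,
        "next_stage": None if unmet else NEXT_STAGE.get(state["stage"]),
    }


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(prog="sdlc-gate")
    parser.add_argument("command", choices=["can-proceed"])
    parser.parse_args(argv)
    answer = can_proceed(STATE)
    print(json.dumps(answer))  # machine-readable, so an agent can parse it
    return 0 if answer["can_proceed"] else 1  # exit code doubles as the gate result
```

Calling `main(["can-proceed"])` prints a JSON answer and returns a nonzero code while any gate is unmet. Wired to a script entry point, the same call works identically from a local agent's shell or a remote CI job, which is the property the CLI form factor is meant to provide.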

Bichard said Ona has internal prototypes and that he has a full spec. He is still thinking through whether to release an implementation or a standard. His preference, as he described it, is less about owning the implementation and more about establishing a standard that others can collaborate on.

An audience member asked whether the prototype uses an existing protocol such as ACP or A2A. Bichard said it currently does not. It is not built on ACP yet, though it might be in the future, and it is not built on A2A. He characterized the problem as a slightly different space. He also mentioned several nascent or interim efforts, including ACPX from the OpenDevin ecosystem and a GitHub-related project called Fabbro, but described the overall area as very early.

The shape of the proposal is therefore not a finished standard. It is a diagnosis plus a likely interface: state machines, durable execution, gates, and policy enforcement, exposed through a composable, scriptable, version-controlled CLI that both local and remote agents can call.
