AI-Generated PR Firehoses Are Turning Agent Work Into Infrastructure

Onur Solmaz · AI Engineer · Thursday, May 21, 2026 · 11 min read

OpenClaw maintainer Onur Solmaz argues that high-volume AI-generated pull requests are less a code-review problem than an operations problem. In his talk, he presents acpx, a headless CLI for the Agent Client Protocol, as a way to replace terminal scraping with structured agent workflows that can reproduce bugs, judge implementations, run review loops and emit machine-readable results. He extends the same model to Spritz, a Kubernetes operator for disposable per-task agent pods, making the case for interoperable, isolated agent infrastructure rather than one shared bot or ad hoc maintainer intervention.

OpenClaw’s pull-request volume turned agent work into an operations problem

Onur Solmaz described OpenClaw’s current bottleneck as a firehose of user intent: more than 60,000 total pull requests, with roughly 300 to 500 opened per day on average. In his framing, the project’s problem is no longer simply whether an individual patch is good. It is how to absorb the needs of “tens of thousands of stakeholders” without letting the codebase become an accumulation of localized AI-generated fixes.

300–500: OpenClaw pull requests opened per day on average, according to Solmaz

Solmaz said many incoming PRs begin with a user running into a problem, asking an agent to fix it, and sending the resulting patch upstream. According to the talk description, most arrive AI-generated and most are not mergeable. Solmaz called much of the output "slop," but said it is not worthless. A bad PR can still be a useful data point: it may identify a broken workflow, a confusing API, or a place where users repeatedly get stuck. The maintainer's job is to separate the signal from the proposed implementation.

That distinction drives much of his work. The question is not merely “can an agent write code?” It is whether the repeated mechanical parts of maintainer work can be turned into agent-operable procedures: identify what a PR is trying to do, determine whether the problem is real, reproduce it, judge whether the implementation is the right abstraction or merely a local patch, resolve conflicts, run CI, handle review comments, and produce a structured outcome.

Solmaz pointed to Peter Steinberger's PR review workflow as the pattern he wanted to mechanize. The workflow asks, in effect: is the issue clear; is this the best possible fix; if not, should the PR be rewritten, discussed, or discarded? The Steinberger example shown in the talk said that "95% of the time" the answer to "is this the best possible fix?" is no, because many contributors submit fixes that are too localized and would make the project unmaintainable.

Solmaz’s claim was that this kind of triage is “so mechanical” that, once repeated often enough, it becomes a candidate for automation. The result is not an agent with total authority to design the project. It is an attempt to automate the lower-level judgment and cleanup steps before a maintainer sees the work.

ACP is the interface Solmaz chose to stop scraping terminals

Solmaz's tooling is built around ACP, the Agent Client Protocol. He distinguished it from MCP by function: MCP gives tools to the model, while ACP standardizes the interaction between an agent and a client. In his explanation, without such a standard, Codex in VS Code, Claude Code, Zed, and other coding-agent surfaces each require their own separate integration. ACP exists to avoid that duplication by providing one interface for agent-client communication.
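ACP is built on JSON-RPC 2.0, so "one interface" in practice means a client frames the same kind of request no matter which agent sits on the other end. The sketch below shows that framing, assuming newline-delimited JSON over the agent's stdin/stdout. The `session/prompt` method name and its `sessionId`/`prompt` fields reflect the published ACP specification as best understood here and should be checked against it; `JsonRpcClient`, `parse_message`, and `sess-1` are hypothetical names for illustration, not acpx code.

```python
import json


class JsonRpcClient:
    """Frames JSON-RPC 2.0 requests, assuming newline-delimited JSON over stdio."""

    def __init__(self) -> None:
        self._next_id = 1

    def request(self, method: str, params: dict) -> str:
        msg = {"jsonrpc": "2.0", "id": self._next_id, "method": method, "params": params}
        self._next_id += 1
        # One JSON object per line, so the receiver can split the stream on newlines.
        return json.dumps(msg) + "\n"


def parse_message(line: str) -> dict:
    """Decode one line from the agent and reject anything that is not JSON-RPC 2.0."""
    msg = json.loads(line)
    if not isinstance(msg, dict) or msg.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    return msg


client = JsonRpcClient()
# Assumed ACP method and fields; the same bytes would go to Codex, Claude Code, or any ACP agent.
prompt_line = client.request(
    "session/prompt",
    {"sessionId": "sess-1", "prompt": [{"type": "text", "text": "Summarize this PR"}]},
)
```

The design point is that nothing in the request depends on which agent receives it; only the transport (which process's stdin to write to) changes.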

He also placed ACP among competing and adjacent protocol efforts. Agent-to-agent protocols are for agents talking to agents; ACP is for a human-facing client talking to an agent, though Solmaz noted that agents can also use that human-facing interface to communicate with other agents. He said OpenClaw may eventually support multiple protocols as adoption patterns become clearer.

The immediate reason for choosing ACP was practical rather than ideological. When Solmaz needed adapters for both Codex and Claude Code, he said Zed had already built them. That made ACP the available path.

From there he built acpx, a command-line client for ACP. The purpose was simple: “let an agent call any other agent over the command line,” as Solmaz put it. He described the tool as “slowly turning into a Swiss army knife for ACP.” The slide for acpx called it a headless CLI client for ACP, intended for AI agents and orchestrators that need to talk to coding agents through a structured protocol rather than by reading characters from a PTY.

That point matters because Solmaz’s workflows depend on structured outputs. If a review step, reproduction step, or CI-fix step is going to feed the next node in a workflow graph, the agent’s result has to be more than terminal text. It needs to be machine-usable: a decision, a classification, a JSON payload, a route to the next step.
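What "machine-usable" means can be shown with a small parser that accepts an agent's final message only if it is JSON carrying a decision the next node knows how to route. This is a minimal sketch: the decision labels, the `StepResult` type, and `parse_step_result` are hypothetical and do not describe acpx's actual output schema.

```python
import json
from dataclasses import dataclass

# Hypothetical route labels a review step may emit; not acpx's real schema.
ALLOWED_DECISIONS = {"good_enough", "bad_localized_or_unclear", "escalate"}


@dataclass(frozen=True)
class StepResult:
    decision: str
    summary: str


def parse_step_result(raw: str) -> StepResult:
    """Turn an agent's final message into a routable result, or fail loudly."""
    # Free-form terminal prose raises json.JSONDecodeError, a subclass of ValueError.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    decision = data.get("decision")
    if decision not in ALLOWED_DECISIONS:
        raise ValueError(f"unroutable decision: {decision!r}")
    return StepResult(decision=decision, summary=str(data.get("summary", "")))
```

Failing on anything unroutable, rather than guessing from text, is what lets the next step run unattended.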

The first use case was turning chat into a coding harness

Solmaz’s path into this system began with chat-driven development. He said he had been building coding harnesses since before ChatGPT, including a JupyterLab extension over an early Codex-era model, and later became an OpenClaw maintainer after using the project from its early “clawdbot” days in Discord. His initial OpenClaw contribution was an MS Teams integration, motivated by enterprise adoption.

The practical workflow he showed was a developer experience where chat channels become agent workspaces. Early on, he used Opus as an intermediary to instruct Codex, which he compared to playing the telephone game. He would ask Opus to tell Codex to do something, then inspect the Codex session to see how the instruction had been paraphrased. It worked, but he regarded it as fragile because prompt wording matters.

His current setup is more direct: Discord channels bound to Codex through ACP. The example slide showed channels named codex-1 through codex-5, plus Claude and other bot channels. Solmaz said he often has one to five agents working in parallel, each tied to a channel and a task. He described it as “running a full IDE on Discord.”

One example involved asking an agent, before flying to London, to create a PDF from ACP documentation and put it in /tmp because the Codex process did not know how to send it back through the Discord harness. He then had to use another channel to retrieve it. For him, the advantage was mobility and parallelism: he described himself as “addicted to side projects,” and said acpx itself was built through this kind of Discord-driven workflow.

The workflow graph turns maintainer habits into agent SOPs

The core abstraction Solmaz added to acpx is a workflow engine: graph workflows that drive a coding-agent harness. He described them as “standard operating procedures for agents,” then translated the phrase plainly: workflows.

In the OpenClaw PR triage case, the graph begins with an incoming item and routes it through a set of questions and actions. What is this PR trying to do? Is it trying to solve the bug in the right way, or is it just a local solution? Can the bug be reproduced? Is the implementation good enough? Are there conflicts with the current base? Does CI pass? Does review feedback identify problems that need to be addressed?

Stage	Question or action	Visible workflow routes
Read item	Start from the incoming PR or item.	Move to finding intent.
Find intent	Identify what the PR is trying to do.	Proceed to judging the implementation or solution.
Judge implementation	Ask whether it solves the bug in the right way or is only a local solution.	Route as “Bad, localized, or unclear” or “Good enough.”
Bad, localized, or unclear	Treat the implementation as unsuitable for landing.	Comment and close PR.
Good enough	Continue processing the PR.	Check conflicts against the current base.
Conflicts	Check conflicts against the current base.	Route as clean, straightforward, or ambiguous.
Review loop	Trigger Codex review, address review feedback, and check CI.	Handle P0/P1 findings, related CI failures, and minor or unrelated failures.

Solmaz’s PR-triage workflow turns repeated maintainer checks into agent-operable steps.
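The table's routing can be expressed as a small graph: each node returns a route label, and an edge table maps (node, label) to the next node. The sketch below is a hypothetical Python illustration of that idea (acpx workflows are defined in TypeScript, and their real API is not reproduced here). The node bodies are placeholder heuristics standing in for agent calls, and sending "ambiguous" conflicts to a human is an assumption consistent with the escalation boundary described below, not a detail from the slide.

```python
from typing import Callable, Dict, List, Tuple

# Each node reads shared state and returns a route label.
Node = Callable[[dict], str]


def find_intent(state: dict) -> str:
    state["intent"] = state["pr"].get("title", "")  # stand-in for an agent summarizing intent
    return "next"


def judge_implementation(state: dict) -> str:
    # Placeholder heuristic standing in for an agent's judgment call.
    if state["pr"].get("fixes_root_cause"):
        return "good_enough"
    return "bad_localized_or_unclear"


def check_conflicts(state: dict) -> str:
    return state["pr"].get("conflicts", "clean")  # "clean" | "straightforward" | "ambiguous"


NODES: Dict[str, Node] = {
    "find_intent": find_intent,
    "judge_implementation": judge_implementation,
    "check_conflicts": check_conflicts,
}

# (node, route) -> next node; targets without a NODES entry are terminal outcomes.
EDGES: Dict[Tuple[str, str], str] = {
    ("find_intent", "next"): "judge_implementation",
    ("judge_implementation", "bad_localized_or_unclear"): "comment_and_close",
    ("judge_implementation", "good_enough"): "check_conflicts",
    ("check_conflicts", "clean"): "review_loop",
    ("check_conflicts", "straightforward"): "review_loop",  # agent resolves, then review
    ("check_conflicts", "ambiguous"): "escalate_to_human",  # assumption, see lead-in
}


def run(pr: dict, start: str = "find_intent") -> Tuple[str, List[Tuple[str, str]]]:
    """Walk the graph until a terminal outcome, recording every (node, route) taken."""
    state, node, trace = {"pr": pr}, start, []
    while node in NODES:
        route = NODES[node](state)
        trace.append((node, route))
        node = EDGES[(node, route)]
    return node, trace
```

Keeping routes as data rather than control flow makes the graph easy to inspect, and the recorded trace is a natural structured artifact to hand to a maintainer.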

The visible flowchart included terminal branches such as “Bad, localized, or unclear” leading to “Comment and close PR,” and “Good enough” continuing through conflict and review steps. Solmaz’s stated goal is that by the time a PR reaches a human maintainer, the mechanical issues should already be resolved where possible.

He also defended a pattern he jokingly labeled the “shameful” review/refactor loop: have Codex review, address feedback, run CI, and repeat when the feedback concerns shallow problems. His caveat was important. He did not argue that looping an agent over design work necessarily produces good software. He argued that loops are acceptable when the work is to uncover and fix shallow bugs that are “easily fixed.” If the agent discovers that the change requires a fundamental refactor, the workflow should escalate to a human.

That distinction gives the automation a boundary. Solmaz wants agents to handle superficial refactors, straightforward conflict resolution, related CI failures, and review issues with clear severity. He does not present them as a replacement for architectural judgment when the implementation is fundamentally flawed or the validation is ambiguous.
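The boundary Solmaz describes amounts to a bounded loop with two exits to a human: a finding that needs a fundamental refactor, or running out of rounds. The following is a minimal sketch under those assumptions; `Finding`, `review_loop`, the callables, and the five-round cap are hypothetical, with the callables standing in for a Codex review, an agent fix step, and a CI run.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    severity: str         # e.g. "P0", "P1", "P2"
    needs_refactor: bool  # reviewer's judgment that no shallow fix will do


def review_loop(
    review: Callable[[], List[Finding]],
    fix: Callable[[List[Finding]], None],
    ci_passes: Callable[[], bool],
    max_rounds: int = 5,
) -> str:
    """Repeat review -> fix -> CI while problems stay shallow; otherwise escalate."""
    for _ in range(max_rounds):
        findings = review()
        # A fundamental problem is not something to loop on: hand it to a human.
        if any(f.needs_refactor for f in findings):
            return "escalate_to_human"
        blocking = [f for f in findings if f.severity in ("P0", "P1")]
        if not blocking and ci_passes():
            return "ready_for_maintainer"
        # Address blocking findings (and, in a real system, related CI failures).
        fix(blocking)
    # The cap keeps a loop that is not converging from running forever.
    return "escalate_to_human"
```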

In the acpx demo, the workflow replayed a PR-processing run: reproducing a bug, judging a refactor, entering a review loop, and emitting structured JSON so the next workflow node could consume it. Solmaz compared the system to n8n, but for driving coding agents through TypeScript-defined procedures. He also emphasized that the engine is general: PR triage is one application, not the whole product.

Parallel agents change the compute model from personal assistant to enterprise infrastructure

Solmaz framed agent usage as a spectrum between personal agents and enterprise agents. A personal user may run one to five agents. A company, in his view, may run thousands of agents at once on parallel workloads. That difference implies far more inference consumption at work than at home, and therefore a larger enterprise market.

His shorthand was: apply agents generously where they can solve the problem. But generous use requires a different deployment model. A single OpenClaw instance in Slack, Teams, or Discord is not enough for larger organizations, because chat platforms do not yet provide the kind of multi-agent provisioning and identity model he wants.

The mockup he showed imagined multiple agent identities inside Slack, with generated names such as bob-coal-breeze, bob-quick-cartoon, and bob-bright-sage. Solmaz said Slack and Teams do not currently support this model in the way he needs. To create another apparent agent identity, a user has to create another app, manifest, name, and profile picture. That should not be managed manually through clicking, he argued.
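The non-clicking alternative Solmaz argues for is to generate each identity's name and app manifest in code. This is a minimal sketch: the word lists are seeded only from the mockup's names, `agent_name` and `slack_manifest` are hypothetical helpers, and the manifest keeps to a few standard Slack manifest fields (`display_information`, `features.bot_user`, `oauth_config.scopes`). Submitting the manifest to Slack and uploading a profile picture are out of scope here.

```python
import random
from typing import Optional

# Word lists drawn from the mockup's names: bob-coal-breeze, bob-quick-cartoon, bob-bright-sage.
FIRST = ["coal", "quick", "bright"]
SECOND = ["breeze", "cartoon", "sage"]


def agent_name(prefix: str = "bob", rng: Optional[random.Random] = None) -> str:
    """Generate a readable per-task identity; a real system would also check for collisions."""
    rng = rng or random.Random()
    return f"{prefix}-{rng.choice(FIRST)}-{rng.choice(SECOND)}"


def slack_manifest(name: str, task: str) -> dict:
    """Minimal Slack app manifest for one agent identity."""
    return {
        # Keep the description short; Slack limits its length.
        "display_information": {"name": name, "description": f"Agent for: {task[:100]}"},
        "features": {"bot_user": {"display_name": name, "always_online": True}},
        "oauth_config": {"scopes": {"bot": ["chat:write"]}},
    }
```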

The operating model he wants is one agent per task: disposable, on demand, separately addressable, with its own working state. The agent creates files, edits files, and synchronizes state with the user’s environment. It is not quite the same as a personal desktop agent, because the enterprise version must provision many isolated workers, grant them access to the systems they need, connect them to communication channels, and eventually retire those task environments when they are no longer needed.
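The "disposable, on demand" requirement implies an explicit lifecycle with a guaranteed end state. A minimal sketch, assuming four hypothetical phases (the talk does not name any), could enforce that a task agent only moves forward and that a retired agent is never revived:

```python
from enum import Enum


class Phase(Enum):
    REQUESTED = "requested"
    PROVISIONING = "provisioning"
    RUNNING = "running"
    RETIRED = "retired"


# Forward-only transitions; any phase may be cut short by retiring the agent.
ALLOWED = {
    Phase.REQUESTED: {Phase.PROVISIONING, Phase.RETIRED},
    Phase.PROVISIONING: {Phase.RUNNING, Phase.RETIRED},
    Phase.RUNNING: {Phase.RETIRED},
    Phase.RETIRED: set(),  # disposable: a retired environment is never reused
}


class TaskAgent:
    """One agent per task, owned by one user, with its own lifecycle."""

    def __init__(self, task: str, owner: str) -> None:
        self.task, self.owner, self.phase = task, owner, Phase.REQUESTED

    def advance(self, to: Phase) -> None:
        if to not in ALLOWED[self.phase]:
            raise ValueError(f"{self.phase.value} -> {to.value} not allowed")
        self.phase = to
```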

Solmaz listed the required components as Kubernetes, an agent harness such as OpenClaw, Codex, or Claude Code, ACP, GitHub repository access, read/write access to infrastructure such as AWS, Azure, or Google Cloud, and state or data synchronization. He mentioned rsync and Dropbox-like synchronization only as examples of the kind of mechanism needed, not as a settled implementation.

Spritz provisions disposable agent pods instead of asking one bot to do everything

The last system Solmaz described was Spritz, work he identified as part of his role at TextCortex rather than OpenClaw. Spritz is an open-source Kubernetes orchestrator for disposable agent instances, available at textcortex/spritz according to the slide.

Solmaz described Spritz as a Go operator that handles the complicated parts around the user experience: keeping agents running, wiring them into Slack-like workflows, and spawning new instances when one shared concierge agent becomes a bottleneck. The slide presented the broader product shape as a self-hosted control plane: deploy Spritz on your own Kubernetes cluster, package one or more agent runtimes as presets, and let humans or gateway automations spawn fresh agents on demand. It also stated that each spawned agent runs in its own workload, is owned by a specific user, and is exposed through a consistent UI, API, and ACP gateway.
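At the Kubernetes level, "each spawned agent runs in its own workload, is owned by a specific user" can be approximated with one labeled Pod per task. The sketch below builds such a Pod as a plain dict. It is an illustration under stated assumptions, not Spritz's design: Spritz is a Go operator whose custom resources and preset format are not reproduced here, and `pod_for_task`, the label keys, and the image name are hypothetical.

```python
import re


def pod_for_task(task_id: str, owner: str, image: str) -> dict:
    """Build a Pod manifest for one disposable, user-owned agent."""
    # Object names must be DNS-1123 style: lowercase alphanumerics and '-',
    # not starting or ending with '-'.
    safe = re.sub(r"[^a-z0-9]+", "-", task_id.lower())[:50].strip("-")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"agent-{safe}",
            # Labels let a controller find, list, and retire agents per user or task.
            # (owner is assumed to already be a valid label value.)
            "labels": {"app": "task-agent", "owner": owner, "task": safe},
        },
        "spec": {
            "restartPolicy": "Never",  # one task, one run: retire rather than restart
            "containers": [{"name": "agent", "image": image}],
        },
    }
```

Because the result is plain JSON-compatible data, it could be serialized with `json.dumps` and applied with `kubectl apply -f -`; an operator would instead create and garbage-collect such objects itself.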

Layer	Role in Solmaz’s architecture
ACP	Standardizes agent-to-client interaction.
acpx	Provides a headless CLI and workflow engine for ACP-compatible agents.
OpenClaw	One possible agent harness and collaboration surface.
Spritz	Spawns isolated, disposable agent instances on Kubernetes.
Slack or Discord	Lets a user request work, dispatch an agent, or receive a link to the task environment.

The systems Solmaz described separate protocol, workflow, harness, orchestration, and chat surfaces.

The slide emphasized runtime agnosticism. OpenClaw can be one runtime, but Spritz is not tied to it. A runtime can be OpenClaw, Claude Code, Codex-based, or a custom internal agent image, provided it speaks ACP on the Spritz runtime contract.

Solmaz’s demo use case was error reporting. In Slack, he asked an internal bot whether there were new bugs after a production release. The bot identified an AttributeError: the system attempted to call ensure_email_domain on a FingerprintAuth object in agent_system_message, apparently because the code assumed all user objects had that method. Solmaz then asked the bot to dispatch an agent to debug it. The bot returned a Spritz URL and said an agent was investigating.

Because Slack could not host the newly provisioned agent identity the way Solmaz wanted, the actual debugging happened in a separate web UI hosted inside the cluster. The UI showed gateways for Codex ACP and OpenClaw ACP, a traceback, and the agent’s root-cause analysis. The agent concluded that agent_system_message incorrectly called user.ensure_email_domain() for any user_auth class, while that helper only worked for real User rows. FingerprintAuth implemented userMixin, which hid the mismatch until runtime.

The visible fix was to use the canonical default-agent fallback only for runtime User instances and use a minimal fallback system message for non-User principals such as FingerprintAuth. The agent also added a regression test for the FingerprintAuth case. It reported that the targeted pytest run passed, while a full pytest run could not finish because xn failed to fetch gcloud-k8s-auth from a configured GitHub registry with an HTTP 403 error.
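The shape of that fix, an explicit type check before calling a User-only helper plus a regression test, can be reconstructed as a small sketch. Only the names `agent_system_message`, `ensure_email_domain`, `User`, and `FingerprintAuth` come from the demo; every class body, the `UserMixin` stand-in, and both prompt strings are invented here for illustration and do not reflect TextCortex's code.

```python
class UserMixin:
    """Hypothetical stand-in for a mixin that makes any principal look like a user."""

    is_authenticated = True


class User(UserMixin):
    def __init__(self, email: str) -> None:
        self.email = email

    def ensure_email_domain(self) -> str:
        # Invented body: the real helper's behavior was not shown.
        return self.email.split("@", 1)[1]


class FingerprintAuth(UserMixin):
    """Non-User principal: passes mixin-based checks but has no ensure_email_domain()."""


DEFAULT_AGENT_PROMPT = "You are the default agent for {domain}."  # hypothetical
MINIMAL_PROMPT = "You are a helpful assistant."  # hypothetical minimal fallback


def agent_system_message(user_auth: UserMixin) -> str:
    # Only real User instances get the canonical default-agent fallback.
    if isinstance(user_auth, User):
        return DEFAULT_AGENT_PROMPT.format(domain=user_auth.ensure_email_domain())
    # Other principals get a minimal message instead of raising AttributeError.
    return MINIMAL_PROMPT


def test_fingerprint_auth_gets_minimal_fallback() -> None:
    assert agent_system_message(FingerprintAuth()) == MINIMAL_PROMPT
```

The mixin is what let the bug survive until runtime: both classes satisfy the same interface check, so only an `isinstance` test on the concrete `User` type (or a narrower type annotation) distinguishes them.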

Solmaz called the pod-per-agent approach “wasteful,” but said he thinks it is the better abstraction. His reason was the same lesson he drew from OpenClaw: giving an agent a full computer makes it more powerful. He said he thought OpenHands uses Firecracker, and added that he is still learning the virtualization landscape. Spritz, in his presentation, is the Kubernetes-based version he has running.

The through-line is interoperability plus isolation

Solmaz’s architecture has two recurring constraints. The first is interoperability: agents should not be locked to a single harness or UI. ACP is the mechanism he currently uses to abstract across OpenClaw, Codex, Claude Code, and other compatible runtimes. acpx is the command-line and workflow layer that lets those agents be driven programmatically.

The second is isolation: in Solmaz’s proposed enterprise model, agents should be disposable task environments rather than one shared bot with a growing pile of context and responsibilities. That pushes the system toward Kubernetes, per-task workloads, user-owned instances, and lifecycle management.

The same pattern runs from Discord-bound coding sessions to Kubernetes-managed task environments: chat is where users ask for work, ACP provides the structured boundary, workflow graphs encode repeatable maintainer procedures, and disposable pods give each task its own working computer.
