Orply.

Claude Code’s Growth Tests the Economics of Long-Running AI Agents

Alex Kantrowitz · Boris Cherny · Wednesday, May 20, 2026 · 20 min read

Anthropic’s Claude Code head Boris Cherny argues that the product has become more than an AI coding tool: it is now one of the company’s main surfaces for agentic AI. In a Big Technology interview, Cherny says Claude Code’s rapid growth reflects real productivity gains and a shift from models that answer questions to systems that can use tools, run tasks, and coordinate other agents, while acknowledging that rate limits, token costs, safety checks, and organizational change remain unresolved constraints.

Claude Code is becoming Anthropic’s agent surface

Claude Code is no longer just an AI coding product in Boris Cherny’s account. It is becoming one of Anthropic’s primary ways of exposing agents to users — and that shift is forcing the company to confront capacity, safety, rate limits, token economics, and organizational redesign all at once.

Cherny describes the growth as unusually steep even by the standards of people who have worked on hyper-growth products. Inside Anthropic, he said, the product “skyrocketed immediately” before external release, which gave the team confidence it would be a hit. After the release of Opus 4 and Sonnet 4 in May of the prior year, growth “went exponential,” then kept inflecting with Opus 4.5 in November, 4.6 in February, and 4.7 after that.

I've just never seen growth this steep, and then it just kept going more and more exponential.

Boris Cherny

The more important claim is not just that usage rose. Cherny argues that, for an increasing number of people, Claude Code is becoming the way they experience AI agents and Anthropic itself. Anthropic still has several surfaces — Claude chat, Claude Design, Co-work, API products, managed agents, SDKs — but Claude Code has become a first introduction for many users.

Alex Kantrowitz framed the scale of the broader company by citing public comments from Dario Amodei that, as Kantrowitz characterized them, put demand for Anthropic’s products up roughly 80 times year over year. Kantrowitz also said Anthropic’s annual recurring revenue had grown from what he described as $4 billion roughly a year earlier to “maybe” $45 billion. Cherny declined to break out whether the API or owned-and-operated products are larger today. He would only say that it is “a mix”: products play a much bigger role than they did a year earlier, and product growth and API growth are both accelerating.

That answer matters because Anthropic’s early commercial identity, as Kantrowitz put it, was heavily API-driven. Cherny said there had once been a real debate inside Anthropic, before he joined, about whether the company should build products at all. His view now is that a lab needs products for mindshare and for safety. Anthropic exists, he said, to study AI safety, and products give it better tools to do that. But because Anthropic is a small organization and will not build most things people need in the world, it also needs a platform: managed agents, APIs, SDKs, and other ways for thousands of businesses to build on top.

Claude Code’s original bet, according to Cherny, was that AI software development should not remain anchored in “a fancy text editor.” At the time, most engineers still understood AI as a chatbot or autocomplete interface. Anthropic saw that the model was becoming good at coding and at using tools, and Claude Code was built around that difference.

For Cherny, the distinction between a chatbot and an agent is simple: an agent can use tools. That can mean editing files on a desktop, using a browser, organizing files, connecting to Cloudflare or other services, or taking actions through a user’s computer when granted access. The product’s significance, in his account, comes from that small interface change: instead of only replying, the model can act.

Kantrowitz described the shift as moving from autocomplete toward a natural-language agent that can code, connect to services, and perform tasks. Cherny agreed. In his telling, the product category changed once models could operate through the user’s existing tools rather than merely generate text about what the user should do next.

Demand looks real to Cherny, but the hard part is changing how organizations work

The central challenge for Anthropic is not only whether people are using Claude Code. It is whether the usage reflects durable productivity gains or artificially stimulated demand.

Kantrowitz pressed Cherny on “tokenmaxxing”: corporate mandates or competitions that reward employees for consuming AI tokens, running agents, or hitting AI-usage targets. The concern is that some reported demand may reflect gamified incentives rather than economically useful work.

Cherny said he does not think tokenmaxxing accounts for a large percentage of Claude Code usage. He did not claim to know how many companies are doing it, and said he had only heard of it as a trend. His counterpoint was that Claude Code has “many, many, many customers,” rather than one company driving usage.

The most concrete challenge came from an example Kantrowitz introduced from a Financial Times report, as he described it on air: Amazon staff were allegedly using AI tools for unnecessary tasks to inflate usage scores after Amazon introduced targets for more than 80% of developers to use AI weekly. Kantrowitz also said he had checked the dynamic with an Amazon employee, who described running an automation for hours and deleting the result in order to meet targets. Cherny did not dispute the possibility of such cases; he treated them as part of a broader, uneven search for organizational change.

Cherny’s own recommendation is not to create token leaderboards. It is to give people enough token access to experiment without seeking approval for every use, and to create psychological safety around workflow experimentation. Some experiments will fail. Some will work. The important thing, in his view, is that productivity improvements will often come from unexpected places.

At Meta, before Anthropic, Cherny said he worked on the health of code across Facebook, Instagram, WhatsApp, and other apps. Before AI models, productivity gains of 1%, 2%, or 3% per engineer over a year were hard-won and meaningful. With Claude Code, he said companies including Anthropic and its major customers are reporting productivity gains “on the order of hundreds of percentage points.” Anthropic’s last reported internal figure, he said, was that code written per engineer had grown about 250% since Claude Code was introduced, while code quality and reliability remained stable.

250%: reported increase in code written per engineer at Anthropic since Claude Code was introduced, according to Cherny
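
As a quick check on the arithmetic, the sketch below converts a percentage increase into a multiple of the baseline. It assumes the 250% figure is meant literally as growth over the pre-Claude Code baseline, which the interview does not spell out; if the speaker meant 250% of the baseline, the multiple would be 2.5 rather than 3.5. The function name is illustrative only.

```python
def growth_multiplier(percent_increase: float) -> float:
    """Convert a percentage increase into a multiple of the baseline."""
    return 1.0 + percent_increase / 100.0


# Pre-AI gains Cherny cites from his time at Meta: up to about 3% per engineer per year.
small_gain = growth_multiplier(3)

# The internal Anthropic figure he reports: about 250% more code per engineer.
# Assumption: "grown about 250%" is read as an increase over the baseline.
reported_gain = growth_multiplier(250)

print(f"3% increase   -> {small_gain:.2f}x baseline")
print(f"250% increase -> {reported_gain:.2f}x baseline")
```

On that reading, the reported figure corresponds to roughly three and a half times the earlier output per engineer, against a few hundredths of a multiple for the gains he describes as hard-won before AI models.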

Cherny compared the moment to the adoption of personal computers. Computers did not automatically make companies more productive just by appearing in the office, he said. Firms that kept paper processes and placed computers at the periphery saw little benefit. Firms that reorganized business processes around computers — throwing away filing cabinets and making the computer central — captured the gains. Cherny sees AI in similar terms: companies are experimenting with how to reorganize work around it, and there is not one right approach.

The organizational lesson, as he put it, is that the people who find breakthrough uses may not be the people management would have selected in advance. It could be an accountant automating accounting, a marketer automating marketing, or a new graduate engineer building something unexpected. Companies should let broad experimentation happen first, then optimize once a use case scales.

Token economics now shape the product experience

Cherny’s case for agentic AI depends on rapid model improvement, but the constraints are already visible: agents can waste tokens, users can hit rate limits, and longer-running tasks require safer delegation.

Kantrowitz’s concrete example was using Claude Co-work to create PowerPoint presentations and export a PDF. In one case, he said, the system appeared to spiral, use many tools, and fail to execute a simple export until he pushed it. The model eventually replied that it had gone “down a rabbit hole worrying about a constraint that wasn’t actually blocking us,” then shipped the file.

Cherny separated three dimensions of model quality: intelligence, speed, and efficiency. Anthropic tries to move all three together, he said, but if forced to choose, he would optimize first for intelligence. More intelligence lets users do more things; efficiency can be optimized afterward.

That sequencing explains why Anthropic exposes controls. Users can choose among model sizes (Opus as the largest, Sonnet as the middle model, Haiku as the smallest) and can also adjust “effort,” which Cherny described as how much work the model should put into the task. To get maximum intelligence from Opus 4.7, he said, users should choose extra-high or maximum effort. If they want to conserve tokens, they can choose medium or low effort.

The stronger objection is that the PDF-style failure may be inherent to large language models because they are probabilistic next-token systems and do not produce the same answer twice. In that view, spiraling behavior is a feature of the technology, not something that can be fixed. Kantrowitz attributed that argument to a commenter on his show.

Cherny rejected it. His counterexample was Claude Code itself. A year and a half earlier, he said, the product was not good enough to build entire features or products reliably. It would spiral, produce bad code, or produce code that did not work. As the model and product improved, the results improved. Today, he said, Claude Code is “100% written by Claude Code,” Co-work is “100% written by Claude Code,” and an increasing number of Anthropic product features are fully written by Claude Code.

Claude Code is 100% written by Claude Code. Co-work is 100% written by Claude Code.

Boris Cherny

Cherny also described a recent talk at Y Combinator where he asked a room of a few hundred people how much of their code was written using Claude Code. About half, he said, raised their hands to indicate 100%. When he asked how many had 0% of their code written with AI, one hand went up. Everyone else was somewhere in between.

Rate limits are the other side of the same problem. Kantrowitz identified them as the most visible frustration among Claude Code users: people try the product, hit their token allotment after a short period, wait hours to use it again, and consider alternatives. Cherny said Anthropic is actively working on the issue, but also argued that a “very small percent” of users actually hit their rate limits. For Pro users, he said, the percentage is higher; for Max users, it is “actually quite low.”

He gave three explanations for why the issue has become salient. First, Anthropic briefly reduced peak rate limits, then rolled that back and doubled rate limits. Second, Claude Code’s extensibility means users can install plugins and integrations, some of which consume tokens inefficiently. Anthropic is working to surface token usage by plugin so users can decide whether the integration is worth it. Third, power users have dramatically expanded their workflows.
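
To make the second point concrete, here is a minimal sketch of what per-plugin token accounting could look like. It is not Anthropic's implementation: the record format, the plugin names, and the token counts are all made up for illustration.

```python
from collections import defaultdict


def tokens_by_plugin(records):
    """Sum token counts per plugin and return (plugin, total) pairs, largest first."""
    totals = defaultdict(int)
    for plugin, tokens in records:
        totals[plugin] += tokens
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up usage log: (hypothetical plugin name, tokens consumed in one call).
sample_log = [
    ("core", 1200),
    ("docs-search", 8000),
    ("core", 900),
    ("linter", 300),
    ("docs-search", 7500),
]

report = tokens_by_plugin(sample_log)
for plugin, total in report:
    print(f"{plugin:12s} {total:>7d} tokens")
```

A breakdown of this kind is what would let a user see that one integration accounts for most of their allotment and decide whether it is worth keeping.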

When Claude Code launched, Cherny said, users generally ran one Claude at a time. He now runs about five simultaneously on his computer during normal work. On many nights, he runs hundreds of Claudes in parallel, and sometimes thousands. That usage pattern sits at the edge of what a Max plan can support. Users who want as many tokens as they need can pay through the API, he said, which is what many enterprises do.

Competition sharpens the capacity question. Kantrowitz linked rate limits to OpenAI’s Codex and to Anthropic’s relative discipline around data-center spending, suggesting that users who hit Anthropic limits may shift to Codex if OpenAI has more capacity available. Cherny answered that Claude Code’s growth “has never been faster than it is today” and is still accelerating. He said the company is focused on improving the experience for users who do hit limits, including doubled five-hour rate limits, increased weekly limits announced that day, and new “Colossus” capacity, which Kantrowitz and Cherny described as brought online via Elon Musk.

On Codex, Cherny’s response was restrained. There are always copycats and competitors, he said. He finds it flattering, and it forces everyone to do better. His stated priority is talking to users every day and improving the product incrementally.

Long-running agents require Claude to supervise Claude

The product direction Cherny described depends on letting agents work longer and with less constant human approval. That creates a security problem: asking users to approve every tool call appears safer, but repeated prompts can make users less careful.

Claude Code’s earlier pattern was to ask every time the model wanted to use a tool. Users usually clicked yes, or eventually chose “always allow.” Cherny said the security problem is that users become fatigued and stop evaluating the specific action. Auto mode is Anthropic’s answer.

In auto mode, when Claude wants to use a tool, it asks another Claude whether that tool use is safe. The second Claude has some, but not all, context. The decision also sits behind multiple layers of safety checks. Cherny said Anthropic spent months iterating on the system and used thousands of benchmarks and evaluations to determine whether it was safe.

His claim is that auto mode is proving safer than the prior prompt-heavy approach both in lab settings and in the wild. If one unsafe command is buried among many routine approvals, a tired user might approve it. A second Claude evaluating the tool call may refuse it.
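
The layered pattern Cherny describes can be sketched in a few lines. This is a toy illustration under stated assumptions, not Anthropic's auto mode: the reviewer here is a simple rule standing in for a second model, and every name in it (`ToolCall`, `BLOCKED_PATTERNS`, `static_check`, `reviewer_check`, `approve`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    tool: str      # for example "shell" or "read_file"
    argument: str  # the command or path the agent wants to use


# Layer 1 (assumed): fixed patterns that are refused no matter what the reviewer says.
BLOCKED_PATTERNS = ("rm -rf", "mkfs")


def static_check(call: ToolCall) -> bool:
    """Return True when no blocked pattern appears in the argument."""
    return not any(pattern in call.argument for pattern in BLOCKED_PATTERNS)


def reviewer_check(call: ToolCall, allowed_commands: set) -> bool:
    """Stand-in for the second model, which sees only part of the context.

    The partial context here is just a set of command names judged relevant
    to the task. A real reviewer would be a model call, not a lookup.
    """
    if call.tool == "read_file":
        return True
    if call.tool == "shell":
        parts = call.argument.split()
        return bool(parts) and parts[0] in allowed_commands
    return False


def approve(call: ToolCall, allowed_commands: set) -> bool:
    """A tool call runs only if every layer approves it."""
    return static_check(call) and reviewer_check(call, allowed_commands)


print(approve(ToolCall("shell", "pytest tests/"), {"pytest", "git"}))
print(approve(ToolCall("shell", "rm -rf build"), {"rm"}))
```

The design point is that the check runs on every call with the same attention, whereas a person approving the hundredth prompt in a row may not read it.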

This pattern — Claude supervising, prompting, or coordinating other Claudes — is also central to how Cherny personally works. He said he no longer writes code; he prompts Claude. More recently, he mostly uses a Claude that prompts other Claudes.

I don't write code. I prompt Claude. And actually nowadays mostly what I'm doing is I have a Claude that prompts other Claudes.

Boris Cherny

That is Cherny’s model for where knowledge work may go: humans increasingly specify goals, agents decompose or delegate work, and other agents execute. He expects the same pattern to extend beyond coding into Co-work, where users start one task, then another, then another, and learn to manage many parallel agent runs.
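
The delegation pattern can be sketched as one coordinator fanning subtasks out to workers that run concurrently under a cap. This is an illustration only: the workers below are placeholders that return a string where a real agent run would call a model, and the function names are hypothetical. The default cap of five echoes the number of simultaneous sessions Cherny says he runs during normal work.

```python
import asyncio


async def run_worker(subtask: str, limit: asyncio.Semaphore) -> str:
    """Stand-in for one agent run; a real worker would call a model here."""
    async with limit:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return f"done: {subtask}"


async def orchestrate(subtasks: list, max_parallel: int) -> list:
    """Run all subtasks concurrently, with at most max_parallel active at once."""
    limit = asyncio.Semaphore(max_parallel)
    return await asyncio.gather(*(run_worker(s, limit) for s in subtasks))


def run_plan(subtasks: list, max_parallel: int = 5) -> list:
    """Synchronous entry point: start an event loop and collect results in order."""
    return asyncio.run(orchestrate(subtasks, max_parallel))


if __name__ == "__main__":
    for line in run_plan(["write tests", "fix lint errors", "update docs"]):
        print(line)
```

Results come back in the order the subtasks were submitted, so the coordinator can match each result to the piece of work it delegated.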

| Constraint | Cherny’s account of the issue | Anthropic’s stated response |
| --- | --- | --- |
| Token inefficiency | More intelligent models may use more tokens or spiral on some tasks. | Optimize first for intelligence, then efficiency; expose model and effort controls. |
| Rate limits | Most users do not hit limits, but power users increasingly run many Claudes in parallel. | Double five-hour limits, increase weekly limits, add capacity, and let heavy users pay through API. |
| Permission fatigue | Repeated tool approvals make users less thoughtful about safety. | Use auto mode, where another Claude evaluates tool calls with additional safety checks. |

Cherny’s framing of the main constraints around Claude Code’s growth

The frontier is moving from code into ordinary work

Cherny’s clearest example of agentic AI outside programming was a personal travel-planning task. He said he used Co-work to plan a month of travel around events in London, Tokyo, and other stops. He gave it rough timing and asked it to look through his email and calendar, confirm details, and book travel.

Co-work found two stops he had omitted and a couple of dates he had provided incorrectly. Cherny then asked it to book the travel. He went back to coding, returned about an hour later, and found that it had booked eight flights and five hotels. One hotel was in the wrong area; he asked it to rebook, and it did.

Cherny said he repeatedly tests common tasks with new models, and the travel-planning result was the best he had seen. The difficult part of using these systems, in his view, is regularly revising one’s assumptions about what the current model can do.

Engineers who tried AI coding a year earlier and stopped, he said, may still think models are only reliable for a few lines at a time. In his view, that impression is now outdated. The current experience is materially different because model capability has changed. Cherny described AI agents as the first technology he has used where “every month there’s a step change” in what the system can do.

That improvement curve makes a beginner’s mindset unusually important. Tasks that failed with one model may work cleanly with the next. Cherny’s advice is to keep retrying uses that previously seemed unrealistic, because “the next model might just do it perfectly.”

Kantrowitz connected this to a broader interface claim. Traditional software requires users to adapt to the menus, workflows, and assumptions of a product built at scale. An agent changes that relationship by letting a user state a desired outcome and allowing the system to operate across the tools where that outcome has to be achieved. The travel example, in this framing, is not merely a better booking flow. It is a sign that the user’s preferences and context can become the interface.

Cherny also described watching a non-engineer friend use Co-work to fix a laptop language-input problem. Instead of searching Google for instructions, she asked Co-work. It requested permission to use the computer, opened settings, diagnosed the issue, and fixed it while she watched. Cherny said the user remains in the driver’s seat: the system is visible, not operating invisibly in the background. But he treated the case as evidence that non-engineers are already finding uses he would not have anticipated.

The roadmap Cherny described has three themes. First is intelligence: as models improve, Claude Code and Co-work can perform more ambitious work. Coding moved from writing a line at a time to building entire features or products. Co-work moved from creating documents toward booking flights, combining many tools, and doing QuickBooks work. Second is longer-running tasks, including auto mode. Third is parallelism: many Claudes running at once, with product experiences that make it more obvious when and how to use that.

The chatbot may also become more agentic. Kantrowitz described a possible future in which a user discusses a problem — for example, an upcoming trip to India — and the chatbot suggests and executes an action rather than merely answering. Cherny said he could see that direction and stated plainly that “agents are the future.” Anthropic is trying experiments in that area, he said, though he did not elaborate.

Higher leverage does not eliminate the need for people

The strongest version of the labor-market skepticism is not that agents are useless. It is that AI labs still need people to figure out how AI is useful, handle organizational change, and integrate systems. Kantrowitz introduced that point through a quotation he attributed to Wharton professor Ethan Mollick: “You will know that the AI labs believe in artificial superintelligence when they disband their newly formed consulting, sorry, forward deployed engineering groups. As long as people are required to figure out how AI is useful and do organizational change and systems integrations, jobs seem pretty safe.”

Cherny’s answer was not that people disappear. It was that individual leverage rises. He said one engineer at Anthropic now has “insane” leverage; the relevant question becomes how large a business or how many products one person can support. He said Anthropic is beginning to see similar leverage among marketers, forward-deployed engineers, and sales teams. About half the go-to-market team, he said, uses Claude Code, and the rest uses other Anthropic products.

His reason for still hiring people, including roles that might seem vulnerable to automation, is that Anthropic remains bottlenecked on good people. Even if leverage per person rises, demand is so high and there is so much to build that the company cannot hire enough strong people.

The question becomes sharper when the task is something like configuring Salesforce or handling IPO paperwork. Kantrowitz put both examples to Cherny as tests skeptics might use: if the technology is so powerful, why not configure Salesforce from a prompt, or let AI handle IPO paperwork without an investment bank?

Cherny answered by returning to the human role in the loop. Even if Claude configures Salesforce, someone has to ask Claude to do it. If there are many configurations and business judgments involved, prompting and steering Claude could itself be a full-time job. Eventually Claude may become good at asking Claude to do those things, deepening the chain, but Cherny’s view is that people are still needed to pilot the process.

Kantrowitz suggested that perhaps the future job is asking one question. Cherny’s response was that asking the right question would then carry enormous leverage.

The Saaspocalypse depends on which moats survive agents

The “Saaspocalypse” question is whether automated programming and agentic interfaces weaken the defensibility of software companies. Cherny answered through the language of business moats, specifically citing the Seven Powers framework as a preferred way to think about defensibility.

His view is not that all moats disappear. Some become more valuable; some become less valuable. Network effects, he argued, should become more important. If a product’s value depends on the people or businesses connected to it, it does not matter whether an agent can write code or build an alternative interface. A messaging app is valuable because one’s friends are there.

Switching costs, by contrast, may become less important. If a company wants to move from Vendor A to Vendor B, Claude can increasingly help execute that migration. The better agents become, the less friction protects incumbents whose main advantage is that switching is painful.

The more radical possibility is that users interact through one agent that interfaces with all software, collapsing many software moats into the agent layer. Cherny said something like that is possible but seems “a little far-fetched” to him. He returned to messaging: he can build a good messaging app with Claude Code in a few hours, but it is not useful if it cannot talk to his friends.

Cherny conceded that an agent could become the user-facing layer for communication, but emphasized that the underlying communication still depends on protocols, existing networks, and whether apps support interoperability. He used Signal as an example: an app might use the same protocol, but it may not be able to message other people on Signal.

Cherny also pointed to scale economies as a durable moat. For a chip manufacturer such as TSMC, the process power and cost declines achieved through scale remain fundamental economic forces. In tech infrastructure, companies with strong infrastructure can support more users and reduce marginal cost per user over time. Those advantages do not disappear simply because users can build apps with agents.

His bottom line is mixed. Some software categories will be exposed by lower switching costs and easier custom software. But companies with real network effects, scale economies, infrastructure advantages, or multiple accumulated moats may become more defensible, not less.

Self-improving AI is plausible, but humans still steer the loop

Kantrowitz characterized Anthropic cofounder Jack Clark’s view as roughly a 60% chance that models will start improving themselves by 2028, while caveating that the exact percentage or year might be slightly off. Cherny said that “seems right.”

His evidence was again Claude Code’s own development. Since around Opus 4.5 in November of the prior year, he said, 100% of Claude Code has been written using Claude Code. But he drew a line between that and a fully autonomous self-improvement loop. Today, Claude Code is writing itself, but a person is still prompting it. Claude is starting to generate ideas for what to build next for Claude Code, but not always good ones. Cherny said he still generates most of the ideas.

At some point, he expects that to change. The model will improve and become more of a self-reinforcing loop. He did not present that as a distant abstraction; he treated it as one of the reasons Anthropic exists. If you ask engineers and researchers why they joined Anthropic, he said, they will say AI safety. The company’s mission, in his telling, is to make sure the technology goes well, including for future generations, because fast self-reinforcing progress is one possible outcome.

The separate “world model” critique asks whether language models can be reliable agents without an internal model of how actions affect the world. Kantrowitz quoted Yann LeCun’s view as: “You cannot build a reliable agentic system without a world model. LLMs don’t have world models. They can’t predict the consequences of their actions before taking them.” Cherny did not take a strong research-side position. He said he is on the product side, and offered to sit down with LeCun and use Claude Code together for an hour.

Pressed on whether he must believe Claude has some understanding of consequences if he trusted it to book flights and hotels, Cherny said Anthropic researchers have published work, as he described it, showing surprising degrees of intelligence in these models. Although they fundamentally predict the next token, he said, the models can plan and reason in ways one might not expect from a system described only as next-token prediction.

Kantrowitz mentioned Anthropic research, again as an interview reference, showing that when models write poetry, the model appears to be thinking about the next line while writing the first. Cherny said that is how he would write poetry too: if predicting the next word is hard enough, the model has to learn to plan ahead.

The next year tests whether non-engineer adoption keeps widening

The final skepticism is whether the current agent boom is a real future or a fever dream. The fever-dream argument is that ordinary users may prefer simple interfaces and tapping through familiar apps; prompting Claude Code may feel too technical, and developer enthusiasm may not generalize.

Cherny’s answer was behavioral. He said Anthropic ran a hackathon for Opus 3.7 where winners included a doctor, an electrician, and a carpenter. Many participants had no coding experience but used Claude Code to build something useful. One person, he said, built and sold a startup as a result of one of Anthropic’s hackathons.

Claude Code was originally built for engineers, and engineers figured it out first. But Cherny said non-engineers quickly learned to use it to build economically useful things. He also said that much of current usage does not come from engineers. Even before Claude Code had easier surfaces, people were installing it in a terminal; for many, that was their first time using a terminal. Now the product has a desktop app, iOS app, Slack app, and other interfaces. But the earlier willingness to use a terminal matters to Cherny because it is the “ultimate market test”: people were willing to jump through hoops because the product was useful.

That leaves a concrete set of tests for Cherny’s thesis over the next year. If he is right, Claude Code and Co-work should become less defined by developer workflows and more defined by durable, repeated use across business functions and everyday work. The product should also absorb more of the rough edges now visible to power users: wasteful token loops, permission fatigue, rate-limit frustration, and the difficulty of managing many agents in parallel. Cherny’s claim is not that those problems are solved. It is that model progress, product design, and organizational adaptation are moving fast enough to make recent limitations poor guides to what the systems can do next.
