Orply.

Applied AI Moves From Usage Growth To Proof Of Control

Applied AISaturday, May 30, 20261h 34m to watch16 min read

John Coogan, Jordi Hays, Brad Gerstner, Loblaw, Giga, and the All-In panel each pointed to the same shift: AI use is no longer being judged by adoption alone. Enterprises are asking what tokens produce, infrastructure investors still see constrained compute, and more value is moving into the operating layers that govern workflows, context, measurement, and model choice.

1. The new enterprise AI question is not adoption. It is output per token.

Applied AI has moved past the stage where usage alone counts as evidence of progress. The new management question is whether model consumption is producing completed work at a cost the business can defend.

John Coogan and Jordi Hays described that shift through the rise of token rationing inside large companies. Their argument was not that enterprise AI demand is fake or that the tools have failed. It was that the tools have become material enough to require ordinary operating discipline. Companies encouraged experimentation, employees and teams found ways to use large volumes of model calls, and executives are now asking what was actually built, shipped, resolved, or saved.

The cleanest formulation came from Coogan: when a company spends enormous sums on AI, leadership eventually asks, “What did we make? What did we do? How did we spend it? What did we get done?” That is the difference between adoption theater and ROI accounting. A dashboard showing rising token use may prove that employees are experimenting. It does not prove that the company is becoming more productive.

The management problem is made worse when token usage becomes a status signal. Hays cited reports of “token maxxing” at companies such as Meta, where employees allegedly left systems running overnight or generated unnecessary work to climb usage leaderboards. Coogan connected the behavior to Goodhart’s law: once a measure becomes a target, it stops being a good measure. He also invoked Jevons paradox: as AI tasks get cheaper or easier, total usage may rise rather than fall.

Goodhart’s law is becoming a practical enterprise AI problem. Token volume can be useful as an adoption signal, but it becomes misleading if workers are rewarded for consumption rather than output. The same logic applies to AI budgets: if a team receives a token allowance and then feels pressure to spend it, the budget itself can become an incentive to manufacture work.

That distinction matters because AI experimentation did have a purpose. It helped companies discover where the tools could be useful. The next phase is not a retreat from AI but a governance layer around an expensive operating input: budgets, access controls, usage policies, cost attribution, and stronger links between consumption and output.

Coogan and Hays both drew a line between disciplined and undisciplined use. A builder using a coding model “like a scalpel” to iterate on one useful product is different from a team spraying tokens across low-value backlog work that had been deprioritized for good reasons. Hays’s point was especially relevant to enterprise work: if AI makes it cheap to revisit the cutting-room floor, companies still need judgment about what should remain there.

Accidental or runaway bills made the issue concrete. Hays cited reporting about a CFO worrying over a half-billion-dollar accidental AI bill and imagined the escalation path from an IT surprise to a CTO problem to a CFO-level shock. He cautioned that budget overruns are not automatically evidence of poor ROI: if a company set its 2026 token budget in 2025, model capabilities and usage patterns may have changed enough to make the original budget obsolete. But that caveat reinforces the core point. AI spending is now large enough, variable enough, and bottom-up enough to need a financial control plane.

$500M
accidental one-month AI bill Hays cited from reporting in the enterprise spending debate

The practical implication is a new CFO/CTO conversation. Not “Are employees using AI?” but “Which workflows changed, what finished faster, what quality improved, which costs fell, and what did the tokens cost?” That question will not be answered by token counts alone. It will be answered by linking model calls to shipped features, resolved tickets, merged pull requests, generated assets, reduced cycle time, or higher customer satisfaction.

Enterprise AI Enters Its ROI Era as Token Costs SurgeTBPN

2. The paradox: enterprises are optimizing spend while infrastructure demand still looks supply-constrained.

The immediate enterprise story is cost discipline. The infrastructure story is still scarcity. Those two points appear to conflict, but Brad Gerstner’s argument is that they can coexist: customers can optimize token usage per task while total demand rises because more tasks, users, agents, and workflows come online.

Gerstner framed the AI market as supply-constrained rather than demand-exhausted. In his account, the market’s skepticism has moved through phases. First, critics asked whether AI revenue would appear. Then, after revenue appeared, they asked whether it was just “token maxing” without ROI. Gerstner’s answer was not that every dollar of token spend is productive. He said enterprises are early, some usage will be wasteful, optimization will happen, and aggregate demand can still grow.

Altimeter’s enterprise survey was the key support for that view. Gerstner said even companies already optimizing AI API usage expected forward growth in raw API consumption. Companies planning to optimize expected still higher growth. In other words, optimization is not necessarily the opposite of expansion. It may be the condition that lets AI move deeper into workflows.

Enterprise postureTrailing spend growthExpected forward raw API growth
Already optimized or actively optimizing+87%+58%
Planning to optimize within 6 months+68%+90%
Evaluating, no firm plans+42%+23%
Not a current priority+145%+91%
Altimeter’s enterprise survey, as presented by Gerstner, showed expected API growth even among customers already optimizing token spend.

The driver is inference. Gerstner said inference-time reasoning opened another scaling vector, and he repeated Jensen Huang’s claim to him that inference demand could rise not 100x or 1,000x, but “one billion x,” because agents will talk to agents. That is a maximalist framing, and it should be read as Gerstner’s investment thesis rather than an established outcome. But it explains why infrastructure investors can remain bullish while enterprise buyers become more cost-conscious.

The same debate now reaches physical infrastructure. Gerstner argued that there is “not a dark GPU” and “not a dark token” in the world, meaning deployed AI compute is being consumed immediately rather than sitting idle like unused fiber during the dot-com buildout. He said Google, Amazon, Microsoft, OpenAI, and Anthropic all report token constraints and could generate more revenue if they had more capacity.

His conclusion leads to data centers. Gerstner treated data center opposition as a national-capacity problem, not just a local land-use dispute. He argued that a moratorium would be “horrific for America,” pushing the country toward recession and ceding AI leadership to China. At the same time, he did not dismiss local concerns. Communities worry about water, power bills, jobs, and disruption. His proposed answer was a “sociopolitical bridge”: tangible dividends or benefits for places asked to host infrastructure.

Gerstner’s most compact version of the scarcity case was that “there is not a dark GPU in the world today” and “not a dark token in the world today.” That line is doing two kinds of work in his argument: it distinguishes today’s compute buildout from the unused fiber of the dot-com era, and it supports his view that optimization will not automatically collapse aggregate demand.

That is where token economics become political. Enterprises want cheaper, better-governed inference. Infrastructure investors want more powered shells, chips, memory, and data centers. Local communities want assurance that the buildout will not extract resources without returning benefits. Gerstner’s national-capacity argument points in the opposite direction from local moratorium politics, which is why the conversation can feel contradictory.

Inference-time reasoning makes that contradiction sharper. A single AI assistant query is one kind of demand. A software agent that plans, calls tools, checks results, revises, invokes another agent, and records its reasoning can consume far more tokens. If enterprises discipline low-value usage but deploy more agentic systems in coding, support, finance, legal, and operations, total demand may still rise.

The market question is therefore not simply whether token prices fall. It is what happens after they fall. If cheaper inference mostly replaces waste, spending may flatten. If cheaper inference makes AI economically viable inside every workflow, spending may grow even as unit costs decline. Gerstner is arguing for the second outcome. Enterprise CFOs are testing whether the first-order bill can be controlled before that deeper embedding happens.

AI Compute Remains Supply Constrained as Infrastructure Stocks Pull AheadTBPN

3. Model value is being contested by the operating layer: coding, workflow, and context.

The applied-AI frontier is shifting from “which model is smartest?” to “who owns the workflow, context, review loop, and decision trace?” The clearest examples now sit below the model layer: coding-agent habits, enterprise productivity instrumentation, and memory systems that let agents retrieve institutional precedent.

Each example moves the unit of value farther from raw model output and closer to the system that specifies, measures, reviews, and remembers work. That is the operating layer: the practices and infrastructure that turn model capability into repeatable organizational leverage.

Codex shows the shift inside software work. Matias Castello’s claim was not simply that AI writes code faster. It was that builder work is moving from manual execution into specification. In his workflow, the human describes intent, constraints, preferences, and review criteria; Codex plans, writes issues, implements, tests, and returns options for evaluation. The bottleneck becomes the clarity of the human’s instructions and the quality of the operating habits around the agent.

Castello’s examples were specific. He uses reusable skills, an agents.md file that encodes his preferences, Linear for milestones and issues, and feature flags for experiments. He can ask Codex to research competitors, propose features, implement them as modular experiments, and leave him with toggles to review in the morning. That workflow preserves human product judgment while reducing the manual cost of exploration.

The important change is that implementation becomes more delegable, but judgment does not disappear. Castello’s process is designed to reduce model guessing. When a model produces a bad result, he said, it usually had to make assumptions the user did not specify. His answer is not to trust the agent blindly; it is to encode preferences and constraints more explicitly.

Loblaw supplied the enterprise version of the same operating-layer story. Lauren Steinberg said ChatGPT Enterprise is available to every Loblaw colleague and described OpenAI tools changing both internal work and customer-facing retail flows. The strongest numerical claim was code generation: Loblaw’s internal data card said 46.9% of all code is now AI-generated, more than 70% in teams such as Digital Health, and more than 30% of engineers are seeing 2x productivity based on merged pull requests. Those are company-presented internal figures, not an independent audit, but they show what enterprises are beginning to measure.

46.9%
of Loblaw code shown as AI-generated in the company’s internal data

The “based on merged PRs” qualifier matters. It moves the discussion beyond tool usage toward a work artifact: merged pull requests. That is still an imperfect productivity measure, but it is closer to output than token volume. Loblaw also showed AI in customer flows, with a PC Express experience that moves from a dinner question to a recipe, local store selection, priced basket, and checkout path. In product imagery, Steinberg said ChatGPT Images lets Loblaw show products in more scenarios and on more models than time and cost previously allowed.

Together, Castello and Loblaw show two sides of the operating layer. One is the builder’s layer: skills, issue trackers, feature flags, review loops, and encoded preferences. The other is the enterprise measurement layer: generated code share, merged PRs, customer flows, product assets, and commerce conversion surfaces.

Zach Blumenfeld’s context-graph argument goes one layer deeper. If agents are going to operate in financial services, healthcare, support, compliance, or internal operations, document retrieval alone is insufficient. A normal knowledge base can tell an agent the current facts and relevant policies. A context graph is meant to retrieve precedents: how similar decisions were made, what caused them, what policies were applied, and what outcomes followed.

In Blumenfeld’s financial analyst example, a standard retrieval setup can identify risk factors in a credit-limit request. The context graph adds prior fraud flags, compliance rejections, velocity-check failures, account risk tier, employer risk, and similar historical decisions. The agent is not just retrieving a policy document; it is traversing a record of institutional judgment.

LayerWhat it controlsWhy it matters
Coding-agent workflowIntent, constraints, tasks, review, feature flagsMoves human work from implementation toward specification and evaluation
Enterprise deployment metricsAI-generated code, merged PRs, customer flows, generated assetsLinks tool use to visible work artifacts rather than token volume alone
Context graph memoryEntities, events, precedents, causal links, reasoning tracesLets operational agents retrieve how similar decisions were resolved, not only what policy says
The operating layer is where applied-AI value is increasingly being captured and governed.

This is why the model layer alone is not the whole prize. Enterprises need swappable models, but they also need memory, policy, identity, permissions, context, evaluation, and traceability. The organization that controls those layers controls whether AI is a clever assistant, a production workflow, or a decision-support system with institutional memory.

The same issue explains why ROI accounting is difficult. If a company cannot connect model usage to workflow outputs, context quality, review decisions, and downstream metrics, it cannot know whether token spend is leverage or noise. The operating layer is where that connection has to be made.

Codex Moves Builder Work From Coding to SpecificationOpenAI
Loblaw Says AI Now Generates 46.9% of Its CodeOpenAI
Context Graphs Let Agents Retrieve Precedents, Not Just PoliciesAI Engineer

4. Startups are being judged by paid operating impact, not AI novelty.

Giga’s story brings the same operating-layer shift into startup execution. Varun Vummadi’s argument was that AI startups do not win by saying “agent.” They win by moving a customer’s KPI in production.

Giga began with an edtech idea, pivoted into fine-tuning, and then found customer support through paying customer demand. Vummadi said the company discovered the real business from usage rather than abstract market analysis: customers were pulling Giga toward support and coding, and Giga chose support. That matters because it puts revenue and operational pain ahead of AI novelty.

For Vummadi, support automation is not primarily a chatbot problem. It is a policy, workflow, and KPI problem. Traditional IVR or chatbot systems might deflect 10% to 15% of calls before a human gets involved, he said. Giga’s AI support agents are meant to reach 60% to 70% deflection and, for some customers, eventually 90% to 95%. Those are Vummadi’s claims, but they identify the right measurement surface: resolution, deflection, customer satisfaction, and escalation.

60–70%
deflection rate Vummadi says Giga’s AI support agents can reach

The underlying system, in his telling, is a loop. A customer starts with some resolution rate, the team changes the policy or “markdown file,” dashboards show what moved, and the deployment is iterated until the KPI improves. The model matters, but the operating loop matters more: listen to customer calls and meetings, update policies, adjust dashboards, and keep pushing the metric.

That is why Vummadi’s next product idea is an “AI forward-deployed engineer.” Forward-deployed engineers currently attend meetings, configure systems, translate customer intent into software changes, and keep deployments improving. Giga wants an AI system that can join Slack and Google Meet, take notes, update policies, change dashboards, and help move metrics without requiring a human implementation team for every adjustment.

DoorDash was the proof point Vummadi emphasized. He said Giga had eight people when it competed against a well-funded rival with roughly 400 people and won the DoorDash account after a three-month pilot. He credited YC trust and an introduction from Garry Tan to Tony Xu as part of the opportunity, and he did not present the account as independently verified proof that small startups always beat large competitors. But the lesson he drew was consistent with the broader operating shift: paid performance in a real workflow can outweigh AI positioning.

Giga’s internal operating model mirrors that thesis. Vummadi estimated the company would need six to seven times as many engineers without coding agents. He said a smaller team reduces context switching because one person can own and build more of an entire system. But Giga’s hiring process still tests understanding: candidates are asked to build with AI, then change the code without AI access. The company wants both tool leverage and underlying competence.

That is a useful corrective to the looser agent narrative. Enterprise customers are not buying autonomy in the abstract. They are buying improved operating metrics. A support agent that sounds human but fails to resolve issues is not a success. A coding agent that generates pull requests nobody can review is not leverage. A context system that retrieves documents but cannot explain precedent is not operational memory.

Vummadi’s blunt startup advice was that ideas are cheap and payment is the test. “It’s never about the idea,” he said. “It’s about if somebody is willing to pay you money for it.” In applied AI, that standard is becoming more severe. Customers have seen demos. They are now asking whether the product changes the work.

Giga Says Product Velocity Beat a 400-Person Rival at DoorDashY Combinator

5. The governance fight is becoming a fight over agency: open models, swappable systems, and worker fluency.

The governance debate is no longer separate from enterprise AI operations. If companies rely on one or two model providers for coding, support, decision systems, customer flows, and institutional memory, ROI and policy controls become dependency questions. Who can inspect the system? Who can swap it? Who sets the rules? Who can run it locally? Who gets to learn the tools directly?

The All-In discussion framed that issue as centralization versus agency. The panel used Pope Leo XIV’s AI encyclical as a starting point because it argued that technology is never neutral; it takes on the characteristics of those who devise, finance, regulate, and use it. Jason Calacanis summarized the Pope’s warning as a question of whether AI concentrates power in a few hands or serves everyone.

David Sacks agreed that centralization of power is the core risk, but he located the danger differently. He warned that government regulation could become the centralizing force if AI safety becomes an approval regime or an “FDA for AI.” In his view, safety mandates could expand into censorship-like control, as he believes happened in social media trust and safety. His preferred checks were competition, decentralization, open models, and antitrust if monopoly or duopoly emerges.

Bill Gurley, Chamath Palihapitiya, Calacanis, and Sacks argued sharply about Anthropic’s safety posture, with Gurley presenting competing interpretations: regulatory capture, or a more ideological belief in building something superior to humans. Those claims were interpretations from the panel, not established facts. The more durable thread was that frontier-lab safety rhetoric, regulation, and market share are now entangled. A company that defines itself as safest can also benefit if rivals face heavier compliance burdens or if open models are restricted.

Open models became the practical backstop in the discussion. Sacks argued that open source means software freedom: the ability to run systems on one’s own hardware without surrendering data or dependency to a monopolist aligned with government. Calacanis called the broader idea “intelligence sovereignty,” distinguishing it from ordinary privacy. Privacy means a company cannot look at your files. Intelligence sovereignty means a company cannot be the sole interpreter of your photos, messages, work, or worldview.

The enterprise version is swappability. Palihapitiya described Fortune 1000 deployments where the control plane can hot-swap among models because customers do not want to bet critical workflows on one lab’s technology, terms of service, or policy philosophy. Gurley connected that to open connectors and context management, comparing the strategic role of open interfaces to Kubernetes. If connectors, memory, and data interfaces are standardized, models become more interchangeable.

Control plane is the governance object hiding inside the productivity debate. It decides which model is used, what data is sent, what limits apply, what is logged, what can be audited, and whether the enterprise can switch providers. The same control plane that helps a CFO manage token cost can also determine whether a company is locked into a model provider’s policy regime.

The labor debate had less agreement. Sacks argued AI would lead to net job gains and cited software-developer postings and code-commit growth as evidence that coding automation has not eliminated developer demand. Calacanis argued that CEOs are using AI to do more with less and that some jobs and layers will be displaced. Palihapitiya argued many layoffs attributed to AI are really “AI washing” for prior overhiring and mismanagement. Gurley was less categorical, emphasizing that innovation historically raises prosperity while still disrupting individuals.

But the practical workplace advice converged. Workers need AI fluency. Gurley said the immediate question is whether people become the most AI-enabled version of themselves. Sacks called proficiency in Claude a highly marketable skill for new graduates. Calacanis said younger applicants increasingly choose to build software rather than write memos. The shared point was that agency now depends partly on direct tool competence.

That brings governance back to the worker level. If AI systems are centralized behind opaque institutional controls, workers may become operators inside someone else’s automation regime. If workers learn to use models, build workflows, inspect outputs, and turn messy context into tools, they gain bargaining power and practical agency. Open models and swappable enterprise systems are one form of decentralization. Worker fluency is another.

The operating-layer question therefore has governance consequences. Token budgets decide who can use AI and how much. Compute scarcity decides who can build and deploy at scale. Coding agents decide who can turn intent into software. Context graphs decide what institutional memory agents can access. Control planes decide which models, policies, and logs govern the work. Open ecosystems and worker skill decide whether that layer is contestable.

AI Governance Fight Shifts to Centralization, Open Models, and Worker AgencyAll-In Podcast

The frontier, in your inbox tomorrow at 08:00.

Sign up free. Pick the industry Briefs you want. Tomorrow morning, they land. No credit card.

Sign up free