Only 18% of AI Coding Spend Is Shipping Into Products

Ranjan Roy, Alex Kantrowitz | Tuesday, June 2, 2026 | 17 min read

Alex Kantrowitz and Ranjan Roy argue that the warning signs around the AI boom are less about a single spending scare than about a widening gap between AI usage and demonstrable value. Kantrowitz focuses on enterprise token spending that is not translating into shipped products, while Roy warns that “token maxing,” circular cloud financing and private-market valuation anchors are turning a promising technology into a reflexive capital cycle. Their discussion extends that concern from Anthropic’s surge past OpenAI to Robinhood’s AI trading plans and new data-for-services bargains, all pointing to the same test: whether AI adoption can become disciplined before the financial structure around it outruns the returns.

The AI spending scare is not one problem

Alex Kantrowitz framed the current unease around enterprise AI spending as a question of whether the boom is showing ordinary growing pains or something more structurally fragile. The examples are already concrete enough to matter: companies hitting annual token budgets in a quarter, AI bills doubling or tripling, Microsoft reportedly canceling most Claude Code licenses partly over cost, Uber’s operations chief saying the costs are getting harder to justify, and Starbucks terminating an AI inventory-counting program nine months after deploying it across North American stores.

But Kantrowitz resisted treating every one of those examples as proof that the entire industry is on fire. Starbucks’ program, he noted, was a computer-vision inventory tool for a difficult physical environment, not necessarily a direct referendum on large-language-model productivity. Microsoft has its own competing coding tool, which complicates the meaning of its Claude Code pullback. That leaves Uber and unnamed enterprise anecdotes as more directly relevant signals. He did not dismiss the problem; he argued that the public reaction may be outrunning the evidence.

Ranjan Roy took the other side of the tension without becoming an AI skeptic. Roy said he believes in agentic AI and works with enterprise AI directly, but argued that the last several months of unfettered experimentation have produced predictable waste. Engineers were given access to Claude Code and similar tools, often without meaningful visibility into usage or spending, while organizations encouraged them to use AI aggressively. In that environment, token consumption was not merely a cost center; in some companies it became a status signal.

Roy’s phrase for the pathology was “token maxing”: using high-throughput or expensive models in ways that burn tokens because the organization has created incentives to do so. He tied that behavior to internal token leaderboards at companies such as Amazon and Meta, and to the broader bragging rights that came with spending heavily on Claude. He said he had recently heard senior technology people “bragging about how much they are spending on Claude,” a behavior he found strange because executives do not normally boast about operating expenses.

The most sensational claim discussed came from an AI consultant cited in coverage: one client allegedly spent half a billion dollars in a single month after failing to place usage limits on Claude licenses. Kantrowitz emphasized that the company was unnamed and that such claims should be treated with caution. Roy agreed that only a small number of companies could plausibly absorb a $500 million monthly error, but used the example to illustrate the scale of the incentives now flowing through AI usage.

Another reported example sharpened the same point: Roy said there had been reporting that one employee on Meta’s AI token leaderboard had used 60 trillion tokens, which he estimated would equal $900 million at current API prices. His argument was not that every AI dollar is waste. It was that isolated, extreme usage can be annualized into revenue narratives, reflected in funding rounds, and then transmitted across the entire AI supply chain.
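Roy's estimate implies a particular blended price per token, which can be backed out from the two reported numbers. The sketch below does only that division; the function name is illustrative, and the resulting rate is derived from the figures above rather than quoted from any price list.

```python
# Figures as reported in the discussion; the per-token rate is derived, not quoted.
TOKENS_USED = 60e12          # 60 trillion tokens
ESTIMATED_COST_USD = 900e6   # Roy's $900 million estimate


def implied_price_per_million(tokens: float, cost_usd: float) -> float:
    """Return the blended USD price per one million tokens implied by a total."""
    return cost_usd / (tokens / 1e6)


price = implied_price_per_million(TOKENS_USED, ESTIMATED_COST_USD)
print(f"Implied blended price: ${price:.2f} per million tokens")
```

A real bill would mix input, output, and cached tokens at different rates, so a single blended figure is only a rough check on the estimate.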

Kantrowitz drew a harder boundary. Even if token maxing is real, he did not believe Anthropic’s reported revenue growth could be explained mainly by wasteful leaderboard behavior. He cited a public list of Anthropic’s annualized revenue trajectory: $1 billion in January 2025, $3 billion in May, $4 billion in June, $5 billion in August, $7 billion in October, $8 billion to $10 billion in December, then $14 billion in February 2026, $19 billion in March, $30 billion in April, and $47 billion in May. His view was that accounting effects and extreme customer anecdotes may distort the picture, but cannot plausibly fake all of that demand.
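To put the trajectory Kantrowitz cited on a common scale, the endpoints can be converted into a multiple and an implied compound monthly growth rate. This is a minimal sketch using only the figures above; the helper name and the choice of endpoints are illustrative, and a smooth compound rate is an assumption that the actual month-to-month path does not follow.

```python
def compound_monthly_growth(start_bn: float, end_bn: float, months: int) -> float:
    """Return the constant monthly growth rate that takes start_bn to end_bn."""
    return (end_bn / start_bn) ** (1 / months) - 1


# Annualized revenue in billions of USD, as cited in the discussion.
JAN_2025_BN = 1
FEB_2026_BN = 14
MAY_2026_BN = 47

overall_multiple = MAY_2026_BN / JAN_2025_BN
# January 2025 to May 2026 spans 16 months; February to May 2026 spans 3.
full_period_rate = compound_monthly_growth(JAN_2025_BN, MAY_2026_BN, 16)
recent_rate = compound_monthly_growth(FEB_2026_BN, MAY_2026_BN, 3)

print(f"Overall multiple: {overall_multiple:.0f}x")
print(f"Implied monthly growth, full period: {full_period_rate:.1%}")
print(f"Implied monthly growth, Feb-May 2026: {recent_rate:.1%}")
```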

Roy did not say the revenue was fake. He said the way it is being interpreted may be dangerous. If companies are bragging about Claude spending, if internal leaderboards reward token consumption, and if annualized revenue curves are then used to justify enormous valuations, he argued, the industry has created a feedback loop before it has fully established the productive value of the spend.

The 18 percent shipping rate is the sharper warning sign

For Kantrowitz, the more serious number was not the half-billion-dollar anecdote or the token leaderboard. It was a statistic from EntelligenceAI, cited in the Wall Street Journal discussion: among companies using advanced AI coding tools, only 18% of token spending was translating into shipped coding products that reached real users. EntelligenceAI was described as aggregating data from more than 2,000 companies using advanced AI tools for coding.

Kantrowitz treated that as the real red light. In his framing, even if a minority of usage is pure token maxing, the industry still depends on the majority of AI spending producing measurable returns. If 82% of token spend is not resulting in shipped products, then the question is not whether a few engineers are gaming leaderboards. The question is whether enterprises can convert expensive model usage into output that customers, employees, and shareholders can see.

Uber became the practical example. Kantrowitz cited comments from Uber COO Andrew Macdonald, who said that higher token usage did not map neatly onto a proportional increase in useful consumer features. Macdonald’s formulation, as quoted in the source, was that “that link is not there yet,” and that it remained hard to draw a line from token-usage statistics to a claim such as, “now we’re actually producing 25% more useful consumer features.”

Kantrowitz said the reaction online split predictably. AI critics treated Uber’s comments as proof that the technology is overhyped; AI boosters treated them as an embarrassment for Uber, a “skill issue” rather than a technology issue. Kantrowitz’s position was more direct: if Uber, a technology company, is having trouble connecting AI use to shipped customer-facing work, less technical companies should not assume they will solve the problem easily.

Roy accepted that the 18% figure was a more serious concern than token maxing, but he interpreted it differently. He argued that for a four-to-six-month-old wave of enterprise experimentation, an 18% conversion rate may not be inherently irrational. New technology often requires wasteful exploration before organizations learn which workflows are valuable. The healthy version of that process, in Roy’s view, would be to identify the 18% that is working, reinvest there, and optimize the rest.

His own example was technical and specific. In enterprise agentic workflows, a team might initially pass a giant CSV file into the model as context, consuming large numbers of tokens. Later, it might convert the data into JSON, split it into multiple JSON files, and retrieve only the necessary chunks. Roy said changes like that can reduce token consumption by 70% or 80% in a process. To him, that is what responsible enterprise AI adoption should look like: build, measure, learn, and reduce waste.
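The pattern Roy describes can be sketched roughly as follows. Everything here is an assumption made for illustration: the synthetic data, the column names, the grouping key, and the four-characters-per-token estimate, which stands in for a real tokenizer. The saving comes from sending only the chunk a task needs; JSON itself is more verbose per row than CSV.

```python
import csv
import io
import json
from collections import defaultdict


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: roughly four characters per token."""
    return max(1, len(text) // 4)


def build_sample_csv(regions: list[str], rows_per_region: int) -> str:
    """Generate a synthetic CSV so the example is self-contained."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["region", "sku", "units"])
    for region in regions:
        for i in range(rows_per_region):
            writer.writerow([region, f"SKU-{i:04d}", i * 3])
    return buffer.getvalue()


def split_into_json_chunks(csv_text: str, key: str) -> dict[str, str]:
    """Group CSV rows by `key` and serialize each group as its own JSON document."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row[key]].append(row)
    return {name: json.dumps(rows) for name, rows in groups.items()}


regions = [f"region-{n:02d}" for n in range(10)]
full_csv = build_sample_csv(regions, rows_per_region=200)
chunks = split_into_json_chunks(full_csv, key="region")

# Before: the whole file goes into the prompt. After: only the chunk the task needs.
tokens_before = estimate_tokens(full_csv)
tokens_after = estimate_tokens(chunks["region-00"])
reduction = 1 - tokens_after / tokens_before
print(f"before={tokens_before} after={tokens_after} reduction={reduction:.0%}")
```

How much is saved depends on how many chunks a given task has to retrieve. A question that spans every region would see little or no benefit from this split.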

Kantrowitz’s concern was that the financing and cultural environment around AI may not allow such patience. The issue is not experimentation in itself. It is experimentation under pressure from executives, investors, and public narratives that already assume a productivity revolution is underway. If employees are told to use as much AI as possible and the company then looks for immediate productivity gains four months later, disappointment is almost built in.

Roy agreed that the organizational setup matters. If a company tells its entire engineering team to use AI aggressively but gives no strategic direction, no cost visibility, and no clear definition of successful output, it should not expect a clean return-on-investment story in a few months. He also argued that some failed case studies are being cherry-picked. Starbucks’ computer-vision inventory tool, for example, attempted to infer milk, syrup, sauces, and other inventory states from photos in chaotic stores. Roy contrasted that with more established AI applications such as demand planning and inventory forecasting, arguing that the Starbucks case looked more like a press-release-driven project aimed at a hard physical-world problem than a fair proxy for enterprise AI as a whole.

Kantrowitz added another mitigating factor from Simon Willison, whose view was shown on screen: Claude Code only “got really good” in November, so budgets set in 2025 would predictably underestimate 2026 demand. Willison argued that Uber’s budget overrun and Microsoft’s seat cancellations may support a “product-market fit” hypothesis: customers are shocked by price, but still find the product useful enough to say yes or to renegotiate.

Roy agreed that the time window is too short for sweeping conclusions. But he stressed that useful tools can still lead users down expensive rabbit holes. Anyone who has used them, he said, can “crank tokens and end up nowhere.” The enterprise problem is what happens when that individual behavior is multiplied by thousands of workers under pressure to demonstrate AI adoption.

Kantrowitz floated a third possibility: perhaps the tools are useful enough that engineers can put parts of their jobs on autopilot, still ship the same features they were assigned, and thereby fail to produce visible productivity gains for management. In that scenario, the technology works, the employees adapt around it, and leadership sees only a higher bill.

Roy’s broader conclusion was not that the 82% of spend that has yet to reach shipped products proves AI has no value. It was that “we can’t have nice things” because the industry did not approach the technology as a measured learning process. It approached it as a capital markets event.

Circular financing turns usage growth into systemic risk

The cost question becomes more consequential because of how AI infrastructure is being financed. Kantrowitz described a circular structure in which major cloud companies invest in AI labs, the labs spend heavily on compute from those same cloud providers, the cloud providers book the spending as revenue, and the valuation gains in the AI labs then appear in reported profits or paper gains.

Roy said this structure has been visible for some time, beginning with earlier discussions of cloud credits embedded in large funding rounds. What has changed is the scale. Microsoft invested $13 billion in OpenAI, and much of that flowed back to Microsoft through Azure usage, while Microsoft also benefited from the value of its OpenAI stake. Amazon and Google invested in Anthropic, while Anthropic became a major cloud customer.

Roy cited striking quarterly profit effects. Alphabet, Google’s parent, reported $62.6 billion in profit, of which $28.7 billion was described as a paper markup on Anthropic. Amazon reported $30.3 billion in profit, including a $16.8 billion Anthropic paper gain. Roy was careful not to call this unethical; he said companies may be required to recognize the markups. His concern was reflexivity. When roughly half of a reported profit number (about 46% for Alphabet and 55% for Amazon on these figures) depends on a private AI valuation, and that valuation is itself supported by a capital cycle involving the same narrow group of companies and investors, a downturn could move quickly through the system.
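The proportions behind that concern follow directly from the reported figures. A minimal sketch, with illustrative names, that computes the paper-gain share of each profit number and what remains without it:

```python
# Quarterly figures as cited in the discussion, in billions of USD.
REPORTED = {
    "Alphabet": {"profit_bn": 62.6, "anthropic_gain_bn": 28.7},
    "Amazon": {"profit_bn": 30.3, "anthropic_gain_bn": 16.8},
}


def paper_gain_share(profit_bn: float, anthropic_gain_bn: float) -> float:
    """Fraction of reported profit attributable to the Anthropic markup."""
    return anthropic_gain_bn / profit_bn


shares = {name: paper_gain_share(**row) for name, row in REPORTED.items()}
profit_ex_gain = {
    name: row["profit_bn"] - row["anthropic_gain_bn"]
    for name, row in REPORTED.items()
}

for name in REPORTED:
    print(
        f"{name}: {shares[name]:.1%} of profit from the markup, "
        f"${profit_ex_gain[name]:.1f}B without it"
    )
```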

Kantrowitz pushed on the accounting point. If Google invested in Anthropic at a low valuation and Anthropic is now valued near $900 billion, why should Google not mark up the value of its stake? Roy conceded the accounting logic. The problem, he said, is not a cabal of executives inventing fake profits. It is a small, socially and financially connected ecosystem in which everyone can see others making similar investments, receiving similar cloud commitments, and benefiting from similar valuation marks. Groupthink can become a financing model.

The forward-looking commitments may be even more fragile. Kantrowitz cited another set of figures: Microsoft has 49% of its $627 billion future backlog tied to OpenAI, while Oracle has 54% of its $553 billion pipeline depending on OpenAI alone. His point was that there is no guarantee OpenAI will ultimately spend all that money. AI can be a blessing for these companies, but dependence on one AI lab can become a curse if demand does not arrive as expected.
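In dollar terms, those percentages work out as sketched below. The names are illustrative, and the calculation assumes each cited percentage applies to the full cited total.

```python
# Figures as cited in the discussion, in billions of USD.
BACKLOGS = {
    "Microsoft": {"total_bn": 627, "openai_share": 0.49},
    "Oracle": {"total_bn": 553, "openai_share": 0.54},
}


def openai_exposure_bn(total_bn: float, openai_share: float) -> float:
    """Dollar value of a backlog or pipeline that depends on OpenAI."""
    return total_bn * openai_share


exposure = {name: openai_exposure_bn(**row) for name, row in BACKLOGS.items()}
combined_bn = sum(exposure.values())

for name, value in exposure.items():
    print(f"{name}: about ${value:.0f}B tied to OpenAI")
print(f"Combined: about ${combined_bn:.0f}B")
```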

Roy said those contracts assume end demand from the rest of the world beyond the small circle of AI labs, cloud providers, chip companies, and investors. That is where the enterprise productivity debate loops back into the financing debate. If companies discover that large shares of token spending do not ship into real products, or if AI adoption slows as budgets tighten, then the cloud backlogs, chip demand, private valuations, and paper gains all face pressure at once.

The anticipated IPO wave adds another layer. Roy connected the enormous private valuations of companies such as SpaceX, OpenAI, and Anthropic to the amount of public-market capital that would be required to absorb them. He referred to a CNBC comment from Josh Brown describing the coming capital needs as “three asteroids coming to hit the Earth.” The concern is not simply that private investors want liquidity. It is that valuations anchored in late-stage private rounds may be handed to public investors before the underlying economics are fully tested.

The boom has already moved into bottlenecks

The AI boom is not confined to model labs and cloud providers. Kantrowitz highlighted a Wall Street Journal claim that AI has made memory chips “more valuable than oil.” The three largest memory-chip makers — Samsung Electronics, SK Hynix, and Micron Technology — were described as carrying market capitalizations above $1 trillion each, together about 22% above the combined market capitalization of the world’s three most valuable oil companies, even with Saudi Aramco near $1.8 trillion. Flash-memory maker SanDisk had nearly tripled since March and was said to be worth nearly as much as PetroChina.

Kantrowitz called the memory-chip numbers “bananas” and suggested they would have to fall eventually, while adding that this was not investment advice. Roy treated the memory boom as the kind of second- and third-order effect he had been warning about. AI systems produce and consume enormous amounts of data, so demand for memory was not surprising. What unsettled him was the speed and magnitude of the market reaction.

His anecdote from the investment community was telling. Friends do not ask him primarily about adoption patterns, business-process improvements, or whether enterprise AI is producing ROI. They ask: “What is the next bottleneck?” In his view, that question captures the speculative structure of the current cycle. Investors who missed the memory-chip move are looking for the next constrained component in the AI value chain: cooling, data-center infrastructure, or some other niche input that can become a bottleneck.

Kantrowitz left open the possibility that real economic forces are driving some of this. The AI companies may genuinely need huge quantities of memory chips and related infrastructure to build the systems they are promising. Roy did not deny that. His argument was about market health: when the dominant question becomes identifying the next bottleneck rather than measuring end-user value, the boom has shifted from technology adoption into supply-chain speculation.

Anthropic’s valuation lead is both operational and financial

Anthropic’s new financing crystallized the same debate. Kantrowitz cited the New York Times report that Anthropic had raised $65 billion at a $900 billion pre-money valuation, officially passing OpenAI’s last valuation of $730 billion and becoming the world’s most valuable AI startup. The company also introduced Claude Opus 4.8, described in the on-screen excerpt as significantly better than its predecessor at generating computer code.

Kantrowitz saw the milestone as remarkable. A year earlier, he said, Anthropic was valued around $350 billion; few would have expected it to pass OpenAI so quickly. The rise, in his view, came largely on the back of Claude Code. Anthropic is the hottest company in AI, with a clear shot at becoming one of the most successful IPOs among the AI labs, even if SpaceX is treated as a separate category.

Roy cautioned against counting OpenAI out, but read the Anthropic round as a classic late-stage financing signal. In his view, investors such as Altimeter, Dragoneer, and Sequoia have long used pre-IPO rounds to establish a valuation anchor for the public market. What used to be a guarded secret — a private company’s valuation — has become part of the sales process. The number is repeated until the market treats it as the reference point.

Kantrowitz noted that OpenAI had raised $122 billion and reached only an $852 billion post-money valuation, implying that Anthropic’s financing represented a real symbolic shift. Roy’s response was that everyone now needs to see the S-1 filings. He cited SpaceX’s filing, discussed previously, as “almost shockingly not good” despite its implied $1.8 trillion valuation. His broader point was that valuation anchoring can shape expectations before public investors see the underlying financials.

When Kantrowitz called the argument a “nice conspiracy,” Roy rejected the word. “It is just finance,” he said. A higher private valuation would ordinarily be something new investors resist because it buys them less ownership. But when late-stage investors are already positioned for an IPO, Roy argued, they have every incentive to support a high anchor if it helps realize gains in the public market.
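The ownership point can be made concrete with the round sizes cited earlier. This sketch assumes the standard convention that post-money valuation equals pre-money valuation plus new capital, and that the $730 billion figure cited for OpenAI is a pre-money number. It ignores share classes, secondary sales, and other deal terms, and the function names are illustrative.

```python
def post_money(pre_money_bn: float, raised_bn: float) -> float:
    """Post-money valuation under the simple pre-money-plus-raise convention."""
    return pre_money_bn + raised_bn


def pre_money(post_money_bn: float, raised_bn: float) -> float:
    """Pre-money valuation implied by a post-money figure and the raise."""
    return post_money_bn - raised_bn


def new_investor_stake(raised_bn: float, post_money_bn: float) -> float:
    """Fraction of the company the new money buys."""
    return raised_bn / post_money_bn


# Figures as cited in the discussion, in billions of USD.
anthropic_post_bn = post_money(900, 65)    # 965
openai_pre_bn = pre_money(852, 122)        # 730

anthropic_stake = new_investor_stake(65, anthropic_post_bn)
openai_stake = new_investor_stake(122, 852)

print(f"Anthropic post-money: ${anthropic_post_bn}B, new stake {anthropic_stake:.1%}")
print(f"OpenAI pre-money: ${openai_pre_bn}B, new stake {openai_stake:.1%}")
```

On a like-for-like post-money basis, the comparison is $965 billion against $852 billion, and the new money in Anthropic's round buys a smaller slice of the company than OpenAI's did.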

The Anthropic-versus-OpenAI comparison therefore has two meanings at once. Operationally, Anthropic has gained momentum through Claude Code and enterprise usage. Financially, its valuation has become part of a larger pre-IPO race in which private-market marks, cloud commitments, and public-market expectations are tightly intertwined.

Robinhood’s AI trading feature points to the agentic endgame

Robinhood’s new AI feature moved the discussion from enterprise costs to consumer agency. The Wall Street Journal report, shown on screen, said Robinhood would allow users to link an AI agent such as Anthropic’s Claude or the coding agent Cursor to a separate dedicated investment account. The agent could access dedicated funds and place trades as directed. Users might ask it to reduce concentration risk, monitor semiconductor stocks, or invest $100 according to a detailed strategy based on startup funding, deal activity, and private-company valuations. Options trading was not included — “yet,” Kantrowitz noted.

Roy treated the timing as narratively perfect. If the AI capital cycle ends with retail investors buying into the same assets through AI-directed Robinhood accounts, it completes the loop. But after initially mocking the idea, he conceded the underlying product logic. An agent probably can trade better than an ordinary person who is not deeply focused on markets, especially if given clear constraints. In that sense, Robinhood’s feature may be an evolution of algorithmic wealth-management products such as Betterment.

Kantrowitz made the broader claim: this will not stop at Robinhood. Once users begin researching stocks in ChatGPT or Claude, the chatbot itself will have reason to offer a strategy, connect to a bank account, allocate money, monitor performance, and ask whether the user wants to invest more. The same pattern applies beyond investing. Chatbots will try to infer the next action after a conversation and keep the user inside the chat experience while agents or computer-use tools complete the task elsewhere.

Roy said early versions of that behavior are already visible. Claude and Gemini sometimes respond to simple requests by building interactive experiences or dashboards, going beyond the immediate question. If the systems gain access to more accounts and services, they will try to do more. The optimistic version is that the agent anticipates user needs and executes well. The risky version is that users grant broad access before the consequences are clear.

Kantrowitz’s own Gmail example illustrated the tradeoff. While trying to determine whether his business had a particular identifying number, he asked ChatGPT. The system suggested the answer might be in his Gmail and surfaced a Gmail connector inside the chat. Kantrowitz connected it; ChatGPT searched his email, found incorporation documents, and answered the question. He later asked how much he had paid for a flight, and it retrieved the ticket price from Gmail.

Roy found that more unsettling than the Robinhood feature. Giving OpenAI access to email, he said, is “kind of terrifying.” Kantrowitz acknowledged the privacy concern but argued that the direction is clear. If ChatGPT can draft an email, and it has Gmail access, the next step is drafting inside Gmail, then sending the email, then monitoring the replies. The product pressure is toward completing the workflow.

Free services are becoming data-collection bargains

The final privacy example was more physical. Roy described Shift, an AI training startup that offers to clean homes for free in exchange for recording cleaners as they scrub, vacuum, dust, tidy, and wash, using the footage to train robots. The Verge excerpt shown on screen framed the catch plainly: the home cleaning is free because the cleaning session becomes robot-training data.

Roy said the company indicated it would scrub personally identifiable or private data, though he emphasized the difficulty of doing that reliably in a home environment. Kantrowitz imagined future headlines about intimate or sensitive footage being stored by a free-cleaning AI startup, drawing a comparison to earlier concerns about robot vacuums and data labeling. Roy narrowed the concern to documents, contracts, and other private materials that might be visible in a home during recording.

Roy also noted that when he asked online whether the service was real, the founder responded that it was, saying the company had already served some customers over the previous weeks and that globally more than 10,000 contributors had collected “skill demonstrations.” Roy found that phrase revealing: a person cleaning a home becomes a demonstrator of a skill for a future robot.

Neither speaker treated the offer as obviously irrational for consumers. Roy said he was tempted mainly out of curiosity — he wanted to see whether the cleaner would wear a GoPro, a camera rig, or something closer to a motion-capture setup. Kantrowitz suggested that if he was already willing to connect Gmail to ChatGPT, perhaps inviting a recorded cleaner into his home was just the next step. Roy replied that, at that point, “it’s all gone anyways.”

The through-line was not that every user will reject these bargains. It was almost the opposite. AI products are becoming useful enough that people will trade privacy, autonomy, and data access for convenience. The Gmail connector, Robinhood’s agentic trading account, and Shift’s free cleaning service all point to the same direction: AI systems need permissions, data, and real-world demonstrations, and consumers will be offered services in exchange for providing them.
