Coding Is AI’s First Breakout Market, but Value Capture Remains Unsettled

Benedict Evans · a16z · Monday, June 8, 2026 · 23 min read

Tech analyst Benedict Evans argues in an a16z interview with Erik Torenberg that AI now looks less like a solved platform shift than a market with one clear breakout use case: coding. Evans says agentic software development has reached real product-market pull, while larger questions about consumer adoption, enterprise workflows, model differentiation, infrastructure spending and value capture remain unresolved. His central case is that AI resembles the internet in 1997: obviously important, already useful in places, but still too early to know which layer of the stack will own the economics.

AI has found one market that is pulling hard, and most of the rest is still unresolved

Benedict Evans treats the past year’s biggest change in AI as narrower than the industry’s overall rhetoric. The open questions have not been answered. One use case, however, has become unmistakably real: agentic coding.

The prior state, in his account, was “kind of sort of working and kind of exciting,” with the industry unsure what to do with the technology. Now attention has narrowed to software development because it has “absolute product market fit” in the practical sense: customers are pulling the product out of vendors’ hands. Evans does not argue that coding will be the only meaningful use case. He says there will almost certainly be others. But coding is the one that is working now.

Agentic coding went from being kind of useful to really changing everything.

Benedict Evans

That result was partly foreseeable and partly not. At a simple level, the first people experimenting seriously with large language models were software developers, and the first thing software developers would try to make work was software development. Evans compares this to the early PC era: the first thing people did with PCs was make computers. In his framing, LLMs are themselves a kind of computer, and the first thing people are doing with them is making more compute.

But he does not think anyone could have deterministically predicted the timing, or that coding would cross the line first in precisely this way. The move from assistance to agentic work happened quickly. Six months earlier, he says, it did not work in the same sense. That makes claims about downstream labor structure premature. Companies are now asking real questions about junior engineers, senior engineers, team organization, and what entry-level work is for if a class of tasks can be automated. The questions have become concrete; the answers have not.

The implications for software engineering careers will take years to settle. It is now realistic to ask whether companies should hire junior engineers, what those engineers would do, and whether the old justification for hiring them was the actual tasks they performed or something else. Evans rejects confident claims about the market structure of engineering in three years. “You’d be insane,” he says, to think you could know that yet.

The same uncertainty applies to the broader AI market. Top-line numbers are rising: models are getting bigger, capex is growing, usage is growing, and more people are using the tools. But Evans says the fundamental questions from two or three years ago remain open. It is not known whether there will be a winner in models. It is not known whether model companies can capture value up the stack. It is not known how much models can do. And with the current technology, he does not see a clear path for consumers to use AI daily rather than weekly.

That distinction matters because Evans sees a gap between Silicon Valley power users and everyone else. On one side are people running AI tools constantly, including those who have clusters of Mac Studios and use Claude all day. On the other side are people who say the tool is useful, but they used it last week for something. Coding, in his account, is where that gap has most clearly been crossed.

OpenAI and Anthropic illustrate a strategic split, not a settled platform outcome

Benedict Evans describes OpenAI’s recent history as a series of product-strategy iterations: from “everything all at once yesterday” to a more concentrated push around coding. Anthropic, with less capital raised, focused on coding and “got coding working,” though Evans explicitly leaves open whether that was deliberate strategy or a fortunate discovery.

The contrast matters because Evans sees the AI market moving beyond the first phase of competition, when the dominant imperative was simply to build a bigger model faster with more compute. There is now more visible divergence in product strategy and more competitive tension around what to build on top of models. But that divergence does not prove that any company has found the final platform structure.

OpenAI’s challenge, as Evans describes it, was the question of what comes after the models themselves. If the model exists, what else can be built around it to create value? Evans characterizes one phase of OpenAI’s strategy as almost literally asking ChatGPT for 15 ideas and doing all of them. The line is pointed, but the underlying claim is structural: a foundation model alone does not determine a product strategy.

Anthropic’s coding success gives one answer, but only one. The things working right now are software development and “some things in some other fields.” Beyond that, many people are experimenting at the edges. Enterprises, especially outside tech and outside the United States, are often looking at narrow point solutions rather than broad chatbot adoption.

His example is a commodities company that wants to use LLMs to improve cash-flow forecasting. The company works with many small producers and does not always know when invoices will be paid. Because the business is low margin, better prediction matters. Asking ChatGPT or Claude to summarize the week’s meetings is one mode of use; applying LLMs to a specific business-process problem with direct operational consequences is another.

In some contexts, the challenge is not asking a user to figure out what to do with a new tool. It is identifying a known business problem and building or buying a system that can solve it.

The right analogy is not a prediction machine

Benedict Evans repeatedly compares the current AI moment to the internet in 1997 or 1998, the PC market in the late 1970s and early 1980s, and mobile data around 2009 and 2010. But the comparisons are useful for asking questions, not for predicting the answer.

In early platform shifts, the technology is exciting but unclear. It does not quite work yet, and it is not obvious what it is for. Early PCs required users to tolerate crashes, hardware configuration, and missing infrastructure. Evans recalls a time when it was normal for a computer screen to freeze and for the user to crawl under the desk, unplug the machine, and hope some work survived. Sound cards could cost $300 and take a weekend to make work. Early internet access required TCP/IP software on a floppy disk. Mobile data had similar friction.

The present AI market has the same gap between the small number of people willing to put in work to make something function and the mass-market product where a user presses a button and it happens.

The mobile-data analogy is especially important because it connects adoption, pricing, infrastructure, and value capture. In the late 2000s, users could receive $5,000 or $10,000 data bills. At the same time, flat-rate plans could overwhelm networks. AT&T/Cingular launched the iPhone with flat-rate data; then people bought iPhones, used 3G, watched YouTube, and the network lacked capacity. Evans emphasizes that cellular networks have marginal costs: operators had to add capacity, and capacity cost money.

He sees the same pattern in AI tokens. One user pays $20 a month and receives what might be $10,000 worth of token consumption; another experiments for a few days and receives a $10,000 bill. Pricing, perceived value, usage, and underlying cost are misaligned.
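The size of that mismatch is easy to put in numbers. The following is a minimal sketch in Python that uses only the two figures from Evans's example (a $20 monthly price and roughly $10,000 of token consumption). The function name is illustrative, and how the $10,000 of consumption is valued is left as Evans states it.

```python
def consumption_per_dollar(monthly_price_usd: float, tokens_consumed_usd: float) -> float:
    """Dollars of token consumption a user receives per dollar paid."""
    return tokens_consumed_usd / monthly_price_usd


# Figures from Evans's example: a $20/month plan and about $10,000 of usage.
heavy_user_ratio = consumption_per_dollar(20, 10_000)

print(f"Heavy flat-rate user receives {heavy_user_ratio:.0f}x what they pay")
```

A metered user with the same consumption sees the whole $10,000 on the bill, which is the other half of the misalignment Evans describes.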

1,500–2,000x

Evans’s spoken estimate of mobile-data traffic growth since the late 2000s

The second part of the mobile analogy is value capture. Evans says mobile data traffic rose by roughly 1,500 to 2,000 times. He describes mobile networks collectively as having about $1 trillion in revenue and spending about $200 billion a year on capex, while their stocks have been flat for 20 years. They built global infrastructure that changed daily life, but “all the cool stuff” was built by someone else.

That is the unresolved question for LLMs. Do foundation models become the infrastructure layer, like mobile networks, ISPs, chips, or cloud infrastructure? Or do they become something more like operating systems, with leverage, network effects, and the ability to decide what gets built?

History does not answer that. It suggests the question. Chip companies did not capture all the value. ISPs did not. Mobile network operators did not. Windows and iOS did capture value, but they had levers up the stack and network effects that Evans does not currently see in models. Netscape is another warning: Marc Andreessen famously said Netscape would reduce Windows to “a set of badly debugged device drivers,” but browsers turned out not to be where all the value sat.

The broader methodological point is that platform comparisons are useful but not predictive. In hindsight, outcomes look inevitable. At the time, they usually were not. Fifteen years ago, Evans says, many smart people thought iPhone versus Android was simply open versus closed again and that Android would crush the iPhone. That did not happen, even if it can now be explained.

Foundation models look more like infrastructure than products

Benedict Evans does make one substantive bet: he does not think foundation models are products, and he does not think the chatbot is the product. He expects value to sit further up the stack.

His argument has several parts. First, he does not see a clear path for one model to be sustainably and fundamentally better than all the others in a differentiated way. Models may have different emphases. One may be better for a task, or a user may prefer one. But he does not see the equivalent of Instagram, YouTube, or Google Search: a position with a durable network effect or structural leverage.

Second, the chatbot is a limited V1 user interface. It works well for some tasks and some people, but most work requires more than a prompt box. It needs tooling, the right data, configuration, controls, and a purpose-built interface. Someone needs to sit down and decide how the workflow should work.

Evans draws a distinction between people who are skilled at a job and people who are skilled at designing tools for that job. Great print designers are not necessarily the people who should create InDesign. Great financial advisers are not necessarily the people who should design TurboTax. The same logic applies to AI applications: the user who performs the work may not be the person best equipped to define the software abstraction.

Third, Evans doubts that model companies can build all the applications themselves. Microsoft and Apple did not build every Windows or iPhone app. If AI requires many dedicated, horizontal, and vertical applications, then the foundation model provider may sit underneath them rather than above them.

Enterprise procurement makes the point. A law firm buying software generally does not choose a SaaS product according to whether it runs on Claude or OpenAI underneath, as though the firm had “standardized on Claude.” That is not how cloud works either. A customer often does not know what cloud a SaaS product runs on, because that infrastructure has been abstracted away. Evans thinks foundation models may look more like hyperscalers in that sense: important, expensive, competitive, but without direct control over the application-layer customer.

Layer or analogy	Evans’s point	Implication for AI models
Mobile networks	Built essential infrastructure, but much of the value moved up the stack	Model providers may enable the market without controlling the most valuable applications
Cloud providers	Often abstracted away from the SaaS buyer	Enterprise customers may not care which model sits underneath a product
Operating systems	Captured value through leverage and network effects	Models would need comparable control or network effects to resemble this layer
Semiconductors	Each generation can become more expensive, leaving fewer players	Frontier models may consolidate even if their output is commoditized

Evans’s analogies separate infrastructure importance from value capture.

That does not mean Evans is certain models become commodities. He frames his position as a chain of reasoning: if models are not sustainably differentiated, if chatbot UI is inadequate, if applications need to be built above models, and if multiple model suppliers compete to sell similar capabilities, then the default expectation should be commoditization. He asks the opposing question: explain why that will not happen.

The price environment today, in his view, should not be mistaken for the long-term structure. Current demand for tokens is intense because supply is scarce. But extreme scarcity is transitory. There is a surge of capex, model efficiency is improving dramatically, new models arrive, and developers can switch providers. Mobile data also had effectively infinite demand, but telecoms still ended up in price wars because they were selling a commodity to customers willing to move.

Evans allows for ways he could be wrong. The world might end up with only two companies capable of making frontier LLMs, and those companies might have pricing power. Models might subsume more of the application layer than he expects. Model companies might find leverage up the stack. But the present scarcity cannot be extrapolated indefinitely.

The capex curve is running into financial gravity

Benedict Evans frames AI infrastructure spending as a “financial gravity problem.” In his spoken estimate, Microsoft, Meta, and Google are in line to spend more than 50% of revenue on capex this year. Telecoms, which are considered capital intensive, spend roughly 15% to 20% of revenue on capex.

Evans cites $700 billion as guidance from the big four companies for the year. He compares that with telecom capex of about $300 billion overall, mobile capex of about $200 billion, and oil and gas capex somewhere between $700 billion and $1 trillion depending on definitions and who is counting. His point is magnitude, not precise accounting: in his framing, $700 billion a year is the scale of major global infrastructure.

$700B

Evans’s cited annual capex guidance from the big four companies
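The comparison can be checked with the round numbers Evans gives. The sketch below, in Python, computes capex intensity (capex divided by revenue) for mobile networks from his earlier figures of about $200 billion in capex on about $1 trillion in revenue, then expresses the $700 billion guidance as a multiple of the telecom and mobile totals. All inputs are his spoken estimates, and the function and variable names are illustrative.

```python
def capex_intensity(capex_usd_bn: float, revenue_usd_bn: float) -> float:
    """Capex as a share of revenue."""
    return capex_usd_bn / revenue_usd_bn


# Mobile networks, per Evans: about $200B capex on about $1T revenue.
mobile_intensity = capex_intensity(200, 1_000)

# Big-four guidance versus the other capex totals Evans cites (USD billions).
ai_capex_guidance = 700
comparisons = {"telecom capex": 300, "mobile capex": 200}
multiples = {name: ai_capex_guidance / value for name, value in comparisons.items()}

print(f"Mobile capex intensity: {mobile_intensity:.0%}")
for name, multiple in multiples.items():
    print(f"AI guidance is {multiple:.1f}x {name}")
```

On these figures mobile lands at the top of the 15% to 20% range Evans gives for telecoms, well below the share of revenue above 50% that he attributes to Microsoft, Meta, and Google.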

The large technology companies could not spend $1.5 trillion next year without borrowing, Evans says, and they could not sustain that level for long. At some point, growth in capex has to slow because there is no more money available. He does not claim to know the exact ceiling. He is more confident in the existence of a ceiling than in the number.

The difficulty is that the investment case is both economically compelling and strategically existential. For Google, Meta, Microsoft, Amazon to some extent, and Apple to some extent, AI may be the future of compute. If it is, they cannot sit out. Evans compares the risk to Microsoft in the 2000s, IBM in the 1990s, and Meta in the 2010s being constrained by Apple. No CEO wants to let a rival define the next platform.

At the same time, CFOs have to ask how much participation is required. The returns on current investment may look hugely positive, but the infrastructure race has multiple unstable variables: demand exceeds supply; efficiency is increasing; no one knows the next model’s capabilities; edge and open source may matter more later; and a model may stay relevant for only three to six months, or six to nine, before the next frontier system arrives.

Attempts to model AI infrastructure economics remind Evans of trying to model internet bandwidth in the late 1990s. The spreadsheet rows are visible, but the values are uncertain. The only reliable statement is that physical and financial limits exist.

He is explicit about the upper bound: the world cannot spend $10 trillion a year on AI infrastructure because there is not $10 trillion a year available to spend. The sentence is meant to puncture a style of reasoning that treats demand as the only relevant variable. Demand may be immense. But capital supply, infrastructure buildout, model efficiency, and pricing all still have to reconcile.

That is why the capex argument and the commoditization argument are linked. Scarcity supports today’s pricing. Massive investment is meant to relieve scarcity. If supply rises, efficiency improves, and several providers compete to sell similar tokens, the question becomes where durable pricing power comes from.

ROI will be hard to measure before it becomes unavoidable

Benedict Evans expects a reckoning around wasteful token use, but he does not reduce the issue to companies being foolish. At this stage, it is genuinely hard to know the return on investment.

Some users are clearly “using the most expensive model to dick around on the internet,” just as mobile users once discovered that supposedly flat-rate behavior could produce large data bills. But the deeper issue is disequilibrium: cost, pricing, usage, and ROI are not yet aligned.

A chart from Evans’s presentation, attributed on-screen to the Atlanta Fed and Baslandze et al., divides early AI benefits into categories that are easier to deploy but harder to measure, versus harder to deploy but easier to measure. The slide is titled “What works first?” and describes survey results from U.S. CFOs in December 2023, separating 2023 results from 2024 expectations. Productivity, better insights and decision-making, and better customer service sit under the “easy to deploy, hard to measure” side. Cost-saving and new revenue sit under “hard to deploy, easy to measure.”

Evans’s interpretation is that the first benefits are real but financially slippery. If employees make more slides faster, do analysis more quickly, or improve support quality, the business may benefit, but the value is not as cleanly measured as a new AI-driven revenue line or a directly eliminated cost.

Excel provides his model for how productivity gains can become competitive necessities rather than durable margin. If a discounted cash flow analysis takes a week, a person might do one or two. If it takes 10 seconds, they might do 50. That does not necessarily mean the firm can charge more. It may mean everyone does more analysis, and the productivity gain is competed away.

The consulting example follows the same logic. If Bain, BCG, or McKinsey can do a piece of analysis in a day that used to take a week, they may do five times more analysis and charge the client the same. Their cost base may not change proportionally. The client receives more work, the consultant remains competitive, and the surplus may not show up as higher margins for the firm using the tool.
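The arithmetic behind the consulting example can be sketched in a few lines of Python. The five-day working week and the fee of 100 units are assumptions made for illustration, not figures from Evans; only the week-to-a-day speedup and the unchanged client fee come from his example.

```python
WORKING_DAYS_PER_WEEK = 5  # assumption: "a week" means five working days


def analyses_per_week(days_per_analysis: float) -> float:
    """How many analyses fit into one working week."""
    return WORKING_DAYS_PER_WEEK / days_per_analysis


before = analyses_per_week(5)  # one analysis used to take a week
after = analyses_per_week(1)   # now it takes a day
output_multiple = after / before

weekly_fee = 100.0  # hypothetical fee in arbitrary units, held constant
fee_per_analysis_before = weekly_fee / before
fee_per_analysis_after = weekly_fee / after

print(f"Output multiple: {output_multiple:.0f}x")
print(f"Fee per analysis: {fee_per_analysis_before:.0f} -> {fee_per_analysis_after:.0f}")
```

Under these assumptions the client gets five times the analysis for the same fee, so the effective price per analysis falls by 80% while the firm's revenue stays flat, which is the sense in which the gain is competed away.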

That does not make the technology unimportant. Its effects may appear as consumer surplus, competitive pressure, or rising baseline expectations rather than as an easily attributable ROI line item. Evans sees this as consistent with earlier software waves: tools can become indispensable without allowing every adopter to capture the full economic value.

The next big AI questions may belong to lawyers, consultants, retailers, and Hollywood

Benedict Evans argues that as AI moves beyond foundation-model research and coding, many of the important questions stop being purely technology questions. They become industry questions.

He uses Netflix to explain the shift. In earlier essays from his own site, shown on screen during the interview, Evans argued that “content isn’t king” and that “Netflix is not a tech company.” The visible text on the Netflix page described Netflix as “a television company using tech as a crowbar for market entry,” where the technology had to be good but the questions that mattered were still TV questions.

Technology enabled the market entry, but the decisive questions became television and Los Angeles questions: what shows to make, how many, what to pay talent, whether to pursue awards, whether to make movies, whether to buy sports, and what kind of sports. Those are not questions San Francisco is naturally equipped to answer.

AI in law, consulting, finance, advertising, or Hollywood may follow a similar pattern. “What does this mean for law?” is partly an AI question, but it is also a question for people who understand how law firms actually work, what associates do, how clients buy legal work, and what the deliverable really is. Evans says the same for generative video and Hollywood: someone like Ben Affleck may know more about the relevant questions than a technology analyst because he understands the industry.

Professional-services firms are one of Evans’s recurring examples because they have traditionally used pyramid structures. If AI automates a large share of what people at the bottom of the pyramid do, the implications are not obvious from outside. An outsider may not know what those associates actually do or what clients were paying for. The task may be automated while the job remains; or the job may be reconfigured; or the product and margin structure may change.

Evans also emphasizes how much organizational work is implicit. Strategy consultancies such as Bain, BCG, and McKinsey have value partly because they can enter a company, talk across organizational boundaries, discover how work really gets done, and identify why the official strategy is not being followed. Sometimes the reason is that incentives point in the opposite direction: people’s bonus targets may depend on not doing the strategy. That kind of tacit organizational diagnosis is hard to write down, hard to put in training data, and hard to turn into a Claude skill.

This is one reason he is cautious about AI agents simply redesigning companies or solving large business problems. Many problems are not known in advance by the people who have them. In venture pitches, he says, some ideas seem to “fill a hole in the universe”: once explained, everyone wonders why no one saw the problem before. But often people in the industry did not know the problem existed, and it could take years to persuade them. That is a hard fit with the idea that a middle manager will prompt a general model into solving a major industry problem.

Outside observers may be able to identify classes of disruption, but they may not know which workflows matter, which incentives govern behavior, or what clients are really buying. That makes some of AI’s most important questions “half AI questions, half something else.”

The most interesting uses are not old workflows made cheaper

Benedict Evans proposes several ways to think about what AI may unlock beyond coding. The first is price elasticity: if something becomes cheaper, do people do the same amount for less money, more for the same money, or much more for more money because new demand appears? He connects this to Jevons paradox, but treats it as a practical business question rather than a slogan.
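The three outcomes in the price-elasticity question can be made concrete with a toy demand model. The sketch below, in Python, assumes constant-elasticity demand, where quantity scales as price raised to the power of minus the elasticity. That model, the 50% price cut, and the elasticity values are illustrative choices, not something Evans specifies.

```python
from typing import Tuple


def response_to_price_change(price_ratio: float, elasticity: float) -> Tuple[float, float]:
    """Return (quantity multiple, total-spend multiple) after a price change.

    Assumes constant-elasticity demand: quantity ~ price ** (-elasticity).
    price_ratio is new price / old price, so 0.5 means the price halved.
    """
    quantity_multiple = price_ratio ** (-elasticity)
    spend_multiple = price_ratio * quantity_multiple
    return quantity_multiple, spend_multiple


cases = [
    ("same amount for less money", 0.0),
    ("more for the same money", 1.0),
    ("much more for more money", 2.0),
]

for label, elasticity in cases:
    quantity, spend = response_to_price_change(0.5, elasticity)
    print(f"{label}: quantity x{quantity:.1f}, spend x{spend:.1f}")
```

In this toy model the Jevons case is the last one: when elasticity is above 1, a lower price raises total spending because new demand more than offsets the price cut.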

A second question is whether AI removes a former barrier to entry. Owning a printing press was once both a cost basis and a barrier to entry for newspapers. If a technology eliminates an equivalent constraint in another industry, the competitive structure changes.

A third question is whether AI makes possible things that were previously so expensive or impractical that no one considered them. Evans’s historical example is the steam engine making trains possible: no number of horses would produce an express train. His contemporary examples are Spotify and YouTube. The first phase of music disruption was not needing to buy a $15 CD for one track. The second was $15 a month buying access to essentially all music, a proposition that had previously been impossible.

Applied to AI, Evans is less interested in rebuilding old products with new technology than in raising the level of abstraction. He calls it a paired fallacy to say that because a new technology exists, the goal is to rebuild the old thing with it: rebuild Office with open source, rebuild Office on the web, rebuild existing software with AI. Google Docs exists, but that is not the most interesting point. The important products are usually the new things that could not exist under the prior abstraction.

Advertising, commerce, brands, and retail are areas he finds especially interesting because of their scale and because AI may alter what computers can infer about products and intent. Google, Meta, and Amazon, in his account, historically know a product as a SKU, metadata, and purchase correlations. They know that people who bought this also bought that, but not necessarily why. This is why Amazon might recommend more toilet seats after someone buys one: it does not really understand the object or the purchase context, even though it should be able to infer some frequency patterns.

With LLMs, systems may have a different level of statistical understanding. A user could show a picture of a coat and ask what it is and where to buy it. That would not have worked 10 years ago and probably would not have worked five years ago; now it should. The next step is asking for 10 similar coats at different prices, with pros and cons. The step after that is asking the system to look at a user’s Instagram and suggest a winter coat that changes their look but not too much. Three years ago, Evans says, that would have sounded like science fiction; now it seems plausible that someone could build a version that works.

The enterprise analogue is higher-level synthesis across systems. A company might have recorded Zoom calls with clients, email flows, Salesforce data, product telemetry, metrics, and analytics. Instead of asking an AI system to do sentiment analysis on call-center calls, a company might ask how to change pricing to reduce churn. That is a different layer of abstraction: not classifying a known input, but synthesizing across systems to recommend an operational change.

Evans does not pretend to know which of these uses will become Uber- or Airbnb-scale outcomes. His point is that in 1997, predicting those companies from the internet’s capabilities would have been difficult. If such predictions were easy, venture capital would not have a one-in-ten hit rate.

If Evans is right, AI means more software

Benedict Evans’s answer to what AI does to software is simple: more software. But he gets there by mapping where AI lands inside enterprise systems rather than by making a clean stock-market call.

He divides enterprise software into three broad buckets. First are large horizontal systems: SAP, Workday, CRM, capital-management software, payroll systems, and similar “big iron” platforms. Second are vertical apps: a large U.S. company might have 300 to 400 SaaS applications, plus another thousand applications it bought, built, or runs internally on premises. Third is the improvised middle: Excel, email, shared file systems, and internal workarounds.

Tasks move among those buckets. In principle, every SaaS app does something that could have been handled in SAP or Excel. Graduate recruiting could be managed in Workday, a dedicated recruiting tool, email, or a shared Google Sheet depending on the company’s scale. PwC, hiring thousands of graduates each year, probably has dedicated software, perhaps built internally or by a consultancy. A company hiring five graduates a year would not buy software for that; it would use email and a spreadsheet.

AI enters that already fragmented landscape as another option. A company might do the task in an LLM. A vendor might add an LLM feature to Salesforce or a vertical app. A department might use an LLM to build its own tool, much as departments have long run on mysterious Excel files built 15 years ago by someone who has left.

Evans’s second framing is whether the LLM belongs at the bottom of the stack or the top. At the bottom, it is a feature inside an existing system: Salesforce can review the customer history, compare similar sales calls, consider business objectives, and suggest an email or call strategy. The AI is controlled by the use case, with tooling and guardrails.

At the top, the LLM sits across systems: Salesforce, Workday, email, Google Analytics, telemetry, and other data sources. It synthesizes information that no one system could provide on its own. Evans thinks both patterns will exist. The unresolved design question is where to put probabilistic software that can make mistakes and where to put deterministic systems that cannot answer broad questions but can reliably store and execute structured processes.

SaaS already produced one or two orders of magnitude more software. Evans says AI should probably do the same. That does not mean every SaaS incumbent survives. Investors are right to assume some percentage of current SaaS companies will be wiped out. The problem is that no one knows which ones. That uncertainty may make investors reluctant to be long software, but it does not justify a clean, universal conclusion that the whole category is doomed.

Tasks change faster than jobs, and exception handling matters

Benedict Evans is cautious about claims that AI-native systems will eliminate human-facing software or that systems of record will be built primarily for agents rather than people. He finds the ideas interesting, but he is unsure how new some of the underlying questions are. Chris Dixon, he recalls, said 10 or 15 years ago that APIs were the new business development. The contemporary version may be that companies do not need an API so much as an MCP server for agents to plug into. Evans’s reaction is that much of this may be “what’s old is new.”

The deeper issue is exception handling. The important decisions are often the ones that cannot be automated: cases requiring judgment, opinion, or a response to something that has not happened in that form before.

He separates tasks from jobs. The tasks used to accomplish a job can change dramatically while the job and the client-facing product remain similar. Accountants today do almost none of the same tasks accountants did 50 years ago, but to the client, the service may look broadly like accounting. The work has been reorganized around new tools.

A more abstract distinction is between places where the desired output is “the average” and places where it is not. LLMs are well suited to situations where the user wants the answer that anyone would give, the document any associate would produce, the standard form of analysis, or the conventional version of a task that can be described. They are less suited to situations where the user wants a new answer, a different answer, a different idea, or a result that cannot be explained by describing how people normally do it.

That boundary matters for both automation and product design. If the value of a task is conformity to a standard pattern, AI may be powerful. If the value is judgment under ambiguity, novelty, or tacit understanding, Evans is more cautious.

AI may become ordinary magic

Benedict Evans closes with an IBM advertisement from the early 1950s. His presentation reproduces an IBM ad promising “150 Extra Engineers.” The ad says an IBM electronic calculator can speed through thousands of computations so quickly that, on many complex problems, it is “just like having 150 EXTRA ENGINEERS.” It also says valuable engineering personnel, “now in critical shortage,” no longer have to spend “priceless creative time at routine repetitive figuring.”

Evans uses it as a reminder that the pitch around AI has deep historical echoes: machines taking over routine repetitive figuring, freeing scarce skilled workers, and changing what organizations can do.

It’s going to be magic, and in 20 years time we’ll just say, well, of course that’s how it is. Computers have always done that.

Benedict Evans

The point is not that AI is ordinary. Evans says AI is amazing, transformative, and unlike anything that came before. But mobile was also a big deal. So were the internet, PCs, and computing. Each wave was hard to predict, each changed everything, each created winners and losers, and each eventually became invisible in daily life.

This wave, in his framing, will produce some things that ruin people’s lives, put some people out of work, and create outcomes people are not happy about. It will also create things people value and later take for granted. Evans points to the call itself as an example of past magic becoming mundane: computers streaming HD video without crashing, an iPhone streaming to a Mac over Wi-Fi, everything simply working.

The future he sketches is not a clean forecast. It is a set of live questions: whether models differentiate, whether applications capture the value, whether capex reaches a sustainable equilibrium, whether consumer usage becomes daily, whether professional-services pyramids are rebuilt, whether AI creates new abstractions rather than cheaper old workflows. Coding has crossed into obvious demand. The rest of the economy is still working out what problem it wants AI to solve.
