Foundation Models May Become Commodity Infrastructure for AI Applications

Erik Torenberg · a16z · Thursday, June 4, 2026 · 21 min read

Tech analyst Benedict Evans argues that AI has crossed into real customer pull first in software development, while the broader product and business-model questions remain unsettled. In a conversation with Erik Torenberg for a16z, Evans says foundation models may become indispensable but commoditized infrastructure unless their providers can show durable pricing power, distribution control, or network effects. His case is less a prediction than a warning against mistaking today’s scarcity, capex surge, and excitement for the market’s eventual equilibrium.

Coding is the first place AI has crossed from promise into pull

The most important change in AI over the past year is not that models became larger, usage grew, or capital expenditure accelerated. Those trends continued. The sharper shift is that product strategy narrowed around the first use case with clear pull from customers: software development.

Benedict Evans says agentic coding moved from “kind of useful” to materially different in a short period. Demand is not theoretical. Customers are “pulling it out of your hands,” creating a crunch around capacity, pricing, and capital spending. The broad early posture — bigger models, more compute, more possible products — has given way to a much more specific observation: AI works for coding now. Whether it will work comparably well elsewhere remains one of the unresolved questions.

That coding would be early was not shocking in retrospect. Software developers were the people “messing about with this stuff,” and the first thing they tried to make work was software development. Evans compares it to the early PC era, when the first thing people did with PCs was make computers. LLMs, in one sense, are computers; the first breakout use is making more software.

But he resists turning that hindsight into a deterministic prediction. Some people who believed models would be able to do everything can now say coding vindicates them. Evans does not think anyone could have known exactly when agentic coding would work, or that coding would be the first use case to cross the line.

The coding-agent interface used as an example was not a chatbot writing a paragraph about code. A user asked it to “Add -E option for extended regex,” and the agent displayed a plan to inspect flags and regex handling, open files such as Cargo.toml, src/search.rs, src/flags.rs, and src/args.rs, then “wire a new -E option through parsing and search logic.” The point is the shift from answer generation toward codebase exploration, implementation planning, and file-level changes.

Evans is much more cautious about what this means for engineering jobs and team design. He says the technology did not work this way six months earlier, and everyone is still scrambling to understand the implications. The obvious questions are now real rather than speculative: whether to hire junior engineers, what junior engineers would do, and whether companies hired them for the tasks they performed or for training, leverage, and career-pipeline reasons.

His answer is not that software engineering careers are safe, nor that they are about to vanish. It is that no one can yet know the market structure or what the career of a software engineer will look like in three years.

I think you'd be insane to think that you could know that yet.

Benedict Evans

The more general claim is that AI has converted one class of labor question from abstract to concrete. A body of work that people used to do can now be automated in software development. What happens to firms, careers, training systems, and hierarchy after that has not settled.

OpenAI’s posture illustrates the unresolved product problem

The scramble above the model layer is visible in OpenAI’s public product posture, as Benedict Evans reads it. He describes the company as moving through several approaches after the initial model race. At one point, in his telling, it looked as if OpenAI was trying “everything all at once”: ads, e-commerce, shopping carts, payments, a browser, a social video app, and other attempts to build value on top of model infrastructure. Evans jokes that it looked almost like asking ChatGPT for 15 ideas for building value and doing all of them.

Anthropic, with less capital raised, focused more directly on coding. Evans does not say whether that was deliberate strategy or fortunate discovery, but says the outcome is clear: coding worked. OpenAI then swung back toward coding as an obvious priority.

The larger issue is that outside software development and a handful of specific domains, AI has not yet become a daily product for most people. Evans distinguishes between heavy users in Silicon Valley and the much broader public that finds AI useful occasionally. He says the data still shows roughly 10% of people as daily active users and 30–40% as weekly active users. If someone uses a tool once a week, he says, “you haven’t achieved nirvana yet.”

10%: approximate share Evans cites as daily active AI users

The gap is not just enthusiasm. It is product form. A highly technical user may run Claude all day, while many others say they used ChatGPT last week for something. The central challenge is bridging that gap.

Some enterprise uses do not require the end user to discover what to do with a new general-purpose tool. A corporation can identify a specific back-office process and automate it. Evans gives the example of a commodities company that wants to use LLMs to improve cash-flow forecasting. The company deals with many small producers, does not necessarily know when invoices will be paid, and operates in a low-margin business where cash-flow prediction matters. That is a different problem from asking a chatbot to summarize meetings.

This distinction runs through Evans’s thinking. General chatbots expose the technology to users, but many valuable uses may be embedded inside workflows, vertical applications, and internal systems. The breakout consumer or enterprise product may not look like ChatGPT. It may be software that uses models in multiple ways behind the scenes, with tooling, data, guardrails, and a user interface designed for a specific job.

Adoption accelerates, but early platform shifts are always messy

AI’s fast adoption numbers sit on top of decades of prior infrastructure. When Erik Torenberg asks how AI adoption compares with mobile and other platform shifts, Evans’s first answer is that each wave stands on the infrastructure of earlier waves. Mobile did not need to wait for the internet to be created from scratch. The internet did not need to wait for PCs. PCs did not need to wait for semiconductors and consumer electronics. By the time AI arrives, there are already billions of connected devices and a mature software ecosystem.

That means comparing raw user numbers across eras can mislead. When Marc Andreessen was working on Netscape, Evans notes, there were only “double digit millions of PCs” on the planet. The early web could not have had 900 million weekly active users, because the hardware base did not exist.

His second point is that early platform shifts are often exciting and broken at the same time. Evans remembers a period when computers routinely froze and users had to crawl under a desk, unplug them, and hope some work survived. Sound cards were expensive and hard to configure. Getting the internet working could involve a floppy disk with TCP/IP. Mobile, too, had its early period of awkwardness and missing pieces.

AI, in his framing, is at a similar stage. It is not clear what should be a browser, what should be an app, what should be a platform, or how the pieces fit together. There is a gap between what a small number of motivated users can make work and what becomes a one-button product for everyone else.

His third analogy is more specific: the current pricing crunch looks like mobile data in 2008–2010. Some users got huge bills after using what they thought was a manageable service; others had flat-rate plans that strained networks when iPhone users began watching YouTube over 3G. Mobile operators had to reconcile perceived value, pricing, marginal cost, and capacity through bundles, caps, throttling, and fair-use policies.

Evans sees the same pattern in AI tokens. On one side, a user pays $20 a month and may consume far more in token value than the subscription price implies. On the other, someone experiments for a few days and gets a $10,000 bill. Pricing, cost, and usage are not yet aligned.
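
The mismatch can be made concrete with a little arithmetic. The sketch below is illustrative only: the $20 subscription and the $10,000 bill come from Evans's examples, but the per-token rate of $10 per million tokens is a hypothetical assumption chosen to keep the numbers round, and the function names are invented for this example.

```python
TOKENS_PER_MILLION = 1_000_000

# Hypothetical rate in dollars per million tokens (an assumption, not a real price).
ASSUMED_RATE_PER_MILLION = 10.0


def metered_bill(tokens_used: int, price_per_million: float) -> float:
    """Bill for pay-as-you-go usage at a given per-million-token price."""
    return tokens_used / TOKENS_PER_MILLION * price_per_million


def flat_plan_margin(subscription_price: float, tokens_used: int,
                     cost_per_million: float) -> float:
    """Provider's margin on a flat-rate subscriber; negative means a loss."""
    return subscription_price - metered_bill(tokens_used, cost_per_million)


def break_even_tokens(subscription_price: float, cost_per_million: float) -> float:
    """Monthly token volume at which a flat plan stops covering its cost."""
    return subscription_price / cost_per_million * TOKENS_PER_MILLION


if __name__ == "__main__":
    print(break_even_tokens(20.0, ASSUMED_RATE_PER_MILLION))
    print(flat_plan_margin(20.0, 50_000_000, ASSUMED_RATE_PER_MILLION))
    print(metered_bill(1_000_000_000, ASSUMED_RATE_PER_MILLION))
```

Under that assumed rate, a $20 plan breaks even at two million tokens a month, a heavy flat-rate user who consumes 50 million tokens costs the provider far more than they pay, and a metered user who burns through a billion tokens receives the $10,000 bill. The real rates differ by provider and model; the shape of the problem is what matters.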

The deeper lesson from mobile data is not just that pricing eventually normalized. It is that mobile operators built enormously valuable infrastructure, carried traffic that grew 1,500- to 2,000-fold, spent heavily on capex, changed daily life, and still did not capture most of the value. Evans says mobile networks collectively have about $1 trillion in annual revenue and spend about $200 billion a year on capex, while their stocks have been flat for 20 years. “All the cool stuff” was built by others.

That is the analogy Evans wants applied to foundation models. Infrastructure can be indispensable, expensive, technically sophisticated, and socially transformative without becoming the layer that captures the most value.

The commodity argument is a question about leverage, not usefulness

The commodity risk Evans sees for foundation models is not a claim that they are unimportant. It is a claim about pricing power, and he frames it as a thesis to be tested rather than a conclusion already proved. Benedict Evans starts from a chain of questions: Can one model be sustainably and fundamentally better than the others? Is there a network effect? Is there a strategic position equivalent to Instagram, YouTube, or Google Search? Does the model provider control distribution, workflows, or user relationships?

He does not see clear evidence of those advantages for LLMs. Models differ. Users may prefer one over another. One may be better for a task at a given moment. But Evans argues the main durable differentiator appears to be willingness to spend money, not a structural network effect.

The chatbot, in his view, is a “weird limited V1 UI.” It works well for some tasks and some users, but many use cases require tooling, configuration, data integration, control, and a user interface designed by people who understand both the work and product design. People who are excellent at doing a job are not necessarily the people who should design the software for that job. His examples are print designers versus InDesign, and financial advisers versus TurboTax.

This is where Evans separates the model from the product. A foundation model can provide capabilities, but products encode workflows. They make choices about what the user sees, what data is available, when the model is allowed to act, what errors are acceptable, and what must be deterministic.

He compares model providers to cloud infrastructure. When a law firm buys enterprise software, it generally does not ask whether the product uses Claude or OpenAI because the firm has “standardized on Claude.” Similarly, SaaS customers often do not know or care which cloud provider a product runs on. The infrastructure is abstracted away.

That does not mean model providers have no advantages. Evans says the semiconductor analogy may also apply: each generation becomes more expensive, leaving fewer players. There may be three to six frontier model companies, plus open-source, edge, and older models used for cheaper tasks. But if several companies sell similar capabilities built on similar chips, and some have other business models such as advertising, he asks where price discipline comes from.

It’s not that I know that they’re going to become commodities. My position right now is more like: here is a chain of argument that says, deterministically, it looks like these things will be commodities, and explain to me why they won’t be.

Benedict Evans

The qualification matters. Evans allows that the world could settle with only two companies able to make frontier LLMs, or with models absorbing more of the application layer than he expects. His point is that the current scarcity of tokens is transitory. Supply, capex, efficiency, and pricing will move. The question is what happens when the market reaches a different equilibrium.

Historical analogies help frame the uncertainty, but they do not predict the outcome

Earlier platform shifts are useful because they generate the right questions, not because they answer them. Benedict Evans uses comparisons to PCs, the web, mobile, cloud, semiconductors, browsers, operating systems, and telecom networks throughout his argument, but he is careful about what they can and cannot do.

He cites a familiar failure of retrospective certainty: many smart people looked at iPhone and Android 15 years ago and concluded that mobile would be “open versus closed” again and Android would crush the iPhone. That did not happen. It can be explained after the fact, but it was not obvious at the time.

His broader view is that at this stage in a technology cycle, many paths remain open. Later, the S-curve steepens, the market narrows, and it becomes clear what happened. At that point, the right move for an analyst is to look elsewhere.

One of the characteristics of tech is that the moment that you understand something and you know how it works and what’s going to happen, is the moment you should move on to something else.

Benedict Evans

He uses Apple as the example. He says he has not updated his Apple spreadsheet in years because “we know what happened, they won.” He no longer treats the next iPhone or Apple’s market share in China as the central strategic question. The interesting work is where the answers are still unknown.

AI has an additional uncertainty that previous platform shifts did not have to the same degree. In 1995, people did not know what the internet would become, but they did know some physical constraints: telcos would not give everyone broadband next week, and not everyone would buy a $3,000 PC. With generative AI, Evans says the physical and economic limits are less settled. A new model might be much cheaper or better; model efficiency might change the cost curve; edge and open-source models may shift workloads; the character of models may change.

That makes prediction harder. It also means that AI’s current shape may be less stable than the early shape of earlier waves. The right stance, for Evans, is not to refuse predictions altogether but to distinguish between what can be reasoned about and what remains unknowable.

The next AI products may come from industry problems outsiders cannot see

The next strategic questions may not belong primarily to model builders. Benedict Evans says many of them move into industries: finance, law, consulting, accounting, advertising, enterprise sales, Hollywood. In his view, the relevant questions become partly AI questions and partly domain questions.

He compares this to Netflix. The company was enabled by technology, but the important strategic questions became media-industry questions: what shows to make, how many, what to pay talent, whether to pursue awards, whether to buy sports rights, and which sports. Those are “Los Angeles questions,” not “San Francisco questions.” Similarly, what AI means for law may be a question for people who understand law firms, clients, associate work, incentives, and billing structures.

Professional services are a central example. Law firms, consultancies, and investment banks often have pyramid structures in which junior employees perform large amounts of work. If AI automates a significant part of the bottom of that pyramid, what happens? Evans says people outside those industries may not know what junior workers actually do, what clients actually pay for, or how the structure can be reconfigured.

The same applies inside companies. A large amount of organizational work is implicit. It is not documented, not in training data, and not something employees can easily reduce to a flowchart. Evans argues that part of the value of firms such as Bain, BCG, and McKinsey is their license to enter an organization, talk across silos, discover how the company actually works rather than how it is supposed to work, and identify incentive conflicts — for example, when bonus targets cause people not to follow the stated strategy.

That matters for vertical AI startups and for model companies trying to move up the stack. The hard part is not always building a model that can write a memo or generate a slide. It is identifying the real workflow, the exception cases, the hidden incentives, and the decision rights that determine whether a tool can change the way an organization works.

Evans also distinguishes between tasks and jobs. Tasks can change dramatically while the job sold to the client remains recognizable. Accountants today do very different things from accountants 50 years ago, but to the client the service may appear continuous. The question is not only what task AI automates, but whether the job, client value, business model, and organizational structure change.

He offers another distinction: where do you want the average answer, and where do you want a non-average answer? LLMs are strong when the desired output is what a competent person would typically produce and when the process can be described. They are weaker where the point is a new idea, a different answer, or a judgment that is not easily explained.

Automation can lower costs, expand usage, or create things that were previously impossible

When Torenberg asks about daily-use cases outside coding, Benedict Evans frames AI as automation: it makes a class of things cheap or possible that previously required human effort. But that can play out in several ways.

The first is price elasticity, which he connects to Jevons paradox. If something becomes cheaper, do users do the same amount for less money, more for the same money, or much more for more money? The second is barrier removal: was an expensive activity or asset a barrier to entry, like owning a printing press for a newspaper? The third is unlocking: does making something cheap change a business model or competitive structure? The fourth is the emergence of activities that were previously so cost-prohibitive no one seriously considered them.
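
The first of these questions can be illustrated with a toy demand curve. This is a minimal sketch, assuming constant-elasticity demand (quantity scales as the price ratio raised to the negative elasticity); the tenfold price cut and the elasticity values are hypothetical and do not come from Evans.

```python
import math


def demand_after_price_change(price_ratio: float,
                              elasticity: float) -> tuple[float, float]:
    """Return (quantity_ratio, spend_ratio) relative to the starting point.

    Assumes constant-elasticity demand: quantity scales as
    price_ratio ** -elasticity. price_ratio is new price / old price,
    so 0.1 means a 90% price cut.
    """
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    quantity_ratio = price_ratio ** (-elasticity)
    spend_ratio = price_ratio * quantity_ratio
    return quantity_ratio, spend_ratio


def describe(price_ratio: float, elasticity: float) -> str:
    """Label whether total spend falls, holds, or rises after the change."""
    _, spend_ratio = demand_after_price_change(price_ratio, elasticity)
    if math.isclose(spend_ratio, 1.0, rel_tol=1e-9):
        return "spend unchanged"
    return "spend rises" if spend_ratio > 1.0 else "spend falls"


if __name__ == "__main__":
    for e in (0.0, 1.0, 1.5):  # hypothetical elasticities
        q, s = demand_after_price_change(0.1, e)
        print(f"elasticity={e}: quantity x{q:.1f}, spend x{s:.2f} ({describe(0.1, e)})")
```

With a tenfold price cut, an elasticity of zero corresponds to doing the same amount for less money, an elasticity of one to doing more for the same money, and an elasticity above one to doing much more for more money, which is the Jevons case.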

Evans’s historical examples are trains and music. Steam engines made trains possible; buying more horses would not produce an express train. Spotify changed music not only by removing the need to buy a $15 CD for one track, but by making $15 a month buy access to nearly all music — a proposition that previously could not exist.

Advertising and commerce are one area where Evans sees concrete possibilities. Advertising is roughly a trillion-dollar market, and retail is roughly $25 trillion, so even small structural changes matter. His claim is that Google, Meta, and Amazon historically knew a great deal about correlations — SKUs, metadata, “people who bought this also bought that” — but not necessarily what a product is or why someone wants it. The joke example is Amazon recommending more toilet seats after someone buys one. It knows the purchase, but not the context.

With LLMs and related AI systems, Evans says, platforms may be able to operate with a different level of statistical correlation around products, preferences, and contexts. He is cautious with the word “know,” but the direction in his account is clear: recommendations and ads can become more relevant. He points to rising ad numbers and conversion rates at Google and Facebook as they roll AI into ad systems, recommendation engines, and prediction algorithms.

He sketches increasingly ambitious commerce interactions. A user could show a picture of a coat and ask what it is and where to buy it. Then ask for ten similar coats at different prices with pros and cons. Then ask the system to look at their Instagram and suggest a winter coat that changes their look “but not too much.” Three years ago, he says, that would have sounded like science fiction; now it seems buildable.

The enterprise analogue is not merely call-center sentiment analysis. It is a higher-level synthesis across recorded Zoom calls, email flows in Salesforce, product telemetry, metrics, and analytics: how should the company change prices to reduce churn? That is a different abstraction layer from automating an existing report.

Evans’s warning is that predicting the exact new products from here is like trying to predict Uber and Airbnb from the internet in 1997. The important companies may “fill a hole in the universe”: once explained, they seem obvious, but before that neither incumbents nor customers recognized the problem existed.

Software will multiply, but the value may move around unpredictably

AI points toward more software, not less. Benedict Evans expects it to become cheaper and faster to build software, and expects models to allow software to do things it previously could not. That means more competition and likely changes to margins, though he says the future margin structure has not shaken out.

He is skeptical that all enterprise software will move cleanly to outcome-based pricing. In some systems, such as Salesforce, it may be possible to tie actions to revenue. In much enterprise software, he thinks it is hard to connect each button press or workflow to EPS and price accordingly. Current pricing experiments may not reflect long-term equilibrium.

To understand how AI enters enterprise software, Evans divides the existing software estate into three rough buckets. First are large horizontal systems: SAP, Workday, CRM, capital management, payroll, and similar “big iron” systems. Second are vertical applications and the hundreds of SaaS apps used by large companies, plus many more internally built or purchased tools, including on-prem systems. Third is the improvised middle layer of Excel, email, shared files, and departmental workarounds.

Tasks move among these buckets. A company could manage graduate recruiting in Workday, a dedicated app, Excel, email, or a custom internal tool. PwC, hiring thousands of graduates, probably has dedicated software. A company hiring five graduates a year may use email and a shared Google Sheet. AI adds another set of options: use an LLM directly, use an LLM feature inside an existing system, or use an LLM to build a tool.

Evans also offers a second framing: does the LLM sit at the bottom of the stack or the top? At the bottom, it is a feature inside Salesforce — a button that reviews customer history, call context, business objectives, and suggests an email. It is controlled by the application and constrained by its workflow. At the top, an LLM might look across Salesforce, Workday, email, analytics, and other systems to synthesize something no individual application could produce.

The core architecture question is where to put probabilistic software that can make mistakes and where to put deterministic software that must be reliable. Where does the database sit, and where does the model sit? Evans’s answer is “probably both,” depending on the task.
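
One way to picture that split is a workflow in which the model only proposes and deterministic code decides what gets committed. The sketch below is an illustration under stated assumptions, not a pattern Evans describes: every name in it is hypothetical, and `draft_invoice_fields` is a stub standing in for a model call rather than any real LLM API.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class Invoice:
    customer_id: str
    amount: Decimal


def draft_invoice_fields(text: str) -> dict:
    """Stand-in for a probabilistic model call (hypothetical and stubbed).

    A real model could return missing or wrong fields, which is why the
    result is never trusted directly.
    """
    return {"customer_id": "C-1001", "amount": "1250.00"}


def validate(fields: dict, known_customers: set) -> Invoice:
    """Deterministic gate: reject anything that fails the hard rules."""
    customer_id = fields.get("customer_id")
    if customer_id not in known_customers:
        raise ValueError(f"unknown customer: {customer_id!r}")
    try:
        amount = Decimal(str(fields.get("amount")))
    except InvalidOperation as exc:
        raise ValueError("amount is not a number") from exc
    if not amount.is_finite() or amount <= 0:
        raise ValueError("amount must be a positive, finite number")
    return Invoice(customer_id, amount)


def process(text: str, known_customers: set, ledger: list) -> Invoice:
    """Model proposes; only validated records reach the system of record."""
    invoice = validate(draft_invoice_fields(text), known_customers)
    ledger.append(invoice)
    return invoice
```

Here the list passed as `ledger` plays the role of the database. The model can be wrong in any way it likes, but nothing is written unless it passes checks that behave the same way every time.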

For SaaS incumbents, the implication is uncomfortable but not cleanly tradable. Evans thinks some existing software companies will be damaged or wiped out, but he does not identify which ones. Investors can see that AI will change the competitive field; they cannot necessarily identify the victims. That uncertainty makes it hard to price the whole software sector.

Capex can be rational and still hit financial gravity

The strategic case for overinvesting in AI collides with financial limits. Torenberg raises the argument, associated with Google and others, that underinvesting in AI is riskier than overinvesting. Benedict Evans answers by separating strategic necessity from the amount of money companies can actually spend.

The major technology companies are spending at a scale that resembles global infrastructure. Evans says Microsoft, Meta, and Google are on track to spend about 50% of revenue on capex this year. Telecoms, by comparison, are capital-intensive businesses and spend about 15–20% of revenue on capex. He cites guidance of $700 billion from the big four companies this year, compared with roughly $300 billion for telecoms overall, $200 billion for mobile, and oil and gas somewhere around $700 billion to $1 trillion depending on definitions.

$700B: capex guidance Evans cites for the big four companies this year

That amount is not impossible. It is what large global infrastructure can cost. But Evans says those companies could not spend $1.5 trillion next year without borrowing, and could not sustain unlimited growth in capex. There are “laws of physics” around the amount of money available. The curve must eventually taper because there is nowhere else for it to go.
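
The scale comparison is simple arithmetic on the figures Evans cites. The sketch below only restates his rough, rounded estimates (in billions of US dollars); the constant and function names are illustrative.

```python
# Approximate annual figures in billions of US dollars, as Evans cites them.
BIG_FOUR_CAPEX = 700       # guidance for the big four tech companies this year
TELECOM_CAPEX = 300        # telecoms overall
MOBILE_CAPEX = 200         # mobile networks
MOBILE_REVENUE = 1_000     # mobile networks, collectively
NEXT_YEAR_CEILING = 1_500  # level Evans says would require borrowing


def multiple(a: float, b: float) -> float:
    """How many times larger a is than b."""
    return a / b


def capex_intensity(capex: float, revenue: float) -> float:
    """Capex as a fraction of revenue."""
    return capex / revenue


if __name__ == "__main__":
    print(f"Big four vs. telecoms overall: {multiple(BIG_FOUR_CAPEX, TELECOM_CAPEX):.1f}x")
    print(f"Big four vs. mobile: {multiple(BIG_FOUR_CAPEX, MOBILE_CAPEX):.1f}x")
    print(f"Mobile capex intensity: {capex_intensity(MOBILE_CAPEX, MOBILE_REVENUE):.0%}")
    print(f"$1.5T vs. this year's guidance: {multiple(NEXT_YEAR_CEILING, BIG_FOUR_CAPEX):.1f}x")
```

On these numbers, big-tech capex guidance is a little over twice telecom capex and three and a half times mobile capex, mobile operators sit at about 20% of revenue, and a $1.5 trillion year would mean more than doubling spending again.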

At the same time, the strategic pressure is real. For Google, Meta, Microsoft, Amazon, and to some extent Apple, AI may be existential. If this is the future of compute, they cannot let someone else get away with it. Evans compares the risk to Microsoft in the 2000s, IBM in the 1990s, and Meta being constrained by Apple in the 2010s. The CFO may ask how much participation is required, but non-participation is not a comfortable option.

The difficulty is that the spreadsheet has rows but uncertain values. Demand exceeds supply. Efficiency is improving rapidly. The next model may change the economics. Edge and open-source models may absorb some workloads. Frontier models may be relevant for only three to nine months before being superseded. Analysts can try to assign numbers, but Evans compares it to modeling internet bandwidth in the late 1990s: you know what variables matter, not what values to put in.

Token usage has a similar disequilibrium. Some users are “using the most expensive model to dick around on the internet,” analogous to early mobile users receiving huge data bills. Pricing must eventually align with cost, usage, and ROI. But ROI is hard to measure early.

Evans points to survey evidence in his presentation to show why early ROI is difficult to pin down. A slide titled “What works first?” summarized a survey of US CFOs from December 2023, attributed on-screen to the Atlanta Fed and Rastaedize et al. The pattern was straightforward: the first benefits show up in areas that are easier to deploy but harder to measure financially; the cleaner financial cases take longer.

AI deployment impact area	How Evans’s slide characterizes it
Productivity	Easy to deploy, hard to measure
Better insights and decision-making	Easy to deploy, hard to measure
Better customer service	Easy to deploy, hard to measure
Cost-saving	Hard to deploy, easy to measure
New revenue	Hard to deploy, easy to measure

Evans’s presentation separates early AI benefits that are easier to deploy from financial outcomes that are easier to measure.

Better analytics, better customer support, more productivity, faster slide-making, and faster analysis may have real financial value without mapping neatly to a revenue line or cost-saving line. Building a new AI-driven revenue product or a clear cost-saving workflow is easier to measure, but harder to deploy.

There is also consumer surplus. If a discounted cash-flow analysis takes a week, an analyst may do one or two. If it takes 10 seconds, they may do 50. The client may not pay more. The firm may produce more analysis, compete away the gain, and treat the tool as a necessity rather than a margin expansion. Evans compares this to Excel and to financial analysis in investment banking: more analysis gets done, perhaps with fewer people, while customer pricing may not change.

The familiar pitch is 150 extra engineers

An old IBM advertisement from the early 1950s gives Evans a way to place AI in a longer pattern. Benedict Evans describes the ad as showing a sea of engineers with slide rules. Its headline promises “150 extra engineers,” because an IBM electronic calculator could speed through intricate computations that would otherwise consume scarce engineering labor.

The ad copy, attributed on-screen to IBM, says an “IBM Electronic Calculator speeds through thousands of intricate computations so quickly that on many complex problems it’s just like having 150 EXTRA Engineers.” Evans’s point is that this resembles many AI startup pitches: routine calculation automated, scarce human talent freed for higher-value work, enormous productivity implied.

The point is not that AI is just another calculator. Evans says AI is “amazing and transformative and completely unlike anything that’s happened before.” But the same was true, in their own ways, of mobile, the internet, PCs, and computing.

Every 10, 15, or 20 years, a fundamental technology change arrives. It changes everything, creates new winners, destroys some old structures, ruins some lives, puts some people out of work, produces things people love, and eventually becomes invisible.

His example is the call itself: computers streamed HD video for an hour without crashing; an iPhone streamed video to a Mac over Wi-Fi; it all worked. In an earlier era, that would have seemed extraordinary. Now it is assumed.

That is Evans’s base case for AI. The open questions are where value settles, which products emerge, which jobs change, which companies are damaged, how pricing normalizes, and whether foundation models retain leverage or become infrastructure. But if the technology keeps working its way into the economy, the endpoint may feel less like permanent astonishment than quiet expectation.
