Orply.

Baseten Raises $1.5 Billion as Inference Demand Shifts Toward Open Source

Baseten’s $1.5 billion financing at a $13 billion valuation rests on a bet that AI inference is becoming a larger and more operationally demanding market as companies run more open-source and post-trained models. CEO Tuhin Srivastava says the capital will help Baseten secure diversified compute and build the infrastructure layer customers need, while Altimeter partner Apoorv Agrawal argues the shift is toward capability, control, and cost advantages rather than simple access to frontier models.

Baseten’s raise is a bet on inference capacity and open-source deployment economics

Baseten’s $1.5 billion financing is being used to meet a specific kind of AI demand: companies trying to run more inference, often on open-source models, while keeping control of cost and performance. Bloomberg described the round as two tranches, one at an $11 billion valuation and another at $13 billion, and described Baseten as an AI inference startup that provides software and computing capacity for companies using open-source, typically lower-cost AI models.

Tuhin Srivastava said the demand comes from companies trying to “insert intelligence everywhere,” from improving open-source models, and from post-training techniques that make those models more useful for specific tasks. The capital need is operational rather than abstract: Baseten has to procure compute and hire infrastructure and research engineers to build the software layer that sits on top of it.

Bloomberg’s on-screen graphics placed that financing in a market context. Investors shown included Altimeter, Conviction, Spark Capital, Wellington Management, and Sands Capital. A separate customer graphic named OpenEvidence, Lovable, Cursor, Clay, Mercor, and Abridge. Bloomberg presented those customers in the context of Baseten’s inference business and the demand surge Srivastava described.

Asked what it is like to acquire compute in a constrained market, Srivastava said Baseten has a strong relationship with Nvidia, which he said has been supportive over the years. But the lesson he emphasized was diversification. Baseten cannot rely on a single compute source. The company now acquires compute from 18 different clouds and operates in roughly 90 clusters. Srivastava clarified that the diversification is not necessarily about chip vendors; it is about cloud sources. The goal is flexibility as customer demand rises.

Baseten’s compute diversification, as described by Srivastava:

Cloud sources: Baseten acquires compute from 18 different clouds
Clusters: Baseten operates in around 90 different clusters
Diversification focus: Not necessarily chips; cloud sources

Altimeter’s thesis is that inference becomes one of the world’s largest markets

Apoorv Agrawal described Altimeter’s investment premise bluntly: inference will be “one of the largest if not the largest markets, not in AI, in the world.” The reasoning is that AI usage has moved beyond simple question-and-answer interactions. Modern workflows involve agents, longer context windows, retrieval, analysis, synthesis, verification, and repeated loops through those steps.

Each user request, Agrawal said, can trigger hundreds or thousands of inference requests. That multiplication of model calls is what makes inference economically consequential. The market is not simply about cheaper access to models; he corrected Ed Ludlow by saying the relevant enterprise considerations are “capability, control, and cost.”

On capability, Agrawal argued that post-trained open-source models are now close to frontier model performance, and can be better for particular workflows. On control, he invoked Satya Nadella’s argument that enterprises need to compound their own advantages — data, knowledge, and know-how — rather than “giving away your intelligence” on rent. Baseten, in that view, lets customers combine models with their own workflow and proprietary context.

That thesis also explains where Altimeter sees value accruing. Agrawal described the model layer as “a phenomenal business for very few people,” but also “a game of emperors.” To capture value there, a company has to remain at the frontier, and the competitive environment is, in his words, a “knife fight.” He joked that the AI year has four seasons: Anthropic, OpenAI, SpaceX, and Google, with open source as an evergreen presence.

“The model layer is a phenomenal business for very few people. It’s a game of emperors.” (Apoorv Agrawal)

The model layer remains important in Agrawal’s argument. It is just not the only place where value can accrue. The infrastructure and service layer around inference may be where a broader set of customers build advantage without competing directly with the frontier labs.

Open source matters when models become usable, specialized, and economical

The open-source argument has changed from availability to usefulness. In the early phase of the current AI cycle, Ludlow said, the assumption was that only the largest models, with hundreds of billions of parameters, mattered. Open-source models were often treated as impractical for real business use. Baseten’s position depends on a different claim: that utility can be extracted from open-source models, including smaller or specialized ones.

Tuhin Srivastava described open source as having gone through a cycle. Llama 3 represented one important moment roughly two to two and a half years earlier, followed by “a bit of a winter.” The DeepSeek moment, in his telling, put open source back on the map because the capability gap with frontier models narrowed. He said something similar had happened over the prior couple of weeks with GLM 5.2, which he described as a “frontier-level model” that is not merely strong on benchmarks but feels capable to users in practice.

Baseten’s role is to make these models easy to use regardless of size. The company works with small, big, and massive models. What it sells is not a single model choice but the ability for customers to run models without building the same infrastructure teams that frontier labs maintain.

Apoorv Agrawal treated the model market as a moving portfolio problem: yesterday it was DeepSeek, today it is GLM, and tomorrow it will be something else. The model-layer battle is seasonal and nowhere near an endgame. Baseten’s function is to let customers such as Cursor, Abridge, and OpenEvidence use whichever underlying model is most useful, combine it with their enterprise-specific workflow, and deliver performance optimized for that workflow.

That usually means more than one model. Customers typically use a portfolio of models, fine-tuned and combined to support a specific product or process. Baseten benefits as models improve, but its value is tied to serving the application layer rather than owning the winning foundation model.

The economic problem appears after companies start using AI everywhere

Jensen Huang’s phrase about wanting teams to “burn tokens” captures the demand side of Baseten’s market. Tuhin Srivastava interpreted the phrase simply: burning tokens means using AI everywhere. He acknowledged that it is in Nvidia’s interest for that to be the story, but described a recurring customer pattern.

Companies first adopt AI aggressively and find gains. Then they discover that the usage is not necessarily profitable. That is the point at which open-source models, post-trained models, and Baseten’s infrastructure become relevant. The goal is not only wider deployment of intelligence, but deployment with unit economics that work for the business — “better, faster, and cheaper,” in Srivastava’s phrasing.

Apoorv Agrawal drew the line between frontier models and open-source deployment more sharply. Closed frontier models are “very, very good,” especially for use cases that require the highest intelligence, highest reasoning, or new use-case discovery. They are also strong when a customer wants one model as a service and does not want to think further about architecture.

But that describes only a class of customers and workloads. Citing Jonathan Ross, Agrawal said roughly 35 companies drive 99% of all inference. The top four or five are the major labs Ludlow had named. The next 30 include companies such as Cursor, OpenEvidence, Abridge, and Harvey. For those companies, the central question is how to deliver something that is not rented from someone else, but theirs to compound over time.

Agrawal pointed to Harvey as an example, saying Altimeter had posted a chart showing the company achieved frontier capabilities by post-training an open-source model, with better cost and control. He used the example to illustrate what the “next 30” companies, and eventually the next thousand, may try to do: use open-source foundations, specialize them for their own workflows, and improve the economics and control of inference-heavy products.
