Etched Raises $800 Million to Challenge Nvidia in AI Inference

Ed Ludlow · Bloomberg Technology · Tuesday, June 30, 2026 · 5 min read

Etched chief executive Gavin Uberti told Bloomberg Technology the startup is positioning itself against Nvidia by treating AI inference as a rack-scale systems problem rather than a chip-only contest. The company, which has raised $800 million at a $5 billion post-money valuation from investors including Jane Street and TSMC-linked VentureTech Alliance, argues its low-voltage inference design and cluster-scale memory can lower cost per token as models get larger. Uberti said production scale is now the central test, though Etched did not disclose named customers, revenue, or comparative benchmark figures.

Etched is selling inference as a rack-scale problem

Gavin Uberti described Etched as a company building rack-scale inference systems, not merely a new AI chip. The premise, as Ed Ludlow put it, is that the industry’s center of gravity is moving from training AI models to running them: inference.

Etched has raised $800 million in total funding at a latest post-money valuation of $5 billion. Listed investors included Jane Street, Ribbit Capital, VentureTech Alliance, HRT, Stripes, and Two Sigma Ventures. Ludlow identified VentureTech Alliance as TSMC-linked.

Uberti said Etched was unveiling both its funding and two core technologies: “low voltage inference” and “cluster scale memory.” He broke inference into two phases: prefill, which reads input data, and decode, which produces output tokens. Etched’s system is designed around that workload.

Etched’s first-generation rack comes pre-assembled with 32 Etched chips. Its central feature is the cluster-scale memory layer linking those chips with very low-latency communication. That link allows one chip to read and use the high-bandwidth memory and SRAM of other chips in the same system.

Etched is therefore presenting the rack, not only the chip, as the relevant unit of competition. Uberti’s claim is that inference at scale benefits when chips inside a rack can pool and access memory with low latency, especially as models grow larger.

The technical pitch is more tokens from the same memory budget

Uberti’s explanation of Etched’s advantage focused less on avoiding high-bandwidth memory than on getting more work from it. Ludlow contrasted Etched with Cerebras, whose pitch, as Ludlow summarized it, includes insulation from high-bandwidth-memory constraints because it relies on SRAM rather than HBM. Uberti did not adopt that framing for Etched. He said Etched is trying to “get more out of the same volume of HBM.”

The mechanism he emphasized is low-voltage inference. In Uberti’s account, the bottleneck on a GPU is often power: a chip thermally throttles, so more compute cannot simply be packed onto the same device. Etched’s answer, he said, is to run at “under half the voltage of typical Nvidia GPUs.” Lower voltage creates large power savings; those savings make it possible to serve more users on the same chip and get more useful work out of the same HBM bandwidth.
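The link between voltage and power can be illustrated with the standard first-order model of CMOS dynamic power, in which power scales with the square of supply voltage times clock frequency. The sketch below is not Etched's model and uses no Etched data: the function name, the constant-capacitance assumption, and the example ratios are hypothetical, and leakage power is ignored.

```python
def dynamic_power_ratio(voltage_ratio: float, frequency_ratio: float = 1.0) -> float:
    """Relative dynamic power under the first-order model P ~ a * C * V^2 * f.

    Assumes switching activity (a) and capacitance (C) are unchanged, and
    ignores leakage. Both arguments are new/old ratios (hypothetical inputs).
    """
    if voltage_ratio <= 0 or frequency_ratio <= 0:
        raise ValueError("ratios must be positive")
    return voltage_ratio ** 2 * frequency_ratio


# Half the supply voltage at the same clock frequency.
print(dynamic_power_ratio(0.5))
```

Under this model, halving the voltage at an unchanged clock cuts dynamic power to about a quarter. In practice, lower voltage usually also limits the achievable clock frequency, so the real trade-off depends on design details Etched did not disclose.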

We want to be able to use the same HBM and the same SRAM to run way, way more tokens for more users.

Gavin Uberti

Ludlow translated the customer implication into an economic question: whether Etched is pitching superiority on a dollar-per-token basis. Uberti answered directly: “Absolutely.” He described the advantage as economies of scale. More infrastructure, he argued, should allow a lower cost per user, and larger models should increase the value of Etched’s cluster-scale memory design.

The technical claim is not simply that Etched has a faster chip. Uberti tied the advantage to power savings, memory bandwidth, inter-chip memory access, and rack-scale deployment. His stated expectation is that as models get bigger, cluster-scale memory gives Etched “more and more of an advantage.”

The funding case depends on technical conviction and manufacturing scale

The capital raise is part of Etched’s argument. Ludlow asked why a young company could raise such a large amount and why firms such as Jane Street would back the technology. Uberti’s answer was that Etched’s investors are unusually technical. To invest in a company like Etched, he said, backers had to be “very first principles driven,” understand the technology deeply, and believe both that the market will be large and that Etched’s approach is fundamentally better.

He singled out Jane Street and VentureTech Alliance as backers with large chip teams that, in his phrase, “get it.” Ludlow described the $5 billion valuation as striking given Etched’s brief history.

Uberti acknowledged that it was “not a low valuation,” but said the company now has rack-scale inference systems running and has proven out the technology. His defense of the valuation was not based on revenue detail. It was based on technical progress and the readiness of the system architecture.

When asked whether the capital raise was explained by the need to build at scale, Uberti said production was a key part of it. He argued that making a meaningful difference requires building a large number of products, not just demonstrating benchmark performance. He also pointed to the team as central to Etched’s case, saying around half of the platform team came from Nvidia and that Etched’s vice president had run Nvidia’s DGX and HGX teams.

That staffing detail matters because Etched is challenging an incumbent whose systems dominate the market. Uberti’s answer was that the company has recruited people who have built major Nvidia platforms before.

The missing piece is named customer deployment

Ed Ludlow described Nvidia as still having what he called a “technical monopoly” in the market. He asked Uberti what Etched could say about real-world workloads, real-world revenues, or evidence that its systems will be deployed meaningfully.

Uberti said Etched is running benchmarks and seeing “best-in-the-world performance.” When Ludlow narrowed the question to evidence outside the lab environment, Uberti said some customers have tested the system, but he did not name those customers or describe the tests. He redirected the emphasis to production.

Etched did not disclose customer names, real-world workloads, revenue, deployment scale, benchmark names, or comparative performance figures. The concrete disclosures were the system architecture, the 32-chip first-generation rack configuration, the under-half-voltage claim relative to typical Nvidia GPUs, the $800 million funding total, and the $5 billion post-money valuation.

To make a big difference you have to build a lot of these products.

Gavin Uberti

Uberti placed the burden of execution on building enough systems for the architecture to matter commercially. The company’s public case is that inference demand will reward rack-scale memory sharing and lower-voltage operation; the evidence offered was technical disclosure, investor backing, internal benchmarks, customer testing in unnamed form, and a production team with Nvidia platform experience.
