AI’s Next Bottleneck Is Compute Waste, Not GPU Scarcity

Shawn Wang · Latent Space · Thursday, June 18, 2026 · 21 min read

Anjney Midha, AMP’s founder and an investor in frontier AI companies including Anthropic and Mistral, argues that AI’s infrastructure bottleneck is as much waste and misalignment as GPU scarcity. In a conversation with swyx at Periodic Labs, he makes the case for AMP as a neutral compute grid that would pool supply and demand so FLOPs can move more like megawatts. Midha ties that infrastructure thesis to a broader discipline he calls “output maxing”: raising utilization, reducing organizational loss, earning community trust for data centers, and making frontier systems deliver more useful work from scarce resources.

The frontier compute problem is not just scarcity. It is waste at scale.

Anjney Midha describes the current AI infrastructure bottleneck less as a simple shortage of GPUs than as a systems-alignment problem: capital, clusters, operators, schedulers, research teams, and output measurement are often separated by enough organizational layers that small misalignments compound into large waste.

His benchmark for what “good” infrastructure should look like comes from Google. In large clusters, he distinguishes between node utilization and MFU (model FLOPs utilization). Node utilization is the share of cards in a data center that are allocated and in use. At Google, he says, 95% node utilization “was considered an outage”; 96% should be standard. Many single-tenant clusters, in his view, are not running at that level. MFU, the share of theoretical peak FLOPs that a workload actually achieves, is harder: he puts best-in-class performance today somewhere around 60% to 70%.
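The two metrics can be separated with a little arithmetic. The sketch below is illustrative only: the cluster size, throughput, parameter count, and peak figure are hypothetical inputs, not numbers from Midha or Google. It assumes the commonly used definition of MFU (achieved model FLOP/s divided by theoretical peak FLOP/s) and the rough rule of thumb of about 6 FLOPs per parameter per training token.

```python
def node_utilization(allocated_nodes: int, total_nodes: int) -> float:
    """Share of nodes in the cluster that are allocated and in use."""
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    return allocated_nodes / total_nodes


def mfu(tokens_per_second: float, flops_per_token: float,
        num_accelerators: int, peak_flops_per_accelerator: float) -> float:
    """Model FLOPs utilization: achieved model FLOP/s over theoretical peak FLOP/s."""
    achieved = tokens_per_second * flops_per_token
    peak = num_accelerators * peak_flops_per_accelerator
    return achieved / peak


# Hypothetical cluster: 1,000 nodes, 950 of them allocated.
util = node_utilization(950, 1000)

# Hypothetical training run; every value here is made up for illustration.
run_mfu = mfu(
    tokens_per_second=1.0e6,
    flops_per_token=6 * 70e9,           # ~6 FLOPs per parameter per token, 70B parameters (assumed)
    num_accelerators=1024,
    peak_flops_per_accelerator=1.0e15,  # assumed peak, not a vendor specification
)

print(f"node utilization: {util:.1%}")
print(f"MFU: {run_mfu:.1%}")
```

A cluster can score well on the first number and poorly on the second: every card can be allocated while the jobs running on them use only a fraction of the available FLOPs.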

95%: node utilization that Midha says would have been considered an outage at Google

The waste is not just a technical issue. Midha frames it as a leadership and alignment issue. The people funding clusters, deploying them, managing them, and judging the output may be theoretically aligned, but in practice can sit far apart in the supply chain. He uses the image of two lines starting only a few degrees apart: near the origin the error looks small, but at scale the gap spreads. AI infrastructure plans, he says, often begin with a North Star and a team that wants to do the right thing. Then they are forced to scale too quickly, rather than through iterative bring-up, and the waste compounds.

The corrective, in his account, is not exotic. It is common sense from semiconductor and data-center operations: bring systems up iteratively, measure tightly, and do not treat AI capability gains as a license to abandon infrastructure discipline.

Common sense should always be in fashion.

Anjney Midha

Midha accepts that “this time is different” on AI capability. He does not accept the extension of that claim to every part of the stack. The fact that models are improving in unusual ways does not mean data centers, power procurement, cluster scheduling, and operational reliability have escaped ordinary constraints. If anything, he argues, AI scaling raises the premium on disciplined infrastructure because the margin for error is lower and the cost of waste is higher.

That cost is not only economic. It also includes power-system stress, local permitting fights, and regulatory risk. Midha recasts the familiar startup slogan in those terms. The right posture is not “move fast and break things,” or even only “move fast with stable infrastructure,” but “move fast with responsible infrastructure.”

AI data centers need a better bargain with the communities that host them

Midha’s argument for responsible infrastructure starts with local legitimacy. If a data center is brought into a community, he says, the community should be able to see the benefit clearly enough to feel like a partner rather than a host being exploited for land and power.

He cites an idea from Scott Nolan of General Matter, who had spoken in Midha’s Stanford class about energy bottlenecks. If the marginal economics of compute are roughly $4 per hour, Nolan proposed charging $4.50 per hour and directing that marginal 50 cents to the local community as cash. Midha says that as a compute customer, he would happily pay the extra amount if it made the public benefit obvious and made the compute more reliable.
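The arithmetic behind Nolan's proposal is simple enough to sketch. The $4.00 and $4.50 rates come from the example above; the site size and the assumption that the rate applies per accelerator-hour at full billing are hypothetical and only there to show the order of magnitude.

```python
def community_dividend(accelerator_hours: float,
                       base_rate: float = 4.00,
                       community_rate: float = 4.50) -> dict:
    """Split compute revenue into the customer's bill and the local cash premium."""
    premium = community_rate - base_rate
    return {
        "customer_cost": accelerator_hours * community_rate,
        "community_cash": accelerator_hours * premium,
        "premium_share": premium / community_rate,
    }


# Hypothetical site: 10,000 accelerators billed for every hour of a 365-day year.
hours = 10_000 * 24 * 365
result = community_dividend(hours)

print(f"community cash per year: ${result['community_cash']:,.0f}")
print(f"premium as share of the bill: {result['premium_share']:.1%}")
```

Under these assumptions the premium is a small share of the customer's bill but a visible cash flow to the host community, which is the trade Midha says he would accept.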

That reliability matters because community opposition is already a scaling risk. Midha says his understanding is that up to 20% of U.S. data centers this year are at risk of not getting the community support needed for bring-up, while cautioning that such numbers can be overstated and should be examined carefully.

20%: share of U.S. data centers Midha says may be at risk from lack of community support

The issues are not confined to jobs. Shawn Wang notes that communities care about the power grid, environment, and surrounding impacts. Midha adds power-grid constraints and permitting. His preferred AI data-center deal would be more direct: if a data center arrives in a community, it should reduce local electricity bills. That, he says, turns the relationship into a real partnership.

The warning is also regulatory. Midha expects audits and investigations. When regulators arrive, he says, the teams moving fast and breaking things “in the name of AI progress” should be prepared.

That is part of why he is skeptical of the marketing category “neo-cloud.” He does not dismiss newer cloud suppliers wholesale, but says there are long-established American data-center providers with 20-year track records who understand land, power, shell, credit history, boom-and-bust cycles, and reliability. They may not be sponsoring happy hours at NeurIPS or talking in the latest AI jargon, but Midha’s conclusion is blunt: “They’re adults. I trust them.”

For him, stable infrastructure partners are a strategic choice, not a conservative affectation. Short-term thinking in the compute layer, he says, “is going to catch up to us.”

AMP is trying to make FLOPs behave more like megawatts

Midha positions AMP not as a full-stack AI lab and not as a neo-cloud, but as a pooling and coordination layer for compute. His core analogy is the electric grid.

In systems design, he says, there are two recurring regimes. One is integration: collapse processes into one node and own more of the stack. The other is pooling: pull a process out of a node and share it across many nodes to improve utilization. AMP is deliberately the second kind. It is meant to be a horizontal, multi-cloud, multi-silicon compute grid.

Midha’s phrase for the ambition is to “make FLOPs flow like megawatts.” Today, he says, compute is stranded in pools across the ecosystem, with little fungibility. AMP currently attacks this at the scheduling and economic layers, while inviting other efforts that make compute more fungible at other parts of the stack to connect to the grid.

He describes AMP’s economic role as an independent system operator, borrowing from the history of electric grids. Once factories and industrial users realized they should pool generators rather than each run half-used local generation, they needed a neutral coordinator among generation facilities, transmission lines, and factories. Historically durable grid operators, in Midha’s telling, often did not own the assets themselves. They began with long-term anchors with uncorrelated demand: a steel factory needing to spike at night, a shoe mill needing to spike during the day. Each got guaranteed base load, and the grid scheduled spikes to raise peak utilization across the town.
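The utilization gain from pooling uncorrelated demand can be shown with a toy calculation. The hourly load profiles below are invented for illustration; only the shape (one user spiking at night, the other during the day) follows Midha's steel-factory and shoe-mill example.

```python
# Hypothetical hourly demand in megawatts over one 24-hour day.
steel = [8 if (h >= 20 or h < 6) else 2 for h in range(24)]  # spikes at night
shoes = [8 if 8 <= h < 18 else 2 for h in range(24)]         # spikes during the day


def utilization(load, capacity):
    """Average load as a fraction of installed capacity."""
    return sum(load) / (capacity * len(load))


# Without pooling, each user must install capacity for its own peak.
separate_capacity = max(steel) + max(shoes)

# With pooling, the grid only needs capacity for the combined peak.
pooled = [s + m for s, m in zip(steel, shoes)]
pooled_capacity = max(pooled)

print(f"capacity needed separately: {separate_capacity} MW")
print(f"capacity needed pooled:     {pooled_capacity} MW")
print(f"utilization separately: {utilization(pooled, separate_capacity):.1%}")
print(f"utilization pooled:     {utilization(pooled, pooled_capacity):.1%}")
```

Because the spikes never overlap in this toy case, the pooled peak is well below the sum of the individual peaks, so the same total work runs on less installed capacity. Real demand is never this cleanly uncorrelated, which is why the choice of anchor tenants matters.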

AMP’s intended structure follows that logic, but Midha is careful about what has and has not yet been secured. He says AMP pools supply from trusted partners “at about 1.3 gigawatt scale over four years” and pools demand from research labs and other teams. When pressed on the number, he clarifies that AMP has not secured all of it; that figure reflects demand the company has started to secure, and AMP has not publicly confirmed how much capacity it has for the current year. His steady-state ambition is a baseload pool of 1.3 gigawatts available at all times. For spike capacity, he estimates that AMP’s teams need roughly 6 gigawatts over the next four years to keep moving their respective frontiers.

Metric	Midha's description	Scale
Grid scale discussed	AMP pools supply from trusted partners and has started to secure demand; Midha clarifies the full amount has not been secured	About 1.3 gigawatts over four years
Steady-state baseload ambition	Guaranteed capacity Midha wants available at all times	1.3 gigawatts
Spike-capacity estimate	Flexible capacity Midha estimates AMP's teams need over four years	Roughly 6 gigawatts
Cloud-spend equivalent	Midha's rough equivalence for 1.3 gigawatts	About $40 billion

Midha's stated AMP grid scale ambitions and caveats

The technical starting point is scheduling. Midha cites AMP engineering leaders Seb and Mihai, who had worked on Google’s Borg and ex-Borg GQM scheduling systems. The internal Google pattern was to guarantee teams base capacity for routine workloads while enabling them to spike for research needs. One important mechanism was interruptible demand: jobs queue up, priorities shift dynamically through credits or bidding, and lower-priority jobs can be interrupted when another team spends more priority on a more valuable job.

Midha describes the mechanism as dynamic prioritization. A research lead might decide one job is worth five credits and another is worth ten; the higher-priority job gets capacity, and the lower-priority job can be interrupted. His concrete example — “Genie 3” versus “Nano Banana 2” — is made up, but the underlying point is that scarce compute can be allocated by a changing internal valuation of work, not only by static reservations.
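The credit mechanism can be sketched as a small preemptive scheduler. This is a minimal illustration under stated assumptions, not a description of Borg or of AMP's scheduler: the `Job` and `Cluster` classes, the integer credit unit, and the rule that only strictly lower-credit jobs may be interrupted are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    credits: int  # priority the team is willing to spend (hypothetical unit)
    nodes: int    # nodes the job needs


@dataclass
class Cluster:
    total_nodes: int
    running: list = field(default_factory=list)
    queue: list = field(default_factory=list)

    def free_nodes(self) -> int:
        return self.total_nodes - sum(j.nodes for j in self.running)

    def submit(self, job: Job) -> list:
        """Admit `job`, interrupting strictly lower-credit jobs if needed.

        Returns the names of any jobs that were interrupted.
        """
        preempted = []
        # Candidates for interruption, lowest-credit first.
        victims = sorted((j for j in self.running if j.credits < job.credits),
                         key=lambda j: j.credits)
        reclaimable = self.free_nodes() + sum(j.nodes for j in victims)
        if job.nodes > reclaimable:
            self.queue.append(job)  # cannot fit even after preemption, so it waits
            return preempted
        for victim in victims:
            if self.free_nodes() >= job.nodes:
                break
            self.running.remove(victim)
            self.queue.append(victim)  # interrupted job returns to the queue
            preempted.append(victim.name)
        self.running.append(job)
        return preempted


cluster = Cluster(total_nodes=100)
cluster.submit(Job("job-a", credits=5, nodes=80))
evicted = cluster.submit(Job("job-b", credits=10, nodes=60))
print("interrupted:", evicted)
print("running:", [j.name for j in cluster.running])
```

In this run the ten-credit job displaces the five-credit job, which goes back to the queue. A production scheduler would also need checkpointing, fairness limits, and a way to replenish credits, none of which is modeled here.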

Wang notes that this kind of internal market was real at Google, but also raises the criticism that sometimes a company needs central command to go all in on a strategic priority. He links that critique to the view that Google’s internal marketplace dynamics contributed to it missing GPT. Midha does not fully litigate that disagreement. Instead, he moves to AMP’s organizational design: AMP Holdings has an infrastructure business and a capital business called Foundry, which incubates or invests in frontier AI labs.

The tension is central to the model. AMP wants neutral pooling and dynamic allocation, but Midha also recognizes that frontier work sometimes needs decisive prioritization, tight trust, and mission-level choices. AMP’s answer is not to own the whole model stack. It is to coordinate enough independent demand and supply that frontier teams can have base guarantees and flexible spikes without each building an inefficient private island.

Research hoarding is treated as a market failure

Foundry, AMP’s capital arm, is part of Midha’s response to what he sees as another kind of infrastructure waste: research trapped inside large organizations. He points to teams inside DeepMind and other labs that have pushed the frontier but later find themselves misaligned with the priorities of the parent company.

Midha says he understands why Alphabet, Google, and DeepMind make internal prioritization decisions. A parent company may choose to deprioritize an “omni model” or another research direction and emphasize coding, for example. But he views the broader result as tragic when important research does not reach production or even publication.

Wang suggests that some work does come out as papers but not products. Midha says the worse pattern is when papers are not published at all. He refers to what he says people have heard about DeepMind: a six-month internal embargo window in which, if someone on the business side says a paper could be interesting, it may be held indefinitely. Wang summarizes the community complaint more sharply: the work that gets published is the work “not good enough” to hold back. Midha calls that an adverse selection problem.

His claim is not that DeepMind is wrong to run its business. It is that when research is hoarded, negative externalities are imposed on the rest of the field. In his words, “there’s a market failure.” AMP and Foundry are structured partly to unlock researchers and teams that might otherwise remain inside corporate trust boundaries or under shifting parent-company priorities.

The same logic applies to compute. Midha says AMP’s 1.3 gigawatts is “nothing” relative to the scale required, even while he equates it to about $40 billion of cloud spend. The scale of latent demand is visible in the near term, too. He says AMP once expected to have excess capacity by the end of the year, but in the prior six weeks that excess disappeared. His text messages, he says, are full of founders who have raised billions asking whether he can find them “50 nodes in the next few weeks.”

End-of-life prediction is the AI application Midha has not been able to stop thinking about

Midha’s infrastructure argument is connected to a much older personal research interest: end-of-life prediction in healthcare. He studied bioinformatics at Stanford Medicine after moving through economics, computer science, and mathematical computational science. He apprenticed with Stanford professor Nigam Shah, who was working on end-of-life prediction using longitudinal patient data.

The data mattered. Midha says Stanford had one of the only large longitudinal patient datasets in the United States, at least 12 million patient lives, with only the Veterans Affairs dataset larger. At the time, the Stanford dataset was called STRIDE, and access for deep-learning research required affiliation with Stanford Medical School. That access was part of why Midha enrolled in bioinformatics.

Shah’s premise, as Midha tells it, was that end-of-life care was a very large problem. Midha says that at the time, over 30% of Medicare and Medicaid spend went to end-of-life care. The clinical problem was not just cost. It was uncertainty. A patient with a terminal diagnosis might be told they have somewhere between six months and six years to live. The error bars are so wide that the information is hard to act on.

Midha connects that uncertainty to culture and law. He contrasts his own upbringing in India, where he says Hindu culture can understand death as one step in a journey of many lives, with his view that American medical practice often works backward from an assumption that death must be delayed or postponed. Physicians in the United States, he says, face malpractice risk and so avoid precise recommendations: if they are wrong, they can be sued and lose their license. In countries where that risk is different, he says, physicians can be more prescriptive, telling patients what the literature suggests, where the patient may be an outlier, and how to make decisions accordingly.

The result in the United States, in Midha’s telling, is that patients default to trying everything. They enter aggressive regimes of drugs and therapies, spend weeks in the hospital, suffer lower quality of life, and use large public resources. Doctors feel bad; taxpayers pay; patients lose the chance to spend their final days on what matters to them.

The AI question was whether a model could make a more precise recommendation after a terminal diagnosis than a human physician could. Midha says the technology worked once the dataset was available, and that even regression models could work; the team did not need to be fancy. At the time they tried simple neural nets. Today, he says, what can be done with reinforcement learning is “extraordinary.” But the central blocker remains regulatory: the burden of a wrong clinical diagnosis cannot simply be shifted from physician to AI system.

That regulatory obstacle disillusioned him a decade or more ago because he did not have the resources to influence it. Now he says he is spending time on a new incubation to train AI models for more precise end-of-life prediction and patient empowerment.

I haven't been able to get this out of my mind a single day for the last 14 years.

Anjney Midha

Midha frames two issues as ones that should be bipartisan in America. The first is empowering patients to make better end-of-life clinical decisions while reducing taxpayer burden through science. The second is net-positive data centers. In his view, they sit on the same scaling curve: responsible compute infrastructure is the bottleneck to training enough AI models to help in domains like end-of-life care.

“Output maxing” is the broader discipline behind the infrastructure and healthcare examples

When asked to name the broader discipline behind his interests in GPU waste, healthcare waste, and frontier systems, Midha gives a simple engineering phrase: “output maxing.” The point is to make the most of what exists.

He is not arguing for minimalism or anti-scaling. He accepts the bitter lesson as a powerful idea, but rejects a crude interpretation in which teams throw enormous numbers of next-generation GPUs at suboptimal model scaling and tolerate waste. Nor does he believe optimization means maintaining 50 different architectures without enough standardization. He credits Anthropic’s velocity partly to a decision to pick the transformer architecture and double down when investment in the field was otherwise fragmented across alternatives.

Output maxing, for Midha, is closely related to alignment, though he recognizes that “alignment” is overloaded. He uses it in a full-stack systems sense: alignment between limited partners, venture firms, founders, public shareholders, customers, infrastructure suppliers, researchers, and communities. Systems often begin small, with tight feedback loops and natural alignment. As they scale, division of labor and specialization introduce abstractions. Every interface behaves like an API, and every API creates some communication loss.

The technical and organizational question is whether systems can scale up and scale out without lossy transmission. Midha sees two routes. One is standardization: protocols and API specifications that reduce communication loss. The other is new capability that creates enough abundance that previous constraints relax. His example is a room-temperature superconductor, which would be a lossless transmission mechanism for energy and, in his view, could enable things like flying cars within a few years.

Periodic Labs, where the interview is recorded, becomes an example of the second route. Midha says the mission there is superconductivity, not coding. Coding models are tools the team can use, but the constraint is physics: “literally reality.”

Compute markets need protocols, trust, and enough liquidity to start moving

Shawn Wang mentions SF Compute as an effort to standardize futures contracts for compute. Midha says he hopes efforts like that can be accelerated because exchanges are hard to bootstrap. Compute markets face inefficiencies at multiple levels: trust boundaries between parts of the stack, capital markets, and operations. A large injection of compute demand or supply can shock the system into a flywheel.

His ideal is a two-way protocol. If SF Compute has excess capacity, it should be able to connect to the AMP grid and receive demand. If it has demand but lacks supply, it should be able to connect to AMP capacity. AMP’s working implementation is currently among labs, universities, and trusted parties that already feel aligned, but Midha says the hope is an open protocol anyone can connect to.
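What "two-way" means can be made concrete with a toy matching routine in which any participant may appear as a supplier, a consumer, or both. Everything here is hypothetical: the `Participant` fields, the greedy matching rule, and the participant names are assumptions for illustration and do not describe AMP's or SF Compute's actual interfaces.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    name: str
    supply_nodes: int  # capacity it can offer to the grid
    demand_nodes: int  # capacity it needs from the grid

    @property
    def net(self) -> int:
        """Positive means surplus to offer, negative means unmet demand."""
        return self.supply_nodes - self.demand_nodes


def match(participants):
    """Greedily route surplus from net suppliers to net consumers.

    Returns a list of (supplier, consumer, nodes) transfers.
    """
    suppliers = [[p.name, p.net] for p in participants if p.net > 0]
    consumers = [[p.name, -p.net] for p in participants if p.net < 0]
    transfers = []
    i = j = 0
    while i < len(suppliers) and j < len(consumers):
        moved = min(suppliers[i][1], consumers[j][1])
        transfers.append((suppliers[i][0], consumers[j][0], moved))
        suppliers[i][1] -= moved
        consumers[j][1] -= moved
        if suppliers[i][1] == 0:
            i += 1
        if consumers[j][1] == 0:
            j += 1
    return transfers


transfers = match([
    Participant("exchange-x", supply_nodes=50, demand_nodes=0),
    Participant("lab-a", supply_nodes=0, demand_nodes=30),
    Participant("lab-b", supply_nodes=10, demand_nodes=40),
])
for supplier, consumer, nodes in transfers:
    print(f"{supplier} -> {consumer}: {nodes} nodes")
```

The same participant could sit on the other side of the ledger a week later, which is the property Midha describes. Pricing, trust boundaries, and hardware compatibility are deliberately left out of this sketch.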

The supply side is not limited to NVIDIA, though standardization matters. Wang asks whether alternative chips hurt AMP’s standardization goals. Midha says they need not. His example is MatX, led by Reiner Pope. Midha says Pope chose the NVIDIA reference architecture as the standard for MatX’s data-center footprint, so MatX chips can plug into sites planned for NVIDIA bring-up. From an I/O and rack-footprint perspective, Midha says, MatX follows the NVIDIA rack standard. The company’s innovation is elsewhere, in systems co-design and the logic die.

That choice is, in Midha’s view, strategically disciplined. A new chip company cannot fight on every front. By piggybacking on NVIDIA’s published reference architecture, MatX avoids innovating on data-center design while focusing on the bottleneck it believes it can improve. Midha does not present this as competition with NVIDIA. Demand is so high, he says, that NVIDIA cannot meet production needs; the ecosystem needs more chips.

The harder problem for chip startups is the trust boundary. Systems co-design requires visibility into future model architectures as early as possible because chip tape-out takes about two years. If the model architecture changes by the time the chip arrives, the startup is exposed. Inside Google, chip and model teams can sit within the same trust boundary; outside, founders lose that tight loop.

Midha sees part of his role as helping chip teams that can unlock capacity for the independent ecosystem gain access to trusted model-roadmap visibility. His relationships with labs such as Anthropic, Mistral, Black Forest Labs, and others are relevant because trust is the scarce coordination resource.

Founder judgment is another alignment layer

Midha rejects the venture-capital habit of putting technical researchers into a narrow box: brilliant scientist, not CEO. He gives the example of Anastasios, associated with LM Arena, whom he helped briefly in what he calls an administrative “CEO intern” role while the founders were finishing their PhDs and recruiting leadership. By the time Anastasios completed his PhD, Midha says, he had published work with citation counts exceeding those of people twice his age and had already started LM Arena as a side project used by millions.

To Midha, that is evidence of output, not narrowness. He says venture capitalists are often bad at seeing human beings as dynamic agents. They observe a researcher and conclude “not a CEO, not a founder.” His answer is to point at Dario Amodei: a scientist who, in Midha’s forecast, has gone from zero to what will soon be “a trillion dollar company” in four years.

The distinction he draws is between being nominally a CEO and being a great CEO. Being a CEO by title is not hard; being a good or great one is. But researchers who have reached the top of their fields have already performed at extremely high levels. Midha compares them to star athletes of the mind. Publishing at the frontier requires winning resources, earning organizational trust, leading collaborators, and making judgment calls under pressure.

There is a caveat. Some researchers do not want to be CEOs; they primarily want to publish. Midha thinks that is legitimate and says AMP donates excess compute to nonprofits and university labs, including carving out a couple thousand H100s. But researchers who do want to be CEOs must accept a different kind of confrontation. Scientific communities already reward conviction and argument about architectures and results. CEOs must be confrontational up and down the stack: with their own teams, hiring, recruiting, customers, and partners.

This is also why Midha is uncomfortable with “winning” as the dominant frame for frontier AI. Shawn Wang notes that people still want competitiveness. Midha replies that the better word is “lead.” To lead is to push the frontier, advance the state of the art, do something not done before, and capture enough value to continue innovating. But capture too much value, or behave as if value capture is detached from mission, and people will perceive misalignment.

The same distinction shapes how he reads competition among frontier companies. People doing the technical work, he says, often do not see themselves as competing in the broad categories outsiders assign to them. A company working on real-time action prediction models may not regard another “world model” company as a direct competitor unless it is solving the same precise bottleneck. Outsiders collapse technical distinctions into market categories, then narrate winners and losers. Operators closer to the work look for specificity: what is the bottleneck, what is the system, what is the useful abstraction?

Midha sees the present AI moment as unusually favorable to first-principles thinkers because scaling and the bitter lesson are forcing people to revise assumptions. He criticizes reasoning by analogy, especially among investors. In uncertain periods, he says, people cling to heuristics from the previous era and proclaim them as axioms. An axiom can be proven internally; a heuristic is a shortcut. Confusing the two leads people to misjudge companies and founders.

Anthropic and Periodic show why culture is brittle under abundance

Midha calls culture the ultimate moat, then immediately qualifies the metaphor. In practice, he says, very few moats are really moats. Culture is fragile and must be replenished. He recalls seeing a quote on a wall at Andreessen Horowitz, which he attributes in the conversation to Bushido or a Japanese philosopher: culture is not a set of beliefs, but a set of actions. If leaders stop taking the actions that demonstrate alignment with the mission they have stated to their team and the world, the culture frays.

Culture is not a set of beliefs, it's a set of actions.

Anjney Midha

This is his diagnosis for AI labs that have sufficient cash and compute but still cannot ship state-of-the-art systems. The missing ingredient, he argues, is not another resource infusion but culture. Culture can compound, but only if daily trade-offs reinforce it. The most durable version comes from leaders knowing themselves well enough that mission-aligned trade-offs feel authentic rather than performative.

Anthropic is his main example. From the beginning, he says, the company had a missionary belief that capabilities would scale, that systems would remain stochastic rather than deterministic, that error bars would matter, and that risk would remain until interpretability was cracked. A document shown during this part of the source displays Dario Amodei’s October 2024 essay “Machines of Loving Grace,” subtitled “How AI Could Transform the World for the Better.” The visible opening argues that focusing on AI risks does not mean pessimism; in Amodei’s framing, risks are “the only thing standing between us” and a fundamentally positive future. That visual sits directly inside Midha’s discussion of Anthropic’s culture: upside and risk are treated as linked, not opposites.

Midha acknowledges that some people might think Anthropic overestimated risk in earlier periods, such as delaying or constraining Claude 1. But he argues hindsight is not the relevant standard. At the time, the company did not know how the model would be used, and the liability of being seen as irresponsible could have been existential. Repeating “safety” every day, and allowing the team and world to hold leadership accountable to it, became part of the culture.

Midha’s account of Anthropic’s operating priority is narrower and more concrete: he says its P0 from day one was coding. He does not present this as a complete public history of the company, but as the internal priority that forced trade-offs. The reasoning, in his telling, was that if Anthropic could crack coding, it would crack a generally powerful capability that accelerates all kinds of work on a computer. If it could accelerate work on a computer, it could help reach AGI safely.

Wang asks how Anthropic “cracked coding,” offering the hypothesis that Claude had a lucky edge that the company noticed and compounded. Midha’s answer is preparation. Anthropic, in his view, had been the most prepared AI company for four years. When the right context data arrived and developers began sending the right diffs and usage signals, the company was ready. A person could call that luck, he says, but it was luck meeting paranoia, preparation, and efficiency. He also emphasizes early scarcity: Anthropic had a hard time getting going and had to do more with less.

That scarcity matters because it sharpened culture. The many “no’s” Anthropic heard from investors, including those already committed to OpenAI, were a feature rather than a bug. Scarcity forced the company to decide what hill it would die on and what its P0 really was. Midha worries that many new AI labs raise too much money too quickly, before hardship forces that definition. Without scarce resources, they do not have to choose; without choosing, their cultures become fragile before they reach takeoff.

Periodic Labs is where Midha applies the same frame to a scientific company. The first constraint is the most literal one: physics. Periodic’s mission is superconductivity, and the hard boundary is technical reality. He says that 12 months earlier, the idea was not popular. While he was a visiting scientist in Stanford’s physics department, he and others benchmarked frontier models on physics and science capabilities. The models were useful for summarizing papers, but poor at analyzing scientific data from a condensed matter physics lab. Periodic’s founders, including Liam, a co-creator of ChatGPT, and Dogus, who worked near Demis Hassabis at DeepMind and created GNoME, were entering a domain where existing models did not already solve the core problem.

Midha also uses Periodic to illustrate culture under financial pressure. He says some people wanted to join, then took jobs elsewhere that offered more money. After what he describes only as a technical breakthrough and a state-of-the-art system, some wanted to come back. Midha says he told them no. The point was not simply resentment; it was a culture decision. If the mission matters, and if the hard period defines who is truly committed, then rejoining only after validation changes the signal.

He generalizes from that to Silicon Valley. It is, in his words, both deeply missionary and deeply mercenary. Big money can cause people to lose their minds, even when the amounts are small in the grand scheme. For Midha, money is useful as a resource for mission. When it becomes the measure itself, it loses meaning.
