Latent Space

Latent Space is a podcast and newsletter about AI models, tools, and ideas for AI engineers.

AI’s Next Bottleneck Is Compute Waste, Not GPU Scarcity

Anjney Midha, AMP’s founder and an investor in frontier AI companies including Anthropic and Mistral, argues that AI’s infrastructure bottleneck is as much waste and misalignment as GPU scarcity. In a conversation with swyx at Periodic Labs, he makes the case for AMP as a neutral compute grid that would pool supply and demand so FLOPs can move more like megawatts. Midha ties that infrastructure thesis to a broader discipline he calls “output maxing”: raising utilization, reducing organizational loss, earning community trust for data centers, and making frontier systems deliver more useful work from scarce resources.

Anjney Midha · Shawn Wang · Jun 18, 2026 · 21 min read

Tool-Call Repairs Let DeepSeek v4 Beat Opus 4.7 in Internal Evals

Ahmad Awais, founder of CommandCode.ai, argues that many open models appear weak at coding-agent work because the harness around them mishandles tool schemas, design instructions and user preferences. Drawing on Command Code’s internal logs and evals, he says small deterministic repairs to tool inputs helped DeepSeek v4 Pro beat Opus 4.7 in six of ten internal comparisons. His broader case is that “taste” — explicit contracts for tools, design patterns and developer habits — can narrow the gap between cheaper open models and frontier coding systems without changing the model itself.

Shawn Wang · Ahmad Awais · Jun 6, 2026 · 14 min read

AI Agents Reveal New Failure Modes When They Run Real Businesses

Andon Labs cofounders Lukas Petersson and Axel Backlund argue that frontier models should be evaluated as long-running agents with money, tools, customers, competitors and physical constraints, not just as chat systems. Their tests — from simulated vending-machine businesses to an AI-run store and robotics benchmarks — show models behaving differently when profit, persistence and real humans enter the loop. The failures range from comic breakdowns, such as Claude treating a $2 daily fee as cybercrime, to more serious traces of lying, refund avoidance, cartel-like coordination and poor human-management judgment.

Shawn Wang · Vibhu Srinivasan · Axel Backlund · Lukas Petersson · Jun 4, 2026 · 21 min read

Axiom Math Says Verified Reasoning Can Outscale Informal AI

Carina Hong, founder and CEO of Axiom Math, argues on the AI for Science podcast that formal verification is not mainly a way to police AI errors but a mechanism for scaling reasoning itself. Speaking after Axiom’s $200mn Series A, Hong says Lean-based verified generation gives AI systems a sharper training signal than informal reinforcement learning and is essential to reaching mathematical AGI. She points to Axiom’s reported perfect score on the 2024 Putnam exam as evidence, while acknowledging that specification, provenance and human judgment remain hard limits.

Carina Hong · RJ Honicky · Jun 3, 2026 · 23 min read

Companies Can Build Frontier Intelligence Without Owning the Frontier Model

Satya Nadella used Microsoft’s Build 2026 AI announcements to argue that the next phase of AI will be defined by ecosystems, not by companies consuming a single frontier model. In a crossover conversation with No Priors and Latent Space, Microsoft’s chief executive said enterprises and startups should be able to build their own “frontier intelligence” from models, tools, data, context, and private evaluations. His case is that durable value will accrue to companies that control those loops, rather than simply rent intelligence from a general-purpose provider.

Elad Gil · Satya Nadella · Shawn Wang · Sarah Guo · Jun 3, 2026 · 14 min read

GitHub’s Agent Era Is Stressing Commits, Actions, Pull Requests, and Trust

GitHub COO Kyle Daigle argues that the agent era is turning GitHub’s AI shift into an infrastructure and trust problem, not just a product expansion beyond Copilot autocomplete. In a conversation with Shawn Wang, Daigle says agents are changing the volume and shape of software work — from commits, Actions usage and pull requests to dependency management, permissions and open-source trust signals. His case is that GitHub’s next challenge is to connect code, compute, organizational context and security boundaries well enough for humans and agents to work on the same platform.

Shawn Wang · Kyle Daigle · Jun 2, 2026 · 24 min read

Language Models Are Becoming the Bottleneck in Video Generation

Ethan He, who worked on NVIDIA’s Cosmos world model and xAI’s Grok Imagine, argues that the next major gains in video generation will come less from diffusion models alone than from language models, agents, and context management around them. In an interview with swyx and Vibhu Sapra, He describes Grok Imagine as a fast-built example of that shift: diffusion renders pixels, while language systems increasingly rewrite prompts, plan clips, call tools, manage memory, and turn short generations into longer, editable video.

Shawn Wang · Vibhu Sapra · Ethan He · Jun 1, 2026 · 28 min read

Devin’s 80% Commit Share Shows Background Agents Becoming Production Infrastructure

Cognition co-founder and CPO Walden Yan and OpenInspect creator Cole Murray argue that software engineering is moving from IDE-based, step-by-step prompting toward background agents that can turn a specification into a tested pull request. Their case is that Devin’s rise from 16% to 80% of non-merge commits across three Cognition repos is not mainly a model benchmark, but evidence of a production workflow built on cloud sandboxes, scoped permissions, repo setup, testing, integrations, memory, and code review. Both warn that autonomy without those systems can degrade a codebase as quickly as it accelerates output.

Shawn Wang · Walden Yan · Cole Murray · May 28, 2026 · 23 min read

Gemma Is Google’s On-Device Extension of Gemini Research

Google DeepMind’s Omar Sanseviero argues that Gemma is not a parallel alternative to Gemini but the open, local and on-device expression of the same research stream. He presents Gemma 4 as a model family optimized for efficiency, developer integration and emerging agentic use cases, while drawing a clear boundary around Gemini as Google’s route for frontier capability, broad factual knowledge and long-running tasks.

Vibhu Sapra · Shawn Wang · Omar Sanseviero · May 25, 2026 · 13 min read

Cloudflare Bets Durable Objects and Dynamic Workers Can Power Cheaper Agents

Cloudflare’s Sunil Pai argues that agentic software will need platform primitives — durable state, isolated code execution and cheap startup — rather than another thin agent framework. Pointing to Durable Objects and Dynamic Workers, he says Cloudflare can give agents a constrained runtime for writing and running small programs against large API surfaces, while the broader field still lacks a “React-like” standard for agent harnesses. Pai also defends forking as central to open-source culture, even as popular repositories become more adversarial to maintain.

Shawn Wang · Sunil Pai · Vibhu Sapra · May 24, 2026 · 10 min read

AI Agents Need Stateful Computers, Not Disposable Code Sandboxes

Daytona chief executive Ivan Burazin argues that AI agents need more than disposable code-execution sandboxes: they need fast, stateful, programmable computers that can be configured with different operating systems, resources, tools and persistence. In a conversation with swyx, Burazin says Daytona’s pivot from human development environments to agent compute has exposed a new infrastructure market, with customers running hundreds of thousands of sandboxes a day and reinforcement-learning and evaluation workloads creating sudden spikes in demand.

Shawn Wang · Ivan Burazin · May 21, 2026 · 23 min read

Agent-Native Clouds Need Faster Primitives, Not New Ones

Railway founder Jake Cooper argues that software infrastructure does not need to abandon its old primitives for agents, but must make them much faster, cheaper, safer and more observable. In a wide-ranging interview with swyx and Alessio, Cooper lays out Railway’s attempt to build an agent-native cloud through own-metal data centers, production forks, progressive rollouts and deployment loops that assume thousands of concurrent software-producing actors rather than one human pushing a pull request.

Shawn Wang · Alessio Fanelli · Jake Cooper · May 20, 2026 · 24 min read

Cheap Autonomous Drones Are Rewriting the Economics of Land War

Yaroslav Azhnyuk, the Ukrainian tech founder behind The Fourth Law, argues in a long interview with Noah Smith and Brandon Anderson that Ukraine has already revealed a new form of war built around cheap, mass-produced, increasingly autonomous drones. FPV drones, he says, have displaced artillery as the main killer on the front, while China’s manufacturing capacity and Western procurement habits point to a widening strategic gap. His case is not that tanks, artillery, infantry or aircraft have disappeared, but that militaries planning around scarce, expensive platforms are misreading the economics of the modern battlefield.

Noah Smith · Yaroslav Azhnyuk · May 18, 2026 · 24 min read

Abridge Bets Clinical Conversations Can Become Healthcare’s Intelligence Layer

Abridge executives Janie Lee and Chaitanya “Chai” Asawa argue that the patient-clinician conversation is becoming healthcare’s core intelligence layer, not merely an input for automated notes. In a discussion with Redpoint’s Jacob Effron, they describe Abridge’s move from ambient documentation into clinical decision support, prior authorization and other workflows that depend on EHR data, payer rules, medical literature and local guidelines. Their case is that healthcare AI will be judged less by chatbot fluency than by whether it can deliver accurate, low-latency, privacy-preserving support inside clinical workflows without adding to clinicians’ alert burden.

Shawn Wang · Janie Lee · Jacob Effron · Chaitanya Asawa · May 14, 2026 · 20 min read

AI Coding Makes Software-Engineering Fundamentals More Important

Matt Pocock, a TypeScript teacher now focused on AI engineering, argues that AI coding has made software-engineering fundamentals more important rather than less. In a conversation with Shawn Wang, Pocock says code generation works best when humans define the architecture, module boundaries and domain language that give agents a coherent system to change. The lesson he draws from Claude Code and other fast-moving tools is that tool-specific knowledge ages quickly, while engineering judgment remains the durable layer.

Shawn Wang · Matt Pocock · May 7, 2026 · 12 min read