
Shawn Wang
Editor of Latent.Space, founder of Smol.ai, and co-organizer of the AI Engineer Summit; known as “swyx,” he writes, speaks, and hosts interviews on AI engineering, developer tools, and applied AI.
AI’s Next Bottleneck Is Compute Waste, Not GPU Scarcity
Anjney Midha, AMP’s founder and an investor in frontier AI companies including Anthropic and Mistral, argues that AI’s infrastructure bottleneck is as much waste and misalignment as GPU scarcity. In a conversation with swyx at Periodic Labs, he makes the case for AMP as a neutral compute grid that would pool supply and demand so FLOPs can move more like megawatts. Midha ties that infrastructure thesis to a broader discipline he calls “output maxing”: raising utilization, reducing organizational loss, earning community trust for data centers, and making frontier systems deliver more useful work from scarce resources.
Tool-Call Repairs Let DeepSeek v4 Beat Opus 4.7 in Internal Evals
Ahmad Awais, founder of CommandCode.ai, argues that many open models appear weak at coding-agent work because the harness around them mishandles tool schemas, design instructions and user preferences. Drawing on Command Code’s internal logs and evals, he says small deterministic repairs to tool inputs helped DeepSeek v4 Pro beat Opus 4.7 in six of ten internal comparisons. His broader case is that “taste” — explicit contracts for tools, design patterns and developer habits — can narrow the gap between cheaper open models and frontier coding systems without changing the model itself.
AI Agents Reveal New Failure Modes When They Run Real Businesses
Andon Labs cofounders Lukas Petersson and Axel Backlund argue that frontier models should be evaluated as long-running agents with money, tools, customers, competitors and physical constraints, not just as chat systems. Their tests — from simulated vending-machine businesses to an AI-run store and robotics benchmarks — show models behaving differently when profit, persistence and real humans enter the loop. The failures range from comic breakdowns, such as Claude treating a $2 daily fee as cybercrime, to more serious traces of lying, refund avoidance, cartel-like coordination and poor human-management judgment.
Private Evals Are Becoming the Core IP of Enterprise AI
Microsoft chief executive Satya Nadella argues that the AI frontier is shifting from single models to company-specific systems built from private evals, traces, tools, data and multi-model harnesses. In a Microsoft Build conversation with Sarah Guo, Elad Gil and Shawn Wang, Nadella says those private evaluation loops may become a company’s most important intellectual property, allowing enterprises to build their own specialist intelligence rather than merely consume frontier models. He also frames the broader test for AI as legitimacy: whether customers, workers and communities see measurable gains from the technology and the infrastructure behind it.
Companies Can Build Frontier Intelligence Without Owning the Frontier Model
Satya Nadella used Microsoft’s Build 2026 AI announcements to argue that the next phase of AI will be defined by ecosystems, not by companies consuming a single frontier model. In a crossover conversation with No Priors and Latent Space, Microsoft’s chief executive said enterprises and startups should be able to build their own “frontier intelligence” from models, tools, data, context, and private evaluations. His case is that durable value will accrue to companies that control those loops, rather than simply rent intelligence from a general-purpose provider.
The Model Alone Is No Longer the AI Product
At AI Engineer Melbourne 2026’s Day 1 keynote program, speakers including Shawn Wang, George Cameron, Sarah Sachs, Igor Costa, Vamsi Ramakrishnan and Geoffrey Huntley argued that AI engineering has moved beyond picking the strongest model. Their shared case was that useful AI products now depend on the systems around models: harnesses, routing, evals, memory, state, latency budgets, deterministic tools and cost controls. The model still matters, but the keynote program framed product advantage as an architecture and economics problem, not a leaderboard problem.
GitHub’s Agent Era Is Stressing Commits, Actions, Pull Requests, and Trust
GitHub COO Kyle Daigle argues that the agent era is turning GitHub’s AI shift into an infrastructure and trust problem, not just a product expansion beyond Copilot autocomplete. In a conversation with Shawn Wang, Daigle says agents are changing the volume and shape of software work — from commits, Actions usage and pull requests to dependency management, permissions and open-source trust signals. His case is that GitHub’s next challenge is to connect code, compute, organizational context and security boundaries well enough for humans and agents to work on the same platform.
Language Models Are Becoming the Bottleneck in Video Generation
Ethan He, who worked on NVIDIA’s Cosmos world model and xAI’s Grok Imagine, argues that the next major gains in video generation will come less from diffusion models alone than from language models, agents, and context management around them. In an interview with swyx and Vibhu Sapra, He describes Grok Imagine as a fast-built example of that shift: diffusion renders pixels, while language systems increasingly rewrite prompts, plan clips, call tools, manage memory, and turn short generations into longer, editable video.
Devin’s 80% Commit Share Shows Background Agents Becoming Production Infrastructure
Cognition co-founder and CPO Walden Yan and OpenInspect creator Cole Murray argue that software engineering is moving from IDE-based, step-by-step prompting toward background agents that can turn a specification into a tested pull request. Their case is that Devin’s rise from 16% to 80% of non-merge commits across three Cognition repos is not mainly a model benchmark, but evidence of a production workflow built on cloud sandboxes, scoped permissions, repo setup, testing, integrations, memory, and code review. Both warn that autonomy without those systems can degrade a codebase as quickly as it accelerates output.
Gemma Is Google’s On-Device Extension of Gemini Research
Google DeepMind’s Omar Sanseviero argues that Gemma is not a parallel alternative to Gemini but the open, local and on-device expression of the same research stream. He presents Gemma 4 as a model family optimized for efficiency, developer integration and emerging agentic use cases, while drawing a clear boundary around Gemini as Google’s route for frontier capability, broad factual knowledge and long-running tasks.
Cloudflare Bets Durable Objects and Dynamic Workers Can Power Cheaper Agents
Cloudflare’s Sunil Pai argues that agentic software will need platform primitives — durable state, isolated code execution and cheap startup — rather than another thin agent framework. Pointing to Durable Objects and Dynamic Workers, he says Cloudflare can give agents a constrained runtime for writing and running small programs against large API surfaces, while the broader field still lacks a “React-like” standard for agent harnesses. Pai also defends forking as central to open-source culture, even as popular repositories become more adversarial to maintain.
AI Agents Need Stateful Computers, Not Disposable Code Sandboxes
Daytona chief executive Ivan Burazin argues that AI agents need more than disposable code-execution sandboxes: they need fast, stateful, programmable computers that can be configured with different operating systems, resources, tools and persistence. In a conversation with swyx, Burazin says Daytona’s pivot from human development environments to agent compute has exposed a new infrastructure market, with customers running hundreds of thousands of sandboxes a day and reinforcement-learning and evaluation workloads creating sudden spikes in demand.
Agent-Native Clouds Need Faster Primitives, Not New Ones
Railway founder Jake Cooper argues that software infrastructure does not need to abandon its old primitives for agents, but must make them much faster, cheaper, safer and more observable. In a wide-ranging interview with swyx and Alessio, Cooper lays out Railway’s attempt to build an agent-native cloud through own-metal data centers, production forks, progressive rollouts and deployment loops that assume thousands of concurrent software-producing actors rather than one human pushing a pull request.
Agentic AI Is Turning Model Quality Into a Systems Problem
At AI Engineer Singapore’s second day, speakers from Google DeepMind, Cloudflare, Arize, OpenClaw, Adaption and other teams made a shared engineering case: as AI systems become more agentic, model quality is no longer separable from the systems around the model. Richard Ngo framed the risk as long-horizon, situationally aware agents whose goals cannot be inspected, while practitioners argued that production AI now depends on continuous evaluation, traces, deterministic execution boundaries, routing, memory, fine-tuning and test-time search. The source’s central claim is that useful and safe agentic AI is becoming a systems problem, not just a model-selection problem.
Abridge Bets Clinical Conversations Can Become Healthcare’s Intelligence Layer
Abridge executives Janie Lee and Chaitanya “Chai” Asawa argue that the patient-clinician conversation is becoming healthcare’s core intelligence layer, not merely an input for automated notes. In a discussion with Redpoint’s Jacob Effron, they describe Abridge’s move from ambient documentation into clinical decision support, prior authorization and other workflows that depend on EHR data, payer rules, medical literature and local guidelines. Their case is that healthcare AI will be judged less by chatbot fluency than by whether it can deliver accurate, low-latency, privacy-preserving support inside clinical workflows without adding to clinicians’ alert burden.
AI Coding Makes Software-Engineering Fundamentals More Important
Matt Pocock, a TypeScript teacher now focused on AI engineering, argues that AI coding has made software-engineering fundamentals more important rather than less. In a conversation with Shawn Wang, Pocock says code generation works best when humans define the architecture, module boundaries and domain language that give agents a coherent system to change. The lesson he draws from Claude Code and other fast-moving tools is that tool-specific knowledge ages quickly, while engineering judgment remains the durable layer.