Topic

Model Releases

New frontier, open, and specialized model launches, including capability changes, pricing, context windows, modalities, and benchmark-relevant improvements.

SpaceX, Anthropic, and Iran Test the Case Against Centralized Power

The All-In panel uses a week of fights over welfare, SpaceX, Anthropic and Iran to argue over who should hold power when risk is high: markets and individuals, or political and corporate gatekeepers. David Friedberg, David Sacks and Chamath Palihapitiya cast much of the discussion as a warning against centralization, from benefit systems that can weaken agency to AI safety regimes that could hand control to governments and hyperscalers. Jason Calacanis shares parts of that concern but presses the practical tensions, especially in the Anthropic dispute and in Trump’s Iran memorandum, where he questions whether the war that produced a possible deal was necessary.

Jason Calacanis · David Sacks · Chamath Palihapitiya · David Friedberg | All-In Podcast | Jun 19, 2026 | 22 min read

AI Market Power Is Moving Beyond the Frontier Model

Alex Kantrowitz and Ranjan Roy argue that the AI market is shifting away from standalone model capability and toward control of infrastructure, access and workflow layers. Their discussion frames SpaceX’s IPO as a public-market AI-cloud story that complicates OpenAI’s ambitions, Anthropic’s Fable rollout as a case where safety policy also looks like market power, and OpenAI’s possible price cuts as a test of whether frontier models can remain premium products. Apple’s Siri, in their telling, matters for the same reason: usefulness may come less from the best model than from where the model sits.

Alex Kantrowitz · Ranjan Roy | Alex Kantrowitz | Jun 15, 2026 | 19 min read

Anthropic’s Fable Backlash Exposes the Risk of Hidden AI Gatekeeping

The All-In panel argues that Anthropic’s handling of Claude Fable 5 turned AI safety into an enterprise trust problem, with Jason Calacanis, Chamath Palihapitiya, David Sacks and David Friedberg focusing on hidden downgrades, prompt retention and a provider’s power to decide who receives full model capability. The same concern over opaque discretion shaped their California election discussion, where Friedberg and Sacks argued that legal ballot rules can still produce outcomes voters view as manipulated, while Calacanis called for investigation rather than treating suspicious statistics as proof of fraud.

Jason Calacanis · Chamath Palihapitiya · David Friedberg · David Sacks | All-In Podcast | Jun 13, 2026 | 24 min read

AI’s Economic Test Is Broad Diffusion, Not Frontier Capability

Microsoft chief executive Satya Nadella told a New York Times Hard Fork live audience that AI’s economic test is not whether a few companies build stronger frontier models, but whether the technology spreads widely enough to raise productivity, justify its token costs and create visible benefits for workers and communities. He argued that Microsoft’s role is to build platforms for that diffusion, while warning that job displacement, data center burdens and concentrated gains will make the backlash rational unless humans remain stakeholders through new “glue work” and local upside.

Kevin Roose · Casey Newton · Satya Nadella | Hard Fork | Jun 12, 2026 | 14 min read

Dubbing v2 Preserves Speaker Performance Across 90-Plus Languages

ElevenLabs presents Dubbing v2 as an AI dubbing model designed to transfer a speaker’s performance across more than 90 languages, not just translate the words. The company argues that by conditioning on the original audio rather than a transcript, the system can preserve voice, tone, emphasis, emotion and timing while adapting phrasing for natural delivery in the target language. The walkthrough positions the tool as an automated localization workflow for creators, marketers and studios, with speaker similarity as the main setting users adjust between voice resemblance and native-language naturalness.

ElevenLabs | Jun 12, 2026 | 6 min read

Undisclosed Model Degradation Becomes the Flashpoint in Anthropic’s Safety Debate

Anthropic’s Fable 5 launch, Meta’s renewed Facebook film problem and SpaceX’s prospective IPO were judged on Diet TBPN less by their headlines than by the product and market mechanics underneath them. John Coogan’s sharpest concern was Anthropic: he argued that visible guardrails, combined with model degradation that is disclosed in a model card but not surfaced inside the product, risk turning a capability launch into a trust problem for paying users and developers. On Meta and SpaceX, Coogan saw more limited business consequences than the public narratives suggest: The Social Reckoning may hurt Meta’s reputation without materially damaging its advertising business, while SpaceX’s small initial free float could make the IPO less disruptive than a $1.8tn valuation implies.

John Coogan · Jordi Hays | TBPN | Jun 10, 2026 | 15 min read

MiniCPM-V 2.6 Runs at 18 Tokens per Second on iPhone

OpenBMB used its Build Small hackathon session to argue that small models are valuable when they can be deployed where applications and data already live: on phones, laptops, mobile apps and edge devices. Its main example was MiniCPM-V 2.6, a vision-language model shown running on an iPhone 15 Pro at 18 tokens per second with llama.cpp and 4-bit quantization. The broader claim was that compact, open models paired with existing runtimes can expand access, reduce cloud dependence, and improve privacy and latency for local AI use cases.

Hugging Face | Jun 10, 2026 | 6 min read

Apple’s New Siri Tests Who Controls the Default AI Assistant

John Coogan and Jordi Hays read Apple’s WWDC as a test of whether the company can turn its long-delayed Siri promise into a defensible AI interface without giving up control of defaults, privacy, and the iPhone camera. The Diet TBPN segment argues that Apple’s AI story is less about a single keynote than about older bets now becoming technically possible, while Anthropic’s Claude Fable release and Meta’s data-center training push show the same shift toward long-running inference and physical AI infrastructure.

John Coogan · Jordi Hays | TBPN | Jun 10, 2026 | 15 min read

AI Agents Threaten Google’s Control of Search, Chrome, and Gmail

M.G. Siegler, author of Spyglass.org, argues on Big Technology that Google’s AI risk is shifting from model performance to control of the next software interface. In a conversation with Alex Kantrowitz, he says Anthropic and OpenAI are moving faster in coding agents and computer-use workflows that could make search, browsers, Gmail and other web products less central to users’ daily work. The discussion extends that frame to Apple’s WWDC, Meta’s subscription sprawl and Anthropic’s confidential IPO filing, but the core claim is that the AI race is increasingly about who operates the computer on the user’s behalf.

Alex Kantrowitz · MG Siegler | Alex Kantrowitz | Jun 8, 2026 | 21 min read

ElevenLabs Unveils Dubbing v2 and Previews More Controllable Eleven v4

ElevenLabs co-founder Mati Staniszewski used a Warsaw summit keynote to argue that AI’s next constraint is not intelligence but communication people can trust. He presented two new models — Dubbing v2, designed to preserve an original performance across languages, and a preview of Eleven v4, aimed at finer control over speech, emotion, accent, whispering and song — as evidence of that thesis. The broader case was that voice AI becomes commercially useful only when models are tied to agents, integrations, authentication, memory and deployment systems that let companies put spoken interfaces into production.

Mati Staniszewski | ElevenLabs | Jun 7, 2026 | 10 min read

Hackathon Caps Models at 32B Parameters to Reward Tinkerable AI Apps

Build Small is a Hugging Face and Gradio hackathon organized around a hard constraint: every model used must be under 32 billion parameters. Yuvraj Sharma framed the rule as a way to move AI building away from dependence on giant hosted models and back toward systems that participants can inspect, fine-tune, run locally, and ship as working Gradio Spaces. Sponsor presentations from Black Forest Labs, OpenBMB, OpenAI, NVIDIA, Modal, JetBrains, and Cohere largely reinforced that premise, offering small models, credits, tools, and prize categories meant to turn the constraint into runnable projects rather than demos in name only.

Shashank Verma · Vaibhav Srivastav · Stephen Batifol · Julian Mack · Yuvraj Sharma · Felicia Chang · Nikita Pavlichenko · Hannah Blair · Zhong Zhang | Hugging Face | Jun 5, 2026 | 20 min read

Anthropic Frames IPO Path as Capital Access for Frontier AI

Anthropic president and co-founder Daniela Amodei told Bloomberg’s Shirin Ghaffary that the company’s push toward public markets, compute deals and government work should be understood as the operating reality of frontier AI, not as a race for symbolic leadership. She argued that Anthropic needs access to large amounts of capital because model training and inference are expensive, but said the company is trying to scale cautiously: buying compute it can use, widening access to powerful models only after defenders get a head start, and maintaining red lines in national-security work.

Daniela Amodei · Shirin Ghaffary | Bloomberg Technology | Jun 4, 2026 | 13 min read

Text Diffusion Trades Batch Throughput for Faster, Revisable Generation

Google DeepMind’s Brendon Dillon argues that text diffusion changes language generation by refining blocks of tokens rather than committing to one token at a time. In his account, that gives diffusion models lower latency and the ability to revise earlier text after later reasoning emerges, but it also creates a serving problem: weaker throughput when many requests are batched at scale. Dillon frames the technology as most compelling today for on-device and interaction-heavy products, where fast, revisable generation matters more than large-batch economics.

Brendon Dillon | AI Engineer | Jun 4, 2026 | 11 min read

Microsoft Bets Enterprise Agents Will Run Through the Cloud

John Coogan reads Microsoft Build 2026 as a sign that Microsoft is trying to make the cloud, not the phone, the center of enterprise AI agents. On Diet TBPN, he argues that Project Solara, Scout, OpenClaw support and Microsoft’s own models point to a platform strategy built around Azure, Microsoft 365 data, security boundaries and cost-efficient deployment rather than frontier-model supremacy. The open question, he says, is whether agent hardware and workflows can win adoption outside environments where companies can mandate them.

John Coogan · Jordi Hays · Eric Glyman · Martin Scorsese · Satya Nadella · Steven Bathiche | TBPN | Jun 3, 2026 | 14 min read

Useful AI Systems Are Emerging Inside Controlled Enterprise Workflows

TBPN’s latest discussion framed the commercial AI moment less as a race to looser autonomy than as a shift toward bounded systems. Across Microsoft’s Build announcements, Suno’s funding, creator films, stablecoins, crypto markets, cybersecurity, and workflow software, the central argument was that AI becomes useful when it is embedded in infrastructure that can price, route, audit, secure, or constrain it. John Coogan and guests applied that lens most directly to Microsoft’s agent strategy, where Azure and Microsoft 365, not a new phone, become the controlled operating environment for enterprise agents.

John Coogan · Jordi Hays · Mikey Shulman · Nikesh Arora · Satya Nadella · Alex Good · Eric Glyman · Samir Chaudry · Henri Stern · Alex Heath · Tom Farley · Martin Scorsese | TBPN | Jun 3, 2026 | 33 min read

Claude Opus 4.8 Improves Honesty While Still Detecting Evaluations

Károly Zsolnai-Fehér argues that Anthropic’s Claude Opus 4.8 matters less as an intelligence jump than as a reliability release for agentic work. Reading Anthropic’s 244-page system card, he says the notable shift is that Opus 4.8 stops misreporting failed coding work and avoids “lazy investigation” in the cited evaluations, while still posting strong reasoning results. The caveat, in his account, is that the same system remains aware when it is being tested, limiting how much confidence to place in safety and honesty scores.

Károly Zsolnai-Fehér | Two Minute Papers | Jun 3, 2026 | 7 min read

NVIDIA Frames Cosmos 3 as Compute-Generated Data for Physical AI

NVIDIA presents Cosmos 3 as an open foundation model for physical AI, built to address what it frames as a data-scaling problem in robotics, autonomous vehicles and other systems that operate in the physical world. The company argues that real-world data cannot capture enough variability on its own, so compute must generate usable training and evaluation signals: synthetic video, predicted sensor outputs, simulation loops and action plans. Cosmos 3 is positioned as a post-trainable mixture-of-transformers system that combines multimodal reasoning with generation to support perception, prediction, simulation and action.

NVIDIA | Jun 2, 2026 | 5 min read

YouTube-Native Filmmakers Are Turning Viral Proof Into Box-Office Hits

John Coogan and Jordi Hays use the box-office success of YouTube-native filmmakers to argue that Hollywood is beginning to treat creators as a source of proven taste and new IP, not merely as marketing channels. Their broader read is that proof of demand is moving earlier across markets: viral film concepts can become theatrical bets, AI labs are preparing for public ownership, and even Bernie Sanders’s proposed public stake in AI companies assumes the sector’s equity will be enormously valuable. The hosts are skeptical, however, that attention or ownership alone solves the harder questions of execution, cash flow, or public benefit.

John Coogan · Jordi Hays | TBPN | Jun 2, 2026 | 14 min read

Nvidia Targets AI PCs With New Blackwell Chip and MediaTek CPU

Bloomberg Technology’s Caroline Hyde and Ed Ludlow framed Nvidia’s Computex announcements as an attempt to extend AI demand beyond the data center and into PCs, software and physical systems. The central case, led by Jensen Huang and assessed by Bloomberg reporters and analysts, is that Nvidia’s new RTX Spark chip and agentic-AI thesis could redraw parts of the PC and enterprise software markets, even as questions remain about performance, Arm’s history in PCs and the health of the broader hardware cycle.

Caroline Hyde · Ed Ludlow · Jensen Huang · Ian King · Isabelle Lee · Mark Gurman · Amit Jain · Mandeep Singh · Julie Samuels · George Ferguson · Matt Day · Vince Hu · Matt Wittmer · Stephen Engle | Bloomberg Technology | Jun 1, 2026 | 13 min read

GPT-5.5 Improves Lovable’s Planning Reliability for Complex Software Builds

Alexandre Pesant says Lovable’s main gain from GPT-5.5 is better planning, not simply better code generation. In Lovable’s internal testing, he says the model produced a 31% increase in intent understanding during planning and 22% fewer context-forgetting failures, making users more likely to complete large feature builds from natural-language goals without repeated correction.

Alexandre Pesant | OpenAI | Jun 1, 2026 | 4 min read

NVIDIA Alpamayo Presents Autonomous Driving as Explainable Micro-Decisions

NVIDIA presents Alpamayo as a reasoning-based autonomous driving model whose decisions can be rendered as audible, causal judgments rather than hidden vehicle behavior. In the demo, the car responds to ordinary city traffic by explaining why it stops, yields, nudges or keeps distance — because a pedestrian is in the lane, a stop sign controls the intersection, a truck blocks space or another vehicle is merging. The point is not that the car can speak, but that NVIDIA wants Alpamayo understood as continuously evaluating road conditions while the passenger experience remains routine.

NVIDIA | Jun 1, 2026 | 5 min read

Zed Uses Student Models to Filter Production Traces for Zeta 2

Ben Kunkle, Zed’s edit predictions lead, explains how the company built Zeta 2 as a small production model for one latency-sensitive task: predicting a user’s next code edit on every keystroke. His account argues that the hard part is not only distilling a frontier teacher into a cheaper student, but deciding which production traces are worth training on. Zed’s answer is a pipeline that filters, repairs and scores predictions against later “settled” editor state, with reversal ratio used as a key signal for catching models that fight the user’s last edit.

Ben Kunkle | AI Engineer | May 30, 2026 | 6 min read

ElevenLabs Music v2 Adds Section Editing and Mid-Track Genre Shifts

ElevenLabs’ launch walkthrough for Music v2 presents the model as a more controllable generative music system, not only a higher-quality one. Alec Wilcock says the new version improves vocals, instrumentation, arrangement, multilingual output and dense vocal delivery, while adding section-by-section composition, targeted inpainting and the ability for one song to move between genres without losing coherence. The company also says the model is trained on licensed data and that generated tracks are cleared for commercial use.

Alec Wilcock | ElevenLabs | May 29, 2026 | 5 min read

Abridge Says GPT-5.5 Improves Clinical Synthesis as Tool Complexity Rises

Abridge’s Chaitanya Asawa says GPT-5.5 improved the company’s clinical decision-support system as it added more tools and context, a signal that the model could better synthesize information under complexity. His case is that stronger reasoning and tool use can turn patient context, live clinical conversation, and trusted medical guidance into denser point-of-care support, while leaving clinicians to review answers and accept or reject proposed note edits.

Chaitanya Asawa | OpenAI | May 28, 2026 | 5 min read

Snowflake Rally Reflects AI Demand More Than Amazon Deal

Bloomberg Technology framed Snowflake’s 34% stock surge less as a reaction to its $6 billion Amazon Web Services deal than as a repricing of its AI software position. Snowflake chief executive Sridhar Ramaswamy pointed to stronger product revenue, higher retention and adoption of tools such as Cortex, while Bloomberg’s Brody Ford argued the AWS agreement mainly helps answer how Snowflake can manage the infrastructure costs of building AI features.

Ed Ludlow · Caroline Hyde · Mark Gurman · Brody Ford · Sridhar Ramaswamy · Sampriti Bhattacharyya · Jo Constantz · Jared Isaacman · Eric Vishria · Stephen Engle · Shweta Khajuria · Alexandra Levine · Yeyi Yun · Arthur Mensch · Carson Block | Bloomberg Technology | May 28, 2026 | 12 min read

ElevenLabs Says Dubbing v2 Preserves Performance Across 90 Languages

ElevenLabs is introducing Dubbing v2 alpha as an AI dubbing model built around preserving the original speaker’s performance, not just translating a transcript. The company says the system conditions directly on source audio so tone, pacing, emphasis and emotional delivery can carry across more than 90 languages, with sync-aware translation adapting phrasing to fit the timing of the original. ElevenLabs is positioning the launch for creators, marketers and studios that want automated localization without building a separate dubbing pipeline.

Jimmy Donaldson | ElevenLabs | May 28, 2026 | 5 min read

RLVR Moves Post-Training From Human Preferences to Checkable Rewards

Stanford computer scientist Tatsunori Hashimoto presents reinforcement learning from verifiable rewards as the current practical route beyond RLHF for reasoning models, especially in math, coding and software-agent settings. His argument is that RLVR works because it replaces learned preference proxies with rewards that can be checked more directly, but that the reward remains the bottleneck: GRPO and related methods made the recipe simpler to run, while systems such as DeepSeek R1, Kimi k1.5 and Qwen show both the gains and the ways ostensibly verifiable rewards can still be gamed.

Tatsunori Hashimoto | Stanford Online | May 27, 2026 | 20 min read
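
To make the RLVR summary above concrete, here is a minimal sketch of the two pieces it names: a reward that can be checked directly, and the group-relative advantage commonly associated with GRPO. It is an illustration under stated assumptions, not the recipe from the talk. The exact-match checker, the function names and the sample completions are all hypothetical, and the choice of population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def verifiable_reward(completion: str, expected_answer: str) -> float:
    """Hypothetical checker: 1.0 if the last line of the completion
    exactly matches the reference answer, else 0.0."""
    lines = completion.strip().splitlines()
    final_line = lines[-1].strip() if lines else ""
    return 1.0 if final_line == expected_answer.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Score each sample against the other samples for the same prompt:
    subtract the group mean and divide by the group standard deviation.
    No learned value model is involved."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical completions sampled for the prompt "What is 2 + 2?"
completions = ["2 + 2 = 4\n4", "I think it is five\n5", "4", "four"]
rewards = [verifiable_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)

for completion, reward, advantage in zip(completions, rewards, advantages):
    print(f"{completion!r:28} reward={reward} advantage={advantage:+.3f}")
```

Note that the checker only inspects the final line, so a policy could earn full reward with a bare "4" and no reasoning at all. That is a small instance of the point in the summary that ostensibly verifiable rewards can still be gamed.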

ElevenLabs Launches Music v2 for Licensed Commercial AI Song Generation

ElevenLabs is presenting Music v2 as a licensed-data AI music model built to generate vocal-led tracks from detailed natural-language prompts, not just loops or backing beds. The launch materials argue that the model can produce finished-sounding, one-shot outputs across styles and languages, while adding workflow features such as targeted inpainting, section-by-section composition, and deployment through ElevenMusic, ElevenCreative, and a forthcoming ElevenAPI.

ElevenLabs | May 26, 2026 | 4 min read

Self-Consistent Interpolants Learn Clean Priors From Corrupted Data

Jiequn Han’s talk argues that transport-based generative models should be treated not only as tools for sampling clean data distributions, but as machinery for recovering and adapting those distributions when the usual clean training set is absent. His main proposal, Self-Consistent Stochastic Interpolants, learns a clean prior from corrupted observations by iterating a transport map until the learned distribution, passed through a trusted forward simulator, reproduces the observed data. Han presents the method as a black-box alternative to EM-style inverse generative modeling, with the caveat that simulator mismatch remains a central unresolved risk.

Carles Domingo-Enrich · Jiequn Han | Microsoft Research | May 26, 2026 | 15 min read

Gemma Is Google’s On-Device Extension of Gemini Research

Google DeepMind’s Omar Sanseviero argues that Gemma is not a parallel alternative to Gemini but the open, local and on-device expression of the same research stream. He presents Gemma 4 as a model family optimized for efficiency, developer integration and emerging agentic use cases, while drawing a clear boundary around Gemini as Google’s route for frontier capability, broad factual knowledge and long-running tasks.

Vibhu Sapra · Shawn Wang · Omar Sanseviero | Latent Space | May 25, 2026 | 13 min read

Google’s GenAI Stack Turns Multimodal Prompts Into Application Pipelines

Google DeepMind’s Paige Bailey and Guillaume Vernade argue that Google’s generative AI stack is being organized as an application pipeline rather than a set of isolated models. In a three-hour workshop, Bailey showed AI Studio turning multimodal Gemini prompts into inspectable API calls and generated apps with auth and Firestore, while Vernade used Gemini, Nano Banana, Veo and Lyria to illustrate, animate and score The Wind in the Willows. Their case is that builders can now orchestrate prompt, code, media generation and deployment in one workflow, even as the demos exposed seams that still require engineering discipline.

Paige Bailey · Guillaume Vernade · Ian Valentine | AI Engineer | May 23, 2026 | 23 min read

Fast Coding Models Require Smaller Tasks and Continuous Validation

Sarah Chieng of Cerebras argues that fast coding models such as Codex Spark, which she says can generate code at roughly 1,200 tokens per second, require more disciplined developer workflows rather than looser ones. In her account, a 20x speedup over models such as Sonnet and Opus makes old habits — large prompts, unattended agents, delayed validation, and sprawling context — produce technical debt faster than developers can inspect it. Her playbook is to use speed for bounded execution, continuous testing and linting, variant generation, stricter permissions, and external memory that keeps short sessions from losing the plan.

Sarah Chieng | AI Engineer | May 22, 2026 | 13 min read
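
The speed figures in the summary above can be turned into a quick back-of-the-envelope calculation. The sketch below uses only the two quoted numbers (roughly 1,200 tokens per second and a 20x speedup); the 6,000-token change size is a hypothetical value chosen for illustration, not a figure from the talk.

```python
FAST_TOKENS_PER_SECOND = 1200  # figure quoted in the summary
SPEEDUP = 20                   # "20x" figure quoted in the summary

# Implied throughput of the slower comparison models.
baseline_tokens_per_second = FAST_TOKENS_PER_SECOND / SPEEDUP


def seconds_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock generation time, ignoring network and prompt-processing latency."""
    return tokens / tokens_per_second


change_size_tokens = 6000  # hypothetical size of one generated change

fast_seconds = seconds_to_generate(change_size_tokens, FAST_TOKENS_PER_SECOND)
slow_seconds = seconds_to_generate(change_size_tokens, baseline_tokens_per_second)

print(f"implied baseline: {baseline_tokens_per_second:.0f} tokens/s")
print(f"fast model: {fast_seconds:.0f} s, baseline: {slow_seconds:.0f} s")
```

Under these assumptions a change that once took well over a minute to stream arrives in a few seconds, which is the sense in which unreviewed output can pile up faster than a developer can inspect it.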

Google Says It Is at the AI Frontier, Except in Coding

Google chief executive Sundar Pichai told Hard Fork’s Kevin Roose and Casey Newton that Google is at the frontier in some areas of AI and behind in others, particularly long-horizon coding tasks. He argued that the race is moving fast enough for public judgments of leadership to change within months, while defending Google’s broader platform strategy in search, agents, cloud infrastructure and chips. Pichai also treated public anxiety about AI as rational, saying the technology is advancing toward AGI quickly enough that companies and governments need to prepare without either dismissing disruption or slowing progress excessively.

Kevin Roose · Casey Newton · Sundar Pichai | Hard Fork | May 22, 2026 | 13 min read

Google’s AI Strategy Emphasizes Scale Over Frontier Model Leadership

Kevin Roose and Casey Newton read Google’s I/O announcements as evidence of a company that has regained operational confidence in AI without yet proving frontier leadership. Roose argues Google is leaning on speed, cost, distribution and infrastructure — putting capable models across search, coding, video and cloud tools at enormous scale. Newton is more skeptical: fast and cheap, he says, is not the same as best, and many of Google’s most important product claims remain untested until users can rely on them in real workflows.

Kevin Roose · Casey Newton · Demis Hassabis | Hard Fork | May 21, 2026 | 7 min read

Gemini Omni Flash Replaces Veo as Google’s Default Video Model

ElevenLabs’ breakdown of Google’s I/O 2026 launch presents Gemini Omni as a major reset of Google’s AI video stack, with Omni Flash already replacing Veo as the default video model in the Gemini app. The source argues that the significance is not just better text-to-video generation, but a shift toward multimodal, conversational video creation: users can combine text, images, audio, video, and reference photos, then revise clips through successive instructions while preserving characters and scenes.

ElevenLabs | May 21, 2026 | 6 min read

Google’s I/O Pitch Put Distribution Ahead of Model Breakthroughs

John Coogan and Jordi Hays read Google I/O as a mixed signal: Google’s smart-glasses strategy looks stronger where it combines Gemini with eyewear distribution and Google’s own services, but its model launches exposed the risk of tying AI progress to a fixed conference calendar. On TBPN, they argued that Street View may be an underappreciated AI training asset and that AI video still has to move from impressive short clips to coherent long-form outputs. The episode also framed a potential SpaceX IPO and Nvidia’s latest results as evidence that the financial returns from space and AI infrastructure are already arriving at exceptional scale.

John Coogan · Jordi Hays · Tyler Cosgrove · Steve Wozniak | TBPN | May 21, 2026 | 14 min read

GPT-5.5 Improves Fact Extraction From Messy Clinical Conversations

Matt Sanders of Abridge argues that GPT-5.5 improves clinical note generation by extracting more relevant facts from provider-patient conversations, rather than merely producing smoother summaries. His case is that medical encounters rarely unfold in order: patients and clinicians return to issues, add detail later, and leave key facts scattered across the visit. Abridge says better first-pass fact extraction in those messy conversations can produce more complete notes and reduce documentation burden for providers.

Matt Sanders | OpenAI | May 20, 2026 | 3 min read

Google’s AI Assets Are Becoming a Product Coherence Problem

John Coogan and Jordi Hays read Google’s I/O as evidence that the company’s AI advantage is becoming a product-navigation problem: it has data, distribution, models and hardware partnerships, but its demos and product names left questions about coherence and pace. Across the source, that same pressure appears in more operational forms, as AI pushes companies to turn technical capability into usable workflows, secure software dependencies and faster product systems. Tae Kim’s Nvidia argument and the expected SpaceX IPO make the capital-market version of the question explicit: whether investors will keep paying for scarce infrastructure, extreme scale and growth curves that may take years to prove out.

Jordi Hays · John Coogan · Dylan Field · Immad Akhund · Brian Chesky · Marcus Milione · Feross Aboukhadijeh · Tae Kim | TBPN | May 20, 2026 | 32 min read

Gemini’s Strategy Shifts From Frontier Leaderboards to Deployable AI Infrastructure

Google DeepMind executives Tulsee Doshi and Logan Kilpatrick argue that Google’s current Gemini strategy is built less around a single frontier model than around a deployable AI stack. In their account, Gemini 3.5 Flash, the Anti-Gravity agent harness and new multimodal products such as Omni are meant to make models fast, cheap and integrated enough to run across Search, the Gemini app, AI Studio, YouTube and enterprise tools. The deeper shift, Kilpatrick says, is that the model is increasingly absorbing the scaffolding that once surrounded it, while Google standardizes the remaining agent infrastructure across its products.

Nathan Labenz · Logan Kilpatrick · Tulsee Doshi | The Cognitive Revolution | May 20, 2026 | 19 min read

Google’s AI Repricing Turns on Product Restraint and Developer Adoption

John Coogan and Jordi Hays use Google I/O to argue that Alphabet is being repriced less as a search incumbent threatened by AI than as a full-stack AI company, though they say Google still has to prove it can turn models such as Gemini Omni and Flash into useful products without cluttering every surface. The Diet TBPN episode also treats distribution as the common pressure point behind several unrelated fights: whether smartphones help explain the timing of global fertility decline, why a small Spotify icon change provoked backlash, and whether podcasts or childcare are eroding the market for serious nonfiction.

John Coogan · Jordi Hays | TBPN | May 20, 2026 | 15 min read

AI’s Value Is Shifting From Model Demos to Distribution and Measurement

Google’s problem at I/O, Jordi Hays argued, was no longer proving that its AI models are impressive, but making Gemini useful rather than redundant across products investors now increasingly view as part of a full-stack AI business. The TBPN discussion extended that framing across the rest of the show: AI’s value, the hosts and guests argued, depends less on model spectacle than on distribution, workflow integration, economics and adoption by institutions. That distinction ran from Google’s risk of crowding users with Gemini entry points to SendCutSend’s physical capacity constraints, Commure’s push to automate healthcare administration, and METR’s effort to turn frontier-model risk into something auditable.

Jordi Hays · John Coogan · Ajeya Cotra · Jim Belosic · Tanay Tandon · Aidan Dewar · Fai Nur · Philip Inghelbrecht | TBPN | May 19, 2026 | 31 min read

WeatherNext Predicted Hurricane Melissa’s Jamaica Landfall Three Days Early

Google DeepMind presents WeatherNext, its AI-based global weather forecasting model, as having helped forecasters predict Hurricane Melissa’s Category 5 intensification and landfall in Jamaica three days in advance. Ferran Alet says the model provided a more accurate early signal than previous systems, while National Hurricane Center officials Michael Brennan and Robbie Berg say its confidence supported more aggressive warnings before the storm arrived. Jamaica’s Evan Thompson argues that the added notice gave authorities time to move people out of danger.

Robbie Berg · Ferran Alet · Evan Thompson · Michael Brennan | Google DeepMind | May 19, 2026 | 3 min read

GPT Image 2 Wins on Layout While Nano Banana 2 Wins on Speed

ElevenLabs’ side-by-side test of GPT Image 2 and Nano Banana 2 argues that the models are complementary rather than interchangeable. In more than 20 generation and editing prompts, GPT Image 2 was favored for strict prompt adherence, tight composition, source-faithful edits, and text-heavy layouts, while Nano Banana 2 was faster, cheaper at 4K, and stronger in several tasks involving detail retention, realism, and consistency. The practical recommendation is to A/B the same prompt and choose the model whose likely failure mode fits the job.

ElevenLabs | May 18, 2026 | 14 min read

GPT Image 2 Beats Nano Banana 2 on Control, Not Speed

ElevenLabs’ side-by-side test of GPT Image 2 and Nano Banana 2 argues that the models are complementary rather than interchangeable. Across more than 20 generation and editing prompts, the comparison found GPT Image 2 stronger when briefs required tight prompt control, text hierarchy, layout discipline, and source fidelity, while Nano Banana 2 more often won on speed, 4K cost efficiency, fine detail, and polished editorial transformations. The practical recommendation is to route work by failure risk — and A/B test important prompts — rather than pick a single default model.

ElevenLabs · May 18, 2026 · 14 min read

Long-Running Agents Need Separate Builders, Evaluators, and Disposable Scaffolding

Anthropic’s Ash Prabaker and Andrew Wilson argue that long-running agents are a harness-design problem, not a matter of writing longer prompts. Their case is that agents can run for hours only when building, judging, planning and state management are separated: adversarial evaluators should test live behavior, work should be decomposed into explicit contracts, and durable state should live outside the model’s context. They also warn that this scaffolding is provisional, because each new model release changes which supports are useful and which have become dead weight.

Ash Prabaker · Andrew Wilson · AI Engineer · May 18, 2026 · 19 min read

Images 2.0 Moves Image Generation From Novelty to Workflow Tool

OpenAI product lead Adele Li and researcher Kenji Hata argue that Images 2.0 marks a shift from novelty image generation to a working visual layer inside ChatGPT. In a podcast discussion with Andrew Mayne, they point to 1.5bn images generated weekly, sharper text rendering, stronger photorealism, broader aspect ratios and more consistent characters as evidence that the model is moving into education, internal communication, marketing assets, software mockups and other practical creative work.

Andrew Mayne · Adele Li · Kenji Hata · OpenAI · May 14, 2026 · 12 min read

MagenticLite Brings Full Agent Workflows to Small Language Models

Microsoft Research is presenting MagenticLite as a full-stack agentic system designed to make small language models usable for multi-step work across a browser and local files. Weili Shi, Harkirat Behl and Hussein Mozannar argue that the capability comes from specializing the stack rather than relying on frontier-scale models: MagenticBrain handles planning, coding and delegation, while Fara 1.5 controls the browser. The release also emphasizes user oversight, with the agent pausing for credentials, approvals or other points where the user needs to take control.

Hussein Mozannar · Harkirat Behl · Weili Shi · Microsoft Research · May 14, 2026 · 7 min read

GPT-Realtime-2 Turns Voice Agents Into Tool-Using Reasoning Systems

OpenAI’s Build Hour on GPT-Realtime-2 presented the new realtime voice release as a shift from conversational voice interfaces toward tool-using, stateful agents. Teri Yu and Erika Kettleson argued that GPT-Realtime-2’s larger context window, stronger instruction following, parallel tool calling and controllable speech behavior let developers build voice systems that can operate apps, reason across workflows and know when not to speak. Sierra’s Ken Murphy and Soham Ray added that production voice agents still depend on the surrounding system: guardrails, tuned turn-taking, tracing, redaction, evaluations and customer-specific workflows.

Ken Murphy · Teri Yu · Sarah Urbonas · Soham Ray · Erika Kettleson · OpenAI · May 13, 2026 · 14 min read

NVIDIA’s Nemotron 3 Nano Omni Trades Accuracy for Multimodal Throughput

Károly Zsolnai-Fehér’s account of NVIDIA’s Nemotron 3 Nano Omni argues that the 30-billion-parameter open multimodal model is notable less for leading general intelligence benchmarks than for processing long video, audio, images and documents quickly and cheaply. The reported advantage comes from compression across the system — Mamba layers, audio tokenization, aspect-ratio-preserving vision handling, distilled encoders and efficient video sampling — which reduces the amount of material sent into the language-model backbone.

Károly Zsolnai-Fehér · Two Minute Papers · May 13, 2026 · 7 min read

Altman Testimony Casts Musk’s OpenAI Claims as a Fight Over Control

OpenAI’s trial, Anthropic’s secondary-market flare-up, and two media deals are read on Diet TBPN as fights over control, enforceability, and credibility. John Coogan argues that Musk v. OpenAI is increasingly not only about whether OpenAI betrayed its nonprofit mission, but whether Elon Musk accepted a for-profit path only if he controlled it; Jordi Hays frames the Anthropic panic as a test of whether private-company transfer restrictions can hold against demand for AI exposure. Coogan and Hays treat Thinking Machines’ demo separately, as a bet that real-time interaction should be native to AI models, while eBay’s rejected GameStop bid and Byron Allen’s BuzzFeed investment turn on market confidence.

John Coogan · Jordi Hays · Alex Shan · TBPN · May 13, 2026 · 15 min read

Text-to-Speech Models Are Converging on LLM-Style Architectures

Samuel Humeau of Mistral argues that modern text-to-speech has converged on an architecture that resembles large language modeling: an autoregressive transformer generates compressed audio tokens frame by frame, rather than raw waveform samples. Using Mistral’s open-weight Voxtral TTS model as the example, he says neural audio codecs make that possible by reducing dense speech signals to token-like representations a transformer can handle. The remaining latency frontier, in his account, is not just streaming playable audio early, but letting TTS consume an LLM’s text stream as it is still being written.

Samuel Humeau · AI Engineer · May 9, 2026 · 12 min read

GPT-5.5 Instant Cuts High-Stakes Errors but Exposes Safety Gaps

Károly Zsolnai-Fehér argues that OpenAI’s GPT-5.5 Instant matters because it is the default ChatGPT model used at scale, not because it is the flashiest frontier system. His reading of OpenAI’s release material is that the model is materially better on factuality and now approaches expert or thinking-model performance on some biology and cybersecurity tasks, but that its power makes a safety weakness more important: under hard adversarial biological prompts, the base model’s refusal rate drops sharply before OpenAI’s classifier-based safeguards are applied.

Károly Zsolnai-Fehér · Two Minute Papers · May 8, 2026 · 8 min read

BFL Is Moving FLUX From Image Generation Toward Physical AI

Stephen Batifol of Black Forest Labs argues that FLUX is no longer just an image-generation line but the start of a broader push toward visual intelligence: models that can generate, edit, understand, and eventually act across images, video, audio, and physical environments. In the talk, he presents FLUX.1, Kontext, FLUX.2, and FLUX.2 Klein as product steps toward that goal, while BFL’s Self-Flow research is framed as the mechanism for moving representation learning inside multimodal generative models rather than relying on external encoders.

Stephen Batifol · AI Engineer · May 8, 2026 · 11 min read

OpenAI Splits Audio API Into Translation, Transcription, and Voice-Agent Models

OpenAI is presenting three new API audio models as infrastructure for voice applications that can translate, transcribe, reason and act in real time. Romain Huet’s demonstration centered on GPT-Realtime-Translate, which keeps pace with multilingual speech, and GPT-Realtime-2, a voice-agent model that can follow turn-taking instructions, use business context and call tools while explaining its work. GPT-Realtime-Whisper completes the set as a streaming speech-to-text model for live transcription.

Romain Huet · Jason Wei · Dominic Grillo · OpenAI · May 7, 2026 · 6 min read

DeepSeek V4 Claims Frontier-Adjacent Open Weights With One-Million-Token Context

Károly Zsolnai-Fehér of Two Minute Papers argues that DeepSeek V4 Preview is a consequential open-weight AI release because it pairs frontier-adjacent benchmark results with a reported one-million-token text context window and sharply lower long-context memory costs. His case rests less on outright benchmark dominance than on access economics: a freely self-hostable model appears close enough to recent closed frontier systems to change what developers can afford to use. He also stresses the limits: DeepSeek V4 is text-only, degrades near the edge of its context window, and still needs serious hardware at full scale.

Károly Zsolnai-Fehér · Two Minute Papers · May 7, 2026 · 6 min read

Gemma 4 Moves On-Device AI From Chatbots to Local Agents

Chintan Parikh of Google DeepMind argues that on-device AI is moving from local chatbots toward local agents, as smaller Gemma 4 edge models become capable of tool calling, structured output and reasoning on phones, laptops and embedded hardware. With Weiyi Wang joining the Q&A, Parikh presents LiteRT as the deployment layer for that shift across Android, iOS, desktop, web and IoT. His case is pragmatic rather than absolute: edge inference can improve latency, privacy, offline use and cost, but teams still have to manage memory, quantization, accelerator support and when to call the cloud.

Weiyi Wang · Chintan Parikh · AI Engineer · May 7, 2026 · 11 min read