Topic

Image and Video Generation

Generative image and video models, editing tools, creative workflows, media production, synthetic content, and visual AI product launches.

Midjourney Medical Extends Image-Generation Ambitions Into Full-Body Ultrasound Scanning

TBPN hosts John Coogan and Jordi Hays read Midjourney Medical as a continuation of David Holz’s long-running work on sensing, interfaces and machine perception, rather than a sudden move from image generation into healthcare. Their account argues that Midjourney’s unusual business — bootstrapped, community-driven and cash-generative — has given Holz room to attempt a capital-intensive ultrasound scanning system with ambitions far beyond a conventional clinic device. The episode pairs that bet with OpenAI’s hiring of Noam Shazeer and Dean Ball as evidence that technical talent, policy capacity and institutional advantage are converging in AI.

John Coogan · Jordi Hays · Jake Paul | TBPN | Jun 19, 2026 | 14 min read

Flows Agent Turns Creative Briefs Into Editable AI Production Pipelines

ElevenLabs presents Flows Agent as a conversational assistant for building and revising node-based creative workflows inside ElevenCreative Flows. The company’s case is that a user can describe an ad or other asset in natural language, have the agent assemble the models, prompts, nodes, and connections, then keep the resulting pipeline visible for edits, approvals, and reuse. The demo emphasizes cost controls for credit-heavy generation, node-level revisions through chat, and templates that turn a completed flow into a repeatable production system.

ElevenLabs | Jun 18, 2026 | 6 min read

Models Will Absorb Today’s Agent Harnesses Within a Year

Logan Kilpatrick, who leads Google AI Studio and the Gemini API, argues that the current rush to build agent harnesses may have a short shelf life. In an interview with Sequoia Capital’s Sonya Huang, he says models are absorbing the scaffolding around agents and could make much of today’s custom harness layer less distinctive within about 12 months. Google’s own strategy runs on both sides of that claim: Antigravity has become a shared agent layer across products, while Kilpatrick says the durable advantage for builders will move to focus, domain knowledge, risk tolerance and useful outcomes for users.

Logan Kilpatrick · Sonya Huang | Sequoia Capital | Jun 11, 2026 | 19 min read

Codex Turns Campaign Briefs Into Editable Marketing Assets

OpenAI’s demo presents the Creative Production plugin for Codex as a campaign-production workflow for marketing teams, rather than a standalone image generator. Using a fictional Maison Feve chocolate launch, the company shows Codex turning a brief into mood-board directions, revised visual treatments, display-ad variants and an editable Canva handoff. The argument is that marketers can use Codex to carry campaign context through concepting, asset generation and final production edits in one working thread.

OpenAI | Jun 10, 2026 | 5 min read

Apple’s New Siri Tests Who Controls the Default AI Assistant

John Coogan and Jordi Hays read Apple’s WWDC as a test of whether the company can turn its long-delayed Siri promise into a defensible AI interface without giving up control of defaults, privacy, and the iPhone camera. The Diet TBPN segment argues that Apple’s AI story is less about a single keynote than about older bets now becoming technically possible, while Anthropic’s Claude Fable release and Meta’s data-center training push show the same shift toward long-running inference and physical AI infrastructure.

John Coogan · Jordi Hays | TBPN | Jun 10, 2026 | 15 min read

A Python Decorator Replaces the GPU Deployment Container Loop

RunPod’s Audrey Hsu argues that GPU inference development should not require a commit, container build, registry push and server provisioning cycle for every model change. In a demo of Flash, RunPod’s Python SDK, she shows how adding a `@flash.endpoint` decorator to an async function can package that function as a GPU-backed cloud endpoint while the rest of the application stays in the developer’s IDE. Her broader case is that teams should experiment on Pods or low worker counts, then move to Serverless when they need autoscaling inference across many GPU workers.

Audry Hsu | AI Engineer | Jun 9, 2026 | 10 min read

ElevenLabs Adds Studio and Flows Agents to Automate Creative Production

Luke Harries used ElevenLabs’ Warsaw summit to argue that AI creative production is moving beyond prompt-based asset generation toward agent-directed workflows. Presenting ElevenCreative, he introduced Studio Agent and Flows Agent as layers above models and editing tools, intended to help teams ideate, script, prompt, edit, localize, and reuse campaigns. His case was that marketers’ role shifts from executing each production step to directing and approving systems that can produce hero assets, performance variations, and localized creative continuously.

Luke Harries | ElevenLabs | Jun 8, 2026 | 6 min read

Sanders’ 50% AI Stock Plan Turns Training Data Into a Political Fight

Jason Calacanis argued that Anthropic’s call for an AI slowdown and Bernie Sanders’ proposal for public ownership of major AI companies show AI politics moving toward jobs, ownership and redistribution. He dismissed Sanders’ 50% stock-tax plan as unworkable but said its premise could resonate with voters who believe AI companies built enormous value from public and creative inputs while threatening employment. Yoland Yan’s ComfyUI demo supplied the production-layer version of the same control question, presenting generative AI as a workflow where exposed parameters and reproducibility matter more than prompt-box convenience.

Jason Calacanis · Lon Harris · Alex Wilhelm · Yoland Yan | This Week in Startups | Jun 7, 2026 | 24 min read

ComfyUI Bets on Open-Source Control for AI Video Workflows

Despite its Anthropic-titled hook, the source’s developed argument is about product interfaces that give users more control over complex systems. ComfyUI co-founder Yoland Yan argues that serious AI video creators need open, node-based workflows rather than simplified freemium tools; INTVL founder Louis Phillips makes the case for turning tracked routes into contested fitness territory; and the fact-checker bounty highlights live verification as a control layer for streamed claims.

Louis Phillips · Yoland Yan | This Week in Startups | Jun 5, 2026 | 17 min read

Hackathon Caps Models at 32B Parameters to Reward Tinkerable AI Apps

Build Small is a Hugging Face and Gradio hackathon organized around a hard constraint: every model used must be under 32 billion parameters. Yuvraj Sharma framed the rule as a way to move AI building away from dependence on giant hosted models and back toward systems that participants can inspect, fine-tune, run locally, and ship as working Gradio Spaces. Sponsor presentations from Black Forest Labs, OpenBMB, OpenAI, NVIDIA, Modal, JetBrains, and Cohere largely reinforced that premise, offering small models, credits, tools, and prize categories meant to turn the constraint into runnable projects rather than demos in name only.

Shashank Verma · Vaibhav Srivastav · Stephen Batifol · Julian Mack · Yuvraj Sharma · Felicia Chang · Nikita Pavlichenko · Hannah Blair · Zhong Zhang | Hugging Face | Jun 5, 2026 | 20 min read

Native Multimodal Models Extend LLMs but Still Lack Unified Representations

Victoria Lin of Thinking Machines uses a Stanford CS25 seminar to argue that native multimodal models have extended much of the large-language-model recipe into images, audio, video and action, but have not yet unified multimodal intelligence. Her account is that tokenization, Transformers, autoregressive conditioning and scaling transfer only partly: images, video and action require different representations, objectives and sometimes modality-specific parameters. The result, she says, is a field moving beyond text-only systems while still relying on text as its strongest abstraction for reasoning.

Steven Feng · Victoria Lin | Stanford Online | Jun 4, 2026 | 19 min read

Microsoft Bets Enterprise Agents Will Run Through the Cloud

John Coogan reads Microsoft Build 2026 as a sign that Microsoft is trying to make the cloud, not the phone, the center of enterprise AI agents. On Diet TBPN, he argues that Project Solara, Scout, OpenClaw support and Microsoft’s own models point to a platform strategy built around Azure, Microsoft 365 data, security boundaries and cost-efficient deployment rather than frontier-model supremacy. The open question, he says, is whether agent hardware and workflows can win adoption outside environments where companies can mandate them.

John Coogan · Jordi Hays · Eric Glyman · Martin Scorsese · Satya Nadella · Steven Bathiche | TBPN | Jun 3, 2026 | 14 min read

Useful AI Systems Are Emerging Inside Controlled Enterprise Workflows

TBPN’s latest discussion framed the commercial AI moment less as a race to looser autonomy than as a shift toward bounded systems. Across Microsoft’s Build announcements, Suno’s funding, creator films, stablecoins, crypto markets, cybersecurity, and workflow software, the central argument was that AI becomes useful when it is embedded in infrastructure that can price, route, audit, secure, or constrain it. John Coogan and guests applied that lens most directly to Microsoft’s agent strategy, where Azure and Microsoft 365, not a new phone, become the controlled operating environment for enterprise agents.

John Coogan · Jordi Hays · Mikey Shulman · Nikesh Arora · Satya Nadella · Alex Good · Eric Glyman · Samir Chaudry · Henri Stern · Alex Heath · Tom Farley · Martin Scorsese | TBPN | Jun 3, 2026 | 33 min read

NVIDIA Frames Cosmos 3 as Compute-Generated Data for Physical AI

NVIDIA presents Cosmos 3 as an open foundation model for physical AI, built to address what it frames as a data-scaling problem in robotics, autonomous vehicles and other systems that operate in the physical world. The company argues that real-world data cannot capture enough variability on its own, so compute must generate usable training and evaluation signals: synthetic video, predicted sensor outputs, simulation loops and action plans. Cosmos 3 is positioned as a post-trainable mixture-of-transformers system that combines multimodal reasoning with generation to support perception, prediction, simulation and action.

NVIDIA | Jun 2, 2026 | 5 min read

RTX Spark Agent Moves Architectural Designs From Brief to Photoreal Render

NVIDIA’s RTX Spark demonstration argues that an architectural AI agent is most useful as a workflow operator, not as a standalone design tool. Running locally on RTX Spark and connected to tools including Rhino, Blender, ComfyUI, OpenShell and Claude Sonnet, the agent turns a residential brief into massing options, editable layouts, validated geometry and photoreal renders. NVIDIA frames the speedup as orchestration across existing applications, with the designer still approving directions, resolving tradeoffs and controlling materials and shots.

NVIDIA | Jun 2, 2026 | 5 min read

YouTube Is Becoming Hollywood’s Talent Market and IP Proving Ground

TBPN’s John Coogan and Jordi Hays argue that YouTube is moving from Hollywood competitor to Hollywood’s talent market, where creator-led films prove creative judgment, production ability and audience response before studio capital arrives. The episode extends that pattern to AI policy, software and prediction markets: established institutions are trying to absorb signals formed outside their usual channels, from internet-proven filmmakers and frontier AI labs to traders and startups testing demand before regulators, studios or public markets have settled their response.

Jordi Hays · John Coogan · Marc Benioff · Nico Ferreyra · Mike Schroepfer · Graham Stephan · Bernie Su · Sue Khim · Scott Trinkham · Adam Iscoe · Jason Oppenheim · Danial Jameel · Tyler Bohall | TBPN | Jun 1, 2026 | 27 min read

Open Image Models Converge on Flow Matching and DiT Architectures

Stanford adjunct lecturer Shervine Amidi uses Lecture 8 of CME296 to argue that modern visual generation is best understood as a stack of choices for transporting noise into data: the paradigm, representation, architecture, training procedure, and evaluation method. He presents flow matching as the current default for image-generation systems, diffusion transformers as the dominant architectural direction, and latent spaces as a practical compression tradeoff now being challenged by scaled pixel-space models.

Shervine Amidi | Stanford Online | Jun 1, 2026 | 23 min read

NVIDIA Positions RTX Spark as a 128 GB Local AI Workstation

NVIDIA’s Computex preview positioned RTX Spark as a compact Windows platform for local AI, creative production and RTX gaming, built around a new superchip pairing a Blackwell RTX GPU with a Grace CPU. Jacob Freeman and other NVIDIA presenters argued that its 128 GB of unified memory and RTX acceleration allow slim laptops and small desktops to run larger local agents, handle heavy creative scenes and support modern ray-traced games with DLSS 4.5.

Gerardo Delgado · Joel Pennington · Jacob Freeman | NVIDIA | Jun 1, 2026 | 5 min read

State-of-the-Art AI Models Are a Pareto Frontier, Not a Ranking

Bertrand Charpentier, cofounder and chief scientist at Pruna AI, argues that state-of-the-art image generation should not be defined by a single leaderboard rank. Using Design Arena-style evaluation as his example, he says a slow top model can require 20 days of compute, about $5,300 and 556 kWh to evaluate, while a fast compressed model can run the same test in 7 hours for $265. His broader case is that model selection should be based on a Pareto frontier of quality, latency, cost and energy, not a podium that treats efficiency as secondary.

Bertrand Charpentier | AI Engineer | Jun 1, 2026 | 11 min read
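The Pareto-frontier idea in Charpentier's argument can be sketched in a few lines of Python. This is an illustrative sketch, not his evaluation code: the model names, the quality scores, the fast model's energy figure and the third candidate are all hypothetical placeholders, while the 20 days (480 hours), $5,300, 556 kWh, 7 hours and $265 figures come from the summary above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelResult:
    name: str
    quality: float     # higher is better (hypothetical arena-style score)
    hours: float       # evaluation wall-clock time, lower is better
    cost_usd: float    # lower is better
    energy_kwh: float  # lower is better


def dominates(a: ModelResult, b: ModelResult) -> bool:
    """Return True if a is no worse than b on every axis and better on at least one."""
    no_worse = (
        a.quality >= b.quality
        and a.hours <= b.hours
        and a.cost_usd <= b.cost_usd
        and a.energy_kwh <= b.energy_kwh
    )
    strictly_better = (
        a.quality > b.quality
        or a.hours < b.hours
        or a.cost_usd < b.cost_usd
        or a.energy_kwh < b.energy_kwh
    )
    return no_worse and strictly_better


def pareto_frontier(results: list[ModelResult]) -> list[ModelResult]:
    """Keep every result that no other result dominates."""
    return [r for r in results if not any(dominates(o, r) for o in results)]


candidates = [
    # Time, cost and energy from the summary; the quality score is a placeholder.
    ModelResult("slow-top", quality=0.95, hours=480.0, cost_usd=5300.0, energy_kwh=556.0),
    # Time and cost from the summary; quality and energy are placeholders.
    ModelResult("fast-compressed", quality=0.90, hours=7.0, cost_usd=265.0, energy_kwh=20.0),
    # Entirely hypothetical model that is worse than fast-compressed on every axis.
    ModelResult("dominated", quality=0.85, hours=30.0, cost_usd=900.0, energy_kwh=80.0),
]

frontier = pareto_frontier(candidates)
print([r.name for r in frontier])
```

Under these placeholder numbers both the slow top model and the fast compressed model stay on the frontier, because neither beats the other on every axis. Only the third candidate is dropped. A single leaderboard rank would hide that distinction.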

Language Models Are Becoming the Bottleneck in Video Generation

Ethan He, who worked on NVIDIA’s Cosmos world model and xAI’s Grok Imagine, argues that the next major gains in video generation will come less from diffusion models alone than from language models, agents, and context management around them. In an interview with swyx and Vibhu Sapra, He describes Grok Imagine as a fast-built example of that shift: diffusion renders pixels, while language systems increasingly rewrite prompts, plan clips, call tools, manage memory, and turn short generations into longer, editable video.

Shawn Wang · Vibhu Sapra · Ethan He | Latent Space | Jun 1, 2026 | 28 min read

Loblaw Says AI Now Generates 46.9% of Its Code

Lauren Steinberg, Loblaw’s chief digital officer, argues that OpenAI tools are already changing both employee work and customer-facing retail flows at Canada’s largest retailer. She says ChatGPT Enterprise is available to every Loblaw colleague, Codex is contributing to internal code-generation and pull-request-linked productivity gains, and ChatGPT-powered PC Express can move a shopper from a dinner question to a local, priced basket. The case is supported by Loblaw’s own on-screen examples and internal data, rather than an independent audit.

Lauren Steinberg | OpenAI | May 29, 2026 | 5 min read

AI Photo Analysis Is Moving From Skin Care to Cosmetic Advice

George Mack, Nirav Savjani, Tim Ferriss and Chris Williamson argue that image-capable AI is moving from practical skin-care triage into cosmetic judgment. Mack says Gemini identified a fungal skin treatment that years of doctors and lifestyle changes had missed; Savjani says the same photo-upload pattern is now driving looksmaxing tools that recommend facial changes, procedures and appearance edits. The discussion turns on a boundary the speakers see becoming harder to police: when AI advises what to do to a face, it can also normalize a version of that face that no longer matches reality.

Chris Williamson · Nirav Savjani · George Mack · Tim Ferriss | Chris Williamson | May 29, 2026 | 7 min read

Text-to-Image Evaluation Requires Metrics Matched to Specific Failure Modes

Stanford adjunct lecturers Afshine Amidi and Shervine Amidi argue that evaluating text-to-image models starts with separating aesthetic quality from prompt adherence, then choosing metrics suited to the failure being tested. In Lecture 7 of Stanford’s CME296 course on diffusion and large vision models, they treat human ratings, FID, CLIPScore, reference-based measures, multimodal judges, and benchmarks as imperfect instruments rather than substitutes for a universal image-quality score. Their central warning is practical: automated and qualitative evaluations can be useful, but only when their assumptions, calibration, and failure modes are made explicit.

Shervine Amidi | Stanford Online | May 28, 2026 | 19 min read
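As one concrete instance of a prompt-adherence metric, CLIPScore is commonly defined as a rescaled, clipped cosine similarity between an image embedding and a text embedding, with a weight of 2.5. The sketch below shows only that arithmetic and is not taken from the lecture. The vectors are toy stand-ins chosen by hand; a real evaluation would obtain both embeddings from a CLIP model.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clip_score(image_emb: list[float], text_emb: list[float], w: float = 2.5) -> float:
    """CLIPScore-style value: w * max(cos(image, text), 0)."""
    return w * max(cosine(image_emb, text_emb), 0.0)


# Toy embeddings (assumptions for illustration, not real CLIP outputs).
aligned = clip_score([1.0, 0.0], [1.0, 0.0])     # same direction
unrelated = clip_score([1.0, 0.0], [0.0, 1.0])   # orthogonal
opposed = clip_score([1.0, 0.0], [-1.0, 0.0])    # opposite, clipped to zero
print(aligned, unrelated, opposed)
```

The metric measures only image-text alignment in the embedding space. It says nothing about aesthetic quality, which is why the summary above treats it as one imperfect instrument among several.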

Meta Flow Maps Cut Reward-Alignment Costs With One-Step Posterior Sampling

Peter Potaptchik presents Meta Flow Maps as an amortized way to remove a costly inner loop in reward-aligning generative models: repeatedly simulating trajectories to estimate expected future reward from a noisy state. The method trains stochastic flow maps to produce differentiable, one-step samples from the clean-data posterior conditioned on any time and noisy state, enabling value-gradient estimates for inference-time steering and an off-policy objective for fine-tuning. In ImageNet experiments, Potaptchik argues, this lets a single-particle steered sampler outperform Best-of-1000 baselines across several rewards with far less compute.

Peter Potaptchik | Microsoft Research | May 26, 2026 | 16 min read

Diffusion Models Generate Images Through Critical Instability Windows

Luca Ambrogioni argues that trained diffusion models generate images through brief instability windows rather than uniform step-by-step denoising. In a Microsoft Research generative modeling seminar, he links score dynamics, conditional entropy and statistical-physics phase transitions to show how low-frequency spatial modes soften at critical times, allowing noise to organize into coherent structure. Experiments on patch models, Fashion-MNIST and ImageNet models are presented as evidence that these critical windows govern both pattern formation and the timing of effective guidance.

Carles Domingo-Enrich · Sasank Edara · Luca Ambrogioni | Microsoft Research | May 26, 2026 | 17 min read

Wavelet Score Models Show Local Interactions Drive Diffusion Denoising

Emma Finn argues that the memorization puzzle in diffusion models can be probed by replacing a black-box score network with an analytically solvable wavelet parameterization. In her Microsoft Research New England seminar, Finn presents the method as a way to isolate which data moments and dependency structures matter across noise scales. Her reported experiments on MNIST suggest that local same-scale wavelet interactions improve denoising more consistently than independent coefficient models or orientation-only coupling, while the larger question of whether the framework explains generative novelty remains unresolved.

Emma Finn | Microsoft Research | May 26, 2026 | 12 min read

Synthetic Intimacy, Surveillance, and Stimulation Are Raising the Cost of Impulse

Chris Williamson’s inaugural Mostly Wise conversation with Andrew Huberman, Matt McCusker and Tom Segura uses health advice, comedy, AI replicas and conspiracy talk to examine where useful tools become distortions. Huberman repeatedly argues for moderation and mechanism over slogans — from low-dose tadalafil and sleep protocols to cannabis, sunscreen and self-control — while Segura and McCusker test those claims against comedy, parenting and lived experience. The broader case is that modern life increasingly requires judgment about thresholds: when optimization becomes rumination, evidence becomes pattern-seeking, and synthetic intimacy or surveillance starts to reshape ordinary behavior.

Chris Williamson · Matt McCusker · Andrew Huberman · Tom Segura | Chris Williamson | May 25, 2026 | 35 min read

Google’s GenAI Stack Turns Multimodal Prompts Into Application Pipelines

Google DeepMind’s Paige Bailey and Guillaume Vernade argue that Google’s generative AI stack is being organized as an application pipeline rather than a set of isolated models. In a three-hour workshop, Bailey showed AI Studio turning multimodal Gemini prompts into inspectable API calls and generated apps with auth and Firestore, while Vernade used Gemini, Nano Banana, Veo and Lyria to illustrate, animate and score The Wind in the Willows. Their case is that builders can now orchestrate prompt, code, media generation and deployment in one workflow, even as the demos exposed seams that still require engineering discipline.

Paige Bailey · Guillaume Vernade · Ian Valentine | AI Engineer | May 23, 2026 | 23 min read

AI’s Bottlenecks Shift From Model Demos to Compute, Rights, and Institutions

AI, in TBPN’s latest discussion, is no longer treated mainly as a product demo but as a question of infrastructure, financing and institutional adoption. The strongest evidence came from SpaceX’s AI-heavy IPO framing, Anthropic’s reported move toward operating profit, and OpenAI’s claimed Erdős breakthrough, which the speakers used to challenge the “AI is a scam” critique. The unresolved issue is not whether the technology matters, but how quickly compute capacity, rights regimes, regulation and existing institutions can absorb it.

John Coogan · Jordi Hays · Tyler Cosgrove · Alex Tabarrok · Bill Clerico · Christina Storm · Erik Bernhardsson · Alex Norström · Jordan Schneider | TBPN | May 21, 2026 | 27 min read

Google’s AI Strategy Emphasizes Scale Over Frontier Model Leadership

Kevin Roose and Casey Newton read Google’s I/O announcements as evidence of a company that has regained operational confidence in AI without yet proving frontier leadership. Roose argues Google is leaning on speed, cost, distribution and infrastructure — putting capable models across search, coding, video and cloud tools at enormous scale. Newton is more skeptical: fast and cheap, he says, is not the same as best, and many of Google’s most important product claims remain untested until users can rely on them in real workflows.

Kevin Roose · Casey Newton · Demis Hassabis | Hard Fork | May 21, 2026 | 7 min read

Gemini Omni Flash Replaces Veo as Google’s Default Video Model

ElevenLabs’ breakdown of Google’s I/O 2026 launch presents Gemini Omni as a major reset of Google’s AI video stack, with Omni Flash already replacing Veo as the default video model in the Gemini app. The source argues that the significance is not just better text-to-video generation, but a shift toward multimodal, conversational video creation: users can combine text, images, audio, video, and reference photos, then revise clips through successive instructions while preserving characters and scenes.

ElevenLabs | May 21, 2026 | 6 min read

Google’s I/O Pitch Put Distribution Ahead of Model Breakthroughs

John Coogan and Jordi Hays read Google I/O as a mixed signal: Google’s smart-glasses strategy looks stronger where it combines Gemini with eyewear distribution and Google’s own services, but its model launches exposed the risk of tying AI progress to a fixed conference calendar. On TBPN, they argued that Street View may be an underappreciated AI training asset and that AI video still has to move from impressive short clips to coherent long-form outputs. The episode also framed a potential SpaceX IPO and Nvidia’s latest results as evidence that the financial returns from space and AI infrastructure are already arriving at exceptional scale.

John Coogan · Jordi Hays · Tyler Cosgrove · Steve Wozniak | TBPN | May 21, 2026 | 14 min read

Google’s AI Assets Are Becoming a Product Coherence Problem

John Coogan and Jordi Hays read Google’s I/O as evidence that the company’s AI advantage is becoming a product-navigation problem: it has data, distribution, models and hardware partnerships, but its demos and product names left questions about coherence and pace. Across the source, that same pressure appears in more operational forms, as AI pushes companies to turn technical capability into usable workflows, secure software dependencies and faster product systems. Tae Kim’s Nvidia argument and the expected SpaceX IPO make the capital-market version of the question explicit: whether investors will keep paying for scarce infrastructure, extreme scale and growth curves that may take years to prove out.

Jordi Hays · John Coogan · Dylan Field · Immad Akhund · Brian Chesky · Marcus Milione · Feross Aboukhadijeh · Tae Kim | TBPN | May 20, 2026 | 32 min read

Any-to-Any Agents Rely on Orchestrated Multimodal Models, Not One Network

Google DeepMind’s Patrick Löber presents “any-to-any” agents as an orchestration problem rather than a claim that one model already handles every modality. In his architecture, Gemini reads and reasons across PDFs, images, audio, video and other sources, then uses function calling to invoke specialized native models for images, speech, live audio, video or embeddings. Löber argues that the useful shift is not generating every possible format, but letting an agent decide when a diagram, spoken explanation or other output is warranted.

Patrick Loeber | AI Engineer | May 20, 2026 | 10 min read
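The orchestration pattern described in this summary, in which a reasoning model decides whether to call a specialist model, can be sketched without any vendor SDK. Everything named below is hypothetical: the tool names, the `fake_planner` stand-in for the reasoning model's function-call decision, and the string placeholders returned in place of real media. It illustrates the dispatch shape only, not Google's API.

```python
from typing import Any, Callable, Dict

# Registry mapping tool names to specialist "models" (stubbed as plain functions).
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(name: str) -> Callable[[Callable[..., str]], Callable[..., str]]:
    """Register a function under a tool name the planner can refer to."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return register


@tool("generate_image")
def generate_image(prompt: str) -> str:
    return f"<image for: {prompt}>"   # placeholder for an image model call


@tool("synthesize_speech")
def synthesize_speech(text: str) -> str:
    return f"<audio for: {text}>"     # placeholder for a speech model call


def fake_planner(request: str) -> Dict[str, Any]:
    """Hypothetical stand-in for the reasoning model choosing a function call."""
    lowered = request.lower()
    if "diagram" in lowered:
        return {"name": "generate_image", "args": {"prompt": request}}
    if "read aloud" in lowered:
        return {"name": "synthesize_speech", "args": {"text": request}}
    return {"name": None, "args": {}}  # no specialist output warranted


def run_agent(request: str) -> str:
    """Dispatch to a specialist only when the planner asks for one."""
    call = fake_planner(request)
    if call["name"] is None:
        return f"text answer: {request}"
    return TOOLS[call["name"]](**call["args"])


print(run_agent("Draw a diagram of the water cycle"))
print(run_agent("What is the water cycle?"))
```

The point of the shape is the `None` branch: the agent falls back to plain text unless the planner judges that a diagram or spoken output is warranted, which matches the "decide when" framing in the summary.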

Gemini’s Strategy Shifts From Frontier Leaderboards to Deployable AI Infrastructure

Google DeepMind executives Tulsee Doshi and Logan Kilpatrick argue that Google’s current Gemini strategy is built less around a single frontier model than around a deployable AI stack. In their account, Gemini 3.5 Flash, the Antigravity agent harness and new multimodal products such as Omni are meant to make models fast, cheap and integrated enough to run across Search, the Gemini app, AI Studio, YouTube and enterprise tools. The deeper shift, Kilpatrick says, is that the model is increasingly absorbing the scaffolding that once surrounded it, while Google standardizes the remaining agent infrastructure across its products.

Nathan Labenz · Logan Kilpatrick · Tulsee Doshi | The Cognitive Revolution | May 20, 2026 | 19 min read

Google’s AI Repricing Turns on Product Restraint and Developer Adoption

John Coogan and Jordi Hays use Google I/O to argue that Alphabet is being repriced less as a search incumbent threatened by AI than as a full-stack AI company, though they say Google still has to prove it can turn models such as Gemini Omni and Flash into useful products without cluttering every surface. The Diet TBPN episode also treats distribution as the common pressure point behind several unrelated fights: whether smartphones help explain the timing of global fertility decline, why a small Spotify icon change provoked backlash, and whether podcasts or childcare are eroding the market for serious nonfiction.

John Coogan · Jordi Hays | TBPN | May 20, 2026 | 15 min read

Text-to-Image Training Is Becoming a Problem of Signal Allocation

Stanford adjunct lecturers Shervine Amidi and Afshine Amidi present text-to-image model training as a problem of allocating scarce learning signal across the full model lifecycle, not simply choosing a diffusion or flow-matching loss. In Lecture 6 of Stanford’s CME296 course, they argue that practical training depends on emphasizing hard timesteps, adjusting for resolution, using data curricula and representation alignment, then applying post-training, personalization, and distillation methods to improve control and reduce inference cost.

Shervine Amidi | Stanford Online | May 19, 2026 | 21 min read

AI’s Value Is Shifting From Model Demos to Distribution and Measurement

Google’s problem at I/O, Jordi Hays argued, was no longer proving that its AI models are impressive, but making Gemini useful rather than redundant across products investors now increasingly view as part of a full-stack AI business. The TBPN discussion extended that framing across the rest of the show: AI’s value, the hosts and guests argued, depends less on model spectacle than on distribution, workflow integration, economics and adoption by institutions. That distinction ran from Google’s risk of crowding users with Gemini entry points to SendCutSend’s physical capacity constraints, Commure’s push to automate healthcare administration, and METR’s effort to turn frontier-model risk into something auditable.

Jordi Hays · John Coogan · Ajeya Cotra · Jim Belosic · Tanay Tandon · Aidan Dewar · Fai Nur · Philip Inghelbrecht | TBPN | May 19, 2026 | 31 min read

AI Growth Is Running Into Power, Memory, and Inference Bottlenecks

TBPN’s discussion recast the AI boom around physical and economic bottlenecks — power, cooling, chip scarcity, inference cost and memory — rather than model ambition alone. Mike Isaac, Rowan Trollope and Dean Leitersdorf described an industry where local utilities, low-level inference optimization and fast state management are becoming central constraints, a capacity problem the hosts also saw in the whey protein shortage. Everlane’s reported sale to Shein pointed to a different limit: Hays argued that venture-backed ethical basics struggled against price pressure, brand preference and the demand for sustained growth. Joanna Stern supplied the adoption constraint, arguing from her reporting that AI’s progress will be judged through trust, job anxiety, children’s safety and whether new devices ease or deepen phone dependence.

John Coogan · Jordi Hays · Joanna Stern · Rowan Trollope · Dean Leitersdorf · Mike Isaac | TBPN | May 18, 2026 | 24 min read

GPT Image 2 Wins on Layout While Nano Banana 2 Wins on Speed

ElevenLabs’ side-by-side test of GPT Image 2 and Nano Banana 2 argues that the models are complementary rather than interchangeable. In more than 20 generation and editing prompts, GPT Image 2 was favored for strict prompt adherence, tight composition, source-faithful edits, and text-heavy layouts, while Nano Banana 2 was faster, cheaper at 4K, and stronger in several tasks involving detail retention, realism, and consistency. The practical recommendation is to A/B the same prompt and choose the model whose likely failure mode fits the job.

ElevenLabs | May 18, 2026 | 14 min read

Microsoft’s OpenAI Advantage Has Not Become an AI Product Lead

Alex Kantrowitz and Ranjan Roy use Satya Nadella’s 2022 email about Microsoft’s dependence on OpenAI and Nvidia to argue that the company saw the central AI risk early but did not turn privileged model access into a decisive product advantage. Their broader case is that distribution and partnerships are proving inadequate without control, AI-native execution, and usable integrations — a problem they see not only at Microsoft, but also in Apple’s weak ChatGPT-Siri integration and Google’s uneven AI products.

Alex Kantrowitz · Ranjan Roy | Alex Kantrowitz | May 18, 2026 | 16 min read

Gemini Becomes the Prompt Engineer for Google’s Gen Media Stack

Google DeepMind developer advocate Guillaume Vernade demonstrates a gen-media workflow built around Gemini as the orchestrator rather than as a one-shot generator. Using The Wind in the Willows, he shows Gemini reading the full book, producing structured prompts and scripts, and handing them to Nano Banana, Veo, Lyria and TTS models for images, video, music and narration. His broader case is that multimodal production depends less on a single model than on schemas, reference assets, state management, cost controls and prompt handoffs between specialist systems.

Guillaume Vernade · Paige Bailey | AI Engineer | May 18, 2026 | 19 min read

GPT Image 2 Beats Nano Banana 2 on Control, Not Speed

ElevenLabs’ side-by-side test of GPT Image 2 and Nano Banana 2 argues that the models are complementary rather than interchangeable. Across more than 20 generation and editing prompts, the comparison found GPT Image 2 stronger when briefs required tight prompt control, text hierarchy, layout discipline, and source fidelity, while Nano Banana 2 more often won on speed, 4K cost efficiency, fine detail, and polished editorial transformations. The practical recommendation is to route work by failure risk — and A/B test important prompts — rather than pick a single default model.

ElevenLabs | May 18, 2026 | 14 min read

AI Tools Are Moving Creative and Software Work Toward Specification

TBPN’s discussion uses Debater Center, AI-generated Monet-style clips, Cursor, Figma and a 67-year-old AI founder to question whether tech labels describe what is actually happening underneath. The speakers argue that ranked debate software may need an audience to create the performative pressure people associate with online debate, while AI tools such as Luma and Cursor are shifting creative and technical work from manual execution toward higher-level specification. Their shorter points on Figma and the older founder make the same corrective move: they resist premature obituaries for products, skills and founder archetypes that are still active.

John Coogan · Jordi Hays | TBPN | May 15, 2026 | 19 min read

Images 2.0 Moves Image Generation From Novelty to Workflow Tool

OpenAI product lead Adele Li and researcher Kenji Hata argue that Images 2.0 marks a shift from novelty image generation to a working visual layer inside ChatGPT. In a podcast discussion with Andrew Mayne, they point to 1.5bn images generated weekly, sharper text rendering, stronger photorealism, broader aspect ratios and more consistent characters as evidence that the model is moving into education, internal communication, marketing assets, software mockups and other practical creative work.

Andrew Mayne · Adele Li · Kenji Hata | OpenAI | May 14, 2026 | 12 min read

ElevenCreative Adds Templates for Reusable AI Creative Workflows

ElevenLabs is introducing Templates in ElevenCreative, a feature that turns its node-based Flows into reusable creative workflows with defined inputs and outputs. The company presents the tool as a way to run repeatable production tasks — such as product shots, mockups, style transfers, character sheets, and thumbnail translation — without rebuilding the workflow each time. Users can run templates from a gallery or publish their own, choosing which variables others can edit, what asset is returned, and whether access is private, workspace-only, or public.

ElevenLabs | May 13, 2026 | 5 min read

Apple-Device AI Is Becoming Viable Without Cloud Inference

Prince Canuma presents MLX, Apple’s array framework for Apple Silicon, as a practical foundation for running AI agents locally rather than through cloud services. His case is rooted in accessibility and unreliable connectivity, but extends to product constraints for voice agents, robots and multimodal apps: vision, speech, video generation and long-context inference can increasingly run on Macs, iPhones and iPads without a network call. Canuma argues not that local models replace every frontier cloud system, but that the boundary has moved far enough to make on-device AI a serious deployment option.

Prince Canuma | AI Engineer | May 11, 2026 | 13 min read

Fresh Product Data Is the Constraint for LLM Commerce Discovery

Criteo executives Diarmuid Gill and Liva Ralaivola argue that modern ad tech is best understood as a millisecond-scale prediction system: anonymous commerce signals, learned embeddings and real-time auctions are used to decide whether to bid, what to show and how much an impression is worth. In a conversation with Nathan Labenz, they frame Criteo’s work with OpenAI and other generative tools as an extension of that problem, not a replacement for it: LLMs may change product discovery, but the system still depends on fresh retailer data, consent, latency discipline and human oversight.

Nathan Labenz · Alex Persky-Stern · Diarmuid Gill · Liva Ralaivola | The Cognitive Revolution | May 9, 2026 | 18 min read

BFL Is Moving FLUX From Image Generation Toward Physical AI

Stephen Batifol of Black Forest Labs argues that FLUX is no longer just an image-generation line but the start of a broader push toward visual intelligence: models that can generate, edit, understand, and eventually act across images, video, audio, and physical environments. In the talk, he presents FLUX.1, Kontext, FLUX.2, and FLUX.2 Klein as product steps toward that goal, while BFL’s Self-Flow research is framed as the mechanism for moving representation learning inside multimodal generative models rather than relying on external encoders.

Stephen Batifol | AI Engineer | May 8, 2026 | 11 min read

Luma Is Rebuilding Video AI Around a Unified Multimodal Transformer

In a Stanford CS153 guest lecture, Luma AI co-founder and chief executive Amit Jain argues that generative video is only a staging point toward “unified intelligence”: models that understand and generate across text, images, video, audio, code and tools in a single work loop. Jain traces Luma’s path from Apple-era LiDAR and 3D capture to internet-scale video, saying the company followed the data but now sees prettier clips as insufficient. The destination, he says, is a multimodal AI factory for professional creative and physical work, where human skills, tool use, feedback and unified transformer architectures produce full campaigns, schematics, productions and eventually robotics workflows.

Anjney Midha · Amit Jain | Stanford Online | May 7, 2026 | 19 min read

Descript Bets Creator AI on Reliable Editing, Not Content Slop

Laura Burkhauser, Descript’s chief executive, distinguishes generative AI tools for creators from the “slop” she defines as mass-produced content arbitrage. Her case is that Descript’s future depends less on adding AI everywhere than on making editing automation reliable, reversible and useful for recorded human media. That means choosing third-party models by fit and taste, building in-house systems where Descript has workflow data, and treating creator backlash as a product constraint rather than a branding problem.

Nathan Labenz · Laura Burkhauser | The Cognitive Revolution | May 7, 2026 | 19 min read