Topic

Open Models

Open-weight and open-source model releases, licensing, community ecosystems, local deployment, and the competitive impact of open AI.

AI Progress Is Being Bought With Data, Not Sample Efficiency

Dwarkesh Patel argues that recent AI progress is driven less by clear gains in sample efficiency than by an immense expansion of training data, including synthetic rollouts and highly specific human expert examples. In his account, frontier models can display broad professional competence because labs keep pushing more tasks into the training distribution, not because the systems learn new domains the way humans do. Patel says that data-heavy approach may still be commercially powerful when capabilities can be amortized across billions of uses, but it leaves unresolved whether current systems can solve their own sample-efficiency problem.

Dwarkesh Patel | Dwarkesh Patel | Jun 19, 2026 | 8 min read

Anthropic’s Fable Backlash Exposes the Risk of Hidden AI Gatekeeping

The All-In panel argues that Anthropic’s handling of Claude Fable 5 turned AI safety into an enterprise trust problem, with Jason Calacanis, Chamath Palihapitiya, David Sacks and David Friedberg focusing on hidden downgrades, prompt retention and a provider’s power to decide who receives full model capability. The same concern over opaque discretion shaped their California election discussion, where Friedberg and Sacks argued that legal ballot rules can still produce outcomes voters view as manipulated, while Calacanis called for investigation rather than treating suspicious statistics as proof of fraud.

Jason Calacanis · Chamath Palihapitiya · David Friedberg · David Sacks | All-In Podcast | Jun 13, 2026 | 24 min read

MiniCPM-V 2.6 Runs at 18 Tokens per Second on iPhone

OpenBMB used its Build Small hackathon session to argue that small models are valuable when they can be deployed where applications and data already live: on phones, laptops, mobile apps and edge devices. Its main example was MiniCPM-V 2.6, a vision-language model shown running on an iPhone 15 Pro at 18 tokens per second with llama.cpp and 4-bit quantization. The broader claim was that compact, open models paired with existing runtimes can expand access, reduce cloud dependence, and improve privacy and latency for local AI use cases.

Hugging Face | Jun 10, 2026 | 6 min read

Coding Revenue and Compute Shortages Are Extending the AI Boom

Alex Sacerdote, founder and portfolio manager of Whale Rock Capital Management, argues that AI is still at the earliest stage of enterprise adoption and may follow a steeper adoption curve than prior technology shifts. In his telling, coding has become the first clear proof that AI can generate large revenue by replacing or augmenting labor, while the model layer is consolidating around a few leaders rather than commoditizing. Sacerdote’s broader case is that investors are underestimating both the earnings power of those winners and the hardware renaissance required to supply the compute behind them.

Patrick O'Shaughnessy · Alex Sacerdote | Invest Like The Best | Jun 9, 2026 | 24 min read

Second-Order Effects Shape Gurley’s View of AI, Stablecoins, and Venture Capital

Benchmark veteran Bill Gurley argues that the same habits shaped his investing career and his current view of AI, crypto, payments and venture capital: understand the foundations of a field, stay close to its bleeding edge, and think in systems rather than single-variable causes. In a Knowledge Project interview with Shane Parrish, Gurley says founders and investors misread opportunities when they ignore second- and third-order effects, whether in startup burn rates, AI regulation, tokenized markets or stablecoin adoption.

Bill Gurley · Shane Parrish | The Knowledge Project Podcast | Jun 9, 2026 | 23 min read

Telemetry, Not Code, Audits Nondeterministic AI Agents

Dat Ngo of Arize argues that LLM observability has to account for failures in execution paths, not just broken components, because agents can call tools in different orders, branch, loop, and change behavior across runs. In his account, traces become the audit record for nondeterministic systems, while evaluation must combine model judges, human feedback, golden datasets, deterministic checks, and business metrics at the right scope. Arize’s stated direction is to connect observability, evals, experimentation, and improvement into an increasingly automated loop.

Dat Ngo | AI Engineer | Jun 7, 2026 | 10 min read

Sanders’ 50% AI Stock Plan Turns Training Data Into a Political Fight

Jason Calacanis argued that Anthropic’s call for an AI slowdown and Bernie Sanders’ proposal for public ownership of major AI companies show AI politics moving toward jobs, ownership and redistribution. He dismissed Sanders’ 50% stock-tax plan as unworkable but said its premise could resonate with voters who believe AI companies built enormous value from public and creative inputs while threatening employment. Yoland Yan’s ComfyUI demo supplied the production-layer version of the same control question, presenting generative AI as a workflow where exposed parameters and reproducibility matter more than prompt-box convenience.

Jason Calacanis · Lon Harris · Alex Wilhelm · Yoland Yan | This Week in Startups | Jun 7, 2026 | 24 min read

Tool-Call Repairs Let DeepSeek v4 Beat Opus 4.7 in Internal Evals

Ahmad Awais, founder of CommandCode.ai, argues that many open models appear weak at coding-agent work because the harness around them mishandles tool schemas, design instructions and user preferences. Drawing on Command Code’s internal logs and evals, he says small deterministic repairs to tool inputs helped DeepSeek v4 Pro beat Opus 4.7 in six of ten internal comparisons. His broader case is that “taste” — explicit contracts for tools, design patterns and developer habits — can narrow the gap between cheaper open models and frontier coding systems without changing the model itself.

Shawn Wang · Ahmad Awais | Latent Space | Jun 6, 2026 | 14 min read

AI Application Companies Are Moving Beyond Frontier APIs to Protect Margins

Baseten founder and chief executive Tuhin Srivastava used a Stanford MS&E435 seminar with instructor Apoorv Agrawal to argue that inference is becoming the cost of goods sold for AI applications. His case is that scaled AI companies will need to move beyond default frontier-model APIs toward custom or post-trained models, both to improve margins and to protect the workflows and user signals that make their products defensible. Baseten’s role, as Srivastava framed it, is to provide the production inference stack and compute access needed to run that custom intelligence at scale.

Apoorv Agrawal · Tuhin Srivastava | Stanford Online | Jun 5, 2026 | 18 min read

ComfyUI Bets on Open-Source Control for AI Video Workflows

Despite its Anthropic-titled hook, the source’s developed argument is about product interfaces that give users more control over complex systems. ComfyUI co-founder Yoland Yan argues that serious AI video creators need open, node-based workflows rather than simplified freemium tools; INTVL founder Louis Phillips makes the case for turning tracked routes into contested fitness territory; and the fact-checker bounty highlights live verification as a control layer for streamed claims.

Louis Phillips · Yoland Yan | This Week in Startups | Jun 5, 2026 | 17 min read

Hackathon Caps Models at 32B Parameters to Reward Tinkerable AI Apps

Build Small is a Hugging Face and Gradio hackathon organized around a hard constraint: every model used must be under 32 billion parameters. Yuvraj Sharma framed the rule as a way to move AI building away from dependence on giant hosted models and back toward systems that participants can inspect, fine-tune, run locally, and ship as working Gradio Spaces. Sponsor presentations from Black Forest Labs, OpenBMB, OpenAI, NVIDIA, Modal, JetBrains, and Cohere largely reinforced that premise, offering small models, credits, tools, and prize categories meant to turn the constraint into runnable projects rather than demos in name only.

Shashank Verma · Vaibhav Srivastav · Stephen Batifol · Julian Mack · Yuvraj Sharma · Felicia Chang · Nikita Pavlichenko · Hannah Blair · Zhong Zhang | Hugging Face | Jun 5, 2026 | 20 min read

Production Inference Turns Transformer Models Into a Full-Stack Systems Problem

In a Stanford CS25 seminar, Modal’s Charles Frye argues that transformer inference has become the economic and operational center of AI systems: training produces weights, but serving turns them into usable, billable products. His account treats production inference as a full-stack problem, where application latency goals, workload shape, model choice, GPU memory limits, deployment failures, observability and cost controls all determine whether a system works. Frye’s main warning is that the largest serving gains come from matching the inference stack to the application, not from treating model hosting as a generic infrastructure task.

Steven Feng · Charles Frye | Stanford Online | Jun 4, 2026 | 22 min read

Relational Work and Capital Ownership May Decide Who Gains From AGI

Economists Alex Imas and Phil Trammell argue that the central question after AGI is not simply which jobs machines can do, but what remains scarce once machine-made goods become cheap and varied. In a conversation with Dwarkesh Patel, they frame labor’s future around demand for human involvement, capital-produced variety, and whether people or future agents satiate on machine-made goods. They also argue that redistribution will depend less on generic transfers than on whether households and countries can hold claims on the assets that capture AI surplus.

Dwarkesh Patel · Alex Imas · Phil Trammell · Sasha Rush | Dwarkesh Patel | Jun 4, 2026 | 24 min read

Microsoft Bets Enterprise Agents Will Run Through the Cloud

John Coogan reads Microsoft Build 2026 as a sign that Microsoft is trying to make the cloud, not the phone, the center of enterprise AI agents. On Diet TBPN, he argues that Project Solara, Scout, OpenClaw support and Microsoft’s own models point to a platform strategy built around Azure, Microsoft 365 data, security boundaries and cost-efficient deployment rather than frontier-model supremacy. The open question, he says, is whether agent hardware and workflows can win adoption outside environments where companies can mandate them.

John Coogan · Jordi Hays · Eric Glyman · Martin Scorsese · Satya Nadella · Steven Bathiche | TBPN | Jun 3, 2026 | 14 min read

The Model Alone Is No Longer the AI Product

At AI Engineer Melbourne 2026’s Day 1 keynote program, speakers including Shawn Wang, George Cameron, Sarah Sachs, Igor Costa, Vamsi Ramakrishnan and Geoffrey Huntley argued that AI engineering has moved beyond picking the strongest model. Their shared case was that useful AI products now depend on the systems around models: harnesses, routing, evals, memory, state, latency budgets, deterministic tools and cost controls. The model still matters, but the keynote program framed product advantage as an architecture and economics problem, not a leaderboard problem.

Igor Costa · John Allsopp · George Cameron · Sarah Sachs · Vamsi Ramakrishnan · Shawn Wang · Geoffrey Huntley | AI Engineer | Jun 3, 2026 | 20 min read

AI Acceleration Is Creating Dependencies Faster Than Institutions Can Govern

Nathan Labenz and Prakash Narayanan frame the second day of “Sprinting Through the AI Marathon” as evidence that AI acceleration is shifting from product progress into institutional dependency. OpenAI forward deployed engineers describe tax agents whose improvement comes from practitioner correction traces; Labenz reports that frontier safety circles are treating recursive self-improvement as a near-term premise reliant on AI monitoring AI; and Matthew Sanders argues the Vatican’s AI intervention is a claim for human and religious agency. The shared concern is that capital markets, service firms, labs, governments and moral communities are being pulled into AI systems faster than they can settle ownership, liability or control.

Nathan Labenz · Arthur Araujo · Prakash Narayanan · John Wasseige · Matthew Sanders | The Cognitive Revolution | Jun 2, 2026 | 31 min read

NVIDIA Frames Cosmos 3 as Compute-Generated Data for Physical AI

NVIDIA presents Cosmos 3 as an open foundation model for physical AI, built to address what it frames as a data-scaling problem in robotics, autonomous vehicles and other systems that operate in the physical world. The company argues that real-world data cannot capture enough variability on its own, so compute must generate usable training and evaluation signals: synthetic video, predicted sensor outputs, simulation loops and action plans. Cosmos 3 is positioned as a post-trainable mixture-of-transformers system that combines multimodal reasoning with generation to support perception, prediction, simulation and action.

NVIDIA | Jun 2, 2026 | 5 min read

YouTube-Native Filmmakers Are Turning Viral Proof Into Box-Office Hits

John Coogan and Jordi Hays use the box-office success of YouTube-native filmmakers to argue that Hollywood is beginning to treat creators as a source of proven taste and new IP, not merely as marketing channels. Their broader read is that proof of demand is moving earlier across markets: viral film concepts can become theatrical bets, AI labs are preparing for public ownership, and even Bernie Sanders’s proposed public stake in AI companies assumes the sector’s equity will be enormously valuable. The hosts are skeptical, however, that attention or ownership alone solves the harder questions of execution, cash flow, or public benefit.

John Coogan · Jordi Hays | TBPN | Jun 2, 2026 | 14 min read

Open Image Models Converge on Flow Matching and DiT Architectures

Stanford adjunct lecturer Shervine Amidi uses Lecture 8 of CME296 to argue that modern visual generation is best understood as a stack of choices for transporting noise into data: the paradigm, representation, architecture, training procedure, and evaluation method. He presents flow matching as the current default for image-generation systems, diffusion transformers as the dominant architectural direction, and latent spaces as a practical compression tradeoff now being challenged by scaled pixel-space models.

Shervine Amidi | Stanford Online | Jun 1, 2026 | 23 min read

Luma AI Targets Robotics Generalization With Open Physical AI Lab

Luma AI is launching an open physical AI lab to work on robots that can generalize beyond task-by-task demonstrations, CEO Amit Jain told Bloomberg Technology. Jain argues that physical AI should be built on large-scale multimodal data systems rather than narrow robotics training alone, and that the stack must remain open because robots could become part of homes, factories, hospitals and other productive systems.

Ed Ludlow · Amit Jain · Caroline Hyde | Bloomberg Technology | Jun 1, 2026 | 6 min read

Inference Hardware and Continual Learning Are Replacing Data as AI Bottlenecks

Google chief scientist Jeff Dean argues in a Two Minute Papers interview that AI progress is not chiefly constrained by running out of public text, but by systems work: extracting more from existing data, building inference-specialized hardware, distilling large models into smaller ones, and giving models access to much larger context. Dean frames the next phase less as better chatbots than as action-driven, agentic systems that can test, simulate and learn under controlled safety gates, while acknowledging unresolved problems in continual learning, healthcare deployment and infrastructure reliability at Google scale.

Károly Zsolnai-Fehér · Jeff Dean | Two Minute Papers | Jun 1, 2026 | 13 min read

AI Fatalism Is Blocking Real Choices on Regulation and War

Brad Carson, a former congressman and senior Pentagon official who now leads Americans for Responsible Innovation, argues that AI development is not an unstoppable force beyond public control. In a long exchange with Keith Duggar, Carson makes the case that governments still have leverage over frontier AI through chips, law, procurement and international negotiation, and that fatalism is itself a political choice. His sharpest warnings concern military use, where opaque neural systems could turn lethal targeting into probabilistic scores without intelligible accountability.

Keith Duggar · Brad Carson | Machine Learning Street Talk | May 31, 2026 | 23 min read

AI Governance Fight Shifts to Centralization, Open Models, and Worker Agency

On All-In, Bill Gurley joined Jason Calacanis, David Sacks and Chamath Palihapitiya for a debate framed less around whether AI is powerful than around who will control it. The panel read Pope Leo XIV’s AI encyclical as a warning about concentrated power, but split over the remedy: Sacks argued government regulation could become the centralizing threat, while Gurley and others scrutinized Anthropic’s safety posture as either regulatory strategy or something closer to a belief in building a superior intelligence. Their practical conclusion was that open models, swappable systems and worker fluency are the main checks against AI power consolidating in a few labs or agencies.

Jason Calacanis · David Sacks · Chamath Palihapitiya · Bill Gurley · Nick Calacanis | All-In Podcast | May 29, 2026 | 27 min read

Hugging Face Ships a $299 Hackable Robot for Voice AI Experiments

Andres Marafioti argues that Hugging Face’s Reachy Mini is meant to move robotics experimentation out of expensive humanoid hardware and into a $299-to-$449 open-source platform that users can assemble, repair and modify themselves. The robot’s most-used application is conversation, and Marafioti’s account ties its social ambition to a technical stack built for low-latency speech: Parakeet transcription, Qwen 3.5 27B, and an optimized Qwen3 TTS implementation that he says improved from 0.8x to 5.8x real time.

Andres Marafioti | AI Engineer | May 29, 2026 | 12 min read

Context Graphs Let Agents Retrieve Precedents, Not Just Policies

Neo4j’s Zach Blumenfeld argues that agents built for operational decisions need context graphs rather than document retrieval alone. In his model, a standard knowledge base can tell an agent the relevant facts and policies, but a context graph adds prior decision traces, causal links, precedents and outcomes, allowing the agent to retrieve how similar cases were resolved. He presents `create-context-graph` and `neo4j-agent-memory` as open-source scaffolding for building that pattern with graph entities, short-term memory and embedded reasoning traces.

Zach Blumenfeld | AI Engineer | May 29, 2026 | 10 min read

AI Venture Winners Will Be Larger, Faster, and Harder to Identify

Andreessen Horowitz general partner David George and VenCap CIO David Clark argue that AI has broken several of venture capital’s old assumptions at once: the largest companies are scaling revenue faster, potential outcomes are getting much larger, and early leadership is proving less durable. George’s core test for AI winners is whether they are “in the token path” — directly tied to the flow of AI usage and spending — while Clark stresses that the same market may produce unprecedented exits and unusually fast turnover among apparent leaders.

David Clark · David George | a16z | May 29, 2026 | 15 min read

Snowflake Rally Reflects AI Demand More Than Amazon Deal

Bloomberg Technology framed Snowflake’s 34% stock surge less as a reaction to its $6 billion Amazon Web Services deal than as a repricing of its AI software position. Snowflake chief executive Sridhar Ramaswamy pointed to stronger product revenue, higher retention and adoption of tools such as Cortex, while Bloomberg’s Brody Ford argued the AWS agreement mainly helps answer how Snowflake can manage the infrastructure costs of building AI features.

Ed Ludlow · Caroline Hyde · Mark Gurman · Brody Ford · Sridhar Ramaswamy · Sampriti Bhattacharyya · Jo Constantz · Jared Isaacman · Eric Vishria · Stephen Engle · Shweta Khajuria · Alexandra Levine · Yeyi Yun · Arthur Mensch · Carson Block | Bloomberg Technology | May 28, 2026 | 12 min read

RLVR Moves Post-Training From Human Preferences to Checkable Rewards

Stanford computer scientist Tatsunori Hashimoto presents reinforcement learning from verifiable rewards as the current practical route beyond RLHF for reasoning models, especially in math, coding and software-agent settings. His argument is that RLVR works because it replaces learned preference proxies with rewards that can be checked more directly, but that the reward remains the bottleneck: GRPO and related methods made the recipe simpler to run, while systems such as DeepSeek R1, Kimi k1.5 and Qwen show both the gains and the ways ostensibly verifiable rewards can still be gamed.

Tatsunori Hashimoto | Stanford Online | May 27, 2026 | 20 min read
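
For readers unfamiliar with the recipe the summary above describes, here is a minimal sketch of a checkable reward combined with GRPO-style group-relative advantages. It is an illustration under stated assumptions, not code from the lecture: the function names, the exact-match reward, and the sample answers are all hypothetical, and real implementations differ in details such as which standard deviation they use.

```python
from statistics import mean, pstdev


def verifiable_reward(answer: str, expected: str) -> float:
    """Hypothetical checkable reward: 1.0 on an exact match with the reference, else 0.0."""
    return 1.0 if answer.strip() == expected.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against the mean and spread of its own sampling group.

    Assumption: population standard deviation plus a small epsilon;
    implementations vary on this choice.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical completions sampled for one prompt whose reference answer is "42".
samples = ["42", "41", " 42 ", "forty-two"]
rewards = [verifiable_reward(s, "42") for s in samples]
advantages = group_relative_advantages(rewards)
```

Completions that pass the check receive positive advantages and the rest negative ones, with no learned preference model involved. The sketch also hints at why the reward stays the bottleneck: the exact-match check scores "forty-two" as wrong, and a looser check could just as easily be gamed.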

Frontier AI Has Become a Gigawatt-Scale Industrial Infrastructure Race

In a Stanford MS&E seminar on the economics of the AI supercycle, OpenAI infrastructure executive Sachin Katti argued that frontier AI has become an industrial systems problem, not a GPU procurement problem. Katti said usable compute now depends on synchronizing chips, memory, networking, power, cooling, buildings, land, suppliers and operators at gigawatt scale. His broader case was that OpenAI’s model and revenue ambitions depend on how quickly it can turn that whole chain into reliable infrastructure for training, inference and agentic workloads.

Apoorv Agrawal · Sachin Katti | Stanford Online | May 27, 2026 | 20 min read

Children’s Data Profiles Can Begin Before Birth

Proton engineering director Eamonn Maguire argues that a child’s digital profile can begin before birth, as parents’ emails, searches and sign-ups create signals that advertising and platform systems can use to infer pregnancy, family status and future behavior. Speaking with Craig Smith, Maguire uses Proton’s Born Private initiative, which lets parents reserve an email address for a child, to make a broader case that privacy is an infrastructure decision made long before children can consent. He extends the argument to social media, AI training data and the limits of trusting platforms whose business models depend on profiling.

Craig Smith · Eamonn Maguire | Eye on AI | May 27, 2026 | 17 min read

Abstraction Requires Accountability When AI, Logistics, and Companies Get Too Complex

Abstraction creates value only when responsibility for the hidden system remains clear, the TBPN discussion argued across AI ethics, company governance, logistics and inference markets. Christopher Hale framed the Vatican’s AI position as a claim that human dignity and accountability must govern algorithmic systems; Eric Ries argued that mission-driven companies need structures strong enough to resist capital and convenience; and Sean Henry and Alex Atallah described logistics and AI markets where software layers must still answer for the fragmented physical or computational systems beneath them.

John Coogan · Jordi Hays · Eric Ries · Christopher Hale · Alex Atallah · Sean Henry | TBPN | May 26, 2026 | 23 min read

Local Frontier AI Still Needs 100x Better Price Performance

Alex Cheema of EXO Labs argues that running frontier AI locally is primarily an inference-stack problem, not a model-training problem. Using a four-Mac Studio GLM 5.1 setup that costs about $40,000 and reaches roughly 20 tokens per second as the current reference point, Cheema says local price-performance still has about 100x to improve through better kernels, interconnects, heterogeneous hardware, energy efficiency, orchestration, and benchmarks. His case is that today’s awkward home cluster is not the endpoint, but evidence of how much optimization remains outside the cloud.

Alex Cheema | AI Engineer | May 26, 2026 | 21 min read

Distributed RL Let Composer Match Frontier Coding Models With Smaller-Model Speed

Cursor’s Federico Cassano and Fireworks’ Dmytro Dzhulgakov argue that Composer’s advantage comes from specializing a model for software engineering inside Cursor rather than spending capacity on general-purpose behavior. Starting from an open-source base, Cursor used mid-training and reinforcement learning against its own product environment, while Fireworks supplied the distributed infrastructure needed to make agent rollouts, weight synchronization, and inference efficient enough to run at scale. Their case is that application companies with enough product-specific usage, tools, and feedback can build models that are better, faster, and cheaper for their own workflows than larger general models.

Sonya Huang · Dmytro Dzhulgakov · Federico Cassano | Sequoia Capital | May 26, 2026 | 17 min read

Gemma Is Google’s On-Device Extension of Gemini Research

Google DeepMind’s Omar Sanseviero argues that Gemma is not a parallel alternative to Gemini but the open, local and on-device expression of the same research stream. He presents Gemma 4 as a model family optimized for efficiency, developer integration and emerging agentic use cases, while drawing a clear boundary around Gemini as Google’s route for frontier capability, broad factual knowledge and long-running tasks.

Vibhu Sapra · Shawn Wang · Omar Sanseviero | Latent Space | May 25, 2026 | 13 min read

Google’s GenAI Stack Turns Multimodal Prompts Into Application Pipelines

Google DeepMind’s Paige Bailey and Guillaume Vernade argue that Google’s generative AI stack is being organized as an application pipeline rather than a set of isolated models. In a three-hour workshop, Bailey showed AI Studio turning multimodal Gemini prompts into inspectable API calls and generated apps with auth and Firestore, while Vernade used Gemini, Nano Banana, Veo and Lyria to illustrate, animate and score The Wind in the Willows. Their case is that builders can now orchestrate prompt, code, media generation and deployment in one workflow, even as the demos exposed seams that still require engineering discipline.

Paige Bailey · Guillaume Vernade · Ian Valentine | AI Engineer | May 23, 2026 | 23 min read

TSMC’s Wafer Scarcity May Be Preventing an AI Overbuild

Investor Gavin Baker argues on Invest Like The Best that the AI boom is being organized less by software adoption than by scarcity: compute demand is outrunning power, wafers, and frontier-model access. In his account, Anthropic’s growth, Nvidia’s position, TSMC’s capacity discipline, and even SpaceX’s possible orbital compute are all expressions of the same constraint. Baker’s central claim is that the AI cycle may avoid a classic infrastructure bubble only if physical bottlenecks, especially leading-edge wafer supply, keep capital from building far ahead of demand.

Patrick O'Shaughnessy · Gavin Baker | Invest Like The Best | May 20, 2026 | 25 min read

Models Are Trained on Curated Corpora, Not the Internet

Stanford CS336’s data lecture, taught by Tatsunori Hashimoto, argues that training data is both the most consequential and least transparent part of modern language models. Hashimoto says models are not trained on “the internet” in any simple sense, but on static corpora shaped by crawlers, access limits, licensing, copyright risk, filtering, deduplication and conversion choices. The lecture’s central claim is that data construction is a legal and operational pipeline, not a passive input, and that those choices materially distinguish otherwise similar models.

Tatsunori Hashimoto | Stanford Online | May 20, 2026 | 22 min read

Agentic AI Is Turning Model Quality Into a Systems Problem

At AI Engineer Singapore’s second day, speakers from Google DeepMind, Cloudflare, Arize, OpenClaw, Adaption and other teams made a shared engineering case: as AI systems become more agentic, model quality is no longer separable from the systems around the model. Richard Ngo framed the risk as long-horizon, situationally aware agents whose goals cannot be inspected, while practitioners argued that production AI now depends on continuous evaluation, traces, deterministic execution boundaries, routing, memory, fine-tuning and test-time search. The source’s central claim is that useful and safe agentic AI is becoming a systems problem, not just a model-selection problem.

Shawn Wang · Eugene Yan · Philip Vollet · Haotian Zhang · Eugene Evstafev · Jason Liu · Pratik Desai · Michelle Chen · Jason Lopatecki · Amr Ahmed · Rita Zhang · Harris Snyder · Adarsh Shah · Eric Zhang · Ricky Robinett · Linoy Bitan · Wei Sheng · Richard Ngo | AI Engineer | May 17, 2026 | 26 min read

MagenticLite Brings Full Agent Workflows to Small Language Models

Microsoft Research is presenting MagenticLite as a full-stack agentic system designed to make small language models usable for multi-step work across a browser and local files. Weili Shi, Harkirat Behl and Hussein Mozannar argue that the capability comes from specializing the stack rather than relying on frontier-scale models: MagenticBrain handles planning, coding and delegation, while Fara 1.5 controls the browser. The release also emphasizes user oversight, with the agent pausing for credentials, approvals or other points where the user needs to take control.

Hussein Mozannar · Harkirat Behl · Weili Shi | Microsoft Research | May 14, 2026 | 7 min read

Agents Can Now Fine-Tune Open Models Through Prompted Workflows

Merve Noyan argues that open models have moved from downloadable artifacts into an operational stack for selection, serving, inspection, training and deployment. In her Hugging Face presentation, she makes the case that access to model weights now matters because developers can quantize, fine-tune and run models locally or at the edge, while Hub benchmarks, inference providers, traces, MCP and Skills let agents act directly on those workflows. Her strongest example is a coding agent that can size hardware, choose infrastructure and launch a fine-tuning job from a prompt.

Merve Noyan | AI Engineer | May 13, 2026 | 12 min read

Computing Is Shifting From Prerecorded Execution to Continuous Generation

In a Stanford CS153 Frontier Systems lecture, NVIDIA chief executive Jensen Huang argues that AI is forcing the first fundamental reinvention of computing in decades, moving the industry from prerecorded, on-demand execution to continuous real-time generation. Huang says that shift requires rebuilding the full stack — chips, compilers, networks, storage, systems and institutions — around new bottlenecks, with NVIDIA’s co-design approach producing gains that conventional Moore’s Law scaling cannot match.

Jensen Huang | Stanford Online | May 13, 2026 | 19 min read

NVIDIA’s Nemotron 3 Nano Omni Trades Accuracy for Multimodal Throughput

Károly Zsolnai-Fehér’s account of NVIDIA’s Nemotron 3 Nano Omni argues that the 30-billion-parameter open multimodal model is notable less for leading general intelligence benchmarks than for processing long video, audio, images and documents quickly and cheaply. The reported advantage comes from compression across the system — Mamba layers, audio tokenization, aspect-ratio-preserving vision handling, distilled encoders and efficient video sampling — which reduces the amount of material sent into the language-model backbone.

Károly Zsolnai-Fehér | Two Minute Papers | May 13, 2026 | 7 min read

Enterprise GenAI Pilots Fail When Feedback Cannot Reach the Model

Alessandro Cappelli, co-founder and chief customer officer of Adaptive ML, argues that enterprise generative AI pilots fail to reach production because companies lack a systematic way to turn defects, user feedback, business metrics and production signals into model improvement. In a talk on Fortune 500 deployments, he says prompting and instruction fine-tuning can produce credible demos, but reinforcement learning is the mechanism needed to train models and agents against enterprise-specific environments, rewards and KPIs. His case is that agents make this feedback loop more urgent, because they consume more tokens, touch live systems and leave less room for error.

Alessandro Cappelli | AI Engineer | May 12, 2026 | 12 min read

Cerebras’s Higher IPO Range Tests AI Infrastructure Demand

Alex Wilhelm and Jason Calacanis treat Cerebras’s raised IPO range as a test of how much public investors will pay for future AI inference demand and the quality of contracts with customers such as OpenAI. Ori Goshen makes a parallel case that enterprise AI’s hard problem is no longer choosing one model, but routing work across models, tools and inference strategies for cost, latency and accuracy. Across OpenAI’s deployment spinout, AI21’s orchestration pitch, Magrathea Metals’ brine-based magnesium plan and OpenClaw’s fading momentum, the article frames deployment as a question of incentives, constraints and where the bottleneck actually sits.

Jason Calacanis · Alex Wilhelm · Ori Goshen · Alex Grant | This Week in Startups | May 12, 2026 | 20 min read

Apple-Device AI Is Becoming Viable Without Cloud Inference

Prince Canuma presents MLX, Apple’s array framework for Apple Silicon, as a practical foundation for running AI agents locally rather than through cloud services. His case is rooted in accessibility and unreliable connectivity, but extends to product constraints for voice agents, robots and multimodal apps: vision, speech, video generation and long-context inference can increasingly run on Macs, iPhones and iPads without a network call. Canuma argues not that local models replace every frontier cloud system, but that the boundary has moved far enough to make on-device AI a serious deployment option.

Prince Canuma | AI Engineer | May 11, 2026 | 13 min read

Text-to-Speech Models Are Converging on LLM-Style Architectures

Samuel Humeau of Mistral argues that modern text-to-speech has converged on an architecture that resembles large language modeling: an autoregressive transformer generates compressed audio tokens frame by frame, rather than raw waveform samples. Using Mistral’s open-weight Voxtral TTS model as the example, he says neural audio codecs make that possible by reducing dense speech signals to token-like representations a transformer can handle. The remaining latency frontier, in his account, is not just streaming playable audio early, but letting TTS consume an LLM’s text stream as it is still being written.

Samuel Humeau | AI Engineer | May 9, 2026 | 12 min read

BFL Is Moving FLUX From Image Generation Toward Physical AI

Stephen Batifol of Black Forest Labs argues that FLUX is no longer just an image-generation line but the start of a broader push toward visual intelligence: models that can generate, edit, understand, and eventually act across images, video, audio, and physical environments. In the talk, he presents FLUX.1, Kontext, FLUX.2, and FLUX.2 Klein as product steps toward that goal, while BFL’s Self-Flow research is framed as the mechanism for moving representation learning inside multimodal generative models rather than relying on external encoders.

Stephen Batifol | AI Engineer | May 8, 2026 | 11 min read

Perplexity Frames AI Agents as Metered Digital Labor

Perplexity chief business officer Dmitry Shevelenko argues that AI agents should be judged less as software features than as metered digital labor: tools users will pay for when they perform economically useful work. In a Big Technology Podcast interview, he makes the case that Perplexity’s computer-use agents, workflow packaging, broad permissions and multi-model orchestration are all part of that shift. The unresolved question is whether users and companies will accept the access, trust and usage-based pricing required to make those agents a real business rather than another AI novelty cycle.

Alex Kantrowitz · Dmitry Shevelenko | Alex Kantrowitz | May 7, 2026 | 19 min read

DeepSeek V4 Claims Frontier-Adjacent Open Weights With One-Million-Token Context

Károly Zsolnai-Fehér of Two Minute Papers argues that DeepSeek V4 Preview is a consequential open-weight AI release because it pairs frontier-adjacent benchmark results with a reported one-million-token text context window and sharply lower long-context memory costs. His case rests less on outright benchmark dominance than on access economics: a freely self-hostable model appears close enough to recent closed frontier systems to change what developers can afford to use. He also stresses the limits: DeepSeek V4 is text-only, degrades near the edge of its context window, and still needs serious hardware at full scale.

Károly Zsolnai-Fehér | Two Minute Papers | May 7, 2026 | 6 min read

Autonomous AI Hackers Are Already Beating Humans on HackerOne

Oege de Moor, founder and CEO of XBOW, argues that autonomous AI hacking has moved from assistance to real exploitation. In an AI Ascent 2026 talk, he says XBOW’s system reached the top of HackerOne using only black-box access, found a remote code execution flaw in Bing Image Search from a URL alone, and would have been three times more effective with GPT-5. His warning is that defenders have six to nine months before comparable open-weight models make the same capabilities broadly available, including to attackers.

Oege de Moor | Sequoia Capital | May 7, 2026 | 6 min read

Small-Model Inference Needs Infrastructure Beyond Model Servers

Filip Makraduli of Superlinked argues that the hard part of small-model inference is no longer simply serving a model, but operating many embeddings, rerankers, extractors and multimodal models efficiently in production. In his account, conventional one-model-per-container deployments waste GPU capacity and leave teams to rebuild routing, autoscaling, monitoring, hot-swapping and eviction themselves. Superlinked’s SIE is presented as an open-source attempt to provide that missing infrastructure layer for AI search and document-processing workloads.

Filip Makraduli | AI Engineer | May 7, 2026 | 9 min read

Enterprise AI Agents Need Harnesses, Traces, and Controlled Runtimes

LangChain co-founder and CEO Harrison Chase argues that enterprise AI agents are becoming an architectural problem rather than a question of adding autonomy wherever possible. In an NVIDIA AI Podcast interview, he says systems such as Claude Code, Manus and Deep Research share a common “deep agent” pattern: an LLM in a tool-calling loop, supported by a reusable harness, workspace, subagents and planning. For enterprises, Chase says trust depends on choosing the right level of autonomy and surrounding agents with observability, evaluation, secure runtimes and continued iteration.

Harrison Chase · Noah Kravitz | NVIDIA | May 7, 2026 | 12 min read

Gemma 4 Moves On-Device AI From Chatbots to Local Agents

Chintan Parikh of Google DeepMind argues that on-device AI is moving from local chatbots toward local agents, as smaller Gemma 4 edge models become capable of tool calling, structured output and reasoning on phones, laptops and embedded hardware. With Weiyi Wang joining the Q&A, Parikh presents LiteRT as the deployment layer for that shift across Android, iOS, desktop, web and IoT. His case is pragmatic rather than absolute: edge inference can improve latency, privacy, offline use and cost, but teams still have to manage memory, quantization, accelerator support and when to call the cloud.

Weiyi Wang · Chintan Parikh | AI Engineer | May 7, 2026 | 11 min read