May 2026
Károly Zsolnai-Fehér of Two Minute Papers argues that DeepSeek V4 Preview is a consequential open-weight AI release because it pairs frontier-adjacent benchmark results with a reported one-million-token text context window and sharply lower long-context memory costs. His case rests less on outright benchmark dominance than on access economics: a freely self-hostable model appears close enough to recent closed frontier systems to change what developers can afford to use. He also stresses the limits: DeepSeek V4 is text-only, degrades near the edge of its context window, and still needs serious hardware at full scale.
AI is entering car development first as a way to compress the years of sketching, 3D modeling, simulation, testing and software work behind a new vehicle, not as a prompt that produces a finished car, Verge contributor Tim Stevens argues on The Vergecast. Stevens says that could move automakers toward three-year development cycles and lower costs, but warns it may also encourage homogenized design and erase the junior work that trains future designers and developers. The Verge’s Hayden Field applies the labor question to the wider AI business, where coding tools such as Claude Code and Codex are gaining traction while companies cite AI in layoffs without clear evidence of the productivity gains or return on investment they claim.
Airbnb’s challenge in the AI era is less a feature rollout than a company reinvention, chief executive Brian Chesky argues in a conversation with Patrick O’Shaughnessy. Chesky says the company has to move beyond a business still identified mainly with homes, rebuild around identity and personal preferences, and do so without damaging a large public platform that hosts and investors depend on. His answer is a more hands-on operating model: fewer abstraction layers, smaller elite teams closer to users, continuous recruiting, and a CEO directly engaged with the work.
Speaking with Craig Smith on Eye on AI, IIT Madras electrical engineering professor Andrew Thangaraj argues that India’s AI talent problem begins with a higher-education system that filters too many students out too early and rewards exam knowledge over usable skills. He presents IIT Madras’s online undergraduate degree in data science — a low-cost, no-JEE program with a rigorous exit standard and project-heavy diploma stage — as an attempt to move the filter from admission to completion. Thangaraj says that model is necessary if India is to build AI capacity at national scale rather than through a handful of elite seats.
In a Startup School India fireside with YC’s Jon Xu, Razorpay co-founder and CEO Harshil Mathur argues that the company’s rise in Indian payments came less from an initial fintech thesis than from staying with a painful customer problem through regulation, bank failures and market skepticism. Mathur says Razorpay turned delays into a moat, customer trust into an operating principle, and early bets such as UPI into openings incumbents missed. His broader case is that founders must keep direct ownership of the decisions that define the company, especially as AI lowers the cost of building and raises the cost of slow judgment.
At AI Ascent 2026, Flapping Airplanes co-founders Ben and Asher Spector argued that data scarcity, more than compute alone, will determine where AI can create value next. They said the biggest gains so far have come in unusually data-rich domains such as search and coding, while much of the economy — including robotics, trading, science and narrow industrial workflows — lacks comparable datasets. Their proposed answer is to make models far more data-efficient by developing new GPU-level primitives that current frameworks such as PyTorch make hard to express.
Philip Johnston, Starcloud’s co-founder and chief executive, argues that AI data centers could become cheaper in orbit than on Earth if launch costs fall to about $500 per kilogram. His case rests on continuous solar power in a dawn-dusk orbit, avoiding land and battery costs, and using constellations of optically linked satellites for inference workloads. Starcloud’s plan, he said, starts with an orbital GPU proof point and points toward an 88,000-satellite network delivering roughly 20 gigawatts of compute capacity.
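A quick back-of-the-envelope check, using only the figures quoted above, shows what that plan implies per satellite (the satellite mass in the second step is an assumption for illustration, not a number from the talk):

```python
# Back-of-the-envelope check on the figures quoted above.
target_capacity_w = 20e9        # ~20 GW of compute capacity
num_satellites = 88_000         # planned constellation size
launch_cost_per_kg = 500        # claimed cost threshold, USD/kg

per_satellite_kw = target_capacity_w / num_satellites / 1e3
print(f"Power per satellite: {per_satellite_kw:.0f} kW")   # ~227 kW

# Illustrative only: the 2,000 kg satellite mass is an assumption,
# not a figure from the talk.
assumed_mass_kg = 2_000
print(f"Launch cost per satellite: ${launch_cost_per_kg * assumed_mass_kg:,}")
```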
At Sequoia’s AI Ascent 2026, ElevenLabs co-founder and CEO Mati Staniszewski argues that audio was an overlooked frontier in 2022 because the AI field was focused on text and images, leaving room for a smaller company to build quickly and monetize early. His broader case is that as AI systems become more capable, voice becomes the interface problem: the way people will use agents, robots, services, education and healthcare. Staniszewski says the next hard problems are emotional intelligence, timing, authentication and workflow, not merely making synthetic speech sound human.
At AI Ascent 2026, Ricursive Intelligence co-founders Anna Goldie and Azalia Mirhoseini argued that the next bottleneck in AI is the chip-design process itself, and that AI should be used to design the hardware that trains and serves it. Drawing on their AlphaChip work, which Goldie said has shipped in four generations of Google TPUs, they described Ricursive’s plan to rebuild chip-design tools for fast AI feedback loops and turn that tooling into a platform for custom silicon. Their larger claim is that workload-specific chips, and eventually co-designed chips and models, require moving chip design from yearlong expert workflows to automated optimization.
At AI Ascent 2026, Unconventional AI founder and CEO Naveen Rao argued that the current AI compute stack is approaching an energy wall because it is built on an 80-year-old digital computing model poorly suited to intelligence. Rao’s case is that GPUs and matrix math cannot close the efficiency gap with biological brains fast enough, and that AI hardware must instead be rebuilt around physical dynamics, time-domain computation, and architectures that blur memory and processing. He presented Unconventional AI’s coupled-oscillator chip prototype as an attempt to move compute closer to the thermodynamic limits of intelligence per watt.
Tony James, former president and COO of Blackstone, tells David Haber on the a16z Show that his career at DLJ, Costco and Blackstone was defined less by asset class than by a repeatable operating pattern: enter under-scaled franchises before the opportunity is priced, then use culture, disciplined decision-making and structure to let them compound. He argues Blackstone’s rise from roughly $16bn in assets to nearly $1tn depended on turning a collection of subscale businesses into a firm-level machine, with investment committees, distribution and succession treated as sources of advantage rather than administrative chores.
Oege de Moor, founder and CEO of XBOW, argues that autonomous AI hacking has moved from assistance to real exploitation. In an AI Ascent 2026 talk, he says XBOW’s system reached the top of HackerOne using only black-box access, found a remote code execution flaw in Bing Image Search from a URL alone, and would have been three times more effective with GPT-5. His warning is that defenders have six to nine months before comparable open-weight models make the same capabilities broadly available, including to attackers.
NASA Administrator Jared Isaacman uses an a16z Show interview with Morgan Brennan to cast the US return to the moon as a national-security test rather than an open-ended exploration program. He argues that NASA must compress Artemis from years between launches to months, insert a 2027 risk-reduction mission before 2028 landing attempts, and rebuild internal capabilities the agency has outsourced. Industry still has a central role, in his account, but NASA should set sharper demand signals for a lunar base while reserving its own effort for capabilities no market will fund, including nuclear power and propulsion for Mars.
In a Stanford CS153 guest lecture, Luma AI co-founder and chief executive Amit Jain argues that generative video is only a staging point toward “unified intelligence”: models that understand and generate across text, images, video, audio, code and tools in a single work loop. Jain traces Luma’s path from Apple-era LiDAR and 3D capture to internet-scale video, saying the company followed the data but now sees prettier clips as insufficient. The destination, he says, is a multimodal AI factory for professional creative and physical work, where human skills, tool use, feedback and unified transformer architectures produce full campaigns, schematics, productions and eventually robotics workflows.
Laura Burkhauser, Descript’s chief executive, distinguishes generative AI tools for creators from the “slop” she defines as mass-produced content arbitrage. Her case is that Descript’s future depends less on adding AI everywhere than on making editing automation reliable, reversible and useful for recorded human media. That means choosing third-party models by fit and taste, building in-house systems where Descript has workflow data, and treating creator backlash as a product constraint rather than a branding problem.
A Vercel presentation from Saltbox Management’s Shane Smyth makes the case for Saltbox One as an enterprise Salesforce agent built for implementation work, not a generic chat layer. Smyth argues that production Salesforce tasks require project and org context, authenticated tools, model routing, sandboxed execution and explicit human approval before writes. The product he describes uses Vercel’s AI SDK, AI Gateway, Fluid Compute, Sandbox MicroVMs and v0 to let the same chat surface summarize meetings, generate stories, inspect orgs, produce Salesforce code and deploy validated changes.
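The approval-before-writes rule is the load-bearing part of that design. A minimal Python sketch of the pattern (tool names and the `request_human_approval` callback are hypothetical; Saltbox’s real implementation runs on Vercel’s TypeScript stack):

```python
# Illustrative approval gate: read tools run freely, write tools block on a human.
# Nothing here is Saltbox's real API; it only sketches the control flow described.
READ_TOOLS = {"summarize_meeting", "inspect_org"}
WRITE_TOOLS = {"deploy_change", "update_record"}

def request_human_approval(tool: str, args: dict) -> bool:
    """Stand-in for a real approval UI; here it just asks on the console."""
    return input(f"Approve {tool}({args})? [y/N] ").strip().lower() == "y"

def run_tool(tool: str, args: dict, execute) -> str:
    if tool in WRITE_TOOLS and not request_human_approval(tool, args):
        return f"{tool} rejected by reviewer; no write performed."
    return execute(tool, args)  # in Saltbox's design, writes also run sandboxed
```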
Vercel’s beginner guide presents v0 as a way to move from a plain-language prompt to deployed software inside the company’s development stack. The walkthrough argues that v0 is more than a website generator: it creates Next.js code, connects to AI and infrastructure services, publishes to Vercel, and can also import an existing GitHub repository, create a branch, and open a pull request for review.
Raj Navakoti argues that enterprise AI agents fail less because of model limits or retrieval plumbing than because companies have not made institutional knowledge legible. In his Demand-Driven Context workshop, he proposes building agent-ready knowledge bases from the bottom up: give agents real tickets or incidents, observe where they fail, and turn those failures into structured, validated context blocks. The method, shown through smaller-scope examples and prototypes including work from IKEA Digital, is presented as an incremental curation loop rather than a proven enterprise-scale system.
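A hedged sketch of what one unit in that curation loop might look like; the field names are guesses at the structure Navakoti describes, not his schema:

```python
from dataclasses import dataclass, field

@dataclass
class ContextBlock:
    """One unit of curated institutional knowledge, distilled from an agent
    failure. Field names are illustrative assumptions, not Navakoti's schema."""
    topic: str                   # e.g. "refund escalation path"
    content: str                 # the knowledge the agent was missing
    source_ticket: str           # the real ticket/incident that exposed the gap
    validated_by: str            # human reviewer who signed off
    failure_modes: list[str] = field(default_factory=list)

def curate_from_failure(ticket_id: str, failure_summary: str,
                        corrective_context: str, topic: str,
                        reviewer: str) -> ContextBlock:
    """The loop's key move: an observed failure becomes a validated block."""
    return ContextBlock(
        topic=topic,
        content=corrective_context,
        source_ticket=ticket_id,
        validated_by=reviewer,
        failure_modes=[failure_summary],
    )
```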
WorkOS engineers Nick Nisi and Zack Proser make the case that AI “skills” are a practical way to turn repeated agent instructions into portable, reusable workflows. They argue that small markdown-and-script packages can encode team context, constraints, evidence-gathering commands and output formats so agents stop producing generic answers and start following a team’s way of working. Their warning is that skills only help when they are focused, routed correctly, tested against a no-skill baseline and managed like shared software rather than treated as another giant context file.
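To make the mechanics concrete, a small Python sketch of loading, routing and baseline-testing skills (the SKILL.md layout and the naive routing rule are assumptions for illustration, not WorkOS’s implementation):

```python
import pathlib

def load_skills(skills_dir: str) -> dict[str, str]:
    """Read one SKILL.md per skill directory; the layout is an assumed convention."""
    return {
        path.parent.name: path.read_text()
        for path in pathlib.Path(skills_dir).glob("*/SKILL.md")
    }

def route(task: str, skills: dict[str, str]) -> str | None:
    """Naive router: pick a skill whose name appears in the task. Real routing
    would match on each skill's description. None means no skill applies."""
    for name, body in skills.items():
        if name.replace("-", " ") in task.lower():
            return body
    return None

def build_prompt(task: str, skill_body: str | None) -> str:
    # Calling this with skill_body=None gives the no-skill baseline that the
    # speakers recommend testing every skill against.
    prefix = f"Follow this skill exactly:\n{skill_body}\n\n" if skill_body else ""
    return prefix + task
```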
Giving LLM agents access to production databases creates an authorization problem that prompt instructions alone cannot solve, Stephanie Wong and Kurtis Van Gent argue in a Google Cloud Live session on MCP Toolbox for Databases. They describe Toolbox as Google’s open source framework for putting an architectural gate between agents and systems such as AlloyDB and BigQuery. Van Gent’s core argument is that production agents should use constrained, reviewed tools with application-bound or OAuth-derived parameters, so the model can act on data only within boundaries set outside the prompt.
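From the agent side, the gate means loading pre-reviewed tools instead of writing SQL in the prompt. A sketch using the Toolbox Python SDK (method and argument names follow the toolbox-core project but should be treated as approximate):

```python
from toolbox_core import ToolboxSyncClient  # pip install toolbox-core

def current_user_id() -> str:
    return "user-123"   # stand-in for the application's real session lookup

client = ToolboxSyncClient("http://127.0.0.1:5000")

# Each tool is defined server-side in a reviewed tools.yaml as a fixed,
# parameterized SQL statement; the model can only supply declared parameters.
# Binding a parameter here, in application code, keeps it out of the model's
# hands entirely. (Argument names are approximate.)
search_orders = client.load_tool(
    "search-orders",
    bound_params={"customer_id": current_user_id},
)

# The agent chooses values only for the remaining declared parameters.
rows = search_orders(status="open")
```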
OpenAI’s Mark Handley and Greg Steinbrecher argue that frontier AI training has outgrown conventional data-center networking because synchronized GPU clusters are constrained by their worst congestion or failure, not average throughput. They present Multipath Reliable Connection, developed with major hardware and cloud partners, as OpenAI’s answer: a protocol that spreads traffic across many paths, detects loss quickly, routes around failures from the endpoints, and is being pushed as an open standard for the wider industry.
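The endpoint-driven core of that design can be illustrated with a toy: a sender keeps a per-path health scoreboard, sprays packets in proportion to it, and downgrades a path the moment loss is detected, without waiting for the fabric to reconverge. This is an illustration of the idea, not OpenAI’s protocol:

```python
import random

class MultipathSender:
    """Toy model of endpoint multipath spraying with loss-driven rerouting."""

    def __init__(self, num_paths: int):
        self.weights = [1.0] * num_paths  # per-path health scoreboard

    def pick_path(self) -> int:
        # Spread packets across paths in proportion to their current health.
        return random.choices(range(len(self.weights)), self.weights)[0]

    def on_ack(self, path: int):
        self.weights[path] = min(1.0, self.weights[path] + 0.01)

    def on_loss(self, path: int):
        # React immediately at the endpoint instead of waiting for the
        # network to heal; a failing path's weight decays toward zero.
        self.weights[path] *= 0.5
```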
Liad Yosef and Ido Salomon argue that MCP Apps turn chat products such as ChatGPT, Claude, VS Code, Cursor and Copilot into application distribution surfaces, not just places for text responses. Their case is that tools can return branded, interactive UI resources over MCP, while user actions flow back through the host so the model retains context and control. For builders, they frame this as a shift from monolithic web destinations to portable app components that can run across compliant agent hosts.
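In MCP terms, the pattern pairs a tool with a UI resource the host can render. A rough sketch with the official Python MCP SDK (the `ui://` scheme follows the MCP Apps proposal; how a given host links the resource to the tool’s output is host-specific and omitted here):

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("flight-widgets")

# A branded, interactive UI shipped as an MCP resource. The ui:// scheme
# follows the MCP Apps proposal; host support varies.
@mcp.resource("ui://flight-card")
def flight_card() -> str:
    return "<html><body><div id='card'>interactive flight card</div></body></html>"

# The tool returns structured data; a compliant host renders it inside the
# UI resource above and routes user actions back through the model's context.
@mcp.tool()
def search_flights(origin: str, destination: str) -> dict:
    return {"flights": [{"from": origin, "to": destination, "price": 240}]}

if __name__ == "__main__":
    mcp.run()
```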
Filip Makraduli of Superlinked argues that the hard part of small-model inference is no longer simply serving a model, but operating many embedding models, rerankers, extractors and multimodal models efficiently in production. In his account, conventional one-model-per-container deployments waste GPU capacity and leave teams to rebuild routing, autoscaling, monitoring, hot-swapping and eviction themselves. Superlinked’s SIE is presented as an open-source attempt to provide that missing infrastructure layer for AI search and document-processing workloads.
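The operating problem is easy to state in code: many small models sharing a few GPUs need an eviction and hot-swap policy. A toy LRU pool illustrating the idea (not SIE’s API):

```python
from collections import OrderedDict

class ModelPool:
    """Toy LRU pool: keep at most `capacity` models resident on the GPU,
    evicting the least-recently-used one to make room (hot-swapping)."""

    def __init__(self, capacity: int, loader):
        self.capacity = capacity
        self.loader = loader            # callable: name -> loaded model
        self.resident = OrderedDict()   # name -> model, in LRU order

    def get(self, name: str):
        if name in self.resident:
            self.resident.move_to_end(name)      # mark as recently used
            return self.resident[name]
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)    # evict the coldest model
        self.resident[name] = self.loader(name)  # load (or reload) onto the GPU
        return self.resident[name]
```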
LangChain co-founder and CEO Harrison Chase argues that enterprise AI agents are becoming an architectural problem rather than a question of adding autonomy wherever possible. In an NVIDIA AI Podcast interview, he says systems such as Claude Code, Manus and Deep Research share a common “deep agent” pattern: an LLM in a tool-calling loop, supported by a reusable harness, workspace, subagents and planning. For enterprises, Chase says trust depends on choosing the right level of autonomy and surrounding agents with observability, evaluation, secure runtimes and continued iteration.
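Stripped to its skeleton, the deep-agent pattern Chase describes is a loop, with the harness, workspace, subagents and planning hanging off it. A schematic sketch, not LangChain’s implementation:

```python
def deep_agent(llm, tools: dict, task: str, max_steps: int = 50):
    """Schematic LLM-in-a-tool-calling-loop. `llm` returns either a final
    answer or a tool request; neither matches any specific framework's API."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = llm(history)                 # plan / decide the next step
        if action.is_final:
            return action.content             # done: surface the answer
        result = tools[action.tool](**action.args)   # a tool may be a subagent
        history.append({"role": "tool", "name": action.tool, "content": result})
    raise RuntimeError("step budget exhausted; escalate to a human")
```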
Factory’s Luke Alvoeiro argues that long-running software agents will not be built by stretching chat sessions, but by organizing agents into roles with explicit contracts, handoffs and validation. In a talk on Factory’s Missions system, he presents a three-part architecture — orchestrator, workers and validators — designed to run software work for hours or days while humans supervise scope and acceptance rather than every step. The case rests on Factory’s production experience, including missions Alvoeiro says have run as long as 16 days, and on a claim that serial execution, adversarial verification and model selection by role matter more than default parallelism.
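That three-role split can be sketched as a serial pipeline with an adversarial gate. Everything below is illustrative structure, not Factory’s Missions code:

```python
def run_mission(orchestrator, workers, validator, goal, max_attempts=3):
    """Illustrative orchestrator/worker/validator loop: tasks run serially,
    and a worker's output lands only if an adversarial validator accepts it."""
    plan = orchestrator.decompose(goal)              # explicit contract per task
    for task in plan:                                # serial by default
        for _ in range(max_attempts):
            output = workers[task.role].execute(task)    # model chosen per role
            verdict = validator.check(task, output)      # adversarial check
            if verdict.ok:
                orchestrator.accept(task, output)        # handoff to next task
                break
            task = task.with_feedback(verdict.reasons)   # retry with critique
        else:
            orchestrator.escalate(task)   # humans supervise scope and acceptance
```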
AWS developer advocate Brooke Jamieson presents Agent Toolkit for AWS as a way to make coding agents more reliable and governable on AWS by giving them current documentation, task-specific runbooks, and an audited path for API calls. In a Lambda and API Gateway demo, she argues that the toolkit addresses common agent failures such as stale service knowledge and missed IAM steps, while letting teams constrain agent actions through IAM and inspect them in CloudTrail.
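The governance half of that pitch is standard AWS plumbing: run the agent under its own IAM role and its actions become both constrained by policy and separable in CloudTrail. A generic boto3 sketch (ARNs and the session name are placeholders, and this is not the toolkit’s own code):

```python
import boto3

# Run the agent's AWS calls under a dedicated, narrowly scoped IAM role so
# every action is attributable and limited by that role's policy.
sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/agent-toolkit-role",  # placeholder
    RoleSessionName="coding-agent",
)["Credentials"]

agent_lambda = boto3.client(
    "lambda",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
agent_lambda.list_functions()  # allowed or denied purely by the role's policy

# Later, audit exactly what the agent did via CloudTrail.
trail = boto3.client("cloudtrail")
events = trail.lookup_events(
    LookupAttributes=[{"AttributeKey": "Username", "AttributeValue": "coding-agent"}]
)
```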
Katja Sirazitdinova presents CuTe DSL for JAX as an escape hatch for developers who need more GPU-kernel control than XLA provides, without leaving the JAX workflow. The tutorial argues that custom NVIDIA kernels can be written in Python with CUTLASS CuTe DSL, bridged into compiled JAX programs through `cutlass_call`, and used where fusion, special layouts, custom data movement, or narrow performance bottlenecks justify the extra discipline around shapes, dtypes, launch constraints, and static parameters.
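From the JAX side, the bridge is small. A sketch of its shape (the import path and `cutlass_call` signature are assumptions based on the tutorial’s description, and the kernel body is elided):

```python
import jax
import jax.numpy as jnp
# Placeholder imports: `cutlass_call` is the bridge named in the tutorial, and
# `scaled_add_kernel` stands in for a CuTe DSL kernel authored in Python with
# CUTLASS. Real module paths and the exact signature may differ.
from my_kernels import cutlass_call, scaled_add_kernel

@jax.jit
def scaled_add(x, y):
    # The bridge wraps the custom kernel as an opaque op inside a compiled JAX
    # program; the output shape/dtype must be declared because XLA cannot
    # infer them from the kernel itself.
    return cutlass_call(
        scaled_add_kernel,
        out_shape=jax.ShapeDtypeStruct(x.shape, x.dtype),
    )(x, y)

x = jnp.ones((1024, 1024), jnp.float16)
y = scaled_add(x, x)   # the custom kernel runs where it beats XLA's fusion
```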
Chintan Parikh of Google DeepMind argues that on-device AI is moving from local chatbots toward local agents, as smaller Gemma 4 edge models become capable of tool calling, structured output and reasoning on phones, laptops and embedded hardware. With Weiyi Wang joining the Q&A, Parikh presents LiteRT as the deployment layer for that shift across Android, iOS, desktop, web and IoT. His case is pragmatic rather than absolute: edge inference can improve latency, privacy, offline use and cost, but teams still have to manage memory, quantization, accelerator support and when to call the cloud.
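On devices, the deployment surface Parikh describes reduces to a small interpreter API. A minimal sketch with LiteRT’s Python bindings (the model file and tensor contents are placeholders):

```python
import numpy as np
from ai_edge_litert.interpreter import Interpreter  # pip install ai-edge-litert

# Load a quantized on-device model and run one local inference.
interpreter = Interpreter(model_path="gemma_edge.tflite")  # placeholder file
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Shapes and dtypes come from the exported model; zeros are illustration only.
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
print(interpreter.get_tensor(out["index"]).shape)
```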
In a Google for Developers guide, Nikita Namjoshi shows how AI Studio turns a prompt-built app from a static page into a Firebase-backed application when the requested workflow requires stored data. Using a book-tracking app as the example, she argues that AI Studio can detect the need for persistence, prompt the user to enable Firebase, wire in authentication, and save extracted book metadata to Cloud Firestore. The guide also shows that publishing the app publicly requires configuring a Gemini API key, with an optional spend cap to limit usage costs.
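The persistence step the guide describes is an ordinary Firestore write. The generated app uses Firebase’s web SDK; the same write is sketched here with the Python client for brevity, with illustrative collection and field names:

```python
from google.cloud import firestore  # pip install google-cloud-firestore

db = firestore.Client()
user_id = "demo-uid"  # stand-in for the authenticated user's Firebase UID

# Save extracted book metadata under the signed-in user so the library
# persists across sessions. Collection and field names are illustrative.
db.collection("users").document(user_id).collection("books").add({
    "title": "The Pragmatic Programmer",
    "authors": ["Andrew Hunt", "David Thomas"],
    "added_at": firestore.SERVER_TIMESTAMP,
})
```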
A Codex sales-prep walkthrough argues that sellers can use one conversation thread to assemble customer-meeting context across Google Calendar, Salesforce, Google Drive, Slack, Gmail, and a pipeline dashboard. Using an Acme Corporation expansion review as the example, the source shows Codex identifying the relevant opportunity and risks, creating a meeting brief, drafting internal and customer follow-up, updating Salesforce next steps, and filtering the pipeline view. Its central claim is that Codex reduces the manual work of preparing for a sales meeting by carrying context and actions across the systems sellers already use.
Alex Kestner argues that Amazon EKS Auto Mode extends EKS management from the Kubernetes control plane into the data plane where workloads run, infrastructure is provisioned, and scaling and networking decisions are made. He presents the service as a way for teams to create what AWS calls a production-grade cluster through a console click or API call, while retaining Kubernetes-native controls over instance types, node pools, storage, and networking. Nana Janashia frames the central concern as whether automation reduces control; Kestner’s answer is that Auto Mode is opinionated, not closed.
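Creating an Auto Mode cluster through the API means enabling the managed data-plane capabilities together: compute, block storage and load balancing. A boto3 sketch with placeholder ARNs and subnets; consult the EKS API reference for the authoritative set of required fields:

```python
import boto3

eks = boto3.client("eks")

# EKS Auto Mode is enabled by turning on the managed data-plane pieces
# together: compute, block storage and load balancing. Placeholder ARNs.
eks.create_cluster(
    name="auto-mode-demo",
    roleArn="arn:aws:iam::123456789012:role/eks-cluster-role",
    resourcesVpcConfig={"subnetIds": ["subnet-aaa111", "subnet-bbb222"]},
    accessConfig={"authenticationMode": "API"},  # Auto Mode uses access entries
    computeConfig={
        "enabled": True,
        "nodePools": ["general-purpose", "system"],  # the built-in node pools
        "nodeRoleArn": "arn:aws:iam::123456789012:role/eks-node-role",
    },
    storageConfig={"blockStorage": {"enabled": True}},
    kubernetesNetworkConfig={"elasticLoadBalancing": {"enabled": True}},
)
```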
Google Cloud’s Billy Jacobson presents Firestore with MongoDB compatibility as a way for teams to keep MongoDB-compatible code, drivers, tools, and query patterns while moving the underlying database to Firestore’s serverless infrastructure. He argues the product is aimed at both migrations and new applications, combining compatibility with live migration tooling, real-time subscription queries, and integrations with Google’s generative AI ecosystem. The enterprise case rests on serverless scaling, governance controls, high availability, and usage-based pricing rather than reserved capacity.
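Compatibility means existing driver code keeps working once it points at a Firestore endpoint. A minimal pymongo sketch (the connection string is a placeholder for the URI Firestore issues per database):

```python
from pymongo import MongoClient  # the unchanged MongoDB driver

# Placeholder URI: Firestore issues a real connection string per database.
client = MongoClient("mongodb://<uid>:<password>@<host>:443/?tls=true")

db = client["library"]
db.books.insert_one({"title": "Dune", "author": "Frank Herbert", "read": False})
for book in db.books.find({"read": False}):
    print(book["title"])
```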