
Stanford Online
Stanford Online is the Stanford School of Engineering’s portal for academic and professional education offered by Stanford schools and units, including degree programs, credit-bearing courses, professional certificates, and free open content. It is operated by the Stanford Engineering Center for Global & Online Education.
Tokens Can Now Substitute for 100-Person Startup Engineering Teams
In a Stanford CS153 lecture, OpenAI chief executive Sam Altman argued that AI has already rewritten the startup playbook, allowing small teams to buy capabilities with tokens that once required large engineering organizations. He used OpenAI’s experience with ChatGPT, Codex and model scaling to make a broader case: scale keeps producing capabilities that experts underestimate, but the institutions around AI — from education and research pipelines to compute markets and governance — are not adapting as quickly. Altman said the central choice ahead is whether intelligence becomes a broadly available utility or remains concentrated in a few companies.
Game Studios Are Overbuilding for Competition as Players Seek Stress Relief
Cheryl Platz, a game developer, designer and author speaking at Stanford’s CS547 HCI seminar, argues that game strategy should start with why people play rather than with genre conventions, monetization or production scope. Her case is that the industry still overbuilds for competitive, mastery-driven players while evidence she cites points to rising demand for stress relief, self-expression, companionship, comfort and education. Platz presents a nine-part motivator framework as a practical tool for decisions about mechanics, teaching, community design, monetization and modernization.
AI Application Companies Are Moving Beyond Frontier APIs to Protect Margins
Baseten founder and chief executive Tuhin Srivastava used a Stanford MS&E435 seminar with instructor Apoorv Agrawal to argue that inference is becoming the cost of goods sold for AI applications. His case is that scaled AI companies will need to move beyond default frontier-model APIs toward custom or post-trained models, both to improve margins and to protect the workflows and user signals that make their products defensible. Baseten’s role, as Srivastava framed it, is to provide the production inference stack and compute access needed to run that custom intelligence at scale.
Inference Constraints Are Reshaping Language Model Architecture
In a Stanford CS336 guest lecture, Dan Fu argued that language-model inference is no longer downstream plumbing but a central research and design constraint. Fu described serving as the machinery that turns a trained model into a usable system, where schedulers, KV caches, GPU kernels, routing policies and hardware choices determine which architectures are practical, economical and reliable at scale.
Geometric Priors Can Make Robot Learning Far More Data Efficient
In a Stanford Robotics Seminar talk, Northeastern computer science professor Robert Platt argues that robot learning should take a middle path between brittle hand-coded models and data-hungry generalist policies by building geometry into learned systems. His case is that representations such as equivariant point-cloud policies, spherical image embeddings, ray-based attention and image-plane control can make robots generalize over pose without having to learn that structure from scratch. Platt presents the payoff as data efficiency: geometric bias does not replace scaling, but can shift the curve so scarce robot demonstrations count for more.
Native Multimodal Models Extend LLMs but Still Lack Unified Representations
Victoria Lin of Thinking Machines uses a Stanford CS25 seminar to argue that native multimodal models have extended much of the large-language-model recipe into images, audio, video and action, but have not yet unified multimodal intelligence. Her account is that tokenization, Transformers, autoregressive conditioning and scaling transfer only partly: images, video and action require different representations, objectives and sometimes modality-specific parameters. The result, she says, is a field moving beyond text-only systems while still relying on text as its strongest abstraction for reasoning.
Production Inference Turns Transformer Models Into a Full-Stack Systems Problem
In a Stanford CS25 seminar, Modal’s Charles Frye argues that transformer inference has become the economic and operational center of AI systems: training produces weights, but serving turns them into usable, billable products. His account treats production inference as a full-stack problem, where application latency goals, workload shape, model choice, GPU memory limits, deployment failures, observability and cost controls all determine whether a system works. Frye’s main warning is that the largest serving gains come from matching the inference stack to the application, not from treating model hosting as a generic infrastructure task.
Vision-Language Models Understand Multimodal Inputs but Still Generate Text
Stanford’s CS336 lecture on alignment and multimodality, led by Percy Liang with Tatsunori Hashimoto, argues that the core problem in vision-language systems is still how to turn non-text data into tokens a Transformer can use. The lecture traces the field from CLIP and SigLIP through LLaVA and Qwen, presenting modern VLMs as largely built around a stable template: a vision encoder, an adapter, and a pretrained language model that generates text. Liang’s larger point is that these systems are powerful multimodal input models, but not true omni models; representing images and video without losing fine detail remains the central technical constraint.
Open Image Models Converge on Flow Matching and DiT Architectures
Stanford adjunct lecturer Shervine Amidi uses Lecture 8 of CME296 to argue that modern visual generation is best understood as a stack of choices for transporting noise into data: the paradigm, representation, architecture, training procedure, and evaluation method. He presents flow matching as the current default for image-generation systems, diffusion transformers as the dominant architectural direction, and latent spaces as a practical compression tradeoff now being challenged by scaled pixel-space models.
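As a rough illustration of "transporting noise into data", the sketch below builds one training example for flow matching with a straight-line path. It is a minimal sketch under stated assumptions, not the lecture's formulation: the convention that t = 0 is noise and t = 1 is data, the linear interpolation, and the helper names `flow_matching_pair` and `mse` are all chosen here for illustration.

```python
import random


def flow_matching_pair(x_data, rng):
    """Build one (x_t, t, target_velocity) example on a straight noise-to-data path.

    Assumed convention: t = 0 is pure Gaussian noise, t = 1 is the data sample.
    """
    x_noise = [rng.gauss(0.0, 1.0) for _ in x_data]
    t = rng.random()
    # Point on the straight line between the noise sample and the data sample.
    x_t = [(1.0 - t) * n + t * d for n, d in zip(x_noise, x_data)]
    # Along a straight path the velocity is constant: data minus noise.
    target_velocity = [d - n for n, d in zip(x_noise, x_data)]
    return x_t, t, target_velocity


def mse(pred, target):
    """Mean squared error a velocity-predicting model would be trained on."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


rng = random.Random(0)
x_data = [0.5, -1.0, 2.0]  # stand-in for a flattened image or latent
x_t, t, v = flow_matching_pair(x_data, rng)

# Following the target velocity for the remaining time (1 - t) lands on the data.
recovered = [a + (1.0 - t) * b for a, b in zip(x_t, v)]
```

In a real system a network would predict the velocity from `x_t` and `t`, and `mse` would compare that prediction with `target_velocity`; here the final line only checks that the target is consistent with the path.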
Text-to-Image Evaluation Requires Metrics Matched to Specific Failure Modes
Stanford adjunct lecturers Afshine Amidi and Shervine Amidi argue that evaluating text-to-image models starts with separating aesthetic quality from prompt adherence, then choosing metrics suited to the failure being tested. In Lecture 7 of Stanford’s CME296 course on diffusion and large vision models, they treat human ratings, FID, CLIPScore, reference-based measures, multimodal judges, and benchmarks as imperfect instruments rather than substitutes for a universal image-quality score. Their central warning is practical: automated and qualitative evaluations can be useful, but only when their assumptions, calibration, and failure modes are made explicit.
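To make one of the metrics named above concrete, here is a minimal sketch of CLIPScore as it is commonly defined: a rescaled cosine similarity between an image embedding and a text embedding, clipped at zero, with weight 2.5. The embeddings are assumed to come from a CLIP-style encoder that is not shown, and the toy vectors are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def clip_score(image_emb, text_emb, w=2.5):
    """CLIPScore: w * max(cos(image, text), 0). Inputs are precomputed embeddings."""
    return w * max(cosine(image_emb, text_emb), 0.0)


# Toy embeddings (hypothetical); real ones have hundreds of dimensions.
aligned = clip_score([1.0, 0.0], [1.0, 0.0])     # same direction
unrelated = clip_score([1.0, 0.0], [0.0, 1.0])   # orthogonal
opposed = clip_score([1.0, 0.0], [-1.0, 0.0])    # clipped to zero
```

The score measures prompt adherence only through the encoder's notion of similarity, which is one reason it cannot stand in for a separate judgment of aesthetic quality.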
Uber Prosecution Shows Incident Response Is Now a Governance Risk
Joe Sullivan, a former federal cybercrime prosecutor and former security executive at Facebook, Uber and Cloudflare, uses a Stanford CS153 lecture to argue that modern technology leadership now turns as much on governance and transparency as on technical response. Drawing on his prosecution over Uber’s 2016 security incident, Sullivan says companies need to assign disclosure authority, document cross-functional decisions, and build executive trust before a crisis, because the legal and reputational failure around an incident can become as consequential as the breach itself.
Model Behavior Depends More on Post-Training Data Than Algorithms
Stanford computer scientist Tatsunori Hashimoto’s CS336 lecture argues that post-training is less a matter of exotic algorithms than of choosing the data and feedback that turn a broadly capable pretrained model into a controllable product. He presents supervised fine-tuning as a way to extract behaviors already latent in pretraining, and RLHF as preference optimization whose results depend heavily on annotators, reward models, safety data and evaluation incentives. The lecture’s central warning is that style, refusals, hallucination, and reward hacking are not side issues; they are consequences of the data pipeline that shapes what users actually see.
Language-Model Data Pipelines Decide What Models Can Learn
Stanford’s CS336 lecture on data, taught by Percy Liang and Tatsunori Hashimoto, argues that language-model performance is shaped as much by corpus construction as by training itself. The lecture treats transformation, filtering, deduplication, source mixing and synthetic post-training data as engineering decisions that define what the model sees, how often it sees it and which compute is wasted. Its recurring point is that scalable algorithms are necessary, but the decisive choices still come from inspecting concrete data and deciding what “quality” means for the model being built.
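Deduplication is the easiest of these steps to show in a few lines. The sketch below is an illustrative exact-match version only: it hashes a normalized copy of each document and keeps the first occurrence. The normalization rule (lowercasing and collapsing whitespace) and the function names are assumptions made here; production pipelines typically add near-duplicate detection on top.

```python
import hashlib


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return " ".join(text.lower().split())


def exact_dedup(docs):
    """Keep the first occurrence of each document, compared after normalization."""
    seen = set()
    kept = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept


corpus = ["Hello  World", "hello world", "A different page"]
unique_docs = exact_dedup(corpus)
```

Even this small example involves a judgment call about what counts as "the same" document, which is the kind of decision the lecture says has to be made by looking at concrete data.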
RLVR Moves Post-Training From Human Preferences to Checkable Rewards
Stanford computer scientist Tatsunori Hashimoto presents reinforcement learning from verifiable rewards as the current practical route beyond RLHF for reasoning models, especially in math, coding and software-agent settings. His argument is that RLVR works because it replaces learned preference proxies with rewards that can be checked more directly, but that the reward remains the bottleneck: GRPO and related methods made the recipe simpler to run, while systems such as DeepSeek R1, Kimi k1.5 and Qwen show both the gains and the ways ostensibly verifiable rewards can still be gamed.
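The part of GRPO that makes the recipe simpler to run can be sketched briefly: instead of training a separate value model, each sampled answer's reward is normalized against the other answers drawn for the same prompt. The code below is a minimal sketch of that group-relative advantage only, not a full training loop; the use of population standard deviation, the `eps` constant, and the example rewards are assumptions for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt; 1.0 means a checker accepted the answer.
mixed_group = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# If every answer gets the same reward, no answer is preferred over another.
uniform_group = group_relative_advantages([1.0, 1.0, 1.0])
```

The second case shows why the reward stays the bottleneck: when a checker cannot distinguish the answers in a group, or can be satisfied by all of them through a shortcut, the advantages carry no useful learning signal.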
Frontier AI Has Become a Gigawatt-Scale Industrial Infrastructure Race
In a Stanford MS&E seminar on the economics of the AI supercycle, OpenAI infrastructure executive Sachin Katti argued that frontier AI has become an industrial systems problem, not a GPU procurement problem. Katti said usable compute now depends on synchronizing chips, memory, networking, power, cooling, buildings, land, suppliers and operators at gigawatt scale. His broader case was that OpenAI’s model and revenue ambitions depend on how quickly it can turn that whole chain into reliable infrastructure for training, inference and agentic workloads.
DeepMind’s AI Co-Scientist Turns LLMs Into Debate-Driven Research Agents
Google DeepMind’s Vivek Natarajan used a Stanford CS25 seminar to argue that scientific AI will require more than stronger chatbot-style models. He presented the company’s Gemini-based AI co-scientist as a multi-agent system built to generate, critique, rank and refine hypotheses over longer time horizons, with lab validation rather than benchmark scores as the test of usefulness. The case he made was cautious as well as ambitious: such systems may help scientists traverse large hypothesis spaces, but their value still depends on expert judgment, experimental capacity, publishing norms and safety controls.
Value Per Gigawatt Is Becoming AI Infrastructure’s Core Metric
Amin Vahdat, Google’s chief technologist for AI infrastructure and leader of its internal compute and TPU programs, argues in a Stanford CS153 lecture that AI infrastructure should be judged by value delivered per dollar, not by gigawatts or flops alone. With a gigawatt-scale buildout costing roughly $40 billion to $50 billion, he says the scarce discipline is building systems that are reliable enough, balanced across compute, memory and networks, procurable on multi-year timelines, and useful to customers and communities rather than merely large.
Enterprise AI Advantage Comes From Internal Evals and Proprietary Context
Yash Patil, chief executive of Applied Compute and a guest speaker in Stanford’s MS&E435 seminar, argues that the enterprise opportunity in AI is shifting from access to general frontier models toward the ability to define and optimize company-specific tasks. General models provide a baseline, he says, but durable advantage comes from internal evals, verifiers, feedback loops, proprietary context and product constraints that teach systems what “correct” means inside a business.
Generative AI’s Revenue Stack Is Still Inverted Toward Chips
Stanford adjunct lecturer and Altimeter partner Apoorv Agrawal argues in MS&E435 that generative AI’s economics still look unlike the software and cloud cycles investors often use to value it. In his estimates, AI revenue has grown sharply, but gross profit remains concentrated in semiconductors, while applications face inference costs, thin monetization and uncertain paths to mass-market utility. The question he puts to students is not whether AI demand exists, but how long the stack’s inverted shape can persist before applications and infrastructure capture more of the value.
Neuro-Symbolic Planning Makes Robot Learning More Data-Efficient
Jiayuan Mao, a Member of Technical Staff at Amazon Frontier AI & Robotics and incoming University of Pennsylvania assistant professor, argues in a Stanford Robotics Seminar that robot learning should be built around planning over compositional world models rather than direct policy fitting alone. His case is that neuro-symbolic systems — neural models embedded in symbolic constraint graphs for objects, relations, actions and effects — can learn from few demonstrations, compose skills at inference time and generalize to new objects, states and goals more reliably than end-to-end policies.
Robots Need Game-Theoretic Planning to Navigate Human Interaction
UC Berkeley roboticist Negar Mehr uses a Stanford robotics seminar on interactive autonomy to argue that robots cannot handle shared spaces by treating people and other robots as moving obstacles. She frames interaction as a coupled decision problem: agents must predict how others will respond to their own actions, coordinate across multiple possible equilibria, and learn from demonstrations of interaction rather than isolated behavior. Her broader case is that game-theoretic structure, multi-agent learning, and training-time foundation-model coaching can make that coupling tractable without replacing deployed control policies.
Language Models Generalize Differently From Parameters Than From Context
In a Stanford CS25 seminar, Anthropic researcher Andrew Lampinen argues that language models generalize differently depending on whether information is stored in their parameters or supplied in context. His experiments find that models can often use relations flexibly when the relevant facts are visible in the prompt, but fail to make the same reversals, syllogistic inferences, or codebook translations when those facts have only been learned through training. Lampinen presents augmentation, retrieval, and reinforcement-learned recall as partial ways to make latent implications more usable, while stressing that parametric learning and in-context learning remain complementary rather than substitutes.
AI Defaults Can Become Clinical Decisions in Digital Health
UCSF clinical informatics professor Peter Washington argues in a Stanford HCI seminar that AI-enabled digital health systems fail or succeed on decisions that often look like engineering defaults: metrics, thresholds, prompts, labels and workflow placement. Using examples from wearables, substance-use interventions, sepsis alerts, Apple Watch hypertension detection and Parkinson’s assessment, he makes the case that human-centered design is not a layer added after modeling, but part of how the model is trained, evaluated and made usable.
AI-Native Startups Are Replacing Teams With Agentic Operating Systems
In a Stanford CS153 Frontier Systems lecture, Y Combinator CEO Garry Tan and general partner Diana Hu argue that AI agents are changing the basic production unit of a startup from a team to a founder operating through skills, memory, evals and customer feedback loops. Tan frames agentic coding as a programmable company architecture, while Hu says AI-native companies are becoming closed-loop systems with far higher revenue per employee and less need for traditional managerial coordination.
AI Evaluation Benchmarks Measure Different Questions, Not One Scoreboard
Stanford’s CS336 lecture on evaluation, led by Percy Liang with sections from Tatsunori Hashimoto, argues that model evaluation is not a single scoreboard but a choice about what behavior is being measured and for what purpose. The lecture treats perplexity, exam benchmarks, chat preferences, agent tasks, reasoning puzzles, safety tests and realistic professional evaluations as different instruments with different failure modes. Its central claim is procedural: before reading or designing a benchmark, define the object being evaluated, the use case it serves and the trade-offs among difficulty, realism and validity.
Models Are Trained on Curated Corpora, Not the Internet
Stanford CS336’s data lecture, taught by Tatsunori Hashimoto, argues that training data is both the most consequential and least transparent part of modern language models. Hashimoto says models are not trained on “the internet” in any simple sense, but on static corpora shaped by crawlers, access limits, licensing, copyright risk, filtering, deduplication and conversion choices. The lecture’s central claim is that data construction is a legal and operational pipeline, not a passive input, and that those choices materially distinguish otherwise similar models.
Text-to-Image Training Is Becoming a Problem of Signal Allocation
Stanford adjunct lecturers Shervine Amidi and Afshine Amidi present text-to-image model training as a problem of allocating scarce learning signal across the full model lifecycle, not simply choosing a diffusion or flow-matching loss. In Lecture 6 of Stanford’s CME296 course, they argue that practical training depends on emphasizing hard timesteps, adjusting for resolution, using data curricula and representation alignment, then applying post-training, personalization, and distillation methods to improve control and reduce inference cost.
Language Model Scaling Depends on Controlling Hyperparameter Drift
Stanford’s CS336 scaling-laws lecture, taught by Tatsunori Hashimoto, argues that modern language-model scaling is less about accepting a single Chinchilla-style rule than about controlling which training choices drift with size. Hashimoto presents scaling laws as useful empirical tools for choosing model/data tradeoffs, learning rates, batch sizes, sparsity, optimizers, and architectures, but repeatedly cautions that their transfer depends on the regime that produced them. Techniques such as µP and WSD schedules can reduce some uncertainty, he says, while data mixtures, optimizer details, weight decay, architecture changes, and post-training can still break clean extrapolations.
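For readers unfamiliar with the WSD (warmup-stable-decay) schedule mentioned above, the sketch below shows its three phases as a function of the training step. It is an illustration under assumptions: the linear warmup, the linear decay shape, and all parameter names and values are choices made here, and published variants use other decay curves.

```python
def wsd_lr(step, total_steps, peak_lr, warmup_steps, decay_steps, min_lr=0.0):
    """Warmup-stable-decay learning rate for a zero-indexed training step."""
    if step < warmup_steps:
        # Warmup: ramp linearly up to the peak learning rate.
        return peak_lr * (step + 1) / warmup_steps
    decay_start = total_steps - decay_steps
    if step < decay_start:
        # Stable: hold the peak learning rate.
        return peak_lr
    # Decay: move linearly from the peak toward min_lr over the final steps.
    progress = min((step - decay_start) / decay_steps, 1.0)
    return peak_lr + (min_lr - peak_lr) * progress


# Hypothetical run: 1,000 steps, 100 warmup steps, 200 decay steps.
schedule = [wsd_lr(s, 1000, 1e-3, 100, 200) for s in range(1000)]
```

Because the stable phase does not depend on the total step count, a run can be extended or branched into a decay phase at several points, which is part of why the schedule is convenient for scaling experiments.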
Computing Is Shifting From Prerecorded Execution to Continuous Generation
In a Stanford CS153 Frontier Systems lecture, NVIDIA chief executive Jensen Huang argues that AI is forcing the first fundamental reinvention of computing in decades, moving the industry from prerecorded, on-demand execution to continuous real-time generation. Huang says that shift requires rebuilding the full stack — chips, compilers, networks, storage, systems and institutions — around new bottlenecks, with NVIDIA’s co-design approach producing gains that conventional Moore’s Law scaling cannot match.
Uranium Enrichment Is the Missing Link in AI’s Power Supply
In a Stanford CS153 Frontier Systems lecture, General Matter chief executive Scott Nolan argues that AI’s infrastructure constraint is moving upstream from chips and data centers to electricity. For high-uptime, low-carbon data-center power, Nolan says the long-term answer points toward nuclear, but the decisive U.S. bottleneck is not reactors themselves; it is uranium enrichment, a capability he says the country has largely lost and that General Matter was founded to rebuild.
Autonomous Medical Robots Need Physics Models, Not Just Foundation Models
UC San Diego professor Michael Yip argues in a Stanford Robotics Seminar that medical robotics must move beyond teleoperation if it is to address healthcare labor shortages. Current surgical robots can improve precision but still depend on a surgeon’s skill, while surgery’s scarce data, deformable tissue, safety constraints, and need for millimeter accuracy make end-to-end learning an inadequate answer on its own. Yip makes the case for a hybrid path: modern perception where it works, explicit physics and control where contact demands it, and humanoid platforms where broader hospital tasks require more general embodiment.
KV Cache Movement Has Become the Core Inference Bottleneck
Stanford’s CS336 lecture on inference, taught by Percy Liang with Tatsunori Hashimoto, argues that serving language models is now a core systems problem rather than an afterthought to training. Liang’s central claim is that autoregressive Transformer generation is sequential and often memory-bound, especially because attention must repeatedly move KV-cache data rather than perform dense, easily parallelized computation. The lecture treats batching, grouped-query and latent attention, quantization, pruning, speculative decoding, continuous batching, and PagedAttention as different attempts to move fewer bytes, reuse memory better, or trade latency for throughput without degrading model quality too much.
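A back-of-the-envelope calculation shows why KV-cache movement dominates. The sketch below counts the bytes needed to store keys and values for every layer, KV head, and token. The model dimensions are hypothetical round numbers chosen for illustration, not figures from the lecture.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch_size,
                   bytes_per_elem=2):
    """Bytes to cache keys and values; the leading 2 counts K and V separately."""
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


GIB = 2 ** 30

# Hypothetical model: 32 layers, 32 KV heads of dimension 128, 16-bit values.
full_attention = kv_cache_bytes(32, 32, 128, seq_len=4096, batch_size=1)

# Same model with grouped-query attention sharing 8 KV heads across query heads.
grouped_query = kv_cache_bytes(32, 8, 128, seq_len=4096, batch_size=1)

print(full_attention / GIB)  # 2.0
print(grouped_query / GIB)   # 0.5
```

Each generated token has to read this cache again, so the total grows with both sequence length and batch size; grouped-query attention and quantization (a smaller `bytes_per_elem`) shrink it directly.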
Reasoning Gains Persist When Models Learn Them During Pretraining
Shrimai Prabhumoye of Mistral AI used a Stanford CS25 seminar to argue that large-language-model pretraining is becoming less a matter of adding tokens and more a question of training strategy. Drawing on studies of curriculum ordering, early reasoning data, and reinforcement as a pretraining objective, she said base models improve when they see broad data before high-quality data, encounter reasoning traces during pretraining rather than only post-training, and are rewarded for intermediate thoughts that improve prediction.
Ultra-Scale Training Depends on Memory Sharding and Communication Overlap
Nouamane Tazi of Hugging Face uses a Stanford CS25 seminar to argue that ultra-scale model training is less a question of adding GPUs than of managing memory, communication, batch size, and hardware topology. His central case is that 5D parallelism—data, tensor, pipeline, context, and expert parallelism—lets training runs span massive clusters only when each axis is chosen for a specific bottleneck. The practical rule, he says, is conservative: shard only as much as the workload requires, because every added parallelism dimension buys scale by spending communication, complexity, or both.
AI Is Moving Venture Capital’s Bottlenecks to Compute, Power, and Policy
Ben Horowitz, co-founder of Andreessen Horowitz, uses a Stanford CS153 lecture with Anjney Midha to argue that venture capital is a systems business whose constraints keep moving. He says a16z was built in 2009 to serve entrepreneurs rather than merely allocate capital, using centralized control, small investment groups, and a deliberately constructed relationship network. In Horowitz’s account, AI has shifted the next bottlenecks toward capital, compute, electricity, policy, moats, and culture, forcing venture firms and startups to redesign around those constraints rather than rely on older software-era assumptions.
AI Is Splitting Product Management Into Builders and Information Movers
In a Stanford CS153 guest lecture, Mike Abbott and Nikhyl Singhal argue that AI is changing product management by eroding the value of roles built around coordination, reporting, and internal information flow. Singhal, founder of Skip and a former product executive at Meta, Google, and Credit Karma, says companies still need product judgment, but increasingly favor hands-on builders who can understand customers, work with technical systems, and make decisions. His broader case is that the product role now depends less on title and process than on company stage, iteration speed, and the ability to build directly.
Luma Is Rebuilding Video AI Around a Unified Multimodal Transformer
In a Stanford CS153 guest lecture, Luma AI co-founder and chief executive Amit Jain argues that generative video is only a staging point toward “unified intelligence”: models that understand and generate across text, images, video, audio, code and tools in a single work loop. Jain traces Luma’s path from Apple-era LiDAR and 3D capture to internet-scale video, saying the company followed the data but now sees prettier clips as insufficient. The destination, he says, is a multimodal AI factory for professional creative and physical work, where human skills, tool use, feedback and unified transformer architectures produce full campaigns, schematics, productions and eventually robotics workflows.