Percy Liang

Professor of Computer Science at Stanford University and Senior Fellow at Stanford HAI, focused on foundation models, language models, machine learning, and natural language processing. He directs Stanford’s Center for Research on Foundation Models and teaches CS336: Language Modeling from Scratch.

Vision-Language Models Understand Multimodal Inputs but Still Generate Text

Stanford’s CS336 lecture on alignment and multimodality, led by Percy Liang with Tatsunori Hashimoto, argues that the core problem in vision-language systems is still how to turn non-text data into tokens a Transformer can use. The lecture traces the field from CLIP and SigLIP through LLaVA and Qwen, presenting modern VLMs as largely built around a stable template: a vision encoder, an adapter, and a pretrained language model that generates text. Liang’s larger point is that these systems are powerful multimodal input models, but not true omni models; representing images and video without losing fine detail remains the central technical constraint.

Stanford OnlineJun 4, 202622 min read

Language-Model Data Pipelines Decide What Models Can Learn

Stanford’s CS336 lecture on data, taught by Percy Liang and Tatsunori Hashimoto, argues that language-model performance is shaped as much by corpus construction as by training itself. The lecture treats transformation, filtering, deduplication, source mixing and synthetic post-training data as engineering decisions that define what the model sees, how often it sees it and which compute is wasted. Its recurring point is that scalable algorithms are necessary, but the decisive choices still come from inspecting concrete data and deciding what “quality” means for the model being built.

Stanford OnlineMay 27, 202620 min read

AI Evaluation Benchmarks Measure Different Questions, Not One Scoreboard

Stanford’s CS336 lecture on evaluation, led by Percy Liang with sections from Tatsunori Hashimoto, argues that model evaluation is not a single scoreboard but a choice about what behavior is being measured and for what purpose. The lecture treats perplexity, exam benchmarks, chat preferences, agent tasks, reasoning puzzles, safety tests and realistic professional evaluations as different instruments with different failure modes. Its central claim is procedural: before reading or designing a benchmark, define the object being evaluated, the use case it serves and the trade-offs among difficulty, realism and validity.

Stanford OnlineMay 20, 202619 min read

KV Cache Movement Has Become the Core Inference Bottleneck

Stanford’s CS336 lecture on inference, taught by Percy Liang with Tatsunori Hashimoto, argues that serving language models is now a core systems problem rather than an afterthought to training. Liang’s central claim is that autoregressive Transformer generation is sequential and often memory-bound, especially because attention must repeatedly move KV-cache data rather than perform dense, easily parallelized computation. The lecture treats batching, grouped-query and latent attention, quantization, pruning, speculative decoding, continuous batching, and PagedAttention as different attempts to move fewer bytes, reuse memory better, or trade latency for throughput without degrading model quality too much.

Stanford OnlineMay 12, 202617 min read