Orply.

Tatsunori Hashimoto

Assistant Professor of Computer Science at Stanford University and instructor for Stanford CS336, focused on machine learning, natural language processing, language models, and robust/trustworthy AI systems.

Vision-Language Models Understand Multimodal Inputs but Still Generate Text

Stanford’s CS336 lecture on alignment and multimodality, led by Percy Liang with Tatsunori Hashimoto, argues that the core problem in vision-language systems is still how to turn non-text data into tokens a Transformer can use. The lecture traces the field from CLIP and SigLIP through LLaVA and Qwen, presenting modern VLMs as largely built around a stable template: a vision encoder, an adapter, and a pretrained language model that generates text. Liang’s larger point is that these systems are powerful multimodal input models, but not true omni models; representing images and video without losing fine detail remains the central technical constraint.

Stanford Online · Jun 4, 2026 · 22 min read

Model Behavior Depends More on Post-Training Data Than Algorithms

Stanford computer scientist Tatsunori Hashimoto’s CS336 lecture argues that post-training is less a matter of exotic algorithms than of choosing the data and feedback that turn a broadly capable pretrained model into a controllable product. He presents supervised fine-tuning as a way to extract behaviors already latent in pretraining, and RLHF as preference optimization whose results depend heavily on annotators, reward models, safety data and evaluation incentives. The lecture’s central warning is that style, refusals, hallucination, and reward hacking are not side issues; they are consequences of the data pipeline that shapes what users actually see.

Stanford Online · May 27, 2026 · 23 min read

Language-Model Data Pipelines Decide What Models Can Learn

Stanford’s CS336 lecture on data, taught by Percy Liang and Tatsunori Hashimoto, argues that language-model performance is shaped as much by corpus construction as by training itself. The lecture treats transformation, filtering, deduplication, source mixing and synthetic post-training data as engineering decisions that define what the model sees, how often it sees it and which compute is wasted. Its recurring point is that scalable algorithms are necessary, but the decisive choices still come from inspecting concrete data and deciding what “quality” means for the model being built.

Stanford Online · May 27, 2026 · 20 min read

RLVR Moves Post-Training From Human Preferences to Checkable Rewards

Stanford computer scientist Tatsunori Hashimoto presents reinforcement learning from verifiable rewards (RLVR) as the current practical route beyond RLHF for reasoning models, especially in math, coding and software-agent settings. His argument is that RLVR works because it replaces learned preference proxies with rewards that can be checked more directly, but that the reward remains the bottleneck. GRPO and related methods made the recipe simpler to run, while systems such as DeepSeek R1, Kimi k1.5 and Qwen show both the gains and the ways ostensibly verifiable rewards can still be gamed.

Stanford Online · May 27, 2026 · 20 min read

AI Evaluation Benchmarks Measure Different Questions, Not One Scoreboard

Stanford’s CS336 lecture on evaluation, led by Percy Liang with sections from Tatsunori Hashimoto, argues that model evaluation is not a single scoreboard but a choice about what behavior is being measured and for what purpose. The lecture treats perplexity, exam benchmarks, chat preferences, agent tasks, reasoning puzzles, safety tests and realistic professional evaluations as different instruments with different failure modes. Its central claim is procedural: before reading or designing a benchmark, define the object being evaluated, the use case it serves and the trade-offs among difficulty, realism and validity.

Stanford Online · May 20, 2026 · 19 min read

Models Are Trained on Curated Corpora, Not the Internet

Stanford CS336’s data lecture, taught by Tatsunori Hashimoto, argues that training data is both the most consequential and least transparent part of modern language models. Hashimoto says models are not trained on “the internet” in any simple sense, but on static corpora shaped by crawlers, access limits, licensing, copyright risk, filtering, deduplication and conversion choices. The lecture’s central claim is that data construction is a legal and operational pipeline, not a passive input, and that those choices materially distinguish otherwise similar models.

Stanford Online · May 20, 2026 · 22 min read

Language Model Scaling Depends on Controlling Hyperparameter Drift

Stanford’s CS336 scaling-laws lecture, taught by Tatsunori Hashimoto, argues that modern language-model scaling is less about accepting a single Chinchilla-style rule than about controlling which training choices drift with size. Hashimoto presents scaling laws as useful empirical tools for choosing model/data tradeoffs, learning rates, batch sizes, sparsity, optimizers, and architectures, but repeatedly cautions that their transfer depends on the regime that produced them. Techniques such as µP and WSD schedules can reduce some uncertainty, he says, while data mixtures, optimizer details, weight decay, architecture changes, and post-training can still break clean extrapolations.

Stanford Online · May 19, 2026 · 19 min read

KV Cache Movement Has Become the Core Inference Bottleneck

Stanford’s CS336 lecture on inference, taught by Percy Liang with Tatsunori Hashimoto, argues that serving language models is now a core systems problem rather than an afterthought to training. Liang’s central claim is that autoregressive Transformer generation is sequential and often memory-bound, especially because attention must repeatedly move KV-cache data rather than perform dense, easily parallelized computation. The lecture treats batching, grouped-query and latent attention, quantization, pruning, speculative decoding, continuous batching, and PagedAttention as different attempts to move fewer bytes, reuse memory better, or trade latency for throughput without degrading model quality too much.

Stanford Online · May 12, 2026 · 17 min read