Shervine Amidi

Shervine Amidi is an adjunct lecturer at Stanford University’s Institute for Computational and Mathematical Engineering, where he co-teaches CME 295 on transformers and large language models and CME 296 on diffusion and large vision models. He is an AI and machine learning educator who trained at Stanford and École Centrale Paris.

Open Image Models Converge on Flow Matching and DiT Architectures

Stanford adjunct lecturer Shervine Amidi uses Lecture 8 of CME 296 to argue that modern visual generation is best understood as a stack of choices for transporting noise into data: the paradigm, representation, architecture, training procedure, and evaluation method. He presents flow matching as the current default for image-generation systems, diffusion transformers as the dominant architectural direction, and latent spaces as a practical compression tradeoff now being challenged by scaled pixel-space models.

Stanford Online · Jun 1, 2026 · 23 min read

Text-to-Image Evaluation Requires Metrics Matched to Specific Failure Modes

Stanford adjunct lecturers Afshine Amidi and Shervine Amidi argue that evaluating text-to-image models starts with separating aesthetic quality from prompt adherence, then choosing metrics suited to the failure being tested. In Lecture 7 of Stanford’s CME 296 course on diffusion and large vision models, they treat human ratings, FID, CLIPScore, reference-based measures, multimodal judges, and benchmarks as imperfect instruments, none of which can stand in for a universal image-quality score. Their central warning is practical: automated and qualitative evaluations can be useful, but only when their assumptions, calibration, and failure modes are made explicit.

Stanford Online · May 28, 2026 · 19 min read

Text-to-Image Training Is Becoming a Problem of Signal Allocation

Stanford adjunct lecturers Shervine Amidi and Afshine Amidi present text-to-image model training as a problem of allocating scarce learning signal across the full model lifecycle, not simply choosing a diffusion or flow-matching loss. In Lecture 6 of Stanford’s CME 296 course, they argue that practical training depends on emphasizing hard timesteps, adjusting for resolution, and using data curricula and representation alignment, followed by post-training, personalization, and distillation methods that improve control and reduce inference cost.

Stanford Online · May 19, 2026 · 21 min read