Ultra-Scale Training Depends on Memory Sharding and Communication Overlap
Nouamane Tazi of Hugging Face uses a Stanford CS25 seminar to argue that ultra-scale model training is less a question of adding GPUs than of managing memory, communication, batch size, and hardware topology. His central case is that 5D parallelism (data, tensor, pipeline, context, and expert) lets training runs span massive clusters only when each axis is chosen for a specific bottleneck. The practical rule, he says, is conservative: shard only as much as the workload requires, because every added parallelism dimension buys scale by spending communication, complexity, or both.
Stanford Online · May 11, 2026 · 18 min read
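To make the five axes concrete, here is a minimal Python sketch of how a 5D parallelism configuration multiplies out to a cluster size and what it implies for per-GPU weight memory. The class, field names, and the simplified memory model are illustrative assumptions for this summary, not code from the talk or any Hugging Face API.

```python
from dataclasses import dataclass


@dataclass
class ParallelismConfig:
    """Degrees for each of the five parallelism axes (hypothetical names)."""
    dp: int = 1  # data parallelism: replicates the model, splits the batch
    tp: int = 1  # tensor parallelism: splits individual weight matrices
    pp: int = 1  # pipeline parallelism: splits the model by layer
    cp: int = 1  # context parallelism: splits the sequence dimension
    ep: int = 1  # expert parallelism: spreads MoE experts across ranks

    def world_size(self) -> int:
        # Every axis multiplies the GPU count; the product must match the cluster.
        return self.dp * self.tp * self.pp * self.cp * self.ep


def params_per_gpu(total_params: float, cfg: ParallelismConfig) -> float:
    """Rough per-GPU parameter count under a simplified memory model.

    Assumption: tensor, pipeline, and expert parallelism shard the weights,
    while plain data and context parallelism replicate them. Real frameworks
    (e.g. ZeRO-style sharded data parallelism) can behave differently.
    """
    return total_params / (cfg.tp * cfg.pp * cfg.ep)


if __name__ == "__main__":
    cfg = ParallelismConfig(dp=8, tp=4, pp=4, cp=2, ep=1)
    print(f"world size: {cfg.world_size()} GPUs")  # 8*4*4*2*1 = 256
    print(f"params/GPU for a 70B model: {params_per_gpu(70e9, cfg) / 1e9:.1f}B")
```

The sketch encodes the talk's conservative rule in miniature: only some axes cut per-GPU weight memory, while the others (absent ZeRO-style sharding) mostly buy throughput, so each added dimension should answer a specific bottleneck rather than be turned on by default.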