Shannon’s Entropy Limit Frames Language Models as Text Compressors
Grant Sanderson’s 3Blue1Brown video uses the question of how far English text can be compressed to rebuild Shannon’s definitions of information and entropy. Sanderson argues that prediction and compression are mathematically equivalent: a good language predictor is, in principle, a good text compressor, because an entropy coder can spend about log2(1/p) bits on a character the predictor assigns probability p. Shannon’s estimate of roughly one bit per character for English sets the limit such systems are trying to approach. The result is a narrower version of the slogan “compression is intelligence”: not a definition of intelligence, but an explanation of why compression theory sits so close to modern language-model training.
3Blue1Brown·Jun 7, 2026·13 min read
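The prediction-compression link can be made concrete with a small sketch. It assumes a deliberately simple predictor, an adaptive order-k character model with add-one (Laplace) smoothing, in place of a language model. It then totals the ideal code length, −log2 p(character | context), that an entropy coder such as arithmetic coding could approach. The function name `ideal_code_length_bits`, the `order` parameter, and the rule that the alphabet is known in advance are illustrative assumptions, not code from the video.

```python
import math
from collections import defaultdict


def ideal_code_length_bits(text: str, order: int = 2) -> float:
    """Total bits an ideal entropy coder would spend on `text` when driven by
    a hypothetical adaptive order-`order` character model with add-one smoothing.

    Each character costs -log2 p(char | previous `order` chars). Arithmetic
    coding can get within a couple of bits of this total (ignoring
    finite-precision overhead).
    """
    # Assumption: encoder and decoder agree on the alphabet beforehand.
    alphabet = sorted(set(text))
    k = len(alphabet)
    counts = defaultdict(lambda: defaultdict(int))  # context -> char -> count
    totals = defaultdict(int)  # context -> number of chars seen after it
    bits = 0.0
    for i, ch in enumerate(text):
        ctx = text[max(0, i - order):i]  # shorter context near the start
        # Predict *before* updating, exactly as a decoder would have to.
        p = (counts[ctx][ch] + 1) / (totals[ctx] + k)
        bits += -math.log2(p)
        counts[ctx][ch] += 1
        totals[ctx] += 1
    return bits


# Usage: a better predictor (longer context) means fewer bits per character.
sample = "the quick brown fox jumps over the lazy dog. " * 20
for order in (0, 1, 2):
    bpc = ideal_code_length_bits(sample, order) / len(sample)
    print(f"order {order}: {bpc:.3f} bits/char")
```

Replacing the count table with a neural network changes how p is computed, not the accounting: when measured in bits and averaged per character, a language model’s cross-entropy training loss is this same code length.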