Victoria Lin

Victoria Lin is a research scientist at Thinking Machines Lab focused on native multimodal intelligence. She previously worked at Meta Superintelligence Labs and Salesforce Research, and earned a PhD in computer science from the University of Washington.

Native Multimodal Models Extend LLMs but Still Lack Unified Representations

Victoria Lin of Thinking Machines uses a Stanford CS25 seminar to argue that native multimodal models have extended much of the large-language-model recipe into images, audio, video and action, but have not yet unified multimodal intelligence. In her account, tokenization, Transformers, autoregressive conditioning and scaling transfer only partly: images, video and action require different representations, objectives and sometimes modality-specific parameters. The result, she says, is a field that is moving beyond text-only systems while still relying on text as its strongest abstraction for reasoning.

Stanford Online · Jun 4, 2026 · 19 min read