AI Progress Is Being Bought With Data, Not Sample Efficiency

Dwarkesh Patel · Friday, June 19, 2026 · 8 min read

Dwarkesh Patel argues that recent AI progress is driven less by clear gains in sample efficiency than by an immense expansion of training data, including synthetic rollouts and highly specific human expert examples. In his account, frontier models can display broad professional competence because labs keep pushing more tasks into the training distribution, not because the systems learn new domains the way humans do. Patel says this data-heavy approach may still be commercially powerful when capabilities can be amortized across billions of uses, but it leaves unresolved whether current systems can solve their own sample-efficiency problem.

AI progress is being bought with data, not obvious gains in sample efficiency

Dwarkesh Patel defines one useful sense of intelligence as sample efficiency: how much data an agent needs in a domain before it can operate fluently and competently. By that definition, he says, recent AI progress does not clearly show major gains in training sample efficiency. The dominant improvement has come from widening and improving the data distribution, then spending the compute needed to create or identify the right data.

Reinforcement learning is central to that story. Patel describes RL as a form of synthetic data generation: compute is poured into a verifier, rubric, or LLM judge to discover which rollouts are good, and then the model is trained to predict those correct rollouts much as it is trained to predict internet text. This only works if the model already assigns some prior probability to the correct solution. That is why, in Patel’s account, frontier systems need “mind-stretching amounts” of human expert trajectories in every field and skill where competence is desired.
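The framing of RL as synthetic data generation can be illustrated with a minimal rejection-sampling sketch. It is a toy under stated assumptions, not any lab's pipeline: the names `generate_synthetic_data`, `make_toy_policy`, and `verify_sum` are hypothetical, the "policy" is a stand-in that answers addition problems correctly with probability `p_correct`, and simply filtering rollouts through a verifier (closer to expert iteration than to policy-gradient RL) stands in for the full training loop. It does capture the prior-probability point: a policy that never samples the right answer produces no training data at all.

```python
import random
from typing import Callable, List, Tuple

Prompt = Tuple[int, int]
Sampler = Callable[[Prompt, random.Random], int]


def make_toy_policy(p_correct: float) -> Sampler:
    """Hypothetical stand-in for a model: answers a + b correctly with probability p_correct."""
    def sample(prompt: Prompt, rng: random.Random) -> int:
        a, b = prompt
        if rng.random() < p_correct:
            return a + b
        return a + b + rng.randint(1, 9)  # a guaranteed-wrong answer
    return sample


def verify_sum(prompt: Prompt, answer: int) -> bool:
    """Verifier: checks a rollout against the known correct answer."""
    return answer == sum(prompt)


def generate_synthetic_data(
    prompts: List[Prompt],
    sample: Sampler,
    verify: Callable[[Prompt, int], bool],
    k: int = 16,
    seed: int = 0,
) -> List[Tuple[Prompt, int]]:
    """Spend compute on k rollouts per prompt and keep only those the verifier accepts.

    The kept (prompt, answer) pairs would then serve as next-token-prediction
    targets, the same way internet text does.
    """
    rng = random.Random(seed)
    dataset: List[Tuple[Prompt, int]] = []
    for prompt in prompts:
        rollouts = [sample(prompt, rng) for _ in range(k)]
        dataset.extend((prompt, r) for r in rollouts if verify(prompt, r))
    return dataset


prompts = [(2, 3), (10, 7), (41, 1)]
weak = generate_synthetic_data(prompts, make_toy_policy(0.0), verify_sum)
ok = generate_synthetic_data(prompts, make_toy_policy(0.3), verify_sum)
print(len(weak), len(ok))  # first count is always 0: no prior probability, no data
```

Raising `k` is the compute lever: it buys more verified examples per prompt, but only when `p_correct` is above zero.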

The human data is not generic. Patel points to listings on Mercor and Surge: Word specialists converting legacy documents into polished files, legal experts writing realistic M&A diligence or securities filings, and management consultants producing market-research templates. These are narrow, professional skills translated into examples, rubrics, and explanations of reasoning.

The scale matters as much as the specificity. Patel says each skill corresponds to at least hundreds of human experts producing completions, rubrics, and chain-of-thought explanations. He connects that to the size of the data industry serving AI labs, which he says is already earning billions of dollars annually and is moving toward “decabillions.”

The correct way to think about these models is not like a human who has learned all these different skills that you see these models displaying. It's more like a Frankenstein's monster which has been built out of a billion graphs of carefully constructed examples all sewn together.

Dwarkesh Patel

The comparison to human learning is deliberately jarring. Imagine needing decades of courses, hundreds of concurrent professors, and millions of practice tasks just to learn how to polish a Word file. Even that understates the gap, Patel says, because models generate many more attempts per task. With GRPO (Group Relative Policy Optimization), he says, models produce hundreds to thousands of rollouts per task to solve the credit-assignment problem.
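GRPO, introduced in the DeepSeekMath paper (Shao et al., 2024), replaces a learned value baseline with statistics computed over a group of rollouts sampled for the same prompt. Below is a minimal sketch of that normalization step, assuming binary 0/1 rewards; the function names are hypothetical, and the full algorithm's clipped policy-ratio objective and KL penalty are omitted. The second helper is an illustrative assumption rather than Patel's argument: it shows one reason hard tasks need large groups, since a group in which every rollout fails yields all-zero advantages and therefore no learning signal.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: each rollout's reward is normalized against its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std here; implementations vary on this detail
    return [(r - mu) / (sigma + eps) for r in rewards]


def prob_group_has_signal(p_success: float, group_size: int) -> float:
    """Chance that at least one of group_size rollouts succeeds under a 0/1 reward.

    If every rollout receives the same reward, all advantages are zero and
    the group contributes nothing to the update.
    """
    return 1.0 - (1.0 - p_success) ** group_size


print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # roughly [1, -1, -1, 1]
print(group_relative_advantages([0.0] * 8))             # all zeros: no signal
for g in (8, 64, 512):
    print(g, round(prob_group_has_signal(0.01, g), 3))
```

For a task the model solves 1% of the time, a group of 8 rollouts usually contains no success at all, while a group of several hundred almost always does.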

A four-month open-model lag points to data as the transferable advantage

Dwarkesh Patel cites an Epoch chart reporting that open models lag state-of-the-art closed models by about four months. The chart compares open-weight and closed-weight models over time on a “Smart Capabilities Index score,” with examples including GPT-4, Claude 3.5 Sonnet, GPT-4o, Claude 3 Opus, Gemini 1.5 Pro, Llama 3 400B, and Llama 3.1 405B.

4 months: the reported open-model lag behind state-of-the-art closed models in the Epoch chart Patel cites.

The visual point is that the open-weight frontier is not years behind the closed-weight frontier on this capabilities index. Patel treats that lag as evidence about what is transferable. If the closed-model advantage came mostly from hidden training tricks, hyperparameter recipes, or architectural optimizations, open-weight developers and other lagging competitors should have a much harder time catching up, because those advantages are not visible from the outside. Data, by contrast, can be extracted from closed models through their public APIs and distilled into other models. In Patel's view, the observed speed of catch-up is evidence that the transferable part of frontier progress is the data distribution more than a secret algorithmic recipe.

The resulting picture is not that models are small or simple. It is that their visible capabilities are held together by an immense, mostly invisible mass of training data. Patel’s image is of AIs as “a galaxy glittering with capabilities,” with “an unimaginably massive black hole of data” at the center.

The gap between human and model learning is orders of magnitude wide

Dwarkesh Patel gives three comparisons to make the scale concrete.

First, if a person sees and hears about 2,000 words per hour, then from birth to adulthood they encounter about 200 million tokens. Frontier models, by contrast, are trained on tens to hundreds of trillions of tokens, a difference approaching a million-fold.
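The 200-million figure can be reproduced with simple arithmetic. This sketch assumes 18 years to adulthood, 16 waking hours a day, and one token per word; the last is a simplification, since common subword tokenizers produce somewhat more than one token per English word, which would not change the order of magnitude. The model-side numbers are just the endpoints of the "tens to hundreds of trillions" range, not figures for any specific model.

```python
WORDS_PER_HOUR = 2_000        # Patel's figure
YEARS_TO_ADULTHOOD = 18       # assumption
WAKING_HOURS_PER_DAY = 16     # assumption
TOKENS_PER_WORD = 1.0         # assumption: one word counted as one token


def lifetime_tokens(
    words_per_hour: float = WORDS_PER_HOUR,
    years: int = YEARS_TO_ADULTHOOD,
    waking_hours: int = WAKING_HOURS_PER_DAY,
    tokens_per_word: float = TOKENS_PER_WORD,
) -> float:
    """Language tokens a person encounters up to a given age under the assumptions above."""
    hours = years * 365 * waking_hours
    return words_per_hour * hours * tokens_per_word


human = lifetime_tokens()
print(f"human: {human:.2e} tokens")  # about 2.1e8
for model_tokens in (10e12, 100e12):  # ends of "tens to hundreds of trillions"
    print(f"{model_tokens:.0e} model tokens -> {model_tokens / human:,.0f}x")
```

At 100 trillion tokens the ratio is already roughly half a million; token budgets in the hundreds of trillions push it toward the million-fold figure.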

~200 million: tokens Patel estimates a person sees and hears from birth to adulthood.

Second, humans can learn to teleoperate a humanoid robot or robot arm within hours. If AI systems could learn robotics skills that quickly, Patel says, robotics would become a “decatrillion-dollar industry,” with large numbers of Unitree G1-style robots doing useful work. The obstacle, in his account, is that AI systems learn much less efficiently; even millions of hours of demonstrations have not been enough for complex, open-ended robotics tasks.

Third, a teenager can learn to drive with about 20 hours of practice. Even if one credits the teenager’s prior 16 years of world-learning and physical intuition, Patel says that is still three to four orders of magnitude less data than Waymo and Tesla are using to train self-driving models.

These comparisons are not meant to deny that models can become useful. They isolate a narrower deficit: current systems may become capable after massive exposure, but they do not learn new domains the way humans do.

Evolution, multimodal experience, and scale do not erase the sample-efficiency problem

Dwarkesh Patel addresses three common objections to the human-versus-model comparison.

The first objection is evolution. Humans are not really learning from scratch, the argument goes: billions of years of evolution effectively “pre-trained” us, so comparing lifetime human experience to randomly initialized LLM training is unfair. Patel rejects this as the wrong analogy. The genome is only about three gigabytes, he says, and only one to two percent of it is protein-coding, which is not enough space to store the parameters of a pre-trained neural network. His preferred analogy is that evolution found useful hyperparameters and loss functions, while the brain still builds up its connectome during life — the closer analogue to neural-network weights.

And even if the evolution comparison were granted, it would not explain why each new capability added after pretraining still takes so much data. Once educated, a human does not need a hundred professors to learn a new programming language. AI systems, Patel says, continue to require enormous additional data for each new skill.

The second objection is multimodal experience. A language-token comparison omits the sensory stream people receive from birth to adulthood, which Patel says might amount to tens to hundreds of billions of tokens. His response is that blind and deaf people still have general intelligence despite being cut off from large portions of that sensory stream. Deaf people, he adds, may ingest fewer language tokens than his rough 200-million-token estimate, relying on sign language and reading. That suggests to him that the million-fold gap may be an understatement rather than an overstatement.

The third objection is scale. Scaling laws suggest larger models are more sample efficient; the human brain has about 100 trillion synapses, while Patel says current frontier models are around five trillion parameters. Perhaps one or two orders of magnitude more model size would close the gap.

Patel argues that this misunderstands the scaling-law equations. He presents loss as an irreducible term E plus independent terms for parameter count N and training tokens D:

$$L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}$$

Adding parameters only shrinks the A/N^α term, which can at most fall to zero; the data term B/D^β still determines how much data a given loss requires. Using constants from the Chinchilla scaling-law paper, Patel says that even increasing the number of parameters to infinity would reduce the amount of data needed for the same loss by only a factor of 10. Humans, in his estimate, are thousands to millions of times more sample efficient than current models. Scaling current architectures, therefore, cannot by itself bridge the discrepancy. His conclusion is that humans appear to sit on a different scaling curve altogether.
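The ceiling on what parameters alone can buy can be checked numerically. This is a minimal sketch, assuming the fitted constants reported by Hoffmann et al. (2022) for the Chinchilla loss (E = 1.69, A = 406.4, B = 410.7, α = 0.34, β = 0.28), a baseline model trained compute-optimally under C = 6ND, and an arbitrary budget of 10^24 FLOPs. It is one way to reproduce the order of magnitude, not Patel's own calculation: it records the loss of a compute-optimal model, then asks how many tokens an infinitely large model would need to match it.

```python
import math

# Fitted constants reported by Hoffmann et al. (2022), Chinchilla paper.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def loss(n_params: float, n_tokens: float) -> float:
    """L(N, D) = E + A / N^alpha + B / D^beta."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def tokens_for_loss(target: float, n_params: float = math.inf) -> float:
    """Invert the data term: tokens D needed to reach `target` at model size N."""
    param_term = 0.0 if math.isinf(n_params) else A / n_params**ALPHA
    data_term = target - E - param_term
    if data_term <= 0:
        raise ValueError("target loss is unreachable at this model size")
    return (B / data_term) ** (1 / BETA)


def compute_optimal(flops: float) -> tuple:
    """Compute-optimal (N, D) for a budget C = 6ND, from minimizing L under that constraint."""
    g = (ALPHA * A / (BETA * B)) ** (1 / (ALPHA + BETA))
    n = g * (flops / 6) ** (BETA / (ALPHA + BETA))
    d = (flops / 6) ** (ALPHA / (ALPHA + BETA)) / g
    return n, d


n, d = compute_optimal(1e24)  # illustrative budget (assumption)
target = loss(n, d)
savings = d / tokens_for_loss(target)  # same loss, infinitely many parameters
print(f"N={n:.2e}, D={d:.2e}, loss={target:.3f}, data saved: {savings:.1f}x")
print(f"closed form: {(1 + BETA / ALPHA) ** (1 / BETA):.1f}x")  # about 8.5x
```

Under these assumptions the saving is about 8.5-fold whatever the budget, because at the compute-optimal point the parameter term is a fixed fraction β/α of the data term, so the reduction is (1 + β/α)^(1/β). A baseline with more parameters per token would see a smaller saving, and one with fewer a larger one.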

Sample inefficiency may still be commercially acceptable

The practical question is whether sample efficiency is necessary for the major goals of AI labs: automating white-collar work and automating AI research.

For white-collar work, the labs’ bet is that common tasks really are common. Software engineers, analysts, and accountants perform recurring kinds of work that can be brought into the training distribution. Dwarkesh Patel says recent revenue curves suggest enormous value from doing exactly that, even if AI systems do not replicate the special features of human learning.

The commercial escape hatch is amortization. Training an AI system may be much less efficient than training a person, but an AI can absorb “gigawatts of training” and then apply what it learns across billions of sessions. Patel gives the counterfactual of a human software engineer who could only become competent after reading every public GitHub repository: the training would take so long that it would make no economic sense, and the trained person could still work on only one project at a time. A model’s inefficient training can still pay off if the resulting capability is reused at massive scale.

The unresolved issue is how much out-of-distribution thinking different jobs require. Some jobs are mechanical enough that they were automated before modern AI, with bank tellers and travel agents as Patel’s examples. Other jobs require daily work on problems far from the training distribution. Patel identifies software engineering as likely belonging to this latter category, even though it is often treated as the first profession AI will take. He says he would bet there will be more demand for human software engineers in 2027 than there is now, largely because AI will act as a complementary input.

For jobs that require more novel thinking, the labs’ plan is first to automate AI research, then use automated AI researchers to solve the sample-efficiency problem itself. That leaves the harder question: can AI systems that lack human-level sample efficiency nonetheless solve the research problems required for human-like intelligence and learning?

Patel does not answer that question here. He says current thinking about an intelligence explosion is clumsy: people either dismiss AI-accelerated AI progress entirely or assume “some kind of God pops out the other end.” What is missing, in his view, is careful reasoning about a period of much faster AI progress built on top of LLMs and the particular kind of intelligence LLMs actually have.
