
Data Scarcity, Not Compute, Is the Next AI Bottleneck

Asher Spector · Ben Spector · Sequoia Capital · Thursday, May 7, 2026 · 6 min read

At AI Ascent 2026, Flapping Airplanes co-founders Ben and Asher Spector argued that data scarcity, more than compute alone, will determine where AI can create value next. They said the biggest gains so far have come in unusually data-rich domains such as search and coding, while much of the economy — including robotics, trading, science and narrow industrial workflows — lacks comparable datasets. Their proposed answer is to make models far more data-efficient by developing new GPU-level primitives that current frameworks such as PyTorch make hard to express.

The constraint is not just model scale

Data scarcity, not just compute or model scale, may determine where AI creates value next. Ben Spector and Asher Spector built Flapping Airplanes’ argument around that bottleneck and their proposed remedy for it: AI systems that need far less data to reach valuable capability.

Asher noted that the company’s name had caused confusion since its launch three months earlier, drawing inbound from the aviation industry about runways, airplane parts, wind tunnels, and headquarters. His correction was that Flapping Airplanes is not an airplane company, but an AI lab.

Ben framed the current generation of large language models as very strong on a narrow pair of economically large, unusually data-rich tasks: search and coding. Search can draw on “basically the entire internet,” he said. Coding is “a big fraction of the internet,” and is also friendly to synthetic data generation. In the data Ben showed, search was labeled as a $200 billion task and coding as a $1 trillion task, while recent and frontier training scales ranged from single-digit trillions of tokens to the order of 100 trillion tokens.

The question Ben posed was whether comparable capability can be reached with much less data. He pointed to human learning as a comparison: people can become good at coding with “maybe 10,000 times or 100,000 times less data” than current models consume. That comparison set up the central thesis: the next valuable AI frontier may depend less on finding more data-rich tasks and more on making models work in data-poor settings.
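To make that gap concrete, here is a rough back-of-envelope reading of the figures quoted above (the token counts below are extrapolations for illustration, not numbers shown in the talk): if a frontier run consumes on the order of 100 trillion tokens, human-like data efficiency would mean reaching comparable capability from roughly 1 to 10 billion tokens.

```latex
% Back-of-envelope arithmetic, extrapolated from the figures quoted above.
\frac{10^{14}\ \text{tokens}}{10^{4}} = 10^{10}\ \text{tokens} \approx 10\ \text{billion},
\qquad
\frac{10^{14}\ \text{tokens}}{10^{5}} = 10^{9}\ \text{tokens} \approx 1\ \text{billion}.
```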

Domain | Value shown | Data condition in the talk
Search | $200B | High data
Coding | $1T | High data
Trading | $3T | Limited financial data
Robotics | $15T | Requires difficult data generation
Scientific discovery | Unbounded | Very little data
End-to-end toaster supply chain | $3B | Representative of the long tail

Ben Spector contrasted data-rich AI tasks with lower-data domains that make up much of the broader economy.

The toaster supply chain example was deliberately comic, but Ben used it to make a serious point. The economy is not made only of search and coding. It is made of “tens of thousands” of narrower operational domains, many of which do not have internet-scale datasets waiting to be scraped or synthesized.

Why compute is easier to buy than data

Asher Spector gave a second reason data efficiency matters: compute is easier to scale than data. FLOPs get cheaper over time, and data is also getting cheaper, he said, but “probably” not as quickly as compute.

Asher contrasted the relative homogeneity of compute with the fragmentation of frontier-quality data. The visual he used, attributed to Epoch AI, showed deployed FLOPS rising across 2023, 2024, and 2025, segmented by companies including Nvidia, Huawei, Amazon, AMD, and Google. Against that, he listed many kinds of data sources: web crawls, books, code, news, medical, legal, finance, video, satellite, sensors, government, enterprise, patents, scientific, and clinical.

His point was operational. The compute market is “more homogeneous” than the data market. He relayed an anecdote that after GPT launched, “Greg” said they could try to buy all the compute. There is no equivalent centralized purveyor of frontier data. To collect long-tail domain data, companies may have to negotiate with businesses, handle terms of use, and deal with regulations.


Asher stated the implication directly: if a model is a thousand times more data-efficient, “it’d be a thousand times easier to deploy.” The recap later sharpened that into the line that a model “1000x more data-efficient is 1000x easier to deploy into the economy.”

The third reason was about market structure. Asher argued that relatively few companies can train AI models today, partly because of compute centralization and partly because of data centralization. He said he had heard of new labs trying to create capabilities by buying distressed bookstores and visiting rare libraries to find niche data needed for frontier models.

If data is the moat, he argued, data efficiency changes who can compete. Earlier at the event, according to Asher, an audience poll had identified data as the most common answer to what the AI moat is. On that premise, data efficiency is not just a technical optimization. It “modulates who can actually participate in which parts of the AI economy.”

If you care about the shape of the world to come, I think you really should care about data efficiency.

Asher Spector

Their proposed path runs through systems, not just algorithms

Asher Spector said Flapping Airplanes’ goal is to design data-efficient AI, but he did not describe the company’s algorithms, calling them core IP. What he did disclose was the systems strategy the company believes can unlock those algorithms: new primitives for interacting with hardware.

The company’s central technical diagram separated “what GPUs can do efficiently” from the smaller set of “what current frameworks can do efficiently.” The larger circle represented the hardware’s efficient capability; the smaller circle represented what frameworks such as PyTorch make easy to express. Asher’s claim was that much existing research lives inside the smaller circle, because today’s frameworks make some operations natural and others difficult. Flapping Airplanes is looking in the gap: operations GPUs can perform efficiently but current frameworks do not make easy to use.

He gave “fine-grainedness” as an example of something difficult under current frameworks but possible on GPUs. He also pointed to history as support for the bet. A visual listed DistBelief in 2011, Transformers in 2017, and FlashAttention in 2022 as examples where machine learning progress involved new primitives for interacting with hardware. His distinction was that this does not necessarily require designing new chips; it can mean extracting more from hardware that already exists.

Ben Spector connected that thesis to his own background. During his Stanford PhD, he said, he worked on early Megakernels research, “trying to make GPUs do weird stuff.” At Flapping Airplanes, he said the team is going further in “spiritually similar directions,” trying to “abuse the crap out of GPUs in ways that haven’t been done before.”

Ben then described the programming-model problem. Current machine-learning frameworks are easy to use because they synthesize a single-threaded programming model on top of massively parallel processors. A programmer writes a sequence like Matmul, Attention, Matmul, RMS Norm; under the hood, the system dispatches work across parallel hardware.
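As a minimal sketch of that programming model (the shapes and weights below are invented for illustration, not taken from the talk), each line names one large operation, and the framework dispatches it as parallel work on the accelerator:

```python
# Minimal sketch of the single-threaded programming model described above.
# Shapes and weights are illustrative placeholders, not from the talk.
import torch
import torch.nn.functional as F

d_model, n_tokens = 1024, 512
x = torch.randn(1, n_tokens, d_model)
w1 = torch.randn(d_model, d_model)
w2 = torch.randn(d_model, d_model)

# The programmer writes one op after another; each call is dispatched
# under the hood as massively parallel work on the hardware.
h = x @ w1                                    # Matmul
h = F.scaled_dot_product_attention(h, h, h)   # Attention
h = h @ w2                                    # Matmul
h = h / h.pow(2).mean(-1, keepdim=True).add(1e-6).sqrt()  # RMS Norm, by hand
```

The code reads like a straight-line, single-threaded program even though each operation fans out across thousands of GPU threads.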

But Ben argued that this model breaks down for more irregular computations. The examples he showed had increasingly sparse and complex routing patterns between operations. Those patterns, he said, are “not easily expressed in current frameworks.”
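The talk did not show code for those patterns, but a toy sketch (invented here for illustration) conveys the flavor of the problem: once routing between operations depends on the data, the neat batched sequence above turns into host-side control flow over small, uneven pieces of work.

```python
# Toy illustration (not from the talk) of data-dependent routing that the
# batched, single-threaded model above does not express cleanly.
import torch

n_tokens, d_model, n_experts = 512, 256, 4
x = torch.randn(n_tokens, d_model)
experts = [torch.randn(d_model, d_model) for _ in range(n_experts)]

# Which expert each token visits depends on the data itself.
route = (x.sum(dim=-1).abs() * n_experts).long() % n_experts

out = torch.empty_like(x)
for e in range(n_experts):
    idx = (route == e).nonzero(as_tuple=True)[0]
    if idx.numel() == 0:
        continue
    # Each expert sees an unpredictable, uneven slice of the tokens, so the
    # work decomposes into many small matmuls plus gathers and scatters
    # rather than one dense, regular operation.
    out[idx] = x[idx] @ experts[e]
```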

He then showed what he called a teaser of Flapping Airplanes’ internal framework: a virtual machine that “takes over the whole GPU” so the team can run its own execution model. He emphasized that the displayed trace was not a real workload, but a stylized example of something asymptotically inefficient to run in PyTorch: “a very small batch, kind of deeply pipelined, Hogwild style training loop.”
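Flapping Airplanes has not published its framework or that trace, so nothing below is its code. For readers unfamiliar with the term, though, here is a minimal CPU-only sketch of the generic Hogwild-style idea (Niu et al., 2011): several workers apply tiny-batch gradient updates to shared parameters with no locks. The deep pipelining Ben mentioned is not captured here.

```python
# Generic Hogwild-style sketch (Niu et al., 2011), not Flapping Airplanes' code:
# multiple workers apply small-batch gradient updates to shared parameters
# without locking. CPython's GIL limits real parallelism here; the point is
# the lock-free update pattern, not performance.
import threading
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(8)                                    # shared parameters
X = rng.normal(size=(1024, 8))
y = X @ np.arange(8.0) + 0.1 * rng.normal(size=1024)

def worker(seed, steps=2000, lr=1e-3, batch=1):
    local_rng = np.random.default_rng(seed)
    for _ in range(steps):
        i = local_rng.integers(0, len(X), size=batch)   # very small batch
        grad = X[i].T @ (X[i] @ w - y[i]) / batch       # linear-regression gradient
        w[:] -= lr * grad                               # lock-free update, racy by design

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```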

The systems work matters, in Ben’s framing, because it opens algorithmic space. New systems enable new algorithms, and the company believes many of those algorithms are relevant to data efficiency. The core work is the co-optimization of the two.

Asher closed by tying that technical posture to hiring. The team works with people who have trained large models, he said, but it is also looking for people with unconventional backgrounds who can help “change the paradigm.” The recruiting slide listed examples of the kind of unconventional profile the company values: a Clash of Clans world champion, a current high school student, an IMO perfect score, and “1T model enjoyers.”
