Better Models Are Becoming Tools for Producing More Compute

The Cognitive Revolution | Tuesday, June 23, 2026 | 19 min read

The source argues that AI progress is becoming recursive: more compute trains stronger models, and stronger models can help design chips, optimize clusters, and produce the next round of compute. It uses a Columbia University paper on LLM-generated Verilog, tree-of-thoughts search, and RTL-based reinforcement learning as evidence that models are entering evaluable hardware-design loops. The same premise drives its Europe 2031 forecast: without multi-billion-dollar compute commitments and frontier infrastructure, Europe risks becoming dependent on American and Chinese AI systems rather than competing at the frontier.

Compute is becoming part of its own production process

The central claim is that AI progress should be understood as a feedback loop, not a one-way dependency. More compute trains better AI systems. Better AI systems then become tools for designing better hardware, writing better infrastructure software, optimizing clusters, and accelerating the engineering work that produces still more compute.

The phrase used for this loop is “compute improves compute.” It describes a specific dependency: AI designs chips; chips run AI. Faster chips make it possible to train stronger models. Stronger models can help design the next generation of chips and the surrounding software stack. That next generation produces more compute, which feeds back into model training.

Compute improves compute: AI designs chips, chips run AI.

The concrete example is a Columbia University paper by Hao Zheng and collaborators titled “Faster and Better LLM-Based Hardware Design via Tree of Thoughts and RTL-Based Reinforcement Learning.” The paper is used as evidence that large language models are moving into a domain directly connected to the supply of compute: Verilog generation.

Verilog matters because it is not ordinary application code. It is a hardware description language, described in the source as the language used to design a processor. If a model writes better Verilog, it can contribute to hardware design, and if it contributes to hardware design, it can affect the pipeline that produces the chips used to train and run future AI systems.

The paper’s displayed setup starts with a natural-language prompt and an LLM for hardware design. The model generates Verilog code, and that generated code is evaluated as passing or failing. The source shows this as a simple pipeline: “Natural Language Prompt,” “LLMs for hardware design,” “Verilog Code,” and “Evaluate Verilog Pass or Fail?” That pass/fail structure is central. Hardware-description code can be checked. It can be simulated. It can fail. That makes it a useful setting for automated feedback.

The paper then adds a framework called RTL-ToT-RL. In the visual shown on screen, the “Overall framework of RTL-ToT-RL” includes a state space, thought state, state evaluation, a Verilog environment, a state generator, and a PPO policy. Another diagram shows a reinforcement learning loop labeled “Proximal Policy Optimization,” with actor, critic, environment, state, reward, and action.
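The PPO loop in the second diagram optimizes a clipped surrogate objective. The sketch below evaluates that standard PPO objective for a small batch in plain Python; it is not taken from the RTL-ToT-RL code. The critic appears only as a value baseline used to form a one-step advantage, the +1/-1 pass/fail reward is an assumption, and the batch values are made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Step:
    old_logprob: float  # log pi_old(a|s), recorded when the action was sampled
    new_logprob: float  # log pi_theta(a|s) under the policy being updated
    reward: float       # assumed: +1 if the generated code passed, -1 if it failed
    value: float        # critic's estimate V(s) for the same state


def clipped_surrogate(steps: list[Step], epsilon: float = 0.2) -> float:
    """Mean PPO clipped objective over a batch (a quantity to maximize).

    The advantage is the simple one-step estimate reward - value, a
    simplification of the advantage estimators often used in practice.
    """
    total = 0.0
    for s in steps:
        advantage = s.reward - s.value
        ratio = math.exp(s.new_logprob - s.old_logprob)  # pi_theta / pi_old
        clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * advantage, clipped_ratio * advantage)
    return total / len(steps)


batch = [
    # Passing sample whose probability rose from 0.5 to 0.8.
    Step(math.log(0.5), math.log(0.8), reward=1.0, value=0.2),
    # Failing sample whose probability fell from 0.5 to 0.3.
    Step(math.log(0.5), math.log(0.3), reward=-1.0, value=0.0),
]
print(round(clipped_surrogate(batch), 6))  # → 0.08
```

The clipping is the design choice that keeps updates conservative: once the probability ratio leaves the band [1 - epsilon, 1 + epsilon] in the direction the advantage favors, the objective stops rewarding further movement. A real training step would differentiate this quantity with respect to the policy parameters rather than just evaluate it.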

The mechanism is direct: the framework gives an LLM a positive or negative signal on the hardware code it writes. The model is not merely prompted once and judged informally. It can generate candidate designs, have those designs evaluated, and use the evaluation signal to improve.

The tree-of-thoughts component is described as checking multiple ways to write the code at the same time. Instead of relying on one generated answer, the system explores alternatives. The reinforcement learning component then supplies feedback from the RTL environment. Together, the method turns hardware-code generation into an iterative search problem with evaluable outputs.
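Exploring several ways to write the code at once can be illustrated with a breadth-first search that keeps only the best-scoring partial states at each step. This is a minimal sketch of the generic breadth-first, beam-pruned variant of tree-of-thoughts, not the paper’s RTL-ToT implementation. The `expand` and `score` callbacks are hypothetical stand-ins for an LLM proposing next steps and an evaluator judging partial designs.

```python
from typing import Callable, TypeVar

S = TypeVar("S")


def tree_of_thoughts_bfs(root: S,
                         expand: Callable[[S], list[S]],
                         score: Callable[[S], float],
                         beam_width: int,
                         depth: int) -> S:
    """Expand every kept state, score all children, keep the top `beam_width`."""
    frontier = [root]
    for _ in range(depth):
        children = [child for state in frontier for child in expand(state)]
        if not children:
            break  # nothing left to explore
        children.sort(key=score, reverse=True)
        frontier = children[:beam_width]
    return max(frontier, key=score)


# Toy usage: each "thought" fixes the output bit for one more input row of a
# 2-input gate; the score counts fixed rows that agree with a testbench.
TESTBENCH = (0, 1, 1, 0)  # expected outputs for inputs 00, 01, 10, 11 (XOR)


def expand_rows(state: tuple[int, ...]) -> list[tuple[int, ...]]:
    if len(state) == len(TESTBENCH):
        return []
    return [state + (0,), state + (1,)]


def rows_matching(state: tuple[int, ...]) -> float:
    return float(sum(bit == want for bit, want in zip(state, TESTBENCH)))


best = tree_of_thoughts_bfs((), expand_rows, rows_matching, beam_width=2, depth=4)
print(best)  # → (0, 1, 1, 0)
```

The beam width sets the trade-off the paragraph describes: a width of 1 collapses to a single greedy answer, while larger widths keep more alternatives alive at the cost of more evaluations.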

Displayed element	What was visible	Why it matters to the argument
Verilog code-generation setup	Natural-language prompt, LLMs for hardware design, Verilog code, evaluate Verilog, pass or fail	Connects model output to hardware-description code that can be checked.
RTL-ToT-RL framework	State space, thought state, state evaluation, Verilog environment, state generator, PPO policy	Shows the system as an iterative framework rather than a one-shot prompt.
PPO loop	Actor, critic, environment, state, reward, action	Makes the reinforcement-learning feedback structure explicit.
RTLLM performance table	Models including CodeLlama, GPT-3.5, and GPT-4 across RTLLM versions	Used to support the claim that performance improves across model families.
RTLLM-V1.2 experimental table	GPT-4 baseline, GPT-4 with Reflexion, GPT-4 with RTL-ToT, GPT-4 with RTL-ToT-RL	Used to support the claim that tree-of-thoughts and RTL-based reinforcement learning improve GPT-4 results.

The technical evidence shown on screen was a sequence of diagrams and tables from the Columbia hardware-design paper.

The displayed experimental tables establish the comparison structure visible in the source. One table reports “Performance of RTLLM across different versions” and lists models including CodeLlama, GPT-3.5, and GPT-4. Another reports “Experimental results on RTLLM-V1.2” and compares GPT-4 baseline performance with GPT-4 using Reflexion, GPT-4 with RTL-ToT, and GPT-4 with RTL-ToT-RL.

Table shown	Comparison visible on screen	Spoken interpretation
Table II: Performance of RTLLM across different versions	CodeLlama, GPT-3.5, and GPT-4 across RTLLM versions, with synthesis and function columns visible	GPT-4 performance jumps, GPT-3.5 performance jumps, and Llama performance jumps.
Table IV: Experimental results on RTLLM-V1.2	GPT-4 baseline, GPT-4 with Reflexion, GPT-4 with RTL-ToT, and GPT-4 with RTL-ToT-RL	When tree-of-thoughts and RTL-based reinforcement learning are added, performance continues to improve.

The paper tables are used to support the claim that evaluated, iterative methods improve LLM hardware-code generation.
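The synthesis and function columns in such tables are rates over a set of benchmark designs. The sketch below shows one way per-design outcomes could be rolled up into those two numbers. It assumes each design yields a synthesis outcome and a functional-test outcome, and that a design counts as functionally correct only if it also synthesizes; the benchmark’s exact scoring rules may differ, and the design names and outcomes are hypothetical, not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class DesignResult:
    name: str
    synthesized: bool       # did the generated RTL synthesize without errors?
    functional_pass: bool   # did it pass the design's functional testbench?


def summarize(results: list[DesignResult]) -> dict[str, float]:
    """Fraction of designs that synthesize, and that also pass functionally."""
    n = len(results)
    return {
        "synthesis_rate": sum(r.synthesized for r in results) / n,
        # Assumption: functional credit requires successful synthesis.
        "functional_rate": sum(r.synthesized and r.functional_pass
                               for r in results) / n,
    }


# Hypothetical outcomes for four benchmark designs.
toy_results = [
    DesignResult("adder", True, True),
    DesignResult("fifo", True, False),
    DesignResult("mux", True, True),
    DesignResult("fsm", False, False),
]
print(summarize(toy_results))  # → {'synthesis_rate': 0.75, 'functional_rate': 0.5}
```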

The significance is the feedback structure. Once hardware code can be tested, simulated, and marked pass or fail automatically, the model can iterate much faster than a human engineering process organized around manual writing, checking, rewriting, and simulation.

The strongest formulation is that “LLMs are designing the hardware.” In context, the point is that models are becoming active participants in the hardware design process. They are not just chat interfaces or code-completion tools. They are being placed inside loops where they propose hardware code, receive feedback, and improve future proposals.

That is why the Columbia paper matters to the broader compute argument. The more capable the model becomes at producing hardware descriptions, the more it can contribute to the production of the compute that trains the next generation of models. The model becomes part of the production chain for its own future substrate.

The bottleneck shifts from human design labor to fabrication capacity

The hardware claim turns on the economics of tape-out. A chip design process is slow not only because the engineering is difficult, but because the consequences of error are expensive.

Tape-out is defined as the movement from “I want this chip” to a design that can be given to a manufacturer to print. In this description, that process takes “a literal human year.” Teams spend months writing code, checking code, rewriting code, simulating code, and trying to ensure that there is no error before committing the design to manufacturing.

The reason is the cost of the first mask. Making the first mask is described as costing millions of dollars. Because a mistake at that stage is so expensive, the process is conservative and labor-intensive. Human engineers check the work repeatedly because the manufacturing commitment is costly.

1 year

the speaker’s estimate for moving from wanting a chip to having a design ready for manufacturing
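Why an expensive first mask makes the process conservative can be shown with a little expected-value arithmetic. This is a minimal sketch under a simplifying assumption: each tape-out independently contains a fatal bug with some "escape probability," so the number of tape-outs until one works follows a geometric distribution with mean 1 / (1 - p). The $5 million mask cost is a hypothetical figure standing in for the source’s “millions of dollars.”

```python
def expected_mask_spend(mask_cost: float, escape_prob: float) -> float:
    """Expected mask spend until the first working tape-out.

    Assumes independent attempts, each failing with probability
    `escape_prob`, so the expected number of attempts is 1 / (1 - escape_prob).
    """
    if not 0.0 <= escape_prob < 1.0:
        raise ValueError("escape_prob must be in [0, 1)")
    return mask_cost / (1.0 - escape_prob)


MASK_COST = 5e6  # hypothetical cost of one mask set, in dollars

for p in (0.5, 0.1, 0.0):
    print(p, expected_mask_spend(MASK_COST, p))
```

Under these assumptions, cutting the escape probability from 0.5 to 0.1 lowers expected spend from $10 million to about $5.6 million, which is the economic case for months of human checking, and also the case for any faster method that catches bugs before tape-out.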

AI-generated hardware code changes the cadence if it can be evaluated quickly. The contrast is between months of human work and LLM generation that can happen in milliseconds. If a model can produce candidate Verilog quickly, test those candidates against an RTL environment, and use reinforcement learning to improve, the slowest part of the design loop may no longer be the act of writing and checking hardware code.

The claimed consequence is that the bottleneck moves downstream. If work that once required an entire team for months can be compressed into an afternoon of model-driven iteration, the constraint becomes the ability to manufacture. TSMC’s mask-printing capacity is named as the kind of industrial limit that would be hit.
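The bottleneck shift follows from a basic property of stages in series: overall throughput is capped by the slowest stage. The sketch below models the chip pipeline as just two hypothetical stages, design and mask printing, with made-up rates in designs per year; it illustrates the argument and does not estimate any real fabrication capacity.

```python
def pipeline_throughput(stage_rates: dict[str, float]) -> tuple[str, float]:
    """Return (bottleneck stage, throughput) for stages run in series.

    Each rate is how many designs per year a stage can process; the
    slowest stage caps the throughput of the whole pipeline.
    """
    bottleneck = min(stage_rates, key=stage_rates.get)
    return bottleneck, stage_rates[bottleneck]


# Hypothetical rates, in designs per year.
human_paced = {"design": 1.0, "mask_printing": 4.0}
model_paced = {"design": 100.0, "mask_printing": 4.0}

print(pipeline_throughput(human_paced))  # → ('design', 1.0)
print(pipeline_throughput(model_paced))  # → ('mask_printing', 4.0)
```

Speeding up design 100-fold in this toy raises throughput only fourfold, because mask printing becomes the binding constraint, which is the TSMC point in the paragraph above.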

This is the practical meaning of “compute improves compute.” The loop does not require every constraint to disappear. It requires one major constraint — the human pace of hardware design iteration — to loosen enough that other constraints become visible. Faster design cycles create more pressure on fabrication. More fabrication produces more chips. More chips supply more compute. More compute trains stronger models, which can then re-enter the design process.

The broader pattern is AI-assisted engineering. Chip design is one instance of a larger class of tasks where models can become more useful as they gain intelligence, larger context windows, and better long-horizon reasoning. Human engineers can still do many things that AI systems currently cannot: set up large-scale experiments, debug complex code bases, coordinate with one another, and design entirely new architectures.

Those remaining gaps matter because the compute feedback loop does not stop at Verilog. The source extends it beyond chip design to cluster software and the workflows used to train still better systems, so compounding can happen across hardware, software, and research operations.

The chain is explicit: better models help design better chips; better chips allow better models to be trained; better models write better software for optimizing the cluster; better cluster software enables still better training. Each improvement becomes an input to the next improvement.

There is also a boundary around the loop. It does not continue infinitely. Energy eventually constrains it. Atoms eventually constrain it. There are fundamental limits of computation. But the gap between current systems and those physical limits is described as enormous. The important question is how quickly the industry moves through that gap.

That is the departure from a simple Moore’s-law-style extrapolation. If compute were only an external input improving on an independent semiconductor trend line, forecasts could treat chip progress as a background assumption. But if better AI systems help produce better compute, the rate of progress becomes endogenous. The tools being improved are also tools for improving the next tools.
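The difference between an exogenous trend and an endogenous one can be made concrete with a toy growth model. This is a minimal sketch with hypothetical parameters, not a forecast: with `feedback = 0` compute grows at a fixed rate, as on an independent semiconductor trend line; with `feedback > 0` the growth rate itself rises with the current level of compute. A hard `ceiling` stands in for the energy and physical limits the source mentions.

```python
def simulate(steps: int, base_rate: float, feedback: float,
             ceiling: float, start: float = 1.0) -> list[float]:
    """Compute levels over time when the growth rate can rise with compute.

    feedback = 0 reproduces a fixed, exogenous trend line; feedback > 0 means
    today's compute speeds up tomorrow's gains. Levels are capped at
    `ceiling` to stand in for energy and physical limits.
    """
    levels = [start]
    for _ in range(steps):
        current = levels[-1]
        rate = base_rate + feedback * current
        levels.append(min(ceiling, current * (1.0 + rate)))
    return levels


# Hypothetical parameters chosen only to show the shape of the two curves.
exogenous = simulate(10, base_rate=0.4, feedback=0.0, ceiling=1e4)
endogenous = simulate(10, base_rate=0.4, feedback=0.05, ceiling=1e4)

for t, (x, e) in enumerate(zip(exogenous, endogenous)):
    print(t, round(x, 1), round(e, 1))
```

In this toy, the fixed-rate path ends near 29 after ten steps, while the feedback path starts almost identically, pulls away, and hits the ceiling by the tenth step. The ceiling caps the loop, but how quickly the curve approaches it is what the feedback changes.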

The technical tension is not whether AI systems erase chip manufacturing, energy constraints, or physics. It is whether AI-assisted design and infrastructure work compress enough of the engineering cycle to accelerate the whole stack. The Columbia paper is used because it shows one plausible mechanism: evaluated hardware-code generation with reinforcement feedback.

Feedback is what turns generation into engineering

The acceleration thesis depends on models moving from short code-generation tasks toward longer engineering work. Verilog generation is important because it is close to hardware, but it is still part of a larger transition: models becoming capable of doing the work that makes engineering organizations productive.

The tasks that still separate human engineers from current models are setting up large-scale experiments, debugging complex code bases, coordinating across teams, and designing new architectures. They require memory, planning, context, judgment, and persistence over long horizons. The source expects models to begin doing more of them as they become smarter, gain longer context windows, and improve at reasoning over longer time spans.

That matters because the compute stack is not only silicon. It includes training infrastructure, cluster behavior, software optimization, debugging tools, evaluation pipelines, and research coordination. The source’s broader point is that models become more strategically important as they move from producing isolated outputs toward helping with the systems that make further model improvement possible.

The paper’s reinforcement-learning setup is the concrete instance shown in detail. The model acts, the environment evaluates, the system receives reward, and future behavior improves. In hardware-code generation, the environment is a Verilog or RTL evaluation process. The source then generalizes the compute-improves-compute idea to the surrounding stack: better models can write better software to optimize clusters, and those optimized clusters can train better models.

Feedback is the difference between plausible output and engineering progress. A model that only produces plausible text can be wrong without knowing it. A model in an evaluable loop can search, test, and improve. Hardware design, because it has simulation and pass/fail structure, provides a clean example of the kind of domain where this matters.

The “afternoon” comparison should be read through that lens. The claim is not that all chip work becomes trivial. It is that some parts of engineering move from human-time iteration to machine-time iteration. When the cost of trying an alternative falls, the search process changes. When many alternatives can be generated and checked quickly, the design space becomes more accessible.

That shift also changes organizational assumptions. A human chip team must allocate scarce engineering attention. It must decide which alternatives are worth exploring because each alternative carries labor cost. A model-driven system can explore many more candidates if evaluation is automated and cheap relative to human review. That increases the value of benchmarks, simulators, environments, and reward signals.
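The value of cheap automated evaluation can be quantified with a standard probability identity: if each candidate passes independently with probability p, the chance that at least one of k candidates passes is 1 - (1 - p)^k. The sketch below applies it. Independence is a simplifying assumption (samples from one model are usually correlated), and the 10% per-candidate pass rate is hypothetical.

```python
import math


def pass_at_least_once(p: float, k: int) -> float:
    """Probability that at least one of k independent candidates passes."""
    return 1.0 - (1.0 - p) ** k


def candidates_needed(p: float, target: float) -> int:
    """Smallest k with pass_at_least_once(p, k) >= target.

    Requires 0 < p < 1 and 0 < target < 1.
    """
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# Hypothetical: each generated design passes the testbench 10% of the time.
k = candidates_needed(0.1, 0.99)
print(k, round(pass_at_least_once(0.1, k), 4))
```

Forty-four candidates are affordable when checking is automated and cheap, and unaffordable when each one needs human review, which is why benchmarks, simulators, and reward signals gain value.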

The Columbia paper sits at the intersection of these ideas. It uses large language models for hardware design. It introduces tree-of-thoughts exploration so the system can consider multiple candidate paths. It uses RTL-based reinforcement learning so generated code can be judged and improved. It reports performance gains across models and methods. It is used as an early sign that models are entering the engineering loop that produces compute.

Europe’s AI problem is framed first as capital and compute scarcity

The second major claim is geopolitical: by 2031, Europe may be structurally behind in frontier AI because it lacks the compute infrastructure and capital commitment needed to compete with the United States and China.

A tweet shown on screen from François Chollet condenses the concern: “It’s basically a law of nature that any highly successful AI lab in Europe eventually turns into a minor Microsoft API.” The tweet functions as a shorthand for dependency. Europe can produce promising AI efforts, but the fear is that successful labs eventually become tied to non-European infrastructure and capital.

Mistral is used as the immediate example. The speaker says Mistral “essentially got bought” through a Microsoft arrangement involving a 10% stake, and that it is “over for them as an independent entity long-term” because they do not have billions and billions of dollars of compute behind them. The larger claim is that frontier AI labs require enormous compute backing, and that Europe does not have enough of it.

The problem is framed as compute resources and capital expenditure. Building large language models from scratch — or doing something as expensive as what Meta is doing — is described as a problem of resources at the scale of billions. “Computing resources are hard out here,” the speaker says. Talent and ambition are not treated as sufficient substitutes for clusters, GPUs, and the money required to buy them.

The comparison is stark. Americans are described as having compute. China is described as having compute and “absolute absolute tons of money” put toward it. Europe, by contrast, is described as having no comparable actor stepping forward. There is “almost absolutely no large tech company” in Europe operating at a level where it can put forward two or three billion dollars to buy NVIDIA GPUs. Later, the required commitment is described as five billion dollars to build a foundation model from zero.

Baidu supplies the China contrast. A headline shown on screen says: “Baidu’s Ernie chatbot is gaining AI capabilities faster than ChatGPT did, says CEO Robin Li.” The speaker asks who supports Baidu and answers: the Chinese government, and “basically all the money in China in general.” In the argument, Baidu stands for a Chinese model of compute support that Europe is said to lack.

Actor or region	How it is used in the argument	Direct claim made in the source
United States	Compute-rich benchmark	Americans have compute and the capital base for frontier AI.
China	Compute-rich benchmark	Baidu is supported by the Chinese government and Chinese capital more broadly.
Baidu	Example of Chinese AI backed by national resources	Ernie is presented through a headline saying its capabilities are gaining faster than ChatGPT did, according to CEO Robin Li.
Mistral	Example of European dependence	Mistral is said to have effectively been pulled into Microsoft’s orbit through a 10% stake.
Europe	Region at risk of falling behind	No comparable European actor is described as putting up the billions required for frontier compute.

The Europe argument distinguishes between compute-rich actors and Europe’s alleged lack of comparable capital commitment.

The claim is not that Europe has no engineers, no companies, or no AI users. It is that Europe lacks the kind of concentrated compute base and capital commitment needed to remain at the frontier. No one, in this account, is standing up to assemble the billions required to build an “absolute foundation model from absolute zero.” That absence is treated as the strategic failure.

The deeper claim is that compute access will increasingly determine the productivity of the technology sector, not only the fate of AI labs. If frontier models become the tools used to write code, check servers, debug systems, design infrastructure, and accelerate research, then regions without frontier compute will work more slowly. Their developers will use weaker systems, delayed systems, or whatever access is available from actors that control stronger models.

That is why the forecast for Europe is severe. “Europe is dead” means dead as a leading technology place in an AI-centered world. The claim is not about a lack of intelligence or education. It is about missing the compute base required to participate in the compounding loop.

The standard-of-living formulation is loose but revealing. The speaker suggests thinking of standard of living in terms of raw intellectual and labor output divided by GDP, then argues that compute-rich regions will hit an “infinite wall” relative to Europe. The intended point is that AI magnifies intellectual labor. If the best AI systems are concentrated in the United States, China, or other compute-rich places, then those places get a productivity multiplier Europe lacks.

Europe’s danger, as framed here, is not immediate disappearance. It is dependence and relative slowdown. If the strongest models are trained and served elsewhere, Europe may still consume AI services, but it will not set the pace of capability. It will be downstream of the actors with the compute.

The 2031 scenario turns compute scarcity into everyday productivity loss

The 2031 thought experiment centers on inference access. Training frontier models is one bottleneck; running them at scale is another. If the strongest models are expensive to serve, then the question becomes who can afford to use them routinely.

The hypothetical is simple: assume GPT-7 exists in 2031, and assume inference costs twenty dollars per token. Under that assumption, the speaker says nobody in Europe is running it. The example is deliberately extreme, but it clarifies the strategic concern. A region can fall behind not only because it cannot train the leading model, but because it cannot afford to use the leading model as an everyday input to work.

$20 per token

the hypothetical 2031 inference cost used to illustrate Europe’s access problem
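The arithmetic behind “nobody in Europe is running it” is simple multiplication. The sketch below keeps the source’s hypothetical $20 per token and adds two further assumptions of its own: 1,000 tokens per request and 50 requests per working day for one developer.

```python
def daily_inference_cost(requests_per_day: int, tokens_per_request: int,
                         price_per_token: float) -> float:
    """Daily spend for one user at a flat per-token price."""
    return requests_per_day * tokens_per_request * price_per_token


# Hypothetical usage pattern at the source's hypothetical $20 per token.
print(daily_inference_cost(50, 1_000, 20.0))  # → 1000000.0
```

Under these assumptions a single developer would cost a million dollars a day, which is why, at such prices, only actors with enormous compute budgets could use the frontier model routinely.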

For technology workers, that matters at the level of ordinary tasks. The examples given are writing code, checking servers, and basic work that LLMs do or are expected to do better. If frontier models make these tasks faster, safer, or more reliable, then the productivity gap appears wherever those models are available. If European workers are using weaker models while American or Chinese workers use frontier systems, the same job takes longer or produces worse output.

This is the practical meaning of relying on “leftovers” from the United States or China. Europe may still receive AI services. It may still have software products and enterprise tools. But if the frontier is controlled elsewhere, Europe’s access depends on systems built and prioritized elsewhere. The strongest capability may be too expensive, too constrained, or simply unavailable for routine use at European scale.

The Chollet tweet about successful European AI labs becoming “a minor Microsoft API” captures this dependency at the company level. The speaker extends it to the regional level. If Europe’s most promising AI efforts require outside infrastructure and capital, then Europe’s independent frontier position weakens. A lab can survive as a product or service provider while losing the ability to set the terms of the frontier.

“They just missed the boat” points to a timing problem. Compute infrastructure is not acquired instantly. Capital allocation at the scale of billions must happen before the gap is fully visible. If the compute feedback loop accelerates, waiting makes the problem harder. The actors already ahead use AI to improve compute and infrastructure, widening the distance from those still trying to assemble the basics.

The inference-access divide also explains why AI is linked to standard of living in the speaker’s framing. If AI systems multiply labor output, then access to stronger systems becomes a productive asset. A worker with a frontier model can write, debug, and operate systems differently from a worker with a weaker model. A company with routine access to the best models can move faster than a company that cannot afford or obtain that access.

The resulting hierarchy is the article’s synthesis of the source’s claims about capital, compute, and access. At the top are actors that train, run, and improve frontier models on infrastructure they control. Below them are actors that buy or receive access to those systems. Below them are actors that use older, weaker, or constrained models. The Europe 2031 scenario places Europe in a dependent position, not because it lacks demand for AI, but because it lacks the compute base and capital posture to own the frontier.

Regulation is treated as another drag on catching up

The later Europe claim adds regulation to the capital and compute argument. The regulatory environment is described as making it difficult to build massive compute clusters. Under that condition, Europe may “regulate themselves completely out of the frontier model race.”

This is not developed as a detailed account of specific statutes. The source names Europe as a “regulatory superpower” and mentions regulations such as GDPR and the AI Act. It also says Europe lacks domestic tech giants comparable to those in the United States or China, even though it has a huge market. The argument is that Europe has been able to use market size to impose rules, but that this posture becomes brittle if AI capabilities are accelerating rapidly.

The timing problem is central. If a piece of legislation takes three years to draft and pass, the source argues, the technology may have “completely completely transformed” by the time the rule arrives. Regulation, in this framing, is aimed at a moving target. In a slow-moving sector that may be manageable. In a sector shaped by a compute feedback loop, delay can become strategic loss.

The 2031 choice is framed bluntly. Europe can try to maintain a precautionary regulatory approach and risk being left behind economically and technologically, or it can try to catch up. Catching up is hard because the missing asset is not one regulation or one product launch. It is the physical and financial base for frontier AI: compute infrastructure, large capital commitments, and the ability to train and serve the best models.

The direct regulatory claim is that Europe’s rules and institutional posture make massive compute-cluster construction harder. The capital claim is that Europe does not have enough large technology companies or investors willing to put billions into the necessary GPUs and infrastructure. The dependency claim is that successful European AI labs risk being absorbed into or subordinated to American cloud and model infrastructure. The article’s synthesis is that these three problems reinforce one another.

Claim type	What is directly claimed	How it affects the 2031 forecast
Capital scarcity	No European actor is described as putting up two, three, or five billion dollars for frontier compute.	Europe lacks the spending base for independent frontier model development.
Compute scarcity	Europe is contrasted with the United States, China, and the UAE as places with compute.	European technology workers may lack routine access to the strongest models.
Dependency	Successful European AI labs are framed through the Chollet tweet as becoming Microsoft APIs; Mistral is described as effectively tied to Microsoft.	Promising European efforts may survive without controlling the frontier infrastructure.
Regulation	The regulatory environment is said to make massive compute clusters difficult to build.	Catching up becomes harder while the target keeps moving.

The Europe forecast rests on separate but reinforcing claims about capital, compute, dependency, and regulation.

The harsh phrase that Europe becomes “basically a third-world country by 2031” is a relative claim about technological productivity under unequal AI access. It is not a claim about Europe’s current wealth or living conditions. It is a claim about technological hierarchy in a world where AI capability becomes a key input to labor productivity. If American and Chinese actors have routine access to stronger models, while European firms work with weaker or more constrained access, the relative gap becomes visible in everyday technical work.

The regulation argument also depends on the long-horizon engineering argument. If models become able to help debug large systems, coordinate experiments, optimize clusters, and design hardware, then restrictions or delays around compute infrastructure affect more than model training. They affect the tools engineers use to improve every other part of the technology stack. A lag in frontier AI becomes a lag in the ability to catch up.

The hardware paper and the Europe forecast depend on the same premise

The hardware-design paper and the Europe forecast may look like separate topics, but they depend on the same premise: compute is becoming the strategic substrate of AI, and AI is becoming a tool for expanding and optimizing compute.

The Columbia paper supplies the technical mechanism. It shows LLMs being applied to Verilog generation, with pass/fail evaluation, tree-of-thoughts exploration, and RTL-based reinforcement learning. The displayed tables compare model and method performance, and the spoken interpretation is that GPT-4, GPT-3.5, and Llama improve, with further improvement when RTL-ToT-RL is added. If that capability improves, models can help shorten design cycles that previously depended on long human iteration.

The Europe argument supplies the strategic consequence. If compute is the substrate, then regions without massive compute investment fall behind. If AI also improves compute, the lead compounds. The actors with compute train stronger models. Stronger models help improve chips, software, clusters, and research execution. Those improvements support still stronger models. The actors without compute do not simply trail by a fixed interval; they lack the tools that would help them close the gap.

That is why Mistral, Baidu, Microsoft, TSMC, and European regulation all appear in the same argument. Mistral represents European dependence on outside capital and infrastructure, as characterized by the speaker. Baidu represents, in the speaker’s telling, Chinese state and national capital support. Microsoft represents the pull of American cloud and AI infrastructure. TSMC represents the downstream fabrication constraint that becomes more important if design accelerates. European regulation represents a political and institutional posture that may make it harder to build large compute clusters.

The most severe claims are rhetorical, but the structure is consistent. “Obviously, we’re all dead” is used as a throwaway intensifier around AI-designed hardware. “Europe is dead” means Europe is dead as a leading technology place if it misses the frontier AI compute wave. “Third-world country by 2031” is a relative claim about technological productivity under unequal AI access.

The argument can be reduced to six linked claims. Frontier AI depends on compute. AI systems are beginning to assist in the production and optimization of compute. Evaluated loops such as RTL-based reinforcement learning make hardware-code generation a plausible part of that process. If design iteration accelerates, manufacturing and cluster capacity become more central bottlenecks. Countries and companies with the capital and infrastructure to build compute will use AI to extend their lead. Europe risks relying on insufficient compute, outside infrastructure, and slower institutional adaptation in a race increasingly determined by compute capacity.

The most important shift is from viewing AI as a product to viewing it as a production input. If AI is only a chatbot, then a region can be a consumer and still participate meaningfully. If AI is an input into chip design, cluster optimization, software engineering, research execution, and labor productivity, then dependence on external AI becomes dependence across the technology stack.

Compute, in this account, is not just something a lab buys to train a model. It is an amplifier. More compute produces better models. Better models improve the systems that produce and use compute. That makes capital allocation, infrastructure, manufacturing capacity, and regulatory speed part of the same competitive system.

Europe’s danger, as framed here, is that it enters this system without the assets that compound: no comparable multi-billion-dollar compute commitments, no clear path to frontier model independence, and a regulatory environment said to make massive cluster construction harder. The consequence is not immediate disappearance. It is dependence, slower technical work, and a widening productivity gap by 2031.
