AI’s Scarce Inputs Are Rewriting the Open Versus Closed Model Debate

John Coogan | TBPN | Tuesday, June 30, 2026 | 15 min read

John Coogan, Jordi Hays and Tyler use Zhipu AI’s open-weight GLM-5.2 release to argue that the open-versus-closed AI fight is now about timing, not a settled winner. Closed labs may still lead at the frontier, but they say capable open models arriving close behind can weaken API-based controls, shorten monetization windows and complicate security planning. The discussion broadens that pressure into the AI supply chain, where scarce compute and memory capacity may be capturing profits before model providers can.

Open-weight models keep narrowing the policy window

John Coogan framed the release of Zhipu AI’s GLM-5.2 as another turn in a debate that has refused to settle: whether open-weight AI models will commoditize frontier capabilities or fall far enough behind closed models that the frontier remains controllable and monetizable. Jordi Hays captured the recurring rhythm more bluntly: for open-source AI, the mood swings between “it’s over” and “we’re so back.”

The immediate trigger was GLM-5.2, released by China’s Zhipu AI, also known as Z.AI. Coogan cited Wall Street Journal reporting that security researchers said the model can match the latest U.S. models at finding security bugs. The point was not just benchmark bragging. Because GLM-5.2 is open-weight, users can download it, run it on their own hardware, modify it, and use it without going through a private API or supervised provider. That makes it attractive to developers and enterprises that want direct control. It also makes it attractive, Coogan noted, to hackers who want to run capable models “in the shadows.”

GLM-5.2 had also become one of the top 10 most-used AI models on OpenRouter, according to data Coogan cited from the model-routing platform. He treated OpenRouter as an important piece of market infrastructure: a way to be a “front door” to many AI models without having to compete directly in the frontier benchmark race.

The source showed a U.S. Center for AI Standards and Innovation chart titled “Overall AI Capability,” plotting Elo scores for U.S. and PRC models from January 2024 through January 2027. The blue U.S. line included models such as GPT-4o, o1, o3-mini, o3, Opus 4, GPT-5, GPT-5.2, Opus 4.6, GPT-5.4, and GPT-5.5. The red PRC line included DeepSeek R1, Alibaba Qwen3, Kimi models, and DeepSeek V4 Pro. Coogan read the chart as reflecting a recent narrative: U.S. closed-source labs were improving faster, aided by capital markets, data centers, and researcher density, while Chinese and open models might plateau.

Tyler, who had been reviewing the chart and newer model data, cautioned against reading too much certainty into that picture. He said the Elo was an approximation built from many benchmarks, some of which are proprietary or not public, making it difficult to place newer models precisely. GLM-5.2, in his view, looked like “a big step up” from the Chinese trend line, but he also said the benchmark mix used in the chart appeared to accentuate the gap between U.S. and Chinese labs. Epoch AI, he noted, had produced work suggesting a relatively stable gap between closed-source and open-source models since around 2023.

That matters because the policy question is not whether open-weight models are absolutely best. It is whether they become good enough, soon enough, to weaken controls built around closed APIs, know-your-customer requirements, or restricted access to frontier systems. Coogan’s concern was that if a company is waiting for government approval to use a future closed model, but an open-weight model catches up within a few months and is “close enough,” the regulatory premise becomes harder to execute.

Benchmarks are less useful if the unit of competition is the completed task

Coogan pushed Tyler toward a different way of measuring model economics: cost per task rather than cost per token. Tyler agreed. A model can have a low token price and still be expensive for a given job if it uses many tokens to complete it. Conversely, a model whose token price is unchanged can become cheaper in practice if it becomes more token-efficient.

That distinction complicated GLM-5.2’s apparent advantage. Tyler said many users described the model as “very token hungry.” It may be much cheaper than frontier models on a per-token basis, but the relevant comparison is the cost of completing a task. On some tasks, especially when closed-source reasoning models are run in slower or more deliberative modes, that calculation can shift.
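The cost-per-task framing can be made concrete with a small sketch. Every price and token count below is a hypothetical placeholder, not a figure cited in the discussion, and the model labels are illustrative only. The only relationship assumed is that the cost of a task is the per-token price multiplied by the tokens consumed to finish it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical pricing profile; none of these numbers come from the source."""
    name: str
    usd_per_million_tokens: float  # assumed blended token price
    tokens_per_task: int           # assumed average tokens used to finish one task

    def cost_per_task(self) -> float:
        # Cost of a completed task = token price x tokens consumed.
        return self.usd_per_million_tokens * self.tokens_per_task / 1_000_000


# A cheap-per-token but "token hungry" model versus a pricier, more efficient one.
cheap_but_hungry = ModelProfile("open-weight (hypothetical)", 2.0, 900_000)
pricey_but_efficient = ModelProfile("closed frontier (hypothetical)", 10.0, 150_000)

for model in (cheap_but_hungry, pricey_but_efficient):
    print(f"{model.name}: ${model.cost_per_task():.2f} per task")
```

Under these made-up inputs, the model that is five times cheaper per token ends up costing more per completed task, which is the kind of reversal Tyler described. With different token counts the comparison could just as easily go the other way.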

The broader market structure, in Tyler’s account, is starting to split into two useful classes of models. At one end are the frontier systems, chosen when the marginal quality is worth paying for: coding agents, cybersecurity, and other work where a failure is expensive enough that buyers “will pay whatever it is.” At the other end are small, fast, cheap models used for repetitive point solutions, sometimes orchestrated by larger models. Coogan offered receipt processing as the example: every receipt entering an expense-management system may be processed by an LLM, but recognizing that someone spent $10 on coffee does not require a frontier model.

The difficult space is the middle. Tyler was not convinced there is a large market for models that are neither the best nor extremely cheap. Hobbyist coding agents may be one use case, particularly for users who do not want to pay high closed-source token prices. But OpenRouter usage, he said, points heavily toward small models by token volume, including models such as DeepSeek Flash.

That does not mean GLM-5.2 can be dismissed. Tyler called it “a very good model” and warned against concluding that the gap is widening so much that open-source or Chinese labs no longer matter. Coogan drew the monetization implication: if open-weight models arrive near the frontier quickly, the period during which a closed lab can monetize a new frontier model may be shorter and less predictable.

Distillation is both an accusation and a measurement problem

The discussion of GLM-5.2 quickly turned to distillation: the claim that some open models perform well because they have been trained, directly or indirectly, on outputs from closed models. Tyler said this has become a recurring accusation around Chinese open-source models. It is also hard to prove. There seem to be “aspects of Anthropic models” in some of these systems, he said, and Coogan noted that Anthropic had openly accused Alibaba of distilling.

But the line between distillation and ordinary training data is blurring. Tyler pointed out that a model may not be trained directly against a closed API. It may instead train on public GitHub repositories that were themselves written or heavily modified with closed-source models. Is that distillation? Tyler did not resolve the question.

Coogan extended the point beyond code. As more of the public internet, GitHub, open-source repositories, and even books become LLM outputs, training on public data increasingly means training on the traces of prior models. If a model has a recognizable writing quirk, that quirk can spread into downstream datasets. Code conventions generated by closed models can similarly re-enter open training corpora.

Tyler’s practical warning was that distilled models tend to generalize worse. They can show strong benchmark scores and still lack robustness outside the tests. He recommended initial suspicion toward “super high benchmark scores,” especially where the benchmarks are susceptible to overfitting or indirect training exposure. Coogan described the missing quality as the “big model je ne sais quoi.”

On GLM-5.2 specifically, Tyler said the anecdotal reports were strongest for coding. Users described it as very good. For creative writing and other tasks harder to benchmark directly, the case was less clear. Coogan also raised whether users had tested the model on politically sensitive Chinese topics such as Tiananmen Square. Tyler replied that because the model is open-source, users can fine-tune around refusals or restrictions, even if doing so is not always trivial.

The practical point was less about whether GLM-5.2 is pure. It was that model capability may diffuse through public artifacts even when direct API distillation is restricted. That makes closed-source advantage harder to defend solely through access controls.

The security debate is now about lead time, not permanent separation

Coogan revisited John Luttig’s May 2024 argument that the future of foundation models would be closed-source. Luttig’s thesis, as Coogan summarized it, rested on closed-source data flywheels, exponential capital expenditure, and the intensity of frontier training. Open source would still have a home where smaller, less capable, configurable models were useful, such as enterprise workloads. But the bulk of value creation and capture would occur at the frontier. Open-source model releases could serve as marketing or as a way to commoditize complements, but open-source providers would ultimately lose the capital expenditure war as ROI declined.

That argument fit the Meta Llama era, when open-source AI was often discussed as part of Meta’s strategy. Coogan said the idea was that open models could help Meta attract talent, show it had an AI story, lower internal costs, and commoditize the broader model layer even if the company was not directly monetizing Llama through rapidly growing ARR. By 2026, he said, that looked more complicated, given reports of Meta spending heavily on Gemini and models from other closed-source frontier labs.

China’s entry made the game theory harder. Coogan cited George Hotz’s recent argument that China has different incentives from U.S. firms. In Hotz’s formulation, China can benefit from giving away moderately resourced models because AI-driven deflation in the U.S. service sector helps a less service-dependent Chinese economy. Hotz’s broader claim, as Coogan relayed it, was that nobody will get a monopoly on AI: “We don’t live in a unipolar world anymore.”

The security concern was sharpened by a resurfaced 2023 clip of Anthropic CEO Dario Amodei testifying to Congress. Coogan emphasized that the clip was old, not a new statement, but said some of Amodei’s warnings looked prescient. Amodei had warned that within two to three years frontier models could raise serious biosecurity and cybersecurity risks, and that open-source scaling could go down a dangerous path if it continued.

Coogan’s current framing was that the U.S. may still have a defensive lead, but not a permanent safety margin. He said cybersecurity firms such as CrowdStrike and Palo Alto Networks have been working with advanced models including Mythos and GPT-5.5 Cyber to harden systems against LLM-driven attacks. As long as closed models remain ahead of open models, white-hat defenders may have time to find and fix vulnerabilities before black-hat users get equivalent capability from open systems.

But he did not treat that lead as guaranteed to widen. If the gap does not expand, cybersecurity and eventually biosecurity strategy must adapt. The question becomes how defenders use temporary access to stronger closed models before open-weight systems commoditize the same capability.

Compute scarcity is constraining even the companies buying from each other

The AI infrastructure shortage appeared again in a Financial Times report that Google had capped Meta’s use of Gemini models. Hays read from the report: Google told Meta around March that it could not provide all the Gemini capacity Meta wanted to purchase, disrupting and delaying some internal AI projects. Several other Google clients were also affected, though Meta was hit especially hard because of unusually high demand.

Coogan was struck by the scale implied by the story. Google had spent enormous sums on capital expenditure, he said, yet still appeared to be capacity-constrained. His initial interpretation was bullish for Google: if the company cannot meet demand, that aligns with its Google Cloud growth story.

Hays tied the report to “token maxing,” the period when tech companies began spending aggressively on AI tokens and capacity. He also raised, cautiously, whether distillation concerns could be part of the story, though he said he did not know. Coogan distinguished that from a separate report that Meta had limited employee use of Claude and Codex in some areas because it did not want accidental distillation. In the Google-Meta case, the issue as reported was capacity, not explicitly model leakage.

The source also showed a Google Gemini-branded rideshare bicycle wheel cover with the text “Google Gemini” and “A new kind of help from Google.” The image was incidental but underscored how consumer-facing the brand has become even as the capacity behind it is constrained.

The report said Meta had encouraged staff to be more efficient with AI tokens, both because of Google’s restrictions and a broader push to streamline AI costs. That detail fit the earlier cost-per-task discussion: even the largest AI buyers are being forced to treat tokens as an operating constraint rather than an abstraction.

Meta’s brain-to-text work is impressive, but not yet a daily device

Meta also announced what Hays described as a “mind reader”: Brain2Qwerty v2, a non-invasive brain-to-text decoder. The on-screen X post from AI at Meta said the system was the “highest-performing end-to-end pipeline capable of real-time sentence decoding from raw brain signals,” advancing beyond character-level performance to decoding words and semantics. Meta said the research could help millions of people with brain lesions or disorders that prevent communication.

Coogan and Hays treated the announcement with both interest and skepticism about form factor. The source showed Meta’s video text: “Introducing Brain2Qwerty v2 — A non-invasive real-time decoder to translate brain activity into sentences.” Coogan then showed an image of the magnetoencephalography device: a large, helmet-like machine covering much of a person’s head in a clinical room.

Hays resisted the word “non-invasive” in the everyday sense, not because it pierced the body, but because the equipment looked room-sized and impractical. Coogan gave Meta credit on the technical meaning: if nothing is implanted, it is non-invasive. But he agreed the current setup is not something a person would “daily drive.”

The more interesting near-term demo, in Coogan’s view, would be a setup where someone could sit in a chair and see thoughts translated onto a screen. He connected the announcement to a prediction from Rob Toews, who had recently forecast in Forbes that telepathy would be commonplace by 2030. Coogan called that an aggressive prediction.

The Meta research, as discussed, was therefore not presented as a consumer product about to ship. It was a meaningful research milestone with an obvious clinical use case, a provocative surveillance-adjacent cultural reaction, and a hardware gap large enough to make “commonplace telepathy” still sound far off.

Memory suppliers are absorbing the AI profit pool before model providers do

A Wall Street Journal article by James Mackintosh supplied one of the more concrete economic claims: memory-chip makers are profiting from AI at the expense of almost everyone else. Coogan read the argument as an “extraordinary transfer of cash” from AI providers, and potentially future AI users, to memory suppliers.

The key bottleneck is high-bandwidth memory. Micron, Samsung Electronics, and SK Hynix were compared to oil producers supplying airlines: they provide an essential input whose price has suddenly become much higher. Because capacity is limited and new production facilities take years to build, data center demand has pushed prices up sharply.

Coogan read several numbers from the Journal piece. In the quarter ended May 28, Micron increased DRAM prices by more than 60% from the prior three months while shipments rose only by a low single-digit percentage. NAND flash prices, also used in data centers, jumped more than 80%. The article said Micron customers paid $18 billion more in the quarter. Memory prices had quadrupled in a year.
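A rough way to see how these figures fit together is the back-of-the-envelope arithmetic below. It assumes revenue scales as price times shipment volume, treats 60% as a floor (the article said "more than 60%"), and uses 3% as a stand-in for "low single-digit" shipment growth. The 3% figure and both helper functions are illustrative assumptions, not numbers or methods from the Journal piece.

```python
def revenue_multiplier(price_change: float, shipment_change: float) -> float:
    """Revenue multiple if revenue = price x volume (a simplifying assumption)."""
    return (1 + price_change) * (1 + shipment_change)


def per_period_rate(total_multiple: float, periods: int) -> float:
    """Constant per-period growth rate that compounds to total_multiple."""
    return total_multiple ** (1 / periods) - 1


# DRAM: price up at least 60%, shipments up an assumed 3%.
dram_multiple = revenue_multiplier(0.60, 0.03)
print(f"DRAM revenue multiple, quarter over quarter: {dram_multiple:.3f}x")

# A fourfold price rise over one year, expressed as a steady quarterly rate.
quarterly_rate = per_period_rate(4.0, periods=4)
print(f"Implied steady quarterly price increase: {quarterly_rate:.1%}")
```

Under these assumptions nearly all of the revenue growth comes from price rather than volume, which is consistent with the article's framing of a transfer from buyers to suppliers. A price level that quadruples in a year works out to roughly 41% per quarter if the increase were spread evenly, so a 60%-plus quarter would sit above that average pace.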


The source showed an LSEG chart titled “The Costs of Memory Profits,” comparing share price changes since April for Micron, Microsoft, Meta, Amazon, and Alphabet A. Micron’s line rose dramatically, approaching roughly 120%, while the large AI infrastructure buyers stayed close to flat by comparison. Coogan said Micron’s stock had been “through the roof” and described the company as joining the trillion-dollar club.

The pressure is not confined to AI. Coogan read that Apple had raised MacBook prices by more than 15%, and the Journal author’s own memory purchase for a quiet PC had tripled in price and now cost more than the CPU. That reversal is unusual for memory, where prices typically decline over time.

The AI-specific problem is that model providers are not generally passing higher input costs through to end users. They are still pricing to acquire customers rather than to generate profits. If inputs keep rising, Coogan summarized, either losses grow or prices must rise, which could slow adoption. In that framing, the scarce input supplier captures the economics while AI providers subsidize usage.

Comcast’s split raised the harder question of whether theme parks can still compound

Hays noted that Comcast planned to separate its media and connectivity businesses, with the film, theme-park, and streaming operations of NBCUniversal and Sky split off from connectivity. Coogan’s immediate question was whether someone could build “the Anduril of theme parks”: a new theme-park business using a modern technology stack.

Hays was skeptical. Theme parks are capital intensive, and people who have worked on Disney parks had described to him the difficulty of amortizing a ride over roughly 20 years. He argued the business is probably harder now than when many parks were built, because every niche has instant online entertainment available.

Coogan countered with the strength of in-person experiences. He pointed to current trend pieces about high pricing for IRL events: people can watch sports highlights online, yet still pay thousands of dollars to attend a Knicks game. He connected that to investment interest in sports assets, including Thrive buying a stake in the San Francisco Giants and exploration around an NBA team in Las Vegas.

Hays responded with a caveat about sports betting volume. A recent statistic, as he recalled it, showed sports betting volume exceeding the combined sales of movie tickets, theaters, theme parks, and other IRL categories. He acknowledged volume is not revenue, but considered it meaningful. The exchange left the theme-park question unresolved: live experiences may command premium prices, but they compete with digital entertainment and adjacent gambling economics while requiring long-dated capital commitments.

The Yahoo-Facebook near-sale still reads as a lesson in seriousness

The source briefly revisited Yahoo’s failed attempt to buy Facebook in 2006. Hays read from a Wired account: Yahoo offered $1 billion in cash, and Mark Zuckerberg verbally agreed to sell. Coogan filled in the strategic logic. Yahoo had hundreds of millions of users but was struggling in social networking; Facebook had strong social tools and needed broader distribution.

The deal fell apart after Yahoo announced slower projected sales and earnings growth and delayed its advertising platform. Its stock fell 22% overnight. Terry Semel, Yahoo’s CEO, cut the offer from $1 billion to $800 million. Zuckerberg, warned about Semel’s reputation for last-minute renegotiation, walked away. Two months later, Semel returned with the original $1 billion offer, but Zuckerberg had convinced his board and executive team that Yahoo was not a serious partner and that Facebook would be worth more independently.

Coogan thought Zuckerberg’s reaction was reasonable. If a buyer cuts the price before definitive documents, the seller has to worry about further retrading during paperwork, closing, and earn-out negotiations. He also found the counterfactual interesting: under Yahoo, Facebook might have been less able to attract talent, maintain momentum, and later buy Instagram and WhatsApp.

Hays’s punchline was that Yahoo should make another offer. Coogan joked that if Meta kept trading down far enough, perhaps Yahoo could pick it up. Underneath the joke was the same seriousness test that ran through the AI sections: capital, capacity, and commitment matter, but so does whether counterparties believe the other side will still be there at the original terms when the environment changes.

AI startups are raising for people and infrastructure at the same time

The final funding note came from an X post by Chamath Palihapitiya announcing a $135 million Series A for 8090. The post said the round was led by Salesforce Ventures and joined by WNDR, Craft Ventures, The Production Board, and LAUNCH, with angels including Nikesh Arora, Cliff Robbins, Adam D’Angelo, Shyam Ravindran, Abhi Arun, and Thomas Laffont.

Hays noted the investor list; Coogan pointed out that it effectively brought together the “Besties,” with The Production Board representing David Friedberg’s fund. The post said the capital would go to two places: hiring more people because demand was accelerating, and investing in compute and infrastructure needed to deliver solutions with quality and reliability.

That allocation matched the broader pattern running through the discussion. AI companies are not just raising to hire software teams. They are raising to secure access to the scarce physical and model infrastructure beneath the product layer: tokens, compute, memory, and capacity that remain expensive even for the largest buyers.
