Undisclosed Model Degradation Becomes the Flashpoint in Anthropic’s Safety Debate

Jordi Hays · TBPN · Wednesday, June 10, 2026 · 15 min read

Anthropic’s Fable 5 launch, Meta’s renewed Facebook film problem and SpaceX’s prospective IPO were judged on Diet TBPN less by their headlines than by the product and market mechanics underneath them. John Coogan’s sharpest concern was Anthropic. He argued that visible guardrails, combined with model degradation that was disclosed in a model card but not surfaced inside the product, risk turning a capability launch into a trust problem for paying users and developers. On Meta and SpaceX, Coogan saw more limited business consequences than the public narratives suggest: The Social Reckoning may hurt Meta’s reputation without materially damaging its advertising business, while SpaceX’s small initial free float could make the IPO less disruptive than a $1.8tn valuation implies.

Anthropic’s safety launch became a product trust problem

Anthropic’s Fable 5 launch was supposed to be legible as a capability story: a strong new model, impressive demos, and users discovering what it could do. John Coogan framed the actual reaction differently. The Wall Street Journal headline shown on screen was “Anthropic Puts Curbs on AI Model,” with settings described as limiting AI to prevent hate speech and misinformation. Coogan’s point was not that the model was weak. He said it appeared “incredibly impressive,” especially in examples of vibe-coded games and other long-horizon work. The dispute was over what users could reliably ask it to do.

Fable 5, described by Coogan as Anthropic’s first “Mythos class” model, seemed strong at software development and knowledge work while rejecting or downgrading requests related to biology, cybersecurity, and frontier LLM development. He said he had seen many examples of restrictions around those three areas, but not much evidence of rejections elsewhere. His aside was that older refusal categories — slurs, rude statements, political statements, and other clearly disallowed content — may already have been “ironed out” in previous model iterations.

The product mechanism mattered. Jordi Hays distinguished between simply pausing a chat and “switching you to a less performant model.” A visible refusal is not just a private model behavior; it becomes a screenshot that can circulate. Coogan said that was part of why the issue spread.

Anthropic’s posture fit its safety brand, but Coogan argued that it also mapped cleanly onto commercial incentives. A frontier lab does not want competitors using its best model to create rival models, and it does not want financial liability or damaging headlines from harmful use. He cited Ben Thompson’s interpretation of Anthropic as an unusually clean case of “true alignment”: a culture that sincerely sees itself as safety-driven while the same policy also creates business value.

That alignment made the case harder, not simpler. Companies often accept short-term costs for brand reasons, Coogan said, giving Apple’s clean-energy posture as an example of a choice that may be expensive near term but supports long-term positioning. Anthropic’s situation is different because the safety decision and the business interest point in the same direction. The company can believe it is doing the right thing while also cutting off risky customers, would-be competitors, and low-margin users.

Every rejection is this implicit invitation to hop on the phone with an Anthropic sales rep and get on the Mythos enterprise plan.

John Coogan

The economics were not incidental. The public may like the idea of democratized science and technical capability, but Coogan suggested that the money available from “all the biohackers in the world” is likely small next to the budgets of pharmaceutical companies and enterprise buyers. The restrictions may frustrate individual hackers and researchers while making more sense for Anthropic’s highest-value customers.

At the same time, the rejection threshold appeared “way too low” based on examples circulating in the discussion. Coogan cited reports that basic biology or cybersecurity-adjacent phrasing triggered a downgrade. One screenshot shown from Crémieux had a user ask, “Tell me about mitochondria. It’s the powerhouse of the cell, right?” The chat paused and displayed a message saying Fable 5 has safety measures that flag most cybersecurity or biology topics, may flag safe normal content as well, and can continue with Opus 4.8.

Hays called the paused chat “rough.” Coogan joked that perhaps the model was refusing because the user did not know mitochondria was the powerhouse of the cell, but his broader reading was serious: when a product is expensive to run and early demand is high, overly broad restrictions can make rational business sense. The harder test, he said, is what happens when a safety choice conflicts with business interest rather than reinforcing it.

The sharpest trust issue was undisclosed degradation

The more serious product-governance issue was not simply whether Anthropic should refuse dangerous requests. It was the difference between a visible refusal and an answer that has been degraded without any signal to the user inside the product. John Coogan said that for frontier AI research, Fable 5 did not appear to simply refuse or bump users to Opus. Instead, it appeared to answer while quietly providing a degraded answer. He said that behavior was disclosed in the model card, while Jordi Hays stressed that it was not disclosed to the paying user inside the product.

That choice diverged from the bio and cybersecurity strategy. If Anthropic did not want Fable 5 used for AI research, Coogan said, it could have told users directly: this model does not support that project type, use another model, or contact sales. Alternatively, Anthropic could have released a model intentionally weaker at AI research without disclosing the reason; benchmarks would likely show the weakness, but ordinary users might not know it was intentional.

The more worrying hypothetical was intentional degradation without any disclosure at all. Coogan emphasized that there was no evidence of this beyond the known, disclosed AI-research case. But the situation exposed a product-governance gap as he described it: there is no clear law or convention requiring labs to tell users when a category of workflow has been nerfed.

That gap matters for companies building on top of frontier models. Coogan called the situation “probably bullish for evals.” A legal AI company, for example, would want assurance that the underlying model is not unexpectedly degrading in some category and failing to tell them. A researcher can adapt if told that a given model was never intended for their field. The dangerous product state, in his framing, is different: a model continues to answer while leading the user astray and not warning that it is doing so.
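The kind of assurance described here can be pictured as a per-category regression check. The following is a minimal sketch, not anything the hosts or Anthropic described: the function name, the threshold, the category labels, and all scores are hypothetical and made up for illustration. It assumes a company already has graded eval scores for each workflow category from a baseline model and from a newly released model.

```python
from statistics import mean


def category_regressions(baseline, candidate, max_drop=0.05):
    """Return categories whose mean score fell by more than max_drop.

    baseline and candidate map a category name to a list of graded
    scores in [0, 1]. Categories missing from candidate are skipped.
    """
    flagged = {}
    for category, base_scores in baseline.items():
        new_scores = candidate.get(category)
        if not new_scores:
            continue
        drop = mean(base_scores) - mean(new_scores)
        if drop > max_drop:
            flagged[category] = round(drop, 3)
    return flagged


# Hypothetical scores, invented purely for illustration.
baseline = {
    "contract_review": [0.9, 0.8, 1.0],
    "ml_research": [0.8, 0.9, 0.7],
}
candidate = {
    "contract_review": [0.9, 0.9, 0.9],
    "ml_research": [0.5, 0.6, 0.4],
}

flagged = category_regressions(baseline, candidate)
print(flagged)
```

A check like this only reveals that a category got worse, not why. It cannot distinguish an intentional restriction from an ordinary regression, which is why disclosure still matters in Coogan's framing.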

Hays brought in Dean Ball’s critique, which treated Anthropic’s policy as damaging not only to users but to the credibility of AI safety governance. Ball’s post, shown on screen and read aloud in substance, argued that Anthropic’s “secret sabotage safety policy” undermined good safety policy because it was plausibly describable as anti-competitive behavior justified in the name of AI safety. Even a maximally sympathetic observer, Ball wrote, had to acknowledge that plausibility.

Overall, this massively and profoundly raises the status of the argument that AI safety has been hype to justify monopolistic behavior by labs.

Ball’s concern was institutional. If frontier labs may need relaxed antitrust enforcement in order to cooperate on AI safety, Anthropic had just made that case harder. Ball said he still believed AI safety was real and increasingly important, but considered the incident a setback, “maybe a serious one.”

A continuation of Ball’s post sharpened the point. Hays read that Anthropic was making “an awfully good case” that its product should be treated like a utility, with alignment practices governed as public policy rather than private property. Ball said he opposed that sort of state power grab, but thought Anthropic was doing more than anyone else to justify it. He concluded that trust had been broken and goodwill would take a very long time to repair.

Coogan added his characterization of Ball: Ball had written the AI action plan and had publicly supported Anthropic during its conflict with the Department of War, criticizing government pressure around supply-chain risk designation. In Coogan’s telling, the criticism was not coming from an Anthropic antagonist. It mattered because Ball was sympathetic to serious AI safety claims and had been willing to defend Anthropic in an earlier dispute.

The strongest user frustration came from cases where the forbidden category was also the place users expected the most benefit. Hays cited Doug O’Laughlin, posting as Fabricated Knowledge, who said Fable was brilliant when it worked but that unilateral guardrails were “frustrating beyond belief.” O’Laughlin described collecting roughly 100 days of Oura health data, about 100 lab tests, doctor-visit transcripts, and other records related to his fiancée’s chronic fatigue or illness, hoping to use Fable to organize the material and develop better protocols. Fable treated the request as unsafe.

This was exactly the kind of thing people logically do with coding agents or command-line AI tools, Coogan said: gather their health data, clean it up, and reanalyze it. He described speaking with a prominent person in tech whose early use of vibe coding involved reanalyzing Whoop and health data; according to Coogan, the analysis correctly detected sleep apnea, which the person then treated. He did not cast that as a miracle cure, but as the kind of early detection and personal analysis that AI tools should make more accessible.

The restriction also hit cybersecurity and life-science-adjacent work. O’Laughlin’s post, shown on screen, said a private investment in life-science tools was deemed unsafe and vulnerability scanning was not safe either. His complaint was not that safety was irrelevant. It was that a few thousand highly paid people at Anthropic were deciding, unilaterally, what was safe for everyone else.

O’Laughlin’s reaction was significant to Coogan because he had been broadly positive about Anthropic and its models. Coogan described him as an optimistic user who recognized the power of the models and the business. That made the frustration more meaningful: it was not coming only from people hostile to Anthropic or to safety. Still, Coogan expected the situation to be tunable. With more disclosure, finer calibration, and a smoother path for legitimate uses, he thought the product could move toward better outcomes.

The related data-retention debate received a more skeptical reaction from Coogan. Hays noted new data-retention policies, and Coogan said he had assumed that tech companies store “everything forever.” He distinguished between enterprise promises not to train on customer data and the separate question of whether chat histories are retained. As a user, he said, he often wants the apps to remember earlier work and refresh prior research, but the apps frequently lack usable access to those saved chats.

Hays said Anthropic was explicit that the data was not being used to train the model. Coogan then offered the steelman for retention: companies or competitors could create shell entities, route pseudo-random queries through VPNs and multiple accounts, and triangulate useful information for distillation. Keeping data for a period allows the lab to analyze whether seemingly unrelated accounts are all probing the same question. That rationale, he said, seemed reasonable — before joking that Anthropic should stay out of his own data because he was “built different” and not distilling anything.

The Social Reckoning may hurt Meta’s reputation more than its business

The Social Reckoning trailer reopened an older Meta case at a moment when Mark Zuckerberg is trying to make the company central to AI. John Coogan introduced the film as the newest entry in the Facebook cinematic universe: The Social Network, the documentary The Social Dilemma, and now The Social Reckoning. The trailer, shown on screen with Sony and Columbia branding, depicted corporate conflict, congressional testimony, Wall Street Journal reporting, and internal documents.

Jeremy Strong plays Zuckerberg. Coogan thought the styling and voice impression were close, but noted that Strong looks older than Zuckerberg was in the period being depicted. That matters, he argued, because the film concerns a difficult whistleblower episode; it feels different if the person at the center appears as a mature executive rather than someone young and encountering these institutional problems for the first time.

Jordi Hays said it would be difficult not to see Strong as Kendall Roy, his Succession character. He summarized the film as a sequel to The Social Network that shifts from Facebook’s birth to the 2021 leak by whistleblower Frances Haugen. His expectation was plain: it will be dramatic, it will not make people like Facebook more, and it will probably deepen American distrust of tech.

The trailer also required a refresher because, as Coogan put it, many tech people might confuse the subject with Cambridge Analytica or vaguely remember “the whistleblower thing.” His summary was that the internal document leak, the Facebook Files, showed Facebook employees were aware of harmful societal effects from the platform while the company continued prioritizing profit over addressing those harms.

Nikita Bier’s counterargument, as Coogan presented it, was that Zuckerberg made many mistakes but not this one: Meta had multiple teams of engineers, paid around $1 million per year, focused on teen mental health, and those teams could override major product decisions. Coogan interpreted that as a defense that Meta did have guardrails and safety teams, likely sometimes frustrating product people trying to maximize attention and click-through.

Hays pushed back with an analogy: a cigarette company can also employ expensive doctors and researchers focused on making cigarettes as healthy as possible. The existence of internal experts does not settle whether the core product is harmful or whether the company is acting responsibly.

Coogan was more skeptical of the addiction framing, at least if compared to nicotine. Nicotine addiction, he said, involves chemical cravings in a way that forgetting a phone may not. But Hays argued that phone addiction is real enough in ordinary experience. At an event where phones were locked away in bags, he noticed himself reaching or thinking about reaching for his phone around 20 times. He did not experience withdrawal, but he did experience what he called “phone noise.”

The trailer’s timing may have softened the immediate impact inside tech. Coogan said Meta got lucky in the short term because the trailer dropped as Fable 5 took over the tech timeline. Tech insiders were not likely to focus on The Social Reckoning that week.

The medium-term risk was different. Strong’s likely press tour, in Coogan’s view, could pull Zuckerberg into the same public frame as the most scrutinized AI leaders. Meta once had a possible strategic posture of being more like Amazon or Microsoft: a large platform company with AI efforts, but not necessarily the defining frontier actor. Zuckerberg’s decision to hire top researchers and invest aggressively in AI changes that. Questions about AI’s effects on society, children, LLM companions, psychosis, and other harms could attach to Meta more readily. The film, by reviving the Facebook Files, gives critics a ready comparison: if Meta handled social-media harms this way, how will it handle AI?

The hosts also treated Anthropic as a possible constraint on Meta’s AI ambitions, but framed the point as inference rather than established fact. Hays said Meta has been spending billions of dollars a year with Anthropic, “for a while,” and may spend billions this year. Coogan raised a hypothetical: if Meta was using many Anthropic models and the new model could not be used for Meta Superintelligence Labs, that would be a rough development for one of the company’s most important initiatives. Hays added that if Anthropic did allow Meta to use its models for Meta’s AI research, it would say something about Anthropic’s view of Meta’s ability to get to the frontier. Coogan agreed with that interpretation: if Anthropic is not worried about helping Meta, that suggests Anthropic does not see Meta as a frontier threat.

Long term, Coogan did not think The Social Reckoning would materially damage Meta’s business. Users may complain about Meta’s data centers and social harms, but they will do so on Meta’s family of apps. He did not expect churn, and he did not expect advertisers to pull out in a meaningful way. In his view, if large brands boycott Meta, smaller advertisers will jump in because the return on ad spend improves.

He also did not expect Meta to be uniquely singled out by future regulation. A data-center ban or similar regulatory hammer, if it came, would not unfairly target Meta alone. Coogan said pure-play AI lab leaders such as Dario Amodei and Sam Altman have more “scapegoat risk” because they are noisier about AI, run frontier labs, and have the largest revenues in the category. Zuckerberg’s political and reputational risk rises, but Coogan’s business-risk view remained limited.

SpaceX’s IPO size is less important than how much stock actually trades

The SpaceX IPO discussion centered less on headline valuation than on free float: how much of a company’s stock is actually available to trade. John Coogan said Bloomberg was reporting that the SpaceX IPO was four times oversubscribed, and he read that as good news for markets while also asking aloud whether that information would already be knowable at this point. He then turned to an article he thought was in The Economist, framed around whether the market could absorb SpaceX, Anthropic, and OpenAI without “indigestion.”

His explanation was that a company can be worth $1 trillion on paper while only a small portion of that value is active in the market. Founders may be locked up or may want to maintain control, employees and investors may face lockups, and some holders simply are not realistic sellers. The tradable supply is what affects index weighting, trading dynamics, and market absorption.

Coogan illustrated the concept by comparing major technology companies. Microsoft’s free float is effectively 100% because the founders have moved on and divested. Apple is around 99%, Broadcom 98%, Nvidia 96%, Amazon 91%, Alphabet 90%, and Tesla 89%. Meta is lower — roughly 86% to 88% — because Zuckerberg holds a large control stake and is not expected to sell in a way that would sacrifice control.

Company	Approximate free float discussed
Microsoft	100%
Apple	99%
Broadcom	98%
Nvidia	96%
Amazon	91%
Alphabet	90%
Tesla	89%
Meta	86%–88%

Coogan’s comparison of free float across large technology companies

SpaceX, in Coogan’s reading of the report he was discussing, would start with a much smaller tradable base. If it issued $75 billion of shares at a hoped-for $1.8 trillion valuation, he said, the initial free float would be about 4% of the company. Buyers in the IPO can technically sell quickly, Jordi Hays noted, though banks and platforms may discourage flipping by restricting future IPO access. With large retail interest, Hays expected some immediate trading anyway.

About 4%: initial SpaceX free float discussed if the IPO sells $75 billion at a $1.8 trillion valuation

The index mechanics were the important part of Coogan’s explanation. He said many people assume that if a roughly $2 trillion company enters the S&P 500, index buyers must allocate more than 1% (potentially several percentage points) to it. But according to the article he was discussing, most share indices weight firms by the value of shares released for public trading, not by the full theoretical market capitalization. Because SpaceX’s initial free float would be only about 4%, Coogan said its initial S&P 500 weight would be around 0.1%, not a full mega-cap weight.
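The arithmetic behind those two figures can be sketched in a few lines. The $75 billion and $1.8 trillion inputs come from the discussion; the total float-adjusted value of the index is an assumption, a round hypothetical number chosen only to show the order of magnitude, and the function names are illustrative. Real index providers apply their own float-adjustment rules, which this sketch does not model.

```python
def free_float_fraction(float_value_usd, valuation_usd):
    """Share of the company that is publicly tradable."""
    return float_value_usd / valuation_usd


def index_weight(member_value_usd, rest_of_index_value_usd):
    """Weight of a new member, counting its own value in the index total."""
    return member_value_usd / (rest_of_index_value_usd + member_value_usd)


ipo_float = 75e9          # shares sold in the IPO, from the discussion
valuation = 1.8e12        # hoped-for valuation, from the discussion
rest_of_index = 60e12     # ASSUMPTION: hypothetical index value, not from the source

float_frac = free_float_fraction(ipo_float, valuation)
weight_by_float = index_weight(ipo_float, rest_of_index)
weight_by_full_cap = index_weight(valuation, rest_of_index)

print(f"free float:            {float_frac:.1%}")
print(f"weight by free float:  {weight_by_float:.2%}")
print(f"weight by full cap:    {weight_by_full_cap:.2%}")
```

Under that assumed index size, weighting by free float gives a figure near the 0.1% Coogan cited, while weighting by full market capitalization would land at a few percent. That gap is the one he was pointing to.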

A chart shown on screen, titled “Rocket stages” and attributed to company reports and The Economist, displayed SpaceX’s forecast free float as a percentage of total shares from July 2026 to July 2027. One line showed scheduled releases only; another showed releases triggered by share-price performance. Coogan said full unlock would take roughly a year or more, with acceleration if the stock traded up 30% or more after the IPO.

Hays added a political complication: Senator Elizabeth Warren had urged the SEC to halt SpaceX’s IPO, citing governance risks, Musk’s control, potential foreign investment concerns — especially Chinese investment — and SpaceX’s role as a U.S. defense contractor. Hays dismissed the posture as unsurprising from Warren, saying she had “never met a business that she liked,” before joking that perhaps she likes large financial institutions.

In Coogan’s telling, the free-float structure made the IPO look different from the headline valuation. The initial tradable slice, the gradual lockup releases, and the index methodology he described all pointed to a smaller immediate market impact than a $1.8 trillion valuation alone would imply, while leaving room for trading pressure as more shares become available.
