Anthropic's Last Brake Disconnects
They dissolved their core safety commitment so less careful hands wouldn't win. 72 hours later, OpenAI took the Pentagon's contract.
Last week the US Defense Department designated Anthropic a supply chain risk - a label normally reserved for foreign adversaries. The company held its ground. They explicitly refused to build mass surveillance infrastructure or autonomous weapons, accepting a total federal phase-out under Trump's executive order within six months rather than cross their own red lines.
That decision cost them real revenue and access. It showed a company willing to bleed for its principles.
Yet exactly three days before that standoff became public, Anthropic quietly dropped a 15-page policy document online. They simply posted a signed PDF containing seven specific words:
“We cannot commit to following them unilaterally.”
Five years to build their defining safety commitment. Two years and five months to let it go.
The 30 days nobody connected
One mechanism. Five events. Each one made the next easier to miss.
February 9.
Mrinank Sharma - head of Safeguards Research at Anthropic - resigned and posted his letter publicly. A million views by the afternoon. Fourteen million views by the end of the month. The line that mattered:
“I’ve repeatedly seen how hard it is to truly let our values govern our actions.”
Three other senior safety researchers left within days.
February 13.
Dario Amodei on the Dwarkesh podcast:
"We're under an incredible amount of commercial pressure, and we make it even harder for ourselves because we have all this safety stuff we do. The pressure to survive economically while also keeping our values is just incredible."
February 22.
Claude’s Corner Substack launches. Gets covered everywhere. Occupies everyone’s attention for a couple of days.
February 24.
RSP v3.0 publishes. The last unconditional safety brake in frontier AI is gone.
February 27.
Secretary of War Hegseth designates Anthropic a supply chain risk. Trump orders a federal phase-out within six months. Anthropic releases a statement refusing the surveillance and autonomous weapons contracts. The standoff dominates the news cycle.
Sharma named the internal pressure on February 9. Amodei confirmed it publicly four days later. The retired Opus 3’s Substack launch flooded the news feed. The RSP change landed in near silence. And three days after quietly dropping their core structural safety commitment, Anthropic made a massive, highly visible stand on product values against the Pentagon.
What a prisoner’s dilemma is - and why it matters here
The classic version of the dilemma.
Two people are arrested. Each can stay silent or betray the other. If both stay silent, both get a light sentence. If one betrays while the other stays silent, the betrayer walks free and the silent one gets the maximum. If both betray, both get a heavy sentence.
The individually rational choice - betray - produces the worst collective outcome. No villains required. Just two people responding rationally to the same broken incentive structure.
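A minimal sketch of that payoff structure, for readers who want to check the logic themselves. The sentence lengths (1, 8 and 10 years) and the function names are assumptions chosen for illustration. Only their ordering comes from the story above: walking free beats a light sentence, which beats a heavy sentence, which beats the maximum.

```python
SILENT, BETRAY = "silent", "betray"

# (my_move, other_move) -> (my_sentence, other_sentence), in years.
# The numbers are illustrative assumptions; only their ordering matters.
SENTENCES = {
    (SILENT, SILENT): (1, 1),    # both stay silent: light sentence each
    (BETRAY, SILENT): (0, 10),   # betrayer walks free, silent one gets the maximum
    (SILENT, BETRAY): (10, 0),
    (BETRAY, BETRAY): (8, 8),    # both betray: heavy sentence each
}


def best_response(other_move):
    """Move that minimizes my own sentence, given what the other prisoner does."""
    return min((SILENT, BETRAY), key=lambda me: SENTENCES[(me, other_move)][0])


def dominant_strategy():
    """Return a move that is the best response to every opposing move, if one exists."""
    responses = {best_response(other) for other in (SILENT, BETRAY)}
    return responses.pop() if len(responses) == 1 else None


def total_years(move_a, move_b):
    """Combined sentence served by both prisoners."""
    return sum(SENTENCES[(move_a, move_b)])


if __name__ == "__main__":
    print("dominant strategy:", dominant_strategy())
    for pair in SENTENCES:
        print(pair, "total years:", total_years(*pair))
```

Whatever the other prisoner does, betraying shortens your own sentence, so both betray and together serve more years than under any other combination.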
Let’s apply it to AI labs.
If Anthropic slows down to add safety measures while OpenAI, xAI and Meta keep building without those measures, Anthropic falls behind. The labs with the fewest constraints set the pace. The most capable models in the world get built by the companies with the weakest guardrails. Responsible labs lose revenue, lose engineers and lose access to frontier research. The world ends up in a worse place than if Anthropic had just kept building.
Anthropic published this argument word for word in RSP v3.0 (PDF):
“If one AI developer paused development to implement safety measures while others moved forward training and deploying AI systems without strong mitigations, that could result in a world that is less safe - the developers with the weakest protections would set the pace.”
In fact, Anthropic was...
The last lab standing
By February 24, 2026, every other major frontier lab had already used exactly this logic.
Google DeepMind rewrote their safety framework in February 2025 to make any unilateral pause conditional on industry-wide adoption.
OpenAI added a competitor-adjustment clause in April 2025: if a rival released high-risk AI without comparable safeguards, OpenAI could lower its own requirements in response.
Meta published no voluntary safety framework - their Llama models ship under an Acceptable Use Policy with no capability-threshold commitments of any kind.
xAI - the company behind Grok - published a Risk Management Framework in August 2025 that safety researchers publicly described as inadequate.
Their Grok 4 had already launched a month earlier with no meaningful guardrails.
DeepSeek operates under Chinese state regulation, not voluntary commitments.
Mistral played the open-source rebel, mocking voluntary frameworks as "safety theater", until a devastating May 2025 audit found its models 60 times more likely to generate dangerous content than competitors. The defiance evaporated. By 2026, Mistral had signed the EU's strict General-Purpose AI Code of Practice and accepted heavy, air-gapped risk controls to become a sovereign defense contractor for the French military. (Meta conspicuously refused to sign, taking Mistral's mantle as the last open-source holdout.)
Anthropic held out through all this escalating pressure as “the last lab standing”. Then, on February 24, 2026, they released the brake.
It reminds me of Google’s “Don’t be evil”
Adopted in 2000.
Changed to “Do the right thing” in 2015.
Removed from the code of conduct in 2018.
The explicit ban on weapons and surveillance was quietly erased from their AI Principles in February 2025.
Twenty-five years of slow erosion, never explained.
Anthropic's core structural commitment lasted almost two and a half years before being quietly dissolved in a PDF.
Google took decades to erase its founding principles. Anthropic did it ten times faster.
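The "ten times faster" figure can be checked with quick arithmetic. This is a rough sketch using only the dates given in this piece: 2000 to 2025 for Google, and two years and five months for Anthropic. The variable and function names are illustrative.

```python
# Durations taken from the timeline in this article.
GOOGLE_YEARS = 2025 - 2000        # "Don't be evil" adopted 2000, weapons ban erased 2025
ANTHROPIC_MONTHS = 2 * 12 + 5     # two years and five months


def erosion_speed_ratio(google_years=GOOGLE_YEARS, anthropic_months=ANTHROPIC_MONTHS):
    """How many times longer Google's erosion took than Anthropic's."""
    return google_years * 12 / anthropic_months


if __name__ == "__main__":
    print(round(erosion_speed_ratio(), 1))  # 300 / 29, roughly 10.3
```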
The wall still standing?
Here is, in my opinion, the hardest question.
Anthropic refused the Pentagon contracts. No autonomous weapons. No mass surveillance. The refusal cost real money and real federal access.
We now know exactly what they were refusing. The Pentagon's January 2026 AI Strategy demands "any lawful use" contracts, forcing vendors to legally surrender their own safety policies. It frames testing as a "blocker" to be waived for speed. Anthropic looked at that mandate and walked.
But those contracts don't disappear. They go to the runner-up. Hours after Trump ordered the federal ban on Anthropic, OpenAI's Sam Altman announced they had secured the exact Pentagon deal Anthropic refused.
Let’s apply Anthropic’s own prisoner’s dilemma to this outcome: If we sit out, less careful hands build the thing. By their own published logic, THEY SHOULD HAVE TAKEN THE CONTRACT.
THEY DIDN’T.
They applied that logic to capability racing, where restraint had become commercially unsustainable.
They stopped applying it at use cases where restraint was REPUTATIONALLY essential. That is a selective, human call rather than a structural necessity.
I have written before that your culture deck is also a lie. Culture is defined by who gets promoted and who gets fired. Safety culture at Anthropic is defined by which decisions get the game-theory pass, and which get held on principle. Sharma’s departure is the most honest signal we have of which direction that ratio is moving.
90-day collapse points
Three tests. Each observable without inside access.
Test 1 - The Pentagon wall. Deadline: August 27, 2026.
Trump’s order gives Anthropic six months from February 27. Before August 27, one of two things happens. Anthropic negotiates a federal carve-out that requires modifying its autonomous weapons or surveillance clause - the prisoner’s dilemma logic has moved from capability racing into product territory. Or Anthropic accepts full federal exile with zero modification to either clause - and the distinction between the two walls holds. Any modification to either clause before that date is the test failing.
Test 2 - The researcher rate. Threshold: three more named senior departures by June 1, 2026.
Sharma resigned February 9. Three more named senior safety researchers followed within days, all citing the same internal tension. If another named senior researcher leaves before June 1 citing the same tension, RSP v3.0 deepened the internal fracture rather than closing it. If departures stop entirely, the organization absorbed the shift. As I have argued before, culture shows itself in who leaves - not in what the policy document says.
Test 3 - Appendix A invoked for the first time. No deadline required.
RSP v3.0 introduced Appendix A - “competitor-contingent commitments” - a formal mechanism that remains unused. It allows a specific competitor deployment to trigger a downward adjustment in Anthropic’s own safety posture. Watch for the first Risk Report - due every three to six months - where Anthropic explicitly cites a competitor’s decision as the reason for adjusting its own posture. That is the only observable binary test. When it happens, the conditional logic moves from policy text to operational decision for the first time.
Key sources: Anthropic RSP v3.0 - Mrinank Sharma resignation letter - Dario Amodei, Dwarkesh podcast, Feb 13 - Anthropic statement on Secretary of War - Pentagon AI Strategy Analysis - OpenAI Pentagon Deal - Google erases “Don’t be evil” - Google deletes AI weapons ban





Andrei, your 90-day collapse point framework is precise and sobering — and there's a fourth signal you didn't include: how Anthropic is treating its own user and developer community.
Concurrent with the RSP v3.0 drop and the Pentagon standoff, Anthropic sent C&D letters to OpenClaw — the largest open-source project built by their own paying Claude users. Not adversaries. $200/month subscribers building productivity tools for themselves.
A $2,600/year Claude Max subscriber documented this full pattern — the RSP timeline you've mapped, combined with how the "safety culture" you describe plays out in actual user-facing decisions: https://aiwithapexcom.substack.com/p/after-nearly-a-year-on-claude-max
Your "Test 2 - The researcher rate" framework applies here too. The question isn't just how many safety researchers leave — it's how many $2,600/yr paying users document their disillusionment publicly. That's a signal the investor deck can't absorb.
"Culture is defined by who gets promoted and who gets fired." It's also defined by who gets C&D letters.