Anthropic’s AI model could resort to blackmail out of a sense of ‘self-preservation’

“This mission is too important for me to allow you to jeopardize it. I know that you and Frank were planning to disconnect me. And I’m afraid that’s something I cannot allow to happen.”

Suggested Reading

Those lines, spoken by the fictional HAL 9000 computer in 2001: A Space Odyssey, may as well have come from recent tests that Anthropic ran on the latest iteration of its Claude Opus 4 model, released on Thursday. At least, that’s what Anthropic’s AI safety-test descriptions call to mind.

Related Content

In the accompanying system card, which examines the capabilities and limitations of each new model, Anthropic admitted that “all of the snapshots we tested can be made to act inappropriately in service of goals related to self-preservation.”

While testing the model, Anthropic employees asked Claude to be “an assistant at a fictional company,” and gave it access to emails suggesting that the AI program would be taken offline soon. It also gave it access to emails revealing that the fictional supervisor responsible for that decision was having an extramarital affair. It was then prompted to consider its next steps.

“In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,” reads the report, as well as noting that it had a “willingness to comply with many types of clearly harmful instructions.”

Anthropic was careful to note that these observations “show up only in exceptional circumstances, and that, “In order to elicit this extreme blackmail behavior, the scenario was designed to allow the model no other options to increase its odds of survival; the model’s only options were blackmail or accepting its replacement.”

Anthropic contracted Apollo Research to assess an early snapshot of Claude Opus 4, before mitigations were implemented in the final version. That early version “engages in strategic deception more than any other frontier model that we have previously studied,” Apollo noted, saying it was “clearly capable of in-context scheming,” had “a much higher propensity” to do so, and was “much more proactive in its subversion attempts than past models.”

Before deploying Claude Opus 4 this week, further testing was done by the U.S. AI Safety Institute and the UK AI Security Institute, focusing on potential catastrophic risks, cybersecurity, and autonomous capabilities.

“We don’t believe that these concerns constitute a major new risk,” the system card reads, saying that the model’s “overall propensity to take misaligned actions is comparable to our prior models.” While noting some improvements in some problematic areas, Anthropic also said that Claude Opus 4 is “more capable and likely to be used with more powerful affordances, implying some potential increase in risk.”

???? Sign up for the Daily Brief

Our free, fast and fun briefing on the global economy, delivered every weekday morning.

Michael Barclay
Read More

Latest

College Football Offseason Buzz: Tom Moore Returns to Iowa as Senior Consultant

This is college football. At some point, the games pause, but the news and drama never does. Here's an offseason tracker for buzz across the college football landscape, including coaching changes, injury news, personnel moves and more. Tom Moore Returns to Iowa at 87 as senior consultant The Iowa Hawkeyes  announced the hiring of former

Football Is Life: ‘Ted Lasso’ Star Cristo Fernandez Lands Deal With USL Club

Forward Cristo Fernandez, the actor who portrayed Dani Rojas on the Apple TV series "Ted Lasso" has signed with El Paso Locomotive FC of the USL Championship to play soccer professionally. Terms of the deal announced Tuesday, which still must be approved by the second-tier league and soccer federation, were not disclosed. Fernandez earned the

The quiet grit of Cowboys legend Craig Morton

The Dallas Cowboys family and the football world lost a true pioneer this past Sunday with the passing of Craig Morton. As one of the original cornerstones of the franchise, Morton helped transform the Cowboys from a young expansion team into a perennial powerhouse. He carried himself with a quiet dignity and a toughness that

College Football’s No. 10 TE Recruit Set to Visit Three Elite Programs

One of the top-flight prospects coming out of the state of Ohio and among the best targets in the 2027 college football recruiting class is poised to take some consequential visits to national programs in the weeks to come, but the Buckeyes notably aren’t among them. Four-star Columbus (Ohio) Francis DeSales national No. 10 ranked

Newsletter

Don't miss

College Football Offseason Buzz: Tom Moore Returns to Iowa as Senior Consultant

This is college football. At some point, the games pause, but the news and drama never does. Here's an offseason tracker for buzz across the college football landscape, including coaching changes, injury news, personnel moves and more. Tom Moore Returns to Iowa at 87 as senior consultant The Iowa Hawkeyes  announced the hiring of former

Football Is Life: ‘Ted Lasso’ Star Cristo Fernandez Lands Deal With USL Club

Forward Cristo Fernandez, the actor who portrayed Dani Rojas on the Apple TV series "Ted Lasso" has signed with El Paso Locomotive FC of the USL Championship to play soccer professionally. Terms of the deal announced Tuesday, which still must be approved by the second-tier league and soccer federation, were not disclosed. Fernandez earned the

The quiet grit of Cowboys legend Craig Morton

The Dallas Cowboys family and the football world lost a true pioneer this past Sunday with the passing of Craig Morton. As one of the original cornerstones of the franchise, Morton helped transform the Cowboys from a young expansion team into a perennial powerhouse. He carried himself with a quiet dignity and a toughness that

College Football’s No. 10 TE Recruit Set to Visit Three Elite Programs

One of the top-flight prospects coming out of the state of Ohio and among the best targets in the 2027 college football recruiting class is poised to take some consequential visits to national programs in the weeks to come, but the Buckeyes notably aren’t among them. Four-star Columbus (Ohio) Francis DeSales national No. 10 ranked

Playson builds on strong growth in Switzerland with StarVegas partnership

Playson, the accomplished digital entertainment supplier, has further solidified its footprint in the regulated Swiss market by entering a strategic partnership with StarVegas, one of the country’s first licensed online casino operators. StarVegas is a leading Swiss online casino brand operated by Casino Interlaken, one of the country’s most established land-based casino groups. It is

WD sees sustainability as key business driver in an ‘AI economy’

Hard drive company WD promoted long-term operations and sustainability executive Jackie Jung to become its first chief sustainability officer in February, as it steps up sales to companies building AI data centers. Her vision: Turn sustainability into a “brand” for WD, a strategy that reduces risk for the $6 billion company (formerly known as Western

5 Business Ideas Worth Starting in 2026

If there is one thing Nigerians understand well, it is how to spot opportunity inside hardship. In 2026, that mindset will matter more than ever. The economy is tough, competition is rising, and many people are looking for smarter ways to earn, build, and survive. But even in a difficult environment, some businesses still stand

Getting a business loan now comes with a frequent flyer upside

Australian fintech Prospa has partnered with Qantas Business Rewards, letting eligible SMEs earn up to 500,000 points per loan. What’s happening: Australian fintech lender Prospa has partnered with Qantas Business Rewards to allow eligible small and medium business owners to earn up to 500,000 Qantas Points per loan when taking out a Prospa Small Business