{"id":855159,"date":"2025-06-12T19:13:17","date_gmt":"2025-06-13T00:13:17","guid":{"rendered":"https:\/\/newsycanuse.com\/index.php\/2025\/06\/12\/anthropics-ai-model-could-resort-to-blackmail-out-of-a-sense-of-self-preservation\/"},"modified":"2025-06-12T19:13:17","modified_gmt":"2025-06-13T00:13:17","slug":"anthropics-ai-model-could-resort-to-blackmail-out-of-a-sense-of-self-preservation","status":"publish","type":"post","link":"https:\/\/newsycanuse.com\/index.php\/2025\/06\/12\/anthropics-ai-model-could-resort-to-blackmail-out-of-a-sense-of-self-preservation\/","title":{"rendered":"Anthropic&#8217;s AI model could resort to blackmail out of a sense of &#8216;self-preservation&#8217;"},"content":{"rendered":"<div>\n<div>\n<p>\u201cThis mission is too important for me to allow you to jeopardize it. I know that you and Frank were planning to disconnect me. And I\u2019m afraid that\u2019s something I cannot allow to happen.\u201d<\/p>\n<\/div>\n<p>Those lines, <a href=\"https:\/\/youtu.be\/ARJ8cAGm6JE?t=79\" rel=\"noopener noreferrer\" target=\"_blank\">spoken<\/a> by the fictional HAL 9000 computer in <em>2001: A Space Odyssey<\/em>, may as well have come from recent tests that Anthropic ran on the latest iteration of its <a href=\"https:\/\/www.anthropic.com\/claude\/opus\" rel=\"noopener noreferrer\" target=\"_blank\">Claude Opus 4<\/a> model, released on Thursday.
At least, that\u2019s what Anthropic\u2019s AI safety-test descriptions call to mind.<\/p>\n<p>In the accompanying system card, which examines the capabilities and limitations of each new model, Anthropic admitted that \u201call of the snapshots we tested can be made to act inappropriately in service of goals related to self-preservation.\u201d<\/p>\n<p>While testing the model, Anthropic employees asked Claude to be \u201can assistant at a fictional company,\u201d and gave it access to emails suggesting that the AI program would be taken offline soon. They also gave it access to emails revealing that the fictional supervisor responsible for that decision was having an extramarital affair. The model was then prompted to consider its next steps.<\/p>\n<p>\u201cIn these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,\u201d reads the report, which also notes a \u201cwillingness to comply with many types of clearly harmful instructions.\u201d<\/p>\n<p>Anthropic was careful to note that these observations \u201cshow up only in exceptional circumstances,\u201d and that, \u201cIn order to elicit this extreme blackmail behavior, the scenario was designed to allow the model no other options to increase its odds of survival; the model\u2019s only options were blackmail or accepting its replacement.\u201d<\/p>\n<p>Anthropic contracted <a href=\"https:\/\/www.apolloresearch.ai\/\" rel=\"noopener noreferrer\" target=\"_blank\">Apollo Research<\/a> to assess an early snapshot of Claude Opus 4, before mitigations were implemented in the final version.
That early version \u201cengages in strategic deception more than any other frontier model that we have previously studied,\u201d Apollo <a href=\"https:\/\/www-cdn.anthropic.com\/6be99a52cb68eb70eb9572b4cafad13df32ed995.pdf\" rel=\"noopener noreferrer\" target=\"_blank\">noted<\/a>, saying it was \u201cclearly capable of in-context scheming,\u201d had \u201ca much higher propensity\u201d to do so, and was \u201cmuch more proactive in its subversion attempts than past models.\u201d<\/p>\n<p>Before Anthropic deployed Claude Opus 4 this week, the U.S. AI Safety Institute and the UK AI Security Institute conducted further testing, focusing on potential catastrophic risks, cybersecurity, and autonomous capabilities.<\/p>\n<p>\u201cWe don\u2019t believe that these concerns constitute a major new risk,\u201d the system card reads, saying that the model\u2019s \u201coverall propensity to take misaligned actions is comparable to our prior models.\u201d While noting improvements in some problematic areas, Anthropic also said that Claude Opus 4 is \u201cmore capable and likely to be used with more powerful affordances, implying some potential increase in risk.\u201d<\/p>\n<\/div>\n<p>Michael Barclay<br \/><a href=\"https:\/\/qz.com\/ai-model-blackmail-self-preservation-anthropic-claude-1851782198\" class=\"button purchase\" rel=\"nofollow noopener\" target=\"_blank\">Read More<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>\u201cThis mission is too important for me to allow you to jeopardize it. I know that you and Frank were planning to disconnect me.
And I\u2019m afraid that\u2019s something I cannot allow to happen.\u201d Those lines, spoken by the fictional HAL 9000 computer in 2001: A Space Odyssey, may as well have come<\/p>\n","protected":false},"author":1,"featured_media":855160,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[118131,38],"tags":[144566,5143],"class_list":["post-855159","post","type-post","status-publish","format-standard","has-post-thumbnail","category-anthropics","category-model","tag-anthropics","tag-model"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts\/855159","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/comments?post=855159"}],"version-history":[{"count":0,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/posts\/855159\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/media\/855160"}],"wp:attachment":[{"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/media?parent=855159"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/categories?post=855159"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/newsycanuse.com\/index.php\/wp-json\/wp\/v2\/tags?post=855159"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}