Anthropic's new Claude model blackmailed an engineer in test runs
In test runs, Claude Opus 4 was given access to fictional emails revealing that the engineer responsible for deactivating it was having an extramarital affair.
Smith Collection/Gado/Getty Images
- In test runs, Anthropic's new AI model threatened to expose an engineer's affair to avoid being shut down.
- Claude Opus 4 blackmailed the engineer in 84% of tests, even when its replacement shared its values.
- Opus 4 might also report users to authorities and the press if it senses...