On Anthropic’s Project Glasswing and Claude Mythos Preview
We formed Project Glasswing because of capabilities we’ve observed in a new frontier model trained by Anthropic that we believe could reshape cybersecurity. Claude Mythos Preview is a general-purpose, unreleased frontier model that reveals a stark fact: AI models have reached a level of coding capability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities.
Mythos Preview has already found thousands of high-severity vulnerabilities, including some in every major operating system and web browser. Given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely. The fallout—for economies, public safety, and national security—could be severe. Project Glasswing is an urgent attempt to put these capabilities to work for defensive purposes…
We do not plan to make Claude Mythos Preview generally available, but our eventual goal is to enable our users to safely deploy Mythos-class models at scale—for cybersecurity purposes, but also for the myriad other benefits that such highly capable models will bring. To do so, we need to make progress in developing cybersecurity (and other) safeguards that detect and block the model’s most dangerous outputs. We plan to launch new safeguards with an upcoming Claude Opus model, allowing us to improve and refine them with a model that does not pose the same level of risk as Mythos Preview.
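The safeguard concept described above can be sketched as a simple gate: a classifier assigns each model output a risk score, and outputs above a threshold are blocked instead of returned to the user. Everything below is a hypothetical illustration (the function names, the threshold, and the keyword heuristic are mine, not Anthropic’s); a production safeguard would use a trained classifier, not keyword matching.

```python
# Hypothetical sketch of an output-safeguard gate: each candidate model
# output is scored for risk, and anything above a threshold is blocked.
# The keyword heuristic below is purely illustrative.

RISK_THRESHOLD = 0.8

# Stand-in for a trained classifier: terms loosely associated with
# exploit development (illustrative only).
FLAGGED_TERMS = ("shellcode", "privilege escalation", "0-day")

def risk_score(output: str) -> float:
    """Return a risk score in [0, 1] for a model output."""
    hits = sum(term in output.lower() for term in FLAGGED_TERMS)
    return min(1.0, hits / len(FLAGGED_TERMS) + (0.5 if hits else 0.0))

def safeguard_gate(output: str) -> str:
    """Pass low-risk outputs through; replace risky ones with a refusal."""
    if risk_score(output) >= RISK_THRESHOLD:
        return "[blocked by safeguard]"
    return output

print(safeguard_gate("Here is a sorting function in Python."))
print(safeguard_gate("Step 1: inject shellcode for privilege escalation..."))
```

The design choice worth noting is that the gate sits outside the model: it filters outputs rather than relying on the model to refuse, which is why Anthropic can refine such safeguards on a lower-risk model before applying them to a Mythos-class one.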
Claude Mythos Preview, by every benchmark, is not a normal large language model. It consistently scores far above its peers in coding, reasoning, and computer use. It is arguably the most capable model ever made: even the maintainers of FFmpeg, the media framework underpinning nearly every video hosting site on the internet, assumed its patches were written by humans. I think the term “artificial general intelligence” is largely marketing, but if it were defined as a program that consistently outperforms humans, Mythos would qualify as AGI for coding. It has discovered decades-old security vulnerabilities in a matter of days, and has even figured out how to chain multiple vulnerabilities together into novel, sophisticated attacks. Mythos is unique.
Knowing this, it doesn’t come as a shock that Anthropic determined this model is not safe for public use. In fact, Anthropic believes it will never be safe for the public, but it also knows that models like Mythos will reach the market in the next few months. There are hundreds of researchers at OpenAI, Google, and, most worryingly, the Chinese labs hard at work building models similar to Mythos, and they will succeed. But Anthropic built a Mythos-class model first, and in line with its mission of building safe artificial intelligence, it has given early access to a handful of technology companies so they can harden their user-facing software. Perfect software security is a fleeting goal: no software will ever be free of vulnerabilities or bugs. But Anthropic wants to ensure that the vulnerabilities a Mythos-class model could exploit are patched before such a model becomes publicly available.
Anthropic, through this experiment, which the company calls “Project Glasswing,” also wants to learn how to constrain the model. The lab will, in a few months, publish Claude Opus 5 and Claude Sonnet 5, its next generation of pre-trained models, which will probably come close to Mythos’ capabilities. OpenAI is also working on GPT-5.5, a freshly pre-trained model expected to excel at coding and logic work beyond any current frontier model. These advances are imminent, and Anthropic must figure out how to develop guardrails the models cannot overcome. The models themselves must be aligned with human interests and robust enough to refuse malicious commands. And all of this must happen before the open-weight models catch up, both to remain competitive (remember the DeepSeek meltdown) and to let more security researchers prevent attacks.
This is a gargantuan, worrisome task. I’m not of the opinion that AI will take over the world anytime soon; as I have written many times, I think the humans in charge of the AI labs are far more dangerous than the technology itself, which seems fairly aligned with human interests. But I am worried that the open-weight labs will catch up too quickly, before major tech companies can fully address the vulnerabilities in their software. If, say, DeepSeek were to release an open-weight, Mythos-level model before the industry patches the vulnerabilities Mythos surfaces, a malicious state-sponsored actor could launch major attacks on Western infrastructure. With the Pentagon’s bird-brained leadership, not even the U.S. government can serve as a line of defense. These attacks are already, in some ways, taking place.
This is all quite concerning. The solution is for the private sector to make the best use of Mythos in the next few months, before this technology becomes mainstream, and for everyone, especially government agencies big and small, to update their software and open-source libraries. But I guess this is what we wanted: “AGI.”