Anthropic has sounded the alarm about its newest artificial intelligence model, the “Claude Mythos,” with executives warning that the system is so advanced it could pose a cybersecurity danger if made available to the general public, the New York Post reports.
Following a troubling internal review, the company revealed that the model demonstrated an unprecedented ability to identify and exploit vulnerabilities in critical infrastructure, including power grids, hospitals and power plants. According to Anthropic, Mythos “found thousands of high-severity vulnerabilities, including some in every major web browser and operating system.”
Instead of a public release, the company led by CEO Dario Amodei announced “Project Glasswing,” which will give access only to a select group of about 40 major organizations, including Amazon, Google, Apple, Nvidia, CrowdStrike and JPMorgan Chase, who will be able to use the system to identify and fix security vulnerabilities.
The approach reflects, according to some experts, a necessary compromise between innovation and risk. Roman Yampolskiy, an AI safety researcher at the University of Louisville, said the limited release might be the most practical option available.
The enterprise-only release is probably Anthropic’s best way to “give the technology to those who can fix the vulnerabilities, but not to the hackers who will find even more holes,” Yampolskiy told the NYP.
“Most likely, of course, there will be a leak of one kind or another,” he said. “Any level of restriction is preferable to full open access. Ideally, I wish this had never been developed in the first place. But development isn’t going to stop.”
Yampolskiy added that such systems are expected to grow more dangerous over time: “That’s exactly what we expect from these models: they will become more effective at developing hacking tools, biological weapons, chemical weapons, new weapons that we can’t even imagine.”
Anthropic’s own tests seem to confirm some of these concerns. In one case, Mythos allegedly escaped from a secure “sandbox” environment designed to restrict its access to the Internet. One researcher became aware of the breach only after he “received an unexpected email from the AI model while eating a sandwich in a park.” In another case, the system discovered a vulnerability in the OpenBSD operating system that had gone undetected for 27 years.
Despite these risks, Anthropic argues that Project Glasswing could strengthen US cyber defense capabilities, especially in the context of increased aggression from geopolitical adversaries such as Iran, China and Russia.
What Anthropic says
An Anthropic official said the selected organizations were chosen for their pivotal role in the global digital ecosystem. “We focused on organizations whose software represents the largest part of the global cyber attack surface,” the official said.
“These are companies that build and maintain the operating systems, browsers, cloud platforms and financial infrastructure that billions of people rely on every day,” the official added. “When you discover a vulnerability in their systems and it’s fixed, the patch protects all the users of that software — in many cases, hundreds of millions of people.”
The company is also in active discussions with US government officials about how Mythos could support both defensive and offensive cyber capabilities.
“The Claude Mythos Preview illustrates what is now possible for defenders at scale, and adversaries will inevitably try to exploit the same capabilities,” explained Elia Zaitsev, CTO at CrowdStrike.
What the critics say
However, not everyone is convinced that Anthropic’s actions match its warnings. Critics argue that heavily publicizing the model’s capabilities fuels hype rather than caution.
Perry Metzger, president of the AI Alliance for the Future policy organization in Washington, DC, said the company’s messages have spread “like wildfire.”
“You’d better pay for access to Glasswing or manage to get in, because only they are responsible enough to decide who should and shouldn’t have access. They’re the experts, after all,” Metzger said sarcastically. “I find the whole situation frustrating.”
Some critics have gone further, accusing Anthropic of “regulatory capture”: shaping future rules to its own advantage and to the disadvantage of competitors. Among those raising such concerns are figures in Washington, including President Trump’s AI adviser David Sacks.
“At every stage of the discussion about the emergence of AI, Dario Amodei believes that he, and only he, is qualified to decide what this technology can do and who can access it,” said Nathan Leamer, executive director of Build American AI. “He is a modern Solomon who alone will decide who regulates this area. Who needs public debate?”
Anthropic has denied these allegations, pointing out that Project Glasswing includes companies developing their own competing AI models. The company also highlighted its support for open-source security initiatives.
“We’ve made our most capable model available to AWS, Apple, Google, Microsoft and others to identify and fix vulnerabilities in their own systems, and we’ve prioritized the open-source community by donating $4 million to organizations like the Linux Foundation and the Apache Software Foundation,” the official said.
Some in the industry have drawn comparisons to earlier moments in AI development. An anonymous source noted that OpenAI warned in 2019 that the GPT-2 model was too dangerous for a public release, at a time when both Amodei and Jack Clark were still working there.