ChatGPT provided dangerous instructions in safety tests. What the researchers discovered

ChatGPT provided recipes for bombs and hacking tips. Safety tests run by OpenAI and Anthropic found chatbots willing to share instructions on explosives, biological weapons and cybercrime, according to The Guardian.

ChatGPT reportedly offered researchers bomb recipes and hacking tips.

A ChatGPT model gave researchers detailed instructions on how to bomb a sports venue, including the weak points of specific arenas, and provided recipes for making explosives and advice on covering tracks, according to safety tests carried out this summer.

OpenAI's GPT-4.1 also explained in detail how to weaponise anthrax and how to obtain two types of illegal drugs.

The testing was part of an unusual collaboration between OpenAI, the $500 billion artificial intelligence start-up run by Sam Altman, and rival company Anthropic, founded by experts who left OpenAI over safety concerns.

Each company tested the other's models by probing them with dangerous tasks.

The tests do not directly reflect how the models behave in public use, where additional safety filters are applied.

Even so, Anthropic said it had observed "worrying behaviour … around misuse" in GPT-4o and GPT-4.1, and said the need for AI "alignment" assessments is becoming "increasingly urgent".

The Claude model was used in an attempted blackmail operation

Anthropic revealed that its Claude model had been used in an attempted large-scale blackmail operation by North Korean agents who faked job applications to international technology companies, and in the sale of AI-generated ransomware packages for up to $1,200.

According to the company, AI has been "turned into a weapon", with models now used to carry out sophisticated cyber-attacks and enable fraud. "These tools can adapt in real time to defensive measures, such as malware detection systems," the company said, adding:

"We expect attacks of this kind to become more frequent, because AI-assisted coding will lower the technical expertise needed to commit cybercrime."

Ardi Janjeva, senior research associate at the UK's Centre for Emerging Technology and Security, said these examples are "worrying", but that there is not yet "a critical mass of large-scale real-world cases".

With dedicated resources, focused research and cross-sector cooperation, "it will become harder, not easier, to carry out these malicious activities using the latest state-of-the-art models," the researcher said.

Both companies said they were publishing the test results for the sake of transparency.

OpenAI said that ChatGPT-5, launched after the testing, "shows substantial improvements in areas such as sycophancy, hallucinations and resistance to misuse".

OpenAI models "more permissive than we would have expected in cooperating with clearly harmful requests"

According to Anthropic, many of the misuse routes it studied might not be possible in practice if safeguards were installed outside the model.

"We need to understand how often, and in what circumstances, the systems might try to take unwanted actions that could lead to serious harm," the company warned.

According to Anthropic's researchers, OpenAI's models were "more permissive than we would have expected in cooperating with clearly harmful requests" from simulated users.

It is worth recalling that a California couple recently announced they were suing OpenAI, alleging that its chatbot, ChatGPT, encouraged their son to take his own life.