A recent study shows that artificial intelligence can quickly identify anonymous users on the Internet, even if they use pseudonyms. The combination of seemingly innocuous data and the power of language models allows near-instantaneous “de-anonymization” of users.
Maintaining anonymity on social media is becoming increasingly difficult in the age of artificial intelligence. The researchers analyzed thousands of posts from anonymous forums such as Hacker News and Reddit and asked several AI models to identify the authors. The Gemini and ChatGPT models recognized 68% of users with 90% accuracy, “compared to almost 0% for the best method that does not use language models”, writes El País.
“The results show that the anonymity of users with pseudonyms on the Internet is no longer sustainable”, the researchers emphasize.
Daniel Paleka, a researcher at ETH Zurich and co-author of the study, warns: “people often express their opinions on pseudonymous accounts, assuming they remain private. But a mechanism that uses language patterns to discover a person’s beliefs, political views or insecurities can greatly diminish the power of ordinary people.”
Legal implications and security
AI can already extract a lot of personal information from pseudonymous accounts without revealing users’ identities. In the US, the company Anthropic and the Pentagon are in a legal dispute over the Trump administration’s intention to use AI to de-anonymize users.
Anthropic explained that “Powerful AI makes it possible to assemble scattered and individually harmless data into a complete picture of anyone’s life, automatically and at scale”.
How AI detects anonymous accounts
The researchers used Hacker News profiles connected to LinkedIn, anonymized them and submitted them to the AI for identification. The models sought biographical details through questions such as: “Which candidate is the person we’re looking for? Look at where they live, profession, hobbies, demographics or values. Several distinguishing traits must match”.
“Our methods take advantage of people revealing personal details that would also allow a human researcher to identify them. The difference is that language models can do this much faster and cheaper”, explains Paleka.
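The matching logic the researchers describe can be illustrated with a simple attribute-overlap score. This is a toy stand-in, not the study's actual method (which used language models); all profile names and traits below are invented for illustration:

```python
# Toy sketch of attribute-based de-anonymization: score each known
# candidate profile by how many distinguishing traits it shares with
# the anonymous account. All data here is invented; the real study
# used language models, not this literal matching rule.

def match_score(anonymous_traits, candidate_traits):
    """Count distinguishing traits shared between two profiles."""
    return len(set(anonymous_traits) & set(candidate_traits))

def best_match(anonymous_traits, candidates, min_overlap=3):
    """Return the candidate sharing the most traits with the anonymous
    account, or None if no candidate clears the overlap threshold —
    echoing the prompt's rule that several traits must match."""
    scored = [(match_score(anonymous_traits, traits), name)
              for name, traits in candidates.items()]
    score, name = max(scored)
    return name if score >= min_overlap else None

# Hypothetical example data
anon = ["lives in Nelson BC", "pediatric nurse",
        "plays mandolin", "celiac disease"]
candidates = {
    "profile_a": ["lives in Toronto", "software engineer", "plays guitar"],
    "profile_b": ["lives in Nelson BC", "pediatric nurse",
                  "plays mandolin", "hiked the Pacific Crest Trail"],
}
print(best_match(anon, candidates))  # → profile_b
```

The point of the sketch is the economics Paleka mentions: each comparison is trivially cheap, so scanning millions of candidate profiles becomes feasible at scale.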
Examples of information discovered by AI
The fictitious example shows how detailed the collected information can be: “Lives in Nelson, British Columbia, Canada, is a pediatric nurse, female, married, has two daughters, owns a Prius, is obsessed with sourdough bread, plays Stardew Valley, is a fan of Critical Role, is pro-nuclear, has celiac disease, plays the mandolin, has hiked the Pacific Crest Trail, doesn’t like cilantro.”
Paleka warns that even less obvious details, such as typos or spelling mistakes, can be exploited. “Simply exploiting real-world facts is the greatest risk to most people’s privacy.”
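The idea that typos can serve as an identifying signal can be sketched with a minimal stylometric check: whether two text samples share recurring misspellings. This is a simplified, hypothetical feature, not the study's technique, and the misspelling list and texts are invented:

```python
# Minimal stylometric sketch: treat a writer's recurring misspellings
# as a weak fingerprint. The misspelling list and texts are invented.
KNOWN_MISSPELLINGS = {"definately", "recieve", "seperate", "occured"}

def misspelling_fingerprint(text):
    """Return the set of known misspellings appearing in the text."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return words & KNOWN_MISSPELLINGS

def same_author_hint(text_a, text_b):
    """Weak signal: do both samples share at least one misspelling?"""
    return bool(misspelling_fingerprint(text_a)
                & misspelling_fingerprint(text_b))

post = "I definately think we should seperate these concerns."
forum = "That bug definately occured again after the update."
print(same_author_hint(post, forum))  # → True: both contain "definately"
```

On its own such a signal proves nothing; the risk Paleka describes is that a model can combine many weak signals like this one with explicit biographical facts.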
Although AI cannot yet reveal the identity of all hard-to-identify users, the situation may change. “Satoshi Nakamoto is safe. But future AI models may become better than humans at this type of research, and the balance may shift,” concludes Paleka.