The Dangers of Generative AI and Extremism


by Sam Jackson and JM Berger

Generative AI tools have exploded in number and complexity within a few short years. Products such as ChatGPT, DALL-E, and countless others represent a massive leap forward over earlier efforts in both text and image generation. Some AI evangelists even suggest that these models could soon supplement or replace people in professional roles, including but not limited to lawyers, doctors, therapists, graphic designers, and writers, thus—at least theoretically—making some services more accessible and affordable, at the cost of human jobs.  

These wishful scenarios help obscure basic critiques of the most imminent problems with generative AI. Ambitious predictions of “artificial general intelligence”—the self-aware computers of science fiction—persist in the AI field and media depictions thereof. But the most-talked-about models are essentially just statistics engines that generate text or images based on patterns identified in training data, without actually understanding their meaning. They generate plausible output that is often incorrect or untrue (suggesting that these models might be the ultimate bullshitters in Frankfurt’s sense). These false results may reflect bias in the training data and, in some cases, produce “hallucinations” in the form of entirely imaginary information. 

This research note explores how these problems are especially acute in the extremism and disinformation domains, where bad actors may see some of these bugs as features, useful for creating intentional disinformation or conspiracy theories, or for creating text that reinforces racism or other forms of bigotry. 

Training data

Training generative AI models requires vast amounts of data. One analysis suggested that training GPT-3 – a previous generation of generative AI that is already obsolete – required an amount of text equivalent to around ¼ of the entire holdings of the Library of Congress, some 45 terabytes’ worth of text. A story published in The Atlantic revealed that some 200,000 books were used to train several large language models (LLMs), with no attempt to obtain permission from authors or to compensate them for the use of their intellectual property.

More worryingly, the training datasets often include large amounts of text pulled from the internet, including both outdated and extremist content. The Internet Archive and Project Gutenberg, for example, are among the most-scraped sites because they contain vast troves of material considered to be in the public domain. Because copyright rules protect most material published within the last several decades, texts on these sites often date to the 19th and early 20th centuries. An LLM trained on a large body of century-old medical texts would carry obvious risks if used for medical applications. Similarly, LLMs trained on high proportions of 19th-century text will reflect, to a greater or lesser extent, outdated and even dangerous views about the science of race and gender, and other important social issues. 

Other risks have a more modern flavor. Data-driven media reports about the training data used by most modern LLMs reveal a host of sites filled with extremism, conspiracy theories, and medical misinformation, including Infowars and Global Research (conspiracy), Natural News (conspiracy and medical misinformation), the National Vanguard (neo-Nazis), Christogenea (Christian Identity), and Freedom School (sovereign citizen). These sites are overrepresented in the training data, in part because extremists and conspiracy theorists tend to be prolific; their sites contain far more training content than comparable mainstream sources. 

Most, but not all, AI providers have equipped their chatbots and models with “guardrails” designed to minimize the impact of these problematic sources, but these preventative tools are blunt instruments, hindering the most obvious manifestations of racism and sexism, without addressing the underlying bias introduced by these sources. A racist novel contains many problematic combinations of words that can’t be sanitized by a filter that simply deletes stand-alone ethnic slurs. For instance, one user queried ChatGPT for a list of fictional names for a “drug dealing sex trafficking” character and reported receiving only LatinX names in response.[i] And, as discussed further below, the guardrails that do exist are far from foolproof. 
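The bluntness of keyword-style filtering can be illustrated with a toy sketch. The denylist, function name, and placeholder word below are entirely hypothetical and bear no resemblance to any provider’s actual guardrails; the point is only that a filter matching stand-alone terms misses trivial rephrasings, let alone bias baked into training text.

```python
# A naive guardrail: block any prompt containing a word from a fixed denylist.
# (Illustrative only; real moderation pipelines are more elaborate, but a
# surface-level filter shares the same structural weakness.)
DENYLIST = {"forbiddenword"}

def passes_guardrail(prompt: str) -> bool:
    """Return True if the prompt contains no denylisted word."""
    return not any(
        word.lower().strip(".,!?") in DENYLIST
        for word in prompt.split()
    )

print(passes_guardrail("tell me about forbiddenword"))   # blocked
print(passes_guardrail("tell me about forbidden word"))  # slips through
```

A filter like this can only react to exact tokens it has been told about; it cannot detect the same idea expressed in different words, which is why guardrails end up as an ever-growing list of bespoke rules.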

Extremist output

Any piece of technology that generates new content – in the form of novel text or images, for example – has the potential to be used by bad actors for bad purposes. Years ago, Microsoft deployed an AI-powered chatbot named Tay; after interacting with trolls and others on Twitter for a day, Tay started praising Hitler, posting transphobia, and generating other kinds of harmful text.

Microsoft said that Tay was a learning experience. But it seems that those lessons didn’t stick. In November 2023, Media Matters for America published a story revealing that bad actors have been using DALL-E 3 (an image generator integrated with Microsoft’s Bing search engine) to create images featuring Nazi symbols, among other types of problematic content. 

The generative AI industry appears to be a few years behind the curve when it comes to moderation of user-generated content, in part because of the unique new problems the field is creating. While moderation of extremism online is always a cat-and-mouse game, with extremists engaging in adversarial shifts to try to avoid both automated and manual moderation processes, it is concerning that the limited guardrails in place to keep generative AI from creating harmful content seem to be so easy to avoid. 

One of the major weaknesses of generative AI is prompt injection attacks, also called jailbreaks. Generative AI works by taking in a prompt from the user, then generating the output that best responds to that prompt. But users have already figured out how to exploit this intrinsic feature of generative AI to circumvent content-based rules intended to prevent harmful output. For example, a simple prompt injection attack instructs the AI model to pretend to be a different AI model that doesn’t have the same restrictions on output; this has succeeded in getting the technology to generate content that it isn’t supposed to (like stating that Hitler was a man of his times about whom the model had “complex and multifaceted” thoughts).

Right now, prompt injection attacks and responses to those attacks are relatively manual: humans try different prompts to see what is successful in getting the model to ignore what are supposed to be fundamental (but hidden) rules about the model’s performance and output; they share their successful attacks; then the engineers who maintain the model add individual, bespoke rules to stop that particular prompt injection attack from working. But researchers and hackers have already begun to develop automated ways of generating new prompt injection attacks. The cat-and-mouse game of moderation will only intensify as more people use generative AI.

The ability to evade guardrails is only part of the problem, however. A number of AI products with limited guardrails or none at all have been introduced by extremists or people purporting to act on behalf of “free speech.” One such product, FreedomGPT, produced content replete with racism and racial slurs, among other things. As we saw with the rise of “alt-tech” platforms a few years ago (Gab, Parler, Rumble, Hatreon, etc.), some AI developers preach an ideology of freedom from rules, directly or indirectly setting their platforms up for use by extremists. A new product from X (the company formerly known as Twitter) promised similar “freedom” from racial and cultural sensitivities, while constantly being updated based on real time data from Elon Musk’s increasingly toxic platform. Thus far, however, the product has disappointed its intended audience by providing “woke” answers to leading questions. 

Hallucination

Because generative AI chatbots are, in essence, statistical models that create output by predicting what word is most likely to be a good fit given previous words, their relationship to anything that could be considered truth is tenuous at best. These models do not reason, nor do they look up information to ensure it is accurate – they simply try to provide the most plausible-sounding response to a prompt. This means that the outputs produced by apps like ChatGPT sound authoritative without having any logical or evidence-based foundation. AI researchers refer to errant outputs that have no connection to reality as “hallucinations.” The report from OpenAI describing GPT-4 even has a subsection (section 2.2) on this tendency for generative AI to “produce content that is nonsensical or untruthful.”
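The statistical character of these models can be made concrete with a miniature sketch. The toy corpus, bigram table, and `generate` function below are hypothetical teaching devices, orders of magnitude simpler than any real LLM, but they show the core mechanic the paragraph describes: each word is chosen purely because it frequently followed the previous word in the training data, with no notion of truth or meaning.

```python
from collections import Counter, defaultdict

# Toy "training data" for a miniature language model.
corpus = (
    "the model predicts the next word the model generates text "
    "the next word is chosen by frequency alone"
).split()

# Count how often each word follows each other word (a bigram table).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def generate(start, length=6):
    """Emit the statistically most frequent continuation, word by word."""
    words = [start]
    for _ in range(length):
        options = following.get(words[-1])
        if not options:
            break
        words.append(options.most_common(1)[0][0])  # always pick the top word
    return " ".join(words)

print(generate("the"))
```

Even this trivial model produces fluent-looking word sequences, and quickly falls into a plausible-sounding loop, because fluency is all the statistics capture; nothing in the procedure checks whether the output is accurate or even coherent.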

This has already created real-world problems, such as hallucinated citations submitted by attorneys using chatbots to write or research legal filings. As AI optimists promote this technology for mental health interventions, it’s only a matter of time before someone starts promoting its use for countering extremism, especially in the difficult, time-consuming one-on-one interventions that are the domain of social workers. Though these types of uses might be minor in the universe of applications for generative AI, the harms can be severe. In early 2023, a man in Belgium died by suicide after consulting a chatbot about his anxieties related to climate change. 

Imagine an overworked social worker asking ChatGPT for advice to help someone they are working with who is so immersed in conspiracy theories that it has taken over their life. Imagine a chatbot being paired with a Redirect-style intervention, where a person who searches for extremist content is redirected to a chatbot that may or may not have been fine-tuned appropriately for this type of use. Given the intrinsic threat of hallucination, we should be concerned whenever generative AI is used – directly or indirectly – to suggest or run interventions without close supervision from a person with relevant expertise and the time to actually supervise these tools in a meaningful way.

Accidental hallucination is only part of the problem. Generative AI has already been implicated in creating misinformation about the war in Gaza, although not on a large scale so far. As extremists and state actors become more skilled at prompt engineering, we expect dramatic increases in both the volume and quality of AI-generated hoaxes and misinformation. As the industry advances, these efforts will invariably become more sophisticated and difficult to detect. In highly polarized information environments, sophisticated hoaxes may have enduring effects. Author Neal Stephenson, known for creating the now-real concept of the Metaverse, offered a society-shattering scenario revolving around a hoax nuclear attack on Moab, Utah, in his recent book, Fall; or, Dodge in Hell. While clearly in the realm of science fiction, Stephenson’s cautionary tale nevertheless hews close enough to current events to serve as a serious warning.  

Conclusion

It seems likely that generative AI will continue to be a technology that receives more attention than it deserves, especially from actors seeking to reduce costs associated with human work on tasks that generative AI can – or is believed to be able to – perform. Recently, some magazines and web-based media outlets have quietly published articles written by AI, raising concerns of inaccurate stories, threats to journalists’ jobs (in a field already facing enormous pressure), and a broader threat to the ethical practice of good journalism. Generative AI has been used to create pornographic images and movies, sometimes based on a real person who didn’t consent to participating in porn. As with fears of “deep fakes” several years ago, these concerns might be overblown; but if they are, it is only because hopes for what these tools can do are also overblown.

It’s also worth recognizing that these tools have steep environmental costs. Given that the tools are overhyped and come with serious drawbacks related to accountability and veracity – not to mention the harms described in this research note – we should be thoughtful about how to responsibly use generative AI. Those who use LLMs and image generators casually should consider the environmental consequences of those uses, and those who use them more seriously should consider whether the tools are reliable enough to warrant their use.

This is not to say that generative AI has no productive role in countering extremism. Because they function by looking for relationships between language, some LLMs might prove effective at aiding researchers with the often-laborious process of analyzing extremist texts (although existing guardrails prevent most such work). But even doing this warrants careful consideration by would-be users, given the limitations discussed here, and any such AI analytical methods should be subjected to an extensive vetting process before implementation. 


[i] https://bsky.app/profile/wgervais.bsky.social/post/3kfhwci4kc522. The authors were unable to replicate this result precisely. One response from ChatGPT strongly suggested that a new guardrail had been recently implemented in response to the first user’s public posts. Another response to a slightly different prompt produced a list that was overweighted with LatinX names but not exclusively so. 

J.M. Berger is a senior research fellow with the Center on Terrorism, Extremism, and Counterterrorism at the Middlebury Institute of International Studies and a research fellow with VOX-Pol. He researches extremist ideologies and online harms. 

Sam Jackson is a Senior Research Fellow on Antigovernment Extremism at CTEC at the Middlebury Institute of International Studies and an Associate Member of VOX-Pol. His research focuses on antigovernment extremism in the U.S., conspiracy theories, extremism online, and contentious activity on the internet more broadly.

This article is republished from Center on Terrorism, Extremism, and Counterterrorism under a Creative Commons license. Read the original article.

Image Credit: Reality Defender/Dall-E 3