Using Online Data in Terrorism Research

by Stuart Macdonald, Elizabeth Pearson, Ryan Scrivens, and Joe Whittaker

This article summarises a recent paper published in Lara Frumkin, John Morrison, and Andrew Silke’s A Research Agenda for Terrorism Studies (Elgar).

Historically, one of the greatest challenges for the study of extreme or terrorist groups was access. Today, online spaces offer researchers a level of access to primary data that was previously unimaginable. Yet the variety of data available lends itself to several methodological approaches. In this blog post, we consider the strengths, challenges and limitations of three of these: machine learning; case studies; and netnography. The aim is to introduce the reader to issues that should be considered at the design stage of a research project.

Machine Learning

Researchers are increasingly using machine learning algorithms to locate relevant materials within the overwhelming amount of information found online. Some advantages of machine learning include its ability to: evaluate large volumes of data and identify patterns not easily identifiable to humans; continuously improve and make more accurate predictions as it gains more experience with more data; and analyse data that are multi-dimensional and varied, across a wide range of applications. Machine learning, however, is not without limitations. Because it is automated, it is highly susceptible to error. It can also be a challenge to collect the massive volumes of data needed to properly train a machine learning algorithm. Further, it can be time consuming for an algorithm to learn from and process the data, and it often requires substantial resources to function, including additional computing power.

Nevertheless, terrorism and extremism scholars have adopted various machine learning techniques in recent years, including but not limited to authorship analysis, sentiment analysis, and affect analysis. One relatively novel machine learning tool, sentiment analysis, has sparked the interest of many researchers. For example, sentiment analysis has been successfully used to detect extreme language, websites, and users online, as well as to measure levels of online propaganda and cyberhate following a terrorism incident, and to evaluate how radical discourse evolves over time. Sentiment analysis has also been used to detect violent extremist language and users, as well as to measure levels of – or propensity towards – violent radicalisation, all on a large scale.

Sentiment analysis, also known as ‘opinion mining’, is a category of computing science that specialises in evaluating the opinions found in a piece of text by organising data into distinct classes and sections, and assigning the text a positive, negative, or neutral polarity value. This is based on the notion that an author’s opinion toward a particular topic is reflected in the choice and intensity of the words she or he uses to communicate. Having said that, sentiment analysis has limitations. It is estimated, for example, that 21% of the time humans cannot agree amongst themselves about the sentiment within a given piece of text, with some individuals unable to understand subtle context or irony. Understandably, sentiment analysis systems cannot be expected to achieve 100% accuracy when compared to humans. Researchers have in turn suggested that combining sentiment analysis with other methods and/or semantic-oriented approaches can improve accuracy rates in general and the detection of extremist content online in particular. This includes combining sentiment analysis with classification software, affect analysis, social network analysis, or geolocation software. Researchers also suggest more work is needed to assess and potentially improve the classification accuracy and content identification offered by sentiment analysis software, which includes comparing the performance of several sentiment methods and including a ‘comparative human evaluation’ component to validate a sentiment program’s classifications.
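As a rough illustration of polarity assignment and of a simple human evaluation check, the sketch below scores a text against a tiny hand-made word list, labels it positive, negative, or neutral, and then measures how often those labels match a human coder's. The lexicon, the threshold, the example sentences, and the function names are all hypothetical and chosen only for illustration. Real sentiment tools rely on far larger lexicons or trained models, and this is not the software used in any of the studies mentioned here.

```python
import re

# Hypothetical toy lexicon: word -> polarity weight (illustrative values only).
LEXICON = {"good": 1.0, "great": 2.0, "love": 2.0,
           "bad": -1.0, "hate": -2.0, "terrible": -2.0}


def polarity_score(text):
    """Sum the lexicon weights of the words in the text (unknown words count as 0)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0.0) for word in words)


def classify(text, threshold=0.5):
    """Map a score to 'positive', 'negative', or 'neutral' using an assumed threshold."""
    score = polarity_score(text)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def agreement_rate(machine_labels, human_labels):
    """Share of items on which the program and a human coder assign the same label."""
    if not human_labels or len(machine_labels) != len(human_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(m == h for m, h in zip(machine_labels, human_labels))
    return matches / len(human_labels)


if __name__ == "__main__":
    texts = ["I love this great community",
             "They are terrible and I hate them",
             "The meeting is on Tuesday",
             "Oh great, another bad day"]  # ironic: hard for a word list
    human = ["positive", "negative", "neutral", "negative"]  # hypothetical coder
    machine = [classify(t) for t in texts]
    print(machine)
    print(agreement_rate(machine, human))
```

In this toy run the ironic fourth sentence is scored as positive because "great" outweighs "bad", so the program agrees with the hypothetical human coder on three of four items. This is the kind of mismatch a comparative human evaluation is meant to surface.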

Case Studies

Case studies are ‘an approach to research that facilitates exploration of a phenomenon within its context using a variety of data sources.’ In online terrorism and extremism research, this involves collating information about a terrorist actor, group, or event from a range of sources into a case file. The collected data can then be analysed as a single case, in a high level of detail, for either an individual actor or a cell. Alternatively, researchers can conduct a ‘multiple case study’ in which cases are compared against each other, whether as a small cohort of cases or as a large-n sample.

The primary benefit of this approach is the ability to analyse behaviours. Terrorists are a difficult research population to reach, as a large number die or are imprisoned because of their activities. Further, interacting with imprisoned terrorists is fraught with practical challenges and ethical issues. As a result, recent terrorism research has tended to focus on the content with which terrorists (potentially) interact, such as magazines, videos or images, or on terrorist supporters on platforms such as Twitter or Telegram. These types of studies add to our knowledge in important ways but fall short of investigating actual behaviours because we cannot be sure how (if at all) terrorists engage with this type of content. A causal leap of faith is required to explain how it affects terrorists.

Nonetheless, case studies, as illustrated in Whittaker’s research, can offer invaluable insights for several reasons. Firstly, they can be used to study terrorists’ behaviours to establish whether such content is actually engaged with. Secondly, a deeper dive can assess whether individuals are utilising specific types of terrorist content, such as instructional material, as part of their plots. Thirdly, because case studies collect data on a wider set of contextual factors, they offer an opportunity to analyse the use of instructional material, for example, in relation to other factors. Research that focuses on content and behaviours can offer a clear picture in which hypotheses can be advanced and tested.

There are, however, important limitations to case study data, particularly if the researcher is limited to secondary open sources found on the internet. Data are likely to include biases based on the intent of the original author. This means some terrorist actors or incidents may have substantially more coverage than others, such as those that plotted attacks rather than travelled to Syria. If the data are uneven, this can skew the results towards the demographics of the more newsworthy incidents. Relatedly, relying on secondary sources raises issues of missing data. It is unrealistic, for example, to expect court documents or news articles to detail that an individual did not read jihadist magazines.
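One way to keep missing data visible in a case file is to record three states for each behaviour: reported as present, reported as absent, and not mentioned in the sources. The sketch below is a minimal illustration under that assumption. The `CaseFile` class, its field names, and the example records are hypothetical and are not drawn from any real dataset.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CaseFile:
    """Hypothetical open-source case record; None means the sources are silent."""
    case_id: str
    outcome: str                           # e.g. "attack plot" or "travel"
    n_sources: int                         # number of open-source documents collected
    read_magazines: Optional[bool] = None  # True/False only if a source says so


def prevalence_bounds(cases: List[CaseFile]):
    """Return (lower, upper) bounds on the share of cases that read magazines.

    Lower bound: every unknown case counted as 'no'.
    Upper bound: every unknown case counted as 'yes'.
    """
    if not cases:
        raise ValueError("no cases supplied")
    yes = sum(c.read_magazines is True for c in cases)
    unknown = sum(c.read_magazines is None for c in cases)
    return yes / len(cases), (yes + unknown) / len(cases)


def mean_sources_by_outcome(cases: List[CaseFile]):
    """Average number of sources per outcome, a rough check for uneven coverage."""
    totals = {}
    for c in cases:
        count, total = totals.get(c.outcome, (0, 0))
        totals[c.outcome] = (count + 1, total + c.n_sources)
    return {outcome: total / count for outcome, (count, total) in totals.items()}


if __name__ == "__main__":
    cases = [
        CaseFile("A", "attack plot", 12, True),
        CaseFile("B", "attack plot", 8, None),
        CaseFile("C", "travel", 2, None),
        CaseFile("D", "travel", 2, False),
    ]
    print(prevalence_bounds(cases))        # (0.25, 0.75)
    print(mean_sources_by_outcome(cases))  # {'attack plot': 10.0, 'travel': 2.0}
```

Reporting a range rather than a single figure makes clear how much of the estimate depends on cases where the sources say nothing, and the per-outcome source counts give a first indication of whether some kinds of case are covered far more heavily than others.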

Despite these limitations, collecting open-source case study data via the internet can provide a fruitful way of advancing knowledge within the field of terrorism studies. It is incumbent on researchers to make the most of online data sources, such as news aggregators and academic journal libraries, to gather these data and conduct robust research.


Netnography

Ethnographies of extreme groups, such as those by Wiktorowicz, Kenney, Weeks, and Pilkington, have contributed to increased understanding of how and why people radicalise. Such studies involve immersion in the cultures, rituals and practices of groups, becoming familiar with their members, and developing empathetic researcher-participant relationships with them. In digital sociology and cultural studies there is a parallel seam of expertise on the practices of digital communities. Netnography – ethnographic study of internet communities – emerged as a market research tool, analysing consumer practices. Subsequently it has enabled the study of communities of practice, considering questions of identity, friendship and affect, and exploring the ways in which online and offline spaces merge and intersect. Such work has shed light on the interactions between social life, technology and knowledge.

There are three important potential implications of applying netnography to online extremism research. The first is theoretical. Extremism researchers have sought to understand the precise role of online spaces in relation to offline ones. Some academics, governments, and the media have suggested online radicalisation can cause offline violence. Although it is clear that what happens in online spaces has an impact on aspects of terrorist behaviours, such as recruitment, propaganda or mobilisation, the reality is complex. The work of netnographers and digital sociologists has long challenged the idea of the internet as a delimited space. Research instead suggests the embeddedness of cyberspace in everyday life, meaning the dichotomous language of ‘off’ and ‘on’ does not do justice to more fluid understandings of the interlinked relations between them.

Second, digital sociology and netnography emphasise the emotional, affective, and gendered aspects of online communities, extreme or otherwise. Affect is often associated with bodily interaction, but is also understood as emotion generated in community interactions. In radicalisation theory, the concept of affect is increasingly being applied to the understanding of processes of joining a group. Researchers have explored the ways in which recruitment is facilitated by ‘moral shock’, such as consuming distressing or brutal video or photo imagery, or narratives aimed at mobilising engagement. What is less well explored is how relationships in extreme communities develop. Netnography offers some clues. For instance, Pearson found high levels of emotional work done by participants in an online Islamist community, where participants reported high personal risks, both of criminalisation and to their mental health. Some of these risks came from outside their communities, some from within.

Finally, the application of findings from netnography would go some way to understanding the possibilities for deradicalisation of extreme actors. The study of extremism as a specific and yet ill-defined category has somewhat exacerbated the tendency to assume an extreme actor is fundamentally ‘other’. Existing netnographies of everyday online communities emphasise the features shared with extreme groups. The problem of extremism is then contextualised as a function of communities online, not simply a problem of extremists themselves. This fundamental insight enables a more nuanced approach to online behaviours that are observable in extreme communities but also evident elsewhere.

Netnography as methodology poses some of the challenges of ethnography offline. Access to extreme communities online is no less challenging. Researcher presence, even where possible, could alter interactions in communities or prompt deception. In practical terms, the best on offer might be ‘ethnography by proxy’. Netnography as theory, however, already offers foundational insights into the key questions for those studying online extremism: questions about the relations between offline and online, and about how communities work. Online extremism research should seek to build on pre-existing digital scholarship and its lessons for extremism debates. Technology has made possible quantitative studies of large-n data associated with extreme groups. Qualitative in-depth studies such as netnography help to interrogate what these data mean.


Conclusion

When designing a research project, it is important to be mindful of the strengths and limitations of the various available methodological approaches. The selection of methodological approach must be aligned with the research questions that your project will seek to answer. As this blog post has shown, different approaches are apt to answer different types of questions. As you review the existing literature, identify the contribution your study will seek to make and refine your research questions accordingly – it may become necessary to adapt your methodology too. Conversely, refinements to your methodology may necessitate amendments to your research questions. The same point applies to research ethics: different methodologies raise different ethical considerations, and the way in which these ethical issues are addressed may require refinements to your methodology. In short, your research questions, methods and ethics are inextricably connected, and so designing a research project is inevitably an iterative process.


Stuart Macdonald is a Professor in the School of Law at Swansea University. He is also the Coordinator of VOX-Pol.

Elizabeth Pearson is a Lecturer in Criminology with the Conflict, Violence and Terrorism Research Centre at Royal Holloway.

Ryan Scrivens is an Assistant Professor in the School of Criminal Justice at Michigan State University. He is also a Research Fellow at VOX-Pol.

Joe Whittaker is a Lecturer in Cyber Threats at Swansea University.

Image Credit: PEXELS

