Radical Reddits: Into the Minds of Online Radicalised Communities

The online domain offers researchers a potentially vast body of data, and a central challenge in contemporary research is making sense of it. One way to handle such data is to combine methods from the fields of new media and linguistics. Several studies have sought to understand how online radicalism comes to exist and grows. To date, however, none of these studies has fruitfully analysed language patterns within online communities beyond keyword analysis. In this thesis, I demonstrate a proof of concept for a classifier that analyses and predicts salient language features within online radical discourse from the social media platform Reddit. The data consist of two datasets, one radical and one non-radical in nature, each containing 1 million lines of text. The radical dataset is known for its radical nature and promotes radicalism across a variety of beliefs, such as anti-feminism and white supremacy. Using software libraries such as NLTK and SciKit within Python, I subjected the data to keyword and collocation frequency counts, a lexical diversity measure, and a part-of-speech tagger, and ultimately used the results as features for a document classifier. The results show that the radical discourse examined in this thesis contains salient language features and displays a clear sense of virtual community. Finally, I discuss the implications of this thesis and provide directions for further research. The data were provided by TNO The Hague as part of the VOX-Pol project.
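
To make the measures named above concrete, the following is a minimal sketch of keyword frequency, contiguous-bigram counting, and lexical diversity (type-token ratio). It is not the thesis code: it uses only the Python standard library rather than NLTK or SciKit, the function names and the sample sentence are hypothetical, and raw bigram counts are a simplified stand-in for a collocation measure.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_counts(tokens, n=3):
    """Return the n most frequent tokens with their counts."""
    return Counter(tokens).most_common(n)


def bigram_counts(tokens, n=3):
    """Return the n most frequent adjacent word pairs.

    Raw adjacent-pair counts stand in for collocations here; a fuller
    analysis would typically score pairs with an association measure.
    """
    return Counter(zip(tokens, tokens[1:])).most_common(n)


def lexical_diversity(tokens):
    """Type-token ratio: distinct tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Hypothetical sample text, not taken from either dataset.
sample = "the cat sat on the mat and the cat slept"
tokens = tokenize(sample)

features = {
    "top_keyword": keyword_counts(tokens, 1),
    "top_bigram": bigram_counts(tokens, 1),
    "lexical_diversity": lexical_diversity(tokens),
}
print(features)
```

In a setup like the one described, values of this kind would be computed per document and passed to a classifier as features; how the thesis actually combines them is not specified here.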

Tags: Big Data, Data Mining, Quantitative, Radicalisation, Reddit, Social Media