Journal Article | Online extremism and the communities that sustain it: Detecting the ISIS supporting community on Twitter
The Islamic State of Iraq and ash-Sham (ISIS) continues to use social media as an essential element of its campaign to motivate support. On Twitter, ISIS’ unique ability to leverage unaffiliated sympathizers who simply retweet propaganda has been identified as a primary mechanism in its success in motivating both recruitment and “lone wolf” attacks. The present work explores a large community of Twitter users whose activity supports ISIS propaganda diffusion in varying degrees. Within this ISIS-supporting community, we observe a diverse range of actor types, including fighters, propagandists, recruiters, religious scholars, and unaffiliated sympathizers. The interaction between these users offers unique insight into the people and narratives critical to ISIS’ sustainment. We refer to this diverse set of users, in their entirety, as an online extremist community or OEC. We present Iterative Vertex Clustering and Classification (IVCC), a scalable analytic approach for OEC detection in annotated heterogeneous networks, and provide an illustrative case study of an online community of over 22,000 Twitter users whose online behavior directly advocates support for ISIS or contributes to the group’s propaganda dissemination through retweets.
2017 | Benigni, MC., Joseph, K. and Carley, KM.

Journal Article | Understanding Abuse: A Typology of Abusive Language Detection Subtasks
As the body of research on abusive language detection and analysis grows, there is a need for critical consideration of the relationships between the different subtasks that have been grouped under this label. Based on work on hate speech, cyberbullying, and online abuse, we propose a typology that captures central similarities and differences between subtasks, and we discuss its implications for data annotation and feature construction. We emphasize the practical actions that researchers can take to best approach their abusive language detection subtask of interest.
2017 | Waseem, Z., Davidson, T., Warmsley, D. and Weber, I.

Journal Article | Class-based Prediction Errors to Detect Hate Speech with Out-of-vocabulary Words
Common approaches to text categorization essentially rely either on n-gram counts or on word embeddings. This presents important difficulties in highly dynamic or quickly-interacting environments, where the appearance of new words and/or varied misspellings is the norm. A paradigmatic example of this situation is abusive online behavior, with social networks and media platforms struggling to effectively combat uncommon or non-blacklisted hate words. To better deal with these issues in those fast-paced environments, we propose using the error signal of class-based language models as input to text classification algorithms. In particular, we train a next-character prediction model for any given class, and then exploit the error of such class-based models to inform a neural network classifier. This way, we shift from the ability to describe seen documents to the ability to predict unseen content. Preliminary studies using out-of-vocabulary splits from abusive tweet data show promising results, outperforming competitive text categorization strategies by 4–11%.
2017 | Serra, J., Leontiadis, I., Spathis, D., Stringhini, G., Blackburn, J. and Vakali, A.

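The error-signal idea in this abstract can be illustrated with a deliberately small sketch: train one character-level model per class, then score a new text by its average next-character prediction error under each class model, and use that error vector as a classification feature. Everything below — the toy corpora, the bigram model, and the function names — is a hypothetical stand-in for the paper's neural next-character models, not the authors' implementation.

```python
from collections import defaultdict
import math

# Hypothetical toy corpora standing in for per-class tweet collections.
CORPORA = {
    "abusive": ["you are trash", "get lost loser"],
    "clean": ["have a nice day", "lovely weather today"],
}

def train_char_bigram(texts):
    """Count character bigrams for one class-based language model."""
    counts = defaultdict(lambda: defaultdict(int))
    for t in texts:
        padded = "^" + t  # "^" marks the start of a text
        for a, b in zip(padded, padded[1:]):
            counts[a][b] += 1
    return counts

def prediction_error(counts, text, vocab_size=128):
    """Average next-character cross-entropy (the 'error signal') of
    `text` under one class model, with add-one smoothing."""
    padded = "^" + text
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        row = counts[a]
        p = (row[b] + 1) / (sum(row.values()) + vocab_size)
        total -= math.log(p)
    return total / max(len(text), 1)

models = {c: train_char_bigram(texts) for c, texts in CORPORA.items()}

def error_features(text):
    """Per-class error vector; a downstream classifier would consume this."""
    return {c: prediction_error(m, text) for c, m in models.items()}

feats = error_features("you are a loser")
# The class whose model predicts the text best has the lowest error.
predicted = min(feats, key=feats.get)
```

Because the features are prediction errors rather than word identities, an out-of-vocabulary misspelling still yields a usable signal — which is the property the abstract exploits.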
Journal Article | A Survey on Hate Speech Detection using Natural Language Processing
This paper presents a survey on hate speech detection. Given the steadily growing body of social media content, the amount of online hate speech is also increasing. Due to the massive scale of the web, methods that automatically detect hate speech are required. Our survey describes key areas that have been explored to automatically recognize these types of utterances using natural language processing. We also discuss limits of those approaches.
2017 | Schmidt, A. and Wiegand, M.

Journal Article | Detecting the Hate Code on Social Media
Social media has become an indispensable part of the everyday lives of millions of people around the world. It provides a platform for expressing opinions and beliefs that are communicated to a massive audience. However, the ease with which people can express themselves has also allowed for the large-scale spread of propaganda and hate speech. To avoid violating the abuse policies of social media platforms and to evade detection by automatic systems like Google’s Conversation AI, racists have begun to use a code (a movement termed Operation Google). This involves substituting references to communities with benign words that seem out of context in hate-filled posts or tweets. For example, users have used the words Googles and Bings to represent the African-American and Asian communities, respectively. By generating the list of users who post such content, we move a step beyond classifying individual tweets and can study the usage patterns of this concentrated set of users.
2017 | Magu, R., Joshi, K. and Luo, J.

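A minimal sketch of the dictionary-style lookup the abstract implies: flag tweets whose tokens match known code words and decode what they substitute for. The mapping below reflects only the two examples given in the abstract ("Googles", "Bings"); the function name and tokenization are illustrative assumptions, not the authors' method, which builds on this to identify the users posting such content.

```python
# Hypothetical code-word mapping based on the 'Operation Google'
# examples in the abstract; a real system would learn or expand this.
CODE_WORDS = {
    "googles": "African-American",
    "bings": "Asian",
}

def find_code_words(tweet):
    """Return (code word, decoded referent) pairs found in a tweet."""
    hits = []
    for token in tweet.lower().split():
        word = token.strip(".,!?#@")  # drop surrounding punctuation
        if word in CODE_WORDS:
            hits.append((word, CODE_WORDS[word]))
    return hits

hits = find_code_words("Those Googles are ruining everything")
```

Note that a pure lookup over-triggers on benign uses of these common words, which is why the paper's contribution is the contextual analysis layered on top of such matching.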
Journal Article | Automated hate speech detection and the problem of offensive language
A key challenge for automatic hate-speech detection on social media is the separation of hate speech from other instances of offensive language. Lexical detection methods tend to have low precision because they classify all messages containing particular terms as hate speech, and previous work using supervised learning has failed to distinguish between the two categories. We use a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords, and crowd-sourcing to label a sample of these tweets into three categories: those containing hate speech, those containing only offensive language, and those with neither. We train a multi-class classifier to distinguish between these categories. Close analysis of the predictions and the errors shows when we can reliably separate hate speech from other offensive language and when this differentiation is more difficult. We find that racist and homophobic tweets are more likely to be classified as hate speech, but that sexist tweets are generally classified as offensive. Tweets without explicit hate keywords are also more difficult to classify.
2017 | Davidson, T., Warmsley, D., Macy, M. and Weber, I.
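The three-way setup described in this abstract can be sketched with a toy multinomial Naive Bayes over the paper's three labels (hate speech / offensive only / neither). The training texts, class names, and model choice below are invented for illustration; the paper's actual features, classifier, and crowd-labeled data differ.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy training data for the paper's three labels.
TRAIN = [
    ("hate", "i hate that group they should disappear"),
    ("hate", "that group are vermin"),
    ("offensive", "you are a damn idiot"),
    ("offensive", "what a stupid damn take"),
    ("neither", "great game last night"),
    ("neither", "the weather is great today"),
]

class NaiveBayes:
    """Minimal multinomial Naive Bayes for the three-way task."""

    def fit(self, pairs):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter()
        for label, text in pairs:
            self.class_counts[label] += 1
            self.word_counts[label].update(text.split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        best, best_lp = None, float("-inf")
        n = sum(self.class_counts.values())
        for label in self.class_counts:
            # log prior + add-one smoothed log likelihood per word
            lp = math.log(self.class_counts[label] / n)
            counts = self.word_counts[label]
            total = sum(counts.values())
            for w in text.split():
                lp += math.log((counts[w] + 1) / (total + len(self.vocab) + 1))
            if lp > best_lp:
                best, best_lp = label, lp
        return best

clf = NaiveBayes().fit(TRAIN)
pred = clf.predict("what a damn idiot")
```

The abstract's central finding maps directly onto this setup: a lexical model like this one will conflate slur-free hate speech with the "neither" class and profanity-heavy offensive tweets with hate speech, which is exactly the error analysis the paper performs.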