‘Like Sheep Among Wolves’: Characterizing Hateful Users on Twitter

Hateful speech in Online Social Networks (OSNs) is a key challenge for companies and governments, as it impacts users and advertisers, and as several countries have strict legislation against the practice. This has motivated work on detecting and characterizing the phenomenon in tweets, social media posts and comments. However, these approaches face several shortcomings due to the noisiness of OSN data, the sparsity of the phenomenon, and the subjectivity of the definition of hate speech. This work presents a user-centric view of hate speech, paving the way for better detection methods and understanding. We collect a Twitter dataset of 100,386 users along with up to 200 tweets from their timelines with a random-walk-based crawler on the retweet graph, and select a subsample of 4,972 to be manually annotated as hateful or not through crowdsourcing. We examine how hateful and normal users differ in their activity patterns, in the content they disseminate, and in network centrality measurements in the sampled graph. Our results show that hateful users have more recent account creation dates, and more statuses and followees per day. Additionally, they favorite more tweets, tweet at shorter intervals and are more central in the retweet network, contradicting the “lone wolf” stereotype often associated with such behavior. Hateful users are more negative, more profane, and use fewer words associated with topics such as hate, terrorism, violence and anger. We also identify similarities between hateful/normal users and their 1-neighborhood, suggesting strong homophily.
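
As an illustration of the sampling idea only, the following is a minimal sketch of a random walk with restarts over a retweet graph. It is not the crawler used to build the dataset: the function name `random_walk_sample`, the `restart_prob` parameter, and the toy graph are all assumptions introduced here, and the graph is held in memory rather than fetched from Twitter.

```python
import random


def random_walk_sample(graph, start, n_users, restart_prob=0.15,
                       seed=0, max_steps=100_000):
    """Collect up to n_users distinct users by walking retweet edges.

    graph maps a user to the list of users they retweeted (hypothetical
    in-memory stand-in for data a real crawler would fetch on demand).
    """
    rng = random.Random(seed)
    visited = [start]          # users in order of first visit
    seen = {start}
    current = start
    for _ in range(max_steps):
        if len(visited) >= n_users:
            break
        neighbors = graph.get(current, [])
        # Return to the seed user on a dead end, or with probability
        # restart_prob, so the walk cannot get stuck.
        if not neighbors or rng.random() < restart_prob:
            current = start
            continue
        current = rng.choice(neighbors)
        if current not in seen:
            seen.add(current)
            visited.append(current)
    return visited


# Toy retweet graph: an edge u -> v means "u retweeted v" (assumed data).
toy_graph = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": ["e"],
    "e": [],
}
sample = random_walk_sample(toy_graph, "a", n_users=4)
```

A walk of this kind visits well-connected users more often than a uniform sample would, which is one reason centrality comparisons on the sampled graph need care.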

Tags: Big Data, Content Analysis, Data Mining, Hate Speech, Twitter