Hate Speech
4chan & 8chan embeddings
September 18, 2023
We have collected over 30M messages from the publicly available /pol/ message boards on 4chan and 8chan, and compiled them into a model of toxic language use. The trained word embeddings (approximately 0.4 GB) are released for free and may be useful for further study of toxic discourse or to boost hate speech detection systems: textgain.com/8chan. ...
Angry by design: toxic communication and technical architectures
September 18, 2023
Hate speech and toxic communication online are on the rise. Responses to this issue tend to offer technical (automated) or non-technical (human content moderation) solutions, or see hate speech as a natural product of hateful people. In contrast, this article begins by recognizing platforms as designed environments that support particular practices while discouraging others. In ...
Anatomy of Online Hate: Developing a Taxonomy and Machine Learning Models for Identifying and Classifying Hate in Online News Media
September 18, 2023
Online social media platforms generally attempt to mitigate hateful expressions, as these comments can be detrimental to the health of the community. However, automatically identifying hateful comments can be challenging. We manually label 5,143 hateful expressions posted to YouTube and Facebook videos among a dataset of 137,098 comments from an online news media outlet. We then ...
Hate Speech Detection on Twitter: Feature Engineering v.s. Feature Selection
September 18, 2023
The increasing presence of hate speech on social media has drawn significant investment from governments, companies, and empirical research. Existing methods typically use a supervised text classification approach that depends on carefully engineered features. However, it is unclear if these features contribute equally to the performance of such methods. We conduct a feature selection analysis ...
Detecting the Hate Code on Social Media
September 18, 2023
Social media has become an indispensable part of the everyday lives of millions of people around the world. It provides a platform for expressing opinions and beliefs, communicated to a massive audience. However, the ease with which people can express themselves has also allowed for the large-scale spread of propaganda and hate speech. To ...
A Survey on Automatic Detection of Hate Speech in Text
September 18, 2023
The scientific study of hate speech, from a computer science point of view, is recent. This survey organizes and describes the current state of the field, providing a structured overview of previous approaches, including core algorithms, methods, and main features used. This work also discusses the complexity of the concept of hate speech, defined in ...
Automated hate speech detection and the problem of offensive language
September 18, 2023
A key challenge for automatic hate-speech detection on social media is the separation of hate speech from other instances of offensive language. Lexical detection methods tend to have low precision because they classify all messages containing particular terms as hate speech, and previous work using supervised learning has failed to distinguish between the two categories. ...
Deep Learning for Hate Speech Detection in Tweets
September 18, 2023
Hate speech detection on Twitter is critical for applications like controversial event extraction, building AI chatterbots, content recommendation, and sentiment analysis. We define this task as being able to classify a tweet as racist, sexist or neither. The complexity of the natural language constructs makes this task very challenging. We perform extensive experiments with multiple ...
Stereotypical Bias Removal for Hate Speech Detection Task using Knowledge-based Generalizations
September 18, 2023
With the ever-increasing cases of hate spread on social media platforms, it is critical to design abuse detection mechanisms to proactively avoid and control such incidents. While there exist methods for hate speech detection, they stereotype words and hence suffer from inherently biased training. Bias removal has been traditionally studied for structured datasets, but we ...
Automatic Identification and Classification of Misogynistic Language on Twitter
September 18, 2023
Hate speech may take different forms in online social media. Most of the investigations in the literature are focused on detecting abusive language in discussions about ethnicity, religion, gender identity and sexual orientation. In this paper, we address the problem of automatic detection and categorization of misogynous language in online social media. The main contribution ...