A Context Aware Embedding for the Detection of Hate Speech in Social Media Networks

The recent proliferation of social media platforms has led to an upsurge in the number of users. These sites have made it easy for users to express themselves, share, and communicate. In such a scenario, it is imperative to analyze the content and identify offensive content so as to avoid unpleasant situations. Machine learning techniques are used extensively for this purpose. In this paper, we propose a language model for identifying hate speech in Twitter data. We use DistilBERT, a context-aware embedding model, together with a Support Vector Machine (SVM) to classify hate speech. An SVM with a linear kernel, evaluated with 10-fold cross-validation, is found to provide better accuracy than existing models. The results show that accuracy improves with the use of a context-aware embedding model.

Tags: Hate Speech, Social Media, Support Vector Machine, Twitter
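
The pipeline summarized in the abstract (DistilBERT embeddings fed to a linear-kernel SVM and scored with 10-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the checkpoint name `distilbert-base-uncased`, the first-token pooling, the stratified and shuffled folds, and the helper names `embed_texts` and `cross_validated_accuracy` are all assumptions, since the abstract does not specify them.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC


def embed_texts(texts, model_name="distilbert-base-uncased"):
    """Return one DistilBERT vector per text (hidden state of the first token).

    Assumptions: the checkpoint name and the first-token pooling are
    illustrative choices. Requires the `transformers` and `torch` packages
    and downloads the model weights on first use.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    with torch.no_grad():
        enc = tokenizer(list(texts), padding=True, truncation=True,
                        return_tensors="pt")
        hidden = model(**enc).last_hidden_state  # shape: (batch, tokens, dim)
    return hidden[:, 0, :].numpy()


def cross_validated_accuracy(features, labels, n_splits=10, seed=0):
    """Score a linear-kernel SVM with k-fold cross-validation.

    Returns an array with one accuracy value per fold. Stratified,
    shuffled folds are an assumption made for this sketch.
    """
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    clf = SVC(kernel="linear")
    return cross_val_score(clf, features, labels, cv=folds, scoring="accuracy")
```

With a labeled tweet collection, one would call `embed_texts(tweets)` to obtain the feature matrix and pass it with the labels to `cross_validated_accuracy`, then average the per-fold scores. Keeping the embedding step separate from the classifier lets the same evaluation be repeated with a different embedding for comparison.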