4chan & 8chan embeddings

Year: 2020

Authors: Voué, P., De Smedt, T. and De Pauw, G.

Publisher: Textgain

We collected over 30M messages from the publicly available /pol/ message boards on 4chan and 8chan and compiled them into a model of toxic language use. The trained word embeddings (about 0.4 GB) are released for free. They may be useful for further study of toxic discourse or for improving hate speech detection systems: textgain.com/8chan.