By David Griffin
Social media can bring people together. It connects individuals who would otherwise never cross paths and forges bonds between communities otherwise separated by geography, gender, religion or politics. Despite these benefits, however, it has also provided a platform for the sharing of hate speech, which in turn can result in actual violence against individuals and communities (Byman, 2021).
Hate speech detection on social media is largely dependent on the manual labelling of datasets, which are then used to train and test machine learning-based detection models. This is labour-intensive work, so the datasets involved are often relatively small. Models evaluated on the same datasets they were trained from generally perform well; yet when applied to data beyond the test scenario, their abilities are limited (Arango et al., 2019). This concern is further exacerbated for languages with less available data.
There is a multitude of different topics which online hate speech might relate to. In the detection of such speech, these topics are referred to as ‘hate domains’. Most studies thus far have considered hate speech in general, without considering the domain to which it pertains (Toraman et al., 2022).
A new paper from Toraman et al. (2022) seeks to create large datasets for the detection of hate speech. This is done for both a globally prevalent language, English, and a globally less prevalent language, Turkish, with each dataset comprising 100k Twitter posts. In doing so, the authors also consider the specific hate domain each post relates to, using five of the most frequently observed: religion, racism, gender, sports and politics.
The work described by Toraman et al. (2022) has three main objectives:
For inclusion in the datasets, approximately 20k tweets were retrieved for each hate domain in each of the two languages. Tweets were selected using keywords, alongside other criteria, including limiting the number sourced from any one Twitter account and excluding tweets shorter than five words.
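The selection criteria above can be sketched as a simple filter. This is an illustrative reconstruction, not the authors' actual pipeline; the function name, the per-account cap value and the tweet dictionary fields are assumptions.

```python
from collections import defaultdict

MIN_WORDS = 5          # tweets shorter than five words are dropped
MAX_PER_ACCOUNT = 3    # hypothetical cap on tweets from one account

def filter_tweets(tweets, keywords, max_per_account=MAX_PER_ACCOUNT):
    """Keep tweets that contain a domain keyword, meet the minimum
    word count, and stay within the per-account limit."""
    per_account = defaultdict(int)
    kept = []
    for tweet in tweets:
        text = tweet["text"].lower()
        if len(text.split()) < MIN_WORDS:
            continue  # too short to carry useful signal
        if not any(k in text for k in keywords):
            continue  # no domain keyword matched
        if per_account[tweet["user"]] >= max_per_account:
            continue  # avoid over-sampling a single account
        per_account[tweet["user"]] += 1
        kept.append(tweet)
    return kept
```

The per-account limit matters because a handful of prolific accounts could otherwise dominate a 20k-tweet sample and bias the labels a model learns.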
Students performed the manual labelling of tweets based on guidelines, dividing them into three categories: Hate, Offensive and Normal. Hate tweets were those inciting violence or damage, or threatening an individual or group based on a characteristic or trait. Offensive tweets included those which insulted, humiliated, discriminated against or taunted. All other tweets were considered Normal.
The machine-learning models examined in this work are evaluated on three common metrics: Precision, Recall and F1-score. Precision in this context is the proportion of tweets identified as Hate tweets that actually were Hate tweets. Recall is the proportion of all Hate tweets in the dataset that were correctly identified. F1-score is the harmonic mean of Precision and Recall, combining both into a single figure.
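The three metrics can be computed directly from the counts of true positives, false positives and false negatives. This is a standard formulation, not code from the paper:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute Precision, Recall and F1 from true positives (tp),
    false positives (fp) and false negatives (fn)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1
```

For example, a model that flags 10 tweets as Hate, of which 8 are genuine, while missing 2 further Hate tweets, scores 0.8 on all three metrics.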
Using these metrics, three different types of machine learning-based models are tested: a Bag-of-words model, two separate Neural models and a Transformer-based model.
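To make the first model family concrete, the snippet below shows a minimal bag-of-words representation: each tweet becomes a vector of word counts over a fixed vocabulary. This is a toy illustration of the technique, not the paper's implementation.

```python
from collections import Counter

def build_vocab(texts):
    """Map each distinct lower-cased word to a fixed vector index."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    return {w: i for i, w in enumerate(vocab)}

def vectorize(text, vocab):
    """Turn a text into a count vector over the vocabulary;
    words outside the vocabulary are simply ignored."""
    counts = Counter(text.lower().split())
    return [counts.get(w, 0) for w in sorted(vocab, key=vocab.get)]
```

A classifier then learns weights over these counts. The representation discards word order entirely, which is one reason transformer-based models, which model context, tend to outperform it.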
The results for the different models vary considerably; however, there are some general points of note from this study:
The authors also highlight some of the challenges involved in identifying hate speech using machine learning models:
This work by Toraman et al. (2022) provides valuable insights into the abilities of machine learning models to detect online hate speech. It also provides an overview of the challenges and difficulties faced in tackling this blight. As our dependence on online communication systems continues to grow, it is paramount this complex field of study keeps pace, protecting vulnerable individuals and communities.
[Source] Toraman, C., Şahinuç, F., and Yilmaz, E. H. (2022). Large-scale hate speech detection with cross-domain transfer.
DOI: 10.48550/arXiv.2203.01111
Arango, A., Pérez, J., and Poblete, B. (2019). Hate speech detection is not as easy as you may think: A closer look at model validation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 45–54, New York, NY, USA. Association for Computing Machinery.
DOI: 10.1145/3331184.3331262
Byman, D. L. (2021). How hateful rhetoric connects to real-world violence.