Student Voice

Detecting Hate Speech Online Using Machine Learning Models

By David Griffin

Social media can bring people together. It enables connections to be made between individuals otherwise unlikely to cross paths, and bonds to be forged between communities otherwise separated by space, gender, religion or politics. However, despite its benefits, it has also provided a platform for the sharing of hate speech, which in turn can result in real-world violence against individuals and communities (Byman, 2021).

Hate speech detection on social media depends largely on the manual labelling of datasets, which are then used to train and test machine learning-based hate speech detection models. This is labour-intensive work, so the datasets used are often relatively small. When new models are tested on these same datasets, they generally perform well; yet when applied to data beyond the test scenario, they generalise poorly (Arango et al., 2019). The problem is further exacerbated for languages with less available data.

Online hate speech can relate to a multitude of different topics. In the detection of such speech, these topics are referred to as ‘hate domains’. Most studies thus far have treated hate speech in general, without regard to the domain to which it pertains (Toraman et al., 2022).

A new paper from Toraman et al. (2022) creates large datasets for the detection of hate speech: one for a globally prevalent language, English, and one for a globally less prevalent language, Turkish, each comprising 100k Twitter posts. The authors also record the specific hate domain to which each post relates, covering five of the most frequently observed: religion, racism, gender, sports and politics.

The work described by Toraman et al. (2022) has three main objectives:

  1. To construct two large-scale, manually labelled datasets for detecting hate speech on social media in English and Turkish, respectively.
  2. To test different existing machine learning-based models for hate speech detection.
  3. To test the ability of existing models to detect hate speech across differing hate domains, i.e. whether a model trained exclusively on data from one hate domain can similarly recognise hate speech from another (see the sketch after this list).
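
To make the third objective concrete, the sketch below shows a minimal cross-domain evaluation loop: a model is fitted on one domain's training data and scored against every domain's test data. The `train_model` and `score` helpers and the per-domain splits are hypothetical stand-ins, not the authors' code.

```python
# Illustrative cross-domain transfer loop. The train_model and score
# helpers and the per-domain splits are hypothetical stand-ins.

DOMAINS = ["religion", "racism", "gender", "sports", "politics"]

def cross_domain_scores(data, train_model, score):
    """data maps each domain to a (train_split, test_split) pair."""
    results = {}
    for source in DOMAINS:
        model = train_model(data[source][0])    # fit on one domain only
        for target in DOMAINS:
            # evaluate the single-domain model on every domain's test split
            results[(source, target)] = score(model, data[target][1])
    return results
```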

For inclusion in the datasets, approximately 20k tweets were retrieved for each hate domain in each of the two languages. Tweets were selected using keywords, alongside other criteria, including a cap on the number sourced from the same Twitter account and the exclusion of tweets shorter than five words.
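
As a rough illustration of this kind of filtering (the field names, the keyword matching and the per-account cap value below are assumptions, not details from the paper):

```python
# Illustrative sketch of the filtering described above; not the
# authors' published pipeline.
from collections import defaultdict

MIN_WORDS = 5          # tweets shorter than five words are excluded
MAX_PER_ACCOUNT = 3    # assumed cap on tweets from a single account

def filter_tweets(tweets, keywords):
    """tweets: iterable of dicts with 'text' and 'account' keys."""
    kept = []
    per_account = defaultdict(int)
    for tweet in tweets:
        text = tweet["text"].lower()
        if not any(kw in text for kw in keywords):
            continue                       # must match a domain keyword
        if len(text.split()) < MIN_WORDS:
            continue                       # too short to label reliably
        if per_account[tweet["account"]] >= MAX_PER_ACCOUNT:
            continue                       # per-account cap reached
        per_account[tweet["account"]] += 1
        kept.append(tweet)
    return kept
```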

Students performed the manual labelling of tweets based on guidelines, dividing them into three categories: Hate, Offensive and Normal. Hate tweets were those inciting violence or damage, or threatening an individual or group based on a characteristic or trait. Offensive tweets were those which insulted, humiliated, taunted or discriminated against others. All other tweets were considered Normal.

The machine learning models examined in this work are evaluated on three common metrics: Precision, Recall and F1-score. Precision here is the proportion of tweets a model flags as Hate that are actually Hate tweets. Recall is the proportion of all Hate tweets in the dataset that the model correctly identifies. F1-score is the harmonic mean of Precision and Recall.
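
A short worked example may make these concrete. The sketch below computes all three metrics for the Hate class, using toy labels invented purely for illustration:

```python
# Worked illustration of Precision, Recall and F1 for the Hate class,
# computed from toy labels invented purely for this example.

def hate_metrics(y_true, y_pred, positive="Hate"):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0   # flagged Hate and correct
    recall = tp / (tp + fn) if tp + fn else 0.0      # actual Hate that was found
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean of the two
    return precision, recall, f1

# Five actual Hate tweets; the model flags four tweets as Hate, three correctly.
y_true = ["Hate", "Hate", "Hate", "Hate", "Hate", "Normal", "Offensive", "Normal"]
y_pred = ["Hate", "Hate", "Hate", "Normal", "Normal", "Hate", "Offensive", "Normal"]
print(hate_metrics(y_true, y_pred))  # (0.75, 0.6, 0.666...)
```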

Using these metrics, three different types of machine learning-based models are tested: a Bag-of-words model, two separate Neural models and a Transformer-based model.
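
To give a sense of the simplest of these, here is a minimal bag-of-words baseline. It assumes scikit-learn is available, uses toy placeholder data, and is an illustrative stand-in rather than any of the models actually tested:

```python
# Minimal bag-of-words baseline for illustration; the toy texts and
# labels are placeholders, not the authors' data or implementation.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["example normal tweet", "example offensive tweet"]   # placeholder data
labels = ["Normal", "Offensive"]                              # placeholder labels

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),  # unigram and bigram counts
    LogisticRegression(max_iter=1000),    # linear classifier over the counts
)
model.fit(texts, labels)
print(model.predict(["another example tweet"]))
```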

The results for different models vary considerably; however, there are some general points of note from this study:

  • The models with the highest overall results are Transformer-based.
  • Results are language-dependent: a model’s ability to recognise hate speech depends on the language of the text.
  • The volume of data used to train a model influences its detection ability. Consequently, hate speech in languages with fewer available resources (and data) is likely harder to detect.
  • In general, models trained on speech from one hate domain recognise hate speech from other domains well. The exceptions are sports- and gender-related hate speech.

The authors also highlight some of the challenges involved in identifying hate speech using machine learning models:

  • Models struggle to recognise nuance in language and phrasing. To illustrate this, the authors use the example of the phrase “I hate my life”. While this is interpreted as hate speech by many of the models, it is a relatively common phrase used to describe dissatisfaction with one’s situation or circumstance.
  • Offensive words and slang are commonly used in online discussions and comments for emphasis or humour. While the vocabulary used may be offensive to some, the intrinsic message is not intended to offend. However, this use is commonly flagged by models as offensive or hate speech.
  • Some vocabulary normally employed colloquially to suggest contentment, agreement or delight with a situation, such as “nice”, may be included in offensive or hate speech online. While the use of such words may not detract from the hateful intention of the speech, their inclusion causes many of the models tested to incorrectly identify hate speech as Normal.

This work by Toraman et al. (2022) provides valuable insights into the abilities of machine learning models to detect online hate speech. It also offers an overview of the challenges and difficulties faced in tackling this blight. As our dependence on online communication systems continues to grow, it is paramount that this complex field of study keeps pace, protecting vulnerable individuals and communities.

References

[Source] Toraman, C., Şahinuç, F., and Yilmaz, E. H. (2022). Large-scale hate speech detection with cross-domain transfer.
DOI: 10.48550/arXiv.2203.01111

Arango, A., Pérez, J., and Poblete, B. (2019). Hate speech detection is not as easy as you may think: A closer look at model validation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 45–54, New York, NY, USA. Association for Computing Machinery.
DOI: 10.1145/3331184.3331262

Byman, D. L. (2021). How hateful rhetoric connects to real-world violence.
