Student Voice

Detecting Hate Speech Online Using Machine Learning Models

By David Griffin

Social media can bring people together. It enables connections to be made between individuals otherwise unlikely to cross paths, and bonds to be forged between communities otherwise separated by space, gender, religion or politics. However, despite its benefits, it has also provided a platform for the sharing of hate speech, which in turn can result in real-world violence against individuals and communities (Byman, 2021).

Hate speech detection on social media depends largely on the manual labelling of datasets, which are then used to train and test machine learning-based hate speech detection models. This is labour-intensive work, so the datasets used are often relatively small. When new models are tested on these same datasets, they generally perform well; yet when applied to data beyond the test scenario, they generalise poorly (Arango et al., 2019). The problem is further exacerbated for languages with less available data.

Online hate speech can relate to a multitude of different topics. In the detection of such speech, these topics are referred to as ‘hate domains’. Most studies thus far have treated hate speech in general, without regard to the domain to which it pertains (Toraman et al., 2022).

A new paper from Toraman et al. (2022) creates large datasets for the detection of hate speech: one for a globally prevalent language, English, and one for a globally less prevalent language, Turkish, each comprising 100k Twitter posts. The authors also record the specific hate domain to which each post relates, covering five of the most frequently observed: religion, racism, gender, sports and politics.

The work described by Toraman et al. (2022) has three main objectives:

  1. To construct two large-scale, manually labelled datasets for detecting hate speech on social media in English and Turkish, respectively.
  2. To test different existing machine learning-based models for hate speech detection.
  3. To test the ability of existing models to detect hate speech across differing hate domains, i.e. whether a model trained exclusively on data from one hate domain can similarly recognise hate speech from another (see the sketch after this list).
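
To make the third objective concrete, the sketch below shows a minimal cross-domain evaluation loop: a model is fitted on one domain's training data and scored against every domain's test data. The `train_model` and `score` helpers and the per-domain splits are hypothetical stand-ins, not the authors' code.

```python
# Illustrative cross-domain transfer loop. The train_model and score
# helpers and the per-domain splits are hypothetical stand-ins.

DOMAINS = ["religion", "racism", "gender", "sports", "politics"]

def cross_domain_scores(data, train_model, score):
    """data maps each domain to a (train_split, test_split) pair."""
    results = {}
    for source in DOMAINS:
        model = train_model(data[source][0])    # fit on one domain only
        for target in DOMAINS:
            # evaluate the single-domain model on every domain's test split
            results[(source, target)] = score(model, data[target][1])
    return results
```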

For inclusion in the datasets, approximately 20k tweets were retrieved for each hate domain in each of the two languages. Tweets were selected using keywords, alongside other criteria, including a cap on the number sourced from the same Twitter account and the exclusion of tweets shorter than five words.
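
As a rough illustration of this kind of filtering (the field names, the keyword matching and the per-account cap value below are assumptions, not details from the paper):

```python
# Illustrative sketch of the filtering described above; not the
# authors' published pipeline.
from collections import defaultdict

MIN_WORDS = 5          # tweets shorter than five words are excluded
MAX_PER_ACCOUNT = 3    # assumed cap on tweets from a single account

def filter_tweets(tweets, keywords):
    """tweets: iterable of dicts with 'text' and 'account' keys."""
    kept = []
    per_account = defaultdict(int)
    for tweet in tweets:
        text = tweet["text"].lower()
        if not any(kw in text for kw in keywords):
            continue                       # must match a domain keyword
        if len(text.split()) < MIN_WORDS:
            continue                       # too short to label reliably
        if per_account[tweet["account"]] >= MAX_PER_ACCOUNT:
            continue                       # per-account cap reached
        per_account[tweet["account"]] += 1
        kept.append(tweet)
    return kept
```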

Students performed the manual labelling of tweets based on guidelines, dividing them into three categories: Hate, Offensive and Normal. Hate tweets were those inciting violence or damage, or threatening an individual or group based on a characteristic or trait. Offensive tweets were those which insulted, humiliated, taunted or discriminated against others. All other tweets were considered Normal.

The machine learning models examined in this work are evaluated on three common metrics: Precision, Recall and F1-score. Precision here is the proportion of tweets a model flags as Hate that are actually Hate tweets. Recall is the proportion of all Hate tweets in the dataset that the model correctly identifies. F1-score is the harmonic mean of Precision and Recall.
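
A short worked example may make these concrete. The sketch below computes all three metrics for the Hate class, using toy labels invented purely for illustration:

```python
# Worked illustration of Precision, Recall and F1 for the Hate class,
# computed from toy labels invented purely for this example.

def hate_metrics(y_true, y_pred, positive="Hate"):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0   # flagged Hate and correct
    recall = tp / (tp + fn) if tp + fn else 0.0      # actual Hate that was found
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean of the two
    return precision, recall, f1

# Five actual Hate tweets; the model flags four tweets as Hate, three correctly.
y_true = ["Hate", "Hate", "Hate", "Hate", "Hate", "Normal", "Offensive", "Normal"]
y_pred = ["Hate", "Hate", "Hate", "Normal", "Normal", "Hate", "Offensive", "Normal"]
print(hate_metrics(y_true, y_pred))  # (0.75, 0.6, 0.666...)
```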

Using these metrics, three different types of machine learning-based models are tested: a Bag-of-words model, two separate Neural models and a Transformer-based model.
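
To give a sense of the simplest of these, here is a minimal bag-of-words baseline. It assumes scikit-learn is available, uses toy placeholder data, and is an illustrative stand-in rather than any of the models actually tested:

```python
# Minimal bag-of-words baseline for illustration; the toy texts and
# labels are placeholders, not the authors' data or implementation.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["example normal tweet", "example offensive tweet"]   # placeholder data
labels = ["Normal", "Offensive"]                              # placeholder labels

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),  # unigram and bigram counts
    LogisticRegression(max_iter=1000),    # linear classifier over the counts
)
model.fit(texts, labels)
print(model.predict(["another example tweet"]))
```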

The results for different models vary considerably; however, there are some general points of note from this study:

  • The models with the highest overall results are Transformer-based.
  • Results are language-dependent: a model’s ability to recognise hate speech depends on the language of the text.
  • The volume of data used to train a model influences its detection ability. Consequently, hate speech in languages with fewer available resources (and data) is likely harder to detect.
  • In general, models trained on speech from one hate domain recognise hate speech from other domains well. The exceptions are sports- and gender-related hate speech.

The authors also highlight some of the challenges involved in identifying hate speech using machine learning models:

  • Models struggle to recognise nuance in language and phrasing. To illustrate this, the authors use the example of the phrase “I hate my life”. While this is interpreted as hate speech by many of the models, it is a relatively common phrase used to describe dissatisfaction with one’s situation or circumstance.
  • Offensive words and slang are commonly used in online discussions and comments for emphasis or humour. While the vocabulary used may be offensive to some, the intrinsic message is not intended to offend. However, this use is commonly flagged by models as offensive or hate speech.
  • Some vocabulary normally employed colloquially to suggest contentment, agreement or delight with a situation, such as “nice”, may be included in offensive or hate speech online. While the use of such words may not detract from the hateful intention of the speech, their inclusion causes many of the models tested to incorrectly identify hate speech as Normal.

This work by Toraman et al. (2022) provides valuable insights into the abilities of machine learning models to detect online hate speech. It also offers an overview of the challenges and difficulties faced in tackling this blight. As our dependence on online communication systems continues to grow, it is paramount that this complex field of study keeps pace, protecting vulnerable individuals and communities.

References

[Source] Toraman, C., Şahinuç, F., and Yilmaz, E. H. (2022). Large-scale hate speech detection with cross-domain transfer.
DOI: 10.48550/arXiv.2203.01111

Arango, A., Pérez, J., and Poblete, B. (2019). Hate speech detection is not as easy as you may think: A closer look at model validation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 45–54, New York, NY, USA. Association for Computing Machinery.
DOI: 10.1145/3331184.3331262

Byman, D. L. (2021). How hateful rhetoric connects to real-world violence.
