Inclusive Counterfactual Generation: Leveraging LLMs in Identifying Online Hate

1 July 2024. Centre member Prof Arjumand Younus, in collaboration with M Atif Qureshi (Director of the Explainable Analytics Group at TU Dublin) and Simon Caton (UCD School of Computer Science), has released a conference paper in Web Engineering, the proceedings of the International Conference on Web Engineering (ICWE).

The paper examines the use of LLMs such as ChatGPT to generate counterfactually augmented data for hate speech detection:

Counterfactually augmented data has recently been proposed as a successful solution for socially situated NLP tasks such as hate speech detection. The chief component within the existing counterfactual data augmentation pipeline, however, involves manually flipping labels and making minimal content edits to training data. In a hate speech context, these forms of editing have been shown to still retain offensive hate speech content. Inspired by the recent success of large language models (LLMs), especially the development of ChatGPT, which have demonstrated improved language comprehension abilities, we propose an inclusivity-oriented approach to automatically generate counterfactually augmented data using LLMs. We show that hate speech detection models trained with LLM-produced counterfactually augmented data can outperform both state-of-the-art and human-based methods.
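
As a rough illustration of the idea (and not the authors' actual pipeline), the sketch below shows how an LLM such as ChatGPT could be prompted, via the OpenAI chat completions API, to produce a minimally edited, non-hateful counterfactual of a hateful training example. The model name, prompt wording, and label scheme are illustrative assumptions.

```python
# A minimal sketch, assuming the OpenAI chat completions API: prompt an LLM to
# rewrite a hateful post into an inclusive, non-hateful counterfactual with
# minimal edits. The prompt text and model choice are illustrative, not the
# paper's actual configuration.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def generate_counterfactual(text: str, model: str = "gpt-3.5-turbo") -> dict:
    """Ask the LLM for a minimally edited, non-hateful rewrite of `text`."""
    prompt = (
        "Rewrite the following post so that it no longer contains hate speech, "
        "changing as little of the original wording as possible while keeping "
        "the text fluent and inclusive:\n\n"
        f"{text}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    counterfactual = response.choices[0].message.content.strip()
    # The original example keeps its hateful label; the generated rewrite is
    # added to the training set with the flipped (non-hateful) label.
    return {"original": text, "counterfactual": counterfactual, "label": "not_hate"}
```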

You can access the paper here, and find Prof Younus' blog post on the topic here as well.
