Studying Generalisability across Abusive Language Detection Datasets

Swamy, Steve Durairaj; Jamatia, Anupam; Gambäck, Björn

Chapter

Published version

Permanent link

http://hdl.handle.net/11250/2628214

Publication date

2019

Abstract

Work on Abusive Language Detection has tackled a wide range of subtasks and domains. As a result, existing datasets overlap heavily, and models trained on one dataset often generalise poorly to another. Through experiments on cross-dataset training and testing, the paper reveals that the common practice of including more non-abusive samples in a dataset (to emulate reality) may have a detrimental effect on the generalisability of a model trained on that data. A hierarchical annotation model is therefore used to reveal redundancies in existing datasets and to help reduce redundancy in future efforts.

Publisher

Association for Computational Linguistics

Unless otherwise stated, this item is licensed under Attribution 4.0 International