Studying Generalisability across Abusive Language Detection Datasets
Chapter
Published version
Permanent link: http://hdl.handle.net/11250/2628214
Date of issue: 2019
Original version (DOI): 10.18653/v1/K19-1088

Abstract
Work on Abusive Language Detection has tackled a wide range of subtasks and domains. As a result, there is a great deal of redundancy and non-generalisability between datasets. Through experiments on cross-dataset training and testing, the paper reveals that the preconceived notion of including more non-abusive samples in a dataset (to emulate reality) may have a detrimental effect on the generalisability of a model trained on that data. Hence, a hierarchical annotation model is utilised here to reveal redundancies in existing datasets and to help reduce redundancy in future efforts.
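The cross-dataset setup the abstract refers to can be illustrated with a minimal sketch: train a classifier on one labelled dataset and evaluate it on a different one, so that the score reflects generalisability rather than in-domain fit. The toy datasets and the simple token-score classifier below are illustrative stand-ins, not the paper's actual data or models.

```python
from collections import Counter

# Toy stand-ins for two abusive-language datasets (1 = abusive, 0 = benign).
dataset_a = [("you are awful", 1), ("have a nice day", 0),
             ("terrible person", 1), ("great work", 0)]
dataset_b = [("awful behaviour", 1), ("nice weather today", 0)]

def train_token_scores(data):
    """Score each token by how often it occurs in abusive vs. benign texts."""
    scores = Counter()
    for text, label in data:
        for tok in text.lower().split():
            scores[tok] += 1 if label == 1 else -1
    return scores

def predict(scores, text):
    """Label a text abusive if its tokens lean abusive overall."""
    total = sum(scores.get(tok, 0) for tok in text.lower().split())
    return 1 if total > 0 else 0

def cross_dataset_accuracy(train, test):
    """Train on one dataset, report accuracy on another."""
    scores = train_token_scores(train)
    return sum(predict(scores, t) == y for t, y in test) / len(test)

print(cross_dataset_accuracy(dataset_a, dataset_b))  # → 1.0 on this toy data
```

Comparing this cross-dataset score against the in-dataset score (train and test on splits of the same dataset) is the kind of contrast the experiments in the paper use to probe generalisability.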