Studying Generalisability across Abusive Language Detection Datasets
Chapter
Published version
Permanent link
http://hdl.handle.net/11250/2628214
Date issued
2019
Original version
10.18653/v1/K19-1088
Abstract
Work on abusive language detection has tackled a wide range of subtasks and domains. As a result, there is a great deal of redundancy and non-generalisability between datasets. Through experiments on cross-dataset training and testing, the paper shows that the common practice of including more non-abusive samples in a dataset (to emulate reality) may harm the generalisability of a model trained on that data. A hierarchical annotation model is therefore utilised to reveal redundancies in existing datasets and to help reduce redundancy in future annotation efforts.
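The cross-dataset protocol mentioned in the abstract can be sketched roughly as follows. Everything here is an illustrative assumption: the toy corpora, their labels, and the TF-IDF plus logistic-regression classifier stand in for the actual datasets and models studied in the paper.

```python
# Minimal sketch of cross-dataset training and testing for abusive
# language detection. The two "datasets" are tiny synthetic
# placeholders, not the corpora used in the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Hypothetical corpora: (texts, labels) with abusive = 1, non-abusive = 0.
dataset_a = (
    ["you are an idiot", "have a nice day", "total moron",
     "see you soon", "what a fool", "thanks for the help"],
    [1, 0, 1, 0, 1, 0],
)
dataset_b = (
    ["idiot comment", "lovely weather today", "such a moron",
     "great post", "fool of a take", "appreciate the reply"],
    [1, 0, 1, 0, 1, 0],
)

def cross_dataset_f1(train, test):
    """Train a classifier on one dataset and report macro-F1 on another."""
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(*train)
    texts, labels = test
    return f1_score(labels, model.predict(texts), average="macro")

# Comparing in-domain with cross-domain scores is what exposes
# non-generalisability between datasets.
in_domain = cross_dataset_f1(dataset_a, dataset_a)
cross_domain = cross_dataset_f1(dataset_a, dataset_b)
print(f"in-domain macro-F1:    {in_domain:.2f}")
print(f"cross-domain macro-F1: {cross_domain:.2f}")
```

A drop from the in-domain to the cross-domain score on real corpora is the kind of evidence the paper uses to argue about redundancy and generalisability across datasets.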