dc.description.abstract | With the widespread use of online services like Facebook and Twitter, disseminating
hateful messages has become a simple matter. Not only do these messages
spoil the experience for other users of a service, but there is also increasing legal
pressure on the services to prevent and remove such hate-spreading content. For
this to be practically feasible, there is a need for systems that can automatically
detect hate speech in text.
Research on automatic detection of hateful and abusive language has been an
ongoing project over the last 20 years. However, the state of the art is still not
good enough to be practically usable for identifying hate speech in a fully automatic
manner. Thus, this thesis continues the efforts to reach that goal.
With the increasing legal pressure to remove hate speech, and the multitude of
services and platforms this pressure applies to, detection approaches are needed
that do not depend on any information specific to a given platform. This is so
that the approach can be used across several different platforms without being
changed. For instance, the information stored about the text's author may differ
between services, and so using such data would reduce the general applicability
of the system. Therefore, the research in this thesis aims at avoiding any such
information, using exclusively text-based input in the detection.
This thesis proposes a novel, Deep Learning-based approach to hate speech detection,
using a two-pronged architecture that combines both Convolutional Neural
Networks and Long Short-Term Memory networks. The proposed architecture uses
Character N-grams and Word Embeddings as inputs to its two prongs, which then
merge and produce a final classification. The experiments show that this architecture,
using its optimal configurations, performs better than most state-of-the-art
systems. | |
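
As a rough illustration of the character n-gram input mentioned in the abstract, the sketch below extracts overlapping character n-grams from a string and counts them. It is not the thesis's implementation: the function names `char_ngrams` and `ngram_counts`, the lowercasing step, and the choice of n = 2 and n = 3 are assumptions made only for this example.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of `text`, in order."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A string shorter than n yields an empty list.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_counts(text, n_values=(2, 3)):
    """Count character n-grams for several n (hypothetical helper).

    Lowercasing is an assumption of this sketch, not a step
    taken from the thesis.
    """
    counts = Counter()
    for n in n_values:
        counts.update(char_ngrams(text.lower(), n))
    return counts
```

Because such features are derived from the text alone, they fit the platform-independent, text-only setting the abstract describes.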