Distant Supervision and Sentiment Embeddings for Ternary Twitter Sentiment Analysis

Lichtenberg, Frederik Gørvell de; Byrkjeland, Mats

dc.contributor.advisor	Gambäck, Björn
dc.contributor.author	Lichtenberg, Frederik Gørvell de
dc.contributor.author	Byrkjeland, Mats
dc.date.accessioned	2017-10-04T14:00:24Z
dc.date.available	2017-10-04T14:00:24Z
dc.date.created	2017-06-11
dc.date.issued	2017
dc.identifier	ntnudaim:16951
dc.identifier.uri	http://hdl.handle.net/11250/2458479
dc.description.abstract	Tang et al. (2014) acknowledged the inability of context-based word embeddings to discriminate between words with opposite sentiments that appear in similar contexts. An example is the words good and bad, two opposites that appear in the same contexts. Context-based word embedding methods like word2vec would likely treat these as similar words. Tang et al. proposed a promising method for incorporating sentiment information in word embeddings. These embeddings are called Sentiment-Specific Word Embeddings or Sentiment Embeddings. To train sentiment embeddings, large amounts of sentiment-annotated data are needed. Manual annotation is too expensive for this purpose. Fast, automatic annotation is used to set a low-quality (weak) label on large corpora of tweets. This procedure is often referred to as distant supervision. The traditional approach is to use the occurrences of emoticons to guess binary sentiment (positive/negative). In this thesis, we compare various lexicon-based sentiment classifiers against each other on manually annotated Twitter data from the International Workshop on Semantic Evaluation (SemEval). Their performance as distant supervision methods is tested as part of a complete Twitter Sentiment Analysis system. Instead of only looking at the positive and negative sentiment classes, the neutral class is included. Both prediction performance and speed of the distant supervision methods are evaluated. We propose the Ternary Sentiment Embedding Model, a new model for creating sentiment embeddings for the ternary sentiment classification task. It is based on the Hybrid Ranking Model of Tang et al. (2016), but trains on ternary-labeled distant-supervised data instead of binary-labeled. The model trains sentiment embeddings from datasets made with different distant supervision methods. The model is used as part of a complete Twitter Sentiment Analysis system and is compared to existing systems. The experiments of Chapter 8 show that the Ternary Sentiment Embedding Model performs better than the Hybrid Ranking Model of Tang et al. (2016) in most cases. Our results show that the quality of the distant-supervised dataset has a great impact on the quality of the produced sentiment embeddings, and hence the entire Twitter Sentiment Analysis system.
dc.language	eng
dc.publisher	NTNU
dc.subject	Computer Science, Artificial Intelligence
dc.title	Distant Supervision and Sentiment Embeddings for Ternary Twitter Sentiment Analysis
dc.type	Master thesis
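
The lexicon-based distant supervision described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the lexicon `TOY_LEXICON`, the functions `weak_label` and `build_weak_dataset`, and the `threshold` parameter are all hypothetical names chosen for illustration. The sketch assumes a word-level polarity lexicon and assigns a weak ternary label (positive, negative, or neutral) to each tweet from the sum of its word scores.

```python
import re

# Hypothetical toy lexicon (word -> polarity score), for illustration only.
# Real sentiment lexicons are far larger and are not reproduced here.
TOY_LEXICON = {
    "good": 1.0, "great": 1.0, "love": 1.0,
    "bad": -1.0, "awful": -1.0, "hate": -1.0,
}


def weak_label(tweet, threshold=0.5):
    """Assign a weak ternary label by summing lexicon scores of the tokens.

    The threshold is an assumed parameter: scores inside
    (-threshold, threshold) are labeled neutral.
    """
    tokens = re.findall(r"[a-z']+", tweet.lower())
    score = sum(TOY_LEXICON.get(token, 0.0) for token in tokens)
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def build_weak_dataset(tweets):
    """Pair every tweet with its weak label, with no manual annotation."""
    return [(tweet, weak_label(tweet)) for tweet in tweets]
```

A dataset labeled this way could then serve as training data for sentiment embeddings; as the abstract notes, the quality of the weak labels strongly affects the resulting embeddings.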

Associated file(s)

Filename: 16951_FULLTEXT.pdf
Size: 3.511 MB
Format: PDF

Filename: 16951_ATTACHMENT.zip
Size: 102.0 kB
Format: application/zip

Filename: 16951_COVER.pdf
Size: 1.556 MB
Format: PDF

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6558]
