Characterizing Twitter Data using Sentiment Analysis and Topic Modeling

Therkelsen, Jonas Foyn; Steinskog, Asbjørn Ottesen

dc.contributor.advisor	Gambäck, Björn
dc.contributor.author	Therkelsen, Jonas Foyn
dc.contributor.author	Steinskog, Asbjørn Ottesen
dc.date.accessioned	2016-10-14T14:00:57Z
dc.date.available	2016-10-14T14:00:57Z
dc.date.created	2016-06-26
dc.date.issued	2016
dc.identifier	ntnudaim:15796
dc.identifier.uri	http://hdl.handle.net/11250/2415333
dc.description.abstract	As the global community becomes increasingly connected, it is becoming ever more common to express thoughts and opinions through social networking websites. Twitter, currently the largest microblog website in the world, is heavily used for this purpose. Well-known politicians, comedians and trending persons use this medium to express their minds through 140-character messages. This makes Twitter one of the platforms with the most influence on the global web community's way of thinking. This thesis combines topic modeling and sentiment analysis to extract information from tweets. While sentiment analysis seeks to find out what opinions people hold, topic modeling tries to find out what they talk about. Conventional topic modeling schemes, such as Latent Dirichlet Allocation, are known to perform inadequately when applied to tweets, due to the sparsity of short documents. To alleviate this problem, we apply several pooling techniques that aggregate similar tweets into individual documents. We specifically study the aggregation of tweets sharing authors or hashtags. Our Twitter Sentiment Analysis system comprises seven different machine learning classifiers. These aim to predict whether a message's polarity is neutral, negative or positive. Four machine learning algorithms, Maximum Entropy, Naïve Bayes, Support Vector Machines and Stochastic Gradient Descent, have been proposed for performing sentiment classification in this thesis. The classifiers were trained through extensive grid searches over the parameter space and the preprocessing methods in order to achieve optimal classification scores. To combine topic modeling with sentiment analysis, a state-of-the-art visualization application, called TweetMoods, was built. TweetMoods simultaneously examines the topics contained in a Twitter corpus retrieved by a search query and the sentiments expressed in these tweets.
Our topic modeling results show that aggregating similar tweets into individual documents increases the topic coherence significantly. In message polarity classification on tweets, the Maximum Entropy classifier yielded results outperforming most work previously submitted to the International Workshop on Semantic Evaluation of 2015. This demonstrates the importance of our extensive grid searches for optimizing the parameter space of the classifiers.
dc.language	eng
dc.publisher	NTNU
dc.subject	Datateknologi, Software
dc.subject	Datateknologi, Komplekse datasystemer
dc.title	Characterizing Twitter Data using Sentiment Analysis and Topic Modeling
dc.type	Master thesis
dc.source.pagenumber	132

Files in this item

Name: 15796_FULLTEXT.pdf
Size: 6.461Mb
Format: PDF

Name: 15796_ATTACHMENT.zip
Size: 130.1Mb
Format: Unknown

Name: 15796_COVER.pdf
Size: 1.556Mb
Format: PDF

This item appears in the following Collection(s)

Institutt for datateknologi og informatikk [6544]
