Twitter Sentiment Analysis - Exploring Automatic Creation of Sentiment Lexica

In recent years, micro-blogging on the Internet has become a popular way of expressing one's thoughts and feelings. Twitter is a social networking service specialized in this phenomenon, with over 320 million monthly active users worldwide. The vast number of micro-blogs posted through the service every day makes it a rich source of opinionated text. In the field of Sentiment Analysis, or opinion mining, where the aim is to automatically extract the sentiment orientation of a text, there has been a shift towards opinionated Twitter data. This shift has led to an entirely new field of study: Twitter Sentiment Analysis.

In this Master's thesis, the fields of lexicon-based Sentiment Analysis and automatic creation of sentiment lexica have been explored. Based on our research in these fields, we developed both an automatic lexicon creator and a lexicon-based Sentiment Analysis system.

Our lexicon-based Sentiment Analysis system, using the best-performing sentiment lexicon produced by our automatic lexicon creator, achieves good results that nearly match those of systems using sophisticated machine learning approaches. In terms of run-time performance, our system significantly outperforms the other systems compared, demonstrating its capability for real-time classification of large volumes of tweets. In a lexicon comparison experiment, our automatically created lexicon outperforms a manually annotated lexicon, demonstrating the viability of automatically generated sentiment lexica in general and of the PMI (pointwise mutual information) approach in particular.

In addition, we found that the classifier must be tailored to each individual sentiment lexicon to exploit its full potential, and that the quality of sentiment lexica produced through the PMI approach is highly dependent on the overall quality of the labeled dataset they are derived from.
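The abstract does not spell out the exact PMI formulation or classifier used. As a rough illustration only, here is a minimal Python sketch assuming one common Twitter formulation, in which a word's score is PMI(w, positive) minus PMI(w, negative) computed from a corpus of tokenized, labeled tweets, paired with a naive classifier that simply sums word scores. The function names `build_pmi_lexicon` and `classify` and the parameters `min_count` and `smoothing` are hypothetical; this is not the thesis's actual lexicon creator or classifier.

```python
import math
from collections import Counter


def build_pmi_lexicon(tweets, min_count=2, smoothing=0.5):
    """Build a word -> score lexicon from (tokens, label) pairs.

    Hypothetical sketch (assumed formulation, not taken from the thesis):
    score(w) = PMI(w, positive) - PMI(w, negative), which simplifies to
    log2(P(w | positive) / P(w | negative)). Adding `smoothing` to each
    count avoids division by zero for words seen in only one class.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for tokens, label in tweets:
        if label == "positive":
            pos_counts.update(tokens)
        elif label == "negative":
            neg_counts.update(tokens)
        # Tweets with any other label (e.g. neutral) are ignored here.

    total_pos = sum(pos_counts.values())
    total_neg = sum(neg_counts.values())
    if total_pos == 0 or total_neg == 0:
        raise ValueError("need tokens from both positive and negative tweets")

    lexicon = {}
    for word in pos_counts.keys() | neg_counts.keys():
        # Skip rare words: their scores are dominated by noise.
        if pos_counts[word] + neg_counts[word] < min_count:
            continue
        p_pos = (pos_counts[word] + smoothing) / total_pos
        p_neg = (neg_counts[word] + smoothing) / total_neg
        lexicon[word] = math.log2(p_pos / p_neg)
    return lexicon


def classify(tokens, lexicon):
    """Naive lexicon-based classifier: the sign of the summed scores decides."""
    score = sum(lexicon.get(token, 0.0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Toy usage with made-up, pre-tokenized tweets.
train = [
    (["love", "this", "phone"], "positive"),
    (["great", "day"], "positive"),
    (["love", "it"], "positive"),
    (["hate", "this", "phone"], "negative"),
    (["awful", "day"], "negative"),
    (["hate", "it"], "negative"),
]
lexicon = build_pmi_lexicon(train)
print(classify(["love", "my", "phone"], lexicon))  # positive
print(classify(["hate", "mondays"], lexicon))      # negative
```

Because every score is derived directly from label counts, mislabeled tweets push word scores in the wrong direction, which fits the observation above that PMI lexicon quality depends on the quality of the labeled dataset. A real system would also need tokenization, negation handling, and a decision rule tuned to the particular lexicon.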

Publisher

NTNU