Classification and Visualisation of Twitter Sentiment Data

The micro-blogging site Twitter grows its user base each day and has become an attractive platform for companies, politicians, marketers, and others wishing to share information and opinions. As the user base grows, more and more systems and research are released that take advantage of Twitter's informal nature for opinion mining and sentiment analysis.

This master's thesis describes a system for sentiment analysis on Twitter data and uses grid searches over various combinations of machine learning algorithms, features, and preprocessing methods to find a well-performing configuration. The classification system is fairly domain independent and performs better than the baseline.

The system is designed to be fast enough to classify large amounts of data, including tweets arriving in a stream, and provides an application programming interface (API) for easily transferring data to applications or end users.

Three visualisation applications are implemented, showing how to use the API and providing examples of how sentiment data can be used.

The main contributions are:

C1: A literature study of the state of the art in Twitter Sentiment Analysis.

C2: The implementation of a general system architecture for doing Twitter Sentiment Analysis.

C3: A comparison of different machine learning algorithms for the task of identifying sentiments in short messages in a fairly domain-independent setting.

C4: Implementations of a set of visualisation applications, showing how to use data from the generic system and providing examples of how to present sentiment analysis data.

Publisher

NTNU