Neural networks for sentiment analysis in AsterixDB

Finckenhagen, Johan Morten Kristoffer

Master thesis

Permanent link

http://hdl.handle.net/11250/2569391

Publication date

2018

Collections

Institutt for datateknologi og informatikk [6822]

Abstract

As data is generated at an ever-increasing rate and social media platforms grow larger and more comprehensive, both the availability of these data and the possibility of analyzing them are growing as well. Storing large volumes of data has become trivial over the years, and massive amounts of information are stored every second. There is a competitive advantage in being able to extract meaningful information from data that were previously considered insignificant. Sentiment analysis comprises methods for retrieving an author's attitude towards the topic discussed. Knowledge about the sentiment of the masses can be of significant market value, for example in knowing customers' satisfaction with a product or service. Another useful application is mining Twitter to follow opinions about the latest trends in order to develop new market strategies. Given the recent success of deep learning, it is of great interest to observe how these practices can be combined to analyze big data.

Previously, the technique for processing natural language was to create vocabularies of arbitrary order for further processing, but by introducing multi-dimensional vectors we can store semantic and syntactic information about words in a vector space. These vectors have proven highly effective for natural language processing, and even more so as an input format for deep learning algorithms such as neural networks.

In this study, several neural network models are evaluated against traditional classification algorithms, measured by accuracy, on sentiment classification of Twitter messages. Each model is tested for prediction speed on big data using AsterixDB, a big data management system that supports ingestion of streaming data. The best accuracy score obtained in this study is 84.02%, and the fastest networks can handle an average of about 10,000 tweets per second.

Publisher

NTNU