Deep Learning with emphasis on extracting information from text data
Abstract
In this thesis, two Natural Language Processing (NLP) problems were analyzed: predicting whether a movie review expresses negative or positive sentiment (sentiment analysis), and Automated Essay Scoring (AES). The movie reviews come from the IMDB database, and the essays were published by the Hewlett Foundation. Features were extracted using both conventional methods, such as Bag of Words, and newer methods, such as word vectors. These features were used to train both conventional statistical methods and more computationally demanding Deep Learning models. The results show that the conventional methods still perform well relative to the newer, currently popular Deep Learning methods on the problems tested in this thesis. However, a substantially larger number of available observations might change this.