Text Mining of News Articles for Stock Price Predictions
This thesis investigates whether automatic analysis of news articles can predict stock price changes immediately after the articles are published. It gives background on financial trading theory and text mining, together with an overview of earlier research on automatic analysis of news articles for predicting future stock prices. A system is designed and implemented to predict stock price trends for the time immediately after the publication of news articles. The system consists of four main components. The first gathers news articles and stock prices automatically from the internet. The second prepares the news articles by running them through document preprocessing steps and selecting relevant features before passing them to a document representation process. The third categorizes the news articles into predefined categories, and the fourth applies an appropriate trading strategy depending on the category of each article. The system requires a labeled data set to train the categorization component. This data set is labeled automatically on the basis of the price trend directly after each article's publication. An additional label refining step using clustering is added in an attempt to improve the labels given by the basic method of labeling by price trends.

The findings indicate that categorizing news articles provides additional information that can be used to forecast stock price trends. Experiments showed that the label refining method greatly improves the performance of the system. They also showed that the starting time of the price trends used to label the data sets had a significant impact on the results. Trading simulations performed with the system achieved positive returns (profits) on most of their trades.
Some of the methods also gave better results than trades based on the manually labeled data set.
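
The automatic labeling by price trend described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `label_by_price_trend`, the category names `up`, `down` and `flat`, the 0.5% threshold, and the `start_offset` parameter (standing in for the choice of when the price trend starts) are all assumptions made for illustration.

```python
from typing import Sequence


def label_by_price_trend(
    prices: Sequence[float],
    threshold: float = 0.005,   # assumed cut-off, not taken from the thesis
    start_offset: int = 0,      # assumed: skip this many observations first
) -> str:
    """Label one article from the prices observed after its publication.

    `prices` is a chronological series of prices for the stock the
    article concerns, starting at publication time.
    """
    window = prices[start_offset:]
    if len(window) < 2:
        raise ValueError("need at least two price observations")
    if window[0] <= 0:
        raise ValueError("starting price must be positive")

    # Relative change from the first to the last price in the window.
    change = (window[-1] - window[0]) / window[0]

    if change > threshold:
        return "up"
    if change < -threshold:
        return "down"
    return "flat"


if __name__ == "__main__":
    # Hypothetical price series, for illustration only.
    print(label_by_price_trend([100.0, 100.4, 101.0]))
    print(label_by_price_trend([100.0, 99.0, 101.0, 101.0], start_offset=1))
```

Changing `start_offset` changes which part of the price series determines the label, which is one simple way to explore the sensitivity to trend timing that the abstract reports.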