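Machine learning for the analysis of stock exchange announcements and stock price prediction (original title: Maskinlæring for analyse av børsmeldinger og aksjekursprediksjon)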

Medby, Karl Olav; Nordgård, Aleksander

Master thesis

URI

http://hdl.handle.net/11250/2623174

Date

2019

Collections

NTNU Handelshøyskolen [1718]

Abstract

Machine learning and natural language processing have shown great promise in several financial applications in recent years. In this thesis we build models that use machine learning and natural language processing to estimate stock price changes following the publication of corporate announcements on the Oslo Stock Exchange, and we use the findings to argue against the strongest form of market efficiency. We compare nine different models and then apply the most promising ones in a long/short trading strategy hedged against the Oslo Stock Exchange benchmark index. The results indicate that it is possible to achieve excess returns over the index, which suggests that corporate announcements should be included as decision input in an automated trading strategy.
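The abstract does not spell out how the index hedge is computed. As a minimal sketch, assuming each trade is a long (+1) or short (-1) position in the stock paired with an equal-sized opposite position in the benchmark index (a hedge ratio of 1, which is an assumption and not taken from the thesis), the hedged return of a trade could be written as follows. The function name and the sample numbers are hypothetical.

```python
def hedged_return(direction, stock_return, index_return):
    """Return of a position hedged one-to-one against the index.

    direction: +1 for a long position, -1 for a short position.
    stock_return, index_return: simple returns over the holding period.
    Assumes a hedge ratio of 1 (illustrative, not from the thesis).
    """
    return direction * (stock_return - index_return)


# Hypothetical trades: (predicted direction, stock return, index return)
trades = [
    (1, 0.04, 0.01),     # long, stock beat the index
    (-1, -0.03, 0.005),  # short, stock fell while the index rose
    (1, -0.01, 0.002),   # long, stock lagged the index
]

returns = [hedged_return(d, s, i) for d, s, i in trades]
mean_hedged_return = sum(returns) / len(returns)
```

A positive mean hedged return in this setup corresponds to excess return over the index, since the market move is subtracted from every trade.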

Our study found that the best results were obtained by representing the corpus as a TF-IDF matrix and reducing its dimensionality with latent semantic analysis before training the classifiers. A naive Bayes classifier gave the best cross-validation score on the training set, while a gradient boosting classifier performed best on the test set.
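The pipeline described above (TF-IDF, then latent semantic analysis, then a classifier) can be sketched with scikit-learn. This is an illustration under stated assumptions, not the authors' implementation: the toy corpus, the labels, the number of components, and the choice of `GaussianNB` as the naive Bayes variant are all hypothetical. Latent semantic analysis is approximated here with `TruncatedSVD`, whose output can contain negative values, which is why a Gaussian rather than a multinomial naive Bayes model is used in the sketch.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy announcements; label 1 = price rose, 0 = price fell.
texts = [
    "company reports record quarterly profit",
    "new contract awarded to the company",
    "dividend increased after strong results",
    "guidance raised for the full year",
    "company issues profit warning",
    "contract cancelled by major customer",
    "dividend suspended after weak results",
    "guidance lowered for the full year",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]


def build_pipeline(classifier, n_components=2):
    """TF-IDF matrix -> truncated SVD (LSA) -> classifier."""
    return make_pipeline(
        TfidfVectorizer(),
        TruncatedSVD(n_components=n_components, random_state=0),
        classifier,
    )


models = {
    "naive_bayes": build_pipeline(GaussianNB()),
    "gradient_boosting": build_pipeline(
        GradientBoostingClassifier(random_state=0)
    ),
}

# Fit each model and predict on the same toy data (no held-out set here).
predictions = {}
for name, model in models.items():
    model.fit(texts, labels)
    predictions[name] = model.predict(texts)
```

In a real experiment the models would be compared with cross-validation on the training set and then scored on a separate test set, as the abstract describes; `sklearn.model_selection.cross_val_score` is one way to do the former.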

Publisher

NTNU