Sales prediction in online banking

S. Stenberg, Erlend; Iden, Mathias

dc.contributor.advisor	Gulla, Jon Atle
dc.contributor.author	S. Stenberg, Erlend
dc.contributor.author	Iden, Mathias
dc.date.accessioned	2018-10-24T14:00:20Z
dc.date.available	2018-10-24T14:00:20Z
dc.date.created	2018-06-19
dc.date.issued	2018
dc.identifier	ntnudaim:19987
dc.identifier.uri	http://hdl.handle.net/11250/2569390
dc.description.abstract	This master's thesis explores how machine learning methods can be applied to predict which customers are likely to purchase a credit card at Sparebank 1 SMN. The sales prediction problem has many similarities with customer churn prediction. We examine the current literature on both problems within the banking domain and adapt several techniques to our project. The experiment follows an exploratory, result-driven approach with the primary goal of answering three research questions. We develop two machine learning models: one from event logs of customers' interactions with the bank's online services, and one from customers' personal attributes. We define two pipelines, one for each dataset, and evaluate multiple classification algorithms in both. The first pipeline is exploratory in nature, as little research has examined how sequential event data in the form of customer timelines can be used to train a classification model. The second pipeline is based on a traditional dataset of static customer attributes, as commonly seen in state-of-the-art research. We apply various preprocessing and data aggregation techniques to optimise the datasets for further analysis. By applying sampling and feature selection techniques, we measure the effect on model performance in terms of how well the models identify likely credit card purchasers while reducing the number of incorrectly predicted purchasers. After finalising each pipeline, we examine whether a combination of the models produces better results than either model in isolation. Finally, we attempt to uncover customer segments that are likely to produce high-confidence predictions. Our main findings show that the Random Forest algorithm achieves the highest performance on both datasets. The customer event timelines produced a higher-performing model than the static customer attributes in terms of identifying likely credit card purchasers. The combination of the two models identifies slightly fewer purchasers than either model in isolation, but greatly reduces the number of incorrectly predicted purchasers. Furthermore, by using sampling techniques to balance the proportion of purchasers to non-purchasers in the datasets, we are able to control the model's ratio between correctly and incorrectly identified purchasers.
dc.language	eng
dc.publisher	NTNU
dc.subject	Computer Science, Software Development
dc.title	Sales prediction in online banking
dc.type	Master thesis
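
The abstract reports that combining the two models identifies slightly fewer purchasers but greatly reduces incorrectly predicted purchasers. It does not state how the models were combined. The sketch below assumes the simplest possible rule, flagging a customer only when both models do, and uses made-up scores purely to illustrate that trade-off. The names `event_scores`, `attr_scores`, `predict`, `combine_and`, and `confusion`, as well as the 0.5 threshold, are hypothetical and not taken from the thesis.

```python
from typing import List, Tuple


def predict(scores: List[float], threshold: float = 0.5) -> List[bool]:
    """Flag a customer as a likely purchaser when the score reaches the threshold."""
    return [s >= threshold for s in scores]


def combine_and(a: List[bool], b: List[bool]) -> List[bool]:
    """Assumed combination rule: flag a customer only if both models flag them."""
    return [x and y for x, y in zip(a, b)]


def confusion(pred: List[bool], truth: List[bool]) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives)."""
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum((not p) and t for p, t in zip(pred, truth))
    return tp, fp, fn


# Illustrative toy data only: 4 purchasers and 6 non-purchasers.
truth = [True, True, True, True, False, False, False, False, False, False]
# Hypothetical scores from an event-timeline model and a static-attribute model.
event_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.7, 0.2, 0.1, 0.3, 0.2]
attr_scores = [0.8, 0.7, 0.3, 0.6, 0.2, 0.6, 0.7, 0.1, 0.6, 0.3]

event_pred = predict(event_scores)
attr_pred = predict(attr_scores)
both_pred = combine_and(event_pred, attr_pred)

for name, pred in [("events", event_pred), ("attributes", attr_pred), ("combined", both_pred)]:
    tp, fp, fn = confusion(pred, truth)
    print(f"{name}: correct={tp} incorrect={fp} missed={fn}")
```

On this toy data the combined rule finds one purchaser fewer than either model alone while producing fewer incorrect predictions, which mirrors the direction of the effect described in the abstract, not its magnitude.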

Files in this item

Name: 19987_FULLTEXT.pdf
Size: 2.245Mb
Format: PDF

Name: 19987_COVER.pdf
Size: 1.556Mb
Format: PDF

This item appears in the following Collection(s)

Institutt for datateknologi og informatikk [6544]