Identifying the Best Machine Learning Model for Predicting Bank Term Deposits: An Empirical Study Using Public, Post Financial Crisis Data.

Hjertaas, Marcus; Knudsen, Henrik Krantz; Lindstrøm, Jakob; Sælemyr, Joakim

dc.contributor.advisor	Becker, Denis
dc.contributor.author	Hjertaas, Marcus
dc.contributor.author	Knudsen, Henrik Krantz
dc.contributor.author	Lindstrøm, Jakob
dc.contributor.author	Sælemyr, Joakim
dc.date.accessioned	2023-06-22T17:20:08Z
dc.date.available	2023-06-22T17:20:08Z
dc.date.issued	2023
dc.identifier	no.ntnu:inspera:140915232:146054427
dc.identifier.uri	https://hdl.handle.net/11250/3072757
dc.description.abstract	The purpose of this bachelor thesis is to find the best machine learning model for accurately predicting bank deposits from the public. The problem is viewed from a business perspective, also taking into account real-world factors that applied during the period in which the data were collected. The data contain a range of information on a sample of Portuguese individuals, including whether or not these individuals have made a committed long-term bank deposit. The data have undergone a number of variable configurations and structural transformations. The observations were collected in the wake of the 2008 financial crisis. Four machine learning models were used for prediction. They are all based on recognized statistical classification models: binomial logistic regression, decision tree classification, artificial neural networks, and support vector machines. The models vary in complexity and have different advantages and disadvantages. They are used to address the prediction challenges contained in the data. It is not possible to combine the methods to compensate for their individual disadvantages. However, by using several methods, one can examine which model best meets the specific challenges posed by the data. The best model is chosen on the basis of an overall assessment. The first criterion is based on prediction accuracy and the model's ability to classify potential customers. The second criterion is based on the training data and the degree of transformation in the data structure. The models are configured to predict data given the conditions prevailing during the data collection period, and are therefore limited to these conditions. Our findings indicate that a model based on support vector machines, trained on data transformed by random undersampling, is the most accurate model. The model shows good prediction accuracy and good classification of the minority class.
dc.description.abstract	The purpose of this bachelor thesis is to find the best machine learning method for accurately predicting bank term deposits from the public. We view the issue from a business perspective, also taking real-world implications into account. The data to which the models are applied provide a range of information on a sample of Portuguese individuals, including whether or not these individuals have made committed long-term deposits with the bank. The data have gone through a set of configurations and resampling techniques. The data were retrieved in the wake of the Great Financial Crisis of 2008. Four machine learning models were chosen for prediction, all based on acknowledged statistical classification methods: binomial logistic regression, decision tree classifier, artificial neural networks, and support vector machines. These methods vary in complexity and have different sets of advantages and disadvantages. They are applied to face the prediction challenges in the data. It is not possible to combine the methods so that one covers another's disadvantages. However, by using several methods, it is believed that one can find the model that best faces the specific challenges posed by the data. The best model is selected through a comprehensive assessment based on two criteria. The first criterion is the prediction rate and the model's ability to predict actual potential customers. The second criterion concerns the model's training data and the amount of resampling interference applied to it. The models are configured to give predictions for the conditions during the period of data retrieval, and are therefore constrained to these conditions. According to our findings, the support vector machine model trained on the undersampled data is the most favorable model. The model both showed great prediction accuracy and managed to classify the minority class.
dc.language	eng
dc.publisher	NTNU
dc.title	Identifying the Best Machine Learning Model for Predicting Bank Term Deposits: An Empirical Study Using Public, Post Financial Crisis Data.
dc.type	Bachelor thesis

Files in this item

Name:: no.ntnu:inspera:140915232:1460 ...
Size:: 6.767Mb
Format:: PDF

Name:: no.ntnu:inspera:140915232:1460 ...
Size:: 2.273Mb
Format:: application/zip

This item appears in the following Collection(s)

NTNU Handelshøyskolen [1565]
