Show simple item record

dc.contributor.advisor: Martens, Harald
dc.contributor.author: Lekva, Trym Tarjeison
dc.date.accessioned: 2018-08-28T14:01:43Z
dc.date.available: 2018-08-28T14:01:43Z
dc.date.created: 2018-06-04
dc.date.issued: 2018
dc.identifier: ntnudaim:18661
dc.identifier.uri: http://hdl.handle.net/11250/2559718
dc.description.abstract: Principal component analysis (PCA), a multivariate method, has been used to model and analyze patterns in online sales records. Specifically, sales from a Chinese hot pot soup company in 2016 and 2017 have been analyzed. Within the field of data mining, recent literature mentions PCA as a tool for data reduction but fails to comment on its analytic potential [26] [14]. The hot pot sales records have been transformed into three different data structures:
Daily sales - The products, or the provinces the products were shipped to, were used as columns. Each row represents one day and holds the summed sales for that day.
Purchase times - The products, or the provinces the products were shipped to, were used as columns. Each row holds the total sales within one of 48 half-hour intervals, the first being 00:00-00:30 and the last 23:30-00:00.
Customer-product matrix - The products were used as columns. Each row represents an account that had bought one or more products, so the dataset describes the combination of products each customer has bought.
The daily sales proved to contain interesting information that could be discovered with PCA: a change in purchase behavior between the two years and seasonal differences in purchases. The purchase time format yielded results very similar to those obtained by simply summing all products sold within each time slot. However, it did allow the conclusion that the different products and provinces did not have significantly different purchase patterns, which could not have been investigated by summing the data alone. The customer-product matrix proved to be too sparse, consisting of too many zero values, for PCA to be effective. In addition to visual interpretation, a method called SIMCA (soft independent modeling of class analogies) was used. SIMCA is a classification method that determines whether a new sample fits an existing PCA model. A model of the daily sales was built for 2016, and all samples from 2017 were tested against that model using SIMCA. Days in 2017 that visibly differed from 2016 in a model containing data from both years were selected manually and used for comparison. SIMCA classified many of the manually selected days as not fitting the 2016 PCA model. This indicates that SIMCA could be used to track purchase behavior continuously and function as an alarm system for when changes in purchase behavior appear.
dc.language: eng
dc.publisher: NTNU
dc.subject: Cybernetics and Robotics
dc.title: Investigating the Potential of Principal Component Analysis on Online Sales Records
dc.type: Master thesis
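
The daily sales and purchase time structures described in the abstract can be illustrated with a small sketch. The order records, product names, and the helper `build_matrix` below are hypothetical and are not taken from the thesis; the sketch only shows how timestamped sales might be summed into rows per day or per half-hour interval, with products as columns.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical order records: (timestamp, product, quantity).
orders = [
    ("2016-01-01 00:10", "soup_base_a", 2),
    ("2016-01-01 12:45", "soup_base_b", 1),
    ("2016-01-02 12:50", "soup_base_a", 3),
    ("2016-01-02 23:40", "soup_base_b", 4),
]


def build_matrix(records, row_key):
    """Sum quantities into a rows-by-products table.

    row_key maps a datetime to the row label (a day or a time interval).
    """
    products = sorted({product for _, product, _ in records})
    totals = defaultdict(lambda: dict.fromkeys(products, 0))
    for stamp, product, quantity in records:
        moment = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        totals[row_key(moment)][product] += quantity
    return products, dict(totals)


def day_of(moment):
    # Row label for the daily sales structure.
    return moment.date().isoformat()


def half_hour_slot(moment):
    # Row label for the purchase time structure:
    # slot 0 is 00:00-00:30 and slot 47 is 23:30-00:00.
    return moment.hour * 2 + moment.minute // 30


products, daily = build_matrix(orders, day_of)
_, by_slot = build_matrix(orders, half_hour_slot)

for day, row in sorted(daily.items()):
    print(day, [row[p] for p in products])
```

Only rows with at least one sale appear in this sketch; a full 48-row purchase time matrix would also need zero-filled rows for the empty intervals. Replacing the product with the destination province in each record would give the province-based variants.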

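The idea of testing new days against an existing PCA model can be sketched as follows. This is a simplified illustration and not the procedure used in the thesis: the data are synthetic, the number of components is assumed to be one, and the acceptance limit is taken as an empirical percentile of the reference residuals, whereas SIMCA implementations typically derive the limit from a statistical test on the residual variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a daily sales matrix: rows are days, columns are
# five products whose sales follow one shared latent factor plus noise.
true_loadings = np.array([1.0, 0.8, 0.6, 0.4, 0.2])


def simulate(n_days):
    factor = rng.normal(size=(n_days, 1))
    noise = 0.05 * rng.normal(size=(n_days, 5))
    return factor * true_loadings + noise


train = simulate(200)        # plays the role of the reference year
new_ok = simulate(50)        # new days with unchanged behavior
new_shifted = new_ok.copy()
new_shifted[:, 4] += 2.0     # new days where one product's sales changed

# Fit PCA on the reference period via SVD of the mean-centered data.
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
components = vt[:1]          # keep one component (an assumption)


def residual_q(x):
    """Squared distance from each row to the PCA model plane."""
    centered = x - mean
    scores = centered @ components.T
    residual = centered - scores @ components
    return (residual ** 2).sum(axis=1)


# Assumed acceptance limit: 99th percentile of the reference residuals.
limit = np.percentile(residual_q(train), 99)

flag_ok = residual_q(new_ok) > limit
flag_shifted = residual_q(new_shifted) > limit
print("flagged among unchanged days:", int(flag_ok.sum()))
print("flagged among shifted days:", int(flag_shifted.sum()))
```

Days whose residual exceeds the limit are the ones such a monitor would report as not fitting the reference model, which is the alarm-style use the abstract suggests.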

Files in this item


This item appears in the following collection(s)
