Investigating the Potential of Principal Component Analysis on Online Sales Records
Master thesis
Permanent link
http://hdl.handle.net/11250/2559718
Publication date
2018
Abstract
A multivariate method called principal component analysis (PCA) has been used to model and analyze patterns in online sales records. Specifically, sales from a Chinese hot pot soup company from 2016 and 2017 have been analyzed. Within the field of data mining, recent literature mentions PCA as a tool for data reduction and fails to comment on its analytic potential [26] [14]. The hot pot sales records have been transformed into three different data structures:
Daily sales - The products, or the provinces the products were shipped to, were used as columns. The sales were summed per day and placed in rows, so that each row represents one day.
Purchase times - The products, or the provinces the products were shipped to, were used as columns. The rows represent the total sales within 48 half-hour time intervals, the first being 00:00-00:30 and the last 23:30-00:00.
Customer-product matrix - The products were used as columns. The rows represent the different accounts that had bought one or more products, so that the dataset represents the combination of products each customer has bought.

The daily sales proved to contain interesting information that could be discovered with PCA: a change in purchase behavior over the years and seasonal differences in purchases. The purchase-time format yielded results very similar to those obtained by simply summing all the products sold within each time slot. However, it allowed the conclusion that the different products and provinces did not have significantly different purchase patterns, which could not have been investigated by summing the data alone. The customer-product matrix proved to be too sparse, consisting of too many zero values, for PCA to be effective.

In addition to visual interpretation, a method called SIMCA (soft independent modeling of class analogies) was used. SIMCA is a classification method that determines whether a new sample fits an existing PCA model. A model of the daily sales was built for 2016, and all samples from 2017 were tested against that model using SIMCA. Days in 2017 that were visibly different from 2016 in the model containing data from both years were selected manually and used for comparison. The results showed that SIMCA classified many of the manually selected days as unfit for the PCA model. This indicates that SIMCA could be used to continuously track purchase behavior and function as an alarm system for when changes in purchase behavior appear.
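The three data structures described in the abstract can be illustrated with a minimal sketch. It is not the thesis code. The record layout (timestamp, account, product, province, quantity), the product and account names, and the choice to sum quantities rather than revenue are all assumptions made for illustration, and only products are used as columns here.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical order records: (timestamp, account, product, province, quantity).
# Layout and values are illustrative assumptions, not the thesis data.
ORDERS = [
    ("2016-01-01 00:10", "acc_1", "spicy_base", "Sichuan", 2),
    ("2016-01-01 23:45", "acc_2", "mushroom_base", "Beijing", 1),
    ("2016-01-02 12:31", "acc_1", "spicy_base", "Beijing", 3),
    ("2017-06-15 12:05", "acc_2", "spicy_base", "Sichuan", 4),
]


def slot_index(ts):
    """Map a timestamp to one of 48 half-hour slots (0 = 00:00-00:30, 47 = 23:30-00:00)."""
    return ts.hour * 2 + ts.minute // 30


def build_matrices(orders):
    """Build the three matrices, all with products as columns."""
    columns = sorted({product for _, _, product, _, _ in orders})
    col = {name: i for i, name in enumerate(columns)}

    def empty_row():
        return [0] * len(columns)

    daily = defaultdict(empty_row)               # one row per calendar day
    slots = [empty_row() for _ in range(48)]     # one row per half-hour slot
    customers = defaultdict(empty_row)           # one row per account

    for stamp, account, product, _province, qty in orders:
        ts = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        j = col[product]
        daily[ts.date().isoformat()][j] += qty
        slots[slot_index(ts)][j] += qty
        customers[account][j] += qty

    return columns, dict(sorted(daily.items())), slots, dict(sorted(customers.items()))


columns, daily_sales, purchase_times, customer_product = build_matrices(ORDERS)
```

Each returned matrix is a plain list of rows that could be passed to a PCA routine. Swapping the product field for the province field when building the column index would give the province-based variants of the first two structures.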