dc.description.abstract | A multivariate method called principal component analysis has been used to model and
analyze patterns in online sales records. Specifically, sales from a Chinese hot pot soup
company from 2016 and 2017 have been analyzed. Within the field of data mining,
recent literature mentions PCA as a tool for data reduction and fails to comment on its
analytic potential [26] [14]. The hot pot sales records have been transformed into three
different data structures:
Daily sales - The products, or the provinces the products were shipped to, were
used as columns. Each row represents one day and holds the summed sales for
that day.
Purchase times - The products, or the provinces the products were shipped to,
were used as columns. Each row holds the total sales within one of 48
half-hour intervals. The first interval is 00:00-00:30 and the last is 23:30-00:00.
Customer-product matrix - The products were used as columns. Each row
represents an account that had bought one or more products, so that each row
describes the combination of products that customer has bought.
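As an illustration only, the three structures could be built from a flat order table roughly as follows. This is a minimal sketch using pandas; the column names (`account`, `product`, `timestamp`, `quantity`) and the sample rows are assumptions made for the example and are not taken from the actual sales records.

```python
import pandas as pd

# Hypothetical order records; the schema and values are invented for illustration.
orders = pd.DataFrame({
    "account": ["a1", "a2", "a1", "a3"],
    "product": ["spicy", "mild", "mild", "spicy"],
    "timestamp": pd.to_datetime([
        "2016-01-01 00:10", "2016-01-01 23:45",
        "2016-01-02 12:05", "2016-01-02 12:20",
    ]),
    "quantity": [2, 1, 3, 1],
})

# Derive the calendar day and the half-hour slot index (0..47) of each order.
orders["day"] = orders["timestamp"].dt.date
orders["slot"] = orders["timestamp"].dt.hour * 2 + orders["timestamp"].dt.minute // 30


def sales_matrix(frame: pd.DataFrame, row_key: str) -> pd.DataFrame:
    """Sum quantities into a matrix with one row per row_key and one column per product."""
    return frame.pivot_table(
        index=row_key, columns="product", values="quantity",
        aggfunc="sum", fill_value=0,
    )


# Daily sales: one row per day.
daily = sales_matrix(orders, "day")

# Purchase times: one row per half-hour slot, including slots without any sales.
slots = sales_matrix(orders, "slot").reindex(range(48), fill_value=0)

# Customer-product matrix: one row per account.
customers = sales_matrix(orders, "account")
```

Using provinces instead of products as columns would only require swapping the `columns` argument for a province column, if one is available in the data.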
The daily sales proved to contain interesting information which could be discovered
with PCA. A change in purchase behavior over the years and seasonal differences
in purchases were discovered. The purchase time format yielded results very similar
to those obtained by summing all products sold within each time slot. It did, however,
show that the different products and provinces did not have significantly different
purchase patterns, which could not have been established by summing the data alone. The
customer-product matrix proved to be too sparse, consisting of too many zero values,
for the use of PCA to be effective. In addition to visual interpretation, a method called
SIMCA (soft independent modeling of class analogies) was used. SIMCA is a method
for classification, determining if a new sample fits an existing PCA model. A model for
the daily sales was built for 2016, and all samples of 2017 were tested on that model
using SIMCA. Days in 2017 that visibly differed from 2016 in a model containing
data from both years were selected manually and used for comparison. The results
showed that SIMCA classified many of the manually selected days as unfit for the
PCA model. This indicates that SIMCA could be used to continuously track purchase
behavior and function as an alarm system for when changes in purchase behavior
appear. | |