Clustering users in an electronic business reference system
Abstract
When making strategic decisions in a business setting, it is advantageous to know as much about the users of your products as possible. Information about what interest segments exist can be used for search optimization, product improvement and custom-tailored marketing.

This project belongs to the field of knowledge discovery in databases and concerns the discovery of user interest clusters in an electronic business reference system using an implicit voting scheme based on the system's web logs. A literature review is conducted to explore recent efforts in the field, experiments are conducted to apply the theory from the literature review, and a qualitative analysis is conducted on the results of the experiments.

The main contributions of this thesis are a comparison of Spearman Rank correlation and Frequency-Weighted Pearson correlation in terms of scalability, and the application of Blondel's algorithm to a previously unexplored data set generated by users in a professional work setting. The results show that Frequency-Weighted Pearson correlation is the more scalable alternative, and that clusters do exist in the data set. Furthermore, it is shown that there are seasonal variations in the data set and in the discovered interest groups.