Towards efficiently mining closed high utility itemsets from incremental databases
Journal article, Peer reviewed
MetadataVis full innførsel
OriginalversjonKnowledge-Based Systems. 2019, 165 13-29. 10.1016/j.knosys.2018.11.019
The set of closed high-utility itemsets (CHUIs) concisely represents the exact utility of all itemsets. Yet, it can be several orders of magnitude smaller than the set of all high-utility itemsets. Existing CHUI mining algorithms assume that databases are static, making them very expensive in the case of incremental data, since the whole dataset has to be processed for each batch of new transactions. To address this challenge, this paper presents the first approach, called IncCHUI, that mines CHUIs efficiently from incremental databases. In order to achieve this, we propose an incremental utility-list structure, which is built and updated with only one database scan. Further, we apply effective pruning strategies to fast construct incremental utility-lists and eliminate candidates that are not updated. Finally, we suggest an efficient hash-based approach to update or insert new closed sets that are found. Our extensive experimental evaluation on both real-life and synthetic databases shows the efficiency, as well as the feasibility of our approach. It significantly outperforms previously proposed methods that are mainly run in batch mode in terms of speed, and it is scalable with respect to the number of transactions.