An efficient algorithm for mining top-k on-shelf high utility itemsets

Dam, Thu-Lan; Li, Kenli; Fournier-Viger, Philippe; Duong, Quang-Huy

dc.contributor.author	Dam, Thu-Lan
dc.contributor.author	Li, Kenli
dc.contributor.author	Fournier-Viger, Philippe
dc.contributor.author	Duong, Quang-Huy
dc.date.accessioned	2018-04-04T10:46:46Z
dc.date.available	2018-04-04T10:46:46Z
dc.date.created	2017-07-19T13:33:37Z
dc.date.issued	2017
dc.identifier.citation	Knowledge and Information Systems. 2017, Published ahead of print 1-35.	nb_NO
dc.identifier.issn	0219-1377
dc.identifier.uri	http://hdl.handle.net/11250/2492555
dc.description.abstract	High on-shelf utility itemset (HOU) mining is an emerging data mining task that consists of discovering sets of items that generate a high profit in transaction databases. HOU mining is more difficult than traditional high utility itemset (HUI) mining because it also considers the shelf time of items and items having negative unit profits. It can discover more useful and interesting patterns in real-life applications than traditional HUI mining. Several algorithms have been proposed for this task. However, a major drawback of these algorithms is that it is difficult for users to find a suitable value for the minimum utility threshold parameter. If the threshold is set too high, not enough patterns are found. If it is set too low, too many patterns are found and the algorithm may use an excessive amount of time and memory. To address this issue, we propose the problem of top-k on-shelf high utility itemset mining, where the user directly specifies k, the desired number of patterns to be output, instead of a minimum utility threshold. An efficient algorithm named KOSHU (fast top-K on-shelf high utility itemset miner) is proposed to mine the top-k HOUs while considering on-shelf time periods of items and items having positive and/or negative unit profits. KOSHU introduces three novel strategies to reduce the search space, named efficient estimated co-occurrence maximum period rate pruning, period utility pruning, and concurrence existing of a pair 2-itemset pruning. KOSHU also incorporates several novel optimizations and a faster method for constructing utility-lists. An extensive performance study on real-life and synthetic datasets shows that the proposed algorithm is efficient in terms of both runtime and memory consumption and has excellent scalability.	nb_NO
dc.language.iso	eng	nb_NO
dc.publisher	Springer Verlag	nb_NO
dc.title	An efficient algorithm for mining top-k on-shelf high utility itemsets	nb_NO
dc.type	Journal article	nb_NO
dc.type	Peer reviewed	nb_NO
dc.description.version	acceptedVersion	nb_NO
dc.source.pagenumber	1-35	nb_NO
dc.source.volume	Published ahead of print	nb_NO
dc.source.journal	Knowledge and Information Systems	nb_NO
dc.identifier.doi	10.1007/s10115-016-1020-2
dc.identifier.cristin	1482609
dc.description.localcode	This is a post-peer-review, pre-copyedit version of an article published in Knowledge and Information Systems. The final authenticated version is available online at: https://link.springer.com/article/10.1007%2Fs10115-016-1020-2	nb_NO
cristin.unitcode	194,63,10,0
cristin.unitname	Institutt for datateknologi og informatikk
cristin.ispublished	true
cristin.fulltext	postprint
cristin.qualitycode	1

Files in this item

Name: KAIS-D-16-00152R2.pdf
Size: 2.575Mb
Format: PDF


This item appears in the following Collection(s)

Institutt for datateknologi og informatikk [6544]
Publikasjoner fra CRIStin - NTNU [37175]
