Mining High Utility Itemsets without Candidate Generation
Date: 2013/05/13
Author: Mengchi Liu, Junfeng Qu
Source: CIKM '12
Advisor: Jia-ling Koh
Speaker: I-Chih Chiu
Outline
• Introduction
• Problem Definition
• Utility-List Structure
• High Utility Itemset Miner
• Experiment
• Conclusion
Introduction
• The rapid development of database techniques facilitates the storage and usage of massive data from business corporations, governments, and scientific organizations.
• High utility itemset mining is one of the most important problems derived from the well-known frequent itemset mining problem.
Introduction
• Traditional frequent itemset mining algorithms cannot evaluate the utility information about itemsets.
• In a supermarket database:
    Each item has a distinct price/profit.
    Each item in a transaction is associated with a distinct quantity.
    An itemset with high support may have low utility.

Ex:

| itemset | support | total utility |
|---|---|---|
| egg, bread | 10 | 30 |
| beef, pork | 5 | 45 |
Motivation
• Recently, a number of high utility itemset mining algorithms have been proposed.
    Generate candidate high utility itemsets.
    Compute the exact utilities of the candidates by scanning the database to identify high utility itemsets.
• However, these algorithms often generate a very large number of candidate itemsets.
    Excessive memory requirement for storing candidate itemsets.
    A large amount of running time for generating candidates and computing their exact utilities.
Goal
• A novel structure, called utility-list, is proposed. It stores:
    the utility information about an itemset;
    the heuristic information about whether the itemset should be pruned or not.
• An efficient algorithm, called HUI-Miner (High Utility Itemset Miner), is developed.
    It does not generate candidate high utility itemsets.
    It can mine high utility itemsets after constructing the initial utility-lists.
Diagram
[Figure: transactions → Construct utility-lists → HUI-Miner → High utility itemsets]
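Problem Definition
• $I = \{i_1, i_2, \ldots, i_n\}$: a set of items.
• Each transaction ($T$) has a unique identifier ($tid$).

Def. 1. $iu(i, T)$: the internal utility of item $i$ in transaction $T$ is the count value associated with $i$ in $T$ in the transaction table.
Def. 2. $eu(i)$: the external utility of item $i$ is the utility value of $i$ in the utility table.
Def. 3. $u(i, T)$: the utility of item $i$ in transaction $T$ is the product of $iu(i, T)$ and $eu(i)$.

Ex: $u(e, T_5) = iu(e, T_5) \times eu(e)$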
Def. 4. $u(X, T)$: the utility of itemset $X$ in transaction $T$ is the sum of the utilities of all the items in $X$ in $T$, where $X \subseteq T$.
Def. 5. $u(X)$: the utility of itemset $X$ is the sum of the utilities of $X$ in all the transactions $T$ in $DB$, where $X \subseteq T$.
Def. 6. $tu(T)$: the transaction utility of $T$ is the sum of the utilities of all the items in $T$, where $T \in DB$.

Ex: $u(\{ae\}, T_2) = u(a, T_2) + u(e, T_2)$
Ex: $u(\{ae\}) = u(\{ae\}, T_2) + u(\{ae\}, T_5)$
[Table: Transaction Utility]
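Defs. 1 to 6 can be sketched in a few lines of Python. The utility table `EU` and transaction table `DB` below are assumptions: the original tables were images and are not in this transcript, so the values are a reconstruction chosen to agree with the examples that do survive (for instance, only $T_2$ and $T_5$ contain both $a$ and $e$). The function names are hypothetical.

```python
# Assumed example data (hypothetical reconstruction, not taken from the transcript).
# EU: external utility per item; DB: tid -> {item: internal utility (count)}.
EU = {"a": 1, "b": 2, "c": 1, "d": 5, "e": 4, "f": 3, "g": 1}
DB = {
    1: {"b": 1, "c": 2, "d": 1, "g": 1},
    2: {"a": 4, "b": 1, "c": 3, "d": 1, "e": 1},
    3: {"a": 4, "c": 2, "d": 1},
    4: {"c": 2, "e": 1, "f": 1},
    5: {"a": 5, "b": 2, "d": 1, "e": 2},
    6: {"a": 3, "b": 4, "c": 1, "f": 2},
    7: {"d": 1, "g": 5},
}

def u_item(i, tid):
    """Def. 3: u(i, T) = iu(i, T) * eu(i)."""
    return DB[tid][i] * EU[i]

def u_itemset_in(X, tid):
    """Def. 4: u(X, T), the sum of item utilities of X in T (requires X ⊆ T)."""
    assert set(X) <= set(DB[tid])
    return sum(u_item(i, tid) for i in X)

def u_itemset(X):
    """Def. 5: u(X), summed over every transaction that contains X."""
    return sum(u_itemset_in(X, tid) for tid, t in DB.items() if set(X) <= set(t))

def tu(tid):
    """Def. 6: transaction utility, the sum of all item utilities in T."""
    return sum(u_item(i, tid) for i in DB[tid])

print(u_item("e", 5), u_itemset("ae"), tu(2))
```

Itemsets are written as strings of single-letter items (`"ae"` stands for {a, e}) purely to keep the sketch short.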
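Def. 7. $twu(X)$: the transaction-weighted utility of itemset $X$ in $DB$ is the sum of the utilities of all the transactions containing $X$ in $DB$, where $X \subseteq T$ and $T \in DB$.
Property 1. If $twu(X)$ is less than a given $minutil$, all supersets of $X$ are not high utility.
Rationale. If $X \subseteq X'$, then $u(X') \le twu(X') \le twu(X) < minutil$.

Ex: $twu(\{f\}) = tu(T_4) + tu(T_6)$
Ex: Assume $minutil = 30$. According to Property 1, all supersets of $\{f\}$ and $\{g\}$ are not high utility.
[Tables: Transaction Utility; Transaction-Weighted Utility]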
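A small sketch of Def. 7 and Property 1 follows. As before, `EU` and `DB` are an assumed reconstruction of the slide's example tables (the originals were images), and `twu` and `promising` are hypothetical names.

```python
# Assumed example data (hypothetical reconstruction, not taken from the transcript).
EU = {"a": 1, "b": 2, "c": 1, "d": 5, "e": 4, "f": 3, "g": 1}
DB = {
    1: {"b": 1, "c": 2, "d": 1, "g": 1},
    2: {"a": 4, "b": 1, "c": 3, "d": 1, "e": 1},
    3: {"a": 4, "c": 2, "d": 1},
    4: {"c": 2, "e": 1, "f": 1},
    5: {"a": 5, "b": 2, "d": 1, "e": 2},
    6: {"a": 3, "b": 4, "c": 1, "f": 2},
    7: {"d": 1, "g": 5},
}

def tu(t):
    """Transaction utility of one transaction given as {item: count}."""
    return sum(q * EU[i] for i, q in t.items())

def twu(X):
    """Def. 7: sum of tu(T) over every transaction T that contains X."""
    return sum(tu(t) for t in DB.values() if set(X) <= set(t))

minutil = 30
item_twu = {i: twu(i) for i in EU}
# Property 1: an item whose TWU is below minutil cannot occur in any
# high utility itemset, so only the remaining items need to be kept.
promising = sorted(i for i, v in item_twu.items() if v >= minutil)
print(item_twu, promising)
```

Under these assumed tables, `f` and `g` fall below the threshold, which agrees with the slide's example.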
Outline
• Introduction
• Problem Definition
• Utility-List Structure
    Initial Utility-Lists
    Utility-Lists of 2-Itemsets
    Utility-Lists of k-Itemsets (k ≥ 3)
• High Utility Itemset Miner
• Experiment
• Conclusion
Initial Utility-Lists
Def. 8. A transaction is considered "revised" after
(1) all the items whose transaction-weighted utilities are less than a given $minutil$ are deleted from the transaction;
(2) the remaining items are sorted in transaction-weighted-utility-ascending order.

Ex: Suppose $minutil = 30$. The remaining items are sorted: e < c < b < a < d.
[Tables: Transaction-Weighted Utility; All Revised Transactions]
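The two steps of Def. 8 can be sketched as follows. The tables `EU` and `DB` are an assumed reconstruction of the slide's example data (the originals were images); `revise` is a hypothetical helper name, and ties in TWU are broken alphabetically as an arbitrary choice.

```python
# Assumed example data (hypothetical reconstruction, not taken from the transcript).
EU = {"a": 1, "b": 2, "c": 1, "d": 5, "e": 4, "f": 3, "g": 1}
DB = {
    1: {"b": 1, "c": 2, "d": 1, "g": 1},
    2: {"a": 4, "b": 1, "c": 3, "d": 1, "e": 1},
    3: {"a": 4, "c": 2, "d": 1},
    4: {"c": 2, "e": 1, "f": 1},
    5: {"a": 5, "b": 2, "d": 1, "e": 2},
    6: {"a": 3, "b": 4, "c": 1, "f": 2},
    7: {"d": 1, "g": 5},
}

def revise(db, eu, minutil):
    """Return (item order, revised transactions as tid -> [(item, utility)])."""
    tu = {tid: sum(q * eu[i] for i, q in t.items()) for tid, t in db.items()}
    twu = {}
    for tid, t in db.items():
        for i in t:
            twu[i] = twu.get(i, 0) + tu[tid]
    # Step (1): drop items with TWU < minutil. Step (2): sort by ascending TWU.
    order = sorted((i for i in twu if twu[i] >= minutil), key=lambda i: (twu[i], i))
    rank = {i: r for r, i in enumerate(order)}
    revised = {
        tid: [(i, t[i] * eu[i]) for i in sorted((j for j in t if j in rank), key=rank.get)]
        for tid, t in db.items()
    }
    return order, revised

order, revised = revise(DB, EU, 30)
print(order)
print(revised[2])
```

With the assumed data, the resulting order matches the slide's e < c < b < a < d.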
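Def. 9. $T/X$: the set of all the items after $X$ in $T$.
    $X$: an itemset, $T$: a transaction (or itemset), $X \subseteq T$
Def. 10. $ru(X, T)$: the remaining utility of itemset $X$ in transaction $T$ is the sum of the utilities of all the items in $T/X$ in $T$, where $X \subseteq T$.

Ex: $T_2/\{eb\} = \{ad\}$
Ex: $T_2/\{c\} = \{bad\}$

Initial Utility-Lists: each element of the utility-list of $X$ has three fields.
    Tids: the identifier of a transaction $T$ containing $X$
    Iutils: the utility of $X$ in $T$, i.e., $u(X, T)$
    Rutils: the remaining utility of $X$ in $T$, i.e., $ru(X, T)$

Ex: <3, 2, 9> is in the utility-list of {c}.
[Tables: All Revised Transactions; Initial Utility-Lists]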
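A sketch of how the initial utility-lists could be built from revised transactions. The `REVISED` table is an assumption (the slide's table was an image), reconstructed so that it reproduces the surviving example element <3, 2, 9> for {c}; `initial_utility_lists` is a hypothetical name.

```python
# Assumed revised transactions: tid -> [(item, utility)], items in the
# order e < c < b < a < d (hypothetical reconstruction of the slide's table).
REVISED = {
    1: [("c", 2), ("b", 2), ("d", 5)],
    2: [("e", 4), ("c", 3), ("b", 2), ("a", 4), ("d", 5)],
    3: [("c", 2), ("a", 4), ("d", 5)],
    4: [("e", 4), ("c", 2)],
    5: [("e", 8), ("b", 4), ("a", 5), ("d", 5)],
    6: [("c", 1), ("b", 8), ("a", 3)],
    7: [("d", 5)],
}

def initial_utility_lists(revised):
    """Return item -> list of (tid, iutil, rutil) elements, ordered by tid."""
    lists = {}
    for tid in sorted(revised):
        remaining = sum(u for _, u in revised[tid])
        for item, u in revised[tid]:
            remaining -= u  # utility of the items after `item` in this transaction (Def. 10)
            lists.setdefault(item, []).append((tid, u, remaining))
    return lists

lists = initial_utility_lists(REVISED)
print(lists["c"])
```

Each transaction is read once, and the running `remaining` value gives the rutil field without a second pass.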
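Utility-Lists of 2-Itemsets
• No need for database scan.
• The utility-list of a 2-itemset {xy} is built by identifying the common transactions in the utility-lists of {x} and {y}.
[Figure: utility-lists of 2-itemsets]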
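Utility-Lists of k-Itemsets
• To construct the utility-list of a k-itemset $\{i_1 \cdots i_{k-1} i_k\}$ ($k \ge 3$), intersect the utility-list of $\{i_1 \cdots i_{k-2} i_{k-1}\}$ and that of $\{i_1 \cdots i_{k-2} i_k\}$.
[Figure: Ex, utility-list construction for k = 2 and for k ≥ 3]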
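The intersection step can be sketched as one function covering both cases. For k = 2 the iutils of the two lists are added; for k ≥ 3 the iutil of the shared prefix is subtracted once, since it is counted in both inputs. This sketch uses a dictionary lookup to find common tids, which is a simplification and not necessarily how the paper compares tids. The input lists for {e}, {c}, and {b} are assumed values reconstructed for the slide's example, and `construct` is a hypothetical name.

```python
def construct(p_ul, px_ul, py_ul):
    """Build the utility-list of Pxy from those of Px and Py.

    p_ul is the utility-list of the shared prefix P, or None when k = 2.
    Each utility-list is a list of (tid, iutil, rutil) tuples.
    """
    py = {tid: (iu, ru) for tid, iu, ru in py_ul}
    p = {tid: iu for tid, iu, _ in p_ul} if p_ul is not None else {}
    out = []
    for tid, iu_x, _ in px_ul:
        if tid in py:  # common transaction
            iu_y, ru_y = py[tid]
            # The prefix utility is contained in both iu_x and iu_y; subtract it once.
            out.append((tid, iu_x + iu_y - p.get(tid, 0), ru_y))
    return out

# Assumed initial utility-lists (hypothetical reconstruction of the slide's example).
E = [(2, 4, 14), (4, 4, 2), (5, 8, 14)]
C = [(1, 2, 7), (2, 3, 11), (3, 2, 9), (4, 2, 0), (6, 1, 11)]
B = [(1, 2, 5), (2, 2, 9), (5, 4, 10), (6, 8, 3)]

EC = construct(None, E, C)   # k = 2
EB = construct(None, E, B)   # k = 2
ECB = construct(E, EC, EB)   # k = 3, shared prefix {e}
print(EC, EB, ECB)
```

The rutil of each new element is taken from the second list, because its last item comes later in the processing order.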
Outline
• Introduction
• Problem Definition
• Utility-List Structure
• High Utility Itemset Miner
    Search Space
    Pruning Strategy
    HUI-Miner Algorithm
• Experiment
• Conclusion
Search space
• Set-Enumeration Tree

Def. 11. Given a set-enumeration tree, an itemset represented by a node is called an extension of an itemset represented by an ancestor node of that node. For an itemset containing $k$ items, its extension containing $(k+i)$ items is called an $i$-extension of the itemset.
Property 2. If $X'$ is an extension of $X$, then $(X' - X) = (X'/X)$ (see Def. 9).
Rationale. Any extension of $X$ is a combination of $X$ with the item(s) after $X$.
Ex: … : the 1-extension of … ; … : the 2-extension of …
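A set-enumeration tree over the sorted items can be walked with a short recursive generator. This is only an illustration of Def. 11, assuming the item order e < c < b < a < d from the earlier slide; `extensions` is a hypothetical name.

```python
def extensions(prefix, tail):
    """Yield every descendant of `prefix` in the set-enumeration tree.

    `tail` holds the items that come after the last item of `prefix`.
    """
    for k, item in enumerate(tail):
        node = prefix + [item]
        yield node
        # Children may only use items that come after `item`.
        yield from extensions(node, tail[k + 1:])

ORDER = list("ecbad")  # assumed processing order: e < c < b < a < d
all_nodes = ["".join(n) for n in extensions([], ORDER)]

# Extensions of {e}, grouped by how many items they add (Def. 11).
below_e = ["".join(n) for n in extensions(["e"], ORDER[1:])]
one_ext = [n for n in below_e if len(n) == 2]
two_ext = [n for n in below_e if len(n) == 3]
print(one_ext, two_ext)
```

Every extension is the prefix followed only by later items, which is the content of Property 2.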
Pruning Strategy
• Exhaustive search → time-consuming

Lemma 1. Given the utility-list of $X$, if the sum of all the iutils and rutils in the utility-list is less than a given $minutil$, any extension $X'$ of $X$ is not high utility.
Ex: Assume $X = \{ec\}$, $X' = \{ecb\}$.
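• $id(T)$: the tid of transaction $T$
• $X.tids$: the tid set in the utility-list of $X$
• $X'.tids$: the tid set in the utility-list of $X'$

Ex: $\{ec\} \subset \{ecb\} \Rightarrow \{T_2\} \subseteq \{T_2, T_4\}$
Ex: Suppose $minutil = 30$. The sum of all the iutils and rutils in the utility-list of $\{ec\}$ is $7 + 6 + 11 = 24 < 30$, so no extension of $\{ec\}$ is high utility.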
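Lemma 1 turns into a one-line check on a utility-list. The intuition is that, in any transaction, an extension of X can add at most the utilities of the items after X, which is exactly the rutil. The sketch below uses the {ec} values from the example; the rutil of 0 for $T_4$ is an assumption inferred from the sum 7 + 6 + 11, and both function names are hypothetical.

```python
def can_prune(ul, minutil):
    """Lemma 1: no extension is high utility if sum(iutil + rutil) < minutil."""
    return sum(iu + ru for _, iu, ru in ul) < minutil

def is_high_utility(ul, minutil):
    """An itemset is high utility when the sum of its iutils reaches minutil."""
    return sum(iu for _, iu, _ in ul) >= minutil

# Utility-list of {ec} as (tid, iutil, rutil); the rutil for T4 is assumed to be 0.
EC = [(2, 7, 11), (4, 6, 0)]

bound = sum(iu + ru for _, iu, ru in EC)
print(bound, can_prune(EC, 30), is_high_utility(EC, 30))
```

Because both sums come straight from the utility-list, the check needs no database scan.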
HUI-Miner Algorithm
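The algorithm on this slide was shown as a figure. The following is a minimal sketch of the overall procedure as described in the preceding slides: build the initial utility-lists, then recursively output itemsets whose iutil sum reaches minutil and extend only those that pass the Lemma 1 bound. The tables `EU` and `DB` are an assumed reconstruction of the example data, all function names are hypothetical, and the dictionary-based tid matching is a simplification.

```python
# Assumed example data (hypothetical reconstruction; the original tables were images).
EU = {"a": 1, "b": 2, "c": 1, "d": 5, "e": 4, "f": 3, "g": 1}
DB = {
    1: {"b": 1, "c": 2, "d": 1, "g": 1},
    2: {"a": 4, "b": 1, "c": 3, "d": 1, "e": 1},
    3: {"a": 4, "c": 2, "d": 1},
    4: {"c": 2, "e": 1, "f": 1},
    5: {"a": 5, "b": 2, "d": 1, "e": 2},
    6: {"a": 3, "b": 4, "c": 1, "f": 2},
    7: {"d": 1, "g": 5},
}

def build_initial(db, eu, minutil):
    """Compute item TWUs, then the initial utility-lists in TWU-ascending order."""
    tu = {tid: sum(q * eu[i] for i, q in t.items()) for tid, t in db.items()}
    twu = {}
    for tid, t in db.items():
        for i in t:
            twu[i] = twu.get(i, 0) + tu[tid]
    order = sorted((i for i in twu if twu[i] >= minutil), key=lambda i: (twu[i], i))
    lists = {i: [] for i in order}
    for tid in sorted(db):
        row = [(i, db[tid][i] * eu[i]) for i in order if i in db[tid]]
        remaining = sum(u for _, u in row)
        for i, u in row:
            remaining -= u
            lists[i].append((tid, u, remaining))
    return [((i,), lists[i]) for i in order]

def construct(p_ul, px_ul, py_ul):
    """Utility-list of Pxy from Px and Py; p_ul is None when the prefix is empty."""
    py = {tid: (iu, ru) for tid, iu, ru in py_ul}
    p = {tid: iu for tid, iu, _ in p_ul} if p_ul is not None else {}
    return [(tid, iu_x + py[tid][0] - p.get(tid, 0), py[tid][1])
            for tid, iu_x, _ in px_ul if tid in py]

def hui_miner(p_ul, uls, minutil, out):
    """Depth-first search of the set-enumeration tree over utility-lists."""
    for k, (X, ul) in enumerate(uls):
        sum_iu = sum(iu for _, iu, _ in ul)
        sum_ru = sum(ru for _, _, ru in ul)
        if sum_iu >= minutil:
            out[X] = sum_iu  # X is a high utility itemset
        if sum_iu + sum_ru >= minutil:  # otherwise pruned by Lemma 1
            ex = [(X + (Y[-1],), construct(p_ul, ul, ul_y)) for Y, ul_y in uls[k + 1:]]
            hui_miner(ul, ex, minutil, out)
    return out

found = hui_miner(None, build_initial(DB, EU, 30), 30, {})
print(found)
```

No candidate set is stored anywhere: an itemset's exact utility is read from its own utility-list at the moment the node is visited.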
Experimental Setup
• Besides HUI-Miner, the experiments include three algorithms:
    IHUP_TWU
    UP-Growth
    UP-Growth+
• Eight databases (real and synthetic)
[Table: the eight databases]
• Running Time
    A mining task is terminated once its running time exceeds 10,000 seconds.
    For most sparse databases, the performance superiority of HUI-Miner becomes very significant as the minutil decreases.
• Memory Consumption
    Except for the Accidents database in (a), HUI-Miner always consumes less memory than the other algorithms.
    Another observation is that UP-Growth+ consumes more memory than UP-Growth in (b) and (d).
    UP-Growth+ holds more information than UP-Growth in sparse and large databases.
Experiment
• Processing Order of Items
    The processing order of items significantly influences the performance of a high utility itemset mining algorithm.
Conclusion
• Proposed a novel data structure, utility-list, and developed an efficient algorithm, HUI-Miner, for high utility itemset mining.
• Utility-lists provide not only utility information about itemsets but also important pruning information for HUI-Miner.
• HUI-Miner can mine high utility itemsets without candidate generation, which avoids the costly generation and utility computation of candidates.