mining frequent patterns

Mining Frequent Patterns Without Candidate Generation. Afsoon Yousefi, CS:332, March 24th, 2014. Inspired by Song Wang slides. Jiawei Han, Jian Pei and Yiwen Yin, School of Computer Science, Simon Fraser University.

Upload: kevyn

Post on 16-Feb-2016


Page 1: Mining Frequent Patterns

Mining Frequent Patterns Without Candidate Generation

Afsoon Yousefi
CS:332, March 24th, 2014
Inspired by Song Wang slides

Jiawei Han, Jian Pei and Yiwen Yin
School of Computer Science
Simon Fraser University

Page 2: Mining Frequent Patterns

Outline

• Problem of mining frequent patterns
• Review of Apriori
• Frequent Pattern Tree
  • An Example
  • Design & Construction
  • Properties
• Mining Frequent Patterns Using FP-Tree
  • An Example
  • Design and Construction
  • Properties
• Algorithm Efficiency Properties
• Performance Study
• Future Works
• Conclusion
• Selected Questions


Page 4: Mining Frequent Patterns

Problem of mining frequent patterns

Frequent pattern mining plays an essential role in mining associations.

Most previous studies adopt an Apriori-like approach, which achieves good performance but suffers from two costs.

It is costly to handle a huge number of candidate sets
• Apriori: 10^4 frequent 1-itemsets → more than 10^7 length-2 candidates
• Accumulate and test
• Finding a length-100 frequent pattern → about 2^100 ≈ 10^30 candidates

It is tedious to repeatedly scan the database
• Scan the database
• Check a large set of candidates
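The size of the candidate sets can be checked with a few lines of arithmetic. The sketch below assumes 10^4 frequent 1-itemsets and one frequent pattern of length 100 as illustrative inputs; the function names are made up for this example.

```python
from math import comb

def length2_candidates(n_frequent_items: int) -> int:
    # Apriori joins every pair of frequent 1-itemsets into a length-2 candidate.
    return comb(n_frequent_items, 2)

def subpatterns_of(length: int) -> int:
    # Every non-empty subset of a frequent pattern is itself frequent,
    # so a level-wise method must generate and test all of them.
    return 2 ** length - 1

print(length2_candidates(10_000))     # 49995000, i.e. more than 10^7
print(f"{subpatterns_of(100):.2e}")   # 1.27e+30
```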


Page 6: Mining Frequent Patterns

Review of Apriori

Knowing the minimum support threshold:
• Use the frequent (k-1)-itemsets to generate candidates for the frequent k-itemsets
• Scan the database and count each pattern in the candidate set
• Get the frequent k-itemsets

TID   Items Bought
100   f, a, c, d, g, i, m, p
200   a, b, c, f, l, m, o
300   b, f, h, j, o
400   b, c, k, s, p
500   a, f, c, e, l, p, m, n

Apriori itemsets

All items:             f, a, c, d, g, i, m, p, l, o, h, j, k, s, b, e, n
Frequent 1-itemsets:   f, a, c, m, b, p
Candidate 2-itemsets:  fa, fc, fm, fp, ac, am, …, bp
Frequent 2-itemsets:   fa, fc, fm, …
…
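The level-wise loop reviewed above can be sketched in a few lines. This is a minimal illustration, not the implementation used in the performance study. It assumes a minimum support of 3 and types the transactions with item l (not i) in TIDs 200 and 500, which is what makes the six frequent items on the slide come out. The names `apriori` and `DB` are hypothetical.

```python
from itertools import combinations

# Assumption: item "l" in TIDs 200 and 500, minimum support count of 3.
DB = ["facdgimp", "abcflmo", "bfhjo", "bcksp", "afcelpmn"]

def apriori(transactions, min_support):
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item seen in the database.
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent, k = {}, 1
    while candidates:
        # One full database scan per level to count every candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

patterns = apriori(DB, min_support=3)
print(len(patterns))                # 18 frequent itemsets in total
print(patterns[frozenset("fcam")])  # 3
```

Each pass through the loop is one more scan of the database, which is the cost the FP-tree approach is meant to avoid.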

Page 7: Mining Frequent Patterns

Review of Apriori

The bottleneck of the Apriori-like method is
• Candidate set generation
• Candidate test

How to avoid generating a huge set of candidates?
• A novel compact data structure, called the FP-tree
• An FP-tree-based pattern fragment growth mining method
• A divide-and-conquer search method for frequent itemset combinations


Page 10: Mining Frequent Patterns

Frequent Pattern Tree: An Example

Minimum support threshold: 3

1. One scan of DB to identify the set of frequent items
   • Items are ordered in frequency-descending order
   • For convenience, the frequent items of each transaction are listed in this ordering

Frequent items: <(f:4), (c:4), (a:3), (b:3), (m:3), (p:3)>

TID   Items Bought               Ordered frequent items
100   f, a, c, d, g, i, m, p     f, c, a, m, p
200   a, b, c, f, l, m, o        f, c, a, b, m
300   b, f, h, j, o              f, b
400   b, c, k, s, p              c, b, p
500   a, f, c, e, l, p, m, n     f, c, a, m, p
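The first scan and the reordering step might look like the sketch below. It assumes a minimum support count of 3 and item l (not i) in TIDs 200 and 500. Ties between equally frequent items are broken as on the slide by passing the order f, c, a, b, m, p explicitly; all names are illustrative.

```python
from collections import Counter

DB = {100: "facdgimp", 200: "abcflmo", 300: "bfhjo", 400: "bcksp", 500: "afcelpmn"}
MIN_SUPPORT = 3
ITEM_ORDER = "fcabmp"  # assumption: tie-breaking copied from the slide

def frequent_item_counts(db, min_support):
    # First scan: count in how many transactions each item occurs.
    counts = Counter(item for items in db.values() for item in set(items))
    return {item: n for item, n in counts.items() if n >= min_support}

def order_frequent_items(db, item_order):
    # Drop infrequent items and sort the rest by the global item order.
    rank = {item: pos for pos, item in enumerate(item_order)}
    return {tid: sorted((i for i in set(items) if i in rank), key=rank.get)
            for tid, items in db.items()}

counts = frequent_item_counts(DB, MIN_SUPPORT)
ordered = order_frequent_items(DB, ITEM_ORDER)
for tid, items in ordered.items():
    print(tid, ", ".join(items))
```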

Page 11: Mining Frequent Patterns

Frequent Pattern Tree: An Example

1. One scan of DB to identify the set of frequent items
2. Store the set of frequent items of each transaction in a tree
   1. Create a "null" root
   2. Scan the DB for a second time
   3. Add each transaction's ordered frequent items as a path
   4. Share the path until a different item comes up
   5. Branch and create a sub-path

TID   Ordered frequent items
100   f, c, a, m, p
200   f, c, a, b, m
300   f, b
400   c, b, p
500   f, c, a, m, p

root
├─ f:4
│  ├─ c:3
│  │  └─ a:3
│  │     ├─ m:2
│  │     │  └─ p:2
│  │     └─ b:1
│  │        └─ m:1
│  └─ b:1
└─ c:1
   └─ b:1
      └─ p:1
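The second scan can be sketched as repeated path insertion into a prefix tree. This is a minimal illustration that leaves out the header table; the class and function names are assumptions made for the example.

```python
class Node:
    def __init__(self, item=None):
        self.item = item      # None marks the "null" root
        self.count = 0
        self.children = {}    # item -> Node

def insert(root, ordered_items):
    node = root
    for item in ordered_items:
        # Follow the shared prefix if the child exists, otherwise branch.
        node = node.children.setdefault(item, Node(item))
        node.count += 1

def render(node, depth=0):
    # Indented text view of the tree, one "item:count" line per node.
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(render(child, depth + 1))
    return lines

ORDERED = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]
root = Node()
for transaction in ORDERED:
    insert(root, transaction)
print("\n".join(render(root)))
```

Inserting a transaction touches one node per frequent item, so the work per transaction is proportional to its number of frequent items.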

Page 12: Mining Frequent Patterns

Frequent Pattern Tree: An Example

1. One scan of DB to identify the set of frequent items
2. Store the set of frequent items of each transaction in a tree
   1. To facilitate tree traversal, build an item header table
   2. Nodes with the same item-name are linked through node-links

Item   Head of node-link (followed by the linked nodes)
f      f:4
c      c:3 → c:1
a      a:3
b      b:1 → b:1 → b:1
m      m:2 → m:1
p      p:2 → p:1


Page 14: Mining Frequent Patterns

Frequent Pattern Tree: Design and Construction

1. The tree consists of
   • One root
   • A set of item prefix subtrees as the children of the root
   • A frequent-item header table
2. Each node in the tree has three fields
   • Item-name
   • Count
   • Node-link
3. Each entry in the frequent-item header table consists of
   • Item-name
   • Head of node-link
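One possible in-memory layout for this design is sketched below. The three fields named on the slide are item, count and node_link; the parent and children attributes are extra bookkeeping assumed here so that the tree can be built and walked. All names are illustrative.

```python
class FPNode:
    def __init__(self, item, parent=None):
        self.item = item          # item-name (None for the root)
        self.count = 0            # count
        self.node_link = None     # node-link to the next node with the same item
        self.parent = parent      # assumption: needed to walk prefix paths upward
        self.children = {}        # assumption: item -> child node

def link_into_header(header, node):
    # The header entry holds the head of the chain; new nodes go at the tail.
    if node.item not in header:
        header[node.item] = node
        return
    last = header[node.item]
    while last.node_link is not None:
        last = last.node_link
    last.node_link = node

def build(ordered_transactions):
    root, header = FPNode(None), {}
    for items in ordered_transactions:
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
                link_into_header(header, child)
            child.count += 1
            node = child
    return root, header

def node_chain(header, item):
    # Follow the node-links starting from the head stored in the header table.
    node = header.get(item)
    while node is not None:
        yield node
        node = node.node_link

root, header = build(["fcamp", "fcabm", "fb", "cbp", "fcamp"])
print([n.count for n in node_chain(header, "p")])  # [2, 1]
```

Summing the counts along a chain gives the support of that item, for example 2 + 1 = 3 for p.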


Page 16: Mining Frequent Patterns

Frequent Pattern Tree: Properties

1. Constructing the FP-tree
   • Needs exactly two scans of the DB
     • First to collect the set of frequent items
     • Second to construct the FP-tree
   • The cost of inserting a transaction Trans is O(|Trans|), where |Trans| is the number of frequent items in Trans
2. Completeness
   • The FP-tree contains all the information related to mining frequent patterns, given the minimum support threshold
3. Compactness
   • The size of the tree is bounded by the total occurrences of frequent items
   • The height of the tree is bounded by the maximum number of frequent items in a transaction

Page 17: Mining Frequent Patterns

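Frequent Pattern Tree: Properties

The frequent items of each transaction are listed in frequency-descending order.
An example with the items in the opposite order:

TID   Ordered frequent items
100   p, m, a, c, f
200   m, b, a, c, f
300   b, f
400   p, b, c
500   p, m, a, c, f

Tree built from the table above:

root
├─ p:3
│  ├─ m:2
│  │  └─ a:2
│  │     └─ c:2
│  │        └─ f:2
│  └─ b:1
│     └─ c:1
├─ m:1
│  └─ b:1
│     └─ a:1
│        └─ c:1
│           └─ f:1
└─ b:1
   └─ f:1

Tree built from the frequency-descending order:

root
├─ f:4
│  ├─ c:3
│  │  └─ a:3
│  │     ├─ m:2
│  │     │  └─ p:2
│  │     └─ b:1
│  │        └─ m:1
│  └─ b:1
└─ c:1
   └─ b:1
      └─ p:1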
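The effect of the item order on tree size, and the two compactness bounds, can be checked with a short sketch. It relies on the fact that every node of a prefix tree corresponds to one distinct non-empty prefix; the names below are illustrative.

```python
def fp_tree_node_count(paths):
    # One tree node per distinct non-empty prefix of the inserted paths.
    return len({tuple(p[:i]) for p in paths for i in range(1, len(p) + 1)})

DESCENDING = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]
ASCENDING = [p[::-1] for p in DESCENDING]   # same items, opposite order

occurrences = sum(len(p) for p in DESCENDING)   # total frequent-item occurrences
max_length = max(len(p) for p in DESCENDING)    # bound on the tree height

print(fp_tree_node_count(DESCENDING))  # 11
print(fp_tree_node_count(ASCENDING))   # 14
print(occurrences, max_length)         # 20 5
```

Both trees stay below the bound of 20 occurrences, but the frequency-descending order shares more prefixes and needs fewer nodes.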


Page 19: Mining Frequent Patterns

Mining Frequent Patterns Using FP-tree

1. Examine the mining process by starting from the bottom of the header table
   • For each item a_i, collect all the patterns that node a_i participates in
   • Start from a_i's head in the header table and follow a_i's node-links


Page 21: Mining Frequent Patterns

Mining Frequent Patterns Using FP-tree: An Example

Node p (p:3)
• FP-tree paths: <f:4, c:3, a:3, m:2, p:2>, <c:1, b:1, p:1>
• Conditional pattern base: {(f:2, c:2, a:2, m:2), (c:1, b:1)}
• Construction of an FP-tree on these: just keep the frequent items
• Frequent items: <(c:3)>
• Frequent itemsets containing p: <p:3, cp:3>
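The conditional pattern base of an item can be sketched as follows. For brevity this version reads the prefixes from the ordered transaction list instead of following node-links in the tree, which yields the same prefix paths and counts. The function names and the minimum support of 3 are assumptions of the example.

```python
from collections import Counter

ORDERED = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]

def conditional_pattern_base(ordered_transactions, item):
    # Collect the prefix that precedes `item` in every transaction containing it.
    base = Counter()
    for items in ordered_transactions:
        if item in items:
            prefix = tuple(items[:items.index(item)])
            if prefix:
                base[prefix] += 1
    return dict(base)

def frequent_items_in_base(base, min_support):
    # Each prefix contributes its count to every item it contains.
    counts = Counter()
    for prefix, n in base.items():
        for it in prefix:
            counts[it] += n
    return {it: n for it, n in counts.items() if n >= min_support}

base_p = conditional_pattern_base(ORDERED, "p")
print(base_p)                             # {('f', 'c', 'a', 'm'): 2, ('c', 'b'): 1}
print(frequent_items_in_base(base_p, 3))  # {'c': 3}
```

Only c survives in p's conditional pattern base, so the frequent itemsets containing p are p and cp.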

Page 22: Mining Frequent Patterns

Mining Frequent Patterns Using FP-tree: An Example

Node m (m:3)
• FP-tree paths: <f:4, c:3, a:3, m:2>, <f:4, c:3, a:3, b:1, m:1>
• Conditional pattern base: {(f:2, c:2, a:2), (f:1, c:1, a:1, b:1)}
• Construction of an FP-tree on these: just keep the frequent items and create the tree
• Frequent items: <(f:3, c:3, a:3)>
• Frequent itemsets containing m: <m:3, am:3, cm:3, fm:3, cam:3, fam:3, fcm:3, fcam:3>

Page 23: Mining Frequent Patterns

Mining Frequent Patterns Using FP-tree: An Example

Node b (b:3)
• FP-tree paths: <f:4, c:3, a:3, b:1>, <f:4, b:1>, <c:1, b:1>
• Conditional pattern base: {(f:1, c:1, a:1), (f:1), (c:1)}
• Construction of an FP-tree on these: just keep the frequent items and create the tree
• Frequent items: <()> (none reaches the minimum support)
• Frequent itemsets containing b: <b:3>

Page 24: Mining Frequent Patterns

Mining Frequent Patterns Using FP-tree: An Example

Node a (a:3)
• FP-tree paths: <f:4, c:3, a:3>
• Conditional pattern base: {(f:3, c:3)}
• Construction of an FP-tree on these: just keep the frequent items and create the tree
• Frequent items: <(f:3, c:3)>
• Frequent itemsets containing a: <a:3, fa:3, ca:3, fca:3>

Page 25: Mining Frequent Patterns

Mining Frequent Patterns Using FP-tree: An Example

Node c (c:4)
• FP-tree paths: <f:4, c:3>, <c:1>
• Conditional pattern base: {(f:3)}
• Construction of an FP-tree on these: just keep the frequent items and create the tree
• Frequent items: <(f:3)>
• Frequent itemsets containing c: <c:4, fc:3>

Page 26: Mining Frequent Patterns

Mining Frequent Patterns Using FP-tree: An Example

Node f (f:4)
• FP-tree paths: <f:4>
• Conditional pattern base: {()}
• Construction of an FP-tree on these: just keep the frequent items and create the tree
• Frequent items: <()> (the conditional pattern base is empty)
• Frequent itemsets containing f: <f:4>


Page 28: Mining Frequent Patterns

Mining Frequent Patterns Using FP-tree: Design and construction

Input
• FP-tree
• Minimum support threshold ξ

Output
• The complete set of frequent patterns

FP-growth(Tree, α)
• If Tree contains a single path P
  • Then for each combination β of the nodes in P do
    • Generate pattern β ∪ α
    • Support = minimum support of the nodes in β
• Else for each a_i in the header table of Tree do
  • Generate pattern β = a_i ∪ α with support = a_i.support
  • Construct β's conditional pattern base and then β's conditional FP-tree; call it Tree_β
  • If Tree_β ≠ ∅
    • Then call FP-growth(Tree_β, β)
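The recursion can be illustrated with a compact sketch. This is a simplification rather than the authors' implementation: it recurses on conditional pattern bases held as (path, count) pairs, builds no explicit conditional FP-tree, and skips the single-path shortcut. The function name and the minimum support of 3 are assumptions of the example.

```python
from collections import Counter

def fp_growth(pattern_base, min_support, suffix=()):
    """pattern_base: list of (ordered item tuple, count) pairs."""
    # Count every item in the current (conditional) pattern base.
    counts = Counter()
    for path, n in pattern_base:
        for item in path:
            counts[item] += n
    results = {}
    for item, support in counts.items():
        if support < min_support:
            continue
        # Pattern fragment growth: prepend the item to the current suffix.
        pattern = (item,) + suffix
        results[frozenset(pattern)] = support
        # Conditional pattern base of the grown pattern: prefixes before `item`.
        conditional = [(path[:path.index(item)], n)
                       for path, n in pattern_base if item in path]
        conditional = [(p, n) for p, n in conditional if p]
        results.update(fp_growth(conditional, min_support, pattern))
    return results

ORDERED = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]
patterns = fp_growth([(tuple(t), 1) for t in ORDERED], min_support=3)
print(len(patterns))                                 # 18
print(sorted("".join(sorted(p)) for p in patterns if "p" in p))  # ['cp', 'p']
```

No candidate itemsets are generated: each pattern is produced only after its support has been counted in a conditional pattern base.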


Page 30: Mining Frequent Patterns

Mining Frequent Patterns Using FP-tree: Properties

1. To calculate the frequent patterns containing node a_i in path P
   • Only consider the prefix sub-path of node a_i in P
   • The frequency count of every node in that sub-path is the same as that of node a_i
2. Suppose an FP-tree has a single path P
   • The complete set of frequent patterns of the FP-tree can be generated by enumerating all the combinations of the sub-paths of P
   • The support of each pattern is equal to the minimum support of the items contained in that sub-path
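The single-path property can be illustrated on the conditional FP-tree of m from the example, which is the single path f:3, c:3, a:3. The sketch below enumerates every combination of nodes on the path and attaches the suffix m; the function name is an assumption.

```python
from itertools import combinations

def single_path_patterns(path, suffix=()):
    """path: list of (item, count) pairs from the root down to the leaf."""
    patterns = {}
    for size in range(1, len(path) + 1):
        for combo in combinations(path, size):
            items = frozenset(item for item, _ in combo) | frozenset(suffix)
            # Support is the minimum count among the chosen nodes.
            patterns[items] = min(count for _, count in combo)
    return patterns

m_path = [("f", 3), ("c", 3), ("a", 3)]
patterns = single_path_patterns(m_path, suffix=("m",))
for items, support in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(items)), support)
```

Together with m itself, these 7 patterns give the 8 frequent itemsets containing m.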


Page 32: Mining Frequent Patterns

Algorithm Efficiency Properties

1. The FP-tree is usually much smaller than the DB.
2. The conditional FP-trees constructed during FP-growth are never bigger than the prefix sub-paths they are built from.
3. Mining operations consist mainly of
   • Prefix count adjustment
   • Counting
   • Pattern fragment concatenation

This is much less costly than
• Generating a very large number of candidate patterns
• Testing each of them


Page 34: Mining Frequent Patterns

Performance Study

Comparison of FP-growth with Apriori
• Performed on a 450 MHz Pentium PC with 128 MB main memory, running Microsoft Windows NT
• Written in Microsoft Visual C++ 6.0
• Run time is the time interval between input and output
• Two datasets

                                        D1     D2
Items                                   1K     10K
Average transaction size                25     25
Average maximal frequent itemset size   10     20
Number of transactions                  10K    100K

Page 35: Mining Frequent Patterns

Performance Study


Page 39: Mining Frequent Patterns

Future Works

Construction of FP-trees for projected databases
• When the database is large, the FP-tree cannot be constructed in main memory
• Partition the database into a set of projected databases
• Construct an FP-tree for each projected database and mine it

Page 40: Mining Frequent Patterns

Future Works

Construction of a disk-resident FP-tree
• Use a B+-tree structure to index the FP-tree
• Split the tree based on the common prefix paths

Materialization of an FP-tree
• Constructing an FP-tree needs two scans of the database
• Materialize an FP-tree for frequent pattern mining
• How to select a good minimum support threshold?
• Use a low threshold?


Page 42: Mining Frequent Patterns

Conclusion

• Constructs a highly compact FP-tree
  • Usually substantially smaller than the original database
• Applies a pattern growth method
  • Avoids costly candidate generation and tests
• Applies a partitioning-based divide-and-conquer method
  • Dramatically reduces the size of the subsequent conditional FP-trees
• Mines both short and long patterns efficiently in large databases


Page 44: Mining Frequent Patterns

Selected questions

What are the components of an FP-tree?
• One root
• A set of item prefix subtrees as the children of the root
• A frequent-item header table

How to calculate the frequent patterns containing node a_i in path P?
• Only consider the prefix sub-path of node a_i in P
• The frequency count of every node in that sub-path is the same as that of node a_i
• Find all the combinations

Compare the efficiency of mining operations in FP-growth with Apriori
• Mining operations consist mainly of prefix count adjustment, counting, and pattern fragment concatenation
• This is much less costly than generating a very large number of candidate patterns and testing each of them

