Mining Association Rules with Constraints (Wei Ning, Joon Wong; COSC 6412 Presentation)

Upload: peter-mcdaniel

Post on 17-Jan-2016


Page 1

Mining Association Rules with Constraints

Wei Ning, Joon Wong

COSC 6412 Presentation

Page 2

Outline
- Introduction
- Summary of Approach
- Algorithm CAP
- Performance Analysis
- Conclusion
- References

Page 4

Introduction

Recall association rule mining: it finds interesting association or correlation relationships among a large set of data items.

Page 5

Some problems we met while mining association rules:
- Is the output overwhelming?
- Is it not what you want?
- Does it take too long?

The common cause: lack of focus.

Page 6

Introduction (cont.)

Example at Walmart: suppose a manager wants to find out which shoes are the most popular in winter.

Page 8

Mining frequent itemsets vs. mining association rules

Mining frequent itemsets is almost the same as mining association rules: once the frequent itemsets are known, the rules can be generated from them directly, so the expensive step is finding the itemsets.

Page 9

Constrained Mining

A naive solution: first find all frequent sets, and then test them for constraint satisfaction.

Our approach: analyze the properties of constraints comprehensively and push them as deeply as possible inside the frequent pattern computation.
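For comparison, the naive solution can be sketched in a few lines of Python. This is a minimal illustration, not code from the presentation: the function names, the brute-force enumeration, and the three-transaction database are all assumptions made for the example.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_sup):
    """Brute force: count the support of every non-empty itemset."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            support = sum(1 for t in transactions if set(combo) <= t)
            if support >= min_sup:
                result[frozenset(combo)] = support
    return result

def naive_constrained_mining(transactions, min_sup, constraint):
    """Naive solution: mine all frequent sets first, filter afterwards."""
    frequent = frequent_itemsets(transactions, min_sup)
    return {s: sup for s, sup in frequent.items() if constraint(s)}

# Hypothetical data and constraint, chosen only for illustration.
tdb = [{"a", "b", "c"}, {"b", "c", "d"}, {"a", "c"}]
only_a_and_c = lambda s: s <= {"a", "c"}
answer = naive_constrained_mining(tdb, 2, only_a_and_c)
print(sorted("".join(sorted(s)) for s in answer))  # ['a', 'ac', 'c']
```

Every frequent set, including {b} and {b, c}, is counted before the constraint is consulted. That wasted counting is what pushing the constraint inside the computation is meant to avoid.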

Page 10

Frequent Itemsets & Constraints

Given a transaction database:

TDB (min_sup = 2)

TID   Transaction
10    a, b, c
20    b, c, d, f
30    a, c

Item   Value
a      40
b      10
c      -20
d      10
e      -30

Frequent itemset: a subset of items that frequently appear together in transactions, e.g. {a, c}.

Constraint: a predicate over itemsets, e.g. C(I): sum(I) > 50, so C(abd) = true.
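The two notions on this slide, support and constraint satisfaction, can be checked mechanically. The sketch below encodes the slide's TDB and value table; the helper names `support`, `is_frequent`, and `constraint` are assumptions of this example. Item f has no entry in the value table, so the constraint is only evaluated on itemsets without f.

```python
# Transaction database and item values as given on the slide.
tdb = {10: {"a", "b", "c"}, 20: {"b", "c", "d", "f"}, 30: {"a", "c"}}
value = {"a": 40, "b": 10, "c": -20, "d": 10, "e": -30}
MIN_SUP = 2

def support(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in tdb.values() if set(itemset) <= t)

def is_frequent(itemset):
    return support(itemset) >= MIN_SUP

def constraint(itemset):
    """C(I): sum(I) > 50."""
    return sum(value[i] for i in itemset) > 50

print(support("ac"), is_frequent("ac"))  # 2 True
print(constraint("abd"))                 # True, since 40 + 10 + 10 = 60
print(support("abd"))                    # 0
```

Note that the two properties are independent: {a, b, d} satisfies the constraint but is not frequent, while {a, c} is frequent but fails the constraint (40 - 20 = 20).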

Page 11

Mining Frequent Itemsets With Constraints

Given:
- A transaction database TDB
- A support threshold min_sup
- A constraint C

Find the complete set of frequent itemsets satisfying the constraint.

Use the constraint to:
- Express the user's focus
- Improve both effectiveness and efficiency

Page 12

Classification of Constraints

We have the following classification of constraints:
- Anti-monotone
- Monotone
- Succinct
- Convertible
  - Convertible anti-monotone
  - Convertible monotone
  - Strongly convertible
- Inconvertible

Page 13

Anti-Monotone

Definition 1 (Anti-Monotone): A 1-var constraint C is anti-monotone if for all sets S, S': $S \supseteq S'$ and S satisfies C $\Rightarrow$ S' satisfies C.

Simply put, when an itemset S violates the constraint, so does every superset of S.

Page 14

Is $\min(S) \ge v$ anti-monotone?

Items: {5, 10, 14}, v = 7, so the constraint is $\min(S) \ge 7$.

{5} violates it. The supersets of {5} are {5, 10}, {5, 14}, and {5, 10, 14}. Each of them still contains 5, so each violates the constraint as well.

Hence $\min(S) \ge v$ is anti-monotone.

Page 15

Succinct

Definition 2 (Succinct):

- $I \subseteq \mathrm{Item}$ is a succinct set if it can be expressed as $\sigma_p(\mathrm{Item})$ for some selection predicate p.
- $SP \subseteq 2^{\mathrm{Item}}$ is a succinct powerset if there is a fixed number of succinct sets $\mathrm{Item}_1, \ldots, \mathrm{Item}_k \subseteq \mathrm{Item}$ such that SP can be expressed in terms of the strict powersets of $\mathrm{Item}_1, \ldots, \mathrm{Item}_k$, using union and minus.
- Finally, a 1-var constraint C is succinct provided $\mathrm{SAT}_C(\mathrm{Item})$ is a succinct powerset.

Page 16

Succinct

General idea: we can enumerate all and only those sets that are guaranteed to satisfy the constraint.

If a constraint is succinct, we can directly generate precisely the sets that satisfy it.

Page 17

Succinct examples

- Itemsets containing a or b
- Itemsets containing some item with value more than 30
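To make "expressed in terms of strict powersets, using union and minus" concrete, here is a small sketch for the first example, itemsets containing a or b. The four-item domain and the helper `strict_powerset` are assumptions for illustration. The satisfying sets are the strict powerset of all items minus the strict powerset of the items other than a and b.

```python
from itertools import combinations

def strict_powerset(items):
    """All non-empty subsets of `items`."""
    items = sorted(items)
    return {frozenset(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)}

ITEM = {"a", "b", "c", "d"}  # hypothetical item domain

# Sets containing a or b = 2^Item minus 2^(Item without a and b).
sat = strict_powerset(ITEM) - strict_powerset(ITEM - {"a", "b"})

print(len(sat))                              # 12
print(all(s & {"a", "b"} for s in sat))      # True
```

No set is ever tested against the constraint here: the satisfying sets are generated directly, which is the point of succinctness. The second example has the same shape, with the selection "value more than 30" in place of "a or b".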

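Page 18

Succinct example

Constraint $C_1$: $\mathrm{Item.Price} \ge 100$

- $\mathrm{Item}_1 = \sigma_{\mathrm{Item.Price} \ge 100}(\mathrm{Item}) = \{a, b\}$
- $2^{\mathrm{Item}_1} = \{\{a\}, \{b\}, \{a, b\}\}$ (the strict powerset, which excludes the empty set)
- $\mathrm{SAT}_{C_1} = \{\{a\}, \{b\}, \{a, b\}\}$, so $\mathrm{SAT}_{C_1} = 2^{\mathrm{Item}_1}$

Therefore $C_1$ is succinct.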

Page 19

Convertible

Convert tough constraints into anti-monotone or monotone ones by properly ordering the items.

Page 20

Convertible

Definition: let R be an order of items.

Convertible anti-monotone: if an itemset X satisfies the constraint, so does every prefix of X w.r.t. R.

Page 21

Convertible example

Constraint C: $\mathrm{avg}(X) \ge 25$

Order the items in value-descending order: <a, f, g, d, b, h, c, e>

Itemset afd satisfies C, and so do its prefixes a and af. Under this order, C becomes anti-monotone!

Item values (original order):

Item   Value
a      40
b      0
c      -20
d      10
e      -30
f      30
g      20
h      -10

Item values (value-descending order):

Item   Value
a      40
f      30
g      20
d      10
b      0
h      -10
c      -20
e      -30
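The prefix property can be verified by brute force on the slide's value table. The sketch below assumes the constraint is avg(X) >= 25 and that a prefix means the first k items of X under the value-descending order; the helper names are made up for this example.

```python
from itertools import combinations

value = {"a": 40, "b": 0, "c": -20, "d": 10,
         "e": -30, "f": 30, "g": 20, "h": -10}
# Value-descending order R: a, f, g, d, b, h, c, e
order = sorted(value, key=value.get, reverse=True)

def satisfies(items, threshold=25):
    """avg(X) >= threshold."""
    return sum(value[i] for i in items) / len(items) >= threshold

def prefixes(itemset):
    """All non-empty prefixes of the itemset with respect to R."""
    ordered = [i for i in order if i in itemset]
    return [ordered[:k] for k in range(1, len(ordered) + 1)]

print(prefixes("afd"))                              # [['a'], ['a', 'f'], ['a', 'f', 'd']]
print([satisfies(p) for p in prefixes("afd")])      # [True, True, True]

# Exhaustive check: whenever X satisfies C, so does every prefix of X.
all_sets = [c for k in range(1, 9) for c in combinations(order, k)]
print(all(satisfies(p) for s in all_sets if satisfies(s) for p in prefixes(s)))  # True

# Without the order, C is not anti-monotone: {d} is a subset of afd but fails C.
print(satisfies("d"))                               # False
```

The reason it works: under descending order a prefix drops only the smallest values of X, so its average cannot be lower than the average of X.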

Page 22

Commonly Used Constraints: A General Picture

Constraint                                            Antimonotone   Monotone      Succinct
$v \in S$                                             no             yes           yes
$S \supseteq V$                                       no             yes           yes
$S \subseteq V$                                       yes            no            yes
$\min(S) \le v$                                       no             yes           yes
$\min(S) \ge v$                                       yes            no            yes
$\max(S) \le v$                                       yes            no            yes
$\max(S) \ge v$                                       no             yes           yes
$\mathrm{count}(S) \le v$                             yes            no            weakly
$\mathrm{count}(S) \ge v$                             no             yes           weakly
$\mathrm{sum}(S) \le v$ ($\forall a \in S, a \ge 0$)  yes            no            no
$\mathrm{sum}(S) \ge v$ ($\forall a \in S, a \ge 0$)  no             yes           no
$\mathrm{range}(S) \le v$                             yes            no            no
$\mathrm{range}(S) \ge v$                             no             yes           no
$\mathrm{avg}(S)\ \theta\ v$, $\theta \in \{=, \le, \ge\}$   convertible    convertible   no
$\mathrm{support}(S) \ge \xi$                         yes            no            no
$\mathrm{support}(S) \le \xi$                         no             yes           no
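Several rows of the table can be sanity-checked by brute force on a small domain. The sketch below is an illustration under stated assumptions: the item values, the threshold v = 12, and the checker functions are all made up for this example. A small domain can expose a counterexample (a "no" entry) but cannot prove a "yes" entry in general.

```python
from itertools import combinations

VALUES = [5, 10, 14, 20]  # hypothetical non-negative item values
v = 12                    # hypothetical threshold

def nonempty_subsets(values):
    return [frozenset(c)
            for k in range(1, len(values) + 1)
            for c in combinations(values, k)]

def is_anti_monotone(constraint, values=VALUES):
    """Whenever S satisfies the constraint, every non-empty subset does too."""
    sets = nonempty_subsets(values)
    return all(constraint(sub)
               for s in sets if constraint(s)
               for sub in sets if sub <= s)

def is_monotone(constraint, values=VALUES):
    """Whenever S satisfies the constraint, every superset does too."""
    sets = nonempty_subsets(values)
    return all(constraint(sup)
               for s in sets if constraint(s)
               for sup in sets if sup >= s)

rows = {
    "min(S) <= v": lambda s: min(s) <= v,
    "min(S) >= v": lambda s: min(s) >= v,
    "max(S) <= v": lambda s: max(s) <= v,
    "max(S) >= v": lambda s: max(s) >= v,
    "sum(S) <= v": lambda s: sum(s) <= v,
    "sum(S) >= v": lambda s: sum(s) >= v,
    "range(S) <= v": lambda s: max(s) - min(s) <= v,
    "range(S) >= v": lambda s: max(s) - min(s) >= v,
}

table = {name: ("yes" if is_anti_monotone(c) else "no",
                "yes" if is_monotone(c) else "no")
         for name, c in rows.items()}

for name, (anti, mono) in table.items():
    print(f"{name:15} antimonotone={anti:3} monotone={mono}")
```

On this domain every checked row agrees with the Antimonotone and Monotone columns of the table above.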

Page 23

Optional: Proof that $\min(S) \ge v$ is Anti-monotone

According to the table, $\min(S) \ge v$ is both anti-monotone and succinct.

Only anti-monotonicity is proved here, due to time limitations.

Something special…
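The proof itself did not survive in this transcript. One standard way to argue it, offered here as a sketch rather than as the presenters' own proof, follows Definition 1 directly:

```latex
\begin{align*}
&\text{Assume } S \supseteq S',\ S' \neq \emptyset,\ \text{and } \min(S) \ge v. \\
&\text{Every } a \in S' \text{ also lies in } S,\ \text{so } a \ge \min(S) \ge v. \\
&\text{Hence } \min(S') \ge v,\ \text{i.e. } S' \text{ satisfies the constraint.}
\end{align*}
```

Equivalently, in contrapositive form: if S' violates the constraint, some item of S' is below v, and that item is present in every superset of S'.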

Page 24

Constraint Classification

[Figure: diagram relating the constraint classes: antimonotone, monotone, succinct, convertible anti-monotone, convertible monotone, strongly convertible, and inconvertible.]

Page 25

Summary of Approach: Recapitulation

- The basic idea of mining frequent itemsets with constraints.
- Several important classes of constraints were introduced.

Page 27

Algorithms

There are many algorithms for constraint-based association rule mining:
- Algorithm Direct
- Algorithm MultiJoins & Reorder
- Algorithm Apriori†
- Algorithm Hybrid(m)
- Algorithm CAP (main focus)

Page 28

Design of Algorithm

- Sound: an algorithm is sound provided it only finds frequent sets that satisfy the given constraints.
- Complete: an algorithm is complete provided all frequent sets satisfying the given constraints are found.

Page 29

Algorithm Apriori†

Main idea: use the Apriori algorithm to get the frequent itemsets, then apply the remaining constraints to the itemsets found.

Step 1) Run Apriori with Cfreq (the frequency constraint).

Step 2) Apply C - Cfreq (the remaining constraints) to get the final Ans.

Page 30

Algorithm Apriori† (Pseudocode)

1. C1 consists of sets of size 1; k = 1; Ans = ∅;
2. While (Ck not empty) {
   2.1 conduct db scan to form Lk from Ck;
   2.2 form Ck+1 from Lk based on Cfreq; k++; }
3. For each set S in some Lk:
   Add S to Ans if S satisfies (C - Cfreq).
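The pseudocode above can be turned into a short runnable sketch. This is one possible reading, not the presenters' implementation: the function name `apriori_dagger`, the join-and-prune candidate generation, and the in-memory "db scan" are assumptions. The data is the four-transaction TDB used in the deck's example, with the constraint that every item comes from {A, C, E}.

```python
from itertools import combinations

def apriori_dagger(transactions, min_sup, other_constraint):
    """Apriori with only the frequency constraint, then a final filter."""
    def scan(candidates):
        # Step 2.1: one pass over the database to count support.
        return {c for c in candidates
                if sum(1 for t in transactions if c <= t) >= min_sup}

    items = set().union(*transactions)
    candidates = {frozenset([i]) for i in items}  # step 1: C1
    levels = []                                   # L1, L2, ...
    while candidates:                             # step 2
        level = scan(candidates)
        levels.append(level)
        k = len(levels)
        # Step 2.2: join Lk with itself, keep sets whose k-subsets are all in Lk.
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k))}
    # Step 3: apply the remaining constraints only at the very end.
    return [s for level in levels for s in level if other_constraint(s)]

tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
ans = apriori_dagger(tdb, 2, lambda s: s <= {"A", "C", "E"})
print(sorted("".join(sorted(s)) for s in ans))  # ['A', 'AC', 'C', 'CE', 'E']
```

All frequent sets, including {B}, {B, C}, {B, E}, and {B, C, E}, are generated and counted before the filter discards them.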

Page 31

The Apriori† Algorithm: An Example

Database TDB

Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E

1st scan, C1:

Itemset   sup
{A}       2
{B}       3
{C}       3
{D}       1
{E}       3

L1:

Itemset   sup
{A}       2
{B}       3
{C}       3
{E}       3

C2 (generated from L1):

Itemset
{A, B}
{A, C}
{A, E}
{B, C}
{B, E}
{C, E}

2nd scan, C2 with support counts:

Itemset   sup
{A, B}    1
{A, C}    2
{A, E}    1
{B, C}    2
{B, E}    3
{C, E}    2

L2:

Itemset   sup
{A, C}    2
{B, C}    2
{B, E}    3
{C, E}    2

C3:

Itemset
{B, C, E}

3rd scan, L3:

Itemset     sup
{B, C, E}   2

Page 32

The Apriori† Algorithm: An Example (cont.)

Database TDB

Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E

L1:

Itemset   sup
{A}       2
{B}       3
{C}       3
{E}       3

L2:

Itemset   sup
{A, C}    2
{B, C}    2
{B, E}    3
{C, E}    2

L3:

Itemset     sup
{B, C, E}   2

Constraint: $\{A, C, E\} \supseteq T.\mathrm{Item}$

Ans: {A}, {C}, {E}, {A, C}, {C, E}

Page 33

Algorithm CAP

Succinct and anti-monotone:
- Strategy I: Replace C1 in the Apriori Algorithm by $C_1^C$, the candidate 1-itemsets that satisfy C.

Anti-monotone but non-succinct:
- Strategy II: Define Ck as in the Apriori Algorithm. Drop a set $S \in C_k$ from counting if S fails C, i.e., constraint satisfaction is tested before counting is done.
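Strategy II can be sketched as Apriori with one extra filter placed before the support count. The code below is an illustrative reading under stated assumptions: the function name, the item prices, and the constraint sum(price) <= 50 are hypothetical, and the transactions are the four used in the deck's examples.

```python
from itertools import combinations

def cap_strategy_two(transactions, min_sup, am_constraint):
    """Apriori that tests an anti-monotone constraint before counting."""
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = set().union(*transactions)
    candidates = {frozenset([i]) for i in items}
    answer, counted, k = [], 0, 1
    while candidates:
        # Strategy II: drop candidates that fail C before the db scan.
        candidates = {c for c in candidates if am_constraint(c)}
        counted += len(candidates)  # sets whose support is actually counted
        level = {c for c in candidates if support(c) >= min_sup}
        answer.extend(level)
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k))}
        k += 1
    return answer, counted

# Hypothetical prices; sum(price) <= 50 is anti-monotone for non-negative prices.
PRICE = {"A": 10, "B": 20, "C": 30, "D": 40, "E": 50}
cheap = lambda s: sum(PRICE[i] for i in s) <= 50

tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
answer, counted = cap_strategy_two(tdb, 2, cheap)
print(sorted("".join(sorted(s)) for s in answer))  # ['A', 'AC', 'B', 'BC', 'C', 'E']
print(counted)                                     # 8
```

Because the constraint is anti-monotone, every subset of a valid frequent set is itself valid and frequent, so building each level only from valid frequent sets loses nothing. Here 8 candidates are counted instead of the 12 that plain Apriori counts on the same data.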

Page 34

Algorithm CAP (cont.)

Succinct but non-anti-monotone:
- Strategy III: Too complicated. To be discussed later…

Non-succinct and non-anti-monotone:
- Strategy IV: Induce any weaker constraint C1 from C. Depending on whether C1 is anti-monotone and/or succinct, use one of the strategies I-III above for the generation of frequent sets.

Page 35

Algorithm CAP (Pseudocode)

Here Csam, Cam, Csuc, and Cnone denote the constraints that are succinct and anti-monotone, anti-monotone but non-succinct, succinct but non-anti-monotone, and neither, matching Strategies I to IV respectively.

1. if Csam ∪ Csuc ∪ Cnone is non-empty, prepare C1 as indicated in Strategies I, III, and IV; k = 1;
2. if Csuc is non-empty {
   2.1 conduct db scan to form L1 as indicated in Strategy III;
   2.2 form C2 as indicated in Strategy III; k = 2; }
3. while (Ck not empty) {
   3.1 conduct db scan to form Lk from Ck;
   3.2 form Ck+1 from Lk based on Strategy III if Csuc is non-empty, and Strategy II for constraints in Cam; }
4. if Cnone is empty, Ans = ∪ Lk. Otherwise, for each set S in some Lk, add S to Ans iff S satisfies Cnone.

Page 36

The Algorithm CAP: An Example

Database TDB

Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E

Constraints: $\{A, C, E\} \supseteq T.\mathrm{Item}$ and min support count = 2

Question: which strategy should we apply?

Page 37

The Algorithm CAP: An Example (Cont.)

Apply Strategy I!

Database TDB

Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E

1st scan, C1 (only the items that satisfy the constraint):

Itemset   sup
{A}       2
{C}       3
{E}       3

L1:

Itemset   sup
{A}       2
{C}       3
{E}       3

C2 (generated from L1):

Itemset
{A, C}
{A, E}
{C, E}

2nd scan, C2 with support counts:

Itemset   sup
{A, C}    2
{A, E}    1
{C, E}    2

L2:

Itemset   sup
{A, C}    2
{C, E}    2

C3: {} (empty, because {A, E} was pruned earlier)

Ans: {A}, {C}, {E}, {A, C}, {C, E}
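The example above can be replayed in code. The sketch below is an assumption-laden illustration of Strategy I, not the presenters' program: the function name and the candidate counter are made up, and the only change from plain Apriori is that C1 starts from the items allowed by the constraint.

```python
from itertools import combinations

def cap_strategy_one(transactions, min_sup, allowed_items):
    """Apriori whose C1 holds only the items that satisfy the constraint."""
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    candidates = {frozenset([i]) for i in allowed_items}  # constrained C1
    answer, counted, k = [], 0, 1
    while candidates:
        counted += len(candidates)  # sets whose support is counted in this scan
        level = {c for c in candidates if support(c) >= min_sup}
        answer.extend(level)
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k))}
        k += 1
    return answer, counted

tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
answer, counted = cap_strategy_one(tdb, 2, {"A", "C", "E"})
print(sorted("".join(sorted(s)) for s in answer))  # ['A', 'AC', 'C', 'CE', 'E']
print(counted)                                     # 6
```

The answer matches the Ans of the Apriori† example, but only 6 candidates are counted (3 in C1, 3 in C2) and only two scans are needed, against 12 candidates and three scans for Apriori†.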

Page 38

Case 3: Succinct but not anti-monotone. Revisit…

Items: {1} {2} {3} {4} {5} {6} {7} {8} {9} {10}

Constraint: min(S) < 5

Items satisfying the constraint: {1} {2} {3} {4}

Running Apriori on only these items generates: {1} {2} {3} {4} {1,2} {2,3} … {3,4} … {1,2,3,4}

Some possible frequent sets may be lost, e.g. {1,8} and {1,2,10}.

**Information extracted from a past presentation.

Page 39

Case 3: Succinct but not anti-monotone. Continue…

- Algorithm Direct. Idea: play it safe. Generate $C^c_{k+1}$ by using $L^c_k \times F$, where F is the set of all frequent items.
- Algorithm MultiJoins
- Algorithm Reorder
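The idea behind Algorithm Direct can be illustrated with a small sketch. It assumes the constraint has the form "S contains at least one item from a mandatory set", which is how min(S) < 5 reads over the items 1 to 10; the function name and the five transactions are hypothetical, and the real algorithm may differ in its details.

```python
def direct(transactions, min_sup, mandatory):
    """Frequent sets that contain at least one item from `mandatory`."""
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = set().union(*transactions)
    # F: all frequent items, whether or not they satisfy the constraint.
    frequent_items = {i for i in items if support(frozenset([i])) >= min_sup}
    # L^c_1: frequent items that satisfy the constraint on their own.
    level = {frozenset([i]) for i in frequent_items if i in mandatory}
    answer = []
    while level:
        answer.extend(level)
        # C^c_{k+1} = L^c_k x F: extend each valid frequent set by any frequent item.
        candidates = {s | {i} for s in level for i in frequent_items if i not in s}
        level = {c for c in candidates if support(c) >= min_sup}
    return answer

# Hypothetical transactions over the items 1..10; constraint min(S) < 5.
tdb = [{1, 8}, {1, 8, 9}, {1, 2, 10}, {1, 2, 10}, {8, 9}]
answer = direct(tdb, 2, mandatory={1, 2, 3, 4})
print(sorted(sorted(s) for s in answer))
# [[1], [1, 2], [1, 2, 10], [1, 8], [1, 10], [2], [2, 10]]
```

The sets {1, 8} and {1, 2, 10} are found, whereas restricting C1 to {1}, {2}, {3}, {4} would never produce them. The frequent set {8, 9} is correctly left out because its minimum is not below 5. The price of playing it safe is a larger candidate set at each level.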

Page 41

Performance Analysis (Specification)

- Programs written in C
- Transactional databases generated using the program from IBM Almaden Research Center
- 100,000 records, domain of 1,000 items
- Page size 4 KB
- SPARC-10 environment

Page 42

Performance Analysis (Terminology)

- Speedup: the ratio of execution times between two algorithms.
- Item selectivity: x% of the items satisfy the constraints.
- Support threshold: a low support threshold means more frequent sets to process.

Page 43

Performance Analysis

Note: support threshold set at 0.5%.

- For 10% selectivity, CAP runs 80 times faster than Apriori†!
- For 30% selectivity, the speedup is about 10 times.

Page 44

Performance Analysis

Note: item selectivity fixed at 30%.

- As the support threshold goes up, the number of frequent itemsets goes down and Apriori† improves.
- CAP is still at least 8 times faster.

Page 45

Performance Analysis

Each entry is of the form a/b, where a is the number of frequent sets satisfying the constraint and b is the total number of frequent sets.

For L4 with a support of 0.2%, Apriori† finds 1250 frequent sets, of which only the 8 that satisfy the constraint are found by CAP.

Support   L1        L2       L3        L4       L5      L6      L7      L8
0.2%      174/582   79/969   29/1140   8/1250   1/934   0/451   0/132   0/20
0.6%      98/313    1/12     0/1       0        0       0       0       0
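The a/b entries make the saving easy to quantify. The short calculation below copies the 0.2% row from the table; the variable names are assumptions of this example.

```python
# The 0.2% support row: (frequent sets satisfying the constraint, all frequent sets)
row_02 = {"L1": (174, 582), "L2": (79, 969), "L3": (29, 1140), "L4": (8, 1250),
          "L5": (1, 934), "L6": (0, 451), "L7": (0, 132), "L8": (0, 20)}

satisfying = sum(a for a, _ in row_02.values())
total = sum(b for _, b in row_02.values())

print(satisfying, total)               # 291 5478
print(f"{satisfying / total:.1%}")     # 5.3%
print(f"{8 / 1250:.2%}")               # 0.64%
```

At 0.2% support only 291 of the 5478 frequent sets satisfy the constraint, and at level L4 the share drops to 8 of 1250.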

Page 46

Conclusion

- The ideas of anti-monotonicity, succinctness, and convertibility are introduced in the referenced papers.
- Sound, complete, and efficient algorithms are introduced for constraint-based association rule mining.

Page 47

References

- R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. KDD'97.
- R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimizations of constrained associations rules. SIGMOD'98.
- J. Pei and J. Han. Can we push more constraints into frequent pattern mining? KDD'00.