1 mining association rules with constraints wei ning joon wong cosc 6412 presentation

1
Mining Association Rules with Constraints
Wei Ning, Joon Wong
COSC 6412 Presentation

2
Outline
Introduction
Summary of Approach
Algorithm CAP
Performance Analysis
Conclusion
References


4
Introduction
Recall: mining association rules
Association rule mining finds interesting associations or correlations among a large set of data items.

5
Some problems we met when mining association rules
Overwhelming?
Not what you want?
Waiting too long?
Lack of focus

6
Introduction (cont.)
Example at Walmart: suppose a manager wants to find which shoes are the most popular in winter.


8
Mining frequent itemsets vs. mining association rules
Mining frequent itemsets is almost the same problem as mining association rules: once the frequent itemsets are known, the rules can be derived from them.

9
Constrained Mining
A naive solution: first find all frequent sets, and then test them for constraint satisfaction.
Our approach: analyze the properties of constraints comprehensively and push them as deeply as possible inside the frequent pattern computation.

10
Frequent Itemsets & Constraints
Given a transaction database
Frequent itemset: a subset of items that frequently appear together in transactions, e.g. {a, c}
Constraint: a predicate over itemsets, e.g. C(I): sum(I) > 50, so C(abd) = true

TDB (min_sup=2)
TID  Transaction
10   a, b, c
20   b, c, d, f
30   a, c

Item  Value
a     40
b     10
c     -20
d     10
e     -30
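
The two definitions on this slide can be checked mechanically. The following is a minimal sketch in Python using the slide's database and value table; the names `TDB`, `VALUE`, `support`, `is_frequent`, and `satisfies_sum_gt` are assumptions made for this illustration and do not come from the presentation.

```python
# Transaction database from the slide: TID -> set of items.
TDB = {
    10: {"a", "b", "c"},
    20: {"b", "c", "d", "f"},
    30: {"a", "c"},
}

# Item values from the slide (item f has no value listed there).
VALUE = {"a": 40, "b": 10, "c": -20, "d": 10, "e": -30}

MIN_SUP = 2


def support(itemset, tdb=TDB):
    """Count the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in tdb.values() if items <= t)


def is_frequent(itemset, min_sup=MIN_SUP):
    """An itemset is frequent if its support reaches the threshold."""
    return support(itemset) >= min_sup


def satisfies_sum_gt(itemset, threshold=50):
    """Constraint C(I): the sum of the item values in I exceeds `threshold`."""
    return sum(VALUE[i] for i in itemset) > threshold


print(support("ac"), is_frequent("ac"))  # 2 True
print(satisfies_sum_gt("abd"))           # True (40 + 10 + 10 = 60 > 50)
```

Note that {a, b, d} satisfies the constraint but is not frequent in this database, which is why both conditions have to be checked.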

11
Mining Frequent Itemsets With Constraints
Given:
  A transaction database TDB
  A support threshold min_sup
  A constraint C
Find the complete set of frequent itemsets satisfying the constraint
Use constraints to:
  Express the user's focus
  Improve both effectiveness and efficiency

12
Classification of Constraints
We have the following classification of constraints:
Anti-monotone
Monotone
Succinct
Convertible
  Convertible anti-monotone
  Convertible monotone
  Strongly convertible
Inconvertible

13
Anti-Monotone
Definition 1 (Anti-Monotone): A 1-var constraint C is anti-monotone if for all sets S, S′: S ⊇ S′ & S satisfies C ⇒ S′ satisfies C.
Simply put, when an itemset S violates the constraint, so does every one of its supersets.

14
Is min(S) ≥ v anti-monotone?
S = {5, 10, 14}, v = 7
Constraint: min(S) ≥ 7
{5} violates it.
Supersets of {5}: {5, 10}, {5, 14}, {5, 10, 14}
So do {5, 10}, {5, 14}, {5, 10, 14}.
min(S) ≥ v is anti-monotone
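
The example above can be replayed by brute force. The sketch below assumes the comparison on this slide is min(S) ≥ v and checks Definition 1 over every non-empty subset of {5, 10, 14}; the helper names are hypothetical. A check over one small domain only illustrates the property, it does not prove it.

```python
from itertools import combinations


def nonempty_subsets(items):
    """Yield every non-empty subset of `items` as a frozenset."""
    items = sorted(items)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def is_anti_monotone(constraint, items):
    """Whenever S satisfies the constraint, every non-empty subset of S must too."""
    for s in nonempty_subsets(items):
        if constraint(s):
            for sub in nonempty_subsets(s):
                if not constraint(sub):
                    return False
    return True


ITEMS = {5, 10, 14}
V = 7


def min_at_least_v(s):
    return min(s) >= V


def min_at_most_v(s):
    return min(s) <= V


print(is_anti_monotone(min_at_least_v, ITEMS))  # True
print(is_anti_monotone(min_at_most_v, ITEMS))   # False: {5, 10} satisfies it, {10} does not
```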

15
Succinct
Definition 2 (Succinct):
I ⊆ Item is a succinct set if it can be expressed as σ_p(Item) for some selection predicate p.
SP ⊆ 2^Item is a succinct powerset if there is a fixed number of succinct sets Item_1, …, Item_k ⊆ Item such that SP can be expressed in terms of the strict powersets of Item_1, …, Item_k, using union and minus.
Finally, a 1-var constraint C is succinct provided SAT_C(Item) is a succinct powerset.

16
Succinct
General idea: we can enumerate all and only those sets that are guaranteed to satisfy the constraint.
If a constraint is succinct, we can directly generate precisely the sets that satisfy it.

17
Succinct example
Itemsets containing a or b
Itemsets containing some item with value more than 30

18
Succinct example
C1: Item.Price ≥ 100
Item_1 = σ_{Price ≥ 100}(Item) = {a, b}
2^Item_1 = {{a}, {b}, {a, b}} (strict powerset)
SAT_C1 = {{a}, {b}, {a, b}}
SAT_C1 = 2^Item_1
C1 is succinct
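
To make "directly generate precisely the sets that satisfy it" concrete, here is a minimal sketch. It assumes the comparison in C1 is Price ≥ 100, and the price table is hypothetical: only the fact that the selection yields {a, b} comes from the slide.

```python
from itertools import combinations

# Hypothetical prices, chosen so that exactly a and b are selected.
PRICE = {"a": 120, "b": 150, "c": 80, "d": 40}


def select(predicate, prices=PRICE):
    """Selection over the item domain: the items whose price passes `predicate`."""
    return {item for item, price in prices.items() if predicate(price)}


def strict_powerset(items):
    """All non-empty subsets of `items`."""
    items = sorted(items)
    return {
        frozenset(combo)
        for k in range(1, len(items) + 1)
        for combo in combinations(items, k)
    }


# Succinctness in action: build SAT_C1 without testing any other itemset.
item1 = select(lambda price: price >= 100)
sat_c1 = strict_powerset(item1)

# Cross-check against generate-and-test over the whole item domain.
brute_force = {
    s for s in strict_powerset(PRICE) if all(PRICE[i] >= 100 for i in s)
}
print(sat_c1 == brute_force)  # True
```

The direct construction enumerates 3 sets, while the generate-and-test cross-check has to examine all 15 non-empty subsets of the four items.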

19
Convertible
Convert tough constraints into anti-monotone or monotone ones by properly ordering the items.

20
Convertible
Definition: let R be an order of items.
Convertible anti-monotone: whenever an itemset X satisfies the constraint, so does every prefix of X w.r.t. R.

21
Convertible example
Constraint C: avg(X) ≥ 25
Order items in value-descending order: <a, f, g, d, b, h, c, e>
Itemset afd satisfies C
So do prefixes a and af
Thus, it becomes anti-monotone!

Original item table:
Item  Value
a     40
b     0
c     -20
d     10
e     -30
f     30
g     20
h     -10

Items in value-descending order:
Item  Value
a     40
f     30
g     20
d     10
b     0
h     -10
c     -20
e     -30
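
The prefix argument can be verified on the slide's value table. This is a small sketch assuming the constraint is avg(X) ≥ 25; the function names `avg` and `prefixes` are made up for the illustration.

```python
# Item values from the slide.
VALUE = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}

# R: items listed in value-descending order.
ORDER = sorted(VALUE, key=VALUE.get, reverse=True)


def avg(itemset):
    """Average value of the items in a non-empty itemset."""
    return sum(VALUE[i] for i in itemset) / len(itemset)


def prefixes(itemset):
    """All non-empty prefixes of `itemset` once it is sorted according to R."""
    ordered = sorted(itemset, key=ORDER.index)
    return [ordered[:k] for k in range(1, len(ordered) + 1)]


print(ORDER)                             # ['a', 'f', 'g', 'd', 'b', 'h', 'c', 'e']
for p in prefixes("afd"):
    print(p, avg(p) >= 25)               # every prefix satisfies avg >= 25

# Without the order the property fails: {d} is a subset of afd but avg({d}) = 10.
print(avg("d") >= 25)                    # False
```

The last line shows why the constraint is only convertible: it is not anti-monotone over arbitrary subsets, only over prefixes in the chosen order.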

22
Commonly Used Constraints: A General Picture

Constraint                      Antimonotone  Monotone     Succinct
v ∈ S                           no            yes          yes
S ⊇ V                           no            yes          yes
S ⊆ V                           yes           no           yes
min(S) ≤ v                      no            yes          yes
min(S) ≥ v                      yes           no           yes
max(S) ≤ v                      yes           no           yes
max(S) ≥ v                      no            yes          yes
count(S) ≤ v                    yes           no           weakly
count(S) ≥ v                    no            yes          weakly
sum(S) ≤ v (∀a ∈ S, a ≥ 0)      yes           no           no
sum(S) ≥ v (∀a ∈ S, a ≥ 0)      no            yes          no
range(S) ≤ v                    yes           no           no
range(S) ≥ v                    no            yes          no
avg(S) θ v, θ ∈ {=, ≤, ≥}       convertible   convertible  no
support(S) ≥ ξ                  yes           no           no
support(S) ≤ ξ                  no            yes          no

23
Optional: Proof that min(S) ≥ v is Anti-monotone
According to the table, min(S) ≥ v is both anti-monotone and succinct.
I only prove anti-monotonicity here due to time limitations.
Something special…
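
The slide text does not include the proof itself. A short version of the standard argument is sketched below; it assumes the constraint is min(S) ≥ v, that all sets are non-empty, and that anti-monotonicity means every subset of a satisfying set also satisfies the constraint.

```latex
\begin{align*}
&\text{Let } S' \subseteq S,\ S' \neq \emptyset, \text{ and suppose } \min(S) \ge v. \\
&\text{Every element of } S' \text{ is also an element of } S,
  \text{ so } \min(S') \ge \min(S). \\
&\text{Therefore } \min(S') \ge \min(S) \ge v, \text{ i.e. } S' \text{ satisfies the constraint.} \\
&\text{Equivalently, if } S' \text{ violates } \min(S') \ge v,
  \text{ then every superset } S \supseteq S' \text{ violates it too.}
\end{align*}
```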

24
Constraint Classification
[Diagram of the constraint classes: Antimonotone, Monotone, Succinct, Convertible anti-monotone, Convertible monotone, Strongly convertible, Inconvertible]

25
Summary of Approach: Recapitulation
Covered the basic idea of mining frequent itemsets with constraints.
Introduced several important classes of constraints.


27
Algorithms
There are many algorithms for constraint-based association rule mining:
Algorithm Direct
Algorithm MultiJoins & Reorder
Algorithm Apriori†
Algorithm Hybrid(m)
Algorithm CAP (Main Focus)

28
Design of Algorithm
Sound: an algorithm is sound provided it only finds frequent sets that satisfy the given constraints.
Complete: an algorithm is complete provided all frequent sets satisfying the given constraints are found.

29
Algorithm Apriori†
Main idea: use the Apriori algorithm to get the frequent itemsets, then apply the constraints to the itemsets found.
Step 1) Run Apriori with Cfreq (the frequency constraint)
Step 2) Apply C – Cfreq (the remaining constraints) to get the final Ans

30
Algorithm Apriori† (Pseudocode)
1. C1 consists of sets of size 1; k = 1; Ans = ∅;
2. While (Ck not empty) {
   2.1 conduct db scan to form Lk from Ck;
   2.2 form Ck+1 from Lk based on Cfreq; k++; }
3. For each set S in some Lk:
   Add S to Ans if S satisfies (C – Cfreq).
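
The pseudocode above can be turned into a short runnable sketch. This is one possible Python rendering, not the presenters' implementation; the function names, the in-memory list of transactions, and the use of a Python predicate for the constraint are all assumptions. The sample data is the four-transaction database used in the example slides.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Plain Apriori: return {itemset: support count} for all frequent itemsets."""
    items = {i for t in transactions for i in t}
    candidates = {frozenset([i]) for i in items}              # C1
    frequent, k = {}, 1
    while candidates:
        # Step 2.1: one database scan counts every candidate in Ck.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}   # Lk
        frequent.update(level)
        # Step 2.2: join Lk with itself and keep the (k+1)-sets whose
        # k-subsets are all frequent.
        candidates = {
            a | b
            for a in level
            for b in level
            if len(a | b) == k + 1
            and all(frozenset(s) in level for s in combinations(a | b, k))
        }
        k += 1
    return frequent


def apriori_dagger(transactions, min_count, constraint):
    """Step 3: keep only the frequent sets that satisfy the remaining constraint."""
    return {
        s: n for s, n in apriori(transactions, min_count).items() if constraint(s)
    }


TDB = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
ALLOWED = {"A", "C", "E"}

ans = apriori_dagger(TDB, 2, lambda s: s <= ALLOWED)
print(sorted("".join(sorted(s)) for s in ans))  # ['A', 'AC', 'C', 'CE', 'E']
```

All nine frequent sets are mined before the constraint is looked at, which is the cost that CAP tries to avoid.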

The Apriori† Algorithm: An Example

Database TDB
Tid  Items
10   A, C, D
20   B, C, E
30   A, B, C, E
40   B, E

1st scan: C1
Itemset  sup
{A}      2
{B}      3
{C}      3
{D}      1
{E}      3

L1
Itemset  sup
{A}      2
{B}      3
{C}      3
{E}      3

C2
Itemset
{A, B}
{A, C}
{A, E}
{B, C}
{B, E}
{C, E}

2nd scan: C2
Itemset  sup
{A, B}   1
{A, C}   2
{A, E}   1
{B, C}   2
{B, E}   3
{C, E}   2

L2
Itemset  sup
{A, C}   2
{B, C}   2
{B, E}   3
{C, E}   2

C3
Itemset
{B, C, E}

3rd scan: L3
Itemset    sup
{B, C, E}  2

The Apriori† Algorithm: An Example (cont.)

Database TDB
Tid  Items
10   A, C, D
20   B, C, E
30   A, B, C, E
40   B, E

L1
Itemset  sup
{A}      2
{B}      3
{C}      3
{E}      3

L2
Itemset  sup
{A, C}   2
{B, C}   2
{B, E}   3
{C, E}   2

L3
Itemset    sup
{B, C, E}  2

Constraint: {A, C, E} ⊇ T.Item

Ans
{A} {C} {E}
{A, C} {C, E}

33
Algorithm CAP
Succinct and anti-monotone
  Strategy I: Replace C1 in the Apriori Algorithm by C1^C.
Anti-monotone but non-succinct
  Strategy II: Define Ck as in the Apriori Algorithm. Drop a set S ∈ Ck from counting if S fails C, i.e., constraint satisfaction is tested before counting is done.
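
Strategy II can be illustrated with a constraint that is anti-monotone but not succinct, such as sum(S) ≤ v over non-negative values. The sketch below is a minimal Python illustration under that assumption; the price table, the threshold of 50, and the function name are hypothetical. The second return value counts how many candidates actually reached the counting step.

```python
from itertools import combinations


def mine_with_anti_monotone(transactions, min_count, constraint):
    """Apriori-style mining that never counts a candidate failing `constraint`.

    Only sound and complete when `constraint` is anti-monotone.
    Returns (frequent sets satisfying the constraint, candidates counted).
    """
    items = {i for t in transactions for i in t}
    candidates = {frozenset([i]) for i in items}
    answer, counted, k = {}, 0, 1
    while candidates:
        # Strategy II: test the constraint first, count only the survivors.
        survivors = [c for c in candidates if constraint(c)]
        counted += len(survivors)
        level = {}
        for c in survivors:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_count:
                level[c] = n
        answer.update(level)
        candidates = {
            a | b
            for a in level
            for b in level
            if len(a | b) == k + 1
            and all(frozenset(s) in level for s in combinations(a | b, k))
        }
        k += 1
    return answer, counted


# Hypothetical non-negative prices, so that sum(S) <= 50 is anti-monotone.
PRICE = {"A": 10, "B": 30, "C": 20, "D": 50, "E": 40}
TDB = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]

result, counted = mine_with_anti_monotone(
    TDB, 2, lambda s: sum(PRICE[i] for i in s) <= 50
)
print(sorted("".join(sorted(s)) for s in result))  # ['A', 'AC', 'B', 'BC', 'C', 'E']
print(counted)                                     # 9
```

On this database plain Apriori would count 12 candidates (5 + 6 + 1); here {B, E} and {C, E} are dropped before counting, and {B, C, E} is never generated.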

34
Algorithm CAP (cont.)
Succinct but non-anti-monotone
  Strategy III: Too complicated. To be discussed later…
Non-succinct & non-anti-monotone
  Strategy IV: Induce any weaker constraint C1 from C. Depending on whether C1 is anti-monotone and/or succinct, use one of the strategies I-III above for the generation of frequent sets.

35
Algorithm CAP (Pseudocode)
(Csam: succinct and anti-monotone constraints; Cam: anti-monotone but non-succinct; Csuc: succinct but non-anti-monotone; Cnone: neither)
1. if Csam ∪ Csuc ∪ Cnone is non-empty, prepare C1 as indicated in Strategies I, III, and IV; k = 1;
2. if Csuc is non-empty {
   2.1 conduct db scan to form L1 as indicated in Strategy III;
   2.2 form C2 as indicated in Strategy III; k = 2; }
3. while (Ck not empty) {
   3.1 conduct db scan to form Lk from Ck;
   3.2 form Ck+1 from Lk based on Strategy III if Csuc is non-empty, and Strategy II for constraints in Cam; k++; }
4. if Cnone is empty, Ans = ∪ Lk. Otherwise, for each set S in some Lk, add S to Ans iff S satisfies Cnone.

The Algorithm CAP — An Example
Database TDB
Tid Items
10 A, C, D
20 B, C, E
30 A, B, C, E
40 B, E
Constraints: {A, C, E} ⊇ T.Item & min support count = 2
Question: Which strategy should we apply?

The Algorithm CAP: An Example (Cont.)

Database TDB
Tid  Items
10   A, C, D
20   B, C, E
30   A, B, C, E
40   B, E

1st scan: C1
Itemset  sup
{A}      2
{C}      3
{E}      3

L1
Itemset  sup
{A}      2
{C}      3
{E}      3

C2
Itemset
{A, C}
{A, E}
{C, E}

2nd scan: C2
Itemset  sup
{A, C}   2
{A, E}   1
{C, E}   2

L2
Itemset  sup
{A, C}   2
{C, E}   2

C3
Itemset
{}
Because {A, E} is pruned earlier

Ans
{A} {C} {E}
{A, C} {C, E}

Apply Strategy I!!!
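
The run above can be reproduced with a few lines of code. The following sketch of Strategy I assumes the constraint has the form "every item of the set belongs to an allowed set", which is both succinct and anti-monotone; the function name and the trace format are inventions for this illustration.

```python
from itertools import combinations


def cap_strategy_one(transactions, min_count, allowed_items):
    """Apriori with C1 restricted to the items that satisfy the constraint.

    Returns the answer and a trace of (candidates counted, frequent sets) per level.
    """
    items = {i for t in transactions for i in t}
    # Strategy I: only items satisfying the constraint enter C1.
    candidates = {frozenset([i]) for i in items & set(allowed_items)}
    answer, trace, k = {}, [], 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        trace.append((len(candidates), len(level)))
        answer.update(level)
        candidates = {
            a | b
            for a in level
            for b in level
            if len(a | b) == k + 1
            and all(frozenset(s) in level for s in combinations(a | b, k))
        }
        k += 1
    return answer, trace


TDB = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]

ans, trace = cap_strategy_one(TDB, 2, {"A", "C", "E"})
print(sorted("".join(sorted(s)) for s in ans))  # ['A', 'AC', 'C', 'CE', 'E']
print(trace)                                    # [(3, 3), (3, 2)]
```

Only 6 candidates are counted over two scans, compared with 12 candidates over three scans in the Apriori† run on the same database, and no post-filtering step is needed.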

38
Case 3: Succinct but not anti-monotone. Revisit…
Items: {1} {2} {3} {4} {5} {6} {7} {8} {9} {10}
Constraint: min(S) < 5
Items satisfying the constraint: {1} {2} {3} {4}
Apriori on these items only: {1,2} {2,3} … {3,4} … {1,2,3,4}
Some possible frequent sets may be lost: e.g. {1,8} {1,2,10}
**Information extracted from past presentation.

39
Case 3: Succinct but not anti-monotone. Continue…
Algorithm Direct
  Idea: Play it safe. Generate C^c_{k+1} by using L^c_k × F, where F is the set of all frequent items.
Algorithm MultiJoins
Algorithm Reorder
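
The idea behind Algorithm Direct can be sketched as follows. This is a simplified reading of the one-line description on the slide, not the published algorithm: it extends every frequent set that satisfies the constraint by one frequent item, and it relies on the constraint being monotone, as min(S) < 5 is. The transaction list is hypothetical and was chosen so that {1, 8} and {1, 2, 10} are frequent.

```python
def direct(transactions, min_count, constraint):
    """Level-wise mining where candidates are built as (satisfying frequent set) x F."""

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = {i for t in transactions for i in t}
    # F: all frequent items, whether or not they satisfy the constraint.
    frequent_items = {i for i in items if count(frozenset([i])) >= min_count}
    # First level: frequent items that satisfy the constraint on their own.
    level = {frozenset([i]) for i in frequent_items if constraint(frozenset([i]))}
    answer = {}
    while level:
        answer.update({s: count(s) for s in level})
        # Next candidates: extend each set in the level by one frequent item.
        candidates = {s | {f} for s in level for f in frequent_items if f not in s}
        level = {c for c in candidates if constraint(c) and count(c) >= min_count}
    return answer


# Hypothetical transactions over items 1..10.
TDB = [{1, 8}, {1, 2, 8}, {1, 2, 10}, {1, 2, 8, 10}, {6, 7}, {6, 7}]

ans = direct(TDB, 2, lambda s: min(s) < 5)
print(frozenset({1, 8}) in ans, frozenset({1, 2, 10}) in ans)  # True True
print(frozenset({6, 7}) in ans)                                # False
```

Restricting Apriori to items 1 to 4 would have missed {1, 8} and {1, 2, 10}; pairing with all frequent items keeps them, at the price of generating more candidates.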


41
Performance Analysis (Specification)
Programs written in C
Transactional databases generated using the program from IBM Almaden Research Center
100,000 records, domain of 1,000 items
Page size 4KB
SPARC-10 environment

42
Performance Analysis (Terminology)
Speedup: comparison of execution time between two algorithms.
Item selectivity: x% of the items satisfy the constraints.
Support threshold: a low support threshold means more frequent sets to process.

43
Performance Analysis
Note: support threshold set at 0.5%.
For 10% selectivity, CAP runs 80 times faster than Apriori†!
For 30% selectivity, the speedup is about 10 times.

44
Performance Analysis
Note: item selectivity fixed at 30%.
As the support threshold goes up, the number of frequent itemsets goes down and Apriori† improves.
CAP is still at least 8 times faster.

45
Performance Analysis
Each entry is of the form a/b, where a is the number of frequent sets satisfying the constraint and b is the total number of frequent sets.
For L4 with support of 0.2%, Apriori† finds 1250 frequent sets, of which only 8 satisfy the constraint and are found by CAP.

Support  L1       L2      L3       L4      L5     L6     L7     L8
0.2%     174/582  79/969  29/1140  8/1250  1/934  0/451  0/132  0/20
0.6%     98/313   1/12    0/1      0       0      0      0      0

46
Conclusion
The ideas of anti-monotonicity, succinctness, and convertibility are introduced in the referenced papers.
Sound, complete, and efficient algorithms are introduced for constraint-based association rule mining.

47
References
R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. KDD’97.
R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimizations of constrained associations rules. SIGMOD’98.
J. Pei and J. Han. Can we push more constraints into frequent pattern mining? KDD’00.