Association Mining Data Mining Spring 2012

Upload: vernon-kidd

Post on 31-Dec-2015


Page 1: Association Mining

Association Mining

Data Mining

Spring 2012

Page 2: Association Mining

• Transactional Database

• Transaction – A row in the database

• Example: {Eggs, Cheese, Milk}

Transactional Database

Transactional dataset

Eggs Cheese Milk

Milk Jam

Cheese Bacon Eggs Cat food

Butter Bread

Bread Butter Eggs Milk Cheese

Page 3: Association Mining

• Item = a single element such as Milk, Cheese, or Bread

• Itemset = a set of items, e.g. {Milk}, {Milk, Cheese}, {Bacon, Bread, Milk}

• An itemset doesn’t have to appear in the dataset

• An itemset can be of any size from 1 to n

Items and Itemsets


Page 4: Association Mining

The Support Measure
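A definition consistent with the examples in this deck (occurrences divided by rows in the database) can be written as below. The symbol D for the set of transactions and T for a single transaction are notation assumed here, not taken from the slides.

```latex
\[
\operatorname{support}(X) \;=\; \frac{\left|\{\, T \in D : X \subseteq T \,\}\right|}{|D|}
\]
```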

Page 5: Association Mining

Support Examples

Support({Eggs}) = 3/5 = 60%

Support({Eggs, Milk}) = 2/5 = 40%
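The two support values above can be checked with a minimal sketch. The function name `support` and the list-of-sets representation are assumptions made for illustration only.

```python
# Transactions from the slides, one set per database row.
transactions = [
    {"Eggs", "Cheese", "Milk"},
    {"Milk", "Jam"},
    {"Cheese", "Bacon", "Eggs", "Cat food"},
    {"Butter", "Bread"},
    {"Bread", "Butter", "Eggs", "Milk", "Cheese"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


print(support({"Eggs"}, transactions))          # 0.6
print(support({"Eggs", "Milk"}, transactions))  # 0.4
```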


Page 6: Association Mining

Minimum Support

Minsup – the user-defined minimum support threshold for an itemset to be considered frequent.

Frequent itemset – an itemset in a database whose support is greater than or equal to minsup.

Support(X) ≥ minsup = frequent

Support(X) < minsup = infrequent

Page 7: Association Mining

Minimum Support Examples

Minimum support = 50%

Support({Eggs}) = 3/5 = 60% Pass

Support({Eggs, Milk}) = 2/5 = 40% Fail


Page 8: Association Mining

Association Rules
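The confidence of a rule, as it is used in the examples that follow, can be written as below. The name conf is shorthand assumed here.

```latex
\[
\operatorname{conf}(X \Rightarrow Y) \;=\; \frac{\operatorname{support}(X \cup Y)}{\operatorname{support}(X)}
\]
```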

Page 9: Association Mining

Confidence Example 1

{Eggs} => {Bread}

Confidence = sup({Eggs, Bread})/sup({Eggs})

Confidence = (1/5)/(3/5) ≈ 33%


Page 10: Association Mining

Confidence Example 2

{Milk} => {Eggs, Cheese}

Confidence = sup({Milk, Eggs, Cheese})/sup({Milk})

Confidence = (2/5)/(3/5) ≈ 67%
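Both confidence examples can be reproduced with a short sketch. The helper names `support` and `confidence` are hypothetical, and exact fractions are used so that 1/3 and 2/3 are not rounded.

```python
from fractions import Fraction

# Transactions from the slides, one set per database row.
transactions = [
    {"Eggs", "Cheese", "Milk"},
    {"Milk", "Jam"},
    {"Cheese", "Bacon", "Eggs", "Cat food"},
    {"Butter", "Bread"},
    {"Bread", "Butter", "Eggs", "Milk", "Cheese"},
]


def support(itemset):
    """Exact fraction of transactions containing every item of `itemset`."""
    hits = sum(1 for t in transactions if set(itemset) <= t)
    return Fraction(hits, len(transactions))


def confidence(lhs, rhs):
    """Confidence of the rule lhs => rhs: sup(lhs U rhs) / sup(lhs)."""
    return support(set(lhs) | set(rhs)) / support(lhs)


print(confidence({"Eggs"}, {"Bread"}))           # 1/3
print(confidence({"Milk"}, {"Eggs", "Cheese"}))  # 2/3
```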


Page 11: Association Mining

Strong Association Rules

Minimum Confidence – a user-defined lower bound on confidence. (Minconf)

Strong association rule – a rule X=>Y whose confidence is at least minconf.

- this is a potentially interesting rule for the user.

Conf(X=>Y) ≥ minconf = strong

Conf(X=>Y) < minconf = uninteresting

Page 12: Association Mining

Minimum Confidence Example

Minconf = 50%

{Eggs} => {Bread}

Confidence = (1/5)/(3/5) ≈ 33% Fail

{Milk} => {Eggs, Cheese}

Confidence = (2/5)/(3/5) ≈ 67% Pass

Page 13: Association Mining

Association Mining

Association Mining:

- Finds strong rules contained in a dataset from frequent itemsets.

Can be divided into two major subtasks:

1. Finding frequent itemsets

2. Rule generation

Page 14: Association Mining

• Some algorithms change items into letters or numbers

• Numbers are more compact

• Easier to make comparisons

Transactional Database Revisited

Transactional dataset

1 2 3

3 5

2 7 1 4

6 8

8 6 1 3 2

Page 15: Association Mining

Basic Set Logic

Subset – an itemset X is a subset of an itemset Y if every item in X is also in Y.

Superset – an itemset Y is a superset of an itemset X if Y contains every item in X.

Example: X = {1,2}, Y = {1,2,3,5}. X is a subset of Y, and Y is a superset of X.
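In Python, the subset and superset tests map directly onto the built-in set comparison operators, shown here on the example itemsets:

```python
X = {1, 2}
Y = {1, 2, 3, 5}

print(X <= Y)  # True: X is a subset of Y
print(Y >= X)  # True: Y is a superset of X
print(Y <= X)  # False: Y is not a subset of X
```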

Page 16: Association Mining

Apriori

Arranges database into a temporary lattice structure to find associations

Apriori principle –

1. itemsets in the lattice with support < minsup will only produce supersets with support < minsup.

2. the subsets of frequent itemsets are always frequent.

Prunes non-frequent itemsets from the lattice structure using minsup.

Reduces the number of comparisons.

Reduces the number of candidate itemsets.

Page 17: Association Mining

Monotonicity

Monotone (upward closed) - a measure f is monotone if, whenever X is a subset of Y, f(X) cannot exceed f(Y).

Anti-Monotone (downward closed) - a measure f is anti-monotone if, whenever X is a subset of Y, f(Y) cannot exceed f(X).

Support is anti-monotone.

- Apriori uses this property to prune the lattice structure.
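The anti-monotone behaviour of support can be checked empirically. The sketch below uses the numeric dataset from the "Transactional Database Revisited" slide and counts the cases where adding an item to an itemset raises its support count; the helper name `support_count` is an assumption.

```python
from itertools import combinations

# Numeric transactions from the "Transactional Database Revisited" slide.
transactions = [{1, 2, 3}, {3, 5}, {2, 7, 1, 4}, {6, 8}, {8, 6, 1, 3, 2}]


def support_count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


items = sorted(set().union(*transactions))
violations = 0
for k in (2, 3):  # check every itemset Y of size 2 and 3
    for combo in combinations(items, k):
        y = frozenset(combo)
        for item in y:
            x = y - {item}  # x is an immediate subset of y
            if support_count(y) > support_count(x):
                violations += 1

print(violations)  # 0: a superset never has higher support than its subset
```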

Page 18: Association Mining

Itemset Lattice

Page 19: Association Mining

Lattice Pruning

Page 20: Association Mining

Lattice Example

1 2 3 4 5

2 4

1 2 4

1 4

Count occurrences of each 1-itemset in the database and compute their support:

Support = #occurrences / #rows in db

Prune anything less than minsup = 30%

Page 21: Association Mining

Lattice Example

1 2 3 4 5

2 4

1 2 4

1 4


Count occurrences of each 2-itemset in the database and compute their support.

Prune anything less than minsup = 30%

Page 22: Association Mining

Lattice Example

A B C D E

B D

A B D

A D

Count occurrences of the last 3-itemset in the database and compute its support.

Prune anything less than minsup = 30%

Page 23: Association Mining

Example - Results

1 2 3 4 5

2 4

1 2 4

1 4

Frequent itemsets: {1}, {2}, {4}, {1,2}, {1,4}, {2,4}, {1,2,4}

Page 24: Association Mining

Apriori Algorithm

Page 25: Association Mining

Frequent Itemset Generation

Itemset Support Frequent

{1} 75% Yes

{2} 50% No

{3} 75% Yes

{4} 25% No

{5} 100% Yes

Transactional Database

1 2 3 4 5

2 3 5

1 3 5

1 5

1. Minsup = 70%

2. Generate all 1-itemsets

3. Calculate the support for each itemset

4. Determine whether or not the itemsets are frequent

Page 26: Association Mining

Frequent Itemset Generation

Itemset Support Frequent

{1,3} 50% No

{1,5} 75% Yes

{3,5} 75% Yes


Generate all 2-itemsets, minsup = 70%

{1} U {3} = {1,3} , {1} U {5} = {1,5}

{3} U {5} = {3,5}

Page 27: Association Mining

Frequent Itemset Generation

Itemset Support Frequent

{1,3,5} 50% No


Generate all 3-itemsets, minsup = 70%

{1,3} U {1,5} = {1,3,5}

Page 28: Association Mining

Frequent Itemset Results

All frequent itemsets generated are output:

{1} , {3} , {5}

{1,5} , {3,5}
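The level-wise generation walked through on the preceding slides can be sketched as follows. This is a compact illustration under stated assumptions, not the deck's own pseudocode: the function name `apriori`, the join by pairwise union, and the use of `>=` for the minsup test are choices made here.

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Return {itemset: support} for every itemset with support >= minsup."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items]  # candidate 1-itemsets
    frequent = {}
    k = 1
    while level:
        # Keep only the candidates that reach the support threshold.
        current = {c: support(c) for c in level}
        current = {c: s for c, s in current.items() if s >= minsup}
        frequent.update(current)
        # Join frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori pruning: drop candidates that have an infrequent subset.
        level = [c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))]
    return frequent


db = [{1, 2, 3, 4, 5}, {2, 3, 5}, {1, 3, 5}, {1, 5}]
result = apriori(db, minsup=0.7)
print(sorted(sorted(s) for s in result))  # [[1], [1, 5], [3], [3, 5], [5]]
```

On this database {1,3} appears in only two of the four rows (50%), so at a 70% threshold it is dropped at level 2 and the candidate {1,3,5} is pruned before it is ever counted.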

Page 29: Association Mining

Apriori Rule Mining

Page 30: Association Mining

Apriori Rule Mining

Rule Combinations:

1. 2-itemset {1,2}

{1}=>{2}

{2}=>{1}

2. 3-itemset {1,2,3}

{1}=>{2,3}

{2,3}=>{1}

{1,2}=>{3}

{3}=>{1,2}

{1,3}=>{2}

{2}=>{1,3}
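The rule combinations listed above are every way of splitting an itemset into a non-empty left side and a non-empty right side. A minimal sketch of that enumeration, with the hypothetical helper name `candidate_rules`:

```python
from itertools import combinations


def candidate_rules(itemset):
    """All rules lhs => rhs where lhs and rhs are non-empty and partition itemset."""
    items = sorted(itemset)
    rules = []
    for k in range(1, len(items)):
        for lhs in combinations(items, k):
            rhs = tuple(i for i in items if i not in lhs)
            rules.append((lhs, rhs))
    return rules


for lhs, rhs in candidate_rules({1, 2, 3}):
    print(set(lhs), "=>", set(rhs))
```

An itemset of size k yields 2^k - 2 candidate rules: 2 for {1,2} and 6 for {1,2,3}.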

Page 31: Association Mining

Strong Rule Generation

Transactional Database

1 2 3 4 5

2 3 5

1 3 5

1 5

1. I = {{1}, {3}, {5}}

2. Rules = X => Y

3. Minconf = 80%


Page 33: Association Mining

Strong Rules Results

All strong rules generated are output:

{1}=>{5}

{3}=>{5}
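Strong-rule generation can be sketched by splitting each frequent itemset into rules and keeping those that reach minconf. The input list below is an assumption made for the sketch: it holds the itemsets that reach the 70% minsup on this database, and the variable names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

transactions = [{1, 2, 3, 4, 5}, {2, 3, 5}, {1, 3, 5}, {1, 5}]
# Itemsets with support >= 70% on the four rows above (assumed as input).
frequent = [frozenset(s) for s in ({1}, {3}, {5}, {1, 5}, {3, 5})]
minconf = Fraction(80, 100)


def count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


strong = []
for itemset in frequent:
    for k in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), k):
            lhs = frozenset(lhs)
            # conf(lhs => rest) = count(itemset) / count(lhs)
            conf = Fraction(count(itemset), count(lhs))
            if conf >= minconf:
                strong.append((set(lhs), set(itemset - lhs), conf))

for lhs, rhs, conf in strong:
    print(lhs, "=>", rhs, float(conf))
```

The reverse rules {5}=>{1} and {5}=>{3} each have confidence 3/4 = 75% and fall below the 80% bound.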

Page 34: Association Mining

Other Frequent Itemsets

Closed Frequent Itemset – a frequent itemset X that has no immediate superset with the same support count as X.

Maximal Frequent Itemset – a frequent itemset none of whose immediate supersets is frequent.
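Both definitions can be tested directly by looking at each frequent itemset's immediate supersets (the itemset plus one more item). The sketch below reuses the four-row database from the Apriori walkthrough with minsup = 70%; the helper names and the hard-coded list of frequent itemsets are assumptions for illustration.

```python
transactions = [{1, 2, 3, 4, 5}, {2, 3, 5}, {1, 3, 5}, {1, 5}]
minsup = 0.7
items = set().union(*transactions)
# Itemsets with support >= 70% on the rows above (assumed as input).
frequent = [frozenset(s) for s in ({1}, {3}, {5}, {1, 5}, {3, 5})]


def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def immediate_supersets(itemset):
    """Every itemset obtained by adding exactly one more item."""
    return [itemset | {i} for i in items - itemset]


# Closed: no immediate superset has the same support.
closed = [s for s in frequent
          if all(support(u) < support(s) for u in immediate_supersets(s))]
# Maximal: no immediate superset is frequent.
maximal = [s for s in frequent
           if all(support(u) < minsup for u in immediate_supersets(s))]

print(sorted(sorted(s) for s in closed))   # [[1, 5], [3, 5], [5]]
print(sorted(sorted(s) for s in maximal))  # [[1, 5], [3, 5]]
```

Here {1} is not closed because {1,5} has the same support (75%), and every maximal itemset is also closed.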

Page 35: Association Mining

Itemset Relationships

[Figure: nested sets. Maximal Frequent Itemsets lie inside Closed Frequent Itemsets, which lie inside Frequent Itemsets.]

Page 36: Association Mining

Targeted Association Mining

Page 37: Association Mining

Targeted Association Mining

* Users may only be interested in specific results

* Potential to get smaller, faster, and more focused results

* Examples:

1. User wants to know how often only bread and garlic cloves occur together.

2. User wants to know what items occur with toilet paper.

Page 38: Association Mining

Itemset Trees

* Itemset Tree: a data structure that helps users query for a specific itemset and its support.

* Items within a transaction are mapped to integer values and ordered such that each transaction is in lexical order.

{Bread, Onion, Garlic} = {1, 2, 3}

* Why use numbers?

- They make the tree more compact.

- Numbers are easy to keep in order.
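One simple way to perform the item-to-integer mapping is to number items in the order they are first seen and then sort each encoded transaction. The function name `encode` and the first-seen numbering are assumptions; the slides do not say how the integers are assigned.

```python
def encode(transactions):
    """Map each item to an integer id and return sorted integer transactions."""
    ids = {}
    encoded = []
    for t in transactions:
        for item in t:
            # Assign the next free integer the first time an item appears.
            ids.setdefault(item, len(ids) + 1)
        encoded.append(sorted(ids[item] for item in t))
    return ids, encoded


ids, encoded = encode([["Bread", "Onion", "Garlic"], ["Garlic", "Bread"]])
print(ids)      # {'Bread': 1, 'Onion': 2, 'Garlic': 3}
print(encoded)  # [[1, 2, 3], [1, 3]]
```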

Page 39: Association Mining

Itemset Trees

An Itemset Tree T contains:

* A root pair (I, f(I)), where I is an itemset and f(I) is its count.

* A (possibly empty) set {T1, T2, . . . , Tk}, each element of which is an itemset tree.

* If Ij is in the root, then it will also be in the root’s children.

* If Ij is not in the root, then it might be in the root’s children if:

first_item(I) < first_item(Ij) and

last_item(I) < last_item(Ij)

Page 40: Association Mining

Building an Itemset Tree

Let ci be a node in the itemset tree.

Let I be a transaction from the dataset.

Loop:

Case 1: ci = I

Case 2: ci is a child of I

- make I the parent node of ci

Case 3: ci and I contain a common lexical overlap, e.g. {1,2,4} vs. {1,2,6}

- make a node for the overlap

- make I and ci its children

Case 4: ci is a parent of I

- loop to check ci’s children

- make I a child of ci

Note: {2,6} and {1,2,6} do not have a lexical overlap.
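The four cases can be sketched as a recursive insert. Several details are assumptions made here rather than taken from the slides: "lexical overlap" is read as a shared leading prefix, each node's count is the number of transactions that pass through it, a match in Case 1 only increments the count, and the names `Node`, `insert`, `build` and `dump` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    items: tuple
    count: int = 0
    children: list = field(default_factory=list)


def common_prefix(a, b):
    """Longest shared leading run of two sorted tuples (the lexical overlap)."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def insert(node, t):
    """Insert transaction t below node; node.items is always a prefix of t."""
    node.count += 1
    if t == node.items:                      # Case 1: same itemset
        return
    for i, child in enumerate(node.children):
        p = common_prefix(t, child.items)
        if len(p) <= len(node.items):        # nothing shared beyond this node
            continue
        if p == child.items:                 # Case 4: child is a parent of t
            insert(child, t)
        elif p == t:                         # Case 2: t becomes the child's parent
            node.children[i] = Node(t, child.count + 1, [child])
        else:                                # Case 3: split on the overlap
            node.children[i] = Node(p, child.count + 1, [child, Node(t, 1)])
        return
    node.children.append(Node(t, 1))         # no overlap: new child of node


def build(transactions):
    root = Node(())
    for t in transactions:
        insert(root, tuple(sorted(t)))
    return root


def dump(node):
    """Pre-order list of (itemset, count) pairs."""
    out = [(node.items, node.count)]
    for c in node.children:
        out.extend(dump(c))
    return out


print(common_prefix((1, 2, 4), (1, 2, 6)))  # (1, 2)
print(common_prefix((2, 6), (1, 2, 6)))     # ()
```

Calling `build` on the six-row dataset used in the creation slides ({2,4}, {1,2,3,5}, {3,9}, {1,2,6}, {2}, {2,9}) gives a tree of eight nodes including the empty root, with counts 3 for {2} and 2 for {1,2}.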

Page 41: Association Mining

Itemset Trees - Creation

Dataset

2 4

1 2 3 5

3 9

1 2 6

2

2 9

Page 42: Association Mining

Itemset Trees - Creation (step by step)

Insert {2, 4}: child node.

Insert {1, 2, 3, 5}: child node.

Insert {3, 9}: child node.

Insert {1, 2, 6}: lexical overlap.

Insert {2}: parent node.

Insert {2, 9}: child node.

Page 48: Association Mining

Itemset Trees – Querying

Let I be an itemset.

Let ci be a node in the tree.

Let totalSup be the total count for I in the tree.

For all ci such that first_item(ci) ≤ first_item(I):

Case 1: If I is contained in ci

- add ci’s support to totalSup

Case 2: If I is not contained in ci and last_item(ci) < last_item(I)

- proceed down the tree
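The query procedure can be sketched on a hand-written copy of the example tree. The tuple layout `(itemset, count, children)` and the function name `query` are assumptions; the counts 3 for {2} and 2 for {1,2} come from the traced examples, while the leaf counts of 1 and the root count of 6 are inferred from the six-row dataset.

```python
# Example tree as (itemset, count, children) tuples.
# Leaf counts of 1 and the root count of 6 are assumptions (see lead-in).
TREE = ((), 6, [
    ((2,), 3, [((2, 4), 1, []), ((2, 9), 1, [])]),
    ((1, 2), 2, [((1, 2, 3, 5), 1, []), ((1, 2, 6), 1, [])]),
    ((3, 9), 1, []),
])


def query(node, itemset):
    """Support count of `itemset`, summed over the subtrees below `node`."""
    target = tuple(sorted(itemset))
    _, _, children = node
    total = 0
    for child in children:
        items, count, _ = child
        if items[0] > target[0]:
            continue                      # this subtree cannot contain target's first item
        if set(target) <= set(items):
            total += count                # Case 1: every transaction below contains target
        elif items[-1] < target[-1]:
            total += query(child, target)  # Case 2: look further down
    return total


print(query(TREE, {2}))     # 5
print(query(TREE, {2, 9}))  # 1
```

When a node contains the query itemset, its whole count is added without visiting its children, which is what keeps targeted queries cheaper than a scan of the database.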

Page 49: Association Mining

Example 1

Page 50: Association Mining

Itemset Trees - Querying

Querying Example 1:

Query: {2}

totalSup = 0

Page 51: Association Mining

Itemset Trees - Querying

Querying Example 1:

Query: {2}

2 = 2

Add to support:

totalSup = 3

Page 52: Association Mining

Itemset Trees - Querying

Querying Example 1:

Query: {2}

{1,2} contains {2}

Add to support

totalSup = 3 + 2 = 5

Page 53: Association Mining

Itemset Trees - Querying

Querying Example 1:

Query: {2}

3 > 2, and end of subtree.

Return support

totalSup = 5

Page 54: Association Mining

Example 2

Page 55: Association Mining

Itemset Trees - Querying

Querying Example 2:

Query: {2,9}

totalSup = 0

Page 56: Association Mining

Itemset Trees - Querying

Querying Example 2:

Query: {2,9}

totalSup = 0

2 ≤ 2, 2 < 9: continue

Page 57: Association Mining

Itemset Trees - Querying

Querying Example 2:

Query: {2,9}

totalSup = 0

2 ≤ 2, 4 < 9

{2,4} doesn’t contain {2,9}, go to next sibling

Page 58: Association Mining

Itemset Trees - Querying

Querying Example 2:

Query: {2,9}

totalSup = 1

{2,9} = {2,9}

Add to support!

Page 59: Association Mining

Itemset Trees - Querying

Querying Example 2:

Query: {2,9}

totalSup = 1

1 ≤ 2, 2 < 9

continue

Page 60: Association Mining

Itemset Trees - Querying

Querying Example 2:

Query: {2,9}

totalSup = 1

1 ≤ 2, 5 < 9

{1,2,3,5} doesn’t contain {2,9}, go to next sibling

Page 61: Association Mining

Itemset Trees - Querying

Querying Example 2:

Query: {2,9}

totalSup = 1

1 ≤ 2, 6 < 9

{1,2,6} doesn’t contain {2,9}, go to next node

Page 62: Association Mining

Itemset Trees - Querying

Querying Example 2:

Query: {2,9}

totalSup = 1

3 ≤ 2: fail; 9 < 9: fail

End of tree,

totalSup = 1

Nodes = 8