Temporal Data Mining Christian Moewes [email protected] Otto-von-Guericke University of Magdeburg Faculty of Computer Science Department of Knowledge Processing and Language Engineering Zittau Fuzzy Colloquium 2011 Zittau, September 9, 2011


Page 1:

Temporal Data Mining

Christian Moewes, [email protected]

Otto-von-Guericke University of Magdeburg
Faculty of Computer Science

Department of Knowledge Processing and Language Engineering

Zittau Fuzzy Colloquium 2011 Zittau, September 9, 2011

Page 2:

Outline

1. Introduction

Data Mining

Knowledge discovery in databases

CRISP-DM

Why Study Temporal Data Mining?

2. Association Rules and Frequent Item Sets

3. Frequent Sequence Mining

4. Finding Motifs in Time Series Effectively

Page 3:

Data

• today: companies/institutes maintain huge databases

⇒ gigantic archives of tables, documents, images, sounds

• “If you have enough data, you can solve any problem!”

• in large databases: can’t see the wood for the trees

• patterns, structures, regularities stay undetected

• finding patterns and exploiting information is fairly difficult

We are drowning in information but starved for knowledge. [John Naisbitt]

C. Moewes Temporal Data Mining Zittau, September 9, 2011 1 / 114

Page 4:

Knowledge discovery in databases

• actually, abundance of data

• lack of tools transforming data into knowledge

⇒ research area: knowledge discovery in databases (KDD)

• nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data

• one step in KDD: data mining

[Figure: Miner VGA (1989) screenshot]

C. Moewes Temporal Data Mining Zittau, September 9, 2011 2 / 114

Page 5:

Data mining tasks

• classification: Is this customer creditworthy or not?

• segmentation, clustering: What groups of customers do I have?

• concept description: Which properties characterize cured patients?

• prediction: How many new cars will be called next month?

• dependence/association analysis: Which CAN-bus errors of broken cars occur together frequently?

• . . .

C. Moewes Temporal Data Mining Zittau, September 9, 2011 3 / 114

Page 6:

CRISP-DM
CRoss Industry Standard Process for Data Mining

C. Moewes Temporal Data Mining Zittau, September 9, 2011 4 / 114

Page 7:

Temporal Data Mining

• many data mining problems deal with temporal features

• most prominent: (numeric/symbolic) time series

• time series are ubiquitous: finance, medicine, biometry, chemistry, astronomy, robotics, networks and industry

• upcoming: temporal sequences of arbitrary objects, e.g. subgraphs

• challenges: preprocessing, (dis)similarity measures, representation, search for useful information

• last decade: research focused purely on proposing new algorithms

• nowadays: research on application-based solutions

C. Moewes Temporal Data Mining Zittau, September 9, 2011 5 / 114

Page 8:

Outline

1. Introduction

2. Association Rules and Frequent Item Sets

The Apriori Algorithm

Association Rules

Rule Icons: Industrial Applications

3. Frequent Sequence Mining

4. Finding Motifs in Time Series Effectively

Page 9:

Frequent Item Set Mining: Motivation

• method for market basket analysis

• finding regularities in the shopping behavior of customers of supermarkets, mail-order companies, on-line shops etc.

• more specifically: find sets of products that are frequently bought together

• possible applications:
  • improve arrangement of products in shelves, on catalog's pages
  • support cross-selling (suggestion of other products), product bundling
  • fraud detection, technical dependence analysis

• often found patterns are expressed as association rules, e.g.

If a customer buys bread and wine, then she/he will probably also buy cheese.

C. Moewes Temporal Data Mining Zittau, September 9, 2011 6 / 114

Page 10:

Frequent Item Set Mining: Basic Notions

• let A = {a1, . . . , am} be a set of items (products, special equipment items, service options, . . . )

• any subset I ⊆ A is called an item set
  • an item set is a set of products that can be bought (together)

• let T = (t1, . . . , tn) with ∀i, 1 ≤ i ≤ n : ti ⊆ A be a vector of transactions over A
  • each transaction is an item set, but some item sets may not occur in T
  • transactions need not be pairwise different: ti = tk for i ≠ k is possible
  • T may also be seen as a bag or multiset of transactions
  • A may not be given explicitly, but only implicitly as A = ⋃_{i=1..n} ti

• T can list, e.g., the sets of products bought by the customers of a supermarket in a given period of time

C. Moewes Temporal Data Mining Zittau, September 9, 2011 7 / 114

Page 11:

Frequent Item Set Mining: Basic Notions

let I ⊆ A be an item set and T a vector of transactions over A

• a transaction t ∈ T covers the item set I, or the item set I is contained in the transaction t ∈ T, iff I ⊆ t

• the set KT (I) = {k ∈ {1, . . . , n} | I ⊆ tk} is called the cover of I w.r.t. T
  • the cover of an item set is the index set of the transactions that cover it
  • it may also be defined as the vector of all transactions that cover it (however, this is complicated to write in a formally correct way)

• the value sT (I) = |KT (I)| is called the (absolute) support of I w.r.t. T
  the value σT (I) = (1/n) |KT (I)| is called the relative support of I w.r.t. T
  • the support of I is the number or fraction of transactions that contain it
  • sometimes σT (I) is also called the (relative) frequency of I w.r.t. T

C. Moewes Temporal Data Mining Zittau, September 9, 2011 8 / 114
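
As a small illustration of these notions (an editorial sketch in Python, not from the slides; the transaction vector is the 10-transaction example used throughout this talk, and all function names are chosen for illustration only):

transactions = [
    {"a", "d", "e"}, {"b", "c", "d"}, {"a", "c", "e"}, {"a", "c", "d", "e"},
    {"a", "e"}, {"a", "c", "d"}, {"b", "c"}, {"a", "c", "d", "e"},
    {"c", "b", "e"}, {"a", "d", "e"},
]

def cover(item_set, T):
    """K_T(I): 1-based indices of the transactions that contain item_set."""
    return {k for k, t in enumerate(T, start=1) if item_set <= t}

def support(item_set, T):
    """s_T(I) = |K_T(I)|: absolute support."""
    return len(cover(item_set, T))

def rel_support(item_set, T):
    """sigma_T(I) = s_T(I) / n: relative support."""
    return support(item_set, T) / len(T)

print(cover({"a", "c"}, transactions))        # {3, 4, 6, 8}
print(support({"a", "c"}, transactions))      # 4
print(rel_support({"a", "c"}, transactions))  # 0.4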

Page 12:

Frequent Item Set Mining: Formal Definition

given:

• set A = {a1, . . . , am} of items,

• vector T = (t1, . . . , tn) of transactions over A,

• a number smin ∈ ℕ, 0 < smin ≤ n, or (equivalently) a number σmin ∈ ℝ, 0 < σmin ≤ 1, the minimum support

desired:

• the set of frequent item sets, i.e.
  the set FT (smin) = {I ⊆ A | sT (I) ≥ smin} or (equivalently)
  the set ΦT (σmin) = {I ⊆ A | σT (I) ≥ σmin}

note: with the relations smin = ⌈n σmin⌉ and σmin = (1/n) smin the two versions can easily be transformed into each other

C. Moewes Temporal Data Mining Zittau, September 9, 2011 9 / 114
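
The formal definition can be turned directly into a naive enumeration; the following sketch (illustrative only) simply tries all subsets of A and keeps those with sufficient support. The later slides explain why this brute-force approach does not scale:

from itertools import combinations

def frequent_item_sets(items, T, s_min):
    """F_T(s_min): all item sets with absolute support >= s_min (brute force)."""
    result = {}
    for k in range(len(items) + 1):
        for combo in combinations(sorted(items), k):
            I = set(combo)
            s = sum(1 for t in T if I <= t)
            if s >= s_min:
                result[frozenset(I)] = s
    return result

transactions = [set(chars) for chars in
                ("ade", "bcd", "ace", "acde", "ae", "acd", "bc", "acde", "bce", "ade")]
F = frequent_item_sets({"a", "b", "c", "d", "e"}, transactions, s_min=3)
print(len(F))   # 16 frequent item sets (including the empty set), as on the next slide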

Page 14:

Frequent Item Sets: Example

transaction vector:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

frequent item sets (relative support):
0 items: ∅: 1
1 item:  {a}: .7   {b}: .3   {c}: .7   {d}: .6   {e}: .7
2 items: {a,c}: .4   {a,d}: .5   {a,e}: .6   {b,c}: .3   {c,d}: .4   {c,e}: .4   {d,e}: .4
3 items: {a,c,d}: .3   {a,c,e}: .3   {a,d,e}: .4

• minimum support smin = 3 or σmin = 0.3 = 30% in this example

• there are 2⁵ = 32 possible item sets over A = {a, b, c, d, e}

• there are 16 frequent item sets (but only 10 transactions)

C. Moewes Temporal Data Mining Zittau, September 9, 2011 10 / 114

Page 15:

Properties of the Support of an Item Set

• brute force approach (enumerate all possible item sets, determine their support and discard infrequent item sets) is infeasible: the number of possible item sets grows exponentially with the number of items
  • a typical supermarket has thousands of different products

• idea: consider properties of the support, in particular:

∀I : ∀J ⊇ I : KT (J) ⊆ KT (I)

  • this property holds, since ∀t : ∀I : ∀J ⊇ I : J ⊆ t → I ⊆ t
  • each item is one condition a transaction must satisfy
  • transactions not satisfying this condition are removed from the cover

• it follows: ∀I : ∀J ⊇ I : sT (I) ≥ sT (J), i.e. if an item set is extended, its support cannot increase
  • support is anti-monotone or downward closed

C. Moewes Temporal Data Mining Zittau, September 9, 2011 11 / 114
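
A quick check of this anti-monotonicity on the running example (illustrative only; support() and transactions are the helpers defined in the sketch further above):

chain = [{"a"}, {"a", "c"}, {"a", "c", "d"}, {"a", "c", "d", "e"}]
supports = [support(I, transactions) for I in chain]
print(supports)   # [7, 4, 3, 2]: adding items can only keep or shrink the support
assert all(s1 >= s2 for s1, s2 in zip(supports, supports[1:]))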

Page 16:

Properties of the Support of an Item Set

• from ∀I : ∀J ⊇ I : sT (I) ≥ sT (J) it follows

∀smin : ∀I : ∀J ⊇ I : sT (I) < smin → sT (J) < smin.

i.e. no superset of an infrequent item set can be frequent

• Apriori property
  • rationale: sometimes we can know a priori, i.e. before checking its support by accessing the given transaction vector, that an item set cannot be frequent

• contraposition of this implication also holds:

∀smin : ∀I : ∀J ⊆ I : sT (I) ≥ smin → sT (J) ≥ smin.

i.e. all subsets of frequent item set are frequent

• compressed representation of set of frequent item sets

C. Moewes Temporal Data Mining Zittau, September 9, 2011 12 / 114

Page 17:

Maximal Item Sets

• consider the set of maximal (frequent) item sets:

MT (smin) = {I ⊆ A | sT (I) ≥ smin ∧ ∀J ⊃ I : sT (J) < smin}

i.e. an item set is maximal if it is frequent, but none of its proper supersets is frequent

• so, we know that

∀smin : ∀I : I ∈ MT (smin) ∨ ∃J ⊃ I : sT (J) ≥ smin

it follows

∀smin : ∀I : I ∈ FT (smin) → ∃J ∈ MT (smin) : I ⊆ J

i.e. every frequent item set has a maximal superset

• therefore: ∀smin : FT (smin) = ⋃_{I ∈ MT (smin)} 2^I   (where 2^I denotes the power set of I)

C. Moewes Temporal Data Mining Zittau, September 9, 2011 13 / 114
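
As a small follow-up to the brute-force sketch above (illustrative only, reusing its dictionary F of frequent item sets), the maximal frequent item sets can be filtered out directly from this definition:

def maximal_item_sets(frequent):
    """M_T(s_min): frequent item sets none of whose proper supersets is frequent."""
    return {I: s for I, s in frequent.items()
            if not any(I < J for J in frequent)}   # I < J tests for a proper subset

M = maximal_item_sets(F)
print(sorted(tuple(sorted(I)) for I in M))
# the four maximal sets {a,c,d}, {a,c,e}, {a,d,e}, {b,c}, cf. the example on the next slides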

Page 18:

Maximal Frequent Item Sets: Example

transaction vector:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

frequent item sets (relative support):
0 items: ∅: 1
1 item:  {a}: .7   {b}: .3   {c}: .7   {d}: .6   {e}: .7
2 items: {a,c}: .4   {a,d}: .5   {a,e}: .6   {b,c}: .3   {c,d}: .4   {c,e}: .4   {d,e}: .4
3 items: {a,c,d}: .3   {a,c,e}: .3   {a,d,e}: .4

• which item sets are maximal?

• every frequent item set is a subset of at least one of these sets

C. Moewes Temporal Data Mining Zittau, September 9, 2011 14 / 114

Page 22:

Maximal Frequent Item Sets: Example

transaction vector:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

frequent item sets (relative support):
0 items: ∅: 1
1 item:  {a}: .7   {b}: .3   {c}: .7   {d}: .6   {e}: .7
2 items: {a,c}: .4   {a,d}: .5   {a,e}: .6   {b,c}: .3   {c,d}: .4   {c,e}: .4   {d,e}: .4
3 items: {a,c,d}: .3   {a,c,e}: .3   {a,d,e}: .4

• which item sets are maximal? {b,c}, {a,c,d}, {a,c,e}, {a,d,e}

• every frequent item set is a subset of at least one of these sets

C. Moewes Temporal Data Mining Zittau, September 9, 2011 14 / 114

Page 23:

Limits of Maximal Item Sets

• the set of maximal item sets captures the set of all frequent item sets, but then we only know the support of the maximal item sets

• about the support of a non-maximal frequent item set we only know:

∀smin : ∀I ∈ FT (smin) − MT (smin) : sT (I) ≥ max{ sT (J) | J ∈ MT (smin), J ⊃ I }

this follows immediately from ∀I : ∀J ⊇ I : sT (I) ≥ sT (J), i.e. an item set cannot have lower support than any of its supersets

• note that, in general,

∀smin : ∀I ∈ FT (smin) : sT (I) ≥ max{ sT (J) | J ∈ MT (smin), J ⊇ I }

• question: can we find a subset of the set of all frequent item sets which also preserves knowledge of all support values?

C. Moewes Temporal Data Mining Zittau, September 9, 2011 15 / 114

Page 24:

Closed Item Sets

• consider the set of closed (frequent) item sets:

CT (smin) = {I ⊆ A | sT (I) ≥ smin ∧ ∀J ⊃ I : sT (J) < sT (I)}

i.e. an item set is closed if it is frequent, but none of its proper supersets has the same support

• with this we know that

∀smin : ∀I : I ∈ CT (smin) ∨ ∃J ⊃ I : sT (J) = sT (I)

it follows

∀smin : ∀I : I ∈ FT (smin) → ∃J ∈ CT (smin) : I ⊆ J

i.e. every frequent item set has a closed superset

• therefore: ∀smin : FT (smin) = ⋃_{I ∈ CT (smin)} 2^I

C. Moewes Temporal Data Mining Zittau, September 9, 2011 16 / 114

Page 25:

Closed Item Sets

• however, not only does every frequent item set have a closed superset, it has a closed superset with the same support:

∀smin : ∀I : I ∈ FT (smin) → ∃J ⊇ I : J ∈ CT (smin) ∧ sT (J) = sT (I)

(proof: see considerations on the next slide)

• the set of all closed item sets preserves knowledge of all support values:

∀smin : ∀I ∈ FT (smin) : sT (I) = max{ sT (J) | J ∈ CT (smin), J ⊇ I }

• note: the weaker statement

∀smin : ∀I ∈ FT (smin) : sT (I) ≥ max{ sT (J) | J ∈ CT (smin), J ⊇ I }

follows immediately from ∀I : ∀J ⊇ I : sT (I) ≥ sT (J), i.e. an item set cannot have lower support than any of its supersets

C. Moewes Temporal Data Mining Zittau, September 9, 2011 17 / 114

Page 26:

Closed Item Sets

• alternative characterization:

I closed ⇔ sT (I) ≥ smin ∧ I = ⋂_{k ∈ KT (I)} tk

reminder: KT (I) = {k ∈ {1, . . . , n} | I ⊆ tk} is the cover of I w.r.t. T

• derived as follows: since ∀k ∈ KT (I) : I ⊆ tk , it is obvious that

∀smin : ∀I ∈ FT (smin) : I ⊆ ⋂_{k ∈ KT (I)} tk

  • if I ⊂ ⋂_{k ∈ KT (I)} tk , then I is not closed, since ⋂_{k ∈ KT (I)} tk has the same support
  • on the other hand, no superset of ⋂_{k ∈ KT (I)} tk has the cover KT (I)

• note: the above characterization allows us to construct the (uniquely determined) closed superset of a frequent item set with the same support

C. Moewes Temporal Data Mining Zittau, September 9, 2011 18 / 114
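
The intersection characterization can be made concrete with a few lines of Python (an illustrative sketch reusing the transactions list from the earlier snippets; it assumes the item set occurs in at least one transaction):

def closure(item_set, T):
    """Intersection of all transactions covering item_set; a frequent item set
    is closed iff it equals this intersection (see the characterization above)."""
    covering = [t for t in T if item_set <= t]
    result = set(covering[0])          # assumes a non-empty cover
    for t in covering[1:]:
        result &= t
    return result

print(closure({"d", "e"}, transactions))        # {'a', 'd', 'e'}: {d,e} is not closed
print(closure({"a", "d", "e"}, transactions))   # {'a', 'd', 'e'}: {a,d,e} is closed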

Page 27:

Closed Frequent Item Sets: Example

transaction vector:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

frequent item sets (relative support):
0 items: ∅: 1
1 item:  {a}: .7   {b}: .3   {c}: .7   {d}: .6   {e}: .7
2 items: {a,c}: .4   {a,d}: .5   {a,e}: .6   {b,c}: .3   {c,d}: .4   {c,e}: .4   {d,e}: .4
3 items: {a,c,d}: .3   {a,c,e}: .3   {a,d,e}: .4

• all frequent item sets but {b} and {d,e} are closed

• {b} is a subset of {b,c}, both have support 0.3; {d,e} is a subset of {a,d,e}, both have support 0.4

C. Moewes Temporal Data Mining Zittau, September 9, 2011 19 / 114

Page 28:

Types of Frequent Item Sets

• Frequent Item Set: any item set whose support is at least the minimum support:
  I frequent ⇔ sT (I) ≥ smin

• Closed Item Set: a frequent item set is called closed if no superset has the same support:
  I closed ⇔ sT (I) ≥ smin ∧ ∀J ⊃ I : sT (J) < sT (I)

• Maximal Item Set: a frequent item set is called maximal if no superset is frequent:
  I maximal ⇔ sT (I) ≥ smin ∧ ∀J ⊃ I : sT (J) < smin

• obvious relations:
  • all maximal and all closed item sets are frequent
  • all maximal item sets are closed

C. Moewes Temporal Data Mining Zittau, September 9, 2011 20 / 114

Page 31:

Types of Frequent Item Sets: Example

frequent item sets (relative support):
0 items: ∅+: 1
1 item:  {a}+: .7   {b}: .3   {c}+: .7   {d}+: .6   {e}+: .7
2 items: {a,c}+: .4   {a,d}+: .5   {a,e}+: .6   {b,c}+∗: .3   {c,d}+: .4   {c,e}+: .4   {d,e}: .4
3 items: {a,c,d}+∗: .3   {a,c,e}+∗: .3   {a,d,e}+∗: .4

• Frequent Item Set: any item set whose support is at least the minimum support

• Closed Item Set: (marked with +) a frequent item set is called closed if no superset has the same support

• Maximal Item Set: (marked with ∗) a frequent item set is called maximal if no superset is frequent

C. Moewes Temporal Data Mining Zittau, September 9, 2011 21 / 114

Page 32:

Searching for Frequent Item Sets

• it suffices to find the closed item sets together with their support

• the characterization of closed item sets by

I closed ⇔ sT (I) ≥ smin ∧ I = ⋂_{k ∈ KT (I)} tk

suggests finding them by forming all possible intersections of the transactions and checking their support

• however, approaches using this idea are not competitive with other methods

• if the support of all frequent item sets is needed, it can be clumsy and tedious to compute the support of a non-closed frequent item set with

∀smin : ∀I ∈ FT (smin) − CT (smin) : sT (I) = max{ sT (J) | J ∈ CT (smin), J ⊃ I }

• in order to find the closed sets one may have to visit many frequent sets anyway

C. Moewes Temporal Data Mining Zittau, September 9, 2011 22 / 114

Page 33:

Finding the Frequent Item Sets

• idea: use the properties of the support to organize the search for all frequent item sets:

∀I : ∀J ⊃ I : sT (I) < smin → sT (J) < smin

• since these properties relate the support of an item set to the support of its subsets and supersets, organize the search based on the subset lattice of the set A (the set of all items)

subset lattice for five items {a, b, c, d, e} (Hasse diagram):

a b c d e
ab ac ad ae bc bd be cd ce de
abc abd abe acd ace ade bcd bce bde cde
abcd abce abde acde bcde
abcde

C. Moewes Temporal Data Mining Zittau, September 9, 2011 23 / 114

Page 34:

Subset Lattice and Frequent Item Sets

transaction vector:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

[Figure: subset lattice (Hasse diagram) with the frequent item sets for smin = 3 highlighted; blue boxes are frequent item sets, white boxes are infrequent item sets]

C. Moewes Temporal Data Mining Zittau, September 9, 2011 24 / 114

Page 35:

Subset Lattice and Closed Item Sets

transaction vector:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

[Figure: subset lattice (Hasse diagram) with the closed item sets for smin = 3 highlighted; red boxes are closed item sets, white boxes are infrequent item sets]

C. Moewes Temporal Data Mining Zittau, September 9, 2011 25 / 114

Page 36:

Subset Lattice and Maximal Item Sets

transaction vector:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

[Figure: subset lattice (Hasse diagram) with the maximal item sets for smin = 3 highlighted; red boxes are maximal item sets, white boxes are infrequent item sets]

C. Moewes Temporal Data Mining Zittau, September 9, 2011 26 / 114

Page 37:

Searching for Frequent Item Sets [Agrawal and Srikant, 1994]

one possible scheme for the search:

• determine the support of the one-element item sets and discard the infrequent items

• form candidate item sets with two items (both items must be frequent), determine their support, discard the infrequent item sets

• form candidate item sets with three items (all pairs must be frequent), determine their support, discard the infrequent item sets

• continue by forming candidate item sets with four, five etc. items until no candidate item set is frequent

this is the Apriori Algorithm, which is based on two main steps: candidate generation and pruning; all frequent item set mining algorithms are based on these steps in some form

C. Moewes Temporal Data Mining Zittau, September 9, 2011 27 / 114

Page 38:

The Apriori Algorithm 1

function apriori (A, T, smin)            (∗ Apriori algorithm ∗)
begin
  k := 1;                                (∗ initialize the item set size ∗)
  Ek := ⋃_{a ∈ A} {{a}};                 (∗ start with single element sets ∗)
  Fk := prune(Ek, T, smin);              (∗ and determine the frequent ones ∗)
  while Fk ≠ ∅ do begin                  (∗ while there are frequent item sets ∗)
    Ek+1 := candidates(Fk);              (∗ create item sets with one item more ∗)
    Fk+1 := prune(Ek+1, T, smin);        (∗ and determine the frequent ones ∗)
    k := k + 1;                          (∗ increment the item counter ∗)
  end;
  return ⋃_{j=1..k} Fj;                  (∗ return the frequent item sets ∗)
end (∗ apriori ∗)

C. Moewes Temporal Data Mining Zittau, September 9, 2011 28 / 114

Page 39:

The Apriori Algorithm 2

function candidates (Fk)                 (∗ generate candidates with k + 1 items ∗)
begin
  E := ∅;                                (∗ initialize the set of candidates ∗)
  forall f1, f2 ∈ Fk                     (∗ traverse all pairs of frequent item sets ∗)
    with f1 = {a1, . . . , ak−1, ak}      (∗ that differ only in one item and ∗)
    and  f2 = {a1, . . . , ak−1, a′k}     (∗ are in a lexicographic order ∗)
    and  ak < a′k do begin               (∗ (the order is arbitrary, but fixed) ∗)
      f := f1 ∪ f2 = {a1, . . . , ak−1, ak, a′k};   (∗ union has k + 1 items ∗)
      if ∀a ∈ f : f − {a} ∈ Fk           (∗ only if all subsets are frequent, ∗)
      then E := E ∪ {f};                 (∗ add the new item set to the candidates ∗)
  end;                                   (∗ (otherwise it cannot be frequent) ∗)
  return E;                              (∗ return the generated candidates ∗)
end (∗ candidates ∗)

C. Moewes Temporal Data Mining Zittau, September 9, 2011 29 / 114

Page 40:

The Apriori Algorithm 3

function prune (E, T, smin)              (∗ prune infrequent candidates ∗)
begin
  forall e ∈ E do                        (∗ initialize the support counters ∗)
    sT (e) := 0;                         (∗ of all candidates to be checked ∗)
  forall t ∈ T do                        (∗ traverse the transactions ∗)
    forall e ∈ E do                      (∗ traverse the candidates ∗)
      if e ⊆ t                           (∗ if transaction contains the candidate, ∗)
      then sT (e) := sT (e) + 1;         (∗ increment the support counter ∗)
  F := ∅;                                (∗ initialize the set of frequent candidates ∗)
  forall e ∈ E do                        (∗ traverse the candidates ∗)
    if sT (e) ≥ smin                     (∗ if a candidate is frequent, ∗)
    then F := F ∪ {e};                   (∗ add it to the set of frequent candidates ∗)
  return F;                              (∗ return the pruned set of candidates ∗)
end (∗ prune ∗)

C. Moewes Temporal Data Mining Zittau, September 9, 2011 30 / 114
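
Putting the three pseudocode functions together, a compact Python version of the same levelwise scheme might look as follows (an illustrative sketch only, not the implementation referenced on the summary slide; candidate generation and pruning become two small helpers):

from itertools import combinations

def apriori(items, transactions, s_min):
    """Frequent item sets via levelwise search; returns {frozenset: absolute support}."""
    def prune(cands):
        support = {c: 0 for c in cands}
        for t in transactions:                       # one pass over the data per level
            for c in cands:
                if c <= t:
                    support[c] += 1
        return {c: s for c, s in support.items() if s >= s_min}

    def candidates(frequent_k):
        """Merge item sets sharing a (k-1)-prefix; keep a candidate only if all
        of its k-subsets are frequent (Apriori property)."""
        tuples = sorted(tuple(sorted(f)) for f in frequent_k)
        out = set()
        for f1, f2 in combinations(tuples, 2):
            if f1[:-1] == f2[:-1]:
                cand = frozenset(f1) | frozenset(f2)
                if all(cand - {a} in frequent_k for a in cand):
                    out.add(cand)
        return out

    frequent = prune({frozenset({a}) for a in items})
    result = dict(frequent)
    while frequent:
        frequent = prune(candidates(frequent))
        result.update(frequent)
    return result

transactions = [set(chars) for chars in
                ("ade", "bcd", "ace", "acde", "ae", "acd", "bc", "acde", "bce", "ade")]
freq = apriori("abcde", transactions, s_min=3)
print(len(freq))   # 15 non-empty frequent item sets (16 with the empty set)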

Page 41:

Searching for Frequent Item Sets

• the Apriori algorithm searches the subset lattice top-down, level by level

• collecting the frequent item sets of size k in a set Fk has drawbacks: a frequent item set of size k + 1 can be formed in k(k + 1)/2 possible ways (for infrequent item sets, the number may be smaller)
  • consequence: the candidate generation step may carry out a lot of redundant work, since it suffices to generate each candidate item set once

• question: can we reduce or even eliminate this redundant work? more generally: how can we make sure that any candidate item set is generated at most once?

• idea: assign to each item set a unique parent item set, from which this item set is to be generated

C. Moewes Temporal Data Mining Zittau, September 9, 2011 31 / 114

Page 42:

Searching for Frequent Item Sets

• core problem: an item set of size k (i.e. with k items) can be generated in k! different ways (on k! paths in the Hasse diagram), because in principle items may be added in any order

• if we consider an item-by-item process of building an item set (levelwise traversal of the lattice), there are k possible ways of forming an item set of size k from item sets of size k − 1 by adding the remaining item

• obvious: it suffices to consider each item set at most once in order to find the frequent ones (infrequent item sets need not be generated at all)

• question: can we reduce or even eliminate this variety? more generally: how can we make sure that any candidate item set is generated at most once?

• idea: assign to each item set a unique parent item set, from which this item set is to be generated

C. Moewes Temporal Data Mining Zittau, September 9, 2011 32 / 114

Page 43:

Searching for Frequent Item Sets

• we must search item subset lattice / its Hasse diagram

• assigning unique parents turns Hasse diagram into tree

• traversing resulting tree explores each item set exactly once

subset lattice (Hasse diagram) and possible tree for five items:

C. Moewes Temporal Data Mining Zittau, September 9, 2011 33 / 114

Page 44:

Searching with Unique Parents

Principle of Search Algorithm based on Unique Parents:

• Base Loop:
  • traverse all one-element item sets (their unique parent is ∅)
  • recursively process all one-element item sets that are frequent

• Recursive Processing: for a given frequent item set I:
  • generate all extensions J of I by one item (i.e. J ⊃ I, |J| = |I| + 1) for which the item set I is the chosen unique parent
  • for all J: if J is frequent, process J recursively, otherwise discard J

• questions:
  • how can we formally assign unique parents?
  • how can we make sure that we generate only those extensions for which the item set that is extended is the chosen unique parent?

C. Moewes Temporal Data Mining Zittau, September 9, 2011 34 / 114

Page 45:

Unique Parents and Prefix Trees

• item sets sharing the same longest proper prefix are siblings, because they have the same unique parent

• this allows us to represent the unique parent tree as a prefix tree or trie

canonical parent tree and corresponding prefix tree for 5 items:

C. Moewes Temporal Data Mining Zittau, September 9, 2011 35 / 114
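
One common way to realize such unique parents (a sketch under that assumption, not taken from the slides) is to fix an order on the items and to extend an item set only by items that come after its last item; every item set is then generated from exactly one parent, and the anti-monotonicity of the support prunes whole subtrees:

def enumerate_frequent(items, transactions, s_min, prefix=()):
    """Depth-first search with unique parents: extend `prefix` only by items
    following its last item in the fixed order, so each item set is visited once."""
    order = sorted(items)
    start = order.index(prefix[-1]) + 1 if prefix else 0
    for a in order[start:]:
        candidate = prefix + (a,)
        s = sum(1 for t in transactions if set(candidate) <= t)
        if s >= s_min:                   # only frequent sets are reported and extended
            yield candidate, s
            yield from enumerate_frequent(items, transactions, s_min, candidate)

transactions = [set(chars) for chars in
                ("ade", "bcd", "ace", "acde", "ae", "acd", "bc", "acde", "bce", "ade")]
for item_set, s in enumerate_frequent("abcde", transactions, s_min=3):
    print(item_set, s)                   # 15 frequent item sets, each exactly once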

Page 46:

Apriori: Levelwise Search

transaction database:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

prefix tree, level 1 (support counters): a: 7, b: 3, c: 7, d: 6, e: 7

• example transaction database with 5 items and 10 transactions

• minimum support: 30%, i.e. at least 3 transactions must contain the item set

• all one-item sets are frequent → the full second level is needed

C. Moewes Temporal Data Mining Zittau, September 9, 2011 36 / 114

Page 47:

Apriori: Levelwise Search

transaction database:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

prefix tree, level 1: a: 7, b: 3, c: 7, d: 6, e: 7
level 2: a → {b: 0, c: 4, d: 5, e: 6},  b → {c: 3, d: 1, e: 1},  c → {d: 4, e: 4},  d → {e: 4}

• determining the support of item sets: for each item set traverse the database and count the transactions that contain it (highly inefficient)

• better: traverse the tree for each transaction and find the item sets it contains (efficient: can be implemented as a simple doubly recursive procedure)

C. Moewes Temporal Data Mining Zittau, September 9, 2011 37 / 114

Page 48:

Apriori: Levelwise Search

transaction database:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

prefix tree, level 1: a: 7, b: 3, c: 7, d: 6, e: 7
level 2: a → {b: 0, c: 4, d: 5, e: 6},  b → {c: 3, d: 1, e: 1},  c → {d: 4, e: 4},  d → {e: 4}

• minimum support: 30%, i.e. at least 3 transactions must contain the item set

• infrequent item sets: {a, b}, {b, d}, {b, e}

• the subtrees starting at these item sets can be pruned

C. Moewes Temporal Data Mining Zittau, September 9, 2011 38 / 114

Page 49:

Apriori: Levelwise Search

transaction database:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

prefix tree, level 1: a: 7, b: 3, c: 7, d: 6, e: 7
level 2: a → {b: 0, c: 4, d: 5, e: 6},  b → {c: 3, d: 1, e: 1},  c → {d: 4, e: 4},  d → {e: 4}
level 3 candidates (not yet counted): ac → {d: ?, e: ?},  ad → {e: ?},  bc → {d: ?, e: ?},  cd → {e: ?}

• generate candidate item sets with 3 items (parents must be frequent)

• before counting, check whether the candidates contain an infrequent item set

• an item set with k items has k subsets of size k − 1; the parent is only one of these subsets

C. Moewes Temporal Data Mining Zittau, September 9, 2011 39 / 114

Page 50:

Apriori: Levelwise Search

transaction database:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

prefix tree, level 1: a: 7, b: 3, c: 7, d: 6, e: 7
level 2: a → {b: 0, c: 4, d: 5, e: 6},  b → {c: 3, d: 1, e: 1},  c → {d: 4, e: 4},  d → {e: 4}
level 3 candidates (not yet counted): ac → {d: ?, e: ?},  ad → {e: ?},  bc → {d: ?, e: ?},  cd → {e: ?}

• the item sets {b, c, d} and {b, c, e} can be pruned, because
  • {b, c, d} contains the infrequent item set {b, d} and
  • {b, c, e} contains the infrequent item set {b, e}

C. Moewes Temporal Data Mining Zittau, September 9, 2011 40 / 114

Page 51:

Apriori: Levelwise Search

transaction database:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

prefix tree, level 1: a: 7, b: 3, c: 7, d: 6, e: 7
level 2: a → {b: 0, c: 4, d: 5, e: 6},  b → {c: 3, d: 1, e: 1},  c → {d: 4, e: 4},  d → {e: 4}
level 3: ac → {d: 3, e: 3},  ad → {e: 4},  bc → {d: ?, e: ?},  cd → {e: 2}

• only the remaining 4 item sets of size 3 are evaluated

C. Moewes Temporal Data Mining Zittau, September 9, 2011 41 / 114

Page 52:

Apriori: Levelwise Search

transaction database:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

prefix tree, level 1: a: 7, b: 3, c: 7, d: 6, e: 7
level 2: a → {b: 0, c: 4, d: 5, e: 6},  b → {c: 3, d: 1, e: 1},  c → {d: 4, e: 4},  d → {e: 4}
level 3: ac → {d: 3, e: 3},  ad → {e: 4},  bc → {d: ?, e: ?},  cd → {e: 2}

• minimum support: 30%, i.e. at least 3 transactions must contain the item set

• infrequent item set: {c, d, e}

C. Moewes Temporal Data Mining Zittau, September 9, 2011 42 / 114

Page 53:

Apriori: Levelwise Search

transaction database:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

prefix tree, level 1: a: 7, b: 3, c: 7, d: 6, e: 7
level 2: a → {b: 0, c: 4, d: 5, e: 6},  b → {c: 3, d: 1, e: 1},  c → {d: 4, e: 4},  d → {e: 4}
level 3: ac → {d: 3, e: 3},  ad → {e: 4},  bc → {d: ?, e: ?},  cd → {e: 2}
level 4 candidate (not yet counted): acd → {e: ?}

• generate candidate item sets with 4 items (parents must be frequent)

• before counting, check whether the candidates contain an infrequent item set

C. Moewes Temporal Data Mining Zittau, September 9, 2011 43 / 114

Page 54:

Apriori: Levelwise Search

transaction database:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

prefix tree, level 1: a: 7, b: 3, c: 7, d: 6, e: 7
level 2: a → {b: 0, c: 4, d: 5, e: 6},  b → {c: 3, d: 1, e: 1},  c → {d: 4, e: 4},  d → {e: 4}
level 3: ac → {d: 3, e: 3},  ad → {e: 4},  bc → {d: ?, e: ?},  cd → {e: 2}
level 4 candidate: acd → {e: ?}

• the item set {a, c, d, e} can be pruned, because it contains the infrequent item set {c, d, e}

• consequence: no candidate item sets with four items

• a fourth access to the transaction database is not necessary

C. Moewes Temporal Data Mining Zittau, September 9, 2011 44 / 114

Page 55:

Summary Apriori

Basic Processing Scheme

• breadth-first/levelwise traversal of the subset lattice

• candidates are formed by merging item sets that differ in only one item

• support counting is done with a doubly recursive procedure

Advantages

• “perfect” pruning of infrequent candidate item sets (with infrequent subsets)

Disadvantages

• can require lots of memory (since all frequent item sets are represented)

• support counting takes very long for large transactions

Software

• http://www.borgelt.net/apriori.html

C. Moewes Temporal Data Mining Zittau, September 9, 2011 45 / 114

Page 56:

Summary Frequent Item Set Mining

• many different algorithms for frequent item set mining exist

• here only Apriori algorithm

• algorithms for frequent item set mining differ in:
  • traversal order of the prefix tree (breadth-first/levelwise vs. depth-first traversal)
  • transaction representation: horizontal (item arrays) vs. vertical (transaction lists) vs. specialized data structures like FP-trees
  • types of frequent item sets found: frequent vs. closed vs. maximal item sets (additional pruning methods for closed and maximal item sets)

• additional filtering is necessary to reduce the size of the output (not discussed in this talk)

C. Moewes Temporal Data Mining Zittau, September 9, 2011 46 / 114

Page 57:

Association Rules: Basic Notions

• often found patterns are expressed as association rules, e.g.

If a customer buys bread and wine, then she/he will probably also buy cheese.

• formally, we consider rules of the form X → Y , with X, Y ⊆ A and X ∩ Y = ∅

• support of a rule X → Y :

either: ςT (X → Y ) = σT (X ∪ Y ) (more common: correct rule)

or: ςT (X → Y ) = σT (X) (more plausible: applicable rule)

• confidence of a rule X → Y :

cT (X → Y ) = σT (X ∪ Y ) / σT (X) = sT (X ∪ Y ) / sT (X) = sT (I) / sT (X), where I = X ∪ Y

can be seen as an estimate of P(Y | X)

C. Moewes Temporal Data Mining Zittau, September 9, 2011 47 / 114
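
The two support variants and the confidence can again be written down directly (an illustrative sketch; the helper names are ours, and the transactions list is the running 10-transaction example):

def supp(item_set, T):
    """Absolute support s_T(I)."""
    return sum(1 for t in T if item_set <= t)

def confidence(X, Y, T):
    """c_T(X -> Y) = s_T(X u Y) / s_T(X), an estimate of P(Y | X)."""
    return supp(X | Y, T) / supp(X, T)

def rule_support(X, Y, T, applicable=False):
    """sigma_T(X u Y) ('correct rule') or, if applicable=True, sigma_T(X) ('applicable rule')."""
    return (supp(X, T) if applicable else supp(X | Y, T)) / len(T)

transactions = [set(chars) for chars in
                ("ade", "bcd", "ace", "acde", "ae", "acd", "bc", "acde", "bce", "ade")]
print(confidence({"c", "e"}, {"a"}, transactions))     # 0.75, cf. the worked example below
print(rule_support({"c", "e"}, {"a"}, transactions))   # 0.3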

Page 58:

Association Rules: Formal Definition

given:

• set A = {a1, . . . , am} of items,

• vector T = (t1, . . . , tn) of transactions over A,

• real number ςmin, 0 < ςmin ≤ 1, minimum support,

• real number cmin, 0 < cmin ≤ 1, minimum confidence

desired:

• set of all association rules, i.e. set

R = {R : X → Y | ςT (R) ≥ ςmin ∧ cT (R) ≥ cmin}

general procedure:

• find frequent item sets

• construct rules and filter them w.r.t. ςmin and cmin

C. Moewes Temporal Data Mining Zittau, September 9, 2011 48 / 114

Page 59:

Generating Association Rules

• which minimum support has to be used for finding the frequent item sets depends on the definition of the support of a rule:
  • if ςT (X → Y ) = σT (X ∪ Y ), then σmin = ςmin or equivalently smin = ⌈n ςmin⌉
  • if ςT (X → Y ) = σT (X), then σmin = ςmin cmin or equivalently smin = ⌈n ςmin cmin⌉

• after the frequent item sets have been found, rule construction traverses all frequent item sets I and splits them into disjoint subsets X and Y (X ∩ Y = ∅ and X ∪ Y = I), thus forming rules X → Y

• filtering rules w.r.t. confidence is always necessary
• filtering rules w.r.t. support is only necessary if ςT (X → Y ) = σT (X)

C. Moewes Temporal Data Mining Zittau, September 9, 2011 49 / 114

Page 60:

Properties of the Confidence

• from ∀I : ∀J ⊆ I : sT (I) ≤ sT (J) it obviously follows

∀X, Y : ∀a ∈ X : sT (X ∪ Y ) / sT (X) ≥ sT (X ∪ Y ) / sT (X − {a})

and therefore

∀X, Y : ∀a ∈ X : cT (X → Y ) ≥ cT (X − {a} → Y ∪ {a}),

i.e. moving an item from the antecedent to the consequent cannot increase the confidence of a rule

• immediate consequence:

∀X, Y : ∀a ∈ X : cT (X → Y ) < cmin → cT (X − {a} → Y ∪ {a}) < cmin

i.e. if a rule fails to meet the minimum confidence, then no rules over the same item set and with a larger consequent need to be considered

C. Moewes Temporal Data Mining Zittau, September 9, 2011 50 / 114

Page 61:

Generating Association Rules

function rules (F)                       (∗ generate association rules ∗)
begin
  R := ∅;                                (∗ initialize the set of rules ∗)
  forall f ∈ F do begin                  (∗ traverse the frequent item sets ∗)
    m := 1;                              (∗ start with rule heads (consequents) ∗)
    Hm := ⋃_{i ∈ f} {{i}};               (∗ that contain only one item ∗)
    repeat                               (∗ traverse rule heads of increasing size ∗)
      forall h ∈ Hm do                   (∗ traverse the possible rule heads ∗)
        if sT (f ) / sT (f − h) ≥ cmin   (∗ if the confidence is high enough, ∗)
        then R := R ∪ {[(f − h) → h]};   (∗ add the rule to the result ∗)
        else Hm := Hm − {h};             (∗ otherwise discard the head ∗)
      Hm+1 := candidates(Hm);            (∗ create heads with one item more ∗)
      m := m + 1;                        (∗ increment the head item counter ∗)
    until Hm = ∅ or m ≥ |f |;            (∗ until there are no more rule heads ∗)
  end;                                   (∗ or the antecedent would become empty ∗)
  return R;                              (∗ return the rules found ∗)
end (∗ rules ∗)

C. Moewes Temporal Data Mining Zittau, September 9, 2011 51 / 114
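
A simplified Python counterpart of this procedure (an illustrative sketch; unlike the pseudocode it does not grow the consequents levelwise but simply tries every non-empty proper subset of a frequent item set as consequent):

from itertools import combinations

def generate_rules(frequent, c_min):
    """`frequent` maps frozenset -> absolute support (e.g. the output of the
    Apriori sketch earlier); returns (antecedent, consequent, confidence) triples."""
    rules = []
    for f, s_f in frequent.items():
        for k in range(1, len(f)):
            for head in combinations(sorted(f), k):
                Y = frozenset(head)
                X = f - Y
                conf = s_f / frequent[X]   # every subset of f is frequent, so X is in the dict
                if conf >= c_min:
                    rules.append((set(X), set(Y), conf))
    return rules

# example use with the `freq` dictionary from the Apriori sketch:
# for X, Y, conf in generate_rules(freq, c_min=0.8):
#     print(X, "->", Y, round(conf, 3))    # yields the six rules listed a few slides below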

Page 63:

Frequent Item Sets: Example

transaction vector:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

frequent item sets (relative support):
0 items: ∅: 1
1 item:  {a}: .7   {b}: .3   {c}: .7   {d}: .6   {e}: .7
2 items: {a,c}: .4   {a,d}: .5   {a,e}: .6   {b,c}: .3   {c,d}: .4   {c,e}: .4   {d,e}: .4
3 items: {a,c,d}: .3   {a,c,e}: .3   {a,d,e}: .4

• minimum support is smin = 3 or σmin = 0.3 = 30% in this example

• 2⁵ = 32 possible item sets over A = {a, b, c, d, e}

• 16 frequent item sets (but only 10 transactions)

C. Moewes Temporal Data Mining Zittau, September 9, 2011 53 / 114

Page 67:

Generating Association Rules

example: I = {a, c, e}, X = {c, e}, Y = {a}

cT (c, e → a) = sT ({a, c, e}) / sT ({c, e}) = 30% / 40% = 75%

minimum confidence: 80%

rule         support of all items   support of antecedent   confidence
b → c        .3                     .3                      1
d → a        .5                     .6                      .833
e → a        .6                     .7                      .857
a → e        .6                     .7                      .857
d, e → a     .4                     .4                      1
a, d → e     .4                     .5                      .8

C. Moewes Temporal Data Mining Zittau, September 9, 2011 54 / 114

Page 68:

Support of an Association Rule

the two rule support definitions are not equivalent:

transaction vector:
1: {a,c,e}   2: {b,d}   3: {b,c,d}   4: {a,e}
5: {a,b,c,d} 6: {c,e}   7: {a,b,d}   8: {a,c,d}

two association rules:

rule     support of all items   support of antecedent   confidence
a → c    3 (37.5%)              5 (62.5%)               60.0%
b → d    4 (50.0%)              4 (50.0%)               100.0%

let the minimum confidence be cmin = .65

• for ςT (R) = σT (X ∪ Y ) and 3 < ςmin ≤ 4 only the rule b → d is generated, but not the rule a → c

• for ςT (R) = σT (X) there is no value ςmin that generates only the rule b → d, but not at the same time also the rule a → c

C. Moewes Temporal Data Mining Zittau, September 9, 2011 55 / 114

Page 69:

Additional Rule Filtering

Simple Measures

• general idea: compare pT (Y | X) = cT (X → Y ) and pT (Y ) = cT (∅ → Y ) = σT (Y )

• (absolute) confidence difference to prior:

dT (R) = |cT (X → Y ) − σT (Y )|

• (absolute) difference of confidence quotient to 1:

qT (R) = | 1 − min{ cT (X → Y ) / σT (Y ), σT (Y ) / cT (X → Y ) } |

• confidence to prior ratio (lift):

lT (R) = cT (X → Y ) / σT (Y )

C. Moewes Temporal Data Mining Zittau, September 9, 2011 56 / 114
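
These simple measures are equally short in code (an illustrative sketch, reusing supp(), confidence() and the transactions list from the sketch a few slides above):

def lift(X, Y, T):
    """l_T(R) = c_T(X -> Y) / sigma_T(Y)."""
    return confidence(X, Y, T) / (supp(Y, T) / len(T))

def confidence_diff(X, Y, T):
    """d_T(R) = |c_T(X -> Y) - sigma_T(Y)|."""
    return abs(confidence(X, Y, T) - supp(Y, T) / len(T))

print(round(lift({"d", "e"}, {"a"}, transactions), 3))             # 1.429: Y occurs more often given X
print(round(confidence_diff({"d", "e"}, {"a"}, transactions), 3))  # 0.3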

Page 70:

Additional Rule Filtering

More Sophisticated Measures

• consider a 2 × 2 contingency table or the estimated probability table:

counts:                              estimated probabilities:
          X ⊈ t    X ⊆ t                        X ⊈ t    X ⊆ t
Y ⊈ t     n00      n01      n0.       Y ⊈ t     p00      p01      p0.
Y ⊆ t     n10      n11      n1.       Y ⊆ t     p10      p11      p1.
          n.0      n.1      n..                 p.0      p.1      1

• n.. : total number of transactions
  n1. : number of transactions to which the rule is applicable
  n11 : number of transactions for which the rule is correct

  i.e. pij = nij / n.. ,  pi. = ni. / n.. ,  p.j = n.j / n..   for i, j ∈ {0, 1}

• general idea: use measures for the strength of the dependence of X and Y

C. Moewes Temporal Data Mining Zittau, September 9, 2011 57 / 114

Page 71:

An Information-theoretic Evaluation Measure
Information Gain [Kullback and Leibler, 1951, Quinlan, 1986]

based on the Shannon entropy H = − Σ_{i=1..n} pi log2 pi [Shannon, 1948]

Igain(X, Y ) = H(Y ) − H(Y |X )
            = − Σ_{i=1..kY} pi. log2 pi.  −  Σ_{j=1..kX} p.j ( − Σ_{i=1..kY} pi|j log2 pi|j )

• H(Y ): entropy of the distribution of Y

• H(Y |X ): expected entropy of the distribution of Y if the value of X becomes known

• H(Y ) − H(Y |X ): expected entropy reduction or information gain

C. Moewes Temporal Data Mining Zittau, September 9, 2011 58 / 114
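
For the 2 × 2 case of a single rule X → Y this boils down to a few lines (an illustrative sketch using the contingency-table notation n00, . . . , n11 of the previous slide; the check at the end applies it, as one possible example, to the rule a → c on the 8-transaction example a few slides above):

from math import log2

def entropy(probs):
    """Shannon entropy of a discrete distribution (zero probabilities are skipped)."""
    return -sum(p * log2(p) for p in probs if p > 0)

def information_gain(n00, n01, n10, n11):
    """I_gain(X, Y) = H(Y) - H(Y|X) from the 2x2 contingency table of a rule X -> Y."""
    n = n00 + n01 + n10 + n11
    h_y = entropy([(n00 + n01) / n, (n10 + n11) / n])     # marginal distribution of Y
    h_y_given_x = 0.0
    for col in ((n00, n10), (n01, n11)):                  # X not contained / X contained
        n_col = sum(col)
        if n_col:
            h_y_given_x += (n_col / n) * entropy([c / n_col for c in col])
    return h_y - h_y_given_x

# rule a -> c on the 8-transaction example: n00 = 1, n01 = 2, n10 = 2, n11 = 3
print(round(information_gain(1, 2, 2, 3), 3))   # about 0.003: knowing X = a says little about Y = c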

Page 72:

Summary Association Rules

• association rule induction is a two-step process
  • find frequent item sets (minimum support)
  • form relevant association rules (minimum confidence)

• generating association rules
  • form all possible association rules from the frequent item sets
  • filter “interesting” association rules based on minimum support and minimum confidence

• filtering association rules
  • compare rule confidence and consequent support
  • information gain
  • other measures, e.g. the χ² measure, . . .

C. Moewes Temporal Data Mining Zittau, September 9, 2011 59 / 114

Page 73:

Industrial Applications

• a car manufacturer collects servicing tasks on all their vehicles
  • what are interesting subgroups of cars?
  • how do these subgroups behave over time?
  • which cars’ suspension failure rate is strongly increasing in winter?

• a bank assesses credit contracts w.r.t. terminability
  • what changes were there in the past?
  • any common factors?
  • how to communicate this to non-statisticians, e.g. bankers?

• tracking user activity in a virtual environment
  • are there any oddities in user behavior?
  • how to parameterize “odd” things?

C. Moewes Temporal Data Mining Zittau, September 9, 2011 60 / 114

Page 74:

Or: What they have and what they want

data are

• high-dimensional

• many-valued

• time-stamped

results should be

• easy-to-understand patterns (rules)

• exploratory tools (visualization and inspection)

• natural way of interaction

• exploit temporal information (if desired)

C. Moewes Temporal Data Mining Zittau, September 9, 2011 61 / 114

Page 77: Temporal Data Miningfuzzy.cs.ovgu.de/.../moewes2011zittau.pdf · Temporal Data Mining • many data mining problems deal with temporal features • most prominent: (numeric/symbolic)

Rule Icons

• every rule
      〈A1 = a1 ∧ · · · ∧ Ak = ak〉 → C = c
  of a given rule set is represented as an icon
• for every possible item, a reserved segment on the outer border
• if an item is present in the antecedent, its segment is colored
• the interior encodes a rule measure, e.g. confidence


Rule Icons: Overlapping

• cover of 2 rules may be non-empty

• percentage bar to display mutual overlap

• special case: inclusion

Gender = male → Cancer = yes

Gender = male ∧ Smoker = yes → Cancer = yes


Rule Icons: Overlapping

• cover of 2 rules may be non-empty
• percentage bar to display mutual overlap
• general case:


Rule Icons: Location

• finally, arrange icons in two-dimensional chart

• choose 3 association rule measures for both axes and size of icon

• our suggestion for a rule X → Y: choose the following measures (see the sketch below)
  • x-coordinate: recall, i.e. cT(Y → X)
  • y-coordinate: lift, i.e. cT(X → Y) / σT(Y)
  • size: support, i.e. σT(X ∪ Y)
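A small sketch (illustrative names, not the slides' implementation) of how these three measures can be computed for a rule X → Y directly from a transaction list:

```python
def rule_measures(transactions, x, y):
    """Return (support, confidence, recall, lift) of the rule X -> Y."""
    n = len(transactions)
    n_x = sum(1 for t in transactions if x <= t)
    n_y = sum(1 for t in transactions if y <= t)
    n_xy = sum(1 for t in transactions if (x | y) <= t)
    support = n_xy / n                       # sigma_T(X u Y): icon size
    confidence = n_xy / n_x                  # c_T(X -> Y)
    recall = n_xy / n_y                      # c_T(Y -> X): x-coordinate
    lift = confidence / (n_y / n)            # c_T(X -> Y) / sigma_T(Y): y-coordinate
    return support, confidence, recall, lift

transactions = [{'a', 'b', 'c'}, {'a', 'b'}, {'b', 'c'}, {'a', 'c'}, {'a', 'b', 'd'}]
print(rule_measures(transactions, x={'a'}, y={'b'}))
```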


Real-world Example: Daimler AG

car database

• 300,000 vehicles

• subset of 180 attributes

• 2–300 values per attribute

• probabilistic dependency network


Real-world Example: Daimler AG


Real-world Example: Daimler AG

Explorative Analysis Tool


Real-world Example: ADAC

customer database

• car and customer information

• assessment of vehicle quality


Temporal Change of Rules

• why consider the temporal development of rules?
  (i.e. the change of certain rule evaluation measures)

• failure patterns usually do not arise all of a sudden, but rather evolve slowly over time

• a fixed problem takes a while to have a measurable effect

• how to present this evolution to the user?
  • create a time series for every measure used for locating and scaling the rule icon
  • interpolate between frames and present an animation

• problem: high number of rules


What does that look like?

real-world dataset


What does that look like?

obviously, there is a demand for post-processing the rule set


Temporal Change of Rules

1. divide dataset into reasonable time frames

2. run respective pattern induction algorithm

3. quantify each pattern w.r.t. any desired measure(s)

4. generate time series for each measure and each pattern

5. match time series against user-specified concept

6. rank them according to membership of concept


User-driven Post-processing

• often users have an idea in which direction to investigate

• however, they cannot explicitly phrase a query for data mining

• we can use "fuzzy" intentions to thin out the rule set, e.g.

  "Show me only those rules that had a strongly increasing support and an increasing confidence in the last quarter."

  or

  "Which patterns exhibit an increasing lift while the support was stable or at most slightly decreasing?"



User-driven Post-processing

1. specify a fuzzy partition on the change rate domain of every pattern evaluation measure

2. encode the user concept as a fuzzy antecedent

   e.g. "lift is unchanged and confidence is increasing"

       〈∆lift is unch ∧ ∆conf is incr〉

   will be evaluated as

       ⊤( µ^(unch)_∆lift(~a → c), µ^(incr)_∆conf(~a → c) )

   where ⊤ is a t-norm that represents the fuzzy conjunction (see the sketch below)
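A minimal sketch of this evaluation, assuming simple trapezoidal membership functions and the minimum t-norm; the partition cut points, rule names and change rates are made up for illustration.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership function with support (a, d) and core [b, c]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def mu_unchanged(delta):        # illustrative fuzzy set "unchanged" on the change rate domain
    return trapezoid(delta, -0.10, -0.02, 0.02, 0.10)

def mu_increasing(delta):       # illustrative fuzzy set "increasing"
    return trapezoid(delta, 0.02, 0.10, 10.0, 10.1)

def concept_membership(changes, t_norm=min):
    """Degree to which a rule matches 'lift is unchanged AND confidence is increasing'."""
    return t_norm(mu_unchanged(changes['lift']), mu_increasing(changes['conf']))

rules = {'r1': {'lift': 0.01, 'conf': 0.15}, 'r2': {'lift': 0.30, 'conf': 0.05}}
# order patterns w.r.t. their concept membership degrees
print(sorted(rules, key=lambda r: concept_membership(rules[r]), reverse=True))
```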


User-driven Post-processing

1. specify a fuzzy partition on the change rate domain of every pattern evaluation measure

2. encode user-concept as fuzzy antecedent

3. order patterns w.r.t. concept membership degrees


Summary Industrial Applications

requirements

• easy-to-understand patterns

• exploratory visual tools

• natural and intuitive interaction

• exploitation of temporal information

desired properties of rules

• almost parameter-free (support and confidence have a clear notion and can even be increased after induction)

• no black-box approach

• intuitive type of patterns (decision/business rules)

• natural way of treating missing values

• small data preprocessing overhead


Outline

1. Introduction

2. Association Rules and Frequent Item Sets

3. Frequent Sequence Mining

Canonical Form for Undirected Sequences

Allen’s Interval Relations

Temporal Interval Patterns

Quality Monitoring of Vehicles

4. Finding Motifs in Time Series Effectively


Frequent Sequence Mining

• directed vs. undirected sequences
  • e.g. temporal sequences are always directed
  • DNA sequences can be undirected (both directions can be relevant)

• multiple sequences vs. single sequence
  • multiple sequences: purchases with rebate cards, web server access protocols
  • single sequence: alarms in telecommunication networks

• (time) points vs. time intervals
  • points: DNA sequences, alarms in telecommunication networks
  • intervals: weather data, movement analysis (sports medicine)
  • further distinction: one object per (time) point vs. multiple objects


Frequent Sequence Mining

• consecutive subsequences vs. subsequences with gaps

• a c b a b c b a always counts as subsequence abc

• a c b a b c b c may not always count as subsequence abc

• existence of occurrence vs. counting occurrences
  • combinatorial counting (all occurrences)
  • maximal number of disjoint occurrences
  • temporal support (number of time window positions)
  • minimum occurrence (smallest interval)

• relation between objects in a sequence
  ◦ items: only precede and succeed
  ◦ labeled time points: t1 < t2, t1 = t2, and t1 > t2
  ◦ labeled time intervals: relations like before, starts, overlaps etc.


Frequent Sequence Mining

• directed sequences are easier to handle:
  • the (sub)sequence itself can be used as code word
  • only 1 possible code word per sequence (only 1 direction) → this code word is necessarily canonical

• consecutive subsequences are easier to handle:
  • fewer occurrences of a given subsequence
  • for each occurrence, exactly one possible extension
  • allows specialized data structures (similar to a tree)

• item sequences are easiest to handle:
  • only 2 possible relations and thus patterns are simple
  • other sequences are handled with state machines for containment tests


A Canonical Form for Undirected Sequences

• if the sequences to mine are not directed, a subsequence cannot be used as its own code word, because it does not have the prefix property

• reason: an undirected sequence can be read forward or backward

→ two possible code words, the smaller (or larger) of which may then be defined as the canonical code word

• examples (showing that the prefix property is violated):
  • assume: item order a < b < c . . . and the lexicographically smaller code word is the canonical one
  • sequence bab, which is canonical, has the prefix ba, but the canonical form of sequence ba is rather ab
  • sequence cabd, which is canonical, has the prefix cab, but the canonical form of sequence cab is rather bac

• consequence: we must look for a different way of forming code words (at least if we want the code to have the prefix property)


A Canonical Form for Undirected Sequences

one possibility to form code words having the prefix property:

• handle (sub)sequences of even and odd length separately

• in addition, forming the code word is started in the middle

  even length: the sequence am am−1 . . . a2 a1 b1 b2 . . . bm−1 bm
  is described by the code word a1 b1 a2 b2 . . . am−1 bm−1 am bm
  or by the code word b1 a1 b2 a2 . . . bm−1 am−1 bm am.

  odd length: the sequence am am−1 . . . a2 a1 a0 b1 b2 . . . bm−1 bm
  is described by the code word a0 a1 b1 a2 b2 . . . am−1 bm−1 am bm
  or by the code word a0 b1 a1 b2 a2 . . . bm−1 am−1 bm am.

• the lexicographically smaller of the 2 code words is the canonical code word

• such sequences are extended by adding a pair am+1 bm+1 or bm+1 am+1, i.e. by adding 1 item at the front and 1 item at the end (see the sketch below)
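A minimal sketch (illustrative, not the slides' code) of this construction: it builds both middle-out code words of an undirected sequence and returns the lexicographically smaller one as canonical.

```python
def canonical_code_word(seq):
    """Canonical code word of an undirected sequence, built from the middle outward."""
    n = len(seq)
    half = n // 2
    left = seq[:half][::-1]                        # a1, a2, ..., am (read outward to the left)
    right = seq[n - half:]                         # b1, b2, ..., bm (read outward to the right)
    middle = [] if n % 2 == 0 else [seq[half]]     # a0 only for odd length

    def interleave(first, second):
        word = list(middle)
        for x, y in zip(first, second):
            word.extend([x, y])
        return word

    # the lexicographically smaller of the two possible code words is canonical
    return min(interleave(left, right), interleave(right, left))

print(canonical_code_word("bab"))     # odd length:  ['a', 'b', 'b']
print(canonical_code_word("cabd"))    # even length: ['a', 'b', 'c', 'd']
```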


A Canonical Form for Undirected Sequences

code words defined in this way have prefix property:

• suppose the prefix property did not hold

• then w.l.o.g. there exists a canonical code word

      wm = a1 b1 a2 b2 . . . am−1 bm−1 am bm,

  whose prefix wm−1 is not canonical, where

      wm−1 = a1 b1 a2 b2 . . . am−1 bm−1,

• consequence: wm < vm, where

      vm = b1 a1 b2 a2 . . . bm−1 am−1 bm am,

  and vm−1 < wm−1, where

      vm−1 = b1 a1 b2 a2 . . . bm−1 am−1

• but: vm−1 < wm−1 implies vm < wm, because vm−1 is a prefix of vm and wm−1 is a prefix of wm; however, vm < wm contradicts wm < vm


A Canonical Form for Undirected Sequences

• generating and comparing the 2 possible code words takes linear time

• however, this can be improved by maintaining an additional piece of information

• for each sequence, a symmetry flag is computed:

      sm = ∧_{i=1}^{m} (ai = bi)

• the symmetry flag can be maintained in constant time with

      sm+1 = sm ∧ (am+1 = bm+1)

• permissible extensions depend on the symmetry flag:
  • if sm = true, it must be am+1 ≤ bm+1
  • if sm = false, any relation between am+1 and bm+1 is acceptable

• this rule guarantees: exactly the canonical extensions are created
• applying this rule to check a candidate extension takes constant time (see the sketch below)
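A tiny sketch (illustrative) of the constant-time check: given the current symmetry flag, only the permissible pairs are generated, and the flag is updated on the fly.

```python
def extensions(code_word, symmetric, items):
    """Yield (extended code word, new symmetry flag) for an undirected sequence pattern.

    The pair (a, b) is appended to the code word; a is prepended to the sequence,
    b is appended to it."""
    for a in items:
        for b in items:
            if symmetric and a > b:       # if s_m is true, require a_{m+1} <= b_{m+1}
                continue                  # otherwise any combination is permissible
            yield code_word + [a, b], symmetric and a == b

# extend the symmetric pattern 'aa' (s = True): the pair (b, a) is pruned
for word, flag in extensions(['a', 'a'], True, ['a', 'b']):
    print(word, flag)
```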


Sequences of Time Intervals

• a (labeled or attributed) time interval is a triple I = (s, e, l), where s is the start time, e is the end time and l is the associated label

• a time interval sequence is a set of (labeled) time intervals, of which we assume that they are maximal in the sense that for 2 intervals I1 = (s1, e1, l1) and I2 = (s2, e2, l2) with l1 = l2 we have either e1 < s2 or e2 < s1 (otherwise they are merged into 1 interval I = (min{s1, s2}, max{e1, e2}, l1))

• a time interval sequence database is a vector of time interval sequences

• time intervals can easily be ordered as follows (see the sketch below):
  let I1 = (s1, e1, l1) and I2 = (s2, e2, l2) be 2 time intervals; it is I1 ≺ I2 iff
  • s1 < s2 or
  • s1 = s2 and e1 < e2 or
  • s1 = s2 and e1 = e2 and l1 < l2

  due to the assumption made above, at the latest the 3rd criterion decides (equal start and end times imply different labels)
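A minimal sketch of this ordering (illustrative names): with intervals stored as (s, e, l) tuples, Python's lexicographic tuple comparison implements exactly the three criteria.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    s: float   # start time
    e: float   # end time
    l: str     # label

def precedes(i1: Interval, i2: Interval) -> bool:
    """I1 < I2 iff s1 < s2, or s1 = s2 and e1 < e2, or s1 = s2, e1 = e2 and l1 < l2."""
    return (i1.s, i1.e, i1.l) < (i2.s, i2.e, i2.l)

seq = [Interval(3, 7, 'B'), Interval(1, 5, 'C'), Interval(1, 5, 'A')]
print(sorted(seq))                      # tuple order = the order defined above
print(precedes(seq[2], seq[1]))         # True: equal start and end, 'A' < 'C'
```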


Allen’s Interval Relations

• due to their temporal extension, time intervals allow for different relations

• a commonly used set of relations between time intervals are Allen's interval relations [Allen, 1983]
  (a small sketch for computing them follows below)

  A before B          ↔   B after A
  A meets B           ↔   B is met by A
  A overlaps B        ↔   B is overlapped by A
  A is finished by B  ↔   B finishes A
  A contains B        ↔   B during A
  A is started by B   ↔   B starts A
  A equals B          ↔   B equals A
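A compact sketch (illustrative; it assumes the usual strict-inequality definitions) that determines Allen's relation of one interval with respect to another:

```python
def allen_relation(a, b):
    """Allen's relation of interval a w.r.t. interval b; intervals are (start, end) pairs."""
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:  return 'before'
    if e2 < s1:  return 'after'
    if e1 == s2: return 'meets'
    if e2 == s1: return 'is met by'
    if s1 == s2 and e1 == e2: return 'equals'
    if s1 == s2: return 'starts' if e1 < e2 else 'is started by'
    if e1 == e2: return 'finishes' if s1 > s2 else 'is finished by'
    if s1 < s2 and e1 > e2: return 'contains'
    if s1 > s2 and e1 < e2: return 'during'
    return 'overlaps' if s1 < s2 else 'is overlapped by'

print(allen_relation((1, 4), (2, 6)))   # overlaps
print(allen_relation((1, 8), (3, 5)))   # contains
```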


Temporal Interval Patterns [Kempe et al., 2008]

• a pattern must specify the relations between all referenced intervals
• this can conveniently be done with a matrix:

          A    B    C
    A     e    ?    ?
    B     ?    e    ?
    C     ?    ?    e

• such a temporal pattern matrix can also be interpreted as the adjacency matrix of a graph, which has the interval relationships as edge labels


Temporal Interval Patterns [Kempe et al., 2008]

• a pattern must specify the relations between all referenced intervals
• this can conveniently be done with a matrix:

          A    B    C
    A     e    o    b
    B     io   e    m
    C     a    im   e

• such a temporal pattern matrix can also be interpreted as the adjacency matrix of a graph, which has the interval relationships as edge labels


Temporal Interval Patterns [Kempe et al., 2008]

• a pattern must specify the relations between all referenced intervals
• this can conveniently be done with a matrix:

          A    B    C
    A     e         b
    B          e    m
    C          im   e

• such a temporal pattern matrix can also be interpreted as the adjacency matrix of a graph, which has the interval relationships as edge labels

• generally, input interval sequences may be represented as such graphs, thus mapping the problem to frequent (sub)graph mining

• however, relationships between time intervals are constrained (e.g. "B after A" and "C after B" imply "C after A")

• constraints can be exploited to obtain a simpler canonical form
• in canonical form, intervals are assigned in increasing time order to the rows and columns of the temporal pattern matrix



Support of Temporal Patterns

• the support of a temporal pattern w.r.t. a single sequence can be defined by
  • combinatorial counting (all occurrences)
  • maximal number of disjoint occurrences
  • temporal support (number of time window positions)
  • minimum occurrence (smallest interval)

• however, all of these definitions suffer from the fact that such a support is not anti-monotone or downward closed:

  (figure: one interval A containing two intervals B)

  the support of "A contains B" is 2, but the support of "A" is only 1

• nevertheless, an exhaustive pattern search can be ensured without having to abandon pruning with the Apriori property

• reason: with minimum occurrence counting, the relationship "contains" is the only one that can lead to support anomalies


Weakly Anti-Monotone / Downward Closed [Kempe et al., 2008]

• let P be a pattern space with subpattern relationship ⊏ and let s be a function from P to the real numbers, s : P → ℝ

• for a pattern S ∈ P, let P(S) = {R | R ⊏ S ∧ ∄ Q : R ⊏ Q ⊏ S} be the set of all parent patterns of S

• a function s on the pattern space P is called
  • strongly anti-monotone or strongly downward closed iff
      ∀S ∈ P : ∀R ∈ P(S) : s(R) ≥ s(S)
  • weakly anti-monotone or weakly downward closed iff
      ∀S ∈ P : ∃R ∈ P(S) : s(R) ≥ s(S)

• the support of temporal interval patterns is weakly anti-monotone (at least) if it is computed from minimal occurrences

• if temporal interval patterns are extended backwards in time, then the Apriori property can safely be used for pruning


Summary Frequent Sequence Mining

• different types of frequent sequence mining can be distinguished:
  • single and multiple sequences, directed and undirected sequences
  • items vs. (labeled) intervals, single and multiple objects per position
  • relations between objects, definition of pattern support

• all common types of frequent sequence mining possess canonical forms for which canonical extension rules can be found

• with these rules it is possible to check in constant time whether a possible extension leads to a result in canonical form

• a weakly anti-monotone support function can be enough to allow pruning with the Apriori property

• however, in this case: make sure that the canonical form assigns an appropriate parent pattern to ensure an exhaustive search


Quality Monitoring of Vehicles

101,250 vehicles

• garage stops

• vehicle configuration

• 1.4 million temporal intervals


Quality Monitoring of Vehicles



Pre-Production Vehicles



Outline

1. Introduction

2. Association Rules and Frequent Item Sets

3. Frequent Sequence Mining

4. Finding Motifs in Time Series Effectively

Time Series Representations

Symbolic Aggregate Approximation (SAX)

Motifs in Time Series

Sub-dimensional Motif: Example


Refresh: Data Mining in Time Series

• big challenge: to find useful information in time series

• typical problems: clustering, classification, frequent patternmining, association rules, visualization, anomaly detection

• because of the huge amount of data, problems often boil down to a search for reoccurring similar subsequences

• needed: similarity measure to compare subsequences

• e.g. Euclidean distance

      d(Q, C) = √( ∑_{i=1}^{n} (qi − ci)² )

  of 2 standard normally distributed subsequences Q = (q1, . . . , qn)ᵀ and C = (c1, . . . , cn)ᵀ

• problem: many comparisons, and the capacity of fast main memory is usually too small to load all the data


Memory-efficient Representations [Lin et al., 2007]

• problem: many, slow accesses to raw data

• solution: an approximation of the time series that fits into main memory and contains the interesting features

• e.g. discrete Fourier transformation (DFT), discrete wavelet transformation (DWT), piecewise linear (PLA) or adaptive piecewise constant approximation (APCA), singular value decomposition (SVD)

• here: symbolic representations

• advantage: algorithms from text processing and bioinformatics become applicable, e.g. hashing, Markov models, suffix trees etc.


Time Series Representations [Lin et al., 2007]

(shown as a taxonomy tree on the slide)

• model-based: HMM, ARMA
• non data-adaptive:
  • wavelets: orthonormal (Haar, Daubechies), bi-orthonormal (Coiflets, Symlets)
  • random, PAA, spectral (DFT, DCT, Chebyshev)
• data-dictated: clipped, phase-based, grid
• data-adaptive:
  • sorted coefficients
  • piecewise polynomial: piecewise linear (interpolation, regression), APCA
  • SVD
  • symbolic: NLG, strings (SAX: value-based, slope-based)
  • trees


The most common representations [Lin et al., 2007]

(figure: example approximations of a time series by DFT, PLA, Haar wavelet and APCA)


Piecewise Aggregate Approximation (PAA) [Lin et al., 2007]

(figure: reduction from 128 to 8 data points)


Symbolic Aggregate Approximation (SAX) [Lin et al., 2007]

• every sequence of length n becomes a word of defined length w over a chosen alphabet A = {α1, . . . , αa} with |A| = a

• simple algorithm:
  1. separate the subsequence into w equally sized intervals
  2. PAA: compute the mean value of each interval (as representative);
     C = (c1, . . . , cn)ᵀ is mapped onto C̄ = (c̄1, . . . , c̄w) with

         c̄i = (w / n) · ∑_{j = (n/w)(i−1) + 1}^{(n/w) i} cj

  3. map each mean value c̄i of C̄ onto one of the a letters with

         âi = αj ⇔ βj−1 ≤ c̄i ≤ βj

• assumptions: normally distributed value range of the PAA sequence and equiprobable occurrence of each letter

• mapping c̄i ↦ b ∈ A by "cutpoints" β1, . . . , βa−1


“Cutpoints” of the Normal Distribution[Lin et al., 2007]

|A|      3       4       5       6       7       8       9      10
β1    −0.43   −0.67   −0.84   −0.97   −1.07   −1.15   −1.22   −1.28
β2     0.43    0      −0.25   −0.43   −0.57   −0.67   −0.76   −0.84
β3             0.67    0.25    0      −0.18   −0.32   −0.43   −0.52
β4                     0.84    0.43    0.18    0      −0.14   −0.25
β5                             0.97    0.57    0.32    0.14    0
β6                                     1.07    0.67    0.43    0.25
β7                                             1.15    0.76    0.52
β8                                                     1.22    0.84
β9                                                             1.28

• cutpoints separate normal distribution in equiprobable regions
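A minimal SAX sketch (illustrative, not the authors' implementation): z-normalize, apply PAA, then discretize with the cutpoints for a = 3 from the table above; it assumes the sequence length is a multiple of w.

```python
import numpy as np

def sax(series, w, alphabet='abc', cutpoints=(-0.43, 0.43)):
    """Convert a 1-D sequence into a SAX word of length w (len(series) % w == 0)."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / x.std()                  # z-normalization
    paa = x.reshape(w, -1).mean(axis=1)           # PAA: mean of each of the w segments
    symbols = np.searchsorted(cutpoints, paa)     # region index of each segment mean
    return ''.join(alphabet[k] for k in symbols)

rng = np.random.default_rng(0)
print(sax(rng.normal(size=128), w=8))             # an 8-letter word over {a, b, c}
```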


SAX: Example [Lin et al., 2007]

• here: n = 128, w = 8, a = 3

• result: baabccbc


SAX: Distance Measure [Lin et al., 2007]

• PAA: lower bound of the Euclidean distance with

      dr(Q̄, C̄) = √(n / w) · √( ∑_{i=1}^{w} (q̄i − c̄i)² )

• SAX:

      d*(Q̂, Ĉ) = √(n / w) · √( ∑_{i=1}^{w} d*a(q̂i, ĉi)² )

• the distance d*a should be defined via a lookup table, e.g. for a = 4

          a      b      c      d
    a     0      0      0.67   1.34
    b     0      0      0      0.67
    c     0.67   0      0      0
    d     1.34   0.67   0      0
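A small companion sketch (illustrative): cell (r, c) of the lookup table is 0 for equal or adjacent letters and otherwise the distance between the nearest cutpoints; mindist plugs it into the formula above.

```python
import math

def sax_dist_table(cutpoints):
    """Lookup table d*_a for an alphabet of size len(cutpoints) + 1."""
    a = len(cutpoints) + 1
    return [[0.0 if abs(r - c) <= 1 else cutpoints[max(r, c) - 1] - cutpoints[min(r, c)]
             for c in range(a)] for r in range(a)]

def mindist(word1, word2, n, table, alphabet='abcd'):
    """Lower bound of the Euclidean distance of the original sequences of length n."""
    w = len(word1)
    idx = {ch: i for i, ch in enumerate(alphabet)}
    return math.sqrt(n / w) * math.sqrt(
        sum(table[idx[p]][idx[q]] ** 2 for p, q in zip(word1, word2)))

table = sax_dist_table([-0.67, 0.0, 0.67])        # cutpoints for a = 4
print(table[0][3])                                # 1.34, the entry for letters 'a' and 'd'
print(round(mindist('abdc', 'aadb', n=32, table=table), 3))
```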


Comparison of Distances [Lin et al., 2007]


SAX Advantage: Lower Bound [Lin et al., 2007]

• d*(Q̂, Ĉ) is a lower bound of the Euclidean distance d(Q, C) of the original sequences Q and C

      d*(Q̂, Ĉ) ≤ d(Q, C)

• if Q̂ and Ĉ are dissimilar, so are Q and C

• SAX-based algorithms produce identical results compared to algorithms running on the original data

• "only" similar SAX words should be compared in the original space

• usually, only a few accesses to the original data are needed


Find Motifs in Time Series [Chiu et al., 2003]

• motifs: primitive, frequent (similar) patterns, prototypes

• challenges:
  • motifs are unknown beforehand
  • a complete search is expensive, i.e. O(n²)
  • outliers influence the Euclidean distance


Generation of SAX Matrix [Chiu et al., 2003]

• find all motifs of a time series of length m using a "sliding window"

• a window of width n leads to (m − n + 1) subsequences

• transform every subsequence into a SAX word of length w

• save it in a row of a matrix, i.e. the SAX matrix
  • w columns, (m − n + 1) rows


Random Projection [Chiu et al., 2003]

• guess motif positions by so-called random projection

• pairwise comparison of SAX words

• collision matrix M with (m − n + 1)² cells for all comparisons

• use hash table to implement M efficiently!

• initially, M(i , j) = 0 for 1 ≤ i , j ≤ m − n + 1

• idea: compare character after character of 2 words in SAX matrix

• assumption: "don't care symbols" in sequences with unknown position

• e.g. noisy motifs, dilated or contracted sequence


Random Projection [Chiu et al., 2003]

• thus the SAX matrix is projected onto 1 ≤ k < w randomly chosen columns

• compare all rows of the projected matrix
  • if 2 projected SAX words in rows i and j are equal, then increment M(i, j)
  • repeat the projection t times, because it is likely that some motifs will share one cell in M after t iterations
  • many random sequences will most likely not collide with an already found motif

• user-defined threshold s with 1 ≤ s ≤ k for the collision entries in M
• all M(i, j) ≥ s would be candidate motifs
• but: there are very similar sequences in the immediate neighborhood of sequence i (so-called trivial matches)
• these must be removed! (see the sketch below)
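A compact sketch of the collision counting step (illustrative; the SAX words are given directly, and the collision matrix is kept as a sparse dictionary):

```python
from collections import defaultdict
from itertools import combinations
import random

def random_projection(sax_words, k, t, seed=0):
    """Count collisions of SAX words projected onto k random columns, repeated t times."""
    rng = random.Random(seed)
    w = len(sax_words[0])
    collisions = defaultdict(int)                   # sparse collision matrix M
    for _ in range(t):
        cols = rng.sample(range(w), k)              # random projection mask
        buckets = defaultdict(list)
        for row, word in enumerate(sax_words):
            buckets[tuple(word[c] for c in cols)].append(row)
        for rows in buckets.values():               # equal projected words -> collision
            for i, j in combinations(rows, 2):
                collisions[(i, j)] += 1
    return collisions

words = ['abba', 'abca', 'abba', 'ccba', 'abba']
M = random_projection(words, k=2, t=10)
s = 8                                               # user-defined threshold
print([pair for pair, count in M.items() if count >= s])    # candidate motif positions
```

(Trivial matches, i.e. rows from the immediate neighborhood of each other, would still have to be filtered out.)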


Random Projection: First Two Iterations [Chiu et al., 2003]


Sub-dimensional Motifs [Minnen et al., 2007]

• so far: univariate symbolic time series

• random projection can also be used for multivariate symbolic time series

• idea: increment the collision matrix M for each variable j ∈ {1, . . . , p} for each projected SAX word

• problem: the relevant dimensions of potential sub-dimensional motifs are unknown

• solution (see the sketch below):
  • estimate the distribution P(dj) over distances between non-trivial matches by drawing a sample
  • compute the distances d*_1, . . . , d*_p for each entry M(i, j) ≥ s
  • if P(dj ≤ d*_j) < r_j^rel (user-specific dimension relevance), then the j-th variable is considered relevant
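A rough sketch of the relevance test (illustrative: the empirical CDF of a drawn sample stands in for P(dj), and all numbers are made up):

```python
import numpy as np

def relevant_dimensions(sample_dists, motif_dists, r_rel):
    """For each variable j, test P(d_j <= d*_j) < r_rel_j using the empirical
    distribution of distances between randomly sampled non-trivial matches."""
    relevant = []
    for j, (sample, d_star, r) in enumerate(zip(sample_dists, motif_dists, r_rel)):
        p = float(np.mean(np.asarray(sample) <= d_star))    # empirical P(d_j <= d*_j)
        if p < r:
            relevant.append(j)
    return relevant

rng = np.random.default_rng(1)
sample_dists = [rng.normal(5.0, 1.0, 200),     # typical distances in variable 0
                rng.normal(5.0, 1.0, 200)]     # typical distances in variable 1
motif_dists = [1.5, 5.2]                       # d*_j of one candidate: small only in variable 0
print(relevant_dimensions(sample_dists, motif_dists, r_rel=[0.05, 0.05]))   # [0]
```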


Sub-dimensional Motif: Example [Moewes and Kruse, 2009]

• expert identified p = 9 of 130 variables as important

• motifs last at least n = 400 ms

• given: 10 time series with unknown sub-dimensional motifs


Sub-dimensional Motif in two Time Series [Moewes and Kruse, 2009]

(figure: two multivariate time series plotted over Time [s], with panels for the variables attr_0 to attr_8; the occurrences of the found sub-dimensional motif are marked in the panels of attr_1 and attr_3)


Clustering of Motifs [Moewes and Kruse, 2009]

(figure: cluster dendrogram of the found motif occurrences; hierarchical clustering with hclust and Ward linkage on the dissimilarity matrix)

• create a dissimilarity matrix by pairwise comparison of all found motifs in the 10 time series, based on d*

• positive, symmetric matrix with zeros on the main diagonal

• can be used to cluster occurrences, which helps finding motifs that occur in several time series

• here: hierarchical clustering of the motifs containing the variables attr_1 and attr_3 (see the sketch below)
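A minimal sketch of this clustering step, assuming SciPy is available: Ward linkage on a precomputed, symmetric dissimilarity matrix of motif occurrences (the toy matrix is made up).

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# toy dissimilarity matrix of 4 motif occurrences (symmetric, zeros on the diagonal)
D = np.array([[0.0, 0.2, 0.9, 0.8],
              [0.2, 0.0, 0.85, 0.9],
              [0.9, 0.85, 0.0, 0.1],
              [0.8, 0.9, 0.1, 0.0]])

Z = linkage(squareform(D), method='ward')       # hierarchical clustering of occurrences
print(fcluster(Z, t=2, criterion='maxclust'))   # two groups of occurrences, e.g. [1 1 2 2]
```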


Thanks to . . .

This talk wouldn't be possible without the work of

• Christian Borgelt, European Centre for Soft Computing, http://www.borgelt.net

• Steffen Kempe, Daimler AG

• Rudolf Kruse, University of Magdeburg, http://fuzzy.cs.ovgu.de

• Matthias Steinbrecher, SAP AG

Examples from this talk are based on real-world problems of the following companies:

• ADAC

• Dresdner Bank

• Daimler AG

• Second Life


Literature I

Agrawal, R. and Srikant, R. (1994).

Fast algorithms for mining association rules in large databases.

In Bocca, J. B., Jarke, M., and Zaniolo, C., editors, Proceedings of the 20th International Conference on Very Large Data Bases (VLDB '94), pages 487–499, San Francisco, CA, USA. Morgan Kaufmann Publishers, Inc.

Allen, J. F. (1983).

Maintaining knowledge about temporal intervals.

Communications of the ACM, 26:832–843.

Chiu, B., Keogh, E., and Lonardi, S. (2003).

Probabilistic discovery of time series motifs.

In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 493–498, Washington, D.C. ACM.


Literature II

Kempe, S., Hipp, J., Lanquillon, C., and Kruse, R. (2008).

Mining frequent temporal patterns in interval sequences.

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 16(5):645–661.

Kullback, S. and Leibler, R. A. (1951).

On information and sufficiency.

The Annals of Mathematical Statistics, 22(1):79–86.

Lin, J., Keogh, E., Wei, L., and Lonardi, S. (2007).

Experiencing SAX: a novel symbolic representation of time series.

Data Mining and Knowledge Discovery, 15(2):107–144.


Literature III

Minnen, D., Isbell, C., Essa, I., and Starner, T. (2007).

Detecting subdimensional motifs: An efficient algorithm for generalizedmultivariate pattern discovery.

In Proceedings of the 2007 Seventh IEEE International Conference on Data Mining, pages 601–606, Los Alamitos, CA, USA. IEEE Computer Society.

Moewes, C. and Kruse, R. (2009).

Zuordnen von linguistischen Ausdrücken zu Motiven in Zeitreihen [Assigning linguistic terms to motifs in time series].

at-Automatisierungstechnik, 57(3):146–154.

Quinlan, J. R. (1986).

Induction of decision trees.

Journal of Machine Learning, 1(1):81–106.

Shannon, C. E. (1948).

A mathematical theory of communication.

Bell System Technical Journal, 27(3):379–423.
