Temporal Data Mining Christian Moewes [email protected] Otto-von-Guericke University of Magdeburg Faculty of Computer Science Department of Knowledge Processing and Language Engineering Zittau Fuzzy Colloquium 2011 Zittau, September 9, 2011


Page 1:

Temporal Data Mining

Christian Moewes, [email protected]

Otto-von-Guericke University of Magdeburg
Faculty of Computer Science

Department of Knowledge Processing and Language Engineering

Zittau Fuzzy Colloquium 2011 Zittau, September 9, 2011

Page 2:

Outline

1. Introduction

Data Mining

Knowledge discovery in databases

CRISP-DM

Why Study Temporal Data Mining?

2. Association Rules and Frequent Item Sets

3. Frequent Sequence Mining

4. Finding Motifs in Time Series Effectively

Page 3:

Data

• today: companies/institutes maintain huge databases

⇒ gigantic archives of tables, documents, images, sounds

• “If you have enough data, you can solve any problem!”

• in large databases: can’t see the wood for the trees

• patterns, structures, regularities stay undetected

• finding patterns and exploiting information is fairly difficult

We are drowning in information but starved for knowledge. [John Naisbitt]

C. Moewes Temporal Data Mining Zittau, September 9, 2011 1 / 114

Page 4:

Knowledge discovery in databases

• actually, abundance of data

• lack of tools transforming data into knowledge

⇒ research area: knowledge discovery in databases (KDD)

• nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data

• one step in KDD: data mining

[Figure: Miner VGA (1989) screenshot]

C. Moewes Temporal Data Mining Zittau, September 9, 2011 2 / 114

Page 5:

Data mining tasks

• classification: Is this customer creditworthy or not?

• segmentation, clustering: What groups of customers do I have?

• concept description: Which properties characterize cured patients?

• prediction: How many new cars will be called next month?

• dependence/association analysis: Which CAN-bus errors of broken cars occur together frequently?

• . . .

C. Moewes Temporal Data Mining Zittau, September 9, 2011 3 / 114

Page 6:

CRISP-DM
CRoss Industry Standard Process for Data Mining

C. Moewes Temporal Data Mining Zittau, September 9, 2011 4 / 114

Page 7:

Temporal Data Mining

• many data mining problems deal with temporal features

• most prominent: (numeric/symbolic) time series

• time series are ubiquitous: finance, medicine, biometry, chemistry, astronomy, robotics, networks and industry

• upcoming: temporal sequences of arbitrary objects, e.g. subgraphs

• challenges: preprocessing, (dis)similarity measures, representation, search for useful information

• last decade: research focused purely on proposing new algorithms

• nowadays: research on application-based solutions

C. Moewes Temporal Data Mining Zittau, September 9, 2011 5 / 114

Page 8:

Outline

1. Introduction

2. Association Rules and Frequent Item Sets

The Apriori Algorithm

Association Rules

Rule Icons: Industrial Applications

3. Frequent Sequence Mining

4. Finding Motifs in Time Series Effectively

Page 9:

Frequent Item Set Mining: Motivation

• method for market basket analysis

• finding regularities in the shopping behavior of customers of supermarkets, mail-order companies, on-line shops etc.

• more specifically: find sets of products that are frequently bought together

• possible applications:
  • improve arrangement of products in shelves, on catalog's pages
  • support cross-selling (suggestion of other products), product bundling
  • fraud detection, technical dependence analysis

• often found patterns are expressed as association rules, e.g.

If a customer buys bread and wine, then she/he will probably also buy cheese.

C. Moewes Temporal Data Mining Zittau, September 9, 2011 6 / 114

Page 10:

Frequent Item Set Mining: Basic Notions

• let A = {a1, . . . , am} be a set of items (products, special equipment items, service options, . . . )

• any subset I ⊆ A is called an item set
  • an item set is a set of products that can be bought (together)

• let T = (t1, . . . , tn) with ∀i, 1 ≤ i ≤ n : ti ⊆ A be a vector of transactions over A
  • each transaction is an item set, but some item sets may not occur in T
  • transactions need not be pairwise different: ti = tk for i ≠ k is possible
  • T may also be seen as a bag or multiset of transactions
  • A may not be given explicitly, but only implicitly as A = ⋃_{i=1..n} ti

• T can list, e.g., the sets of products bought by the customers of a supermarket in a given period of time

C. Moewes Temporal Data Mining Zittau, September 9, 2011 7 / 114

Page 11:

Frequent Item Set Mining: Basic Notions

let I ⊆ A be an item set and T a vector of transactions over A

• a transaction t ∈ T covers the item set I, or the item set I is contained in the transaction t ∈ T, iff I ⊆ t

• the set KT (I) = {k ∈ {1, . . . , n} | I ⊆ tk} is called the cover of I w.r.t. T
  • the cover of an item set is the index set of the transactions that cover it
  • it may also be defined as the vector of all transactions that cover it (however, this is complicated to write in a formally correct way)

• the value sT (I) = |KT (I)| is called the (absolute) support of I w.r.t. T
  the value σT (I) = (1/n) |KT (I)| is called the relative support of I w.r.t. T
  • the support of I is the number or fraction of transactions that contain it
  • sometimes σT (I) is also called the (relative) frequency of I w.r.t. T

C. Moewes Temporal Data Mining Zittau, September 9, 2011 8 / 114
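
As a small illustration of these notions (an editorial sketch in Python, not from the slides; the transaction vector is the 10-transaction example used throughout this talk, and all function names are chosen for illustration only):

transactions = [
    {"a", "d", "e"}, {"b", "c", "d"}, {"a", "c", "e"}, {"a", "c", "d", "e"},
    {"a", "e"}, {"a", "c", "d"}, {"b", "c"}, {"a", "c", "d", "e"},
    {"c", "b", "e"}, {"a", "d", "e"},
]

def cover(item_set, T):
    """K_T(I): 1-based indices of the transactions that contain item_set."""
    return {k for k, t in enumerate(T, start=1) if item_set <= t}

def support(item_set, T):
    """s_T(I) = |K_T(I)|: absolute support."""
    return len(cover(item_set, T))

def rel_support(item_set, T):
    """sigma_T(I) = s_T(I) / n: relative support."""
    return support(item_set, T) / len(T)

print(cover({"a", "c"}, transactions))        # {3, 4, 6, 8}
print(support({"a", "c"}, transactions))      # 4
print(rel_support({"a", "c"}, transactions))  # 0.4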

Page 12:

Frequent Item Set Mining: Formal Definition

given:

• set A = {a1, . . . , am} of items,

• vector T = (t1, . . . , tn) of transactions over A,

• a number smin ∈ ℕ, 0 < smin ≤ n, or (equivalently) a number σmin ∈ ℝ, 0 < σmin ≤ 1, the minimum support

desired:

• the set of frequent item sets, i.e.
  the set FT (smin) = {I ⊆ A | sT (I) ≥ smin} or (equivalently)
  the set ΦT (σmin) = {I ⊆ A | σT (I) ≥ σmin}

note: with the relations smin = ⌈n σmin⌉ and σmin = (1/n) smin the two versions can easily be transformed into each other

C. Moewes Temporal Data Mining Zittau, September 9, 2011 9 / 114
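
The formal definition can be turned directly into a naive enumeration; the following sketch (illustrative only) simply tries all subsets of A and keeps those with sufficient support. The later slides explain why this brute-force approach does not scale:

from itertools import combinations

def frequent_item_sets(items, T, s_min):
    """F_T(s_min): all item sets with absolute support >= s_min (brute force)."""
    result = {}
    for k in range(len(items) + 1):
        for combo in combinations(sorted(items), k):
            I = set(combo)
            s = sum(1 for t in T if I <= t)
            if s >= s_min:
                result[frozenset(I)] = s
    return result

transactions = [set(chars) for chars in
                ("ade", "bcd", "ace", "acde", "ae", "acd", "bc", "acde", "bce", "ade")]
F = frequent_item_sets({"a", "b", "c", "d", "e"}, transactions, s_min=3)
print(len(F))   # 16 frequent item sets (including the empty set), as on the next slide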

Page 14:

Frequent Item Sets: Example

transaction vector:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

frequent item sets (relative support):
0 items: ∅: 1
1 item:  {a}: .7   {b}: .3   {c}: .7   {d}: .6   {e}: .7
2 items: {a,c}: .4   {a,d}: .5   {a,e}: .6   {b,c}: .3   {c,d}: .4   {c,e}: .4   {d,e}: .4
3 items: {a,c,d}: .3   {a,c,e}: .3   {a,d,e}: .4

• minimum support smin = 3 or σmin = 0.3 = 30% in this example

• there are 2⁵ = 32 possible item sets over A = {a, b, c, d, e}

• there are 16 frequent item sets (but only 10 transactions)

C. Moewes Temporal Data Mining Zittau, September 9, 2011 10 / 114

Page 15:

Properties of the Support of an Item Set

• brute force approach (enumerate all possible item sets, determine their support and discard infrequent item sets) is infeasible: the number of possible item sets grows exponentially with the number of items
  • a typical supermarket has thousands of different products

• idea: consider properties of the support, in particular:

∀I : ∀J ⊇ I : KT (J) ⊆ KT (I)

  • this property holds, since ∀t : ∀I : ∀J ⊇ I : J ⊆ t → I ⊆ t
  • each item is one condition a transaction must satisfy
  • transactions not satisfying this condition are removed from the cover

• it follows: ∀I : ∀J ⊇ I : sT (I) ≥ sT (J), i.e. if an item set is extended, its support cannot increase
  • support is anti-monotone or downward closed

C. Moewes Temporal Data Mining Zittau, September 9, 2011 11 / 114
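
A quick check of this anti-monotonicity on the running example (illustrative only; support() and transactions are the helpers defined in the sketch further above):

chain = [{"a"}, {"a", "c"}, {"a", "c", "d"}, {"a", "c", "d", "e"}]
supports = [support(I, transactions) for I in chain]
print(supports)   # [7, 4, 3, 2]: adding items can only keep or shrink the support
assert all(s1 >= s2 for s1, s2 in zip(supports, supports[1:]))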

Page 16:

Properties of the Support of an Item Set

• from ∀I : ∀J ⊇ I : sT (I) ≥ sT (J) it follows

∀smin : ∀I : ∀J ⊇ I : sT (I) < smin → sT (J) < smin.

i.e. no superset of an infrequent item set can be frequent

• Apriori property
  • rationale: sometimes we can know a priori, i.e. before checking its support by accessing the given transaction vector, that an item set cannot be frequent

• contraposition of this implication also holds:

∀smin : ∀I : ∀J ⊆ I : sT (I) ≥ smin → sT (J) ≥ smin.

i.e. all subsets of frequent item set are frequent

• compressed representation of set of frequent item sets

C. Moewes Temporal Data Mining Zittau, September 9, 2011 12 / 114

Page 17:

Maximal Item Sets

• consider the set of maximal (frequent) item sets:

MT (smin) = {I ⊆ A | sT (I) ≥ smin ∧ ∀J ⊃ I : sT (J) < smin}

i.e. an item set is maximal if it is frequent, but none of its proper supersets is frequent

• so, we know that

∀smin : ∀I : I ∈ MT (smin) ∨ ∃J ⊃ I : sT (J) ≥ smin

it follows

∀smin : ∀I : I ∈ FT (smin) → ∃J ∈ MT (smin) : I ⊆ J

i.e. every frequent item set has a maximal superset

• therefore: ∀smin : FT (smin) = ⋃_{I ∈ MT (smin)} 2^I   (where 2^I denotes the power set of I)

C. Moewes Temporal Data Mining Zittau, September 9, 2011 13 / 114
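
As a small follow-up to the brute-force sketch above (illustrative only, reusing its dictionary F of frequent item sets), the maximal frequent item sets can be filtered out directly from this definition:

def maximal_item_sets(frequent):
    """M_T(s_min): frequent item sets none of whose proper supersets is frequent."""
    return {I: s for I, s in frequent.items()
            if not any(I < J for J in frequent)}   # I < J tests for a proper subset

M = maximal_item_sets(F)
print(sorted(tuple(sorted(I)) for I in M))
# the four maximal sets {a,c,d}, {a,c,e}, {a,d,e}, {b,c}, cf. the example on the next slides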

Page 18:

Maximal Frequent Item Sets: Example

transaction vector:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

frequent item sets (relative support):
0 items: ∅: 1
1 item:  {a}: .7   {b}: .3   {c}: .7   {d}: .6   {e}: .7
2 items: {a,c}: .4   {a,d}: .5   {a,e}: .6   {b,c}: .3   {c,d}: .4   {c,e}: .4   {d,e}: .4
3 items: {a,c,d}: .3   {a,c,e}: .3   {a,d,e}: .4

• which item sets are maximal?

• every frequent item set is a subset of at least one of these sets

C. Moewes Temporal Data Mining Zittau, September 9, 2011 14 / 114

Page 22:

Maximal Frequent Item Sets: Example

transaction vector:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

frequent item sets (relative support):
0 items: ∅: 1
1 item:  {a}: .7   {b}: .3   {c}: .7   {d}: .6   {e}: .7
2 items: {a,c}: .4   {a,d}: .5   {a,e}: .6   {b,c}: .3   {c,d}: .4   {c,e}: .4   {d,e}: .4
3 items: {a,c,d}: .3   {a,c,e}: .3   {a,d,e}: .4

• which item sets are maximal? {b,c}, {a,c,d}, {a,c,e}, {a,d,e}

• every frequent item set is a subset of at least one of these sets

C. Moewes Temporal Data Mining Zittau, September 9, 2011 14 / 114

Page 23:

Limits of Maximal Item Sets

• the set of maximal item sets captures the set of all frequent item sets, but then we only know the support of the maximal item sets

• about the support of a non-maximal frequent item set we only know:

∀smin : ∀I ∈ FT (smin) − MT (smin) : sT (I) ≥ max{ sT (J) | J ∈ MT (smin), J ⊃ I }

this follows immediately from ∀I : ∀J ⊇ I : sT (I) ≥ sT (J), i.e. an item set cannot have lower support than any of its supersets

• note that, in general,

∀smin : ∀I ∈ FT (smin) : sT (I) ≥ max{ sT (J) | J ∈ MT (smin), J ⊇ I }

• question: can we find a subset of the set of all frequent item sets which also preserves knowledge of all support values?

C. Moewes Temporal Data Mining Zittau, September 9, 2011 15 / 114

Page 24:

Closed Item Sets

• consider the set of closed (frequent) item sets:

CT (smin) = {I ⊆ A | sT (I) ≥ smin ∧ ∀J ⊃ I : sT (J) < sT (I)}

i.e. an item set is closed if it is frequent, but none of its proper supersets has the same support

• with this we know that

∀smin : ∀I : I ∈ CT (smin) ∨ ∃J ⊃ I : sT (J) = sT (I)

it follows

∀smin : ∀I : I ∈ FT (smin) → ∃J ∈ CT (smin) : I ⊆ J

i.e. every frequent item set has a closed superset

• therefore: ∀smin : FT (smin) = ⋃_{I ∈ CT (smin)} 2^I

C. Moewes Temporal Data Mining Zittau, September 9, 2011 16 / 114

Page 25:

Closed Item Sets

• however, not only does every frequent item set have a closed superset, it has a closed superset with the same support:

∀smin : ∀I : I ∈ FT (smin) → ∃J ⊇ I : J ∈ CT (smin) ∧ sT (J) = sT (I)

(proof: see considerations on the next slide)

• the set of all closed item sets preserves knowledge of all support values:

∀smin : ∀I ∈ FT (smin) : sT (I) = max{ sT (J) | J ∈ CT (smin), J ⊇ I }

• note: the weaker statement

∀smin : ∀I ∈ FT (smin) : sT (I) ≥ max{ sT (J) | J ∈ CT (smin), J ⊇ I }

follows immediately from ∀I : ∀J ⊇ I : sT (I) ≥ sT (J), i.e. an item set cannot have lower support than any of its supersets

C. Moewes Temporal Data Mining Zittau, September 9, 2011 17 / 114

Page 26:

Closed Item Sets

• alternative characterization:

I closed ⇔ sT (I) ≥ smin ∧ I = ⋂_{k ∈ KT (I)} tk

reminder: KT (I) = {k ∈ {1, . . . , n} | I ⊆ tk} is the cover of I w.r.t. T

• derived as follows: since ∀k ∈ KT (I) : I ⊆ tk , it is obvious that

∀smin : ∀I ∈ FT (smin) : I ⊆ ⋂_{k ∈ KT (I)} tk

  • if I ⊂ ⋂_{k ∈ KT (I)} tk , then I is not closed, since ⋂_{k ∈ KT (I)} tk has the same support
  • on the other hand, no superset of ⋂_{k ∈ KT (I)} tk has the cover KT (I)

• note: the above characterization allows us to construct the (uniquely determined) closed superset of a frequent item set with the same support

C. Moewes Temporal Data Mining Zittau, September 9, 2011 18 / 114
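
The intersection characterization can be made concrete with a few lines of Python (an illustrative sketch reusing the transactions list from the earlier snippets; it assumes the item set occurs in at least one transaction):

def closure(item_set, T):
    """Intersection of all transactions covering item_set; a frequent item set
    is closed iff it equals this intersection (see the characterization above)."""
    covering = [t for t in T if item_set <= t]
    result = set(covering[0])          # assumes a non-empty cover
    for t in covering[1:]:
        result &= t
    return result

print(closure({"d", "e"}, transactions))        # {'a', 'd', 'e'}: {d,e} is not closed
print(closure({"a", "d", "e"}, transactions))   # {'a', 'd', 'e'}: {a,d,e} is closed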

Page 27:

Closed Frequent Item Sets: Example

transaction vector:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

frequent item sets (relative support):
0 items: ∅: 1
1 item:  {a}: .7   {b}: .3   {c}: .7   {d}: .6   {e}: .7
2 items: {a,c}: .4   {a,d}: .5   {a,e}: .6   {b,c}: .3   {c,d}: .4   {c,e}: .4   {d,e}: .4
3 items: {a,c,d}: .3   {a,c,e}: .3   {a,d,e}: .4

• all frequent item sets but {b} and {d,e} are closed

• {b} is a subset of {b,c}, both have support 0.3; {d,e} is a subset of {a,d,e}, both have support 0.4

C. Moewes Temporal Data Mining Zittau, September 9, 2011 19 / 114

Page 28:

Types of Frequent Item Sets

• Frequent Item Set: any item set whose support is at least the minimum support:
  I frequent ⇔ sT (I) ≥ smin

• Closed Item Set: a frequent item set is called closed if no superset has the same support:
  I closed ⇔ sT (I) ≥ smin ∧ ∀J ⊃ I : sT (J) < sT (I)

• Maximal Item Set: a frequent item set is called maximal if no superset is frequent:
  I maximal ⇔ sT (I) ≥ smin ∧ ∀J ⊃ I : sT (J) < smin

• obvious relations:
  • all maximal and all closed item sets are frequent
  • all maximal item sets are closed

C. Moewes Temporal Data Mining Zittau, September 9, 2011 20 / 114

Page 31:

Types of Frequent Item Sets: Example

frequent item sets (relative support):
0 items: ∅+: 1
1 item:  {a}+: .7   {b}: .3   {c}+: .7   {d}+: .6   {e}+: .7
2 items: {a,c}+: .4   {a,d}+: .5   {a,e}+: .6   {b,c}+∗: .3   {c,d}+: .4   {c,e}+: .4   {d,e}: .4
3 items: {a,c,d}+∗: .3   {a,c,e}+∗: .3   {a,d,e}+∗: .4

• Frequent Item Set: any item set whose support is at least the minimum support

• Closed Item Set: (marked with +) a frequent item set is called closed if no superset has the same support

• Maximal Item Set: (marked with ∗) a frequent item set is called maximal if no superset is frequent

C. Moewes Temporal Data Mining Zittau, September 9, 2011 21 / 114

Page 32:

Searching for Frequent Item Sets

• it suffices to find the closed item sets together with their support

• the characterization of closed item sets by

I closed ⇔ sT (I) ≥ smin ∧ I = ⋂_{k ∈ KT (I)} tk

suggests finding them by forming all possible intersections of the transactions and checking their support

• however, approaches using this idea are not competitive with other methods

• if the support of all frequent item sets is needed, it can be clumsy and tedious to compute the support of a non-closed frequent item set with

∀smin : ∀I ∈ FT (smin) − CT (smin) : sT (I) = max{ sT (J) | J ∈ CT (smin), J ⊃ I }

• in order to find the closed sets one may have to visit many frequent sets anyway

C. Moewes Temporal Data Mining Zittau, September 9, 2011 22 / 114

Page 33:

Finding the Frequent Item Sets

• idea: use the properties of the support to organize the search for all frequent item sets:

∀I : ∀J ⊃ I : sT (I) < smin → sT (J) < smin

• since these properties relate the support of an item set to the support of its subsets and supersets, organize the search based on the subset lattice of the set A (the set of all items)

subset lattice for five items {a, b, c, d, e} (Hasse diagram):

a b c d e
ab ac ad ae bc bd be cd ce de
abc abd abe acd ace ade bcd bce bde cde
abcd abce abde acde bcde
abcde

C. Moewes Temporal Data Mining Zittau, September 9, 2011 23 / 114

Page 34:

Subset Lattice and Frequent Item Sets

transaction vector:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

[Figure: subset lattice (Hasse diagram) with the frequent item sets for smin = 3 highlighted; blue boxes are frequent item sets, white boxes are infrequent item sets]

C. Moewes Temporal Data Mining Zittau, September 9, 2011 24 / 114

Page 35:

Subset Lattice and Closed Item Sets

transaction vector:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

[Figure: subset lattice (Hasse diagram) with the closed item sets for smin = 3 highlighted; red boxes are closed item sets, white boxes are infrequent item sets]

C. Moewes Temporal Data Mining Zittau, September 9, 2011 25 / 114

Page 36:

Subset Lattice and Maximal Item Sets

transaction vector:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

[Figure: subset lattice (Hasse diagram) with the maximal item sets for smin = 3 highlighted; red boxes are maximal item sets, white boxes are infrequent item sets]

C. Moewes Temporal Data Mining Zittau, September 9, 2011 26 / 114

Page 37:

Searching for Frequent Item Sets [Agrawal and Srikant, 1994]

one possible scheme for the search:

• determine the support of the one-element item sets and discard the infrequent items

• form candidate item sets with two items (both items must be frequent), determine their support, discard the infrequent item sets

• form candidate item sets with three items (all pairs must be frequent), determine their support, discard the infrequent item sets

• continue by forming candidate item sets with four, five etc. items until no candidate item set is frequent

this is the Apriori Algorithm, which is based on two main steps: candidate generation and pruning; all frequent item set mining algorithms are based on these steps in some form

C. Moewes Temporal Data Mining Zittau, September 9, 2011 27 / 114

Page 38:

The Apriori Algorithm 1

function apriori (A, T, smin)            (∗ Apriori algorithm ∗)
begin
  k := 1;                                (∗ initialize the item set size ∗)
  Ek := ⋃_{a ∈ A} {{a}};                 (∗ start with single element sets ∗)
  Fk := prune(Ek, T, smin);              (∗ and determine the frequent ones ∗)
  while Fk ≠ ∅ do begin                  (∗ while there are frequent item sets ∗)
    Ek+1 := candidates(Fk);              (∗ create item sets with one item more ∗)
    Fk+1 := prune(Ek+1, T, smin);        (∗ and determine the frequent ones ∗)
    k := k + 1;                          (∗ increment the item counter ∗)
  end;
  return ⋃_{j=1..k} Fj;                  (∗ return the frequent item sets ∗)
end (∗ apriori ∗)

C. Moewes Temporal Data Mining Zittau, September 9, 2011 28 / 114

Page 39:

The Apriori Algorithm 2

function candidates (Fk)                 (∗ generate candidates with k + 1 items ∗)
begin
  E := ∅;                                (∗ initialize the set of candidates ∗)
  forall f1, f2 ∈ Fk                     (∗ traverse all pairs of frequent item sets ∗)
    with f1 = {a1, . . . , ak−1, ak}      (∗ that differ only in one item and ∗)
    and  f2 = {a1, . . . , ak−1, a′k}     (∗ are in a lexicographic order ∗)
    and  ak < a′k do begin               (∗ (the order is arbitrary, but fixed) ∗)
      f := f1 ∪ f2 = {a1, . . . , ak−1, ak, a′k};   (∗ union has k + 1 items ∗)
      if ∀a ∈ f : f − {a} ∈ Fk           (∗ only if all subsets are frequent, ∗)
      then E := E ∪ {f};                 (∗ add the new item set to the candidates ∗)
  end;                                   (∗ (otherwise it cannot be frequent) ∗)
  return E;                              (∗ return the generated candidates ∗)
end (∗ candidates ∗)

C. Moewes Temporal Data Mining Zittau, September 9, 2011 29 / 114

Page 40:

The Apriori Algorithm 3

function prune (E, T, smin)              (∗ prune infrequent candidates ∗)
begin
  forall e ∈ E do                        (∗ initialize the support counters ∗)
    sT (e) := 0;                         (∗ of all candidates to be checked ∗)
  forall t ∈ T do                        (∗ traverse the transactions ∗)
    forall e ∈ E do                      (∗ traverse the candidates ∗)
      if e ⊆ t                           (∗ if transaction contains the candidate, ∗)
      then sT (e) := sT (e) + 1;         (∗ increment the support counter ∗)
  F := ∅;                                (∗ initialize the set of frequent candidates ∗)
  forall e ∈ E do                        (∗ traverse the candidates ∗)
    if sT (e) ≥ smin                     (∗ if a candidate is frequent, ∗)
    then F := F ∪ {e};                   (∗ add it to the set of frequent candidates ∗)
  return F;                              (∗ return the pruned set of candidates ∗)
end (∗ prune ∗)

C. Moewes Temporal Data Mining Zittau, September 9, 2011 30 / 114
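
Putting the three pseudocode functions together, a compact Python version of the same levelwise scheme might look as follows (an illustrative sketch only, not the implementation referenced on the summary slide; candidate generation and pruning become two small helpers):

from itertools import combinations

def apriori(items, transactions, s_min):
    """Frequent item sets via levelwise search; returns {frozenset: absolute support}."""
    def prune(cands):
        support = {c: 0 for c in cands}
        for t in transactions:                       # one pass over the data per level
            for c in cands:
                if c <= t:
                    support[c] += 1
        return {c: s for c, s in support.items() if s >= s_min}

    def candidates(frequent_k):
        """Merge item sets sharing a (k-1)-prefix; keep a candidate only if all
        of its k-subsets are frequent (Apriori property)."""
        tuples = sorted(tuple(sorted(f)) for f in frequent_k)
        out = set()
        for f1, f2 in combinations(tuples, 2):
            if f1[:-1] == f2[:-1]:
                cand = frozenset(f1) | frozenset(f2)
                if all(cand - {a} in frequent_k for a in cand):
                    out.add(cand)
        return out

    frequent = prune({frozenset({a}) for a in items})
    result = dict(frequent)
    while frequent:
        frequent = prune(candidates(frequent))
        result.update(frequent)
    return result

transactions = [set(chars) for chars in
                ("ade", "bcd", "ace", "acde", "ae", "acd", "bc", "acde", "bce", "ade")]
freq = apriori("abcde", transactions, s_min=3)
print(len(freq))   # 15 non-empty frequent item sets (16 with the empty set)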

Page 41:

Searching for Frequent Item Sets

• the Apriori algorithm searches the subset lattice top-down, level by level

• collecting the frequent item sets of size k in a set Fk has drawbacks: a frequent item set of size k + 1 can be formed in k(k + 1)/2 possible ways (for infrequent item sets, the number may be smaller)
  • consequence: the candidate generation step may carry out a lot of redundant work, since it suffices to generate each candidate item set once

• question: can we reduce or even eliminate this redundant work? more generally: how can we make sure that any candidate item set is generated at most once?

• idea: assign to each item set a unique parent item set, from which this item set is to be generated

C. Moewes Temporal Data Mining Zittau, September 9, 2011 31 / 114

Page 42:

Searching for Frequent Item Sets

• core problem: an item set of size k (i.e. with k items) can be generated in k! different ways (on k! paths in the Hasse diagram), because in principle items may be added in any order

• if we consider an item-by-item process of building an item set (levelwise traversal of the lattice), there are k possible ways of forming an item set of size k from item sets of size k − 1 by adding the remaining item

• obvious: it suffices to consider each item set at most once in order to find the frequent ones (infrequent item sets need not be generated at all)

• question: can we reduce or even eliminate this variety? more generally: how can we make sure that any candidate item set is generated at most once?

• idea: assign to each item set a unique parent item set, from which this item set is to be generated

C. Moewes Temporal Data Mining Zittau, September 9, 2011 32 / 114

Page 43:

Searching for Frequent Item Sets

• we must search item subset lattice / its Hasse diagram

• assigning unique parents turns Hasse diagram into tree

• traversing resulting tree explores each item set exactly once

subset lattice (Hasse diagram) and possible tree for five items:

C. Moewes Temporal Data Mining Zittau, September 9, 2011 33 / 114

Page 44:

Searching with Unique Parents

Principle of Search Algorithm based on Unique Parents:

• Base Loop:
  • traverse all one-element item sets (their unique parent is ∅)
  • recursively process all one-element item sets that are frequent

• Recursive Processing: for a given frequent item set I:
  • generate all extensions J of I by one item (i.e. J ⊃ I, |J| = |I| + 1) for which the item set I is the chosen unique parent
  • for all J: if J is frequent, process J recursively, otherwise discard J

• questions:
  • how can we formally assign unique parents?
  • how can we make sure that we generate only those extensions for which the item set that is extended is the chosen unique parent?

C. Moewes Temporal Data Mining Zittau, September 9, 2011 34 / 114

Page 45:

Unique Parents and Prefix Trees

• item sets sharing the same longest proper prefix are siblings, because they have the same unique parent

• this allows us to represent the unique parent tree as a prefix tree or trie

canonical parent tree and corresponding prefix tree for 5 items:

C. Moewes Temporal Data Mining Zittau, September 9, 2011 35 / 114
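
One common way to realize such unique parents (a sketch under that assumption, not taken from the slides) is to fix an order on the items and to extend an item set only by items that come after its last item; every item set is then generated from exactly one parent, and the anti-monotonicity of the support prunes whole subtrees:

def enumerate_frequent(items, transactions, s_min, prefix=()):
    """Depth-first search with unique parents: extend `prefix` only by items
    following its last item in the fixed order, so each item set is visited once."""
    order = sorted(items)
    start = order.index(prefix[-1]) + 1 if prefix else 0
    for a in order[start:]:
        candidate = prefix + (a,)
        s = sum(1 for t in transactions if set(candidate) <= t)
        if s >= s_min:                   # only frequent sets are reported and extended
            yield candidate, s
            yield from enumerate_frequent(items, transactions, s_min, candidate)

transactions = [set(chars) for chars in
                ("ade", "bcd", "ace", "acde", "ae", "acd", "bc", "acde", "bce", "ade")]
for item_set, s in enumerate_frequent("abcde", transactions, s_min=3):
    print(item_set, s)                   # 15 frequent item sets, each exactly once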

Page 46:

Apriori: Levelwise Search

transaction database:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

prefix tree, level 1 (support counters): a: 7, b: 3, c: 7, d: 6, e: 7

• example transaction database with 5 items and 10 transactions

• minimum support: 30%, i.e. at least 3 transactions must contain the item set

• all one-item sets are frequent → the full second level is needed

C. Moewes Temporal Data Mining Zittau, September 9, 2011 36 / 114

Page 47:

Apriori: Levelwise Search

transaction database:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

prefix tree, level 1: a: 7, b: 3, c: 7, d: 6, e: 7
level 2: a → {b: 0, c: 4, d: 5, e: 6},  b → {c: 3, d: 1, e: 1},  c → {d: 4, e: 4},  d → {e: 4}

• determining the support of item sets: for each item set traverse the database and count the transactions that contain it (highly inefficient)

• better: traverse the tree for each transaction and find the item sets it contains (efficient: can be implemented as a simple doubly recursive procedure)

C. Moewes Temporal Data Mining Zittau, September 9, 2011 37 / 114

Page 48:

Apriori: Levelwise Search

transaction database:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

prefix tree, level 1: a: 7, b: 3, c: 7, d: 6, e: 7
level 2: a → {b: 0, c: 4, d: 5, e: 6},  b → {c: 3, d: 1, e: 1},  c → {d: 4, e: 4},  d → {e: 4}

• minimum support: 30%, i.e. at least 3 transactions must contain the item set

• infrequent item sets: {a, b}, {b, d}, {b, e}

• the subtrees starting at these item sets can be pruned

C. Moewes Temporal Data Mining Zittau, September 9, 2011 38 / 114

Page 49:

Apriori: Levelwise Search

transaction database:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

prefix tree, level 1: a: 7, b: 3, c: 7, d: 6, e: 7
level 2: a → {b: 0, c: 4, d: 5, e: 6},  b → {c: 3, d: 1, e: 1},  c → {d: 4, e: 4},  d → {e: 4}
level 3 candidates (not yet counted): ac → {d: ?, e: ?},  ad → {e: ?},  bc → {d: ?, e: ?},  cd → {e: ?}

• generate candidate item sets with 3 items (parents must be frequent)

• before counting, check whether the candidates contain an infrequent item set

• an item set with k items has k subsets of size k − 1; the parent is only one of these subsets

C. Moewes Temporal Data Mining Zittau, September 9, 2011 39 / 114

Page 50:

Apriori: Levelwise Search

transaction database:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

prefix tree, level 1: a: 7, b: 3, c: 7, d: 6, e: 7
level 2: a → {b: 0, c: 4, d: 5, e: 6},  b → {c: 3, d: 1, e: 1},  c → {d: 4, e: 4},  d → {e: 4}
level 3 candidates (not yet counted): ac → {d: ?, e: ?},  ad → {e: ?},  bc → {d: ?, e: ?},  cd → {e: ?}

• the item sets {b, c, d} and {b, c, e} can be pruned, because
  • {b, c, d} contains the infrequent item set {b, d} and
  • {b, c, e} contains the infrequent item set {b, e}

C. Moewes Temporal Data Mining Zittau, September 9, 2011 40 / 114

Page 51:

Apriori: Levelwise Search

transaction database:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

prefix tree, level 1: a: 7, b: 3, c: 7, d: 6, e: 7
level 2: a → {b: 0, c: 4, d: 5, e: 6},  b → {c: 3, d: 1, e: 1},  c → {d: 4, e: 4},  d → {e: 4}
level 3: ac → {d: 3, e: 3},  ad → {e: 4},  bc → {d: ?, e: ?},  cd → {e: 2}

• only the remaining 4 item sets of size 3 are evaluated

C. Moewes Temporal Data Mining Zittau, September 9, 2011 41 / 114

Page 52:

Apriori: Levelwise Search

transaction database:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

prefix tree, level 1: a: 7, b: 3, c: 7, d: 6, e: 7
level 2: a → {b: 0, c: 4, d: 5, e: 6},  b → {c: 3, d: 1, e: 1},  c → {d: 4, e: 4},  d → {e: 4}
level 3: ac → {d: 3, e: 3},  ad → {e: 4},  bc → {d: ?, e: ?},  cd → {e: 2}

• minimum support: 30%, i.e. at least 3 transactions must contain the item set

• infrequent item set: {c, d, e}

C. Moewes Temporal Data Mining Zittau, September 9, 2011 42 / 114

Page 53:

Apriori: Levelwise Search

transaction database:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

prefix tree, level 1: a: 7, b: 3, c: 7, d: 6, e: 7
level 2: a → {b: 0, c: 4, d: 5, e: 6},  b → {c: 3, d: 1, e: 1},  c → {d: 4, e: 4},  d → {e: 4}
level 3: ac → {d: 3, e: 3},  ad → {e: 4},  bc → {d: ?, e: ?},  cd → {e: 2}
level 4 candidate (not yet counted): acd → {e: ?}

• generate candidate item sets with 4 items (parents must be frequent)

• before counting, check whether the candidates contain an infrequent item set

C. Moewes Temporal Data Mining Zittau, September 9, 2011 43 / 114

Page 54:

Apriori: Levelwise Search

transaction database:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

prefix tree, level 1: a: 7, b: 3, c: 7, d: 6, e: 7
level 2: a → {b: 0, c: 4, d: 5, e: 6},  b → {c: 3, d: 1, e: 1},  c → {d: 4, e: 4},  d → {e: 4}
level 3: ac → {d: 3, e: 3},  ad → {e: 4},  bc → {d: ?, e: ?},  cd → {e: 2}
level 4 candidate: acd → {e: ?}

• the item set {a, c, d, e} can be pruned, because it contains the infrequent item set {c, d, e}

• consequence: no candidate item sets with four items

• a fourth access to the transaction database is not necessary

C. Moewes Temporal Data Mining Zittau, September 9, 2011 44 / 114

Page 55:

Summary Apriori

Basic Processing Scheme

• breadth-first/levelwise traversal of the subset lattice

• candidates are formed by merging item sets that differ in only one item

• support counting is done with a doubly recursive procedure

Advantages

• “perfect” pruning of infrequent candidate item sets (with infrequent subsets)

Disadvantages

• can require lots of memory (since all frequent item sets are represented)

• support counting takes very long for large transactions

Software

• http://www.borgelt.net/apriori.html

C. Moewes Temporal Data Mining Zittau, September 9, 2011 45 / 114

Page 56:

Summary Frequent Item Set Mining

• many different algorithms for frequent item set mining exist

• here only Apriori algorithm

• algorithms for frequent item set mining differ in:
  • traversal order of the prefix tree (breadth-first/levelwise vs. depth-first traversal)
  • transaction representation: horizontal (item arrays) vs. vertical (transaction lists) vs. specialized data structures like FP-trees
  • types of frequent item sets found: frequent vs. closed vs. maximal item sets (additional pruning methods for closed and maximal item sets)

• additional filtering is necessary to reduce the size of the output (not discussed in this talk)

C. Moewes Temporal Data Mining Zittau, September 9, 2011 46 / 114

Page 57:

Association Rules: Basic Notions

• often found patterns are expressed as association rules, e.g.

If a customer buys bread and wine, then she/he will probably also buy cheese.

• formally, we consider rules of the form X → Y , with X, Y ⊆ A and X ∩ Y = ∅

• support of a rule X → Y :

either: ςT (X → Y ) = σT (X ∪ Y ) (more common: correct rule)

or: ςT (X → Y ) = σT (X) (more plausible: applicable rule)

• confidence of a rule X → Y :

cT (X → Y ) = σT (X ∪ Y ) / σT (X) = sT (X ∪ Y ) / sT (X) = sT (I) / sT (X), where I = X ∪ Y

can be seen as an estimate of P(Y | X)

C. Moewes Temporal Data Mining Zittau, September 9, 2011 47 / 114
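
The two support variants and the confidence can again be written down directly (an illustrative sketch; the helper names are ours, and the transactions list is the running 10-transaction example):

def supp(item_set, T):
    """Absolute support s_T(I)."""
    return sum(1 for t in T if item_set <= t)

def confidence(X, Y, T):
    """c_T(X -> Y) = s_T(X u Y) / s_T(X), an estimate of P(Y | X)."""
    return supp(X | Y, T) / supp(X, T)

def rule_support(X, Y, T, applicable=False):
    """sigma_T(X u Y) ('correct rule') or, if applicable=True, sigma_T(X) ('applicable rule')."""
    return (supp(X, T) if applicable else supp(X | Y, T)) / len(T)

transactions = [set(chars) for chars in
                ("ade", "bcd", "ace", "acde", "ae", "acd", "bc", "acde", "bce", "ade")]
print(confidence({"c", "e"}, {"a"}, transactions))     # 0.75, cf. the worked example below
print(rule_support({"c", "e"}, {"a"}, transactions))   # 0.3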

Page 58:

Association Rules: Formal Definition

given:

• set A = {a1, . . . , am} of items,

• vector T = (t1, . . . , tn) of transactions over A,

• real number ςmin, 0 < ςmin ≤ 1, minimum support,

• real number cmin, 0 < cmin ≤ 1, minimum confidence

desired:

• set of all association rules, i.e. set

R = {R : X → Y | ςT (R) ≥ ςmin ∧ cT (R) ≥ cmin}

general procedure:

• find frequent item sets

• construct rules and filter them w.r.t. ςmin and cmin

C. Moewes Temporal Data Mining Zittau, September 9, 2011 48 / 114

Page 59:

Generating Association Rules

• which minimum support has to be used for finding the frequent item sets depends on the definition of the support of a rule:
  • if ςT (X → Y ) = σT (X ∪ Y ), then σmin = ςmin or equivalently smin = ⌈n ςmin⌉
  • if ςT (X → Y ) = σT (X), then σmin = ςmin cmin or equivalently smin = ⌈n ςmin cmin⌉

• after the frequent item sets have been found, rule construction traverses all frequent item sets I and splits them into disjoint subsets X and Y (X ∩ Y = ∅ and X ∪ Y = I), thus forming rules X → Y

• filtering rules w.r.t. confidence is always necessary
• filtering rules w.r.t. support is only necessary if ςT (X → Y ) = σT (X)

C. Moewes Temporal Data Mining Zittau, September 9, 2011 49 / 114

Page 60:

Properties of the Confidence

• from ∀I : ∀J ⊆ I : sT (I) ≤ sT (J) it obviously follows

∀X, Y : ∀a ∈ X : sT (X ∪ Y ) / sT (X) ≥ sT (X ∪ Y ) / sT (X − {a})

and therefore

∀X, Y : ∀a ∈ X : cT (X → Y ) ≥ cT (X − {a} → Y ∪ {a}),

i.e. moving an item from the antecedent to the consequent cannot increase the confidence of a rule

• immediate consequence:

∀X, Y : ∀a ∈ X : cT (X → Y ) < cmin → cT (X − {a} → Y ∪ {a}) < cmin

i.e. if a rule fails to meet the minimum confidence, then no rules over the same item set and with a larger consequent need to be considered

C. Moewes Temporal Data Mining Zittau, September 9, 2011 50 / 114

Page 61:

Generating Association Rules

function rules (F)                       (∗ generate association rules ∗)
begin
  R := ∅;                                (∗ initialize the set of rules ∗)
  forall f ∈ F do begin                  (∗ traverse the frequent item sets ∗)
    m := 1;                              (∗ start with rule heads (consequents) ∗)
    Hm := ⋃_{i ∈ f} {{i}};               (∗ that contain only one item ∗)
    repeat                               (∗ traverse rule heads of increasing size ∗)
      forall h ∈ Hm do                   (∗ traverse the possible rule heads ∗)
        if sT (f ) / sT (f − h) ≥ cmin   (∗ if the confidence is high enough, ∗)
        then R := R ∪ {[(f − h) → h]};   (∗ add the rule to the result ∗)
        else Hm := Hm − {h};             (∗ otherwise discard the head ∗)
      Hm+1 := candidates(Hm);            (∗ create heads with one item more ∗)
      m := m + 1;                        (∗ increment the head item counter ∗)
    until Hm = ∅ or m ≥ |f |;            (∗ until there are no more rule heads ∗)
  end;                                   (∗ or the antecedent would become empty ∗)
  return R;                              (∗ return the rules found ∗)
end (∗ rules ∗)

C. Moewes Temporal Data Mining Zittau, September 9, 2011 51 / 114
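
A simplified Python counterpart of this procedure (an illustrative sketch; unlike the pseudocode it does not grow the consequents levelwise but simply tries every non-empty proper subset of a frequent item set as consequent):

from itertools import combinations

def generate_rules(frequent, c_min):
    """`frequent` maps frozenset -> absolute support (e.g. the output of the
    Apriori sketch earlier); returns (antecedent, consequent, confidence) triples."""
    rules = []
    for f, s_f in frequent.items():
        for k in range(1, len(f)):
            for head in combinations(sorted(f), k):
                Y = frozenset(head)
                X = f - Y
                conf = s_f / frequent[X]   # every subset of f is frequent, so X is in the dict
                if conf >= c_min:
                    rules.append((set(X), set(Y), conf))
    return rules

# example use with the `freq` dictionary from the Apriori sketch:
# for X, Y, conf in generate_rules(freq, c_min=0.8):
#     print(X, "->", Y, round(conf, 3))    # yields the six rules listed a few slides below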

Page 63:

Frequent Item Sets: Example

transaction vector:
1: {a,d,e}   2: {b,c,d}   3: {a,c,e}   4: {a,c,d,e}   5: {a,e}
6: {a,c,d}   7: {b,c}     8: {a,c,d,e} 9: {c,b,e}     10: {a,d,e}

frequent item sets (relative support):
0 items: ∅: 1
1 item:  {a}: .7   {b}: .3   {c}: .7   {d}: .6   {e}: .7
2 items: {a,c}: .4   {a,d}: .5   {a,e}: .6   {b,c}: .3   {c,d}: .4   {c,e}: .4   {d,e}: .4
3 items: {a,c,d}: .3   {a,c,e}: .3   {a,d,e}: .4

• minimum support is smin = 3 or σmin = 0.3 = 30% in this example

• 2⁵ = 32 possible item sets over A = {a, b, c, d, e}

• 16 frequent item sets (but only 10 transactions)

C. Moewes Temporal Data Mining Zittau, September 9, 2011 53 / 114

Page 67:

Generating Association Rules

example: I = {a, c, e}, X = {c, e}, Y = {a}

cT (c, e → a) = sT ({a, c, e}) / sT ({c, e}) = 30% / 40% = 75%

minimum confidence: 80%

rule         support of all items   support of antecedent   confidence
b → c        .3                     .3                      1
d → a        .5                     .6                      .833
e → a        .6                     .7                      .857
a → e        .6                     .7                      .857
d, e → a     .4                     .4                      1
a, d → e     .4                     .5                      .8

C. Moewes Temporal Data Mining Zittau, September 9, 2011 54 / 114

Page 68:

Support of an Association Rule

the two rule support definitions are not equivalent:

transaction vector:
1: {a,c,e}   2: {b,d}   3: {b,c,d}   4: {a,e}
5: {a,b,c,d} 6: {c,e}   7: {a,b,d}   8: {a,c,d}

two association rules:

rule     support of all items   support of antecedent   confidence
a → c    3 (37.5%)              5 (62.5%)               60.0%
b → d    4 (50.0%)              4 (50.0%)               100.0%

let the minimum confidence be cmin = .65

• for ςT (R) = σT (X ∪ Y ) and 3 < ςmin ≤ 4 only the rule b → d is generated, but not the rule a → c

• for ςT (R) = σT (X) there is no value ςmin that generates only the rule b → d, but not at the same time also the rule a → c

C. Moewes Temporal Data Mining Zittau, September 9, 2011 55 / 114

Page 69:

Additional Rule Filtering

Simple Measures

• general idea: compare pT (Y | X) = cT (X → Y ) and pT (Y ) = cT (∅ → Y ) = σT (Y )

• (absolute) confidence difference to prior:

dT (R) = |cT (X → Y ) − σT (Y )|

• (absolute) difference of confidence quotient to 1:

qT (R) = | 1 − min{ cT (X → Y ) / σT (Y ), σT (Y ) / cT (X → Y ) } |

• confidence to prior ratio (lift):

lT (R) = cT (X → Y ) / σT (Y )

C. Moewes Temporal Data Mining Zittau, September 9, 2011 56 / 114
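
These simple measures are equally short in code (an illustrative sketch, reusing supp(), confidence() and the transactions list from the sketch a few slides above):

def lift(X, Y, T):
    """l_T(R) = c_T(X -> Y) / sigma_T(Y)."""
    return confidence(X, Y, T) / (supp(Y, T) / len(T))

def confidence_diff(X, Y, T):
    """d_T(R) = |c_T(X -> Y) - sigma_T(Y)|."""
    return abs(confidence(X, Y, T) - supp(Y, T) / len(T))

print(round(lift({"d", "e"}, {"a"}, transactions), 3))             # 1.429: Y occurs more often given X
print(round(confidence_diff({"d", "e"}, {"a"}, transactions), 3))  # 0.3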

Page 70:

Additional Rule Filtering

More Sophisticated Measures

• consider a 2 × 2 contingency table or the estimated probability table:

counts:                              estimated probabilities:
          X ⊈ t    X ⊆ t                        X ⊈ t    X ⊆ t
Y ⊈ t     n00      n01      n0.       Y ⊈ t     p00      p01      p0.
Y ⊆ t     n10      n11      n1.       Y ⊆ t     p10      p11      p1.
          n.0      n.1      n..                 p.0      p.1      1

• n.. : total number of transactions
  n1. : number of transactions to which the rule is applicable
  n11 : number of transactions for which the rule is correct

  i.e. pij = nij / n.. ,  pi. = ni. / n.. ,  p.j = n.j / n..   for i, j ∈ {0, 1}

• general idea: use measures for the strength of the dependence of X and Y

C. Moewes Temporal Data Mining Zittau, September 9, 2011 57 / 114

Page 71:

An Information-theoretic Evaluation Measure
Information Gain [Kullback and Leibler, 1951, Quinlan, 1986]

based on the Shannon entropy H = − Σ_{i=1..n} pi log2 pi [Shannon, 1948]

Igain(X, Y ) = H(Y ) − H(Y |X )
            = − Σ_{i=1..kY} pi. log2 pi.  −  Σ_{j=1..kX} p.j ( − Σ_{i=1..kY} pi|j log2 pi|j )

• H(Y ): entropy of the distribution of Y

• H(Y |X ): expected entropy of the distribution of Y if the value of X becomes known

• H(Y ) − H(Y |X ): expected entropy reduction or information gain

C. Moewes Temporal Data Mining Zittau, September 9, 2011 58 / 114
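
For the 2 × 2 case of a single rule X → Y this boils down to a few lines (an illustrative sketch using the contingency-table notation n00, . . . , n11 of the previous slide; the check at the end applies it, as one possible example, to the rule a → c on the 8-transaction example a few slides above):

from math import log2

def entropy(probs):
    """Shannon entropy of a discrete distribution (zero probabilities are skipped)."""
    return -sum(p * log2(p) for p in probs if p > 0)

def information_gain(n00, n01, n10, n11):
    """I_gain(X, Y) = H(Y) - H(Y|X) from the 2x2 contingency table of a rule X -> Y."""
    n = n00 + n01 + n10 + n11
    h_y = entropy([(n00 + n01) / n, (n10 + n11) / n])     # marginal distribution of Y
    h_y_given_x = 0.0
    for col in ((n00, n10), (n01, n11)):                  # X not contained / X contained
        n_col = sum(col)
        if n_col:
            h_y_given_x += (n_col / n) * entropy([c / n_col for c in col])
    return h_y - h_y_given_x

# rule a -> c on the 8-transaction example: n00 = 1, n01 = 2, n10 = 2, n11 = 3
print(round(information_gain(1, 2, 2, 3), 3))   # about 0.003: knowing X = a says little about Y = c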

Page 72:

Summary Association Rules

• association rule induction is a two-step process
  • find frequent item sets (minimum support)
  • form relevant association rules (minimum confidence)

• generating association rules
  • form all possible association rules from the frequent item sets
  • filter “interesting” association rules based on minimum support and minimum confidence

• filtering association rules
  • compare rule confidence and consequent support
  • information gain
  • other measures, e.g. the χ² measure, . . .

C. Moewes Temporal Data Mining Zittau, September 9, 2011 59 / 114

Page 73:

Industrial Applications

• a car manufacturer collects servicing tasks on all their vehicles
  • what are interesting subgroups of cars?
  • how do these subgroups behave over time?
  • which cars’ suspension failure rate is strongly increasing in winter?

• a bank assesses credit contracts w.r.t. terminability
  • what changes were there in the past?
  • any common factors?
  • how to communicate this to non-statisticians, e.g. bankers?

• tracking user activity in a virtual environment
  • are there any oddities in user behavior?
  • how to parameterize “odd” things?

C. Moewes Temporal Data Mining Zittau, September 9, 2011 60 / 114

Page 74:

Or: What they have and what they want

data are

• high-dimensional

• many-valued

• time-stamped

results should be

• easy-to-understand patterns (rules)

• exploratory tools (visualization and inspection)

• natural way of interaction

• exploit temporal information (if desired)

C. Moewes Temporal Data Mining Zittau, September 9, 2011 61 / 114

Page 77: Temporal Data Miningfuzzy.cs.ovgu.de/.../moewes2011zittau.pdf · Temporal Data Mining • many data mining problems deal with temporal features • most prominent: (numeric/symbolic)

Rule Icons

• every rule
      〈A1 = a1 ∧ · · · ∧ Ak = ak〉 → C = c
  of a given rule set is represented as an icon
• for every possible item, a reserved segment on the outer border
• if an item is present in the antecedent, its segment is colored
• the interior encodes a rule measure, e.g. confidence


Rule Icons: Overlapping

• cover of 2 rules may be non-empty

• percentage bar to display mutual overlap

• special case: inclusion

Gender = male → Cancer = yes

Gender = male ∧ Smoker = yes → Cancer = yes


Rule Icons: Overlapping

• cover of 2 rules may be non-empty
• percentage bar to display mutual overlap
• general case:


Rule Icons: Location

• finally, arrange icons in two-dimensional chart

• choose 3 association rule measures for both axes and size of icon

• our suggestion for a rule X → Y: choose the following measures (see the sketch below)
  • x-coordinate: recall, i.e. cT(Y → X)
  • y-coordinate: lift, i.e. cT(X → Y) / σT(Y)
  • size: support, i.e. σT(X ∪ Y)
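A small sketch (illustrative names, not the slides' implementation) of how these three measures can be computed for a rule X → Y directly from a transaction list:

```python
def rule_measures(transactions, x, y):
    """Return (support, confidence, recall, lift) of the rule X -> Y."""
    n = len(transactions)
    n_x = sum(1 for t in transactions if x <= t)
    n_y = sum(1 for t in transactions if y <= t)
    n_xy = sum(1 for t in transactions if (x | y) <= t)
    support = n_xy / n                       # sigma_T(X u Y): icon size
    confidence = n_xy / n_x                  # c_T(X -> Y)
    recall = n_xy / n_y                      # c_T(Y -> X): x-coordinate
    lift = confidence / (n_y / n)            # c_T(X -> Y) / sigma_T(Y): y-coordinate
    return support, confidence, recall, lift

transactions = [{'a', 'b', 'c'}, {'a', 'b'}, {'b', 'c'}, {'a', 'c'}, {'a', 'b', 'd'}]
print(rule_measures(transactions, x={'a'}, y={'b'}))
```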


Real-world Example: Daimler AG

car database

• 300,000 vehicles

• subset of 180 attributes

• 2–300 values per attribute

• probabilistic dependency network


Real-world Example: Daimler AG


Real-world Example: Daimler AG

Explorative Analysis Tool


Real-world Example: ADAC

customer database

• car and customer information

• assessment of vehicle quality


Temporal Change of Rules

• why consider the temporal development of rules?
  (i.e. the change of certain rule evaluation measures)

• failure patterns usually do not arise all of a sudden, but rather evolve slowly over time

• a fixed problem takes a while to have a measurable effect

• how to present this evolution to the user?
  • create a time series for every measure used for locating and scaling the rule icon
  • interpolate between frames and present an animation

• problem: high number of rules


What does that look like?

real-world dataset


What does that look like?

obviously, there is a demand for post-processing the rule set


Temporal Change of Rules

1. divide dataset into reasonable time frames

2. run respective pattern induction algorithm

3. quantify each pattern w.r.t. any desired measure(s)

4. generate time series for each measure and each pattern

5. match time series against user-specified concept

6. rank them according to membership of concept


User-driven Post-processing

• often users have an idea in which direction to investigate

• however, they cannot explicitly phrase a query for data mining

• we can use "fuzzy" intentions to thin out the rule set, e.g.

  "Show me only those rules that had a strongly increasing support and an increasing confidence in the last quarter."

  or

  "Which patterns exhibit an increasing lift while the support was stable or at most slightly decreasing?"



User-driven Post-processing

1. specify a fuzzy partition on the change rate domain of every pattern evaluation measure

2. encode the user concept as a fuzzy antecedent

   e.g. "lift is unchanged and confidence is increasing"

       〈∆lift is unch ∧ ∆conf is incr〉

   will be evaluated as

       ⊤( µ^(unch)_∆lift(~a → c), µ^(incr)_∆conf(~a → c) )

   where ⊤ is a t-norm that represents the fuzzy conjunction (see the sketch below)
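A minimal sketch of this evaluation, assuming simple trapezoidal membership functions and the minimum t-norm; the partition cut points, rule names and change rates are made up for illustration.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership function with support (a, d) and core [b, c]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def mu_unchanged(delta):        # illustrative fuzzy set "unchanged" on the change rate domain
    return trapezoid(delta, -0.10, -0.02, 0.02, 0.10)

def mu_increasing(delta):       # illustrative fuzzy set "increasing"
    return trapezoid(delta, 0.02, 0.10, 10.0, 10.1)

def concept_membership(changes, t_norm=min):
    """Degree to which a rule matches 'lift is unchanged AND confidence is increasing'."""
    return t_norm(mu_unchanged(changes['lift']), mu_increasing(changes['conf']))

rules = {'r1': {'lift': 0.01, 'conf': 0.15}, 'r2': {'lift': 0.30, 'conf': 0.05}}
# order patterns w.r.t. their concept membership degrees
print(sorted(rules, key=lambda r: concept_membership(rules[r]), reverse=True))
```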


User-driven Post-processing

1. specify a fuzzy partition on the change rate domain of every pattern evaluation measure

2. encode user-concept as fuzzy antecedent

3. order patterns w.r.t. concept membership degrees


Summary Industrial Applications

requirements

• easy-to-understand patterns

• exploratory visual tools

• natural and intuitive interaction

• exploitation of temporal information

desired properties of rules

• almost parameter-free (support and confidence have a clear notion and can even be increased after induction)

• no black-box approach

• intuitive type of patterns (decision/business rules)

• natural way of treating missing values

• small data preprocessing overhead


Outline

1. Introduction

2. Association Rules and Frequent Item Sets

3. Frequent Sequence Mining

Canonical Form for Undirected Sequences

Allen’s Interval Relations

Temporal Interval Patterns

Quality Monitoring of Vehicles

4. Finding Motifs in Time Series Effectively


Frequent Sequence Mining

• directed vs. undirected sequences
  • e.g. temporal sequences are always directed
  • DNA sequences can be undirected (both directions can be relevant)

• multiple sequences vs. single sequence
  • multiple sequences: purchases with rebate cards, web server access protocols
  • single sequence: alarms in telecommunication networks

• (time) points vs. time intervals
  • points: DNA sequences, alarms in telecommunication networks
  • intervals: weather data, movement analysis (sports medicine)
  • further distinction: one object per (time) point vs. multiple objects


Frequent Sequence Mining

• consecutive subsequences vs. subsequences with gaps

• a c b a b c b a always counts as subsequence abc

• a c b a b c b c may not always count as subsequence abc

• existence of occurrence vs. counting occurrences
  • combinatorial counting (all occurrences)
  • maximal number of disjoint occurrences
  • temporal support (number of time window positions)
  • minimum occurrence (smallest interval)

• relation between objects in a sequence
  ◦ items: only precede and succeed
  ◦ labeled time points: t1 < t2, t1 = t2, and t1 > t2
  ◦ labeled time intervals: relations like before, starts, overlaps etc.


Frequent Sequence Mining

• directed sequences are easier to handle:
  • the (sub)sequence itself can be used as code word
  • only 1 possible code word per sequence (only 1 direction) → this code word is necessarily canonical

• consecutive subsequences are easier to handle:
  • fewer occurrences of a given subsequence
  • for each occurrence, exactly one possible extension
  • allows specialized data structures (similar to a tree)

• item sequences are easiest to handle:
  • only 2 possible relations and thus patterns are simple
  • other sequences are handled with state machines for containment tests


A Canonical Form for Undirected Sequences

• if the sequences to mine are not directed, a subsequence cannot be used as its own code word, because it does not have the prefix property

• reason: an undirected sequence can be read forward or backward

→ two possible code words, the smaller (or larger) of which may then be defined as the canonical code word

• examples (showing that the prefix property is violated):
  • assume: item order a < b < c . . . and the lexicographically smaller code word is the canonical one
  • sequence bab, which is canonical, has the prefix ba, but the canonical form of sequence ba is rather ab
  • sequence cabd, which is canonical, has the prefix cab, but the canonical form of sequence cab is rather bac

• consequence: we must look for a different way of forming code words (at least if we want the code to have the prefix property)


A Canonical Form for Undirected Sequences

one possibility to form code words having the prefix property:

• handle (sub)sequences of even and odd length separately

• in addition, forming the code word is started in the middle

  even length: the sequence am am−1 . . . a2 a1 b1 b2 . . . bm−1 bm
  is described by the code word a1 b1 a2 b2 . . . am−1 bm−1 am bm
  or by the code word b1 a1 b2 a2 . . . bm−1 am−1 bm am.

  odd length: the sequence am am−1 . . . a2 a1 a0 b1 b2 . . . bm−1 bm
  is described by the code word a0 a1 b1 a2 b2 . . . am−1 bm−1 am bm
  or by the code word a0 b1 a1 b2 a2 . . . bm−1 am−1 bm am.

• the lexicographically smaller of the 2 code words is the canonical code word

• such sequences are extended by adding a pair am+1 bm+1 or bm+1 am+1, i.e. by adding 1 item at the front and 1 item at the end (see the sketch below)
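A minimal sketch (illustrative, not the slides' code) of this construction: it builds both middle-out code words of an undirected sequence and returns the lexicographically smaller one as canonical.

```python
def canonical_code_word(seq):
    """Canonical code word of an undirected sequence, built from the middle outward."""
    n = len(seq)
    half = n // 2
    left = seq[:half][::-1]                        # a1, a2, ..., am (read outward to the left)
    right = seq[n - half:]                         # b1, b2, ..., bm (read outward to the right)
    middle = [] if n % 2 == 0 else [seq[half]]     # a0 only for odd length

    def interleave(first, second):
        word = list(middle)
        for x, y in zip(first, second):
            word.extend([x, y])
        return word

    # the lexicographically smaller of the two possible code words is canonical
    return min(interleave(left, right), interleave(right, left))

print(canonical_code_word("bab"))     # odd length:  ['a', 'b', 'b']
print(canonical_code_word("cabd"))    # even length: ['a', 'b', 'c', 'd']
```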


A Canonical Form for Undirected Sequences

code words defined in this way have prefix property:

• suppose the prefix property did not hold

• then w.l.o.g. there exists a canonical code word

      wm = a1 b1 a2 b2 . . . am−1 bm−1 am bm,

  whose prefix wm−1 is not canonical, where

      wm−1 = a1 b1 a2 b2 . . . am−1 bm−1,

• consequence: wm < vm, where

      vm = b1 a1 b2 a2 . . . bm−1 am−1 bm am,

  and vm−1 < wm−1, where

      vm−1 = b1 a1 b2 a2 . . . bm−1 am−1

• but: vm−1 < wm−1 implies vm < wm, because vm−1 is a prefix of vm and wm−1 is a prefix of wm; however, vm < wm contradicts wm < vm


A Canonical Form for Undirected Sequences

• generating and comparing the 2 possible code words takes linear time

• however, this can be improved by maintaining an additional piece of information

• for each sequence, a symmetry flag is computed:

      sm = ∧_{i=1}^{m} (ai = bi)

• the symmetry flag can be maintained in constant time with

      sm+1 = sm ∧ (am+1 = bm+1)

• permissible extensions depend on the symmetry flag:
  • if sm = true, it must be am+1 ≤ bm+1
  • if sm = false, any relation between am+1 and bm+1 is acceptable

• this rule guarantees: exactly the canonical extensions are created
• applying this rule to check a candidate extension takes constant time (see the sketch below)
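A tiny sketch (illustrative) of the constant-time check: given the current symmetry flag, only the permissible pairs are generated, and the flag is updated on the fly.

```python
def extensions(code_word, symmetric, items):
    """Yield (extended code word, new symmetry flag) for an undirected sequence pattern.

    The pair (a, b) is appended to the code word; a is prepended to the sequence,
    b is appended to it."""
    for a in items:
        for b in items:
            if symmetric and a > b:       # if s_m is true, require a_{m+1} <= b_{m+1}
                continue                  # otherwise any combination is permissible
            yield code_word + [a, b], symmetric and a == b

# extend the symmetric pattern 'aa' (s = True): the pair (b, a) is pruned
for word, flag in extensions(['a', 'a'], True, ['a', 'b']):
    print(word, flag)
```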


Sequences of Time Intervals

• a (labeled or attributed) time interval is a triple I = (s, e, l), where s is the start time, e is the end time and l is the associated label

• a time interval sequence is a set of (labeled) time intervals, of which we assume that they are maximal in the sense that for 2 intervals I1 = (s1, e1, l1) and I2 = (s2, e2, l2) with l1 = l2 we have either e1 < s2 or e2 < s1 (otherwise they are merged into 1 interval I = (min{s1, s2}, max{e1, e2}, l1))

• a time interval sequence database is a vector of time interval sequences

• time intervals can easily be ordered as follows (see the sketch below):
  let I1 = (s1, e1, l1) and I2 = (s2, e2, l2) be 2 time intervals; it is I1 ≺ I2 iff
  • s1 < s2 or
  • s1 = s2 and e1 < e2 or
  • s1 = s2 and e1 = e2 and l1 < l2

  due to the assumption made above, at the latest the 3rd criterion decides (equal start and end times imply different labels)
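A minimal sketch of this ordering (illustrative names): with intervals stored as (s, e, l) tuples, Python's lexicographic tuple comparison implements exactly the three criteria.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    s: float   # start time
    e: float   # end time
    l: str     # label

def precedes(i1: Interval, i2: Interval) -> bool:
    """I1 < I2 iff s1 < s2, or s1 = s2 and e1 < e2, or s1 = s2, e1 = e2 and l1 < l2."""
    return (i1.s, i1.e, i1.l) < (i2.s, i2.e, i2.l)

seq = [Interval(3, 7, 'B'), Interval(1, 5, 'C'), Interval(1, 5, 'A')]
print(sorted(seq))                      # tuple order = the order defined above
print(precedes(seq[2], seq[1]))         # True: equal start and end, 'A' < 'C'
```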


Allen’s Interval Relations

• due to their temporal extension, time intervals allow for different relations

• a commonly used set of relations between time intervals are Allen's interval relations [Allen, 1983]
  (a small sketch for computing them follows below)

  A before B          ↔   B after A
  A meets B           ↔   B is met by A
  A overlaps B        ↔   B is overlapped by A
  A is finished by B  ↔   B finishes A
  A contains B        ↔   B during A
  A is started by B   ↔   B starts A
  A equals B          ↔   B equals A
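A compact sketch (illustrative; it assumes the usual strict-inequality definitions) that determines Allen's relation of one interval with respect to another:

```python
def allen_relation(a, b):
    """Allen's relation of interval a w.r.t. interval b; intervals are (start, end) pairs."""
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:  return 'before'
    if e2 < s1:  return 'after'
    if e1 == s2: return 'meets'
    if e2 == s1: return 'is met by'
    if s1 == s2 and e1 == e2: return 'equals'
    if s1 == s2: return 'starts' if e1 < e2 else 'is started by'
    if e1 == e2: return 'finishes' if s1 > s2 else 'is finished by'
    if s1 < s2 and e1 > e2: return 'contains'
    if s1 > s2 and e1 < e2: return 'during'
    return 'overlaps' if s1 < s2 else 'is overlapped by'

print(allen_relation((1, 4), (2, 6)))   # overlaps
print(allen_relation((1, 8), (3, 5)))   # contains
```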


Temporal Interval Patterns [Kempe et al., 2008]

• a pattern must specify the relations between all referenced intervals
• this can conveniently be done with a matrix:

          A    B    C
    A     e    ?    ?
    B     ?    e    ?
    C     ?    ?    e

• such a temporal pattern matrix can also be interpreted as the adjacency matrix of a graph, which has the interval relationships as edge labels


Temporal Interval Patterns [Kempe et al., 2008]

• a pattern must specify the relations between all referenced intervals
• this can conveniently be done with a matrix:

          A    B    C
    A     e    o    b
    B     io   e    m
    C     a    im   e

• such a temporal pattern matrix can also be interpreted as the adjacency matrix of a graph, which has the interval relationships as edge labels


Temporal Interval Patterns [Kempe et al., 2008]

• a pattern must specify the relations between all referenced intervals
• this can conveniently be done with a matrix:

          A    B    C
    A     e         b
    B          e    m
    C          im   e

• such a temporal pattern matrix can also be interpreted as the adjacency matrix of a graph, which has the interval relationships as edge labels

• generally, input interval sequences may be represented as such graphs, thus mapping the problem to frequent (sub)graph mining

• however, relationships between time intervals are constrained (e.g. "B after A" and "C after B" imply "C after A")

• constraints can be exploited to obtain a simpler canonical form
• in canonical form, intervals are assigned in increasing time order to the rows and columns of the temporal pattern matrix



Support of Temporal Patterns

• the support of a temporal pattern w.r.t. a single sequence can be defined by
  • combinatorial counting (all occurrences)
  • maximal number of disjoint occurrences
  • temporal support (number of time window positions)
  • minimum occurrence (smallest interval)

• however, all of these definitions suffer from the fact that such a support is not anti-monotone or downward closed:

  (figure: one interval A containing two intervals B)

  the support of "A contains B" is 2, but the support of "A" is only 1

• nevertheless, an exhaustive pattern search can be ensured without having to abandon pruning with the Apriori property

• reason: with minimum occurrence counting, the relationship "contains" is the only one that can lead to support anomalies


Weakly Anti-Monotone / Downward Closed [Kempe et al., 2008]

• let P be a pattern space with subpattern relationship ⊏ and let s be a function from P to the real numbers, s : P → ℝ

• for a pattern S ∈ P, let P(S) = {R | R ⊏ S ∧ ∄ Q : R ⊏ Q ⊏ S} be the set of all parent patterns of S

• a function s on the pattern space P is called
  • strongly anti-monotone or strongly downward closed iff
      ∀S ∈ P : ∀R ∈ P(S) : s(R) ≥ s(S)
  • weakly anti-monotone or weakly downward closed iff
      ∀S ∈ P : ∃R ∈ P(S) : s(R) ≥ s(S)

• the support of temporal interval patterns is weakly anti-monotone (at least) if it is computed from minimal occurrences

• if temporal interval patterns are extended backwards in time, then the Apriori property can safely be used for pruning


Summary Frequent Sequence Mining

• different types of frequent sequence mining can be distinguished:
  • single and multiple sequences, directed and undirected sequences
  • items vs. (labeled) intervals, single and multiple objects per position
  • relations between objects, definition of pattern support

• all common types of frequent sequence mining possess canonical forms for which canonical extension rules can be found

• with these rules it is possible to check in constant time whether a possible extension leads to a result in canonical form

• a weakly anti-monotone support function can be enough to allow pruning with the Apriori property

• however, in this case: make sure that the canonical form assigns an appropriate parent pattern to ensure an exhaustive search


Quality Monitoring of Vehicles

101,250 vehicles

• garage stops

• vehicle configuration

• 1.4 million temporal intervals


Quality Monitoring of Vehicles



Pre-Production Vehicles



Outline

1. Introduction

2. Association Rules and Frequent Item Sets

3. Frequent Sequence Mining

4. Finding Motifs in Time Series Effectively

Time Series Representations

Symbolic Aggregate Approximation (SAX)

Motifs in Time Series

Sub-dimensional Motif: Example


Refresh: Data Mining in Time Series

• big challenge: to find useful information in time series

• typical problems: clustering, classification, frequent patternmining, association rules, visualization, anomaly detection

• because of the huge amount of data, problems often boil down to a search for reoccurring similar subsequences

• needed: similarity measure to compare subsequences

• e.g. Euclidean distance

      d(Q, C) = √( ∑_{i=1}^{n} (qi − ci)² )

  of 2 standard normally distributed subsequences Q = (q1, . . . , qn)ᵀ and C = (c1, . . . , cn)ᵀ

• problem: many comparisons, and the capacity of fast main memory is usually too small to load all the data


Memory-efficient Representations [Lin et al., 2007]

• problem: many, slow accesses to raw data

• solution: an approximation of the time series that fits into main memory and contains the interesting features

• e.g. discrete Fourier transformation (DFT), discrete wavelet transformation (DWT), piecewise linear (PLA) or adaptive piecewise constant approximation (APCA), singular value decomposition (SVD)

• here: symbolic representations

• advantage: algorithms from text processing and bioinformatics become applicable, e.g. hashing, Markov models, suffix trees etc.


Time Series Representations [Lin et al., 2007]

(shown as a taxonomy tree on the slide)

• model-based: HMM, ARMA
• non data-adaptive:
  • wavelets: orthonormal (Haar, Daubechies), bi-orthonormal (Coiflets, Symlets)
  • random, PAA, spectral (DFT, DCT, Chebyshev)
• data-dictated: clipped, phase-based, grid
• data-adaptive:
  • sorted coefficients
  • piecewise polynomial: piecewise linear (interpolation, regression), APCA
  • SVD
  • symbolic: NLG, strings (SAX: value-based, slope-based)
  • trees


The most common representations [Lin et al., 2007]

(figure: example approximations of a time series by DFT, PLA, Haar wavelet and APCA)


Piecewise Aggregate Approximation (PAA) [Lin et al., 2007]

(figure: reduction from 128 to 8 data points)


Symbolic Aggregate Approximation (SAX) [Lin et al., 2007]

• every sequence of length n becomes a word of defined length w over a chosen alphabet A = {α1, . . . , αa} with |A| = a

• simple algorithm:
  1. separate the subsequence into w equally sized intervals
  2. PAA: compute the mean value of each interval (as representative);
     C = (c1, . . . , cn)ᵀ is mapped onto C̄ = (c̄1, . . . , c̄w) with

         c̄i = (w / n) · ∑_{j = (n/w)(i−1) + 1}^{(n/w) i} cj

  3. map each mean value c̄i of C̄ onto one of the a letters with

         âi = αj ⇔ βj−1 ≤ c̄i ≤ βj

• assumptions: normally distributed value range of the PAA sequence and equiprobable occurrence of each letter

• mapping c̄i ↦ b ∈ A by "cutpoints" β1, . . . , βa−1


“Cutpoints” of the Normal Distribution[Lin et al., 2007]

|A|      3       4       5       6       7       8       9      10
β1    −0.43   −0.67   −0.84   −0.97   −1.07   −1.15   −1.22   −1.28
β2     0.43    0      −0.25   −0.43   −0.57   −0.67   −0.76   −0.84
β3             0.67    0.25    0      −0.18   −0.32   −0.43   −0.52
β4                     0.84    0.43    0.18    0      −0.14   −0.25
β5                             0.97    0.57    0.32    0.14    0
β6                                     1.07    0.67    0.43    0.25
β7                                             1.15    0.76    0.52
β8                                                     1.22    0.84
β9                                                             1.28

• cutpoints separate normal distribution in equiprobable regions
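A minimal SAX sketch (illustrative, not the authors' implementation): z-normalize, apply PAA, then discretize with the cutpoints for a = 3 from the table above; it assumes the sequence length is a multiple of w.

```python
import numpy as np

def sax(series, w, alphabet='abc', cutpoints=(-0.43, 0.43)):
    """Convert a 1-D sequence into a SAX word of length w (len(series) % w == 0)."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / x.std()                  # z-normalization
    paa = x.reshape(w, -1).mean(axis=1)           # PAA: mean of each of the w segments
    symbols = np.searchsorted(cutpoints, paa)     # region index of each segment mean
    return ''.join(alphabet[k] for k in symbols)

rng = np.random.default_rng(0)
print(sax(rng.normal(size=128), w=8))             # an 8-letter word over {a, b, c}
```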


SAX: Example [Lin et al., 2007]

• here: n = 128, w = 8, a = 3

• result: baabccbc


SAX: Distance Measure [Lin et al., 2007]

• PAA: lower bound of the Euclidean distance with

      dr(Q̄, C̄) = √(n / w) · √( ∑_{i=1}^{w} (q̄i − c̄i)² )

• SAX:

      d*(Q̂, Ĉ) = √(n / w) · √( ∑_{i=1}^{w} d*a(q̂i, ĉi)² )

• the distance d*a should be defined via a lookup table, e.g. for a = 4

          a      b      c      d
    a     0      0      0.67   1.34
    b     0      0      0      0.67
    c     0.67   0      0      0
    d     1.34   0.67   0      0
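A small companion sketch (illustrative): cell (r, c) of the lookup table is 0 for equal or adjacent letters and otherwise the distance between the nearest cutpoints; mindist plugs it into the formula above.

```python
import math

def sax_dist_table(cutpoints):
    """Lookup table d*_a for an alphabet of size len(cutpoints) + 1."""
    a = len(cutpoints) + 1
    return [[0.0 if abs(r - c) <= 1 else cutpoints[max(r, c) - 1] - cutpoints[min(r, c)]
             for c in range(a)] for r in range(a)]

def mindist(word1, word2, n, table, alphabet='abcd'):
    """Lower bound of the Euclidean distance of the original sequences of length n."""
    w = len(word1)
    idx = {ch: i for i, ch in enumerate(alphabet)}
    return math.sqrt(n / w) * math.sqrt(
        sum(table[idx[p]][idx[q]] ** 2 for p, q in zip(word1, word2)))

table = sax_dist_table([-0.67, 0.0, 0.67])        # cutpoints for a = 4
print(table[0][3])                                # 1.34, the entry for letters 'a' and 'd'
print(round(mindist('abdc', 'aadb', n=32, table=table), 3))
```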


Comparison of Distances [Lin et al., 2007]


SAX Advantage: Lower Bound [Lin et al., 2007]

• d*(Q̂, Ĉ) is a lower bound of the Euclidean distance d(Q, C) of the original sequences Q and C

      d*(Q̂, Ĉ) ≤ d(Q, C)

• if Q̂ and Ĉ are dissimilar, so are Q and C

• SAX-based algorithms produce identical results compared to algorithms running on the original data

• "only" similar SAX words should be compared in the original space

• usually, only a few accesses to the original data are needed


Find Motifs in Time Series [Chiu et al., 2003]

• motifs: primitive, frequent (similar) patterns, prototypes

• challenges:
  • motifs are unknown beforehand
  • a complete search is expensive, i.e. O(n²)
  • outliers influence the Euclidean distance


Generation of SAX Matrix [Chiu et al., 2003]

• find all motifs of a time series of length m using a "sliding window"

• a window of width n leads to (m − n + 1) subsequences

• transform every subsequence into a SAX word of length w

• save it in a row of a matrix, i.e. the SAX matrix
  • w columns, (m − n + 1) rows


Random Projection [Chiu et al., 2003]

• guess motif positions by so-called random projection

• pairwise comparison of SAX words

• collision matrix M with (m − n + 1)² cells for all comparisons

• use hash table to implement M efficiently!

• initially, M(i , j) = 0 for 1 ≤ i , j ≤ m − n + 1

• idea: compare character after character of 2 words in SAX matrix

• assumption: "don't care symbols" in sequences with unknown position

• e.g. noisy motifs, dilated or contracted sequence


Random Projection [Chiu et al., 2003]

• thus the SAX matrix is projected onto 1 ≤ k < w randomly chosen columns

• compare all rows of the projected matrix
  • if 2 projected SAX words in rows i and j are equal, then increment M(i, j)
  • repeat the projection t times, because it is likely that some motifs will share one cell in M after t iterations
  • many random sequences will most likely not collide with an already found motif

• user-defined threshold s with 1 ≤ s ≤ k for the collision entries in M
• all M(i, j) ≥ s would be candidate motifs
• but: there are very similar sequences in the immediate neighborhood of sequence i (so-called trivial matches)
• these must be removed! (see the sketch below)
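A compact sketch of the collision counting step (illustrative; the SAX words are given directly, and the collision matrix is kept as a sparse dictionary):

```python
from collections import defaultdict
from itertools import combinations
import random

def random_projection(sax_words, k, t, seed=0):
    """Count collisions of SAX words projected onto k random columns, repeated t times."""
    rng = random.Random(seed)
    w = len(sax_words[0])
    collisions = defaultdict(int)                   # sparse collision matrix M
    for _ in range(t):
        cols = rng.sample(range(w), k)              # random projection mask
        buckets = defaultdict(list)
        for row, word in enumerate(sax_words):
            buckets[tuple(word[c] for c in cols)].append(row)
        for rows in buckets.values():               # equal projected words -> collision
            for i, j in combinations(rows, 2):
                collisions[(i, j)] += 1
    return collisions

words = ['abba', 'abca', 'abba', 'ccba', 'abba']
M = random_projection(words, k=2, t=10)
s = 8                                               # user-defined threshold
print([pair for pair, count in M.items() if count >= s])    # candidate motif positions
```

(Trivial matches, i.e. rows from the immediate neighborhood of each other, would still have to be filtered out.)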


Random Projection: First Two Iterations [Chiu et al., 2003]


Sub-dimensional Motifs [Minnen et al., 2007]

• so far: univariate symbolic time series

• random projection can also be used for multivariate symbolic time series

• idea: increment the collision matrix M for each variable j ∈ {1, . . . , p} for each projected SAX word

• problem: the relevant dimensions of potential sub-dimensional motifs are unknown

• solution (see the sketch below):
  • estimate the distribution P(dj) over distances between non-trivial matches by drawing a sample
  • compute the distances d*_1, . . . , d*_p for each entry M(i, j) ≥ s
  • if P(dj ≤ d*_j) < r_j^rel (user-specific dimension relevance), then the j-th variable is considered relevant
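A rough sketch of the relevance test (illustrative: the empirical CDF of a drawn sample stands in for P(dj), and all numbers are made up):

```python
import numpy as np

def relevant_dimensions(sample_dists, motif_dists, r_rel):
    """For each variable j, test P(d_j <= d*_j) < r_rel_j using the empirical
    distribution of distances between randomly sampled non-trivial matches."""
    relevant = []
    for j, (sample, d_star, r) in enumerate(zip(sample_dists, motif_dists, r_rel)):
        p = float(np.mean(np.asarray(sample) <= d_star))    # empirical P(d_j <= d*_j)
        if p < r:
            relevant.append(j)
    return relevant

rng = np.random.default_rng(1)
sample_dists = [rng.normal(5.0, 1.0, 200),     # typical distances in variable 0
                rng.normal(5.0, 1.0, 200)]     # typical distances in variable 1
motif_dists = [1.5, 5.2]                       # d*_j of one candidate: small only in variable 0
print(relevant_dimensions(sample_dists, motif_dists, r_rel=[0.05, 0.05]))   # [0]
```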


Sub-dimensional Motif: Example [Moewes and Kruse, 2009]

• expert identified p = 9 of 130 variables as important

• motifs last at least n = 400 ms

• given: 10 time series with unknown sub-dimensional motifs


Sub-dimensional Motif in two Time Series [Moewes and Kruse, 2009]

(figure: two multivariate time series plotted over Time [s], with panels for the variables attr_0 to attr_8; the occurrences of the found sub-dimensional motif are marked in the panels of attr_1 and attr_3)


Clustering of Motifs [Moewes and Kruse, 2009]

(figure: cluster dendrogram of the found motif occurrences; hierarchical clustering with hclust and Ward linkage on the dissimilarity matrix)

• create a dissimilarity matrix by pairwise comparison of all found motifs in the 10 time series, based on d*

• positive, symmetric matrix with zeros on the main diagonal

• can be used to cluster occurrences, which helps finding motifs that occur in several time series

• here: hierarchical clustering of the motifs containing the variables attr_1 and attr_3 (see the sketch below)
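A minimal sketch of this clustering step, assuming SciPy is available: Ward linkage on a precomputed, symmetric dissimilarity matrix of motif occurrences (the toy matrix is made up).

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# toy dissimilarity matrix of 4 motif occurrences (symmetric, zeros on the diagonal)
D = np.array([[0.0, 0.2, 0.9, 0.8],
              [0.2, 0.0, 0.85, 0.9],
              [0.9, 0.85, 0.0, 0.1],
              [0.8, 0.9, 0.1, 0.0]])

Z = linkage(squareform(D), method='ward')       # hierarchical clustering of occurrences
print(fcluster(Z, t=2, criterion='maxclust'))   # two groups of occurrences, e.g. [1 1 2 2]
```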


Thanks to . . .

This talk wouldn't be possible without the work of

• Christian Borgelt, European Centre for Soft Computing, http://www.borgelt.net

• Steffen Kempe, Daimler AG

• Rudolf Kruse, University of Magdeburg, http://fuzzy.cs.ovgu.de

• Matthias Steinbrecher, SAP AG

Examples from this talk are based on real-world problems of the following companies:

• ADAC

• Dresdner Bank

• Daimler AG

• Second Life


Literature I

Agrawal, R. and Srikant, R. (1994).

Fast algorithms for mining association rules in large databases.

In Bocca, J. B., Jarke, M., and Zaniolo, C., editors, Proceedings of the 20th International Conference on Very Large Data Bases (VLDB '94), pages 487–499, San Francisco, CA, USA. Morgan Kaufmann Publishers, Inc.

Allen, J. F. (1983).

Maintaining knowledge about temporal intervals.

Communications of the ACM, 26:832–843.

Chiu, B., Keogh, E., and Lonardi, S. (2003).

Probabilistic discovery of time series motifs.

In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 493–498, Washington, D.C. ACM.


Literature II

Kempe, S., Hipp, J., Lanquillon, C., and Kruse, R. (2008).

Mining frequent temporal patterns in interval sequences.

International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 16(5):645–661.

Kullback, S. and Leibler, R. A. (1951).

On information and sufficiency.

The Annals of Mathematical Statistics, 22(1):79–86.

Lin, J., Keogh, E., Wei, L., and Lonardi, S. (2007).

Experiencing SAX: a novel symbolic representation of time series.

Data Mining and Knowledge Discovery, 15(2):107–144.


Literature III

Minnen, D., Isbell, C., Essa, I., and Starner, T. (2007).

Detecting subdimensional motifs: An efficient algorithm for generalizedmultivariate pattern discovery.

In Proceedings of the 2007 Seventh IEEE International Conference on Data Mining, pages 601–606, Los Alamitos, CA, USA. IEEE Computer Society.

Moewes, C. and Kruse, R. (2009).

Zuordnen von linguistischen Ausdrücken zu Motiven in Zeitreihen [Assigning linguistic terms to motifs in time series].

at-Automatisierungstechnik, 57(3):146–154.

Quinlan, J. R. (1986).

Induction of decision trees.

Journal of Machine Learning, 1(1):81–106.

Shannon, C. E. (1948).

A mathematical theory of communication.

Bell System Technical Journal, 27(3):379–423.
