An Inductive Database for Mining Temporal Patterns in Event Sequences

Alexandre Vautier, Marie-Odile Cordier and René Quiniou

[email protected]

RENNES - France



Page 2

The application: cardiac arrhythmias

(Figure: electrocardiograms showing a normal rhythm and an abnormal rhythm, with the P wave, a normal QRS complex and an abnormal QRS complex labeled.)

M. Rabbit, you suffer from bigeminy, a severe cardiac arrhythmia.

M. Dog, you are OK!

…OK, but how can a specific arrhythmia be automatically characterized?

Page 3

A problem definition close to supervised machine learning

Discretized and labeled electrocardiograms:

P: p,Q,p,q,p,Q,p,Q

N: p,q,p,q,p,q,p,q

N: p,Q,p,p,Q,p,p,Q

…OK, which patterns are frequent in the sequence labeled P but not frequent in the sequences labeled N?

Frequent temporal patterns: the temporal patterns representative of the sequence labeled P.

Page 4

Formalization of the problem: the framework of inductive databases (IDB)

An IDB holds both sequences and temporal patterns.

Sequences: $P = \{L_{bigeminy}\}$ and $N = \{L_{lbbb}, L_{mobitz}, L_{normal}\}$

Temporal patterns: $\{C \mid freq(C, L_{bigeminy}) \ge T_0\}$, $\{C \mid freq(C, L_{lbbb}) \ge T_1\}$, $\{C \mid freq(C, L_{mobitz}) \ge T_2\}$, $\{C \mid freq(C, L_{normal}) \ge T_3\}$, $\{C \mid Q_{expert}(P,N,T,C)\}$

…OK, which temporal patterns $C$ satisfy $Q_{expert}(P,N,T,C) = (\exists L \in P,\ freq(C,L) \ge T_L) \wedge (\forall L \in N,\ freq(C,L) < T_L)$?
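The query $Q_{expert}$ reads directly as a program: one existential test over P and one universal test over N. The Python sketch below evaluates it on the discretized sequences of page 3. It assumes a toy frequency, `toy_freq`, that counts contiguous occurrences of a symbol pattern. This is a stand-in for chronicle frequency, not the authors' definition. The names `toy_freq`, `q_expert`, `L_P`, `L_N1` and `L_N2` are hypothetical.

```python
from typing import Callable, Dict, Sequence

Pattern = Sequence[str]
EventSeq = Sequence[str]


def toy_freq(pattern: Pattern, seq: EventSeq) -> int:
    """Hypothetical stand-in for freq(C, L): number of positions where
    `pattern` occurs as a contiguous block of `seq` (overlaps allowed)."""
    k = len(pattern)
    return sum(1 for i in range(len(seq) - k + 1)
               if list(seq[i:i + k]) == list(pattern))


def q_expert(P: Dict[str, EventSeq], N: Dict[str, EventSeq],
             T: Dict[str, int], C: Pattern,
             freq: Callable[[Pattern, EventSeq], int] = toy_freq) -> bool:
    """(exists L in P: freq(C, L) >= T_L) and (for all L in N: freq(C, L) < T_L)."""
    frequent_in_p = any(freq(C, seq) >= T[name] for name, seq in P.items())
    rare_in_n = all(freq(C, seq) < T[name] for name, seq in N.items())
    return frequent_in_p and rare_in_n


# Discretized sequences of page 3 (p = P wave, q = normal QRS, Q = abnormal QRS).
P = {"L_P": "p,Q,p,q,p,Q,p,Q".split(",")}
N = {"L_N1": "p,q,p,q,p,q,p,q".split(","),
     "L_N2": "p,Q,p,p,Q,p,p,Q".split(",")}
T = {"L_P": 1, "L_N1": 1, "L_N2": 1}

print(q_expert(P, N, T, ["q", "p", "Q"]))  # True
print(q_expert(P, N, T, ["p", "Q"]))       # False: also frequent in L_N2
```

The per-sequence thresholds are passed as a dictionary keyed by sequence name, mirroring the indexed $T_L$ of the query.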

Page 5

Plan

Introduction

Problem features: sequences, chronicles

Inductive databases: order relation, what is frequency?

Algorithms: Frequent Minimal Chronicles Search (Fmc Search), querying the IDB

Experiments and problems to be solved

Conclusion and future work

Page 6

Features of sequences

Long sequences of time-stamped events with few event types.

Numerical temporal information is of major importance.

An example of an event sequence:

(Figure: events A B A B A B B A A B … placed on a time axis from 0 to 15.)

Page 7

Features of temporal patterns: chronicles

Chronicle: a set of temporally constrained events. A chronicle may contain several events of the same type. It specifies numerical temporal constraints between events: the uncertain delay is represented by an interval $[d_{min}, d_{max}]$, with $d_{min}, d_{max} \in \mathbb{Z}$. It is easily readable by an expert of the application domain.

Example chronicle C: events (C, t0), (A, t1) and (B, t2), with the temporal constraints [5;10] and [-2;20].

Event sequence: an ordered list of time-stamped events, e.g. L = C,1 B,5 A,8 B,10 C,15 C,16 A,26 A,27 B,34

The instances of C in L form the set IC(L).
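A brute-force Python sketch makes the notion of instance concrete by enumerating IC(L) for the example above. The slide does not say which pairs of events carry the intervals [5;10] and [-2;20]. The sketch assumes $t_1 - t_0 \in [5;10]$ and $t_2 - t_1 \in [-2;20]$, and that each chronicle event is matched to a distinct sequence event. The function name `instances` is hypothetical. Its cost grows as the product of the candidate lists, so it only suits toy inputs.

```python
from itertools import product
from typing import Dict, List, Tuple

Event = Tuple[str, int]                                # (event type, timestamp)
Constraints = Dict[Tuple[int, int], Tuple[int, int]]   # (i, j) -> [dmin, dmax] on t_j - t_i


def instances(types: List[str], constraints: Constraints,
              seq: List[Event]) -> List[Tuple[Event, ...]]:
    """Brute-force IC(L): every assignment of distinct sequence events (of the
    right types) to the chronicle events that satisfies all the intervals."""
    candidates = [[k for k, e in enumerate(seq) if e[0] == ty] for ty in types]
    found = []
    for combo in product(*candidates):
        if len(set(combo)) < len(combo):   # each sequence event used at most once
            continue
        times = [seq[k][1] for k in combo]
        if all(lo <= times[j] - times[i] <= hi
               for (i, j), (lo, hi) in constraints.items()):
            found.append(tuple(seq[k] for k in combo))
    return found


# Chronicle C of the slide; which pairs the intervals constrain is an assumption.
C_types = ["C", "A", "B"]
C_constraints = {(0, 1): (5, 10), (1, 2): (-2, 20)}
L = [("C", 1), ("B", 5), ("A", 8), ("B", 10), ("C", 15),
     ("C", 16), ("A", 26), ("A", 27), ("B", 34)]

print(instances(C_types, C_constraints, L))
# [(('C', 1), ('A', 8), ('B', 10)), (('C', 16), ('A', 26), ('B', 34))]
```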

Page 8

Inductive database: required definitions

A query language that makes use of frequency constraints $freq(C,L) \ge T$ and $freq(C,L) \le T$

If a frequency constraint satisfies a monotonicity or anti-monotonicity property, then a frequency-based search is easier to compute, because whole regions of the pattern space can be pruned.

An order relation on chronicles must be defined

Page 9

An order relation on chronicles

C is more general than C’ ($C \sqsubseteq C'$) iff each event of C can be matched to an event of C’, and each temporal constraint of C is more general than (its interval contains) the corresponding constraint in C’.

Example (figure): C has the events (A, t0) and (B, t1) with the constraint [8;21]. C’ has the events (A, t2), (B, t3) and (B, t4) with the constraints [5;10] and [9;20]. Here $C \sqsubseteq C'$.
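The order relation can be tested by brute force over all matchings. This Python sketch assumes the matching of events is one-to-one and type-preserving. It only compares constraints stated directly in C’; it does not propagate constraints along paths, which a complete implementation would need. The helper names, and the assignment of [5;10] and [9;20] to the pairs (A, first B) and (A, second B) of C’, are assumptions.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Interval = Tuple[float, float]
# A chronicle: its event types, and constraints (i, j) -> [dmin, dmax] on t_j - t_i.
Chronicle = Tuple[List[str], Dict[Tuple[int, int], Interval]]


def interval(chron: Chronicle, i: int, j: int) -> Interval:
    """Constraint on t_j - t_i stated directly in `chron`, else (-inf, +inf)."""
    cons = chron[1]
    if (i, j) in cons:
        return cons[(i, j)]
    if (j, i) in cons:  # t_i - t_j in [lo, hi]  <=>  t_j - t_i in [-hi, -lo]
        lo, hi = cons[(j, i)]
        return (-hi, -lo)
    return (float("-inf"), float("inf"))


def more_general(c: Chronicle, c2: Chronicle) -> bool:
    """C ⊑ C': some one-to-one, type-preserving matching of C's events to events
    of C' such that every interval of C contains the matched interval of C'."""
    types, cons = c
    types2 = c2[0]
    for match in permutations(range(len(types2)), len(types)):
        if any(types[k] != types2[m] for k, m in enumerate(match)):
            continue
        if all(lo <= interval(c2, match[i], match[j])[0]
               and interval(c2, match[i], match[j])[1] <= hi
               for (i, j), (lo, hi) in cons.items()):
            return True
    return False


# Slide example; which pairs of C' carry [5;10] and [9;20] is an assumption.
C = (["A", "B"], {(0, 1): (8, 21)})
C_prime = (["A", "B", "B"], {(0, 1): (5, 10), (0, 2): (9, 20)})
print(more_general(C, C_prime), more_general(C_prime, C))  # True False
```

Here the A event of C is matched to (A, t2) and its B event to (B, t4), since [9;20] is contained in [8;21].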

Page 10

How to compute the frequency of a chronicle in a sequence?

(Figure: a sequence L of A and B events on a time axis from 0 to 17; a chronicle C with the events A, B and B and the constraints [2,3], [-1,3] and [1,5]; the instances IC(L) of C in L.)

Is the frequency the cardinality of the set of all the instances? Of the minimal occurrences [Mannila, 97]? Of the earliest distinct instances [Dousson, 99]? Of the distinct instances?

Page 11

Monotonicity and anti-monotonicity properties

Constraints on frequency should satisfy monotonicity or anti-monotonicity properties.

(Figure: a sequence L = A A B B on a time axis from 0 to 5; chronicles C and C’ over the events A and B with the constraint [-2,2], with 2 and 3 instances; comparison of $freq(C,L)$ with $freq(C',L)$ by $\ge$ and $\le$.)

Minimal occurrences [Mannila, 97] have neither the monotonicity nor the anti-monotonicity property.

Page 12

Recognition criterion

Let IC(L) be the set of instances of the chronicle C in the sequence L

A recognition criterion selects a unique set E of instances from IC(L)

The frequency of the chronicle C in the sequence L according to the recognition criterion Q is

$freq_Q(C,L) = |E|$

A monotonic criterion is a recognition criterion Q such that

$C \sqsubseteq C' \Rightarrow freq_Q(C,L) \ge freq_Q(C',L)$
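A small example, checked by brute force, shows why a recognition criterion is needed: counting all the instances (E = IC(L)) is not a monotonic criterion. Take L = A at time 0, B at time 1, B at time 2. The chronicle C = {A} is more general than C’ = {A, B} with $t_B - t_A \in [-2;2]$, yet C has 1 instance and C’ has 2. The Python sketch below uses hypothetical helper names and the same brute-force instance search as before.

```python
from itertools import product
from typing import Dict, List, Tuple

Event = Tuple[str, int]
Constraints = Dict[Tuple[int, int], Tuple[int, int]]


def all_instances(types: List[str], cons: Constraints,
                  seq: List[Event]) -> List[Tuple[int, ...]]:
    """Brute-force IC(L), as tuples of sequence indices (one per chronicle event)."""
    candidates = [[k for k, e in enumerate(seq) if e[0] == ty] for ty in types]
    return [combo for combo in product(*candidates)
            if len(set(combo)) == len(combo)
            and all(lo <= seq[combo[j]][1] - seq[combo[i]][1] <= hi
                    for (i, j), (lo, hi) in cons.items())]


def freq_all(types: List[str], cons: Constraints, seq: List[Event]) -> int:
    """Candidate criterion 'all the instances': E = IC(L), so freq = |IC(L)|."""
    return len(all_instances(types, cons, seq))


L = [("A", 0), ("B", 1), ("B", 2)]
C = (["A"], {})                             # a single A event, no constraint
C_prime = (["A", "B"], {(0, 1): (-2, 2)})   # A and B with t_B - t_A in [-2, 2]
# C ⊑ C' (its only event is matched and it has no constraint to check),
# so a monotonic criterion would require freq(C, L) >= freq(C', L).
print(freq_all(*C, L), freq_all(*C_prime, L))  # 1 2
```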

Page 13

Fmc Search

Fmc Search (Frequent Minimal Chronicles Search) looks for chronicles C such that $freq_Q(C,L) \ge T$ and $maxwin(C) \le W$.

Input:
L: an event sequence
T: a minimum frequency threshold
Q: a recognition criterion (application dependent)
W: a maximal time window

Output: $Fmc_{Q,W}(L,T)$. Every chronicle of $Fmc_{Q,W}(L,T)$ satisfies 3 properties: it is as specific as possible; it generalizes at least T instances; these instances respect the recognition criterion Q.

Algorithm: Step 1, Step 2, Step 3 and Step 4 produce $Fmc_{Q,W}(L,T)$.

Page 14

Fmc Search, step 1: chronicle instance extraction

The instances of every frequent chronicle are extracted from the sequence L. Their temporal constraints are set to [-W,W]

Implemented in the software FACE (Frequency Analyser for Chronicle Extraction)


The numerical temporal constraints of chronicles found by FACE are not specific enough
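Take the simplest case: a chronicle with one A event and one B event. Step 1 then amounts to collecting every pair of distinct events whose delay lies in [-W,W]. The Python sketch below is a naive quadratic illustration of that output, not FACE's algorithm, and `window_instances` is a hypothetical name. The loose [-W,W] bound is what the later steps have to specialize.

```python
from typing import List, Tuple

Event = Tuple[str, int]  # (event type, timestamp)


def window_instances(seq: List[Event], a: str, b: str,
                     W: int) -> List[Tuple[Event, Event]]:
    """Instances of the chronicle {(a, t0), (b, t1)} with t1 - t0 in [-W, W]."""
    return [(x, y)
            for i, x in enumerate(seq) if x[0] == a
            for j, y in enumerate(seq)
            if j != i and y[0] == b and -W <= y[1] - x[1] <= W]


L = [("A", 0), ("B", 1), ("A", 5), ("B", 9)]
print(window_instances(L, "A", "B", 3))  # [(('A', 0), ('B', 1))]
```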

Page 15

Fmc Search, step 2: fuzzy clustering of instances

A fuzzy clustering of each set IC(L) found at step 1 is performed.

(Figure: the instances IC(L) of a chronicle over A and B, grouped into clusters.)

An instance has a membership degree to each cluster.
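The slides do not name the fuzzy clustering method. As an assumption, the Python sketch below uses fuzzy c-means on one-dimensional data, such as the delay $t_B - t_A$ of each instance of a two-event chronicle. The fuzzifier m = 2, the fixed number of iterations and the initial centers are arbitrary choices, and `fuzzy_cmeans_1d` is a hypothetical name.

```python
from typing import List, Tuple


def fuzzy_cmeans_1d(xs: List[float], centers: List[float], m: float = 2.0,
                    iters: int = 50) -> Tuple[List[float], List[List[float]]]:
    """Fuzzy c-means in 1-D. Returns (centers, u), where u[k][c] is the
    membership degree of point k in cluster c; each row of u sums to 1."""
    centers = list(centers)
    u: List[List[float]] = []
    for _ in range(iters):
        u = []
        for x in xs:
            d = [abs(x - c) for c in centers]
            if 0.0 in d:  # the point sits on a center: crisp membership
                row = [1.0 if di == 0.0 else 0.0 for di in d]
                row = [r / sum(row) for r in row]
            else:         # u_kc = 1 / sum_j (d_kc / d_kj)^(2 / (m - 1))
                row = [1.0 / sum((di / dj) ** (2.0 / (m - 1.0)) for dj in d)
                       for di in d]
            u.append(row)
        # Each center becomes the mean of the points weighted by u^m.
        centers = [sum((u[k][c] ** m) * xs[k] for k in range(len(xs)))
                   / sum(u[k][c] ** m for k in range(len(xs)))
                   for c in range(len(centers))]
    return centers, u


delays = [1.0, 1.2, 0.9, 10.0, 10.5, 9.8]  # e.g. t_B - t_A of six instances
centers, u = fuzzy_cmeans_1d(delays, [0.0, 12.0])
print([round(c, 2) for c in centers])
```

Every instance keeps a nonzero degree in every cluster, which is what step 3 relies on when it ranks instances by membership.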

Page 16

Fmc Search, step 3: chronicle construction from clusters

For each fuzzy cluster of step 2, the instances are sorted in decreasing order of their membership degree. The first T instances that respect the criterion Q are kept to construct a chronicle. This chronicle is the lgg (least general generalization) of the selected instances.

The specificity of the chronicles depends on the clustering.
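Step 3 can be sketched in Python under stated assumptions. An instance is a tuple of timestamps, one per chronicle event, and no two events share a timestamp. The criterion Q is modelled by a hypothetical predicate, `disjoint`, that forbids two selected instances from reusing an event. The lgg of the selected instances is then the chronicle whose constraint on each pair (i, j) is the tightest interval containing every observed delay $t_j - t_i$.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Instance = Tuple[int, ...]  # one timestamp per chronicle event, in chronicle order
Intervals = Dict[Tuple[int, int], Tuple[int, int]]


def lgg(instances: Sequence[Instance]) -> Intervals:
    """Tightest interval [dmin, dmax] on t_j - t_i (i < j) covering every instance."""
    n = len(instances[0])
    return {(i, j): (min(x[j] - x[i] for x in instances),
                     max(x[j] - x[i] for x in instances))
            for i in range(n) for j in range(i + 1, n)}


def disjoint(selected: List[Instance], cand: Instance) -> bool:
    """Hypothetical stand-in for criterion Q: no event (timestamp) is reused."""
    used = {t for inst in selected for t in inst}
    return used.isdisjoint(cand)


def build_chronicle(cluster: List[Tuple[Instance, float]], T: int,
                    respects_q: Callable[[List[Instance], Instance], bool] = disjoint
                    ) -> Optional[Intervals]:
    """Sort by decreasing membership degree, keep the first T instances that
    respect Q, and return their lgg (None if fewer than T qualify)."""
    selected: List[Instance] = []
    for inst, _membership in sorted(cluster, key=lambda p: p[1], reverse=True):
        if respects_q(selected, inst):
            selected.append(inst)
            if len(selected) == T:
                return lgg(selected)
    return None


# Instances (t_A, t_B) of a two-event chronicle with their membership degrees.
cluster = [((0, 3), 0.9), ((3, 6), 0.8), ((10, 12), 0.7), ((20, 30), 0.1)]
print(build_chronicle(cluster, 2))  # {(0, 1): (2, 3)}
```

The instance (3, 6) is skipped because it reuses the event at time 3, so (10, 12) completes the selection.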

Page 17

Fmc Search, step 4: chronicle filtering (keep the most specific)

The set of frequent minimal (maximally specific) chronicles $Fmc_{Q,W}(L,T)$ is computed by retaining only the most specific chronicles.

Monotonicity property: a chronicle C that satisfies $freq_Q(C,L) \ge T$ is more general than at least one chronicle of $Fmc_{Q,W}(L,T)$.
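Step 4 keeps the chronicles that are not strictly more general than another candidate. Consider the simplified case of chronicles over the same two events A and B that differ only by their interval. There, C ⊑ C’ reduces to interval containment, which gives the short Python sketch below. The function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # constraint [dmin, dmax] on t_B - t_A


def more_general(c: Interval, c2: Interval) -> bool:
    """For chronicles over the same two events, C ⊑ C' iff C's interval contains C''s."""
    return c[0] <= c2[0] and c2[1] <= c[1]


def most_specific(chronicles: List[Interval]) -> List[Interval]:
    """Keep the chronicles that are not strictly more general than another candidate."""
    return [c for c in chronicles
            if not any(d != c and more_general(c, d) for d in chronicles)]


candidates = [(0, 10), (2, 5), (3, 4), (6, 9)]
print(most_specific(candidates))  # [(3, 4), (6, 9)]
```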

Page 18

Querying the IDB

Remember my query: $Q_{expert}(P,N,T,C)$. For the explanation, $P = \{L_P\}$ and $N = \{L_N\}$, so the query reduces to $freq(C,L_P) \ge T_P \wedge freq(C,L_N) < T_N$.

A chronicle C satisfies this query iff C is more general than at least one chronicle of $Fmc_{Q,W}(L_P,T_P)$ (monotonicity property), and C is not more general than any chronicle of $Fmc_{Q,W}(L_N,T_N)$ (anti-monotonicity property).

The solutions form a version space (figure). An adaptation of Mitchell’s algorithm computes this version space.
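With the same two-event interval simplification, the characterization above becomes a direct test of one candidate chronicle against the two Fmc sets. This is only a membership check. Mitchell’s algorithm instead computes the boundaries of the whole version space. The function name and the example Fmc sets are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # constraint [dmin, dmax] on t_B - t_A


def more_general(c: Interval, c2: Interval) -> bool:
    """Interval containment: C ⊑ C' for two-event chronicles over the same events."""
    return c[0] <= c2[0] and c2[1] <= c[1]


def satisfies_query(c: Interval, fmc_p: List[Interval], fmc_n: List[Interval]) -> bool:
    """C ⊑ some chronicle of Fmc(L_P, T_P) and C ⊑ no chronicle of Fmc(L_N, T_N)."""
    frequent_in_p = any(more_general(c, f) for f in fmc_p)
    frequent_in_n = any(more_general(c, g) for g in fmc_n)
    return frequent_in_p and not frequent_in_n


fmc_p = [(3, 4)]  # hypothetical Fmc(L_P, T_P)
fmc_n = [(6, 9)]  # hypothetical Fmc(L_N, T_N)
print([satisfies_query(c, fmc_p, fmc_n) for c in [(0, 10), (2, 5), (3, 4), (5, 8)]])
# [False, True, True, False]
```

The candidate (0, 10) is rejected because it also generalizes a chronicle of $Fmc_{Q,W}(L_N,T_N)$, i.e. it is frequent in $L_N$ as well.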

Page 19

Experiments: characterization of cardiac arrhythmias

Data: 4 sequences of cardiac events derived from electrocardiograms labeled by an expert, containing about 4,000 events of 3 types (P waves, normal QRS complexes, abnormal QRS complexes).

A typical query:

$freq_{Q_d}(C,L_{bigeminy}) \ge 5\% \wedge freq_{Q_d}(C,L_{normal}) \le 10\% \wedge freq_{Q_d}(C,L_{mobitz}) \le 10\% \wedge freq_{Q_d}(C,L_{lbbb}) \le 10\% \wedge W = 3\,\mathrm{s}$

Page 20

Experiments: an example of a cardiac chronicle

Characterizes the bigeminy arrhythmia.

Also found by a supervised learning method (ILP) from ECGs [Carrault, 03].

p = “P wave”, q = “normal QRS”, Q = “abnormal QRS”

Page 21

Problems to be solved

Step 3 of Fmc Search has to cluster up to 180,000 instances per chronicle.

For a minimum threshold of 5%, up to 1,000 chronicles can be extracted from one sequence. This slows down Mitchell's algorithm dramatically.

Finding the Fmc is an NP-complete problem. The set $Fmc_{Q,W}(L_N,T_N)$ computed in practice is correct but not complete, so the results have to be filtered in order to give the correct solution.

(Figure: the optimal versus the practical $Fmc_{Q,W}(L_N,T_N)$.)

Page 22

Conclusion

An original method to extract temporal patterns in the form of chronicles. Chronicles express temporal constraints as numerical intervals.

A formalization of the problem in the framework of inductive databases, which provides the definition of an order relation on temporal patterns, and of a monotonic recognition criterion with the related frequency.

The management of numerical temporal constraints (a very hard task).

An algorithm that finds the Fmc in sequences.

A method to reuse and adapt Mitchell’s algorithm.

Page 23

Future work

Control the clustering step of Fmc Search in order to compute only the Fmcs needed by Mitchell’s algorithm.

Adapt Mitchell’s algorithm to provide an approximate solution whose quality is user-defined.

Extend the method to other measures of interest.

Explore new applications, such as intrusion detection.

Page 24

An IDB for Mining Temporal Patterns in Event Sequences

Alexandre Vautier, Marie-Odile Cordier and René Quiniou

[email protected]

RENNES - France

Page 25

(Figure: maximum size of IC(L) as a function of the number of events in a chronicle.)