An Efficient Algorithm for Mining Time Interval-based Patterns in Large Databases

Yi-Cheng Chen, Ji-Chiang Jiang, Wen-Chih Peng and Suh-Yin Lee

Department of Computer Science

National Chiao Tung University, Hsinchu, Taiwan 300

{ejen.cs95g, perrys0620.cs96g}@nctu.edu.tw, wcpeng@cs.nctu.edu.tw, sylee@csie.nctu.edu.tw

CIKM, 2010

OUTLINE

1. INTRODUCTION

2. PROBLEM DEFINITION

3. INCISION STRATEGY

4. COINCIDENCE REPRESENTATION

5. CTMiner ALGORITHM

6. EXPERIMENTAL RESULTS

7. CONCLUSION AND FUTURE WORK

1. INTRODUCTION

All related research in this domain is based on Allen's temporal logic, which defines 13 temporal relations between any two event intervals.
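
To make the 13 relations concrete, here is a minimal sketch that classifies the relation from interval A to interval B. The function name `allen_relation` and the relation labels are illustrative choices, not taken from the slides, and each interval is assumed to satisfy start < finish.

```python
def allen_relation(a_start, a_end, b_start, b_end):
    """Name the Allen relation that holds from interval A to interval B.

    Assumes a_start < a_end and b_start < b_end.
    """
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_end == b_start:
        return "meets"
    if b_end == a_start:
        return "met-by"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    # Only the two partial-overlap cases remain.
    return "overlaps" if a_start < b_start else "overlapped-by"


# Illustrative intervals (values chosen only for the demonstration).
print(allen_relation(1, 5, 8, 14))     # before
print(allen_relation(8, 14, 10, 13))   # contains
print(allen_relation(10, 13, 10, 13))  # equals
```

Because every pair of intervals has to be classified this way, the number of pairwise relations grows quickly with the number of intervals in a sequence.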

1. INTRODUCTION

Comparison with previous work:

Kam et al. - hierarchical representation.

Hoppner - scans the database with a sliding window.

Papapetrou - Hybrid-DFS algorithm.

Wu et al. - TPrefixSpan.

Patel et al. - augmented representation (with additional counting information) and IEMiner.

1. INTRODUCTION

Proposed:

Incision strategy

Coincidence representation

CTMiner (Coincidence Temporal Miner)

2. PROBLEM DEFINITION

Event interval and event sequence

Let E = {e1, e2, …, ek} be the set of event symbols.

An event interval is a triple (ei, si, fi), where ei ∈ E and si, fi are time points with si < fi.

Event start: ei.ts

Event finish: ei.tf

An event sequence is a list {(e1, s1, f1), (e2, s2, f2), …, (en, sn, fn)}, where si ≤ si+1 and si < fi.
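
The two definitions can be sketched as a small data type plus a validity check. The names `EventInterval` and `is_event_sequence` are hypothetical, and the sample values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventInterval:
    """An event interval (e, s, f) with s < f."""
    symbol: str
    start: int
    finish: int

    def __post_init__(self):
        # Enforce the constraint si < fi from the definition.
        if not self.start < self.finish:
            raise ValueError("start must be strictly less than finish")


def is_event_sequence(intervals):
    """Check the ordering constraint si <= si+1 on consecutive intervals."""
    return all(a.start <= b.start for a, b in zip(intervals, intervals[1:]))


# Hypothetical sample data.
seq = [EventInterval("A", 1, 5), EventInterval("B", 3, 8)]
print(is_event_sequence(seq))        # True
print(is_event_sequence(seq[::-1]))  # False
```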

2. PROBLEM DEFINITION

Temporal database

Database D = {r1, r2, …, rm} is a set of records ri, where 1 ≤ i ≤ m.

A record ri consists of a sequence-id and an event interval (with its start time and finish time).

Records in the database D with the same sequence-id are grouped together.

Database D can therefore be viewed as a collection of event sequences.
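
A minimal sketch of this grouping step follows. It assumes each record is a flat tuple (sequence-id, symbol, start, finish); the function name `to_event_sequences`, the tie-breaking order, and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict


def to_event_sequences(records):
    """Group records by sequence-id and order each group by start time."""
    sequences = defaultdict(list)
    for seq_id, symbol, start, finish in records:
        sequences[seq_id].append((symbol, start, finish))
    for intervals in sequences.values():
        # Ties on start time are broken by finish time, then symbol (an assumption).
        intervals.sort(key=lambda iv: (iv[1], iv[2], iv[0]))
    return dict(sequences)


# Hypothetical records, deliberately out of order.
records = [
    (1, "A", 2, 7),
    (2, "B", 1, 5),
    (1, "B", 5, 10),
    (2, "D", 8, 14),
    (1, "D", 1, 4),
]
print(to_event_sequences(records))
# {1: [('D', 1, 4), ('A', 2, 7), ('B', 5, 10)], 2: [('B', 1, 5), ('D', 8, 14)]}
```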

2. PROBLEM DEFINITION

Time set and time sequence

Given an event sequence q = {(e1, s1, f1), (e2, s2, f2), …, (en, sn, fn)}, the set T = {s1, f1, s2, f2, …, si, fi, …, sn, fn} is called the time set corresponding to sequence q.

Ordering all the elements of T and eliminating duplicates gives the time sequence Ts = {t1, t2, t3, …, tk}, where ti ∈ T and ti < ti+1.

2. PROBLEM DEFINITION

Event slice

Sequence 2 contains 4 event intervals (en, sn, fn): (B, 1, 5), (D, 8, 14), (E, 10, 13), (F, 10, 13)

Corresponding time set {s1, f1, s2, f2, s3, f3, s4, f4}: T = {1, 5, 8, 14, 10, 13, 10, 13}

Time sequence {t1, t2, t3, …, tk}: Ts = {1, 5, 8, 10, 13, 14}
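
The time sequence for this example can be reproduced with a short sketch. The function name `time_sequence` is an assumption, and D's finish time is taken as 14, which is the value the listed time set implies.

```python
def time_sequence(event_sequence):
    """Collect all start and finish times, drop duplicates, and sort them."""
    time_set = {t for _, start, finish in event_sequence for t in (start, finish)}
    return sorted(time_set)


# Sequence 2 from the slide, as (symbol, start, finish) tuples.
q = [("B", 1, 5), ("D", 8, 14), ("E", 10, 13), ("F", 10, 13)]
print(time_sequence(q))  # [1, 5, 8, 10, 13, 14]
```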

2. PROBLEM DEFINITION

Event slice

Let set L = {+, -, *, Φ}, and let Q = {q1, q2, …, qi, …} be a set of event sequences, where qi = {(e1, s1, f1), …, (ej, sj, fj), …, (en, sn, fn)}.

2. PROBLEM DEFINITION

Event slice

start slice D+ = (D, 8, 10)

intermediate slice D* = (D, 10, 13)

finish slice D- = (D, 13, 14)

The event interval B has only one intact slice, B = (B, 1, 5).
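
The slicing shown above can be sketched as follows: an interval is cut at every point of the time sequence that falls inside it, and the pieces are labelled +, * and -. The function name `slice_interval` is hypothetical, and an intact slice (label Φ) is written here as the bare symbol, which is a notational assumption.

```python
def slice_interval(interval, ts):
    """Cut one interval at the time points of ts and label the pieces.

    ts must be sorted and must contain the interval's start and finish.
    Returns a list of (label, lo, hi) tuples.
    """
    symbol, start, finish = interval
    cuts = [t for t in ts if start <= t <= finish]
    pieces = list(zip(cuts, cuts[1:]))
    if len(pieces) == 1:
        return [(symbol, start, finish)]  # intact slice
    labelled = []
    for i, (lo, hi) in enumerate(pieces):
        if i == 0:
            label = "+"  # start slice
        elif i == len(pieces) - 1:
            label = "-"  # finish slice
        else:
            label = "*"  # intermediate slice
        labelled.append((symbol + label, lo, hi))
    return labelled


ts = [1, 5, 8, 10, 13, 14]
print(slice_interval(("D", 8, 14), ts))  # [('D+', 8, 10), ('D*', 10, 13), ('D-', 13, 14)]
print(slice_interval(("B", 1, 5), ts))   # [('B', 1, 5)]
```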

3. INCISION STRATEGY

3. INCISION STRATEGY

Incision example

The incision strategy completely avoids generating intermediate slices. Even with the intermediate slices trimmed, the relationship between any two intervals can still be expressed correctly.
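
One way to read this is that only the start slice and the finish slice of each interval need to be computed, so intermediate slices are never materialised. The sketch below follows that reading; it is not the authors' implementation, and the function name `incise` and the use of binary search are assumptions.

```python
from bisect import bisect_left, bisect_right


def incise(interval, ts):
    """Return only the start and finish slices of an interval (or its intact slice).

    ts must be sorted, duplicate-free, and contain the interval's endpoints.
    """
    symbol, start, finish = interval
    after_start = ts[bisect_right(ts, start)]        # first time point after start
    before_finish = ts[bisect_left(ts, finish) - 1]  # last time point before finish
    if after_start == finish:
        return [(symbol, start, finish)]  # no cut inside: intact slice
    return [(symbol + "+", start, after_start),
            (symbol + "-", before_finish, finish)]


ts = [1, 5, 8, 10, 13, 14]
print(incise(("D", 8, 14), ts))   # [('D+', 8, 10), ('D-', 13, 14)]
print(incise(("E", 10, 13), ts))  # [('E', 10, 13)]
```

For D, the intermediate slice (D, 10, 13) is never produced, yet the surviving start and finish slices still record where D begins and ends relative to the other time points.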

4. COINCIDENCE REPRESENTATION

Group simultaneously occurring slices together to form coincidences.

Concatenating all coincidences describes an event sequence effectively.

This simplifies the processing of complex pairwise relationships between all intervals.
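
A minimal sketch of the grouping step, assuming that slices "occur simultaneously" when they cover the same time segment. The input slices are those of the sequence-2 example after trimming D's intermediate slice; treating E and F as intact slices is derived here, not stated on the slides. The names `coincidences` and `render` and the printed notation are hypothetical.

```python
from collections import defaultdict


def coincidences(slices):
    """Group (label, lo, hi) slices that share a time segment, in time order."""
    groups = defaultdict(list)
    for label, lo, hi in slices:
        groups[(lo, hi)].append(label)
    return [tuple(sorted(groups[segment])) for segment in sorted(groups)]


def render(coincidence_sequence):
    """Concatenate coincidences into a compact string (assumed notation)."""
    return "".join("(" + " ".join(c) + ")" for c in coincidence_sequence)


slices = [("B", 1, 5), ("D+", 8, 10), ("E", 10, 13), ("F", 10, 13), ("D-", 13, 14)]
print(coincidences(slices))          # [('B',), ('D+',), ('E', 'F'), ('D-',)]
print(render(coincidences(slices)))  # (B)(D+)(E F)(D-)
```

In this form the fact that E and F occur during D can be read off from their position between D+ and D-, without comparing every pair of intervals.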

4. COINCIDENCE REPRESENTATION

Good scalability

Nonambiguity

Simple is good

Compact space usage

5. CTMiner ALGORITHM

min_sup = 2
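
The slides do not spell out how support is counted. The sketch below assumes the usual sequential-pattern definition, where the support of an item is the number of sequences that contain it, and shows only a first pass over single slices with min_sup = 2. It is not CTMiner itself; the function name `frequent_slices` and the three sample coincidence sequences are hypothetical.

```python
from collections import Counter


def frequent_slices(sequences, min_sup):
    """Count, per slice label, how many sequences contain it; keep the frequent ones."""
    support = Counter()
    for coincidence_sequence in sequences:
        # A label counts once per sequence, however often it appears.
        labels = {label for coincidence in coincidence_sequence for label in coincidence}
        support.update(labels)
    return {label: n for label, n in support.items() if n >= min_sup}


# Hypothetical database of coincidence sequences.
db = [
    [("A+",), ("A-", "D+"), ("D-",)],
    [("B",), ("D+",), ("E", "F"), ("D-",)],
    [("A",), ("B",)],
]
print(sorted(frequent_slices(db, min_sup=2).items()))
# [('B', 2), ('D+', 2), ('D-', 2)]
```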

5. CTMiner ALGORITHM

6. EXPERIMENTAL RESULTS

Runtime performance on synthetic data sets

6. EXPERIMENTAL RESULTS

Real world dataset analysis

7. CONCLUSION AND FUTURE WORK

Coincidence representation is nonambiguous and has several advantages over existing representations.

7. CONCLUSION AND FUTURE WORK

Future work: mining closed and maximal temporal patterns, incremental temporal pattern mining, and methods for mining data streams.
