
Outline Introduction Mining by Learning Conclusion

A Unified Approach to Mining Complex Time-Series Data for Various Kinds of Patterns

Yi Wang (1), J.H. Feng (1), J.Y. Wang (1), Z.Q. Liu (2)

(1) Department of Computer Science, Tsinghua University, Beijing, 100084, China

(2) School of Creative Media, City University of Hong Kong, Hong Kong

IEEE ICDM Conference, 2007

Wang, et al Mining Complex Time-Series Data


1 Introduction
   Aspects of Sequential Data Mining
   Various Approaches or A Unified One

2 Mining by Learning
   Learning the Temporal Structure as A Graph
   Various Kinds of Hidden Markovian Models
   Learning VLHMM

3 Conclusion
   Mining Various Kinds of Patterns
   Contributions


Aspects of Sequential Data Mining

Various Sequence Types:

univariate/multivariate,
integer (discrete)/real (continuous).

Various Mining Goals:

periodic pattern,
search-by-example,
frequent atomic pattern.

Difficulties:

uncertainty on the y-axis (e.g., noise),
uncertainty on the x-axis (e.g., time scale).


[Figure: two periodic patterns, one with 3 realizations, the other with 2.]


[Figure, uncertainty on the y-axis: match?]


[Figure, uncertainty on the x-axis: matches with which? Or matches with both?]


Various Approaches or A Unified One

Previous Research:

Various approaches

Our Work:

A unified approach

The Unified Approach:

learns various types of sequences by hidden Markovian models;
represents the temporal structure by a graph; and
mines various patterns by well-studied graph algorithms.

[Diagram labels: Various types of sequences and difficulties; Various mining algorithms; Various mining results; Learning hidden Markovian model; Temporal structure as directed graph; Graph algorithms for mining]


Learning the Temporal Structure as A Graph


Hidden Markov Model (HMM)

Given the number of states, S, the number of contexts is S.

Short contexts → inaccurate modeling.


Fixed nth-order Hidden Markov Model (n-HMM)

Given the number of states, S, and the length of context, n, the number of contexts is S^n.

Long contexts → accurate modeling, but inefficient learning.
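To make the growth in the number of contexts concrete, here is a minimal sketch that enumerates every context of a fixed length. The helper name `fixed_order_contexts` and the choice of three states are assumptions made for illustration; they do not come from the talk.

```python
from itertools import product

def fixed_order_contexts(num_states, order):
    """Enumerate all contexts (tuples of past states) of a fixed length."""
    states = range(1, num_states + 1)
    return list(product(states, repeat=order))

# With S = 3 states, a first-order model has S contexts,
# while a second-order model already has S^2.
first_order = fixed_order_contexts(3, 1)
second_order = fixed_order_contexts(3, 2)
print(len(first_order), len(second_order))

# The count grows exponentially with the context length n.
for n in range(1, 6):
    print(n, len(fixed_order_contexts(3, n)))
```

Every context needs its own set of transition parameters, which is why long fixed-length contexts make learning expensive.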


Variable-length Hidden Markov Model (VLHMM)

Not all contexts have to be extended to a fixed length of n;

Contexts have variable lengths: the shortest that is still long enough to accurately determine the next state;

Learning the minimum set of contexts for accurate modeling.


[Figure: three panels labeled HMM, n-HMM, and VLHMM, each showing the context graph of the corresponding model over states 1, 2, and 3.]
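The idea of variable-length contexts can be sketched as a longest-suffix lookup: given the recent state history, pick the longest stored context that matches its tail. The function `match_context` and the example context set below are hypothetical, chosen only to illustrate the idea; they are not the context set from the talk's figure.

```python
def match_context(history, contexts):
    """Return the longest context that is a suffix of `history`.

    `history` lists past states, most recent last; `contexts` is a set of tuples.
    """
    longest = max(len(c) for c in contexts)
    for length in range(min(len(history), longest), 0, -1):
        suffix = tuple(history[-length:])
        if suffix in contexts:
            return suffix
    # Happens when the history is too short for any stored context.
    raise ValueError("no context matches the history")

# Assumed example: after state 1 or 2 one step of history suffices,
# but after state 3 the state before it is also needed.
contexts = {(1,), (2,), (1, 3), (2, 3), (3, 3)}

print(match_context([3, 1, 2], contexts))  # uses the short context (2,)
print(match_context([2, 1, 3], contexts))  # uses the longer context (1, 3)
print(len(contexts))                       # 5 contexts instead of 3^2 = 9
```

In this toy setting the variable-length model keeps 5 contexts where a fixed second-order model would keep 9.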


Learning Variable-length Hidden Markov Model (VLHMM)

The number of contexts is unknown before learning, even with the number of states, S, given;

This situation is called “unknown model structure” in learning theory, and is the most difficult of the four types of learning problems;

As the EM algorithm cannot learn the model structure, we derived a structural-EM algorithm to learn the model;

Optimizing a Minimum-Entropy criterion to learn the minimum set of contexts, and

optimizing the Maximum-Likelihood criterion to estimate the model parameters.
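The following sketch illustrates why an entropy criterion can decide how long a context needs to be. It is not the structural-EM algorithm from the talk: it assumes the state sequence is directly observed (in a VLHMM the states are hidden), and the function `next_state_entropy` and the toy sequence are made up for illustration.

```python
from collections import Counter, defaultdict
from math import log2

def next_state_entropy(sequence, length):
    """Conditional entropy (in bits) of the next state given the previous `length` states."""
    counts = defaultdict(Counter)
    for i in range(length, len(sequence)):
        counts[tuple(sequence[i - length:i])][sequence[i]] += 1
    total = sum(sum(c.values()) for c in counts.values())
    entropy = 0.0
    for nexts in counts.values():
        n_ctx = sum(nexts.values())
        for n in nexts.values():
            # p(context, next) * log2 p(next | context)
            entropy -= (n / total) * log2(n / n_ctx)
    return entropy

# Toy sequence repeating 1 2 1 3: after state 1 the next state is ambiguous
# with one step of history, but fully determined with two steps.
sequence = [1, 2, 1, 3] * 5
h1 = next_state_entropy(sequence, 1)
h2 = next_state_entropy(sequence, 2)
print(h1, h2)
```

Here contexts ending in state 1 are worth extending because doing so lowers the entropy, while contexts 2 and 3 are already deterministic and can stay short.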


Mining Various Kinds of Patterns

Align sequence with temporal structure

The Viterbi algorithm can set up a map from each element in the sequence to a context in the graph.

(Partial) Periodic Pattern

Finding cyclic paths in the graph. Many algorithms have been developed to do this.

Search-by-Example

Input the example to the Viterbi algorithm, which outputs the path that is “most likely” given the example.

Frequent Atomic Pattern

Select those contexts that frequently appear in the training sequence.


Our Contribution

A unified framework – mining by learning

Mining from the learned temporal structure using well-studied graph algorithms;

“Hidden” models support learning various kinds of sequences;

Probabilistic transitions (especially self-transitions) encode uncertainty in time scale; output p.d.f.s encode noise.

VLHMM for efficient and accurate learning and mining

Optimizing two criteria simultaneously by developing a structural-EM algorithm;

Minimum-Entropy criterion → minimum number of parameters, efficient and effective learning;

Maximum-Likelihood criterion → accurate learning of the temporal structure.
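One way to see how self-transitions encode uncertainty in time scale: suppose a state has self-transition probability a (a symbol introduced here for illustration, not taken from the talk). The number of consecutive time steps d spent in that state then follows a geometric distribution, so the same pattern can be stretched or compressed in time with a well-defined probability.

```latex
P(d) = a^{d-1}(1 - a), \quad d = 1, 2, \dots,
\qquad
\mathbb{E}[d] = \sum_{d=1}^{\infty} d \, a^{d-1} (1 - a) = \frac{1}{1 - a}.
```

For instance, a = 0.5 gives an expected stay of 2 time steps, and a = 0.9 gives 10.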


Thank You for Your Attention

More details and demos can be accessed online at: http://dbgroup.cs.tsinghua.edu.cn/wangyi/vlhmm
