autoplait: automatic mining of co-evolving time sequences

AutoPlait: Automatic Mining of Co-evolving Time Sequences
Yasuko Matsubara (Kumamoto University), Yasushi Sakurai (Kumamoto University), Christos Faloutsos (CMU)
SIGMOD 2014


Motivation

Given: co-evolving time series, e.g., MoCap (leg/arm sensors) of the "chicken dance"

Video: http://www.youtube.com/watch?v=6UV3kRV46Zs&t=49s

Q. Any distinct patterns?
Q. If yes, how many?
Q. What kind?

Challenges: co-evolving sequences
- Unknown # of patterns (e.g., beaks)
- Different durations

Q. Can we summarize it automatically?

Goal: find patterns that agree with human intuition

AutoPlait: fully-automatic mining algorithm

Importance of fully-automatic

No magic numbers! ... because,

Big data mining: -> we cannot afford human intervention!!

Manual
- sensitive to the parameter tuning
- long tuning steps (hours, days, ...)

Automatic (no magic numbers)
- no expert tuning required

Outline
- Motivation
- Problem definition
- Compression & summarization
- Algorithms
- Experiments
- AutoPlait at work
- Conclusions

Problem definition

Key concepts

- Bundle (given): set of d co-evolving sequences, e.g., bundle X (d=4)
- Segment (hidden): convert X -> m segments, S, e.g., m=8: s1, s2, s3, s4, s5, s6, s7, s8
- Regime (hidden): segment groups, each with a model of regime r, e.g., r=4 regimes (such as "wings" and "beaks")
- Segment-membership (hidden): assignment of segments to regimes, e.g., for m=8: F = { 2, 4, 1, 3, 2, 4, 1, 3 }

Given: bundle X

Find: compact description C of X, i.e.,
- m segments
- r regimes
- segment-membership
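The key concepts above can be sketched as plain data structures. This is only an illustration: the class name `Description`, the half-open `(start, end)` segment convention, and the equal-length cut points are assumptions made here; only the membership list F comes from the slides.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # assumed convention: half-open interval [start, end)


@dataclass
class Description:
    """Hypothetical container for the compact description C of a bundle X."""
    segments: List[Segment]  # m segments, in time order
    membership: List[int]    # F: regime index (1..r) of each segment

    @property
    def num_regimes(self) -> int:
        return len(set(self.membership))

    def segments_by_regime(self) -> Dict[int, List[Segment]]:
        """Group the segments by the regime they are assigned to."""
        groups: Dict[int, List[Segment]] = {}
        for seg, regime in zip(self.segments, self.membership):
            groups.setdefault(regime, []).append(seg)
        return groups


# Toy bundle X: d=4 sequences of length n=80 (placeholder values, not MoCap data).
n, d = 80, 4
X = [[0.0] * n for _ in range(d)]

# m=8 equal-length segments (made-up cut points) with F as shown on the slide.
segments = [(i * 10, (i + 1) * 10) for i in range(8)]
C = Description(segments=segments, membership=[2, 4, 1, 3, 2, 4, 1, 3])
```

With this layout, `C.segments_by_regime()[2]` lists the two segments assigned to regime 2, which is the grouping the segment-membership F encodes.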

Main ideas

Goal: compact description of X, without user intervention!!

Challenges:
- Q1. How to generate informative regimes?
  Idea (1): Multi-level chain model
- Q2. How to decide # of regimes/segments?
  Idea (2): Model description cost

Idea (1): MLCM: multi-level chain model

Q1. How to generate informative regimes?

Idea (1): Multi-level chain model
- HMM-based probabilistic model
- with across-regime transitions

Figure: sequences -> regimes (beaks, wings, claps) -> model

Details: MLCM parameters
- r regimes (HMMs), each with single HMM parameters
- across-regime transition prob.

Example: regimes r=2
- Regime 1, "beaks" (k=3): state 1, state 2, state 3
- Regime 2, "wings" (k=2): state 1, state 2
- Regime switch: transition from one regime to the other
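One way to picture the parameter set described above is as nested data structures: a list of per-regime HMMs plus one across-regime transition matrix. The sketch below is an illustration, not the authors' implementation; the class and field names, the Gaussian emission parameters, and the placeholder probabilities are all assumptions.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


def is_stochastic(rows: Matrix, tol: float = 1e-9) -> bool:
    """True if every row is a probability distribution."""
    return all(min(row) >= 0.0 and abs(sum(row) - 1.0) <= tol for row in rows)


@dataclass
class HMM:
    initial: List[float]  # initial state probabilities (length k)
    transition: Matrix    # k x k state-transition probabilities
    means: Matrix         # k x d emission means (Gaussian emissions assumed)
    variances: Matrix     # k x d emission variances

    @property
    def k(self) -> int:
        return len(self.initial)


@dataclass
class MLCM:
    regimes: List[HMM]         # r regime models
    regime_transition: Matrix  # r x r across-regime transition probabilities

    def is_valid(self) -> bool:
        r = len(self.regimes)
        square = len(self.regime_transition) == r and all(
            len(row) == r for row in self.regime_transition
        )
        return square and is_stochastic(self.regime_transition) and all(
            is_stochastic([h.initial]) and is_stochastic(h.transition)
            for h in self.regimes
        )


def uniform_hmm(k: int, d: int) -> HMM:
    """Placeholder HMM: uniform probabilities, zero-mean unit-variance emissions."""
    return HMM(
        initial=[1.0 / k] * k,
        transition=[[1.0 / k] * k for _ in range(k)],
        means=[[0.0] * d for _ in range(k)],
        variances=[[1.0] * d for _ in range(k)],
    )


# r=2 regimes as in the example: "beaks" with k=3 states, "wings" with k=2 states.
model = MLCM(
    regimes=[uniform_hmm(3, 4), uniform_hmm(2, 4)],
    regime_transition=[[0.95, 0.05], [0.05, 0.95]],  # made-up switch probabilities
)
```

The off-diagonal entries of `regime_transition` play the role of the "regime switch" arrows in the figure.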

Idea (2): model description cost

Q2. How to decide # of regimes/segments?

Idea (2): Model description cost
- Minimize coding cost
- find optimal # of segments/regimes

Idea: Minimize encoding cost! Good compression -> good description (# of r, m)

$\min \left( \mathrm{Cost}_M(M) + \mathrm{Cost}_C(X \mid M) \right)$, where $\mathrm{Cost}_M(M)$ is the model cost and $\mathrm{Cost}_C(X \mid M)$ is the coding cost

Total cost of bundle X, given C
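The two-part cost above can be illustrated with a toy example: each candidate description pays a model cost for its parameters plus a coding cost for the data under that model, and the cheaper total wins. The sketch assumes one Gaussian per segment, a fixed `PARAM_BITS` per stored parameter, and a variance floor; these choices are made here for illustration and are not the cost terms used in the paper.

```python
import math
from typing import List, Sequence

PARAM_BITS = 32.0  # assumption: bits spent on each stored model parameter


def coding_cost(xs: Sequence[float]) -> float:
    """Code length of xs (in bits) under a Gaussian fitted to xs.

    For continuous data this is a differential code length, so it can be
    negative; only differences between candidate models are meaningful.
    """
    n = len(xs)
    mean = sum(xs) / n
    var = max(sum((x - mean) ** 2 for x in xs) / n, 1e-4)  # variance floor
    nats = sum(
        0.5 * math.log(2 * math.pi * var) + (x - mean) ** 2 / (2 * var)
        for x in xs
    )
    return nats / math.log(2)


def total_cost(segments: List[Sequence[float]]) -> float:
    """Model cost (mean and variance per segment) plus coding cost."""
    model_cost = 2 * PARAM_BITS * len(segments)
    return model_cost + sum(coding_cost(s) for s in segments)


low = [0.0, 0.1, -0.1, 0.05] * 10
high = [5.0, 5.1, 4.9, 5.05] * 10

# Two clearly different halves: describing them separately is cheaper.
two_patterns_prefers_split = total_cost([low, high]) < total_cost([low + high])
# One homogeneous sequence: the extra model cost of a split does not pay off.
one_pattern_prefers_single = total_cost([low + low]) < total_cost([low, low])
```

Both comparisons come out true for this data, which is the behaviour the slide relies on: extra segments or regimes are accepted only when they compress the data by more than they cost to describe.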

Details: the total cost of bundle X, given C, covers
- duration/dimensions
- # of segments/regimes
- segment lengths
- segment-membership F
- model description cost of the model
- coding cost of X given the model

AutoPlait

Overview
- Iteration 0: r=1, m=1 (start)
- Iteration 1: r=2, m=4 (split)
- Iteration 2: r=3, m=6 (split)
- Iteration 4: r=4, m=8 (split)

Algorithms
1. CutPointSearch (inner-most loop): find good cut-points/segments
2. RegimeSplit (inner loop): estimate good regime parameters
3. AutoPlait (outer loop): search for the best number of regimes (r=2, 3, 4, ...)

1. CutPointSearch (inner-most loop)

Given:
- bundle X
- parameters of two regimes

Find: cut-points of segment sets S1, S2

1. CutPointSearch: details

Figure: a path through the hidden states of the two regimes (state 1, state 2, state 3 and state 2, state 1); at each step the question is whether to switch regimes.

DP algorithm to compute the likelihood.

Theoretical analysis
- Scalability: only a single scan of X is needed; the running time is expressed in terms of
  - n: length of X
  - d: dimension of X
  - k: # of hidden states in regime
- Accuracy: it guarantees the optimal cut points (details in paper)
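As a rough illustration of the dynamic-programming idea above, the sketch below runs a standard Viterbi pass over two regimes and reports where the best path switches between them. It is a heavy simplification and not the algorithm from the paper: each regime is reduced to a single Gaussian state on 1-D data, the switch probability is a made-up constant, and cut points are recovered by ordinary backtracking. The function name and signature are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Regime = Tuple[float, float]  # (mean, variance) of a single Gaussian state


def log_gauss(x: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def cut_point_search(
    xs: Sequence[float], regimes: List[Regime], switch_prob: float = 0.01
) -> Tuple[List[int], List[int]]:
    """Most likely regime per time point, plus the positions where it changes."""
    r = len(regimes)
    stay, switch = math.log(1.0 - switch_prob), math.log(switch_prob)
    score = [math.log(1.0 / r) + log_gauss(xs[0], *reg) for reg in regimes]
    back: List[List[int]] = []
    for x in xs[1:]:
        new_score, pointers = [], []
        for j, reg in enumerate(regimes):
            cands = [score[i] + (stay if i == j else switch) for i in range(r)]
            best = max(range(r), key=cands.__getitem__)
            pointers.append(best)
            new_score.append(cands[best] + log_gauss(x, *reg))
        score = new_score
        back.append(pointers)
    # Backtrack from the best final regime to recover the whole path.
    state = max(range(r), key=score.__getitem__)
    labels = [state]
    for pointers in reversed(back):
        state = pointers[state]
        labels.append(state)
    labels.reverse()
    cuts = [t for t in range(1, len(labels)) if labels[t] != labels[t - 1]]
    return labels, cuts


xs = [0.0] * 5 + [5.0] * 5 + [0.0] * 5
labels, cuts = cut_point_search(xs, regimes=[(0.0, 1.0), (5.0, 1.0)])
```

On this toy input the path switches regime at positions 5 and 10, splitting the sequence into three segments that belong to two segment sets.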

2. RegimeSplit (inner loop)

Given: bundle X

Find: two regimes
1. find cut-points of segment sets: S1, S2
2. estimate parameters of two regimes

Details: two-phase iterative approach
- Phase 1 (CutPointSearch): split segments into two groups
- Phase 2 (BaumWelch): update model parameters
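The two-phase iteration can be sketched as an alternating loop. The version below is a deliberately simplified stand-in: Phase 1 assigns each point to the more likely regime (instead of running CutPointSearch), Phase 2 refits a mean and variance per regime (instead of running Baum-Welch), and the initialisation from the lower and upper halves of the sorted values is an assumption made here.

```python
import math
from typing import List, Sequence, Tuple

Regime = Tuple[float, float]  # (mean, variance)


def log_gauss(x: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def fit(xs: Sequence[float]) -> Regime:
    """Estimate mean and (floored) variance of one regime."""
    mean = sum(xs) / len(xs)
    var = max(sum((x - mean) ** 2 for x in xs) / len(xs), 1e-3)
    return mean, var


def regime_split(
    xs: Sequence[float], max_iter: int = 20
) -> Tuple[List[Regime], List[int]]:
    """Alternate between assigning points and re-estimating two regimes."""
    ordered = sorted(xs)
    half = len(xs) // 2
    regimes = [fit(ordered[:half]), fit(ordered[half:])]  # assumed initialisation
    labels: List[int] = []
    for _ in range(max_iter):
        # Phase 1 (stand-in for CutPointSearch): assign points to regimes.
        new_labels = [
            max((0, 1), key=lambda j: log_gauss(x, *regimes[j])) for x in xs
        ]
        if new_labels == labels:  # assignments are stable: stop
            break
        labels = new_labels
        # Phase 2 (stand-in for Baum-Welch): update the regime parameters.
        for j in (0, 1):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:
                regimes[j] = fit(members)
    return regimes, labels


xs = [0.0, 0.2, -0.2] * 4 + [5.0, 5.2, 4.8] * 4
regimes, labels = regime_split(xs)
```

The loop stops once Phase 1 returns the same assignment twice in a row, so neither phase can change the result any further.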

3. AutoPlait (outer loop)

Given: bundle X

Find: r regimes (r=2, 3, 4, ...), i.e., find full parameter set

Split regimes r=2, 3, ..., as long as cost keeps decreasing
- Find appropriate # of regimes
- Example: r=2, m=4 -> r=3, m=6 -> r=4, m=8
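The "split as long as the cost keeps decreasing" rule can be sketched as a generic greedy loop. Everything here is hypothetical scaffolding: `greedy_split`, `toy_split` and `toy_cost` are names invented for the example, and the toy cost (a fixed penalty per regime plus the value range it covers) merely stands in for the total description cost.

```python
from typing import Callable, List, Tuple, TypeVar

R = TypeVar("R")


def greedy_split(
    initial: R,
    split: Callable[[R], Tuple[R, R]],
    cost: Callable[[List[R]], float],
) -> List[R]:
    """Keep splitting regimes while the total cost decreases."""
    accepted: List[R] = []
    stack: List[R] = [initial]
    while stack:
        current = stack.pop()
        left, right = split(current)
        rest = accepted + stack
        if cost(rest + [left, right]) < cost(rest + [current]):
            stack.extend([left, right])  # the split pays off: try splitting further
        else:
            accepted.append(current)     # no gain: keep this regime as it is
    return accepted


PENALTY = 10.0  # assumed fixed model cost per regime


def toy_cost(regimes: List[List[int]]) -> float:
    """Per regime: fixed penalty plus the range of values it has to cover."""
    return sum(PENALTY + (max(r) - min(r) if r else 0.0) for r in regimes)


def toy_split(regime: List[int]) -> Tuple[List[int], List[int]]:
    """Split a sorted list of values at its largest gap."""
    if len(regime) < 2:
        return regime, []
    gaps = [regime[i + 1] - regime[i] for i in range(len(regime) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return regime[:cut], regime[cut:]


result = greedy_split([1, 2, 3, 50, 51, 52, 100, 101], toy_split, toy_cost)
```

For this input the loop stops at three groups: further splits would add more penalty than they save, so the number of regimes is decided by the cost alone and not by a user-supplied parameter.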

Experiments

We answer the following questions:
- Q1. Sense-making: Can it help us understand the given bundles?
- Q2. Accuracy: How well does it find cut-points and regimes?
- Q3. Scalability: How does it scale in terms of computational time?

Q1. Sense-making

MoCap data: AutoPlait (NO magic numbers) compared with DynaMMo (Li et al., KDD09) and pHMM (Wang et al., SIGMOD11)

Q2. Accuracy

(a) Segmentation (b) Clustering

AutoPlait needs no magic numbers

Q3. Scalability

Wall clock time vs. data size (length): n

AutoPlait scales linearly, i.e., O(n)

(Figure legend: AutoPlait, DynaMMo, pHMM)

AutoPlait at work

AutoPlait is capable of various applications, e.g.,
- App1. Model analysis: Web-click sequences
- App2. Event discovery: Google Trend data

App1. Model analysis (WebClick)

Web-click sequences (1 month, 5 urls)
- 5 urls: blog, news, dictionary, Q&A, mail
- every 10 minutes

AutoPlait finds 2 patterns: weekday/weekend!
- Annotation in the figure: Monday (but holiday)

Pattern of weekday regime (details)
- Observation: Working hard every weekday (i.e., using dictionary, news sites)

Pattern of weekend regime (details)
- Observation: No more work on weekend (i.e., blog, mail, Q&A for non-business purposes)

App2. Event discovery (GoogleTrend)

Anomaly detection (flu-related topics, 10 years)
- AutoPlait detects 1 unusual spike in 2009 (i.e., swine flu)

Turning point detection (seasonal sweets topics)
- Trend suddenly changed in 2010 (release of Android OS "Gingerbread", "Ice Cream Sandwich")

Trend discovery (game-related topics)
- It discovers 3 phases of "game console war" (Xbox & PlayStation / Wii / Mobile social games)

Conclusions

AutoPlait has the following properties:
- Effective: finds optimal segments/regimes
- Sense-making: reasonable regimes
- Fully-automatic: no magic numbers
- Scalable: it scales linearly

Thank you!

Code: http://www.cs.kumamoto-u.ac.jp/~yasuko/software.html
Mail: [email protected]