![Page 1: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/1.jpg)
Temporal Database Paper Reading
R95922007, first-year master's student, Computer Science and Information Engineering, 馬智釗
Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang
![Page 2: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/2.jpg)
Introduction
Discover frequent serial episodes to find relationships between events.
- explain the problems that cause a particular event
- predict future results
Episode : a partially ordered collection of events occurring together.
- the user defines "how close is close enough"
- win : the width of the time window
![Page 3: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/3.jpg)
Three classes of episodes
Introduced by Mannila et al.
Serial episodes
- patterns of a total order in the sequence
Parallel episodes
- no constraints on the relative order
Composite episodes
- serial combination of parallel episodes
![Page 4: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/4.jpg)
Examples : episodes
![Page 5: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/5.jpg)
Algorithms (old)
Presented by Mannila et al.
Finding parallel and serial episodes that are frequent enough.
WINEPI
- considers the support of an episode
MINEPI
- considers the number of minimal occurrences of an episode
![Page 6: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/6.jpg)
WINEPI
Consider the sequence S = A_3 A_4 B_5 B_6 (subscripts are timestamps).
support : the number of sliding windows of width win that contain the episode.
Given win=3, there are six windows :
W_1 = A_3, W_2 = A_3 A_4, W_3 = A_3 A_4 B_5, W_4 = A_4 B_5 B_6, W_5 = B_5 B_6, W_6 = B_6.
<A,B> is supported by two windows (W_3 and W_4).
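The window count above can be reproduced with a short brute-force sketch. The function names (`contains_serial`, `winepi_support`) are hypothetical, and the sketch assumes that the windows slide one time unit at a time, starting with the window that holds only the first event and ending with the one that holds only the last event, as in the list W_1 to W_6.

```python
def contains_serial(window, episode):
    """True if the events of `episode` occur in `window` in order,
    at strictly increasing times (greedy earliest match)."""
    idx, last_t = 0, None
    for t, e in sorted(window):
        if idx < len(episode) and e == episode[idx] and (last_t is None or t > last_t):
            idx, last_t = idx + 1, t
    return idx == len(episode)


def winepi_support(events, episode, win):
    """Count the sliding windows of width `win` that contain `episode`."""
    times = [t for t, _ in events]
    first, last = min(times), max(times)
    count = 0
    # Each window covers the time points start .. start + win - 1.
    for start in range(first - win + 1, last + 1):
        window = [(t, e) for t, e in events if start <= t < start + win]
        if contains_serial(window, episode):
            count += 1
    return count


# S = A_3 A_4 B_5 B_6 as (time, event) pairs
S = [(3, "A"), (4, "A"), (5, "B"), (6, "B")]
print(winepi_support(S, ["A", "B"], win=3))  # 2
```

This is a direct translation of the definition, not an efficient implementation; it rescans the whole sequence for every window.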
![Page 7: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/7.jpg)
MINEPI
Consider the sequence S = A_3 A_4 B_5 B_6.
minimal occurrence : an interval that contains episode α, but no proper sub-interval does.
<A> has mo support 2.
- intervals [3,3] and [4,4].
<A,B> has mo support 1.
- interval [4,5].
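A minimal sketch of the definition, assuming a brute-force search over all intervals; the names `contains_serial` and `minimal_occurrences` are hypothetical, and this is not how MINEPI itself enumerates occurrences.

```python
def contains_serial(window, episode):
    """True if the events of `episode` occur in `window` in order,
    at strictly increasing times (greedy earliest match)."""
    idx, last_t = 0, None
    for t, e in sorted(window):
        if idx < len(episode) and e == episode[idx] and (last_t is None or t > last_t):
            idx, last_t = idx + 1, t
    return idx == len(episode)


def minimal_occurrences(events, episode):
    """All intervals [ts, te] that contain `episode` while no proper
    sub-interval does."""
    times = sorted({t for t, _ in events})
    occurrences = []
    for ts in times:
        for te in times:
            if te < ts:
                continue
            window = [(t, e) for t, e in events if ts <= t <= te]
            if contains_serial(window, episode):
                occurrences.append((ts, te))
    # Keep an interval only if no other occurrence lies inside it.
    return [
        iv for iv in occurrences
        if not any(o != iv and iv[0] <= o[0] and o[1] <= iv[1] for o in occurrences)
    ]


S = [(3, "A"), (4, "A"), (5, "B"), (6, "B")]
print(minimal_occurrences(S, ["A"]))       # [(3, 3), (4, 4)]
print(minimal_occurrences(S, ["A", "B"]))  # [(4, 5)]
```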
![Page 8: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/8.jpg)
Complex sequences
Several events occurring at one time
A temporal database is a complex sequence with temporal attributes.
Example (event sets at consecutive time points) :
AD | B | ABE | CE | ABF | ACE | BDF | D
![Page 9: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/9.jpg)
Algorithms (new)
Extend the algorithm to deal with complex sequences.
MINEPI+
- depth-first enumeration to generate the frequent episodes by equalJoin and temporalJoin.
EMMA
- Episodes Mining using Memory Anchor
- utilizes memory anchors to accelerate the mining task
![Page 10: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/10.jpg)
More about MINEPI
Breadth-first manner
- enumerate longer episodes from shorter ones
Parameters
- maxwin : maximum window width for an episode
- minsup : minimum frequency for a "frequent episode"
Temporal Join
- connects events from different time intervals
![Page 11: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/11.jpg)
Example : MINEPI
S = A_1 A_2 B_3 A_4 B_5, maxwin=4, minsup=2
Find frequent 1-episodes first
- mo(A)={[1,1],[2,2],[4,4]}, mo(B)={[3,3],[5,5]}
Temporal Join with maxwin=4
- possibles of <A,B> : [1,3],[2,3],[2,5],[4,5]
- mo(<A,B>)={[2,3],[4,5]} (choose minimal ones)
- support(<A,B>)={[1,4],[2,5],[4,5]}
- support count = 3, counting distinct start points
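The join in this example can be traced with a small sketch. The helper names are hypothetical; the join condition (the B interval starts after the A interval ends, and the combined span is shorter than maxwin) is inferred from the candidate list on the slide.

```python
def temporal_join_candidates(left, right, maxwin):
    """Pair every interval of `left` with every later interval of `right`
    whose end stays within the maxwin window."""
    return [
        (ts, te2)
        for ts, te in left
        for ts2, te2 in right
        if ts2 > te and te2 - ts < maxwin
    ]


def keep_minimal(intervals):
    """Drop every interval that properly contains another one."""
    return [
        iv for iv in intervals
        if not any(o != iv and iv[0] <= o[0] and o[1] <= iv[1] for o in intervals)
    ]


mo_A = [(1, 1), (2, 2), (4, 4)]
mo_B = [(3, 3), (5, 5)]

candidates = temporal_join_candidates(mo_A, mo_B, maxwin=4)
print(candidates)                # [(1, 3), (2, 3), (2, 5), (4, 5)]
print(keep_minimal(candidates))  # [(2, 3), (4, 5)]

# Support count as on the slide: number of distinct start points.
support_count = len({ts for ts, _ in candidates})
print(support_count)             # 3
```

With minsup=2, a support count of 3 makes <A,B> a frequent episode.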
![Page 12: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/12.jpg)
MINEPI+
Must deal with complex sequences.
Depth-first manner for memory saving
Equal Join
- connects events at the same interval
Bound List
• For a serial episode P=<p_1,…,p_k>
- {[ts_i,te_i] : S contains P in time [ts_i,te_i]}
• For an event Y
- {[t_i,t_i] : S contains Y at time t_i}
![Page 13: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/13.jpg)
Example : bound list
maxwin = 4.
Bound list of <A,B,C> : {[1,4],[3,6]}.
Bound list of <C> : {[4,4],[6,6]}.
time :   1    2    3    4    5    6    7    8
events : AD   B    ABE  CE   ABF  ACE  BDF  D
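A minimal sketch that recomputes these bound lists straight from the definition. The per-time event sets in `SEQ` are read off the slide's figure (times 1 to 8); the function name `bound_list` is hypothetical, and the sketch assumes one bound per start time, ending at the earliest possible completion.

```python
# Complex sequence from the figure: time point -> set of events.
SEQ = {
    1: {"A", "D"}, 2: {"B"}, 3: {"A", "B", "E"}, 4: {"C", "E"},
    5: {"A", "B", "F"}, 6: {"A", "C", "E"}, 7: {"B", "D", "F"}, 8: {"D"},
}


def bound_list(seq, episode, maxwin):
    """Bounds [ts, te] in which the serial `episode` occurs, one per start
    time ts, with te the earliest completion and te - ts < maxwin."""
    bounds = []
    for ts in sorted(seq):
        if episode[0] not in seq[ts]:
            continue
        cur, matched = ts, True
        for ev in episode[1:]:
            # Earliest later time point, still inside the window, holding ev.
            later = [t for t in sorted(seq)
                     if t > cur and t - ts < maxwin and ev in seq[t]]
            if not later:
                matched = False
                break
            cur = later[0]
        if matched:
            bounds.append((ts, cur))
    return bounds


print(bound_list(SEQ, ["A", "B", "C"], maxwin=4))  # [(1, 4), (3, 6)]
print(bound_list(SEQ, ["C"], maxwin=4))            # [(4, 4), (6, 6)]
```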
![Page 14: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/14.jpg)
Operations
Given P=<p_1,…,p_k> and an event f.
- P.boundlist = {[ts_1,te_1],…,[ts_n,te_n]}
- f.boundlist = {[ts'_1,ts'_1],…,[ts'_m,ts'_m]}
Equal Join : P_1 = P ⊙ f = <p_1,…,p_k ∪ f>.
- P_1.boundlist are [ts_i,te_i] such that te_i = ts'_j for some j (1 ≤ j ≤ m)
Temporal Join : P_2 = P · f = <p_1,…,p_k,f>.
- P_2.boundlist are [ts_i,ts'_j] such that ts'_j − ts_i < maxwin and ts'_j > te_i for some j (1 ≤ j ≤ m)
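The two operations can be sketched on plain lists of (ts, te) pairs. The function names are hypothetical, and the temporal join below keeps only the earliest matching ts'_j for each bound of P, which is an assumption taken from the "take minimal" note in the later EMMA example rather than from the definition on this slide. The bound lists of A, B and C are illustrative values taken from the example complex sequence.

```python
def equal_join(p_bounds, f_bounds):
    """P ⊙ f: keep the bounds of P whose end time is also an occurrence of f."""
    f_times = {ts for ts, _ in f_bounds}
    return [(ts, te) for ts, te in p_bounds if te in f_times]


def temporal_join(p_bounds, f_bounds, maxwin):
    """P · f: extend each bound of P to the earliest later occurrence of f
    that keeps the whole bound shorter than maxwin."""
    f_times = sorted(ts for ts, _ in f_bounds)
    joined = []
    for ts, te in p_bounds:
        later = [t for t in f_times if t > te and t - ts < maxwin]
        if later:
            joined.append((ts, later[0]))
    return joined


# Bound lists of single events in the example sequence (illustrative).
A = [(1, 1), (3, 3), (5, 5), (6, 6)]
B = [(2, 2), (3, 3), (5, 5), (7, 7)]
C = [(4, 4), (6, 6)]

print(equal_join(A, B))                 # [(3, 3), (5, 5)]
AB = temporal_join(A, B, maxwin=4)
print(AB)                               # [(1, 2), (3, 5), (5, 7), (6, 7)]
print(temporal_join(AB, C, maxwin=4))   # [(1, 4), (3, 6)]
```

Chaining two temporal joins reproduces the bound list of <A,B,C> given in the bound list example.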
![Page 15: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/15.jpg)
Drawbacks of MINEPI+
Huge number of combinations
- Consider |I| frequent 1-episodes
- O(|I|^2) checks for temporal joins and equal joins
Unnecessary joins
- should skip temporal joins for a prefix if the number of extendable matching bounds < minsup × |TDB|
Duplicate joins
- episode <ABC,ABC> needs 4+1 joins (4 equal joins and 1 temporal join) :
  <A>→<AB>→<ABC>→<ABC,A>→<ABC,AB>→<ABC,ABC>
![Page 16: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/16.jpg)
EMMA
Divided into three phases
(I) Mine frequent itemsets in the complex sequence.
(II) Encode each frequent itemset with a unique ID, and construct an encoded horizontal database.
(III) Mine episodes in the encoded database.
Depth-First Search
Memory Anchor
- utilize the boundlists to access information
- timelists of frequent itemsets are their boundlists
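Phases (I) and (II) can be illustrated with a brute-force sketch. Everything here is an assumption made for illustration: the function names, the treatment of minsup as an absolute count of time points, the ID numbering, and the exhaustive subset enumeration, which stands in for whatever frequent-itemset miner is actually used. The input is the eight-slot complex sequence from the earlier example.

```python
from itertools import combinations

SEQ = {
    1: {"A", "D"}, 2: {"B"}, 3: {"A", "B", "E"}, 4: {"C", "E"},
    5: {"A", "B", "F"}, 6: {"A", "C", "E"}, 7: {"B", "D", "F"}, 8: {"D"},
}


def frequent_itemsets(seq, minsup):
    """Phase I: itemsets that occur together at >= minsup time points,
    mapped to their timelists."""
    timelists = {}
    for t in sorted(seq):
        items = sorted(seq[t])
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                timelists.setdefault(combo, []).append(t)
    return {iset: ts for iset, ts in timelists.items() if len(ts) >= minsup}


def encode(seq, freq):
    """Phase II: assign an ID to each frequent itemset and rewrite every
    time point as the list of IDs present there (horizontal format)."""
    ordered = sorted(freq, key=lambda s: (len(s), s))
    ids = {iset: i for i, iset in enumerate(ordered, start=1)}
    encoded = {t: [] for t in sorted(seq)}
    for iset, times in freq.items():
        for t in times:
            encoded[t].append(ids[iset])
    for t in encoded:
        encoded[t].sort()
    return ids, encoded


freq = frequent_itemsets(SEQ, minsup=2)
ids, encoded = encode(SEQ, freq)
print(freq[("A", "B")])  # [3, 5]
print(encoded[3])        # [1, 2, 5, 7, 8]
```

The timelist of each ID doubles as its boundlist, with every time t read as the bound [t,t], which is what Phase (III) works on.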
![Page 17: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/17.jpg)
Example : database
minsup = 5
![Page 18: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/18.jpg)
Combine episodes
Only combine existing episodes with a "local" frequent 1-tuple episode.
- overcomes the huge number of generated combinations
Projected boundlist (PBL)
- episode #3=<C> has boundlist {[1,1],[2,2],[4,4],[8,8],[11,11],[14,14],[15,15]}
- given maxwin = 4, the projected boundlist is {[2,4],[3,5],[5,7],[9,11],[12,14],[15,16],[16,16]}
- note that |TDB|=16
![Page 19: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/19.jpg)
Example : PBL
#3.timelist={1,2,4,8,11,14,15}.
1 → [2,4]
2 → [3,5]
4 → [5,7]
8 → [9,11]
11 → [12,14]
14 → [15,16]
15 → [16,16]
with maxwin = 4 and |TDB|=16.
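The mapping above is consistent with projecting each bound [ts,te] to [te+1, min(ts+maxwin−1, |TDB|)] and dropping empty intervals. That formula is inferred from the listed values rather than quoted from the slides, and the function name is hypothetical; a minimal sketch:

```python
def projected_boundlist(boundlist, maxwin, tdb_len):
    """Project each bound [ts, te] to the interval where the next event may
    occur: after te, within maxwin of ts, and inside the database."""
    projected = []
    for ts, te in boundlist:
        lo, hi = te + 1, min(ts + maxwin - 1, tdb_len)
        if lo <= hi:  # drop bounds that leave no room for an extension
            projected.append((lo, hi))
    return projected


bl_3 = [(t, t) for t in (1, 2, 4, 8, 11, 14, 15)]
print(projected_boundlist(bl_3, maxwin=4, tdb_len=16))
# [(2, 4), (3, 5), (5, 7), (9, 11), (12, 14), (15, 16), (16, 16)]
```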
![Page 20: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/20.jpg)
Local frequent ID
A local frequent ID has a boundlist that can match into another episode's PBL.
- #3.PBL={[2,4],[3,5],[5,7],[9,11],[12,14],[15,16],[16,16]}
- #4.BL={[3,3],[5,5],[6,6],[9,9],[12,12],[13,13],[16,16]}
Record the boundlist of the ID when examining.
- get the boundlist immediately at temporal join
- <C,D>=<#3,#4> then <C,D>.boundlist = {[1,3],[2,3],[4,5],[8,9],[11,12],[14,16],[15,16]}
![Page 21: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/21.jpg)
Example : temporal join
#4.BL={[3,3],[5,5],[6,6],[9,9],[12,12],[13,13],[16,16]}.
Recall the construction of #3.PBL
1 → [2,4] : [3,3] in it
2 → [3,5] : [3,3] in it (take minimal)
4 → [5,7] : [5,5] in it
8 → [9,11] : [9,9] in it
11 → [12,14] : [12,12] in it
14 → [15,16] : [16,16] in it
15 → [16,16] : [16,16] in it
Result : {[1,3],[2,3],[4,5],[8,9],[11,12],[14,16],[15,16]}
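The lookup traced above can be sketched with a binary search over the sorted timelist of the ID being joined. The function name is hypothetical, and the projected interval [te+1, min(ts+maxwin−1, |TDB|)] is inferred from the values on the slides.

```python
from bisect import bisect_left


def join_with_pbl(prefix_bl, id_times, maxwin, tdb_len):
    """For each bound [ts, te] of the prefix, take the earliest occurrence
    of the ID inside the projected interval and emit [ts, that time]."""
    times = sorted(id_times)
    joined = []
    for ts, te in prefix_bl:
        lo, hi = te + 1, min(ts + maxwin - 1, tdb_len)
        i = bisect_left(times, lo)           # first occurrence at or after lo
        if i < len(times) and times[i] <= hi:
            joined.append((ts, times[i]))    # "take minimal"
    return joined


bl_3 = [(t, t) for t in (1, 2, 4, 8, 11, 14, 15)]
times_4 = [3, 5, 6, 9, 12, 13, 16]
print(join_with_pbl(bl_3, times_4, maxwin=4, tdb_len=16))
# [(1, 3), (2, 3), (4, 5), (8, 9), (11, 12), (14, 16), (15, 16)]
```

Because the timelist is sorted, each bound costs one binary search instead of a scan over all occurrences of the ID.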
![Page 22: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/22.jpg)
Procedure : emmajoin
Recursively extend the episodes
- until no more serial episodes can be extended
Avoid unnecessary checking in MINEPI+
- stop when the number of extendable bounds for a serial episode is less than minsup × |TDB|.
Example : #2=<B>.
- #2.BL={[3,3],[6,6],[9,9],[12,12],[16,16]}
- #2.PBL={[4,6],[7,9],[10,12],[13,15]} (|TDB|=16)
- only 4 extendable bounds remain, since [16,16] projects past the end of the database
- do not need to extend #2 if minsup = 5
![Page 23: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/23.jpg)
Example : emmajoin
#3.BL={[1,1],[4,4],[8,8],[11,11],[14,14],[15,15]}.
#7.BL={[1,1],[4,4],[8,8],[11,11],[14,14]}.
#9.BL={[3,3],[6,6],[9,9],[12,12],[16,16]}.
Call emmajoin to extend each 1-tuple episode
#3.PBL={[2,4],[5,7],[9,11],[12,14],[15,16],[16,16]}.
Find local frequent IDs in #3.PBL.
![Page 24: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/24.jpg)
Example : emmajoin (cont.)
minsup = 5, maxwin = 4.
By temporal join :
- <#3,#3>.BL={[1,4],[8,11],[11,14],[14,15]}
- <#3,#7>.BL={[1,4],[8,11],[11,14]}
- <#3,#9>.BL={[1,3],[4,6],[8,9],[11,12],[14,16]}
- <#3,#9> is generated from prefix #3
- recursively call emmajoin to extend <#3,#9>
- <#3,#9>.PBL={[4,4],[7,7],[10,11],[13,14]}
- there are no local frequent IDs since minsup=5
Back to call emmajoin for episode #7.
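The recursion can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the function names and signatures are hypothetical, minsup is treated as an absolute count of bounds, only temporal joins are performed, and one bound is kept per start point. Under that last assumption the sketch also reports [15,16] for <#3,#9>, which the list on the slide does not show.

```python
from bisect import bisect_left


def emmajoin(prefix, prefix_bl, timelists, maxwin, tdb_len, minsup, found):
    """Record `prefix` as frequent, then try to extend it by every ID that
    is locally frequent inside its projected boundlist."""
    found[prefix] = prefix_bl
    # Projected intervals, remembering the start point of each bound.
    pbl = [(ts, te + 1, min(ts + maxwin - 1, tdb_len)) for ts, te in prefix_bl]
    pbl = [p for p in pbl if p[1] <= p[2]]
    if len(pbl) < minsup:
        return  # too few extendable bounds: no extension can be frequent
    for id_, times in timelists.items():
        joined = []
        for ts, lo, hi in pbl:
            i = bisect_left(times, lo)
            if i < len(times) and times[i] <= hi:
                joined.append((ts, times[i]))
        if len(joined) >= minsup:  # id_ is a local frequent ID
            emmajoin(prefix + (id_,), joined, timelists,
                     maxwin, tdb_len, minsup, found)


def mine(timelists, maxwin, tdb_len, minsup):
    """Start emmajoin from every frequent 1-tuple episode."""
    found = {}
    for id_, times in timelists.items():
        if len(times) >= minsup:
            emmajoin((id_,), [(t, t) for t in times], timelists,
                     maxwin, tdb_len, minsup, found)
    return found


# Sorted timelists of the three IDs used in the example.
timelists = {
    3: [1, 4, 8, 11, 14, 15],
    7: [1, 4, 8, 11, 14],
    9: [3, 6, 9, 12, 16],
}
episodes = mine(timelists, maxwin=4, tdb_len=16, minsup=5)
print(sorted(episodes))
```

On this input the extension of <#3,#9> stops at once, because only four of its bounds remain extendable, which matches the trace on the slide.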
![Page 25: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/25.jpg)
Experiments
On a dataset composed of 10 stocks.
Parameters : maxwin / minsup.
- more running time when maxwin increases
- more running time when minsup decreases
- since the number of frequent episodes increases
EMMA runs faster than MINEPI+.
MINEPI+ uses less space than EMMA.
- EMMA needs large memory as minsup decreases
![Page 26: Temporal Database Paper Reading R95922007 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang](https://reader033.vdocuments.net/reader033/viewer/2022061608/5697bfaa1a28abf838c9a4b2/html5/thumbnails/26.jpg)
Conclusion
Modify MINEPI to MINEPI+
- for mining episodes in a complex sequence
Propose EMMA
- avoid the drawbacks of MINEPI+
EMMA is more efficient than MINEPI+.
Future work
- only discussed serial episodes
- parallel and composite episodes remain to be solved