Adaptive Learning and Mining for Data Streams and Frequent Patterns

Albert Bifet

Laboratory for Relational Algorithmics, Complexity and Learning (LARCA)
Departament de Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya

Ph.D. dissertation, 24 April 2009
Advisors: Ricard Gavaldà and José L. Balcázar


DESCRIPTION

Thesis defense on mining evolving data streams and tree mining.

TRANSCRIPT

Page 1: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Adaptive Learning and Mining for Data Streams and Frequent Patterns

Albert Bifet

Laboratory for Relational Algorithmics, Complexity and Learning (LARCA)
Departament de Llenguatges i Sistemes Informàtics

Universitat Politècnica de Catalunya

Ph.D. dissertation, 24 April 2009
Advisors: Ricard Gavaldà and José L. Balcázar

LARCA

Page 2: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Future Data Mining

Future Data Mining
Structured data
Find Interesting Patterns
Predictions
On-line processing

2 / 59

Page 3: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Mining Evolving Massive Structured Data

The Disintegration of the Persistence of Memory, 1952-54

Salvador Dalí

The basic problem
Finding interesting structure in data

Mining massive data
Mining time-varying data
Mining in real time
Mining structured data

3 / 59

Page 4: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Data Streams

Data Streams
The sequence is potentially infinite
High amount of data: sublinear space
High speed of arrival: sublinear time per example
Once an element from a data stream has been processed, it is discarded or archived

Approximation algorithms
Small error rate with high probability
An algorithm (ε, δ)-approximates F if it outputs F̃ for which Pr[|F̃ − F| > εF] < δ.

4 / 59
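As a concrete, if simplified, illustration of the (ε, δ) guarantee, here is a minimal Python sketch (not from the thesis) that approximates the mean of a stream of values in [0, 1] from a bounded-size sample. The sample size comes from Hoeffding's inequality, which gives the additive-error variant of the definition above (the slide states the relative-error form); treating the reservoir sample as i.i.d. is part of the sketch, not a tight analysis.

import math
import random

def approx_mean(stream, eps=0.05, delta=0.01, seed=0):
    # Hoeffding's inequality: m >= ln(2/delta) / (2 * eps^2) samples of
    # values in [0, 1] give Pr[|estimate - mean| > eps] < delta.
    m = math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))
    rng = random.Random(seed)
    reservoir = []
    # Reservoir sampling: one pass over the stream, O(m) (sublinear) space.
    for i, x in enumerate(stream):
        if i < m:
            reservoir.append(x)
        else:
            j = rng.randrange(i + 1)
            if j < m:
                reservoir[j] = x
    return sum(reservoir) / len(reservoir)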

Page 5: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Tree Pattern Mining

Trees are sanctuaries. Whoever knows how to listen to them, can learn the truth.

Hermann Hesse

Given a dataset of trees, find the complete set of frequent subtrees

Frequent Tree Pattern (FT):

Include all the trees whose support is no less than min_sup

Closed Frequent Tree Pattern (CT):

Include no tree which has a super-tree with the same support

CT ⊆ FT

5 / 59

Page 6: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Outline

Mining Evolving Data Streams

1 Framework
2 ADWIN
3 Classifiers
4 MOA
5 ASHT

Tree Mining

6 Closure Operator on Trees
7 Unlabeled Tree Mining Methods
8 Deterministic Association Rules
9 Implicit Rules

Mining Evolving Tree Data Streams

10 Incremental Method
11 Sliding Window Method
12 Adaptive Method
13 Logarithmic Relaxed Support
14 XML Classification

6 / 59

Page 9: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Outline

1 Introduction

2 Mining Evolving Data Streams

3 Tree Mining

4 Mining Evolving Tree Data Streams

5 Conclusions

7 / 59

Page 10: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Data Mining Algorithms with Concept Drift

No Concept Drift

input → DM Algorithm (with Counter1 … Counter5) → output

Concept Drift

input → DM Algorithm (static model plus a Change Detector) → output

8 / 59

Page 11: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Data Mining Algorithms with Concept Drift

No Concept Drift

input → DM Algorithm (with Counter1 … Counter5) → output

Concept Drift

input → DM Algorithm (with Estimator1 … Estimator5) → output

8 / 59

Page 12: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Time Change Detectors and Predictors (1) General Framework

Problem
Given an input sequence x1, x2, . . . , xt, . . . we want to output at instant t:

a prediction x̂t+1 minimizing the prediction error:

|x̂t+1 − xt+1|

an alert if change is detected, considering distribution changes over time.

9 / 59


Page 15: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Time Change Detectors and Predictors (1) General Framework

[Block diagram: xt → Estimator → Estimation; a Change Detector monitors the Estimator and raises an Alarm; a Memory module supports the Estimator]

10 / 59

Page 16: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Optimal Change Detector and Predictor (1) General Framework

High accuracy
Fast detection of change
Low false positive and false negative ratios
Low computational cost: minimum space and time needed
Theoretical guarantees
No parameters needed
Estimator with Memory and Change Detector

11 / 59
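As a toy instantiation of this framework (deliberately not an optimal detector: it has parameters, unlike the desiderata above), the sketch below pairs an exponentially weighted moving average estimator with a small memory of recent values and flags a change when the two disagree. All names and thresholds are illustrative assumptions, not the thesis's components.

from collections import deque

class EstimatorWithChangeDetector:
    # Estimator: EWMA of the inputs. Memory: a short window of recent
    # values. Change detector: alarm when the recent mean drifts away
    # from the long-term estimate by more than `threshold`.
    def __init__(self, alpha=0.05, memory_size=30, threshold=0.3):
        self.alpha = alpha
        self.memory = deque(maxlen=memory_size)
        self.estimate = None
        self.threshold = threshold

    def update(self, x):
        # Feed x_t; return (estimation, alarm).
        self.memory.append(x)
        if self.estimate is None:
            self.estimate = x
        else:
            self.estimate = (1 - self.alpha) * self.estimate + self.alpha * x
        recent = sum(self.memory) / len(self.memory)
        alarm = abs(recent - self.estimate) > self.threshold
        return self.estimate, alarm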

Page 17: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Algorithm ADaptive Sliding WINdow (2) ADWIN

ADWIN: ADAPTIVE WINDOWING ALGORITHM

Initialize window W
for each t > 0
    do W ← W ∪ {xt} (i.e., add xt to the head of W)
       repeat drop elements from the tail of W
       until |μ̂W0 − μ̂W1| < εc holds for every split of W into W = W0 · W1
    output μ̂W

Example: checking every split W = W0 · W1 of W = 101010110111111

W0 = 1          W1 = 01010110111111
W0 = 10         W1 = 1010110111111
W0 = 101        W1 = 010110111111
W0 = 1010       W1 = 10110111111
W0 = 10101      W1 = 0110111111
W0 = 101010     W1 = 110111111
W0 = 1010101    W1 = 10111111
W0 = 10101011   W1 = 0111111
W0 = 101010110  W1 = 111111    |μ̂W0 − μ̂W1| ≥ εc: CHANGE DETECTED!

Drop elements from the tail of W:
W = 01010110111111

12 / 59
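A minimal Python sketch of ADWIN's cut test follows. It stores the window explicitly in O(|W|) space and rechecks every split on each insertion, whereas the thesis's ADWIN compresses the window into exponential-histogram buckets to reach the O(log W) costs quoted on the following slides. The εcut formula follows the published ADWIN analysis (δ′ = δ/n, harmonic-mean m); treat the constants and class name as illustrative.

import math

class AdwinSketch:
    def __init__(self, delta=0.01):
        self.delta = delta
        self.window = []          # oldest element first

    def _eps_cut(self, n0, n1):
        # Hoeffding-style threshold for |mean(W0) - mean(W1)|,
        # with a union bound over the possible split points.
        m = 1.0 / (1.0 / n0 + 1.0 / n1)
        delta_p = self.delta / (n0 + n1)
        return math.sqrt(math.log(4.0 / delta_p) / (2.0 * m))

    def add(self, x):
        # Insert x_t, then shrink while some split W = W0 . W1 shows two
        # distinct enough averages. Returns True if change was detected.
        self.window.append(x)
        detected = False
        shrunk = True
        while shrunk and len(self.window) >= 2:
            shrunk = False
            n = len(self.window)
            total = sum(self.window)
            s0 = 0.0
            for i in range(1, n):          # split after element i-1
                s0 += self.window[i - 1]
                n0, n1 = i, n - i
                mu0, mu1 = s0 / n0, (total - s0) / n1
                if abs(mu0 - mu1) >= self._eps_cut(n0, n1):
                    self.window = self.window[i:]   # drop W0, the oldest part
                    detected = shrunk = True
                    break
        return detected

    def estimate(self):
        # Current estimate of the stream mean over the adaptive window.
        return sum(self.window) / len(self.window) if self.window else 0.0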

Page 29: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Algorithm ADaptive Sliding WINdow (2) ADWIN

Theorem
At every time step we have:

1 (False positive rate bound). If μt remains constant within W, the probability that ADWIN shrinks the window at this step is at most δ.

2 (False negative rate bound). Suppose that for some partition of W into two parts W0 · W1 (where W1 contains the most recent items) we have |μW0 − μW1| > 2εc. Then with probability 1 − δ, ADWIN shrinks W to W1, or shorter.

ADWIN tunes itself to the data stream at hand, with no need for the user to hardwire or precompute parameters.

13 / 59

Page 30: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Algorithm ADaptive Sliding WINdow (2) ADWIN

ADWIN, using a Data Stream Sliding Window Model, can provide the exact counts of 1's in O(1) time per point:

tries O(log W) cutpoints
uses O((1/ε) log W) memory words
the processing time per example is O(log W) (amortized and worst-case)

Sliding Window Model

Buckets:   1010101   101   11   1   1
Content:   4         2     2    1   1
Capacity:  7         3     2    1   1

14 / 59

Page 31: Adaptive Learning and Mining for Data Streams and Frequent Patterns

K-ADWIN = ADWIN + Kalman Filtering (2) ADWIN

[Block diagram: xt → Kalman filter → Estimation; ADWIN serves as the memory module and as the change detector that raises the Alarm]

R = W²/50 and Q = 200/W (theoretically justifiable), where W is the length of the window maintained by ADWIN.

15 / 59
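A scalar Kalman filter with the slide's noise settings can be sketched as follows; the window length W is assumed to come from an ADWIN instance (e.g. len(adwin.window) in the earlier sketch). This is an illustrative reading of K-ADWIN, not the thesis code.

class KalmanKAdwin:
    # One-dimensional Kalman filter (identity state transition) whose
    # noise covariances follow the slide: R = W^2 / 50, Q = 200 / W.
    def __init__(self):
        self.x = 0.0   # state estimate
        self.p = 1.0   # estimate covariance

    def update(self, z, w_len):
        r = w_len ** 2 / 50.0       # measurement noise
        q = 200.0 / w_len           # process noise
        self.p += q                 # predict step
        k = self.p / (self.p + r)   # Kalman gain
        self.x += k * (z - self.x)  # correct step
        self.p *= (1.0 - k)
        return self.x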

Page 32: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Classification (3) Mining Algorithms

Definition
Given nC different classes, a classifier algorithm builds a model that predicts for every unlabeled instance I the class C to which it belongs with accuracy.

Classification Mining Algorithms
Naïve Bayes
Decision Trees
Ensemble Methods

16 / 59

Page 33: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Hoeffding Tree / CVFDT (3) Mining Algorithms

Hoeffding Tree: VFDT

Pedro Domingos and Geoff Hulten. Mining high-speed data streams. 2000

With high probability, constructs a model identical to the one a traditional (greedy) method would learn
With theoretical guarantees on the error rate

[Figure: example decision tree over attributes Time (Day / Night) and Contains "Money" (Yes / No), with YES / NO leaves]

17 / 59
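The core of the VFDT split decision is the Hoeffding bound; a sketch of the test behind the "identical model with high probability" guarantee, with illustrative names:

import math

def hoeffding_split_ready(gain_best, gain_second, n, delta=1e-7, R=1.0):
    # Split when the observed gain gap between the two best attributes
    # exceeds eps = sqrt(R^2 ln(1/delta) / (2n)), where n is the number
    # of examples seen at the leaf and R is the range of the gain metric.
    eps = math.sqrt(R ** 2 * math.log(1.0 / delta) / (2.0 * n))
    return (gain_best - gain_second) > eps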

Page 34: Adaptive Learning and Mining for Data Streams and Frequent Patterns

VFDT / CVFDT (3) Mining Algorithms

Concept-adapting Very Fast Decision Trees: CVFDT

G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. 2001

It keeps its model consistent with a sliding window of examples
Constructs "alternative branches" as preparation for changes
If an alternative branch becomes more accurate, a switch of tree branches occurs

18 / 59

Page 35: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Decision Trees: CVFDT (3) Mining Algorithms

[Figure: example decision tree over attributes Time (Day / Night) and Contains "Money" (Yes / No), with YES / NO leaves]

No theoretical guarantees on the error rate of CVFDT

CVFDT parameters:
1 W: the example window size.
2 T0: number of examples used to check at each node if the splitting attribute is still the best.
3 T1: number of examples used to build the alternate tree.
4 T2: number of examples used to test the accuracy of the alternate tree.

19 / 59

Page 36: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Decision Trees: Hoeffding Adaptive Tree (3) Mining Algorithms

Hoeffding Adaptive Tree:
replace frequency statistics counters by estimators
no need for a window to store examples, since the statistics needed are maintained by the estimators
change the way alternate subtrees are checked for substitution, using a change detector with theoretical guarantees

Summary:
1 Theoretical guarantees
2 No parameters

20 / 59

Page 37: Adaptive Learning and Mining for Data Streams and Frequent Patterns

What is MOA?

{M}assive {O}nline {A}nalysis is a framework for online learning from data streams.

It is closely related to WEKA
It includes a collection of offline and online methods, as well as tools for evaluation:

boosting and bagging
Hoeffding Trees, with and without Naïve Bayes classifiers at the leaves

21 / 59

Page 38: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Ensemble Methods (4) MOA for Evolving Data Streams

http://www.cs.waikato.ac.nz/~abifet/MOA/

New ensemble methods:
ADWIN bagging: when a change is detected, the worst classifier is removed and a new classifier is added.
Adaptive-Size Hoeffding Tree bagging

22 / 59
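A schematic version of ADWIN bagging (a sketch, not MOA's implementation): each member gets Poisson(1)-weighted copies of the example, its error stream is watched by a change detector with an add(err) -> bool interface (the AdwinSketch above fits), and on an alarm the member with the worst running error is replaced. The factory arguments are assumptions.

import math
import random

class AdwinBaggingSketch:
    def __init__(self, base_learner, detector, n=10, seed=1):
        # Each member: [model, change detector, error count, examples seen].
        self._new = lambda: [base_learner(), detector(), 0.0, 0]
        self.members = [self._new() for _ in range(n)]
        self.rng = random.Random(seed)

    def _poisson1(self):
        # Knuth's algorithm for Poisson(1), the online-bagging weight.
        L, k, p = math.exp(-1.0), 0, 1.0
        while p > L:
            k += 1
            p *= self.rng.random()
        return k - 1

    def learn(self, x, y):
        change = False
        for m in self.members:
            model, det = m[0], m[1]
            err = 0.0 if model.predict(x) == y else 1.0
            m[2] += err
            m[3] += 1
            change |= det.add(err)             # monitor the error stream
            for _ in range(self._poisson1()):  # resampled training weight
                model.learn(x, y)
        if change:                             # replace the worst member
            worst = max(range(len(self.members)),
                        key=lambda i: self.members[i][2] / max(self.members[i][3], 1))
            self.members[worst] = self._new()

    def predict(self, x):
        votes = [m[0].predict(x) for m in self.members]
        return max(set(votes), key=votes.count)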

Page 39: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Adaptive-Size Hoeffding Tree (5) ASHT

[Figure: ensemble of trees T1, T2, T3, T4 of increasing size]

Ensemble of trees of different size:
smaller trees adapt more quickly to changes
larger trees do better during periods with little change
diversity

23 / 59

Page 40: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Adaptive-Size Hoeffding Tree (5) ASHT

Figure: Kappa-Error diagrams for ASHT bagging (left) and bagging (right) on dataset RandomRBF with drift, plotting 90 pairs of classifiers. (Axes: Kappa on x, Error on y; tick labels omitted.)

24 / 59

Page 41: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Adaptive-Size Hoeffding Tree (5) ASHT

Figure: Accuracy and size on dataset LED with three concept drifts.

25 / 59

Page 42: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Main contributions (i): Mining Evolving Data Streams

1 General Framework for Time Change Detectors and Predictors

2 ADWIN

3 Mining methods: Naïve Bayes, Decision Trees, Ensemble Methods

4 MOA for Evolving Data Streams

5 Adaptive-Size Hoeffding Tree

26 / 59

Page 43: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Outline

1 Introduction

2 Mining Evolving Data Streams

3 Tree Mining

4 Mining Evolving Tree Data Streams

5 Conclusions

27 / 59

Page 44: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Mining Closed Frequent Trees

Our trees are:
Labeled and Unlabeled
Ordered and Unordered

Our subtrees are:
Induced
Top-down

Two different ordered trees, but the same unordered tree

28 / 59

Page 45: Adaptive Learning and Mining for Data Streams and Frequent Patterns

A tale of two trees

Consider D = {A,B}, where

A:

B:

and let min_sup = 2.

Frequent subtrees of B and A [tree figures omitted]

29 / 59

Page 46: Adaptive Learning and Mining for Data Streams and Frequent Patterns

A tale of two trees

Consider D = {A,B}, where

A:

B:

and let min_sup = 2.

Closed subtrees of B and A [tree figures omitted]

29 / 59


Page 49: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Mining Closed Unordered Subtrees (6) Unlabeled Closed Frequent Tree Method

CLOSED_SUBTREES(t, D, min_sup, T)

if not CANONICAL_REPRESENTATIVE(t):
    return T
for every t′ that can be extended from t in one step:
    if Support(t′) ≥ min_sup:
        T ← CLOSED_SUBTREES(t′, D, min_sup, T)
    if Support(t′) = Support(t):
        mark t as not closed
if t is closed:
    insert t into T
return T

30 / 59

Page 50: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Example

D = {A, B}

min_sup = 2.

〈A〉= (0,1,2,3,2,1) 〈B〉= (0,1,2,3,1,2,2)

(0) (0,1)

(0,1,1)

(0,1,2)

(0,1,2,1)

(0,1,2,2)

(0,1,2,3)

(0,1,2,2,1)

(0,1,2,3,1)

31 / 59
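The tuples above are the natural (depth-sequence) representation used throughout: a preorder listing of node depths. A small sketch, with nested Python lists standing in for trees (not the thesis code):

def depth_sequence(tree, depth=0, out=None):
    # A tree is a list of its subtrees; emit node depths in preorder.
    if out is None:
        out = []
    out.append(depth)
    for child in tree:
        depth_sequence(child, depth + 1, out)
    return out

A = [[[[]], []], []]        # depth_sequence(A) == [0, 1, 2, 3, 2, 1]
B = [[[[]]], [[], []]]      # depth_sequence(B) == [0, 1, 2, 3, 1, 2, 2]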


Page 52: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Experimental results (6) Unlabeled Closed Frequent Tree Method

TreeNat: Unlabeled Trees, Top-Down Subtrees, No Occurrences

CMTreeMiner: Labeled Trees, Induced Subtrees, Occurrences

32 / 59

Page 53: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Closure Operator on Trees (7) Closure Operator

D: the finite input dataset of trees
T: the (infinite) set of all trees

Definition
We define the following Galois connection pair:

For finite A ⊆ D, σ(A) is the set of subtrees of the A trees in T:

σ(A) = { t ∈ T | ∀ t′ ∈ A (t ⪯ t′) }

For finite B ⊂ T, τD(B) is the set of supertrees of the B trees in D:

τD(B) = { t′ ∈ D | ∀ t ∈ B (t ⪯ t′) }

Closure Operator
The composition ΓD = σ ◦ τD is a closure operator.

33 / 59
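To see the Galois connection mechanically, here is a sketch with itemsets under set inclusion standing in for trees under the subtree order (the tree case additionally needs a subtree test and pattern enumeration; this is only an analogy, not the thesis implementation):

def sigma(trans):
    # Maximal common subpattern of the given transactions; for itemsets,
    # every common subpattern is a subset of this intersection.
    return set.intersection(*map(set, trans)) if trans else set()

def tau(D, B):
    # Transactions of D that are superpatterns of every pattern in B.
    return [t for t in D if all(b <= t for b in B)]

def gamma(D, B):
    # Closure operator Gamma_D = sigma composed with tau_D.
    return sigma(tau(D, B))

D = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}]
print(gamma(D, [{"a"}]))    # {'a', 'b'}: the closure of {a} in D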

Page 54: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Galois Lattice of closed set of trees (7) Closure Operator

1 2 3

12 13 23

123

34 / 59


Page 57: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Galois Lattice of closed set of trees

B = { }

τD(B) = { , }

ΓD(B) = σ ◦τD(B) = { and its subtrees }

1 2 3

12 13 23

123

35 / 59

Page 58: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Mining Implications from Lattices of Closed Trees (8) Association Rules

Problem
Given a dataset D of rooted, unlabeled and unordered trees, find a "basis": a set of rules that are sufficient to infer all the rules that hold in the dataset D.

D

∧ →

∧ →

∧ →

36 / 59

Page 59: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Mining Implications from Lattices of Closed Trees

Set of Rules:

A→ ΓD(A).

antecedents are obtained through a computation akin to a hypergraph transversal
consequents follow from an application of the closure operators

1 2 3

12 13 23

123

37 / 59

Page 60: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Mining Implications from Lattices of Closed Trees

Set of Rules:

A→ ΓD(A).

∧ →

∧ →

∧ →

1 2 3

12 13 23

123

37 / 59

Page 61: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Association Rule Computation Example (8) Association Rules

1 2 3

12 13 23

123

23

38 / 59

Page 65: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Model transformation (8) Association Rules

Intuition
One propositional variable vt is assigned to each possible subtree t.
A set of trees A corresponds in a natural way to a model mA.
Let mA be a model: we impose on mA the constraint that if mA(vt) = 1 for a variable vt, then mA(vt′) = 1 for all those variables vt′ such that vt′ represents a subtree of the tree represented by vt.

R0 = { vt′ → vt | t ⪯ t′, t ∈ U, t′ ∈ U }

39 / 59

Page 66: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Implicit Rules Definition (9) Implicit Rules

D

Implicit Rule

∧ →

Given three trees t1, t2, t3, we say that t1 ∧ t2 → t3 is an implicit Horn rule (abbreviately, an implicit rule) if for every tree t it holds:

t1 ⪯ t ∧ t2 ⪯ t ↔ t3 ⪯ t.

t1 and t2 have implicit rules if t1 ∧ t2 → t is an implicit rule for some t.

40 / 59

Page 67: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Implicit Rules Definition (9) Implicit Rules

D

NOT Implicit Rule

∧ →

Given three trees t1, t2, t3, we say that t1 ∧ t2 → t3 is an implicit Horn rule (abbreviately, an implicit rule) if for every tree t it holds:

t1 ⪯ t ∧ t2 ⪯ t ↔ t3 ⪯ t.

t1 and t2 have implicit rules if t1 ∧ t2 → t is an implicit rule for some t.

40 / 59

Page 68: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Implicit Rules Definition (9) Implicit Rules

This supertree of the antecedents is NOT a supertree of the consequents.

NOT Implicit Rule

NOT Implicit Rule

∧ →

40 / 59

Page 69: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Implicit Rules Characterization (9) Implicit Rules

Theorem
All trees a, b such that a ⪯ b have implicit rules.

Theorem
Suppose that b has only one component. Then they have implicit rules if and only if a has a maximum component which is a subtree of the component of b:

for all i < n: ai ⪯ an ⪯ b1

(a1 · · · an−1 an) ∧ (b1) → (a1 · · · an−1 b1)

41 / 59

Page 70: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Main contributions (ii): Tree Mining

6 Closure Operator on Trees
7 Unlabeled Closed Frequent Tree Mining
8 A way of extracting high-confidence association rules from datasets consisting of unlabeled trees:
antecedents are obtained through a computation akin to a hypergraph transversal
consequents follow from an application of the closure operators
9 Detection of some cases of implicit rules: rules that always hold, independently of the dataset

42 / 59

Page 71: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Outline

1 Introduction

2 Mining Evolving Data Streams

3 Tree Mining

4 Mining Evolving Tree Data Streams

5 Conclusions

43 / 59

Page 72: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Mining Evolving Tree Data Streams (10,11,12) Incremental, Sliding Window, and Adaptive Tree Mining Methods

Problem
Given a data stream D of rooted, unlabeled and unordered trees, find frequent closed trees.

D

We provide three algorithms, of increasing power:

Incremental
Sliding Window
Adaptive

44 / 59

Page 73: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Relaxed Support (13) Logarithmic Relaxed Support

Guojie Song, Dongqing Yang, Bin Cui, Baihua Zheng, Yunfeng Liu and Kunqing Xie. CLAIM: An Efficient Method for Relaxed Frequent Closed Itemsets Mining over Stream Data

Linear Relaxed Interval: The support space of all subpatterns can be divided into n = ⌈1/εr⌉ intervals, where εr is a user-specified relaxed factor, and each interval can be denoted by Ii = [li, ui), where li = (n − i) · εr ≥ 0, ui = (n − i + 1) · εr ≤ 1 and i ≤ n.

Linear Relaxed closed subpattern t: if and only if there exists no proper superpattern t′ of t such that their supports belong to the same interval Ii.

45 / 59

Page 74: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Relaxed Support (13) Logarithmic Relaxed Support

As the number of closed frequent patterns is not linear with respect to support, we introduce a new relaxed support:

Logarithmic Relaxed Interval: The support space of all subpatterns can be divided into n = ⌈1/εr⌉ intervals, where εr is a user-specified relaxed factor, and each interval can be denoted by Ii = [li, ui), where li = ⌈c^i⌉, ui = ⌈c^(i+1) − 1⌉ and i ≤ n.

Logarithmic Relaxed closed subpattern t: if and only if there exists no proper superpattern t′ of t such that their supports belong to the same interval Ii.

45 / 59
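A quick sketch of the interval index implied by the logarithmic definition (integer supports; function and parameter names illustrative): two supports are "logarithmically close" exactly when they share this index.

def log_interval(support, c=2.0):
    # Integer-safe floor(log_c(support)) for support >= 1: the index i
    # of the interval I_i = [ceil(c^i), ceil(c^(i+1) - 1)).
    i = 0
    while c ** (i + 1) <= support:
        i += 1
    return i

assert log_interval(9) == log_interval(15) == 3   # same interval for c = 2
assert log_interval(16) == 4                      # next interval starts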

Page 75: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Algorithms (10,11,12) Incremental, Sliding Window, and Adaptive Tree Mining Methods

Algorithms
Incremental: INCTREENAT
Sliding Window: WINTREENAT
Adaptive: ADATREENAT, which uses ADWIN to monitor change

ADWIN

An adaptive sliding window whose size is recomputed online according to the rate of change observed.

ADWIN has rigorous guarantees (theorems):
On the ratio of false positives and false negatives
On the relation between the size of the current window and change rates

46 / 59

Page 76: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Experimental Validation: TN1 (10,11,12) Incremental, Sliding Window, and Adaptive Tree Mining Methods

Figure: Experiments on ordered trees with the TN1 dataset, plotting time (sec.) against dataset size (millions) for INCTREENAT and CMTreeMiner.

47 / 59

Page 77: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Adaptive XML Tree Classification on evolving data streams (14) XML Tree Classification

Figure: A dataset example. Four trees over node labels A, B, C, D, with classes CLASS1, CLASS2, CLASS1, CLASS2.

48 / 59

Page 78: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Adaptive XML Tree Classification on evolving data streams (14) XML Tree Classification

[Table: each closed frequent tree (c1, c2) with its associated frequent non-closed trees, and its occurrence vector over trees 1 to 4: c1 → 1 0 1 0; c2 → 1 0 0 1. Tree figures omitted.]

49 / 59

Page 79: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Adaptive XML Tree Classification on evolving data streams (14) XML Tree Classification

Frequent Trees (occurrence of each closed tree ci and its frequent subtrees fi^j, per tree Id):

Id | c1 f1¹ | c2 f2¹ f2² f2³ | c3 f3¹ | c4 f4¹ f4² f4³ f4⁴ f4⁵
1  | 1  1   | 1  1  1  1     | 0  0   | 1  1  1  1  1  1
2  | 0  0   | 0  0  0  0     | 1  1   | 1  1  1  1  1  1
3  | 1  1   | 0  0  0  0     | 1  1   | 1  1  1  1  1  1
4  | 0  0   | 1  1  1  1     | 1  1   | 1  1  1  1  1  1

Closed and Maximal Trees:

Id | Closed c1 c2 c3 c4 | Maximal c1 c2 c3 | Class
1  | 1 1 0 1            | 1 1 0            | CLASS1
2  | 0 0 1 1            | 0 0 1            | CLASS2
3  | 1 0 1 1            | 1 0 1            | CLASS1
4  | 0 1 1 1            | 0 1 1            | CLASS2

50 / 59

Page 80: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Adaptive XML Tree Framework on evolving data streams (14) XML Tree Classification

XML Tree Classification Framework Components
An XML closed frequent tree miner
A data stream classifier algorithm, which we will feed with tuples to be classified online.

51 / 59
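Schematically, the two components compose as below (names are illustrative assumptions, not the thesis's API): the miner yields a set of patterns, and every incoming tree becomes a binary tuple for the stream classifier.

def tree_to_features(tree, patterns, contains):
    # patterns: closed (or maximal) frequent trees from the miner.
    # contains(tree, p): subtree test, assumed supplied by the miner.
    # The resulting 0/1 tuple is what the online classifier is fed.
    return [1 if contains(tree, p) else 0 for p in patterns]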

Page 81: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Adaptive XML Tree Framework on evolving data streams (14) XML Tree Classification

                     Maximal                   Closed
          # Trees   Att.   Acc.    Mem.   Att.   Acc.    Mem.
CSLOG12   15483     84     79.64   1.2    228    78.12   2.54
CSLOG23   15037     88     79.81   1.21   243    78.77   2.75
CSLOG31   15702     86     79.94   1.25   243    77.60   2.73
CSLOG123  23111     84     80.02   1.7    228    78.91   4.18

Table: BAGGING on unordered trees.

52 / 59

Page 82: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Main contributions (iii): Mining Evolving Tree Data Streams

10 Incremental Method
11 Sliding Window Method
12 Adaptive Method
13 Logarithmic Relaxed Support
14 XML Classification

53 / 59

Page 83: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Outline

1 Introduction

2 Mining Evolving Data Streams

3 Tree Mining

4 Mining Evolving Tree Data Streams

5 Conclusions

54 / 59

Page 84: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Main contributions

Mining Evolving Data Streams

1 Framework
2 ADWIN
3 Classifiers
4 MOA
5 ASHT

Tree Mining

6 Closure Operator on Trees
7 Unlabeled Tree Mining Methods
8 Deterministic Association Rules
9 Implicit Rules

Mining Evolving Tree Data Streams

10 Incremental Method
11 Sliding Window Method
12 Adaptive Method
13 Logarithmic Relaxed Support
14 XML Classification

55 / 59

Page 85: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Future Lines (i)

Adaptive Kalman Filter

A Kalman filter that adaptively computes Q and R without using the size of the window of ADWIN.

Extend the MOA framework:
Support vector machines
Clustering
Itemset mining
Association rules

56 / 59

Page 86: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Future Lines (ii)

Adaptive Deterministic Association Rules
Deterministic association rules computed on evolving data streams

General Implicit Rules Characterization
Find a characterization of implicit rules with any number of components

Non-Deterministic Association Rules
Find a basis of association rules for trees with confidence lower than 100%

57 / 59

Page 87: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Future Lines (iii)

Closed Frequent Graph Mining
Mining methods to obtain closed frequent graphs:

Not incremental
Incremental
Sliding Window
Adaptive

Graph Classification
Classifiers of graphs using maximal and closed frequent subgraphs.

58 / 59

Page 88: Adaptive Learning and Mining for Data Streams and Frequent Patterns

Relevant publications

Albert Bifet and Ricard Gavaldà. Kalman filters and adaptive windows for learning in data streams. DS'06

Albert Bifet and Ricard Gavaldà. Learning from time-changing data with adaptive windowing. SDM'07

Albert Bifet and Ricard Gavaldà. Adaptive parameter-free learning from evolving data streams. Tech-Rep R09-9

A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby, and R. Gavaldà. New ensemble methods for evolving data streams. KDD'09

José L. Balcázar, Albert Bifet, and Antoni Lozano. Mining frequent closed unordered trees through natural representations. ICCS'07

José L. Balcázar, Albert Bifet, and Antoni Lozano. Subtree testing and closed tree mining through natural representations. DEXA'07

José L. Balcázar, Albert Bifet, and Antoni Lozano. Mining implications from lattices of closed trees. EGC'2008

José L. Balcázar, Albert Bifet, and Antoni Lozano. Mining Frequent Closed Rooted Trees. MLJ'09

Albert Bifet and Ricard Gavaldà. Mining adaptively frequent closed unlabeled rooted trees in data streams. KDD'08

Albert Bifet and Ricard Gavaldà. Adaptive XML Tree Classification on evolving data streams

59 / 59