Adaptive Learning and Mining for Data Streams and Frequent Patterns


DESCRIPTION

Thesis defense on mining evolving data streams and tree mining.

TRANSCRIPT

Adaptive Learning and Mining for Data Streams and Frequent Patterns

Albert Bifet

Laboratory for Relational Algorithmics, Complexity and Learning (LARCA), Departament de Llenguatges i Sistemes Informàtics

Universitat Politècnica de Catalunya

Ph.D. dissertation, 24 April 2009. Advisors: Ricard Gavaldà and José L. Balcázar


Future Data Mining

Future data mining will involve:
- Structured data
- Finding interesting patterns
- Predictions
- On-line processing

Mining Evolving Massive Structured Data

The Disintegration of the Persistence of Memory, 1952–54

Salvador Dalí

The basic problem: finding interesting structure in data.

- Mining massive data
- Mining time-varying data
- Mining in real time
- Mining structured data

Data Streams

Data Streams:
- The sequence is potentially infinite
- High amount of data: sublinear space
- High speed of arrival: sublinear time per example
- Once an element from a data stream has been processed, it is discarded or archived

Approximation algorithms:
- Small error rate with high probability
- An algorithm (ε, δ)-approximates F if it outputs F̃ for which Pr[|F̃ − F| > εF] < δ.

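For a feel of the numbers, here is a short sketch of the standard Hoeffding-bound calculation for how many samples suffice to (ε, δ)-approximate the mean of a [0,1]-bounded stream statistic. Note this is the absolute-error variant, whereas the definition above uses relative error εF.

```python
import math

def samples_needed(eps, delta):
    # Hoeffding: n >= ln(2/delta) / (2 * eps^2) ensures
    # Pr[|F_hat - F| > eps] < delta for a mean of [0,1]-bounded values.
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps * eps))

print(samples_needed(0.01, 0.05))   # -> 18445 samples
```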

Tree Pattern Mining

Trees are sanctuaries. Whoever knows how to listen to them can learn the truth.

Hermann Hesse

Given a dataset of trees, find the complete set of frequent subtrees.

Frequent Tree Pattern (FT): include all trees whose support is no less than min_sup.

Closed Frequent Tree Pattern (CT): include no tree which has a super-tree with the same support.

CT ⊆ FT

Outline

Mining Evolving Data Streams
1 Framework
2 ADWIN
3 Classifiers
4 MOA
5 ASHT

Tree Mining
6 Closure Operator on Trees
7 Unlabeled Tree Mining Methods
8 Deterministic Association Rules
9 Implicit Rules

Mining Evolving Tree Data Streams
10 Incremental Method
11 Sliding Window Method
12 Adaptive Method
13 Logarithmic Relaxed Support
14 XML Classification


Outline

1 Introduction

2 Mining Evolving Data Streams

3 Tree Mining

4 Mining Evolving Tree Data Streams

5 Conclusions


Data Mining Algorithms with Concept Drift

No Concept Drift: the DM algorithm reads the input and maintains a set of counters (Counter1, ..., Counter5) from which it computes its output.

Concept Drift: the scheme is adapted in one of two ways:
- keep a static model and add a change detector that monitors the output and signals when the model must be revised;
- replace the counters with estimators (Estimator1, ..., Estimator5), so the statistics track the evolving distribution.

Time Change Detectors and Predictors
(1) General Framework

Problem: Given an input sequence x1, x2, ..., xt, ..., at instant t we want to output:

- a prediction x̂t+1 minimizing the prediction error |x̂t+1 − xt+1|
- an alert if change is detected

considering distribution changes over time.

Time Change Detectors and Predictors
(1) General Framework

[Diagram, built up in three steps on the slides: the input xt feeds an Estimator that outputs the Estimation; a Change Detector monitors the estimator and raises an Alarm; a Memory module interacts with both, holding the data the estimator needs.]

Optimal Change Detector and Predictor
(1) General Framework

- High accuracy
- Fast detection of change
- Low false positive and false negative ratios
- Low computational cost: minimum space and time needed
- Theoretical guarantees
- No parameters needed
- Estimator with Memory and Change Detector

Algorithm ADaptive Sliding WINdow
(2) ADWIN

ADWIN: ADAPTIVE WINDOWING ALGORITHM

1  Initialize window W
2  for each t > 0
3    do W ← W ∪ {xt} (i.e., add xt to the head of W)
4       repeat drop elements from the tail of W
5       until |µ̂W0 − µ̂W1| < εc holds
6         for every split of W into W = W0 · W1
7       output µ̂W

Example (shown as an animation in the original slides): with W = 101010110111111, the algorithm examines every split W = W0 · W1, moving the cut point one element at a time:

W0 = 1, W1 = 01010110111111
W0 = 10, W1 = 1010110111111
...
W0 = 10101011, W1 = 0111111
W0 = 101010110, W1 = 111111  →  |µ̂W0 − µ̂W1| ≥ εc: CHANGE DETECTED!

Once change is detected, elements are dropped from the tail of W, leaving W = 01010110111111.

Algorithm ADaptive Sliding WINdow
(2) ADWIN

Theorem: At every time step we have:

1 (False positive rate bound). If µt remains constant within W, the probability that ADWIN shrinks the window at this step is at most δ.

2 (False negative rate bound). Suppose that for some partition of W into two parts W0 W1 (where W1 contains the most recent items) we have |µW0 − µW1| > 2εc. Then with probability 1 − δ, ADWIN shrinks W to W1, or shorter.

ADWIN tunes itself to the data stream at hand, with no need for the user to hardwire or precompute parameters.
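For concreteness, here is a minimal Python sketch of the windowing loop above. It assumes the Hoeffding-style threshold εc from the ADWIN paper, with δ shared across the candidate cut points; a real implementation uses exponential histograms (next slide) so the window is never stored explicitly.

```python
import math

def eps_cut(n0, n1, delta, n_cuts):
    # Hoeffding-style threshold for comparing the means of two sub-windows;
    # delta is divided among all candidate cut points.
    m = 1.0 / (1.0 / n0 + 1.0 / n1)      # harmonic mean of the sizes
    return math.sqrt((1.0 / (2.0 * m)) * math.log(4.0 * n_cuts / delta))

class SimpleAdwin:
    """Naive ADWIN sketch that stores the window explicitly (O(|W|^2) per item)."""

    def __init__(self, delta=0.01):
        self.delta = delta
        self.window = []                  # index 0 = most recent element

    def update(self, x):
        """Add x; shrink the window while some split looks inconsistent."""
        self.window.insert(0, x)
        change, shrunk = False, True
        while shrunk and len(self.window) > 1:
            shrunk = False
            n = len(self.window)
            for cut in range(1, n):
                w1, w0 = self.window[:cut], self.window[cut:]   # recent, old
                mu1 = sum(w1) / len(w1)
                mu0 = sum(w0) / len(w0)
                if abs(mu0 - mu1) >= eps_cut(len(w0), len(w1), self.delta, n - 1):
                    self.window.pop()     # drop one element from the tail
                    change = shrunk = True
                    break
        return change                     # True if a change was detected

    def estimate(self):
        return sum(self.window) / max(1, len(self.window))
```

Feeding it the bits of the example stream above shrinks the window once the final run of 1's makes the two sub-window means differ by more than εc.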

Algorithm ADaptive Sliding WINdow
(2) ADWIN

ADWIN, using a Data Stream Sliding Window Model:
- can provide the exact counts of 1's in O(1) time per point
- tries O(log W) cutpoints
- uses O((1/ε) log W) memory words
- has processing time per example of O(log W) (amortized and worst-case)

Sliding Window Model:

Buckets:   1010101 | 101 | 11 | 1 | 1
Content:      4       2     2   1   1
Capacity:     7       3     2   1   1

K-ADWIN = ADWIN + Kalman Filtering
(2) ADWIN

[Diagram: xt feeds a Kalman filter that outputs the Estimation; ADWIN acts both as the change detector raising the Alarm and as the memory module feeding the filter.]

R = W²/50 and Q = 200/W (theoretically justifiable), where W is the length of the window maintained by ADWIN.
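A small sketch of the corresponding one-dimensional Kalman update, assuming a random-walk state model; the window length W would be supplied by a running ADWIN instance such as the SimpleAdwin sketch above.

```python
class KAdwinFilter:
    """1-D Kalman filter whose noise covariances track the ADWIN window length."""

    def __init__(self):
        self.x = 0.0      # state estimate
        self.p = 1.0      # estimate covariance

    def update(self, z, w_len):
        # Covariances as on the slide: R = W^2/50 (measurement), Q = 200/W (process).
        r = w_len * w_len / 50.0
        q = 200.0 / max(1, w_len)
        self.p += q                    # predict step under a random-walk model
        k = self.p / (self.p + r)      # Kalman gain
        self.x += k * (z - self.x)     # correct with the new observation z
        self.p *= (1.0 - k)
        return self.x
```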

Classification
(3) Mining Algorithms

Definition: Given nC different classes, a classifier algorithm builds a model that predicts, for every unlabelled instance I, the class C to which it belongs, with high accuracy.

Classification mining algorithms:
- Naïve Bayes
- Decision Trees
- Ensemble Methods

Hoeffding Tree / CVFDT
(3) Mining Algorithms

Hoeffding Tree: VFDT

Pedro Domingos and Geoff Hulten. Mining high-speed data streams. 2000.

- With high probability, constructs a model identical to the one a traditional (greedy) method would learn
- With theoretical guarantees on the error rate

[Decision tree figure over the attributes Time (Day/Night) and Contains "Money" (Yes/No), with YES/NO leaves.]
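The split decisions behind the VFDT guarantee rest on the Hoeffding bound; here is a short illustrative sketch of that test (the function names and the unit gain range are assumptions, not VFDT's actual API).

```python
import math

def hoeffding_bound(value_range, delta, n):
    # With probability 1 - delta, the true mean of a variable with the given
    # range lies within eps of the average of n independent observations.
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, n, delta=1e-7, gain_range=1.0):
    # Split once the observed advantage of the best attribute exceeds eps,
    # so it is the truly best attribute with probability at least 1 - delta.
    return (best_gain - second_gain) > hoeffding_bound(gain_range, delta, n)
```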

VFDT / CVFDT
(3) Mining Algorithms

Concept-adapting Very Fast Decision Trees: CVFDT

G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. 2001.

- It keeps its model consistent with a sliding window of examples
- It constructs "alternative branches" as preparation for changes
- If an alternative branch becomes more accurate, the tree branches are switched

Decision Trees: CVFDT
(3) Mining Algorithms

[Decision tree figure over the attributes Time (Day/Night) and Contains "Money" (Yes/No), with YES/NO leaves.]

No theoretical guarantees on the error rate of CVFDT.

CVFDT parameters:
1 W: the example window size
2 T0: number of examples used to check at each node if the splitting attribute is still the best
3 T1: number of examples used to build the alternate tree
4 T2: number of examples used to test the accuracy of the alternate tree

Decision Trees: Hoeffding Adaptive Tree
(3) Mining Algorithms

Hoeffding Adaptive Tree:
- replaces frequency statistics counters by estimators
- needs no window to store examples, since the statistics are maintained by the estimators
- changes the way alternate subtrees are checked for substitution, using a change detector with theoretical guarantees

Summary:
1 Theoretical guarantees
2 No parameters
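As an illustration of "counters replaced by estimators", here is a minimal exponentially weighted estimator; this is a generic stand-in for the idea, whereas the thesis uses ADWIN-based estimators with guarantees.

```python
class EwmaEstimator:
    """Exponentially weighted moving average: a window-free stand-in for a counter."""

    def __init__(self, alpha=0.01):
        self.alpha = alpha     # higher alpha forgets the past faster
        self.value = 0.0

    def add(self, x):
        # Each new observation pulls the estimate toward itself by a factor alpha.
        self.value += self.alpha * (x - self.value)

    def estimate(self):
        return self.value
```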

What is MOA?

{M}assive {O}nline {A}nalysis is a framework for online learning from data streams.

- It is closely related to WEKA
- It includes a collection of offline and online algorithms as well as tools for evaluation:
  - boosting and bagging
  - Hoeffding Trees, with and without Naïve Bayes classifiers at the leaves

Ensemble Methods
(4) MOA for Evolving Data Streams

http://www.cs.waikato.ac.nz/∼abifet/MOA/

New ensemble methods:
- ADWIN bagging: when a change is detected, the worst classifier is removed and a new classifier is added
- Adaptive-Size Hoeffding Tree bagging
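A sketch of one ADWIN bagging step, assuming scikit-learn-style learners with partial_fit, one detector per ensemble member (SimpleAdwin from the ADWIN section), and a hypothetical make_fresh_learner factory:

```python
import math
import random

def poisson_one():
    # Knuth's algorithm for Poisson(lambda = 1): the online-bagging weight.
    L, k, p = math.exp(-1.0), 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return k - 1

def adwin_bagging_step(ensemble, detectors, x, y, make_fresh_learner):
    """Train each member on (x, y) with a Poisson(1) weight; on change, replace the worst."""
    change = False
    for model, det in zip(ensemble, detectors):
        for _ in range(poisson_one()):
            model.partial_fit([x], [y])
        err = 0.0 if model.predict([x])[0] == y else 1.0
        if det.update(err):               # ADWIN tracks each member's error rate
            change = True
    if change:
        worst = max(range(len(ensemble)), key=lambda i: detectors[i].estimate())
        ensemble[worst] = make_fresh_learner()
        detectors[worst] = type(detectors[worst])()
```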

Adaptive-Size Hoeffding Tree
(5) ASHT

[Figure: an ensemble of four Hoeffding trees T1, T2, T3, T4 of increasing size.]

Ensemble of trees of different sizes:
- smaller trees adapt more quickly to changes
- larger trees do better during periods with little change
- diversity
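A sketch of the size schedule and reset rule, assuming (as in the KDD'09 ensemble paper) that each tree's size limit is twice the previous one and that an oversized tree is restarted; the method names are illustrative.

```python
def asht_size_limits(n_trees, first_size=2):
    # The size limit of tree k is twice the limit of tree k - 1.
    return [first_size * 2 ** k for k in range(n_trees)]

def enforce_size_limit(tree, limit, make_fresh_tree):
    # When a tree outgrows its limit, restart it so it can re-adapt quickly.
    if tree.num_split_nodes() > limit:
        return make_fresh_tree()
    return tree
```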

Adaptive-Size Hoeffding Tree
(5) ASHT

Figure: Kappa-Error diagrams for ASHT bagging (left) and bagging (right) on dataset RandomRBF with drift, plotting 90 pairs of classifiers. [Plots omitted; both panels show error against kappa.]

Adaptive-Size Hoeffding Tree
(5) ASHT

Figure: Accuracy and size on dataset LED with three concept drifts. [Plot omitted.]

Main contributions (i)
Mining Evolving Data Streams

1 General Framework for Time Change Detectors and Predictors
2 ADWIN
3 Mining methods: Naïve Bayes, Decision Trees, Ensemble Methods
4 MOA for Evolving Data Streams
5 Adaptive-Size Hoeffding Tree

Outline

1 Introduction

2 Mining Evolving Data Streams

3 Tree Mining

4 Mining Evolving Tree Data Streams

5 Conclusions


Mining Closed Frequent Trees

Our trees are:
- Labeled and unlabeled
- Ordered and unordered

Our subtrees are:
- Induced
- Top-down

[Figure: two different ordered trees that are the same unordered tree.]

A tale of two trees

Consider D = {A, B}, where A and B are the two trees shown in the figure, and let min_sup = 2.

[Figures: first the frequent subtrees of D, then the closed subtrees; the tree drawings are omitted.]

Mining Closed Unordered Subtrees
(6) Unlabeled Closed Frequent Tree Method

The algorithm is built up in stages on the slides; its final form:

CLOSED_SUBTREES(t, D, min_sup, T)
 1  if not CANONICAL_REPRESENTATIVE(t)
 2    then return T
 3  for every t′ that can be extended from t in one step
 4    do if Support(t′) ≥ min_sup
 5      then T ← CLOSED_SUBTREES(t′, D, min_sup, T)
 6    do if Support(t′) = Support(t)
 7      then t is not closed
 8  if t is closed
 9    then insert t into T
10  return T
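A skeletal Python rendering of this recursion; the tree-specific operations (canonicality check, one-step extensions, support counting) are assumed callbacks, not the thesis implementation.

```python
def closed_subtrees(t, dataset, min_sup, result, ops):
    """ops supplies is_canonical(t), extensions(t), and support(t, dataset)."""
    if not ops.is_canonical(t):
        return result
    closed = True
    for t_ext in ops.extensions(t):
        if ops.support(t_ext, dataset) >= min_sup:
            result = closed_subtrees(t_ext, dataset, min_sup, result, ops)
            if ops.support(t_ext, dataset) == ops.support(t, dataset):
                closed = False       # a one-step supertree has the same support
    if closed:
        result.append(t)
    return result
```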

Example
D = {A, B}, min_sup = 2.

⟨A⟩ = (0,1,2,3,2,1)   ⟨B⟩ = (0,1,2,3,1,2,2)

The candidates explored, in depth-sequence (natural) representation:

(0), (0,1), (0,1,1), (0,1,2), (0,1,2,1), (0,1,2,2), (0,1,2,3), (0,1,2,2,1), (0,1,2,3,1)

Experimental results
(6) Unlabeled Closed Frequent Tree Method

TreeNat: unlabeled trees, top-down subtrees, no occurrences.
CMTreeMiner: labeled trees, induced subtrees, occurrences.

Closure Operator on Trees
(7) Closure Operator

D: the finite input dataset of trees
T: the (infinite) set of all trees

Definition: We define the following Galois connection pair:

For finite A ⊆ D, σ(A) is the set of trees that are subtrees of all the trees of A:

σ(A) = {t ∈ T | ∀ t′ ∈ A (t ⪯ t′)}

For finite B ⊂ T, τD(B) is the set of dataset trees that are supertrees of all the trees of B:

τD(B) = {t′ ∈ D | ∀ t ∈ B (t ⪯ t′)}

Closure Operator: the composition ΓD = σ ∘ τD is a closure operator.
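An illustrative sketch of the Galois connection, with the subtree relation abstracted into a generic partial order leq; a toy stand-in to make the definitions concrete, not the thesis implementation.

```python
def sigma(A, all_trees, leq):
    # Trees from the universe that are subtrees of every tree in A.
    return {t for t in all_trees if all(leq(t, a) for a in A)}

def tau(B, dataset, leq):
    # Dataset trees that are supertrees of every tree in B.
    return {d for d in dataset if all(leq(t, d) for t in B)}

def gamma(B, dataset, all_trees, leq):
    # The closure operator Gamma_D = sigma . tau_D.
    return sigma(tau(B, dataset, leq), all_trees, leq)
```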

Galois Lattice of closed set of trees
(7) Closure Operator

[Figure: the lattice of closed sets for a three-tree dataset D, with nodes labeled 1, 2, 3, their pairwise meets 12, 13, 23, and the bottom element 123. The animation walks one example: for a set B containing a single tree, τD(B) is the pair of dataset trees that are supertrees of it, and ΓD(B) = σ ∘ τD(B) is a tree together with all its subtrees.]

Mining Implications from Lattices of Closed Trees
(8) Association Rules

Problem: Given a dataset D of rooted, unlabeled and unordered trees, find a "basis": a set of rules that are sufficient to infer all the rules that hold in the dataset D.

[Figure: three example rules of the form t1 ∧ t2 → t3 read off the dataset D; the tree drawings are omitted.]

Mining Implications from Lattices of Closed Trees

Set of Rules: A → ΓD(A).
- antecedents are obtained through a computation akin to a hypergraph transversal
- consequents follow from an application of the closure operators

[Figure: the rules located on the lattice of closed sets (nodes 1, 2, 3; 12, 13, 23; 123).]

Association Rule Computation Example
(8) Association Rules

[Figure: a four-step animation computing the rules for the closed set 23 on the lattice (nodes 1, 2, 3; 12, 13, 23; 123).]

Model transformation
(8) Association Rules

Intuition:
- One propositional variable vt is assigned to each possible subtree t.
- A set of trees A corresponds in a natural way to a model mA.
- Let mA be a model: we impose on mA the constraint that if mA(vt) = 1 for a variable vt, then mA(vt′) = 1 for all those variables vt′ such that vt′ represents a subtree of the tree represented by vt:

R0 = {vt′ → vt | t ⪯ t′, t ∈ U, t′ ∈ U}

Implicit Rules Definition
(9) Implicit Rules

Given three trees t1, t2, t3, we say that t1 ∧ t2 → t3 is an implicit Horn rule (an implicit rule, for short) if for every tree t it holds that

t1 ⪯ t ∧ t2 ⪯ t ↔ t3 ⪯ t.

t1 and t2 have implicit rules if t1 ∧ t2 → t is an implicit rule for some t.

[Figures: one example rule over D that is implicit and one that is not; in the non-implicit case there is a supertree of the antecedents that is NOT a supertree of the consequent.]

Implicit Rules Characterization
(9) Implicit Rules

Theorem: All trees a, b such that a ⪯ b have implicit rules.

Theorem: Suppose that b has only one component. Then a and b have implicit rules if and only if a has a maximum component which is a subtree of the component of b:

for all i < n: ai ⪯ an ⪯ b1

[Figure: the corresponding rule, built from the components a1, ..., an−1, an of a and the component b1 of b, with consequent components a1, ..., an−1, b1.]

Main contributions (ii)
Tree Mining

6 Closure Operator on Trees
7 Unlabeled Closed Frequent Tree Mining
8 A way of extracting high-confidence association rules from datasets consisting of unlabeled trees:
  - antecedents are obtained through a computation akin to a hypergraph transversal
  - consequents follow from an application of the closure operators
9 Detection of some cases of implicit rules: rules that always hold, independently of the dataset

Outline

1 Introduction

2 Mining Evolving Data Streams

3 Tree Mining

4 Mining Evolving Tree Data Streams

5 Conclusions


Mining Evolving Tree Data Streams
(10, 11, 12) Incremental, Sliding Window, and Adaptive Tree Mining Methods

Problem: Given a data stream D of rooted, unlabeled and unordered trees, find frequent closed trees.

We provide three algorithms, of increasing power:
- Incremental
- Sliding Window
- Adaptive

Relaxed Support
(13) Logarithmic Relaxed Support

Guojie Song, Dongqing Yang, Bin Cui, Baihua Zheng, Yunfeng Liu, and Kunqing Xie. CLAIM: An efficient method for relaxed frequent closed itemsets mining over stream data.

Linear Relaxed Interval: the support space of all subpatterns can be divided into n = ⌈1/εr⌉ intervals, where εr is a user-specified relaxation factor, and each interval is denoted Ii = [li, ui), where li = (n − i) · εr ≥ 0, ui = (n − i + 1) · εr ≤ 1, and i ≤ n.

Linear Relaxed closed subpattern t: there exists no proper superpattern t′ of t such that their supports belong to the same interval Ii.

Relaxed Support
(13) Logarithmic Relaxed Support

As the number of closed frequent patterns is not linear with respect to the support, we introduce a new relaxed support:

Logarithmic Relaxed Interval: the support space of all subpatterns can be divided into n = ⌈1/εr⌉ intervals, where εr is a user-specified relaxation factor, and each interval is denoted Ii = [li, ui), where li = ⌈c^i⌉, ui = ⌈c^(i+1) − 1⌉, and i ≤ n.

Logarithmic Relaxed closed subpattern t: there exists no proper superpattern t′ of t such that their supports belong to the same interval Ii.
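For intuition, here is a small sketch mapping a support count to its logarithmic interval; the index convention (base c, interval Ii = [c^i, c^(i+1) − 1]) is an illustrative reading of the definition above.

```python
import math

def log_interval(support, c=2.0):
    # Index i of the interval [ceil(c^i), ceil(c^(i+1) - 1)] containing support.
    return int(math.floor(math.log(max(1, support), c)))

# Supports 9 and 14 fall in the same interval I_3 = [8, 15] for c = 2, so a
# pattern and a superpattern with those supports would not both be kept closed.
assert log_interval(9) == log_interval(14) == 3
```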

Algorithms
(10, 11, 12) Incremental, Sliding Window, and Adaptive Tree Mining Methods

Algorithms:
- Incremental: INCTREENAT
- Sliding Window: WINTREENAT
- Adaptive: ADATREENAT, which uses ADWIN to monitor change

ADWIN: an adaptive sliding window whose size is recomputed online according to the rate of change observed. ADWIN has rigorous guarantees (theorems):
- on the ratio of false positives and false negatives
- on the relation between the size of the current window and change rates

Experimental Validation: TN1
(10, 11, 12) Incremental, Sliding Window, and Adaptive Tree Mining Methods

Figure: Experiments on ordered trees with the TN1 dataset: time (sec.) against dataset size (millions), comparing INCTREENAT and CMTreeMiner.

Adaptive XML Tree Classification on evolving data streams
(14) XML Tree Classification

Figure: A dataset example — four trees over node labels A, B, C, D, with class labels CLASS1, CLASS2, CLASS1, CLASS2 (tree drawings omitted).

Adaptive XML Tree Classification on evolving data streams
(14) XML Tree Classification

[Table: closed frequent trees c1 and c2 with their associated not-closed trees, and a 0/1 indicator of occurrence in each of the four transactions: c1 occurs in transactions 1 and 3, c2 in transactions 1 and 4.]

Adaptive XML Tree Classification on evolving data streams
(14) XML Tree Classification

[Table: binary attribute vectors over all frequent trees of c1, ..., c4 for each transaction, next to the compressed vectors using only closed and maximal trees:]

Id | Closed trees c1 c2 c3 c4 | Maximal trees c1 c2 c3 | Class
 1 |            1  1  0  1    |              1  1  0   | CLASS1
 2 |            0  0  1  1    |              0  0  1   | CLASS2
 3 |            1  0  1  1    |              1  0  1   | CLASS1
 4 |            0  1  1  1    |              0  1  1   | CLASS2

Adaptive XML Tree Framework on evolving data streams
(14) XML Tree Classification

XML Tree Classification Framework components:
- an XML closed frequent tree miner
- a data stream classifier algorithm, which we feed with tuples to be classified online
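A sketch of how the two components might be wired together, assuming a miner exposing add/closed_frequent_trees and any online classifier with partial_fit; all names here are illustrative, not the thesis API.

```python
def xml_stream_classification(tree_stream, miner, classifier):
    """Prequential loop: predict each labeled XML tree, then train on it."""
    for xml_tree, label in tree_stream:
        miner.add(xml_tree)               # keep the closed frequent trees current
        patterns = miner.closed_frequent_trees()
        # One binary attribute per closed frequent tree: present in this tree?
        features = [1 if p.is_subtree_of(xml_tree) else 0 for p in patterns]
        yield classifier.predict([features])[0]
        classifier.partial_fit([features], [label])
```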

Adaptive XML Tree Framework on evolving data streams
(14) XML Tree Classification

                       Maximal               Closed
           # Trees  Att.  Acc.   Mem.    Att.  Acc.   Mem.
CSLOG12     15483    84   79.64  1.2     228   78.12  2.54
CSLOG23     15037    88   79.81  1.21    243   78.77  2.75
CSLOG31     15702    86   79.94  1.25    243   77.60  2.73
CSLOG123    23111    84   80.02  1.7     228   78.91  4.18

Table: BAGGING on unordered trees.

Main contributions (iii)
Mining Evolving Tree Data Streams

10 Incremental Method
11 Sliding Window Method
12 Adaptive Method
13 Logarithmic Relaxed Support
14 XML Classification

Outline

1 Introduction

2 Mining Evolving Data Streams

3 Tree Mining

4 Mining Evolving Tree Data Streams

5 Conclusions


Main contributions

Mining Evolving Data Streams
1 Framework
2 ADWIN
3 Classifiers
4 MOA
5 ASHT

Tree Mining
6 Closure Operator on Trees
7 Unlabeled Tree Mining Methods
8 Deterministic Association Rules
9 Implicit Rules

Mining Evolving Tree Data Streams
10 Incremental Method
11 Sliding Window Method
12 Adaptive Method
13 Logarithmic Relaxed Support
14 XML Classification

Future Lines (i)

Adaptive Kalman Filter: a Kalman filter that adaptively computes Q and R without using the size of the ADWIN window.

Extend the MOA framework:
- Support vector machines
- Clustering
- Itemset mining
- Association rules

Future Lines (ii)

Adaptive Deterministic Association Rules: deterministic association rules computed on evolving data streams.

General Implicit Rules Characterization: find a characterization of implicit rules with any number of components.

Non-deterministic Association Rules: find a basis of association rules for trees with confidence lower than 100%.

Future Lines (iii)

Closed Frequent Graph Mining: mining methods to obtain closed frequent graphs.
- Non-incremental
- Incremental
- Sliding Window
- Adaptive

Graph Classification: classifiers of graphs using maximal and closed frequent subgraphs.

Relevant publications

Albert Bifet and Ricard Gavaldà. Kalman filters and adaptive windows for learning in data streams. DS'06.

Albert Bifet and Ricard Gavaldà. Learning from time-changing data with adaptive windowing. SDM'07.

Albert Bifet and Ricard Gavaldà. Adaptive parameter-free learning from evolving data streams. Tech. Rep. R09-9.

A. Bifet, G. Holmes, B. Pfahringer, R. Kirkby, and R. Gavaldà. New ensemble methods for evolving data streams. KDD'09.

José L. Balcázar, Albert Bifet, and Antoni Lozano. Mining frequent closed unordered trees through natural representations. ICCS'07.

José L. Balcázar, Albert Bifet, and Antoni Lozano. Subtree testing and closed tree mining through natural representations. DEXA'07.

José L. Balcázar, Albert Bifet, and Antoni Lozano. Mining implications from lattices of closed trees. EGC'2008.

José L. Balcázar, Albert Bifet, and Antoni Lozano. Mining Frequent Closed Rooted Trees. MLJ'09.

Albert Bifet and Ricard Gavaldà. Mining adaptively frequent closed unlabeled rooted trees in data streams. KDD'08.

Albert Bifet and Ricard Gavaldà. Adaptive XML Tree Classification on evolving data streams.
