moa : massive online analysis

Post on 27-Jan-2015

126 Views

Category:

Technology

4 Downloads

Preview:

Click to see full reader

DESCRIPTION

Talk about MOA, a framework similar to WEKA dealing with massive data and data streams.

TRANSCRIPT

MOA: {M}assive {O}nline {A}nalysis.

University of WaikatoHamilton, New Zealand

What is MOA?

{M}assive {O}nline {A}nalysis is a framework for online learningfrom data streams.

It is closely related to WEKAIt includes a collection of offline and online as well as toolsfor evaluation:

boosting and baggingHoeffding Trees

with and without Naïve Bayes classifiers at the leaves.

2 / 21

WEKA

Waikato Environment for Knowledge AnalysisCollection of state-of-the-art machine learning algorithmsand data processing tools implemented in Java

Released under the GPLSupport for the whole process of experimental data mining

Preparation of input dataStatistical evaluation of learning schemesVisualization of input data and the result of learning

Used for education, research and applicationsComplements “Data Mining” by Witten & Frank

3 / 21

WEKA: the bird

4 / 21

MOA: the bird

The Moa (another native NZ bird) is not only flightless, like theWeka, but also extinct.

5 / 21

MOA: the bird

The Moa (another native NZ bird) is not only flightless, like theWeka, but also extinct.

5 / 21

MOA: the bird

The Moa (another native NZ bird) is not only flightless, like theWeka, but also extinct.

5 / 21

Data stream classification cycle

1 Process an example at a time,and inspect it only once (atmost)

2 Use a limited amount ofmemory

3 Work in a limited amount oftime

4 Be ready to predict at anypoint

6 / 21

Experimental setting

Evaluation procedures for DataStreams

HoldoutInterleaved Test-Then-Train orPrequential

EnvironmentsSensor Network: 100KbHandheld Computer: 32 MbServer: 400 Mb

7 / 21

Experimental setting

Data SourcesRandom Tree GeneratorRandom RBF GeneratorLED GeneratorWaveform GeneratorFunction Generator

7 / 21

Experimental setting

ClassifiersNaive BayesDecision stumpsHoeffding TreeHoeffding Option TreeBagging and Boosting

Prediction strategiesMajority classNaive Bayes LeavesAdaptive Hybrid

7 / 21

Hoeffding Option TreeHoeffding Option TreesRegular Hoeffding tree containing additional option nodes thatallow several tests to be applied, leading to multiple Hoeffdingtrees as separate paths.

8 / 21

Extension to Evolving Data Streams

New Evolving Data Stream Extensions

New Stream GeneratorsNew UNION of StreamsNew Classifiers

9 / 21

Extension to Evolving Data Streams

New Evolving Data Stream Generators

Random RBF with DriftLED with DriftWaveform with Drift

HyperplaneSEA GeneratorSTAGGER Generator

9 / 21

Concept Drift Framework

t

f (t) f (t)

α

α

t0W

0.5

1

DefinitionGiven two data streams a, b, we define c = a⊕W

t0 b as the datastream built joining the two data streams a and b

Pr[c(t) = b(t)] = 1/(1+ e−4(t−t0)/W ).Pr[c(t) = a(t)] = 1−Pr[c(t) = b(t)]

10 / 21

Concept Drift Framework

t

f (t) f (t)

α

α

t0W

0.5

1

Example

(((a⊕W0t0 b)⊕W1

t1 c)⊕W2t2 d) . . .

(((SEA9⊕Wt0 SEA8)⊕W

2t0SEA7)⊕W

3t0SEA9.5)

CovPokElec = (CoverType⊕5,000581,012 Poker)⊕5,000

1,000,000 ELEC2

10 / 21

Extension to Evolving Data Streams

New Evolving Data Stream Classifiers

Adaptive Hoeffding Option TreeDDM Hoeffding TreeEDDM Hoeffding Tree

OCBoostFLBoost

11 / 21

Ensemble Methods

New ensemble methods:Adaptive-Size Hoeffding Tree bagging:

each tree has a maximum sizeafter one node splits, it deletes some nodes to reduce itssize if the size of the tree is higher than the maximum value

ADWIN bagging:When a change is detected, the worst classifier is removedand a new classifier is added.

12 / 21

Adaptive-Size Hoeffding Tree

T1 T2 T3 T4

Ensemble of trees of different sizesmaller trees adapt more quickly to changes,larger trees do better during periods with little changediversity

13 / 21

Adaptive-Size Hoeffding Tree

0,2

0,21

0,22

0,23

0,24

0,25

0,26

0,27

0,28

0,29

0,3

0 0,1 0,2 0,3 0,4 0,5 0,6

Kappa

Err

or

0,25

0,255

0,26

0,265

0,27

0,275

0,28

0,1 0,12 0,14 0,16 0,18 0,2 0,22 0,24 0,26 0,28 0,3

Kappa

Err

or

Figure: Kappa-Error diagrams for ASHT bagging (left) and bagging(right) on dataset RandomRBF with drift, plotting 90 pairs ofclassifiers.

14 / 21

ADWIN Bagging

ADWIN

An adaptive sliding window whose size is recomputed onlineaccording to the rate of change observed.

ADWIN has rigorous guarantees (theorems)On ratio of false positives and negativesOn the relation of the size of the current window andchange rates

ADWIN BaggingWhen a change is detected, the worst classifier is removed anda new classifier is added.

15 / 21

Web

http://www.cs.waikato.ac.nz/∼abifet/MOA/

16 / 21

GUIjava -cp .:moa.jar:weka.jar-javaagent:sizeofag.jar moa.gui.TaskLauncher

17 / 21

GUIjava -cp .:moa.jar:weka.jar-javaagent:sizeofag.jar moa.gui.TaskLauncher

17 / 21

Command Line

EvaluatePeriodicHeldOutTestjava -cp .:moa.jar:weka.jar -javaagent:sizeofag.jarmoa.DoTask "EvaluatePeriodicHeldOutTest-l DecisionStump -s generators.WaveformGenerator-n 100000 -i 100000000 -f 1000000" > dsresult.csv

This command creates a comma separated values file:

training the DecisionStump classifier on the WaveformGeneratordata,

using the first 100 thousand examples for testing,

training on a total of 100 million examples, and

testing every one million examples:

18 / 21

Easy Design of a MOA classifier

void resetLearningImpl ()

void trainOnInstanceImpl (Instance inst)

double[] getVotesForInstance (Instance i)

void getModelDescription (StringBuilderout, int indent)

19 / 21

Example of a classifier: MajorityClass

public class MajorityClass extends AbstractClassifier {

protected DoubleVector observedClassDistribution;

@Overridepublic void resetLearningImpl() {this.observedClassDistribution = new DoubleVector();

}

@Overridepublic void trainOnInstanceImpl(Instance inst) {this.observedClassDistribution.addToValue(

(int) inst.classValue(), inst weight());}

public double[] getVotesForInstance(Instance i) {return this.observedClassDistribution.getArrayCopy();

}...

}

20 / 21

Summary{M}assive {O}nline {A}nalysis is a framework for online learningfrom data streams.

http://www.cs.waikato.ac.nz/∼abifet/MOA/

It is closely related to WEKAIt includes a collection of offline and online as well as toolsfor evaluation:

boosting and baggingHoeffding Trees

MOA deals with evolving data streamsMOA is easy to use and extend

21 / 21

top related