handling concept drift - university of waikatoabifet/pakdd2011/... · learning with concept drift...

163
What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook Handling Concept Drift: Importance, Challenges & Solutions A. Bifet, J. Gama, M. Pechenizkiy, I. Žliobaitė PAKDD-2011 Tutorial May 27, Shenzhen, China http://www.cs.waikato.ac.nz/~abifet/PAKDD2011/

Upload: others

Post on 26-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Handling Concept Drift:Importance, Challenges & Solutions

A. Bifet, J. Gama, M. Pechenizkiy, I. Žliobaitė

PAKDD-2011 TutorialMay 27, Shenzhen, China

http://www.cs.waikato.ac.nz/~abifet/PAKDD2011/

Page 2: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Motivation for the Tutorial

• in the real world data– more and more data is organized as data streams

– data comes from complex environment, and it is evolving over time

– concept drift = underlying distribution of data is changing

• attention to handling concept drift is increasing, since predictions become less accurate over time

• to prevent that, learning models need to adapt to changes quickly and accurately

PAKDD-2011 Tutorial, May 27, Shenzhen, China

2A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 3: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Objectives of the Tutorial

• many approaches to handle drifts in a wide range of applications have been proposed

• the tutorial aims to provide

an integrated view of the adaptive learning

methods and application needs

PAKDD-2011 Tutorial, May 27, Shenzhen, China

3A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 4: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

OutlineBlock 1: PROBLEM

Block 2: TECHNIQUES

Block 3: EVALUATION

Block 4: APPLICATIONS

OUTLOOK

PAKDD-2011 Tutorial, May 27, Shenzhen, China

4A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 5: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Presenters & Schedule

PAKDD-2011 Tutorial, May 27, Shenzhen, China

5A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

14:05 – 14:50

Indrė Žliobaitė

14:50 – 15:40João Gama

Block 1: what concept drift is, why it needs to be handled and how

Block 2: approaches for adaptive machine learning to handle concept drift

Page 6: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Presenters & Schedule

PAKDD-2011 Tutorial, May 27, Shenzhen, China

6A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

16:00 – 16:45

Albert Bifet

16:45 – 17:30

Mykola Pechenizkiy

Block 3: training/evaluation of adaptive learning approaches and a software demo

Block 4: challenges for handling concept drift in different applications

Page 7: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

PAKDD-2011 Tutorial, May 27, Shenzhen, China

7A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 8: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Block 1: introduction

PAKDD-2011 Tutorial, May 27, Shenzhen, China

8A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 9: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

The goals:

• to introduce the problem of concept drift in supervised learning

• to overview and categorize the main principles how concept drift can be (and is) handled

PAKDD-2011 Tutorial, May 27, Shenzhen, China

9A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 10: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Example: production

PAKDD-2011 Tutorial, May 27, Shenzhen, China

10A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 11: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

monitor

many process parameters

� � � � � � � �

reduced and

transformed

parameterst1, t2

��

-3

-2

-1

0

1

2

3

-8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8

t[2

]

t[1]SIMCA-P+ 11 - 13.12.2005 10:17:16

prediction

q

many process parameters

72

73

74

75

76

77

78

79

0 100 200 300 400

Ko

nz.

(IN

A&

INA

L)

Num

2S

3S

2S

3S

Target

PAKDD-2011 Tutorial, May 27, Shenzhen, China

11A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Example: production

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

predict quality

Page 12: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

monitor

many process parameters

� � � � � � � �

reduced and

transformed

parameterst1, t2

��

-3

-2

-1

0

1

2

3

-8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8

t[2

]

t[1]SIMCA-P+ 11 - 13.12.2005 10:17:16

prediction

q

many process parameters

72

73

74

75

76

77

78

79

0 100 200 300 400

Ko

nz.

(IN

A&

INA

L)

Num

2S

3S

2S

3S

Target

Example: production

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

after 2-3 months

predictions get

“out of tune”

predict quality

PAKDD-2011 Tutorial, May 27, Shenzhen, China

12A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 13: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Process changes

Model does not

change

source: Evonik Industries

Static model

PAKDD-2011 Tutorial, May 27, Shenzhen, China

13A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 14: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Process changes

Model does not

change

source: Evonik Industries

the supplier of raw

material changes

a sensor breaks/

“wears off”/

is replaced

new regulations to

save electricity

new operating

crew

new production

procedures

PAKDD-2011 Tutorial, May 27, Shenzhen, China

14A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 15: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Desired Properties of a System To Handle Concept Drift

• Adapt to concept drift asap

• Distinguish noise from changes– robust to noise, but adaptive to changes

• Recognizing and reacting to reoccurring contexts

• Adapting with limited resources – time and memory

PAKDD-2011 Tutorial, May 27, Shenzhen, China

15A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 16: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Setting: adaptive learning

PAKDD-2011 Tutorial, May 27, Shenzhen, China

16A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 17: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Supervised Learning

Learner

population

Historical data

labels

Testingdata

testing labels ?

1. training

2.

2. application

X

y

X'

y'

Training:Learning a mapping function

y = L (x)

Application:Applying L to unseen data

y' = L (X')

PAKDD-2011 Tutorial, May 27, Shenzhen, China

17A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 18: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Learning with concept drift

Learner

population

Training:

y = L (x)

Application:

y' = L?? (X')

population

= ?? Testingdata

testing labels ?

X'

y'

Historical data

labels

X

y

PAKDD-2011 Tutorial, May 27, Shenzhen, China

18A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 19: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Online setting

time

?

Data arrives in a stream

PAKDD-2011 Tutorial, May 27, Shenzhen, China

19A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 20: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Adaptive Learning

time

?

predictionas time passes

the model updates/retrains

itself if needed

updated

model

PAKDD-2011 Tutorial, May 27, Shenzhen, China

20A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 21: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

How to Adapt

?

Select the right training data and retrain

or adjust the modelincrementally

PAKDD-2011 Tutorial, May 27, Shenzhen, China

21A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 22: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

The main strategies how to handle concept drift

Images.com

PAKDD-2011 Tutorial, May 27, Shenzhen, China

22A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 23: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Adaptive learning strategies

Triggering Evolving

Single classifier

Ensemble

Detectors

Dynamic ensembleContextual

Forgetting

variable windows fixed windows,Instance weighting

adaptive combination rules

dynamic integration,meta learning

PAKDD-2011 Tutorial, May 27, Shenzhen, China

23A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 24: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Triggering Evolving

Single classifier

Ensemble

Detectors

Dynamic ensembleContextual

Forgetting

variable windows fixed windows,Instance weighting

adaptive combination rules

dynamic integration,meta learning

change detection and

a follow up reactionadapting at every step

Adaptive learning strategies

PAKDD-2011 Tutorial, May 27, Shenzhen, China

24A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 25: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Triggering Evolving

Single classifier

Ensemble

Detectors

Dynamic ensembleContextual

Forgetting

variable windows fixed windows,Instance weighting

adaptive combination rules

dynamic integration,meta learning

reactive,

forgetting

maintain

some

memory

Adaptive learning strategies

PAKDD-2011 Tutorial, May 27, Shenzhen, China

25A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 26: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Triggering Evolving

Single classifier

Ensemble

Detectors

Dynamic ensembleContextual

Forgetting

fixed windows,Instance weighting

forget old data and retrain at a fixed rate

Adaptive learning strategies

PAKDD-2011 Tutorial, May 27, Shenzhen, China

26A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 27: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Fixed Training Window

time

PAKDD-2011 Tutorial, May 27, Shenzhen, China

27A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

train predict

Page 28: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Fixed Training Window

time

PAKDD-2011 Tutorial, May 27, Shenzhen, China

28A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 29: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Triggering Evolving

Single classifier

Ensemble

Detectors

Dynamic ensembleContextual

Forgetting

variable windows

detect a change and cut

Adaptive learning strategies

PAKDD-2011 Tutorial, May 27, Shenzhen, China

29A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 30: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Variable Training Window

PAKDD-2011 Tutorial, May 27, Shenzhen, China

30A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

detect a change

Page 31: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Variable Training Window

PAKDD-2011 Tutorial, May 27, Shenzhen, China

31A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

drop old data

Page 32: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Variable Training Window

PAKDD-2011 Tutorial, May 27, Shenzhen, China

32A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

retrain the model predict

Page 33: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Triggering Evolving

Single classifier

Ensemble

Detectors

Dynamic ensembleContextual

Forgetting

build many models,dynamically combine

adaptive combination rules

Adaptive learning strategies

PAKDD-2011 Tutorial, May 27, Shenzhen, China

33A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 34: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Dynamic EnsembleClassifier 1

Classifier 2

Classifier 3

Classifier 4

vote

PAKDD-2011 Tutorial, May 27, Shenzhen, China

34A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 35: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Dynamic Ensemble

voter 1

voter 2

voter 3

voter 4

TRUE

time

PAKDD-2011 Tutorial, May 27, Shenzhen, China

35A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 36: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Dynamic Ensemble

voter 1

voter 2

voter 3

voter 4

TRUE

punish

punish

reward

reward

voter 1

voter 2

voter 3

voter 4

PAKDD-2011 Tutorial, May 27, Shenzhen, China

36A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 37: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Dynamic Ensemble

voter 1

voter 2

voter 3

voter 4

TRUE

punish

punish

reward

reward

voter 1

voter 2

voter 3

voter 4

TRUE

PAKDD-2011 Tutorial, May 27, Shenzhen, China

37A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 38: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Dynamic Ensemble

voter 1

voter 2

voter 3

voter 4

TRUE

punish

punish

reward

reward

voter 1

voter 2

voter 3

voter 4

TRUE

voter 1

voter 2

voter 3

voter 4

PAKDD-2011 Tutorial, May 27, Shenzhen, China

38A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 39: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Triggering Evolving

Single classifier

Ensemble

Detectors

Dynamic ensembleContextual

Forgetting

build many models,switch models according to the observed incoming data

dynamic integration,meta learning

Adaptive learning strategies

PAKDD-2011 Tutorial, May 27, Shenzhen, China

39A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 40: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Contextual (Meta) Approaches

Group 1 = Classifier 1

Group 2 = Classifier 2Group 3 = Classifier 3

partition the training databuild classifiers

PAKDD-2011 Tutorial, May 27, Shenzhen, China

40A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 41: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Contextual (Meta) Approaches

Group 1 = Classifier 1

Group 2 = Classifier 2Group 3 = Classifier 3

find to which partition the new instance belongs andassign the classifier

PAKDD-2011 Tutorial, May 27, Shenzhen, China

41A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 42: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

• adaptive learning approaches implicitly or explicitly assume some type of change

PAKDD-2011 Tutorial, May 27, Shenzhen, China

42A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 43: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Types of Changes: speed

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

SUDDEN

GRADUAL

time

mea

nm

ean

time

PAKDD-2011 Tutorial, May 27, Shenzhen, China

43A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 44: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Types of Changes: reoccurrence

REOCCURING

timem

ean

mea

n

time

time

mea

n

ALWAYS NEW*

PAKDD-2011 Tutorial, May 27, Shenzhen, China

44A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 45: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Typically Used*

Triggering(change detection)

Evolving (adapting every step)

Single classifier

Ensemble

Detectors

Dynamic ensembleContextual

Forgetting

variable windows fixed windows,Instance weighting

adaptive fusion rules

dynamic integration,meta learning

sudden drift

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

* this mapping is not strict or crisp

PAKDD-2011 Tutorial, May 27, Shenzhen, China

45A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 46: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Typically Used

Triggering(change detection)

Evolving (adapting every step)

Single classifier

Ensemble

Detectors

Dynamic ensembleContextual

Forgetting

variable windows fixed windows,Instance weighting

adaptive fusion rules

dynamic integration,meta learning

gradual drift

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

PAKDD-2011 Tutorial, May 27, Shenzhen, China

46

Page 47: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Typically Used

Triggering(change detection)

Evolving (adapting every step)

Single classifier

Ensemble

Detectors

Dynamic ensembleContextual

Forgetting

variable windows fixed windows,Instance weighting

adaptive fusion rules

dynamic integration,meta learning

reoccuring drift

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

PAKDD-2011 Tutorial, May 27, Shenzhen, China

47A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 48: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Which approach to use?

• changes occur over time

• we need models that evolve over time

• choice of technique depends on

– what type of change is expected

– user goals/ applications

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

PAKDD-2011 Tutorial, May 27, Shenzhen, China

48A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 49: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Block summary

Images.com

PAKDD-2011 Tutorial, May 27, Shenzhen, China

49A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 50: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Block summary

• Concept drift is a problem in data stream applications, since static models lose accuracy

• Different adaptive techniques exist

• The choice of technique depends on

– what change is expected

– peculiarities of the task and application

PAKDD-2011 Tutorial, May 27, Shenzhen, China

50A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite

Handling Concept Drift: Importance, Challenges and Solutions

Page 51: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Block 2: methods and techniques

PAKDD-2011 Tutorial, May 27, Shenzhen, China

51A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 52: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

The goal:

to discuss selected popular approaches from each of the major types in more detail

• Data management

• Detection Methods

• Adaptation methods

• Decision model management

Detectors

Forgetting

Contextual

Dynamic ensembles

PAKDD-2011 Tutorial, May 27, Shenzhen, China

52A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 53: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Learning

MemoryModel Management

Model Adaptation

ControlDetection

Adaptation

Short term MemoryData Management

Input Data

Output ModelPAKDD-2011 Tutorial, May 27, Shenzhen, China

53

Page 54: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Data Management

• Characterize the information stored in memory to maintain a decision model consistent with the actual state of the nature.

– Full Memory.

– Partial Memory.

PAKDD-2011 Tutorial, May 27, Shenzhen, China

54A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 55: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Weighting examples• Full Memory.

– Store in memory sufficient statistics over all the examples.

– Weighting the examples accordingly to their age.

– Oldest examples are less important.

• Weighting examples based on the age:

where example x was found tx time steps ago

)exp()(xtxw λλ −=

PAKDD-2011 Tutorial, May 27, Shenzhen, China

55A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 56: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Weight as a Function of Age

PAKDD-2011 Tutorial, May 27, Shenzhen, China

56

Page 57: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Partial Memory

• Partial Memory.

– Store in memory only the most recent examples.

– Examples are stored in a first-in first-out data structure.

– At each time step the learner induces a decision model using only the examples that are included in the window.

PAKDD-2011 Tutorial, May 27, Shenzhen, China

57A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 58: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Partial Memory

• The key difficulty is how to select the appropriate window size:

– Small window• can assure a fast adaptability in phases with concept

changes

• In more stable phases it can affect the learner performance

– Large window• produce good and stable learning results in stable

phases

• can not react quickly to concept changes.

PAKDD-2011 Tutorial, May 27, Shenzhen, China

58A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 59: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Partial Memory - Windows

• Fixed Size windows.– Store in memory a fixed number of the most recent

examples.– Whenever a new example is available:

• it is stored in memory and • the oldest one is discarded.

– This is the simplest method to deal with concept drift and can be used as a baseline for comparisons.

• Adaptive Size windows.– the set of examples in the window is variable.– They are used in conjunction with a detection model.– Decreasing the size of the window whenever the detection

model signals drift and increasing otherwise.

PAKDD-2011 Tutorial, May 27, Shenzhen, China

59A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 60: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Discussion

• The data management model also indicates the forgetting mechanism.– Weighting examples:

• Corresponds to a gradual forgetting.

• The relevance of old information is less and less important.

– Time windows• Corresponds to abrupt forgetting;

• The examples are deleted from memory.

– Of course we can combine both forgetting mechanisms by weighting the examples in a time window.

PAKDD-2011 Tutorial, May 27, Shenzhen, China

60A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 61: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Discussion

• These methods are blind adaptation models:

– There is no explicit change detection

– Do not provide:

• Any indication about change points

• Dynamics of the process generating data

PAKDD-2011 Tutorial, May 27, Shenzhen, China

61A. Bifet, J.Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 62: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Detection Methods

• The Detection Model characterizes the techniques and mechanisms for explicit drift detection.

– An advantage of the detection model is they can provide:

• Meaningful descriptions:

– indicating change-points or

– small time-windows where the change occurs

• Quantification of the changes.

PAKDD-2011 Tutorial, May 27, Shenzhen, China

62A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 63: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Detection Methods

• Monitoring the evolution of performance indicators.(See Klinkenberg (98) for a good overview of these

indicators)– Performance measures,– Properties of the data, etc

• Monitoring distributions on two different time-windows(See D.Kifer, S.Ben-David, J.Gehrke; VLDB 04)– A reference window over past examples– A window over the most recent examples

PAKDD-2011 Tutorial, May 27, Shenzhen, China

63A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 64: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Monitoring the evolution of performance indicators

• Klinkenberg (1998)proposed monitoring the values of three performance indicators:– Accuracy, recall and precision over time,– and then, comparing it to a confidence interval of

standard sample errors– for a moving average value using the last M batches

of each particular indicator.• FLORA family of algorithms (Widmer and Kubat, 1996)

monitors accuracy and model size.– includes a window adjustment heuristic for a rule-

based classifier.– FLORA3: dealing with recurring concepts.

PAKDD-2011 Tutorial, May 27, Shenzhen, China

64A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 65: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Monitoring distributions on two different time-windows

• D.Kifer, S.Ben-David, J.Gehrke (VLDB 04)– Propose algorithms (statistical tests based on Chernoff

bound) that examine samples drawn from two probability distributions and decide whether these distributions are different.

• Gama, Fernandes, Rocha: VFDTc (IDA 2006)– For streaming data decision tree induction

• Exploits tree structure: nested trees

– Continuously monitoring differences between two class-distribution of the examples:

• the distribution when a node was built and the class-distribution when a node was a leaf and

• the weighted sum of the class-distributions in the leaves descendant of that node.

PAKDD-2011 Tutorial, May 27, Shenzhen, China

65A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 66: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Hoeffding Trees• Multiple windows

– Each node recives information from different windows• Root node: oldest data• Leaves: most recent data

PAKDD-2011 Tutorial, May 27, Shenzhen, China

66A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 67: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

The RS Method

• Implemented in the VFDTc system (IDA 2006)

• For each decision node i, compute two estimates of the classification error.– Static error (SEi):

• the distribution of the error at node i ;

– Backed up error (BUEi): • the sum of the error distributions of all the descending

leaves of the node i ;

– With these two distributions:• we can detect the concept change,

• by verifying the condition SEi ≤ BUEi

PAKDD-2011 Tutorial, May 27, Shenzhen, China

67A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 68: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

RS Method

• Each new example traverses the tree from the root to a leaf– At the leaf: update the value of

SEi

– It makes the opposite path, and update the values of SEi and BUEifor each internal node,

– Verify the regularization condition SEi ≤ BUEi :

• If SEi ≤ BUEi then the node i is pruned to a new leaf.

PAKDD-2011 Tutorial, May 27, Shenzhen, China

68A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 69: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

PAKDD-2011 Tutorial, May 27, Shenzhen, China

69A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 70: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

ADAPTATION METHODS

PAKDD-2011 Tutorial, May 27, Shenzhen, China

70A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 71: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Adaptation Methods

• The Adaptation model characterizes the changes in the decision model do adapt to the most recent examples.– Blind Methods:

• Methods that adapt the learner at regular intervals without considering whether changes have really occurred.

– Informed Methods:• Methods that only change the decision model after a

change was detected. They are used in conjunction with a detection model.

PAKDD-2011 Tutorial, May 27, Shenzhen, China

71A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 72: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Granularity of Decision Models

• Occurrences of drift can have impact in part of the instance space.

– Global models: Require the reconstruction of all the decision model.

• like naive Bayes, SVM, etc

– Granular decision models: Require the reconstruction of parts of the decision model

• like decision rules, decision trees

PAKDD-2011 Tutorial, May 27, Shenzhen, China

72A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 73: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

CVFDT algorithm

• Mining time-changing data streams , G. Hulten, L. Spencer, and P. Domingos; ACM SIGKDD 2001

• Process examples from the stream indefinitely. – For each example (x, y),

• Pass (x, y) down to a set of leaves using HT and all alternate trees of the nodes (x, y) passes through.

• Add (x, y) to the sliding window of examples.

• Periodically scans the internal nodes of HT.• Start a new alternate tree when a new winning attribute

is found.– Tighter criteria to avoid excessive alternate tree creation.– Limit the total number of alternate trees

PAKDD-2011 Tutorial, May 27, Shenzhen, China

73A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 74: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Smoothly adjust to concept drift

• Alternate trees are grown the same way HT is.

• Periodically each node with non-empty alternate trees enter a testing mode.– M training examples to

compare accuracy.

– Prune alternate trees with non-increasing accuracy over time.

– Replace if an alternate tree is more accurate.

No

Age<30?

Car Type=Sports Car?

No

Yes

Yes

Yes

No

Married?

Yes No

Yes No

Experience

<1 year?

No Yes

Yes No

PAKDD-2011 Tutorial, May 27, Shenzhen, China

74A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 75: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Decision model management

• Model management characterizes the number of decision models needed to maintain in memory.

• The key issue here is the assumption that data generated comes from multiple distributions,

– at least in the transition between contexts.

– Instead of maintaining a single decision model several authors propose the use of multiple decision models.

PAKDD-2011 Tutorial, May 27, Shenzhen, China

75A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 76: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Dynamic Weighted Majority

• A seminal work, is the system presented by Kolter and Maloof (ICDM03, ICML05).

• The Dynamic Weighted Majority algorithm (DWM) is an ensemble method for tracking concept drift.– Maintains an ensemble of base learners,

– Predicts using a weighted-majority vote of these experts.

– Dynamically creates and deletes experts in response to changes in performance.

PAKDD-2011 Tutorial, May 27, Shenzhen, China

76A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 77: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

DETECTION ALGORITHMS

PAKDD-2011 Tutorial, May 27, Shenzhen, China

77A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 78: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Change Detection in Predictive Learning

• When there is a change in the class-distribution of the examples:– The actual model does not correspond any more to the

actual distribution.– The error-rate increases

• Basic Idea: – Learning is a process. – Monitor the quality of the learning process:

• Using Statistical Process Control techniques.– Monitor the evolution of the error rate.

– Main Problems:• How to distinguish Change from Noise?• How to React to drift?

PAKDD-2011 Tutorial, May 27, Shenzhen, China

78A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 79: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

SPC-Statistical Process ControlLearning with Drift Detection ,Gama, Medas, Gladys, Rodrigues; SBIA-

LNCS Springer, 2004.

• Suppose a sequence of examples in the form < xi,yi>• The actual decision model classifies each example in the sequence

– In the 0-1 loss function, predictions are either True or False– The predictions of the learning algorithm are sequences:

T, F,T, F,T, F,T,T,T, F, …– The Error is a random variable from Bernoulli trials.– The Binomial distribution gives the general form of the

probability of observing a F:• pi = (#F/i)• where i is the number of trials.

ippsiii/)1( −=

PAKDD-2011 Tutorial, May 27, Shenzhen, China

79A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 80: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Statistical Processing Control Algorithm

• Maintains two registers: – Pmin and Smin such that Pmin + Smin = min(pi + si )

– Minimum of the error rate taking into account the variance of the estimator.

• At example j : the error of the learning algorithm will be– Out-control if pj + sj > pmin + α smin

– In-control if pj + sj < pmin + β smin

– Warning Level: if pmin + α smin > pj + sj > pmin + smin

• The constants α and β depend on the desired confidence level.– Admissible values are = 2 and = 3.

PAKDD-2011 Tutorial, May 27, Shenzhen, China

80A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 81: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Statistical Processing Control

PAKDD-2011 Tutorial, May 27, Shenzhen, China

81A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 82: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Illustrative Example: SEA Concepts

• The stream:

PAKDD-2011 Tutorial, May 27, Shenzhen, China

82A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 83: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

The Individual Error of the Models

PAKDD-2011 Tutorial, May 27, Shenzhen, China

83A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 84: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

System ErrorWhat CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

PAKDD-2011 Tutorial, May 27, Shenzhen, China

84A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 85: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Sequential Analysis

• Sequential analysis developed a set of methods for monitoring change detection:

– Sequential Probability Ratio Test - SPRT algorithm, D.Wald, Sequential Tests of Statistical

Hypotheses. Annals of Mathematical Statistics 16 (2): 117–186; 1945

– Cumulative Sums - CUSUM: E. Page, Continuous

Inspection Scheme. Biometrika 41,1954

PAKDD-2011 Tutorial, May 27, Shenzhen, China

85A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 86: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

CUSUM Algorithms

The CUSUM test is used to detect significant increases (or decreases) in the successive observations of a random variable x

The CUSUM test is used to detect significant increases (or decreases) in the successive observations of a random variable x

• Detecting significant increases:

– g0 = 0

– gt = max(0; gt-1+ (xt - αααα))

• The decision rule is: – if gt > λλλλ then alarm and gt = 0.

• Detecting decreases:

– g0 = 0

– gt = min(0; gt-1+ (xt - αααα))

• The decision rule is: – if gt < λλλλ then alarm and gt = 0.

PAKDD-2011 Tutorial, May 27, Shenzhen, China

86A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 87: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Page-Hinckley Test

• The PH test is a sequential adaptation of the detection of an abrupt change in the average of a Gaussian signal.

– It considers a cumulative variable mT , defined as the cumulated difference between the observed values and their mean till the current moment:

PAKDD-2011 Tutorial, May 27, Shenzhen, China

87A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 88: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

The Page-Hinckley Test

• The minimum value of mt is also computed:– MT = min(mt ; t = 1, …,T).

• The test monitors the difference between MT and mT :– PHT = mT - MT .

• When this difference is greater than a given threshold (λ) we alarm a change in the distribution.

PAKDD-2011 Tutorial, May 27, Shenzhen, China

88A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 89: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Illustrative Example

– The left figure plots the on-line error rate of a learning algorithm.

– The center plot is the accumulated on-line error. The slope increases after the occurrence of a change.

– The right plot presents the evolution of the PH statistic.

PAKDD-2011 Tutorial, May 27, Shenzhen, China

89A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 90: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Page-Hinckley Test

• The PHt test is memoryless, and its accuracy depends on the choice of parameters αααα and λ.

– Both parameters are relevant to control the trade-off between earlier detecting true alarms by allowing some false alarms.

• The threshold λ depends on the admissible false alarm rate.

– Increasing λ will entail fewer false alarms, but might miss some changes.

PAKDD-2011 Tutorial, May 27, Shenzhen, China

90A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 91: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Algorithm Adaptive Window-ADWIN• Learning from Time-Changing Data with Adaptive Windowing,

A.Bifet, R.Gavaldà (SDM'07)

• Whenever two “large enough” subwindows of W exhibit “distinct enough” averages, – we can conclude that the corresponding expected values are different,

and

– the older portion of the window is dropped

PAKDD-2011 Tutorial, May 27, Shenzhen, China

91A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 92: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Algorithm ADWIN• The value of εcut for a partition W0

. W1 of W is

computed as follows:

– Let n0 and n1 be the lengths of W0 and W1 and

– Let n be the length of W, so n = n0 + n1.

– Let ηW0 and ηW1 be the averages of the values in W0 and W1,

and W0 and W1 their expected values.

– To obtain totally rigorous performance guarantees we define:

PAKDD-2011 Tutorial, May 27, Shenzhen, China

92A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 93: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Characteristics of Algorithm ADWIN

• False positive rate bound: – If ηt remains constant within W, the probability that

ADWIN shrinks the window at this step is at most δ.

• False negative rate bound.

– Suppose that for some partition of W in two parts W0W1 (where W1 contains the most recent items) we have:|η W0 - ηW1| > 2*εcut. Then with probability 1 - δ ADWIN shrinks W to W1,

or shorter.

PAKDD-2011 Tutorial, May 27, Shenzhen, China

93A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 94: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Illustrative Examples

Abrupt Change Gradual Change

PAKDD-2011 Tutorial, May 27, Shenzhen, China

94A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 95: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

ADWIN – Compressing Space• Using exponential histograms

• Requires O(log W) space and time

• It can provide exact counts for all the subwindows

Example: Counting 1’s in a bit stream

A new 1 arrives:Compress: There are 3 buckets of size 1:

Compress again: There are 3 buckets of size 2:

Remove buckets when a drift is detect:

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Page 96: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What CD is About :: Types of Drifts & Approaches :: Techniques in Detail :: Evaluation :: MOA Demo :: Types of Applications :: Outlook

Dimensions of Analysis:• Data management

• Characterize the information stored in memory to maintain a decision model consistent with the actual state of the nature

• Detection Methods• Characterizes the techniques and mechanisms for explicit

drift detection

• Adaptation methods• Characterizes the changes in the decision model do adapt to

the most recent examples.

• Decision model management• Characterize the number of decision models needed to

maintain in memory.

Summary

PAKDD-2011 Tutorial, May 27, Shenzhen, China

96A. Bifet, J. Gama, M. Pechenizkiy, I.Zliobaite Handling Concept Drift: Importance, Challenges and Solutions

Page 97: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Block  3:  evalua=on  and  MOA  demo  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   98  

A.  Bifet,  J.Gama,  M.  Pechenizkiy,  I.Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua.on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Page 98: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

The  goals:    •  to  present  the  main  training  and  evalua=on  principles  of  adap=ve  learning  approaches  

•  to  present  a  tool  that  is  designed  for  building  and  evalua=ng  adap=ve  learners:    

– an  open-­‐source  so5ware  MOA  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   99  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua.on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Page 99: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Adap=ve  Learning  Method  Evalua=on  

  High  accuracy    Low  computa=onal  cost:  minimum  space  and  =me  needed  

  Change  detec=on:    Fast  detec=on  of  change    Low  false  posi=ves  and  false  nega=ves  ra=os    

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   100  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua.on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Page 100: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Streaming  Se[ng  

1. Process  an  example  at  a  =me,and  inspect  it  only  once  (at  most)  

2. Use  a  limited  amount  of    memory  

3. Work  in  a  limited  amount  of    =me  

4. Be  ready  to  predict  at  any  point  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   101  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Page 101: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Streaming  Evalua=on  

  Holdout  Evalua=on  

  Interleaved  Test-­‐Then-­‐Train  or  Prequen=al  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   102  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Page 102: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Holdout  Evalua=on  

  Holdout  an  independent  test  set    

  Apply  the  current  decision  model  to  the  test  set,  at  regular  =me  intervals  

  The  loss  es=mated  in  the  holdout  is  an  unbiased  es=mator  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   103  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Page 103: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Prequen=al  Evalua=on  

  The  error  of  a  model  is  computed  from  the  sequence  of  examples.    

  For  each  example  in  the  stream,  the  actual  model  makes  a  predic=on  based  only  on  the  example  aaribute-­‐values.  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   104  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua.on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Page 104: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Prequen=al  Evalua=on  

Issues  in  Evalua.on  of  Stream  Learning  Algorithms,  Joao  Gama,Raquel  Sebas=ao,Pedro  Pereira  Rodrigues,  KDD  2009  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   105  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua.on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Page 105: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Twiaer  Example  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   106  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua.on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Page 106: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Twiaer  Example  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   107  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua.on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Page 107: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Kappa  Sta=s=c  Predicted Class + Predicted Class - Total

Correct Class + 75 8 83 Correct Class - 7 10 17 Total 82 18 100

Predicted Class + Predicted Class - Total Correct Class + 68.06 14.94 83 Correct Class - 13.94 3.06 17 Total 82 18 100

Simple  confusion  matrix  example  

Confusion  matrix  for  chance  predictor  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   108  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua.on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Page 108: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Kappa  Sta=s=c  

 p0:  classifier’s  prequen=al  accuracy   pc:  probability  that  a  chance  classifier  makes  a  correct  predic=on.  

κ  sta=s=c  =  (p0  −  pc)/(1  −  pc)  

 κ  =  1  if  the  classifier  is  always  correct   κ  =  0  if  the  predic=ons  coincide  with  the  correct  ones  as  o5en  as  those  of  the  chance  classifier  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   109  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua.on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Page 109: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

RAM  Hours  

Accuracy Time Memory

Classifier A 70% 100 20

Classifier B 80% 20 40

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   110  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua.on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Page 110: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

RAM  Hours  

  Time  and  Memory  in  one  measure  

  Every  GB  of  RAM  deployed  for  1  hour  

  Inspired  by  Cloud  Compu=ng  Rental  Cost  Op=ons  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   111  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua.on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Page 111: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

RAM  Hours  

Accuracy Time Memory RAM-Hours

Classifier A 70% 100 20 2000

Classifier B 80% 20 40 800

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   112  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua.on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Page 112: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

MOA  So5ware  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   113  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Page 113: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Introduc=on  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   114  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Page 114: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Evalua=on  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   115  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua.on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Page 115: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Techniques  to  Handle  Concept  Dri5  Triggering   Evolving    

Single  classifier  

Ensemble  

Detectors  

Dynamic  ensemble  Contextual  

Forge[ng  

variable  windows   fixed  windows,  Instance  weigh=ng  

adap=ve    fusion  rules  dynamic  integra=on,  

meta  learning  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   116  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Page 116: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Classifica=on  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   117  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Page 117: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Clustering  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   118  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Page 118: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Graphical  Interface  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   119  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Page 119: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Command  Line  

EvaluatePeriodicHeldOutTest    java -cp .:moa.jar:weka.jar -javaagent:sizeofag.jar moa.DoTask "EvaluatePeriodicHeldOutTest -l DecisionStump -s generators.WaveformGenerator -n 100000 -i 100000000 -f 1000000" > dsresult.csv

This  command  creates  a  comma  separated  values  file:    training  the  DecisionStump  classifier  on  the  WaveformGenerator  

data,    using  the  first  100  thousand  examples  for  tes=ng,    training  on  a  total  of  100  million  examples,      and  tes=ng  every  one  million  examples  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   120  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Page 120: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

How  to  build  a  learner  

  void resetLearningImpl ()   void trainOnInstanceImpl (Instance inst)

  double[] getVotesForInstance (Instance i)

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   121  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Page 121: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

Summary  Block  3  

•  Streaming  Evalua=on    •  Accuracy  using  Prequen=al  Evalua=on  •  Kappa  Sta=s=c  and  RAM-­‐Hours  

•  MOA:  open  source  so5ware  for  data  streams  –  easy  to  use  and  extend  

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica=ons  ::  Outlook  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   122  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Page 122: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Block  4:  An  applica=on  perspec=ve  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   123  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Page 123: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   124  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Goal  I:  provide  researchers  dealing  with  CD  related  problems  a  reference  framework  for:  – posi=oning  their  work  with  respect  to  the  applicability  of  the  techniques  in  prac=ce,    

– helping  young  researchers  to  avoid  “green  aliens”  problem  in  mo=va=ng  their  work,    

– beZer  see  new  research  direc=ons  driven  by  challenges  and  opportuni/es  of  certain  types  of  real  applica.ons  where  CD  maZers  

Goal  II:  present  highlights  of  a  few  case  studies  

Block  4:  Outline  

Page 124: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Classical  situa=on  in  early  DM  

•  Problem  is  believed  to  be  important,  but  –  real  data  is  hard  to  get  due  to  privacy  or  proprietary  (but  also  laziness)  reasons  

–  or  the  solu=on  is  too  “generic”  such  that  is  it  not  applicable  in  prac=ce  

–  or  the  solu=on  addresses  a  =ny  problem  in  the  overall  automated  decision  making  process  

•  At  extreme  –  There  is  no  real  applica=on  in  mind  and  it  is  subs=tuted  by  an  invented  real  applica=on  (“green  aliens”)  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   125  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Page 125: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

As  a  consequence  

•  A  lot  of  research  is  done  on  small/toy  UCI  datasets  and  simulated/ar=ficial  data  

•  Exaggerated  in  concept  dri5  area  – Moving  hyperplane  and  SEA  concepts  (ar=ficially  generated)  benchmarks  and  UCI  datasets  “adjusted”  for  the  needs  of  researchers  (ar=ficially  introduced  dri5)  have  been  used  in  many  publica=ons  on  CD  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   126  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Page 126: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

“Green  Aliens”  as  an  extreme  case:  •  Inven=ng  non-­‐exis=ng  applica=ons  and  challenges  

•  See  the  parody  paper  by  Richard  D.  Arami  –  “Novel  Efficient  Automated  Robust  Eye  Detec=on  System  for  Green  Aliens”  

hZps://sites.google.com/site/richardaramid/green_alien.pdf    

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   127  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Page 127: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Block  4:  Outline  

•  Reference  framework  for  the  categoriza=on  of  applica=ons  in  which  CD  maZers  – Type  of  learning/mining  task,  available  data  and  labels,  and  expecta=ons  on  type/nature  of  CD    

•  Case  studies  – Peculiari=es  for  different  kind  or  applica=ons  – What  challenges  and  opportuni=es  they  bring  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   128  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Page 128: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Applica=on  tasks  

CHANGES  

DATA  

OPERATION  

How  to  characterize    an  applica=on  task,    when  designing    an  adap=ve  predic=on  system?  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   129  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Page 129: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

DATA  DATA   task    

 detec=on    classifica=on    predic=on      ranking  

type      =me  series    rela=onal    mix  

organiza=on      stream/batches      data  re-­‐access      missing  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   130  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Page 130: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

CHANGE  change  type    

 sudden    gradual    incremental      reoccurring  

change  source      adversary    interests    popula=on    complexity  

expecta=ons  about  changes    unpredictable    predictable    iden=fiable  

CHANGES  

DATA  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   131  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

expecta=ons  about  desired  ac=on    keep  the  model  uptodate    detect  the  change    iden=fy/locate  the  change    explain  the  change    

Page 131: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

OPERATION  

labels      real  =me    on  demand    fixed  lag    delay  

decision  speed      real  =me      analy=cal  

ground  truth  labels      so5/hard  

costs  of  mistakes    balanced/  unbalanced  

CHANGES  

DATA  

OPERATION  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   132  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Page 132: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   133  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Page 133: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Landscape  of  applica=ons  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons   134  

Types  of  apps  Industries  

Monitoring/  control  

Personal  assistance/  personaliza.on  

Management  and  planning  

Ubiquitous  applica.ons    

Security,  Police   Fraud  detec=on,  insider  trading  detec=on,  adversary  ac=ons  detec=on    

-­‐-­‐-­‐-­‐-­‐-­‐-­‐-­‐   Crime  volume  predic=on  

Authen=ca-­‐=on,  Intrusion  detec=on  

Finance,  Banking,  Telecom,  Credit  Scoring,  Insurance,  Direct  Marke=ng,  Retail,  Adver=sing,  e-­‐Commerce  

Monitoring  &  management  of  customer  segments,  bankruptcy  predic=on  

Product  or  service  recommenda=on,  including  complimentary  

Demand  predic=on,  response  rate  predic=on,  budget  planning  

Loca=on  based  services,  related  ads,  mobile  apps  

Educa=on  (higher,  professional,  children,  e-­‐Learning)    Entertainment,  Media    

Gaming  the  system,  Drop  out  predic=on  

Music,  VOD,  movie,  learning  object  recommenda=on,  adap=ve  news  access,  personalized  search  

Player-­‐centered  game  design,  learner-­‐centered  educa=on  

Virtual  reality,  simula=ons  

Page 134: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

STREAMS/    SENSORS  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   135  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Page 135: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

CFB  Boiler  Op=miza=on  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   136  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Page 136: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Online  Mass  Flow  Predic=on  •  fluctua=on  in  the  signal  

of  the  scales  =>      no  reliable  online  data  can  be  obtained  from  the  mass  flow  of  fuel  to  the  boiler  

•  predict  the  actual  mass  flow  based  on  the  sensor  measurements  

•  two  phases:  feed  and  burn  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   137  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Page 137: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

asymmetric nature of the outliers

short consumption periods within feeding stages

data collected from a typical experimentation with CFB boiler

Pechenizkiy et al. 2009

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   138  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Online  Mass  Flow  Predic=on  

Page 138: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Solu=on  –  use  domain  knowledge  •  Adap=ve  learning  approach  

–  Detect  a  change  and  cut  the  training  window  •  Challenges  

–  Specific  types  of  outliers  and  noise  –  True  change  labels  are  unavailable  

•  3+  component  predic=on  system    –  Signal  model  (2nd  order  regression)  –  Outlier  elimina=on  

–  Change  detec=on  and  training  window  selec=on  –  Backtracking  

Pechenizhkiy  et  al,  2009.  SIGKDD  Explora=ons  11(2),  p.  109-­‐116    

Based  on  moving  average  and  learnable  thresholds  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   139  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Page 139: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons   140  

PREDICTIVE    ANALYTICS  

Page 140: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Stock  Balancing    Problem  

Empty shelves

vs.

Perishable goods becoming obsolete

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   141  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Page 141: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Challenges  in  food  sales  predic=on  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons   142  

Page 142: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Reoccuring  and  suddent  drist  in  food  sales  

Reoccurring  season  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   143  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Page 143: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Solu=on  •  Adap=ve  learning  approach  

–  Added  external  data  (weather,  holidays)  –  Contextual  (meta  learning)  approach  

•  Approach  –  Form  contextual  features  (structural,  shape,  rela=onal)  –  Iden=fy  product  categories  –  Train  a  switch  mechanism,  which  assigns  the  predictor,  based  on  what  context  is  observed  

•  Challenges  –  Short  history,  rapid  and  frequent  changes  –  Discre=za=on,  formula=ng  the  labels  

Žliobaitė  et  al,  2010.  Bea=ng  the  baseline  predic=on  in  food  sales:  How  intelligent  an  intelligent  predictor  is?  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons   144  

Page 144: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Solu=on  

Predictor  /  Classifier  

Temperature  Promo=ons  Seasons  

f(t+1)  

Evaluator  

Change    Detec=on  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons   145  

Page 145: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Different  Kinds  of  Errors  

Predic=on  Actual  

Over  predic=on   Under  predic=on  

Over prediction might lead to overstocking and generate more costs

Under prediction might lead to understocking and generate more costs

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons   146  

Page 146: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Early  predic=on   Late  predic=on  

Predic=on  Actual  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons   147  

Different  Kinds  of  Errors  

Predicting high demand for Wednesday rather than Monday while it is in fact on Tuesday is usually worse (likely to generate more costs)

Page 147: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

DIAGNOSTICS/  DECISION  MAKING  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons   148  

Page 148: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

An=bio=c  Resistance  Predic=on  predict the sensitivity of a pathogen to an antibiotic based on data about the antibiotic, the isolated pathogen, and the demographic and clinical features of the patient.

(Tsymbal  et  al.,  2008;  Informa=on  Fusion  9(1))      

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons   149  

Page 149: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

How  An=bio=c  Resistance  Happens    

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   150  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Page 150: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Challenges  and  Solu=on  

•  Different  ways  how  resistance  may  happen  •  Different  ways  how  resistance  may  spread  

•  Physical  connec=ons/hospital  organiza=on    •  Local  dri5  •  Solu=on:  

– Contextual  approach  and  the  instance  level  – Dynamic  integra=on  of  classifiers  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons   151  

Page 151: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

PERSONALIZATION  /  RECOMMENDERS  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons   152  

Page 152: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Recommender  Systems  

We  Know  What  You  Ought  To  Be  Watching  This  Summer  

Lessons  learnt  from  NeQlix  compe..on:    Temporal  dynamics  is  important    Classical  CD  approaches  may  not  work  

     (Koren,  SIGKDD  2009)  

Something  happened in early 2004…

Are movies getting better with time?

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons   153  

Page 153: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Mul=ple  sources  of  temporal  dynamics  

•  Both  items  and  users  are  changing  over  =me    •  Item-­‐side  effects:  –  Product  percep=on  and  popularity  are  constantly  changing  –  Seasonal  paZerns  influence  items’  popularity    

• User-­‐side  effects:  –  Customers  ever  redefine  their  taste  

–  Transient,  short-­‐term  bias;  anchoring  –  Dri5ing  ra=ng  scale  –  Change  of  rater  within  household  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons   154  

Page 154: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Block  4:  Summary  •  Working  with  real  data  and  applica=ons  in  which  CD  occurs  and  maZers  is  fun  and  full  of  surprises  –  will  always  strengthen  your  results,  and    –  may  suggest  you  new  research  problems  (driven  by  real  challenges)  to  address    

•  We  presented  a  reference  framework  for  categorizing  different  (types  of)  applica=ons  –  data/task/goal,  expecta=ons  about  changes,  and  opera=onal  seyngs  dimensions  

–  don’t  use  it  as  a  ground  truth  but  as  a  guide  helping  to  beZer  understand  the  landscape  of  exis=ng  and  foreseen  problems  and  to  beZer  posi=on  your  own  problem/work  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons   155  

Page 155: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Current  and  Foreseen  Promising  Direc=ons  

•  Changing  the  focus:  from  blind  adap=vity  to  –  change  modeling  and  descrip=on  –  recognizing  and  reusing  similar  situa=ons  from  the  past  

•  Applica=on  driven  concept  dri5  problems,  like    –  label  unavailability  or  lag  –  cost-­‐benefit  trade  off  of  the  model  update  –  controlled  adap=vity  (due  to  adversaries)  –  lack  of  ground  truth  for  training  

•  A  reference  framework  and  guidelines    –  for  incorpora=ng  adap=vity  in  modeling    –  to  be  used  in  different  applica=on  tasks  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons   156  

Page 156: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Tutorial  Summary  •  Data  paZerns  change  over  =me,    

–  predic=on  systems  need  to  be  adap=ve    –  to  maintain  accuracy  

•  Four  types  of  learning  techniques    –  make  different  assump=ons  about  the  data  and  change,  –  may  be  appropriate  in  different  situa=ons  

•  Applica=on  tasks  have  different  proper=es  therefore    –  pose  different  challenges,    –  may  require  different  handling  techniques,    –  there  is  no  “one  fits  all  solu=on”    

•  Design  of  adap=ve  predic=on  systems  shall  be  =ghtly  connected  with  applica=on  task  at  hand  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons   157  

Page 157: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Tutorial  Summary  

•  Current  state-­‐of-­‐the-­‐art  in  handling  CD  •  Categoriza=on  of  approaches  and  categoriza=on  of  applica=ons    

•  Popular  techniques  •  Lessons  learnt  from  case  studies  •  MOA  so5ware  framework  –  contribute  your  techniques  •  Ini=a=ve  for  the  repository  of  available  benchmarks  for  CD  research  –  contribute  your  datasets  

•  Promising  direc=ons  for  further  research  from  the  applica=ons  perspec=ve  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons   158  

Page 158: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

HaCDAIS@IEEE  ICDM  2011    •  2nd  Int.  Workshop  on  Handling  Concept  Dri5  in  Adap=ve  Informa=on  Systems    – Vancouver,  Canada,  December  10th,  2011.    –  in  conjunc=on  with  the  11th  IEEE  Interna=onal  Conference  on  Data  Mining  (IEEE  ICDM  2011).  

– Welcome  to  par=cipate  – Welcome  to  contribute:  

•  July  23,  2011  -­‐  Submissions  due  

•  More  info:    – hZp://wwwis.win.tue.nl/hacdais2011/  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons   159  

Page 159: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Data  Streams  @  ACM  SAC  2012  

•  Data  Streams  Track  – ACM  Symposium  on  Applied  Compu=ng    

•  Trento  University,  Italy,  March  20-­‐23,  2012.  – Paper  Submission:  31  August,  2011  

•  Welcome  to  contribute!  – hZp://www.cs.waikato.ac.nz/~abifet/SAC2012/  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons   160  

Page 160: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Bibliography  –  General  Background  •  Detec/on  of  Abrupt  Changes  -­‐  Theory  and  Applica/on,  M.  Basseville,  I.  

Nikiforov,  Pren=ce-­‐Hall,  Inc.  1993.  

•  Sta/s/cal  Quality  Control,  E.  Grant,  R.  Leavenworth,  McGraw-­‐Hill,  1996.  

•  Con/nuous  Inspec/on  Scheme,  E.  Page.  Biometrika  41  1954  

•  Learning  in  the  Presence  of  Concept  DriA  and  Hidden  Contexts,  G.Widmer,  M.  Kubat:  Machine  Learning  23(1):  69-­‐101  (1996)  

•  Learning  driAing  concepts:  Example  selec/on  vs.  example  weigh/ng,  R.Klinkenberg,  IDA  2004  

•  Learning  with  DriA  Detec/on;  J.  Gama,  P.  Medas,  G.  Cas=llo,  P.  Rodrigues;  SBIA  2004,  Springer.  

•  The  problem  of  concept  driA:  Defini/ons  and  related  work,  Tsymbal,  A.  Technical  Report.  Dept.  Computer  Science,  Trinity  College,  Ireland,  2004.  

•  Adap/ve  Learning  and  Mining  for  Data  Streams  and  Frequent  PaNerns,  Albert  Bifet,  PhD  Thesis,  2009  

•  Adap/ve  Training  Set  Forma/on,  Indre  Žliobaitė,  PhD  Thesis,  Vilnius  University,  Lithuania,  2010.    

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   161  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Page 161: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Bibliography  -­‐  Approaches  •  Mining  Time-­‐Changing  Data  Streams,  G.  Hulten,  L.  Spencer,  P.  

Domingos,  ACM  SIGKDD,  2001.  •  Dynamic  Weighted  Majority:  A  New  Ensemble  Method  for  Tracking  

Concept  DriA,  by  J.  Kolter,  M.  Maloof,  ICDM  2003.  •  Mining  Concept  Dri5ing  Data  Streams  using  Ensemble  Classiers,  by  

H.Wang,  Wei  Fan,  P.  Yu,  J.  Han,  ACM  SIGKDD  2003.  •  Decision  Trees  for  Mining  Data  Streams;  J.  Gama,  R.  Fernandes,  

R.Rocha.  Intelligent  Data  Analysis  10(1):23-­‐45  (2006)  •  OLINDDA:  A  cluster-­‐based  approach  for  detec/ng  novelty  and  

concept  driA  in  data  streams.  Spinosa,  E.J.,  Carvalho,  A.,  and  Gama,  J.    22ndACM  SAC  2007:  ACM  Press.  

•  An  Ensemble  of  Classifiers  for  Coping  with  Recurring  Contexts  in  Data  Streams,  Katakis,  I.,  Tsoumakas,  G.,  and  Vlahavas,  I..  in  18th  ECAI.  2008,  IOS  

•  …  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   162  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Page 162: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Bibliography  -­‐  Applica=ons  •  Reference  Framework  for  Handling  Concept  DriA:  An  Applica/on  Perspec/ve.    

Žliobaitė,  I.  and  Pechenizkiy,  M.  Technical  report,  Eindhoven  University  of  Technology,  2010  

•  Sen/ment  knowledge  discovery  in  TwiNer  streaming  data.  Albert  Bifet  and  Eibe  Frank.  Proc  13th  Interna=onal  Conference  on  Discovery  Science,  Canberra,  Australia,  2010.  

•  Collabora/ve  Filtering  with  Temporal  Dynamics  Yehuda  Koren,  KDD  2009,  ACM,  2009  

•  MOA:  Massive  online  analysis,  a  framework  for  stream  classifica/on  and  clustering.    Albert  Bifet,  Geoff  Holmes,  Bernhard  Pfahringer,  Philipp  Kranen,  Hardy  Kremer,  Timm  Jansen,  Thomas  Seidl,  in  Pechenizkiy,  M.  &  Žliobaitė,  I.  (eds),  HaCDAIS  Workshop  ECML-­‐PKDD  2010,  2010  

•  Online  Mass  Flow  Predic/on  in  CFB  Boilers  with  Explicit  Detec/on  of  Sudden  Concept  DriA.  Pechenizkiy,  M.,  Bakker,  J.,  Žliobaitė,  I.,  Ivannikov,  A.,  Karkkainen,  T.  SIGKDD  Explora=ons  11(2),  p.  109-­‐116,  2009.  

•  Dynamic  Integra/on  of  Classifiers  for  Handling  Concept  DriA,  Tsymbal,  A.,  Pechenizkiy,  M.,  Cunningham,  P.  &  Puuronen,  S.  Informa=on  Fusion,  Special  Issue  on  Applica=ons  of  Ensemble  Methods,  9(1),  pp.  56-­‐68,  2008.  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China   163  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons  

Page 163: Handling Concept Drift - University of Waikatoabifet/PAKDD2011/... · Learning with concept drift Learner population Training: y = L (x) Application: y' = L?? (X') population = ??

What  CD  is  About  ::  Types  of  Dri5s  &  Approaches  ::  Techniques  in  Detail  ::  Evalua=on  ::  MOA  Demo  ::  Types  of  Applica.ons  ::  Outlook  

Acknowledgements  •  Thanks  to:    

–  Raquel  Sebas=ão,  Pedro  Rodrigues,  Jorn  Bakker  –  FCT  financial  support    of  LIAAD,  and  Poject  Kdus  –  Part  of  the  research  leading  to  these  results  has  received  funding  from  the  European  Commission  within  the  Marie  Curie  Industry  and  Academia  Partnerships  and  Pathways  (IAPP)  programme  under  grant  agreement  no.  251617.  

–  NWO  HaCDAIS  Project  

PAKDD-­‐2011  Tutorial,      May  27,  Shenzhen,  China  

A.  Bifet,  J.  Gama,  M.  Pechenizkiy,  I.  Zliobaite      Handling  Concept  Dri5:  Importance,  Challenges  and  Solu=ons   164