
Page 1: Data Mining (and machine learning)

Data Mining (and machine learning)

ROC curves

Rule Induction
CW3

Page 2: Data Mining (and machine learning)

Two classes is a common and special case

Page 3: Data Mining (and machine learning)

Two classes is a common and special case

Medical applications: cancer, or not?
Computer Vision applications: landmine, or not?
Security applications: terrorist, or not?
Biotech applications: gene, or not?
… …

Page 4: Data Mining (and machine learning)

Two classes is a common and special case

Medical applications: cancer, or not?
Computer Vision applications: landmine, or not?
Security applications: terrorist, or not?
Biotech applications: gene, or not?
… …

            Predicted Y       Predicted N
Actually Y  True Positive     False Negative
Actually N  False Positive    True Negative

Page 5: Data Mining (and machine learning)

Two classes is a common and special case

True Positive: these are ideal. E.g. we correctly detect cancer.

            Predicted Y       Predicted N
Actually Y  True Positive     False Negative
Actually N  False Positive    True Negative

Page 6: Data Mining (and machine learning)

Two classes is a common and special case

True Positive: these are ideal. E.g. we correctly detect cancer.

False Positive: to be minimised – causes a false alarm – it can be better to be safe than sorry, but it can be very costly.

            Predicted Y       Predicted N
Actually Y  True Positive     False Negative
Actually N  False Positive    True Negative

Page 7: Data Mining (and machine learning)

Two classes is a common and special case

True Positive: these are ideal. E.g. we correctly detect cancer.

False Positive: to be minimised – causes a false alarm – it can be better to be safe than sorry, but it can be very costly.

False Negative: also to be minimised – missing a landmine / cancer is very bad in many applications.

            Predicted Y       Predicted N
Actually Y  True Positive     False Negative
Actually N  False Positive    True Negative

Page 8: Data Mining (and machine learning)

Two classes is a common and special case

True Positive: these are ideal. E.g. we correctly detect cancer.

False Positive: to be minimised – causes a false alarm – it can be better to be safe than sorry, but it can be very costly.

False Negative: also to be minimised – missing a landmine / cancer is very bad in many applications.

True Negative?

            Predicted Y       Predicted N
Actually Y  True Positive     False Negative
Actually N  False Positive    True Negative

Page 9: Data Mining (and machine learning)

Sensitivity and Specificity: common measures of accuracy in this kind of 2-class task

            Predicted Y       Predicted N
Actually Y  True Positive     False Negative
Actually N  False Positive    True Negative

Page 10: Data Mining (and machine learning)

Sensitivity and Specificity: common measures of accuracy in this kind of 2-class task

Sensitivity = TP/(TP+FN) - how many of the real ‘Yes’ cases are detected? How well can it detect the condition?

Specificity = TN/(FP+TN) - how many of the real ‘No’ cases are correctly classified? How well can it rule out the condition?

            Predicted Y       Predicted N
Actually Y  True Positive     False Negative
Actually N  False Positive    True Negative
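A minimal Python sketch of these counts and measures (an illustration, not from the slides; it assumes the class labels are the strings 'Y' and 'N'):

    def confusion_counts(actual, predicted, positive="Y"):
        """Tally the four cells of the 2x2 table above."""
        tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
        fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
        fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
        tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
        return tp, fn, fp, tn

    tp, fn, fp, tn = confusion_counts(list("YYYNNN"), list("YYNNNY"))
    sensitivity = tp / (tp + fn)   # 2/3: how many real 'Yes' cases were detected
    specificity = tn / (tn + fp)   # 2/3: how many real 'No' cases were ruled out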

Page 11: Data Mining (and machine learning)

[Scatter plot: training examples labelled YES and NO]


Page 13: Data Mining (and machine learning)

[Scatter plot: YES and NO examples with a candidate decision line]

Sensitivity: 100%
Specificity: 25%

Page 14: Data Mining (and machine learning)

[Scatter plot: YES and NO examples with a candidate decision line]

Sensitivity: 93.8%
Specificity: 50%

Page 15: Data Mining (and machine learning)

[Scatter plot: YES and NO examples with a candidate decision line]

Sensitivity: 81.3%
Specificity: 83.3%

Page 16: Data Mining (and machine learning)

[Scatter plot: YES and NO examples with a candidate decision line]

Sensitivity: 56.3%
Specificity: 100%

Page 17: Data Mining (and machine learning)

[Scatter plot: YES and NO examples with a candidate decision line]

Sensitivity: 100%
Specificity: 25%

100% Sensitivity means: detects all cancer cases (or whatever) but possibly with many false positives

Page 18: Data Mining (and machine learning)

[Scatter plot: YES and NO examples with a candidate decision line]

Sensitivity: 56.3%
Specificity: 100%

100% Specificity means: misses some cancer cases (or whatever) but no false positives

Page 19: Data Mining (and machine learning)

Sensitivity and Specificity: common measures of accuracy in this kind of 2-class task

Sensitivity = TP/(TP+FN) - how many of the real TRUE cases are detected? How sensitive is the classifier to TRUE cases? A highly sensitive test for cancer: if it says “NO”, you can be sure it’s “NO”.

Specificity = TN/(TN+FP) - how sensitive is the classifier to the negative cases? A highly specific test for cancer: if it says “Y”, you can be sure it’s “Y”.

With many trained classifiers, you can ‘move the line’ in this way. E.g. with Naive Bayes, we could use a threshold indicating how much higher the log likelihood for Y should be than for N.
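A sketch of that ‘moving the line’ idea (illustrative, not the lecture’s code): sweep a threshold over per-example scores, such as Naive Bayes log-likelihood differences, and record the sensitivity/specificity trade-off at each setting. Plotting sensitivity against 1 - specificity then traces out the ROC curve.

    def roc_points(scores, actual, positive="Y"):
        """For each candidate threshold, predict positive when
        score >= threshold and record (sensitivity, specificity).
        Assumes both classes occur in `actual`."""
        pos = sum(1 for a in actual if a == positive)
        neg = len(actual) - pos
        points = []
        for t in sorted(set(scores)):
            tp = sum(s >= t and a == positive for s, a in zip(scores, actual))
            fp = sum(s >= t and a != positive for s, a in zip(scores, actual))
            points.append((tp / pos, (neg - fp) / neg))
        return points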

Page 20: Data Mining (and machine learning)

ROC curves

David Corne and Nick Taylor, Heriot-Watt University - [email protected]
Slides and related resources: http://www.macs.hw.ac.uk/~dwcorne/Teaching/dmml.html

Page 21: Data Mining (and machine learning)

Rule Induction

• Rules are useful when you want to learn a clear / interpretable classifier, and are less worried about squeezing out as much accuracy as possible.
• There are a number of different ways to ‘learn’ rules or rulesets.
• Before we go there, what is a rule / ruleset?

Page 22: Data Mining (and machine learning)

Rules

IF Condition … Then Class Value is …

Page 23: Data Mining (and machine learning)

[Scatter plot: YES and NO examples, X axis 0 to 12, Y axis 0 to 5]

Rules are Rectangular

IF (X>0)&(X<5)&(Y>0.5)&(Y<5) THEN YES
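In code, such a rule is just a conjunction of interval tests, one per attribute (illustrative sketch):

    def rule_yes(x, y):
        """IF (X>0)&(X<5)&(Y>0.5)&(Y<5) THEN YES."""
        return 0 < x < 5 and 0.5 < y < 5

    # rule_yes(3, 2) -> True  (inside the rectangle)
    # rule_yes(7, 2) -> False (outside it: X too large)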

Page 24: Data Mining (and machine learning)

[Scatter plot: YES and NO examples, X axis 0 to 12, Y axis 0 to 5]

Rules are Rectangular

IF (X>5)&(X<11)&(Y>4.5)&(Y<5.1) THEN NO

Page 25: Data Mining (and machine learning)

A Ruleset

IF Condition1 … Then Class = A

IF Condition2 … Then Class = A

IF Condition3 … Then Class = B

IF Condition4 … Then Class = C

Page 26: Data Mining (and machine learning)

[Scatter plot: YES and NO examples, X axis 0 to 12, Y axis 0 to 5, with a candidate ruleset drawn as rectangles]

What’s wrong with this ruleset? (Two things.)

Page 27: Data Mining (and machine learning)

[Scatter plot: YES and NO examples, X axis 0 to 12, Y axis 0 to 5, with a candidate ruleset drawn as rectangles]

What about this ruleset?

Page 28: Data Mining (and machine learning)

Two ways to interpret a ruleset:

Page 29: Data Mining (and machine learning)

Two ways to interpret a ruleset:

As a Decision List

IF Condition1 … Then Class = A

ELSE IF Condition2 … Then Class = A

ELSE IF Condition3 … Then Class = B

ELSE IF Condition4 … Then Class = C

ELSE … predict Background Majority Class
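A minimal sketch of the decision-list reading, assuming each rule is a (condition, class) pair where condition is a predicate over one example:

    def classify_decision_list(rules, example, default):
        """Return the class of the first rule whose condition fires;
        otherwise fall back on the background majority class."""
        for condition, cls in rules:
            if condition(example):
                return cls
        return default

    # e.g. rules = [(lambda e: e["X"] < 5, "A"), (lambda e: e["Y"] > 2, "B")]
    # classify_decision_list(rules, {"X": 7, "Y": 3}, "C")  ->  "B"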

Page 30: Data Mining (and machine learning)

Two ways to interpret a ruleset:

As an unordered set

IF Condition1 … Then Class = A

IF Condition2 … Then Class = A

IF Condition3 … Then Class = B

IF Condition4 … Then Class = C

Check each rule and gather votes for each class

If no winner, predict background majority class
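A matching sketch of the unordered-set reading, using the same (condition, class) rule representation as above:

    from collections import Counter

    def classify_by_votes(rules, example, default):
        """Check every rule; each firing rule votes for its class.
        Predict the outright winner, else the background majority class."""
        votes = Counter(cls for condition, cls in rules if condition(example))
        ranked = votes.most_common()
        if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
            return default        # no rule fired, or the vote was tied
        return ranked[0][0]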

Page 31: Data Mining (and machine learning)

Three broad ways to learn rulesets

Page 32: Data Mining (and machine learning)

Three broad ways to learn rulesets

1. Just build a decision tree with ID3 (or something else) and you can translate the tree into rules!
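For example (a sketch of the translation only, not of ID3 itself): represent the tree as nested (test, yes_subtree, no_subtree) tuples with class names at the leaves, and emit one rule per root-to-leaf path by ANDing the tests along the way.

    def tree_to_rules(node, path=()):
        """Turn every root-to-leaf path into a (condition, class) rule."""
        if isinstance(node, str):                  # a leaf names a class
            return [(" AND ".join(path) or "TRUE", node)]
        test, yes, no = node
        return (tree_to_rules(yes, path + (test,)) +
                tree_to_rules(no, path + ("NOT " + test,)))

    # tree_to_rules(("X<5", ("Y<3", "A", "B"), "C")) yields
    # [("X<5 AND Y<3", "A"), ("X<5 AND NOT Y<3", "B"), ("NOT X<5", "C")]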

Page 33: Data Mining (and machine learning)

Three broad ways to learn rulesets

2. Use any good search/optimisation algorithm. Evolutionary (genetic) algorithms are the most common; you will do this in coursework 3. This means simply guessing a ruleset at random, and then trying mutations and variants, gradually improving them over time.
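In its simplest form that loop looks like the sketch below; random_ruleset, mutate and accuracy are placeholders for whatever representation and fitness measure the coursework uses.

    def evolve_ruleset(random_ruleset, mutate, accuracy, iterations=1000):
        """Guess a ruleset, then repeatedly keep any mutated variant
        that scores at least as well on the training data."""
        best = random_ruleset()
        best_score = accuracy(best)
        for _ in range(iterations):
            candidate = mutate(best)
            score = accuracy(candidate)
            if score >= best_score:
                best, best_score = candidate, score
        return best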

Page 34: Data Mining (and machine learning)

Three broad ways to learn rulesets

3. A number of ‘old’ AI algorithms exist that still work well, and/or can be engineered to work with an evolutionary algorithm. The basic idea is: iterated coverage (a code sketch follows the worked example below).

Page 35: Data Mining (and machine learning)

[Scatter plot: YES and NO examples, X axis 0 to 12, Y axis 0 to 5]

Take each class in turn ..

Page 36: Data Mining (and machine learning)

[Scatter plot: YES and NO examples, X axis 0 to 12, Y axis 0 to 5]

Pick a random member of that class in the training set

Page 37: Data Mining (and machine learning)

[Scatter plot: YES and NO examples, X axis 0 to 12, Y axis 0 to 5]

Extend it as much as possible without including another class

Page 38: Data Mining (and machine learning)

[Scatter plot: YES and NO examples, X axis 0 to 12, Y axis 0 to 5]

Extend it as much as possible without including another class

Page 39: Data Mining (and machine learning)

[Scatter plot: YES and NO examples, X axis 0 to 12, Y axis 0 to 5]

Extend it as much as possible without including another class

Page 40: Data Mining (and machine learning)

[Scatter plot: YES and NO examples, X axis 0 to 12, Y axis 0 to 5]

Extend it as much as possible without including another class

Page 41: Data Mining (and machine learning)

[Scatter plot: YES and NO examples, X axis 0 to 12, Y axis 0 to 5]

Next class

Page 42: Data Mining (and machine learning)

[Scatter plot: YES and NO examples, X axis 0 to 12, Y axis 0 to 5]

Next class

Page 43: Data Mining (and machine learning)

[Scatter plot: YES and NO examples, X axis 0 to 12, Y axis 0 to 5]

And so on…
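Putting the walkthrough above into code: a rough sketch of iterated coverage over rectangular rules in two dimensions. The helpers covers and grow_rectangle are illustrative names of my own, and the growth cap is arbitrary.

    import random

    def covers(rect, point):
        """rect = (xlo, xhi, ylo, yhi); point = (x, y)."""
        xlo, xhi, ylo, yhi = rect
        return xlo <= point[0] <= xhi and ylo <= point[1] <= yhi

    def grow_rectangle(seed, other):
        """Start from a degenerate rectangle at the seed point, then widen
        each side in small steps while no other-class point falls inside."""
        rect = [seed[0], seed[0], seed[1], seed[1]]
        for side, step in ((0, -0.5), (1, 0.5), (2, -0.5), (3, 0.5)):
            for _ in range(40):          # arbitrary cap so open sides stop
                trial = list(rect)
                trial[side] += step
                if any(covers(trial, p) for p in other):
                    break
                rect = trial
        return tuple(rect)

    def iterated_coverage(points_by_class):
        """points_by_class maps class -> list of (x, y) training points.
        Take each class in turn: seed a rule on a random uncovered point,
        grow it as far as possible, then drop the points it covers."""
        rules = []
        for cls, pts in points_by_class.items():
            other = [p for c, ps in points_by_class.items()
                     if c != cls for p in ps]
            uncovered = list(pts)
            while uncovered:
                rect = grow_rectangle(random.choice(uncovered), other)
                rules.append((rect, cls))
                uncovered = [p for p in uncovered if not covers(rect, p)]
        return rules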

Page 44: Data Mining (and machine learning)

CW3

• Run the experiments program that evolves a ruleset

• Try different sizes of training and test set

• Observe ‘overfitting’ and report