Lecture 7 - IBk


Page 1: Lecture7 - IBk

Introduction to Machine Learning

Lecture 7: Instance Based Learning

Albert Orriols i Puig
aorriols@salle.url.edu

Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle

Universitat Ramon Llull

Page 2: Lecture7 - IBk

Recap of Lecture 6

LET'S START WITH DATA CLASSIFICATION


Page 3: Lecture7 - IBk

Recap of Lecture 6

Data Set → Classification Model: How?

We are going to deal with:

• Data described by nominal and continuous attributes

• Data that may have instances with missing values


Page 4: Lecture7 - IBk

Recap of Lecture 6

We want to build decision trees

How can I automatically generate these types of trees?

Decide which attribute we should put in each node

Decide a split point

Rely on information theory

We also saw many other improvements


Page 5: Lecture7 - IBk

Today’s Agenda

Classification without building a model
k-Nearest Neighbor (kNN)
Effect of k
Distance functions
Variants of kNN
Strengths and weaknesses


Page 6: Lecture7 - IBk

Classification without Building a Model

Forget about a global model! Simply store all the training examples

Build a local model for each new test instance

Referred to as lazy learners

Some approaches to IBL:
Nearest neighbors

Locally weighted regression

Case-based reasoning


Page 7: Lecture7 - IBk

k-Nearest Neighbors

Algorithm:

Store all the training data

Given a new test instance:
Recover the k neighbors of the test instance
Predict the majority class among the neighbors

Voronoi cells: the feature space is decomposed into several cells.

E.g. for k=1
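A minimal sketch of this procedure in plain Python (all names here are illustrative, not from the lecture):

```python
# Store the training data; for each query, recover the k nearest
# neighbors and predict the majority class among them.
from collections import Counter
import math

def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def knn_predict(train, query, k=1):
    """train is a list of (instance, label) pairs; query is an instance."""
    neighbors = sorted(train, key=lambda ex: euclidean(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy usage: two classes in 2D.
train = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"),
         ((1.0, 1.0), "B"), ((0.9, 1.1), "B")]
print(knn_predict(train, (0.2, 0.1), k=3))  # -> "A"
```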


Page 8: Lecture7 - IBk

k-Nearest Neighbors

But, where is the learning process?

Is selecting the k neighbors and returning the majority class learning?

No, that's just retrieving

But still, some important issues:
Which k should I use?

Which distance functions should I use?

Should I maintain all instances of the training data set?


Page 9: Lecture7 - IBk

Which k Should I Use?

The effect of k

[Figure: decision boundaries for 15-NN vs. 1-NN]

Do you remember the discussion about overfitting in C4.5?


Apply the same concepts here!


Page 10: Lecture7 - IBk

Which k Should I Use?

Some experimental results on the use of different k

[Figure: test error vs. number of neighbors, e.g., 7-NN]

Notice that the test error decreases as k increases, but at k ≈ 5-7, it starts increasing again

Rule of thumb: k=3, k=5, and k=7 seem to work OK in the majority of problems
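One way to check this rule of thumb on a concrete problem is to compare values of k by cross-validation. A minimal sketch, assuming scikit-learn is installed (the iris dataset is only an illustration):

```python
# Compare several k values by 5-fold cross-validated accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (1, 3, 5, 7, 9, 15):
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"k={k:2d}  mean accuracy = {scores.mean():.3f}")
```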


Page 11: Lecture7 - IBk

Distance Functions

Distance functions must be able to deal with:

Nominal attributes

Continuous attributes

Missing values

The key: they must return a low value for similar objects and a high value for different objects

Seems obvious, right? But still, it is domain dependent

There are many of them. Let's see some of the most used


Page 12: Lecture7 - IBk

Distance Functions

Distance between two points in the same space:

d(x, y)

Some properties expected to be satisfied in general:
d(x, y) ≥ 0 and d(x, x) = 0

d(x, y) = d(y, x)

d(x, y) + d(y, z) ≥ d(x, z)


Page 13: Lecture7 - IBk

Distances for Continuous Variables

Given x = (x_1, …, x_n)' and y = (y_1, …, y_n)':

Euclidean:

d_E(x, y) = \left[ \sum_{i=1}^{n} (x_i - y_i)^2 \right]^{1/2}

Minkowski:

d_M(x, y) = \left[ \sum_{i=1}^{n} |x_i - y_i|^q \right]^{1/q}

Absolute value:

d_{ABS}(x, y) = \sum_{i=1}^{n} |x_i - y_i|
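A sketch of these three distances in Python (helper names are made up for illustration):

```python
def euclidean(x, y):
    # Square root of the sum of squared differences.
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) ** 0.5

def minkowski(x, y, q):
    # q-th root of the sum of q-th powers of absolute differences.
    return sum(abs(xi - yi) ** q for xi, yi in zip(x, y)) ** (1.0 / q)

def absolute(x, y):
    # Sum of absolute differences (Manhattan distance).
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

# Minkowski generalizes both: q=2 gives Euclidean, q=1 gives absolute.
assert abs(minkowski((0, 0), (3, 4), 2) - euclidean((0, 0), (3, 4))) < 1e-9
```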


Page 14: Lecture7 - IBk

Distances for Continuous Variables

What if attributes are measured over different scales?
Attribute 1 ranging in [0, 1]

Attribute 2 ranging in [0, 1000]

Can you detect any potential problem in the aforementioned distance functions?


[Figure: two scatter plots, x in [0, 1] with y in [0, 1000] vs. x in [0, 1000] with y in [0, 1000]]

Page 15: Lecture7 - IBk

Distances for Continuous Variables

The larger the scale, the larger the influence of the attribute in the distance function

Solution: Normalize each attribute

How?

Normalization by means of the range:

d_{norm}(ex_1^a, ex_2^a) = \frac{d(ex_1^a, ex_2^a)}{\max_a - \min_a}

Normalization by means of the standard deviation:

d_{norm}(ex_1^a, ex_2^a) = \frac{d(ex_1^a, ex_2^a)}{4\sigma_a}
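A sketch of range normalization in Python (names are illustrative; the 4σ variant would just swap the denominator):

```python
# Rescale each attribute to [0, 1] using the min and max observed in the
# training set, so every attribute contributes on the same scale.
def range_normalize(dataset):
    n_attrs = len(dataset[0])
    mins = [min(row[a] for row in dataset) for a in range(n_attrs)]
    maxs = [max(row[a] for row in dataset) for a in range(n_attrs)]
    return [
        tuple((row[a] - mins[a]) / (maxs[a] - mins[a]) if maxs[a] > mins[a]
              else 0.0
              for a in range(n_attrs))
        for row in dataset
    ]

data = [(0.2, 150.0), (0.9, 900.0), (0.5, 300.0)]
print(range_normalize(data))  # both attributes now lie in [0, 1]
```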

Page 16: Lecture7 - IBk

Distances for Nominal Attributes

Several metrics to deal with nominal attributes:
Overlap distance function

Idea: two nominal values match only if they are identical; any two different values are equally distant
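A sketch, with an illustrative name:

```python
# Overlap distance for one nominal attribute:
# 0 if the two values match, 1 otherwise.
def overlap(a, b):
    return 0.0 if a == b else 1.0

print(overlap("red", "red"))   # 0.0
print(overlap("red", "blue"))  # 1.0
```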


Page 17: Lecture7 - IBk

Distances for Nominal Attributes

Several metrics to deal with nominal attributes:
Value difference metric (VDM)

C = number of classes
P(a, ex_i, c) = conditional probability that the output class is c given that the attribute a has the value ex_i

Idea: Two nominal values are similar if they have more similar correlations with the output classes


See (Wilson & Martinez) for more distance functions
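As a sketch following that reference (the exponent q is commonly 1 or 2), the VDM between two values x and y of attribute a can be written as:

vdm_a(x, y) = \sum_{c=1}^{C} \left| P(a, x, c) - P(a, y, c) \right|^q

so values that induce similar class distributions end up at a small distance from each other.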


Page 18: Lecture7 - IBk

Distances for Heterogeneous Attributes

What if my data set is described by both nominal and continuous attributes?

Combine both within the same distance function:

Use nominal distance functions for nominal attributes

Use continuous distance function for continuous attributes
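A sketch in the spirit of this slide (and of the heterogeneous metrics in Wilson & Martinez); the names and the choice of range normalization here are assumptions:

```python
# Overlap for nominal attributes, range-normalized absolute difference
# for continuous ones, combined Euclidean-style.
# `nominal` is the set of nominal attribute indices; `ranges` maps a
# continuous attribute index to its (max - min) over the training data.
def heterogeneous_distance(x, y, nominal, ranges):
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if a in nominal:
            d = 0.0 if xa == ya else 1.0      # overlap distance
        else:
            d = abs(xa - ya) / ranges[a]      # range-normalized difference
        total += d ** 2
    return total ** 0.5

x = ("red", 0.3, 120.0)
y = ("blue", 0.7, 180.0)
print(heterogeneous_distance(x, y, nominal={0}, ranges={1: 1.0, 2: 1000.0}))
```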


Page 19: Lecture7 - IBk

Variants of kNN

Different variants of kNN:
Distance-weighted kNN

Attribute-weighted kNN


Page 20: Lecture7 - IBk

Distance-Weighted kNN

Inference of original kNN:

The k nearest neighbors vote for the class

Shouldn't the closest examples have a higher influence in the decision process?

Weight the contribution of each of the k neighbors w.r.t. their distance

E.g.:

\hat{f}(x_q) = \arg\max_{v \in V} \sum_{i=1}^{k} w_i \, \delta(v, f(x_i)), \quad \text{where} \quad w_i = \frac{1}{d(x_q, x_i)^2}

For a real-valued target function:

\hat{f}(x_q) = \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}

More robust to noisy instances and outliers

E.g.: Shepard’s method (Shepard,1968)
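A sketch of the weighted vote above in Python (illustrative names; returning the class of an exact match with distance 0 is an assumed convention to avoid division by zero):

```python
# Distance-weighted voting with w_i = 1 / d(x_q, x_i)^2.
from collections import defaultdict

def euclidean(x, y):
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) ** 0.5

def weighted_knn_predict(train, query, k=3):
    neighbors = sorted(train, key=lambda ex: euclidean(ex[0], query))[:k]
    votes = defaultdict(float)
    for inst, label in neighbors:
        d = euclidean(inst, query)
        if d == 0.0:
            return label             # exact match decides outright
        votes[label] += 1.0 / d**2   # closer neighbors weigh more
    return max(votes, key=votes.get)

train = [((0.0, 0.0), "A"), ((1.0, 1.0), "B"), ((0.9, 1.2), "B")]
# -> "A": one close A outweighs two far Bs.
print(weighted_knn_predict(train, (0.2, 0.2), k=3))
```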


Page 21: Lecture7 - IBk

Attribute-weighted kNN

What if some attributes are irrelevant or misleading?

If irrelevant: cost increases, but accuracy is not affected

If misleading: cost increases and accuracy may decrease

Weight attributes:

d_w(x, y) = \left[ \sum_{i=1}^{n} w_i (x_i - y_i)^2 \right]^{1/2}

How to determine the weights?
Option 1: The expert provides us with the weights

Option 2: Use a machine learning approach

More will be said in the next lecture!
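A sketch of this weighted distance (the weight values here are made up for illustration):

```python
# Each attribute's squared difference is scaled by a weight w_i before
# summing, so relevant attributes dominate the distance.
def weighted_distance(x, y, w):
    return sum(wi * (xi - yi) ** 2 for wi, xi, yi in zip(w, x, y)) ** 0.5

x, y = (1.0, 5.0, 0.2), (2.0, 5.0, 0.9)
w = (1.0, 0.0, 2.0)  # weight 0 switches an irrelevant attribute off
print(weighted_distance(x, y, w))
```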


Page 22: Lecture7 - IBk

Strengths and Weaknesses

Strengths of kNN:

Builds a new local model for each test instance

Learning has no cost

Empirical results show that the method is highly accurate w.r.t. other machine learning techniques

Weaknesses:
Retrieving approach, but does not learn

No global model. The knowledge is not legible

Test cost increases linearly with the number of training instances

No generalization

Curse of dimensionality: What happens if we have many attributes?


Noise and outliers may have a very negative effect


Page 23: Lecture7 - IBk

Next Class

From instance-based to case-based reasoning

A little bit more on learning:
Distance functions

Prototype selection


Page 24: Lecture7 - IBk

Introduction to Machine Learning

Lecture 7: Instance Based Learning

Albert Orriols i Puig
aorriols@salle.url.edu

Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle

Universitat Ramon Llull