Instance-Based Learning Evgueni Smirnov

Post on 20-Jan-2018

Page 1: Instance-Based Learning

Evgueni Smirnov

Page 2: Overview

• Instance-Based Learning

• Comparison of Eager and Instance-Based Learning

• Instance Distances for Instance-Based Learning

• Nearest Neighbor (NN) Algorithm

• Advantages and Disadvantages of the NN Algorithm

• Approaches to Overcome the Disadvantages of the NN Algorithm

• Combining Eager and Instance-Based Learning

Page 3: Instance-Based Learning

– Learning = storing all training instances.

– Classification = an instance is assigned the classification of the training instances nearest to it.

Page 4: Different Learning Methods

• Eager Learning

– Learning = acquiring an explicit structure of a classifier from the whole training set;

– Classification = an instance is classified using the explicit structure of the classifier.

• Instance-Based Learning (Lazy Learning)

– Learning = storing all training instances;

– Classification = an instance is assigned the classification of the training instances nearest to it.

Page 5: Different Learning Methods

• Eager Learning

Any random movement => It’s a mouse

I saw a mouse!

Page 6: Instance-Based Learning

It’s very similar to a Desktop!!

Page 7: Nearest-Neighbor Algorithm (NN)

The Features of the Task of the NN Algorithm:

• the instance language I is a conjunctive language with a set A of n attributes a1, a2, …, an. The domain of each attribute ai can be discrete or continuous;

• an instance x is represented as < a1(x), a2(x), …, an(x) >, where ai(x) is the value of the attribute ai for the instance x;

• the classes to be learned can be:

– discrete. In this case we learn a discrete function f(x), and the co-domain C of the function consists of the classes c to be learned;

– continuous. In this case we learn a continuous function f(x), and the co-domain C of the function consists of the continuous class values c to be learned.

Page 8: Distance Functions

The distance functions are composed from difference metrics $d_a$ w.r.t. attributes a, defined for each two instances $x_i$ and $x_j$.

• If the attribute a is numerical, then:

$$d_a(x_i, x_j) = \frac{|a(x_i) - a(x_j)|}{\mathrm{range}_a}$$

• If the attribute a is discrete, then:

$$d_a(x_i, x_j) = \begin{cases} 0, & \text{if } a(x_i) = a(x_j) \\ 1, & \text{otherwise.} \end{cases}$$
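The two per-attribute difference metrics above can be sketched in Python as follows. This is only an illustration: the function name `attribute_difference` and the convention of passing `value_range=None` for a discrete attribute are assumptions, not part of the slides.

```python
def attribute_difference(value_i, value_j, value_range=None):
    """Difference d_a between two values of a single attribute a.

    value_range is the (assumed precomputed) max - min of a numerical
    attribute over the training data; pass None for a discrete attribute.
    """
    if value_range is None:
        # Discrete attribute: 0 if the values match, 1 otherwise.
        return 0.0 if value_i == value_j else 1.0
    if value_range == 0:
        # Assumption: a constant attribute contributes no difference.
        return 0.0
    # Numerical attribute: absolute difference scaled by the range.
    return abs(value_i - value_j) / value_range


print(attribute_difference(20.0, 30.0, value_range=40.0))  # 0.25
print(attribute_difference("sunny", "rainy"))              # 1.0
```

Dividing by the range keeps every numerical difference in [0, 1], so numerical and discrete attributes contribute on the same scale.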

Page 9: Distance Functions

The main distance function for determining nearest neighbors is the Euclidean distance:

$$d(x_i, x_j) = \sqrt{\sum_{a \in A} d_a(x_i, x_j)^2}$$
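As a rough sketch, the Euclidean distance over mixed attributes could be computed like this. The helper names, the tuple representation of instances, and the `ranges` argument (one nonzero range per numerical attribute, `None` for a discrete one) are assumptions made for the example.

```python
import math


def difference(value_i, value_j, value_range):
    """Per-attribute difference d_a (value_range is None for discrete attributes)."""
    if value_range is None:
        return 0.0 if value_i == value_j else 1.0
    return abs(value_i - value_j) / value_range


def euclidean_distance(x_i, x_j, ranges):
    """Square root of the sum of squared per-attribute differences."""
    return math.sqrt(
        sum(difference(a, b, r) ** 2 for a, b, r in zip(x_i, x_j, ranges))
    )


# Hypothetical instances: (temperature, outlook); temperature has range 40.
x1 = (20.0, "sunny")
x2 = (30.0, "rainy")
ranges = (40.0, None)
print(euclidean_distance(x1, x2, ranges))  # sqrt(0.25**2 + 1**2)
```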

Page 10: k-Nearest-Neighbor Algorithm

The case of a discrete set of classes.

1. Take the instance x to be classified.

2. Find the k nearest neighbors of x in the training data.

3. Determine the class c of the majority of the instances among the k nearest neighbors.

4. Return the class c as the classification of x.
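The four steps above can be sketched in a few lines of Python. This version assumes purely numerical attributes and uses the plain Euclidean distance from `math.dist` (Python 3.8+) without range scaling; the function name and the toy data are illustrative only.

```python
import math
from collections import Counter


def knn_classify(query, training_data, k):
    """Classify query by majority vote among its k nearest training instances.

    training_data is a list of (instance, label) pairs, where each
    instance is a tuple of numbers.
    """
    # Step 2: sort the stored instances by distance to the query, keep k.
    neighbors = sorted(training_data, key=lambda pair: math.dist(query, pair[0]))[:k]
    # Step 3: count the class labels among the k neighbors.
    votes = Counter(label for _, label in neighbors)
    # Step 4: return the majority class (on a tie, the class seen first,
    # i.e. the one with the nearer neighbor, is returned).
    return votes.most_common(1)[0][0]


training = [
    ((1.0, 1.0), "+"), ((1.2, 0.8), "+"),
    ((3.0, 3.0), "-"), ((3.2, 2.9), "-"), ((2.9, 3.3), "-"),
]
print(knn_classify((1.1, 1.0), training, k=3))  # +
```

Note that there is no training step: "learning" is just keeping the `training` list around.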

Page 11: Classification & Decision Boundaries

[Figure: positive (+) and negative (-) training instances, a marked instance e1, and a query instance q1]

1-nn: q1 is classified as positive

5-nn: q1 is classified as negative
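The effect of k shown on this slide can be reproduced on a small made-up data set. The coordinates below are invented for illustration and are not taken from the figure; the classifier is a minimal majority-vote k-NN using `math.dist` (Python 3.8+).

```python
import math
from collections import Counter


def knn_classify(query, training_data, k):
    """Majority vote among the k training instances nearest to query."""
    neighbors = sorted(training_data, key=lambda pair: math.dist(query, pair[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


# Hypothetical layout: one positive instance lies closest to q1,
# but most of the instances around q1 are negative.
training = [
    ((0.5, 0.0), "+"),
    ((1.0, 0.0), "-"), ((0.0, 1.0), "-"), ((-1.0, 0.0), "-"),
    ((0.0, -1.2), "+"),
    ((2.0, 2.0), "+"), ((-2.0, 2.0), "+"),
]
q1 = (0.0, 0.0)

print(knn_classify(q1, training, k=1))  # +
print(knn_classify(q1, training, k=5))  # -
```

With k = 1 the single closest instance decides; with k = 5 the three negative neighbors outvote the two positive ones.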

Page 12: k-Nearest-Neighbor Algorithm

The case of a continuous set of classes (Regression).

1. Take the instance x to be classified.

2. Find the k nearest neighbors of x in the training data.

3. Return the average of the class values of the k nearest neighbors as the prediction for x.
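A minimal sketch of the regression variant, assuming numerical attributes and plain Euclidean distance (`math.dist`, Python 3.8+); the function name and data are hypothetical.

```python
import math


def knn_regress(query, training_data, k):
    """Predict a continuous value as the mean over the k nearest neighbors.

    training_data is a list of (instance, value) pairs.
    """
    neighbors = sorted(training_data, key=lambda pair: math.dist(query, pair[0]))[:k]
    return sum(value for _, value in neighbors) / len(neighbors)


training = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]
print(knn_regress((2.1,), training, k=3))  # 20.0
```

The distant instance at 10.0 is not among the three nearest neighbors, so it has no influence on the prediction.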

Page 13: Distance Weighted Nearest-Neighbor Algorithm

The case of a discrete set of classes.

1. Take the instance x to be classified.

2. Determine for each class c the sum

$$S_c = \sum_{x_c \text{ belongs to } c} \frac{1}{d(x_c, x)^2}$$

3. Return the class c with the greatest S_c.
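The weighted vote can be sketched as below. As on the slide, the sum runs over all training instances of each class. Returning the label of an exactly matching instance (distance 0) is an assumption added here to avoid division by zero; the names and data are illustrative.

```python
import math
from collections import defaultdict


def distance_weighted_classify(query, training_data):
    """Return the class c with the greatest S_c = sum of 1 / d(x_c, query)^2."""
    scores = defaultdict(float)
    for instance, label in training_data:
        d = math.dist(query, instance)
        if d == 0.0:
            # Assumption: an identical stored instance decides the class.
            return label
        scores[label] += 1.0 / d ** 2
    return max(scores, key=scores.get)


training = [((0.0,), "+"), ((3.0,), "-"), ((4.0,), "-")]
print(distance_weighted_classify((1.0,), training))  # +
print(distance_weighted_classify((3.4,), training))  # -
```

For the query at 1.0 the single positive instance contributes 1, while the two negative instances contribute only 1/4 + 1/9, so the closer instance wins even though it is outnumbered.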

Page 14: Advantages of the NN Algorithm

• the NN algorithm can estimate complex target classes locally and differently for each new instance to be classified;

• the NN algorithm provides good generalisation accuracy on many domains;

• the NN algorithm learns very quickly;

• the NN algorithm is robust to noisy training data when k > 1, because the vote over several neighbors smooths out isolated noisy instances;

• the NN algorithm is intuitive and easy to understand, which facilitates implementation and modification.

Page 15: Disadvantages of the NN Algorithm

• the NN algorithm has large storage requirements because it has to store all the data;

• the NN algorithm is slow during instance classification because all the training instances have to be visited;

• the accuracy of the NN algorithm degrades as noise in the training data increases;

• the accuracy of the NN algorithm degrades as the number of irrelevant attributes increases.

Page 16: Condensed NN Algorithm

The Condensed NN algorithm was introduced to reduce the storage requirements of the NN algorithm.

The algorithm finds a subset S of the training data D such that each instance in D can be correctly classified by the NN algorithm applied to the subset S. The average reduction achieved by the algorithm varies between 60% and 80%.

Page 17: Condensed NN Algorithm

This algorithm first randomly selects one instance for each class in D and puts it in S. Then each instance in D is classified using only the instances in S. If an instance is misclassified, it is added to S. This process is repeated until there are no instances in D that are misclassified.

[Figure: the training data D with positive (+) and negative (-) instances, and the selected subset S]
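The procedure described on this slide might look as follows in Python. This is a sketch under assumptions: 1-NN with plain Euclidean distance (`math.dist`, Python 3.8+), a seeded random generator for the initial choice, and a guard that skips instances already in S so that the loop also terminates on duplicate points with conflicting labels.

```python
import math
import random


def nearest_label(query, subset):
    """Label of the instance in subset closest to query (1-NN)."""
    return min(subset, key=lambda pair: math.dist(query, pair[0]))[1]


def condensed_nn(training_data, seed=0):
    """Return a subset S of D that classifies every instance of D correctly."""
    rng = random.Random(seed)
    subset = []
    # Start with one randomly chosen instance per class.
    for label in sorted({lab for _, lab in training_data}):
        subset.append(rng.choice([p for p in training_data if p[1] == label]))
    changed = True
    while changed:  # repeat until a full pass adds nothing
        changed = False
        for pair in training_data:
            if pair in subset:
                continue  # assumption: stored instances are not re-tested
            if nearest_label(pair[0], subset) != pair[1]:
                subset.append(pair)  # misclassified, so it must be kept
                changed = True
    return subset


data = [((0, 0), "+"), ((0, 1), "+"), ((1, 0), "+"), ((1, 1), "+"),
        ((5, 5), "-"), ((5, 6), "-"), ((6, 5), "-"), ((6, 6), "-")]
S = condensed_nn(data)
print(len(S), "of", len(data), "instances kept")
```

On these two well-separated clusters, one instance per class is already enough to classify all of D correctly.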

Page 18: Condensed NN Algorithm

The CNN algorithm is especially sensitive to noise, because noisy instances will usually be misclassified by their neighbors, and thus will be retained. This causes two problems:

• storage reduction is hindered, because noisy instances are retained, and because they are there, often non-noisy instances nearby will also need to be retained.

• generalization accuracy is hurt because noisy instances are usually exceptions and thus do not represent the underlying function well.

Page 19: Edited NN Algorithm

The Edited Nearest Neighbor algorithm was proposed to stabilise the accuracy of the NN algorithm when noise in the training data increases.

The algorithm starts with the set S equal to the training data D, and then each instance in S is removed if it does not agree with the majority of its k nearest neighbors (typically with k=3).

The algorithm edits out noisy instances as well as close border cases, leaving smoother decision boundaries. It also retains all internal points; i.e., it does not reduce the storage as much as most other reduction algorithms.
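A possible sketch of the editing step is shown below. It assumes that the neighbors of each instance are looked up in the full training data D (excluding the instance itself) and that all removal decisions are made in a single pass; the slides do not specify either detail. The function name and data are illustrative.

```python
import math
from collections import Counter


def edited_nn(training_data, k=3):
    """Keep only instances that agree with the majority of their k neighbors."""
    kept = []
    for index, (instance, label) in enumerate(training_data):
        # All other instances of D are candidate neighbors.
        others = training_data[:index] + training_data[index + 1:]
        neighbors = sorted(others, key=lambda pair: math.dist(instance, pair[0]))[:k]
        majority = Counter(lab for _, lab in neighbors).most_common(1)[0][0]
        if majority == label:
            kept.append((instance, label))
    return kept


data = [((0, 0), "+"), ((0, 1), "+"), ((1, 0), "+"), ((1, 1), "+"),
        ((0.5, 0.5), "-"),  # hypothetical noisy instance inside the + cluster
        ((5, 5), "-"), ((5, 6), "-"), ((6, 5), "-"), ((6, 6), "-")]
S = edited_nn(data)
print(len(S))  # 8
```

The negative instance surrounded by positive ones is edited out, while every internal point of both clusters is retained.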

Page 20: Edited NN Algorithm

[Figure: positive (+) and negative (-) training instances with a marked instance e1]

The negative instance is removed!

The average reduction of the algorithm varies between 20% and 40%.

Page 21: Weighting Attributes

The weighting-attribute technique was proposed in order to improve the accuracy of the NN algorithm in the presence of irrelevant attributes.

The key idea is to find weights for all the attributes and to use them when the distance between instances is computed. The weights can be determined by a search algorithm, while the adequacy of each candidate set of weights can be assessed by cross-validation.

In a similar way we can choose the best k parameter for the NN algorithm!
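One way to picture this is the sketch below. Several choices are assumptions not taken from the slides: the weights enter the distance as the square root of the sum of w_a times the squared difference, the search is an exhaustive scan over 0/1 weights, and adequacy is measured by leave-one-out cross-validation accuracy. The data set is invented so that the second attribute is irrelevant.

```python
import math
from collections import Counter
from itertools import product


def weighted_distance(x_i, x_j, weights):
    """Euclidean distance with one weight per attribute (assumed form)."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x_i, x_j, weights)))


def loo_accuracy(data, weights, k):
    """Leave-one-out accuracy of k-NN under the given attribute weights."""
    correct = 0
    for index, (instance, label) in enumerate(data):
        rest = data[:index] + data[index + 1:]
        neighbors = sorted(
            rest, key=lambda p: weighted_distance(instance, p[0], weights)
        )[:k]
        predicted = Counter(lab for _, lab in neighbors).most_common(1)[0][0]
        correct += predicted == label
    return correct / len(data)


def search_weights(data, candidate_values=(0.0, 1.0), k=1):
    """Exhaustively try every weight combination and keep the most accurate."""
    n_attributes = len(data[0][0])
    return max(product(candidate_values, repeat=n_attributes),
               key=lambda w: loo_accuracy(data, w, k))


# First attribute separates the classes; the second is irrelevant.
data = [((0.0, 9.0), "+"), ((0.2, 1.0), "+"), ((0.4, 5.0), "+"),
        ((1.0, 8.5), "-"), ((1.2, 1.5), "-"), ((1.4, 5.5), "-")]
print(search_weights(data))  # (1.0, 0.0)
```

The same `loo_accuracy` function can be evaluated for several values of `k` with fixed weights to pick the best k, as the slide suggests.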

Page 22: Combining Decision Trees and the NN Algorithm

Outlook
    sunny: Humidity
        high: no
        normal: yes
    overcast: yes
    rainy: Windy
        false: yes
        true: no

Page 23: Combining Decision Trees and the NN Algorithm

[Figure: decision tree with root Outlook (sunny, overcast, rainy), subtrees Humidity (high, normal) and Windy (false, true), and a leaf labelled yes]

Classify the instance using the NN algorithm applied on the training instances associated with the classification nodes (leaves).
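A rough sketch of this combination: the instance is routed down the tree by its discrete attributes, and the leaf it reaches answers with 1-NN over the training instances stored there. The dictionary layout of the tree, the numerical attributes `temperature` and `humidity`, and all stored instances are hypothetical.

```python
import math

# Hypothetical tree: an internal node tests one discrete attribute;
# a leaf stores the training instances that reached it.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"instances": [
            ({"temperature": 30.0, "humidity": 85.0}, "no"),
            ({"temperature": 22.0, "humidity": 60.0}, "yes"),
        ]},
        "overcast": {"instances": [
            ({"temperature": 25.0, "humidity": 70.0}, "yes"),
        ]},
        "rainy": {"instances": [
            ({"temperature": 18.0, "humidity": 80.0}, "yes"),
            ({"temperature": 15.0, "humidity": 95.0}, "no"),
        ]},
    },
}


def classify(node, instance, numeric_attributes):
    """Descend to a leaf, then return the label of the nearest stored instance."""
    while "attribute" in node:
        node = node["branches"][instance[node["attribute"]]]

    def distance(stored):
        return math.sqrt(
            sum((instance[a] - stored[a]) ** 2 for a in numeric_attributes)
        )

    return min(node["instances"], key=lambda pair: distance(pair[0]))[1]


query = {"outlook": "sunny", "temperature": 29.0, "humidity": 80.0}
print(classify(tree, query, ("temperature", "humidity")))  # no
```

The tree narrows the search to one leaf, so the nearest-neighbor step only visits the instances associated with that leaf rather than the whole training set.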

Page 24: Combining Decision Rules and the NN Algorithm

[Figure: Instances are incrementally compiled into Instances & Abstractions]

Page 25: Summary Points

• Instance-based learning is a simple, efficient and accurate approach to concept learning and classification.

• Many of the problems of instance-based learning can be solved.

• Instance-based learning can be combined with eager approaches to concept learning.