

Machine Learning (CSE 446): Perceptron

Noah Smith © 2017

University of Washington
nasmith@cs.washington.edu

October 9, 2017



Happy Medium?

Decision trees (that aren’t too deep): use relatively few features to classify.

K-nearest neighbors: all features weighted equally.

Today: use all features, but weight them.

For today’s lecture, assume that y ∈ {−1, +1} instead of {0, 1}, and that x ∈ ℝ^d.


Inspiration from Neurons

[Figure: a biological neuron; image from Wikimedia Commons.]

Input signals come in through the dendrites, and the output signal passes out through the axon.

Neuron-Inspired Classifier

[Figure: each input x[1], x[2], x[3], ..., x[d] is multiplied by its weight parameter w[1], w[2], w[3], ..., w[d]; the products are summed together with a bias parameter b to form the “activation,” which determines whether the neuron fires. The function f maps the activation to the output ŷ.]

Neuron-Inspired Classifier

f(x) = sign(w · x + b)

remembering that:

w · x = ∑_{j=1}^{d} w[j] · x[j]

Learning requires us to set the weights w and the bias b.
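As a concrete illustration, here is a minimal NumPy sketch of this decision rule (a sketch only: the function name predict and the tie-breaking choice sign(0) = +1 are assumptions, since the slides leave the zero case unspecified):

```python
import numpy as np

def predict(w, b, x):
    """Neuron-inspired classifier: return sign(w · x + b) in {-1, +1}."""
    activation = np.dot(w, x) + b
    # Ties (activation exactly 0) are broken toward +1 here; the slides
    # do not say what sign(0) should be.
    return 1 if activation >= 0 else -1
```

For example, predict(np.array([2.0, -1.0]), 0.5, np.array([1.0, 1.0])) has activation 2 − 1 + 0.5 = 1.5 and returns +1.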

Perceptron Learning Algorithm

Data: D = ⟨(x_n, y_n)⟩_{n=1}^{N}, number of epochs E
Result: weights w and bias b
initialize: w = 0 and b = 0;
for e ∈ {1, . . . , E} do
    for n ∈ {1, . . . , N}, in random order do
        # predict
        ŷ = sign(w · x_n + b);
        if ŷ ≠ y_n then
            # update
            w ← w + y_n · x_n;
            b ← b + y_n;
        end
    end
end
return w, b

Algorithm 1: PerceptronTrain
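For readers who want to run Algorithm 1, here is a direct NumPy translation (a sketch under stated assumptions: the function name perceptron_train and the fixed random seed are illustrative choices, and ties at activation 0 are broken toward +1):

```python
import numpy as np

def perceptron_train(X, y, num_epochs, seed=0):
    """Algorithm 1 (PerceptronTrain). X: (N, d) array; y: labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    w = np.zeros(d)   # initialize: w = 0
    b = 0.0           # initialize: b = 0
    for _ in range(num_epochs):            # e in {1, ..., E}
        for n in rng.permutation(N):       # n in {1, ..., N}, in random order
            # predict
            y_hat = 1 if np.dot(w, X[n]) + b >= 0 else -1
            if y_hat != y[n]:
                # update
                w += y[n] * X[n]
                b += y[n]
    return w, b
```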

Parameters and Hyperparameters

This is the first supervised algorithm we’ve seen that has parameters that are numerical values (w and b).

The perceptron learning algorithm’s sole hyperparameter is E, the number of epochs (passes over the training data).

Can you think of a clever way to efficiently tune E using development data?
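One natural answer (a hint, not spelled out on the slides): train once for some maximum number of epochs, evaluate on the development data after each epoch, and keep the parameters from the best-scoring epoch, rather than retraining from scratch for every candidate E. A minimal sketch, reusing the perceptron_train-style inner loop above plus a hypothetical dev_accuracy helper:

```python
import numpy as np

def dev_accuracy(w, b, X_dev, y_dev):
    """Fraction of development examples that sign(w · x + b) gets right."""
    preds = np.where(X_dev @ w + b >= 0, 1, -1)
    return float(np.mean(preds == y_dev))

def tune_epochs(X, y, X_dev, y_dev, max_epochs, seed=0):
    """One training run; checkpoint after every epoch and return the best."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    w, b = np.zeros(d), 0.0
    best_acc, best_w, best_b, best_e = -1.0, None, None, 0
    for e in range(1, max_epochs + 1):
        for n in rng.permutation(N):
            y_hat = 1 if np.dot(w, X[n]) + b >= 0 else -1
            if y_hat != y[n]:
                w += y[n] * X[n]
                b += y[n]
        acc = dev_accuracy(w, b, X_dev, y_dev)
        if acc > best_acc:
            # Copy w, because it keeps being updated in place.
            best_acc, best_w, best_b, best_e = acc, w.copy(), b, e
    return best_w, best_b, best_e
```

This costs one training run of max_epochs epochs instead of one run per candidate value of E.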

Perceptron Updates

Suppose y_n = +1 but w · x_n + b < 0; this means ŷ = −1.

The new weights and bias will be:

w′ = w + x_n
b′ = b + 1

If we immediately made the prediction again, we’d get activation:

w′ · x_n + b′ = (w + x_n) · x_n + (b + 1)
              = w · x_n + b + ‖x_n‖₂² + 1
              ≥ w · x_n + b + 1

So we know we’ve moved in the “right direction.”
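A quick numeric check of that derivation (all values chosen arbitrarily for illustration):

```python
import numpy as np

# A mistake case: y_n = +1, but the current activation is negative.
w, b = np.array([-1.0, 2.0]), -0.5
x_n, y_n = np.array([2.0, 0.5]), 1
old_act = np.dot(w, x_n) + b             # -1.5, so the prediction is -1

# Perceptron update for a positive example: w' = w + x_n, b' = b + 1.
w_new, b_new = w + x_n, b + 1
new_act = np.dot(w_new, x_n) + b_new     # 3.75

# The activation rises by exactly ||x_n||_2^2 + 1, as on the slide.
assert np.isclose(new_act - old_act, np.dot(x_n, x_n) + 1)
```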

Geometric Interpretation

For every possible x, there are three possibilities:

w · x + b > 0   classified as positive
w · x + b < 0   classified as negative
w · x + b = 0   on the decision boundary

The decision boundary is a (d − 1)-dimensional hyperplane.

Linear Decision Boundary

[Figure: a 2-dimensional example. The line w · x + b = 0 is the decision boundary; points on one side have w · x + b > 0, points on the other have w · x + b < 0.]

[Figure: the same boundary, with points annotated by their activation, w · x + b.]
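Since the original figures do not survive extraction, here is a small matplotlib sketch that reproduces the idea (the parameters w and b and the sampled points are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

w, b = np.array([1.0, -2.0]), 0.5      # arbitrary 2-d parameters
xs = np.linspace(-3, 3, 100)
ys = -(w[0] * xs + b) / w[1]           # solve w·x + b = 0 for the boundary line

pts = np.random.default_rng(0).uniform(-3, 3, size=(60, 2))
side = np.sign(pts @ w + b)            # which side of the hyperplane?

plt.plot(xs, ys, "k-", label="w·x + b = 0")
plt.scatter(*pts[side > 0].T, marker="+", label="w·x + b > 0")
plt.scatter(*pts[side < 0].T, marker="o", label="w·x + b < 0")
plt.legend()
plt.show()
```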

Interpretation of Weight Values

What does it mean when . . .

▶ w[12] = 100?

▶ w[12] = −1?

▶ w[12] = 0?
