
Machine Learning (CSE 446): Perceptron

Noah Smith
© 2017 University of Washington
[email protected]

October 9, 2017


Happy Medium?

Decision trees (that aren't too deep): use relatively few features to classify.

K-nearest neighbors: all features are weighted equally.

Today: use all features, but weight them.

For today's lecture, assume that y ∈ {−1, +1} instead of {0, 1}, and that x ∈ ℝ^d.
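If your labels arrive in {0, 1}, the remapping to {−1, +1} is a one-liner; a quick sketch (the variable names are mine, not from the slides):

import numpy as np

y01 = np.array([0, 1, 1, 0])  # labels in {0, 1}
y_pm = 2 * y01 - 1            # remapped to {-1, +1}
print(y_pm)                   # [-1  1  1 -1]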

Inspiration from Neurons

[Image from Wikimedia Commons: a neuron.]

Input signals come in through the dendrites, and the output signal passes out through the axon.

Neuron-Inspired Classifier

[Diagram: each input x[1], x[2], x[3], ..., x[d] is multiplied by its weight parameter w[1], w[2], w[3], ..., w[d]; the products are summed together with a bias parameter b to form the "activation," and a function f decides whether the unit fires or not, producing the output ŷ.]

Neuron-Inspired Classifier

f(x) = sign(w · x + b),

remembering that

w · x = Σ_{j=1}^{d} w[j] · x[j].

Learning requires us to set the weights w and the bias b.
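As a minimal sketch of this decision function in Python (the vectors below are invented for illustration, and sign(0) is treated as +1 by convention):

import numpy as np

def predict(w, b, x):
    """Return sign(w . x + b), the classifier's prediction in {-1, +1}."""
    activation = np.dot(w, x) + b
    return 1 if activation >= 0 else -1

w = np.array([0.5, -1.0, 2.0])  # weight parameters (made up)
b = -0.25                       # bias parameter (made up)
x = np.array([1.0, 0.0, 1.0])   # input features (made up)
print(predict(w, b, x))         # activation = 2.25, so this prints 1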

Perceptron Learning Algorithm

Data: D = ⟨(x_n, y_n)⟩ for n = 1, ..., N; number of epochs E
Result: weights w and bias b
initialize: w = 0 and b = 0;
for e ∈ {1, ..., E} do
    for n ∈ {1, ..., N}, in random order do
        # predict
        ŷ = sign(w · x_n + b);
        if ŷ ≠ y_n then
            # update
            w ← w + y_n · x_n;
            b ← b + y_n;
        end
    end
end
return w, b

Algorithm 1: PerceptronTrain
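A runnable Python version of Algorithm 1, as a sketch (the toy dataset at the bottom is invented):

import numpy as np

def perceptron_train(X, y, epochs):
    """PerceptronTrain (Algorithm 1): X is an N x d array of inputs,
    y is a length-N array of labels in {-1, +1}."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)  # initialize: w = 0
    b = 0.0                   # initialize: b = 0
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        for n in rng.permutation(n_samples):  # random order each epoch
            # predict
            y_hat = 1 if np.dot(w, X[n]) + b >= 0 else -1
            if y_hat != y[n]:
                # update
                w += y[n] * X[n]
                b += y[n]
    return w, b

# Invented toy data: the label is +1 exactly when x[2] > x[1].
X = np.array([[0.0, 1.0], [1.0, 2.0], [1.0, 0.0], [2.0, 1.0]])
y = np.array([1, 1, -1, -1])
w, b = perceptron_train(X, y, epochs=10)
print(w, b)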

Parameters and Hyperparameters

This is the first supervised algorithm we've seen that has parameters that are numerical values (w and b).

The perceptron learning algorithm's sole hyperparameter is E, the number of epochs (passes over the training data).

Can you think of a clever way to efficiently tune E using development data?
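One common answer (my suggestion; the slide leaves it as a question): checkpoint after every epoch and score each checkpoint on the development data, so that a single run of E_max epochs effectively evaluates every E ≤ E_max. A sketch, reusing the training loop above:

import numpy as np

def accuracy(w, b, X, y):
    """Fraction of examples where sign(w . x + b) matches y."""
    preds = np.where(X @ w + b >= 0, 1, -1)
    return float(np.mean(preds == y))

def tune_epochs(X, y, X_dev, y_dev, max_epochs):
    """Train once for max_epochs, keeping the checkpoint that
    scores best on the development data."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    rng = np.random.default_rng(0)
    best_acc, best_w, best_b, best_e = -1.0, w.copy(), b, 0
    for e in range(1, max_epochs + 1):
        for i in rng.permutation(n):
            if (1 if X[i] @ w + b >= 0 else -1) != y[i]:
                w += y[i] * X[i]
                b += y[i]
        acc = accuracy(w, b, X_dev, y_dev)
        if acc > best_acc:
            best_acc, best_w, best_b, best_e = acc, w.copy(), b, e
    return best_w, best_b, best_e

# e.g.: w, b, E = tune_epochs(X_train, y_train, X_dev, y_dev, max_epochs=20)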

Perceptron Updates

Suppose y_n = 1 but w · x_n + b < 0; this means ŷ = −1.

The new weights and bias will be:

w′ = w + x_n
b′ = b + 1

If we immediately made the prediction again, we'd get activation:

w′ · x_n + b′ = (w + x_n) · x_n + (b + 1)
             = w · x_n + b + ‖x_n‖₂² + 1
             ≥ w · x_n + b + 1

(The inequality holds because ‖x_n‖₂² ≥ 0.) So we know we've moved in the "right direction."
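A quick numeric check of this derivation, with made-up values:

import numpy as np

w = np.array([0.5, -1.0])
b = 0.0
x_n = np.array([1.0, 2.0])    # example input with y_n = +1 (made up)
before = np.dot(w, x_n) + b   # -1.5, so the prediction is -1: a mistake
w2, b2 = w + x_n, b + 1       # the update for y_n = +1
after = np.dot(w2, x_n) + b2  # 4.5 = -1.5 + ||x_n||^2 + 1 = -1.5 + 5 + 1
print(before, after)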

Geometric Interpretation

For every possible x, there are three possibilities:

w · x + b > 0   classified as positive
w · x + b < 0   classified as negative
w · x + b = 0   on the decision boundary

The decision boundary is a (d − 1)-dimensional hyperplane.
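For instance, with d = 2 and the invented choice w = (1, 1), b = −1, the boundary is the line x[1] + x[2] = 1; a tiny check of all three cases:

import numpy as np

w, b = np.array([1.0, 1.0]), -1.0
for x in (np.array([2.0, 2.0]),    # w . x + b =  3 > 0: positive
          np.array([0.0, 0.0]),    # w . x + b = -1 < 0: negative
          np.array([0.5, 0.5])):   # w . x + b =  0: on the boundary
    print(x, np.dot(w, x) + b)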

Linear Decision Boundary

[Figure: a 2-dimensional dataset split by the line w · x + b = 0, with the region where w · x + b > 0 on one side and w · x + b < 0 on the other.]

[Figure: the same boundary, labeled with activation = w · x + b.]

Interpretation of Weight Values

What does it mean when . . .

▶ w[12] = 100?
▶ w[12] = −1?
▶ w[12] = 0?
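The slide leaves these as discussion questions; the intuition I would offer (not stated on the slide) is that feature 12 contributes w[12] · x[12] to the activation, so w[12] = 100 makes feature 12 push hard toward a positive prediction, w[12] = −1 makes it push gently toward a negative one, and w[12] = 0 means the feature is ignored entirely. A tiny demonstration with invented inputs:

import numpy as np

d = 20
x = np.zeros(d)
x[12] = 1.0                   # feature 12 is "on"; all others are 0
for w12 in (100.0, -1.0, 0.0):
    w = np.zeros(d)
    w[12] = w12
    print(w12, np.dot(w, x))  # feature 12's contribution to the activation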