cs540 machine learning lecture 6 - cs.ubc.camurphyk/teaching/cs540-fall08/l6.pdf · cs540 machine...

29
1 CS540 Machine learning Lecture 6

Upload: others

Post on 25-Jun-2020

12 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

1

CS540 Machine learningLecture 6

Page 2: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

2

Last time

• Linear and ridge regression (QR, SVD, LMS)

Page 3: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

3

This time

• Logistic regression• MLE

• Perceptron algorithm• IRLS

• Multinomial logistic regression

Page 4: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

4

Logistic regression

• Model for binary classification

SigmoidorLogisticfunction

Page 5: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

5

Logistic regression in 1d

SAT scores

Page 6: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

6

Logistic regression in 2d

Page 7: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

7

Notation

Page 8: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

8

Why the logistic function?

• McCulloch Pitts model of neuron

Page 9: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

9

Log-odds ratio

Page 10: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

10

This time

• Logistic regression• MLE

• Perceptron algorithm• IRLS

• Multinomial logistic regression

Page 11: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

11

MLE

Page 12: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

12

Gradient

Page 13: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

13

Perceptron algorithm

Page 14: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

14

Perceptron algorithm

Page 15: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

15

Convergence

• If linearly separable (so errors = 0), guaranteed to converge, but may do so slowly

• If not linearly separable, may not converge

Page 16: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

16

This time

• Logistic regression• MLE

• Perceptron algorithm• IRLS

• Multinomial logistic regression

Page 17: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

17

IRLS

• Iteratively reweighted least squares for finding the MLE for logistic regression

• Special case of Newton’s algorithm

Page 18: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

18

Newton’s method

• Consider a quadratic objective

• Gradient methods may take many steps, but we can “hop” to the minimum in 1 step if we use

• In general, g(x) will not be linear in x, but we can linearize it. Alternatively we can approximate f by a quadratic.

Page 19: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

19

Linearize the gradient

• First order Taylor series approx of g(x) around x_k

0

g(x)g

lin(x)

xk xk+dk

Page 20: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

20

Approximate the function

• Construct second order Taylor of f(x) around x_k

f(x)fquad

(x)

xk xk+dk

Page 21: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

21

Newton’s algorithm

Use QR to solve H dk = -gk for dk

Page 22: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

22

Gradient and Hessian

Page 23: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

23

Generic solver

Page 24: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

24

IRLS

Page 25: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

25

L2 regularization

• Needed to prevent overfitting and w -> inf

Page 26: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

26

This time

• Logistic regression• MLE

• Perceptron algorithm• IRLS

• Multinomial logistic regression

Page 27: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

27

Multinomial logistic regression

• Y in {1,…,C} categorical

Binary case

softmax

Page 28: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

28

Softmax function

Page 29: CS540 Machine learning Lecture 6 - cs.ubc.camurphyk/Teaching/CS540-Fall08/L6.pdf · CS540 Machine learning Lecture 6. 2 Last time • Linear and ridge regression (QR, SVD, LMS) 3

29

MLE

Can compute gradient and Hessian and use Newton’s method

Can add L2 regularizer

Can use faster optimization methods eg bound optimization