
Page 1:

Lecture 29: Optimization and Neural Nets

CS4670/5670: Computer Vision
Kavita Bala

Slides from Andrej Karpathy and Fei-Fei Li
http://vision.stanford.edu/teaching/cs231n/

Page 3:

Summary

Page 4:

Other loss functions

• Scores are not very intuitive

• Softmax classifier
  – Score function is the same
  – Intuitive output: normalized class probabilities
  – Extension of logistic regression to multiple classes

Page 5:

Softmax classifier

Interpretation: squashes values into the range 0 to 1
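
A minimal numpy sketch of this interpretation (the softmax function itself is standard; the example scores are made up):

    import numpy as np

    def softmax(scores):
        # Subtract the max score for numerical stability; the output is unchanged.
        shifted = scores - np.max(scores)
        exp = np.exp(shifted)
        return exp / np.sum(exp)

    print(softmax(np.array([3.2, 5.1, -1.7])))  # values in (0, 1), summing to 1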

Page 6:

Cross-entropy loss
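
For reference, the cross-entropy loss for one example is the negative log of the softmax probability of the correct class, L_i = -log( e^{s_{y_i}} / sum_j e^{s_j} ). A minimal numpy sketch (the example scores are made up):

    import numpy as np

    def cross_entropy_loss(scores, correct_class):
        # Work in log space with the shifted scores for numerical stability.
        shifted = scores - np.max(scores)
        log_probs = shifted - np.log(np.sum(np.exp(shifted)))
        return -log_probs[correct_class]

    print(cross_entropy_loss(np.array([3.2, 5.1, -1.7]), correct_class=0))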

Page 7:

Aside: Loss function interpretation

• Probability
  – Maximum Likelihood Estimation (MLE)
  – Regularization is Maximum a Posteriori (MAP) estimation
• Cross-entropy H
  – p is the true distribution (1 for the correct class, 0 elsewhere), q is the estimated distribution
  – The softmax classifier minimizes the cross-entropy between p and q
  – Equivalently, it minimizes the KL divergence (Kullback-Leibler): a measure of the distance between p and q
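
For reference, the identity behind the last bullet (standard definitions, not from the slides):

    H(p, q) = -\sum_x p(x) \log q(x) = H(p) + D_{\mathrm{KL}}(p \,\|\, q)

Since p is one-hot, its entropy H(p) is zero, so minimizing the cross-entropy is the same as minimizing the KL divergence between p and q.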

Page 8:

SVM vs. Softmax
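
A side-by-side numpy sketch of the two losses on the same made-up scores; the margin of 1 in the hinge loss follows the usual multiclass SVM formulation:

    import numpy as np

    scores = np.array([3.2, 5.1, -1.7])  # hypothetical scores for 3 classes
    y = 0                                # index of the correct class

    # Multiclass SVM (hinge) loss: penalize classes whose score is not
    # at least a margin of 1 below the correct class's score.
    margins = np.maximum(0, scores - scores[y] + 1)
    margins[y] = 0
    svm_loss = np.sum(margins)

    # Softmax (cross-entropy) loss: negative log probability of the correct class.
    shifted = scores - np.max(scores)
    softmax_loss = -(shifted[y] - np.log(np.sum(np.exp(shifted))))

    print(svm_loss, softmax_loss)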

Page 9:

Summary

• Have score function and loss function
  – Will generalize the score function
• Find W and b to minimize loss
  – SVM vs. Softmax
    • Comparable in performance
    • SVM satisfies margins, softmax optimizes probabilities

Page 10:

Gradient Descent

Page 11:

Step size: learning rate
• Too big: will miss the minimum
• Too small: slow convergence
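
A minimal sketch of the vanilla update with an illustrative quadratic objective (all names here are made up for the example):

    import numpy as np

    def gradient_descent(grad_f, w0, lr=0.1, steps=100):
        # Repeatedly step in the direction of the negative gradient;
        # lr is the step size (learning rate).
        w = w0.copy()
        for _ in range(steps):
            w -= lr * grad_f(w)
        return w

    # f(w) = ||w||^2 has gradient 2w; the iterates converge to the minimum at 0.
    print(gradient_descent(lambda w: 2 * w, np.array([3.0, -2.0]), lr=0.1))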

Page 12:

Analytic Gradient

Pages 13-14:

Gradient Descent

Page 15:

Mini-batch Gradient Descent
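
A hedged sketch of the mini-batch loop; X, y, grad_fn, and the hyperparameter values are illustrative, not from the slides:

    import numpy as np

    def minibatch_gd(X, y, w, grad_fn, lr=1e-3, batch_size=256, steps=1000):
        # Each step estimates the full gradient from a small random sample
        # of the training set: much cheaper per update, noisier gradients.
        n = X.shape[0]
        for _ in range(steps):
            idx = np.random.choice(n, min(batch_size, n), replace=False)
            w = w - lr * grad_fn(X[idx], y[idx], w)
        return w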

Page 16:

Stochastic Gradient Descent (SGD)
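
Strictly speaking, stochastic gradient descent updates on one example at a time (batch_size = 1 in the sketch above); in practice the term is often used for mini-batch updates as well.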

Page 17:

Summary

Pages 18-21:

Where are we?

• Classifiers: SVM vs. Softmax
• Gradient descent to optimize loss functions
  – Batch gradient descent, stochastic gradient descent
  – Momentum (see the sketch below)
  – Numerical gradients (slow, approximate), analytic gradients (fast, error-prone)
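
A minimal sketch of the momentum update mentioned above (one common formulation; mu, lr, and the test objective are illustrative):

    import numpy as np

    def momentum_gd(grad_f, w0, lr=0.1, mu=0.9, steps=100):
        # The velocity v accumulates a decaying sum of past gradients;
        # mu is the momentum coefficient.
        w, v = w0.copy(), np.zeros_like(w0)
        for _ in range(steps):
            v = mu * v - lr * grad_f(w)  # update velocity
            w = w + v                    # move in the smoothed direction
        return w

    print(momentum_gd(lambda w: 2 * w, np.array([3.0, -2.0])))  # near 0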

Page 22:

Derivatives

• Given f(x), where x is a vector of inputs
  – Compute the gradient of f at x: ∇f(x)
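
A centered-difference numerical gradient, useful for checking analytic gradients (a standard technique; the helper name and test function are made up):

    import numpy as np

    def numerical_gradient(f, x, h=1e-5):
        # Slow (two evaluations per dimension) and approximate, but
        # handy as a correctness check for analytic gradients.
        grad = np.zeros_like(x)
        for i in range(x.size):
            old = x[i]
            x[i] = old + h
            f_plus = f(x)
            x[i] = old - h
            f_minus = f(x)
            x[i] = old
            grad[i] = (f_plus - f_minus) / (2 * h)
        return grad

    # For f(x) = sum(x^2) the analytic gradient is 2x, i.e. [6, -4] here.
    print(numerical_gradient(lambda v: np.sum(v ** 2), np.array([3.0, -2.0])))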

Page 23:

Examples

Pages 24-26:

Backprop gradients

• Can create complex stages, but still compute the gradient easily via the chain rule
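
A tiny worked example, the classic f(x, y, z) = (x + y) * z computation from the CS231n notes the slides draw on: each stage's local derivative is chained to get every gradient.

    # Forward pass, staged as q = x + y, then f = q * z.
    x, y, z = -2.0, 5.0, -4.0
    q = x + y          # q = 3
    f = q * z          # f = -12

    # Backward pass: apply the chain rule stage by stage.
    dfdz = q           # df/dz = q
    dfdq = z           # df/dq = z
    dfdx = dfdq * 1.0  # dq/dx = 1, so df/dx = df/dq
    dfdy = dfdq * 1.0  # dq/dy = 1, so df/dy = df/dq
    print(dfdx, dfdy, dfdz)  # -4.0 -4.0 3.0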