A shallow look at Deep Learning Computer Vision. James Hays. Many slides from CVPR 2014 Deep Learning Tutorial (Honglak Lee and Marc’Aurelio especially) and Rob Fergus. https://sites.google.com/site/deeplearningcvpr2014


Page 1:

A shallow look at Deep Learning Computer Vision

James Hays

Many slides from CVPR 2014 Deep Learning Tutorial (Honglak Lee and Marc’Aurelio especially) and Rob Fergus

https://sites.google.com/site/deeplearningcvpr2014

Page 2:

Goal of this final course section

• To understand deep learning as a computer vision practitioner

• To understand the mechanics of a deep convolutional network at test time (the forward pass)

• To have an intuition for the learning mechanisms used to train CNNs (the backward pass / backpropagation)

• To know of recent success stories with learned representations in computer vision

• To gain an intuition for what is going on inside deep convolutional networks

Page 3:

http://www.cc.gatech.edu/~zk15/DL2016/deep_learning_course.html

Page 4:

Traditional Recognition Approach

Input data (pixels) → hand-crafted feature representation (low-level vision features: edges, SIFT, HOG, etc.) → Learning Algorithm (e.g., SVM) → Object detection / classification

Features are not learned

Page 5:

Computer vision features

SIFT, Spin image, Textons, HOG, and many others: SURF, MSER, LBP, Color-SIFT, Color histogram, GLOH, …

Page 6:

Motivation

• Features are key to recent progress in recognition
• Multitude of hand-designed features currently in use
• Where next? Better classifiers? Building better features?

Felzenszwalb, Girshick, McAllester and Ramanan, PAMI 2010
Yan & Huang (winner of the PASCAL 2010 classification competition)

Slide: R. Fergus

Page 7:

What Limits Current Performance?

• Ablation studies on the Deformable Parts Model (Felzenszwalb, Girshick, McAllester, Ramanan, PAMI’10)
• Replace each part with humans (Amazon Turk): Parikh & Zitnick, CVPR’10
• Also, removal of part deformations has a small (<2%) effect. Are “Deformable Parts” necessary in the Deformable Parts Model? (Divvala, Hebert, Efros, ECCV 2012)

Slide: R. Fergus

Page 8:

Mid-Level Representations

• Mid-level cues: “Tokens” from Vision by D. Marr — continuation, parallelism, junctions, corners
• Object parts
• Difficult to hand-engineer. What about learning them?

Slide: R. Fergus

Page 9:

Learning Feature Hierarchy

• Learn a hierarchy, all the way from pixels to classifier
• One layer extracts features from the output of the previous layer
• Train all layers jointly

Image/Video Pixels → Layer 1 → Layer 2 → Layer 3 → Simple Classifier

Slide: R. Fergus

Page 10:

Learning Feature Hierarchy

1. Learn useful higher-level features from images
2. Fill in the representation gap in recognition

Feature representation, from input data upward: pixels → 1st layer “Edges” → 2nd layer “Object parts” → 3rd layer “Objects”

Lee et al., ICML 2009; CACM 2011

Page 11:

Learning Feature Hierarchy

• Better performance
• Other domains (unclear how to hand-engineer): Kinect, video, multi-spectral
• Feature computation time: dozens of features now regularly used [e.g., MKL]; getting prohibitive for large datasets (tens of seconds per image)

Slide: R. Fergus

Page 12:

Approaches to learning features

• Supervised Learning
  – End-to-end learning of deep architectures (e.g., deep neural networks) with back-propagation
  – Works well when the amount of labeled data is large
  – Structure of the model is important (e.g., convolutional structure)

• Unsupervised Learning
  – Learn statistical structure or dependencies of the data from unlabeled data
  – Layer-wise training
  – Useful when the amount of labeled data is not large
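The supervised, end-to-end recipe above can be sketched in a few lines of NumPy: a tiny two-layer network trained by back-propagation on a toy XOR problem. The layer sizes, seed, and learning rate are arbitrary illustrative choices, not from the slides.

```python
import numpy as np

# Toy end-to-end supervised learning with back-propagation.
# Architecture, seed, and learning rate are illustrative choices.
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])            # XOR: not linearly separable

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)

losses = []
for step in range(2000):
    # Forward pass.
    h = np.maximum(0.0, X @ W1 + b1)              # ReLU hidden layer
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))      # sigmoid output
    losses.append(float(np.mean((p - y) ** 2)))
    # Backward pass: chain rule from the output back to the first layer.
    g_p = 2 * (p - y) / len(X)                    # d(MSE)/d(p)
    g_z = g_p * p * (1 - p)                       # through the sigmoid
    g_W2, g_b2 = h.T @ g_z, g_z.sum(0)
    g_h = g_z @ W2.T * (h > 0)                    # through the ReLU gate
    g_W1, g_b1 = X.T @ g_h, g_h.sum(0)
    # Plain gradient descent update.
    W1 -= 0.5 * g_W1; b1 -= 0.5 * g_b1
    W2 -= 0.5 * g_W2; b2 -= 0.5 * g_b2

print(losses[0], "->", losses[-1])                # loss should shrink
```

The same forward/backward structure scales to convolutional layers; only the layer operations change.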

Page 13:

Taxonomy of feature learning methods

Supervised, shallow:
• Support Vector Machine
• Logistic Regression
• Perceptron

Supervised, deep:
• Deep Neural Net
• Convolutional Neural Net
• Recurrent Neural Net

Unsupervised, shallow:
• Denoising Autoencoder
• Restricted Boltzmann machines*
• Sparse coding*

Unsupervised, deep:
• Deep (stacked) Denoising Autoencoder*
• Deep Belief Nets*
• Deep Boltzmann machines*
• Hierarchical Sparse Coding*

* supervised version exists

Page 14:

Supervised Learning

Page 15:

Example: Convolutional Neural Networks

• LeCun et al. 1989

• Neural network with specialized connectivity structure

Slide: R. Fergus

Page 16:

Convolutional Neural Networks

• Feed-forward: Input Image → Convolution (learned) → Non-linearity (rectified linear) → Pooling (local max) → Feature maps
• Supervised
• Train convolutional filters by back-propagating classification error

LeCun et al. 1998

Slide: R. Fergus
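The feed-forward pipeline above (convolve → rectify → pool) can be sketched with plain NumPy. The image and filter values are toy placeholders, not learned weights.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid 2-D cross-correlation (what deep nets call 'convolution')."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def relu(x):
    """Rectified linear non-linearity, applied per element."""
    return np.maximum(0.0, x)

def max_pool(x, size=2):
    """Non-overlapping local max pooling over size x size blocks."""
    h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    x = x[:h, :w].reshape(h // size, size, w // size, size)
    return x.max(axis=(1, 3))

# Toy forward pass: 6x6 image, one hand-set 3x3 vertical-edge filter.
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])
fmap = max_pool(relu(conv2d_valid(image, kernel)))
print(fmap.shape)  # (2, 2)
```

A real CNN stacks many such filters per layer and learns the kernel values by back-propagation.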

Page 17:

Components of Each Layer

Pixels / Features → Filter with Dictionary (convolutional or tiled) + Non-linearity → Spatial/Feature Pooling (Sum or Max) → Normalization between feature responses [Optional] → Output Features

Slide: R. Fergus

Page 18:

Filtering

• Convolutional (filters slide over the input feature map)
  – Dependencies are local
  – Translation equivariance
  – Tied filter weights (few params)
  – Stride 1, 2, … (faster, less memory)

Slide: R. Fergus
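The stride bullet trades spatial resolution for speed and memory. The standard output-size arithmetic (a general convolution fact, not stated on the slide) makes the trade-off concrete:

```python
def conv_output_size(n, k, stride=1, pad=0):
    """Spatial output size of a convolution: floor((n + 2*pad - k) / stride) + 1."""
    return (n + 2 * pad - k) // stride + 1

# Stride 1 keeps nearly full resolution; larger strides shrink the map.
print(conv_output_size(227, 11, stride=4))       # 55 (AlexNet-style first layer)
print(conv_output_size(64, 3, stride=1, pad=1))  # 64 (same-size output)
print(conv_output_size(64, 3, stride=2, pad=1))  # 32 (a quarter of the activations)
```

Halving each spatial dimension with stride 2 cuts the activation memory of that layer by 4x, which is why early layers of large nets often use strides greater than 1.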

Page 19:

Non-Linearity

• Non-linearity: per-element (independent)
  – Tanh
  – Sigmoid: 1/(1+exp(-x))
  – Rectified linear (the preferred option)
    • Simplifies backprop
    • Makes learning faster
    • Avoids saturation issues

Slide: R. Fergus

Page 20:

Pooling

• Spatial Pooling
  – Non-overlapping / overlapping regions
  – Sum or max
  – Boureau et al. ICML’10 for theoretical analysis

Slide: R. Fergus
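A minimal sketch covering the choices on this slide: sum vs. max, and overlapping (stride smaller than window) vs. non-overlapping pooling. Window and stride values are toy choices.

```python
import numpy as np

def pool2d(x, size, stride, op=np.max):
    """Generic spatial pooling; overlapping when stride < size."""
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = op(x[i*stride:i*stride+size, j*stride:j*stride+size])
    return out

fmap = np.array([[ 1.,  2.,  3.,  4.],
                 [ 5.,  6.,  7.,  8.],
                 [ 9., 10., 11., 12.],
                 [13., 14., 15., 16.]])
print(pool2d(fmap, size=2, stride=2))             # max of each 2x2 block: 6, 8, 14, 16
print(pool2d(fmap, size=2, stride=2, op=np.sum))  # sum pooling over the same blocks
print(pool2d(fmap, size=3, stride=1).shape)       # overlapping windows: (2, 2)
```

Max pooling keeps the strongest response in each region, giving a small amount of translation invariance; sum (or average) pooling keeps aggregate energy instead.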

Page 21:

Normalization

• Contrast normalization (across feature maps)
  – Local mean = 0, local std. = 1; “local” = 7x7 Gaussian
  – Equalizes the feature maps

[Figure: feature maps before and after contrast normalization]

Slide: R. Fergus
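A sketch of local contrast normalization under simplifying assumptions: a uniform box window stands in for the slide's 7x7 Gaussian weighting, and the input is random placeholder data rather than a real feature map.

```python
import numpy as np

def local_contrast_norm(x, size=7, eps=1e-5):
    """Subtract the local mean and divide by the local std over a size x size
    window. A uniform (box) window stands in for the slide's 7x7 Gaussian."""
    pad = size // 2
    xp = np.pad(x, pad, mode='reflect')   # reflect-pad so borders have full windows
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            patch = xp[i:i+size, j:j+size]
            out[i, j] = (x[i, j] - patch.mean()) / (patch.std() + eps)
    return out

rng = np.random.default_rng(0)
fmap = rng.normal(5.0, 3.0, size=(16, 16))    # placeholder map with non-zero mean
normed = local_contrast_norm(fmap)
print(fmap.mean(), "->", normed.mean())       # mean is pulled toward 0
```

Dividing by the local std equalizes strong and weak responses across the map, so no single region dominates the next layer's filters.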

Page 22:

Compare: SIFT Descriptor

Image Pixels → Apply Gabor filters → Spatial pool (Sum) → Normalize to unit length → Feature Vector

Slide: R. Fergus

Page 23:

Applications

• Handwritten text/digits
  – MNIST (0.17% error [Ciresan et al. 2011])
  – Arabic & Chinese [Ciresan et al. 2012]

• Simpler recognition benchmarks
  – CIFAR-10 (9.3% error [Wan et al. 2013])
  – Traffic sign recognition: 0.56% error vs 1.16% for humans [Ciresan et al. 2011]

Slide: R. Fergus

Page 24:

Application: ImageNet

[Deng et al. CVPR 2009]

• ~14 million labeled images, 20k classes

• Images gathered from Internet

• Human labels via Amazon Turk

Page 25:

Krizhevsky et al. [NIPS 2012]

• 7 hidden layers, 650,000 neurons, 60,000,000 parameters
• Trained on 2 GPUs for a week
• Same model as LeCun’98 but:
  – Bigger model (8 layers)
  – More data (10^6 vs 10^3 images)
  – GPU implementation (50x speedup over CPU)
  – Better regularization (DropOut)

Page 26:

ImageNet Classification 2012

• Krizhevsky et al. -- 16.4% error (top-5)
• Next best (non-convnet) -- 26.2% error

[Bar chart: top-5 error rate (%) for SuperVision, ISI, Oxford, INRIA, Amsterdam]

Page 27:

ImageNet Classification 2013 Results

• http://www.image-net.org/challenges/LSVRC/2013/results.php

[Bar chart: top-5 test error of the 2013 entries, roughly 0.11 to 0.17]

Page 28:

Feature Generalization

• Zeiler & Fergus, arXiv 1311.2901, 2013
• Girshick et al. CVPR’14
• Oquab et al. CVPR’14
• Razavian et al. arXiv 1403.6382, 2014

(Caltech-101, 256) (Caltech-101, SunS) (VOC 2012) (lots of datasets)

• Pre-train on ImageNet; retrain classifier on Caltech-256 using CNN features (6 training examples)

From Zeiler & Fergus, Visualizing and Understanding Convolutional Networks, arXiv 1311.2901, 2013

Sohn, Jung, Lee, Hero, ICCV 2011; Bo, Ren, Fox, CVPR 2013

Slide: R. Fergus

Page 29:

Industry Deployment

• Used at Facebook, Google, Microsoft
• Image recognition, speech recognition, …
• Fast at test time

Taigman et al., DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR’14

Slide: R. Fergus