
Page 1:

UNSUPERVISED DEEP LEARNING

Erez Aharonov, Noam Eilon

Deep Learning Seminar, School of Electrical Engineering – Tel Aviv University

Page 2:

Building High-level Features Using Large Scale Unsupervised Learning

Quoc V. Le, Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, Andrew Y. Ng

2012

Page 3:

Outline

• Short introduction - Unsupervised Learning

• Overview

• Training Deep autoencoder

• Model Architecture

• Parallelism and ASGD

• Results

Page 4:

Supervised Learning

[Diagram: the world provides input data to a learning machine; the machine's outputs are compared against a target, and the objective is driven by external rewards.]

Page 5:

Unsupervised Learning

[Diagram: the world provides input data to a learning machine; there is no external target, and the objective is driven by intrinsic rewards.]

Page 6:

Three Kinds of Learning

• Supervised learning: input is X (data) and Y (labels); the goal is to learn a function mapping X to Y; limited by the availability of labeled data. Examples: classification, segmentation, object detection, image captioning.

• Unsupervised learning: input is X (data) only; the goal is to learn structure; limited by complexity and size. Examples: feature learning, generative models.

• Reinforcement learning: input is the current state and a reward; the goal is to optimize the reward; limited by training the model. Examples: policies/decisions/games.

Page 7:

Overview

• Building high-level, class-specific feature detectors from unlabeled data.

• How can a perceptual system build itself by looking at the world? How much prior structure is necessary?

• Could a network learn, in an unsupervised way, to be sensitive to high-level concepts such as human faces or cats?

• Inspiration: "grandmother neurons" that represent a complex but specific concept or object.

"Invariant visual representation by single neurons in the human brain," Quian Quiroga et al.

Page 8:

Main concept: Deep Autoencoders

• Hierarchy of representations with increasing level of abstraction.

• Each module transforms its input representation into a higher-level one.

• High-level features are more global and more invariant.

• Low-level features are shared among categories.

[Diagram: an autoencoder mapping inputs x1…x5 through an encoding and, via decoding, back to a reconstruction x'1…x'5.]

Page 9:

Training Deep autoencoders

End-to-end training:

[Diagram: the full encoder/decoder stack from input x1…x5 to reconstruction x'1…x'5.]

• Encode and decode through all layers.
• Compute the loss between the input and its reconstruction.

Page 10:

Training Deep autoencoders

Greedy layer-wise:
• Train each layer separately as an autoencoder.
• The input to each autoencoder is the output of the previous layer.
• Fine-tune the full network (a training sketch follows the diagram below).

[Diagram: a stack of shallow autoencoders; each is trained to reconstruct the representation produced by the layer below it.]
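A minimal PyTorch sketch of greedy layer-wise pretraining; the layer sizes, optimizer, loss, and dense `nn.Linear` layers are illustrative assumptions, not the paper's setup:

```python
import torch
import torch.nn as nn

def train_autoencoder(encoder, decoder, data, epochs=100, lr=1e-3):
    """Train one (encoder, decoder) pair to reconstruct its own input."""
    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(decoder(encoder(data)), data)
        loss.backward()
        opt.step()

# Hypothetical layer sizes; the real model is far larger and locally connected.
sizes = [784, 256, 64]
encoders = []
x = torch.randn(128, sizes[0])          # stand-in data batch
for d_in, d_out in zip(sizes[:-1], sizes[1:]):
    enc = nn.Sequential(nn.Linear(d_in, d_out), nn.Sigmoid())
    dec = nn.Linear(d_out, d_in)
    train_autoencoder(enc, dec, x)      # greedy: train this layer in isolation
    with torch.no_grad():
        x = enc(x)                      # its output becomes the next layer's input
    encoders.append(enc)

# Fine-tuning: stack the pretrained encoders and train the whole network end to end.
```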

Page 11:

The Network Outline

• 3 encoding/decoding layers.

• A 9-layer autoencoder overall.

• All parameters in the model were trained jointly, with the objective being the sum of the objectives of the three layers.

[Diagram: a 200×200 input image passes through three repetitions of Encode → Pool & LCN, each with a Decode branch; 60,000 neurons.]

Page 12:

One layer architecture

First sublayer: local receptive fields.

• 18×18-pixel receptive-field windows.
• 8 feature maps.
• Each neuron connects to all input channels.
• Not convolutional (weights are untied), allowing more invariance.

Page 13:

One layer architecture

Second sublayer - Pooling

• L2 pooling.
• 5×5 overlapping windows.
• H: a fixed pooling matrix.
• Pools over one feature map.
• Improves invariance to local deformations.

$y_{j,i} = \sqrt{\sum_{u,v} H_{u,v} \, g_{j+u,\,i+v}^{2}}$
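A small NumPy sketch of this L2 pooling, assuming valid-only windows and a uniform pooling matrix `H` for illustration:

```python
import numpy as np

def l2_pool(g, H):
    """L2-pool feature map g with pooling weights H (valid windows, stride 1)."""
    kh, kw = H.shape
    out = np.empty((g.shape[0] - kh + 1, g.shape[1] - kw + 1))
    for j in range(out.shape[0]):
        for i in range(out.shape[1]):
            window = g[j:j + kh, i:i + kw]
            out[j, i] = np.sqrt(np.sum(H * window ** 2))   # sqrt of weighted sum of squares
    return out

g = np.random.randn(171, 171)   # one feature map
H = np.ones((5, 5))             # fixed pooling weights (assumed uniform here)
pooled = l2_pool(g, H)          # overlapping 5x5 windows
```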

Page 14:

One layer architecture

Third sublayer – Local contrast normalization

$g_{i,j,k} = h_{i,j,k} - \sum_{i',u,v} G_{u,v} \, h_{i',\,j+u,\,k+v}$

$y_{i,j,k} = \dfrac{g_{i,j,k}}{\max\left\{c, \ \sqrt{\sum_{i',u,v} G_{u,v} \, g_{i',\,j+u,\,k+v}^{2}}\right\}}$

$G$: 5×5 Gaussian weighting window; $c$: a small constant to prevent numerical errors; $i$: channel index; $u, v$: offsets within the window.

• 5×5 overlapping windows.
• Connects to all input channels (a sketch follows).
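A NumPy sketch of local contrast normalization for a single-channel map; the Gaussian kernel construction and the constant `c` are illustrative assumptions (the paper's LCN also sums across feature maps):

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(size=5, sigma=1.0):
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def lcn(h, c=0.01):
    """Subtractive then divisive normalization with a Gaussian window."""
    G = gaussian_kernel()
    g = h - convolve2d(h, G, mode="same")               # subtract local weighted mean
    sigma = np.sqrt(convolve2d(g ** 2, G, mode="same")) # local L2 energy
    return g / np.maximum(c, sigma)                     # divide, floored at c

y = lcn(np.random.randn(171, 171))
```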

Page 15:

Local contrast normalization

• Relatively dominant activations are preferred over uniformly high activations across all features.
• Enforces a form of local competition between adjacent features, and between features at the same spatial location in different feature maps.
• Improves optimization.

[Example: LCN maps the pattern (0, 1, 0, −1) to (0, 0.5, 0, −0.5), preserving relative contrast, while the uniform pattern (10, 10, 10, 10) maps to (0, 0, 0, 0).]

Page 16:

One-layer summary

Input: a 3 × 200 × 200 image $x_i$.

First sublayer ($W_1$): local receptive fields.
• 18×18-pixel receptive-field windows.
• Not convolutional.
• 8 feature maps.
• Each neuron connects to all input channels.

Second sublayer ($H$): pooling.
• 5×5 overlapping pooling windows.
• L2 pooling.
• Pools over one feature map.

Third sublayer: local contrast normalization.
• Pools over all features.

Output: 8 maps of 171×171, after local contrast normalization.

Page 17:

9-layer structure

[Diagram: the 3 × 200 × 200 input image $x_i$ passes through three stacked layers, each consisting of local receptive fields ($W_1^1, W_1^2, W_1^3$), L2 pooling ($H$), and LCN.]

Page 18:

The Optimization Problem

$$\min_{W_1, W_2} \ \sum_{i=1}^{m} \left( \left\| W_2 W_1^{T} x^{(i)} - x^{(i)} \right\|_{2}^{2} + \lambda \sum_{j=1}^{k} \sqrt{\epsilon + H_{j} \left( W_1^{T} x^{(i)} \right)^{2}} \right)$$

$W_1$: encoding matrix
$W_2$: decoding matrix
$\lambda$: tradeoff between sparsity and reconstruction (0.1)
$k$: number of pooling units
$H_j$: vector of weights of the j-th pooling unit (constant)
$\epsilon$: numerical-stability constant
$m$: number of examples

"ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning," Le, Q. V., et al.
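A PyTorch sketch of this reconstruction-plus-group-sparsity objective; the dimensions and the uniform pooling matrix `H` are illustrative assumptions:

```python
import torch

def rica_loss(W1, W2, X, H, lam=0.1, eps=1e-6):
    """X: (m, n) data; W1, W2: (n, d) encoder/decoder; H: (k, d) pooling weights."""
    Z = X @ W1                                  # W1^T x per example, shape (m, d)
    recon = Z @ W2.T                            # W2 W1^T x, shape (m, n)
    recon_cost = ((recon - X) ** 2).sum()       # global reconstruction cost
    pooled = torch.sqrt(eps + (Z ** 2) @ H.T)   # sqrt(eps + H_j (W1^T x)^2), shape (m, k)
    return recon_cost + lam * pooled.sum()      # plus group-sparsity term

n, d, k, m = 100, 64, 16, 32
W1 = torch.randn(n, d, requires_grad=True)
W2 = torch.randn(n, d, requires_grad=True)
H = torch.ones(k, d) / d                        # fixed pooling matrix (assumed uniform)
loss = rica_loss(W1, W2, torch.randn(m, n), H)
loss.backward()
```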

Page 19:

The Optimization Problem

$$\min_{W_1, W_2} \ \sum_{i=1}^{m} \left( \left\| W_2 W_1^{T} x^{(i)} - x^{(i)} \right\|_{2}^{2} + \lambda \sum_{j=1}^{k} \sqrt{\epsilon + H_{j} \left( W_1^{T} x^{(i)} \right)^{2}} \right)$$

Global reconstruction cost (first term): ensures the representations encode important information about the data, i.e., the input can be reconstructed from them.

Group sparsity / spatial pooling (second term):
• Applied to the outputs of the second sublayer.
• A lower sum of pooled activations is preferred.
• Encourages pooling to group similar features together, to achieve invariances.

Page 20:

Feature Grouping

Forces encodings to be organized in a topographic map by pooling together structure-correlated features belonging to the same hidden topic. More specifically, features that are near each other in the topographic map are relatively strongly dependent in the sense of mutual information.

Kavukcuoglu, Koray, Rob Fergus, and Yann LeCun, "Learning invariant features through topographic filter maps."

Page 21:

Training the Network

[Diagram: the 3 × 200 × 200 input image $x_i$ feeds three stacked layers with encoding matrices $W_1^1, W_1^2, W_1^3$ and decoding matrices $W_2^1, W_2^2, W_2^3$; each layer applies pooling $H$ and produces 8 LCN maps with 5×5 kernels, which become the input to the next layer.]

$$\min_{W_1, W_2} \ \sum_{i=1}^{m} \left( \left\| W_2 W_1^{T} x^{(i)} - x^{(i)} \right\|_{2}^{2} + \lambda \sum_{j=1}^{k} \sqrt{\epsilon + H_{j} \left( W_1^{T} x^{(i)} \right)^{2}} \right)$$

Page 22:

Implementation

Year  Deep network architecture  Parameters
2012  AlexNet                    60M
2014  VGGNet                     138M
2014  GoogLeNet                  5M
2012  Google autoencoder         1.15B

Dataset: 10 million 200×200 unlabeled images from YouTube

Training: 2,000 machines with 16,000 CPU cores for 1 week

Parameters: 1.15B learned weights

Page 23:

Model parallelism

• The network is partitioned across machines.
• Each machine stores the parameters of its partition.
• Partitions pass update messages to one another.
• Less fault tolerant (requires some recovery if any single machine fails).
• Works well for convolutional layers, less so for fully connected ones.

Large Scale Distributed Deep Networks, Dean et al.

Page 24:

Data parallelism

• Multiple instances of the model.
• Each instance computes parameter updates.
• Each communicates its results with a parameter server.

Large Scale Distributed Deep Networks, Dean et al.

Page 25:

Asynchronous Gradient Descent

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, Abadi et al.

Synchronized parallelism. Each iteration:
• Wait for all devices to finish.
• Calculate the parameter updates.
• Update the parameter server.

Asynchronous parallelism:
• Each model replica runs separately.
• Updates the parameters without synchronization (a toy sketch follows).
• Less accurate, since gradients applied via ASGD may be stale.
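A toy Python sketch of asynchronous SGD against a shared parameter server; the quadratic loss and replica count are illustrative assumptions:

```python
import threading
import numpy as np

class ParameterServer:
    def __init__(self, dim):
        self.w = np.zeros(dim)
        self.lock = threading.Lock()

    def fetch(self):
        with self.lock:
            return self.w.copy()

    def apply(self, grad, lr=0.1):
        with self.lock:          # each replica updates without waiting for the others
            self.w -= lr * grad

def replica(server, data, steps=100):
    for _ in range(steps):
        w = server.fetch()                     # possibly stale parameters
        x = data[np.random.randint(len(data))]
        grad = 2 * (w - x)                     # gradient of ||w - x||^2
        server.apply(grad)                     # push the update asynchronously

server = ParameterServer(dim=5)
data = np.random.randn(1000, 5)
threads = [threading.Thread(target=replica, args=(server, data)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(server.w)   # approaches the data mean despite unsynchronized updates
```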

Page 26:

Results – Detection

• Looking for a neuron that is sensitive to high-level concepts: a face/cat/body-part detector.

• Method: a test set with a known positive/negative ratio (example: faces, 37,000 images, of which 13,026 are faces).

• For each neuron, find the minimum and maximum activation values.

• Split the activation range into 20 equally spaced thresholds.

• Pick the neuron and the threshold that give the highest accuracy (a sketch follows).
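A NumPy sketch of this neuron/threshold selection; the activations and labels below are simulated stand-ins for the real test set:

```python
import numpy as np

def best_neuron_and_threshold(acts, labels, n_thresholds=20):
    """acts: (n_images, n_neurons) activations; labels: (n_images,) boolean truth."""
    best_acc, best_j, best_t = -1.0, None, None
    for j in range(acts.shape[1]):
        a = acts[:, j]
        for t in np.linspace(a.min(), a.max(), n_thresholds):
            acc = np.mean((a > t) == labels)   # accuracy of "activation > t means face"
            if acc > best_acc:
                best_acc, best_j, best_t = acc, j, t
    return best_acc, best_j, best_t

rng = np.random.default_rng(0)
labels = rng.random(37000) < 13026 / 37000                  # known positive ratio
acts = rng.normal(labels[:, None] * 1.5, 1.0, (37000, 50))  # simulated activations
print(best_neuron_and_threshold(acts, labels))
```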

Page 27:

Results – Detection

Summary of numerical comparisons against other baselines. [Table image]

Histograms of faces (red) vs. no faces (blue). [Figure]

Page 28:

Results - Invariance

• Method: choose 10 face images and apply distortions to them, e.g., scaling and translation.

• Out-of-plane rotation is tested using 10 images of faces rotating in 3D.

[Plots: neuron response versus translation and scaling; axes in pixels]

Page 29:

Results – Visualization

Most responsive stimuli in the test set, and the optimal stimulus according to numerical constraint optimization. [Figure]

Page 30:

Results – ImageNet

• Unsupervised training on YouTube and ImageNet images.
• A logistic classifier on top of the highest layer.
• The logistic classifiers are trained first, and then the network is fine-tuned.
• The entire training was carried out on 2,000 machines for one week.

Summary of classification accuracies for our method and other state-of-the-art baselines on ImageNet. [Table image]

Page 31:

Summary

• This work shows that it is possible to train neurons to be selective for high-level concepts using entirely unlabeled data.

• The network was able to learn invariances from unlabeled data.

• Object recognition on ImageNet: a significant leap, a 70% relative improvement over the previous state of the art.

"Google Builds a Brain that Can Search for Cat Videos," Time, June 2012
"How Many Computers to Identify a Cat? 16,000," NYT, June 2012

Page 32:

Unsupervised Learning of Visual Representations using Videos

Xiaolong Wang, Abhinav Gupta

Robotics Institute, Carnegie Mellon University

Published in 2015

http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Wang_Unsupervised_Learning_of_ICCV_2015_paper.pdf

Page 33:

Agenda

• Overview

• Patch Mining in Videos

• CNN implementation

• Results

• Discussion and Conclusion

Page 34:

Overview

• Do we really need millions of semantically-labeled images to learn a good representation?

• It seems humans can learn visual representations with little or no semantic supervision, yet our approaches remain completely supervised.

Page 35:

Overview

• Previous work on unsupervised learning:
  • Uses millions of static images or frames extracted from videos.
  • The most common architecture is an autoencoder, which learns representations based on its ability to reconstruct the input images.

Page 36:

Overview

• Previous work on unsupervised learning:
  • Has been able to automatically learn V1-like filters from unlabeled data, but remains far from supervised approaches on tasks such as object detection.

Page 37:

Overview

Page 38:

Overview

• Key insight:
  • Visual tracking is one of the first capabilities to develop in infants, often before semantic representations are learned.
  • Using video and tracking, we can produce patches of the same object; these should have similar representations in the deep feature space.

Page 39:

Overview

http://www.aoa.org/patients-and-public/good-vision-throughout-life/childrens-vision/infant-vision-birth-to-24-months-of-age?sso=y#1

Page 40:

Overview

• Proposal:
  • A Siamese triplet network with a ranking loss function to train a CNN representation.
  • The ranking loss enforces that, in the final deep feature space, the first-frame patch is much closer to the tracked patch than to any other randomly sampled patch.

Page 41:

Overview

• Proposal:

Page 42:

Patch Mining in Videos

• Source of videos: YouTube.
• Estimated number of new videos uploaded: 300K per minute (2016).
• Tracking pipeline:
  • Obtain SURF interest points (Speeded-Up Robust Features, 2006).
  • Improved Dense Trajectories (IDT) to obtain motion (2013).
  • Kernelized Correlation Filter tracker (KCF, 2014).

Page 43:

Patch Mining in Videos

• Patches accepted:
  o > 25% of SURF points moving, and
  o < 75% of SURF points moving.

Page 44:

Patch Mining in Videos

Page 45:

Siamese Triplet Network

• Three networks that share the same parameters.
• Input: images of size 227 × 227.
• Based on the AlexNet architecture.
• Two fully connected layers are stacked on the pool5 outputs, with 4096 and 1024 neurons respectively.
• Thus the final output of each network is a 1024-dimensional feature vector (a sketch follows).
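A PyTorch sketch of the shared-weight triplet arrangement; torchvision's AlexNet stands in for the paper's base network, and the head sizes follow the slide:

```python
import torch
import torch.nn as nn
from torchvision.models import alexnet

class TripletNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = alexnet().features     # conv layers up to pool5
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
            nn.Linear(4096, 1024),             # final 1024-d embedding
        )

    def embed(self, x):
        return self.head(self.features(x))

    def forward(self, anchor, pos, neg):
        # one set of weights, applied three times ("Siamese" sharing)
        return self.embed(anchor), self.embed(pos), self.embed(neg)

net = TripletNet()
x = torch.randn(2, 3, 227, 227)
fa, fp, fn_ = net(x, x, x)                     # each output is (2, 1024)
```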

Page 46:

Siamese Triplet Network

Page 47:

Ranking Loss Function

• Cosine distance in the feature space:

$D(X_1, X_2) = 1 - \dfrac{f(X_1) \cdot f(X_2)}{\|f(X_1)\| \, \|f(X_2)\|}$

• Goal: $D(X_i, X_i^-) > D(X_i, X_i^+)$

• $X_i$: first-frame patch
• $X_i^+$: last-frame (tracked) patch
• $X_i^-$: patch from a different video

Page 48:

Ranking Loss Function

• Per triplet of images:

$L(X_i, X_i^+, X_i^-) = \max\left\{0, \ D(X_i, X_i^+) - D(X_i, X_i^-) + M\right\}$

• Total objective:

$\min_{W} \ \dfrac{\lambda}{2} \|W\|_2^2 + \sum_{i=1}^{N} L(X_i, X_i^+, X_i^-)$

with margin $M = 0.5$ and weight decay $\lambda = 0.0005$ (a sketch follows).
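A PyTorch sketch of this ranking loss over embedding triplets; the batch contents here are random placeholders:

```python
import torch
import torch.nn.functional as F

def cosine_distance(a, b):
    return 1 - F.cosine_similarity(a, b, dim=1)

def ranking_loss(f_anchor, f_pos, f_neg, margin=0.5):
    # hinge on D(anchor, pos) - D(anchor, neg) + M
    d_pos = cosine_distance(f_anchor, f_pos)
    d_neg = cosine_distance(f_anchor, f_neg)
    return torch.clamp(d_pos - d_neg + margin, min=0).sum()

f = torch.randn(100, 1024)   # anchor embeddings
loss = ranking_loss(f, f + 0.1 * torch.randn_like(f), torch.randn(100, 1024))
# The lambda/2 * ||W||^2 term is typically handled by the optimizer, e.g.
# torch.optim.SGD(net.parameters(), lr=0.001, weight_decay=0.0005)
```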

Page 49:

Patch Mining for Triplet Sampling

• Given $X_i$ and $X_i^+$, how do we select $X_i^-$?

• Random selection:
  • For each image pair in a batch B, randomly sample K negative matches from the same batch.
  • Shuffle all the images randomly after each epoch of training.

Page 50:

Patch Mining for Triplet Sampling

• Given $X_i$ and $X_i^+$, how do we select $X_i^-$?

• Hard negative mining:
  • Applied after 10 epochs of training.
  • Choose the K samples in the batch with the highest loss (a sketch follows).
  • K = 4, B = 100.
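A sketch of in-batch hard negative mining under the same cosine-distance loss; the embeddings are placeholders:

```python
import torch
import torch.nn.functional as F

def hard_negatives(f_anchor, f_pos, f_batch, k=4, margin=0.5):
    """For each (anchor, positive) pair, pick the k in-batch negatives with highest loss."""
    d_pos = 1 - F.cosine_similarity(f_anchor, f_pos, dim=1)            # (B,)
    # pairwise distances between every anchor and every batch candidate: (B, B)
    d_neg = 1 - F.cosine_similarity(f_anchor.unsqueeze(1), f_batch.unsqueeze(0), dim=2)
    loss = torch.clamp(d_pos.unsqueeze(1) - d_neg + margin, min=0)     # (B, B)
    loss.fill_diagonal_(0)                     # a sample is not its own negative
    return loss.topk(k, dim=1).indices         # (B, k) indices of hardest negatives

B = 100
f_anchor, f_pos = torch.randn(B, 1024), torch.randn(B, 1024)
idx = hard_negatives(f_anchor, f_pos, f_anchor)   # negatives drawn from the same batch
```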

Page 51:

Adapting for Supervised Tasks

• Method #1:
  o Based on the R-CNN paper.
  o Use the pre-trained unsupervised AlexNet-based network.
  o The parameters of the layers up to pool5 are used as initialization.
  o Two fully connected layers are initialized randomly.
  o The learning rate is 0.01 instead of 0.001 (R-CNN).

Page 52:

Adapting for Supervised Tasks

• Method #2: an iterative approach.
  1) Fine-tune using the PASCAL VOC data.
  2) Re-adapt to the ranking triplet task.
  3) Transfer the convolutional parameters again for re-adapting.
  o The network converges after two iterations.

Page 53:

Implementation Details

• 100K videos into 8M patches

• 3 different networks using 1.5M, 5M and 8M patches

• Batch size: 100

• Initial learning rate: 0.001

• Random negative sampling for the first 150K iterations; hard negative mining afterwards

Page 54:

Implementation Details

• 1.5M patches:
  • Reduce the learning rate by a factor of 10 every 80K iterations.
  • Total: 240K iterations.

• 5M and 8M patches:
  • Reduce the learning rate by a factor of 10 every 120K iterations.
  • Total: 350K iterations.

Page 55:

Results: Learned features

Page 56:

Results: Network response

Page 57:

Results, no fine-tuning: Qualitative comparison

Page 58:

Results, no fine-tuning: Quantitative comparison

• Measurement: retrieval rate, counting the number of correct retrievals among the top-K retrievals (K = 20).

• Pool5 features with cosine distance (a sketch of the measurement follows).

Method          Score
Article's       40%
ELDA on HOG     24%
Random AlexNet  19%
ImageNet CNN    62%
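A NumPy sketch of this top-K retrieval measurement; the features and labels are simulated (in the paper they would be pool5 features of the evaluation set):

```python
import numpy as np

def topk_retrieval_rate(feats, labels, k=20):
    """Mean fraction of correct items among each query's top-k cosine neighbors."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = f @ f.T                               # pairwise cosine similarities
    np.fill_diagonal(sim, -np.inf)              # exclude the query itself
    topk = np.argsort(-sim, axis=1)[:, :k]      # indices of the k nearest neighbors
    correct = labels[topk] == labels[:, None]
    return correct.mean()

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, 500)
feats = rng.normal(0, 1, (500, 256)) + labels[:, None]   # class-correlated features
print(topk_retrieval_rate(feats, labels))
```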

Page 59:

Results, with fine-tuning: Object detection

• Follows the pipeline in R-CNN.
• PASCAL VOC 2012 dataset.
• Trainval set and test set: ~10K images each.
• SVM classifier.
• Learning rate: 0.01, reduced by a factor of 10 every 80K iterations.
• Total iterations for fine-tuning: 200K.
• 21 classes.

Page 60:

Results, with fine-tuning: Object detection

Page 61:

Results, with fine-tuning: Object detection

• Without using a single image from ImageNet, just 100K unlabeled videos and the VOC 2012 dataset, an ensemble of AlexNet networks achieves 52% mAP.

• The ImageNet-supervised counterpart: an ensemble that achieves 54.4% mAP.

Page 62:

Results, with fine-tuning: Surface Normal Estimation

Page 63:

Results, with fine-tuning: Surface Normal Estimation

• A 227 × 227 image as input.
• The output of the network is 20 × 20 pixels.
• Each output pixel is represented by a distribution over 20 code-words, learned using K-means.
• The output dimension is therefore 20 × 20 × 20 = 8000.
• Two fully connected layers, with 4096 and 8000 neurons, on top of pool5.

Page 64:

Results, with fine-tuning: Surface Normal Estimation

Page 65:

Discussion and Conclusion

• Much more data is available.

• Might be as close as 2.5% mAP to supervised networks.

• A greater boost when using an ensemble of networks.

• Can be generalized to different tasks.

• Does it mimic the human brain?

Page 66:

Questions
