Unsupervised Learning Jointly With Image Clustering
Jianwei Yang, Devi Parikh, Dhruv Batra
Virginia Tech
https://filebox.ece.vt.edu/~jw2yang/
Huge amounts of images!
Learning without annotation effort
What do we need to learn?
An open problem
A hot problem
Various methodologies
Learning distribution (structure): Clustering
Jain, A. K., Murty, M. N., and Flynn, P. J. "Data clustering: a review." ACM Computing Surveys 31.3 (1999): 264-323.
K-means (Image Credit: Jesse Johnson)
Hierarchical Clustering
Spectral Clustering, Manor et al., NIPS'04
Graph Cut, Shi et al., TPAMI'00
DBSCAN, Ester et al., KDD'96 (Image Credit: Jesse Johnson)
EM Algorithm, Dempster et al., JRSS'77
NMF, Xu et al., SIGIR'03 (Image Credit: Conrad Lee)
Learning distribution (structure): Sub-space Analysis
PCA (Image Credit: Jesse Johnson)
ICA (Image Credit: Shylaja et al.)
t-SNE, Maaten et al., JMLR'08
Subspace Clustering, Vidal et al.
Sparse Coding, Olshausen et al., Vision Research'97
Learning representation (feature)
Bengio, Y., Courville, A., and Vincent, P. "Representation learning: A review and new perspectives." IEEE TPAMI 35.8 (2013): 1798-1828.
Autoencoder, Hinton et al., Science'06 (Image Credit: Jesse Johnson)
DBN, Hinton et al., Science'06
DBM, Salakhutdinov et al., AISTATS'09
Learning representation (feature)
VAE, Kingma et al, arXiv’13(Image Credit: Fast Forward Labs)
GAN, Goodfellow et al, NIPS’14DCGAN, Radford et al, arXiv’15
(Image Credit: Mike Swarbrick Jones)
19
Most Recent CV Works
Spatial context, Doersch et al., ICCV'15
Temporal context, Wang et al., ICCV'15
Solving Jigsaw Puzzles, Noroozi et al., ECCV'16
Context Encoder, Pathak et al., CVPR'16
Ego-motion, Jayaraman et al., ICCV'15
Most Recent CV Works
Visual concept clustering, Huang et al., CVPR'16
Graph constraint, Li et al., ECCV'16
TAGnet, Wang et al., SDM'16
Deep Embedding, Xie et al., ICML'16
Our Work
Joint Unsupervised Learning (JULE) of Deep Representations and Image Clusters
Outline
• Intuition
• Approach
• Experiments
• Extensions
Intuition
Meaningful clusters can provide supervisory signals to learn image representations.
Good representations help to get meaningful clusters.
Cluster images first, and then learn representations?
Learn representations first, and then cluster images?
Our approach: cluster images and learn representations progressively.
Intuition
Good representations lead to good clusters, and good clusters in turn lead to better representations.
Poor representations lead to poor clusters, and poor clusters to poor representations.
Approach
• Framework
• Objective
• Algorithm & Implementation
Approach: Framework
Convolutional Neural Network (representation learning): $\arg\min_{\theta} \mathcal{L}(\theta \mid I, y)$
Agglomerative Clustering (image clustering): $\arg\min_{y} \mathcal{L}(y \mid I, \theta)$
Joint objective: $\arg\min_{y, \theta} \mathcal{L}(y, \theta \mid I)$
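To make the alternating objective concrete, here is a minimal runnable sketch in Python. The linear feature map, the centroid-based merge, and the gradient step are illustrative stand-ins for the CNN and the agglomerative clustering, not the authors' implementation; `cluster_step` and `representation_step` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 16))            # 40 toy "images", 16 raw features each
theta = 0.1 * rng.normal(size=(16, 8))   # stand-in for the CNN parameters
y = np.arange(len(X))                    # start with one cluster per image

def cluster_step(F, y):
    """One step of argmin_y L(y | I, theta): merge the two closest clusters."""
    ids = np.unique(y)
    cents = np.stack([F[y == i].mean(axis=0) for i in ids])
    d = np.linalg.norm(cents[:, None] - cents[None, :], axis=-1)
    d[np.diag_indices_from(d)] = np.inf
    a, b = np.unravel_index(np.argmin(d), d.shape)
    y[y == ids[b]] = ids[a]              # relabel cluster b as cluster a
    return y

def representation_step(X, y, theta, lr=1e-3):
    """One step of argmin_theta L(theta | I, y): pull features to centroids."""
    F = X @ theta
    cents = {i: F[y == i].mean(axis=0) for i in np.unique(y)}
    target = np.stack([cents[i] for i in y])
    grad = X.T @ (F - target)            # gradient of 0.5 * ||F - target||^2
    return theta - lr * grad

while len(np.unique(y)) > 5:             # alternate until 5 clusters remain
    y = cluster_step(X @ theta, y)
    theta = representation_step(X, y, theta)
print(len(np.unique(y)), "clusters")
```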
Approach: Recurrent Framework
Backpropagating at each time-step is time-consuming and prone to over-fitting!
How about updating once for multiple time-steps?
Approach: Recurrent Framework
Partial unrolling: divide all T time-steps into P periods.
In each period, we merge clusters multiple times and update the CNN parameters once, at the end of the period.
P is determined by a hyper-parameter that will be introduced later.
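A sketch of the period structure, assuming the T merges are split evenly across periods (as the slide notes, P actually follows from a hyper-parameter); `merge_once` and `train_cnn` are hypothetical placeholders for the real forward and backward computations.

```python
import numpy as np

def merge_once(y):
    """Placeholder for one agglomerative merge: fold the highest cluster id
    into the lowest one (a real system would pick the best pair)."""
    ids = np.unique(y)
    y[y == ids[-1]] = ids[0]
    return y

def train_cnn(theta):
    """Placeholder for the representation-learning update at a period's end."""
    return theta

n_start, n_target, P = 100, 10, 5
T = n_start - n_target                # total number of merge time-steps
per_period = -(-T // P)               # ceil(T / P): merges per period

y = np.arange(n_start)
theta = None
done = 0
for p in range(P):                    # P periods
    for _ in range(min(per_period, T - done)):
        y = merge_once(y)             # forward: several merges per period
        done += 1
    theta = train_cnn(theta)          # backward: one CNN update per period
print(len(np.unique(y)), "clusters after", P, "periods")
```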
Approach: Objective Function
Overall loss: $\arg\min_{y, \theta} \mathcal{L}(y, \theta \mid I)$
Alternated as $\arg\min_{y} \mathcal{L}(y \mid I, \theta)$ (clustering) and $\arg\min_{\theta} \mathcal{L}(\theta \mid I, y)$ (representation learning)
Approach: Objective Function
Loss at time-step t:
Conventional agglomerative clustering strategy: merge the two clusters with the highest affinity.
Proposed agglomerative clustering strategy, built from:
• an affinity measure between clusters;
• the i-th cluster;
• the K_c nearest-neighbor clusters of the i-th cluster;
• the affinity between the i-th cluster and its nearest neighbor;
• the differences between the two cluster affinities;
• merging these two clusters.
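The exact loss is given in the paper; the sketch below only reconstructs, from the annotations above, a plausible merge score that rewards both a high affinity to the nearest-neighbor cluster and a large gap between that affinity and the affinities to the other K_c neighbors. `merge_score`, `lam`, and the toy affinity matrix are assumptions.

```python
import numpy as np

def merge_score(A, i, K_c=3, lam=1.0):
    """A: symmetric cluster-affinity matrix (higher = more similar).
    Returns a score for merging cluster i with its nearest neighbor,
    plus that neighbor's index."""
    aff = A[i].copy()
    aff[i] = -np.inf                          # exclude self-affinity
    nbrs = np.argsort(aff)[::-1][:K_c]        # K_c nearest-neighbor clusters
    nn = nbrs[0]                              # the single nearest neighbor
    # Gap between the NN affinity and the affinities to the other neighbors.
    margin = np.mean(aff[nn] - aff[nbrs[1:]]) if K_c > 1 else 0.0
    return aff[nn] + lam * margin, nn

# Toy usage: pick the cluster whose merge scores highest.
rng = np.random.default_rng(1)
M = rng.random((6, 6)); A = (M + M.T) / 2     # symmetric toy affinities
scores = [merge_score(A, i) for i in range(len(A))]
best_i = int(np.argmax([s for s, _ in scores]))
print("merge clusters", best_i, "and", scores[best_i][1])
```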
Approach: Objective Function
Loss in the forward pass in period p (merge clusters): CNN parameters are fixed.
Loss in the backward pass in period p (update the CNN): cluster labels are fixed.
Approach: Objective Function
Forward pass: a simple greedy algorithm.
At each time-step, merge the two clusters that minimize the loss.
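A runnable sketch of the greedy forward pass. For brevity it scores a merge by plain maximum affinity and averages affinities after a merge; in the actual method, the proposed criterion from the previous slides would supply the score.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.random((8, 8)); A = (M + M.T) / 2        # toy cluster affinities
members = [[i] for i in range(8)]                # images in each cluster

while len(members) > 3:                          # merge down to 3 clusters
    np.fill_diagonal(A, -np.inf)                 # never merge a cluster with itself
    a, b = np.unravel_index(np.argmax(A), A.shape)   # greedy choice of pair
    a, b = min(a, b), max(a, b)
    members[a] += members.pop(b)                 # merge cluster b into a
    # Recompute affinities to the merged cluster (here: a simple average).
    A[a, :] = (A[a, :] + A[b, :]) / 2
    A[:, a] = A[a, :]
    A = np.delete(np.delete(A, b, axis=0), b, axis=1)

print([sorted(m) for m in members])
```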
Approach: Objective Function
Backward pass: consider all previous periods.
The cluster-based loss is not suitable for batch optimization!
Approximation: convert it into a sample-based loss, decomposing cluster affinities into intra-cluster and inter-cluster sample affinities.
The result is a weighted triplet loss.
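A minimal sketch of a weighted triplet loss of this kind: pull a same-cluster sample (positive) closer to the anchor than a different-cluster sample (negative) by a margin, with a per-triplet weight. The margin value, the uniform weights, and the triplet sampling here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def weighted_triplet_loss(anchor, positive, negative, weights, margin=0.2):
    d_pos = np.sum((anchor - positive) ** 2, axis=1)   # intra-cluster distance
    d_neg = np.sum((anchor - negative) ** 2, axis=1)   # inter-cluster distance
    per_triplet = np.maximum(0.0, d_pos - d_neg + margin)
    return float(np.mean(weights * per_triplet))       # weighted average

rng = np.random.default_rng(3)
a = rng.normal(size=(5, 8))               # anchors
p = a + 0.1 * rng.normal(size=(5, 8))     # positives near their anchors
n = rng.normal(size=(5, 8))               # negatives drawn independently
w = np.ones(5)                            # uniform weights for this toy example
print(weighted_triplet_loss(a, p, n, w))
```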
Approach: Algorithm & Implementation
Start from raw image data; assume the target number of clusters is known.
Randomly initialize the CNN parameters, with 4 samples in each cluster on average.
Train the CNN for about 20 epochs.
We can go back and retrain the model, but it improves results only slightly.
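One simple way to realize the "4 samples per cluster on average" initialization, sketched under the assumption of greedy nearest-neighbor grouping (not necessarily the authors' procedure):

```python
import numpy as np

def init_clusters(X, samples_per_cluster=4):
    """Greedily group each unassigned sample with its nearest unassigned
    neighbors, so clusters hold ~samples_per_cluster samples on average."""
    n = len(X)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # a sample is not its own neighbor
    labels = -np.ones(n, dtype=int)
    cid = 0
    for i in range(n):
        if labels[i] >= 0:
            continue                          # already assigned
        free = np.where(labels < 0)[0]
        free = free[free != i]
        nearest = free[np.argsort(d[i, free])][: samples_per_cluster - 1]
        labels[np.concatenate(([i], nearest))] = cid
        cid += 1
    return labels

X = np.random.default_rng(4).normal(size=(40, 8))
y0 = init_clusters(X)
print(len(np.unique(y0)), "initial clusters for", len(X), "samples")
```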
Experiments
• Datasets
• Network Architecture
• Image Clustering
• Representation Learning
Experiments: Datasets
MNIST (70000, 10, 28x28) USPS (11000, 10, 16x16) COIL20 (1440, 20, 128x128) COIL100 (7200, 100, 128x128)
UMist (575, 20, 112x92) FRGC (2462, 20, 32x32) CMU-PIE (2856, 68, 32x32) Youtube Face (1000, 41, 55x55)77
Experiments: Settings
Two important parameters.
Set the number of layers so that the output feature map is about 10×10.
Experiments: Clustering: Performance
+6.43% NMI over the best performance of existing approaches, averaged over all datasets.
+12.76% AC over the best performance of existing approaches, averaged over all datasets.
Average +21.5% NMI: our clustering performance vs. that of existing clustering approaches using raw image data.
Average +25.7% NMI: clustering performance when our representation is fed to existing clustering algorithms.
Experiments: Clustering: Visualization
COIL-20 and COIL-100
USPS and MNIST-test
Experiments: Clustering: Ablation Study
Experiments: Clustering: Verification
Experiments: Clustering: Time Cost
Experiments: Representation Learning
Representation transfer: testing the generalization of our learnt (unsupervised) representation on LFW face verification.
Representation learning: evaluation on CIFAR-10 classification.
Extensions: Data Visualization
Conclusion
• A new method for unsupervised learning jointly with image clustering, cast as a recurrent optimization problem;
• In the recurrent framework, clustering is conducted in the forward pass and representation learning in the backward pass;
• A unified loss function governs both the forward and the backward pass;
• Outperforms the state of the art on a number of datasets;
• Also learns plausible representations for image recognition.
Thanks!
https://github.com/jwyang/joint-unsupervised-learning