Small Codes and Large Image Databases for Recognition CVPR 2008 Antonio Torralba, MIT Rob Fergus, NYU Yair Weiss, Hebrew University

Post on 19-Dec-2015

Small Codes and Large Image Databases for Recognition
CVPR 2008

Antonio Torralba, MIT
Rob Fergus, NYU
Yair Weiss, Hebrew University

Outline

• Introduction
• Methods
• Experiment
• Conclusion

Summary

• Goal
  – Efficient image search: real-time queries on web-sized databases
  – Fast and light on memory, so that it runs on standard hardware or handheld devices
• Approach
  – Use machine learning to convert the Gist descriptor to a compact binary code with a few hundred bits per image

Gist descriptor

• Global image representation
• Describe the shapes occurring in an image with one descriptor
  – Subdivide the image into 4×4 sub-images
  – Calculate Gabor responses in each of these
  – Create histograms of Gabor responses in each sub-image

Slide by James Hays and Alexei Efros

Gist descriptor

Slide by James Hays and Alexei Efros

Gist descriptor

• In this paper
  – 8 orientations and 4 frequencies over the 4×4 grid: 4 × 8 × 16 = 512-dimensional vector
  – For smaller images (32×32 pixels), use 3 frequencies: 3 × 8 × 16 = 384 dimensions
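The dimension counts above follow from pooling one response map per (orientation, frequency) filter over the 4×4 grid. The sketch below illustrates only that bookkeeping. The function names `pool_grid` and `gist_like` are hypothetical, constant maps stand in for real Gabor responses, and each cell is summarized by its mean, which is a simplifying assumption rather than the descriptor's actual pooling.

```python
def pool_grid(response, grid=4):
    """Average one filter-response map (a list of rows) over a grid x grid layout."""
    height, width = len(response), len(response[0])
    cell_h, cell_w = height // grid, width // grid
    pooled = []
    for gy in range(grid):
        for gx in range(grid):
            cell = [response[y][x]
                    for y in range(gy * cell_h, (gy + 1) * cell_h)
                    for x in range(gx * cell_w, (gx + 1) * cell_w)]
            pooled.append(sum(cell) / len(cell))
    return pooled


def gist_like(responses, grid=4):
    """Concatenate the pooled values of every filter-response map."""
    descriptor = []
    for response in responses:
        descriptor.extend(pool_grid(response, grid))
    return descriptor


# Toy stand-ins: map k is a constant 32x32 image with value k (assumption).
maps_full = [[[float(k)] * 32 for _ in range(32)] for k in range(8 * 4)]
maps_small = [[[float(k)] * 32 for _ in range(32)] for k in range(8 * 3)]

full = gist_like(maps_full)    # 8 orientations x 4 frequencies x 16 cells
small = gist_like(maps_small)  # 8 orientations x 3 frequencies x 16 cells
```

Each filter contributes 16 values, so the descriptor length is simply the number of filters times 16.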

Binary Code

• Three reasons
  – Compression: it is possible to represent images with a very small number of bits and still retain the information needed for recognition

Binary Code

  – Memory: scaling up to web-size databases requires doing the calculations in memory. Fitting hundreds of millions of images into a few GB of memory leaves a budget of very few bytes per image.

  – Speed: short binary codes allow very fast querying on standard hardware, using either hash tables or efficient bit-count operations.
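The bit-count querying mentioned above can be sketched as follows: the Hamming distance between two codes is the number of set bits in their XOR. The function names and the toy 6-bit codes are illustrative assumptions, and the linear scan is only a stand-in for whatever search structure a real system would use.

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two codes stored as Python integers."""
    return bin(a ^ b).count("1")


def nearest(query: int, codes: list, k: int) -> list:
    """Indices of the k database codes closest to the query (linear scan)."""
    order = sorted(range(len(codes)), key=lambda i: hamming(query, codes[i]))
    return order[:k]


# Toy database of 6-bit codes (hypothetical values).
codes = [0b101100, 0b101101, 0b010011, 0b111111]
query = 0b101100
nearest_ids = nearest(query, codes, k=2)
```

Storing each code as a single machine word is what makes the XOR-and-count step cheap on standard hardware.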

Locality Sensitive Hashing (LSH)

• Operates in high-dimensional Euclidean space
  – Finds nearest neighbors in constant time
• Uses a number of random projections of each point into R^1
  – Each projection contributes a few bits
• When the number of bits is fixed and small
  – LSH can perform quite poorly
• In this paper
  – N = 30 bits
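A minimal sketch of the random-projection idea above, under stated assumptions: each of the 30 projections contributes exactly one bit, obtained by thresholding the projection at zero, and the directions are Gaussian. These choices and the names `make_projections` and `lsh_code` are illustrative, not the configuration used in the paper.

```python
import random


def make_projections(dim, n_bits=30, seed=0):
    """Draw n_bits random Gaussian directions in a dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_code(x, projections):
    """Pack the sign of each random projection of x into one integer code."""
    code = 0
    for direction in projections:
        dot = sum(a * b for a, b in zip(x, direction))
        code = (code << 1) | (1 if dot > 0 else 0)
    return code


projections = make_projections(dim=512, n_bits=30)
point_rng = random.Random(1)
x = [point_rng.uniform(-1.0, 1.0) for _ in range(512)]  # stand-in descriptor
code = lsh_code(x, projections)
```

Because the directions are random rather than learned, nothing ties a particular bit to the neighborhood structure of the data, which is one way to read the remark that LSH can perform poorly with few bits.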

Outline

• Introduction
• Methods
• Experiment
• Conclusion

Learning binary codes

• A database of images {x_i}
• A distance function D(i, j)
• A binary feature vector y_i = f(x_i)
• Hamming distance
• N100(x_i): the 100 nearest neighbors of x_i according to the distance function D(i, j)
• N100(y_i): the 100 descriptors y_j that are closest to y_i in terms of Hamming distance
• We would like N100(x_i) = N100(y_i) for all examples in our training set
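One way to check how close a code comes to the goal N(x_i) = N(y_i) is to measure, for each item, the fraction of its input-space neighbors that are also Hamming neighbors. The sketch below uses k = 2 on six toy 1-D points instead of k = 100. The function names, data, and codes are all hypothetical.

```python
def k_nearest(i, n_items, dist, k):
    """Set of the k items closest to item i under dist (ties broken by index)."""
    others = [j for j in range(n_items) if j != i]
    return set(sorted(others, key=lambda j: dist(i, j))[:k])


def neighborhood_overlap(n_items, input_dist, code_dist, k):
    """Average fraction of input-space neighbors recovered by the codes."""
    total = 0.0
    for i in range(n_items):
        wanted = k_nearest(i, n_items, input_dist, k)
        found = k_nearest(i, n_items, code_dist, k)
        total += len(wanted & found) / k
    return total / n_items


xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]  # two well-separated clusters
good_codes = [0b000000, 0b000001, 0b000011, 0b111111, 0b111110, 0b111100]
bad_codes = [0] * 6                  # every item gets the same code


def input_dist(i, j):
    return abs(xs[i] - xs[j])


def hamming_with(codes):
    return lambda i, j: bin(codes[i] ^ codes[j]).count("1")


good = neighborhood_overlap(6, input_dist, hamming_with(good_codes), k=2)
bad = neighborhood_overlap(6, input_dist, hamming_with(bad_codes), k=2)
```

An overlap of 1.0 means the two neighborhoods coincide for every item. The constant code scores lower because its ties are resolved arbitrarily.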

BoostSSC

• Boosting similarity sensitive coding
• Learn a mapping from the original input space into a new space
  – Distances between images can be computed using a weighted Hamming distance
• Binary feature (M bits)
• Weighted Hamming distance
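The formulas on this slide did not survive in the transcript, so the following is only a sketch of one common form of a weighted Hamming distance, assumed here for illustration: each bit position n carries a weight, and the distance is the sum of the weights of the positions where two codes disagree. The name `weighted_hamming` and the example weights are hypothetical.

```python
def weighted_hamming(code_a, code_b, weights):
    """Sum the weights of the bit positions where the two codes differ."""
    return sum(w for a, b, w in zip(code_a, code_b, weights) if a != b)


code_a = [1, 0, 1, 1]
code_b = [1, 1, 0, 1]
weights = [0.5, 2.0, 1.0, 4.0]  # hypothetical per-bit weights

distance = weighted_hamming(code_a, code_b, weights)
plain = weighted_hamming(code_a, code_b, [1.0] * 4)  # unit weights
```

With unit weights the function reduces to the ordinary Hamming distance.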

BoostSSC

• Positive examples
  – Pairs of images x_i, x_j with j ∈ N(x_i)
• Negative examples
  – Pairs of images that are not neighbors
• Regression stump
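The positive and negative pairs above could be assembled along these lines. This is a sketch under assumptions: neighbor sets are given as a dictionary, and negatives are drawn uniformly at random from the non-neighbors, one per image. The function name and the sampling scheme are hypothetical.

```python
import random


def make_training_pairs(neighbors, n_items, negatives_per_item=1, seed=0):
    """Return (i, j, label) triples: +1 for neighbor pairs, -1 otherwise."""
    rng = random.Random(seed)
    pairs = []
    for i in range(n_items):
        for j in sorted(neighbors[i]):
            pairs.append((i, j, +1))
        non_neighbors = [j for j in range(n_items)
                         if j != i and j not in neighbors[i]]
        n_draw = min(negatives_per_item, len(non_neighbors))
        for j in rng.sample(non_neighbors, n_draw):
            pairs.append((i, j, -1))
    return pairs


# Toy neighbor sets for four images (hypothetical).
neighbors = {0: {1}, 1: {0}, 2: {3}, 3: {2}}
pairs = make_training_pairs(neighbors, n_items=4)
```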

BoostSSC

• Minimize the square loss
  – K is the number of training pairs
  – Z_k = 1 if the two images are neighbors, Z_k = −1 otherwise
• In this paper
  – M around 30 bits
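The loss formula itself is not preserved in the transcript. The sketch below shows a square loss over labeled pairs, with labels Z_k of +1 or −1 as defined above. The stump form is an assumption made for illustration: it outputs `alpha + beta` when both images fall on the same side of a threshold along one input dimension, and `beta` otherwise. All names and numbers are hypothetical.

```python
def stump(x_i, x_j, dim, threshold, alpha, beta):
    """Assumed pairwise stump: alpha if both inputs agree on the test, plus beta."""
    same_side = (x_i[dim] > threshold) == (x_j[dim] > threshold)
    return alpha * (1.0 if same_side else 0.0) + beta


def square_loss(pairs, labels, stumps):
    """Sum over pairs of (label - summed stump outputs) squared."""
    loss = 0.0
    for (x_i, x_j), z in zip(pairs, labels):
        prediction = sum(stump(x_i, x_j, *params) for params in stumps)
        loss += (z - prediction) ** 2
    return loss


pairs = [([0.9, 0.1], [0.8, 0.3]),  # a neighbor pair
         ([0.9, 0.1], [0.2, 0.7])]  # a non-neighbor pair
labels = [+1, -1]

# Each stump is (dim, threshold, alpha, beta).
loss_good = square_loss(pairs, labels, [(0, 0.5, 2.0, -1.0)])
loss_poor = square_loss(pairs, labels, [(0, 0.95, 2.0, -1.0)])
```

A boosting procedure would add stumps one at a time, each chosen to reduce this loss; that selection loop is omitted here.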

Restricted Boltzmann Machines

• Network of binary stochastic units
  – Weights W, bias b

Hidden units: h

Symmetric weights: w

Visible units: v

Restricted Boltzmann Machines

• A probability can be assigned to a binary vector at the visible units
• Convenient conditional distributions

Learn weights and biases using Contrastive Divergence
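For a standard binary RBM the conditionals factorize: p(h_j = 1 | v) = sigmoid(b_j + Σ_i v_i W_ij), and symmetrically for the visible units. The sketch below implements those conditionals and a single-step Contrastive Divergence (CD-1) update in plain Python. The layer sizes, learning rate, and toy data are assumptions chosen for illustration and do not reflect the paper's training setup.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def hidden_probs(v, W, hidden_bias):
    """p(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(hidden_bias[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(hidden_bias))]


def visible_probs(h, W, visible_bias):
    """p(v_i = 1 | h) for every visible unit i."""
    return [sigmoid(visible_bias[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(visible_bias))]


def sample(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]


def cd1_step(v0, W, visible_bias, hidden_bias, rng, lr=0.1):
    """One CD-1 update, modifying W and both bias vectors in place."""
    ph0 = hidden_probs(v0, W, hidden_bias)
    h0 = sample(ph0, rng)
    v1 = sample(visible_probs(h0, W, visible_bias), rng)
    ph1 = hidden_probs(v1, W, hidden_bias)
    for i in range(len(v0)):
        visible_bias[i] += lr * (v0[i] - v1[i])
        for j in range(len(ph0)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
    for j in range(len(ph0)):
        hidden_bias[j] += lr * (ph0[j] - ph1[j])


rng = random.Random(0)
n_visible, n_hidden = 6, 3  # toy sizes (assumption)
W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
visible_bias = [0.0] * n_visible
hidden_bias = [0.0] * n_hidden
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]
for _ in range(200):
    for v in data:
        cd1_step(v, W, visible_bias, hidden_bias, rng)
probs = hidden_probs(data[0], W, hidden_bias)
```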

Multi-Layer RBM architecture

Training RBM models

• Pre-training
  – Unsupervised
  – Use Contrastive Divergence to learn weights and biases
  – Gets parameters to the right ballpark
• Fine-tuning
  – Supervised
  – No longer stochastic
  – Backpropagate error to update parameters
  – Moves parameters to a local minimum
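Once the units are no longer stochastic, the stacked layers behave like an ordinary feed-forward network. The sketch below shows such a deterministic pass followed by binarization of the top layer. Thresholding at 0.5, the tiny layer sizes, and the hand-picked weights are all assumptions for illustration, and the backpropagation step itself is not shown.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def forward(x, layers):
    """Deterministic pass: each layer is a (weights, bias) pair."""
    activation = x
    for W, bias in layers:
        activation = [
            sigmoid(bias[j] + sum(activation[i] * W[i][j]
                                  for i in range(len(activation))))
            for j in range(len(bias))
        ]
    return activation


def binarize(activation, threshold=0.5):
    """Turn top-layer activations into a binary code (assumed threshold)."""
    return [1 if a > threshold else 0 for a in activation]


# Hand-picked toy weights: a 3 -> 2 layer followed by a 2 -> 2 layer.
layers = [
    ([[4.0, -4.0], [-4.0, 4.0], [4.0, 4.0]], [0.0, 0.0]),
    ([[6.0, -6.0], [-6.0, 6.0]], [0.0, 0.0]),
]
code_a = binarize(forward([1.0, 0.0, 0.0], layers))
code_b = binarize(forward([0.0, 1.0, 0.0], layers))
```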

Outline

• Introduction
• Methods
• Experiment
• Conclusion

Two test datasets

• LabelMe
  – 22,000 images
  – Ground truth segmentations for all
  – Can define distance between images using these segmentations
• Web data [28]
  – 12.9 million 32 × 32 color images
  – Subset of 80 million images
  – No labels, so use L2 distance between GIST vectors as ground truth

[28] A. Torralba, R. Fergus, and W. T. Freeman. Tiny images. Technical Report MIT-CSAIL-TR-2007-024, Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 2007.
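For the unlabeled web data, ground-truth neighbors come from L2 distance between descriptors. A brute-force version can be sketched as below. The function names and the 2-D toy vectors are hypothetical stand-ins for real GIST descriptors.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def ground_truth_neighbors(query, database, k):
    """Indices of the k database vectors closest to the query in L2."""
    order = sorted(range(len(database)), key=lambda i: l2(query, database[i]))
    return order[:k]


database = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0], [10.0, 10.0]]
neighbor_ids = ground_truth_neighbors([0.0, 0.0], database, k=2)
```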

LabelMe retrieval

• Ground-truth semantic similarity
  – Spatial pyramid matching over object labels
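Spatial pyramid matching compares histograms at several grid resolutions and weights finer levels more heavily. The sketch below applies the usual pyramid weights (1/2^L at level 0, 1/2^(L−l+1) at level l ≥ 1) to per-pixel object-label maps. Representing each image as a small grid of label strings, requiring equal sizes, and normalizing by pixel count are all assumptions made for illustration.

```python
from collections import Counter


def cell_histograms(label_map, level):
    """Label counts for each cell of a 2^level x 2^level grid."""
    cells = 2 ** level
    height, width = len(label_map), len(label_map[0])
    hists = {}
    for y in range(height):
        for x in range(width):
            key = (y * cells // height, x * cells // width)
            hists.setdefault(key, Counter())[label_map[y][x]] += 1
    return hists


def intersection(hists_a, hists_b):
    """Histogram intersection summed over matching cells."""
    total = 0
    for key, hist_a in hists_a.items():
        hist_b = hists_b.get(key, Counter())
        total += sum(min(count, hist_b[label]) for label, count in hist_a.items())
    return total


def pyramid_similarity(map_a, map_b, levels=2):
    """Weighted sum of per-level intersections, scaled to the range [0, 1]."""
    n_pixels = len(map_a) * len(map_a[0])
    score = 0.0
    for level in range(levels + 1):
        weight = 1 / 2 ** levels if level == 0 else 1 / 2 ** (levels - level + 1)
        score += weight * intersection(cell_histograms(map_a, level),
                                       cell_histograms(map_b, level))
    return score / n_pixels


sky_over_road = [["sky"] * 4] * 2 + [["road"] * 4] * 2
road_over_sky = [["road"] * 4] * 2 + [["sky"] * 4] * 2
all_tree = [["tree"] * 4] * 4

same = pyramid_similarity(sky_over_road, sky_over_road)
flipped = pyramid_similarity(sky_over_road, road_over_sky)
disjoint = pyramid_similarity(sky_over_road, all_tree)
```

The flipped pair still scores above zero because the two images share the same labels overall, but it loses every contribution from the finer levels, where the layout differs.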

LabelMe retrieval

• On 2000 test images, N = 50

Web images retrieval

Retrieval speed evaluation

• Using multi-threading (M/T) on a quad-core

Pixel label

• On 2000 test images

Web images recognition

• On 2000 test images

Outline

• Introduction
• Methods
• Experiment
• Conclusion

Conclusion

• Possible to build compact codes for retrieval
  – Fast and small on a standard PC
  – Suitable for use on large databases
  – Much room for improvement