Minimal Loss Hashing for Compact Binary Codes
![Page 1: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/1.jpg)
Minimal Loss Hashing for Compact Binary Codes
Mohammad Norouzi
David Fleet
University of Toronto
![Page 2: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/2.jpg)
Near Neighbor Search
![Page 3: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/3.jpg)
Near Neighbor Search
![Page 4: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/4.jpg)
Near Neighbor Search
![Page 5: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/5.jpg)
Similarity-Preserving Binary Hashing
Why binary codes?
Sub-linear search using hash indexing
(even exhaustive linear search is fast)
Binary codes are storage-efficient
![Page 6: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/6.jpg)
Similarity-Preserving Binary Hashing

Hash function: b(x; W) = thr(Wx), where x is the input vector, W is the parameter matrix, and thr(·) denotes element-wise binary quantization; the k-th bit of the code is determined by the k-th row of W.

Random projections of this form are used by locality-sensitive hashing (LSH) and related techniques [Indyk & Motwani '98; Charikar '02; Raginsky & Lazebnik '09].
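As a concrete illustration (a sketch, not the authors' code), a random-projection hash of the form b(x; W) = thr(Wx) can be written in a few lines of NumPy; the dimensions and seed below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_hash(dim, bits, rng):
    """Random Gaussian projection matrix W; each row w_k defines one bit."""
    return rng.standard_normal((bits, dim))

def hash_codes(W, X):
    """b(x) = thr(Wx): the k-th bit is 1 iff the k-th row of W
    has a non-negative inner product with x."""
    return (X @ W.T >= 0).astype(np.uint8)

W = make_hash(dim=512, bits=32, rng=rng)   # e.g. 512D GIST -> 32-bit codes
X = rng.standard_normal((5, 512))          # five input vectors
B = hash_codes(W, X)                       # five 32-bit binary codes
```

Learning methods such as MLH keep this functional form but fit W instead of drawing it at random.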
![Page 7: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/7.jpg)
Learning Binary Hash Functions
Reasons to learn hash functions:
to find more compact binary codes
to preserve general similarity measures
Previous work
boosting [Shakhnarovich et al. '03]
neural nets [Salakhutdinov & Hinton '07; Torralba et al. '07]
spectral methods [Weiss et al. '08]
loss-based methods [Kulis & Darrell '09]
…
![Page 8: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/8.jpg)
Formulation
Input data: vectors x ∈ R^p
Similarity labels: s ∈ {0, 1} for pairs of inputs
Hash function: b(x; W) = thr(Wx)
Binary codes: b ∈ {0, 1}^q
![Page 9: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/9.jpg)
Loss Function
Hash code quality is measured by a loss function L(b1, b2, s):

b1: code for item 1
b2: code for item 2
s: similarity label

The loss measures how consistent the codes are with the similarity label: similar items should map to nearby hash codes, and dissimilar items should map to very different codes.
![Page 10: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/10.jpg)
Hinge Loss
Similar items (s = 1) should map to codes within a Hamming radius of ρ bits.
Dissimilar items (s = 0) should map to codes no closer than ρ bits.
Violations of either constraint are penalized linearly, hinge-style, with the penalty on dissimilar pairs scaled by a factor λ.
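A sketch of such a hinge-like pairwise loss in Python (the form and the hyper-parameter names `rho` and `lam` follow the description above; defaults are illustrative, not the paper's settings):

```python
import numpy as np

def hamming(b1, b2):
    """Hamming distance between two binary codes (0/1 arrays)."""
    return int(np.sum(b1 != b2))

def hinge_loss(b1, b2, s, rho=2, lam=0.5):
    """Hinge-like loss on a pair of codes.
    s = 1: similar pair  -> penalize distances beyond rho.
    s = 0: dissimilar pair -> penalize distances within rho, scaled by lam."""
    m = hamming(b1, b2)
    if s == 1:
        return max(m - rho + 1, 0)
    return lam * max(rho - m + 1, 0)
```

Note that the loss depends on the two codes only through their Hamming distance m, a property the learning algorithm later exploits.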
![Page 11: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/11.jpg)
Empirical Loss
Given training pairs with similarity labels, the empirical loss sums the pairwise loss over all labeled pairs of binary codes.

Good:
incorporates quantization and Hamming distance

Not so good:
discontinuous, non-convex objective function
![Page 12: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/12.jpg)
We minimize an upper bound on the empirical loss, inspired by structural SVM formulations [Taskar et al. '03; Tsochantaridis et al. '04; Yu & Joachims '09].
![Page 13: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/13.jpg)
Bound on loss
Because b(x; W) = argmax_h hᵀWx over binary codes h, the pairwise loss is bounded above by

L(b(x), b(z), s) ≤ max over (g1, g2) of [ L(g1, g2, s) + g1ᵀWx + g2ᵀWz ] − max_h hᵀWx − max_g gᵀWz

with LHS = RHS when the loss-adjusted maximizing pair coincides with the actual codes.
![Page 14: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/14.jpg)
Bound on loss
Remarks:
piecewise linear in W
convex-concave in W
relates to structural SVM with latent variables [Yu & Joachims '09]
![Page 15: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/15.jpg)
Bound on Empirical Loss
Loss-adjusted inference (finding the code pair that maximizes loss plus code score) is:
Exact
Efficient
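For intuition, the bound and its loss-adjusted maximization can be checked by brute force when the code length q is tiny (all names here are illustrative, and enumeration is only for checking; real loss-adjusted inference is far more efficient):

```python
import itertools
import numpy as np

def hinge_loss(m, s, rho=1, lam=0.5):
    """Hinge-like loss as a function of Hamming distance m and label s."""
    return max(m - rho + 1, 0) if s == 1 else lam * max(rho - m + 1, 0)

def thr(v):
    """Element-wise binary quantization; equals argmax_h h.(v) over {0,1}^q."""
    return (v >= 0).astype(int)

rng = np.random.default_rng(1)
q, d = 4, 3
W = rng.standard_normal((q, d))
x, z = rng.standard_normal(d), rng.standard_normal(d)
s = 1

# Actual codes and their loss.
b1, b2 = thr(W @ x), thr(W @ z)
loss = hinge_loss(int(np.sum(b1 != b2)), s)

# Loss-adjusted maximization, by exhaustive enumeration over all code pairs.
codes = [np.array(c) for c in itertools.product([0, 1], repeat=q)]
best = max(
    hinge_loss(int(np.sum(g1 != g2)), s) + g1 @ W @ x + g2 @ W @ z
    for g1, g2 in itertools.product(codes, repeat=2)
)
bound = best - b1 @ W @ x - b2 @ W @ z
assert loss <= bound + 1e-9   # the structured bound holds
```

The bound holds because the loss-adjusted maximum is at least as large as the value attained at the actual code pair itself.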
![Page 16: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/16.jpg)
Perceptron-like Learning
Initialize W with LSH (random projections)
Iterate over training pairs:
• Compute (b1, b2), the codes given by the current W
• Solve loss-adjusted inference for the loss-adjusted code pair
• Update W toward the actual codes and away from the loss-adjusted codes
[McAllester et al., 2010]
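The iteration above can be sketched as a structured-perceptron-style step (a toy version: brute-force loss-adjusted inference over a tiny code space, and all helper names and constants are hypothetical):

```python
import itertools
import numpy as np

def thr(v):
    return (v >= 0).astype(int)

def loss(m, s, rho=1, lam=0.5):
    return max(m - rho + 1, 0) if s == 1 else lam * max(rho - m + 1, 0)

def loss_adjusted(W, x, z, s):
    """Brute-force loss-adjusted inference (feasible only for tiny q)."""
    q = W.shape[0]
    codes = [np.array(c) for c in itertools.product([0, 1], repeat=q)]
    return max(
        ((g1, g2) for g1 in codes for g2 in codes),
        key=lambda p: loss(int(np.sum(p[0] != p[1])), s)
        + p[0] @ W @ x + p[1] @ W @ z,
    )

def perceptron_step(W, x, z, s, eta=0.1):
    """One update: move W toward the actual codes, away from the
    loss-adjusted codes (structured-perceptron-style subgradient step)."""
    b1, b2 = thr(W @ x), thr(W @ z)
    g1, g2 = loss_adjusted(W, x, z, s)
    return W + eta * ((np.outer(b1, x) + np.outer(b2, z))
                      - (np.outer(g1, x) + np.outer(g2, z)))

rng = np.random.default_rng(2)
W = rng.standard_normal((3, 2))
x, z = rng.standard_normal(2), rng.standard_normal(2)
W_new = perceptron_step(W, x, z, s=1)
```

When the loss-adjusted codes coincide with the actual codes the update vanishes, so the step only moves W while the bound is slack.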
![Page 17: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/17.jpg)
Experiment: Euclidean ANN
Similarity based on Euclidean distance
Datasets:
LabelMe (GIST)
MNIST (pixels)
PhotoTourism (SIFT)
Peekaboom (GIST)
Nursery (8D attributes)
10D Uniform
![Page 18: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/18.jpg)
Experiment: Euclidean ANN
22K LabelMe
512 GIST
20K training
2K testing
~1% of pairs are similar
Evaluation
Precision: #hits / number of items retrieved
Recall: #hits / number of similar items
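These two metrics can be computed directly from the retrieved set and the ground-truth similar set, e.g.:

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / #retrieved items; recall = hits / #similar items.
    Items are identified by arbitrary hashable ids."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

In the Hamming-distance setting, `retrieved` would be the items whose codes fall within some radius of the query code, and sweeping that radius traces out a precision-recall curve.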
![Page 19: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/19.jpg)
Techniques of interest
MLH – minimal loss hashing (this work)
LSH – locality-sensitive hashing (Charikar '02)
SH – spectral hashing (Weiss, Torralba & Fergus '09)
SIKH – shift-invariant kernel hashing (Raginsky & Lazebnik '09)
BRE – binary reconstructive embedding (Kulis & Darrell '09)
![Page 20: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/20.jpg)
Euclidean LabelMe – 32 bits
![Page 21: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/21.jpg)
Euclidean LabelMe – 32 bits
![Page 22: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/22.jpg)
Euclidean LabelMe – 32 bits
![Page 23: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/23.jpg)
Euclidean LabelMe – 64 bits
![Page 24: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/24.jpg)
Euclidean LabelMe – 64 bits
![Page 25: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/25.jpg)
Euclidean LabelMe – 128 bits
![Page 26: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/26.jpg)
Euclidean LabelMe – 256 bits
![Page 27: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/27.jpg)
Experiment: Semantic ANN
Semantic similarity measure based on annotations (object labels) from the LabelMe database:
512D GIST, 20K training, 2K testing
Techniques of interest
MLH – minimal loss hashing
NN – nearest neighbor in GIST space
NNCA – multilayer network with RBM pre-training and nonlinear NCA fine-tuning [Torralba et al. '09; Salakhutdinov & Hinton '07]
![Page 28: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/28.jpg)
Semantic LabelMe
![Page 29: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/29.jpg)
Semantic LabelMe
![Page 30: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/30.jpg)
![Page 31: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/31.jpg)
![Page 32: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/32.jpg)
![Page 33: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/33.jpg)
Summary
A formulation for learning binary hash functions based on:
structured prediction with latent variables
a hinge-like loss function for similarity search

Experiments show that with minimal loss hashing:
binary codes can be made more compact
semantic similarity based on human labels can be preserved
![Page 34: Minimal Loss Hashing for Compact Binary Codes](https://reader030.vdocuments.net/reader030/viewer/2022033015/5681487c550346895db585a6/html5/thumbnails/34.jpg)
Thank you!
Questions?