Recent Advances of Compact Hashing for Large-Scale Visual Search Shih-Fu Chang Columbia University October 2012 Joint work with Junfeng He (Facebook), Sanjiv Kumar (Google), Wei Liu (IBM Research), and Jun Wang (IBM Research)

Page 1:

Recent Advances of Compact Hashing for Large-Scale Visual Search

Shih-Fu Chang

Columbia University

October 2012

Joint work with Junfeng He (Facebook), Sanjiv Kumar (Google), Wei Liu (IBM Research), and Jun Wang (IBM Research)

Page 2:

Outline

• Lessons learned in designing hashing functions
  – The importance of balancing hash bucket size
  – How to incorporate supervised information
• Prediction of NN search difficulty & hashing performance
• Demo: Bag of hash bits for Mobile Visual Search

Page 3:

Fast Nearest Neighbor Search

• Applications: image search, texture synthesis, denoising …
• Avoid exhaustive search (time complexity linear in the database size)

[Figure panels: dense matching with coherence sensitive hashing (Korman & Avidan ’11); Photo Tourism patch search; image search]

Page 4:

Locality-Sensitive Hashing

• hash code collision probability proportional to original similarity
• l: # hash tables, K: hash bits per table

[Figure: random hash functions assign each point one bit (0 or 1) per function; the resulting compact code (e.g., 101) is used as the index]

[Indyk and Motwani 1998] [Datar et al. 2004]
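A minimal sketch of one common LSH family, the sign of a random projection, may help make the K-bit code concrete. This family targets angular similarity and is an assumption here; the cited papers also cover other families. The names `make_hyperplanes` and `hash_code` are illustrative only.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian directions; each one defines one hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_code(x, hyperplanes):
    """Return the K-bit code of x as a string: bit k is 1 if w_k . x >= 0."""
    return "".join(
        "1" if sum(w_i * x_i for w_i, x_i in zip(w, x)) >= 0 else "0"
        for w in hyperplanes
    )


# K = 8 bits for one table; l tables would use l independent sets of hyperplanes.
planes = make_hyperplanes(num_bits=8, dim=3, seed=42)
a = [1.0, 0.2, -0.5]
code_a = hash_code(a, planes)
code_scaled = hash_code([2 * v for v in a], planes)  # same direction as a
code_neg = hash_code([-v for v in a], planes)        # opposite direction
```

Points with the same direction always collide, and a point facing the opposite way gets the complementary code, which is the angular behavior this family is meant to have.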

Page 5:

Hash Table based Search

• O(1) search time by table lookup
• bucket size is important (affects accuracy & post-processing cost)

[Figure: the n database points xi are stored in a hash table by hash bucket address (01101, 01110, 01111, 01100); query q hashes to address 01101]
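The lookup on this slide can be sketched as follows, using the four bucket addresses from the figure plus one extra made-up code. `build_table` and `probe` are hypothetical helper names, and probing by flipping bits up to a Hamming radius is one straightforward strategy, not necessarily the one used in the talk.

```python
from collections import defaultdict
from itertools import combinations


def build_table(codes):
    """Map each binary code string to the list of item ids stored under it."""
    table = defaultdict(list)
    for item_id, code in enumerate(codes):
        table[code].append(item_id)
    return table


def probe(table, query_code, radius):
    """Collect ids from every bucket within the given Hamming radius of the query."""
    found = list(table.get(query_code, []))
    for r in range(1, radius + 1):
        for positions in combinations(range(len(query_code)), r):
            bits = list(query_code)
            for p in positions:
                bits[p] = "1" if bits[p] == "0" else "0"
            found.extend(table.get("".join(bits), []))
    return sorted(found)


# The first four codes are the addresses shown on the slide; the last is invented.
codes = ["01101", "01110", "01111", "01100", "10010"]
table = build_table(codes)
exact = probe(table, "01101", radius=0)  # only the query's own bucket
near = probe(table, "01101", radius=1)   # plus buckets one bit flip away
```

The exact lookup is a single dictionary access; every extra unit of radius multiplies the number of buckets visited, which is why bucket size and occupancy matter for both accuracy and post-processing cost.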

Page 6:

Different Approaches

Unsupervised Hashing

LSH ’98, SH ’08, KLSH ’09, AGH ’10, PCAH, ITQ ’11

Semi-Supervised Hashing

SSH ’10, WeaklySH ’10

Supervised Hashing

RBM ’09, BRE ’10, MLH ’11, LDAH ’11, ITQ ’11, KSH ’12

Page 7:

PCA + Minimize Quantization Errors

• PCA to maximize variance in each hash dimension
• find optimal rotation in the subspace to minimize quantization error

ITQ method, Gong & Lazebnik, CVPR ’11
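For reference, the quantization error that the rotation minimizes can be written as below. This follows the formulation in the cited ITQ paper rather than anything shown on the slide; V (the n × c PCA-projected data), B (the binary codes) and R (the c × c rotation) are symbols borrowed from that paper.

```latex
\min_{B,\,R}\; \| B - V R \|_F^2
\quad \text{s.t.} \quad B \in \{-1,+1\}^{n \times c},\; R^{T} R = I

% Alternating updates:
%   fix R:  B = \operatorname{sgn}(V R)
%   fix B:  B^{T} V = S \,\Omega\, \hat{S}^{T} \ \text{(SVD)}, \qquad R = \hat{S} S^{T}
```

Each step can only lower the objective: the sign update is the best binary code for a fixed rotation, and the SVD update is the orthogonal Procrustes solution for fixed codes.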

Page 8:

Effects of Min Quantization Errors

• 580K tiny images

PCA-ITQ, Gong & Lazebnik, CVPR ’11

[Figure panels: PCA-random rotation; PCA-ITQ optimal alignment]

Page 9:

Utilize supervised labels

Semantic Category Supervision

Metric Supervision

[Figure: example pairs labeled similar or dissimilar]

Page 10:

Design Hash Codes to Match Supervised Information

• Preferred hashing function

[Figure: similar and dissimilar pairs split by a hash bit (0 / 1)]

Page 11:

Adding Supervised Labels to PCA Hash

Relaxation:

[Equation labels: fitting labels (similar pair, dissimilar pair); PCA covariance matrix; “adjusted” covariance matrix]

• solution W: eigenvectors of adjusted covariance matrix
• If no supervision (S=0), it is simply PCA hash

Wang, Kumar, Chang, CVPR ’10, ICML ’10
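A sketch of the relaxed objective, written in the form used in the cited CVPR ’10 paper as best it can be reconstructed. The symbols X_l (labeled samples), X (all samples), S (pairwise label matrix) and the weight η come from that paper, not from the slide text, so treat them as assumptions.

```latex
J(W) = \tfrac{1}{2}\,\mathrm{tr}\!\left\{ W^{T} X_l S X_l^{T} W \right\}
     + \tfrac{\eta}{2}\,\mathrm{tr}\!\left\{ W^{T} X X^{T} W \right\},
\qquad
M = X_l S X_l^{T} + \eta\, X X^{T}
```

The first term fits the labels, the second is the PCA covariance term, and M is the “adjusted” covariance matrix. Maximizing J(W) subject to W^T W = I gives the top eigenvectors of M; with S = 0, M reduces to a multiple of X X^T and the solution is PCA hash.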

Page 12:

Semi-Supervised Hashing (SSH)

1 Million GIST Images
1% labels, 99% unlabeled

[Plot: Precision @ top 1K; curves: Supervised RBM, Random LSH, Unsupervised SH, SSH]

Page 13:

Problem of orthogonal projections

• Many buckets become empty when # bits increases.
• Need to search many neighbor buckets at query time

[Plot: Precision @ Hamming radius 2]
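A small worked count shows why both bullets follow from longer codes. The bit lengths and the database size of 1M used below are illustrative choices, not values from this slide.

```python
from math import comb


def buckets_within_radius(num_bits, radius):
    """Number of distinct bucket addresses within a Hamming radius of a query code."""
    return sum(comb(num_bits, i) for i in range(radius + 1))


def max_occupied_fraction(num_items, num_bits):
    """Upper bound on the fraction of non-empty buckets: at most one bucket per item."""
    return min(1.0, num_items / 2 ** num_bits)


probes_32 = buckets_within_radius(32, 2)  # 1 + 32 + 496
probes_64 = buckets_within_radius(64, 2)  # 1 + 64 + 2016
occupied_32 = max_occupied_fraction(1_000_000, 32)
```

With 32 bits there are about 4.3 billion addresses, so 1M items can fill only a tiny fraction of them, and a radius-2 query already has to visit several hundred buckets, most of which will be empty.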

Page 14:

ICA Type Hashing

• Explicitly optimize two terms
  – Preserve similarity (accuracy)
  – Balanced bucket size → max entropy → min mutual info I (search time)

Search accuracy:

$$D(Y) = \sum_{p,q=1}^{N} W_{pq} \, \| Y_p - Y_q \|^2$$

Balanced bucket size:

$$\min \; I(y_1, \ldots, y_k, \ldots, y_M) \quad \text{while} \quad E(y) = \sum_{p=1}^{N} Y_p = 0$$

Fast ICA to find non-orthogonal projections

SPICA Hash, He et al., CVPR ’11

Page 15:

The Importance of balanced size

[Plot: bucket size vs. bucket index; curves: LSH, SPICA Hash (balanced bucket size)]

Simulation over 1M tiny image samples

The largest bucket of LSH contains 10% of all 1M samples
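Bucket balance can be quantified with two simple statistics, sketched here on toy 2-bit codes. The data is made up for illustration, and `bucket_entropy` and `largest_bucket_fraction` are hypothetical helper names.

```python
from collections import Counter
from math import log2


def bucket_entropy(codes):
    """Shannon entropy (in bits) of the bucket occupancy distribution."""
    counts = Counter(codes)
    n = len(codes)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def largest_bucket_fraction(codes):
    """Fraction of all items that fall into the most crowded bucket."""
    return max(Counter(codes).values()) / len(codes)


# Toy data: 100 items hashed to 2-bit codes.
balanced = ["00", "01", "10", "11"] * 25
skewed = ["00"] * 70 + ["01"] * 10 + ["10"] * 10 + ["11"] * 10

h_balanced = bucket_entropy(balanced)  # reaches the maximum of 2 bits
h_skewed = bucket_entropy(skewed)      # lower: one bucket dominates
worst = largest_bucket_fraction(skewed)
```

Entropy reaches its maximum (the number of bits) only when all buckets are equally full. A dominant bucket lowers it and forces a long candidate list whenever a query lands there.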

Page 16:

Different Approaches

Unsupervised Hashing

LSH ’98, SH ’08, KLSH ’09, AGH ’10, PCAH, ITQ ’11

Semi-Supervised Hashing

SSH ’10, WeaklySH ’10

Supervised Hashing

RBM ’09, BRE ’10, MLH ’11, LDAH ’11, ITQ ’11, KSH ’12

Page 17:

Better ways to handle supervised information?

MLH [Norouzi & Fleet, ’11]

BRE [Kulis & Darrell, ’10]

Hamming distance between H(xi) and H(xj)

hinge loss

But optimizing Hamming Distance (DH, XOR) is not easy!

Page 18:

A New Supervision Form: Code Inner Products

[Figure: supervised hashing on labeled data x1, x2, x3, where x1 and x2 are similar and x3 is dissimilar to both. The code inner products (code matrix times its transpose, scaled by 1/r) are fitted to the pair-wise label matrix S, whose rows and columns are indexed by x1, x2, x3:]

$$\frac{1}{r}
\begin{bmatrix} 1 & -1 & 1 \\ 1 & -1 & 1 \\ -1 & 1 & -1 \end{bmatrix}
\begin{bmatrix} 1 & -1 & 1 \\ 1 & -1 & 1 \\ -1 & 1 & -1 \end{bmatrix}^{T}
=
\begin{bmatrix} 1 & 1 & -1 \\ 1 & 1 & -1 \\ -1 & -1 & 1 \end{bmatrix}
= S$$

proof: code inner product ≡ Hamming distance

Liu, Wang, Ji, Jiang, Chang, CVPR ’12
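The equivalence can be checked numerically. For codes in {-1, +1}^r, each agreeing bit adds +1 and each disagreeing bit adds -1 to the inner product, so inner product = r - 2 × Hamming distance. The sketch below uses the 3-bit code matrix from the slide; the helper names are illustrative.

```python
def hamming(a, b):
    """Number of positions where two codes differ."""
    return sum(1 for u, v in zip(a, b) if u != v)


def inner(a, b):
    """Inner product of two codes with entries in {-1, +1}."""
    return sum(u * v for u, v in zip(a, b))


# Code matrix from the slide: one row per sample (x1, x2, x3), r = 3 bits each.
H = [[1, -1, 1], [1, -1, 1], [-1, 1, -1]]
r = 3

# Normalized code inner products, to be compared with the pair-wise label matrix S.
S = [[inner(a, b) / r for b in H] for a in H]

# Identity behind the equivalence: inner(a, b) = r - 2 * hamming(a, b).
identity_holds = all(inner(a, b) == r - 2 * hamming(a, b) for a in H for b in H)
```

Because the two quantities are related by a fixed affine map, fitting inner products to S constrains Hamming distances in the same way while keeping the objective a smooth function of the codes before the sign is applied.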

Page 19:

Code Inner Product enables efficient optimization

• Much easier/faster to optimize and extend to kernels

Hashing: design hash codes to match supervised information

[Figure labels: sample, hash bit]

Liu, Wang, Ji, Jiang, Chang, CVPR 2012

Page 20:

Extend Code Inner Product to Kernel

• Following KLSH, construct a hash function using a kernel function and m anchor samples:

zero-mean normalization applied to k(x).

[Figure: code matrix (l samples) = sgn(kernel matrix (l samples × m anchors) × hash coefficients)]
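A minimal sketch of such a kernel-based hash function, assuming the form h(x) = sgn(Σ_j a_j (k(x_(j), x) − μ_j)) given in the cited KSH paper, where μ_j is the mean kernel value of anchor j over the training data. The Gaussian kernel, the random (untrained) coefficients, and all function names are assumptions made for illustration. In KSH the coefficients are learned by fitting code inner products to the labels.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel; the kernel choice here is an assumption."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * bandwidth ** 2))


def centered_kernel_features(x, anchors, means):
    """Kernel values against the m anchors, shifted to zero mean over the data."""
    return [gaussian_kernel(anchor, x) - mu for anchor, mu in zip(anchors, means)]


def kernel_hash(x, anchors, means, coeffs):
    """One bit per coefficient row: the sign of a linear form in the kernel features."""
    feats = centered_kernel_features(x, anchors, means)
    return [1 if sum(a * f for a, f in zip(row, feats)) >= 0 else -1 for row in coeffs]


rng = random.Random(0)
data = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(50)]
anchors = data[:5]  # m = 5 anchors, picked arbitrarily from the data
means = [sum(gaussian_kernel(anchor, x) for x in data) / len(data) for anchor in anchors]
coeffs = [[rng.gauss(0, 1) for _ in anchors] for _ in range(8)]  # 8 bits, random here
code = kernel_hash(data[10], anchors, means, coeffs)
```

Hashing a new point costs m kernel evaluations plus a small matrix-vector product, independent of the training set size.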

Page 21:

Benefits of Code Inner Product

• CIFAR-10, 60K object images from 10 classes, 1K query images.
• 1K supervised labels.
• KSH0: spectral relaxation; KSH: sigmoid hashing function

[Plot: Supervised Methods]

Open Issue: empty buckets and balance not addressed

Page 22:

Speedup by Code Inner Product

Method   Train Time (48 bits)   Test Time (48 bits)
SSH      2.1                    0.9×10^-5
LDAH     0.7                    0.9×10^-5
BRE      494.7                  2.9×10^-5
MLH      3666.3                 1.8×10^-5
KSH0     7.0                    3.3×10^-5
KSH      156.1                  4.3×10^-5

Significant speedup

CVPR 2012

Page 23:

Tiny-1M: Visual Search Results

More visually relevant

CVPR 2012

Page 24:

Comparison of Hashing vs. KD-Tree

Photo Tourism Patch set (Notre Dame subset, 103K samples), 512D GIFT

[Plot curves: Supervised Hashing, Anchor Graph Hashing, KD Tree]

Page 25:

Understand Difficulty of Approximate Nearest Neighbor Search

• How difficult is approximate nearest neighbor search in a dataset?

Toy example (query q)

x is an ε-approximate NN if

Search not meaningful!

A concrete measure of difficulty of search in a dataset?

He, Kumar, Chang, ICML 2012

Page 26:

Relative Contrast

• A naïve search approach: Randomly pick a point and compare that to the NN
• High Relative Contrast → easier search
• If the relative contrast is close to 1, search not meaningful

He, Kumar, Chang, ICML 2012
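An empirical version of this measure can be sketched as the ratio of the expected distance to a random database point over the expected distance to the nearest neighbor, which is how the cited ICML 2012 paper defines relative contrast. The uniform toy data, the sample sizes, and the function names are illustrative assumptions.

```python
import random


def lp_distance(x, y, p):
    """L_p distance between two vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def relative_contrast(queries, database, p=2):
    """Mean distance to a random point divided by mean distance to the NN."""
    mean_sum = 0.0
    min_sum = 0.0
    for q in queries:
        dists = [lp_distance(q, x, p) for x in database]
        mean_sum += sum(dists) / len(dists)
        min_sum += min(dists)
    return mean_sum / min_sum


def uniform_points(n, dim, rng):
    """n points drawn uniformly from the unit cube [0, 1]^dim."""
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
low_dim = relative_contrast(uniform_points(20, 2, rng), uniform_points(500, 2, rng))
high_dim = relative_contrast(uniform_points(20, 100, rng), uniform_points(500, 100, rng))
```

The ratio is never below 1. In low dimensions the nearest neighbor is much closer than a random point, while in high dimensions the two distances become comparable and the ratio drifts toward 1.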

Page 27:

Estimation of Relative Contrast

• With CLT, and binomial approximation

ϕ: standard Gaussian cdf
σ': a function of data properties (dimensionality and sparsity)
n: data size
p: Lp distance

Page 28:

Synthetic Data

• Data sampled randomly from U[0,1]

[Plots: relative contrast; higher dimensionality: bad; sparser vectors: good]

s: prob. of non-zero element in each dim.
d: feature dimension

Page 29:

Synthetic Data

• Data sampled randomly from U[0,1]

[Plots: relative contrast; lower p: good; larger database: good]

Page 30:

Predict Hashing Performance of Real-World Data

Dataset        Dimensionality (d)   Sparsity (s)   Relative Contrast (Cr) for p = 1
SIFT           128                  0.89           4.78
Gist           384                  1.00           1.83
Color Hist     1382                 0.027          3.19
Imagenet BoW   10000                0.024          1.90

[Plots: 16 bits LSH; 28 bits LSH]

Page 31:

Mobile Search System by Hashing

Light Computing, Low Bit Rate, Big Data Indexing

He, Feng, Liu, Cheng, Lin, Chung, Chang. Mobile Product Search with Bag of Hash Bits and Boundary Reranking, CVPR 2012.

Page 32:

Estimate the Complexity

• 500 local features per image
  – Feature size ~128 Kbytes
  – more than 10 seconds for transmission over 3G
• Database indexing
  – 1 million images need 0.5 billion local features
  – Finding matched features becomes challenging
• Idea: directly compute compact hash codes on mobile devices
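The figures above can be checked with a few lines of arithmetic. The implied uplink rate is only what the slide's own numbers suggest (128 Kbytes taking about 10 seconds); it is not a measured 3G figure.

```python
features_per_image = 500
images = 1_000_000

# Database side: total number of local features to index.
total_features = features_per_image * images

# Client side: uplink rate implied by sending ~128 Kbytes in ~10 seconds.
payload_kbytes = 128
seconds = 10
implied_rate_kbit_s = payload_kbytes * 8 / seconds
```

Here `total_features` comes to 500,000,000 (the 0.5 billion on the slide), and the implied rate is about 100 kbit/s, which is why shrinking each feature to a short hash code matters on the client side.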

Page 33:

Approach: hashing

• Each local feature coded as hash bits
  – locality sensitive, efficient for high dimensions
• Each image is represented as Bag of Hash Bits

011001100100111100…

110110011001100110…

Page 34:

Bit Reuse for Multi-Table Hashing

• To reduce transmission size
  – Reuse a single hash bit pool by random subsampling

Optimal hash bit pool (e.g., 80 bits, PCA Hash or SPICA hash):

1 0 0 1 1 1 0 0 0 0 1 0 1 0 1 0 . . . 0 0 1 1 0 1 1 1

[Figure: random subsets of the pool (32 bits each) form Table 1, Table 2, . . . , Table 11, Table 12; the results from all tables are combined by union]
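The bit-reuse idea can be sketched as follows, using the numbers on the slide (an 80-bit pool, 12 tables, 32 bits per table). The pool bits are random stand-ins rather than real PCA or SPICA hash outputs, and `make_tables` and `table_keys` are hypothetical names.

```python
import random


def make_tables(pool_size, bits_per_table, num_tables, seed=0):
    """For each table, pick a fixed random subset of bit positions from the pool."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(pool_size), bits_per_table)) for _ in range(num_tables)]


def table_keys(pool_bits, subsets):
    """Derive one lookup key per table from a single transmitted pool of bits."""
    return ["".join(pool_bits[i] for i in subset) for subset in subsets]


subsets = make_tables(pool_size=80, bits_per_table=32, num_tables=12)

# Stand-in for the 80 hash bits of one local feature computed on the device.
rng = random.Random(1)
pool_bits = "".join(rng.choice("01") for _ in range(80))

keys = table_keys(pool_bits, subsets)  # 12 keys of 32 bits each
```

The client sends 80 bits per feature, while the server derives 12 × 32 = 384 bits of table keys from them. The subsets are fixed in advance, so both sides agree on them without extra transmission.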

Page 35:

Rerank Results with Boundary Features

• Use automatic salient object segmentation for every image in DB [Cheng et al, CVPR 2011]
• Compute boundary features: normalized central distance, Fourier magnitude
• Invariance: translation, scaling, rotation

Page 36:

Boundary Feature – Central Distance

[Figure: Distance to Center D(n); FFT: F(n)]
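A sketch of how such a feature could be computed and why it gives the three invariances: subtracting the centroid handles translation, normalizing D(n) handles scaling, and taking the Fourier magnitude discards the circular shift caused by a different starting point on the contour. Normalizing by the maximum distance, the plain DFT, and the ellipse test shape are assumptions made for illustration.

```python
import cmath
import math


def central_distance(boundary):
    """Distance of each boundary point to the centroid, normalized by the maximum."""
    cx = sum(x for x, _ in boundary) / len(boundary)
    cy = sum(y for _, y in boundary) / len(boundary)
    dists = [math.hypot(x - cx, y - cy) for x, y in boundary]
    peak = max(dists)
    return [d / peak for d in dists]


def dft_magnitude(signal):
    """Magnitude of the discrete Fourier transform (plain O(n^2) DFT)."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


def ellipse(num_points, a, b, shift=(0.0, 0.0), start=0):
    """Sample an ellipse boundary, optionally translated and with a shifted start point."""
    pts = [
        (a * math.cos(2 * math.pi * t / num_points) + shift[0],
         b * math.sin(2 * math.pi * t / num_points) + shift[1])
        for t in range(num_points)
    ]
    return pts[start:] + pts[:start]


base = dft_magnitude(central_distance(ellipse(32, 3.0, 1.0)))
# Same shape, scaled by 2, translated, and traced from a different starting point.
moved = dft_magnitude(central_distance(ellipse(32, 6.0, 2.0, shift=(5.0, -2.0), start=7)))
max_diff = max(abs(u - v) for u, v in zip(base, moved))
```

The two feature vectors agree up to floating-point error, so the transformed shape would be ranked as identical to the original.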

Page 37:

Reranking with boundary feature

Page 38:

Mobile Product Search System: Bags of Hash Bits and Boundary features

Server:
• 1 million product images crawled from Amazon, eBay and Zappos
• Hundreds of categories: shoes, clothes, electrical devices, groceries, kitchen supplies, movies, etc.

Speed:
• Feature extraction: ~1s
• Transmission: 80 bits/feature, 1KB/image
• Server Search: ~0.4s
• Download/display: 1-2s

video demo (52”)

He, Feng, Liu, Cheng, Lin, Chung, Chang. Mobile Product Search with Bag of Hash Bits and Boundary Reranking, CVPR 2012.

Page 39:

Performance

• Baseline [Chandrasekhar et al CVPR ’10]:
  – Client: compress local features with CHoG
  – Server: BoW with Vocabulary Tree (1M codes)
• 30% higher recall and 6X-30X search speedup

Page 40:

Summary

• Some Ideas Discussed
  – bucket balancing is important
  – code inner product: an efficient form of supervised hashing
  – insights on search difficulty prediction
  – Large mobile search: a good test case for hashing
• Open Issues
  – supervised hashing vs. attribute discovery
  – hashing beyond point-to-point search
  – hashing to incorporate structured relation (spatio-temporal)

Page 41:

References

• (Supervised Kernel Hash) W. Liu, J. Wang, R. Ji, Y. Jiang, and S.-F. Chang. Supervised Hashing with Kernels. CVPR 2012.
• (Difficulty of Nearest Neighbor Search) J. He, S. Kumar, S.-F. Chang. On the Difficulty of Nearest Neighbor Search. ICML 2012.
• (Hash Based Mobile Product Search) J. He, T. Lin, J. Feng, X. Liu, S.-F. Chang. Mobile Product Search with Bag of Hash Bits and Boundary Reranking. CVPR 2012.
• (Hashing with Graphs) W. Liu, J. Wang, S. Kumar, S.-F. Chang. Hashing with Graphs. ICML 2011.
• (Iterative Quantization) Y. Gong and S. Lazebnik. Iterative Quantization: A Procrustean Approach to Learning Binary Codes. CVPR 2011.
• (Semi-Supervised Hash) J. Wang, S. Kumar, S.-F. Chang. Semi-Supervised Hashing for Scalable Image Retrieval. CVPR 2010.
• (ICA Hashing) J. He, R. Radhakrishnan, S.-F. Chang, C. Bauer. Compact Hashing with Joint Optimization of Search Accuracy and Time. CVPR 2011.