Image retrieval, vector quantization and nearest neighbor search
(source: image.ntua.gr/iva/files/rennes.pdf)
Yannis Avrithis
National Technical University of Athens
Rennes, October 2014
Part I: Image retrieval

• Particular object retrieval
• Match images under different viewpoint/lighting, occlusion
• Given local descriptors, investigate match kernels beyond Bag-of-Words

Part II: Vector quantization and nearest neighbor search

• Fast nearest neighbor search in high-dimensional spaces
• Encode vectors based on vector quantization
• Improve fitting to the underlying distribution
Part I: Image retrieval
To aggregate or not to aggregate: selective match kernels for image search
Joint work with Giorgos Tolias and Hervé Jégou, ICCV 2013
Overview
• Problem: particular object retrieval
• Build a common model for matching (HE) and aggregation (VLAD) methods; derive new match kernels
• Evaluate performance under exact or approximate descriptors
Related work

• In our common model:
  • Bag-of-Words (BoW) [Sivic & Zisserman ’03]
  • Descriptor approximation (Hamming embedding) [Jegou et al. ’08]
  • Aggregated representations (VLAD, Fisher) [Jegou et al. ’10][Perronnin et al. ’10]
• Relevant to Part II:
  • Soft (multiple) assignment [Philbin et al. ’08][Jegou et al. ’10]
• Not discussed:
  • Spatial matching [Philbin et al. ’08][Tolias & Avrithis ’11]
  • Query expansion [Chum et al. ’07][Tolias & Jegou ’13]
  • Re-ranking [Qin et al. ’11][Shen et al. ’12]
Image representation

• Entire image: set of local descriptors X = {x_1, ..., x_n}
• Descriptors assigned to cell c: X_c = {x ∈ X : q(x) = c}

Generic set similarity:

    K(X, Y) = γ(X) γ(Y) ∑_{c∈C} w_c M(X_c, Y_c)
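As a concrete illustration, the generic kernel can be sketched in a few lines of Python. The codebook, the unit cell weights and the 1/√|X| normalization below are illustrative assumptions, not the exact choices of the paper:

```python
import math
from collections import defaultdict

def quantize(x, codebook):
    """Assign descriptor x to its nearest centroid (cell) q(x)."""
    return min(range(len(codebook)),
               key=lambda c: sum((xi - ci) ** 2 for xi, ci in zip(x, codebook[c])))

def group_by_cell(X, codebook):
    """Split descriptor set X into per-cell subsets X_c."""
    cells = defaultdict(list)
    for x in X:
        cells[quantize(x, codebook)].append(x)
    return cells

def set_similarity(X, Y, codebook, M, w=None, gamma=None):
    """K(X,Y) = gamma(X) gamma(Y) * sum_c w_c M(X_c, Y_c)."""
    w = w or (lambda c: 1.0)                              # e.g. idf weights in practice
    gamma = gamma or (lambda S: 1.0 / math.sqrt(len(S)))  # hypothetical normalization
    Xc, Yc = group_by_cell(X, codebook), group_by_cell(Y, codebook)
    total = sum(w(c) * M(Xc[c], Yc[c]) for c in Xc.keys() & Yc.keys())
    return gamma(X) * gamma(Y) * total

# BoW cell similarity: M(X_c, Y_c) = |X_c| * |Y_c|
bow = lambda Xc, Yc: len(Xc) * len(Yc)
```

Only cells visited by both images contribute, which is what makes inverted-file evaluation of such kernels efficient.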
Set similarity function

    K(X, Y) = γ(X) γ(Y) ∑_{c∈C} w_c M(X_c, Y_c)

where γ(·) is a normalization factor, w_c a cell weighting, and M(X_c, Y_c) a cell similarity.
Bag-of-Words (BoW) similarity function

Cosine similarity:

    M(X_c, Y_c) = |X_c| × |Y_c| = ∑_{x∈X_c} ∑_{y∈Y_c} 1
Hamming Embedding (HE)

    M(X_c, Y_c) = ∑_{x∈X_c} ∑_{y∈Y_c} w(h(b_x, b_y))

where w is a weight function, h the Hamming distance, and b_x, b_y binary codes.
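A minimal sketch of this cell similarity follows. The Gaussian-shaped weight and the threshold value are assumptions for illustration; the actual HE weighting differs:

```python
import math

def hamming(bx, by):
    """Hamming distance between two equal-length binary codes."""
    return sum(i != j for i, j in zip(bx, by))

def he_weight(h, B=64, tau=24):
    """Illustrative weight w(h): decays with Hamming distance h and is zero
    beyond a threshold tau (an assumption, not the published weighting)."""
    return math.exp(-h ** 2 / (B / 4) ** 2) if h <= tau else 0.0

def he_cell_similarity(Xc, Yc, B=64, tau=24):
    """M(X_c, Y_c) = sum_x sum_y w(h(b_x, b_y)) over binary codes in the cell."""
    return sum(he_weight(hamming(bx, by), B, tau) for bx in Xc for by in Yc)
```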
VLAD

    M(X_c, Y_c) = V(X_c)^T V(Y_c) = ∑_{x∈X_c} ∑_{y∈Y_c} r(x)^T r(y)

where V(X_c) = ∑_{x∈X_c} r(x) is the aggregated residual and r(x) = x − q(x) the residual.
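The residual computation and per-cell aggregation can be sketched directly (a plain-Python illustration, without the ℓ2/power-law normalizations used in full VLAD pipelines):

```python
def residual(x, centroid):
    """r(x) = x - q(x): offset of descriptor x from its cell centroid."""
    return [xi - ci for xi, ci in zip(x, centroid)]

def vlad_cell(Xc, centroid):
    """Aggregated residual V(X_c): sum of residuals of descriptors in the cell."""
    V = [0.0] * len(centroid)
    for x in Xc:
        for i, ri in enumerate(residual(x, centroid)):
            V[i] += ri
    return V

def vlad_cell_similarity(Xc, Yc, centroid):
    """M(X_c, Y_c) = V(X_c)^T V(Y_c)."""
    Vx, Vy = vlad_cell(Xc, centroid), vlad_cell(Yc, centroid)
    return sum(a * b for a, b in zip(Vx, Vy))
```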
Design choices
Hamming embedding
• Binary signature & voting per descriptor (not aggregated)
• Selective: discard weak votes
VLAD
• One aggregated vector per cell
• Linear operation
Questions
• Is aggregation good with large vocabularies (e.g. 65k)?
• How important is selectivity in this case?
Common model

Non-aggregated:

    M_N(X_c, Y_c) = ∑_{x∈X_c} ∑_{y∈Y_c} σ( φ(x)^T φ(y) )

where σ is a selectivity function and φ(x) the descriptor representation (residual, binary, scalar).

Aggregated:

    M_A(X_c, Y_c) = σ( ψ(∑_{x∈X_c} φ(x))^T ψ(∑_{y∈Y_c} φ(y)) ) = σ( Φ(X_c)^T Φ(Y_c) )

where ψ is a normalization (ℓ2, power-law) and Φ(X_c) the cell representation.
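One way to see the relation between the two forms: when σ and ψ are both the identity, M_A equals M_N by bilinearity, so selectivity and normalization are precisely what distinguishes them. A small sketch (with illustrative identity choices):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def match_nonaggregated(Xc, Yc, phi, sigma):
    """M_N: apply selectivity sigma to every descriptor-to-descriptor similarity."""
    return sum(sigma(dot(phi(x), phi(y))) for x in Xc for y in Yc)

def match_aggregated(Xc, Yc, phi, sigma, psi):
    """M_A: pool phi over each cell, normalize with psi, apply sigma once."""
    px = psi([sum(v) for v in zip(*(phi(x) for x in Xc))])
    py = psi([sum(v) for v in zip(*(phi(y) for y in Yc))])
    return sigma(dot(px, py))
```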
BoW, HE and VLAD in the common model

    Model   M(X_c, Y_c)   φ(x)   σ(u)              ψ(z)   Φ(X_c)
    BoW     M_N or M_A    1      u                 z      |X_c|
    HE      M_N only      b_x    w((B/2)(1 − u))   —      —
    VLAD    M_N or M_A    r(x)   u                 z      V(X_c)

    BoW:   M(X_c, Y_c) = ∑_{x∈X_c} ∑_{y∈Y_c} 1 = |X_c| × |Y_c|
    HE:    M(X_c, Y_c) = ∑_{x∈X_c} ∑_{y∈Y_c} w(h(b_x, b_y))
    VLAD:  M(X_c, Y_c) = ∑_{x∈X_c} ∑_{y∈Y_c} r(x)^T r(y) = V(X_c)^T V(Y_c)
Selective Match Kernel (SMK)

    SMK(X_c, Y_c) = ∑_{x∈X_c} ∑_{y∈Y_c} σ_α( r̂(x)^T r̂(y) )

• Descriptor representation: ℓ2-normalized residual φ(x) = r̂(x) = r(x)/‖r(x)‖
• Selectivity function

    σ_α(u) = sign(u)|u|^α if u > τ, 0 otherwise

[Plot: similarity score vs. dot product for σ_α with α = 3, τ = 0]
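The selectivity function and the kernel it induces can be sketched as follows (single shared centroid and default α, τ are illustrative):

```python
import math

def sigma_alpha(u, alpha=3.0, tau=0.0):
    """sigma_alpha(u) = sign(u)|u|^alpha if u > tau, else 0."""
    if u <= tau:
        return 0.0
    return math.copysign(abs(u) ** alpha, u)

def normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def smk(Xc, Yc, centroid, alpha=3.0, tau=0.0):
    """SMK(X_c, Y_c): sum of sigma_alpha over pairwise normalized-residual dot products."""
    rx = [normalize([a - c for a, c in zip(x, centroid)]) for x in Xc]
    ry = [normalize([a - c for a, c in zip(y, centroid)]) for y in Yc]
    return sum(sigma_alpha(sum(p * q for p, q in zip(u, v)), alpha, tau)
               for u in rx for v in ry)
```

Raising α sharpens the kernel: near-collinear residuals keep a score close to 1 while weak matches are pushed toward 0, which is the selectivity effect shown in the plot.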
Matching example—impact of threshold
α = 1, τ = 0.0
α = 1, τ = 0.25
thresholding removes false correspondences
Matching example—impact of shape parameter
α = 3, τ = 0.0
α = 3, τ = 0.25
weighs matches based on confidence
Aggregated Selective Match Kernel (ASMK)

    ASMK(X_c, Y_c) = σ_α( V̂(X_c)^T V̂(Y_c) )

• Cell representation: ℓ2-normalized aggregated residual Φ(X_c) = V̂(X_c) = V(X_c)/‖V(X_c)‖
• Similar to [Arandjelovic & Zisserman ’13], but:
  • with selectivity function σ_α
  • used with large vocabularies
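A sketch of the aggregated kernel; note how normalization caps the contribution of repeated (bursty) descriptors, since two identical residuals aggregate to the same unit vector as one:

```python
import math

def asmk(Xc, Yc, centroid, alpha=3.0, tau=0.0):
    """ASMK(X_c, Y_c) = sigma_alpha(V̂(X_c)^T V̂(Y_c)) with
    ℓ2-normalized aggregated residuals per cell."""
    def agg(S):
        V = [0.0] * len(centroid)
        for s in S:
            for i, (a, c) in enumerate(zip(s, centroid)):
                V[i] += a - c
        n = math.sqrt(sum(v * v for v in V)) or 1.0
        return [v / n for v in V]
    u = sum(a * b for a, b in zip(agg(Xc), agg(Yc)))
    # inline sigma_alpha
    return math.copysign(abs(u) ** alpha, u) if u > tau else 0.0
```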
Aggregated features: k = 128 as in VLAD
Aggregated features: k = 65K as in ASMK
Why to aggregate: burstiness
• Burstiness: non-iid statistical behaviour of descriptors
• Matches of bursty features dominate the total similarity score
• Previous work: [Jegou et al. ’09][Chum & Matas ’10][Torii et al. ’13]
In this work
• Aggregation and normalization per cell handles burstiness
• Keeps a single representative, similar to max-pooling
Binary counterparts SMK* and ASMK*

• Full vector representation: high memory cost
• Approximate vector representation: binary vector

    SMK*(X_c, Y_c) = ∑_{x∈X_c} ∑_{y∈Y_c} σ_α( b̂(r(x))^T b̂(r(y)) )

    ASMK*(X_c, Y_c) = σ_α( b̂(∑_{x∈X_c} r(x))^T b̂(∑_{y∈Y_c} r(y)) )

b̂ includes centering and rotation as in HE, followed by binarization and ℓ2-normalization.
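The binarization step alone can be sketched as sign-quantization followed by ℓ2-normalization (the HE centering and rotation are omitted here for brevity):

```python
import math

def binarize(v):
    """Illustrative b̂: sign-binarize to ±1 and ℓ2-normalize, so that dot
    products of codes stay in [-1, 1]. The real pipeline first applies the
    HE centering and rotation, which this sketch skips."""
    signs = [1.0 if x >= 0 else -1.0 for x in v]
    n = math.sqrt(len(v))
    return [s / n for s in signs]
```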
Impact of selectivity

[Plots: mAP vs. α (1–7) for SMK, SMK*, ASMK and ASMK* on Oxford5k, Paris6k and Holidays; mAP roughly in the 64–82 range]
Impact of aggregation

• Improves performance for different vocabulary sizes
• Reduces memory requirements of inverted file

    k     memory ratio
    8k    69%
    16k   78%
    32k   85%
    65k   89%

[Plot: mAP vs. vocabulary size k (8k–65k) on Oxford5k with MA, comparing SMK, SMK-BURST and ASMK]

With k = 8k on Oxford5k:
• VLAD → 65.5%
• SMK → 74.2%
• ASMK → 78.1%
Comparison to state of the art

    Method                               MA   Oxf5k   Oxf105k   Par6k   Holidays
    ASMK*                                     76.4    69.2      74.4    80.0
    ASMK*                                ×    80.4    75.0      77.0    81.0
    ASMK                                      78.1    -         76.0    81.2
    ASMK                                 ×    81.7    -         78.2    82.2
    HE [Jegou et al. ’10]                     51.7    -         -       74.5
    HE [Jegou et al. ’10]                ×    56.1    -         -       77.5
    HE-BURST [Jain et al. ’10]                64.5    -         -       78.0
    HE-BURST [Jain et al. ’10]           ×    67.4    -         -       79.6
    Fine vocab. [Mikulik et al. ’10]     ×    74.2    67.4      74.9    74.9
    AHE-BURST [Jain et al. ’10]               66.6    -         -       79.4
    AHE-BURST [Jain et al. ’10]          ×    69.8    -         -       81.9
    Rep. structures [Torii et al. ’13]   ×    65.6    -         -       74.9
Discussion
• Aggregation is also beneficial with large vocabularies → burstiness
• Selectivity always helps (with or without aggregation)
• Descriptor approximation reduces performance only slightly
Part II: Vector quantization and nearest neighbor search
Locally optimized product quantization
Joint work with Yannis Kalantidis, CVPR 2014
Overview
• Problem: given a query point q, find its nearest neighbor with respect to Euclidean distance within a data set X in a d-dimensional space
• Focus on large scale: encode (compress) vectors, speed up distance computations
• Fit the underlying distribution better with little space & time overhead
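For reference, the exact baseline these methods approximate is a linear scan, O(n·d) per query and impractical at large scale (a plain-Python sketch):

```python
import math

def nearest_neighbor(q, X):
    """Exact nearest neighbor: argmin over X of Euclidean distance to q.
    Returns (index, distance)."""
    best, best_d2 = None, float("inf")
    for i, x in enumerate(X):
        d2 = sum((qi - xi) ** 2 for qi, xi in zip(q, x))
        if d2 < best_d2:
            best, best_d2 = i, d2
    return best, math.sqrt(best_d2)
```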
Applications
• Retrieval (image as point) [Jegou et al. ’10][Perronnin et al. ’10]
• Retrieval (descriptor as point) [Tolias et al. ’13][Qin et al. ’13]
• Localization, pose estimation [Sattler et al. ’12][Li et al. ’12]
• Classification [Boiman et al. ’08][McCann & Lowe ’12]
• Clustering [Philbin et al. ’07][Avrithis ’13]
Related work

• Indexing
  • Inverted index (image retrieval)
  • Inverted multi-index [Babenko & Lempitsky '12] (nearest neighbor search)
• Encoding and ranking
  • Vector quantization (VQ)
  • Product quantization (PQ) [Jegou et al. '11]
  • Optimized product quantization (OPQ) [Ge et al. '13]; Cartesian k-means [Norouzi & Fleet '13]
  • Locally optimized product quantization (LOPQ) [Kalantidis and Avrithis '14]
• Not discussed
  • Tree-based indexing, e.g., [Muja and Lowe '09]
  • Hashing and binary codes, e.g., [Norouzi et al. '12]
Inverted index

[Figure, animated over several slides: a toy inverted file. The query's visual words (e.g. 54, 67, 72) select postings lists over images 12-22; votes accumulate per image (final scores 1 3 1 2 1 1), yielding a ranked shortlist.]
Inverted index—issues
• Are items in a postings list equally important?
• What changes under soft (multiple) assignment?
• How should vectors be encoded for memory efficiency and fast search?
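As an illustrative sketch (not the slides' implementation), a bag-of-words inverted file with term-frequency postings and vote-based ranking could look like this; the image ids and visual word ids are hypothetical toy data:

```python
from collections import Counter, defaultdict

def build_inverted_index(images):
    """images: dict image_id -> list of visual word ids."""
    index = defaultdict(list)  # visual word -> postings list of (image_id, tf)
    for image_id, words in images.items():
        for word, tf in Counter(words).items():
            index[word].append((image_id, tf))
    return index

def search(index, query_words, shortlist=10):
    """Accumulate votes over the postings lists of the query's visual words."""
    scores = Counter()
    for word in query_words:
        for image_id, tf in index.get(word, []):
            scores[image_id] += tf
    return scores.most_common(shortlist)  # ranked shortlist

images = {12: [54, 67], 13: [67, 67, 72], 14: [72]}
index = build_inverted_index(images)
ranked = search(index, [54, 67, 72])  # image 13 matches most query words
```

Only the postings lists of the query's words are touched, which is what makes the index fast when the vocabulary is large and postings are sparse.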
Inverted multi-index

[Figure 1 of Babenko & Lempitsky '12: 600 points in the unit 2D square, indexed by an inverted index with 16 2D codewords (left) vs. an inverted multi-index with two codebooks of 16 1D codewords each (right), at equal codebook-matching cost. For two example queries, the inverted index returns lists of 45 and 62 points that are heavily "skewed" when the query lies near a partition boundary (the common case in high dimensions), and it cannot return lists of a pre-specified small length; the multi-index returns lists of 31 and 32 points, approximately centered on the queries, by visiting several nearest cells via the multi-sequence algorithm. In high dimensions, this translates into considerably higher accuracy of retrieval and nearest neighbor search.]
• decompose vectors as x = (x1, x2)
• train codebooks C1, C2 from datasets {x1_n}, {x2_n}
• induced codebook C1 × C2 gives a finer partition
• given query q, visit cells (c1, c2) ∈ C1 × C2 in ascending order of distance to q by the multi-sequence algorithm
Multi-sequence algorithm

[Figure: a grid of cells C1 × C2, with C1 indexing columns and C2 indexing rows; cells are visited in ascending order of total distance to the query.]
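A simplified variant of the multi-sequence traversal can be sketched with a heap and a visited set (the original algorithm uses a stricter predecessor rule that keeps the frontier smaller; the ordering is the same). `dist1`, `dist2` are assumed to be the query's sub-distances to each sub-codebook centroid, pre-sorted ascending:

```python
import heapq

def multi_sequence(dist1, dist2):
    """Yield cell index pairs (i, j) in ascending order of dist1[i] + dist2[j].

    dist1, dist2: ascending-sorted distances from the query's two
    sub-vectors to the centroids of sub-codebooks C1, C2."""
    heap = [(dist1[0] + dist2[0], 0, 0)]  # start at the jointly closest cell
    seen = {(0, 0)}
    while heap:
        d, i, j = heapq.heappop(heap)
        yield (i, j), d
        # only the two "successor" cells can be next-closest candidates
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(dist1) and nj < len(dist2) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (dist1[ni] + dist2[nj], ni, nj))

order = [cell for cell, _ in multi_sequence([0.1, 0.4, 0.9], [0.2, 0.3, 0.8])]
# (0, 0) comes first, then cells in ascending total distance
```

Because the inputs are sorted, the heap never needs to hold the whole k × k grid; traversal stops as soon as the candidate list is long enough.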
Vector quantization (VQ)

$$\min_{C} \; E(C) = \sum_{x \in X} \min_{c \in C} \|x - c\|^2 = \sum_{x \in X} \|x - q(x)\|^2$$

with dataset $X$, codebook $C$, quantizer $q$, and distortion $E(C)$.
Vector quantization (VQ)

• For small distortion → large k = |C|:
  • hard to train
  • too large to store
  • too slow to search
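For reference, plain VQ training is just Lloyd's k-means over full vectors. A toy, dependency-free sketch (not tuned for the billion-scale settings discussed here):

```python
import random

def vq_train(X, k, iters=20, seed=0):
    """Lloyd's k-means: learn a codebook C of k centroids minimizing distortion."""
    rng = random.Random(seed)
    C = list(rng.sample(X, k))  # initialize centroids from the data
    for _ in range(iters):
        # assignment step: q(x) = nearest centroid
        cells = [[] for _ in range(k)]
        for x in X:
            i = min(range(k), key=lambda i: sum((a - b) ** 2 for a, b in zip(x, C[i])))
            cells[i].append(x)
        # update step: move each centroid to the mean of its cell
        for i, cell in enumerate(cells):
            if cell:
                C[i] = tuple(sum(coord) / len(cell) for coord in zip(*cell))
    return C

def distortion(X, C):
    """E(C): sum over x of squared distance to its nearest codeword."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in C) for x in X)

X = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.1)]
C = vq_train(X, k=2)
```

The bullets above are exactly why this does not scale directly: both training and search are linear in k per point, and storing C grows with k, motivating the product structure next.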
Product quantization (PQ)

$$\min_{C} \sum_{x \in X} \min_{c \in C} \|x - c\|^2 \quad \text{subject to} \quad C = C^1 \times \cdots \times C^m$$
Product quantization (PQ)

• train: $q = (q^1, \dots, q^m)$, where $q^1, \dots, q^m$ are obtained by VQ on the subspaces
• store: $|C| = k^m$ codewords, with $|C^1| = \cdots = |C^m| = k$
• search: $\|y - q(x)\|^2 = \sum_{j=1}^{m} \|y^j - q^j(x^j)\|^2$, where $q^j(x^j) \in C^j$
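The store/search economics can be illustrated with a small sketch of PQ encoding and asymmetric distance computation (ADC) via per-subspace lookup tables; all codebook and vector values below are hypothetical:

```python
def pq_encode(x, codebooks):
    """Encode x as a tuple of m sub-codeword ids, one per subspace."""
    d = len(x) // len(codebooks)
    code = []
    for j, C in enumerate(codebooks):
        sub = x[j * d:(j + 1) * d]
        code.append(min(range(len(C)),
                        key=lambda i: sum((a - b) ** 2 for a, b in zip(sub, C[i]))))
    return tuple(code)

def adc_tables(y, codebooks):
    """Precompute ||y^j - c||^2 for every sub-codeword: m tables of k entries."""
    d = len(y) // len(codebooks)
    return [[sum((a - b) ** 2 for a, b in zip(y[j * d:(j + 1) * d], c)) for c in C]
            for j, C in enumerate(codebooks)]

def adc_distance(tables, code):
    """Approximate ||y - q(x)||^2 with only m table lookups per encoded point."""
    return sum(tables[j][i] for j, i in enumerate(code))

# toy example: d = 4, m = 2 subspaces, k = 2 sub-codewords each
codebooks = [[(0.0, 0.0), (1.0, 1.0)], [(0.0, 1.0), (1.0, 0.0)]]
code = pq_encode((0.9, 1.1, 0.1, 0.9), codebooks)   # -> (1, 0)
tables = adc_tables((1.0, 1.0, 0.0, 1.0), codebooks)
approx = adc_distance(tables, code)
```

The tables depend only on the query, so scanning a database of codes costs m lookups and additions per point, while the induced codebook size k^m is never materialized.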
Optimized product quantization (OPQ)

$$\min_{C, R} \sum_{x \in X} \min_{c \in C} \|x - Rc\|^2 \quad \text{subject to} \quad C = C^1 \times \cdots \times C^m, \; R^\top R = I$$
OPQ, parametric solution for $X \sim \mathcal{N}(0, \Sigma)$

• independence: PCA-align by diagonalizing $\Sigma$ as $U \Lambda U^\top$
• balanced variance: permute $\Lambda$ by $\pi$ such that $\prod_i \lambda_i$ is constant in each subspace; $R \leftarrow U P_\pi^\top$
• find $C$ by PQ on rotated data $\hat{x} = R^\top x$
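The balanced-variance step is commonly implemented by greedy eigenvalue allocation: assign eigenvalues, largest first, to the not-yet-full subspace with the smallest log-product so far. A sketch under that assumption:

```python
import math

def balanced_allocation(eigvals, m):
    """Greedily assign eigenvalue indices to m equal-size subspace buckets so
    that the product of eigenvalues per bucket stays roughly balanced
    (equivalently, balance the sums of logs)."""
    per = len(eigvals) // m
    buckets = [[] for _ in range(m)]
    logs = [0.0] * m
    for i in sorted(range(len(eigvals)), key=lambda i: -eigvals[i]):
        # next eigenvalue goes to the emptiest (by log-product) non-full bucket
        j = min((j for j in range(m) if len(buckets[j]) < per),
                key=lambda j: logs[j])
        buckets[j].append(i)
        logs[j] += math.log(eigvals[i])
    return buckets  # dimension indices (in the PCA basis) per subspace

buckets = balanced_allocation([8.0, 4.0, 2.0, 1.0], m=2)  # products 8 and 8
```

The returned index groups define the permutation $\pi$ above: concatenating the buckets gives the ordering of PCA dimensions into subspaces.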
Locally optimized product quantization (LOPQ)
• compute residuals r(x) = x− q(x) on coarse quantizer q
• collect residuals Zi = {r(x) : q(x) = ci} per cell
• train (Ri, qi)← OPQ(Zi) per cell
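Schematically, the per-cell training above groups residuals by coarse cell and fits a local model to each; in this sketch `opq_train` is a hypothetical stand-in for the OPQ trainer returning $(R_i, q_i)$:

```python
def nearest(codebook, x):
    """Index of the codeword closest to x."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codebook[i])))

def lopq_train(X, coarse_codebook, opq_train):
    """LOPQ training sketch: collect residuals Z_i per coarse cell and train
    a locally optimized quantizer on each cell's residuals.

    opq_train: callable over a list of residual vectors (stand-in for OPQ)."""
    cells = {}
    for x in X:
        i = nearest(coarse_codebook, x)
        r = tuple(a - b for a, b in zip(x, coarse_codebook[i]))  # r(x) = x - q(x)
        cells.setdefault(i, []).append(r)
    return {i: opq_train(Z) for i, Z in cells.items()}  # (R_i, q_i) per cell
```

At query time, only the local models of the visited cells are needed, so the extra cost over plain residual PQ is one rotation and codebook per cell.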
Locally optimized product quantization (LOPQ)

• better captures the support of the data distribution, like local PCA [Kambhatla & Leen '97]:
  • multimodal (e.g. mixture) distributions
  • distributions on nonlinear manifolds
• residual distributions are closer to the Gaussian assumption
Multi-LOPQ

[Figure: each vector is split as x = (x1, x2) and coarsely quantized by sub-quantizers q1, q2, one per half, indexing the cells of the inverted multi-index.]
Comparison to state of the art: SIFT1B, 64-bit codes

| Method | R = 1 | R = 10 | R = 100 |
|---|---|---|---|
| Ck-means [Norouzi & Fleet '13] | – | – | 0.649 |
| IVFADC | 0.106 | 0.379 | 0.748 |
| IVFADC [Jegou et al. '11] | 0.088 | 0.372 | 0.733 |
| OPQ | 0.114 | 0.399 | 0.777 |
| Multi-D-ADC [Babenko & Lempitsky '12] | 0.165 | 0.517 | 0.860 |
| LOR+PQ | 0.183 | 0.565 | 0.889 |
| LOPQ | 0.199 | 0.586 | 0.909 |

Most benefit comes from the locally optimized rotation!
Comparison to state of the art: SIFT1B, 128-bit codes

| T | Method | R = 1 | R = 10 | R = 100 |
|---|---|---|---|---|
| 20K | IVFADC+R [Jegou et al. '11] | 0.262 | 0.701 | 0.962 |
| 20K | LOPQ+R | 0.350 | 0.820 | 0.978 |
| 10K | Multi-D-ADC [Babenko & Lempitsky '12] | 0.304 | 0.665 | 0.740 |
| 10K | OMulti-D-OADC [Ge et al. '13] | 0.345 | 0.725 | 0.794 |
| 10K | Multi-LOPQ | 0.430 | 0.761 | 0.782 |
| 30K | Multi-D-ADC [Babenko & Lempitsky '12] | 0.328 | 0.757 | 0.885 |
| 30K | OMulti-D-OADC [Ge et al. '13] | 0.366 | 0.807 | 0.913 |
| 30K | Multi-LOPQ | 0.463 | 0.865 | 0.905 |
| 100K | Multi-D-ADC [Babenko & Lempitsky '12] | 0.334 | 0.793 | 0.959 |
| 100K | OMulti-D-OADC [Ge et al. '13] | 0.373 | 0.841 | 0.973 |
| 100K | Multi-LOPQ | 0.476 | 0.919 | 0.973 |
Residual encoding in related work

• PQ (IVFADC) [Jegou et al. '11]: single product quantizer for all cells
• [Uchida et al. '12]: multiple product quantizers, shared by multiple cells
• OPQ [Ge et al. '13]: single product quantizer for all cells, globally optimized for rotation (single/multi-index)
• LOPQ: with/without one product quantizer per cell, with/without rotation optimization per cell (single/multi-index)
• [Babenko & Lempitsky '14]: one product quantizer per cell, optimized for rotation per cell (multi-index)