HPatches: A Benchmark and Evaluation of Handcrafted and Learned Local Descriptors
Vassileios Balntas*, Karel Lenc*, Andrea Vedaldi and Krystian Mikolajczyk (*authors contributed equally)
Imperial College London, UK · University of Oxford, UK
http://hpatches.github.io
Motivation & Aim
• No standardised multi-task evaluation method for local feature descriptors
• Result inconsistencies in previous works:
  LIOP > SIFT [7, 10] vs. SIFT > LIOP [12]
  BRISK > SIFT [4, 7] vs. SIFT > BRISK [5]
  ORB > SIFT [9] vs. SIFT > ORB [7]
• New benchmark for local image descriptors
  – Train and test split
  – Strictly defined evaluation protocol for the matching, retrieval and verification tasks
  – Meticulously controlled for side effects such as measurement region size
• Performance of traditional descriptors is comparable to deep-learning ones when carefully normalised
Design objectives
Comparison to existing local descriptor benchmarks:

  dataset               patch  diverse  real  large  multitask
  Photo Tourism [11]      X              X      X
  DTU [1]                                X      X
  Oxford-Affine [6]              X       X
  Synth. Matching [3]            X              X
  CVDS [2]                               X      X       X
  RomePatches [8]         X              X
  HPatches                X      X       X      X       X
Image Sequences
A large-scale version of the Oxford-Affine dataset [6]: 116 sequences of 6 images each, with ground-truth homographies.
• Viewpoint changes (59 sequences)
• Photometric changes (57 sequences)
116 sequences · 696 images · 6 images/sequence
157 000 unique patches of 65 × 65 pixels
Patch extraction
Patches are detected by multiple feature detectors on a reference image and reprojected to the remaining images using the ground-truth homography, with controlled geometric noise at three levels (Easy, Hard and Tough) to measure the geometric invariance of a descriptor.
[Figure: average overlap vs. image index (2–6) for the graf (Easy), boat (Hard) and wall (Tough) settings, for Hes and HesAff detections; average overlap ranges roughly from 0.4 to 0.8.]
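The controlled geometric noise described above can be sketched as a random similarity jitter composed with the ground-truth homography. The noise magnitudes below are illustrative placeholders, not the values used by the benchmark:

```python
import numpy as np

# Hypothetical (rotation rad, relative scale, translation px) bounds per level;
# the actual Easy/Hard/Tough magnitudes used by HPatches may differ.
NOISE = {"easy": (0.05, 0.05, 1.0), "hard": (0.10, 0.10, 3.0), "tough": (0.20, 0.20, 6.0)}

def perturb_homography(H, level, rng):
    """Compose the ground-truth homography H with random similarity jitter,
    simulating detection noise when reprojecting a patch into a target image."""
    max_rot, max_scale, max_t = NOISE[level]
    theta = rng.uniform(-max_rot, max_rot)
    s = 1.0 + rng.uniform(-max_scale, max_scale)
    tx, ty = rng.uniform(-max_t, max_t, size=2)
    J = np.array([[s * np.cos(theta), -s * np.sin(theta), tx],
                  [s * np.sin(theta),  s * np.cos(theta), ty],
                  [0.0,               0.0,                1.0]])
    return J @ H  # jitter applied after the ground-truth mapping
```

The perturbed homography is then used in place of the exact one when sampling the 65 × 65 target patch, so a descriptor's score degrades with its sensitivity to detection noise.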
Benchmark tasks
Image Matching: match the patches of a reference image to the corresponding patches of a target image.
Patch Retrieval: retrieve the patches corresponding to a query patch from a large pool containing distractors.
Patch Verification: classify whether a pair of patches corresponds to the same measurement (same / not same).
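All three tasks are scored with mean average precision (mAP). A minimal sketch of the per-query average precision underlying that score, assuming positives are labelled 1 and higher scores mean more similar (the benchmark's exact ranking and tie-breaking details may differ):

```python
import numpy as np

def average_precision(labels, scores):
    """Rank items by score (descending) and average the precision
    at each position where a positive item is retrieved."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    labels = np.asarray(labels, dtype=float)[order]
    hits = np.cumsum(labels)                      # positives retrieved so far
    precision = hits / (np.arange(len(labels)) + 1)
    return (precision * labels).sum() / labels.sum()
```

For verification, `scores` can be negated descriptor distances over candidate pairs; mAP is then the mean of this quantity over all queries (or pair sets).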
Benchmark Results
Patch Verification mAP [%]:
  +TF-R     83.24
  +DC-S2S   83.03
  +TF-M     82.69
  TF-R      81.92
  TF-M      81.90
  +DDESC    81.65
  +DC-S     81.63
  DDESC     79.51
  DC-S2S    78.23
  +RSIFT    76.70
  +SIFT     74.35
  DC-S      70.04
  BBOOST    66.67
  SIFT      65.12
  ORB       60.15
  RSIFT     58.53
  BRIEF     58.07
  MSTD      48.75
  RESZ      48.11
Image Matching mAP [%]:
  +RSIFT    36.77
  +DDESC    35.44
  +TF-R     34.37
  +TF-M     34.29
  +SIFT     32.76
  TF-M      32.64
  +DC-S2S   32.34
  +DC-S     31.65
  TF-R      30.61
  DDESC     28.05
  DC-S2S    27.69
  RSIFT     27.22
  SIFT      25.47
  DC-S      24.92
  ORB       15.32
  BBOOST    14.77
  BRIEF     10.50
  RESZ       7.16
  MSTD       0.10
Patch Retrieval mAP [%]:
  +DDESC    44.55
  +RSIFT    43.84
  +SIFT     40.36
  +TF-R     40.23
  +TF-M     40.02
  DDESC     39.83
  +DC-S     39.68
  TF-M      39.40
  +DC-S2S   38.23
  TF-R      37.69
  DC-S      34.84
  DC-S2S    34.76
  RSIFT     33.56
  SIFT      31.98
  BBOOST    22.45
  ORB       18.85
  BRIEF     16.03
  RESZ      13.12
  MSTD       1.20
Evaluated Descriptors
Selected descriptors (with trivial baselines) and their properties. Speed in thousands of patches/s.

  Descr.     MStd  Resz  SIFT  RSIFT  BRIEF  BBoost  ORB   DC-S  DC-S2S  DDesc  TF-M  TF-R
  Dims       2     36    128   128    *256   *256    *256  256   512     128    128   128
  Patch Sz   65    65    65    65     32     32      32    64    64      64     32    32
  Speed CPU  67    3     2     2      333    2       333   0.3   0.2     0.1    0.6   0.6
  Speed GPU  –     –     –     –      –      –       –     10    5       2.3    83    83

Descriptor types: Trivial, SIFT-like, Binary, Deep learning.
'+' variants: with PCA whitening.
Conclusions
• Introduced a patch-based, larger-scale, reproducible benchmark with a strict evaluation protocol
• Ranking of descriptors varies across tasks
• Good performance on patch verification does not imply good performance on the other tasks
• Simple PCA- and power-law-based normalisation can significantly help matching and retrieval
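The '+' post-processing can be sketched as a signed square-root (power-law) map followed by PCA whitening fit on the training split. The exponent, epsilon and exact whitening variant below are illustrative assumptions, not the poster's exact settings:

```python
import numpy as np

def powerlaw(X, alpha=0.5):
    """Signed power-law map followed by L2 re-normalisation of each descriptor row."""
    X = np.sign(X) * np.abs(X) ** alpha
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def fit_whitening(X_train, eps=1e-8):
    """Learn a PCA-whitening transform (mean + rotation/scaling) on the train split."""
    mu = X_train.mean(axis=0)
    U, S, _ = np.linalg.svd(np.cov((X_train - mu).T))
    W = U / np.sqrt(S + eps)   # columns scaled by inverse sqrt of eigenvalues
    return mu, W

def whiten(X, mu, W):
    """Apply the learned transform to (train or test) descriptors."""
    return (X - mu) @ W
```

Crucially, the transform is learned on the training split only and then applied unchanged to the test descriptors, keeping the benchmark's train/test separation intact.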
Source Code
Python/MATLAB evaluation source code available at:
github.com/hpatches/hpatches-benchmark
Dataset (with original sequences) available at:
github.com/hpatches/hpatches-dataset
References
[1] H. Aanæs, et al. Interesting interest points. IJCV, 2012.
[2] V. Chandrasekhar, et al. Feature matching performance of compact descriptors for visual search. DCC, 2014.
[3] P. Fischer, A. Dosovitskiy, and T. Brox. Descriptor matching with CNNs: a comparison to SIFT. arXiv, 2014.
[4] S. Leutenegger, et al. BRISK: Binary robust invariant scalable keypoints. ICCV, 2011.
[5] G. Levi and T. Hassner. LATCH: Learned arrangements of three patch codes. WACV, 2016.
[6] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. PAMI, 2005.
[7] O. Miksik and K. Mikolajczyk. Evaluation of local detectors and descriptors for fast feature matching. ICPR, 2012.
[8] M. Paulin, et al. Local convolutional features with unsupervised training for image retrieval. ICCV, 2015.
[9] E. Rublee, et al. ORB: An efficient alternative to SIFT or SURF. ICCV, 2011.
[10] Z. Wang, B. Fan, and F. Wu. Local intensity order pattern for feature description. ICCV, 2011.
[11] S. Winder and M. Brown. Learning local image descriptors. CVPR, 2007.
[12] T.-Y. Yang, et al. Accumulated stability voting: A robust descriptor from descriptors of multiple scales. CVPR, 2016.
Acknowledgements
Karel Lenc is supported by ERC 677195-IDIU and Vassileios Balntas is supported by FACER2VM EPSRC EP/N007743/1.