
HPatches: A Benchmark and Evaluation of Handcrafted and Learned Local Descriptors

Vassileios Balntas*, Karel Lenc*, Andrea Vedaldi and Krystian Mikolajczyk
* Authors contributed equally
http://hpatches.github.io

Imperial College London, UK · University of Oxford, UK

Motivation & Aim

• No standardised multi-task evaluation method for local feature descriptors
• Result inconsistencies in previous works:
  LIOP > SIFT [7, 10], but SIFT > LIOP [12]
  BRISK > SIFT [4, 7], but SIFT > BRISK [5]
  ORB > SIFT [9], but SIFT > ORB [7]
• New benchmark for local image descriptors
  – Train and test splits
  – Strictly defined evaluation protocols for the matching, retrieval and verification tasks
  – Meticulously controlled for side effects such as measurement region size
• Performance of traditional descriptors is comparable to deep-learning ones when carefully normalised

Design objectives

Comparison to existing local descriptor benchmarks.

dataset               patch   diverse   real   large   multitask
Photo Tourism [11]      ✓                 ✓      ✓
DTU [1]                          ✓        ✓
Oxford-Affine [6]                ✓        ✓
Synth. Matching [3]     ✓        ✓
CVDS [2]                         ✓        ✓      ✓
RomePatches [8]         ✓                 ✓
HPatches                ✓        ✓        ✓      ✓        ✓

Image Sequences

A large-scale Oxford-Affine-style dataset [6]: 116 sequences of 6 images each, related by ground-truth homographies.

Viewpoint (59 sequences)

Photometric (57 sequences)

116 sequences · 696 images · 6 images/sequence
157 000 unique patches of 65 × 65 pixels
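For readers who want to work with the patches directly, the following minimal sketch slices one sequence's patch-strip images into 65 × 65 patches. It assumes the layout of the released dataset, where each sequence folder stores 65-pixel-wide images of vertically stacked patches; the folder and file names in the example are assumptions, not guaranteed.

```python
import numpy as np
from PIL import Image

PATCH_SIZE = 65  # patches are 65 x 65 pixels

def load_patch_strip(path):
    """Split one strip image (65 pixels wide, 65*N pixels tall) into N patches."""
    strip = np.array(Image.open(path).convert("L"))
    assert strip.shape[1] == PATCH_SIZE, "expected a 65-pixel-wide patch strip"
    n = strip.shape[0] // PATCH_SIZE
    return strip.reshape(n, PATCH_SIZE, PATCH_SIZE)

# Example usage (paths are assumptions about the release layout):
# ref = load_patch_strip("hpatches-release/v_graffiti/ref.png")
# e1  = load_patch_strip("hpatches-release/v_graffiti/e1.png")
# Row i of ref and e1 then contains corresponding patches.
```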

Patch extraction

Patches are detected by multiple feature detectors on a reference image and reprojected to the target images using the ground-truth homography, with controlled geometric noise at three levels (Easy, Hard and Tough) to measure the geometric invariance of a descriptor.
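As a rough illustration of this reprojection step (not the authors' extraction code), the sketch below maps a detection from the reference image into a target image with the ground-truth homography and adds a controllable amount of geometric jitter; the jitter magnitudes are illustrative placeholders, not the actual Easy/Hard/Tough settings.

```python
import numpy as np

def reproject(H, x, y):
    """Map a point from the reference image to a target image
    using a 3x3 ground-truth homography H."""
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]

def jitter(x, y, scale, rng, trans_px=2.0, scale_pct=0.05):
    """Perturb a reprojected detection with random translation and scale noise.
    The magnitudes here are illustrative, not the benchmark's settings."""
    x += rng.uniform(-trans_px, trans_px)
    y += rng.uniform(-trans_px, trans_px)
    scale *= 1.0 + rng.uniform(-scale_pct, scale_pct)
    return x, y, scale

# Example: rng = np.random.default_rng(0); then jitter(*reproject(H, 100, 50), 12.0, rng)
```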

[Figure: average overlap with the reference detection for target images 2 to 6 under the Easy, Hard and Tough geometric-noise settings (example sequences: graf, boat, wall), for Hes and HesAff detections; overlap values span roughly 0.4 to 0.8.]

Benchmark tasks

[Figure: illustration of the three benchmark tasks.]
• Image Matching: match the patches of a reference image to the corresponding patches of a target image.
• Patch Retrieval: given a query patch, retrieve its corresponding patches from a large pool of patches.
• Patch Verification: decide whether a pair of patches corresponds to the same point (same / not same).
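For concreteness, here is a minimal sketch of how the verification task can be scored: rank patch pairs by descriptor distance and compute average precision against the same / not-same labels. This follows the task description above and is not the benchmark's reference implementation.

```python
import numpy as np

def average_precision(labels, scores):
    """AP of a ranked list: labels are 1 (same) / 0 (not same) numpy arrays,
    scores are higher-is-more-likely-same."""
    order = np.argsort(-scores)
    labels = labels[order]
    hits = np.cumsum(labels)
    precision = hits / np.arange(1, len(labels) + 1)
    return np.sum(precision * labels) / max(labels.sum(), 1)

def verification_ap(desc_a, desc_b, labels):
    """desc_a, desc_b: (N, D) descriptors of the two patches in each pair."""
    dist = np.linalg.norm(desc_a - desc_b, axis=1)
    return average_precision(labels, -dist)  # small distance => likely same
```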

Benchmark Results

mAP [%] per descriptor on the three benchmark tasks:

Descriptor   Patch Verification   Image Matching   Patch Retrieval
MSTD               48.75                0.10              1.20
RESZ               48.11                7.16             13.12
BRIEF              58.07               10.50             16.03
ORB                60.15               15.32             18.85
BBOOST             66.67               14.77             22.45
SIFT               65.12               25.47             31.98
RSIFT              58.53               27.22             33.56
DC-S               70.04               24.92             34.84
DC-S2S             78.23               27.69             34.76
DDESC              79.51               28.05             39.83
TF-M               81.90               32.64             39.40
TF-R               81.92               30.61             37.69
+SIFT              74.35               32.76             40.36
+RSIFT             76.70               36.77             43.84
+DC-S              81.63               31.65             39.68
+DC-S2S            83.03               32.34             38.23
+DDESC             81.65               35.44             44.55
+TF-M              82.69               34.29             40.02
+TF-R              83.24               34.37             40.23

Best per task: +TF-R on Patch Verification (83.24%), +RSIFT on Image Matching (36.77%), +DDESC on Patch Retrieval (44.55%).

Evaluated Descriptors

Selected descriptors (with trivial baselines) and their properties. Speed in thousands of patches/s.

Descr.      MStd  Resz  SIFT  RSIFT  BRIEF  BBoost  ORB   DC-S  DC-S2S  DDesc  TF-M  TF-R
Dims           2    36   128    128   *256    *256  *256   256     512    128   128   128
Patch Sz      65    65    65     65     32      32    32    64      64     64    32    32
Speed CPU     67     3     2      2    333       2   333   0.3     0.2    0.1   0.6   0.6
Speed GPU      -     -     -      -      -       -     -    10       5    2.3    83    83

Descriptor types: Trivial, SIFT-like, Binary, Deep learning. '+' variants: with PCA whitening.
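The two trivial baselines in the table are simple enough to restate in code. The sketch below is one plausible reading (MStd: patch mean and standard deviation; Resz: the patch downsampled to 6 × 6 = 36 pixels and normalised), not the benchmark's exact implementation.

```python
import numpy as np
from PIL import Image

def mstd(patch):
    """MStd baseline: 2-D descriptor holding the patch mean and standard deviation."""
    return np.array([patch.mean(), patch.std()], dtype=np.float32)

def resz(patch, side=6):
    """Resz baseline: patch downsampled to side x side pixels (36-D for side=6).
    The mean/std normalisation at the end is an assumption, not a confirmed detail."""
    small = np.array(Image.fromarray(patch).resize((side, side), Image.BILINEAR),
                     dtype=np.float32).ravel()
    return (small - small.mean()) / (small.std() + 1e-8)

# Example usage with a 65x65 uint8 patch `p`:  d2 = mstd(p);  d36 = resz(p)
```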

Conclusions

• Introduced a patch-based, large-scale, reproducible benchmark with a strict evaluation protocol
• The ranking of descriptors varies across tasks
• Good performance on Patch Verification does not imply good performance on the other tasks
• Simple PCA- and power-law-based normalisation can significantly help matching and retrieval (a sketch follows below)
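A minimal sketch of the '+' post-processing, assuming a standard PCA-whitening pipeline: fit the transform on descriptors from the training split, project test descriptors, apply a signed power law and L2-normalise. The exponent and epsilon values here are assumptions, not the paper's exact settings.

```python
import numpy as np

def fit_pca_whitening(train_desc, eps=1e-8):
    """Learn the mean, principal axes and per-axis scales from training descriptors (N x D)."""
    mean = train_desc.mean(axis=0)
    cov = np.cov(train_desc - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return mean, eigvecs, np.sqrt(eigvals + eps)

def whiten_and_normalise(desc, mean, eigvecs, stds, alpha=0.5):
    """Project onto the PCA axes, divide by per-axis std (whitening),
    apply a signed power law |x|^alpha, then L2-normalise."""
    proj = (desc - mean) @ eigvecs / stds
    proj = np.sign(proj) * np.abs(proj) ** alpha
    return proj / (np.linalg.norm(proj, axis=-1, keepdims=True) + 1e-8)

# Example: params = fit_pca_whitening(train_descs); q = whiten_and_normalise(test_descs, *params)
```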

Source Code

Python/MATLAB evaluation source code available at: github.com/hpatches/hpatches-benchmark

Dataset (with the original image sequences) available at: github.com/hpatches/hpatches-dataset

References
[1] H. Aanæs, et al. Interesting interest points. IJCV, 2012.
[2] V. Chandrasekhar, et al. Feature matching performance of compact descriptors for visual search. DCC, 2014.
[3] P. Fischer, A. Dosovitskiy, and T. Brox. Descriptor matching with CNNs: a comparison to SIFT. arXiv, 2014.
[4] S. Leutenegger, et al. BRISK: Binary robust invariant scalable keypoints. ICCV, 2011.
[5] G. Levi and T. Hassner. LATCH: Learned arrangements of three patch codes. WACV, 2016.
[6] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. PAMI, 2005.
[7] O. Miksik and K. Mikolajczyk. Evaluation of local detectors and descriptors for fast feature matching. ICPR, 2012.
[8] M. Paulin, et al. Local convolutional features with unsupervised training for image retrieval. ICCV, 2015.
[9] E. Rublee, et al. ORB: An efficient alternative to SIFT or SURF. ICCV, 2011.
[10] Z. Wang, B. Fan, and F. Wu. Local intensity order pattern for feature description. ICCV, 2011.
[11] S. Winder and M. Brown. Learning local image descriptors. CVPR, 2007.
[12] T.-Y. Yang, et al. Accumulated stability voting: A robust descriptor from descriptors of multiple scales. CVPR, 2016.

Acknowledgements

Karel Lenc is supported by ERC 677195-IDIU and Vassileios Balntas is supported by FACER2VM EPSRC EP/N007743/1.