
HPatches: A Benchmark and Evaluation of Handcrafted and Learned Local Descriptors

Vassileios Balntas*, Karel Lenc*, Andrea Vedaldi and Krystian Mikolajczyk
* Authors contributed equally
http://hpatches.github.io

Imperial College London, UK · University of Oxford, UK

Motivation & Aim

• No standardised multi-task evaluation method for local feature descriptors
• Result inconsistencies in previous works:
  LIOP > SIFT [7, 10], but SIFT > LIOP [12]
  BRISK > SIFT [4, 7], but SIFT > BRISK [5]
  ORB > SIFT [9], but SIFT > ORB [7]
• New benchmark for local image descriptors
  – Train and test splits
  – Strictly defined evaluation protocols for the matching, retrieval and verification tasks
  – Meticulously controlled for side effects such as measurement region size
• Performance of traditional descriptors is comparable to deep-learning ones when carefully normalised

Design objectives

Comparison to existing local descriptor benchmarks.

dataset               patch   diverse   real   large   multitask
Photo Tourism [11]      ✓                 ✓      ✓
DTU [1]                          ✓        ✓
Oxford-Affine [6]                ✓        ✓
Synth. Matching [3]     ✓        ✓
CVDS [2]                         ✓        ✓      ✓
RomePatches [8]         ✓                 ✓
HPatches                ✓        ✓        ✓      ✓        ✓

Image Sequences

A large-scale Oxford-Affine-style dataset [6]: 116 sequences of 6 images each, related by ground-truth homographies.

Viewpoint (59 sequences)

Photometric (57 sequences)

116 sequences · 696 images · 6 images/sequence
157 000 unique patches of 65 × 65 pixels
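For readers who want to work with the patches directly, the following minimal sketch slices one sequence's patch-strip images into 65 × 65 patches. It assumes the layout of the released dataset, where each sequence folder stores 65-pixel-wide images of vertically stacked patches; the folder and file names in the example are assumptions, not guaranteed.

```python
import numpy as np
from PIL import Image

PATCH_SIZE = 65  # patches are 65 x 65 pixels

def load_patch_strip(path):
    """Split one strip image (65 pixels wide, 65*N pixels tall) into N patches."""
    strip = np.array(Image.open(path).convert("L"))
    assert strip.shape[1] == PATCH_SIZE, "expected a 65-pixel-wide patch strip"
    n = strip.shape[0] // PATCH_SIZE
    return strip.reshape(n, PATCH_SIZE, PATCH_SIZE)

# Example usage (paths are assumptions about the release layout):
# ref = load_patch_strip("hpatches-release/v_graffiti/ref.png")
# e1  = load_patch_strip("hpatches-release/v_graffiti/e1.png")
# Row i of ref and e1 then contains corresponding patches.
```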

Patch extraction

Patches are detected by multiple feature detectors on a reference image and reprojected to the target images using the ground-truth homography, with controlled geometric noise at three levels (Easy, Hard and Tough) to measure the geometric invariance of a descriptor.
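As a rough illustration of this reprojection step (not the authors' extraction code), the sketch below maps a detection from the reference image into a target image with the ground-truth homography and adds a controllable amount of geometric jitter; the jitter magnitudes are illustrative placeholders, not the actual Easy/Hard/Tough settings.

```python
import numpy as np

def reproject(H, x, y):
    """Map a point from the reference image to a target image
    using a 3x3 ground-truth homography H."""
    p = H @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]

def jitter(x, y, scale, rng, trans_px=2.0, scale_pct=0.05):
    """Perturb a reprojected detection with random translation and scale noise.
    The magnitudes here are illustrative, not the benchmark's settings."""
    x += rng.uniform(-trans_px, trans_px)
    y += rng.uniform(-trans_px, trans_px)
    scale *= 1.0 + rng.uniform(-scale_pct, scale_pct)
    return x, y, scale

# Example: rng = np.random.default_rng(0); then jitter(*reproject(H, 100, 50), 12.0, rng)
```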

[Figure: average overlap with the reference detection for target images 2 to 6 under the Easy, Hard and Tough geometric-noise settings (example sequences: graf, boat, wall), for Hes and HesAff detections; overlap values span roughly 0.4 to 0.8.]

Benchmark tasks

[Figure: illustration of the three benchmark tasks.]
• Image Matching: match the patches of a reference image to the corresponding patches of a target image.
• Patch Retrieval: given a query patch, retrieve its corresponding patches from a large pool of patches.
• Patch Verification: decide whether a pair of patches corresponds to the same point (same / not same).
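For concreteness, here is a minimal sketch of how the verification task can be scored: rank patch pairs by descriptor distance and compute average precision against the same / not-same labels. This follows the task description above and is not the benchmark's reference implementation.

```python
import numpy as np

def average_precision(labels, scores):
    """AP of a ranked list: labels are 1 (same) / 0 (not same) numpy arrays,
    scores are higher-is-more-likely-same."""
    order = np.argsort(-scores)
    labels = labels[order]
    hits = np.cumsum(labels)
    precision = hits / np.arange(1, len(labels) + 1)
    return np.sum(precision * labels) / max(labels.sum(), 1)

def verification_ap(desc_a, desc_b, labels):
    """desc_a, desc_b: (N, D) descriptors of the two patches in each pair."""
    dist = np.linalg.norm(desc_a - desc_b, axis=1)
    return average_precision(labels, -dist)  # small distance => likely same
```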

Benchmark Results

mAP [%] per descriptor on the three benchmark tasks:

Descriptor   Patch Verification   Image Matching   Patch Retrieval
MSTD               48.75                0.10              1.20
RESZ               48.11                7.16             13.12
BRIEF              58.07               10.50             16.03
ORB                60.15               15.32             18.85
BBOOST             66.67               14.77             22.45
SIFT               65.12               25.47             31.98
RSIFT              58.53               27.22             33.56
DC-S               70.04               24.92             34.84
DC-S2S             78.23               27.69             34.76
DDESC              79.51               28.05             39.83
TF-M               81.90               32.64             39.40
TF-R               81.92               30.61             37.69
+SIFT              74.35               32.76             40.36
+RSIFT             76.70               36.77             43.84
+DC-S              81.63               31.65             39.68
+DC-S2S            83.03               32.34             38.23
+DDESC             81.65               35.44             44.55
+TF-M              82.69               34.29             40.02
+TF-R              83.24               34.37             40.23

Best per task: +TF-R on Patch Verification (83.24%), +RSIFT on Image Matching (36.77%), +DDESC on Patch Retrieval (44.55%).

Evaluated Descriptors

Selected descriptors (with trivial baselines) and their properties. Speed in thousands of patches/s.

Descr.      MStd  Resz  SIFT  RSIFT  BRIEF  BBoost  ORB   DC-S  DC-S2S  DDesc  TF-M  TF-R
Dims           2    36   128    128   *256    *256  *256   256     512    128   128   128
Patch Sz      65    65    65     65     32      32    32    64      64     64    32    32
Speed CPU     67     3     2      2    333       2   333   0.3     0.2    0.1   0.6   0.6
Speed GPU      -     -     -      -      -       -     -    10       5    2.3    83    83

Descriptor types: Trivial, SIFT-like, Binary, Deep learning. '+' variants: with PCA whitening.
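The two trivial baselines in the table are simple enough to restate in code. The sketch below is one plausible reading (MStd: patch mean and standard deviation; Resz: the patch downsampled to 6 × 6 = 36 pixels and normalised), not the benchmark's exact implementation.

```python
import numpy as np
from PIL import Image

def mstd(patch):
    """MStd baseline: 2-D descriptor holding the patch mean and standard deviation."""
    return np.array([patch.mean(), patch.std()], dtype=np.float32)

def resz(patch, side=6):
    """Resz baseline: patch downsampled to side x side pixels (36-D for side=6).
    The mean/std normalisation at the end is an assumption, not a confirmed detail."""
    small = np.array(Image.fromarray(patch).resize((side, side), Image.BILINEAR),
                     dtype=np.float32).ravel()
    return (small - small.mean()) / (small.std() + 1e-8)

# Example usage with a 65x65 uint8 patch `p`:  d2 = mstd(p);  d36 = resz(p)
```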

Conclusions

• Introduced a patch-based, large-scale, reproducible benchmark with a strict evaluation protocol
• The ranking of descriptors varies across tasks
• Good performance on Patch Verification does not imply good performance on the other tasks
• Simple PCA- and power-law-based normalisation can significantly help matching and retrieval (a sketch follows below)
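A minimal sketch of the '+' post-processing, assuming a standard PCA-whitening pipeline: fit the transform on descriptors from the training split, project test descriptors, apply a signed power law and L2-normalise. The exponent and epsilon values here are assumptions, not the paper's exact settings.

```python
import numpy as np

def fit_pca_whitening(train_desc, eps=1e-8):
    """Learn the mean, principal axes and per-axis scales from training descriptors (N x D)."""
    mean = train_desc.mean(axis=0)
    cov = np.cov(train_desc - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return mean, eigvecs, np.sqrt(eigvals + eps)

def whiten_and_normalise(desc, mean, eigvecs, stds, alpha=0.5):
    """Project onto the PCA axes, divide by per-axis std (whitening),
    apply a signed power law |x|^alpha, then L2-normalise."""
    proj = (desc - mean) @ eigvecs / stds
    proj = np.sign(proj) * np.abs(proj) ** alpha
    return proj / (np.linalg.norm(proj, axis=-1, keepdims=True) + 1e-8)

# Example: params = fit_pca_whitening(train_descs); q = whiten_and_normalise(test_descs, *params)
```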

Source Code

Python/MATLAB evaluation source code available at: github.com/hpatches/hpatches-benchmark

Dataset (with the original image sequences) available at: github.com/hpatches/hpatches-dataset

References
[1] H. Aanæs, et al. Interesting interest points. IJCV, 2012.
[2] V. Chandrasekhar, et al. Feature matching performance of compact descriptors for visual search. DCC, 2014.
[3] P. Fischer, A. Dosovitskiy, and T. Brox. Descriptor matching with CNNs: a comparison to SIFT. arXiv, 2014.
[4] S. Leutenegger, et al. BRISK: Binary robust invariant scalable keypoints. ICCV, 2011.
[5] G. Levi and T. Hassner. LATCH: Learned arrangements of three patch codes. WACV, 2016.
[6] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. PAMI, 2005.
[7] O. Miksik and K. Mikolajczyk. Evaluation of local detectors and descriptors for fast feature matching. ICPR, 2012.
[8] M. Paulin, et al. Local convolutional features with unsupervised training for image retrieval. ICCV, 2015.
[9] E. Rublee, et al. ORB: An efficient alternative to SIFT or SURF. ICCV, 2011.
[10] Z. Wang, B. Fan, and F. Wu. Local intensity order pattern for feature description. ICCV, 2011.
[11] S. Winder and M. Brown. Learning local image descriptors. CVPR, 2007.
[12] T.-Y. Yang, et al. Accumulated stability voting: A robust descriptor from descriptors of multiple scales. CVPR, 2016.

Acknowledgements

Karel Lenc is supported by ERC 677195-IDIU and Vassileios Balntas is supported by FACER2VM EPSRC EP/N007743/1.