destnet: densely fused spatial transformer networks...

ANNUNZIATA, SAGONAS, CALÌ: DENSELY FUSED SPATIAL TRANSFORMER NETWORKS 1 DeSTNet: Densely Fused Spatial Transformer Networks – Supplementary Material Roberto Annunziata roberto.annunziata@onfido.com Christos Sagonas christos.sagonas@onfido.com Jacques Calì jacques.cali@onfido.com Onfido Research 3 Finsbury Avenue London, UK 1 Additional Results for Section 4.1 Figures 1 and 2 show additional alignment results obtained by the proposed DeSTNet model on GTSRB [3] and MNIST [1] datasets, respectively. Figure 1: Sample alignment results produced by the DeSTNet-4 model on the GTSRB dataset. Row 1: input image. Rows 2-4: results produced after each one of the four lev- els. 2 Architectures and Additional Results for Section 4.2 Table 1 reports the architectures of the compared CSTN-5 [2] and DeSTNet-5 models for the task of planar image alignment. c 2018. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.

Upload: others

Post on 16-Oct-2020

3 views

Category:

Documents

0 download

Report

Download

Embed Size (px):

TRANSCRIPT

Page 1: DeSTNet: Densely Fused Spatial Transformer Networks ...bmvc2018.org/contents/supplementary/pdf/0552_supp.pdfInverse compositional spatial transformer networks. In Proceedings of IEEE

ANNUNZIATA, SAGONAS, CALÌ: DENSELY FUSED SPATIAL TRANSFORMER NETWORKS1

DeSTNet: Densely Fused SpatialTransformer Networks – SupplementaryMaterial

Roberto [email protected]

Christos [email protected]

Jacques Calì[email protected]

Onfido Research3 Finsbury AvenueLondon, UK

1 Additional Results for Section 4.1Figures 1 and 2 show additional alignment results obtained by the proposed DeSTNet modelon GTSRB [3] and MNIST [1] datasets, respectively.

Figure 1: Sample alignment results produced by the DeSTNet-4 model on the GTSRBdataset. Row 1: input image. Rows 2-4: results produced after each one of the four lev-els.

2 Architectures and Additional Results for Section 4.2Table 1 reports the architectures of the compared CSTN-5 [2] and DeSTNet-5 models forthe task of planar image alignment.

Citation

{Stallkamp, Schlipsing, Salmen, and Igel} 2011

Citation

{LeCun} 1998

Citation

{Lin and Lucey} 2017

Page 2: DeSTNet: Densely Fused Spatial Transformer Networks ...bmvc2018.org/contents/supplementary/pdf/0552_supp.pdfInverse compositional spatial transformer networks. In Proceedings of IEEE

2ANNUNZIATA, SAGONAS, CALÌ: DENSELY FUSED SPATIAL TRANSFORMER NETWORKS

Figure 2: Sample alignment results produced by the DeSTNet-4 model on the MNISTdataset. Row 1: input image. Rows 2-4: results produced after each one of the four lev-els.

Model ArchitectureCSTN-5 [ conv3-64(2) | conv3-128(2) | conv3-256(2) | FC8 ]×5

DeSTNet-5 F{[ conv3-64(2) | conv3-128(2) | conv3-256(2) | FC8 ]×5}Table 1: Architectures utilized by CSTN-5 and DeSTNet-5. convD1-D2(D3): convolutionlayer with D1×D1 receptive field, D2 channels and D3 stride, FC: fully connected layer, F :fusion operation used in DeSTNet for fusing the parameters updates.

Additional qualitative results obtained by the CSTN-5 and DeSTNet-5 on the IDocDBdatabase are provided in Figs. 3, 4. These results confirm that the proposed DeSTNet ismore accurate than the CSTN and show better robustness against partial-occlusions, clutterand low-light conditions.

References[1] Yann LeCun. The mnist database of handwritten digits. http://yann. lecun.

com/exdb/mnist/, 1998.

[2] Chen-Hsuan Lin and Simon Lucey. Inverse compositional spatial transformer networks.In Proceedings of IEEE International Conference on Computer Vision & Pattern Recog-nition (CVPR), pages 2568–2576, 2017.

[3] Johannes Stallkamp, Marc Schlipsing, Jan Salmen, and Christian Igel. The german traf-fic sign recognition benchmark: a multi-class classification competition. In Proceedingsof International Joint Conference on Neural Networks (IJCNN), pages 1453–1460, 2011.