Page 1

Global Second-order Pooling for Deep CNN Li Peihua, Wang Qilong

2018-11-23

Higher-order Statistical Modeling based Deep CNNs (Part-II)

Page 2

Outline

• Global Second-order Pooling
• Matrix Power Normalization and Fast Training
• Global Distribution Modeling for CNN
• Conclusion

Page 3

Global Second-order Pooling

• Deep CNN: Learning Representation from Local to Global

[Figure: Image → local representations of image regions with increasing receptive field → global representation of the image (global average pooling) → classifier]

Page 4

Global Second-order Pooling

• Global Average Pooling: Widespread in Inception, ResNet, DenseNet etc.

[Figure: Image → local representations of image regions with increasing receptive field → global representation of the image (global average pooling) → classifier]

Min Lin, Qiang Chen, Shuicheng Yan. Network in network. In ICLR, 2014.

Page 5

Global Second-order Pooling

• Global Average Pooling: Widespread in Inception, ResNet, DenseNet etc.

[Figure: Image → local representations of image regions with increasing receptive field → last feature maps (width w, height h, p channels) → global average pooling → global representation of the image → classifier]

Min Lin, Qiang Chen, Shuicheng Yan. Network in network. In ICLR, 2014.

Page 6

Global Second-order Pooling

• Global Covariance Pooling

Capturing correlations of different channels

[Figure: Image → local representations of image regions with increasing receptive field → last feature maps (width w, height h, p channels) → global representation of the image → classifier]
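The difference between global average pooling and global covariance pooling can be sketched in a few lines. This is a minimal NumPy illustration, assuming feature maps stored as an (h, w, p) array; the function names and the 7×7×16 example size are hypothetical and not taken from the slides.

```python
import numpy as np

def global_average_pooling(feature_maps):
    """First-order pooling: (h, w, p) -> (p,) channel means."""
    h, w, p = feature_maps.shape
    x = feature_maps.reshape(h * w, p)       # n = h*w features of dimension p
    return x.mean(axis=0)

def global_covariance_pooling(feature_maps):
    """Second-order pooling: (h, w, p) -> (p, p) channel covariance."""
    h, w, p = feature_maps.shape
    x = feature_maps.reshape(h * w, p)
    centered = x - x.mean(axis=0, keepdims=True)
    return centered.T @ centered / (h * w)   # 1/n normalization

rng = np.random.default_rng(0)
maps = rng.standard_normal((7, 7, 16))       # hypothetical 7x7 maps, 16 channels
gap = global_average_pooling(maps)
cov = global_covariance_pooling(maps)
```

The average keeps one number per channel, while the covariance keeps one number per pair of channels, which is how it captures correlations between them.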

Page 7

Global Second-order Pooling

• From O2P (ECCV’12) to DeepO2P (ICCV’15)

[Figure: covariance matrix $\Sigma$ → $\log(\Sigma)$]

J. Carreira et al. Semantic Segmentation with Second-Order Pooling. In ECCV, 2012.
J. Carreira et al. Freeform region description with second-order pooling. IEEE TPAMI, 2015.

Page 8

Global Second-order Pooling

• From O2P (ECCV’12) to DeepO2P (ICCV’15)

[Figure: Image → local representations of image regions with increasing receptive field → last feature maps (w × h × p) → $\log(\Sigma)$ as global representation of the image → classifier]

First end-to-end global covariance pooling

Backpropagation of Singular Value Decomposition (SVD)

Performance is not competitive

[DeepO2P] C. Ionescu et al. Matrix Backpropagation for Deep Networks with Structured Layers. In ICCV, 2015.
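The matrix logarithm used by O2P and DeepO2P can be illustrated with an eigendecomposition. This is a minimal sketch, not the DeepO2P implementation; the `eps` regularizer is a hypothetical addition, since the logarithm requires strictly positive eigenvalues.

```python
import numpy as np

def matrix_log_spd(cov, eps=1e-6):
    """Matrix logarithm of a symmetric PSD matrix via eigendecomposition.

    eps is a hypothetical regularizer that keeps all eigenvalues positive.
    """
    shifted = cov + eps * np.eye(cov.shape[0])
    eigvals, eigvecs = np.linalg.eigh(shifted)       # shifted = U diag(lambda) U^T
    return (eigvecs * np.log(eigvals)) @ eigvecs.T   # U diag(log lambda) U^T

cov = np.array([[2.0, 0.5],
                [0.5, 1.0]])
log_cov = matrix_log_spd(cov, eps=0.0)               # cov is already positive definite
```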

Page 9

Global Second-order Pooling

• Global Covariance Pooling: Bilinear CNN (i.e. B-CNN, ICCV’15)

[Figure: Image → local representations of image regions with increasing receptive field → last feature maps (w × h × p) → global representation of the image → classifier]

First end-to-end global covariance pooling

Element-wise SQRT + l2 normalization

Competitive performance on fine-grained classification

[B-CNN] T.-Y. Lin, A. RoyChowdhury, S. Maji. Bilinear CNN models for fine-grained visual recognition. In ICCV, 2015.
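The element-wise normalization named above can be sketched as follows. This assumes a signed square root (so that negative entries are handled) followed by l2 normalization of the flattened matrix; the function name and the `eps` guard are hypothetical.

```python
import numpy as np

def bcnn_normalize(pooled, eps=1e-12):
    """Element-wise signed square root, then l2 normalization."""
    v = pooled.reshape(-1)
    v = np.sign(v) * np.sqrt(np.abs(v))      # acts on entries, not on eigenvalues
    return v / (np.linalg.norm(v) + eps)     # eps guards against division by zero

pooled = np.array([[4.0, -1.0],
                   [-1.0, 9.0]])
z = bcnn_normalize(pooled)
```

Note that this acts on the matrix entries, whereas matrix power normalization acts on the eigenvalues.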

Page 10

Global Second-order Pooling

• Challenges of Global Covariance Pooling in CNN

[Figure: Image → local representations of image regions with increasing receptive field → last feature maps (w × h × p, giving n = wh features of dimension p) → global representation of the image → classifier]

$$\Sigma = \frac{1}{n}\sum_{i=1}^{n}(\mathbf{x}_i - \boldsymbol{\mu})(\mathbf{x}_i - \boldsymbol{\mu})^T$$

1. Statistical problem of small sample/high dimensionality (n < p)

Network | p/n
AlexNet | 256/36
VGG-16 | 512/49
ResNet | 2048/49

2. Manifold structure of covariance matrices

Geodesic distance needs to be considered

Will global second-order pooling work at large scale?

DeepO2P and B-CNN fail to perform well

Page 11

Global Second-order Pooling

• Challenges of Global Covariance Pooling in CNN

[Figure: Image → local representations of image regions with increasing receptive field → last feature maps (w × h × p, giving n = wh features of dimension p) → global representation of the image → classifier]

$$\Sigma = \frac{1}{n}\sum_{i=1}^{n}(\mathbf{x}_i - \boldsymbol{\mu})(\mathbf{x}_i - \boldsymbol{\mu})^T, \quad \mathbf{x}_i \in \mathbb{R}^p$$

1. Statistical problem of small sample/high dimensionality (n < p)

Network | p/n
AlexNet | p=256/n=36
VGG-16 | p=512/n=49
ResNet | p=2048/n=49
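A quick numerical check makes the n < p problem concrete: with fewer samples than dimensions, the sample covariance cannot have full rank. The sketch below uses synthetic Gaussian data as a stand-in for real convolutional features, with the VGG-16 sizes from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 49, 512                        # VGG-16 setting: n = 7*7 positions, p = 512
x = rng.standard_normal((n, p))       # synthetic stand-in for real features
centered = x - x.mean(axis=0)
cov = centered.T @ centered / n       # p x p sample covariance
rank = np.linalg.matrix_rank(cov)     # at most n - 1 after centering
```

Since the rank is at most n − 1, most of the p eigenvalues are exactly zero, so the estimate is singular.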


Page 13

Global Second-order Pooling

DeepO2P and B-CNN fail to address the following well:

1. Statistical small sample/high dimensionality;

2. Manifold structure of covariance matrices;

3. Whether it works at large scale.

[DeepO2P] C. Ionescu et al. Matrix Backpropagation for Deep Networks with Structured Layers. In ICCV, 2015.
[B-CNN] T.-Y. Lin, A. RoyChowdhury, S. Maji. Bilinear CNN models for fine-grained visual recognition. In ICCV, 2015.

Page 14

Outline

• Global Covariance Pooling
• Matrix Power Normalization and Fast Training
• Global Distribution Modeling for CNN
• Conclusion

Page 15

Matrix Power Normalization and Fast Training

• MPN-COV (ICCV’17)

Capturing correlations of different channels

α = 0.5 performs best

[MPN-COV] Peihua Li, Jiangtao Xie, Qilong Wang and Wangmeng Zuo. Is Second-order Information Helpful for Large-scale Visual Recognition? In ICCV, 2017.

Page 16

Matrix Power Normalization and Fast Training

• MPN-COV (ICCV’17)

α = 0.5 performs best

Challenge 1: Small sample/high dimensionality, e.g. ResNet, n = 49, p = 2048

Challenge 2: Exploiting geometry of covariance spaces

[MPN-COV] Peihua Li, Jiangtao Xie, Qilong Wang and Wangmeng Zuo. Is Second-order Information Helpful for Large-scale Visual Recognition? In ICCV, 2017.

Page 17

Why MPN-COV works: Statistical Insight (qualitatively)

Small sample/high dimensionality (n < p)

$$\Sigma = \frac{1}{n}\sum_{i=1}^{n}(\mathbf{x}_i - \boldsymbol{\mu})(\mathbf{x}_i - \boldsymbol{\mu})^T, \quad \mathbf{x}_i \in \mathbb{R}^p, \quad n = wh$$

Adverse effect: as p/n increases, the largest (resp. smallest) eigenvalues tend to be much larger (resp. smaller) than the ground truth.

O. Ledoit and M. Wolf. A well-conditioned estimator for large-dimensional covariance matrices. J. Multivariate Analysis, 88(2):365–411, 2004.
C. Stein. Lectures on the theory of estimation of many parameters. Journal of Soviet Mathematics, 34(1):1373–1403, 1986.
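The eigenvalue distortion described above can be reproduced on synthetic data. In this sketch the true covariance is assumed to be the identity, so every true eigenvalue equals 1; the sizes follow the AlexNet setting (p = 256, n = 36) and the data are random, not real features.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 36, 256                           # AlexNet setting from the slides
x = rng.standard_normal((n, p))          # true covariance is the identity
centered = x - x.mean(axis=0)
sample_cov = centered.T @ centered / n
eigvals = np.linalg.eigvalsh(sample_cov)  # ascending order
largest, smallest = eigvals[-1], eigvals[0]
```

The largest sample eigenvalue ends up well above 1 and the smallest collapses to 0 (up to round-off), even though all true eigenvalues are equal.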

Page 18

Why MPN-COV works: Statistical Insight (qualitatively, FP)

Log-E metric:

$$\log(\Sigma) = \mathbf{U}\,\mathrm{diag}(\log\lambda_1, \ldots, \log\lambda_p)\,\mathbf{U}^T$$

Power-E metric:

$$\Sigma^{\alpha} = \mathbf{U}\,\mathrm{diag}(\lambda_1^{\alpha}, \ldots, \lambda_p^{\alpha})\,\mathbf{U}^T$$

Examples under the Power-E metric: $10^{-4} \to 10^{-2}$, $0.64 \to 0.8$

The Power-E metric relatively shrinks the largest eigenvalues while stretching the smallest eigenvalues.

Page 19

Why MPN-COV works: Statistical Insight (qualitatively, FP)

Power-E metric:

$$\Sigma^{\alpha} = \mathbf{U}\,\mathrm{diag}(\lambda_1^{\alpha}, \ldots, \lambda_p^{\alpha})\,\mathbf{U}^T$$

Example: $10^{-4} \to 10^{-2}$

Log-E metric:

$$\log(\Sigma) = \mathbf{U}\,\mathrm{diag}(\log\lambda_1, \ldots, \log\lambda_p)\,\mathbf{U}^T$$

Examples: $0.9 \to -0.1$, $10^{-4} \to -9.2$

The Log-E metric overstretches the smallest eigenvalues, changing the order of significance of the eigenvalues.
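The numeric examples on these two slides can be checked directly on scalar eigenvalues. This is a small sketch assuming α = 0.5 and the natural logarithm; the variable names are illustrative.

```python
import math

alpha = 0.5
eigenvalues = [0.64, 0.9, 1e-4]

# Power-E: lambda -> lambda ** alpha
power = [lam ** alpha for lam in eigenvalues]

# Log-E: lambda -> log(lambda)
log = [math.log(lam) for lam in eigenvalues]

# Which eigenvalue has the largest magnitude after each mapping?
dominant_power = max(range(3), key=lambda i: abs(power[i]))
dominant_log = max(range(3), key=lambda i: abs(log[i]))
```

After the power mapping the smallest eigenvalue is still the least significant one, whereas after the logarithm it has by far the largest magnitude.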

Page 20

Why MPN-COV works: Statistical Insight (qualitatively, FP)

• Log-E: Smallest eigenvalues affect the gradient considerably

[Figure: two panels, Power-E metric and Log-E metric]
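One way to see why the smallest eigenvalues dominate the Log-E gradient is to compare the scalar derivatives of the two eigenvalue functions. This is a simplified view that looks only at the per-eigenvalue derivative and assumes α = 0.5; the full backward pass through the eigendecomposition contains further terms.

$$\frac{d}{d\lambda}\log\lambda = \frac{1}{\lambda}, \qquad \frac{d}{d\lambda}\lambda^{\alpha} = \alpha\,\lambda^{\alpha-1}$$

$$\lambda = 10^{-4}: \qquad \frac{1}{\lambda} = 10^{4}, \qquad 0.5\,\lambda^{-0.5} = 0.5 \times 10^{2} = 50$$

For the same tiny eigenvalue, the logarithm amplifies the gradient 200 times more than the square root does.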


Page 22

Why MPN-COV works: Statistical Insight (quantitatively)

• Regularized Maximum Likelihood Estimation (MLE) Method: vN-MLE

Classical MLE:

$$\mathbf{P} = \frac{1}{N}\sum_{i=1}^{N}(\mathbf{x}_i - \boldsymbol{\mu})(\mathbf{x}_i - \boldsymbol{\mu})^T$$

[RAID-G] Qilong Wang, Peihua Li, Wangmeng Zuo, Lei Zhang. RAID-G: Robust Estimation of Approximate Infinite Dimensional Gaussian with Application to Material Recognition. In CVPR, 2016.

Page 23

Why MPN-COV works: Geometric Insight

• Power-Euclidean metric exploits geometry

[Figure label: Log-E metric]

[MPN-COV] Peihua Li, Jiangtao Xie, Qilong Wang and Wangmeng Zuo. Is Second-order Information Helpful for Large-scale Visual Recognition? In ICCV, 2017.

Page 24

Why MPN-COV works: Geometric Insight

• Exploiting geometry of covariance spaces

Log-E metric
• Measures the true geodesic distance
• The input must be strictly positive

Power-E metric
• Measures the geodesic distance approximately
• Allows non-negative input

[MPN-COV] Peihua Li, Jiangtao Xie, Qilong Wang and Wangmeng Zuo. Is Second-order Information Helpful for Large-scale Visual Recognition? In ICCV, 2017.

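Page 25

Matrix Power Normalization and Fast Training

Forward propagation:

$$\mathbf{X} \mapsto \mathbf{P}, \quad \mathbf{P} = \mathbf{X}\bar{\mathbf{I}}\mathbf{X}^T$$

$$\mathbf{P} \mapsto (\mathbf{U}, \boldsymbol{\Lambda}), \quad \mathbf{P} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^T$$

$$(\mathbf{U}, \boldsymbol{\Lambda}) \mapsto \mathbf{Q}, \quad \mathbf{Q} = \mathbf{U}\mathbf{F}(\boldsymbol{\Lambda})\mathbf{U}^T$$

Backward propagation:

$$\mathrm{tr}\left(\left(\frac{\partial l}{\partial \mathbf{X}}\right)^T d\mathbf{X}\right) = \mathrm{tr}\left(\left(\frac{\partial l}{\partial \mathbf{P}}\right)^T d\mathbf{P}\right)$$

$$\mathrm{tr}\left(\left(\frac{\partial l}{\partial \mathbf{P}}\right)^T d\mathbf{P}\right) = \mathrm{tr}\left(\left(\frac{\partial l}{\partial \mathbf{U}}\right)^T d\mathbf{U} + \left(\frac{\partial l}{\partial \boldsymbol{\Lambda}}\right)^T d\boldsymbol{\Lambda}\right)$$

$$\mathrm{tr}\left(\left(\frac{\partial l}{\partial \mathbf{U}}\right)^T d\mathbf{U} + \left(\frac{\partial l}{\partial \boldsymbol{\Lambda}}\right)^T d\boldsymbol{\Lambda}\right) = \mathrm{tr}\left(\left(\frac{\partial l}{\partial \mathbf{Q}}\right)^T d\mathbf{Q}\right)$$

[MPN-COV] Peihua Li, Jiangtao Xie, Qilong Wang and Wangmeng Zuo. Is Second-order Information Helpful for Large-scale Visual Recognition? In ICCV, 2017.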
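The forward pass of matrix power normalization can be sketched with an eigendecomposition. This is an illustrative NumPy version, not the authors' code: it assumes X is a p × n matrix of n features, that the matrix between X and its transpose is the centering matrix (1/n)(I − (1/n)11ᵀ), and that F(Λ) = Λ^α with α = 0.5. The clipping step is a hypothetical guard against round-off.

```python
import numpy as np

def mpn_cov_forward(x, alpha=0.5):
    """x: (p, n) features. Returns the covariance P and Q = P ** alpha."""
    p, n = x.shape
    i_bar = (np.eye(n) - np.ones((n, n)) / n) / n    # assumed centering matrix
    cov = x @ i_bar @ x.T                             # P = X I_bar X^T
    eigvals, eigvecs = np.linalg.eigh(cov)            # P = U diag(lambda) U^T
    eigvals = np.clip(eigvals, 0.0, None)             # guard tiny negative round-off
    q = (eigvecs * eigvals ** alpha) @ eigvecs.T      # Q = U diag(lambda^alpha) U^T
    return cov, q

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5))      # hypothetical p = 8 channels, n = 5 positions
cov, q = mpn_cov_forward(x)
```

Here n < p, so the covariance is singular; the power function still applies because it only needs non-negative eigenvalues.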

Page 26

Matrix Power Normalization and Fast Training

[Figure: diagram labeled MPN]

[MPN-COV] Peihua Li, Jiangtao Xie, Qilong Wang and Wangmeng Zuo. Is Second-order Information Helpful for Large-scale Visual Recognition? In ICCV, 2017.

Page 27

Matrix Power Normalization and Fast Training

• Downside of Eigendecomposition (or SVD)

[Figure label: MPN-COV (ICCV’17)]

[MPN-COV] Peihua Li, Jiangtao Xie, Qilong Wang and Wangmeng Zuo. Is Second-order Information Helpful for Large-scale Visual Recognition? In ICCV, 2017.
[G2DeNet] Qilong Wang, Peihua Li, Lei Zhang. G2DeNet: Global Gaussian Distribution Embedding Network and Its Application to Visual Recognition. In CVPR, 2017 (Oral).

Page 28

Matrix Power Normalization and Fast Training

• Downside of Eigendecomposition (or SVD)

[Figure: MPN-COV (ICCV’17), with the EIG/SVD step marked]

[Table: Time (ms) taken by EIG/SVD of a 256×256 covariance matrix]

EIG/SVD is inefficient because it is GPU-unfriendly.

[MPN-COV] Peihua Li, Jiangtao Xie, Qilong Wang and Wangmeng Zuo. Is Second-order Information Helpful for Large-scale Visual Recognition? In ICCV, 2017.

Page 29

Matrix Power Normalization and Fast Training

• Iterative matrix square root (CVPR’18)

[Figure labels: "inefficient" and "very fast!"]

[iSQRT-COV] Peihua Li, Jiangtao Xie, Qilong Wang and Zilin Gao. Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization. In CVPR, 2018.

Page 30

Matrix Power Normalization and Fast Training

• Iterative matrix square root (CVPR’18)

Capturing channel correlation (2nd-order statistics)

[Figure label: cov matrix]

The matrix square root is computed via a GPU-friendly iterative method, which is much faster than GPU-hostile EIG.

Page 31

Iterative Matrix Square Root Normalization (CVPR’18)

A Directed Acyclic Graph (DAG) with Iteration

[iSQRT-COV] Peihua Li, Jiangtao Xie, Qilong Wang and Zilin Gao. Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization. In CVPR, 2018.

Page 32

Iterative Matrix Square Root Normalization (CVPR’18)

Forward propagation:

$$\mathbf{Y}_k = \frac{1}{2}\mathbf{Y}_{k-1}(3\mathbf{I} - \mathbf{Z}_{k-1}\mathbf{Y}_{k-1})$$

$$\mathbf{Z}_k = \frac{1}{2}(3\mathbf{I} - \mathbf{Z}_{k-1}\mathbf{Y}_{k-1})\mathbf{Z}_{k-1}$$

Backward gradient:

$$\frac{\partial l}{\partial \mathbf{Y}_{k-1}} = \frac{1}{2}\left(\frac{\partial l}{\partial \mathbf{Y}_k}(3\mathbf{I} - \mathbf{Y}_{k-1}\mathbf{Z}_{k-1}) - \mathbf{Z}_{k-1}\frac{\partial l}{\partial \mathbf{Z}_k}\mathbf{Z}_{k-1} - \mathbf{Z}_{k-1}\mathbf{Y}_{k-1}\frac{\partial l}{\partial \mathbf{Y}_k}\right)$$

$$\frac{\partial l}{\partial \mathbf{Z}_{k-1}} = \frac{1}{2}\left((3\mathbf{I} - \mathbf{Y}_{k-1}\mathbf{Z}_{k-1})\frac{\partial l}{\partial \mathbf{Z}_k} - \mathbf{Y}_{k-1}\frac{\partial l}{\partial \mathbf{Y}_k}\mathbf{Y}_{k-1} - \frac{\partial l}{\partial \mathbf{Z}_k}\mathbf{Z}_{k-1}\mathbf{Y}_{k-1}\right)$$

[iSQRT-COV] Peihua Li, Jiangtao Xie, Qilong Wang and Zilin Gao. Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization. In CVPR, 2018.

Page 33

Iterative Matrix Square Root Normalization (CVPR’18)

Guarantee convergence:

$$\|\mathbf{A} - \mathbf{I}\| < 1$$

Gradient:

$$\frac{\partial l}{\partial \Sigma} = -\frac{1}{(\mathrm{tr}(\Sigma))^2}\,\mathrm{tr}\left(\left(\frac{\partial l}{\partial \mathbf{A}}\right)^T\Sigma\right)\mathbf{I} + \frac{1}{\mathrm{tr}(\Sigma)}\frac{\partial l}{\partial \mathbf{A}} + \frac{1}{2\sqrt{\mathrm{tr}(\Sigma)}}\,\mathrm{tr}\left(\left(\frac{\partial l}{\partial \mathbf{C}}\right)^T\mathbf{Y}_N\right)\mathbf{I}$$

Page 34

Iterative Matrix Square Root Normalization

Gradient of post-compensation:

$$\frac{\partial l}{\partial \mathbf{Y}_N} = \sqrt{\mathrm{tr}(\Sigma)}\,\frac{\partial l}{\partial \mathbf{C}}$$

$$\left.\frac{\partial l}{\partial \Sigma}\right|_{\mathrm{post}} = \frac{1}{2\sqrt{\mathrm{tr}(\Sigma)}}\,\mathrm{tr}\left(\left(\frac{\partial l}{\partial \mathbf{C}}\right)^T\mathbf{Y}_N\right)\mathbf{I}$$

[Table: Impact of post-compensation on iSQRT-COV with ResNet-50 architecture on ImageNet.]
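The three stages of iterative matrix square root normalization (pre-normalization, coupled iteration, post-compensation) can be sketched together in NumPy. This is a forward-only illustration under stated assumptions: pre-normalization divides by the trace, the iteration starts from Y_0 = A and Z_0 = I, and post-compensation multiplies by the square root of the trace. The function name and the default iteration count are hypothetical.

```python
import numpy as np

def isqrt_cov(cov, num_iters=5):
    """Approximate matrix square root of an SPD matrix without EIG/SVD."""
    identity = np.eye(cov.shape[0])
    trace = np.trace(cov)
    a = cov / trace                        # pre-normalization: eigenvalues in (0, 1]
    y, z = a, identity                     # assumed start: Y_0 = A, Z_0 = I
    for _ in range(num_iters):
        t = 0.5 * (3.0 * identity - z @ y)
        y, z = y @ t, t @ z                # Y_k and Z_k use the previous Y, Z
    return np.sqrt(trace) * y              # post-compensation: C = sqrt(tr) * Y_N

cov = np.array([[2.0, 0.5],
                [0.5, 1.0]])
sqrt_cov = isqrt_cov(cov, num_iters=10)
```

Only matrix multiplications are involved, which is why this style of iteration maps well onto GPUs. Without the trace scaling at the end, the output would be the square root of the normalized matrix rather than of the original covariance.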

Page 35

Evaluation

• Time (FP+BP, ms) of a single meta-layer with AlexNet architecture on ImageNet

[iSQRT-COV] Peihua Li, Jiangtao Xie, Qilong Wang and Zilin Gao. Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization. In CVPR, 2018.

Page 36

Evaluation

• Images per second (FP+BP) of network training with AlexNet architecture.

Our iSQRT-COV network makes better use of the computing power of multiple GPUs than MPN-COV.

Page 37

Evaluation

• Convergence curves of different networks trained with ResNet-50 architecture on ImageNet.

Our proposed iSQRT-COV network converges in far fewer epochs.

Page 38

Evaluation

• Evaluation on ImageNet

Classes: 1000; Train: 1.28 million; Validation: 50k; Test: 100k

http://www.image-net.org/

Page 39

Evaluation

• Error comparison of second-order networks with first-order ones on ImageNet

[iSQRT-COV] Peihua Li, Jiangtao Xie, Qilong Wang and Zilin Gao. Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization. In CVPR, 2018.

Page 40

Generalization to Small-scale Datasets

Dataset | Classes | Images
Birds (CUB-200-2011) | 200 | 11,788
Aircrafts (FGVC-aircraft) | 100 | 10,000
Cars (Stanford cars) | 196 | 16,185

[iSQRT-COV] Peihua Li, Jiangtao Xie, Qilong Wang and Zilin Gao. Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization. In CVPR, 2018.

Page 41

Generalization to Small-scale Datasets

[Table with columns: Model & Method, Dim]


Page 44

[Figure: Image → local representations of image regions with increasing receptive field → last feature maps (w × h × d) → global representation of the image → classifier]

Why not a probability distribution?

Page 45

Outline

• Global Covariance Pooling
• Matrix Power Normalization and Fast Training
• Global Distribution Modeling for CNN
• Conclusion

Page 46

G2DeNet (CVPR’17)

• Why not a probability distribution?

Maximum entropy distribution with specified mean and covariance
• Least prior information
• Physical systems tend to be of maximum entropy

[G2DeNet] Qilong Wang, Peihua Li, Lei Zhang. G2DeNet: Global Gaussian Distribution Embedding Network and Its Application to Visual Recognition. In CVPR, 2017 (Oral).

Page 47

G2DeNet (CVPR’17)

• Why not a probability distribution?

Capturing Gaussian distribution

[Figure: Image → local representations of image regions with increasing receptive field → last feature maps (w × h × d) → global representation of the image → classifier]

[G2DeNet] Qilong Wang, Peihua Li, Lei Zhang. G2DeNet: Global Gaussian Distribution Embedding Network and Its Application to Visual Recognition. In CVPR, 2017 (Oral).

Page 48

G2DeNet (CVPR’17)

• Why not Global Covariance + Average Pooling?

Capturing Gaussian distribution

[Figure: Image → local representations of image regions with increasing receptive field → last feature maps (w × h × d) → global representation of the image → classifier]

[G2DeNet] Qilong Wang, Peihua Li, Lei Zhang. G2DeNet: Global Gaussian Distribution Embedding Network and Its Application to Visual Recognition. In CVPR, 2017 (Oral).

Page 49

G2DeNet (CVPR’17)

Q: How to construct our trainable global Gaussian embedding layer?

A: The key is to give the explicit forms of Gaussian distributions.

[Figure: forward propagation (Riemannian geometry structure, algebraic structure) and backward propagation (differentiable)]

Page 50

Gaussian embedding [TPAMI’17]

The space of Gaussians is a Riemannian manifold having special geometric structure.

[TPAMI’17] shows that the space of Gaussians is equipped with a Lie group structure.

$$\mathcal{N}(\boldsymbol{\mu}, \Sigma) \;\longmapsto\; \mathbf{A}_{\boldsymbol{\mu},\mathbf{L}^{-T}} = \begin{bmatrix}\mathbf{L}^{-T} & \boldsymbol{\mu}\\ \mathbf{0}^T & 1\end{bmatrix} \;\longmapsto\; \mathbf{Y} = \begin{bmatrix}\Sigma + \boldsymbol{\mu}\boldsymbol{\mu}^T & \boldsymbol{\mu}\\ \boldsymbol{\mu}^T & 1\end{bmatrix}^{\frac{1}{2}}$$

Gaussian → positive upper triangular matrix (Cholesky decomp., $\Sigma = \mathbf{L}^{-T}\mathbf{L}^{-1}$) → symmetric positive definite (SPD) matrix (left polar decomp., $\mathbf{A}_{\boldsymbol{\mu},\mathbf{L}^{-T}} = \mathbf{Y}\mathbf{O}$)

[TPAMI’17] Peihua Li, Qilong Wang, Hui Zeng and Lei Zhang. Local Log-Euclidean Multivariate Gaussian Descriptor and Its Application to Image Classification. TPAMI, 2017.

Presenter notes:
For this purpose, we first give an explicit form of the Gaussian. Previous work shows that the space of Gaussians is a manifold. Our recent work shows that the space of Gaussians is not only a manifold but can also be equipped with a Lie group structure. Using Lie group theory, we first map a Gaussian to a unique positive upper triangular matrix A via the Cholesky decomposition of the covariance matrix; A is then mapped to an SPD matrix Y via the unique left polar decomposition. Through this series of mappings, a Gaussian is uniquely embedded into a square-rooted SPD matrix. Unlike existing methods, our form considers both the geometric and the algebraic structure of Gaussians.
Page 52

G2DeNet (CVPR’17)

[G2DeNet] Qilong Wang, Peihua Li, Lei Zhang. G2DeNet: Global Gaussian Distribution Embedding Network and Its Application to Visual Recognition. In CVPR, 2017 (Oral). [TPAMI’17] Peihua Li, Qilong Wang, Hui Zeng and Lei Zhang. Local Log-Euclidean Multivariate Gaussian Descriptor and Its Application to Image Classification. TPAMI, 2017.

Page 53

G2DeNet (CVPR’17)

Gaussian embedding:

$$\mathcal{N}(\boldsymbol{\mu}, \Sigma) \;\longmapsto\; \mathbf{Y}^{\frac{1}{2}} = \begin{bmatrix}\Sigma + \boldsymbol{\mu}\boldsymbol{\mu}^T & \boldsymbol{\mu}\\ \boldsymbol{\mu}^T & 1\end{bmatrix}^{\frac{1}{2}}$$

1. Matrix Partition Sub-layer: Y is a function of convolutional features X.

$$\mathbf{Y} = f_{\mathrm{MPL}}(\mathbf{X}) = \begin{bmatrix}\Sigma + \boldsymbol{\mu}\boldsymbol{\mu}^T & \boldsymbol{\mu}\\ \boldsymbol{\mu}^T & 1\end{bmatrix} = \frac{1}{N}\mathbf{A}\mathbf{X}^T\mathbf{X}\mathbf{A}^T + \frac{2}{N}\left(\mathbf{A}\mathbf{X}^T\mathbf{1}\mathbf{b}^T\right)_{\mathrm{sym}} + \mathbf{B}$$

2. Square-rooted SPD Matrix Sub-layer: computing the square root of Y via SVD.

$$\mathbf{Z} = f_{\mathrm{ESRL}}(\mathbf{Y}) = \mathbf{Y}^{\frac{1}{2}} = \mathbf{U}\boldsymbol{\Lambda}^{\frac{1}{2}}\mathbf{U}^T$$

[G2DeNet] Qilong Wang, Peihua Li, Lei Zhang. G2DeNet: Global Gaussian Distribution Embedding Network and Its Application to Visual Recognition. In CVPR, 2017 (Oral).
[TPAMI’17] Peihua Li, Qilong Wang, Hui Zeng and Lei Zhang. Local Log-Euclidean Multivariate Gaussian Descriptor and Its Application to Image Classification. TPAMI, 2017.

Presenter notes:
Now we can represent a Gaussian by the square root of an SPD matrix Y, which is composed of the mean and covariance of the Gaussian. Using this embedding, we construct the global Gaussian embedding layer from two sub-layers. The first is the matrix partition sub-layer, which writes the embedding matrix Y as a function of the convolutional features X. The second computes the square root of the embedding matrix Y via SVD.
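The two sub-layers can be sketched in NumPy as follows. This is an illustration under assumptions that the slide does not spell out: X holds N features of dimension k as rows, A stacks a k × k identity on a zero row, b is the last standard basis vector, B has a single 1 in its bottom-right corner, and (M)_sym means (M + Mᵀ)/2. The function name is hypothetical, and an eigendecomposition is used in place of SVD, which is equivalent for symmetric PSD matrices.

```python
import numpy as np

def gaussian_embedding(x):
    """x: (N, k) features as rows. Returns (Y, Z) with Z = Y^(1/2)."""
    n, k = x.shape
    a = np.vstack([np.eye(k), np.zeros((1, k))])    # (k+1, k), assumed form of A
    b = np.zeros((k + 1, 1))
    b[-1, 0] = 1.0                                  # assumed form of b
    big_b = b @ b.T                                 # assumed B: 1 in bottom-right corner
    ones = np.ones((n, 1))
    m = a @ x.T @ ones @ b.T                        # N * mu placed in the last column
    # Matrix partition sub-layer: (2/N) * sym(m) equals (m + m.T) / N
    y = a @ x.T @ x @ a.T / n + (m + m.T) / n + big_b
    # Square-root sub-layer via eigendecomposition of the symmetric matrix Y
    eigvals, eigvecs = np.linalg.eigh(y)
    z = (eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))) @ eigvecs.T
    return y, z

rng = np.random.default_rng(0)
x = rng.standard_normal((20, 3))     # hypothetical N = 20 features, k = 3
y, z = gaussian_embedding(x)
```

The resulting Y has Σ + μμᵀ in its top-left block, μ in its last row and column, and 1 in the corner, so mean and covariance are carried in one matrix.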
Page 54

Existing Deep CNNs

[Figure: Input image → convolutional layers → X → global pooling layer → loss, with BP for convolutional layers]

Unimodal distributions?

Existing global pooling layers assume unimodal distributions, which cannot fully capture statistics of convolutional activations.

Qilong Wang et al. Global Gated Mixture of Second-order Pooling for Improving Deep Convolutional Neural Networks. In NIPS, 2018.

Page 55

Existing Deep CNNs

Qilong Wang et al. Global Gated Mixture of Second-order Pooling for Improving Deep Convolutional Neural Networks. In NIPS, 2018.

Learning Deep Features for Discriminative Localization. CVPR, 2016

Page 56

Mixture Model

• Ensemble of multiple models
• High computational cost as the number of component models (CMs) gets large, while a small number of CMs may be insufficient for characterizing complex distributions.
• A simple direct ensemble will make all CMs tend to learn similar characteristics.

Qilong Wang et al. Global Gated Mixture of Second-order Pooling for Improving Deep Convolutional Neural Networks. In NIPS, 2018.

Page 57

Deep CNNs with GM-SOP

[Figure: Image → convolution layers → X → gated mixture of second-order pooling → final representation → prediction layers → classification loss. The gated mixture combines a sparsity-constrained gating module (top-K weights $\omega_i(\mathbf{X})$, with a balance loss) and a bank of parametric component models (CMs) $M_i(\mathbf{X})$ (1st-CM, 2nd-CM, 3rd-CM, 4th-CM, ..., each a conv. layer followed by SR-SOP). Legend: dot product, sum.]

Qilong Wang et al. Global Gated Mixture of Second-order Pooling for Improving Deep Convolutional Neural Networks. In NIPS, 2018.
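The way gate weights and component models are combined can be sketched as a sparse weighted sum. This is a deliberately simplified illustration, not the gating module from the paper: it assumes a plain softmax over the K largest gate logits, leaves out the balance loss, and uses hypothetical names throughout.

```python
import numpy as np

def top_k_gate(logits, k):
    """Softmax over the k largest logits; all other weights are zero."""
    top = np.argsort(logits)[-k:]
    weights = np.zeros_like(logits, dtype=float)
    shifted = np.exp(logits[top] - logits[top].max())   # numerically stable softmax
    weights[top] = shifted / shifted.sum()
    return weights

def gated_mixture(gate_logits, component_outputs, k):
    """Weighted sum over components: sum_i omega_i(X) * M_i(X)."""
    weights = top_k_gate(gate_logits, k)
    return np.tensordot(weights, component_outputs, axes=1), weights

gate_logits = np.array([0.1, 2.0, -1.0, 1.0])                  # hypothetical gate scores
component_outputs = np.stack([i * np.eye(2) for i in range(4)])  # stand-ins for M_i(X)
mixed, weights = gated_mixture(gate_logits, component_outputs, k=2)
```

Because only K weights are non-zero, only K component models need to be evaluated per image, which is what keeps a large bank of CMs affordable.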

Page 58

Sparsity-constrained Gating Module

Qilong Wang et al. Global Gated Mixture of Second-order Pooling for Improving Deep Convolutional Neural Networks. In NIPS, 2018.

Page 59: Higher-order Statistical Modeling based Deep CNNs Part-II · 2018/11/23  · Dimensional Gaussian with Application to Materiel Recognition. In CVPR, 2016 Why MPN-COV works: Statistical

Component Models

SR-SOP seems to be a good choice for CM

Qilong Wang et al. Global Gated Mixture of Second-order Pooling for Improving Deep Convolutional Neural Networks. In NIPS, 2018.


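Parametric Component Models

SR-SOP (Existing Top Unimodal Pooling) [CVPR 2017, ICCV 2017, CVPR 2018] (~ normal Gaussian distribution):

$\mathbf{Z} = \boldsymbol{\Sigma}^{1/2}$

Parametric SR-SOP (Ours) (~ multivariate generalized Gaussian distribution):

$\mathbf{Z} = \hat{\boldsymbol{\Sigma}}^{1/2}, \quad \hat{\boldsymbol{\Sigma}} = (\mathbf{L}_J \cdots \mathbf{L}_1 \mathbf{X})^{T} (\mathbf{L}_J \cdots \mathbf{L}_1 \mathbf{X})$

$\hat{\boldsymbol{\Sigma}} = \mathbf{X}^{T} \mathbf{J} \mathbf{X}, \quad \mathbf{G}_j = \mathbf{L}_j^{T} \mathbf{L}_j$

Legend: ~ constant; ~ trainable

Qilong Wang et al. Global Gated Mixture of Second-order Pooling for Improving Deep Convolutional Neural Networks. In NIPS, 2018.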
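The SR-SOP formulas on this slide can be made concrete with a small NumPy sketch. It computes Z as the matrix square root of X^T J X, once with a constant J and once with a J built from factors L_j. The shapes (X as N local features by d channels, J and L_j as N x N), the choice J = I/N for the constant case, and the random matrices standing in for trained L_j are all assumptions made for illustration; the slides do not spell them out.

```python
import numpy as np

def psd_sqrt(S):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)            # guard against tiny negative eigenvalues
    return (V * np.sqrt(w)) @ V.T        # V diag(sqrt(w)) V^T

def sr_sop(X, J):
    """Square-root second-order pooling: Z = (X^T J X)^(1/2)."""
    Sigma = X.T @ J @ X
    Sigma = 0.5 * (Sigma + Sigma.T)      # symmetrize against round-off
    return psd_sqrt(Sigma)

rng = np.random.default_rng(0)
N, d = 16, 4                             # assumed: N local features, d channels
X = rng.standard_normal((N, d))

# Constant J (assumed here to be I/N, i.e. the plain second-order moment).
J_const = np.eye(N) / N
Z_plain = sr_sop(X, J_const)

# "Trainable" J: random stand-ins for learned factors L_1, L_2.
L1 = rng.standard_normal((N, N)) / np.sqrt(N)
L2 = rng.standard_normal((N, N)) / np.sqrt(N)
A = L2 @ L1
J_param = A.T @ A                        # J = (L_2 L_1)^T (L_2 L_1)
Z_param = sr_sop(X, J_param)

# Same result computed as ((L_2 L_1 X)^T (L_2 L_1 X))^(1/2).
Z_direct = psd_sqrt((A @ X).T @ (A @ X))
```

The last two lines show that the two forms of the parametric covariance on the slide agree once J is taken to be the product of the L_j factors and their transposes.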


Ablation study

Numbers of N and K

Training: 1.28M images; Testing: 50K images


Ablation study


Experiments on ImageNet-1K

Methods   Backbone Models   Top-1 Error   Top-5 Error   Top-1 Increment   Top-5 Increment
GAP       ResNet-18         49.08         24.25         +0.00             +0.00
SOP       ResNet-18         40.32         18.68         +8.76             +5.57
GM-SOP    ResNet-18         38.21         17.01         +10.87            +7.24
GAP       ResNet-50         41.42         18.14
GM-SOP    ResNet-50         35.73         14.96         +5.69             +3.18
GAP       WRN-36-2          39.55         16.57
GM-SOP    WRN-36-2          32.33         12.35         +7.22             +4.22

Training: 1.28M images; Testing: 50K images

Qilong Wang et al. Global Gated Mixture of Second-order Pooling for Improving Deep Convolutional Neural Networks. In NIPS, 2018.
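The increment columns in the ImageNet-1K table are consistent with the GAP error minus the method's error on the same backbone, in percentage points. The short check below recomputes them from the error rates listed on the slide; the dictionary layout and function name are only an illustrative choice.

```python
# Error rates (%) copied from the slide's table: (top-1, top-5).
results = {
    "ResNet-18": {"GAP": (49.08, 24.25), "SOP": (40.32, 18.68), "GM-SOP": (38.21, 17.01)},
    "ResNet-50": {"GAP": (41.42, 18.14), "GM-SOP": (35.73, 14.96)},
    "WRN-36-2": {"GAP": (39.55, 16.57), "GM-SOP": (32.33, 12.35)},
}

def increments(table):
    """Error reduction of each method relative to GAP on the same backbone."""
    out = {}
    for backbone, methods in table.items():
        base_top1, base_top5 = methods["GAP"]
        for name, (top1, top5) in methods.items():
            out[(backbone, name)] = (round(base_top1 - top1, 2),
                                     round(base_top5 - top5, 2))
    return out

gains = increments(results)
for (backbone, name), (g1, g5) in gains.items():
    print(f"{backbone:10s} {name:7s} +{g1:.2f} +{g5:.2f}")
```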


Experiments on Places365

Methods     Backbone Models   Top-1 Error   Top-5 Error
GAP         ResNet-18         49.96         19.19
GM-GAP      ResNet-18         48.07         17.84
GAP-8256d   ResNet-18         49.99         19.32
SOP         ResNet-18         48.11         18.01
SR-SOP      ResNet-18         47.48         17.52
GM-SOP      ResNet-18         47.18         17.02

Training: ~1.8M images; Testing: 36.5K images

Qilong Wang et al. Global Gated Mixture of Second-order Pooling for Improving Deep Convolutional Neural Networks. In NIPS, 2018.


Deep BoVW - NetVLAD

Relja Arandjelović et al. NetVLAD: CNN architecture for weakly supervised place recognition. In CVPR, 2016.
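For readers unfamiliar with NetVLAD, its aggregation step can be sketched as follows: each local descriptor is softly assigned to K cluster centers, the assignment-weighted residuals are summed per cluster, and the result is intra-normalized and then L2-normalized as a whole. This is a minimal NumPy sketch of the published formulation as commonly described; the parameter names and random toy values are assumptions, and in the real layer W, b and the centers are learned.

```python
import numpy as np

def netvlad(X, W, b, C, eps=1e-12):
    """Aggregate local descriptors into a VLAD-style vector.

    X: (N, d) local descriptors; W: (d, K) and b: (K,) soft-assignment
    parameters; C: (K, d) cluster centers. Returns a (K * d,) unit vector.
    """
    logits = X @ W + b                               # (N, K)
    logits = logits - logits.max(axis=1, keepdims=True)
    A = np.exp(logits)
    A = A / A.sum(axis=1, keepdims=True)             # soft assignment per descriptor
    # V[k] = sum_i A[i, k] * (X[i] - C[k])
    V = A.T @ X - A.sum(axis=0)[:, None] * C         # (K, d)
    V = V / (np.linalg.norm(V, axis=1, keepdims=True) + eps)  # intra-normalization
    v = V.ravel()
    return v / (np.linalg.norm(v) + eps)             # global L2 normalization

rng = np.random.default_rng(0)
N, d, K = 10, 3, 4                                   # toy sizes
X = rng.standard_normal((N, d))
W = rng.standard_normal((d, K))
b = rng.standard_normal(K)
C = rng.standard_normal((K, d))
v = netvlad(X, W, b, C)
```

Unlike second-order pooling, this representation grows with the number of clusters K rather than with the square of the channel dimension.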


Deep BoVW – Simple NetFV

Lin et al. Bilinear CNNs for Fine-grained Visual Recognition. TPAMI, 2017.

Simple NetFV:

NetVLAD:


Comparison

FGVC

Methods             Birds CUB-200-2011   FGVC-Aircraft   FGVC-Cars
NetFV [TPAMI'17]    79.9                 79.0            86.2
NetVLAD [CVPR'16]   81.9                 81.8            88.6
B-CNN [ICCV'15]     84.1                 84.1            91.3
G2DeNet (Ours)      87.1                 89.0            92.5

64x64 ImageNet-1K

Methods             Backbone    Top-1 error   Top-5 error
NetVLAD [CVPR'16]   ResNet-18   45.16         21.73
SOP                 ResNet-18   40.32         18.68
GM-SOP              ResNet-18   38.21         17.01


Conclusion

Statistical and geometrical insights

Fast convergence, computation-efficient

Performing and generalizing much better


Related Publications

Global Covariance Pooling

[1] Peihua Li, Jiangtao Xie, Qilong Wang and Wangmeng Zuo. Is Second-order Information Helpful for Large-scale Visual Recognition? In ICCV, 2017.

Robust Covariance Estimation

[5] Qilong Wang, Peihua Li, Wangmeng Zuo, Lei Zhang. RAID-G: Robust Estimation of Approximate Infinite Dimensional Gaussian with Application to Material Recognition. In CVPR, 2016.

Global Gaussian Pooling and Gaussian Embedding

[2] Qilong Wang, Peihua Li, Lei Zhang. G2DeNet: Global Gaussian Distribution Embedding Network and Its Application to Visual Recognition. In CVPR, 2017 (Oral).

[3] Peihua Li, Qilong Wang, Hui Zeng, Lei Zhang. Local Log-Euclidean Multivariate Gaussian Descriptor and Its Application to Image Classification. IEEE TPAMI, 2017.

Fast Training Algorithm

[4] Peihua Li, Jiangtao Xie, Qilong Wang and Zilin Gao. Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization. In CVPR, 2018.

Multimodal Distribution

[6] Qilong Wang, Zilin Gao, Jiangtao Xie, Wangmeng Zuo, Peihua Li. Global Gated Mixture of Second-order Pooling for Improving Deep Convolutional Neural Networks. In NIPS, 2018.


All code is (or will be) released at http://peihuali.org
