deep fisher networks and class saliency maps for and...
TRANSCRIPT
![Page 1: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/1.jpg)
Deep Fisher Networks and Class Saliency Maps for
Object Classification and Localisation
Karén Simonyan, Andrea Vedaldi, Andrew ZissermanVisual Geometry Group, University of Oxford
![Page 2: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/2.jpg)
Outline
• Classification challenge• can Fisher Vector encodings be improved by a deep architecture?• deep Fisher Network (FN)• combination of two deep models: Convolutional Network (CN) and deep Fisher Network
• Localisation challenge• visualization of class saliency maps and per‐image foreground pixels from a single classification CN
• bounding boxes computed from foreground pixels• weak supervision: only image class labels used for training
![Page 3: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/3.jpg)
• Bag of Visual Words (BOW) pipeline
... ... ... ...
VQ
Linear SVM
dogs
Shallow Image Encoding & Classification• Dense SIFT features
[Luong & Malik, 1999][Varma & Zisserman, 2003]
[Csurka et al, 2004] [Vogel & Schiele, 2004]
[Jurie & Triggs, 2005][Lazebnik et al, 2006]
[Bosch et al, 2006]
![Page 4: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/4.jpg)
soft-assignment to GMM
1st order stats (k-th Gaussian):
2nd order stats (k-th Gaussian):
80-D 80-D 80-D
FV dimensionality: 80×2×512=81,920(for a mixture of 512 Gaussians)
stacking e.g. if SIFT x reduced to 80 dimensions by PCA
Dense set of local SIFT features → Fisher vector (high dim)
Fisher Vector (FV) – Encoding
Perronnin et al CVPR 07 & 10, ECCV 10
![Page 5: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/5.jpg)
• Learn projection onto a low-dim space where classes are well-separated
• Joint learning of projection and projected-space classifiers (WSABIE):
• Or project onto the space of classifier scores:
• are linear SVM classifiers in the high-dimensional FV space
• fast-to-learn
Projection LearningFisher vector (high dim) → low dimensional representation
Wφ
![Page 6: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/6.jpg)
Deep Fisher Network
Dense feature extractionSIFT, colour
One vs. rest linear SVMs
low-dim FV encoder
Spatial stacking
L2 norm. & PCA
FV encoder
SSR & L2 norm.
SSR & L2 norm.
input image
0-th layer
1-st Fisher layer (local & global pooling)
2-nd Fisher layer(global pooling)
classifier layer
Dense feature extraction
SIFT, raw patches, …
One vs rest linear SVMs
FV encoder
SSR & L2 norm.
Shallow Fisher Vector
![Page 7: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/7.jpg)
Fisher Layer
Compressedlocal Fisher encoding
Spatial stacking(2×2)
L2 norm‐n & PCA
decorrelation
featurew
h
80
w/2
h/2
82000
w/2
h/2
1000
w/2
h/2
4000
w/2
h/2
256
![Page 8: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/8.jpg)
Deep Fisher Network
Dense feature extractionSIFT, colour
One vs. rest linear SVMs
low-dim FV encoder
Spatial stacking
L2 norm. & PCA
FV encoder
SSR & L2 norm.
SSR & L2 norm.
input image
0-th layer
1-st Fisher layer (local & global pooling)
2-nd Fisher layer(global pooling)
classifier layer
Dense feature extraction
SIFT, raw patches, …
One vs rest linear SVMs
FV encoder
SSR & L2 norm.
Shallow Fisher Vector
![Page 9: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/9.jpg)
Classification Results for Fisher Network
ImageNet 2010 challenge dataset: • 1.2M images, 1K classes• SIFT & colour features• Learning: 2‐3 days on 200 CPU cores (MATLAB + MEX implementation)
Improved classification accuracy by adding layer
![Page 10: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/10.jpg)
Deep ConvNet Implementation
• Based on cuda‐convnet [Krizhevsky et al., 2012]• 8 weight layers (rather narrow):
conv64‐conv256‐conv256‐conv256‐conv256‐full4096‐full4096‐full1000
• Jittering:• cropping, flipping, PCA‐aligned noise• random occlusion:
• Single ConvNet instance
![Page 11: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/11.jpg)
Classification Results
ImageNet 2012 challenge dataset: • 1.2M images, 1K classes• top‐5 classification accuracy
Method top‐5 accuracyFV encoding (our 2012 entry) 72.7%Deep FishNet 76.9%Deep ConvNet [Krizhevsky et al., 2012] 81.8%
83.6% (5 ConvNets)Deep ConvNet (our implementation) 82.3%Deep ConvNet + Deep FishNet 84.8%
ConvNet and FisherNet are complementary
![Page 12: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/12.jpg)
Outline
• Classification challenge• can Fisher Vector encodings be improved by a deep architecture?• deep Fisher Network (FN)• combination of two deep models: Convolutional Network (CN) and deep Fisher Network
• Localisation challenge• visualization of class saliency maps and per‐image foreground pixels from a single classification CN
• bounding boxes computed from foreground pixels• weak supervision: only image class labels used for training
![Page 13: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/13.jpg)
Deep inside ConvNets: what Has Been Learnt?
ConvNet class model visualisation• find a (regularised) image with a high class score :
with a fixed learnt model
• compute using back‐prop
Cf ConvNet training• max log‐likelihood of the correct class
• using back‐prop
Visualizing higher‐layer features of a deep network. Erhan, D., Bengio, Y., Courville, A., Vincent, P. Technical report, University of Montreal, 2009.
fully connectedclassifier layer
soft-max layer
…
![Page 14: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/14.jpg)
fox
![Page 15: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/15.jpg)
pepper
![Page 16: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/16.jpg)
dumbbell
![Page 17: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/17.jpg)
Deep inside ConvNets: what Has Been Learnt?
ConvNet class model visualisation• find a (regularised) image with a high class score :
with a fixed learnt model
• compute using back‐prop
NB
Visualizing higher‐layer features of a deep network. Erhan, D., Bengio, Y., Courville, A., Vincent, P. Technical report, University of Montreal, 2009.
fully connectedclassifier layer
soft-max layer
…
gives less prominent visualisation, as it concentrates on reducing scores of other classes
![Page 18: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/18.jpg)
Deep inside ConvNets:What Makes an Image Belong to a Class?
• ConvNets are highly non‐linear → local linear approxima on
• 1st order expansion of a class score around a given image :
• has the same dimensions as image• magnitude of defines a saliency map for image and class
– computed using back‐prop
– score of ‐th class
How to Explain Individual Classification Decisions. Baehrens, D., Schroeter, T., Harmeling, S., Kawanabe, M., Hansen, K., Müller, K.‐R. JMLR, 2010.
![Page 19: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/19.jpg)
Saliency Maps For Top‐1 Class
![Page 20: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/20.jpg)
Saliency Maps For Top‐1 Class
![Page 21: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/21.jpg)
Saliency Maps For Top‐1 Class
![Page 22: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/22.jpg)
• Weakly supervised• computed using class‐n ConvNet, trained on image class labels• no additional annotation required (e.g. boxes or masks)
• Highlights discriminative object parts• Instant computation – no sliding window• Fires on several object instances
• Related to deconvnet [Zeiler and Fergus, 2013]• very similar for convolution, max‐pooling, and RELU layers• but we also back‐prop through fully‐connected layers
Image Saliency Map
![Page 23: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/23.jpg)
Saliency Maps for Object Localisation• Image → top‐k class → class saliency map → object box
![Page 24: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/24.jpg)
• Given an image and a saliency map:
BBox Localisation for ILSVRC Submission
![Page 25: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/25.jpg)
• Given an image and a saliency map:1. Foreground/background mask
using thresholds on saliency
BBox Localisation for ILSVRC Submission
blue – foregroundcyan – backgroundred – undefined
![Page 26: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/26.jpg)
• Given an image and a saliency map:1. Foreground/background mask
using thresholds on saliency2. GraphCut colour segmentation
[Boykov and Jolly, 2001]
BBox Localisation for ILSVRC Submission
![Page 27: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/27.jpg)
• Given an image and a saliency map:1. Foreground/background mask
using thresholds on saliency2. GraphCut colour segmentation
[Boykov and Jolly, 2001]3. Bounding box of the largest
connected component
• Colour information propagates segmentation from the most discriminative areas
BBox Localisation for ILSVRC Submission
![Page 28: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/28.jpg)
Segmentation‐Localisation Examples
![Page 29: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/29.jpg)
Segmentation‐Localisation Examples
![Page 30: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/30.jpg)
Segmentation‐Localisation Failure Cases
• Several object instances
![Page 31: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/31.jpg)
• Segmentation isn’t propagated from the salient parts
Segmentation‐Localisation Failure Cases
![Page 32: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/32.jpg)
• Limitations of GraphCut segmentation
Segmentation‐Localisation Failure Cases
![Page 33: Deep Fisher Networks and Class Saliency Maps for and ...image-net.org/challenges/LSVRC/2013/slides/ILSVRC_az.pdf · Deep inside ConvNets: What Makes an Image Belong to a Class? •](https://reader033.vdocuments.net/reader033/viewer/2022052803/5f26bf9a421c4b2b0840bbdc/html5/thumbnails/33.jpg)
Summary
• Fisher encoding benefits from stacking
• Deep FishNet is complementary to Deep ConvNet
• Class saliency maps are useful for localisation• location of discriminative object parts• weakly supervised: bounding boxes not used for training• fast to compute