![Page 1: Computer Vision, Part 2 Object recognition and scene “understanding”](https://reader038.vdocuments.net/reader038/viewer/2022103121/56649c7c5503460f94930136/html5/thumbnails/1.jpg)
Computer Vision, Part 2
Object recognition and scene “understanding”
![Page 2: Computer Vision, Part 2 Object recognition and scene “understanding”](https://reader038.vdocuments.net/reader038/viewer/2022103121/56649c7c5503460f94930136/html5/thumbnails/2.jpg)
• What makes object recognition a hard task for computers?
![Page 3: Computer Vision, Part 2 Object recognition and scene “understanding”](https://reader038.vdocuments.net/reader038/viewer/2022103121/56649c7c5503460f94930136/html5/thumbnails/3.jpg)
HMAX
Riesenhuber, M. & Poggio, T. (1999), “Hierarchical Models of Object Recognition in Cortex”
Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., & Poggio, T. (2006), “Robust Object Recognition with Cortex-Like Mechanisms”
• HMAX: A hierarchical neural-network model of object recognition.
• Meant to model human vision at the level of the “immediate recognition” capabilities of the ventral visual pathway, independent of attention or other top-down processes.
• Also called the “Standard Model” (because it incorporates the “standard model” of visual cortex)
• Inspired by the earlier “Neocognitron” model of Fukushima (1980)
![Page 4: Computer Vision, Part 2 Object recognition and scene “understanding”](https://reader038.vdocuments.net/reader038/viewer/2022103121/56649c7c5503460f94930136/html5/thumbnails/4.jpg)
General ideas behind model
• “Immediate” visual processing is feedforward and hierarchical: low levels detect simple features, which are combined hierarchically into increasingly complex features
• Layers of hierarchy alternate between “sensitivity” (to detecting features) and “invariance” (to position, scale, orientation)
• Size of receptive fields increases along the hierarchy
• Degree of invariance increases along the hierarchy
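A toy illustration of the “invariance” idea, not taken from the slides: taking the max over a local window of detector responses gives the same output when the feature shifts within that window. The function name `max_pool_1d` and the response values are made up for this sketch.

```python
def max_pool_1d(values, width):
    """Max over consecutive, non-overlapping windows of `width` units."""
    return [max(values[i:i + width]) for i in range(0, len(values), width)]

# Hypothetical responses of one feature detector at 8 positions.
responses = [0.0, 0.9, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0]
# Same feature shifted by two positions, still inside the first window.
shifted = [0.0, 0.0, 0.0, 0.9, 0.1, 0.0, 0.0, 0.0]

pooled = max_pool_1d(responses, 4)
pooled_shifted = max_pool_1d(shifted, 4)
print(pooled, pooled_shifted)  # both pooled vectors are identical
```

Each pooled unit also covers four input positions, which is one way receptive-field size can grow from one layer to the next.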
![Page 5: Computer Vision, Part 2 Object recognition and scene “understanding”](https://reader038.vdocuments.net/reader038/viewer/2022103121/56649c7c5503460f94930136/html5/thumbnails/5.jpg)
The HMAX model for object recognition (Riesenhuber, Poggio, Serre, et al.)
• Image (gray-scale)
• S1 layer: Edge detectors
• C1 layer: Max over local S1 units
• S2 layer: Prototypes (small image patches)
• C2 layer: Max activation over each prototype
• Classification layer: Object or image classification
Layers alternate between “specificity” and “invariance” over position, scale, orientation.
The job of HMAX is to produce a higher-level representation of an image that will be useful for classification.
![Page 13: Computer Vision, Part 2 Object recognition and scene “understanding”](https://reader038.vdocuments.net/reader038/viewer/2022103121/56649c7c5503460f94930136/html5/thumbnails/13.jpg)
Image (gray-scale)
S1 layer: Edge detectors; 4 orientations, 16 scales
![Page 14: Computer Vision, Part 2 Object recognition and scene “understanding”](https://reader038.vdocuments.net/reader038/viewer/2022103121/56649c7c5503460f94930136/html5/thumbnails/14.jpg)
One S1 receptive field:
[Figure: example S1 receptive fields, etc., for 16 scales]
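A minimal sketch of one S1 “edge detector”, assuming it is a Gabor filter (as in Serre et al.). The function names, the 7×7 size, and the `wavelength`, `sigma`, and `aspect` values are illustrative choices for this sketch, not the model's published parameters.

```python
import math

def gabor_kernel(size, theta, wavelength, sigma, aspect=0.3):
    """Zero-mean, unit-norm Gabor filter of shape size x size at orientation theta."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (aspect * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    mean = sum(sum(row) for row in kernel) / (size * size)
    kernel = [[v - mean for v in row] for row in kernel]
    norm = math.sqrt(sum(v * v for row in kernel for v in row))
    return [[v / norm for v in row] for row in kernel]

def s1_response(patch, kernel):
    """Absolute value of the filter's dot product with an image patch."""
    return abs(sum(p * k for prow, krow in zip(patch, kernel)
                   for p, k in zip(prow, krow)))

# Four orientations: 0, 45, 90, 135 degrees.
orientations = [k * math.pi / 4 for k in range(4)]
# Test patch: a vertical bar down the middle column of a 7x7 patch.
bar = [[1.0 if x == 3 else 0.0 for x in range(7)] for _ in range(7)]
responses = [s1_response(bar, gabor_kernel(7, t, 4.0, 2.0)) for t in orientations]
best = responses.index(max(responses))
print(best)  # index of the orientation that responds most strongly to the bar
```

With theta = 0 the filter varies along x, so it is the one tuned to vertical structure. Repeating this with larger filter sizes would give the other scales.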
![Page 15: Computer Vision, Part 2 Object recognition and scene “understanding”](https://reader038.vdocuments.net/reader038/viewer/2022103121/56649c7c5503460f94930136/html5/thumbnails/15.jpg)
Image (gray-scale)
S1 layer: Edge detectors; 4 orientations, 16 scales
C1 layer: Max activation over local S1 units (local position, scale); 4 orientations, 8 scales
[Figure: MAX operations pooling S1 units into C1 units]
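A sketch of the C1 pooling step for one orientation. It assumes the 16 S1 scales are merged in adjacent pairs to give the 8 C1 scales; the slide only says the max is taken over local position and scale. The function names and the `pool` and `step` parameters are hypothetical.

```python
def local_max(grid, pool, step):
    """Max over pool x pool neighborhoods, sampled every `step` units."""
    rows, cols = len(grid), len(grid[0])
    return [[max(grid[r][c]
                 for r in range(r0, min(r0 + pool, rows))
                 for c in range(c0, min(c0 + pool, cols)))
             for c0 in range(0, cols, step)]
            for r0 in range(0, rows, step)]

def c1_band(s1_a, s1_b, pool, step):
    """Max over two adjacent S1 scales, then max over local position."""
    merged = [[max(a, b) for a, b in zip(row_a, row_b)]
              for row_a, row_b in zip(s1_a, s1_b)]
    return local_max(merged, pool, step)

def c1_layer(s1_scales, pool, step):
    """Pair up adjacent scales: 16 S1 maps in, 8 C1 maps out."""
    return [c1_band(s1_scales[k], s1_scales[k + 1], pool, step)
            for k in range(0, len(s1_scales), 2)]

# Two small made-up S1 maps at neighboring scales.
a = [[0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 2], [0, 0, 0, 0]]
b = [[0, 0, 0, 0], [0, 0, 3, 0], [0, 0, 0, 0], [5, 0, 0, 0]]
band = c1_band(a, b, pool=2, step=2)
print(band)

# Sixteen dummy maps (map k filled with the value k) reduce to eight bands.
s1_scales = [[[k] * 4 for _ in range(4)] for k in range(16)]
bands = c1_layer(s1_scales, pool=2, step=2)
print(len(bands))
```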
![Page 16: Computer Vision, Part 2 Object recognition and scene “understanding”](https://reader038.vdocuments.net/reader038/viewer/2022103121/56649c7c5503460f94930136/html5/thumbnails/16.jpg)
C1 layer: Max activation over local S1 units (local position, scale); 4 orientations, 8 scales
Prototypes (~1000, chosen from image collection, translated to C1 features)
S2 layer: Calculate similarity to prototype (radial basis function); 4 orientations, 8 scales
S2 unit: Calculate similarity to prototype for each “pooled” position in C1 layer.
Similarity: Radial basis function:
$$S2_i = \exp\left(-\gamma \,\lVert X - P_i \rVert^2\right)$$
where $X$ is the patch of C1 activations at that position, $P_i$ is the $i$-th prototype, and $\gamma$ sets how sharply similarity falls off with distance.
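The radial basis function similarity can be sketched directly. Here the C1 patch and the prototype are assumed to be flattened into equal-length lists, and the name `s2_response` and the default `gamma` are illustrative.

```python
import math

def s2_response(c1_patch, prototype, gamma=1.0):
    """exp(-gamma * squared Euclidean distance between patch and prototype)."""
    squared_distance = sum((x - p) ** 2 for x, p in zip(c1_patch, prototype))
    return math.exp(-gamma * squared_distance)

identical = s2_response([0.2, 0.7, 0.1], [0.2, 0.7, 0.1])  # distance 0
different = s2_response([0.0, 0.0], [1.0, 0.0])            # squared distance 1
print(identical, different)
```

The response is 1 when the patch matches the prototype exactly and falls toward 0 as the patch moves away from it.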
![Page 19: Computer Vision, Part 2 Object recognition and scene “understanding”](https://reader038.vdocuments.net/reader038/viewer/2022103121/56649c7c5503460f94930136/html5/thumbnails/19.jpg)
S2 layer: Calculate similarity to prototype (radial basis function); 4 orientations, 8 scales
C2 layer: Max activation over position, orientation, scale
S2_1 → MAX (1 value)
S2_2 → MAX (1 value)
…
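A sketch of the C2 step: for each prototype, keep a single value, the largest S2 response found anywhere. The nested-list layout of `s2_maps` (prototype, then map, then rows) and the numbers in it are assumptions made for this example.

```python
def c2_features(s2_maps):
    """One value per prototype: the max over every S2 map computed for it."""
    return [max(v for s2_map in maps for row in s2_map for v in row)
            for maps in s2_maps]

# Two prototypes, each with two small made-up S2 maps (e.g., two scales).
s2_maps = [
    [[[0.10, 0.05]], [[0.11]]],  # responses for prototype 1
    [[[0.78, 0.20]], [[0.30]]],  # responses for prototype 2
]
c2 = c2_features(s2_maps)
print(c2)
```

With ~1000 prototypes, this gives a vector of ~1000 numbers per image, whatever the image size.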
![Page 20: Computer Vision, Part 2 Object recognition and scene “understanding”](https://reader038.vdocuments.net/reader038/viewer/2022103121/56649c7c5503460f94930136/html5/thumbnails/20.jpg)
C2 layer: Max over position, orientation, scale
C2 feature vector (example values): .11 .78 … .32
Support Vector Machine classification (e.g., dog / not dog)
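A sketch of how a trained SVM would use the C2 vector, assuming a linear SVM. The weights and bias below are invented for illustration; in practice they come from training on labeled images, and a kernel SVM would replace the dot product.

```python
def svm_score(c2_vector, weights, bias):
    """Linear SVM decision value: w . x + b."""
    return sum(w * x for w, x in zip(weights, c2_vector)) + bias

def classify(c2_vector, weights, bias):
    """Positive score means 'dog'; otherwise 'not dog'."""
    return "dog" if svm_score(c2_vector, weights, bias) > 0 else "not dog"

c2_vector = [0.11, 0.78, 0.32]  # shortened C2 vector
weights = [-1.0, 2.0, 0.5]      # hypothetical learned weights
bias = -1.0                     # hypothetical learned bias

score = svm_score(c2_vector, weights, bias)
label = classify(c2_vector, weights, bias)
print(score, label)
```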
![Page 21: Computer Vision, Part 2 Object recognition and scene “understanding”](https://reader038.vdocuments.net/reader038/viewer/2022103121/56649c7c5503460f94930136/html5/thumbnails/21.jpg)
Streetscenes “scene understanding” system (Bileschi, 2006)
Use HMAX + SVM to identify object classes: Car, Pedestrian, Bicycle, Building, Tree
![Page 22: Computer Vision, Part 2 Object recognition and scene “understanding”](https://reader038.vdocuments.net/reader038/viewer/2022103121/56649c7c5503460f94930136/html5/thumbnails/22.jpg)
How Streetscenes Works (Bileschi, 2006)
1. Densely tile the image with windows of different sizes.
2. C1 and C2 features are computed in each window.
3. The features in each window are given as input to each of five trained support vector machines.
4. If any return a classification with score above a learned threshold, that object is said to be “detected”.
…
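The four steps above can be sketched as a sliding-window loop. This is an outline under assumptions, not Bileschi's implementation: `tile_windows`, `detect`, the half-window stride, and the stand-in feature extractor and scorers are all hypothetical.

```python
def tile_windows(width, height, sizes, stride_fraction=0.5):
    """Step 1: square windows of several sizes, overlapping by half."""
    for size in sizes:
        step = max(1, int(size * stride_fraction))
        for top in range(0, height - size + 1, step):
            for left in range(0, width - size + 1, step):
                yield (left, top, size)

def detect(windows, extract_features, classifiers, thresholds):
    """Steps 2-4: features per window, one score per class, then threshold."""
    detections = []
    for window in windows:
        features = extract_features(window)          # stands in for C1/C2
        for label, score_fn in classifiers.items():  # one SVM per class
            score = score_fn(features)
            if score > thresholds[label]:
                detections.append((label, window, score))
    return detections

# Stand-ins so the sketch runs; real ones would be HMAX features and SVMs.
windows = list(tile_windows(8, 8, sizes=[4, 8]))
extract_features = lambda window: [window[2]]
classifiers = {"car": lambda f: f[0] / 8, "tree": lambda f: 0.0}
thresholds = {"car": 0.75, "tree": 0.5}

detections = detect(windows, extract_features, classifiers, thresholds)
print(len(windows), detections)
```

Streetscenes would use five classifiers (Car, Pedestrian, Bicycle, Building, Tree), each with its own learned threshold.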
![Page 23: Computer Vision, Part 2 Object recognition and scene “understanding”](https://reader038.vdocuments.net/reader038/viewer/2022103121/56649c7c5503460f94930136/html5/thumbnails/23.jpg)
Object detection (here, “car”) with HMAX model (Bileschi, 2006)
![Page 24: Computer Vision, Part 2 Object recognition and scene “understanding”](https://reader038.vdocuments.net/reader038/viewer/2022103121/56649c7c5503460f94930136/html5/thumbnails/24.jpg)
Sample of results from HMAX model
(Serre et al., 2006)