Filters and other potions
P. Perona - Caltech
MIT - 21 November 2013
what? where?
Architectures
Architecture 1
[Figure: the vision black box takes image(s) in and outputs descriptions such as "ripe bananas", "marble torso", "train", "building"]
Feature extraction: texture, stereo disparity, color, contrast, motion flow, edgels, …
Grouping: image regions
Perceptual organization, 2.5D sketch: boundaries, junctions, foreground, background; surface shape, scene depth, spatial relationships, 3D motion
Recognition: surface properties
Stages: image processing → regions and surfaces → objects, verbs, categories… → cognition → motor
[Marr ’82]
features?
Le Corbusier, Villa Savoye (http://flickr.com/photos/ikura/1398271367/)
edges
Le Corbusier, Villa Savoye (http://www.iit.edu/~stawraf/perspx.jpg)
[Fukushima ‘80]
Architecture 2
[DeValois ’85]
Column
Hypercolumn
Dense sampling
translation, rotation invariance
[LeCun et al. 1998]
scale invariance
[Lowe 2004]
[Hinton et al. ’12]
translation, rotation, scale invariance
96 filters: 6 orientations, 2 center-surround, 14 scale samples over 2.2 binary octaves
Detection Performance
Caltech pedestrians: 1M frames, 250K hand-annotated
[Plot: detection performance on Caltech Pedestrians; methods compared: Viola & Jones ’01, Dalal-Triggs ’05, Dollar et al. ’08, Dollar et al. ’10, Walk et al. ’10]
Filter technology
Scale, orientation, elongation, …: lots of CPU cycles.
How do we make the computations efficient?
Separability
[Adelson & Bergen, ’85]
Full 2D convolution, cost = M × N per output pixel:
R(i,j) = \sum_{h=1}^{M} \sum_{k=1}^{N} k(h,k)\, I(i-h,\ j-k)
If the kernel is separable, k(h,k) = k(h)\,k'(k), the cost drops to M + N:
R(i,j) = \sum_{h=1}^{M} k(h) \sum_{k=1}^{N} k'(k)\, I(i-h,\ j-k)
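The cost comparison can be verified numerically. A minimal sketch (assuming NumPy; not code from the talk) contrasting direct 2D filtering, which spends M·N multiplies per output pixel, with two 1D passes for a rank-1 (separable) kernel, which spends only M+N:

```python
import numpy as np

def conv2d_direct(image, kernel):
    """Direct 2D filtering over 'valid' positions: M*N multiplies per pixel."""
    M, N = kernel.shape
    H, W = image.shape
    out = np.empty((H - M + 1, W - N + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(kernel * image[i:i + M, j:j + N])
    return out

def conv2d_separable(image, k_col, k_row):
    """Same result when kernel = outer(k_col, k_row): M + N multiplies per pixel."""
    # filter every row with k_row, then every column of the result with k_col
    tmp = np.array([np.correlate(row, k_row, mode='valid') for row in image])
    return np.array([np.correlate(col, k_col, mode='valid') for col in tmp.T]).T

# a binomial (Gaussian-like) tap; the 5x5 kernel is a rank-1 outer product
k1 = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
K = np.outer(k1, k1)
img = np.random.default_rng(0).random((32, 32))
assert np.allclose(conv2d_direct(img, K), conv2d_separable(img, k1, k1))
```

The two results agree to machine precision; the savings grow with kernel size, e.g. 25 vs. 10 multiplies per pixel for the 5×5 kernel above.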
Separability and decomposition
[Adelson & Bergen, ’85]
Steerability
[Freeman & Adelson, ’91]
General decomposition
k(x; \theta) = \sum_{i=1}^{D} b_i(\theta)\, f_i(x)
k(x, y) = \sum_{i=1}^{D} f_i(x)\, g_i(y)
k(x, y; \theta) = \sum_{i=1}^{D} b_i(\theta)\, f_i(x)\, g_i(y)
Design?
Sample k(x; \theta) on a grid in x and \theta and stack the samples into a matrix A with A_{\theta x} = k(x; \theta). Its singular value decomposition A = U S V^T provides the D-term expansion directly: the columns of U give the interpolation functions b_i(\theta), the singular values \sigma_i weight the terms, and the columns of V give the basis filters f_i(x).
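A toy illustration of the SVD recipe (the kernel family cos(x − θ) is an assumed example, not from the talk, chosen because cos(x − θ) = cos θ cos x + sin θ sin x makes it exactly D = 2):

```python
import numpy as np

# Sample k(x; theta) = cos(x - theta) on a grid: A[t, n] = k(x_n; theta_t).
# Since cos(x - theta) = cos(theta)cos(x) + sin(theta)sin(x), the family is
# spanned by two basis filters, and the SVD should find exactly rank 2.
thetas = np.linspace(0.0, np.pi, 32)
xs = np.linspace(-3.0, 3.0, 64)
A = np.cos(xs[None, :] - thetas[:, None])

U, S, Vt = np.linalg.svd(A, full_matrices=False)
rank = int(np.sum(S > 1e-10 * S[0]))
print(rank)  # 2: rows of Vt are the basis filters f_i(x),
             # columns of U (scaled by sigma_i) are the b_i(theta)
```

Any orientation θ is then synthesized by mixing the two fixed basis filters with scalar weights, instead of resampling the kernel.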
Approximation
K(x, y; \theta) = \sum_{i=1}^{D} b_i(\theta)\, f_i(x, y)
K(x, y; \theta) \approx \sum_{i=1}^{R} b_i(\theta)\, f_i(x, y), \quad R \ll D
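The same factorization compresses 2D kernel families. A sketch (assuming NumPy; the oriented second-derivative-of-Gaussian family is an illustrative choice) showing that stacking k(x, y; θ) over many orientations gives a matrix whose numerical rank is far below the number of orientations, so R ≪ D terms suffice:

```python
import numpy as np

n = 21
ax = np.linspace(-3.0, 3.0, n)
X, Y = np.meshgrid(ax, ax)

def oriented_kernel(theta):
    """Second derivative of a Gaussian taken along direction theta."""
    u = X * np.cos(theta) + Y * np.sin(theta)      # rotated coordinate
    return (u ** 2 - 1.0) * np.exp(-(X ** 2 + Y ** 2) / 2.0)

# 36 orientations, each flattened into a row of A
thetas = np.linspace(0.0, np.pi, 36, endpoint=False)
A = np.stack([oriented_kernel(t).ravel() for t in thetas])   # 36 x 441

S = np.linalg.svd(A, compute_uv=False)
R = int(np.sum(S > 1e-8 * S[0]))
print(R)  # 3: three basis filters reproduce all 36 orientations
```

Here 36 orientation samples collapse to 3 basis filters, so a dense orientation sweep costs three convolutions plus per-pixel mixing.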
[Perona ’95]
Tensor Factorization
k(x, y; \theta) = \sum_{i=1}^{D} b_i(\theta)\, f_i(x)\, g_i(y)
• Not a convex problem
• Gradient descent
[Shy, Perona ’96]
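The slide notes the problem is non-convex and proposes gradient descent. The sketch below (assuming NumPy; synthetic data, not the lecture's method) uses alternating least squares instead, a common alternative for this trilinear factorization, fitting b_i(θ) f_i(x) g_i(y) to a tensor of known rank:

```python
import numpy as np

rng = np.random.default_rng(0)
R, nt, nx, ny = 2, 8, 10, 12

# synthetic kernel stack of known rank R: T[t,x,y] = sum_i b[i,t] f[i,x] g[i,y]
b_true = rng.standard_normal((R, nt))
f_true = rng.standard_normal((R, nx))
g_true = rng.standard_normal((R, ny))
T = np.einsum('it,ix,iy->txy', b_true, f_true, g_true)

def khatri_rao(A, B):
    """Row-wise outer products, flattened: (R, m), (R, n) -> (R, m*n)."""
    return np.einsum('im,in->imn', A, B).reshape(A.shape[0], -1)

# Alternating least squares: fix two factors, solve the third exactly.
# Each update is a linear least-squares problem, so the non-convex
# objective never increases, though no global optimum is guaranteed.
B = rng.standard_normal((R, nt))
F = rng.standard_normal((R, nx))
G = rng.standard_normal((R, ny))
for _ in range(100):
    B = np.linalg.lstsq(khatri_rao(F, G).T,
                        T.reshape(nt, -1).T, rcond=None)[0]
    F = np.linalg.lstsq(khatri_rao(B, G).T,
                        T.transpose(1, 0, 2).reshape(nx, -1).T, rcond=None)[0]
    G = np.linalg.lstsq(khatri_rao(B, F).T,
                        T.transpose(2, 0, 1).reshape(ny, -1).T, rcond=None)[0]

err = np.linalg.norm(np.einsum('it,ix,iy->txy', B, F, G) - T) / np.linalg.norm(T)
print(err)
```

On this exact-rank example the relative error drops to near machine precision within the 100 sweeps.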
Including scale by resampling
[Manduchi et al. ’98] [cf. Simoncelli et al.]
Exploiting Image Statistics
[Figure: gradient features computed on the original image vs. an upsampled version; sampling the gradient]
[Dollar et al. 2013]
Gradient histograms [Dollar et al. 2013]
Power law feature scaling
Individual images
[Dollar et al. 2013]
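The power-law idea fits in a few lines. A sketch (the function name, the sign convention Ω(s) = Ω(s0)·(s/s0)^(−λ), and the value of λ below are all illustrative assumptions; Dollar et al. 2013 fit λ per feature channel over natural images) of how a channel statistic at a new scale is extrapolated rather than recomputed:

```python
# Assumed form of the power law: a channel statistic measured at scale s0
# is extrapolated to scale s as f(s) = f(s0) * (s / s0) ** (-lam).
# lam = 0.1 is a placeholder, not a value from the lecture; in the full
# scheme channels are computed once per octave and intermediate scales
# are filled in by this extrapolation.

def extrapolate_channel(f_s0, s, s0=1.0, lam=0.1):
    """Estimate a channel statistic at scale s from its value at scale s0."""
    return f_s0 * (s / s0) ** (-lam)

# usage: a channel mean measured at full resolution, estimated at half scale
print(extrapolate_channel(2.0, s=0.5))
```

This replaces an explicit filter evaluation at every scale with one measurement plus a scalar correction, which is where the speedup on the following slides comes from.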
Fast computations
[Dollar et al. 2013]
Performance
[Dollar et al. 2013]
Conclusions
• Filtering front-end
• Need fine sampling of scale, orientation, …
• Scalable, separable and steerable approximations
• Exploiting image statistics to extrapolate
• Fast and accurate detection