pushmeet kohli microsoft research - damtp...autocollage robotics darpa urban challenge p(x|z) ∏...
TRANSCRIPT
Pushmeet Kohli Microsoft Research
Infer the values of hidden random variables
Geometry Estimation Object Segmentation
Building
Sky
Tree Grass
Pose Estimation
Orientation Joint angles
Medical Imaging
Image Search
Human Computer Interaction
Minority Report (2002)
Image and
Video Editing
Autocollage
Robotics
DARPA Urban Challenge
P(x|z) P(z|xi) ∏ xi
P(xi,xj) ∏ xi,xj Posterior Probability
Object Segmentation
Building
Sky
Tree Grass
xi ϵ Labels L = {l1, l2, … , lk}
=
Likelihood Term Pairwise Consistency Prior
x1 x2
P(x|z) P(z|xi) ∏ xi
P(xi,xj) ∏ xi,xj Posterior Probability
Object Segmentation
Building
Sky
Tree Grass
xi ϵ Labels L = {l1, l2, … , lk}
=
Likelihood Term Pairwise Consistency Prior
Left image Right image
Disparity map
Depth Estimation
E(X) E: {0,1}n → R 0 → fg
1 → bg
Image (D)
n = number of pixels
[Boykov and Jolly ‘ 01] [Blake et al. ‘04] [Rother, Kolmogorov and Blake `04]
Unary Cost (ci) Dark (negative) Bright (positive)
E: {0,1}n → R 0 → fg
1 → bg
n = number of pixels
[Boykov and Jolly ‘ 01] [Blake et al. ‘04] [Rother, Kolmogorov and Blake `04]
∑ fi (xi)
Pixel Colour
E(X) =
Unary Cost (ci) Dark (negative) Bright (positive)
E: {0,1}n → R 0 → fg
1 → bg
n = number of pixels
[Boykov and Jolly ‘ 01] [Blake et al. ‘04] [Rother, Kolmogorov and Blake `04]
x* = arg min E(x)
∑ fi (xi)
Pixel Colour
E(X) =
Discontinuity Cost (dij)
E: {0,1}n → R 0 → fg
1 → bg
n = number of pixels
[Boykov and Jolly ‘ 01] [Blake et al. ‘04] [Rother, Kolmogorov and Blake `04]
∑ fi (xi)
Pixel Colour
E(X) = + ∑ gij (xi,xj)
Smoothness Prior
Discontinuity Cost (dij)
E: {0,1}n → R 0 → fg
1 → bg
n = number of pixels
[Boykov and Jolly ‘ 01] [Blake et al. ‘04] [Rother, Kolmogorov and Blake `04]
∑ fi (xi)
Pixel Colour
E(X) = + ∑ dij|xi-xj|
Smoothness Prior
Global Minimum (x*)
x* = arg min E(x) x
How to minimize E(x)?
E: {0,1}n → R 0 → fg
1 → bg
n = number of pixels
[Boykov and Jolly ‘ 01] [Blake et al. ‘04] [Rother, Kolmogorov and Blake `04]
∑ fi (xi)
Pixel Colour
E(X) = + ∑ dij|xi-xj|
Smoothness Prior
Space of Problems
Submodular
Functions
CSP
Tree Structured
Pair-wise O(n3)
MAXCUT
O(n6)
n = Number of Variables
Segmentation Energy
NP-Hard
.. So what are the challenges?
Image Segmentation
Semantic Segmentation
Texture Denoising
Penalizes all discontinuities
Pairwise Models are Weak
Does not enforce “group” consistency
Does not enforce consistency
of local structure
Unary Potentials [Shotton et al. ECCV 2006]
+ Pairwise Smoothness
Potentials
Higher Order Energy
Colour, Location & Texture Sky
Building
Tree
Grass
Higher Order Potentials (Defined using multiple
Segmentations)
+
Kohli Ladicky Torr [CVPR 2008]
{ 0 if xi=L, I ϵ p C otherwise
h(Xp) =
p
Pixels belonging to a group should take the
same label
Unary Potentials [Shotton et al. ECCV 2006]
+ Pairwise Smoothness
Potentials
+
Higher Order Energy
Colour, Location & Texture Sky
Building
Tree
Grass
Kohli Ladicky Torr [CVPR 2008]
{ 0 if xi=L, i ϵ p C otherwise h(Xp) =
Higher Order Potentials (Defined using multiple
Segmentations)
Unary Potentials [Shotton et al. ECCV 2006]
+ Pairwise Smoothness
Potentials
Segmentation Solution
+
Higher Order Energy
Energy Minimization
Colour, Location & Texture Sky
Building
Tree
Grass
Kohli Ladicky Torr [CVPR 2008]
Higher Order Potentials (Defined using multiple
Segmentations)
Image
Unary Classifiers Pairwise CRF
Pn Potts
Robust Pn Potts
Clustering 1 Clustering 2
Image (MSRC-21)
Pairwise CRF
Higher order CRF
Ground Truth
Sheep
Grass
[Runner-Up, PASCAL VOC 2008 ]
Test Image Test Image (60% Noise)
Training Image
Result
Pairwise Energy
P(x)
Minimized using st-mincut or max-product
message passing
Higher Order Structure not Preserved
Minimize:
Where:
Higher Order Function (|c| = 10x10 = 100)
Assigns cost to 2100 possible labellings!
Exploit function structure to transform it to a Pairwise function
E(X) = P(X) + ∑ hc (Xc) c
hc: {0,1}|c| → R
p1 p2 p3
Test Image Test Image (60% Noise)
Training Image
Pairwise Result
Higher-Order Result
Learned Patterns
Identified transformable families of higher order function s.t.
1. Constant or polynomial number of auxiliary variables (a) added
2. All pairwise functions (g) are submodular
Pairwise Submodular
Function
Higher Order Function
H (X) = F ( ∑ xi ) Example:
H (X)
∑ xi
concave
[Kohli et al. ‘08]
0
Simple Example using Auxiliary variables
{ 0 if all xi = 0 C1 otherwise f(x) =
min f(x) min C1a + C1 ∑ ā xi
x ϵ L = {0,1}n
x = x,a ϵ {0,1}
Higher Order Submodular Function
Quadratic Submodular Function
∑xi < 1 a=0 (ā=1) f(x) = 0
∑xi ≥ 1 a=1 (ā=0) f(x) = C1
min f(x) min C1a + C1 ∑ ā xi x
= x,a ϵ {0,1}
Higher Order Submodular Function
Quadratic Submodular Function
∑xi
1 2 3
C1
C1∑xi
[Kohli et al. ‘08]
min f(x) min C1a + C1 ∑ ā xi x
= x,a ϵ {0,1}
Higher Order Submodular Function
Quadratic Submodular Function
∑xi
1 2 3
C1
C1∑xi
a=1 a=0 Lower envelop of concave functions is
concave
[Kohli et al. ‘08]
min f(x) min f1 (x)a + f2(x)ā x
= x,a ϵ {0,1}
Higher Order Submodular Function
Quadratic Submodular Function
∑xi
1 2 3
Lower envelop of concave functions is
concave
f2(x)
f1(x)
[Kohli et al. ‘08]
min f(x) min f1 (x)a + f2(x)ā x
= x,a ϵ {0,1}
Higher Order Submodular Function
Quadratic Submodular Function
∑xi
1 2 3
a=1 a=0 Lower envelop of concave functions is
concave
f2(x)
f1(x)
[Kohli et al. ‘08]
min f(x) min max f1 (x)a + f2(x)ā x
= x ϵ {0,1}
Higher Order Submodular Function
Quadratic Submodular Function
∑xi
1 2 3
a=1 a=0 Upper envelope
of linear functions is
convex
f2(x)
f1(x)
a ϵ {0,1}
Very Hard Problem!!!! [Kohli and Kumar, CVPR 2010]
Object Prior on size of object (say n/2)
Background
BINARY SEGMENTATION
SIZE/VOLUME PRIORS
[Kohli and Kumar, CVPR 2010]
SILHOUETTE CONSTRAINTS
Rays must touch silhouette
at least once
3D RECONSTRUCTION
[Sinha et al. ‘05, Cremers et al. ’08] [Woodford et al. 2009]
Original Video
MSR
Edited Video
Alternative Representation
MSR
(Memory Efficient)
[Rav-Acha, Kohli, Rother & Fitzgibbon, SIGGRAPH 2008]
A deforming 3D object
x
y x = (x, y) Note:
u = (u, v) Note:
Visibility Mask Warping
w(u,t) b(u,t)
Texture map
visibility mask warping Image Texture map
+ =
+ =
visibility mask warping Image Texture map
[Rav-Acha, Kohli, Rother & Fitzgibbon, SIGGRAPH 2008]
Speed and Accuracy
3,600,000,000 Pixels Created from about 800 8 MegaPixel Images
[Kopf et al. (MSR Redmond) SIGGRAPH 2007 ]
[Kopf et al. (MSR Redmond) SIGGRAPH 2007 ]
Speed and Scalability
Processing Videos 1 minute video of 1M pixel resolution
3.6 B pixels
3D reconstruction [500 x 500 x 500 = .125B voxels]
Speed and Scalability
Dynamic Algorithms [ICCV2005][CVPR 2008][PAMI 2007] [PAMI 2010]
Reuse computation
Hybrid Algorithms [PAMI 2010]
Use different algorithms to solve easy/hard parts
Adaptive Algorithms [AISTATS 2010] [ICML 2011] [CVPR 2011]
Focus on the unsolved parts of the problem