Pixels and Particles: Sequential Monte Carlo for Image Analysis
Patrick Pérez [[email protected]]
Microsoft Research, Cambridge, UK
Workshop on Particle Filter, Paris, 2-3 Dec. 2002
Outline
Foreword
Visual Tracking
  Specificities
  Why it is difficult
  Why particle filters are appealing
  Various forms
Six years of visual tracking with particles
  1996-2000 Tracking at Oxford
  2000-2002 Blossom
Two applied examples at Microsoft Research
  Color-based tracking
  Interactive contour extraction
Thoughts: promises, pitfalls, open problems, and alternatives
Why SMC with Images?
Probabilistic generative models: powerful for a large range of image analysis and computer vision tasks
  Good at capturing (high-dimensional) priors
  Good at solving inverse problems under uncertainty
Visual Tracking: “following” entities in successive video frames
  Perceptual interfaces
  Visual communication
  Intelligent cars
  Robotics
  Surveillance, biometrics
  Medical imaging
  Motion capture for sport, medicine, games, movies
  Video editing, analysis, compression
Other applications: extracting contours in still images
Many Faces of Visual Tracking
Tracking…
  An object of a given nature: cars, people, faces.
  An object of a given nature, with a specific attribute: moving cars, walking people, talking heads, the face of a given person.
  A picked object, whatever its nature.
  Moving entities.
Tasks and Problems
Hierarchy of tasks
  Tracking an entity, based on frame-to-model consistency
  Detection (for initialization and re-initialization)
  Recognition (identity, activity)
Multiple sources of trouble
  Dimension loss
  Noise
  Variability of image-based appearance
  Occlusions, partial to total
  Clutter
  Motion amplitude
  Real-time constraints
Why Particle Filter?
Sometimes hybrid and/or high-dimensional state spaces
Always complex measurement models
  Huge number of measurements at each instant
  The state tells which subset of data to look at
  Hence highly non-linear, and multi-modal under clutter
  Total and partial occlusions make data association even worse
Hidden Process
State captures various aspects of tracked objects
  3D pose/shape [cont.]
  2D pose/shape [cont. or disc.]
  Point-wise appearance [cont. or disc.]
  Color [cont. or disc.]
  Identity [disc.]
  Activity [disc.]
Dimension ranges from 2 to 50
Dynamics: often AR(p) on continuous components, HMM on discrete ones
Link to data:
  Part of state defines a portion of image plane:
  Part of state might define an appearance:
  Likelihood will explain data in and/or around
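The ingredients listed for the hidden process (a state, its dynamics, a likelihood linking state to data) plug into a particle filter in a simple way. The following is a minimal sketch of one bootstrap filter on a scalar position, assuming second-order autoregressive dynamics and a Gaussian measurement model. The function names, the AR coefficients, the noise levels and the initial particle spread are all illustrative assumptions, not values from the talk.

```python
import math
import random


def ar2_predict(state, rng, a1=2.0, a2=-1.0, noise_std=0.5):
    """Propagate (x_t, x_{t-1}) to (x_{t+1}, x_t) with an AR(2) model.

    a1=2, a2=-1 gives constant-velocity motion; all values are illustrative.
    """
    x, x_old = state
    x_new = a1 * x + a2 * x_old + rng.gauss(0.0, noise_std)
    return (x_new, x)


def bootstrap_step(particles, measurement, rng, meas_std=1.0):
    """One predict / weight / resample cycle; returns (particles, estimate)."""
    # Predict: sample each particle from the dynamics.
    predicted = [ar2_predict(s, rng) for s in particles]
    # Weight: Gaussian likelihood of the scalar measurement (an assumption).
    weights = [math.exp(-0.5 * ((measurement - s[0]) / meas_std) ** 2)
               for s in predicted]
    total = sum(weights)
    if total == 0.0:
        # All likelihoods underflowed: fall back to uniform weights.
        weights = [1.0] * len(predicted)
        total = float(len(predicted))
    weights = [w / total for w in weights]
    # Posterior-mean estimate of the current position.
    estimate = sum(w * s[0] for w, s in zip(weights, predicted))
    # Resample: multinomial draw with replacement, proportional to weight.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate


def track(measurements, n_particles=2000, seed=0):
    """Run the filter over a measurement sequence; returns the estimates."""
    rng = random.Random(seed)
    # Broad initial guess on (current, previous) position.
    particles = [(rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0))
                 for _ in range(n_particles)]
    estimates = []
    for z in measurements:
        particles, estimate = bootstrap_step(particles, z, rng)
        estimates.append(estimate)
    return estimates
```

Calling `track([1.0, 2.0, 3.0, 4.0, 5.0])` follows a target moving at unit speed. In visual tracking the scalar Gaussian likelihood would be replaced by an image-based one, which is where most of the modelling effort goes.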
Deformable Curves
[Freedman’s active contours]
State: control points
Context: entities of unspecified nature, e.g., moving objects

Eigen Shapes
[Taylor and Cootes’s Active Shapes]
State: affine transformation + a few deformation modes on the average shape
Context: for objects of a given type whose shape is learnt off-line and linearly parameterized (PCA)

Eigen Appearances
[Taylor and Cootes’s Active Appearances]
State: affine transformation + a few deformation modes on the average template
Context: for objects of a given type whose appearance is learnt off-line and linearly parameterized (PCA)

Exemplars
[Gavrila’s Chamfer System]
State: affine transformation applied to a small set of exemplars
Context: for objects of a given type, whose shape and/or appearance are learnt off-line as a “flip book”

3D Models
[Sminchisescu’s body model] [Sidenbladh’s body tracker]
State: 3D pose of a set of parameterized parts (possibly articulated)
Context: pose tracking of objects of known type (manufactured objects, human body) whose geometry is known, assumed, or learnt
Measurements
Raw images (monocular, binocular, etc.)
  Intensity
  Color
  Support: pixel grid, possibly sub-sampled
Filtered image
  Smoothed image
  Frame difference
  Gradients
  Wavelet coefficients, steerable filters
  Support: pixel grid, possibly sub-sampled
Low-level features (output of detectors)
  Edges
  Corners
  Moving edges
  Support: sparse, possibly dependent on
Outline Likelihood
Measurements: maxima of projected luminance gradient along normals ( such events on normal)
Fg/Bkg Grid Likelihood
Measurements: outputs of a filter bank on a grid of points
Background distribution: learned at each grid point on the empty scene
Foreground distribution: learned off-line for objects of interest
Scene likelihood
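One way to read the scene likelihood, assuming the filter responses are conditionally independent across grid points given the state: grid points covered by the hypothesized object are scored under the foreground model, and all others under their own background model. The sketch below works in the log domain. Scalar responses, Gaussian models and every name in it are simplifying assumptions for illustration.

```python
import math


def gaussian_logpdf(z, mean, std):
    """Log-density of a scalar Gaussian."""
    return (-0.5 * ((z - mean) / std) ** 2
            - math.log(std * math.sqrt(2.0 * math.pi)))


def scene_log_likelihood(responses, covered, fg_model, bkg_models):
    """Sum of per-point log-likelihoods over the whole grid.

    responses:  dict mapping grid point -> scalar filter response
    covered:    set of grid points inside the hypothesized object
    fg_model:   (mean, std) shared by all foreground points
    bkg_models: dict mapping grid point -> (mean, std) learnt on empty scene
    """
    total = 0.0
    for point, z in responses.items():
        mean, std = fg_model if point in covered else bkg_models[point]
        total += gaussian_logpdf(z, mean, std)
    return total


# Toy 2x2 grid: the right-hand column responds like foreground.
responses = {(0, 0): 0.1, (0, 1): 5.2, (1, 0): -0.2, (1, 1): 4.9}
bkg_models = {point: (0.0, 1.0) for point in responses}
fg_model = (5.0, 1.0)

good = scene_log_likelihood(responses, {(0, 1), (1, 1)}, fg_model, bkg_models)
bad = scene_log_likelihood(responses, {(0, 0), (1, 0)}, fg_model, bkg_models)
```

Here `good` exceeds `bad`, so a particle whose state covers the right-hand column receives the larger weight. Because background terms are shared by all hypotheses outside their own footprint, only the covered points change the ranking.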
Appearance Likelihood
Reference appearance
Hypothesized appearance (affine warp)
Likelihood
  Point-wise
  Shuffled
1996-2000 Oxford Heritage
Mostly contour-based tracking. All papers there.
  [Isard’96] CONDENSATION
  [Isard’98] Contour/skin color SIS with color-based proposal density
  [Isard’98] Smoothing
  [Isard’98] Switching AR processes
  [MacCormick’99] Exclusion principle/partitioned sampling for MOT
  [Deutscher’99] 3D articulated tracking with singularities
  [Rittscher’99] Partial importance sampling for human motion classification
  [MacCormick’00] Partitioned sampling for articulated motion
  [Deutscher’00] Annealed particle filter for 3D human tracking
2001 Blossom
ICCV’01
  [Philomin’01] Quasi-random sampling
  [Toyama’01] Likelihood for contour/appearance exemplars
  [Choo’01] Hybrid Monte Carlo for 3D human tracking
  [Isard’01] 3D multi-people tracking with background subtraction
  [Vermaak’01] SIS for audio-visual speaker localization
  [Sullivan’01] Deterministic search guidance
  [Pérez’01] Interactive contour extraction with particles
  [Sidenbladh’01] 3D human tracking with 2D motion data
CVPR’01
  [Rui’01] Unscented particle filter for contour-based face tracking
  [Sminchisescu’01] Covariance Scaled Sampling for 3D body tracking
Misc.
  [Spengler’01] Multi-cue democratic integration
2002 Blossom
ECCV’02
  [Sidenbladh’02] Example-based state process for 3D human tracking
  [Sullivan’02] View-based tracking/recognition of human actions
  [Vermaak’02] Adaptive multi-cue tracking
  [Pérez’02] Color histogram-based tracking of multiple objects
  [Sminchisescu’02] Hyperdynamics Importance Sampling
Misc.
  [Nummiaro’02] Color histogram-based tracking
  [Spengler’02] Multi-cue multi-nature (car/human) tracking on background
  [Tweed’02] Tracking many objects with subordinated PF
  [Nummiaro’02b] Adaptive color histogram-based tracking
Color-based Tracking
[Joint work with C. Hue, J. Vermaak, M. Gangnet. ECCV’02]
Colour-only tracking is appealing when:
  No prior knowledge of the entities to be tracked
  Dramatic changes of appearance through the sequence
Principle: compare the colour content of candidate regions against a reference colour histogram
Two deterministic predecessors: [Bradski’98] [Comaniciu’00]
Model Ingredients
State vector (position, scale)
Associated image region
N-bin colour histogram
Reference histogram
Likelihood based on Bhattacharyya distance:
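A minimal sketch of a histogram likelihood of this kind, assuming the common form in which the likelihood decays exponentially with the squared Bhattacharyya distance between the normalised candidate and reference histograms. The function names and the scale parameter `lam`, including its default value, are illustrative assumptions.

```python
import math


def normalize(hist):
    """Scale non-negative bin counts so that they sum to one."""
    total = float(sum(hist))
    if total <= 0.0:
        raise ValueError("histogram must have positive mass")
    return [h / total for h in hist]


def bhattacharyya_distance_sq(p, q):
    """Squared Bhattacharyya distance, 1 - sum_n sqrt(p[n] * q[n])."""
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - coefficient)


def color_likelihood(candidate, reference, lam=20.0):
    """Unnormalised likelihood exp(-lam * D^2) of a candidate region."""
    d2 = bhattacharyya_distance_sq(normalize(candidate), normalize(reference))
    return math.exp(-lam * d2)


# Toy 4-bin histograms (raw pixel counts per colour bin).
reference = [10, 30, 40, 20]
same = color_likelihood([5, 15, 20, 10], reference)   # same shape, fewer pixels
disjoint = color_likelihood([7, 0, 0, 0], [0, 3, 0, 0])  # no shared bins
```

Identical colour distributions give a likelihood of 1 whatever the region size, and histograms with no bin in common give `exp(-lam)`. A larger `lam` makes the filter more selective, at the price of sharper weight degeneracy.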
Results
Clutter [deterministic vs. Monte Carlo]
Large motion, blur, shape changes, partial occlusion
Complete occlusion
Multipart Colour Model
Idea: roughly capture the spatial colour layout
Multipart model
  Region is partitioned as with associated reference histograms
  Assuming conditional independence of sub-regions
  where the histogram is collected in region of
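Under the conditional-independence assumption, the multipart likelihood factorises over the sub-regions. As a sketch, suppose each part is scored by a likelihood that is exponential in a squared Bhattacharyya distance. The symbols below ($J$ parts, $N$ bins, reference histograms $q_j^{\star}$, candidate histograms $q_j(x)$ collected in sub-region $j$ of the region defined by state $x$, scale $\lambda$) and the exponential form itself are assumptions introduced here for illustration.

```latex
p(y \mid x) \;=\; \prod_{j=1}^{J} p\bigl(y_j \mid x\bigr)
\;\propto\; \exp\Bigl(-\lambda \sum_{j=1}^{J} D^2\bigl[q_j^{\star},\, q_j(x)\bigr]\Bigr),
\qquad
D^2\bigl[q^{\star}, q\bigr] \;=\; 1 - \sum_{n=1}^{N} \sqrt{q^{\star}(n)\, q(n)} .
```

In this reading, the per-part distances simply add up in the exponent, so a single badly matched part is enough to penalise the whole hypothesis.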
Multiple Objects
State with object associated to reference histogram
Independent dynamics
Data likelihood: marginalizing out depth ordering
  with computed on
Background Modelling
When still camera: background subtraction
Reference background image
Likelihood
Skin Detection
Learn skin colour histogram off-line
Label pixels on/off-skin with thresholded likelihood
Start a new object around a skin-labelled pixel cluster of sufficient size and away from existing hypothesized objects
Filter out false alarms with motion information if still camera
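The first three steps of this detection scheme can be sketched as follows: threshold a per-pixel skin likelihood, then keep connected clusters that are large enough to seed a new object. The sketch assumes pixels are already quantised to colour-bin indices, uses 4-connectivity, and leaves out both the distance test against existing objects and the motion-based filtering. All names and thresholds are hypothetical.

```python
from collections import deque


def skin_mask(image_bins, skin_hist, threshold):
    """Label each pixel on/off-skin by thresholding its histogram likelihood.

    image_bins: 2D list of colour-bin indices, one per pixel
    skin_hist:  list mapping bin index -> skin likelihood (learnt off-line)
    """
    return [[skin_hist[b] >= threshold for b in row] for row in image_bins]


def large_clusters(mask, min_size):
    """Return 4-connected on-skin clusters holding at least min_size pixels."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    found = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited skin pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            cluster = []
            while queue:
                y, x = queue.popleft()
                cluster.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(cluster) >= min_size:
                found.append(cluster)
    return found


# Toy image: bin 1 is skin-like; a 2x2 blob plus one isolated pixel.
image_bins = [[0, 0, 0, 0],
              [0, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 1]]
mask = skin_mask(image_bins, skin_hist=[0.01, 0.9], threshold=0.5)
seeds = large_clusters(mask, min_size=3)
```

With `min_size=3`, the 2x2 blob is kept as a seed while the isolated pixel in the corner is discarded as a likely false alarm.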
Results Automatic detection (skin-based) and background subtraction
Interactive contour extraction
[Joint work with A. Blake and M. Gangnet. ICCV’01]
Applications
  Interactive cutout for image editing
  Road extraction in aerial/satellite images
  Blood vessel extraction in endoscopic images
The SMC approach
  Contour as trajectory of a hidden dynamic process
  Difficult tracking: gaps, spurious contours, branching
  Unconventional tracking: no natural time, no sequential data
Ingredients
State model: bi-dimensional 2nd-order AR process
: chain of pixels traversed by polyline with vertices
Measurement model: on and off the curve
Combined in the posterior: probability of a path given the data
Data Model
Measurements: norm of intensity gradient
Likelihoods
  Over the whole image: consistent exponential behaviour
  On plausible contours, interactively extracted: complex mixture over the whole range. We chose a uniform distribution.
Dynamics
Fixed step-size: e.g.,
Smooth: Gaussian direction changes with a few abrupt changes
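Dynamics of this kind can be sketched by growing a polyline one fixed-length segment at a time, with the change of direction drawn from a mixture: usually a small Gaussian turn, occasionally an abrupt one. The sketch assumes the abrupt turns are uniform on (-pi/2, pi/2). That choice, the mixture weight, the Gaussian spread, the step length and the function names are all illustrative assumptions.

```python
import math
import random


def sample_direction_change(rng, sigma=0.1, abrupt_prob=0.05):
    """Draw a turn angle: mostly small Gaussian turns, sometimes abrupt ones."""
    if rng.random() < abrupt_prob:
        # Abrupt change (e.g. a corner): uniform over a half-turn range.
        return rng.uniform(-math.pi / 2.0, math.pi / 2.0)
    return rng.gauss(0.0, sigma)


def extend_path(path, rng, step=1.0, sigma=0.1, abrupt_prob=0.05):
    """Append one vertex at distance `step` from the last one.

    The new vertex depends on the two previous vertices (through the current
    heading), which makes this a second-order process on the 2D position.
    """
    (x0, y0), (x1, y1) = path[-2], path[-1]
    heading = math.atan2(y1 - y0, x1 - x0)
    heading += sample_direction_change(rng, sigma, abrupt_prob)
    new_point = (x1 + step * math.cos(heading), y1 + step * math.sin(heading))
    return path + [new_point]


# Grow one sample path from a starting point and direction.
rng = random.Random(1)
path = [(0.0, 0.0), (1.0, 0.0)]
for _ in range(20):
    path = extend_path(path, rng)
```

The starting point and direction are given by the first two vertices, which matches how the user initialises the extraction. In a particle filter, each particle would be one such path, extended and reweighted at every step.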
Proposal Density
Smooth component of the dynamics, except if a corner is present (as assessed by Harris corner detector, and labelled otherwise)
Proposal without corners
Proposal with corners
User interaction
  Starting point and direction
  Rough positioning of “dams” to block strong spurious contours
  Restarting, especially at corners
Demo…
JetStream: Interactive Cutout
“Ribbon” Extraction
Joint extraction of two “parallel” contours
  Width is part of the unknowns
  Dynamics on it:
  Likelihood ratios on
Road Extraction
JetStream Ribbon JetStream Varying width
(Aerial photographs: courtesy of the GeoInformation Group)
Pros and Cons of SMC
Advantages of SMC
  Easy to implement and expand
  Robust to clutter and brief occlusions
  A wealth of theoretical tools
Problems
  Jitter of the final estimate
  Computational load
  Only brief capture of multimodality
Thoughts
Research directions?
  Long-term multimodality
  Multiple objects
  Data fusion
  On-line model adaptation
  Proper likelihoods
  Data-driven proposal function
Final controversial view
  Often, dynamics simply maintains temporal coherence
  A good engine does not fix a weak model
  A discriminant and robust data model for the task at hand remains the challenge: pattern recognition
  Alternatives to PF: variational approximation, EM?