Pixels and Particles: Sequential Monte Carlo for Image Analysis

Patrick Pérez [[email protected]]

Microsoft Research, Cambridge, UK
Workshop on Particle Filter, Paris, 2-3 Dec. 2002

http://research.microsoft.com/users/pperez/


Outline
  Foreword
  Visual tracking
    Specificities
    Why it is difficult
    Why particle filters are appealing
    Various forms
  Six years of visual tracking with particles
    1996-2000: tracking at Oxford
    2000-2002: blossom
  Two applied examples at Microsoft Research
    Color-based tracking
    Interactive contour extraction
  Thoughts: promises, pitfalls, open problems, and alternatives

Why SMC with Images?
Probabilistic generative models: powerful for a large range of image analysis and computer vision tasks
  Good at capturing (high-dimensional) priors
  Good at solving inverse problems under uncertainty
Visual tracking: « following » entities in successive video frames
  Perceptual interfaces
  Visual communication
  Intelligent cars
  Robotics
  Surveillance, biometrics
  Medical imaging
  Motion capture for sport, medicine, games, movies
  Video editing, analysis, compression
Other applications: extracting contours in still images

Many Faces of Visual Tracking
Tracking…
  Object of a given nature. Cars. People. Faces.
  Object of a given nature, with a specific attribute. Moving cars. Walking people. Talking heads. Face of a given person.
  A picked object, whatever its nature.
  Moving entities.

Tasks and Problems
Hierarchy of tasks
  Tracking an entity, based on frame-to-model consistency
  Detection (for initialization and re-initialization)
  Recognition (identity, activity)
Multiple sources of trouble
  Dimension loss
  Noise
  Variability of image-based appearance
  Occlusions, partial to total
  Clutter
  Motion amplitude
  Real-time constraints

Why Particle Filter?
Sometimes hybrid and/or high-dimensional state spaces
Always complex measurement models
  Huge number of measurements at each instant
  The state tells which subset of data to look at
  Hence highly non-linear, and multi-modal under clutter
  Total and partial occlusions make data association even worse

Hidden Process
State captures various aspects of tracked objects
  3D pose/shape [cont.]
  2D pose/shape [cont. or disc.]
  Point-wise appearance [cont. or disc.]
  Color [cont. or disc.]
  Identity [disc.]
  Activity [disc.]
Dimension ranges from 2 to 50
Dynamics: often AR-p on continuous components, HMM on discrete ones
Link to data
  Part of the state defines a portion of the image plane
  Part of the state might define an appearance
  The likelihood will explain data in and/or around that portion of the image plane
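A minimal sketch of one particle-filter cycle (predict, weight, resample) for a hidden process of this kind. It assumes a scalar state, AR(1) dynamics standing in for AR-p, and a Gaussian likelihood; the function names and all parameter values are illustrative assumptions, not taken from the talk.

```python
import math
import random


def systematic_resample(weights, rng):
    """Return particle indices drawn by systematic resampling.

    `weights` must be normalized (sum to 1).
    """
    n = len(weights)
    start = rng.random() / n
    indices = []
    i, cumulative = 0, weights[0]
    for k in range(n):
        u = start + k / n
        # Advance to the particle whose cumulative weight covers u.
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        indices.append(i)
    return indices


def bootstrap_step(particles, observation, rng,
                   a=0.9, sigma_dyn=1.0, sigma_obs=0.5):
    """One predict / weight / resample cycle for a scalar AR(1) state."""
    # Predict: x_t = a * x_{t-1} + Gaussian noise (assumed dynamics).
    predicted = [a * x + rng.gauss(0.0, sigma_dyn) for x in particles]
    # Weight: Gaussian likelihood of the observation (assumed data model).
    log_w = [-0.5 * ((observation - x) / sigma_obs) ** 2 for x in predicted]
    shift = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    w = [wi / total for wi in w]
    # Posterior-mean estimate, computed before resampling.
    estimate = sum(wi * xi for wi, xi in zip(w, predicted))
    resampled = [predicted[i] for i in systematic_resample(w, rng)]
    return resampled, estimate


rng = random.Random(0)
particles = [rng.gauss(0.0, 2.0) for _ in range(500)]
for y in [1.0, 1.2, 0.9]:
    particles, estimate = bootstrap_step(particles, y, rng)
print(round(estimate, 2))
```

In a visual tracker the Gaussian weight would be replaced by an image-based likelihood evaluated in the region that each particle defines.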

Deformable Curves
State: control points
Context: entities of unspecified nature, e.g., moving objects
[Freedman’s active contours]
http://www.cs.rpi.edu/~freedd/

Eigen Shapes
State: affine transform + a few deformation modes on the average shape
Context: for objects of a given type whose shape is learnt off-line and linearly parameterized (PCA)
[Taylor and Cootes’s Active Shapes]
http://www.wiau.man.ac.uk/~bim/
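A small sketch of how such a state could be turned into contour points: a linear (PCA) deformation of the average shape, followed by an affine map into the image. The function name, argument layout, and toy numbers are assumptions made for illustration only.

```python
def synthesize_shape(mean_shape, modes, coeffs, affine, translation):
    """Build contour points from a linear shape model and an affine map.

    mean_shape:  list of (x, y) points of the average shape
    modes:       list of deformation modes, each a list of (dx, dy) per point
    coeffs:      one weight per mode (the shape part of the state)
    affine:      2x2 matrix ((a, b), (c, d))
    translation: (tx, ty)
    """
    (a, b), (c, d) = affine
    tx, ty = translation
    points = []
    for k, (mx, my) in enumerate(mean_shape):
        # Linear deformation around the mean shape.
        x = mx + sum(w * mode[k][0] for w, mode in zip(coeffs, modes))
        y = my + sum(w * mode[k][1] for w, mode in zip(coeffs, modes))
        # Affine placement in the image plane.
        points.append((a * x + b * y + tx, c * x + d * y + ty))
    return points


# Toy model: two points, one deformation mode, scaling by 2 and a shift.
shape = synthesize_shape(
    mean_shape=[(0.0, 0.0), (1.0, 0.0)],
    modes=[[(0.0, 1.0), (0.0, -1.0)]],
    coeffs=[2.0],
    affine=((2.0, 0.0), (0.0, 2.0)),
    translation=(10.0, 20.0),
)
print(shape)
```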






Eigen Appearances
State: affine transform + a few deformation modes on the average template
Context: for objects of a given type whose appearance is learnt off-line and linearly parameterized (PCA)
[Taylor and Cootes’s Active Appearances]






Exemplars
State: affine transform applied to a small set of exemplars
Context: for objects of a given type, whose shape and/or appearance are learnt off-line as a “flip book”
[Gavrila’s Chamfer System]
http://www.gavrila.net/




3D Models
State: 3D pose of a set of parameterized parts (possibly articulated)
Context: pose tracking of objects of known type (manufactured objects, human body) whose geometry is known, assumed, or learnt

[Sminchisescu’s body model] [Sidenbladh’s body tracker]

http://www.inrialpes.fr/movi/people/Sminchisescu/




http://www.nada.kth.se/~hedvig/index_en.html





Measurements
Raw images (monocular, binocular, etc.)
  Intensity
  Color
  Support: pixel grid, possibly sub-sampled
Filtered image
  Smoothed image
  Frame difference
  Gradients
  Wavelet coefficients, steerable filters
  Support: pixel grid, possibly sub-sampled
Low-level features (output of detectors)
  Edges
  Corners
  Moving edges
  Support: sparse, possibly dependent on the state

Outline Likelihood
Measurements: maxima of the projected luminance gradient along normals to the outline (several such events on each normal)

Fg/Bkg Grid Likelihood
Measurements: outputs of a filter bank on a grid of points
Background distribution: learned at each grid point on the empty scene
Foreground distribution: learned off-line for objects of interest
Scene likelihood

Appearance Likelihood
Reference appearance
Hypothesized appearance (affine warp)

Likelihood

Point-wise

Shuffled

1996-2000 Oxford Heritage
Mostly contour-based tracking. All papers are available at the URL below.
  [Isard’96] CONDENSATION
  [Isard’98] Contour/skin color SIS with color-based proposal density
  [Isard’98] Smoothing
  [Isard’98] Switching AR-processes
  [McCormick’99] Exclusion principle/partitioned sampling for MOT
  [Deutscher’99] 3D articulated tracking with singularities
  [Rittscher’99] Partial importance sampling for human motion classification
  [McCormick’00] Partitioned sampling for articulated motion
  [Deutscher’00] Annealed particle filter for 3D human tracking

http://www.robots.ox.ac.uk/~vdg/

2001 Blossom
ICCV’01
  [Philomin’01] Quasi-random sampling
  [Toyama’01] Likelihood for contour/appearance exemplars
  [Choo’01] Hybrid Monte Carlo for 3D human tracking
  [Isard’01] 3D multi-people tracking with background subtraction
  [Vermaak’01] SIS for audio-visual speaker localization
  [Sullivan’01] Deterministic search guidance
  [Pérez’01] Interactive contour extraction with particles
  [Sidenbladh’01] 3D human tracking with 2D motion data
CVPR’01
  [Rui’01] Unscented particle filter for contour-based face tracking
  [Sminchisescu’01] Covariance scaled sampling for 3D body tracking
Misc.
  [Spengler’01] Multi-cue democratic integration

http://www.umiacs.umd.edu/users/vasi/Default.htm

http://research.microsoft.com/users/toyama/

http://research.microsoft.com/users/misard/

http://www-sigproc.eng.cam.ac.uk/~jv211/






2002 Blossom
ECCV’02
  [Sidenbladh’02] Example-based state process for 3D human tracking
  [Sullivan’02] View-based tracking/recognition of human actions
  [Vermaak’02] Adaptive multi-cue tracking
  [Pérez’02] Color histogram-based tracking of multiple objects
  [Sminchisescu’02] Hyperdynamics importance sampling
Misc.
  [Nummiaro’02] Color histogram-based tracking
  [Spengler’02] Multi-cue multi-nature (car/human) tracking on background
  [Tweed’02] Tracking many objects with subordinated PF
  [Nummiaro’02b] Adaptive color histogram-based tracking




Color-based Tracking
[Joint work with C. Hue, J. Vermaak, M. Gangnet. ECCV’02]

Colour-only tracking is appealing when:
  No prior knowledge of the entities to be tracked
  Dramatic changes of appearance through the sequence
Principle: compare the colour content of candidate regions against a reference colour histogram
Two deterministic predecessors: [Bradski’98] [Comaniciu’00]

http://www.intel.com/research/mrl/people/bradski_g.htm


http://www.caip.rutgers.edu/~comanici/

Model Ingredients
  State vector (position, scale)
  Associated image region
  N-bin colour histogram
  Reference histogram

Likelihood based on Bhattacharyya distance:
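A hedged sketch of a histogram likelihood of this kind. It assumes the common form exp(-λ D²), with D the Bhattacharyya distance between normalized N-bin histograms; the function names and the value λ = 20 are illustrative assumptions, not read off the slide.

```python
import math


def normalize(hist):
    """Scale raw bin counts so that they sum to 1."""
    total = float(sum(hist))
    if total <= 0.0:
        raise ValueError("histogram must have positive mass")
    return [h / total for h in hist]


def bhattacharyya_distance(p, q):
    """D = sqrt(1 - sum_n sqrt(p_n * q_n)) for normalized histograms."""
    coeff = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp to guard against rounding slightly above 1.
    return math.sqrt(max(0.0, 1.0 - coeff))


def color_likelihood(candidate, reference, lam=20.0):
    """Unnormalized likelihood exp(-lam * D^2); lam is an assumed tuning value."""
    d = bhattacharyya_distance(normalize(candidate), normalize(reference))
    return math.exp(-lam * d * d)


reference = [10, 30, 60]          # reference colour histogram (toy counts)
print(color_likelihood([12, 28, 60], reference))   # similar region
print(color_likelihood([60, 30, 10], reference))   # dissimilar region
```

Each particle (position, scale) selects a candidate region; its weight would be the value returned for the histogram collected in that region.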

Results

Clutter [deterministic vs. Monte Carlo]

Large motion, blur, shape changes, partial occlusion, complete occlusion

Multipart Colour Model
Idea: roughly capture the spatial colour layout
Multipart model
  The region is partitioned into sub-regions, each with an associated reference histogram
  Assuming conditional independence of the sub-regions, the likelihood factorizes over them, where each candidate histogram is collected in the corresponding sub-region
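Under the conditional-independence assumption, per-part likelihoods multiply, so their logarithms add. A minimal sketch, assuming each part uses a likelihood of the form exp(-λ D²) with D the Bhattacharyya distance; the function names and λ = 20 are illustrative assumptions.

```python
import math


def squared_bhattacharyya(p, q):
    """Return 1 - sum_n sqrt(p_n * q_n) for histograms given as raw counts."""
    sp, sq = float(sum(p)), float(sum(q))
    if sp <= 0.0 or sq <= 0.0:
        raise ValueError("histograms must have positive mass")
    coeff = sum(math.sqrt((a / sp) * (b / sq)) for a, b in zip(p, q))
    return max(0.0, 1.0 - coeff)


def multipart_log_likelihood(candidate_parts, reference_parts, lam=20.0):
    """Sum of per-part log-likelihoods (a product of likelihoods in log form)."""
    if len(candidate_parts) != len(reference_parts):
        raise ValueError("one candidate histogram per reference histogram")
    return -lam * sum(
        squared_bhattacharyya(c, r)
        for c, r in zip(candidate_parts, reference_parts)
    )


# Two sub-regions (e.g., upper and lower half of the tracked region).
reference = [[8, 2, 0], [0, 3, 7]]
swapped = [[0, 3, 7], [8, 2, 0]]   # same colours overall, wrong layout
print(multipart_log_likelihood(reference, reference))
print(multipart_log_likelihood(swapped, reference))
```

The swapped layout has the same overall colour content as the reference but scores lower, which is the point of splitting the region.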

Multiple Objects
Joint state, with each object associated with its own reference histogram
Independent dynamics
Data likelihood: obtained by marginalizing out the depth ordering

Background Modelling
With a still camera: background subtraction
  Reference background image
  Likelihood

Skin Detection
  Learn a skin colour histogram off-line
  Label each pixel on/off-skin with a thresholded likelihood
  Start a new object around any skin-labelled pixel cluster of sufficient size and away from existing hypothesized objects
  Filter out false alarms with motion information if the camera is still
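The detection steps above can be sketched as follows, assuming a per-pixel skin likelihood map is already available (here a small nested list). The 4-connectivity, the thresholds, and the function names are illustrative assumptions.

```python
from collections import deque


def skin_clusters(likelihood, threshold, min_size):
    """Return 4-connected clusters of pixels whose likelihood exceeds threshold.

    Only clusters with at least `min_size` pixels are kept.
    """
    rows, cols = len(likelihood), len(likelihood[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or likelihood[r][c] <= threshold:
                continue
            # Breadth-first flood fill from this skin-labelled pixel.
            queue, cluster = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                cluster.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and likelihood[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(cluster) >= min_size:
                clusters.append(cluster)
    return clusters


def new_object_seeds(clusters, existing_centers, min_distance):
    """Keep centroids farther than min_distance from every existing object."""
    seeds = []
    for cluster in clusters:
        cy = sum(p[0] for p in cluster) / len(cluster)
        cx = sum(p[1] for p in cluster) / len(cluster)
        if all((cy - ey) ** 2 + (cx - ex) ** 2 > min_distance ** 2
               for ey, ex in existing_centers):
            seeds.append((cy, cx))
    return seeds


grid = [
    [0.9, 0.9, 0.0, 0.0],
    [0.9, 0.9, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.0],
]
clusters = skin_clusters(grid, threshold=0.5, min_size=3)
print(new_object_seeds(clusters, existing_centers=[], min_distance=1.0))
```

The motion-based filtering of false alarms is not covered by this sketch.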

Results
Automatic detection (skin-based) and background subtraction

Interactive Contour Extraction
[Joint work with A. Blake and M. Gangnet. ICCV’01]

Applications
  Interactive cutout for image editing
  Road extraction in aerial/satellite images
  Blood vessel extraction in endoscopic images

The SMC approach
  Contour as the trajectory of a hidden dynamic process
  Difficult tracking: gaps, spurious contours, branching
  Unconventional tracking: no natural time, no sequential data

Ingredients
State model: two-dimensional 2nd-order AR process
Curve: chain of pixels traversed by the polyline joining the state points
Measurement model: on and off the curve
Combined in the posterior: probability of a path given the data

Data Model
Measurements: norm of the intensity gradient
Likelihoods
  Over the whole image (off the curve): consistent exponential behaviour
  On plausible contours, interactively extracted: complex mixture over the whole range. We chose a uniform distribution
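A minimal sketch of the resulting likelihood ratio, assuming an exponential density for the gradient norm off the curve and a uniform density on the curve. The parameter names `mean_off` and `grad_max`, and the toy values, are assumptions made for illustration.

```python
import math


def log_likelihood_ratio(grad_norm, mean_off, grad_max):
    """log of p_on(g) / p_off(g) for one pixel.

    p_on:  uniform on [0, grad_max]          (on-curve model)
    p_off: exponential with mean `mean_off`  (off-curve model)
    """
    log_p_on = -math.log(grad_max)
    log_p_off = -math.log(mean_off) - grad_norm / mean_off
    return log_p_on - log_p_off


def path_log_ratio(grad_norms, mean_off, grad_max):
    """Sum of per-pixel log ratios along a candidate path."""
    return sum(log_likelihood_ratio(g, mean_off, grad_max) for g in grad_norms)


# A path over strong gradients scores higher than one over flat areas.
print(path_log_ratio([50.0, 60.0, 55.0], mean_off=10.0, grad_max=100.0))
print(path_log_ratio([2.0, 1.0, 3.0], mean_off=10.0, grad_max=100.0))
```

Working with the ratio means only pixels along the hypothesized path need to be visited, since the off-curve term over the rest of the image is common to all paths.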

Dynamics
Fixed step size
Smooth: Gaussian direction changes, with a few abrupt changes
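A sketch of one draw from dynamics of this kind: the heading given by the last two points is turned by a random angle, then the curve advances by a fixed step. The mixture used here (a Gaussian turn most of the time, a uniform turn in (-π/2, π/2) with small probability) and all parameter values are assumptions made for illustration.

```python
import math
import random


def next_point(prev, curr, step, rng, sigma_theta=0.1, p_abrupt=0.01):
    """Sample the next curve point from the two previous ones.

    prev, curr: (x, y) tuples; the pair defines the current heading,
    which is what makes the process second order.
    """
    heading = math.atan2(curr[1] - prev[1], curr[0] - prev[0])
    if rng.random() < p_abrupt:
        # Rare abrupt change of direction (e.g., at a corner).
        theta = rng.uniform(-math.pi / 2, math.pi / 2)
    else:
        # Smooth, Gaussian change of direction.
        theta = rng.gauss(0.0, sigma_theta)
    heading += theta
    return (curr[0] + step * math.cos(heading),
            curr[1] + step * math.sin(heading))


rng = random.Random(0)
path = [(0.0, 0.0), (1.0, 0.0)]   # starting point and direction from the user
for _ in range(20):
    path.append(next_point(path[-2], path[-1], step=1.0, rng=rng))
print(len(path))
```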

Proposal Density
Smooth component of the dynamics, except if a corner is present (as assessed by the Harris corner detector, with each pixel labelled as corner or non-corner)

Proposal without corners
Proposal with corners

User interaction
  Starting point and direction
  Rough positioning of « dams » to block strong spurious contours
  Restarting, especially at corners

Demo…

JetStream: Interactive Cutout

“Ribbon” Extraction
Joint extraction of two “parallel” contours
  Width is part of the unknowns, with its own dynamics
  Likelihood ratios evaluated on the two contours

Road Extraction

JetStream Ribbon JetStream Varying width

(Aerial photographs: courtesy of the GeoInformation Group)

Pros and Cons of SMC
Advantages of SMC
  Easy to implement and expand
  Robust to clutter and brief occlusions
  A wealth of theoretical tools
Problems
  Jitter of the final estimate
  Computational load
  Only brief capture of multimodality

Thoughts
Research directions?
  Long-term multimodality
  Multiple objects
  Data fusion
  On-line model adaptation
  Proper likelihoods
  Data-driven proposal function
Final controversial view
  Often, dynamics simply maintains temporal coherence
  A good engine does not fix a weak model
  A discriminant and robust data model for the task at hand remains the challenge (pattern recognition)
  Alternatives to PF: variational approximation, EM?
