9. tracking - u-szeged.hukato/teaching/computervision/09-tracking.pdf25 zoltan kato: computer vision...



Post on 11-Aug-2020


Page 1: 9. Tracking - u-szeged.hukato/teaching/computervision/09-Tracking.pdf25 Zoltan Kato: Computer Vision Particle Filtering • Idea: Stochastic approximation of state posterior with a

9. Tracking

Computer Vision

Zoltan Kato
http://www.inf.u-szeged.hu/~kato/

Zoltan Kato: Computer Vision

Tracking
• Identify targets to track
  • Features like corners
  • Whole objects (shape)
• Follow targets over subsequent frames


Tracking Applications
• Robotics
  • Manipulation, grasping [Hong, 1995]
  • Mobility, driving [Taylor et al., 1996]
  • Localization [Dellaert et al., 1998]
• Surveillance/Activity monitoring
  • Street, highway [Koller et al., 1994; Stauffer & Grimson, 1999]
  • Aerial [Cohen & Medioni, 1998]
• Human-computer interaction
  • Expressions, gestures [Kaucic & Blake, 1998; Starner & Pentland, 1996]
  • Smart rooms/houses [Shafer et al., 1998; Essa, 1999]


Feature matching vs. tracking

What is a good feature?

Stereo correspondence (feature matching): Extract features independently in each image, then match them by comparing descriptors

Feature tracking: Extract features in the first image, then try to find the same features again in the next view


Feature tracking
• Identify features and track them over video
  • Small difference between frames
  • Potentially large difference overall
• Approaches:
  • Standard: KLT
    • Kanade-Lucas-Tomasi [Kanade-Lucas 81] [Shi-Tomasi 94]
  • Kalman filter
  • State of the art: CONDENSATION
    • CONditional DENSity propagATION [Isard-Blake 98]

Kanade-Lucas-Tomasi tracker
• Identify good feature points to track
  • ~Corners
• Track feature points
  • Assume small (~1 pixel) displacement between subsequent frames → translational motion
  • Assume pixels in a small window around the feature point have the same displacement → constant flow
• What about large motion?
  • Use multi-scale technique
    • top-down strategy in a Gaussian pyramid

KLT Tracker
• Compute translation ∆=(dx,dy) assuming it is small:
• Differentiate:
• Affine motion is also possible (6x6 instead of 2x2):
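The small-translation step can be illustrated with a minimal sketch. It assumes the usual Lucas-Kanade formulation, which the bullets name but the transcript does not spell out: the second frame satisfies J(x) ≈ I(x − ∆) inside a window, the image is linearized with its gradient g, and differentiating the windowed squared error gives the 2x2 system (Σ g gᵀ) ∆ = Σ g (I − J). The function names, the synthetic image, and the window size are hypothetical choices for illustration only.

```python
import math

def lk_translation(I, J, cx, cy, half=3):
    """One Lucas-Kanade step: estimate d such that J(x) ~ I(x - d) in a window."""
    gxx = gxy = gyy = ex = ey = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            # Central-difference gradient of the first frame.
            gx = (I[y][x + 1] - I[y][x - 1]) / 2.0
            gy = (I[y + 1][x] - I[y - 1][x]) / 2.0
            diff = I[y][x] - J[y][x]  # temporal difference, ~ g . d
            gxx += gx * gx
            gxy += gx * gy
            gyy += gy * gy
            ex += gx * diff
            ey += gy * diff
    det = gxx * gyy - gxy * gxy
    if abs(det) < 1e-12:
        raise ValueError("window has no 2-D texture; the 2x2 system is singular")
    # Solve [[gxx, gxy], [gxy, gyy]] [dx, dy]^T = [ex, ey]^T by Cramer's rule.
    dx = (gyy * ex - gxy * ey) / det
    dy = (gxx * ey - gxy * ex) / det
    return dx, dy

def make_image(shift_x=0.0, shift_y=0.0, size=21):
    """Smooth synthetic texture; the shift moves the pattern by (shift_x, shift_y)."""
    return [[math.sin(0.3 * (x - shift_x)) + math.cos(0.2 * (y - shift_y))
             for x in range(size)] for y in range(size)]

I = make_image()
J = make_image(0.3, -0.2)           # pattern displaced by a sub-pixel amount
dx, dy = lk_translation(I, J, 10, 10)
print(round(dx, 2), round(dy, 2))   # close to the true displacement (0.3, -0.2)
```

The estimate is only approximate because of the linearization and the finite-difference gradient; a practical tracker would iterate this step and combine it with an image pyramid for larger motions.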

Feature point extraction
• Approximate SSD for small displacement ∆
  • Image difference, square difference for pixel
  • SSD for window


Feature point extraction

[Figure: example windows for homogeneous, edge, and corner regions]

Find points for which the following is maximum

i.e. maximize the smallest eigenvalue of M

Good features to track
• Use same window in feature selection as for tracking itself
• Maximize minimal eigenvalue of M
• Strategy:
  • Look for strong, well-distributed features, typically a few hundred
  • Initialize and then track, renew features when too many are lost
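The minimal-eigenvalue criterion can be sketched in a few lines. This assumes M is the 2x2 matrix Σ g gᵀ of image gradients g = (gx, gy) summed over the window (the usual Shi-Tomasi definition; the transcript does not show it). The helper names and the toy gradient lists are hypothetical.

```python
import math

def structure_matrix(gradients):
    """Entries of M = sum of g g^T over the window's (gx, gy) gradients."""
    gxx = sum(gx * gx for gx, _ in gradients)
    gxy = sum(gx * gy for gx, gy in gradients)
    gyy = sum(gy * gy for _, gy in gradients)
    return gxx, gxy, gyy

def min_eigenvalue(gxx, gxy, gyy):
    """Smaller eigenvalue of the symmetric matrix [[gxx, gxy], [gxy, gyy]]."""
    trace = gxx + gyy
    root = math.sqrt((gxx - gyy) ** 2 + 4.0 * gxy ** 2)
    return 0.5 * (trace - root)

# Toy windows: no gradient, gradient in one direction only, gradients in two directions.
windows = {
    "homogeneous": [(0.0, 0.0)] * 4,
    "edge": [(1.0, 0.0)] * 4,
    "corner": [(1.0, 0.0), (0.0, 1.0)] * 2,
}
scores = {name: min_eigenvalue(*structure_matrix(g)) for name, g in windows.items()}
print(scores)
```

Only the corner-like window gets a non-zero score here, which is why selecting on the smaller eigenvalue rejects both flat regions and straight edges.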


Tracking as probabilistic inference
• We know something about
  • object shape,
  • dynamics,
  but we want to estimate state
• There is also uncertainty due to
  • noise,
  • unpredictability of motion,
  • etc.


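Bayesian inference

$$P(X \mid Z) = \frac{P(Z \mid X)\, P(X)}{P(Z)}$$

Here P(X|Z) is the posterior on X, P(Z|X) is the likelihood, P(X) is the prior on X, and P(Z) is the evidence.

• For tracking, these random variables have common names:
  • X is the state
  • Z is the measurement
• These are multi-valued and time-indexed, so:

$$P(\mathbf{X}_t \mid \mathbf{Z}_t) = \alpha\, P(\mathbf{Z}_t \mid \mathbf{X}_t)\, P(\mathbf{X}_t)$$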
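As a small worked instance of the rule, the sketch below applies it to a discrete state with two values. The state names and the probabilities are made up for illustration; the normalizing sum plays the role of the evidence P(Z), i.e. 1/α.

```python
def posterior(prior, likelihood):
    """Bayes rule on a discrete state space: P(X|Z) = alpha * P(Z|X) * P(X)."""
    unnormalized = {x: likelihood[x] * p for x, p in prior.items()}
    evidence = sum(unnormalized.values())  # P(Z), so alpha = 1 / evidence
    return {x: u / evidence for x, u in unnormalized.items()}

prior = {"left": 0.5, "right": 0.5}        # P(X)
likelihood = {"left": 0.9, "right": 0.3}   # P(Z = observed | X)
post = posterior(prior, likelihood)
print(post)
```

With these numbers the unnormalized products are 0.45 and 0.15, so the posterior favors "left" by three to one.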


The Notion of State
• State X_t is a vector of the parameters we are trying to estimate
  • Changing over time
• Some possibilities:
  • Position: Image coordinates, world coordinates (i.e., depth)
  • Orientation (2-D or 3-D)
  • Rigid “pose” of entire object (e.g., a car)
  • Joint angle(s) if the object is articulated (e.g., a person’s arm)
  • Curvature if the object is “bendable” (e.g., lip reading)
  • Differential quantities like velocity, acceleration, etc.

Example: state = image coord. + velocity


Measurements
• Z_t is what we observe at one moment
  • For example, image position, image dimensions, color, etc.
• Measurement likelihood P(Z_t|X_t): Probability of measurement given the state
• Implicitly contains:
  • Measurement prediction function H(X) mapping states to measurements
    • e.g., perspective projection
    • e.g., removal of velocity terms unobservable in a single image
  • Comparison function such that probability is inversely proportional to |Z_t − H(X_t)|
• Example:
  • State = position & velocity
  • Measurement = position
  • Measurement prediction = remove velocity
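The example can be sketched as follows, assuming a state tuple (x, y, vx, vy) and a Gaussian comparison function. The Gaussian form and the value of sigma are assumptions: the slide only requires the probability to fall as |Z_t − H(X_t)| grows.

```python
import math

def predict_measurement(state):
    """H(X): drop the velocity terms, which a single image cannot observe."""
    x, y, _vx, _vy = state
    return (x, y)

def measurement_likelihood(z, state, sigma=2.0):
    """Unnormalized P(Z|X): decreases as the distance |Z - H(X)| grows."""
    hx, hy = predict_measurement(state)
    dist2 = (z[0] - hx) ** 2 + (z[1] - hy) ** 2
    return math.exp(-0.5 * dist2 / sigma ** 2)

state = (10.0, 5.0, 1.0, -0.5)   # position & velocity
near = measurement_likelihood((10.0, 5.0), state)
far = measurement_likelihood((14.0, 5.0), state)
print(near, far)
```

A measurement that coincides with the predicted position gets the maximum value 1.0, and the value decays with distance at a rate set by sigma.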


Dynamics
• The prior probability on the state P(X_t) depends on previous states: P(X_t|X_{t-1}, X_{t-2}, ...)
• Dynamics (Markov property):
  • 1st order: Only consider t-1
    • E.g., random walk, constant velocity
  • 2nd order: Only use t-1 and t-2
    • E.g., changes of direction, periodic motion
    • Can be represented as a 1st-order process by doubling the size of the state to “remember” the last value
• Implicitly contains:
  • State prediction function F(X) mapping current state to future
  • Comparison function: Bigger |X_t − F(X_{t-1})| → less likely X_t
• Example:
  • State
  • Measurement
  • Measurement prediction
  • State prediction = constant velocity

[Figure: chain of states X_{t-3}, X_{t-2}, X_{t-1}, X_t, each with its measurement Z_{t-3}, Z_{t-2}, Z_{t-1}, Z_t]
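A constant-velocity state prediction can be sketched as below, again assuming a state tuple (x, y, vx, vy) and a unit time step. The Gaussian comparison function and its sigma are illustrative assumptions, chosen only so that a larger |X_t − F(X_{t-1})| gives a smaller value.

```python
import math

def predict_state(state, dt=1.0):
    """F(X): constant-velocity prediction of the next state."""
    x, y, vx, vy = state
    return (x + vx * dt, y + vy * dt, vx, vy)

def dynamics_prior(candidate, previous, sigma=1.0):
    """Unnormalized P(X_t | X_{t-1}): smaller when candidate is far from F(previous)."""
    predicted = predict_state(previous)
    dist2 = sum((c - p) ** 2 for c, p in zip(candidate, predicted))
    return math.exp(-0.5 * dist2 / sigma ** 2)

previous = (0.0, 0.0, 1.0, 2.0)
predicted = predict_state(previous)
print(predicted)
print(dynamics_prior(predicted, previous), dynamics_prior((5.0, 0.0, 1.0, 2.0), previous))
```

Keeping the velocity inside the state is what lets a first-order model express constant-velocity motion.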


Inference by MAP criterion
• Want best estimate of state given current measurement z_t and previous state x_{t-1} (these are fixed)
• Use, for example, the Maximum A Posteriori criterion:

$$\hat{x}_t = \arg\max_{X} P(X \mid z_t, x_{t-1}) = \arg\max_{X} P(z_t \mid X)\, P(X \mid x_{t-1})$$

• For general measurement likelihood & state prior, obtaining the best estimate requires iterative search
  • Can confine search to the region of state space near F(x_{t-1}) for efficiency, since this is where probability mass is concentrated
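A brute-force version of the confined search might look like the sketch below. It assumes a scalar state, a MAP score of the form P(z_t|x) P(x|x_{t-1}), Gaussian forms for both factors, and a constant-velocity F. All of these, along with the parameter values, are illustrative assumptions rather than the slide's method.

```python
import math

def gaussian(u, sigma):
    """Unnormalized zero-mean Gaussian evaluated at u."""
    return math.exp(-0.5 * (u / sigma) ** 2)

def map_estimate(z, x_prev, velocity=1.0, sigma_z=1.0, sigma_x=1.0,
                 radius=3.0, steps=601):
    """Grid search for argmax_x P(z|x) P(x|x_prev), confined to F(x_prev) +/- radius."""
    center = x_prev + velocity  # F(x_prev): hypothetical constant-velocity prediction
    best_x, best_score = center, -1.0
    for i in range(steps):
        x = center - radius + 2.0 * radius * i / (steps - 1)
        score = gaussian(z - x, sigma_z) * gaussian(x - center, sigma_x)
        if score > best_score:
            best_x, best_score = x, score
    return best_x

x_hat = map_estimate(z=6.0, x_prev=4.0)
print(x_hat)
```

With equal variances the estimate lands midway between the prediction (5.0) and the measurement (6.0). For Gaussian factors this has a closed form, so the grid search only stands in for the iterative search needed in the general case.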


Feature Tracking
• Detect corner-type features
• State x_t
  • Position of template image (original found corner)
  • Optional: Velocity, acceleration terms
  • Rotation, perspective: For a planar feature, a homography describes the full range of possibilities
• Measurement likelihood P(z_t|X): Similarity of match (e.g., SSD/correlation) between the template and z_t, which is a patch of the image

[Figure: image patch z_t, predicted template H(x_t), and their difference |z_t − H(x_t)|]


Feature Tracking
• Dynamics P(X|x_{t-1}): Static or with displacement prediction
• Inference is simple: Gradient descent on match function starting at the predicted feature location
  • Can actually do this in one step assuming a small enough displacement
  • Image pyramid representation (i.e., Gaussian) can help with larger motions


Snakes (= Active Contours)
• Idea: Track contours such as silhouettes, road lines using edge information
• Dynamics
  • Low-dimensional warp of shape template [Blake et al., 1993]
    • Translation, in-plane rotation, affine, etc.
  • Or more general non-rigid deformations of curve
• Measurement likelihood
  • Error measure = Mean distance from predicted curve to nearest Canny edge
  • Or integrate gradient orthogonal to curve along it


Contour based hand tracking


Kalman filtering
• Used to optimize feature tracking results
• It relies on the dynamical (system) and measurement equations being linear, and on the posterior distribution being Gaussian
• Optimal linear estimation
• Assume: Linear system with uncertainties
  • State x
  • Dynamical (system) model: x_t = Φ x_{t-1} + ε
  • Measurement model: z_t = H x_t + µ
  • ε, µ are white, zero-mean, Gaussian noise with covariances Q, R respectively
    • Q, R set from real data if possible, but ad-hoc numbers may also work
• Want best state estimate at each instant

Kalman Filter
• Essentially an online version of least squares
• Provides best linear unbiased estimate

Slide adapted from CS5245 Computer Vision and Graphics for Special Effects, Dr. Ng Teck Khim

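Filtering algorithm

Input: initial estimate $\hat{x}_0$ with covariance $P_0$, and measurements $z_0, z_1, \ldots$
Output: state estimates $x_0, x_1, \ldots$

For each k:
1. Compute $K_k$
2. Update: $\hat{x}_k \leftarrow \hat{x}_k + K_k (z_k - H_k \hat{x}_k)$
3. Compute $P_k$
4. Predict next step:

$$\hat{x}_{k+1} = \phi_k \hat{x}_k, \qquad P_{k+1} = \phi_k P_k \phi_k^T + Q_k$$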
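The loop can be sketched for a scalar state. The transcript names the "compute K" and "compute P" steps without giving formulas, so the gain K = P H / (H P H + R) and the covariance update P ← (1 − K H) P used here are the standard Kalman expressions, included as an assumption. The parameter values are arbitrary.

```python
def kalman_1d(measurements, x0, p0, phi=1.0, h=1.0, q=0.01, r=1.0):
    """Scalar Kalman filter: returns the updated estimate after each measurement."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        k = p * h / (h * p * h + r)   # compute gain K_k (standard formula, assumed)
        x = x + k * (z - h * x)       # update estimate with measurement z_k
        p = (1.0 - k * h) * p         # compute P_k (standard formula, assumed)
        estimates.append(x)
        x = phi * x                   # predict next step: state
        p = phi * p * phi + q         # predict next step: covariance
    return estimates

estimates = kalman_1d([5.0] * 10, x0=0.0, p0=100.0)
print(estimates[0], estimates[-1])
```

Because the initial covariance is large compared with R, the first update moves almost all the way to the first measurement; later updates use smaller gains as the estimate becomes more certain.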


Example:
• State: 2D position, velocity
• Kalman-estimated states

courtesy of K. Murphy


Particle Filtering
• Idea: Stochastic approximation of state posterior with a set of N weighted particles (a.k.a. samples) {s^(i), π^(i)}, where s^(i) is a possible state and π^(i) is its weight
• Simulation instead of analytic solution; the underlying probability distribution may take any form
• Example: CONDENSATION, a particle filter developed for person tracking [Isard & Blake, 1996]

[Figure from Isard & Blake, 1998: 15 samples with size proportional to weight]

Particle Filtering Basics
• Each particle s^(i) is a possible state, which has a likelihood π^(i) associated with it that is easily computable
• The posterior distribution is approximated by the ensemble of weights on all of these sampled states
• By keeping track of state samples with non-zero probability, we imply that the rest of the distribution has zero probability
• Simulate deterministic and probabilistic motion of particles, update weights using measurement likelihood

More particles → better approximation (and more expensive), but there’s no formula for the “right amount”


Updating the Particle Set
1. Sample: Randomly select N particles based on weights (same particle may be picked multiple times)
2. Predict: Move particles according to deterministic dynamics (drift), then perturb individually (diffuse)
3. Measure: Get a likelihood for each new sample by making a prediction about the image’s local appearance and comparing; then update weight on particle accordingly
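The three steps can be put together in a toy one-dimensional sketch. The scalar state, the constant drift, the Gaussian diffusion, and the Gaussian measurement likelihood are all assumptions made for illustration. A real tracker would compare against image appearance in the measure step.

```python
import math
import random

def condensation_step(particles, weights, z, rng,
                      drift=1.0, diffusion=0.5, sigma_z=1.0):
    """One sample / predict / measure cycle for a scalar state."""
    n = len(particles)
    # 1. Sample: draw N particles with probability proportional to weight.
    sampled = rng.choices(particles, weights=weights, k=n)
    # 2. Predict: deterministic drift, then independent Gaussian diffusion.
    predicted = [s + drift + rng.gauss(0.0, diffusion) for s in sampled]
    # 3. Measure: weight each particle by an (assumed Gaussian) likelihood P(z | s).
    likelihoods = [math.exp(-0.5 * ((z - s) / sigma_z) ** 2) for s in predicted]
    total = sum(likelihoods)
    if total == 0.0:  # every particle implausible: fall back to uniform weights
        return predicted, [1.0 / n] * n
    return predicted, [w / total for w in likelihoods]

rng = random.Random(0)
particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]  # wide prior at t=0
weights = [1.0 / len(particles)] * len(particles)
true_pos = 0.0
for _ in range(10):
    true_pos += 1.0  # target moves with the assumed drift
    particles, weights = condensation_step(particles, weights, true_pos, rng)
estimate = sum(w * s for s, w in zip(particles, weights))
print(true_pos, estimate)
```

The weighted mean of the particle set should settle near the true position after a few cycles, with the spread governed by the diffusion and measurement noise levels.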


Particle Filtering: Initial Particle Set
• Particles at t=0 drawn from wide prior because of large initial uncertainty
  • Gaussian with large covariance
  • Uniform distribution

[Figure from MacCormick & Blake, 1998: state includes shape & position; prior more constrained for shape]


Particle Filtering: Sampling
• Normalize N particle weights so that they sum to 1
• Resample particles by picking a number uniformly at random in the [0,1] range, N times
  • Analogous to spinning a roulette wheel with arc-lengths of bins equal to particle weights
• Adaptively focuses on promising areas of state space

[Figure: roulette wheel with arcs π^(1), π^(2), π^(3), …, π^(N-1), π^(N); courtesy of D. Fox]
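The roulette-wheel analogy maps directly onto cumulative weights: each uniform draw in [0, 1) falls into exactly one arc, whose length is that particle's normalized weight. The sketch below is one way to write it; the function name and the toy inputs are hypothetical.

```python
import bisect
import itertools
import random

def roulette_resample(particles, weights, rng):
    """Draw len(particles) samples with probability proportional to weight."""
    total = sum(weights)
    # Arc boundaries of the wheel: cumulative sums of the normalized weights.
    cumulative = list(itertools.accumulate(w / total for w in weights))
    cumulative[-1] = 1.0  # guard against floating-point round-off
    picks = []
    for _ in particles:
        u = rng.random()                             # uniform in [0, 1)
        index = bisect.bisect_right(cumulative, u)   # arc containing u
        picks.append(particles[index])
    return picks

rng = random.Random(1)
resampled = roulette_resample(["a", "b", "c"], [0.0, 1.0, 0.0], rng)
print(resampled)
```

Particles with zero weight own an arc of zero length and are never selected, while heavy particles tend to be duplicated, which is how the sample set concentrates on promising areas.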


Particle Filtering: Prediction
• Update each particle using generative form of dynamics: a deterministic component (aka “drift”) plus a random component (aka “diffusion”)
  • Drift may be nonlinear (i.e., different displacement for each particle)
  • Each particle diffuses independently
    • Typically modeled with a Gaussian


Particle Filtering: Measurement
• For each particle s^(i), compute new weight π^(i) as measurement likelihood π^(i) = P(z|s^(i))
• Enforcing plausibility: Particles that represent impossible configurations are given 0 likelihood
  • E.g., positions outside of image

[Figure from MacCormick & Blake, 1998: a snake measurement likelihood method]


Particle Filtering Steps (CONDENSATION)

[Figure from Isard & Blake, 1998: one iteration showing the drift, diffuse, and measure steps with the measurement likelihood, and an annotation marking where sampling occurs]


CONDENSATION: 1D example
• The animation shows a few cycles of the algorithm applied to a one-dimensional system.
  • The green spheres correspond to the members of the sample set, where the size of the sphere is an indication of the sample weight.
  • The red line is the measurement density function.

http://www.robots.ox.ac.uk/~misard/condensation.html


Obtaining a State Estimate
• Note that there’s no explicit state estimate maintained, just a “cloud” of particles
• Can obtain an estimate at a particular time by querying the current particle set
• Some approaches
  • “Mean” particle
    • Weighted sum of particles
    • Confidence: inverse variance
  • Really want a mode finder → mean of tallest peak
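The “mean” particle and its confidence can be computed as below for scalar states. The function name and the inputs are illustrative. As the last bullet warns, this mean can fall between peaks when the particle cloud is multi-modal, and the sketch does not attempt mode finding.

```python
def weighted_mean_and_confidence(particles, weights):
    """Weighted mean of the particles and inverse weighted variance as confidence."""
    total = sum(weights)
    mean = sum(w * s for s, w in zip(particles, weights)) / total
    variance = sum(w * (s - mean) ** 2 for s, w in zip(particles, weights)) / total
    confidence = float("inf") if variance == 0.0 else 1.0 / variance
    return mean, confidence

mean, confidence = weighted_mean_and_confidence([1.0, 3.0], [0.5, 0.5])
print(mean, confidence)
```

In this two-particle example the mean (2.0) coincides with neither particle, a small instance of the multi-modality problem.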


Condensation: Estimating Target State

From Isard & Blake, 1998

State samples (thickness proportional to weight)

Mean of weighted state samples


CONDENSATION in action

• Tracking agile motion:
  • a video sequence of a girl dancing to a Scottish reel
  • a leaf blowing in the wind, against a background of similar leaves
• Effective anticipation by the computer of likely movements is crucial to enable it to “see” such agile movements.

http://www.robots.ox.ac.uk/~misard/condensation.html