EECS 274 Computer Vision Tracking

Upload: mark-townsend

Post on 21-Jan-2016


Page 1: EECS 274 Computer Vision Tracking. Motivation: Obtain a compact representation from an image/motion sequence/set of tokens Should support application

EECS 274 Computer Vision

Tracking


Tracking

• Motivation: Obtain a compact representation from an image/motion sequence/set of tokens

• Should support application

• Broad theory is absent at present

• Some slides from R. Collins lecture notes

• Reading: FP Chapter 17


Tracking

• Very general model:
  – We assume there are moving objects, which have an underlying state X
  – There are measurements Y, some of which are functions of this state
  – There is a clock
    • at each tick, the state changes
    • at each tick, we get a new observation
• Examples
  – object is ball, state is 3D position + velocity, measurements are stereo pairs
  – object is person, state is body configuration, measurements are frames, clock is in camera (30 fps)


Three main steps


Simplifying assumptions


Tracking as induction

• Assume data association is done
  – we’ll talk about this later; a dangerous assumption
• Do correction for the 0’th frame
• Assume we have corrected estimate for i’th frame
  – show we can do prediction for i+1, correction for i+1
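The induction above can be sketched on a small discrete state space, where the prediction integral becomes a sum. This is only an illustration: the three-cell world, the transition table and the likelihood values are made-up assumptions, not taken from the slides.

```python
def predict(belief, transition):
    """P(X_{i+1} | y_0..y_i) from P(X_i | y_0..y_i).

    transition[a][b] is P(X_{i+1} = b | X_i = a).
    """
    n = len(belief)
    return [sum(belief[a] * transition[a][b] for a in range(n)) for b in range(n)]


def correct(predicted, likelihood):
    """Bayes' rule: weight the prediction by P(y | X = b) and renormalize."""
    unnorm = [p * l for p, l in zip(predicted, likelihood)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical 3-cell world: the object tends to move one cell to the right.
transition = [[0.2, 0.8, 0.0],
              [0.0, 0.2, 0.8],
              [0.8, 0.0, 0.2]]

belief = correct([1 / 3, 1 / 3, 1 / 3], [0.9, 0.05, 0.05])  # base case: correction for frame 0
predicted = predict(belief, transition)                      # prediction for frame 1
belief = correct(predicted, [0.05, 0.9, 0.05])               # correction for frame 1
```

Each later frame repeats the same predict/correct pair, which is the induction step.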


General model for tracking

• The moving object of interest is characterized by an underlying state X

• State X gives rise to measurements or observations Y

• At each time t, the state changes to Xt and we get a new observation Yt

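[Figure: graphical model. Hidden states X1, X2, ..., Xt form a chain; each state Xt gives rise to its observation Yt.]

Can be better explained with a graphical model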


Base case


Induction step

Given


Induction step


Explanation with graphical model


Explanation with graphical model


Explanation with graphical model


Linear dynamic models

• Use notation ~ to mean “has the pdf of”; N(a, b) is a normal distribution with mean a and covariance b.

• Then a linear dynamic model has the form

$$x_i \sim N(D_{i-1} x_{i-1};\ \Sigma_{d_i})$$
$$y_i \sim N(M_i x_i;\ \Sigma_{m_i})$$

• This is much, much more general than it looks, and extremely powerful

D: transition matrix
M: observation (measurement) matrix


Propagation of Gaussian densities


Examples

• Drifting points
  – we assume that the new position of the point is the old one, plus noise.
  – For the measurement model, we may not need to observe the whole state of the object
    • e.g. a point moving in 3D, at the 3k’th tick we see x, 3k+1’th tick we see y, 3k+2’th tick we see z
    • in this case, we can still make decent estimates of all three coordinates at each tick.
  – This property, which does not apply to every model, is called Observability


Examples

• Points moving with constant velocity
• Points moving with constant acceleration
• Periodic motion
• etc.


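Moving with constant velocity

• We have

$$p_i = p_{i-1} + (\Delta t)\, v_{i-1} + \varepsilon_i$$
$$v_i = v_{i-1} + \varsigma_i$$

  – (the Greek letters denote noise terms)

• Stack (p, v) into a single state vector, x_i

$$\begin{pmatrix} p \\ v \end{pmatrix}_i = \begin{pmatrix} I_d & (\Delta t) I_d \\ 0 & I_d \end{pmatrix} \begin{pmatrix} p \\ v \end{pmatrix}_{i-1} + \text{noise}$$

  – which is the form we had above:

$$x_i \sim N(D_{i-1} x_{i-1};\ \Sigma_{d_i})$$
$$y_i \sim N(M_i x_i;\ \Sigma_{m_i})$$

with (the same matrices at every tick)

$$D_i = \begin{pmatrix} I_d & (\Delta t) I_d \\ 0 & I_d \end{pmatrix}, \qquad M_i = \begin{pmatrix} I_d & 0 \end{pmatrix}$$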
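A minimal sketch of the constant velocity model in one spatial dimension, so that the state is the pair (p, v). The function names, the time step and the noise levels are illustrative assumptions; the noise is switched off in the example so the deterministic part of the model is easy to see.

```python
import random


def step(state, dt, pos_std, vel_std, rng):
    """x_i = D x_{i-1} + noise with D = [[1, dt], [0, 1]] (one spatial dimension)."""
    p, v = state
    return (p + dt * v + rng.gauss(0.0, pos_std),
            v + rng.gauss(0.0, vel_std))


def measure(state, meas_std, rng):
    """y_i = M x_i + noise with M = [1, 0]: only the position is observed."""
    return state[0] + rng.gauss(0.0, meas_std)


rng = random.Random(0)
state = (0.0, 1.0)  # start at position 0 with velocity 1
track = []
for _ in range(4):
    state = step(state, dt=0.5, pos_std=0.0, vel_std=0.0, rng=rng)  # noise switched off
    track.append(measure(state, meas_std=0.0, rng=rng))
```

Raising the three standard deviations above zero turns the straight line into a noisy track with noisy measurements.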


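Moving with constant acceleration

• We have

$$p_i = p_{i-1} + (\Delta t)\, v_{i-1} + \varepsilon_i$$
$$v_i = v_{i-1} + (\Delta t)\, a_{i-1} + \varsigma_i$$
$$a_i = a_{i-1} + \xi_i$$

  – (the Greek letters denote noise terms)

• Stack (p, v, a) into a single state vector

$$\begin{pmatrix} p \\ v \\ a \end{pmatrix}_i = \begin{pmatrix} I_d & (\Delta t) I_d & 0 \\ 0 & I_d & (\Delta t) I_d \\ 0 & 0 & I_d \end{pmatrix} \begin{pmatrix} p \\ v \\ a \end{pmatrix}_{i-1} + \text{noise}$$

  – which is the form we had above, with

$$D_i = \begin{pmatrix} I_d & (\Delta t) I_d & 0 \\ 0 & I_d & (\Delta t) I_d \\ 0 & 0 & I_d \end{pmatrix}, \qquad M_i = \begin{pmatrix} I_d & 0 & 0 \end{pmatrix}$$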
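The constant velocity and constant acceleration transition matrices share one pattern: ones on the diagonal and Δt on the superdiagonal. A small sketch for one spatial dimension follows; `transition_matrix` and `apply` are hypothetical helper names, and plain nested lists are used to keep the example dependency-free.

```python
def transition_matrix(order, dt):
    """order=2 gives constant velocity, order=3 constant acceleration (one spatial dimension)."""
    return [[1.0 if c == r else dt if c == r + 1 else 0.0 for c in range(order)]
            for r in range(order)]


def apply(matrix, state):
    """Matrix-vector product: the noise-free part of x_i = D x_{i-1} + noise."""
    return [sum(a * s for a, s in zip(row, state)) for row in matrix]


D = transition_matrix(3, dt=0.5)
state = apply(D, [0.0, 1.0, 2.0])  # position, velocity, acceleration
```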


Constant velocity dynamic model

[Figure: three plots. The state (velocity against position); the 1st component of state (position against time); the measurement and 1st component of state (position against time).]


Constant acceleration dynamic model

[Figure: plots of the state; axis labels are position, velocity and time.]


Kalman filter

• Key ideas:
  – Linear models interact uniquely well with Gaussian noise
    • make the prior Gaussian, everything else Gaussian and the calculations are easy
  – Gaussians are really easy to represent
    • once you know the mean and covariance, you’re done


Kalman filter in 1D

• Dynamic Model

• Notation
  – Predicted mean: before the i-th measurement
  – Corrected mean: after the i-th measurement


Prediction for 1D Kalman filter

• Because the new state is obtained by
  – multiplying old state by known constant
  – adding zero-mean noise
• Therefore, predicted mean for new state is
  – constant times mean for old state
• Predicted variance is
  – sum of constant^2 times old state variance and noise variance
  – the old state is a normal random variable; multiplying a normal rv by a constant multiplies the mean by the constant and the variance by the square of the constant; adding zero-mean noise adds zero to the mean; adding independent rv’s adds their variances
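The two prediction rules translate directly into code. In this sketch `d` stands for the known constant and `noise_var` for the variance of the zero-mean dynamics noise; both names and the numbers in the example are assumptions made for illustration.

```python
def kalman_predict_1d(mean_prev, var_prev, d, noise_var):
    """Predict step for a 1D model x_i ~ N(d * x_{i-1}, noise_var)."""
    mean_pred = d * mean_prev                 # constant times mean for old state
    var_pred = d * d * var_prev + noise_var   # constant^2 times old variance, plus noise variance
    return mean_pred, var_pred


mean_pred, var_pred = kalman_predict_1d(mean_prev=2.0, var_prev=0.5, d=3.0, noise_var=1.0)
```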


Correction for 1D Kalman filter

• Pattern match to identities given in book
  – basically, guess the integrals, get:
• Notice:
  – if measurement noise is small, we rely mainly on the measurement,
  – if it’s large, mainly on the prediction

$$\sigma_{m_i} \to 0:\quad \bar{x}_i^+ = \frac{y_i}{m_i}$$
$$\sigma_{m_i}\ \text{is large}:\quad \bar{x}_i^+ \approx \bar{x}_i^-$$
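The slide's correction formula was an image and did not survive extraction. The sketch below uses the standard 1D Kalman correction for a measurement model y_i ~ N(m x_i, meas_var), which is assumed here to be the same expression; it reproduces both limiting cases noted above. All argument names are illustrative.

```python
def kalman_correct_1d(mean_pred, var_pred, y, m, meas_var):
    """Correct step for a 1D measurement model y_i ~ N(m * x_i, meas_var)."""
    denom = meas_var + m * m * var_pred
    mean_corr = (mean_pred * meas_var + m * y * var_pred) / denom
    var_corr = meas_var * var_pred / denom
    return mean_corr, var_corr


# Small measurement noise: the estimate follows the measurement (y / m).
trust_measurement, _ = kalman_correct_1d(0.0, 1.0, y=4.0, m=2.0, meas_var=1e-12)
# Large measurement noise: the estimate stays near the prediction.
trust_prediction, _ = kalman_correct_1d(0.0, 1.0, y=4.0, m=2.0, meas_var=1e12)
```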


Example: Kalman filtering

[Figure legend: state; measurement; * = $\bar{x}_i^-$ (prediction before measurement); $\bar{x}_i^+$ (correction after measurement)]


Kalman filter: general case

In higher dimensions, the derivation follows the same lines, but isn’t as easy. Expressions here.
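The expressions on this slide were an image and are missing from the transcript. For reference, the standard Kalman filter updates for the linear dynamic model x_i ~ N(D_{i-1} x_{i-1}; Σ_{d_i}), y_i ~ N(M_i x_i; Σ_{m_i}) are given below; the notation (bars for means, superscript − for predicted and + for corrected quantities, K_i for the gain) is an assumption chosen to match the 1D slides, not a copy of the lost slide.

```latex
% Prediction
\bar{x}_i^- = D_{i-1}\,\bar{x}_{i-1}^+
\qquad
\Sigma_i^- = \Sigma_{d_i} + D_{i-1}\,\Sigma_{i-1}^+\,D_{i-1}^T

% Correction
K_i = \Sigma_i^- M_i^T \left[ M_i \Sigma_i^- M_i^T + \Sigma_{m_i} \right]^{-1}
\qquad
\bar{x}_i^+ = \bar{x}_i^- + K_i \left[ y_i - M_i \bar{x}_i^- \right]
\qquad
\Sigma_i^+ = \left[ I - K_i M_i \right] \Sigma_i^-
```

With scalar state and measurement these reduce to the 1D prediction and correction rules.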


[Figure caption: The measurement standard deviation is small, so the state estimates are rather good.]


Smoothing

• Idea
  – We don’t have the best estimate of state: what about the future?
  – Run two filters, one moving forward, the other backward in time.
  – Now combine state estimates
• The crucial point here is that we can obtain a smoothed estimate by viewing the backward filter’s prediction as yet another measurement for the forward filter
  – so we’ve already done the equations
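In 1D, treating the backward filter's prediction as one more measurement (with measurement constant 1 and measurement variance equal to the backward variance) gives a simple fusion rule. A minimal sketch, with hypothetical names and example numbers:

```python
def fuse(mean_fwd, var_fwd, mean_bwd, var_bwd):
    """Combine forward and backward estimates of the same state.

    This is the 1D correction step with the backward prediction playing the
    role of the measurement (m = 1, measurement variance = var_bwd).
    """
    total = var_fwd + var_bwd
    mean = (mean_fwd * var_bwd + mean_bwd * var_fwd) / total
    var = var_fwd * var_bwd / total
    return mean, var


mean, var = fuse(mean_fwd=0.0, var_fwd=1.0, mean_bwd=2.0, var_bwd=1.0)
```

The fused variance is never larger than either input variance, which is why the smoothed track looks better than the forward or backward track alone.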


Forward filter

[Figure legend: state; measurement; * = $\bar{x}_i^-$ (prediction before measurement); $\bar{x}_i^+$ (correction after measurement)]


Backward filter

[Figure legend: state; measurement; * = $\bar{x}_i^-$ (prediction before measurement); $\bar{x}_i^+$ (correction after measurement)]


Forward-backward filter

[Figure legend: state; measurement; * = $\bar{x}_i^-$ (prediction before measurement); $\bar{x}_i^+$ (correction after measurement)]


Data association

• Also known as the correspondence problem

• Given that feature detectors (measurements) are not perfect, how can one find the correspondence between measurements and features in the model?


Data association

• Determine which measurements are informative (as not every measurement conveys the same amount of information)

• Nearest neighbors
  – choose the measurement with highest probability given predicted state
  – popular, but can lead to catastrophe

• Probabilistic Data Association (PDA)
  – combine measurements, weighting by probability given predicted state
  – gate using predicted state
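A minimal 1D sketch of nearest-neighbor association with a gate. It assumes a Gaussian predicted measurement density, so "highest probability" is the same as "closest to the prediction"; the gate width of three standard deviations and all names are illustrative assumptions.

```python
def nearest_neighbor(measurements, predicted, pred_std, gate=3.0):
    """Return the gated measurement closest to the prediction, or None if the gate is empty."""
    in_gate = [y for y in measurements if abs(y - predicted) <= gate * pred_std]
    if not in_gate:
        return None
    return min(in_gate, key=lambda y: abs(y - predicted))


chosen = nearest_neighbor([0.2, 5.0, 1.4], predicted=1.0, pred_std=0.5)
missed = nearest_neighbor([9.0], predicted=1.0, pred_std=0.5)
```

Here 5.0 and 9.0 fall outside the gate, so the first call picks 1.4 and the second reports that no measurement is plausible.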


Data association

Constant velocity model; measurements are plotted with standard deviation (dashed line)

[Figure legend: state; measurement; * = $\bar{x}_i^-$ (prediction before measurement); $\bar{x}_i^+$ (correction after measurement)]


Data association with NN

• The blue track represents the actual measurements of state, and red tracks are noise
• The noise is all over the place, and the gate around each prediction typically contains only one measurement (the right one).
• This means that we can track rather well (the blue track is very largely obscured by the overlaid measurements of state)
• Nearest neighbors (NN) picks the best measurements (those consistent with predictions) and works well in this case
• Occasional misidentification may not cause problems

[Figure legend: state; measurement; * = $\bar{x}_i^-$ (prediction before measurement); $\bar{x}_i^+$ (correction after measurement)]


Data association with NN

• If the dynamic model is not sufficiently constrained, choosing the measurement that best matches the prediction may lead to failure

[Figure legend: state; measurement; * = $\bar{x}_i^-$ (prediction before measurement); $\bar{x}_i^+$ (correction after measurement)]


Data association with NN

• But actually the tracker loses tracks

• These problems occur because error can accumulate
  – it is now relatively easy to continue tracking the wrong point for a long time,
  – and the longer we do this the less chance there is of recovering the right point


Probabilistic data association

• Instead of choosing the region most like the predicted measurement,
• we can exclude all regions that are too different and then use the others,
• weighting them according to their similarity to prediction

Instead of using $P(y_i \mid y_0, \ldots, y_{i-1})$, use

$$E_h[y_i^h] = \sum_j P(h = j \mid y_0, \ldots, y_{i-1})\, y_i^j$$

The superscript indicates the region.
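A simplified 1D sketch of the weighting idea: gate out measurements that are too different, then average the rest with weights given by a Gaussian density around the prediction. This is a simplification under stated assumptions (Gaussian similarity, fixed gate width, no "target not detected" hypothesis) and is not the full PDA filter; the names are hypothetical.

```python
import math


def pda_measurement(measurements, predicted, pred_std, gate=3.0):
    """Similarity-weighted average of the gated measurements, or None if the gate is empty."""
    in_gate = [y for y in measurements if abs(y - predicted) <= gate * pred_std]
    if not in_gate:
        return None
    # Unnormalized Gaussian similarity to the prediction plays the role of P(h = j | ...).
    weights = [math.exp(-0.5 * ((y - predicted) / pred_std) ** 2) for y in in_gate]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, in_gate)) / total


combined = pda_measurement([0.5, 1.5, 100.0], predicted=1.0, pred_std=1.0)
```

The outlier at 100.0 is excluded by the gate, and the two remaining measurements are equally similar to the prediction, so the combined measurement sits halfway between them.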


Nonlinear dynamics

$$x_{i+1} = x_i + \sin(x_i)$$

• As one repeats iterations of this function (there isn’t any noise), points move towards or away from points where sin(x) = 0.

• This means that if one has a quite simple probability distribution on x_0, one can end up with a very complex distribution (many tiny peaks) quite quickly.

• Most easily demonstrated by running particles through this dynamics, as in the next slide.
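A small sketch of running particles through these dynamics. The step-size parameter `eps` is an assumption added for experimentation (`eps=1.0` gives the update with a bare sin term); the initial spread, particle count and number of iterations are also illustrative.

```python
import math
import random


def iterate(x, steps, eps=1.0):
    """Apply x <- x + eps * sin(x) repeatedly; there is no noise term."""
    for _ in range(steps):
        x = x + eps * math.sin(x)
    return x


rng = random.Random(0)
particles = [rng.gauss(0.0, 5.0) for _ in range(1000)]  # simple (normal) initial distribution
moved = [iterate(x, steps=20) for x in particles]

# A starting point between 0 and 2*pi is pulled towards pi, where sin(x) = 0.
limit = iterate(1.0, steps=20)
```

A histogram of `moved` shows the mass piling up near the attracting zeros of sin(x), which is the "many tiny peaks" effect described above.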


Nonlinear dynamics

• Top figure shows tracks of particle position against iteration for the dynamics of the previous slide, where the particle position was normal in the initial configuration (first graph at the bottom)

• As the succeeding graphs show, the number of particles near a point (= histogram = p(x_n)) very quickly gets complex, with many tiny peaks.


Propagation of general densities


Factored sampling

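• Represent the state distribution non-parametrically
  – Prediction: Sample points from prior density for the state, p(x)
  – Correction: Weight the samples according to p(y|x)

$$p(x \mid y) = k\, p(y \mid x)\, p(x)$$

$$\pi^{(n)} = \frac{p_z(s^{(n)})}{\sum_{j=1}^{N} p_z(s^{(j)})}, \qquad p_z(x) = p(y \mid x)$$

$$p(x_t \mid y_0, \ldots, y_t) = \frac{p(y_t \mid x_t)\, p(x_t \mid y_0, \ldots, y_{t-1})}{\int p(y_t \mid x_t)\, p(x_t \mid y_0, \ldots, y_{t-1})\, dx_t}$$

Representing $p(x_{t-1} \mid y_0, \ldots, y_{t-1})$ in terms of the weighted sample set $\{s_{t-1}^{(n)}, \pi_{t-1}^{(n)}\}$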
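Factored sampling can be sketched in a few lines: draw samples from the prior, then weight each by the likelihood and normalize. The Gaussian prior and likelihood below are assumptions chosen because the exact posterior mean is known for them (0.5 when the prior is N(0, 1) and y = 1 is observed with unit noise), so the weighted estimate can be sanity-checked.

```python
import math
import random


def factored_sampling(sample_prior, likelihood, n, rng):
    """Return samples s^(n) from the prior and normalized weights pi^(n) proportional to p(y | s^(n))."""
    samples = [sample_prior(rng) for _ in range(n)]
    unnorm = [likelihood(s) for s in samples]
    total = sum(unnorm)
    return samples, [w / total for w in unnorm]


rng = random.Random(1)
y = 1.0  # hypothetical observation
samples, weights = factored_sampling(
    sample_prior=lambda r: r.gauss(0.0, 1.0),
    likelihood=lambda s: math.exp(-0.5 * (y - s) ** 2),
    n=5000,
    rng=rng,
)
posterior_mean = sum(w * s for w, s in zip(weights, samples))
```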


Particle filtering

• We want to use sampling to propagate densities over time (i.e., across frames in a video sequence)

• At each time step, represent posterior p(xt|yt) with weighted sample set

• Previous time step’s sample set p(xt-1|yt-1) is passed to next time step as the effective prior


Particle filtering

M. Isard and A. Blake, “CONDENSATION -- conditional density propagation for visual tracking”, IJCV 29(1):5-28, 1998

Start with weighted samples from previous time step

Sample and shift according to dynamics model

Spread due to randomness; this is predicted density p(xt|yt-1)

Weight the samples according to observation density

Arrive at corrected density estimate p(xt|yt)
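The steps listed above can be sketched as one function per time step. This is a minimal 1D illustration in the spirit of CONDENSATION, not the authors' implementation: the random-walk dynamics, the Gaussian observation density, the particle count and the repeated observation y = 2.0 are all assumptions.

```python
import math
import random


def condensation_step(samples, weights, dynamics, likelihood, rng):
    """One time step: resample by weight, sample-and-shift, then reweight."""
    n = len(samples)
    resampled = rng.choices(samples, weights=weights, k=n)  # start from weighted samples
    predicted = [dynamics(s, rng) for s in resampled]       # predicted density p(x_t | y_{t-1})
    new_weights = [likelihood(s) for s in predicted]        # weight by observation density
    total = sum(new_weights)
    return predicted, [w / total for w in new_weights]      # corrected density p(x_t | y_t)


def random_walk(s, rng):
    return s + rng.gauss(0.0, 0.5)


def make_likelihood(y, std=0.5):
    return lambda s: math.exp(-0.5 * ((y - s) / std) ** 2)


rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(500)]
weights = [1.0 / len(samples)] * len(samples)
for _ in range(10):  # the same observation arrives at every tick
    samples, weights = condensation_step(samples, weights, random_walk, make_likelihood(2.0), rng)
estimate = sum(w * s for w, s in zip(weights, samples))
```

After a few ticks the weighted mean moves from the initial guess near 0 towards the observed value.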


Parameterization of splines


Dynamic model p(xt|xt-1)

• Can learn from examples using ARMA or LDS

• Can be modeled with random walk

• Often factorize xt


Observation model p(yt|xt)


Observation model: 2D case


Particle filtering results


Issues

• Initialization
• Obtaining observation and dynamics model
  – Generative observation model: “render” the state on top of the image and compare
  – Discriminative observation model: classifier or detector score
  – Dynamics model: learn (very difficult) or specify using domain knowledge
• Prediction vs. correction
  – If the dynamics model is too strong, will end up ignoring the data
  – If the observation model is too strong, tracking is reduced to repeated detection
• Data association
  – What if we don’t know which measurements to associate with which tracks?
• Drift
  – Errors caused by dynamical model, observation model, and data association tend to accumulate over time
• Failure detection


Mean-Shift Object Tracking

Start from the position of the model in the current frame

Search in the model’s neighborhood in next frame

Find best candidate by maximizing a similarity func.

Repeat the same process in the next pair of frames

[Figure: the current frame, showing the model and a candidate]


Mean-Shift Object Tracking: Target Representation

Choose a reference target model

Choose a feature space: Quantized Color Space

Represent the model by its PDF in the feature space

[Figure: histogram of probability against color bin (1 2 3 ... m)]

Kernel Based Object Tracking, by Comaniciu, Ramesh, Meer


Mean-Shift Object Tracking: PDF Representation

Target Model (centered at 0):

$$q = \{q_u\}_{u=1..m}, \qquad \sum_{u=1}^{m} q_u = 1$$

Target Candidate (centered at y):

$$p(y) = \{p_u(y)\}_{u=1..m}, \qquad \sum_{u=1}^{m} p_u = 1$$

Similarity Function:

$$f(y) = f[q, p(y)]$$

[Figures: two histograms of probability against color bin (1 2 3 ... m), one for the target model and one for the target candidate]


Mean-Shift Object Tracking: Finding the PDF of the target model

$\{x_i\}_{i=1..n}$: target pixel locations

$k(x)$: a differentiable, isotropic, convex, monotonically decreasing kernel
  • Peripheral pixels are affected by occlusion and background interference

$b(x)$: the color bin index (1..m) of pixel x

Probability of feature u in model (model centered at 0):

$$q_u = C \sum_{b(x_i) = u} k\!\left(\|x_i\|^2\right)$$

Probability of feature u in candidate (candidate centered at y):

$$p_u(y) = C_h \sum_{b(x_i) = u} k\!\left(\left\|\frac{y - x_i}{h}\right\|^2\right)$$

In both expressions the leading constant ($C$, $C_h$) is the normalization factor and each $k(\cdot)$ term is a pixel weight.

[Figures: histograms of probability against color bin (1 2 3 ... m) for the model and for the candidate]
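A minimal sketch of the kernel-weighted color histogram. It assumes pixels arrive as ((x, y), bin) pairs, uses 0-based bin indices (the slides count bins 1..m), and takes an Epanechnikov-style profile as the default kernel; these choices and all names are illustrative. With `center=(0, 0)` and `h=1` the function computes q_u; with a candidate position and bandwidth it computes p_u(y).

```python
def epanechnikov_profile(r2):
    """Profile k(r^2) = 1 - r^2 inside the unit ball, 0 outside."""
    return 1.0 - r2 if r2 <= 1.0 else 0.0


def color_pdf(pixels, center, h, m, profile=epanechnikov_profile):
    """Kernel-weighted histogram over m color bins.

    pixels: list of ((x, y), bin_index) with bin_index in 0..m-1.
    """
    hist = [0.0] * m
    for (x, y), u in pixels:
        r2 = ((x - center[0]) ** 2 + (y - center[1]) ** 2) / (h * h)
        hist[u] += profile(r2)  # pixel weight: central pixels count more
    total = sum(hist)           # dividing by the total plays the role of C (or C_h)
    return [v / total for v in hist] if total > 0 else hist


pixels = [((0.0, 0.0), 0), ((1.0, 0.0), 1), ((5.0, 5.0), 2)]
q = color_pdf(pixels, center=(0.0, 0.0), h=2.0, m=3)
```

The far pixel at (5, 5) lies outside the kernel support and contributes nothing, while the central pixel outweighs the one nearer the edge.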


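Mean-Shift Object Tracking: Similarity Function

Target model: $q = (q_1, \ldots, q_m)$

Target candidate: $p(y) = (p_1(y), \ldots, p_m(y))$

Similarity function: $f(y) = f[p(y), q] = \,?$

Take the square root of each component to obtain two unit vectors:

$$q' = \left(\sqrt{q_1}, \ldots, \sqrt{q_m}\right), \qquad p'(y) = \left(\sqrt{p_1(y)}, \ldots, \sqrt{p_m(y)}\right)$$

The Bhattacharyya Coefficient:

$$f(y) = \cos\theta_y = \frac{p'(y)^T q'}{\|p'(y)\|\, \|q'\|} = \sum_{u=1}^{m} \sqrt{p_u(y)\, q_u}$$

[Figure: the unit vectors q' and p'(y) separated by the angle θ_y]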
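The Bhattacharyya coefficient is a one-line computation on two normalized histograms. A small sketch, with made-up example histograms:

```python
import math


def bhattacharyya(p, q):
    """Sum over bins of sqrt(p_u * q_u); 1 for identical histograms, 0 for disjoint ones."""
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))


same = bhattacharyya([0.5, 0.5], [0.5, 0.5])
disjoint = bhattacharyya([1.0, 0.0], [0.0, 1.0])
partial = bhattacharyya([0.8, 0.2], [0.2, 0.8])
```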


Mean-Shift Object Tracking: Target Localization Algorithm

Start from the position of the model in the current frame

Search in the model’s neighborhood in next frame

Find best candidate by maximizing a similarity func.

[Figure labels: $q'$, $p'(y)$, $f[p'(y), q']$]


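Mean-Shift Object Tracking: Approximating the Similarity Function

Model location: $y_0$
Candidate location: $y$

$$f(y) = \sum_{u=1}^{m} \sqrt{p_u(y)\, q_u}$$

Linear approx. (around $y_0$):

$$f(y) \approx \frac{1}{2} \sum_{u=1}^{m} \sqrt{p_u(y_0)\, q_u} + \frac{1}{2} \sum_{u=1}^{m} p_u(y) \sqrt{\frac{q_u}{p_u(y_0)}}$$

The first term is independent of y. Substituting

$$p_u(y) = C_h \sum_{b(x_i) = u} k\!\left(\left\|\frac{y - x_i}{h}\right\|^2\right)$$

turns the second term into

$$\frac{C_h}{2} \sum_{i=1}^{n} w_i\, k\!\left(\left\|\frac{y - x_i}{h}\right\|^2\right)$$

Density estimate! (as a function of y)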


Mean-Shift Object Tracking: Maximizing the Similarity Function

The mode of

$$\frac{C_h}{2} \sum_{i=1}^{n} w_i\, k\!\left(\left\|\frac{y - x_i}{h}\right\|^2\right)$$

= sought maximum

Important Assumption:
• One mode in the searched neighborhood
• The target representation provides sufficient discrimination


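Mean-Shift Object Tracking: Applying Mean-Shift

The mode of

$$\frac{C_h}{2} \sum_{i=1}^{n} w_i\, k\!\left(\left\|\frac{y - x_i}{h}\right\|^2\right)$$

= sought maximum

Original Mean-Shift: find mode of

$$c \sum_{i=1}^{n} k\!\left(\left\|\frac{y - x_i}{h}\right\|^2\right)$$

using

$$y_1 = \frac{\sum_{i=1}^{n} x_i\, g\!\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} g\!\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right)}$$

Extended Mean-Shift: find mode of

$$c \sum_{i=1}^{n} w_i\, k\!\left(\left\|\frac{y - x_i}{h}\right\|^2\right)$$

using

$$y_1 = \frac{\sum_{i=1}^{n} x_i\, w_i\, g\!\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} w_i\, g\!\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right)}$$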


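Mean-Shift Object Tracking: About Kernels and Profiles

A special class of radially symmetric kernels:

$$K(x) = c\, k\!\left(\|x\|^2\right)$$

where $k$ is the profile of kernel $K$, and

$$k'(x) = -g(x)$$

Extended Mean-Shift: find mode of

$$c \sum_{i=1}^{n} w_i\, k\!\left(\left\|\frac{y - x_i}{h}\right\|^2\right)$$

using

$$y_1 = \frac{\sum_{i=1}^{n} x_i\, w_i\, g\!\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} w_i\, g\!\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right)}$$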


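Mean-Shift Object Tracking: Choosing the Kernel

A special class of radially symmetric kernels:

$$K(x) = c\, k\!\left(\|x\|^2\right)$$

Epanechnikov kernel:

$$k(x) = \begin{cases} 1 - x & \text{if } \|x\| \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Uniform kernel:

$$g(x) = -k'(x) = \begin{cases} 1 & \text{if } \|x\| \le 1 \\ 0 & \text{otherwise} \end{cases}$$

With this choice the extended mean-shift update

$$y_1 = \frac{\sum_{i=1}^{n} x_i\, w_i\, g\!\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} w_i\, g\!\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right)}$$

simplifies to

$$y_1 = \frac{\sum_{i=1}^{n} x_i\, w_i}{\sum_{i=1}^{n} w_i}$$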
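With the Epanechnikov profile the update is a weighted mean of the pixel locations inside the window. The sketch below assumes the pixel weight from the kernel-based tracking paper cited earlier, w_i = sqrt(q_u / p_u(y_0)) for the bin u = b(x_i); that formula does not appear in the surviving slide text, so treat it as an assumption. Bin indices are 0-based and all names are hypothetical.

```python
import math


def mean_shift_step(pixels, q, p_y0):
    """One update y_1 = sum(x_i * w_i) / sum(w_i) over the pixels inside the kernel window.

    pixels: list of ((x, y), bin_index); q and p_y0 are the model and candidate histograms.
    """
    num_x = num_y = den = 0.0
    for (x, y), u in pixels:
        if p_y0[u] > 0:
            w = math.sqrt(q[u] / p_y0[u])  # assumed weight: large where the model has more of bin u
            num_x += w * x
            num_y += w * y
            den += w
    return (num_x / den, num_y / den)


pixels = [((0.0, 0.0), 0), ((2.0, 0.0), 1)]
y1 = mean_shift_step(pixels, q=[0.8, 0.2], p_y0=[0.2, 0.8])
```

The new location is pulled towards the pixel whose color is under-represented in the candidate relative to the model. Repeating the step from the new location until it stops moving gives the localization loop.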


Mean-Shift Object Tracking: Adaptive Scale

Problem: The scale of the target changes in time, so the scale (h) of the kernel must be adapted

Solution: Run localization 3 times with different h, and choose the h that achieves maximum similarity


Mean-Shift Object Tracking: Results

Feature space: 16×16×16 quantized RGB
Target: manually selected on 1st frame
Average mean-shift iterations: 4


Mean-Shift Object Tracking: Results

[Figure panels: Partial occlusion, Distraction, Motion blur]