EECS 274 Computer Vision
Tracking
Tracking
• Motivation: Obtain a compact representation from an image/motion sequence/set of tokens
• Should support the application
• Broad theory is absent at present
• Some slides from R. Collins lecture notes
• Reading: FP Chapter 17
Tracking
• Very general model:
  – We assume there are moving objects, which have an underlying state X
  – There are measurements Y, some of which are functions of this state
  – There is a clock
    • at each tick, the state changes
    • at each tick, we get a new observation
• Examples:
  – object is a ball, state is 3D position + velocity, measurements are stereo pairs
  – object is a person, state is body configuration, measurements are frames, clock is in the camera (30 fps)
Three main steps
Simplifying assumptions
Tracking as induction
• Assume data association is done
  – we'll talk about this later; a dangerous assumption
• Do correction for the 0'th frame
• Assume we have a corrected estimate for the i'th frame
  – show we can do prediction for i+1, then correction for i+1
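Written out, the induction is the standard Bayesian filtering recursion (sketched here in the state/measurement notation of the general model below; the slides give only the outline):

$P(X_{i+1} \mid y_0, \ldots, y_i) = \int P(X_{i+1} \mid X_i)\, P(X_i \mid y_0, \ldots, y_i)\, dX_i$  (prediction)
$P(X_{i+1} \mid y_0, \ldots, y_{i+1}) \propto P(y_{i+1} \mid X_{i+1})\, P(X_{i+1} \mid y_0, \ldots, y_i)$  (correction)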
General model for tracking
• The moving object of interest is characterized by an underlying state X
• State X gives rise to measurements or observations Y
• At each time t, the state changes to X_t and we get a new observation Y_t
[Graphical model: a chain X_1 → X_2 → … → X_t, with each state X_i emitting an observation Y_i]
Can be better explained with graphical model
Base case
Induction step
Given the corrected estimate for frame i, show that we can do prediction for frame i+1, then correction for frame i+1.
Explanation with graphical model
Linear dynamic models
• Use the notation ~ to mean "has the pdf of"; N(a, b) is a normal distribution with mean a and covariance b
• Then a linear dynamic model has the form
  $x_i \sim N(D_i x_{i-1}; \Sigma_{d_i})$
  $y_i \sim N(M_i x_i; \Sigma_{m_i})$
• This is much, much more general than it looks, and extremely powerful
• $D_i$: transition matrix; $M_i$: observation (measurement) matrix
Propagation of Gaussian densities
Examples
• Drifting points
  – we assume that the new position of the point is the old one, plus noise
  – for the measurement model, we may not need to observe the whole state of the object
    • e.g. a point moving in 3D: at the 3k'th tick we see x, at the 3k+1'th tick we see y, at the 3k+2'th tick we see z
    • in this case, we can still make decent estimates of all three coordinates at each tick
  – this property, which does not apply to every model, is called observability
Examples
• Points moving with constant velocity
• Points moving with constant acceleration
• Periodic motion
• etc.
Moving with constant velocity
• We have
  $p_i = p_{i-1} + (\Delta t)\, v_{i-1} + \varepsilon_i$
  $v_i = v_{i-1} + \varsigma_i$
  – (the Greek letters denote noise terms)
• Stack position and velocity into a single state vector, $x_i = (p_i, v_i)^T$, with
  $D_i = \begin{pmatrix} \mathrm{I} & (\Delta t)\,\mathrm{I} \\ 0 & \mathrm{I} \end{pmatrix}, \qquad M_i = \begin{pmatrix} \mathrm{I} & 0 \end{pmatrix}$
  – which is the form we had above:
  $x_i \sim N(D_i x_{i-1}; \Sigma_{d_i}), \qquad y_i \sim N(M_i x_i; \Sigma_{m_i})$
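This model is easy to simulate; the NumPy sketch below (not from the slides; the time step and noise scales are arbitrary assumptions) draws a 1D trajectory and noisy position measurements from it:

```python
import numpy as np

dt = 1.0                      # time step (arbitrary choice)
sigma_d, sigma_m = 0.1, 0.5   # process and measurement noise scales (assumed)

D = np.array([[1.0, dt],      # transition matrix: p <- p + dt * v
              [0.0, 1.0]])    #                    v <- v
M = np.array([[1.0, 0.0]])    # measurement matrix: we observe position only

x = np.array([0.0, 1.0])      # initial state: position 0, velocity 1
for i in range(10):
    x = D @ x + sigma_d * np.random.randn(2)   # x_i ~ N(D x_{i-1}, Sigma_d)
    y = M @ x + sigma_m * np.random.randn(1)   # y_i ~ N(M x_i, Sigma_m)
    print(f"tick {i}: position {x[0]:+.2f}, velocity {x[1]:+.2f}, measurement {y[0]:+.2f}")
```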
Moving with constant acceleration
• We have
  $p_i = p_{i-1} + (\Delta t)\, v_{i-1} + \varepsilon_i$
  $v_i = v_{i-1} + (\Delta t)\, a_{i-1} + \varsigma_i$
  $a_i = a_{i-1} + \xi_i$
  – (the Greek letters denote noise terms)
• Stack position, velocity, and acceleration into a single state vector, $x_i = (p_i, v_i, a_i)^T$, with
  $D_i = \begin{pmatrix} \mathrm{I} & (\Delta t)\,\mathrm{I} & 0 \\ 0 & \mathrm{I} & (\Delta t)\,\mathrm{I} \\ 0 & 0 & \mathrm{I} \end{pmatrix}, \qquad M_i = \begin{pmatrix} \mathrm{I} & 0 & 0 \end{pmatrix}$
  – which is the form we had above
Constant velocity dynamic model
[Plots: 1st component of state (position) vs. time; measurement and 1st component of state (position) vs. time; velocity vs. time]
Constant acceleration dynamic model
[Plots: position and velocity vs. time]
Kalman filter
• Key ideas:
  – Linear models interact uniquely well with Gaussian noise
    • make the prior Gaussian, and everything else is Gaussian and the calculations are easy
  – Gaussians are really easy to represent
    • once you know the mean and covariance, you're done
Kalman filter in 1D
• Dynamic model:
  $x_i \sim N(d_i x_{i-1}; \sigma_{d_i}^2), \qquad y_i \sim N(m_i x_i; \sigma_{m_i}^2)$
• Notation:
  $\bar{x}_i^-$: predicted mean (before the i-th measurement)
  $\bar{x}_i^+$: corrected mean (after the i-th measurement)
  $\sigma_i^-, \sigma_i^+$: the corresponding standard deviations
Prediction for 1D Kalman filter
• Because the new state is obtained by
  – multiplying the old state by a known constant
  – adding zero-mean noise
• Therefore, the predicted mean for the new state is
  – the constant times the mean for the old state: $\bar{x}_i^- = d_i\, \bar{x}_{i-1}^+$
• The predicted variance is
  – the sum of the noise variance and constant² times the old state variance: $(\sigma_i^-)^2 = \sigma_{d_i}^2 + d_i^2\, (\sigma_{i-1}^+)^2$
  – the old state is a normal random variable; multiplying a normal rv by a constant multiplies its mean by that constant and its variance by the square of the constant; adding zero-mean noise adds zero to the mean; adding rv's adds their variances
Correction for 1D Kalman filter
• Pattern match to identities given in the book
  – basically, guess the integrals, get:
  $\bar{x}_i^+ = \frac{\bar{x}_i^-\, \sigma_{m_i}^2 + m_i\, y_i\, (\sigma_i^-)^2}{\sigma_{m_i}^2 + m_i^2\, (\sigma_i^-)^2}, \qquad (\sigma_i^+)^2 = \frac{\sigma_{m_i}^2\, (\sigma_i^-)^2}{\sigma_{m_i}^2 + m_i^2\, (\sigma_i^-)^2}$
• Notice:
  – if the measurement noise is small ($\sigma_{m_i} \to 0$), we rely mainly on the measurement: $\bar{x}_i^+ \to y_i / m_i$
  – if $\sigma_{m_i}$ is large, we rely mainly on the prediction: $\bar{x}_i^+ \to \bar{x}_i^-$
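Combining the two steps gives the whole 1D filter; a minimal sketch (assuming constant $d_i$, $m_i$ and noise levels, which the slides leave general):

```python
import numpy as np

def kalman_1d(ys, d=1.0, m=1.0, sigma_d=0.1, sigma_m=0.5, x0=0.0, s0=1.0):
    """Run a 1D Kalman filter over the measurements ys.
    Returns a list of (predicted mean, corrected mean) pairs."""
    x_plus, var_plus = x0, s0 ** 2
    out = []
    for y in ys:
        # Prediction: scale the old mean; variances add.
        x_minus = d * x_plus
        var_minus = sigma_d ** 2 + d ** 2 * var_plus
        # Correction: blend prediction and measurement by their variances.
        denom = sigma_m ** 2 + m ** 2 * var_minus
        x_plus = (x_minus * sigma_m ** 2 + m * y * var_minus) / denom
        var_plus = (sigma_m ** 2 * var_minus) / denom
        out.append((x_minus, x_plus))
    return out
```

Note how the correction reproduces the limits above: a small sigma_m pulls x_plus toward y / m, while a large sigma_m leaves it near the prediction.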
Example: Kalman filtering
[Plot: state, measurements, predictions (*, before measurement), and corrections (after measurement) over time]
In higher dimensions, the derivation follows the same lines, but isn't as easy; the expressions are shown on the next slide.
Kalman filter: general case
The measurement standard deviation is small, so the state estimates are rather good.
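The slide's figure is not reproduced in the transcript. For reference, the standard matrix form of the two steps, consistent with the 1D expressions above, is:

Prediction: $\bar{x}_i^- = D_i\, \bar{x}_{i-1}^+, \qquad \Sigma_i^- = \Sigma_{d_i} + D_i\, \Sigma_{i-1}^+\, D_i^T$
Kalman gain: $K_i = \Sigma_i^-\, M_i^T \left( M_i\, \Sigma_i^-\, M_i^T + \Sigma_{m_i} \right)^{-1}$
Correction: $\bar{x}_i^+ = \bar{x}_i^- + K_i \left( y_i - M_i\, \bar{x}_i^- \right), \qquad \Sigma_i^+ = (\mathrm{I} - K_i M_i)\, \Sigma_i^-$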
Smoothing
• Idea
  – We don't have the best estimate of state: what about the future?
  – Run two filters, one moving forward, the other backward in time
  – Now combine the state estimates
• The crucial point here is that we can obtain a smoothed estimate by viewing the backward filter's prediction as yet another measurement for the forward filter
  – so we've already done the equations
Forward filter
[Plot: forward filter estimates: state, measurements, predictions (*, before measurement), corrections (after measurement)]
Backward filter
[Plot: backward filter estimates, same legend as above]
Forward-backward filter
[Plot: forward-backward (smoothed) estimates, same legend as above]
Data association
• Also known as the correspondence problem
• Given that feature detectors (measurements) are not perfect, how can one find the correspondence between measurements and features in the model?
Data association
• Determine which measurements are informative (as not every measurement conveys the same amount of information)
• Nearest neighbors
  – choose the measurement with the highest probability given the predicted state
  – popular, but can lead to catastrophe
• Probabilistic Data Association (PDA)
  – combine measurements, weighting by probability given the predicted state
  – gate using the predicted state
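A minimal 1D sketch of gating plus nearest-neighbor selection (the function name, the 3-sigma gate, and the Gaussian predicted state are illustrative assumptions, not from the slides):

```python
import numpy as np

def nn_gate(measurements, pred_mean, pred_std, gate=3.0):
    """Keep measurements within `gate` predicted standard deviations,
    then pick the one most probable given the predicted state."""
    z = np.abs((measurements - pred_mean) / pred_std)
    gated = measurements[z < gate]
    if gated.size == 0:
        return None   # nothing plausible: coast on the prediction
    return gated[np.argmin(np.abs(gated - pred_mean))]
```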
Data association
Constant velocity model; measurements are plotted with standard deviation (dashed line)
Data association with NN
• The blue track represents the actual measurements of state, and the red tracks are noise
• The noise is all over the place, but the gate around each prediction typically contains only one measurement (the right one)
• This means that we can track rather well (the blue track is very largely obscured by the overlaid measurements of state)
• Nearest neighbors (NN), picking the best measurement consistent with the prediction, works well in this case
• Occasional mis-identification may not cause problems
Data association with NN
• If the dynamic model is not sufficiently constrained, choosing the measurement that best fits the prediction may lead to failure
Data association with NN
• But here the tracker loses the track
• These problems occur because error can accumulate
• It is now relatively easy to continue tracking the wrong point for a long time, and the longer we do this the less chance there is of recovering the right point
Probabilistic data association
• Instead of choosing the region most like the predicted measurement,
• we can exclude all regions that are too different and then use the others,
• weighting them according to their similarity to the prediction
• Instead of using $y_i$ from the single best region, use the expectation over region hypotheses $h_j$ (the superscript indicates the region):
  $E[y_i \mid y_0, \ldots, y_{i-1}] = \sum_j P(h_j \mid y_0, \ldots, y_{i-1})\, y_i^j$
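Continuing the 1D sketch above, PDA replaces the single selected measurement with the probability-weighted combination of all gated candidates (the Gaussian weighting is an assumed model, for illustration):

```python
import numpy as np

def pda_measurement(measurements, pred_mean, pred_std, gate=3.0):
    """Expected measurement under PDA: gate, then weight each surviving
    candidate y^j by its probability under the predicted state."""
    z = (measurements - pred_mean) / pred_std
    gated = measurements[np.abs(z) < gate]
    if gated.size == 0:
        return None
    w = np.exp(-0.5 * ((gated - pred_mean) / pred_std) ** 2)
    return np.sum(w * gated) / np.sum(w)   # E[y_i] = sum_j P(h_j) y_i^j
```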
Nonlinear dynamics
• Consider the dynamics
  $x_{i+1} = x_i + \sin(x_i)$
• As one repeats iterations of this function (there isn't any noise), points move towards or away from points where sin(x) = 0
• This means that if one has a quite simple probability distribution on $x_0$, one can end up with a very complex distribution (many tiny peaks) quite quickly
• This is most easily demonstrated by running particles through this dynamics, as in the next slide
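The demonstration is easy to reproduce; a sketch (the initial density and particle count are arbitrary choices):

```python
import numpy as np

n_particles, n_steps = 10000, 20
x = np.random.randn(n_particles)   # x_0 ~ N(0, 1): a simple initial density
for _ in range(n_steps):
    x = x + np.sin(x)              # deterministic nonlinear dynamics
# A histogram of x now shows many narrow peaks: particles pile up near the
# attracting fixed points of the map (odd multiples of pi, where sin(x) = 0).
print(np.histogram(x, bins=50)[0])
```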
Nonlinear dynamics
• The top figure shows tracks of particle position against iteration for the dynamics of the previous slide, where the particle position was normal in the initial configuration (first graph at the bottom)
• As the succeeding graphs show, the number of particles near a point (= histogram = p(x_n)) very quickly gets complex, with many tiny peaks
Propagation of general densities
Factored sampling
• Represent the state distribution non-parametrically
  – Prediction: sample points from the prior density for the state, $p(x)$
  – Correction: weight the samples according to $p(y \mid x)$
$p(x \mid y) = k\, p(y \mid x)\, p(x)$
$\pi^{(n)} = \frac{p(y \mid x = s^{(n)})}{\sum_{j=1}^{N} p(y \mid x = s^{(j)})}$
Over time,
$p(x_t \mid y_0, \ldots, y_t) \propto p(y_t \mid x_t)\, p(x_t \mid y_0, \ldots, y_{t-1})$
$p(x_t \mid y_0, \ldots, y_{t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_0, \ldots, y_{t-1})\, dx_{t-1}$
Representing $p(x_{t-1} \mid y_0, \ldots, y_{t-1})$ in terms of $\{s_{t-1}^{(n)}, \pi_{t-1}^{(n)}\}$
Particle filtering
• We want to use sampling to propagate densities over time (i.e., across frames in a video sequence)
• At each time step, represent the posterior $p(x_t \mid y_t)$ with a weighted sample set
• The previous time step's sample set $p(x_{t-1} \mid y_{t-1})$ is passed to the next time step as the effective prior
Particle filtering
M. Isard and A. Blake, “CONDENSATION -- conditional density propagation for visual tracking”, IJCV 29(1):5-28, 1998
• Start with the weighted samples from the previous time step
• Sample and shift according to the dynamics model
• Spread due to randomness; this is the predicted density $p(x_t \mid y_{t-1})$
• Weight the samples according to the observation density
• Arrive at the corrected density estimate $p(x_t \mid y_t)$
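One cycle of this loop in 1D might look like the sketch below (the sin(x) dynamics and the Gaussian observation density are assumptions for illustration, not the paper's models):

```python
import numpy as np

def condensation_step(samples, weights, y, sigma_d=0.3, sigma_m=0.5):
    """One resample / predict / weight cycle of the particle filter."""
    n = len(samples)
    p = weights / weights.sum()
    # 1. Resample from the previous weighted set (the effective prior).
    s = samples[np.random.choice(n, size=n, p=p)]
    # 2. Sample and shift according to the dynamics model (predicted density).
    s = s + np.sin(s) + sigma_d * np.random.randn(n)
    # 3. Weight the samples by the observation density p(y | x).
    w = np.exp(-0.5 * ((y - s) / sigma_m) ** 2)
    return s, w / w.sum()
```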
Parameterization of splines
Dynamic model $p(x_t \mid x_{t-1})$
• Can learn from examples using ARMA or LDS
• Can be modeled with a random walk
• Often factorize $x_t$
Observation model $p(y_t \mid x_t)$
Observation model: 2D case
Particle filtering results
Issues
• Initialization
• Obtaining the observation and dynamics models
  – Generative observation model: "render" the state on top of the image and compare
  – Discriminative observation model: classifier or detector score
  – Dynamics model: learn (very difficult) or specify using domain knowledge
• Prediction vs. correction
  – If the dynamics model is too strong, we will end up ignoring the data
  – If the observation model is too strong, tracking is reduced to repeated detection
• Data association
  – What if we don't know which measurements to associate with which tracks?
• Drift
  – Errors caused by the dynamics model, observation model, and data association tend to accumulate over time
• Failure detection
Mean-Shift Object Tracking
• Start from the position of the model in the current frame
• Search in the model's neighborhood in the next frame
• Find the best candidate by maximizing a similarity function
• Repeat the same process in the next pair of frames
[Figure: current frame, with the model window and a candidate window]
Mean-Shift Object Tracking: Target Representation
• Choose a reference target model
• Choose a feature space (e.g., quantized color space)
• Represent the model by its PDF in the feature space
[Histogram: model PDF; probability vs. color bin 1, 2, 3, …, m]
Kernel-Based Object Tracking, by Comaniciu, Ramesh, and Meer
Mean-Shift Object Tracking: PDF Representation
Target model (centered at 0): $q = \{q_u\}_{u=1,\ldots,m}$, $\sum_{u=1}^{m} q_u = 1$
Target candidate (centered at y): $p(y) = \{p_u(y)\}_{u=1,\ldots,m}$, $\sum_{u=1}^{m} p_u(y) = 1$
Similarity function: $f(y) = f[q, p(y)]$
[Histograms: model and candidate PDFs; probability vs. color bin 1, 2, 3, …, m]
Mean-Shift Object Tracking: Finding the PDF of the Target Model
$\{x_i\}_{i=1,\ldots,n}$: target pixel locations
$k(x)$: a differentiable, isotropic, convex, monotonically decreasing kernel profile
  • peripheral pixels get low weight, because they are affected by occlusion and background interference
$b(x)$: the color bin index (1..m) of pixel x

Probability of feature u in the model:
$q_u = C \sum_{i=1}^{n} k(\|x_i\|^2)\, \delta[b(x_i) - u]$
Probability of feature u in the candidate, centered at y with bandwidth h:
$p_u(y) = C_h \sum_{i=1}^{n_h} k\!\left(\left\|\frac{y - x_i}{h}\right\|^2\right) \delta[b(x_i) - u]$
$C$ and $C_h$ are normalization factors; the kernel profile $k$ supplies the pixel weights.
[Histograms: model PDF (centered at 0) and candidate PDF (centered at y) over color bins 1..m]
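A sketch of this histogram computation (using grayscale bins instead of quantized RGB, and an Epanechnikov profile, purely to keep the example short):

```python
import numpy as np

def kernel_histogram(patch, m=16):
    """Kernel-weighted histogram q_u of a square patch.
    patch: 2D array of gray values in [0, 1); m: number of bins."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Squared distance of each pixel from the patch center, normalized
    # so the patch border is at distance 1.
    r2 = ((ys - h / 2) / (h / 2)) ** 2 + ((xs - w / 2) / (w / 2)) ** 2
    k = np.maximum(1.0 - r2, 0.0)                       # Epanechnikov profile k(||x||^2)
    bins = np.minimum((patch * m).astype(int), m - 1)   # b(x_i)
    q = np.bincount(bins.ravel(), weights=k.ravel(), minlength=m)
    return q / q.sum()                                  # normalize so sum_u q_u = 1
```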
Mean-Shift Object Tracking: Similarity Function
Target model: $q = (q_1, \ldots, q_m)$
Target candidate: $p(y) = (p_1(y), \ldots, p_m(y))$
Similarity function: $f(y) = f[p(y), q] = ?$

The Bhattacharyya coefficient:
$f(y) = \sum_{u=1}^{m} \sqrt{p_u(y)\, q_u} = \cos \theta_y = \frac{\sqrt{p(y)}^T \sqrt{q}}{\|\sqrt{p(y)}\| \cdot \|\sqrt{q}\|}$
where $\sqrt{p(y)} = (\sqrt{p_1(y)}, \ldots, \sqrt{p_m(y)})$ and similarly for $\sqrt{q}$; both are unit vectors, since the histograms are normalized.
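In code the coefficient is a one-liner; continuing the hypothetical histogram sketch above:

```python
import numpy as np

def bhattacharyya(p, q):
    """f(y) = sum_u sqrt(p_u(y) * q_u) for two normalized histograms."""
    return float(np.sum(np.sqrt(p * q)))

# e.g. bhattacharyya(kernel_histogram(candidate_patch), kernel_histogram(model_patch))
```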
Mean-Shift Object Tracking: Target Localization Algorithm
• Start from the position of the model (with PDF $q'$) in the current frame
• Search in the model's neighborhood in the next frame
• Find the best candidate by maximizing the similarity function $f[p'(y), q']$
Linear approximation around $y_0$:
$f(y) \approx \frac{1}{2} \sum_{u=1}^{m} \sqrt{p_u(y_0)\, q_u} + \frac{1}{2} \sum_{u=1}^{m} p_u(y)\, \sqrt{\frac{q_u}{p_u(y_0)}}$
Mean-Shift Object Tracking: Approximating the Similarity Function
Model location: $y_0$; candidate location: $y$.
Substituting the expression for $p_u(y)$ into the linear approximation gives
$f(y) \approx \frac{1}{2} \sum_{u=1}^{m} \sqrt{p_u(y_0)\, q_u} + \frac{C_h}{2} \sum_{i=1}^{n_h} w_i\, k\!\left(\left\|\frac{y - x_i}{h}\right\|^2\right)$
with pixel weights
$w_i = \sum_{u=1}^{m} \sqrt{\frac{q_u}{p_u(y_0)}}\; \delta[b(x_i) - u]$
The first term is independent of $y$; the second is a kernel density estimate (as a function of $y$).
Mean-Shift Object Tracking: Maximizing the Similarity Function
The mode of $\frac{C_h}{2} \sum_{i=1}^{n_h} w_i\, k\!\left(\left\|\frac{y - x_i}{h}\right\|^2\right)$ is the sought maximum.
Important assumptions:
• there is one mode in the searched neighborhood
• the target representation provides sufficient discrimination
Mean-Shift Object Tracking: Applying Mean-Shift
Original mean-shift: find the mode of $c \sum_{i=1}^{n} k\!\left(\left\|\frac{y - x_i}{h}\right\|^2\right)$ using
$y_1 = \frac{\sum_{i=1}^{n} x_i\, g\!\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} g\!\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right)}$
Extended mean-shift: find the mode of $c \sum_{i=1}^{n} w_i\, k\!\left(\left\|\frac{y - x_i}{h}\right\|^2\right)$ using
$y_1 = \frac{\sum_{i=1}^{n} x_i\, w_i\, g\!\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n} w_i\, g\!\left(\left\|\frac{y_0 - x_i}{h}\right\|^2\right)}$
Mean-Shift Object Tracking: About Kernels and Profiles
A special class of radially symmetric kernels: $K(x) = c\, k(\|x\|^2)$, where $k$ is the profile of kernel $K$.
The extended mean-shift step above finds the mode of $c \sum_{i=1}^{n} w_i\, k\!\left(\left\|\frac{y - x_i}{h}\right\|^2\right)$ using $g(x) = -k'(x)$.
Mean-Shift Object Tracking: Choosing the Kernel
A special class of radially symmetric kernels: $K(x) = c\, k(\|x\|^2)$
Epanechnikov kernel profile:
$k(x) = \begin{cases} 1 - x & \text{if } x \le 1 \\ 0 & \text{otherwise} \end{cases}$
Its derivative gives a uniform kernel:
$g(x) = -k'(x) = \begin{cases} 1 & \text{if } x \le 1 \\ 0 & \text{otherwise} \end{cases}$
With this choice, the extended mean-shift step reduces to a simple weighted average:
$y_1 = \frac{\sum_{i=1}^{n} x_i\, w_i}{\sum_{i=1}^{n} w_i}$
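With the uniform $g$, one localization step reduces to the weighted average above; a minimal sketch (array shapes and the empty-window guard are assumptions):

```python
import numpy as np

def mean_shift_step(xs, w, y0, h):
    """One extended mean-shift step: average the pixel locations xs (n x 2),
    weighted by w_i, over the window of radius h around y0."""
    inside = np.sum((xs - y0) ** 2, axis=1) <= h ** 2   # g = 1 iff ||(y0 - x)/h||^2 <= 1
    if not np.any(inside):
        return y0                                       # empty window: stay put
    wi = w[inside]
    return np.sum(xs[inside] * wi[:, None], axis=0) / np.sum(wi)

# Iterate until convergence, e.g.:
# while True:
#     y1 = mean_shift_step(xs, w, y0, h)
#     if np.linalg.norm(y1 - y0) < 1e-3: break
#     y0 = y1
```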
Mean-Shift Object Tracking: Adaptive Scale
Problem: the scale of the target changes in time, so the scale (h) of the kernel must be adapted.
Solution: run the localization 3 times with different h, and choose the h that achieves maximum similarity.
Mean-Shift Object Tracking: Results
Feature space: 16×16×16 quantized RGB. Target: manually selected in the 1st frame. Average number of mean-shift iterations: 4.
Mean-Shift Object Tracking: Results
Partial occlusion, distraction, motion blur