
Page 1

Seminar on Vision and Learning
University of California, San Diego

September 20, 2001

Learning and Recognizing Human Dynamics in Video Sequences

Christoph Bregler

Presented by: Anand D. Subramaniam, Electrical and Computer Engineering Dept., University of California, San Diego

Page 2

Outline

• Gait Recognition

• The Layering Approach

• Layer One - Input Image Sequence — Optical Flow

• Layer Two - Coherence Blob Hypothesis — EM Clustering

• Layer Three - Simple Dynamical Categories— Kalman Filters

• Layer Four - Complex Movement Sequences— Hidden Markov Models

• Model training

• Simulation results

Page 3

Gait Recognition

Running

Walking

Skipping

Page 4

The Layering Approach

Layer 1

Layer 2

Layer 3

Layer 4

Page 5

Input Image Sequence Layer 1

• The feature vector comprises optical flow, color value, and pixel value.

Optical Flow equation:

$\nabla I(x,y,t) \cdot v(x,y) + I_t(x,y,t) = 0$

Affine Motion Model:

$v(x,y) = \begin{bmatrix} s_{1,1}\,x + s_{1,2}\,y + d_x \\ s_{2,1}\,x + s_{2,2}\,y + d_y \end{bmatrix}$

Affine Warp:

$v(x,y) = S \begin{bmatrix} x \\ y \end{bmatrix} + d, \qquad S = \begin{bmatrix} s_{1,1} & s_{1,2} \\ s_{2,1} & s_{2,2} \end{bmatrix}, \qquad d = \begin{bmatrix} d_x \\ d_y \end{bmatrix}$
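To make the affine model concrete, a minimal least-squares sketch is given below: it fits S and d to flow vectors sampled inside a blob. The function name, the NumPy formulation, and the assumption that a dense flow field has already been computed are illustrative choices, not the paper's implementation.

```python
import numpy as np

def fit_affine_motion(xs, ys, vx, vy):
    """Least-squares fit of v(x,y) = S [x, y]^T + d to sampled flow vectors.

    xs, ys : 1-D arrays of pixel coordinates inside the blob
    vx, vy : 1-D arrays of the measured flow at those pixels
    Returns (S, d) with S a 2x2 matrix and d a length-2 vector.
    """
    ones = np.ones_like(xs, dtype=float)
    # Each pixel gives two equations: vx = s11*x + s12*y + dx, vy = s21*x + s22*y + dy
    A = np.column_stack([xs, ys, ones])          # shared design matrix
    px, *_ = np.linalg.lstsq(A, vx, rcond=None)  # s11, s12, dx
    py, *_ = np.linalg.lstsq(A, vy, rcond=None)  # s21, s22, dy
    S = np.array([[px[0], px[1]], [py[0], py[1]]])
    d = np.array([px[2], py[2]])
    return S, d
```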

Page 6

Page 7

Expectation Maximization Algorithm

• EM is an iterative algorithm that computes locally optimal solutions to certain cost functions.

• EM simplifies a complex cost function into a set of easily solvable cost functions by introducing a “missing parameter”.

• The missing data is the indicator function $S_i$ (which cluster each sample $y$ belongs to).

Page 8

Expectation Maximization Algorithm

• EM iterates between two steps

• E-Step: Estimate the conditional mean of the missing parameter given the previous estimate of the model parameters and the observations.

• M-Step: Re-estimate the model parameters given the soft clustering done by the E-Step (a minimal sketch of both steps follows below).

• EM is numerically stable, with the likelihood non-decreasing at every iteration.

• EM converges to a local optimum.

• EM has linear convergence.
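A minimal sketch of these two steps for a one-dimensional Gaussian mixture is shown below; the random initialization, fixed iteration count, and all names are assumptions made for illustration rather than the formulation used in the paper.

```python
import numpy as np

def em_gmm_1d(y, K, iters=50):
    """Fit a K-component 1-D Gaussian mixture to samples y with EM (illustrative sketch)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    mu = np.random.choice(y, K)            # initial means
    var = np.full(K, np.var(y))            # initial variances
    pi = np.full(K, 1.0 / K)               # initial mixture weights
    for _ in range(iters):
        # E-step: soft cluster assignments (responsibilities)
        dens = np.exp(-0.5 * (y[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = pi * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft assignments
        Nk = resp.sum(axis=0)
        mu = (resp * y[:, None]).sum(axis=0) / Nk
        var = (resp * (y[:, None] - mu) ** 2).sum(axis=0) / Nk
        pi = Nk / n
    return pi, mu, var
```

Called as, for example, em_gmm_1d(samples, K=3); each pass performs one E-step (responsibilities) and one M-step (parameter re-estimation), and the likelihood never decreases.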

Page 9

Density Estimation using EM

• Gaussian mixture models can model any given probability density function to arbitrary accuracy, given a sufficient number of clusters (curve fitting with Gaussian kernels; a quick fitting example follows below).

• For a given number of clusters, EM minimizes the Kullback-Leibler divergence between the arbitrary pdf and the class of Gaussian mixture models with that number of clusters.

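For a quick experiment along these lines, an off-the-shelf mixture fit can stand in for the hand-rolled EM above; the two-component toy data below are made-up values, assuming scikit-learn is available.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy bimodal data standing in for an "arbitrary" density
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(200, 40, 500), rng.normal(800, 120, 500)])

gmm = GaussianMixture(n_components=2).fit(samples.reshape(-1, 1))
grid = np.linspace(0, 1200, 200).reshape(-1, 1)
density = np.exp(gmm.score_samples(grid))   # fitted mixture density on the grid
```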

Page 10

Coherence Blob Hypotheses Layer 2

Mixture Model:

$P\big(I(t,x,y) \mid \theta(t)\big) = \sum_{k=1}^{K} P_k(t)\, P\big(I(t,x,y) \mid \theta_k(t)\big)$

Likelihood Equation:

$P\big(I(t) \mid \theta(t)\big) = \prod_{x,y} P\big(I(t,x,y) \mid \theta(t)\big)$

Missing Data:

$S_k(t,x,y) = P\big(S(t,x,y) = k \mid I(t,x,y), \theta(t)\big)$

Simplified Cost Functions:

$C_k(t) = \sum_{x,y} S_k(t,x,y)\,\Big[\log P_k(t) + \log P\big(I(t,x,y) \mid \theta_k(t)\big)\Big]$
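A sketch of the corresponding E-step for one frame is given below: each pixel's feature vector is softly assigned to the K blob hypotheses. The function and argument names are illustrative assumptions; the paper's actual feature construction is not reproduced here.

```python
import numpy as np
from scipy.stats import multivariate_normal

def blob_responsibilities(features, weights, means, covs):
    """Soft-assign per-pixel feature vectors to K blob hypotheses.

    features : (N, D) array, one feature vector (e.g. position, flow, color) per pixel
    weights  : (K,) mixture weights P_k
    means    : (K, D) blob means
    covs     : (K, D, D) blob covariances
    Returns an (N, K) array of responsibilities S_k for every pixel.
    """
    K = len(weights)
    lik = np.stack([weights[k] * multivariate_normal.pdf(features, means[k], covs[k])
                    for k in range(K)], axis=1)
    return lik / lik.sum(axis=1, keepdims=True)
```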

Page 11

EM Initialization

• We need to track the temporal variation of blob parameters in order to initialize the EM for a given frame.

• Kalman filters

• Recursive EM using Conjugate priors

Page 12

Page 13

All Roads Lead From Gauss (1809)

“… since all our measurements and observations are nothing more than approximations to the truth, the same must be true of all calculations resting upon them, and the highest aim of all computations made concerning concrete phenomenon must be to approximate, as nearly as practicable, to the truth. But this can be accomplished in no other way than by suitable combination of more observations than the number absolutely requisite for the determination of the unknown quantities. This problem can only be properly undertaken when an approximate knowledge of the orbit has been already attained, which is afterwards to be corrected so as to satisfy all the observations in the most accurate manner possible.”

- From Theory of the Motion of the Heavenly Bodies Moving about the Sun in Conic Sections, Gauss, 1809

Page 14

Estimation Basics

• Problem statement

• Observation random variable X (given)

• Target random variable Y (unknown)

• Joint probability density f(x, y) (given)

• What is the best estimate $y_{opt} = g(x)$ that minimizes the expected mean square error between $y_{opt}$ and $y$?

• Answer: the conditional mean $g(x) = E(Y \mid X = x)$

• The estimate g(x) can be nonlinear and unavailable in closed form.

• When X and Y are jointly Gaussian, g(x) is linear (see the identity below).

• What is the best linear estimate $y_{lin} = Wx$ that minimizes the mean square error?
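For reference, the standard closed form behind the jointly Gaussian case (a textbook fact, not taken from the slides) is

$$E[Y \mid X = x] = \mu_Y + \Sigma_{YX}\,\Sigma_{XX}^{-1}\,(x - \mu_X),$$

which is affine in x, so the optimal estimator coincides with the best linear (affine) estimator.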

Page 15

Wiener Filter 1940

Wiener-Hopf Solution: $W = R_{YX}\, R_{XX}^{-1}$ (a one-line derivation follows below)

• Involves matrix inversion

• Applies only to stationary processes

• Not amenable to an online recursive implementation.

[Diagram: Y and its projection Y_lin onto Span(X)]
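The Wiener-Hopf form follows from the orthogonality principle; a one-line sketch of the standard argument (not from the slides) is

$$E\big[(Y - WX)\,X^{T}\big] = 0 \;\;\Longrightarrow\;\; R_{YX} - W R_{XX} = 0 \;\;\Longrightarrow\;\; W = R_{YX}\,R_{XX}^{-1}.$$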

Page 16

Kalman Filter

• The estimate can be obtained recursively.

• Can be applied to non-stationary processes.

• If measurement noise and process noise are white and Gaussian, then the filter is “optimal”.

• Minimum variance unbiased estimate

• In the general case, the Kalman filter is the minimum-variance estimator among all linear estimators.

STATE SPACE MODEL

Process Model: $y_k = A_k\, y_{k-1} + u_k$

Measurement Model: $x_k = M_k\, y_k + v_k$
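A minimal predict/correct implementation of this state-space filter is sketched below; matrix names follow the slide's convention (y is the state, x the measurement, M the measurement matrix), while the class name and the noise-covariance arguments Q and R are assumptions added for the sketch.

```python
import numpy as np

class KalmanFilter:
    """Minimal linear Kalman filter for y_k = A y_{k-1} + u_k, x_k = M y_k + v_k."""

    def __init__(self, A, M, Q, R, y0, P0):
        self.A, self.M, self.Q, self.R = A, M, Q, R   # dynamics, measurement, noise covariances
        self.y, self.P = y0, P0                       # state estimate and its covariance

    def predict(self):
        # Time update: project state and covariance forward (a priori estimates)
        self.y = self.A @ self.y
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.y

    def update(self, x):
        # Measurement update: correct with the noisy measurement (a posteriori estimates)
        S = self.M @ self.P @ self.M.T + self.R       # innovation covariance
        K = self.P @ self.M.T @ np.linalg.inv(S)      # Kalman gain
        self.y = self.y + K @ (x - self.M @ self.y)
        self.P = (np.eye(len(self.y)) - K @ self.M) @ self.P
        return self.y
```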

Page 17

The Water Tank Problem

$\dfrac{dL}{dt} = r$

Process Model:

$\begin{bmatrix} L_t \\ r_t \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} L_{t-1} \\ r_{t-1} \end{bmatrix} + u_{t-1}$

Measurement Model:

$x_t = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} L_t \\ r_t \end{bmatrix} + v_t$

where the process noise $u_t$ and measurement noise $v_t$ are zero-mean i.i.d. Gaussian.
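Using the KalmanFilter sketch from the state-space slide, the water-tank model could be wired up as follows, assuming (as in the reconstruction above) that only the level is measured; the noise covariances and readings are made-up numbers for illustration.

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])   # level integrates the fill rate
M = np.array([[1.0, 0.0]])               # only the level is measured
Q = 1e-4 * np.eye(2)                     # process noise covariance (assumed)
R = np.array([[0.25]])                   # measurement noise variance (assumed)

kf = KalmanFilter(A, M, Q, R, y0=np.zeros(2), P0=np.eye(2))
for reading in [0.9, 2.1, 2.8, 4.2]:     # noisy level readings (illustrative)
    kf.predict()
    estimate = kf.update(np.array([reading]))  # [level, fill rate] estimate
```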

Page 18

What does a Kalman filter do?

• The Kalman filter propagates the conditional density in time.

[Plots: the conditional densities f(y | x1), f(y | x2), and f(y | x1, x2)]

Page 19

How does it do it ?

• The Kalman filter iterates between two steps

• Time Update (Predict)— Project current state and covariance forward to the next time

step, that is, compute the next a priori estimates.

• Measurement Update (Correct)—Update the a priori quantities using noisy measurements, that is,

compute the a posteriori estimates.

• Choose $K_k$ to minimize the error covariance:

$\hat{y}_k = \hat{y}_k^{-} + K_k\,\big(x_k - M_k\,\hat{y}_k^{-}\big)$

Page 20

Applications

GPS

Satellite orbit computation

Active noise control

Tracking

Page 21

The Layering Approach

Layer 1

Layer 2

Layer 3

Layer 4

Page 22

Simple Dynamical Categories Layer 3

• A sequence of blobs k(t), k(t+1), …, k(t+d) is grouped into dynamical categories. The group assignment is “soft”.

• The dynamical categories are represented with a set of M second-order linear dynamical systems.

• Each category is a certain phase during a gait cycle.

• The categories are called “movemes” (like “phonemes”).

• $D_m(t,k)$: probability that a certain blob k(t) belongs to dynamical category m (see the sketch after this list).

$Q(t) = A_1^m\, Q(t-2) + A_0^m\, Q(t-1) + B^m w$

• Q(t) is the motion estimate of the specific blob k(t), w is the system noise, and $C^m = B^m (B^m)^T$ is the system covariance.

• The dynamical systems form the states of a Hidden Markov Model.
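A sketch of how one moveme's contribution to D_m(t, k) might be scored, assuming the blob motion estimates Q(t) are available as vectors; the function and variable names are illustrative, not the paper's code.

```python
import numpy as np
from scipy.stats import multivariate_normal

def moveme_likelihood(Q_t, Q_tm1, Q_tm2, A0, A1, C):
    """Likelihood of the current blob motion Q(t) under one second-order moveme model.

    The model predicts Q(t) = A1 Q(t-2) + A0 Q(t-1) + B w, with C = B B^T.
    """
    predicted = A1 @ Q_tm2 + A0 @ Q_tm1
    return multivariate_normal.pdf(Q_t, mean=predicted, cov=C)

# D_m(t, k) is then obtained by normalizing these likelihoods across the M movemes.
```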

Page 23

Page 24

The Model

Page 25

Trellis representation

Page 26

HMM in speech

Page 27

HMM model parameters

State Transition Matrix: A
Observation state PDF: B
Number of states: N
Number of observation levels: M
Initial probability distribution: π

$\lambda = (A, B, N, M, \pi)$

Page 28

Three Basic Problems

• Given the observation sequence O = O_1 O_2 … O_T and a model λ, how do we efficiently compute P(O | λ), the probability of the observation sequence given the model? (Forward-Backward Algorithm, sketched below)

• Given the observation sequence O = O_1 O_2 … O_T and the model λ, how do we choose a corresponding state sequence Q = q_1 q_2 … q_T which best “explains” the observations? (Viterbi Algorithm)

• How do we adjust the model parameters λ to maximize P(O | λ)? (Baum-Welch Algorithm)
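A compact sketch of the first problem, the forward pass of the Forward-Backward algorithm, for a discrete-observation HMM; the array conventions (rows index states) are assumptions for illustration.

```python
import numpy as np

def forward_log_likelihood(obs, A, B, pi):
    """log P(O | lambda) for a discrete HMM via the scaled forward recursion.

    obs : sequence of observation symbol indices, length T
    A   : (N, N) state transition matrix, A[i, j] = P(q_{t+1} = j | q_t = i)
    B   : (N, M) observation pdf, B[j, k] = P(O_t = k | q_t = j)
    pi  : (N,) initial state distribution
    """
    alpha = pi * B[:, obs[0]]                 # initialization
    c = alpha.sum()
    log_lik = np.log(c)
    alpha = alpha / c
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]         # induction step (uses the Markov property)
        c = alpha.sum()                       # rescale to avoid underflow
        log_lik += np.log(c)
        alpha = alpha / c
    return log_lik
```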

Page 29

How do they work ? Key ideas

• Both the Forward-Backward algorithm and the Viterbi algorithm solve their associated problem by induction (recursively).

• The induction is a consequence of the Markov property of the model.

• Baum-Welch is exactly the EM algorithm with a different “missing parameter”.

• The missing parameter is the state a particular observation belongs to.

Page 30

The Layering Approach

Layer 1

Layer 2

Layer 3

Layer 4

Page 31

Complex Movement Sequences Layer 4

• Each dynamical system becomes a state of a Hidden Markov Model.

• Different gaits are modeled using different HMMs.

• The paper uses 33 sequences of 5 different subjects performing 3 different gait categories.

• Choose the HMM with the maximum likelihood given the observation (see the sketch after this list).

• The fraction of correctly classified gait cycles in the test set varied from 86% to 93%.
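Classification then amounts to evaluating each gait's HMM likelihood and taking the argmax, for example with the forward_log_likelihood sketch from the earlier slide; the dictionary of trained models is a hypothetical stand-in.

```python
def classify_gait(obs, models):
    """Pick the gait whose HMM assigns the observation sequence the highest likelihood.

    models : dict mapping gait name -> (A, B, pi), e.g. trained with Baum-Welch
    """
    return max(models, key=lambda gait: forward_log_likelihood(obs, *models[gait]))
```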

Page 32

References

• EM Algorithm

• A.P. Dempster, N.M. Laird and D.B. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm”, Journal of the Royal Statistical Society, 39(B), 1977.

• Richard A. Redner and Homer F. Walker, “Mixture Densities, Maximum Likelihood and the EM Algorithm”, SIAM Review, vol. 26, no. 2, April 1984.

• G.J. McLachlan and T. Krishnan, “EM Algorithm and its extensions”, Wiley and Sons, 1997.

• Jeff A. Bilmes, “A Gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and Hidden Markov Models”, available on the net.

Page 33

References

• Kalman Filter

• B.D.O. Anderson and J.B. Moore, Optimal Filtering, Prentice-Hall, Englewood Cliffs, NJ, 1979.

• H. Sorenson, Kalman Filtering: Theory and Application, IEEE Press, 1985.

• Peter Maybeck, Stochastic Models, Estimation, and Control, Volume 1, Academic Press, 1979.

• Web site: http://www.cs.unc.edu/~welch/kalman/

Page 34

References

• Hidden Markov Models

• Rabiner, “An introduction to Hidden Markov Models and selected applications in speech recognition”, Proceedings of the IEEE, 1989.

• Rabiner and Juang, “An introduction to Hidden Markov Models”, IEEE ASSP Magazine, 1986.

• M.I. Jordan and C.M. Bishop, “An Introduction to Graphical Models and Machine Learning”, ask Serge.