
Page 1

Seminar on Vision and Learning
University of California, San Diego

September 20, 2001

Learning and Recognizing Human Dynamics in Video Sequences

Christoph Bregler

Presented by: Anand D. Subramaniam, Electrical and Computer Engineering Dept., University of California, San Diego

Page 2

Outline

• Gait Recognition

• The Layering Approach

• Layer One - Input Image Sequence — Optical Flow

• Layer Two - Coherence Blob Hypothesis — EM Clustering

• Layer Three - Simple Dynamical Categories— Kalman Filters

• Layer Four - Complex Movement Sequences— Hidden Markov Models

• Model training

• Simulation results

Page 3

Gait Recognition

Running

Walking

Skipping

Page 4

The Layering Approach

Layer 1

Layer 2

Layer 3

Layer 4

Page 5

Input Image Sequence Layer 1

• The feature vector comprises optical flow, color value, and pixel value.

Optical Flow equation:

$\nabla I(x,y,t) \cdot v(x,y) + I_t(x,y,t) = 0$

Affine Motion Model:

$v(x,y) = \begin{bmatrix} s_{1,1}\,x + s_{1,2}\,y + d_x \\ s_{2,1}\,x + s_{2,2}\,y + d_y \end{bmatrix}$

Affine Warp:

$v(x,y) = S \begin{bmatrix} x \\ y \end{bmatrix} + d, \qquad S = \begin{bmatrix} s_{1,1} & s_{1,2} \\ s_{2,1} & s_{2,2} \end{bmatrix}, \qquad d = \begin{bmatrix} d_x \\ d_y \end{bmatrix}$
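To make the affine model concrete, a minimal least-squares sketch is given below: it fits S and d to flow vectors sampled inside a blob. The function name, the NumPy formulation, and the assumption that a dense flow field has already been computed are illustrative choices, not the paper's implementation.

```python
import numpy as np

def fit_affine_motion(xs, ys, vx, vy):
    """Least-squares fit of v(x,y) = S [x, y]^T + d to sampled flow vectors.

    xs, ys : 1-D arrays of pixel coordinates inside the blob
    vx, vy : 1-D arrays of the measured flow at those pixels
    Returns (S, d) with S a 2x2 matrix and d a length-2 vector.
    """
    ones = np.ones_like(xs, dtype=float)
    # Each pixel gives two equations: vx = s11*x + s12*y + dx, vy = s21*x + s22*y + dy
    A = np.column_stack([xs, ys, ones])          # shared design matrix
    px, *_ = np.linalg.lstsq(A, vx, rcond=None)  # s11, s12, dx
    py, *_ = np.linalg.lstsq(A, vy, rcond=None)  # s21, s22, dy
    S = np.array([[px[0], px[1]], [py[0], py[1]]])
    d = np.array([px[2], py[2]])
    return S, d
```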

Page 6

Page 7

Expectation Maximization Algorithm

• EM is an iterative algorithm that computes locally optimal solutions to certain cost functions.

• EM simplifies a complex cost function into a set of easily solvable cost functions by introducing a “missing parameter”.

• The missing data is the indicator function $S_i$ (which cluster each sample $y$ belongs to).

Page 8

Expectation Maximization Algorithm

• EM iterates between two steps

• E-Step: Estimate the conditional mean of the missing parameter given the previous estimate of the model parameters and the observations.

• M-Step: Re-estimate the model parameters given the soft clustering done by the E-Step (a minimal sketch of both steps follows below).

• EM is numerically stable, with the likelihood non-decreasing at every iteration.

• EM converges to a local optimum.

• EM has linear convergence.
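A minimal sketch of these two steps for a one-dimensional Gaussian mixture is shown below; the random initialization, fixed iteration count, and all names are assumptions made for illustration rather than the formulation used in the paper.

```python
import numpy as np

def em_gmm_1d(y, K, iters=50):
    """Fit a K-component 1-D Gaussian mixture to samples y with EM (illustrative sketch)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    mu = np.random.choice(y, K)            # initial means
    var = np.full(K, np.var(y))            # initial variances
    pi = np.full(K, 1.0 / K)               # initial mixture weights
    for _ in range(iters):
        # E-step: soft cluster assignments (responsibilities)
        dens = np.exp(-0.5 * (y[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = pi * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft assignments
        Nk = resp.sum(axis=0)
        mu = (resp * y[:, None]).sum(axis=0) / Nk
        var = (resp * (y[:, None] - mu) ** 2).sum(axis=0) / Nk
        pi = Nk / n
    return pi, mu, var
```

Called as, for example, em_gmm_1d(samples, K=3); each pass performs one E-step (responsibilities) and one M-step (parameter re-estimation), and the likelihood never decreases.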

Page 9

Density Estimation using EM

• Gaussian mixture models can model any given probability density function to arbitrary accuracy, given a sufficient number of clusters (curve fitting with Gaussian kernels; a quick fitting example follows below).

• For a given number of clusters, EM minimizes the Kullback-Leibler divergence between the arbitrary pdf and the class of Gaussian mixture models with that number of clusters.

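For a quick experiment along these lines, an off-the-shelf mixture fit can stand in for the hand-rolled EM above; the two-component toy data below are made-up values, assuming scikit-learn is available.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy bimodal data standing in for an "arbitrary" density
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(200, 40, 500), rng.normal(800, 120, 500)])

gmm = GaussianMixture(n_components=2).fit(samples.reshape(-1, 1))
grid = np.linspace(0, 1200, 200).reshape(-1, 1)
density = np.exp(gmm.score_samples(grid))   # fitted mixture density on the grid
```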

Page 10

Coherence Blob Hypotheses Layer 2

Mixture Model:

$P\big(I(t,x,y) \mid \theta(t)\big) = \sum_{k=1}^{K} P_k(t)\, P\big(I(t,x,y) \mid \theta_k(t)\big)$

Likelihood Equation:

$P\big(I(t) \mid \theta(t)\big) = \prod_{x,y} P\big(I(t,x,y) \mid \theta(t)\big)$

Missing Data:

$S_k(t,x,y) = P\big(S(t,x,y) = k \mid I(t,x,y), \theta(t)\big)$

Simplified Cost Functions:

$C_k(t) = \sum_{x,y} S_k(t,x,y)\,\Big[\log P_k(t) + \log P\big(I(t,x,y) \mid \theta_k(t)\big)\Big]$
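A sketch of the corresponding E-step for one frame is given below: each pixel's feature vector is softly assigned to the K blob hypotheses. The function and argument names are illustrative assumptions; the paper's actual feature construction is not reproduced here.

```python
import numpy as np
from scipy.stats import multivariate_normal

def blob_responsibilities(features, weights, means, covs):
    """Soft-assign per-pixel feature vectors to K blob hypotheses.

    features : (N, D) array, one feature vector (e.g. position, flow, color) per pixel
    weights  : (K,) mixture weights P_k
    means    : (K, D) blob means
    covs     : (K, D, D) blob covariances
    Returns an (N, K) array of responsibilities S_k for every pixel.
    """
    K = len(weights)
    lik = np.stack([weights[k] * multivariate_normal.pdf(features, means[k], covs[k])
                    for k in range(K)], axis=1)
    return lik / lik.sum(axis=1, keepdims=True)
```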

Page 11

EM Initialization

• We need to track the temporal variation of blob parameters in order to initialize the EM for a given frame.

• Kalman filters

• Recursive EM using Conjugate priors

Page 12

Page 13

All Roads Lead From Gauss (1809)

“… since all our measurements and observations are nothing more than approximations to the truth, the same must be true of all calculations resting upon them, and the highest aim of all computations made concerning concrete phenomenon must be to approximate, as nearly as practicable, to the truth. But this can be accomplished in no other way than by suitable combination of more observations than the number absolutely requisite for the determination of the unknown quantities. This problem can only be properly undertaken when an approximate knowledge of the orbit has been already attained, which is afterwards to be corrected so as to satisfy all the observations in the most accurate manner possible.”

- From Theory of the Motion of the Heavenly Bodies Moving about the Sun in Conic Sections, Gauss, 1809

Page 14

Estimation Basics

• Problem statement

• Observation random variable X (given)

• Target random variable Y (unknown)

• Joint probability density f(x, y) (given)

• What is the best estimate $y_{opt} = g(x)$ that minimizes the expected mean square error between $y_{opt}$ and $y$?

• Answer: the conditional mean $g(x) = E(Y \mid X = x)$

• The estimate g(x) can be nonlinear and unavailable in closed form.

• When X and Y are jointly Gaussian, g(x) is linear (see the identity below).

• What is the best linear estimate $y_{lin} = Wx$ that minimizes the mean square error?
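For reference, the standard closed form behind the jointly Gaussian case (a textbook fact, not taken from the slides) is

$$E[Y \mid X = x] = \mu_Y + \Sigma_{YX}\,\Sigma_{XX}^{-1}\,(x - \mu_X),$$

which is affine in x, so the optimal estimator coincides with the best linear (affine) estimator.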

Page 15

Wiener Filter 1940

Wiener-Hopf Solution: $W = R_{YX}\, R_{XX}^{-1}$ (a one-line derivation follows below)

• Involves matrix inversion

• Applies only to stationary processes

• Not amenable to an online recursive implementation.

[Diagram: Y and its projection Y_lin onto Span(X)]
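The Wiener-Hopf form follows from the orthogonality principle; a one-line sketch of the standard argument (not from the slides) is

$$E\big[(Y - WX)\,X^{T}\big] = 0 \;\;\Longrightarrow\;\; R_{YX} - W R_{XX} = 0 \;\;\Longrightarrow\;\; W = R_{YX}\,R_{XX}^{-1}.$$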

Page 16

Kalman Filter

• The estimate can be obtained recursively.

• Can be applied to non-stationary processes.

• If measurement noise and process noise are white and Gaussian, then the filter is “optimal”.

• Minimum variance unbiased estimate

• In the general case, the Kalman filter is the minimum-variance estimator among all linear estimators.

STATE SPACE MODEL

Process Model: $y_k = A_k\, y_{k-1} + u_k$

Measurement Model: $x_k = M_k\, y_k + v_k$
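A minimal predict/correct implementation of this state-space filter is sketched below; matrix names follow the slide's convention (y is the state, x the measurement, M the measurement matrix), while the class name and the noise-covariance arguments Q and R are assumptions added for the sketch.

```python
import numpy as np

class KalmanFilter:
    """Minimal linear Kalman filter for y_k = A y_{k-1} + u_k, x_k = M y_k + v_k."""

    def __init__(self, A, M, Q, R, y0, P0):
        self.A, self.M, self.Q, self.R = A, M, Q, R   # dynamics, measurement, noise covariances
        self.y, self.P = y0, P0                       # state estimate and its covariance

    def predict(self):
        # Time update: project state and covariance forward (a priori estimates)
        self.y = self.A @ self.y
        self.P = self.A @ self.P @ self.A.T + self.Q
        return self.y

    def update(self, x):
        # Measurement update: correct with the noisy measurement (a posteriori estimates)
        S = self.M @ self.P @ self.M.T + self.R       # innovation covariance
        K = self.P @ self.M.T @ np.linalg.inv(S)      # Kalman gain
        self.y = self.y + K @ (x - self.M @ self.y)
        self.P = (np.eye(len(self.y)) - K @ self.M) @ self.P
        return self.y
```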

Page 17

The Water Tank Problem

$\dfrac{dL}{dt} = r$

Process Model:

$\begin{bmatrix} L_t \\ r_t \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} L_{t-1} \\ r_{t-1} \end{bmatrix} + u_{t-1}$

Measurement Model:

$x_t = \begin{bmatrix} 1 & 0 \end{bmatrix} \begin{bmatrix} L_t \\ r_t \end{bmatrix} + v_t$

where the process noise $u_t$ and measurement noise $v_t$ are zero-mean i.i.d. Gaussian.
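Using the KalmanFilter sketch from the state-space slide, the water-tank model could be wired up as follows, assuming (as in the reconstruction above) that only the level is measured; the noise covariances and readings are made-up numbers for illustration.

```python
import numpy as np

A = np.array([[1.0, 1.0], [0.0, 1.0]])   # level integrates the fill rate
M = np.array([[1.0, 0.0]])               # only the level is measured
Q = 1e-4 * np.eye(2)                     # process noise covariance (assumed)
R = np.array([[0.25]])                   # measurement noise variance (assumed)

kf = KalmanFilter(A, M, Q, R, y0=np.zeros(2), P0=np.eye(2))
for reading in [0.9, 2.1, 2.8, 4.2]:     # noisy level readings (illustrative)
    kf.predict()
    estimate = kf.update(np.array([reading]))  # [level, fill rate] estimate
```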

Page 18

What does a Kalman filter do?

• The Kalman filter propagates the conditional density in time.

[Plots: the conditional densities f(y | x1), f(y | x2), and f(y | x1, x2)]

Page 19

How does it do it ?

• The Kalman filter iterates between two steps

• Time Update (Predict)— Project current state and covariance forward to the next time

step, that is, compute the next a priori estimates.

• Measurement Update (Correct)—Update the a priori quantities using noisy measurements, that is,

compute the a posteriori estimates.

• Choose $K_k$ to minimize the error covariance:

$\hat{y}_k = \hat{y}_k^{-} + K_k\,\big(x_k - M_k\,\hat{y}_k^{-}\big)$

Page 20

Applications

GPS

Satellite orbit computation

Active noise control

Tracking

Page 21

The Layering Approach

Layer 1

Layer 2

Layer 3

Layer 4

Page 22

Simple Dynamical Categories Layer 3

• A sequence of blobs k(t), k(t+1), …, k(t+d) is grouped into dynamical categories. The group assignment is “soft”.

• The dynamical categories are represented with a set of M second-order linear dynamical systems.

• Each category is a certain phase during a gait cycle.

• The categories are called “movemes” (like “phonemes”).

• $D_m(t,k)$: probability that a certain blob k(t) belongs to dynamical category m (see the sketch after this list).

$Q(t) = A_1^m\, Q(t-2) + A_0^m\, Q(t-1) + B^m w$

• Q(t) is the motion estimate of the specific blob k(t), w is the system noise, and $C^m = B^m (B^m)^T$ is the system covariance.

• The dynamical systems form the states of a Hidden Markov Model.
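A sketch of how one moveme's contribution to D_m(t, k) might be scored, assuming the blob motion estimates Q(t) are available as vectors; the function and variable names are illustrative, not the paper's code.

```python
import numpy as np
from scipy.stats import multivariate_normal

def moveme_likelihood(Q_t, Q_tm1, Q_tm2, A0, A1, C):
    """Likelihood of the current blob motion Q(t) under one second-order moveme model.

    The model predicts Q(t) = A1 Q(t-2) + A0 Q(t-1) + B w, with C = B B^T.
    """
    predicted = A1 @ Q_tm2 + A0 @ Q_tm1
    return multivariate_normal.pdf(Q_t, mean=predicted, cov=C)

# D_m(t, k) is then obtained by normalizing these likelihoods across the M movemes.
```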

Page 23

Page 24

The Model

Page 25

Trellis representation

Page 26

HMM in speech

Page 27

HMM model parameters

State Transition Matrix: A
Observation state PDF: B
Number of states: N
Number of observation levels: M
Initial probability distribution: π

$\lambda = (A, B, N, M, \pi)$

Page 28

Three Basic Problems

• Given the observation sequence O = O_1 O_2 … O_T and a model λ, how do we efficiently compute P(O | λ), the probability of the observation sequence given the model? (Forward-Backward Algorithm, sketched below)

• Given the observation sequence O = O_1 O_2 … O_T and the model λ, how do we choose a corresponding state sequence Q = q_1 q_2 … q_T which best “explains” the observations? (Viterbi Algorithm)

• How do we adjust the model parameters λ to maximize P(O | λ)? (Baum-Welch Algorithm)
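A compact sketch of the first problem, the forward pass of the Forward-Backward algorithm, for a discrete-observation HMM; the array conventions (rows index states) are assumptions for illustration.

```python
import numpy as np

def forward_log_likelihood(obs, A, B, pi):
    """log P(O | lambda) for a discrete HMM via the scaled forward recursion.

    obs : sequence of observation symbol indices, length T
    A   : (N, N) state transition matrix, A[i, j] = P(q_{t+1} = j | q_t = i)
    B   : (N, M) observation pdf, B[j, k] = P(O_t = k | q_t = j)
    pi  : (N,) initial state distribution
    """
    alpha = pi * B[:, obs[0]]                 # initialization
    c = alpha.sum()
    log_lik = np.log(c)
    alpha = alpha / c
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]         # induction step (uses the Markov property)
        c = alpha.sum()                       # rescale to avoid underflow
        log_lik += np.log(c)
        alpha = alpha / c
    return log_lik
```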

Page 29

How do they work ? Key ideas

• Both the Forward-Backward algorithm and the Viterbi algorithm solve their associated problem by induction (recursively).

• The induction is a consequence of the Markov property of the model.

• Baum-Welch is exactly the EM algorithm with a different “missing parameter”.

• The missing parameter is the state a particular observation belongs to.

Page 30

The Layering Approach

Layer 1

Layer 2

Layer 3

Layer 4

Page 31

Complex Movement Sequences Layer 4

• Each dynamical system becomes a state of a Hidden Markov Model.

• Different gaits are modeled using different HMMs.

• The paper uses 33 sequences of 5 different subjects performing 3 different gait categories.

• Choose the HMM with the maximum likelihood given the observation (see the sketch after this list).

• The fraction of correctly classified gait cycles in the test set varied from 86% to 93%.
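Classification then amounts to evaluating each gait's HMM likelihood and taking the argmax, for example with the forward_log_likelihood sketch from the earlier slide; the dictionary of trained models is a hypothetical stand-in.

```python
def classify_gait(obs, models):
    """Pick the gait whose HMM assigns the observation sequence the highest likelihood.

    models : dict mapping gait name -> (A, B, pi), e.g. trained with Baum-Welch
    """
    return max(models, key=lambda gait: forward_log_likelihood(obs, *models[gait]))
```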

Page 32

References

• EM Algorithm

• A.P. Dempster, N.M. Laird and D.B. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm”, Journal of the Royal Statistical Society, 39(B), 1977.

• Richard A. Redner and Homer F. Walker, “Mixture Densities, Maximum Likelihood and the EM Algorithm”, SIAM Review, vol. 26, no. 2, April 1984.

• G.J. McLachlan and T. Krishnan, “EM Algorithm and its extensions”, Wiley and Sons, 1997.

• Jeff A. Bilmes, “A Gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and Hidden Markov Models”, available on the net.

Page 33

References

• Kalman Filter

• B.D.O. Anderson and J.B. Moore, Optimal Filtering, Prentice-Hall, Englewood Cliffs, NJ, 1979.

• H. Sorenson, Kalman Filtering: Theory and Application, IEEE Press, 1985.

• Peter Maybeck, Stochastic Models, Estimation, and Control, Volume 1, Academic Press, 1979.

• Web site: http://www.cs.unc.edu/~welch/kalman/

Page 34

References

• Hidden Markov Models

• Rabiner, “An introduction to Hidden Markov Models and selected applications in speech recognition”, Proceedings of the IEEE, 1989.

• Rabiner and Juang, “An introduction to Hidden Markov Models”, IEEE ASSP Magazine, 1986.

• M.I. Jordan and C.M. Bishop, “An Introduction to Graphical Models and Machine Learning”, ask Serge.