Learning and Recognizing Activities in Streams of Video Dinesh Govindaraju

Post on 22-Dec-2015

Learning and Recognizing Activities in Streams of Video

Dinesh Govindaraju

Motivation

Activity recognition from video for higher functionality:

Who is presenting an agenda item

Attendee interest levels

Motivation

Want it to be automatic and not involve hand generation of models:

Hand generation is impractical in the case of many activities

Hand-built models are less versatile, as you might be constrained to particular aspects of the problem

Problem Definition

Video data

Observations are extracted: movement deltas via face tracking

Hand label training segments

Learn underlying models from training segments

Carry out activity recognition
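
As an illustration of the observation-extraction step, the sketch below turns tracked face centers into per-frame movement deltas and quantizes them into discrete symbols. The function name, the (x, y) input format, the pixel threshold, and the coarse direction encoding are all assumptions; the slides do not say how the deltas were encoded.

```python
def movement_deltas(centers, threshold=2.0):
    """Map consecutive face centers (x, y) to coarse movement symbols.

    Hypothetical encoding: each delta becomes (sx, sy) with sx, sy in {-1, 0, 1},
    where 0 means the movement along that axis was within the threshold.
    """
    def sign(v):
        if v > threshold:
            return 1
        if v < -threshold:
            return -1
        return 0

    symbols = []
    # Pair each frame's center with the next frame's center.
    for (x0, y0), (x1, y1) in zip(centers, centers[1:]):
        symbols.append((sign(x1 - x0), sign(y1 - y0)))
    return symbols
```

A sequence of N tracked centers yields N - 1 symbols, one per frame transition.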

Approach - Learning

Assume underlying models can be approximated by HMMs

Use Baum-Welch to learn the best model from the training segments

Need to find observation space and number of states
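
To make the learning step concrete, here is a minimal sketch of Baum-Welch for a discrete-observation HMM, using the scaled forward-backward recursions from Rabiner's tutorial. It is an illustration under simplifying assumptions, not the implementation used in this work: it trains on a single integer-coded sequence (the slides use several training segments per activity), applies no smoothing, and the names `forward_backward`, `baum_welch`, `Q` (number of states) and `M` (number of observation symbols) are chosen here.

```python
import numpy as np

def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass; returns alpha, beta and the scaling factors c."""
    T, Q = len(obs), len(pi)
    alpha, beta, c = np.zeros((T, Q)), np.zeros((T, Q)), np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum()
        alpha[t] /= c[t]
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
    return alpha, beta, c

def baum_welch(obs, Q, M, n_iter=20, seed=0):
    """Fit (pi, A, B) to one sequence from a random start; also return log-likelihoods."""
    rng = np.random.default_rng(seed)
    obs = np.asarray(obs)
    pi = rng.random(Q); pi /= pi.sum()
    A = rng.random((Q, Q)); A /= A.sum(axis=1, keepdims=True)
    B = rng.random((Q, M)); B /= B.sum(axis=1, keepdims=True)
    log_liks = []
    for _ in range(n_iter):
        alpha, beta, c = forward_backward(obs, pi, A, B)
        log_liks.append(np.log(c).sum())  # log P(obs | current model)
        gamma = alpha * beta              # state posteriors; each row sums to 1
        # xi[t, i, j]: posterior of being in state i at t and state j at t + 1
        xi = (alpha[:-1, :, None] * A[None]
              * (B[:, obs[1:]].T * beta[1:])[:, None, :]) / c[1:, None, None]
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for k in range(M):
            B[:, k] = gamma[obs == k].sum(axis=0)
        B /= gamma.sum(axis=0)[:, None]
    return pi, A, B, log_liks
```

Each iteration should leave the log-likelihood no lower than before, which gives a quick sanity check when experimenting with different numbers of states.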

Approach - Learning

To find observation space:

Run through all training segments and add observations

For a new observation when doing recognition, augment learned observation matrices
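
The two steps above can be sketched as follows. The slides do not say how the learned observation matrices are augmented, so giving the unseen symbol a small floor probability `eps` and renormalizing each row is an assumption, as are the function names.

```python
import numpy as np

def build_vocabulary(segments):
    """Assign an integer index to every distinct observation in the training segments."""
    vocab = {}
    for segment in segments:
        for o in segment:
            if o not in vocab:
                vocab[o] = len(vocab)
    return vocab

def augment_emissions(B, vocab, new_obs, eps=1e-6):
    """Add a column for an unseen observation (assumed floor probability eps).

    B has one row per state and one column per known observation symbol.
    Returns a new matrix and a new vocabulary; the inputs are left unchanged.
    """
    if new_obs in vocab:
        return B, vocab
    vocab = dict(vocab)
    vocab[new_obs] = B.shape[1]
    B_aug = np.hstack([B, np.full((B.shape[0], 1), eps)])
    B_aug /= B_aug.sum(axis=1, keepdims=True)  # keep each row a distribution
    return B_aug, vocab
```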

Approach - Learning

To find number of states, Q (for each activity):

Set upper bound as length of longest training segment

Iterate over values and generate most likely model using Baum-Welch

Approach - Learning

To find number of states, Q (for each activity):

Choose best Q using N-fold cross validation using criterion of discriminative power

With best Q, run Baum-Welch using a number of sets of randomly initialized parameters to get λ_a
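
A rough sketch of the search over Q is given below. The slides do not define the discriminative-power criterion, so it is left as a caller-supplied function: `fit(train_segments, q)` and `score(model, held_out_segments)` are hypothetical callables, and the fold assignment is a simple round-robin split. The sketch assumes `n_folds` is no larger than the number of segments.

```python
def choose_num_states(segments, n_folds, fit, score):
    """Pick Q by N-fold cross validation (sketch with hypothetical fit/score callables).

    The upper bound on Q is the length of the longest training segment.
    """
    max_q = max(len(s) for s in segments)
    # Round-robin split of the segments into n_folds groups.
    folds = [segments[i::n_folds] for i in range(n_folds)]
    best_q, best_score = None, float("-inf")
    for q in range(1, max_q + 1):
        total = 0.0
        for i, held_out in enumerate(folds):
            train = [s for j, fold in enumerate(folds) if j != i for s in fold]
            total += score(fit(train, q), held_out)
        mean_score = total / n_folds
        if mean_score > best_score:
            best_q, best_score = q, mean_score
    return best_q
```

Once Q is fixed, the same `fit` could be rerun from several random initializations, keeping the model with the highest training likelihood.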

Approach - Recognition

Define a window width, w

From the beginning, sequentially consider windows of observations (where L is the length of the entire sequence)
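
As a small illustration, the generator below enumerates the windows, assuming they advance one observation at a time, which gives L - w + 1 windows in total. The stride of one and the function name are assumptions.

```python
def sliding_windows(observations, w):
    """Yield (start, window) for every length-w window, moving one step at a time."""
    L = len(observations)
    for start in range(L - w + 1):
        yield start, observations[start:start + w]
```

Under that assumption, a 1296-frame sequence with w = 21 produces 1276 windows.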

Approach - Recognition

Calculate likelihood of each window segment

L. Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE, 1989
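
The likelihood of a window under a learned model can be computed with the forward algorithm described in Rabiner's tutorial. The sketch below is one way to write it, with per-step scaling so that long windows do not underflow. It assumes integer-coded observations, an initial distribution `pi`, a transition matrix `A`, and an emission matrix `B` with one row per state; these names are chosen here.

```python
import numpy as np

def log_likelihood(obs, pi, A, B):
    """Return log P(obs | model) using the scaled forward algorithm."""
    alpha = pi * B[:, obs[0]]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = (alpha @ A) * B[:, obs[t]]
        scale = alpha.sum()
        if scale == 0.0:
            # The model assigns zero probability to this window.
            return float("-inf")
        log_lik += np.log(scale)
        alpha = alpha / scale
    return log_lik
```

Working in log space also makes it easy to compare the scores of different activity models for the same window.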

Approach - Recognition

Label the middle frame of each window with the activity that has the highest likelihood
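
The labeling rule can be sketched as follows, assuming one dictionary of per-activity log-likelihoods for each window start. Frames within half a window of either end never fall in the middle of a window; the slides do not say how those were handled, so this sketch leaves them as None.

```python
def label_middle_frames(window_scores, w, num_frames):
    """Label the middle frame of each window with its highest-scoring activity.

    window_scores[s] maps activity name -> log-likelihood of the window starting
    at frame s (a hypothetical input format).
    """
    labels = [None] * num_frames
    for start, scores in enumerate(window_scores):
        labels[start + w // 2] = max(scores, key=scores.get)
    return labels
```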

Evaluation and Results

Activities being observed:

Evaluation and Results

Observation stream obtained from an 87-second-long image sequence

1296 individual frames

Example frames after face detection:

Evaluation and Results

Observation sequence first hand labeled

Segments showing same activity extracted

4 training segments used to learn each activity

Evaluation and Results

Once underlying models were learned, calculate likelihood using sliding window

Value of 21 was used for the window width, w, as this was the average length of training segments

Evaluation and Results

Carry out recognition using the likelihoods by assigning activities to the frames

Compare against hand assigned labels

Accuracy approximately 76%
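
For reference, frame-level accuracy of this kind can be computed as below. Skipping frames that received no algorithm label (None) is an assumption; the slides do not state whether such frames were counted.

```python
def frame_accuracy(predicted, hand_labels):
    """Fraction of labeled frames whose predicted activity matches the hand label."""
    pairs = [(p, h) for p, h in zip(predicted, hand_labels) if p is not None]
    if not pairs:
        return 0.0
    return sum(p == h for p, h in pairs) / len(pairs)
```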

Evaluation and Results

Algorithm assigned:

Different from hand label

Same as hand label

Evaluation and Results

Hand assigned:

Different from algorithm label

Same as algorithm label

Future Work

Learn underlying model generating sequence of activities themselves

Standardize lengths of training segments using Dynamic Time Warping and use that as the window width
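
For readers unfamiliar with Dynamic Time Warping, the sketch below shows the classic dynamic-programming recurrence for the alignment cost between two sequences of different lengths. It only computes the cost, not the warped segments, and the absolute-difference distance on numeric observations is an assumption for illustration.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Return the minimum cumulative alignment cost between sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] is the best cost of aligning a[:i] with b[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Recovering the warping path by backtracking through D would give the frame correspondences needed to resample segments to a common length.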

The End

Questions