ACTION RECOGNITION FROM VIDEO USING FEATURE COVARIANCE MATRICES
Kai Guo, Prakash Ishwar, Senior Member, IEEE, and Janusz Konrad, Fellow, IEEE

TRANSCRIPT

Page 1

ACTION RECOGNITION FROM VIDEO USING FEATURE COVARIANCE MATRICES
Kai Guo, Prakash Ishwar, Senior Member, IEEE, and Janusz Konrad, Fellow, IEEE

Page 2

OUTLINE
Introduction
Framework
Action Feature
Experiments
Conclusion

Page 3

INTRODUCTION

A new approach to action representation: one based on the empirical covariance matrix of a bag of local action features.

We apply the covariance matrix representation to two types of local feature collections:
1. A sequence of silhouettes of an object (the so-called silhouette tunnel).
2. The optical flow.

Page 4

INTRODUCTION

We focus on two distinct types of classifiers:
1. The nearest-neighbor (NN) classifier.
2. The sparse-linear-approximation (SLA) classifier.

A key step is the transformation of the supervised classification problem in the closed convex cone of covariance matrices into an equivalent problem in the vector space of symmetric matrices via the matrix logarithm.

Page 5

FRAMEWORK

Feature Covariance Matrices

We adopt a "bag of dense local feature vectors" modeling approach.

Inspired by Tuzel et al.'s work, we use the feature-covariance matrix, which provides a very discriminative representation for action recognition.

Page 6

FRAMEWORK

Let F = {f_n} denote a "bag of feature vectors" extracted from a video sample, and let the size of the feature set be |F| = N.

The empirical estimate of the covariance matrix of F is given by

C = (1/N) * sum_{n=1}^{N} (f_n - mu)(f_n - mu)^T,    (1)

where mu = (1/N) * sum_{n=1}^{N} f_n is the empirical mean feature vector.
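As a minimal sketch, formula (1) can be computed directly in NumPy (note that, per the slide, normalization is by 1/N; some implementations prefer the unbiased 1/(N-1)):

```python
import numpy as np

def empirical_covariance(F):
    """Empirical covariance matrix of a bag of d-dimensional feature vectors.

    F: (N, d) array, one feature vector per row.
    Returns the d x d matrix C = (1/N) * sum_n (f_n - mu)(f_n - mu)^T.
    """
    F = np.asarray(F, dtype=float)
    mu = F.mean(axis=0)      # empirical mean feature vector
    Z = F - mu               # centered features
    return (Z.T @ Z) / len(F)

# Tiny usage example: four 3-dimensional feature vectors.
F = np.array([[1.0, 2.0, 0.0],
              [2.0, 0.0, 1.0],
              [0.0, 1.0, 2.0],
              [1.0, 1.0, 1.0]])
C = empirical_covariance(F)
print(C.shape)  # (3, 3)
```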

Page 7

FRAMEWORK

Log-Covariance Matrices

A key idea is to map the convex cone of covariance matrices to the vector space of symmetric matrices by using the matrix logarithm proposed by Arsigny et al.

The eigen-decomposition of C is given by C = U D U^T. Then log(C) := U log(D) U^T, where log(D) is the diagonal matrix obtained from D by replacing D's diagonal entries by their logarithms.
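The eigen-decomposition route above is a few lines of NumPy. The small eps floor on the eigenvalues is an assumption added here to guard against numerically zero eigenvalues, not part of the slide:

```python
import numpy as np

def log_covariance(C, eps=1e-10):
    """Matrix logarithm of a symmetric positive-definite covariance matrix.

    Eigen-decompose C = U D U^T, then log(C) = U log(D) U^T, where log(D)
    replaces D's diagonal entries by their logarithms.
    """
    D, U = np.linalg.eigh(C)  # C is symmetric, so eigh is appropriate
    return U @ np.diag(np.log(np.maximum(D, eps))) @ U.T

C = np.array([[2.0, 0.5],
              [0.5, 1.0]])
L = log_covariance(C)
# The log maps the SPD cone into the vector space of symmetric matrices:
print(np.allclose(L, L.T))  # True
```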

Page 8

FRAMEWORK

Classification Using Log-Covariance Matrices

Nearest-Neighbor (NN) Classification: Given a query sample, find the most similar sample in the annotated training set, where similarity is measured with respect to some distance measure, and assign its label to the query sample.
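A minimal NN sketch over log-covariance matrices; the slide leaves the distance measure open, so the Frobenius norm (the natural Euclidean distance once the matrices live in the vector space of symmetric matrices) is an assumption here:

```python
import numpy as np

def nn_classify(query, train_mats, train_labels):
    """Nearest-neighbor classification of log-covariance matrices.

    Returns the label of the training matrix closest to the query
    under the Frobenius distance.
    """
    dists = [np.linalg.norm(query - T) for T in train_mats]  # Frobenius
    return train_labels[int(np.argmin(dists))]

# Hypothetical usage: two labeled log-covariance matrices and a query.
train_mats = [np.array([[0.0, 0.1], [0.1, 0.0]]),
              np.array([[2.0, 0.5], [0.5, 2.0]])]
train_labels = ["walk", "run"]
query = np.array([[0.1, 0.0], [0.0, 0.1]])
print(nn_classify(query, train_mats, train_labels))  # walk
```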

Page 9

FRAMEWORK

Sparse Linear Approximation (SLA) Classification: We approximate the log-covariance matrix of a query sample by a sparse linear combination of the log-covariance matrices of all training samples p_1, . . . , p_N.

Page 10

FRAMEWORK

Given a query sample x, one may attempt to express it as a linear combination of the training samples by solving the matrix-vector equation given by

x = P α,   where P = [p_1 · · · p_N],

that is, by solving the following NP-hard optimization problem:

min ||α||_0   subject to   x = P α.

If the optimal solution α∗ is sufficiently sparse, the l0 problem can be replaced by an l1 problem. In practice, however, the query may not lie exactly in the span of the training samples; this difficulty can be overcome by introducing a noise term as follows:

x = P α + z,

where z is an additive noise term whose length is assumed to be bounded by ε, i.e., ||z||_2 ≤ ε. This leads to the following l1-minimization problem:

min ||α||_1   subject to   ||x − P α||_2 ≤ ε.
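The constrained l1 problem is usually handed to a convex-programming solver. As a self-contained sketch (an assumption, not the authors' solver), its Lagrangian relaxation, min over α of 0.5*||x − Pα||_2^2 + λ*||α||_1, can be solved with the iterative shrinkage-thresholding algorithm (ISTA):

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(P, x, lam=0.01, n_iter=2000):
    """ISTA for min_a 0.5*||x - P a||^2 + lam*||a||_1:
    a gradient step on the quadratic term followed by soft-thresholding."""
    step = 1.0 / (np.linalg.norm(P, 2) ** 2)  # 1/L, L = Lipschitz constant
    a = np.zeros(P.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a - step * (P.T @ (P @ a - x)), step * lam)
    return a

# Hypothetical usage: the query is exactly one training column, so the
# recovered code should concentrate on that column.
rng = np.random.default_rng(0)
P = rng.standard_normal((20, 10))
P /= np.linalg.norm(P, axis=0)           # unit-norm training columns
a_true = np.zeros(10); a_true[3] = 1.0
x = P @ a_true
a_hat = ista(P, x)
```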

Page 11

FRAMEWORK

Use a reconstruction residual error (RRE) measure to decide the query class. Let α∗_i denote the coefficients of α∗ associated with class i (having label l_i), corresponding to the columns of the training matrix P_i.

The RRE measure of class i is defined as:

RRE_i(x) := ||x − P_i α∗_i||_2.

To annotate the sample, we assign the class label that leads to the minimum RRE.
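A minimal sketch of the RRE decision rule (the data and labels below are hypothetical):

```python
import numpy as np

def rre_classify(x, P, alpha, col_labels):
    """Minimum reconstruction-residual-error classification.

    col_labels[j] is the class of the j-th training column of P.  For each
    class we zero out all coefficients of alpha except that class's and
    measure RRE_i = ||x - P alpha_i||_2; the smallest residual wins.
    """
    col_labels = np.asarray(col_labels)
    best_label, best_rre = None, np.inf
    for li in np.unique(col_labels):
        a_i = np.where(col_labels == li, alpha, 0.0)
        rre = np.linalg.norm(x - P @ a_i)
        if rre < best_rre:
            best_label, best_rre = li, rre
    return best_label

# Hypothetical usage: four training columns, two per class.
P = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9]])
col_labels = ["bend", "bend", "wave", "wave"]
alpha = np.array([0.8, 0.2, 0.0, 0.0])   # sparse code of the query
x = np.array([0.98, 0.02])
print(rre_classify(x, P, alpha, col_labels))  # bend
```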

Page 12

ACTION FEATURE

Silhouette Tunnel Shape Features

Our goal is to reliably discriminate between shapes, not to accurately reconstruct them. Hence a coarse, low-dimensional representation of shape suffices.

We capture the shape of the 3D silhouette tunnel by the empirical covariance matrix of a bag of thirteen-dimensional local shape features.

Page 13

ACTION FEATURE

With each point s of the silhouette tunnel we associate a 13-dimensional feature vector f(s) that captures certain shape characteristics of the tunnel.

Page 14

ACTION FEATURE

After obtaining the 13-dimensional silhouette shape feature vectors, we can compute their 13 × 13 covariance matrix, denoted by C, using (1) (with N = |S|), where μ is the mean feature vector.

Thus, C is an empirical covariance matrix of the collection of feature vectors F.
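The slides do not reproduce the components of f(s). As a purely hypothetical, reduced illustration of per-point shape features of this kind, the sketch below computes, for a single binary silhouette frame, each foreground pixel's coordinates plus its distances to the background along the four axis directions, and then forms their covariance as in (1):

```python
import numpy as np

def directional_distances(mask):
    """Per-pixel distances to the silhouette boundary along E, W, N, S.

    A reduced illustration only: the paper's f(s) is 13-dimensional and
    defined over the 3D silhouette tunnel, so this sketch uses just
    (x, y) plus four directional distances per foreground pixel.
    """
    H, W = mask.shape
    feats = []
    for y, x in zip(*np.nonzero(mask)):
        row, col = mask[y, :], mask[:, x]
        # np.argmin on a boolean run finds the first background pixel.
        d_e = int(np.argmin(row[x:])) if not row[x:].all() else W - x
        d_w = int(np.argmin(row[:x + 1][::-1])) if not row[:x + 1].all() else x + 1
        d_s = int(np.argmin(col[y:])) if not col[y:].all() else H - y
        d_n = int(np.argmin(col[:y + 1][::-1])) if not col[:y + 1].all() else y + 1
        feats.append([x, y, d_e, d_w, d_n, d_s])
    return np.array(feats, dtype=float)

# A 3x3 square silhouette inside a 5x5 frame.
mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True
F = directional_distances(mask)   # one 6-vector per foreground pixel
C = np.cov(F.T, bias=True)        # their empirical covariance, as in (1)
print(F.shape, C.shape)           # (9, 6) (6, 6)
```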

Page 15

ACTION FEATURE

Optical Flow Features

Here we use a variant of the Horn and Schunck method, which optimizes a functional based on residuals from the intensity constraints and a smoothness regularization term.

Let I(x, y, t) denote the luminance of the raw video sequence at pixel position (x, y, t) and let u(x, y, t) represent the corresponding optical flow vector.

Based on I(x, y, t) and u(x, y, t), we form a feature vector f(x, y, t) at each pixel that combines intensity and optical-flow quantities.
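The slides name a Horn and Schunck variant but do not detail it. A minimal textbook Horn-Schunck sketch (an illustration, not the paper's exact variant; borders are treated as periodic purely for brevity) looks like:

```python
import numpy as np

def horn_schunck(I1, I2, alpha=1.0, n_iter=200):
    """Minimal Horn-Schunck optical flow between two grayscale frames.

    Iteratively refines the flow (u, v) so that it satisfies the intensity
    constancy constraint while staying close to its local average
    (the smoothness term, weighted by alpha).
    """
    I1 = I1.astype(float)
    I2 = I2.astype(float)
    Ix = np.gradient(I1, axis=1)   # spatial derivatives
    Iy = np.gradient(I1, axis=0)
    It = I2 - I1                   # temporal derivative

    def avg(f):                    # 4-neighbor average, periodic borders
        return (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                np.roll(f, 1, 1) + np.roll(f, -1, 1)) / 4.0

    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    for _ in range(n_iter):
        ub, vb = avg(u), avg(v)
        common = (Ix * ub + Iy * vb + It) / (alpha ** 2 + Ix ** 2 + Iy ** 2)
        u = ub - Ix * common
        v = vb - Iy * common
    return u, v

# A Gaussian blob shifted one pixel to the right between frames.
yy, xx = np.mgrid[0:32, 0:32]
I1 = np.exp(-((xx - 16.0) ** 2 + (yy - 16.0) ** 2) / 20.0)
I2 = np.roll(I1, 1, axis=1)
u, v = horn_schunck(I1, I2)
```

For rightward motion the estimated u should be positive over the blob while v stays near zero.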

Pages 16-22

EXPERIMENTS

[The experimental results on these slides were presented as figures and tables that are not reproduced in this transcript.]

Page 23

CONCLUSION

The action recognition framework that we have developed in this paper is conceptually simple, easy to implement, and has good run-time performance.

The TRECVID [63] and VIRAT [64] video datasets exemplify these types of real-world challenges, and much work remains to be done to address them.

Page 24

CONCLUSION

Our method's relative simplicity, as compared to some of the top methods in the literature, enables almost tuning-free rapid deployment and real-time operation.

This opens new application areas outside the traditional surveillance/security arena, for example in sports video annotation and customizable human-computer interaction.

Page 25

THE END