Office of Graduate Studies
University of South Florida
Tampa, Florida
CERTIFICATE OF APPROVAL
This is to certify that the dissertation of
ISIDRO ROBLEDO VEGA
in the graduate degree program of Computer Science and Engineering was approved on August 22, 2002
for the Doctor of Philosophy degree.
Examining Committee:
Major Professor: Sudeep Sarkar, Ph.D.
Member: Dmitry Goldgof, Ph.D.
Member: Eugene Fink, Ph.D.
Member: Tapas Das, Ph.D.
Member: Thomas Sanocki, Ph.D.
Member: Kevin Bowyer, Ph.D.
Committee Verification:
Associate Dean
MOTION MODEL BASED ON STATISTICS OF FEATURE RELATIONS:
HUMAN IDENTIFICATION FROM GAIT
by
ISIDRO ROBLEDO VEGA
A dissertation submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
Department of Computer Science and Engineering
College of Engineering
University of South Florida
Date of Approval: August 22, 2002
Major Professor: Sudeep Sarkar, Ph.D.
© Copyright by Isidro Robledo Vega 2002. All rights reserved.
DEDICATION
To Alex and Myrna
ACKNOWLEDGEMENTS
I want to thank CONACYT-SEP-Mexico for their support during my Ph.D. studies. This
research was supported by funds from National Science Foundation grants EIA 0130768 and
IIS-9907141 and DARPA HumanID program under contract AFOSR-F49620-00-1-00388. I
also want to thank the members of my committee for spending their time reviewing this
manuscript; to Dr. Jonathon Phillips, Dr. Kevin Bowyer, and Patrick Grother for their
contributions in the design of the gait challenge problem; to Stan Janet and Karen Marshall
at NIST for helping us in processing the gait challenge dataset and creating the bounding
box information for the gait sequences; to Dr. Patrick Flynn at University of Notre Dame
for testing the baseline algorithm code and scripts; to my friends at the computer vision
lab, Paddu, Earnie, Jaesik, Yong and Tong for sharing their ideas and helping me in many
different ways to accomplish my goals; to Zongyi for sharing his ideas to improve the
computation of binary silhouettes; to Ayush for keeping the protocol of the data acquisitions;
to Laura, Adebola and Christy for manually extracting silhouettes; and especially to my
advisor Dr. Sudeep Sarkar for accepting me as his student and preparing me during the
last two and a half years to be a good researcher and contribute to computer vision. Thanks
to my parents Apolinar and Olivia; my sisters Domy, Laura, and Claudia; my nieces Gaby,
Paulina, and Sofia; and my brothers-in-law Rene and Cesar for their love and support for
my family and me during these last four years away from home. Finally, thanks to my wife
Myrna and my son Alex for being my greatest source of inspiration.
TABLE OF CONTENTS
LIST OF TABLES iii
LIST OF FIGURES v
ABSTRACT viii
CHAPTER 1 INTRODUCTION 1
CHAPTER 2 RELATED WORK 8
2.1 Biomechanics of Human Gait 8
2.2 Visual Perception of Human Gait 11
2.3 Human Gait Analysis Using Computer Vision Techniques 13
CHAPTER 3 MOTION MODELING: THEORY 17
3.1 Relational Distributions 17
3.1.1 Moving Edge Based Features 19
3.1.2 Scaling Constant D 22
3.2 Space of Probability Functions 23
3.3 Similarity Measures 27
3.3.1 Time Un-normalized Distance 28
3.3.2 Time Normalized Distance 28
3.3.3 Similarity Measure Based on Multiple Gait Cycles 29
CHAPTER 4 INSIGHTS INTO THE SOPF REPRESENTATION THROUGH AN EXAMPLE 31
4.1 Can We Discriminate Between Motion Types Across Persons? 34
4.2 For Each Person, Can We Discriminate Between Motion Types? 36
4.3 Is Identifying Persons Based on Motion Gait Possible? 36
4.4 Is the SoPF Representation Robust with Respect to Segmentation Errors? 36
4.5 Is the SoPF Representation Stable with Respect to Scale Variations? 37
4.6 PCA of the Edge Images 40
CHAPTER 5 EVALUATION METHODOLOGY 43
5.1 Covariates 43
5.1.1 Analysis of Variance (ANOVA) 44
5.2 Performance Evaluation 44
5.2.1 Identification 45
5.2.2 Verification 45
5.3 Statistical Methods for the Evaluation of Human Identification Algorithms 45
5.3.1 McNemar's Test 47
5.3.2 Performance Variations due to Variation in Gallery Data 48
CHAPTER 6 HUMAN IDENTIFICATION FROM DIFFERENT GAIT TYPES 50
6.1 Analysis of Covariates 50
6.2 Gait-Based Recognition Experiments 52
CHAPTER 7 WALKING GAIT BASED IDENTIFICATION FROM DIFFERENT VIEW ANGLES 55
7.1 Analysis of Covariates 55
7.2 Gait-Based Recognition Experiments 58
CHAPTER 8 BENCHMARKING WALKING GAIT BASED IDENTIFICATION 62
8.1 The Gait Challenge Problem 62
8.2 The Data Set 62
8.3 Challenge Experiments 66
8.4 Baseline Algorithm 66
8.4.1 Silhouette Extraction 67
8.4.2 Similarity Computation 69
8.4.3 Parameters 70
8.5 Baseline Performance 71
CHAPTER 9 PERFORMANCE OF THE SOPF REPRESENTATION 74
9.1 Varying the Type of Low Level Features 74
9.1.1 Silhouette Masked Image Edges as Low Level Features 75
9.1.2 Silhouette Boundary Edges as Low Level Features 76
9.2 Using Manually Segmented Silhouettes 78
9.3 Performance Variation of Baseline and SoPF Algorithms due to Variations in Gallery Data 81
CHAPTER 10 CONCLUSIONS 91
REFERENCES 93
ABOUT THE AUTHOR End Page
LIST OF TABLES
Table 1. Summary of recent research on gait-based recognition using computer vision techniques. 16
Table 2. Distance between the traces through the SoPF of two different cycles of motion for the three persons and three motion types dataset. 34
Table 3. Summary statistics of distances between the traces through the SoPF for the three persons and three motion types dataset. 37
Table 4. Distance between the traces through the SoPF of two different cycles of motion for the three persons and three motion types with a moderate amount of segmentation noise (Walking (W), Jogging (J), and Running (R)). 38
Table 5. Summary statistics of the distances between the traces through the SoPF for sequences with a moderate amount of segmentation noise. 39
Table 6. Distance between the traces through the SoPF of two different cycles of motion for the three persons and three motion types with a large amount of segmentation noise (Walking (W), Jogging (J), and Running (R)). 39
Table 7. Summary statistics of the distances between the traces through the SoPF for sequences with a large amount of segmentation noise. 40
Table 8. Distance between the traces through the SoPF of two different half-scaled cycles of motion for the three persons and three motion types. 41
Table 9. Summary statistics of the distances between the traces through the SoPF of the half-scaled version of the testing set, keeping the training set at the original size. 41
Table 10. Sample rows from a file in SAS format for the experiment on different motion types. 44
Table 11. Paired data from algorithms being compared with McNemar's test. 47
Table 12. ANOVA table with results for different motion types experiments. 52
Table 13. Number of persons correctly identified for different motion types experiments. 53
Table 14. Distance between the traces through the SoPF of two different cycles of walking motion for 10 persons. 53
Table 15. Distance between the traces through the SoPF of two different cycles of jogging motion for 10 persons. 54
Table 16. Distance between the traces through the SoPF of two different cycles of running motion for 10 persons. 54
Table 17. ANOVA table with results for different view angle experiments. 57
Table 18. Gallery and probe sets for gait recognition experiments over the 20 person database. 58
Table 19. Number of sequences for each combination of possible surface (G or C), shoe (A or B), and camera view (L or R). 65
Table 20. The probe set for each of the challenge experiments. 66
Table 21. Baseline performance for the challenge experiments in terms of the identification rate PI at ranks 1 and 5, verification rate PV at a false alarm rate of 10%, and area under ROC (AUC). 71
Table 22. Performance comparison of baseline and SoPF algorithms when using silhouette masked image edges as low level features. 76
Table 23. Performance comparison of the baseline and SoPF algorithms when using silhouette boundary edges as low level features. 78
Table 24. Performance comparison of the SoPF algorithm when using silhouette masked image edges (SoPF-M) and silhouette boundary edges (SoPF-B) as low level features. 80
Table 25. Gait recognition results using ground truth silhouettes. 81
Table 26. Relationship between data subsets and challenge experiments when using different subsets as gallery. 82
Table 27. Performance variation of the baseline algorithm due to variations in gallery type. 82
Table 28. Performance variation of the SoPF algorithm due to variations in gallery type. 90
LIST OF FIGURES
Figure 1. Different phases of a walking gait cycle. 3
Figure 2. Image processing steps to build the SoPF. 5
Figure 3. The process to compute similarity between two image sequences. 6
Figure 4. Samples of Eadweard Muybridge photographs from “The Humans in Motion.” 9
Figure 5. Five point light display frames of a human walking. 11
Figure 6. An empirical sampling-based interpretation of relational distributions. 18
Figure 7. Detection of edges in motion using background subtraction. 19
Figure 8. Detection of edges in motion using frame differencing. 20
Figure 9. Edge pixel based 2-ary relational distribution. 21
Figure 10. Edge pixel based 3-ary relational distribution. 22
Figure 11. Fitting of a line through the height curve generated from a walking cycle of motion at (a) 0◦ or frontal-parallel, (b) 22.5◦, and (c) 45◦ with respect to the image plane to determine the scaling constant D. 24
Figure 12. Some configurations of legs in motion in (a), (c) and (e) with their corresponding 2-ary relational distributions in (b), (d) and (f). 25
Figure 13. Similarity measure between sequences with multiple gait cycles. 30
Figure 14. Two consecutive frames from a running sequence. 31
Figure 15. Ten most dominant dimensions of SoPF for the treadmill sequences. 32
Figure 16. Eigenvalues associated with the SoPF of 2-ary relational distributions. 33
Figure 17. Variation of (a) c1(t) and (b) c2(t) within each motion cycle for each of the three persons and motion types. 35
Figure 18. (a) and (b) show some typical frames where the segmentation process misses significant portions of the legs. (c) An under-segmented frame. (d) A more under-segmented frame. 38
Figure 19. Comparison of the largest eigenvalues associated with the edge images of people in motion and those associated with the SoPF of 2-ary relational distributions of the same images. 42
Figure 20. The process of evaluating the performance of our algorithms. 46
Figure 21. Sample frames of a person (a) walking, (b) jogging, and (c) running. 51
Figure 22. Ten most dominant dimensions of the SoPF for the different motion types database consisting of 10 persons. 51
Figure 23. Setup for data acquisition of different view angle walking sequences. 56
Figure 24. Sample frames from the same person walking (a) frontal-parallel, (b) 22.5◦, and (c) 45◦ with respect to the image plane. 57
Figure 25. Ten most dominant dimensions of the SoPF for 20 person database. 58
Figure 26. (a) CMC and (b) ROC curves for experiments 1, 2 and 3, studying identification and verification rates at varying viewpoints. 60
Figure 27. (a) CMC and (b) ROC curves for experiments 1, 4 and 5, studying variation of identification and verification rates with change in viewpoint. 61
Figure 28. Camera setup for the gait data acquisition. 63
Figure 29. Frames from (a) the left camera for concrete surface, (b) the right camera for concrete surface, (c) the left camera for grass surface, and (d) the right camera for grass surface. 65
Figure 30. Sample bounding boxed image data as viewed from (a) left camera on concrete, (b) right camera on concrete, (c) left camera on grass, and (d) right camera on grass. 67
Figure 31. Estimated mean background for a sequence on (a) concrete and (c) grass. Variance of the RGB channels in the background pixels on (b) concrete and (d) grass. 68
Figure 32. The bottom row shows sample silhouette frames depicting the nature of segmentation issues that need to be tackled. 69
Figure 33. Baseline performance for the challenge experiments, (a) CMC curves and (b) ROCs plotted up to a false alarm rate of 20%. 72
Figure 34. Moving edges (a) using the binary silhouettes as masks over the edges of the original images and (b) directly from the binary silhouettes. 75
Figure 35. Performance of the SoPF representation using silhouette masked image edges as low level features. 77
Figure 36. Performance of the SoPF representation using silhouette boundary edges as low level features. 79
Figure 37. (a) Manually extracted silhouette and (b) automatically extracted silhouette. 80
Figure 38. CMCs of (a) baseline and (b) SoPF algorithms for experiment A (view). 83
Figure 39. CMCs of (a) baseline and (b) SoPF algorithms for experiment B (shoe). 84
Figure 40. CMCs of (a) baseline and (b) SoPF algorithms for experiment C (view and shoe). 85
Figure 41. CMCs of (a) baseline and (b) SoPF algorithms for experiment D (surface). 86
Figure 42. CMCs of (a) baseline and (b) SoPF algorithms for experiment E (surface and shoe). 87
Figure 43. CMCs of (a) baseline and (b) SoPF algorithms for experiment F (surface and view). 88
Figure 44. CMCs of (a) baseline and (b) SoPF algorithms for experiment G (surface, shoe and view). 89
MOTION MODEL BASED ON STATISTICS OF FEATURE RELATIONS:
HUMAN IDENTIFICATION FROM GAIT
by
ISIDRO ROBLEDO VEGA
An Abstract
of a dissertation submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
Department of Computer Science and Engineering
College of Engineering
University of South Florida
Date of Approval: August 22, 2002
Major Professor: Sudeep Sarkar, Ph.D.
There is renewed interest in gait analysis in the computer vision community, not from
a structure-from-motion point of view, as was the past emphasis, but from the intriguing
possibility of human identification from gait. A novel representation scheme for human
gait analysis is presented here that is based on just the evolution in the statistics of the
relationships among the detected image features, without the need for object models, perfect
segmentation, or part level tracking. Instead of the statistics of the feature attributes
themselves, the statistics of the feature relations are represented as a point in a space
where the Euclidean distance is related to the Bhattacharyya distance between probability
functions. Different motion types sweep out different traces in this Space of Probability
Functions (SoPF). The effectiveness of this SoPF representation is shown on four data sets
of image sequences of humans engaged in walking, jogging or running. The first set of
sequences, which was designed to study the variation with respect to segmentation errors
and scale changes, is a small one consisting of 3 persons on a treadmill in an indoor setting.
The second set of sequences, which was designed to study the possibility of recognizing
persons from walking, jogging, and running gaits, is from 10 persons in outdoor settings.
The third set of sequences, which was designed to study viewpoint variations, consists
of 20 persons walking on paths inclined at 0◦, 22.5◦, and 45◦ with respect to the image
plane. The fourth set of sequences, which was designed to study variations due to footwear,
walking surface and view from two cameras, consists of 74 persons walking elliptical paths.
The experimental results show that (a) the SoPF representation is robust with respect to
segmentation errors and scale changes, (b) personal attributes are by far the largest source
of variation when compared to factors such as direction of motion, viewpoint, and motion
type, (c) it is possible to recognize persons not only from walking gait, but also from their
jogging and running gaits, (d) identification of persons is possible from walking sequences
viewed at angles other than frontal-parallel as long as the gallery contains the gait from the
same viewpoint as the probe, and lastly (e) walking surface variations have a significant effect
on performance.
Abstract Approved:
Major Professor: Sudeep Sarkar, Ph.D.
Professor, Department of Computer Science and Engineering
Date Approved:
CHAPTER 1
INTRODUCTION
Motion analysis deals with input from different sources, for example, static camera acquir-
ing moving objects, moving camera acquiring information about static or moving objects,
or static or moving camera acquiring images of static or moving objects with light varia-
tions. It is hard to encapsulate all of motion analysis research in computer vision. The
diversity of goals and tasks is staggering. It can include tasks such as inferring motion pa-
rameters, distinguishing rigid motion from non-rigid motion [1], computing the periodicity
of motion [2] [3] [4], or even using the motion information to infer object identities. This
last task, that is, motion-based recognition, is relevant to our work. In particular, we are
interested in using high-level complex motion patterns, as exhibited when someone moves,
to recognize that person. There are many possible approaches to this problem, many of
which we discuss in the next chapter. We are, however, interested in a method that is
robust with respect to segmentation errors, does not require (point or extended) feature
correspondences, and does not need part or object identities. The last condition is
based on the observation that high level complex motion analysis need not be contingent
on part or object recognition [5] [6]. Many have explored methods that require feature
correspondences in terms of optic flow fields [7] [8] [9] [3] or object parts [10], but the per-
formance of these methods is strongly affected by noise, image resolution, and the extent of
frame-to-frame motion. The approaches that avoid these problems rely on more area based
measures, such as image or object self-similarity, or behavior over a long time period [2] [4].
We propose a novel strategy that emphasizes the evolution of spatial relationships among
features with motion, rather than the attributes of the individual features [11] [12].
With motion, the statistics of the relationships among the image features change. This
change or non-stationarity in relational statistics is not random, but follows the motion
pattern. The shape of the probability function governing the distribution of the inter-
feature relations, which can be estimated by the normalized histogram of observed values,
changes as parts of the object move. We have developed the concept of a space over these
probability functions, which we refer to as the SoPF (Space of Probability Functions),
to study the trend of change in their shapes. Distances in this space are related to the
Bhattacharyya distance between probability mass functions. Each motion type creates a
trace in this space. The attractive aspects of this approach are that:
(a) it does not require perfect segmentation of the object from the background,
(b) it does not require feature tracking,
(c) it is amenable to learning, and
(d) there is no assumption about single pixel movement between frames.
It is also worthwhile pointing out that by focusing on the change in relational parameters
over time we bring the dynamic aspects of motion to the fore.
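The link between Euclidean distances in such an embedded space and the Bhattacharyya distance can be illustrated with a standard identity (general probability math, not the dissertation's exact construction): the squared Euclidean distance between the square roots of two probability mass functions equals 2 minus twice their Bhattacharyya coefficient.

```python
import numpy as np

def bhattacharyya_coefficient(p, q):
    # BC(p, q) = sum_i sqrt(p_i * q_i); equals 1 when p == q.
    return float(np.sum(np.sqrt(p * q)))

# Two toy probability mass functions standing in for normalized
# relational histograms (illustrative values only).
p = np.array([0.2, 0.5, 0.3])
q = np.array([0.3, 0.4, 0.3])

# Squared Euclidean distance between the square-root embeddings equals
# 2 - 2 * BC(p, q), tying Euclidean geometry to the Bhattacharyya coefficient.
d2 = float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))
assert abs(d2 - (2.0 - 2.0 * bhattacharyya_coefficient(p, q))) < 1e-12
```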
The use of multidimensional histograms, even relational ones, in computer vision is
not new. They have been used extensively in image databases [13], recognition [14], and
shape modeling [15]. The novelty of the present contribution is that we offer a strategy for
incorporating dynamic aspects and use it for motion-based recognition of humans.
Interest in applications for human identification is currently very high. Biometrics is the measurement of biological or behavioral characteristics for the identification of individuals.
These characteristics can be fingerprint, face, hand geometry, voice, DNA, iris, retina, ear,
gait, etc. Each biometric has different properties. Technologies that facilitate human iden-
tification at a distance are of particular interest as they are not intrusive nor do they require
contact. Gait, or the way a person walks, is such a biometric and has the advantage that
it can be collected at greater distances and does not require very co-operative subjects.
The Webster Collegiate Dictionary defines gait as “a manner of walking.” Because of the
periodic nature of human walking, one gait cycle is considered the unit of analysis in
most systems devoted to human identification based on gait. A gait cycle, as defined
by Murray et al. [16], is the time interval starting when the right heel strikes the floor, going
Figure 1. Different phases of a walking gait cycle.
to the swing of the left leg advancing forward, then the left heel striking the floor and the
right leg swinging forward, and ending when the right heel strikes the floor again. This
process is illustrated in Fig. 1. Four phases can be distinguished in a gait cycle:
(a) Right stance phase is the period of time the right foot is in contact with the floor. It
begins with a “right heel-strike” and ends with a “right toe-off.”
(b) Left swing phase is the period of time the left foot is not in contact with the floor. It
begins with a “left toe-off” and ends with a “left heel-strike.”
(c) Left stance phase is the period of time the left foot is in contact with the floor. It
begins with a “left heel-strike” and ends with a “left toe-off.”
(d) Right swing phase is the period of time the right foot is not in contact with the floor.
It begins with a “right toe-off” and ends with a “right heel-strike.”
When the left and right stance phases overlap, both feet are in contact with the
floor; this period is also called “double limb support.” The left stance phase is not completed
at the end of the gait cycle; it finishes with the “left toe-off” of the next cycle. Murray et
al. [16] suggest that if all the components of the gait movements are considered, then gait
can be unique. About twenty gait components can be considered, but some of them can be
very difficult to capture by computer vision systems since they can only be measured from
top views of the subjects (e.g. pelvis, thorax and ankle rotation).
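The four phases above, and the double limb support period, can be sketched as intervals. All event times below are hypothetical (a stance phase of roughly 60% of the cycle is typical); real values would come from video or motion capture.

```python
# Hypothetical heel-strike / toe-off times (seconds) within one gait cycle.
r_heel_strike, r_toe_off = 0.00, 0.62
l_toe_off, l_heel_strike = 0.12, 0.50

phases = {
    "right_stance": (r_heel_strike, r_toe_off),
    "left_swing": (l_toe_off, l_heel_strike),
    "left_stance": (l_heel_strike, 1.12),  # completes at the next cycle's left toe-off
    "right_swing": (r_toe_off, 1.00),      # ends at the next right heel-strike
}

def overlap(a, b):
    # Overlap of two stance intervals: both feet on the floor
    # ("double limb support").
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None

double_support = overlap(phases["right_stance"], phases["left_stance"])
print(double_support)  # (0.5, 0.62)
```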
In the context of human identification based on gait, the specific questions that we
explore in this dissertation using our statistical motion model are:
(a) Can we identify persons from not just walking gait but jogging and running as well?
(b) Is gait viewed frontal-parallel (which is the current practice) the only possibility?
(c) Can we identify humans from gait viewed at 22.5◦ and 45◦?
(d) Is it possible to do gait-based identification using a representation that is robust with
respect to segmentation and does not involve part level tracking?
(e) How is gait-based identification dependent on covariates such as viewpoint, shoe type,
or surface?
(f) What is the performance of gait-based identification on datasets with a large number
of subjects?
Our system for gait-based human identification using statistical motion models involves
different stages, which we briefly introduce in the following paragraphs.
The discussion here is necessarily terse. The new concepts of relational distributions and
space of probability functions (SoPF) will become clearer in later chapters. The purpose
here is just to provide a quick overview.
In the first stage, we process a sequence of images containing a person in motion with
the purpose of segmenting the person from the background. The outputs of this process
are binary silhouettes. The heights of these silhouettes are calculated and used to compute
a scale normalization factor or scaling constant.
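A minimal sketch of this first stage, using a simple per-pixel thresholded background difference (the actual segmentation used later, in Chapter 8, is more involved and includes cleanup of noisy silhouettes):

```python
import numpy as np

def binary_silhouette(frame, background, threshold=30.0):
    # Mark as foreground any pixel that differs enough from the
    # background model.
    diff = np.abs(frame.astype(float) - background.astype(float))
    return diff > threshold

def silhouette_height(mask):
    # Height in pixels of the silhouette, from which a scale
    # normalization factor can be derived.
    rows = np.flatnonzero(mask.any(axis=1))
    return 0 if rows.size == 0 else int(rows[-1] - rows[0] + 1)

# Toy 6x4 grayscale frame with a bright "person" spanning rows 1..4.
background = np.zeros((6, 4))
frame = background.copy()
frame[1:5, 1:3] = 200.0
print(silhouette_height(binary_silhouette(frame, background)))  # 4
```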
In the second stage, we use the binary silhouettes to extract low level features and
compute relational distributions over them. The outputs of this stage are relational dis-
tributions in the form of histograms that accumulate the occurrences of each relationship
between paired image features. Each frame in an image sequence has its corresponding
relational distribution. In our experience, a typical image sequence of a walking
gait cycle will contain between 28 and 42 frames if acquired at 30 frames per second. This
variation is due to walking speed and stride length differences between persons.
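A minimal sketch of a 2-ary relational distribution, using unsigned vertical and horizontal distances between pairs of edge pixels as the relational attributes (the exact attributes, relations, and bin counts used in this work are given in Chapter 3; the ones here are illustrative):

```python
import numpy as np
from itertools import combinations

def relational_distribution(edge_pixels, D, bins=8):
    # Accumulate, over all pairs of edge pixels, the vertical and horizontal
    # distances (scaled into [0, 1) by the scaling constant D) into a 2D
    # histogram; normalizing it estimates the probability function over
    # feature relations for this frame.
    h = np.zeros((bins, bins))
    for (r1, c1), (r2, c2) in combinations(edge_pixels, 2):
        dr = min(abs(r1 - r2) / D, 1.0 - 1e-9)
        dc = min(abs(c1 - c2) / D, 1.0 - 1e-9)
        h[int(dr * bins), int(dc * bins)] += 1
    return h / h.sum()

rd = relational_distribution([(0, 0), (3, 0), (0, 4)], D=10.0)
```

Each frame in a sequence yields one such normalized histogram, which is the input to the next stage.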
Figure 2. Image processing steps to build the SoPF.
In the third stage, or training stage, the dataset is partitioned to generate a training set
of relational distributions, which we use to build a space of probability functions (SoPF)
using principal component analysis (PCA). Once the SoPF is constructed, the relational
distributions in the training set are represented as points in this space. The output of this
stage is a set of point coordinates for each relational distribution in the training set. Fig. 2
illustrates the training process followed to arrive at the SoPF.
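The PCA construction of the SoPF can be sketched as follows. This is a straightforward SVD-based PCA; the variable names are ours, not the dissertation's.

```python
import numpy as np

def build_sopf(train_hists, k=10):
    # Each flattened relational distribution is one row of X.
    X = np.asarray([h.ravel() for h in train_hists])
    mean = X.mean(axis=0)
    # Principal axes of the mean-centered training data via SVD.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:k]                 # top-k eigenvectors span the SoPF
    coords = (X - mean) @ basis.T  # training points as SoPF coordinates
    return mean, basis, coords

def project(hist, mean, basis):
    # Any relational distribution becomes a point in the SoPF.
    return (hist.ravel() - mean) @ basis.T

rng = np.random.default_rng(0)
train = [rng.random((8, 8)) for _ in range(6)]  # stand-in histograms
mean, basis, coords = build_sopf(train, k=3)
print(coords.shape)  # (6, 3)
```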
In the fourth stage, or testing stage, the relational distributions of a test sequence are projected onto
the SoPF to obtain their point coordinates. The sequence of point coordinates representing
relational distributions from a gait cycle traces out a path in the SoPF. We use the Euclidean
distance between the traces of two gait cycles as a similarity measure. This distance can be
time normalized to compute similarity between cycles of dissimilar gait (i.e. walking versus
jogging) or un-normalized distances can be used to measure similarity between cycles from
same gait (i.e. walking versus walking). Based on these similarity values, we compute
Figure 3. The process to compute similarity between two image sequences.
performance measures such as identification and verification rates. Fig. 3 shows the process
for computing the similarity measure between two image sequences of persons in motion.
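Assuming each trace is a sequence of SoPF coordinates (one per frame), the two distance variants described above can be sketched as:

```python
import numpy as np

def trace_distance(trace_a, trace_b):
    # Time-unnormalized distance: mean Euclidean distance between
    # corresponding points of two equal-length traces through the SoPF.
    a, b = np.asarray(trace_a, float), np.asarray(trace_b, float)
    return float(np.mean(np.linalg.norm(a - b, axis=1)))

def resample(trace, n):
    # Linearly resample a trace to n points so cycles of different
    # durations (e.g. walking versus jogging) become comparable.
    t = np.asarray(trace, float)
    src = np.linspace(0.0, len(t) - 1, n)
    idx = np.arange(len(t))
    return np.stack([np.interp(src, idx, t[:, d])
                     for d in range(t.shape[1])], axis=1)

def time_normalized_distance(trace_a, trace_b, n=32):
    return trace_distance(resample(trace_a, n), resample(trace_b, n))

# Two toy 2D traces of different lengths, offset by one unit
# in the second dimension.
print(time_normalized_distance([[0, 0], [2, 0]], [[0, 1], [1, 1], [2, 1]]))  # 1.0
```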
The organization of this dissertation is as follows. Overview of research done on gait
analysis from biomechanics and visual perception points of view, along with the state of
the art in gait-based human identification using computer vision techniques, is presented
in Chapter 2. Then we introduce our framework for motion analysis in Chapter 3, starting
with the relational distributions, the development of the concept of the Space of Probability
Functions (SoPF), and the method used to measure similarity between image sequences.
Chapter 4 contains a set of introductory experiments over a small dataset of three persons.
Chapter 5 describes our evaluation methodology. Human identification experiments from
walking, jogging and running gaits over a dataset of 10 persons are presented in Chapter 6.
Experiments with walking gait viewed at different
angles over a dataset of 20 persons are presented in Chapter 7. Chapter 8 introduces a
larger dataset containing 74 subjects, a baseline algorithm and a set of experiments that we
refer to as the gait challenge problem, which serves as a benchmark for human gait based
identification. We also present the performance of our system over this large dataset in
Chapter 9. Finally, we conclude with Chapter 10.
CHAPTER 2
RELATED WORK
2.1 Biomechanics of Human Gait
Biomechanics is defined as “the scientific study of the mechanics of biological and especially
muscular activity” by the Webster Collegiate Dictionary; in this context it is sometimes
referred to simply as “gait analysis.”
Gait analysis as a science started a long time ago. References to experiments done by
Aristotle (384–322 BC), Leonardo da Vinci (1452–1519) and others can be found in [17].
Photographic analysis started with Eadweard Muybridge in the 1870s, who analyzed horses
in motion and showed the gallop of a horse to be a four-beat gait. After his success with
analyzing horse motion, he started taking photos of other animals, including humans. Fig. 4
shows some photographs of a man walking at ordinary speed, which were digitized from [18].
The squared pattern in the background was used to measure displacement. These frames
were captured using a number of precisely time-synchronized cameras.
Research done on the mechanical aspects of human gait is multidisciplinary and can
include fields such as anatomy, physical therapy, prosthetics, orthopedics, rehabilitation,
ergonomics, physiology, and sports science. The predominant applications involve medical
purposes. Hip, knee and ankle movement, and flexion are typically considered in clinical
gait analysis to diagnose abnormalities. Sensors used to capture these features include
3D electromagnetic motion trackers, force platforms, electromyography, and visual markers
with video systems based on fast shutter speed CCD cameras in different configurations for
2D or 3D data capture.
Aminian et al. [19] present an ambulatory system for gait analysis that can segment
gait phases. It uses small sensors, called gyroscopes, to measure the velocity of angular
rotation. They attached the sensors to the shanks of subjects. The signals produced are
Figure 4. Samples of Eadweard Muybridge photographs from “The Humans in Motion.”
then processed using multi-resolution wavelet decomposition to enhance heel-strike and
toe-off negative peaks in these signals.
Pappas et al. [20] also use gyroscopes in their gait phase detection system (GPDS) to
segment gait cycles into heel off, swing, heel strike and stance phases. It is composed of one
gyroscope and three force sensitive resistors installed in a shoe sole and a portable signal
processing board. The system was tested in indoor and outdoor environments, showing
robustness to diverse walking conditions.
Huitema et al. [21] introduce a low cost ultrasonic motion analysis system for the mea-
surement of spatial and temporal gait parameters such as duration of stance and swing
phases, and step and stride lengths. They put a fixed ultrasonic transmitter on the floor
and installed receivers on the subject’s feet. They used heel strike and toe off signals to
measure the duration of stance and swing phases. According to them, walking speed is not
constant over a cycle for asymmetric gaits and this gait pathology can be captured by their
system.
Sadeghi et al. [22] present an extensive literature review with more than 160 references
and clarify the concepts of gait symmetry, gait asymmetry, limb dominance, and laterality.
They try to answer questions such as: do the lower limbs behave symmetrically in
able-bodied gait, and how does limb dominance affect the symmetry of lower limb behavior?
They review research work supporting both gait symmetry and gait asymmetry, and mention
that there are not enough studies showing the effects of limb dominance or laterality on gait
behavior. Their conclusion is that in most of the studies, symmetry is assumed to simplify
gait analysis and that asymmetry reflects a natural difference in the behavior of limbs,
which can be caused by limb dominance or laterality. More research was recommended to
support this hypothesis.
Ambrosio et al. [23] designed a method for the reconstruction of a 3D biomechanical
model of the human body from a single camera. They use a set of kinematic constraint
equations associated with the biomechanical model to solve the system of equations that
calculates the spatial position of each anatomical point. A minimization cost function is
used to select the optimum solution based on the smoothness of the reconstructed motion.
LaFiandra et al. [24] present experiments to determine the effects of carrying a backpack
on upper and lower body torque in the transverse plane while walking. They mention that
the counter-rotation of the upper and lower body is reduced when the subject is carrying a
load and suggest that the upper body torque increases. The purpose is to understand the
effects of carrying a load so as to reduce injuries.
Chau [25] [26] reviews several approaches for gait data analysis that include fuzzy sys-
tems, multivariate statistical techniques, fractal dynamics, neural networks and wavelet
methods. Chau considers high dimensionality, temporal dependence, correlation between
curves, and nonlinear relationships in gait data to be the main challenges for its analysis.
The review aims to inform researchers working on clinical interpretation about the abilities
and limitations of these techniques.
Figure 5. Five point light display frames of a human walking.
2.2 Visual Perception of Human Gait
Johansson [27] presented a method for isolating motion patterns, which is known as point
light displays. With this method he removed the interference of the body shape or aspect
with motion information. Light points were attached to body joints to produce images like
those in Fig. 5. By viewing isolated images it is hard to describe what they contain, but
when they are animated, it is easy to perceive and discriminate between different types of
motion such as walking, running, dancing, etc.
Cutting and Kozlowski [28] made use of Johansson’s method in experiments in which
subjects could recognize themselves and others, concluding that point-light displays provide
sufficient cues for identification. In a different experiment [29], they showed that men’s and
women’s gaits can also be differentiated using dynamic point-light displays.
Mather and Murdoch [30] also performed experiments to discriminate gender based on
human locomotion studies showing that males and females have different lateral body sway;
they mention that males swing their arms more than females but rotate their hips less.
They used markers on shoulders and hips to measure sway from frontal views. They claim
that their approach is more robust and has better performance than the one by Kozlowski
and Cutting [29]. In previous studies [31], Mather et al. also used dynamic motion cues
to demonstrate that observers can identify the direction in which walkers are moving from
just the motion of their extremities. In their experiments they removed the translatory
component of the motion displays and presented the observers only elliptical and oscillatory
components.
Beintema and Lappe [32] present experiments on subjects with brain lesions in motion
processing areas to show that even when they have severely impaired image motion per-
ception they can still perceive human figures from point light displays without local image
motion. Based on these studies they propose that image motion is not the basis for the
perception of biological motion, giving more importance to the dynamic evolution of the
body posture over time.
Neri et al. [33] make use of point light displays to investigate the ability of the visual
system to process biological motion over space and time. It is known that when more points
are added to the motion displays, an observer can perceive biological motion faster. They
conducted experiments in which subjects were asked to detect the presence of a walker and
the direction of walk in the presence of dynamic random noise. The points in the motion
displays appeared and disappeared over time. By adding more information over time they
found that the parts of the visual system that process biological motion are not so efficient
in constantly integrating the new information.
Pavlova et al. [34] investigated the effect of showing films backwards on the visual
perception of biological motion. They showed motion displays to a group of subjects in
forward direction and then in backward direction, which they call normal mode. Then
another group was exposed to motion displays in reverse mode. They found apparent-facing
effects in the perception of biological motion in both normal and reverse modes including
leftward and rightward motion.
Grossman et al. [35] studied functional magnetic resonance images to measure activ-
ity levels in different areas of the brain to determine which of them are directly involved
in the perception of biological motion. The area activated when viewing motion displays
was located in the superior temporal sulcus (STS). In subsequent studies, Grossman and
Blake [36] presented inverted motion displays to observers and found, by measuring the
activity levels of the brain regions dedicated to processing biological motion, that perception
of biological motion is dependent on orientation, supporting claims that inverted animations
are more difficult to perceive. Activity in the STS was higher with inverted displays than
with scrambled ones.
Grezes et al. [37] investigate the areas of the brain involved in the perception of rigid
and non-rigid motion. They measured activity levels in the different regions of the brain
by analyzing functional magnetic resonance images. They considered that a specific neural
network in the brain performed the perception of structure from motion. They found that
the left intraparietal cortex is involved in the perception of non-rigid biological motion in
addition to the STS.
A computational interpretation for visual perception of human movements is presented
by Hoffman and Flinchbaugh [38]. In another work, Flinchbaugh and Chandrasekaran [39]
present the theory of spatio-temporal aggregation where they explain the grouping processes
performed by the visual system when exposed to image sequences. This set of works started
the crucial link between psychological studies and construction of computer vision systems
studying human motion.
2.3 Human Gait Analysis Using Computer Vision Techniques
Recently, gait analysis has received renewed interest in computer vision. This body of work
includes methods that recognize gait types, such as walking, running, jogging, or climbing
[40] [41] [42], and methods that identify people from gait. We concentrate on the latter.
Bobick and Johnson [43] use static body and stride parameters as features for recogni-
tion, which are recovered from different view angles, in indoor and outdoor settings, on a
database of 20 persons using electromagnetic markers and 18 persons using video-based fea-
ture recovery. An expected confusion measure is used to evaluate the discrimination ability
of the set of parameters under these different conditions. In a parallel line of work from
the same research group, Tanawongsuwan and Bobick [44] use joint-angle trajectories of
lower-body parts, as captured with 3D electro-magnetic markers attached to the body. The
3D location measurements are projected onto the walking plane to compute the joint-angle
trajectories. Recognition is performed using the nearest neighbor algorithm on a database
of 150 sequences from 18 people.
Shakhnarovich et al. [45] compensate for viewpoint differences by adopting a view-
normalization approach to face and gait recognition. They first compute the visual hull using
images from four cameras, which is then used to produce canonical viewpoints. For gait
recognition, they generate virtual side views of the person and compute the silhouette based
on the inferred view. This silhouette is then divided into seven regions. The centroid, as-
pect ratio, and orientation of the fitted ellipses for each of the regions over time are used as
features. They use the nearest neighbor classifier, based on a diagonal covariance Gaussian
model of the features, for gait recognition. In a later work [46], they consider gait appearance
features for recognition and use a support-vector machine for gender classification.
Little and Boyd [9] describe the shape of motion with features derived from the moments
computed over the dense optical flow of image sequences. They construct sequences of
scalars from each flow and analyze them in the frequency domain. These scalars have the
same period but different phases. Recognition is performed based on the difference in phase
features between persons. In a more recent work, Boyd [47] uses phase-locked loops to
represent frequency and phase locking in the oscillations of human gait. Applying the video
phase-locked loop algorithm to each frame in a sequence produces a phasor containing phase
and angle information in complex form. Procrustes shape analysis is adapted to measure
the similarity between vectors of phasors from different video sequences.
Hayfron-Acquah et al. [48] base recognition on motion symmetry, as measured by a
generalized symmetry operator applied to edge maps of silhouettes. Gait recognition is per-
formed using k-nearest neighbors based on the Fourier features computed from symmetry
measurements. Another line of attack by the same research group [49] adopts a statistical
approach based on velocity moments, which are an extension of centralized moments. Veloc-
ity moments up to order four are computed over temporal templates from image sequences
of people walking. Clustering the velocity moment values achieves classification.
BenAbdelkader et al. [50] [51] introduced the concept of eigengaits where principal
component analysis (PCA) is applied to similarity plots, described in [2], to map them
to a lower dimensional space with good data separability. Similarity maps capture the
variation in image similarity over time; for periodic motion these maps are also periodic. In
a different approach [52], this group uses stride length and cadence to differentiate between
persons.
Kale et al. [53] use continuous hidden Markov models (HMMs) trained to classify feature
vectors generated from gait sequences by computing Euclidean distances between images
from a set of five key frames over one gait cycle. They claim that these feature vectors
compactly capture the structural and transitional characteristics that are unique to each
person. However, they need several gait cycles from each person to successfully train the
HMMs.
Collins et al. [54] present a method for human identification based on body shape and
gait. This method performs template matching of body silhouette images from frontal-
parallel viewpoints. Nearest-neighbor classification is performed over normalized correlation
scores from training and testing silhouette images.
Table 1 summarizes the salient aspects in which the present work differs from the state
of the art in gait-based recognition. The table lists the basic technology used, the data size
in terms of number of persons, the extent of dependence on segmentation quality, and the
need for part-level tracking. The statements regarding the dependence of an approach on
segmentation quality reflect our experience and opinions with low-level vision algorithms.
The contributions of the present work lie in that it does not require part-level tracking
and, as we show later, it is robust with respect to segmentation errors. The database size
is also competitive with the present state of the art.
Table 1. Summary of recent research on gait-based recognition using computer vision techniques. Database size is expressed as the number of subjects and includes acquisition conditions (“I” for indoor and “O” for outdoor), and the best recognition rate reported.
Work | Basic technology | Subjects, conditions, recognition rate | Segmentation needed and dependence on its quality | Part-level tracking
Bobick and Johnson [43] | Static body parameters, stride length | 18, I, none; 15, O, none | Silhouette divided into 10 sections; dependent on quality of segmentation | Yes (head, pelvis, feet)
Boyd [47] | Video phase-locked loops | 2 real and 2 synthetic, none | Bounding box from hip down | Yes (hip, legs)
Collins et al. [54] | Body shape and gait | 25, I, 100%; 24, I, 100%; 55, O, 87%; 28, I, 93% | Template matching of silhouettes; strong dependence | None
Lee and Grimson [46] | Gait appearance features | 24, I, 100%; 25, I, 99.7% | Silhouettes divided into 7 regions; strong dependence | None
Little and Boyd [9] | Statistical measures from optical flow | 6, O, 92.2% | Optical flow computation; strongly dependent on illumination changes | None
Kale et al. [53] | Continuous HMMs | 5, O, none; 25, I, 72%; 43, O, 56% | Width vectors from 5 silhouette images; not strongly dependent | None
Hayfron-Acquah et al. [48] | Symmetry analysis | 4, I, 100%; 6, I, 97.6% | Edge maps of silhouettes; strongly dependent | None
BenAbdelkader et al. [50] [51] | Eigengaits: PCA over self-similarity plots | 6, O, 93%; 44, O, 77%; 7, I, 65%; 25, I, 76% | Image templates from silhouettes; not strongly dependent | None
Shutler et al. [49] | Temporal moments | 4, I, none | Temporal templates from silhouettes; dependent | None
This work | Non-stationarity in feature relation statistics | 3, I, none; 10, O, 100%; 20, O, 80%; 74, O, 90% | Edges in motion using silhouettes as masks; weakly dependent | None
CHAPTER 3
MOTION MODELING: THEORY
In this chapter we describe the statistical model for motion analysis developed in this
dissertation, starting with the definition of the concept of Relational Distributions, followed
by the theoretical description of the Space of Probability Functions (SoPF), and ending with
the method to compute similarity between traces in the SoPF.
3.1 Relational Distributions
We view an image as an assemblage of low-level features, such as edge pixels, corners,
straight lines, or region patches. The structure perceived in an image is determined more
by the relationships among features than by the individual feature attributes. Our goal is to
devise a mechanism to capture this structure so that we can use its evolution with time to
model high-level motion patterns. Graphs and hyper-graphs have been the most commonly
used mechanism for capturing these relationships among features [55] [56] [57] [58]. However,
the study of variation of a graph over time requires solving the correspondence problem
between features, which is a computationally difficult problem. We avoid this need for
feature-level correspondence by focusing on the statistical distribution of the relational
attributes observed in the image.
Definition 1
Let
(a) F = {f1, · · · , fN} represent the set of N features in an image,
(b) Fk represent a k-tuple of features randomly picked, and
(c) the relationship among these k-tuple features be denoted by Rk.
Figure 6. An empirical sampling-based interpretation of relational distributions.
Thus, the 2-ary relationship between features, which is the most commonly used form,
will be denoted by R2. Notice that low-order spatial dependence is captured by small values
of k and higher-order spatial dependences by larger values of k. In a set of N primitive
features there are C(N, k) = N!/(k!(N − k)!) possible k-tuples.
Definition 2
Let the relationships, Rk, be characterized by a set of M attributes Ak = {Ak1, . . . , AkM}.
Then the shape of the object can be represented by the joint probability function P(Ak = ak),
also denoted by P(ak1, . . . , akM) or P(ak), where aki is the (in practice, discretized) value
taken by the relational attribute Aki.
We term these probabilities the Relational Distributions. Fig. 6 contains a graphical
interpretation of this concept: given an image, if we randomly pick a k-tuple of features,
what is the probability that it will exhibit the relational attributes ak, i.e., what is P(Ak = ak)?
The relational distributions can be represented in parametric form or in non-parametric,
histogram (bin-based) form. The advantage of parametric forms, such as mixtures of
Gaussians, is their low representational overhead. However, we have noticed that these
relational distributions exhibit complicated shapes that do not readily afford modeling with
a combination of simply shaped distributions, so we adopt the non-parametric, histogram-based
form. To reduce the size associated with a histogram-based representation, we propose the
Space of Probability Functions, which is described in Section 3.2.
But before that, we look at a concrete example of a relational distribution.
Figure 7. Detection of edges in motion using background subtraction. A sample frame from an outdoor walking sequence is shown in (a), the background-subtracted image in (b), and the corresponding edges in motion in (c).
3.1.1 Moving Edge Based Features
We illustrate the concept of Relational Distributions using moving edge pixels as low-level
features. Other feature types, such as the neurally inspired keys [59] or those based on
Gaussian derivatives [14], will be the subject of future studies. We consider moving edge
pixels the most likely to belong to moving objects. One of the methods we use to identify
these edge pixels in motion is as follows: we apply the Canny edge detector to each image
frame and select only those edge pixels that fall inside, or within a small distance of, a
motion mask created either by frame differencing or by background subtraction. Fig. 7
shows the edges selected using masks created from background subtraction. Fig. 8 shows
an example of a different method for detecting moving features, using frame differencing
with a liberal threshold.
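As an illustration, the frame-differencing variant can be sketched as follows. This is a minimal NumPy-only sketch, not the dissertation's implementation: the function name and thresholds are hypothetical choices, and a simple gradient-magnitude test stands in for the Canny detector.

```python
import numpy as np

def moving_edge_pixels(prev, curr, diff_thresh=20, grad_thresh=30, halo=2):
    """Select edge pixels of `curr` that fall on, or within a small distance
    of, a motion mask obtained by frame differencing with a liberal threshold.
    A gradient-magnitude test stands in for the Canny detector here."""
    curr = curr.astype(float)
    # Motion mask: liberal threshold on the absolute frame difference.
    mask = np.abs(curr - prev.astype(float)) > diff_thresh
    # Grow the mask by `halo` pixels so nearby edge pixels also survive.
    for _ in range(halo):
        grown = mask.copy()
        grown[1:, :] |= mask[:-1, :]; grown[:-1, :] |= mask[1:, :]
        grown[:, 1:] |= mask[:, :-1]; grown[:, :-1] |= mask[:, 1:]
        mask = grown
    # Central-difference gradients give edge strength and direction.
    gy, gx = np.gradient(curr)
    mag = np.hypot(gx, gy)
    edges = (mag > grad_thresh) & mask
    ys, xs = np.nonzero(edges)
    theta = np.arctan2(gy[ys, xs], gx[ys, xs])  # gradient direction per pixel
    return np.column_stack([xs, ys]), theta
```

The returned pixel coordinates and gradient directions are exactly the ingredients needed for the relational distributions described next.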
Each edge pixel in motion, fi, is associated with the gradient direction, θi, at that point,
which is estimated using the Gaussian smoothed gradient that is computed by the Canny
Figure 8. Detection of edges in motion using frame differencing. Two consecutive frames from a running sequence are shown in (a) and (b), the thresholded difference image in (c), and the segmented edges in motion in (d).
edge detector. To capture the structure between edge pixels, we use the distance between
the two edge pixels and the difference in edge orientations as the attributes {A21, A22} of
R2. We normalize the distance between the pixels by a distance (D), which is related to the
size of the object in the image, to make it somewhat scale invariant. In the next section,
we discuss how we choose this scaling constant D. Note that our choice of attributes is
such that the probability representation is invariant with respect to image-plane rotation
and translation, and approximately invariant with respect to scale changes. Fig. 9(a) depicts the attributes
that are computed between the two pixels. Fig. 9(c) shows P (a21, a22) for the edge image
shown in Fig. 9(b), where high probabilities are shown as brighter pixels. Fig. 9(d) shows
a 3D bar plot of the probability values. Note the concentration of high values in certain
regions of the probability event space.
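The 2-ary relational distribution P(d/D, θ) just described can be estimated by randomly sampling pairs of edge pixels and histogramming their normalized distance and orientation difference. A sketch, in which the function name, bin count, and number of sampled pairs are illustrative choices rather than the dissertation's settings:

```python
import numpy as np

def relational_distribution(points, theta, D, bins=30, n_pairs=20000, seed=0):
    """Estimate the 2-ary relational distribution P(d/D, dtheta) by random
    sampling of pairs of edge pixels (histogram-based, non-parametric form)."""
    rng = np.random.default_rng(seed)
    n = len(points)
    i = rng.integers(0, n, n_pairs)
    j = rng.integers(0, n, n_pairs)
    keep = i != j                                   # drop degenerate pairs
    d = np.linalg.norm(points[i[keep]] - points[j[keep]], axis=1) / D
    dth = np.abs(theta[i[keep]] - theta[j[keep]]) % np.pi  # orientation diff.
    hist, _, _ = np.histogram2d(d, dth, bins=bins, range=[[0, 1], [0, np.pi]])
    return hist / hist.sum()                        # normalize to probabilities
```

With a scaling constant D larger than the object extent, every sampled pair lands in the histogram and the result sums to one, as a probability function should.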
To capture the relational distribution over triples of edge pixels, P (a31, a32, a33, a34),
we can use four attributes, as illustrated in Fig. 10(a). Since the pairwise distances in a
triplet are not independent of each other, attributes over all pairs would not constitute
an independent set. To arrive at an independent set of relations, we consider the pairs
of pixels connected by the maximum-distance spanning tree over them, which for
Fig. 10(a) are (1, 2) and (1, 3). The four attributes characterizing
Figure 9. Edge pixel based 2-ary relational distribution. (a) The two attributes characterizing the relationship between two edge pixels. (b) Moving edge pixels in an image. (c) The relational distribution P(d/D, θ), where D is a scaling constant; P(0, 0) is the top left corner of the image, and brighter pixels denote higher probabilities. (d) The relational distribution shown as a 3D bar plot.
Figure 10. Edge pixel based 3-ary relational distribution. (a) The four attributes characterizing the relationship among three edge pixels. (b) The four-dimensional relational distribution P(d12/D, d13/D, θ12, θ13) visualized as a 2D image for the edge image in Fig. 9(b). The rows correspond to the row-scanned version of the (d12/D, d13/D) subspaces; the columns correspond to the row-scanned version of the (θ12, θ13) subspaces. Only non-zero rows are shown. P(0, 0, 0, 0) is the top left corner of the image.
the relationship among three edge pixels are shown in Fig. 10(a). The resulting four-
dimensional relational distribution P(d12/D, d13/D, θ12, θ13) is visualized as a 2D image
in Fig. 10(b).
3.1.2 Scaling Constant D
We use the scaling constant D to normalize the distance between edge features and make
them invariant with respect to scale changes. The value of the scaling constant D can be
chosen in a number of ways. If, for example, we knew that the object under consideration
occupied most of the image, then we could use image dimensions, such as the image diagonal,
as the scaling constant. We did this for the treadmill sequences and the gait challenge data
where the silhouettes are normalized (Fig. 8). A second, more involved strategy, which we
use for all other sequences, is as follows. First, we obtain the height of the binary silhouette
from each frame. Since estimates of heights of persons from images may be noisy due to
movement, segmentation errors or perspective effects, we obtain a smoothed estimate by
fitting a straight line to the height curve as a function of time. Fig. 11 shows the variation
of this estimated D with time for three motion trajectories. Note that, for frontal-parallel
motion (Fig. 11(a)), D is more or less a constant. As the angle of the motion trajectory
with respect to the image plane increases (Figs. 11(b) and (c)), D changes linearly with
time, accounting for the change in size of the projected image.
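The height-based estimate of D can be sketched as a least-squares line fit over the per-frame silhouette heights (the function name is an assumption for illustration):

```python
import numpy as np

def scaling_constant(heights):
    """Smooth noisy per-frame silhouette heights by fitting a straight line
    h(t) = a*t + b; the fitted value at each frame is used as D."""
    t = np.arange(len(heights))
    a, b = np.polyfit(t, heights, deg=1)  # least-squares line fit
    return a * t + b                      # smoothed estimate of D per frame
```

A linear model suffices here because, for walking at a fixed angle to the image plane, the projected size changes approximately linearly with time, as Fig. 11 illustrates.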
3.2 Space of Probability Functions
As the parts of an articulated object move, the relational distributions will change. Motion
will introduce non-stationarity in the relational distributions. Fig. 12 shows some examples
of 2-ary relational distributions for some leg configurations. Notice how the modes of the
probability functions, which are the bright regions in the images, change with leg motion.
Is it possible to infer the nature of articulated motion by quantifying the evolution of the
nature of these non-stationarities? Is it possible not only to make gross judgments about
the nature of motion, such as distinguishing periodic from non-periodic motion, but also
to establish the identity of the person in motion? In order to enable us to answer
these questions in the affirmative, we first set up a more compact representation for these
relational distributions that is easier to manipulate and is more parsimonious than just
plain histograms.
Definition 3
Let P (ak, t) represent the relational distribution at time t.
Definition 4
Let

√P(ak, t) = Σ_{i=1}^{n} ci(t) Φi(ak) + µ(ak) + η(ak)    (1)

describe the square root of each relational distribution as a linear combination of orthonormal
basis functions Φi(ak), where µ(ak) is a mean function defined over the attribute space and
η(ak) is a function capturing small random noise variations with zero mean and small
variance. We refer to this space as the Space of Probability Functions (SoPF).
Figure 11. Fitting of a line through the height curve generated from one walking cycle of motion at (a) 0° (frontal-parallel), (b) 22.5°, and (c) 45° with respect to the image plane, to determine the scaling constant D.
Figure 12. Some configurations of legs in motion in (a), (c), and (e), with their corresponding 2-ary relational distributions in (b), (d), and (f).
Given a set of relational distributions, {P (ak, ti)|i = 1, · · · , T}, the SoPF can be arrived
at by using the Karhunen-Loeve transform or, for the discrete case, by principal component
analysis (PCA). The dimensions of the SoPF are given by the eigenvectors of the covariance
of the square root of the given relational distributions. The variance along each dimension is
proportional to the eigenvalues associated with it. In practice, we can consider the subspace
spanned by a few (N << n) dominant eigenvectors associated with the largest eigenvalues.
We have found that for human motion just N = 10 eigenvectors are sufficient. Thus, a
relational distribution can be represented using these N coordinates, ci(t), which is a more
compact representation than a normalized histogram.
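Construction of the SoPF, i.e., PCA over the square roots of the relational distributions, can be sketched as follows. The SVD route and the helper names are implementation choices for illustration, not the dissertation's code:

```python
import numpy as np

def build_sopf(distributions, n_dims=10):
    """Build the Space of Probability Functions: PCA over the square roots
    of the relational distributions (each flattened to a vector)."""
    X = np.sqrt(np.array([p.ravel() for p in distributions]))
    mu = X.mean(axis=0)                    # the mean function mu(a_k)
    Xc = X - mu
    # Eigenvectors of the covariance via SVD of the centered data.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    basis = Vt[:n_dims]                    # dominant orthonormal directions
    coords = Xc @ basis.T                  # c_i(t): the SoPF coordinates
    eigvals = (s ** 2) / len(X)            # variance along each direction
    return mu, basis, coords, eigvals[:n_dims]

def project(p, mu, basis):
    """SoPF coordinates of a new relational distribution."""
    return (np.sqrt(p.ravel()) - mu) @ basis.T
```

Each frame of a motion sequence then becomes a single point of N coordinates, and a whole sequence becomes a trace through the SoPF.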
Note that this use of the PCA is different from other uses of this technique in motion
tracking. For example, Black and Jepson [7] also used PCA but in the context of tracking
and matching moving objects. The representation is also different because they use PCA
over the image pixel space whereas we use it over relational probability functions.
We use the square root function so that we arrive at a space where distances are related
to the Bhattacharyya distance between the relational distributions, as we prove in the next
two theorems.
Theorem 1

The Euclidean distance between the square roots of two relational distributions,
dE(√P(ak, t1), √P(ak, t2)), is monotonically related to the Bhattacharyya distance between
the relational distributions, dB(P(ak, t1), P(ak, t2)), as captured by

dE(√P(ak, t1), √P(ak, t2)) = 2 − 2 e^{−dB(P(ak, t1), P(ak, t2))}

Proof: The proof uses the facts that the probabilities sum to one and that the Bhattacharyya
distance between two probability functions, P1(x) and P2(x), is given by −ln Σ_x √(P1(x) P2(x)).

dE(√P(ak, t1), √P(ak, t2)) = Σ_{ak} (√P(ak, t1) − √P(ak, t2))²
    = Σ_{ak} P(ak, t1) + Σ_{ak} P(ak, t2) − 2 Σ_{ak} √(P(ak, t1) P(ak, t2))
    = 2 − 2 e^{−dB(P(ak, t1), P(ak, t2))}    (2)
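The relation in Theorem 1 can be checked numerically on arbitrary discrete distributions, with d_E denoting the squared Euclidean distance between the square roots, as in the proof:

```python
import numpy as np

# Numeric check of Theorem 1: for two discrete probability functions,
# sum((sqrt(P1) - sqrt(P2))**2) equals 2 - 2*exp(-d_B(P1, P2)).
rng = np.random.default_rng(1)
P1 = rng.random(100); P1 /= P1.sum()
P2 = rng.random(100); P2 /= P2.sum()

d_E = np.sum((np.sqrt(P1) - np.sqrt(P2)) ** 2)   # squared Euclidean distance
d_B = -np.log(np.sum(np.sqrt(P1 * P2)))          # Bhattacharyya distance
assert np.isclose(d_E, 2 - 2 * np.exp(-d_B))
```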
Theorem 2
In the SoPF representation, the Euclidean distance between the coordinates, {ci(t1)} and
{ci(t2)}, is monotonically related to the Bhattacharyya distance between the corresponding
relational distributions P (ak, t1) and P (ak, t2).
Proof: The square roots of the relational distributions, P(ak, t1) and P(ak, t2), can be
approximately represented using the SoPF coordinates as follows; the error in the approximation
is the energy of the eigenvectors ignored during SoPF construction.

√P(ak, t1) ≈ Σ_{i=1}^{N} ci(t1) Φi(ak) + µ(ak)    (3)

Similarly,

√P(ak, t2) ≈ Σ_{i=1}^{N} ci(t2) Φi(ak) + µ(ak)    (4)

The Euclidean distance between them can be expressed in terms of the distance between
the coordinates as follows, using the fact that the dimensions of the SoPF, Φi(ak), are
orthonormal.

dE(√P(ak, t1), √P(ak, t2)) = Σ_{ak} (√P(ak, t1) − √P(ak, t2))²
    = Σ_{ak} (Σ_{i=1}^{N} ci(t1) Φi(ak) − Σ_{j=1}^{N} cj(t2) Φj(ak))²
    = Σ_{ak} (Σ_{i=1}^{N} (ci(t1) − ci(t2)) Φi(ak))²
    = Σ_{ak} Σ_{ij} (ci(t1) − ci(t2)) (cj(t1) − cj(t2)) Φi(ak) Φj(ak)
    = Σ_{ij} (ci(t1) − ci(t2)) (cj(t1) − cj(t2)) Σ_{ak} Φi(ak) Φj(ak)
    = Σ_{i} (ci(t1) − ci(t2))²    (5)

Using this result and Theorem 1, we can write

Σ_{i} (ci(t1) − ci(t2))² = 2 (1 − e^{−dB(P(ak, t1), P(ak, t2))})
3.3 Similarity Measures
Articulated motion sweeps a path or trace through the SoPF. Distances between SoPF
traces can quantify differences in motions. There are various sophisticated techniques such
as those based on hidden Markov models, dynamic Bayesian networks, and state space
trajectories [60] that can be used to model the trajectories. In this work, however, we adopt
a simpler distance measure between two traces to demonstrate the viability of using the
traced paths for discriminating between motion types and for inferring personal identity. We
show in later chapters that even with a simple distance measure we are able to obtain good
discrimination. We define two versions of this distance measure: (i) time un-normalized
and (ii) time normalized.
3.3.1 Time Un-normalized Distance
The time un-normalized distance between two SoPF traces is defined as the average Eu-
clidean distance between the two traces, {c1(ti), i = 1 · · · n} and {c2(ti), i = 1 · · · n}. To
compute this distance, we align the two traces with respect to one time instant from the
two traces, i.e. find shift K such that ||c1(tk) − c2(tk + K)|| is minimum for some tk. If
the number of frames of the sequences being compared is different (m < n), then the dis-
tance is computed over the minimum number of frames, m. Mathematically, this distance
is expressed as
dun-norm(c1, c2) = (1/m) Σ_{ti=1}^{m} Σ_{j=1}^{N} (c1_j(ti) − c2_j(ti + K))²    (6)
This measure is good for comparing motion of the same type and similar speed, for example,
in comparing the gait traces from different persons who are known to be walking.
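A sketch of this time un-normalized distance follows. For simplicity, the shift K is found by brute force over all offsets, minimizing the whole-trace distance (the dissertation aligns at a single time instant), and the shorter trace is taken as c1; these are illustrative simplifications.

```python
import numpy as np

def d_unnorm(c1, c2):
    """Time un-normalized distance between two SoPF traces of shape
    (frames, N). The shorter trace is slid over the longer one and the
    alignment shift K with the smallest average squared coordinate
    distance is kept."""
    if len(c1) > len(c2):
        c1, c2 = c2, c1                    # make c1 the shorter trace
    m = len(c1)
    best = None
    for K in range(len(c2) - m + 1):       # try every alignment shift
        d = np.mean(np.sum((c1 - c2[K:K + m]) ** 2, axis=1))
        if best is None or d < best:
            best = d
    return best
```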
3.3.2 Time Normalized Distance
If the speed of motion is not controlled, for example when we are comparing traces from
walking and running gaits from the same person, it is desirable to normalize the two traces,
{c1(ti), i = 1 · · ·m} and {c2(ti), i = 1 · · · n}, with respect to time. We adopt a strategy
similar to dynamic time warping used in speech recognition, except that we allow only
for constant stretching or contraction. We estimate this constant warping factor by first
establishing two alignment points on the two traces. Without loss of generality, let us
assume that the first trace has fewer samples than the second one, that is m ≤ n. The
distance between these traces is computed by first constructing a continuous curve, C1(ti)
from the first trace by assuming linear interpolation between the coordinate points. Next
we stretch this curve such that the first and the last coordinates match with the second
trace, i.e., C1((m/n) ti). Then we compute the distance between the second trace coordinate
points and the stretched curve.
dnorm(c1, c2) = (1/n) Σ_{ti=1}^{n} Σ_{j=1}^{N} (c2_j(ti) − C1_j((m/n) ti))²    (7)
The warped distance measure responds to changes in shapes of the traces over each motion
cycle but does not change with the speed with which each cycle is executed. Thus, the
distance between a fast walk and a slow walk would tend to be small as compared to the
distance between a walk and a run cycle.
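The time normalized distance can be sketched with linear interpolation. One small implementation choice here: mapping frame indices with (m − 1)/(n − 1) rather than m/n makes the endpoints of the two discrete traces coincide exactly.

```python
import numpy as np

def d_norm(c1, c2):
    """Time normalized distance between SoPF traces c1 (m frames) and c2
    (n frames, m <= n): linearly interpolate c1, stretch it so the endpoints
    line up with c2, and average the squared distance at c2's frames."""
    m, n, N = len(c1), len(c2), c1.shape[1]
    t2 = np.arange(n)
    # Sample the stretched curve C1 at c2's time instants, one coordinate
    # dimension at a time.
    stretched = np.column_stack(
        [np.interp(t2 * (m - 1) / (n - 1), np.arange(m), c1[:, j])
         for j in range(N)])
    return np.mean(np.sum((c2 - stretched) ** 2, axis=1))
```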
3.3.3 Similarity Measure Based on Multiple Gait Cycles
When we have sequences containing multiple gait cycles, as is the case for the gait challenge
dataset, we formulate the problem of computing a similarity measure as follows. Let
the two image sequences to be compared be denoted by S1 = {S1(1), · · · , S1(M)} and
S2 = {S2(1), · · · ,S2(N)}. We partition S1 into disjoint subsequences of NS1 contiguous
frames each, such that each subsequence contains roughly one cycle. Let the k-th subse-
quence from S1 be denoted by S1k = {S1(k), · · · ,S1(k + NS1)}. We then compare each of
these subsequences with S2:
Corr(S1k, S2)(l) = Σ_{j=1}^{NS1} d(S1(k + j), S2(l + j))    (8)
The distance between two frames is, in our case, the Euclidean distance between their
corresponding points in the SoPF. The similarity is chosen to be the median value of the
distance of S2 with each of the S1 subsequences, as illustrated in Fig. 13.
Similarity(S1, S2) = Median_k ( max_l Corr(S1k, S2)(l) )    (9)
This method of computing the similarity between two sequences is robust with respect
to noise that distorts the motion information in a small set of contiguous frames.
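A sketch of the multi-cycle similarity follows. Because the per-frame measure here is a distance, the sketch keeps the smallest summed distance over placements l before taking the median over subsequences; Eq. (9) writes the extremum over its correlation measure. Function and parameter names are illustrative.

```python
import numpy as np

def similarity(S1, S2, cycle_len):
    """Similarity between two sequences of SoPF points (frames x N).
    S1 is cut into disjoint subsequences of roughly one gait cycle each;
    each is slid over S2 and its best (smallest summed-distance) placement
    kept; the median over subsequences gives the similarity score."""
    scores = []
    for k in range(0, len(S1) - cycle_len + 1, cycle_len):
        sub = S1[k:k + cycle_len]
        dists = [np.sum(np.linalg.norm(sub - S2[l:l + cycle_len], axis=1))
                 for l in range(len(S2) - cycle_len + 1)]
        scores.append(min(dists))      # best placement of this subsequence
    return float(np.median(scores))    # median makes the score noise-robust
```

Taking the median over subsequences is what gives the robustness noted above: a few frames corrupted by noise spoil at most one subsequence's score.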
Figure 13. Similarity measure between sequences with multiple gait cycles.
CHAPTER 4
INSIGHTS INTO THE SOPF REPRESENTATION THROUGH AN EXAMPLE
In this chapter, we present results on a small dataset of three persons performing three
types of motion, walking, jogging, and running, on a treadmill to illustrate and test various
aspects of the SoPF based representation. In the following chapters, we will present results
using larger and more complex datasets.
The data for the experiments described in this chapter and in Chapters 6 and 7 was
acquired with a Canon Optura digital video (DV) camera that has a single CCD and
performs progressive scans. Video was captured at 30 frames per second on DV tapes.
Then, it was downloaded via an IEEE 1394 interface to a Pinnacle micro DV500 video capture
board installed on a PC to produce Microsoft AVI files using Sony’s dvsd codec. The AVI
files were broken into frames in PPM format using the Sony decoder. Frames, which are
720×480 in size, were then cropped to an image subregion within which the subject appears
in all the frames. Two consecutive frames of a running person are shown in Fig. 14. The
size of each frame is 256 × 130.
The small size of the database allows us to explore the following questions, by considering
individual raw distances and not just aggregate performance measures.
Figure 14. Two consecutive frames from a running sequence.
Figure 15. Ten most dominant dimensions of SoPF for the treadmill sequences.
(a) For each person, can we discriminate between motion types?
(b) Can we discriminate between motion types across persons?
(c) Is it possible to identify persons based on walking, jogging, or running gaits?
(d) Is the SoPF representation robust with respect to segmentation errors?
(e) Is the SoPF representation stable with respect to scale variations?
(f) Why not just do a PCA of the raw edges?
To explore these questions, we used only the 2-ary relational distributions, P (d/D, θ),
to build the SoPF. One cycle of each motion type for each person forms the training set,
which is a total of 306 frames. The eigenvectors of the SoPF associated with the 10 largest
eigenvalues are shown in Fig. 15 as gray level images with their corresponding eigenvalues
quantifying the associated variation shown below each image. Each relational distribution,
and hence each eigenvector image, is 30 × 30 cells in size. The vertical axes of the images plot
the distance attribute, d/D, and the angle θ is along the horizontal axes. From the banded
pattern in the two most dominant eigenvectors, we can see that they emphasize differences
in the distance attribute between two features. Differences in orientation are emphasized
by the other eigenvectors.
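The SoPF construction just described, an eigen-decomposition over the row-scanned 30 × 30 relational distributions, might be sketched as follows; the function names and the SVD route to the eigenvectors are our own choices, not code from this dissertation:

```python
import numpy as np

def build_sopf(distributions, n_dims=10):
    """PCA over row-scanned relational distributions (histograms).
    `distributions`: array of shape (n_frames, 30, 30); layout assumed."""
    X = distributions.reshape(len(distributions), -1)   # 900-d vectors
    mean = X.mean(axis=0)
    Xc = X - mean
    # eigen-decomposition of the covariance via SVD of the centered data
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = (S ** 2) / len(X)        # sorted in decreasing order
    eigvecs = Vt[:n_dims]              # the most dominant dimensions
    return mean, eigvecs, eigvals

def project(distribution, mean, eigvecs):
    """Coordinates c_i(t) of one frame's relational distribution."""
    return eigvecs @ (distribution.ravel() - mean)
```

Reshaping each of the 10 rows of `eigvecs` back to 30 × 30 would correspond to the gray level images of Fig. 15.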
Fig. 16 shows the sorted eigenvalues for the 2-ary relational distributions. Notice that
most of the energy of the variation in the relational distributions is captured by the few
large eigenvalues.

Figure 16. Eigenvalues associated with the SoPF of 2-ary relational distributions.

For the results in our experiments, we used the eigenvectors associated
with the 10 largest eigenvalues, which are sufficient. This number of dimensions is only a
small fraction of the 900 entries in the 30 × 30 histogram representation of the relational
distributions.
Fig. 17(a) shows the variation of the coordinate, c1(t), associated with the most domi-
nant eigenvector for each of the three persons and for each motion type. Each plot shows
the variation over three motion cycles, overlaid on each other. We can make the following
observations from the figures. First, the differences in the nature of the variation for any
person and a motion type over different cycles are small. Second, c1(t) captures mostly the
periodic nature of the variation; the variation in this dimension between motion types and
between persons is small. For the first and the third persons, the second peak is smaller
than the first one in the walking traces. Also, the amplitude of variation for jogging motion
of the third person is lower than that of the other two. Since the eigenvector associated with this
coordinate emphasizes variation in distance between features (see Fig. 15) and the maximum
distance change is for features from the foot, the amplitude of variation of c1(t) seems to
be related to the stride lengths. In other words, the third person’s jogging stride is shorter
Table 2. Distance between the traces through the SoPF of two different cycles of motion for the three persons and three motion types dataset. (Walking (W), Jogging (J), and Running (R)).

Trace Distances (10^-3)
                      Person 1               Person 2               Person 3
                 W      J      R        W      J      R        W      J      R
Person 1   W   1.23  30.74   4.41    42.12  52.54  10.99    67.11  39.86  14.91
           J  22.22   2.73  30.40    17.30  17.64  15.60    30.96  25.00  18.92
           R  10.65  43.82   4.40    56.66  64.57  18.39    73.35  40.20  15.57
Person 2   W  49.86  13.05  57.43     2.11   4.28  21.72    15.89  24.79  33.60
           J  41.57  10.14  49.38     8.04   6.67  16.97    22.05  19.65  28.49
           R  26.87  31.92  24.67    34.19  37.01  13.17    55.94  22.53  16.32
Person 3   W  61.94  25.42  63.34    15.00  15.74  34.26     3.66  18.93  31.34
           J  30.63  18.67  31.11    21.98  21.31  17.18    23.50   3.88  10.07
           R  16.16  27.87  13.22    32.41  38.00   8.61    41.56  14.54   3.18
than the other two. Another aspect worth pointing out is that the plots for running tend
to be more sharply peaked than those for the other two motion types.
Fig. 17(b) plots the variation of the second coordinate c2(t), which has larger variation
among different types of motion and persons. Differences in walking style of the second
person from the first and the third show up in the nature of the variation. The running
style of each person is different from the other two, which is also evident from the plots.
The matrix containing the time normalized distances (dnorm) is shown in Table 2. The
minimum value in each row is highlighted. Notice that the diagonal entries are lower than
the off-diagonal ones, which indicates good discrimination. The distance matrix can be
partitioned into sub-matrices that provide insights into the kinds of discrimination that are
possible. These we consider next.
4.1 Can We Discriminate Between Motion Types Across Persons?
To test whether we can reliably distinguish between walking, jogging, and running across
persons, we grouped the data into three classes, each representing one motion type and
containing SoPF traces from all persons. We compute the intra-class and inter-class dis-
tances, whose mean values are listed in the first row of Table 3 along with the variances of
(a)
(b)
Figure 17. Variation of (a) c1(t) and (b) c2(t) within each motion cycle for each of the three persons and motion types.
these estimates. The mean inter-class distance (30.52 × 10^-3) is almost double the mean
intra-class distance (15.82 × 10^-3). As we see next, this discrimination between motion
types, irrespective of person identity, is lower than on a per-person basis.
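The intra-/inter-class summary statistics reported in Table 3 can be computed from a pairwise trace-distance matrix roughly as below; the function and dictionary names are illustrative, and the second value returned per class is the variance of the mean estimate (the squared σµ of the tables):

```python
import numpy as np

def class_stats(dist, labels):
    """Mean intra-class and inter-class distances, plus the variance of
    each mean estimate, from a pairwise trace-distance matrix.
    `labels[i]` is the class (e.g. motion type) of trace i."""
    labels = np.asarray(labels)
    n = len(labels)
    intra, inter = [], []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # skip the trivial zero self-distance
            (intra if labels[i] == labels[j] else inter).append(dist[i][j])
    stats = {}
    for name, vals in (("intra", intra), ("inter", inter)):
        vals = np.asarray(vals, float)
        stats[name] = (vals.mean(), vals.var(ddof=1) / len(vals))
    return stats
```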
4.2 For Each Person, Can We Discriminate Between Motion Types?
To answer this, we consider the three by three sub-matrices along the diagonal of the
distance matrix in Table 2. For each person the distances between traces from the same
motion type form the intra-class distances and those between traces from different motion
types are the inter-class distances. The second, third, and fourth rows of Table 3 list the
mean distances for these two classes, along with estimates of their variances, for each of the
three persons. We see that the mean inter-class distances are about 4 to 20 times larger
than the mean intra-class distances. This indicates that, as expected, motion types from
each person can be easily discriminated.
4.3 Is Identifying Persons Based on Motion Gait Possible?
The next question we consider is the possibility of distinguishing persons based on SoPF
traces of different gaits. To study this, for each motion type, we formed three classes of
traces, one for each person. The inter- and intra-class mean distances between the traces
over a cycle of motion are listed in the last three rows of Table 3. The second and the
fourth columns list the mean distances. The third and fifth columns list the variances of
the respective mean estimates. As we can see, the inter-class mean distances are about 4 to 10
times larger than the intra-class mean distances. This seems to indicate a strong possibility
of discriminating between persons based on SoPF traces.
4.4 Is the SoPF Representation Robust with Respect to Segmentation Errors?
One of the claims is that our approach does not rely on perfect segmentation. Indeed, as
outlined before, the segmentation process used is a rather crude one that identifies motion
edges based on image differencing. The motion edges identified in such a manner contain a
number of edge pixels from the background as was seen in Fig. 8(d). Sometimes, as shown
Table 3. Summary statistics of distances between the traces through the SoPF for the three persons and three motion types dataset.

Trace Distance (10^-3)
Distinguishing                 Intra-Class          Inter-Class
                               µ       σµ           µ       σµ
Motion Types                   15.82   0.83         30.52   0.72
Motion Types of Person 1        2.42   0.39         41.38   2.75
Motion Types of Person 2        3.76   0.69         19.80   0.75
Motion Types of Person 3        4.78   1.46         15.45   0.70
Persons based on Walking        2.11   0.39         22.88   1.72
Persons based on Jogging        5.65   1.47         20.12   1.73
Persons based on Running        3.22   0.60         22.69   1.25
in Fig. 18(a) and (b), even significant edges are missed if they are too close to the motion
region boundary. Results presented so far were based on images that contained all these
artifacts.
We also conducted a controlled study, where we relaxed our thresholds for identifying
motion edges to include more edges. Fig. 18(c) and (d) show motion edges identified for
the frame shown in Fig. 8 for two different degrees of tolerance. More background edges
are included in Fig. 18(d) than in Fig. 18(c), which in turn includes more than Fig. 8(d). The pairwise
distances are shown in Tables 4 and 6. The minimum in each row is highlighted. The
various inter- and intra-class mean distances between traces for the two noisy segmentations are
listed in Tables 5 and 7. By comparing these distances with those listed in Table 3, we see
that, although the gap between the inter- and intra-class means decreases with increasing
segmentation noise, there is still enough discriminating power between the classes.
4.5 Is the SoPF Representation Stable with Respect to Scale Variations?
To show that the SoPF representation is scale invariant, we sub-sampled the frames of our
testing set to half their original size, keeping the training set, which was used to construct
the SoPF, at the original size. Table 8 lists the distances between two cycles from the
reduced size testing set using the SoPF constructed with original size images. Table 9
(a) (b)
(c) (d)
Figure 18. (a) and (b) show some typical frames where the segmentation process misses significant portions of the legs. (c) An under-segmented frame. (d) A more under-segmented frame. (Corresponding to that in Fig. 8).
Table 4. Distance between the traces through the SoPF of two different cycles of motion for the three persons and three motion types with moderate amount of segmentation noise (Walking (W), Jogging (J), and Running (R)).

Segmentation Study: Trace Distances (10^-3)
                Person 1               Person 2               Person 3
           W      J      R        W      J      R        W      J      R
P1   W   1.04  27.63   3.50    36.75  47.01   9.80    60.58  35.74  14.60
     J  20.33   2.41  26.79    13.99  15.56  12.56    27.52  21.83  16.13
     R   9.18  37.86   3.76    48.23  55.78  16.48    64.11  34.52  14.69
P2   W  45.60  11.44  51.22     1.94   3.93  18.42    13.93  22.59  28.54
     J  37.31   8.44  43.25     6.51   5.68  14.32    19.07  18.03  24.54
     R  23.94  25.88  22.11    27.44  30.02  11.60    47.82  18.90  14.27
P3   W  56.56  22.75  55.87    13.63  14.06  29.92     3.10  17.04  25.75
     J  28.56  17.58  27.96    19.69  19.69  15.82    20.53   3.12   8.39
     R  15.71  24.08  13.20    27.01  32.51   8.71    34.91  11.61   3.08
Table 5. Summary statistics of the distances between the traces through the SoPF for sequences with moderate amount of segmentation noise.

Segmentation Study: Trace Distance (10^-3)
Distinguishing                 Intra-Class          Inter-Class
                               µ       σµ           µ       σµ
Motion Types                   13.68   0.72         27.16   0.64
Motion Types of Person 1        2.12   0.36         37.32   2.46
Motion Types of Person 2        3.34   0.62         18.05   0.75
Motion Types of Person 3        4.23   1.31         14.30   0.58
Persons based on Walking        1.78   0.34         20.37   1.51
Persons based on Jogging        5.04   1.28         16.99   1.42
Persons based on Running        2.87   0.54         19.33   1.05
Table 6. Distance between the traces through the SoPF of two different cycles of motion for the three persons and three motion types with large amount of segmentation noise (Walking (W), Jogging (J), and Running (R)).

Segmentation Study: Trace Distances (10^-3)
                Person 1               Person 2               Person 3
           W      J      R        W      J      R        W      J      R
P1   W   1.17  23.67   2.74    28.41  37.20   8.59    49.56  29.75  14.41
     J  18.68   2.18  21.46    11.05  13.42  10.80    22.01  17.26  13.69
     R   8.24  30.28   3.46    37.66  43.76  15.02    51.40  28.83  14.50
P2   W  36.93   9.35  38.82     1.62   3.73  13.12    12.90  16.40  19.77
     J  30.11   6.65  32.61     4.75   5.04   9.91    16.99  14.86  18.40
     R  19.60  20.71  17.18    20.50  22.30   9.35    39.62  17.03  13.13
P3   W  46.47  17.83  42.70    11.24  12.85  23.22     3.07  11.19  16.24
     J  25.91  14.74  23.35    15.88  17.08  14.76    14.78   2.25   6.20
     R  15.42  20.34  11.81    20.86  25.66   9.40    27.39  10.04   4.22
Table 7. Summary statistics of the distances between the traces through the SoPF for sequences with large amount of segmentation noise.

Segmentation Study: Trace Distance (10^-3)
Distinguishing                 Intra-Class          Inter-Class
                               µ       σµ           µ       σµ
Motion Types                   10.74   0.56         22.16   0.49
Motion Types of Person 1        1.79   0.30         30.33   1.97
Motion Types of Person 2        2.91   0.53         15.17   0.64
Motion Types of Person 3        3.81   1.06         12.68   0.44
Persons based on Walking        1.60   0.31         17.04   1.29
Persons based on Jogging        4.31   1.05         12.92   1.07
Persons based on Running        2.60   0.47         14.15   0.73
shows the summary of the distances between different subsets. We can see that the distances
between traces are very similar to those shown in Table 3. This indicates that the relational
distribution representation has some amount of built-in scale invariance.
4.6 PCA of the Edge Images
One might ask: why not just do a PCA of the edge images instead of the relational
distributions of the edges? Our experience shows that the SoPF representation is more compact
than the PCA space of the raw edges themselves. Fig. 19 shows the plot of the eigenvalues
for both the edge-PCA and the SoPF spaces. Values were energy normalized. From this
plot it is obvious that the edge-PCA space is much less compact than the SoPF space; SoPF
can work with fewer dimensions than edge-PCA.
The computational complexity of the edge PCA is dependent on the image size used, and
hence dependent on the scale of the images, whereas the SoPF computational complexity is
dependent on the size of the relational distributions, which is a constant. In fact, we found
it difficult to allocate enough memory to compute the eigenvalues and eigenvectors directly
from the 256 × 130 edge images using a Sun Ultra 30 Creator workstation running at 246
MHz with 256 MB of RAM. On reduced sized images it took almost 34 hours to calculate
Table 8. Distance between the traces through the SoPF of two different half scaled cycles of motion for the three persons and three motion types (Walking (W), Jogging (J), and Running (R)).

Scale Study: Trace Distances (10^-3)
                      Person 1               Person 2               Person 3
                 W      J      R        W      J      R        W      J      R
Person 1   W  10.05  52.48  79.13    32.75  56.44  50.70    10.97  20.42  22.32
           J  49.58   3.12  18.14    18.24   6.77  24.89    54.94  23.91  32.35
           R  62.95  18.38   4.21    32.12  21.48  19.18    63.56  39.39  34.91
Person 2   W  21.63  22.76  35.57     3.42  22.09  29.05    27.55  22.58  20.12
           J  41.76   9.95  28.20    16.46   7.75  21.07    48.39  18.89  27.26
           R  30.97  21.44  26.35    20.91  20.36   5.15    33.13  20.64  13.52
Person 3   W  10.71  54.47  74.76    38.27  58.70  42.24     5.34  19.39  15.36
           J  30.92  37.44  61.05    38.85  36.89  26.49    28.09  15.37  16.83
           R  18.61  33.05  46.12    31.76  35.87  18.98    16.52   9.94   3.95
Table 9. Summary statistics of the distances between the traces through the SoPF of the half scaled version of the testing set, keeping the training set at the original size.

Scale Study: Trace Distance (10^-3)
Distinguishing                 Intra-Class          Inter-Class
                               µ       σµ           µ       σµ
Motion Types                   16.60   0.83         32.60   0.68
Motion Types of Person 1        3.12   0.50         43.06   2.61
Motion Types of Person 2        4.39   0.79         21.93   0.55
Motion Types of Person 3        5.66   1.64         17.52   0.78
Persons based on Walking        2.69   0.45         20.39   1.43
Persons based on Jogging        6.61   1.66         21.50   1.69
Persons based on Running        3.90   0.70         26.30   1.36
Figure 19. Comparison of the largest eigenvalues associated with the edge images of people in motion and those associated with the SoPF of 2-ary relational distributions of the same images.
the eigenspace. In contrast, the size of the relational distribution we used was 30 × 30 and
the eigenspace was easily computed.
CHAPTER 5
EVALUATION METHODOLOGY
In this chapter we explain the methods used to evaluate the performance of the algorithms
developed. First, we introduce concepts and methods related to measuring performance of
an algorithm over a specific dataset, and then we present methods to evaluate performance
of different algorithms over the same dataset.
According to Jain et al. [61], no evaluation method is sufficient to provide a convincing
and reliable measure of the accuracy of a biometric system, because performance evaluations are
very dependent on the database tested. That is why in our experiments we used four different
data sets captured under different imaging conditions, highlighting different covariates,
and of different sizes. For each experiment, we divide our datasets into gallery and probe
sets, adopting the successful FacE REcognition Technology (FERET) evaluations [62]. In
the biometrics vocabulary, the gallery set represents the enrolled data, or data on a watch list, and
the probe sets are the query data. The probe sets vary from the gallery set in increasing
degrees of difference in terms of the covariates. The same subject can be represented in both
the gallery and probe sets, but the same data unit (i.e., a sequence of images) from a person
is not used in both gallery and probe sets. We match each probe sequence to the gallery
sequences, thus obtaining a similarity matrix whose size is the number of probe sequences
by the number of gallery sequences.
5.1 Covariates
We consider a covariate to be a condition affecting gait. The different experiments that
we performed are structured to study how covariates like gait type (Chapter 6), view
angle (Chapter 7), and walking surface, footwear, and viewpoint (Chapters 8 and 9) affect the
recognition performance by varying that condition between gallery and probe sets.
Table 10. Sample rows from a file in SAS format for the experiment on different motion types.

Person       MotionType       Direction       Distance
SamePerson   SameMotionType   DiffDirection   2.3164163
SamePerson   DiffMotionType   SameDirection   6.3817268
SamePerson   DiffMotionType   DiffDirection   6.5920931
SamePerson   DiffMotionType   SameDirection   5.3880706
SamePerson   DiffMotionType   DiffDirection   5.7101398
DiffPerson   SameMotionType   SameDirection   7.2667040
5.1.1 Analysis of Variance (ANOVA)
We use ANOVA to quantify the effect of the covariates studied in the experiments of
Chapters 6 and 7. The information contained in the similarity matrices, constructed out of the
similarities between each pair of gallery and probe sequences, is used for this purpose. We
use the SAS software to perform the statistical analyses. A generalized linear model (GLM)
is used, which is a better option than the traditional factorial model since it supports the
use of categorical variables and provides additional output information. The data in the
similarity matrix is rearranged into a single column format (row-scanned). Tags are added
for each similarity value specifying the relation of the covariates that produced the value.
The product is a file in SAS format. Table 10 shows a sample of the first few lines of such
a file.
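The row-scanning and tagging step might look like the following sketch; the field names (`person`, `motion`, `direction`) are illustrative, and writing the result out in actual SAS syntax is left aside:

```python
def to_sas_rows(dist, gallery_tags, probe_tags):
    """Row-scan a probe x gallery distance matrix into the tagged
    single-column format of Table 10. Each tag dict carries the
    covariate levels of one sequence; the keys here are illustrative."""
    rows = []
    for p, ptag in enumerate(probe_tags):
        for g, gtag in enumerate(gallery_tags):
            rows.append((
                "SamePerson" if ptag["person"] == gtag["person"] else "DiffPerson",
                "SameMotionType" if ptag["motion"] == gtag["motion"] else "DiffMotionType",
                "SameDirection" if ptag["direction"] == gtag["direction"] else "DiffDirection",
                dist[p][g],
            ))
    return rows
```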
The similarity or distance is defined as the dependent variable and covariates such as
person, motion type, view angle, and direction are defined as the independent variables.
The results of the test will provide us with statistical evidence of the variation induced by
each independent variable in an experiment. By comparing the F-values and P-values we
can determine which independent variable has more effect on the dependent variable
and whether this effect can be considered statistically significant for the experiment.
5.2 Performance Evaluation
Following the pattern of the FERET evaluations, we measure performance for both identi-
fication and verification scenarios using cumulative match characteristics (CMCs) and
receiver operating characteristics (ROCs), respectively. The evaluation process is illustrated
in Fig. 20.
5.2.1 Identification
In the identification scenario, the task is to identify a given probe as one of the given
gallery images. To quantify performance, for each probe we sort the gallery images based
on computed similarities with that probe. In terms of the similarity matrix, this would
correspond to sorting the individual rows of the similarity matrix. If the correct gallery
sequence corresponding to the given probe occurs within rank k in this sorted set, then we
have a successful identification at rank k. A cumulative match characteristic plots these
identification rates (PI) against the rank k. The identification rate is the ratio of the number
of correct identifications to the total number of probes. Note that this is a closed universe
test, where every probe should be in the gallery.
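A CMC can be computed from a probe × gallery similarity matrix along these lines (a sketch; it assumes higher values mean more similar, whereas for distance matrices the sort order would be reversed):

```python
import numpy as np

def cmc(similarity, probe_ids, gallery_ids, max_rank=None):
    """Cumulative match characteristic: identification rate P_I at each
    rank. Closed-universe test, so every probe id occurs in the gallery."""
    similarity = np.asarray(similarity, float)
    gallery_ids = np.asarray(gallery_ids)
    n_probes, n_gallery = similarity.shape
    max_rank = max_rank or n_gallery
    hits = np.zeros(max_rank)
    for p in range(n_probes):
        order = np.argsort(-similarity[p])               # sort this row
        ranked = gallery_ids[order]
        r = int(np.where(ranked == probe_ids[p])[0][0])  # rank of correct match
        if r < max_rank:
            hits[r] += 1
    return np.cumsum(hits) / n_probes   # P_I at ranks 1..max_rank
```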
5.2.2 Verification
In the verification scenario, we are interested in knowing whether a person is indeed who
he/she claims to be. In other words, we are interested in matching a given pair of probe and gallery
images. This type of scenario can arise when trying to gain access to ATM machines or
entry into a building. To quantify performance in this scenario we use the classical receiver
operating characteristics (ROCs) that plot the verification rates (or detection rates) against
false alarm rates. The verification rate is the ratio of the number of correct identifications
to the number of probes in the gallery. The false alarm rate is the ratio of the number of
incorrect identifications to all possible wrong pairings of gallery and probe subjects. This
is an open universe test, where some probes are not in the gallery.
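A matching sketch for the verification ROC, sweeping a threshold over genuine (matching-identity) and impostor (wrong-pairing) scores; the rate definitions follow the text above, and the threshold grid is an arbitrary choice:

```python
import numpy as np

def roc_points(similarity, probe_ids, gallery_ids):
    """Verification (detection) rate vs. false alarm rate obtained by
    sweeping a threshold over a probe x gallery similarity matrix."""
    similarity = np.asarray(similarity, float)
    genuine, impostor = [], []
    for p, pid in enumerate(probe_ids):
        for g, gid in enumerate(gallery_ids):
            (genuine if pid == gid else impostor).append(similarity[p, g])
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    points = []
    for t in np.sort(similarity.ravel()):   # one operating point per score
        pv = float((genuine >= t).mean())   # verification rate
        pf = float((impostor >= t).mean())  # false alarm rate
        points.append((pf, pv))
    return points
```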
5.3 Statistical Methods for the Evaluation of Human Identification Algorithms
In this section we describe McNemar's test, which we used to analyze and compare the
performance of two recognition strategies on the same dataset. The use of ROC and CMC
to evaluate the performance of these algorithms gives us a point of comparison between
Figure 20. The process of evaluating the performance of our algorithms.
Table 11. Paired data from algorithms being compared with McNemar's test.

                         Outcome of Algorithm A
                              S      F
Outcome of       S           33      7
Algorithm B      F            4     20
algorithms reporting results over the same database and using the same gallery and probe
sets, but it is not enough evidence to determine whether one algorithm performs better than
another. The other statistical characterization that we consider establishes ranges of
variation for changes in the gallery set.
5.3.1 McNemar's Test
Beveridge et al. [63] introduced to the computer vision biometrics community a simple
binomial model for the outcomes of human identification algorithms and proposed the use
of McNemar's test to compare two algorithms (A and B) tested on common data. Following
this methodology, four numbers need to be calculated:
(a) SS is the number of subjects correctly classified by both algorithms.
(b) FF is the number of subjects incorrectly classified by both algorithms.
(c) SF is the number of subjects correctly classified by algorithm A but incorrectly clas-
sified by algorithm B.
(d) FS is the number of subjects incorrectly classified by algorithm A but correctly clas-
sified by algorithm B.
These numbers are represented in Table 11. In McNemar's test, the SS and FF counts
are discarded. The null hypothesis, H0, is that P[SF] = P[FS]; this means that a failure
(SF or FS) is equally likely to favor algorithm A or B. The formulation for the rest of the
test is as follows: let NSF be the number of SF occurrences and NFS be the number of
FS occurrences. An alternative hypothesis, HALT, is considered to be P[SF] > P[FS]
(for the one sided version of this test), which implies that algorithm A fails less often than
algorithm B. Under H0,
P[at least NSF mismatches favor A] = P[at least NFS mismatches favor B]

= Σ_{i=0}^{NFS} [N! / (i! (N − i)!)] (0.5)^{NSF + NFS}, where N = NSF + NFS.
The probability resulting from this computation is the p-value for rejecting H0 in favor
of HALT .
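The one-sided exact p-value can be computed directly from the two mismatch counts; reading NSF = 4 and NFS = 7 off Table 11 for the illustration is our own choice of example:

```python
from math import comb

def mcnemar_one_sided(n_sf, n_fs):
    """One-sided exact McNemar p-value for H0: P[SF] = P[FS] against
    HALT: P[SF] > P[FS]; SS and FF counts are discarded, as in the text."""
    n = n_sf + n_fs
    # "at least n_sf of the n mismatches favor A" <=> "at most n_fs favor B"
    return sum(comb(n, i) for i in range(n_fs + 1)) * 0.5 ** n

# Illustration with the counts of Table 11 (SF = 4, FS = 7):
p_value = mcnemar_one_sided(4, 7)   # large p-value, so H0 is not rejected
```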
5.3.2 Performance Variations due to Variation in Gallery Data
Gait datasets are partitioned into subsets according to the covariates being investigated
and experiments are designed to study their effects. Typically, we select the largest data
subset as the gallery set and test against all other subsets defined from each experiment. In
this section, we describe a method to study the variation in performance of the algorithms
developed for all the designed experiments considering variations in the gallery data.
Let {C1, C2, . . . , CK} be the K covariates being investigated. We consider covariates
having 2 levels (ci(1) and ci(2)). Then, the dataset can be partitioned into 2^K subsets for
different combinations of covariate levels. Thus, a subset can be denoted by {C1 = c1(l1), . . . , Ci =
ci(li), . . . , CK = cK(lK)}. To study recognition rates with change in covariate Ci, we have
to consider gallery and probe sets with different Ci level, keeping other covariates constant.
Thus, possible gallery-probe pairs would be:
Gallery = {C1 = c1(l1), . . . , Ci−1 = ci−1(li−1), Ci = ci(1), Ci+1 = ci+1(li+1), . . . , CK = cK(lK)}
Probe = {C1 = c1(l1), . . . , Ci−1 = ci−1(li−1), Ci = ci(2), Ci+1 = ci+1(li+1), . . . , CK = cK(lK)}

for different combinations of {l1, . . . , li−1, li+1, . . . , lK}. Note that between the gallery
and probe pair only Ci is changing. There are 2^(K−1) such combinations. Another 2^(K−1)
combinations could be generated by reversing the roles of the probe and the gallery in the
above pairing, thus
Gallery = {C1 = c1(l1), . . . , Ci−1 = ci−1(li−1), Ci = ci(2), Ci+1 = ci+1(li+1), . . . , CK =
cK(lK)}
Probe = {C1 = c1(l1), . . . , Ci−1 = ci−1(li−1), Ci = ci(1), Ci+1 = ci+1(li+1), . . . , CK = cK(lK)}

The variation in recognition rates for these 2^K cases would give us an idea of the variability
of the recognition rate when the Ci factor is changed. In our case, we test and compare such
variations in performance of two different algorithms over the same dataset and present
the results in CMC curves for each experiment to graphically compare variations in the
performance of both algorithms.
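The enumeration of the 2^K gallery/probe pairings for a covariate Ci might be sketched as follows; the covariate names in the test of the sketch are purely illustrative:

```python
from itertools import product

def gallery_probe_pairs(levels, ci):
    """Enumerate the 2^K gallery/probe pairings that differ only in the
    covariate `ci`. `levels` maps each covariate name to its two levels."""
    others = [c for c in levels if c != ci]
    pairs = []
    for combo in product(*(levels[c] for c in others)):
        fixed = dict(zip(others, combo))
        a = {**fixed, ci: levels[ci][0]}   # gallery at level ci(1)
        b = {**fixed, ci: levels[ci][1]}   # probe at level ci(2)
        pairs.append((a, b))
        pairs.append((b, a))               # gallery/probe roles reversed
    return pairs
```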
CHAPTER 6
HUMAN IDENTIFICATION FROM DIFFERENT GAIT TYPES
In this chapter, we present an experiment designed to explore the possibility of gait-based
identification in a more extensive manner than in Chapter 4. We use a database of 10
persons performing three motion types, walking, jogging, and running, in an outdoor setting.
The viewpoint is frontal-parallel. Some example frames are shown in Fig. 21 for a
person (a) walking, (b) jogging, and (c) running. The average height of the person is 120
pixels. Each person performed these three different motion types in two different directions,
left-to-right and right-to-left. This gives us six different types of sequences (Walking-Left,
Walking-Right, Jogging-Left, Jogging-Right, Running-Left, Running-Right) for each per-
son, resulting in a total of 60 sequences.
6.1 Analysis of Covariates
The three covariates present in the 10 person database are motion type, walking direction,
and the identity of the person. In this section we quantify the strength of the variations in
gait due to these covariates. For our analysis, from each of the 60 sequences, we extracted
two motion cycles: one was used to build the SoPF (training set) and the other was used
for analysis (testing set). The dimensions of the trained SoPF are shown in Fig. 22 as
gray level images with the corresponding eigenvalues quantifying the associated variation
shown below each image. Although the eigenvectors are not exactly the same as those for
the treadmill sequences (Fig. 15), which just included the lower legs, we can see certain
similarities. Variation of distances seems to be important for the top eigenvectors and the
orientation variations are emphasized by later eigenvectors.
We computed the time-normalized distances (dnorm) between each pair of the 60 training
and 60 testing gait cycles. We then used analysis of variance (ANOVA) to study the effect
(a)
(b)
(c)
Figure 21. Sample frames of a person (a) walking, (b) jogging, and (c) running.
Figure 22. Ten most dominant dimensions of the SoPF for the different motion types database consisting of 10 persons.
Table 12. ANOVA table with results for different motion types experiments.

Source      DF    SS       F-value    P-value
Person       1    793.92   114.22     < 0.0001
Angle        1      9.53     1.37       0.2419
Direction    1     12.06     1.74       0.1879
of person, motion type, and direction of motion on the computed distance. Each covariate
can have two possible values: same or different, i.e. same person or different persons, same
motion type or different motion types, and same motion direction and different motion
directions. For instance, a computed distance could be between, say, different persons,
same motion types, and for movement in the same direction. ANOVA results are shown in
Table 12, from which we can see that the differences due to the subject are, by far, the largest
source of variation as compared to motion type or direction. In fact, as the F-values suggest,
the variation due to the persons is nearly two orders of magnitude larger than that due to
motion type or walking direction.
6.2 Gait-Based Recognition Experiments
Given that the subject is the largest source of variation in the distances out of the three
factors, it is natural to ask what kind of recognition rates we can get based on gait, be it
walking, jogging, or running. This we investigate next.
We conducted three gait recognition experiments based on walking, jogging, and running
gaits. For each experiment, we separated the sequences with the corresponding motion type
into gallery and probe sets, as explained in Chapter 5. One cycle from each sequence with
the person going left formed the gallery set and one cycle from each sequence with the
person going right formed the probe sets. We are basically using the left profile of the
person as gallery and the right profile as probe. The specific gallery and probe sets for each
experiment are listed in the second row of Table 13. The gallery set of images was also the
training set used to build the SoPF.
Table 13. Number of persons correctly identified for different motion types experiments.
At        Experiment
Rank      Gallery: Walking Left     Gallery: Jogging Left     Gallery: Running Left
          Probe: Walking Right      Probe: Jogging Right      Probe: Running Right
1         10 of 10                  10 of 10                  8 of 10
2         10 of 10                  10 of 10                  9 of 10
Table 14. Distance between the traces through the SoPF of two different cycles of walking motion for 10 persons.

Trace Distances (10^-3)
                                        Probe
Gallery    1      2      3      4      5      6      7      8      9      10
 1       12.96  22.15  20.00  16.00  35.48  21.03  25.44  33.12  24.86  26.67
 2       15.34   7.33  23.05  11.68  19.93  15.94  17.48  24.96  24.09  29.27
 3       20.10  30.40  12.92  24.61  31.39  22.43  30.40  24.97  29.73  18.37
 4       15.10  17.71  20.75   8.55  30.24  16.59  19.89  21.76  22.30  16.35
 5       21.68  20.59  14.83  17.80  13.48  17.37  17.73  14.87  30.78  21.12
 6       16.35  16.68  21.23  14.81  24.02   8.71  16.86  16.85  18.96  17.04
 7       23.07  20.64  24.78  19.70  19.02  18.91   9.95  15.88  31.63  27.01
 8       26.70  25.94  23.96  23.68  20.44  24.17  16.70   9.66  39.46  22.78
 9       22.86  23.43  28.87  21.97  17.24  11.00  10.37  18.63  18.23  22.91
10       35.15  47.61  32.34  37.36  41.08  27.78  37.20  26.24  38.87  11.96
For each probe we compute its distance from all the gallery images. If the identity of
the gallery image with the smallest distance to the probe matches the identity of the probe,
then we have a successful identification. Table 14 shows the distances between the gallery,
containing cycles from the persons walking left, and the probe, containing cycles from the
persons walking right; Table 15 shows the distances between cycles from the persons jogging,
and Table 16 shows the distances between cycles from the persons running. The minimum value in each
column, which corresponds to one probe, is highlighted. Note that the gallery entry with
the minimum value corresponds to the correct identity. Thus, we have correct identification
10 out of 10 times for walking gait. The same is also true for jogging gait. But, as we see in
Table 16, the rate falls to 8 out of 10 correct identifications for running gait. If we accept
correct identification to be the case when the identities of either the minimum (rank 1) or
the second minimum (rank 2) match, then the identification rate increases to 9 out of 10.
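The rank-1/rank-2 counting used for Tables 13 through 16 can be sketched as below, assuming a gallery × probe distance matrix in which row i and column i belong to the same person:

```python
import numpy as np

def identify(dist):
    """Rank-1 and rank-2 identification counts from a gallery x probe
    distance matrix whose diagonal pairs matching identities."""
    dist = np.asarray(dist, float)
    n = dist.shape[1]
    rank1 = rank2 = 0
    for probe in range(n):
        order = np.argsort(dist[:, probe])   # gallery sorted by distance
        if order[0] == probe:                # smallest distance is correct
            rank1 += 1
        if probe in order[:2].tolist():      # correct within the top two
            rank2 += 1
    return rank1, rank2
```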
Table 15. Distance between the traces through the SoPF of two different cycles of jogging motion for 10 persons.

Trace Distances (10^-3)
                                        Probe
Gallery    1      2      3      4      5      6      7      8      9      10
 1       12.69  17.73  16.02  13.38  16.38  17.04  17.99  19.24  17.13  23.59
 2       21.18   7.03  23.03  12.79  18.11  13.55  14.49  15.79  17.25  29.98
 3       18.69  28.38   9.12  22.22  25.75  25.19  23.26  26.12  25.36  15.35
 4       19.42  16.24  17.99   8.96  14.12  12.17  14.42  12.54  14.33  17.20
 5       27.00  19.69  20.80  17.97   7.73  13.84  16.98  14.92  19.10  30.46
 6       22.45  13.73  23.18  11.93  12.74   8.92  12.98  10.88  13.13  20.79
 7       29.66  16.96  22.97  17.41  14.46  15.35  11.11  13.30  23.48  33.75
 8       31.19  19.42  25.97  17.24  18.62  13.41  17.33   9.09  19.44  32.62
 9       30.55  18.67  26.73  22.57  19.89  10.31  18.27  14.16  11.09  20.57
10       19.51  30.41  20.51  17.12  30.60  24.94  26.36  22.08  23.40   9.84
Table 16. Distance between the traces through the SoPF of two different cycles of running motion for 10 persons.

Trace Distances (10^-3)
                                        Probe
Gallery    1      2      3      4      5      6      7      8      9      10
 1        6.32  15.45  17.98  15.84  20.25  21.14  20.84  18.35  26.24  26.51
 2       13.83   6.90  19.34  15.24  21.43  18.76  17.96  17.91  31.30  31.37
 3       21.13  23.91   5.69  14.26  19.22  16.67  24.72  14.35  25.61  12.01
 4       20.01  22.02  11.00  11.95  18.78  16.87  22.60  16.36  25.13  17.37
 5       23.45  26.09  19.51  17.88  13.22  20.26  23.82  19.32  25.98  23.32
 6       23.64  26.74  20.42  18.86  19.77  10.71  13.15  10.46  16.24  21.75
 7       22.80  14.53  24.87  18.28  27.07  13.07  12.19  14.60  28.80  35.37
 8       23.04  17.78  18.08  16.86  22.08   8.02  17.48   8.15  28.04  27.62
 9       24.96  31.20  25.16  25.49  29.72  23.06  19.39  21.81  12.68  28.52
10       42.47  61.60  27.19  39.03  53.07  39.91  43.46  35.08  29.50  24.43
CHAPTER 7
WALKING GAIT BASED IDENTIFICATION FROM DIFFERENT VIEW ANGLES
In this chapter, we investigate the viability of the SoPF framework for walking gait-based
recognition using a larger dataset of 20 persons. We also investigate the relationship of
the achieved recognition rates with viewing angle. For this, we imaged 20 persons walking
frontal-parallel, 22.5◦, and 45◦ with respect to the image plane, as depicted in Fig. 23.
As before, each person walks each of the three slanted paths in two different directions,
left-to-right and right-to-left. Thus, for each person, there are 6 sequences imaged under
6 possible conditions: 0◦ (frontal-parallel) going left (0L), 0◦ (frontal-parallel) going right
(0R), 22.5◦ going left (22L), 22.5◦ going right (22R), 45◦ going left (45L), and 45◦ going
right (45R). The abbreviations in parentheses will be used in the following discussion to
refer to these conditions. Fig. 24 shows 3 sample frames from the same person walking
the three differently angled paths. The frame size is 280 × 130. Although our dataset is
moderate in size, it is quite challenging. It presents a very difficult scenario for background
subtraction, including persons moving in the background and sudden illumination changes
due to clouds.
7.1 Analysis of Covariates
The three covariates present in the database for this experiment are walking direction, angle
of motion path, and the identity of the person. We quantify the strength of effect of these
factors on the variations in the distance values computed between two cycles from each of the
120 sequences (20 persons × 3 path angles × 2 walking directions). One cycle from each
of the 120 sequences formed the training set of images used to construct the SoPF, whose
leading dimensions are shown in Fig. 25 with the corresponding eigenvalues quantifying
the associated variation shown below each image. Qualitatively, these dimensions capture
Figure 23. Setup for data acquisition of different view angle walking sequences.
Figure 24. Sample frames from the same person walking (a) frontal-parallel, (b) 22.5◦, and (c) 45◦ with respect to the image plane.
Table 17. ANOVA table with results for different view angle experiments.

Source     DF       SS    F-value    P-value
Person      1  4624.33    1208.97   < 0.0001
Angle       1     0.74       0.19     0.6604
Direction   1     4.51       1.18     0.2775
similar aspects, such as primarily distance in the first few dimensions and then a combination
of orientation and distance, as in Fig. 22. They are, of course, different when examined
closely, since the previous database included three types of motion, not just walking as in
the present case.
As before, we quantify the effect of the covariates on the distances using ANOVA, whose
output is shown in Table 17. We see that subject is the largest and most significant source
of variation. In fact, as the F-values suggest, the variation due to the persons is at least
three orders of magnitude larger than due to angle or walking direction.
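The F-values in Table 17 come from comparing between-level to within-level variance of the distances. A minimal sketch for a single two-level factor, using synthetic placeholder numbers rather than the dissertation's measurements:

```python
import numpy as np

def one_way_F(groups):
    """One-way ANOVA F statistic: ratio of between-group to within-group
    mean squares, as reported in the F-value column of an ANOVA table."""
    all_x = np.concatenate(groups)
    grand_mean = all_x.mean()
    k, n = len(groups), len(all_x)
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Synthetic example: a strong factor separates the two groups cleanly.
a = np.array([1.0, 2.0, 3.0])     # distances at level 1 of the factor
b = np.array([11.0, 12.0, 13.0])  # distances at level 2
print(one_way_F([a, b]))  # → 150.0; a large F means the factor dominates
```

A factor with no effect, such as angle or direction in Table 17, yields an F near 1.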
Figure 25. Ten most dominant dimensions of the SoPF for 20 person database.
Table 18. Gallery and probe sets for gait recognition experiments over the 20 person database.

Experiment  Training set/Gallery set                Probe set
1           0◦ (frontal-parallel) going left (0L)   0◦ (frontal-parallel) going right (0R)
2           22.5◦ going left (22L)                  22.5◦ going right (22R)
3           45◦ going left (45L)                    45◦ going right (45R)
4           0◦ (frontal-parallel) going left (0L)   22.5◦ going right (22R)
5           0◦ (frontal-parallel) going left (0L)   45◦ going right (45R)
7.2 Gait-Based Recognition Experiments
Given that the subject is the largest source of gait variation, as measured in the SoPF, how
do the recognition rates vary with view angle? To answer this, we separated our database
into five sets of gallery and probe combinations, corresponding to 5 experiments listed in
Table 18. The going-left sequences form the galleries and the going-right sequences form
the probes. The training set of images, used to create the SoPF, consists of the union of
the gallery sets.
We use the first three experiments, 1, 2 and 3, to study whether recognition is possible
from views other than frontal-parallel ones. To study how recognition varies with the view
angle, we use experiments 1, 4, and 5. On comparing the CMCs and ROCs for experiments
1, 2, and 3, which are shown in Figs. 26 (a) and (b) respectively, we see that identification
and recognition rates from the three experiments are similar. Rank 1 identification rates
range from 75% to 80%, which improves to 85% to 95% at rank 2. Verification rates at 10%
false alarm are around 90%. We can conclude that gait-based recognition is possible from
non-frontal-parallel views, such as those viewed at 22.5◦ or 45◦.
Fig. 27 shows the (a) CMC and (b) ROC curves for experiments 1, 4 and 5 that study
the variation of identification and verification rates with viewpoint. The identification rate when
the gallery and the probe are from the same view angle is 80%, which drops only to 75%
when the probe is from the 22.5◦ viewpoint. But the performance falls drastically to 55% with
a 45◦ viewpoint probe set. The same trend is also seen in the ROCs. Thus, it appears that
the gait-based recognition using the SoPF framework is robust with respect to viewpoint
change up to 22.5◦.
One might argue that on a small dataset one should get near 100% identification rates.
To this, we point out the complexity of the outdoor imaging conditions in the data set and
the fact that we have a clear separation of train and test sets; we use the left profile for
training (or as gallery) and try to identify people from their right profiles (the probe sets).
Thus, the recognition rates also reflect the inherent variation in gait due to opposite profile
viewpoints in addition to any other factor that might be different between the probe and
gallery sets in each of the experiments.
Figure 26. (a) CMC and (b) ROC curves for experiments 1, 2 and 3, studying identification and verification rates at varying viewpoints.
Figure 27. (a) CMC and (b) ROC curves for experiments 1, 4 and 5, studying variation of identification and verification rates with change in viewpoint.
CHAPTER 8
BENCHMARKING WALKING GAIT BASED IDENTIFICATION
There is an increasing interest in human identification based on gait in the computer vi-
sion community. However, there is no benchmark to compare all emerging techniques,
mainly because we do not quite know the set of conditions under which the problem can
be solved. We know about potential factors that can affect human gait, such as walking
surface, footwear, view points, carrying objects, etc. However, the effects are not quantified
on a large dataset. In this chapter we first summarize the gait challenge problem [64] [65],
which contains a data set covering some of the mentioned variations, a set of experiments,
and a baseline algorithm. Then we present the performance of the baseline algorithm.
8.1 The Gait Challenge Problem
The gait challenge problem was designed to investigate the factors inducing variations in
human gait, and how they will affect the performance of gait-based recognition. There
are processes used in gait-based recognition that also need to be investigated such as fore-
ground/background segmentation, tracking, and dealing with occlusions. It is not possible
to draw conclusions based only on the performance figures of one algorithm on a small
database. Rather, these conclusions will come from detailed analysis of performance statis-
tics of multiple algorithms on a large common data set. The gait challenge problem provides
this framework.
8.2 The Data Set
The key to the success of the challenge problem is the database of video sequences collected
to support it. An ideal database helps define a set of challenge experiments that span a
range of characteristics and difficulties. These ranges are included in the gait challenge
Figure 28. Camera setup for the gait data acquisition.
problem because of the number of conditions under which a person’s gait is collected, the
number of individuals in the database, and the fact that all sequences are taken outside.
The database used in the challenge problem is the largest available to date in terms of
number of people, number of video sequences, and conditions under which a person’s gait
is observed. The current installment of the database consists of 452 sequences from 74
individuals, with each individual collected in up to 8 conditions. All the data is collected
outside, reflecting the added complications of shadows from sunlight, moving background,
and moving shadows due to cloud cover. This dataset is significantly larger than those that
are being used in present studies (Table 1), most of which are not publicly available.
The cameras for the concrete surface were consumer-grade Canon Opturas; these are the
same cameras used to collect data for the experiments in Chapters 4, 6, and 7. Two Canon
Optura PI cameras were used for the grass surface. All four are progressive-scan, single-
CCD cameras capturing 30 frames per second with a shutter speed of 1/250 second and with
auto-focus left on as all subjects were essentially at infinity. The cameras stream compressed
digital video to DV tape at 25 Mbits per second by applying 4:1:1 chrominance sub-sampling
and quantization, and lossy intra-frame adaptive quantization of DCT coefficients.
The imagery was recovered from tape at the National Institute of Standards and Tech-
nology (NIST). The camera was accessed over its IEEE 1394 Firewire interface using Pin-
nacle’s micro DV 300 PC board. The result is a stand-alone video file stored using Sony’s
(Digital Video) DV-specific dvsd codec in a Microsoft AVI wrapper. This capture from tape
does not re-compress and is not additionally lossy. Finally, the imagery is transcoded from
DV to 24-bit RGB using the Sony decoder and the result is written as PPM files, one file
per frame (720× 480 PPM file). This representation trades off storage efficiency for ease of
access.
Each subject walked multiple times, counterclockwise, around each of two similarly sized
and shaped elliptical courses. The basic setup is illustrated in Fig. 28. The elliptical
courses were approximately 15 meters on the major axis and 5 meters on the minor axis.
Both courses were outdoors. One course was laid out on a flat concrete walking surface. The
other was laid out on typical grass lawn surface. Each course was viewed by two cameras,
whose lines of sight were not parallel, but verged at approximately 30◦, so that the whole
ellipse was just visible from the two cameras. Fig. 29 shows one sample frame from each of
the four cameras on the two surfaces. The orange traffic cones marked the major axes of
the ellipses. The checkered object in the middle can be used to calibrate the two cameras.
The final sequences contain each subject walking several laps of the course. However,
only data from one full elliptical circuit for each condition is available. For the gait database,
those frames were clipped from the last such lap, when subjects are more comfortable with
being taped and have reached their normal walking speed. The number of frames in each sequence
ranges from 600 to 700 frames. The gait video data was collected on May 21 and 22, 2001.
Subjects were asked to bring a second pair of shoes, so that they could walk the two
ellipses a second time in a different pair of shoes. A little over half of the subjects walked in
two different shoe types. Thus there are as many as eight video sequences for each subject:
(grass(G) or concrete(C))×(two cameras, L or R)×(shoe A or shoe B). Table 19 shows the
number of sequences for each combination of conditions in the present database.
Figure 29. Frames from (a) the left camera for the concrete surface, (b) the right camera for the concrete surface, (c) the left camera for the grass surface, and (d) the right camera for the grass surface.
Table 19. Number of sequences for each combination of possible surface (G or C), shoe (A or B), and camera view (L or R).

                Concrete (C)      Grass (G)
Shoe             A      B          A      B
Left Camera     70     44         71     41
Right Camera    70     44         71     41
Table 20. The probe set for each of the challenge experiments. The number of subjects in each subset is given in square brackets.

Exp.  Probe           Difference
A     (G, A, L) [71]  View
B     (G, B, R) [41]  Shoe
C     (G, B, L) [41]  Shoe, View
D     (C, A, R) [70]  Surface
E     (C, B, R) [44]  Surface, Shoe
F     (C, A, L) [70]  Surface, View
G     (C, B, L) [44]  Surface, Shoe, View
8.3 Challenge Experiments
The set of challenge experiments of increasing difficulty for gait-based recognition is
presented next. Three covariates are studied: walking surface (concrete (C) or grass (G)), shoe
type (A or B), and viewpoint (left (L) or right (R)). Based on the values of the covariates
the dataset is divided into 8 possible subsets: {(G, A, L), (G, A, R), (G, B, L), (G, B, R),
(C, A, L), (C, A, R), (C, B, L), (C, B, R)}. Since not every subject was imaged under
every possible combination of factors, the sizes of these sets are different (Table 19). One
of the large subsets (G, A, R), i.e. (Grass, Shoe Type A, Right Camera), was designated as
the gallery set, which includes 71 subjects. The rest of the subsets are probe sets, differing
in various ways from the gallery. The structure of the challenge experiments is listed in
Table 20.
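The partitioning into gallery and probe subsets can be sketched as a small piece of bookkeeping. The helper and key names here are illustrative, not from the released challenge code:

```python
from itertools import product

# The 8 (surface, shoe, view) subsets, e.g. ('G', 'A', 'R').
conditions = list(product("GC", "AB", "LR"))

gallery_key = ("G", "A", "R")  # designated gallery: grass, shoe A, right camera
probe_keys = [c for c in conditions if c != gallery_key]  # the 7 probe sets

def covariate_difference(probe, gallery=gallery_key):
    """List the covariates in which a probe subset differs from the gallery."""
    names = ("Surface", "Shoe", "View")
    return [n for n, p, g in zip(names, probe, gallery) if p != g]
```

For instance, `covariate_difference(("G", "A", "L"))` returns `['View']`, matching the Difference column for Experiment A in Table 20.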
8.4 Baseline Algorithm
The baseline algorithm was designed to be simple and fast. It is composed of three parts.
The first part semi-automatically defines bounding boxes around the moving person in each
frame of a sequence. Using a Java-based GUI, bounding boxes in the starting, middle, and
ending frames of the sequence are manually outlined. The bounding boxes for the interme-
diate frames are linearly interpolated from these manual ones. Specifically, the locations of
the upper-left and the bottom-right corners are interpolated. This approximation strategy
works well for cases where there is nearly frontal-parallel, constant velocity motion, which
is the case for the frames from the back portion of the ellipse processed in the gait
challenge experiments. Fig. 30 shows some examples of the image data inside the bounding
box. Note that bounding boxes are not specified tightly around the person; rather there is
some amount of background information all around the person in each box. The second and
the third parts of the algorithm are silhouette extraction and computation of the similarity
measure, which are explained in detail in the next two subsections.

Figure 30. Sample bounding boxed image data as viewed from (a) left camera on concrete, (b) right camera on concrete, (c) left camera on grass, and (d) right camera on grass.
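The corner interpolation described above can be sketched as follows (the function and argument names are illustrative, not from the baseline code):

```python
import numpy as np

def interpolate_boxes(key_frames, key_boxes, frames):
    """Linearly interpolate bounding boxes between manually outlined frames.

    key_frames: sorted frame indices with manual boxes (start, middle, end)
    key_boxes:  matching (x0, y0, x1, y1) corner coordinates
    frames:     frame indices at which boxes are wanted
    """
    key_boxes = np.asarray(key_boxes, dtype=float)
    # The upper-left and bottom-right corners are interpolated independently.
    return np.stack([np.interp(frames, key_frames, key_boxes[:, c])
                     for c in range(4)], axis=1)
```

With keyframes at frames 0 and 10, the box at frame 5 is simply the midpoint of the two manual boxes.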
8.4.1 Silhouette Extraction
The motion silhouette is extracted from each frame by background subtraction, but only
within the semi-manually defined bounding boxes. In the first pass through a sequence,
the background statistics of the RGB values at each image location, (x, y), are computed
using pixel values outside the manually defined bounding boxes in each frame. Then, the
mean µB(x, y) and the covariances ΣB(x,y) of the RGB values at each pixel location are
computed. (Note that the images in this database are in color, unlike the ones used in
previous chapters.) Fig. 31 shows examples of the estimated mean background image and the
associated variances of the RGB channels. These images were histogram equalized and are
smaller than those shown in Fig. 29, because they show only the image locations where
the color statistics were computed. Notice that the variances are significantly higher in
the regions corresponding to the bushes than in other regions. The sharp contrast of the
calibration box also introduces significant variations, mainly due to DV compression.

Figure 31. Estimated mean background for a sequence on (a) concrete and (c) grass. Variance of the RGB channels in the background pixels on (b) concrete and (d) grass.
The Mahalanobis distance of the pixel value from the estimated mean background value
is computed for pixels within the bounding box of each frame. Any pixel with this distance
above a user specified threshold DMaha is a foreground pixel. If the difference image is
smoothed using a 9×9 pyramidal-shaped averaging filter, or equivalently, two passes of a 3×3
averaging filter, the quality of the silhouette and recognition performance improves. This
smoothing compensates for DV compression artifacts. On the difference thresholded image,
we perform two post-processing steps to extract the normalized silhouette. First, small
regions are detected by connected component labeling; any region smaller than NSize pixels
is deleted. Second, the remaining foreground region is scaled so that its height is 128
pixels to occupy the whole length of the 128×88 output silhouette frame. The scaling of the
silhouette offers some amount of scale invariance and facilitates the fast computation of the
similarity measure. There are two ways of performing this scaling: (a) scale the thresholded
silhouette, or (b) scale the difference image using bilinear interpolation and then threshold.
The second method involves more computations than the first and produces better “looking”
silhouettes, but as will be seen, the performance is not significantly different. The sources of
segmentation errors include (a) shadows, especially in the concrete sequences, (b) inability
to segment parts of a body as distances fall just below the threshold, (c) moving objects
in the background, such as the fluttering tape in the concrete sequences, moving leaves in
the grass sequences, or other moving persons in the background, and (d) DV compression
artifacts near the boundaries of the person. Fig. 32 shows some of these problematic cases.

Figure 32. The bottom row shows sample silhouette frames depicting the nature of segmentation issues that need to be tackled. The raw image corresponding to each silhouette is shown in the top row.
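The extraction steps above can be sketched end to end. A minimal illustration, assuming the per-pixel background mean and inverse covariances are already estimated; the names, the smoothing realization, and the centering details are assumptions, not the baseline implementation:

```python
import numpy as np
from scipy import ndimage

def extract_silhouette(frame, mean_bg, inv_cov, d_maha=7.0, n_size=200,
                       out_h=128, out_w=88):
    """Sketch of the baseline silhouette extraction inside one bounding box.
    frame, mean_bg: (H, W, 3) RGB arrays; inv_cov: (H, W, 3, 3) inverse
    background covariances."""
    diff = frame - mean_bg
    # Per-pixel squared Mahalanobis distance from the background model.
    d2 = np.einsum("hwi,hwij,hwj->hw", diff, inv_cov, diff)
    # Smooth the distance image (two passes of a small mean filter) to
    # compensate for DV compression artifacts.
    d2 = ndimage.uniform_filter(ndimage.uniform_filter(d2, 3), 3)
    fg = d2 > d_maha ** 2  # threshold the (squared) distance
    # Delete connected components smaller than n_size pixels.
    labels, n = ndimage.label(fg)
    sizes = np.bincount(labels.ravel())
    for i in range(1, n + 1):
        if sizes[i] < n_size:
            fg[labels == i] = False
    ys, xs = np.nonzero(fg)
    out = np.zeros((out_h, out_w), dtype=bool)
    if len(ys) == 0:
        return out  # nothing segmented
    # Crop the surviving region and scale it so its height fills the frame.
    crop = fg[ys.min():ys.max() + 1, xs.min():xs.max() + 1].astype(float)
    sil = ndimage.zoom(crop, out_h / crop.shape[0], order=1) > 0.5
    h, w = min(sil.shape[0], out_h), min(sil.shape[1], out_w)
    x0 = (out_w - w) // 2  # center horizontally in the 128 x 88 frame
    out[:h, x0:x0 + w] = sil[:h, :w]
    return out
```

This corresponds to scaling option (a), thresholding before interpolation; option (b) would zoom the smoothed distance image first and threshold afterwards.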
8.4.2 Similarity Computation
The similarity computation for sequences with more than one gait cycle presented in
Section 3.3 is reformulated in this section in terms of the challenge problem. Let the
probe and the gallery silhouette sequences be denoted by $S_P = \{S_P(1), \cdots, S_P(M)\}$ and
$S_G = \{S_G(1), \cdots, S_G(N)\}$, respectively.
The probe sequences are partitioned into disjoint subsequences of $N_{Probe}$ contiguous
frames. Let the k-th probe subsequence be denoted by $S_{Pk} = \{S_P(k), \cdots, S_P(k+N_{Probe})\}$.
The distance measure will be the correlation between each of these subsequences and the
gallery sequence:

$$\mathrm{Corr}(S_{Pk}, S_G)(l) = \sum_{j=1}^{N_{Probe}} \mathrm{FrameSim}\left(S_P(k+j), S_G(l+j)\right) \qquad (10)$$
The similarity is chosen to be the median value of the maximum correlation of the
gallery sequence with each of these probe subsequences.
$$\mathrm{Sim}(S_P, S_G) = \mathrm{Median}_k\left(\max_l \mathrm{Corr}(S_{Pk}, S_G)(l)\right) \qquad (11)$$
At the core of the above computation is the need to compute the similarity between two
silhouette frames, FrameSim (SP(i),SG(j)), which is computed as the ratio of the number
of pixels in their intersection to that in their union. Thus, if the number of foreground
pixels in silhouette S is denoted by Num(S) then we have,
$$\mathrm{FrameSim}(S_P(i), S_G(j)) = \frac{\mathrm{Num}(S_P(i) \cap S_G(j))}{\mathrm{Num}(S_P(i) \cup S_G(j))} \qquad (12)$$
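Equations 10 through 12 translate directly into code. A sketch over lists of boolean silhouette frames (the loop structure and names are an assumed implementation, not the released baseline code):

```python
import numpy as np

def frame_sim(a, b):
    """Eq. 12: ratio of silhouette intersection to union (a, b boolean)."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def sequence_sim(probe, gallery, n_probe=30):
    """Eqs. 10 and 11: median over disjoint probe subsequences of the best
    correlation against all shifts of the gallery sequence."""
    peaks = []
    for k in range(0, len(probe) - n_probe + 1, n_probe):
        sub = probe[k:k + n_probe]
        # Eq. 10: correlation of this subsequence at every gallery shift l.
        corrs = [sum(frame_sim(sub[j], gallery[l + j]) for j in range(n_probe))
                 for l in range(len(gallery) - n_probe + 1)]
        peaks.append(max(corrs))
    return float(np.median(peaks))  # Eq. 11
```

Two identical sequences give the maximum attainable similarity, `n_probe` times the perfect frame similarity of 1.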
8.4.3 Parameters
There is no calibration requirement. However, the algorithm does have three parameters
that need to be chosen.
(a) DMaha is used to threshold the Mahalanobis distance. Since this distance measure is
normalized by the covariances, the choice of the threshold tends not to be sensitive
to a particular image.
(b) NSize is used to delete small regions and fill in small holes in the thresholded difference
image.
(c) NProbe is the size of each subsequence obtained by partitioning the probe sequence.
We studied the performance variation of challenge Experiment A around the operating point
DMaha = 7, NSize = 200, NProbe = 30, which was found to be at least a locally optimal
point. With an increase in DMaha the silhouettes become thinner, but parts tend to become
disconnected. The impact of the NSize parameter is less obvious visually in terms of the
overall main silhouette, but it does get rid of spurious small extraneous connected regions.
Table 21. Baseline performance for the challenge experiments in terms of the identification rate PI at ranks 1 and 5, verification rate PV at a false alarm rate of 10%, and area under ROC (AUC).

Experiment  Difference           PI (rank 1)  PI (rank 5)   PV    AUC
A           View                     79%          96%       86%   0.937
B           Shoe                     66%          81%       76%   0.883
C           Shoe, View               56%          76%       59%   0.844
D           Surface                  29%          61%       42%   0.765
E           Surface, Shoe            24%          55%       52%   0.774
F           Surface, View            30%          46%       41%   0.750
G           Surface, Shoe, View      10%          33%       36%   0.759
The NProbe parameter is used in the similarity computation and does not affect the silhouette
quality.
8.5 Baseline Performance
Fig. 33 plots the CMCs and ROCs of the 7 challenge experiments. Table 21 lists some of
the key performance indicators, namely, the identification rate (PI) at ranks 1 and 5, the
verification rate (PV ) for a false alarm rate of 10%, and the area under the ROC (AUC).
There are several observations to be made. First, the identification rate ranges from 10%
to 79% at rank 1, which improves to a range from 33% to 96% at rank 5. In terms of
ROC performance, the detection rates range from 36% to 86% for a false alarm rate of
10%. These results are encouraging given the simple nature of the baseline algorithm.
More sophisticated algorithms should yield better performance, for which there is much
room.
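The indicators in Table 21 can be computed from a similarity matrix in a few lines. An illustrative sketch, assuming a square matrix in which gallery g is the true match of probe p when g equals p (not the evaluation code used for the tables):

```python
import numpy as np

def cmc_and_verification(sim, far=0.10):
    """Compute the CMC curve and the verification rate at a given
    false-alarm rate from a square gallery-by-probe similarity matrix."""
    n = sim.shape[0]
    # Rank of the true match = number of gallery scores strictly above it.
    ranks = np.array([(sim[:, p] > sim[p, p]).sum() for p in range(n)])
    cmc = np.array([(ranks < r).mean() for r in range(1, n + 1)])
    genuine = np.diag(sim)
    impostor = sim[~np.eye(n, dtype=bool)]
    # Operating threshold set so that `far` of impostor scores exceed it.
    thr = np.quantile(impostor, 1.0 - far)
    return cmc, float((genuine > thr).mean())
```

`cmc[0]` is the rank-1 identification rate PI and the second return value is PV at the chosen false-alarm rate.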
Second, both the identification rates, as seen in the CMCs, and the detection rates, as
seen in the ROCs, fall as one goes from Experiment A to G. This offers a natural ranking
of the experiments in terms of their challenge nature, i.e. the situation in Experiment A,
where the difference between probe and gallery is just the viewpoint, is easier to solve than
that in Experiment G, where the probe is different in terms of all the three covariates.
Figure 33. Baseline performance for the challenge experiments, (a) CMC curves and (b) ROCs plotted up to a false alarm rate of 20%.
Third, among the three covariates, viewpoint variation of about 30◦ seems to have the
least impact and surface type has the most impact based on the drop in the identification
rate due to each of these covariates. Apart from the effect of the individual covariates on
performance, there also seem to be interactions between their effects. For instance, shoe
type (Experiment B) seems to impact performance more than viewpoint (Experiment A)
but viewpoint change along with surface change (Experiment F) impacts performance more
than shoe type change along with surface change (Experiment E).
CHAPTER 9
PERFORMANCE OF THE SOPF REPRESENTATION
In this chapter, we explore the performance of the SoPF representation on the gait challenge
database. First, we experiment with the low level feature types used to build the SoPF.
One choice is to use the computed silhouettes to select the 2D edges as was done in the
previous experiments. The other choice is to consider the edges of the silhouette itself. The
first strategy might pick up features from the subject's clothing, which is not a problem for
the second case. However, the second case relies heavily on the quality of the silhouettes.
We have experimented with using features from both strategies. Fig. 34 shows the moving
edges as computed in previous experiments in (a) and from the binary silhouettes in (b).
Second, we present statistical tests to compare the performance of the SoPF with that of
the baseline algorithm. Our interest is not only to visually compare CMC and ROC curves
but also to statistically measure the significance of possible differences in performance in
each one of the experiments of the challenge problem. For this purpose we use Mc Nemar’s
test, which was introduced in Section 5.3.1. Third, we measure performance of the SoPF
representation and baseline algorithm using manually specified silhouette data, which give a
better understanding of the best possible performance by eliminating noise from background
subtraction. Last, we present experiments to quantify the variation in performance of both
the baseline and SoPF algorithms when varying the gallery type.
9.1 Varying the Type of Low Level Features
We experimented with two types of edge features: (a) edges of the gray level images selected
by the silhouettes, and (b) edges of the silhouette itself. In the first case, when silhouettes
are used as masks to extract the edges in motion (Fig. 34(a)) the size of the bounding box
is variable.

Figure 34. Moving edges (a) using the binary silhouettes as masks over the edges of the original images and (b) directly from the binary silhouettes.

The scaling factor D needed to compute the relational distributions was calculated by
fitting a line through the curve generated from the change in the height of the
silhouettes over time. In the second case, when edges are extracted from the silhouettes,
these are scaled to the size of the bounding box as in Fig. 34(b). The scaling factor D used
in this case is the diagonal of the bounding box. The size of the bounding box was fixed to
128 × 88 pixels.
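The two scale-factor choices can be sketched as follows; the heights list is a placeholder for the per-frame silhouette heights of a real sequence:

```python
import numpy as np

# Strategy (a): the silhouette height oscillates over a gait cycle, so a line
# fitted to height-versus-frame gives a smooth per-frame scale estimate D.
def scale_from_heights(heights):
    t = np.arange(len(heights))
    slope, intercept = np.polyfit(t, heights, 1)
    return slope * t + intercept

# Strategy (b): silhouettes live in a fixed 128 x 88 box, so D is simply the
# box diagonal, identical for every frame.
D_FIXED = float(np.hypot(128, 88))
```

The line fit in strategy (a) smooths out the within-cycle height oscillation that would otherwise leak into the relational distributions.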
For each experiment (Table 20), the gallery set was used as the training set to build
the SoPF. We keep 10 eigenvectors associated with the 10 largest eigenvalues. For each
case, we have experimented with varying the number of eigenvectors that are kept based
on the approximation error, but this did not change the overall results significantly. In
computing the similarities between two sequences, we used the mean instead of the median
(see Eq. 9), which gave us better results.
9.1.1 Silhouette Masked Image Edges as Low Level Features
We first present results where moving edges were selected using silhouettes as masks, which
is the case of the experiments presented so far. The results for the challenge experiments
are shown in the (a) CMC and (b) ROC curves of Fig. 35, where we can see that the
performance for the first 3 experiments is good compared with the results shown for the
baseline algorithm (see Fig. 33). The performance for the last 4 experiments dropped
significantly and is even lower than that of the baseline algorithm. These experiments exercise
Table 22. Performance comparison of baseline and SoPF algorithms when using silhouette masked image edges as low level features. (Correct identifications (Corr), total probes in gallery (Tot), and identification rate (PI)).

            Baseline         SoPF             Paired Outcomes    Mc Nemar's
Experiment  Corr/Tot   PI    Corr/Tot   PI    SS   SF   FS   FF  test p-value
A           56/71     79%    64/71     90%    51    5   13    2  0.98
B           27/41     66%    30/41     74%    24    3    6    8  0.91
C           23/41     56%    22/41     54%    13   10    9    9  0.50
D           19/66     29%    10/66     15%     3   16    7   40  0.05
E           10/42     24%     3/42      7%     1    9    2   30  0.03
F           20/66     30%     3/66      5%     1   19    2   44  < 0.01
G            4/42      9%     3/42      7%     1    3    2   36  0.47
the surface covariate. The most probable cause is that the gait velocity is different across
surfaces for the same person.
Table 22 shows the breakup of the successes and failures for the baseline and SoPF
algorithm. For each experiment, we list the total number of correct matches for the baseline
and SoPF, followed by the number of sequences in which both succeeded (SS), one succeeded
and the other failed (SF and FS), and both failed (FF). The last column lists the p-value for
the Mc Nemar’s test. The drop in performance that we see for experiments D, E, and F is
significant. However, note that for all these experiments the number of sequences on which
both the baseline and SoPF failed is very large, attesting to the difficult nature of
the surface covariate.
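Only the discordant SF and FS counts drive the test. One plausible exact form, consistent with the tabulated p-values for experiments A and D (the precise variant used in Section 5.3.1 is an assumption here):

```python
from math import comb

def mcnemar_exact(sf, fs):
    """One-sided exact McNemar test on the discordant pairs: the probability,
    under the null of equal error rates, of at least `sf` of the sf + fs
    discordant sequences favoring the first algorithm."""
    n = sf + fs
    return sum(comb(n, k) for k in range(sf, n + 1)) / 2 ** n
```

For experiment D, `mcnemar_exact(16, 7)` evaluates to about 0.047, matching the 0.05 reported in Table 22.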
9.1.2 Silhouette Boundary Edges as Low Level Features
In this section we present a different approach from the one we used in our previous ex-
periments to compute low level features. For this case, we perform edge detection over the
binary silhouettes and use the resulting edge pixels as low level features. The purpose of
developing this strategy is to make the SoPF algorithm more robust to possible clothing
variations or other noise coming from the boundary gap between the edges of the person
and objects in the background. This approach is faster since it does not have to read the
original image to compute the edges and then select the ones falling within the binary
silhouette. Instead, the edges are computed directly over the binary silhouette image.

Figure 35. Performance of the SoPF representation using silhouette masked image edges as low level features. (a) CMC curves and (b) ROCs plotted up to a false alarm rate of 20%.

Table 23. Performance comparison of the baseline and SoPF algorithms when using silhouette boundary edges as low level features. (Correct identifications (Corr), total probes in gallery (Tot), and identification rate (PI)).

            Baseline         SoPF             Paired Outcomes    Mc Nemar's
Experiment  Corr/Tot   PI    Corr/Tot   PI    SS   SF   FS   FF  test p-value
A           56/71     79%    47/71     66%    42   14    5   10  0.03
B           27/41     66%    25/41     61%    22    5    3   11  0.36
C           23/41     56%    14/41     34%     8   15    6   12  0.04
D           19/66     29%     8/66     12%     5   14    3   44  < 0.01
E           10/42     24%     4/42     10%     2    8    2   30  0.05
F           20/66     30%     8/66     12%     5   15    3   43  < 0.01
G            4/42      9%     2/42      5%     0    4    2   36  0.33

The
rest of the process is the same. The results for the challenge experiments are shown in the
(a) CMC and (b) ROC curves of Fig. 36, where we can see the same behavior as in the
experiment in the last section; however, the performance for the first three experiments is
lower but close to that from the baseline algorithm, and the last four experiments are below
the baseline. Mc Nemar’s test was also applied in this case. In Table 23 we can see that
when using this strategy, only the performance from experiment B can be compared to that
from the baseline.
For completeness, we compare the performance of our two low level feature strategies
using Mc Nemar's test. The success and failure modes are presented in Table 24, where we can
see that for experiments A, B, and C, which investigate covariates while the walking surface
stays fixed, the difference in performance is statistically significant in favor of the first strategy when
we use silhouette masked image edges. For experiment E, the difference is significant in
favor of our second strategy when we use silhouette boundary edges. For the rest of the
experiments the difference is not significant.
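The p-values in these tables depend only on the discordant paired outcomes (SF and FS). A sketch of one common exact form of McNemar's test, a one-sided binomial sign test on the discordant pairs, which agrees with the tabled value for experiment A of Table 23 (we hedge: the dissertation does not spell out which variant was used):

```python
from math import comb

def mcnemar_exact_p(sf, fs):
    """One-sided exact sign test on the discordant pairs: probability
    of at least max(sf, fs) heads in sf + fs fair-coin tosses."""
    n = sf + fs
    k = max(sf, fs)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Experiment A of Table 23: baseline succeeds where SoPF fails 14 times,
# the reverse 5 times.
print(round(mcnemar_exact_p(14, 5), 2))  # 0.03
```

The concordant counts SS and FF do not enter the test; they only describe where the two algorithms agree.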
9.2 Using Manually Segmented Silhouettes
In this section, we used a few manually segmented silhouettes to test both the baseline
and the SoPF algorithms. One gait cycle was extracted from 19 Gallery sequences, 15
Figure 36. Performance of the SoPF representation using silhouette boundary edges as low level features. (a) CMC curves and (b) ROCs plotted up to a false alarm rate of 20%.
Table 24. Performance comparison of the SoPF algorithm when using silhouette masked image edges (SoPF-M) and silhouette boundary edges (SoPF-B) as low level features. (Correct identifications (Corr), total probes in gallery (Tot), and identification rate (PI)).
             SoPF-M             SoPF-B          Paired Outcomes    McNemar's
Experiment   Corr/Tot   PI     Corr/Tot   PI    SS  SF  FS  FF     test p-value
A            64/71      90%    47/71      66%   44  20   3   4     < 0.01
B            30/41      74%    25/41      61%   24   6   1  10     0.05
C            22/41      54%    14/41      34%    9  13   5  14     0.05
D            10/66      15%     8/66      12%    2   8   6  50     0.40
E             3/42       7%     4/42      10%    2   1   2  37     0.75
F             3/66       5%     8/66      12%    0   3   8  55     0.97
G             3/42       7%     2/42       5%    0   3   2  37     0.47
Figure 37. (a) Manually extracted silhouette and (b) automatically extracted silhouette.
Probe B sequences, and 12 Probe D sequences. This ground truth data allows us to obtain
recognition results that are not influenced by noise, shadows, or errors of the automatic
background subtraction techniques. We used the edges from the boundary of the
silhouettes as the low level features for the SoPF representation. Fig. 37
shows silhouettes from the same frame, (a) manually and (b) automatically extracted, in
which we can see how much background noise is eliminated in the ground truth data.
Gait recognition results are shown in Table 25 along with the number of successes and
failures from McNemar's test. In these two experiments the behavior of both algorithms
was similar, with the SoPF algorithm having more successes than the baseline. However, the
performance on experiment D, which involves the surface covariate, seems to be worse than that of
Table 25. Gait recognition results using ground truth silhouettes. (Correct identifications (Corr), total probes in gallery (Tot), and identification rate (PI)).
             Baseline           SoPF            Paired Outcomes
Experiment   Corr/Tot   PI     Corr/Tot   PI    SS  SF  FS  FF
B            6/15       40%    7/15       47%    5   1   2   7
D            1/12        8%    2/12       17%    0   1   2   9
experiment B, which involves the viewpoint covariate. For definitive conclusions, we will have to experiment with
a larger number of manually segmented silhouettes.
9.3 Performance Variation of Baseline and SoPF Algorithms due to Variations in Gallery Data
In this section, we consider the challenge dataset as the union of eight subsets {(G, A, R) ∪ (G, A, L) ∪ (G, B, R) ∪ (G, B, L) ∪ (C, A, R) ∪ (C, B, R) ∪ (C, A, L) ∪ (C, B, L)}. The notation
for each subset was defined in Section 8.3. So far, we have used the (G, A, R) subset as the
gallery for our experiments. Here we vary the choice of gallery and compute recognition
rates for all the challenge experiments (see Table 20) as described in Section 5.3.2. Table 26
depicts the relationships of the data subsets with the challenge experiments. The corresponding
results are shown in Table 27, which contains identification rates for the baseline algorithm,
and Table 28, which contains the rates for the SoPF algorithm. Figs. 38 to 44 show the
CMC curves for each experiment for both the baseline and SoPF algorithms. For the SoPF
algorithm, we consider the first approach described in Section 9.1.1, which uses silhouette
masked image edges as low level features. The parameters used for both the baseline
and SoPF algorithms are the same as in previous experiments. No optimization was
performed.
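The identification rates and CMC curves in these experiments come from ranking gallery subjects by their distance to each probe. A minimal sketch, not the dissertation's code: `cmc` is a hypothetical helper that assumes probe i's true match is gallery entry i of a probe-by-gallery distance matrix:

```python
import numpy as np

def cmc(dist, max_rank=20):
    """Cumulative Match Characteristic from a distance matrix where
    dist[i, j] is the probe-i / gallery-j distance and probe i's true
    identity is gallery index i. Returns rank-1..max_rank rates in %."""
    n = dist.shape[0]
    correct = dist[np.arange(n), np.arange(n)][:, None]
    # Rank of the correct subject: 1 + number of strictly closer entries.
    ranks = 1 + (dist < correct).sum(axis=1)
    return [100.0 * (ranks <= k).mean() for k in range(1, max_rank + 1)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic matrix whose diagonal (correct match) is always closest.
    d = rng.random((5, 5)) + 2 * (1 - np.eye(5))
    print(cmc(d, max_rank=3)[0])  # rank-1 identification rate: 100.0
```

The identification rate PI reported in the tables corresponds to the rank-1 point of such a curve.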
The results presented in this section show that walking surface is the covariate causing
most of the variation in performance, followed by view, with shoe being the most stable.
Within a walking surface (experiments A, B, and C), performance is better than across
surfaces (D, E, F, and G), and varies less. Experiment B shows the smallest
variation in performance for both the baseline and SoPF algorithms, and with similar
Table 26. Relationship between data subsets and challenge experiments when using different subsets as gallery.
                                    Experiments
Gallery    A        B        C         D          E          F          G
           (view)   (shoe)   (shoe     (surface)  (surface   (surface   (surface
                             + view)              + shoe)    + view)    + shoe
                                                                        + view)
(G,A,R)    (G,A,L)  (G,B,R)  (G,B,L)   (C,A,R)    (C,B,R)    (C,A,L)    (C,B,L)
(G,A,L)    (G,A,R)  (G,B,L)  (G,B,R)   (C,A,L)    (C,B,L)    (C,A,R)    (C,B,R)
(G,B,R)    (G,B,L)  (G,A,R)  (G,A,L)   (C,B,R)    (C,A,R)    (C,B,L)    (C,A,L)
(G,B,L)    (G,B,R)  (G,A,L)  (G,A,R)   (C,B,L)    (C,A,L)    (C,B,R)    (C,A,R)
(C,A,R)    (C,A,L)  (C,B,R)  (C,B,L)   (G,A,R)    (G,B,R)    (G,A,L)    (G,B,L)
(C,B,R)    (C,B,L)  (C,A,R)  (C,A,L)   (G,B,R)    (G,A,R)    (G,B,L)    (G,A,L)
(C,A,L)    (C,A,R)  (C,B,L)  (C,B,R)   (G,A,L)    (G,B,L)    (G,A,R)    (G,B,R)
(C,B,L)    (C,B,R)  (C,A,L)  (C,A,R)   (G,B,L)    (G,A,L)    (G,B,R)    (G,A,R)
Table 27. Performance variation of the baseline algorithm due to variations in gallery type.
                              Experiments
Gallery    A       B       C       D       E       F       G
(G,A,R)    79%     66%     56%     29%     24%     30%      9%
(G,A,L)    81%     68%     53%     38%     25%     25%     18%
(G,B,R)    83%     82%     44%     38%     28%     19%     24%
(G,B,L)    73%     73%     51%     41%     32%     29%     27%
(C,A,R)    56%     77%     47%     15%     14%     12%      8%
(C,B,R)    77%     81%     44%     22%     24%     19%     17%
(C,A,L)    70%     77%     49%     20%     16%     11%     16%
(C,B,L)    83%     77%     53%     16%     29%     16%     12%
Range      70-83%  66-82%  44-56%  15-38%  14-32%  11-30%  8-27%
recognition rates. Looking at the variation in performance for experiment A, we see that
viewpoint change on concrete impacts the baseline algorithm to some extent, but it
produces greater variations in the performance of the SoPF algorithm.
Figure 38. CMCs of (a) baseline and (b) SoPF algorithms for experiment A (view).
Figure 39. CMCs of (a) baseline and (b) SoPF algorithms for experiment B (shoe).
Figure 40. CMCs of (a) baseline and (b) SoPF algorithms for experiment C (view and shoe).
Figure 41. CMCs of (a) baseline and (b) SoPF algorithms for experiment D (surface).
Figure 42. CMCs of (a) baseline and (b) SoPF algorithms for experiment E (surface and shoe).
Figure 43. CMCs of (a) baseline and (b) SoPF algorithms for experiment F (surface and view).
Figure 44. CMCs of (a) baseline and (b) SoPF algorithms for experiment G (surface, shoe, and view).
Table 28. Performance variation of the SoPF algorithm due to variations in gallery type.
                              Experiments
Gallery    A       B       C       D       E       F       G
(G,A,R)    90%     74%     54%     15%      7%      5%      7%
(G,A,L)    89%     79%     42%      6%      8%      8%      8%
(G,B,R)    88%     76%     49%      8%      8%      6%      6%
(G,B,L)    85%     81%     62%      6%      6%     13%      8%
(C,A,R)    44%     70%     26%     16%     13%      6%      3%
(C,B,R)    59%     65%     38%     11%     12%     11%      8%
(C,A,L)    54%     65%     32%      5%      3%      3%      3%
(C,B,L)    58%     71%     26%      6%     10%      2%     10%
Range      44-90%  65-81%  26-62%  6-16%   3-13%   2-13%   3-10%
CHAPTER 10
CONCLUSIONS
We presented a statistical framework for motion analysis that tracks the variation of non-
stationarity in the distributions of relations among image features in individual frames.
We proposed the concept of the Space of Probability Functions (SoPF) that allows us to
capture the non-stationary variations. Among the attractive features of this approach are
(a) no feature level tracking or correspondence is necessary, (b) segmentation of object from
background need not be perfect, (c) there is no need for explicit object shape models, and
(d) movement between frames need not be on the order of one or two pixels. We presented
extensive experiments. First, we studied the robustness of the SoPF representation with
respect to segmentation and scale changes. Second, we explored the possibility of recognition
from walking, jogging, and running gaits. Third, we studied the variation of walking gait
with respect to viewpoint changes. Fourth, we benchmarked the performance using the
gait challenge problem over a large gait dataset. Qualitative conclusions that can be drawn
from the studies are: (a) the SoPF representation is robust with respect to segmentation
and scale changes, (b) the subject is a far greater source of gait variation than viewpoint,
motion types, or direction of motion, (c) it is possible to recognize persons from jogging
and running gaits and not just from walking gait, (d) gait-based recognition need not be
restricted to frontal-parallel views, since walking gait viewed from 22.5° and 45° results in
recognition similar to that from frontal-parallel views, and (e) the effects of different surface
types are statistically more significant than those of shoe or viewpoint.
For future work, we will consider using a different technique for the recognition
stage of the SoPF representation to replace the time normalized and time un-normalized
distances. For instance, we are considering Auto Regressive Moving Average models, which
represent a time series as a set of coefficients that can be used to describe a gait pattern,
SoPF traces in our case, for classification. From a data complexity point of view, we
expect to increase the size of the challenge dataset to over 100 subjects, which will
include more covariates. The next dataset to become available will include
persons carrying objects, which will be interesting to investigate. A later dataset
will include the same persons after some period of time. This
"persons over time" covariate will allow us to investigate variations with respect to physical
changes (e.g., hair length) and clothing.
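As a rough illustration of how such coefficients could serve as a fixed-length gait signature, the following is a hedged AR-only sketch fitted by least squares; `ar_coefficients` is a hypothetical helper, not the ARMA formulation itself, which remains future work:

```python
import numpy as np

def ar_coefficients(trace, order=4):
    """Fit an AR(order) model x[t] ~ sum_k a[k] * x[t-k] to a 1-D trace
    (a SoPF trace here) by least squares; the coefficient vector is a
    fixed-length signature that could be compared across sequences."""
    x = np.asarray(trace, dtype=float)
    # Design matrix: row for time t holds the lags x[t-1], ..., x[t-order].
    X = np.column_stack(
        [x[order - k - 1 : len(x) - k - 1] for k in range(order)]
    )
    a, *_ = np.linalg.lstsq(X, x[order:], rcond=None)
    return a
```

Two gait patterns could then be compared by the Euclidean distance between their coefficient vectors, sidestepping explicit time normalization.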
ABOUT THE AUTHOR
Isidro Robledo Vega earned his B.S. in Industrial Engineering in Electronics in 1989 and
his M.S. in Electronics Engineering with a Computer Science option in 1996 at the Instituto
Tecnologico de Chihuahua in Chihuahua, Mexico. His research interests include computer
vision, digital image processing, and artificial intelligence.