Office of Graduate Studies
University of South Florida
Tampa, Florida
CERTIFICATE OF APPROVAL
This is to certify that the dissertation of
ISIDRO ROBLEDO VEGA
in the graduate degree program of Computer Science and Engineering was approved on August 22, 2002
for the Doctor of Philosophy degree.
Examining Committee:
Major Professor: Sudeep Sarkar, Ph.D.
Member: Dmitry Goldgof, Ph.D.
Member: Eugene Fink, Ph.D.
Member: Tapas Das, Ph.D.
Member: Thomas Sanocki, Ph.D.
Member: Kevin Bowyer, Ph.D.
Committee Verification:
Associate Dean
MOTION MODEL BASED ON STATISTICS OF FEATURE RELATIONS:
HUMAN IDENTIFICATION FROM GAIT
by
ISIDRO ROBLEDO VEGA
A dissertation submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
Department of Computer Science and Engineering
College of Engineering
University of South Florida
Date of Approval: August 22, 2002
Major Professor: Sudeep Sarkar, Ph.D.
© Copyright by Isidro Robledo Vega 2002. All rights reserved.
DEDICATION
To Alex and Myrna
ACKNOWLEDGEMENTS
I want to thank CONACYT-SEP-Mexico for their support during my Ph.D. studies. This
research was supported by funds from National Science Foundation grants EIA 0130768 and
IIS-9907141 and DARPA HumanID program under contract AFOSR-F49620-00-1-00388. I
also want to thank the members of my committee for spending their time reviewing this
manuscript; to Dr. Jonathon Phillips, Dr. Kevin Bowyer, and Patrick Grother for their
contributions in the design of the gait challenge problem; to Stan Janet and Karen Marshall
at NIST for helping us in processing the gait challenge dataset and creating the bounding
box information for the gait sequences; to Dr. Patrick Flynn at University of Notre Dame
for testing the baseline algorithm code and scripts; to my friends at the computer vision
lab, Paddu, Earnie, Jaesik, Yong and Tong for sharing their ideas and helping me in many
different ways to accomplish my goals; to Zongyi for sharing his ideas to improve the
computation of binary silhouettes; to Ayush for keeping the protocol of the data acquisitions;
to Laura, Adebola and Christy for manually extracting silhouettes; and especially to my
advisor Dr. Sudeep Sarkar for accepting me as his student and preparing me during the
last two and a half years to be a good researcher and contribute to computer vision. Thanks
to my parents Apolinar and Olivia; my sisters Domy, Laura, and Claudia; my nieces Gaby,
Paulina, and Sofia; and my brothers-in-law Rene and Cesar for their love and support for
my family and me during these last four years away from home. Finally, thanks to my wife
Myrna and my son Alex for being my greatest source of inspiration.
TABLE OF CONTENTS
LIST OF TABLES iii
LIST OF FIGURES v
ABSTRACT viii
CHAPTER 1 INTRODUCTION 1
CHAPTER 2 RELATED WORK 8
2.1 Biomechanics of Human Gait 8
2.2 Visual Perception of Human Gait 11
2.3 Human Gait Analysis Using Computer Vision Techniques 13
CHAPTER 3 MOTION MODELING: THEORY 17
3.1 Relational Distributions 17
3.1.1 Moving Edge Based Features 19
3.1.2 Scaling Constant D 22
3.2 Space of Probability Functions 23
3.3 Similarity Measures 27
3.3.1 Time Un-normalized Distance 28
3.3.2 Time Normalized Distance 28
3.3.3 Similarity Measure Based on Multiple Gait Cycles 29
CHAPTER 4 INSIGHTS INTO THE SOPF REPRESENTATION THROUGH AN EXAMPLE 31
4.1 Can We Discriminate Between Motion Types Across Persons? 34
4.2 For Each Person, Can We Discriminate Between Motion Types? 36
4.3 Is Identifying Persons Based on Motion Gait Possible? 36
4.4 Is the SoPF Representation Robust with Respect to Segmentation Errors? 36
4.5 Is the SoPF Representation Stable with Respect to Scale Variations? 37
4.6 PCA of the Edge Images 40
CHAPTER 5 EVALUATION METHODOLOGY 43
5.1 Covariates 43
5.1.1 Analysis of Variance (ANOVA) 44
5.2 Performance Evaluation 44
5.2.1 Identification 45
5.2.2 Verification 45
5.3 Statistical Methods for the Evaluation of Human Identification Algorithms 45
5.3.1 McNemar's Test 47
5.3.2 Performance Variations due to Variation in Gallery Data 48
CHAPTER 6 HUMAN IDENTIFICATION FROM DIFFERENT GAIT TYPES 50
6.1 Analysis of Covariates 50
6.2 Gait-Based Recognition Experiments 52
CHAPTER 7 WALKING GAIT BASED IDENTIFICATION FROM DIFFERENT VIEW ANGLES 55
7.1 Analysis of Covariates 55
7.2 Gait-Based Recognition Experiments 58
CHAPTER 8 BENCHMARKING WALKING GAIT BASED IDENTIFICATION 62
8.1 The Gait Challenge Problem 62
8.2 The Data Set 62
8.3 Challenge Experiments 66
8.4 Baseline Algorithm 66
8.4.1 Silhouette Extraction 67
8.4.2 Similarity Computation 69
8.4.3 Parameters 70
8.5 Baseline Performance 71
CHAPTER 9 PERFORMANCE OF THE SOPF REPRESENTATION 74
9.1 Varying the Type of Low Level Features 74
9.1.1 Silhouette Masked Image Edges as Low Level Features 75
9.1.2 Silhouette Boundary Edges as Low Level Features 76
9.2 Using Manually Segmented Silhouettes 78
9.3 Performance Variation of Baseline and SoPF Algorithms due to Variations in Gallery Data 81
CHAPTER 10 CONCLUSIONS 91
REFERENCES 93
ABOUT THE AUTHOR End Page
LIST OF TABLES
Table 1. Summary of recent research on gait-based recognition using computer vision techniques. 16
Table 2. Distance between the traces through the SoPF of two different cycles of motion for the three persons and three motion types dataset. 34
Table 3. Summary statistics of distances between the traces through the SoPF for the three persons and three motion types dataset. 37
Table 4. Distance between the traces through the SoPF of two different cycles of motion for the three persons and three motion types with a moderate amount of segmentation noise (Walking (W), Jogging (J), and Running (R)). 38
Table 5. Summary statistics of the distances between the traces through the SoPF for sequences with a moderate amount of segmentation noise. 39
Table 6. Distance between the traces through the SoPF of two different cycles of motion for the three persons and three motion types with a large amount of segmentation noise (Walking (W), Jogging (J), and Running (R)). 39
Table 7. Summary statistics of the distances between the traces through the SoPF for sequences with a large amount of segmentation noise. 40
Table 8. Distance between the traces through the SoPF of two different half-scaled cycles of motion for the three persons and three motion types. 41
Table 9. Summary statistics of the distances between the traces through the SoPF of the half-scaled version of the testing set, keeping the training set at the original size. 41
Table 10. Sample rows from a file in SAS format for the experiment on different motion types. 44
Table 11. Paired data from algorithms being compared with McNemar's test. 47
Table 12. ANOVA table with results for different motion types experiments. 52
Table 13. Number of persons correctly identified for different motion types experiments. 53
Table 14. Distance between the traces through the SoPF of two different cycles of walking motion for 10 persons. 53
Table 15. Distance between the traces through the SoPF of two different cycles of jogging motion for 10 persons. 54
Table 16. Distance between the traces through the SoPF of two different cycles of running motion for 10 persons. 54
Table 17. ANOVA table with results for different view angle experiments. 57
Table 18. Gallery and probe sets for gait recognition experiments over the 20 person database. 58
Table 19. Number of sequences for each combination of possible surface (G or C), shoe (A or B), and camera view (L or R). 65
Table 20. The probe set for each of the challenge experiments. 66
Table 21. Baseline performance for the challenge experiments in terms of the identification rate PI at ranks 1 and 5, verification rate PV at a false alarm rate of 10%, and area under ROC (AUC). 71
Table 22. Performance comparison of baseline and SoPF algorithms when using silhouette masked image edges as low level features. 76
Table 23. Performance comparison of the baseline and SoPF algorithms when using silhouette boundary edges as low level features. 78
Table 24. Performance comparison of the SoPF algorithm when using silhouette masked image edges (SoPF-M) and silhouette boundary edges (SoPF-B) as low level features. 80
Table 25. Gait recognition results using ground truth silhouettes. 81
Table 26. Relationship between data subsets and challenge experiments when using different subsets as gallery. 82
Table 27. Performance variation of the baseline algorithm due to variations in gallery type. 82
Table 28. Performance variation of the SoPF algorithm due to variations in gallery type. 90
LIST OF FIGURES
Figure 1. Different phases of a walking gait cycle. 3
Figure 2. Image processing steps to build the SoPF. 5
Figure 3. The process to compute similarity between two image sequences. 6
Figure 4. Samples of Eadweard Muybridge photographs from “The Humans in Motion.” 9
Figure 5. Five point light display frames of a human walking. 11
Figure 6. An empirical sampling-based interpretation of relational distributions. 18
Figure 7. Detection of edges in motion using background subtraction. 19
Figure 8. Detection of edges in motion using frame differencing. 20
Figure 9. Edge pixel based 2-ary relational distribution. 21
Figure 10. Edge pixel based 3-ary relational distribution. 22
Figure 11. Fitting of a line through the height curve generated from a walking cycle of motion at (a) 0◦ or frontal-parallel, (b) 22.5◦, and (c) 45◦ with respect to the image plane to determine the scaling constant D. 24
Figure 12. Some configurations of legs in motion in (a), (c) and (e) with their corresponding 2-ary relational distributions in (b), (d) and (f). 25
Figure 13. Similarity measure between sequences with multiple gait cycles. 30
Figure 14. Two consecutive frames from a running sequence. 31
Figure 15. Ten most dominant dimensions of SoPF for the treadmill sequences. 32
Figure 16. Eigenvalues associated with the SoPF of 2-ary relational distributions. 33
Figure 17. Variation of (a) c1(t) and (b) c2(t) within each motion cycle for each of the three persons and motion types. 35
Figure 18. (a) and (b) show some typical frames where the segmentation process misses significant portions of the legs. (c) An under-segmented frame. (d) A more under-segmented frame. 38
Figure 19. Comparison of the largest eigenvalues associated with the edge images of people in motion and those associated with the SoPF of 2-ary relational distributions of the same images. 42
Figure 20. The process of evaluating the performance of our algorithms. 46
Figure 21. Sample frames of a person (a) walking, (b) jogging, and (c) running. 51
Figure 22. Ten most dominant dimensions of the SoPF for the different motion types database consisting of 10 persons. 51
Figure 23. Setup for data acquisition of different view angle walking sequences. 56
Figure 24. Sample frames from the same person walking (a) frontal-parallel, (b) 22.5◦, and (c) 45◦ with respect to the image plane. 57
Figure 25. Ten most dominant dimensions of the SoPF for 20 person database. 58
Figure 26. (a) CMC and (b) ROC curves for experiments 1, 2 and 3, studying identification and verification rates at varying viewpoints. 60
Figure 27. (a) CMC and (b) ROC curves for experiments 1, 4 and 5, studying variation of identification and verification rates with change in viewpoint. 61
Figure 28. Camera setup for the gait data acquisition. 63
Figure 29. Frames from (a) the left camera for concrete surface, (b) the right camera for concrete surface, (c) the left camera for grass surface, and (d) the right camera for grass surface. 65
Figure 30. Sample bounding boxed image data as viewed from (a) left camera on concrete, (b) right camera on concrete, (c) left camera on grass, and (d) right camera on grass. 67
Figure 31. Estimated mean background for a sequence on (a) concrete and (c) grass. Variance of the RGB channels in the background pixels on (b) concrete and (d) grass. 68
Figure 32. The bottom row shows sample silhouette frames depicting the nature of segmentation issues that need to be tackled. 69
Figure 33. Baseline performance for the challenge experiments, (a) CMC curves and (b) ROCs plotted up to a false alarm rate of 20%. 72
Figure 34. Moving edges (a) using the binary silhouettes as masks over the edges of the original images and (b) directly from the binary silhouettes. 75
Figure 35. Performance of the SoPF representation using silhouette masked image edges as low level features. 77
Figure 36. Performance of the SoPF representation using silhouette boundary edges as low level features. 79
Figure 37. (a) Manually extracted silhouette and (b) automatically extracted silhouette. 80
Figure 38. CMCs of (a) baseline and (b) SoPF algorithms for experiment A (view). 83
Figure 39. CMCs of (a) baseline and (b) SoPF algorithms for experiment B (shoe). 84
Figure 40. CMCs of (a) baseline and (b) SoPF algorithms for experiment C (view and shoe). 85
Figure 41. CMCs of (a) baseline and (b) SoPF algorithms for experiment D (surface). 86
Figure 42. CMCs of (a) baseline and (b) SoPF algorithms for experiment E (surface and shoe). 87
Figure 43. CMCs of (a) baseline and (b) SoPF algorithms for experiment F (surface and view). 88
Figure 44. CMCs of (a) baseline and (b) SoPF algorithms for experiment G (surface, shoe and view). 89
MOTION MODEL BASED ON STATISTICS OF FEATURE RELATIONS:
HUMAN IDENTIFICATION FROM GAIT
by
ISIDRO ROBLEDO VEGA
An Abstract
of a dissertation submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
Department of Computer Science and Engineering
College of Engineering
University of South Florida
Date of Approval: August 22, 2002
Major Professor: Sudeep Sarkar, Ph.D.
There is renewed interest in gait analysis in the computer vision community, not from
a structure-from-motion point of view, as was the past emphasis, but from the intriguing
possibility of human identification from gait. A novel representation scheme for human
gait analysis is presented here that is based on just the evolution in the statistics of the
relationships among the detected image features, without the need for object models, perfect
segmentation, or part level tracking. Instead of the statistics of the feature attributes
themselves, the statistics of the feature relations are represented as a point in a space
where the Euclidean distance is related to the Bhattacharyya distance between probability
functions. Different motion types sweep out different traces in this Space of Probability
Functions (SoPF). The effectiveness of this SoPF representation is shown on four data sets
of image sequences of humans engaged in walking, jogging or running. The first set of
sequences, which was designed to study the variation with respect to segmentation errors
and scale changes, is a small one consisting of 3 persons on a treadmill in an indoor setting.
The second set of sequences, which was designed to study the possibility of recognizing
persons from walking, jogging, and running gaits, is from 10 persons in outdoor settings.
The third set of sequences, which was designed to study viewpoint variations, consists
of 20 persons walking on paths inclined at 0◦, 22.5◦, and 45◦ with respect to the image
plane. The fourth set of sequences, which was designed to study variations due to footwear,
walking surface and view from two cameras, consists of 74 persons walking elliptical paths.
The experimental results show that (a) the SoPF representation is robust with respect to
segmentation errors and scale changes, (b) personal attributes are by far the largest source
of variation when compared to factors such as direction of motion, viewpoint, and motion
type, (c) it is possible to recognize persons not only from walking gait, but also from their
jogging and running gaits, (d) identification of persons is possible from walking sequences
viewed at angles other than frontal-parallel as long as the gallery contains the gait from the
same viewpoint as the probe, and lastly (e) walking surface variations have a significant effect
on performance.
Abstract Approved:
Major Professor: Sudeep Sarkar, Ph.D.
Professor, Department of Computer Science and Engineering
Date Approved:
CHAPTER 1
INTRODUCTION
Motion analysis deals with input from different sources, for example, static camera acquir-
ing moving objects, moving camera acquiring information about static or moving objects,
or static or moving camera acquiring images of static or moving objects with light varia-
tions. It is hard to encapsulate all of motion analysis research in computer vision. The
diversity of goals and tasks is staggering. It can include tasks such as inferring motion pa-
rameters, distinguishing rigid motion from non-rigid motion [1], computing the periodicity
of motion [2] [3] [4], or even using the motion information to infer object identities. This
last task, that is, motion-based recognition, is relevant to our work. In particular, we are
interested in using high-level complex motion patterns, as exhibited when someone moves,
to recognize that person. There are many possible approaches to this problem, many of
which we discuss in the next chapter. We are, however, interested in a method that is
robust with respect to segmentation errors, does not require (point or extended) feature
correspondences, and does not need part or object identities. The last condition is
based on the observation that high level complex motion analysis need not be contingent
on part or object recognition [5] [6]. Many have explored methods that require feature
correspondences in terms of optic flow fields [7] [8] [9] [3] or object parts [10], but the per-
formance of these methods is strongly affected by noise, image resolution, and the extent of
frame-to-frame motion. The approaches that avoid these problems rely on more area based
measures, such as image or object self-similarity, or behavior over a long time period [2] [4].
We propose a novel strategy that emphasizes the evolution of spatial relationships among
features with motion, rather than the attributes of the individual features [11] [12].
With motion, the statistics of the relationships among the image features change. This
change or non-stationarity in relational statistics is not random, but follows the motion
pattern. The shape of the probability function governing the distribution of the inter-
feature relations, which can be estimated by the normalized histogram of observed values,
changes as parts of the object move. We have developed the concept of a space over these
probability functions, which we refer to as the SoPF (Space of Probability Functions),
to study the trend of change in their shapes. Distances in this space are related to the
Bhattacharyya distance between probability mass functions. Each motion type creates a
trace in this space. The attractive aspects of this approach are that:
(a) it does not require perfect segmentation of the object from the background,
(b) it does not require feature tracking,
(c) it is amenable to learning, and
(d) there is no assumption about single pixel movement between frames.
It is also worthwhile pointing out that by focusing on the change in relational parameters
over time we bring the dynamic aspects of motion to the fore.
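The link between Euclidean distances in such an embedded space and the Bhattacharyya distance can be illustrated with a standard identity (general probability math, not the dissertation's exact construction): the squared Euclidean distance between the square roots of two probability mass functions equals 2 minus twice their Bhattacharyya coefficient.

```python
import numpy as np

def bhattacharyya_coefficient(p, q):
    # BC(p, q) = sum_i sqrt(p_i * q_i); equals 1 when p == q.
    return float(np.sum(np.sqrt(p * q)))

# Two toy probability mass functions standing in for normalized
# relational histograms (illustrative values only).
p = np.array([0.2, 0.5, 0.3])
q = np.array([0.3, 0.4, 0.3])

# Squared Euclidean distance between the square-root embeddings equals
# 2 - 2 * BC(p, q), tying Euclidean geometry to the Bhattacharyya coefficient.
d2 = float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))
assert abs(d2 - (2.0 - 2.0 * bhattacharyya_coefficient(p, q))) < 1e-12
```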
The use of multidimensional histograms, even relational ones, in computer vision is
not new. They have been used extensively in image databases [13], recognition [14], and
shape modeling [15]. The novelty of the present contribution is that we offer a strategy for
incorporating dynamic aspects and use it for motion-based recognition of humans.
Interest in applications for human identification is currently very high. Biometrics is the measurement of biological or behavioral characteristics for the identification of individuals.
These characteristics can be fingerprint, face, hand geometry, voice, DNA, iris, retina, ear,
gait, etc. Each biometric has different properties. Technologies that facilitate human iden-
tification at a distance are of particular interest as they are not intrusive nor do they require
contact. Gait, or the way a person walks, is such a biometric and has the advantage that
it can be collected at greater distances and does not require very co-operative subjects.
The Webster Collegiate Dictionary defines gait as “a manner of walking.” Because of the
periodic nature of human walking, one gait cycle is considered the unit of analysis in
most systems devoted to human identification based on gait. A gait cycle, as defined
by Murray et al. [16], is the time interval starting when the right heel strikes the floor, going
Figure 1. Different phases of a walking gait cycle.
to the swing of the left leg advancing forward, then the left heel striking the floor and the
right leg swinging forward, and ending when the right heel strikes the floor again. This
process is illustrated in Fig. 1. Four phases can be distinguished in a gait cycle:
(a) Right stance phase is the period of time the right foot is in contact with the floor. It
begins with a “right heel-strike” and ends with a “right toe-off.”
(b) Left swing phase is the period of time the left foot is not in contact with the floor. It
begins with a “left toe-off” and ends with a “left heel-strike.”
(c) Left stance phase is the period of time the left foot is in contact with the floor. It
begins with a “left heel-strike” and ends with a “left toe-off.”
(d) Right swing phase is the period of time the right foot is not in contact with the floor.
It begins with a “right toe-off” and ends with a “right heel-strike.”
When the left and right stance phases overlap, both feet are in contact with the
floor; this period is also called “double limb support.” The left stance phase is not completed
at the end of the gait cycle; it finishes with the “left toe-off” of the next cycle. Murray et
al. [16] suggest that if all the components of the gait movements are considered, then gait
can be unique. About twenty gait components can be considered, but some of them can be
very difficult to capture by computer vision systems since they can only be measured from
top views of the subjects (e.g. pelvis, thorax and ankle rotation).
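The four phases above, and the double limb support period, can be sketched as intervals. All event times below are hypothetical (a stance phase of roughly 60% of the cycle is typical); real values would come from video or motion capture.

```python
# Hypothetical heel-strike / toe-off times (seconds) within one gait cycle.
r_heel_strike, r_toe_off = 0.00, 0.62
l_toe_off, l_heel_strike = 0.12, 0.50

phases = {
    "right_stance": (r_heel_strike, r_toe_off),
    "left_swing": (l_toe_off, l_heel_strike),
    "left_stance": (l_heel_strike, 1.12),  # completes at the next cycle's left toe-off
    "right_swing": (r_toe_off, 1.00),      # ends at the next right heel-strike
}

def overlap(a, b):
    # Overlap of two stance intervals: both feet on the floor
    # ("double limb support").
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None

double_support = overlap(phases["right_stance"], phases["left_stance"])
print(double_support)  # (0.5, 0.62)
```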
In the context of human identification based on gait, the specific questions that we
explore in this dissertation using our statistical motion model are:
(a) Can we identify persons from not just walking gait but jogging and running as well?
(b) Is gait viewed frontal-parallel (which is the current practice) the only possibility?
(c) Can we identify humans from gait viewed at 22.5◦ and 45◦?
(d) Is it possible to do gait-based identification using a representation that is robust with
respect to segmentation and does not involve part level tracking?
(e) How is gait-based identification dependent on covariates such as viewpoint, shoe type,
or surface?
(f) What is the performance of gait-based identification on datasets with a large number
of subjects?
Our system for gait-based human identification using statistical motion models involves
different stages, which we briefly introduce in the following paragraphs.
The discussion here is necessarily terse. The new concepts of relational distributions and
space of probability functions (SoPF) will become clearer in later chapters. The purpose
here is just to provide a quick overview.
In the first stage, we process a sequence of images containing a person in motion with
the purpose of segmenting the person from the background. The outputs of this process
are binary silhouettes. The heights of these silhouettes are calculated and used to compute
a scale normalization factor or scaling constant.
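A minimal sketch of this first stage, using a simple per-pixel thresholded background difference (the actual segmentation used later, in Chapter 8, is more involved and includes cleanup of noisy silhouettes):

```python
import numpy as np

def binary_silhouette(frame, background, threshold=30.0):
    # Mark as foreground any pixel that differs enough from the
    # background model.
    diff = np.abs(frame.astype(float) - background.astype(float))
    return diff > threshold

def silhouette_height(mask):
    # Height in pixels of the silhouette, from which a scale
    # normalization factor can be derived.
    rows = np.flatnonzero(mask.any(axis=1))
    return 0 if rows.size == 0 else int(rows[-1] - rows[0] + 1)

# Toy 6x4 grayscale frame with a bright "person" spanning rows 1..4.
background = np.zeros((6, 4))
frame = background.copy()
frame[1:5, 1:3] = 200.0
print(silhouette_height(binary_silhouette(frame, background)))  # 4
```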
In the second stage, we use the binary silhouettes to extract low level features and
compute relational distributions over them. The outputs of this stage are relational dis-
tributions in the form of histograms that accumulate the occurrences of each relationship
between paired image features. Each frame in an image sequence has its corresponding
relational distribution. In our experience, a typical image sequence of a walking
gait cycle will contain between 28 and 42 frames if acquired at 30 frames per second. This
variation is due to walking speed and stride length differences between persons.
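A minimal sketch of a 2-ary relational distribution, using unsigned vertical and horizontal distances between pairs of edge pixels as the relational attributes (the exact attributes, relations, and bin counts used in this work are given in Chapter 3; the ones here are illustrative):

```python
import numpy as np
from itertools import combinations

def relational_distribution(edge_pixels, D, bins=8):
    # Accumulate, over all pairs of edge pixels, the vertical and horizontal
    # distances (scaled into [0, 1) by the scaling constant D) into a 2D
    # histogram; normalizing it estimates the probability function over
    # feature relations for this frame.
    h = np.zeros((bins, bins))
    for (r1, c1), (r2, c2) in combinations(edge_pixels, 2):
        dr = min(abs(r1 - r2) / D, 1.0 - 1e-9)
        dc = min(abs(c1 - c2) / D, 1.0 - 1e-9)
        h[int(dr * bins), int(dc * bins)] += 1
    return h / h.sum()

rd = relational_distribution([(0, 0), (3, 0), (0, 4)], D=10.0)
```

Each frame in a sequence yields one such normalized histogram, which is the input to the next stage.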
Figure 2. Image processing steps to build the SoPF.
In the third stage, or training stage, the dataset is partitioned to generate a training set
of relational distributions, which we use to build a space of probability functions (SoPF)
using principal component analysis (PCA). Once the SoPF is constructed, the relational
distributions in the training set are represented as points in this space. The output of this
stage is a set of point coordinates for each relational distribution in the training set. Fig. 2
illustrates the training process followed to arrive at the SoPF.
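The PCA construction of the SoPF can be sketched as follows. This is a straightforward SVD-based PCA; the variable names are ours, not the dissertation's.

```python
import numpy as np

def build_sopf(train_hists, k=10):
    # Each flattened relational distribution is one row of X.
    X = np.asarray([h.ravel() for h in train_hists])
    mean = X.mean(axis=0)
    # Principal axes of the mean-centered training data via SVD.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:k]                 # top-k eigenvectors span the SoPF
    coords = (X - mean) @ basis.T  # training points as SoPF coordinates
    return mean, basis, coords

def project(hist, mean, basis):
    # Any relational distribution becomes a point in the SoPF.
    return (hist.ravel() - mean) @ basis.T

rng = np.random.default_rng(0)
train = [rng.random((8, 8)) for _ in range(6)]  # stand-in histograms
mean, basis, coords = build_sopf(train, k=3)
print(coords.shape)  # (6, 3)
```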
In the fourth stage, or testing stage, the relational distributions of a test sequence are projected onto
the SoPF to obtain their point coordinates. The sequence of point coordinates representing
relational distributions from a gait cycle traces out a path in the SoPF. We use the Euclidean
distance between the traces of two gait cycles as a similarity measure. This distance can be
time normalized to compute similarity between cycles of dissimilar gait (i.e. walking versus
jogging) or un-normalized distances can be used to measure similarity between cycles from
same gait (i.e. walking versus walking). Based on these similarity values, we compute
Figure 3. The process to compute similarity between two image sequences.
performance measures such as identification and verification rates. Fig. 3 shows the process
for computing the similarity measure between two image sequences of persons in motion.
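Assuming each trace is a sequence of SoPF coordinates (one per frame), the two distance variants described above can be sketched as:

```python
import numpy as np

def trace_distance(trace_a, trace_b):
    # Time-unnormalized distance: mean Euclidean distance between
    # corresponding points of two equal-length traces through the SoPF.
    a, b = np.asarray(trace_a, float), np.asarray(trace_b, float)
    return float(np.mean(np.linalg.norm(a - b, axis=1)))

def resample(trace, n):
    # Linearly resample a trace to n points so cycles of different
    # durations (e.g. walking versus jogging) become comparable.
    t = np.asarray(trace, float)
    src = np.linspace(0.0, len(t) - 1, n)
    idx = np.arange(len(t))
    return np.stack([np.interp(src, idx, t[:, d])
                     for d in range(t.shape[1])], axis=1)

def time_normalized_distance(trace_a, trace_b, n=32):
    return trace_distance(resample(trace_a, n), resample(trace_b, n))

# Two toy 2D traces of different lengths, offset by one unit
# in the second dimension.
print(time_normalized_distance([[0, 0], [2, 0]], [[0, 1], [1, 1], [2, 1]]))  # 1.0
```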
The organization of this dissertation is as follows. Overview of research done on gait
analysis from biomechanics and visual perception points of view, along with the state of
the art in gait-based human identification using computer vision techniques, is presented
in Chapter 2. Then we introduce our framework for motion analysis in Chapter 3, starting
with the relational distributions, the development of the concept of the Space of Probability
Functions (SoPF), and the method used to measure similarity between image sequences.
Chapter 4 contains a set of introductory experiments over a small dataset of three persons.
Chapter 5 describes our evaluation methodology. Human identification experiments from
walking, jogging and running gaits over a dataset of 10 persons are presented in Chapter 6.
Experiments with walking gait viewed at different
angles over a dataset of 20 persons are presented in Chapter 7. Chapter 8 introduces a
larger dataset containing 74 subjects, a baseline algorithm and a set of experiments that we
refer to as the gait challenge problem, which serves as a benchmark for human gait based
identification. We also present the performance of our system over this large dataset in
Chapter 9. Finally, we conclude with Chapter 10.
CHAPTER 2
RELATED WORK
2.1 Biomechanics of Human Gait
Biomechanics is defined as “the scientific study of the mechanics of biological and especially
muscular activity” by the Webster Collegiate Dictionary; in this context it is sometimes
referred to simply as “gait analysis.”
Gait analysis as a science started a long time ago. References to experiments done by
Aristotle (384–322 BC), Leonardo da Vinci (1452–1519) and others can be found in [17].
Photographic analysis started with Eadweard Muybridge in the 1870s, who analyzed horses
in motion and showed the gallop of a horse to be a four-beat gait. After his success with
analyzing horse motion, he started taking photos of other animals, including humans. Fig. 4
shows some photographs of a man walking at ordinary speed, which were digitized from [18].
The squared pattern in the background was used to measure displacement. These frames
were captured using a number of precisely time-synchronized cameras.
Research done on the mechanical aspects of human gait is multidisciplinary and can
include fields such as anatomy, physical therapy, prosthetics, orthopedics, rehabilitation,
ergonomics, physiology, and sports science. The predominant applications involve medical
purposes. Hip, knee and ankle movement, and flexion are typically considered in clinical
gait analysis to diagnose abnormalities. Sensors used to capture these features include
3D electromagnetic motion trackers, force platforms, electromyography, and visual markers
with video systems based on fast shutter speed CCD cameras in different configurations for
2D or 3D data capture.
Aminian et al. [19] present an ambulatory system for gait analysis that can segment
gait phases. It uses small sensors, called gyroscopes, to measure the velocity of angular
rotation. They attached the sensors to the shanks of subjects. The signals produced are
Figure 4. Samples of Eadweard Muybridge photographs from “The Humans in Motion.”
then processed using multi-resolution wavelet decomposition to enhance heel-strike and
toe-off negative peaks in these signals.
Pappas et al. [20] also use gyroscopes in their gait phase detection system (GPDS) to
segment gait cycles into heel off, swing, heel strike and stance phases. It is composed of one
gyroscope and three force sensitive resistors installed in a shoe sole and a portable signal
processing board. The system was tested in indoor and outdoor environments, showing
robustness to diverse walking conditions.
Huitema et al. [21] introduce a low cost ultrasonic motion analysis system for the mea-
surement of spatial and temporal gait parameters such as duration of stance and swing
phases, and step and stride lengths. They put a fixed ultrasonic transmitter on the floor
and installed receivers on the subject’s feet. They used heel strike and toe off signals to
measure the duration of stance and swing phases. According to them, walking speed is not
constant over a cycle for asymmetric gaits and this gait pathology can be captured by their
system.
Sadeghi et al. [22] present an extensive literature review with more than 160 references
and clarify the concepts of gait symmetry, gait asymmetry, limb dominance, and laterality.
They try to answer questions such as: do the lower limbs behave symmetrically in
able-bodied gait, and how does limb dominance affect the symmetry of lower limb behavior?
They review research work supporting both gait symmetry and gait asymmetry, and mention
that there are not enough studies showing the effects of limb dominance or laterality on gait
behavior. Their conclusion is that in most of the studies, symmetry is assumed to simplify
gait analysis and that asymmetry reflects a natural difference in the behavior of limbs,
which can be caused by limb dominance or laterality. More research was recommended to
support this hypothesis.
Ambrosio et al. [23] designed a method for the reconstruction of a 3D biomechanical
model of the human body from a single camera. They use a set of kinematic constraint
equations associated with the biomechanical model to solve the system of equations that
calculates the spatial position of each anatomical point. A minimization cost function is
used to select the optimum solution based on the smoothness of the reconstructed motion.
LaFiandra et al. [24] present experiments to determine the effects of carrying a backpack
on upper and lower body torque in the transverse plane while walking. They mention that
the counter-rotation of the upper and lower body is reduced when the subject is carrying a
load and suggest that the upper body torque increases. The purpose is to understand the
effects of carrying a load so as to reduce injuries.
Chau [25] [26] reviews several approaches for gait data analysis that include fuzzy sys-
tems, multivariate statistical techniques, fractal dynamics, neural networks and wavelet
methods. Chau considers high dimensionality, temporal dependence, correlation between
curves, and nonlinear relationships in gait data to be the main challenges for its analysis.
The review aims to inform researchers working on clinical interpretation about the abilities
and limitations of these techniques.
Figure 5. Five point light display frames of a human walking.
2.2 Visual Perception of Human Gait
Johansson [27] presented a method for isolating motion patterns, which is known as point
light displays. With this method he removed the interference of the body shape or aspect
with motion information. Light points were attached to body joints to produce images like
those in Fig. 5. By viewing isolated images it is hard to describe what they contain, but
when they are animated, it is easy to perceive and discriminate between different types of
motion such as walking, running, dancing, etc.
Cutting and Kozlowski [28] made use of Johansson’s method in experiments in which
subjects could recognize themselves and others, concluding that point-light displays provide
sufficient cues for identification. In a different experiment [29], they showed that men’s and
women’s gaits can also be differentiated using dynamic point-light displays.
Mather and Murdoch [30] also performed experiments to discriminate gender based on
human locomotion studies showing that males and females have different lateral body sway;
they mention that males swing their arms more than females but rotate their hips less.
They used markers on shoulders and hips to measure sway from frontal views. They claim
that their approach is more robust and has better performance than the one by Kozlowski
and Cutting [29]. In previous studies [31], Mather et al. also used dynamic motion cues
to demonstrate that observers can identify the direction in which walkers are moving from
just the motion of their extremities. In their experiments they removed the translatory
component of the motion displays and presented the observers only elliptical and oscillatory
components.
Beintema and Lappe [32] present experiments on subjects with brain lesions in motion
processing areas to show that even when they have severely impaired image motion per-
ception they can still perceive human figures from point light displays without local image
motion. Based on these studies they propose that image motion is not the basis for the
perception of biological motion, giving more importance to the dynamic evolution of the
body posture over time.
Neri et al. [33] make use of point light displays to investigate the ability of the visual
system to process biological motion over space and time. It is known that when more points
are added to the motion displays, an observer can perceive biological motion faster. They
conducted experiments in which subjects were asked to detect the presence of a walker and
the direction of walk in the presence of dynamic random noise. The points in the motion
displays appeared and disappeared over time. By adding more information over time they
found that the parts of the visual system that process biological motion are not so efficient
in constantly integrating the new information.
Pavlova et al. [34] investigated the effect of showing films backwards on the visual
perception of biological motion. They showed motion displays to a group of subjects in
forward direction and then in backward direction, which they call normal mode. Then
another group was exposed to motion displays in reverse mode. They found apparent-facing
effects in the perception of biological motion in both normal and reverse modes including
leftward and rightward motion.
Grossman et al. [35] studied functional magnetic resonance images to measure activ-
ity levels in different areas of the brain to determine which of them are directly involved
in the perception of biological motion. The area activated when viewing motion displays
was located in the superior temporal sulcus (STS). In subsequent studies, Grossman and
Blake [36] presented inverted motion displays to observers and found, by measuring the
activity levels of the brain regions dedicated to processing biological motion, that perception
of biological motion is dependent on orientation, supporting claims that inverted animations
are more difficult to perceive. Activity in the STS was higher with inverted displays than
with scrambled ones.
Grezes et al. [37] investigate the areas of the brain involved in the perception of rigid
and non-rigid motion. They measured activity levels in the different regions of the brain
by analyzing functional magnetic resonance images. They considered that a specific neural
network in the brain performed the perception of structure from motion. They found that
the left intraparietal cortex is involved in the perception of non-rigid biological motion in
addition to the STS.
A computational interpretation for visual perception of human movements is presented
by Hoffman and Flinchbaugh [38]. In another work, Flinchbaugh and Chandrasekaran [39]
present the theory of spatio-temporal aggregation where they explain the grouping processes
performed by the visual system when exposed to image sequences. This set of works started
the crucial link between psychological studies and construction of computer vision systems
studying human motion.
2.3 Human Gait Analysis Using Computer Vision Techniques
Recently, gait analysis has received renewed interest in computer vision. This body of work
includes methods that recognize gait types, such as walking, running, jogging, or climbing
[40] [41] [42], and methods that identify people from gait. We concentrate on the latter.
Bobick and Johnson [43] use static body and stride parameters as features for recogni-
tion, which are recovered from different view angles, in indoor and outdoor settings, on a
database of 20 persons using electromagnetic markers and 18 persons using video-based fea-
ture recovery. An expected confusion measure is used to evaluate the discrimination ability
of the set of parameters under these different conditions. In a parallel line of work from
the same research group, Tanawongsuwan and Bobick [44] use joint-angle trajectories of
lower-body parts, as captured with 3D electro-magnetic markers attached to the body. The
3D location measurements are projected onto the walking plane to compute the joint-angle
trajectories. Recognition is performed using the nearest neighbor algorithm on a database
of 150 sequences from 18 people.
Shakhnarovich et al. [45] compensate for viewpoint differences by adopting a view-
normalization approach to face and gait recognition. They first compute the visual hull using
images from four cameras, which is then used to produce canonical viewpoints. For gait
recognition, they generate virtual side views of the person and compute the silhouette based
on the inferred view. This silhouette is then divided into seven regions. The centroid, as-
pect ratio, and orientation of the fitted ellipses for each of the regions over time are used as
features. They use the nearest neighbor classifier, based on a diagonal covariance Gaussian
model of the features, for gait recognition. In a later work [46], they consider gait appearance
features for recognition and use a support-vector machine for gender classification.
Little and Boyd [9] describe the shape of motion with features derived from the moments
computed over the dense optical flow of image sequences. They construct sequences of
scalars from each flow and analyze them in the frequency domain. These scalars have the
same period but different phases. Recognition is performed based on the difference in phase
features between persons. In a more recent work, Boyd [47] uses phase-locked loops to
represent frequency and phase locking in the oscillations of human gait. Applying the video
phase-locked loop algorithm to each frame in a sequence produces a phasor containing phase
and angle information in complex form. Procrustes shape analysis is adapted to measure
the similarity between vectors of phasors from different video sequences.
Hayfron-Acquah et al. [48] base recognition on motion symmetry, as measured by a
generalized symmetry operator applied to edge maps of silhouettes. Gait recognition is per-
formed using k-nearest neighbors based on the Fourier features computed from symmetry
measurements. Another line of attack by the same research group [49] adopts a statistical
approach based on velocity moments, which are an extension of centralized moments. Veloc-
ity moments up to order four are computed over temporal templates from image sequences
of people walking. Clustering the velocity moment values achieves classification.
BenAbdelkader et al. [50] [51] introduced the concept of eigengaits where principal
component analysis (PCA) is applied to similarity plots, described in [2], to map them
to a lower dimensional space with good data separability. Similarity maps capture the
variation in image similarity over time; for periodic motion these maps are also periodic. In
a different approach [52], this group uses stride length and cadence to differentiate between
persons.
Kale et al. [53] use continuous hidden Markov models (HMMs) trained to classify feature
vectors generated from gait sequences by computing Euclidean distances between images
from a set of five key frames over one gait cycle. They claim that these feature vectors
compactly capture the structural and transitional characteristics that are unique to each
person. However, they need several gait cycles from each person to successfully train the
HMMs.
Collins et al. [54] present a method for human identification based on body shape and
gait. This method performs template matching of body silhouette images from frontal-
parallel viewpoints. Nearest-neighbor classification is performed over normalized correlation
scores from training and testing silhouette images.
Table 1 summarizes the salient aspects in which the present work differs from the state
of the art in gait-based recognition. The table lists the basic technology used, the data size
in terms of number of persons, the extent of dependence on segmentation quality, and the
need for part-level tracking. The statements regarding the dependence of an approach on
segmentation quality reflect our experience and opinions with low-level vision algorithms.
The contributions of the present work lie in that it does not require part-level tracking
and, as we show later, it is robust with respect to segmentation errors. The database size
is also competitive with the present state of the art.
Table 1. Summary of recent research on gait-based recognition using computer vision techniques. Database size is expressed as the number of subjects and includes acquisition conditions (“I” for indoor and “O” for outdoor), and the best recognition rate reported.
Work | Basic technology | Subjects, conditions, recognition rate | Segmentation needed and dependence on its quality | Part-level tracking
Bobick and Johnson [43] | Static body parameters, stride length | 18, I, none; 15, O, none | Silhouette divided into 10 sections; dependent on quality of segmentation | Yes (head, pelvis, feet)
Boyd [47] | Video phase-locked loops | 2 real and 2 synthetic, none | Bounding box from hip down | Yes (hip, legs)
Collins et al. [54] | Body shape and gait | 25, I, 100%; 24, I, 100%; 55, O, 87%; 28, I, 93% | Template matching of silhouettes; strong dependence | None
Lee and Grimson [46] | Gait appearance features | 24, I, 100%; 25, I, 99.7% | Silhouettes divided into 7 regions; strong dependence | None
Little and Boyd [9] | Statistical measures from optical flow | 6, O, 92.2% | Optical flow computation; strongly dependent on illumination changes | None
Kale et al. [53] | Continuous HMMs | 5, O, none; 25, I, 72%; 43, O, 56% | Width vectors from 5 silhouette images; not strongly dependent | None
Hayfron-Acquah et al. [48] | Symmetry analysis | 4, I, 100%; 6, I, 97.6% | Edge maps of silhouettes; strongly dependent | None
BenAbdelkader et al. [50] [51] | Eigengaits: PCA over self-similarity plots | 6, O, 93%; 44, O, 77%; 7, I, 65%; 25, I, 76% | Image templates from silhouettes; not strongly dependent | None
Shutler et al. [49] | Temporal moments | 4, I, none | Temporal templates from silhouettes; dependent | None
This work | Non-stationarity in feature relation statistics | 3, I, none; 10, O, 100%; 20, O, 80%; 74, O, 90% | Edges in motion using silhouettes as masks; weakly dependent | None
CHAPTER 3
MOTION MODELING: THEORY
In this chapter we describe the statistical model for motion analysis developed in this
dissertation, starting with the definition of the concept of Relational Distributions, followed
by the theoretical description of the Space of Probability Functions (SoPF), and ending with
the method to compute similarity between traces in the SoPF.
3.1 Relational Distributions
We view an image as an assemblage of low-level features, such as edge pixels, corners,
straight lines, or region patches. The structure perceived in an image is determined more
by the relationships among features than by the individual feature attributes. Our goal is to
devise a mechanism to capture this structure so that we can use its evolution with time to
model high-level motion patterns. Graphs and hyper-graphs have been the most commonly
used mechanism for capturing these relationships among features [55] [56] [57] [58]. However,
the study of variation of a graph over time requires solving the correspondence problem
between features, which is a computationally difficult problem. We avoid this need for
feature-level correspondence by focusing on the statistical distribution of the relational
attributes observed in the image.
Definition 1
Let
(a) F = {f1, · · · , fN} represent the set of N features in an image,
(b) Fk represent a k-tuple of features randomly picked, and
(c) the relationship among these k-tuple features be denoted by Rk.
Figure 6. An empirical sampling-based interpretation of relational distributions.
Thus, the 2-ary relationship between features, which is the most commonly used form,
will be denoted by R2. Notice that low-order spatial dependence is captured by small values
of k and higher-order spatial dependences by larger values of k. In a set of N primitive
features there are C(N, k) = N!/(k!(N − k)!) possible k-tuples.
Definition 2
Let the relationships, Rk, be characterized by a set of M attributes Ak = {Ak1, . . . , AkM}.
Then the shape of the object can be represented by the joint probability function P(Ak = ak),
also denoted by P(ak1, . . . , akM) or P(ak), where aki is the (in practice, discretized) value
taken by the relational attribute Aki.
We term these probabilities the Relational Distributions. Fig. 6 contains a graphical
interpretation of this concept: given an image, if we randomly pick a k-tuple of features,
what is the probability that it will exhibit the relational attributes ak, i.e., what is P(Ak = ak)?
The relational distributions can be represented in parametric form or in non-parametric,
histogram (bin-based) form. The advantage of parametric forms, such as mixtures of
Gaussians, is their low representational overhead. However, we have noticed that these
relational distributions exhibit complicated shapes that do not readily afford modeling with
a combination of simply shaped distributions, so we adopt the non-parametric, histogram-based
form. To reduce the size associated with a histogram-based representation, we propose the
Space of Probability Functions, which is described in Section 3.2.
But before that, we look at a concrete example of a relational distribution.
Figure 7. Detection of edges in motion using background subtraction. A sample frame from an outdoor walking sequence is shown in (a), the background-subtracted image in (b), and the corresponding edges in motion in (c).
3.1.1 Moving Edge Based Features
We illustrate the concept of Relational Distributions using moving edge pixels as low-level
features. Other feature types, such as the neurally inspired keys [59] or those based on
Gaussian derivatives [14], will be the subject of future studies. We consider moving edge
pixels the most likely to belong to moving objects. One of the methods we use to identify
these edge pixels in motion is as follows: we apply the Canny edge detector to each image
frame and select only those edge pixels that fall inside, or within a small distance of, a
motion mask created either by frame differencing or by background subtraction. Fig. 7
shows the edges selected using masks created from background subtraction. Fig. 8 shows
an example of a different method for detecting moving features, using frame differencing
with a liberal threshold.
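As an illustration, the frame-differencing variant can be sketched as follows. This is a minimal NumPy-only sketch, not the dissertation's implementation: the function name and thresholds are hypothetical choices, and a simple gradient-magnitude test stands in for the Canny detector.

```python
import numpy as np

def moving_edge_pixels(prev, curr, diff_thresh=20, grad_thresh=30, halo=2):
    """Select edge pixels of `curr` that fall on, or within a small distance
    of, a motion mask obtained by frame differencing with a liberal threshold.
    A gradient-magnitude test stands in for the Canny detector here."""
    curr = curr.astype(float)
    # Motion mask: liberal threshold on the absolute frame difference.
    mask = np.abs(curr - prev.astype(float)) > diff_thresh
    # Grow the mask by `halo` pixels so nearby edge pixels also survive.
    for _ in range(halo):
        grown = mask.copy()
        grown[1:, :] |= mask[:-1, :]; grown[:-1, :] |= mask[1:, :]
        grown[:, 1:] |= mask[:, :-1]; grown[:, :-1] |= mask[:, 1:]
        mask = grown
    # Central-difference gradients give edge strength and direction.
    gy, gx = np.gradient(curr)
    mag = np.hypot(gx, gy)
    edges = (mag > grad_thresh) & mask
    ys, xs = np.nonzero(edges)
    theta = np.arctan2(gy[ys, xs], gx[ys, xs])  # gradient direction per pixel
    return np.column_stack([xs, ys]), theta
```

The returned pixel coordinates and gradient directions are exactly the ingredients needed for the relational distributions described next.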
Each edge pixel in motion, fi, is associated with the gradient direction, θi, at that point,
which is estimated using the Gaussian smoothed gradient that is computed by the Canny
Figure 8. Detection of edges in motion using frame differencing. Two consecutive frames from a running sequence are shown in (a) and (b), the thresholded difference image in (c), and the segmented edges in motion in (d).
edge detector. To capture the structure between edge pixels, we use the distance between
the two edge pixels and the difference in edge orientations as the attributes {A21, A22} of
R2. We normalize the distance between the pixels by a distance (D), which is related to the
size of the object in the image, to make it somewhat scale invariant. In the next section,
we discuss how we choose this scaling constant D. Note that our choice of attributes is
such that the probability representation is invariant with respect to image-plane rotation
and translation, and approximately invariant with respect to scale changes. Fig. 9(a) depicts the attributes
that are computed between the two pixels. Fig. 9(c) shows P (a21, a22) for the edge image
shown in Fig. 9(b), where high probabilities are shown as brighter pixels. Fig. 9(d) shows
a 3D bar plot of the probability values. Note the concentration of high values in certain
regions of the probability event space.
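The 2-ary relational distribution P(d/D, θ) just described can be estimated by randomly sampling pairs of edge pixels and histogramming their normalized distance and orientation difference. A sketch, in which the function name, bin count, and number of sampled pairs are illustrative choices rather than the dissertation's settings:

```python
import numpy as np

def relational_distribution(points, theta, D, bins=30, n_pairs=20000, seed=0):
    """Estimate the 2-ary relational distribution P(d/D, dtheta) by random
    sampling of pairs of edge pixels (histogram-based, non-parametric form)."""
    rng = np.random.default_rng(seed)
    n = len(points)
    i = rng.integers(0, n, n_pairs)
    j = rng.integers(0, n, n_pairs)
    keep = i != j                                   # drop degenerate pairs
    d = np.linalg.norm(points[i[keep]] - points[j[keep]], axis=1) / D
    dth = np.abs(theta[i[keep]] - theta[j[keep]]) % np.pi  # orientation diff.
    hist, _, _ = np.histogram2d(d, dth, bins=bins, range=[[0, 1], [0, np.pi]])
    return hist / hist.sum()                        # normalize to probabilities
```

With a scaling constant D larger than the object extent, every sampled pair lands in the histogram and the result sums to one, as a probability function should.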
To capture the relational distribution over triples of edge pixels, P (a31, a32, a33, a34),
we can use four attributes, as illustrated in Fig. 10(a). Since the pairwise distances in a
triplet are not independent of each other, attributes over all pairs would not constitute
an independent set. To arrive at an independent set of relations, we consider the pairs
of pixels connected by the maximum-distance spanning tree over them, which for
Fig. 10(a) are (1, 2) and (1, 3). The four attributes characterizing
Figure 9. Edge pixel based 2-ary relational distribution. (a) The two attributes characterizing the relationship between two edge pixels. (b) Moving edge pixels in an image. (c) The relational distribution P(d/D, θ), where D is a scaling constant; P(0, 0) is the top left corner of the image, and brighter pixels denote higher probabilities. (d) The relational distribution shown as a 3D bar plot.
Figure 10. Edge pixel based 3-ary relational distribution. (a) The four attributes characterizing the relationship among three edge pixels. (b) The four-dimensional relational distribution P(d12/D, d13/D, θ12, θ13) visualized as a 2D image for the edge image in Fig. 9(b). The rows correspond to the row-scanned version of the (d12/D, d13/D) subspaces; the columns correspond to the row-scanned version of the (θ12, θ13) subspaces. Only non-zero rows are shown. P(0, 0, 0, 0) is the top left corner of the image.
the relationship among three edge pixels are shown in Fig. 10(a). The resulting four-
dimensional relational distribution P(d12/D, d13/D, θ12, θ13) is visualized as a 2D image
in Fig. 10(b).
3.1.2 Scaling Constant D
We use the scaling constant D to normalize the distance between edge features and make
them invariant with respect to scale changes. The value of the scaling constant D can be
chosen in a number of ways. If, for example, we knew that the object under consideration
occupied most of the image, then we could use image dimensions, such as the image diagonal,
as the scaling constant. We did this for the treadmill sequences and the gait challenge data
where the silhouettes are normalized (Fig. 8). A second, more involved strategy, which we
use for all other sequences, is as follows. First, we obtain the height of the binary silhouette
from each frame. Since estimates of heights of persons from images may be noisy due to
movement, segmentation errors or perspective effects, we obtain a smoothed estimate by
fitting a straight line to the height curve as a function of time. Fig. 11 shows the variation
of this estimated D with time for three motion trajectories. Note that, for frontal-parallel
motion (Fig. 11(a)), D is more or less a constant. As the angle of the motion trajectory
with respect to the image plane increases (Figs. 11(b) and (c)), D changes linearly with
time, accounting for the change in size of the projected image.
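The height-based estimate of D can be sketched as a least-squares line fit over the per-frame silhouette heights (the function name is an assumption for illustration):

```python
import numpy as np

def scaling_constant(heights):
    """Smooth noisy per-frame silhouette heights by fitting a straight line
    h(t) = a*t + b; the fitted value at each frame is used as D."""
    t = np.arange(len(heights))
    a, b = np.polyfit(t, heights, deg=1)  # least-squares line fit
    return a * t + b                      # smoothed estimate of D per frame
```

A linear model suffices here because, for walking at a fixed angle to the image plane, the projected size changes approximately linearly with time, as Fig. 11 illustrates.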
3.2 Space of Probability Functions
As the parts of an articulated object move, the relational distributions will change. Motion
will introduce non-stationarity in the relational distributions. Fig. 12 shows some examples
of 2-ary relational distributions for some leg configurations. Notice how the modes of the
probability functions, which are the bright regions in the images, change with leg motion.
Is it possible to infer the nature of articulated motion by quantifying the evolution of the
nature of these non-stationarities? Is it possible not only to make gross judgments about
the nature of motion, such as distinguishing periodic from non-periodic motion, but also
to establish the identity of the person in motion? In order to enable us to answer
these questions in the affirmative, we first set up a more compact representation for these
relational distributions that is easier to manipulate and is more parsimonious than just
plain histograms.
Definition 3
Let P (ak, t) represent the relational distribution at time t.
Definition 4
Let

√P(ak, t) = Σ_{i=1}^{n} ci(t) Φi(ak) + µ(ak) + η(ak)    (1)

describe the square root of each relational distribution as a linear combination of orthonormal
basis functions Φi(ak), where µ(ak) is a mean function defined over the attribute space and
η(ak) is a function capturing small random noise variations with zero mean and small
variance. We refer to this space as the Space of Probability Functions (SoPF).
Figure 11. Fitting of a line through the height curve generated from one walking cycle of motion at (a) 0° (frontal-parallel), (b) 22.5°, and (c) 45° with respect to the image plane, to determine the scaling constant D.
Figure 12. Some configurations of legs in motion in (a), (c), and (e), with their corresponding 2-ary relational distributions in (b), (d), and (f).
Given a set of relational distributions, {P (ak, ti)|i = 1, · · · , T}, the SoPF can be arrived
at by using the Karhunen-Loeve transform or, for the discrete case, by principal component
analysis (PCA). The dimensions of the SoPF are given by the eigenvectors of the covariance
of the square root of the given relational distributions. The variance along each dimension is
proportional to the eigenvalues associated with it. In practice, we can consider the subspace
spanned by a few (N << n) dominant eigenvectors associated with the largest eigenvalues.
We have found that for human motion just N = 10 eigenvectors are sufficient. Thus, a
relational distribution can be represented using these N coordinates, ci(t), which is a more
compact representation than a normalized histogram.
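Construction of the SoPF, i.e., PCA over the square roots of the relational distributions, can be sketched as follows. The SVD route and the helper names are implementation choices for illustration, not the dissertation's code:

```python
import numpy as np

def build_sopf(distributions, n_dims=10):
    """Build the Space of Probability Functions: PCA over the square roots
    of the relational distributions (each flattened to a vector)."""
    X = np.sqrt(np.array([p.ravel() for p in distributions]))
    mu = X.mean(axis=0)                    # the mean function mu(a_k)
    Xc = X - mu
    # Eigenvectors of the covariance via SVD of the centered data.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    basis = Vt[:n_dims]                    # dominant orthonormal directions
    coords = Xc @ basis.T                  # c_i(t): the SoPF coordinates
    eigvals = (s ** 2) / len(X)            # variance along each direction
    return mu, basis, coords, eigvals[:n_dims]

def project(p, mu, basis):
    """SoPF coordinates of a new relational distribution."""
    return (np.sqrt(p.ravel()) - mu) @ basis.T
```

Each frame of a motion sequence then becomes a single point of N coordinates, and a whole sequence becomes a trace through the SoPF.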
Note that this use of the PCA is different from other uses of this technique in motion
tracking. For example, Black and Jepson [7] also used PCA but in the context of tracking
and matching moving objects. The representation is also different because they use PCA
over the image pixel space whereas we use it over relational probability functions.
We use the square root function so that we arrive at a space where distances are related
to the Bhattacharyya distance between the relational distributions, as we prove in the next
two theorems.
Theorem 1

The Euclidean distance between the square roots of two relational distributions,
dE(√P(ak, t1), √P(ak, t2)), is monotonically related to the Bhattacharyya distance between
the relational distributions, dB(P(ak, t1), P(ak, t2)), as captured by

dE(√P(ak, t1), √P(ak, t2)) = 2 − 2 e^{−dB(P(ak, t1), P(ak, t2))}

Proof: The proof uses the facts that the probabilities sum to one and that the Bhattacharyya
distance between two probability functions, P1(x) and P2(x), is given by −ln Σ_x √(P1(x) P2(x)).

dE(√P(ak, t1), √P(ak, t2)) = Σ_{ak} (√P(ak, t1) − √P(ak, t2))²
    = Σ_{ak} P(ak, t1) + Σ_{ak} P(ak, t2) − 2 Σ_{ak} √(P(ak, t1) P(ak, t2))
    = 2 − 2 e^{−dB(P(ak, t1), P(ak, t2))}    (2)
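The relation in Theorem 1 can be checked numerically on arbitrary discrete distributions, with d_E denoting the squared Euclidean distance between the square roots, as in the proof:

```python
import numpy as np

# Numeric check of Theorem 1: for two discrete probability functions,
# sum((sqrt(P1) - sqrt(P2))**2) equals 2 - 2*exp(-d_B(P1, P2)).
rng = np.random.default_rng(1)
P1 = rng.random(100); P1 /= P1.sum()
P2 = rng.random(100); P2 /= P2.sum()

d_E = np.sum((np.sqrt(P1) - np.sqrt(P2)) ** 2)   # squared Euclidean distance
d_B = -np.log(np.sum(np.sqrt(P1 * P2)))          # Bhattacharyya distance
assert np.isclose(d_E, 2 - 2 * np.exp(-d_B))
```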
Theorem 2
In the SoPF representation, the Euclidean distance between the coordinates, {ci(t1)} and
{ci(t2)}, is monotonically related to the Bhattacharyya distance between the corresponding
relational distributions P (ak, t1) and P (ak, t2).
Proof: The square roots of the relational distributions, P(ak, t1) and P(ak, t2), can be
approximately represented using the SoPF coordinates as follows; the error in the approximation
is the energy of the eigenvectors ignored during SoPF construction.

√P(ak, t1) ≈ Σ_{i=1}^{N} ci(t1) Φi(ak) + µ(ak)    (3)

Similarly,

√P(ak, t2) ≈ Σ_{i=1}^{N} ci(t2) Φi(ak) + µ(ak)    (4)

The Euclidean distance between them can be expressed in terms of the distance between
the coordinates as follows, using the fact that the dimensions of the SoPF, Φi(ak), are
orthonormal.

dE(√P(ak, t1), √P(ak, t2)) = Σ_{ak} (√P(ak, t1) − √P(ak, t2))²
    = Σ_{ak} (Σ_{i=1}^{N} ci(t1) Φi(ak) − Σ_{j=1}^{N} cj(t2) Φj(ak))²
    = Σ_{ak} (Σ_{i=1}^{N} (ci(t1) − ci(t2)) Φi(ak))²
    = Σ_{ak} Σ_{ij} (ci(t1) − ci(t2)) (cj(t1) − cj(t2)) Φi(ak) Φj(ak)
    = Σ_{ij} (ci(t1) − ci(t2)) (cj(t1) − cj(t2)) Σ_{ak} Φi(ak) Φj(ak)
    = Σ_{i} (ci(t1) − ci(t2))²    (5)

Using this result and Theorem 1, we can write

Σ_{i} (ci(t1) − ci(t2))² = 2 (1 − e^{−dB(P(ak, t1), P(ak, t2))})
3.3 Similarity Measures
Articulated motion sweeps a path or trace through the SoPF. Distances between SoPF
traces can quantify differences in motions. There are various sophisticated techniques such
as those based on hidden Markov models, dynamic Bayesian networks, and state space
trajectories [60] that can be used to model the trajectories. In this work, however, we adopt
a simpler distance measure between two traces to demonstrate the viability of using the
traced paths for discriminating between motion types and for inferring personal identity. We
show in later chapters that even with a simple distance measure we are able to obtain good
discrimination. We define two versions of this distance measure: (i) time un-normalized
and (ii) time normalized.
3.3.1 Time Un-normalized Distance
The time un-normalized distance between two SoPF traces is defined as the average Eu-
clidean distance between the two traces, {c1(ti), i = 1 · · · n} and {c2(ti), i = 1 · · · n}. To
compute this distance, we align the two traces with respect to one time instant from the
two traces, i.e. find shift K such that ||c1(tk) − c2(tk + K)|| is minimum for some tk. If
the number of frames of the sequences being compared is different (m < n), then the dis-
tance is computed over the minimum number of frames, m. Mathematically, this distance
is expressed as
dun-norm(c1, c2) = (1/m) Σ_{ti=1}^{m} Σ_{j=1}^{N} (c1_j(ti) − c2_j(ti + K))²    (6)
This measure is good for comparing motion of the same type and similar speed, for example,
in comparing the gait traces from different persons who are known to be walking.
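A sketch of this time un-normalized distance follows. For simplicity, the shift K is found by brute force over all offsets, minimizing the whole-trace distance (the dissertation aligns at a single time instant), and the shorter trace is taken as c1; these are illustrative simplifications.

```python
import numpy as np

def d_unnorm(c1, c2):
    """Time un-normalized distance between two SoPF traces of shape
    (frames, N). The shorter trace is slid over the longer one and the
    alignment shift K with the smallest average squared coordinate
    distance is kept."""
    if len(c1) > len(c2):
        c1, c2 = c2, c1                    # make c1 the shorter trace
    m = len(c1)
    best = None
    for K in range(len(c2) - m + 1):       # try every alignment shift
        d = np.mean(np.sum((c1 - c2[K:K + m]) ** 2, axis=1))
        if best is None or d < best:
            best = d
    return best
```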
3.3.2 Time Normalized Distance
If the speed of motion is not controlled, for example when we are comparing traces from
walking and running gaits from the same person, it is desirable to normalize the two traces,
{c1(ti), i = 1 · · ·m} and {c2(ti), i = 1 · · · n}, with respect to time. We adopt a strategy
similar to dynamic time warping used in speech recognition, except that we allow only
for constant stretching or contraction. We estimate this constant warping factor by first
establishing two alignment points on the two traces. Without loss of generality, let us
assume that the first trace has fewer samples than the second one, that is m ≤ n. The
distance between these traces is computed by first constructing a continuous curve, C1(ti)
from the first trace by assuming linear interpolation between the coordinate points. Next
we stretch this curve such that the first and the last coordinates match with the second
trace, i.e., C1((m/n) ti). Then we compute the distance between the second trace coordinate
points and the stretched curve.
dnorm(c1, c2) = (1/n) Σ_{ti=1}^{n} Σ_{j=1}^{N} (c2_j(ti) − C1_j((m/n) ti))²    (7)
The warped distance measure responds to changes in shapes of the traces over each motion
cycle but does not change with the speed with which each cycle is executed. Thus, the
distance between a fast walk and a slow walk would tend to be small as compared to the
distance between a walk and a run cycle.
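The time normalized distance can be sketched with linear interpolation. One small implementation choice here: mapping frame indices with (m − 1)/(n − 1) rather than m/n makes the endpoints of the two discrete traces coincide exactly.

```python
import numpy as np

def d_norm(c1, c2):
    """Time normalized distance between SoPF traces c1 (m frames) and c2
    (n frames, m <= n): linearly interpolate c1, stretch it so the endpoints
    line up with c2, and average the squared distance at c2's frames."""
    m, n, N = len(c1), len(c2), c1.shape[1]
    t2 = np.arange(n)
    # Sample the stretched curve C1 at c2's time instants, one coordinate
    # dimension at a time.
    stretched = np.column_stack(
        [np.interp(t2 * (m - 1) / (n - 1), np.arange(m), c1[:, j])
         for j in range(N)])
    return np.mean(np.sum((c2 - stretched) ** 2, axis=1))
```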
3.3.3 Similarity Measure Based on Multiple Gait Cycles
When we have sequences containing multiple gait cycles, as is the case for the gait challenge
dataset, we formulate the problem of computing a similarity measure as follows. Let
the two image sequences to be compared be denoted by S1 = {S1(1), · · · , S1(M)} and
S2 = {S2(1), · · · ,S2(N)}. We partition S1 into disjoint subsequences of NS1 contiguous
frames each, such that each subsequence contains roughly one cycle. Let the k-th subse-
quence from S1 be denoted by S1k = {S1(k), · · · ,S1(k + NS1)}. We then compare each of
these subsequences with S2:
Corr(S1k, S2)(l) = Σ_{j=1}^{NS1} d(S1(k + j), S2(l + j))    (8)
The distance between two frames is, in our case, the Euclidean distance between their
corresponding points in the SoPF. The similarity is chosen to be the median value of the
distance of S2 with each of the S1 subsequences, as illustrated in Fig. 13.
Similarity(S1, S2) = Median_k ( max_l Corr(S1k, S2)(l) )    (9)
This method of computing the similarity between two sequences is robust with respect
to noise that distorts the motion information in a small set of contiguous frames.
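A sketch of the multi-cycle similarity follows. Because the per-frame measure here is a distance, the sketch keeps the smallest summed distance over placements l before taking the median over subsequences; Eq. (9) writes the extremum over its correlation measure. Function and parameter names are illustrative.

```python
import numpy as np

def similarity(S1, S2, cycle_len):
    """Similarity between two sequences of SoPF points (frames x N).
    S1 is cut into disjoint subsequences of roughly one gait cycle each;
    each is slid over S2 and its best (smallest summed-distance) placement
    kept; the median over subsequences gives the similarity score."""
    scores = []
    for k in range(0, len(S1) - cycle_len + 1, cycle_len):
        sub = S1[k:k + cycle_len]
        dists = [np.sum(np.linalg.norm(sub - S2[l:l + cycle_len], axis=1))
                 for l in range(len(S2) - cycle_len + 1)]
        scores.append(min(dists))      # best placement of this subsequence
    return float(np.median(scores))    # median makes the score noise-robust
```

Taking the median over subsequences is what gives the robustness noted above: a few frames corrupted by noise spoil at most one subsequence's score.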
Figure 13. Similarity measure between sequences with multiple gait cycles.
CHAPTER 4
INSIGHTS INTO THE SOPF REPRESENTATION THROUGH AN EXAMPLE
In this chapter, we present results on a small dataset of three persons performing three
types of motion, walking, jogging, and running, on a treadmill to illustrate and test various
aspects of the SoPF based representation. In the following chapters, we will present results
using larger and more complex datasets.
The data for the experiments described in this chapter and in Chapters 6 and 7 was
acquired with a Canon Optura digital video (DV) camera that has a single CCD and
performs progressive scans. Video was captured at 30 frames per second on DV tapes.
Then, it was downloaded via an IEEE 1394 interface to a Pinnacle micro DV500 video capture
board installed on a PC to produce Microsoft AVI files using Sony’s dvsd codec. The AVI
files were broken into frames in PPM format using the Sony decoder. Frames, which are
720×480 in size, were then cropped to an image subregion within which the subject appears
in all the frames. Two consecutive frames of a running person are shown in Fig. 14. The
size of each frame is 256 × 130.
The small size of the database allows us to explore the following questions, by considering
individual raw distances and not just aggregate performance measures.
Figure 14. Two consecutive frames from a running sequence.
Figure 15. Ten most dominant dimensions of SoPF for the treadmill sequences.
(a) For each person, can we discriminate between motion types?
(b) Can we discriminate between motion types across persons?
(c) Is it possible to identify persons based on walking, jogging, or running gaits?
(d) Is the SoPF representation robust with respect to segmentation errors?
(e) Is the SoPF representation stable with respect to scale variations?
(f) Why not just do a PCA of the raw edges?
To explore these questions, we used only the 2-ary relational distributions, P (d/D, θ),
to build the SoPF. One cycle of each motion type for each person forms the training set,
which is a total of 306 frames. The eigenvectors of the SoPF associated with the 10 largest
eigenvalues are shown in Fig. 15 as gray level images with their corresponding eigenvalues
quantifying the associated variation shown below each image. Each relational distribution,
and hence each eigenvector image, is 30 × 30 cells in size. The vertical axes of the images plot
the distance attribute, d/D, and the angle θ is along the horizontal axes. From the banded
pattern in the two most dominant eigenvectors, we can see that they emphasize differences
in the distance attribute between two features. Differences in orientation are emphasized
by the other eigenvectors.
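The SoPF construction just described, an eigen-decomposition over the row-scanned 30 × 30 relational distributions, might be sketched as follows; the function names and the SVD route to the eigenvectors are our own choices, not code from this dissertation:

```python
import numpy as np

def build_sopf(distributions, n_dims=10):
    """PCA over row-scanned relational distributions (histograms).
    `distributions`: array of shape (n_frames, 30, 30); layout assumed."""
    X = distributions.reshape(len(distributions), -1)   # 900-d vectors
    mean = X.mean(axis=0)
    Xc = X - mean
    # eigen-decomposition of the covariance via SVD of the centered data
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = (S ** 2) / len(X)        # sorted in decreasing order
    eigvecs = Vt[:n_dims]              # the most dominant dimensions
    return mean, eigvecs, eigvals

def project(distribution, mean, eigvecs):
    """Coordinates c_i(t) of one frame's relational distribution."""
    return eigvecs @ (distribution.ravel() - mean)
```

Reshaping each of the 10 rows of `eigvecs` back to 30 × 30 would correspond to the gray level images of Fig. 15.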
Fig. 16 shows the sorted eigenvalues for the 2-ary relational distributions. Notice that
most of the energy of the variation in the relational distributions is captured by the few
large eigenvalues.

Figure 16. Eigenvalues associated with the SoPF of 2-ary relational distributions.

For the results in our experiments, we used the eigenvectors associated
with the 10 largest eigenvalues, which are sufficient. This number of dimensions is only a
small fraction of the 900 entries in the 30 × 30 histogram representation of the relational
distributions.
Fig. 17(a) shows the variation of the coordinate, c1(t), associated with the most domi-
nant eigenvector for each of the three persons and for each motion type. Each plot shows
the variation over three motion cycles, overlaid on each other. We can make the following
observations from the figures. First, the differences in the nature of the variation for any
person and a motion type over different cycles are small. Second, c1(t) captures mostly the
periodic nature of the variation; the variation in this dimension between motion types and
between persons is small. For the first and the third persons, the second peak is smaller
than the first one in the walking traces. Also, the amplitude of variation for jogging motion
of the third person is lower than that of the other two. Since the eigenvector associated with this
coordinate emphasizes variation in distance between features (see Fig. 15) and the maximum
distance change is for features from the foot, the amplitude of variation of c1(t) seems to
be related to the stride lengths. In other words, the third person’s jogging stride is shorter
Table 2. Distance between the traces through the SoPF of two different cycles of motion for the three persons and three motion types dataset. (Walking (W), Jogging (J), and Running (R)).

Trace Distances (10^-3)
                      Person 1               Person 2               Person 3
                 W      J      R        W      J      R        W      J      R
Person 1   W   1.23  30.74   4.41    42.12  52.54  10.99    67.11  39.86  14.91
           J  22.22   2.73  30.40    17.30  17.64  15.60    30.96  25.00  18.92
           R  10.65  43.82   4.40    56.66  64.57  18.39    73.35  40.20  15.57
Person 2   W  49.86  13.05  57.43     2.11   4.28  21.72    15.89  24.79  33.60
           J  41.57  10.14  49.38     8.04   6.67  16.97    22.05  19.65  28.49
           R  26.87  31.92  24.67    34.19  37.01  13.17    55.94  22.53  16.32
Person 3   W  61.94  25.42  63.34    15.00  15.74  34.26     3.66  18.93  31.34
           J  30.63  18.67  31.11    21.98  21.31  17.18    23.50   3.88  10.07
           R  16.16  27.87  13.22    32.41  38.00   8.61    41.56  14.54   3.18
than the other two. Another aspect worth pointing out is that the plots for running tend
to be more sharply peaked than those for the other two motion types.
Fig. 17(b) plots the variation of the second coordinate c2(t), which has larger variation
among different types of motion and persons. Differences in walking style of the second
person from the first and the third show up in the nature of the variation. The running
style of each person is different from the other two, which is also evident from the plots.
The matrix containing the time normalized distances (dnorm) is shown in Table 2. The
minimum value in each row is highlighted. Notice that the diagonal entries are lower than
the off-diagonal ones, which indicates good discrimination. The distance matrix can be
partitioned into sub-matrices that provide insights into the kinds of discrimination that are
possible. These we consider next.
4.1 Can We Discriminate Between Motion Types Across Persons?
To test whether we can reliably distinguish between walking, jogging, and running across
persons, we grouped the data into three classes, each representing one motion type and
containing SoPF traces from all persons. We compute the intra-class and inter-class dis-
tances, whose mean values are listed in the first row of Table 3 along with the variances of
(a)
(b)
Figure 17. Variation of (a) c1(t) and (b) c2(t) within each motion cycle for each of the three persons and motion types.
these estimates. The mean inter-class distance (30.52 × 10^-3) is almost double the mean
intra-class distance (15.82 × 10^-3). As we see next, this discrimination between motion
types, irrespective of person identity, is lower than on a per-person basis.
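The intra-/inter-class summary statistics reported in Table 3 can be computed from a pairwise trace-distance matrix roughly as below; the function and dictionary names are illustrative, and the second value returned per class is the variance of the mean estimate (the squared σµ of the tables):

```python
import numpy as np

def class_stats(dist, labels):
    """Mean intra-class and inter-class distances, plus the variance of
    each mean estimate, from a pairwise trace-distance matrix.
    `labels[i]` is the class (e.g. motion type) of trace i."""
    labels = np.asarray(labels)
    n = len(labels)
    intra, inter = [], []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # skip the trivial zero self-distance
            (intra if labels[i] == labels[j] else inter).append(dist[i][j])
    stats = {}
    for name, vals in (("intra", intra), ("inter", inter)):
        vals = np.asarray(vals, float)
        stats[name] = (vals.mean(), vals.var(ddof=1) / len(vals))
    return stats
```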
4.2 For Each Person, Can We Discriminate Between Motion Types?
To answer this, we consider the three by three sub-matrices along the diagonal of the
distance matrix in Table 2. For each person the distances between traces from the same
motion type form the intra-class distances and those between traces from different motion
types are the inter-class distances. The second, third, and fourth rows of Table 3 list the
mean distances for these two classes, along with estimates of their variances, for each of the
three persons. We see that the mean inter-class distances are about 4 to 20 times larger
than the mean intra-class distances. This indicates that, as expected, motion types from
each person can be easily discriminated.
4.3 Is Identifying Persons Based on Motion Gait Possible?
The next question we consider is the possibility of distinguishing persons based on SoPF
traces of different gaits. To study this, for each motion type, we formed three classes of
traces, one for each person. The inter- and intra-class mean distances between the traces
over a cycle of motion are listed in the last three rows of Table 3. The second and the
fourth columns list the mean distances. The third and fifth columns list the variances of
the respective mean estimates. As we can see, the inter-class mean distances are about 4 to 10
times larger than the intra-class mean distances. This seems to indicate a strong possibility
of discriminating between persons based on SoPF traces.
4.4 Is the SoPF Representation Robust with Respect to Segmentation Errors?
One of the claims is that our approach does not rely on perfect segmentation. Indeed, as
outlined before, the segmentation process used is a rather crude one that identifies motion
edges based on image differencing. The motion edges identified in such a manner contain a
number of edge pixels from the background as was seen in Fig. 8(d). Sometimes, as shown
Table 3. Summary statistics of distances between the traces through the SoPF for the three persons and three motion types dataset.

Trace Distance (10^-3)
Distinguishing                 Intra-Class          Inter-Class
                               µ       σµ           µ       σµ
Motion Types                   15.82   0.83         30.52   0.72
Motion Types of Person 1        2.42   0.39         41.38   2.75
Motion Types of Person 2        3.76   0.69         19.80   0.75
Motion Types of Person 3        4.78   1.46         15.45   0.70
Persons based on Walking        2.11   0.39         22.88   1.72
Persons based on Jogging        5.65   1.47         20.12   1.73
Persons based on Running        3.22   0.60         22.69   1.25
in Fig. 18(a) and (b), even significant edges are missed if they are too close to the motion
region boundary. Results presented so far were based on images that contained all these
artifacts.
We also conducted a controlled study, where we relaxed our thresholds for identifying
motion edges to include more edges. Fig. 18(c) and (d) show motion edges identified for
the frame shown in Fig. 8 for two different degrees of tolerance. More background edges
are included in Fig. 18(d) than in Fig. 18(c), which in turn includes more than Fig. 8(d). The pairwise
distances are shown in Tables 4 and 6. The minimum in each row is highlighted. The
various inter- and intra-class mean distances between traces for the two noisy segmentations are
listed in Tables 5 and 7. By comparing these distances with those listed in Table 3, we see
that, although the gap between the inter- and intra-class means decreases with increasing
segmentation noise, there is still enough discriminating power between the classes.
4.5 Is the SoPF Representation Stable with Respect to Scale Variations?
To show that the SoPF representation is scale invariant, we sub-sampled the frames of our
testing set to half their original size, keeping the training set, which was used to construct
the SoPF, at the original size. Table 8 lists the distances between two cycles from the
reduced size testing set using the SoPF constructed with original size images. Table 9
(a) (b)
(c) (d)
Figure 18. (a) and (b) show some typical frames where the segmentation process misses significant portions of the legs. (c) An under-segmented frame. (d) A more under-segmented frame. (Corresponding to that in Fig. 8).
Table 4. Distance between the traces through the SoPF of two different cycles of motion for the three persons and three motion types with moderate amount of segmentation noise (Walking (W), Jogging (J), and Running (R)).

Segmentation Study: Trace Distances (10^-3)
                Person 1               Person 2               Person 3
           W      J      R        W      J      R        W      J      R
P1   W   1.04  27.63   3.50    36.75  47.01   9.80    60.58  35.74  14.60
     J  20.33   2.41  26.79    13.99  15.56  12.56    27.52  21.83  16.13
     R   9.18  37.86   3.76    48.23  55.78  16.48    64.11  34.52  14.69
P2   W  45.60  11.44  51.22     1.94   3.93  18.42    13.93  22.59  28.54
     J  37.31   8.44  43.25     6.51   5.68  14.32    19.07  18.03  24.54
     R  23.94  25.88  22.11    27.44  30.02  11.60    47.82  18.90  14.27
P3   W  56.56  22.75  55.87    13.63  14.06  29.92     3.10  17.04  25.75
     J  28.56  17.58  27.96    19.69  19.69  15.82    20.53   3.12   8.39
     R  15.71  24.08  13.20    27.01  32.51   8.71    34.91  11.61   3.08
Table 5. Summary statistics of the distances between the traces through the SoPF for sequences with moderate amount of segmentation noise.

Segmentation Study: Trace Distance (10^-3)
Distinguishing                 Intra-Class          Inter-Class
                               µ       σµ           µ       σµ
Motion Types                   13.68   0.72         27.16   0.64
Motion Types of Person 1        2.12   0.36         37.32   2.46
Motion Types of Person 2        3.34   0.62         18.05   0.75
Motion Types of Person 3        4.23   1.31         14.30   0.58
Persons based on Walking        1.78   0.34         20.37   1.51
Persons based on Jogging        5.04   1.28         16.99   1.42
Persons based on Running        2.87   0.54         19.33   1.05
Table 6. Distance between the traces through the SoPF of two different cycles of motion for the three persons and three motion types with large amount of segmentation noise (Walking (W), Jogging (J), and Running (R)).

Segmentation Study: Trace Distances (10^-3)
                Person 1               Person 2               Person 3
           W      J      R        W      J      R        W      J      R
P1   W   1.17  23.67   2.74    28.41  37.20   8.59    49.56  29.75  14.41
     J  18.68   2.18  21.46    11.05  13.42  10.80    22.01  17.26  13.69
     R   8.24  30.28   3.46    37.66  43.76  15.02    51.40  28.83  14.50
P2   W  36.93   9.35  38.82     1.62   3.73  13.12    12.90  16.40  19.77
     J  30.11   6.65  32.61     4.75   5.04   9.91    16.99  14.86  18.40
     R  19.60  20.71  17.18    20.50  22.30   9.35    39.62  17.03  13.13
P3   W  46.47  17.83  42.70    11.24  12.85  23.22     3.07  11.19  16.24
     J  25.91  14.74  23.35    15.88  17.08  14.76    14.78   2.25   6.20
     R  15.42  20.34  11.81    20.86  25.66   9.40    27.39  10.04   4.22
Table 7. Summary statistics of the distances between the traces through the SoPF for sequences with large amount of segmentation noise.

Segmentation Study: Trace Distance (10^-3)
Distinguishing                 Intra-Class          Inter-Class
                               µ       σµ           µ       σµ
Motion Types                   10.74   0.56         22.16   0.49
Motion Types of Person 1        1.79   0.30         30.33   1.97
Motion Types of Person 2        2.91   0.53         15.17   0.64
Motion Types of Person 3        3.81   1.06         12.68   0.44
Persons based on Walking        1.60   0.31         17.04   1.29
Persons based on Jogging        4.31   1.05         12.92   1.07
Persons based on Running        2.60   0.47         14.15   0.73
shows the summary of the distances between different subsets. We can see that the distances
between traces are very similar to those shown in Table 3. This indicates that the relational
distribution representation has some amount of built-in scale invariance.
4.6 PCA of the Edge Images
One might ask: why not just do a PCA of the edge images instead of the relational
distributions of the edges? Our experience shows that the SoPF representation is more compact
than the PCA space of the raw edges themselves. Fig. 19 shows the plot of the eigenvalues
for both the edge-PCA and the SoPF spaces. Values were energy normalized. From this
plot it is obvious that the edge-PCA space is much less compact than the SoPF space; SoPF
can work with fewer dimensions than edge-PCA.
The computational complexity of the edge PCA is dependent on the image size used, and
hence dependent on the scale of the images, whereas the SoPF computational complexity is
dependent on the size of the relational distributions, which is a constant. In fact, we found
it difficult to allocate enough memory to compute the eigenvalues and eigenvectors directly
from the 256 × 130 edge images using a Sun Ultra 30 Creator workstation running at 246
MHz with 256 MB of RAM. On reduced sized images it took almost 34 hours to calculate
Table 8. Distance between the traces through the SoPF of two different half scaled cycles of motion for the three persons and three motion types (Walking (W), Jogging (J), and Running (R)).

Scale Study: Trace Distances (10^-3)
                      Person 1               Person 2               Person 3
                 W      J      R        W      J      R        W      J      R
Person 1   W  10.05  52.48  79.13    32.75  56.44  50.70    10.97  20.42  22.32
           J  49.58   3.12  18.14    18.24   6.77  24.89    54.94  23.91  32.35
           R  62.95  18.38   4.21    32.12  21.48  19.18    63.56  39.39  34.91
Person 2   W  21.63  22.76  35.57     3.42  22.09  29.05    27.55  22.58  20.12
           J  41.76   9.95  28.20    16.46   7.75  21.07    48.39  18.89  27.26
           R  30.97  21.44  26.35    20.91  20.36   5.15    33.13  20.64  13.52
Person 3   W  10.71  54.47  74.76    38.27  58.70  42.24     5.34  19.39  15.36
           J  30.92  37.44  61.05    38.85  36.89  26.49    28.09  15.37  16.83
           R  18.61  33.05  46.12    31.76  35.87  18.98    16.52   9.94   3.95
Table 9. Summary statistics of the distances between the traces through the SoPF of the half scaled version of the testing set, keeping the training set at the original size.

Scale Study: Trace Distance (10^-3)
Distinguishing                 Intra-Class          Inter-Class
                               µ       σµ           µ       σµ
Motion Types                   16.60   0.83         32.60   0.68
Motion Types of Person 1        3.12   0.50         43.06   2.61
Motion Types of Person 2        4.39   0.79         21.93   0.55
Motion Types of Person 3        5.66   1.64         17.52   0.78
Persons based on Walking        2.69   0.45         20.39   1.43
Persons based on Jogging        6.61   1.66         21.50   1.69
Persons based on Running        3.90   0.70         26.30   1.36
Figure 19. Comparison of the largest eigenvalues associated with the edge images of people in motion and those associated with the SoPF of 2-ary relational distributions of the same images.
the eigenspace. In contrast, the size of the relational distribution we used was 30 × 30 and
the eigenspace was easily computed.
CHAPTER 5
EVALUATION METHODOLOGY
In this chapter we explain the methods used to evaluate the performance of the algorithms
developed. First, we introduce concepts and methods related to measuring performance of
an algorithm over a specific dataset, and then we present methods to evaluate performance
of different algorithms over the same dataset.
According to Jain et al. [61], no evaluation method is sufficient to provide a convincing
and reliable measure of the accuracy of a biometric system, because performance evaluations are
very dependent on the database tested. That is why in our experiments we used four different
data sets captured under different imaging conditions, highlighting different covariates,
and of different sizes. For each experiment, we divide our datasets into gallery and probe
sets, adopting the successful FacE REcognition Technology (FERET) evaluations [62]. In
the biometrics vocabulary, the gallery set represents the enrolled data, or data on a watch list, and
the probe sets are the query data. The probe sets vary from the gallery set in increasing
degrees of difference in terms of the covariates. The same subject can be represented in both
the gallery and probe sets, but the same data unit (i.e., a sequence of images) from a person
is not used in both gallery and probe sets. We match each probe sequence to the gallery
sequences, thus obtaining a similarity matrix whose size is the number of probe sequences
by the number of gallery sequences.
5.1 Covariates
We consider a covariate to be a condition affecting gait. The different experiments that
we performed are structured to study how covariates like gait type (Chapter 6), view
angle (Chapter 7), and walking surface, footwear, and viewpoint (Chapters 8 and 9) affect the
recognition performance by varying that condition between gallery and probe sets.
Table 10. Sample rows from a file in SAS format for the experiment on different motion types.

Person       MotionType       Direction       Distance
SamePerson   SameMotionType   DiffDirection   2.3164163
SamePerson   DiffMotionType   SameDirection   6.3817268
SamePerson   DiffMotionType   DiffDirection   6.5920931
SamePerson   DiffMotionType   SameDirection   5.3880706
SamePerson   DiffMotionType   DiffDirection   5.7101398
DiffPerson   SameMotionType   SameDirection   7.2667040
5.1.1 Analysis of Variance (ANOVA)
We use ANOVA to quantify the effect of the covariates studied in the experiments of
Chapters 6 and 7. The information contained in the similarity matrices, constructed out of the
similarities between each pair of gallery and probe sequences, is used for this purpose. We
use the SAS software to perform the statistical analyses. A generalized linear model (GLM)
is used, which is a better option than the traditional factorial model since it supports the
use of categorical variables and provides additional output information. The data in the
similarity matrix is rearranged into a single column format (row-scanned). Tags are added
for each similarity value specifying the relation of the covariates that produced the value.
The product is a file in SAS format. Table 10 shows a sample of the first few lines of such
a file.
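The row-scanning and tagging step might look like the following sketch; the field names (`person`, `motion`, `direction`) are illustrative, and writing the result out in actual SAS syntax is left aside:

```python
def to_sas_rows(dist, gallery_tags, probe_tags):
    """Row-scan a probe x gallery distance matrix into the tagged
    single-column format of Table 10. Each tag dict carries the
    covariate levels of one sequence; the keys here are illustrative."""
    rows = []
    for p, ptag in enumerate(probe_tags):
        for g, gtag in enumerate(gallery_tags):
            rows.append((
                "SamePerson" if ptag["person"] == gtag["person"] else "DiffPerson",
                "SameMotionType" if ptag["motion"] == gtag["motion"] else "DiffMotionType",
                "SameDirection" if ptag["direction"] == gtag["direction"] else "DiffDirection",
                dist[p][g],
            ))
    return rows
```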
The similarity or distance is defined as the dependent variable and covariates such as
person, motion type, view angle, and direction are defined as the independent variables.
The results of the test will provide us with statistical evidence of the variation induced by
each independent variable in an experiment. By comparing the F-values and P-values we
can determine which independent variable has more effect on the dependent variable
and whether this effect can be considered statistically significant for the experiment.
5.2 Performance Evaluation
Following the pattern of the FERET evaluations, we measure performance for both identi-
fication and verification scenarios using cumulative match characteristics (CMCs) and
receiver operating characteristics (ROCs), respectively. The evaluation process is illustrated
in Fig. 20.
5.2.1 Identification
In the identification scenario, the task is to identify a given probe as one of the given
gallery images. To quantify performance, for each probe we sort the gallery images based
on computed similarities with that probe. In terms of the similarity matrix, this would
correspond to sorting the individual rows of the similarity matrix. If the correct gallery
sequence corresponding to the given probe occurs within rank k in this sorted set, then we
have a successful identification at rank k. A cumulative match characteristic plots these
identification rates (PI) against the rank k. The identification rate is the ratio of the number
of correct identifications to the total number of probes. Note that this is a closed universe
test, where every probe should be in the gallery.
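A CMC can be computed from a probe × gallery similarity matrix along these lines (a sketch; it assumes higher values mean more similar, whereas for distance matrices the sort order would be reversed):

```python
import numpy as np

def cmc(similarity, probe_ids, gallery_ids, max_rank=None):
    """Cumulative match characteristic: identification rate P_I at each
    rank. Closed-universe test, so every probe id occurs in the gallery."""
    similarity = np.asarray(similarity, float)
    gallery_ids = np.asarray(gallery_ids)
    n_probes, n_gallery = similarity.shape
    max_rank = max_rank or n_gallery
    hits = np.zeros(max_rank)
    for p in range(n_probes):
        order = np.argsort(-similarity[p])               # sort this row
        ranked = gallery_ids[order]
        r = int(np.where(ranked == probe_ids[p])[0][0])  # rank of correct match
        if r < max_rank:
            hits[r] += 1
    return np.cumsum(hits) / n_probes   # P_I at ranks 1..max_rank
```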
5.2.2 Verification
In the verification scenario, we are interested in knowing whether a person is indeed who
he/she claims to be. In other words, we are interested in matching a given pair of probe and gallery
images. This type of scenario can arise when trying to gain access to ATM machines or
entry into a building. To quantify performance in this scenario we use the classical receiver
operating characteristics (ROCs) that plot the verification rates (or detection rates) against
false alarm rates. The verification rate is the ratio of the number of correct identifications
to the number of probes in the gallery. The false alarm rate is the ratio of the number of
incorrect identifications to all possible wrong pairings of gallery and probe subjects. This
is an open universe test, where some probes are not in the gallery.
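A matching sketch for the verification ROC, sweeping a threshold over genuine (matching-identity) and impostor (wrong-pairing) scores; the rate definitions follow the text above, and the threshold grid is an arbitrary choice:

```python
import numpy as np

def roc_points(similarity, probe_ids, gallery_ids):
    """Verification (detection) rate vs. false alarm rate obtained by
    sweeping a threshold over a probe x gallery similarity matrix."""
    similarity = np.asarray(similarity, float)
    genuine, impostor = [], []
    for p, pid in enumerate(probe_ids):
        for g, gid in enumerate(gallery_ids):
            (genuine if pid == gid else impostor).append(similarity[p, g])
    genuine, impostor = np.asarray(genuine), np.asarray(impostor)
    points = []
    for t in np.sort(similarity.ravel()):   # one operating point per score
        pv = float((genuine >= t).mean())   # verification rate
        pf = float((impostor >= t).mean())  # false alarm rate
        points.append((pf, pv))
    return points
```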
5.3 Statistical Methods for the Evaluation of Human Identification Algorithms
In this section we describe McNemar's test, which we used to analyze and compare the
performance of two recognition strategies on the same dataset. The use of ROC and CMC
to evaluate the performance of these algorithms gives us a point of comparison between
Figure 20. The process of evaluating the performance of our algorithms.
Table 11. Paired data from algorithms being compared with McNemar's test.

                         Outcome of Algorithm A
                              S      F
Outcome of       S           33      7
Algorithm B      F            4     20
algorithms reporting results over the same database and using the same gallery and probe
sets, but it is not enough evidence to determine whether one algorithm performs better than
another. The other statistical characterization that we consider establishes ranges of
variation for changes in the gallery set.
5.3.1 McNemar's Test
Beveridge et al. [63] introduced to the computer vision biometrics community a simple
binomial model for the outcomes of human identification algorithms and proposed the use
of McNemar's test to compare two algorithms (A and B) tested on common data. Following
this methodology, four numbers need to be calculated:
(a) SS is the number of subjects correctly classified by both algorithms.
(b) FF is the number of subjects incorrectly classified by both algorithms.
(c) SF is the number of subjects correctly classified by algorithm A but incorrectly clas-
sified by algorithm B.
(d) FS is the number of subjects incorrectly classified by algorithm A but correctly clas-
sified by algorithm B.
These numbers are represented in Table 11. In McNemar's test, the SS and FF counts
are discarded. The null hypothesis, H0, is that P[SF] = P[FS]; this means that a failure
(SF or FS) is equally likely to favor algorithm A or B. The formulation for the rest of the
test is as follows: let NSF be the number of SF occurrences and NFS be the number of
FS occurrences. An alternative hypothesis, HALT, is considered to be P[SF] > P[FS]
(for the one sided version of this test), which implies that algorithm A fails less often than
algorithm B. Under H0,
P[at least NSF mismatches favor A] = P[at least NFS mismatches favor B]

= Σ_{i=0}^{NFS} [N! / (i! (N − i)!)] (0.5)^{NSF + NFS}, where N = NSF + NFS.
The probability resulting from this computation is the p-value for rejecting H0 in favor
of HALT .
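The one-sided exact p-value can be computed directly from the two mismatch counts; reading NSF = 4 and NFS = 7 off Table 11 for the illustration is our own choice of example:

```python
from math import comb

def mcnemar_one_sided(n_sf, n_fs):
    """One-sided exact McNemar p-value for H0: P[SF] = P[FS] against
    HALT: P[SF] > P[FS]; SS and FF counts are discarded, as in the text."""
    n = n_sf + n_fs
    # "at least n_sf of the n mismatches favor A" <=> "at most n_fs favor B"
    return sum(comb(n, i) for i in range(n_fs + 1)) * 0.5 ** n

# Illustration with the counts of Table 11 (SF = 4, FS = 7):
p_value = mcnemar_one_sided(4, 7)   # large p-value, so H0 is not rejected
```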
5.3.2 Performance Variations due to Variation in Gallery Data
Gait datasets are partitioned into subsets according to the covariates being investigated
and experiments are designed to study their effects. Typically, we select the largest data
subset as the gallery set and test against all other subsets defined from each experiment. In
this section, we describe a method to study the variation in performance of the algorithms
developed for all the designed experiments considering variations in the gallery data.
Let {C1, C2, . . . , CK} be the K covariates being investigated. We consider covariates
having 2 levels (ci(1) and ci(2)). Then, the dataset can be partitioned into 2^K subsets for
different combinations of covariate levels. Thus, a subset can be denoted by {C1 = c1(l1), . . . , Ci =
ci(li), . . . , CK = cK(lK)}. To study recognition rates with change in covariate Ci, we have
to consider gallery and probe sets with different Ci level, keeping other covariates constant.
Thus, possible gallery-probe pairs would be:
Gallery = {C1 = c1(l1), . . . , Ci−1 = ci−1(li−1), Ci = ci(1), Ci+1 = ci+1(li+1), . . . , CK = cK(lK)}
Probe = {C1 = c1(l1), . . . , Ci−1 = ci−1(li−1), Ci = ci(2), Ci+1 = ci+1(li+1), . . . , CK = cK(lK)}

for different combinations of {l1, . . . , li−1, li+1, . . . , lK}. Note that between the gallery
and probe pair only Ci is changing. There are 2^(K−1) such combinations. Another 2^(K−1)
combinations could be generated by reversing the roles of the probe and the gallery in the
above pairing, thus
Gallery = {C1 = c1(l1), . . . , Ci−1 = ci−1(li−1), Ci = ci(2), Ci+1 = ci+1(li+1), . . . , CK =
cK(lK)}
Probe = {C1 = c1(l1), . . . , Ci−1 = ci−1(li−1), Ci = ci(1), Ci+1 = ci+1(li+1), . . . , CK = cK(lK)}

The variation in recognition rates for these 2^K cases would give us an idea of the variability
of the recognition rate when the Ci factor is changed. In our case, we test and compare such
variations in performance of two different algorithms over the same dataset and present
the results in CMC curves for each experiment to graphically compare variations in the
performance of both algorithms.
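The enumeration of the 2^K gallery/probe pairings for a covariate Ci might be sketched as follows; the covariate names in the test of the sketch are purely illustrative:

```python
from itertools import product

def gallery_probe_pairs(levels, ci):
    """Enumerate the 2^K gallery/probe pairings that differ only in the
    covariate `ci`. `levels` maps each covariate name to its two levels."""
    others = [c for c in levels if c != ci]
    pairs = []
    for combo in product(*(levels[c] for c in others)):
        fixed = dict(zip(others, combo))
        a = {**fixed, ci: levels[ci][0]}   # gallery at level ci(1)
        b = {**fixed, ci: levels[ci][1]}   # probe at level ci(2)
        pairs.append((a, b))
        pairs.append((b, a))               # gallery/probe roles reversed
    return pairs
```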
CHAPTER 6
HUMAN IDENTIFICATION FROM DIFFERENT GAIT TYPES
In this chapter, we present an experiment designed to explore the possibility of gait-based
identification in a more extensive manner than in Chapter 4. We use a database of 10
persons performing three motion types, walking, jogging, and running, in an outdoor setting.
The viewpoint is frontal-parallel. Some example frames are shown in Fig. 21 for a
person (a) walking, (b) jogging, and (c) running. The average height of the person is 120
pixels. Each person performed these three different motion types in two different directions,
left-to-right and right-to-left. This gives us six different types of sequences (Walking-Left,
Walking-Right, Jogging-Left, Jogging-Right, Running-Left, Running-Right) for each per-
son, resulting in a total of 60 sequences.
6.1 Analysis of Covariates
The three covariates present in the 10 person database are motion type, walking direction,
and the identity of the person. In this section we quantify the strength of the variations in
gait due to these covariates. For our analysis, from each of the 60 sequences, we extracted
two motion cycles: one was used to build the SoPF (training set) and the other was used
for analysis (testing set). The dimensions of the trained SoPF are shown in Fig. 22 as
gray level images with the corresponding eigenvalues quantifying the associated variation
shown below each image. Although the eigenvectors are not exactly the same as those for
the treadmill sequences (Fig. 15), which just included the lower legs, we can see certain
similarities. Variation of distances seems to be important for the top eigenvectors and the
orientation variations are emphasized by later eigenvectors.
We computed the time-normalized distances (dnorm) between each pair of the 60 training
and 60 testing gait cycles. We then used analysis of variance (ANOVA) to study the effect
(a)
(b)
(c)
Figure 21. Sample frames of a person (a) walking, (b) jogging, and (c) running.
Figure 22. Ten most dominant dimensions of the SoPF for the different motion types database consisting of 10 persons.
Table 12. ANOVA table with results for different motion types experiments.

Source      DF    SS       F-value    P-value
Person       1    793.92   114.22     < 0.0001
Angle        1      9.53     1.37       0.2419
Direction    1     12.06     1.74       0.1879
of person, motion type, and direction of motion on the computed distance. Each covariate
can have two possible values: same or different, i.e. same person or different persons, same
motion type or different motion types, and same motion direction and different motion
directions. For instance, a computed distance could be between, say, different persons,
same motion types, and for movement in the same direction. ANOVA results are shown in
Table 12, from which we can see that the differences due to the subject are, by far, the largest
source of variation as compared to motion type or direction. In fact, as the F-values suggest,
the variation due to the persons is nearly two orders of magnitude larger than that due to
motion type or walking direction.
6.2 Gait-Based Recognition Experiments
Given that the subject is the largest source of variation in the distances out of the three
factors, it is natural to ask what kind of recognition rates we can get based on gait, be it
walking, jogging, or running. This we investigate next.
We conducted three gait recognition experiments based on walking, jogging, and running
gaits. For each experiment, we separated the sequences with the corresponding motion type
into gallery and probe sets, as explained in Chapter 5. One cycle from each sequence with
the person going left formed the gallery set and one cycle from each sequence with the
person going right formed the probe sets. We are basically using the left profile of the
person as gallery and the right profile as probe. The specific gallery and probe sets for each
experiment are listed in the second row of Table 13. The gallery set of images was also the
training set used to build the SoPF.
Table 13. Number of persons correctly identified for different motion types experiments.
At        Experiment
Rank      Gallery: Walking Left     Gallery: Jogging Left     Gallery: Running Left
          Probe: Walking Right      Probe: Jogging Right      Probe: Running Right
1         10 of 10                  10 of 10                  8 of 10
2         10 of 10                  10 of 10                  9 of 10
Table 14. Distance between the traces through the SoPF of two different cycles of walking motion for 10 persons.

Trace Distances (10^-3)
                                        Probe
Gallery    1      2      3      4      5      6      7      8      9      10
 1       12.96  22.15  20.00  16.00  35.48  21.03  25.44  33.12  24.86  26.67
 2       15.34   7.33  23.05  11.68  19.93  15.94  17.48  24.96  24.09  29.27
 3       20.10  30.40  12.92  24.61  31.39  22.43  30.40  24.97  29.73  18.37
 4       15.10  17.71  20.75   8.55  30.24  16.59  19.89  21.76  22.30  16.35
 5       21.68  20.59  14.83  17.80  13.48  17.37  17.73  14.87  30.78  21.12
 6       16.35  16.68  21.23  14.81  24.02   8.71  16.86  16.85  18.96  17.04
 7       23.07  20.64  24.78  19.70  19.02  18.91   9.95  15.88  31.63  27.01
 8       26.70  25.94  23.96  23.68  20.44  24.17  16.70   9.66  39.46  22.78
 9       22.86  23.43  28.87  21.97  17.24  11.00  10.37  18.63  18.23  22.91
10       35.15  47.61  32.34  37.36  41.08  27.78  37.20  26.24  38.87  11.96
For each probe we compute its distance from all the gallery images. If the identity of
the gallery image with the smallest distance to the probe matches the identity of the probe,
then we have a successful identification. Table 14 shows the distances between the gallery,
containing cycles from the persons walking left, and the probe, containing cycles from the
persons walking right; Table 15 shows the distances between cycles from the persons jogging,
and Table 16 shows the distances between cycles from the persons running. The minimum value in each
column, which corresponds to one probe, is highlighted. Note that the gallery entry with
the minimum value corresponds to the correct identity. Thus, we have correct identification
10 out of 10 times for walking gait. The same is also true for jogging gait. But, as we see in
Table 16, the rate falls to 8 out of 10 correct identifications for running gait. If we accept
correct identification to be the case when the identities of either the minimum (rank 1) or
the second minimum (rank 2) match, then the identification rate increases to 9 out of 10.
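The rank-1/rank-2 counting used for Tables 13 through 16 can be sketched as below, assuming a gallery × probe distance matrix in which row i and column i belong to the same person:

```python
import numpy as np

def identify(dist):
    """Rank-1 and rank-2 identification counts from a gallery x probe
    distance matrix whose diagonal pairs matching identities."""
    dist = np.asarray(dist, float)
    n = dist.shape[1]
    rank1 = rank2 = 0
    for probe in range(n):
        order = np.argsort(dist[:, probe])   # gallery sorted by distance
        if order[0] == probe:                # smallest distance is correct
            rank1 += 1
        if probe in order[:2].tolist():      # correct within the top two
            rank2 += 1
    return rank1, rank2
```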
Table 15. Distance between the traces through the SoPF of two different cycles of jogging motion for 10 persons.

Trace Distances (10^-3)
                                        Probe
Gallery    1      2      3      4      5      6      7      8      9      10
 1       12.69  17.73  16.02  13.38  16.38  17.04  17.99  19.24  17.13  23.59
 2       21.18   7.03  23.03  12.79  18.11  13.55  14.49  15.79  17.25  29.98
 3       18.69  28.38   9.12  22.22  25.75  25.19  23.26  26.12  25.36  15.35
 4       19.42  16.24  17.99   8.96  14.12  12.17  14.42  12.54  14.33  17.20
 5       27.00  19.69  20.80  17.97   7.73  13.84  16.98  14.92  19.10  30.46
 6       22.45  13.73  23.18  11.93  12.74   8.92  12.98  10.88  13.13  20.79
 7       29.66  16.96  22.97  17.41  14.46  15.35  11.11  13.30  23.48  33.75
 8       31.19  19.42  25.97  17.24  18.62  13.41  17.33   9.09  19.44  32.62
 9       30.55  18.67  26.73  22.57  19.89  10.31  18.27  14.16  11.09  20.57
10       19.51  30.41  20.51  17.12  30.60  24.94  26.36  22.08  23.40   9.84
Table 16. Distance between the traces through the SoPF of two different cycles of running motion for 10 persons.

Trace Distances (10^-3)
                                        Probe
Gallery    1      2      3      4      5      6      7      8      9      10
 1        6.32  15.45  17.98  15.84  20.25  21.14  20.84  18.35  26.24  26.51
 2       13.83   6.90  19.34  15.24  21.43  18.76  17.96  17.91  31.30  31.37
 3       21.13  23.91   5.69  14.26  19.22  16.67  24.72  14.35  25.61  12.01
 4       20.01  22.02  11.00  11.95  18.78  16.87  22.60  16.36  25.13  17.37
 5       23.45  26.09  19.51  17.88  13.22  20.26  23.82  19.32  25.98  23.32
 6       23.64  26.74  20.42  18.86  19.77  10.71  13.15  10.46  16.24  21.75
 7       22.80  14.53  24.87  18.28  27.07  13.07  12.19  14.60  28.80  35.37
 8       23.04  17.78  18.08  16.86  22.08   8.02  17.48   8.15  28.04  27.62
 9       24.96  31.20  25.16  25.49  29.72  23.06  19.39  21.81  12.68  28.52
10       42.47  61.60  27.19  39.03  53.07  39.91  43.46  35.08  29.50  24.43
CHAPTER 7
WALKING GAIT BASED IDENTIFICATION FROM DIFFERENT VIEW ANGLES
In this chapter, we investigate the viability of the SoPF framework for walking gait-based
recognition using a larger dataset of 20 persons. We also investigate the relationship of
the achieved recognition rates with viewing angle. For this, we imaged 20 persons walking
frontal-parallel, 22.5◦, and 45◦ with respect to the image plane, as depicted in Fig. 23.
As before, each person walks each of the three slanted paths in two different directions,
left-to-right and right-to-left. Thus, for each person, there are 6 sequences imaged under
6 possible conditions: 0◦ (frontal-parallel) going left (0L), 0◦ (frontal-parallel) going right
(0R), 22.5◦ going left (22L), 22.5◦ going right (22R), 45◦ going left (45L), and 45◦ going
right (45R). The abbreviations in parentheses will be used in the following discussion to
refer to these conditions. Fig. 24 shows 3 sample frames from the same person walking
the three differently angled paths. The frame size is 280 × 130. Although our dataset is
moderate in size, it is quite challenging. It presents a very difficult scenario for background
subtraction, including persons moving in the background and sudden illumination changes
due to clouds.
7.1 Analysis of Covariates
The three covariates present in the database for this experiment are walking direction, angle
of motion path, and the identity of the person. We quantify the strength of effect of these
factors on the variations in the distance values computed between two cycles from each of the
120 sequences (20 persons × 3 path angles × 2 walking directions). One cycle from each
of the 120 sequences formed the training set of images used to construct the SoPF, whose
leading dimensions are shown in Fig. 25 with the corresponding eigenvalues quantifying
the associated variation shown below each image. Qualitatively, these dimensions capture
Figure 23. Setup for data acquisition of different view angle walking sequences.
Figure 24. Sample frames from the same person walking (a) frontal-parallel, (b) 22.5◦, and (c) 45◦ with respect to the image plane.
Table 17. ANOVA table with results for different view angle experiments.

Source     DF       SS    F-value    P-value
Person      1  4624.33    1208.97   < 0.0001
Angle       1     0.74       0.19     0.6604
Direction   1     4.51       1.18     0.2775
similar aspects, such as primarily distance in the first few dimensions and then a combination
of orientation and distance, as in Fig. 22. They are, of course, different when examined
closely, since the previous database included three types of motion, not just walking as in
the present case.
As before, we quantify the effect of the covariates on the distances using ANOVA, whose
output is shown in Table 17. We see that subject is the largest and most significant source
of variation. In fact, as the F-values suggest, the variation due to the persons is at least
three orders of magnitude larger than due to angle or walking direction.
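The F-values in Table 17 come from comparing between-level to within-level variance of the distances. A minimal sketch for a single two-level factor, using synthetic placeholder numbers rather than the dissertation's measurements:

```python
import numpy as np

def one_way_F(groups):
    """One-way ANOVA F statistic: ratio of between-group to within-group
    mean squares, as reported in the F-value column of an ANOVA table."""
    all_x = np.concatenate(groups)
    grand_mean = all_x.mean()
    k, n = len(groups), len(all_x)
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Synthetic example: a strong factor separates the two groups cleanly.
a = np.array([1.0, 2.0, 3.0])     # distances at level 1 of the factor
b = np.array([11.0, 12.0, 13.0])  # distances at level 2
print(one_way_F([a, b]))  # → 150.0; a large F means the factor dominates
```

A factor with no effect, such as angle or direction in Table 17, yields an F near 1.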
Figure 25. Ten most dominant dimensions of the SoPF for 20 person database.
Table 18. Gallery and probe sets for gait recognition experiments over the 20 person database.

Experiment  Training set/Gallery set                Probe set
1           0◦ (frontal-parallel) going left (0L)   0◦ (frontal-parallel) going right (0R)
2           22.5◦ going left (22L)                  22.5◦ going right (22R)
3           45◦ going left (45L)                    45◦ going right (45R)
4           0◦ (frontal-parallel) going left (0L)   22.5◦ going right (22R)
5           0◦ (frontal-parallel) going left (0L)   45◦ going right (45R)
7.2 Gait-Based Recognition Experiments
Given that the subject is the largest source of gait variation, as measured in the SoPF, how
do the recognition rates vary with view angle? To answer this, we separated our database
into five sets of gallery and probe combinations, corresponding to 5 experiments listed in
Table 18. The going-left sequences form the galleries and the going-right sequences form
the probes. The training set of images, used to create the SoPF, consists of the union of
the gallery sets.
We use the first three experiments, 1, 2 and 3, to study whether recognition is possible
from views other than frontal-parallel ones. To study how recognition varies with the view
angle, we use experiments 1, 4, and 5. On comparing the CMCs and ROCs for experiments
1, 2, and 3, which are shown in Figs. 26 (a) and (b) respectively, we see that identification
and recognition rates from the three experiments are similar. Rank 1 identification rates
range from 75% to 80%, which improves to 85% to 95% at rank 2. Verification rates at 10%
false alarm are around 90%. We can conclude that gait-based recognition is possible from
non-frontal-parallel views, such as those viewed at 22.5◦ or 45◦.
Fig. 27 shows the (a) CMC and (b) ROC curves for experiments 1, 4 and 5 that study
the variation of identification and verification rates with viewpoint. The identification rate when
the gallery and the probe are from the same view angle is 80%, which drops only to 75%
when the probe is from the 22.5◦ viewpoint. But the performance falls drastically to 55% with
a 45◦ viewpoint probe set. The same trend is also seen in the ROCs. Thus, it appears that
the gait-based recognition using the SoPF framework is robust with respect to viewpoint
change up to 22.5◦.
One might argue that on a small dataset one should get near 100% identification rates.
To this, we point out the complexity of the outdoor imaging conditions in the data set and
the fact that we have a clear separation of train and test sets; we use the left profile for
training (or as gallery) and try to identify people from their right profiles (the probe sets).
Thus, the recognition rates also reflect the inherent variation in gait due to opposite profile
viewpoints in addition to any other factor that might be different between the probe and
gallery sets in each of the experiments.
Figure 26. (a) CMC and (b) ROC curves for experiments 1, 2 and 3, studying identification and verification rates at varying viewpoints.
Figure 27. (a) CMC and (b) ROC curves for experiments 1, 4 and 5, studying variation of identification and verification rates with change in viewpoint.
CHAPTER 8
BENCHMARKING WALKING GAIT BASED IDENTIFICATION
There is an increasing interest in human identification based on gait in the computer vi-
sion community. However, there is no benchmark to compare all emerging techniques,
mainly because we do not quite know the set of conditions under which the problem can
be solved. We know about potential factors that can affect human gait, such as walking
surface, footwear, view points, carrying objects, etc. However, the effects are not quantified
on a large dataset. In this chapter we first summarize the gait challenge problem [64] [65],
which contains a data set covering some of the mentioned variations, a set of experiments,
and a baseline algorithm. Then we present the performance of the baseline algorithm.
8.1 The Gait Challenge Problem
The gait challenge problem was designed to investigate the factors inducing variations in
human gait, and how they will affect the performance of gait-based recognition. There
are processes used in gait-based recognition that also need to be investigated such as fore-
ground/background segmentation, tracking, and dealing with occlusions. It is not possible
to draw conclusions based only on the performance figures of one algorithm on a small
database. Rather, these conclusions will come from detailed analysis of performance statis-
tics of multiple algorithms on a large common data set. The gait challenge problem provides
this framework.
8.2 The Data Set
The key to the success of the challenge problem is the database of video sequences collected
to support it. An ideal database helps define a set of challenge experiments that span a
range of characteristics and difficulties. These ranges are included in the gait challenge
Figure 28. Camera setup for the gait data acquisition.
problem because of the number of conditions under which a person’s gait is collected, the
number of individuals in the database, and the fact that all sequences are taken outside.
The database used in the challenge problem is the largest available to date in terms of
number of people, number of video sequences, and conditions under which a person’s gait
is observed. The current installment of the database consists of 452 sequences from 74
individuals, with each individual collected in up to 8 conditions. All the data is collected
outside, reflecting the added complications of shadows from sunlight, moving background,
and moving shadows due to cloud cover. This dataset is significantly larger than those that
are being used in present studies (Table 1), most of which are not publicly available.
The cameras for the concrete surface were consumer-grade Canon Opturas; these are the
same cameras used to collect data for the experiments in Chapters 4, 6, and 7. Two Canon
Optura PI cameras were used for the grass surface. All four are progressive-scan, single-
CCD cameras capturing 30 frames per second with a shutter speed of 1/250 second and with
auto-focus left on as all subjects were essentially at infinity. The cameras stream compressed
digital video to DV tape at 25 Mbits per second by applying 4:1:1 chrominance sub-sampling
and quantization, and lossy intra-frame adaptive quantization of DCT coefficients.
The imagery was recovered from tape at the National Institute of Standards and Tech-
nology (NIST). The camera was accessed over its IEEE 1394 Firewire interface using Pin-
nacle’s micro DV 300 PC board. The result is a stand-alone video file stored using Sony’s
(Digital Video) DV-specific dvsd codec in a Microsoft AVI wrapper. This capture from tape
does not re-compress and is not additionally lossy. Finally, the imagery is transcoded from
DV to 24-bit RGB using the Sony decoder and the result is written as PPM files, one file
per frame (720× 480 PPM file). This representation trades off storage efficiency for ease of
access.
Each subject walked multiple times, counterclockwise, around each of two similarly sized
and shaped elliptical courses. The basic setup is illustrated in Fig. 28. The elliptical
courses were approximately 15 meters on the major axis and 5 meters on the minor axis.
Both courses were outdoors. One course was laid out on a flat concrete walking surface. The
other was laid out on typical grass lawn surface. Each course was viewed by two cameras,
whose lines of sight were not parallel, but verged at approximately 30◦, so that the whole
ellipse was just visible from the two cameras. Fig. 29 shows one sample frame from each of
the four cameras on the two surfaces. The orange traffic cones marked the major axes of
the ellipses. The checkered object in the middle can be used to calibrate the two cameras.
The final sequences contain each subject walking several laps of the course. However,
only data from one full elliptical circuit for each condition is available. For the gait database,
those frames were clipped from the last such lap, when subjects are more comfortable with
being taped and have reached their normal walking speed. The number of frames in each sequence
ranges from 600 to 700 frames. The gait video data was collected on May 21 and 22, 2001.
Subjects were asked to bring a second pair of shoes, so that they could walk the two
ellipses a second time in a different pair of shoes. A little over half of the subjects walked in
two different shoe types. Thus there are as many as eight video sequences for each subject:
(grass(G) or concrete(C))×(two cameras, L or R)×(shoe A or shoe B). Table 19 shows the
number of sequences for each combination of conditions in the present database.
Figure 29. Frames from (a) the left camera for the concrete surface, (b) the right camera for the concrete surface, (c) the left camera for the grass surface, and (d) the right camera for the grass surface.
Table 19. Number of sequences for each combination of possible surface (G or C), shoe (A or B), and camera view (L or R).

                Concrete (C)      Grass (G)
Shoe             A      B          A      B
Left Camera     70     44         71     41
Right Camera    70     44         71     41
Table 20. The probe set for each of the challenge experiments. The number of subjects in each subset is given in square brackets.

Exp.  Probe           Difference
A     (G, A, L) [71]  View
B     (G, B, R) [41]  Shoe
C     (G, B, L) [41]  Shoe, View
D     (C, A, R) [70]  Surface
E     (C, B, R) [44]  Surface, Shoe
F     (C, A, L) [70]  Surface, View
G     (C, B, L) [44]  Surface, Shoe, View
8.3 Challenge Experiments
The set of challenge experiments of increasing difficulty for gait-based recognition is
presented next. Three covariates are studied: walking surface (concrete (C) or grass (G)), shoe
type (A or B), and viewpoint (left (L) or right (R)). Based on the values of the covariates
the dataset is divided into 8 possible subsets: {(G, A, L), (G, A, R), (G, B, L), (G, B, R),
(C, A, L), (C, A, R), (C, B, L), (C, B, R)}. Since not every subject was imaged under
every possible combination of factors, the sizes of these sets are different (Table 19). One
of the large subsets (G, A, R), i.e. (Grass, Shoe Type A, Right Camera), was designated as
the gallery set, which includes 71 subjects. The rest of the subsets are probe sets, differing
in various ways from the gallery. The structure of the challenge experiments is listed in
Table 20.
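The partitioning into gallery and probe subsets can be sketched as a small piece of bookkeeping. The helper and key names here are illustrative, not from the released challenge code:

```python
from itertools import product

# The 8 (surface, shoe, view) subsets, e.g. ('G', 'A', 'R').
conditions = list(product("GC", "AB", "LR"))

gallery_key = ("G", "A", "R")  # designated gallery: grass, shoe A, right camera
probe_keys = [c for c in conditions if c != gallery_key]  # the 7 probe sets

def covariate_difference(probe, gallery=gallery_key):
    """List the covariates in which a probe subset differs from the gallery."""
    names = ("Surface", "Shoe", "View")
    return [n for n, p, g in zip(names, probe, gallery) if p != g]
```

For instance, `covariate_difference(("G", "A", "L"))` returns `['View']`, matching the Difference column for Experiment A in Table 20.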
8.4 Baseline Algorithm
The baseline algorithm was designed to be simple and fast. It is composed of three parts.
The first part semi-automatically defines bounding boxes around the moving person in each
frame of a sequence. Using a Java-based GUI, bounding boxes in the starting, middle, and
ending frames of the sequence are manually outlined. The bounding boxes for the interme-
diate frames are linearly interpolated from these manual ones. Specifically, the locations of
the upper-left and the bottom-right corners are interpolated. This approximation strategy
works well for cases where there is nearly frontal-parallel, constant velocity motion, which
is the case for the frames from the back portion of the ellipse processed in the gait
challenge experiments. Fig. 30 shows some examples of the image data inside the bounding
box. Note that bounding boxes are not specified tightly around the person; rather there is
some amount of background information all around the person in each box. The second and
the third parts of the algorithm are silhouette extraction and computation of the similarity
measure, which are explained in detail in the next two subsections.

Figure 30. Sample bounding boxed image data as viewed from (a) left camera on concrete, (b) right camera on concrete, (c) left camera on grass, and (d) right camera on grass.
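The corner interpolation described above can be sketched as follows (the function and argument names are illustrative, not from the baseline code):

```python
import numpy as np

def interpolate_boxes(key_frames, key_boxes, frames):
    """Linearly interpolate bounding boxes between manually outlined frames.

    key_frames: sorted frame indices with manual boxes (start, middle, end)
    key_boxes:  matching (x0, y0, x1, y1) corner coordinates
    frames:     frame indices at which boxes are wanted
    """
    key_boxes = np.asarray(key_boxes, dtype=float)
    # The upper-left and bottom-right corners are interpolated independently.
    return np.stack([np.interp(frames, key_frames, key_boxes[:, c])
                     for c in range(4)], axis=1)
```

With keyframes at frames 0 and 10, the box at frame 5 is simply the midpoint of the two manual boxes.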
8.4.1 Silhouette Extraction
The motion silhouette is extracted from each frame by background subtraction, but only
within the semi-manually defined bounding boxes. In the first pass through a sequence,
the background statistics of the RGB values at each image location, (x, y), are computed
using pixel values outside the manually defined bounding boxes in each frame. Then, the
mean µB(x, y) and the covariances ΣB(x,y) of the RGB values at each pixel location are
computed. (Note that the images in this database are in color, unlike the ones used in
previous chapters.) Fig. 31 shows examples of the estimated mean background image and the
associated variances of the RGB channels. These images were histogram equalized and are
smaller than those shown in Fig. 29, because they show only the image locations where
the color statistics were computed. Notice that the variances are significantly higher in
the regions corresponding to the bushes than in other regions. The sharp contrast of the
calibration box also introduces significant variations, mainly due to DV compression.

Figure 31. Estimated mean background for a sequence on (a) concrete and (c) grass. Variance of the RGB channels in the background pixels on (b) concrete and (d) grass.
The Mahalanobis distance of the pixel value from the estimated mean background value
is computed for pixels within the bounding box of each frame. Any pixel with this distance
above a user specified threshold DMaha is a foreground pixel. If the difference image is
smoothed using a 9×9 pyramidal-shaped averaging filter, or equivalently, two passes of a 3×3
averaging filter, the quality of the silhouette and recognition performance improves. This
smoothing compensates for DV compression artifacts. On the difference thresholded image,
we perform two post-processing steps to extract the normalized silhouette. First, small
regions are detected by connected component labeling; any region smaller than NSize pixels
is deleted. Second, the remaining foreground region is scaled so that its height is 128
pixels to occupy the whole length of the 128×88 output silhouette frame. The scaling of the
silhouette offers some amount of scale invariance and facilitates the fast computation of the
similarity measure. There are two ways of performing this scaling: (a) scale the thresholded
silhouette, or (b) scale the difference image using bilinear interpolation and then threshold.
The second method involves more computations than the first and produces better “looking”
silhouettes, but as will be seen, the performance is not significantly different. The sources of
segmentation errors include (a) shadows, especially in the concrete sequences, (b) inability
to segment parts of a body as distances fall just below the threshold, (c) moving objects
in the background, such as the fluttering tape in the concrete sequences, moving leaves in
the grass sequences, or other moving persons in the background, and (d) DV compression
artifacts near the boundaries of the person. Fig. 32 shows some of these problematic cases.

Figure 32. The bottom row shows sample silhouette frames depicting the nature of segmentation issues that need to be tackled. The raw image corresponding to each silhouette is shown in the top row.
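The extraction steps above can be sketched end to end. A minimal illustration, assuming the per-pixel background mean and inverse covariances are already estimated; the names, the smoothing realization, and the centering details are assumptions, not the baseline implementation:

```python
import numpy as np
from scipy import ndimage

def extract_silhouette(frame, mean_bg, inv_cov, d_maha=7.0, n_size=200,
                       out_h=128, out_w=88):
    """Sketch of the baseline silhouette extraction inside one bounding box.
    frame, mean_bg: (H, W, 3) RGB arrays; inv_cov: (H, W, 3, 3) inverse
    background covariances."""
    diff = frame - mean_bg
    # Per-pixel squared Mahalanobis distance from the background model.
    d2 = np.einsum("hwi,hwij,hwj->hw", diff, inv_cov, diff)
    # Smooth the distance image (two passes of a small mean filter) to
    # compensate for DV compression artifacts.
    d2 = ndimage.uniform_filter(ndimage.uniform_filter(d2, 3), 3)
    fg = d2 > d_maha ** 2  # threshold the (squared) distance
    # Delete connected components smaller than n_size pixels.
    labels, n = ndimage.label(fg)
    sizes = np.bincount(labels.ravel())
    for i in range(1, n + 1):
        if sizes[i] < n_size:
            fg[labels == i] = False
    ys, xs = np.nonzero(fg)
    out = np.zeros((out_h, out_w), dtype=bool)
    if len(ys) == 0:
        return out  # nothing segmented
    # Crop the surviving region and scale it so its height fills the frame.
    crop = fg[ys.min():ys.max() + 1, xs.min():xs.max() + 1].astype(float)
    sil = ndimage.zoom(crop, out_h / crop.shape[0], order=1) > 0.5
    h, w = min(sil.shape[0], out_h), min(sil.shape[1], out_w)
    x0 = (out_w - w) // 2  # center horizontally in the 128 x 88 frame
    out[:h, x0:x0 + w] = sil[:h, :w]
    return out
```

This corresponds to scaling option (a), thresholding before interpolation; option (b) would zoom the smoothed distance image first and threshold afterwards.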
8.4.2 Similarity Computation
The similarity computation for sequences with more than one gait cycle presented in
Section 3.3 is reformulated in this section in terms of the challenge problem. Let the
probe and the gallery silhouette sequences be denoted by $S_P = \{S_P(1), \cdots, S_P(M)\}$ and
$S_G = \{S_G(1), \cdots, S_G(N)\}$, respectively.
The probe sequences are partitioned into disjoint subsequences of $N_{Probe}$ contiguous
frames. Let the k-th probe subsequence be denoted by $S_{Pk} = \{S_P(k), \cdots, S_P(k+N_{Probe})\}$.
The distance measure will be the correlation between each of these subsequences and the
gallery sequence:

$$\mathrm{Corr}(S_{Pk}, S_G)(l) = \sum_{j=1}^{N_{Probe}} \mathrm{FrameSim}\left(S_P(k+j), S_G(l+j)\right) \qquad (10)$$
The similarity is chosen to be the median value of the maximum correlation of the
gallery sequence with each of these probe subsequences.
$$\mathrm{Sim}(S_P, S_G) = \mathrm{Median}_k\left(\max_l \mathrm{Corr}(S_{Pk}, S_G)(l)\right) \qquad (11)$$
At the core of the above computation is the need to compute the similarity between two
silhouette frames, FrameSim (SP(i),SG(j)), which is computed as the ratio of the number
of pixels in their intersection to that in their union. Thus, if the number of foreground
pixels in silhouette S is denoted by Num(S) then we have,
$$\mathrm{FrameSim}(S_P(i), S_G(j)) = \frac{\mathrm{Num}(S_P(i) \cap S_G(j))}{\mathrm{Num}(S_P(i) \cup S_G(j))} \qquad (12)$$
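Equations 10 through 12 translate directly into code. A sketch over lists of boolean silhouette frames (the loop structure and names are an assumed implementation, not the released baseline code):

```python
import numpy as np

def frame_sim(a, b):
    """Eq. 12: ratio of silhouette intersection to union (a, b boolean)."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def sequence_sim(probe, gallery, n_probe=30):
    """Eqs. 10 and 11: median over disjoint probe subsequences of the best
    correlation against all shifts of the gallery sequence."""
    peaks = []
    for k in range(0, len(probe) - n_probe + 1, n_probe):
        sub = probe[k:k + n_probe]
        # Eq. 10: correlation of this subsequence at every gallery shift l.
        corrs = [sum(frame_sim(sub[j], gallery[l + j]) for j in range(n_probe))
                 for l in range(len(gallery) - n_probe + 1)]
        peaks.append(max(corrs))
    return float(np.median(peaks))  # Eq. 11
```

Two identical sequences give the maximum attainable similarity, `n_probe` times the perfect frame similarity of 1.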
8.4.3 Parameters
There is no calibration requirement. However, the algorithm does have three parameters
that need to be chosen.
(a) DMaha is used to threshold the Mahalanobis distance. Since this distance measure is
normalized by the covariances, the choice of the threshold tends not to be sensitive
to a particular image.
(b) NSize is used to delete small regions and fill in small holes in the thresholded difference
image.
(c) NProbe is the size of each subsequence obtained by partitioning the probe sequence.
We studied the performance variation of challenge Experiment A around the operating point
DMaha = 7, NSize = 200, NProbe = 30, which was found to be at least a locally optimal
point. With an increase in DMaha the silhouettes become thinner, but parts tend to become
disconnected. The impact of the NSize parameter is less obvious visually in terms of the
overall main silhouette, but it does get rid of spurious small extraneous connected regions.
Table 21. Baseline performance for the challenge experiments in terms of the identification rate PI at ranks 1 and 5, verification rate PV at a false alarm rate of 10%, and area under ROC (AUC).

Experiment  Difference           PI (rank 1)  PI (rank 5)   PV    AUC
A           View                     79%          96%       86%   0.937
B           Shoe                     66%          81%       76%   0.883
C           Shoe, View               56%          76%       59%   0.844
D           Surface                  29%          61%       42%   0.765
E           Surface, Shoe            24%          55%       52%   0.774
F           Surface, View            30%          46%       41%   0.750
G           Surface, Shoe, View      10%          33%       36%   0.759
The NProbe parameter is used in the similarity computation and does not affect the silhouette
quality.
8.5 Baseline Performance
Fig. 33 plots the CMCs and ROCs of the 7 challenge experiments. Table 21 lists some of
the key performance indicators, namely, the identification rate (PI) at ranks 1 and 5, the
verification rate (PV ) for a false alarm rate of 10%, and the area under the ROC (AUC).
There are several observations to be made. First, the identification rate ranges from 10%
to 79% at rank 1, which improves to a range from 33% to 96% at rank 5. In terms of
ROC performance, the detection rates range from 36% to 86% for a false alarm rate of
10%. These results are encouraging given the simple nature of the baseline algorithm.
More sophisticated algorithms should yield better performance, for which there is much
room.
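The indicators in Table 21 can be computed from a similarity matrix in a few lines. An illustrative sketch, assuming a square matrix in which gallery g is the true match of probe p when g equals p (not the evaluation code used for the tables):

```python
import numpy as np

def cmc_and_verification(sim, far=0.10):
    """Compute the CMC curve and the verification rate at a given
    false-alarm rate from a square gallery-by-probe similarity matrix."""
    n = sim.shape[0]
    # Rank of the true match = number of gallery scores strictly above it.
    ranks = np.array([(sim[:, p] > sim[p, p]).sum() for p in range(n)])
    cmc = np.array([(ranks < r).mean() for r in range(1, n + 1)])
    genuine = np.diag(sim)
    impostor = sim[~np.eye(n, dtype=bool)]
    # Operating threshold set so that `far` of impostor scores exceed it.
    thr = np.quantile(impostor, 1.0 - far)
    return cmc, float((genuine > thr).mean())
```

`cmc[0]` is the rank-1 identification rate PI and the second return value is PV at the chosen false-alarm rate.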
Second, both the identification rates, as seen in the CMCs, and the detection rates, as
seen in the ROCs, fall as one goes from Experiment A to G. This offers a natural ranking
of the experiments in terms of their challenge nature, i.e. the situation in Experiment A,
where the difference between probe and gallery is just the viewpoint, is easier to solve than
that in Experiment G, where the probe is different in terms of all the three covariates.
Figure 33. Baseline performance for the challenge experiments, (a) CMC curves and (b) ROCs plotted up to a false alarm rate of 20%.
Third, among the three covariates, viewpoint variation of about 30◦ seems to have the
least impact and surface type has the most impact based on the drop in the identification
rate due to each of these covariates. Apart from the effect of the individual covariates on
performance, there also seem to be interactions between their effects. For instance, shoe
type (Experiment B) seems to impact performance more than viewpoint (Experiment A)
but viewpoint change along with surface change (Experiment F) impacts performance more
than shoe type change along with surface change (Experiment E).
CHAPTER 9
PERFORMANCE OF THE SOPF REPRESENTATION
In this chapter, we explore the performance of the SoPF representation on the gait challenge
database. First, we experiment with the low level feature types used to build the SoPF.
One choice is to use the computed silhouettes to select the 2D edges as was done in the
previous experiments. The other choice is to consider the edges of the silhouette itself. The
first strategy might pick up features from the subject's clothing, which is not a problem for
the second case. However, the second case relies heavily on the quality of the silhouettes.
We have experimented with using features from both strategies. Fig. 34 shows the moving
edges as computed in previous experiments in (a) and from the binary silhouettes in (b).
Second, we present statistical tests to compare the performance of the SoPF with that of
the baseline algorithm. Our interest is not only to visually compare CMC and ROC curves
but also to statistically measure the significance of possible differences in performance in
each one of the experiments of the challenge problem. For this purpose we use Mc Nemar’s
test, which was introduced in Section 5.3.1. Third, we measure performance of the SoPF
representation and baseline algorithm using manually specified silhouette data, which give a
better understanding of the best possible performance by eliminating noise from background
subtraction. Last, we present experiments to quantify the variation in performance of both
the baseline and SoPF algorithms when varying the gallery type.
9.1 Varying the Type of Low Level Features
We experimented with two types of edge features: (a) edges of the gray level images selected
by the silhouettes, and (b) edges of the silhouette itself. In the first case, when silhouettes
are used as masks to extract the edges in motion (Fig. 34(a)) the size of the bounding box
is variable.

Figure 34. Moving edges (a) using the binary silhouettes as masks over the edges of the original images and (b) directly from the binary silhouettes.

The scaling factor D needed to compute the relational distributions was calculated by
fitting a line through the curve generated from the change in the height of the
silhouettes over time. In the second case, when edges are extracted from the silhouettes,
these are scaled to the size of the bounding box as in Fig. 34(b). The scaling factor D used
in this case is the diagonal of the bounding box. The size of the bounding box was fixed to
128 × 88 pixels.
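The two scale-factor choices can be sketched as follows; the heights list is a placeholder for the per-frame silhouette heights of a real sequence:

```python
import numpy as np

# Strategy (a): the silhouette height oscillates over a gait cycle, so a line
# fitted to height-versus-frame gives a smooth per-frame scale estimate D.
def scale_from_heights(heights):
    t = np.arange(len(heights))
    slope, intercept = np.polyfit(t, heights, 1)
    return slope * t + intercept

# Strategy (b): silhouettes live in a fixed 128 x 88 box, so D is simply the
# box diagonal, identical for every frame.
D_FIXED = float(np.hypot(128, 88))
```

The line fit in strategy (a) smooths out the within-cycle height oscillation that would otherwise leak into the relational distributions.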
For each experiment (Table 20), the gallery set was used as the training set to build
the SoPF. We keep 10 eigenvectors associated with the 10 largest eigenvalues. For each
case, we have experimented with varying the number of eigenvectors that are kept based
on the approximation error, but this did not change the overall results significantly. In
computing the similarities between two sequences, we used the mean instead of the median
(see Eq. 9), which gave us better results.
9.1.1 Silhouette Masked Image Edges as Low Level Features
We first present results where moving edges were selected using silhouettes as masks, which
is the case of the experiments presented so far. The results for the challenge experiments
are shown in the (a) CMC and (b) ROC curves of Fig. 35, where we can see that the
performance for the first 3 experiments is good compared with the results shown for the
baseline algorithm (see Fig. 33). The performance for the last 4 experiments dropped
significantly and is even lower than that of the baseline algorithm. These experiments exercise
Table 22. Performance comparison of baseline and SoPF algorithms when using silhouette masked image edges as low level features. (Correct identifications (Corr), total probes in gallery (Tot), and identification rate (PI)).

            Baseline         SoPF             Paired Outcomes    Mc Nemar's
Experiment  Corr/Tot   PI    Corr/Tot   PI    SS   SF   FS   FF  test p-value
A           56/71     79%    64/71     90%    51    5   13    2  0.98
B           27/41     66%    30/41     74%    24    3    6    8  0.91
C           23/41     56%    22/41     54%    13   10    9    9  0.50
D           19/66     29%    10/66     15%     3   16    7   40  0.05
E           10/42     24%     3/42      7%     1    9    2   30  0.03
F           20/66     30%     3/66      5%     1   19    2   44  < 0.01
G            4/42      9%     3/42      7%     1    3    2   36  0.47
the surface covariate. The most probable cause is that the gait velocity is different across
surfaces for the same person.
Table 22 shows the breakup of the successes and failures for the baseline and SoPF
algorithm. For each experiment, we list the total number of correct matches for the baseline
and SoPF, followed by the number of sequences in which both succeeded (SS), one succeeded
and the other failed (SF and FS), and both failed (FF). The last column lists the p-value for
the Mc Nemar’s test. The drop in performance that we see for experiments D, E, and F is
significant. However, note that for all these experiments the number of sequences on which
both the baseline and SoPF failed is very large, attesting to the difficult nature of
the surface covariate.
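Only the discordant SF and FS counts drive the test. One plausible exact form, consistent with the tabulated p-values for experiments A and D (the precise variant used in Section 5.3.1 is an assumption here):

```python
from math import comb

def mcnemar_exact(sf, fs):
    """One-sided exact McNemar test on the discordant pairs: the probability,
    under the null of equal error rates, of at least `sf` of the sf + fs
    discordant sequences favoring the first algorithm."""
    n = sf + fs
    return sum(comb(n, k) for k in range(sf, n + 1)) / 2 ** n
```

For experiment D, `mcnemar_exact(16, 7)` evaluates to about 0.047, matching the 0.05 reported in Table 22.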
9.1.2 Silhouette Boundary Edges as Low Level Features
In this section we present a different approach from the one we used in our previous ex-
periments to compute low level features. For this case, we perform edge detection over the
binary silhouettes and use the resulting edge pixels as low level features. The purpose of
developing this strategy is to make the SoPF algorithm more robust to possible clothing
variations or other noise coming from the boundary gap between the edges of the person
and objects in the background. This approach is faster since it does not have to read the
original image to compute the edges and then select the ones falling within the binary
silhouette. Instead, the edges are computed directly over the binary silhouette image.

Figure 35. Performance of the SoPF representation using silhouette masked image edges as low level features. (a) CMC curves and (b) ROCs plotted up to a false alarm rate of 20%.

Table 23. Performance comparison of the baseline and SoPF algorithms when using silhouette boundary edges as low level features. (Correct identifications (Corr), total probes in gallery (Tot), and identification rate (PI)).

            Baseline         SoPF             Paired Outcomes    Mc Nemar's
Experiment  Corr/Tot   PI    Corr/Tot   PI    SS   SF   FS   FF  test p-value
A           56/71     79%    47/71     66%    42   14    5   10  0.03
B           27/41     66%    25/41     61%    22    5    3   11  0.36
C           23/41     56%    14/41     34%     8   15    6   12  0.04
D           19/66     29%     8/66     12%     5   14    3   44  < 0.01
E           10/42     24%     4/42     10%     2    8    2   30  0.05
F           20/66     30%     8/66     12%     5   15    3   43  < 0.01
G            4/42      9%     2/42      5%     0    4    2   36  0.33

The
rest of the process is the same. The results for the challenge experiments are shown in the
(a) CMC and (b) ROC curves of Fig. 36, where we can see the same behavior as in the
experiment in the last section; however, the performance for the first three experiments is
lower but close to that from the baseline algorithm, and the last four experiments are below
the baseline. Mc Nemar’s test was also applied in this case. In Table 23 we can see that
when using this strategy, only the performance from experiment B can be compared to that
from the baseline.
For completeness, we compare the performance of our two low level feature strategies
using Mc Nemar's test. The success and failure modes are presented in Table 24, where we can
see that for experiments A, B, and C, which investigate covariates while the walking surface
stays fixed, the difference in performance is statistically significant in favor of the first strategy when
we use silhouette masked image edges. For experiment E, the difference is significant in
favor of our second strategy when we use silhouette boundary edges. For the rest of the
experiments the difference is not significant.
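The p-values in these tables depend only on the discordant paired outcomes (SF and FS). A sketch of one common exact form of McNemar's test, a one-sided binomial sign test on the discordant pairs, which agrees with the tabled value for experiment A of Table 23 (we hedge: the dissertation does not spell out which variant was used):

```python
from math import comb

def mcnemar_exact_p(sf, fs):
    """One-sided exact sign test on the discordant pairs: probability
    of at least max(sf, fs) heads in sf + fs fair-coin tosses."""
    n = sf + fs
    k = max(sf, fs)
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Experiment A of Table 23: baseline succeeds where SoPF fails 14 times,
# the reverse 5 times.
print(round(mcnemar_exact_p(14, 5), 2))  # 0.03
```

The concordant counts SS and FF do not enter the test; they only describe where the two algorithms agree.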
9.2 Using Manually Segmented Silhouettes
In this section, we used a few manually segmented silhouettes to test both the baseline
and the SoPF algorithms. One gait cycle was extracted from 19 Gallery sequences, 15
Figure 36. Performance of the SoPF representation using silhouette boundary edges as low level features. (a) CMC curves and (b) ROCs plotted up to a false alarm rate of 20%.
Table 24. Performance comparison of the SoPF algorithm when using silhouette masked image edges (SoPF-M) and silhouette boundary edges (SoPF-B) as low level features. (Correct identifications (Corr), total probes in gallery (Tot), and identification rate (PI)).
             SoPF-M             SoPF-B          Paired Outcomes    McNemar's
Experiment   Corr/Tot   PI     Corr/Tot   PI    SS  SF  FS  FF     test p-value
A            64/71      90%    47/71      66%   44  20   3   4     < 0.01
B            30/41      74%    25/41      61%   24   6   1  10     0.05
C            22/41      54%    14/41      34%    9  13   5  14     0.05
D            10/66      15%     8/66      12%    2   8   6  50     0.40
E             3/42       7%     4/42      10%    2   1   2  37     0.75
F             3/66       5%     8/66      12%    0   3   8  55     0.97
G             3/42       7%     2/42       5%    0   3   2  37     0.47
Figure 37. (a) Manually extracted silhouette and (b) automatically extracted silhouette.
Probe B sequences, and 12 Probe D sequences. This ground truth data allows us to obtain
recognition results that are not influenced by noise, shadows, or errors of the automatic
background subtraction techniques. We used the edges from the boundary of the
silhouettes as the low level features for the SoPF representation. Fig. 37
shows silhouettes from the same frame, (a) manually and (b) automatically extracted, in
which we can see how much background noise is eliminated in the ground truth data.
Gait recognition results are shown in Table 25 along with the number of successes and
failures from McNemar's test. In these two experiments the behavior of both algorithms
was similar, with the SoPF algorithm having more successes than the baseline. However, the
performance on experiment D, which involves the surface covariate, seems to be worse than that of
Table 25. Gait recognition results using ground truth silhouettes. (Correct identifications (Corr), total probes in gallery (Tot), and identification rate (PI)).
             Baseline           SoPF            Paired Outcomes
Experiment   Corr/Tot   PI     Corr/Tot   PI    SS  SF  FS  FF
B            6/15       40%    7/15       47%    5   1   2   7
D            1/12        8%    2/12       17%    0   1   2   9
experiment B, which involves the viewpoint covariate. For definitive conclusions, we will have to experiment with
a larger number of manually segmented silhouettes.
9.3 Performance Variation of Baseline and SoPF Algorithms due to Variations in Gallery Data
In this section, we consider the challenge dataset as the union of eight subsets {(G, A, R) ∪ (G, A, L) ∪ (G, B, R) ∪ (G, B, L) ∪ (C, A, R) ∪ (C, B, R) ∪ (C, A, L) ∪ (C, B, L)}. The notation
for each subset was defined in Section 8.3. So far, we have used the (G, A, R) subset as the
gallery for our experiments. Here we vary the choice of gallery and compute recognition
rates for all the challenge experiments (see Table 20) as described in Section 5.3.2. Table 26
depicts the relationships of the data subsets with the challenge experiments. The corresponding
results are shown in Table 27, which contains identification rates for the baseline algorithm,
and Table 28, which contains the rates for the SoPF algorithm. Figs. 38 to 44 show the
CMC curves for each experiment for both the baseline and SoPF algorithms. For the SoPF
algorithm, we consider the first approach described in Section 9.1.1, which uses silhouette
masked image edges as low level features. The parameters used for both the baseline
and SoPF algorithms are the same as in previous experiments. No optimization was
performed.
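The identification rates and CMC curves in these experiments come from ranking gallery subjects by their distance to each probe. A minimal sketch, not the dissertation's code: `cmc` is a hypothetical helper that assumes probe i's true match is gallery entry i of a probe-by-gallery distance matrix:

```python
import numpy as np

def cmc(dist, max_rank=20):
    """Cumulative Match Characteristic from a distance matrix where
    dist[i, j] is the probe-i / gallery-j distance and probe i's true
    identity is gallery index i. Returns rank-1..max_rank rates in %."""
    n = dist.shape[0]
    correct = dist[np.arange(n), np.arange(n)][:, None]
    # Rank of the correct subject: 1 + number of strictly closer entries.
    ranks = 1 + (dist < correct).sum(axis=1)
    return [100.0 * (ranks <= k).mean() for k in range(1, max_rank + 1)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic matrix whose diagonal (correct match) is always closest.
    d = rng.random((5, 5)) + 2 * (1 - np.eye(5))
    print(cmc(d, max_rank=3)[0])  # rank-1 identification rate: 100.0
```

The identification rate PI reported in the tables corresponds to the rank-1 point of such a curve.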
The results presented in this section show that walking surface is the covariate causing
most of the variation in performance, followed by view, with shoe being the most stable.
Within a walking surface (experiments A, B, and C), performance is better than across
surfaces (D, E, F, and G), and varies less. Experiment B shows the smallest
variation in performance for both the baseline and SoPF algorithms, and with similar
Table 26. Relationship between data subsets and challenge experiments when using different subsets as gallery.
                                    Experiments
Gallery    A        B        C         D          E          F          G
           (view)   (shoe)   (shoe     (surface)  (surface   (surface   (surface
                             + view)              + shoe)    + view)    + shoe
                                                                        + view)
(G,A,R)    (G,A,L)  (G,B,R)  (G,B,L)   (C,A,R)    (C,B,R)    (C,A,L)    (C,B,L)
(G,A,L)    (G,A,R)  (G,B,L)  (G,B,R)   (C,A,L)    (C,B,L)    (C,A,R)    (C,B,R)
(G,B,R)    (G,B,L)  (G,A,R)  (G,A,L)   (C,B,R)    (C,A,R)    (C,B,L)    (C,A,L)
(G,B,L)    (G,B,R)  (G,A,L)  (G,A,R)   (C,B,L)    (C,A,L)    (C,B,R)    (C,A,R)
(C,A,R)    (C,A,L)  (C,B,R)  (C,B,L)   (G,A,R)    (G,B,R)    (G,A,L)    (G,B,L)
(C,B,R)    (C,B,L)  (C,A,R)  (C,A,L)   (G,B,R)    (G,A,R)    (G,B,L)    (G,A,L)
(C,A,L)    (C,A,R)  (C,B,L)  (C,B,R)   (G,A,L)    (G,B,L)    (G,A,R)    (G,B,R)
(C,B,L)    (C,B,R)  (C,A,L)  (C,A,R)   (G,B,L)    (G,A,L)    (G,B,R)    (G,A,R)
Table 27. Performance variation of the baseline algorithm due to variations in gallery type.
                              Experiments
Gallery    A       B       C       D       E       F       G
(G,A,R)    79%     66%     56%     29%     24%     30%      9%
(G,A,L)    81%     68%     53%     38%     25%     25%     18%
(G,B,R)    83%     82%     44%     38%     28%     19%     24%
(G,B,L)    73%     73%     51%     41%     32%     29%     27%
(C,A,R)    56%     77%     47%     15%     14%     12%      8%
(C,B,R)    77%     81%     44%     22%     24%     19%     17%
(C,A,L)    70%     77%     49%     20%     16%     11%     16%
(C,B,L)    83%     77%     53%     16%     29%     16%     12%
Range      70-83%  66-82%  44-56%  15-38%  14-32%  11-30%  8-27%
recognition rates. Looking at the variation in performance for experiment A, we see that
viewpoint change on concrete impacts the baseline algorithm to some extent, but it
produces greater variations in the performance of the SoPF algorithm.
Figure 38. CMCs of (a) baseline and (b) SoPF algorithms for experiment A (view).
Figure 39. CMCs of (a) baseline and (b) SoPF algorithms for experiment B (shoe).
Figure 40. CMCs of (a) baseline and (b) SoPF algorithms for experiment C (view and shoe).
Figure 41. CMCs of (a) baseline and (b) SoPF algorithms for experiment D (surface).
Figure 42. CMCs of (a) baseline and (b) SoPF algorithms for experiment E (surface and shoe).
Figure 43. CMCs of (a) baseline and (b) SoPF algorithms for experiment F (surface and view).
Figure 44. CMCs of (a) baseline and (b) SoPF algorithms for experiment G (surface, shoe, and view).
Table 28. Performance variation of the SoPF algorithm due to variations in gallery type.
                              Experiments
Gallery    A       B       C       D       E       F       G
(G,A,R)    90%     74%     54%     15%      7%      5%      7%
(G,A,L)    89%     79%     42%      6%      8%      8%      8%
(G,B,R)    88%     76%     49%      8%      8%      6%      6%
(G,B,L)    85%     81%     62%      6%      6%     13%      8%
(C,A,R)    44%     70%     26%     16%     13%      6%      3%
(C,B,R)    59%     65%     38%     11%     12%     11%      8%
(C,A,L)    54%     65%     32%      5%      3%      3%      3%
(C,B,L)    58%     71%     26%      6%     10%      2%     10%
Range      44-90%  65-81%  26-62%  6-16%   3-13%   2-13%   3-10%
CHAPTER 10
CONCLUSIONS
We presented a statistical framework for motion analysis that tracks the variation of non-
stationarity in the distributions of relations among image features in individual frames.
We proposed the concept of the Space of Probability Functions (SoPF) that allows us to
capture the non-stationary variations. Among the attractive features of this approach are
(a) no feature level tracking or correspondence is necessary, (b) segmentation of object from
background need not be perfect, (c) there is no need for explicit object shape models, and
(d) movement between frames need not be on the order of one or two pixels. We presented
extensive experiments. First, we studied the robustness of the SoPF representation with
respect to segmentation and scale changes. Second, we explored the possibility of recognition
from walking, jogging, and running gaits. Third, we studied the variation of walking gait
with respect to viewpoint changes. Fourth, we benchmarked the performance using the
gait challenge problem over a large gait dataset. Qualitative conclusions that can be drawn
from the studies are: (a) the SoPF representation is robust with respect to segmentation
and scale changes, (b) the subject is a far greater source of gait variation than viewpoint,
motion types, or direction of motion, (c) it is possible to recognize persons from jogging
and running gaits and not just from walking gait, (d) gait-based recognition need not be
restricted to frontal-parallel views, since walking gait viewed from 22.5° and 45° results in
recognition similar to that from frontal-parallel views, and (e) the effects of different surface
types are statistically more significant than those of shoe or viewpoint.
For future work, we will consider using a different technique for the recognition
stage of the SoPF representation to replace the time normalized and time un-normalized
distances. For instance, we are considering Auto Regressive Moving Average models, which
represent a time series as a set of coefficients that can be used to describe a gait pattern,
SoPF traces in our case, for classification. From a data complexity point of view, we
expect to increase the size of the challenge dataset to over 100 subjects, which will
include more covariates. The next dataset to become available will include
persons carrying objects, which will be interesting to investigate. A later dataset
will include the same persons after some period of time. This
"persons over time" covariate will allow us to investigate variations with respect to physical
changes (e.g., hair length) and clothing.
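As a rough illustration of how such coefficients could serve as a fixed-length gait signature, the following is a hedged AR-only sketch fitted by least squares; `ar_coefficients` is a hypothetical helper, not the ARMA formulation itself, which remains future work:

```python
import numpy as np

def ar_coefficients(trace, order=4):
    """Fit an AR(order) model x[t] ~ sum_k a[k] * x[t-k] to a 1-D trace
    (a SoPF trace here) by least squares; the coefficient vector is a
    fixed-length signature that could be compared across sequences."""
    x = np.asarray(trace, dtype=float)
    # Design matrix: row for time t holds the lags x[t-1], ..., x[t-order].
    X = np.column_stack(
        [x[order - k - 1 : len(x) - k - 1] for k in range(order)]
    )
    a, *_ = np.linalg.lstsq(X, x[order:], rcond=None)
    return a
```

Two gait patterns could then be compared by the Euclidean distance between their coefficient vectors, sidestepping explicit time normalization.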
ABOUT THE AUTHOR
Isidro Robledo Vega earned his B.S. in Industrial Engineering in Electronics in 1989 and
his M.S. in Electronics Engineering with a Computer Science option in 1996 at the Instituto
Tecnologico de Chihuahua in Chihuahua, Mexico. His research interests include computer
vision, digital image processing, and artificial intelligence.