Behavior Recognition Architecture for Surveillance
Applications
Charles J. Cohen
Katherine A. Scott
Marcus J. Huber
Steven C. Rowe
Cybernet Systems Corporation
727 Airport Blvd.
Ann Arbor, MI 48108
(734) 668-2567
{ccohen, kscott, mhuber, srowe}@cybernet.com
Frank Morelli
U.S. Army Research Laboratory
Aberdeen Proving Ground, MD 21005
(410) 278-8824
frank.morelli@us.army.mil
Abstract— Differentiating between normal human activity
and aberrant behavior via closed circuit television cameras
is a difficult and fatiguing task. The vigilance required of
human observers when engaged in such tasks must remain
constant, yet attention falls off dramatically over time. In
this paper we propose an architecture for capturing data
and creating a test and evaluation system to monitor video
sensors and tag aberrant human activities for immediate
review by human monitors. A psychological perspective
provides the inspiration for depicting isolated human motion
with point-light walker (PLW) displays, which have been
shown to be salient for action recognition. Low-level
intent detection features provide an initial
evaluation of actionable behaviors. This relies on strong
tracking algorithms that can function in an unstructured
environment under a variety of environmental conditions.
Critical to this is creating a description of “suspicious
behavior” that can be used by the automated system. The
resulting confidence value assessments are useful for
monitoring human activities and could potentially provide
early warning of IED placement activities.
INTRODUCTION
Closed circuit television (CCTV) cameras have become a
pervasive component in modern society’s arsenal of crime
deterrent devices. When employed as preventative devices,
CCTVs are only as effective as the human monitor who
views the data and orchestrates a response. The human
operator in a CCTV system is subject to boredom and
fatigue, and usually has to shift his or her attention between
a large number of CCTV camera inputs. The combination of
fatigue and distraction makes it difficult for human CCTV
monitors to detect criminal and terrorist behavior.
Additionally, the cues leading up to these behaviors may be
subtle, easily escaping human detection during cursory
monitoring.
Here we detail a software system that assists the CCTV
monitor by "flagging" potentially hostile behavior for
further scrutiny, and alerting monitoring personnel, in order
to increase operational efficiency, increase vigilance, and
potentially save lives. This is illustrated in Figure 1, where
the surveillance operator is alerted, by the red flashing
border and associated sound, to the center CCTV monitor
that shows a person burying an object.
A system that detects and highlights hostile behavior in
CCTV camera data will improve monitoring only if it is
able to reliably detect hostile behavior without distracting
the human monitor with false-positive results (non-hostile
behavior tagged as hostile, see Table 1). For a multiple
camera system, false positives are acceptable so long as
they do not distract the user from true hostile behavior; in
other words, so long as they are not hazardous false
positives. Benign false positives could potentially be
distracting, but an optimal intent detection system should
closely mirror the behavior of a skilled surveillance
operator, and therefore not present too many distractions.
Figure 1: Surveillance operator alerted to suspicious
activity by the red flashing border (and associated
sound) in the center CCTV.
Table 1: Detection Criteria
Detection Type: Description
True Positive: The system detects hostile actions.
True Negative: The system detects the lack of hostile actions.
False Positive: The system detects hostile actions where none exist.
False Negative: The system fails to detect hostile actions where they do exist.
Hazardous False Positive: The system simultaneously detects a false positive and a false negative on different data sources.
Benign False Positive: The system simultaneously detects false positives and true negatives.
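To make these criteria concrete, the short sketch below (a hypothetical illustration, not our implementation) labels each camera's alert decision against ground truth and then checks whether any false positive is hazardous (it coincides with a missed hostile action on another data source) or benign (all other sources are handled correctly):

# Hypothetical sketch of the Table 1 criteria applied to simultaneous
# per-camera decisions. Each decision is (alerted, hostile_truth).
def label_detection(alerted, hostile):
    if alerted and hostile:
        return "true_positive"
    if not alerted and not hostile:
        return "true_negative"
    return "false_positive" if alerted else "false_negative"

def classify_false_positives(decisions):
    # A false positive is "hazardous" if it coincides with a false negative
    # on another data source (the operator is distracted from a real threat),
    # and "benign" if the remaining sources are handled correctly.
    labels = [label_detection(a, h) for a, h in decisions]
    if "false_positive" not in labels:
        return labels, None
    return labels, "hazardous" if "false_negative" in labels else "benign"

# Camera 1 falsely alerts while camera 3 misses real hostile activity.
print(classify_false_positives([(True, False), (False, False), (False, True)]))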
When considering the analysis of 3-dimensional scenes that
feature human activity, biological motion is a critical
component for the visual deduction of complex semantic
information. Johansson’s [2] seminal work showed that
simple isolated joint motion, even without the influence of
contextual factors such as clothing or background
information, provided ample visual data for the accurate
perception of the human form in motion. Johansson’s
original research provided the foundation for modern
inquiry into the perception of human biological motion
using point-light displays, which have revealed that
classifications such as gender [4], [9], [11], action
categorization [5], and even emotional state [6] may be
reliably derived from motion cues alone.
What is critical is a system that can take the tracking of
these feature points, allow the surveillance operator to
define actionable behavior through a usable user interface,
apply a method for intent detection, and test these
capabilities under a variety of environmental and operational
conditions. In this
paper, we detail the architecture shown in Figure 3.
A vertical slice demonstration of this system has been
developed, using the basic elements of each architecture
module. Each module is explained below; modules that are
simple interface software are described only briefly, while
the modules that deal with machine vision and
assessment are given expanded treatment. The final section
details our conclusions, the applications of our research
system, and our future work.
SECTION I: HUMAN PERCEPTION OF INTENT
In an attempt to capture what the Surveillance Operator is
observing, we examine the human perception of intent from
a psychological perspective. Joint movement is a useful
variable for determining often subtle motion cues, where
even partial data due to occlusion may be enough to provide
an accurate classification of intent. The U.S. Army Research
Laboratory (ARL) is currently examining the human
perceptual ability to detect specified classes of human
biological motion, looking at the role of mere exposure as a
possible modulator for proficiency in rapidly categorizing
intent based on human biological motion. The focus for this
research is the anticipatory inference of threatening actions
based solely upon incomplete human motion cues, with a
specific focus on human motion characteristics that lead up
to a violent or threatening action. As the temporal window
that precedes a hostile act progressively closes without
awareness, the likelihood of an undesirable outcome for
forces on the receiving end of an assault increases
dramatically. The earlier one detects the behavioral
tendencies that may be precursors for a hostile act, the
greater the advantage to an operator for executing the rapid,
proactive decisions necessary to thwart or neutralize
impending hostility.
Video Analysis, Markerless Motion Capture, and
Conversion of Footage to Experimental Stimuli
To create a stimulus set that isolates human motion from the
complex visual information inherent to 3-
dimensional scenes, video footage of hostile and non-
threatening action sequences is being translated into point-
light walker (PLW) displays. To accomplish this goal, a
video library has been compiled from open source,
unclassified footage of hostile activity, produced and
released to the public over the internet by groups seeking to
maximize the viewing audience for their exploits. These
often lengthy videos have undergone careful review and
have been coded and parsed into short clips of footage that
feature action sequences with the potential for use as
experimental stimuli. A number of problems have been
identified and addressed in the undertaking of this effort,
with the most prominent among them noted below.
Figure 2: a) Image frame taken from video footage of an
insurgent in the act of throwing an explosive device; b) Points
are superimposed on major joints using AFRL Point-Light
Video Developer (PLVD) software [12]; c) Still frame of
rendered point-light figure shown in isolation.
Figure 3: Surveillance Data Collection & Testing Architecture
One main issue of significant note in this process, and one
that will predictably remain so for future efforts, has been
video format variability. The videos have been encoded
using different codecs (i.e., compression/decompression
algorithms), container formats (e.g., .mpeg, .avi, .rm, .wma,
among others), and frame rate standards (e.g., NTSC, PAL,
SECAM), limiting initial visibility and subsequent usability.
This necessitated a conversion to a singular format, for both
viewing and eventual digitization into point-light walker
displays. Universal video format conversion was
accomplished using multiple applications, including
Macintosh Final Cut Pro, Thompson Grass Valley’s EDIUS
nonlinear postproduction editing software, as well as open
source web-based applications (e.g., http://media-
convert.com). This approach led to successful conversion
for the bulk of the videos, but not without additional
complications. Frame rate conversion has not always been
reliable, leading to difficulty in video segment localization
for processing. Additionally, the conversion process has at
times led to extremely poor video quality, causing
uncertainty when engaged in the point-light digitization
process. For example, extreme pixelation has often resulted
in joint assignment difficulty. When combined with
resolution levels that are initially quite poor, and when
considering natural obstructions inherent to real world
visual scenes that lead to figure occlusion and distortion, the
process of creating point-light displays using a markerless
methodology is often arduous. Despite these concerns, as
well as the inherent tedium involved in the process, this
method for digitization has been quite successful and
minimizes concerns within the field of perceptual
psychology regarding content validity. Given that the
"subjects" being digitized are engaged in behavior without
coercion from an experimental "director" (i.e., they are not
actors engaged in movement on command for experimental
purposes), behavioral data such as detection accuracy and
response time relative to the presentation of motion displays
can be confidently reported to be accurate responses to
genuine behaviors.
To digitize the isolated movements of individuals within the
original video footage into point-light displays, customized
software, conceptually based on the open-source system
generated by the Temple University Vision Laboratory [10],
was written by the U.S. Air Force Research Laboratory
(AFRL) [12] for use by ARL (see Figure 2). Individuals are
isolated from video sequences and rendered into point-light
displays, animations devoid of background detail present in
the original footage. Given that the nature of the depicted
actions is a subjective categorization determined by the
experimenters, an initial study will validate experimenter
action content categorization by asking subjects to passively
observe each display and rate the actions they depict with
respect to perceived threat content. Participant stimulus
categorizations that are concurrent with those assignments
made by experimenters will result in the retention of those
stimuli for further experimental use. Non-concurrent stimuli
will be omitted from further use.
Follow-on experimentation will employ the retained stimuli
for analysis of human response timing and accuracy relative
to incomplete point-light display exposure. Given that
accurate identification of threat within an operational scenario
is less useful subsequent to a hostile action than it could be
prior to execution, where preventative measures might still be
viable, the emphasis for this experiment will be the
examination of response characteristics that fall within the
temporal window preceding the terminal event in an action
sequence. To ensure rapid decision-making on the part of the
observer that falls within that temporal period, the PLW
displays that will be presented to subjects will not feature the
final frames that animate a particular goal action, so observers
will not be inclined to wait for the completion of an executed
motion before rendering their classification of "threatening"
or "innocuous". The executed motion they are presented with
will never be fully enacted, thus a choice will be "forced"
based on uncertain, incomplete visual input, as is often the
case during real-world perception and action downrange.
SECTION II: SUPPORT MODULES
In this section, we briefly discuss some of the modules that
provide supporting functionality within the Vigilance
application. These support modules include the Input
Database, the Video to Database process, Analysis Database,
and the Display. Later in the paper, we discuss the core
Vigilance components in more detail.
Video to Database. Data from each video source is stored as
individual frames within the SQL database. Each frame can
then be tagged with an arbitrary amount of metadata using
relational methods. The power of this approach comes from
the ability of multiple, independent data processing
applications to browse the captured data and
further add to its metadata. This approach also
provides the capability to view or process only the data that
meet specified criteria. In the future, database functions will
also enable the search of observed anomalous behavior in
other stored video. For efficiency purposes, our
implementation collocates a few cameras with a computer
running database server software. These "sensor nodes" are
interconnected via a network and can be treated as a
distributed database by external processing and viewing
applications.
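A minimal sketch of this frame-and-metadata layout is shown below; it assumes a SQLite store, and the table and column names are illustrative only, since the actual schema and database engine are implementation details not fixed by the architecture:

import sqlite3

# Sketch of a per-frame store with relational metadata tagging; table and
# column names are illustrative, not the system's actual schema.
conn = sqlite3.connect("sensor_node.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS frames (
    frame_id  INTEGER PRIMARY KEY,
    camera_id TEXT NOT NULL,
    captured  TEXT NOT NULL,   -- ISO-8601 timestamp
    image     BLOB NOT NULL    -- encoded frame data
);
CREATE TABLE IF NOT EXISTS frame_tags (
    frame_id  INTEGER REFERENCES frames(frame_id),
    source    TEXT NOT NULL,   -- which processing application added the tag
    tag_key   TEXT NOT NULL,   -- e.g. 'object_type', 'alert'
    tag_value TEXT NOT NULL
);
""")

# Independent processing applications add metadata to captured frames...
conn.execute("INSERT INTO frame_tags VALUES (?, ?, ?, ?)",
             (42, "tracker", "object_type", "vehicle"))

# ...and viewers retrieve only the frames that meet specified criteria.
alert_frames = conn.execute(
    "SELECT f.frame_id, f.camera_id FROM frames f "
    "JOIN frame_tags t ON t.frame_id = f.frame_id "
    "WHERE t.tag_key = 'alert' AND t.tag_value = 'zone_violation'").fetchall()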
Input Database. This is the storage area for all the raw video
created by the Video to DB module. It can also contain still
imagery and information that the surveillance operator has
specified using the front end as important for detecting
anomalous behavior (such as prohibited zones or loitering
behavior). This information is used for analysis, along with the
operational "badguyology" parameters (see Section IV below).
Analysis Database. The Vigilance Modules detect, track,
identify, and classify objects within the video clips and from
this identify benign and potentially hostile events and
behaviors. The Analysis Database is used to maintain the
association between the inputs, badguyology parameters, and
the results computed by the Automated Vigilance Module
and specified by a subject matter expert using the Manual
Vigilance Module.
Display. The Display module shows the results of tracking
and anomaly detection. It also lets the operator view
automated results and select appropriate analysis for truthing
data. For example, the operator can decide that the tracking
algorithms are sufficiently accurate for a specified video set.
That data set would then be available for regression testing if
the tracking algorithms are changed or replaced, resulting in
system performance measurements. This display also
causes a red border to appear around the associated video
whenever an alert is triggered.
Comparator. This module compares the results of the
Automated Vigilance module to either the truth data specified
from the Manual Vigilance Module (see below) or regression
testing against a prior Automated Vigilance Module run on
the same data. The Comparator Module provides us the
capability to automatically validate approaches and
algorithms against ground truth and prior implementations
and immediately identify whether the modifications improved
the system’s performance.
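As an illustration of the kind of comparison the Comparator could perform (the specific metrics are our choice here, not a specification), the sketch below scores a run against truth data using the Table 1 categories and checks whether a modified algorithm improved on a prior run:

# Illustrative regression scoring: compare an Automated Vigilance run against
# truth data (or a prior run treated as truth) using the Table 1 categories.
def score(run, truth):
    """run, truth: dicts mapping (video_id, frame_no) -> alerted (bool)."""
    tp = sum(1 for k, a in run.items() if a and truth.get(k, False))
    fp = sum(1 for k, a in run.items() if a and not truth.get(k, False))
    fn = sum(1 for k, t in truth.items() if t and not run.get(k, False))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def improved(new_run, old_run, truth):
    # Compare (precision, recall) of the modified algorithm against the prior
    # implementation; a fuller comparator would also weight hazardous false
    # positives more heavily than benign ones.
    return score(new_run, truth) >= score(old_run, truth)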
SECTION III: THE FRONT END
The Front End Module consists of the interface that performs
a number of functions:
Determine which video (one or more) to examine.
Tag objects for tracking (while this is not required
since automatic tracking is part of the system, it is
useful for regression testing).
Set badguyology parameters for surveillance.
Create regression testing experiments.
Run real time operations.
Note that the surveillance video is sent both to a database for
future retrieval and to the front end for real time evaluation.
This allows the operator to tag people or objects to track
and set up the badguyology parameters (such as marking
zones of interest; see Section IV).
SECTION IV: BADGUYOLOGY PARAMETERS
Badguyology Parameters are specifications of the actions,
behaviors, and suspicious activities that the Automated
Vigilance and Manual Vigilance modules use for analysis.
The specific parameters and values are based on the
operator's specification within the Front End Module. (The
term "badguyology" was originated by Dr. Barbara O'Kane,
RDECOM, Principal Scientist for Human Performance, U.S.
Army Night Vision Lab.)
The types of parameters are, for all intents and purposes,
limitless, based on possible terrorist activities and the subject
matter expertise of surveillance operators. However, we have
made the attempt to create various anomaly categories that
can be used for the first instantiation of this system, and
provide immediate feedback for surveillance operators.
These include, but are not limited to:
Area violations, such as beyond boundary, inside
region, and outside region.
Abnormal/unusual physical behaviors, including
unusual location (in a location not frequented),
unusual motions (such as moving against normal
traffic flow, moving fast, moving slow, and stopping
at too many places), and unusual numbers (too many
and too few).
Abnormal/unusual temporal behaviors, including
dropping/abandoning objects (not picking up
something dropped or someone else picking it up),
picking up objects, loitering, same person/vehicle
appearing too often in video scenes, and odd gaits or
motions (carrying a concealed object, for example).
Camera view being altered (being moved, obscured,
or blinded).
There are a number of vision-system characteristic descriptors
that are required to support the above broad spectrum of alert
conditions. So far, these include Object Location, Object
Tracking, Object Size, Object Type/Classification, Motion
Detection, Object Speed, Object Color, Object Counts, Object
Abandonment, Object Removal, Temporal/spatial
distributions, Camera Move/Blind, and Compound Queries.
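To make the idea concrete, the fragment below sketches one possible software representation of a badguyology parameter set; the field names and default values are hypothetical and chosen only to mirror the categories listed above:

from dataclasses import dataclass, field

# Hypothetical encoding of operator-specified badguyology parameters.
@dataclass
class BadguyologyParameters:
    prohibited_zones: list = field(default_factory=list)   # polygons of (x, y) points
    max_loiter_seconds: float = 120.0       # stationary longer than this -> alert
    max_speed_mps: float = 15.0             # faster than this -> unusual motion
    max_group_size: int = 6                 # "too many" people in one place
    max_reappearances: int = 3              # same person/vehicle seen too often
    alert_on_abandoned_object: bool = True  # dropped object not picked up
    alert_on_camera_blind: bool = True      # camera moved, obscured, or blinded

params = BadguyologyParameters(
    prohibited_zones=[[(120, 300), (400, 300), (400, 460), (120, 460)]],
    max_loiter_seconds=60.0)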
SECTION V: MANUAL VIGILANCE
This module is essentially another user interface that allows
a surveillance subject matter expert, given an input video
sequence of images and specified badguyology parameters, to
manually mark objects of interest to be tracked, identify and
classify them, indicate observed anomalies, and so on.
The results of this module become truthing data, which are
compared with the equivalent output from the automatic
system in the Comparator Module.
Manual analysis is extremely tedious and time-consuming,
and should only be used as an analysis
debugging tool or for critical data that must be
captured in a very specific manner. For full regression
testing, we take the results of the Automated Vigilance
module, and evaluate the results manually in the Display for
approval or rejection.
SECTION VI: AUTOMATED VIGILANCE
This is the core module of our intent detection architecture,
and is composed of two components: camera data extraction
(tracking) and the detection of anomalous behavior.
CCTV Data Extraction
The collection of CCTV camera data for our intent detection
system is performed at both the system level (i.e. the
collection of CCTV monitoring at a facility) and at the
camera level. The rationale for using a multiple camera
system is that we believe behaviors exhibited across
multiple cameras may yield a secondary source of intent
data. For example, multiple cameras make it possible to
track an individual moving through a facility or along a road
and observe surveillance behaviors that may be indicative of
IED emplacement.
Single Camera Data Extraction
At the camera level, human motion data for our gesture
recognition system is extracted from CCTV cameras using a
series of image processing operations that yield a set of
features for processing by our gesture recognition system.
For a single CCTV camera, the chain of image processing
operations is as follows (a simplified code sketch of the
blob-extraction steps appears after the list):
1. Capture raw image data.
2. Pre-process the data: resize and flip as necessary.
3. Segment foreground and background data using the
codebook method [3] (see Figure 4).
4. Threshold the foreground imagery into binary
connected-component "blobs" or features.
5. Dilate or merge the features and ignore any feature
that meets our criteria for noise (e.g., too small or an
unrealistic aspect ratio).
6. Perform a rough classification of the binary image
components using parameters such as size, aspect
ratio, position, and color profile to yield a
classification/confidence pair (e.g., 0.6 confidence
that a feature is a torso).
7. Attempt to associate current-frame image features
with "similar" features from previous frames, and re-
evaluate the feature classification based on
previous-frame data.
8. Attempt to decompose large features into smaller
sub-features using convex hull detection (e.g.,
breaking a torso feature that includes legs and arms
into its constituent parts).
9. Feed refined image feature position data and
classification into the gesture recognition system
mentioned previously.
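The fragment below is a condensed sketch of steps 4 through 6 of this chain, assuming OpenCV is available and that a foreground mask has already been produced by the segmentation step; the size-based classification heuristic is purely illustrative:

import cv2
import numpy as np

def extract_features(foreground_mask, min_area=200, max_aspect=6.0):
    # Steps 4-5: binarize the foreground, dilate to merge fragments, and find
    # connected-component blobs.
    mask = cv2.threshold(foreground_mask, 127, 255, cv2.THRESH_BINARY)[1]
    mask = cv2.dilate(mask, np.ones((5, 5), np.uint8), iterations=2)
    n, _, stats, _ = cv2.connectedComponentsWithStats(mask)

    features = []
    for i in range(1, n):                   # component 0 is the background
        x, y, w, h, area = stats[i]
        aspect = max(w, h) / max(1, min(w, h))
        if area < min_area or aspect > max_aspect:
            continue                        # ignore noise-like blobs
        # Step 6: toy classification yielding a label/confidence pair.
        label, conf = ("person", 0.6) if h > 1.5 * w else ("vehicle", 0.5)
        features.append({"bbox": (int(x), int(y), int(w), int(h)),
                         "label": label, "conf": conf})
    return features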
The codebook algorithm works by building a per-pixel
background model of color and luminance viewed by the
camera over a short period of time (30 to 150 frames, or 1 to
5 seconds). As reported by Kim [3] the results were quite
good, and the method has greatly improved our system’s
ability to track moving objects. The codebook method
significantly reduces the overall system noise and the
number of motion artifacts that were present in our previous
background subtraction method. Figure 4 gives an example
of the system’s overall segmentation performance. The
codebook method is not without its drawbacks; these
include a number of tuning parameters, the requirement of a
stationary camera, and high computational cost particularly
during initialization.
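A greatly simplified, grayscale-only sketch of the per-pixel idea is given below; the published codebook model [3] stores multiple color and luminance codewords per pixel and handles shadows and periodic background motion, so the fixed intensity tolerance used here is only an assumption for illustration:

import numpy as np

class SimpleCodebookModel:
    """Toy per-pixel background model in the spirit of the codebook method [3]:
    learn the intensity range each pixel exhibits over the training window
    (30 to 150 frames), then mark pixels outside that range as foreground."""

    def __init__(self, tolerance=10):
        self.lo = None                 # per-pixel minimum seen during training
        self.hi = None                 # per-pixel maximum seen during training
        self.tolerance = tolerance

    def train(self, frame):            # call once per frame of the training window
        f = frame.astype(np.int16)
        if self.lo is None:
            self.lo, self.hi = f.copy(), f.copy()
        else:
            np.minimum(self.lo, f, out=self.lo)
            np.maximum(self.hi, f, out=self.hi)

    def segment(self, frame):          # white = foreground, black = background
        f = frame.astype(np.int16)
        fg = (f < self.lo - self.tolerance) | (f > self.hi + self.tolerance)
        return fg.astype(np.uint8) * 255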
Multiple Camera Data Extraction
For multiple camera environments, the system consists of
multiple camera sensor nodes, software applications, and a
computer network. This modular approach is highly
scalable; new sensor nodes and applications can be added as
needed. Intelligent pre-processing of data at each sensor
node minimizes required network bandwidth. Each sensor
node contains a standard functionality suite that includes
data capture, data recording, data processing, configuration,
and networking.
A suite of specific programs, rather than a single monolithic
application, controls the configuration of each node. Each of
these programs has a set of operational parameters that are
stored such that they can be updated in real time via an
API and are accessible to the database tagging applications.
The programs can be stopped and started remotely.
Intent Recognition
This part of the automated detection uses a variety of
methods to determine intent based on the results of tracking.
Figure 4: Codebook Foreground-Background
Segmentation. Left image: Input image with
tracking. Right image: Segmented image, black
represents background while white represents
foreground imagery.
Zone Detection
We have created a simple "zone detection" algorithm that
allows the user to define a specific zone within a video
stream and receive an alert whenever a person or vehicle
enters the zone. Potential applications of this technology
include watching for a person entering or leaving a sidewalk,
a vehicle entering or leaving a road lane, and detecting
transitions in other critical areas. The Automated Vigilance
Module will later use the zone detection feature to create
a security "grammar" for analysis. For example, a user
could define the boundaries for roads, sidewalks, curbs,
parking areas, etc., and the software could determine if an
observer is entering a dangerous area or if a vehicle is
travelling through a restricted location.
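A minimal version of such a zone check might look as follows, assuming tracked objects are reported as bounding boxes; the use of OpenCV's point-in-polygon test and the bottom-center reference point are illustrative choices rather than our exact implementation:

import cv2
import numpy as np

def zone_alerts(tracked_objects, zone_polygon):
    """Return the tracked objects whose ground-contact point lies inside an
    operator-defined zone. tracked_objects: dicts with a 'bbox' of (x, y, w, h);
    zone_polygon: list of (x, y) vertices drawn by the operator."""
    zone = np.array(zone_polygon, dtype=np.float32)
    alerts = []
    for obj in tracked_objects:
        x, y, w, h = obj["bbox"]
        foot = (float(x + w / 2.0), float(y + h))   # bottom-center of the box
        if cv2.pointPolygonTest(zone, foot, False) >= 0:
            alerts.append(obj)                      # inside or on the boundary
    return alerts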
Linger Detection
We have created a software module for detecting lingering
behavior. This software module monitors all tracked objects
and issues an alert whenever an object remains stationary
for a pre-determined amount of time. We can therefore detect
when a previously moving car has parked or when an
observed person has stopped at a street corner.
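A sketch of this stationary-object check is shown below; the drift tolerance and dwell time are illustrative values that would in practice come from the badguyology parameters:

import time

class LingerDetector:
    """Alert when a tracked object has not moved more than max_drift pixels
    for linger_seconds or longer (both values are illustrative defaults)."""

    def __init__(self, linger_seconds=60.0, max_drift=10.0):
        self.linger_seconds = linger_seconds
        self.max_drift = max_drift
        self._anchor = {}              # object_id -> (x, y, time it stopped)

    def update(self, object_id, x, y, now=None):
        now = time.time() if now is None else now
        ax, ay, t0 = self._anchor.get(object_id, (x, y, now))
        if (x - ax) ** 2 + (y - ay) ** 2 > self.max_drift ** 2:
            self._anchor[object_id] = (x, y, now)   # moved: restart the clock
            return False
        self._anchor[object_id] = (ax, ay, t0)
        return now - t0 >= self.linger_seconds      # True -> issue linger alert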
Path Recording
Path recording allows us to monitor and record the paths
taken by tracked objects (as shown in the center monitor of
Figure 1). This method allows us to make more informed
decisions about the behaviors of tracked objects. For
example, this method will allow us to collect statistical data
about where tracked objects go and when a particular
object/path/time combination occurs. Furthermore, this
feature will be used to expand our image processing
grammar by providing us with path and speed information.
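A minimal path recorder along these lines might simply accumulate time-stamped positions per tracked object and derive speed from them; the structure below is illustrative only:

from collections import defaultdict

class PathRecorder:
    """Accumulate (time, x, y) samples per tracked object for later analysis
    of where objects go and how fast they move."""

    def __init__(self):
        self.paths = defaultdict(list)      # object_id -> [(t, x, y), ...]

    def update(self, object_id, t, x, y):
        self.paths[object_id].append((t, x, y))

    def average_speed(self, object_id):     # pixels per second along the path
        p = self.paths[object_id]
        if len(p) < 2 or p[-1][0] == p[0][0]:
            return 0.0
        dist = sum(((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
                   for (_, x1, y1), (_, x2, y2) in zip(p, p[1:]))
        return dist / (p[-1][0] - p[0][0])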
Behavior Recognition
We have developed gesture recognition algorithms that use
position and velocity information to identify behaviors and
intents. Gestures, and combinations of gestures, are
identified through the use of a bank of predictor bins (see
Figure 5), each containing a dynamic system model with
parameters preset to a specific gesture. We assume that the
motions of human circular gestures are decoupled in x and
y; therefore there are separate predictor bins for the x and y
axes (as well as one for the z axis when dealing with three-
dimensional motions). A predictor bin is required for each
gesture type for each dimension – the position and velocity
information from the sensor module is fed directly into each
bin.
The idea for seeding each bin with different parameters was
inspired by Narendra and Balakrishnan's work on improving
the transient response of adaptive control systems. In this
work, they created a bank of indirect controllers that were
tuned online, but whose identification models had different
initial estimates of the plant parameters. When the plant
was identified, the bin that best matched that identification
supplied a required control strategy for the system [8].
Each bin's model, which has parameters that tune it to a
specific gesture, is used to predict the future position and
velocity of the motion – the prediction of a particular gesture
is made by feeding the current state of the motion into the
gesture model. This prediction is compared to the next
position and velocity, a residual error is computed, and the
bin for each axis with the least residual error is the best
gesture match. If the best gesture match is not below a
predefined threshold (which is a measure of how much
variation from a specific gesture is allowed), then the result is
ignored and no gesture is identified. Otherwise, geometric
information is used to constrain the gesture further – a single
gesture identification number, which represents the
combination of the best x bin, the best y bin, and the
geometric information, is output to the transformation
module. This number (or NULL if no gesture is identified) is
output immediately upon the initiation of the gesture, and is
continually updated. The parameters used to initially seed
each predictor bin can be calculated by feeding the data of
each axis from previously categorized motions into the
recursive linear least squares identifier.
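One axis of this matching step can be summarized as in the sketch below; it assumes a linear discrete-time model per bin with pre-identified parameters, and the example model matrices and threshold are placeholders rather than identified gesture parameters:

import numpy as np

# One axis of the predictor-bin mechanism: each bin holds a linear model
# x_{k+1} = A x_k tuned to one gesture; the bin whose prediction of the next
# [position, velocity] state has the smallest residual wins, provided that
# residual is below an acceptance threshold. The matrices below are
# placeholders; the real parameters are identified offline from categorized
# motion data with a recursive linear least squares identifier.
def classify_axis(state_k, state_k1, bins, threshold=0.2):
    residuals = {name: np.linalg.norm(state_k1 - A @ state_k)
                 for name, A in bins.items()}
    best = min(residuals, key=residuals.get)
    return best if residuals[best] < threshold else None   # None -> no gesture

bins_x = {"slow_circle": np.array([[1.0, 0.10], [-0.05, 1.0]]),
          "fast_circle": np.array([[1.0, 0.10], [-0.40, 1.0]])}
print(classify_axis(np.array([0.5, 0.1]), np.array([0.51, 0.06]), bins_x))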
The identification module contains the majority of the
required processing. Compared to most of the systems
developed for gesture recognition (for example, see [1], and
[7]), the identification module requires relatively little
processing time and memory to identify each individual
gesture feature.
Just as the gesture recognition module is built on a bank of
predictor bins, the behavior recognition system is composed
of a bank of gesture recognition modules. Each module
focuses on a specific point of the body (such as features
acquired from a PLW system). As that point moves through
space, a "gesture" is generated and identified. The
combination of gestures from those points is what we define
as a motion behavior, which can be categorized and
identified. The simplicity of the behavior recognition
system is possible because of the strength and utility of the
gesture recognition modules.
Figure 5: Simplified Diagram of the Predictor Module
which Determines Gesture Characteristics.
Currently we are developing a Dempster-Shafer based
analysis, which uses belief functions to combine evidence to
determine a probability of an event. The results of this
analysis will be the subject of a future paper.
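For reference, Dempster's rule of combination over a two-element frame of discernment {hostile, benign} can be sketched as below; the mass assignments are hypothetical sensor outputs, not values produced by our system:

from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination: m1, m2 map frozenset hypotheses to mass."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y                        # mass assigned to disjoint sets
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

hostile, benign = frozenset({"hostile"}), frozenset({"benign"})
either = hostile | benign                            # total uncertainty
m_zone = {hostile: 0.6, either: 0.4}                 # evidence from a zone violation
m_linger = {hostile: 0.5, benign: 0.2, either: 0.3}  # evidence from lingering
print(combine(m_zone, m_linger))                     # combined belief masses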
SECTION VII: EXPERIMENTS, FUTURE
DIRECTIONS, AND CONCLUSIONS
We have developed an end-to-end vertical slice
demonstration of the system. All the modules have been
created with varying levels of functionality. We also have a
large data set of video for regression testing, and have
performed the intent detection described above on over
sixteen three-minute videos. The goal was to ensure end-to-
end functionality before expanding capabilities.
We performed experiments to test the behavior recognition
system. First, a variety of behaviors were performed and
served as the baseline for identification. Then these actions
were repeated and the data sent through the system.
The results of the end to end system are shown in Figure 6.
Here sixteen CCTV video feeds were individually processed
in real time according to the methods described above
(features were acquired and tracked, gestures recognized,
and behaviors identified). The outputs, with associated
tracking rectangles and red flashing alerts, were merged into
a four-by-four display as shown. Note that while this
merging was not done in real time (and is part of our future
effort), the recognition and identification were performed in
real time with excellent results.
We are also exploring the identification of group behaviors,
based on combinations of behaviors from one or more
sensors in a CCTV environment. Such group behaviors
would include the recognition of IED placement activities
that require more than one person to bury the device and
plant the transmitter or lay and bury the triggering wire.
Any future developments must involve verification of the
behavior recognition system. Verification involves both the
testing of the algorithms and the use of modules in a
completely integrated system. To verify the system, our
plan is to:
Test the system with a wide variety of users.
Test the system using a large number of intents in
our lexicon.
We strongly believe that these steps will verify the system,
thus facilitating its development into a full system useful in
a wide variety of applications.
Checkpoint duty and security tasks represent high-risk
missions for soldiers. The increase of asymmetric threats to
military and civilian assets throughout the world has
generated a critical need for enhanced perimeter security
systems, such as automated surveillance systems. Current
U.S. Military operations, such as those in Iraq and
Afghanistan, have greatly increased the number of U.S. assets
in harm’s way – which in turn has put a strain on our ability
to adequately protect all of our facilities. Appropriate
perimeter and facility security requires continuous indoor and
outdoor monitoring under a wide variety of environmental
conditions – typically these locations are complex in that they
include open perimeters and high traffic flow that would
strain even human observers.
To maximize the effectiveness of military and security
personnel, automated real-time methods for tracking,
identifying, and predicting the activities of individuals and
groups are crucial. The human resources available to such
tasks are constantly being reduced, while the prevention of
negative actions must be increased. Small numbers of
individuals are able to cause a large amount of damage and
terror by going after non-military secondary targets that still
have an adverse effect on the military's abilities to fulfill its
missions – the system presented here can be used to identify
behaviors that would be flagged for observation and
evaluation by human monitors based on their observed
threat, thereby reducing the tedium and making it possible
for one operator to accurately and attentively observe large
numbers of CCTV sensors.
REFERENCES
[1] Darrell, Trevor J. and Pentland, Alex P. (1993). "Space-
Time Gestures." In IEEE Conference on Computer Vision and
Pattern Recognition, New York, NY.
[2] Johansson, G. (1973). Visual perception of biological
motion and a model for its analysis. Perception &
Psychophysics, 14 (2), 201-211.
[3] Kim, K. (2005). Real-time foreground-background
segmentation using codebook model. Real-Time Imaging,
11(3), 167-256.
[4] Kozlowski, L.T. & Cutting, J.E. (1977). Recognizing the
sex of a walker from a dynamic point-light display.
Perception & Psychophysics, 21 (6), 575-580.
[5] Loula, F., Prasad, S., Harber, K. & Shiffrar, M. (2005).
Recognizing people from their movement. Journal of
Experimental Psychology: Human Perception and
Performance, 31(1), 210-220.
Figure 6: Monitoring system showing vehicles and
people being tracked in real time.
[6] Montepare, J.M., Goldstein, S.B. & Clausen, A. (1987).
The identification of emotions from gait information.
Journal of Nonverbal Behavior, 11(1), 33-42.
[7] Murakami, Kouichi and Taguchi, Hitomi (1991).
"Gesture Recognition Using Recurrent Neural Networks."
Journal of the ACM, 1(1):237-242.
[8] Narendra, Kumpati S. and Balakrishnan, Jeyendran
(1994). "Improving Transient Response of Adaptive Control
Systems using Multiple Models and Switching." IEEE
Transactions on Automatic Control, 39:1861-1866.
[9] Pollick, F.E., Kay, J.W., Heim, K. & Stringer, R. (2005).
Gender recognition from point-light walkers. Journal of
Experimental Psychology: Human Perception and
Performance, 31(6), 1247–1265.
[10] Shipley, T.F. & Brumberg, J.S. (2003). Markerless
motion-capture for point-light displays. Technical Report,
Temple University Vision Laboratory, Philadelphia, PA.
[11] Troje, N.F. (2002). Decomposing biological motion: A
framework for analysis and synthesis of human gait
patterns. Journal of Vision, 2, 371-387.
[12] Fullenkamp, A. M. (in press). Point-light visualization
developer user guide. Technical Report, U.S. Air Force
Research Laboratory, Wright-Patterson Air Force Base, OH.