

Pattern Recognition 37 (2004) 1011–1024, www.elsevier.com/locate/patcog

Gesture recognition using Bezier curves for visualization navigation from registered 3-D data☆

Min C. Shin a,∗, Leonid V. Tsap b, Dmitry B. Goldgof c

a Department of Computer Science, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC 28223, USA
b Advanced Communications and Signal Processing Group, Electronics Engineering Department, University of California Lawrence Livermore National Laboratory, Livermore, CA 94551, USA
c Department of Computer Science, University of South Florida, Tampa, FL 34642, USA

Received 22 May 2003; accepted 6 November 2003

Abstract

This paper presents a gesture recognition system for visualization navigation. Scientists are interested in developing interactive settings for exploring large data sets in an intuitive environment. The input consists of registered 3-D data. A geometric method using Bezier curves is used for the trajectory analysis and classification of gestures. The hand gesture speed is incorporated into the algorithm to enable correct recognition from trajectories with variations in hand speed. The method is robust and reliable: the correct hand identification rate is 99.9% (from 1641 frames), modes of hand movements are correct 95.6% of the time, and the recognition rate (given the right mode) is 97.9%. An application to gesture-controlled visualization of 3D bioinformatics data is also presented.
© 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

Keywords: Motion analysis; Gesture recognition system; Hand tracking; Bezier curves; Human-computer interaction; Region detection and identification; Visualization navigation; 3-D trajectory analysis

1. Introduction

1.1. Motivation

Large and complex data sets are produced at a more rapid pace than the tools and algorithms for their processing, analysis, and exploration. For example, the National Institutes of Health (NIH) Visible Human project generated data sets of a single 3-D volume consisting of 12 billion elements. Nearly a terabyte of satellite data is produced on a daily basis.

☆ This work was performed under the auspices of the U.S. Department of Energy by the University of California Lawrence Livermore National Laboratory under contract number W-7405-Eng-48. UCRL-JC-152416.

∗ Corresponding author. Tel.: +1-704-687-6008; fax: +1-704-687-3516.

E-mail addresses: [email protected] (M.C. Shin), [email protected] (L.V. Tsap), [email protected] (D.B. Goldgof).

Advanced physics simulation at Lawrence Livermore National Laboratory (LLNL) is generating large data sets, which are expected to increase to one terabyte every 5 min by 2004.

Among the tools used to help explore and understand large data, visualization aids in gaining greater insight into important physical parameters (such as temperature, height, stress, velocity, or pressure) and in finding anomalies. Such anomalies are often not obvious to scientists during automatic localization but are easily picked out through visual data exploration. A correct representation can drastically reduce the time needed for analysis. As data sets grow, they require more time for processing; even the latest supercomputers may need days or weeks for computations. This makes real-time visualization mission-critical, since interesting properties can show up that allow scientists to adjust the parameters of a computation and restart it if needed. Visualization is integrated into the process and is no longer just the last step.

0031-3203/$30.00 © 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.patcog.2003.11.007


Fig. 1. (a) A 15-projector display with a total resolution of 6400 by 3072 used for interactive applications. (b) An example of interactive manipulation of isosurface and volume rendering parameters.

State-of-the-art visualization displays keep pace with the data requirements. For instance, one of LLNL's "powerwalls" (Fig. 1(a)) is a 15-projector system that displays approximately 19.7 million pixels on a 16- by 8-ft screen. Systems like this allow for detailed data analysis and team collaboration. However, applying even simple commands to the data (such as zoom, rotation, and translation) requires a secondary, or "background", communication process between the scientists working with the data (the two standing by the screen in Fig. 1(a)) and the "operator" responsible for executing selected commands (sitting, left). This reduces the team's productivity and affects the quality of presentations.

Therefore, scientists are interested in developing new, interactive settings for exploring their data in a more intuitive environment. A gesture-recognition system can interpret commands and supply data manipulation parameters to visualization software (Fig. 1(b)) without having an "operator" involved in the process.

Since the system is being developed as a front end for gesture-controlled, large-scale visualization and virtual reality manipulation, certain requirements and complications are apparent. First, 3-D information is required, not necessarily at video frame rate, but at least a few times per second (optimal parameters should be determined by testing on a large group of people). Second, traditional techniques such as background subtraction cannot easily separate a figure from the background, since the entire body of the interacting person (not only the arms or hands) is moving. Moreover, interaction takes place in front of the screen where the data is updated dynamically, so the background changes most of the time. Third, the motion of the interacting person should be natural and should result in intuitive data manipulation, where intuitive means easily learned and fast enough to provide immediate results.

1.2. Previous work

Gesture tracking and recognition are important research domains. Traditional approaches to tracking typically relied on segmentation of the intensity data, using motion or appearance data. A majority of the methods began by segmenting the human body from the background. For example, in "blob approaches", people were modeled as a number of blobs resulting from pixel classification based on their color and position in the image. Wren et al. [1] achieved segmentation by classifying pixels into one of several models, including a static world and a dynamic user represented by Gaussian blobs. Yang and Ahuja [2] used skin color and the geometry of palm and face regions for the segmentation stages of their system. A Gaussian mixture (with parameters estimated by an EM algorithm) modeled the distribution of skin-color pixels. Rehg and Kanade [3] used a 3-D hand model to track a hand. They compared line features from the images with the projected model and performed incremental state corrections. Similar work was presented by Kuch and Huang [4], in which the synthesis process could fit the hand model to any person's hand. Cutler and Davis [5] segmented the motion and computed a moving object's self-similarity (including human motion experiments).

A significant amount of work is being performed in the area of recognition, where hidden Markov models (HMMs) are often employed successfully [6–8], allowing researchers to address the highly stochastic nature of human gestures. Yacoob and Black proposed a parameterized representation of human movement in the form of principal components [9]. Bobick and Wilson [10] treated gesture as a sequence of states and computed configuration states along prototype gestures. Yang and Ahuja [2] used motion trajectories for recognition. Grzeszcuk et al. [11] described a classification algorithm based on statistical moments of binarized gesture templates. Hong et al. [12] treated each gesture as a finite state machine (FSM) in the spatial-temporal space; the FSMs were trained using k-means clustering. A previously trained neural network was used by Sato et al. [13]. Hongo et al. [14] performed recognition by linear discriminant analysis in each discriminant space using four directional features. The approach described by Yoon et al. [15] derived features from location, angle, and velocity and employed a k-means clustering algorithm for the HMMs. Gesture contour representation and alignment-based classification were proposed by Gupta and Ma [16].


A review by Aggarwal and Cai [17] classified approaches to human motion analysis, the tasks involved, and the major areas related to human motion interpretation. A review by Pavlovic et al. [18] addressed the main components and directions in gesture recognition research for human-computer interaction (HCI).

2. Framework

2.1. Overview

In this section, we describe the method for recognizing three gesture types: rotation, zoom, and translation. Given the 3-D trajectory of the manipulating hand, we fit a Bezier curve to the trajectory. The curvature of the curve is used to determine the gesture.

Gesture recognition involves five steps:

1. Detecting the manipulating hand.
2. Identifying the beginning of the gesture.
3. Detecting the end of the gesture.
4. Computing the 3-D trajectory of the manipulating hand.
5. Recognizing the gesture by fitting a curve to the 3-D trajectory.

These steps are described in separate sections of this paper. See Fig. 2 for an overview of the recognition process.

2.2. Manipulating hand detection

This section presents a brief overview of the process; for more details, please see [19].

Detecting skin regions: To identify a hand involved in virtual object manipulation, the color image is converted to the SCT color space to reduce lighting artifacts [20]. Skin pixels are selected with a minimum-distance classifier using the Mahalanobis distance (skin statistics are collected off-line). After noise removal (by a sequence of erosions and dilations), the skin image is segmented using connected component analysis. Small regions and unlikely shapes are removed from further consideration.

Identifying the hand involved in manipulation: We assume that the hand gesture occurs in front of the body. Therefore, the skin region closest to the camera is identified as the manipulating hand. We determine the 3-D location of the region using histogram-based noise filtering, where all smaller bins are considered noise [19]. Thus, color and range features together are sufficient for reliable identification; no supplementary body tracking is needed. Virtual commands do not happen all the time: hand gestures can be classified in general as object manipulations and other hand movements. The selection of meaningful gestures is described next.
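A minimal sketch of this detection step, assuming skin pixels have already been classified (e.g., by a Mahalanobis-distance test against off-line skin statistics) and that each pixel carries a registered depth value. The region labeling uses SciPy's connected-component tools; the minimum area and histogram bin count are illustrative placeholders, not values from the paper.

```python
import numpy as np
from scipy import ndimage

def detect_manipulating_hand(skin_mask, depth_map, min_area=200):
    """Pick the skin region closest to the camera as the manipulating hand.

    skin_mask : HxW bool array, True where a pixel was classified as skin.
    depth_map : HxW float array of registered range values (meters).
    min_area  : illustrative minimum region size; smaller blobs are treated as noise.
    Returns (region_mask, centroid_3d) or (None, None) if nothing qualifies.
    """
    # Morphological opening (erosion followed by dilation) removes speckle noise.
    cleaned = ndimage.binary_opening(skin_mask, structure=np.ones((3, 3)))

    # Connected-component analysis of the cleaned skin image.
    labels, n_regions = ndimage.label(cleaned)

    best_label, best_depth = None, np.inf
    for region in range(1, n_regions + 1):
        mask = labels == region
        if mask.sum() < min_area:          # discard small / unlikely regions
            continue
        # Histogram-based filtering of the region's depths: keep only the
        # dominant depth bin so stray range readings do not skew the estimate.
        depths = depth_map[mask]
        hist, edges = np.histogram(depths, bins=20)
        top = np.argmax(hist)
        kept = depths[(depths >= edges[top]) & (depths <= edges[top + 1])]
        region_depth = kept.mean()
        if region_depth < best_depth:      # the hand is assumed closest to the camera
            best_label, best_depth = region, region_depth

    if best_label is None:
        return None, None
    mask = labels == best_label
    ys, xs = np.nonzero(mask)
    centroid_3d = (xs.mean(), ys.mean(), best_depth)
    return mask, centroid_3d
```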

2.3. Gesture on, gesture off

We recognize gestures for the manipulation of virtual objects by "grabbing" them. Both grabbing and releasing are detected by checking the change in the hand's area. If the area has decreased, the hand has closed and the object is therefore grabbed. If the area has increased, the hand has opened and the object is therefore released. Since the speed of closing and opening the hand can vary, we build a history of hand area change, accumulating the area change over the past three frames. Grabbing is detected when the accumulated change of area is less than −75%, and releasing is detected when the accumulated change is greater than 75%. Note that a negative area change means the area decreased.

Changes due to small movements or noise around the hand area can occur and should not count toward the determination of grabbing and releasing. Therefore, we ignore area changes smaller than 15%.
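The grab/release logic can be written compactly. The sketch below accumulates the relative area change over the last three frame-to-frame steps and fires on/off events at the ±75% threshold while ignoring per-frame changes below 15%; the function name and interface are illustrative, not from the paper.

```python
def detect_mode_switch(areas, ignore=0.15, threshold=0.75):
    """Detect grabbing ("on") or releasing ("off") from recent hand areas.

    areas : hand pixel areas for the last four frames
            (so that three frame-to-frame changes can be accumulated).
    Returns "on" (grab), "off" (release), or None.
    """
    if len(areas) < 4:
        return None

    accumulated = 0.0
    for prev, curr in zip(areas[-4:-1], areas[-3:]):
        change = (curr - prev) / prev          # relative area change
        if abs(change) < ignore:               # small movement / noise: ignore
            continue
        accumulated += change

    if accumulated <= -threshold:              # area shrank: hand closed, object grabbed
        return "on"
    if accumulated >= threshold:               # area grew: hand opened, object released
        return "off"
    return None


# Example: areas shrinking by roughly one third each frame accumulate past -75% -> grab.
print(detect_mode_switch([1000, 650, 420, 280]))   # "on"
```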

2.4. Bezier curves

2.4.1. Representation and properties
A Bezier curve uses control points to represent a curve:

C(u) = \sum_{i=0}^{n} B_{i,n}(u) P_i.    (1)

Points along the curve are selected with the parameter u, 0 ≤ u ≤ 1. n is the degree of the curve, which is one less than the number of control points. P_i is the ith control point, with P_0 = C(0) and P_n = C(1). B_{i,n} is a Bernstein polynomial:

B_{i,n}(u) = \frac{n!}{i!(n-i)!} u^i (1-u)^{n-i}.    (2)

We use Bezier curves with three control points, which describe quadratic curves. The Bernstein polynomials with n = 2 are

B_{0,2} = (1-u)^2,    (3)
B_{1,2} = 2u(1-u),    (4)
B_{2,2} = u^2.    (5)

Therefore, the equation for the Bezier curve with three control points simplifies to

C(u) = (1-u)^2 P_0 + 2u(1-u) P_1 + u^2 P_2.    (6)
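As a concrete illustration of Eq. (6), a quadratic Bezier curve in 3-D can be evaluated directly from its three control points. This helper is a generic sketch in Python/NumPy, not code from the paper.

```python
import numpy as np

def bezier3(p0, p1, p2, u):
    """Evaluate a quadratic (three-control-point) Bezier curve at parameter u in [0, 1]."""
    p0, p1, p2 = map(np.asarray, (p0, p1, p2))
    return (1 - u) ** 2 * p0 + 2 * u * (1 - u) * p1 + u ** 2 * p2

# C(0) is the first control point, C(1) the last, C(0.5) the curve's midpoint.
print(bezier3([0, 0, 0], [1, 2, 0], [2, 0, 0], 0.5))   # -> [1. 1. 0.]
```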

2.4.2. Curve fitting: general method
We use the curve approximation method described by Piegl and Tiller [21]. We describe the method for fitting a curve with n + 1 control points to m + 1 data points. The simple solution for three control points is shown in Appendix A.



Fig. 2. Overview of gesture recognition.

Given a set of 3-D points Q_0, ..., Q_m to fit, we estimate control points P_0, ..., P_n, where m > n. ū_k is computed using the length of the trajectory up to Q_k:

|Q| = \sum_{i=1}^{m} |Q_i - Q_{i-1}|,    (7)

ū_k = \frac{\sum_{i=1}^{k} |Q_i - Q_{i-1}|}{|Q|}.    (8)

We first set P_0 = C(0) = Q_0 and P_n = C(1) = Q_m. We then estimate the unknowns, the intermediate control points P_1, ..., P_{n-1}, such that the error

E = \sum_{k=1}^{m-1} |Q_k - C(ū_k)|^2    (9)

is minimized.

For the system with n + 1 control points, estimating P amounts to solving

(N^T N) P = R,    (10)

N = \begin{bmatrix} N_{1,p}(ū_1) & \cdots & N_{n-1,p}(ū_1) \\ \vdots & \ddots & \vdots \\ N_{1,p}(ū_{m-1}) & \cdots & N_{n-1,p}(ū_{m-1}) \end{bmatrix},    (11)


where R is the vector of n - 1 points

R = \begin{bmatrix} N_{1,p}(ū_1) R_1 + \cdots + N_{1,p}(ū_{m-1}) R_{m-1} \\ \vdots \\ N_{n-1,p}(ū_1) R_1 + \cdots + N_{n-1,p}(ū_{m-1}) R_{m-1} \end{bmatrix}.    (12)

Each R_k is defined as

R_k = Q_k - N_{0,p}(ū_k) Q_0 - N_{n,p}(ū_k) Q_m,    (13)

P = \begin{bmatrix} P_1 \\ \vdots \\ P_{n-1} \end{bmatrix}.    (14)

P is a matrix with one unknown control point per row, where x, y, and z are stored as separate elements.
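A short sketch of the chord-length parameterization of Eqs. (7)-(8), which assigns each trajectory point Q_k a parameter ū_k proportional to the distance traveled up to that point and therefore absorbs variations in hand speed. Illustrative Python/NumPy code, not from the paper.

```python
import numpy as np

def chord_length_params(points):
    """Compute ubar_k for each 3-D trajectory point Q_k (Eqs. (7)-(8)).

    points : (m+1) x 3 array of trajectory points Q_0..Q_m.
    Returns m+1 parameters with ubar_0 = 0 and ubar_m = 1.
    """
    points = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)   # |Q_i - Q_{i-1}|
    total = seg.sum()                                        # |Q|
    return np.concatenate(([0.0], np.cumsum(seg) / total))   # accumulated length / |Q|

# Points spaced unevenly along a line still receive parameters that reflect distance.
print(chord_length_params([[0, 0, 0], [1, 0, 0], [3, 0, 0], [4, 0, 0]]))
# -> [0.   0.25 0.75 1.  ]
```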

2.5. Recognition

2.5.1. Good input?
The gesture is recognized from the trajectory between grabbing and releasing. However, the gesture is ambiguous if the movement is too small. Also, if the trajectory is irregularly sampled or contains too few points, computing its shape becomes difficult.

Fig. 3. Selected frames for three sample gestures: zoom (top row), translate (middle row), and rotate (bottom row). Hands used for manipulation are (automatically) detected and shown in natural skin colors, whereas the rest of each image is displayed as greyscale for visualization.

We consider the gesture to be invalid if the trajectory is shorter than 20 cm, if the trajectory is irregularly sampled, or if the trajectory contains fewer than six points. We selected the minimum number of points to be six because (1) we observed that at least six points were needed to robustly fit three control points, and (2) we assumed that a gesture roughly requires one or two seconds and the frame rate was close to six frames/s.

To determine irregularity, we compute ū_k for all points along the trajectory and count the points falling into three bins: [0.0, 0.33], [0.34, 0.66], and [0.67, 1.0]. If any of the three bins is empty, we consider the trajectory invalid. However, natural variability in gesture velocity is allowed: ū_k adjusts for different speeds/intervals between the points Q_k. u is the sampling parameter, ranging over [0, 1], that selects a point on the curve via C(u); C(0) is the beginning of the curve and C(1) is the end. Choosing ū_k (the u for the kth point) from the distance between consecutive points is equivalent to compensating for the varying speed along the gesture.
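A sketch of these validity checks, combining the minimum length (20 cm), the minimum point count (six), and the three-bin occupancy test on ū_k; the parameters ū_k are computed inline by chord length as in the previous sketch, and the function name is illustrative.

```python
import numpy as np

def is_valid_gesture(points, min_length=0.20, min_points=6):
    """Reject trajectories that are too short, too sparse, or irregularly sampled."""
    points = np.asarray(points, dtype=float)
    if len(points) < min_points:
        return False

    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    if seg.sum() < min_length:                    # shorter than 20 cm: ambiguous
        return False

    ubar = np.concatenate(([0.0], np.cumsum(seg) / seg.sum()))
    # Each third of the parameter range must contain at least one point,
    # otherwise the trajectory is considered irregularly sampled.
    for lo, hi in [(0.0, 0.33), (0.34, 0.66), (0.67, 1.0)]:
        if not np.any((ubar >= lo) & (ubar <= hi)):
            return False
    return True
```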

2.5.2. Determining gesture types
A geometric (rather than statistical) model is motivated by the nature of the application (representation of visualization operations) and its requirements (robustness and simplicity). To allow for variation of the hand speed within a gesture and between gestures, the ū_k in the Bezier curve description are computed using the distance that the hand travels between frames.

Fig. 4. Determining gesture type using the Bezier curve.

Using its angle and direction, the trajectory is classified as rotation, translation, or zoom. Examples of the three gestures are shown in Fig. 3. The inside angle at the middle point is used to recognize the gesture from the 3-D trajectory (see Fig. 4). Using the two end control points (P_0, P_2) and the middle point C(0.5), we compute the inside angle θ of the triangle inscribed in the curve by

v_{10} = C(0.5) - P_0,    (15)
v_{12} = C(0.5) - P_2,    (16)
v_{10} \cdot v_{12} = |v_{10}| |v_{12}| \cos(θ).    (17)

We classify a "flat" trajectory as translation or zooming and a "curved" one as rotation. The inside angle is smaller when the curve has high curvature, so if θ ≤ 100° we classify the gesture as rotation. Otherwise, the gesture is a translation or a zoom. If the movement was mostly in the z direction, we recognize it as zooming, otherwise as translation. To decide, we compute the ratio of the change in depth between the first and last control points to the distance between them:

\frac{|P_{2,z} - P_{0,z}|}{|P_2 - P_0|}.    (18)

If this ratio is greater than 0.5, the gesture is classified as a zoom; otherwise it is a translation. Since the method is geometry-based, the entire trajectory is recorded and can also be used to recognize gestures by hand speed.
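Putting Section 2.5.2 together, the sketch below classifies a fitted quadratic curve from the inside angle at C(0.5) (rotation if θ ≤ 100°) and from the depth-change ratio of Eq. (18) (zoom if above 0.5, translation otherwise). Function and variable names are illustrative, not from the paper.

```python
import numpy as np

def classify_gesture(p0, p1, p2, angle_thresh_deg=100.0, depth_ratio_thresh=0.5):
    """Classify a gesture from the three control points of its fitted Bezier curve."""
    p0, p1, p2 = map(np.asarray, (p0, p1, p2))
    mid = 0.25 * p0 + 0.5 * p1 + 0.25 * p2          # C(0.5) of the quadratic curve

    # Inside angle theta at the curve midpoint (Eqs. (15)-(17)).
    v10, v12 = mid - p0, mid - p2
    cos_t = np.dot(v10, v12) / (np.linalg.norm(v10) * np.linalg.norm(v12))
    theta = np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))

    if theta <= angle_thresh_deg:                   # strongly curved trajectory
        return "rotation"

    # Flat trajectory: zoom if motion is mostly along the depth (z) axis (Eq. (18)).
    depth_ratio = abs(p2[2] - p0[2]) / np.linalg.norm(p2 - p0)
    return "zoom" if depth_ratio > depth_ratio_thresh else "translation"
```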

3. Results

The experimental setup consists of a Digiclops system (Point Grey Research [22]) on a 1.5 GHz Pentium 4 PC with 512 MB RAM. The system is based on triangulation between three cameras. Since the camera parameters (their relative positions, the focal length, and the resolution) are fixed, re-calibration is not usually required. The results are organized in four sections: manipulating hand detection; manipulation mode detection; gesture recognition; and overall performance. The testing data set includes 100 gestures, 226 manipulation modes, and 1641 frames (with a 320 × 240 image size) of manipulating hands of four people. We presented a 3-D noise filtering algorithm and initial manipulating hand tracking results on a smaller data set in [19]. The data was captured in an indoor setting with varying natural light and a complex background. Four different people with various skin tones participated in the experiments (see Fig. 5).

3.1. Manipulating hand detection

The results of detecting manipulating hands (described in Section 2.2) are shown in Table 1. We detected the manipulating hand correctly in 1639 of 1641 images (99.9%). The two instances of misdetection are shown in Fig. 6. In both instances, detection failed because the depth readings of the manipulating hand and the face were identical. In the upper image, the face was at [0.0323, −0.359, 1.727] (in meters) and the hand was at [−0.466, −0.051, 1.730]. In the lower image, the face was at [0.032, −0.359, 1.727] and the hand was at [−0.360, 0.045, 1.727]. In these two cases, the histogram-based range filtering was not effective enough to robustly compute the 3-D location.

3.2. Manipulation detection

3.2.1. Evaluating manipulation detection
Each manipulation mode is defined by its starting frame, end frame, and mode (on or off). We evaluated manipulation mode detection in four categories: correct, incorrect, over, and under detection, as shown in Fig. 7. Detected manipulation modes (on and off) are shown as bars, with the frame number increasing from left to right; the upper bar is the Algorithm Detection (AD) and the lower bar is the Ground Truth (GT).

The four possible outcomes are classified as follows:

• Correct detection: AD and GT detections with at least 90% overlap in duration and the same mode.
• Incorrect detection: AD and GT detections with at least 90% overlap in duration but a different mode.
• Over detection: multiple AD detections, each with at least 90% overlap in duration with one GT detection of the same mode.
• Under detection: multiple GT detections, each with at least 90% overlap in duration with one AD detection of the same mode.


Fig. 5. Subjects who participated in the experiment. Note the variations in lighting conditions, the differences in skin tone, and the complex background.

Table 1
Manipulating hand detection results for ten sequences

Data set  | Correct: manipulating hand | Incorrect: non-manipulating hand | Incorrect: face/other body parts | Incorrect: non-skin regions | Total manipulating hand
1         | 98   | 0   | 0   | 0   | 98
2         | 199  | 0   | 0   | 0   | 199
3         | 98   | 0   | 0   | 0   | 98
4         | 198  | 0   | 0   | 0   | 198
5         | 50   | 0   | 0   | 0   | 50
6         | 199  | 0   | 0   | 0   | 199
7         | 200  | 0   | 0   | 0   | 200
8         | 198  | 0   | 2   | 0   | 200
9         | 199  | 0   | 0   | 0   | 199
10        | 200  | 0   | 0   | 0   | 200
Total     | 1639 | 0   | 2   | 0   | 1641
Total (%) | 99.9 | 0.0 | 0.1 | 0.0 | 100.0

Fig. 6. Two instances of misdetection of the manipulating hand. The color images (left), skin regions (middle), and detected manipulating hand (right) are shown.



Fig. 7. Evaluating manipulation mode detection.

3.2.2. Results
The results of manipulation mode detection are shown in Table 2. The method correctly identified 216 of the 226 modes (95.6%). All five under-detections were due to missing the manipulation mode change (hand opening or closing).

Table 2
Manipulating mode detection

Data set  | Correct | Incorrect | Over | Under | Total
1         | 19   | 0   | 0   | 0   | 19
2         | 35   | 0   | 0   | 0   | 35
3         | 11   | 0   | 0   | 1   | 13
4         | 24   | 0   | 0   | 1   | 25
5         | 8    | 0   | 0   | 0   | 8
6         | 30   | 0   | 0   | 0   | 30
7         | 31   | 0   | 0   | 0   | 31
8         | 16   | 0   | 0   | 1   | 19
9         | 18   | 0   | 0   | 1   | 21
10        | 24   | 0   | 0   | 1   | 25
Total     | 216  | 0   | 0   | 5   | 226
Total (%) | 95.6 | 0.0 | 0.0 | 2.2 | 100.0

Fig. 8. Two misidentifications of manipulation modes. The upper case, with an area change of 63%, and the lower case, with an area change of −54%, do not meet the threshold of ±75%.

For these five missed detections, the area changes were 63%, 74.7%, 71.4%, 68%, and −54%, which did not meet the threshold of ±75%. The hand detections with area changes of 63% and −54% are shown in Fig. 8.

3.3. Gesture recognition

To compute the gesture recognition results, we took the correctly identified manipulation modes and checked the recognition rate. Note that this rate evaluates the goodness of the recognition based on the Bezier curve representation; the overall performance, comparing correct recognitions against intended gestures, is reported in the next section. The results are shown in Table 3. The algorithm identified gestures correctly 93 out of 95 times (97.9%). In the two incorrect recognitions, zoom gestures were recognized as rotation, with the inner angle (θ) estimated as 104.9° and 125.1°. The trajectory plots of the two gestures are shown in Fig. 9. The failures were due to using a three-control-point Bezier curve: the curves in question were too concave to be adequately described by three control points.


Table 3
Gesture recognition performance

Data set  | Correct | Incorrect: zoom | Incorrect: translation | Incorrect: rotation | Total
1         | 7    | 0   | 0 | 0 | 7
2         | 14   | 0   | 0 | 0 | 14
3         | 6    | 0   | 0 | 0 | 6
4         | 10   | 0   | 0 | 0 | 10
5         | 4    | 0   | 0 | 0 | 4
6         | 13   | 0   | 0 | 0 | 13
7         | 12   | 0   | 0 | 0 | 12
8         | 7    | 0   | 0 | 0 | 7
9         | 8    | 2   | 0 | 0 | 10
10        | 12   | 0   | 0 | 0 | 12
Total     | 93   | 2   | 0 | 0 | 95
Total (%) | 97.9 | 2.1 | 0 | 0 | 100.0

Fig. 9. 3-D trajectories of the two incorrectly recognized gestures.

Motion trajectories for the defined object manipulations in 3-D space are shown in Fig. 10. It is apparent that the zoom gesture (blue) shows a significant change in the Z coordinate, in contrast to the almost constant depth of the translation (green), which is predominantly an X-axis motion. The rotational semicircle is also well defined. Trajectory analysis is domain specific; we are currently working with HCI experts to determine the sets of gestures needed to control different visualization packages.

3.4. Overall performance

Overall performance evaluates the number of correctly recognized gestures versus the number of intended gestures (shown in Table 4). The algorithm correctly recognized 93 out of 100 intended gestures. Note that in the previous section only 95 gestures were considered, because 5 were not detected due to misdetection of the manipulation mode, as shown in Section 3.2.

Fig. 10. Motion trajectories in a 3-D space.

Table 4 also shows the reasons for the five non-correct classifications of gestures. We divided those incidences into "incorrect" if the gesture was recognized as the wrong gesture and "missed" if the gesture was not detected. "Incorrect" and "missed" are then subdivided by the failing step: "hand" (manipulating hand detection), "mode" (manipulation mode detection), and "recognition" (gesture recognition). Two gestures were incorrectly recognized due to a small angle, as mentioned in Section 3.3. Three gestures were missed because mode detection under-detected the gesture, as mentioned in Section 3.2. The system executed at a rate of 5.5 frames/s.

3.5. Application to virtual manipulation

Fig. 11 shows the application of recognized gestural commands to a small virtual object. Complex virtual objects (3D views of two chromosomes) were selected to provide a visual illustration of the discussed operations.


Table 4
Overall recognition performance

Data set | Correct | Incorrect: hand | Incorrect: mode | Incorrect: recognition | Missed: hand | Missed: mode | Missed: recognition | Total
1        | 7  | 0 | 0 | 0 | 0 | 0 | 0 | 7
2        | 14 | 0 | 0 | 0 | 0 | 0 | 0 | 14
3        | 6  | 0 | 0 | 0 | 0 | 1 | 0 | 7
4        | 10 | 0 | 0 | 0 | 0 | 1 | 0 | 11
5        | 4  | 0 | 0 | 0 | 0 | 0 | 0 | 4
6        | 13 | 0 | 0 | 0 | 0 | 0 | 0 | 13
7        | 12 | 0 | 0 | 0 | 0 | 0 | 0 | 12
8        | 7  | 0 | 0 | 0 | 0 | 1 | 0 | 8
9        | 8  | 0 | 0 | 2 | 0 | 1 | 0 | 11
10       | 12 | 0 | 0 | 0 | 0 | 1 | 0 | 13
Total    | 93 | 0 | 0 | 2 | 0 | 5 | 0 | 100

Fig. 11. Example of the virtual manipulation application. Three sequential manipulations (rotate, zoom, translate) are shown. Red arrows are superimposed on the images to enhance the appearance of the gesture trajectories.


Fig. 12. Frame samples for two-handed gestures (top to bottom, respectively): rotation, translation, zoom, bend, stretch and squeeze.

The dataset shown is a visualization of a 3D reconstruction of fruit fly chromosomes based on 2D slices of CT scan images. The results of three sequentially applied virtual manipulations, through gestures of rotation, zoom, and translation, are shown in Fig. 11. Both recognition and visualization run in real time; the latter is written in OpenGL. Real-time visualization of large data sets represents a significant bottleneck, which is currently under investigation by various research groups.


Fig. 13. Appearance of a virtual environment after a horizontal stretch operation.

3.6. Environment (two-handed) operations

So far we have discussed only basic visualization commands. However, the technique described is easily expandable to larger sets of gestures covering more advanced data manipulations. For example, environment operations are applicable when several objects have to be affected. Such operations include:

(1) (environment) stretch,
(2) (environment) squeeze,
(3) (environment) bend,
(4) (environment) translation,
(5) (environment) rotation, and
(6) (environment) zoom.

All of these manipulations require two-handed gestures. When two simultaneous "ON" switches of the manipulation mode occur close in depth, this signals that an environment command is being processed. Gestures (4)-(6) are defined similarly to their respective one-handed equivalents. They are distinguished from (1)-(3) by having an approximately constant distance between the hands (Fig. 12). Gesture (3) is characterized by the angle between the trajectories, while (1) and (2) occur along a straight line and differ only in direction. The appearance of a virtual environment after a horizontal stretch operation is shown in Fig. 13.
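A rough sketch of how these two-handed (environment) gestures could be separated once per-hand trajectories are available, following the cues above: gestures (4)-(6) keep the inter-hand distance roughly constant, bend is characterized by the angle between the two trajectories, and stretch/squeeze differ only in direction along a line. All thresholds and names are illustrative assumptions, not values from the paper.

```python
import numpy as np

def classify_environment_gesture(left_traj, right_traj,
                                 dist_var_thresh=0.10, bend_angle_deg=30.0):
    """Very rough two-handed gesture classifier (illustrative thresholds).

    left_traj, right_traj : N x 3 arrays of synchronized hand positions.
    Returns "object-like" (constant inter-hand distance, i.e. the two-handed
    analogue of translation/rotation/zoom), "bend", "stretch", or "squeeze".
    """
    left, right = np.asarray(left_traj, float), np.asarray(right_traj, float)

    # (4)-(6): the distance between the hands stays approximately constant.
    dists = np.linalg.norm(left - right, axis=1)
    if np.ptp(dists) / dists.mean() < dist_var_thresh:
        return "object-like"

    # (3): bend is characterized by the angle between the two trajectories.
    d_left, d_right = left[-1] - left[0], right[-1] - right[0]
    cos_a = np.dot(d_left, d_right) / (np.linalg.norm(d_left) * np.linalg.norm(d_right))
    angle = np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
    if abs(angle - 180.0) > bend_angle_deg and angle > bend_angle_deg:
        return "bend"

    # (1)-(2): straight-line motion, differing only in direction
    # (hands moving apart = stretch, hands moving together = squeeze).
    return "stretch" if dists[-1] > dists[0] else "squeeze"
```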

4. Conclusions

Visual data exploration has tremendous capabilities for revealing properties and abnormalities in large data sets. This paper described a gesture recognition system for visualization navigation. Scientists are interested in developing interactive settings for exploring large data sets in an intuitive environment. The input consists of registered 3-D data. Bezier curves are used for trajectory analysis and classification of gestures. The system improves upon previous work by emphasizing reliability, naturalness of motion, and the lack of restrictions. A geometric (rather than statistical) model is motivated by the nature of the application (representation of visualization operations) and its requirements (robustness and simplicity). Since the method is geometry-based, the entire trajectory is recorded and can also be used to recognize gestures by hand speed. Variations in hand speed within a gesture as well as between gestures are normalized by parameterizing the points along the trajectory to a Bezier curve description with respect to the hand speed at each frame. The method is robust and reliable: the correct hand identification rate is 99.9% (1639 of 1641 frames), modes of hand movements are detected correctly 95.6% of the time, and the recognition rate (given the right mode) is 97.9%. An application to gesture-controlled visualization of 3D bioinformatics data was also presented. Future work includes defining a larger set of gestures based on the basic movements already analyzed, using gesture speed for classification, and creating actual hand models for the detection of smaller, more subtle gestures.

Acknowledgements

We would like to thank the LLNL VIEWS Visualization project for Fig. 1(a); the example data set appears as Fig. 1(b) courtesy of Art Mirin of LLNL. We also thank Benjamin Lok at the University of Florida for Fig. 11.

Appendix A. Least-squares fitting of 3-D Bezier curves

With three control points, we have only three unknowns: the x, y, and z coordinates of P_1 (the middle control point).

First, the nth-degree Bezier curve is described as

C(u) = \sum_{i=0}^{n} B_{i,n}(u) P_i.    (A.1)

With three control points,

N_{0,2} = B_{0,2} = (1-u)^2,    (A.2)
N_{1,2} = B_{1,2} = 2u(1-u),    (A.3)
N_{2,2} = B_{2,2} = u^2.    (A.4)

N is now an (m-1) × 1 matrix ((m-1) × (n-1) with n = 2), and

N^T N = \sum_{k=1}^{m-1} \left[ N_{1,p}(ū_k) \right]^2 = \sum_{k=1}^{m-1} \left[ 2 ū_k (1 - ū_k) \right]^2.    (A.5)

We set P to be

P = [P_{1,x} \; P_{1,y} \; P_{1,z}].    (A.6)

R_k is now simplified and R is a 1 × 3 matrix:

R_k = Q_k - (1 - ū_k)^2 Q_0 - ū_k^2 Q_m,    (A.7)

R = [R_x \; R_y \; R_z]    (A.8)
  = \left[ \sum_{k=1}^{m-1} 2 ū_k (1 - ū_k) R_k \right].    (A.9)

The middle control point is the solution to (N^T N) P = R, which is given by

P_{1,x} = \frac{R_x}{N^T N},    (A.10)
P_{1,y} = \frac{R_y}{N^T N},    (A.11)
P_{1,z} = \frac{R_z}{N^T N}.    (A.12)

References

[1] C. Wren, A. Azarbayejani, T. Darrell, A. Pentland, Pfinder: real-time tracking of the human body, IEEE Trans. PAMI 19 (7) (1997) 780–785.

[2] M.-H. Yang, N. Ahuja, Recognizing hand gestures using motion trajectories, in: Proceedings of IEEE CS Conference on Computer Vision and Pattern Recognition, Fort Collins, CO, Vol. 1, June 1999, pp. 466–472.

[3] J.M. Rehg, T. Kanade, Visual tracking of high DOF articulated structures: an application to human hand tracking, in: Proceedings of European Conference on Computer Vision, Stockholm, Sweden, Vol. 2, May 1994, pp. 35–46.

[4] J.J. Kuch, T.S. Huang, Model-based tracking of self-occluding articulated objects, in: Vision Based Hand Modeling and Tracking for Virtual Teleconferencing and Telecollaboration, Cambridge, MA, June 1995, pp. 666–671.

[5] R. Cutler, L. Davis, Real-time periodic motion detection, analysis, and applications, in: Proceedings of IEEE CS Conference on Computer Vision and Pattern Recognition, Fort Collins, CO, Vol. 2, June 1999, pp. 326–332.

[6] D.J. Moore, I.A. Essa, M.H. Hayes III, Exploiting human actions and object context for recognition tasks, in: Proceedings of International Conference on Computer Vision, Corfu, Greece, September 1999, pp. 80–86.

[7] Y. Iwai, H. Shimizu, M. Yachida, Real-time context-based gesture recognition using HMM and automaton, in: Proceedings of International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, Corfu, Greece, September 1999, pp. 127–134.

[8] C. Vogler, H. Sun, D. Metaxas, A framework for motion recognition with applications to American sign language and gait recognition, in: Proceedings of the Workshop on Human Motion, Los Alamitos, CA, December 2001, pp. 33–38.

[9] Y. Yacoob, M.J. Black, Parameterized modeling and recognition of activities, J. Comput. Vision Image Understand. 73 (2) (1999) 232–247.

[10] A.F. Bobick, A.D. Wilson, A state-based approach to the representation and recognition of gesture, IEEE Trans. Pattern Anal. Machine Intel. 19 (12) (1997) 1325–1337.

[11] R. Grzeszcuk, G. Bradski, M.H. Chu, J.-Y. Bouguet, Stereo based gesture recognition invariant to 3D pose and lighting, in: Proceedings of the IEEE CS Conference on Computer Vision and Pattern Recognition, Hilton Head Island, SC, Vol. 1, June 2000, pp. 826–833.

[12] P. Hong, M. Turk, T.S. Huang, Constructing finite state machines for fast gesture recognition, in: Proceedings of the 15th International Conference on Pattern Recognition, Barcelona, Spain, Vol. 3, September 2000, pp. 691–694.

[13] Y. Sato, M. Saito, H. Koike, Real-time input of 3D pose and gestures of a user's hand and its applications for HCI, in: Proceedings of the IEEE Virtual Reality, Yokohama, Japan, March 2001, pp. 79–86.

[14] H. Hongo, M. Ohya, M. Yasumoto, K. Yamamoto, Face and hand gesture recognition for human-computer interaction, in: Proceedings of the 15th International Conference on Pattern Recognition, Barcelona, Spain, Vol. 2, September 2000, pp. 921–924.

[15] H.S. Yoon, J. Soh, Y.J. Bae, H.S. Yang, Hand gesture recognition using combined features of location, angle and velocity, Pattern Recognition 34 (7) (2001) 1491–1501.

[16] L. Gupta, S. Ma, Gesture-based interaction and communication: automated classification of hand gesture contours, IEEE Trans. Syst. Man Cybernet. C 31 (1) (2001) 114–120.

[17] J.K. Aggarwal, Q. Cai, Human motion analysis: a review, in: IEEE Nonrigid and Articulated Motion Workshop, San Juan, Puerto Rico, June 1997, pp. 90–102.

[18] V.I. Pavlovic, R. Sharma, T.S. Huang, Visual interpretation of hand gestures for human-computer interaction: a review, IEEE Trans. PAMI 19 (7) (1997) 677–695.

[19] L.V. Tsap, M.C. Shin, Improving quality and speed of range data for communication, in: Proceedings of the IEEE International Workshop on Cues in Communication, CD-ROM, Kauai, Hawaii, December 2001.

[20] M.W. Powell, R. Murphy, Position estimation of micro-rovers using a spherical coordinate transform color segmenter, in: Proceedings of IEEE Workshop on Photometric Modeling for Computer Vision and Graphics, Fort Collins, CO, June 1999, pp. 21–27.

[21] L. Piegl, W. Tiller, The NURBS Book, Springer, New York, 1997.

[22] Point Grey Research Inc., Digiclops Stereo Vision System, User's guide and command reference, Point Grey Research, Vancouver, BC, 2000.

About the Author—MIN C. SHIN received the B.S., M.S., and Ph.D. degrees in computer science from the University of South Florida, Tampa, in 1992, 1996, and 2001, respectively. He received the University of South Florida Graduate Council's Outstanding Dissertation Prize. He is currently an Assistant Professor in the Department of Computer Science at the University of North Carolina at Charlotte. His research interests include gesture recognition, range image analysis, nonrigid motion analysis, and performance evaluation. Dr. Shin is a member of IEEE, UPE, and the Golden Key Honor Society. More information can be obtained from http://www.cs.uncc.edu/~mcshin.


About the Author—LEONID V. TSAP received the B.S. degree in Computer Science from the Kiev Civil Engineering Institute, Ukraine, in 1991, and the M.S. and Ph.D. degrees in Computer Science from the University of South Florida, Tampa, in 1995 and 1999, respectively. He is a three-time winner of the annual University of South Florida USPS Scholarship Award, and a recipient of the Provost's Commendation for Outstanding Teaching by a Graduate Student. He also received the University of South Florida Graduate Council's Outstanding Dissertation Prize. He is currently with the Advanced Communications and Signal Processing Group (Electronics Engineering Department) at the University of California Lawrence Livermore National Laboratory.
Leonid V. Tsap is a member of the IEEE-CS and ACM. He is a member of the Editorial Board of the Pattern Recognition journal. His current research interests include image analysis/computer vision, nonrigid motion analysis, pattern recognition, perceptual user interfaces, physically-based modeling, and biocomputing. His research has resulted in 24 refereed publications. More information can be obtained from http://marathon.csee.usf.edu/~tsap and http://www.llnl.gov/CASC/people/tsap.

About the Author—DMITRY B. GOLDGOF received the Ph.D. degree in Electrical Engineering from the University of Illinois at Urbana-Champaign in 1989. He is currently a Professor in the Department of Computer Science and Engineering at the University of South Florida in Tampa and a member of the H. Lee Moffitt Cancer Center and Research Institute. Professor Goldgof's research interests include motion and deformation analysis of biological objects, motion analysis, computer vision, image processing and its biomedical applications, bioinformatics, and pattern recognition. He has graduated 10 Ph.D. and 24 M.S. students, and has published 50 journal and over 100 conference publications, 15 book chapters, and 4 books.
Professor Goldgof was awarded the Annual Pattern Recognition Society Award (for best papers) in 1993 and 2002. His paper entitled "Automatic tumor segmentation using knowledge-based techniques" was selected by the International Medical Informatics Association for the 2000 IMIA Yearbook containing "the best of medical informatics". Professor Goldgof is a senior member of IEEE. He is the North American Editor for the Image and Vision Computing Journal and Associate Editor for IEEE Transactions on Systems, Man and Cybernetics, Part B. Dr. Goldgof has served as a member of the Editorial Board of Pattern Recognition (1990–2001), a member of the International Association of Pattern Recognition (IAPR) Education Committee (2000–2002), and as Associate Editor for IEEE Transactions on Image Processing (1996–1998). More information can be obtained from http://marathon.csee.usf.edu/~goldgof/.