
Feature Extraction from 2D Gesture Trajectory in Dynamic Hand Gesture Recognition

M.K. Bhuyan, D. Ghosh and P.K. Bora
Department of Electronics and Communication Engineering
Indian Institute of Technology Guwahati, India 781039
Email: (manas kb, ghosh, prabin )@iitg.ernet.in

Abstract— Vision-based hand gesture recognition is a popular research topic in human-machine interaction (HMI). We have earlier developed a model-based method for tracking hand motion in complex scenes using a Hausdorff tracker. In this paper, we now propose to extract certain features from the gesture trajectory so as to identify the form of the trajectory. These features can then be used efficiently for trajectory-guided recognition/classification of hand gestures. Our experimental results show 95% accuracy in identifying the forms of the gesture trajectories, indicating that the trajectory features proposed in this paper are appropriate for defining a particular gesture trajectory.

Keywords—human-machine interaction, video object plane, motion trajectory

I. INTRODUCTION

One very interesting field of research in Pattern Recognition that has gained much attention in recent times is Gesture Recognition. A gesture refers to a particular pose and/or movement of the body parts, such as hand, head, face, etc., so as to convey some message. Accordingly, one important direction of research in gesture recognition is concerned with hand gestures formed by different hand shapes, positions, orientations and movements. While static hand gestures are modelled in terms of hand configuration, as defined by the flex angles of the fingers and palm orientation, dynamic hand gestures include hand trajectories and orientation in addition to these. So, appropriate interpretation of dynamic gestures on the basis of hand movement, in addition to shape and position, is necessary for recognition.

Another form of dynamic hand gesture in common use is one in which the 2D gesture trajectory alone builds up a particular message. Examples of such gestures are shown in Fig. 1. Recognition of these gestures therefore requires trajectory estimation followed by extraction of features defining the estimated gesture trajectory.

As mentioned above, the first task in dynamic hand gesture recognition is to track the hand motion in the gesture video sequence and subsequently estimate the trajectory through which the hand moves during gesticulation. While trajectory estimation is quite simple and straightforward in glove-based hand gesture recognition systems [1], [2], which provide spatial information directly, trajectory estimation in vision-based systems may require complex algorithms to track the hand and fingers using silhouettes and edges. One such method is given in [3], which uses skin color to segment out hand regions and subsequently determines 2D motion trajectories. However, many of these techniques are plagued by difficulties such as large variation in skin tone, unknown lighting conditions and dynamic scenes. In an attempt to overcome these difficulties, a model-based method for tracking hand motion in complex scenes was proposed in [4]. A modified version of this algorithm is described in Section II.

The next important step in gesture recognition is the selection of suitable features. Selecting good features is crucial to gesture recognition, since hand gestures are very rich in shape variation, motion and texture. In view of the present problem, in this paper we propose to use some basic trajectory features, namely key trajectory points, trajectory length, trajectory shape, a location feature, an orientation feature, a velocity feature and an acceleration feature, so as to accomplish trajectory-guided recognition successfully. Section III describes how these features are extracted from the estimated gesture trajectory.

Fig. 1. Hand gestures for representing a circle, two, square and a wavy hand.

II. TRACKING ALGORITHM

A. Motion vector estimation

The tracking algorithm given in [4] is based on the Hausdorff object tracker developed in [5]. The core of this tracking algorithm is an object tracker that matches a two-dimensional binary model of the object against subsequent frames using the Hausdorff distance measure. The best match found indicates the translation the object has undergone, and the model is updated with every frame to accommodate rotation and changes in shape.

In our proposed technique, we segment the frames in the gesture sequence so as to form video object planes (VOPs), where the hand is considered as the video object (VO) [6]. Next,


the VOPs are binarized to form a binary alpha plane: the hand pixels are assigned “1” and the background pixels are assigned “0”. Thus, a binary model of the moving hand is derived and used for tracking. The Hausdorff object tracker finds the position where the input hand model best matches the next edge image and returns the motion vector MV_i that represents the best translation.
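To illustrate this matching step, the sketch below searches over integer translations for the one that minimizes the directed Hausdorff distance from the translated hand model points to the edge points of the next frame. It is a minimal sketch, not the authors' implementation: the tracker in [5] uses a rank-order (partial) variant of the Hausdorff distance and a far more efficient search, and the function names, point-set representation and search radius here are our assumptions.

```python
import numpy as np

def directed_hausdorff(A, B):
    """Directed Hausdorff distance h(A, B): the largest distance from a point
    in A to its nearest point in B. A and B are (N, 2) arrays of (x, y)."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # pairwise distances
    return d.min(axis=1).max()

def best_translation(model_pts, edge_pts, search_radius=15):
    """Brute-force search for the translation of the hand model that best
    matches the edge image; the returned pair plays the role of MV_i."""
    best_mv, best_dist = (0, 0), np.inf
    for dx in range(-search_radius, search_radius + 1):
        for dy in range(-search_radius, search_radius + 1):
            dist = directed_hausdorff(model_pts + np.array([dx, dy]), edge_pts)
            if dist < best_dist:
                best_mv, best_dist = (dx, dy), dist
    return best_mv
```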

B. Trajectory estimation

Determination of centroid of hand image

After determining the binary alpha plane corresponding to each VOP, moments are used to find the center of the hand. The 0th and the 1st order moments are defined as

M_{00} = \sum_{x} \sum_{y} I(x, y)    (1)

M_{10} = \sum_{x} \sum_{y} x\, I(x, y)    (2)

M_{01} = \sum_{x} \sum_{y} y\, I(x, y)    (3)

Subsequently, the centroid is calculated as

x_c = \frac{M_{10}}{M_{00}} \quad \text{and} \quad y_c = \frac{M_{01}}{M_{00}}    (4)

In the above equations, I(x, y) is the pixel value at position (x, y) of the image. Since the background pixels are assigned “0”, the centroid of the hand in a frame is also the centroid of the total frame. Therefore, in the moment calculation, we may take the summation either over all pixels in the frame or over only the hand pixels.
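A minimal sketch of this centroid computation, assuming the binary alpha plane is available as a NumPy array (the array representation and the function name are ours, not the paper's):

```python
import numpy as np

def centroid_from_alpha_plane(alpha):
    """Centroid of the hand from a binary alpha plane, per equations (1)-(4).
    alpha is a 2D array with hand pixels set to 1 and background set to 0."""
    ys, xs = np.nonzero(alpha)           # sum over hand pixels only
    m00 = alpha[ys, xs].sum()            # M00: number of hand pixels
    m10 = (xs * alpha[ys, xs]).sum()     # M10
    m01 = (ys * alpha[ys, xs]).sum()     # M01
    return m10 / m00, m01 / m00          # (x_c, y_c)
```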

Estimation of centroids of motion trajectory

As shown in Figs. 2 and 3, the centroid is calculated both by the moment equations (1)–(4) and from the motion vector, where centroid C_i is obtained by translating C_{i-1} by the respective motion vector. The final centroid is taken as the average of these two. This nullifies the effect of slight shape changes in successive VOPs.

C. Trajectory formation and smoothing of final trajectory

The final trajectory is formed by joining all the calculated centroids in sequential manner. However, this trajectory may be noisy due to the following reasons:

• points that are too close together,

• isolated points far away from the correct trajectory due to changes in the shape of the hand model,

• unclosed end points,

• hand trembling,

• unintentional movements.

Fig. 2. Estimation of centroids in VOP_{i-1} and VOP_i: centroid computed by the moment equations, centroid computed by the motion vector, and the final centroid.

Fig. 3. Estimated centroids, calculated by motion vector and moment equations, in the VOPs.

Therefore, in order to obtain a smooth trajectory, we apply piecewise approximation of the spatial positions of the centroids in successive VOPs, following the MPEG-7 motion trajectory approximation and representation technique [7]. This gives the final trajectory model.

Trajectory approximation

• First-order approximation:

x(t) = x_i + v_i (t - t_i), \quad \text{where} \quad v_i = \frac{x_{i+1} - x_i}{t_{i+1} - t_i}    (5)

• Second-order approximation:

x(t) = x_i + v_i (t - t_i) + \frac{1}{2} a_i (t - t_i)^2, \quad \text{where} \quad v_i = \frac{x_{i+1} - x_i}{t_{i+1} - t_i} - \frac{1}{2} a_i (t_{i+1} - t_i)    (6)

and similarly for the other dimension y. Here v_i and a_i represent the hand velocity and acceleration respectively, considered constant on [t_i, t_{i+1}], and (x_i, y_i) and (x_{i+1}, y_{i+1}) are the hand positions at times t_i and t_{i+1}.

So, a dynamic hand gesture can be interpreted as a set of points in a spatio-temporal space as

DG = \{(x_1, y_1), (x_2, y_2), \ldots, (x_t, y_t)\}    (7)
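To make equations (5) and (6) concrete, here is a minimal sketch of the two interpolation rules for one coordinate (the same functions apply to y). The function names and argument layout are ours, not the paper's:

```python
def first_order_point(x_i, x_next, t_i, t_next, t):
    """Linear (first-order) interpolation of equation (5) on [t_i, t_next]."""
    v_i = (x_next - x_i) / (t_next - t_i)
    return x_i + v_i * (t - t_i)

def second_order_point(x_i, x_next, t_i, t_next, a_i, t):
    """Second-order interpolation of equation (6) with constant acceleration a_i."""
    v_i = (x_next - x_i) / (t_next - t_i) - 0.5 * a_i * (t_next - t_i)
    return x_i + v_i * (t - t_i) + 0.5 * a_i * (t - t_i) ** 2
```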

D. Key frame based trajectory estimation

The computational burden in estimating the hand trajectory can be significantly reduced by selecting key video frames in the gesture video sequence. The key VOPs are selected on the basis of the Hausdorff distance measure, thereby transforming an entire video clip into a small set of representative frames that are sufficient to represent a particular gesture sequence [8], [9]. After obtaining the key frames, hand trajectories are estimated by following the same procedure as before, but considering only the key VOPs in the sequence.


III. FEATURE EXTRACTION FROM ESTIMATED TRAJECTORY

For trajectory matching we consider both static and dynamic features. The static features considered in this paper correspond to the shape of the hand trajectory, the total length of the extracted trajectory, the location of key trajectory points and the orientation of the hand along the gesture trajectory. The dynamic features proposed for trajectory classification are the velocity and acceleration features. We describe the static features as low-level features and the dynamic features as high-level features. The basic idea is that even if the static features match correctly during classification, i.e., even if the shape or length matching criteria are fulfilled, the match may not represent the actual hand trajectory unless the motion patterns are also compared. Therefore, for actual recognition, both low-level and high-level features have to be considered for trajectory matching.

A. Static features

1) Key trajectory point selection: The basic principle of key point selection is the merging of adjacent approximation intervals of the estimated trajectory until the interpolation error exceeds a predefined threshold. Key points are defined by their coordinates in 2D space and time, i.e., (x, y, t). These key points best represent the prominent locations of the hand in the gesture trajectory. The total number of key points can be chosen by the user so that the global precision and compactness required by the application are met, and variable time intervals between key points can be chosen to match the local smoothness of the trajectory.
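A minimal sketch of this interval-merging idea, under our own simplifying assumptions: the interpolation error is measured against linear interpolation in time, the error threshold is an arbitrary illustrative value, and the MPEG-7 procedure of [7] that the paper follows is more elaborate than this greedy loop.

```python
import numpy as np

def select_key_points(traj, err_threshold=2.0):
    """Greedy key-point selection: keep growing the current interval and drop
    interior points while the linear-interpolation error stays below the
    threshold. traj is a list of (x, y, t) tuples; returns the key points."""
    traj = np.asarray(traj, dtype=float)
    keys, start = [0], 0
    for end in range(2, len(traj)):
        # Interpolate interior points linearly in time between start and end.
        t0, t1 = traj[start, 2], traj[end, 2]
        alpha = (traj[start + 1:end, 2] - t0) / (t1 - t0)
        interp = traj[start, :2] + alpha[:, None] * (traj[end, :2] - traj[start, :2])
        err = np.abs(interp - traj[start + 1:end, :2]).max()
        if err > err_threshold:        # cannot merge further; close the interval
            keys.append(end - 1)
            start = end - 1
    keys.append(len(traj) - 1)
    return [tuple(traj[k]) for k in keys]    # key points as (x, y, t)
```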

Fig. 4. Ideal key points in the ‘circle’ and ‘square’ representing gestures.

2) Trajectory length calculation: For determining the total length traversed by the hand during a gesture (i.e., the length of the gesture trajectory), the sum of the Euclidean distances between consecutive trajectory points is calculated as

D = \sum_{i=0}^{N-1} \sqrt{(x_i - x_{i+1})^2 + (y_i - y_{i+1})^2}    (8)
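A one-line NumPy version of equation (8), as a sketch (the array layout is an assumption):

```python
import numpy as np

def trajectory_length(points):
    """Total trajectory length of equation (8): sum of Euclidean distances
    between consecutive points. points is an (N+1, 2) array of (x, y)."""
    points = np.asarray(points, dtype=float)
    return np.sqrt(np.sum(np.diff(points, axis=0) ** 2, axis=1)).sum()
```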

3) Location feature extraction: The location feature is a measure of the distance between the center of gravity and the selected key points in a gesture trajectory, as depicted in Fig. 5. The location of a gesture trajectory cannot be determined solely from the origin/start point, since for a particular gesture there is always spatial variation in the gesture start point. This is solved by using the gesture center of gravity for calculating the locations of the selected key points on the gesture trajectory.

Fig. 5. Extraction of the location feature: distances L_0, …, L_N from the center of gravity (\hat{x}, \hat{y}) to the key points (x_0, y_0), …, (x_N, y_N).

The center of gravity of a gesture trajectory is calculated as

\hat{x} = \frac{1}{N} \sum_{i=0}^{N} x_i \quad \text{and} \quad \hat{y} = \frac{1}{N} \sum_{i=0}^{N} y_i    (9)

and the location of the different key points from the center of gravity is calculated as

L_i = \left| \sqrt{(x_i - \hat{x})^2 + (y_i - \hat{y})^2} \right|    (10)

where (x_i, y_i) is the ith key point. The average positional distance of all the key points from the center of gravity is given as

L_{avg} = \frac{1}{N+1} \sum_{i=0}^{N} L_i    (11)

L_{avg} is our proposed location feature and is related to the overall gesture size.
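A compact sketch of equations (9)–(11), using the arithmetic mean over the key points for the center of gravity (the paper writes the 1/N factor in equation (9); for a sketch the distinction is immaterial). The array layout and function name are assumptions:

```python
import numpy as np

def location_feature(key_points):
    """Location feature of equations (9)-(11): average distance of the key
    points from the trajectory's center of gravity. key_points is (N+1, 2)."""
    kp = np.asarray(key_points, dtype=float)
    cog = kp.mean(axis=0)                    # center of gravity (x_hat, y_hat)
    L = np.linalg.norm(kp - cog, axis=1)     # L_i for each key point
    return L.mean()                          # L_avg
```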

4) Orientation feature extraction: The orientation feature gives the direction along which the hand traverses in space while making a gesture. In order to extract this feature, the hand displacement vector at every point on the trajectory is first calculated as

d_i = [x_i - x_{i-1},\; y_i - y_{i-1}], \quad i = 1, 2, \ldots, N    (12)

\theta_i = \tan^{-1}\left(\frac{y_i - y_{i-1}}{x_i - x_{i-1}}\right), \quad i = 1, 2, \ldots, N    (13)

where d_i and \theta_i give the magnitude and direction, respectively, of the hand displacement at the ith trajectory point. The feature values we propose to derive from these measured displacement vectors are as follows.

1) Directions of hand movement at the starting and ending points, i.e., \theta_s = \theta_1 and \theta_e = \theta_N.


Fig. 6. Ideal velocity plots (velocity versus time) of (a) circle and (b) square representing gestures.

2) Number of points N_\theta at which the change in direction of hand movement exceeds some predefined threshold T_\theta, i.e., |\theta_{i+1} - \theta_i| \geq T_\theta. This feature corresponds to the number of significant corners in the gesture trajectory.

From a large number of visualization experiments, it is observed that a human can generally perceive a change in the direction of hand movement only when the angular displacement is approximately 45° or more. Accordingly, we select the threshold T_\theta equal to 45°.
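A sketch of the orientation features built from equations (12)–(13). We use atan2 rather than a plain arctangent so that all four quadrants of direction are resolved, and we wrap the angle differences before thresholding; both choices, and the function name, are our assumptions:

```python
import numpy as np

def orientation_features(points, t_theta_deg=45.0):
    """Orientation features from equations (12)-(13): start/end movement
    directions and the number of significant direction changes (corners)."""
    p = np.asarray(points, dtype=float)
    d = np.diff(p, axis=0)                            # displacement vectors d_i
    theta = np.degrees(np.arctan2(d[:, 1], d[:, 0]))  # directions theta_i
    theta_s, theta_e = theta[0], theta[-1]
    dtheta = np.abs(np.diff(theta))
    dtheta = np.minimum(dtheta, 360.0 - dtheta)       # wrap angle differences
    n_theta = int(np.sum(dtheta >= t_theta_deg))      # significant corners
    return theta_s, theta_e, n_theta
```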

B. Dynamic features

Motion features are computed from the spatial positions of the hand in the gesture trajectory and the time interval between two prominent hand positions, as follows.

1) Velocity feature: In some critical situations, the velocity feature plays a decisive role during the gesture recognition phase. It is based on the important observation that each gesture is made at a different speed, as depicted in Fig. 6. For a circle-representing gesture, the hand velocity is more or less constant throughout the gesture, whereas in a square gesture the velocity of the hand decreases at the corner points of the square, and drops to zero if the hand pauses at the corners.

2) Acceleration feature: As mentioned earlier, continuous gestures are composed of a series of gestures that as a whole bear some meaning. As a first step towards recognition, a continuous gesture sequence needs to be segmented into its component gestures. However, the process is complicated by the phenomenon of ‘co-articulation’, in which one gesture influences the next in the temporal sequence [10]. This happens due to hand movement during the transition from one gesture to the next. The problem is very significant in the case of fluent sign language. Recognition of co-articulated gestures is one of the hardest parts of gesture recognition.

In view of this, we propose an acceleration feature which may distinguish the co-articulation phase from the meaningful dynamic gesture sequence, since during co-articulation the hand moves very quickly, i.e., with high acceleration, from the completion of one gesture to the start position of the next, as shown in Fig. 7. The acceleration feature, in combination with other features and representations, viz., the FSM representation [8], can efficiently isolate the co-articulation phase from the rest of the gesture sequence.

Fig. 7. Ideal acceleration plot (acceleration versus key frames) for gestures connected sequentially: acceleration peaks in the co-articulation phase between gesture n and gesture n+1.

Computation of motion features

Velocity features are computed from the distance between two consecutive key points and the corresponding time interval, as shown in equation (14).

The first derivative of the velocity of the hand motion in successive video frames determines the hand acceleration while performing gestures. The velocity of the hand for a trajectory T(x_i, y_i, t_i), i = 0, 1, 2, \ldots, N, is calculated as

v_i = \left\{ \frac{x_{i+1} - x_i}{t_{i+1} - t_i},\; \frac{y_{i+1} - y_i}{t_{i+1} - t_i} \right\}, \quad i = 0, 1, \ldots, N-1    (14)

The following velocity-based features are extracted for a given trajectory (a short code sketch follows the list).

• Average velocity over the whole trajectory length, v_{avg}.

• Maximum trajectory velocity, v_{max}.

• Minimum trajectory velocity, v_{min}.

• Number of maxima (N_{v,max}) in the velocity profile. This corresponds to the number of smooth trajectory segments.

• Number of minima (N_{v,min}) in the velocity profile. This corresponds to the number of corners in the trajectory.
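The sketch below computes the velocity vectors of equation (14) from the key points and derives the five velocity-based features listed above. Detecting local maxima and minima by simple neighbour comparison on the speed profile is our assumption:

```python
import numpy as np

def velocity_features(key_points, times):
    """Velocity features from equation (14): average, maximum and minimum
    speed plus the number of local maxima and minima in the speed profile.
    key_points is an (N+1, 2) array, times an (N+1,) array of timestamps."""
    kp = np.asarray(key_points, dtype=float)
    t = np.asarray(times, dtype=float)
    v = np.diff(kp, axis=0) / np.diff(t)[:, None]     # (vx_i, vy_i) of eq. (14)
    speed = np.linalg.norm(v, axis=1)
    maxima = np.sum((speed[1:-1] > speed[:-2]) & (speed[1:-1] > speed[2:]))
    minima = np.sum((speed[1:-1] < speed[:-2]) & (speed[1:-1] < speed[2:]))
    return {
        "v_avg": speed.mean(),
        "v_max": speed.max(),
        "v_min": speed.min(),
        "N_v_max": int(maxima),
        "N_v_min": int(minima),
    }
```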

Instead of computing the acceleration feature from each and every frame of the gesture video sequence, we can compute it effectively using either the key points of the extracted trajectory or the key frames of the sequence. This greatly reduces the computational burden.

Normalization of features

For a particular gesture feature, viz., location, velocity or acceleration, normalization is done as follows:

Norm_{max} = \max_{i=1}^{M} (Norm_i)    (15)

then

vnorm_i = \frac{Norm_i}{Norm_{max}}    (16)

where Norm_i represents the ith feature value to be normalized and Norm_{max} is the maximum value of that feature, determined over all the M key points in the gesture trajectory. Finally, from equation (16), the normalized feature value vnorm_i is calculated, which lies between 0.0 and 1.0, and is converted into feature codes by partitioning the range from 0.1 to 1.0.
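A sketch of equations (15)–(16) followed by one plausible reading of the feature-code step; the paper does not spell out the exact coding scheme, so the ten-bin quantization below is our assumption:

```python
import numpy as np

def normalize_feature(values, n_codes=10):
    """Normalization of equations (15)-(16): divide by the maximum over the
    M key points, then quantize the normalized range into feature codes."""
    v = np.asarray(values, dtype=float)
    norm_max = v.max()                                # equation (15)
    v_norm = v / norm_max                             # equation (16)
    # Partition (0, 1] into n_codes bins; codes run from 1 to n_codes.
    codes = np.minimum(np.ceil(v_norm * n_codes), n_codes).astype(int)
    return v_norm, codes
```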

C. Forming prototype feature vectors and knowledge-base for gesture matching

The set of feature values extracted from the template trajectory for a particular gesture, as described above, forms the prototype feature vector that gives the mathematical description for that class of gesture. The dimensionality of this feature vector is 10, where the ten feature values are

• Trajectory length l,

• Location feature L_{avg},

• Starting hand orientation \theta_s,

• Ending hand orientation \theta_e,

• Number of significant changes in hand orientation N_\theta,

• Average velocity v_{avg},

• Maximum velocity v_{max},

• Minimum velocity v_{min},

• Number of maxima in the velocity profile N_{v,max}, and

• Number of minima in the velocity profile N_{v,min}.

Finally, all prototype feature vectors, one per gesture class, together form the knowledge-base which is used for gesture matching during classification.
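For illustration only, the sketch below assembles the ten-dimensional feature vector and matches it against the knowledge-base with a nearest-prototype rule. The paper itself normalizes and DTW-aligns the trajectories [11] and does not commit to this particular matcher, so the Euclidean nearest-neighbour rule and the function names are our assumptions:

```python
import numpy as np

def build_feature_vector(length, l_avg, theta_s, theta_e, n_theta,
                         v_avg, v_max, v_min, n_v_max, n_v_min):
    """Assemble the ten-dimensional feature vector listed above."""
    return np.array([length, l_avg, theta_s, theta_e, n_theta,
                     v_avg, v_max, v_min, n_v_max, n_v_min], dtype=float)

def classify(feature_vec, knowledge_base):
    """Nearest-prototype matching against the knowledge-base, a dict mapping
    gesture class names to prototype feature vectors."""
    return min(knowledge_base,
               key=lambda cls: np.linalg.norm(feature_vec - knowledge_base[cls]))
```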

IV. EXPERIMENTAL RESULTS

We have tested altogether ten different hand trajectories in view of special applications like robot control and gesture-based window menu activation on a human-computer interaction (HCI) platform. They are shown in Fig. 8. The extracted trajectories for four of the gesture sequences are shown in Fig. 9: the first row shows the extracted trajectories and the second row shows the smoothed trajectories. Trajectories extracted from the key VOPs of the gesture video sequences are shown in Fig. 10. Our proposed trajectory estimator gives about 99% accuracy in finding the actual trajectory. The accuracy criterion is defined in terms of shape similarity between the extracted gesture trajectory and the corresponding template/prototype trajectory. In the next step, all the derived trajectories are normalized and aligned in time using the Dynamic Time Warping (DTW) technique as described in [11]. Then, from these normalized trajectories, we extract the trajectory features following the method described in this paper. Subsequently, we use these features for trajectory recognition. In our experiment, we achieved 95% accuracy.

Fig. 8. Hand gestures showing different motion trajectories.

The high recognition rate in identifying the different forms of the trajectories may be attributed to the proposed trajectory feature extraction process. The trajectory features used here give distinct measures from one trajectory to another, making them easily distinguishable. As an example, consider the normalized location feature for the circle and the square, as given in Table 1, where the number of key points for both trajectories is predefined. For the circular trajectory, the locations of the predefined key points from the center of gravity are almost the same for all the trajectory key points, whereas for the square-representing trajectory the location feature is not constant along the whole periphery. As a second example, Table 2 illustrates the variation in hand acceleration during the transition from one gesture to another. The connected gesture sequences in our experiment are the transitions from the ‘1’ to the ‘2’ and from the ‘5’ to the ‘1’ representing hand gesture trajectories. It is seen that during co-articulation the hand acceleration is significant compared to the acceleration during gesticulation.


Fig. 9. Estimated and smoothed trajectories.


Fig. 10. Key VOP based trajectory estimation.

V. CONCLUSIONS AND DISCUSSION

The advantage of the VOP-based method for segmentation of the hand image is that no extra computation for rotation and scaling of the object is required, since the shape change is represented explicitly by a sequence of two-dimensional models, one corresponding to each image frame. Moreover, the trajectory estimated from the corresponding VOPs bears the spatial information in dynamic gestures, which is required in the gesture classification stage.


Table 1: Normalized location feature (an equal number of key points is taken for both gestures).

Key point    1      2      3      4      5      6      7      8
Circle       0.320  0.300  0.330  0.322  0.321  0.321  0.301  0.315
Square       0.521  0.322  0.541  0.300  0.550  0.310  0.540  0.330

Table 2: Acceleration feature for connected gesture sequences (average value is taken over each gesture phase).

Gesture sequence    1st gesture phase    Co-articulation phase    2nd gesture phase
1 – 2               0.342                0.783                    0.338
5 – 1               0.257                0.845                    0.276

During trajectory-guided recognition, the extracted features serve the purpose of efficient classification of gestures. Also, the proposed acceleration feature works well only when the spatial end position of the preceding gesture is different from the start position of the next gesture in the connected gesture sequence.

REFERENCES

[1] D.L. Quam, “Gesture recognition with a data glove,” Proc. IEEE Conf. National Aerospace and Electronics, vol. 2, pp. 755–760, 1990.

[2] D.J. Sturman and D. Zeltzer, “A survey of glove-based input,” IEEE Computer Graphics and Applications, vol. 14, pp. 30–39, 1994.

[3] M.H. Yang, N. Ahuja, and M. Tabb, “Extraction of 2D motion trajectories and its application to hand gesture recognition,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 8, pp. 1061–1074, 2002.

[4] M.K. Bhuyan, D. Ghosh, and P.K. Bora, “Estimation of 2D motion trajectories from video object planes and its application in hand gesture recognition,” Lecture Notes in Computer Science – PReMI’05, Springer-Verlag, LNCS 3776, pp. 509–514, 2005.

[5] D.P. Huttenlocher, J.J. Noh, and W.J. Rucklidge, “Tracking non-rigid objects in complex scenes,” Proc. Fourth International Conference on Computer Vision, pp. 93–101, 1993.

[6] M.K. Bhuyan, D. Ghosh, and P.K. Bora, “Automatic video object plane generation for recognition of hand gestures,” Proc. International Conf. Systemics, Cybernetics and Informatics (ICSCI), pp. 147–152, 2005.

[7] B.S. Manjunath, P. Salembier, and T. Sikora, eds., Introduction to MPEG-7: Multimedia Content Description Interface, John Wiley and Sons, Ltd., 2002, pp. 273–276.

[8] M.K. Bhuyan, D. Ghosh, and P.K. Bora, “Finite state representation of hand gestures using key video object plane,” Proc. IEEE Region 10 Asia-Pacific Conf. (TENCON), pp. 21–24, 2004.

[9] M.K. Bhuyan, D. Ghosh, and P.K. Bora, “Key video object plane selection by MPEG-7 visual shape descriptor for summarization and recognition of hand gestures,” Proc. Fourth Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP), pp. 638–643, 2004.

[10] A. Shamaie, W. Hai, and A. Sutherland, “Hand gesture recognition for HCI,” ERCIM News (online edition), http://www.ercim.org/publication/Ercim News, no. 46, 2001.

[11] M.K. Bhuyan, D. Ghosh, and P.K. Bora, “Trajectory guided recognition of hand gestures for human computer interface,” Proc. 2nd Indian International Conf. Artificial Intelligence (IICAI), pp. 312–327, 2005.
