a multilayer hidden markov models-based method for human
Hindawi Publishing Corporation
Mathematical Problems in Engineering
Volume 2013, Article ID 384865, 12 pages
http://dx.doi.org/10.1155/2013/384865
Research Article
A Multilayer Hidden Markov Models-Based Method for Human-Robot Interaction
Chongben Tao and Guodong Liu
Key Laboratory of Light Industry Process Advanced Control, School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China
Correspondence should be addressed to Chongben Tao; tom1tao@163.com
Received 31 May 2013; Revised 6 August 2013; Accepted 8 August 2013
Academic Editor: Vishal Bhatnaga
Copyright © 2013 C. Tao and G. Liu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
To achieve Human-Robot Interaction (HRI) using gestures, a continuous gesture recognition approach based on Multilayer Hidden Markov Models (MHMMs) is proposed. It consists of two parts: a gesture spotting and segmentation module and a continuous gesture recognition module. First, a Kinect sensor is used to capture 3D acceleration and 3D angular velocity data of hand gestures. Then Feed-forward Neural Networks (FNNs) and a threshold criterion are used for gesture spotting and segmentation, respectively. Afterwards, the segmented gesture signals are preprocessed by a sliding window and vector-symbolized by a K-means clustering method. Finally, the symbolized data are sent into Lower Hidden Markov Models (LHMMs) to identify individual gestures, and a Bayesian filter with sequential constraints among gestures in Upper Hidden Markov Models (UHMMs) is used to correct recognition errors created in the LHMMs. Five predefined gestures are used to interact with a Kinect mobile robot in experiments. The experimental results show that the proposed method is effective and accurate and also has favorable real-time performance.
1. Introduction
Human-Robot Interaction (HRI) is one of the most active research directions in robotics, and activity recognition is one of its most important components. As gesture recognition is a branch of activity recognition, methods used in activity recognition are also suitable for gesture recognition. Generally, depending on the sensor type, gesture recognition is divided into two modes: motion sensor-based [1–3] and vision-based [4, 5]. Each has its own characteristics. Motion sensors capture the motion data of a body easily, but most of them need to be fixed on the human body, which may make users feel uncomfortable. Vision-based gesture recognition also has disadvantages. For instance, the illumination conditions and background scenery of an environment may make recognition difficult. Additionally, it is difficult to obtain orientation information from a single two-dimensional (2D) image frame.
According to the gesture recognition methods used in HRI, approaches can be divided into four categories: generative methods, discriminative methods, heuristic analysis methods, and combinations of these methods [6]. A generative method uses a generative model for randomly generating observed data, typically given some hidden parameters. It specifies a joint probability distribution over observation and label sequences [7] and attempts to build a complete description of the input or data space, usually with a probabilistic model. Hidden Markov Models (HMMs), a probabilistic model with a specific structure, are among the most popular generative methods. Once the model is established, it can be learned from data and used to analyze the data. A discriminative method analyzes features extracted from sensor data or segments without considering sequential connections in the data. Common discriminative methods include Neural Networks (NNs) [8], Conditional Random Fields (CRFs) [9], and Support Vector Machines (SVMs) [10]. A heuristic analysis method is based on direct characteristic analysis and description of the sensor data [11]. However, each individual has different characteristics, so it is difficult for this method to find a consistent way of observation. Since generative and discriminative methods are both machine learning algorithms, their parameters can be trained using data from different individuals. However, one drawback of the generative approach is that enough data must be available to learn the complete probabilistic representations that are required. Moreover, both types of methods have high computational complexity. Currently, researchers are trying to integrate the advantages of different methods to address complex activity recognition problems [12].
Figure 1: (a) Xtion Pro Live sensor, (b) Kinect sensor, and (c) internal structure of the Kinect sensor (projector, RGB sensor, and infrared detector).
In recent years, a new type of RGB-Depth (RGB-D) sensor has appeared, which is able not only to extract RGB features and depth information with position and direction but also to obtain three-dimensional (3D) acceleration and 3D angular velocity information of the body's motion. The Asus Xtion Pro Live sensor and the Microsoft Kinect sensor are two well-known RGB-D sensors [13], as shown in Figures 1(a) and 1(b), respectively. The internal structure of a Kinect sensor is shown in Figure 1(c). Since the advent of the Kinect sensor in November 2010, researchers have used it extensively in the field of HRI. Giraldo et al. [14] proposed a machine learning approach to identify predefined gestures from depth images obtained from a Kinect sensor. They also adopted a kernel-based learning method to discover the relationship among gestures and samples, which can be used to describe gestures correctly. Ren et al. [15] detected the shape of a hand from color and depth images obtained from a Kinect sensor and then proposed a shape distance metric called Finger-Earth Mover's Distance for dissimilarity measurement. Finally, they recognized predefined gestures by template matching. Bauer et al. [16] proposed a Human-Robot Interaction system in which a person first makes a predefined gesture and a robot then tries to understand the action and respond accordingly in real time.
To overcome the shortcomings of heavy computation and poor real-time performance, a continuous gesture recognition method is proposed for HRI. The rest of the paper is organized as follows. Section 2 describes the Feed-forward Neural Networks (FNNs) for gesture spotting and the proposed Multilayer Hidden Markov Models (MHMMs) for continuous gesture recognition and then gives the proposed continuous hand gesture recognition procedure, which contains hand spotting and gesture recognition based on FNNs and MHMMs. The experimental results are shown in Section 3. Finally, Section 4 gives the summary, followed by the acknowledgments.
2. MHMMs-Based Continuous Gesture Recognition
In this paper, the proposed gesture recognition algorithm consists of two parts: the gesture spotting and segmentation module and the continuous gesture recognition module. An important problem in gesture recognition is how to distinguish gestures from nongestures, which is called the gesture spotting problem, namely, how to detect the start point and the end point of a gesture [17]. First, Feed-forward Neural Networks (FNNs) are used to detect gestures in an action signal, and then a threshold criterion is used to segment gestures from the signal. Finally, MHMMs are used to recognize the segmented continuous gestures.
FNNs are first used to determine whether a particular action is a gesture. When the FNNs determine that an action is a gesture, the output is 1; otherwise the output is 0. Features can be combined well in FNNs for the classification of gestures and nongestures. Additionally, through training, the weights and deviations of the FNNs can be optimized. Two counters are used to detect the start point and the end point of a gesture in accordance with a preset threshold value. When the end point of a gesture is detected, the spotting and segmentation module triggers the recognition module. As MHMMs are a probabilistic model with high computational complexity, the FNNs-based segmentation module is used as a switch to control the data flow; thus computation time can be saved and the real-time performance of the whole algorithm can be improved. Sequential constraints among gestures are modeled by MHMMs in the recognition module. A forward propagation procedure in the Bayesian filter is used to produce a posterior gesture decision and update the recognition result. The flow chart of the MHMMs-based continuous gesture recognition method is shown in Figure 2.
2.1. Gesture Spotting and Segment. In this paper, Feed-forward Neural Networks (FNNs) are designed to discover gesture data from signals [18]. The structure of a three-layer FNNs module is shown in Figure 3. The number of input nodes is $n$; the input variable of the input layer is $Y = (y_1, y_2, \ldots, y_n)^T$, which indicates an original input signal of actions; the output vector of the hidden layer is $a^2_i = (a^2_1, a^2_2, \ldots, a^2_m)^T$; the output vector of the output layer is $a^3_i = (a^3_1, a^3_2, \ldots, a^3_m)^T$, which is the gesture spotting result; the weight matrix from the input layer to the hidden layer is $W^1_i = (W^1_1, W^1_2, \ldots, W^1_m)$; the weight matrix from the hidden layer to the output layer is $W^2_i = (W^2_1, W^2_2, \ldots, W^2_m)$; and the Sigmoid function $f(x) = (1 + e^{-x})^{-1}$ is used as the excitation function. A 3D acceleration $\alpha_i = [\alpha_x, \alpha_y, \alpha_z]^T$ is one kind of data extracted by the RGB-D sensor, which is used to train the FNNs. Let $a^3_i$ be the output of the FNNs:
$$a^3_i = \operatorname{hardlim}\left(f^2\left(W^2 f^1\left(W^1 \mu_i + b^1\right) + b^2\right) - 0.5\right), \tag{1}$$
where $W^1$, $W^2$, $b^1$, and $b^2$ are parameters of the FNNs.
Both network functions $f^1$ and $f^2$ are log-sigmoid functions, and hardlim is a hard limiting function. Therefore, the parameters of the FNNs can be trained by a back-propagation method. In the output layer, $W^3 = 1$ and $b^3 = 0$ are set in order to produce a discrete output. The output of the FNNs is a binary number (1 or 0), which represents a gesture or a nongesture, respectively. The first and second layers constitute a two-layer feed-forward network whose parameters can be optimized through training. The weights and deviations in the output layer are then fixed in order to produce a discrete output.
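The forward pass in equation (1) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the hidden layer size of 3 follows the dimensions shown in Figure 3, and the weights below are made-up values standing in for parameters that would come from back-propagation training.

```python
import math

def logsig(x):
    """Log-sigmoid activation f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def hardlim(x):
    """Hard-limit function: 1 if x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0

def fnn_spot(y, W1, b1, W2, b2):
    """Return 1 (gesture) or 0 (nongesture) for one input vector y."""
    # First layer: a1 = f1(W1 y + b1)
    a1 = [logsig(sum(w * v for w, v in zip(row, y)) + b)
          for row, b in zip(W1, b1)]
    # Second layer: a2 = f2(W2 a1 + b2), a single scalar
    a2 = logsig(sum(w * v for w, v in zip(W2, a1)) + b2)
    # Third layer: fixed weight 1 and bias 0, thresholded at 0.5
    return hardlim(a2 - 0.5)

# Hypothetical parameters (not taken from the paper)
W1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
W2 = [2.0, 2.0, 2.0]
b2 = -4.0

print(fnn_spot([3.0, 3.0, 3.0], W1, b1, W2, b2))
print(fnn_spot([-3.0, -3.0, -3.0], W1, b1, W2, b2))
```

Only the first two layers carry trainable parameters; the last step is a fixed threshold, which is why the output is always exactly 0 or 1.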
Moreover, through training, the weights and deviations of the FNNs can be optimized. Two digital counters continuously record the output of the FNNs. When the gesture counter exceeds a preset threshold value, the start point of a gesture is detected; when the nongesture counter exceeds the threshold, the end point is detected. In this way, isolated misclassifications by the FNNs do not produce false boundaries. When the end point of a gesture is detected, the spotting and segmentation module triggers the recognition module.
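The two-counter criterion can be illustrated with the following sketch. It is an assumed reading of the description and of the flow chart in Figure 2, not the authors' code: the function name, the reset-on-opposite-label behavior, and the default threshold are all hypothetical choices.

```python
def segment_gestures(labels, threshold=3):
    """Return (start, end) index pairs from a 0/1 FNNs output sequence.

    A gesture starts once more than `threshold` consecutive 1s are seen
    and ends once more than `threshold` consecutive 0s are seen.
    """
    segments = []
    gesture_count = 0       # consecutive frames labelled 1
    nongesture_count = 0    # consecutive frames labelled 0
    start = None
    for i, label in enumerate(labels):
        if label == 1:
            gesture_count += 1
            nongesture_count = 0
            if start is None and gesture_count > threshold:
                # The gesture began where the current run of 1s began
                start = i - gesture_count + 1
        else:
            nongesture_count += 1
            gesture_count = 0
            if start is not None and nongesture_count > threshold:
                # The gesture ended just before the current run of 0s
                segments.append((start, i - nongesture_count))
                start = None
    return segments

# An isolated 1 (index 2) and an isolated 0 (index 11) are both ignored
labels = [0, 0, 1, 0, 0] + [1] * 6 + [0, 1, 1, 1] + [0] * 5
print(segment_gestures(labels, threshold=2))
```

A segment is emitted only when its end point has been detected, which matches the idea that the end point triggers the recognition module.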
2.2. Multilayer Hidden Markov Models. In daily life, people often use gestures to command pets. These gestures follow certain reasonable patterns, which reflect sequential restrictions among continuous gestures. These restrictions can therefore be used to improve the accuracy of gesture recognition. Inspired by this, this paper proposes a continuous gesture recognition algorithm based on MHMMs, a statistical model derived from HMMs. HMMs are a well-known statistical model for continuous data recognition and have been widely used in speech recognition, handwriting recognition, and pattern recognition [19]. An HMM is characterized by a set of parameters $\lambda = (A, B, \pi)$, where $A$, $B$, and $\pi$ are the state transition probability distribution, the probability distribution of observation symbols in each state, and the initial state distribution, respectively. In this paper, a forward-backward algorithm is used to estimate the likelihood $P(O \mid \lambda)$ of an observation sequence given a specific HMM [20]. For a given observation sequence $O$ in the test mode, a Viterbi algorithm is used to find the best state sequence $Q$ [21]. An Expectation Maximization (EM) method is used to train the parameters of the HMMs [22].
The proposed MHMMs are a generalization of a segmented model in which each segment has subsegments. MHMMs are divided into two layers: LHMMs are used to identify individual gestures in the continuous gesture signal, and a Bayesian filter with sequential constraints in UHMMs is used to correct recognition errors created in LHMMs. Figure 4 shows the basic idea of MHMMs. A time sequence is layered into several segments, where $S^1_i$ represents the state sequence of UHMMs and $S^2_i$ represents the state sequence of LHMMs. $S^2_i$ is the sub-HMMs of state sequence $S^1_i$. $O^2_i$ means the decision of LHMMs.
2.2.1. Data Preprocessing. Before MHMMs are used for training and recognition of the gesture data segmented by FNNs, a preprocessing step must be completed, which includes removing high-frequency noise and finding the stroke duration of a gesture. Each raw data point obtained by a Kinect sensor is a 6-component vector $U$ composed of a 3D acceleration $[\alpha_x, \alpha_y, \alpha_z]^T$ and a 3D angular velocity $[\omega_x, \omega_y, \omega_z]^T$, where
$$U = [\alpha_x, \alpha_y, \alpha_z, \omega_x, \omega_y, \omega_z]^T. \tag{2}$$
The original data received by the robot computer contain high-frequency noise, which may affect gesture recognition. A low-pass filter with a cutoff frequency of 5 Hz is therefore used to remove the noise and smooth the data. A sliding window with a length of 100 ms is then used to calculate the average and remove the DC components from the 3D acceleration in the time domain, which generates a vector $w = [d_x, d_y, d_z]^T$. Additionally, a Fast Fourier Transform (FFT) is applied to the deviation vector $w$ in the frequency domain, and the maximum frequency among $d_x$, $d_y$, and $d_z$ is used as the fundamental frequency of a gesture. Finally, the data point $V$, composed of a 3D acceleration vector, a 3D angular velocity vector, and a 3D deviation vector of acceleration, is fed into MHMMs for gesture recognition, where
$$V = [\alpha_x, \alpha_y, \alpha_z, \omega_x, \omega_y, \omega_z, d_x, d_y, d_z]^T. \tag{3}$$
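The preprocessing chain described above (low-pass filtering, sliding-window mean removal, and a frequency estimate) can be sketched for a single axis as follows. This is an illustration under stated assumptions: the first-order IIR filter is a stand-in because the paper does not name its filter type, the 150 Hz sampling rate is inferred from the "1 second (150 data points)" window mentioned later, a naive DFT replaces the FFT for brevity, and all function names are hypothetical.

```python
import cmath
import math

def low_pass(signal, cutoff_hz, sample_hz):
    """First-order IIR low-pass filter (assumed filter type)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_hz
    alpha = dt / (rc + dt)
    out, prev = [], signal[0]
    for x in signal:
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out

def remove_dc(signal, window):
    """Subtract the mean of a trailing window from each sample."""
    out = []
    for i, x in enumerate(signal):
        chunk = signal[max(0, i - window + 1): i + 1]
        out.append(x - sum(chunk) / len(chunk))
    return out

def dominant_frequency(signal, sample_hz):
    """Frequency (Hz) of the largest non-DC bin of a naive DFT."""
    n = len(signal)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * sample_hz / n

SAMPLE_HZ = 150.0   # assumed sampling rate
# Synthetic axis: a 3 Hz oscillation on top of a constant offset
raw = [2.0 + math.sin(2.0 * math.pi * 3.0 * t / SAMPLE_HZ) for t in range(150)]
smooth = low_pass(raw, cutoff_hz=5.0, sample_hz=SAMPLE_HZ)
d = remove_dc(smooth, window=15)          # 100 ms at 150 Hz
print(dominant_frequency(d, SAMPLE_HZ))
```

In a full pipeline the same three steps would be applied to each acceleration axis, and the resulting $d_x$, $d_y$, $d_z$ would be appended to the six raw components to form $V$.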
Figure 2: Flow chart of MHMMs-based continuous gesture recognition. In the spotting and segment module, the FNNs classify the raw motion data as gesture or nongesture, and a gesture time counter and a nongesture time counter are compared with a threshold to find the start point and the end point of a gesture segment. In the recognition module, sliding windows, data preprocessing, K-means clustering, likelihood estimation based on each HMM, maximum likelihood estimation, and a Bayesian filter with constraints produce the recognition result.
Figure 3: The structure of a three-layer FNNs, with $a^1 = f^1(W^1 y + b^1)$, $a^2 = f^2(W^2 f^1(W^1 y + b^1) + b^2)$, and $a^3 = \operatorname{hardlim}(a^2 - 0.5)$.
Figure 4: The architecture of the proposed MHMMs.
Then a K-means clustering method is used to train centroids in order to quantize vectors into observable symbols. The process of symbolization converts feature vectors into a finite number of symbols, which can be used in the discrete HMMs. Namely, the K-means clustering method is applied to the vector $V$ to get the partition value for each vector and also a set of centroids for clustering the data into observable symbols in the recognition phase. The K-means clustering method clusters $n$ objects into $k$ partitions, where $k < n$, based on the properties of the objects.
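The symbolization step can be sketched with a plain implementation of Lloyd's K-means algorithm. The code below is illustrative only: the function names are hypothetical, the initial centroids are simply the first $k$ vectors (the paper does not state its initialization), and 2D toy vectors are used in place of the 9D vector $V$.

```python
def nearest(centroids, v):
    """Index of the centroid closest to v (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(c, v)) for c in centroids]
    return dists.index(min(dists))

def kmeans(vectors, k, iterations=20):
    """Lloyd's algorithm; initial centroids are the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(centroids, v)].append(v)
        for j, group in enumerate(groups):
            if group:   # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids

def symbolize(vectors, centroids):
    """Map each feature vector to the index of its nearest centroid."""
    return [nearest(centroids, v) for v in vectors]

# Toy feature vectors forming two obvious clusters
data = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
centroids = kmeans(data, k=2)
print(symbolize(data, centroids))
```

The centroids are learned once in the training phase; in the recognition phase only `symbolize` is needed, so the cost per data point is a single nearest-centroid lookup.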
As shown in Figure 5, a sliding window of 1 second (150 data points) moves with a step of 30 data points, and the likelihood of each set of HMMs parameters is estimated for every window position. Each sliding window generates a likelihood value for each model. For a given window, the model that maximizes the likelihood over the other HMMs is chosen as the recognized gesture type, which is the output decision of the sliding window.
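The windowed maximum-likelihood decision can be sketched as below. The window size and step follow the text; everything else is an assumption for illustration: the function names are hypothetical, and each gesture model is represented by an arbitrary scoring function where a real system would evaluate the likelihood of the window under the corresponding HMM.

```python
def sliding_windows(symbols, size=150, step=30):
    """Yield (start_index, window) pairs over a symbol sequence."""
    for start in range(0, len(symbols) - size + 1, step):
        yield start, symbols[start:start + size]

def classify_windows(symbols, scorers, size=150, step=30):
    """For each window, pick the gesture whose scorer is highest.

    `scorers` maps a gesture name to a function window -> score,
    for example the log-likelihood under that gesture's HMM.
    """
    decisions = []
    for start, window in sliding_windows(symbols, size, step):
        best = max(scorers, key=lambda name: scorers[name](window))
        decisions.append((start, best))
    return decisions

# Hypothetical stand-in scorers: count how often a symbol occurs
toy_scorers = {
    "gesture_a": lambda w: w.count(0),
    "gesture_b": lambda w: w.count(1),
}
stream = [0] * 4 + [1] * 4
print(classify_windows(stream, toy_scorers, size=4, step=4))
```

Because consecutive windows overlap when the step is smaller than the window size, one gesture usually yields several window decisions, which can then be combined into a single decision for the segment.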
2.2.2. LHMMs-Based Individual Gestures Recognition
(1) Training Phase. The training phase is divided into four steps: finding the stroke duration of a gesture, quantizing the vectors into observable symbols, setting up initial HMMs parameters, and iterating for Expectation Maximization.
Step 1 (find the stroke duration of a gesture). FFT is used to find the stroke duration of the gesture for HMMs.
Step 2 (quantize the vectors into observable symbols). After preprocessing, the raw data are used as the observation sequence $O = O_1, O_2, \ldots, O_T$ for the HMMs with parameters $\lambda = (A, B, \pi)$. Given the parameters of the model, the conditional probability of the observation sequence can be calculated. Namely, the problem is to assess the probability $P(O \mid \lambda)$ of an observation sequence given a specific HMM; a forward-backward algorithm is used for this task as follows [20]. Define the forward variables
$$\alpha_t(i) = P(O_1, O_2, \ldots, O_t, q_t = S_i \mid \lambda), \tag{4}$$
that is, given a model, the probability that the observation sequence up to time $t$ is $O_1, O_2, \ldots, O_t$ and that the state at time $t$ is $S_i$. The forward variables can be calculated recursively by the following formula:
$$\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\right] b_j(O_{t+1}), \tag{5}$$
where $1 \le t \le T - 1$ and $1 \le j \le N$.
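The forward recursion in equations (4) and (5) can be written compactly in Python. This sketch uses the standard initialization $\alpha_1(i) = \pi_i b_i(O_1)$ and termination $P(O \mid \lambda) = \sum_i \alpha_T(i)$, which the text does not spell out; the two-state model at the bottom is a made-up example, not a model from the paper.

```python
def forward_likelihood(obs, A, B, pi):
    """Return P(O | lambda) for a discrete HMM via the forward recursion.

    obs: list of symbol indices; A[i][j]: transition probability;
    B[j][o]: probability of symbol o in state j; pi[i]: initial probability.
    """
    n = len(pi)
    # Initialization (standard, assumed): alpha_1(i) = pi_i * b_i(O_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction, equation (5)
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: sum over the final states
    return sum(alpha)

# Hypothetical two-state, two-symbol model
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
print(forward_likelihood([0, 1], A, B, pi))
```

For long windows the unscaled product can become very small, so practical implementations usually rescale $\alpha_t$ at each step or work with log probabilities.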
Figure 5: Sliding windows for preprocessing, applied to the gesture signal ($a_x$, $a_y$, $a_z$) segmented by FNNs.
Define the backward variables
$$\beta_t(i) = P(O_{t+1}, O_{t+2}, \ldots, O_T \mid q_t = S_i, \lambda), \tag{6}$$
that is, given the model parameters and the state $S_i$ at time $t$, the probability of the output observation sequence $O_{t+1}, O_{t+2}, \ldots, O_T$ from time $t + 1$ to the end of the sequence. Similarly, a recursive method can be used to calculate $\beta_t(i)$ as follows:
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j), \tag{7}$$
where $t = T - 1, T - 2, \ldots, 1$ and $1 \le i \le N$.
Then a K-means clustering method is applied to the 9D vector $V$ to get the partition value for each vector and also a set of centroids for clustering the data into observable symbols in the recognition phase.
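Returning to the backward recursion in equations (6) and (7), a matching sketch is given below. It assumes the standard initialization $\beta_T(i) = 1$ and recovers $P(O \mid \lambda) = \sum_i \pi_i b_i(O_1) \beta_1(i)$, neither of which is stated explicitly in the text; the model values are hypothetical.

```python
def backward_likelihood(obs, A, B, pi):
    """Return P(O | lambda) for a discrete HMM via the backward recursion."""
    n = len(pi)
    # Initialization (standard, assumed): beta_T(i) = 1
    beta = [1.0] * n
    # Induction, equation (7), running from t = T - 1 down to t = 1
    for o in reversed(obs[1:]):
        beta = [sum(A[i][j] * B[j][o] * beta[j] for j in range(n))
                for i in range(n)]
    # Termination: combine beta_1 with the initial distribution
    return sum(pi[i] * B[i][obs[0]] * beta[i] for i in range(n))

# Hypothetical two-state, two-symbol model
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
print(backward_likelihood([0, 1], A, B, pi))
```

The forward and backward passes must give the same likelihood for the same model and sequence, which makes a convenient consistency check when implementing either one.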
Step 3 (set up initial HMMs parameters). Set up the number of states in the model. The initial values of $\lambda = (A, B, \pi)$ for the observable symbols $O = O_1, O_2, \ldots, O_T$ in each state, which are to be iterated, should meet the stochastic constraints of HMMs parameters (each probability distribution sums to one). Then a Viterbi algorithm is used to find the single best state sequence $Q = q_1, q_2, \ldots, q_T$ [21].
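The Viterbi search mentioned in Step 3 can be sketched as follows. This is the textbook dynamic-programming form, shown under the assumption that it corresponds to the algorithm cited as [21]; the two-state model is a made-up example.

```python
def viterbi(obs, A, B, pi):
    """Return the single most likely state sequence for a discrete HMM."""
    n = len(pi)
    # delta[i]: probability of the best path ending in state i
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backptr = []
    for o in obs[1:]:
        prev = delta
        step, delta = [], []
        for j in range(n):
            # Best predecessor of state j at this time step
            best_i = max(range(n), key=lambda i: prev[i] * A[i][j])
            step.append(best_i)
            delta.append(prev[best_i] * A[best_i][j] * B[j][o])
        backptr.append(step)
    # Backtrack from the most likely final state
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for step in reversed(backptr):
        state = step[state]
        path.append(state)
    path.reverse()
    return path

# Hypothetical two-state, two-symbol model
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]
print(viterbi([0, 0, 1, 1], A, B, pi))
```

Unlike the forward algorithm, which sums over all paths, Viterbi keeps only the best path into each state, so it returns a state sequence rather than a total likelihood.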
Step 4 (expectation maximization iteration). A Baum-Welch algorithm is used to calculate an auxiliary function and maximize the likelihood value of $\bar{\lambda}$ [22]. $N$ iterations are performed until the likelihood approaches a steady value. The expectation value $Q(\lambda, \bar{\lambda})$ and the maximized likelihood $\bar{\lambda}$ are calculated as follows:
$$Q(\lambda, \bar{\lambda}) = \sum_{Q} P(Q \mid O, \lambda) \log\left[P(O, Q \mid \bar{\lambda})\right], \qquad \max_{\bar{\lambda}} \left[Q(\lambda, \bar{\lambda})\right] \Longrightarrow P(O \mid \bar{\lambda}) > P(O \mid \lambda). \tag{8}$$
(2) Recognition Phase. During the recognition phase, after training a group of centroids for K-means clustering, several groups of HMMs are formed. Then we calculate the likelihood of each group of HMMs parameters. Given the observable sequence $O$, the Viterbi algorithm is used to find the single best state sequence $Q$, and the HMMs with the greatest likelihood are identified as the most likely gesture type.
Figure 6: (a) Transitions among UHMMs considering sequential constraints of context information, with states $S_1$ to $S_5$ and the gestures Move forward, Move backward, Turn left, Turn right, and Stop; (b) update of the gesture recognition result by UHMMs, with states $q_0, q_1, \ldots, q_t, q_{t+1}$ and observations $O_0, O_1, \ldots, O_t, O_{t+1}$.
2.2.3. UHMMs-Based Continuous Gesture Update. In order to improve the accuracy of decision making, we consider the sequential conditions of the "Context" in UHMMs, where a Bayesian filter is used to update the results from LHMMs. Five gestures are predefined: "Move forward," "Move backward," "Turn left," "Turn right," and "Stop." The "Context" is used as the sequential constraints among different types of gestures. For example, the event that someone continuously sends the same command to a robot has a small probability, and someone who has sent a "Move forward" command should send a "Stop" command before sending a "Move backward" command. Figure 6(a) shows the transitions among UHMMs, which are a discrete first-order HMM with five states and five observation symbols. UHMMs can be described as a sequence of gestures that at any time is in one of a set of $N$ ($N = 5$) distinct states $S_1, S_2, \ldots, S_5$. The system undergoes a change of state according to a set of probabilities associated with the state. Each arc represents a probability of transition between two adjacent states. The moments associated with changes of state are indicated as $k = 1, 2, \ldots, N$, and the $k$th actual state is $q_k$. The probability $a_{ij}$ described below relates the current state to the previous state:
$$a_{ij} = P[q_k = S_j \mid q_{k-1} = S_i]. \tag{9}$$
The initial state distribution represents the probability distribution of the first state, which is defined as
$$\pi = P[q_1 = S_i \ (i = 1, 2, \ldots, N)]. \tag{10}$$
Another element of UHMMs is the observation symbol probability distribution in state $S_j$:
$$b_j(k) = P[O_k \mid q_t = S_j]. \tag{11}$$
$b_j$ is the probability of each observation symbol being identified in state $S_j$, where $O_k$ represents the decisions made by LHMMs.
Figure 6(b) shows a sequence in UHMMs, where the state $q_t$ represents the $t$th gesture and $O_t$ is the majority decision result given by LHMMs. A forward propagation method [23] is used to update the joint probability of the observations gathered until time $t$. In this paper, the model is defined as follows:
$P(O_{t+1} \mid q_{t+1} = S_j)$ represents $b_j(O_{t+1})$, where the state at time $t + 1$ is $S_j$ and the observation is $O_{t+1}$;
$P(q_{t+1} = S_j \mid q_t = S_i)$ is the transition probability $a_{ij}$ in matrix $A$.
The forward variable $\alpha_t(i)$ is defined as the probability of the observation sequence $O_1, O_2, \ldots, O_t$ and state $S_i$ at time $t$, given the model $\lambda$:
$$\alpha_t(i) = P(O_1, O_2, \ldots, O_t, q_t = S_i \mid \lambda), \tag{12}$$
where $1 \le i \le N$.
According to the network structure shown in Figure 6(b), we can conclude
$$\alpha_{t+1}(j) = P(O_{t+1} \mid q_{t+1} = S_j) \times \sum_{S_i} P(q_{t+1} = S_j \mid q_t = S_i)\, \alpha_t(i), \tag{13}$$
with the initial condition of a uniform distribution
$$\alpha_0(i) = P(q_0 = S_i) = \pi_i. \tag{14}$$
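The update in equations (13) and (14) can be sketched as a small Bayesian filter. The example is deliberately reduced to three gestures for brevity, and every number in it is hypothetical: the transition matrix encodes the idea that "Stop" usually separates motion commands, and the observation matrix pretends that the LHMMs often confuse "Stop" with "Move backward." The per-step normalization is an added assumption that turns $\alpha$ into a posterior without changing which state is most probable.

```python
def bayes_update(alpha, obs, A, B):
    """One step of equation (13), followed by normalization (assumed)."""
    n = len(alpha)
    new = [B[j][obs] * sum(A[i][j] * alpha[i] for i in range(n))
           for j in range(n)]
    total = sum(new)
    return [v / total for v in new]

def filter_decisions(decisions, A, B, pi):
    """Correct a sequence of LHMMs decisions with sequential constraints."""
    alpha = list(pi)                      # equation (14)
    corrected = []
    for obs in decisions:
        alpha = bayes_update(alpha, obs, A, B)
        corrected.append(max(range(len(alpha)), key=lambda i: alpha[i]))
    return corrected

# Hypothetical model: 0 = Move forward, 1 = Move backward, 2 = Stop
A_demo = [[0.10, 0.05, 0.85],
          [0.05, 0.10, 0.85],
          [0.45, 0.45, 0.10]]
B_demo = [[0.6, 0.3, 0.1],     # B[j][o]: true gesture j decided as o
          [0.3, 0.6, 0.1],
          [0.1, 0.3, 0.6]]
pi_demo = [1.0 / 3.0] * 3

# The LHMMs report "forward" then "backward"; the filter prefers "Stop"
print(filter_decisions([0, 1], A_demo, B_demo, pi_demo))
```

In this toy run the second decision is changed because a direct jump from "Move forward" to "Move backward" is unlikely under the assumed transition matrix, while "Stop" being misread as "Move backward" is comparatively likely.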
For the UHMMs, the parameters $\lambda = (A, B, \pi)$ are obtained by observing the interaction between the experimenter and the robot, so these parameters vary from person to person. The transition matrix $A$ is estimated from the statistics of actually observed gestures. The probability distribution of observation symbols $B$ is the accuracy matrix of individual gestures from LHMMs. The forward variable is updated after the current observation value is obtained; it represents the posterior probability of the current state given the context constraints of UHMMs. Therefore, the updated result is the state with the maximum a posteriori probability.
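One simple way to estimate a transition matrix from observed gesture sequences is to count consecutive pairs and normalize each row. The sketch below illustrates that idea only; the paper does not describe its exact estimation procedure, and the function name and the additive smoothing term (which avoids zero probabilities and empty rows) are assumptions.

```python
def estimate_transitions(sequences, n_states, smoothing=1.0):
    """Estimate A[i][j] from observed state sequences by counting pairs."""
    counts = [[smoothing] * n_states for _ in range(n_states)]
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    # Normalize each row so that it sums to one
    return [[c / sum(row) for c in row] for row in counts]

# Hypothetical log: 0 = Move forward, 1 = Move backward, 2 = Stop
observed = [[0, 2, 1, 2, 0]]
A_est = estimate_transitions(observed, n_states=3)
for row in A_est:
    print(row)
```

The observation matrix $B$ could be built the same way from a confusion table of true gestures against LHMMs decisions, normalizing each row over the decided symbols.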
Figure 7: Experimental environment and a Kinect mobile robot: (a) VICON motion capture system; (b) Kinect sensor, FitPC2 microcomputer, and Pioneer 3DX mobile robot.
3. Experiments
3.1. Introduction of Hardware Platform. The experimental environment, in which a VICON motion capture system [24] and a Kinect mobile robot platform are installed, is shown in Figures 7(a) and 7(b), respectively. The mobile robot platform mainly includes a Pioneer 3DX mobile robot, a microcomputer called FitPC2, and a Kinect sensor, which is built from an RGB camera and a depth sensor. The effective sensing range of the Kinect sensor is from 0.4 meters to 4 meters, the vertical viewing angle range is ±43°, the horizontal range is ±57°, and the frame rate is 30 fps. The FitPC2 microcomputer is a fanless minicomputer that can run Windows or Linux operating systems and is equipped with WiFi. Besides the main components mentioned above, an external battery is used to supply power for both the FitPC2 and the Kinect sensor.
Robot Operating System (ROS) is installed on the Linux operating system of the FitPC2 microcomputer. ROS is an open-source meta-operating system that provides services like a real operating system, including hardware abstraction, low-level device control, implementation of commonly used functionality, message passing between processes, and package management [25]. In this paper, all processes are developed in accordance with the ROS framework, including a driver for the Kinect sensor, a motion data receiver program, a motion data processing program, and a display program.
3.2. Motion Data Evaluation Based on VICON. In order to assess the accuracy of the motion data captured by a Kinect sensor, a VICON motion capture system was used for evaluation. As angular velocity data are similar to acceleration data, we selected angular velocity for evaluation. First, several markers of the VICON system were fixed on the wrist of an experimenter, and the experimenter then made a variety of predefined gestures in front of the Kinect sensor. Finally, the pitch, roll, and yaw data recorded by the VICON system and the Kinect sensor were sent to a PC, which plotted these data on the same graph, as shown in Figure 8. The data from the VICON system are plotted with solid lines, while the data from the Kinect sensor are plotted with dotted lines. From these figures, we find that the Kinect sensor measured the motions accurately most of the time, except for one period in Figure 8(a) where the data did not match because the arm swung out of the scope of the Kinect sensor. The mismatch between the two types of data was always less than 10 degrees. By comparison with the VICON system, we have confirmed that the motion data captured by the Kinect sensor meet the accuracy requirement of gesture recognition.
3.3. Continuous Gesture Recognition Test. In order to test the performance of the proposed algorithm, we predefined five gestures for the experiments: "Move forward," "Move backward," "Stop," "Turn left," and "Turn right." The gesture "Waving front and back" means "Move forward," the gesture "Waving left and right" means "Move backward," the gesture "Waving up and down" means "Stop," the gesture "Swing counterclockwise" means "Turn left," and the gesture "Swing clockwise" means "Turn right," as shown in Figure 9.
We recorded 10 sets of data from three experimenters for training and testing, and the experiments were carried out in the following three steps.
Step 1. Within the scope of a Kinect sensor, repeat gesture Type 1 15 times and take a 5-second break. Continue performing the remaining types in the same way until Type 5 is done, and then save the data.
Step 2. Within the scope of a Kinect sensor, perform a sequence of 20 gestures with a break of about 2 seconds between gestures, and save the data.
Figure 8: Comparisons of roll (a), pitch (b), and yaw (c) data between a VICON system and a Kinect sensor. Each panel plots the angle in degrees (°) against time (s).
Figure 9: Five predefined gestures for Human-Robot Interaction: waving front and back, waving left and right, waving up and down, swing counterclockwise, and swing clockwise.
Step 3. Process the training data and the testing data. First, train the FNNs to distinguish gestures from nongestures; second, use each block of training data to train the LHMMs.
To trade off computational complexity against efficiency and accuracy, the number of states in the LHMMs is set to 20 and the number of distinct observation symbols is also set to 20. The trained LHMMs are then used to recognize individual gestures in the test data. The output of each test is a sequence of recognized gestures. Finally, a Bayesian filter with sequential constraints in UHMMs is used to generate the most likely underlying gesture sequence.
3.3.1. Performance Evaluation of FNNs. A MATLAB neural network toolbox [26] is used for training the first and second layers of the FNNs. The initial values of the weights and deviations are randomly selected, and different initial values lead to different performance. If the performance does not reach the target, the training phase is restarted with a new neural network until it reaches sufficient accuracy. For instance, within 180 iterations, two different initial values produced two different performances, as shown in Figure 10, which gives one good and one bad result of neural network training.
Figure 10: Performance comparison of gesture spotting between two FNNs: (a1) the performance goal is not met within 180 iterations; (a2) the output of FNNs, accuracy = 75.69%; (a3) the error of FNNs; (b1) the performance goal is met within 26 iterations; (b2) the output of FNNs, accuracy = 96.86%; (b3) the error of FNNs.
After 180 iterations, the FNNs did not reach the target, as shown in Figure 10(a1). The parts circled in Figure 10(a3) show continuous errors, which led to the detection errors in Figure 10(a2). Thus the FNNs achieve sufficient accuracy only if the performance curve reaches the target. As the performance curve does not meet its target in Figure 10(a1), the training program needs to be restarted. In Figure 10(b1), the FNNs met the requirement after 26 iterations. Although there were some discrete errors in Figure 10(b3), they did not generate any detection error in Figure 10(b2). Therefore, we can conclude that several discrete errors at the edge of a gesture signal do not affect the detection results, but consecutive errors can cause detection errors.
3.3.2. Performance Evaluation of MHMMs. We again recorded 10 sets of data from each of three experimenters for training and testing, and every set of sequence data has 20 gestures. The test includes two steps. First, we trained LHMMs to recognize individual gestures in a sequence. Second, the decision obtained in the first step was used in a Bayesian filter with sequential restrictions to generate the most likely command sequence as the final result. The test results for 3D angular velocity and 3D acceleration are shown in Figures 11 and 12, respectively. In Figure 11(a), the signal is a 3D angular velocity which includes twenty gestures captured by a Kinect sensor. The circled regions (A) and (B) in Figure 11(b) indicate two errors generated by FNNs, which caused the segment to be shorter than its actual length after the start and end points of the gesture were detected. After LHMMs were used to recognize gestures, errors appeared as shown in the circled regions (C) and (D) of Figure 11(c), which indicates that the length of a segment affects the accuracy of gesture recognition. After UHMMs were used to update the recognition results from LHMMs, accurate recognition results were obtained. Similarly, 3D acceleration data was also used for testing. Recognition errors also appeared in LHMMs, as the four circled regions in Figure 12(c) show. However, after updating in UHMMs, accurate recognition results were also obtained.

[Figure 11: Results of FNNs, LHMMs, and UHMMs. (a) The raw 3D angular velocity (x, y, z; degrees versus time in s); (b) the output of FNNs, with circled regions (A) and (B); (c) the LHMMs decision results compared with the ground truth, with circled regions (C) and (D); (d) the UHMMs decision results compared with the ground truth. The decision axes in (c) and (d) list Stop, Turn right, Turn left, Move forward, Move backward, and Nongesture.]
We calculated the average recognition rate and the average recognition time over 30 sets of gestures, which were composed of the five predefined gestures. Table 1 lists the average recognition accuracy and the average recognition time of HMMs and MHMMs, respectively. From the table we can conclude that the recognition accuracy of MHMMs, which use Bayesian filtering, is better than that of HMMs alone, and that the real-time performance of the MHMMs-based gesture recognition meets the requirement of online gesture recognition.
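As a rough check of this conclusion, the per-gesture figures of Table 1 can be averaged. The sketch below assumes that the decimal points lost in extraction are restored as 86.83, 95.71, and so on; the names `TABLE_1` and `column_mean` are illustrative only. Under that assumption the mean accuracy rises from about 83.3% to about 93.4%, while the mean recognition time grows by roughly 0.15 s.

```python
# Table 1 rows, keyed by gesture type:
# (HMMs accuracy %, MHMMs accuracy %, HMMs time s, MHMMs time s).
# Values assume the decimal points dropped by the extraction are restored.
TABLE_1 = {
    1: (86.83, 95.71, 0.564, 0.637),
    2: (91.24, 97.28, 0.629, 0.784),
    3: (78.43, 90.63, 0.468, 0.615),
    4: (85.92, 93.42, 0.693, 0.812),
    5: (73.97, 89.85, 0.503, 0.759),
}


def column_mean(index):
    """Average one column of Table 1 over the five gesture types."""
    values = [row[index] for row in TABLE_1.values()]
    return sum(values) / len(values)


hmm_acc, mhmm_acc = column_mean(0), column_mean(1)
hmm_time, mhmm_time = column_mean(2), column_mean(3)

print(f"mean accuracy: {hmm_acc:.2f}% (HMMs) vs {mhmm_acc:.2f}% (MHMMs)")
print(f"mean time: {hmm_time:.3f} s (HMMs) vs {mhmm_time:.3f} s (MHMMs)")
```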
Table 1: Comparisons of average recognition accuracy and average recognition time between HMMs and MHMMs.

Gesture type | HMMs accuracy (%) | MHMMs accuracy (%) | HMMs recognition time (s) | MHMMs recognition time (s)
1            | 86.83             | 95.71              | 0.564                     | 0.637
2            | 91.24             | 97.28              | 0.629                     | 0.784
3            | 78.43             | 90.63              | 0.468                     | 0.615
4            | 85.92             | 93.42              | 0.693                     | 0.812
5            | 73.97             | 89.85              | 0.503                     | 0.759

[Figure 12: Results of FNNs, LHMMs, and UHMMs. (a) The raw 3D acceleration (m/s^2 versus time in s); (b) the output of FNNs; (c) the LHMMs decision results compared with the ground truth; (d) the UHMMs decision results compared with the ground truth. The decision axes in (c) and (d) list Stop, Turn right, Turn left, Move forward, Move backward, and Nongesture.]

4. Conclusion

In this paper, a novel MHMMs-based continuous gesture recognition algorithm is proposed for Human-Robot Interaction, which uses raw motion data captured by a Kinect sensor. It therefore differs from traditional vision-based methods. First, a Kinect sensor was used to obtain the 3D acceleration and 3D angular velocity of an arm. Second, FNNs and a threshold criterion were used for gesture spotting and segmentation, respectively. Then the segmented gesture signals were sent into LHMMs to recognize individual gestures. After that, the identified gesture sequence was fed into UHMMs, where a Bayesian filter with sequential constraints was used to correct errors generated in LHMMs. The experimental results verify that the proposed algorithm not only effectively improves the accuracy of continuous gesture recognition but also has good real-time performance.
Acknowledgment
This work is supported by the National Natural Science Foundation of China (60277605).
References
[1] X. Wang, M. Xia, and H. Cai, "Hidden Markov models based dynamic hand gesture recognition," Mathematical Problems in Engineering, vol. 2012, Article ID 986134, 11 pages, 2012.
[2] A. Pantelopoulos and N. G. Bourbakis, "A survey on wearable sensor-based systems for health monitoring and prognosis," IEEE Transactions on Systems, Man and Cybernetics C, vol. 40, no. 1, pp. 1–12, 2010.
[3] J. Yang, J. Lee, and J. Choi, "Activity recognition based on RFID object usage for smart mobile devices," Journal of Computer Science and Technology, vol. 26, no. 2, pp. 239–246, 2011.
[4] R. Poppe, "A survey on vision-based human action recognition," Image and Vision Computing, vol. 42, no. 6, pp. 790–808, 2012.
[5] D. Weinland, R. Ronfard, and E. Boyer, "A survey of vision-based methods for action representation, segmentation and recognition," Computer Vision and Image Understanding, vol. 115, no. 2, pp. 224–241, 2011.
[6] L. Chen, J. Hoey, and C. D. Nugent, "Sensor-based activity recognition," Transactions on Systems, Man and Cybernetics, vol. 42, pp. 790–808, 2012.
[7] S. Bilal, R. Akmeliawati, A. A. Shafie, and M. J. E. Salami, "Hidden Markov model for human to computer interaction: a study on human hand gesture recognition," Artificial Intelligence Review, vol. 8, pp. 1–22, 2011.
[8] L. Bao and S. S. Intille, "Activity recognition from user-annotated acceleration data," Pervasive Computing, vol. 3001, pp. 1–17, 2004.
[9] C. Sutton, A. McCallum, and K. Rohanimanesh, "Dynamic conditional random fields: factorized probabilistic models for labeling and segmenting sequence data," Journal of Machine Learning Research, vol. 8, no. 2, pp. 693–723, 2007.
[10] O. Brdiczka, J. L. Crowley, and P. Reignier, "Learning situation models in a smart home," IEEE Transactions on Systems, Man and Cybernetics B, vol. 39, no. 1, pp. 56–63, 2009.
[11] F. Omar, M. Sinn, J. Truszkowski, P. Poupart, J. Tung, and A. Caine, "Comparative analysis of probabilistic models for activity recognition with an instrumented walker," in Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI '10), pp. 392–400, July 2010.
[12] F. J. Ordonez, P. D. Toledo, and A. Sanchis, "Activity recognition using hybrid generative/discriminative models on home environments using binary sensors," Sensors, vol. 13, no. 1, pp. 5460–5477, 2013.
[13] Z. Zhang, "Microsoft Kinect sensor and its effect," IEEE MultiMedia, vol. 19, no. 2, pp. 4–10, 2012.
[14] R. Giraldo, D. M. Giraldo, and S. A. Meza, "Kernel based hand gesture recognition using Kinect sensor," in Proceedings of the 17th Symposium of Image, Signal Processing, and Artificial Vision, pp. 158–161, 2012.
[15] Z. Ren, J. Meng, J. Yuan, and Z. Zhang, "Robust hand gesture recognition with Kinect sensor," in Proceedings of the 19th ACM International Conference on Multimedia, ACM Multimedia 2011 (MM '11), pp. 759–760, December 2011.
[16] A. Bauer, K. Klasing, and G. Lidoris, "The autonomous city explorer: towards natural human-robot interactivity in urban environments," International Journal of Social Robotics, vol. 1, no. 3, pp. 127–140, 2009.
[17] H.-D. Yang, A.-Y. Park, and S.-W. Lee, "Gesture spotting and recognition for human-robot interaction," IEEE Transactions on Robotics, vol. 23, no. 2, pp. 256–270, 2007.
[18] M. T. Hagan, H. B. Demuth, and M. H. Beale, Neural Network Design, PWS Publishing Company, 1996.
[19] L. R. Rabiner, "Tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[20] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[21] W. Liu and W. Han, "Improved Viterbi algorithm in continuous speech recognition," in International Symposium in Information Technology, pp. 542–545, October 2010.
[22] O. Mourad, "HMMs parameters estimation using hybrid Baum-Welch genetic algorithm," in Proceedings of the International Conference on Computer Application and System Modeling, pp. 207–209, 2010.
[23] K. Hirasawa, M. Ohbayashi, M. Koga, and M. Harada, "Forward propagation universal learning network," in Proceedings of the IEEE International Conference on Neural Networks, pp. 353–358, June 1996.
[24] VICON, http://www.vicon.com.
[25] M. Quigley, K. Conley, and B. P. Gerkey, "ROS: an open-source robot operating system," in Proceedings of the ICRA Workshop on Open Source Software, 2009.
[26] Neural, http://www.mathworks.com/products/neuralnet.
[Figure 1: (a) Xtion Pro Live sensor, (b) Kinect sensor, and (c) internal structure of the Kinect sensor (projector, RGB sensor, and infrared detector).]
data. However, one drawback of the generative approach is that enough data must be available to learn the complete probabilistic representations that are required. Moreover, both of these two types of methods suffer from high computational complexity. Currently, researchers have tried to integrate the advantages of different methods to address complex activity recognition problems [12].
In recent years, a new type of RGB-Depth (RGB-D) sensor has appeared, which is able not only to extract RGB features and depth information with position and direction but also to obtain three-dimensional (3D) acceleration and 3D angular velocity information of the body's motion. The Asus Xtion Pro Live sensor and the Microsoft Kinect sensor are two well-known RGB-D sensors [13], as shown in Figures 1(a) and 1(b), respectively. The internal structure of a Kinect sensor is shown in Figure 1(c). Since the advent of the Kinect sensor in November 2010, researchers have used it extensively in the field of Human-Robot Interaction (HRI). Giraldo et al. [14] proposed an approach based on machine learning to identify predefined gestures from depth images obtained from a Kinect sensor. Additionally, they adopted a kernel-based learning method to discover the relationship between gestures and samples, which can be used to correctly describe gestures. Ren et al. [15] detected the shape of a hand from color images and depth images obtained from a Kinect sensor and then proposed a shape distance metric called Finger-Earth Mover's Distance for dissimilarity measurement. Finally, they recognized predefined gestures by template matching. Bauer et al. [16] proposed a Human-Robot Interaction system in which a person first makes a predefined gesture, and then a robot tries to understand the action and respond accordingly in real time.
To overcome the shortcomings of heavy computation and poor real-time performance, a continuous gesture recognition method is proposed for HRI. The rest of the paper is organized as follows. Section 2 describes the Feed-forward Neural Networks (FNNs) for gesture spotting and the proposed Multilayer Hidden Markov Models (MHMMs) for continuous gesture recognition, and then gives the proposed continuous hand gesture recognition procedure, which contains hand spotting and gesture recognition based on FNNs and MHMMs. The experimental results are shown in Section 3. Finally, Section 4 gives the summary, followed by the acknowledgments.
2. MHMMs-Based Continuous Gesture Recognition
In this paper, the proposed gesture recognition algorithm consists of two parts: the gesture spotting and segment module and the continuous gesture recognition module. An important problem in gesture recognition is how to distinguish gestures from nongestures, which is called the gesture spotting problem, namely, how to detect the start point and the end point of a gesture [17]. First, Feed-forward Neural Networks (FNNs) are used to detect gestures in an action signal, and then a threshold criterion is used to segment gestures from the signal. Finally, MHMMs are used to recognize the segmented continuous gestures.
FNNs are first used to determine whether a particular action is a gesture. When the FNNs determine that a certain action is a gesture, the output is 1; otherwise, the output is 0. Features can be combined well in FNNs for the classification of gestures and nongestures. Additionally, the weights and deviations of the FNNs can be optimized through training. Two counters are used to detect the start point and the end point of a gesture in accordance with a preset threshold value. When the end point of a gesture is detected, the spotting and segment module triggers the recognition module. As MHMMs are a probabilistic model with high computational complexity, the FNNs-based segment module is used as a switch to control data flows; thus computation time can be saved and the real-time performance of the whole algorithm can be improved. Sequential constraints among gestures are modeled by MHMMs in the recognition module. The forward propagation procedure in the Bayesian filter is used to produce a posterior gesture decision and update the recognition result. The flow chart of the MHMMs-based continuous gesture recognition method is shown in Figure 2.
2.1. Gesture Spotting and Segment. In this paper, a Feed-forward Neural Networks (FNNs) module is designed to discover gesture data from signals [18]. The structure of a three-layer FNNs module is shown in Figure 3. The number of input nodes is $n$; the input variable of the input layer is $Y = (y_1, y_2, \ldots, y_n)^T$, which indicates an original input signal of actions; the output vector of the hidden layer is $a_i^2 = (a_1^2, a_2^2, \ldots, a_m^2)^T$; the output vector of the output layer is $a_i^3 = (a_1^3, a_2^3, \ldots, a_m^3)^T$, which represents a gesture spotting result; the weight matrix from the input layer to the hidden layer is $W_i^1 = (W_1^1, W_2^1, \ldots, W_m^1)$; the weight matrix from the hidden layer to the output layer is $W_i^2 = (W_1^2, W_2^2, \ldots, W_m^2)$; and the Sigmoid function $f(x) = (1 + e^{-x})^{-1}$ is used as the excitation function. A 3D acceleration $\alpha_i = [\alpha_x, \alpha_y, \alpha_z]^T$ is one kind of data extracted by the RGB-D sensor, which is used to train the FNNs. Let $a_i^3$ be the output of the FNNs:

$$a_i^3 = \operatorname{hardlim}\left(f^2\left(W^2 f^1\left(W^1 \mu_i + b^1\right) + b^2\right) - 0.5\right), \tag{1}$$

where $W^1$, $W^2$, $b^1$, and $b^2$ are parameters of the FNNs.

Both of the network functions $f^1$ and $f^2$ are log-sigmoid functions, and hardlim is a hard limiting function. Therefore, the parameters of the FNNs can be trained by a back-propagation method. In the output layer, $W^3 = 1$ and $b^3 = 0$ are set in order to produce a discrete output. The output of the FNNs is a binary number (1 or 0), which represents a gesture or a nongesture, respectively. The first layer and the second layer constitute a two-layer feed-forward network whose parameters can be optimized through training, while the weights and deviations of the output layer are fixed in order to produce a discrete output.
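Equation (1) can be sketched as a short forward pass. This is a minimal illustration rather than the authors' MATLAB implementation: the layer sizes follow Figure 3 (three hidden units, one output), while the weights below are hand-picked, untrained values and the names `logsig`, `layer`, and `fnn_spot` are assumptions made for this example.

```python
import math


def logsig(x):
    """Log-sigmoid excitation function f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def hardlim(x):
    """Hard limit: 1 if x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0


def layer(W, b, x):
    """Apply logsig(W x + b) for one fully connected layer."""
    return [logsig(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def fnn_spot(sample, W1, b1, W2, b2):
    """Equation (1): hardlim(f2(W2 f1(W1 u + b1) + b2) - 0.5)."""
    a1 = layer(W1, b1, sample)   # first (hidden) layer
    a2 = layer(W2, b2, a1)       # second layer, a single unit
    return hardlim(a2[0] - 0.5)  # fixed third layer gives a 0/1 label


# Illustrative, untrained parameters: 3 inputs, 3 hidden units, 1 output.
W1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
W2 = [[2.0, 2.0, 2.0]]
b2 = [-4.0]

gesture_label = fnn_spot([3.0, 3.0, 3.0], W1, b1, W2, b2)
nongesture_label = fnn_spot([-3.0, -3.0, -3.0], W1, b1, W2, b2)
print(gesture_label, nongesture_label)
```

Subtracting 0.5 before the hard limit places the decision boundary at the midpoint of the sigmoid output range, which is why only the first two layers need training.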
Moreover, through training the neural networks, the weights and deviations of the FNNs can be optimized. Two digital counters are used to continuously record the output of the FNNs. When the counters exceed a preset threshold value, the start point and the end point of a gesture can be detected, which prevents isolated erroneous classifications of the FNNs from affecting the segmentation. When an end point of a gesture is detected, the spotting and segment module triggers the recognition module.
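The two-counter criterion can be sketched as follows. The paper does not spell out how the counters are reset, so this sketch assumes that each counter tracks a run of consecutive identical labels and is cleared when the other label appears; the function name `segment_gestures` and the threshold value are likewise assumptions.

```python
def segment_gestures(fnn_output, threshold=3):
    """Return (start, end) index pairs of detected gestures; end is exclusive."""
    segments = []
    gesture_count = 0      # consecutive samples labeled 1 (gesture)
    nongesture_count = 0   # consecutive samples labeled 0 (nongesture)
    start = None           # start index of the gesture in progress, if any
    for i, label in enumerate(fnn_output):
        if label == 1:
            gesture_count += 1
            nongesture_count = 0
            if start is None and gesture_count > threshold:
                # Start point: where the current run of 1s began.
                start = i - gesture_count + 1
        else:
            nongesture_count += 1
            gesture_count = 0
            if start is not None and nongesture_count > threshold:
                # End point: where the current run of 0s began.
                segments.append((start, i - nongesture_count + 1))
                start = None
    if start is not None:  # the signal ended while a gesture was in progress
        segments.append((start, len(fnn_output)))
    return segments


labels = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0]
segments = segment_gestures(labels, threshold=3)
print(segments)
```

In this example the isolated 1 at index 2 never starts a gesture and the isolated 0 at index 11 does not split the gesture, which mirrors the observation that discrete errors are harmless while consecutive errors are not.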
2.2. Multilayer Hidden Markov Models. In daily life, people usually use gestures to command pets. These gestures correspond to some reasonable modes, which reflect sequential restrictions among continuous gestures. Therefore, these restrictions can be used to improve the accuracy of gesture recognition. Following this inspiration, this paper proposes a continuous gesture recognition algorithm based on MHMMs, a statistical model derived from HMMs. HMMs are a well-known statistical model for continuous data recognition and have been widely used in speech recognition, handwriting recognition, and pattern recognition [19]. An HMM is characterized by a set of parameters $\lambda = (A, B, \pi)$, where $A$, $B$, and $\pi$ are the state transition probability distribution, the probability distribution of observation symbols in each state, and the initial state distribution, respectively. In this paper, a forward-backward algorithm is used to estimate a series of likelihood values of the observation, $P(O \mid \lambda)$, given a specific HMM [20]. For a given observation sequence $O$ in the test mode, a Viterbi algorithm is used to find the best state sequence $Q$ [21]. An Expectation Maximization (EM) method is used to train the parameters of the HMMs [22].
The proposed MHMMs are a kind of generalization of a segmented model in which each segment has subsegments. MHMMs are divided into two layers: LHMMs are used to identify individual gestures in the continuous gesture signal, and a Bayesian filter with sequential constraints in UHMMs is used to correct recognition errors created in LHMMs. Figure 4 shows the basic idea of MHMMs. A time sequence is layered into several segments, where $S_i^1$ represents the state sequence of UHMMs and $S_i^2$ represents the state sequence of LHMMs. $S_i^2$ is the sub-HMMs of state sequence $S_i^1$, and $O_i^2$ denotes the decision of LHMMs.
2.2.1. Data Preprocessing. Before MHMMs are used to train on and recognize the gesture data segmented by FNNs, a preprocessing step must be completed, which includes removing high-frequency noise and finding the stroke duration of a gesture. Each raw data sample obtained by a Kinect sensor is a 6-component vector $U$, composed of a 3D acceleration $[\alpha_x, \alpha_y, \alpha_z]^T$ and a 3D angular velocity $[\omega_x, \omega_y, \omega_z]^T$, where

$$U = [\alpha_x, \alpha_y, \alpha_z, \omega_x, \omega_y, \omega_z]^T. \tag{2}$$

The original data received by the robot computer contains high-frequency noise, which may affect gesture recognition. A low-pass filter with a cutoff frequency of 5 Hz is therefore used to remove the noise and smooth the data. Then a sliding window with a length of 100 ms is used to calculate the average, which removes the DC components of the 3D acceleration in the time domain and generates a deviation vector $w = [d_x, d_y, d_z]^T$. Additionally, a Fast Fourier Transform (FFT) is applied to the deviation vector $w$ in the frequency domain, and the maximum frequency among $d_x$, $d_y$, and $d_z$ is used as the fundamental frequency of a gesture. Finally, the data point $V$, composed of a 3D angular velocity vector, a 3D acceleration vector, and a 3D deviation vector of acceleration, is fed into MHMMs for gesture recognition, where

$$V = [\alpha_x, \alpha_y, \alpha_z, \omega_x, \omega_y, \omega_z, d_x, d_y, d_z]^T. \tag{3}$$
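The preprocessing chain for a single acceleration axis might be sketched as below. Several details are assumptions rather than the authors' choices: the paper does not name the filter type, so a first-order IIR low-pass is used; the sampling rate of 150 Hz is inferred from the statement that 1 second corresponds to 150 data points; and a naive DFT stands in for the FFT to keep the example dependency-free.

```python
import cmath
import math

FS = 150.0  # samples per second, inferred from "1 second (150 data points)"


def low_pass(x, cutoff_hz=5.0, fs=FS):
    """First-order IIR low-pass filter (an assumed filter type)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = (1.0 / fs) / (rc + 1.0 / fs)
    y, prev = [], x[0]
    for v in x:
        prev = prev + alpha * (v - prev)
        y.append(prev)
    return y


def remove_dc(x, window=15):
    """Subtract a trailing moving average (15 samples = 100 ms at 150 Hz)."""
    out = []
    for i in range(len(x)):
        lo = max(0, i - window + 1)
        mean = sum(x[lo:i + 1]) / (i + 1 - lo)
        out.append(x[i] - mean)
    return out


def dominant_frequency(x, fs=FS):
    """Frequency in Hz of the largest non-DC bin of a naive DFT."""
    n = len(x)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(s) > best_mag:
            best_k, best_mag = k, abs(s)
    return best_k * fs / n


# Synthetic axis: a 2 Hz waving motion on top of a constant offset.
raw = [1.0 + math.sin(2.0 * math.pi * 2.0 * t / FS) for t in range(300)]
deviation = remove_dc(low_pass(raw))
fundamental = dominant_frequency(deviation)
print(fundamental)
```

Both filters are linear, so they change the amplitude and phase of the waving component but not its frequency, which is what the fundamental-frequency estimate relies on.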
[Figure 2: Flow chart of MHMMs-based continuous gesture recognition. Spotting and segment module: raw motion data are passed to the FNNs, which decide whether a sample is a gesture; a gesture time counter and a nongesture time counter are increased accordingly, and when a counter exceeds the threshold, the start point or end point is set and the gesture segment is produced. Recognition module: sliding windows, data preprocessing, K-means clustering, likelihood estimation based on each HMM, maximum likelihood estimation, Bayesian filter with constraints, and saving of the decision; the recognition result is output when the sliding window has arrived at the end of the data.]

[Figure 3: The structure of a three-layer FNNs, with a1 = f1(W1 y + b1), a2 = f2(W2 f1(W1 y + b1) + b2), and a3 = hardlim(a2 - 0.5).]

[Figure 4: The architecture of the proposed MHMMs.]

Then a K-means clustering method is used to train centroids in order to quantize vectors into observable symbols. The process of symbolization converts feature vectors into a finite number of symbols, which can be used in the discrete HMMs. Namely, the K-means clustering method is applied to the vector $V$ to get the partition value for each vector and also a set of centroids for clustering the data into observable symbols in the recognition phase. The K-means clustering method clusters $n$ objects into $k$ partitions, where $k < n$, based on the properties of the objects.
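The symbolization step can be sketched with plain Lloyd iterations. This is an illustration under stated assumptions: the centroids are initialized from the first k points (the paper does not describe its initialization), two-dimensional points are used instead of the 9D vector V for brevity, and the names `kmeans` and `quantize` are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def quantize(point, centroids):
    """Index of the nearest centroid, used as the observable symbol."""
    return min(range(len(centroids)),
               key=lambda c: sq_dist(point, centroids[c]))


def kmeans(points, k, iterations=20):
    """Plain Lloyd iterations; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[quantize(p, centroids)].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                dim = len(members[0])
                centroids[c] = [sum(m[d] for m in members) / len(members)
                                for d in range(dim)]
    return centroids


# Toy 2D feature vectors forming two well-separated groups.
features = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
centroids = kmeans(features, k=2)
symbols = [quantize(v, centroids) for v in features]
print(symbols)
```

The centroids trained here would be stored and reused in the recognition phase, so that new vectors are mapped to the same symbol alphabet as the training data.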
As shown in Figure 5, a sliding window of 1 second (150 data points) moves with a step of 30 data points, and the likelihood of each set of HMMs parameters is estimated for every window. Each sliding window thus generates a likelihood value for each model. Therefore, for a certain gesture, the model that maximizes the likelihood over the other HMMs is chosen as the recognized type, which is the output decision of the sliding window.
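The windowing and the per-window maximum-likelihood decision can be sketched as follows. The window size and step come from the text, but the scoring function is only a stand-in: `toy_log_likelihood` treats symbols as independent draws and is not an HMM likelihood; in the method described here, the forward algorithm would supply that value. The two toy models and all names are assumptions.

```python
import math


def sliding_windows(symbols, size=150, step=30):
    """Yield (start, window) pairs; a short sequence yields one window."""
    last = max(len(symbols) - size, 0)
    for start in range(0, last + 1, step):
        yield start, symbols[start:start + size]


def toy_log_likelihood(window, symbol_probs):
    """Stand-in for log P(O | lambda): independent symbols, not a real HMM."""
    return sum(math.log(symbol_probs[s]) for s in window)


def window_decisions(symbols, models, size=150, step=30):
    """Choose the maximum-likelihood model for every window."""
    decisions = []
    for start, window in sliding_windows(symbols, size, step):
        best = max(models,
                   key=lambda name: toy_log_likelihood(window, models[name]))
        decisions.append((start, best))
    return decisions


# Two toy models over a binary symbol alphabet (illustrative numbers).
models = {
    "Move forward": {0: 0.8, 1: 0.2},
    "Stop": {0: 0.2, 1: 0.8},
}
symbol_stream = [0] * 180 + [1] * 180
decisions = window_decisions(symbol_stream, models)
for start, name in decisions:
    print(start, name)
```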
2.2.2. LHMMs-Based Individual Gestures Recognition

(1) Training Phase. The training phase is divided into four steps: finding the stroke duration of a gesture, quantifying the vectors into observable symbols, setting up initial HMMs parameters, and iterating for Expectation Maximization.

Step 1 (find the stroke duration of a gesture). FFT is used to find the stroke duration of the gesture for HMMs.
[Figure 5: Sliding windows for preprocessing, moved along the gesture signal segmented by FNNs (axes ax, ay, az).]

Step 2 (quantify the vectors into observable symbols). After preprocessing, the raw data are used as the observation sequence $O = O_1, O_2, \ldots, O_T$ for the HMMs with parameters $\lambda = (A, B, \pi)$. Given the parameters of the model, the conditional probability of the observation sequence can be calculated. Namely, the problem is, given a specific HMM, to assess the probability of an observation sequence, $P(O \mid \lambda)$. A forward-backward algorithm is used to realize the task as follows [20]. Define the forward variables

$$\alpha_t(i) = P(O_1, O_2, \ldots, O_t, q_t = S_i \mid \lambda), \tag{4}$$

that is, given a model, the probability that the output observable sequence up to time $t$ is $O_1, O_2, \ldots, O_t$ and the state at time $t$ is $S_i$. The forward variables can be calculated recursively by the following formula:

$$\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\right] b_j(O_{t+1}), \tag{5}$$

where $1 \le t \le T-1$ and $1 \le j \le N$.

Define the backward variables

$$\beta_t(i) = P(O_{t+1}, O_{t+2}, \ldots, O_T \mid q_t = S_i, \lambda), \tag{6}$$

that is, given the model parameters and the state $S_i$ at time $t$, the probability of the output observation sequence $O_{t+1}, O_{t+2}, \ldots, O_T$ from time $t+1$ to the end of the sequence. Similarly, a recursive method can be used to calculate $\beta_t(i)$ as follows:

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j), \tag{7}$$

where $t = T-1, T-2, \ldots, 1$ and $1 \le i \le N$.

Then a K-means clustering method is applied to the 9D vector $V$ to get the partition value for each vector and also a set of centroids for clustering the data into observable symbols in the recognition phase.
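The forward and backward recursions of equations (4) to (7) can be sketched directly. The two-state model below is a toy example with made-up probabilities, and the function names are illustrative. The initialization $\alpha_1(i) = \pi_i b_i(O_1)$ and $\beta_T(i) = 1$ follows the standard formulation in [19], since the text does not state it explicitly. Both recursions must yield the same likelihood $P(O \mid \lambda)$, which gives a convenient self-check.

```python
def forward(A, B, pi, obs):
    """alpha[t][i] = P(O_1..O_t, q_t = S_i | lambda), equations (4)-(5)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(A, B, obs):
    """beta[t][i] = P(O_{t+1}..O_T | q_t = S_i, lambda), equations (6)-(7)."""
    n = len(A)
    beta = [[1.0] * n]  # beta_T(i) = 1 for every state
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


# Toy two-state, two-symbol model (illustrative numbers only).
A = [[0.7, 0.3], [0.4, 0.6]]   # A[i][j] = P(q_{t+1} = S_j | q_t = S_i)
B = [[0.9, 0.1], [0.2, 0.8]]   # B[j][k] = P(symbol k | state S_j)
pi = [0.5, 0.5]
obs = [0, 1, 0]

alpha = forward(A, B, pi, obs)
beta = backward(A, B, obs)
likelihood_forward = sum(alpha[-1])
likelihood_backward = sum(pi[i] * B[i][obs[0]] * beta[0][i]
                          for i in range(len(pi)))
print(likelihood_forward, likelihood_backward)
```

For long sequences the raw products underflow, so practical implementations usually rescale each time step or work in the log domain; that refinement is omitted here.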
Step 3 (set up initial HMMs parameters). Set up the number of states in the model. The initial values of the observable symbols $O = O_1, O_2, \ldots, O_T$ and of $\lambda = (A, B, \pi)$ in each state, which are then iterated, should meet the randomness (stochastic) constraints of HMMs parameters. Then a Viterbi algorithm is used to find the single best state sequence $Q = q_1, q_2, \ldots, q_T$ [21].
Step 4 (expectation maximization iteration). A Baum-Welch algorithm is used to calculate an auxiliary function and maximize the likelihood value of $\bar{\lambda}$ [22]. Iterations proceed until the likelihood approaches a steady value. The expectation value $Q(\lambda, \bar{\lambda})$ and the maximized likelihood $\bar{\lambda}$ are calculated as follows:

$$Q(\lambda, \bar{\lambda}) = \sum_{Q} P(Q \mid O, \lambda) \log\left[P(O, Q \mid \bar{\lambda})\right],$$

$$\max_{\bar{\lambda}} \left[Q(\lambda, \bar{\lambda})\right] \Longrightarrow P(O \mid \bar{\lambda}) \ge P(O \mid \lambda). \tag{8}$$
(2) Recognition Phase. During the recognition phase, after a group of centroids has been trained for K-means clustering, several groups of HMMs are formed. Then we calculate the likelihood of each group of HMMs parameters. Given the observable sequence $O$, the Viterbi algorithm is used to find the single best state sequence $Q$, and the HMMs with the greatest likelihood are identified as the most likely gesture type.

[Figure 6: (a) Transitions among UHMMs, which consider the sequential constraints of context information; the five states S1 to S5 correspond to the gestures Move forward, Move backward, Turn left, Turn right, and Stop. (b) Update of the gesture recognition result by UHMMs, with states q_0, q_1, ..., q_t, q_{t+1} and observations O_0, O_1, ..., O_t, O_{t+1}.]
2.2.3. UHMMs-Based Continuous Gesture Update. In order to improve the accuracy of decision making, we consider the sequential conditions of "Context" in UHMMs, where a Bayesian filter is used to update the results from LHMMs. Five gestures are predefined: "Move forward," "Move backward," "Turn left," "Turn right," and "Stop." The "Context" is used as sequential constraints among different types of gestures. For example, the event that someone continuously sends the same command to a robot has a small probability; and after someone sends a "Move forward" command, he should send a "Stop" command before sending a "Move backward" command. Figure 6(a) shows the transitions among UHMMs, which are discrete first-order HMMs with five states and five observation symbols. UHMMs can be described as a sequence of gestures which, at any time, is in one of a set of $N$ ($N = 5$) distinct states $S_1, S_2, \ldots, S_5$. The system undergoes a change of state according to a set of probabilities associated with the state. Each arc represents a probability of transition between two adjacent states. The moments associated with changes of state are indicated as $k = 1, 2, \ldots, N$, and the $k$th actual state is $q_k$. The probability $a_{ij}$ described below relates the current state to the previous state:

$$a_{ij} = P[q_k = S_j \mid q_{k-1} = S_i]. \tag{9}$$
The initial state distribution represents the probability distribution of the first state, which is defined as

$$\pi_i = P[q_1 = S_i], \quad i = 1, 2, \ldots, N. \tag{10}$$

Another element of UHMMs is the probability distribution of observation symbols in state $S_j$:

$$b_j(k) = P[O_k \mid q_t = S_j]. \tag{11}$$

$b_j$ is the probability with which state $S_j$ is identified as each of the different observation symbols, where $O_k$ represents the decisions made by LHMMs.

Figure 6(b) shows a sequence in UHMMs, where the state $q_t$ represents the $t$th gesture and $O_t$ is the majority decision result given by LHMMs. A forward propagation method [23] is used to update the joint probability of the observations gathered until time $t$. In this paper, the model is defined as follows:

$P(O_{t+1} \mid q_{t+1} = S_j)$ represents $b_j(O_{t+1})$, where the state at time $t+1$ is $S_j$ and the observation is $O_{t+1}$;

$P(q_{t+1} = S_j \mid q_t = S_i)$ is the transition probability $a_{ij}$ in matrix $A$.

The forward variable $\alpha_t(i)$ is defined as the probability of the observation sequence $O_1, O_2, \ldots, O_t$ and state $S_i$ at time $t$, given the model $\lambda$:

$$\alpha_t(i) = P(O_1, O_2, \ldots, O_t, q_t = S_i \mid \lambda), \tag{12}$$

where $1 \le i \le N$. According to the network structure shown in Figure 6(b), we can conclude

$$\alpha_{t+1}(j) = P(O_{t+1} \mid q_{t+1} = S_j) \times \sum_{S_i} P(q_{t+1} = S_j \mid q_t = S_i)\, \alpha_t(i), \tag{13}$$

with the initial condition of a uniform distribution

$$\alpha_0(i) = P(q_0 = S_i) = \pi_i. \tag{14}$$
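The update of equations (13) and (14) can be sketched as a small Bayesian filter. Everything numeric here is an assumption made for illustration: the transition matrix encodes the stated constraints (repeating a command and switching directly between "Move forward" and "Move backward" are unlikely), and the observation matrix assumes that LHMMs sometimes report "Move backward" when the true gesture is "Stop". The normalization after each step is also an addition, used so that the forward variable can be read as a posterior.

```python
GESTURES = ["Move forward", "Move backward", "Turn left", "Turn right", "Stop"]

# A[i][j] = P(q_{t+1} = S_j | q_t = S_i); illustrative values only.
A = [
    [0.01, 0.01, 0.30, 0.30, 0.38],
    [0.01, 0.01, 0.30, 0.30, 0.38],
    [0.24, 0.24, 0.04, 0.24, 0.24],
    [0.24, 0.24, 0.24, 0.04, 0.24],
    [0.24, 0.24, 0.24, 0.24, 0.04],
]

# B[j][k] = P(LHMMs decide gesture k | true gesture S_j); illustrative values.
B = [
    [0.80, 0.05, 0.05, 0.05, 0.05],
    [0.05, 0.80, 0.05, 0.05, 0.05],
    [0.05, 0.05, 0.80, 0.05, 0.05],
    [0.05, 0.05, 0.05, 0.80, 0.05],
    [0.05, 0.20, 0.05, 0.05, 0.65],
]

PI = [0.2] * 5  # uniform initial condition, equation (14)


def bayes_update(alpha, observation, A, B):
    """One step of equation (13), followed by normalization."""
    n = len(alpha)
    new_alpha = [B[j][observation] * sum(A[i][j] * alpha[i] for i in range(n))
                 for j in range(n)]
    total = sum(new_alpha)
    return [a / total for a in new_alpha]


def filter_decisions(lhmm_decisions, A, B, pi):
    """Return the maximum a posteriori gesture after each LHMMs decision."""
    alpha = list(pi)
    corrected = []
    for obs in lhmm_decisions:
        alpha = bayes_update(alpha, obs, A, B)
        corrected.append(max(range(len(alpha)), key=lambda i: alpha[i]))
    return corrected, alpha


# LHMMs report "Move forward" followed directly by "Move backward".
lhmm_output = [0, 1]
corrected, posterior = filter_decisions(lhmm_output, A, B, PI)
print([GESTURES[i] for i in corrected])
```

With these numbers the second decision is revised to "Stop", because a direct switch from "Move forward" to "Move backward" is far less probable than a misread "Stop". Different matrices would of course lead to different corrections.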
For the UHMMs, the training parameters $\lambda = (A, B, \pi)$ are obtained by observing the interaction between experimenter and robot, so these parameters vary from person to person. The transition matrix $A$ is estimated from the statistics of actually observed gestures. The observation symbol probability distribution $B$ is the accuracy matrix of individual gestures from LHMMs. The forward variable is updated after the current observation value has been obtained, and it represents the posterior probability of the current state given the context constraints of UHMMs. Therefore, the updated result is the maximum a posteriori probability of the state.
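One way to estimate the transition matrix from observed gesture sequences is to count transitions and normalize each row, as sketched below. The additive smoothing term is an assumption introduced here so that unseen transitions keep a small nonzero probability; the paper does not describe how it handles them, and the function name is illustrative.

```python
def estimate_transitions(sequences, n_states, smoothing=1.0):
    """Row-normalized transition counts with additive smoothing."""
    counts = [[smoothing] * n_states for _ in range(n_states)]
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    return [[c / sum(row) for c in row] for row in counts]


# Gesture indices: 0 Move forward, 1 Move backward, 2 Turn left,
# 3 Turn right, 4 Stop. One hypothetical observed command sequence.
observed = [[0, 4, 1, 4, 0]]
A_hat = estimate_transitions(observed, n_states=5)
for row in A_hat:
    print([round(p, 3) for p in row])
```

The matrix B could be obtained in the same way from a confusion table of true gestures against LHMMs decisions.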
[Figure 7: Experimental environment and a Kinect mobile robot. (a) VICON motion capture system; (b) Kinect mobile robot, consisting of a Kinect sensor, a FitPC2 microcomputer, and a Pioneer 3DX mobile robot.]
3. Experiments

3.1. Introduction of Hardware Platform. An experimental environment in which a VICON motion capture system [24] and a Kinect mobile robot platform are installed is shown in Figures 7(a) and 7(b), respectively. The mobile robot platform mainly includes a Pioneer 3DX mobile robot, a microcomputer called FitPC2, and a Kinect sensor, which combines an RGB camera and a depth sensor. The effective sensing range of the Kinect sensor is from 0.4 meters to 4 meters, the vertical viewing angle range is ±43°, the horizontal range is ±57°, and the frame rate is 30 fps. The FitPC2 microcomputer is a fanless minicomputer which can run Windows or Linux operating systems and is equipped with WiFi. Besides the main components mentioned above, an external battery is used to supply power for both the FitPC2 and the Kinect sensor.

Robot Operating System (ROS) is installed in the Linux operating system of the FitPC2 microcomputer. ROS is an open-source metaoperating system that provides services like a real operating system, including hardware abstraction, low-level device control, implementation of commonly used functionality, message passing between processes, and package management [25]. In this paper, all processes are developed in accordance with the ROS framework, which includes a driver for the Kinect sensor, a motion data receiver program, a motion data processing program, and a display program.
3.2. Motion Data Evaluation Based on VICON. In order to assess the accuracy of the motion data captured by a Kinect sensor, a VICON motion capture system was used as a reference. As angular velocity data is similar to acceleration data, we select angular velocity for evaluation. Firstly, several markers of the VICON system were fixed on the wrist of an experimenter, and then the experimenter made a variety of predefined gestures in front of the Kinect sensor. Finally, the pitch, roll, and yaw data recorded from the VICON system and from the Kinect sensor were sent to a PC, which plotted them on the same graph, as shown in Figure 8. The data from the VICON system are plotted with solid lines, while the data from the Kinect sensor are plotted with dotted lines. From these figures, we find that the Kinect sensor measured the motions accurately most of the time, except for one period of mismatch shown in Figure 8(a), which occurred because the arm swung out of the scope of the Kinect sensor. The mismatch between the two types of data was always less than 10 degrees. By comparison with the VICON system, we have confirmed that the motion data captured by the Kinect sensor meet the accuracy requirement of gesture recognition.
3.3. Continuous Gesture Recognition Test. In order to test the performance of the proposed algorithm, we predefine five gestures for experiments: "Move forward," "Move backward," "Stop," "Turn left," and "Turn right." The gesture "Waving front and back" means "Move forward," the gesture "Waving left and right" means "Move backward," the gesture "Waving up and down" means "Stop," the gesture "Swing counterclockwise" means "Turn left," and the gesture "Swing clockwise" means "Turn right," as shown in Figure 9.

We recorded 10 sets of data from each of three experimenters for training and testing, and these experiments were carried out in accordance with the following three steps.

Step 1. In the scope of a Kinect sensor, repeat gesture Type 1 15 times and take a 5-second break. Continue performing the remaining types in the same way until Type 5 is done, and then save the data.

Step 2. In the scope of a Kinect sensor, perform a sequence of 20 gestures with a break of about 2 seconds between gestures, and save the data.
Figure 8: Comparisons of Roll (a), Pitch (b), and Yaw (c) data between a VICON system and a Kinect sensor. [Plots omitted: each panel shows the angle in degrees against time in seconds (0 to 120 s) for the Kinect and VICON traces.]
Figure 9: Five predefined gestures for Human-Robot Interaction: waving front and back, waving left and right, waving up and down, swing counterclockwise, and swing clockwise.
Step 3. Process the training data and the testing data. Firstly, train the FNNs to distinguish gestures from nongestures; secondly, use each block of training data to train the LHMMs. To trade off computational complexity against efficiency and accuracy, set the number of states in the LHMMs to 20 and the number of distinct observation symbols to 20, and then use the trained LHMMs to recognize individual gestures in the test data. The output of each test is a sequence of recognized gestures. Finally, a Bayesian filter with sequential constraints in UHMMs is used to generate the most likely underlying gesture sequence.
3.3.1. Performance Evaluation of FNNs. A MATLAB neural network toolbox [26] is used for training the first and second layers of the FNNs. The initial values of weights and deviations are randomly selected, and different initial values lead to different performance. If the performance does not reach the target, the training phase must be restarted with a new neural network until sufficient accuracy is reached. For instance, within 180 iterations, two different sets of initial values gave two different performances, as shown in Figure 10, which presents one good and one bad neural network training result.
Figure 10: Performance comparison of gesture spotting between two FNNs: (a1) the performance goal is not met within 180 iterations; (a2) the output of FNNs, accuracy = 75.69%; (a3) the error of FNNs; (b1) the performance goal is met within 26 iterations; (b2) the output of FNNs, accuracy = 96.86%; (b3) the error of FNNs. [Plots omitted: panels (a1) and (b1) show the training performance (blue) and the goal (black) against epochs, with panel titles "Performance is 0.0919598, goal is 1e-005" and "Performance is 2.71473e-006, goal is 1e-005"; panels (a2) and (b2) show ground truth versus FNNs output against time (s); panels (a3) and (b3) show the error against time (s).]
After 180 iterations, the FNNs did not reach the target, as shown in Figure 10(a1). The parts circled in Figure 10(a3) show continuous errors, which led to detection errors in Figure 10(a2). Thus the FNNs achieve sufficient accuracy only if the performance curve reaches the goal. As the performance curve does not meet its target in Figure 10(a1), the training program needs to be restarted. In Figure 10(b1), the FNNs met the requirement after 26 iterations. Although there were some discrete errors in Figure 10(b3), they did not generate any detection error in Figure 10(b2). Therefore, we can conclude that several discrete errors on the edge of a gesture signal do not affect detection results, but consecutive errors can cause detection errors.
3.3.2. Performance Evaluation of MHMMs. We again recorded 10 sets of data from each of three experimenters for training and testing, and every set of sequence data has 20 gestures. The test includes two steps. Firstly, the trained LHMMs were used to recognize individual gestures in a sequence. Secondly, the decisions obtained in the first step were used in a Bayesian filter with sequential restrictions to generate the most likely command sequence as the final result. The test results for 3D angular velocity and 3D acceleration are shown in Figures 11 and 12, respectively. In Figure 11(a), the signal is a 3D angular velocity which includes twenty gestures captured by a Kinect sensor. The circled regions (A) and (B) in Figure 11(b) indicate two errors generated by the FNNs, which caused the segments to be shorter than their actual length after the start and end points of the gesture were detected. After LHMMs were used to recognize gestures, errors appeared in the circled regions (C) and (D) of Figure 11(c), which indicates that the length of a segment affects the accuracy of gesture recognition. After UHMMs were used to update the recognition results from LHMMs, accurate recognition results were obtained. Similarly, 3D acceleration data were also used for testing; recognition errors also appeared in LHMMs, as the four circled regions in Figure 12(c) show. However, after updating in UHMMs, accurate recognition results were also obtained.

Figure 11: Results of FNNs, LHMMs, and UHMMs: (a) the raw 3D angular velocity; (b) the output of FNNs; (c) the LHMMs decision results compared with the ground truth; (d) the UHMMs decision results compared with the ground truth. [Plots omitted: all panels are drawn against time (s); panel (a) shows the x, y, and z components, panel (b) marks regions (A) and (B), panel (c) marks regions (C) and (D), and panels (c) and (d) use the labels Stop, Turn right, Turn left, Move forward, Move backward, and Nongesture.]
We calculated the average recognition rate and the average recognition time over 30 sets of gestures composed of the five predefined gestures. Table 1 lists the average recognition accuracy and the average recognition time of HMMs and MHMMs, respectively. From the table we can conclude that the recognition accuracy of MHMMs, which use Bayesian filtering, is better than that of HMMs alone, and that the real-time performance of the MHMMs-based gesture recognition meets the requirement of online gesture recognition.
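The comparison drawn from Table 1 can be made concrete with a short calculation. The sketch below only averages the per-gesture values reported in Table 1; the decimal points are restored under the assumption that accuracies are percentages and times are fractions of a second, and the variable names are illustrative.

```python
# Values transcribed from Table 1 (accuracy in percent, time in seconds).
# The decimal placement is an assumption restored from the flattened table.
hmm_acc = [86.83, 91.24, 78.43, 85.92, 73.97]
mhmm_acc = [95.71, 97.28, 90.63, 93.42, 89.85]
hmm_time = [0.564, 0.629, 0.468, 0.693, 0.503]
mhmm_time = [0.637, 0.784, 0.615, 0.812, 0.759]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Difference of the per-method averages over the five gesture types.
acc_gain = mean(mhmm_acc) - mean(hmm_acc)
time_cost = mean(mhmm_time) - mean(hmm_time)

print(f"HMMs:  {mean(hmm_acc):.2f} % in {mean(hmm_time):.3f} s")
print(f"MHMMs: {mean(mhmm_acc):.2f} % in {mean(mhmm_time):.3f} s")
print(f"gain:  {acc_gain:.2f} points for {time_cost:.3f} s extra")
```

Under these assumptions the mean accuracy rises from about 83.3% to about 93.4%, while the mean recognition time grows from about 0.57 s to about 0.72 s.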
Table 1: Comparisons of average recognition accuracy and average recognition time between HMMs and MHMMs.

| Gesture type | HMMs accuracy (%) | MHMMs accuracy (%) | HMMs recognition time (s) | MHMMs recognition time (s) |
|---|---|---|---|---|
| 1 | 86.83 | 95.71 | 0.564 | 0.637 |
| 2 | 91.24 | 97.28 | 0.629 | 0.784 |
| 3 | 78.43 | 90.63 | 0.468 | 0.615 |
| 4 | 85.92 | 93.42 | 0.693 | 0.812 |
| 5 | 73.97 | 89.85 | 0.503 | 0.759 |

Figure 12: Results of FNNs, LHMMs, and UHMMs: (a) the raw 3D acceleration; (b) the output of FNNs; (c) the LHMMs decision results compared with the ground truth; (d) the UHMMs decision results compared with the ground truth. [Plots omitted: all panels are drawn against time (s); panel (a) is in m/s², and panels (c) and (d) use the labels Stop, Turn right, Turn left, Move forward, Move backward, and Nongesture.]

4. Conclusion

In this paper, a novel MHMMs-based continuous gesture recognition algorithm is proposed for Human-Robot Interaction, which uses raw motion data captured by a Kinect sensor and therefore differs from traditional vision-based methods. Firstly, a Kinect sensor was used to obtain the 3D acceleration and 3D angular velocity of an arm. Secondly, FNNs and a threshold criterion were used for gesture spotting and segmentation, respectively. Then the segmented gesture signals were sent into LHMMs to recognize individual gestures. After that, the identified gesture sequence was fed into UHMMs, where a Bayesian filter with sequential constraints was used to correct errors generated in LHMMs. The experimental results verify that the proposed algorithm not only effectively improves the accuracy of continuous gesture recognition but also has good real-time performance.
Acknowledgment
This work is supported by the National Natural Science Foundation of China (60277605).
References
[1] X. Wang, M. Xia, and H. Cai, "Hidden Markov models based dynamic hand gesture recognition," Mathematical Problems in Engineering, vol. 2012, Article ID 986134, 11 pages, 2012.
[2] A. Pantelopoulos and N. G. Bourbakis, "A survey on wearable sensor-based systems for health monitoring and prognosis," IEEE Transactions on Systems, Man and Cybernetics C, vol. 40, no. 1, pp. 1-12, 2010.
[3] J. Yang, J. Lee, and J. Choi, "Activity recognition based on RFID object usage for smart mobile devices," Journal of Computer Science and Technology, vol. 26, no. 2, pp. 239-246, 2011.
[4] R. Poppe, "A survey on vision-based human action recognition," Image and Vision Computing, vol. 42, no. 6, pp. 790-808, 2012.
[5] D. Weinland, R. Ronfard, and E. Boyer, "A survey of vision-based methods for action representation, segmentation and recognition," Computer Vision and Image Understanding, vol. 115, no. 2, pp. 224-241, 2011.
[6] L. Chen, J. Hoey, and C. D. Nugent, "Sensor-based activity recognition," Transactions on Systems, Man and Cybernetics, vol. 42, pp. 790-808, 2012.
[7] S. Bilal, R. Akmeliawati, A. A. Shafie, and M. J. E. Salami, "Hidden Markov model for human to computer interaction: a study on human hand gesture recognition," Artificial Intelligence Review, vol. 8, pp. 1-22, 2011.
[8] L. Bao and S. S. Intille, "Activity recognition from user-annotated acceleration data," Pervasive Computing, vol. 3001, pp. 1-17, 2004.
[9] C. Sutton, A. McCallum, and K. Rohanimanesh, "Dynamic conditional random fields: factorized probabilistic models for labeling and segmenting sequence data," Journal of Machine Learning Research, vol. 8, no. 2, pp. 693-723, 2007.
[10] O. Brdiczka, J. L. Crowley, and P. Reignier, "Learning situation models in a smart home," IEEE Transactions on Systems, Man and Cybernetics B, vol. 39, no. 1, pp. 56-63, 2009.
[11] F. Omar, M. Sinn, J. Truszkowski, P. Poupart, J. Tung, and A. Caine, "Comparative analysis of probabilistic models for activity recognition with an instrumented walker," in Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI '10), pp. 392-400, July 2010.
[12] F. J. Ordonez, P. D. Toledo, and A. Sanchis, "Activity recognition using hybrid generative/discriminative models on home environments using binary sensors," Sensors, vol. 13, no. 1, pp. 5460-5477, 2013.
[13] Z. Zhang, "Microsoft Kinect sensor and its effect," IEEE Multimedia, vol. 19, no. 2, pp. 4-10, 2012.
[14] R. Giraldo, D. M. Giraldo, and S. A. Meza, "Kernel based hand gesture recognition using Kinect sensor," in Proceedings of the 17th Symposium of Image, Signal Processing, and Artificial Vision, pp. 158-161, 2012.
[15] Z. Ren, J. Meng, J. Yuan, and Z. Zhang, "Robust hand gesture recognition with Kinect sensor," in Proceedings of the 19th ACM International Conference on Multimedia, ACM Multimedia 2011 (MM '11), pp. 759-760, December 2011.
[16] A. Bauer, K. Klasing, and G. Lidoris, "The autonomous city explorer: towards natural human-robot interactivity in urban environments," International Journal of Social Robotics, vol. 1, no. 3, pp. 127-140, 2009.
[17] H.-D. Yang, A.-Y. Park, and S.-W. Lee, "Gesture spotting and recognition for human-robot interaction," IEEE Transactions on Robotics, vol. 23, no. 2, pp. 256-270, 2007.
[18] M. T. Hagan, H. B. Demuth, and M. H. Beale, Neural Network Design, PWS Publishing Company, 1996.
[19] L. R. Rabiner, "Tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, 1989.
[20] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947-1951, 2006.
[21] W. Liu and W. Han, "Improved Viterbi algorithm in continuous speech recognition," in International Symposium in Information Technology, pp. 542-545, October 2010.
[22] O. Mourad, "HMMs parameters estimation using hybrid Baum-Welch genetic algorithm," in Proceedings of the International Conference on Computer Application and System Modeling, pp. 207-209, 2010.
[23] K. Hirasawa, M. Ohbayashi, M. Koga, and M. Harada, "Forward propagation universal learning network," in Proceedings of the IEEE International Conference on Neural Networks, pp. 353-358, June 1996.
[24] VICON, http://www.vicon.com/.
[25] M. Quigley, K. Conley, and B. P. Gerkey, "ROS: an open-source robot operating system," in Proceedings of the ICRA Workshop on Open Source Software, 2009.
[26] Neural, http://www.mathworks.com/products/neuralnet/.
saved and the real-time performance of the whole algorithm can be improved. Sequential constraints among gestures are modeled by MHMMs in the recognition module. A forward propagation procedure in the Bayesian filter is used to produce a posterior gesture decision and update the recognition result. The flow chart of the MHMMs-based continuous gesture recognition method is shown in Figure 2.
2.1. Gesture Spotting and Segment. In this paper, Feed-forward Neural Networks (FNNs) are designed to discover gesture data from signals [18]. The structure of a three-layer FNNs module is shown in Figure 3. The number of input nodes is $n$; the input variable of the input layer is $Y = (y_1, y_2, \ldots, y_n)^T$, which indicates an original input signal of actions; the output vector of the hidden layer is $a^2_i = (a^2_1, a^2_2, \ldots, a^2_m)^T$; the output vector of the output layer is $a^3_i = (a^3_1, a^3_2, \ldots, a^3_m)^T$, which is the gesture spotting result; the weight matrix from the input layer to the hidden layer is $W^1_i = (W^1_1, W^1_2, \ldots, W^1_m)$; the weight matrix from the hidden layer to the output layer is $W^2_i = (W^2_1, W^2_2, \ldots, W^2_m)$; and the sigmoid function $f(x) = (1 + e^{-x})^{-1}$ is used as the excitation function. A 3D acceleration $\alpha_i = [\alpha_x, \alpha_y, \alpha_z]^T$ is one kind of data extracted by the RGB-D sensor and used to train the FNNs. Let $a^3_i$ be the output of the FNNs:

$$a^3_i = \mathrm{hardlim}\left(f^2\left(W^2 f^1\left(W^1 \mu_i + b^1\right) + b^2\right) - 0.5\right), \tag{1}$$

where $W^1$, $W^2$, $b^1$, and $b^2$ are parameters of the FNNs.

Both network functions $f^1$ and $f^2$ are log-sigmoid functions, and $\mathrm{hardlim}$ is a hard limiting function. Therefore, the parameters of the FNNs can be trained by a back-propagation method. In the output layer, set $W^3 = 1$ and $b^3 = 0$ in order to produce a discrete output. The output of the FNNs is a binary number (1 or 0), which represents a gesture or a nongesture, respectively. The first layer and the second layer constitute a two-layer feed-forward network whose parameters are optimized through training. The weights and deviations (biases) of the output layer are then fixed in order to produce a discrete output.
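The forward pass in (1) can be sketched in a few lines. This is a minimal illustration and not the authors' MATLAB implementation: the helper names (`logsig`, `hardlim`, `affine`, `fnn_spot`) and the example weights are hypothetical, chosen only so that a strong input yields 1 (gesture) and a zero input yields 0 (nongesture).

```python
import math


def logsig(x):
    """Log-sigmoid excitation function f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def hardlim(x):
    """Hard limiter: 1 when x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0


def affine(W, x, b):
    """Compute W x + b for a matrix W (list of rows) and vectors x, b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def fnn_spot(W1, b1, W2, b2, mu):
    """Equation (1): hardlim(f2(W2 f1(W1 mu + b1) + b2) - 0.5)."""
    a1 = [logsig(v) for v in affine(W1, mu, b1)]   # first layer
    a2 = [logsig(v) for v in affine(W2, a1, b2)]   # second layer
    return hardlim(a2[0] - 0.5)                    # fixed third layer


# Hypothetical parameters for illustration only (not trained values).
W1 = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
b1 = [-2.0, -2.0, -2.0]
W2 = [[2.0, 2.0, 2.0]]
b2 = [-3.0]

print(fnn_spot(W1, b1, W2, b2, [1.0, 1.0, 1.0]))  # strong motion sample
print(fnn_spot(W1, b1, W2, b2, [0.0, 0.0, 0.0]))  # resting sample
```

Subtracting 0.5 before the hard limiter simply thresholds the sigmoid output of the second layer at its midpoint.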
Moreover, through training the neural networks, the weights and deviations of the FNNs can be optimized. Two digital counters continuously record the output of the FNNs. When the counters exceed their preset threshold values, the start point and the end point of a gesture are detected, respectively, so that erroneous classifications by the FNNs do not produce false gesture boundaries. When an end point of a gesture is detected, the spotting and segment module triggers the recognition module.
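The counter-based boundary detection can be sketched as follows. The paper does not give the counter update rules in detail, so this sketch assumes that each counter tracks a run of consecutive identical FNNs outputs and is reset when the output changes; the function name `segment` and the threshold value are illustrative.

```python
def segment(outputs, threshold=3):
    """Return (start, end) index pairs of gestures, with end exclusive.

    outputs: binary FNNs outputs (1 = gesture, 0 = nongesture).
    A start point is declared once more than `threshold` consecutive 1s
    are seen; an end point once more than `threshold` consecutive 0s follow.
    """
    segments = []
    gesture_run = 0   # counter of consecutive gesture samples
    idle_run = 0      # counter of consecutive nongesture samples
    start = None
    for i, y in enumerate(outputs):
        if y == 1:
            gesture_run += 1
            idle_run = 0
            if start is None and gesture_run > threshold:
                start = i - gesture_run + 1     # first sample of the run
        else:
            idle_run += 1
            gesture_run = 0
            if start is not None and idle_run > threshold:
                segments.append((start, i - idle_run + 1))
                start = None
    if start is not None:                       # gesture still open at the end
        segments.append((start, len(outputs)))
    return segments


# An isolated 0 inside the gesture and an isolated 1 after it are ignored.
stream = [0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(segment(stream, threshold=3))
```

Isolated misclassifications shorter than the threshold neither open nor close a segment, which matches the observation that discrete errors do not affect detection while consecutive errors do.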
2.2. Multilayer Hidden Markov Models. In daily life, people usually use gestures to command pets. These gestures follow certain reasonable patterns, which reflect sequential restrictions among continuous gestures. Therefore, these restrictions can be used to improve the accuracy of gesture recognition. Following this inspiration, this paper proposes a continuous gesture recognition algorithm based on MHMMs, a statistical model derived from HMMs. HMMs are a well-known statistical model for continuous data recognition, which has been widely used in speech recognition, handwriting recognition, and pattern recognition [19]. An HMM is characterized by a set of parameters $\lambda = (A, B, \pi)$, where $A$, $B$, and $\pi$ are the state transition probability distribution, the probability distribution of observation symbols in each state, and the initial state distribution, respectively. In this paper, a forward-backward algorithm is used to estimate the likelihood $P(O \mid \lambda)$ of an observation sequence given a specific HMM [20]. For a given observation sequence $O$ in the test mode, a Viterbi algorithm is used to find the best state sequence $Q$ [21]. An Expectation Maximization (EM) method is used to train the parameters of HMMs [22].
The proposed MHMMs are a generalization of a segmented model in which each segment has subsegments. MHMMs are divided into two layers: LHMMs are used to identify individual gestures in the continuous gesture signal, and a Bayesian filter with sequential constraints in UHMMs is used to correct recognition errors created in LHMMs. Figure 4 shows the basic idea of MHMMs. A time sequence is layered into several segments, where $S^1_i$ represents the state sequence of UHMMs and $S^2_i$ represents the state sequence of LHMMs; $S^2_i$ is the sub-HMMs of state sequence $S^1_i$, and $O^2_i$ denotes the decision of LHMMs.
2.2.1. Data Preprocessing. Before MHMMs are used for training and recognition on the gesture data segmented by FNNs, a preprocessing step must be completed, which includes removing high-frequency noise and finding the stroke duration of a gesture. Each raw data sample obtained by a Kinect sensor is a 6-component vector $U$ composed of a 3D acceleration $[\alpha_x, \alpha_y, \alpha_z]^T$ and a 3D angular velocity $[\omega_x, \omega_y, \omega_z]^T$, where

$$U = [\alpha_x, \alpha_y, \alpha_z, \omega_x, \omega_y, \omega_z]^T. \tag{2}$$

When the robot computer receives a set of data, the original data contain high-frequency noise which may affect gesture recognition, so a low-pass filter with a cutoff frequency of 5 Hz is used to remove the noise and smooth the data. Then a sliding window with a length of 100 ms is used to calculate the average, in order to remove the DC components of the 3D acceleration in the time domain and generate a deviation vector $w = [d_x, d_y, d_z]^T$. Additionally, a Fast Fourier Transform (FFT) is applied to the deviation vector $w$ in the frequency domain, and the maximum frequency among $d_x$, $d_y$, and $d_z$ is used as the fundamental frequency of a gesture. Finally, the data point $V$, composed of a 3D angular velocity vector, a 3D acceleration vector, and a 3D deviation vector of acceleration, is fed into MHMMs for gesture recognition, where

$$V = [\alpha_x, \alpha_y, \alpha_z, \omega_x, \omega_y, \omega_z, d_x, d_y, d_z]^T. \tag{3}$$
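The preprocessing chain can be illustrated with a small sketch. Several details are assumptions rather than the authors' implementation: the low-pass filter is omitted, the 100 ms window is taken as 3 samples at 30 fps, the moving average is a trailing one, and a naive discrete Fourier transform stands in for the FFT. All function names are hypothetical.

```python
import cmath
import math


def moving_average(x, win):
    """Trailing moving average over up to `win` samples."""
    out = []
    for i in range(len(x)):
        lo = max(0, i - win + 1)
        out.append(sum(x[lo:i + 1]) / (i + 1 - lo))
    return out


def deviation(x, win):
    """Remove the slowly varying (DC) part: sample minus moving average."""
    return [xi - mi for xi, mi in zip(x, moving_average(x, win))]


def dominant_frequency(x, fs):
    """Frequency in Hz of the largest non-DC bin of a naive DFT."""
    n = len(x)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        if abs(s) > best_mag:
            best_k, best_mag = k, abs(s)
    return best_k * fs / n


def feature_vectors(acc, gyro, win):
    """Build one 9D vector V per sample, as in (3).

    acc, gyro: three per-axis sample lists each (x, y, z).
    """
    dev = [deviation(axis, win) for axis in acc]
    return [[acc[0][t], acc[1][t], acc[2][t],
             gyro[0][t], gyro[1][t], gyro[2][t],
             dev[0][t], dev[1][t], dev[2][t]] for t in range(len(acc[0]))]


fs = 30.0   # frame rate stated for the Kinect sensor
win = 3     # 100 ms at 30 fps (assumed conversion)
t = [i / fs for i in range(60)]
ax = [2.0 + math.sin(2 * math.pi * 3.0 * ti) for ti in t]  # synthetic 3 Hz motion with offset
ay = [0.0] * 60
az = [9.8] * 60
gyro = [[0.0] * 60 for _ in range(3)]

V = feature_vectors([ax, ay, az], gyro, win)
f0 = dominant_frequency(deviation(ax, win), fs)
print(len(V), len(V[0]), f0)
```

In the described method the per-axis frequencies would then be compared and the maximum taken as the fundamental frequency of the gesture; the sketch evaluates only the one axis that carries motion.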
Figure 2: Flow chart of MHMMs-based continuous gesture recognition. [Diagram omitted: in the spotting and segment module, raw motion data pass through the FNNs; samples judged to be a gesture increase a gesture time counter and the others increase a nonsense gesture time counter, and a counter exceeding its threshold marks the start point or the end point of a gesture segment. In the recognition module, sliding windows, data preprocessing, K-means clustering, likelihood estimation based on each HMM, maximum likelihood estimation, and a Bayesian filter with constraints produce the recognition result; the decision is saved until the sliding window arrives at the end of the data.]

Figure 3: The structure of a three-layer FNNs. [Diagram omitted: the raw data input $y$ ($n \times 1$) passes through the first layer $a^1 = f^1(W^1 y + b^1)$, the second layer $a^2 = f^2(W^2 f^1(W^1 y + b^1) + b^2)$, and the third layer $a^3 = \mathrm{hardlim}(a^2 - 0.5)$.]

Figure 4: The architecture of the proposed MHMMs. [Diagram omitted: upper-layer states $S^1_1$ and $S^1_2$, each expanded into lower-layer states $S^2_i$ with observations $O^2_i$.]

Then a K-means clustering method is used to train centroids in order to quantize vectors into observable symbols. The process of symbolization converts feature vectors into a finite number of symbols, which can be used in the discrete HMMs. Namely, the K-means clustering method is applied to the vector $V$ to get the partition value for each vector and a set of centroids for clustering the data into observable symbols in the recognition phase. The K-means clustering method clusters $n$ objects into $k$ partitions, where $k < n$, based on the properties of the objects.
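The symbolization step can be sketched with a plain Lloyd-style K-means followed by nearest-centroid quantization. This is an illustrative sketch only: the deterministic initialization from the first $k$ vectors, the fixed iteration count, and the function names are assumptions, and 2D toy vectors replace the 9D vectors $V$.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v, centroids):
    """Index of the centroid closest to v; this index is the symbol."""
    return min(range(len(centroids)), key=lambda c: sqdist(v, centroids[c]))


def kmeans(vectors, k, iters=20):
    """Lloyd's algorithm with the first k vectors as initial centroids (assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        for c, g in enumerate(groups):
            if g:  # keep the old centroid when a cluster is empty
                centroids[c] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def symbolize(vectors, centroids):
    """Quantize feature vectors into observable symbols for a discrete HMM."""
    return [nearest(v, centroids) for v in vectors]


# Toy 2D data standing in for the 9D vectors V.
data = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
centroids = kmeans(data, k=2)
print(symbolize(data, centroids))
```

The centroids learned in training are kept, so that new vectors in the recognition phase are mapped to symbols by the same nearest-centroid rule.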
As shown in Figure 5, a sliding window of 1 second (150 data points) moves with a step of 30 data points, and the likelihood of each set of HMMs parameters is estimated for every window position. Each sliding window generates a likelihood value for each model. Therefore, for a certain gesture, the model that maximizes the likelihood over the other HMMs is chosen as the recognized type, which is the output decision of the sliding window.
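The window bookkeeping can be sketched as follows. The window length and step are the values given in the text; the per-model log-likelihood values, the majority vote over window decisions (mentioned later for the LHMMs output), and all function names are illustrative assumptions.

```python
from collections import Counter


def sliding_windows(n, size=150, step=30):
    """Return (start, end) index pairs, end exclusive, of full windows over n samples."""
    return [(s, s + size) for s in range(0, n - size + 1, step)]


def window_decision(loglik_by_model):
    """Pick the gesture label whose HMM gives the largest log-likelihood."""
    return max(loglik_by_model, key=loglik_by_model.get)


def majority(decisions):
    """Most frequent window decision within one segmented gesture."""
    return Counter(decisions).most_common(1)[0][0]


print(sliding_windows(300))
# Hypothetical log-likelihood values for one window.
print(window_decision({"Stop": -120.5, "Turn left": -98.2}))
print(majority(["Stop", "Stop", "Turn left"]))
```

With a 150-point window and a 30-point step, consecutive windows overlap by 120 points, so a 300-point segment yields six window decisions.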
2.2.2. LHMMs-Based Individual Gestures Recognition

(1) Training Phase. The training phase is divided into four steps: finding the stroke duration of a gesture, quantizing the vectors into observable symbols, setting up initial HMMs parameters, and iterating for Expectation Maximization.

Step 1 (find the stroke duration of a gesture). FFT is used to find the stroke duration of the gesture for HMMs.
Step 2 (quantify the vectors into observable symbols). After preprocessing, the raw data are used as the observation sequence $O = O_1 O_2 \cdots O_T$ for the HMMs with parameters $\lambda = (A, B, \pi)$. Given the parameters of the model, the conditional probability of the observation sequence can be calculated. Namely, the problem is to assess the probability of an observation sequence $P(O \mid \lambda)$ given a specific HMM; a forward-backward algorithm is used to realize the task as follows [20]. Define the forward variables

$$\alpha_t(i) = P(O_1 O_2 \cdots O_t,\ q_t = S_i \mid \lambda), \tag{4}$$

that is, given a model, the probability that the partial observation sequence $O_1 O_2 \cdots O_t$ has been output up to time $t$ and that the state is $S_i$ at time $t$. The forward variables can be calculated recursively by the following formula:

$$\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\right] b_j(O_{t+1}), \tag{5}$$

where $1 \le t \le T - 1$ and $1 \le j \le N$.
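The recursion in (5) can be written as a short routine. This is a minimal sketch for a discrete HMM; the initialization $\alpha_1(i) = \pi_i b_i(O_1)$ and the termination $P(O \mid \lambda) = \sum_i \alpha_T(i)$ are the standard ones from the HMM literature and are assumed here, since the text states only the recursion. The toy parameters are illustrative.

```python
def forward(A, B, pi, obs):
    """Return P(O | lambda) for a discrete HMM via the forward recursion.

    A[i][j]: transition probability from state i to state j.
    B[j][k]: probability of emitting symbol k in state j.
    pi[i]:   initial state probability.
    obs:     list of observed symbol indices O_1 ... O_T.
    """
    n = len(pi)
    # Assumed standard initialization: alpha_1(i) = pi_i * b_i(O_1).
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        # Equation (5): alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] * b_j(O_{t+1}).
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


# Toy two-state, two-symbol model (illustrative values).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]

print(forward(A, B, pi, [0, 1]))
```

For long sequences the products underflow, so practical implementations usually rescale $\alpha_t$ at each step or work with logarithms.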
Figure 5: Sliding windows for preprocessing. [Plot omitted: the gesture signal segmented by FNNs ($a_x$, $a_y$, $a_z$) with successive sliding windows overlaid.]
Define the backward variables

$$\beta_t(i) = P(O_{t+1} O_{t+2} \cdots O_T \mid q_t = S_i,\ \lambda), \tag{6}$$

that is, given the model parameters and the state $S_i$ at time $t$, the probability of the partial observation sequence $O_{t+1} O_{t+2} \cdots O_T$ from time $t + 1$ to the end of the sequence. Similarly, a recursive method can be used to calculate $\beta_t(i)$ as follows:

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j), \tag{7}$$

where $t = T - 1, T - 2, \ldots, 1$ and $1 \le i \le N$.

Then a K-means clustering method is applied to the 9D vector $V$ to get the partition value for each vector and a set of centroids for clustering the data into observable symbols in the recognition phase.
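The backward recursion in (7) can be sketched in the same way as the forward one. The initialization $\beta_T(i) = 1$ and the termination $P(O \mid \lambda) = \sum_i \pi_i b_i(O_1) \beta_1(i)$ are the standard choices and are assumed here; the function name and toy parameters are illustrative.

```python
def backward(A, B, pi, obs):
    """Return P(O | lambda) for a discrete HMM via the backward recursion."""
    n = len(pi)
    beta = [1.0] * n                      # assumed initialization: beta_T(i) = 1
    for o in reversed(obs[1:]):
        # Equation (7): beta_t(i) = sum_j a_ij * b_j(O_{t+1}) * beta_{t+1}(j).
        beta = [sum(A[i][j] * B[j][o] * beta[j] for j in range(n))
                for i in range(n)]
    # Termination: combine beta_1 with the initial distribution and O_1.
    return sum(pi[i] * B[i][obs[0]] * beta[i] for i in range(n))


# Toy two-state, two-symbol model (illustrative values).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]

print(backward(A, B, pi, [0, 1]))
```

The forward and backward passes give the same sequence likelihood, which is a convenient consistency check when implementing Baum-Welch training.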
Step 3 (set up initial HMMs parameters). Set up the number of states in the model. The initial values for the observable symbols $O = O_1 O_2 \cdots O_T$ and for $\lambda = (A, B, \pi)$ in each state are refined iteratively and should meet the randomness (stochastic) constraints of HMMs parameters. Then a Viterbi algorithm is used to find the single best state sequence $Q = q_1 q_2 \cdots q_T$ [21].
Step 4 (expectation maximization iteration). A Baum-Welch algorithm is used to calculate an auxiliary function and maximize the likelihood value of $\bar{\lambda}$ [22]. The iterations proceed until the likelihood approaches a steady value. The expectation value $Q(\lambda, \bar{\lambda})$ and the maximized likelihood $\bar{\lambda}$ are calculated as follows:

$$Q(\lambda, \bar{\lambda}) = \sum_{Q} P(Q \mid O, \lambda) \log\left[P(O, Q \mid \bar{\lambda})\right],$$

$$\max_{\bar{\lambda}} \left[Q(\lambda, \bar{\lambda})\right] \Longrightarrow P(O \mid \bar{\lambda}) > P(O \mid \lambda). \tag{8}$$
Figure 6: (a) Transitions among UHMMs states $S_1$ to $S_5$ (Move forward, Stop, Move backward, Turn right, Turn left), which consider sequential constraints of context information; (b) update of the gesture recognition result by UHMMs, with states $q_0, q_1, \ldots, q_t, q_{t+1}$ and observations $O_0, O_1, \ldots, O_t, O_{t+1}$.

(2) Recognition Phase. During the recognition phase, after a group of centroids for K-means clustering has been trained, several groups of HMMs are formed. Then we calculate the likelihood of each group of HMMs parameters. Given the observable sequence $O$, the Viterbi algorithm is used to find the single best state sequence $Q$, and the HMMs with the greatest likelihood are identified as the most likely gesture type.
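The recognition phase can be sketched with a log-domain Viterbi pass per gesture model, followed by a maximum over models. This is an illustration under assumptions: the score of the single best path is used as the model score (an approximation of the full likelihood), the two toy models and their labels are invented, and the function names are hypothetical.

```python
import math


def viterbi(A, B, pi, obs):
    """Return (best_log_prob, best_state_path) for a discrete HMM."""
    n = len(pi)

    def lg(p):
        return math.log(p) if p > 0 else float("-inf")

    delta = [lg(pi[i]) + lg(B[i][obs[0]]) for i in range(n)]
    back = []
    for o in obs[1:]:
        prev = delta
        # Best predecessor of each state j at this step.
        ptr = [max(range(n), key=lambda i: prev[i] + lg(A[i][j]))
               for j in range(n)]
        delta = [prev[ptr[j]] + lg(A[ptr[j]][j]) + lg(B[j][o])
                 for j in range(n)]
        back.append(ptr)
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(back):          # follow the back-pointers
        path.append(ptr[path[-1]])
    path.reverse()
    return delta[last], path


def recognize(models, obs):
    """Return the label of the model with the highest best-path score."""
    return max(models, key=lambda name: viterbi(*models[name], obs)[0])


# Two toy gesture models (illustrative values), each given as (A, B, pi).
A = [[0.7, 0.3], [0.4, 0.6]]
pi = [0.5, 0.5]
models = {
    "gesture a": (A, [[0.9, 0.1], [0.2, 0.8]], pi),
    "gesture b": (A, [[0.1, 0.9], [0.8, 0.2]], pi),
}

score, path = viterbi(*models["gesture a"], [0, 1])
print(math.exp(score), path)
print(recognize(models, [0, 0, 0]))
```

Replacing the best-path score with the forward-algorithm likelihood would follow the text more literally; both choices rank models in the same way on this toy input.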
2.2.3. UHMMs-Based Continuous Gesture Update. In order to improve accuracy in decision making, we consider the sequential conditions of "Context" in UHMMs, where a Bayesian filter is used to update results from LHMMs. Five gestures are predefined: "Move forward," "Move backward," "Turn left," "Turn right," and "Stop." The "Context" is used as sequential constraints among different types of gestures. For example, the event that someone continuously sends the same command to a robot has a small probability; likewise, someone who has sent a "Move forward" command should send a "Stop" command before sending a "Move backward" command. Figure 6(a) shows the transitions among UHMMs, which are discrete first-order HMMs with five states and five observation symbols. UHMMs can be described as a sequence of gestures that is, at any time, in one of a set of $N$ ($N = 5$) distinct states $S_1, S_2, \ldots, S_5$. The system undergoes a change of state according to a set of probabilities associated with the state. Each arc represents a probability of transition between two adjacent states. The moments associated with changes of state are indexed as $k = 1, 2, \ldots, N$, and the $k$th actual state is $q_k$. The probability $a_{ij}$ described below relates the current state to the previous state:

$$a_{ij} = P[q_k = S_j \mid q_{k-1} = S_i]. \tag{9}$$
The initial state distribution, which is the probability distribution of the first state, is defined as

$$\pi = P[q_1 = S_i], \quad i = 1, 2, \ldots, N. \tag{10}$$
Another element of UHMMs is the probability distribution of observation symbols in state $S_j$:

$$b_j(k) = P[O_k \mid q_t = S_j], \tag{11}$$

where $b_j$ gives the probability with which the state is identified as each of the different observation symbols, and $O_k$ represents a decision made by LHMMs.

Figure 6(b) shows a sequence in UHMMs, where the state $q_t$ represents the $t$th gesture and $O_t$ is the majority decision result given by LHMMs. A forward propagation method [23] is used to update the joint probability of the observations gathered until time $t$. In this paper, the model is defined as follows:

$P(O_{t+1} \mid q_{t+1} = S_j)$ represents $b_j(O_{t+1})$, where the state at time $t + 1$ is $S_j$ and the observation is $O_{t+1}$;

$P(q_{t+1} = S_j \mid q_t = S_i)$ is the transition probability $a_{ij}$ in matrix $A$.
The forward variable 120572119905(119894) is defined as the probability of
the observation sequence O1O2 O119905 and state 119878
119894at time 119905
given the model 120582
120572119905(119894) = P (O
1O2 O119905 119902119905= 119878119894| 120582) (12)
where $1 \le i \le N$. According to the network structure shown in Figure 6(b), we can conclude

$$\alpha_{t+1}(j) = P(O_{t+1} \mid q_{t+1} = S_j) \times \sum_{S_i} P(q_{t+1} = S_j \mid q_t = S_i)\, \alpha_t(i), \tag{13}$$

with the initial condition, taken as a uniform distribution,

$$\alpha_0(i) = P(q_0 = S_i) = \pi_i. \tag{14}$$
For the UHMMs, the training parameters $\lambda = (A, B, \pi)$ are obtained by observing the interaction between the experimenter and the robot, so these parameters vary from person to person. The transition matrix $A$ is estimated from the statistics of the actually observed gestures. The observation symbol probability distribution $B$ is the accuracy matrix of individual gestures from the LHMMs. The forward variable is updated once the current observation has been obtained, and it represents the posterior probability of the current state given the context constraints of the UHMMs. The updated result is therefore the state with the maximum a posteriori probability.
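The update in (13) can be illustrated with a minimal sketch. The matrices `A` and `B` below are hypothetical values chosen only for illustration (the paper estimates them from observed interaction and from LHMM accuracy), and the normalization step is an added assumption so that the forward variable can be read as a posterior.

```python
from typing import List

GESTURES = ["Move forward", "Move backward", "Turn left", "Turn right", "Stop"]

# Hypothetical transition matrix A[i][j] = P(q_{t+1} = S_j | q_t = S_i).
# Repeating a command and reversing without stopping are made unlikely.
A = [
    [0.05, 0.05, 0.20, 0.20, 0.50],
    [0.05, 0.05, 0.20, 0.20, 0.50],
    [0.30, 0.10, 0.05, 0.05, 0.50],
    [0.30, 0.10, 0.05, 0.05, 0.50],
    [0.30, 0.30, 0.15, 0.15, 0.10],
]

# Hypothetical accuracy matrix B[j][k] = P(LHMM reports gesture k | true gesture j).
B = [
    [0.85, 0.05, 0.03, 0.03, 0.04],
    [0.05, 0.80, 0.05, 0.05, 0.05],
    [0.03, 0.04, 0.85, 0.05, 0.03],
    [0.03, 0.04, 0.05, 0.85, 0.03],
    [0.04, 0.15, 0.03, 0.03, 0.75],
]


def forward_step(alpha: List[float], obs: int) -> List[float]:
    """Apply (13) once, then normalize (normalization is an assumption)."""
    n = len(alpha)
    updated = []
    for j in range(n):
        predicted = sum(A[i][j] * alpha[i] for i in range(n))
        updated.append(B[j][obs] * predicted)
    total = sum(updated)
    return [value / total for value in updated]


# The previous gesture is known to be "Move forward"; the LHMMs now report
# "Move backward", which the context makes unlikely without a "Stop" first.
alpha_prev = [1.0, 0.0, 0.0, 0.0, 0.0]
posterior = forward_step(alpha_prev, GESTURES.index("Move backward"))
best = GESTURES[max(range(len(posterior)), key=lambda j: posterior[j])]
```

With these illustrative numbers the filter overrides the LHMM report and selects "Stop", which is the kind of context-based correction the section describes.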
Figure 7: Experimental environment and a Kinect mobile robot. (a) VICON motion capture system; (b) Kinect sensor, FitPC2 microcomputer, and Pioneer 3DX mobile robot.
3. Experiments

3.1. Introduction of Hardware Platform. An experimental environment in which a VICON motion capture system [24] and a Kinect mobile robot platform are installed is shown in Figures 7(a) and 7(b), respectively. The mobile robot platform mainly comprises a Pioneer 3DX mobile robot, a microcomputer called FitPC2, and a Kinect sensor, which combines an RGB camera and a depth sensor. The effective sensing range of the Kinect sensor is from 0.4 meters to 4 meters, the vertical viewing angle range is ±43°, the horizontal range is ±57°, and the frame rate is 30 fps. The FitPC2 is a fanless minicomputer that can run Windows or Linux and is equipped with WiFi. Besides these main components, an external battery supplies power to both the FitPC2 and the Kinect sensor.

Robot Operating System (ROS) is installed on the Linux operating system of the FitPC2. ROS is an open-source metaoperating system that provides the services expected of an operating system, including hardware abstraction, low-level device control, implementation of commonly used functionality, message passing between processes, and package management [25]. In this paper, all processes are developed in accordance with the ROS framework; they include a Kinect sensor driver, a motion data receiver program, a motion data processing program, and a display program.
3.2. Motion Data Evaluation Based on VICON. To assess the accuracy of the motion data captured by the Kinect sensor, a VICON motion capture system was used as a reference. As angular velocity data is similar to acceleration data, we select angular velocity for evaluation. First, several markers of the VICON system were fixed on the wrist of an experimenter, and the experimenter then made a variety of predefined gestures in front of the Kinect sensor. Finally, the pitch, roll, and yaw data recorded from the VICON system and from the Kinect sensor were sent to a PC, which plotted them on the same graph, as shown in Figure 8. The data from the VICON system is plotted with solid lines, and the data from the Kinect sensor is plotted with dotted lines. From these figures, we find that the Kinect sensor measured the motions accurately most of the time. The exception is one period in Figure 8(a) where the data did not match, because the arm swung out of the field of view of the Kinect sensor. The mismatch between the two types of data was always less than 10 degrees. By comparison with the VICON system, we have confirmed that the motion data captured by the Kinect sensor meets the accuracy requirement of gesture recognition.
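The 10-degree check described above can be sketched as follows. The angle samples are hypothetical and are assumed to be already time-aligned and resampled to common instants; the paper does not state how the two streams were synchronized.

```python
from typing import Sequence


def max_abs_mismatch(kinect_deg: Sequence[float], vicon_deg: Sequence[float]) -> float:
    """Largest absolute difference between two time-aligned angle series."""
    if len(kinect_deg) != len(vicon_deg):
        raise ValueError("series must have the same number of samples")
    return max(abs(k - v) for k, v in zip(kinect_deg, vicon_deg))


# Hypothetical roll angles in degrees, sampled at the same instants.
kinect_roll = [0.0, 12.5, 30.1, 44.0, 29.0]
vicon_roll = [0.5, 14.0, 33.0, 47.5, 27.0]

mismatch = max_abs_mismatch(kinect_roll, vicon_roll)
within_tolerance = mismatch < 10.0  # the bound reported in the text
```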
3.3. Continuous Gesture Recognition Test. To test the performance of the proposed algorithm, we predefine five gestures for the experiments: “Move forward,” “Move backward,” “Stop,” “Turn left,” and “Turn right.” The gesture “Waving front and back” means “Move forward,” the gesture “Waving left and right” means “Move backward,” the gesture “Waving up and down” means “Stop,” the gesture “Swing counterclockwise” means “Turn left,” and the gesture “Swing clockwise” means “Turn right,” as shown in Figure 9.

We recorded 10 sets of data from each of three experimenters for training and testing, and the experiments were carried out according to the following three steps.

Step 1. Within the field of view of the Kinect sensor, repeat gesture Type 1 15 times and take a 5-second break. Continue with the remaining types in the same way until Type 5 is done, and then save the data.

Step 2. Within the field of view of the Kinect sensor, perform a sequence of 20 gestures with a break of about 2 seconds between gestures, and save the data.
Figure 8: Comparisons of roll (a), pitch (b), and yaw (c) data between a VICON system and a Kinect sensor. Each panel plots angle (degrees) against time (s) from 0 to 120 s for the Kinect and VICON data.

Figure 9: Five predefined gestures for Human-Robot Interaction: waving front and back, waving left and right, waving up and down, swing counterclockwise, and swing clockwise.
Step 3. Process the training data and the testing data. First, train the FNNs to distinguish gestures from nongestures. Second, use each block of training data to train the LHMMs. To trade off computational complexity against efficiency and accuracy, set the number of states in each LHMM to 20 and the number of distinct observation symbols to 20. Then use the trained LHMMs to recognize individual gestures in the test data. The output of each test is a sequence of recognized gestures. Finally, Bayesian filtering with sequential constraints in the UHMMs is used to generate the most likely underlying gesture sequence.
3.3.1. Performance Evaluation of FNNs. The MATLAB Neural Network Toolbox [26] is used to train the first and second layers of the FNNs. The initial values of the weights and biases are randomly selected, and different initial values lead to different performance. If the performance does not reach the target, the training phase is restarted with a new neural network until sufficient accuracy is reached. For instance, within 180 iterations, two different sets of initial values gave two different performances, as shown in Figure 10, which presents one good and one bad result of neural network training.
Figure 10: Performance comparison of gesture spotting between two FNNs. (a1) The performance goal is not met within 180 iterations; (a2) the output of FNNs (ground truth versus output against time in s), accuracy = 75.69%; (a3) the error of FNNs against time in s; (b1) the performance goal is met within 26 iterations; (b2) the output of FNNs, accuracy = 96.86%; (b3) the error of FNNs.
After 180 iterations, the FNNs did not reach the target, as shown in Figure 10(a1). The parts circled in Figure 10(a3) show continuous errors, which led to detection errors in Figure 10(a2). Thus, the FNNs achieve sufficient accuracy only if the performance curve reaches the goal. Because the performance curve does not meet its target in Figure 10(a1), the training program needs to be restarted. In Figure 10(b1), the FNNs met the requirement after 26 iterations. Although there were some discrete errors in Figure 10(b3), they did not generate any detection error in Figure 10(b2). Therefore, we can conclude that several discrete errors at the edge of a gesture signal do not affect detection results, but consecutive errors can cause detection errors.
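Why isolated errors are harmless while consecutive errors are not can be illustrated with a small sketch of counter-based spotting. It assumes, as a simplification of the counters and thresholds in Figure 2, that a gesture starts or ends only after more than `threshold` consecutive frames carry the same FNN label. The function name, the label sequences, and the threshold value are hypothetical.

```python
from typing import List, Tuple


def segment(labels: List[int], threshold: int) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, of detected gestures.

    labels holds per-frame FNN outputs: 1 for gesture, 0 for nongesture.
    A state change is accepted only after a run longer than `threshold`.
    """
    segments = []
    in_gesture = False
    run_value, run_start, start = None, 0, 0
    for idx, value in enumerate(labels):
        if value != run_value:  # a new run of identical labels begins
            run_value, run_start = value, idx
        run_length = idx - run_start + 1
        if not in_gesture and value == 1 and run_length > threshold:
            in_gesture, start = True, run_start
        elif in_gesture and value == 0 and run_length > threshold:
            in_gesture = False
            segments.append((start, run_start))
    if in_gesture:  # signal ended while still inside a gesture
        segments.append((start, len(labels)))
    return segments


clean = [0] * 5 + [1] * 10 + [0] * 5

isolated = list(clean)
isolated[2] = 1  # single false positive outside the gesture
isolated[9] = 0  # single false negative inside the gesture

consecutive = list(clean)
for i in range(8, 12):  # four false negatives in a row
    consecutive[i] = 0

clean_segments = segment(clean, threshold=2)
isolated_segments = segment(isolated, threshold=2)
consecutive_segments = segment(consecutive, threshold=2)
```

In this toy case the isolated errors leave the detected segment unchanged, whereas the run of consecutive errors splits the gesture into two shortened segments.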
3.3.2. Performance Evaluation of MHMMs. We again recorded 10 sets of data from each of three experimenters for training and testing, and every set of sequence data contains 20 gestures. The test includes two steps. First, the LHMMs were trained to recognize the individual gestures in a sequence. Second, the decisions obtained in the first step were used in a Bayesian filter with sequential restrictions to generate the most likely underlying command sequence as the final result. The test results for 3D angular velocity and 3D acceleration are shown in Figures 11 and 12, respectively. In Figure 11(a), the signal is a 3D angular velocity that includes twenty gestures captured by a Kinect sensor. The circled regions (A) and (B) in Figure 11(b) indicate two errors generated by the FNNs, which caused the segments to be shorter than their actual length after the start and end points of the gesture were detected. When the LHMMs were used to recognize gestures, errors appeared, as shown in the circled regions (C) and (D) of Figure 11(c), which indicates that the length of a segment affects the accuracy of gesture recognition. After the UHMMs were used to update the recognition results from the LHMMs, the output gave accurate recognition results. Similarly, when 3D acceleration data was used for testing, recognition errors also appeared in the LHMMs, as shown by the four circled regions in Figure 12(c); after updating in the UHMMs, accurate recognition results were again obtained.

Figure 11: Results of FNNs, LHMMs, and UHMMs. (a) The raw 3D angular velocity (x, y, z) against time in s; (b) the output of FNNs, with circled regions (A) and (B); (c) the LHMMs decision results compared with the ground truth, with circled regions (C) and (D); (d) the UHMMs decision results compared with the ground truth.
We calculated the average recognition rate and the average recognition time over 30 sets of gestures composed of the five predefined gestures. Table 1 lists the average recognition accuracy and the average recognition time of HMMs and MHMMs, respectively. From the table, we can conclude that the recognition accuracy of MHMMs, which use Bayesian filtering, is better than that of HMMs alone, and that the real-time performance of MHMMs-based gesture recognition meets the requirement of online gesture recognition.
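The comparison drawn from Table 1 can be made concrete with a short calculation. The values below are the Table 1 entries with decimal points restored, which is an assumption: the extracted text lost them, and accuracies are read as percentages and times as seconds.

```python
from typing import List

# Table 1 values per gesture type 1..5 (decimal placement is an assumption).
hmm_accuracy = [86.83, 91.24, 78.43, 85.92, 73.97]   # percent
mhmm_accuracy = [95.71, 97.28, 90.63, 93.42, 89.85]  # percent
hmm_time = [0.564, 0.629, 0.468, 0.693, 0.503]       # seconds
mhmm_time = [0.637, 0.784, 0.615, 0.812, 0.759]      # seconds


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Average gain in accuracy and average extra time of MHMMs over HMMs.
accuracy_gain = mean(mhmm_accuracy) - mean(hmm_accuracy)
extra_time = mean(mhmm_time) - mean(hmm_time)
```

Under that reading, the Bayesian filtering layer raises the mean accuracy by about 10 percentage points at a cost of roughly 0.15 s of additional recognition time per gesture.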
Table 1: Comparisons of average recognition accuracy and average recognition time between HMMs and MHMMs.

Gesture type   HMMs accuracy (%)   MHMMs accuracy (%)   HMMs recognition time (s)   MHMMs recognition time (s)
1              86.83               95.71                0.564                       0.637
2              91.24               97.28                0.629                       0.784
3              78.43               90.63                0.468                       0.615
4              85.92               93.42                0.693                       0.812
5              73.97               89.85                0.503                       0.759

Figure 12: Results of FNNs, LHMMs, and UHMMs. (a) The raw 3D acceleration (m/s²) against time in s; (b) the output of FNNs; (c) the LHMMs decision results compared with the ground truth; (d) the UHMMs decision results compared with the ground truth.

4. Conclusion

In this paper, a novel MHMMs continuous gesture recognition algorithm is proposed for Human-Robot Interaction. It uses raw motion data captured by a Kinect sensor and therefore differs from traditional vision-based methods. First, a Kinect sensor was used to obtain the 3D acceleration and 3D angular velocity of an arm. Second, FNNs and a threshold criterion were used for gesture spotting and segmentation, respectively. The segmented gesture signals were then sent into LHMMs to recognize individual gestures. After that, the identified gesture sequence was fed into UHMMs, where a Bayesian filter with sequential constraints was used to correct errors generated in the LHMMs. The experimental results verify that the proposed algorithm not only effectively improves the accuracy of continuous gesture recognition but also has good real-time performance.
Acknowledgment
This work is supported by the National Natural Science Foundation of China (60277605).
References
[1] X. Wang, M. Xia, and H. Cai, “Hidden Markov models based dynamic hand gesture recognition,” Mathematical Problems in Engineering, vol. 2012, Article ID 986134, 11 pages, 2012.
[2] A. Pantelopoulos and N. G. Bourbakis, “A survey on wearable sensor-based systems for health monitoring and prognosis,” IEEE Transactions on Systems, Man and Cybernetics C, vol. 40, no. 1, pp. 1–12, 2010.
[3] J. Yang, J. Lee, and J. Choi, “Activity recognition based on RFID object usage for smart mobile devices,” Journal of Computer Science and Technology, vol. 26, no. 2, pp. 239–246, 2011.
[4] R. Poppe, “A survey on vision-based human action recognition,” Image and Vision Computing, vol. 42, no. 6, pp. 790–808, 2012.
[5] D. Weinland, R. Ronfard, and E. Boyer, “A survey of vision-based methods for action representation, segmentation and recognition,” Computer Vision and Image Understanding, vol. 115, no. 2, pp. 224–241, 2011.
[6] L. Chen, J. Hoey, and C. D. Nugent, “Sensor-based activity recognition,” Transactions on Systems, Man and Cybernetics, vol. 42, pp. 790–808, 2012.
[7] S. Bilal, R. Akmeliawati, A. A. Shafie, and M. J. E. Salami, “Hidden Markov model for human to computer interaction: a study on human hand gesture recognition,” Artificial Intelligence Review, vol. 8, pp. 1–22, 2011.
[8] L. Bao and S. S. Intille, “Activity recognition from user-annotated acceleration data,” Pervasive Computing, vol. 3001, pp. 1–17, 2004.
[9] C. Sutton, A. McCallum, and K. Rohanimanesh, “Dynamic conditional random fields: factorized probabilistic models for labeling and segmenting sequence data,” Journal of Machine Learning Research, vol. 8, no. 2, pp. 693–723, 2007.
[10] O. Brdiczka, J. L. Crowley, and P. Reignier, “Learning situation models in a smart home,” IEEE Transactions on Systems, Man and Cybernetics B, vol. 39, no. 1, pp. 56–63, 2009.
[11] F. Omar, M. Sinn, J. Truszkowski, P. Poupart, J. Tung, and A. Caine, “Comparative analysis of probabilistic models for activity recognition with an instrumented walker,” in Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI ’10), pp. 392–400, July 2010.
[12] F. J. Ordonez, P. D. Toledo, and A. Sanchis, “Activity recognition using hybrid generative/discriminative models on home environments using binary sensors,” Sensors, vol. 13, no. 1, pp. 5460–5477, 2013.
[13] Z. Zhang, “Microsoft Kinect sensor and its effect,” IEEE MultiMedia, vol. 19, no. 2, pp. 4–10, 2012.
[14] R. Giraldo, D. M. Giraldo, and S. A. Meza, “Kernel based hand gesture recognition using Kinect sensor,” in Proceedings of the 17th Symposium of Image, Signal Processing, and Artificial Vision, pp. 158–161, 2012.
[15] Z. Ren, J. Meng, J. Yuan, and Z. Zhang, “Robust hand gesture recognition with Kinect sensor,” in Proceedings of the 19th ACM International Conference on Multimedia, ACM Multimedia 2011 (MM ’11), pp. 759–760, December 2011.
[16] A. Bauer, K. Klasing, and G. Lidoris, “The autonomous city explorer: towards natural human-robot interactivity in urban environments,” International Journal of Social Robotics, vol. 1, no. 3, pp. 127–140, 2009.
[17] H.-D. Yang, A.-Y. Park, and S.-W. Lee, “Gesture spotting and recognition for human-robot interaction,” IEEE Transactions on Robotics, vol. 23, no. 2, pp. 256–270, 2007.
[18] M. T. Hagan, H. B. Demuth, and M. H. Beale, Neural Network Design, PWS Publishing Company, 1996.
[19] L. R. Rabiner, “Tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[20] S.-Z. Yu and H. Kobayashi, “Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model,” IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[21] W. Liu and W. Han, “Improved Viterbi algorithm in continuous speech recognition,” in International Symposium in Information Technology, pp. 542–545, October 2010.
[22] O. Mourad, “HMMs parameters estimation using hybrid Baum-Welch genetic algorithm,” in Proceedings of the International Conference on Computer Application and System Modeling, pp. 207–209, 2010.
[23] K. Hirasawa, M. Ohbayashi, M. Koga, and M. Harada, “Forward propagation universal learning network,” in Proceedings of the IEEE International Conference on Neural Networks, pp. 353–358, June 1996.
[24] VICON, http://www.vicon.com.
[25] M. Quigley, K. Conley, and B. P. Gerkey, “ROS: an open-source robot operating system,” in Proceedings of the ICRA Workshop on Open Source Software, 2009.
[26] Neural, http://www.mathworks.com/products/neuralnet.
Figure 2: Flow chart of MHMMs-based continuous gesture recognition. Spotting and segment module: raw motion data are passed to the FNNs, which decide whether each sample is a gesture; a gesture time counter and a nonsense gesture time counter are compared with thresholds to mark the start point and the end point of a gesture segment. Recognition module: sliding windows, data preprocessing, K-means clustering, likelihood estimation based on each HMM, maximum likelihood estimation, and saving of the decision are repeated until the sliding window arrives at the end of the data, after which a Bayesian filter with constraints produces the recognition result.
Figure 3: The structure of a three-layer FNNs. The raw data input $y = (y_1, y_2, \ldots, y_n)$ passes through the first, second, and third layers, with $a_1 = f_1(W_1 y + b_1)$, $a_2 = f_2(W_2 f_1(W_1 y + b_1) + b_2)$, and $a_3 = \mathrm{hardlim}(a_2 - 0.5)$.
Figure 4: The architecture of the proposed MHMMs, drawn as state nodes S and observation nodes O arranged in two layers.
HMMs. Namely, the K-means clustering method is applied to the vector $V$ to obtain the partition value for each vector, together with a set of centroids used to cluster the data into observable symbols in the recognition phase. The K-means clustering method partitions $n$ objects into $k$ clusters, where $k < n$, based on the properties of the objects.
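A minimal sketch of this vector symbolization follows. It uses plain K-means with fixed initial centroids and 2D toy vectors for brevity; the paper's feature vectors, number of clusters, and initialization are not reproduced here, and all names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vector: Vector, centroids: List[List[float]]) -> int:
    """Index of the closest centroid; this index is the observable symbol."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(vector, centroids[c]))


def kmeans(vectors: List[Vector], initial: List[Vector],
           iterations: int = 10) -> List[List[float]]:
    """Plain K-means: assign each vector, then recompute each centroid."""
    centroids = [list(c) for c in initial]
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in centroids]
        for v in vectors:
            clusters[nearest(v, centroids)].append(v)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Toy training vectors forming two well-separated groups.
training = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
            (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids = kmeans(training, initial=[training[0], training[3]])

# Recognition phase: map vectors to symbols with the stored centroids.
symbols = [nearest(v, centroids) for v in training]
new_symbol = nearest((4.0, 4.5), centroids)
```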
As shown in Figure 5, a sliding window of 1 second (150 data points) moves in steps of 30 data points, and the likelihood of each set of HMM parameters is estimated for every window position. Each sliding window thus generates a likelihood value for every model. For a given gesture, the model whose likelihood is the largest among all the HMMs is chosen as the recognized type, which is the output decision of that sliding window.
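The windowing and the per-window decision can be sketched as below. The window size and step follow the text; the scoring functions are hypothetical stand-ins for the HMM likelihoods, used only so that the example runs on its own.

```python
from typing import Callable, Dict, Iterator, List, Sequence

Scorer = Callable[[Sequence[float]], float]


def sliding_windows(samples: Sequence[float], size: int = 150,
                    step: int = 30) -> Iterator[Sequence[float]]:
    """Yield windows of `size` samples, advancing by `step` samples."""
    for start in range(0, len(samples) - size + 1, step):
        yield samples[start:start + size]


def classify_window(window: Sequence[float], models: Dict[str, Scorer]) -> str:
    """Pick the model with the largest score (stand-in for likelihood)."""
    return max(models, key=lambda name: models[name](window))


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


# Hypothetical scorers; a real system would evaluate each trained HMM here.
toy_models: Dict[str, Scorer] = {
    "Stop": lambda w: -abs(mean(w)),
    "Move forward": lambda w: -abs(mean(w) - 1.0),
}

signal: List[float] = [1.0] * 300  # toy one-dimensional signal
windows = list(sliding_windows(signal))
decisions = [classify_window(w, toy_models) for w in windows]
```

A 300-sample signal yields six overlapping windows with these settings, and each window produces one decision.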
2.2.2. LHMMs-Based Individual Gestures Recognition

(1) Training Phase. The training phase is divided into four steps: finding the stroke duration of a gesture, quantizing the vectors into observable symbols, setting up the initial HMM parameters, and iterating Expectation Maximization.

Step 1 (find the stroke duration of a gesture). FFT is used to find the stroke duration of the gesture for the HMMs.
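One plausible reading of Step 1 is that the stroke duration is taken as the period of the dominant frequency component of the gesture signal. The sketch below follows that assumption and uses a naive discrete Fourier transform in place of an FFT to stay dependency-free; the sampling rate and test signal are hypothetical.

```python
import cmath
import math
from typing import List


def dominant_period(samples: List[float], sample_rate: float) -> float:
    """Period in seconds of the strongest nonzero frequency component."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the constant offset
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):  # frequency bins up to Nyquist
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    # Bin k corresponds to frequency k * sample_rate / n.
    return n / (best_k * sample_rate)


# Toy signal: a 0.5 Hz oscillation sampled at 30 Hz for 4 seconds.
rate = 30.0
toy_signal = [math.sin(2 * math.pi * 0.5 * t / rate) for t in range(120)]
stroke_duration = dominant_period(toy_signal, rate)
```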
Step 2 (quantize the vectors into observable symbols). After preprocessing, the raw data are used for the observation sequence $O = O_1 O_2 \cdots O_T$ and the HMM parameters $\lambda = (A, B, \pi)$, respectively. Given the parameters of the model, the conditional probability of the observation sequence can be calculated. That is, the problem is, given a specific HMM, to assess the probability of an observation sequence, $P(O \mid \lambda)$. A forward-backward algorithm is used for this task as follows [20]. Define the forward variables

$$\alpha_t(i) = P(O_1 O_2 \cdots O_t, q_t = S_i \mid \lambda), \tag{4}$$

that is, given the model, the probability that the partial observation sequence up to time $t$ is $O_1 O_2 \cdots O_t$ and the state at time $t$ is $S_i$. The forward variables can be calculated recursively by

$$\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\right] b_j(O_{t+1}), \tag{5}$$

where $1 \le t \le T - 1$ and $1 \le j \le N$.
Figure 5: Sliding windows for preprocessing. The plot shows the gesture signal segmented by the FNNs (components ax, ay, az) with successive sliding windows marked along it.
Define the backward variables

$$\beta_t(i) = P(O_{t+1} O_{t+2} \cdots O_T \mid q_t = S_i, \lambda), \tag{6}$$

that is, given the model parameters and the state $S_i$ at time $t$, the probability of the observation sequence $O_{t+1} O_{t+2} \cdots O_T$ from time $t + 1$ to the end of the sequence. Similarly, $\beta_t(i)$ can be calculated recursively as

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j), \tag{7}$$

where $t = T - 1, T - 2, \ldots, 1$ and $1 \le i \le N$.

Then a K-means clustering method is applied to the 9D vector $V$ to obtain the partition value for each vector, together with a set of centroids used to cluster the data into observable symbols in the recognition phase.
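The forward and backward recursions in (5) and (7) can be checked against each other with a small sketch. The initializations $\alpha_1(i) = \pi_i b_i(O_1)$ and $\beta_T(i) = 1$ are the standard ones and are assumed here, since this section does not state them; the two-state model and the symbol sequence are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def forward(A: Matrix, B: Matrix, pi: List[float], obs: List[int]) -> Matrix:
    """alpha[t][i] following (5); assumes alpha_1(i) = pi_i * b_i(O_1)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha


def backward(A: Matrix, B: Matrix, obs: List[int]) -> Matrix:
    """beta[t][i] following (7); assumes beta_T(i) = 1."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta


# Hypothetical two-state, two-symbol model and a short symbol sequence.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
obs = [0, 1, 0]

alpha = forward(A, B, pi, obs)
beta = backward(A, B, obs)

# P(O | lambda) computed from each direction should agree.
p_forward = sum(alpha[-1])
p_backward = sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in range(len(pi)))
```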
Step 3 (set up the initial HMM parameters). Set the number of states in the model. The initial values of $\lambda = (A, B, \pi)$ and of the distinct observable symbols $O = O_1 O_2 \cdots O_T$ in each state are to be refined iteratively, and they should satisfy the stochastic constraints on HMM parameters. Then a Viterbi algorithm is used to find the single best state sequence $Q = q_1 q_2 \cdots q_T$ [21].

Step 4 (expectation maximization iteration). A Baum-Welch algorithm is used to calculate an auxiliary function and maximize the likelihood value of $\lambda$ [22]. $N$ iterations are carried out until the likelihood approaches a steady value. The expectation value $Q(\lambda, \bar{\lambda})$ and the maximized likelihood $\bar{\lambda}$ are calculated as follows:

$$Q(\lambda, \bar{\lambda}) = \sum_{Q} P(Q \mid O, \lambda) \log\left[P(O, Q \mid \bar{\lambda})\right],$$

$$\max_{\bar{\lambda}} \left[Q(\lambda, \bar{\lambda})\right] \Longrightarrow P(O \mid \bar{\lambda}) > P(O \mid \lambda). \tag{8}$$
(2) Recognition Phase. During the recognition phase, after a group of centroids for K-means clustering has been trained, several groups of HMMs are formed. We then calculate the likelihood of each group of HMM parameters. Given an observable sequence $O$, the Viterbi algorithm is used to find the single best state sequence $Q$, and the HMMs with the greatest likelihood are identified as the most likely gesture type.

Figure 6: (a) Transitions among the UHMMs, which take the sequential constraints of context information into account; the states $S_1$ to $S_5$ are labeled with the five gestures (Move forward, Move backward, Turn left, Turn right, and Stop). (b) Update of the gesture recognition result by the UHMMs, shown as the state sequence $q_0, q_1, \ldots, q_t, q_{t+1}$ with observations $O_0, O_1, \ldots, O_t, O_{t+1}$.
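The Viterbi step mentioned in the recognition phase can be sketched as follows. The two-state model and the symbol sequence are hypothetical and are kept tiny so that the result can be verified by hand; a practical implementation would usually work with log probabilities to avoid underflow on long sequences.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def viterbi(A: Matrix, B: Matrix, pi: List[float],
            obs: List[int]) -> Tuple[List[int], float]:
    """Return the single best state sequence and its joint probability."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backpointers: List[List[int]] = []
    for o in obs[1:]:
        pointers, new_delta = [], []
        for j in range(n):
            # Best predecessor of state j at this step.
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            pointers.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        backpointers.append(pointers)
        delta = new_delta
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for pointers in reversed(backpointers):  # trace the path backwards
        path.insert(0, pointers[path[0]])
    return path, delta[last]


# Hypothetical two-state, two-symbol model.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]

best_path, best_probability = viterbi(A, B, pi, [0, 1, 0])
```

To choose a gesture type, the same observation sequence would be scored under every trained gesture model and the model with the greatest score selected.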
223 UHMMs-Based Continuous Gesture Update In orderto improve accuracy in decision making we consider thesequential conditions of ldquoContextrdquo in UHMMs where aBayesian filter is used to update results from LHMMsThere-fore five gestures are predefined which includes ldquoMove for-wardrdquo ldquoMove backwardrdquo ldquoTurn leftrdquo ldquoTurn rightrdquo and ldquoStoprdquoMeanwhile the ldquoContextrdquo is used as sequential constraintsamong different types of gestures For example the event thatsomeone continuously sends a same command to a robot hasa small probability someone firstly sends a ldquoMove forwardrdquocommand and before he wants to send a ldquoMove backwardrdquocommand he should send a ldquoStoprdquo command Figure 6(a)shows a transition among UHMMs which is a discretefirst-order HMMs with five states and observation symbolsrespectively UHMMs can be described as a sequence ofgestures and at any time as being in one of a set of119873 (119873 = 5)distinct states 119878
1 1198782 1198785 The system undergoes a change
of state according to a set of probabilities associated with thestate Each arch represents a probability of transition betweentwo adjacent states The moment associated with change ofstates is indicated as 119896 = 1 2 119873 and the 119896th actual stateis 119902119896 The probability 119886
119894119895described below relates the current
state with the previous state
119886119894119895= P [119902
119896= 119878119895| 119902119896minus1= 119878119894] (9)
The initial state distribution represents the probabilitydistribution of the first order which is defined as
120587 = P [1199021= 119878119894 (119894 = 1 2 119873)] (10)
Another element of LHMMs is the state probabilitydistribution of observation symbol 119878
119895
119887119895(119896) = P [O
119896| 119902119905= 119878119895] (11)
119887119895means the probability which is identified as different
observation symbols whereO119896represents decisions made by
UHMMsFigure 6(b) shows a sequence inUHMMs where the state
119902119905represents a 119905th gesture and O
119905is a majority decision
result given by LHMMs A Forward propagationmethod [23]is used to update the joint probability observation gathereduntil time 119905 In this paper the model is defined as
P(O119905+1| 119902119905+1= 119878119895) represents 119887
119894(O119905+1) where the
state at time 119905 + 1 is 119878119895and the observation isO
119905+1
P(119902119905+1= 119878119895| 119902119905= 119878119894)means the transition probability
119886119894119895in matrix A
The forward variable 120572119905(119894) is defined as the probability of
the observation sequence O1O2 O119905 and state 119878
119894at time 119905
given the model 120582
120572119905(119894) = P (O
1O2 O119905 119902119905= 119878119894| 120582) (12)
where 1 le 119894 le 119873According to the network structure shown in Figure 6(b)
we can conclude
120572119905+1(119895) = P (O
119905+1| 119902119905+1= 119878119895)
timessum
119878119894
P (119902119905+1= 119878119895| 119902119905= 119878119894) 120572119905(119894)
(13)
with the initial condition of uniform distribution
1205720(119894) = P (119902
0= 119878119894) = 120587119894 (14)
For the UHMMs the training parameters 120582(AB120587) areobtained by observing interaction between experimenter androbot so these parameters vary from person to person Thetransition matrix A is estimated from the statistical results ofactually observed gestures Probability distribution of obser-vation symbol B is an accuracy matrix of individual gesturesfrom LHMMs The forward variable is updated after havingobtained the current observation value which represents theposterior probability of a current state in the case of givencontext constraints of UHMMsTherefore the updated resultis the maximum posteriori probability of the state
Mathematical Problems in Engineering 7
VICON motion capture system
(a)
Kinectsensor
MicrocomputerFitPC2
Pioneer 3DXmobile robot
(b)
Figure 7 Experimental environment and a Kinect mobile robot
3 Experiments
31 Introduction of Hardware Platform An experimentalenvironment where a VICON motion capture system [24]and a Kinect mobile robot platform are installed is shownin Figures 7(a) and 7(b) respectively The mobile robotplatform mainly includes a Pioneer 3DX mobile robot amicrocomputer called FitPC2 and a Kinect sensor whichis developed from a RGB camera and a depth sensor Theeffective sensing range of the Kinect sensor is from 04metersto 4 meters vertical viewing angle range is plusmn430 horizontalrange isplusmn570 and frame rate is 30 fpsMicrocomputer FitPC2is a fanless minicomputer which is able to install Windowsor Linux operating systems and equipped withWiFi Besidesthe main components mentioned above an external batteryis used to supply power for both of FitPC2 and Kinect sensor
Robot Operating System (ROS) is installed in the Linuxoperating system of FitPC2microcomputer which is an opensourcemetaoperating system and provides services like a realoperating system including hardware abstraction low-leveldevice control implementation of commonly-used func-tionality message-passing between processes and packagemanagement [25] In this paper all processes are developed inaccordance to the ROS framework which includes a driver ofKinect sensor a motion data receiver program a motion dataprocessing program and a display program
32 Motion Data Evaluation Based on VICON In order toassess accuracy of the motion data captured by a Kinectsensor aVICONmotion capture systemwas used to evaluateAs angular velocity data is similar to acceleration datawe select angular velocity for evaluation Firstly severalmarkers of the VICON system were fixed on the wrist ofan experimenter and then the experimenter made a varietyof predefined gestures in front of the Kinect sensor Finallywe sent pitch roll and yaw data respectively recorded from
the VICON system and the Kinect sensor into a PC whichwas used to plot these data on the same graph as shownin Figure 8 The data from VICON system was plotted bysolid lines while the data from the Kinect sensor was plottedby dotted lines From these Figures we find that the Kinectsensor has accurately measured motions in most of the timeexcept that a period of time did not match as shown inFigure 8(a) which was due to the arm swings out of thescope of the Kinect sensor And the mismatch range betweenthese two types of data was always less than 10 degrees Bycomparison with the VICON system we have confirmedthat motion data captured from the Kinect sensor meets theaccuracy requirement of gesture recognition
33 Continuous Gesture Recognition Test In order to test theperformance of the proposed algorithm we predefine fivegestures for experiments They are ldquoMove forwardrdquo ldquoMovebackwardrdquo ldquoStoprdquo ldquoTurn leftrdquo and ldquoTurn rightrdquo where thegesture ldquoWaving front and backrdquo means ldquoMove forwardrdquo thegesture ldquoWaving left and rightrdquo means ldquoMove backwardrdquo thegesture ldquoWaving up and downrdquo means ldquoStoprdquo the gestureldquoSwing counterclockwiserdquo means ldquoTurn leftrdquo and the gestureldquoSwing clockwiserdquo means ldquoTurn rightrdquo as shown in Figure 9
We recorded 10 sets of data from each of three experimenters for training and testing, and the experiments were carried out according to the following three steps.
Step 1. Within the sensing range of a Kinect sensor, repeat gesture Type 1 15 times and then take a 5-second break. Continue with the remaining types in the same way until Type 5 is done, and then save the data.
Step 2. Within the sensing range of a Kinect sensor, perform a sequence of 20 gestures with a break of about 2 seconds between gestures, and save the data.
Figure 8: Comparisons of Roll (a), Pitch (b), and Yaw (c) data between a VICON system and a Kinect sensor. Each panel (roll, pitch, and yaw data evaluation) plots the angle in degrees against time in seconds, from 0 to 120 s, for the Kinect and VICON data.
Figure 9: Five predefined gestures for Human-Robot Interaction: waving front and back, waving left and right, waving up and down, swing counterclockwise, and swing clockwise.
Step 3. Process the training data and the testing data. Firstly, train the FNNs to distinguish gestures from nongestures; secondly, use each block of training data to train the LHMMs.
To trade off computational complexity against efficiency and accuracy, set the number of states in the LHMMs to 20. Meanwhile, set the number of distinct observation symbols to 20, and then use the trained LHMMs to recognize individual gestures in the test data. The output of each test is a sequence of recognized gestures. Finally, Bayesian filtering with sequential constraints in UHMMs is used to generate the most likely underlying gesture sequence.
3.3.1. Performance Evaluation of FNNs. A MATLAB neural network toolbox [26] is used to train the first and second layers of the FNNs. The initial values of the weights and biases are randomly selected, and different initial values lead to different performance. If the performance does not reach the target, the training phase needs to be restarted with a new neural network until it reaches sufficient accuracy. For instance, within 180 iterations, two different sets of initial values gave two different performances, as shown in Figure 10, which presents one good and one bad result of neural network training.
Figure 10: Performance comparison of gesture spotting between two FNNs. (a1) The performance goal is not met within 180 iterations; (a2) the output of FNNs, accuracy = 75.69%; (a3) the error of FNNs; (b1) the performance goal is met within 26 iterations; (b2) the output of FNNs, accuracy = 96.86%; (b3) the error of FNNs. Panels (a1) and (b1) plot the training performance against the goal over epochs on a logarithmic scale, with the plot titles "Performance is 0.0919598, goal is 1e-005" (300 epochs) and "Performance is 2.71473e-006, goal is 1e-005" (13 epochs); panels (a2) and (b2) plot ground truth versus FNNs output over time (s); panels (a3) and (b3) plot the error over time (s).
After 180 iterations, the FNNs did not reach the target, as shown in Figure 10(a1). The parts circled in Figure 10(a3) show continuous errors, which led to detection errors in Figure 10(a2). Thus, it can be seen that the FNNs achieve sufficient accuracy only if the performance curve reaches the goal. As the performance curve does not meet its target in Figure 10(a1), the training program needs to be restarted. In Figure 10(b1), we find that the FNNs met the requirement after 26 iterations. Although there were some discrete errors in Figure 10(b3), they did not generate any detection error in Figure 10(b2). Therefore, we can conclude that several discrete errors on the edge of a gesture signal do not affect detection results, but consecutive errors can cause detection errors.
3.3.2. Performance Evaluation of MHMMs. We again recorded 10 sets of data from each of three experimenters for training and testing, and every set of sequence data has 20 gestures. The test includes two steps. Firstly, we trained the LHMMs to recognize individual gestures in a sequence. Secondly, the decisions obtained from the trained LHMMs were used in a Bayesian filter with sequential restrictions to generate the most likely underlying command sequence as the final result. The test results for 3D angular velocity and 3D acceleration are shown in Figures 11 and 12, respectively. In Figure 11(a), the signal is a 3D angular velocity that includes twenty gestures captured by a Kinect sensor. The circled regions (A) and (B) in Figure 11(b) indicate two errors generated by the FNNs, which caused the segments to be shorter than their actual length after the start and end points of the gesture were detected. After the LHMMs were used to recognize gestures, errors appeared, as shown in the circled regions (C) and (D) of Figure 11(c), which indicates that the length of a segment affects the accuracy of gesture recognition. After the UHMMs were used to update the recognition results from the LHMMs, accurate recognition results were obtained. Similarly, 3D acceleration data was also used for testing; recognition errors also appeared in the LHMMs, as shown by the four circled regions in Figure 12(c). However, after updating in the UHMMs, accurate recognition results were also obtained.
Figure 11: Results of FNNs, LHMMs, and UHMMs: (a) the raw 3D angular velocity (x, y, z); (b) the output of FNNs, with circled regions (A) and (B); (c) the LHMMs decision results compared with the ground truth, with circled regions (C) and (D); (d) the UHMMs decision results compared with the ground truth. All panels are plotted against time (s); the decision panels use the categories Nongesture, Move backward, Move forward, Turn left, Turn right, and Stop.
We have calculated the average recognition rate and average time for 30 sets of gestures, which were composed of the five predefined gestures. Table 1 lists the average recognition accuracy and the average recognition time of HMMs and MHMMs, respectively. From the table, we can conclude that the recognition accuracy of MHMMs, which use Bayesian filtering, is better than that of HMMs alone, and that the real-time performance of the MHMMs-based gesture recognition meets the requirement of online gesture recognition.
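The per-gesture values in Table 1 can be summarized with a short calculation. The sketch below assumes that the table entries are read as percentages with two decimals and times with three decimals (the decimal placement is an assumption about the extracted digits); the variable names are illustrative.

```python
# Rows of Table 1 as (gesture type, HMMs accuracy %, MHMMs accuracy %,
# HMMs time s, MHMMs time s). Decimal placement is an assumption.
table1 = [
    (1, 86.83, 95.71, 0.564, 0.637),
    (2, 91.24, 97.28, 0.629, 0.784),
    (3, 78.43, 90.63, 0.468, 0.615),
    (4, 85.92, 93.42, 0.693, 0.812),
    (5, 73.97, 89.85, 0.503, 0.759),
]


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


hmm_acc = mean(row[1] for row in table1)
mhmm_acc = mean(row[2] for row in table1)
hmm_time = mean(row[3] for row in table1)
mhmm_time = mean(row[4] for row in table1)

accuracy_gain = mhmm_acc - hmm_acc    # percentage points
extra_time = mhmm_time - hmm_time     # seconds per recognition
```

Under this reading, the mean accuracy rises by roughly 10 percentage points while the mean recognition time grows by about 0.15 s.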
Table 1: Comparisons of average recognition accuracy and average recognition time between HMMs and MHMMs.

Gesture type   HMMs accuracy (%)   MHMMs accuracy (%)   HMMs recognition time (s)   MHMMs recognition time (s)
1              86.83               95.71                0.564                       0.637
2              91.24               97.28                0.629                       0.784
3              78.43               90.63                0.468                       0.615
4              85.92               93.42                0.693                       0.812
5              73.97               89.85                0.503                       0.759

Figure 12: Results of FNNs, LHMMs, and UHMMs: (a) the raw 3D acceleration (m/s^2); (b) the output of FNNs; (c) the LHMMs decision results compared with the ground truth; (d) the UHMMs decision results compared with the ground truth. All panels are plotted against time (s); the decision panels use the categories Nongesture, Move backward, Move forward, Turn left, Turn right, and Stop.

4. Conclusion

In this paper, a novel MHMMs-based continuous gesture recognition algorithm is proposed for Human-Robot Interaction, which uses raw motion data captured by a Kinect sensor. Therefore, it is different from traditional vision-based methods. Firstly, a Kinect sensor was used to obtain the 3D acceleration and 3D angular velocity of an arm. Secondly, FNNs and a threshold criterion were used for gesture spotting and segmentation, respectively. Then the segmented gesture signals were sent into LHMMs to recognize individual gestures. After that, the identified gesture sequence was fed into UHMMs, where a Bayesian filter with sequential constraints was used to correct errors generated in LHMMs. The experimental results verify that the proposed algorithm not only effectively improves the accuracy of continuous gesture recognition but also has good real-time performance.
Acknowledgment
This work is supported by the National Natural Science Foundation of China (60277605).
References
[1] X. Wang, M. Xia, and H. Cai, "Hidden Markov models based dynamic hand gesture recognition," Mathematical Problems in Engineering, vol. 2012, Article ID 986134, 11 pages, 2012.
[2] A. Pantelopoulos and N. G. Bourbakis, "A survey on wearable sensor-based systems for health monitoring and prognosis," IEEE Transactions on Systems, Man and Cybernetics C, vol. 40, no. 1, pp. 1–12, 2010.
[3] J. Yang, J. Lee, and J. Choi, "Activity recognition based on RFID object usage for smart mobile devices," Journal of Computer Science and Technology, vol. 26, no. 2, pp. 239–246, 2011.
[4] R. Poppe, "A survey on vision-based human action recognition," Image and Vision Computing, vol. 42, no. 6, pp. 790–808, 2012.
[5] D. Weinland, R. Ronfard, and E. Boyer, "A survey of vision-based methods for action representation, segmentation and recognition," Computer Vision and Image Understanding, vol. 115, no. 2, pp. 224–241, 2011.
[6] L. Chen, J. Hoey, and C. D. Nugent, "Sensor-based activity recognition," Transactions on Systems, Man and Cybernetics, vol. 42, pp. 790–808, 2012.
[7] S. Bilal, R. Akmeliawati, A. A. Shafie, and M. J. E. Salami, "Hidden Markov model for human to computer interaction: a study on human hand gesture recognition," Artificial Intelligence Review, vol. 8, pp. 1–22, 2011.
[8] L. Bao and S. S. Intille, "Activity recognition from user-annotated acceleration data," Pervasive Computing, vol. 3001, pp. 1–17, 2004.
[9] C. Sutton, A. McCallum, and K. Rohanimanesh, "Dynamic conditional random fields: factorized probabilistic models for labeling and segmenting sequence data," Journal of Machine Learning Research, vol. 8, no. 2, pp. 693–723, 2007.
[10] O. Brdiczka, J. L. Crowley, and P. Reignier, "Learning situation models in a smart home," IEEE Transactions on Systems, Man and Cybernetics B, vol. 39, no. 1, pp. 56–63, 2009.
[11] F. Omar, M. Sinn, J. Truszkowski, P. Poupart, J. Tung, and A. Caine, "Comparative analysis of probabilistic models for activity recognition with an instrumented walker," in Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI '10), pp. 392–400, July 2010.
[12] F. J. Ordonez, P. D. Toledo, and A. Sanchis, "Activity recognition using hybrid generative/discriminative models on home environments using binary sensors," Sensors, vol. 13, no. 1, pp. 5460–5477, 2013.
[13] Z. Zhang, "Microsoft kinect sensor and its effect," IEEE MultiMedia, vol. 19, no. 2, pp. 4–10, 2012.
[14] R. Giraldo, D. M. Giraldo, and S. A. Meza, "Kernel based hand gesture recognition using kinect sensor," in Proceedings of the 17th Symposium of Image, Signal Processing, and Artificial Vision, pp. 158–161, 2012.
[15] Z. Ren, J. Meng, J. Yuan, and Z. Zhang, "Robust hand gesture recognition with kinect sensor," in Proceedings of the 19th ACM International Conference on Multimedia, ACM Multimedia 2011 (MM '11), pp. 759–760, December 2011.
[16] A. Bauer, K. Klasing, and G. Lidoris, "The autonomous city explorer: towards natural human-robot interactivity in urban environments," International Journal of Social Robotics, vol. 1, no. 3, pp. 127–140, 2009.
[17] H.-D. Yang, A.-Y. Park, and S.-W. Lee, "Gesture spotting and recognition for human-robot interaction," IEEE Transactions on Robotics, vol. 23, no. 2, pp. 256–270, 2007.
[18] M. T. Hagan, H. B. Demuth, and M. H. Beale, Neural Network Design, PWS Publishing Company, 1996.
[19] L. R. Rabiner, "Tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[20] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[21] W. Liu and W. Han, "Improved Viterbi algorithm in continuous speech recognition," in International Symposium in Information Technology, pp. 542–545, October 2010.
[22] O. Mourad, "HMMs parameters estimation using hybrid Baum-Welch genetic algorithm," in Proceedings of the International Conference on Computer Application and System Modeling, pp. 207–209, 2010.
[23] K. Hirasawa, M. Ohbayashi, M. Koga, and M. Harada, "Forward propagation universal learning network," in Proceedings of the IEEE International Conference on Neural Networks, pp. 353–358, June 1996.
[24] VICON, http://www.vicon.com/.
[25] M. Quigley, K. Conley, and B. P. Gerkey, "ROS: an open-source robot operating system," in Proceedings of the ICRA Workshop on Open Source Software, 2009.
[26] Neural, http://www.mathworks.com/products/neuralnet/.
Figure 4: The architecture of the proposed MHMMs.
HMMs. Namely, the K-means clustering method is applied to the vector V to get the partition value for each vector and a set of centroids, which are used to cluster the data into observable symbols in the recognition phase. The K-means clustering method partitions $n$ objects into $k$ clusters, where $k < n$, based on the properties of the objects.
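The symbolization step in the recognition phase amounts to assigning each feature vector to its nearest centroid. The following is a minimal sketch of that assignment only, assuming the centroids have already been learned by K-means during training; the 2D vectors, centroid values, and function names are hypothetical and chosen for readability.

```python
import math


def nearest_centroid(vector, centroids):
    """Return the index of the centroid closest to `vector`
    (Euclidean distance); the index serves as the observable symbol."""
    return min(range(len(centroids)),
               key=lambda k: math.dist(vector, centroids[k]))


def symbolize(vectors, centroids):
    """Map a sequence of feature vectors to a sequence of symbol indices."""
    return [nearest_centroid(v, centroids) for v in vectors]


# Hypothetical centroids and feature vectors (2D for readability).
centroids = [(0.0, 0.0), (5.0, 5.0), (-5.0, 5.0)]
features = [(0.2, -0.1), (4.0, 6.0), (-4.5, 4.0), (0.5, 0.5)]

symbols = symbolize(features, centroids)
```

The resulting index sequence is the kind of discrete observation sequence that a discrete HMM consumes.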
As shown in Figure 5, a sliding window of 1 second (150 data points) moves forward 30 data points at each step, and the likelihood of each set of HMMs parameters is estimated for every window position. Each sliding window therefore generates one likelihood value per model. For a given gesture, the model that maximizes the likelihood over the other HMMs is chosen as the recognized type, which is the output decision of the sliding window.
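The windowing scheme described above can be sketched as a simple generator. This is an illustrative sketch using the window length (150 data points) and step (30 data points) stated in the text; the function name and the dummy signal are assumptions.

```python
def sliding_windows(samples, window=150, step=30):
    """Yield (start_index, window_samples) for every full window
    of `window` samples, advancing `step` samples each time."""
    for start in range(0, len(samples) - window + 1, step):
        yield start, samples[start:start + window]


# Dummy signal of 300 samples (2 seconds at 150 samples per second).
signal = list(range(300))

starts = [start for start, _ in sliding_windows(signal)]
lengths = [len(chunk) for _, chunk in sliding_windows(signal)]
```

Each yielded window would then be symbolized and scored against every gesture model.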
2.2.2. LHMMs-Based Individual Gestures Recognition
(1) Training Phase. The training phase is divided into four steps: finding the stroke duration of a gesture, quantifying the vectors into observable symbols, setting up initial HMMs parameters, and iterating Expectation Maximization.
Step 1 (find the stroke duration of a gesture). FFT is used to find the stroke duration of the gesture for HMMs.
Step 2 (quantify the vectors into observable symbols). After preprocessing, the raw data are used as the observation sequence $O = O_1, O_2, \ldots, O_T$, and the parameters of the HMMs are $\lambda = (A, B, \pi)$. Given the parameters of the model, the conditional probability of the observation sequence can be calculated. Namely, the problem is, given a specific HMM, to assess the probability of an observation sequence $P(O \mid \lambda)$. A forward-backward algorithm is used to realize this task as follows [20]. Define the forward variables

$$\alpha_t(i) = P(O_1, O_2, \ldots, O_t, q_t = S_i \mid \lambda), \tag{4}$$

that is, given a model, the probability that the output observable sequence up to time $t$ is $O_1, O_2, \ldots, O_t$ and that the state at time $t$ is $S_i$. The forward variables can be calculated recursively by the following formula:

$$\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\right] b_j(O_{t+1}), \tag{5}$$

where $1 \le t \le T-1$, $1 \le j \le N$.
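The forward recursion in (4) and (5) can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: it assumes the standard initialization $\alpha_1(i) = \pi_i b_i(O_1)$ and termination $P(O \mid \lambda) = \sum_i \alpha_T(i)$, which the text does not spell out, and it uses a made-up two-state, two-symbol model.

```python
def forward_likelihood(A, B, pi, obs):
    """Compute P(O | lambda) with the forward recursion.

    A[i][j]: transition probability from state i to state j
    B[j][o]: probability of emitting symbol o in state j
    pi[i]:   initial probability of state i
    obs:     list of observed symbol indices
    """
    n = len(pi)
    # Initialization (standard assumption): alpha_1(i) = pi_i * b_i(O_1).
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction, equation (5).
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: sum over the final forward variables.
    return sum(alpha)


# Hypothetical two-state model with two observation symbols.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]

likelihood = forward_likelihood(A, B, pi, [0, 1, 0])  # approximately 0.099375
```

For long sequences a practical implementation would rescale the forward variables or work in log space to avoid underflow.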
Figure 5: Sliding windows for preprocessing. The plot shows the gesture signal segmented by FNNs (channels ax, ay, az) with successive sliding windows marked along the sample axis.
Define the backward variables

$$\beta_t(i) = P(O_{t+1}, O_{t+2}, \ldots, O_T \mid q_t = S_i, \lambda), \tag{6}$$

that is, given the model parameters and given that the state at time $t$ is $S_i$, the probability of the output observation sequence $O_{t+1}, O_{t+2}, \ldots, O_T$ from time $t+1$ to the end of the sequence. Similarly, a recursive method can be used to calculate $\beta_t(i)$ as follows:

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j), \tag{7}$$

where $t = T-1, T-2, \ldots, 1$, $1 \le i \le N$.

Then a K-means clustering method is applied to a 9D vector V to get the partition value for each vector and also a set of centroids for clustering the data into observable symbols in the recognition phase.
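The backward recursion in (6) and (7) can be sketched in the same way as the forward one. This is an illustrative sketch assuming the standard initialization $\beta_T(i) = 1$ and the termination $P(O \mid \lambda) = \sum_i \pi_i b_i(O_1) \beta_1(i)$, neither of which is stated in the text; the two-state model is made up.

```python
def backward_likelihood(A, B, pi, obs):
    """Compute P(O | lambda) with the backward recursion.

    A[i][j]: transition probability from state i to state j
    B[j][o]: probability of emitting symbol o in state j
    pi[i]:   initial probability of state i
    obs:     list of observed symbol indices
    """
    n = len(pi)
    # Initialization (standard assumption): beta_T(i) = 1.
    beta = [1.0] * n
    # Induction, equation (7), for t = T-1, ..., 1.
    for o_next in reversed(obs[1:]):
        beta = [sum(A[i][j] * B[j][o_next] * beta[j] for j in range(n))
                for i in range(n)]
    # Termination: combine beta_1 with the initial distribution.
    return sum(pi[i] * B[i][obs[0]] * beta[i] for i in range(n))


# Hypothetical two-state model with two observation symbols.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]

likelihood = backward_likelihood(A, B, pi, [0, 1, 0])  # approximately 0.099375
```

The forward and backward passes give the same value of $P(O \mid \lambda)$ for a given model and sequence, which makes one a convenient check on the other.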
Step 3 (set up initial HMMs parameters). Set the number of states in the model. The initial values of the parameters $\lambda = (A, B, \pi)$, including the probabilities of the distinct observable symbols of $O = O_1, O_2, \ldots, O_T$ in each state, are to be refined iteratively and should meet the stochastic constraints of HMMs parameters. Then a Viterbi algorithm is used to find the single best state sequence $Q = q_1, q_2, \ldots, q_T$ [21].
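Finding the single best state sequence can be sketched with the textbook Viterbi recursion. This is a generic illustration, not the improved variant of [21]; the model values and function name are hypothetical.

```python
def viterbi(A, B, pi, obs):
    """Return (best_state_path, probability_of_that_path).

    A[i][j]: transition probability from state i to state j
    B[j][o]: probability of emitting symbol o in state j
    pi[i]:   initial probability of state i
    obs:     list of observed symbol indices
    """
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        step_ptr, new_delta = [], []
        for j in range(n):
            # Best predecessor of state j at this step.
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            step_ptr.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        backpointers.append(step_ptr)
        delta = new_delta
    # Backtrack from the most probable final state.
    state = max(range(n), key=lambda i: delta[i])
    best_prob = delta[state]
    path = [state]
    for step_ptr in reversed(backpointers):
        state = step_ptr[state]
        path.append(state)
    path.reverse()
    return path, best_prob


# Hypothetical two-state model with two observation symbols.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.5, 0.5]

best_path, best_prob = viterbi(A, B, pi, [0, 1, 0])
```

As with the forward pass, log probabilities are usually preferred for long sequences.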
Step 4 (expectation maximization iteration). A Baum-Welch algorithm is used to calculate an auxiliary function and maximize the likelihood value of $\lambda$ [22]. The iterations proceed until the likelihood approaches a steady value. The expectation value $Q(\lambda, \bar{\lambda})$ and the likelihood-maximizing parameters $\bar{\lambda}$ are calculated as follows:

$$Q(\lambda, \bar{\lambda}) = \sum_{Q} P(Q \mid O, \lambda) \log\left[P(O, Q \mid \bar{\lambda})\right], \qquad \max_{\bar{\lambda}} \left[Q(\lambda, \bar{\lambda})\right] \Longrightarrow P(O \mid \bar{\lambda}) > P(O \mid \lambda). \tag{8}$$
Figure 6: (a) Transitions among UHMMs, considering sequential constraints of context information (nodes S1 to S5, one per gesture: Move forward, Move backward, Stop, Turn left, and Turn right); (b) update of the gesture recognition result by UHMMs (state chain q0, q1, ..., qt, qt+1 with observations O0, O1, ..., Ot, Ot+1).

(2) Recognition Phase. During the recognition phase, after training, a group of centroids for K-means clustering and several groups of HMMs have been formed. Then we calculate the likelihood of each group of HMMs parameters. Given the observable sequence $O$, the Viterbi algorithm is used to find the single best state sequence $Q$, and the HMMs with the greatest likelihood are identified as the most likely gesture type.
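The selection of the most likely gesture type can be sketched as an argmax over per-gesture models. This sketch scores each model with the forward likelihood $P(O \mid \lambda)$ rather than the Viterbi path probability, which is a simplification; the two toy models, their parameters, and the function names are hypothetical.

```python
def sequence_likelihood(model, obs):
    """Forward-algorithm likelihood P(O | lambda) for model = (A, B, pi)."""
    A, B, pi = model
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def recognize(models, obs):
    """Return the name of the model with the greatest likelihood."""
    return max(models, key=lambda name: sequence_likelihood(models[name], obs))


# Two made-up gesture models sharing the same transitions; "Stop" mostly
# emits symbol 0 and "Turn left" mostly emits symbol 1.
shared_A = [[0.8, 0.2], [0.2, 0.8]]
models = {
    "Stop": (shared_A, [[0.9, 0.1], [0.7, 0.3]], [0.5, 0.5]),
    "Turn left": (shared_A, [[0.1, 0.9], [0.3, 0.7]], [0.5, 0.5]),
}

first = recognize(models, [0, 0, 1, 0])
second = recognize(models, [1, 1, 0, 1])
```

In the setting of this paper there would be one trained model per predefined gesture, and the symbol sequence would come from the K-means quantization step.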
2.2.3. UHMMs-Based Continuous Gesture Update. In order to improve accuracy in decision making, we consider the sequential conditions of "Context" in UHMMs, where a Bayesian filter is used to update the results from LHMMs. Five gestures are predefined: "Move forward", "Move backward", "Turn left", "Turn right", and "Stop". Meanwhile, the "Context" is used as sequential constraints among different types of gestures. For example, the event that someone continuously sends the same command to a robot has a small probability; and if someone first sends a "Move forward" command, then before sending a "Move backward" command, he should send a "Stop" command. Figure 6(a) shows the transitions among UHMMs, which are discrete first-order HMMs with five states and five observation symbols. UHMMs can be described as a sequence of gestures that, at any time, is in one of a set of $N$ ($N = 5$) distinct states $S_1, S_2, \ldots, S_5$. The system undergoes a change of state according to a set of probabilities associated with the state. Each arc represents a probability of transition between two adjacent states. The moments associated with changes of state are indicated as $k = 1, 2, \ldots, N$, and the $k$th actual state is $q_k$. The probability $a_{ij}$ described below relates the current state to the previous state:

$$a_{ij} = P[q_k = S_j \mid q_{k-1} = S_i]. \tag{9}$$

The initial state distribution represents the probability distribution of the first state, which is defined as

$$\pi_i = P[q_1 = S_i], \quad i = 1, 2, \ldots, N. \tag{10}$$
Another element of the model is the probability distribution of observation symbols in state $S_j$:

$$b_j(k) = P[O_k \mid q_t = S_j]. \tag{11}$$

Here $b_j$ is the probability with which state $S_j$ is identified as each of the different observation symbols, where $O_k$ represents the decisions made by LHMMs.

Figure 6(b) shows a sequence in UHMMs, where the state $q_t$ represents the $t$th gesture and $O_t$ is a majority decision result given by LHMMs. A forward propagation method [23] is used to update the joint probability of the observations gathered until time $t$. In this paper, the model is defined as follows:

$P(O_{t+1} \mid q_{t+1} = S_j)$ represents $b_j(O_{t+1})$, where the state at time $t+1$ is $S_j$ and the observation is $O_{t+1}$;

$P(q_{t+1} = S_j \mid q_t = S_i)$ means the transition probability $a_{ij}$ in matrix $A$.

The forward variable $\alpha_t(i)$ is defined as the probability of the observation sequence $O_1, O_2, \ldots, O_t$ and state $S_i$ at time $t$, given the model $\lambda$:

$$\alpha_t(i) = P(O_1, O_2, \ldots, O_t, q_t = S_i \mid \lambda), \tag{12}$$

where $1 \le i \le N$. According to the network structure shown in Figure 6(b), we can conclude

$$\alpha_{t+1}(j) = P(O_{t+1} \mid q_{t+1} = S_j) \times \sum_{S_i} P(q_{t+1} = S_j \mid q_t = S_i)\, \alpha_t(i), \tag{13}$$

with the initial condition of uniform distribution

$$\alpha_0(i) = P(q_0 = S_i) = \pi_i. \tag{14}$$
For the UHMMs, the training parameters $\lambda = (A, B, \pi)$ are obtained by observing the interaction between experimenter and robot, so these parameters vary from person to person. The transition matrix $A$ is estimated from the statistics of actually observed gestures. The probability distribution of observation symbols $B$ is an accuracy matrix of individual gestures from LHMMs. The forward variable is updated after the current observation value has been obtained, and it represents the posterior probability of the current state given the context constraints of UHMMs. Therefore, the updated result is the state with the maximum a posteriori probability.
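The update in (13) and (14) can be sketched as a recursive filter over the LHMM decisions. The example below is a toy illustration under several assumptions: it uses only three commands instead of five, the transition matrix and accuracy matrix are invented numbers, and the forward variable is normalized at every step (which does not change the maximizing state). All names are hypothetical.

```python
COMMANDS = ["Move forward", "Stop", "Move backward"]  # reduced toy subset

# Hypothetical A[i][j] = P(q_{t+1} = S_j | q_t = S_i): repeating a command
# or reversing direction without a "Stop" in between is made unlikely.
A = [
    [0.05, 0.90, 0.05],
    [0.475, 0.05, 0.475],
    [0.05, 0.90, 0.05],
]
# Hypothetical accuracy matrix B[j][o] = P(LHMM decision o | true command S_j).
B = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
PI = [1.0 / 3, 1.0 / 3, 1.0 / 3]  # uniform initial condition, equation (14)


def filter_step(alpha, A, B, observed):
    """One application of equation (13), followed by normalization."""
    n = len(alpha)
    unnormalized = [B[j][observed] * sum(A[i][j] * alpha[i] for i in range(n))
                    for j in range(n)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def correct_sequence(A, B, pi, decisions):
    """Return the maximum a posteriori state after each LHMM decision."""
    alpha = list(pi)
    corrected = []
    for observed in decisions:
        alpha = filter_step(alpha, A, B, observed)
        corrected.append(max(range(len(alpha)), key=lambda i: alpha[i]))
    return corrected


# LHMM decisions: Stop, Move forward, Move forward (suspicious repeat),
# Move backward.
lhmm_decisions = [1, 0, 0, 2]
corrected = correct_sequence(A, B, PI, lhmm_decisions)
corrected_names = [COMMANDS[i] for i in corrected]
```

With these made-up numbers, the repeated "Move forward" decision is replaced by "Stop", because the context makes a second "Move forward" less plausible than a misrecognized "Stop".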
Figure 7: Experimental environment and a Kinect mobile robot. (a) VICON motion capture system; (b) Kinect sensor, FitPC2 microcomputer, and Pioneer 3DX mobile robot.
3. Experiments

3.1. Introduction of Hardware Platform. An experimental environment, in which a VICON motion capture system [24] and a Kinect mobile robot platform are installed, is shown in Figures 7(a) and 7(b), respectively. The mobile robot platform mainly includes a Pioneer 3DX mobile robot, a microcomputer called FitPC2, and a Kinect sensor, which consists of an RGB camera and a depth sensor. The effective sensing range of the Kinect sensor is from 0.4 meters to 4 meters, the vertical viewing angle range is ±43°, the horizontal range is ±57°, and the frame rate is 30 fps. The FitPC2 microcomputer is a fanless minicomputer that can run Windows or Linux operating systems and is equipped with WiFi. Besides the main components mentioned above, an external battery is used to supply power to both the FitPC2 and the Kinect sensor.
Robot Operating System (ROS) is installed in the Linuxoperating system of FitPC2microcomputer which is an opensourcemetaoperating system and provides services like a realoperating system including hardware abstraction low-leveldevice control implementation of commonly-used func-tionality message-passing between processes and packagemanagement [25] In this paper all processes are developed inaccordance to the ROS framework which includes a driver ofKinect sensor a motion data receiver program a motion dataprocessing program and a display program
32 Motion Data Evaluation Based on VICON In order toassess accuracy of the motion data captured by a Kinectsensor aVICONmotion capture systemwas used to evaluateAs angular velocity data is similar to acceleration datawe select angular velocity for evaluation Firstly severalmarkers of the VICON system were fixed on the wrist ofan experimenter and then the experimenter made a varietyof predefined gestures in front of the Kinect sensor Finallywe sent pitch roll and yaw data respectively recorded from
the VICON system and the Kinect sensor into a PC whichwas used to plot these data on the same graph as shownin Figure 8 The data from VICON system was plotted bysolid lines while the data from the Kinect sensor was plottedby dotted lines From these Figures we find that the Kinectsensor has accurately measured motions in most of the timeexcept that a period of time did not match as shown inFigure 8(a) which was due to the arm swings out of thescope of the Kinect sensor And the mismatch range betweenthese two types of data was always less than 10 degrees Bycomparison with the VICON system we have confirmedthat motion data captured from the Kinect sensor meets theaccuracy requirement of gesture recognition
33 Continuous Gesture Recognition Test In order to test theperformance of the proposed algorithm we predefine fivegestures for experiments They are ldquoMove forwardrdquo ldquoMovebackwardrdquo ldquoStoprdquo ldquoTurn leftrdquo and ldquoTurn rightrdquo where thegesture ldquoWaving front and backrdquo means ldquoMove forwardrdquo thegesture ldquoWaving left and rightrdquo means ldquoMove backwardrdquo thegesture ldquoWaving up and downrdquo means ldquoStoprdquo the gestureldquoSwing counterclockwiserdquo means ldquoTurn leftrdquo and the gestureldquoSwing clockwiserdquo means ldquoTurn rightrdquo as shown in Figure 9
We respectively recorded 10 sets of data from threeexperimenters for training and testing and these experimentswere carried out in accordance to the following three steps
Step 1 In the scope of a Kinect sensor repeat gesture Type 1for 15 times and take a 5 seconds break Continue performingthe rest of the types following in the same way until Type 5 isdone and then save the data
Step 2 In the scope of a Kinect sensor perform a sequence of20 gestures with a break of about 2 seconds between gesturesand save the data
8 Mathematical Problems in Engineering
0 20 40 60 80 100 120
70
60
50
40
30
20
10
0
minus10
Rall data evaluation
KinectVICON
Time (s)
Deg
ree (
∘ )
(a)
8070605040302010
0minus10
minus20
minus30
minus40
Pitch data evaluation
KinectVICON
0 20 40 60 80 100 120Time (s)
Deg
ree (
∘ )
(b)
Yaw data evaluation
KinectVICON
40302010
0
minus10
minus20
minus30
minus40
minus500 20 40 60 80 100 120
Time (s)
Deg
ree (
∘ )
(c)
Figure 8 Comparisons of Roll (a) Pitch (b) and Yaw (c) data between a VICON system and a Kinect sensor
Waving front and back Waving left and right Waving up and down
Swing counterclockwise Swing clockwise
Figure 9 Five predefined gestures for Human-Robot Interaction
Step 3 Process the training data and the testing data Firstlytraining the FNNs to distinguish gestures and nongesturessecondly use each block of training data to train the LHMMs
To trade off the computational complexity with efficiencyand accuracy set up the number of states 20 in the LHMMMeanwhile set up the number of distinct observation symbols20 and then use the trained LHMMs to recognize individualgestures in the test data The output of each test is a sequenceof recognized gestures Finally a Bayesian filtering withsequential constraints in UHMMs is used to generate a mostlikely underlying gesture sequence
331 Performance Evaluation of FNNs A MATLAB neuralnetwork toolbox [26] is used for training in the first andsecond layer of FNNs The initial values of weights anddeviations are randomly selected Different initial values havedifferent performances If the performance does not reach thetarget then the training phase needs to restart a new neuralnetwork until it reaches a sufficient accuracy For instancewithin 180 iterations two different initial values got twodifferent performances as shown in Figure 10 which gavetwo results of both good and bad neural network training
Mathematical Problems in Engineering 9
151
050
minus05
minus1
minus150 50 100 150 200 250 300 350 400 450 500
Time (s)
Erro
r
Time (s)
Time (s) Time (s)
151
050
minus05
minus1
minus150 50 100 150 200 250 300 350 400 450 500
Erro
r
15
1
05
0
minus050 50 100 150 200 250 300 350 400 450 500
FNNsOutput
FNNsOutput
15
1
05
0
minus050 50 100 150 200 250 300 350 400 450 500
Gro
und
trut
h ve
rsus
0 50 100 150 200 250 300
Performance is 00919598 goal is1e minus 005100
10minus1
10minus2
10minus3
10minus4
10minus5
10minus6
300 epochs0 2 4 6 8 10 12
Trai
ning
-blu
e Goa
l-bla
ck
Trai
ning
-blu
e Goa
l-bla
ck
101
100
10minus1
10minus2
10minus3
10minus4
10minus5
10minus6
13 epochs
Performance is 271473e minus 006 goal is 1e minus 005
(a1) (b1)
(a2) (b2)
(a3) (b3)
out
put
Gro
und
trut
h ve
rsus
out
put
Figure 10 Performance comparison of gesture spotting between two FNNs (a1) the performance goal is not met within 180 iterations (a2)the output of FNNs accuracy = 7569 (a3) the error of FNNs (b1) the performance goal is met within 26 iterations (b2) the output ofFNNs accuracy = 9686 (b3) the error of FNNs
After 180 iterations the FNNs did not reach the target asshown in Figure 10(a1) These parts circled in Figure 10(a3)showed continuous error which led to detection errors inFigure 10(a2) Thus it can be seen that only if the perfor-mance curve reaches the standard the FNNs can achievesufficient accuracy As the performance curve cannot meetits target in Figure 10(a1) the training program needs to berestarted In Figure 10(b1) we find the FNNs achieved therequirement after 26 iterations Although there were somediscrete errors in Figure 10(b3) they did not generate anydetection error in Figure 10(b2) Therefore we can concludethat several discrete errors on the edge of gesture signal donot affect detection results but consecutive errors can causedetection errors
3.3.2. Performance Evaluation of MHMMs. We again recorded 10 sets of data from each of three experimenters for training and testing, and every set of sequence data has 20 gestures. The test includes two steps. First, we trained the LHMMs to recognize individual gestures in a sequence. Second, the decisions obtained from the trained LHMMs were used in a Bayesian filter with sequential restrictions to generate the most likely underlying command sequence as the final result. The test results for 3D angular velocity and 3D acceleration are shown in Figures 11 and 12, respectively. In Figure 11(a), the signal is a 3D angular velocity that includes twenty gestures captured by a Kinect sensor. The circled regions (A) and (B) in Figure 11(b) indicate two errors generated by the FNNs, which caused the segments to be shorter than their actual length after the start and end points of the gesture were detected. After LHMMs were used to recognize the gestures, errors appeared as shown in the circled regions (C) and (D) of Figure 11(c), which indicates that the length of a segment affects the accuracy of gesture recognition. After UHMMs were used to update the recognition results from LHMMs, accurate recognition results were obtained. Similarly, 3D acceleration data was also used for testing. Recognition errors again appeared in LHMMs, as the four circled regions in Figure 12(c) show. However, after updating in UHMMs, accurate recognition results were also obtained.

Figure 11: Results of FNNs, LHMMs, and UHMMs. (a) The raw 3D angular velocity; (b) the output of FNNs; (c) the LHMMs decision results compared with the ground truth; (d) the UHMMs decision results compared with the ground truth.
We calculated the average recognition rate and the average recognition time over 30 sets of gestures composed of the five predefined gestures. Table 1 lists the average recognition accuracy and the average recognition time of HMMs and MHMMs. From the table, we can conclude that the recognition accuracy of MHMMs, which use Bayesian filtering, is better than that of HMMs alone, and that the real-time performance of MHMMs-based gesture recognition meets the requirement of online gesture recognition.
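The comparison can be summarized by averaging each column of Table 1 over the five gesture types. The sketch below only restates the values reported in Table 1; the names `TABLE_1` and `column_means` are introduced here for illustration.

```python
from typing import Dict, Tuple

# Rows of Table 1, keyed by gesture type:
# (HMMs accuracy %, MHMMs accuracy %, HMMs time s, MHMMs time s)
TABLE_1: Dict[int, Tuple[float, float, float, float]] = {
    1: (86.83, 95.71, 0.564, 0.637),
    2: (91.24, 97.28, 0.629, 0.784),
    3: (78.43, 90.63, 0.468, 0.615),
    4: (85.92, 93.42, 0.693, 0.812),
    5: (73.97, 89.85, 0.503, 0.759),
}


def column_means(table: Dict[int, Tuple[float, float, float, float]]) -> Tuple[float, ...]:
    """Average each of the four columns over all gesture types."""
    rows = list(table.values())
    return tuple(sum(row[c] for row in rows) / len(rows) for c in range(4))


hmm_acc, mhmm_acc, hmm_time, mhmm_time = column_means(TABLE_1)
print(f"accuracy: HMMs {hmm_acc:.2f}%, MHMMs {mhmm_acc:.2f}%")
print(f"time:     HMMs {hmm_time:.3f} s, MHMMs {mhmm_time:.3f} s")
```

Averaged this way, the accuracy rises from about 83.28% to about 93.38%, while the recognition time grows from about 0.571 s to about 0.721 s, that is, roughly 10 percentage points gained for about 0.15 s of extra time per gesture.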
Table 1: Comparisons of average recognition accuracy and average recognition time between HMMs and MHMMs.

Gesture type   HMMs accuracy (%)   MHMMs accuracy (%)   HMMs recognition time (s)   MHMMs recognition time (s)
1              86.83               95.71                0.564                       0.637
2              91.24               97.28                0.629                       0.784
3              78.43               90.63                0.468                       0.615
4              85.92               93.42                0.693                       0.812
5              73.97               89.85                0.503                       0.759

Figure 12: Results of FNNs, LHMMs, and UHMMs. (a) The raw 3D acceleration; (b) the output of FNNs; (c) the LHMMs decision results compared with the ground truth; (d) the UHMMs decision results compared with the ground truth.

4. Conclusion

In this paper, a novel MHMMs-based continuous gesture recognition algorithm is proposed for Human-Robot Interaction. It uses raw motion data captured by a Kinect sensor and therefore differs from traditional vision-based methods. First, a Kinect sensor was used to obtain the 3D acceleration and 3D angular velocity of an arm. Second, FNNs and a threshold criterion were used for gesture spotting and segmentation, respectively. Then the segmented gesture signals were sent into LHMMs to recognize individual gestures. After that, the identified gesture sequence was fed into UHMMs, where a Bayesian filter with sequential constraints was used to correct errors generated in LHMMs. The experimental results verify that the proposed algorithm not only effectively improves the accuracy of continuous gesture recognition but also has good real-time performance.
Acknowledgment
This work is supported by the National Natural Science Foundation of China (60277605).
References
[1] X. Wang, M. Xia, and H. Cai, "Hidden Markov models based dynamic hand gesture recognition," Mathematical Problems in Engineering, vol. 2012, Article ID 986134, 11 pages, 2012.
[2] A. Pantelopoulos and N. G. Bourbakis, "A survey on wearable sensor-based systems for health monitoring and prognosis," IEEE Transactions on Systems, Man and Cybernetics C, vol. 40, no. 1, pp. 1–12, 2010.
[3] J. Yang, J. Lee, and J. Choi, "Activity recognition based on RFID object usage for smart mobile devices," Journal of Computer Science and Technology, vol. 26, no. 2, pp. 239–246, 2011.
[4] R. Poppe, "A survey on vision-based human action recognition," Image and Vision Computing, vol. 42, no. 6, pp. 790–808, 2012.
[5] D. Weinland, R. Ronfard, and E. Boyer, "A survey of vision-based methods for action representation, segmentation and recognition," Computer Vision and Image Understanding, vol. 115, no. 2, pp. 224–241, 2011.
[6] L. Chen, J. Hoey, and C. D. Nugent, "Sensor-based activity recognition," Transactions on Systems, Man and Cybernetics, vol. 42, pp. 790–808, 2012.
[7] S. Bilal, R. Akmeliawati, A. A. Shafie, and M. J. E. Salami, "Hidden Markov model for human to computer interaction: a study on human hand gesture recognition," Artificial Intelligence Review, vol. 8, pp. 1–22, 2011.
[8] L. Bao and S. S. Intille, "Activity recognition from user-annotated acceleration data," Pervasive Computing, vol. 3001, pp. 1–17, 2004.
[9] C. Sutton, A. McCallum, and K. Rohanimanesh, "Dynamic conditional random fields: factorized probabilistic models for labeling and segmenting sequence data," Journal of Machine Learning Research, vol. 8, no. 2, pp. 693–723, 2007.
[10] O. Brdiczka, J. L. Crowley, and P. Reignier, "Learning situation models in a smart home," IEEE Transactions on Systems, Man and Cybernetics B, vol. 39, no. 1, pp. 56–63, 2009.
[11] F. Omar, M. Sinn, J. Truszkowski, P. Poupart, J. Tung, and A. Caine, "Comparative analysis of probabilistic models for activity recognition with an instrumented walker," in Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI '10), pp. 392–400, July 2010.
[12] F. J. Ordonez, P. D. Toledo, and A. Sanchis, "Activity recognition using hybrid generative/discriminative models on home environments using binary sensors," Sensors, vol. 13, no. 1, pp. 5460–5477, 2013.
[13] Z. Zhang, "Microsoft Kinect sensor and its effect," IEEE Multimedia, vol. 19, no. 2, pp. 4–10, 2012.
[14] R. Giraldo, D. M. Giraldo, and S. A. Meza, "Kernel based hand gesture recognition using Kinect sensor," in Proceedings of the 17th Symposium of Image, Signal Processing, and Artificial Vision, pp. 158–161, 2012.
[15] Z. Ren, J. Meng, J. Yuan, and Z. Zhang, "Robust hand gesture recognition with Kinect sensor," in Proceedings of the 19th ACM International Conference on Multimedia ACM Multimedia 2011 (MM '11), pp. 759–760, December 2011.
[16] A. Bauer, K. Klasing, and G. Lidoris, "The autonomous city explorer: towards natural human-robot interactivity in urban environments," International Journal of Social Robotics, vol. 1, no. 3, pp. 127–140, 2009.
[17] H.-D. Yang, A.-Y. Park, and S.-W. Lee, "Gesture spotting and recognition for human-robot interaction," IEEE Transactions on Robotics, vol. 23, no. 2, pp. 256–270, 2007.
[18] M. T. Hagan, H. B. Demuth, and M. H. Beale, Neural Network Design, PWS Publishing Company, 1996.
[19] L. R. Rabiner, "Tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[20] S.-Z. Yu and H. Kobayashi, "Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[21] W. Liu and W. Han, "Improved Viterbi algorithm in continuous speech recognition," in International Symposium in Information Technology, pp. 542–545, October 2010.
[22] O. Mourad, "HMMs parameters estimation using hybrid Baum-Welch genetic algorithm," in Proceedings of the International Conference on Computer Application and System Modeling, pp. 207–209, 2010.
[23] K. Hirasawa, M. Ohbayashi, M. Koga, and M. Harada, "Forward propagation universal learning network," in Proceedings of the IEEE International Conference on Neural Networks, pp. 353–358, June 1996.
[24] VICON, http://www.vicon.com.
[25] M. Quigley, K. Conley, and B. P. Gerkey, "ROS: an open-source robot operating system," in Proceedings of the ICRA Workshop on Open Source Software, 2009.
[26] Neural, http://www.mathworks.com/products/neuralnet.
Figure 6: (a) Transitions among UHMMs, which consider sequential constraints of context information; the diagram shows five states $S_1$ to $S_5$ for the gestures "Move forward", "Stop", "Move backward", "Turn right", and "Turn left". (b) Update of the gesture recognition result by UHMMs, with states $q_0, q_1, \ldots, q_t, q_{t+1}$ and observations $O_0, O_1, \ldots, O_t, O_{t+1}$.
groups of HMMs are formed. Then we calculate the likelihood of each group of HMM parameters. Given an observable sequence $O$, the Viterbi algorithm is used to find the single best state sequence $Q$, and the HMM with the greatest likelihood is identified as the most likely gesture type.
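The selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the paper: each gesture is assumed to have one discrete HMM given as a triple $(\pi, A, B)$, the log probability of the best Viterbi path is used as the likelihood score, and the names `viterbi_log_score` and `classify` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def _log(p: float) -> float:
    """Logarithm that maps a zero probability to minus infinity."""
    return math.log(p) if p > 0.0 else float("-inf")


def viterbi_log_score(obs: Sequence[int], pi: Sequence[float],
                      A: Matrix, B: Matrix) -> Tuple[float, List[int]]:
    """Return the log probability of the single best state path and the path."""
    n = len(pi)
    delta = [_log(pi[i]) + _log(B[i][obs[0]]) for i in range(n)]
    back: List[List[int]] = []  # back[t][j]: best predecessor of state j
    for o in obs[1:]:
        step: List[int] = []
        new_delta: List[float] = []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] + _log(A[i][j]))
            step.append(best_i)
            new_delta.append(delta[best_i] + _log(A[best_i][j]) + _log(B[j][o]))
        back.append(step)
        delta = new_delta
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for step in reversed(back):  # follow the back-pointers to the first frame
        path.append(step[path[-1]])
    path.reverse()
    return delta[last], path


def classify(obs: Sequence[int],
             models: Dict[str, Tuple[Sequence[float], Matrix, Matrix]]) -> str:
    """Pick the gesture whose HMM gives the highest best-path score."""
    scores = {name: viterbi_log_score(obs, *params)[0]
              for name, params in models.items()}
    return max(scores, key=scores.get)
```

Working in the log domain avoids numerical underflow on long symbol sequences, which matters when the models have as many states and symbols as the LHMMs.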
2.2.3. UHMMs-Based Continuous Gesture Update. In order to improve the accuracy of decision making, we consider the sequential conditions of "Context" in UHMMs, where a Bayesian filter is used to update the results from LHMMs. Five gestures are predefined: "Move forward", "Move backward", "Turn left", "Turn right", and "Stop". The "Context" is used as a set of sequential constraints among different types of gestures. For example, the event that someone continuously sends the same command to a robot has a small probability; likewise, if someone first sends a "Move forward" command, he should send a "Stop" command before sending a "Move backward" command. Figure 6(a) shows the transitions among UHMMs, which form a discrete first-order HMM with five states and five observation symbols. UHMMs can be described as a sequence of gestures that at any time is in one of a set of $N$ ($N = 5$) distinct states $S_1, S_2, \ldots, S_5$. The system undergoes a change of state according to a set of probabilities associated with the state. Each arc represents a probability of transition between two adjacent states. The moment associated with a change of state is indicated as $k = 1, 2, \ldots, N$, and the $k$th actual state is $q_k$. The probability $a_{ij}$ described below relates the current state to the previous state:

$$a_{ij} = P[q_k = S_j \mid q_{k-1} = S_i]. \tag{9}$$

The initial state distribution represents the probability distribution of the first state, which is defined as

$$\pi = P[q_1 = S_i], \quad i = 1, 2, \ldots, N. \tag{10}$$

Another element of UHMMs is the probability distribution of the observation symbols in state $S_j$:

$$b_j(k) = P[O_k \mid q_t = S_j]. \tag{11}$$

$b_j$ is the probability with which the state $S_j$ is identified as each of the different observation symbols, where $O_k$ represents a decision made by LHMMs.

Figure 6(b) shows a sequence in UHMMs, where the state $q_t$ represents the $t$th gesture and $O_t$ is a majority decision result given by LHMMs. A forward propagation method [23] is used to update the joint probability of the observations gathered until time $t$. In this paper, the model is defined as follows:

$P(O_{t+1} \mid q_{t+1} = S_j)$ represents $b_j(O_{t+1})$, where the state at time $t + 1$ is $S_j$ and the observation is $O_{t+1}$;

$P(q_{t+1} = S_j \mid q_t = S_i)$ means the transition probability $a_{ij}$ in matrix $A$.

The forward variable $\alpha_t(i)$ is defined as the probability of the observation sequence $O_1, O_2, \ldots, O_t$ and state $S_i$ at time $t$, given the model $\lambda$:

$$\alpha_t(i) = P(O_1, O_2, \ldots, O_t, q_t = S_i \mid \lambda), \tag{12}$$

where $1 \le i \le N$. According to the network structure shown in Figure 6(b), we can conclude

$$\alpha_{t+1}(j) = P(O_{t+1} \mid q_{t+1} = S_j) \times \sum_{S_i} P(q_{t+1} = S_j \mid q_t = S_i)\, \alpha_t(i), \tag{13}$$

with the initial condition of uniform distribution

$$\alpha_0(i) = P(q_0 = S_i) = \pi_i. \tag{14}$$

For the UHMMs, the training parameters $\lambda(A, B, \pi)$ are obtained by observing the interaction between experimenter and robot, so these parameters vary from person to person. The transition matrix $A$ is estimated from the statistics of actually observed gestures. The observation symbol probability distribution $B$ is an accuracy matrix of individual gestures from LHMMs. The forward variable is updated after the current observation value has been obtained, and it represents the posterior probability of the current state given the context constraints of UHMMs. Therefore, the updated result is the maximum a posteriori probability of the state.
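The recursion in (13) with the initial condition (14) can be sketched in a few lines. This is only an illustrative sketch: the normalization of the forward variable after each step is an addition made here so that it can be read as a posterior, and the matrices in `example` are made-up values, not the transition or accuracy matrices estimated in the paper. The names `forward_step`, `filter_sequence`, and `example` are hypothetical.

```python
from typing import List, Sequence

GESTURES = ["Move forward", "Move backward", "Turn left", "Turn right", "Stop"]


def forward_step(alpha: Sequence[float], A: Sequence[Sequence[float]],
                 B: Sequence[Sequence[float]], obs: int) -> List[float]:
    """One application of equation (13), followed by normalization.

    A[i][j] is the transition probability a_ij, and B[j][obs] is b_j(obs),
    the probability that LHMMs report symbol obs when the true state is S_j.
    """
    n = len(alpha)
    new_alpha = []
    for j in range(n):
        predicted = sum(A[i][j] * alpha[i] for i in range(n))
        new_alpha.append(B[j][obs] * predicted)
    total = sum(new_alpha)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [a / total for a in new_alpha]


def filter_sequence(observations: Sequence[int], A, B,
                    pi: Sequence[float]) -> List[int]:
    """Update the LHMMs decisions one by one and return the MAP state indices."""
    alpha = list(pi)  # equation (14): alpha_0(i) = pi_i
    decisions = []
    for obs in observations:
        alpha = forward_step(alpha, A, B, obs)
        decisions.append(max(range(len(alpha)), key=alpha.__getitem__))
    return decisions


def example() -> List[int]:
    """Run the filter on made-up parameters (not taken from the paper)."""
    n = len(GESTURES)
    stay = 0.04  # repeating the same command is assumed to be unlikely
    move = (1.0 - stay) / (n - 1)
    A = [[stay if i == j else move for j in range(n)] for i in range(n)]
    hit = 0.8  # assumed LHMMs accuracy for every gesture
    miss = (1.0 - hit) / (n - 1)
    B = [[hit if s == o else miss for o in range(n)] for s in range(n)]
    pi = [1.0 / n] * n  # uniform initial distribution
    return filter_sequence([0, 0, 4], A, B, pi)
```

Calling `example()` returns the indices of the filtered gestures in `GESTURES`; with real data, `A` would come from observed gesture statistics and `B` from the LHMMs accuracy matrix, as described above.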
Figure 7: Experimental environment and a Kinect mobile robot. (a) VICON motion capture system; (b) Kinect sensor, FitPC2 microcomputer, and Pioneer 3DX mobile robot.
3. Experiments

3.1. Introduction of Hardware Platform. An experimental environment in which a VICON motion capture system [24] and a Kinect mobile robot platform are installed is shown in Figures 7(a) and 7(b), respectively. The mobile robot platform mainly includes a Pioneer 3DX mobile robot, a microcomputer called FitPC2, and a Kinect sensor, which combines an RGB camera and a depth sensor. The effective sensing range of the Kinect sensor is from 0.4 meters to 4 meters, the vertical viewing angle range is ±43°, the horizontal range is ±57°, and the frame rate is 30 fps. The FitPC2 is a fanless minicomputer that can run Windows or Linux operating systems and is equipped with WiFi. Besides the main components mentioned above, an external battery is used to supply power to both the FitPC2 and the Kinect sensor.

Robot Operating System (ROS) is installed in the Linux operating system of the FitPC2 microcomputer. ROS is an open-source metaoperating system that provides services like those of a real operating system, including hardware abstraction, low-level device control, implementation of commonly used functionality, message passing between processes, and package management [25]. In this paper, all processes are developed in accordance with the ROS framework; they include a driver for the Kinect sensor, a motion data receiver program, a motion data processing program, and a display program.
3.2. Motion Data Evaluation Based on VICON. In order to assess the accuracy of the motion data captured by the Kinect sensor, a VICON motion capture system was used for evaluation. As angular velocity data is similar to acceleration data, we selected angular velocity for evaluation. First, several markers of the VICON system were fixed on the wrist of an experimenter, and then the experimenter made a variety of predefined gestures in front of the Kinect sensor. Finally, we sent the pitch, roll, and yaw data recorded from the VICON system and from the Kinect sensor to a PC, which plotted these data on the same graph, as shown in Figure 8. The data from the VICON system is plotted with solid lines, while the data from the Kinect sensor is plotted with dotted lines. From these figures, we find that the Kinect sensor measured the motions accurately most of the time, except for one period that did not match, as shown in Figure 8(a), because the arm swung out of the scope of the Kinect sensor. The mismatch between the two types of data was always less than 10 degrees. By comparison with the VICON system, we confirmed that the motion data captured by the Kinect sensor meets the accuracy requirement of gesture recognition.
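A comparison of this kind can be checked numerically once both recordings are on a common time base. The sketch below is an assumption about how such a check might look, not the procedure used in the paper: it assumes the Kinect and VICON angle series are already time-aligned and sampled at the same instants, and the function name and the `limit_deg` parameter are hypothetical.

```python
from typing import Sequence, Tuple


def mismatch_report(kinect_deg: Sequence[float], vicon_deg: Sequence[float],
                    limit_deg: float = 10.0) -> Tuple[float, float]:
    """Return (largest absolute mismatch, fraction of samples below limit_deg).

    Both inputs hold one angle (roll, pitch, or yaw) in degrees, and sample k
    of one series must correspond to sample k of the other.
    """
    if len(kinect_deg) != len(vicon_deg) or len(kinect_deg) == 0:
        raise ValueError("series must be non-empty and of equal length")
    diffs = [abs(k - v) for k, v in zip(kinect_deg, vicon_deg)]
    within = sum(1 for d in diffs if d < limit_deg)
    return max(diffs), within / len(diffs)
```

Applied separately to the roll, pitch, and yaw series, the first returned value corresponds to the mismatch range discussed above, with the default limit mirroring the 10 degree bound.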
Figure 8: Comparisons of roll (a), pitch (b), and yaw (c) data between a VICON system and a Kinect sensor. Each panel plots the angle in degrees against time in seconds.

3.3. Continuous Gesture Recognition Test. In order to test the performance of the proposed algorithm, we predefine five gestures for the experiments: "Move forward", "Move backward", "Stop", "Turn left", and "Turn right". The gesture "Waving front and back" means "Move forward", the gesture "Waving left and right" means "Move backward", the gesture "Waving up and down" means "Stop", the gesture "Swing counterclockwise" means "Turn left", and the gesture "Swing clockwise" means "Turn right", as shown in Figure 9.

Figure 9: Five predefined gestures for Human-Robot Interaction: waving front and back, waving left and right, waving up and down, swing counterclockwise, and swing clockwise.

We recorded 10 sets of data from each of three experimenters for training and testing, and these experiments were carried out in accordance with the following three steps.

Step 1. In the scope of a Kinect sensor, repeat gesture Type 1 15 times and take a 5-second break. Continue performing the rest of the types in the same way until Type 5 is done, and then save the data.

Step 2. In the scope of a Kinect sensor, perform a sequence of 20 gestures with a break of about 2 seconds between gestures, and save the data.

Step 3. Process the training data and the testing data. First, train the FNNs to distinguish gestures from nongestures; second, use each block of training data to train the LHMMs. To trade off computational complexity against efficiency and accuracy, set the number of states in the LHMMs to 20 and the number of distinct observation symbols to 20. Then use the trained LHMMs to recognize individual gestures in the test data. The output of each test is a sequence of recognized gestures. Finally, a Bayesian filter with sequential constraints in UHMMs is used to generate the most likely underlying gesture sequence.
3.3.1. Performance Evaluation of FNNs. A MATLAB neural network toolbox [26] is used for training the first and second layers of the FNNs. The initial values of the weights and biases are randomly selected, and different initial values lead to different performance. If the performance does not reach the target, the training phase needs to be restarted with a new neural network until it reaches sufficient accuracy. For instance, within 180 iterations, two different sets of initial values produced two different performances, as shown in Figure 10, which gives one good and one bad result of neural network training.
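The restart strategy described above can be sketched independently of any particular toolbox. The following is a minimal sketch and not MATLAB code: `train_once` stands for a hypothetical user-supplied routine that trains one network from a random initialization derived from the given seed and returns the network together with its final performance (for example, mean squared error), and `max_restarts` is an assumed safety limit.

```python
import random
from typing import Any, Callable, Tuple


def train_with_restarts(train_once: Callable[[int], Tuple[Any, float]],
                        goal: float, max_restarts: int = 10,
                        seed: int = 0) -> Tuple[Any, float, int]:
    """Retrain from new random initial values until the goal is met.

    Returns (best model, its performance, attempt number of that model).
    Lower performance values are better, and the goal is met when the
    performance is less than or equal to goal.
    """
    rng = random.Random(seed)
    best = None
    for attempt in range(1, max_restarts + 1):
        init_seed = rng.randrange(2 ** 31)  # new random initial values
        model, performance = train_once(init_seed)
        if best is None or performance < best[1]:
            best = (model, performance, attempt)
        if performance <= goal:
            break  # the performance goal is met, so stop restarting
    return best
```

With a goal of 1e-005, a first run ending at a performance of 0.0919598 would trigger a restart, while a run ending at 2.71473e-006 would be accepted.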
Figure 10 Performance comparison of gesture spotting between two FNNs (a1) the performance goal is not met within 180 iterations (a2)the output of FNNs accuracy = 7569 (a3) the error of FNNs (b1) the performance goal is met within 26 iterations (b2) the output ofFNNs accuracy = 9686 (b3) the error of FNNs
After 180 iterations the FNNs did not reach the target asshown in Figure 10(a1) These parts circled in Figure 10(a3)showed continuous error which led to detection errors inFigure 10(a2) Thus it can be seen that only if the perfor-mance curve reaches the standard the FNNs can achievesufficient accuracy As the performance curve cannot meetits target in Figure 10(a1) the training program needs to berestarted In Figure 10(b1) we find the FNNs achieved therequirement after 26 iterations Although there were somediscrete errors in Figure 10(b3) they did not generate anydetection error in Figure 10(b2) Therefore we can concludethat several discrete errors on the edge of gesture signal donot affect detection results but consecutive errors can causedetection errors
332 Performance Evaluation of MHMMs We respectivelyrecorded 10 sets of data from three experimenters for trainingand testing again and every set of sequence data has 20 ges-turesThe test includes two steps We firstly performed train-ing LHMMs to recognize individual gestures in a sequenceSecondly the decision obtained from training was used ina Bayesian filter with sequential restrictions to generate amost likely potential command sequence as the final resultThe test results of 3D angular velocity and 3D accelerationare shown in Figures 11 and 12 respectively In Figure 11(a)the signal was a 3D angular velocity which included twentygestures captured by a Kinect sensorWhile the circled region(A) and (B) in Figure 11(b) indicated two errors generated byFNNswhich caused the size of the segmentation to be shorter
10 Mathematical Problems in Engineering
0 20 40 60 80 100 120 140 160 180
x
y
z
5000
minus500
3D angular velocity
Deg
ree (
∘ )
Time (s)
(a)
0 20 40 60 80 100 120 140 160 180
151
050
FNNs output(A) (B)
Time (s)
(b)
0 20 40 60 80 100 120 140 160
StopTurn right
Turn leftMove forward
Move backwardNongesture
Ground truthLHMMs
(C)(D)
LHMMs decision
Time (s)
(c)
0 20 40 60 80 100 120 140 160
StopTurn right
Turn leftMove forward
Move backwardNongesture
Time (s)
UHMMs decision
Ground truthUHMMs
(d)
Figure 11 Results of FNNs LHMMs and UHMMs (a) the raw 3D angular velocity (b) the output of FNNs (c) the LHMMs decision resultscompared with the ground truth (d) the UHMMs decision results compared with the ground truth
than its actual length after detecting the start and end pointsof the gesture After using LHMMs to recognize gestureserrors appeared as shown in the circled regions (C) and (D)of Figure 11(c) which indicate the length of segment affectsthe accuracy of gesture recognition While after UHMMswere used to update recognition results from LHMMs theoutput obtained accurate recognition results Similarly 3Dacceleration data was also used for testing recognition errorsalso appeared in LHMMs as four circled regions shown inFigure 12(c) However after updating in UHMMs accuraterecognition results were also obtained
We have calculated the average recognition rate and aver-age time of 30 sets of gestures which were composed by thefive predefined gestures Table 1 lists the average recognition
accuracy and the average recognition time of HMMs andMHMMs respectively From the table we can conclude thatthe recognition accuracy of MHMMs which use Bayesian fil-tering is better than thatwhich only usesHMMs and the real-time performance of theMHMMs-based gesture recognitionmeets the requirement of online gesture recognition
4 Conclusion
In this paper a novel MHMMs continuous gesture recog-nition algorithm is proposed for Human-Robot Interactionwhich uses raw motion data captured by a Kinect sen-sor Therefore it is different from traditional vision-basedmethods Firstly a Kinect sensor was used to obtain 3D
Mathematical Problems in Engineering 11
Table 1 Comparisons of average recognition accuracy and average recognition time between HMMs and MHMMs
Gesture tyep HMMs accuracy () MHMMs accuracy () HMMs recognition time (s) MHMMs recognition time (s)1 8683 9571 0564 06372 9124 9728 0629 07843 7843 9063 0468 06154 8592 9342 0693 08125 7397 8985 0503 0759
0 20 40 60 80 100 120 140 160 180
3D acceleration10
0
minus10
Time (s)
(ms2)
(a)
0 20 40 60 80 100 120 140 160
FNNs output210
minus1
Time (s)
(b)
0 20 40 60 80 100 120 140 160
StopTurn right
Turn leftMove forward
Move backwardNongesture
Ground truthLHMMs
LHMMs decision
Time (s)
(c)
0 20 40 60 80 100 120 140 160
StopTurn right
Turn leftMove forward
Move backwardNongesture
Times (s)
UHMMs decision
Ground truthUHMMs
(d)
Figure 12 Results of FNNs LHMMs and UHMMs (a) the raw 3D acceleration (b) the output of FNNs (c) the LHMMs decision resultscompared with the ground truth (d) the UHMMs decision results compared with the ground truth
acceleration and 3D angular velocity of an arm Secondlya FNNs and a threshold criterion were respectively usedfor gesture spotting and segment And then the segmentedgesture signalswere sent into LHMMs to recognize individualgestures After that the identified gesture sequence was
fed into UHMMs where a Bayesian filter with sequentialconstraints was used to correct errors generated in LHMMsThe experimental results verify the proposed algorithm notonly effectively improves the accuracy of continuous gesturesbut also has a good real-time performance
12 Mathematical Problems in Engineering
Acknowledgment
This work is supported by the National Natural ScienceFoundation of China (60277605)
References
[1] X Wang M Xia and H Cai ldquoHidden markov models baseddynamic hand gesture recognitionrdquo Mathematical Problems inEngineering vol 2012 Article ID 986134 11 pages 2012
[2] A Pantelopoulos and N G Bourbakis ldquoA survey on wearablesensor-based systems for health monitoring and prognosisrdquoIEEE Transactions on Systems Man and Cybernetics C vol 40no 1 pp 1ndash12 2010
[3] J Yang J Lee and J Choi ldquoActivity recognition based on RFIDobject usage for smart mobile devicesrdquo Journal of ComputerScience and Technology vol 26 no 2 pp 239ndash246 2011
[4] R Poppe ldquoA survey on vision-based human action recognitionrdquoImage and Vision Computing vol 42 no 6 pp 790ndash808 2012
[5] D Weinland R Ronfard and E Boyer ldquoA survey of vision-based methods for action representation segmentation andrecognitionrdquo Computer Vision and Image Understanding vol115 no 2 pp 224ndash241 2011
[6] L Chen J Hoey and C D Nugent ldquoSensor-based activityrecognitionrdquo Transactions on Systems Man and Cyberneticsvol 42 pp 790ndash808 2012
[7] S Bilal R Akmeliawati A A Shafie and M J E SalamildquoHidden Markov model for human to computer interaction astudy on humanhand gesture recognitionrdquoArtificial IntelligenceReview vol 8 pp 1ndash22 2011
[8] L Bao and S S Intille ldquoActivity recognition from user-annotated acceleration datardquo Pervasive Computing vol 3001pp 1ndash17 2004
[9] C Sutton A McCallum and K Rohanimanesh ldquoDynamicconditional random fields factorized probabilistic models forlabeling and segmenting sequence datardquo Journal of MachineLearning Research vol 8 no 2 pp 693ndash723 2007
[10] O Brdiczka J L Crowley and P Reignier ldquoLearning situationmodels in a smart homerdquo IEEE Transactions on Systems Manand Cybernetics B vol 39 no 1 pp 56ndash63 2009
[11] F Omar M Sinn J Truszkowski P Poupart J Tung and ACaine ldquoComparative analysis of probabilistic models for activ-ity recognition with an instrumented walkerrdquo in Proceedings ofthe 26th Conference on Uncertainty in Artificial Intelligence (UAIrsquo10) pp 392ndash400 July 2010
[12] F J Ordonez P D Toledo and A Sanchis ldquoActivity recognitionusing hybrid generativediscriminative models on home envi-ronments using binary sensorsrdquo Sensors vol 13 no 1 pp 5460ndash5477 2013
[13] Z Zhang ldquoMicrosoft kinect sensor and its effectrdquo IEEE Multi-Media vol 19 no 2 pp 4ndash10 2012
[14] R Giraldo D M Giraldo and S A Meza ldquoKernel based handgesture recognition using kinect sensorrdquo in Proceedings of the17th Symposiumof Image Signal Processing andArtificial Visionpp 158ndash161 2012
[15] Z Ren J Meng J Yuan and Z Zhang ldquoRobust hand gesturerecognition with kinect sensorrdquo in Proceedings of the 19th ACMInternational Conference on Multimedia ACM Multimedia 2011(MM rsquo11) pp 759ndash760 December 2011
[16] A Bauer K Klasing and G Lidoris ldquoThe autonomous cityexplorer towards natural human-robot interactivity in urban
environmentsrdquo International Journal of Social Robotics vol 1no 3 pp 127ndash140 2009
[17] H-D Yang A-Y Park and S-W Lee ldquoGesture spotting andrecognition for human-robot interactionrdquo IEEETransactions onRobotics vol 23 no 2 pp 256ndash270 2007
[18] M T Hagan H B Demuth and M H Beale Neural NetworkDesign PWS Publishing Company 1996
[19] L R Rabiner ldquoTutorial on hiddenMarkov models and selectedapplications in speech recognitionrdquo Proceedings of the IEEE vol77 no 2 pp 257ndash286 1989
[20] S-Z Yu and H Kobayashi ldquoPractical implementation of anefficient forward-backward algorithm for an explicit-durationhiddenMarkov modelrdquo IEEE Transactions on Signal Processingvol 54 no 5 pp 1947ndash1951 2006
[21] W Liu andW Han ldquoImproved Viterbi algorithm in continuousspeech recognitionrdquo in International Symposium in InformationTechnology pp 542ndash545 October 2010
[22] OMourad ldquoHmms parameters estimation using hybrid baum-welch genetic algorithmrdquo in Proceedings of the InternationalConference on Computer Application and System Modeling pp207ndash209 2010
[23] KHirasawaMOhbayashiM Koga andMHarada ldquoForwardpropagation universal learning networkrdquo in Proceedings of theIEEE International Conference onNeural Networks pp 353ndash358June 1996
[24] VICON httpwwwviconcom[25] M Quigley K Conley and P Gerkey B ldquoRos an open-source
robot operating systemrdquo in Proceedings of the ICRA Workshopon Open Source Software 2009
[26] Neural httpwwwmathworkscomproductsneuralnet
Figure 7: Experimental environment and a Kinect mobile robot. (a) VICON motion capture system; (b) mobile robot platform with Kinect sensor, FitPC2 microcomputer, and Pioneer 3DX mobile robot.
3. Experiments
3.1. Introduction of Hardware Platform. The experimental environment, in which a VICON motion capture system [24] and a Kinect mobile robot platform are installed, is shown in Figures 7(a) and 7(b), respectively. The mobile robot platform mainly comprises a Pioneer 3DX mobile robot, a FitPC2 microcomputer, and a Kinect sensor, which combines an RGB camera and a depth sensor. The effective sensing range of the Kinect sensor is from 0.4 meters to 4 meters, the vertical viewing angle range is ±43°, the horizontal range is ±57°, and the frame rate is 30 fps. The FitPC2 is a fanless minicomputer that can run Windows or Linux and is equipped with WiFi. Besides the main components mentioned above, an external battery supplies power to both the FitPC2 and the Kinect sensor.
Robot Operating System (ROS), an open-source meta-operating system, is installed in the Linux operating system of the FitPC2 microcomputer. It provides services like those of a real operating system, including hardware abstraction, low-level device control, implementation of commonly used functionality, message passing between processes, and package management [25]. In this paper, all processes are developed in accordance with the ROS framework. They include a driver for the Kinect sensor, a motion data receiver program, a motion data processing program, and a display program.
3.2. Motion Data Evaluation Based on VICON. To assess the accuracy of the motion data captured by the Kinect sensor, a VICON motion capture system was used as a reference. As the angular velocity data is similar to the acceleration data, we selected angular velocity for evaluation. Firstly, several markers of the VICON system were fixed on the wrist of an experimenter, and then the experimenter made a variety of predefined gestures in front of the Kinect sensor. Finally, the pitch, roll, and yaw data recorded from the VICON system and from the Kinect sensor were sent to a PC, which plotted them on the same graph, as shown in Figure 8. The data from the VICON system is plotted with solid lines, while the data from the Kinect sensor is plotted with dotted lines. From these figures, we find that the Kinect sensor measured the motions accurately most of the time, except for one period of mismatch shown in Figure 8(a), which occurred because the arm swung out of the scope of the Kinect sensor. The mismatch between the two types of data was always less than 10 degrees. By comparison with the VICON system, we have confirmed that the motion data captured by the Kinect sensor meets the accuracy requirement of gesture recognition.
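The comparison described above amounts to checking the largest sample-by-sample difference between two angle series against a tolerance. The following is a minimal sketch of that check, assuming both series have already been resampled to the same time base; the function names and the sample values are hypothetical and are not data from the experiments.

```python
def max_abs_mismatch(reference_deg, measured_deg):
    """Largest absolute difference between two equally sampled angle series."""
    if len(reference_deg) != len(measured_deg):
        raise ValueError("series must have the same number of samples")
    return max(abs(r - m) for r, m in zip(reference_deg, measured_deg))


def within_tolerance(reference_deg, measured_deg, tolerance_deg=10.0):
    """True when every sample pair differs by less than the tolerance."""
    return max_abs_mismatch(reference_deg, measured_deg) < tolerance_deg


# Hypothetical roll samples in degrees (illustrative only).
vicon_roll = [0.0, 12.5, 30.0, 44.0, 20.0]
kinect_roll = [1.5, 10.0, 33.0, 40.5, 22.0]

print(max_abs_mismatch(vicon_roll, kinect_roll))
print(within_tolerance(vicon_roll, kinect_roll))
```

The 10-degree default mirrors the bound reported in the text; a real evaluation would also need to exclude or flag intervals where the arm leaves the sensor's field of view.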
3.3. Continuous Gesture Recognition Test. To test the performance of the proposed algorithm, we predefined five gestures for the experiments. The corresponding commands are “Move forward,” “Move backward,” “Stop,” “Turn left,” and “Turn right.” The gesture “Waving front and back” means “Move forward,” the gesture “Waving left and right” means “Move backward,” the gesture “Waving up and down” means “Stop,” the gesture “Swing counterclockwise” means “Turn left,” and the gesture “Swing clockwise” means “Turn right,” as shown in Figure 9.
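The gesture vocabulary above can be written as a simple lookup table. This is only an illustrative sketch: the dictionary and function names are hypothetical, and the fallback label “Nongesture” is borrowed from the class labels used in the result figures.

```python
# Gesture-to-command mapping as stated in the text.
GESTURE_TO_COMMAND = {
    "Waving front and back": "Move forward",
    "Waving left and right": "Move backward",
    "Waving up and down": "Stop",
    "Swing counterclockwise": "Turn left",
    "Swing clockwise": "Turn right",
}


def command_for(gesture, default="Nongesture"):
    """Return the robot command for a recognized gesture label."""
    return GESTURE_TO_COMMAND.get(gesture, default)


print(command_for("Swing clockwise"))
```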
We recorded 10 sets of data from each of three experimenters for training and testing. The experiments were carried out according to the following three steps.
Step 1. Within the scope of the Kinect sensor, repeat gesture Type 1 15 times and then take a 5-second break. Continue with the remaining types in the same way until Type 5 is done, and then save the data.
Step 2. Within the scope of the Kinect sensor, perform a sequence of 20 gestures with a break of about 2 seconds between gestures, and save the data.
Figure 8: Comparisons of (a) roll, (b) pitch, and (c) yaw data between a VICON system and a Kinect sensor. Each panel plots angle in degrees (°) against time (s) for the Kinect and VICON data.
Figure 9: Five predefined gestures for Human-Robot Interaction: waving front and back, waving left and right, waving up and down, swing counterclockwise, and swing clockwise.
Step 3. Process the training data and the testing data. Firstly, train the FNNs to distinguish gestures from nongestures; secondly, use each block of training data to train the LHMMs. To balance computational complexity against efficiency and accuracy, set the number of states in each LHMM to 20 and the number of distinct observation symbols to 20. Then use the trained LHMMs to recognize individual gestures in the test data. The output of each test is a sequence of recognized gestures. Finally, Bayesian filtering with sequential constraints in the UHMMs is used to generate the most likely underlying gesture sequence.
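Recognizing an individual gesture with a bank of discrete HMMs is commonly done by scoring the symbol sequence under each gesture's model and picking the highest likelihood. The sketch below illustrates that idea with the scaled forward algorithm. It is an assumption about how such a step can be coded, not the authors' implementation. The toy models use 2 states and 3 symbols rather than the 20 states and 20 symbols used in the experiments, all probabilities are made up, and the symbol sequence is assumed to be non-empty.

```python
import math


def log_likelihood(symbols, start, trans, emit):
    """Scaled forward algorithm: log P(symbols | model) for a discrete HMM."""
    n_states = len(start)
    alpha = [start[i] * emit[i][symbols[0]] for i in range(n_states)]
    log_prob = 0.0
    for t in range(len(symbols)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states))
                * emit[j][symbols[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        alpha = [a / scale for a in alpha]
        log_prob += math.log(scale)
    return log_prob


def classify(symbols, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(symbols, *models[name]))


# Hypothetical toy models: name -> (start, transition, emission).
shared_start = [0.5, 0.5]
shared_trans = [[0.9, 0.1], [0.1, 0.9]]
MODELS = {
    "Stop": (shared_start, shared_trans, [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1]]),
    "Turn left": (shared_start, shared_trans, [[0.1, 0.1, 0.8], [0.1, 0.2, 0.7]]),
}

print(classify([0, 0, 1, 0], MODELS))
print(classify([2, 2, 1, 2], MODELS))
```

Rescaling the forward variables at every step keeps the computation numerically stable for long symbol sequences, which matters when each gesture spans many sliding-window symbols.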
3.3.1. Performance Evaluation of FNNs. The MATLAB neural network toolbox [26] is used to train the first and second layers of the FNNs. The initial values of the weights and biases are randomly selected, and different initial values lead to different performance. If the performance does not reach the target, training must be restarted with a new neural network until sufficient accuracy is reached. For instance, within 180 iterations, two different sets of initial values produced two different performances, as shown in Figure 10, which gives one example of good and one example of bad neural network training.
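The restart policy described above can be sketched as a loop that redraws random initial values until the training error meets the goal. The code below is a hedged illustration in plain Python rather than the MATLAB toolbox workflow used in the experiments. The names `train_with_restarts` and `toy_train_once` are hypothetical, and the toy trainer minimizes a one-parameter double-well function instead of training an FNN.

```python
import random


def train_with_restarts(train_once, goal, n_params=1, max_restarts=20, seed=0):
    """Retrain from fresh random initial values until the error meets the goal.

    train_once(initial_params) must return (trained_params, final_error).
    Returns (params, error, attempt_number) for the best run seen.
    """
    rng = random.Random(seed)
    best = None
    for attempt in range(1, max_restarts + 1):
        initial = [rng.uniform(-1.0, 1.0) for _ in range(n_params)]
        params, error = train_once(initial)
        if best is None or error < best[1]:
            best = (params, error, attempt)
        if error <= goal:
            break
    return best


def toy_train_once(initial, rate=0.05, steps=500):
    """Gradient descent on a toy double-well error surface (not an FNN).

    The surface (w^2 - 1)^2 + 0.2 (w - 1)^2 reaches zero only at w = 1, so
    some starting points get stuck in the poorer minimum near w = -0.89.
    """
    w = initial[0]
    for _ in range(steps):
        gradient = 4.0 * w * (w * w - 1.0) + 0.4 * (w - 1.0)
        w -= rate * gradient
    error = (w * w - 1.0) ** 2 + 0.2 * (w - 1.0) ** 2
    return [w], error


params, error, attempts = train_with_restarts(toy_train_once, goal=1e-5)
print(f"accepted after {attempts} attempt(s), error={error:.2e}")
```

The goal value of 1e-5 matches the performance goal shown in the Figure 10 panel titles; the cap on restarts is an added safeguard so that the loop always terminates.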
Figure 10: Performance comparison of gesture spotting between two FNNs. (a1) The performance goal is not met within 180 iterations; (a2) the output of FNNs, accuracy = 75.69%; (a3) the error of FNNs; (b1) the performance goal is met within 26 iterations; (b2) the output of FNNs, accuracy = 96.86%; (b3) the error of FNNs. Panel titles: (a1) “Performance is 0.0919598, goal is 1e−005” (300 epochs); (b1) “Performance is 2.71473e−006, goal is 1e−005” (13 epochs). Panels (a2) and (b2) plot ground truth versus FNNs output against time (s); panels (a3) and (b3) plot error against time (s).
After 180 iterations, the FNNs had not reached the target, as shown in Figure 10(a1). The parts circled in Figure 10(a3) show continuous errors, which led to the detection errors in Figure 10(a2). Thus the FNNs can achieve sufficient accuracy only if the performance curve reaches the goal. As the performance curve does not meet its target in Figure 10(a1), the training program needs to be restarted. In Figure 10(b1), the FNNs met the requirement after 26 iterations. Although there were some discrete errors in Figure 10(b3), they did not generate any detection error in Figure 10(b2). Therefore, we conclude that a few discrete errors at the edges of a gesture signal do not affect detection results, but consecutive errors can cause detection errors.
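The distinction between discrete and consecutive errors can be made concrete by grouping frame-level mismatches into runs and flagging only the long ones. The sketch below assumes binary gesture/nongesture labels per frame; the minimum run length of 3 is an illustrative assumption, not a value from the experiments.

```python
def error_runs(predicted, truth):
    """Return (start, length) for each maximal run of consecutive mismatches."""
    runs = []
    start = None
    n = min(len(predicted), len(truth))
    for i in range(n):
        if predicted[i] != truth[i]:
            if start is None:
                start = i  # a new run of errors begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, n - start))
    return runs


def harmful_runs(predicted, truth, min_length=3):
    """Keep only error runs long enough to disturb gesture detection."""
    return [run for run in error_runs(predicted, truth) if run[1] >= min_length]


# Hypothetical frame labels: 1 = gesture, 0 = nongesture.
truth =     [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0, 0, 0, 1, 1, 1, 0, 1, 1, 1]

print(error_runs(predicted, truth))    # [(2, 1), (7, 3)]
print(harmful_runs(predicted, truth))  # [(7, 3)]
```

In this toy case the single mismatch at the leading edge of the gesture is ignored, while the three-frame run would be reported as a likely detection error.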
3.3.2. Performance Evaluation of MHMMs. We again recorded 10 sets of data from each of three experimenters for training and testing; every set of sequence data contains 20 gestures. The test includes two steps. Firstly, we trained the LHMMs to recognize individual gestures in a sequence. Secondly, the decisions obtained from the trained LHMMs were used in a Bayesian filter with sequential restrictions to generate the most likely underlying command sequence as the final result. The test results for 3D angular velocity and 3D acceleration are shown in Figures 11 and 12, respectively. In Figure 11(a), the signal is the 3D angular velocity of twenty gestures captured by the Kinect sensor. The circled regions (A) and (B) in Figure 11(b) indicate two errors generated by the FNNs, which caused the segments to be shorter than their actual length after the start and end points of the gestures were detected. When the LHMMs were used to recognize the gestures, errors appeared, as shown in the circled regions (C) and (D) of Figure 11(c), which indicates that segment length affects the accuracy of gesture recognition. After the UHMMs were used to update the recognition results from the LHMMs, accurate recognition results were obtained. Similarly, 3D acceleration data was also used for testing. Recognition errors again appeared in the LHMMs, in the four circled regions shown in Figure 12(c); after updating in the UHMMs, accurate recognition results were also obtained.
Figure 11: Results of FNNs, LHMMs, and UHMMs. (a) The raw 3D angular velocity; (b) the output of FNNs, with regions (A) and (B) marked; (c) the LHMMs decision results compared with the ground truth, with regions (C) and (D) marked; (d) the UHMMs decision results compared with the ground truth. The decision panels use the labels Stop, Turn right, Turn left, Move forward, Move backward, and Nongesture against time (s).
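The correction performed in the upper layer can be illustrated with a generic discrete Bayesian filter: the belief over the previous gesture is propagated through a gesture-to-gesture transition table (the sequential constraints) and then reweighted by the per-gesture likelihoods from the lower layer. The sketch below is written under those assumptions and is not the authors' code; the two-label vocabulary, the transition table, and the likelihood values are hypothetical.

```python
def bayes_filter_step(prior, transition, likelihood):
    """One predict/update step over gesture labels; returns a normalized posterior."""
    labels = list(prior)
    # Predict: apply the sequential constraints to the previous belief.
    predicted = {
        g: sum(prior[p] * transition[p][g] for p in labels) for g in labels
    }
    # Update: weight by how well the lower-layer model explains the segment.
    unnormalized = {g: predicted[g] * likelihood[g] for g in labels}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation is impossible under the constraints")
    return {g: value / total for g, value in unnormalized.items()}


def filter_sequence(initial, transition, likelihoods):
    """Run the filter over a gesture sequence and return one decision per step."""
    belief = dict(initial)
    decisions = []
    for likelihood in likelihoods:
        belief = bayes_filter_step(belief, transition, likelihood)
        decisions.append(max(belief, key=belief.get))
    return decisions


# Hypothetical constraint: "Stop" is rarely followed by another "Stop".
TRANSITION = {
    "Stop": {"Stop": 0.1, "Move forward": 0.9},
    "Move forward": {"Stop": 0.5, "Move forward": 0.5},
}
INITIAL = {"Stop": 0.5, "Move forward": 0.5}
# Hypothetical lower-layer likelihoods for two consecutive segments; the
# second segment is ambiguous and slightly favors "Stop" on its own.
LIKELIHOODS = [
    {"Stop": 0.9, "Move forward": 0.1},
    {"Stop": 0.55, "Move forward": 0.45},
]

print(filter_sequence(INITIAL, TRANSITION, LIKELIHOODS))
```

In this toy run the second segment would be labeled “Stop” from its likelihood alone, but the transition constraint shifts the decision to “Move forward”, which is the kind of correction the text attributes to the UHMMs.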
We calculated the average recognition rate and average recognition time over 30 sets of gestures composed of the five predefined gestures. Table 1 lists the average recognition accuracy and the average recognition time of HMMs and MHMMs, respectively. From the table, we conclude that the recognition accuracy of MHMMs, which use Bayesian filtering, is better than that of HMMs alone, and that the real-time performance of MHMMs-based gesture recognition meets the requirement of online gesture recognition.
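As a worked check of this comparison, the per-gesture entries of Table 1 can be averaged across the five gesture types. The sketch below reads the table entries as percentages and seconds with their decimal points restored, which is an assumption about the extracted values; the variable and function names are hypothetical.

```python
# Table 1 rows: gesture type -> (HMMs accuracy %, MHMMs accuracy %,
#                                HMMs time s, MHMMs time s).
TABLE_1 = {
    1: (86.83, 95.71, 0.564, 0.637),
    2: (91.24, 97.28, 0.629, 0.784),
    3: (78.43, 90.63, 0.468, 0.615),
    4: (85.92, 93.42, 0.693, 0.812),
    5: (73.97, 89.85, 0.503, 0.759),
}


def column_mean(table, column):
    """Mean of one column across all gesture types."""
    values = [row[column] for row in table.values()]
    return sum(values) / len(values)


hmm_accuracy = column_mean(TABLE_1, 0)
mhmm_accuracy = column_mean(TABLE_1, 1)
hmm_time = column_mean(TABLE_1, 2)
mhmm_time = column_mean(TABLE_1, 3)

print(f"mean accuracy: HMMs {hmm_accuracy:.2f}%, MHMMs {mhmm_accuracy:.2f}%")
print(f"mean time: HMMs {hmm_time:.3f} s, MHMMs {mhmm_time:.3f} s")
```

Under that reading, the mean accuracy rises from about 83.28% to about 93.38%, while the mean recognition time grows from about 0.57 s to about 0.72 s per gesture.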
Table 1: Comparisons of average recognition accuracy and average recognition time between HMMs and MHMMs.

Gesture type   HMMs accuracy (%)   MHMMs accuracy (%)   HMMs recognition time (s)   MHMMs recognition time (s)
1              86.83               95.71                0.564                       0.637
2              91.24               97.28                0.629                       0.784
3              78.43               90.63                0.468                       0.615
4              85.92               93.42                0.693                       0.812
5              73.97               89.85                0.503                       0.759

Figure 12: Results of FNNs, LHMMs, and UHMMs. (a) The raw 3D acceleration (m/s²); (b) the output of FNNs; (c) the LHMMs decision results compared with the ground truth; (d) the UHMMs decision results compared with the ground truth. The decision panels use the labels Stop, Turn right, Turn left, Move forward, Move backward, and Nongesture against time (s).

4. Conclusion

In this paper, a novel MHMMs-based continuous gesture recognition algorithm is proposed for Human-Robot Interaction. It uses raw motion data captured by a Kinect sensor and therefore differs from traditional vision-based methods. Firstly, a Kinect sensor was used to obtain the 3D acceleration and 3D angular velocity of an arm. Secondly, FNNs and a threshold criterion were used for gesture spotting and segmentation, respectively. Then the segmented gesture signals were sent into LHMMs to recognize individual gestures. After that, the identified gesture sequence was fed into UHMMs, where a Bayesian filter with sequential constraints was used to correct errors generated in the LHMMs. The experimental results verify that the proposed algorithm not only effectively improves the accuracy of continuous gesture recognition but also has good real-time performance.
Acknowledgment
This work is supported by the National Natural Science Foundation of China (60277605).
References
[1] X. Wang, M. Xia, and H. Cai, “Hidden Markov models based dynamic hand gesture recognition,” Mathematical Problems in Engineering, vol. 2012, Article ID 986134, 11 pages, 2012.
[2] A. Pantelopoulos and N. G. Bourbakis, “A survey on wearable sensor-based systems for health monitoring and prognosis,” IEEE Transactions on Systems, Man and Cybernetics C, vol. 40, no. 1, pp. 1–12, 2010.
[3] J. Yang, J. Lee, and J. Choi, “Activity recognition based on RFID object usage for smart mobile devices,” Journal of Computer Science and Technology, vol. 26, no. 2, pp. 239–246, 2011.
[4] R. Poppe, “A survey on vision-based human action recognition,” Image and Vision Computing, vol. 42, no. 6, pp. 790–808, 2012.
[5] D. Weinland, R. Ronfard, and E. Boyer, “A survey of vision-based methods for action representation, segmentation and recognition,” Computer Vision and Image Understanding, vol. 115, no. 2, pp. 224–241, 2011.
[6] L. Chen, J. Hoey, and C. D. Nugent, “Sensor-based activity recognition,” Transactions on Systems, Man and Cybernetics, vol. 42, pp. 790–808, 2012.
[7] S. Bilal, R. Akmeliawati, A. A. Shafie, and M. J. E. Salami, “Hidden Markov model for human to computer interaction: a study on human hand gesture recognition,” Artificial Intelligence Review, vol. 8, pp. 1–22, 2011.
[8] L. Bao and S. S. Intille, “Activity recognition from user-annotated acceleration data,” Pervasive Computing, vol. 3001, pp. 1–17, 2004.
[9] C. Sutton, A. McCallum, and K. Rohanimanesh, “Dynamic conditional random fields: factorized probabilistic models for labeling and segmenting sequence data,” Journal of Machine Learning Research, vol. 8, no. 2, pp. 693–723, 2007.
[10] O. Brdiczka, J. L. Crowley, and P. Reignier, “Learning situation models in a smart home,” IEEE Transactions on Systems, Man and Cybernetics B, vol. 39, no. 1, pp. 56–63, 2009.
[11] F. Omar, M. Sinn, J. Truszkowski, P. Poupart, J. Tung, and A. Caine, “Comparative analysis of probabilistic models for activity recognition with an instrumented walker,” in Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI ’10), pp. 392–400, July 2010.
[12] F. J. Ordonez, P. D. Toledo, and A. Sanchis, “Activity recognition using hybrid generative/discriminative models on home environments using binary sensors,” Sensors, vol. 13, no. 1, pp. 5460–5477, 2013.
[13] Z. Zhang, “Microsoft kinect sensor and its effect,” IEEE MultiMedia, vol. 19, no. 2, pp. 4–10, 2012.
[14] R. Giraldo, D. M. Giraldo, and S. A. Meza, “Kernel based hand gesture recognition using kinect sensor,” in Proceedings of the 17th Symposium of Image, Signal Processing, and Artificial Vision, pp. 158–161, 2012.
[15] Z. Ren, J. Meng, J. Yuan, and Z. Zhang, “Robust hand gesture recognition with kinect sensor,” in Proceedings of the 19th ACM International Conference on Multimedia, ACM Multimedia 2011 (MM ’11), pp. 759–760, December 2011.
[16] A. Bauer, K. Klasing, and G. Lidoris, “The autonomous city explorer: towards natural human-robot interactivity in urban environments,” International Journal of Social Robotics, vol. 1, no. 3, pp. 127–140, 2009.
[17] H.-D. Yang, A.-Y. Park, and S.-W. Lee, “Gesture spotting and recognition for human-robot interaction,” IEEE Transactions on Robotics, vol. 23, no. 2, pp. 256–270, 2007.
[18] M. T. Hagan, H. B. Demuth, and M. H. Beale, Neural Network Design, PWS Publishing Company, 1996.
[19] L. R. Rabiner, “Tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[20] S.-Z. Yu and H. Kobayashi, “Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model,” IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[21] W. Liu and W. Han, “Improved Viterbi algorithm in continuous speech recognition,” in International Symposium in Information Technology, pp. 542–545, October 2010.
[22] O. Mourad, “HMMs parameters estimation using hybrid Baum-Welch genetic algorithm,” in Proceedings of the International Conference on Computer Application and System Modeling, pp. 207–209, 2010.
[23] K. Hirasawa, M. Ohbayashi, M. Koga, and M. Harada, “Forward propagation universal learning network,” in Proceedings of the IEEE International Conference on Neural Networks, pp. 353–358, June 1996.
[24] VICON, http://www.vicon.com/.
[25] M. Quigley, K. Conley, and B. P. Gerkey, “ROS: an open-source robot operating system,” in Proceedings of the ICRA Workshop on Open Source Software, 2009.
[26] Neural, http://www.mathworks.com/products/neuralnet/.
8 Mathematical Problems in Engineering
0 20 40 60 80 100 120
70
60
50
40
30
20
10
0
minus10
Rall data evaluation
KinectVICON
Time (s)
Deg
ree (
∘ )
(a)
8070605040302010
0minus10
minus20
minus30
minus40
Pitch data evaluation
KinectVICON
0 20 40 60 80 100 120Time (s)
Deg
ree (
∘ )
(b)
Yaw data evaluation
KinectVICON
40302010
0
minus10
minus20
minus30
minus40
minus500 20 40 60 80 100 120
Time (s)
Deg
ree (
∘ )
(c)
Figure 8 Comparisons of Roll (a) Pitch (b) and Yaw (c) data between a VICON system and a Kinect sensor
Waving front and back Waving left and right Waving up and down
Swing counterclockwise Swing clockwise
Figure 9 Five predefined gestures for Human-Robot Interaction
Step 3 Process the training data and the testing data Firstlytraining the FNNs to distinguish gestures and nongesturessecondly use each block of training data to train the LHMMs
To trade off the computational complexity with efficiencyand accuracy set up the number of states 20 in the LHMMMeanwhile set up the number of distinct observation symbols20 and then use the trained LHMMs to recognize individualgestures in the test data The output of each test is a sequenceof recognized gestures Finally a Bayesian filtering withsequential constraints in UHMMs is used to generate a mostlikely underlying gesture sequence
331 Performance Evaluation of FNNs A MATLAB neuralnetwork toolbox [26] is used for training in the first andsecond layer of FNNs The initial values of weights anddeviations are randomly selected Different initial values havedifferent performances If the performance does not reach thetarget then the training phase needs to restart a new neuralnetwork until it reaches a sufficient accuracy For instancewithin 180 iterations two different initial values got twodifferent performances as shown in Figure 10 which gavetwo results of both good and bad neural network training
Mathematical Problems in Engineering 9
151
050
minus05
minus1
minus150 50 100 150 200 250 300 350 400 450 500
Time (s)
Erro
r
Time (s)
Time (s) Time (s)
151
050
minus05
minus1
minus150 50 100 150 200 250 300 350 400 450 500
Erro
r
15
1
05
0
minus050 50 100 150 200 250 300 350 400 450 500
FNNsOutput
FNNsOutput
15
1
05
0
minus050 50 100 150 200 250 300 350 400 450 500
Gro
und
trut
h ve
rsus
0 50 100 150 200 250 300
Performance is 00919598 goal is1e minus 005100
10minus1
10minus2
10minus3
10minus4
10minus5
10minus6
300 epochs0 2 4 6 8 10 12
Trai
ning
-blu
e Goa
l-bla
ck
Trai
ning
-blu
e Goa
l-bla
ck
101
100
10minus1
10minus2
10minus3
10minus4
10minus5
10minus6
13 epochs
Performance is 271473e minus 006 goal is 1e minus 005
(a1) (b1)
(a2) (b2)
(a3) (b3)
out
put
Gro
und
trut
h ve
rsus
out
put
Figure 10 Performance comparison of gesture spotting between two FNNs (a1) the performance goal is not met within 180 iterations (a2)the output of FNNs accuracy = 7569 (a3) the error of FNNs (b1) the performance goal is met within 26 iterations (b2) the output ofFNNs accuracy = 9686 (b3) the error of FNNs
After 180 iterations the FNNs did not reach the target asshown in Figure 10(a1) These parts circled in Figure 10(a3)showed continuous error which led to detection errors inFigure 10(a2) Thus it can be seen that only if the perfor-mance curve reaches the standard the FNNs can achievesufficient accuracy As the performance curve cannot meetits target in Figure 10(a1) the training program needs to berestarted In Figure 10(b1) we find the FNNs achieved therequirement after 26 iterations Although there were somediscrete errors in Figure 10(b3) they did not generate anydetection error in Figure 10(b2) Therefore we can concludethat several discrete errors on the edge of gesture signal donot affect detection results but consecutive errors can causedetection errors
332 Performance Evaluation of MHMMs We respectivelyrecorded 10 sets of data from three experimenters for trainingand testing again and every set of sequence data has 20 ges-turesThe test includes two steps We firstly performed train-ing LHMMs to recognize individual gestures in a sequenceSecondly the decision obtained from training was used ina Bayesian filter with sequential restrictions to generate amost likely potential command sequence as the final resultThe test results of 3D angular velocity and 3D accelerationare shown in Figures 11 and 12 respectively In Figure 11(a)the signal was a 3D angular velocity which included twentygestures captured by a Kinect sensorWhile the circled region(A) and (B) in Figure 11(b) indicated two errors generated byFNNswhich caused the size of the segmentation to be shorter
10 Mathematical Problems in Engineering
0 20 40 60 80 100 120 140 160 180
x
y
z
5000
minus500
3D angular velocity
Deg
ree (
∘ )
Time (s)
(a)
0 20 40 60 80 100 120 140 160 180
151
050
FNNs output(A) (B)
Time (s)
(b)
0 20 40 60 80 100 120 140 160
StopTurn right
Turn leftMove forward
Move backwardNongesture
Ground truthLHMMs
(C)(D)
LHMMs decision
Time (s)
(c)
0 20 40 60 80 100 120 140 160
StopTurn right
Turn leftMove forward
Move backwardNongesture
Time (s)
UHMMs decision
Ground truthUHMMs
(d)
Figure 11 Results of FNNs LHMMs and UHMMs (a) the raw 3D angular velocity (b) the output of FNNs (c) the LHMMs decision resultscompared with the ground truth (d) the UHMMs decision results compared with the ground truth
than its actual length after detecting the start and end pointsof the gesture After using LHMMs to recognize gestureserrors appeared as shown in the circled regions (C) and (D)of Figure 11(c) which indicate the length of segment affectsthe accuracy of gesture recognition While after UHMMswere used to update recognition results from LHMMs theoutput obtained accurate recognition results Similarly 3Dacceleration data was also used for testing recognition errorsalso appeared in LHMMs as four circled regions shown inFigure 12(c) However after updating in UHMMs accuraterecognition results were also obtained
We have calculated the average recognition rate and aver-age time of 30 sets of gestures which were composed by thefive predefined gestures Table 1 lists the average recognition
accuracy and the average recognition time of HMMs andMHMMs respectively From the table we can conclude thatthe recognition accuracy of MHMMs which use Bayesian fil-tering is better than thatwhich only usesHMMs and the real-time performance of theMHMMs-based gesture recognitionmeets the requirement of online gesture recognition
4 Conclusion
In this paper a novel MHMMs continuous gesture recog-nition algorithm is proposed for Human-Robot Interactionwhich uses raw motion data captured by a Kinect sen-sor Therefore it is different from traditional vision-basedmethods Firstly a Kinect sensor was used to obtain 3D
Mathematical Problems in Engineering 11
Table 1 Comparisons of average recognition accuracy and average recognition time between HMMs and MHMMs
Gesture tyep HMMs accuracy () MHMMs accuracy () HMMs recognition time (s) MHMMs recognition time (s)1 8683 9571 0564 06372 9124 9728 0629 07843 7843 9063 0468 06154 8592 9342 0693 08125 7397 8985 0503 0759
0 20 40 60 80 100 120 140 160 180
3D acceleration10
0
minus10
Time (s)
(ms2)
(a)
0 20 40 60 80 100 120 140 160
FNNs output210
minus1
Time (s)
(b)
0 20 40 60 80 100 120 140 160
StopTurn right
Turn leftMove forward
Move backwardNongesture
Ground truthLHMMs
LHMMs decision
Time (s)
(c)
0 20 40 60 80 100 120 140 160
StopTurn right
Turn leftMove forward
Move backwardNongesture
Times (s)
UHMMs decision
Ground truthUHMMs
(d)
Figure 12 Results of FNNs LHMMs and UHMMs (a) the raw 3D acceleration (b) the output of FNNs (c) the LHMMs decision resultscompared with the ground truth (d) the UHMMs decision results compared with the ground truth
acceleration and 3D angular velocity of an arm Secondlya FNNs and a threshold criterion were respectively usedfor gesture spotting and segment And then the segmentedgesture signalswere sent into LHMMs to recognize individualgestures After that the identified gesture sequence was
fed into UHMMs where a Bayesian filter with sequentialconstraints was used to correct errors generated in LHMMsThe experimental results verify the proposed algorithm notonly effectively improves the accuracy of continuous gesturesbut also has a good real-time performance
12 Mathematical Problems in Engineering
Acknowledgment
This work is supported by the National Natural ScienceFoundation of China (60277605)
References
[1] X Wang M Xia and H Cai ldquoHidden markov models baseddynamic hand gesture recognitionrdquo Mathematical Problems inEngineering vol 2012 Article ID 986134 11 pages 2012
[2] A Pantelopoulos and N G Bourbakis ldquoA survey on wearablesensor-based systems for health monitoring and prognosisrdquoIEEE Transactions on Systems Man and Cybernetics C vol 40no 1 pp 1ndash12 2010
[3] J Yang J Lee and J Choi ldquoActivity recognition based on RFIDobject usage for smart mobile devicesrdquo Journal of ComputerScience and Technology vol 26 no 2 pp 239ndash246 2011
[4] R Poppe ldquoA survey on vision-based human action recognitionrdquoImage and Vision Computing vol 42 no 6 pp 790ndash808 2012
[5] D Weinland R Ronfard and E Boyer ldquoA survey of vision-based methods for action representation segmentation andrecognitionrdquo Computer Vision and Image Understanding vol115 no 2 pp 224ndash241 2011
[6] L Chen J Hoey and C D Nugent ldquoSensor-based activityrecognitionrdquo Transactions on Systems Man and Cyberneticsvol 42 pp 790ndash808 2012
[7] S Bilal R Akmeliawati A A Shafie and M J E SalamildquoHidden Markov model for human to computer interaction astudy on humanhand gesture recognitionrdquoArtificial IntelligenceReview vol 8 pp 1ndash22 2011
[8] L Bao and S S Intille ldquoActivity recognition from user-annotated acceleration datardquo Pervasive Computing vol 3001pp 1ndash17 2004
[9] C Sutton A McCallum and K Rohanimanesh ldquoDynamicconditional random fields factorized probabilistic models forlabeling and segmenting sequence datardquo Journal of MachineLearning Research vol 8 no 2 pp 693ndash723 2007
[10] O Brdiczka J L Crowley and P Reignier ldquoLearning situationmodels in a smart homerdquo IEEE Transactions on Systems Manand Cybernetics B vol 39 no 1 pp 56ndash63 2009
[11] F Omar M Sinn J Truszkowski P Poupart J Tung and ACaine ldquoComparative analysis of probabilistic models for activ-ity recognition with an instrumented walkerrdquo in Proceedings ofthe 26th Conference on Uncertainty in Artificial Intelligence (UAIrsquo10) pp 392ndash400 July 2010
[12] F J Ordonez P D Toledo and A Sanchis ldquoActivity recognitionusing hybrid generativediscriminative models on home envi-ronments using binary sensorsrdquo Sensors vol 13 no 1 pp 5460ndash5477 2013
[13] Z Zhang ldquoMicrosoft kinect sensor and its effectrdquo IEEE Multi-Media vol 19 no 2 pp 4ndash10 2012
[14] R Giraldo D M Giraldo and S A Meza ldquoKernel based handgesture recognition using kinect sensorrdquo in Proceedings of the17th Symposiumof Image Signal Processing andArtificial Visionpp 158ndash161 2012
[15] Z Ren J Meng J Yuan and Z Zhang ldquoRobust hand gesturerecognition with kinect sensorrdquo in Proceedings of the 19th ACMInternational Conference on Multimedia ACM Multimedia 2011(MM rsquo11) pp 759ndash760 December 2011
[16] A Bauer K Klasing and G Lidoris ldquoThe autonomous cityexplorer towards natural human-robot interactivity in urban
environmentsrdquo International Journal of Social Robotics vol 1no 3 pp 127ndash140 2009
[17] H-D Yang A-Y Park and S-W Lee ldquoGesture spotting andrecognition for human-robot interactionrdquo IEEETransactions onRobotics vol 23 no 2 pp 256ndash270 2007
[18] M T Hagan H B Demuth and M H Beale Neural NetworkDesign PWS Publishing Company 1996
[19] L R Rabiner ldquoTutorial on hiddenMarkov models and selectedapplications in speech recognitionrdquo Proceedings of the IEEE vol77 no 2 pp 257ndash286 1989
[20] S-Z Yu and H Kobayashi ldquoPractical implementation of anefficient forward-backward algorithm for an explicit-durationhiddenMarkov modelrdquo IEEE Transactions on Signal Processingvol 54 no 5 pp 1947ndash1951 2006
[21] W Liu andW Han ldquoImproved Viterbi algorithm in continuousspeech recognitionrdquo in International Symposium in InformationTechnology pp 542ndash545 October 2010
[22] OMourad ldquoHmms parameters estimation using hybrid baum-welch genetic algorithmrdquo in Proceedings of the InternationalConference on Computer Application and System Modeling pp207ndash209 2010
[23] KHirasawaMOhbayashiM Koga andMHarada ldquoForwardpropagation universal learning networkrdquo in Proceedings of theIEEE International Conference onNeural Networks pp 353ndash358June 1996
[24] VICON httpwwwviconcom[25] M Quigley K Conley and P Gerkey B ldquoRos an open-source
robot operating systemrdquo in Proceedings of the ICRA Workshopon Open Source Software 2009
[26] Neural httpwwwmathworkscomproductsneuralnet
Figure 10: Performance comparison of gesture spotting between two FNNs. (a1) The performance goal is not met within 180 iterations; (a2) the output of the FNNs, accuracy = 75.69%; (a3) the error of the FNNs; (b1) the performance goal is met within 26 iterations; (b2) the output of the FNNs, accuracy = 96.86%; (b3) the error of the FNNs. [Panels (a1) and (b1) plot training performance against the goal of 1e-005, with final performance 0.0919598 and 2.71473e-006, respectively; panels (a2) and (b2) plot ground truth versus FNNs output over time (s); panels (a3) and (b3) plot the error over time (s).]
After 180 iterations, the FNNs did not reach the target, as shown in Figure 10(a1). The parts circled in Figure 10(a3) show consecutive errors, which led to the detection errors in Figure 10(a2). Thus, the FNNs achieve sufficient accuracy only if the performance curve reaches its goal. Because the performance curve does not meet its target in Figure 10(a1), the training program needs to be restarted. In Figure 10(b1), the FNNs met the requirement after 26 iterations. Although there were some discrete errors in Figure 10(b3), they did not generate any detection error in Figure 10(b2). Therefore, we can conclude that several discrete errors on the edge of a gesture signal do not affect the detection results, but consecutive errors can cause detection errors.
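The difference between discrete and consecutive errors can be made concrete with a small diagnostic. The sketch below is only an illustration: the 0.5 binarization threshold, the `min_run` cutoff of three samples, the function names, and the sample values are all assumptions and are not taken from the paper. It computes an error signal of the kind plotted in Figure 10(a3) and (b3) and measures how long each run of errors lasts.

```python
from itertools import groupby


def error_signal(ground_truth, fnn_output, threshold=0.5):
    """Binarize the FNNs output and subtract the ground-truth labels.

    The result is 0 where the detection is correct, +1 for a false
    detection, and -1 for a missed gesture sample.
    The threshold of 0.5 is an assumed value.
    """
    if len(ground_truth) != len(fnn_output):
        raise ValueError("sequences must have the same length")
    detected = [1 if y >= threshold else 0 for y in fnn_output]
    return [d - g for d, g in zip(detected, ground_truth)]


def error_run_lengths(errors):
    """Return the length of every run of consecutive non-zero errors."""
    return [
        sum(1 for _ in run)
        for is_error, run in groupby(errors, key=lambda e: e != 0)
        if is_error
    ]


def has_consecutive_errors(errors, min_run=3):
    """True if any error run lasts at least min_run samples (assumed cutoff)."""
    return any(length >= min_run for length in error_run_lengths(errors))


# Hypothetical data: one gesture occupying samples 2 to 7.
ground_truth = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
# A single isolated error on the leading edge of the gesture (sample 2).
edge_case = [0.1, 0.0, 0.3, 0.9, 1.0, 0.9, 1.0, 0.8, 0.1, 0.0, 0.1, 0.0]
# Four consecutive missed samples inside the gesture (samples 4 to 7).
burst_case = [0.1, 0.0, 0.9, 0.8, 0.2, 0.1, 0.3, 0.2, 0.1, 0.0, 0.1, 0.0]

edge_errors = error_signal(ground_truth, edge_case)
burst_errors = error_signal(ground_truth, burst_case)
print(error_run_lengths(edge_errors), has_consecutive_errors(edge_errors))
print(error_run_lengths(burst_errors), has_consecutive_errors(burst_errors))
```

In this toy data the edge case contains one error run of length 1, while the burst case contains a run of length 4 that truncates the detected gesture, which is the situation the text associates with detection errors.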
3.3.2. Performance Evaluation of MHMMs. We recorded 10 sets of data from each of three experimenters for training and testing, and every set of sequence data contains 20 gestures. The test includes two steps. First, we trained the LHMMs to recognize individual gestures in a sequence. Second, the decisions obtained from the trained LHMMs were used in a Bayesian filter with sequential restrictions to generate the most likely underlying command sequence as the final result. The test results for 3D angular velocity and 3D acceleration are shown in Figures 11 and 12, respectively. In Figure 11(a), the signal is a 3D angular velocity that includes twenty gestures captured by a Kinect sensor. The circled regions (A) and (B) in Figure 11(b) indicate two errors generated by the FNNs, which caused the segments to be shorter than their actual length after the start and end points of the gesture were detected. After the LHMMs were used to recognize the gestures, errors appeared, as shown in the circled regions (C) and (D) of Figure 11(c), which indicates that the segment length affects the accuracy of gesture recognition. After the UHMMs were used to update the recognition results from the LHMMs, accurate recognition results were obtained. Similarly, 3D acceleration data were also used for testing. Recognition errors again appeared in the LHMMs, as shown by the four circled regions in Figure 12(c); however, after updating in the UHMMs, accurate recognition results were also obtained.

Figure 11: Results of FNNs, LHMMs, and UHMMs. (a) The raw 3D angular velocity; (b) the output of the FNNs; (c) the LHMMs decision results compared with the ground truth; (d) the UHMMs decision results compared with the ground truth. [Panel (a) plots the x, y, and z components in degrees against time (s); panels (c) and (d) plot the decision classes Stop, Turn right, Turn left, Move forward, Move backward, and Nongesture against time (s).]
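The correction step performed in the UHMMs can be sketched as a forward Bayesian filter over the sequence of LHMMs decisions. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `bayes_filter`, the transition probabilities (which here make a direct change between moving forward and moving backward unlikely), and the likelihood values are all invented for illustration, and the sequential constraints actually used in the paper may differ.

```python
GESTURES = ["stop", "turn_right", "turn_left", "move_forward", "move_backward"]


def bayes_filter(likelihoods, transition, prior):
    """Forward Bayesian filtering over a sequence of gesture decisions.

    likelihoods[t][j]: score of gesture j for segment t (e.g. from the LHMMs)
    transition[i][j] : P(gesture j follows gesture i), the sequential constraint
    prior[j]         : P(gesture j) before the first segment
    Returns the index of the most probable gesture for every segment.
    """
    n = len(prior)
    belief = list(prior)
    decisions = []
    for t, obs in enumerate(likelihoods):
        if t > 0:
            # Prediction step: propagate the belief through the constraints.
            belief = [
                sum(belief[i] * transition[i][j] for i in range(n))
                for j in range(n)
            ]
        # Update step: weight by the observation likelihood and normalize.
        belief = [belief[j] * obs[j] for j in range(n)]
        total = sum(belief)
        if total == 0.0:
            raise ValueError("observation is incompatible with the constraints")
        belief = [b / total for b in belief]
        decisions.append(max(range(n), key=belief.__getitem__))
    return decisions


# Assumed constraints: forward <-> backward transitions are very unlikely.
uniform = [0.2] * 5
transition = [
    uniform,                          # after stop
    uniform,                          # after turn_right
    uniform,                          # after turn_left
    [0.4, 0.2, 0.2, 0.18, 0.02],      # after move_forward
    [0.4, 0.2, 0.2, 0.02, 0.18],      # after move_backward
]
# Hypothetical LHMMs scores; the second (shortened) segment is ambiguous.
likelihoods = [
    [0.05, 0.05, 0.05, 0.80, 0.05],
    [0.40, 0.05, 0.05, 0.05, 0.45],
    [0.05, 0.05, 0.05, 0.05, 0.80],
]

raw = [max(range(5), key=row.__getitem__) for row in likelihoods]
filtered = bayes_filter(likelihoods, transition, prior=uniform)
print([GESTURES[i] for i in raw])
print([GESTURES[i] for i in filtered])
```

With these made-up numbers, taking the best LHMMs score alone labels the second segment as a backward move, whereas the filtered sequence labels it as a stop, because the assumed constraint makes a backward move directly after a forward move improbable.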
We calculated the average recognition rate and the average recognition time over 30 sets of gestures, which were composed of the five predefined gestures. Table 1 lists the average recognition accuracy and the average recognition time of HMMs and MHMMs, respectively. From the table, we can conclude that the recognition accuracy of MHMMs, which use Bayesian filtering, is better than that of HMMs alone for all five gesture types, and that the real-time performance of MHMMs-based gesture recognition meets the requirement of online gesture recognition.
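The comparison drawn from Table 1 can be quantified with a short calculation. The sketch below assumes that the decimal points lost in the extracted table are restored in the obvious way (for example, "8683" is read as 86.83% and "0564" as 0.564 s); the variable names are illustrative only.

```python
from statistics import mean

# Per-gesture values from Table 1, with decimal points restored (assumption).
# gesture type: (HMMs acc %, MHMMs acc %, HMMs time s, MHMMs time s)
TABLE_1 = {
    1: (86.83, 95.71, 0.564, 0.637),
    2: (91.24, 97.28, 0.629, 0.784),
    3: (78.43, 90.63, 0.468, 0.615),
    4: (85.92, 93.42, 0.693, 0.812),
    5: (73.97, 89.85, 0.503, 0.759),
}

hmm_acc = mean(row[0] for row in TABLE_1.values())
mhmm_acc = mean(row[1] for row in TABLE_1.values())
hmm_time = mean(row[2] for row in TABLE_1.values())
mhmm_time = mean(row[3] for row in TABLE_1.values())

# Accuracy gain of MHMMs over HMMs for each gesture, in percentage points.
gains = {g: row[1] - row[0] for g, row in TABLE_1.items()}

print(f"mean accuracy: HMMs {hmm_acc:.2f}%, MHMMs {mhmm_acc:.2f}%")
print(f"mean time:     HMMs {hmm_time:.3f} s, MHMMs {mhmm_time:.3f} s")
print(f"gain range:    {min(gains.values()):.2f} to {max(gains.values()):.2f} points")
```

Under this reading of the table, the mean accuracy rises from about 83.3% to about 93.4%, the per-gesture gain ranges from about 6 to about 16 percentage points, and the mean recognition time grows by roughly 0.15 s, staying below 1 s for every gesture.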
Table 1: Comparison of average recognition accuracy and average recognition time between HMMs and MHMMs.

Gesture type   HMMs accuracy (%)   MHMMs accuracy (%)   HMMs recognition time (s)   MHMMs recognition time (s)
1              86.83               95.71                0.564                       0.637
2              91.24               97.28                0.629                       0.784
3              78.43               90.63                0.468                       0.615
4              85.92               93.42                0.693                       0.812
5              73.97               89.85                0.503                       0.759

Figure 12: Results of FNNs, LHMMs, and UHMMs. (a) The raw 3D acceleration; (b) the output of the FNNs; (c) the LHMMs decision results compared with the ground truth; (d) the UHMMs decision results compared with the ground truth. [Panel (a) plots the 3D acceleration (m/s^2) against time (s); panels (c) and (d) plot the decision classes Stop, Turn right, Turn left, Move forward, Move backward, and Nongesture against time (s).]

4. Conclusion

In this paper, a novel MHMMs-based continuous gesture recognition algorithm is proposed for Human-Robot Interaction. It uses raw motion data captured by a Kinect sensor and therefore differs from traditional vision-based methods. First, a Kinect sensor was used to obtain the 3D acceleration and 3D angular velocity of an arm. Second, FNNs and a threshold criterion were used for gesture spotting and segmentation, respectively. The segmented gesture signals were then sent into the LHMMs to recognize individual gestures. After that, the identified gesture sequence was fed into the UHMMs, where a Bayesian filter with sequential constraints was used to correct errors generated in the LHMMs. The experimental results verify that the proposed algorithm not only effectively improves the accuracy of continuous gesture recognition but also has good real-time performance.
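The LHMMs recognition step summarized above can be sketched as scoring a symbolized segment against one discrete HMM per gesture and selecting the best-scoring model. The sketch below uses the standard scaled forward algorithm; the two toy models, their parameters, the three-symbol alphabet, and the function names are assumptions made for illustration and are not the models trained in the paper.

```python
import math


def log_likelihood(symbols, start, trans, emit):
    """Scaled forward algorithm: log P(symbols | model) for a discrete HMM.

    start[i]   : initial probability of state i
    trans[i][j]: probability of moving from state i to state j
    emit[i][k] : probability that state i emits symbol k
    """
    if not symbols:
        raise ValueError("the symbol sequence must not be empty")
    n = len(start)
    alpha = [start[i] * emit[i][symbols[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(symbols)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbols[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        # Rescale to avoid underflow and accumulate the log of the scale.
        alpha = [a / scale for a in alpha]
        log_prob += math.log(scale)
    return log_prob


def classify(symbols, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(symbols, *models[name]))


# Two hypothetical left-to-right models over symbols {0, 1, 2}.
left_to_right = [[0.7, 0.3], [0.0, 1.0]]
models = {
    "move_forward": ([1.0, 0.0], left_to_right, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "move_backward": ([1.0, 0.0], left_to_right, [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]),
}

print(classify([0, 0, 1, 1], models))
print(classify([2, 2, 1, 1], models))
```

Working in the log domain with per-step rescaling keeps the score numerically stable for long segments, which matters when sequences of different lengths are compared across models.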
Acknowledgment
This work is supported by the National Natural Science Foundation of China (60277605).
References
[1] X Wang M Xia and H Cai ldquoHidden markov models baseddynamic hand gesture recognitionrdquo Mathematical Problems inEngineering vol 2012 Article ID 986134 11 pages 2012
[2] A Pantelopoulos and N G Bourbakis ldquoA survey on wearablesensor-based systems for health monitoring and prognosisrdquoIEEE Transactions on Systems Man and Cybernetics C vol 40no 1 pp 1ndash12 2010
[3] J Yang J Lee and J Choi ldquoActivity recognition based on RFIDobject usage for smart mobile devicesrdquo Journal of ComputerScience and Technology vol 26 no 2 pp 239ndash246 2011
[4] R Poppe ldquoA survey on vision-based human action recognitionrdquoImage and Vision Computing vol 42 no 6 pp 790ndash808 2012
[5] D Weinland R Ronfard and E Boyer ldquoA survey of vision-based methods for action representation segmentation andrecognitionrdquo Computer Vision and Image Understanding vol115 no 2 pp 224ndash241 2011
[6] L Chen J Hoey and C D Nugent ldquoSensor-based activityrecognitionrdquo Transactions on Systems Man and Cyberneticsvol 42 pp 790ndash808 2012
[7] S Bilal R Akmeliawati A A Shafie and M J E SalamildquoHidden Markov model for human to computer interaction astudy on humanhand gesture recognitionrdquoArtificial IntelligenceReview vol 8 pp 1ndash22 2011
[8] L Bao and S S Intille ldquoActivity recognition from user-annotated acceleration datardquo Pervasive Computing vol 3001pp 1ndash17 2004
[9] C Sutton A McCallum and K Rohanimanesh ldquoDynamicconditional random fields factorized probabilistic models forlabeling and segmenting sequence datardquo Journal of MachineLearning Research vol 8 no 2 pp 693ndash723 2007
[10] O Brdiczka J L Crowley and P Reignier ldquoLearning situationmodels in a smart homerdquo IEEE Transactions on Systems Manand Cybernetics B vol 39 no 1 pp 56ndash63 2009
[11] F Omar M Sinn J Truszkowski P Poupart J Tung and ACaine ldquoComparative analysis of probabilistic models for activ-ity recognition with an instrumented walkerrdquo in Proceedings ofthe 26th Conference on Uncertainty in Artificial Intelligence (UAIrsquo10) pp 392ndash400 July 2010
[12] F J Ordonez P D Toledo and A Sanchis ldquoActivity recognitionusing hybrid generativediscriminative models on home envi-ronments using binary sensorsrdquo Sensors vol 13 no 1 pp 5460ndash5477 2013
[13] Z Zhang ldquoMicrosoft kinect sensor and its effectrdquo IEEE Multi-Media vol 19 no 2 pp 4ndash10 2012
[14] R Giraldo D M Giraldo and S A Meza ldquoKernel based handgesture recognition using kinect sensorrdquo in Proceedings of the17th Symposiumof Image Signal Processing andArtificial Visionpp 158ndash161 2012
[15] Z Ren J Meng J Yuan and Z Zhang ldquoRobust hand gesturerecognition with kinect sensorrdquo in Proceedings of the 19th ACMInternational Conference on Multimedia ACM Multimedia 2011(MM rsquo11) pp 759ndash760 December 2011
[16] A Bauer K Klasing and G Lidoris ldquoThe autonomous cityexplorer towards natural human-robot interactivity in urban
environmentsrdquo International Journal of Social Robotics vol 1no 3 pp 127ndash140 2009
[17] H-D Yang A-Y Park and S-W Lee ldquoGesture spotting andrecognition for human-robot interactionrdquo IEEETransactions onRobotics vol 23 no 2 pp 256ndash270 2007
[18] M T Hagan H B Demuth and M H Beale Neural NetworkDesign PWS Publishing Company 1996
[19] L R Rabiner ldquoTutorial on hiddenMarkov models and selectedapplications in speech recognitionrdquo Proceedings of the IEEE vol77 no 2 pp 257ndash286 1989
[20] S-Z Yu and H Kobayashi ldquoPractical implementation of anefficient forward-backward algorithm for an explicit-durationhiddenMarkov modelrdquo IEEE Transactions on Signal Processingvol 54 no 5 pp 1947ndash1951 2006
[21] W Liu andW Han ldquoImproved Viterbi algorithm in continuousspeech recognitionrdquo in International Symposium in InformationTechnology pp 542ndash545 October 2010
[22] OMourad ldquoHmms parameters estimation using hybrid baum-welch genetic algorithmrdquo in Proceedings of the InternationalConference on Computer Application and System Modeling pp207ndash209 2010
[23] KHirasawaMOhbayashiM Koga andMHarada ldquoForwardpropagation universal learning networkrdquo in Proceedings of theIEEE International Conference onNeural Networks pp 353ndash358June 1996
[24] VICON httpwwwviconcom[25] M Quigley K Conley and P Gerkey B ldquoRos an open-source
robot operating systemrdquo in Proceedings of the ICRA Workshopon Open Source Software 2009
[26] Neural httpwwwmathworkscomproductsneuralnet
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical Problems in Engineering
Hindawi Publishing Corporationhttpwwwhindawicom
Differential EquationsInternational Journal of
Volume 2014
Applied MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
OptimizationJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Operations ResearchAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Function Spaces
Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of Mathematics and Mathematical Sciences
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Algebra
Discrete Dynamics in Nature and Society
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Decision SciencesAdvances in
Discrete MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Stochastic AnalysisInternational Journal of
10 Mathematical Problems in Engineering
0 20 40 60 80 100 120 140 160 180
x
y
z
5000
minus500
3D angular velocity
Deg
ree (
∘ )
Time (s)
(a)
0 20 40 60 80 100 120 140 160 180
151
050
FNNs output(A) (B)
Time (s)
(b)
0 20 40 60 80 100 120 140 160
StopTurn right
Turn leftMove forward
Move backwardNongesture
Ground truthLHMMs
(C)(D)
LHMMs decision
Time (s)
(c)
0 20 40 60 80 100 120 140 160
StopTurn right
Turn leftMove forward
Move backwardNongesture
Time (s)
UHMMs decision
Ground truthUHMMs
(d)
Figure 11 Results of FNNs LHMMs and UHMMs (a) the raw 3D angular velocity (b) the output of FNNs (c) the LHMMs decision resultscompared with the ground truth (d) the UHMMs decision results compared with the ground truth
than its actual length after detecting the start and end pointsof the gesture After using LHMMs to recognize gestureserrors appeared as shown in the circled regions (C) and (D)of Figure 11(c) which indicate the length of segment affectsthe accuracy of gesture recognition While after UHMMswere used to update recognition results from LHMMs theoutput obtained accurate recognition results Similarly 3Dacceleration data was also used for testing recognition errorsalso appeared in LHMMs as four circled regions shown inFigure 12(c) However after updating in UHMMs accuraterecognition results were also obtained
We have calculated the average recognition rate and aver-age time of 30 sets of gestures which were composed by thefive predefined gestures Table 1 lists the average recognition
accuracy and the average recognition time of HMMs andMHMMs respectively From the table we can conclude thatthe recognition accuracy of MHMMs which use Bayesian fil-tering is better than thatwhich only usesHMMs and the real-time performance of theMHMMs-based gesture recognitionmeets the requirement of online gesture recognition
4 Conclusion
In this paper a novel MHMMs continuous gesture recog-nition algorithm is proposed for Human-Robot Interactionwhich uses raw motion data captured by a Kinect sen-sor Therefore it is different from traditional vision-basedmethods Firstly a Kinect sensor was used to obtain 3D
Mathematical Problems in Engineering 11
Table 1 Comparisons of average recognition accuracy and average recognition time between HMMs and MHMMs
Gesture tyep HMMs accuracy () MHMMs accuracy () HMMs recognition time (s) MHMMs recognition time (s)1 8683 9571 0564 06372 9124 9728 0629 07843 7843 9063 0468 06154 8592 9342 0693 08125 7397 8985 0503 0759
0 20 40 60 80 100 120 140 160 180
3D acceleration10
0
minus10
Time (s)
(ms2)
(a)
0 20 40 60 80 100 120 140 160
FNNs output210
minus1
Time (s)
(b)
0 20 40 60 80 100 120 140 160
StopTurn right
Turn leftMove forward
Move backwardNongesture
Ground truthLHMMs
LHMMs decision
Time (s)
(c)
0 20 40 60 80 100 120 140 160
StopTurn right
Turn leftMove forward
Move backwardNongesture
Times (s)
UHMMs decision
Ground truthUHMMs
(d)
Figure 12 Results of FNNs LHMMs and UHMMs (a) the raw 3D acceleration (b) the output of FNNs (c) the LHMMs decision resultscompared with the ground truth (d) the UHMMs decision results compared with the ground truth
acceleration and 3D angular velocity of an arm Secondlya FNNs and a threshold criterion were respectively usedfor gesture spotting and segment And then the segmentedgesture signalswere sent into LHMMs to recognize individualgestures After that the identified gesture sequence was
fed into UHMMs where a Bayesian filter with sequentialconstraints was used to correct errors generated in LHMMsThe experimental results verify the proposed algorithm notonly effectively improves the accuracy of continuous gesturesbut also has a good real-time performance
12 Mathematical Problems in Engineering
Acknowledgment
This work is supported by the National Natural ScienceFoundation of China (60277605)
References
[1] X Wang M Xia and H Cai ldquoHidden markov models baseddynamic hand gesture recognitionrdquo Mathematical Problems inEngineering vol 2012 Article ID 986134 11 pages 2012
[2] A Pantelopoulos and N G Bourbakis ldquoA survey on wearablesensor-based systems for health monitoring and prognosisrdquoIEEE Transactions on Systems Man and Cybernetics C vol 40no 1 pp 1ndash12 2010
[3] J Yang J Lee and J Choi ldquoActivity recognition based on RFIDobject usage for smart mobile devicesrdquo Journal of ComputerScience and Technology vol 26 no 2 pp 239ndash246 2011
[4] R Poppe ldquoA survey on vision-based human action recognitionrdquoImage and Vision Computing vol 42 no 6 pp 790ndash808 2012
[5] D Weinland R Ronfard and E Boyer ldquoA survey of vision-based methods for action representation segmentation andrecognitionrdquo Computer Vision and Image Understanding vol115 no 2 pp 224ndash241 2011
[6] L Chen J Hoey and C D Nugent ldquoSensor-based activityrecognitionrdquo Transactions on Systems Man and Cyberneticsvol 42 pp 790ndash808 2012
[7] S Bilal R Akmeliawati A A Shafie and M J E SalamildquoHidden Markov model for human to computer interaction astudy on humanhand gesture recognitionrdquoArtificial IntelligenceReview vol 8 pp 1ndash22 2011
[8] L Bao and S S Intille ldquoActivity recognition from user-annotated acceleration datardquo Pervasive Computing vol 3001pp 1ndash17 2004
[9] C Sutton A McCallum and K Rohanimanesh ldquoDynamicconditional random fields factorized probabilistic models forlabeling and segmenting sequence datardquo Journal of MachineLearning Research vol 8 no 2 pp 693ndash723 2007
[10] O Brdiczka J L Crowley and P Reignier ldquoLearning situationmodels in a smart homerdquo IEEE Transactions on Systems Manand Cybernetics B vol 39 no 1 pp 56ndash63 2009
[11] F Omar M Sinn J Truszkowski P Poupart J Tung and ACaine ldquoComparative analysis of probabilistic models for activ-ity recognition with an instrumented walkerrdquo in Proceedings ofthe 26th Conference on Uncertainty in Artificial Intelligence (UAIrsquo10) pp 392ndash400 July 2010
[12] F J Ordonez P D Toledo and A Sanchis ldquoActivity recognitionusing hybrid generativediscriminative models on home envi-ronments using binary sensorsrdquo Sensors vol 13 no 1 pp 5460ndash5477 2013
[13] Z Zhang ldquoMicrosoft kinect sensor and its effectrdquo IEEE Multi-Media vol 19 no 2 pp 4ndash10 2012
[14] R Giraldo D M Giraldo and S A Meza ldquoKernel based handgesture recognition using kinect sensorrdquo in Proceedings of the17th Symposiumof Image Signal Processing andArtificial Visionpp 158ndash161 2012
[15] Z Ren J Meng J Yuan and Z Zhang ldquoRobust hand gesturerecognition with kinect sensorrdquo in Proceedings of the 19th ACMInternational Conference on Multimedia ACM Multimedia 2011(MM rsquo11) pp 759ndash760 December 2011
[16] A Bauer K Klasing and G Lidoris ldquoThe autonomous cityexplorer towards natural human-robot interactivity in urban
environmentsrdquo International Journal of Social Robotics vol 1no 3 pp 127ndash140 2009
[17] H-D Yang A-Y Park and S-W Lee ldquoGesture spotting andrecognition for human-robot interactionrdquo IEEETransactions onRobotics vol 23 no 2 pp 256ndash270 2007
[18] M T Hagan H B Demuth and M H Beale Neural NetworkDesign PWS Publishing Company 1996
[19] L R Rabiner ldquoTutorial on hiddenMarkov models and selectedapplications in speech recognitionrdquo Proceedings of the IEEE vol77 no 2 pp 257ndash286 1989
[20] S-Z Yu and H Kobayashi ldquoPractical implementation of anefficient forward-backward algorithm for an explicit-durationhiddenMarkov modelrdquo IEEE Transactions on Signal Processingvol 54 no 5 pp 1947ndash1951 2006
[21] W Liu andW Han ldquoImproved Viterbi algorithm in continuousspeech recognitionrdquo in International Symposium in InformationTechnology pp 542ndash545 October 2010
[22] OMourad ldquoHmms parameters estimation using hybrid baum-welch genetic algorithmrdquo in Proceedings of the InternationalConference on Computer Application and System Modeling pp207ndash209 2010
[23] KHirasawaMOhbayashiM Koga andMHarada ldquoForwardpropagation universal learning networkrdquo in Proceedings of theIEEE International Conference onNeural Networks pp 353ndash358June 1996
[24] VICON httpwwwviconcom[25] M Quigley K Conley and P Gerkey B ldquoRos an open-source
robot operating systemrdquo in Proceedings of the ICRA Workshopon Open Source Software 2009
[26] Neural httpwwwmathworkscomproductsneuralnet
Submit your manuscripts athttpwwwhindawicom
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical Problems in Engineering
Hindawi Publishing Corporationhttpwwwhindawicom
Differential EquationsInternational Journal of
Volume 2014
Applied MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
OptimizationJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Operations ResearchAdvances in
Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Function Spaces
Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
International Journal of Mathematics and Mathematical Sciences
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Algebra
Discrete Dynamics in Nature and Society
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Decision SciencesAdvances in
Discrete MathematicsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Stochastic AnalysisInternational Journal of
Mathematical Problems in Engineering 11
Table 1 Comparisons of average recognition accuracy and average recognition time between HMMs and MHMMs
Gesture tyep HMMs accuracy () MHMMs accuracy () HMMs recognition time (s) MHMMs recognition time (s)1 8683 9571 0564 06372 9124 9728 0629 07843 7843 9063 0468 06154 8592 9342 0693 08125 7397 8985 0503 0759
0 20 40 60 80 100 120 140 160 180
3D acceleration10
0
minus10
Time (s)
(ms2)
(a)
0 20 40 60 80 100 120 140 160
FNNs output210
minus1
Time (s)
(b)
0 20 40 60 80 100 120 140 160
StopTurn right
Turn leftMove forward
Move backwardNongesture
Ground truthLHMMs
LHMMs decision
Time (s)
(c)
0 20 40 60 80 100 120 140 160
StopTurn right
Turn leftMove forward
Move backwardNongesture
Times (s)
UHMMs decision
Ground truthUHMMs
(d)
Figure 12 Results of FNNs LHMMs and UHMMs (a) the raw 3D acceleration (b) the output of FNNs (c) the LHMMs decision resultscompared with the ground truth (d) the UHMMs decision results compared with the ground truth
acceleration and 3D angular velocity of an arm Secondlya FNNs and a threshold criterion were respectively usedfor gesture spotting and segment And then the segmentedgesture signalswere sent into LHMMs to recognize individualgestures After that the identified gesture sequence was
fed into UHMMs where a Bayesian filter with sequentialconstraints was used to correct errors generated in LHMMsThe experimental results verify the proposed algorithm notonly effectively improves the accuracy of continuous gesturesbut also has a good real-time performance
12 Mathematical Problems in Engineering
Acknowledgment
This work is supported by the National Natural ScienceFoundation of China (60277605)
References
[1] X Wang M Xia and H Cai ldquoHidden markov models baseddynamic hand gesture recognitionrdquo Mathematical Problems inEngineering vol 2012 Article ID 986134 11 pages 2012
[2] A Pantelopoulos and N G Bourbakis ldquoA survey on wearablesensor-based systems for health monitoring and prognosisrdquoIEEE Transactions on Systems Man and Cybernetics C vol 40no 1 pp 1ndash12 2010
[3] J Yang J Lee and J Choi ldquoActivity recognition based on RFIDobject usage for smart mobile devicesrdquo Journal of ComputerScience and Technology vol 26 no 2 pp 239ndash246 2011
[4] R Poppe ldquoA survey on vision-based human action recognitionrdquoImage and Vision Computing vol 42 no 6 pp 790ndash808 2012
[5] D Weinland R Ronfard and E Boyer ldquoA survey of vision-based methods for action representation segmentation andrecognitionrdquo Computer Vision and Image Understanding vol115 no 2 pp 224ndash241 2011
[6] L Chen J Hoey and C D Nugent ldquoSensor-based activityrecognitionrdquo Transactions on Systems Man and Cyberneticsvol 42 pp 790ndash808 2012
[7] S Bilal R Akmeliawati A A Shafie and M J E SalamildquoHidden Markov model for human to computer interaction astudy on humanhand gesture recognitionrdquoArtificial IntelligenceReview vol 8 pp 1ndash22 2011
[8] L Bao and S S Intille ldquoActivity recognition from user-annotated acceleration datardquo Pervasive Computing vol 3001pp 1ndash17 2004
[9] C Sutton A McCallum and K Rohanimanesh ldquoDynamicconditional random fields factorized probabilistic models forlabeling and segmenting sequence datardquo Journal of MachineLearning Research vol 8 no 2 pp 693ndash723 2007
[10] O Brdiczka J L Crowley and P Reignier ldquoLearning situationmodels in a smart homerdquo IEEE Transactions on Systems Manand Cybernetics B vol 39 no 1 pp 56ndash63 2009
[11] F Omar M Sinn J Truszkowski P Poupart J Tung and ACaine ldquoComparative analysis of probabilistic models for activ-ity recognition with an instrumented walkerrdquo in Proceedings ofthe 26th Conference on Uncertainty in Artificial Intelligence (UAIrsquo10) pp 392ndash400 July 2010
[12] F J Ordonez P D Toledo and A Sanchis ldquoActivity recognitionusing hybrid generativediscriminative models on home envi-ronments using binary sensorsrdquo Sensors vol 13 no 1 pp 5460ndash5477 2013
[13] Z Zhang ldquoMicrosoft kinect sensor and its effectrdquo IEEE Multi-Media vol 19 no 2 pp 4ndash10 2012
[14] R Giraldo D M Giraldo and S A Meza ldquoKernel based handgesture recognition using kinect sensorrdquo in Proceedings of the17th Symposiumof Image Signal Processing andArtificial Visionpp 158ndash161 2012
[15] Z Ren J Meng J Yuan and Z Zhang ldquoRobust hand gesturerecognition with kinect sensorrdquo in Proceedings of the 19th ACMInternational Conference on Multimedia ACM Multimedia 2011(MM rsquo11) pp 759ndash760 December 2011
[16] A Bauer K Klasing and G Lidoris ldquoThe autonomous cityexplorer towards natural human-robot interactivity in urban
environmentsrdquo International Journal of Social Robotics vol 1no 3 pp 127ndash140 2009
[17] H-D Yang A-Y Park and S-W Lee ldquoGesture spotting andrecognition for human-robot interactionrdquo IEEETransactions onRobotics vol 23 no 2 pp 256ndash270 2007
[18] M T Hagan H B Demuth and M H Beale Neural NetworkDesign PWS Publishing Company 1996
[19] L R Rabiner ldquoTutorial on hiddenMarkov models and selectedapplications in speech recognitionrdquo Proceedings of the IEEE vol77 no 2 pp 257ndash286 1989
[20] S-Z Yu and H Kobayashi ldquoPractical implementation of anefficient forward-backward algorithm for an explicit-durationhiddenMarkov modelrdquo IEEE Transactions on Signal Processingvol 54 no 5 pp 1947ndash1951 2006
[21] W Liu andW Han ldquoImproved Viterbi algorithm in continuousspeech recognitionrdquo in International Symposium in InformationTechnology pp 542ndash545 October 2010
[22] OMourad ldquoHmms parameters estimation using hybrid baum-welch genetic algorithmrdquo in Proceedings of the InternationalConference on Computer Application and System Modeling pp207ndash209 2010
[23] KHirasawaMOhbayashiM Koga andMHarada ldquoForwardpropagation universal learning networkrdquo in Proceedings of theIEEE International Conference onNeural Networks pp 353ndash358June 1996
[24] VICON httpwwwviconcom[25] M Quigley K Conley and P Gerkey B ldquoRos an open-source
robot operating systemrdquo in Proceedings of the ICRA Workshopon Open Source Software 2009
[26] Neural httpwwwmathworkscomproductsneuralnet
Acknowledgment
This work is supported by the National Natural Science Foundation of China (60277605).
References
[1] X. Wang, M. Xia, and H. Cai, “Hidden Markov models based dynamic hand gesture recognition,” Mathematical Problems in Engineering, vol. 2012, Article ID 986134, 11 pages, 2012.
[2] A. Pantelopoulos and N. G. Bourbakis, “A survey on wearable sensor-based systems for health monitoring and prognosis,” IEEE Transactions on Systems, Man and Cybernetics C, vol. 40, no. 1, pp. 1–12, 2010.
[3] J. Yang, J. Lee, and J. Choi, “Activity recognition based on RFID object usage for smart mobile devices,” Journal of Computer Science and Technology, vol. 26, no. 2, pp. 239–246, 2011.
[4] R. Poppe, “A survey on vision-based human action recognition,” Image and Vision Computing, vol. 42, no. 6, pp. 790–808, 2012.
[5] D. Weinland, R. Ronfard, and E. Boyer, “A survey of vision-based methods for action representation, segmentation and recognition,” Computer Vision and Image Understanding, vol. 115, no. 2, pp. 224–241, 2011.
[6] L. Chen, J. Hoey, and C. D. Nugent, “Sensor-based activity recognition,” Transactions on Systems, Man and Cybernetics, vol. 42, pp. 790–808, 2012.
[7] S. Bilal, R. Akmeliawati, A. A. Shafie, and M. J. E. Salami, “Hidden Markov model for human to computer interaction: a study on human hand gesture recognition,” Artificial Intelligence Review, vol. 8, pp. 1–22, 2011.
[8] L. Bao and S. S. Intille, “Activity recognition from user-annotated acceleration data,” Pervasive Computing, vol. 3001, pp. 1–17, 2004.
[9] C. Sutton, A. McCallum, and K. Rohanimanesh, “Dynamic conditional random fields: factorized probabilistic models for labeling and segmenting sequence data,” Journal of Machine Learning Research, vol. 8, no. 2, pp. 693–723, 2007.
[10] O. Brdiczka, J. L. Crowley, and P. Reignier, “Learning situation models in a smart home,” IEEE Transactions on Systems, Man and Cybernetics B, vol. 39, no. 1, pp. 56–63, 2009.
[11] F. Omar, M. Sinn, J. Truszkowski, P. Poupart, J. Tung, and A. Caine, “Comparative analysis of probabilistic models for activity recognition with an instrumented walker,” in Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI ’10), pp. 392–400, July 2010.
[12] F. J. Ordonez, P. D. Toledo, and A. Sanchis, “Activity recognition using hybrid generative/discriminative models on home environments using binary sensors,” Sensors, vol. 13, no. 1, pp. 5460–5477, 2013.
[13] Z. Zhang, “Microsoft kinect sensor and its effect,” IEEE Multimedia, vol. 19, no. 2, pp. 4–10, 2012.
[14] R. Giraldo, D. M. Giraldo, and S. A. Meza, “Kernel based hand gesture recognition using kinect sensor,” in Proceedings of the 17th Symposium of Image, Signal Processing, and Artificial Vision, pp. 158–161, 2012.
[15] Z. Ren, J. Meng, J. Yuan, and Z. Zhang, “Robust hand gesture recognition with kinect sensor,” in Proceedings of the 19th ACM International Conference on Multimedia, ACM Multimedia 2011 (MM ’11), pp. 759–760, December 2011.
[16] A. Bauer, K. Klasing, and G. Lidoris, “The autonomous city explorer: towards natural human-robot interactivity in urban environments,” International Journal of Social Robotics, vol. 1, no. 3, pp. 127–140, 2009.
[17] H.-D. Yang, A.-Y. Park, and S.-W. Lee, “Gesture spotting and recognition for human-robot interaction,” IEEE Transactions on Robotics, vol. 23, no. 2, pp. 256–270, 2007.
[18] M. T. Hagan, H. B. Demuth, and M. H. Beale, Neural Network Design, PWS Publishing Company, 1996.
[19] L. R. Rabiner, “Tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[20] S.-Z. Yu and H. Kobayashi, “Practical implementation of an efficient forward-backward algorithm for an explicit-duration hidden Markov model,” IEEE Transactions on Signal Processing, vol. 54, no. 5, pp. 1947–1951, 2006.
[21] W. Liu and W. Han, “Improved Viterbi algorithm in continuous speech recognition,” in International Symposium in Information Technology, pp. 542–545, October 2010.
[22] O. Mourad, “HMMs parameters estimation using hybrid Baum-Welch genetic algorithm,” in Proceedings of the International Conference on Computer Application and System Modeling, pp. 207–209, 2010.
[23] K. Hirasawa, M. Ohbayashi, M. Koga, and M. Harada, “Forward propagation universal learning network,” in Proceedings of the IEEE International Conference on Neural Networks, pp. 353–358, June 1996.
[24] VICON, http://www.vicon.com
[25] M. Quigley, K. Conley, and B. P. Gerkey, “ROS: an open-source robot operating system,” in Proceedings of the ICRA Workshop on Open Source Software, 2009.
[26] Neural, http://www.mathworks.com/products/neuralnet