a multilayer hidden markov models-based method for human

Hindawi Publishing CorporationMathematical Problems in EngineeringVolume 2013 Article ID 384865 12 pageshttpdxdoiorg1011552013384865

Research ArticleA Multilayer Hidden Markov Models-Based Method forHuman-Robot Interaction

Chongben Tao and Guodong Liu

Key Laboratory of Light Industry Process Advanced Control School of Internet of Things EngineeringJiangnan University Wuxi 214122 China

Correspondence should be addressed to Chongben Tao tom1tao163com

Received 31 May 2013 Revised 6 August 2013 Accepted 8 August 2013

Academic Editor Vishal Bhatnaga

Copyright copy 2013 C Tao and G Liu This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

To achieve Human-Robot Interaction (HRI) by using gestures a continuous gesture recognition approach based on MultilayerHidden Markov Models (MHMMs) is proposed which consists of two parts One part is gesture spotting and segment modulethe other part is continuous gesture recognition module Firstly a Kinect sensor is used to capture 3D acceleration and 3D angularvelocity data of hand gestures And then a Feed-forward Neural Networks (FNNs) and a threshold criterion are used for gesturespotting and segment respectively Afterwards the segmented gesture signals are respectively preprocessed and vector symbolizedby a sliding window and a K-means clustering method Finally symbolized data are sent into Lower Hidden Markov Models(LHMMs) to identify individual gestures and then a Bayesian filter with sequential constraints among gestures in Upper HiddenMarkov Models (UHMMs) is used to correct recognition errors created in LHMMs Five predefined gestures are used to interactwith aKinectmobile robot in experimentsThe experimental results show that the proposedmethod not only has good effectivenessand accuracy but also has favorable real-time performance

1 Introduction

Human-Robot Interaction (HRI) is one of the hottestresearch directions in robot field and activity recognition isone of the most important markers of HRI As gesture recog-nition is a branch of activity recognition methods used inactivity recognition are also suitable for gesture recognitionGenerally depending on sensor types gesture recognitionis divided into two control modes motion sensor-based [1ndash3] and vision-based [4 5] Both of them have their owncharacteristics where gesture recognitions based on motionsensors are easy to capture the motion data of a body butmost of them need to be fixed on humanrsquos body which maymake users feel uncomfortable While vision-based gesturerecognition also has some disadvantages For instance illu-mination conditions and background scenery of an environ-ment may cause the difficulty of recognition Additionally itis difficult to obtain orientation information from a frame oftwo-dimensional (2D) image

According to different gesture recognition methods usedin HRI they can be divided into four categories whichinclude generative method discriminative method heuristic

analysis method and combinations of these methods [6]Generativemethoduses generativemodels for randomly gen-erating observed data typically given some hidden param-eters It specifies a joint probability distribution over obser-vation and label sequences [7] Generative method attemptsto build a complete description of the input or data spaceusually with a probabilistic model Hidden Markov Models(HMMs) is one of the most popular generative methodswhich is a probabilistic model with a specific structure Oncethe model is established it can be learned from data and usedto analyze the data Discriminative method analyzes featuresextracted from sensor data or segments without consideringsequential connections in the data Common discriminativemethods include Neural Networks (NNs) [8] ConditionalRandom Fields (CRFs) [9] and Support Vector Machines(SVM) [10] Heuristic analysis method is based on thedirect characteristic analysis and description of the data fromsensors [11] However each individual has different character-istics Thus this method is difficult to find a consistency wayfor observation Since generative method and discriminativemethod are both machine learning algorithms their param-eters can be training by using different types of individual

2 Mathematical Problems in Engineering

(a) (b)

Projector RGB sensor Infrared detector

Figure 1 (a) Xtion Pro Live sensor (b) Kinect sensor and (c) Internal structure of Kinect sensor

data However one drawback of generative approach is thatenough data must be available to learn the complete probabi-listic representations that are required Moreover both ofthese two types ofmethods have the problem of high-compu-tational complexity Currently researchers have tried to inte-grate the advantages of different methods to address complexactivity recognition problems [12]

In recent years a new type of RGB-Depth (RGB-D) sen-sor has appeared which is not only able to extract RGB fea-ture and depth information with position and direction butis also able to obtain three-dimensional (3D) acceleration and3D angular velocity information of the bodyrsquos motion AsusXtion Pro Live sensor and Microsoft Kinect sensor are twotypes of famous RGB-D sensors [13] as shown in Figures 1(a)and 1(b) respectively The internal structure of a Kinectsensor is shown in Figure 1(c) Since the advent of Kinectsensor in November 2010 researchers have used it to doextensive researches in the field of Human-Robot Interaction(HRI) Giraldo et al [14] proposed an approach based onmachine learning to identify predefined gestures based ondepth images obtained from a Kinect sensor Additionallythey also adopted a kernel-based learningmethod to discoverthe relationship among gestures and samples which can beused to correctly describe gestures Ren et al [15] detected theshape of a hand from color images and depth images obtainedfrom a Kinect sensor and then proposed a shape distancemetric method called Finger-Earth Moverrsquos Distance for dis-similarity measurement Finally they recognize predefinedgestures by template matching Bauer et al [16] proposed aHuman-Robot Interaction system Firstly a person makes apredefined gesture and then a robot try to understand theaction and respond accordingly in realtime

To overcome the shortcomings of large computation andpoor real-time performance a continuous gesture recogni-tion method is proposed for HRI The rest of the paperis organized as follows Section 2 describes a Feed-forward

Neural Networks (FNNs) for gesture spotting and the pro-posed Multilayer Hidden Markov Models (MHMMs) forcontinuous gesture recognition And then gives the pro-posed continuous hand gesture recognition procedure whichcontains hand spotting and gesture recognition based onFNNs and MHMMs The experimental results are shown inSection 3 Finally Section 4 shows the summary and then theacknowledgments

2 Mhmms-Based Continuous GestureRecognition

In this paper the proposed gesture recognition algorithmconsists of two parts the gesture spotting and segmentmodule and the continuous gesture recognition module Animportant problem in gesture recognition is how to distin-guish gestures and nongestures which is called gesture spot-ting problem namely how to detect a start point and anend point of a gesture [17] Firstly a Feed-Forward NeuralNetworks (FNNs) is used to detect gestures in an actionsignal and then a threshold criterion is used to segment ges-tures from the signal Finally MHMMs is used to recognizesegmented continuous gestures

FNNs are firstly used to distinguish whether a particularaction is a gestureWhen the FNNsdetermine a certain actionis a gesture then the output is 1 otherwise the output is 0 Fea-tures can bewell combined together in FNNs for classificationof gestures and nongestures Additionally through trainingthe weights and deviations of FNNs can be optimized Twocounters are used to detect a start point and an end point ofa gesture in accordance to a preset threshold value When theend point of a gesture is detected the spotting and segmentmodule will trigger the recognition module As MHMMsis a probabilistic model which has a high computationalcomplexity FNNs-based segment module will be used as aswitch to control data flows thus computation time can be

Mathematical Problems in Engineering 3

saved and the real-time performance of the whole algorithmcan be approved Sequential constraints among gestures willbe modeled by MHMMs in recognition module Forwardpropagation program in the Bayesian filtering will be used toproduce a posterior gesture decision and update the recog-nition result In this paper the flow chart of MHMMs-basedcontinuous gesture recognition method is shown in Figure 2

21 Gesture Spotting and Segment In this paper a Feed-forward Neural Networks (FNNs) is designed to discovergesture data from signals [18] The structure of a three-layer FNNs module is shown in Figure 3 The number ofinput nodes is 119899 the input variable of the input layer is119884 = (119910

1 1199102 119910

119899)119879 which indicates an original input signal

of actions the output vector of hidden layer is 1198862119894=

(1198862

1 1198862

2 119886

119898)

119879 the output vector of output layer is 1198863119894=

(1198863

1 1198863

2 119886

119898)

119879 which means a gesture spotting result theweightmatrix from the input layer to the hidden layer is1198821

119894=

(1198821

11198821

2 119882

119898) the weight matrix from the hidden layer

to the output layer is1198822119894= (119882

11198822

2 119882

119898) and Sigmoid

function 119891(119909) = (1 + 119890minus119909)minus1 is used as an excitation functionA 3D acceleration 120572

119894= [120572119909 120572119910 120572119911]119879 is one kind of data

extracted by RGB-D sensor which is trained by FNNs Let 1198863119894

be the output of FNNs

1198863

119894= hard lim (1198912 (11988221198911 (1198821120583

119894+ 1198871) + 1198872) minus 05) (1)

where11988211198822 1198871 and 1198872 are parameters of FNNsBoth of the network functions 1198911 and 1198912 are log-sigmoid

functions function hard lim is a hard limiting functionTherefore parameters of FNNs can be trained by a back-propagationmethod In the output layer set1198823 = 1 and 1198873 =0 in order to produce a discrete output The output of FNNsis a binary number (1 or 0) which respectively represents agesture or a nongesture The first layer and the second layerconstitute a two-layer feed-forward network which can beobtained optimized parameters through training And thenweights and deviations are fixed in the output layer in orderto produce a discrete output

Moreover through training the neural networks weightsand deviations of the FNNs can be optimized Two digitalcounters are used for continuous recording of the outputof FNNs When both of them exceed a preset thresholdvalue the start point and the end point of a gesture can bedetected thereby erroneous classification of the FNNs canbe prevented When an end point of a gesture is inspectedthe spotting and segment module will trigger the recognitionmodule

22 Multilayer Hidden Markov Models In our daily lifepeople usually use some gestures to commandpets And thesegestures are corresponded to some reasonable modes whichreflect sequential restrictions among continuous gesturesTherefore these restrictions can be used to improve theaccuracy of gesture recognition According to this inspira-tion this paper proposed a continuous gesture recognitionalgorithm based on MHMMs which is a statistical model

derived from HMMs HMMs is a famous statistical modelfor continuous data recognition which has been widelyused in speech recognition handwriting recognition andpattern recognition [19] It is characterized by a set ofparameters 120582 = (AB120587) where A B and 120587 are a statetransition probability distribution a probability distributionof observation symbols in each state and an initial statedistribution respectively In this paper a forward-backwardalgorithm is used to estimate a series of likelihood values ofobservation P(O | 120582) when given a specific HMMs [20] Fora given observation sequence O in the test mode a Viterbialgorithm is used to find the best state sequence Q [21] AnExpectation Maximization (EM) method is used to train theparameters of HMMs [22]

The proposed MHMMs are a kind of generalization fora segmented model where each segment has subsegmentsMHMMs are divided into two layers where LHMMs areused to identify individual gestures in the continuous gesturesignal and a Bayesian filter with sequential constraints inUHMMs is used to correct recognition errors created inLHMMs Figure 4 shows the basic idea of MHMMs A timesequence is layered into several segments where 1198781

119894represents

the state sequence of UHMMs and 1198782119894represents the state

sequence of LHMMs 1198782119894is the sub-HMMs of state sequence

1198781

119894O2119894means the decision of LHMMs

221 Data Preprocessing Before MHMMs are used to train-ing and recognition gesture data segmented by FNNs apreprocessing step needs to be completed firstly whichincludes removing high-frequency noise and finding thestroke duration of a gesture Each raw data obtained bya Kinect sensor is composed of a 6-component vector Ua 3D acceleration [120572

119909 120572119910 120572119911]119879 and a 3D angular velocity

[120596119909 120596119910 120596119911]119879 where

U = [120572119909 120572119910 120572119911 120596119909 120596119910 120596119911]

119879

When the robot computer receives a set of data as theoriginal data contains high-frequency noise whichmay affectgesture recognition a low pass filter with a cutoff frequencyof 5Hz is used to remove the noise and meanwhile smooththe data and then a sliding window with the length of 100msis used to calculate the average to remove theDC componentsin the 3D acceleration in time domain and generate a vectorw = [119889

119909 119889119910 119889119911]119879 Additionally a Fast Fourier Transform

(FFT) is applied to the deviation vector w in frequencydomain and the maximum frequency among 119889

119909 119889119910 and 119889

119911

is used as the fundamental frequency of a gesture Finally thedata pointV composing of a 3D angular velocity vector a 3Dacceleration vector and a 3D deviation vector of accelerationis fed into MHMMs for gesture recognition where

V = [120572119909 120572119910 120572119911 120596119909 120596119910 120596119911 119889119909 119889119910 119889119911]

119879

Then a K-means clustering method is used to train cen-troids in order to quantize vectors into observable symbolsThe process of symbolization converts feature vectors into afinite number of symbols which can be used in the discrete

Raw data ofmotion

Counter gtthreshold

Start point

Increase gesturetime counter

Is this agesture

Gesturesegment

Increasenonsense gesture

time counter

End point

Spotting and segment module

Recognition moduleSliding

windows

Data preprocessing

K-means clustering

Likelihoodestimation based on

each HMM

Maximum likelihoodestimation

Bayesianfilter with constraints

Save the decision

Has the slidingwindow arrived at the

end of the dataRecognition

result

Figure 2 Flow chart of MHMMs-based continuous gesture recognition

n times 1 W1 W2

3 times n

1 times 1

1 times 11 times 13 times 1

1 times 13 times 1

3 times 11 times 3

1 times 1

+ + +n1 n2 n3f1

1 times 1

Input First layer Second layer Third layer

a1 = f1(W1y + b1) a2 = f2(W2f1(W1y + b1) + b2) a2 minus 05)a3 = hardlim(

Raw data

minus05

Figure 3 The structure of a three-layer FNNs

S11S12

S21 S22 S23 S24

O21 O2

S21 S22 S23

O21 O2

2 O23O2

Figure 4 The architecture of the proposed MHMMs

HMMs Namely the K-means clustering method is appliedon the vector V to get the partition value for each vector andalso a set of centroids for clustering the data into observablesymbols in the recognition phase The K-means clusteringmethod clusters 119899 objects into 119896 partitions where 119896 lt 119899which is based on the properties of objects

As shown in Figure 5 a sliding window of 1 second (150data points) is moving with a speed of 30 data points eachstep which is simultaneously estimating likelihood of eachset of HMMs parameters Each sliding window generates alikelihood value Therefore for a certain gesture the modelwhich maximizes the likelihood over other HMMs to be therecognized type is chosen as the output decision of the slidingwindow

222 LHMMs-Based Individual Gestures Recognition

(1) Training Phase The training phase is divided into foursteps which include finding the stroke duration of a gesturequantifying the vectors into observable symbols settingup initial HMMs parameters and iterating for ExpectationMaximization

Step 1 (find the stroke duration of a gesture) FFT is used tofind the stroke duration of the gesture for HMMs

Step 2 (quantify the vectors into observable symbols) Afterpreprocessing the raw data these data are used for theobservation sequence O = O

1O2 O119879and the parameters

of HMMs 120582 = (AB120587) respectively Given parameters of themodel conditional probability of the observation sequencecan be calculated Namely the problem is given a specificHMMs to assess the probability of an observation sequenceP(O | 120582) a forward-backward algorithm is used to realizethe task as follows [20] Define the forward variables

120572119905(119894) = P (O

1O2 O119905 119902119905= 119878119894| 120582) (4)

that is given a model the output observable sequence isO1O2 O119879to the time 119905 and the probability of the state is

119878119894at time 119905 The forward variables can be recursive calculated

by the following formula

120572119905+1(119895) = [

119873

119894=1

120572119905(119894) 119886119894119895] 119887119895(O119905+1) (5)

where 1 le 119905 le 119879 minus 1 1 le 119895 le 119873

minus3

minus5

100 200 300 400 500 600 700 800 900

Sliding windows

Segmented gesture signal by FNNs

axayaz

middot middot middot

Figure 5 Sliding windows for preprocessing

Define the backward variables

120573119905(119894) = P (O

119905+1O119905+1 O119879| 119902119905= 119878119894120582) (6)

That is given model parameters at time 119905 the state is 119878119894

From the moment 119905 + 1 to the end of the sequence the prob-ability of the output observation sequence is O

1O2 O119879

Similarly a recursive method can be used to calculate 120573119905(119894)

as follows

120573119905(119894) =

119873

119895=1

119886119894119895119887119895(O119905+1) 120573119905+1(119895) (7)

where 119905 = 119879 minus 1 119879 minus 2 1 1 le 119895 le 119873Then a K-means clustering method is applied to a 9D

vector V to get the partition value for each vector and alsoa set of centroids for clustering the data into observablesymbols in the recognition phase

Step 3 (set up initial HMMs parameter) Set up state numbersin themodelThe initial value of different observable symbolsO = O

1O2 O119879and 120582 = (AB120587) in each state need to be

iterative which should meet the randomness constraints ofHMMs parameters Then a Viterbi algorithm is used to findthe single best state sequenceQ = 119902

11199022 119902119879[21]

Step 4 (expectation maximization iteration) A Baum-Welchalgorithm is used to calculate an auxiliary function andmaximize the likelihood value of 120582 [22] 119873 iterations arepreceded until the likelihood approaches to a steady valueExpectation value Q(120582120582) and the maximized likelihood 120582are calculated as follows

Q (120582120582) = sumQP (Q | O120582) log [P (OQ | 120582)]

max120582

[Q (120582120582)] 997904rArr P (O | 120582) gt P (O | 120582) (8)

(2) Recognition Phase During the recognition phase aftertraining a group of centroids for K-means clustering several

Move forward

StopMove backward

Turn rightTurn left

q0 q1 qt qt+1

O0 O1 Ot Ot+1

Figure 6 (a) transitions among UHMMs consider sequential constraints of context information (b) update gesture recognition result byUHMMs

groups of HMMs are formed Then we calculate the likeli-hood of each group of HMMs parameters Given observablesequence O the Viterbi algorithm is used to find the singlebest state sequence Q and the HMMs with the greatestlikelihood can be identified as the most likely gesture type

223 UHMMs-Based Continuous Gesture Update In orderto improve accuracy in decision making we consider thesequential conditions of ldquoContextrdquo in UHMMs where aBayesian filter is used to update results from LHMMsThere-fore five gestures are predefined which includes ldquoMove for-wardrdquo ldquoMove backwardrdquo ldquoTurn leftrdquo ldquoTurn rightrdquo and ldquoStoprdquoMeanwhile the ldquoContextrdquo is used as sequential constraintsamong different types of gestures For example the event thatsomeone continuously sends a same command to a robot hasa small probability someone firstly sends a ldquoMove forwardrdquocommand and before he wants to send a ldquoMove backwardrdquocommand he should send a ldquoStoprdquo command Figure 6(a)shows a transition among UHMMs which is a discretefirst-order HMMs with five states and observation symbolsrespectively UHMMs can be described as a sequence ofgestures and at any time as being in one of a set of119873 (119873 = 5)distinct states 119878

1 1198782 1198785 The system undergoes a change

of state according to a set of probabilities associated with thestate Each arch represents a probability of transition betweentwo adjacent states The moment associated with change ofstates is indicated as 119896 = 1 2 119873 and the 119896th actual stateis 119902119896 The probability 119886

119894119895described below relates the current

state with the previous state

119886119894119895= P [119902

119896= 119878119895| 119902119896minus1= 119878119894] (9)

The initial state distribution represents the probabilitydistribution of the first order which is defined as

120587 = P [1199021= 119878119894 (119894 = 1 2 119873)] (10)

Another element of LHMMs is the state probabilitydistribution of observation symbol 119878

119895

119887119895(119896) = P [O

119896| 119902119905= 119878119895] (11)

119887119895means the probability which is identified as different

observation symbols whereO119896represents decisions made by

UHMMsFigure 6(b) shows a sequence inUHMMs where the state

119902119905represents a 119905th gesture and O

119905is a majority decision

result given by LHMMs A Forward propagationmethod [23]is used to update the joint probability observation gathereduntil time 119905 In this paper the model is defined as

P(O119905+1| 119902119905+1= 119878119895) represents 119887

119894(O119905+1) where the

state at time 119905 + 1 is 119878119895and the observation isO

119905+1

P(119902119905+1= 119878119895| 119902119905= 119878119894)means the transition probability

119886119894119895in matrix A

The forward variable 120572119905(119894) is defined as the probability of

the observation sequence O1O2 O119905 and state 119878

119894at time 119905

given the model 120582

120572119905(119894) = P (O

1O2 O119905 119902119905= 119878119894| 120582) (12)

where 1 le 119894 le 119873According to the network structure shown in Figure 6(b)

we can conclude

120572119905+1(119895) = P (O

119905+1| 119902119905+1= 119878119895)

timessum

119878119894

P (119902119905+1= 119878119895| 119902119905= 119878119894) 120572119905(119894)

with the initial condition of uniform distribution

1205720(119894) = P (119902

0= 119878119894) = 120587119894 (14)

For the UHMMs the training parameters 120582(AB120587) areobtained by observing interaction between experimenter androbot so these parameters vary from person to person Thetransition matrix A is estimated from the statistical results ofactually observed gestures Probability distribution of obser-vation symbol B is an accuracy matrix of individual gesturesfrom LHMMs The forward variable is updated after havingobtained the current observation value which represents theposterior probability of a current state in the case of givencontext constraints of UHMMsTherefore the updated resultis the maximum posteriori probability of the state

VICON motion capture system

Kinectsensor

MicrocomputerFitPC2

Pioneer 3DXmobile robot

Figure 7 Experimental environment and a Kinect mobile robot

3 Experiments

31 Introduction of Hardware Platform An experimentalenvironment where a VICON motion capture system [24]and a Kinect mobile robot platform are installed is shownin Figures 7(a) and 7(b) respectively The mobile robotplatform mainly includes a Pioneer 3DX mobile robot amicrocomputer called FitPC2 and a Kinect sensor whichis developed from a RGB camera and a depth sensor Theeffective sensing range of the Kinect sensor is from 04metersto 4 meters vertical viewing angle range is plusmn430 horizontalrange isplusmn570 and frame rate is 30 fpsMicrocomputer FitPC2is a fanless minicomputer which is able to install Windowsor Linux operating systems and equipped withWiFi Besidesthe main components mentioned above an external batteryis used to supply power for both of FitPC2 and Kinect sensor

Robot Operating System (ROS) is installed in the Linuxoperating system of FitPC2microcomputer which is an opensourcemetaoperating system and provides services like a realoperating system including hardware abstraction low-leveldevice control implementation of commonly-used func-tionality message-passing between processes and packagemanagement [25] In this paper all processes are developed inaccordance to the ROS framework which includes a driver ofKinect sensor a motion data receiver program a motion dataprocessing program and a display program

32 Motion Data Evaluation Based on VICON In order toassess accuracy of the motion data captured by a Kinectsensor aVICONmotion capture systemwas used to evaluateAs angular velocity data is similar to acceleration datawe select angular velocity for evaluation Firstly severalmarkers of the VICON system were fixed on the wrist ofan experimenter and then the experimenter made a varietyof predefined gestures in front of the Kinect sensor Finallywe sent pitch roll and yaw data respectively recorded from

the VICON system and the Kinect sensor into a PC whichwas used to plot these data on the same graph as shownin Figure 8 The data from VICON system was plotted bysolid lines while the data from the Kinect sensor was plottedby dotted lines From these Figures we find that the Kinectsensor has accurately measured motions in most of the timeexcept that a period of time did not match as shown inFigure 8(a) which was due to the arm swings out of thescope of the Kinect sensor And the mismatch range betweenthese two types of data was always less than 10 degrees Bycomparison with the VICON system we have confirmedthat motion data captured from the Kinect sensor meets theaccuracy requirement of gesture recognition

33 Continuous Gesture Recognition Test In order to test theperformance of the proposed algorithm we predefine fivegestures for experiments They are ldquoMove forwardrdquo ldquoMovebackwardrdquo ldquoStoprdquo ldquoTurn leftrdquo and ldquoTurn rightrdquo where thegesture ldquoWaving front and backrdquo means ldquoMove forwardrdquo thegesture ldquoWaving left and rightrdquo means ldquoMove backwardrdquo thegesture ldquoWaving up and downrdquo means ldquoStoprdquo the gestureldquoSwing counterclockwiserdquo means ldquoTurn leftrdquo and the gestureldquoSwing clockwiserdquo means ldquoTurn rightrdquo as shown in Figure 9

We respectively recorded 10 sets of data from threeexperimenters for training and testing and these experimentswere carried out in accordance to the following three steps

Step 1 In the scope of a Kinect sensor repeat gesture Type 1for 15 times and take a 5 seconds break Continue performingthe rest of the types following in the same way until Type 5 isdone and then save the data

Step 2 In the scope of a Kinect sensor perform a sequence of20 gestures with a break of about 2 seconds between gesturesand save the data

0 20 40 60 80 100 120

minus10

Rall data evaluation

KinectVICON

Time (s)

8070605040302010

0minus10

minus20

minus30

minus40

Pitch data evaluation

KinectVICON

0 20 40 60 80 100 120Time (s)

Yaw data evaluation

KinectVICON

40302010

minus10

minus20

minus30

minus40

minus500 20 40 60 80 100 120

Time (s)

Figure 8 Comparisons of Roll (a) Pitch (b) and Yaw (c) data between a VICON system and a Kinect sensor

Waving front and back Waving left and right Waving up and down

Swing counterclockwise Swing clockwise

Figure 9 Five predefined gestures for Human-Robot Interaction

Step 3 Process the training data and the testing data Firstlytraining the FNNs to distinguish gestures and nongesturessecondly use each block of training data to train the LHMMs

To trade off the computational complexity with efficiencyand accuracy set up the number of states 20 in the LHMMMeanwhile set up the number of distinct observation symbols20 and then use the trained LHMMs to recognize individualgestures in the test data The output of each test is a sequenceof recognized gestures Finally a Bayesian filtering withsequential constraints in UHMMs is used to generate a mostlikely underlying gesture sequence

331 Performance Evaluation of FNNs A MATLAB neuralnetwork toolbox [26] is used for training in the first andsecond layer of FNNs The initial values of weights anddeviations are randomly selected Different initial values havedifferent performances If the performance does not reach thetarget then the training phase needs to restart a new neuralnetwork until it reaches a sufficient accuracy For instancewithin 180 iterations two different initial values got twodifferent performances as shown in Figure 10 which gavetwo results of both good and bad neural network training

minus05

minus1

minus150 50 100 150 200 250 300 350 400 450 500

Time (s)

Time (s) Time (s)

minus05

minus1

minus150 50 100 150 200 250 300 350 400 450 500

minus050 50 100 150 200 250 300 350 400 450 500

FNNsOutput

minus050 50 100 150 200 250 300 350 400 450 500

0 50 100 150 200 250 300

Performance is 00919598 goal is1e minus 005100

10minus1

10minus2

10minus3

10minus4

10minus5

10minus6

300 epochs0 2 4 6 8 10 12

10minus1

10minus2

10minus3

10minus4

10minus5

10minus6

13 epochs

Performance is 271473e minus 006 goal is 1e minus 005

(a1) (b1)

(a2) (b2)

(a3) (b3)

Figure 10 Performance comparison of gesture spotting between two FNNs (a1) the performance goal is not met within 180 iterations (a2)the output of FNNs accuracy = 7569 (a3) the error of FNNs (b1) the performance goal is met within 26 iterations (b2) the output ofFNNs accuracy = 9686 (b3) the error of FNNs

After 180 iterations the FNNs did not reach the target asshown in Figure 10(a1) These parts circled in Figure 10(a3)showed continuous error which led to detection errors inFigure 10(a2) Thus it can be seen that only if the perfor-mance curve reaches the standard the FNNs can achievesufficient accuracy As the performance curve cannot meetits target in Figure 10(a1) the training program needs to berestarted In Figure 10(b1) we find the FNNs achieved therequirement after 26 iterations Although there were somediscrete errors in Figure 10(b3) they did not generate anydetection error in Figure 10(b2) Therefore we can concludethat several discrete errors on the edge of gesture signal donot affect detection results but consecutive errors can causedetection errors

332 Performance Evaluation of MHMMs We respectivelyrecorded 10 sets of data from three experimenters for trainingand testing again and every set of sequence data has 20 ges-turesThe test includes two steps We firstly performed train-ing LHMMs to recognize individual gestures in a sequenceSecondly the decision obtained from training was used ina Bayesian filter with sequential restrictions to generate amost likely potential command sequence as the final resultThe test results of 3D angular velocity and 3D accelerationare shown in Figures 11 and 12 respectively In Figure 11(a)the signal was a 3D angular velocity which included twentygestures captured by a Kinect sensorWhile the circled region(A) and (B) in Figure 11(b) indicated two errors generated byFNNswhich caused the size of the segmentation to be shorter

0 20 40 60 80 100 120 140 160 180

minus500

3D angular velocity

Time (s)

0 20 40 60 80 100 120 140 160 180

FNNs output(A) (B)

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Turn leftMove forward

Move backwardNongesture

Ground truthLHMMs

(C)(D)

LHMMs decision

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Time (s)

UHMMs decision

Ground truthUHMMs

Figure 11 Results of FNNs LHMMs and UHMMs (a) the raw 3D angular velocity (b) the output of FNNs (c) the LHMMs decision resultscompared with the ground truth (d) the UHMMs decision results compared with the ground truth

than its actual length after detecting the start and end pointsof the gesture After using LHMMs to recognize gestureserrors appeared as shown in the circled regions (C) and (D)of Figure 11(c) which indicate the length of segment affectsthe accuracy of gesture recognition While after UHMMswere used to update recognition results from LHMMs theoutput obtained accurate recognition results Similarly 3Dacceleration data was also used for testing recognition errorsalso appeared in LHMMs as four circled regions shown inFigure 12(c) However after updating in UHMMs accuraterecognition results were also obtained

We have calculated the average recognition rate and aver-age time of 30 sets of gestures which were composed by thefive predefined gestures Table 1 lists the average recognition

accuracy and the average recognition time of HMMs andMHMMs respectively From the table we can conclude thatthe recognition accuracy of MHMMs which use Bayesian fil-tering is better than thatwhich only usesHMMs and the real-time performance of theMHMMs-based gesture recognitionmeets the requirement of online gesture recognition

4 Conclusion

In this paper a novel MHMMs continuous gesture recog-nition algorithm is proposed for Human-Robot Interactionwhich uses raw motion data captured by a Kinect sen-sor Therefore it is different from traditional vision-basedmethods Firstly a Kinect sensor was used to obtain 3D

Table 1 Comparisons of average recognition accuracy and average recognition time between HMMs and MHMMs

Gesture tyep HMMs accuracy () MHMMs accuracy () HMMs recognition time (s) MHMMs recognition time (s)1 8683 9571 0564 06372 9124 9728 0629 07843 7843 9063 0468 06154 8592 9342 0693 08125 7397 8985 0503 0759

0 20 40 60 80 100 120 140 160 180

3D acceleration10

minus10

Time (s)

0 20 40 60 80 100 120 140 160

FNNs output210

minus1

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Ground truthLHMMs

LHMMs decision

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Times (s)

UHMMs decision

Ground truthUHMMs

Figure 12 Results of FNNs LHMMs and UHMMs (a) the raw 3D acceleration (b) the output of FNNs (c) the LHMMs decision resultscompared with the ground truth (d) the UHMMs decision results compared with the ground truth

acceleration and 3D angular velocity of an arm Secondlya FNNs and a threshold criterion were respectively usedfor gesture spotting and segment And then the segmentedgesture signalswere sent into LHMMs to recognize individualgestures After that the identified gesture sequence was

fed into UHMMs where a Bayesian filter with sequentialconstraints was used to correct errors generated in LHMMsThe experimental results verify the proposed algorithm notonly effectively improves the accuracy of continuous gesturesbut also has a good real-time performance

Acknowledgment

This work is supported by the National Natural ScienceFoundation of China (60277605)

References

[1] X Wang M Xia and H Cai ldquoHidden markov models baseddynamic hand gesture recognitionrdquo Mathematical Problems inEngineering vol 2012 Article ID 986134 11 pages 2012

[2] A Pantelopoulos and N G Bourbakis ldquoA survey on wearablesensor-based systems for health monitoring and prognosisrdquoIEEE Transactions on Systems Man and Cybernetics C vol 40no 1 pp 1ndash12 2010

[3] J Yang J Lee and J Choi ldquoActivity recognition based on RFIDobject usage for smart mobile devicesrdquo Journal of ComputerScience and Technology vol 26 no 2 pp 239ndash246 2011

[4] R Poppe ldquoA survey on vision-based human action recognitionrdquoImage and Vision Computing vol 42 no 6 pp 790ndash808 2012

[5] D Weinland R Ronfard and E Boyer ldquoA survey of vision-based methods for action representation segmentation andrecognitionrdquo Computer Vision and Image Understanding vol115 no 2 pp 224ndash241 2011

[6] L Chen J Hoey and C D Nugent ldquoSensor-based activityrecognitionrdquo Transactions on Systems Man and Cyberneticsvol 42 pp 790ndash808 2012

[7] S Bilal R Akmeliawati A A Shafie and M J E SalamildquoHidden Markov model for human to computer interaction astudy on humanhand gesture recognitionrdquoArtificial IntelligenceReview vol 8 pp 1ndash22 2011

[8] L Bao and S S Intille ldquoActivity recognition from user-annotated acceleration datardquo Pervasive Computing vol 3001pp 1ndash17 2004

[9] C Sutton A McCallum and K Rohanimanesh ldquoDynamicconditional random fields factorized probabilistic models forlabeling and segmenting sequence datardquo Journal of MachineLearning Research vol 8 no 2 pp 693ndash723 2007

[10] O Brdiczka J L Crowley and P Reignier ldquoLearning situationmodels in a smart homerdquo IEEE Transactions on Systems Manand Cybernetics B vol 39 no 1 pp 56ndash63 2009

[11] F Omar M Sinn J Truszkowski P Poupart J Tung and ACaine ldquoComparative analysis of probabilistic models for activ-ity recognition with an instrumented walkerrdquo in Proceedings ofthe 26th Conference on Uncertainty in Artificial Intelligence (UAIrsquo10) pp 392ndash400 July 2010

[12] F J Ordonez P D Toledo and A Sanchis ldquoActivity recognitionusing hybrid generativediscriminative models on home envi-ronments using binary sensorsrdquo Sensors vol 13 no 1 pp 5460ndash5477 2013

[13] Z Zhang ldquoMicrosoft kinect sensor and its effectrdquo IEEE Multi-Media vol 19 no 2 pp 4ndash10 2012

[14] R Giraldo D M Giraldo and S A Meza ldquoKernel based handgesture recognition using kinect sensorrdquo in Proceedings of the17th Symposiumof Image Signal Processing andArtificial Visionpp 158ndash161 2012

[15] Z Ren J Meng J Yuan and Z Zhang ldquoRobust hand gesturerecognition with kinect sensorrdquo in Proceedings of the 19th ACMInternational Conference on Multimedia ACM Multimedia 2011(MM rsquo11) pp 759ndash760 December 2011

[16] A Bauer K Klasing and G Lidoris ldquoThe autonomous cityexplorer towards natural human-robot interactivity in urban

environmentsrdquo International Journal of Social Robotics vol 1no 3 pp 127ndash140 2009

[17] H-D Yang A-Y Park and S-W Lee ldquoGesture spotting andrecognition for human-robot interactionrdquo IEEETransactions onRobotics vol 23 no 2 pp 256ndash270 2007

[18] M T Hagan H B Demuth and M H Beale Neural NetworkDesign PWS Publishing Company 1996

[19] L R Rabiner ldquoTutorial on hiddenMarkov models and selectedapplications in speech recognitionrdquo Proceedings of the IEEE vol77 no 2 pp 257ndash286 1989

[20] S-Z Yu and H Kobayashi ldquoPractical implementation of anefficient forward-backward algorithm for an explicit-durationhiddenMarkov modelrdquo IEEE Transactions on Signal Processingvol 54 no 5 pp 1947ndash1951 2006

[21] W Liu andW Han ldquoImproved Viterbi algorithm in continuousspeech recognitionrdquo in International Symposium in InformationTechnology pp 542ndash545 October 2010

[22] OMourad ldquoHmms parameters estimation using hybrid baum-welch genetic algorithmrdquo in Proceedings of the InternationalConference on Computer Application and System Modeling pp207ndash209 2010

[23] KHirasawaMOhbayashiM Koga andMHarada ldquoForwardpropagation universal learning networkrdquo in Proceedings of theIEEE International Conference onNeural Networks pp 353ndash358June 1996

[24] VICON httpwwwviconcom[25] M Quigley K Conley and P Gerkey B ldquoRos an open-source

robot operating systemrdquo in Proceedings of the ICRA Workshopon Open Source Software 2009

[26] Neural httpwwwmathworkscomproductsneuralnet

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

OptimizationJournal of

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Operations ResearchAdvances in

Journal of

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Decision SciencesAdvances in

Discrete MathematicsJournal of

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

(a) (b)

Projector RGB sensor Infrared detector

Figure 1 (a) Xtion Pro Live sensor (b) Kinect sensor and (c) Internal structure of Kinect sensor

data However one drawback of generative approach is thatenough data must be available to learn the complete probabi-listic representations that are required Moreover both ofthese two types ofmethods have the problem of high-compu-tational complexity Currently researchers have tried to inte-grate the advantages of different methods to address complexactivity recognition problems [12]

In recent years a new type of RGB-Depth (RGB-D) sen-sor has appeared which is not only able to extract RGB fea-ture and depth information with position and direction butis also able to obtain three-dimensional (3D) acceleration and3D angular velocity information of the bodyrsquos motion AsusXtion Pro Live sensor and Microsoft Kinect sensor are twotypes of famous RGB-D sensors [13] as shown in Figures 1(a)and 1(b) respectively The internal structure of a Kinectsensor is shown in Figure 1(c) Since the advent of Kinectsensor in November 2010 researchers have used it to doextensive researches in the field of Human-Robot Interaction(HRI) Giraldo et al [14] proposed an approach based onmachine learning to identify predefined gestures based ondepth images obtained from a Kinect sensor Additionallythey also adopted a kernel-based learningmethod to discoverthe relationship among gestures and samples which can beused to correctly describe gestures Ren et al [15] detected theshape of a hand from color images and depth images obtainedfrom a Kinect sensor and then proposed a shape distancemetric method called Finger-Earth Moverrsquos Distance for dis-similarity measurement Finally they recognize predefinedgestures by template matching Bauer et al [16] proposed aHuman-Robot Interaction system Firstly a person makes apredefined gesture and then a robot try to understand theaction and respond accordingly in realtime

To overcome the shortcomings of large computation andpoor real-time performance a continuous gesture recogni-tion method is proposed for HRI The rest of the paperis organized as follows Section 2 describes a Feed-forward

Neural Networks (FNNs) for gesture spotting and the pro-posed Multilayer Hidden Markov Models (MHMMs) forcontinuous gesture recognition And then gives the pro-posed continuous hand gesture recognition procedure whichcontains hand spotting and gesture recognition based onFNNs and MHMMs The experimental results are shown inSection 3 Finally Section 4 shows the summary and then theacknowledgments

2 Mhmms-Based Continuous GestureRecognition

In this paper the proposed gesture recognition algorithmconsists of two parts the gesture spotting and segmentmodule and the continuous gesture recognition module Animportant problem in gesture recognition is how to distin-guish gestures and nongestures which is called gesture spot-ting problem namely how to detect a start point and anend point of a gesture [17] Firstly a Feed-Forward NeuralNetworks (FNNs) is used to detect gestures in an actionsignal and then a threshold criterion is used to segment ges-tures from the signal Finally MHMMs is used to recognizesegmented continuous gestures

FNNs are firstly used to distinguish whether a particularaction is a gestureWhen the FNNsdetermine a certain actionis a gesture then the output is 1 otherwise the output is 0 Fea-tures can bewell combined together in FNNs for classificationof gestures and nongestures Additionally through trainingthe weights and deviations of FNNs can be optimized Twocounters are used to detect a start point and an end point ofa gesture in accordance to a preset threshold value When theend point of a gesture is detected the spotting and segmentmodule will trigger the recognition module As MHMMsis a probabilistic model which has a high computationalcomplexity FNNs-based segment module will be used as aswitch to control data flows thus computation time can be

1 1199102 119910

(1198862

1 1198862

2 119886

119898)

(1198863

1 1198863

2 119886

119898)

119894=

(1198821

11198821

2 119882

11198822

2 119882

119898) and Sigmoid

1198863

119894= hard lim (1198912 (11988221198911 (1198821120583

119894+ 1198871) + 1198872) minus 05) (1)

119894represents

1198781

[120596119909 120596119910 120596119911]119879 where

U = [120572119909 120572119910 120572119911 120596119909 120596119910 120596119911]

119879

119909 119889119910 and 119889

119911

V = [120572119909 120572119910 120572119911 120596119909 120596119910 120596119911 119889119909 119889119910 119889119911]

119879

Raw data ofmotion

Counter gtthreshold

Start point

Is this agesture

Gesturesegment

time counter

End point

windows

Data preprocessing

K-means clustering

each HMM

Save the decision

result

n times 1 W1 W2

3 times n

1 times 1

1 times 13 times 1

3 times 11 times 3

1 times 1

+ + +n1 n2 n3f1

1 times 1

Raw data

minus05

S11S12

S21 S22 S23 S24

O21 O2

S21 S22 S23

O21 O2

2 O23O2

120572119905(119894) = P (O

1O2 O119905 119902119905= 119878119894| 120582) (4)

120572119905+1(119895) = [

119873

119894=1

120572119905(119894) 119886119894119895] 119887119895(O119905+1) (5)

minus3

minus5

100 200 300 400 500 600 700 800 900

Sliding windows

axayaz

120573119905(119894) = P (O

119905+1O119905+1 O119879| 119902119905= 119878119894120582) (6)

1O2 O119879

as follows

120573119905(119894) =

119873

119895=1

119886119894119895119887119895(O119905+1) 120573119905+1(119895) (7)

11199022 119902119879[21]

Q (120582120582) = sumQP (Q | O120582) log [P (OQ | 120582)]

max120582

[Q (120582120582)] 997904rArr P (O | 120582) gt P (O | 120582) (8)

Move forward

StopMove backward

Turn rightTurn left

q0 q1 qt qt+1

O0 O1 Ot Ot+1

119886119894119895= P [119902

119896= 119878119895| 119902119896minus1= 119878119894] (9)

120587 = P [1199021= 119878119894 (119894 = 1 2 119873)] (10)

119895

119887119895(119896) = P [O

119896| 119902119905= 119878119895] (11)

P(O119905+1| 119902119905+1= 119878119895) represents 119887

119894(O119905+1) where the

119905+1

119886119894119895in matrix A

119894at time 119905

120572119905(119894) = P (O

1O2 O119905 119902119905= 119878119894| 120582) (12)

we can conclude

120572119905+1(119895) = P (O

119905+1| 119902119905+1= 119878119895)

timessum

119878119894

P (119902119905+1= 119878119895| 119902119905= 119878119894) 120572119905(119894)

1205720(119894) = P (119902

0= 119878119894) = 120587119894 (14)

Kinectsensor

MicrocomputerFitPC2

3 Experiments

0 20 40 60 80 100 120

minus10

KinectVICON

Time (s)

8070605040302010

0minus10

minus20

minus30

minus40

KinectVICON

0 20 40 60 80 100 120Time (s)

Yaw data evaluation

KinectVICON

40302010

minus10

minus20

minus30

minus40

minus500 20 40 60 80 100 120

Time (s)

minus05

minus1

minus150 50 100 150 200 250 300 350 400 450 500

Time (s)

Time (s) Time (s)

minus05

minus1

minus150 50 100 150 200 250 300 350 400 450 500

minus050 50 100 150 200 250 300 350 400 450 500

FNNsOutput

minus050 50 100 150 200 250 300 350 400 450 500

0 50 100 150 200 250 300

10minus1

10minus2

10minus3

10minus4

10minus5

10minus6

300 epochs0 2 4 6 8 10 12

10minus1

10minus2

10minus3

10minus4

10minus5

10minus6

13 epochs

(a1) (b1)

(a2) (b2)

(a3) (b3)

0 20 40 60 80 100 120 140 160 180

minus500

3D angular velocity

Time (s)

0 20 40 60 80 100 120 140 160 180

FNNs output(A) (B)

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Ground truthLHMMs

(C)(D)

LHMMs decision

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Time (s)

UHMMs decision

Ground truthUHMMs

4 Conclusion

0 20 40 60 80 100 120 140 160 180

3D acceleration10

minus10

Time (s)

0 20 40 60 80 100 120 140 160

FNNs output210

minus1

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Ground truthLHMMs

LHMMs decision

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Times (s)

UHMMs decision

Ground truthUHMMs

Acknowledgment

References

Volume 2014

Journal of

Function Spaces

Algebra

1 1199102 119910

(1198862

1 1198862

2 119886

119898)

(1198863

1 1198863

2 119886

119898)

119894=

(1198821

11198821

2 119882

11198822

2 119882

119898) and Sigmoid

1198863

119894= hard lim (1198912 (11988221198911 (1198821120583

119894+ 1198871) + 1198872) minus 05) (1)

119894represents

1198781

[120596119909 120596119910 120596119911]119879 where

U = [120572119909 120572119910 120572119911 120596119909 120596119910 120596119911]

119879

119909 119889119910 and 119889

119911

V = [120572119909 120572119910 120572119911 120596119909 120596119910 120596119911 119889119909 119889119910 119889119911]

119879

Raw data ofmotion

Counter gtthreshold

Start point

Is this agesture

Gesturesegment

time counter

End point

windows

Data preprocessing

K-means clustering

each HMM

Save the decision

result

n times 1 W1 W2

3 times n

1 times 1

1 times 13 times 1

3 times 11 times 3

1 times 1

+ + +n1 n2 n3f1

1 times 1

Raw data

minus05

S11S12

S21 S22 S23 S24

O21 O2

S21 S22 S23

O21 O2

2 O23O2

120572119905(119894) = P (O

1O2 O119905 119902119905= 119878119894| 120582) (4)

120572119905+1(119895) = [

119873

119894=1

120572119905(119894) 119886119894119895] 119887119895(O119905+1) (5)

minus3

minus5

100 200 300 400 500 600 700 800 900

Sliding windows

axayaz

120573119905(119894) = P (O

119905+1O119905+1 O119879| 119902119905= 119878119894120582) (6)

1O2 O119879

as follows

120573119905(119894) =

119873

119895=1

119886119894119895119887119895(O119905+1) 120573119905+1(119895) (7)

11199022 119902119879[21]

Q (120582120582) = sumQP (Q | O120582) log [P (OQ | 120582)]

max120582

[Q (120582120582)] 997904rArr P (O | 120582) gt P (O | 120582) (8)

Move forward

StopMove backward

Turn rightTurn left

q0 q1 qt qt+1

O0 O1 Ot Ot+1

119886119894119895= P [119902

119896= 119878119895| 119902119896minus1= 119878119894] (9)

120587 = P [1199021= 119878119894 (119894 = 1 2 119873)] (10)

119895

119887119895(119896) = P [O

119896| 119902119905= 119878119895] (11)

P(O119905+1| 119902119905+1= 119878119895) represents 119887

119894(O119905+1) where the

119905+1

119886119894119895in matrix A

119894at time 119905

120572119905(119894) = P (O

1O2 O119905 119902119905= 119878119894| 120582) (12)

we can conclude

120572119905+1(119895) = P (O

119905+1| 119902119905+1= 119878119895)

timessum

119878119894

P (119902119905+1= 119878119895| 119902119905= 119878119894) 120572119905(119894)

1205720(119894) = P (119902

0= 119878119894) = 120587119894 (14)

Kinectsensor

MicrocomputerFitPC2

3 Experiments

0 20 40 60 80 100 120

minus10

KinectVICON

Time (s)

8070605040302010

0minus10

minus20

minus30

minus40

KinectVICON

0 20 40 60 80 100 120Time (s)

Yaw data evaluation

KinectVICON

40302010

minus10

minus20

minus30

minus40

minus500 20 40 60 80 100 120

Time (s)

minus05

minus1

minus150 50 100 150 200 250 300 350 400 450 500

Time (s)

Time (s) Time (s)

minus05

minus1

minus150 50 100 150 200 250 300 350 400 450 500

minus050 50 100 150 200 250 300 350 400 450 500

FNNsOutput

minus050 50 100 150 200 250 300 350 400 450 500

0 50 100 150 200 250 300

10minus1

10minus2

10minus3

10minus4

10minus5

10minus6

300 epochs0 2 4 6 8 10 12

10minus1

10minus2

10minus3

10minus4

10minus5

10minus6

13 epochs

(a1) (b1)

(a2) (b2)

(a3) (b3)

0 20 40 60 80 100 120 140 160 180

minus500

3D angular velocity

Time (s)

0 20 40 60 80 100 120 140 160 180

FNNs output(A) (B)

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Ground truthLHMMs

(C)(D)

LHMMs decision

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Time (s)

UHMMs decision

Ground truthUHMMs

4 Conclusion

0 20 40 60 80 100 120 140 160 180

3D acceleration10

minus10

Time (s)

0 20 40 60 80 100 120 140 160

FNNs output210

minus1

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Ground truthLHMMs

LHMMs decision

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Times (s)

UHMMs decision

Ground truthUHMMs

Acknowledgment

References

Volume 2014

Journal of

Function Spaces

Algebra

Raw data ofmotion

Counter gtthreshold

Start point

Is this agesture

Gesturesegment

time counter

End point

windows

Data preprocessing

K-means clustering

each HMM

Save the decision

result

n times 1 W1 W2

3 times n

1 times 1

1 times 13 times 1

3 times 11 times 3

1 times 1

+ + +n1 n2 n3f1

1 times 1

Raw data

minus05

S11S12

S21 S22 S23 S24

O21 O2

S21 S22 S23

O21 O2

2 O23O2

120572119905(119894) = P (O

1O2 O119905 119902119905= 119878119894| 120582) (4)

120572119905+1(119895) = [

119873

119894=1

120572119905(119894) 119886119894119895] 119887119895(O119905+1) (5)

minus3

minus5

100 200 300 400 500 600 700 800 900

Sliding windows

axayaz

120573119905(119894) = P (O

119905+1O119905+1 O119879| 119902119905= 119878119894120582) (6)

1O2 O119879

as follows

120573119905(119894) =

119873

119895=1

119886119894119895119887119895(O119905+1) 120573119905+1(119895) (7)

11199022 119902119879[21]

Q (120582120582) = sumQP (Q | O120582) log [P (OQ | 120582)]

max120582

[Q (120582120582)] 997904rArr P (O | 120582) gt P (O | 120582) (8)

Move forward

StopMove backward

Turn rightTurn left

q0 q1 qt qt+1

O0 O1 Ot Ot+1

119886119894119895= P [119902

119896= 119878119895| 119902119896minus1= 119878119894] (9)

120587 = P [1199021= 119878119894 (119894 = 1 2 119873)] (10)

119895

119887119895(119896) = P [O

119896| 119902119905= 119878119895] (11)

P(O119905+1| 119902119905+1= 119878119895) represents 119887

119894(O119905+1) where the

119905+1

119886119894119895in matrix A

119894at time 119905

120572119905(119894) = P (O

1O2 O119905 119902119905= 119878119894| 120582) (12)

we can conclude

120572119905+1(119895) = P (O

119905+1| 119902119905+1= 119878119895)

timessum

119878119894

P (119902119905+1= 119878119895| 119902119905= 119878119894) 120572119905(119894)

1205720(119894) = P (119902

0= 119878119894) = 120587119894 (14)

Kinectsensor

MicrocomputerFitPC2

3 Experiments

0 20 40 60 80 100 120

minus10

KinectVICON

Time (s)

8070605040302010

0minus10

minus20

minus30

minus40

KinectVICON

0 20 40 60 80 100 120Time (s)

Yaw data evaluation

KinectVICON

40302010

minus10

minus20

minus30

minus40

minus500 20 40 60 80 100 120

Time (s)

minus05

minus1

minus150 50 100 150 200 250 300 350 400 450 500

Time (s)

Time (s) Time (s)

minus05

minus1

minus150 50 100 150 200 250 300 350 400 450 500

minus050 50 100 150 200 250 300 350 400 450 500

FNNsOutput

minus050 50 100 150 200 250 300 350 400 450 500

0 50 100 150 200 250 300

10minus1

10minus2

10minus3

10minus4

10minus5

10minus6

300 epochs0 2 4 6 8 10 12

10minus1

10minus2

10minus3

10minus4

10minus5

10minus6

13 epochs

(a1) (b1)

(a2) (b2)

(a3) (b3)

0 20 40 60 80 100 120 140 160 180

minus500

3D angular velocity

Time (s)

0 20 40 60 80 100 120 140 160 180

FNNs output(A) (B)

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Ground truthLHMMs

(C)(D)

LHMMs decision

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Time (s)

UHMMs decision

Ground truthUHMMs

4 Conclusion

0 20 40 60 80 100 120 140 160 180

3D acceleration10

minus10

Time (s)

0 20 40 60 80 100 120 140 160

FNNs output210

minus1

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Ground truthLHMMs

LHMMs decision

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Times (s)

UHMMs decision

Ground truthUHMMs

Acknowledgment

References

Volume 2014

Journal of

Function Spaces

Algebra

S11S12

S21 S22 S23 S24

O21 O2

S21 S22 S23

O21 O2

2 O23O2

120572119905(119894) = P (O

1O2 O119905 119902119905= 119878119894| 120582) (4)

120572119905+1(119895) = [

119873

119894=1

120572119905(119894) 119886119894119895] 119887119895(O119905+1) (5)

minus3

minus5

100 200 300 400 500 600 700 800 900

Sliding windows

axayaz

120573119905(119894) = P (O

119905+1O119905+1 O119879| 119902119905= 119878119894120582) (6)

1O2 O119879

as follows

120573119905(119894) =

119873

119895=1

119886119894119895119887119895(O119905+1) 120573119905+1(119895) (7)

11199022 119902119879[21]

Q (120582120582) = sumQP (Q | O120582) log [P (OQ | 120582)]

max120582

[Q (120582120582)] 997904rArr P (O | 120582) gt P (O | 120582) (8)

Move forward

StopMove backward

Turn rightTurn left

q0 q1 qt qt+1

O0 O1 Ot Ot+1

119886119894119895= P [119902

119896= 119878119895| 119902119896minus1= 119878119894] (9)

120587 = P [1199021= 119878119894 (119894 = 1 2 119873)] (10)

119895

119887119895(119896) = P [O

119896| 119902119905= 119878119895] (11)

P(O119905+1| 119902119905+1= 119878119895) represents 119887

119894(O119905+1) where the

119905+1

119886119894119895in matrix A

119894at time 119905

120572119905(119894) = P (O

1O2 O119905 119902119905= 119878119894| 120582) (12)

we can conclude

120572119905+1(119895) = P (O

119905+1| 119902119905+1= 119878119895)

timessum

119878119894

P (119902119905+1= 119878119895| 119902119905= 119878119894) 120572119905(119894)

1205720(119894) = P (119902

0= 119878119894) = 120587119894 (14)

Kinectsensor

MicrocomputerFitPC2

3 Experiments

0 20 40 60 80 100 120

minus10

KinectVICON

Time (s)

8070605040302010

0minus10

minus20

minus30

minus40

KinectVICON

0 20 40 60 80 100 120Time (s)

Yaw data evaluation

KinectVICON

40302010

minus10

minus20

minus30

minus40

minus500 20 40 60 80 100 120

Time (s)

minus05

minus1

minus150 50 100 150 200 250 300 350 400 450 500

Time (s)

Time (s) Time (s)

minus05

minus1

minus150 50 100 150 200 250 300 350 400 450 500

minus050 50 100 150 200 250 300 350 400 450 500

FNNsOutput

minus050 50 100 150 200 250 300 350 400 450 500

0 50 100 150 200 250 300

10minus1

10minus2

10minus3

10minus4

10minus5

10minus6

300 epochs0 2 4 6 8 10 12

10minus1

10minus2

10minus3

10minus4

10minus5

10minus6

13 epochs

(a1) (b1)

(a2) (b2)

(a3) (b3)

0 20 40 60 80 100 120 140 160 180

minus500

3D angular velocity

Time (s)

0 20 40 60 80 100 120 140 160 180

FNNs output(A) (B)

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Ground truthLHMMs

(C)(D)

LHMMs decision

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Time (s)

UHMMs decision

Ground truthUHMMs

4 Conclusion

0 20 40 60 80 100 120 140 160 180

3D acceleration10

minus10

Time (s)

0 20 40 60 80 100 120 140 160

FNNs output210

minus1

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Ground truthLHMMs

LHMMs decision

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Times (s)

UHMMs decision

Ground truthUHMMs

Acknowledgment

References

Volume 2014

Journal of

Function Spaces

Algebra

Move forward

StopMove backward

Turn rightTurn left

q0 q1 qt qt+1

O0 O1 Ot Ot+1

119886119894119895= P [119902

119896= 119878119895| 119902119896minus1= 119878119894] (9)

120587 = P [1199021= 119878119894 (119894 = 1 2 119873)] (10)

119895

119887119895(119896) = P [O

119896| 119902119905= 119878119895] (11)

P(O119905+1| 119902119905+1= 119878119895) represents 119887

119894(O119905+1) where the

119905+1

119886119894119895in matrix A

119894at time 119905

120572119905(119894) = P (O

1O2 O119905 119902119905= 119878119894| 120582) (12)

we can conclude

120572119905+1(119895) = P (O

119905+1| 119902119905+1= 119878119895)

timessum

119878119894

P (119902119905+1= 119878119895| 119902119905= 119878119894) 120572119905(119894)

1205720(119894) = P (119902

0= 119878119894) = 120587119894 (14)

Kinectsensor

MicrocomputerFitPC2

3 Experiments

0 20 40 60 80 100 120

minus10

KinectVICON

Time (s)

8070605040302010

0minus10

minus20

minus30

minus40

KinectVICON

0 20 40 60 80 100 120Time (s)

Yaw data evaluation

KinectVICON

40302010

minus10

minus20

minus30

minus40

minus500 20 40 60 80 100 120

Time (s)

minus05

minus1

minus150 50 100 150 200 250 300 350 400 450 500

Time (s)

Time (s) Time (s)

minus05

minus1

minus150 50 100 150 200 250 300 350 400 450 500

minus050 50 100 150 200 250 300 350 400 450 500

FNNsOutput

minus050 50 100 150 200 250 300 350 400 450 500

0 50 100 150 200 250 300

10minus1

10minus2

10minus3

10minus4

10minus5

10minus6

300 epochs0 2 4 6 8 10 12

10minus1

10minus2

10minus3

10minus4

10minus5

10minus6

13 epochs

(a1) (b1)

(a2) (b2)

(a3) (b3)

0 20 40 60 80 100 120 140 160 180

minus500

3D angular velocity

Time (s)

0 20 40 60 80 100 120 140 160 180

FNNs output(A) (B)

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Ground truthLHMMs

(C)(D)

LHMMs decision

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Time (s)

UHMMs decision

Ground truthUHMMs

4 Conclusion

0 20 40 60 80 100 120 140 160 180

3D acceleration10

minus10

Time (s)

0 20 40 60 80 100 120 140 160

FNNs output210

minus1

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Ground truthLHMMs

LHMMs decision

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Times (s)

UHMMs decision

Ground truthUHMMs

Acknowledgment

References

Volume 2014

Journal of

Function Spaces

Algebra

Kinectsensor

MicrocomputerFitPC2

3 Experiments

0 20 40 60 80 100 120

minus10

KinectVICON

Time (s)

8070605040302010

0minus10

minus20

minus30

minus40

KinectVICON

0 20 40 60 80 100 120Time (s)

Yaw data evaluation

KinectVICON

40302010

minus10

minus20

minus30

minus40

minus500 20 40 60 80 100 120

Time (s)

minus05

minus1

minus150 50 100 150 200 250 300 350 400 450 500

Time (s)

Time (s) Time (s)

minus05

minus1

minus150 50 100 150 200 250 300 350 400 450 500

minus050 50 100 150 200 250 300 350 400 450 500

FNNsOutput

minus050 50 100 150 200 250 300 350 400 450 500

0 50 100 150 200 250 300

10minus1

10minus2

10minus3

10minus4

10minus5

10minus6

300 epochs0 2 4 6 8 10 12

10minus1

10minus2

10minus3

10minus4

10minus5

10minus6

13 epochs

(a1) (b1)

(a2) (b2)

(a3) (b3)

0 20 40 60 80 100 120 140 160 180

minus500

3D angular velocity

Time (s)

0 20 40 60 80 100 120 140 160 180

FNNs output(A) (B)

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Ground truthLHMMs

(C)(D)

LHMMs decision

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Time (s)

UHMMs decision

Ground truthUHMMs

4 Conclusion

0 20 40 60 80 100 120 140 160 180

3D acceleration10

minus10

Time (s)

0 20 40 60 80 100 120 140 160

FNNs output210

minus1

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Ground truthLHMMs

LHMMs decision

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Times (s)

UHMMs decision

Ground truthUHMMs

Acknowledgment

References

Volume 2014

Journal of

Function Spaces

Algebra

0 20 40 60 80 100 120

minus10

KinectVICON

Time (s)

8070605040302010

0minus10

minus20

minus30

minus40

KinectVICON

0 20 40 60 80 100 120Time (s)

Yaw data evaluation

KinectVICON

40302010

minus10

minus20

minus30

minus40

minus500 20 40 60 80 100 120

Time (s)

minus05

minus1

minus150 50 100 150 200 250 300 350 400 450 500

Time (s)

Time (s) Time (s)

minus05

minus1

minus150 50 100 150 200 250 300 350 400 450 500

minus050 50 100 150 200 250 300 350 400 450 500

FNNsOutput

minus050 50 100 150 200 250 300 350 400 450 500

0 50 100 150 200 250 300

10minus1

10minus2

10minus3

10minus4

10minus5

10minus6

300 epochs0 2 4 6 8 10 12

10minus1

10minus2

10minus3

10minus4

10minus5

10minus6

13 epochs

(a1) (b1)

(a2) (b2)

(a3) (b3)

0 20 40 60 80 100 120 140 160 180

minus500

3D angular velocity

Time (s)

0 20 40 60 80 100 120 140 160 180

FNNs output(A) (B)

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Ground truthLHMMs

(C)(D)

LHMMs decision

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Time (s)

UHMMs decision

Ground truthUHMMs

4 Conclusion

0 20 40 60 80 100 120 140 160 180

3D acceleration10

minus10

Time (s)

0 20 40 60 80 100 120 140 160

FNNs output210

minus1

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Ground truthLHMMs

LHMMs decision

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Times (s)

UHMMs decision

Ground truthUHMMs

Acknowledgment

References

Volume 2014

Journal of

Function Spaces

Algebra

minus05

minus1

minus150 50 100 150 200 250 300 350 400 450 500

Time (s)

Time (s) Time (s)

minus05

minus1

minus150 50 100 150 200 250 300 350 400 450 500

minus050 50 100 150 200 250 300 350 400 450 500

FNNsOutput

minus050 50 100 150 200 250 300 350 400 450 500

0 50 100 150 200 250 300

10minus1

10minus2

10minus3

10minus4

10minus5

10minus6

300 epochs0 2 4 6 8 10 12

10minus1

10minus2

10minus3

10minus4

10minus5

10minus6

13 epochs

(a1) (b1)

(a2) (b2)

(a3) (b3)

0 20 40 60 80 100 120 140 160 180

minus500

3D angular velocity

Time (s)

0 20 40 60 80 100 120 140 160 180

FNNs output(A) (B)

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Ground truthLHMMs

(C)(D)

LHMMs decision

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Time (s)

UHMMs decision

Ground truthUHMMs

4 Conclusion

0 20 40 60 80 100 120 140 160 180

3D acceleration10

minus10

Time (s)

0 20 40 60 80 100 120 140 160

FNNs output210

minus1

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Ground truthLHMMs

LHMMs decision

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Times (s)

UHMMs decision

Ground truthUHMMs

Acknowledgment

References

Volume 2014

Journal of

Function Spaces

Algebra

0 20 40 60 80 100 120 140 160 180

minus500

3D angular velocity

Time (s)

0 20 40 60 80 100 120 140 160 180

FNNs output(A) (B)

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Ground truthLHMMs

(C)(D)

LHMMs decision

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Time (s)

UHMMs decision

Ground truthUHMMs

4 Conclusion

0 20 40 60 80 100 120 140 160 180

3D acceleration10

minus10

Time (s)

0 20 40 60 80 100 120 140 160

FNNs output210

minus1

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Ground truthLHMMs

LHMMs decision

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Times (s)

UHMMs decision

Ground truthUHMMs

Acknowledgment

References

Volume 2014

Journal of

Function Spaces

Algebra

0 20 40 60 80 100 120 140 160 180

3D acceleration10

minus10

Time (s)

0 20 40 60 80 100 120 140 160

FNNs output210

minus1

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Ground truthLHMMs

LHMMs decision

Time (s)

0 20 40 60 80 100 120 140 160

StopTurn right

Times (s)

UHMMs decision

Ground truthUHMMs

Acknowledgment

References

Volume 2014

Journal of

Function Spaces

Algebra

Acknowledgment

References

Volume 2014

Journal of

Function Spaces

Algebra

Volume 2014

Journal of

Function Spaces

Algebra

a multilayer hidden markov models-based method for human

Documents

hidden markov models - home | princeton...

l13: hidden markov models - texas a&m...

hidden markov model nov 11, 2008 sung-bae cho. hidden markov...

hidden markov models -...

hidden markov models

hidden markov models./awm/tutorials/hmm14.pdf · hidden...

lecture 6a: introduction to hidden markov models · lecture...

inference in mixed hidden markov models and applications...

markov chains and hidden markov models - rice university ·...

l13: hidden markov...

markov hidden

hidden markov model

hidden markov models in...

hidden markov modelle

hidden markov models - uml.edugrinstei/91.510/hmm/class...

hidden markov models · hidden markov models 1 10-601...

ee365: hidden markov models - stanford...

hidden markov models and gaussian mixture models · hidden...

hidden markov

9 markov chains and hidden markov models - freie … · 9...