improved collaborative representation classifier based on 2...

Research ArticleImproved Collaborative Representation Classifier Based on1198972-Regularized for Human Action Recognition

Shirui Huo1 Tianrui Hu2 and Ce Li34

1City University of Hong Kong Kowloon Tong Hong Kong2Beijing University of Posts and Telecommunications Beijing China3State Key Laboratory of Coal Resources and Safe Mining China University of Mining amp Technology Beijing China4University of Chinese Academy of Sciences Beijing China

Correspondence should be addressed to Ce Li licekonggmailcom

Received 10 April 2017 Revised 15 August 2017 Accepted 28 September 2017 Published 20 November 2017

Academic Editor Naiyang Guan

Copyright copy 2017 Shirui Huo et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Human action recognition is an important recent challenging task Projecting depth images onto three depth motion maps(DMMs) and extracting deep convolutional neural network (DCNN) features are discriminant descriptor features to characterizethe spatiotemporal information of a specific action from a sequence of depth images In this paper a unified improved collaborativerepresentation framework is proposed in which the probability that a test sample belongs to the collaborative subspace of all classescan be well defined and calculated The improved collaborative representation classifier (ICRC) based on 1198972-regularized for humanaction recognition is presented to maximize the likelihood that a test sample belongs to each class then theoretical investigationinto ICRC shows that it obtains a final classification by computing the likelihood for each class Coupled with the DMMs andDCNN features experiments on depth image-based action recognition including MSRAction3D and MSRGesture3D datasetsdemonstrate that the proposed approach successfully using a distance-based representation classifier achieves superior performanceover the state-of-the-art methods including SRC CRC and SVM

1 Introduction

Human action recognition has been studied in the computervision community for decades due to its applications in videosurveillance [1] human computer interaction [2] andmotionanalysis [3] Prior to the Microsoft Kinect the conventionalresearch focused onhuman action recognition fromRGB butKinect sensors provide an affordable technology to captureRGB and depth (D) images in real time which can offerbetter geometric cues and less sensitivity to illuminationchanges for action recognition In [1] a bag of 3D pointsand graphical model are obtained to characterize spatialand temporal information from depth images In [3] threedepth motion maps (DMMs) are projected to capture bodyshape andmotion which is a discriminant feature to describethe spatiotemporal information of a specific action from asequence of depth images Seen from the literature reviewalthough depth based methods appear to be compelling

toward a practical application even if there are a few ofdeep-learned features for depth based action recognition theperformance is still far from satisfactory due to the largevariations of themotion In this paper we focus on leveragingone kind of the structure of representative model to improveperformance in multiclass classification with handcraftedDMMs descriptor In [4] three channel deep convolutionalneural networks are trained to extract features of depthmap sequences after projecting weighted DMMs on threeorthogonal planes at several temporal scales It was verifiedthat the method using DCNN features can achieve almostthe same state-of-the-art results on the MSRAction3D andMSRGesture3D dataset DCNNs have been demonstratedas an effective kind of models for performing state-of-the-art results in the tasks of image recognition segmentationdetection and retrieval With the success of DCNN we alsotake it as feature extraction and apply it in our classifiermodel

HindawiJournal of Electrical and Computer EngineeringVolume 2017 Article ID 8191537 6 pageshttpsdoiorg10115520178191537

2 Journal of Electrical and Computer Engineering

As for representative models many achievements basedon space representation include image restoration [5] com-pressive sensing [6 7] morphological component analysis[8] and super-resolution [9 10] As the advances of classifiersbased on representation several pattern recognition prob-lems in the field of computer vision can be effectively solvedby sparse coding or sparse representation methods in recentdecades In particular the linear models can be representedas 119910 = 119860120572 [11] where 119910 120572 and 119860 represent the data a sparsevector and a given matrix with the overcomplete sample setrespectively Because of the great success of sparse codingalgorithms on image processing the sparse representationbased classifiers such as sparse representation classification(SRC) and collaborative representation classification (CRC)have gained more attention nowadays

The basic idea of SRCCRC is to code the test sampleover a set of samples with sparsity constraints which canbe calculated by 1198971-minimization In [12] Wright proposeda basic SRC model for classification by the discriminativenature of sparse representation which is based on the theorythat newly signals are recognized as the linear combinationsof previously observed ones Based on SRC Yang and Zhangproposed aGabor occlusion dictionary based SRC which cansignificantly reduce the computational cost [13] In [14] theauthors combined sparse representation with linear pyramidmatching for image classification Rather than using theentire training set Zhang and Li [15] proposed learneddictionary using SRC In [16] 1198971-graph is constructed bya sparse representation subspace over the other samplesYang et al [14] also proposed a method to preserve the 1198971-graph for image classification by using a subspace to solvemisalignment problems in image classification task BesidesSRC is used for robust illumination [17] image-plane trans-formation [18] and so on However Zhang argued that thegood performance of SRC should be largely attributed to thecollaborative representation of a test sample by training sam-ples across all classes and proposed more effective CRC Insummary SRCCRC simply uses the reconstruction error orresidual by each class-specific subspace to determine the classlabel and many modified models and solution algorithmsto SRCCRC are also proposed for visual recognition tasksincluding Augmented Lagrange Multiplier Proximal Gradi-ent Gradient Projection Iterative Shrinkage-Thresholdingand Homotopy [19] Recently some researchers [20 21] havepointed out the purpose of 1198971-regularized based sparsity inpattern classification On the contrary using 1198972-regularizedbased representation for classification can do a similar job to1198971-regularized but the computational cost will reduce a lot

Motivated by the work of modifications of CRC inthis paper we mainly present the improved collaborativerepresentation classifier (ICRC) based on 1198972-regularized forhuman action recognition Based on three DMMsrsquo descrip-tor feature the ICRC approach is to jointly maximize thelikelihood that a test sample belongs to each of the multipleclasses then the final classification is performed by com-puting the likelihood for each class The experiments onhuman action classification tasks including MSRAction3DandMSRGesture3D datasets are demonstrated and analyzed

on the superior performance of this algorithm over the state-of-the-art methods including SRC CRC and SVMThe restof the paper is organized as follows In Section 2 we introducerelated feature descriptors using DMM Section 3 details theaction classifier based on ICRC and Section 4 shows theexperimental results of our approach on relevant datasetsThe conclusion and acknowledgment are drawn in Section 5and Acknowledgments section

2 Feature Descriptors

21 Using Depth Motion Maps In this section we explainthe extracted feature descriptor using depth motion maps(DMMs) from depth images which is generated by selectingand stacking motion energy of depth maps projected ontothree orthogonal Cartesian planes aligning with front (119891)side (119904) and top (119905) views (ie 119901119891 119901119904 and 119901119905 resp) As foreach projectedmap itsmotion energy is computed by thresh-olding the difference between consecutive maps The binarymap of motion energy provides a strong clue of the actioncategory being performed and indicates motion regions orwhere movement happens in each temporal interval Wesuggest that all frames should be deployed to calculatemotioninformation instead of selecting frames Considering both thediscriminability and robustness of feature descriptors we usethe 1198971-norm of the absolute difference of a frame to definethe salient information on depth sequences Because 1198971-normis invariant to the length of a depth sequence and 1198971-normcontains more salient information than other norms (eg 1198972)we have

DMM119891119904119905 = 119873minusVsum119894=1

100381610038161003816100381610038161003816119901119891119904119905119894+V minus 119901119891119904119905119894 100381610038161003816100381610038161003816 (1)

where V is the frame interval i represents the frame indexand N is the total number of frames in a depth sequence Inthe case that the sum operation in (1) is only used given athreshold satisfied the scale of V affects little the local patternhistogram on the DMMs

22 Using Deep Convolutional Neural Networks In this sec-tion we introduce three deep convolutional neural networks(DCNNs) to train the features on three projected planes ofDMMs and perform fusion of three nets by combining thesoftmax in fully connected layer The layer configuration ofour three CNNs is schematically shown in Figure 1 in whichthere are five convolutional layers and three fully connectedlayers in each net The detail of our implementation isillustrated in Section 412

3 Action Classifier Based on ICRC

Based on depth motion maps to incorporate the featuredescriptors into a powerful classifier an improved collabora-tive representation classifier (ICRC) is presented for humanaction recognition

31 1198972-Regularized Collaborative Representation ClassifierThebasic idea of SRC is to get a test sample by sparsely choos-ing a small number of atoms fromanovercomplete dictionary

Journal of Electrical and Computer Engineering 3

Figure 1Three DCNNs architecture for a depth action sequence toextract features

that contain all training samples [12] Denoted by119860119895 isin R119889times119899the set of training samples form class 119895 and suppose we have119862 class of subjects So 119860 = [1198601 1198602 119860119862] isin R119889times119899 involvesmany samples from all classes and 119860119895 (119895 = 1 2 119862) is theindividual class of training samples 119899 is the total number oftraining samples and 119889 is the dimension of training samplesA query sample 119910 isin R119889 can be presented by 119910 = 119860120572 where119910 120572 and 119860 represent the data a sparse vector and a givenmatrix with the overcomplete training samples respectively

To be specific in the mechanism of collaborative repre-sentation classifier (CRC) each data point in the collaborativesubspace can be represented as a linear combination ofsamples in 119860 where 120572 = [1205721 1205722 120572119862] is an 119899 times 1representation vector associated with training sample and120572119895 (119895 = 1 2 119862) is the subvector corresponding to119860119895 Generally it is formulized as a 1198971-norm minimizationproblem with a convex objective and solved by

min120572

1003817100381710038171003817119910 minus 119860120572100381710038171003817100381722 + 120579 1205721 st 119910 = 119860120572 (2)

where 120579 is a positive scalar to balance the sparsity term andthe residual The residual can be computed as119890119895 = 10038171003817100381710038171003817119910 minus 119860119895120572119895100381710038171003817100381710038172 (3)

where 120572119895 is the coefficient vector corresponding to class 119895And then the output of the identity of 119910 can be obtained bythe lowest residual as

class (119910) = argmin119895

119890119895 (4)

For more details of SRCCRC one can refer to [12]Because of the computational time consuming in 1198971-regularized minimization (1) is approximated as

min120572

1003817100381710038171003817119910 minus 119860120572100381710038171003817100381722 + 120582 11987112057222 st 119910 = 119860120572 (5)

where 119910 isin R119889 is representative by 120572 and 119860 if the 1198972-normof 120572 is smaller In (5) 119871 is the Tikhonov regularization [27]to calculate the coefficient vector and 120582 is the regularizationparameter 1198711205722 is the 1198972-regularization term to add a certainamount of sparsity to 120572 which is weaker than 1198971-normminimization The diagonal matrix 119871 and the coefficientvector 120572 are calculated as follows [21]

119871 = (1003817100381710038171003817119910 minus ℎ110038171003817100381710038172 sdot sdot sdot 0 d0 sdot sdot sdot 1003817100381710038171003817119910 minus ℎ11989610038171003817100381710038172)

120572 = 119875119910(6)

where 119875 = (119860119879119860 + 120582119871119879119871)minus1119860119879 is independent of 119910 and pre-calculated With (3) and (4) the data 119910 is assigned differentidentities based on 12057232 The Proposed ICRC Method Based on the training sam-ple set we propose an improved collaborative representationclassifier based on 1198972-regularized term which assigns the datapoints with different probabilities based on 120572 by adding atermsum119862119895 119860120572minus11986011989512057211989522 that attempts to find a point119860119895120572119895 closeto the common point 119910 inside each subspace of class 119895 Thefirst two terms 119910 minus11986012057222 + 12058211987112057222 still form a 1198972-regularizedcollaborative representation term which encourages to find apoint 119860120572 close to 119910 in the collaborative subspace Therefore(5) is rewritten as

min120572

1003817100381710038171003817119910 minus 119860120572100381710038171003817100381722 + 120582 11987112057222 + 120574119862119862sum119895

10038171003817100381710038171003817119860120572 minus 1198601198951205721198951003817100381710038171003817100381722 st 119910 = 119860120572 (7)

Obviously the parameters 120582 and 120574 balance three terms whichcan be set from the training data Accordingly a new solutionof representative vector 120572 is obtained from (7)

In the condition of 120574 = 0 (7) will degenerate to CRCwith the first two terms and 119910 minus 11986012057222 + 12058211987112057222 will playan important role in determining 120572 When 120574 gt 0 these twoterms 119910minus11986012057222 +12058211987112057222 will be the same for all classes andthus the term sum119862119895 119860120572 minus 11986011989512057211989522 will be dominant to furtherfine-tune 120572119895 by 119860119895 yielding to a precise 120572 That is the lastnewly added term is introduced to further adjust 120572119895 by 119860119895resulting in a more stable solution to representative vector 120572

We can omit the first two same terms for all classesmake the classifier rule by the last term and formulize it asa probability exponent

class (119910) = argmax119895

exp (minus 10038171003817100381710038171003817119860120572 minus 1198601198951205721198951003817100381710038171003817100381722) (8)

The proposed 1198972-regularizedmethod for human action recog-nition is presented to maximize the likelihood that a testsample belongs to each class then the experiments in thefollowing section show that it obtains a final classification bychecking which class has themaximum likelihood So far theabovementioned classifier model in (7) and (8) is named asthe improved collaboration representation classifier (ICRC)


Figure 2 Three DMMs of a depth action sequence ldquoASL Zrdquo fromthe front (119891) view side (119904) view and top (119905) view respectively4 Experimental Results

Based on depth motion maps to incorporate the featuredescriptors into a powerful classifier an ICRC is presented forhuman action recognition To verify the effectiveness of theproposed ICRC algorithm on action recognition applicationsusing DMM descriptors of depth sequences we carry outexperiments on challenging depth based action datasetsMSRAction3D [1] and MSRGesture3D [1] for human actionrecognition


411 DMMs The MSRAction3D [1] dataset is composed ofdepth images sequence captured by the Microsoft Kinect-V1 camera It includes 20 actions performed by 10 subjectsfacing the camera Each subject performed each action 2 or3 times There are 20 action types high arm wave horizontalarm wave hammer hand catch forward punch high throwdraw X draw tick draw circle hand clap two-hand waveside boxing bend forward kick side kick jogging tennisswing tennis serve golf swing and pick up and throw Thesize of each depth image is 240 times 320 pixels The backgroundinformation has been removed in the depth data

The MSRGesture3D [1] dataset is for continuous onlinehuman action recognition from a Kinect device It consists of12 gestures defined by American Sign Language (ASL) Eachpersonperforms each gesture 2 or 3 timesThere are 333 depthsequences For action recognition on the MSRAction3D andMSRGesture3D dataset we use the feature computed fromthe DMMs and each depth action sequence generates threeDMMs corresponding to three projection views The DMMsof high arm wave class from the MSRAction dataset areshown in Figure 2 and the DMMs of ASL Z class from theMSRGesture3D dataset are shown in Figure 3

412 DCNNs Furthermore our implementation of DCNNfeatures is based on the publicly available MatConvNet tool-box [28] using one Nvidia Titan X cardThe network weightsare learned bymini-batch stochastic gradient descent Similarto [4] the momentum is set to 09 and weight decay is setto 00005 and all hidden weight layers use the rectificationactivation function At each iteration 256 samples in each

Figure 3ThreeDMMs of a depth action sequence ldquoSwipe leftrdquo fromthe front (119891) view side (119904) view and top (119905) view respectivelybatch are constructed and resized to 256 times 256 then 224 times224 patches are randomly cropped from the center of theselected image to artificial data augmentation The dropoutregularization ratio is 05 in the nets Besides the initiallearning rate is set to 001 with pretrainedmodel on ILSVRC-2012 to fine-tune our model and the learning rate decreasesevery 20 epochs Finally we concatenate three 4096 dimen-sional feature vectors in 7th fully connected layer to input thesubsequent classifier

42 Experiment Setting The same experimental setup in [1]was adopted and the actions in MSRAction3D dataset weredivided into three subsets as follows AS1 horizontal wavehammer forward punch high throw hand clap bend tennisserve and pickup throw AS2 high wave hand catch drawx draw tick draw circle two-hand wave forward kick andside boxing AS3 high throw forward kick side kick joggingtennis swing tennis serve golf swing and pickup throwWe performed three experiments with 23 training samplesand 13 testing samples in AS1 AS2 and AS3 respectivelyThus the performance on MSRAction3D is evaluated bythe average accuracy (Accu unit ) on three subsets Onthe other hand the same experimental setting reported in[26 29 30] was followed 12 gestures were tested by leave-one-subject-out cross-validation to evaluate the performanceof the proposed method

43 Recognition Results with DMMs and ICRC We concate-nate the sign magnitude and center features to form thefeature based on DMMs as the final feature representationThe compared methods are similar to [29 30] The sameparameters reported in [26] were used here for the sizes ofSI and block A total of 20 actions are employed and one-half of the subjects (1 3 5 7 and 9) are used for training andthe remaining subjects are used for testing The recognitionperformance of our method and existing approaches arelisted in Table 1 It is clear that our method achieves betterperformance than other competing methods

To show the outcome of our method Figures 4 and 5illustrate the recognition rates of each class in two datasets Itis stated that there are 14 classes obtaining 100 recognitionrates in the MSRAction3D dataset and the performance of 3


Table 1 Recognition accuracies (unit ) comparison on theMSRAction3D dataset and MSRGesture3D dataset

Method Accu () on two datasetsMSRAction3D MSRGesture3D

DMM-HOG [3] 855 892Random Occupancy [22] 865 885Actionlet Ensemble [23] 882 882Depth Cuboid [24] 893 905Vemulapalli [25] 895 877DMM + CRC [26] 923 925DMM + ICRC (proposed) 938 948

Hig

h w

ave

Hor

izon

tal w

ave

Ham

mer

Han

d ca

tch

Forw

ard

punc

hH

igh

thro

wD

raw

XD

raw

tick

Dra

w ci

rcle

Han

d cla

pTw

o-ha

nd w

ave

Side

box

ing

Bend

Forw

ard

kick

Side

kic

kJo

ggin

gTe

nnis

swin

gTe

nnis

serv

eG

olf s

win

gPi

ckup

thro

w

0102030405060708090

100

Figure 4 Recognition rates (unit ) of 20 classes inMSRAction3Ddataset (average results of three subsets)

classes reaches up to best in the MSRGesture3D dataset Allexperiments are carried out usingMATLAB 2016b on an Inteli7-6500U desktop with 8GB RAM and the average time ofvideo processing gets about 26 frames per second meeting areal-time processing demand basically

44 Comparison with DCNN Features and ICRC Further-more in order to evaluate our proposed classifier methodwe also extract the deep features by the abovementioned con-ventional CNNmodel and then input the 12288 dimensionalvectors to the proposed ICRC for action recognition Table 2shows that DCNN algorithm indeed has advances as goodas in other popular tasks of image classification and objectdetection and it can improve the accuracy greatly up to 6 inMSRAction3D and MSRGesture3D This would also explainthe importance of effective feature to ICRC classifier

5 Conclusion

In this paper we propose improved collaborative represen-tation classifier (ICRC) based on 1198972-regularized for humanaction recognition The DMMs and DCNN feature descrip-tors are involved as an effective action representation For theaction classifier ICRC is proposed based on collaborative rep-resentation with the additional regularization term The newinsight focuses on a subspace constraints on the solutionTheexperimental results on MSRAction3D and MSRGesture3D

0102030405060708090

100

Z J

Whe

re

Stor

e

Pig

Past

Hun

gry

Gre

en

Fini

sh

Blue

Bath

room Milk

Figure 5 Recognition rates (unit ) of 12 classes inMSRGesture3Ddataset

Table 2 Recognition accuracies (unit ) comparison of DMM+ ICRC and DCNN + ICRC on the MSRAction3D dataset andMSRGesture3D dataset


DMM + ICRC (proposed) 938 948DCNN + ICRC (proposed) 9999 1000

show that the proposed algorithm performs favorably againstthe state-of-the-art methods including SRC CRC and SVMFuture work will focus on involving the deep-learned net-work in the depth image representation and evaluating morecomplex datasets such as MSR3DActivity UTKinect-Actionand NTU RGB+D for the action recognition task

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

The work was supported by the State Key Laboratory ofCoal Resources and Safe Mining under Contracts SKL-CRSM16KFD04 and SKLCRSM16KFD03 in part by the Nat-ural Science Foundation of China under Contract 61601466and in part by the Fundamental Research Funds for theCentral Universities under Contract 2016QJ04

References

[1] W Li Z Zhang and Z Liu ldquoAction recognition based ona bag of 3D pointsrdquo in Proceedings of the IEEE ComputerSociety Conference on Computer Vision and Pattern RecognitionWorkshops (CVPRW rsquo10) pp 9ndash14 San Francisco Calif USAJune 2010

[2] S Wang Z Ma Y Yang X Li C Pang and A G HauptmannldquoSemi-supervised multiple feature analysis for action recogni-tionrdquo IEEE Transactions on Multimedia vol 16 no 2 pp 289ndash298 2014

[3] X Yang C Zhang and Y Tian ldquoRecognizing actions usingdepth motion maps-based histograms of oriented gradientsrdquo


in Proceedings of the 20th ACM International Conference onMultimedia MM 2012 pp 1057ndash1060 jpn November 2012

[4] P Wang W Li Z Gao J Zhang C Tang and P OgunbonaldquoDeep convolutional neural networks for action recognitionusing depth map sequencesrdquo Computer Vision and PatternRecognition arXiv preprint httpsarxivorgabs1501046862015

[5] W E Vinje and J L Gallant ldquoSparse coding and decorrelationin primary visual cortex during natural visionrdquo Science vol 287no 5456 pp 1273ndash1276 2000

[6] E Candes ldquoCompressive samplingrdquo in Proceedings of the Inter-national Congress of Mathematics vol 3 pp 1433ndash1452 2006

[7] N Guan D Tao Z Luo and B Yuan ldquoNeNMF an optimalgradient method for nonnegative matrix factorizationrdquo IEEETransactions on Signal Processing vol 60 no 6 pp 2882ndash28982012

[8] J-L Starck M Elad and D Donoho ldquoRedundant multiscaletransforms and their application for morphological componentseparationrdquo Advances in Imaging and Electron Physics vol 132pp 287ndash348 2004

[9] J Yang JWright T Huang and YMa ldquoImage super-resolutionas sparse representation of raw image patchesrdquo in Proceedingsof the 26th IEEE Conference on Computer Vision and PatternRecognition pp 1ndash8 June 2008

[10] W Dong L Zhang G Shi and X Wu ldquoImage deblurringand super-resolution by adaptive sparse domain selection andadaptive regularizationrdquo IEEETransactions on Image Processingvol 20 no 7 pp 1838ndash1857 2011

[11] KHuang and S Aviyente ldquoSparse representation for signal clas-sificationrdquo in Proceedings of the NIPS pp 609ndash616 VancouverCanada December 2006

[12] JWright A Y Yang A Ganesh S S Sastry and YMa ldquoRobustface recognition via sparse representationrdquo IEEE Transactionson Pattern Analysis and Machine Intelligence vol 31 no 2 pp210ndash227 2009

[13] M Yang and L Zhang ldquoGabor feature based sparse represen-tation for face recognition with gabor occlusion dictionaryrdquo inProceedings of the 11th European Conference on Computer Vision(ECCV rsquo10) pp 448ndash461 Springer Crete Greece 2010

[14] J Yang K Yu Y Gong and T Huang ldquoLinear spatial pyramidmatching using sparse coding for image classificationrdquo in Pro-ceedings of the IEEE Computer Society Conference on ComputerVision and Pattern Recognition (CVPR rsquo09) pp 1794ndash1801 June2009

[15] Q Zhang and B Li ldquoDiscriminative K-SVD for dictionarylearning in face recognitionrdquo in Proceedings of the IEEE Com-puter Society Conference on Computer Vision and Pattern Rec-ognition (CVPR rsquo10) pp 2691ndash2698 June 2010

[16] B Cheng J Yang S Yan Y Fu and T S Huang ldquoLearningwith l1-graph for image analysisrdquo IEEE Transactions on ImageProcessing vol 19 no 4 pp 858ndash866 2010

[17] A Wagner J Wright A Ganesh Z Zhou and Y Ma ldquoTowardsa practical face recognition system Robust registration andillumination by sparse representationrdquo in Proceedings of the2009 IEEE Computer Society Conference on Computer Visionand Pattern RecognitionWorkshops CVPRWorkshops 2009 pp597ndash604 usa June 2009

[18] J Z Huang X L Huang and D Metaxas ldquoSimultaneousimage transformation and sparse representation recoveryrdquo inProceedings of the 26th IEEEConference onComputer Vision andPattern Recognition (CVPR rsquo08) pp 1ndash8 Anchorage AlaskaUSA June 2008

[19] A Y Yang A Genesh Z Zhou S Sastry and Y Ma ldquoA Reviewof Fast L1-Minimization Algorithms for Robust Face Recogni-tionrdquo Defense Technical Information Center 2010

[20] L Zhang M Yang and X Feng ldquoSparse representation orcollaborative representation Which helps face recognitionrdquo inProceedings of the IEEE International Conference on ComputerVision (ICCV rsquo11) pp 471ndash478 Barcelona Spain November2011

[21] P Berkes B L White and J Fiser ldquoNo evidence for activesparsification in the visual cortexrdquo in Proceedings of the 23rdAnnual Conference on Neural Information Processing SystemsNIPS 2009 pp 108ndash116 can December 2009

[22] J Wang Z Liu J Chorowski Z Chen and Y Wu ldquoRobust 3Daction recognition with random occupancy patternsrdquo in Com-puter VisionmdashECCV 2012 12th European Conference on Com-puter Vision Florence Italy October 7ndash13 2012 ProceedingsPart II pp 872ndash885 Springer Berlin Germany 2012

[23] J Wang Z Liu Y Wu and J Yuan ldquoMining actionlet ensemblefor action recognition with depth camerasrdquo in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition(CVPR rsquo12) pp 1290ndash1297 Providence RI USA June 2012

[24] L Xia and J K Aggarwal ldquoSpatio-temporal depth cuboid sim-ilarity feature for activity recognition using depth camerardquo inProceedings of the 26th IEEEConference onComputer Vision andPattern Recognition (CVPR rsquo13) pp 2834ndash2841 IEEE PortlandOre USA June 2013

[25] R Vemulapalli F Arrate and R Chellappa ldquoHuman actionrecognition by representing 3D skeletons as points in a liegrouprdquo in Proceedings of the 27th IEEE Conference on Com-puter Vision and Pattern Recognition (CVPR rsquo14) pp 588ndash595Columbus Ohio USA June 2014

[26] C Chen M Liu B Zhang J Han J Junjun and H Liu ldquo3Daction recognition using multi-temporal depth motion mapsand fisher vectorrdquo in Proceedings of the 25th International JointConference on Artificial Intelligence IJCAI 2016 pp 3331ndash3337usa July 2016

[27] A N Tikhonov and V Y Arsenin Solutions of Ill-PosedProblems John Wiley amp Sons New York NY USA 1977

[28] httpwwwvlfeatorgmatconvnet[29] Y Yang B Zhang L Yang CChen andWYang ldquoAction recog-

nition using completed local binary patterns and multiple-classboosting classifierrdquo in Proceedings of the 3rd IAPR AsianConference on Pattern Recognition ACPR 2015 pp 336ndash340mys November 2016

[30] A W Vieira E R Nascimento G L Oliveira Z Liu and MF M Campos ldquoOn the improvement of human action recog-nition from depth map sequences using space-time occupancypatternsrdquo Pattern Recognition Letters vol 36 no 1 pp 221ndash2272014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014


Active and Passive Electronic Components

Control Scienceand Engineering

Journal of


International Journal of

RotatingMachinery


Hindawi Publishing Corporation httpwwwhindawicom

Journal of

Volume 201

Submit your manuscripts athttpswwwhindawicom

VLSI Design



Shock and Vibration


Civil EngineeringAdvances in

Acoustics and VibrationAdvances in



Electrical and Computer Engineering

Journal of

Advances inOptoElectronics


Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

SensorsJournal of


Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014


Chemical EngineeringInternational Journal of Antennas and

Propagation




Navigation and Observation



DistributedSensor Networks



As for representative models many achievements basedon space representation include image restoration [5] com-pressive sensing [6 7] morphological component analysis[8] and super-resolution [9 10] As the advances of classifiersbased on representation several pattern recognition prob-lems in the field of computer vision can be effectively solvedby sparse coding or sparse representation methods in recentdecades In particular the linear models can be representedas 119910 = 119860120572 [11] where 119910 120572 and 119860 represent the data a sparsevector and a given matrix with the overcomplete sample setrespectively Because of the great success of sparse codingalgorithms on image processing the sparse representationbased classifiers such as sparse representation classification(SRC) and collaborative representation classification (CRC)have gained more attention nowadays

The basic idea of SRCCRC is to code the test sampleover a set of samples with sparsity constraints which canbe calculated by 1198971-minimization In [12] Wright proposeda basic SRC model for classification by the discriminativenature of sparse representation which is based on the theorythat newly signals are recognized as the linear combinationsof previously observed ones Based on SRC Yang and Zhangproposed aGabor occlusion dictionary based SRC which cansignificantly reduce the computational cost [13] In [14] theauthors combined sparse representation with linear pyramidmatching for image classification Rather than using theentire training set Zhang and Li [15] proposed learneddictionary using SRC In [16] 1198971-graph is constructed bya sparse representation subspace over the other samplesYang et al [14] also proposed a method to preserve the 1198971-graph for image classification by using a subspace to solvemisalignment problems in image classification task BesidesSRC is used for robust illumination [17] image-plane trans-formation [18] and so on However Zhang argued that thegood performance of SRC should be largely attributed to thecollaborative representation of a test sample by training sam-ples across all classes and proposed more effective CRC Insummary SRCCRC simply uses the reconstruction error orresidual by each class-specific subspace to determine the classlabel and many modified models and solution algorithmsto SRCCRC are also proposed for visual recognition tasksincluding Augmented Lagrange Multiplier Proximal Gradi-ent Gradient Projection Iterative Shrinkage-Thresholdingand Homotopy [19] Recently some researchers [20 21] havepointed out the purpose of 1198971-regularized based sparsity inpattern classification On the contrary using 1198972-regularizedbased representation for classification can do a similar job to1198971-regularized but the computational cost will reduce a lot

Motivated by the work of modifications of CRC inthis paper we mainly present the improved collaborativerepresentation classifier (ICRC) based on 1198972-regularized forhuman action recognition Based on three DMMsrsquo descrip-tor feature the ICRC approach is to jointly maximize thelikelihood that a test sample belongs to each of the multipleclasses then the final classification is performed by com-puting the likelihood for each class The experiments onhuman action classification tasks including MSRAction3DandMSRGesture3D datasets are demonstrated and analyzed

on the superior performance of this algorithm over the state-of-the-art methods including SRC CRC and SVMThe restof the paper is organized as follows In Section 2 we introducerelated feature descriptors using DMM Section 3 details theaction classifier based on ICRC and Section 4 shows theexperimental results of our approach on relevant datasetsThe conclusion and acknowledgment are drawn in Section 5and Acknowledgments section


21 Using Depth Motion Maps In this section we explainthe extracted feature descriptor using depth motion maps(DMMs) from depth images which is generated by selectingand stacking motion energy of depth maps projected ontothree orthogonal Cartesian planes aligning with front (119891)side (119904) and top (119905) views (ie 119901119891 119901119904 and 119901119905 resp) As foreach projectedmap itsmotion energy is computed by thresh-olding the difference between consecutive maps The binarymap of motion energy provides a strong clue of the actioncategory being performed and indicates motion regions orwhere movement happens in each temporal interval Wesuggest that all frames should be deployed to calculatemotioninformation instead of selecting frames Considering both thediscriminability and robustness of feature descriptors we usethe 1198971-norm of the absolute difference of a frame to definethe salient information on depth sequences Because 1198971-normis invariant to the length of a depth sequence and 1198971-normcontains more salient information than other norms (eg 1198972)we have

DMM119891119904119905 = 119873minusVsum119894=1

100381610038161003816100381610038161003816119901119891119904119905119894+V minus 119901119891119904119905119894 100381610038161003816100381610038161003816 (1)

where V is the frame interval i represents the frame indexand N is the total number of frames in a depth sequence Inthe case that the sum operation in (1) is only used given athreshold satisfied the scale of V affects little the local patternhistogram on the DMMs

22 Using Deep Convolutional Neural Networks In this sec-tion we introduce three deep convolutional neural networks(DCNNs) to train the features on three projected planes ofDMMs and perform fusion of three nets by combining thesoftmax in fully connected layer The layer configuration ofour three CNNs is schematically shown in Figure 1 in whichthere are five convolutional layers and three fully connectedlayers in each net The detail of our implementation isillustrated in Section 412

3 Action Classifier Based on ICRC

Based on depth motion maps to incorporate the featuredescriptors into a powerful classifier an improved collabora-tive representation classifier (ICRC) is presented for humanaction recognition

31 1198972-Regularized Collaborative Representation ClassifierThebasic idea of SRC is to get a test sample by sparsely choos-ing a small number of atoms fromanovercomplete dictionary





min120572

1003817100381710038171003817119910 minus 119860120572100381710038171003817100381722 + 120579 1205721 st 119910 = 119860120572 (2)



class (119910) = argmin119895

119890119895 (4)


min120572

1003817100381710038171003817119910 minus 119860120572100381710038171003817100381722 + 120582 11987112057222 st 119910 = 119860120572 (5)



120572 = 119875119910(6)


min120572

1003817100381710038171003817119910 minus 119860120572100381710038171003817100381722 + 120582 11987112057222 + 120574119862119862sum119895

10038171003817100381710038171003817119860120572 minus 1198601198951205721198951003817100381710038171003817100381722 st 119910 = 119860120572 (7)




class (119910) = argmax119895

exp (minus 10038171003817100381710038171003817119860120572 minus 1198601198951205721198951003817100381710038171003817100381722) (8)

















Hig

h w

ave

Hor

izon

tal w

ave

Ham

mer

Han

d ca

tch

Forw

ard

punc

hH

igh

thro

wD

raw

XD

raw

tick

Dra

w ci

rcle

Han

d cla

pTw

o-ha

nd w

ave

Side

box

ing

Bend

Forw

ard

kick

Side

kic

kJo

ggin

gTe

nnis

swin

gTe

nnis

serv

eG

olf s

win

gPi

ckup

thro

w

0102030405060708090

100




5 Conclusion


0102030405060708090

100

Z J

Whe

re

Stor

e

Pig

Past

Hun

gry

Gre

en

Fini

sh

Blue

Bath

room Milk








Acknowledgments


References

































RoboticsJournal of





Journal of



RotatingMachinery



Journal of

Volume 201


VLSI Design



Shock and Vibration







Journal of



Volume 2014


SensorsJournal of





Propagation













min120572

1003817100381710038171003817119910 minus 119860120572100381710038171003817100381722 + 120579 1205721 st 119910 = 119860120572 (2)



class (119910) = argmin119895

119890119895 (4)


min120572

1003817100381710038171003817119910 minus 119860120572100381710038171003817100381722 + 120582 11987112057222 st 119910 = 119860120572 (5)



120572 = 119875119910(6)


min120572

1003817100381710038171003817119910 minus 119860120572100381710038171003817100381722 + 120582 11987112057222 + 120574119862119862sum119895

10038171003817100381710038171003817119860120572 minus 1198601198951205721198951003817100381710038171003817100381722 st 119910 = 119860120572 (7)




class (119910) = argmax119895

exp (minus 10038171003817100381710038171003817119860120572 minus 1198601198951205721198951003817100381710038171003817100381722) (8)

















Hig

h w

ave

Hor

izon

tal w

ave

Ham

mer

Han

d ca

tch

Forw

ard

punc

hH

igh

thro

wD

raw

XD

raw

tick

Dra

w ci

rcle

Han

d cla

pTw

o-ha

nd w

ave

Side

box

ing

Bend

Forw

ard

kick

Side

kic

kJo

ggin

gTe

nnis

swin

gTe

nnis

serv

eG

olf s

win

gPi

ckup

thro

w

0102030405060708090

100




5 Conclusion


0102030405060708090

100

Z J

Whe

re

Stor

e

Pig

Past

Hun

gry

Gre

en

Fini

sh

Blue

Bath

room Milk








Acknowledgments


References

































RoboticsJournal of





Journal of



RotatingMachinery



Journal of

Volume 201


VLSI Design



Shock and Vibration







Journal of



Volume 2014


SensorsJournal of





Propagation
























Hig

h w

ave

Hor

izon

tal w

ave

Ham

mer

Han

d ca

tch

Forw

ard

punc

hH

igh

thro

wD

raw

XD

raw

tick

Dra

w ci

rcle

Han

d cla

pTw

o-ha

nd w

ave

Side

box

ing

Bend

Forw

ard

kick

Side

kic

kJo

ggin

gTe

nnis

swin

gTe

nnis

serv

eG

olf s

win

gPi

ckup

thro

w

0102030405060708090

100




5 Conclusion


0102030405060708090

100

Z J

Whe

re

Stor

e

Pig

Past

Hun

gry

Gre

en

Fini

sh

Blue

Bath

room Milk








Acknowledgments


References

































RoboticsJournal of





Journal of



RotatingMachinery



Journal of

Volume 201


VLSI Design



Shock and Vibration







Journal of



Volume 2014


SensorsJournal of





Propagation






































RoboticsJournal of





Journal of



RotatingMachinery



Journal of

Volume 201


VLSI Design



Shock and Vibration







Journal of



Volume 2014


SensorsJournal of





Propagation









improved collaborative representation classifier based on 2...

Documents