www.ietdl.org
Published in IET Biometrics. Received on 1st February 2013; revised on 16th May 2013; accepted on 20th June 2013. doi: 10.1049/iet-bmt.2013.0019
© The Institution of Engineering and Technology 2014
ISSN 2047-4938
Pose-invariant face recognition using curvelet neural network

Poonam Sharma1, Ram N. Yadav2, Karmveer V. Arya3
1Department of Computer Science and Engineering, Madhav Institute of Technology and Science, Gwalior, India
2Department of Electronics and Communication Engineering, Maulana Azad National Institute of Technology, Bhopal, India
3Department of Information and Communication Technology, ABV-Indian Institute of Information Technology and Management, Gwalior, India
E-mail: [email protected]
Abstract: A novel pose-invariant face recognition method is proposed by combining curvelet-invariant moments with a curvelet neural network. First, a special set of statistical coefficients using higher-order moments of the curvelet transform is extracted as the feature vector, and the invariant features are then fed into a curvelet neural network. Finally, supervised invariant face recognition is achieved by converging the neural network using the curvelet as the activation function of the hidden-layer neurons. The experimental results demonstrate that curvelet higher-order moments and curvelet neural networks achieve higher accuracy for face recognition across pose and converge more rapidly than standard back-propagation neural networks.
1 Introduction
Face recognition has been studied extensively by a number of researchers, as it is more natural and passive than other biometric techniques. Most of the existing face recognition methods work efficiently on frontal faces [1–4] and are able to achieve low error rates in well-controlled environments. Being non-intrusive, a face recognition system must be able to identify a face in an uncontrolled environment, unnoticed and without cooperation from the subject. However, a recent literature survey [5] shows that pose variation remains a prominent unsolved problem in the development of real-time automatic face recognition systems. The performance of face recognition is seriously affected by variation of the test image from the frontal image. The difficulty in face recognition across different poses stems from the fact that within-class variance dominates between-class scatter of the data, resulting in loss of discriminative feature data. As pose variation is linear in three dimensions (3D) but non-linear in two dimensions (2D), good results can be obtained by developing a 3D model of the subject, but this has the drawbacks of requiring precise registration and exhaustive computation. Accordingly, pose-invariant face recognition methods are divided into two groups, namely 2D-based methods and 3D-based methods. 2D methods are more attractive, as the addition of a new subject to the database is easy, computation is simpler and the memory requirement is lower. Most of the recently developed 2D methods are again divided into two groups. The first set of methods is based on the synthesis of different poses from the frontal face, where the synthesised faces act as the gallery images, or a 3D face can be generated. Different synthetic poses are generated by finding a mapping function between the gallery frontal pose and the gallery non-frontal poses. Another set of 2D methods directly models the discriminative features in the face arising from pose, across the same subject and among different subjects. A number of methods have been proposed to model the local appearance change, based on the assumption that the appearance variation among different subjects with the same pose is always larger than the appearance variation among different poses of the same subject. However, this is not applicable to large pose variation, where the appearance variation among different poses of the same subject is significant compared with the appearance variation among the same pose of different subjects. These problems can be overcome by combining the global features and the local features of the face image. In the proposed method, the global features are obtained by generating multiresolution curvelets. The motivation behind using curvelets is their similarity to the human visual system (HVS). The HVS has spatial-frequency subband decomposition as a key feature and recognises images by scale, position and orientation variation. The curvelet is a multiresolution tool with variation in scale, position and orientation, and the adaptive nature of the curvelet transform makes it even more similar to the HVS. Each curvelet subband is then divided, on a grid in polar coordinates, into a central region and eight radial sectors. The local features are then considered by generating invariant higher-order moments of the different regions. Higher-order moments are used to counteract the drawback of the curvelet being global, and also to reduce the dimensionality of the feature vector without reducing its discriminatory power. The results are applied to a curvelet neural network for classification that uses the curvelet as the activation function. The curvelet neural network also closely
IET Biom., 2014, Vol. 3, Iss. 3, pp. 128–138doi: 10.1049/iet-bmt.2013.0019
resembles the multiresolution properties of visual neurons. A neural network with an activation function closer to the HVS provides higher efficiency and quicker convergence compared with basic neural networks [6–8].
The rest of this paper is organised as follows. In Section 2, we describe the related work in the field of face recognition across pose. Section 3 describes the curvelet transform, curvelet-invariant moments and curvelet neural networks. A detailed comparative analysis of experimental results is presented in Section 4. Section 5 concludes the paper.
2 Related work
The state-of-the-art methods for face recognition under varying poses may be divided into three categories. The first category is based on transforming an input image to the same pose as stored in the database. The second category of methods is based on deforming a generic face model to fit the input image. The third category is based on generating a feature vector from all views and then generating a common feature vector that is a combination of the feature vectors of all views. Pentland et al. [9] proposed a view-based method for recognition under variable pose. In this method, recognition is done on the basis of the eigenvectors of each view space and the distance from face space. This method has the drawback that it requires a large number of gallery images in different poses. Cootes et al. [10] proposed a view-based active appearance model (AAM), assuming that 2D statistical models can capture the facial features from any viewpoint; however, the experimentation was done on face tracking only. Gross et al. [11] proposed an eigen light fields (ELF) method to tackle the pose problem. In this method, the ELF of the subject's head is calculated from the gallery image, and gallery and probe images are then compared on the basis of the ELF. This method requires a large number of gallery images, the calculation of the ELF is difficult, and the feature vector is calculated by concatenating the normalised image vectors of different gallery images of the same subject. Chai et al. [12] presented an affine transformation based on statistical analysis. The face is divided into three rectangular regions, and the affine transformation of the rectangular regions under different poses is calculated and used for recognition of the face across pose. Although the results improved compared with other existing methods, the recognition rate is still low, and there is no method for automatic marking of face landmarks. Prince et al. [13] proposed a generative model that generates different poses from an identity space. An expectation-maximisation algorithm is used for estimating the linear transformation and noise data from training data. This method is probabilistic and provides a posterior probability for matching to a gallery, and it describes how underlying pose-invariant data generates pose-varying data. It gives good results for face recognition in constrained environments, but the results degrade in unconstrained or real-time environments. Shan et al. [14] presented an extension of the adaptive principal component analysis (APCA) method. They developed a face model and a rotation model that are used to generate the feature vector and synthesise the frontal face from a non-frontal face. An AdaBoost cascade face detector is used to reduce the problem of initialisation of the AAM. The AAM is used for developing the rotation model and pose estimation. APCA is used for dimensionality reduction and is insensitive to illumination
and expression change for face recognition. Sarfraz and Hellwich [15] developed a pose-invariant method that does not require perfect alignment between the gallery and the probe image. It models the approximated joint probability distribution of the gallery and the probe images at different poses. Features are extracted using gradient location-orientation histogram signatures, and features are synthesised from non-frontal to frontal views. Chai et al. [16] developed a local linear regression (LLR)
method for generating a virtual frontal face from a non-frontal face image. They used the fact that local facial regions better correspond to the local regions of the non-frontal face. Thus, the face is divided into multiple local patches, and linear regression is applied to each patch to obtain the corresponding patch of the frontal face. The recognition rate is better compared with global linear regression and the ELF method, but the image is treated as piecewise linear. Choi et al. [17] proposed pose- and illumination-invariant face recognition, where pose is estimated based on the 2D image and a classification rule is used to classify the pose of a face image. Shadow compensation is obtained after determining the light direction, and the feature is extracted by applying null-space linear discriminant analysis. Classification is then done using the nearest-neighbour rule. However, the dimensionality reduction technique reduces the efficiency. Wang et al. [18] presented face recognition across pose by considering a probe image with a different pose from the gallery images, represented by a linear combination of the gallery images. They proved that the orthogonal discriminant vector (ODV) is a pose-invariant feature. The ODVs of each subject are generated such that a subject possesses zero projection with reference to its own images and maximum projection with reference to other subjects. Classification is then done using a distance metric. However, the results degrade with large pose variations. Moallem et al. [19] proposed a fuzzy rule-based system for pose-independent face detection. Skin colour, lip position, face shape information and ear texture are fed to a fuzzy rule-based classifier to extract face candidates. The threshold on face candidates is optimised by using a genetic algorithm. It works well for slight variations in pose. They used geometric-moment-based ear texture classification to verify its outcomes. Mohammad et al. [7] developed face recognition based on multidimensional principal component analysis (PCA).
An extreme learning (EL) machine is used to classify the subjects. The face image is decomposed using the curvelet transform, and the subband that exhibits the maximum standard deviation is selected. Dimensionality reduction is then done using PCA, and the result is applied to the EL machine for classification. This method gives faster and more efficient recognition compared with other existing methods, but is dependent on the variation in gallery images, the scales of the curvelets and the number of hidden neurons. Huang et al. [20] proposed a pose-invariant face
recognition approach by generating view-specific eigenfaces for each view. The feature coefficients extracted for the eigenfaces of each view are used to train view-specific neural networks. A second layer of neural networks is then used to combine the decisions obtained from the first layer. The efficiency tends to reduce as the number of eigenfaces is reduced. Singh et al. [21] proposed a hierarchical registration method using an affine transformation and a mutual-information-based registration method. The mosaicing is done by mask generation, stitching and blending using Laplacian and Gaussian pyramids, and classification is done using a support vector machine (SVM)
classifier. Arashloo et al. [22] proposed recognition of faces in arbitrary pose by using the energy of the established match between a pair of images as the matching criterion. It uses a multiresolution approach based on the super-coupling transform to establish pixel-wise correspondences. The feature vector is obtained by applying PCA, and a nearest-neighbour classifier is used for classification. The efficiency is low when eye coordinates are used for texture comparison, as a single gallery image is used for training. Zhang et al. [23] proposed a pose-robust face recognition method using the sparse property of the representation coefficients of a face image over its corresponding view, which is invariant to pose. This method models the latent identity space via sparse coefficients and formulates the mapping from non-frontal to frontal faces as a non-linear mapping by solving a sparsity-regularised optimisation problem. The drawback of the method is that the pose of the probe image must be known as prior knowledge. Lu et al. [24] proposed face recognition via weighted sparse representation and integrated data linearity and locality, with better results in a lower-dimensional subspace. The efficiency is very low for lower dimensions. Ma et al. [25] proposed head pose estimation by combining biologically inspired features with local binary patterns (LBP) after applying a Gabor filter on the training image. PCA is used to reduce the dimensionality of the local biologically inspired features. Nearest-neighbour and SVM classifiers are used for classification. However, the dimensionality reduction reduces the efficiency. Meshgini et al. [26] proposed face recognition using a combination of Gabor wavelets, direct linear discriminant analysis (DLDA) and SVM. The feature vector is extracted using DLDA, and SVM is used for classification with a hyperhemispherically normalised polynomial as the kernel function. The combination of DLDA and Gabor filters generates pose-invariant features, but the dimensionality reduction reduces the efficiency.
Singh and Sahan [27] proposed face recognition using a combination of global and local features. Wavelet moments were combined with wavelet invariants to enhance the performance. Wavelet moments are image descriptors that include both global and local characteristics of the image and whose magnitude is invariant to image rotation. Sharma et al. [6] proposed a face recognition method using a combination of Gabor wavelets and LBP for feature extraction. The feature vector generated is classified using a generalised mean neural network, which proved to be a better classification method compared with SVM. The efficiency of that method is very high for slight variations in pose and illumination, but reduces with large pose variation.

3 Curvelet transform
3.1 Basic fast discrete curvelet transform
The curvelet transform [28–30], developed by Candes and Donoho, is a multiresolution tool with the capability of sparser representation of images compared with the wavelet transform. For facial images, the lines and curves are the main features. Face recognition using the curvelet transform [28] was first proposed to highlight the curved singularities in images and proved better than the wavelet transform in terms of efficiency. The curvelet better characterises facial images, as it can capture intrinsic geometrical structures such as edges in the face image. Wavelets can only capture point
discontinuities, whereas curvelets can capture linear segments of contours. After applying the curvelet transform, the image is converted to coefficients of low frequency and high frequency in matrix form. The low-frequency component contains the approximation of the face image, while the high-frequency component contains the detailed information of the curves. The low-frequency coefficients, also known as curvefaces, contain the most significant information of faces and are crucial for face recognition. The curvelet transform is a combination of subband
decomposition, smooth partitioning, renormalisation and ridgelet analysis [29]. The curvelet transform directly takes edges as the basic representation elements and is anisotropic with strong directionality, so it is useful for representing the edges of images efficiently. In a 2D image with a number of edges, the curvelet transform is used to capture the edge information. To form an efficient feature set, it is crucial to collect this edge information, which in turn increases the discriminatory power of a recognition system. The discrete curvelet coefficients defined by Candes and
Donoho [28] are given as:
C^D(j, l, k_1, k_2) = \sum_{0 \le m \le M} \sum_{0 \le n \le N} f[m, n]\, \phi^D_{j,l,k_1,k_2}[m, n]   (1)
Here, each \phi^D_{j,l,k_1,k_2}[m, n] is a digital curvelet waveform. The 2D moment of order (p + q) of the digital image f(x, y) of size M × N is defined as
m_{pq} = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} x^p y^q f(x, y)   (2)
The magnitude of m_{pq} remains invariant even if the image rotates. We consider the curvelet family function in polar form as follows

C_{ab}^{pq}(r) = \frac{1}{\sqrt{a}}\, C_{pq}\!\left(\frac{r - b}{a}\right)   (3)

where a is a dilation parameter and b is a shifting parameter. The mother curvelet C_{pq} is a higher-order moment of the curvelet, and C_{ab}^{pq}(r) gives the invariant moments of the face image. The approximate band and detailed bands obtained after applying the curvelet transform to a sample image are shown in Fig. 1.
3.2 Curvelet-invariant higher-order moments
It is observed from the literature review that face recognition requires both global and local features to effectively represent the entire face information, and that most methods suffer from high computational complexity. Wavelet moments were used by Singh and Sahan [27], since they combine the characteristics of multiresolution and invariance and also reduce the computational complexity. Wavelet moments have been used in image-processing applications and proved better than radial moment invariants. However, they suffer from the drawback of ignoring higher-order moments, which also include important facial information. The curvelet is a multiresolution tool with better directionality, optimal approximation rate, easy implementation and non-redundancy compared with the wavelet transform. Moment-based LBP [31] has also been used for invariant pattern recognition. Thus, curvelet-based
Fig. 1 Approximate and Detailed subband for an example image
moments have been used in the proposed method to overcome the drawbacks of wavelet moments. The magnitudes of moments and of curvelet moments for sample images with varying pose are shown in Figs. 2a and b. It is clear from the figure that the curvelet moments are invariant to pose variation. Statistical moments that are invariant to pose and illumination can be used for face recognition. The 2D moment of order (p + q) of the digital image f(x, y) of size M × N is defined as [32]
m_{pq} = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} x^p y^q f(x, y)   (4)
The corresponding central moment of order (p + q) is defined as

\mu_{pq} = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} (x - \bar{x})^p (y - \bar{y})^q f(x, y)   (5)

where

\bar{x} = \frac{m_{10}}{m_{00}} \quad \text{and} \quad \bar{y} = \frac{m_{01}}{m_{00}}

Fig. 2 For different poses of a sample image from the CMU-PIE database: a Magnitude of moments; b Magnitude of curvelet moments
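The raw moments in (4) and the central moments in (5) are straightforward to compute directly. The following is a minimal NumPy sketch (the function names are ours, not from the paper), treating the row index as x and the column index as y:

```python
import numpy as np

def raw_moment(img, p, q):
    """Raw 2D moment m_pq of eq. (4): sum of x^p * y^q * f(x, y)."""
    M, N = img.shape
    x = np.arange(M).reshape(-1, 1)  # row index treated as x
    y = np.arange(N).reshape(1, -1)  # column index treated as y
    return float(np.sum((x ** p) * (y ** q) * img))

def central_moment(img, p, q):
    """Central moment mu_pq of eq. (5), about the centroid (x_bar, y_bar)."""
    m00 = raw_moment(img, 0, 0)
    x_bar = raw_moment(img, 1, 0) / m00
    y_bar = raw_moment(img, 0, 1) / m00
    M, N = img.shape
    x = np.arange(M).reshape(-1, 1)
    y = np.arange(N).reshape(1, -1)
    return float(np.sum(((x - x_bar) ** p) * ((y - y_bar) ** q) * img))

img = np.ones((4, 4))             # uniform 4x4 test image
print(central_moment(img, 1, 0))  # first central moments vanish by construction
```

For any image, \mu_{10} and \mu_{01} are zero by construction, which gives a quick sanity check on an implementation.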
The six statistical moments used in the proposed method include the mean, standard deviation, a measure of smoothness, skewness, a measure of uniformity and randomness. The mean intensity, which estimates the value around which intensities cluster for one dimension (1D), is
Fig. 3 Curvelet neural network
given by

m = \sum_{i=0}^{L-1} z_i\, p(z_i)   (6)
where z_i is a random variable indicating intensity, p(z_i) is the histogram of the intensity levels in the region and L is the number of possible intensity levels. The standard deviation is a measure of the average contrast for 1D and is given as

\sigma = \sqrt{\mu_2(z)}   (7)
A measure of the relative smoothness of the intensity in the region for 1D is given by

R = 1 - \frac{1}{1 + \sigma^2}   (8)
The third moment, which measures the skewness of a histogram for 1D, is given by

\mu_3 = \sum_{i=0}^{L-1} (z_i - m)^3\, p(z_i)   (9)
Fig. 4 Block diagram of the proposed method
Uniformity measures the uniformity of the intensity in the histogram for 1D and is given by

U = \sum_{i=0}^{L-1} p^2(z_i)   (10)
A measure of randomness for 1D is given by

e = -\sum_{i=0}^{L-1} p(z_i) \log_2 p(z_i)   (11)
The statistical features for 2D can be calculated for eachcombination of scale and orientation using (4) and (5).
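The six statistics of (6)–(11) can all be read off the normalised intensity histogram of a region. The sketch below is a minimal illustration, not the authors' code; the function name and the 256-level quantisation are our assumptions:

```python
import numpy as np

def histogram_stats(region, levels=256):
    """Six statistical moments of eqs. (6)-(11) for one region/subband.

    Returns mean, standard deviation, smoothness R, third moment
    (skewness), uniformity U and entropy e, all computed from the
    normalised intensity histogram p(z_i)."""
    hist, _ = np.histogram(region, bins=levels, range=(0, levels))
    p = hist / hist.sum()                 # p(z_i): normalised histogram
    z = np.arange(levels, dtype=float)    # intensity levels z_i
    m = np.sum(z * p)                     # eq. (6): mean
    mu2 = np.sum((z - m) ** 2 * p)        # second central moment
    sigma = np.sqrt(mu2)                  # eq. (7): standard deviation
    R = 1.0 - 1.0 / (1.0 + sigma ** 2)    # eq. (8): smoothness
    mu3 = np.sum((z - m) ** 3 * p)        # eq. (9): skewness (third moment)
    U = np.sum(p ** 2)                    # eq. (10): uniformity
    nz = p[p > 0]
    e = -np.sum(nz * np.log2(nz))         # eq. (11): entropy (randomness)
    return m, sigma, R, mu3, U, e

region = np.full((8, 8), 100)  # constant region: sigma = 0, U = 1, e = 0
print(histogram_stats(region))
```

A constant region is a useful sanity check: its standard deviation, smoothness, skewness and entropy are all zero, while its uniformity is one.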
3.3 Curvelet neural network
The neural network is a long-established technique, but the generalised mean neural network is a recent advancement that has shown good results for different applications of signal processing [33] and image processing [6]. It has been observed that, having a non-linear aggregation function, the generalised mean neuron enhances the classification capability and reduces the convergence time of the neural network when used for image-processing applications. The curvelet neural network, applied here for classification, is a kind of multilayer feedforward network based on curvelet analysis. The curvelet neural network has a higher function-learning capability and better directionality and anisotropy characteristics than the Gabor wavelet neural network [6–8]. The curvelet neural network has a linear transform function as the input-layer excitation function, the curvelet function as the hidden-layer excitation function and a sigmoid function as the output-layer excitation function.
Suppose the nth sample input is
X^n = \{x_i^n\}, \quad i = 0, 1, 2, \ldots, L

Fig. 5 Feature extraction using higher-order moments of the curvelet transform

The network output is

Y^n = \{y_k^n\}, \quad k = 0, 1, 2, \ldots, S

and the expected output is

D^n = \{d_k^n\}, \quad k = 0, 1, 2, \ldots, S, \quad n = 0, 1, 2, \ldots, N
Let N be the number of samples, L the number of input-layer units and S the number of output-layer units. The curvelet neural network has the connections shown in Fig. 3. The weight coefficient between the jth hidden neuron and the ith input unit is W_{ij}, and the weight coefficient between the kth output neuron and the jth hidden neuron is V_{jk}. The input to the jth hidden neuron is given by
net_j^n = \left[\sum_{i=1}^{L} W_{ij} x_i^n + W_{0j}\right]^{1/n}   (12)
where n (n \in \mathbb{R}^+) is the generalisation parameter, which gives the various means (arithmetic mean, geometric mean and harmonic mean) depending on its value. The output of the hidden-layer units is given by

C_{ab}^{pq}(net_j^n) = \frac{1}{\sqrt{a}}\, C_{pq}\!\left(\frac{net_j^n - b}{a}\right)   (13)
Fig. 6 Histograms: a Approximate band; b Detailed subband
The output of the kth unit in the output layer is given by

Y_k^n = \sigma\!\left(\left[\sum_{j} V_{jk}\, C_{ab}^{pq}(net_j^n) + V_{0k}\right]^{1/n}\right)   (14)
where \sigma(x) = 1/(1 + e^{-x}) is the sigmoid function. The mean-square error (MSE) can be calculated as follows

E = \frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{S} \left(Y_k^n - D_k^n\right)^2   (15)
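A minimal forward pass through the network of (12)–(15) can be sketched as follows. Since the paper does not fix the mother curvelet C_pq or the weights numerically, a Mexican-hat-style stand-in and small random weights are assumed here, and an abs() guards the fractional power of (14) in this toy version:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions: L inputs, H hidden neurons, S outputs (sizes assumed for illustration).
L, H, S = 6, 4, 3
W = rng.uniform(0.0, 1.0, (L, H))   # input-to-hidden weights W_ij
W0 = rng.uniform(0.0, 1.0, H)       # hidden biases W_0j
V = rng.uniform(0.0, 1.0, (H, S))   # hidden-to-output weights V_jk
V0 = rng.uniform(0.0, 1.0, S)       # output biases V_0k
a, b = 1.0, 0.0                     # dilation and shift of eq. (13)
r = 2.0                             # generalisation parameter of eq. (12)

def mother(t):
    # Stand-in mother function (Mexican-hat shape); the paper's C_pq is not
    # specified numerically, so this is purely an assumption for the sketch.
    return (1 - t ** 2) * np.exp(-t ** 2 / 2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x):
    net = (x ** r @ W + W0) ** (1.0 / r)      # eq. (12): generalised-mean input
    h = mother((net - b) / a) / np.sqrt(a)    # eq. (13): curvelet activation
    # eq. (14): sigmoid output; abs() keeps the fractional power real here
    return sigmoid(np.abs(h @ V + V0) ** (1.0 / r))

x = rng.uniform(0.0, 1.0, L)
y = forward(x)
d = np.zeros(S); d[0] = 1.0         # one-hot expected output
mse = 0.5 * np.sum((y - d) ** 2)    # eq. (15) for a single sample
print(y.shape, mse >= 0.0)
```

The generalised-mean aggregation here interprets the exponent as acting on the inputs, one common reading of (12); the paper overloads n as both sample index and generalisation parameter, so this choice is an assumption.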
The network can be trained by the gradient descent rule as follows

\frac{\partial E}{\partial W_{ij}} = \frac{\partial E}{\partial y}\frac{\partial y}{\partial W_{ij}} = -\sum_{k=1}^{S} \left(Y_k - D_k\right) C_{ab}^{pq}(net_j^n)   (16)

\frac{\partial E}{\partial V_{jk}} = \frac{\partial E}{\partial y}\frac{\partial y}{\partial V_{jk}} = -\sum_{k=1}^{S} \left(Y_k - D_k\right) x_i^n   (17)

\frac{\partial E}{\partial b_{ij}} = \frac{\partial E}{\partial y}\frac{\partial y}{\partial C_{ab}^{pq}(net_j^n)}\frac{\partial C_{ab}^{pq}(net_j^n)}{\partial b_{ij}} = -\sum_{k=1}^{S} \left(Y_k - D_k\right) Y_k^n \left(1 - Y_k^n\right) V_{jk} Q_{ij}\left(1 - \sum_{i=1}^{n} Q_{ij}^2\right)   (18)
Table 1 Comparison of different state-of-the-art methods in training time and classification rate for the LFW, FERET and CMU-PIE databases (each cell lists the values for the 35 / 73 / 57 sub-columns of the original table)

S.No. | Method | LFW training time, s | LFW recognition rate, % | FERET training time, s | FERET recognition rate, % | CMU-PIE training time, s | CMU-PIE recognition rate, %
1 | BP neural network | 10.6 / 15.1 / 41.2 | 56.2 / 61.3 / 63.9 | 7.8 / 13.1 / 32.2 | 76.2 / 81.3 / 89.9 | 7.67 / 14.16 / 35.2 | 75.2 / 83.3 / 90.23
2 | wavelet neural network | 10.2 / 12.6 / 26.5 | 74.6 / 76.8 / 79.3 | 7.71 / 12.3 / 24.3 | 79.6 / 84.8 / 91.3 | 7.21 / 13.30 / 32.6 | 78.3 / 82.3 / 90.6
3 | ridgelet neural network | 9.8 / 11.8 / 24.3 | 77.9 / 79.8 / 82.3 | 7.56 / 12.1 / 22.7 | 85.9 / 89.8 / 92.3 | 7.16 / 12.67 / 31.7 | 83.7 / 88.4 / 91.56
4 | ridge + Gabor [37] | 10.1 / 11.9 / 23.9 | 78.7 / 79.9 / 83.4 | 7.81 / 11.9 / 21.0 | 86.3 / 89.9 / 92.7 | 7.18 / 12.55 / 28.9 | 85.4 / 91.7 / 92.3
5 | robust regression [38] | 11.1 / 12.5 / 25.4 | 79.3 / 80.6 / 83.7 | 8.31 / 12.7 / 23.8 | 87.5 / 91.6 / 93.1 | 8.12 / 13.8 / 34.5 | 87.2 / 92.1 / 93.33
6 | PLS + Gabor [37] | 10.5 / 11.8 / 24.1 | 81.2 / 83.7 / 86.4 | 7.64 / 11.82 / 20.9 | 89.2 / 93.8 / 95.2 | 7.16 / 12.40 / 28.2 | 88.5 / 92.4 / 95.67
7 | proposed method | 9.6 / 11.6 / 23.7 | 82.3 / 84.7 / 87.4 | 7.32 / 11.8 / 18.7 | 92.3 / 94.7 / 98.4 | 7.02 / 12.40 / 26.3 | 92.6 / 95.3 / 98.77
\frac{\partial E}{\partial a_j} = \frac{\partial E}{\partial y}\frac{\partial y}{\partial C_{ab}^{pq}(net_j^n)}\frac{\partial C_{ab}^{pq}(net_j^n)}{\partial a_j} = -\sum_{k=1}^{S} \left(Y_k - D_k\right) Y_k^n \left(1 - Y_k^n\right) V_{jk} Q_{ij}^3 \left(1 - \sum_{i=1}^{n} Q_{ij}^2\right)   (19)
Thus, the parameters are updated using the following equations

W_{ij}(t + 1) = W_{ij}(t) - \eta \frac{\partial E}{\partial W_{ij}} + \alpha \Delta W_{ij}(t)   (20)

V_{jk}(t + 1) = V_{jk}(t) - \eta \frac{\partial E}{\partial V_{jk}} + \alpha \Delta V_{jk}(t)   (21)

b_{ij}(t + 1) = b_{ij}(t) - \eta \frac{\partial E}{\partial b_{ij}} + \alpha \Delta b_{ij}(t)   (22)

a_j(t + 1) = a_j(t) - \eta \frac{\partial E}{\partial a_j} + \alpha \Delta a_j(t)   (23)
where η and α are the learning rate and momentum, respectively. The learning rate and momentum have a significant effect on the learning speed and stability of the neural network. A large value of η results in faster convergence but may lead to instability and divergence, whereas a small value of η is stable but leads to slower convergence. Thus, an adaptive learning rate has been used in the proposed algorithm: if the error reduces, the learning rate is increased, and vice versa. The momentum coefficient α reduces high-frequency weight changes and provides stability and fast learning. Momentum values are kept high to improve the training characteristics.
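The update rule of (20), combined with the adaptive learning-rate policy just described, can be sketched on a toy quadratic error surface. The adaptation factors and clamping bounds below are assumed values, not taken from the paper:

```python
def train_step(w, grad, velocity, eta, alpha):
    """One application of eq. (20): w(t+1) = w(t) - eta*dE/dw + alpha*dw(t)."""
    delta = -eta * grad + alpha * velocity
    return w + delta, delta

def adapt_eta(eta, err, prev_err, up=1.05, down=0.7, lo=0.01, hi=1.0):
    # Adaptive rule from the text: increase eta while the error falls, cut it
    # when the error grows. The factors and clamping bounds are assumed values.
    eta = eta * up if err < prev_err else eta * down
    return min(max(eta, lo), hi)

# Toy quadratic error surface E(w) = 0.5*(w - 3)^2, so dE/dw = w - 3.
w, v, eta, alpha = 0.0, 0.0, 0.1, 0.5
prev_err = float("inf")
for _ in range(200):
    grad = w - 3.0
    w, v = train_step(w, grad, v, eta, alpha)
    err = 0.5 * (w - 3.0) ** 2
    eta = adapt_eta(eta, err, prev_err)
    prev_err = err
print(round(w, 2))  # w approaches the minimiser 3
```

The momentum term converts the update into a damped second-order system, which is why the error can rise on individual steps even while the overall trajectory converges; the adaptive rule then briefly cuts η.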
4 Proposed methodology
The block diagram of the proposed method is given in Fig. 4. The proposed method for pose-invariant face recognition using a curvelet neural network is given below.
Algorithm 1
Step 1: Extract the face and normalise it to 112 × 92 pixels.
Step 2: Apply the curvelet transform on the normalised face image. Select the detailed subbands, as they are invariant to changes in illumination and pose.
Step 3: Divide the subbands into radial sections with a centre region and neighbouring regions (Fig. 5). Obtain the curvelet-invariant moments of the subbands using (4)–(11) to form the feature vector for classification (Fig. 1).
Step 4: Feed the feature vectors from the curvelet as input to the curvelet neural network. The number of input nodes is equal to the size of the feature vector; the number of output neurons is equal to the number of subjects to be recognised; the number of hidden neurons is equal to the number of clusters.
Step 5: Fix the initial values of the translation and dilation parameters based on the centres and widths of the clusters.
Step 6: Initialise all weights W_{ij} and V_{jk} to random values between 0 and 1.
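Step 3's division of a subband into a central region and eight radial sectors can be sketched with boolean masks in polar coordinates. This is an illustrative sketch only; the central-region radius fraction is an assumed value, as the paper does not specify it:

```python
import numpy as np

def radial_sectors(subband, n_sectors=8, core_frac=0.25):
    """Split a subband into a central region and n_sectors radial sectors
    (Step 3), using polar coordinates about the subband centre.

    Returns a list of boolean masks: masks[0] is the central region and
    masks[1..n_sectors] are the radial sectors. core_frac is an assumed
    radius fraction for the central region."""
    h, w = subband.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - cy, xx - cx)                       # radius of each pixel
    theta = np.mod(np.arctan2(yy - cy, xx - cx), 2 * np.pi)  # angle in [0, 2*pi)
    core = r <= core_frac * min(h, w) / 2.0
    masks = [core]
    for s in range(n_sectors):
        lo = s * 2 * np.pi / n_sectors
        hi = (s + 1) * 2 * np.pi / n_sectors
        masks.append(~core & (theta >= lo) & (theta < hi))
    return masks

masks = radial_sectors(np.zeros((64, 64)))
print(len(masks))  # 9 regions: one central region plus eight sectors
```

Each pixel falls in exactly one region, so the per-region statistics of (6)–(11) partition the subband without overlap.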
Table 2 Recognition accuracy (in %) of different methods with different views for the CMU-PIE database

Views | MSM [39] | Graph [39] | SRC [39] | JSRC [39] | JDSRC [39] | Proposed method
1 | 36.5 | 44.5 | 45.0 | 45.0 | 45.0 | 44.7
3 | 48.9 | 63.4 | 59.5 | 53.6 | 72.0 | 74.3
5 | 52.5 | 72.0 | 62.2 | 55.0 | 82.3 | 83.6
7 | 55.9 | 76.5 | 63.3 | 51.4 | 84.5 | 88.3
Table 3 Recognition accuracy (in %) of different methods with different dimensionality for the CMU-PIE database

Dimensionality | MSM [39] | Graph [39] | SRC [39] | JSRC [39] | JDSRC [39] | Proposed method
32 | 0 | 62.6 | 47.7 | 34.6 | 68.8 | 69.1
64 | 52.0 | 73.9 | 63.2 | 54.7 | 82.1 | 84.3
128 | 68.8 | 75.4 | 68.5 | 67.9 | 83.4 | 84.1
256 | 74.3 | 77.7 | 69.3 | 74.1 | 85.9 | 86.7
Table 4 Recognition accuracy (in %) of different state-of-the-art methods for the FERET, CMU-PIE and LFW databases

S.no. | Method | CMU-PIE (average of all 13 images) | FERET (average of all sets) | LFW
1 | eigenface [37] | 16.6 | — | —
2 | ridge + intensity [37] | 88.24 | 81.6 | —
3 | local linear regression [37] | 94.6 | 68.1 | 72.6
4 | ridge + Gabor [37] | 90.9 | 92.6 | —
5 | PLS + Gabor [37] | 89.5 | — | —
6 | local ridge regression [40] | 90.8 | — | 70.2
7 | ortho discriminant vector [18] | 85.5 | 92 | —
8 | probabilistic learning [15] | 80.7 | — | —
9 | latent SDA [40] | 90.08 | — | 72.4
10 | eigenspace + neural network [20] | — | — | 59.0
11 | mosaicing [21] | 94.76a | — | —
12 | multiresolution MRF [22] | 94.1 | 89 | —
13 | Gabor + LDA + SVM [26] | 94.0 | 86 | —
14 | LLR [16] (for 23°) | 93.5 | — | —
15 | LLR [16] (for 45°) | 89.7 | — | —
16 | appearance based + light field [11] | 78.8 | — | —
17 | TFA [13] (for 23°) | 100b | — | —
18 | TFA [13] (for 65°) | 91 | — | —
19 | 3 ODV [18] | 91.16 | — | —
20 | 2DPCA + ELM (very low pose variation) [7] | — | 99.78c | —
21 | Gabor + LBP + GMN [6] | 93.3 | 98.56d | 86.9d
22 | complex moment [27] | — | 98.0e | —
23 | proposed method | 94.9 | 96.6 | 76.5

a Results are calculated on the same platform by simulating the algorithm in [21].
b Result is only for the 23° pose and 65° pose [13].
c Pose variation or the set of the database has not been explicitly declared in [7]. It seems to be less than 10°, because the results are comparable with the ORL [41] database, which has pose variation of less than 10°.
d Pose variation is declared to be less than 10° [6].
e Results are only for the fb set of the FERET database [27].
Step 7: For all the images available in the dataset, do Steps 8 to 14.
Step 8: Repeat Steps 9 to 14 until the error becomes negligibly small (10^{-6}).
Step 9: Set all inputs to the activation x_i^n. Set all outputs to the activation Y_k^n.
Step 10: Compute the curvelet function at the hidden node using (13).
Step 11: Calculate the output of the curvelet neural network using (14).
Step 12: Calculate the MSE for the kth pattern using

e_k = \frac{1}{2}\left(Y_k - D_k\right)^2

Step 13: Calculate the error function E using (15).
Step 14: Update the weights W_{ij} and V_{jk}, translation b_{ij} and dilation a_j using (20)–(23).
Step 15: Test the curvelet neural network so trained on unknown patterns.
The curvelet neural network converges using (14) and the gradient descent rule (back-propagation algorithm). The activation function is the curvelet function given in (13). The training set includes 2–8 images per subject, chosen randomly. Based on the experimentation, observations are made
for the false acceptance rate (FAR) and false rejection rate (FRR). The percentage of impostor images accepted is called the FAR. The FAR is computed by training the proposed system with faces from one database and testing with faces from another database, and vice versa. The FRR is the percentage of rejection of genuine images. The FRR is calculated by training the curvelet neural network with images from one database and testing on images from the same database.
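Given genuine and impostor match scores, the two rates follow directly from a decision threshold. The sketch below is a minimal illustration; the score values are illustrative, not from the paper:

```python
import numpy as np

def far_frr(genuine_scores, impostor_scores, threshold):
    """FAR = fraction of impostor attempts accepted; FRR = fraction of
    genuine attempts rejected, at a given acceptance threshold."""
    genuine_scores = np.asarray(genuine_scores, dtype=float)
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    far = np.mean(impostor_scores >= threshold)  # impostors wrongly accepted
    frr = np.mean(genuine_scores < threshold)    # genuine users wrongly rejected
    return float(far), float(frr)

# Illustrative score distributions (assumed numbers).
genuine = [0.9, 0.8, 0.85, 0.95, 0.7]
impostor = [0.2, 0.4, 0.35, 0.6, 0.1]
print(far_frr(genuine, impostor, 0.65))  # both rates are 0.0 at this threshold
```

Sweeping the threshold over the score range and plotting the genuine acceptance rate (1 − FRR) against FAR yields the ROC curves discussed later.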
5 Experiment results
The face recognition experiments were carried out in MATLAB 2012b, on a 32-bit Intel(R) Core(TM) 2 Duo 2.10 GHz processor with 3 GB RAM. Forty subjects each from the FERET [34], LFW [35] and CMU-PIE [36] databases were considered. The FERET database [34] consists of 14 051 eight-bit greyscale images of human heads with views
ranging from frontal to left and right profiles. Fa, containing 1196 frontal images of 1196 subjects, is used as the gallery set, whereas Fb contains 1195 images with expression variations, Fc contains 194 images taken under different illumination conditions, Dup I includes 722 images of different subjects taken later in time, between 1 min and 1031 days, and Dup II contains 234 images, a subset of Dup I taken at least 18 months later; these were used as the probe sets. LFW [35] is a database for unconstrained face recognition. The data set contains more than 13 000 images of faces collected from the web; 1680 of the people pictured have two or more distinct photos in the data set. In LFW, as the number of images used for training is one, there is a reduction in the recognition rate. For comparison, we simulated the available methods for the LFW database on a common platform. CMU-PIE is a database of 41 368 images of 68 people, each imaged under 13 different poses, 43 different illumination conditions and with four different expressions by extending the CMU 3D Room; this database is known as the CMU Pose, Illumination and Expression (PIE) database.
In the databases, ten images per subject having pose and illumination variation were considered. The database was divided into training and test sets by randomly considering one image from each subject for the training set and one for the test set. The training images were decomposed using the curvelet transform at scale = 2 and angle = 8. Thus nine components were produced, including one approximate and eight detailed subbands. The approximate and detailed subbands for an example image from the FERET database are shown in Fig. 1. Histograms of the approximate and detailed subbands are shown in Fig. 6. It is clear from the figure that the approximate bands are much affected by pose and illumination, but the detailed subbands are less affected.
This is the reason for considering the detailed subbands in the proposed method.
It is evident from Table 1 that, as the number of features increases, both the recognition rate and the training time increase.
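The feature extraction described above reduces each detailed subband to a few statistical moments before the network sees it. A minimal sketch of that idea in Python (the curvelet decomposition itself is assumed to come from an external fast discrete curvelet transform implementation [30]; the function names here are illustrative, not the paper's code):

```python
import math

def subband_moments(coeffs):
    """Mean, standard deviation, skewness and kurtosis of one flattened
    subband -- the kind of higher-order statistical features described
    in the text (illustrative helper, not the paper's code)."""
    n = len(coeffs)
    mean = sum(coeffs) / n
    var = sum((c - mean) ** 2 for c in coeffs) / n
    std = math.sqrt(var)
    skew = sum((c - mean) ** 3 for c in coeffs) / (n * std ** 3)
    kurt = sum((c - mean) ** 4 for c in coeffs) / (n * std ** 4)
    return [mean, std, skew, kurt]

def feature_vector(detailed_subbands):
    """Concatenate the moments of all detailed subbands; with
    scale = 2 and angle = 8 there are eight detailed subbands,
    giving a 32-dimensional feature vector."""
    feats = []
    for band in detailed_subbands:
        feats.extend(subband_moments(band))
    return feats
```

Note that a constant subband would give `std = 0`, so a real implementation would need to guard the division.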
Fig. 7 ROC plots for different comparable methods
a FERET database
b CMU-PIE database
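The ROC comparison in Fig. 7 is built from genuine (same-subject) and impostor (different-subject) match scores. A hedged sketch of how such (FAR, GAR) points can be traced, with toy scores rather than the paper's data:

```python
def roc_points(genuine, impostor, thresholds):
    """Trace (FAR, GAR) pairs over a list of similarity thresholds,
    accepting a match when its score is >= the threshold.
    `genuine` holds same-subject scores, `impostor` different-subject."""
    points = []
    for t in thresholds:
        gar = sum(s >= t for s in genuine) / len(genuine)    # genuine acceptance rate
        far = sum(s >= t for s in impostor) / len(impostor)  # false acceptance rate
        points.append((far, gar))
    return points

# Toy scores only -- not the paper's data.
genuine = [0.9, 0.8, 0.85, 0.7]
impostor = [0.3, 0.4, 0.2, 0.6]
curve = roc_points(genuine, impostor, [0.0, 0.5, 1.0])
```

Under this convention, the ideal curve described in the text is GAR = 1 at every FAR.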
The training time of the proposed method is the least on both databases, because of the orthogonality of the curvelet. The recognition rate of the curvelet neural network is higher than that of the other neural networks. The effectiveness of the proposed method was also tested for different numbers of views of the same face: 21 images of 100 subjects were considered, training was done using one to seven images at different angles, and the remaining images formed the test set. The proposed method showed slightly weaker results for one view, but better results for larger numbers of views; the results are given in Table 2. The effectiveness of the proposed method was also evaluated by changing the dimensionality of the data, as shown in Table 3. As the dimensionality of the data increases, the recognition accuracy also increases, and the table shows that the proposed method performs well under wide variation in pose. The proposed algorithm is compared with other state-of-the-art methods for recognition accuracy in Table 4, where the results are calculated on the same platform by simulating the algorithm in [21]. In [16] the results are only for the 23° and 65° poses. In [7], the pose variation or the database set used is not explicitly declared in the results; it appears to be less than 10°, because the results are comparable to those on the ORL [41] database, which has pose variation of less than 10°. In [26] the pose variation is declared to be <10°, and the results are only for the Fb set of the FERET database [27].
To visualise the effectiveness of the proposed method, the receiver operating characteristic (ROC) curve, which plots the genuine acceptance rate against the FAR, is shown for the proposed method and the state-of-the-art methods in Fig. 7. It is clear from Fig. 7 that the proposed scheme is efficient, as the obtained ROC curve approaches the ideal curve for both the FERET and CMU-PIE datasets; the ideal ROC curve is a straight line at a genuine acceptance rate of 1 for FAR from 0 to 1.
Fig. 8a shows the improvement in recognition accuracy for features from the approximate band against moments of the detailed subband. The efficiency for all poses using curvelet moments is higher than that using moments of the image, which shows that the curvelet moments are more effective than the moments of the image. The recognition performance of the proposed method for each of the 13 poses of the CMU-PIE test set, in comparison with other methods, is shown in Fig. 8b. It is clear from the figure that the proposed method outperforms the other existing state-of-the-art methods.

Fig. 8 Recognition accuracy for different poses of the CMU-PIE database
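The per-pose breakdown of Fig. 8b amounts to grouping test decisions by pose and averaging their correctness. A small sketch of that grouping (the pose labels and identifiers below are illustrative, not taken from the paper):

```python
def per_pose_accuracy(decisions):
    """Recognition accuracy per pose from (pose, predicted_id, true_id)
    triples -- a sketch of the grouping behind a per-pose comparison."""
    correct, total = {}, {}
    for pose, pred, true in decisions:
        total[pose] = total.get(pose, 0) + 1
        correct[pose] = correct.get(pose, 0) + (pred == true)
    return {pose: correct[pose] / total[pose] for pose in total}
```

For CMU-PIE the keys would range over the 13 camera poses of the database.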
6 Conclusion
This paper presents an efficient face recognition technique using higher-order invariant moments of the curvelet subbands. It enhances the recognition rate, as the invariant higher-order moments capture both global and local information of the face image. Furthermore, the curvelet neural network outperforms other variants of neural networks in terms of convergence time and recognition rate. Experiments on the LFW, FERET and CMU-PIE face databases show that the proposed method is robust to changes in pose and slight variations in lighting conditions, and outperforms all the curvelet-based face recognition methods considered.
7 Acknowledgment
This work was supported by the Department of Science and Technology, New Delhi, India, under Technology System Development Scheme DST/TSG/ICT/2011/56-G.
8 References
1 Aroussi, M.E., Hassouni, M.E., Ghouzali, S., Rziza, M., Aboutajdine, D.: ‘Local appearance based face recognition method using block based steerable pyramid transform’, Signal Process., 2011, 91, pp. 38–50
2 Guan, N., Tao, D., Luo, Z., Yuan, B.: ‘NeNMF: an optimal gradientmethod for non-negative matrix factorization’, IEEE Trans. SignalProcess., 2012, 60, (6), pp. 2882–2898
3 Liu, G., Lin, Z., Yan, S., Sun, J., Yu, Y., Ma, Y.: ‘Robust recovery ofsubspace structures by low-rank representation’, IEEE Trans. PatternAnal. Mach. Intell., 2012, 35, (1), pp. 170–174
4 Zhang, B., Gao, Y., Zhao, S., Liu, J.: ‘Local derivative pattern versuslocal binary pattern: face recognition with high-order local patterndescriptor’, IEEE Trans. Image Process., 2010, 19, (2), pp. 533–544
5 Shen, L., Bai, L.: ‘A review on Gabor wavelets for face recognition’,Pattern Anal. Appl., 2006, 9, (2), pp. 273–292
6 Sharma, P., Arya, K.V., Yadav, R.N.: ‘Efficient face recognition usingwavelet based generalized neural network’, Signal Process., 2013, 93,(6), pp. 1557–1565
7 Mohammad, A.A., Minhas, R., Wu, Q.M.J., Sid-Ahmed, M.A.: ‘Human face recognition based on multidimensional PCA and extreme learning machine’, Pattern Recognit., 2011, 44, pp. 2588–2597
8 Zainuddin, Z., Pauline, O.: ‘Modified wavelet neural network in function approximation and its application in prediction of time-series pollution data’, Appl. Soft Comput., 2011, 11, (8), pp. 4866–4874
9 Pentland, A., Moghaddam, B., Starner, T.: ‘View-based and modular eigenspaces for face recognition’. Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1994, pp. 84–91
10 Cootes, T.F., Walker, K., Taylor, C.J.: ‘View-based active appearancemodels’. Proc. Fourth IEEE Int. Conf. Automatic Face and GestureRecognition, 2000, pp. 227–232
11 Gross, R., Matthews, I., Baker, S.: ‘Appearance based face recognition and light-fields’, IEEE Trans. Pattern Anal. Mach. Intell., 2004, 26, pp. 449–465
12 Chai, X., Shan, S., Gao, W.: ‘Pose normalization for robust facerecognition based on statistical affine transformation’. Information,Communications and Signal Processing Conf., 2003, 3, pp. 1413–1417
13 Prince, S.J.D., Elder, J.H., Warrell, J., Felisberti, F.M.: ‘Tied factoranalysis for face recognition across large pose differences’, IEEETrans. Pattern Anal. Mach. Intell., 2008, 30, (6), pp. 1–14
14 Shan, T., Lovell, B.C., Chen, S.: ‘Face recognition robust to head posefrom one sample image’. Proc. 18th Int. Conf. Pattern Recognition,2006, pp. 515–518
15 Sarfraz, M.S., Hellwich, O.: ‘Probabilistic learning for fully automaticface recognition across pose’, Image Vis. Comput., 2010, 28,pp. 744–753
16 Chai, X., Shan, S., Chen, X., Gao, W.: ‘Local linear regression (LLR)for pose invariant face recognition’, IEEE Trans. Image Process.,2007, 16, (7), pp. 1716–1729
17 Choi, S., Choi, C., Kwak, N.: ‘Face Recognition based on 2D Imagesunder illumination and pose variations’, Pattern Recognit. Lett., 2011,32, pp. 561–571
18 Wang, J., You, J., Li, Q., Xu, Y.: ‘Orthogonal discriminant vector for face recognition across pose’, Pattern Recognit., 2012, 45, (12), pp. 4069–4079, http://dx.doi.org/10.1016/j.patcog.2012.04.012
19 Moallem, P., Mousavi, B.S., Monadjemi, S.A.: ‘A novel fuzzy rule basesystem for pose independent faces detection’, Appl. Soft Comput., 2011,11, (2), pp. 1801–1810
20 Huang, F.J., Zhou, Z., Zhang, H.J., Chen, T.: ‘Pose invariant facerecognition’. Proc. IEEE Int. Conf. Automatic Face and GestureRecognition, Grenoble, France, 2000, pp. 245–250
21 Singh, R., Vatsa, M., Ross, A., Noore, A.: ‘A mosaicing scheme for pose-invariant face recognition’, IEEE Trans. Syst. Man Cybern. B, 2007, 37, (5), pp. 1212–1225
22 Arashloo, S.R., Kittler, J.J., Christmas, W.J.: ‘Pose invariant facerecognition by matching on multiresolution MRF’s linked bysupercoupling transform’, Comput. Vis. Image Underst., 2011, 115,pp. 1073–1083
23 Zhang, H., Zhang, Y., Huang, T.S.: ‘Pose robust face recognition viasparse representation’, Pattern Recognit., 2013, 46, pp. 1511–1521
24 Lu, C.Y., Min, H., Gui, J., Zhu, L., Lei, Y.K.: ‘Face recognition via weighted sparse representation’, J. Vis. Commun. Image Represent., 2013, 24, pp. 111–116
25 Ma, B., Chai, X., Wang, T.: ‘A novel feature descriptor based on biologically inspired feature for head pose estimation’, Neurocomputing, http://dx.doi.org/10.1016/j.neucom.2012.11.005, 115, (4), pp. 1–10
26 Meshgini, S., Aghagolzadeh, A., Seyedarabi, H.: ‘Face recognition using Gabor-based direct linear discriminant analysis and support vector machine’, Comput. Electr. Eng., http://dx.doi.org/10.1016/j.compeleceng.2012.12.011, 39, (3), pp. 727–745
27 Singh, C., Sahan, A.M.: ‘Face recognition using complex wavelet moments’, Opt. Laser Technol., 2013, 47, pp. 256–267
28 Candes, E.J., Donoho, D.L.: ‘Curvelets – a surprisingly effective nonadaptive representation for objects with edges’ (Vanderbilt University Press, Nashville, TN, 2000)
29 Candes, E.J., Donoho, D.L.: ‘New tight frames of curvelets and optimalrepresentations of objects with C2 singularities’, Commun. Pure Appl.Math., 2002, 57, (2), pp. 219–266
30 Candes, E.J., Demanet, L., Donoho, D.L., Ying, L.: ‘Fast discrete curvelettransform’, SIAM Multiscale Model. Simul., 2005, 5, pp. 861–899
31 Papakostas, G.A., Koulouriotis, D.E., Karakasis, E.G., Tourassis, V.D.: ‘Moment-based local binary patterns: a novel descriptor for invariant pattern recognition applications’, Neurocomputing, 2013, 99, pp. 358–37
32 Sharma, P., Arya, K.V., Yadav, R.N.: ‘Extraction of facial features usinghigher order moments in curvelet transform and recognition usinggeneralized mean neural networks’. Int. Conf. Soft Computing forProblem Solving at IIT Roorkee, 20–22 December, 2011, vol. 131,pp. 717–728
33 Yadav, R.N., Kumar, N., Kalra, P.K., John, J.: ‘Learning with generalized-mean neuron model’, Neurocomputing, 2006, 69, pp. 2026–2032
34 Phillips, P.J.: ‘The FERET database and evaluation procedure for face recognition algorithms’, Image Vis. Comput., 1998, 16, (5), pp. 295–306
35 Huang, G.B., Ramesh, M., Berg, T., Learned-Miller, E.: ‘Labeled facesin the wild: a database for studying face recognition in unconstrainedenvironments’, Technical Report 07–49, University of Massachusetts,Amherst, October, 2007
36 Sim, T., Baker, S., Bsat, M.: ‘The CMU Pose, Illumination and Expression database of human faces’, CMU Technical Report CMU-RI-TR-01-02, 2001
37 Sharma, A., Haj, M.A., Choi, J., Davis, L.S., Jacobs, D.W.: ‘Robust pose invariant face recognition using coupled latent space discriminant analysis’, Comput. Vis. Image Underst., 2012, 116, pp. 1095–1110
38 Naseem, I., Togneri, R., Bennamoun, M.: ‘Robust regression for facerecognition’, Pattern Recognit., 2012, 45, pp. 104–118
39 Zhang, H., Nasrabadi, N.M., Zhang, Y., Huang, T.S.: ‘Joint dynamicsparse representation for multi view face recognition’, PatternRecognit., 2012, 45, pp. 1290–1298
40 Xue, H., Zhu, Y., Chan, S.: ‘Local ridge regression for face recognition’, Neurocomputing, 2009, 72, pp. 1342–1346
41 ORL Database at URL: www.uk.research.att.com/facedatabase.html