
N-feature neural network human face recognition

Javad Haddadnia a, Majid Ahmadi b,*

a Department of Engineering, Tarbiat Moallem University of Sabzevar, Sabzevar, Khorasan 397, Iran
b Department of Electrical and Computer Engineering, University of Windsor, 401 Sunset Avenue, Windsor, Ont., Canada N9B 3P4

Received 31 July 2003; received in revised form 4 March 2004; accepted 22 March 2004

Abstract

This paper introduces an efficient method for human face recognition, which is called the hybrid N-feature neural network (HNFNN) human face recognition system. The HNFNN employs a set of different kinds of features from face images with radial basis function (RBF) neural networks, which are fused together through the majority rule. The proposed method improves the performance of the system by combining RBF neural networks, trained with different learning algorithms, in committees. This article also evaluates how the performance can be improved by disregarding irrelevant data from the face images by defining efficient parameters. Experimental results on the ORL and Yale face databases confirm that the proposed method lends itself to higher classification accuracy relative to existing techniques.

© 2004 Elsevier B.V. All rights reserved.

Keywords: Face recognition; RBF neural network; Clustering technique; Shape information

1. Introduction

Face recognition may seem an easy task for humans; however, computerized face recognition systems still cannot achieve completely reliable performance. Difficulties arise due to large variations in facial appearance, head size, orientation and changes in environmental conditions. Such difficulties make face recognition one of the fundamental problems in pattern analysis. In recent years, there has been a growing interest in machine recognition of faces due to its potential for commercial applications such as film processing, law enforcement, person identification, access control systems, etc. A recent survey of face recognition systems can be found in Refs. [1–3].

A complete face recognition system generally consists of three stages. The first stage involves detecting and localizing the face in arbitrary images [4–6]. The second stage requires extraction of pertinent features from the localized image obtained in the first stage. Finally, the third stage involves classification of facial images based on the feature vector obtained in the previous stage. In order to design a high-accuracy recognition system, the choice of feature extractor is very crucial. Two main approaches to feature extraction have been extensively used in conventional techniques [2]. The first one is based on extracting structural facial features that are local structures of face images, for example, the shapes of the eyes, nose and mouth. The structure-based approaches deal with local information instead of global information and, therefore, are not affected by irrelevant information in an image. However, because of the explicit modeling of facial features, the structure-based approaches are sensitive to the unpredictability of face appearance and environmental conditions [2]. The second method is a statistical-based approach that extracts features from the entire image and, therefore, uses global information instead of local information. Since the global data of an image are used to determine the feature elements, data that are irrelevant to the facial portion, such as hair, shoulders and background, may create erroneous feature vectors that can affect the recognition results [7]. In recent years, many researchers have noticed this problem and tried to exclude such irrelevant data while performing face recognition [2–4,7]. In this paper, an efficient method based on shape information is proposed to eliminate the irrelevant data.

0262-8856/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.imavis.2004.03.011
Image and Vision Computing 22 (2004) 1071–1082
www.elsevier.com/locate/imavis
* Corresponding author. Tel.: +1-519-253-3000 x2576; fax: +1-519-971-3695. E-mail address: [email protected] (M. Ahmadi).

In the field of pattern recognition, the combination of an ensemble of classifiers has been proposed to achieve image classification systems with higher performance in comparison with the best performance achievable by employing a single classifier. By combining more than one classifier, the deficiencies of each classifier may be compensated for by the efficiency of the others. This has been verified experimentally in Refs. [8–10]. A number of image classification systems based on the combination of the outputs of different classifier systems have been proposed. The different structures for combining classifier systems can be grouped into three configurations [9,10]. In the first group, the classifier systems are connected in cascade to create a pipeline structure. In the second one, the classifier systems are used in parallel and their outputs are combined, which is named the parallel structure. Finally, the hybrid structure is a combination of the pipeline and parallel structures. In this paper, we propose a human face recognition system designed on the basis of a hybrid-structure classifier system, which achieves improved recognition results by developing N features and selecting them for the recognition problem in the feature extraction stage. The proposed human face recognition system uses the available information and extracts more characteristics for face classification by extracting different feature domains from the input images. In this paper, three different feature domains have been used for extracting features from the input images. These include the pseudo Zernike moment (PZM) [4], principal component analysis (PCA) [11] and the discrete cosine transform (DCT) [12]. Radial basis function (RBF) neural networks are used as individual classifiers. The RBF neural networks offer an efficient approach to solving many engineering problems. An important property of the RBF neural networks is that they form a unifying link among many different research fields such as function approximation, regularization, noisy interpolation and pattern recognition. The increasing popularity of the RBF neural networks is partly due to their simple topological structure, their locally tuned neurons and their ability to have a fast learning algorithm in comparison with multi-layer feed-forward neural networks [13,14]. A central issue in neural networks generally, and in the RBF neural network especially, is the problem of the learning algorithm. Learning algorithms have a strong influence on the performance achieved by the RBF neural networks [15–17]. The RBF neural networks usually employ a combination of unsupervised and supervised learning processes. In the unsupervised learning phase, many clustering techniques have been used to define the distribution of the center vectors over the input space. The clustering techniques associate a cluster with each node in the hidden layer. A non-linear optimization strategy is involved in defining the free parameters of the hidden layer. In the supervised phase, the output layer performs a linear optimization of the weights associated with the output layer. The authors also believe that different learning algorithms yield different recognition rates in human face recognition [16,17]; however, individual techniques can complement each other when they are combined. Thus, the combination of RBF neural networks, each using a different learning technique, could improve the performance achieved.

The main objective of this paper is to present an efficient and high-accuracy human face recognition system using a combination of different feature domains and RBF neural networks with different learning algorithms. The proposed method constructs the RBF neural networks in the unsupervised learning phase using clustering techniques, which include k-means clustering [18], fuzzy clustering [19] and iterative optimization (IO) [20]. The adjustment of the parameters and the determination of the neural network connection weights are done through the linear least squares method [16,17]. An effective parameter is defined to distinguish between face and non-face images. Also, to eliminate irrelevant data, an efficient method for finding the pure face portion in an image is presented.

The rest of this paper is organized as follows: the proposed human face recognition system is developed in Section 2; the face localization technique is described in Section 3; Section 4 presents the feature extraction method; the classification technique is described in Section 5; finally, Sections 6 and 7 present the experimental results and conclusions.

2. The proposed HNFNN

Fig. 1 shows a conventional human face recognition system, which uses one feature domain and one classifier. Usually a neural network is used as the classifier; therefore, the conventional method is called the single feature neural network (SFNN) human face recognition system.

The proposed human face recognition system is shown in Fig. 2. Unlike the SFNN, the proposed hybrid N-feature neural network (HNFNN) system is developed in five stages. In the first stage, the face localization process is done. To ensure a robust and accurate feature extraction that distinguishes between face and non-face regions in an image, we require the exact location of the face region. In this paper, we have used the shape information technique presented in Ref. [4] for face localization. After face localization, in the second stage, a sub-image is created, which contains the information needed for the recognition algorithm. By using a sub-image, data that are irrelevant to the facial portion are disregarded. In the third stage, different features are extracted in parallel from the derived sub-image. These features are obtained from the different feature domains. The fourth stage requires classification, which classifies a new face image, based on the chosen features, into one of the possibilities. This is done for each feature domain in parallel using RBF neural networks trained with different learning algorithms, as shown in Fig. 2. Finally, the last stage combines the outputs of each RBF neural network classifier to construct the identification. In this paper, the majority method has been selected as the decision strategy. The corresponding solutions for each stage are presented in the following sections.

Fig. 1. Single feature neural network human face recognition system (SFNN).
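As a concrete illustration of the majority-rule decision stage, the label outputs of several classifiers can be fused as sketched below (a minimal example with hypothetical predictions, not the authors' implementation):

```python
import numpy as np

def majority_vote(predictions):
    """Combine class labels from several classifiers by majority rule.

    predictions: list of per-classifier label arrays, each of shape (n_samples,).
    Returns, for each sample, the label predicted by the most classifiers.
    """
    votes = np.stack(predictions, axis=0)          # (n_classifiers, n_samples)
    fused = []
    for col in votes.T:                            # one sample at a time
        labels, counts = np.unique(col, return_counts=True)
        fused.append(labels[np.argmax(counts)])    # most frequent label wins
    return np.array(fused)

# Three hypothetical classifiers (e.g. PZM-, PCA- and DCT-based RBF networks)
# disagreeing on the third sample:
p1 = np.array([0, 1, 2])
p2 = np.array([0, 1, 2])
p3 = np.array([0, 2, 2])
fused = majority_vote([p1, p2, p3])
```

The fused label of each sample is the one with the plurality of votes; with three classifiers, any label backed by two of them wins.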

3. Face localization

Many algorithms have been proposed for face localization and detection. A critical survey on face localization and detection can be found in Refs. [2,3]. The ultimate goal of face localization is finding an object in an image as a face candidate whose shape resembles the shape of a face. Faces are characterized by an elliptical shape, and an ellipse can approximate the shape of a face. A technique is presented in Ref. [4] which finds the best-fit ellipse enclosing the facial region of the human face in a frontal view of a facial image. In this algorithm, an ellipse model with five parameters has been used. Initially, connected component objects are determined by applying a region-growing algorithm. Consequently, for each connected component object with a given minimum size, the best-fit ellipse is computed on the basis of its moments [6]. To assess how well its best-fit ellipse approximates the connected component object, we define a distance measure between the connected component object and the best-fit ellipse as follows:

f_i = \frac{P_{inside}}{m_{0,0}} \quad (1)

f_o = \frac{P_{outside}}{m_{0,0}} \quad (2)

where P_{inside} is the number of background points inside the ellipse, P_{outside} is the number of points of the connected component object that are outside of the ellipse and m_{0,0} is the size of the connected component object [4,5]. The connected component objects are closely approximated by their best-fit ellipses when f_i and f_o are as small as possible. We refer to the threshold values for f_i and f_o as the facial candidate threshold (FCT). Our experimental study indicates that when the FCT is less than 0.1 the connected component object is very similar to an ellipse; therefore, it is a good candidate for a face region. If f_i and f_o are greater than 0.1, there is no face region in the input image; therefore, we reject it as a non-face image [4]. An example of this method for locating a face candidate and rejecting a non-face image, with the corresponding values of f_i and f_o, is shown in Fig. 3. Subsequently, the rest of the system processes the selected face candidates for recognition.
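The face-candidate test of Eqs. (1) and (2) can be sketched as follows, assuming binary masks for the connected component object and for the interior of its best-fit ellipse (the moment-based ellipse fitting of Ref. [6] is omitted, and the function names are illustrative):

```python
import numpy as np

def facial_candidate_measures(obj_mask, ellipse_mask):
    """Distance measures of Eqs. (1)-(2) between a connected component
    object and its best-fit ellipse.

    obj_mask, ellipse_mask: boolean arrays of equal shape; True marks
    object pixels / pixels inside the ellipse.
    """
    m00 = obj_mask.sum()                                       # object size
    p_inside = np.logical_and(ellipse_mask, ~obj_mask).sum()   # background inside ellipse
    p_outside = np.logical_and(obj_mask, ~ellipse_mask).sum()  # object outside ellipse
    return p_inside / m00, p_outside / m00

def is_face_candidate(obj_mask, ellipse_mask, fct=0.1):
    """Accept the object as a face candidate when both measures fall
    below the facial candidate threshold (FCT) of 0.1."""
    fi, fo = facial_candidate_measures(obj_mask, ellipse_mask)
    return bool(fi < fct and fo < fct)

# A perfect match: the object exactly fills its "ellipse".
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True

# A poor match: the "ellipse" is far larger than the object.
loose = np.zeros((8, 8), dtype=bool)
loose[1:7, 1:7] = True
```

With a perfect match both measures are zero and the object is accepted; with the loose ellipse, f_i exceeds the FCT and the region is rejected.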

4. Feature extraction

To design a system with low to moderate complexity, the feature vectors created in the feature extraction stage should contain the most pertinent information about the face to be recognized. In the statistical-based feature extraction approaches, global information is used to create a set of feature vector elements to perform recognition. A mixture of irrelevant data, which are usually part of a facial image, may result in an incorrect set of feature vector elements. Therefore, data that are irrelevant to the facial portion, such as hair, shoulders and background, should be disregarded in the feature extraction phase. In the proposed HNFNN system, feature extraction is done in two steps. In the first step, after face localization, we create a sub-image, which contains the information needed for the recognition algorithm while disregarding the irrelevant data. In the second step, the feature vector is obtained by calculating the different feature domains of the derived sub-image.

4.1. Sub-image creation

The sub-image encloses the pertinent information around the face in an ellipse, while pixel values outside the ellipse are set to zero. Unfortunately, when the sub-image is created with the best-fit ellipse described in Section 3, many unwanted regions of the face image may still appear in this sub-image, as shown in Fig. 4. These include the hair portion, the neck and part of the background, for example.

Fig. 2. The HNFNN human face recognition system.

Fig. 3. Distinguishing between face and non-face images using the best-fit ellipse and the FCT threshold.

To overcome this problem, instead of using the best-fit ellipse for creating a sub-image, we have defined another ellipse. The proposed ellipse has the same orientation and center as the best-fit ellipse, but the lengths of its major and minor axes are calculated from the lengths of the major and minor axes of the best-fit ellipse as follows:

A = r \cdot a \quad (3)

B = r \cdot b \quad (4)

where A and B are the lengths of the major and minor axes of the proposed ellipse, and a and b are the lengths of the major and minor axes of the best-fit ellipse as defined in Refs. [4,6]. The coefficient r is called the correct information ratio (CIR) and varies from 0 to 1. Fig. 5 shows the effect of changing the CIR, while Fig. 6 shows the corresponding sub-images. Experimental results with 400 face images in the ORL database and 165 face images in the Yale database show that the best value for the CIR is around 0.87. By using the above procedure, data that are irrelevant to the facial portion are disregarded. The feature vectors are then generated by computing the PZM, PCA and DCT of the sub-image obtained in this subsection.
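Sub-image creation with the CIR-scaled ellipse of Eqs. (3) and (4) can be sketched as follows (a minimal illustration; the ellipse parameters would come from the best-fit ellipse of Section 3, and the function name is hypothetical):

```python
import numpy as np

def cir_subimage(image, center, a, b, theta, cir=0.87):
    """Zero out pixels outside the CIR-shrunk ellipse (Eqs. (3)-(4)).

    center: (x0, y0) ellipse center; a, b: major/minor axis lengths of the
    best-fit ellipse; theta: orientation in radians; cir: the ratio r.
    """
    A, B = cir * a, cir * b                      # Eqs. (3) and (4)
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x = xs - center[0]
    y = ys - center[1]
    # Rotate coordinates into the ellipse frame.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    inside = (xr / A) ** 2 + (yr / B) ** 2 <= 1.0
    return np.where(inside, image, 0)

img = np.ones((9, 9))
sub = cir_subimage(img, center=(4, 4), a=4.0, b=3.0, theta=0.0)
```

With r below 1, the retained region shrinks toward the face center, discarding the hair and background ring near the ellipse boundary.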

4.2. Pseudo Zernike moment

The advantages of considering orthogonal moments are that they are shift, rotation and scale invariant and very robust in the presence of noise. Pseudo Zernike polynomials are well known and widely used in the analysis of optical systems. Pseudo Zernike polynomials are an orthogonal set of complex-valued polynomials, defined as [4,21]:

V_{nm}(x, y) = R_{nm}(x, y) \exp\left( jm \tan^{-1}(y/x) \right) \quad (5)

where x^2 + y^2 \le 1, n \ge 0, |m| \le n, and the radial polynomials R_{nm} are defined as:

R_{nm}(x, y) = \sum_{s=0}^{n-|m|} D_{n,|m|,s} \, (x^2 + y^2)^{(n-s)/2} \quad (6)

D_{n,|m|,s} = \frac{(-1)^s (2n + 1 - s)!}{s! \, (n - |m| - s)! \, (n + |m| - s + 1)!} \quad (7)

The PZM of order n and repetition m can be computed as follows:

PZM_{nm} = \frac{n+1}{\pi} \sum_{\substack{s=0 \\ (n-m-s)\ \mathrm{even}}}^{n-|m|} D_{n,|m|,s} \sum_{a=0}^{k} \sum_{b=0}^{m} \binom{k}{a} \binom{m}{b} (-j)^b \, CM_{2k+m-2a-b,\, 2a+b} + \frac{n+1}{\pi} \sum_{\substack{s=0 \\ (n-m-s)\ \mathrm{odd}}}^{n-|m|} D_{n,|m|,s} \sum_{a=0}^{d} \sum_{b=0}^{m} \binom{d}{a} \binom{m}{b} (-j)^b \, RM_{2d+m-2a-b,\, 2a+b} \quad (8)

where k = (n - s - m)/2, d = (n - s - m + 1)/2, CM_{p,q} is the scale-invariant central moment and RM_{p,q} is the scale-invariant radial geometric moment, defined as:

CM_{p,q} = \frac{\mu_{pq}}{M_{00}^{(p+q+2)/2}} \quad (9)

RM_{p,q} = \frac{\sum_{x} \sum_{y} f(x, y) (x'^2 + y'^2)^{1/2} x'^p y'^q}{M_{00}^{(p+q+2)/2}} \quad (10)

where x' = x - x_0, y' = y - y_0, and M_{pq}, \mu_{pq}, x_0 and y_0 are defined as follows:

M_{pq} = \sum_{x} \sum_{y} f(x, y) \, x^p y^q \quad (11)

\mu_{pq} = \sum_{x} \sum_{y} f(x, y) (x - x_0)^p (y - y_0)^q \quad (12)

x_0 = M_{10}/M_{00} \quad (13)

y_0 = M_{01}/M_{00} \quad (14)
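The moment quantities that enter Eq. (8) through Eqs. (9) and (11)-(14) can be sketched as follows (a minimal illustration using the x-as-column, y-as-row convention; the full PZM summation with the D coefficients is omitted for brevity):

```python
import numpy as np

def geometric_moment(f, p, q):
    """M_pq of Eq. (11) for an image f(x, y)."""
    ys, xs = np.mgrid[0:f.shape[0], 0:f.shape[1]]
    return np.sum(f * xs**p * ys**q)

def central_moment(f, p, q):
    """mu_pq of Eq. (12), taken about the centroid (x0, y0) of Eqs. (13)-(14)."""
    m00 = geometric_moment(f, 0, 0)
    x0 = geometric_moment(f, 1, 0) / m00     # Eq. (13)
    y0 = geometric_moment(f, 0, 1) / m00     # Eq. (14)
    ys, xs = np.mgrid[0:f.shape[0], 0:f.shape[1]]
    return np.sum(f * (xs - x0)**p * (ys - y0)**q)

def scale_invariant_central_moment(f, p, q):
    """CM_pq of Eq. (9): mu_pq normalized by M00^((p+q+2)/2)."""
    m00 = geometric_moment(f, 0, 0)
    return central_moment(f, p, q) / m00 ** ((p + q + 2) / 2)

# A flat 4x4 test image.
f = np.ones((4, 4))
m00 = geometric_moment(f, 0, 0)
cm20 = scale_invariant_central_moment(f, 2, 0)
```

For the flat 4x4 image, M_00 is simply the total intensity, the first central moments vanish by symmetry, and CM_20 is the normalized horizontal spread.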

Fig. 4. Creating sub-images based on the best-fit ellipse.

Fig. 5. Different ellipses with pertinent values of the CIR.

Fig. 6. Sub-image formation based on the proposed ellipse and its corresponding CIR values.

4.3. Principal component analysis

PCA is a well-known statistical technique for feature extraction. Each M \times N image in the training set was row-concatenated to form an MN \times 1 vector \tilde{A}_i. Given a set of N_T training images \{\tilde{A}_i\}_{i=1,\ldots,N_T}, the mean vector of the training set was obtained as:

\bar{A} = \frac{1}{N_T} \sum_{i=1}^{N_T} \tilde{A}_i \quad (15)

The average vector was subtracted from the training vectors to obtain:

A_i = \tilde{A}_i - \bar{A}, \quad i = 1, 2, 3, \ldots, N_T \quad (16)

An N_T \times MN matrix A was constructed with the A_i^T as its row vectors. The singular value decomposition of A can then be written as:

V^T A U = [S \mid 0] \quad (17)

where S is an N_T \times N_T diagonal matrix with singular values s_i > 0 arranged in descending order, and V and U are N_T \times N_T and MN \times MN orthogonal matrices, respectively; V is composed of the eigenvectors of A A^T, while U is composed of the eigenvectors of A^T A. These are related by:

U = A^T V \quad (18)

where U consists of the eigenvectors of A^T A that correspond to the non-zero singular values. This relation allows the smaller N_T \times N_T eigenvalue problem for A A^T to be solved, and U to be subsequently obtained by matrix multiplication.

The projection of a face vector onto the space of N_T eigenfaces results in an N_T-dimensional feature vector of projection weights. Since PCA has the property of packing the greatest energy into the least number of principal components, the smaller principal components, which are less than a threshold, can be discarded with minimal loss of representational capability. This dimensionality reduction results in face weight vectors of dimension \tilde{N}_T < N_T. An appropriate value of \tilde{N}_T can be chosen by considering the basis restriction error (BRE) as a function of \tilde{N}_T [11]. This gradual decrease in error is significant for recognition techniques based on eigenfaces, where storage and computational performance are directly related to N_T.
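The reduced eigenproblem of Eqs. (17) and (18) can be sketched with NumPy as follows (random vectors stand in for the training images; an illustrative sketch, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

# N_T hypothetical training "images", each row-concatenated to an MN-vector.
NT, MN = 6, 20
A_tilde = rng.normal(size=(NT, MN))

A_bar = A_tilde.mean(axis=0)        # Eq. (15): mean vector of the training set
A = A_tilde - A_bar                 # Eq. (16): centred N_T x MN matrix

# Solve the small N_T x N_T eigenproblem for A A^T instead of the large
# MN x MN one for A^T A; this is the trick behind Eqs. (17) and (18).
eigvals, V = np.linalg.eigh(A @ A.T)
order = np.argsort(eigvals)[::-1]   # descending eigenvalues
eigvals, V = eigvals[order], V[:, order]

keep = eigvals > 1e-10              # non-zero singular values only
U = A.T @ V[:, keep]                # Eq. (18): eigenfaces as columns
U /= np.linalg.norm(U, axis=0)      # normalize each eigenface

# Feature vector: projection weights of a centred face onto the eigenfaces.
weights = (A_tilde[0] - A_bar) @ U
```

Centring removes one degree of freedom, so at most N_T − 1 eigenfaces survive; the resulting columns of U are orthonormal, and the projection weights form the PCA feature vector.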

4.4. Discrete cosine transform

The DCT transforms spatial information into decoupled frequency information in the form of DCT coefficients. It also exhibits excellent energy compaction. The definition of the DCT for an N \times N image is [12]:

DCT_{uv} = \frac{1}{N^2} \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x, y) \cos\left( \frac{(2x+1) u \pi}{2N} \right) \cos\left( \frac{(2y+1) v \pi}{2N} \right) \quad (19)

where f(x, y) denotes the N \times N image pixels.
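Eq. (19) is separable in x and y, so it can be sketched as two matrix products (note that the paper's 1/N^2 scaling differs from the orthonormal factors used by most library DCT routines):

```python
import numpy as np

def dct2(f):
    """2-D DCT of an N x N image as written in Eq. (19)."""
    N = f.shape[0]
    u = np.arange(N)[:, None]   # frequency index (rows of the basis matrix)
    x = np.arange(N)[None, :]   # spatial index (columns of the basis matrix)
    # Basis matrix C[u, x] = cos((2x + 1) u pi / (2N)).
    C = np.cos((2 * x + 1) * u * np.pi / (2 * N))
    # Separability: sum over x, then over y, with the paper's 1/N^2 scaling.
    return (C @ f @ C.T) / N**2

f = np.ones((4, 4))
coeffs = dct2(f)
```

For a constant image, all the energy lands in the single DC coefficient DCT_00, illustrating the energy compaction mentioned above.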

5. Classifier design

Neural networks have been employed and compared with conventional classifiers for a number of classification problems. The results have shown that the accuracy of the neural network approaches is equivalent to, or slightly better than, that of other methods. Also, due to the simplicity, generality and good learning ability of neural networks, these types of classifiers are found to be more efficient [22,23]. The RBF neural networks have been found to be very attractive for many engineering problems, and attempts have been made to make the learning process in this type of classifier faster than normally required for the multi-layer feed-forward neural networks. Therefore, they serve as an excellent candidate for pattern recognition applications [23].

In this paper, RBF neural networks are used as individual classifiers, where the inputs to each RBF neural network are the feature vectors derived from the feature extraction techniques described in the previous section.

5.1. RBF neural network structure

An RBF neural network structure is shown in Fig. 7. The construction of the RBF neural network involves an input layer, a hidden layer and an output layer with a feed-forward architecture. The input layer of this network is a set of n units, which accept the elements of an n-dimensional input feature vector. The input units are fully connected to the hidden layer with r hidden units. Connections between the input and the hidden layer have unit weights and, as a result, do not have to be trained. In this structure, the hidden units are referred to as the RBF units. The goal of the RBF units is to cluster the data and reduce its dimensionality with a non-linear transformation, mapping the input data to a new space. The RBF units are also fully connected to the output layer. The output layer contains s units, which implement a linear combination on this new space.

Fig. 7. RBF neural network structure.

The RBF neural network is a class of neural networks in which the distance between the input vector and a prototype vector determines the activation of the hidden units. The activation function of the RBF units is expressed as follows [13,14,16,17,22,23]:

R_i(x) = R_i\left( \frac{\| x - c_i \|}{\sigma_i} \right), \quad i = 1, 2, \ldots, r \quad (20)

where x is an n-dimensional input feature vector, c_i is an n-dimensional vector called the center of the RBF unit, \sigma_i is the width of the RBF unit and r is the number of RBF units. One of the most common activation functions for the RBF units is the Gaussian with mean vector c_i and variance vector \sigma_i, as follows [13,16,17,22,23]:

R_i(x) = \exp\left( -\frac{\| x - c_i \|^2}{\sigma_i^2} \right) \quad (21)

Note that \sigma_i^2 represents the diagonal entries of the covariance matrix of the Gaussian function. The output units are linear, and therefore the response of the jth output unit for input x is given as:

y_j(x) = \sum_{i=1}^{r} R_i(x) w_{ij} \quad (22)

where y_j(x) is the response of the jth output unit to input x and w_{ij} is the connection weight of the ith RBF unit to the jth output node.
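A forward pass through this network, Eqs. (20)-(22), can be sketched as follows (toy centers, widths and weights; illustrative only):

```python
import numpy as np

def rbf_forward(x, centers, widths, W):
    """Forward pass of the RBF network of Eqs. (20)-(22).

    x: (n,) input feature vector; centers: (r, n) RBF centers c_i;
    widths: (r,) widths sigma_i; W: (r, s) output weights w_ij.
    Returns the s output activations y_j(x).
    """
    d2 = np.sum((centers - x) ** 2, axis=1)   # ||x - c_i||^2
    R = np.exp(-d2 / widths**2)               # Eq. (21): Gaussian RBF units
    return R @ W                              # Eq. (22): linear output layer

centers = np.array([[0.0, 0.0], [1.0, 1.0]])
widths = np.array([1.0, 1.0])
W = np.eye(2)                                 # one output unit per RBF unit
y = rbf_forward(np.array([0.0, 0.0]), centers, widths, W)
```

An input at the first center fully activates the first RBF unit, so the first output unit responds most strongly, which is the behaviour the classifier of Section 5.2 exploits.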

5.2. RBF-based classifier design

An RBF neural network classifier can be viewed as a function interpolant that tries to construct hypersurfaces, one for each class, by taking a linear combination of the RBF units. These hypersurfaces can be viewed as discriminant functions, where each surface has a high value for the class it represents and a low value for all others. An unknown input feature vector is classified as belonging to the class associated with the hypersurface with the largest output at that point. In this case, the RBF units serve as components in a finite expansion of the desired hypersurface, where the component coefficients (the weights) have to be trained [17,23,24].

RBF neural network learning algorithms usually involve two steps. In the first step, a clustering method is used to initialize and determine the hidden layer parameters. In the second step, the output connection weights are adjusted. In this article, each individual classifier based on an RBF neural network uses a clustering method that is different from those of the other individual classifiers. The output connection weights for all individual classifiers are computed with the linear least squares method [16,17]. The following subsections describe how the clustering methods are used and how the output connection weights are computed.
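Once a clustering method has fixed the hidden layer, the supervised step reduces to a linear least squares fit of the output weights; a sketch on hypothetical two-class data (not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hidden layer fixed by some clustering method: centers c_i and widths sigma_i.
centers = np.array([[0.0, 0.0], [3.0, 3.0]])
widths = np.array([1.0, 1.0])

def hidden_activations(X):
    """Gaussian RBF responses R_i(x), Eq. (21), for a batch X of shape (N, n)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / widths**2)

# Toy two-class data around the two centers, with one-hot target outputs.
X = np.vstack([centers[0] + 0.1 * rng.normal(size=(20, 2)),
               centers[1] + 0.1 * rng.normal(size=(20, 2))])
T = np.repeat(np.eye(2), 20, axis=0)          # desired outputs, (N, s)

R = hidden_activations(X)                     # (N, r) design matrix
W, *_ = np.linalg.lstsq(R, T, rcond=None)     # linear least squares weights

pred = np.argmax(R @ W, axis=1)               # winning output unit per sample
```

Because the output layer is linear (Eq. (22)), fitting W is a convex problem with a closed-form solution; this is what makes RBF training fast compared with multi-layer feed-forward networks.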

5.3. Clustering techniques

Clustering techniques employ an unsupervised method to find natural data groups in an unlabelled data set. Clustering is a process for partitioning a population of N_T patterns into k sets. In this process, the value of a cost function is minimized and the clusters are placed in representative regions of the input space. Several clustering techniques have been proposed to find an optimal or near-optimal partition configuration. In this paper, batch k-means [18], fuzzy clustering [19] and IO [20] are used. These techniques are described in the following subsections.

5.3.1. Batch k-means

The traditional k-means technique provides a simple mechanism for data clustering [18]. It divides the input space into k distinct clusters and places the center of each cluster at its middle point. The clusters are defined by minimizing the sum of the squared distances between the input patterns and the centers of their clusters. The batch approach is applied when the whole data set is available beforehand. The initial partition in the batch k-means is defined arbitrarily by placing each input pattern in a randomly selected cluster, such that every cluster ends up with at least one pattern. Since the whole data set is available, the centers are defined to be the averages of the patterns inside the clusters. While the k-means is performed, the patterns keep changing from one cluster to another and the centers are recalculated at each step. Let c_k^t and x^t denote the kth center and the input pattern at time t, respectively. The k-means algorithm adaptively computes the new center at time t + 1 as:

c_k^{t+1} = c_k^t + \eta \, \Delta_k(x^t) (x^t - c_k^t) \quad (23)

where \eta is a learning rate, which defines the rate at which the centers are updated, and \Delta_k(x^t) is a membership indicator specifying whether the pattern x^t belongs to the cluster k whose center is c_k^t. This indicator ensures that only the correct center is updated. The membership indicator \Delta_k(x) uses the minimum square Euclidean distance: if, for all i \ne k, the condition \| x - c_k \|^2 \le \| x - c_i \|^2 is satisfied, then \Delta_k(x) = 1; otherwise it is zero. The following algorithm is the basis of the k-means technique:

Step 1 Arbitrarily select an initial cluster configuration.
Step 2 For each cluster, calculate the centers c_1, \ldots, c_k.
Step 3 Redistribute the patterns among the clusters using the minimum square Euclidean distance.
Step 4 Go to Step 2 until there is no further change in the clusters' centers.
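The four steps above can be sketched as follows (a batch sketch with a deterministic initial configuration for reproducibility; in the paper the initial partition is random):

```python
import numpy as np

def batch_kmeans(X, k, init_idx, n_iter=100):
    """Batch k-means (Section 5.3.1): alternate minimum square Euclidean
    distance reassignment (Step 3) and mean-center updates (Step 2) until
    the assignment no longer changes (Step 4).

    init_idx: indices of the patterns used as the initial centers
    (an arbitrary initial configuration, Step 1).
    """
    centers = X[list(init_idx)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Step 3: assign each pattern to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):      # Step 4: converged
            break
        labels = new_labels
        # Step 2: recompute each non-empty cluster's center as its mean.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

# Two well-separated groups of patterns.
X = np.vstack([np.zeros((10, 2)), 5.0 * np.ones((10, 2))])
centers, labels = batch_kmeans(X, k=2, init_idx=(0, 10))
```

On separable data the procedure converges in a couple of passes, with each center landing on the mean of its group.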

5.3.2. Iterative optimization

The IO technique [20] employs breadth search and

examines the effect in objective function of moving a

pattern from one cluster to another. Starting with an

arbitrary partition, all possible move are investigated. A

move becomes definitive when the largest decrease in the

objective function is achieved. Consider, for example, the

pattern xi belonging to the cluster s: The pattern xi will only

J. Haddadnia, M. Ahmadi / Image and Vision Computing 22 (2004) 1071–10821076

Page 7: N-feature neural network human face recognition

be moved from cluster s to cluster r if the following conditions are satisfied:

n_s‖x_i − c_s‖²/(n_s − 1) > n_r‖x_i − c_r‖²/(n_r + 1)    (24)

n_r‖x_i − c_r‖²/(n_r + 1) = min_{1≤j≤k, j≠s} n_j‖x_i − c_j‖²/(n_j + 1)    (25)

where n_j is the number of patterns currently in cluster j. This technique employs the following algorithm:

Step 1 Select an initial partition of the n samples into k clusters and compute the cluster means c_1, ..., c_k.

Step 2 Select the next candidate sample x_i.

Step 3 Suppose x_i is currently in cluster s.

Step 4 If n_s = 1, go to Step 6 (clusters with only one pattern should not be destroyed). Else compute:

p_j = n_j‖x_i − c_j‖²/(n_j + 1) for j ≠ s, and p_s = n_s‖x_i − c_s‖²/(n_s − 1)    (26)

Step 5 Transfer x_i to cluster r if p_r ≤ p_j for all j.

Step 6 Update c_1, ..., c_k.

Step 7 If there is no change in the objective function in n attempts, stop; otherwise go to Step 2.

The centers can be updated efficiently by initializing them with the mean of the patterns inside each cluster:

c_j^0 = (1/n_j) Σ_{x∈c_j} x    (27)

and in the following steps using:

c_j^{t+1} = c_j^t + (x − c_j^t)/(n_j + 1).    (28)
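One pass of the algorithm above can be sketched as follows; this is an illustrative reading in which p_j follows the move test of Eqs. (24)-(26) and the centers are updated incrementally in the spirit of Eq. (28) (function and variable names are our own):

```python
import numpy as np

def io_pass(X, labels, centers):
    """One iterative-optimization pass; mutates labels and centers in place."""
    moved = False
    for i, x in enumerate(X):
        s = labels[i]
        n = np.bincount(labels, minlength=len(centers))
        if n[s] == 1:          # singleton clusters should not be destroyed
            continue
        # p_j: cost change if x joins cluster j (Eq. (26), case j != s)
        p = np.array([n[j] * np.sum((x - centers[j]) ** 2) / (n[j] + 1)
                      for j in range(len(centers))])
        # case j = s: cost change of removing x from its own cluster
        p[s] = n[s] * np.sum((x - centers[s]) ** 2) / (n[s] - 1)
        r = p.argmin()
        if r != s and p[r] < p[s]:   # Eq. (24): the move lowers the cost
            labels[i] = r
            # incremental mean updates in the spirit of Eq. (28)
            centers[r] += (x - centers[r]) / (n[r] + 1)
            centers[s] -= (x - centers[s]) / (n[s] - 1)
            moved = True
    return moved
```

Repeating such passes until no sample moves reproduces the stopping test of Step 7.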

5.3.3. Fuzzy clustering

Fuzzy clustering takes advantage of recent advances in optimal fuzzy clustering, known as the Fuzzy C-Means (FCM) algorithm, to cluster the data set [19]. FCM is a clustering algorithm in which each data point is associated with a cluster through a membership degree. This technique partitions a collection of N_T data points into r fuzzy groups and finds a cluster center in each group, such that a cost function of a dissimilarity measure is minimized. The algorithm employs fuzzy partitioning such that a given data point can belong to several groups with a degree specified by membership grades between 0 and 1. A fuzzy r-partition of the input feature vectors X = {x_1, x_2, ..., x_{N_T}} ⊂ R^n is represented by a matrix U = [μ_ik], whose entries satisfy the following constraints:

μ_ik ∈ [0, 1], 1 ≤ i ≤ r, 1 ≤ k ≤ N_T    (29)

Σ_{i=1}^{r} μ_ik = 1, 1 ≤ k ≤ N_T    (30)

0 < Σ_{k=1}^{N_T} μ_ik < N_T, 1 ≤ i ≤ r    (31)

The cluster structure of X can be described by U by interpreting μ_ik as the degree of membership of x_k to cluster i. A proper partition U of X may be defined by the

minimization of the following objective function [19]:

J_m(U, C) = Σ_{k=1}^{N_T} Σ_{i=1}^{r} (μ_ik)^m d_ik²    (32)

where m ∈ [1, ∞) is a weighting exponent called the fuzzifier, C = {c_1, c_2, ..., c_r} is the vector of the cluster centers, and d_ik is the distance between x_k and the ith cluster. Bezdek [25] proved that if m > 1, d_ik² > 0 and 1 ≤ i ≤ r, then U and C minimize J_m(U, C) only if their entries are computed as follows:

μ*_ik = [ Σ_{j=1}^{r} (d_ik/d_jk)^{2/(m−1)} ]^{−1}    (33)

c*_i = [ Σ_{k=1}^{N_T} (μ_ik)^m x_k ] / [ Σ_{k=1}^{N_T} (μ_ik)^m ]    (34)

The computation of the membership degrees μ*_ik depends on the definition of the distance measure d_ik, which is an inner-product (quadratic) norm. The squared quadratic norm (distance) between a pattern vector x_k and the center c_i of the ith cluster is defined as follows:

d_ik² = (x_k − c_i)^T G (x_k − c_i)    (35)

where G is an n × n identity matrix. The FCM algorithm determines the cluster centers c_i and the membership matrix U for a given r value as follows:

Step 1 Initially, the membership matrix is constructed using random values between 0 and 1.

Step 2 For each cluster i (i = 1, 2, ..., r), the fuzzy cluster centers c_i are calculated using Eq. (34).

Step 3 For each cluster i, the distance measures d_ik are computed using Eq. (35).

Step 4 The cost function in Eq. (32) is computed; if it is below a certain tolerance value, or its improvement over the previous iteration is below a certain threshold, the clustering procedure is terminated.

Step 5 A new U is computed using Eq. (33) and Steps 2–5 are repeated.
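Steps 1–5 can be sketched as below, assuming the identity matrix G of Eq. (35) and illustrative choices for r, the fuzzifier m and the stopping tolerance (none of these values are prescribed by the paper):

```python
import numpy as np

def fcm(X, r, m=2.0, tol=1e-5, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    NT = len(X)
    # Step 1: random membership matrix whose columns sum to 1 (Eq. (30))
    U = rng.random((r, NT))
    U /= U.sum(axis=0)
    prev_J = np.inf
    for _ in range(max_iter):
        Um = U ** m
        # Step 2: fuzzy cluster centers, Eq. (34)
        C = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Step 3: squared distances d_ik^2 of Eq. (35) with G = identity
        d2 = ((C[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        # Step 4: cost function of Eq. (32); stop when improvement is small
        J = (Um * d2).sum()
        if prev_J - J < tol:
            break
        prev_J = J
        # Step 5: new membership matrix, Eq. (33):
        # mu_ik = d_ik^(-2/(m-1)) / sum_j d_jk^(-2/(m-1))
        inv = np.maximum(d2, 1e-12) ** (-1.0 / (m - 1))
        U = inv / inv.sum(axis=0, keepdims=True)
    return U, C
```

The small floor on d² guards the negative power when a pattern coincides with a center.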

5.4. Computing output connection weights

The second step in each learning algorithm is to adjust the output connection weights. Here, we use the linear least


squared method for adjusting the RBF neural network

output connection weights [16,17]. For any input feature

vector xk [ Rn; the output of the RBF neural network in

Eq. (22) can be determined in a more compact form as

follows:

W R = Y    (36)

where R ∈ R^{u×N_T} is the matrix of the RBF unit outputs, W ∈ R^{s×u} is the output connection weight matrix, Y ∈ R^{s×N_T} is the output matrix and N_T is the total number of training sample patterns, which is usually greater than s. Therefore, this is an over-determined problem and, in general, there is no exact procedure for solving for W in Eq. (36). Instead, the linear least-squares method is utilized to obtain an approximate solution W′ ∈ R^{s×u}, which is close to W in a least-squares sense. To find W′, the squared error function is determined by:

SE = ‖T − W R‖²    (37)

where T = (t_1, t_2, ..., t_s)^T ∈ R^{s×N_T} is the target matrix consisting of 1's and 0's, with each column having only one non-zero element, which identifies the class to which the given exemplar belongs. By using the linear least-squares method we can find W′ such that:

W′ R = T    (38)

The optimal W′ can be obtained by [16,17]:

W′ = T (R^T R)^{−1} R^T    (39)

where (R^T R)^{−1} R^T is the pseudo-inverse of R and R^T is the transpose of R. From Eq. (39) we can now compute the output connection weights.
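As a sketch, Eq. (39) reduces to a single matrix product; here `numpy.linalg.pinv` stands in for the pseudo-inverse of R (a numerically safer equivalent of the explicit formula), and the toy dimensions follow Eq. (36):

```python
import numpy as np

def output_weights(R, T):
    """R: u x NT matrix of RBF unit outputs, T: s x NT target matrix;
    returns the s x u weight matrix W' of Eq. (39)."""
    # pinv(R) plays the role of the pseudo-inverse of R in Eq. (39)
    return T @ np.linalg.pinv(R)
```

For a full-row-rank R this recovers W exactly whenever T = W R.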

6. Experimental results

To check the utility of the proposed system, experimental

studies are carried out on the ORL database images of

Cambridge University and the Yale face database of Yale

University. The ORL database contains 400 face images

from 40 individuals in different states. The total number of

images for each person is 10. None of the 10 samples is

identical to any other. They vary in position, rotation, scale

and expression. The changes in orientation have been

accomplished by rotating the person a maximum of 208 in

the same plane; also each person has changed his/her facial

expression in each of 10 samples (open/closed eye,

smiling/not smiling). The changes in scale have been

achieved by changing the distance between the person and

the video camera. For some individuals, the images were

taken at different times, varying facial details (glasses/no

glasses). Each image was digitized and represented by a 112 × 92 pixel array whose gray levels ranged between

0 and 255. Samples of the ORL database are shown in Fig. 8.

The Yale face database contains 165 face images of 15

individuals. There are 11 images per subject, one for each

facial expression or configuration: center light, glasses/no

glasses, happy, normal, left light, right light, sad, sleepy,

surprised and wink. Samples of Yale face database are

shown in Fig. 9.

Dividing database images into training and test sets was

part of the experimental studies. The training and testing set

is selected, by randomly choosing five images for each

subject from the ORL database and six images from the

Yale database. Therefore, in the ORL database a total of

200 images are used as the training set and another 200 are

used as the testing set while in the Yale database a total of

Fig. 8. Samples of the ORL database.


90 images are used for training and the rest are used for

testing. There is no overlap between the training and test sets. All moments from order 9 to 10, with 21 elements, were considered as the feature vector in the PZM feature domain [4], while in the PCA feature domain the feature vector was created from the 50 largest PCA values [14]. In the DCT feature domain, the 50 DCT values were considered as the feature vector [16]. The experimental study conducted in this paper is organized into the following subsections.

6.1. The SFNN system performance evaluation

In this phase of the experimental study, each individual

RBF classifier with its corresponding clustering technique

has been trained with the training set from the ORL database, and separately tested with each feature domain. This process has been repeated for each classifier 30 times by randomly choosing different training and test sets. The average classifier error over the 30 runs for the RBF + k-means with respect to the face classes is shown in Fig. 10, while Figs. 11 and 12 show the average classifier error for the RBF + IO and RBF + fuzzy, respectively. These figures show that the best performance for the PZM, PCA and DCT domains is achieved when fuzzy clustering (for the PZM), the k-means method (for the PCA) and IO (for the DCT) are used as the clustering technique in the RBF neural network classifiers.

6.2. The HNFNN system performance evaluation

In this phase, the HNFNN system has been constructed.

The PZM with RBF + fuzzy, PCA with RBF + k-means and DCT with RBF + IO have been considered as feature

domains and individual classifiers, respectively. Training and test sets have been selected from the ORL and Yale databases. A recognition rate of 100% was obtained for

the Yale face database while for the ORL database

the recognition rate was 99.5%. Fig. 13 shows the classifier

error with respect to the class number for the HNFNN

system on the ORL database. Also in this figure, for each

individual classifier with its corresponding feature domain,

the classifier error has been shown. From the results, it is

obvious that the recognition rate of the HNFNN is much better than that of any individual classifier. From this

figure, it is clear that the output of each individual classifier

may agree or conflict with each other but the proposed

HNFNN searches for a maximum degree of agreement

between the conflicting supports of a face pattern.
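The fusion by majority rule described above can be sketched with a hypothetical helper; how the system resolves a complete three-way disagreement is not restated here, so this sketch simply falls back to the first classifier's label (an assumption of ours):

```python
from collections import Counter

def majority_fuse(labels):
    """Return the identity with the maximum degree of agreement among
    the individual classifier outputs (e.g. PZM, PCA and DCT branches)."""
    # most_common(1) keeps the first-seen label on a complete tie
    (winner, _), = Counter(labels).most_common(1)
    return winner
```

For example, if two of the three classifiers agree on a class, that class wins regardless of the third vote.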

6.3. FCT and distinguishing between face and non-face

To evaluate the effect of the FCT in the face localization step and in distinguishing between face and non-face images,

Fig. 9. Samples of the Yale face database.

Fig. 10. Average classifier error for RBF + k-means with different feature domains based on the class number.


we prepared 20 non-face images and applied them to the

system. Fig. 3 shows a sample of such images with f_i = 0.15 and f_o = 0.191. We varied the FCT value and

evaluated the number of non-face images that passed

through the system. Experimental results showed that

FCT ¼ 0.1 is a good threshold for distinguishing between

face and non-face images. Fig. 14 shows this result.

6.4. CIR and disregarding irrelevant data

For the purpose of evaluating how the irrelevant data of a

facial image such as hair, neck, shoulders and background

will influence the recognition results, we selected

FCT ¼ 0.1 for the face localization method and the ORL

database. We varied the CIR value and evaluated the

recognition rate of the proposed HNFNN system. Fig. 15

shows the effect of CIR values on the classifier error.

6.5. Comparison with other face recognition systems

This study compares the proposed HNFNN system with

the methods that used the same ORL database. These

include the shape information neural network (SINN) [4],

convolution neural network (CNN) [26], nearest feature line

(NFL) [27] and the fractal transformation (FT) [28]. In this

comparison, the training set and the test set were derived in

the same way as was suggested in Refs. [4,26–28]: The

10 images from each class of the 40 persons were randomly

partitioned into sets, resulting in 200 training images

and 200 test images, with no overlap between the two.

Also in this study, the error rate was defined to be the

number of misclassified images in the test phase over the

total number of test images, as was used in other studies

[4,26–28]. To conduct the comparison, an average error rate

was utilized, as defined below [4,26–28]:

E_ave = Σ_{i=1}^{m} N_im / (m N_t)    (40)

where m is the number of experimental runs, each being

performed on a random partitioning of the database into

sets, Nim is the number of misclassified images for the ith

run, and Nt is the number of total test images for each run.
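Eq. (40) is a plain average of per-run error rates; the following direct transcription uses a made-up example of m = 4 runs and N_t = 200 test images purely for illustration:

```python
def average_error_rate(misclassified_per_run, n_test):
    """E_ave of Eq. (40): misclassified_per_run holds N_im for each of the
    m runs; n_test is N_t, the number of test images per run."""
    m = len(misclassified_per_run)
    return sum(misclassified_per_run) / (m * n_test)

# e.g. 4 runs of 200 test images with 2, 0, 1 and 1 misses -> 4/800 = 0.5%
rate = average_error_rate([2, 0, 1, 1], 200)
```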

Table 1 shows the comparison between the different techniques using the same ORL database in terms of E_ave.

In this table, the CNN error rate was based on the average of

three runs as given in Ref. [26], while for the NFL

Fig. 12. Average classifier error for RBF + fuzzy with different feature domains based on the class number.

Fig. 13. Classifier error in the HNFNN and each individual classifier with

respect to class number.

Fig. 11. Average classifier error for RBF + IO with different feature domains based on the class number.

Fig. 14. Effect of facial candidate threshold (FCT).

Fig. 15. Error rate variation with respect to CIR value on the ORL database.


the average error rate of four runs was reported in Ref. [27].

Also a single run for the FT [28] and four runs for

the SINN [4] were carried out as suggested in the respective

papers. The average error rate of the proposed method for

the four runs is 0.48%, which is the lowest error rate among these techniques on the ORL database.

6.6. Time considerations

To evaluate the computational cost, we tested our simulations on a Pentium 4 computer with a clock speed of 1.8 GHz. The average time for each stage of Section 6.1 and for the HNFNN system over 30 runs has been computed. Table 2 shows the results.

7. Conclusions

This paper presented an efficient method for the

recognition of human faces in two-dimensional digital

images. The proposed technique is based on the HNFNN

structure. The paper introduces two parameters, FCT and

CIR, for efficient and robust feature extraction technique.

Exhaustive experimentations were carried out to investigate

the effect of varying these parameters on the recognition

rate. We have also indicated the optimum values of the FCT

and CIR corresponding to the best recognition results. The

RBF neural networks, training with different learning

algorithms, were fused together to make a decision. This

paper also showed how the combining RBF neural networks

improve the classifier performance. The highest recognition

rates of 99.5% with the ORL database and 100% with the

Yale database were obtained using the proposed HNFNN

system. Comparison with some of the existing traditional techniques in the literature on the same database indicates the usefulness of the proposed technique.

Acknowledgements

The authors would like to thank Natural Sciences and

Engineering Research Council (NSERC) of Canada and

Micronet for supporting this research, and the anonymous

reviewers in VI02 conference for helpful comments.

References

[1] M.A. Grudin, On internal representation in face recognition systems,

Pattern Recognition 33 (7) (2000) 1161–1177.

[2] E. Hjelmås, B.K. Low, Face detection: a survey, Computer Vision and Image Understanding 83 (3) (2001) 236–274.

[3] M.H. Yang, D.J. Kriegman, N. Ahuja, Detecting faces in images: a survey, IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (1) (2002) 34–58.

[4] J. Haddadnia, M. Ahmadi, K. Faez, An Efficient Method for Recognition of Human Faces using Higher Order Pseudo Zernike Moment Invariant, The Fifth IEEE International Conference

on Automatic Face and Gesture Recognition, Washington, DC, USA,

May 20–21, 2002, pp. 315–320.

[5] J. Haddadnia, K. Faez, Human Face Recognition Based on Shape

Information and Pseudo Zernike Moment, Fifth International Fall

Workshop Vision, Modeling and Visualization, Saarbrucken,

Germany, November 22–24, 2000, pp. 113–118.

[6] J. Wang, T. Tan, A new face detection method based on shape

information, Pattern Recognition Letter 21 (2000) 463–471.

[7] L.F. Chen, H.M. Liao, J. Lin, C. Han, Why recognition in a statistic-

based face recognition system should be based on the pure face

portion: a probabilistic decision-based proof, Pattern Recognition 34

(7) (2001) 1393–1403.

[8] G. Giacinto, F. Roli, G. Fumera, Unsupervised Learning of Neural

Network Ensembles for Image Classification, IEEE International Joint

Conference on Neural Network, vol. 3, 2000, pp. 155–159.

[9] J. Kittler, M. Hatef, R.P.W. Duin, J. Matas, On combining classifier,

IEEE Transactions on Pattern Analysis and Machine Intelligence 20

(1998) 226–239. March.

[10] T.K. Ho, J.J. Hull, S.N. Srihari, Decision combination in multiple

classifier systems, IEEE Transactions on Pattern Analysis and

Machine Intelligence 16 (1) (1994) 66–75. January.

[11] M. Turk, A. Pentland, Eigenfaces for recognition, Journal of Cognitive Neuroscience 3 (1) (1991) 71–86.

[12] P.M. Embree, B. Kimble, C Language Algorithms for Digital Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1991.

[13] J.S. Roger Jang, C.T. Sun, Functional equivalence between radial

basis function network and fuzzy inference system, IEEE Trans-

actions on Neural Networks 4 (1) (1993) 156–158.

[14] C.E. Thomaz, R.Q. Feitosa, A. Veiga, Design of Radial Basis

Function Network as Classifier in Face Recognition using Eigenfaces,

IEEE Proceedings of Fifth Brazilian Symposium on Neural Network,

1998, pp. 118–123.

[15] Y. Hara, R.G. Atkins, S.H. Yueh, R.T. Shin, J.A. Kong, Application

on neural networks to radar image classification, IEEE Transactions

on Geoscience and Remote Sensing 32 (1) (1994) 100–109.

[16] J. Haddadnia, K. Faez, M. Ahmadi, P. Moallem, Design of RBF neural

network using an efficient hybrid learning algorithm with application

in human face recognition with pseudo Zernike moment, IEICE

Transactions on Informatics and Systems E86-D (2) (2003) 316–325.

February.

[17] J. Haddadnia, M. Ahmadi, K. Faez, A Hybrid Learning RBF Neural Network for Human Face Recognition with Pseudo Zernike Moment

Invariant, IEEE International Joint Conference on Neural Network,

Honolulu, Hawaii, USA, May 12–17, 2002, pp. 11–16.

Table 1

Error rate in different methods

Methods            m    E_ave (%)
CNN [26]           3    3.83
NFL [27]           4    3.125
FT [28]            1    1.75
SINN [4]           4    1.323
Proposed method    4    0.48

Table 2

Average time for different methods

Methods          Average time among 30 runs (ms)
RBF + k-means    2223
RBF + fuzzy      3421
RBF + IO         2850
HNFNN            2910


[18] S. Gutta, J.R.J. Huang, P. Jonathon, H. Wechsler, Mixture of

experts for classification of gender, ethnic origin, and pose of

human faces, IEEE Transactions on Neural Networks 11 (4)

(2000) 949–960. July.

[19] F. Behloul, B.P.F. Lelieveldt, A. Boudraa, J.H.C. Reiber, Optimal

design of radial basis function neural networks for fuzzy-rule

extraction in high dimensional data, Pattern Recognition 35 (3)

(2002) 659–675. March.

[20] R.O. Duda, P.E. Hart, Pattern Classification and Scene Analysis,

Wiley Interscience Publication, New York, 1973.

[21] C.-H. Teh, R.T. Chin, On image analysis by the methods of moments,

IEEE Transactions on Pattern Analysis and Machine Intelligence 10

(4) (1988) 496–513.

[22] J. Haddadnia, K. Faez, P. Moallem, Neural Network based Face

Recognition with Moments Invariant, IEEE International Conference

on Image Processing, Thessaloniki, Greece, 7–10 October, vol. I,

2001, pp. 1018–1021.

[23] W. Zhou, Verification of the non-parametric characteristics of

backpropagation neural networks for image classification, IEEE

Transactions on Geoscience and Remote Sensing 37 (2) (1999)

771–779. March.

[24] Y. Lu, Knowledge integrations in a multiple classifier system, Applied

Intelligence 6 (2) (1996) 75–86. April.

[25] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function

Algorithms, Plenum, New York, 1981.

[26] S. Lawrence, C.L. Giles, A.C. Tsoi, A.D. Back, Face recognition: a

convolutional neural networks approach, IEEE Transactions on

Neural Networks, Special Issue on Neural Networks and Pattern

Recognition 8 (1) (1997) 98–113.

[27] S.Z. Li, J. Lu, Face recognition using the nearest feature line method,

IEEE Transactions on Neural Networks 10 (1999) 439–443.

[28] T. Tan, H. Yan, Face recognition by fractal transformations, IEEE

International Conference on Acoustics, Speech and Signal Processing

6 (1999) 3537–3540.
