Analysis of Unsupervised Learning Techniques for Face Recognition

Dinesh Kumar,1 C. S. Rai,2 Shakti Kumar3

1 Department of Computer Science and Engineering, Guru Jambheshwar University of Science & Technology, Hisar, Haryana, India
2 University School of Information Technology, GGS Indraprastha University, Kashmere Gate, Delhi, India
3 Computational Intelligence Lab, Institute of Science & Technology, Klawad, District Yamuna Nagar, Haryana, India

Received 16 January 2008; accepted 15 May 2010

ABSTRACT: Face recognition has always been a potential research area because of its demand for reliable identification of a human being, especially in government and commercial sectors such as security systems, criminal identification, and border control, where a large number of people interact with each other and/or with the system. The last two decades have witnessed many supervised and unsupervised learning techniques proposed by different researchers for the face recognition system. Principal component analysis (PCA), self-organizing map (SOM), and independent component analysis (ICA) are the most widely used unsupervised learning techniques reported by the research community. This article presents an analysis and comparison of these techniques. The article also includes two SOM processing methods, global SOM (GSOM) and local SOM (LSOM), for performance evaluation along with PCA and ICA. We have used two different databases for our analysis. The simulation results establish the supremacy of GSOM in general among all the unsupervised techniques. © 2010 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 20, 261–267, 2010; View this article online at wileyonlinelibrary.com. DOI 10.1002/ima.20248

Key words: face recognition; principal component analysis; self-organizing maps; independent component analysis

I. INTRODUCTION

Face recognition has gained a lot of popularity in the past 15 to 20 years due to recent advancements and a large number of real-world applications such as surveillance, secured access, information security, and voter identification. Face recognition approaches are broadly grouped into three categories (Chellapa et al., 1995; Zhao et al., 2003): feature based (Kelly, 1970; Manjunath et al., 1992; Samaria and Fallside, 1993), holistic or appearance based (Kirby and Sirovich, 1990; Turk and Pentland, 1991; Etemed and Chellappa, 1997; Teixeira and Beveridge, 2003), and hybrid techniques (Pentland et al., 1994; Penev and Atick, 1996; Weyrauch et al., 2004). Although all three approaches are important, the appearance-based approach has attracted the attention of the majority of researchers. This approach extracts a holistic representation of the whole face image, which is compared with several such images to find a match. Turk and Pentland (1991) made a significant contribution toward machine recognition of faces, and their work led to the design of reliable real-time automated face recognition systems.

The Principal Component Analysis (PCA) technique, pioneered by Kirby and Sirovich (1990), was used to reduce the dimensionality of the data by removing less useful information and decomposing the face into uncorrelated components known as Eigenfaces (Turk and Pentland, 1991). This approach was purely based on second-order statistics. Linear Discriminant Analysis (LDA), also known as Fisherfaces, was proposed in (Swets and Weng, 1996; Belhumeur et al., 1997). This technique aims to maximize the between-class variance and minimize the within-class variance. LDA was insensitive to large variations in lighting and facial expressions. Although it was claimed that LDA is better than PCA, it was shown that PCA outperformed LDA, especially when the training data set was small (Martinez and Kak, 2001). Besides, PCA was found to be less sensitive to the choice of training data set. Another technique, known as Laplacianfaces, was proposed (He et al., 2005) to preserve the local structure of the data; Locality Preserving Projections (LPP) were used for mapping the face images into a face subspace, and the technique was found more suitable for frontal face images. Several other methods, such as Probabilistic Subspaces (Moghaddam and Pentland, 1995, 1998; Moghaddam et al., 1998; Moghaddam, 1999), the Feature Line Method (Li and Lu, 1999), Evolutionary Pursuit (Liu and Wechsler, 2000), and Support Vector Machines (SVM) (Phillips, 1999), have also been proposed by various researchers, each with its relative advantages and disadvantages.

Correspondence to: Dinesh Kumar; e-mail: [email protected]
© 2010 Wiley Periodicals, Inc.

A large number of face recognition algorithms use PCA, which deals with the second-order statistics of the images and does not take higher order dependencies into consideration. Independent Component Analysis (ICA) (Comon, 1994; Hyvarinen, 1999) is one technique that decorrelates the higher order moments of the input besides the second-order ones. There exist a large number of statistical techniques based on information-theoretic concepts and algebraic approaches for performing ICA, and neural algorithms derived from these approaches are used for extracting the independent components. Bartlett and Sejnowski (Bartlett and Sejnowski, 1997; Bartlett et al., 1998, 2002) developed an ICA-based method for representing images for face recognition using two different architectures. The self-organizing map (SOM) (Kohonen, 1988, 1997) is another algorithm used for self-organization or unsupervised learning that discovers the significant features in the input data. SOMs have also been successfully used for dimensionality reduction and feature selection in face space representations (Lawrence et al., 1996; Neagoe and Ropot, 2002; Tan et al., 2005).

We have used two popular standard face databases (ORL and Yale) for our experimental work. The ORL face database (http://www.cam-orl.co.uk/facedatabase.html) is composed of 400 images of 40 different subjects (persons), each having 10 different images. These images vary in facial expressions (open/closed eyes, smiling/not smiling) and facial details (glasses/no glasses). The images, with slightly varying illumination, are in an upright frontal position with slight left-right rotation. The Yale face database (http://www1.cs.columbia.edu/belhumeur/pub/images/yalefaces/) contains images of 28 human subjects under nine poses and 64 illumination conditions. For the experiments, two different sets, each having 10 subjects, were prepared: the Yale Pose set had 10 subjects, each with nine different poses, whereas the other set, Yale Illumination, had 10 subjects, each with 10 face images under different illumination. Another database was prepared that combined face images from the ORL, Yale Pose, and Yale Illumination databases; it had a total of 40 subjects (20 from ORL and 10 each from Yale Pose and Yale Illumination).

This article investigates the performance of two approaches, global SOM (GSOM) and local SOM (LSOM), for face recognition on the above databases. The two approaches have been compared in terms of the recognition rate of the face recognition system, and a comparison of PCA, SOM, and ICA has also been carried out. The article is divided into five sections. Section II introduces SOM, PCA, and ICA, followed by a description of GSOM and LSOM in Section III. Section IV is devoted to performance evaluation of the system in terms of recognition rate. Based on the results and discussions presented in that section, conclusions are drawn in Section V.

II. SOM, PCA, AND ICA

A. Self-Organizing Maps. The SOM is a feed-forward neural network that belongs to the unsupervised class and uses an unsupervised training algorithm. It follows a process of self-organization and configures the output units in such a way that the topological structure of the original data is preserved. The SOM transforms high-dimensional data onto a 1D or 2D layer of neurons. The neurons compete to be activated and fired, and only the neuron that wins the competition is activated. A neighborhood function is centered around this winning neuron; it initially covers the entire lattice and is allowed to shrink gradually until it contains only the winning neuron. The algorithm goes through two phases: an ordering phase, during which the topological ordering of the weight vectors takes place, and a convergence phase, which covers the fine tuning of the computational map.
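The competition and neighborhood update described above can be sketched as follows. This is a minimal illustration, not the paper's exact configuration: the grid size, Gaussian neighborhood, and linearly decaying schedules are assumed choices, and all names (`som_step`, `coords`) are illustrative.

```python
# Minimal SOM training sketch: competition, Gaussian neighborhood,
# and cooperative weight update on a 2-D grid (illustrative settings).
import numpy as np

rng = np.random.default_rng(0)
r, q = 5, 16                      # 5 x 5 map, 16-dimensional inputs
weights = rng.random((r * r, q))  # one weight vector per neuron

# grid coordinates of each neuron, used by the neighborhood function
coords = np.array([(j, k) for j in range(r) for k in range(r)], dtype=float)

def som_step(x, weights, lr, sigma):
    """Move all weights toward x, scaled by distance to the winner."""
    # competition: the neuron closest to x (Euclidean distance) wins
    winner = np.argmin(np.linalg.norm(weights - x, axis=1))
    # Gaussian neighborhood centered on the winner; shrinks as sigma shrinks
    d2 = np.sum((coords - coords[winner]) ** 2, axis=1)
    h = np.exp(-d2 / (2 * sigma ** 2))
    # cooperative update: neighbors of the winner also move toward x
    weights += lr * h[:, None] * (x - weights)
    return winner

data = rng.random((200, q))
for epoch in range(50):
    lr = 0.9 * (1 - epoch / 50)                  # ordering phase: lr near 1, decaying
    sigma = max(0.5, (r / 2) * (1 - epoch / 50)) # neighborhood covers lattice, then shrinks
    for x in data:
        som_step(x, weights, lr, sigma)
```

The shrinking `sigma` reproduces the two phases in the text: early on the neighborhood spans the whole lattice (topological ordering), and later it covers little more than the winner (fine tuning).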

B. Principal Component Analysis. PCA is a popular statistical unsupervised technique because of its wide use in many applications such as signal processing and pattern recognition. It reduces the dimensionality of the data by compressing the data into lower dimensions: it linearly transforms the original set of variables into a smaller set of variables that are uncorrelated and retain most of the information contained in the original data. A smaller data set is much easier to handle, requires less storage space, and reduces the computational complexity while retaining the maximum information. PCA has been used successfully for face recognition applications, where the dimensionality of the data is very high. For face recognition, consider a set of N sample images Γ = {Γ1, Γ2, ..., ΓN} taking values in an n-dimensional image space. Assume each face image has m × n = M pixels and is represented as an M × 1 column vector. A training set Γ of N face images of known individuals then forms an M × N matrix. The covariance matrix

C = \sum_{k=1}^{N} (\Gamma_k - \Psi)(\Gamma_k - \Psi)^T

is computed, where Ψ is the mean image of all the samples, and the eigenvectors and eigenvalues of C are determined such that CΦ = λΦ, where λ = diag(λ1, λ2, ..., λN) is the diagonal matrix of eigenvalues of C and Φ = [V1, V2, ..., VN] are the associated eigenvectors. The dimensionality can be reduced by selecting the first L < N eigenvectors to express the data in the new directions and discarding the rest.
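The eigenface computation above can be sketched as follows. The sizes are toy values and the names (`faces`, `L_keep`) are illustrative assumptions, not the paper's settings.

```python
# Sketch of the eigenface decomposition: center the training images,
# form the covariance matrix, keep the leading L < N eigenvectors.
import numpy as np

rng = np.random.default_rng(1)
M, N = 64, 10               # M pixels per image, N training images (toy sizes)
faces = rng.random((M, N))  # columns are vectorized images Gamma_1..Gamma_N

psi = faces.mean(axis=1, keepdims=True)  # mean image Psi
A = faces - psi                          # centered data (Gamma_k - Psi)
C = A @ A.T                              # covariance matrix, M x M

# symmetric eigen-decomposition; eigh returns eigenvalues in ascending order
vals, vecs = np.linalg.eigh(C)
order = np.argsort(vals)[::-1]           # sort descending by eigenvalue
vals, vecs = vals[order], vecs[:, order]

L_keep = 5                               # keep the first L < N eigenvectors
eigenfaces = vecs[:, :L_keep]

# project one face into the reduced subspace and reconstruct it
coeffs = eigenfaces.T @ A[:, 0]
recon = psi[:, 0] + eigenfaces @ coeffs
```

Note that with N centered training images the covariance matrix has at most N − 1 nonzero eigenvalues, which is why L can be chosen well below M.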

C. Independent Component Analysis. ICA is a generalization of PCA. It was originally developed to deal with problems closely related to the cocktail party problem, most commonly known as Blind Source Separation (BSS). It gained popularity because of its use in a wide variety of applications such as signal processing, pattern recognition, telecommunications, medical imaging, and financial time series analysis (Hyvarinen and Oja, 2000). PCA deals with second-order statistics, whereas ICA reduces the higher order statistical dependencies and tries to make the signals as independent as possible. Consider a source vector u and a mixing matrix A. The observation vector x is given by x = Au, where both A and u are unknown. The aim is to find a demixing matrix W such that the original vector u can be recovered from the output vector y defined as y = Wx. Since x = Au, we have y = WAu. Here W = A^{-1} leads to perfect separation of the source signals, i.e., y = u; practically, y should be as close to u as possible. We need an iterative technique for updating the weight matrix W in an unsupervised manner that will lead to source separation. Various ICA approaches have been proposed, mainly based on information-theoretic concepts. Bartlett et al. (Bartlett and Sejnowski, 1997; Bartlett et al., 1998, 2002; Bartlett, 1998) developed methods for representing face images. Two different architectures were proposed under the assumption that the face images are a linear mixture of an unknown set of statistically independent source images. Architecture I treated images as random variables and pixels as outcomes, whereas in Architecture II the pixels are treated as random variables and the images as outcomes. The Infomax algorithm proposed by Bell and Sejnowski (1995) was used for performing the ICA. The weight update rule is

\Delta W = \eta (I + (1 - 2z) y^T) W,

where z is the output of the logistic nonlinearity used (Fig. 1). ICA was performed with both Architectures I and II.
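The Infomax update rule above can be sketched as follows. This is a minimal sketch, assuming already-whitened data and a batch-averaged form of the rule; the batch size, learning rate, and iteration count are illustrative, not the paper's settings.

```python
# Sketch of the Infomax ICA update Delta W = eta (I + (1 - 2z) y^T) W,
# averaged over a batch of samples, with the logistic nonlinearity z = g(y).
import numpy as np

rng = np.random.default_rng(2)
n = 4                                  # number of components (toy size)
W = np.eye(n)                          # demixing matrix, initialized to identity
X = rng.standard_normal((n, 500))      # whitened observations, one column per sample

eta = 0.001
for _ in range(200):
    Y = W @ X                          # current source estimates y = Wx
    Z = 1.0 / (1.0 + np.exp(-Y))       # logistic nonlinearity z = g(y)
    # natural-gradient Infomax rule, averaged over the batch of samples
    dW = eta * (np.eye(n) + (1 - 2 * Z) @ Y.T / X.shape[1]) @ W
    W += dW
```

Premultiplying the gradient by W (the "natural gradient" form shown in the update rule) avoids a matrix inversion at each step and keeps the iteration stable.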

III. GSOM AND LSOM

A. Introduction. In global processing (GSOM), each and every pixel of the face image is fed into the SOM network, whereas in the local processing (LSOM) method the face image is divided into blocks and these blocks of pixels are processed. Global processing requires a substantially larger network than local processing. This is because the use of pixel blocks effectively reduces the dimensionality of the data space that has to be topologically represented in the SOM space. In both approaches the training images are mapped to lower dimensions and the weight matrix of each training image is stored. At the time of recognition, the training images are reconstructed using the weight matrices and matching with the test image is done using the Euclidean norm (L2 norm) as the similarity measure.

Figure 1. Maximum entropy method for ICA.

Figure 2. Flowchart for the SOM algorithm.

Table I. Total variance contribution rate for different n.

Eigenvalues (n)   199    160    120    100    80     40     20
TVCR (%)          100    98.67  96.06  94.13  91.53  81.72  69.82

Figure 3. (a) Original images, (b) reconstructed images using local processing, and (c) reconstructed images using global processing.
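The recognition-time matching step, comparing a test image against the reconstructed training images with the L2 norm, can be sketched as follows. All arrays here are synthetic placeholders with illustrative sizes.

```python
# Sketch of Euclidean (L2) nearest-neighbor matching: the reconstructed
# training image closest to the test image determines the identity.
import numpy as np

rng = np.random.default_rng(5)
recon = rng.random((50, 6400))             # reconstructed training images (placeholders)
labels = np.repeat(np.arange(10), 5)       # 10 subjects, 5 images each
test_img = recon[7] + 0.01 * rng.standard_normal(6400)  # noisy copy of sample 7

# L2 distance from the test image to every reconstructed training image
dists = np.linalg.norm(recon - test_img, axis=1)
predicted = labels[np.argmin(dists)]       # nearest-neighbor decision
```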

B. Algorithm. The steps are as follows:

1. Consider a face image I of size n × n. For GSOM, concatenate the face image to form a single vector x of size b × 1, where b = n × n; this forms the input for the 2D SOM. For LSOM, the face image is divided into sub-blocks of size a × a, giving a total of p = (n × n)/(a × a) blocks, each containing q = a × a elements; concatenating each block produces one vector per block, resulting in a matrix X = [x1, x2, ..., xp] of size q × p. This gives a stream of training vectors {x_i}, i = 1, ..., p.

2. Consider a 2D (r × r) map of neurons, each identified by an index jk, with j, k = 1, 2, ..., r. The jk-th neuron has an incoming weight vector w_jk = (w_{1,jk}, ..., w_{q,jk}) at instant i. The neighborhood around the winning neuron at instant i is h_JK. Initialize the weights w_jk, the neighborhood h_JK, and the learning rate η0.

3. For GSOM, present the single vector x of size b × 1 obtained in step 1 to the 2D (r × r) map of neurons with a total of z = r × r neurons. For LSOM, pick a sample vector x_i at random and present it to the map.

4. Find the best matching (winning) neuron using the distance criterion

\| x_i - w_{JK}(i) \| = \min_{jk} \{ \| x_i - w_{jk}(i) \| \},

where w_JK is the best matching weight vector.

5. Update the synaptic weight vectors of only the winning cluster:

w_{jk}(i+1) = w_{jk}(i) + \eta_i (x(i) - w_{jk}(i)), \quad jk \in h_{JK}(i).

6. Update the learning rate η_i and the neighborhood h_JK(i).

7. Continue with step 3 until no noticeable changes in the feature map are observed. Finally, a matrix R of size z × 1 is obtained for GSOM; in the case of LSOM, a matrix M of size z × q is obtained.

8. Retain the weight matrices for both methodologies.

9. Repeat the above steps for all training images.

10. At the time of recognition, reconstruct the images and match with the test image using a nearest neighbor classifier.
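The input preparation in step 1 can be sketched as follows, for both GSOM (one long vector) and LSOM (a matrix of flattened blocks). The image content is a placeholder; the block side a = 4 matches one of the block sizes used later in the experiments.

```python
# Sketch of step 1: carve an n x n image into non-overlapping a x a
# blocks for LSOM, or flatten it into one b x 1 vector for GSOM.
import numpy as np

n, a = 80, 4                         # image side and block side
img = np.arange(n * n, dtype=float).reshape(n, n)  # placeholder image

p = (n * n) // (a * a)               # number of blocks, p = (n*n)/(a*a)
q = a * a                            # elements per block, q = a*a

# reshape into a grid of tiles, then flatten each tile row-major
blocks = img.reshape(n // a, a, n // a, a).swapaxes(1, 2).reshape(p, q)
X = blocks.T                         # X = [x1, ..., xp], size q x p (LSOM input)

# GSOM instead uses the whole image as a single b x 1 vector, b = n*n
x_global = img.reshape(-1, 1)
```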

Figure 2 shows the flowchart. At the start of the algorithm, the neighborhood h_JK(i) usually includes all neurons in the field, and its extent reduces gradually. During the initial period of adaptation, called the ordering phase, the learning rate η_i is kept close to unity and then decreases linearly, exponentially, or inversely with the index i. During the tuning phase, which follows the ordering phase, it has a very small value but is never zero. For the experiments, the "hextop" topology was chosen together with "linkdist" as the distance function. The ordering phase learning rate was kept at 0.9 while the tuning phase rate was maintained at 0.02.

For PCA, the total variance contribution rate (TVCR) was computed by retaining only n eigenvalues out of a total of 200 images (200 eigenvalues):

\mathrm{TVCR} = \frac{\sum_{i=1}^{n} \lambda_i}{\sum_{i=1}^{L} \lambda_i} \times 100,

where L is the total number of eigenvalues. Table I gives the TVCR for various values of n.
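The TVCR formula above reduces to a few lines of code; the toy eigenvalue spectrum below is illustrative only.

```python
# Sketch of the TVCR computation: percentage of total variance
# captured by the first n of L eigenvalues, sorted descending.
import numpy as np

def tvcr(eigenvalues, n):
    """Total variance contribution rate for the first n eigenvalues."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]  # descending
    return 100.0 * lam[:n].sum() / lam.sum()

lam = np.array([10.0, 5.0, 3.0, 1.5, 0.5])  # toy spectrum
full = tvcr(lam, len(lam))   # all eigenvalues retained -> 100.0
half = tvcr(lam, 2)          # 15/20 of the variance -> 75.0
```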

IV. EXPERIMENTAL RESULTS AND DISCUSSIONS

The ORL face database discussed in Section I was used for the computer simulations in this article. The original 92 × 112 images were resized to 80 × 80 before further processing. The Euclidean norm was used as the similarity measure to determine which images are most alike. Five training images and the same number of test images were used for performing the experiments, with no overlap between the training and test sets.

Table II. Recognition rate vs. block size (ORL).

                      Recognition Rate (%) by Block Size
Method                4 × 4    8 × 8    16 × 16
Local SOM (5 × 5)     96       96       96
Global SOM (5 × 5)    98

Table III. Recognition rate vs. block size (Yale_Illumination).

                      Recognition Rate (%) by Block Size
Method                4 × 4    8 × 8    16 × 16
Local SOM (5 × 5)     72       70       74
Global SOM (5 × 5)    74

Table IV. Recognition rate vs. block size (Yale_Pose).

                      Recognition Rate (%) by Block Size
Method                4 × 4    8 × 8    16 × 16
Local SOM (5 × 5)     72.5     67.5     75
Global SOM (5 × 5)    75

Table V. Recognition rate vs. SOM size (ORL).

                      Recognition Rate (%) by SOM Size
Method                SOM (3 × 3)    SOM (5 × 5)
Local SOM             94             96
Global SOM            98             98

The experiments are as follows:

1. The first experiment examined the effect of global and local processing on the recognition rate of the face recognition system. A two-dimensional self-organizing map of size 5 × 5 was chosen for both global and local processing. The 80 × 80 face image was concatenated to form a single vector of size 1 × 6400, which formed the input for the SOM, and the SOM was trained. After training, a matrix of size 25 × 1 was obtained and retained; at recognition time, the image was reconstructed from this matrix for matching. Five training images and the same number of test images from the first 10 classes of the database were used. The same procedure was adopted for the other face databases, Yale-Pose and Yale-Illumination. As there are nine images per subject in the former and 10 per subject in the latter, five images per subject were taken for training and the remaining four (five in Yale-Illumination) nonoverlapping images were used for testing. These images were cropped and resized to 48 × 48 to simplify the computations. Figure 3 shows the original images and the images reconstructed using GSOM and LSOM. Table II shows that the recognition rate does not change with the block size, and that global processing performs better than local processing in terms of recognition rate. Tables III and IV indicate that the block size does affect the recognition rate, which is higher with a 4 × 4 block than with an 8 × 8 block; as the block size was increased to 16 × 16, the rate approached the value obtained using global processing.

2. The second experiment examined the effect of the SOM size on the performance of the recognition system. Maps of two different sizes (3 × 3 and 5 × 5) were chosen, and the experiment was performed on the first 10 classes of the face database using five training images and the same number of test images, with matching done using the Euclidean norm. The block size for local processing was kept at 4 × 4. Table V shows that the recognition rate remains the same for both the 3 × 3 and 5 × 5 sizes when global processing is used, whereas with local processing the recognition rate changes with the SOM size, increasing as the SOM is made larger. This experiment was also repeated for the Yale-Pose and Yale-Illumination databases. Tables VI and VII reveal that no change in recognition rate with SOM size was observed for Yale-Pose, whereas a change was noticed for Yale-Illumination; for LSOM, although the value increased, it remained below the value obtained for GSOM.

3. In the third experiment, the number of classes of the face database was varied from 10 to 20 to 40, to see the effect of the local and global processing methods as the number of classes changes. The block size was kept at 4 × 4 for local processing. Table VIII clearly shows that the recognition rate decreases for both global and local processing as the number of classes increases. Increasing the number of classes increases the chance of similarity among the classes and hence decreases the performance of the system. The results indicate that global processing still performs better than the local processing method.

4. In this experiment, the training images were arranged in a matrix, each column representing one image, with the number of columns equal to the number of images. The eigenvectors and eigenvalues were obtained from the covariance matrix of the training images, and 80% of the total number of eigenvectors, covering almost 99% of the energy (Table I), were retained for performing PCA. Test images were reconstructed from the Karhunen-Loeve (KL) coefficients computed with the retained eigenvectors, followed by matching using the Euclidean norm. Five training images and the same number of test images were used, with no overlap between the training and test sets. The number of classes was varied from 10 to 20 to 40 to determine the recognition rate. To perform ICA, a matrix X was obtained that retained 40% of the total number of principal axes. The data were first whitened by passing the input matrix X through the whitening matrix W_Z = 2 × (Cov(X))^{-1/2}, and ICA was performed. The weights W were updated according to the rule ΔW = η(I + (1 − 2z)y^T)W for 1600 iterations. The learning rate was initialized at 0.001 and annealed down to 0.0001. A comparison among all the techniques, GSOM, LSOM, ICA-I, ICA-II, and PCA, is given in Table VIII and Figure 4, which show the recognition rates as the number of classes is varied.

5. This experiment was performed on the database obtained by mixing images from the ORL, Yale-Pose, and Yale-Illumination databases, as explained in Section I. The images were resized to 48 × 48, and four images per subject were taken for training, with the same number used for testing. Table IX and Figure 5 depict the results. Table IX shows that PCA gives better results for 40 classes. It is pertinent to mention that this result was obtained when 80% of the total number of eigenvectors was retained. The change in the recognition rate with the number of retained eigenvectors is shown in Figure 6: the recognition rate increases as more eigenvectors are retained. It has been observed that retaining about 40% of the total number of eigenvectors normally gives good results while still achieving a substantial reduction in dimensionality.

Table VI. Recognition rate vs. SOM size (Yale_Illumination).

                      Recognition Rate (%) by SOM Size
Method                SOM (3 × 3)    SOM (5 × 5)
Local SOM             70             72
Global SOM            76             76

Table VII. Recognition rate vs. SOM size (Yale_Pose).

                      Recognition Rate (%) by SOM Size
Method                SOM (3 × 3)    SOM (5 × 5)
Local SOM             72.5           72.5
Global SOM            75             75

Table VIII. Recognition rate vs. number of classes for SOM, PCA, and ICA (ORL).

                      Recognition Rate (%) by Number of Classes
Method                10       20       40
GSOM                  98.00    94.00    90.50
LSOM                  96.00    90.00    88.00
PCA                   94.00    91.67    89.83
ICA-I                 94.00    92.00    84.00
ICA-II                90.00    85.00    79.00

Figure 4. Recognition rate for varying number of classes for SOM, PCA, and ICA (ORL).
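The whitening step used before ICA in experiment 4 can be sketched as follows. The data here are synthetic placeholders; only the formula W_Z = 2 × (Cov(X))^{-1/2} is taken from the text.

```python
# Sketch of the whitening matrix W_Z = 2 * Cov(X)^(-1/2), computed via
# the symmetric eigendecomposition of the covariance matrix.
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((5, 1000))        # rows = variables, columns = samples
X = X - X.mean(axis=1, keepdims=True)     # remove the mean

C = np.cov(X)                             # covariance of the rows
vals, vecs = np.linalg.eigh(C)            # eigendecomposition of symmetric C
C_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
Wz = 2.0 * C_inv_sqrt                     # whitening matrix W_Z

Xw = Wz @ X                               # whitened data passed on to ICA
# with this scaling, Cov(Xw) = Wz Cov(X) Wz^T = 4 I
```

Note the factor of 2 means the whitened covariance is 4I rather than I; the decorrelation of the components, which is what ICA needs, is unaffected by that scaling.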

V. CONCLUSIONS

In this article, an analysis of three unsupervised learning techniques was carried out. From the experimental results it was found that, while training the SOM, the local processing approach took much less time than the global processing method. This is because the pixel blocks reduced the dimensionality of the data space to be represented topologically in the SOM space. The results show that there was no change in the performance of the recognition system with block size for the ORL face database, whereas a change was observed for the Yale-Pose and Yale-Illumination databases as the block size was changed; with a block size of 16 × 16, GSOM and LSOM yielded the same results. It was further observed that increasing the size of the SOM does increase the recognition rate (ORL and Yale-Illumination) for local processing, but the rate remains below that obtained with global processing. Tables VIII and IX highlight a comparison of all the unsupervised techniques. The simulation results indicate that the performance of the face recognition system decreases as the number of classes (subjects) increases; this is attributed to the fact that with more classes the chances of a mismatch increase, because more of the faces are similar. The results very clearly show that GSOM outperforms all the other techniques.

REFERENCES

M.S. Bartlett, Face image analysis by unsupervised learning and redundancy reduction, Ph.D. Dissertation, University of California, San Diego, 1998.
M.S. Bartlett, H.M. Lades, and T.J. Sejnowski, Independent component representations for face recognition, Proceedings of SPIE Symposium on Electronic Imaging: Science and Technology, Conference on Human Vision and Electronic Imaging III, California, 1998, pp. 528–539.
M.S. Bartlett, J.R. Movellan, and T.J. Sejnowski, Face recognition by independent component analysis, IEEE Trans Neural Networks 13(2002), 1450–1464.
M.S. Bartlett and T.J. Sejnowski, Independent components of face images: A representation for face recognition, Proceedings of the 4th Annual Joint Symposium on Neural Computation, Pasadena, 1997, pp. 3–10.
P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman, Eigenfaces vs. fisherfaces: Recognition using class specific linear projection, IEEE Trans Pattern Anal Machine Intelligence 19(1997), 711–720.
A.J. Bell and T.J. Sejnowski, An information-maximization approach to blind separation and blind deconvolution, Neural Comput 7(1995), 1129–1159.
R. Chellapa, C.L. Wilson, and S. Sirobey, Human and machine recognition of faces: A survey, Proc IEEE 83(1995), 705–740.

Figure 5. Recognition rate for varying number of classes for SOM, PCA, and ICA (ORL + Yale).
Figure 6. Recognition rate vs. number of selected eigenvectors for PCA (ORL + Yale).

Table IX. Recognition rate (%) vs. number of classes for SOM, PCA, and ICA (ORL + Yale).

Method    10 Classes    20 Classes    40 Classes
GSOM        100.00        93.75         81.88
LSOM         97.50        92.50         80.63
PCA         100.00        93.75         83.13
ICA-I        97.50        92.50         83.75
ICA-II       95.00        92.50         84.38


P. Comon, Independent component analysis, A new concept, Signal Process 36(1994), 287–314.
K. Etemed and R. Chellappa, Discriminant analysis for recognition of human face images, J Opt Soc Am A 14(1997), 1724–1733.
X. He, S. Yan, Y. Hu, P. Niyogi, and H.J. Zhang, Face recognition using Laplacianfaces, IEEE Trans Pattern Anal Machine Intelligence 27(2005), 328–340.
A. Hyvarinen, Survey on independent component analysis, Neural Comput Surveys 2(1999), 94–128.
A. Hyvarinen and E. Oja, Independent component analysis: Algorithms and applications, Neural Networks 13(2000), 411–430.
M.D. Kelly, Visual identification of people by computer, Technical Report AI-130, Stanford, CA, 1970.
M. Kirby and L. Sirovich, Application of the Karhunen-Loeve procedure for the characterization of human faces, IEEE Trans Pattern Anal Machine Intelligence 12(1990), 103–108.
T. Kohonen, Self-organization and associative memory, 2nd Edition, Springer-Verlag, Berlin, Germany, 1988.
T. Kohonen, Self-organizing maps, 2nd Edition, Springer-Verlag, Berlin, Germany, 1997.
S. Lawrence, C.L. Giles, and A.C. Tsoi, Convolutional neural networks for face recognition, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1996, pp. 217–222.
S.Z. Li and J. Lu, Face recognition using nearest feature line method, IEEE Trans Neural Networks 10(1999), 439–443.
C. Liu and H. Wechsler, Evolutionary pursuit and its applications to face recognition, IEEE Trans Pattern Anal Machine Intelligence 22(2000), 570–582.
B.S. Manjunath, R. Chellappa, and C. Von der Malsburg, A feature based approach to face recognition, Proceedings of IEEE CS Conference on Computer Vision and Pattern Recognition, Champaign, USA, 1992, pp. 373–378.
A.M. Martinez and A.C. Kak, PCA versus LDA, IEEE Trans Pattern Anal Machine Intelligence 23(2001), 228–233.
B. Moghaddam, Principal manifolds and Bayesian subspaces for visual recognition, International Conference on Computer Vision, Greece, 1999, pp. 1131–1136.
B. Moghaddam, T. Jebara, and A. Pentland, Efficient MAP/ML similarity matching for visual recognition, Proceedings of Fourteenth International Conference on Pattern Recognition, Brisbane, Australia, Vol. 1, 1998, pp. 876–881.
B. Moghaddam and A. Pentland, Probabilistic visual learning for object detection, Proceedings of International Conference on Computer Vision, MIT, Cambridge, Massachusetts, 1995, pp. 786–793.
B. Moghaddam and A. Pentland, Probabilistic matching for face recognition, IEEE Southwest Symposium on Image Analysis and Interpretation, Tucson, AZ, USA, 1998, pp. 186–191.
V.E. Neagoe and A.D. Ropot, Concurrent self-organizing maps for pattern classification, Proceedings of First International Conference on Cognitive Informatics, ICCI'02, Washington, DC, USA, 2002, pp. 304–312.
P. Penev and J. Atick, Local feature analysis: A general statistical theory for object representation, Network: Comput Neural System 7(1996), 477–500.
A. Pentland, B. Moghaddam, and T. Starner, View-based and modular Eigenspaces for face recognition, IEEE Conference on Computer Vision and Pattern Recognition, Washington, USA, 1994, pp. 84–91.
P.J. Phillips, Support vector machines applied to face recognition, Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems II, MIT Press, Cambridge, USA, 1999, pp. 803–809.
F. Samaria and F. Fallside, Face identification and feature extraction using hidden Markov models, In: Image processing: Theory and application, G. Vernazza (Editor), Elsevier, San Remo, Italy, 1993, pp. 295–298.
D.L. Swets and J.J. Weng, Using discriminant Eigenfeatures for image retrieval, IEEE Trans Pattern Anal Machine Intelligence 18(1996), 831–836.
X. Tan, S. Chen, Z.H. Zhou, and F. Zhang, Recognizing partially occluded, expression variant faces from single training image per person with SOM and soft k-NN ensemble, IEEE Trans Neural Networks 16(2005), 875–886.
M.L. Teixeira and J.R. Beveridge, An implementation and study of the Moghaddam and Pentland intrapersonal/extrapersonal image difference face recognition algorithm, CSU Computer Science Department Technical Report, Colorado State University, USA, 2003.
M. Turk and A. Pentland, Eigenfaces for recognition, J Cogn Neurosci 3(1991), 71–86.
B. Weyrauch, B. Heisele, J. Huang, and V. Blanz, Component based face recognition with 3-D morphable models, Computer Vision and Pattern Recognition Workshop, Vol. 5, 2004, p. 85.
W. Zhao, R. Chellapa, A. Rosenfeld, and P.J. Phillips, Face recognition: A literature survey, ACM Comput Surveys 35(2003), 399–458.
