
NEC Journal of Advanced Technology, Vol. 2, No. 1, Winter 2005
Special Issue on Security for Network Society

ABSTRACT

This paper describes advances in the authors' face detection and recognition technologies. For face detection, a hierarchical scheme for combined face and eye detection has been developed based on the Generalized Learning Vector Quantization method to achieve precise face alignment. For face recognition, the perturbation space method has been improved to reduce the adverse effects of illumination changes as well as pose changes by using a standard face model. Experimental results have revealed that the proposed method outperforms our previously employed method.

KEYWORDS Face detection, Face recognition, Face authentication, Biometric authentication


Advances in Face Detection and Recognition Technologies

By Atsushi SATO,* Hitoshi IMAOKA,* Tetsuaki SUZUKI* and Toshinori HOSOI*

*Media and Information Research Laboratories

1. INTRODUCTION

In recent years there have been great expectations for biometric authentication in view of increasing vicious crimes and terrorist threats. Biometric authentication is the automatic identification or identity verification of an individual based on physiological or behavioral characteristics such as fingerprint, iris, face, vein, voice, and so on. These kinds of authentication are most commonly used to safeguard international borders, control access to facilities, and enhance computer network security.

Face authentication has interesting characteristics that other biometrics do not have: facial images can be captured from a distance, no special actions are required of the subject, and a crime-deterrent effect can be expected because the captured images can be recorded and we can see who the person is at a glance. Due to these characteristics, face recognition technology is expected to be applied widely not only to security applications such as video surveillance but also to image indexing, image retrieval and natural user interfaces.

The authors have previously developed face detection and recognition technologies[1]. Subsequently, several improvements have been made to the previous method. This paper describes these advances in the authors' face detection and recognition technologies. For face detection, a hierarchical scheme for combined face and eye detection has been developed based on the Generalized Learning Vector Quantization (GLVQ) method. For face recognition, the perturbation space method has been improved to reduce the adverse effects of illumination changes as well as pose changes. This paper is organized as follows: face detection technology and face recognition technology are described in Section 2 and Section 3, respectively. Experimental results are given in Section 4, and conclusions are presented in Section 5.

2. FACE DETECTION AND ALIGNMENT

Face detection has two important tasks: one is to determine facial regions in an image against various backgrounds, and the other is to determine the alignment of each face, such as position, size and rotation, to obtain better performance in face recognition. Assuming that the face is rigid, its pose is completely defined by six parameters: three coordinates and three rotation angles in three-dimensional space. If the face is a frontal view, its pose can be defined by four parameters, because the degrees of rotational freedom decrease from three to one, namely in-plane rotation. Therefore, most face detection algorithms focus on determining the position of both eyes in the image, which has four free parameters. To take account of out-of-plane rotation, another facial feature point should be detected (the center of the mouth, for example), or the face recognition algorithm should be designed to be robust with respect to such variations.

Much work has been done on face detection so far[2]. Skin colors are often used to determine facial regions because of the ease of implementation, but their performance is easily degraded by illumination changes. View-based methods achieve very high performance in detecting faces against complicated backgrounds without using skin colors[3,4]. However, these methods consume much time searching facial regions exhaustively over the image. Moreover, face alignment is generally not precise, because such methods ignore the high-frequency components of the image to speed up the search.

The authors have developed a hierarchical scheme for face detection and alignment in order to improve performance. Figure 1 shows a block diagram of the proposed method. In face detection, the face position is roughly determined using low-frequency components by searching over multi-scale images, taking account of in-plane rotation. Then, in eye detection, the position of both eyes is determined precisely by a coarse-to-fine search using high-frequency components of the image. A view-based method has been employed for these detectors, in which templates are trained by Generalized Learning Vector Quantization (GLVQ), as shown next.
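As a rough illustration of the multi-scale search mentioned above, the sketch below builds an image pyramid so that a fixed-size template can match faces of different sizes. The scale factor, minimum size, and nearest-neighbor resampling are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def image_pyramid(img, scale=0.8, min_size=24):
    """Generate multi-scale images so that a fixed-size template can
    detect faces of different sizes during the coarse face search."""
    out = [img]
    while min(out[-1].shape) * scale >= min_size:
        h, w = out[-1].shape
        nh, nw = int(h * scale), int(w * scale)
        # Nearest-neighbor resampling: map each output pixel to a source pixel
        ys = (np.arange(nh) / scale).astype(int)
        xs = (np.arange(nw) / scale).astype(int)
        out.append(out[-1][np.ix_(ys, xs)])
    return out
```

In a real detector each pyramid level would additionally be filtered for edge features before downsampling, as described in Section 2.2.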

Fig. 1 Block diagram of the proposed face detection and alignment method.

2.1 Generalized Learning Vector Quantization

GLVQ[5,6] is a method for learning the templates used in nearest neighbor classifiers, based on the Minimum Classification Error (MCE) criterion. MCE minimizes the smoothed empirical risk defined by

$$R(\theta) = \frac{1}{N}\sum_{n=1}^{N}\sum_{k=1}^{K} \ell(\rho_k(\mathbf{x}_n;\theta))\,1(\mathbf{x}_n \in \omega_k), \qquad (1)$$

where $\mathbf{x}_n\ (n=1,\ldots,N)$ and $\omega_k\ (k=1,\ldots,K)$ denote training samples and classes, respectively, and 1(·) is an indicator function such that 1(true) = 1 and 1(false) = 0. $\ell(\cdot)$ is a smoothed loss function defined by

$$\ell(\rho) = \frac{1}{1+\exp(-\xi\rho)}, \qquad (2)$$

where $\xi\ (>0)$ controls the slant of the sigmoid function; when $\xi$ goes to infinity, Eq. (1) becomes identical to the empirical loss in Bayes decision theory. $\rho_k(\mathbf{x}_n;\theta)$ is called a misclassification measure, as explained later. The classifier parameter $\theta$ can be updated for a given $\mathbf{x}_n$ to minimize the smoothed empirical risk in an online learning form called probabilistic descent:

$$\theta \leftarrow \theta - \varepsilon\,\frac{\partial R(\theta)}{\partial\theta}, \qquad (3)$$

$$\frac{\partial R(\theta)}{\partial\theta} = \sum_{k=1}^{K} \frac{\partial \ell(\rho_k(\mathbf{x}_n;\theta))}{\partial\theta}\,1(\mathbf{x}_n \in \omega_k). \qquad (4)$$

For nearest neighbor classifiers, the classifier parameter consists of reference vectors called templates; that is, $\theta = \{\mathbf{m}_{ki} \mid k=1,\ldots,K;\ i=1,\ldots,N_k\}$, where $N_k$ is the number of reference vectors in class $\omega_k$. In GLVQ, the misclassification measure is defined as follows to ensure convergence of the reference vectors:

$$\rho_k(\mathbf{x}_n;\theta) = \frac{d_k(\mathbf{x}_n;\theta) - d_l(\mathbf{x}_n;\theta)}{d_k(\mathbf{x}_n;\theta) + d_l(\mathbf{x}_n;\theta)}, \qquad (5)$$

where $d_k(\mathbf{x}_n;\theta)$ is the squared Euclidean distance between $\mathbf{x}_n$ and the nearest reference vector $\mathbf{m}_{ki}$ of the class $\omega_k$ to which $\mathbf{x}_n$ belongs, and likewise $d_l(\mathbf{x}_n;\theta)$ is the squared Euclidean distance between $\mathbf{x}_n$ and the nearest reference vector $\mathbf{m}_{lj}$ of the other classes. The GLVQ learning rule for these two reference vectors is then obtained as follows:

$$\mathbf{m}_{ki} \leftarrow \mathbf{m}_{ki} + \varepsilon\,w(\rho_k(\mathbf{x}_n;\theta))\,\frac{d_l(\mathbf{x}_n;\theta)}{\{d_k(\mathbf{x}_n;\theta)+d_l(\mathbf{x}_n;\theta)\}^2}\,(\mathbf{x}_n - \mathbf{m}_{ki}), \qquad (6)$$

$$\mathbf{m}_{lj} \leftarrow \mathbf{m}_{lj} - \varepsilon\,w(\rho_k(\mathbf{x}_n;\theta))\,\frac{d_k(\mathbf{x}_n;\theta)}{\{d_k(\mathbf{x}_n;\theta)+d_l(\mathbf{x}_n;\theta)\}^2}\,(\mathbf{x}_n - \mathbf{m}_{lj}), \qquad (7)$$

where $w(\rho) = 4\,\ell(\rho)\{1-\ell(\rho)\}$. In addition, GLVQ employs a simulated annealing technique to avoid getting trapped in local minima, in which the slant parameter $\xi$ is set to a small positive number at the beginning and is increased during learning.
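The GLVQ update of Eqs. (5)–(7) can be sketched in a few lines of NumPy. This is a minimal illustration with hypothetical names (`glvq_update`, `eps`, `xi`), not the authors' implementation; it performs a single online step on one training sample.

```python
import numpy as np

def glvq_update(x, templates, labels, y, eps=0.01, xi=1.0):
    """One GLVQ step: pull the nearest same-class template toward x,
    push the nearest other-class template away (Eqs. (5)-(7))."""
    d = np.sum((templates - x) ** 2, axis=1)      # squared Euclidean distances
    same = labels == y
    ki = np.where(same)[0][np.argmin(d[same])]    # nearest template of the true class
    lj = np.where(~same)[0][np.argmin(d[~same])]  # nearest template of the other classes
    dk, dl = d[ki], d[lj]
    rho = (dk - dl) / (dk + dl)                   # misclassification measure, Eq. (5)
    loss = 1.0 / (1.0 + np.exp(-xi * rho))        # sigmoid loss, Eq. (2)
    w = 4.0 * loss * (1.0 - loss)                 # weight w(rho)
    templates[ki] += eps * w * dl / (dk + dl) ** 2 * (x - templates[ki])  # Eq. (6)
    templates[lj] -= eps * w * dk / (dk + dl) ** 2 * (x - templates[lj])  # Eq. (7)
    return templates
```

In training, such a step would be repeated over shuffled samples while the slant parameter `xi` is gradually increased, as described above.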

2.2 Face Detection

Figure 2 shows the flow of the proposed face detection system. First, multi-scale images are generated from an input image, and then reliability maps are generated by GLVQ. Finally, these maps are merged through interpolation to obtain the final results. The reliability maps are generated by scanning the multi-scale images with templates, as shown in Fig. 3. The reliability is calculated by

$$\rho(\mathbf{x}) = \frac{d_{NF}(\mathbf{x}) - d_{F}(\mathbf{x})}{d_{NF}(\mathbf{x}) + d_{F}(\mathbf{x})}, \qquad (8)$$

where $d_{F}(\mathbf{x})$ and $d_{NF}(\mathbf{x})$ denote the minimum distances between a query sub-image $\mathbf{x}$ and the templates belonging to the face and non-face categories, respectively. The value of $\rho(\mathbf{x})$ ranges over [-1, 1], and if the value is positive, the query sub-image is regarded as a face. Figure 4 shows examples of the face images and non-face images used in GLVQ training. The size of each face was normalized on the basis of the eye positions. To take account of in-plane rotation, rotated face images were also added to the training samples, as shown in Fig. 5.

Since the searching speed depends on the size of the templates, reducing the image resolution is effective for speeding up the search. However, doing so degrades face detection performance, because the high-frequency components of the image vanish. To solve this problem, our method extracts the high-frequency components before reducing the image size. Figure 6 shows an example of extracting high-frequency features from the image; that is, extracting the edge strength for each direction. The size of these edge images is reduced by summing them up over local areas. Since the size of each multi-scale image can be reduced as well, the searching speed can be improved without noticeably degrading the performance.

Fig. 2 Processing flow of the face detection.

Fig. 3 Scanning multi-scale images with templates.

Fig. 4 Examples of face images and non-face images used in GLVQ training for face detection.

2.3 Eye Detection

Eye detection determines the precise position of both eyes by a coarse-to-fine search, as shown in Fig. 7. First, a grid is assumed on the image, centered at the initial position of each eye determined by face detection. Next, the reliability of the eye is calculated for each grid point in the same manner as in face detection. The most likely point on each grid is selected as the candidate for the eye, and then the size of the grid is reduced for the next search. Templates were trained by GLVQ with eye images and non-eye images, as in face detection. However, in order to preserve the high-frequency components of the image, the training samples were zoomed in to the eye regions, as shown in Fig. 8.
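The reliability score of Eq. (8) and the coarse-to-fine grid search can be sketched as follows. The grid layout (3x3 candidates), number of levels, and the `score` callback are illustrative assumptions; in the actual system the score would be the GLVQ reliability evaluated on the sub-image at each grid point.

```python
import numpy as np

def reliability(x, face_tpls, nonface_tpls):
    """Reliability of Eq. (8): positive when x is nearer the face templates."""
    d_f = min(np.sum((t - x) ** 2) for t in face_tpls)
    d_nf = min(np.sum((t - x) ** 2) for t in nonface_tpls)
    return (d_nf - d_f) / (d_nf + d_f)

def coarse_to_fine(init, score, step=4.0, levels=3):
    """Refine an eye position: evaluate a 3x3 grid around the current
    estimate, keep the most reliable point, then halve the grid spacing."""
    pos = np.asarray(init, dtype=float)
    for _ in range(levels):
        cands = [pos + np.array([dx, dy]) * step
                 for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
        pos = max(cands, key=score)   # most likely grid point becomes the candidate
        step /= 2.0                   # finer grid for the next search
    return pos
```

Each halving of the grid spacing roughly doubles the localization precision at a fixed cost of nine evaluations per level.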

3. FACE RECOGNITION

Face recognition calculates similarities, or scores, between a query facial image and enrolled facial images. Several methods have been proposed thus far: extracting effective features from whole facial images based on principal component analysis[7], extracting local features based on local feature analysis[8], and extracting the relative positions of facial landmarks[9]. The performance of face recognition is degraded by two factors: one is global image variations caused by pose and illumination changes, and the other is local image variations caused by facial expressions, aging, the wearing of glasses and so on. However, such variations are not taken into account explicitly in the previously discussed methods.

To reduce such adverse effects, two key technologies have been developed: the perturbation space method for global image variations, and adaptive regional blend matching for local image variations.

3.1 Perturbation Space Method

For reducing the adverse effect of global image variations, a model-based approach is very effective, because such variations are caused by physical properties. As mentioned, the position and rotation of the face in three-dimensional space can be described by six degrees of freedom. Moreover, it has been shown that illumination changes can be described by no more than ten degrees of freedom[10]. Thus, face recognition using a three-dimensional facial model greatly improves recognition performance even when the pose or illumination changes. However, this approach requires the individual's own facial shape and surface reflectance, so it cannot be applied to ordinary face recognition systems based on image matching.

Fig. 5 Examples of face images taking account of in-plane rotation.

Fig. 6 Example of extracting edge strength for each direction.

Fig. 7 Processing flow of coarse-to-fine search in eye detection.

We have developed a standard face model to generate various facial appearances. This face model consists of a three-dimensional facial surface and a set of illumination bases, as shown in Fig. 9. For pose changes, the enrolled image is mapped onto the three-dimensional facial surface and rotated in three-dimensional space to simulate out-of-plane pose changes. For illumination changes, various shaded images are generated from the enrolled image using the illumination bases, which were obtained by executing principal component analysis (PCA) on various facial images.

If all of these generated images were enrolled in the database, comparing a query against the database in the recognition process could take a long time. To solve this problem, the generated images are compressed using PCA, and the comparison is executed using the following equation:

$$D^2(\mathbf{x}) = \|\mathbf{x}-\boldsymbol{\mu}\|^2 - \sum_{j=1}^{M}\{\boldsymbol{\phi}_j^{T}(\mathbf{x}-\boldsymbol{\mu})\}^2, \qquad (9)$$

where $\mathbf{x}$ is the query image, $\boldsymbol{\mu}$ is the mean of all generated images, and $\boldsymbol{\phi}_j$ is the $j$-th eigenvector obtained by PCA. Eq. (9) denotes the squared distance between $\mathbf{x}$ and the subspace spanned by the eigenvectors. This method, called the perturbation space method, achieves fast comparisons in the recognition process, taking account of pose and illumination changes.

Fig. 8 Examples of eye images and non-eye images used in GLVQ training for eye detection.

Fig. 9 Conceptual figure of generation of various facial images using the standard face model.

Fig. 10 Conceptual figure of adaptive regional blend matching.
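Eq. (9) can be sketched directly in NumPy. The helper `build_subspace` below is an assumed construction (PCA of the generated appearances via SVD); the paper only states that the generated images are compressed with PCA.

```python
import numpy as np

def build_subspace(images, n_components):
    """Compress the generated facial appearances with PCA: the mean image
    plus the leading eigenvectors of the centered image matrix."""
    X = np.asarray(images, dtype=float)
    mu = X.mean(axis=0)
    # SVD of the centered data: rows of vt are the orthonormal principal axes
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:n_components]

def subspace_distance_sq(x, mu, phi):
    """Squared distance from query x to the PCA subspace, Eq. (9).
    phi: (M, D) orthonormal eigenvectors, mu: (D,) mean image."""
    r = x - mu
    return float(r @ r - np.sum((phi @ r) ** 2))
```

A small distance means the query is well explained by the enrolled person's generated pose and illumination variations.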

3.2 Adaptive Regional Blend Matching

For local image variations, robust image matching is desirable, because modeling facial expression changes, aging effects, or the wearing of glasses is quite a difficult problem. To reduce the adverse effects of such local changes, adaptive regional blend matching has been developed.

As shown in Fig. 10, the enrolled image and the query image are divided into N segments, and a score S[i] is calculated for each pair of segments by the perturbation space method. Only the scores having larger values are taken into account in calculating the final score, instead of using all of them. This means that segments of the query image which are quite different from the corresponding segments of the enrolled image are neglected when calculating the final score, so the method achieves robust matching with respect to local image variations.
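The score-selection step above amounts to averaging only the best-matching segment pairs. A minimal sketch follows; the fraction of segments kept (`keep_ratio`) is a hypothetical parameter, since the paper does not specify how many of the larger scores are used.

```python
def blend_match_score(segment_scores, keep_ratio=0.6):
    """Adaptive regional blend matching (sketch): average only the
    best-matching segment pairs so that locally changed regions
    (glasses, expression, aging) do not drag the final score down."""
    n_keep = max(1, int(len(segment_scores) * keep_ratio))
    best = sorted(segment_scores, reverse=True)[:n_keep]
    return sum(best) / n_keep
```

With one badly matching segment among several good ones, the trimmed average stays close to the good scores rather than being pulled toward the outlier.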

4. EXPERIMENTS

Face recognition experiments were conducted to demonstrate the effectiveness of the proposed method on several databases collected in our laboratories. DB1 and DB2 contain aging effects, DB3 contains facial expression changes, and DB4 contains illumination changes. The number of persons represented in each database ranges from 200 to 1,000.

Figure 11 shows the experimental results of the proposed method compared with our previous method. The equal error rate (EER) is a verification performance measure in which the threshold for scores is tuned so that the false acceptance rate is equal to the false rejection rate. The identification rate is the probability that the first candidate, i.e. the one having the highest score, is the genuine person. The performance of the proposed method is drastically improved compared with the previous method, and is almost the same as when the eye positions are given by hand instead of by face detection. It can thus be said that the accuracy of the eye positions obtained by the proposed face detection method is almost the same as that of human estimation.
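The EER defined above can be estimated from sets of genuine and impostor scores by sweeping the threshold and finding where the two error rates cross. This is a generic sketch of the metric, not the evaluation code used in the paper.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Sweep the decision threshold over all observed scores and return
    the error rate where false acceptance and false rejection are closest."""
    best = (1.0, None)
    for t in sorted(set(genuine) | set(impostor)):
        far = np.mean(np.asarray(impostor) >= t)   # impostors wrongly accepted
        frr = np.mean(np.asarray(genuine) < t)     # genuine users wrongly rejected
        if abs(far - frr) < best[0]:
            best = (abs(far - frr), (far + frr) / 2)
    return best[1]
```

With finite score sets the two curves rarely cross exactly, so the midpoint of FAR and FRR at the closest threshold is a common approximation.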

5. CONCLUSION

This paper has described advances in the authors' face detection and recognition technologies. For face detection, a hierarchical scheme for combined face and eye detection has been developed, based on the Generalized Learning Vector Quantization method. For face recognition, the perturbation space method has been improved to reduce the adverse effects of illumination changes as well as of pose changes. Experimental results reveal that the proposed method achieves much higher performance on several databases than our previous method. They also reveal that the accuracy of the eye positions obtained by the proposed face detection method is almost the same as that of human estimation, which should be greatly helpful for putting automatic face recognition into practice. In future work, estimation of the three-dimensional facial shape from an image should be developed to improve recognition performance even further.

Fig. 11 Experimental results of face recognition for several databases.

REFERENCES

[1] A. Sato, A. Inoue, et al., "NeoFace - Development of Face Detection and Recognition Engine," NEC Res. & Develop., 44, 3, pp.302-306, July 2003.
[2] E. Hjelmås and B. K. Low, "Face Detection: A Survey," Computer Vision and Image Understanding, 83, 3, pp.236-274, 2001.
[3] H. A. Rowley, S. Baluja and T. Kanade, "Neural Network-Based Face Detection," IEEE Trans. PAMI, 20, 1, pp.23-38, 1998.
[4] K. K. Sung and T. Poggio, "Example-Based Learning for View-Based Human Face Detection," IEEE Trans. PAMI, 20, 1, pp.39-51, 1998.
[5] A. Sato and K. Yamada, "Generalized Learning Vector Quantization," In Advances in Neural Information Processing Systems, 8, pp.423-429, MIT Press, 1996.
[6] A. Sato, "Discriminative Dimensionality Reduction Based on Generalized LVQ," In Artificial Neural Networks - ICANN 2001, pp.65-72, Springer LNCS 2130, 2001.
[7] M. Turk and A. Pentland, "Eigenfaces for Recognition," J. of Cognitive Neuroscience, 3, 1, pp.71-86, 1991.
[8] P. Penev and J. Atick, "Local Feature Analysis: A General Statistical Theory for Object Representation," Network: Computation in Neural Systems, pp.477-500, 1996.
[9] L. Wiskott, J. M. Fellous, et al., "Face Recognition by Elastic Bunch Graph Matching," IEEE Trans. PAMI, 19, 7, pp.775-779, 1997.
[10] R. Ishiyama and S. Sakamoto, "Geodesic Illumination Basis: Compensating for Illumination Variations in Any Pose for Face Recognition," In Proc. of the Int. Conf. on Pattern Recognition, 4, pp.297-301, 2002.

* * * * * * * * * * * * * * *

Received January 17, 2005

Atsushi SATO received his D.S. degree in physics from Tohoku University in 1989. He joined NEC Corporation in 1989, and is now a principal researcher at the Media and Information Research Laboratories. His research interests include image processing, pattern recognition and artificial neural networks. During 1994-1995, he was a visiting researcher in the Department of Electrical Engineering at the University of Washington.

Dr. Sato is a member of the Institute of Electronics, Information and Communication Engineers (IEICE) and the Physical Society of Japan (JPS). He received the Convention Award from the Information Processing Society of Japan (IPSJ) and from IEICE in 1991 and 1994, respectively.

Hitoshi IMAOKA received his Dr. Eng. degree in statistical mechanics from Osaka University in 1997. He joined NEC Corporation in 1997, and is now a senior researcher at the Media and Information Research Laboratories. His research interests include pattern recognition and image processing.

Dr. Imaoka is a member of the Japanese Neural Network Society and the Institute of Electronics, Information and Communication Engineers (IEICE).

Tetsuaki SUZUKI received his M.E. degree in computer science from the Tokyo Institute of Technology in 2000. He joined NEC Corporation in 2000, and is now a researcher at the Media and Information Research Laboratories. His research interests include pattern recognition and color image processing.

Mr. Suzuki is a member of the Institute of Electronics, Information and Communication Engineers (IEICE).

Toshinori HOSOI received his M.E. degree in mechanical science from Osaka University in 2001. He joined NEC Corporation in 2001, and is now a researcher at the Media and Information Research Laboratories. His research interests include pattern recognition and image processing.

Mr. Hosoi is a member of the Institute of Electronics, Information and Communication Engineers (IEICE).
