


2132 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 9, NO. 12, DECEMBER 2014

Reference Face Graph for Face Recognition

Mehran Kafai, Member, IEEE, Le An, Student Member, IEEE, and Bir Bhanu, Fellow, IEEE

Abstract— Face recognition has been studied extensively; however, real-world face recognition still remains a challenging task. The demand for unconstrained practical face recognition is rising with the explosion of online multimedia such as social networks, and with video surveillance footage where face analysis is of significant importance. In this paper, we approach face recognition in the context of graph theory. We recognize an unknown face using an external reference face graph (RFG). An RFG is generated, and recognition of a given face is achieved by comparing it to the faces in the constructed RFG. Centrality measures are utilized to identify distinctive faces in the reference face graph. The proposed RFG-based face recognition algorithm is robust to changes in pose and is also alignment free. The RFG recognition is used in conjunction with DCT locality-sensitive hashing for efficient retrieval to ensure scalability. Experiments are conducted on several publicly available databases, and the results show that the proposed approach outperforms state-of-the-art methods without any preprocessing requirements such as face alignment. Due to the richness of the reference set construction, the proposed method can also handle illumination and expression variation.

Index Terms— Face recognition, graph analysis, centrality measure, alignment-free, pose robust.

I. INTRODUCTION

AS THE explosion of multimedia has been observed in the past decade, the processing and analysis of multimedia data, including images and videos, are of broad interest. Among these data, face images and videos take a large fraction. The recognition of faces in different contexts enables various applications such as surveillance monitoring, social networking, and human-computer interaction. For example, after the Boston Marathon bombings, images of the suspects taken from street surveillance cameras aided the FBI in identifying the suspects [1].

Face recognition comprises the tasks of automatically identifying or verifying a person from an image or video sequence. Face identification refers to determining the ID of a person, given as a probe, from a large pool of candidates in the gallery. Face verification, or face authentication, is the problem of deciding whether a given pair of images show the same person or not.

Manuscript received March 27, 2014; revised July 13, 2014; accepted July 14, 2014. Date of publication September 19, 2014; date of current version November 10, 2014. This work was supported by the National Science Foundation under Grant 0905671, Grant 0915270, and Grant 1330110. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Sebastien Marcel. (Mehran Kafai and Le An contributed equally to this work.)

M. Kafai is with Hewlett Packard Laboratories, Palo Alto, CA 94304 USA (e-mail: [email protected]).

L. An is with the Department of Electrical Engineering, University of California at Riverside, Riverside, CA 92521 USA (e-mail: [email protected]).

B. Bhanu is with the Center for Research in Intelligent Systems, University of California at Riverside, Riverside, CA 92521 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIFS.2014.2359548

Fig. 1. Sample unconstrained face images of the same person with variations in pose, illumination, expression, and alignment. Such variations make face recognition a challenging task.

Although face recognition has been studied extensively and impressive results have been reported on benchmark databases [2]–[4], unconstrained face recognition is of more practical use and remains a challenge due to the following factors:

• Pose variation. The face to be recognized may appear under arbitrary poses.

• Misalignment. As faces are usually detected by an automated face detector [5], the cropped faces are not aligned. Since most feature descriptors require alignment before feature extraction, misalignment degrades the performance of a face recognition system.

• Illumination variation. The illumination on a face is influenced by the lighting conditions, and the appearance of the face varies significantly under different lighting.

• Expression variation. Face images of the same person may differ with different expressions.

Figure 1 shows some examples of unconstrained faces. The appearance of the same person varies significantly due to changes in imaging conditions. In most of the previous literature, some or all of the aforementioned factors have to be taken care of before the recognition algorithm can operate. For instance, illumination normalization is performed as a pre-processing step to remove the variations of lighting effects in [6]. For commercial face recognition software, specific constraints have to be imposed; for example, the locations of the eyes have to be determined for the FaceVACS recognition system [7].

In this paper, a Reference Face Graph (RFG) framework is presented for face recognition. We model the task of face recognition as a graph analysis problem. An early work by Wiskott et al. [8] proposed to represent each face by a bunch graph based on a Gabor wavelet transform. In our approach, the identity of an unknown face is described by its similarity to the reference faces in the constructed graph.



This is an indirect measure, rather than comparing the unknown face to the gallery faces directly. The purpose of using a reference set of external faces is to represent a given face by its degree of similarity to the images of a set of N reference individuals. Specifically, two faces that are visually similar in one pose, for example profile, are also, to some extent, similar in other poses, for example frontal. In other words, we assume that visual similarity follows from underlying physical similarity in the real world. We take advantage of this phenomenon in the following way: compare a gallery/probe face image with all the images of a reference individual from the reference set. The similarity with the best matching image of that reference individual is the degree of similarity between the gallery/probe subject and the reference individual. By repeating this procedure for each of the reference individuals, we create a basis descriptor for the gallery/probe face image that reflects its degree of similarity to the reference set individuals. Through the use of reference-based descriptors, we mitigate pose and other variations by not using the appearance features extracted directly from the original images for face matching. The purpose of building a reference face graph is to obtain the node centralities; by doing this, we determine which faces in the reference set are more discriminative and important. The centralities are used as weights for the faces in the reference face graph. The proposed alignment-free approach is robust to pose changes and tolerant to illumination and expression changes.
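To make this procedure concrete, the following is a minimal sketch of computing the reference-based (basis) descriptor. The feature and similarity routines here are generic stand-ins, not the paper's actual pipeline (which uses region-wise features and DCT hashing, detailed in Section III):

```python
import numpy as np

def cosine_sim(a, b):
    # Hypothetical stand-in for the paper's region-based similarity.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def reference_similarity(probe_feat, ref_individual_feats):
    """Similarity between a probe and one reference individual:
    the best match over all of that individual's images."""
    return max(cosine_sim(probe_feat, f) for f in ref_individual_feats)

def basis_descriptor(probe_feat, reference_set):
    """Basis descriptor: similarity of the probe to each of the
    N reference individuals (cf. Eq. (13) later in the paper)."""
    return np.array([reference_similarity(probe_feat, feats)
                     for feats in reference_set])
```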

II. RELATED WORK AND CONTRIBUTIONS

A. Related Work

1) General Approaches Towards Robust Face Recognition: Recent work focuses on unconstrained face recognition, which is inherently difficult due to uncontrolled image acquisition that allows variations in pose, illumination, expression, and misalignment.

In the pursuit of an advanced feature descriptor, Barkan et al. [9] form an over-complete face representation using modified multi-scale LBP features for face recognition. Together with Diffusion Maps as a feature reduction technique, competitive results are obtained. Lei et al. [10] learn discriminant local features called discriminant face descriptors (DFD) in a data-driven manner instead of a handcrafted way; its effectiveness is shown on both homogeneous face recognition (i.e., images from the same modality) and heterogeneous face recognition (i.e., images from different modalities). Contrary to the convention that high-dimensional features are not preferred in recognition, Chen et al. [11] empirically show that state-of-the-art results are achieved with high-dimensional LBP feature descriptors. To make the use of high-dimensional features practical, a sparse projection method is proposed to reduce computation and model storage.

To tackle pose variations, pose-invariant face recognition is achieved by Markov random field based image matching in [12]. In [13], an expression-invariant face recognition algorithm is proposed based on compressive sensing. Recently, Liao et al. [14] proposed an alignment-free approach for partial face recognition using Multi-Keypoint Descriptors (MKDs). However, none of these methods is able to simultaneously handle faces with unconstrained pose, illumination, expression, and alignment. In [15], a 3D model for each subject in the database is constructed from a single 2D image, and synthesized face images at different poses are then generated from the 3D model for matching. In contrast to most distance-based methods for face recognition, the probability that two faces share the same underlying identity is evaluated in [16]. This approach renders comparable or better results than current methods for face recognition with varying poses. In [17], a probabilistic elastic matching method is proposed to handle pose variation in recognition. In this method, a Gaussian mixture model (GMM) is used to capture the spatial-appearance distribution of the faces in the training set, and an SVM is used for face verification.

To eliminate the lighting effects that hinder face recognition systems, a face representation called Local Quantized Patterns (LQP) is proposed in [18]. The illumination invariance of this representation leads to improved performance over state-of-the-art methods on the challenging Labeled Faces in the Wild (LFW) [19] database. In [6], a robust illumination normalization technique is proposed using Gamma correction, difference-of-Gaussian filtering, masking, and contrast equalization. The preprocessing step improves the recognition performance on several benchmark databases. In [20], near-infrared images are used for face recognition regardless of visible illumination changes in the environment.

For face recognition with expressions, most approaches aim at reproducing the neutral faces for matching. In [13], the images of the same subject with different expressions are viewed as an ensemble of intercorrelated signals, and the sparsity accounts for the variation in expressions. Thus, two feature images, the holistic face image and the expression image, are generated for subsequent face recognition. Hsieh et al. [21] remove the expression from a given face by using the optical flow computed from the input face with respect to a neutral face.

Face misalignment can severely degrade the performance of a face recognition system [27]. However, for unconstrained face images, accurate alignment is a challenging problem in itself. In [14], an arbitrary patch of a face image can be used to recognize a face with an alignment-free face representation. Wang et al. [28] propose a misalignment-robust face recognition method by inferring the spatial misalignment parameters in a trained subspace. Cui et al. [29] address misalignment in face recognition by extracting sparse codes of position-free patches within each spatial block of the image. A pairwise-constrained multiple metric learning is proposed to integrate the face descriptors from all blocks.

For unconstrained faces with multiple variations such as pose, expression, etc., approaches have been developed to take several of these aspects into account simultaneously. For instance, multiple face representations and background statistics are combined in [30] for improved unconstrained face recognition. In [31], a reference set of faces is utilized for identity-preserving alignment and identity classifier learning. A collection of the classifiers is able to discriminate the subjects whose faces are captured in the wild.


TABLE I

RECENT WORK SUMMARY FOR FACE RECOGNITION VIA INDIRECT SIMILARITY AND REFERENCE-BASED DESCRIPTORS

Müller et al. [32] separate the learning of invariance from the learning of new instances of individuals. A set of examples, called a model, is used to learn the invariance, and new instances are compared by rank-list similarity.

2) Recognition via Indirect Matching: Recognition and retrieval via reference-based descriptors and indirect similarity have been previously explored in the field of computer vision. Shan et al. [33] and Guo et al. [34] use exemplar-based embedding for vehicle matching. Rasiwasia et al. [35] label images with a set of pre-defined visual concepts, and then use a probabilistic method based on the visual features for image retrieval. Liu et al. [36] represent human actions by a set of attributes, and perform activity recognition via visual characteristics symbolizing the spatial-temporal evolution of actions in a video.

Kumar et al. [22] propose using attribute and simile classifiers for verification, where face pairs are compared via their similes and attributes rather than by direct comparison. Experiments are performed on the PubFig and Labeled Faces in the Wild (LFW) [19] databases, in which images with significant pose variation are not present. Also, attribute classifiers require extensive training for recognition across pose.

In the field of face recognition, methods based on rank lists have been investigated. Schroff et al. [24] propose describing a face image by an ordered list of similar faces from a face library, i.e., a rank-list representation (Doppelgänger list) is generated for each image. Proximity between gallery and probe images is determined via the similarity between the ordered rank lists of the corresponding images. The main drawback of this approach is the complexity of comparing the Doppelgänger lists. The authors propose using the similarity measure of Jarvis and Patrick [37], which is computationally expensive. Also, no suitable indexing structure is available for efficient rank-list indexing.

An associate-predict model is proposed in [23] to handle the intra-personal variations due to pose and viewpoint change. The input face is first associated with similar identities from an additional database, and the associated face is then used in the prediction. Cui et al. [38] introduce a quadratic programming approach based on reference image sets for video-based face recognition. Images from the gallery and probe video sequences are bridged with a reference set that is pre-defined and pre-structured into a set of local models for exact alignment. Once the image sets are aligned, the similarity is measured by comparing the local models.

Chen et al. [26] represent a face x as the sum of two independent Gaussian random variables: x = μ + ε, where μ represents the identity of the face and ε represents face variations within the same identity. The covariance matrices of μ and ε, S_μ and S_ε, are learned from the training data using an expectation-maximization algorithm. Having learned these, the log-likelihood ratio for two input images x_1 and x_2 being of the same individual can be computed. A summary of the most recent work on face recognition via indirect similarity is shown in Table I.
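For reference, a standard way to write this likelihood-ratio score (our paraphrase of the joint Bayesian model of [26], not a quotation from it) is:

```latex
% Joint Bayesian score for a pair (x_1, x_2) under x = \mu + \varepsilon
r(x_1, x_2) = \log \frac{P(x_1, x_2 \mid \text{same identity})}
                        {P(x_1, x_2 \mid \text{different identities})},
\qquad
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \sim
\mathcal{N}\!\left( 0,
  \begin{bmatrix} S_\mu + S_\varepsilon & S_\mu \\
                  S_\mu & S_\mu + S_\varepsilon \end{bmatrix} \right)
\ \text{if same},
\quad
\mathcal{N}\!\left( 0,
  \begin{bmatrix} S_\mu + S_\varepsilon & 0 \\
                  0 & S_\mu + S_\varepsilon \end{bmatrix} \right)
\ \text{otherwise.}
```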

B. Contributions

As compared to the state-of-the-art approaches, the distinguishing characteristics of the proposed method are:

• A novel reference face graph (RFG) based face recognition framework is proposed. The contribution of the approach is shown by empirical experiments on various databases.

• DCT locality-sensitive hashing [39] is incorporated, in a novel manner, into the proposed framework for fast similarity computation, efficient retrieval, and scalability.

• The proposed framework can be used in conjunction with any feature descriptor (e.g., LBP [40], LGBP [41]). Results are shown on several publicly available face databases and compared with state-of-the-art techniques.

III. TECHNICAL APPROACH

Our proposed framework consists of two main steps: 1) preparing the reference face graph; 2) generating the reference face descriptors. In the following, we first define the terms used in this paper, and then discuss each of the above steps in detail.

A reference face graph (RFG) is a structure of nodes and the dyadic relationships (edges) between the nodes. A reference face is a node representing a single individual in the reference face graph. Each reference face has multiple images with various poses, expressions, and illumination. All the images of these reference faces build a set called the reference basis set.


Fig. 2. System diagram for reference face graph recognition.

Fig. 3. Reference basis set: a set of images containing multiple individuals (reference faces). Each reference face has multiple images with various poses, expressions, and illumination settings.

We use the term basis set because we represent each probe face or gallery face as a linear combination of the similarities to the reference faces in the reference basis set. A basis descriptor of an image is a vector that describes the image in terms of its similarity to the reference faces. A reference face descriptor (RFD) is defined by incorporating the RFG node centrality metrics into the basis descriptor.

Figure 2 shows the overall system diagram of the RFG framework for face identification. A similar process is used for face verification.

A. Building the Reference Face Graph

1) Initializing the Reference Face Graph: We define an RFG structure and pre-populate it with a set of individuals called the reference faces. The reference faces are not chosen from the gallery or probes. The reference faces build a set called the reference basis set. Figure 3 illustrates a sample reference basis set with N reference faces.

Each image in the reference basis set is partitioned into regions. In total we use eight different partitioning schemes to create 4, 9, 16, 25, 49, 64, 81, and 100 regions. The partitioned regions do not overlap. In each scheme, the regions are translated by 1/4 of their size both horizontally and vertically. By translation, we mean the shifting of grid lines (e.g., after translation of one grid line, the resulting partition regions of an image no longer have the same size).

TABLE II
PSEUDOCODE FOR COMPUTING DCT HASH-BASED SIMILARITY BETWEEN IMAGE A AND A REFERENCE FACE R_i

Fig. 4. An example of oversampling regions.

For each patch in a specific partitioning, a feature vector is extracted. These feature vectors are not concatenated. Instead, the feature vectors of the same person in the reference set constitute a reference face R_i, and DCT hash-based similarity [39] computation is performed as shown in Table II.

The motivation for having such regions is the common misalignment of face images after face detection, in addition to obtaining a scale-free approach without the need for facial feature point detection (e.g., eyes). Most state-of-the-art face recognition algorithms require aligned faces to operate with high accuracy. By using multiple regions with different scale and translation settings, we eliminate the need for face alignment; our approach is capable of working on non-aligned images with different scales, as the sketch below illustrates. Figure 4 shows a sample partitioning for a reference basis set image.
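A minimal sketch of the oversampled-region bookkeeping described above, assuming 100 × 100 face crops. How the paper combines the horizontal and vertical 1/4-size shifts is not fully specified, so the four shift combinations below are our assumption:

```python
def oversampled_regions(img_w=100, img_h=100):
    """Enumerate region boxes for the eight partitioning schemes
    (2x2, 3x3, ..., 10x10 grids, i.e., 4 to 100 regions) plus copies
    shifted by 1/4 of a region size horizontally and vertically.
    Boxes are clipped at the image border, so shifted regions may
    differ in size, as the paper notes."""
    boxes = []
    for g in (2, 3, 4, 5, 7, 8, 9, 10):
        rw, rh = img_w // g, img_h // g
        for dx, dy in ((0, 0), (rw // 4, 0), (0, rh // 4), (rw // 4, rh // 4)):
            for gy in range(g):
                for gx in range(g):
                    x0, y0 = gx * rw + dx, gy * rh + dy
                    if x0 < img_w and y0 < img_h:
                        boxes.append((x0, y0,
                                      min(x0 + rw, img_w),
                                      min(y0 + rh, img_h)))
    return boxes
```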

Each reference face is assigned a node R_i, i = 1 … N, in the RFG. Each R_i is connected to all other nodes via a direct edge e, i.e., the RFG is complete. The weight w_{ij} of the edge e_{ij} between node i and node j represents the similarity between reference faces R_i and R_j, defined as

    w_{ij} = \mathrm{sim}(R_i, R_j) = \max_{u,v} \mathrm{sim}(R_i^u, R_j^v),    (1)

where R_i^u and R_j^v refer to images u and v of reference faces R_i and R_j, respectively. Equation (1) states that the similarity between two reference faces is the maximum similarity over all pairs of their images.


Fig. 5. Sample reference face graph. The smaller circles within each node refer to the multiple images of each reference face in the reference basis set. The edge weights represent the similarity between reference faces following Equation (1).

The Chi-square measure [40] is used to compute the similarity between two reference basis set images. Specifically, given a Chi-square distance x, we convert it into a similarity score s using the Chi-square cumulative distribution function:

    s = 1 - \frac{\gamma(\frac{k}{2}, \frac{x}{2})}{\Gamma(\frac{k}{2})},    (2)

where \Gamma(\frac{k}{2}) is the gamma function and \gamma is the lower incomplete gamma function. k refers to the degrees of freedom, which is chosen as 59 in the experiments. As for the features, we utilize various texture-based descriptors (e.g., LBP [40], LGBP [41]). Later, in Section IV, we state the type of features used for each experiment.
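Since Equation (2) is exactly one minus the CDF of a chi-square distribution with k degrees of freedom, i.e., its survival function, the conversion can be written in one line with SciPy. A minimal sketch, with k = 59 as in the experiments:

```python
from scipy.stats import chi2

def chi2_distance_to_similarity(x, k=59):
    """Convert a Chi-square distance x into a similarity score s
    via Eq. (2): s = 1 - CDF_k(x), the survival function at x."""
    return chi2.sf(x, df=k)
```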

It is important to note that each node represents a reference face, which itself consists of several images of the same person with various pose, expression, and illumination. Figure 5 shows a sample RFG with four reference faces. The smaller circles within each node refer to the multiple images of each reference face in the reference basis set.

2) Node Centrality Measures: We represent the proposed RFG as a static undirected weighted graph G = (V, E) consisting of a set of nodes V and a set of edges E. We utilize both structural and linkage-based analysis to determine the more distinctive reference faces. Our goal is to use the linkage structure to propagate the labels among different nodes. We measure the centrality of each reference face to determine how important it is in the RFG. To this end, we adopt three measures of node centrality for weighted graphs: degree, betweenness, and closeness [42].

Generally, in weighted graphs, the degree of a node is extended to the sum of edge weights and is called node strength. Since G is an undirected weighted graph, we use node strength instead of degree as a measure of centrality. The node strength C^w_D(i) of node i [42] is defined as the sum of the weights of all edges connected to i, that is,

    C^w_D(i) = \sum_{j}^{N} w_{ij},    (3)

where j ranges over all other nodes, N is the total number of nodes, and w_{ij} denotes the weight of the edge between node i and node j. For the purpose of this paper, we use the average node weakness C'^w_D(i) of each node. We use the term weakness because in the proposed RFG, a node with high node strength is less distinct than nodes with low node strength; in other words, a node with high node weakness is less similar to its neighboring nodes. C'^w_D(i) is defined as:

    C'^w_D(i) = 1 - \frac{C^w_D(i)}{N - 1}.    (4)

Before discussing betweenness and closeness, the distance between two nodes must be defined. In a weighted graph, the distance between node i and node j is defined as

    d^w(i, j) = \min\!\left( \frac{1}{w_{i h_u}} + \cdots + \frac{1}{w_{h_v j}} \right),    (5)

where h_u and h_v represent intermediary nodes on the path from i to j.

Betweenness is the second measure of centrality used in this paper. The betweenness C^w_B(i) of node i is defined via the number of shortest paths from all nodes to all other nodes that pass through node i. In other words,

    C^w_B(i) = \sum_{j \neq i \neq k} \frac{g^w_{jk}(i)}{g^w_{jk}},    (6)

where g^w_{jk} is the number of shortest paths connecting j and k, and g^w_{jk}(i) is the number of shortest paths connecting j and k that node i is a part of. We use the normalized value of betweenness, C'^w_B(i), defined as

    C'^w_B(i) = \frac{C^w_B(i)}{0.5\,(N - 1)(N - 2)}.    (7)

The third measure of centrality used in this paper is closeness. Closeness is defined as the inverse of the sum of shortest distances from a focal node to all other nodes [42]. In other words, closeness reflects the length of the average shortest path between a node and all other nodes in the graph. More formally,

    C^w_C(i) = \left[ \sum_{j}^{N} d^w(i, j) \right]^{-1}.    (8)

For the proposed RFG, the normalized value of closeness, C'^w_C(i), is used. It is defined as

    C'^w_C(i) = \frac{C^w_C(i)}{N - 1}.    (9)

Node weakness, betweenness, and closeness are computed for all reference faces in the RFG. For the nodes in the weighted graph G (Figure 5),

    \mathbf{C}'^w_D = \{C'^w_D(1), C'^w_D(2), \ldots, C'^w_D(N)\}    (10)

    \mathbf{C}'^w_B = \{C'^w_B(1), C'^w_B(2), \ldots, C'^w_B(N)\}    (11)

    \mathbf{C}'^w_C = \{C'^w_C(1), C'^w_C(2), \ldots, C'^w_C(N)\}    (12)

where \mathbf{C}'^w_D represents the vector of node weakness values, \mathbf{C}'^w_B denotes the vector of normalized betweenness values, and \mathbf{C}'^w_C refers to the vector of normalized closeness values.
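As a concrete companion to Equations (3)–(12), the sketch below computes the three centrality vectors with NetworkX from a pairwise similarity matrix. It is a minimal illustration under the assumption that all w_ij lie in (0, 1]; the function name is ours, not the paper's:

```python
import itertools
import networkx as nx
import numpy as np

def rfg_centralities(W):
    """Compute the weakness, betweenness, and closeness vectors of
    Eqs. (4), (7), and (9) from a symmetric similarity matrix W,
    where W[i, j] = w_ij in (0, 1]."""
    N = W.shape[0]
    G = nx.Graph()
    for i, j in itertools.combinations(range(N), 2):
        # Edge weight = similarity; 'dist' = 1/w is the path cost of Eq. (5).
        G.add_edge(i, j, weight=W[i, j], dist=1.0 / W[i, j])

    strength = np.array([G.degree(i, weight='weight') for i in range(N)])
    weakness = 1.0 - strength / (N - 1)                      # Eq. (4)

    # NetworkX normalizes undirected betweenness by (N-1)(N-2)/2,
    # which matches Eq. (7).
    bc = nx.betweenness_centrality(G, weight='dist', normalized=True)
    betweenness = np.array([bc[i] for i in range(N)])

    # Closeness per Eqs. (8)-(9): inverse sum of shortest distances,
    # then divided by N-1.
    dists = dict(nx.all_pairs_dijkstra_path_length(G, weight='dist'))
    closeness = np.array([1.0 / sum(dists[i].values()) / (N - 1)
                          for i in range(N)])
    return weakness, betweenness, closeness
```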


B. Generating the Reference Face Descriptors

In the following, we discuss how the RFDs are generated for the probe and gallery images for face identification. The same methodology holds for verification as well.

A probe or gallery image is described as a function of its region-based cosine similarity to the reference faces. Computing the reference face descriptor G_A for image A consists of two steps. First, the basis descriptor F_A is generated. F_A (Equation (13)) is an N-dimensional vector representing the similarity between image A and the reference faces in the reference basis set:

    F_A = [\mathrm{sim}(A, R_1), \ldots, \mathrm{sim}(A, R_N)].    (13)

Second, node centrality measures are incorporated to compute the reference face descriptor G_A from the basis descriptor F_A.

1) Weights for Face Regions: When computing the similarity between two images, a weight w_p ∈ [0, 1] is assigned to each region p of a face. The weights are computed via a genetic optimization algorithm [43] whose objective function maximizes classification accuracy. A set of training data disjoint from the testing data is used to compute the weights for the different face regions. The motivation for weighting each region is that specific regions of the human face (e.g., eyes and mouth) are more discriminative than others for face recognition [40]. Images from the FERET face database [3] are used to learn the weights of the face regions. We ensure that the identities used for learning region weights are disjoint from the training data used for building the reference face graph and from the test data.

2) DCT Locality Sensitive Hashing [39]: Computing sim(A, R_i) between image A and a reference face R_i is computationally expensive because of the large number of oversampled regions. For this reason, we use DCT locality-sensitive hashing [39] to compute sim(A, R_i). The DCT hash function maps input vectors to a set of hashes of size H chosen from a universe 1 … U, where U is a large integer (we chose U to be 2^16). In this way, computing F_A is fast and efficient, and it is performed in constant time. The number of hashes per input vector, H, is 200, and common hash suppression (CHS) is utilized. The algorithm for computing the DCT hash-based similarity between A and R_i is shown in Table II. Further details on DCT hashing can be found in [39].
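The procedure in Table II rests on the idea that two vectors are similar when their hash-code sets overlap. The sketch below illustrates that hash-overlap pattern in generic form; it is our illustration of the principle, not the actual DCT hash function or pseudocode of [39], and all function names are hypothetical:

```python
def hash_overlap_similarity(hashes_a, hashes_b):
    """Approximate similarity of two feature vectors from their
    hash-code sets: fraction of shared codes (generic LSH pattern)."""
    a, b = set(hashes_a), set(hashes_b)
    return len(a & b) / max(len(a | b), 1)

def sim_image_to_reference(region_hashes_A, ref_images_region_hashes):
    """sim(A, R_i): best hash-based match over the reference face's
    images, accumulated over the oversampled regions."""
    best = 0.0
    for img_hashes in ref_images_region_hashes:  # one entry per image of R_i
        score = sum(hash_overlap_similarity(ha, hb)
                    for ha, hb in zip(region_hashes_A, img_hashes))
        best = max(best, score / max(len(region_hashes_A), 1))
    return best
```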

3) Reference Face Descriptor: Having generated the basis descriptor F_A for image A, the reference face descriptor G_A is defined as

    G_A = [F_A \odot \mathbf{C}'^w_D,\; F_A \odot \mathbf{C}'^w_B,\; F_A \odot \mathbf{C}'^w_C],    (15)

where \odot represents element-wise multiplication. The reference face descriptor G_A represents A in terms of its similarity to the reference faces, incorporating the centrality measures corresponding to each reference face. For face identification, reference face descriptors are generated for all probe and gallery images, and recognition is performed by comparing the reference face descriptors. For face verification, the similarity between the image pair's reference face descriptors is utilized.

For efficiency, reference face descriptors are compared via DCT hash-based retrieval [39]. The pseudocode for similarity computation between reference face descriptors is provided in Table II. To evaluate the contribution of the centrality measures, we provide a baseline using only reference-based (RB) face recognition without centrality measures (i.e., G_A = F_A, in contrast to Equation (15)).
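Assembling Equation (15) and the RB baseline then takes only a few lines once F_A and the centrality vectors are available; a minimal sketch under the same assumptions as the earlier snippets:

```python
import numpy as np

def reference_face_descriptor(F_A, weakness, betweenness, closeness):
    """Eq. (15): concatenate the basis descriptor re-weighted by each
    centrality vector (element-wise products)."""
    F_A = np.asarray(F_A)
    return np.concatenate([F_A * weakness, F_A * betweenness, F_A * closeness])

def rb_descriptor(F_A):
    """RB baseline: the basis descriptor alone, without centralities."""
    return np.asarray(F_A)
```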

IV. EXPERIMENTS

To show the effectiveness of the proposed method, we evaluate RFG for both face verification and identification, which are defined as follows:

• Verification: given a pair of face images, determine whether they belong to the same person (pair matching). The verification threshold is obtained by a linear SVM classifier trained on images not used during evaluation.

• Identification: automatically search for a facial image (probe) in a database (gallery), resulting in a set of facial images ranked by similarity.

The proposed RFG-based method is compared with several state-of-the-art algorithms on multiple face databases. RFG face recognition does not require any face alignment; that is, the input to the proposed method is the output of the face detection algorithm. In our experiments, we adopt the OpenCV implementation of the Viola-Jones face detector [5]. The detected faces have different resolutions depending on the original image size. We normalize all detections to a size of 100 × 100 in all the experiments.
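For concreteness, the detection-and-normalization step can be sketched with OpenCV's stock Haar cascade as below; the cascade file and detection parameters are common OpenCV defaults and are our assumptions, not values specified in the paper:

```python
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

def detect_and_normalize(image_bgr, size=(100, 100)):
    """Run the Viola-Jones detector [5] and resize each detected
    face crop to 100x100, as done in the experiments."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [cv2.resize(gray[y:y + h, x:x + w], size)
            for (x, y, w, h) in faces]
```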

A. Data

The images used in the experiments are from the following databases.

• LFW (Labeled Faces in the Wild) database [19]: 5,749 individuals, 13,233 images gathered from the web

• Multi-PIE face database [44]: 337 individuals, 755,370 images, 15 camera angles, 19 illumination settings, various facial expressions

• FacePix database [45]: 30 individuals, 16,290 images spanning a spectrum of 180° in increments of 1°, with various illumination conditions

• FEI face database [2]: 200 individuals, 2,800 images, 14 images per individual with various poses and illumination conditions

• CMU-PIE database [4]: 68 individuals, 41,368 images, 13 poses, 43 illumination conditions, 4 facial expressions

Each of the aforementioned databases has its own specific settings. For example, LFW [19] contains images taken under unconstrained settings with limited pose variation and occlusion, whereas FacePix [45] images include a variety of poses and illumination settings. We choose these databases to show how our algorithm performs under various constrained and unconstrained image settings. Figure 6 presents sample images from the databases.

We prepare two reference basis sets. The reference set with unconstrained faces is used for recognition on unconstrained databases (e.g., LFW), and the reference set with lab-controlled faces is used for recognition on lab-controlled databases (e.g., Multi-PIE).


Fig. 6. Sample images from public databases. (a) LFW [19]. (b) M-PIE [44]. (c) FacePix [45]. (d) FEI [2]. (e) PIE [4].

The first set contains unconstrained images gathered from the web for 300 subjects, where each subject has 30 images, and is used for the experiments on the LFW database [19]. No individual is common between the first reference basis set and the LFW database. The second reference basis set, used for experiments on the other databases, includes 500 individuals from the FEI [2] and Multi-PIE [44] databases. For each individual from the FEI database, 13 images are selected under various poses from full profile left to full profile right. For each individual from the Multi-PIE database, 13 poses are chosen from profile left to profile right in 15° increments. In addition, three images are selected for each pose with varying illumination (images #01, #07, and #13).

B. Reference Basis Set Evaluation

The reference face descriptors are obtained by projecting the gallery and probe images onto the reference basis set; therefore, the selection of the reference faces in the reference basis set plays an important role and affects the overall system performance. To evaluate the reference basis set, we adopt the method in [46], but from a different perspective. In [46], an image alignment method is proposed using sparse and low-rank decomposition. The low-rank decomposition learns the common component of a set of images with variations in illumination, expression, pose, etc., while the calculated sparse error component indicates the specificity of each image in the image set. In our application, we examine the diversity of the reference faces. Specifically, for each pose or specific expression in the reference basis set, a matrix D is constructed whose columns represent the images I_1, …, I_N of all reference faces. The goal is to determine the diversity of D and how effective it is as the basis function matrix. The optimization follows the formulation in [46]:

    \min_{A, E, \tau} \|A\|_* + \lambda \|E\|_1 \quad \text{s.t.} \quad D \circ \tau = A + E,    (16)

where A is the aligned version of D, \|A\|_* is the nuclear norm of A, E is the error matrix, \lambda is a positive weighting parameter, and \|E\|_1 is the \ell_1-norm of E. \tau is a set of transformations specified by

    D \circ \tau = [I_1 \circ \tau_1, \ldots, I_n \circ \tau_n],    (17)

where I_i \circ \tau_i represents applying transformation \tau_i to image I_i. If the images in D are similar, the sparse error E will be small, since the common component dominates. On the other hand, if the images in D are very dissimilar, the error E will be larger. In the reference basis set, dissimilar images are preferred, as they define a more definitive basis set.

Fig. 7. Mean squared error vs. reference basis set size. The black line is a second-order polynomial trendline fitted to the mean squared values.

Fig. 8. Verification accuracy on FacePix [45] with an increasing number of reference faces.

TABLE III
CLASSIFICATION ACCURACY ON LFW [19] USING THE PROTOCOL OF "UNRESTRICTED, LABELED OUTSIDE DATA RESULTS"

Figure 7 shows the mean squared error of E, averaged over multiple experimental runs, as the size of the reference basis set increases. The black line is a second-order polynomial trendline fitted to the mean squared values.

The peak of the MSE is observed at 400 reference faces in the reference basis set, which contains images from FEI [2] and Multi-PIE [44]. Thus, of the 500 individuals in the second reference basis set, only 400 are chosen for the experiments. A similar evaluation is performed for the first reference basis set, and based on the results, 250 individuals are chosen.

Figure 8 illustrates how the number of reference faces affects the verification accuracy on the FacePix database [45]. Four-Patch LBP (FPLBP) [30] is the feature used in this experiment. As the number of reference faces increases beyond 400, the plot flattens at around 78%. Note that the number of reference faces (N) determines the dimensionality of the reference face descriptors.

C. Comparison With Other Methods

We compare the proposed method with several state-of-the-art algorithms on multiple databases.

• Comparison on the LFW database [19]: We followed the default 10-fold verification protocol on the LFW database to provide a fair comparison. Table III shows the average classification accuracies of the proposed RFG approach and other methods, including a commercial recognition system.


Fig. 9. ROC plot for verification comparison on LFW [19].

Fig. 10. ROC plot comparison for original vs. aligned LFW.

With simple LBP features and without face alignment, our method achieves an average classification accuracy of 0.9284 with a standard deviation of 0.0027, which is very competitive.

To examine the performance in detail, we further compare RFG recognition with the associate-predict (AP) model [23], the likelihood-predict (LP) model [23], cosine similarity metric learning [50], and the message passing model [25] using Receiver Operating Characteristic (ROC) curves for face verification on the LFW database. These methods use the same LBP features as our approach; specifically, for all methods, uniform LBP descriptors with radius 3 and 8 neighboring pixels are used. Note that all the methods in Figure 9 except the proposed RFG operate on the aligned or funneled version of LFW, whereas RFG uses the non-aligned version. The results in Figure 9 clearly demonstrate that RFG outperforms the other methods even though no alignment is performed. The results using RB are also shown in this figure; compared to RB, which does not utilize centrality measures, RFG achieves better results.

Figure 10 shows how the proposed RFG performs on the aligned version of LFW in comparison with the original non-aligned LFW. The results show that RFG performs very similarly on aligned and non-aligned images.

Note that although some recent methods (see [31], [11], [29]) have reported better performance on the LFW database, they have limitations that are not present in our method. For example, in [31], 95 parts of the face have to be located to perform alignment, and 5000 classifiers have to be trained, involving a large number of parameters. In [11], features are extracted at dense facial landmarks, and the method heavily relies on accurate facial landmark detection.

Fig. 11. Comparison with MKD-SRC [14] on LFW [19].

In [29], the input face has to be aligned, and a Mahalanobis matrix has to be learned for each face region; besides, the method is not scalable to larger databases. On the contrary, our method does not require any face alignment. In addition, the use of DCT hashing guarantees the efficiency and scalability of our method, and the framework is compatible with any advanced feature descriptor (e.g., Local Quantized Patterns (LQP) [18], the Learning-based (LE) descriptor [23]) to further improve the performance.

Figure 11 compares the proposed RFG with the multi-keypoint descriptor sparse representation-based classification method (MKD-SRC) [14], a recent alignment-free approach for practical face recognition. For RFG, LBP features are used with parameters similar to those used for Figure 9. Results for MKD-SRC are reported using two features: Gabor Ternary Patterns (GTP) [14] and SIFT descriptors [52]. RFG performs significantly better than MKD-SRC even though standard LBP features are used.

• Comparison with 3D pose normalization [51]: We compare the proposed method with the 3D pose normalization method of [51] for face identification on the FacePix [45] and Multi-PIE [44] databases. Table IV shows how RFG outperforms 3D pose normalization [51] in terms of rank-1 recognition rate on the FacePix database. The experiment setup is as follows. The gallery includes the frontal face image of all 30 subjects from the FacePix database. For the probe images, 180 images per subject are chosen, with poses ranging from −90° to +90° in yaw angle. The 3D recognition system from [51] requires exact alignment of the face images, and it is limited to poses ranging from −45° to +45°. For a fair comparison, Local Gabor Binary Pattern [41] descriptors are used for all methods in this experiment.

For close-to-frontal poses, ranging from −30° to +30°, 3D pose normalization performs better than RFG. This is because reconstructing the frontal face image from a close-to-frontal face image can be done accurately via 3D pose normalization. For poses ranging from −90° to −31° and from +31° to +90°, 3D pose normalization is either incapable of recognition or results in a lower recognition rate than our proposed method. These results show that the proposed reference face graph based recognition algorithms have superior performance in recognition across pose; when the pose displacement is large, the 3D approach fails to compete with our proposed methods. More importantly, 3D pose normalization requires additional steps such as face boundary extraction and face alignment.


TABLE IV
RANK-1 RECOGNITION RATES (%) ON FACEPIX [45]; 3D POSE NORMALIZATION [51] VS. RB AND RFG

TABLE V
RANK-1 RECOGNITION RATES (%) ON MULTI-PIE [44]; 3D POSE NORMALIZATION [51] VS. RFG

The results for RB without centrality measures are also provided; it is observed that across the different poses, RFG consistently outperforms RB.

Table V compares RFG with 3D pose normalization on the Multi-PIE database. The gallery and probes are chosen according to the experiments performed in [51]. Images that are part of this experiment are excluded from the reference set to avoid training and testing on the same images. In addition, we ensure that the identities in the test set do not overlap with identities in the reference set, which is made up of both the Multi-PIE and FEI databases; thus, a fair comparison with [51] is made. The chosen probes contain six different poses between −45° and +45°. As in the previous experiment, LGBP descriptors are used as features. The results in Table V show that RFG outperforms 3D pose normalization for all six poses from the Multi-PIE database [44]. It is to be noted that there are recent methods that report better results on the Multi-PIE database (see [53], [54]); however, these methods have to normalize the faces using manually labeled facial points, while alignment is not necessary for our method.

• Comparison with Doppelgänger lists [24]: Table VI compares the proposed RFG algorithm with the Doppelgänger list comparison introduced in [24] in terms of verification accuracy at the equal error rate. FPLBP [30] is utilized as the feature descriptor for all methods. Similar to [24], ten test sets are chosen from the FacePix database [45], each including 500 positive and 500 negative pairs. Selected poses range from −90° to +90°. For direct FPLBP, the probe FPLBP descriptors are compared directly, and verification is based solely on direct comparison of FPLBP features. Note that our proposed method requires neither alignment of the face images nor any canonical coordinates or facial control points (eyes, nose, etc.), whereas Doppelgänger list comparison [24] requires face image alignment.

The results in Table VI show that even when only RB is used, the verification rate is higher than that of the competing methods. The proposed RFG algorithm outperforms Doppelgänger list comparison [24] by 10.6% without image alignment. Table VI also shows that, similar to Doppelgänger list comparison, our proposed method is robust to pose variation. Figure 12 shows example FacePix image pairs and whether they are correctly or incorrectly classified.

• Comparison with MRF model image matching [12]: Table VII demonstrates how reference face graph based recognition performs under uneven illumination settings in addition to pose differences.

Fig. 12. FacePix [45] example image pairs. (a) Image pairs correctly classified by RFG but incorrectly classified by DLC [24]. (b) Image pair incorrectly classified by both RFG and DLC [24].

For this purpose, we select the probes as a subset of the CMU-PIE database [4] images with three illumination settings and three pose angles (frontal, profile, and 3/4 profile). The gallery consists of the frontal face images of all 68 subjects of the CMU-PIE database. From the 21 images with different illumination settings available for each pose, we randomly choose 3 images to perform recognition and obtain the varying-illumination results in Table VII. Uniform LBP features with a radius of 3 and 8 neighboring pixels are used as features [40].

The MRF model image matching [12] results in Table VII show that its rank-1 recognition rate decreases by 38.8% (79% to 40.2%) when the illumination conditions change for the profile pose. Under the same conditions, we observe a rank-1 recognition rate decrease of only 1.3% (85.2% to 83.9%) for the proposed RFG algorithm. A similar recognition rate drop-off is observed for the 3/4 profile pose. This shows that the proposed face recognition algorithm degrades little under varying illumination settings.

• Comparison with stereo matching [55], [56]: The proposed RFG algorithm is compared with two stereo matching techniques for recognition across pose [55], [56]. Stereo matching provides a robust, pose-insensitive measure of face similarity. The comparison is performed on the CMU-PIE database [4]. For each of the 68 individuals in the PIE database, one image is randomly selected for the gallery and the remaining images serve as probes. In this experiment, the Stereo Matching Distance [55] and Slant Stereo Matching Distance [56] methods report 82.4% and 85.3% rank-1 recognition rates, respectively, whereas the proposed RFG algorithm achieves a 92.4% rank-1 recognition rate, as shown in Table VIII.

Compared to the second-best results, by Slant Stereo Matching Distance [56], the proposed method reports a performance gain of over 8% in terms of rank-1 recognition rate on the CMU-PIE database. Using the stereo matching cost as a measure of similarity between two images with different pose and illumination settings is expensive: it requires the detection of landmark points, and the cost of retrieval is high for large databases. On the contrary, the proposed approach does not require facial feature detection and is fast and efficient due to its integration with DCT hashing-based retrieval.


TABLE VI
VERIFICATION ACCURACY (%) OVER VARIOUS POSE RANGES ON FACEPIX [45]; DOPPELGÄNGER LISTS [24] VS. RB AND RFG

TABLE VII
RANK-1 RECOGNITION RATE (%) COMPARISON UNDER NEUTRAL AND VARYING ILLUMINATION ON CMU-PIE [4]; MRF MODEL IMAGE MATCHING [12] VS. RFG

TABLE VIII
RANK-1 RECOGNITION RATE COMPARISON ON CMU-PIE [4]; STEREO MATCHING [55], [56] VS. RFG

D. Computational Cost

The proposed method incorporates DCT locality-sensitive hashing [39] for efficient similarity computation. Experiments were performed on a laptop with an Intel Core i7 CPU and 8 GB of RAM. The time required to generate the reference face descriptor (online processing) for a given image is 0.01 second on average with non-optimized code. The results in [39] showed that the performance of DCT hashing is very close to that of a linear scan. The time needed to retrieve a probe from a gallery of 40k images using DCT hashing is about 10 ms, while a linear scan takes about 9000 ms. In addition, the time for a linear scan increases linearly with the size of the gallery, whereas the time for DCT retrieval is essentially constant. See [39] for more details and experimental results. The main limitation of the current approach is that batch training is performed to build the reference face graph; however, this needs to be performed offline and only once.

V. CONCLUSIONS AND FUTURE WORK

We proposed a novel reference face graph (RFG) based approach for face recognition in real-world scenarios. Extensive empirical experiments on several publicly available databases demonstrate that the proposed method outperforms state-of-the-art methods with similar feature descriptor types. The proposed approach does not require face alignment, and it is robust to changes in pose. Results on real-world data with additional complications such as expression, scale, and illumination suggest that the RFG-based approach is also robust in these aspects. The proposed approach is scalable due to the integration of DCT hashing with the descriptive reference face graph, which covers a variety of face images with different pose, expression, scale, and illumination.

Future work includes the development of other effective methods for reference set selection. Another direction is to study feature transformation or selection methods to further improve the performance of the proposed reference face graph based face recognition.

REFERENCES

[1] The New York Times. [Online]. Available: http://www.nytimes.com/2013/04/19/us/fbi-releases-video-of-boston-bombing-suspects.html, accessed Apr. 2013.

[2] FEI Face Database. [Online]. Available: http://www.fei.edu.br/~cet/facedatabase.html

[3] P. J. Phillips, H. Moon, P. Rauss, and S. A. Rizvi, "The FERET evaluation methodology for face-recognition algorithms," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 1997, pp. 137–143.

[4] T. Sim, S. Baker, and M. Bsat, "The CMU pose, illumination, and expression (PIE) database," in Proc. 5th IEEE Int. Conf. Autom. Face Gesture Recognit., May 2002, pp. 46–51.

[5] P. Viola and M. J. Jones, "Robust real-time face detection," Int. J. Comput. Vis., vol. 57, no. 2, pp. 137–154, May 2004.

[6] X. Tan and B. Triggs, "Enhanced local texture feature sets for face recognition under difficult lighting conditions," IEEE Trans. Image Process., vol. 19, no. 6, pp. 1635–1650, Jun. 2010.

[7] (2012). Facevacs Software Developer Kit, Cognitec Systems GmbH. [Online]. Available: http://www.cognitec-systems.de

[8] L. Wiskott, J.-M. Fellous, N. Kuiger, and C. von der Malsburg, "Face recognition by elastic bunch graph matching," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 775–779, Jul. 1997.

[9] O. Barkan, J. Weill, L. Wolf, and H. Aronowitz, "Fast high dimensional vector multiplication face recognition," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2013, pp. 1960–1967.

[10] Z. Lei, M. Pietikainen, and S. Z. Li, "Learning discriminant face descriptor," IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 2, pp. 289–302, Feb. 2014.

[11] D. Chen, X. Cao, F. Wen, and J. Sun, "Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2013, pp. 3025–3032.

[12] S. R. Arashloo and J. Kittler, "Energy normalization for pose-invariant face recognition based on MRF model image matching," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 6, pp. 1274–1280, Jun. 2011.

[13] P. Nagesh and B. Li, "A compressive sensing approach for expression-invariant face recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 1518–1525.

[14] S. Liao, A. K. Jain, and S. Z. Li, "Partial face recognition: Alignment-free approach," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 5, pp. 1193–1205, May 2013.

[15] U. Prabhu, J. Heo, and M. Savvides, "Unconstrained pose-invariant face recognition using 3D generic elastic models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 10, pp. 1952–1961, Oct. 2011.

[16] P. Li, Y. Fu, U. Mohammed, J. H. Elder, and S. J. D. Prince, "Probabilistic models for inference about identity," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 1, pp. 144–157, Jan. 2012.

[17] H. Li, G. Hua, Z. Lin, J. Brandt, and J. Yang, "Probabilistic elastic matching for pose variant face verification," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2013, pp. 3499–3506.

[18] S. U. Hussain, T. Napoléon, and F. Jurie, "Face recognition using local quantized patterns," in Proc. Brit. Mach. Vis. Conf., 2012, pp. 99.1–99.11.


[19] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, "Labeled faces in the wild: A database for studying face recognition in unconstrained environments," Dept. Comput. Sci., Univ. Massachusetts, Amherst, MA, USA, Tech. Rep. 07-49, Oct. 2007.

[20] S. Z. Li, R. Chu, S. Liao, and L. Zhang, "Illumination invariant face recognition using near-infrared images," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 4, pp. 627–639, Apr. 2007.

[21] C.-K. Hsieh, S.-H. Lai, and Y.-C. Chen, "Expression-invariant face recognition with constrained optical flow warping," IEEE Trans. Multimedia, vol. 11, no. 4, pp. 600–610, Jun. 2009.

[22] N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar, "Attribute and simile classifiers for face verification," in Proc. IEEE Int. Conf. Comput. Vis., Sep./Oct. 2009, pp. 365–372.

[23] Q. Yin, X. Tang, and J. Sun, "An associate-predict model for face recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2011, pp. 497–504.

[24] F. Schroff, T. Treibitz, D. Kriegman, and S. Belongie, "Pose, illumination and expression invariant pairwise face-similarity measure via Doppelgänger list comparison," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Nov. 2011, pp. 2494–2501.

[25] W. Shen, B. Wang, Y. Wang, X. Bai, and L. J. Latecki, "Face identification using reference-based features with message passing model," Neurocomputing, vol. 99, pp. 339–346, Jan. 2013.

[26] D. Chen, X. Cao, L. Wang, F. Wen, and J. Sun, "Bayesian face revisited: A joint formulation," in Proc. Eur. Conf. Comput. Vis., vol. 7574, 2012, pp. 566–579.

[27] S. Shan, Y. Chang, W. Gao, B. Cao, and P. Yang, "Curse of mis-alignment in face recognition: Problem and a novel mis-alignment learning solution," in Proc. IEEE Int. Conf. Autom. Face Gesture Recognit., May 2004, pp. 314–320.

[28] S. Yan, H. Wang, J. Liu, X. Tang, and T. S. Huang, "Misalignment-robust face recognition," IEEE Trans. Image Process., vol. 19, no. 4, pp. 1087–1096, Apr. 2010.

[29] Z. Cui, W. Li, D. Xu, S. Shan, and X. Chen, "Fusing robust face region descriptors via multiple metric learning for face recognition in the wild," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2013, pp. 3554–3561.

[30] L. Wolf, T. Hassner, and Y. Taigman, "Effective unconstrained face recognition by combining multiple descriptors and learned background statistics," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 10, pp. 1978–1990, Oct. 2011.

[31] T. Berg and P. N. Belhumeur, "Tom-vs-Pete classifiers and identity-preserving alignment for face verification," in Proc. Brit. Mach. Vis. Conf., 2012, pp. 129.1–129.11.

[32] M. K. Müller, M. Tremer, C. Bodenstein, and R. P. Würtz, "Learning invariant face recognition from examples," Neural Netw., vol. 41, pp. 137–146, May 2013.

[33] Y. Shan, H. Sawhney, and R. Kumar, "Vehicle identification between non-overlapping cameras without direct feature matching," in Proc. 10th IEEE Int. Conf. Comput. Vis., vol. 1, Oct. 2005, pp. 378–385.

[34] Y. Guo, Y. Shan, H. Sawhney, and R. Kumar, "PEET: Prototype embedding and embedding transition for matching vehicles over disparate viewpoints," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2007, pp. 1–8.

[35] N. Rasiwasia, P. J. Moreno, and N. Vasconcelos, "Bridging the gap: Query by semantic example," IEEE Trans. Multimedia, vol. 9, no. 5, pp. 923–938, Aug. 2007.

[36] J. Liu, B. Kuipers, and S. Savarese, "Recognizing human actions by attributes," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2011, pp. 3337–3344.

[37] R. A. Jarvis and E. A. Patrick, "Clustering using a similarity measure based on shared near neighbors," IEEE Trans. Comput., vol. C-22, no. 11, pp. 1025–1034, Nov. 1973.

[38] Z. Cui, S. Shan, H. Zhang, S. Lao, and X. Chen, "Image sets alignment for video-based face recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2012, pp. 2626–2633.

[39] M. Kafai, K. Eshghi, and B. Bhanu, "Discrete cosine transform locality-sensitive hashes for face retrieval," IEEE Trans. Multimedia, vol. 16, no. 4, pp. 1090–1103, Jun. 2014.

[40] T. Ahonen, A. Hadid, and M. Pietikainen, "Face description with local binary patterns: Application to face recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 12, pp. 2037–2041, Dec. 2006.

[41] W. Zhang, S. Shan, W. Gao, X. Chen, and H. Zhang, "Local Gabor binary pattern histogram sequence (LGBPHS): A novel non-statistical model for face representation and recognition," in Proc. 10th IEEE Int. Conf. Comput. Vis., vol. 1, Oct. 2005, pp. 786–791.

[42] T. Opsahl, F. Agneessens, and J. Skvoretz, "Node centrality in weighted networks: Generalizing degree and shortest paths," Soc. Netw., vol. 32, no. 3, pp. 245–251, 2010.

[43] A. Popov. (2005). Genetic Algorithms for Optimization. [Online]. Available: http://www.automatics.hit.bg

[44] R. Gross, I. Matthews, J. Cohn, T. Kanade, and S. Baker, "Multi-PIE," Image Vis. Comput., vol. 28, no. 5, pp. 807–813, 2010.

[45] D. Little, S. Krishna, J. Black, and S. Panchanathan, "A methodology for evaluating robustness of face recognition algorithms with respect to variations in pose angle and illumination angle," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 2, Mar. 2005, pp. 89–92.

[46] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma, "RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 11, pp. 2233–2246, Nov. 2012.

[47] Z. Cao, Q. Yin, X. Tang, and J. Sun, "Face recognition with learning-based descriptor," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2010, pp. 2707–2714.

[48] T. Berg and P. N. Belhumeur, "POOF: Part-based one-vs.-one features for fine-grained categorization, face verification, and attribute estimation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2013, pp. 955–962.

[49] Y. Taigman and L. Wolf. (2011). "Leveraging billions of faces to overcome performance barriers in unconstrained face recognition." [Online]. Available: http://arxiv.org/abs/1108.1122

[50] H. V. Nguyen and L. Bai, "Cosine similarity metric learning for face verification," in Proc. 10th Asian Conf. Comput. Vis., 2010, pp. 709–720.

[51] A. Asthana, T. K. Marks, M. J. Jones, K. H. Tieu, and M. Rohith, "Fully automatic pose-invariant face recognition via 3D pose normalization," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Nov. 2011, pp. 937–944.

[52] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vis., vol. 60, no. 2, pp. 91–110, 2004.

[53] X. Cai, C. Wang, B. Xiao, X. Chen, and J. Zhou, "Regularized latent least square regression for cross pose face recognition," in Proc. 23rd Int. Joint Conf. Artif. Intell., 2013, pp. 1247–1253.

[54] A. Li, S. Shan, and W. Gao, "Coupled bias–variance tradeoff for cross-pose face recognition," IEEE Trans. Image Process., vol. 21, no. 1, pp. 305–315, Jan. 2012.

[55] C. D. Castillo and D. W. Jacobs, "Using stereo matching with general epipolar geometry for 2D face recognition across pose," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 12, pp. 2298–2304, Dec. 2009.

[56] C. D. Castillo and D. W. Jacobs, "Wide-baseline stereo for face recognition with large pose variation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2011, pp. 537–544.

Mehran Kafai (S'11–M'13) is currently a Research Scientist with Hewlett Packard Laboratories, Palo Alto, CA, USA. He received the M.Sc. degree in computer engineering from the Sharif University of Technology, Tehran, Iran, in 2005, the M.Sc. degree in computer science from San Francisco State University, San Francisco, CA, USA, in 2009, and the Ph.D. degree in computer science from the Center for Research in Intelligent Systems, University of California at Riverside, Riverside, CA, USA, in 2013. His recent research has been concerned with secure computation, information retrieval, and big data analysis.

Le An (S'13) received the B.Eng. degree in telecommunications engineering from Zhejiang University, Hangzhou, China, in 2006, and the M.Sc. degree in electrical engineering from the Eindhoven University of Technology, Eindhoven, The Netherlands, in 2008. He is currently pursuing the Ph.D. degree with the Department of Electrical Engineering, University of California at Riverside, Riverside, CA, USA, where he is a Graduate Student Researcher with the Center for Research in Intelligent Systems. His research interests include image processing, computer vision, pattern recognition, and machine learning. His current research focuses on face recognition, person reidentification, and facial expression recognition. He was a recipient of the Best Paper Award at the 2013 IEEE International Conference on Advanced Video and Signal-Based Surveillance.


Bir Bhanu (S'72–M'82–SM'87–F'95) received the S.M. and E.E. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology, Cambridge, MA, USA, the Ph.D. degree in electrical engineering from the University of Southern California, Los Angeles, CA, USA, and the M.B.A. degree from the University of California at Irvine, Irvine, CA, USA.

He serves as the Interim Chair of Bioengineering and the Founding Director of the Interdisciplinary Center for Research in Intelligent Systems with the University of California at Riverside (UCR), Riverside, CA, USA, where he is currently the Distinguished Professor of Electrical and Computer Engineering. He was the Founding Professor with the College of Engineering, where he served as the Founding Chair of Electrical Engineering from 1991 to 1994. Since 1991, he has been the Director of the Visualization and Intelligent Systems Laboratory at UCR. He currently serves as the Director of the National Science Foundation (NSF) Interdisciplinary Graduate Education, Research, and Training Program in video bioinformatics at UCR. Since 1991, 2006, and 2008, respectively, he has been a Cooperative Professor of Computer Science and Engineering, Bioengineering, and Mechanical Engineering at UCR. He was a Senior Honeywell Fellow with Honeywell Inc., Minneapolis, MN, USA. He has been with the faculty of Computer Science, University of Utah, Salt Lake City, UT, USA, and with Ford Aerospace and Communications Corporation, Newport Beach, CA, USA, the French Institute for Research in Computer Science and Control, Paris, France, and the IBM San Jose Research Laboratory, San Jose, CA, USA. He has been the Principal Investigator of various programs for the NSF, the Defense Advanced Research Projects Agency (DARPA), NASA, the Air Force Office of Scientific Research, the Office of Naval Research, the Army Research Office, and other agencies and industries in the areas of video networks, video understanding, video bioinformatics, learning and vision, image understanding, pattern recognition, target recognition, biometrics, autonomous navigation, image databases, and machine-vision applications. He is the coauthor of the books Computational Learning for Adaptive Computer Vision (to be published), Human Recognition at a Distance in Video (Berlin, Germany: Springer-Verlag, 2011), Human Ear Recognition by Computer (Berlin, Germany: Springer-Verlag, 2008), Evolutionary Synthesis of Pattern Recognition Systems (Berlin, Germany: Springer-Verlag, 2005), Computational Algorithms for Fingerprint Recognition (Norwell, MA, USA: Kluwer, 2004), Genetic Learning for Adaptive Image Segmentation (Norwell, MA, USA: Kluwer, 1994), and Qualitative Motion Understanding (Norwell, MA, USA: Kluwer, 1992). He is the coeditor of the books Computer Vision Beyond the Visible Spectrum (Berlin, Germany: Springer-Verlag, 2004), Distributed Video Sensor Networks (Berlin, Germany: Springer-Verlag, 2011), and Multibiometrics for Human Identification (Cambridge, U.K.: Cambridge University Press, 2011). He is the holder of 18 (five pending) U.S. and international patents. He has authored over 470 reviewed technical publications, including over 125 journal papers and 44 book chapters.

Dr. Bhanu is a fellow of the American Association for the Advancement of Science, the International Association of Pattern Recognition, and the International Society for Optical Engineering. He has served as the General Chair of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), the IEEE Conference on Advanced Video and Signal-Based Surveillance, the Association for Computing Machinery/IEEE Conference on Distributed Smart Cameras, the DARPA Image Understanding Workshop, the IEEE Workshops on Applications of Computer Vision (founded in 1992, currently the Winter Applications of Computer Vision Conference), and the CVPR Workshops on Learning in Computer Vision and Pattern Recognition, Computer Vision Beyond the Visible Spectrum, and Multi-Modal Biometrics. He serves as the General Chair of the IEEE Winter Conference on Applications of Computer Vision (2015). He has served on the editorial boards of various journals and has edited special issues of several IEEE publications, including the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, the IEEE TRANSACTIONS ON IMAGE PROCESSING, the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS, PART B: CYBERNETICS, the IEEE TRANSACTIONS ON ROBOTICS AND AUTOMATION, the IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, the IEEE SENSORS JOURNAL, and the IEEE Computer Society. He also served on the IEEE Fellow Committee from 2010 to 2012. He received the Best Conference Papers and Outstanding Journal Paper Awards, the Industrial and University Awards for Research Excellence, Outstanding Contributions, and Team Efforts, and the UCR Doctoral/Dissertation Advisor/Mentor Award.