

3D Face Recognition based on G-H Shape variation

Chenghua Xu1, Yunhong Wang1, Tieniu Tan1, and Long Quan2

1 National Laboratory of Pattern Recognition, Institute of Automation, CAS, P. O. Box 2728, Beijing, P. R. China, 100080

1 {chxu, wangyh, tnt}@nlpr.ia.ac.cn

2 Department of Computer Science, Hong Kong University of Science and Technology, Kowloon, Hong Kong

2 [email protected]

Abstract. Face recognition has been an interesting issue in pattern recognition over the past few decades. In this paper, we propose a new method for face recognition using 3D information. During preprocessing, the scanned 3D point clouds are first registered together, and at the same time, the regular meshes are generated. Then the novel shape variation representation based on Gaussian-Hermite moments (GH-SVI) is proposed to characterize an individual. Experimental results on the 3D face database 3DPEF, with complex pose and expression variations, and 3D_RMA, likely the largest 3D face database currently available, demonstrate that the proposed features are very important to characterize an individual.

1 Introduction

Nowadays biometric identification has received much attention due to the social need to reliably characterize individuals. Of all the biometric features, the face is among the most common and most accessible, so face recognition remains one of the most active research issues in pattern recognition. Over the past few decades, most work has focused on 2D intensity or color images [1]. Since the accuracy of 2D face recognition is affected by variations in pose, expression, illumination and other factors, it is still difficult to develop a robust automatic 2D face recognition system.

With the rapid development of 3D acquisition equipment, 3D capture is becoming easier and faster, and face recognition based on 3D information is attracting more and more attention. 3D face recognition usually exploits depth information and surface features to characterize an individual, which provides a promising way to understand human facial features in 3D space and has the potential to improve the performance of a recognition system. The advantages of using 3D facial data, namely rich shape information, invariance of the measured features to geometric transformations, and a laser-scanner capture process that is immune to illumination variation, make 3D face recognition a promising way to overcome the difficulties faced by 2D face recognition.


Using 3D features to characterize an individual is the most common approach to 3D face recognition, and work in this category mainly focuses on how to extract and represent 3D features. Some earlier approaches based on curvature analysis [2,3,4] were proposed for face recognition using high-quality range data from laser scanners. In addition, several recognition schemes based on 3D surface features have been developed. Chua et al. [5] represent the rigid parts of the face by point signatures to identify an individual. Beumier et al. [6,7] propose two methods, surface matching and central/lateral profiles, to compare two instances; both construct central and lateral profiles to represent an individual. Tanaka et al. [8] treat face recognition as a 3D shape recognition problem for rigid free-form surfaces; each face is represented as an Extended Gaussian Image, constructed by mapping principal curvatures and their directions. In more recent work, Hesher et al. [9] use a 3D scanner to generate range images and register them by aligning salient facial features, and explore PCA to reduce the dimensionality of the feature vectors. Lee et al. [10] perform 3D face recognition by locating the nose tip and then forming a feature vector based on contours along the face at a sequence of depth values.

In this paper, a novel local surface representation, GH-SVI (Shape Variation Information based on Gaussian-Hermite moments), is extracted to represent facial features. We first define a metric to quantify the local shape information as a 1-D signal, and G-H moments [11,12] are then applied to describe the shape variation. This representation of shape variation is novel and shows excellent performance in our experiments.

The remainder of this paper is organized as follows. Section 2 describes the methods of preprocessing: nose tip detection and registration. The process of feature extraction is described in Section 3. Section 4 reports the experimental results and gives some comparisons with existing methods. Finally, Section 5 summarizes this paper.

2 Preprocessing

Usually, the point clouds from laser scanners have different poses. It is essential to exactly align the point clouds prior to feature extraction. In this section, we finely register point clouds using the facial rigid area.

2.1 Nose Tip Detection

In facial range data, the nose is the most distinct feature. Most existing methods [7,9,10] for nose detection are based on the assumption that the nose tip is the highest point in the range data. However, due to noise and rotation of the subject, this assumption does not always hold. Gordon [3] used curvature information to detect the nose; her method is suitable for clean 3D data but does not work when there are holes around the nose.

Here, we locate the nose using local statistical features [13]. This method is immune to rotation and translation, holes and outliers, and is suitable for multi-resolution data. Of all the instances in our 3D database, 3DPEF, only two samples fail. We mark the nose tip manually in the wrongly detected point clouds to ensure the following processing.

2.2 Coarse Alignment

Let Ω denote the set of points in the cloud. Its covariance matrix can then be obtained as:

Q = \sum_{x \in \Omega} (x - \bar{x}) \otimes (x - \bar{x})    (1)

where ⊗ denotes the outer product operator, Q is a 3×3 positive semi-definite matrix and x̄ is the mean of all the points. If λ_1 ≥ λ_2 ≥ λ_3 denote the eigenvalues of Q with unit eigenvectors v_1, v_2, v_3, respectively, we take the orthogonal vectors v_1, v_2, v_3 as the three main axes of the point cloud.

Then we rotate the point cloud so that v_1, v_2, v_3 are parallel to the Y-, X- and Z-axes of the reference coordinate system, respectively. Finally, the point cloud is translated so that its nose tip overlaps the origin of the reference coordinate system. Thus, all the point clouds are coarsely registered together according to the above transformation.
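As an illustration (not the paper's original implementation), the coarse alignment step can be sketched in NumPy as follows. The eigenvector sign convention and the detected nose tip are treated as given; a real implementation would also resolve the sign ambiguity of the eigenvectors so that the face consistently points along +Z.

```python
import numpy as np

def coarse_align(points, nose_tip):
    """Coarse registration sketch: align the principal axes of a facial point
    cloud with the Y-, X- and Z-axes and move the nose tip to the origin.
    points: (N, 3) array; nose_tip: (3,) array (already detected)."""
    centered = points - points.mean(axis=0)
    Q = centered.T @ centered                      # covariance matrix, Eq. (1)
    eigvals, eigvecs = np.linalg.eigh(Q)           # ascending eigenvalues
    v1, v2, v3 = eigvecs[:, ::-1].T                # v1, v2, v3 by decreasing eigenvalue
    # rotation sending v1 -> Y, v2 -> X, v3 -> Z (sign ambiguity not resolved here)
    R = np.stack([v2, v1, v3])                     # rows give the new X, Y, Z axes
    rotated = points @ R.T
    return rotated - R @ nose_tip                  # nose tip moved to the origin
```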

Fig. 1. Registration results. (a)(b) The original point clouds to register. (c) The result of coarse registration. (d) The final registration result. For clear display, the first model is shown in shading mode in (c,d). Top row: point clouds with different poses; middle row: point clouds with angry and laughing expressions; bottom row: point clouds of different persons with the same pose and expression.


2.3 Fine Alignment

This process aims to register two point clouds finely. It can be formulated as follows: given two meshes, the individual mesh P and the ground mesh Q, find the best rigid transformation T so that the error function d = \Gamma(T(P), Q) is minimized, where \Gamma is the distance metric.

Here we explore the classic algorithm of the Iterative Closest Point (ICP) [14]. ICP is an efficient, accurate and reliable method for the registration of free form surfaces, and it converges monotonically to a local minimum. At each iteration, the algorithm computes correspondences by finding the closest triangle for each vertex, and then minimizes the mean square error between the correspondences.

The facial point cloud contains expression variations; strictly speaking, the registration between different scans is a non-rigid transformation. During registration, we therefore only consider the points above the nose tip, so that the result avoids the unwanted influence of the mouth and jaw, which are the regions most prone to expression changes. Fig. 1 shows some results.
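A minimal point-to-point ICP sketch is given below, assuming SciPy is available. It differs from the variant used in the paper (which matches each vertex to its closest triangle): it matches closest points and estimates each rigid update with the SVD-based (Kabsch) solution. The usage comment assumes that, after coarse alignment, Y is the vertical axis and the nose tip sits at the origin.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(src, dst, iters=30, tol=1e-6):
    """Point-to-point ICP sketch: register src (N,3) onto dst (M,3)."""
    tree = cKDTree(dst)
    R_total, t_total = np.eye(3), np.zeros(3)
    cur, prev_err = src.copy(), np.inf
    for _ in range(iters):
        dists, idx = tree.query(cur)                     # closest-point matches
        matched = dst[idx]
        mu_s, mu_d = cur.mean(0), matched.mean(0)
        H = (cur - mu_s).T @ (matched - mu_d)            # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T                               # best rotation (Kabsch)
        t = mu_d - R @ mu_s
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
        err = np.mean(dists ** 2)
        if abs(prev_err - err) < tol:                    # convergence check
            break
        prev_err = err
    return R_total, t_total

# usage sketch: register only the expression-stable region above the nose tip
# rigid = probe_points[probe_points[:, 1] > 0.0]        # assumes Y is "up"
# R, t = icp(rigid, reference_points)
```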

2.4 Meshing from Point Clouds

Beginning with a simple basic mesh (see Fig.2), a regular and dense mesh model is generated to fit the 3D scattered point cloud. We develop a universal fitting algorithm [15] for regulating the hierarchical meshes to conform to the 3D points.

Fig. 2 shows the mesh after regulation at different refinement levels. Of course, the denser the mesh, the better the face is represented, at the cost of more time and space. In this paper, we use a mesh refined four times (545 nodes and 1024 triangles) to balance the resolution of the facial mesh against the cost in time and space.

To reduce the influence of noise and decrease the computational cost, the points in the margin are ignored. Also, during meshing, we only regulate the Z coordinate of each mesh node, which not only speeds up the meshing process but also keeps the correspondences of the generated meshes.
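The adaptive fitting algorithm of [15] is not reproduced here; the sketch below only illustrates the idea of regulating the Z coordinate of each node of a fixed (x, y) grid from the registered cloud, using a simple local median in place of the hierarchical refinement. The search radius is a hypothetical parameter of this sketch.

```python
import numpy as np
from scipy.spatial import cKDTree

def regulate_mesh_z(node_xy, cloud, radius=3.0):
    """Set the Z coordinate of each mesh node from nearby cloud points while
    keeping its (x, y) position fixed, so node correspondences are preserved."""
    tree = cKDTree(cloud[:, :2])                 # search only in the XY plane
    z = np.empty(len(node_xy))
    for i, xy in enumerate(node_xy):
        idx = tree.query_ball_point(xy, r=radius)
        if idx:                                  # robust local depth estimate
            z[i] = np.median(cloud[idx, 2])
        else:                                    # hole: fall back to nearest point
            _, j = tree.query(xy)
            z[i] = cloud[j, 2]
    return np.column_stack([node_xy, z])
```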

3 Feature Extraction

So far, all the point clouds have been registered together, and each is described by a regular mesh whose nodes are in correspondence across faces, which facilitates feature extraction. We extract shape variation information (GH-SVI) to characterize an individual.


Fig. 2. The regulated meshes at different levels. (a) Basic mesh. (b)-(e) Levels one to four.


To transfer the 3D shape into a 1-D vector, we first define a metric that describes the local shape of one mesh node, as shown in Fig. 3. Since the mesh closely approximates the point cloud, we can obtain the following local information for each mesh node p_e: its directly neighboring triangles T_1, T_2, ..., T_n, its normal N_{p_e}, and the neighboring points of the point cloud within a small sphere. Due to the regularity of our mesh, the number of neighboring triangles of a common node (not an edge node) is always six. The normal N_{p_e} is estimated from the neighboring triangles. The radius of the sphere that determines the neighboring points is set to half the length of one mesh edge in our work.

Further, the neighboring points are classified into n categories C_1, C_2, ..., C_n. The class a point belongs to is determined by the neighboring triangle into which the point projects along the direction of the normal N_{p_e}. For each class C_k, we define its surface signal as follows:

d_{ek} = \frac{1}{2} + \frac{1}{2m} \sum_{i=1}^{m} \cos(q_{ki} - p_e, N_{p_e})    (2)

with

\cos(q_{ki} - p_e, N_{p_e}) = \frac{(q_{ki} - p_e) \cdot N_{p_e}}{\| q_{ki} - p_e \| \, \| N_{p_e} \|}    (3)

where q_{ki} is a neighboring point belonging to class C_k, m is the number of points in C_k, and d_{ek} \in [0, 1].

Then we can describe the local shape of each mesh node using the following vector:

s_e = (d_{e1}, d_{e2}, \ldots, d_{en})    (4)

where d_{ek} is the surface signal of class C_k. This vector describes the shape near the vertex. Note that if a class C_k contains no points, its surface signal is replaced with the mean value of the neighboring classes. If a mesh node lies in a hole of the point cloud and thus has no neighboring points within the preset sphere, we enlarge the radius until neighboring points are included. In addition, the ordering and starting position of the shape vectors s_e should be identical for all nodes.
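A sketch of the per-node computation is shown below. It follows Eqs. (2)-(4) but, for brevity, assigns a neighboring point to a class from the azimuth of its tangent-plane offset rather than by testing which of the six neighboring triangles its projection falls into; the class boundaries therefore only approximate the paper's construction.

```python
import numpy as np

def node_shape_vector(p_e, normal, neighbors, n_classes=6):
    """Surface-signal vector s_e = (d_e1, ..., d_en) for one mesh node.
    p_e: (3,) node position; normal: (3,) node normal N_pe;
    neighbors: (m, 3) cloud points inside the node's sphere."""
    n = normal / np.linalg.norm(normal)
    # tangent-plane basis (u, v) used to bin points into angular classes
    a = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(n, a); u /= np.linalg.norm(u)
    v = np.cross(n, u)
    d = neighbors - p_e
    cos_ang = d @ n / (np.linalg.norm(d, axis=1) + 1e-12)          # Eq. (3)
    azim = np.arctan2(d @ v, d @ u)
    cls = ((azim + np.pi) / (2.0 * np.pi) * n_classes).astype(int) % n_classes
    s_e = np.full(n_classes, np.nan)
    for k in range(n_classes):
        sel = cls == k
        if sel.any():
            s_e[k] = 0.5 + 0.5 * cos_ang[sel].mean()               # Eq. (2)
    s_e[np.isnan(s_e)] = np.nanmean(s_e)    # empty classes take a mean value
    return s_e
```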

According to this metric, we describe the shape of each row in the mesh by combining the shape vectors of all nodes in that row:

S_i = (s_{i1}, s_{i2}, \ldots, s_{ir})    (5)

where S_i is the shape vector of the i-th row and s_{ij} is the shape vector of the j-th vertex in the i-th row. Further, from S_1 to S_n, we connect the row vectors in turn, alternating head-to-tail, to form one long shape vector S. This 1-D vector S is used to describe the shape of one mesh.
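One reading of the "head and tail" connection is boustrophedon order, i.e. every other row vector is reversed before concatenation so that consecutive rows join at neighboring nodes; the small sketch below assumes that interpretation.

```python
import numpy as np

def mesh_shape_signal(row_vectors):
    """Concatenate the per-row shape vectors S_1..S_n into the 1-D signal S,
    reversing every other row so consecutive rows connect head to tail."""
    parts = [S_i if i % 2 == 0 else S_i[::-1] for i, S_i in enumerate(row_vectors)]
    return np.concatenate(parts)
```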

It is well known that moments have been widely used in pattern recognition and image processing, especially in various shape-based applications. Here, Gaussian-Hermite moments are used for feature representation due to their mathematical orthogonality and effectiveness for characterizing local details of a signal [12]. They provide an effective way to quantify signal variation. The n-th order 1-D G-H moment M_n(x, S(x)) of a signal S(x) is defined as:

M_n(x) = \int_{-\infty}^{\infty} B_n(t) \, S(x + t) \, dt, \quad n = 0, 1, 2, \ldots    (6)

with

B_n(t) = g(t, \sigma) \, H_n(t/\sigma),
H_n(t) = (-1)^n \exp(t^2) \frac{d^n}{dt^n} \exp(-t^2),
g(t, \sigma) = (2\pi\sigma^2)^{-1/2} \exp\!\left(-\frac{t^2}{2\sigma^2}\right)    (7)

where g(t, σ) is a Gaussian function and H_n(t) is a scaled Hermite polynomial of order n. G-H moments have many excellent properties, in particular being insensitive to the noise generated by differential operations.
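Eqs. (6)-(7) can be evaluated numerically as in the sketch below. The kernel half-width (about four standard deviations) is an assumption of this sketch; NumPy's `hermval` implements the physicists' Hermite polynomials of Eq. (7), and the kernel is reversed because Eq. (6) is a correlation rather than a convolution.

```python
import numpy as np
from numpy.polynomial.hermite import hermval

def gh_kernel(order, sigma):
    """Discretised B_n(t) = g(t, sigma) * H_n(t / sigma) from Eq. (7)."""
    half = int(np.ceil(4 * sigma))                     # truncate at ~4 sigma
    t = np.arange(-half, half + 1, dtype=float)
    g = np.exp(-t ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
    coeffs = np.zeros(order + 1); coeffs[order] = 1.0
    return g * hermval(t / sigma, coeffs)              # physicists' H_n

def gh_moments(signal, order, sigma=2.0):
    """n-th order 1-D G-H moments M_n(x) of a signal at every x (Eq. 6).
    Applied to the long shape vector S this yields the GH-SVI feature."""
    return np.convolve(signal, gh_kernel(order, sigma)[::-1], mode='same')
```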

Fig. 3. Shape representation of one mesh node.

Fig. 4. Gaussian-Hermite moments. Top row: the spatial responses of the Gaussian-Hermite moments of order 0 to 4; bottom row: the corresponding G-H moments of a segment of the 1-D shape signal.


The parameter σ and the order of the G-H moments need to be determined through experiments. Here we use 0th to 4th order G-H moments with σ = 2.0 to analyze the shape variation. The top row of Fig. 4 shows the spatial responses of the Gaussian-Hermite moments from order 0 to 4.

For the shape vector S, we calculate its n-th order G-H moments, obtaining 1-D moment vectors SM_n, which we call the n-th order G-H shape variation information (GH-SVI). They describe the shape variation of the facial surface. The bottom row of Fig. 4 shows a segment of SM_n for different orders of G-H moments.

4 Experiments

To test the proposed algorithm, we evaluate it on different databases. All the experiments are executed on a PC with a Pentium IV 1.3 GHz processor, 128 MB RAM and an Nvidia GeForce2 MX 100/200 graphics card.

4.1 Database

Unlike the situation for 2D face images, there is no common 3D face database of a feasible size for testing recognition algorithms. Here, we use two different databases to test our algorithm. The first database (3D Pose and Expression Face Database, 3DPEF) was collected in our lab using a Minolta VIVID 900 working in Fast Mode. This data set contains 30 persons (6 women), and each person is captured with five poses (normal, left, right, up, down) and five expressions (smile, laugh, sad, surprise, eyes closed). Instances of two people are shown in Fig. 5.

The second 3D face database is 3D_RMA [6,7], likely the biggest database in the public domain, where each face is described with a 3D scattered point cloud, obtained by the technique of structured light. The database includes 120 persons and two sessions. In each session, each person is sampled with three shots. From these sessions, two databases are built: Automatic DB (120 persons) and Manual DB (30 persons). The quality of Manual DB is better than that of Automatic DB.

Fig. 5. Instances of two persons in 3DPEF. The variations from left to right are: normal, left, right, up, down, smile, laugh, sad, surprise, eyes closed.


4.2 Selection of Shape Variation Features

Different orders of G-H moments capture different shape variation information and thus differ in their ability to characterize an individual. In this experiment, we determine an appropriate GH-SVI order to obtain the best recognition performance.

We evaluate the correct classification rate (CCR) on the 3D_RMA database for different GH-SVI orders, as shown in Table 1. Considering that the number of instances per person is small, we use the leave-one-out scheme: in each test, one sample is used for testing and all the others for training. The nearest neighbor method is adopted as the classifier. 'Session1-2' means that the samples of session1 and session2 are blended together.

Table 1. CCR(%) in different sets in Manual DB of 3D_RMA using different order GH-SVI

Data sets                0th     1st     2nd     3rd     4th
Manual DB, session1      84.4    74.4    86.7    65.5    7.8
Manual DB, session2      75.6    66.7    76.7    64.4    4.4
Manual DB, session1-2    90.0    90.0    93.3    86.1    7.2

From Table 1, we find that recognition using 0th to 2nd order GH-SVI gives a high CCR, with the 2nd order the highest (86.7%, 76.7% and 93.3%). As the order increases further, the CCR drops sharply, and the 4th order GH-SVI has a very low CCR (7.8%, 4.4%, 7.2%). On the Automatic DB of 3D_RMA we reach the same conclusion. Intuitively, the 0th order moment is similar to smoothing the original signal and in fact does not describe shape variation. The face surface is smooth on the whole, whereas high order moments usually describe intense variation, so it is not necessary to calculate the higher order moments. In the following experiments, we use the 2nd order G-H moments to represent the shape variation information.

4.3 Recognition Performance Evaluation

Identification accuracy is evaluated on the different sets. Table 2 summarizes the Correct Classification Rate (CCR) using the GH-SVI features. For 3DPEF, we use three samples ('normal', 'right' and 'left') as the gallery set and the other samples as the test set. For Manual DB and Automatic DB in 3D_RMA, we use the samples of session1 (three instances per person) as the gallery set and the samples of session2 as the test set, which respects the requirement of a time interval between gallery and test data. Similarity is computed with the Euclidean distance, and the nearest neighbor (NN) rule is then applied for classification.
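The classification step reduces to a 1-NN rule over GH-SVI feature vectors; a small sketch, with hypothetical array names, is given below.

```python
import numpy as np

def nn_ccr(gallery, gallery_ids, probes, probe_ids):
    """Correct classification rate of a nearest-neighbor classifier using the
    Euclidean distance between GH-SVI feature vectors (one row per sample)."""
    correct = 0
    for feat, true_id in zip(probes, probe_ids):
        d = np.linalg.norm(gallery - feat, axis=1)     # distances to the gallery
        if gallery_ids[int(np.argmin(d))] == true_id:
            correct += 1
    return 100.0 * correct / len(probes)
```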

From an overall view, we can draw the following conclusions. Shape variation contains important information to characterize an individual, and our proposed method for extracting shape variation features is very effective. The highest recognition rates we obtain are 93.3% (30 persons) and 73.4% (120 persons); although the testing database is not large, these results are obtained in a fully automatic way, which is fairly encouraging.


Table 2. CCR(%) in different databases

Databases                                2nd GH-SVI
3DPEF (30 persons)                       82.4
Manual DB in 3D_RMA (30 persons)         93.3
Automatic DB in 3D_RMA (120 persons)     73.4

Noise and the size of the test database strongly affect the CCR. Automatic DB has more people and contains much noise, and its recognition performance is distinctly worse than that of Manual DB.

4.4 Comparisons with Existing Methods

Surface curvature is a classic property used in 3D face recognition [2,3,4]. The point signature is another important technique for representing free-form surfaces, and it has obtained good results in 3D face recognition [5] on a small database (6 persons). Here, we compare these two representations with our proposed GH-SVI features in terms of CCR. Unfortunately, we could not obtain the source code and databases used in those publications, so we make the comparison on our current databases and with our own implementations.

We calculate the point signature of each mesh node and use them to characterize an individual; twelve signed distances are calculated to describe the point signature of each node, and the detailed algorithm can be found in [16]. The curvature of each node is estimated numerically from the change of surface normals [17]. Among the several possible curvature representations, we use the mean curvature as the feature. Table 3 shows the CCR obtained using point signatures (PS), surface curvature (SC) and shape variation information (GH-SVI, using 2nd order G-H moments) on the Manual DB of 3D_RMA. In this test, we still use the leave-one-out scheme.

From this table, we can see that GH-SVI outperforms PS and SC on the whole. Comparing with Table 1, we find that the CCRs obtained from point signatures are similar to the results obtained from GH-SVI using 0th order G-H moments.

Table 3. CCR (%) using point signature (PS), surface curvature (SC) and shape variation information (GH-SVI) in Manual DB of 3D_RMA.

Algorithm    Session1    Session2    Session1-2
PS           83.3        76.7        88.3
SC           71.1        75.5        76.1
GH-SVI       86.7        76.7        93.3

5 Conclusions

Recently, personal identification based on 3D information has been gaining more and more interest. In this paper, we have proposed a new method for 3D face recognition. Based on the generated regular meshes, GH-SVI features are extracted to characterize an individual. We have tested the proposed algorithm on 3DPEF and 3D_RMA, and the encouraging results show the effectiveness of the proposed method for 3D face recognition. Compared with previous work, our algorithm demonstrates outstanding performance.

Acknowledgements

This work is supported by research funds from the Natural Science Foundation of China (Grant No. 60121302 and 60332010) and the Outstanding Overseas Chinese Scholars Fund of CAS (No.2001-2-8).

References

[1] W. Zhao, R. Chellappa, P.J. Phillips, and A. Rosenfeld, “Face Recognition: A Literature Survey”, ACM Computing Surveys (CSUR) archive, Vol.35, No.4, pp.399-458, 2003.

[2] J.C. Lee, and E. Milios, “Matching Range Images of Human Faces”, Proc. ICCV'90, pp.722-726, 1990.

[3] G.G. Gordon, “Face Recognition Based on Depth and Curvature Features”, Proc. CVPR'92, pp.108-110, 1992.

[4] Y. Yacoob and L.S. Davis, “Labeling of Human Face Components from Range Data”, CVGIP: Image Understanding, 60(2):168-178, 1994.

[5] C.S. Chua, F. Han, and Y.K. Ho, “3D Human Face Recognition Using Point Signature”, Proc. FG'00, pp.233-239, 2000.

[6] C. Beumier and M. Acheroy, “Automatic Face Authentication from 3D Surface”, Proc. BMVC'98, pp.449-458, 1998.

[7] C. Beumier and M. Acheroy, “Automatic 3D Face Authentication”, Image and Vision Computing, 18(4):315-321, 2000.

[8] H.T. Tanaka, M. Ikeda and H. Chiaki, “Curvature-based Face Surface Recognition Using Spherical Correlation”, Proc. FG'98, pp.372-377, 1998.

[9] C. Hesher, A. Srivastava, and G. Erlebacher, “A Novel Technique for Face Recognition Using Range Imaging”, Inter. Multiconference in Computer Science, 2002.

[10] Y. Lee, K. Park, J. Shim, and T. Yi, “3D Face Recognition Using Statistical Multiple Features for the Local Depth Information”, Proc. 16th Inter. Conf. on Vision Interface, 2003.

[11] S. Liao, M. Pawlak, “On Image Analysis by Moments”, IEEE Trans. on PAMI, Vol.18, No.3, pp.254-266, 1996.

[12] J. Shen, W. Shen and D. Shen, “On Geometric and Orthogonal Moments”, Inter. Journal of Pattern Recognition and Artificial Intelligence, Vol.14, No.7, pp.875-894, 2000.

[13] C. Xu, Y. Wang, T. Tan, and L. Quan, "A Robust Method for Detecting Nose on 3D Point Cloud", Proc. ICIP'04, 2004 (to appear).

[14] P.J. Besl, and N.D. Mckay, “A Method for Registration of 3-D shapes”, IEEE Trans. PAMI, Vol.14, No.2, pp.239-256, 1992.

[15] C. Xu, L. Quan, Y. Wang, T. Tan, M. Lhuillier, “Adaptive Multi-resolution Fitting and its Application to Realistic Head Modeling”, IEEE Geometric Modeling and Processing, pp.345-348, 2004.

[16] C.S. Chua and R. Jarvis, “Point Signatures: A New Representation for 3-D Object Recognition”, IJCV, Vol.25, No.1, pp.63-85, 1997.

[17] R.L. Hoffman, and A.K. Jain, “Segmentation and Classification of Range Images”, IEEE Trans. on PAMI, 9(5):608-620, 1987.