
Face Recognition

Abstract

Face recognition involves identifying or verifying a person from a digital image or video frame and is still one of the most challenging tasks in computer vision today. The conventional face recognition pipeline consists of face detection, face alignment, feature extraction, and classification. This page further explains three exemplary state-of-the-art architectures: DeepID3 (6), FaceNet (9), and Sparse ConvNet (11).

1 Introduction
2 Overview
3 Notable networks
3.1 DeepID3
3.2 FaceNet
3.3 Sparse ConvNet
4 Literature
5 Weblinks

Introduction

The task of face recognition involves identifying or verifying a person from a digital image or video frame. Computer applications capable of performing this task, known as facial recognition systems, have been around for decades. The general idea of face recognition is to identify facial features by extracting facial landmarks and then compare them to other images by matching those features.

However, face recognition is still one of the most relevant and challenging research areas in computer vision and pattern recognition due to variations in facial expressions, poses, and illumination. (1)

 

Overview

The conventional face recognition pipeline consists of four stages: face detection, face alignment (or preprocessing), feature extraction (or face representation), and classification, as illustrated in figure 1.

A milestone in the face detection area was the contribution by Viola & Jones in 2001 (2), which provided an object detection framework that operated in real time and was well suited for human faces. The remaining multi-view face detection problem was first tackled by Farfade, Saberian, & Li in 2015 (3) by using deep CNNs instead of a cascade-based approach as in Viola & Jones. Current state-of-the-art approaches use region-based CNNs to enable faster and more reliable detection (4).
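To make the classical cascade-based approach concrete, the following is a minimal sketch using OpenCV's bundled Viola-Jones-style Haar cascade. The file names and detector parameters are illustrative assumptions, not values taken from (2).

```python
# Minimal Viola-Jones-style face detection sketch using OpenCV's bundled
# Haar cascade. Input file name and parameter values are illustrative.
import cv2

# Load the pre-trained frontal-face cascade shipped with OpenCV.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

image = cv2.imread("group_photo.jpg")            # hypothetical input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # cascades operate on grayscale

# detectMultiScale scans the image at several scales and returns
# bounding boxes (x, y, width, height) for detected faces.
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
```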

To simplify the extraction part, proper alignment is crucial. If facial points can be identified correctly, features can be matched in a region around them. Recently, CNN-based architectures have shown success in this area (5).
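To illustrate what alignment based on facial points can look like, the sketch below rotates a face crop so that the line between two eye landmarks becomes horizontal. The landmark coordinates and the simple rotation used here are assumptions for illustration, not the method of (5).

```python
# Minimal alignment sketch: rotate a face crop so the eyes become horizontal.
# A landmark detector (e.g. a CNN-based one as in (5)) is assumed to supply
# the eye coordinates; the similarity transform here is a simplification.
import cv2
import numpy as np

def align_by_eyes(face_img, left_eye, right_eye):
    """Rotate the image around the midpoint between the two eye landmarks."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))             # tilt of the eye line
    center = ((left_eye[0] + right_eye[0]) / 2.0,
              (left_eye[1] + right_eye[1]) / 2.0)
    rot = cv2.getRotationMatrix2D(center, angle, 1.0)  # 2x3 rotation matrix
    h, w = face_img.shape[:2]
    return cv2.warpAffine(face_img, rot, (w, h))
```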

The feature extraction part is often considered the most challenging and important of all, since any matching algorithm is limited by the quality of the underlying features. 

 

Figure 1: Stages of the face recognition pipeline. Source: own illustration.


Notable networks

There is a variety of successful architectures. This section focuses on three different models and explains their idiosyncrasies. Evaluations of face recognition approaches are almost always performed on the Labeled Faces in the Wild (LFW) data set (12), with face verification accuracy as the most common metric. In the verification task, given a pair of face images, the goal is to determine whether or not they come from a single subject.
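In code, this verification decision reduces to thresholding a distance between two face representations. A minimal sketch, assuming some feature extractor has already produced the representations and using a hypothetical threshold:

```python
# Sketch of LFW-style face verification: a pair is declared "same person"
# when the distance between the two face representations falls below a
# threshold. The feature extractor and threshold value are placeholders.
import numpy as np

def verification_accuracy(embeddings_a, embeddings_b, same_labels, threshold):
    """embeddings_*: (N, D) arrays of representations for the two images in
    each pair; same_labels: (N,) booleans, True if the pair shows one subject."""
    distances = np.linalg.norm(embeddings_a - embeddings_b, axis=1)
    predictions = distances < threshold          # below threshold -> same identity
    return np.mean(predictions == same_labels)   # fraction of correctly judged pairs
```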

 

DeepID3

DeepID3 is the third generation of the DeepID architecture. The first generation was one of the first publications to propose learning discriminative deep face representations (DFR) through large-scale face identity classification. The second generation proposed learning DFR by joint face identification-verification, which finally brought the networks up to human performance.

In this third approach (shown in figure 2), Sun et al. (6) used insights from the most successful architectures of the ImageNet challenge in 2014: the inception layers of GoogLeNet (7) and the stacked convolutions of VGG (8). They also added joint identification-verification supervisory signals to multiple layers to further reduce the intra-personal variance of the representation. The publication shows that very deep neural networks achieve state-of-the-art performance on face recognition tasks and slightly outperform their shallower counterparts. By exposing the architectures to large-scale training data, another increase in effectiveness is expected.
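To illustrate what joint identification-verification supervision means in practice, here is a hedged PyTorch sketch in the spirit of the DeepID line of work: a softmax identification loss over the training identities combined with a contrastive verification loss on pairs of features. The margin and loss weighting are illustrative assumptions, not the values used in (6).

```python
# Sketch of a joint identification-verification objective (DeepID-style).
# Margin and verification weight are illustrative assumptions.
import torch
import torch.nn.functional as F

def joint_loss(feat1, feat2, logits1, logits2, id1, id2,
               margin=1.0, verif_weight=0.05):
    # Identification: classify each face into one of the training identities.
    ident = F.cross_entropy(logits1, id1) + F.cross_entropy(logits2, id2)

    # Verification: pull features of the same identity together and push
    # different identities apart by at least `margin` (contrastive loss).
    same = (id1 == id2).float()
    dist = F.pairwise_distance(feat1, feat2)
    verif = same * dist.pow(2) + (1 - same) * F.relu(margin - dist).pow(2)

    return ident + verif_weight * verif.mean()
```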

 

Figure 2: Layers of the DeepID3 network. Source: (6)

FaceNet

The FaceNet publication by Google researchers (9) introduced a novelty to the field by directly learning a mapping from face images to a compact Euclidean space. The distances between representation vectors are a direct measure of their similarity, with 0.0 corresponding to two identical pictures and 4.0 marking the opposite end of the spectrum. The representation also significantly reduces the image complexity to only 128 bytes per face. This generalized embedding differs significantly from other approaches, which are trained over a set of known faces and then generalized via an intermediate bottleneck layer. Figure 3 shows exemplary scores for pairs of test images.
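The 0.0 to 4.0 range follows from comparing L2-normalized 128-dimensional embeddings with the squared Euclidean distance. A minimal sketch of this measure, treating the embedding network itself as a black box, could look like this:

```python
# Sketch of the FaceNet-style similarity measure: embeddings are L2-normalized,
# so the squared Euclidean distance between two faces lies in [0.0, 4.0]
# (0.0 = identical direction, 4.0 = exactly opposite). The embedding network
# producing the raw vectors is assumed and not shown here.
import numpy as np

def embedding_distance(emb_a, emb_b):
    """emb_*: raw 128-D embedding vectors from the (assumed) network."""
    a = emb_a / np.linalg.norm(emb_a)      # L2-normalize onto the unit sphere
    b = emb_b / np.linalg.norm(emb_b)
    return np.sum((a - b) ** 2)            # squared Euclidean distance in [0, 4]

# A threshold such as 1.1 (see figure 3) would then separate "same" from
# "different": same_person = embedding_distance(emb_a, emb_b) < 1.1
```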

Figure 4: Model structure. This network consists of a batch input layer and a deep CNN followed by L2 normalization, which results in the face embedding. This is followed by the triplet loss during training. Source: (9)

 

 

 


The architecture is a combination of the multiple interleaved convolutional layers of Zeiler & Fergus (10) and the inception model of GoogLeNet (7). These models are interwoven into a deep architecture, which is symbolized as a black box in figure 4. The most important part of the approach lies in the end-to-end learning of the whole system. As a loss function, the triplet loss was used, which is explained and shown in figure 5.

At the time of publication, FaceNet set a new record accuracy of 99.63% on the LFW dataset (12). The drawback of this model is its demand for a large training data set (200 million training samples in this case).

 

Figure 3: Pose and illumination invariance. Illumination and pose have been a long-standing problem in face recognition. This figure shows the output distances of FaceNet between pairs of images across pose and illumination combinations. A distance of 0.0 means the faces are identical; 4.0 corresponds to the opposite end of the spectrum, two different identities. A threshold of 1.1 would classify every pair correctly. Source: (9)

Figure 5: The triplet loss minimizes the distance between an anchor and a positive, both of which have the same identity, and maximizes the distance between the anchor and a negative of a different identity. Source: (9)
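To make figure 5 concrete, here is a minimal PyTorch sketch of a triplet loss of this form; the margin value and the batch shapes are assumptions for illustration.

```python
# Sketch of the triplet loss from figure 5: for an anchor a, a positive p
# (same identity) and a negative n (different identity), the loss enforces
# that a is closer to p than to n by at least a margin alpha.
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """anchor/positive/negative: (N, 128) batches of L2-normalized embeddings."""
    pos_dist = (anchor - positive).pow(2).sum(dim=1)   # squared distance to positive
    neg_dist = (anchor - negative).pow(2).sum(dim=1)   # squared distance to negative
    return F.relu(pos_dist - neg_dist + alpha).mean()  # hinge at the margin
```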

Sparse ConvNet

In this more recent publication (11), Sun et al. tried to further improve on their DeepID3 results (6) by taking a trained, dense CNN, sparsifying its connections, and training it even further. This approach increases the baseline performance of DeepID3 from 98.95% to 99.30%, which corresponds to an error rate reduction of 33%. It is important to note that even though it did not achieve better performance than FaceNet (9), it only required 300,000 training samples and can thereby be considered more efficient.
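The general sparsify-then-retrain idea can be sketched as follows. Note that this uses simple magnitude pruning as a stand-in for brevity, whereas (11) selects connections based on correlations between neural activations; the keep ratio is also an assumption.

```python
# Simplified sketch of sparsifying a trained layer's connections before
# fine-tuning. Magnitude pruning is used here as a stand-in; (11) instead
# selects connections via correlations between neural activations.
import torch

def sparsify_layer(weight: torch.Tensor, keep_ratio: float = 0.25) -> torch.Tensor:
    """Return a 0/1 mask that keeps only the largest-magnitude weights."""
    k = max(1, int(weight.numel() * keep_ratio))
    # Threshold = magnitude of the k-th largest weight.
    threshold = weight.abs().flatten().topk(k).values.min()
    mask = (weight.abs() >= threshold).float()
    return mask  # apply as weight * mask during further training to stay sparse
```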

 

Literature

1) Kasar, M. M., Bhattacharyya, D., & Kim, T. H. (2016). Face Recognition Using Neural Network: A Review. International Journal of Security and Its Applications, 10(3), 81-100.

2) Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001) (Vol. 1, pp. I-I). IEEE.

3) Farfade, S. S., Saberian, M. J., & Li, L. J. (2015, June). Multi-view Face Detection Using Deep Convolutional Neural Networks. In Proceedings of the 5th ACM International Conference on Multimedia Retrieval (pp. 643-650). ACM.

4) Jiang, H., & Learned-Miller, E. (2016). Face detection with the Faster R-CNN. arXiv preprint.

5) Sun, Y., Wang, X., & Tang, X. (2013). Deep convolutional network cascade for facial point detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3476-3483).

6) Sun, Y., Liang, D., Wang, X., & Tang, X. (2015). DeepID3: Face recognition with very deep neural networks. arXiv preprint.

7) Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1-9).


8) Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint.

9) Schroff, F., Kalenichenko, D., & Philbin, J. (2015). FaceNet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 815-823).

10) Zeiler, M. D., & Fergus, R. (2014, September). Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818-833). Springer International Publishing.

11) Sun, Y., Wang, X., & Tang, X. (2016). Sparsifying neural network connections for face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4856-4864).

Weblinks

12) Labeled Faces in the Wild dataset