Lecture 10, Ming Yang: Face Recognition Systems
TRANSCRIPT
• Security and access control
– ATM, buildings, airports/border control, smartphones
• Law enforcement and criminal justice
– Mug-shot systems, post-event analysis, forensics, missing children
• General identity verification (smart cards)
– Driver licenses, passports, voter registration, welfare fraud
• Advanced video surveillance (CCTV control)
• Entertainment and multimedia HCI
– Smart TV, context-aware systems, VIP customers
– Video indexing and celebrity search
– Refining search-engine results
– Photo tagging in social media
• Face verification (1:1)
• Face identification (1:N)
• Face search/retrieval
• Still image
• Video sequence
• 3D/depth sensing or infrared imagery
• Facial key-point / landmark detection
– Face editing, automatic face beautification of selfie photos
• Face pose estimation
– Amazon Fire Phone, rendering 3D scenes w.r.t. the viewer’s head pose
• Face/eye/gaze tracking
– Driver fatigue detection, HCI, sports video analysis
• Facial attribute recognition
– Gender/age/ethnicity estimation, business intelligence
• Facial expression recognition
– Sentiment/emotion analysis, perception of advertising
• Face liveness detection
• Face hallucination or super-resolution
• Face synthesis (3D morphable models)
• Faces belong to a single category
– Subtle intrapersonal and interpersonal variations
• Intrinsic factors
– Aging, facial expressions, hair styles, accessories, makeup, etc.
• Extrinsic factors
– Illumination, pose, resolution, scaling, noise, occlusion, etc.
Boston Marathon bombing suspects in surveillance footage, 4/15/2013
property      constrained      unconstrained
resolution    ~2000×2000       ~50×50
viewpoint     fully frontal    rotated, loose
illumination  controlled       arbitrary
occlusion     disallowed       allowed
• W.W. Bledsoe, “Man-machine facial recognition”, Tech. Report PRI 22, Panoramic Research Inc., 1966
• T. Kanade, “Picture processing system by computer complex and recognition of human faces”, Doctoral Dissertation, Kyoto University, 1973
• Almost all successful algorithms in pattern recognition / machine learning / computer vision have been applied to face recognition!
• Feature-based structural matching approaches
– Kanade, 1973: 16 geometrical facial parameters
– Brunelli and Poggio, 1993: 35 geometric features
– Wiskott et al., 1997, Elastic bunch graph matching (EBGM)
– …
– Chen et al., Blessing of dimensionality, High-dim LBP, CVPR 2013
• Appearance-based holistic approaches
– Turk and Pentland, 1991, Eigenfaces (PCA)
– Belhumeur et al., 1997, Fisherfaces (LDA)
– He and Yan et al., 2003, Laplacianfaces (LPP)
– Wright, et al., 2009, Sparse representation (SRC)
– …
– Taigman and Yang, et al., DeepFace, CVPR 2014
• Hybrid approaches
Face: key-points
Face recognition by elastic bunch graph matching, Wiskott, et al., TPAMI 1997
Face recognition: features versus templates, Brunelli and Poggio, TPAMI 1993
Feature based face recognition using mixture-distance, Cox, et al., CVPR 1996
35 geometric features
Face: A graph of key-points with a bunch of Gabor features
• LBP: 59-bin uniform LBP histogram for the (8,1) circular neighborhood
Face recognition with local binary patterns, Ahonen, et al., ECCV 2004
• Face: spatially enhanced histogram of LBP descriptors
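The uniform-LBP descriptor above can be sketched in NumPy; this minimal version computes the 59-bin histogram for one image region, approximating the (8,1) circular neighborhood by the 3×3 ring (the function name and per-histogram normalization are my choices):

```python
import numpy as np

def uniform_lbp_histogram(img):
    """59-bin uniform LBP histogram for the (8,1) neighborhood.

    A pattern is 'uniform' if its circular bit string has at most two
    0/1 transitions; the 58 uniform patterns get their own bins and
    all non-uniform patterns share one extra bin (58 + 1 = 59).
    """
    img = np.asarray(img, dtype=np.int32)
    c = img[1:-1, 1:-1]
    # 8 neighbors, approximating the radius-1 circle by the 3x3 ring
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        code += (nb >= c).astype(np.int32) << bit
    # map each of the 256 raw codes to one of the 59 bins
    lut = np.full(256, 58, dtype=np.int32)  # non-uniform -> shared bin 58
    next_bin = 0
    for v in range(256):
        bits = [(v >> i) & 1 for i in range(8)]
        transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
        if transitions <= 2:
            lut[v] = next_bin
            next_bin += 1
    hist = np.bincount(lut[code].ravel(), minlength=59)
    return hist / hist.sum()
```

Ahonen et al. then divide the face into a grid of cells and concatenate the per-cell histograms into the "spatially enhanced" descriptor.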
Face: a 2D array of intensities → projections on Eigenfaces
Face recognition using Eigenfaces, Turk and Pentland, CVPR 1991.
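The Eigenfaces representation reduces to PCA on vectorized face images; a minimal sketch (names are mine, and the SVD route avoids forming the huge n_pixels × n_pixels covariance matrix):

```python
import numpy as np

def eigenfaces(X, k):
    """PCA on vectorized faces X (n_samples x n_pixels): returns the
    mean face and the top-k principal directions ('eigenfaces')."""
    mean = X.mean(axis=0)
    # rows of Vt are the principal directions of the centered data
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def project(face, mean, U):
    """A face is then represented by k coefficients, not raw pixels."""
    return (face - mean) @ U.T
```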
Eigenfaces vs. Fisherfaces: recognition using class specific linear projection,
Belhumeur, Hespanha, Kriegman,TPAMI 1997.
Face: projections on Fisherfaces (LDA on scatter matrices)
“the variations between the images of the same face due to lighting are almost always
larger than image variations due to a change in face identity”
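That observation is exactly what Fisherfaces addresses: choose projections that maximize between-class scatter relative to within-class scatter. A minimal LDA sketch (names are mine; in practice PCA is applied first so that the within-class scatter matrix is non-singular, since n_samples is usually far below n_pixels):

```python
import numpy as np

def fisherfaces(X, y, k):
    """LDA: directions maximizing between-class scatter S_b relative
    to within-class scatter S_w (generalized eigenproblem
    S_b w = lambda S_w w, solved here via pinv(S_w) @ S_b)."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-evals.real)[:k]
    return evecs[:, order].real.T  # k x d projection matrix
```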
• Isometric Feature Mapping (ISOMAP)
• Locally Linear Embedding (LLE)
• Locality Preserving Projection (LPP)
• Other metric learning methods …
Face recognition using Laplacianfaces, He, Yan, Hu, Niyogi, Zhang, TPAMI 2005.
Face: projections on Laplacianfaces, a linear embedding
preserving local manifold structures of an adjacency graph.
Laplacian L=D-S of the
nearest neighbor graph
Low-dimensional embedding
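The graph construction at the core of LPP can be sketched as follows (the neighborhood size and heat-kernel weighting are assumed choices); LPP then finds the linear embedding by solving the generalized eigenproblem X^T L X a = λ X^T D X a for the smallest eigenvalues:

```python
import numpy as np

def graph_laplacian(X, n_neighbors=3):
    """Laplacian L = D - S of a nearest-neighbor similarity graph over
    the data points, the matrix Laplacianfaces/LPP builds from (heat-
    kernel weights with t = 1 are an assumed choice)."""
    n = X.shape[0]
    # pairwise squared distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(d2[i])[1:n_neighbors + 1]:  # skip self
            S[i, j] = np.exp(-d2[i, j])
    S = np.maximum(S, S.T)  # symmetrize the kNN graph
    D = np.diag(S.sum(axis=1))
    return D - S
```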
Robust face recognition via sparse representation, Wright, et al., TPAMI 2009.
Face: a 2D array of intensities → a sparse representation in a sufficiently large feature space
Eigenfaces, Laplacianfaces,
downsampled, random projections
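SRC's decision rule can be sketched as below. Note that Wright et al. solve an ℓ1-minimization (min ‖x‖₁ s.t. Dx = y); here a greedy orthogonal matching pursuit stands in for the ℓ1 solver to keep the example dependency-free:

```python
import numpy as np

def src_classify(D, labels, y, n_nonzero=5):
    """Sparse-representation classification (SRC) sketch.

    D: dictionary whose columns are unit-normalized training faces,
    labels: class of each column, y: probe face. The probe is assigned
    to the class whose atoms reconstruct it with the lowest residual.
    """
    residual = y.astype(float).copy()
    support = []
    x_s = np.zeros(0)
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) < 1e-10:
            break
        # greedily pick the atom most correlated with the residual
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        x_s, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ x_s
    x = np.zeros(D.shape[1])
    x[support] = x_s
    # class-wise reconstruction residuals
    errs = {c: np.linalg.norm(y - D[:, labels == c] @ x[labels == c])
            for c in np.unique(labels)}
    return min(errs, key=errs.get)
```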
• Integral images
• Fast Haar features
• Cascaded boosting classifier
• Bootstrapping hard negatives
• Sliding window search on image pyramid
• Non-maximum suppression
Rapid object detection using a boosted cascade of simple features, Viola&Jones, CVPR01
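The first two ingredients can be sketched directly; the two-rectangle feature below is one of the several Haar layouts Viola and Jones use (function names are mine):

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row/column prepended:
    ii[y, x] = sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] from 4 lookups: the O(1) primitive
    that makes Haar features fast at any scale."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

def haar_two_rect_v(ii, y0, x0, h, w):
    """Vertical two-rectangle Haar feature: top box minus bottom box."""
    return (box_sum(ii, y0, x0, y0 + h, x0 + w)
            - box_sum(ii, y0 + h, x0, y0 + 2 * h, x0 + w))
```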
• In-plane/out-of-plane rotations (roll, yaw, pitch)
High-performance rotation invariant multiview face detection, Huang, et al., PAMI 2007
Non-face
• Width-first-search tree: multi-class vector boosting
• Pixel features in a granular space
High-performance rotation invariant multiview face detection, Huang, et al., PAMI 2007
• Weak classifier: from stump functions to piece-wise functions
• 30K frontal, 25K half profile, 20K full profile faces
• Define and label facial landmarks
• Build landmark detectors – templates, SVM/regression
• Constrain shape variations
– Point distribution models for all landmarks
• EM-like algorithm
– M-step: find the optimal location for each landmark individually
– E-step: smooth the shape using the point distribution model
Active shape models: their training and application, Cootes, CVIU 1995
Active appearance models, Cootes, et al., ECCV 1998.
• Efficient joint optimization
• Efficient landmark detector / regressor
• New modeling of local appearances and global shape
Face alignment at 3000 FPS via regressing local binary features, Ren, Cao, Wei, Sun, CVPR 2014
Face alignment through subspace constrained mean-shifts, Saragih, et al. ICCV 2009
Face alignment by explicit shape regression, Cao, et al., CVPR 2012
Face detection, pose estimation, and landmark localization in the wild, Zhu and
Ramanan, CVPR 2012
Deep convolutional network cascade for facial point detection, Sun, et al., CVPR 2013
Detecting and aligning faces by image retrieval, Shen, et al., CVPR 2013
Head pose estimation in computer vision: A survey, Murphy-Chutorian and Trivedi,
TPAMI 2009
• CMU Multi-PIE
– 750,000+ faces of 337 individuals
– 9 view points, 19 lighting conditions
– Facial expressions in 4 sessions
• Extended Yale B dataset
– 21,800+ faces of 38 individuals
– 9 poses, 64 lighting conditions
• CAS-PEAL face dataset
– 30,900 faces of 1,040 individuals
– Pose, lighting, accessories, etc.
• MORPH dataset
– 55,000 faces of 13,000 individuals
– Mug-shot images, biased distribution
• FERET (Face Recognition Technology), 1993-1997
– 316 individuals (1993), growing to 14,051 images of 1,199 individuals (2000)
• FRVT (Face Recognition Vendor Test) 2000, 2002, 2006
• FRVT 2010 (still image track of MBE 2010)
• FRVT 2013 (completed 5/2014)
– 1.6 million faces
– frontal faces with ambient lighting
Organized by National Institute of Standards and Technology (NIST), USA
J. Phillips, FRVT 2010 Report by NIST
FRR = 0.3% at FAR = 0.1%: the error rate has dropped by three orders of magnitude in 20 years!
Test 1: 9,240 true mates vs. 10K imposters
Test 2: 12×3000×2 = 72K genuine scores vs. 18M imposter scores
• A large gallery: 1.6 million faces
• Probe set: 171K mug-shots and 10.7K webcam images
• Evaluation metric: rank-1 miss rate
Vendor     Mug-shot   Webcam
NEC        4.1%       11.3%
Morpho     9.1%       36.4%
Toshiba    10.7%      23.7%
Cognitec   13.6%      57.6%
3M         17.2%      36.4%
Neurotech  20.5%      66.9%
FRVT 2013 Report from NIST
• A large gallery: 1.6 million faces
• Probe set: 40K mated and 40K non-mated searches
• Evaluation metrics
– Rank-1 miss rate
– FNIR (false negative identification rate) at FPIR (false positive identification rate) = 0.002
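These metrics can be made concrete with a small sketch. The thresholding policy is a simplification of the NIST open-set protocol, and the assumption that a mated probe's best-scoring gallery entry is its true mate is mine:

```python
import numpy as np

def rank1_miss_rate(scores, gallery_ids, probe_ids):
    """Fraction of mated probes whose top-scoring gallery entry is not
    their true identity. scores: probes x gallery similarity matrix."""
    top = gallery_ids[np.argmax(scores, axis=1)]
    return float(np.mean(top != probe_ids))

def fnir_at_fpir(mated_best, nonmated_best, fpir=0.002):
    """Open-set identification sketch: choose the score threshold so
    the fraction of non-mated searches returning a candidate equals
    the target FPIR, then report the fraction of mated searches whose
    best (assumed true-mate) score falls below it: the FNIR."""
    thr = np.quantile(nonmated_best, 1.0 - fpir)
    return float(np.mean(mated_best < thr))
```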
Vendor     FRVT 2013  FRVT 2010  FNIR (FPIR = 0.2%)
NEC        6.4%       8.9%       10.8%
Morpho     12.1%      13.5%      19.4%
Cognitec   17.0%      18.7%      34.2%
Neurotech  23.1%      25.8%      68.4%
FRVT 2013 Report from NIST
• Gallery images: 1 million mug-shot + 6 web images
• Probe images: 5 faces
• Ranked retrieval, without or with demographic filtering
A case study of automated face recognition: the Boston Marathon bombing suspects, J.
C. Klontz and A.K. Jain, IEEE Computer, 2013
• What is the state-of-the-art TPR (true positive rate) at FAR (false alarm rate) = 0.1% for constrained face verification in FRVT 2010?
– (a) 99.7%
– (b) 97.5%
– (c) 95.9%
• What is the state-of-the-art rank-1 accuracy on probe-gallery search among 1.6 million faces for constrained face identification in FRVT 2013?
– (a) 99.7%
– (b) 97.5%
– (c) 95.9%
• Data collection
– 13,233 web photos of 5,749 celebrities
– 6,000 face pairs in 10 splits
• Metric: mean recognition accuracy over 10 folds
– Restricted protocol: only same/not-same labels
– Unrestricted protocol: face identities, additional training pairs
– Unsupervised setting: no training whatsoever on LFW images
Labeled faces in the wild: A database for studying face recognition in
unconstrained environments, Huang, Jain, Learned-Miller, ECCVW, 2008
• User study on Mechanical Turk
– 10 different workers per face pair
– Average human performance
– Original images, tight crops, inverse crops
Attribute and simile classifiers for face verification, Kumar, et al., ICCV 2009
[Chart: LFW verification accuracy. Human performance (Mechanical Turk): 99.20% on original images, 97.53% on tight crops, 94.27% on inverse crops. Algorithm accuracy by year: 60.02%, 73.93%, 78.47%, 85.54%, 88.00%, 92.58%, 95.17%, 96.33%, 97.53%. Reduction of error w.r.t. human by year: 19.24%, 20.52%, 37.08%, 37.09%, 48.06%, 49.15%, 52.32%.]
• Accurate dense facial landmarks (27 points)
• Concatenated multi-scale descriptors
– ~100K-dim LBP, SIFT, Gabor, etc.
• Transfer learning: Joint Bayesian
• WDRef dataset
– 99,773 images of 2,995 individuals
– 95.17% → 96.33% on LFW (unrestricted protocol)
Face alignment by explicit shape regression, Cao, et al., CVPR 2012
Bayesian face revisited: A joint formulation, Chen, et al., ECCV 2012
Blessing of dimensionality: High-dimensional feature and its efficient compression for
face verification, Chen, et al., CVPR 2013
A practical transfer learning algorithm for face verification, Cao, et al., ICCV 2013
Likelihood ratio test:
EM update of the between/within class covariance
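The joint Bayesian formulation behind this (Chen et al., ECCV 2012) models a face as identity plus intra-personal variation and scores a pair by a log-likelihood ratio; a sketch of the model:

```latex
% A face is x = \mu + \varepsilon, with identity \mu \sim N(0, S_\mu)
% and intra-personal variation \varepsilon \sim N(0, S_\varepsilon);
% S_\mu and S_\varepsilon are the covariances the EM procedure updates.
r(x_1, x_2) = \log \frac{P(x_1, x_2 \mid H_I)}{P(x_1, x_2 \mid H_E)},
\qquad
\mathrm{cov}(x_1, x_2) =
\begin{cases}
S_\mu & H_I \ \text{(same identity)} \\
0     & H_E \ \text{(different identities)}
\end{cases}
```

Under both hypotheses each face has marginal covariance S_μ + S_ε; only the cross-covariance differs, which is what makes the ratio discriminative.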
• 12×5 Siamese ConvNets × 8 input pairs + RBM classification
Hybrid deep learning for computing face similarities, Sun, Wang, Tang, ICCV 2013.
12 face regions, 8 pairs of inputs
[Chart: LFW verification accuracy. 2010: 88.00%, 2011: 92.58%, 2012: 95.17%, 2013: 96.33%, DeepFace: 97.35%, Human: 97.53%, New*: 98.4%. Reduction of error w.r.t. human per year: 20.52%, 48.06%, 52.32%, 49.15%, 85.00%.]
Face recognition in unconstrained videos with matched background similarity, Wolf,
Hassner, Maoz, ICCV 2011
• Data collection
– 3,425 YouTube videos of 1,595 celebrities (a subset of LFW subjects)
– 5,000 video pairs in 10 splits
– Detected and roughly aligned face frames available
• Metric: mean recognition accuracy over 10 folds
– Restricted protocol: only same/not-same labels
– Unrestricted protocol: face identities, additional training pairs
[Chart: LFW accuracy (%) under different alignment schemes: No Alignment, 3D Perturbation, 2D Alignment, 3D Alignment, 3D Alignment + LBP; reported values 87.9, 91.3, 93.7, 94.3, 97.35.]
[Plot: LFW verification accuracy vs. representation size. Float features at 4096, 1024, and 256 dimensions vs. 4096-, 1024-, and 256-bit binarized versions: 97.0 vs. 96.07, 96.72 vs. 95.53, 97.17 vs. 95.87.]
[Chart: training-set size vs. DNN test error (%). 100% of the data: 8.74, 50%: 10.9, 20%: 15.1, 10%: 20.7.]
[Chart: depth ablation, DNN test error (%). C1+M2+C3+L4+L5+L6+F7: 8.74, -C3: 11.2, -L4 -L5: 12.6, -C3 -L4 -L5: 13.5.]
• Naïve binarization
Verification accuracy (%) on LFW (restricted protocol), float vs. binary features:
dim     4096   1024   512    256    128    64     32     16     8
float   97.0   96.72  96.78  97.17  96.42  96.1   94.5   92.75  89.8
binary  96.07  95.53  95.5   95.87  93.38  91.45  87.15
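"Naïve binarization" here can be read as keeping one bit per feature dimension; a sketch (thresholding each dimension at its training-set mean is my assumption, since the slide does not specify the threshold):

```python
import numpy as np

def naive_binarize(F):
    """Keep one bit per feature dimension by thresholding at the
    per-dimension mean, so e.g. a 4096-dim float descriptor becomes
    4096 bits. F: n_samples x n_dims training/feature matrix."""
    return F > F.mean(axis=0, keepdims=True)

def hamming_similarity(a, b):
    """Fraction of agreeing bits, the binary stand-in for a float
    similarity such as cosine."""
    return float(np.mean(a == b))
```

The appeal is storage and speed: Hamming comparisons on packed bits are far cheaper than float dot products, at a modest accuracy cost.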
• Coupling 3D alignment with large-capacity locally-connected networks
• At the brink of human-level performance for face verification
Baseline   Rank-1 rate (%)   Rank-1 rate (%) @ 1% FAR   Verification (%)
[1]        25                56.7                       N/A
[2]        44.5              64.9                       97.35
[3]        61.9              82.5                       98.4
• What is the state-of-the-art rank-1 accuracy searching 3K probe faces against 4K gallery faces on the unconstrained LFW dataset?
– (a) ~80%
– (b) ~60%
– (c) ~40%