Deep Face Recognition Challenges and Tips for Real-life Deployment [email protected]

Post on 13-Oct-2020

Page 1: Deep Face Recognition - NVIDIA · HERTA Deep Face Recognition GPU-powered face recognition Offices in Barcelona, Madrid, London, Los Angeles Crowds, unconstrained Deep Face Recognition

Deep Face Recognition Challenges and Tips for Real-life Deployment

[email protected]

Page 2

1 Deep Face Recognition

2 Public DBs

3 Public models

4 Managing imbalance

5 Embeddings

6 Conclusions

Page 3

HERTA

www.hertasecurity.com

Deep Face Recognition

GPU-powered face recognition

Offices in Barcelona, Madrid, London, Los Angeles

Crowds, unconstrained

Deep Face Recognition

Large training DBs, >100K images, >1K subjects (Public DBs)

Public models (Inception, VGG, ResNet, SENet…), close to state-of-the-art

Typically, embedding layer (yielding facial descriptor) feeds one-hot encoding

Unconstrained (in-the-wild) environments
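
The "embedding layer feeds one-hot encoding" setup can be illustrated with a minimal NumPy sketch. The layer sizes, the random stand-in for CNN features and the names `W_emb`, `W_cls`, `embed` and `softmax_cross_entropy` are all hypothetical, chosen only to show how the descriptor sits between the CNN and the one-hot identity target during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumptions for illustration, not values from the talk).
n_features, emb_dim, n_subjects = 64, 16, 5

# Hypothetical weights of the last two layers of the network.
W_emb = rng.normal(scale=0.1, size=(n_features, emb_dim))
W_cls = rng.normal(scale=0.1, size=(emb_dim, n_subjects))


def embed(features):
    """Embedding layer: its output is the facial descriptor."""
    return features @ W_emb


def one_hot(ids, n_classes):
    """Encode integer subject ids as one-hot rows."""
    out = np.zeros((len(ids), n_classes))
    out[np.arange(len(ids)), ids] = 1.0
    return out


def softmax_cross_entropy(logits, targets):
    """Mean cross-entropy between softmax(logits) and one-hot targets."""
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())


features = rng.normal(size=(8, n_features))  # stand-in for CNN outputs
ids = rng.integers(0, n_subjects, size=8)    # training identities

descriptors = embed(features)                # kept at deployment time
loss = softmax_cross_entropy(descriptors @ W_cls, one_hot(ids, n_subjects))
```

In this reading, the classification layer and its one-hot targets exist only to train the network; matching at deployment compares the descriptors directly.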

Page 4

Public DBs

• Mostly celebrities: subjects overlap

Subjects per DB:

CWF: 10.6K
LFW: 5.7K
VGG Face: 2.6K
VGG Face 2: 9.1K
IJBB: 1.8K

Page 5

Public DBs

• Highly imbalanced

[Figure: images / subject by demographic group, for LFW and CWF]

Page 6

Public models

• Public models trained on public DBs (DIY):

FaceNet (2015): CWF / MS-1MC
VGGFace (2015): VGG
SphereFace (2017): CWF
VGGFace2 (2017): MS-1MC + VGG2

• Validate with a demographically-balanced DB (50% same ID, 50% different ID):

Asian female: 1M pairs
Asian male: 1M pairs
Black female: 1M pairs
Black male: 1M pairs
White female: 1M pairs
White male: 1M pairs
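
One way to assemble such a validation set is sketched below, assuming each demographic group is given as a mapping from subject to image list. The function names, the data layout and the sampling with replacement are assumptions for illustration; the slides do not say how the pairs were drawn.

```python
import random


def balanced_pairs(images_by_subject, n_pairs, seed=0):
    """Return (image_a, image_b, same_id) tuples: half same-ID, half different-ID.

    Needs at least two subjects, and at least one subject with two images.
    """
    rng = random.Random(seed)
    subjects = sorted(images_by_subject)
    multi = [s for s in subjects if len(images_by_subject[s]) >= 2]
    pairs = []
    for i in range(n_pairs):
        if i % 2 == 0:
            # Same ID: two distinct images of one subject.
            s = rng.choice(multi)
            a, b = rng.sample(images_by_subject[s], 2)
            pairs.append((a, b, True))
        else:
            # Different ID: one image from each of two distinct subjects.
            s, t = rng.sample(subjects, 2)
            a = rng.choice(images_by_subject[s])
            b = rng.choice(images_by_subject[t])
            pairs.append((a, b, False))
    return pairs


def balanced_validation_set(db, n_pairs_per_group, seed=0):
    """db maps group name -> {subject -> [images]}; same pair count per group."""
    return {
        group: balanced_pairs(subjects, n_pairs_per_group, seed)
        for group, subjects in db.items()
    }
```

Giving every group the same number of pairs, with the same 50/50 split, keeps per-group error rates comparable.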

Page 7

Public models: examples of failures

[Figure: example false positives and false negatives]

Page 8

Public models: evaluation

[Figure: evaluation of FaceNet (2015), SphereFace (2017), VGGFace (2015) and VGGFace2 (2017); training-DB labels 1MC, CWF, VGG, VG2; groups White male, Black male, Asian female]
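
A per-group evaluation of this kind can be sketched as follows. The slides do not state the metric, so the cosine similarity, the fixed threshold and the choice of false positive and false negative rates are assumptions, and the function names are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Row-wise cosine similarity between two batches of descriptors."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a * b).sum(axis=1)


def error_rates(scores, same_id, threshold):
    """Return (false positive rate, false negative rate) at a threshold."""
    scores = np.asarray(scores, dtype=float)
    same_id = np.asarray(same_id, dtype=bool)
    accepted = scores >= threshold
    fpr = float(np.mean(accepted[~same_id]))  # different IDs accepted
    fnr = float(np.mean(~accepted[same_id]))  # same IDs rejected
    return fpr, fnr


def evaluate_by_group(scores_by_group, threshold):
    """scores_by_group maps group -> (scores, same_id flags)."""
    return {
        group: error_rates(scores, same_id, threshold)
        for group, (scores, same_id) in scores_by_group.items()
    }
```

Using one threshold for all groups exposes demographic bias as a gap between the per-group rates.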

Page 9

Managing imbalance

SAMPLING (DATA-ORIENTED)

• Undersampling
• Oversampling

TRAINING LOSS (MODEL-ORIENTED)

• Cost-sensitive learning

Multi-task learning (id, gender, ethnicity): “Features get better at understanding faces, improving performances of individual tasks”

R Ranjan, VM Patel, R Chellappa. “Hyperface: A deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition.” TPAMI 2017
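
As an illustration of the model-oriented route, cost-sensitive learning is often implemented by weighting each sample's loss by the inverse frequency of its class. The sketch below assumes that weighting scheme; the slides do not specify one, and both function names are hypothetical.

```python
import numpy as np


def inverse_frequency_weights(labels, n_classes):
    """Weight each class by n_samples / (n_present_classes * class_count)."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    weights = np.zeros(n_classes)
    present = counts > 0
    weights[present] = len(labels) / (present.sum() * counts[present])
    return weights


def weighted_cross_entropy(logits, labels, class_weights):
    """Cross-entropy where rare classes contribute more per sample."""
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_sample = -log_probs[np.arange(len(labels)), labels]
    w = class_weights[labels]
    return float((w * per_sample).sum() / w.sum())
```

Unlike oversampling, this leaves the training set untouched and changes only how much each error costs.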

Page 10

Managing imbalance – Data augmentation

• Data augmentation: makes imbalance mitigation much more effective

[Figure: Database → Oversampled DB → Stochastic data augmentation → DNN]

I Masi et al. "Do we really need to collect millions of faces for effective face recognition?" ECCV 2016.
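
The point of combining the two steps is that replicated samples stop being exact duplicates once each copy is perturbed at random. A minimal sketch follows, assuming grayscale images in [0, 1] and three arbitrary perturbations (flip, small shift, brightness gain); the specific transforms and their ranges are assumptions, not the ones used in the talk.

```python
import numpy as np


def augment(image, rng):
    """Return a randomly perturbed copy of an H x W image with values in [0, 1]."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1]                 # horizontal flip
    shift = int(rng.integers(-2, 3))       # -2..2 pixels
    out = np.roll(out, shift, axis=1)      # crude shift (wraps around the border)
    gain = rng.uniform(0.8, 1.2)           # brightness change
    return np.clip(out * gain, 0.0, 1.0)


def augmented_batch(images, indices, rng):
    """Build a batch; repeated indices (oversampled images) get different augmentations."""
    return np.stack([augment(images[i], rng) for i in indices])
```

For example, `augmented_batch(images, [0, 0, 0, 0], rng)` turns one oversampled image into four distinct training inputs.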

Page 11

Managing imbalance – Proposal

Traditional imbalance: $\max(X) / \min(X)$

Proposal: IDR (robust to outliers): $D_9(X) / D_1(X)$

Iterative multi-label oversampling:

1. Find most imbalanced label L
2. Find most imbalanced category C within L
3. Draw random sample from C, replicate

[Figure: $\max(X) / \min(X)$ and $D_9(X) / D_1(X)$ vs. #samples added, with $D_1$ and $D_9$ marked]

Page 12

Managing imbalance – Sample training batch

[Figure: a sample training batch before oversampling and after]

Page 13

Managing imbalance

• Results with ResNet 20 (tiny network, for comparison only)
• Better with almost 6X fewer subjects, 2X fewer images!

[Figure: results for training sets of 10.6K subjects, 494K images and 1.8K subjects, 295K images]

Page 14

Sparse embedding

• Typically, in deep face recognition: image → CNN → embedding layer → one-hot encoding
• What about ReLU + embedding + one-hot encoding? (e.g. VGGFace)
• Why more dimensions, if 90% zero?
• Larger representation subspace, at expense of computational efficiency
• But can gain it back! ~200M comp/s

[Figure: Sparse 4096-d, Dense 512-d, Dict + Dense 256-d]
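
One way a mostly-zero descriptor can recover efficiency is to store only its nonzero entries and compare descriptors on the indices they share. The sketch below shows that idea in NumPy; it is an assumed illustration, not the implementation behind the throughput figure above, and `to_sparse` and `sparse_dot` are hypothetical names.

```python
import numpy as np


def to_sparse(vec):
    """Keep only the nonzero entries as (sorted indices, values)."""
    idx = np.flatnonzero(vec)
    return idx, vec[idx]


def sparse_dot(a, b):
    """Dot product of two sparse descriptors over their shared nonzero indices."""
    (ia, va), (ib, vb) = a, b
    _, pa, pb = np.intersect1d(ia, ib, assume_unique=True, return_indices=True)
    return float(np.dot(va[pa], vb[pb]))


rng = np.random.default_rng(0)
# Toy 4096-d descriptors after ReLU, with roughly 90% zeros.
x = np.maximum(rng.normal(size=4096) - 1.3, 0.0)
y = np.maximum(rng.normal(size=4096) - 1.3, 0.0)

score = sparse_dot(to_sparse(x), to_sparse(y))
```

Each stored descriptor then holds roughly a tenth of the 4096 values, and a comparison touches only the positions where both descriptors are nonzero.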

Page 15

Conclusions

• Public training / validation DBs: heavily biased at multiple levels
• Without balancing, trained models will be biased, too!
• Prefer “better data” over “more data”
• Machine Learning vs Machine Teaching

Machine Learning: designing algorithms to passively train models

Machine Teaching: choosing which examples to show a learner

Explainable ML

Zhu, Xiaojin, et al. "An Overview of Machine Teaching." arXiv preprint arXiv:1801.05927 (2018).

Page 16

Questions?

[email protected]