Face Recognition with Deep Learning
Outline 1. Introduction
2. Related works
• Classical face recognition pipeline
• Shallow Learning
Features are handpicked: SIFT, LBP, HOG, etc.
Works well on small datasets but fails on large ones
Also fails under illumination variations and facial expressions
(a sketch of such handcrafted descriptors follows below)
Face patterns lie on a complex nonlinear and non‐convex manifold in the high‐dimensional space.
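To make "handpicked features" concrete, here is a minimal sketch of extracting HOG and LBP descriptors with scikit-image; the file name and parameter values are illustrative assumptions, not settings from any particular paper.

```python
# Minimal sketch: handcrafted (shallow) face descriptors with scikit-image.
# The file name and parameter choices below are illustrative assumptions.
import numpy as np
from skimage import io, color
from skimage.feature import hog, local_binary_pattern

img = color.rgb2gray(io.imread("face.jpg"))           # grayscale face crop (placeholder path)

# HOG: histograms of oriented gradients over small cells
hog_vec = hog(img, orientations=9, pixels_per_cell=(8, 8),
              cells_per_block=(2, 2), block_norm="L2-Hys")

# LBP: histogram of uniform local binary patterns
lbp = local_binary_pattern(img, P=8, R=1, method="uniform")
lbp_hist, _ = np.histogram(lbp, bins=np.arange(0, 11), density=True)

feature = np.concatenate([hog_vec, lbp_hist])         # the handpicked descriptor
```

Such descriptors are fixed in advance; nothing about them adapts to the identity-recognition task, which is one reason they degrade on large, unconstrained datasets.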
• Deep Learning
Convolutional Neural Networks (CNNs): DeepFace, DeepID series, FaceNet, VGGFace, etc.
DeepFace (Taigman et al., 2014)
2D/3D face modeling and alignment using affine transformations
9-layer deep neural network
120 million parameters
Alignment (Frontalization)
(a) The detected face, with 6 initial fiducial points.
(b) The induced 2D-aligned crop.
(c) 67 fiducial points on the 2D-aligned crop with their corresponding Delaunay triangulation; triangles are added on the contour to avoid discontinuities.
(d) The reference 3D shape transformed to the 2D-aligned crop image plane.
(e) Triangle visibility w.r.t. the fitted 3D-2D camera; darker triangles are less visible.
(f) The 67 fiducial points induced by the 3D model that are used to direct the piecewise affine warping.
(g) The final frontalized crop.
(h) A new view generated by the 3D model.
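As a simplified illustration of step (b), 2D alignment can be sketched as fitting a similarity transform that maps detected fiducial points onto fixed reference positions, then warping the crop. The five reference landmark positions, their ordering, the 152x152 crop size, and the OpenCV-based implementation below are assumptions; the paper's full pipeline additionally fits a 3D model to produce the frontalized view.

```python
# Sketch of 2D alignment: map detected fiducial points onto fixed reference
# positions with a similarity transform, then warp the face crop.
# This approximates only the 2D-aligned crop; reference coordinates and the
# landmark ordering (eyes, nose tip, mouth corners) are assumptions.
import cv2
import numpy as np

def align_2d(image, fiducials, out_size=152):
    # fiducials: (N, 2) array of detected landmark (x, y) positions;
    # the first five are assumed to be left eye, right eye, nose tip,
    # left mouth corner, right mouth corner.
    reference = np.float32([[0.30, 0.35], [0.70, 0.35],   # eye centers
                            [0.50, 0.55],                  # nose tip
                            [0.35, 0.75], [0.65, 0.75]]) * out_size
    src = np.float32(fiducials[:5])
    # Least-squares similarity transform (rotation + scale + translation)
    M, _ = cv2.estimateAffinePartial2D(src, reference)
    return cv2.warpAffine(image, M, (out_size, out_size))
```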
• Input: 3D-aligned, 3-channel (RGB) face image
• 9-layer deep neural network architecture
• Trained with a softmax output, minimizing cross-entropy
• Uses stochastic gradient descent (SGD), dropout, and ReLU
• Outputs a K-class prediction (a rough architecture sketch follows below)
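A rough PyTorch sketch of such a network is shown below, using the layer sizes described on the following slides. Ordinary Conv2d layers stand in for the locally connected layers L4-L6 (a locally connected layer is sketched separately further below); padding, strides, and initialization are assumptions, not the paper's exact settings.

```python
# Rough PyTorch sketch of a DeepFace-style network (not the paper's code).
# Conv2d layers stand in for the locally connected layers L4-L6.
import torch
import torch.nn as nn

class DeepFaceLike(nn.Module):
    def __init__(self, num_classes=4030):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=11), nn.ReLU(),    # C1
            nn.MaxPool2d(kernel_size=3, stride=2),           # M2
            nn.Conv2d(32, 16, kernel_size=9), nn.ReLU(),     # C3
            nn.Conv2d(16, 16, kernel_size=9), nn.ReLU(),     # L4 (stand-in)
            nn.Conv2d(16, 16, kernel_size=7), nn.ReLU(),     # L5 (stand-in)
            nn.Conv2d(16, 16, kernel_size=5), nn.ReLU(),     # L6 (stand-in)
        )
        self.descriptor = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(4096), nn.ReLU(), nn.Dropout(0.5), # F7: 4096-d face descriptor
        )
        self.classifier = nn.Linear(4096, num_classes)       # F8: K-class logits

    def forward(self, x):                 # x: (B, 3, 152, 152) aligned RGB crops
        f7 = self.descriptor(self.features(x))
        return self.classifier(f7)        # softmax/cross-entropy applied in the loss
```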
Layers 1-3:
• Convolution layers: extract low-level features (e.g., simple edges and texture)
• ReLU after each conv. layer, making the whole cascade produce highly non-linear and sparse features
• Max-pooling: makes the convolutional network more robust to local translations
Layers 4-6 (locally connected):
• Apply filters to different locations on the feature map
• Similar to a conv. layer, but the filter weights are spatially dependent (not shared across locations); see the sketch below
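A minimal sketch of a locally connected layer, the "spatially dependent" analogue of convolution mentioned above: each output location applies its own, unshared filter bank to its input patch. This is a hand-rolled illustration (assuming square feature maps, stride 1, no padding), not the paper's implementation.

```python
# Locally connected 2D layer: like Conv2d, but every output position has its
# own filter weights. Assumes square inputs, stride 1, no padding.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocallyConnected2d(nn.Module):
    def __init__(self, in_ch, out_ch, in_size, kernel_size):
        super().__init__()
        self.k = kernel_size
        self.out_size = in_size - kernel_size + 1
        # one weight bank per output position: (H_out*W_out, out_ch, in_ch*k*k)
        self.weight = nn.Parameter(
            torch.randn(self.out_size ** 2, out_ch, in_ch * kernel_size ** 2) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_ch, self.out_size, self.out_size))

    def forward(self, x):
        # patches: (B, in_ch*k*k, L) -> (B, L, in_ch*k*k, 1), L = H_out*W_out
        patches = F.unfold(x, self.k).transpose(1, 2).unsqueeze(-1)
        out = torch.matmul(self.weight, patches).squeeze(-1)   # (B, L, out_ch)
        out = out.transpose(1, 2).reshape(x.size(0), -1, self.out_size, self.out_size)
        return out + self.bias
```

For example, `LocallyConnected2d(16, 16, in_size=62, kernel_size=9)` applied to a (1, 16, 62, 62) tensor returns a (1, 16, 54, 54) map, the same shape an ordinary 9x9 convolution would produce, but with roughly 54*54 times more parameters because no weights are shared.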
Layer F7:
• Fully connected; generates a 4096-d vector
• Sparse representation of the face descriptor: about 75% of the outputs are zero
• Mainly due to ReLU and dropout (a quick illustration follows below)
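The sparsity mechanism is easy to see in isolation: after a ReLU, every negative pre-activation becomes an exact zero. The standalone snippet below uses arbitrary sizes (1024 to 4096) and random weights, so it shows roughly 50% zeros; the ~75% figure quoted above additionally reflects dropout and the trained weights.

```python
# Quick, standalone check of how ReLU induces sparsity in a fully connected
# layer's output; the 1024 -> 4096 sizes are arbitrary illustration values.
import torch
import torch.nn as nn

f7 = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU())   # stand-in for F7
out = f7(torch.randn(8, 1024))                          # dummy input batch
print("fraction of zero outputs:", (out == 0).float().mean().item())
```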
Layer F8:
• Fully connected; generates a 4030-d vector (one output per identity)
• Softmax over these outputs produces a probability distribution over the class labels
• Cross-entropy loss for each training sample: L = -log p_k, where p_k is the predicted probability of the true class k
• Minimized with SGD and backpropagation (see the sketch below)
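A minimal sketch of F8 plus the softmax cross-entropy loss: the per-sample loss is -log p_k for the true identity k. The 4096 and 4030 sizes come from the slides above; the batch size and random inputs are placeholders.

```python
# Sketch of F8 + softmax cross-entropy: per-sample loss is -log p_k,
# where p_k is the predicted probability of the true identity k.
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 4030
f8 = nn.Linear(4096, num_classes)             # F8: maps the F7 descriptor to class logits

descriptors = torch.randn(4, 4096)            # dummy F7 outputs for 4 samples
labels = torch.randint(0, num_classes, (4,))  # true identities

logits = f8(descriptors)
probs = F.softmax(logits, dim=1)              # distribution over the 4030 identities
loss_manual = -torch.log(probs[torch.arange(4), labels]).mean()
loss_builtin = F.cross_entropy(logits, labels)   # same value, computed stably
print(loss_manual.item(), loss_builtin.item())
```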
• Trained on SFC: 4M faces (4030 identities, 800-1200 images per person)
• Evaluation focuses on Labeled Faces in the Wild (LFW)
• SGD with momentum 0.9
• Learning rate 0.01, manually decreased; final rate 0.0001
• Random weight initialization
• 15 epochs of training
• About 3 days total on a GPU-based engine
(a sketch of this training setup follows below)
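The hyper-parameters above translate into a training loop along the following lines. This is a sketch: the tiny model and random dataset are placeholders so the loop runs, and the MultiStepLR schedule is an assumption standing in for the paper's manual learning-rate decreases.

```python
# Sketch of the quoted training setup: SGD with momentum 0.9, learning rate
# 0.01 lowered toward 1e-4, 15 epochs. Model/data below are placeholders.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 4030))  # placeholder network
data = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 4030, (64,)))
train_loader = DataLoader(data, batch_size=16, shuffle=True)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[5, 10], gamma=0.1)

for epoch in range(15):
    model.train()
    for images, labels in train_loader:       # aligned face crops and identity labels
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()                        # backpropagation
        optimizer.step()                       # SGD update with momentum
    scheduler.step()                           # lower the learning rate: 0.01 -> 0.001 -> 0.0001
```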
Training on SFC
• Experimented with networks of different depths
• Removed C3; removed L4 and L5; or removed C3, L4, and L5 together
• Compared the classification error rate against the number of classes K
– Deeper is better
Results on LFW