Face recognition and deep learning, by Dr. สรรพฤทธิ์ มฤคทัต (Sanparith Marukatat)
TRANSCRIPT
Standard procedure
• Image capturing: camera, webcam, surveillance
• Face detection: locate faces in the image
• Face alignment: normalize size, rectify rotation
• Face matching
• 1:1 Face verification
• 1:N Face recognition
Viola-Jones Haar-like detector (OpenCV haarcascade_frontalface_alt2.xml)
• Detected face sizes: ~35x35 to 80x80 pixels (sketch below)
• Typical failure cases: face too small, occlusion, rotation
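A minimal detection sketch with OpenCV's Python bindings; the image path is a placeholder, and the cascade file named above is assumed to sit in the working directory (OpenCV ships it in its data folder).

```python
import cv2

# Load the Haar cascade named above; the path is an assumption.
cascade = cv2.CascadeClassifier("haarcascade_frontalface_alt2.xml")

img = cv2.imread("photo.jpg")                     # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)      # the detector works on grayscale

# minSize matches the ~35x35-pixel lower bound mentioned above.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1,
                                 minNeighbors=3, minSize=(35, 35))
for (x, y, w, h) in faces:                        # one box per detected face
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
```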
Recognition = compare these faces to known faces
Controlled environment
• Detected face size: 218x218 pixels
• Viola-Jones eye detector: eye distance = 81 pixels, eye angle = -0.7 degrees
• After alignment: face size = 180x200 pixels, eye distance = 100 pixels, eye angle = 0 degrees (sketch below)
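A sketch of the alignment step under the numbers above: rotate by the eye angle, scale the eye distance to 100 pixels, and crop to 180x200. The eye coordinates are assumed inputs (e.g. from the Viola-Jones eye detector), and the target eye-midpoint position (90, 60) is an assumption about the crop layout.

```python
import cv2
import numpy as np

def align_face(gray, left_eye, right_eye):
    """Rotate/scale so eyes are horizontal and 100 px apart, then crop 180x200."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))   # e.g. -0.7 degrees before alignment
    scale = 100.0 / np.hypot(dx, dy)         # e.g. 100/81 for the example above
    center = ((left_eye[0] + right_eye[0]) / 2.0,
              (left_eye[1] + right_eye[1]) / 2.0)
    M = cv2.getRotationMatrix2D(center, angle, scale)
    M[0, 2] += 90 - center[0]                # place the eye midpoint at (90, 60),
    M[1, 2] += 60 - center[1]                # an assumed position in the crop
    return cv2.warpAffine(gray, M, (180, 200))
```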
Comparing faces
• Face image
  • Bitmap of size 180x200 pixels
  • Grayscale (0-255)
  • 36,000 values per face image
• Given 2 face images x1 and x2, compare pixel by pixel:
  • x1(x,y) - x2(x,y)
  • | x1(x,y) - x2(x,y) |
  • (x1(x,y) - x2(x,y))²
• What should be used? (see the sketch below)
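A quick numpy comparison of the three options; face1 and face2 are hypothetical names for two aligned 180x200 grayscale arrays.

```python
import numpy as np

x1 = face1.astype(np.float64)      # face1, face2: aligned 180x200 grayscale images
x2 = face2.astype(np.float64)

signed = (x1 - x2).sum()           # signed differences: opposite errors cancel out
abs_d  = np.abs(x1 - x2).sum()     # absolute differences: no cancellation
sq_d   = ((x1 - x2) ** 2).sum()    # squared differences: large errors weigh more
```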
Basic Maths
• 1 face image = 1 vector
  • 36,000 dimensions (d)
  • a matrix with 1 column
• Distance between two such vectors (sketch below)
  • Euclidean distance
  • Norm-p distance
  • Norm-1 distance
  • Norm-infinity distance
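The same idea via numpy's norm, treating each face as a single 36,000-dimensional vector; v1 and v2 are hypothetical flattened face images.

```python
import numpy as np

d = v1.ravel() - v2.ravel()           # v1, v2: faces as 36,000-dim vectors
l2   = np.linalg.norm(d)              # Euclidean (norm-2) distance
l1   = np.linalg.norm(d, ord=1)       # norm-1 distance
lp   = np.linalg.norm(d, ord=3)       # norm-p distance (p = 3 as an example)
linf = np.linalg.norm(d, ord=np.inf)  # norm-infinity distance
```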
Pixel importance and projection
• Not all pixels have the same importance
• A pixel with low variation across faces -> not important
• A pixel with large variation across faces -> could be important
Projection
• When ||w|| = 1, wTx is the projection of x on axis w
Subspace projection
• What should the axis w be?
• How many axes do we need?
Principal Component Analysis PCA (1)
• Basic idea
  • Measure of information = variance
  • Variance of real numbers z1,…,zN: var = (1/N) Σt (zt - μ)², where μ = (1/N) Σt zt
• Given a set of face vectors x1,…,xN and an axis w, the variance of wTx1,…,wTxN is wTCw, where C is the covariance matrix of the data
Principal Component Analysis PCA (2)
• The best axis w is obtained by maximizing wTCw under the constraint ||w|| = 1
• w is an eigenvector of C: Cw = aw
• The variance wTCw = a is the eigenvalue corresponding to w
• PCA (sketch below)
  • Construct the covariance matrix C
  • Eigen-decompose C
  • Keep the m eigenvectors with the largest eigenvalues
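A direct numpy sketch of those three steps; X is assumed to be an N x d matrix whose rows are face vectors, and m the number of axes to keep.

```python
import numpy as np

def pca(X, m):
    """X: N x d matrix of face vectors (one row per face); keep m axes."""
    mean = X.mean(axis=0)
    Xc = X - mean                        # center the data
    C = Xc.T @ Xc / len(Xc)              # d x d covariance matrix
    evals, evecs = np.linalg.eigh(C)     # eigh: C is symmetric
    order = np.argsort(evals)[::-1][:m]  # m largest eigenvalues first
    return mean, evecs[:, order]         # mean face and the m best axes w
```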
Eigenface (1)
• What is the problem with face data? The covariance matrix is dxd with d = 36,000, far too large to build and eigen-decompose directly
• Solution: eigen-decompose the NxN dot-product (Gram) matrix of the N training faces instead, then map its eigenvectors back to d dimensions (see the sketch below)
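A sketch of that trick, with Xc and m as in the PCA sketch above: the NxN matrix shares its nonzero eigenvalues with C, and its eigenvectors lift back to d dimensions.

```python
import numpy as np

G = Xc @ Xc.T / len(Xc)              # N x N dot-product (Gram) matrix
evals, V = np.linalg.eigh(G)         # same nonzero eigenvalues as C
order = np.argsort(evals)[::-1][:m]  # keep the m largest, as before
U = Xc.T @ V[:, order]               # lift eigenvectors back to d dimensions
U /= np.linalg.norm(U, axis=0)       # renormalize each axis to ||w|| = 1
```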
Eigenface (2)
• We work with vectors of projected values
[Figure: face images x1, x2, …, x40; enrollment maps a face x to its template of projected values]
Eigenface (3)
• Vector of raw intensities: 36,000 dimensions
• Vector of Eigenface coefficients: 10 dimensions
• Eigenfaces with large eigenvalues capture large variations
• Eigenfaces with small eigenvalues mostly capture noise
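Under the numbers above, enrollment and matching reduce to one projection and one distance; mean and W come from the PCA sketch, while x_new and stored_template are hypothetical names for an aligned probe face and a stored template.

```python
import numpy as np

template = (x_new - mean) @ W                       # 36,000 values -> e.g. 10 coefficients
score = np.linalg.norm(template - stored_template)  # small distance -> likely same person
```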
Related techniques
• Fisherface (LDA)
• Nullspace LDA (NLDA)
• Laplacianface
• Locality Sensitive Discriminant Analysis (LSDA)
• 2DPCA
• 2DLDA
• 2DPCA+2DLDA
Result on ORL (~10 years ago)

Technique      Accuracy (%)  #dim
Eigenface      90-95         200
Fisherface     91-97         50
NLDA           92-97         40
Laplacianface  89-95         50
LSDA           91-97         50
2DPCA          91.5          -
2DLDA          90.5          -
2DPCA+2DLDA    93.5          -
Limitations
• Occlusion: glasses, beard
• Lighting conditions
• Facial expression
• Pose
• Make-up
Evaluation
• Accuracy: find the closest template and check the ID
• Verification (access control)
  • Live captured image vs. stored image
  • We have a distance -> should we accept or not?
  • False Accept (FA) vs. False Reject (FR)
• From a set of face images (sketch below)
  • Compute distances between all pairs
  • Select the threshold T that gives 0 FA and X FR
[Figure: histogram of number of tries vs. distance, with the acceptance threshold T]
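A sketch of how FA/FR follow from a threshold; dists and same_id are hypothetical arrays holding all pairwise distances and a same-identity flag per pair.

```python
import numpy as np

def fa_fr(dists, same_id, T):
    """False Accept and False Reject rates at threshold T."""
    fa = np.mean(dists[~same_id] <= T)   # impostor pairs accepted
    fr = np.mean(dists[same_id] > T)     # genuine pairs rejected
    return fa, fr

# 0 FA: put T just below the smallest impostor distance, then read off the FR.
T = dists[~same_id].min() - 1e-9
print(fa_fr(dists, same_id, T))
```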
Labeled Faces in the Wild
• Large number of subjects (>5,000)
• Unconstrained conditions
• Human performance 97-99%
• Traditional methods fail
• New alignment technique: funneling
LFW results
Use outside data to train the model
Deep Learning
Neural Network timeline
McCulloch & Pitts Neuron model (1943)
Perceptron limitation (1969)
Backprop algorithm 70-80’s
SVM (1992)
Deep Learning (2006)
• Return of Neural Network
• Focus on Deep Structure
• Take advantage of today computing power
Neural Networks (1)
• Neurons are connected via synapses
• A neuron receives signals from other neurons
• When its activation reaches a threshold, it fires a signal to other neurons
http://en.wikipedia.org/wiki/Neuron
Neural Networks (2)
• Universal approximator
• Classical structure: MLP
  • Hyperparameters: #hidden nodes, learning rate
• Backprop algorithm (a minimal sketch follows this list)
  • Gradient
    • Direction of change that increases the value of the objective function
    • Vector of partial derivatives with respect to each parameter
  • Works on all structures and all objective functions
  • Issues: stopping criteria, local optima, vanishing/exploding gradients
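A minimal numpy MLP trained by backprop on a toy problem; all sizes, the data, and the learning rate are illustrative choices, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                        # toy inputs
y = (X[:, 0] * X[:, 1] > 0).astype(float)[:, None]   # XOR-like target
W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)  # hidden layer
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)  # output layer
lr = 0.5                                             # learning rate

for _ in range(2000):
    h = np.tanh(X @ W1 + b1)                         # forward: hidden activations
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))         # forward: sigmoid output
    g_out = (p - y) / len(X)                         # cross-entropy gradient at output
    gW2, gb2 = h.T @ g_out, g_out.sum(0)
    g_h = (g_out @ W2.T) * (1 - h ** 2)              # backprop through tanh
    gW1, gb1 = X.T @ g_h, g_h.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1                   # gradient descent step
    W2 -= lr * gW2; b2 -= lr * gb2
```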
Deep Learning
• 2006, Hinton et al.: layer-by-layer construction -> pre-training
  • Stacks of RBMs, stacks of autoencoders
• Convolutional NN (CNN)
  • Shared weights
  • Takes advantage of the GPU
CNN today
• Common components
  • Convolution layers, max-pooling layers
  • ReLU
  • Drop-out; augmenting training data by sampling + flipping
  • GPU
• Tools: Caffe, TensorFlow, Theano, Torch
• Structures: LeNet, AlexNet, GoogLeNet (sketch below)
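A LeNet-style structure sketched in Keras (TensorFlow is one of the tools listed above); the layer sizes echo LeNet-5, but the ReLU and drop-out choices are the modern components from this slide, so treat this as an illustration rather than the original network.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(6, 5, activation="relu", input_shape=(32, 32, 1)),
    tf.keras.layers.MaxPooling2D(2),                 # convolution + max-pooling
    tf.keras.layers.Conv2D(16, 5, activation="relu"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation="relu"),
    tf.keras.layers.Dense(84, activation="relu"),
    tf.keras.layers.Dropout(0.5),                    # drop-out regularization
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")
```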
[Figures: LeNet, AlexNet, and GoogLeNet architecture diagrams]
Microsoft deep residual network (ResNet): 152 layers!
DeepID (Sun et al., CVPR 2014)
• 160-dim feature per region, 60 regions, plus flipped versions
• 160 x 60 x 2 = 19,200 dimensions!!
• Used as input to other models
• CelebFaces data
• Refine training
Key ingredients: learning techniques for deep structures + big data + computing power (GPU, etc.)