Detecting Faces in Images: A Survey
Ming-Hsuan Yang, Member, IEEE, David J. Kriegman, Senior Member, IEEE, Narendra Ahuja, Fellow, IEEE
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 24, NO. 1, JANUARY 2002
IEEE TRANS. ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2002 24(1)
Face Detection
Given a single image, identify all image regions that contain a face, regardless of
▪ its 3D position
▪ orientation
▪ lighting conditions
Categorize and evaluate different algorithms
Methods to Detect/Locate Faces
Knowledge-based methods: encode human knowledge of what constitutes a typical face (usually, the relationships between facial features)
Feature invariant approaches: aim to find structural features of a face that exist even when the pose, viewpoint, or lighting conditions vary
Template matching methods: several standard patterns are stored to describe the face as a whole or the facial features separately
Appearance-based methods: the models (or templates) are learned from a set of training images that capture the representative variability of facial appearance
Appearance-Based Methods
Learn appearance “templates” from examples in images
Statistical analysis and machine-learning
Train a classifier using positive (and usually negative) examples of faces
▪ Representation
▪ Preprocessing
▪ Train a classifier
▪ Search strategy
▪ Postprocessing
▪ View-based
Bayesian Classifier
Image or feature vector: variable x
x is high-dimensional and p(x|..) is multimodal
No natural parameterized forms
Use an empirically validated parametric or non-parametric approximation
Likelihood ratio: $\dfrac{p(\mathbf{x} \mid \text{face})}{p(\mathbf{x} \mid \text{nonface})}$
Appearance-based Methods: Classifiers
▪ Neural network: multilayer perceptrons
▪ Principal Component Analysis (PCA), Factor Analysis
▪ Mixture of PCA, mixture of factor analyzers
▪ Support vector machine (SVM)
▪ Distribution-based method
▪ Naïve Bayes classifier
▪ Hidden Markov model
▪ Sparse network of winnows (SNoW)
▪ Kullback relative information
▪ Inductive learning: C4.5
▪ AdaBoost
▪ …
Eigenfaces
Face images linearly encoded using a modest number of basis images [Kirby and Sirovich]
Principal Component Analysis (PCA)
[Figure: N training samples, each an m×n image treated as an m*n vector, reduced to K basis vectors (eigenfaces), K << N]
Minimize the mean square error between the projection of the training images onto this subspace and the original images
Eigenfaces for recognition
Matthew Turk and Alex Pentland, J. Cognitive Neuroscience, 1991
Face Recognition and Detection 9
Linear subspaces
Classification can be expensive: big search problem (e.g., nearest neighbors) or storing large PDFs
Suppose the data points are arranged as above
Idea: fit a line; the classifier measures distance to the line
CSE 576, Spring 2008
Convert x into v1, v2 coordinates
What does the v2 coordinate measure?
- distance to line
- use it for classification: near 0 for orange pts
What does the v1 coordinate measure?
- position along line
- use it to specify which orange point it is
Dimensionality reduction
• We can represent the orange points with only their v1 coordinates (since v2 coordinates are all essentially 0)
• This makes it much cheaper to store and compare points
• A bigger deal for higher dimensional problems
Linear subspaces
Consider the variation along direction v among all of the orange points:
What unit vector v minimizes var?
What unit vector v maximizes var?
Solution:
▪ v1 is the eigenvector of A with the largest eigenvalue
▪ v2 is the eigenvector of A with the smallest eigenvalue
Principal component analysis
Suppose each data point is N-dimensional
Same procedure applies:
The eigenvectors of A define a new coordinate system
▪ eigenvector with largest eigenvalue captures the most variation among training vectors x
▪ eigenvector with smallest eigenvalue has least variation
We can compress the data using the top few eigenvectors
▪ corresponds to choosing a “linear subspace”: represent points on a line, plane, or “hyper-plane”
▪ these eigenvectors are known as the principal components
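A minimal sketch of the procedure on this slide, using NumPy. The matrix A is not reproduced in the transcript, so the code assumes it is the scatter matrix of the mean-centered data; the function name `pca` and the synthetic 2D points are illustrative only.

```python
import numpy as np

def pca(X, k):
    """Return the data mean, the top-k principal directions, and all eigenvalues."""
    mean = X.mean(axis=0)
    Xc = X - mean
    A = Xc.T @ Xc                          # assumed definition of A (scatter matrix)
    eigvals, eigvecs = np.linalg.eigh(A)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]      # largest eigenvalue first
    return mean, eigvecs[:, order[:k]], eigvals[order]

# Synthetic "orange points": scattered along the line y = 0.5 x with small noise.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 0.5 * t + 0.01 * rng.normal(size=200)])

mean, V, eigvals = pca(X, k=1)
coords = (X - mean) @ V        # v1 coordinate of every point
X_hat = mean + coords @ V.T    # points rebuilt from the v1 coordinate alone
```

Keeping only the v1 coordinate loses almost nothing here because the discarded direction carries very little variance.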
The space of faces
An image is a point in a high-dimensional space
An N x M image is a point in $\mathbb{R}^{NM}$
We can define vectors in this space as we did in the 2D case
Dimensionality reduction
The set of faces is a “subspace” of the set of images
We can find the best subspace using PCA
This is like fitting a “hyper-plane” to the set of faces
▪ spanned by vectors v1, v2, ..., vK
▪ any face
Eigenfaces
PCA extracts the eigenvectors of A
Gives a set of vectors v1, v2, v3, ...
Each vector is a direction in face space
▪ what do these look like?
Projecting onto the eigenfaces
The eigenfaces v1, ..., vK span the space of faces
A face is converted to eigenface coordinates by
Recognition with eigenfaces
Algorithm
1. Process the image database (set of images with labels)
• Run PCA: compute eigenfaces
• Calculate the K coefficients for each image
2. Given a new image (to be recognized) x, calculate K coefficients
3. Detect if x is a face
4. If it is a face, who is it?
▪ Find closest labeled face in database
▪ nearest-neighbor in K-dimensional space
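The four steps above can be sketched as follows. This is an illustration under assumptions, not the lecture's code: the slide does not say how step 3 decides whether x is a face, so the sketch thresholds the reconstruction error (distance from face space); `face_threshold`, the function names, and the synthetic 64-pixel "images" are all hypothetical.

```python
import numpy as np

def fit_eigenfaces(images, k):
    """Step 1: PCA via SVD; rows of the returned basis are the top-k eigenfaces."""
    mean = images.mean(axis=0)
    _, _, vt = np.linalg.svd(images - mean, full_matrices=False)
    return mean, vt[:k]

def recognize(x, mean, basis, db_coeffs, labels, face_threshold):
    a = basis @ (x - mean)                        # step 2: K coefficients
    recon = mean + basis.T @ a
    if np.linalg.norm(x - recon) > face_threshold:
        return None                               # step 3 (assumed test): not a face
    nearest = np.argmin(np.linalg.norm(db_coeffs - a, axis=1))
    return labels[nearest]                        # step 4: nearest neighbor

# Synthetic database: 10 "faces" lying near a 3-dimensional subspace.
rng = np.random.default_rng(1)
B = rng.normal(size=(3, 64))
images = rng.normal(size=(10, 3)) @ B + 0.01 * rng.normal(size=(10, 64))
labels = np.arange(10)

mean, basis = fit_eigenfaces(images, k=3)
db_coeffs = (images - mean) @ basis.T

query = images[4] + 0.01 * rng.normal(size=64)
non_face = 5.0 * rng.normal(size=64)
who = recognize(query, mean, basis, db_coeffs, labels, face_threshold=1.0)
rejected = recognize(non_face, mean, basis, db_coeffs, labels, face_threshold=1.0)
```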
Choosing the dimension K
How many eigenfaces to use?
Look at the decay of the eigenvalues
▪ the eigenvalue tells you the amount of variance “in the direction” of that eigenface
▪ ignore eigenfaces with low variance
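One common way to turn "ignore eigenfaces with low variance" into a rule is to keep the smallest K whose eigenvalues account for a chosen fraction of the total variance. The fraction (`keep`) and the helper name `choose_k` are assumptions for illustration; the slide itself only says to inspect the decay.

```python
import numpy as np

def choose_k(eigenvalues, keep=0.95):
    """Smallest K whose leading eigenvalues hold at least `keep` of the total variance."""
    vals = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    frac = np.cumsum(vals) / vals.sum()
    return int(np.searchsorted(frac, keep) + 1)

k = choose_k([50, 30, 15, 4, 1], keep=0.9)
```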
[Figure: eigenvalue decay; eigenvalues plotted against index i, with K and NM marked on the axis]
Distribution-Based Methods
Learn the distribution of image patterns of one object class from positive and negative examples
Distribution-based models for face/nonface patterns
▪ 19x19 image, 361-D vector
▪ K-means: 6 face clusters, 6 non-face clusters
▪ Multidimensional Gaussian: mean & covariance matrix
Multilayer perceptron classifier
[Sung and Poggio, 94]
Distribution-Based Methods [Sung and Poggio, 94]
Distribution-Based Methods
Masking: reduce the unwanted background noise in a face pattern
Illumination gradient correction: find the best-fit brightness plane and subtract it from the window to reduce heavy shadows caused by extreme lighting angles
Histogram equalization: compensate for imaging effects due to changes in illumination and different camera input gains
[Sung and Poggio, 94]
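The last two preprocessing steps can be sketched in NumPy as below. This is a simplified illustration, not Sung and Poggio's implementation: the plane is fitted by ordinary least squares over the whole window (no mask), and the equalization is the textbook CDF mapping for integer images; both function names are hypothetical.

```python
import numpy as np

def correct_illumination(window):
    """Fit a brightness plane a*x + b*y + c by least squares and subtract it."""
    h, w = window.shape
    ys, xs = np.mgrid[0:h, 0:w]
    A = np.column_stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    coeffs, *_ = np.linalg.lstsq(A, window.ravel().astype(float), rcond=None)
    return window - (A @ coeffs).reshape(h, w)

def equalize_histogram(window, levels=256):
    """Map each gray level through the cumulative histogram of the window."""
    img = np.clip(window, 0, levels - 1).astype(int)
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = np.cumsum(hist) / img.size
    return np.round(cdf[img] * (levels - 1)).astype(np.uint8)

# A 19x19 window that is nothing but a lighting ramp becomes flat after correction.
ys, xs = np.mgrid[0:19, 0:19]
ramp = 2.0 * xs + 3.0 * ys + 10.0
flat = correct_illumination(ramp)

img = (np.arange(19 * 19).reshape(19, 19) % 64).astype(np.uint8)
eq = equalize_histogram(img)
```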
Distance Metrics
Compute distances of a sample to all the face and non-face clusters
Within-subspace distance (D1)
▪ Mahalanobis distance of the projected sample to the cluster center
Distance to the subspace (D2)
▪ Distance of the sample to the subspace
[Sung and Poggio, 94]
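A simplified sketch of the two distances for a single cluster. It assumes the subspace is spanned by the k leading eigenvectors of the cluster covariance and uses a plain Mahalanobis distance for D1; the original method's exact normalization is not given on the slide, and `cluster_distances` is a hypothetical name.

```python
import numpy as np

def cluster_distances(x, center, cov, k):
    """Return (D1, D2) of sample x with respect to one Gaussian cluster."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = np.argsort(eigvals)[::-1][:k]       # k largest eigenvalues
    V, lam = eigvecs[:, top], eigvals[top]
    d = x - center
    proj = V.T @ d                             # coordinates inside the subspace
    d1 = np.sqrt(np.sum(proj ** 2 / lam))      # Mahalanobis distance within the subspace
    d2 = np.linalg.norm(d - V @ proj)          # Euclidean distance to the subspace
    return d1, d2

cov = np.diag([4.0, 1.0, 0.01])
d1, d2 = cluster_distances(np.array([2.0, 0.0, 3.0]), np.zeros(3), cov, k=2)
```

Stacking (D1, D2) over all face and non-face clusters gives the distance feature vector that is fed to the classifier.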
Distribution-Based Methods
Distance measure
[Sung and Poggio, 94]
Distribution-Based Methods
Feature vector for each sample: a vector of distance measurements to all clusters
Multilayer perceptron classifier
Train from database: 47316
▪ 4150 face: easy to collect
▪ Non-face: hard to get a representative sample
▪ Bootstrap method: selectively adds images to the training set as training progresses
[Sung and Poggio, 94]
Face and Non-Face Exemplars
Positive examples
▪ Get as much variation as possible
▪ Manually crop and normalize each face image into a standard size (e.g., 19 × 19)
▪ Creating virtual examples [Sung and Poggio 94]
Negative examples
▪ Fuzzy idea
▪ Any images that do not contain faces
▪ A large image subspace
▪ Bootstrapping [Sung and Poggio 94]
Creating Virtual Positive Examples
Simple and very effective method
Randomly mirror, rotate, translate and scale face samples by small amounts
Increase number of training examples
Less sensitive to alignment error
Randomly mirrored, rotated translated, and scaled faces
[Sung & Poggio 94]
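A partial sketch of virtual example generation, covering only mirroring and small translations. Rotation and scaling are omitted because they need interpolation; `np.roll` wraps pixels around the border, which is tolerable only for shifts of a pixel or so. The function name and parameters are illustrative assumptions.

```python
import numpy as np

def virtual_examples(face, n, max_shift=1, seed=0):
    """Create n randomly mirrored and slightly shifted copies of one face window."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n):
        img = np.fliplr(face) if rng.random() < 0.5 else face
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        out.append(np.roll(img, (int(dy), int(dx)), axis=(0, 1)))  # wrap-around shift
    return np.stack(out)

face = np.arange(361.0).reshape(19, 19)
extra = virtual_examples(face, n=8)
```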
Bootstrapping
1. Start with a small set of non-face examples in the training set
2. Train a MLP classifier with the current training set
3. Run the learned face detector on a sequence of random images.
4. Collect all the non-face patterns that the current system wrongly classifies as faces (i.e., false positives)
5. Add these non-face patterns to the training set
6. Go to Step 2, or stop if satisfied
Improves the system performance greatly
[Sung and Poggio, 94]
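The bootstrapping loop can be sketched generically. To keep the example self-contained, a nearest-class-mean classifier stands in for the MLP named in step 2; `bootstrap_train`, `train_fn`, `predict_fn`, and the 2D synthetic data are all assumptions made for illustration.

```python
import numpy as np

def bootstrap_train(train_fn, predict_fn, faces, nonfaces, pool, rounds=5):
    """Repeatedly retrain, adding the non-face patterns that are wrongly accepted."""
    model = None
    for _ in range(rounds):
        X = np.vstack([faces, nonfaces])
        y = np.concatenate([np.ones(len(faces), int), np.zeros(len(nonfaces), int)])
        model = train_fn(X, y)                          # step 2
        is_fp = predict_fn(model, pool) == 1            # steps 3-4: false positives
        if not is_fp.any():
            break                                       # step 6: satisfied
        nonfaces = np.vstack([nonfaces, pool[is_fp]])   # step 5
        pool = pool[~is_fp]
    return model, nonfaces

# Stand-in classifier (NOT an MLP): assign each sample to the nearer class mean.
def train_fn(X, y):
    return X[y == 1].mean(axis=0), X[y == 0].mean(axis=0)

def predict_fn(model, X):
    mu_face, mu_non = model
    closer_to_face = np.linalg.norm(X - mu_face, axis=1) < np.linalg.norm(X - mu_non, axis=1)
    return closer_to_face.astype(int)

rng = np.random.default_rng(0)
faces = rng.normal(loc=2.0, size=(50, 2))
seed_nonfaces = rng.normal(loc=-6.0, size=(5, 2))   # small, unrepresentative start set
pool = rng.normal(loc=-2.0, size=(500, 2))          # non-face patterns from random images

model, nonfaces = bootstrap_train(train_fn, predict_fn, faces, seed_nonfaces, pool)
```

The loop caps the number of rounds; if the cap is hit, the returned model is the one trained before the last batch of false positives was added.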
Probabilistic Visual Learning (B. Moghaddam and A. Pentland)
Method based on density estimation
PCA decomposition
▪ Principal subspace: distance in feature space
▪ Orthogonal complement: distance from feature space (discarded in standard PCA)
Learn local features
▪ Multivariate Gaussian
▪ Mixture of Gaussians
Detect: maximum likelihood
Mixture of Factor Analyses
Factor Analysis (FA)
Generative method that performs clustering and dimensionality reduction within each cluster
Models the covariance structure of high-dimensional data using a small number of latent variables
Similar to PCA, but different:
▪ Data density is normalized along the principal component subspace
▪ Robust to independent noise in the features
Able to detect faces in wide variations
[Yang et al. 00]
Mixture of Factor Analyses
Use a mixture model to detect faces in different poses
Use EM to estimate all the parameters in the mixture model
See also [Moghaddam and Pentland 97] on using probabilistic Gaussian mixture for object localization
[Yang et al. 00]
Fisher’s Linear Discriminant
Projects the high-dimensional image space to a low-dimensional space
Provides a better projection than PCA for pattern classification, since it aims to find the most discriminant projection direction
Outperforms the Eigenface method on several databases
[Yang et al. 00]
Fisher’s Linear Discriminant
Apply a Self-Organizing Map (SOM) to cluster faces/non-faces, and thereby obtain labels for samples
Apply FLD to find optimal projection matrix for maximal separation
Estimate class-conditional density for detection
[Yang et al. 00]
[Figure: pipeline] Unlabeled face and non-face samples → SOM → face/non-face prototypes generated by SOM → FLD → class-conditional density → maximum likelihood estimation
Neural Networks
Feasibility of training a system to capture the complex class conditional density of face patterns
Hierarchical neural networks [Agui et al. 1992]
▪ First stage: two parallel subnetworks whose inputs are intensity values from the original image and intensity values from the image filtered with a 3x3 Sobel filter
▪ Second stage: inputs are the outputs from the subnetworks and extracted feature values
Works only when all faces have the same size
Convolutional neural networks
Examples of face/non-face images: 20x20 pixels
Two neural networks:
▪ A: trained to find approximate locations of faces at some scale (select candidates)
▪ B: trained to determine the exact position of faces at some scale (verify)
Vaillant et al.
Multilayer Perceptron
Compress examples using SOM
Multilayer perceptron is used to learn them for face/background classification
Detection
▪ Scan each image at various resolutions
▪ Normalize each location and size to a standard size
▪ Classify the normalized window with an MLP
[Burel and Carel, 94]
Autoassociative network
With multiple layers, performs nonlinear principal component analysis
Different autoassociative networks for different views
▪ One to detect frontal-view faces
▪ One to detect faces turned up to 60° to the left/right
A gating network assigns weights to the frontal/side face detectors
▪ Utilized in an ensemble of autoassociative networks
Probabilistic Decision-Based Neural Network (PDBNN)
Similar to a radial basis function network, with
▪ Modified learning rules
▪ Probabilistic interpretation
Extract feature vectors based on intensity and edge information
▪ From a region containing eyebrows, eyes, nose
Feed the two vectors to the PDBNN and use fusion of the outputs to classify
[Lin et al. 1997]
Multilayer Neural Network
Train multiple multilayer perceptrons with different receptive fields [Rowley and Kanade 96].
Merging the overlapping detections within one network
Train an arbitration network to combine the results from different networks
Needs to find the right neural network architecture (number of layers, hidden units, etc.) and parameters (learning rate, etc.)
Rowley et al.
Neural Network-Based Detector
H. Rowley, S. Baluja, and T. Kanade
Dealing with Multiple Detects
Merging overlapping detections within one network [Rowley and Kanade 96]
Dealing with Multiple Detects
Arbitration among multiple networks
▪ AND operator
▪ OR operator
▪ Voting
▪ Arbitration network
Support Vector Machines
A paradigm to train polynomial function, neural networks, or radial basis function (RBF) classifiers
Methods for training a classifier (e.g., Bayesian, neural networks, radial basis function (RBF)) are based on minimizing the training error
SVMs operate on structural risk minimization, which aims to minimize an upper bound on the expected generalization error
Support Vector Machines
Find the optimal separating hyperplane constructed by support vectors [Vapnik 95]
Maximize distances between the data points closest to the separating hyperplane (large margin classifier)
Formulated as a quadratic programming problem
Kernel functions for nonlinear SVMs
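For reference, the standard textbook form of the quadratic program for the linearly separable case, and the kernel decision function, are written out below. The symbols (training pairs $(\mathbf{x}_i, y_i)$ with $y_i \in \{-1, +1\}$, multipliers $\alpha_i$, kernel $K$) are the usual notation and are not taken from the slides.

```latex
\min_{\mathbf{w},\, b} \; \tfrac{1}{2} \lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i \, (\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1, \qquad i = 1, \dots, N

f(\mathbf{x}) = \operatorname{sign}\!\Big( \sum_{i} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \Big)
```

Minimizing $\lVert \mathbf{w} \rVert$ maximizes the margin $2 / \lVert \mathbf{w} \rVert$; only the support vectors have $\alpha_i > 0$, so only they contribute to $f$.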
SVM-Based Face Detector
Adopts an architecture similar to [Sung and Poggio 94], with an SVM classifier
Pros: good recognition rate with theoretical support
Cons:
▪ Time consuming in training and testing
▪ Need to pick the right kernel
[Osuna et al. 97]
SVM-Based Face Detector: Issues
Training: solve a complex quadratic optimization problem
▪ Speed-up: Sequential Minimal Optimization (SMO) [Platt 99]
Testing: the number of support vectors may be large, requiring many kernel computations
▪ Speed-up: reduced set of support vectors [Romdhani et al. 01]
Variants: component-based SVM [Heisele et al. 01]
▪ Learn components and their geometric configuration
▪ Less sensitive to pose variation
Sparse Network of Winnows (SNoW)
A sparse network of linear functions that utilizes the Winnow update rule
▪ On-line, mistake-driven algorithm
▪ Attribute (feature) efficiency
▪ Allocation of nodes and links is data driven: complexity depends on the number of active features
▪ Allows for combining tasks hierarchically
▪ Multiplicative learning rule
Yang et al. 00
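To illustrate the mistake-driven, multiplicative update, here is a sketch of a single Winnow unit on binary features (the promotion/demotion variant often called Winnow2). It is not the SNoW architecture itself, which is a sparse network of such linear units; the threshold `n / 2`, the factor `alpha`, and the toy target concept are assumptions.

```python
import itertools
import numpy as np

def winnow_train(X, y, alpha=2.0, epochs=50):
    """Train one Winnow unit: weights change only when a mistake is made."""
    n = X.shape[1]
    w = np.ones(n)
    theta = n / 2
    for _ in range(epochs):
        mistakes = 0
        for x, label in zip(X, y):
            pred = int(w @ x >= theta)
            if pred == label:
                continue
            mistakes += 1
            active = x == 1
            if label == 1:
                w[active] *= alpha    # promotion after a missed positive
            else:
                w[active] /= alpha    # demotion after a false positive
        if mistakes == 0:
            break
    return w, theta

# Toy concept: the label is x0 OR x1; the other six features are irrelevant.
X = np.array(list(itertools.product([0, 1], repeat=8)))
y = X[:, 0] | X[:, 1]
w, theta = winnow_train(X, y)
pred = (X @ w >= theta).astype(int)
```

Only the weights of features that are active in a misclassified example are touched, which is why the cost of an update depends on the number of active features rather than on the total feature count.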
Sparse Network of Winnows (SNoW)
Multiplicative weight update algorithm
Pros: on-line feature selection [Yang et al. 00]
Detects faces with different features and expressions, in different poses, and under different lighting conditions
Cons: needs a more powerful feature representation
Has similar performance, but is computationally more efficient
Also been applied to object recognition [Yang et al. 02]
Yang et al. 00
Naive Bayes Classifier
Estimate joint probability of local appearance and position at multiple resolutions
▪ Local patterns are more unique
▪ Intensity patterns around the eyes are much more distinctive
Learn the distribution by parts using a Naïve Bayes classifier
▪ Provides better estimation of conditional density functions
▪ Provides a functional form of the posterior probability to capture the joint statistics of local appearance and position
Schneiderman and Kanade, 98
Naive Bayes Classifier
At each scale, a face image is decomposed into 4 subregions
Then project to a lower dimensional space (PCA)
Quantized into a finite set of patterns
The statistics of each projected subregion are estimated from the projected samples to encode local appearance
Schneiderman and Kanade, 98
Naive Bayes Classifier
Apply Bayes decision rule
Further decompose the appearance into space, frequency, and orientation
Also wavelet representation for general object recognition [H. Schneiderman and T. Kanade, 00]
Schneiderman and Kanade, 98
Detecting faces in Different Pose
Extend to detect faces in different pose with multiple detectors
Each detector specializes to a view: frontal, left pose and right pose
[Mikolajczyk et al. 01] extend to detect faces from side pose to frontal view
Schneiderman and Kanade, 98
Experimental Results
Schneiderman and Kanade, 98
Able to detect profile faces [Schneiderman and Kanade 98]
Extended to detect cars [Schneiderman and Kanade 00]
Hidden Markov Model
Assumptions of HMM:
▪ Patterns can be characterized as a parametric random process
▪ Parameters can be estimated in a precise, well-defined manner
Develop HMM
▪ Hidden states need to be decided
▪ Learn transition probabilities between states from examples; each example is represented as a sequence of observations
▪ Maximize the probability of observing the training data by adjusting the parameters (Viterbi segmentation method and Baum-Welch algorithm)
Hidden Markov Model
Face pattern
▪ Several regions (eye, nose, mouth, forehead, chin)
▪ Observe these regions in an appropriate order (top-bottom, left-right)
Aims to associate facial regions with the states of a continuous density Hidden Markov Model
Hidden Markov Model for Face Localization
Observe vectors: scan the window vertically with P pixels of overlap
Five hidden states
The boundaries between strips of pixels are represented by probabilistic transitions between states
Information-Theoretical Approach
Contextual constraints in a face pattern
▪ A small neighborhood of pixels
Markov random field (MRF)
▪ Convenient and consistent way to model context-dependent entities, such as image pixels and correlated features
▪ Achieved by characterizing mutual influences using conditional MRF distributions
Using Kullback relative information, select the Markov process that maximizes the information-based discrimination between the two classes
Apply to detection
Elements of Information Theory
Probability functions
▪ p(x): the template is a face
▪ q(x): the template is a non-face
Training database to estimate the distributions
▪ Face: 100 individuals x 9 views
▪ Nonface: 143000 nonface templates, using histograms
T. Cover and J. Thomas, 91
Information-Theoretical Approach
Select the most informative pixels (MIP)
▪ Maximize the Kullback relative information between p(x) and q(x)
▪ The MIP distribution focuses on the eye and mouth regions and avoids the nose area
Use MIP to obtain linear features for classification and representation [Fukunaga and Koontz]
Detect faces
▪ Pass a window over the input image
▪ Compute the distance from face space (DFFS) [Pentland et al, 94]
▪ If DFFS-Face < DFFS-Nonface, a face is assumed to exist within the window
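A simplified sketch of the pixel-selection idea: score each pixel by the Kullback relative information between its face and non-face intensity histograms, then keep the highest-scoring pixels. Treating pixels independently is an assumption made for brevity, and the function names and toy histograms are illustrative.

```python
import numpy as np

def kullback_divergence(p, q, eps=1e-12):
    """Kullback relative information sum_x p(x) log(p(x) / q(x)) for two histograms."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def most_informative_pixels(face_hists, nonface_hists, n):
    """Indices of the n pixels whose face/non-face histograms differ the most."""
    scores = np.array([kullback_divergence(p, q)
                       for p, q in zip(face_hists, nonface_hists)])
    return np.argsort(scores)[::-1][:n]

# Three pixels with two intensity bins each; pixel 1 discriminates best.
face_hists = [[0.5, 0.5], [0.9, 0.1], [0.5, 0.5]]
nonface_hists = [[0.5, 0.5], [0.1, 0.9], [0.6, 0.4]]
mip = most_informative_pixels(face_hists, nonface_hists, n=1)
```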
Information-Theoretical Approach
Apply Kullback relative information to maximize the information-based discrimination between positive and negative examples of faces
A family of discrete Markov processes
▪ Model the face and background patterns
▪ Estimate the probability model
Learning (optimization): select the Markov process that maximizes the information-based discrimination between the two classes
Colmenarez and Huang, 97
Object Detection Using Hierarchical MRF and MAP Estimation
Combine view-based and model-based approaches
Use a visual-attention algorithm to reduce the search space: select important image regions
Detect faces in selected regions
▪ Combination of template matching and feature matching
▪ Using a hierarchical Markov random field
▪ Maximum a posteriori estimation
Qian and Huang, 97
Inductive Learning
Learning by example
▪ A system tries to induce a general rule from a set of observed instances
Algorithms
▪ ID3 (Quinlan, 1986)
▪ C4.5 (Quinlan, 1993)
▪ FOIL (Quinlan, 1990)
http://sifter.org/~brandyn/InductiveLearning.html
http://www.iiia.csic.es/Projects/FedLearn/OO-Induction.html
Detection of Human Faces Using Decision Trees
Learn a decision tree from positive and negative examples of face patterns
Training example
▪ 8x8 pixel window
▪ represented by a vector of 30 attributes
▪ composed of entropy, mean, and standard deviation of the pixel intensity values
C4.5 builds a classifier as a decision tree
▪ leaves indicate class identity
▪ nodes specify tests to perform on a single attribute
The learned decision tree is then used to decide whether a face exists in the input example.
Results
▪ Localization accuracy rate of 96%
▪ On a set of 2,340 frontal face images in the FERET data set
J. Huang et al. 96
Learning the Human Face Concept from Black and White Pictures
Learn the face concept using Mitchell’s Find-S algorithm
▪ The distribution of face patterns P(x|face) can be approximated by a set of Gaussian clusters
▪ For a face instance,
Apply the Find-S algorithm to learn the thresholding distance such that faces and nonfaces can be differentiated.
Several distinct characteristics
▪ First, it does not use negative (nonface) examples
▪ Second, only the central portion of a face is used for training
▪ Third, feature vectors consist of images with 32 intensity levels or textures, while some other methods use full-scale intensity values as inputs
Detection rate of 90 percent on the first CMU data set.
N. Duta and A.K. Jain, IIPR, 1998.
( , ) max ( , ),0 1i j ij
Dis x c k Dis x c k
Face Databases
Training process is essential
Benchmark data sets
Face image databases
FERET database
▪ consists of monochrome images taken in different frontal views and in left and right profiles
▪ used to assess the strengths and weaknesses of different face recognition approaches
▪ since each image consists of an individual on a uniform and uncluttered background, it is not suitable for face detection benchmarking
Turk and Pentland
16 people
Images are taken in frontal view with slight variability in head orientation (tilted upright, right, and left) on a cluttered background
ftp://whitechapel.media.mit.edu/pub/images/
AT&T Cambridge Laboratories
Formerly known as the Olivetti database
10 images for each of 40 distinct subjects
Different time, lighting, facial expression, facial details
http://www.uk.research.att.com/facedatabase.html
Harvard Database
Cropped, masked frontal face images
Taken under a wide variety of light sources
Study on face recognition under the effect of varying illumination conditions
Yale Face Database
5760 single light source images of 10 subjects, each seen under 576 viewing conditions (9 poses x 64 illumination conditions)
For every subject in a particular pose, an image with ambient (background) illumination was also captured
Total number of images is in fact 5760+90=5850
Total size of the compressed database is ~1GB
http://vision.ucsd.edu/~leekc/ExtYaleDatabase/Yale%20Face%20Database.htm
M2VTS Multimodal Database
Developed for access control experiments using multimodal inputs
Contains sequences of face images of 37 people
Five sequences for each subject were taken over one week
Each image sequence contains images from right profile (-90 degrees) to left profile (90 degrees), taken while the subject counts from “0” to “9” in their native language
UMIST Database
564 images of 20 people with varying pose.
The images of each subject cover a range of poses from right profile to frontal views
Purdue AR Database
3,276 color images of 126 people (70 males + 56 females) in frontal view
Designed for face recognition experiments under several mixing factors, such as facial expressions, illumination conditions, and occlusions
Has also been applied to image and video indexing as well as retrieval
All the faces appear with different facial expressions (neutral, smile, anger, and scream), illumination (left light source, right light source, and sources from both sides), and occlusion (wearing sunglasses or scarf)
Taken during two sessions separated by two weeks, with the same camera setup under tightly controlled conditions of illumination and pose
A. Martinez and R. Benavente, 1998
Face Image Databases
http://web.mit.edu/emeyers/www/face_databases.html
The abovementioned databases are designed mainly to measure performance of face recognition methods and, thus, each image contains only one individual.
Best utilized as training sets rather than test sets
Benchmark Test Sets
K.-K. Sung and T. Poggio, 96&98
First: 301 frontal and near-frontal mugshots of 71 different people
▪ High quality digitized images with a fair amount of lighting variation
Second: 23 images with a total of 149 face patterns
▪ Most of these images have complex backgrounds, with faces taking up only a small amount of the image area
Samples of Sung and Poggio 98
Some images are scanned from newspapers and, thus, have low resolution.
Though most faces in the images are upright and frontal, some faces appear in different poses
Database by Rowley et al.
130 images with a total of 507 frontal faces
Also includes the 23 images of the second data set used by [Sung and Poggio, 1998]
Most images contain more than one face on a cluttered background
A good test set to assess algorithms which detect upright frontal faces.
http://vasc.ri.cmu.edu/NNFaceDetector/
Database by Rowley et al.
Some images contain hand-drawn cartoon faces.
Most images contain more than one face and the face size varies significantly.
http://vasc.ri.cmu.edu/NNFaceDetector/
Another Database by Rowley et al.
For detecting 2D faces with frontal pose and rotation in image
50 images with a total of 223 faces, of which 210 are at angles > 10 degrees.
Profile Views Database
208 images
Each image contains faces with facial expressions and in profile views
Schneiderman and Kanade, 00
Kodak Face Database
A common test bed for direct benchmarking of face detection and recognition algorithms
300 digital photos
Captured in a variety of resolutions
Face size ranges from as small as 13x13 pixels to as large as 300x300 pixels.
Test Sets for Face Detection
Performance Evaluation
They were not tested on the same test set
Performance among several appearance-based face detection methods on two standard data sets Test Set 1 (125 Images with 483 Faces) and Test Set 2 (23 Images with 136 Faces)
Experimental Results
Appearance-based face detection methods
The number and variety of training examples have a direct effect on the classification performance
More Issues
Training time and execution time
The number of scanning windows varies a lot
Different criteria adopted in reporting the detection rates
A loose criterion may declare all the faces as “successful” detections, while a stricter one would declare most of them as nonfaces.
More Issues
Training time and execution time
The number of scanning windows varies a lot
Different criteria adopted in reporting the detection rates
The evaluation criteria may and should depend on the purpose of the detector
Required computational resources, particularly time and memory
A collection of sample face detection codes and evaluation tools
http://vision.ai.uiuc.edu/mhyang/face-detection-survey.html
Detecting Faces in Images: A Survey
Provide a comprehensive survey of research on face detection
Provide some structural categories for the methods described in over 150 papers
It is imprudent to explicitly declare which methods indeed have the lowest error rates
The community needs to more seriously consider systematic performance evaluation
Challenging and Interesting Problem
The class of faces admits a great deal of variability in shape, color, and albedo, due to differences in individuals, nonrigidity, facial hair, glasses, and makeup
Images are formed under variable lighting and 3D pose and may have cluttered backgrounds
Robust real-time face detection
Paul A. Viola and Michael J. JonesIntl. J. Computer Vision57(2), 137–154, 2004(originally in CVPR’2001)(slides adapted from Bill Freeman, MIT 6.869, April 2005)
Face Recognition and Detection, CSE 576, Spring 2008
Scan classifier over locations and scales
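Scanning over locations and scales can be sketched as a generator of square sub-windows. The 24-pixel base window, the 1.25 scale step, and the function name `scan_windows` are assumptions made for illustration. The slide does not state them.

```python
def scan_windows(width, height, base=24, scale_step=1.25, shift=1.0):
    """Yield (x, y, size) for square sub-windows at every location and scale.

    base, scale_step and shift are assumed values, not taken from the slide.
    """
    size = float(base)
    while size <= min(width, height):
        s = int(round(size))
        # Shift the window by more pixels as the window grows.
        step = max(1, int(round(shift * size / base)))
        for y in range(0, height - s + 1, step):
            for x in range(0, width - s + 1, step):
                yield x, y, s
        size *= scale_step


windows = list(scan_windows(384, 288))
print(len(windows), "sub-windows to classify")
```

Each yielded window would be passed to the classifier. The count shows why the per-window cost dominates the running time.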
“Learn” classifier from data
Training Data
• 5000 faces (frontal)
• 10^8 non-faces
• Faces are normalized (scale, translation)
Many variations
• Across individuals
• Illumination
• Pose (rotation both in plane and out)
Characteristics of algorithm
• Feature set (… is huge: about 16M features)
• Efficient feature selection using AdaBoost
• New image representation: Integral Image
• Cascaded Classifier for rapid detection
Fastest known face detector for gray scale images
Image features
• “Rectangle filters”, similar to Haar wavelets
• Differences between sums of pixels in adjacent rectangles
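As a rough illustration of one rectangle filter, the sketch below computes a two-rectangle feature directly from pixel sums. The patch values and the helper names `rect_sum` and `two_rect_feature` are hypothetical. A real detector would use the integral image rather than summing pixels each time.

```python
def rect_sum(img, x, y, w, h):
    """Sum of pixels in the w-by-h rectangle with top-left corner (x, y)."""
    return sum(img[r][c] for r in range(y, y + h) for c in range(x, x + w))


def two_rect_feature(img, x, y, w, h):
    """Left rectangle sum minus the sum of the adjacent rectangle to its right."""
    return rect_sum(img, x, y, w, h) - rect_sum(img, x + w, y, w, h)


# Hypothetical 4x4 patches: a vertical edge and a uniform region.
edge_patch = [[9, 9, 1, 1] for _ in range(4)]
flat_patch = [[5, 5, 5, 5] for _ in range(4)]

edge_response = two_rect_feature(edge_patch, 0, 0, 2, 4)
flat_response = two_rect_feature(flat_patch, 0, 0, 2, 4)
print(edge_response, flat_response)
```

The filter responds strongly to the edge and gives zero on the uniform patch.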
Integral Image
Partial sum
The sum over any rectangle D is computed from the integral image values at its four corners: D = 1+4-(2+3)
Also known as:
• summed area tables [Crow84]
• boxlets [Simard98]
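The four-corner lookup D = 1+4-(2+3) can be sketched as follows. The zero-padded layout and the function names `integral_image` and `rect_sum` are choices made for this example, not details taken from the slides.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Rectangle sum from four lookups: D = 1 + 4 - (2 + 3)."""
    p1 = ii[y][x]          # corner 1: above-left of the rectangle
    p2 = ii[y][x + w]      # corner 2: above-right
    p3 = ii[y + h][x]      # corner 3: below-left
    p4 = ii[y + h][x + w]  # corner 4: below-right
    return p1 + p4 - (p2 + p3)


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
total = rect_sum(ii, 0, 0, 3, 3)  # whole image
inner = rect_sum(ii, 1, 1, 2, 2)  # 5 + 6 + 8 + 9
print(total, inner)
```

After one pass to build the table, any rectangle sum costs four lookups regardless of its size.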
Huge library of filters
Constructing the classifier
Perceptron yields a sufficiently powerful classifier
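Use AdaBoost to efficiently choose best features
• add a new $h_i(x)$ at each round
• each $h_i(x_k)$ is a “decision stump”
Decision stump on the filter response $x$ with threshold $\theta$: $h_i(x) = a$ if $x < \theta$, and $h_i(x) = b$ if $x > \theta$, where $a = E_w(y\,[x < \theta])$ and $b = E_w(y\,[x > \theta])$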
Constructing the classifier
For each round of boosting:
• Evaluate each rectangle filter on each example
• Sort examples by filter values
• Select best threshold for each filter (min error); use sorting to quickly scan for optimal threshold
• Select best filter/threshold combination
• Weight is a simple function of error rate
• Reweight examples
(There are many tricks to make this more efficient.)
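One round of the loop above might look like the sketch below. It uses the discrete AdaBoost update with ±1 stump outputs, which is an assumption: the slides do not say which boosting variant or weight formula is used. All names (`best_stump`, `stump_predict`, `boost_round`) and the toy data are hypothetical.

```python
import math


def best_stump(values, labels, weights):
    """Scan sorted filter values for the threshold with minimum weighted error.

    Returns (error, threshold, polarity). With polarity +1 the stump predicts
    a face when value < threshold; with polarity -1 when value >= threshold.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    total_pos = sum(w for w, y in zip(weights, labels) if y > 0)
    total_neg = sum(w for w, y in zip(weights, labels) if y < 0)
    pos_below = neg_below = 0.0  # weight of examples strictly below threshold
    best = (float("inf"), 0.0, 1)
    prev = None
    for i in order:
        v = values[i]
        if v != prev:  # evaluate each distinct value once as a threshold
            err_lt = neg_below + (total_pos - pos_below)
            err_ge = pos_below + (total_neg - neg_below)
            best = min(best, (err_lt, v, 1), (err_ge, v, -1))
            prev = v
        if labels[i] > 0:
            pos_below += weights[i]
        else:
            neg_below += weights[i]
    return best


def stump_predict(value, threshold, polarity):
    below = value < threshold
    return 1 if (below if polarity == 1 else not below) else -1


def boost_round(features, labels, weights):
    """features[j][k] is the response of filter j on example k."""
    candidates = []
    for j, values in enumerate(features):
        err, thr, pol = best_stump(values, labels, weights)
        candidates.append((err, j, thr, pol))
    err, j, thr, pol = min(candidates)       # best filter/threshold pair
    err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
    alpha = 0.5 * math.log((1 - err) / err)  # assumed discrete AdaBoost weight
    new_w = [w * math.exp(-alpha * y * stump_predict(features[j][k], thr, pol))
             for k, (w, y) in enumerate(zip(weights, labels))]
    norm = sum(new_w)
    return (j, thr, pol, alpha), [w / norm for w in new_w]


# Toy data: 2 filters, 4 examples (+1 = face, -1 = non-face).
features = [[0.1, 0.4, 0.35, 0.8],
            [2.0, 2.0, 2.0, 2.0]]
labels = [1, 1, -1, -1]
weights = [0.25] * 4
chosen, new_weights = boost_round(features, labels, weights)
print(chosen, new_weights)
```

After the round, the one misclassified example carries half of the total weight, so the next round concentrates on it.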
Good reference on boosting
Friedman, J., Hastie, T. and Tibshirani, R. Additive Logistic Regression: a Statistical View of Boosting
http://www-stat.stanford.edu/~hastie/Papers/boost.ps
“We show that boosting fits an additive logistic regression model by stagewise optimization of a criterion very similar to the log-likelihood, and present likelihood based alternatives. We also propose a multi-logit boosting procedure which appears to have advantages over other methods proposed so far.”
Trading speed for accuracy
Given a nested set of classifier hypothesis classes
Computational Risk Minimization
Speed of face detector (2001)
Speed is proportional to the average number of features computed per sub-window.
On the MIT+CMU test set, an average of 9 features (out of 6061) are computed per sub-window.
On a 700 MHz Pentium III, a 384x288 pixel image takes about 0.067 seconds to process (15 fps).
Roughly 15 times faster than Rowley-Baluja-Kanade and 600 times faster than Schneiderman-Kanade.
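The link between speed and the average number of features per sub-window follows from early rejection in the cascade. The sketch below illustrates the idea only. The stage sizes, thresholds, and per-stage scores are invented for illustration and are not the detector's actual values, and `run_cascade` is a hypothetical name.

```python
def run_cascade(stage_scores, stages):
    """Pass one sub-window through the cascade, stopping at the first rejection.

    stage_scores[i] stands in for the score stage i would compute;
    stages is a list of (num_features, threshold) pairs.
    Returns (is_face, features_computed).
    """
    used = 0
    for score, (num_features, threshold) in zip(stage_scores, stages):
        used += num_features  # cost is paid only for stages actually reached
        if score < threshold:
            return False, used
    return True, used


# Invented cascade: three stages with 2, 10 and 50 features.
stages = [(2, 0.5), (10, 0.5), (50, 0.5)]
windows = ([[0.1, 0.0, 0.0]] * 7    # rejected by stage 1
           + [[0.9, 0.2, 0.0]] * 2  # rejected by stage 2
           + [[0.9, 0.8, 0.7]])     # passes every stage

results = [run_cascade(w, stages) for w in windows]
avg_features = sum(used for _, used in results) / len(results)
print(avg_features, "features per sub-window on average, out of 62")
```

Because most sub-windows are rejected by the cheap early stages, the average cost stays far below the total number of features in the cascade.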
Sample results
Summary (Viola-Jones)
• Fastest known face detector for gray images
• Three contributions with broad applicability:
  Cascaded classifier yields rapid classification
  AdaBoost as an extremely efficient feature selector
  Rectangle Features + Integral Image can be used for rapid image analysis
Face detector comparison
Informal study by Andrew Gallagher, CMU, for CMU 16-721 Learning-Based Methods in Vision, Spring 2007
The Viola-Jones algorithm OpenCV implementation was used (<2 sec per image).
For Schneiderman and Kanade, Object Detection Using the Statistics of Parts [IJCV’04], the www.pittpatt.com demo was used (~10-15 seconds per image, including web transmission).
[Figure: detection results compared, Schneiderman-Kanade and Viola-Jones]
Example-based Caricature Generation with Exaggeration
Lin Liang (1), Hong Chen (2), Ying-Qing Xu (1), Heung-Yeung Shum (1)
(1) Microsoft Research, Asia
(2) Xi’an Jiaotong University, China
Labeled feature points
Training data include 92 pairs of original facial images <--> exaggerated caricatures drawn by an artist
System Framework
Exaggerated Caricature
Original image
Unexaggerated sketch
Exaggerated caricature
Caricature by the artist
Apply to the image