
Detecting Faces in Images: A Survey

Ming-Hsuan Yang, Member, IEEE, David J. Kriegman, Senior Member, IEEE, Narendra Ahuja, Fellow, IEEE

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 24, NO. 1, JANUARY 2002


Face Detection

Given a single image, identify all image regions that contain a face, regardless of
▪ its 3D position
▪ orientation
▪ lighting conditions

Categorize and evaluate different algorithms


Methods to Detect/Locate Faces

Knowledge-based methods: encode human knowledge of what constitutes a typical face (usually, the relationships between facial features)

Feature invariant approaches: aim to find structural features of a face that exist even when the pose, viewpoint, or lighting conditions vary

Template matching methods: several standard patterns are stored to describe the face as a whole or the facial features separately

Appearance-based methods: the models (or templates) are learned from a set of training images that capture the representative variability of facial appearance


Appearance-Based Methods

Learn appearance “templates” from examples in images

Statistical analysis and machine-learning

Train a classifier using positive (and usually negative) examples of faces
▪ Representation
▪ Preprocessing
▪ Train a classifier
▪ Search strategy
▪ Postprocessing
▪ View-based


Bayesian Classifier

Image or feature vector: random variable x
Class-conditional densities: p(x | face) and p(x | nonface)
x is high-dimensional and p(x | ..) is multimodal
No natural parameterized forms
Use an empirically validated parametric or non-parametric approximation
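
As a rough illustration of how the two class-conditional densities can be used, the sketch below classifies a feature vector with a likelihood ratio test. It assumes, purely for illustration, that each density is a single Gaussian; the slide itself notes that no natural parametric form exists, and the names `gaussian_log_density`, `is_face`, and the toy models are hypothetical.

```python
import numpy as np

def gaussian_log_density(x, mean, cov):
    """Log of a multivariate Gaussian density evaluated at x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def is_face(x, face_model, nonface_model, log_threshold=0.0):
    """Likelihood ratio test: log p(x|face) - log p(x|nonface) > threshold."""
    log_ratio = (gaussian_log_density(x, *face_model)
                 - gaussian_log_density(x, *nonface_model))
    return bool(log_ratio > log_threshold)

# Toy 2-D models (assumed values, not taken from any paper).
face_model = (np.array([1.0, 1.0]), np.eye(2))
nonface_model = (np.array([-1.0, -1.0]), np.eye(2))
```

Raising `log_threshold` trades detections for fewer false alarms; the threshold value here is an arbitrary choice.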


Appearance-based Methods: Classifiers

Neural network: multilayer perceptrons
Principal Component Analysis (PCA), factor analysis
Mixture of PCA, mixture of factor analyzers
Support vector machine (SVM)
Distribution-based method
Naïve Bayes classifier
Hidden Markov model
Sparse network of winnows (SNoW)
Kullback relative information
Inductive learning: C4.5
AdaBoost
…


Eigenfaces

Face images are linearly encoded using a modest number of basis images [Kirby and Sirovich]
Principal Component Analysis (PCA)
[Figure: N sample face images of size m x n, each treated as an mn-dimensional vector, are reduced to K basis vectors (eigenfaces), K << N]
Minimize the mean square error between the projection of the training images onto this subspace and the original images

Eigenfaces for recognition

Matthew Turk and Alex Pentland, J. Cognitive Neuroscience, 1991

Face Recognition and Detection

Linear subspaces

Classification can be expensive: a big search problem (e.g., nearest neighbors) or storing large PDFs

Suppose the data points (the orange points in the figure) lie close to a line
Idea: fit a line; the classifier measures the distance to the line

CSE 576, Spring 2008

Convert x into v1, v2 coordinates

What does the v2 coordinate measure?
▪ distance to the line
▪ use it for classification: near 0 for orange points

What does the v1 coordinate measure?
▪ position along the line
▪ use it to specify which orange point it is


Dimensionality reduction

• We can represent the orange points with only their v1 coordinates (since v2 coordinates are all essentially 0)
• This makes it much cheaper to store and compare points
• A bigger deal for higher-dimensional problems


Linear subspaces


Consider the variation along direction v among all of the orange points:

What unit vector v minimizes var?

What unit vector v maximizes var?

Solution:
▪ v1 is the eigenvector of A with the largest eigenvalue (maximizes var)
▪ v2 is the eigenvector of A with the smallest eigenvalue (minimizes var)
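
The eigenvector solution can be checked numerically. The sketch below assumes A is the scatter matrix of the mean-centered points (the slide's own definition of A did not survive extraction, so this is an assumption), and the function name `principal_directions` and the toy data are hypothetical.

```python
import numpy as np

def principal_directions(points):
    """Eigenvalues and eigenvectors of the scatter matrix A, largest first."""
    centered = points - points.mean(axis=0)
    A = centered.T @ centered              # assumed definition of A
    eigvals, eigvecs = np.linalg.eigh(A)   # ascending order for symmetric A
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Toy data: points along the line y = x with a small perpendicular wiggle.
t = np.linspace(-1.0, 1.0, 21)
wiggle = 0.01 * np.sin(7.0 * t)
points = np.column_stack([t - wiggle, t + wiggle])

eigvals, eigvecs = principal_directions(points)
v1, v2 = eigvecs[:, 0], eigvecs[:, 1]   # v1: along the line, v2: across it
```

Here v1 comes out nearly parallel to the direction (1, 1), and the variance of the points along v2 is close to zero, which is what makes v2 useful as a "distance to line" measure.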


Principal component analysis

Suppose each data point is N-dimensional
Same procedure applies:

The eigenvectors of A define a new coordinate system
▪ eigenvector with largest eigenvalue captures the most variation among training vectors x
▪ eigenvector with smallest eigenvalue has least variation

We can compress the data using the top few eigenvectors
▪ corresponds to choosing a “linear subspace”: represent points on a line, plane, or “hyper-plane”
▪ these eigenvectors are known as the principal components



The space of faces

An image is a point in a high-dimensional space
An N x M image is a point in R^{NM}

We can define vectors in this space as we did in the 2D case




Dimensionality reduction

The set of faces is a “subspace” of the set of images
We can find the best subspace using PCA
This is like fitting a “hyper-plane” to the set of faces

▪ spanned by vectors v1, v2, ..., vK

▪ any face x ≈ x̄ + a1 v1 + a2 v2 + ... + aK vK, where x̄ is the mean face



Eigenfaces

PCA extracts the eigenvectors of A
Gives a set of vectors v1, v2, v3, ...
Each vector is a direction in face space

▪ what do these look like?



Projecting onto the eigenfaces

The eigenfaces v1, ..., vK span the space of faces
A face x is converted to eigenface coordinates (a1, ..., aK) by projecting it onto each eigenface: a_i = v_i · (x − x̄), where x̄ is the mean face



Recognition with eigenfaces

Algorithm

1. Process the image database (set of images with labels)
• Run PCA: compute eigenfaces
• Calculate the K coefficients for each image

2. Given a new image (to be recognized) x, calculate K coefficients

3. Detect if x is a face

4. If it is a face, who is it?
▪ Find the closest labeled face in the database
▪ nearest neighbor in K-dimensional space
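
A minimal sketch of steps 1, 2, and 4, assuming flattened images stored as rows of an array and PCA computed through an SVD of the mean-centered data. The function names, the random stand-in "images", and the labels are all hypothetical; step 3 (deciding whether x is a face at all) is not shown.

```python
import numpy as np

def fit_eigenfaces(images, k):
    """images: (N, D) array of flattened faces. Returns mean face and (k, D) eigenfaces."""
    mean = images.mean(axis=0)
    centered = images - mean
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def project(x, mean, eigenfaces):
    """K eigenface coefficients of image x."""
    return eigenfaces @ (x - mean)

def recognize(x, mean, eigenfaces, coeffs, labels):
    """Label of the nearest database face in K-dimensional coefficient space."""
    a = project(x, mean, eigenfaces)
    distances = np.linalg.norm(coeffs - a, axis=1)
    return labels[int(np.argmin(distances))]

# Stand-in database: 6 random 64-pixel "images" (real faces would go here).
rng = np.random.default_rng(0)
images = rng.normal(size=(6, 64))
labels = ["person%d" % i for i in range(6)]
mean, eigenfaces = fit_eigenfaces(images, k=4)
coeffs = np.array([project(img, mean, eigenfaces) for img in images])
```

Storing only `coeffs` (K numbers per image) instead of the full images is where the storage and comparison savings come from.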



Choosing the dimension K

How many eigenfaces to use?
Look at the decay of the eigenvalues
▪ the eigenvalue tells you the amount of variance “in the direction” of that eigenface
▪ ignore eigenfaces with low variance

[Plot: eigenvalues versus index i, from 1 to NM, with the cutoff K marked]
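
One common way to turn "look at the decay" into a rule is to keep the smallest K that retains a chosen fraction of the total variance. The sketch below assumes that rule; the 95% default and the function name `choose_k` are illustrative choices, not something the slides prescribe.

```python
import numpy as np

def choose_k(eigenvalues, variance_fraction=0.95):
    """Smallest K whose top-K eigenvalues retain the given fraction of total variance."""
    eigenvalues = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
    # First index whose cumulative fraction reaches the target, as a 1-based count.
    return int(np.searchsorted(cumulative, variance_fraction) + 1)
```

For eigenvalues 50, 30, 15, 4, 1 the cumulative fractions are 0.5, 0.8, 0.95, 0.99, 1.0, so a target of 0.9 keeps three eigenfaces.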


Distribution-Based Methods

Learn the distribution of image patterns of one object class from positive and negative examples [Sung and Poggio, 94]
Distribution-based models for face/nonface patterns
▪ 19x19 image, 361-D vector
▪ K-means: 6 face clusters, 6 non-face clusters
▪ Multidimensional Gaussian: mean & covariance matrix
Multilayer perceptron classifier


Distribution-Based Methods [Sung and Poggio, 94]



Masking: reduce the unwanted background noise in a face pattern

Illumination gradient correction: find the best-fit brightness plane and subtract it from the window to reduce heavy shadows caused by extreme lighting angles

Histogram equalization: compensates for imaging effects due to changes in illumination and different camera input gains



Distance Metrics

Compute distances of a sample to all the face and non-face clusters
Within-subspace distance (D1)
▪ Mahalanobis distance of the projected sample to the cluster center
Distance to the subspace (D2)
▪ Distance of the sample to the subspace
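
The two distances can be sketched for a single Gaussian cluster as follows. This is a simplified version under stated assumptions: D1 is taken as the plain squared Mahalanobis distance inside the subspace spanned by the largest eigenvectors of the cluster covariance (any normalization terms used in the original work are omitted), and D2 as the Euclidean distance from the sample to that subspace. The function name and arguments are hypothetical.

```python
import numpy as np

def cluster_distances(x, center, cov, n_components):
    """Return (D1, D2) of sample x relative to one Gaussian cluster."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    basis = eigvecs[:, order]        # columns span the cluster subspace
    lam = eigvals[order]
    diff = x - center
    coords = basis.T @ diff          # coordinates of the projected sample
    d1 = float(np.sum(coords ** 2 / lam))     # squared Mahalanobis distance in the subspace
    residual = diff - basis @ coords          # component outside the subspace
    d2 = float(np.linalg.norm(residual))      # Euclidean distance to the subspace
    return d1, d2
```

Calling this once per cluster and concatenating the (D1, D2) pairs yields a vector of distance measurements to all clusters.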




Distance measure




Feature vector for each sample
▪ A vector of distance measurements to all clusters

Multilayer perceptron classifier
Trained from a database of 47,316 patterns
▪ 4,150 faces: easy to collect
▪ Non-faces: hard to get a representative sample
▪ Bootstrap method: selectively adds images to the training set as training progresses



Face and Non-Face Exemplars

Positive examples
▪ Get as much variation as possible
▪ Manually crop and normalize each face image into a standard size (e.g., 19 × 19)
▪ Creating virtual examples [Sung and Poggio 94]

Negative examples
▪ Fuzzy idea: any images that do not contain faces
▪ A large image subspace
▪ Bootstrapping [Sung and Poggio 94]


Creating Virtual Positive Examples

Simple and very effective method

Randomly mirror, rotate, translate and scale face samples by small amounts

Increase number of training examples

Less sensitive to alignment error

Randomly mirrored, rotated, translated, and scaled faces

[Sung & Poggio 94]
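
A partial sketch of virtual example generation, covering only random mirroring and small integer translations (rotation and scaling need image interpolation and are left out). The function name, the edge-padding choice, and the default shift of one pixel are assumptions made for illustration.

```python
import numpy as np

def virtual_examples(face, rng, n=4, max_shift=1):
    """Return n randomly mirrored and slightly translated copies of a 2-D face image."""
    height, width = face.shape
    out = []
    for _ in range(n):
        # Mirror left-right with probability 0.5.
        sample = face[:, ::-1] if rng.random() < 0.5 else face
        # Shift by up to max_shift pixels, repeating edge pixels at the border.
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        padded = np.pad(sample, max_shift, mode="edge")
        r0, c0 = max_shift + dy, max_shift + dx
        out.append(padded[r0:r0 + height, c0:c0 + width])
    return out
```

Each call multiplies the number of positive examples without any new labeling effort, at the cost of the examples being correlated with their source image.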


Bootstrapping

1. Start with a small set of non-face examples in the training set

2. Train an MLP classifier with the current training set

3. Run the learned face detector on a sequence of random images

4. Collect all the non-face patterns that the current system wrongly classifies as faces (i.e., false positives)

5. Add these non-face patterns to the training set

6. Go to Step 2, or stop if satisfied

Greatly improves system performance
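
The loop above can be sketched in a few lines. To stay self-contained, the sketch swaps the MLP for a nearest-centroid classifier, which is a deliberate simplification and not the classifier used by Sung and Poggio; all function names and the idea of a fixed `nonface_pool` standing in for "random images" are assumptions.

```python
import numpy as np

def train_centroids(faces, nonfaces):
    """Stand-in classifier: one mean vector per class."""
    return faces.mean(axis=0), nonfaces.mean(axis=0)

def predict_face(model, samples):
    """True where a sample is closer to the face mean than to the non-face mean."""
    face_mean, nonface_mean = model
    d_face = np.linalg.norm(samples - face_mean, axis=1)
    d_nonface = np.linalg.norm(samples - nonface_mean, axis=1)
    return d_face < d_nonface

def bootstrap(faces, nonfaces, nonface_pool, max_rounds=10):
    """Grow the non-face training set with the current model's false positives."""
    model = train_centroids(faces, nonfaces)
    for _ in range(max_rounds):
        wrong = predict_face(model, nonface_pool)      # false positives in the pool
        if not wrong.any():
            break
        nonfaces = np.vstack([nonfaces, nonface_pool[wrong]])
        nonface_pool = nonface_pool[~wrong]            # do not collect them twice
        model = train_centroids(faces, nonfaces)
    return model, nonfaces
```

The point of the procedure is that the non-face set is built from the patterns the detector finds hardest, rather than from an attempt to sample the whole space of non-faces.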



Probabilistic Visual Learning method based on density estimation

distance in feature space

distance from feature space

(B. Moghaddam and A. Pentland)

PCA decomposition
▪ Principal subspace
▪ Orthogonal complement (discarded in standard PCA)

Learn local features
▪ Multivariate Gaussian
▪ Mixture of Gaussians

Detect
▪ Maximum likelihood


Mixture of Factor Analyses

Factor Analysis (FA)
Generative method that performs clustering and dimensionality reduction within each cluster

Models the covariance structure of high-dimensional data using a small number of latent variables

Similar to PCA, but differs in that
▪ Data density is normalized along the principal component subspace
▪ Robust to independent noise in the features

Able to detect faces under wide variations

[Yang et al. 00]


Mixture of Factor Analyses

Use a mixture model to detect faces in different poses

Use EM to estimate all the parameters in the mixture model

See also [Moghaddam and Pentland 97] on using probabilistic Gaussian mixture for object localization

[Yang et al. 00]


Fisher’s Linear Discriminant

Projects the high-dimensional image space to a low-dimensional space
Provides a better projection than PCA for pattern classification, since it aims to find the most discriminant projection direction

Outperforms the Eigenface method on several databases

[Yang et al. 00]


Fisher’s Linear Discriminant

Apply a Self-Organizing Map (SOM) to cluster faces/non-faces and thereby generate labels for samples

Apply FLD to find optimal projection matrix for maximal separation

Estimate class-conditional density for detection

[Yang et al. 00]

Pipeline: unlabeled face and non-face samples → SOM → face/non-face prototypes generated by SOM → FLD → class-conditional density → maximum likelihood estimation


Neural Networks

Feasibility of training a system to capture the complex class-conditional density of face patterns

Hierarchical neural networks [Agui et al. 1992]
▪ First stage: two parallel subnetworks whose inputs are intensity values from the original image and intensity values from the image filtered with a 3x3 Sobel filter
▪ Second stage: inputs are the outputs from the subnetworks and extracted feature values

Works only when all faces have the same size


Convolutional neural networks

Examples of face/non-face images: 20x20 pixels

Two neural networks:
▪ A: trained to find approximate locations of faces at some scale (select candidates)
▪ B: trained to determine the exact position of faces at some scale (verify)

Vaillant et al.


Multilayer Perceptron

Compress examples using SOM

Multilayer perceptron is used to learn them for face/background classification

Detection
▪ Scan each image at various resolutions
▪ Normalize each location and size to a standard size
▪ Classify the normalized window with an MLP

[Burel and Carel, 94]


Autoassociative network

With multiple layers, performs nonlinear principal component analysis

Different autoassociative networks:
▪ One to detect frontal-view faces
▪ One to detect faces turned up to 60° to the left/right

A gating network assigns weights to the frontal/side face detectors
▪ Utilized in an ensemble of autoassociative networks


Probabilistic Decision-Based Neural Network (PDBNN)

Similar to a radial basis function network, with
▪ Modified learning rules
▪ Probabilistic interpretation

Extract feature vectors based on intensity and edge information from a region that contains the eyebrows, eyes, and nose

Feed the two vectors to the PDBNN and use the fusion of the outputs to classify

[Lin et al. 1997]


Multilayer Neural Network

Train multiple multilayer perceptrons with different receptive fields [Rowley and Kanade 96].

Merging the overlapping detections within one network

Train an arbitration network to combine the results from different networks

Needs to find the right neural network architecture (number of layers, hidden units, etc.) and parameters (learning rate, etc.)

Rowley et al.


Neural Network-Based Detector (H. Rowley, S. Baluja, and T. Kanade)


Dealing with Multiple Detects

Merging overlapping detections within one network [Rowley and Kanade 96]


Dealing with Multiple Detects

Arbitration among multiple networks
▪ AND operator
▪ OR operator
▪ Voting
▪ Arbitration network


Support Vector Machines

A paradigm to train polynomial function, neural networks, or radial basis function (RBF) classifiers

Most methods for training a classifier (e.g., Bayesian, neural networks, radial basis function) are based on minimizing the training error

SVMs operate on the principle of structural risk minimization, which aims to minimize an upper bound on the expected generalization error


Support Vector Machines

Find the optimal separating hyperplane constructed by support vectors [Vapnik 95]

Maximize distances between the data points closest to the separating hyperplane (large margin classifier)

Formulated as a quadratic programming problem

Kernel functions support nonlinear SVMs
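
For reference, the quadratic program mentioned above can be written out for the linearly separable case. This is the standard textbook hard-margin formulation, not a transcription from the slides; the symbols $w$, $b$, $\alpha_i$, and $K$ are the usual ones and are introduced here for illustration.

```latex
% Training: maximize the margin 2 / \|w\| over labeled samples (x_i, y_i), y_i \in \{-1, +1\}
\min_{w,\, b} \ \tfrac{1}{2} \|w\|^2
\quad \text{subject to} \quad
y_i \left( w^\top x_i + b \right) \ge 1, \qquad i = 1, \dots, n

% Classification with a kernel K: only support vectors have \alpha_i > 0
f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i \, y_i \, K(x_i, x) + b \right)
```

The second expression shows why test time grows with the number of support vectors: each one contributes a kernel evaluation.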


SVM-Based Face Detector

Adopts an architecture similar to [Sung and Poggio 94], with an SVM classifier

Pros: good recognition rate with theoretical support

Cons:
▪ Time consuming in training and testing
▪ Need to pick the right kernel

[Osuna et al. 97]


SVM-Based Face Detector: Issues

Training: solve a complex quadratic optimization problem
▪ Speed-up: Sequential Minimal Optimization (SMO) [Platt 99]

Testing: the number of support vectors may be large, requiring lots of kernel computations
▪ Speed-up: reduced set of support vectors [Romdhani et al. 01]

Variants: component-based SVM [Heisele et al. 01]
▪ Learn components and their geometric configuration
▪ Less sensitive to pose variation


Sparse Network of Winnows (SNoW)

A sparse network of linear functions that utilizes the Winnow update rule

Online, mistake-driven algorithm
Attribute (feature) efficiency
Allocation of nodes and links is data driven
▪ complexity depends on the number of active features
Allows for combining tasks hierarchically
Multiplicative learning rule
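
The Winnow update rule at the core of SNoW can be illustrated on a single linear unit. This is a sketch of the classic Winnow algorithm on binary features, not of the SNoW architecture itself; the threshold (number of features), the promotion factor of 2, and the function names are conventional or illustrative choices.

```python
import itertools
import numpy as np

def train_winnow(X, y, alpha=2.0, epochs=50):
    """X: (n_samples, n_features) binary array; y: 0/1 labels."""
    n_features = X.shape[1]
    w = np.ones(n_features)
    theta = float(n_features)
    for _ in range(epochs):
        mistakes = 0
        for x, label in zip(X, y):
            active = x == 1
            pred = int(w[active].sum() >= theta)
            if pred == label:
                continue                  # mistake driven: no update when correct
            mistakes += 1
            if label == 1:
                w[active] *= alpha        # promotion of active features
            else:
                w[active] /= alpha        # demotion of active features
        if mistakes == 0:
            break
    return w, theta

def predict_winnow(w, theta, x):
    return int(w[x == 1].sum() >= theta)

# Toy target: y = x0 OR x1 over four binary features (two are irrelevant).
X = np.array(list(itertools.product([0, 1], repeat=4)))
y = X[:, 0] | X[:, 1]
w, theta = train_winnow(X, y)
```

Only the weights of active features are touched on each mistake, which is why the cost per update depends on the number of active features rather than the total feature count.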

Yang et al. 00

http://cogcomp.cs.illinois.edu/software/doc/snow-userguide/node9.html


Sparse Network of Winnows (SNoW)

Multiplicative weight update algorithm

Pros:
▪ Online feature selection [Yang et al. 00]
▪ Detects faces with different features and expressions, in different poses, and under different lighting conditions

Cons: needs a more powerful feature representation

Has similar performance, but is computationally more efficient

Also been applied to object recognition [Yang et al. 02]

Yang et al. 00


Naive Bayes Classifier

Estimate the joint probability of local appearance and position at multiple resolutions
▪ Local patterns are more unique
▪ Intensity patterns around the eyes are much more distinctive

Learn the distribution by parts using a Naïve Bayes classifier
▪ Provides better estimation of conditional density functions
▪ Provides a functional form of the posterior probability to capture the joint statistics of local appearance and position

Schneiderman and Kanade, 98



At each scale, a face image is decomposed into 4 subregions

Then projected to a lower-dimensional space (PCA)

Quantized into a finite set of patterns

The statistics of each projected subregion are estimated from the projected samples to encode local appearance




Apply Bayes decision rule

Further decompose the appearance into space, frequency, and orientation

Also wavelet representation for general object recognition [H. Schneiderman and T. Kanade, 00]



Detecting faces in Different Pose

Extend to detect faces in different pose with multiple detectors

Each detector specializes to a view: frontal, left pose and right pose

[Mikolajczyk et al. 01] extend to detect faces from side pose to frontal view



Experimental Results (Schneiderman and Kanade, 98)

Able to detect profile faces [Schneiderman and Kanade 98]

Extended to detect cars [Schneiderman and Kanade 00]


Hidden Markov Model

Assumptions of an HMM:
▪ Patterns can be characterized as a parametric random process
▪ Parameters can be estimated in a precise, well-defined manner

Developing an HMM:
▪ Hidden states need to be decided
▪ Learn transition probabilities between states from examples; each example is represented as a sequence of observations
▪ Maximize the probability of observing the training data by adjusting the parameters (Viterbi segmentation and Baum-Welch algorithms)


Hidden Markov Model

Face pattern
▪ Several regions (eyes, nose, mouth, forehead, chin)
▪ Observe these regions in an appropriate order (top to bottom, left to right)

Aims to associate facial regions with the states of a continuous-density hidden Markov model


Hidden Markov Model for Face Localization

Observe vectors: scan the window vertically with P pixels of overlap

Five hidden states

The boundaries between strips of pixels are represented by probabilistic transitions between states


Information-Theoretical Approach

Contextual constraints in a face pattern
▪ A small neighborhood of pixels

Markov random field (MRF)
▪ Convenient and consistent way to model context-dependent entities such as image pixels and correlated features
▪ Achieved by characterizing mutual influences using conditional MRF distributions

Using Kullback relative information, select the Markov process that maximizes the information-based discrimination between the two classes
Apply to detection

http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence



Elements of Information Theory

Probability functions
▪ p(x): the template is a face
▪ q(x): the template is a non-face

Training database to estimate the distributions
▪ Face: 100 individuals x 9 views
▪ Nonface: 143,000 nonface templates, estimated using histograms

T. Cover and J. Thomas, 91



Select the most informative pixels (MIP)
▪ Maximize the Kullback relative information between p(x) and q(x)
▪ The MIP distribution focuses on the eye and mouth regions and avoids the nose area
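
The Kullback relative information between two histogram estimates can be computed directly. The sketch below is the generic discrete formula applied to two histograms; the function name and the small `eps` used to avoid division by zero in empty bins are assumptions, and the actual pixel selection procedure is not reproduced here.

```python
import numpy as np

def kullback_information(p, q, eps=1e-12):
    """Kullback relative information sum_x p(x) log(p(x) / q(x)) for two histograms."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()                      # normalize counts to probabilities
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

A value near zero means a pixel's intensity histogram looks the same for faces and non-faces, so that pixel carries little information for discrimination; larger values mark more informative pixels.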

Use MIP to obtain linear features for classification and representation [Fukunaga and Koontz]

Detect faces
▪ Pass a window over the input image
▪ Compute the distance from face space (DFFS) [Pentland et al, 94]
▪ If DFFS-Face < DFFS-Nonface, a face is assumed to exist within the window



Apply Kullback relative information to maximize the information-based discrimination between positive and negative examples of faces

A family of discrete Markov processes
▪ Model the face and background patterns
▪ Estimate the probability model

Learning as optimization: select the Markov process that maximizes the information-based discrimination between the two classes

Colmenarez and Huang, 97


Object Detection Using Hierarchical MRF and MAP Estimation

Combine view-based and model-based approaches
Use a visual-attention algorithm to reduce the search space: select important image regions

Detect faces in the selected regions
▪ Combination of template matching and feature matching
▪ Using a hierarchical Markov random field
▪ Maximum a posteriori estimation

Qian and Huang, 97


Inductive Learning

Learning by example
▪ A system tries to induce a general rule from a set of observed instances

Algorithms
▪ ID3 (Quinlan, 1986)
▪ C4.5 (Quinlan, 1993)
▪ FOIL (Quinlan, 1990)

http://sifter.org/~brandyn/InductiveLearning.html

http://www.iiia.csic.es/Projects/FedLearn/OO-Induction.html


Detection of Human Faces Using Decision Trees

Learn a decision tree from positive and negative examples of face patterns
Training example
▪ 8x8 pixel window
▪ represented by a vector of 30 attributes
▪ composed of the entropy, mean, and standard deviation of the pixel intensity values

C4.5 builds a classifier as a decision tree
▪ leaves indicate class identity
▪ nodes specify tests to perform on a single attribute

The learned decision tree is then used to decide whether a face exists in the input example

Results
▪ Localization accuracy rate of 96%
▪ On a set of 2,340 frontal face images in the FERET data set

J. Huang et al. 96


Learning the Human Face Concept from Black and White Pictures

Learn the face concept using Mitchell’s Find-S algorithm
▪ The distribution of face patterns P(x|face) can be approximated by a set of Gaussian clusters
▪ For a face instance, the distance to one of the cluster centroids should be smaller than a fraction of the maximum distance from the points in that cluster to its centroid
▪ Apply the Find-S algorithm to learn the thresholding distance such that faces and nonfaces can be differentiated

Several distinct characteristics
▪ First, it does not use negative (nonface) examples
▪ Second, only the central portion of a face is used for training
▪ Third, feature vectors consist of images with 32 intensity levels or textures, while other methods use full-scale intensity values as inputs

Detection rate of 90 percent on the first CMU data set.

N. Duta and A.K. Jain, ICPR, 1998.

$\mathrm{Dis}(x, c_i) \le k \cdot \max_j \mathrm{Dis}(x_{ij}, c_i), \quad 0 < k \le 1$


Face Databases

Training process is essential
Benchmark data sets
Face image databases

FERET database
▪ consists of monochrome images taken in different frontal views and in left and right profiles
▪ used to assess the strengths and weaknesses of different face recognition approaches
▪ since each image consists of an individual on a uniform and uncluttered background, it is not suitable for face detection benchmarking


Turk and Pentland

16 people
Images are taken in frontal view with slight variability in head orientation (tilted upright, right, and left) on a cluttered background

ftp://whitechapel.media.mit.edu/pub/images/





AT&T Cambridge Laboratories

Formerly known as the Olivetti database
10 images for each of 40 distinct subjects
Different times, lighting, facial expressions, and facial details

http://www.uk.research.att.com/facedatabase.html




Harvard Database

Cropped, masked frontal face images
Taken under a wide variety of light sources

Used to study face recognition under varying illumination conditions


Yale Face Database

5760 single-light-source images of 10 subjects, each seen under 576 viewing conditions (9 poses x 64 illumination conditions)

For every subject in a particular pose, an image with ambient (background) illumination was also captured

The total number of images is in fact 5760 + 90 = 5850
The total size of the compressed database is about 1 GB

http://vision.ucsd.edu/~leekc/ExtYaleDatabase/Yale%20Face%20Database.htm

http://cvc.yale.edu/projects/yalefacesB/illum_pose.html


M2VTS Multimodal Database

Developed for access control experiments using multimodal inputs

Contains sequences of face images of 37 people
Five sequences for each subject were taken over one week
Each image sequence contains images from right profile (-90 degrees) to left profile (90 degrees) while the subject counts from “0” to “9” in their native language


UMIST Database

564 images of 20 people with varying pose.

The images of each subject cover a range of poses from right profile to frontal views


Purdue AR Database

3,276 color images of 126 people (70 males + 56 females) in frontal view
Designed for face recognition experiments under several mixing factors, such as facial expressions, illumination conditions, and occlusions
Has also been applied to image and video indexing as well as retrieval

All the faces appear with different facial expressions (neutral, smile, anger, and scream), illumination (left light source, right light source, and sources from both sides), and occlusion (wearing sunglasses or a scarf)

Taken during two sessions separated by two weeks, with the same camera setup under tightly controlled conditions of illumination and pose

A. Martinez and R. Benavente, 1998


Face Image Databases

The abovementioned databases are designed mainly to measure the performance of face recognition methods and, thus, each image contains only one individual.
Best utilized as training sets rather than test sets

http://web.mit.edu/emeyers/www/face_databases.html


Benchmark Test Sets

K.-K. Sung and T. Poggio, 96 & 98
First set: 301 frontal and near-frontal mugshots of 71 different people
▪ High-quality digitized images with a fair amount of lighting variation
Second set: 23 images with a total of 149 face patterns
▪ Most of these images have complex backgrounds, with faces taking up only a small amount of the image area


Samples of Sung and Poggio 98

Some images are scanned from newspapers and, thus, have low resolution.
Though most faces in the images are upright and frontal, some faces appear in different poses.


Database by Rowley et al.

130 images with a total of 507 frontal faces.
Also includes the 23 images of the second data set used by [Sung and Poggio, 1998].

Most images contain more than one face on a cluttered background

A good test set to assess algorithms which detect upright frontal faces.

http://vasc.ri.cmu.edu/NNFaceDetector/


Database by Rowley et al.

Some images contain hand-drawn cartoon faces.

Most images contain more than one face and the face size varies significantly.

http://vasc.ri.cmu.edu/NNFaceDetector/


Another Database by Rowley et al.

For detecting 2D faces with frontal pose and rotation in image

50 images with a total of 223 faces, of which 210 are at angles > 10 degrees.


Profile Views Database

208 images
Each image contains faces with facial expressions and in profile views



Kodak Face Database

A common test bed for direct benchmarking of face detection and recognition algorithms

300 digital photos
Captured in a variety of resolutions
Face size ranges from as small as 13x13 pixels to as large as 300x300 pixels.


Test Sets for Face Detection


Performance Evaluation

They were not tested on the same test set

Performance of several appearance-based face detection methods on two standard data sets
▪ Test Set 1: 125 images with 483 faces
▪ Test Set 2: 23 images with 136 faces


Experimental Results

Appearance-based face detection methods

The number and variety of training examples have a direct effect on the classification performance


More Issues

Training time and execution time
The number of scanning windows varies a lot
Different criteria are adopted in reporting the detection rates
▪ A loose criterion may declare all the faces as “successful” detections, while a stricter one would declare most of them as nonfaces.


More Issues

The evaluation criteria may and should depend on the purpose of the detector
Required computational resources, particularly time and memory


A collection of sample face detection code and evaluation tools

http://vision.ai.uiuc.edu/mhyang/face-detection-survey.html




Detecting Faces in Images: A Survey

Provide a comprehensive survey of research on face detection

Provide some structural categories for the methods described in over 150 papers

It is imprudent to explicitly declare which methods indeed have the lowest error rates
The community needs to more seriously consider systematic performance evaluation


Challenging and Interesting Problem

The class of faces admits a great deal of shape, color, and albedo variability due to differences in individuals, nonrigidity, facial hair, glasses, and makeup

Images are formed under variable lighting and 3D pose and may have cluttered backgrounds

Robust real-time face detection

Paul A. Viola and Michael J. Jones, Intl. J. Computer Vision 57(2), 137–154, 2004 (originally in CVPR 2001) (slides adapted from Bill Freeman, MIT 6.869, April 2005)


Scan classifier over locations and scales


“Learn” classifier from data

Training data
• 5000 faces (frontal)
• 10^8 non-faces
• Faces are normalized: scale, translation

Many variations
• Across individuals
• Illumination
• Pose (rotation both in plane and out)


Characteristics of algorithm

• Feature set (huge: about 16M features)
• Efficient feature selection using AdaBoost
• New image representation: integral image
• Cascaded classifier for rapid detection

Fastest known face detector for gray scale images


Image features

• “Rectangle filters”, similar to Haar wavelets
• Differences between sums of pixels in adjacent rectangles


Partial sum

Any rectangle is D = 1+4-(2+3)

Also known as:
• summed area tables [Crow84]
• boxlets [Simard98]

Integral Image
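
The four-lookup rule D = 1 + 4 - (2 + 3) can be sketched as follows. The sketch assumes integer pixel values and pads the integral image with a leading row and column of zeros so that no special cases are needed at the image border; the function names are hypothetical.

```python
import numpy as np

def integral_image(img):
    """ii[r, c] holds the sum of img[:r, :c]; row 0 and column 0 are zero padding."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle from four lookups: D = 1 + 4 - (2 + 3)."""
    bottom, right = top + height, left + width
    return ii[top, left] + ii[bottom, right] - ii[top, right] - ii[bottom, left]

def two_rect_feature(ii, top, left, height, width):
    """Difference between the sums of two horizontally adjacent rectangles."""
    return (rect_sum(ii, top, left, height, width)
            - rect_sum(ii, top, left + width, height, width))
```

After the one-time cost of building the integral image, every rectangle sum costs the same four lookups regardless of the rectangle's size, which is what makes evaluating a very large feature set practical.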


Huge library of filters


Constructing the classifier

Perceptron yields a sufficiently powerful classifier

Use AdaBoost to efficiently choose the best features
• add a new h_i(x) at each round
• each h_i(x_k) is a “decision stump”

[Figure: decision stump h_i(x) plotted against x, with threshold θ; a = E_w(y [x < θ]) and b = E_w(y [x > θ])]


Constructing the classifier

For each round of boosting:
• Evaluate each rectangle filter on each example
• Sort examples by filter values
• Select the best threshold for each filter (minimum error); use sorting to quickly scan for the optimal threshold
• Select the best filter/threshold combination
• Weight is a simple function of the error rate
• Reweight examples

(There are many tricks to make this more efficient.)
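
The threshold scan and reweighting for a single filter can be sketched as below. It uses the standard discrete AdaBoost weight formula with labels in {-1, +1}, which may differ in notation from the exact variant in the Viola-Jones paper; the function names are hypothetical, and selecting the best filter would simply repeat `best_stump` over every filter's values and keep the lowest error.

```python
import numpy as np

def best_stump(values, labels, weights):
    """Scan all thresholds of one filter after a single sort.

    labels are in {-1, +1}; weights are non-negative and sum to 1.
    Returns (error, threshold, polarity); the stump predicts `polarity`
    when value > threshold and `-polarity` otherwise.
    """
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    v = values[order]
    y = np.asarray(labels)[order]
    w = np.asarray(weights, dtype=float)[order]
    # Cumulative weight of positives / negatives among the first i sorted samples.
    pos_below = np.concatenate([[0.0], np.cumsum(w * (y == 1))])
    neg_below = np.concatenate([[0.0], np.cumsum(w * (y == -1))])
    err_plus = pos_below + (neg_below[-1] - neg_below)    # predict +1 above threshold
    err_minus = neg_below + (pos_below[-1] - pos_below)   # predict -1 above threshold
    thresholds = np.concatenate([[v[0] - 1.0], (v[:-1] + v[1:]) / 2.0, [v[-1] + 1.0]])
    # A split between two equal values cannot be realized by a threshold.
    valid = np.concatenate([[True], v[1:] > v[:-1], [True]])
    err_plus = np.where(valid, err_plus, np.inf)
    err_minus = np.where(valid, err_minus, np.inf)
    i, j = int(np.argmin(err_plus)), int(np.argmin(err_minus))
    if err_plus[i] <= err_minus[j]:
        return float(err_plus[i]), float(thresholds[i]), 1
    return float(err_minus[j]), float(thresholds[j]), -1

def stump_predict(values, threshold, polarity):
    return np.where(np.asarray(values, dtype=float) > threshold, polarity, -polarity)

def reweight(weights, labels, predictions, error):
    """Classifier weight alpha and renormalized example weights."""
    error = np.clip(error, 1e-10, 1.0 - 1e-10)
    alpha = 0.5 * np.log((1.0 - error) / error)
    new_w = np.asarray(weights, dtype=float) * np.exp(-alpha * np.asarray(labels) * predictions)
    return alpha, new_w / new_w.sum()
```

Because the cumulative sums are computed once after sorting, all candidate thresholds for a filter are evaluated in a single pass instead of re-scoring every example per threshold.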


Good reference on boosting

Friedman, J., Hastie, T. and Tibshirani, R. Additive Logistic Regression: a Statistical View of Boosting

http://www-stat.stanford.edu/~hastie/Papers/boost.ps

“We show that boosting fits an additive logistic regression model by stagewise optimization of a criterion very similar to the log-likelihood, and present likelihood based alternatives. We also propose a multi-logit boosting procedure which appears to have advantages over other methods proposed so far.”


Trading speed for accuracy

Given a nested set of classifier hypothesis classes

Computational Risk Minimization


Speed of face detector (2001)

Speed is proportional to the average number of features computed per sub-window.

On the MIT+CMU test set, an average of 9 features (out of 6061) are computed per sub-window.

On a 700 MHz Pentium III, a 384x288 pixel image takes about 0.067 seconds to process (15 fps).

Roughly 15 times faster than Rowley-Baluja-Kanade and 600 times faster than Schneiderman-Kanade.


Sample results


Summary (Viola-Jones)

• Fastest known face detector for gray images
• Three contributions with broad applicability:
▪ Cascaded classifier yields rapid classification
▪ AdaBoost as an extremely efficient feature selector
▪ Rectangle features + integral image can be used for rapid image analysis


Face detector comparison

Informal study by Andrew Gallagher, CMU, for CMU 16-721 Learning-Based Methods in Vision, Spring 2007
▪ For Viola-Jones, the OpenCV implementation was used (<2 sec per image).
▪ For Schneiderman and Kanade, Object Detection Using the Statistics of Parts [IJCV’04], the www.pittpatt.com demo was used (~10-15 seconds per image, including web transmission).

http://www.cs.cmu.edu/~efros/courses/LBMV07/

http://www.pittpatt.com/


[Figure: detection results, Schneiderman-Kanade vs. Viola-Jones]

Example-based Caricature Generation with Exaggeration

Lin Liang¹, Hong Chen², Ying-Qing Xu¹, Heung-Yeung Shum¹
¹ Microsoft Research, Asia
² Xi’an Jiaotong University, China


Labeled feature points

Training data include 92 pairs of original facial images and exaggerated caricatures drawn by an artist


System Framework


Exaggerated Caricature

Original image

Unexaggerated sketch

Exaggerated caricature

Caricature by the artist

Apply to the image
