Uploaded by lekshmi-ramachandran, posted 18-Feb-2017


NEUTRAL FACE CLASSIFICATION USING PERSONALIZED APPEARANCE MODELS FOR FAST AND ROBUST EMOTION DETECTION

WELCOME

NEUTRAL FACE CLASSIFICATION USING PERSONALIZED APPEARANCE MODELS FOR FAST AND ROBUST EMOTION DETECTION

Presented by Lekshmi Ramachandran, M.Tech CSIP, Roll No. 7

CONTENTS

MOTIVATION

INTRODUCTION

PROPOSED SYSTEM

ADVANTAGES

LIMITATIONS

CONCLUSION

FUTURE WORKS


MOTIVATION

Robust neutral face recognition in real time is a major problem.

Users stay in the neutral state for the majority of the time.

Neutral positioning of the facial features implies a lack of strong emotion.

Emotions can be considered short temporal visual events, which are deviations from the predominant neutral state of a person.

INTRODUCTION

We propose a light-weight neutral vs. emotion classification engine that acts as a pre-processor to traditional supervised emotion classification approaches.

This increases accuracy and decreases the computational complexity of emotion recognition (ER).

INTRODUCTION

Supervised-learning-based facial expression recognition methods:

Geometry-based method: track key feature points on the face; the spatial and temporal distances between the points are used as features to classify AUs.

Appearance-based method: classify textural appearance changes on the face into various AUs.

INTRODUCTION

On mobile platforms, the processing speed of an ER system is slow. Why does speed matter?

Faster ER processing means more CPU idle states, which implies less battery drain.

Not every frame needs processing if a low-complexity pre-processor algorithm can tell whether a given frame is neutral or emotional.
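The gating idea above can be sketched in a few lines. This is a hypothetical illustration (the function names and interfaces are mine, not the paper's): a cheap neutral-vs-emotion gate decides per frame whether the expensive supervised ER classifier needs to run at all.

```python
# Hypothetical sketch: a cheap neutral-vs-emotion gate decides per frame
# whether the expensive supervised ER classifier needs to run at all.

def gate_frames(frames, is_neutral, classify_emotion):
    """Run the heavy classifier only on frames the gate flags as emotional."""
    results = []
    for frame in frames:
        if is_neutral(frame):           # low-complexity pre-processor
            results.append("neutral")   # skip the expensive ER stage
        else:
            results.append(classify_emotion(frame))
    return results

# Toy usage: even frame indices are "neutral", odd ones reach the classifier.
labels = gate_frames(range(6), lambda f: f % 2 == 0, lambda f: "emotion")
```

Since neutral frames dominate typical sequences, most frames take the cheap branch, which is where the CPU-cycle savings come from.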

ACCURACY AND SPEED

A lightweight online method for neutral vs. emotion classification serves as the pre-processor.

It constructs a personalized model: both learning and testing happen on the same user.

Neutral frames are preferred for reference model generation.

ASSUMPTION

A set of a few reference neutral frames of the user is available a priori.

An experiment was conducted to verify this.

The user will be in the neutral state for the majority of the time, so the face can be sampled at regular intervals and, using online clustering techniques, the neutral face of the user can be learnt.
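As a minimal illustration of such online learning (my own toy example, not the paper's clustering algorithm), a neutral appearance estimate can be maintained as an exponential moving average of regularly sampled face feature vectors, relying on neutral being the dominant state:

```python
import numpy as np

# Toy online neutral model: an exponential moving average of sampled face
# feature vectors. Because neutral frames dominate, the mean drifts toward
# the user's neutral appearance over time.

class OnlineNeutralModel:
    def __init__(self, alpha=0.1):
        self.alpha = alpha   # update rate of the moving average
        self.mean = None     # current estimate of the neutral appearance

    def update(self, features):
        f = np.asarray(features, dtype=float)
        if self.mean is None:
            self.mean = f.copy()
        else:
            self.mean = (1 - self.alpha) * self.mean + self.alpha * f

model = OnlineNeutralModel(alpha=0.5)
model.update([0.0, 0.0])
model.update([1.0, 1.0])   # estimate moves halfway toward the new sample
```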

Since the probability of later frames containing emotion is higher, the very first two frames are taken as the reference for each user.

NEUTRAL DETECTION USING ONLINE LEARNT PERSONALIZED APPEARANCE MODELS

Key steps of our algorithm:

1. Procrustes analysis
2. Selection of key emotion (KE) points
3. Tracking of KE points
4. Patch representation
5. Affine noise based statistical model generation
6. Multi-neighbour comparison due to structural similarity
7. Fusion of distances of KE point pairs
8. Using textural change inferences to improve AU classification accuracy

1.PROCRUSTES ANALYSIS

Used to align each input shape to a common shape.

A CLM is used to track N facial feature points.

The CLM-fitted input shape is aligned with a pre-trained mean shape model using Procrustes analysis, compensating for affine variations.

The reference frame is taken at t=0; for further frames (t>0), the current frame is aligned to the reference frame to obtain the aligned shape, compensating for affine variations.

1.PROCRUSTES ANALYSIS

The CLM-fitted face input (a) is aligned with the reference shape (pink) in (b) to get the aligned shape (black) in (b).
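The Procrustes step can be sketched as follows: align an input shape (an N×2 array of landmark points) to a reference shape by removing translation, scale, and rotation. This is a generic similarity-Procrustes sketch, not the paper's exact implementation.

```python
import numpy as np

# Similarity Procrustes: find the translation, scale, and rotation that
# best map `shape` onto `reference`, and return the aligned shape.

def procrustes_align(shape, reference):
    mu_s, mu_r = shape.mean(axis=0), reference.mean(axis=0)
    s, r = shape - mu_s, reference - mu_r          # centre both shapes
    u, sv, vt = np.linalg.svd(s.T @ r)             # cross-covariance SVD
    rot = u @ vt                                   # optimal rotation
    scale = sv.sum() / np.sum(s * s)               # optimal scaling
    return scale * (s @ rot) + mu_r                # aligned shape

# A square that was scaled by 2, rotated 90 degrees, and translated
# aligns back onto the reference exactly.
ref = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
R = np.array([[0., -1.], [1., 0.]])
shape = 2.0 * (ref @ R.T) + np.array([3.0, 4.0])
aligned = procrustes_align(shape, ref)
```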

2.SELECTION OF KEY EMOTION POINTS

Not all CLM-tracked points are affected equally by facial expressions.

Lower-eyebrow CLM points are affected by:

2.SELECTION OF KEY EMOTION POINTS

Mouth-corner CLM points are affected by:

Considering all CLM points for the textural change computation averages out changes at a few points and results in missed emotions.

2.SELECTION OF KEY EMOTION POINTS

So, a few KE points are generated over the eyebrows and cheeks, with respect to the closest stable CLM-tracked points in the reference shape (t=0).

CLM points at the tails of the arrow marks are used to generate the KE points (red) at the heads.

2.SELECTION OF KEY EMOTION POINTS

Example for showcasing the sensitivity of KE points for various AUs: (a) Reference, (b) Neutral, and (c-e) Emotions. Large Mahalanobis distances at the KE points for various AUs/emotions with respect to the reference are noted.
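The Mahalanobis distance mentioned in the caption can be sketched as below. A diagonal covariance is assumed here, matching a per-bin mean/variance model at each KE point; the function is my illustration, not the paper's code.

```python
import numpy as np

# Mahalanobis distance between a patch histogram and the statistical model
# (per-bin mean and variance) at a KE point, assuming diagonal covariance.

def mahalanobis(hist, mean, var, eps=1e-8):
    d = np.asarray(hist, float) - np.asarray(mean, float)
    return float(np.sqrt(np.sum(d * d / (np.asarray(var, float) + eps))))

# Bins with high variance in the model contribute less to the distance.
dist = mahalanobis([1.0, 2.0], [0.0, 0.0], [1.0, 4.0])
```

Large distances at KE points relative to the reference model indicate emotional deviation from neutral.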

3.TRACKING OF KE POINTS

Fixed offsets may not stay consistent over time due to CLM misalignment, giving inaccurate KE points.

To alleviate this, the KE points at t=0 (when the chance of accurate CLM fitting is high) are tracked to later frames using Algorithm 1.

Algorithm 1 : KE Point Tracking

tmi: transform parameters. Reference/aligned shapes: face shapes in the common space. Previous/current/current-normalized shapes: face shapes in the original input space.

These four transformations are required for accurate mapping of KE points from the input space to the common space.

4.PATCH REPRESENTATION

The ROI at a KE point is based on the prevalence of emotion at that point. E.g., the upper half of the patch for the eyebrow is sensitive to textural changes. Histograms of uniform patterns are computed for the ROI inside the patch.
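A plain-Python sketch of such a histogram of uniform patterns is given below, using uniform local binary patterns (LBP); the parameter choices (8 neighbours, 58 uniform bins plus one catch-all bin) are my assumptions, not values stated on the slides.

```python
import numpy as np

# Histogram of uniform LBP codes over an ROI: each pixel gets an 8-bit code
# from comparing its 8 neighbours against it; codes with at most 2 circular
# bit transitions ("uniform") get their own bins, all others share one bin.

def uniform_lbp_histogram(roi):
    roi = np.asarray(roi, dtype=float)
    # 8 neighbour offsets in circular order around the centre pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    # Build the uniform-code lookup table: 58 uniform codes + 1 shared bin.
    lut, n_uniform = {}, 0
    for code in range(256):
        bits = [(code >> i) & 1 for i in range(8)]
        transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
        if transitions <= 2:
            lut[code] = n_uniform
            n_uniform += 1
    hist = np.zeros(n_uniform + 1)
    h, w = roi.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            code = 0
            for i, (dy, dx) in enumerate(offsets):
                if roi[y + dy, x + dx] >= roi[y, x]:
                    code |= 1 << i
            hist[lut.get(code, n_uniform)] += 1
    total = hist.sum()
    return hist / total if total else hist

hist = uniform_lbp_histogram(np.zeros((5, 5)))
```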

(a) Patch extraction at KE points in the common space. Highlighted part inside each patch shows the ROI used for histogram computation. Arrow marks indicate the directions to generate neighbour patches in the current shape. (b) A indicates the location of eyebrow KE point. B, C, and D are its neighbours. Four patches are extracted at these four locations to compare with the model at A.

5.AFFINE NOISE BASED STATISTICAL MODEL GENERATION

The patches generated do not represent distortions due to CLM fitting inaccuracies. To alleviate this, affine noise is applied to the initial patch models to generate more patch models. The texture histograms of all patches at a KE point are used to generate its statistical model as:

M(p,q): model at KE point (p,q); P^H(i,j): patch histogram extracted at neighbour (i,j) of (p,q); μ and σ are the mean and variance of the histograms; N: neighbourhood.
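A sketch matching the legend above: the statistical model at a KE point is the per-bin mean and variance of the texture histograms collected from the original patch and its affine-noise-perturbed copies (the toy histograms below stand in for real jittered patches).

```python
import numpy as np

# Model generation at one KE point: stack the texture histograms of the
# original and affine-perturbed patches, then take per-bin mean/variance.

def build_model(histograms):
    h = np.asarray(histograms, dtype=float)   # (num_patches, num_bins)
    return h.mean(axis=0), h.var(axis=0)      # mu and sigma^2 per bin

# Two toy histograms from jittered patches at one KE point.
mu, var = build_model([[0.2, 0.8], [0.4, 0.6]])
```

Bins that vary a lot under affine jitter get a large variance, so later Mahalanobis comparisons automatically discount them.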

5.AFFINE NOISE BASED STATISTICAL MODEL GENERATION

Since the possible textural variations are learnt, the model may be robust enough to handle pose variations in the range of 15°.

E.g., due to CLM inaccuracy, if a point on the cheek is shifted downwards or leftwards, the patch region at that point will coincide with the mouth or nose, resulting in noisy textural variations.

So, during model construction itself, the model is made robust to fitting inaccuracies.


6.MULTI-NEIGHBOR COMPARISON DUE TO STRUCTURAL SIMILARITY

Textural changes are due to two causes:

Emotions: affect the whole neighbourhood at a KE point.

Alignment errors: occur only in the direction of the alignment error.

So instead of judging change at a KE point by comparing only the current histogram with its model, neighbouring patches are generated around the KE point in meaningful directions.

Meaningful directions: neighbours in directions with no noisy textural changes.

6.MULTI-NEIGHBOR COMPARISON DUE TO STRUCTURAL SIMILARITY

E.g.:

Changes in textural patterns due to noisy neighbours at the eyebrow KE point: (a) reference patch, (b) neighbour patch in the upward direction, and (c) neighbour patch in the downward direction.

Neighbours below eyebrows are not reliable.

The chosen direction is above the eyebrow, as most eye AUs occur there.

6.MULTI-NEIGHBOR COMPARISON DUE TO STRUCTURAL SIMILARITY

Each neighbour's patch histogram is compared with the learnt statistical texture model at the KE point (using the Mahalanobis distance), giving an array of distance values.

The median of these distances is used to judge the change between the current frame and the model at that point.
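The multi-neighbour comparison can be sketched as below: each neighbour contributes one distance to the model, and the median of these distances is the change score, so a single noisy distance cannot dominate. `distance_to_model` is a placeholder for the Mahalanobis comparison against the learnt model.

```python
import numpy as np

# Median over per-neighbour distances: emotions raise most distances in the
# neighbourhood, while an alignment error raises only one, which the median
# filters out.

def change_score(neighbour_hists, distance_to_model):
    distances = [distance_to_model(h) for h in neighbour_hists]
    return float(np.median(distances))

# One outlier (9.0) caused by an alignment error is suppressed by the median.
score = change_score([0.1, 0.2, 9.0, 0.15], lambda h: h)
```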


6.MULTI-NEIGHBOR COMPARISON DUE TO STRUCTURAL SIMILARITY

This improves neutral accuracy, as any noisy distances due to alignment errors and unconstrained conditions are filtered out by the median operation.

D: direction in which the patch has to be extracted; δ(p,q): distance at (p,q).

7.FUSION OF DISTANCES OF KE POINT PAIRS

Distances at KE points are fused separately for each region.

If KE points become asymmetrical across the two face halves due to CLM misalignments, the points on both halves are not affected simultaneously.

The maximum of the point distances in a pair is chosen and thresholded to identify changes at the locations of those points.
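An illustrative sketch of this pair-fusion step (the threshold value is made up): for each symmetric KE point pair, take the maximum of the two distances and threshold it, so a change on either half of the face flags the region.

```python
# Max-then-threshold fusion over symmetric KE point pairs: a region is
# flagged as changed if either point of any pair exceeds the threshold.

def region_changed(pair_distances, threshold=0.5):
    """pair_distances: list of (left_dist, right_dist) per KE point pair."""
    return any(max(left, right) > threshold for left, right in pair_distances)

flag = region_changed([(0.1, 0.7), (0.2, 0.1)])  # right point of pair 1 fires
```

Taking the maximum makes the fusion robust to one-sided CLM misalignment: a genuine expression usually fires on at least one half of the face.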

8.USING TEXTURAL CHANGE INFERENCES TO IMPROVE AU CLASSIFICATION ACCURACY

After detecting the change status at a region using the proposed algorithm, only the corresponding AUs are passed on for emotion classification, in contrast to conventional ER systems.

E.g.: If change is detected at eyebrow, only eye related AUs are used for further classification.

The probability of error by the AU classification system is reduced.
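The gating can be sketched with a simple region-to-AU map. The mapping below is a made-up example for illustration, not the paper's table:

```python
# Gate AU classification by region change status: only AUs belonging to
# regions with detected textural change are considered by the classifier.

REGION_AUS = {
    "eyebrow": ["AU1", "AU2", "AU4"],     # brow/eye related AUs (example)
    "mouth":   ["AU12", "AU15", "AU20"],  # mouth related AUs (example)
}

def candidate_aus(changed_regions):
    """Return only the AUs of regions with detected textural change."""
    return [au for region in changed_regions
            for au in REGION_AUS.get(region, [])]

aus = candidate_aus(["eyebrow"])   # mouth AUs are skipped entirely
```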

Illustrative diagram of the proposed pre-processor

EXPERIMENTAL RESULTS

Our algorithm strongly outperforms on our datasets, as the database contains various real-world challenges like pose and lighting variations.

It offers high TPR (True Positive Rate) and low FDR (False Discovery Rate) values.

By exploiting textural change inferences at KE points, the f-measure increases for each emotion.

ADVANTAGES

Low complexity.

Saves computation cycles of ER system.

E.g.:

Using a method similar to Bartlett et al. in an ER system on the SRID2 database, the number of CPU cycles required is 0.350 × 1.5G × 76,000 (350 ms per frame at 1.5 GHz, for 76,000 frames).

With our low-complexity pre-processor, which has a pre-processing accuracy of 66% on SRID2 (containing 28,000 neutrals), the total number of CPU cycles saved is 0.350 × 1.5G × 0.66 × 28,000 (approx.).
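The slide's cycle arithmetic, written out (350 ms per frame on a 1.5 GHz CPU, 76,000 frames in SRID2 of which 28,000 are neutral, and a 66% pre-processing accuracy):

```python
# CPU-cycle arithmetic from the slide: cycles per frame times frame counts.

cycles_per_frame = 0.350 * 1.5e9                # 350 ms at 1.5 GHz
total_cycles = cycles_per_frame * 76000         # full ER pass over SRID2
saved_cycles = cycles_per_frame * 0.66 * 28000  # neutrals skipped by the gate
```

That is roughly 4 × 10^13 cycles for the full pass, of which about 9.7 × 10^12 are saved by skipping correctly detected neutral frames.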

LIMITATIONS

May not handle talking faces.

May not succeed under large and abrupt pose variations.

There can be sudden pose variation in a sequence, and background lighting conditions can also vary; CLM fitting is inaccurate under sudden pose variation.

May not work with good accuracy on static images.

CONCLUSION

We proposed a personalized pre-processing method to improve the neutral and emotion classification accuracies of a traditional offline-trained ER system.

Aimed at mobile phone/tablet use cases, it reported good accuracy in various challenging situations, such as 15° of pose variation, illumination variations, facial biases, etc., after cascading with a state-of-the-art ER system.

Acts as a fast pre-processing unit that reduces computational complexity of ER processing, thereby improving battery performance.

FUTURE WORK

The issue of non-availability of reference neutral frames in certain scenarios will be addressed as a separate research problem.

THANK YOU

Constrained Local Model

A Constrained Local Model (CLM) is a class of methods for locating sets of points on a target image. The general approach is to:

Sample a region from the image around the current estimate, projecting it into a reference frame.

For each point, generate a "response image" giving a cost for having the point at each pixel.

Search for a combination of points which optimises the total cost, by manipulating the shape model parameters.

Sampling into the reference frame, then applying local models to compute response images R(x)
