Facial Keypoint Recognition
DESCRIPTION
Detecting facial keypoints is a challenging problem. Facial features vary greatly from one individual to another, and even for a single individual there is considerable variation due to 3D pose, size, position, viewing angle, and illumination conditions. Computer vision research has come a long way in addressing these difficulties, but many opportunities for improvement remain. In this presentation we apply several methods to recognize facial keypoints and compare their RMSE (root mean square error) to determine which gives the best accuracy.
TRANSCRIPT
Facial Keypoint Recognition
By
Akrita Agarwal
&
Srivathsava Sista
Introduction
The goal of this project is to correctly label the keypoints on a greyscale photograph of a human face.
We are given labelled training data consisting of 7049 images.
We used a variety of methods on this data to generate predictions for a test set of 1783 images.
The test data was also labelled, which allowed us to measure the accuracy of each method used.
Implemented in R
Format of the Data
Each image was 96 x 96 pixels in greyscale.
This means each pixel is described by a value between 0 and 255 indicating its grey intensity.
Each training image is labelled with the (x, y) coordinates of 15 facial keypoints, including the centres and corners of the eyes, the eyebrows, the lips, and the tip of the nose.
These 30 label values are followed by 9216 integers (96 x 96), which are essentially the greyscale image itself.
The entire data is given in a CSV file.
Features of an Image:
Evaluation of a Predictor
We compare the generated results with the labelled test data and calculate the root mean square error of the results.
The root mean square error will punish large errors and give us a good reflection of the accuracy of the predictor used.
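The metric above can be sketched in a few lines. The project was written in R; the numpy version below is an illustration of the same calculation, not the project's code:

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean square error over all keypoint coordinates."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.sqrt(np.mean((predicted - actual) ** 2))

# Squaring before averaging means one large miss hurts the score
# more than several small misses of the same total size.
print(rmse([66.0, 39.0], [66.0, 37.0]))  # -> about 1.414
```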
Simple Means
Calculated the mean of every feature in the training data.
Applied the mean as the required answer for every test data picture.
No real analysis of the data was done; this is a very simplistic method.
Resulted in an RMSE of 3.96244
Obviously not a very refined approach
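The simple-means predictor described above can be sketched as follows (a numpy illustration with hypothetical toy labels standing in for the real 30-column training data):

```python
import numpy as np

# Hypothetical toy training labels: rows are images, columns are keypoint
# coordinates (the real data has 30 columns: 15 (x, y) pairs).
train_labels = np.array([
    [66.0, 39.0, 30.0, 36.0],
    [64.0, 35.0, 29.0, 33.0],
    [65.0, 37.0, 31.0, 34.0],
])

# The "simple means" predictor: the column-wise mean of the training labels,
# used unchanged as the answer for every test image.
mean_prediction = train_labels.mean(axis=0)

n_test = 1783  # every test image receives the same prediction
predictions = np.tile(mean_prediction, (n_test, 1))
print(predictions.shape)  # -> (1783, 4)
```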
Image Patches
This method is similar to the simplistic means method, but instead of taking the mean only of the point, it considers a patch of the image centered around the keypoint.
We can consider a patch size of about 10 or 15 pixels as reasonable.
Using this method, we can better aggregate and generalize across images, since an entire area around the keypoint, rather than a single point, is matched against the average.
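A patch predictor of this kind needs two pieces: extracting a patch around a position, and searching positions near the average keypoint location for the best match against the mean patch. The numpy sketch below illustrates the idea with a synthetic image; the function names and the unnormalised-correlation score are assumptions for illustration, not the project's R code:

```python
import numpy as np

def extract_patch(img, x, y, half):
    """Square patch of side 2*half + 1 centred on column x, row y."""
    return img[y - half:y + half + 1, x - half:x + half + 1]

def search_keypoint(img, mean_patch, x0, y0, half, radius):
    """Slide over positions near the average keypoint location (x0, y0)
    and return the one whose patch best matches the mean patch."""
    best, best_score = (x0, y0), -np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            patch = extract_patch(img, x0 + dx, y0 + dy, half)
            if patch.shape != mean_patch.shape:
                continue  # patch would run off the image edge
            score = float(np.sum(patch * mean_patch))  # unnormalised correlation
            if score > best_score:
                best_score, best = score, (x0 + dx, y0 + dy)
    return best

# Synthetic check: a bright 3x3 blob at (10, 12) is recovered by searching
# around a nearby starting guess with a matching all-bright mean patch.
img = np.zeros((30, 30))
img[11:14, 9:12] = 255.0
print(search_keypoint(img, np.ones((3, 3)), 9, 11, half=1, radius=4))  # -> (10, 12)
```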
[Figures: mean right-eye patch and mean nose-tip patch]
Evaluation of Mean Patches
Depending on the size of the patch, we got different results for the RMSE
Testing patch sizes between 10 and 15, we found the optimal size to be 14, which gave an RMSE of 3.75538.
Artificial Neural Networks
We then used neural network based classification for the data.
However, the data was too massive to train on directly: each of the more than 7000 training images contributes 9216 pixel inputs, which led to unfeasible execution times when training the neural network.
Using a decimation filter, we reduced the 96x96 images down to 24x24, and considered only half of the original training set.
Even so, the training data remained sizable, so the plots of the neural networks were still unreadable, but execution time was greatly reduced.
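The downsampling step can be sketched with simple block averaging, a minimal stand-in for the decimation filter mentioned above (a numpy illustration, not the project's R code):

```python
import numpy as np

def decimate(img, factor):
    """Downsample by averaging non-overlapping factor x factor blocks,
    a simple stand-in for a decimation filter."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

img = np.arange(96 * 96, dtype=float).reshape(96, 96)
small = decimate(img, 4)
print(small.shape)  # -> (24, 24): 576 inputs per image instead of 9216
```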
Neural Network Plot with 2 Hidden Layers
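A network of the kind plotted above can be sketched in numpy. The input and output sizes follow the text (24 x 24 = 576 inputs, 30 keypoint coordinates); the hidden-layer widths, learning rate, and random stand-in data are illustrative assumptions, not the project's R configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# 576 inputs (a flattened 24x24 image) -> two hidden layers -> 30 outputs
# (15 (x, y) keypoint pairs). Hidden widths of 16 are an assumption.
n_in, n_h1, n_h2, n_out = 576, 16, 16, 30
W1 = rng.normal(0, 0.1, (n_in, n_h1)); b1 = np.zeros(n_h1)
W2 = rng.normal(0, 0.1, (n_h1, n_h2)); b2 = np.zeros(n_h2)
W3 = rng.normal(0, 0.1, (n_h2, n_out)); b3 = np.zeros(n_out)

def forward(X):
    h1 = np.tanh(X @ W1 + b1)
    h2 = np.tanh(h1 @ W2 + b2)
    return h1, h2, h2 @ W3 + b3  # linear output layer for regression

# One hand-written gradient-descent step on the mean squared error,
# using random stand-in data in place of the real images and labels.
X = rng.normal(0, 1, (8, n_in))
Y = rng.normal(0, 1, (8, n_out))
lr = 0.005

h1, h2, out = forward(X)
loss_before = np.mean((out - Y) ** 2)
d_out = 2 * (out - Y) / out.size           # dLoss/d(output)
d_h2 = (d_out @ W3.T) * (1 - h2 ** 2)      # backprop through tanh
d_h1 = (d_h2 @ W2.T) * (1 - h1 ** 2)
W3 -= lr * h2.T @ d_out; b3 -= lr * d_out.sum(0)
W2 -= lr * h1.T @ d_h2;  b2 -= lr * d_h2.sum(0)
W1 -= lr * X.T @ d_h1;   b1 -= lr * d_h1.sum(0)

loss_after = np.mean((forward(X)[2] - Y) ** 2)
print(loss_after < loss_before)  # -> True: the step reduced the loss
```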
Evaluation of Neural Networks
Conclusion
The improvement of the RMSE over simplistic methods when using neural networks indicates that the features in the data are not independent of each other.
Our earlier methods did not consider the inter-dependency of the features.
It makes intuitive sense that the data proved to be interdependent as the features of the face generally follow a certain pattern.
Facial keypoint recognition is a very important field, as it forms the initial step towards more advanced applications such as facial recognition and facial expression identification.
References
All the project data was obtained from kaggle.com
The dataset was in turn obtained from Dr. Yoshua Bengio, University of Montreal.
R packages and tutorials from the official site: http://www.r-project.org