Facial Keypoint Recognition
DESCRIPTION
Detecting facial keypoints is a challenging problem. Facial features vary greatly from one individual to another, and even for a single individual there is considerable variation due to 3D pose, size, position, viewing angle, and illumination conditions. Computer vision research has come a long way in addressing these difficulties, but many opportunities for improvement remain. In this presentation we apply several methods to recognize facial keypoints and compare their RMSE (root mean square error) to determine which gives the best accuracy.
TRANSCRIPT
Facial Keypoint Recognition
By
Akrita Agarwal
&
Srivathsava Sista
Introduction
The goal of this project is to correctly label the keypoints on a greyscale photograph of a human face.
We are given labelled training data consisting of 7049 images.
We used a variety of methods on this data to generate predictions for a test set of 1783 images.
The test data was also labelled, which allowed us to measure the accuracy of each method used.
Implemented in R
Format of the Data
Each image was 96 x 96 pixels in greyscale.
This means each pixel is described by a value between 0 and 255 indicating its grey intensity.
Each training image is labelled with the (x, y) coordinates of 15 facial keypoints, including the centres and corners of the eyes, the eyebrows, the lips, and the tip of the nose.
These 30 label values are followed by 9216 integers (96 x 96), which are essentially the greyscale image itself.
The entire data is given in a CSV file.
Features of an Image:
Evaluation of a Predictor
We compare the generated results with the labelled test data and calculate the root mean square error of the results.
The root mean square error will punish large errors and give us a good reflection of the accuracy of the predictor used.
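The metric above can be sketched in a few lines. The project was written in R; the numpy version below is an illustration of the same calculation, not the project's code:

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean square error over all keypoint coordinates."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.sqrt(np.mean((predicted - actual) ** 2))

# Squaring before averaging means one large miss hurts the score
# more than several small misses of the same total size.
print(rmse([66.0, 39.0], [66.0, 37.0]))  # -> about 1.414
```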
Simple Means
Calculated the mean of every feature in the training data.
Applied the mean as the required answer for every test data picture.
No real analysis of the data was done; this is a very simplistic method.
Resulted in an RMSE of 3.96244
Obviously not a very refined approach
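The simple-means predictor described above can be sketched as follows (a numpy illustration with hypothetical toy labels standing in for the real 30-column training data):

```python
import numpy as np

# Hypothetical toy training labels: rows are images, columns are keypoint
# coordinates (the real data has 30 columns: 15 (x, y) pairs).
train_labels = np.array([
    [66.0, 39.0, 30.0, 36.0],
    [64.0, 35.0, 29.0, 33.0],
    [65.0, 37.0, 31.0, 34.0],
])

# The "simple means" predictor: the column-wise mean of the training labels,
# used unchanged as the answer for every test image.
mean_prediction = train_labels.mean(axis=0)

n_test = 1783  # every test image receives the same prediction
predictions = np.tile(mean_prediction, (n_test, 1))
print(predictions.shape)  # -> (1783, 4)
```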
Image Patches
This method is similar to the simplistic means method, but instead of taking the mean only of the point, it considers a patch of the image centered around the keypoint.
We can consider a patch size of about 10 or 15 pixels as reasonable.
Using this method, we can better aggregate and generalize across images, since an entire area around the keypoint, rather than a single point, is matched against the average.
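A patch predictor of this kind needs two pieces: extracting a patch around a position, and searching positions near the average keypoint location for the best match against the mean patch. The numpy sketch below illustrates the idea with a synthetic image; the function names and the unnormalised-correlation score are assumptions for illustration, not the project's R code:

```python
import numpy as np

def extract_patch(img, x, y, half):
    """Square patch of side 2*half + 1 centred on column x, row y."""
    return img[y - half:y + half + 1, x - half:x + half + 1]

def search_keypoint(img, mean_patch, x0, y0, half, radius):
    """Slide over positions near the average keypoint location (x0, y0)
    and return the one whose patch best matches the mean patch."""
    best, best_score = (x0, y0), -np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            patch = extract_patch(img, x0 + dx, y0 + dy, half)
            if patch.shape != mean_patch.shape:
                continue  # patch would run off the image edge
            score = float(np.sum(patch * mean_patch))  # unnormalised correlation
            if score > best_score:
                best_score, best = score, (x0 + dx, y0 + dy)
    return best

# Synthetic check: a bright 3x3 blob at (10, 12) is recovered by searching
# around a nearby starting guess with a matching all-bright mean patch.
img = np.zeros((30, 30))
img[11:14, 9:12] = 255.0
print(search_keypoint(img, np.ones((3, 3)), 9, 11, half=1, radius=4))  # -> (10, 12)
```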
[Figures: mean right-eye patch and mean nose-tip patch]
Evaluation of Mean Patches
Depending on the size of the patch, we got different results for the RMSE
Testing patch sizes between 10 and 15, we found the optimal size to be 14, which gave an RMSE of 3.75538.
Artificial Neural Networks
We then used neural network based classification for the data.
However, the data was too massive to train on directly: each of the more than 7000 training images contributes 9216 pixel inputs, which led to unfeasible execution times when training the neural network.
Using a decimation filter, we reduced the 96x96 images down to 24x24, and considered only half of the original training set.
Even so, the training data remained sizable, so the plots of the neural networks were still unreadable, but execution time was greatly reduced.
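The downsampling step can be sketched with simple block averaging, a minimal stand-in for the decimation filter mentioned above (a numpy illustration, not the project's R code):

```python
import numpy as np

def decimate(img, factor):
    """Downsample by averaging non-overlapping factor x factor blocks,
    a simple stand-in for a decimation filter."""
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

img = np.arange(96 * 96, dtype=float).reshape(96, 96)
small = decimate(img, 4)
print(small.shape)  # -> (24, 24): 576 inputs per image instead of 9216
```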
Neural Network Plot with 2 Hidden Layers
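A network of the kind plotted above can be sketched in numpy. The input and output sizes follow the text (24 x 24 = 576 inputs, 30 keypoint coordinates); the hidden-layer widths, learning rate, and random stand-in data are illustrative assumptions, not the project's R configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# 576 inputs (a flattened 24x24 image) -> two hidden layers -> 30 outputs
# (15 (x, y) keypoint pairs). Hidden widths of 16 are an assumption.
n_in, n_h1, n_h2, n_out = 576, 16, 16, 30
W1 = rng.normal(0, 0.1, (n_in, n_h1)); b1 = np.zeros(n_h1)
W2 = rng.normal(0, 0.1, (n_h1, n_h2)); b2 = np.zeros(n_h2)
W3 = rng.normal(0, 0.1, (n_h2, n_out)); b3 = np.zeros(n_out)

def forward(X):
    h1 = np.tanh(X @ W1 + b1)
    h2 = np.tanh(h1 @ W2 + b2)
    return h1, h2, h2 @ W3 + b3  # linear output layer for regression

# One hand-written gradient-descent step on the mean squared error,
# using random stand-in data in place of the real images and labels.
X = rng.normal(0, 1, (8, n_in))
Y = rng.normal(0, 1, (8, n_out))
lr = 0.005

h1, h2, out = forward(X)
loss_before = np.mean((out - Y) ** 2)
d_out = 2 * (out - Y) / out.size           # dLoss/d(output)
d_h2 = (d_out @ W3.T) * (1 - h2 ** 2)      # backprop through tanh
d_h1 = (d_h2 @ W2.T) * (1 - h1 ** 2)
W3 -= lr * h2.T @ d_out; b3 -= lr * d_out.sum(0)
W2 -= lr * h1.T @ d_h2;  b2 -= lr * d_h2.sum(0)
W1 -= lr * X.T @ d_h1;   b1 -= lr * d_h1.sum(0)

loss_after = np.mean((forward(X)[2] - Y) ** 2)
print(loss_after < loss_before)  # -> True: the step reduced the loss
```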
Evaluation of Neural Networks
Conclusion
The improvement of the RMSE over simplistic methods when using neural networks indicates that the features in the data are not independent of each other.
Our earlier methods did not consider the inter-dependency of the features.
It makes intuitive sense that the data proved to be interdependent as the features of the face generally follow a certain pattern.
Facial keypoint recognition is a very important field, as it forms the initial step towards more advanced applications such as facial recognition and facial expression identification.
References
All the project data was obtained from kaggle.com
The dataset was in turn obtained from Dr. Yoshua Bengio, University of Montreal.
R packages and tutorials from the official site: http://www.r-project.org