Posted on 15-Jan-2016
Real-Time Face Detection and Tracking Using Multiple Cameras
RIT Computer Engineering Senior Design Project
John Ruppert, Justin Hnatow, Jared Holsopple
This project detects and tracks human faces in real time. It uses two cameras with
different zoom levels (one viewing the entire scene, the other zoomed in on a human
face) and can work through partial occlusion and slight changes in illumination.
Because it works in a hue-based color space, the system can track people of all
races. Using the two cameras together makes detection and tracking more robust,
at the cost of a more complex design.
The key elements of the design are the graphical user interface, the
communications algorithm, the face detection algorithm, the tracking algorithm, and
the camera view correspondence.
Hardware Configuration
Face Detection
Face Tracking
Camera View Correspondence
Software Configuration
Capture
To perform object tracking in real time, the team wanted an algorithm with low
computational cost. They chose the Continuously Adaptive Mean Shift (CAMSHIFT)
tracking algorithm, a modified version of the mean shift algorithm that adapts
the size of its search window from frame to frame.
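To illustrate how CAMSHIFT extends mean shift, here is a minimal pure-Python sketch; it is not the team's OpenCV implementation. It assumes a probability image stored as a 2D list `prob[y][x]` with values in [0, 1] and a square search window given as `(cx, cy, size)`. The function name, iteration limit, and convergence threshold are hypothetical. The mean shift loop repeatedly moves the window to the centroid of the probability mass inside it; the CAMSHIFT step then resizes the window from the zeroth moment M00, following Bradski's rule s = 2·sqrt(M00/256) for probabilities scaled 0..255, which is roughly 2·sqrt(M00) for probabilities in [0, 1].

```python
import math


def camshift_step(prob, window, max_iter=10, eps=0.5):
    """Run mean shift on a probability image, then adapt the window size.

    prob   -- 2D list of per-pixel skin probabilities in [0, 1], prob[y][x]
    window -- (cx, cy, size): center and side length of a square window
    Returns the updated (cx, cy, size).
    """
    h, w = len(prob), len(prob[0])
    cx, cy, size = window
    m00 = 0.0
    for _ in range(max_iter):
        half = size / 2.0
        x0, x1 = max(0, int(cx - half)), min(w, int(cx + half) + 1)
        y0, y1 = max(0, int(cy - half)), min(h, int(cy + half) + 1)
        # Zeroth and first moments of the probability mass inside the window.
        m00 = m10 = m01 = 0.0
        for y in range(y0, y1):
            for x in range(x0, x1):
                p = prob[y][x]
                m00 += p
                m10 += x * p
                m01 += y * p
        if m00 == 0.0:
            break  # no probability mass under the window: leave it where it is
        # Mean shift: move the window center to the centroid of the mass.
        new_cx, new_cy = m10 / m00, m01 / m00
        shift = math.hypot(new_cx - cx, new_cy - cy)
        cx, cy = new_cx, new_cy
        if shift < eps:
            break
    # CAMSHIFT: resize the window from the zeroth moment (floor of 3 pixels).
    if m00 > 0.0:
        size = max(3.0, 2.0 * math.sqrt(m00))
    return cx, cy, size
```

In a tracking loop, the window returned for one frame seeds the search in the next frame, which is what keeps the per-frame cost low.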
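[Figure: camera geometry. Top view (X and Z axes): SVC and OVC with a 1000 mm dimension, target at (XT, ZT), angle θ. Side view (Y and Z axes): target at (ZT, YT), angle φ.]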
The cameras are modeled using the pinhole camera model.
A shared coordinate system is introduced to translate region-of-interest
information between the cameras.
3D depth information is extracted from the 2D images using the relationship
between average face size and distance from the camera: the farther away a
face is, the smaller it appears in the image.
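Under the pinhole model, a face of real width W at distance Z projects to a width of w = f·W/Z pixels, where f is the focal length expressed in pixels, so Z = f·W/w. The sketch below inverts that relation and then back-projects a pixel into millimeters in the camera's frame. It is only an illustration: the 150 mm average face width, the focal length `focal_px`, the principal point `(cx, cy)`, and both function names are hypothetical placeholders, not values from the project.

```python
def face_depth_mm(face_width_px, focal_px, avg_face_width_mm=150.0):
    """Pinhole model: w = f * W / Z, hence Z = f * W / w (result in mm)."""
    if face_width_px <= 0:
        raise ValueError("face width in pixels must be positive")
    return focal_px * avg_face_width_mm / face_width_px


def pixel_to_camera_mm(u, v, depth_mm, focal_px, cx, cy):
    """Back-project pixel (u, v) at a known depth to camera coordinates in mm.

    (cx, cy) is the principal point; image v grows downward, so positive Y
    here points down in the scene.
    """
    x = (u - cx) * depth_mm / focal_px
    y = (v - cy) * depth_mm / focal_px
    return x, y, depth_mm


# A 60-pixel-wide face seen with an 800-pixel focal length.
z = face_depth_mm(60, 800)
print(z)                                                # 2000.0
print(pixel_to_camera_mm(420, 240, z, 800, 320, 240))  # (250.0, 0.0, 2000.0)
```

In practice the focal length in pixels plays the role of the pixel-to-millimeter mapping; it could be estimated from test images of a face at a measured distance.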
The pan and tilt angles that drive the OVC to the pixel center of the SVC's
region of interest are computed using geometric transformations and
pixel-to-millimeter mapping information extracted from test images.
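Once the face position is known in millimeters in the SVC's frame, the OVC's angles follow from basic trigonometry. This sketch rests on hypothetical assumptions: both cameras look along +Z, the OVC is offset from the SVC only along X, and the 1000 mm dimension in the diagram is that offset. With a pan-then-tilt head, pan is measured in the X-Z plane and tilt against the horizontal range to the target. The function name and sign conventions are illustrative, not the project's.

```python
import math


def ovc_pan_tilt(target_svc_mm, baseline_mm=1000.0):
    """Pan and tilt (degrees) that point the OVC at a target given in SVC coordinates.

    Assumes the OVC sits at (baseline_mm, 0, 0) in the SVC frame, both cameras
    looking along +Z. Positive pan turns toward +X; positive tilt toward +Y.
    """
    x, y, z = target_svc_mm
    dx = x - baseline_mm                                    # target relative to the OVC
    pan = math.degrees(math.atan2(dx, z))                   # rotation in the X-Z plane
    tilt = math.degrees(math.atan2(y, math.hypot(dx, z)))  # elevation after panning
    return pan, tilt


print(ovc_pan_tilt((1000.0, 0.0, 2000.0)))  # straight ahead of the OVC: (0.0, 0.0)
```

Using `atan2` rather than `atan` keeps the correct sign when the target is on either side of the OVC.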
Face detection was performed with a Support Vector Machine (SVM), a learning
machine that can classify complex patterns such as faces.
The face detector was trained on approximately 150 images of faces and
non-faces, each 20x20 pixels.
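The poster does not give the kernel or training parameters, and the project itself used SVM Light. As a stand-in, here is a minimal linear SVM trained with the Pegasos stochastic sub-gradient method on the hinge loss. It assumes each 20x20 image is flattened into a 400-element feature vector labeled +1 (face) or -1 (non-face). The function names, `lam`, and `epochs` are hypothetical, and a constant 1.0 feature stands in for the bias term.

```python
import random


def _augment(x):
    """Append a constant feature so the bias is learned as an ordinary weight."""
    return list(x) + [1.0]


def train_linear_svm(samples, labels, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    samples -- feature vectors (e.g. 400 pixel values of a 20x20 image)
    labels  -- +1 for face, -1 for non-face
    Returns the weight vector (last entry is the bias).
    """
    rng = random.Random(seed)
    data = [(_augment(x), y) for x, y in zip(samples, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Regularization shrinks w; a margin violation also pulls w toward y*x.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def decision_value(w, x):
    """Signed SVM score: positive means face; larger means more confident."""
    return sum(wj * xj for wj, xj in zip(w, _augment(x)))
```

A linear kernel keeps classification to one dot product per candidate region; a nonlinear kernel, as SVM Light also supports, may separate faces better at a higher cost.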
Before training and classification, each image was
converted to grayscale and resized to 20x20 pixels.
Its histogram was then equalized to normalize
brightness and increase contrast. Finally, the edges
of the image were masked to reduce background noise.
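The preprocessing steps can be sketched in plain Python on images stored as nested lists. This illustrates the steps under stated assumptions and is not the project's OpenCV code: grayscale uses the ITU-R BT.601 luma weights, resizing is nearest-neighbor, equalization maps each gray level through the normalized cumulative histogram, and the edge mask keeps an inscribed ellipse. The mask shape and all function names are hypothetical, since the poster does not describe the mask.

```python
def to_gray(rgb):
    """rgb: 2D list of (R, G, B) tuples in 0..255; BT.601 luma weights."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in rgb]


def resize_nearest(img, out_w=20, out_h=20):
    """Nearest-neighbor resize to out_w x out_h."""
    h, w = len(img), len(img[0])
    return [[img[y * h // out_h][x * w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def equalize(img, levels=256):
    """Histogram equalization: map each gray level through the normalized CDF."""
    flat = [p for row in img for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in img]
    lut = [round((c - cdf_min) * (levels - 1) / (n - cdf_min)) for c in cdf]
    return [[lut[p] for p in row] for row in img]


def mask_edges(img, fill=0):
    """Set pixels outside an inscribed ellipse to `fill` (hypothetical mask shape)."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ry, rx = h / 2.0, w / 2.0
    return [[p if ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0 else fill
             for x, p in enumerate(row)]
            for y, row in enumerate(img)]


def preprocess(rgb):
    """Grayscale -> 20x20 -> equalized -> masked, in the order described above."""
    return mask_edges(equalize(resize_nearest(to_gray(rgb))))
```

Equalizing before masking means the masked-out corners do not influence the brightness mapping of the face pixels, which matches the order of steps on the poster.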
[Figure: preprocessing stages: Original, Grayscale, Resized, Equalized, Masked]
[Figure: processing flow: Segment → Face Detect → Track with SVC → Track with OVC]
From the camera to the PC, the hardware used was:
•Sony EVI D100 color video cameras:
  •SVC: Scene View Camera with a wide-angle view
  •OVC: Object View Camera with a narrow-angle view
•2 PCs with:
  •Osprey 200 frame grabber cards
  •2 GB RAM
  •Dual 2.8 GHz Intel Xeon processors
Each PC ran Gentoo Linux with the Intel OpenCV library installed. The GUI,
also built with OpenCV, displays the current image annotated with detection
and tracking information, along with a command window listing the available
user interrupts.
Contributors and Resources
Contributors:
•Dr. Czernikowski – Thank you for your advice.
•Dr. Savakis – Thank you for the project idea, equipment, and advice.
•Paul Mezzanini – Thank you for administering our computers.
•Yuriy Luzanov – Thank you for your guidance.
•All the people who allowed us to take their pictures.
Resources:
•Intel OpenCV Library – http://www.intel.com/research/mrl/research/opencv/
•SVM Light - http://svmlight.joachims.org/
First, the frame grabber card captures an image from the camera.
Using OpenCV functions, the image is then converted from its native RGB
color space to the HSI (hue, saturation, intensity) color space. This is done
because the hue of human skin, across all skin pigmentations, falls within a
well-defined range.
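For reference, the standard geometric RGB-to-HSI conversion for a single pixel is sketched below, assuming RGB components scaled to [0, 1]. It is written out in plain Python for clarity and is not necessarily the exact conversion the team performed with OpenCV. Returning a hue of 0 for gray and black pixels, where hue is undefined, is an arbitrary choice.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert RGB in [0, 1] to (H in degrees, S in [0, 1], I in [0, 1])."""
    total = r + g + b
    i = total / 3.0
    if total == 0:
        return 0.0, 0.0, 0.0  # black: hue and saturation undefined
    s = 1.0 - 3.0 * min(r, g, b) / total
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        return 0.0, s, i  # gray: hue undefined
    # Clamp guards against rounding pushing the ratio just outside [-1, 1].
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    h = theta if b <= g else 360.0 - theta
    return h, s, i
```

Because intensity is separated from hue and saturation, a skin classifier that looks only at hue and saturation is less sensitive to brightness changes.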
After capture and conversion, image processing begins with skin color filtering by the skin segmentation algorithm.
Skin segmentation uses a 2-dimensional histogram of hue and saturation values generated from a sample set of skin images.
After everything except skin tones has been filtered out of the image, a scaled-down black-and-white image, the density map,
is created. Each density-map pixel covers a 4x4 region of the original image, so the map is 4x smaller in each dimension;
a pixel is set to black or white depending on the percentage of skin pixels in its 4x4 region. A connected components
algorithm is then run on the density map to bound the regions of skin, producing rectangular skin-tone regions.
Each region is run through the face detection algorithm, which classifies it as either a face or a non-face and
indicates how confident that classification is.
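The three stages of this step (histogram lookup, density map, connected components) can be sketched as follows. Most of the details are illustrative assumptions: the bin counts, the skin threshold, the 50% block threshold, 4-connectivity, and the function names are hypothetical, since the poster gives only the 4x4 block size. Pixels are (hue in degrees, saturation in [0, 1]) pairs, and the returned boxes are in density-map coordinates (multiply by 4 for the original image).

```python
from collections import deque

H_BINS, S_BINS = 30, 32  # hypothetical histogram resolution


def hs_bin(h, s):
    """Map hue (degrees) and saturation (0-1) to 2D histogram indices."""
    return min(int(h / 360.0 * H_BINS), H_BINS - 1), min(int(s * S_BINS), S_BINS - 1)


def build_skin_histogram(skin_samples):
    """Hue-saturation histogram of known skin pixels, scaled so the peak bin is 1."""
    hist = [[0.0] * S_BINS for _ in range(H_BINS)]
    for h, s in skin_samples:
        hb, sb = hs_bin(h, s)
        hist[hb][sb] += 1
    peak = max(max(row) for row in hist) or 1.0
    return [[c / peak for c in row] for row in hist]


def is_skin(pixel, hist, threshold=0.1):
    hb, sb = hs_bin(*pixel)
    return hist[hb][sb] >= threshold


def density_map(hs_image, hist, block=4, min_fraction=0.5):
    """One output pixel per block x block region; True (white) when enough of it is skin."""
    rows, cols = len(hs_image) // block, len(hs_image[0]) // block
    return [[sum(is_skin(hs_image[by * block + dy][bx * block + dx], hist)
                 for dy in range(block) for dx in range(block))
             >= min_fraction * block * block
             for bx in range(cols)]
            for by in range(rows)]


def skin_regions(dmap):
    """Bounding boxes (x0, y0, x1, y1), inclusive, of 4-connected white regions."""
    h, w = len(dmap), len(dmap[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not dmap[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(x, y)])
            x0, y0, x1, y1 = x, y, x, y
            while queue:  # breadth-first flood fill of one component
                cx, cy = queue.popleft()
                x0, y0, x1, y1 = min(x0, cx), min(y0, cy), max(x1, cx), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and dmap[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes
```

Running the connected components on the density map rather than the full-resolution mask shrinks the labeling work by a factor of 16 and smooths away isolated skin-colored pixels.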
The region that the face detection algorithm classifies as a face with the
highest confidence is passed to the tracking algorithm on the SVC. The SVC
tracks the face until it leaves the scene or becomes occluded.
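Selecting the region to hand to the tracker reduces to an arg-max over the detector's scores. In this sketch, `score` is a hypothetical callable that returns a signed SVM decision value for a region, where positive means face, as in a typical SVM decision function. Regions that are not classified as faces are never selected.

```python
def most_confident_face(regions, score):
    """Return the region with the largest positive decision value, or None."""
    best, best_value = None, 0.0
    for region in regions:
        value = score(region)
        if value > best_value:  # must beat both 0 (non-face) and the current best
            best, best_value = region, value
    return best


scores = {"left": -0.8, "center": 1.7, "right": 0.4}
print(most_confident_face(scores, scores.get))  # center
```

Returning `None` when no region scores above zero lets the caller keep searching in the next frame instead of tracking a false positive.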
After the SVC begins tracking the face, it
transmits the face's coordinates to the OVC.
The OVC converts the coordinates to its own
coordinate system, moves to find the face,
and begins tracking it at a higher zoom level.
If the OVC loses the face, it notifies the SVC
and waits for a packet containing the face's
latest coordinates. It then re-centers its
view on the face and resumes tracking.
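The poster does not specify the message format or the transport between the two PCs. A minimal sketch of this exchange, using Python's `struct` module, packs a message type and the face box into a fixed-size binary packet. The 9-byte layout, the message codes, and the function names below are all hypothetical.

```python
import struct

# Hypothetical wire format: 1-byte message type followed by the face box
# (x, y, width, height) in SVC pixels, as big-endian unsigned 16-bit integers.
PACKET = struct.Struct("!BHHHH")
FACE_COORDS = 1  # SVC -> OVC: latest face position
FACE_LOST = 2    # OVC -> SVC: face lost, please resend coordinates


def encode(msg_type, box=(0, 0, 0, 0)):
    return PACKET.pack(msg_type, *box)


def decode(packet):
    msg_type, x, y, w, h = PACKET.unpack(packet)
    return msg_type, (x, y, w, h)


def svc_handle(packet, latest_box):
    """SVC side: answer a FACE_LOST notice with the most recent face box."""
    msg_type, _ = decode(packet)
    if msg_type == FACE_LOST:
        return encode(FACE_COORDS, latest_box)
    return None


reply = svc_handle(encode(FACE_LOST), (120, 80, 40, 48))
print(PACKET.size, decode(reply))  # 9 (1, (120, 80, 40, 48))
```

A fixed-size, explicitly big-endian layout ("!") means both PCs decode the bytes identically regardless of how they are sent, for example over a TCP socket.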
The CAMSHIFT algorithm uses a 3-dimensional histogram of the hue, saturation,
and intensity values of a training set. From this histogram, a grayscale image
(the backprojection) is generated in which each pixel's value represents the
probability that the pixel contains skin. This image is used to locate and resize
a search window that tracks the object of interest through successive frames.
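Histogram backprojection can be sketched in a few lines. The sketch assumes pixels already converted to (hue in degrees, saturation, intensity), with saturation and intensity in [0, 1], and uses a hypothetical 8 bins per channel. The histogram is scaled so its fullest bin is 1.0, so the output is a relative skin likelihood rather than a calibrated probability. Names and bin counts are illustrative.

```python
BINS = 8  # hypothetical: 8 bins per channel, 512 bins in total


def _quantize(v):
    """Map a value in [0, 1] to a bin index in 0..BINS-1."""
    return min(int(v * BINS), BINS - 1)


def hsi_bin(h, s, i):
    return _quantize(h / 360.0), _quantize(s), _quantize(i)


def build_histogram(training_pixels):
    """3D hue-saturation-intensity histogram, scaled so the fullest bin is 1.0."""
    counts = {}
    for pixel in training_pixels:
        key = hsi_bin(*pixel)
        counts[key] = counts.get(key, 0) + 1
    peak = max(counts.values())
    return {key: c / peak for key, c in counts.items()}


def backproject(hsi_image, hist):
    """Replace each pixel by its bin's histogram value (0.0 for unseen bins)."""
    return [[hist.get(hsi_bin(*pixel), 0.0) for pixel in row] for row in hsi_image]
```

A dictionary keyed by bin index stores only the bins that the training set actually populates, which keeps a 3D histogram compact.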
[Figures: System Setup; SVC View; OVC View; Backprojection and Original images]