

Real-Time Face Detection and Tracking Using Multiple Cameras

RIT Computer Engineering Senior Design Project

John Ruppert, Justin Hnatow, Jared Holsopple

This project detects and tracks human faces in real time. Using two cameras with different zoom levels (one viewing the entire scene, one zoomed in on a human face), it works through partial occlusion and slight illumination changes. Because of the color space that was used, the system is able to track people of all races. Using multiple cameras in conjunction makes detection and tracking more robust, at the cost of a more complex design. The key elements of the design are the graphical user interface, the communications algorithm, the face detection algorithm, the tracking algorithm, and the camera view correspondence.

Hardware Configuration

Face Detection

Face Tracking

Camera View Correspondence

Software Configuration

Capture

To track objects in real time, a computationally inexpensive algorithm was needed. The algorithm chosen was the Continuously Adaptive Mean Shift (CAMSHIFT) tracking algorithm, a modified version of the mean shift algorithm.

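[Figure: camera geometry diagrams. Labels: SVC, OVC, 1000 mm; view in the X-Z plane with angle θ and target point (XT, ZT); view in the Y-Z plane with angle φ and target point (ZT, YT).]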

The cameras are modeled using the pinhole camera model.

A coordinate system is introduced for the translation of region of interest information between cameras.

3D depth information is extracted from 2D images based on the relationship between average face size and distance from the camera.

Parameter computation for driving the pan and tilt angles of the OVC to the pixel center of the region of interest of the SVC is accomplished using geometric transformations and pixel-to-millimeter mapping information extracted from test images.
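The view correspondence described above can be sketched with the pinhole relations. This is a minimal illustration, not the project's calibration: the focal length, the average face width, the 1000 mm offset between cameras along X, the parallel optical axes, and all function names are assumptions made for the example.

```python
import math

# All constants below are illustrative assumptions, not values from the poster.
AVG_FACE_WIDTH_MM = 150.0   # assumed average face width
FOCAL_LENGTH_PX = 800.0     # assumed SVC focal length, in pixels
BASELINE_MM = 1000.0        # assumed SVC-to-OVC offset along the X axis


def estimate_depth_mm(face_width_px):
    """Pinhole model: an object of width W at depth Z spans w = f * W / Z pixels."""
    return FOCAL_LENGTH_PX * AVG_FACE_WIDTH_MM / face_width_px


def svc_pixel_to_world(u_px, v_px, face_width_px, cx, cy):
    """Back-project the face center (u, v) to (X, Y, Z) in mm in the SVC frame.

    (cx, cy) is the principal point; axis sign conventions are assumed.
    """
    z = estimate_depth_mm(face_width_px)
    x = (u_px - cx) * z / FOCAL_LENGTH_PX
    y = (v_px - cy) * z / FOCAL_LENGTH_PX
    return x, y, z


def ovc_pan_tilt_deg(x, y, z):
    """Pan and tilt (degrees) that point an OVC at (BASELINE_MM, 0, 0) toward the target."""
    xt = x - BASELINE_MM                     # target X relative to the OVC
    pan = math.degrees(math.atan2(xt, z))    # rotation in the X-Z plane
    tilt = math.degrees(math.atan2(y, math.hypot(xt, z)))  # elevation after panning
    return pan, tilt
```

For example, a 120-pixel-wide face at the SVC image center gives a depth of 1000 mm under these constants, and the OVC would pan 45 degrees toward the SVC side with no tilt.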

Face detection was done using a Support Vector Machine (SVM), a learning machine that can classify complex patterns such as faces.

The SVM was first trained using approximately 150 images, each 20x20 pixels, of both faces and non-faces.

Before classification and training, each image was converted to grayscale and resized to 20x20 pixels. The histogram of the image was then equalized to normalize brightness and increase contrast. Finally, the image was masked at the edges to reduce background noise.
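The preprocessing steps above can be sketched in plain Python on nested lists. This is only an illustration of the sequence (grayscale, resize, equalize, mask): the project used OpenCV, the nearest-neighbor resize is a simplification, and the elliptical mask shape is an assumption, since the poster does not describe the mask.

```python
def to_gray(rgb):
    """Convert rows of (r, g, b) tuples to 8-bit luminance values."""
    return [[(r * 299 + g * 587 + b * 114) // 1000 for (r, g, b) in row] for row in rgb]


def resize_nearest(img, size=20):
    """Nearest-neighbor resize to size x size (a simplification for the sketch)."""
    h, w = len(img), len(img[0])
    return [[img[y * h // size][x * w // size] for x in range(size)] for y in range(size)]


def equalize(img):
    """Histogram equalization: map each gray level through the normalized CDF."""
    flat = [p for row in img for p in row]
    hist = [0] * 256
    for p in flat:
        hist[p] += 1
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in img]
    lut = [max(0, round((c - cdf_min) * 255 / (n - cdf_min))) for c in cdf]
    return [[lut[p] for p in row] for row in img]


def mask_edges(img):
    """Zero the pixels outside an inscribed circle (assumed mask shape)."""
    size = len(img)
    center = (size - 1) / 2
    radius = size / 2
    return [[p if (x - center) ** 2 + (y - center) ** 2 <= radius ** 2 else 0
             for x, p in enumerate(row)] for y, row in enumerate(img)]


def preprocess(rgb):
    """Return the 400-element feature vector for a candidate region."""
    img = mask_edges(equalize(resize_nearest(to_gray(rgb))))
    return [p for row in img for p in row]
```

The flattened 400-value vector is the kind of input a classifier trained on 20x20 images would expect.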

[Figure: image preprocessing stages: Original, Grayscale, Resized, Equalized, Masked]

[Flowchart steps: Capture, Segment, Face Detect, Track with SVC, Track with OVC]

From the camera to the PC, the hardware used was:

•Sony EVI D100 Color Video Cameras
  •SVC – Scene View Camera with wide-angle view
  •OVC – Object View Camera with narrow-angle view
•2 PCs with:
  •Osprey 200 frame grabber cards
  •2 GB RAM
  •Dual 2.8 GHz Intel Xeons

Each PC ran Gentoo Linux, and the Intel OpenCV libraries were installed on both PCs. A GUI was created using OpenCV. It displays the current image with detection and tracking information, along with a command window listing the available user interrupts.

Contributors and Resources

Contributors:

•Dr. Czernikowski – Thank you for your advice.

•Dr. Savakis – Thank you for the project idea, equipment, and advice.

•Paul Mezzanini – Thank you for administering our computers.

•Yuriy Luzanov – Thank you for your guidance.

•All the people who allowed us to take their pictures.

Resources:

•Intel OpenCV Library – http://www.intel.com/research/mrl/research/opencv/

•SVM Light – http://svmlight.joachims.org/

First, the image is captured from the camera by the frame grabber card. Using OpenCV functions, the image is then converted from its native RGB color space to the HSI color space. This is done because the hue of pigmented human skin falls within a well-defined range.
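To make the color conversion concrete, here is a minimal per-pixel sketch using the standard textbook RGB-to-HSI formulas. The poster does not say which OpenCV conversion was called, so the function name and the output ranges below are assumptions for illustration.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert 8-bit RGB to (hue in degrees [0, 360), saturation [0, 1], intensity [0, 1])."""
    r, g, b = r / 255.0, g / 255.0, b / 255.0
    intensity = (r + g + b) / 3.0
    if intensity == 0:
        return 0.0, 0.0, 0.0  # black: hue and saturation are undefined, report 0
    saturation = 1.0 - min(r, g, b) / intensity
    numerator = 0.5 * ((r - g) + (r - b))
    denominator = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if denominator == 0:
        return 0.0, saturation, intensity  # gray: hue is undefined, report 0
    # Clamp to guard against rounding just outside acos's domain.
    theta = math.degrees(math.acos(max(-1.0, min(1.0, numerator / denominator))))
    hue = theta if b <= g else (360.0 - theta) % 360.0
    return hue, saturation, intensity
```

Because hue is separated from intensity in this representation, a filter on hue is less sensitive to brightness than a filter on raw RGB values.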

After the image has been captured and converted, image processing begins. The image first goes through skin color filtering with the skin segmentation algorithm. Skin segmentation uses a 2-dimensional histogram of hue and saturation values that was generated from a sample set of skin images. After everything but skin tones is filtered from the image, a scaled black-and-white image is created. This density map is 4x smaller than the original image: each of its pixels is set to black or white depending on the percentage of skin pixels in the corresponding 4x4 region. A connected components algorithm is run on the density map to bound the regions of skin.

Once the connected components algorithm has been run, rectangular regions of skin tones are generated. Each of these is run through the face detection algorithm to determine whether it is a region of interest. The face detection algorithm classifies each region as either a face or a non-face.
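The density map and connected components steps can be sketched as follows. The input is assumed to be a binary skin mask already produced by the histogram filter; the 50% threshold, the 4-connectivity, and the function names are assumptions, since the poster does not specify them.

```python
from collections import deque


def density_map(skin_mask, block=4, min_fraction=0.5):
    """Collapse each block x block region to 1 if enough of its pixels are skin."""
    h, w = len(skin_mask), len(skin_mask[0])
    out = []
    for by in range(0, h - h % block, block):
        row = []
        for bx in range(0, w - w % block, block):
            count = sum(skin_mask[y][x]
                        for y in range(by, by + block)
                        for x in range(bx, bx + block))
            row.append(1 if count >= min_fraction * block * block else 0)
        out.append(row)
    return out


def bounding_boxes(dmap, block=4):
    """Label 4-connected components and return (x, y, w, h) boxes in original pixels."""
    h, w = len(dmap), len(dmap[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if not dmap[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue = deque([(sx, sy)])
            x0 = x1 = sx
            y0 = y1 = sy
            while queue:  # breadth-first flood fill of one component
                x, y = queue.popleft()
                x0, x1 = min(x0, x), max(x1, x)
                y0, y1 = min(y0, y), max(y1, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and dmap[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0 * block, y0 * block,
                          (x1 - x0 + 1) * block, (y1 - y0 + 1) * block))
    return boxes
```

Working on the reduced map means the flood fill touches 16 times fewer pixels than it would on the full image, which suits the real-time goal.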

The region that the face detection algorithm classifies as a face with the highest confidence is passed to the tracking algorithm on the SVC. The SVC tracks the face until it leaves the scene or becomes occluded.
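One way to read "most confidently classified" is to compare SVM decision values across candidate regions. The sketch below assumes a linear decision function f(x) = w·x + b and treats a larger positive value as higher confidence; the poster does not state the kernel or the confidence measure, and the names here are hypothetical.

```python
def svm_score(weights, bias, features):
    """Linear SVM decision value; positive means 'face' under the assumed convention."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def most_confident_face(regions, weights, bias):
    """Return the box with the highest positive score, or None if no region is a face.

    regions is a list of (box, feature_vector) pairs.
    """
    best_box, best_score = None, 0.0
    for box, features in regions:
        score = svm_score(weights, bias, features)
        if score > best_score:
            best_box, best_score = box, score
    return best_box
```

Returning None when every score is non-positive lets the caller skip tracking for that frame instead of following a non-face region.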

After the SVC begins to track the face, it transmits the coordinates of the face to the OVC. The OVC then converts the coordinates to its own coordinate system, moves to find the face, and begins tracking the face with a higher zoom level. If it loses the face, it will notify the SVC and wait for a packet containing the latest coordinates of the face. It will then re-center the camera’s view on the face and, once again, begin tracking.
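The exchange between the two cameras can be sketched as a small message format plus a two-state machine on the OVC side. The packet layout, message codes, and class name are hypothetical; the poster does not document the actual communications algorithm.

```python
import struct

# Hypothetical wire format: message type, then x, y, width, height (network byte order).
PACKET_FORMAT = "!BHHHH"
MSG_FACE_COORDS = 1   # SVC -> OVC: latest face region
MSG_FACE_LOST = 2     # OVC -> SVC: face lost, please resend coordinates


def pack_face_coords(x, y, w, h):
    return struct.pack(PACKET_FORMAT, MSG_FACE_COORDS, x, y, w, h)


def pack_face_lost():
    return struct.pack(PACKET_FORMAT, MSG_FACE_LOST, 0, 0, 0, 0)


def unpack(packet):
    msg_type, x, y, w, h = struct.unpack(PACKET_FORMAT, packet)
    return msg_type, (x, y, w, h)


class OvcState:
    """OVC side: wait for coordinates, track, and report when the face is lost."""

    def __init__(self):
        self.state = "WAITING"
        self.target = None

    def on_packet(self, packet):
        msg_type, box = unpack(packet)
        if msg_type == MSG_FACE_COORDS and self.state == "WAITING":
            self.target = box          # would be converted to OVC coordinates here
            self.state = "TRACKING"

    def on_track_result(self, found):
        """Return a packet to send to the SVC, or None if nothing needs sending."""
        if self.state == "TRACKING" and not found:
            self.state = "WAITING"
            return pack_face_lost()
        return None
```

Keeping the OVC in a waiting state until fresh coordinates arrive mirrors the behavior described above, where the OVC re-centers only after the SVC replies.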

The tracking algorithm uses a 3-dimensional histogram based on the hue, saturation, and intensity values of a training set. Based on the histogram, a grayscale backprojection image is generated in which each pixel represents the probability that the pixel contains skin. This image is used to find and resize a search window that tracks the object of interest through successive frames.
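The search-window update can be sketched in plain Python. The code below takes an already computed probability image, runs mean shift to move the window onto the local centroid, then resizes it from the total probability mass. The sizing rule (side = 2·sqrt(M00) for probabilities in [0, 1]) is a commonly cited CAMSHIFT heuristic and is an assumption here, as are the function names; the project itself relied on OpenCV.

```python
import math


def mean_shift(prob, window, max_iter=10):
    """Shift window (x, y, w, h) until it is centered on the probability centroid.

    Returns the converged window and the zeroth moment (probability mass) inside it.
    """
    rows, cols = len(prob), len(prob[0])
    x, y, w, h = window
    m00 = 0.0
    for _ in range(max_iter):
        m00 = m10 = m01 = 0.0
        for py in range(max(0, y), min(rows, y + h)):
            for px in range(max(0, x), min(cols, x + w)):
                p = prob[py][px]
                m00 += p
                m10 += p * px
                m01 += p * py
        if m00 == 0:
            break  # no skin probability under the window: target lost
        new_x = int(round(m10 / m00 - (w - 1) / 2.0))
        new_y = int(round(m01 / m00 - (h - 1) / 2.0))
        if (new_x, new_y) == (x, y):
            break  # converged
        x, y = new_x, new_y
    return (x, y, w, h), m00


def camshift_step(prob, window):
    """One tracking step: mean shift, then resize the window from the mass found."""
    (x, y, w, h), m00 = mean_shift(prob, window)
    if m00 == 0:
        return window  # keep the previous window when nothing is found
    side = max(1, int(round(2.0 * math.sqrt(m00))))  # assumed sizing heuristic
    cx = x + (w - 1) / 2.0
    cy = y + (h - 1) / 2.0
    return (int(round(cx - (side - 1) / 2.0)),
            int(round(cy - (side - 1) / 2.0)), side, side)
```

Calling camshift_step once per frame, feeding each result back in as the next starting window, gives the frame-to-frame tracking loop.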

[Figures: System Setup; SVC View; OVC View; Backprojection; Original]