Personal Access Control System Using Moving Object Detection and Face Recognition

Vesna Zeljković1, Du Zhang1, Ventzeslav Valev2*, Zhongyu Zhang1, Shengjie Zhu1, Junjie Li1

1 School of Engineering & Computing Sciences, New York Institute of Technology, Nanjing Campus, USA, [email protected]
2 IAPR Fellow, School of Computing, University of North Florida, Florida, USA
* Associate member of the Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia, Bulgaria, Email:
Abstract — A real-time automated personal access control system is proposed that detects moving objects and localizes, extracts, and recognizes their faces in real image sequences. The described method encompasses two important issues in personal access control systems that have received increased attention over the years: moving object detection and face recognition. It is tested on video of a personal access controlled area. The efficiency of the described system is illustrated on four real-world interior video sequences recorded in a mixed indoor/outdoor environment with slight illumination changes.
Keywords: Moving Object Detection, Moving Object Tracking,
Face Recognition, Image Processing.
I INTRODUCTION
Real-time detection of moving objects in a video sequence [1, 2] has become an important research topic in the computer vision and video processing field, and the number of visual surveillance systems has greatly increased in recent years. In response to the growing demand for computer vision tools in the latest generations of consumer electronic devices equipped with smart cameras, these systems have developed into intelligent systems that automatically detect, track, and recognize objects in video.
There is great interest in this kind of system because of its wide applicability and broad spectrum of uses. Such systems have practical application in traffic control systems, surveillance systems, robot vision, securing various indoor and outdoor facilities, and even more complex tasks such as human face recognition.
Face recognition technology has received a great deal of attention over the decades in the fields of image analysis and computer vision. It has been studied by scientists from various areas of the psychophysical and computer sciences. Psychologists and neuroscientists analyze human perception, while engineers study the computational aspects of face recognition through machine recognition of human faces. Even though face recognition is a natural human ability, developing a mathematical algorithm that performs face recognition is one of the most challenging tasks in computer vision.
Various face recognition techniques have been developed, including image-based, video-based, appearance-based, model-based, and 2D and 3D face recognition algorithms. Likewise, many methods have been proposed for moving object detection and tracking: methods based on edge, color, and texture information; methods that model both background and foreground with spatial-temporal reference data; non-parametric algorithms; various techniques for fixed background segmentation based on a changing decision threshold; etc.
Due to the unpredictable characteristics of objects in blurry and foggy videos, caused by a variety of factors, automatic moving object detection and tracking and face recognition remain very challenging tasks in video surveillance applications.
The rest of the paper is organized as follows. A review of the most recent studies in the field of moving object detection and face recognition is presented in the second section. The third section describes the mathematical formulation of the proposed moving object detection and face recognition methods. The proposed techniques for moving object detection, face extraction, and face recognition are applied and tested on real-life video sequences, and the obtained results are presented in the fourth section, followed by concluding remarks in the final section.
II REVIEW OF MOVING OBJECT DETECTION AND
FACE RECOGNITION METHODS
In [3], a real-time implementation of an optimized spatio-temporal nonparametric moving object detection method is presented. The kernel bandwidths required to model the background are dynamically estimated, the background model is selectively updated, and smart cooperation between a device's central and graphics processing units is implemented, with extensive use of the texture mapping and filtering units of the latter, including a novel method for fast evaluation of Gaussian functions.
A motion detection approach based on the cerebellar model articulation controller, realized through artificial neural networks, is presented in [4] to detect moving objects in high and low bit-rate video streams. The proposed approach consists of a
probabilistic background generation module, which produces a probabilistic background model through an unsupervised learning process over variable bit-rate video streams, and a moving object detection module, which is based on the Cerebellar Model Articulation Controller (CMAC) network and detects moving objects through a block selection procedure and an object detection procedure.
A three-term low-rank matrix decomposition approach is proposed in [5], in which a turbulence sequence is decomposed into the background, the turbulence, and the object. This extremely difficult problem of simultaneous turbulence mitigation and moving object detection is simplified to a minimization involving the nuclear norm, the Frobenius norm, and the ℓ2,1 norm, based on two observations: 1) the turbulence causes dense, Gaussian-like noise and therefore can be captured by the Frobenius norm, while the moving objects are sparse and thus can be captured by the ℓ2,1 norm; 2) since the object's motion is linear and intrinsically different from the Gaussian-like turbulence, a Gaussian-based turbulence model can be employed to enforce an additional constraint on the search space of the minimization.
The problem of moving object detection in aerial video, where the moving object must be detected against a moving background, is addressed in [6]. The motion of the background is modeled using the Gaussian mixture model framework, and the optical flow between every two adjacent frames is computed to obtain motion information for each pixel. The idea in [7] is to capture a series of video pictures at regular intervals and use them to describe the vector information of the region; the segmented frames are converted into color or grayscale images for better performance. An algorithm for moving object detection based on log Gabor filter and dominant eigen map approaches is described in [8], where the moving object is detected and tracked by connected component analysis and centroid manipulation.
A method for moving object detection under a moving camera that utilizes the rank minimization framework for alignment and moving object detection is proposed in [9]. In [10], region shrinking is applied to the region with a high density of motion pixels in a binary difference image in order to detect and locate moving objects. If there is more than one object in an image, a power transformation is applied to enhance objects at different positions. The feature rectangle of an object is derived and used in further tracking.
The design of a background image subtraction method and its Field Programmable Gate Array (FPGA) implementation for moving object detection in surveillance video applications with high resolution frames of 720×480 pixels is described in [11]. A moving object detection and retrieval model that integrates the spatial and temporal information in video sequences and uses the integral density method to quickly identify the motion regions in an unsupervised way is proposed in [12]. Key information locations in video frames are obtained as maxima and minima of the difference of Gaussian function, and a motion map of adjacent frames is obtained from the diversity of the outcomes of a simultaneous partition and class parameter estimation framework. The motion map filters key information locations into key motion locations, where the existence of moving objects is implied. Besides showing the motion zones, the motion map also indicates the motion direction, which guides the proposed integral density approach to quickly and accurately locate the motion regions.
In [13], an algorithm is presented that derives fuzzy rules to merge the detected bounding boxes into a unique cluster bounding box that covers a unique object, defining the relationships of a pair of boxes by their geometrical affinity, their motion cohesion, and their appearance similarity. A review of different face recognition techniques is given in [14], where it is demonstrated that the performance of the applied image processing technique is highly dependent on the type of pre-processing steps used and that the equal error rates of the eigenface and Fisher-face methods can be reduced.
The authors in [15] study the influence of demographics on the performance and recognition accuracy of six different face recognition algorithms: three commercial, two non-trainable, and one trainable. Experimental results demonstrate that the matching accuracy for race/ethnicity and age cohorts can be improved by training exclusively on the specific cohort, which leads to a dynamic face matcher selection scenario, in which multiple face recognition algorithms (each trained on a different demographic cohort) are available for selection by a biometric system operator based on the demographic information extracted from a probe image. It is shown that an alternative to dynamic face matcher selection is to train face recognition algorithms on datasets that are evenly distributed across demographics.
In [16], a solution for illumination invariant face recognition for indoor applications is presented. It uses an active near infrared imaging system that is able to produce good quality face images regardless of the visible light in the environment; statistical learning algorithms that extract the most discriminative features from a large pool of invariant local binary pattern features, on the basis of which a highly accurate face matching engine is constructed; and a system that is able to achieve accurate and fast face recognition in practice.
An adaptive approach to the illumination invariant face recognition problem is presented in [17]; it uses image quality to adaptively select fusion parameters for wavelet-based multi-stream face recognition and applies global and regional illumination normalization procedures. The multi-resolution property of wavelet transforms is utilized for extracting facial feature descriptors at different scales and frequencies, as it is shown that high-frequency
wavelet sub-bands provide illumination
invariant face descriptors.
In [18], a comparative study of several conventional face recognition methods, such as Principal Component Analysis (PCA) related to eigenfaces and Radial Basis Function (RBF) networks, and novel kernel methods such as Kernel Principal Component Analysis (KPCA) and Support Vector Machines (SVM), is provided; these methods are suitable to work as part of a multimodal interface that interacts with a Hybrid Broadcast Broadband Television (HBB-TV) user. The influence of noise and partial occlusion on face recognition accuracy is evaluated, with a special focus on occlusions of the eyes and eyebrows. An automatic face recognition system is designed in [19] that uses frontal images represented by gray level, Local Binary Pattern (LBP), Local Ternary Pattern (LTP), and two-dimensional Gabor filter features. It consists of an alignment process, which includes face detection, eye detection, and mapping of the center coordinates of the eyes to a standard face template, and classification of the aligned faces, which is performed in a fully automatic manner.
A theory for constructing linear subspace approximations to face recognition algorithms is presented in [20], and the adequacy of the linear model, specified in terms of a linear subspace spanned by non-orthogonal vectors, is empirically demonstrated using six different face recognition algorithms, spanning template-based and feature-based approaches, with a complete separation of the training and test sets. A face recognition algorithm robust to large-scale changes in facial pose and lighting conditions is described in [21]; it applies a 2D-to-3D face model and a self-principal component analysis method based on bit-plane feature fusion.
III ALGORITHM FOR PERSONAL ACCESS CONTROL
SYSTEM
The proposed personal access control system is a face recognition system that could have a wide spectrum of applications in video coding, video conferencing, crowd surveillance, human-computer interfaces, and controlled entrance to secured buildings and areas.
The proposed personal access control system is shown in
Figure 1.
Figure 1. Proposed personal access control system
The input video is captured by a fixed camera placed on the ceiling of the video surveilled area, facing the entrance. The camera output is connected to a CPU that hosts the proposed system and processes the recorded video. The personal access control system consists of three major blocks, which perform moving object detection in the input video, face extraction from the detected person, and finally face identification, which gives the required attendance survey.
We propose a moving object detection algorithm that is resistant to slight illumination changes. Every tenth frame is processed in order to enable real-time implementation of the proposed personal access control system. Every tenth frame is subtracted from the background frame of the video, which is usually the first frame of the recorded video and does not contain any moving objects. The elapsed time between subtracted frames is chosen empirically; this parameter depends on the maximum speed of the moving objects, their distance from the camera, and the frame rate. The derivative of the mean value is calculated for every subtraction frame and analyzed in time, see Figure 3. The derivative indicates the rate of change of the frame's content. If the derivative value is lower than a predetermined threshold, it can be concluded that a moving object is detected in the analyzed frame and is leaving the scene. Several frames before the one in which the moving object is identified are stored in order to facilitate the face recognition stage.
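The detection stage just described can be summarized in a short sketch. The code below is illustrative only, assuming OpenCV and NumPy and grayscale subtraction; the file name, frame step, and threshold value are placeholders rather than the authors' exact parameters.

```python
# Minimal sketch of the frame-differencing / mean-derivative stage described above.
import cv2
import numpy as np

FRAME_STEP = 10          # every tenth frame is processed
DERIV_THRESHOLD = -2.0   # illustrative empirical threshold on the mean-value derivative

cap = cv2.VideoCapture("entrance_video.avi")              # hypothetical input file
ok, background = cap.read()                               # first frame: scene without moving objects
background = cv2.cvtColor(background, cv2.COLOR_BGR2GRAY).astype(np.float32)

means, detections = [], []
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame_idx += 1
    if frame_idx % FRAME_STEP:
        continue                                          # skip all but every tenth frame
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    diff = np.abs(gray - background)                      # subtraction frame
    means.append(diff.mean())                             # mean value of the subtraction frame
    if len(means) >= 2:
        derivative = means[-1] - means[-2]                # rate of change of the mean value
        if derivative < DERIV_THRESHOLD:                  # negative peak below the threshold
            detections.append(frame_idx)                  # store index; nearby frames would also be kept
cap.release()
print("frames with detected moving objects:", detections)
```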
The face extraction phase is realized by the face detection system described in [22], which uses illumination insensitive features obtained from the local successive mean quantization transform (SMQT) and rapid detection achieved by the split up sparse network of Winnows (SNoW) classifier, a learning architecture specifically tailored for learning in the presence of a very large number of features that can be used as a general purpose multi-class classifier. The local SMQT features provide illumination and sensor insensitive operation in object recognition, the split up SNoW classifier is used to speed up the original classifier, and finally the features and the classifier are combined for the task of frontal face detection.
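For illustration of where this step sits in the pipeline, the sketch below crops face regions from a stored detection frame. OpenCV's bundled Haar cascade detector is used here only as a generic stand-in for the SMQT/SNoW detector of [22], and the function name is hypothetical.

```python
# Illustrative face-extraction step (generic detector standing in for SMQT/SNoW [22]).
import cv2

cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

def extract_faces(frame):
    """Return the face regions cropped from a stored detection frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [gray[y:y + h, x:x + w] for (x, y, w, h) in boxes]
```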
After the face is extracted from the detected moving object, the face recognition procedure is implemented by applying principal component analysis, as described in [23-25].
M images of m × n resolution comprise the face training set, where every face image is represented as an m·n = s-dimensional vector. Principal component analysis finds a new t-dimensional subspace whose basis vectors correspond to the maximum variance directions in the original face image space. This new subspace, called the face space, has a much lower dimension than the initial face image space, i.e. t is much smaller than s. All images of known faces are projected onto the face space to find the sets of weights that describe the contribution of each basis vector, and an unknown face image is also projected onto the face space to obtain its set of weights. The unknown face image is identified by comparing its set of weights to the sets of weights of the known faces.
If we consider the image elements as random variables, the principal component analysis basis vectors are defined as the eigenvectors of the scatter matrix $S_T$:

$$S_T = \sum_{i=1}^{M} (x_i - \mu)(x_i - \mu)^T$$
where µ is the mean of all face images in the training set and x_i is the i-th image with its columns concatenated into a vector. The projection matrix W_PCA is composed of the t eigenvectors corresponding to the t largest eigenvalues, thus creating a t-dimensional face space. Since these eigenvectors, which represent the principal component analysis basis vectors, look like ghostly faces, they are conveniently called eigenfaces.
The output of the face recognition algorithm gives the
detected person’s identification if her/his image is contained
in the face database.
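The eigenface computation and matching described above can be sketched as follows. This is an illustrative implementation of standard PCA face recognition in the spirit of [23-25], not the authors' code; the number of components t and all names are placeholders, and the reduction to the smaller M × M eigenproblem is the usual trick from [23].

```python
# Sketch of the eigenface (PCA) training and recognition stage.
import numpy as np

def train_eigenfaces(faces, t):
    """faces: list of M equally sized grayscale face images; returns mean, basis, gallery weights."""
    X = np.stack([f.ravel().astype(np.float64) for f in faces])   # M x s data matrix
    mu = X.mean(axis=0)                                           # mean face
    A = X - mu                                                    # mean-centered images
    # Eigenvectors of the s x s scatter matrix obtained via the smaller M x M problem
    evals, V = np.linalg.eigh(A @ A.T)
    order = np.argsort(evals)[::-1][:t]                           # t largest eigenvalues
    W = A.T @ V[:, order]                                         # s x t eigenface basis
    W /= np.linalg.norm(W, axis=0)                                # normalize each eigenface
    weights = A @ W                                               # projections of the known faces
    return mu, W, weights

def identify(face, mu, W, weights, labels):
    """Project an unknown face onto the face space and return the nearest known identity."""
    w = (face.ravel().astype(np.float64) - mu) @ W                # weight vector of the probe
    distances = np.linalg.norm(weights - w, axis=1)               # Euclidean distances to gallery
    return labels[int(np.argmin(distances))]
```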
IV SIMULATION RESULTS
The simulations were performed using Visual C++ and MATLAB. The proposed personal access control system is tested on four video sequences recorded by a fixed camera placed on the ceiling facing the entrance, in a mixed indoor/outdoor environment with slight illumination changes.
In the moving object detection phase, every tenth frame is subtracted from the background frame, which is the first frame of the recorded video and does not contain any moving objects. The mean value is calculated for every subtraction frame, see Figure 2. The derivative of the mean value is calculated for every subtraction frame and analyzed in time, see Figure 3. The derivative indicates the rate of change of the frame's content. If the derivative value is lower than a predetermined threshold, it can be concluded that a moving object is detected in the analyzed frame and is leaving the scene.
Figure 2. Mean value of the subtracted images observed in
time for videos: a) Video A; b) Video B; c) Video C and d)
Video D.
Figure 3. Derivative of mean value of the subtracted images
observed in time for videos: a) Video A; b) Video B; c)
Video C and d) Video D.
Observing the peaks in Figure 3 that fall below the preset threshold, it can be concluded that 6 persons who entered the video surveilled area are detected in Videos A and C, 3 persons in Video B, and 4 persons in Video D. The correct detection rate of the moving object detection algorithm is 100% for all tested videos, as presented in Table 1.
The output of the moving object detection phase, i.e. the detected moving objects, is presented in Figure 4 for all four videos.
The frame in which the moving object is detected is stored, as well as several frames before the one in which the moving object appeared, in order to facilitate the face recognition stage, see Figure 5. The face extraction rate is given in Table 1 for four consecutive frames. The percentage of correct face extraction is above 91%, which is good enough to yield the highest face recognition rate. The output of the face extraction phase is presented in Figure 5 for five consecutive frames for all four tested videos.
Figure 4. The output of the moving object detection phase
for video: a) Video A; b) Video B; c) Video C and d) Video
D.
Figure 5. The output of the face extraction phase for video:
a) Video A; b) Video B; c) Video C and d) Video D.
Experiments described in [23] show that recognition performance decreases dramatically when the detected face image resolution differs from that of the face images stored in the database. The percentage of correct classification averaged over various face image sizes reported in [23] is 64%.
This is understandable because, under size changes, the correlation from one image to another is largely lost, unlike under varying illumination conditions. This suggests a multi-scale approach in which faces of a particular size are compared with one another. One way of realizing this approach is to have the database contain face images of every individual at several different sizes.
Given this indication from the literature, we tested the PCA face recognition algorithm under various face image sizes, using two different distance metrics, city block and Euclidean. The obtained results are listed in Table 2. It can be observed from Table 2 that the Euclidean distance and the 60×60 image resolution give the highest face recognition rate.
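A hedged sketch of how such a comparison could be carried out is given below. It reuses the mean face mu, basis W, gallery weights, and labels produced by the earlier eigenface sketch; the function names, the use of cv2.resize, and the default metric are assumptions rather than the authors' implementation.

```python
# Illustrative matching with the two distance metrics of Table 2 at a fixed probe resolution.
import cv2
import numpy as np

def city_block(a, b):
    return np.abs(a - b).sum()          # L1 (city block) distance

def euclidean(a, b):
    return np.linalg.norm(a - b)        # L2 (Euclidean) distance

def match(face, size, mu, W, gallery_weights, labels, metric=euclidean):
    """Resize an extracted face, project it onto the face space, and return
    the label of the nearest gallery face under the chosen metric."""
    resized = cv2.resize(face, (size, size))                 # e.g. size = 60 performed best in Table 2
    w = (resized.ravel().astype(np.float64) - mu) @ W        # projection weights of the probe
    d = [metric(w, g) for g in gallery_weights]              # distances to the known faces
    return labels[int(np.argmin(d))]
```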
Table 1 Summary of the obtained results of the classification (percent of correct classification)

Video   Moving object detection   Face extraction   Face recognition
A       100                       96.67             100
B       100                       100               100
C       100                       91.67             100
D       100                       100               100
Table 2 Summary of the obtained results of face recognition under various face image sizes (correct recognition rate, %; CB = city block distance, E = Euclidean distance)

Image size   Video A (CB / E)   Video B (CB / E)   Video C (CB / E)   Video D (CB / E)
45x45        83.33 / 83.33      66.67 / 66.67      100 / 100          100 / 100
50x50        83.33 / 83.33      66.67 / 100        50 / 83.33         75 / 100
55x55        83.33 / 83.33      66.67 / 100        83.33 / 83.33      100 / 100
60x60        100 / 100          66.67 / 100        100 / 100          100 / 100
65x65        83.33 / 83.33      66.67 / 100        83.33 / 83.33      100 / 100
70x70        83.33 / 83.33      66.67 / 100        83.33 / 83.33      100 / 100
Table 1 summarizes the results obtained by the proposed automated personal access control system for all three major phases: moving object detection, face extraction, and face recognition. It can be concluded that the moving object detection and face recognition phases, with the parameters chosen by analyzing Table 2, give 100% successful classification. The face extraction stage scores above 91%, which is good enough to achieve the highest final result of 100%.
V CONCLUSION
A real-time personal access control system based on moving object detection and face recognition is proposed and illustrated on video of a personal access controlled area. It consists of three major stages: moving object detection, face localization and extraction, and face recognition in real image sequences, which together give the required attendance survey.
The novel moving object detection algorithm, resistant to slight illumination changes, processes every tenth frame and subtracts it from the background. The derivative of the mean value is calculated for every subtraction frame; it indicates the rate of change of the frame's content and successfully detects the person who entered the video surveilled space.
The face extraction phase is realized by a face detection system that uses illumination insensitive features obtained from the local successive mean quantization transform and rapid detection achieved by the split up sparse network of Winnows classifier. After the face is extracted from the detected moving object, the face recognition procedure is implemented by applying principal component analysis.
The efficiency of the described system is illustrated on four real-world interior video sequences recorded in a mixed indoor/outdoor environment with slight illumination changes. We obtain a very high recognition rate, which justifies our future research in this direction and the effort to make the proposed system competitive.
REFERENCES
[1] V. Zeljkovic, “Video Surveillance Techniques and Technologies”,
IGI Global Hershey PA, USA, 2013, ISBN: 978-1-4666-4896-8,
http://www.igi-global.com/book/video-surveillance-techniques-
technologies/78939.
[2] V. Zeljkovic, “Illumination Independent Moving Object Detection in
Image Sequences”, LAP Lambert Academic Publishing GmbH & Co.
KG, 2010, ISBN: 978-3-8433-5943-6,
http://www.bookdepository.co.uk/Illumination-Independent-Moving-
Object-Detection-Image-Sequences-Vesna-
Zeljkovic/9783843359436.
[3] Berjon D., Cuevas C., Moran F., Garcia N., “GPU-Based
Implementation of an Optimized Nonparametric Background
Modeling for Real-Time Moving Object Detection”, IEEE
Transactions on Consumer Electronics, Vol. 59 , Issue 2, 2013 , pp.
361–369.
[4] Shih-Chia Huang, Bo-Hao Chen, “Highly
Accurate Moving Object Detection in Variable Bit Rate Video-Based
Traffic Monitoring Systems”, IEEE Transactions on Neural Networks
and Learning Systems, Vol. 24, Issue 12, 2013, pp. 1920–1931.
[5] Oreifej O., Xin Li, Shah M., “Simultaneous Video Stabilization
and Moving Object Detection in Turbulence”, IEEE Transactions on
Pattern Analysis and Machine Intelligence, Vol. 35, Issue 2, 2013,
pp. 450-462.
[6] Yunfei Wang, Zhaoxiang Zhang, Yunhong Wang,
“Moving Object Detection in Aerial Video”, 11th International
Conference on Machine Learning and Applications, Vol. 2, 2012, pp.
446-450.
[7] Rajagur D., Manimuthu S.D., Rajkamal A., Malik H.M.,
“Moving Object Detection Using Drawpad”, International
Conference on Advances in Engineering, Science and Management,
2012, pp. 39-42.
[8] Krishna M.T.G., RaviShankar M., Babu, R., “Log-DEM: Log Gabor
Filter and Dominant Eigen Map Approaches
for Moving Object Detection”, 12th International Conference on
Intelligent Systems Design and Applications, 2012, pp. 568-573.
[9] Sang-Woo Noh, Tae-Hyun Oh, In So Kweon,
“Moving Object Detection under Moving Camera by Rank
Minimization”, 9th International Conference on Ubiquitous Robots
and Ambient Intelligence, 2012, pp. 586-587.
[10] Zhihui Li, Haibo Liu, Di Sun, “Moving Object Detection and
Locating Based on Region Shrinking Algorithm”, International
Conference on Mechatronics and Automation, 2012, pp. 2515-2518.
[11] Lopez-Bravo A., Diaz-Carmona J., Ramirez-Agundis A., Padilla-
Medina A., Prado-Olivarez J., “FPGA-Based Video System for Real
Time Moving Object Detection”, International Conference on
Electronics, Communications and Computing, 2013, pp. 92-97.
[12] Dianting Liu, Mei-Ling Shyu,
“Effective Moving Object Detection and Retrieval via Integrating
Spatial-Temporal Multimedia Information”, IEEE International
Symposium on Multimedia, 2012, pp. 364-371.
[13] Jilin Tu, Del Amo A., Yi Xu, Li Guan, Mingching Chang, Sebastian
T., “A Fuzzy Bounding Box Merging Technique for Moving
Object Detection”, Annual Meeting of the North American Fuzzy
Information Processing Society, 2012, pp. 1-6.
[14] Teja G.P., Ravi S., “Face Recognition Using Subspaces Techniques”,
International Conference on Recent Trends In Information
Technology, 2012, pp. 103–107.
[15] Klare B.F., Burge M.J., Klontz J.C., Vorder Bruegge R.W., Jain A.K.,
“Face Recognition Performance: Role of Demographic Information”,
IEEE Transactions on Information Forensics and Security, Vol.
7, Issue 6, 2012, pp. 1789–1801.
[16] Li S.Z., Ru Feng Chu, Shengcai Liao, Lun Zhang, “Illumination
Invariant Face Recognition Using Near-Infrared Images”, IEEE
Transactions on Pattern Analysis and Machine Intelligence, Vol. 29
, Issue 4, 2007, pp. 627–639.
[17] Sellahewa H., Jassim S.A., “Image-Quality-Based
Adaptive Face Recognition”, IEEE Transactions on Instrumentation
and Measurement, Vol. 59, Issue: 4, 2010, pp. 805–813.
[18] Jozer B., Matej F., Lubos O., Milos O., Jarmila P.,
“Face Recognition under Partial Occlusion and Noise”, IEEE
EUROCON, 2013, pp. 2072–2079.
[19] Yavuz H.S., Cevikalp H., Edizkan, R.,
“Automatic Face Recognition from Frontal Images”, 21st IEEE Signal
Processing and Communications Applications Conference, 2013, pp.
1–4.
[20] Mohanty P., Sarkar S., Kasturi R., Phillips P.J., “Subspace
Approximation of Face Recognition Algorithms: An Empirical
Study”, IEEE Transactions on Information Forensics and Security,
Vol. 3, Issue 4, 2008, pp. 734–748.
[21] Yi Dai, Guoqiang Xiao, Kaijin Qiu, “Efficient Face Recognition with
Variant Pose and Illumination in Video”, 4th IEEE International
Conference on Computer Science & Education, 2009, pp. 18–22.
[22] Nilsson M., Nordberg J., Claesson I., "Face Detection using Local
SMQT Features and Split up Snow Classifier", IEEE International
Conference on Acoustics, Speech and Signal Processing, 2007, Vol.
2, pp. 589-592.
[23] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of
Cognitive Neuroscience, 1991, Vol. 3, No. 1, pp. 71-86.
[24] Turk M.A., Pentland A.P., "Face Recognition Using Eigenfaces",
IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, 1991, pp. 586 - 591.
[25] Delac K., Grgic M., Grgic S., "Independent Comparative Study of
PCA, ICA, and LDA on the FERET Data Set", International Journal
of Imaging Systems and Technology, Vol. 15, Issue 5, 2006, pp. 252-
260.