Personal Access Control System Using Moving Object Detection and Face Recognition

Vesna Zeljković1, Du Zhang1, Ventzeslav Valev2*, Zhongyu Zhang1, Shengjie Zhu1, Junjie Li1

1 School of Engineering & Computing Sciences, New York Institute of Technology, Nanjing Campus, USA, [email protected]
2 IAPR Fellow, School of Computing, University of North Florida, Florida, USA
* Associate member of the Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia, Bulgaria, Email:
Abstract — A real-time automated personal access control system is proposed that detects moving objects and localizes, extracts, and recognizes their faces in real image sequences. The described method encompasses two important issues in personal access control systems that have received increased attention over the years: moving object detection and face recognition. It is tested on video of a personal access controlled area. The efficiency of the described system is illustrated on four real-world interior video sequences recorded in a mixed indoor/outdoor environment with slight illumination changes.
Keywords: Moving Object Detection, Moving Object Tracking,
Face Recognition, Image Processing.
I INTRODUCTION
Real-time detection of moving objects in a video sequence [1, 2] has become an important research topic in the computer vision and video processing field, and the number of visual surveillance systems has greatly increased in recent years. In response to the growing demand for computer vision tools in the latest generations of consumer electronic devices equipped with smart cameras, these systems have developed into intelligent systems that automatically detect, track, and recognize objects in video.
There is great interest in this kind of system because of its wide applicability and broad spectrum of uses. Such systems have practical application in traffic control systems, surveillance systems, robot vision, securing various indoor and outdoor facilities, and even more complex tasks such as human face recognition.
Face recognition technology has received a great deal of attention over the decades in the fields of image analysis and computer vision. It has been studied by scientists from various areas of the psychophysical and computer sciences. Psychologists and neuroscientists analyze human perception, while engineers study the computational aspects of face recognition through machine recognition of human faces. Even though face recognition is a natural human ability, developing a mathematical algorithm that performs face recognition is one of the most challenging tasks in computer vision.
Various face recognition techniques have been developed, including image-based, video-based, appearance-based, model-based, and 2D and 3D face recognition algorithms. Likewise, many methods have been proposed for moving object detection and tracking: methods based on edge, color, and texture information; methods that model both background and foreground with spatial-temporal reference data; non-parametric algorithms; various techniques for fixed background segmentation based on a changing decision threshold; etc.
Due to the unpredictable characteristics of objects in blurry and foggy videos, caused by a variety of factors, automatic moving object detection and tracking and face recognition remain very challenging tasks in video surveillance applications.
The rest of the paper is organized as follows. A review of the most recent studies in the field of moving object detection and face recognition is presented in the second section. The third section describes the mathematical formulation of the proposed moving object detection and face recognition methods. The proposed techniques for moving object detection, face extraction, and face recognition are applied and tested on real-life video sequences, and the obtained results are presented in the fourth section, followed by concluding remarks in the final section.
II REVIEW OF MOVING OBJECT DETECTION AND
FACE RECOGNITION METHODS
In [3], a real-time implementation of an optimized spatio-temporal nonparametric moving object detection method is presented. The kernel bandwidths required to model the background are dynamically estimated, the background model is selectively updated, and smart cooperation between a device's central and graphics processing units is implemented, with extensive use of the texture mapping and filtering units of the latter, including a novel method for fast evaluation of Gaussian functions.
A motion detection approach based on the cerebellar model articulation controller, realized through artificial neural networks, is presented in [4] to detect moving objects in high and low bit-rate video streams. The proposed approach consists of a
probabilistic background generation module, which produces a probabilistic background model through an unsupervised learning process over variable bit-rate video streams, and a moving object detection module, which is based on the Cerebellar Model Articulation Controller (CMAC) network and detects moving objects through a block selection procedure and an object detection procedure.
A three-term low-rank matrix decomposition approach is proposed in [5], in which a turbulence sequence is decomposed into the background, the turbulence, and the object. This extremely difficult problem of simultaneous turbulence mitigation and moving object detection is simplified to a minimization involving the nuclear norm, the Frobenius norm, and the ℓ2,1 norm, based on two observations: 1) the turbulence causes dense, Gaussian-like noise and therefore can be captured by the Frobenius norm, while the moving objects are sparse and thus can be captured by the ℓ2,1 norm; 2) since the object's motion is linear and intrinsically different from the Gaussian-like turbulence, a Gaussian-based turbulence model can be employed to enforce an additional constraint on the search space of the minimization.
The problem of moving object detection in aerial video, where the moving object must be detected against a moving background, is addressed in [6]. The motion of the background is modeled using the Gaussian mixture model framework, and the optical flow between every two adjacent frames is computed to obtain motion information for each pixel. The idea in [7] is to capture a series of video pictures at regular intervals and use them to describe the vector information of the region; the segmented frames are converted into color or grayscale images for better performance. An algorithm for moving object detection based on log Gabor filter and dominant eigen map approaches is described in [8], where the moving object is detected and tracked by connected component analysis and centroid manipulation.
A method for moving object detection under a moving camera that utilizes the rank minimization framework for alignment and moving object detection is proposed in [9]. In [10], region shrinking is applied to the region with a high density of motion pixels in a binary difference image in order to detect and locate moving objects. If there is more than one object in an image, a power transformation is applied to enhance objects at different positions. The feature rectangle of an object is derived and used in further tracking.
The design of a background image subtraction method and its Field Programmable Gate Array (FPGA) implementation for moving object detection in surveillance video applications with high resolution frames of 720×480 pixels is described in [11]. A moving object detection and retrieval model that integrates the spatial and temporal information in video sequences and uses the integral density method to quickly identify the motion regions in an unsupervised way is proposed in [12]. Key information locations in video frames are obtained as maxima and minima of the difference of Gaussian function, and a motion map of adjacent frames is obtained from the diversity of the outcomes of a simultaneous partition and class parameter estimation framework. The motion map filters key information locations into key motion locations, where the existence of moving objects is implied. Besides showing the motion zones, the motion map also indicates the motion direction, which guides the proposed integral density approach to quickly and accurately locate the motion regions.
In [13], an algorithm is presented that derives fuzzy rules to merge the detected bounding boxes into a unique cluster bounding box that covers a unique object, defining the relationships of a pair of boxes by their geometrical affinity, their motion cohesion, and their appearance similarity. A review of different face recognition techniques is given in [14], where it is demonstrated that the performance of the applied image processing technique is highly dependent on the type of pre-processing steps used and that the equal error rates of the eigenface and Fisher-face methods can be reduced.
The authors in [15] study the influence of demographics on the performance and recognition accuracy of six different face recognition algorithms: three commercial, two non-trainable, and one trainable. Experimental results demonstrate that the matching accuracy for race/ethnicity and age cohorts can be improved by training exclusively on the specific cohort, which leads to a dynamic face matcher selection scenario, in which multiple face recognition algorithms (each trained on a different demographic cohort) are available for selection by a biometric system operator based on the demographic information extracted from a probe image. It is shown that an alternative to dynamic face matcher selection is to train face recognition algorithms on datasets that are evenly distributed across demographics.
In [16], a solution for illumination invariant face recognition for indoor applications is presented. It uses an active near infrared imaging system that is able to produce good quality face images regardless of the visible light in the environment; statistical learning algorithms that extract the most discriminative features from a large pool of invariant local binary pattern features, on the basis of which a highly accurate face matching engine is constructed; and a system that is able to achieve accurate and fast face recognition in practice.
An adaptive approach to the illumination invariant face recognition problem is presented in [17]; it uses image quality to adaptively select fusion parameters for wavelet-based multi-stream face recognition and applies global and regional illumination normalization procedures. The multi-resolution property of wavelet transforms is utilized for extracting facial feature descriptors at different scales and frequencies, as it is shown that high-frequency
wavelet sub-bands provide illumination
invariant face descriptors.
In [18], a comparative study of several conventional face recognition methods, such as Principal Component Analysis (PCA) related to eigenfaces and Radial Basis Function (RBF) networks, and novel kernel methods such as Kernel Principal Component Analysis (KPCA) and Support Vector Machines (SVM), is provided; these methods are suitable to work as part of a multimodal interface that interacts with a Hybrid Broadcast Broadband Television (HBB-TV) user. The influence of noise and partial occlusion on face recognition accuracy is evaluated, with a special focus on occlusions of the eyes and eyebrows. An automatic face recognition system is designed in [19] that uses frontal images represented by gray level, Local Binary Pattern (LBP), Local Ternary Pattern (LTP), and two-dimensional Gabor filter features. It consists of an alignment process, which includes face detection, eye detection, and mapping of the center coordinates of the eyes to a standard face template, and classification of the aligned faces, which is performed in a fully automatic manner.
A theory for constructing linear subspace approximations to face recognition algorithms is presented in [20], and the adequacy of the linear model, specified in terms of a linear subspace spanned by non-orthogonal vectors, is empirically demonstrated using six different face recognition algorithms, spanning template-based and feature-based approaches, with a complete separation of the training and test sets. A face recognition algorithm robust to large-scale changes in facial pose and lighting conditions is described in [21]; it applies a 2D-to-3D face model and a self-principal component analysis method based on bit-plane feature fusion.
III ALGORITHM FOR PERSONAL ACCESS CONTROL
SYSTEM
The proposed personal access control system is a face recognition system that could have a wide spectrum of applications in video coding, video conferencing, crowd surveillance, human-computer interfaces, and controlled entrance to secured buildings and areas.
The proposed personal access control system is shown in
Figure 1.
Figure 1. Proposed personal access control system
The input video is captured by a fixed camera placed on the ceiling of the video surveilled area, facing the entrance. The camera output is connected to a CPU that hosts the proposed system and processes the recorded video. The personal access control system consists of three major blocks, which perform moving object detection in the input video, face extraction from the detected person, and finally face identification, which gives the required attendance survey.
We propose a moving object detection algorithm that is resistant to slight illumination changes. Every tenth frame is processed in order to enable real-time implementation of the proposed personal access control system. Every tenth frame is subtracted from the background frame of the video, which is usually the first frame of the recorded video and does not contain any moving objects. The elapsed time between subtracted frames is chosen empirically; this parameter depends on the maximum speed of the moving objects, their distance from the camera, and the frame rate. The derivative of the mean value is calculated for every subtraction frame and analyzed in time, see Figure 3. The derivative indicates the rate of change of the frame's content. If the derivative value is lower than a predetermined threshold, it can be concluded that a moving object is detected in the analyzed frame and is leaving the scene. Several frames before the one in which the moving object is identified are stored in order to facilitate the face recognition stage.
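The detection stage just described can be summarized in a short sketch. The code below is illustrative only, assuming OpenCV and NumPy and grayscale subtraction; the file name, frame step, and threshold value are placeholders rather than the authors' exact parameters.

```python
# Minimal sketch of the frame-differencing / mean-derivative stage described above.
import cv2
import numpy as np

FRAME_STEP = 10          # every tenth frame is processed
DERIV_THRESHOLD = -2.0   # illustrative empirical threshold on the mean-value derivative

cap = cv2.VideoCapture("entrance_video.avi")              # hypothetical input file
ok, background = cap.read()                               # first frame: scene without moving objects
background = cv2.cvtColor(background, cv2.COLOR_BGR2GRAY).astype(np.float32)

means, detections = [], []
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame_idx += 1
    if frame_idx % FRAME_STEP:
        continue                                          # skip all but every tenth frame
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    diff = np.abs(gray - background)                      # subtraction frame
    means.append(diff.mean())                             # mean value of the subtraction frame
    if len(means) >= 2:
        derivative = means[-1] - means[-2]                # rate of change of the mean value
        if derivative < DERIV_THRESHOLD:                  # negative peak below the threshold
            detections.append(frame_idx)                  # store index; nearby frames would also be kept
cap.release()
print("frames with detected moving objects:", detections)
```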
The face extraction phase is realized by the face detection system described in [22], which uses illumination insensitive features obtained from the local successive mean quantization transform (SMQT) and rapid detection achieved by the split up sparse network of Winnows (SNoW) classifier, a learning architecture specifically tailored for learning in the presence of a very large number of features that can be used as a general purpose multi-class classifier. The local SMQT features provide illumination and sensor insensitive operation in object recognition, the split up SNoW classifier is used to speed up the original classifier, and finally the features and the classifier are combined for the task of frontal face detection.
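For illustration of where this step sits in the pipeline, the sketch below crops face regions from a stored detection frame. OpenCV's bundled Haar cascade detector is used here only as a generic stand-in for the SMQT/SNoW detector of [22], and the function name is hypothetical.

```python
# Illustrative face-extraction step (generic detector standing in for SMQT/SNoW [22]).
import cv2

cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

def extract_faces(frame):
    """Return the face regions cropped from a stored detection frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return [gray[y:y + h, x:x + w] for (x, y, w, h) in boxes]
```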
After the face is extracted from the detected moving object, the face recognition procedure is implemented by applying principal component analysis, as described in [23-25].
M images of m × n resolution comprise the face training set, where every face image is represented as an m·n = s-dimensional vector. Principal component analysis finds a new t-dimensional subspace whose basis vectors correspond to the maximum variance directions in the original face image space. This new subspace, called the face space, has a much lower dimension than the initial face image space, i.e. t is much smaller than s. All images of known faces are projected onto the face space to find the sets of weights that describe the contribution of each basis vector, and an unknown face image is also projected onto the face space to obtain its set of weights. The unknown face image is identified by comparing its set of weights to the sets of weights of the known faces.
If we consider the image elements as random variables, the principal component analysis basis vectors are defined as the eigenvectors of the scatter matrix $S_T$:

$$S_T = \sum_{i=1}^{M} (x_i - \mu)(x_i - \mu)^T$$
where µ is the mean of all face images in the training set and x_i is the i-th image with its columns concatenated into a vector. The projection matrix W_PCA is composed of the t eigenvectors corresponding to the t largest eigenvalues, thus creating a t-dimensional face space. Since these eigenvectors, which represent the principal component analysis basis vectors, look like ghostly faces, they are conveniently called eigenfaces.
The output of the face recognition algorithm gives the
detected person’s identification if her/his image is contained
in the face database.
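The eigenface computation and matching described above can be sketched as follows. This is an illustrative implementation of standard PCA face recognition in the spirit of [23-25], not the authors' code; the number of components t and all names are placeholders, and the reduction to the smaller M × M eigenproblem is the usual trick from [23].

```python
# Sketch of the eigenface (PCA) training and recognition stage.
import numpy as np

def train_eigenfaces(faces, t):
    """faces: list of M equally sized grayscale face images; returns mean, basis, gallery weights."""
    X = np.stack([f.ravel().astype(np.float64) for f in faces])   # M x s data matrix
    mu = X.mean(axis=0)                                           # mean face
    A = X - mu                                                    # mean-centered images
    # Eigenvectors of the s x s scatter matrix obtained via the smaller M x M problem
    evals, V = np.linalg.eigh(A @ A.T)
    order = np.argsort(evals)[::-1][:t]                           # t largest eigenvalues
    W = A.T @ V[:, order]                                         # s x t eigenface basis
    W /= np.linalg.norm(W, axis=0)                                # normalize each eigenface
    weights = A @ W                                               # projections of the known faces
    return mu, W, weights

def identify(face, mu, W, weights, labels):
    """Project an unknown face onto the face space and return the nearest known identity."""
    w = (face.ravel().astype(np.float64) - mu) @ W                # weight vector of the probe
    distances = np.linalg.norm(weights - w, axis=1)               # Euclidean distances to gallery
    return labels[int(np.argmin(distances))]
```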
IV SIMULATION RESULTS
The simulations were performed using Visual C++ and MATLAB. The proposed personal access control system is tested on four video sequences recorded by a fixed camera placed on the ceiling facing the entrance, in a mixed indoor/outdoor environment with slight illumination changes.
In the moving object detection phase, every tenth frame is subtracted from the background frame, which is the first frame of the recorded video and does not contain any moving objects. The mean value is calculated for every subtraction frame, see Figure 2. The derivative of the mean value is calculated for every subtraction frame and analyzed in time, see Figure 3. The derivative indicates the rate of change of the frame's content. If the derivative value is lower than a predetermined threshold, it can be concluded that a moving object is detected in the analyzed frame and is leaving the scene.
Figure 2. Mean value of the subtracted images observed in
time for videos: a) Video A; b) Video B; c) Video C and d)
Video D.
Figure 3. Derivative of mean value of the subtracted images
observed in time for videos: a) Video A; b) Video B; c)
Video C and d) Video D.
Observing the peaks in Figure 3 that fall below the preset threshold, it can be concluded that 6 persons who entered the video surveilled area are detected in Videos A and C, 3 persons in Video B, and 4 persons in Video D. The correct detection rate of the moving object detection algorithm is 100% for all tested videos, as presented in Table 1.
The output of the moving object detection phase, i.e. the detected moving objects, is presented in Figure 4 for all four videos.
The frame in which the moving object is detected is stored, as well as several frames before the one in which the moving object appeared, in order to facilitate the face recognition stage, see Figure 5. The face extraction rate is given in Table 1 for four consecutive frames. The percentage of correct face extraction is above 91%, which is good enough to yield the highest face recognition rate. The output of the face extraction phase is presented in Figure 5 for five consecutive frames for all four tested videos.
Figure 4. The output of the moving object detection phase
for video: a) Video A; b) Video B; c) Video C and d) Video
D.
Figure 5. The output of the face extraction phase for video:
a) Video A; b) Video B; c) Video C and d) Video D.
Experiments described in [23] show that recognition performance decreases dramatically when the detected face image resolution differs from that of the face images stored in the database. The percentage of correct classification averaged over various face image sizes reported in [23] is 64%.
This is understandable because, under size changes, the correlation from one image to another is largely lost, unlike under varying illumination conditions. This suggests a multi-scale approach in which faces of a particular size are compared with one another. One way of realizing this approach is to have the database contain face images of every individual at several different sizes.
Given this indication from the literature, we tested the PCA face recognition algorithm under various face image sizes, using two different distance metrics, city block and Euclidean. The obtained results are listed in Table 2. It can be observed from Table 2 that the Euclidean distance and the 60×60 image resolution give the highest face recognition rate.
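A hedged sketch of how such a comparison could be carried out is given below. It reuses the mean face mu, basis W, gallery weights, and labels produced by the earlier eigenface sketch; the function names, the use of cv2.resize, and the default metric are assumptions rather than the authors' implementation.

```python
# Illustrative matching with the two distance metrics of Table 2 at a fixed probe resolution.
import cv2
import numpy as np

def city_block(a, b):
    return np.abs(a - b).sum()          # L1 (city block) distance

def euclidean(a, b):
    return np.linalg.norm(a - b)        # L2 (Euclidean) distance

def match(face, size, mu, W, gallery_weights, labels, metric=euclidean):
    """Resize an extracted face, project it onto the face space, and return
    the label of the nearest gallery face under the chosen metric."""
    resized = cv2.resize(face, (size, size))                 # e.g. size = 60 performed best in Table 2
    w = (resized.ravel().astype(np.float64) - mu) @ W        # projection weights of the probe
    d = [metric(w, g) for g in gallery_weights]              # distances to the known faces
    return labels[int(np.argmin(d))]
```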
Table 1 Summary of the obtained results of the classification (percent of correct classification)

Video   Moving object detection   Face extraction   Face recognition
A       100                       96.67             100
B       100                       100               100
C       100                       91.67             100
D       100                       100               100
Table 2 Summary of the obtained results of face recognition under various face image sizes (correct recognition rate, %; CB = city block distance, E = Euclidean distance)

Image size   Video A (CB / E)   Video B (CB / E)   Video C (CB / E)   Video D (CB / E)
45x45        83.33 / 83.33      66.67 / 66.67      100 / 100          100 / 100
50x50        83.33 / 83.33      66.67 / 100        50 / 83.33         75 / 100
55x55        83.33 / 83.33      66.67 / 100        83.33 / 83.33      100 / 100
60x60        100 / 100          66.67 / 100        100 / 100          100 / 100
65x65        83.33 / 83.33      66.67 / 100        83.33 / 83.33      100 / 100
70x70        83.33 / 83.33      66.67 / 100        83.33 / 83.33      100 / 100
Table 1 summarizes the results obtained by the proposed automated personal access control system for all three major phases: moving object detection, face extraction, and face recognition. It can be concluded that the moving object detection and face recognition phases, with the parameters chosen by analyzing Table 2, give 100% successful classification. The face extraction stage scores above 91%, which is good enough to achieve the highest final result of 100%.
V CONCLUSION
A real-time personal access control system based on moving object detection and face recognition is proposed and illustrated on video of a personal access controlled area. It consists of three major stages: moving object detection, face localization and extraction, and face recognition in real image sequences, which together give the required attendance survey.
The novel moving object detection algorithm, resistant to slight illumination changes, processes every tenth frame and subtracts it from the background. The derivative of the mean value is calculated for every subtraction frame; it indicates the rate of change of the frame's content and successfully detects the person who entered the video surveilled space.
The face extraction phase is realized by a face detection system that uses illumination insensitive features obtained from the local successive mean quantization transform and rapid detection achieved by the split up sparse network of Winnows classifier. After the face is extracted from the detected moving object, the face recognition procedure is implemented by applying principal component analysis.
The efficiency of the described system is illustrated on four real-world interior video sequences recorded in a mixed indoor/outdoor environment with slight illumination changes. We obtain a very high recognition rate, which justifies our future research in this direction and the effort to make the proposed system competitive.
REFERENCES
[1] V. Zeljkovic, “Video Surveillance Techniques and Technologies”,
IGI Global Hershey PA, USA, 2013, ISBN: 978-1-4666-4896-8,
http://www.igi-global.com/book/video-surveillance-techniques-
technologies/78939.
[2] V. Zeljkovic, “Illumination Independent Moving Object Detection in
Image Sequences”, LAP Lambert Academic Publishing GmbH & Co.
KG, 2010, ISBN: 978-3-8433-5943-6,
http://www.bookdepository.co.uk/Illumination-Independent-Moving-
Object-Detection-Image-Sequences-Vesna-
Zeljkovic/9783843359436.
[3] Berjon D., Cuevas C., Moran F., Garcia N., “GPU-Based
Implementation of an Optimized Nonparametric Background
Modeling for Real-Time Moving Object Detection”, IEEE
Transactions on Consumer Electronics, Vol. 59 , Issue 2, 2013 , pp.
361–369.
[4] Shih-Chia Huang, Bo-Hao Chen, “Highly
Accurate Moving Object Detection in Variable Bit Rate Video-Based
Traffic Monitoring Systems”, IEEE Transactions on Neural Networks
and Learning Systems, Vol. 24, Issue 12, 2013, pp. 1920–1931.
[5] Oreifej O., Xin Li, Shah M., “Simultaneous Video Stabilization
and Moving Object Detection in Turbulence”, IEEE Transactions on
Pattern Analysis and Machine Intelligence, Vol. 35, Issue 2, 2013,
pp. 450-462.
[6] Yunfei Wang, Zhaoxiang Zhang, Yunhong Wang,
“Moving Object Detection in Aerial Video”, 11th International
Conference on Machine Learning and Applications, Vol. 2, 2012, pp.
446-450.
[7] Rajagur D., Manimuthu S.D., Rajkamal A., Malik H.M.,
“Moving Object Detection Using Drawpad”, International
Conference on Advances in Engineering, Science and Management,
2012, pp. 39-42.
[8] Krishna M.T.G., RaviShankar M., Babu, R., “Log-DEM: Log Gabor
Filter and Dominant Eigen Map Approaches
for Moving Object Detection”, 12th International Conference on
Intelligent Systems Design and Applications, 2012, pp. 568-573.
[9] Sang-Woo Noh, Tae-Hyun Oh, In So Kweon,
“Moving Object Detection under Moving Camera by Rank
Minimization”, 9th International Conference on Ubiquitous Robots
and Ambient Intelligence, 2012, pp. 586-587.
[10] Zhihui Li, Haibo Liu, Di Sun, “Moving Object Detection and
Locating Based on Region Shrinking Algorithm”, International
Conference on Mechatronics and Automation, 2012, pp. 2515-2518.
[11] Lopez-Bravo A., Diaz-Carmona J., Ramirez-Agundis A., Padilla-
Medina A., Prado-Olivarez J., “FPGA-Based Video System for Real
Time Moving Object Detection”, International Conference on
Electronics, Communications and Computing, 2013, pp. 92-97.
[12] Dianting Liu, Mei-Ling Shyu,
“Effective Moving Object Detection and Retrieval via Integrating
Spatial-Temporal Multimedia Information”, IEEE International
Symposium on Multimedia, 2012, pp. 364-371.
[13] Jilin Tu, Del Amo A., Yi Xu, Li Guan, Mingching Chang, Sebastian
T., “A Fuzzy Bounding Box Merging Technique for Moving
Object Detection”, Annual Meeting of the North American Fuzzy
Information Processing Society, 2012, pp. 1-6.
[14] Teja G.P., Ravi S., “Face Recognition Using Subspaces Techniques”,
International Conference on Recent Trends In Information
Technology, 2012, pp. 103–107.
[15] Klare B.F., Burge M.J., Klontz J.C., Vorder Bruegge R.W., Jain A.K.,
“Face Recognition Performance: Role of Demographic Information”,
IEEE Transactions on Information Forensics and Security, Vol.
7, Issue 6, 2012, pp. 1789–1801.
[16] Li S.Z., Ru Feng Chu, Shengcai Liao, Lun Zhang, “Illumination
Invariant Face Recognition Using Near-Infrared Images”, IEEE
Transactions on Pattern Analysis and Machine Intelligence, Vol. 29
, Issue 4, 2007, pp. 627–639.
[17] Sellahewa H., Jassim S.A., “Image-Quality-Based
Adaptive Face Recognition”, IEEE Transactions on Instrumentation
and Measurement, Vol. 59, Issue: 4, 2010, pp. 805–813.
[18] Jozer B., Matej F., Lubos O., Milos O., Jarmila P.,
“Face Recognition under Partial Occlusion and Noise”, IEEE
EUROCON, 2013, pp. 2072–2079.
[19] Yavuz H.S., Cevikalp H., Edizkan, R.,
“Automatic Face Recognition from Frontal Images”, 21st IEEE Signal
Processing and Communications Applications Conference, 2013, pp.
1–4.
[20] Mohanty P., Sarkar S., Kasturi R., Phillips P.J., “Subspace
Approximation of Face Recognition Algorithms: An Empirical
Study”, IEEE Transactions on Information Forensics and Security,
Vol. 3, Issue 4, 2008, pp. 734–748.
[21] Yi Dai, Guoqiang Xiao, Kaijin Qiu, “Efficient Face Recognition with
Variant Pose and Illumination in Video”, 4th IEEE International
Conference on Computer Science & Education, 2009, pp. 18–22.
[22] Nilsson M., Nordberg J., Claesson I., "Face Detection using Local
SMQT Features and Split up Snow Classifier", IEEE International
Conference on Acoustics, Speech and Signal Processing, 2007, Vol.
2, pp. 589-592.
[23] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of
Cognitive Neuroscience, 1991, Vol. 3, No. 1, pp. 71-86.
[24] Turk M.A., Pentland A.P., "Face Recognition Using Eigenfaces",
IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, 1991, pp. 586 - 591.
[25] Delac K., Grgic M., Grgic S., "Independent Comparative Study of
PCA, ICA, and LDA on the FERET Data Set", International Journal
of Imaging Systems and Technology, Vol. 15, Issue 5, 2006, pp. 252-
260.