An Object Detection and Tracking Technique Using Superpixel Extraction in Compressed Video Sequences
1 Chakaravarthi S, 2 Visu P, 3Ganesan R
1Associate Professor, CSE Department, Velammal Engineering College, Chennai, Tamilnadu, India.
2Associate Professor, CSE Department, Velammal Engineering College, Chennai, Tamilnadu, India.
3Assistant Professor-III, CSE Department, Velammal Engineering College, Chennai, Tamilnadu, India
[email protected], [email protected],[email protected]
Abstract— Segmenting foreground objects from a video sequence is a fundamental and critical step in video analysis tasks such as object recognition and tracking. Several motion detection methods and edge detectors have been developed so far, but extracting a clean foreground object remains difficult because of interference from factors such as weather, illumination, shadow and clutter. This paper proposes a new foreground detection approach for compressed video sequences that first extracts superpixels from each frame and then applies a background subtraction algorithm and optical flow, together with SMED (Separable Morphological Edge Detector), to the superpixels extracted in each frame of the video. SMED is robust to illumination changes and capable of detecting even slight motion in the video sequence. The proposed method is fast and accurate in detecting moving objects in various situations, for example fast-moving objects, slow-moving objects, and even moving objects in dynamic scenes where both the foreground and background change. By applying these techniques sequentially to the video sequence, the foreground object can be segmented accurately with increased execution speed, improved accuracy, and reduced background noise.
I. INTRODUCTION
A video surveillance system [L.Cheng, M.Gong]1 must be capable of continuous operation under various weather and illumination conditions. It should be able to handle movement through cluttered areas, objects overlapping in the visual field, shadows, lighting changes, effects of moving elements of the scene (e.g. swaying trees), slow-moving objects, and objects being introduced into or removed from the scene. Real-time video segmentation also requires accurately separating the foreground image from a background scene. However, the key problems of noise, undetected edges, loss of smoothness, improper segmentation of the foreground object (especially for occluded objects), and maintaining robustness under illumination changes largely remain, even though many algorithms have been proposed.
International Journal of Pure and Applied Mathematics, Volume 119, No. 17, 2018, 553-574. ISSN: 1314-3395 (on-line version). url: http://www.acadpubl.eu/hub/ Special Issue
Likewise, additional work is required for object tracking, motion analysis and behaviour analysis when segmenting foreground objects. Conventional approaches based on background techniques often fail in these general circumstances.
The main goal of segmenting the foreground object from the background scene is to eliminate all the related problems and obtain an accurate foreground image. Many edge detection algorithms are used to obtain an accurate foreground image from a video, but each of them has constraints and none satisfies the purpose entirely. To tackle these problems, this paper gives a solution by applying the following methods sequentially, thereby improving efficiency. First, superpixels are extracted from each video frame to reduce the number of comparisons. Second, the background subtraction algorithm and optical flow are applied to those superpixels extracted in each frame of the video [K.Suganya Devi, N.Malmurugan]2, instead of to individual pixels, so that the edges of objects in the video are detected clearly. Finally, using the SMED (Separable Morphological Edge Detector), the foreground object is segmented from the background scene accurately.
Until now, images and video frames have been analysed in terms of pixels or sub-pixels, which leads to many comparisons, and edge detection techniques are applied to segments formed of pixels or sub-pixels only. The concept of extracting superpixels used in this paper is much more beneficial: it reduces the number of frame-segment comparisons, reduces the amount of data processed by more than 75%, and thus reduces the storage requirement. Background subtraction is an essential part of surveillance applications for effective segmentation of objects of interest in a scene. The purpose of a background subtraction algorithm is to distinguish moving objects (hereafter referred to as the foreground) from static or moving parts of the scene (called the background). A simple motion detection algorithm [O.Barnich, M.Van Droogenbroeck]3 compares a static background frame with the current frame of a video scene, pixel by pixel. This is the basic principle of background subtraction: each video frame is compared against a background model, and the pixels that deviate significantly from the model are considered to be foreground. These "foreground" pixels are then post-processed for object localisation and tracking. The general framework of background subtraction (BGS) [Cong zhoo, Xiaogang wang, wai-kuen cham]4 usually contains four steps: preprocessing, background modelling, foreground detection, and post-processing. The preprocessing step collects training samples and removes imaging noise; the background modelling step builds a background model which is generally robust to certain background changes; the foreground detection step produces foreground candidates by computing the deviation of a pixel from the background model; finally, the post-processing step refines those candidates to form foreground masks.
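The pixel-wise comparison underlying this pipeline can be sketched as follows; the function name and the fixed threshold of 30 are illustrative choices, not values given in the paper:

```python
import numpy as np

def subtract_background(frame, background, threshold=30):
    """Mark pixels that deviate from the background model as foreground.

    frame, background: 2-D uint8 grayscale arrays of equal shape.
    Returns a boolean mask (True = foreground candidate).
    """
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold

# Toy example: a static background with one changed block.
background = np.zeros((4, 4), dtype=np.uint8)
frame = background.copy()
frame[1:3, 1:3] = 200          # a "moving object" appears
mask = subtract_background(frame, background)
print(mask.sum())              # 4 foreground pixels
```

In a full BGS system the `background` array would be the output of the background modelling step and the mask would go on to post-processing.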
Among these four steps, the third, namely foreground detection, is the essential procedure that should identify the foreground object accurately, i.e., the resulting video sequence or image should not contain background noise. Our goal is to build a robust, adaptive tracking framework for foreground detection of video objects in the compressed domain that is flexible enough to handle variations in lighting, moving scene clutter, multiple moving objects and other arbitrary changes to the observed scene.
This paper primarily presents a new video processing strategy used to address the above problems associated with real-time video surveillance systems. A new foreground detection approach called Optical Flow with Gaussian Mixture model and SMED (OFGM-SMED), which operates on the extracted superpixels, is proposed for detecting foreground objects. The rest of the paper is organised as follows: Section II describes previous work and current shortcomings, Section III presents the proposed approach for superpixel extraction that is used by OFGM and SMED (OFGM-SMED), and Section IV presents some experimental results and discussions to verify the proposed approach. Finally, the paper is concluded in Section V.
II. RELATED WORK AND CURRENT SHORTCOMINGS
Sobel (1970), Prewitt (1971), Robinson (1977) and Frei-Chen (1977) gave the classical edge detection methods. They are very simple to compute and capable of detecting edges and their orientations, but they lack smoothing, are very sensitive to noise, and are inaccurate and less efficient. There is therefore a need for an efficient edge detection algorithm, and Canny proposed an edge detection algorithm that removed the constraints of the classical methods and is regarded as the best edge detection method; however, it shares the same conventional disadvantage that it segments the frame on a pixel-by-pixel basis only.
The current foreground detection algorithms can be divided into three classes: frame difference, optical flow and background subtraction.
Frame difference [K.Gupta, V.Anjali Kulkarni]5 calculates the pixel gray-scale difference between two neighbouring frames in a continuous image sequence and determines the foreground by setting a threshold. The frame difference method can be used in dynamic environments, but it cannot completely extract the whole foreground region; the central part of the target is lost, which results in poor target recognition. Moreover, this method struggles to accurately detect fast-moving objects and multiple objects.
Optical flow [B.K.P.Horn, B.G.Schunck]6 is the distribution of apparent velocities of movement of brightness patterns in an image. Optical flow can arise from relative motion of objects and can give essential information about the spatial arrangement of the objects viewed and the rate of change of this arrangement. Discontinuities in optical flow can help in segmenting images into regions that correspond to different objects. Since it is very hard to compute the true velocity field from an image sequence, the optical flow field, which is obtained from the information of moving objects, can be used to replace the velocity field. However, the optical flow field alone cannot be used, because it cannot eliminate the background noise [C.Stauffer, W.Grimson]7 that occurs due to the effect of ambient light.
Background subtraction [C.Stauffer, W.Grimson]7 is a common method used in foreground detection. It calculates the difference between the current image and a background image and detects the foreground by setting a threshold. Basically there are two methods to obtain the background image, viz.,
1. Defining the background image manually, and
2. Obtaining a background model by a training algorithm, as used in the Gaussian Mixture Model (GMM).
Compared to the former, the latter is more accurate and the result of foreground detection is much better. The background subtraction method is robust to illumination changes and slight movement, but when this method is used on long image sequences there may be considerable accumulated error in the foreground. Also, GMM produces good results for detecting the foreground in the uncompressed domain of a video sequence [K. Chitra Lakshmi, K. Nagarajan]21, but for the compressed domain it produces very poor results with high background noise. Optical flow covers long distances, and the background noise due to brightness change is smaller, which results in a lower accumulated error rate.
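The frame difference method described above can be sketched on toy two-frame data; the threshold of 25 is an assumed value, not one given in the paper:

```python
def frame_difference(prev_frame, curr_frame, threshold=25):
    """Foreground mask from the gray-level difference of two consecutive
    frames (lists of lists); 1 = moving pixel, 0 = static pixel."""
    return [[1 if abs(c - p) > threshold else 0
             for p, c in zip(prow, crow)]
            for prow, crow in zip(prev_frame, curr_frame)]

prev = [[50, 50], [50, 50]]
curr = [[50, 200], [50, 50]]          # one pixel changed between frames
print(frame_difference(prev, curr))   # [[0, 1], [0, 0]]
```

Note how only the changed pixel survives; this also illustrates why the interior of a uniformly coloured moving target is lost, as the text points out.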
In digital image processing [C.Gonzalez, Woods R Eddins]8, edge detection is a critical technique. Edge detection is the process of locating meaningful transitions in an image. Various edge detection algorithms [N.Senthil kumaran, R.Rajesh]9 have been proposed based on either gradient operators or statistical approaches. Generally, gradient operators are easily affected by background noise, and filtering operators are used to reduce the background noise rate. Morphological edge detectors [M.Fathy, M.Y.Siyal]10 are also available for edge detection and are more effective than gradient operators. Several kinds of morphological detectors exist, but they are not efficient when compared with the separable morphological edge detector. Edges at various angles are not covered, and thin edges are missed, by ordinary mathematical morphological detectors; the separable morphological edge detector detects thin edges and edges at various angles with less background noise [M.Y.Siyal, A.Solangi]11.
Various methods have been proposed for video image processing so far, but these existing methods have difficulties with occlusion, shadows, background noise and varying lighting conditions. This literature review describes the various techniques involved and their requirements such as memory, computing time and complexity.
The video surveillance method proposed by Baumann et al. [A.Baumann, M.Boltz, Julia Ebling]12 aims at robustness with low false positive and false negative rates simultaneously. However, the requirement is to have a zero false negative rate, and it should also cope with varying lighting conditions, occlusion situations and low contrast. Real-time video surveillance proposed by Nan Lu et al. [Nan Lu, Jihong Wang, Wu Q H, Li Yang]13 deals with real-time detection of moving objects. It addresses problems such as the storage space and time consumed to record the video; to avoid these problems it uses a motion detection algorithm, but this covers only the video that contains important information. The real-time visual surveillance method W4S [A.Rourke and M.G.H.Bell]14 (What, When, Where and Who) is a low-cost PC-based real-time visual surveillance system for detecting and tracking people and monitoring their activities in an outdoor environment. It has been implemented to track people and their body parts, but it has problems with sudden light changes, shadow and occlusion. W4S is an integrated real-time stereo method that has addressed these limitations; it deals with tracking people in outdoor conditions, but tracking becomes substantially harder in intensity images. The end-to-end method has been proposed for extracting moving targets from a stream of real-time videos; it classifies them according to image-based properties, but it requires robust tracking of moving targets. Intelligent video surveillance systems support human operators by identifying significant events in video; they can perform object detection in outdoor and indoor environments under changing illumination conditions, but they depend on the shape of the detected objects.
Automatic video surveillance using background subtraction [Wei Li, Xiaojuan Wu, Matsumoto K, Hua-An Zhao]15 has various problems. A pixel-based multi-colour background model is an effective solution, but this method suffers from slow learning at the beginning and cannot distinguish between moving objects and moving shadows. Multimedia surveillance [H.Rahmalan, M.S.Nixon, J.N.Carter]16 uses a number of related media streams, each of which has a different confidence level, to achieve various surveillance tasks; it is hard to insert a new stream into the system without any knowledge of prior history.
Edge detection has been a challenging problem in image processing; due to lack of edge information, the output image is not visually pleasing. Various kinds of edge detectors are discussed here. The Roberts edge detector [L.Daniel, Schmoldt, Pei Li, Lynn Abbott]17 detects edges that run along the diagonals at 45 and 135 degrees; its main disadvantage is that it takes a long time to compute. The Gaussian edge detector reduces background noise by smoothing images and gives better results in a noisy environment; the difficulty is that it is very time-consuming and computationally complex. Zero-crossing detectors use the second derivative of the input image and include the Laplacian operator; they have fixed characteristics in all directions, but are sensitive to background noise. In the Canny edge detector approach, if the threshold is set low it creates false edges, and conversely, if the threshold is set high it omits important edges. The main drawback of the Canny edge detector is its susceptibility to background noise [Browne Alan, T.M.McGinnity, G.Prasad, J.Condell]18.
To overcome the aforementioned problems present in the existing techniques, we propose a newly adapted approach. It is very robust and can overcome all the problems mentioned above, such as occlusion, shadow and lighting changes, while remaining robust to illumination changes and even slight movement. The proposed approach is very effective and is a good choice for both static and dynamic backgrounds with varying frame rates. Moreover, the proposed approach has produced good results in detecting foreground video objects in compressed video sequences.
III. SUPERPIXEL EXTRACTION
A superpixel represents a restricted form of region representation that balances the conflicting goals of reducing image complexity. A superpixel is a polygonal segment of a digital image/frame, larger than an ordinary pixel, rendered with similar properties such as colour, brightness, texture, vector and so on. Superpixels are groups of pixels with approximately the same pixel values. The main goal of extracting superpixels from an image/frame is to reduce the amount of data that undergoes background subtraction. Besides this, the advantages of using superpixels are computational efficiency, a reduced number of primitives, better generalisation, and a lower error rate [J.Lee, R.M.Haralick, L.G.Shapiro]19.
Superpixels have many desired properties. They are computationally efficient: they reduce the complexity of images from hundreds of thousands of pixels to just a few hundred superpixels. They are also representationally efficient: pairwise constraints between units, previously defined only for adjacent pixels on the pixel grid, can now model much longer-range interactions between superpixels. Superpixels are perceptually meaningful: each superpixel is a perceptually consistent unit, i.e. all pixels in a superpixel are most likely uniform in, say, colour and texture. They are nearly complete: because superpixels are the result of an over-segmentation, most structures in the image are conserved, and there is very little loss in moving from the pixel grid to the superpixel map.
The superpixel extraction algorithm is as follows.
Input: Image/frame of a video
Output: Image/frame with extracted superpixels
Method:
1. Initialize the first pixel value P1 as the initial value of the superpixel.
2. Get a pixel Pi.
3. Compare Pi with P1, i.e. compute Pi − P1.
4. If Pi − P1 lies between T1 and T2, keep Pi in the current superpixel, increment i, then go to step 2; else go to step 5.
5. Create the next superpixel along the direction of Pi from the current superpixel:
a. Fix the threshold values.
b. Initialize the current pixel value as the initial pixel value of this superpixel.
c. Go to step 2.
6. Stop when Pi = Pn.
Each pixel in the image/frame is subjected to the above analysis, and thus superpixels are extracted from the image/frame. The direction of a pixel that does not lie between the thresholds is noted, relative to the first pixel, to form a new superpixel in that direction. For instance, if a pixel below the first pixel is outside the threshold values, a new superpixel is created to the south of the current superpixel, and only the pixel coordinates greater than this pixel are taken for examination (Figure 1). This idea reduces the computation of superpixel extraction and also avoids overlapping of superpixels. The same procedure is followed until Pn is reached. This method of superpixel extraction yields superpixels of approximately the same size and shape. The important point to be noted during superpixel extraction is that pixel collisions must not occur, such that no pixel belongs to more than one superpixel.
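The grouping rule above can be sketched as a one-dimensional greedy pass; the scan order, function name and sample values are simplifications for illustration, not the paper's exact 2-D procedure:

```python
def extract_superpixels(pixels, t1=0.0, t2=1.0):
    """Greedily group a scan-line of pixel values into superpixels.

    A pixel joins the current superpixel while its difference from the
    superpixel's first pixel (P1) lies in [t1, t2]; otherwise a new
    superpixel is started at that pixel.
    """
    if not pixels:
        return []
    superpixels = [[pixels[0]]]
    anchor = pixels[0]               # P1 of the current superpixel
    for p in pixels[1:]:
        if t1 <= p - anchor <= t2:
            superpixels[-1].append(p)
        else:
            superpixels.append([p])  # start a new superpixel here
            anchor = p
    return superpixels

groups = extract_superpixels([10, 10.5, 10.8, 14, 14.2, 9])
print(len(groups))   # 3 superpixels
```

Note that each pixel is assigned exactly once, reflecting the no-collision requirement stated above.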
Let k be the total number of superpixels to be extracted from an image/frame, n the total number of pixels in the image/frame, and Pi the value of the i-th pixel. Fix a minimum threshold T1 = 0 and a maximum threshold T2 = +1 for each superpixel. The value of the first pixel in the superpixel is taken as the initial value for that superpixel, and each pixel value is compared with this first pixel value. The condition used to determine the difference is

Di = Pi − P1   (1)
T1 ≤ Di ≤ T2   (2)

If the difference of pixel values lies between the thresholds T1 (zero) and T2 (+1), the pixel is considered to be in the same superpixel; otherwise the pixel belongs to a new superpixel.
One of the major advantages of this method is that pixels within the same superpixel are highly correlated, while different superpixels have considerably less correlation between them, which helps to easily identify the edges of the object in the image/frame for background subtraction. The space needed to store the pixel values of an image/frame is reduced from thousands to a few hundred entries, since each superpixel stored in memory represents the whole group of pixels under the corresponding superpixel.
As superpixel extraction is computationally more efficient and reduces the number of comparisons, it is a better choice than working with pixels or sub-pixels. Still, superpixels alone cannot predict object boundaries clearly, and without a properly set boundary an efficient segmented foreground image cannot be obtained. Clear boundary setting is therefore needed and is done with the help of edge detectors; here SMED is used because it is robust to the common problems of edge detection. Hence, SMED together with the OFGM algorithm is performed on the superpixels, which gives an efficient image/video frame with clear boundaries.
IV. OPTICAL FLOW
A new foreground detection approach called OFGM-SMED, which makes use of Lucas-Kanade optical flow [Nan Lu, Jihong Wang, Wu Q H, Li Yang]13 and a Gaussian Mixture Model together with SMED, is proposed. A perfect foreground cannot be obtained by using optical flow alone because of brightness changes, but an optimal foreground can be obtained effectively by OFGM-SMED.
There are five types of optical flow methods, and Lucas-Kanade (LK) optical flow, shown in Figure 2, is a gradient-based algorithm [9]. If I(x, y, t) is the intensity of pixel m(x, y) at time t and vm = [vx, vy] is the velocity vector of pixel m(x, y), then after a short interval Δt the optical flow constraint equation is

∇I · vm + It = 0   (3)

where ∇I = (∂I/∂x, ∂I/∂y) is the spatial intensity gradient vector and It = ∂I/∂t is the temporal gradient.
Since vm is a two-dimensional variable, more constraints are needed to solve this problem. The LK optical flow method estimates vm by the v minimising (4), on the assumption that vm is constant in a small spatial neighbourhood Ω:

E(v) = Σ m∈Ω W²(m) (∇I · v + It)²   (4)
In (4), W²(m) is a window function that gives the central part of the neighbourhood greater weight than the peripheral part. For the pixels mi (i = 1, 2, ..., n) in Ω, the solution v can be obtained by

v = (AᵀW²A)⁻¹ AᵀW²b   (5)

where

A = (∇I(m1), ..., ∇I(mn))ᵀ,
W = diag(W(m1), ..., W(mn)),

and

b = −(It(m1), ..., It(mn))ᵀ.
Since the LK method computes optical flow at every pixel, it can detect all the changes between neighbouring frames, which makes it the best choice for detecting crowd movement [N.Hashimoto]20. However, optical flow methods are very sensitive to brightness changes, and with the LK method it is hard to find a suitable threshold to segment foreground from background. In fact, whatever choice is made, the detection result may either lose some foreground area or contain some background noise. Clearly an optimal foreground cannot be obtained by using the LK method alone, so we use another method to improve the result. After a series of experiments we found that combining LK optical flow with the SMED method yields a clean result.
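Equation (5) can be sketched directly in NumPy for a single neighbourhood; the gradient values and window weights below are illustrative inputs, not data from the paper:

```python
import numpy as np

def lucas_kanade_velocity(Ix, Iy, It, weights):
    """Solve v = (A^T W^2 A)^{-1} A^T W^2 b for one neighbourhood.

    Ix, Iy, It: spatial/temporal gradients at the n pixels of the window.
    weights: window weights W(m_i), emphasising the centre pixels.
    """
    A = np.stack([Ix, Iy], axis=1)   # n x 2 gradient matrix
    W2 = np.diag(weights ** 2)       # squared window weights on the diagonal
    b = -It                          # negated temporal gradients
    return np.linalg.solve(A.T @ W2 @ A, A.T @ W2 @ b)

# A window whose pattern moves one pixel in x: It = -Ix gives vx = 1, vy = 0.
Ix = np.array([1.0, 0.5, 0.2, 0.8])
Iy = np.array([0.1, -0.2, 0.3, 0.05])
It = -Ix
w = np.array([1.0, 2.0, 2.0, 1.0])
v = lucas_kanade_velocity(Ix, Iy, It, w)
print(np.round(v, 3))
```

The least-squares system is solvable only when the windowed gradients span two directions; in flat or purely one-dimensional texture A ᵀW²A becomes singular, which is the classical aperture problem.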
V. GAUSSIAN BACKGROUND MODELING
Gaussian Background Modeling (GBM) [M.Fathy, M.Y.Siyal]10, shown in Figure 3, is one of the various kinds of background subtraction methods. In this method, K Gaussian models are used to approximate the pixel values in the image, and these models are updated on each frame of the video. If the residual between the pixel value and the approximated value is larger than the set threshold, the pixel is considered foreground; otherwise it is considered background. Using a mixture of K Gaussians, the gray-level probability function of pixel X at time t is given as

P(Xt) = Σ n=1..K wn · (1 / (√(2π) σn)) · exp(−(Xt − μn)² / (2σn²))   (6)

where wn is the weight of the n-th Gaussian model, whose mean and variance are μn and σn². Usually the value of K is from 3 to 5; to represent a complex scene a larger K is needed, but it should be noted that the computation time increases for larger K.
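The foreground/background decision of such a mixture model can be sketched as follows; the 2.5-sigma matching rule is the common Stauffer-Grimson convention, not a value stated in this paper, and the model values are toy numbers:

```python
import math

def is_foreground(x, gaussians, match_sigma=2.5):
    """Classify a pixel value against K background Gaussians.

    gaussians: list of (weight, mean, variance) tuples; a pixel matching
    none of them within match_sigma standard deviations is foreground.
    """
    for weight, mean, var in gaussians:
        if abs(x - mean) <= match_sigma * math.sqrt(var):
            return False          # matches a background mode
    return True                   # foreground candidate

# Two background modes (e.g. road and shadow); 200 matches neither.
model = [(0.6, 90.0, 25.0), (0.4, 120.0, 16.0)]
print(is_foreground(95.0, model), is_foreground(200.0, model))  # False True
```

In a full GMM background subtractor the weights, means and variances would additionally be updated online on every frame.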
VI. SEPARABLE MORPHOLOGICAL EDGE DETECTOR
Image edges carry rich information that is very important for extracting image characteristics in object recognition. Edge detection refers to the process of identifying and locating sharp discontinuities in an image. Various edge detection algorithms [K.Suganya Devi, N.Malmurugan, R.Sivakumar]21 have been proposed based on either gradient operators or statistical approaches. Generally, gradient operators are easily affected by background noise, and filtering operators are used to reduce the background noise rate. Morphological edge detectors are also available for edge detection and are more effective than gradient operators. Several kinds of morphological detectors [Browne Alan, T.M.McGinnity, G.Prasad, J.Condell]18-[J.Lee, R.M.Haralick, L.G.Shapiro]19 exist, but they are not efficient when compared with the Separable Morphological Edge Detector. The effectiveness of many image processing and computer vision algorithms depends on how well meaningful edges are detected; due to lack of object edge information, the output image is otherwise not visually pleasing.
Existing edge detectors, such as Roberts's and Sobel's, have the main disadvantage of being sensitive to background noise and inaccurate. Because of these limitations, a morphological edge-detection operator, the Separable Morphological Edge Detector (SMED) [K.Suganya Devi, N.Malmurugan, R.Sivakumar]21, has been proposed. It requires less computation and has considerable performance compared with other morphological operators. The reasons for adopting the SMED operator are listed below.
1) SMED can detect edges at various angles, while other edge detectors cannot detect all types of edges.
2) The strength of the edges detected by SMED is twice that of other edge detectors.
3) SMED uses separable median filtering to remove background noise. Separable median filtering has been shown to have comparable performance to true median filtering, yet requires less computational power.
As shown in Figure 4, a Separable Morphological Edge Detector performs the following steps:
1. Grayscale transformation.
2. Median filtering.
3. Boundary extraction using a histogram and setting the threshold.
4. Removal of the noisy pixels from the image and filling of pixel breaks in a regular pattern.
5. Making the image one pixel thick in the horizontal and vertical directions.
SMED, which uses simple and easily implementable operators, has a lower computational requirement than other morphological edge-detection operators. The open-close operator has better performance than the SMED operator [N.Hashimoto]20, but it requires about eight times more computational power and is therefore not suitable for real-time applications.
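The separable median filtering that SMED relies on can be approximated by a 1-D median run first along rows and then along columns; this is a sketch of the idea, not the paper's exact operator:

```python
import statistics

def median_1d(values, k=3):
    """1-D median filter of odd window size k; edge values are kept as-is."""
    half = k // 2
    out = list(values)
    for i in range(half, len(values) - half):
        out[i] = statistics.median(values[i - half:i + half + 1])
    return out

def separable_median(image, k=3):
    """Approximate a 2-D median by row-wise then column-wise 1-D medians."""
    rows = [median_1d(row, k) for row in image]
    cols = [median_1d(list(c), k) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

# A single salt-noise pixel in a flat region is removed.
img = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
print(separable_median(img)[1][1])   # 10
```

The separable form performs two O(k) passes instead of one O(k²) pass per pixel, which is the computational saving the text attributes to SMED.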
We propose a new approach combining SMED and OFGM, shown in Figure 5. The OFGM method applies LK optical flow and GBM in parallel. On one hand, we first use the two adjacent images f(x, y, t − 1) and f(x, y, t) to compute the LK optical flow field; then a median filter and a Gaussian filter are used to eliminate salt-and-pepper noise and high-frequency noise respectively.
After that we use a threshold Tlk, the LK threshold, to segment the optical flow field and obtain the LK foreground mask flk(x, y, t). Our experimental results show that the useful range of Tlk is [0.05, 0.20]: choosing a smaller Tlk produces a larger foreground area including background noise, while choosing a larger threshold may lose some foreground area.
In order to detect the whole movement region we select the smallest value, 0.05, and then try to eliminate the background noise in the foreground mask flk(x, y, t). On the other hand, the GBM method is used to obtain another foreground mask, where a scale filter is used to separate foreground and background. In the scale filter we set another threshold Tg, the GMM threshold, which denotes an area in pixel blocks. For an obtained foreground image, if a pixel block is smaller than Tg it is labelled background; otherwise it is kept as foreground. Thus we get another foreground mask fg(x, y, t). In our tests, the value of Tg should be close to 1/400 of the image area; for example, when the image size is 320×240, the range of Tg is [160, 200]. As with the LK method, we select the smallest Tg to get the largest foreground mask fg(x, y, t).
Finally, these two masks are multiplied, and morphological processing [B.K.P.Horn, B.G.Schunck]6 is applied to connect nearby regions and reject small blocks in the foreground; an optimal foreground fore(x, y, t) can then be obtained, as shown in the results. It should be noted that although both flk(x, y, t) and fg(x, y, t) contain background noise, the background noise in flk(x, y, t) is produced by brightness changes and appears randomly on the contours of objects, whereas in fg(x, y, t) the background noise occurs on the edges of objects and, as time passes, appears at the same place. Since the two kinds of background noise appear at different places, most background noise can be eliminated by multiplying flk(x, y, t) and fg(x, y, t). The foreground image fore(x, y, t) obtained by OFGM is then used as part of the crowd density estimation.
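Combining the two masks amounts to an element-wise product of binary masks, which suppresses noise that appears in only one of them; the toy masks below are illustrative:

```python
def combine_masks(flk, fg):
    """Keep only pixels flagged foreground by BOTH the optical-flow mask
    and the GMM mask; noise present in only one mask is suppressed."""
    return [[a * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(flk, fg)]

# Each mask carries its own noise pixel; only the true object survives.
flk = [[0, 1, 1], [0, 1, 1], [1, 0, 0]]   # flow mask: noise at (2, 0)
fg  = [[0, 1, 1], [1, 1, 1], [0, 0, 0]]   # GMM mask:  noise at (1, 0)
fore = combine_masks(flk, fg)
print(fore)   # [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
```

This works precisely because, as noted above, the two noise patterns rarely coincide spatially, so their product retains only the genuinely moving region.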
When using the SMED method, the foreground containing large accumulated error due to background noise should be eliminated. The background noise present in the optical flow computation can also be removed by this approach by applying the Separable Morphological Edge Detector. With the proposed OFGM-SMED approach, all the background noise is removed and no foreground is lost, so the final object detection result is optimal, with the low error rate shown in Table II. The OFGM-SMED approach is thus effective.
VII. RESULTS AND DISCUSSIONS
Foreground detection is the basis of motion analysis tasks such as object tracking, image segmentation and motion estimation, as shown in Figure 7. The proposed approach is tested on video sequences in both the uncompressed and compressed domains for three different videos, each with a sample of at least 250 images, under the following categories. The results are discussed herein.
Video1 (Figure 6a) is a slow moving scene taken by a static camera at a frame rate of 30 fps; Video2 (Figure 6b) is a scene taken by a dynamic camera (both foreground and background are dynamic) at a frame rate of 25 fps; and Video3 (Figure 6c) is a fast moving traffic scene captured by a static camera at a frame rate of 15 fps. Superpixels were extracted from the original video sequences a), b), c) as shown in Figure 7. Superpixels applied on their own are not very effective, since they also pick up the shadows of the objects. This superpixel extraction is therefore combined with edge detection methods to obtain clear edges of the foreground images, which is why the OFGM algorithm with SMED is used along with the superpixels.
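One plausible reading of a separable morphological edge detector is the morphological gradient (dilation minus erosion) computed with separable 1-D passes. The sketch below is an illustration under that assumption, not the exact SMED of the cited literature:

```python
def _run1d(line, k, op):
    """Sliding 1-D min/max of window size k (clamped at the borders)."""
    r, n = k // 2, len(line)
    return [op(line[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]

def _sep(img, k, op):
    """Separable k x k morphological operation: 1-D pass along rows,
    then along columns."""
    rows = [_run1d(row, k, op) for row in img]
    cols = [_run1d(list(col), k, op) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def smed(img, k=3):
    """Morphological gradient with a separable k x k structuring element:
    edge strength = dilation(img) - erosion(img)."""
    dil = _sep(img, k, max)
    ero = _sep(img, k, min)
    return [[d - e for d, e in zip(dr, er)] for dr, er in zip(dil, ero)]
```

On a flat region the gradient is zero, while intensity steps produce a thin band of nonzero responses, which is why such a detector reacts even to slight movement between frames.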
The segmentation is carried out on the videos in the uncompressed (AVI) and compressed (MP4) domains. GMM shows good results for fast moving objects at a lower frame rate, but it produces noisy moving regions for all the input videos a, b, c in the compressed domain. The moving regions are segmented by a bounding box as shown in Figure 8.
Figure 9 shows the better performance of our approach. The computational time (elapsed time) of GMM is relatively high compared with its counterparts, optical flow (OF) and OF with SMED.
The elapsed time is further reduced in the compressed domain (MP4), where the number of frames to be processed is smaller, as shown in Figure 10. The performance evaluation of OFGM-SMED for MPEG-4 and AVI is shown in Figure 10.
The comparison of the elapsed time taken by GMM, OF and OFGM-SMED for MPEG-4 is shown in Figure 11. In the graph, the peaks represent the elapsed time; our OFGM-SMED method gives better results in the compressed domain, as its peaks are comparatively lower throughout the frames.
The results are verified on the basis of execution time and are summarised in Table I. As Table I shows, the overall performance of OFGM-SMED with superpixel extraction in the compressed domain is optimal compared with its counterparts: the computational cost is low, the elapsed time is comparatively reduced, and the method is less sensitive to background noise.
The average error rate is computed for all techniques, and it indicates that OFGM-SMED is highly effective and has a low error rate. The numerical results in Table II show that OFGM-SMED has a low error rate (1.74%). It can be seen that when the optical flow method is used alone, some background noise remains, because every motion area is detected. The algorithm uses an existing framework and applies simple but effective operations. As this approach works well in the compressed domain, the computation time is reduced compared with other video surveillance operations. The vehicle detection operation is a less sensitive edge-based method. The threshold selection is done dynamically using a statistical approach, to reduce the effects of lighting variations. The approach is based on selecting the point on the horizontal axis of the image histogram at which the sum of the entropies of the points above and below it is maximum. This point is chosen as the threshold value for the next time frame. When the SMED method is used, the foreground which had accumulated more error is eliminated.
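The entropy-based threshold selection described above can be sketched as a Kapur-style maximum-entropy search over the image histogram (a simplified illustration; `max_entropy_threshold` is a hypothetical helper name, and natural logarithms are used):

```python
import math

def max_entropy_threshold(hist):
    """Pick the gray level t that maximises H(below t) + H(above t)
    of the normalised histogram (Kapur-style maximum entropy)."""
    total = sum(hist)
    probs = [h / total for h in hist]
    best_t, best_h = 0, -1.0
    for t in range(1, len(probs)):
        p_low = sum(probs[:t])
        p_high = 1.0 - p_low
        if p_low <= 0 or p_high <= 0:
            continue  # one class is empty; entropy undefined
        h_low = -sum(p / p_low * math.log(p / p_low) for p in probs[:t] if p > 0)
        h_high = -sum(p / p_high * math.log(p / p_high) for p in probs[t:] if p > 0)
        if h_low + h_high > best_h:
            best_h, best_t = h_low + h_high, t
    return best_t
```

On a bimodal histogram the maximum-entropy point falls between the two modes, so the threshold tracks lighting changes frame by frame instead of being fixed by hand.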
As SMED includes median filtering, it removes all the background noise present in the optical flow. Crowd density estimation is important in surveillance. Texture analysis and moment analysis are two common ways to estimate crowd density. In texture analysis, a set of density features can be extracted from the Gray Level Co-occurrence Matrix (GLCM), which is computed from the foreground image. If M is the GLCM of the n×m foreground image, where i and j are the spatial positions in the foreground image, we can compute a new feature FM, the GLCM feature of the foreground image, defined as follows.
F_M = \sum_{i=1}^{n}\sum_{j=1}^{m} M(i,j)\,\ln M(i,j) \Big/ \Big[\sum_{i=1}^{n}\sum_{j=1}^{m} M(i,j)\Big]^{2} \qquad (7)
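Assuming this reconstruction of equation (7), a toy computation of the GLCM (horizontal (0, 1) offset only, a simplification) and of the FM feature might look like this (`glcm` and `feature_fm` are illustrative names):

```python
import math

def glcm(img, levels):
    """Gray-level co-occurrence counts for the horizontal (0, 1) offset."""
    M = [[0] * levels for _ in range(levels)]
    for row in img:
        for a, b in zip(row, row[1:]):
            M[a][b] += 1
    return M

def feature_fm(M):
    """F_M = sum M(i,j) ln M(i,j) / (sum M(i,j))^2, per the reconstructed
    equation (7); zero entries are skipped since ln 0 is undefined."""
    num = sum(v * math.log(v) for row in M for v in row if v > 0)
    den = sum(v for row in M for v in row) ** 2
    return num / den
```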
In the moment analysis, because the zeroth-order moment represents the total mass of the given image, we propose another feature F00, defined as follows,
F_{00} = \ln A_f - \ln m_{00} = \ln\!\left(\frac{A_f}{m_{00}}\right) \qquad (8)
where Af is the area of the foreground and m00 is the zeroth-order moment of the foreground image. Both FM and F00 can be used to estimate the crowd density; larger values of FM and smaller values of F00 indicate higher density. In our experiments, we used FM to estimate the crowd in various scenes and F00 to measure different crowds in a fixed scene. We evaluated our approach on seven different videos containing 1200 frames of images, and randomly picked 100 images to evaluate OF-SMED. First we manually marked the foreground region in each image, which serves as the real foreground, and then used the following equations to test the error rate of OFBM,
r = \frac{|A_{real} - A|}{A_{real}} \times 100\% \qquad (9)
where Areal is the area of the real foreground, A is the area of the foreground in the experimental result, r is the error rate, and the average error rate R over the 100 randomly picked images is given by,
R = \frac{1}{100}\sum_{i=1}^{100} r_i \qquad (10)
The test results in Table II show that our approaches, OFGM-SMED and OFGM-SMED on superpixels, have a lower error rate than their counterparts.
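Equations (9) and (10) amount to a simple relative-area comparison; a direct sketch (with hypothetical helper names, areas given in pixels):

```python
def error_rate(a_real, a_test):
    """r = |A_real - A| / A_real * 100%, per equation (9)."""
    return abs(a_real - a_test) / a_real * 100.0

def average_error_rate(pairs):
    """R = mean of r over (A_real, A) pairs, per equation (10)."""
    rates = [error_rate(a_real, a_test) for a_real, a_test in pairs]
    return sum(rates) / len(rates)
```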
This algorithm works well not only for a static camera but also for a moving camera, with both static and dynamic background video sequences. It is also possible to detect the foreground object based on a Region of Interest (ROI). The system is adaptable even to sudden illumination changes, since the combination of OFGM with SMED is less sensitive to ambient lighting. In Video2 (dynamic camera with dynamic background), the foreground object is detected effectively irrespective of sudden illumination changes, as shown in Figure 9.
VIII. CONCLUSION
Our approach is based purely on the concepts of extracting superpixels without collisions and detecting edges clearly. For this purpose we took the conventional models of extracting pixels and applied the superpixel concept to those pixels. When the superpixel concept is applied, the frame is segmented superpixel by superpixel, and the number of comparisons needed is reduced considerably. Thus the constraint of the conventional models, namely the time wasted on comparisons, is eliminated entirely. In addition, a new and simpler algorithm (OFGM-SMED) that can be applied to the extracted superpixels for the segmentation of compressed video has been established and explained. The proposed OFGM-SMED approach combines the foregrounds of both OF and GMM to eliminate background noise: in optical flow the background noise appears randomly, while in the GMM method it appears at fixed places, so combining the two eliminates all of the background noise. Furthermore, SMED is used, which detects even slight movement and adapts to changes in illumination. When the proposed OFGM-SMED approach is applied to superpixels, all background noise is removed and no foreground is lost, so the final object detection result is optimal, with a comparatively reduced elapsed time. OFGM-SMED on superpixels thus proves to be an optimal approach for real-time traffic and crowd monitoring applications, with an error rate of 1.26%, which is a satisfactory result.
REFERENCES
[1] L.Cheng, M.Gong, “Real Time Discriminative Background Subtraction” in IEEE Transactions on
Image Processing, Vol. 20, No. 5, pp.1401-1414, 2011.
[2] K.Suganya Devi, N.Malmurugan, “OFGM-SMED: An Efficient and Robust Foreground Object
Detection in Compressed Video Sequences”, Engineering Applications of Artificial Intelligence,
Vol. 28, pp. 210-217, 2014.
[3] O.Barnich, M.Van Droogenbroeck, “ViBe: A Universal background subtraction algorithm for
video sequences”, IEEE transactions on Image Processing, Vol. 20, No. 6, pp.1709-1724, 2011.
[4] Cong Zhao, Xiaogang Wang, Wai-Kuen Cham, “Background Subtraction via Robust Dictionary Learning,” EURASIP Journal on Image and Video Processing, Article ID: 972961, 12 pages, 2011. doi:10.1155/2011/972961.
[5] K.Gupta, V.Anjali Kulkarni, “Implementation of an Automated Single Camera Object Tracking
System Using Frame Differencing and Dynamic Template Matching,” In Proceedings of the 2007
International Conference on Systems, Computing Sciences and Software Engineering (SCSS(1)),
pp. 245-250, 2007.
[6] B.K.P.Horn, B.G.Schunck, “Determining optical flow”, Artificial Intelligence, Vol.17, pp.185-203,
1981.
[7] C.Stauffer, W.Grimson, “Adaptive background mixture models for real-time tracking,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2, pp.246-252, 1999.
[8] C.Gonzalez, Woods R Eddins, “Digital Image Processing,” 2nd ed Prentice Hall, Upper Saddle
River, NJ, pp. 11-17, 2002.
[9] N.Senthil kumaran, R.Rajesh, “Edge Detection Techniques for Image Segmentation-A Survey,”
Proceedings of the International Conference on Managing Next Generation Software
Applications (MNGSA-08), pp.749-760, 2008.
[10]M.Fathy, M.Y.Siyal, “An image detection technique based on morphological edge detection and
background differencing for realtime traffic analysis,” Pattern Recognition Lett., Vol. 16, pp.
1321–1330, 1995.
[11]M.Y.Siyal, A.Solangi, “A Novel morphological edge detector based approach for monitoring
vehicles at traffic junctions,” Innovations in Information Technology, pp. 1-5, 2006.
[12]A.Baumann, M.Boltz, Julia Ebling, Matthias Koenig, HartmutS Loos, Marcel Merkel, Wolfgang
Niem, JanKarl Warzelhan, Jie Yu, “A Review and Comparison of Measures for Automatic Video
Surveillance Systems,” Hindawi Publishing Corporation EURASIP Journal on Image and Video
Processing, Article ID 824726, 30 pages doi:10.1155/2008/824726, 2008.
[13]Nan Lu, Jihong Wang, Wu Q H, Li Yang, “An Improved Motion Detection Method for Real-Time Surveillance,” IAENG International Journal of Computer Science, Vol. 35, No. 1, pp.119-135, 2008.
[14]A.Rourke and M.G.H.Bell, “Traffic analysis using low cost image processing,” in Proc. Seminar on
Transportation Planning Methods, PTRC, Bath, U.K., pp.217-28. 1988.
[15]Wei Li, Xiaojuan Wu, Matsumoto K, Hua-An Zhao, “Foreground Detection Based on Optical Flow
and Background Subtract,” International Conference on Communications, Circuits and Systems
(ICCCAS), pp.359 – 362, 2010.
[16]H.Rahmalan, M.S.Nixon, J.N.Carter, “On Crowd Density Estimation for Surveillance,” The
Institution of Engineering and Technology Conference on Crime and Security , pp.540 – 545,
2006.
[17]L.Daniel, Schmoldt, Pei Li, Lynn Abbott, “Machine vision using artificial neural networks with
local 3D neighborhoods,” Computers and Electronics in Agriculture, Vol. 16, pp. 255-271, 1997.
[18]Browne Alan, T.M.McGinnity, G.Prasad, J.Condell, “FPGA Based High Accuracy Optical Flow Algorithm,” IET Signals and Systems Conference (ISSC 2010), UCC, Cork, pp.112-117, 2010.
[19]J.Lee, R.M.Haralick, L.G.Shapiro, “Morphologic edge detection,” IEEE J. Robot. Automat., Vol. 3, No. 2, pp. 142-156, 1987.
[20]N.Hashimoto, “Development of an image processing traffic flow measurement system,”
Sumitomo Electronic Tech. Rev., pp. 133–138, 1998.
[21]K.Chitra Lakshmi, K.Nagarajan, “Geometric Mean Cordial Labeling of Subdivision of Standard Graphs,” International Journal of Pure and Applied Mathematics, pp. 103-112, 2017.
[22]K.Suganya Devi, N.Malmurugan, R.Sivakumar, “OF-SMED: An Optimal Foreground Detection Method in Surveillance System for Traffic Monitoring,” in Proceedings of the IEEE International Conference on Cyber-Security and Digital Forensics, pp. 12-17, June 24-26, 2012.
Figures:
Figure 1. Superpixel extraction
Figure 2. Cars on highway-Optical flow
Figure 3. The flow diagram of Background subtraction
Figure 4. Edge Detection using SMED
Figure 5. Flowchart of OFGM-SMED
Figure 6 a), b), c) Original Video sequence.
Figure 7. Superpixel extracted video for the original video sequence a, b, c.
Figure 8. The moving objects segmented by OFGM-SMED for the original video sequences a, b,c
Figure 9. Row 1: segmentation by GMM for the compressed video sequences a, b, c. Row 2: segmentation by OF for the compressed video sequences a, b, c. Row 3: segmentation by OFGM-SMED for the compressed video sequences a, b, c.
Figure 10. Performance Evaluation of OFGM-SMED for MPEG-4 and AVI
Figure 11. Comparison of GMM, OF, OFGM-SMED for MPEG-4
Tables:
TABLE I. COMPARISON OF ELAPSED TIME (IN SECONDS) FOR THE AVERAGE FRAME SIZE 120 FOR THE ORIGINAL COMPRESSED VIDEO SEQUENCE

Original video sequence (Figure 6)           Elapsed time (s),          Elapsed time (s),
                                             OFGM-SMED without          OFGM-SMED with
                                             superpixel extraction      superpixel extraction
a) Slow moving object, static background     0.161259                   0.115432
b) Moving object, dynamic background         0.365566                   0.282512
c) Fast moving object, static background     0.099303                   0.069267

TABLE II. COMPARISON OF AVERAGE ERROR RATE

Approach      SMED       OFBM       OFGM-SMED      OFGM-SMED on superpixels
Error rate    4.85%      2.01%      1.74%          1.26%