
  • 8/3/2019 Huang Benson Final Report


    EECE 496 Project Final Report

    Mixing Shadows into Video Imagery

    Student: Benson Huang

    85172013

    Supervisor: Dr. Sidney Fels, Changsong Shen


    ABSTRACT

    The Human Communication Technologies Laboratory is developing an ongoing project

    called Mixing Shadows into Video Imagery. The project captures silhouettes of

    moving objects, which will be positioned on previously recorded backgrounds to produce

    an interactive video artwork. Human tracking systems are a popular research topic in

    computer vision. As one of the steps toward its final goal, the project requires the

    development of motion tracking software. Currently, the project has not yet completed

    the step that captures the moving objects; the development of this step is the focus of

    this paper.


    TABLE OF CONTENTS

    ABSTRACT

    TABLE OF CONTENTS

    LIST OF ILLUSTRATIONS

    1 - INTRODUCTION

    2 - EQUIPMENT AND METHODOLOGY

    2.1 Canon PowerShot A70

    2.2 Blaze Media Pro

    2.3 Matlab

    2.4 Assumptions

    2.5 Algorithm/Approach

    3 - DESIGNS AND EXPERIMENTS

    3.1 System Overview

    3.2 Foreground Extraction

    3.2.1 Gaussian Model

    3.2.2 Background Subtraction

    3.3 Noise Filter

    3.3.1 Dilate/Erode

    3.3.2 Edge Rounding

    4 - RESULTS

    4.1 Difficulties

    4.2 Future Development

    5 - CONCLUSION

    REFERENCES


    LIST OF ILLUSTRATIONS

    Figure 1 System Overview Diagram

    Figure 2 Background Model

    Figure 3 Subtracted Frame

    Figure 4 Morphological operation 1

    Figure 5 Morphological operation 2

    Figure 6 Morphological operation 3

    Figure 7 Comparison of Input / Output

    Figure 8 Example of background problem


    1 - INTRODUCTION

    This project investigates the development of a foreground segmentation algorithm that will

    be used to extract moving foreground objects from a video. The video will be shot with a

    standard digital camera in a controlled environment. The resulting video from the

    extraction process will contain a silhouette that tracks the foreground objects'

    movements. The objectives for the project are to research a practical algorithm,

    implement the algorithm, and finally, apply filtering techniques to the extracted

    foreground to reduce the noise remaining from the segmentation. This extraction

    process is part of a larger project that will be used as an interactive display. The entire

    project will combine silhouettes of moving objects and pre-recorded background

    imagery to create a video artwork. A number of research papers describe different

    segmentation approaches and algorithms; however, most algorithms are still not able to

    achieve a fully noiseless extraction due to lighting and modeling complications.

    Different segmentation algorithms are still actively being researched and tested. There

    was an attempt at building a real-time extraction project by a previous EECE 496 student.

    The resulting video frames were noisy, and the student's time constraints did not allow

    him to implement a better filtering technique. Unlike the previous student's project, the

    new extraction algorithm will not have to run in real time; therefore, I will start the

    project from the beginning. The newly built segmentation process will be able to extract

    foreground objects, and its accuracy should improve on the previous work. The report is

    divided into three major sections. First, the equipment and software used will be

    discussed, followed by an analysis of the system's design. The performance of the

    process and the problems faced during the project will be covered last.


    2 - EQUIPMENT AND METHODOLOGY

    The equipment used for the project includes a digital camera, video editing and

    compression software, and a processing tool that performs the extraction. The

    digital camera will be used to take videos of moving foreground objects. Then, the video

    editing software will compress the video into a file format that the processing tool can

    recognize. Lastly, the processing tool will output a video that shows only the

    shadow of the foreground objects. Assumptions made for the project may alter the

    choice of a practical extraction algorithm. There are many different techniques that can

    extract foregrounds from videos; this project will use a statistical approach that can be

    applied to videos with random moving objects in the foreground.

    2.1 CANON POWERSHOT A70

    The video camera that was used for this project was the Canon PowerShot A70. It is a

    general-purpose digital camera that takes both still pictures and videos. This lower-end

    camera has most functions that a normal digital camera has. The camera can record

    videos at three different pixel resolutions: 640 by 480, 320 by 240, and 160 by 120. By

    default, video is recorded at 15 frames per second (FPS) and 320 by 240 pixels.

    The video, along with the sound, is saved as an Audio Video Interleave (AVI) file with

    the M-JPEG codec.


    This video camera was used because it was readily available. Multiple filming sessions

    were needed in order to produce a video that was most suitable for the extraction

    algorithm; therefore, a camera that is both mobile and accessible was required. The

    usage of the camera is very basic. Like other digital cameras, the A70 first needs its

    functional mode set to movie. After adjusting the resolution and zoom settings, the

    camera is ready to film.

    The only problem that occurred with the digital camera was that the AVI file downloaded

    from the memory card could not be read directly by Matlab, the image processing tool.

    The problem was easily solved by using video editing software to convert the file into a

    format that Matlab can process.

    2.2 BLAZE MEDIA PRO

    This project uses the video editing software Blaze Media Pro. The official website for

    the software can be found at www.blazemp.com. This audio and video editing software

    has features such as an AVI converter, an audio editor, and a movie editor. Even though

    the software has an extensive list of powerful features, only the AVI converter will be used.

    Not only does Blaze Media Pro meet all of the project's needs, it is also very easy to

    operate. After starting the program, many audio and video editing features are available;

    for this project, only the Convert Video feature is needed. The converting tool allows the

    user to change the output video's frame rate, codec, and picture resolution. A frame rate

    of 10 FPS is preferred due to the large sizes of the videos: a video with 200 frames takes

    the extraction algorithm about 10 minutes to complete, so the minimal 10 FPS is desired.

    Furthermore, the Indeo 5.1 codec is used as the video encoder for the project, for the sole

    reason that Matlab can read Indeo 5.1 encoding. Lastly, the software can also change the

    frame resolution of the video. By default, the Canon A70's video files have a frame size

    of 320 by 240, a four-to-three ratio. If a video file has a frame size of 640 by 480, the

    software can decrease the resolution so the foreground separation process does not take

    too long. One thing to consider is that the reduced resolution must match the original

    video's height-to-width ratio. The most important feature this editing software provides

    is that it can remove the audio from the AVI file. This is required since the extraction

    algorithm expects only image data. The output video should be the same as the original

    except that there will be no audio; the video quality may decrease depending on editing

    options like frame rate and resolution.

    2.3 MATLAB

    For the project, I chose to use Matlab as the programming language. It is a high-level

    language that specializes in data analysis and mathematical computation. Matlab's

    official website can be found at www.mathworks.com. The program environment has an

    interactive command window that allows users to test and experiment with code line by

    line. Users can also save their code into an M-file and run it as a program. The Matlab

    Help Navigator is also very useful: it properly categorizes all functions and provides

    detailed explanations and sample usages. Just like C++ and Java, the language syntax

    provides loops and conditional statements.

    The language was chosen over C++ and Java because it has many built-in functions

    specific to image processing. As well, it can compute large mathematical equations

    faster than other languages. These advantages suit the project perfectly, given the large

    matrix computations required during the extraction process.

    There were some minor problems during the project. The first was that Matlab was a

    completely new language and environment for me; I had to familiarize myself with it by

    working through simple tutorials and exploring the programming environment. Another

    problem was that Matlab takes a long time to run the segmentation code. Compared to

    C++ and Java, Matlab can calculate matrices quickly, but processing large video files is

    slow for an interpreted language. Lastly, the Matlab environment requires a lot of

    memory to run. During startup and execution, Windows often cannot provide enough

    memory for Matlab, and the program will sometimes shut down automatically.


    2.4 ASSUMPTIONS

    Assumptions play a major role in this project due to the randomness of events that may

    occur. There are a couple of assumptions related to the background environment of the

    video. The first assumption allows the user to choose the location of the video: it can be

    filmed either indoors or outdoors. Secondly, the lighting in the video must always be

    constant, due to the difficulty that arises when a light source changes its brightness or

    location. Also, the background of the video must be static; no moving objects are

    allowed in the background. Even slight movement, such as a reflection off of a window,

    can create unwanted noise. Lastly, the software process is not required to run in real

    time. This assumption greatly reduces the complexity of the software.

    2.5 ALGORITHM/APPROACH

    The study of human motion capture is still an active research area in computer science.

    Even though numerous papers describe different methods to capture human motion, no

    single practical method is yet established. For this project, a foreground is defined as

    any object that moves; therefore, human motion capture methods can be applied to the

    project.

    In the field of human motion capture, different applications often require different

    constraints. Many human tracking systems use active sensing to locate the joints or

    features of the human body. However, the project requirements do not allow placing

    signal transmitters on the foreground objects, so passive sensing will be used. Moreover,

    because the project is not concerned only with tracking the human body, building a

    human model will be avoided.

    One of the segmentation algorithms that was considered is simple thresholding.

    Thresholding measures the foreground object's color or intensity value and compares it

    with the background. If the color intensity of a pixel is different from that of its

    neighboring pixels, the pixel is treated as foreground. The one drawback to this

    approach is that it requires the foreground object's surface color to differ significantly

    from the background [1]. This approach will not work here because it relies heavily on

    the color difference between the foreground and the background.

    The extraction algorithm that will be used for this project is based on a statistical

    approach, very similar to the background subtraction method. Background subtraction

    takes each frame in the video and subtracts from it a static background that is known

    prior to the extraction process. Instead, this approach will calculate the mean and

    variance of the color intensity of each pixel and build a background image from those

    calculations. After obtaining the background, each frame in the video will be subtracted

    from the background, and a silhouette will be formed. This approach was chosen for its

    practicality and suitability: background subtraction algorithms are generally less

    complicated than other methods [1]. Even though a human model would generate

    significantly fewer errors, it does not generalize to arbitrary foreground objects.


    3 - DESIGNS AND EXPERIMENTS

    The entire project consists of two steps. The first step requires the user to manually

    prepare a valid video for foreground extraction. The equipment and video editing

    software used in this step is not limited to the equipment mentioned in this report.

    This step will not be discussed in this section since it is not part of the actual system

    design. The next step is the execution of the software system, which consists of two

    main phases. The first phase implements the extraction algorithm that was discussed

    previously. The second phase takes the result of the first phase and reduces as much

    noise as possible. This section will focus on the entire system's flow and the techniques

    that were utilized during the extraction and filtering phases.

    3.1 SYSTEM OVERVIEW

    The overall system consists of two phases: an extraction phase and a filtering phase. The

    purpose of the extraction phase is to execute the foreground extraction algorithm on the

    given video file. The extraction phase will receive a valid video as input. The images

    will first be processed to create a Gaussian Model of the background. Then the

    background model will be subtracted from each frame to generate an unfiltered

    silhouette. The second phase will filter the images to reduce unwanted noise using

    closing, opening, and edging techniques. The final product will be a video containing a

    shadow that tracks the foreground objects' motion from the original video. Below is a

    diagram that shows the flow of the overall system.

    Figure 1 System Overview Diagram

    3.2 FOREGROUND EXTRACTION

    The first phase's main purpose is to process the video and perform background

    subtraction on every frame. First, a statistically based Gaussian Model is made. This

    model will be used as the reference for every frame when executing background

    subtraction. The output of the phase will be a sequence of images that require further

    filtering.


    3.2.1 Gaussian Model

    The first step in developing foreground extraction software is to build a model of the

    background. Since there are no preset background images to use, the software has to

    generate a model automatically. Using the statistical approach, the software builds a

    Gaussian Model. A Gaussian Model calculates each pixel value from the mean and

    variance of all the sample pixels. The model sets a lower bound and an upper bound that

    eliminate pixels outside the norm. If a video runs for an extended period of time, each

    pixel's average will converge to the background's value unless a foreground object stays

    static.

    On Matlab's official website, www.mathworks.com, I was able to find a downloadable

    software package that creates a Gaussian Model from a video. The code works in the

    Hue, Saturation, Value (HSV) color space instead of the Red, Green, Blue (RGB) color

    space, since HSV can minimize the effect that shadows have on images. After loading

    the video into the software package, a background can be calculated. (See Figure 2)


    Figure 2 Background Model

    3.2.2 Background Subtraction

    The background subtraction is very straightforward. This step takes every frame of the

    video and subtracts from it the Gaussian Model that was calculated in the previous step.

    The resulting frame shows a general shape of where the foreground objects are located;

    however, the frame is very noisy due to the imperfection of the subtraction method. See

    Figure 3.


    Figure 3 Subtracted Frame

    There was one problem faced during this step: the Matlab software occupied too much

    memory, and the process often had to be stopped short. Instead of saving the pixel data

    into one large variable, the program now reads each frame individually during each

    subtraction and deletes the temporary frames immediately after each loop.

    3.3 NOISE FILTER

    This phase receives a rough sketch of the shadow from the extraction phase and performs

    filtering techniques to eliminate the remaining noise. The first technique applied is the

    traditional morphological operations. Then an edge rounding technique is applied to

    smooth out the edges of the figure.


    Before executing any filtering techniques, the program needs to label each pixel as either

    a foreground or background pixel. A technique that helps solve this problem is

    thresholding. Pixel deviation values that are higher than the threshold value are allowed

    to pass through and set the pixel's status as foreground. The lower the threshold, the

    more noise will appear, since more pixels pass as foreground; however, the threshold

    value cannot be set too high either, because important data will be lost. After passing

    through the threshold, each frame becomes a binary image that contains either 0

    (background) or 1 (foreground).
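    The subtraction and thresholding steps together can be sketched as below. This is an illustrative Python version, since the project's code was written in Matlab; frames are nested lists of grayscale values, and `thresh` is an assumed parameter.

    ```python
    def subtract(frame, background):
        """Absolute per-pixel deviation of one frame from the background model."""
        return [[abs(p - b) for p, b in zip(frow, brow)]
                for frow, brow in zip(frame, background)]

    def binarize(diff, thresh):
        """Binary mask: 1 (foreground) where the deviation exceeds the
        threshold, 0 (background) elsewhere."""
        return [[1 if d > thresh else 0 for d in row] for row in diff]
    ```

    Lowering `thresh` lets more noise pass as foreground, while raising it too far discards genuine foreground pixels, which is the trade-off described above.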

    3.3.1 Dilate/Erode

    In the field of image processing, morphological operations are commonly used to clean

    up noise after the subtraction process. Techniques like dilation and erosion are standard

    operations. Dilation is the expansion of the foreground, whereas erosion is the

    contraction of the foreground. For every pixel, both techniques check its neighbors for

    foreground pixels within a specified radius. If a foreground pixel is detected, dilation

    marks the current pixel as foreground, and erosion performs the opposite way.

    Furthermore, when dilation is followed by erosion, the technique is called closing, and it

    fills up unwanted holes inside an enclosed foreground area. When erosion is followed by

    dilation, the technique is called opening, and it eliminates noise present in the

    background. Choosing the radius size is very important: if the radius is too small, some

    misplaced pixels will not be removed, and if the radius is too large, important foreground

    features will go missing. The technique of closing works closely with the idea of

    thresholding. If the threshold is set at a very high value, fewer foreground pixels pass

    through, which results in losing some foreground data. This can be compensated for by

    choosing a wider radius during the closing operation to reform some of the lost figure [2].
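    The four operations can be sketched in a few lines. This is an illustrative Python version using only a square (2r+1)-by-(2r+1) neighborhood; the project's Matlab code used `imclose`/`imopen` with square and disk structuring elements.

    ```python
    def dilate(img, r=1):
        """Mark a pixel foreground if any pixel within Chebyshev radius r
        (a (2r+1) x (2r+1) square neighborhood) is foreground."""
        h, w = len(img), len(img[0])
        out = [[0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                out[y][x] = 1 if any(
                    img[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))) else 0
        return out

    def erode(img, r=1):
        """Keep a pixel foreground only if every in-frame pixel within
        radius r is foreground (out-of-frame pixels are ignored)."""
        h, w = len(img), len(img[0])
        out = [[0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                out[y][x] = 1 if all(
                    img[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))) else 0
        return out

    def closing(img, r=1):
        """Dilation followed by erosion: fills small holes in the figure."""
        return erode(dilate(img, r), r)

    def opening(img, r=1):
        """Erosion followed by dilation: removes small background noise."""
        return dilate(erode(img, r), r)
    ```

    Opening deletes any blob too small to survive the erosion, while closing bridges holes narrower than the structuring element, matching the descriptions above.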

    For this project, the closing and opening operations were applied to the images. Matlab

    has the built-in functions imclose and imopen. Both functions can select the radius size,

    as well as the shape in which they search for foreground pixels. Usually, a standard

    shape like a disk of radius 1 is used for both opening and closing. Through testing, I

    found the combination that eliminates the most noise is a square of radius 1 for closing

    and a disk of radius 2 for opening. The comparison below shows much better results

    using two different types of shape for closing and opening.

    Figure 4 Morphological operation 1


    Figure 4 uses a square shape of radius 1 for closing and a disk shape of radius 2 for

    opening. For this particular frame, there is minimal noise present.

    Figure 5 Morphological operation 2

    Figure 5 uses a disk shape of radius 1 for both the closing and opening operations. It is

    very clear that the image has considerably more errors than Figure 4.

    Figure 6 Morphological operation 3


    Figure 6 uses a disk shape of radius 1 for closing and a disk shape of radius 3 for

    opening. The closing operation's parameters are the same as in Figure 5; however, due

    to the increased radius of the opening operation, Figure 6 has considerably less noise in

    the background. On close comparison, there are some noticeable details surrounding the

    foreground figure that Figure 6 does not capture.

    3.3.2 Edge Rounding

    Another technique included in the filtering process is edge rounding. After applying the

    morphological operations to the silhouette, the figure's boundary looks rugged. Edge

    rounding smooths out the edges to form a softer-looking figure. This project uses Canny

    edge detection to trace the figure's shape. Canny edge detectors are very popular in

    image processing due to their simplicity and effectiveness. Usually the detector finds the

    edges of a color image by comparing the gradient values of pixels with their neighbors

    [3]. In this project, it will be used to smooth out a binary image in order to clean up after

    the morphological operations.
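    A full Canny detector is more involved; as a simplified stand-in for this step, the edges of a binary silhouette can be traced and then added back onto the filtered frame, mirroring the edge-plus-frame combination this section describes. Illustrative Python; the project used Matlab's Canny detector.

    ```python
    def boundary(mask):
        """Trace the outline of a binary silhouette: a foreground pixel is
        an edge if any 4-neighbor is background or lies outside the frame
        (a simplified stand-in for the Canny step)."""
        h, w = len(mask), len(mask[0])
        edge = [[0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                if mask[y][x]:
                    nbrs = [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
                    if any(not (0 <= j < h and 0 <= i < w) or not mask[j][i]
                           for j, i in nbrs):
                        edge[y][x] = 1
        return edge

    def add_edges(mask, edge):
        """Union of the filtered frame with its traced edges."""
        return [[a | b for a, b in zip(mr, er)] for mr, er in zip(mask, edge)]
    ```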

    First, the filtered images are processed using Canny edge detection, producing another set

    of images with traced edges. Next, the edges are added to the filtered frame in order to

    fill out the body of the silhouette. An attempt was made to fill the regions inside the

    edges, but the uncertainty of the foreground objects made this very difficult. After the

    edge rounding operation, each frame is packaged into the output product. Below is a

    comparison of a frame before and after the entire extraction process.

    Figure 7 Comparison of Input / Output


    4 - RESULTS

    The resulting video from the background subtraction meets most of the expectations of

    the project. The objective requires the extraction algorithm to accurately track the

    motion and figure of any foreground object. The extraction process does produce a

    well-defined figure of the foreground objects, and it can track multiple moving objects of

    any shape in the scene. However, the objective also requires the foreground objects to be

    tracked as accurately as possible. After processing the video, some minor noise is

    present within certain frames. The filtering process did not completely remove all the

    misinterpreted pixels; therefore, a perfectly noiseless frame was not attained.

    Because the algorithm uses a statistical model of the background to execute the

    extraction, the environment becomes vital to producing an errorless silhouette. The

    video samples that were used to test the algorithm have a variety of background colors,

    intensities, and surfaces. After analyzing samples of the video output, there seems to be

    a common trend in the frame's environment when the object becomes disfigured:

    extreme color intensity values in the background often distort the foreground objects.

    For example, some of the sample videos have both a light color, such as the sky, and a

    dark color, such as a bush, in the background. (See Figure 8)


    Figure 8 Example of background problem

    When a foreground object overlaps and covers a background spot that has a significant

    lighting difference, the silhouette becomes very noisy and starts to deform.

    For this project, we were given the luxury of specifying the background environment.

    Through experimentation, I found a couple of preferred surrounding conditions that work

    well with the algorithm. Most importantly, an evenly spread light source is preferred. If

    a strong, focused light source were present in the video, not only would the shadow

    create accuracy problems, but extreme changes in lighting intensity would also create

    noise in unexpected areas. In addition, a background with similar color and intensity

    values helps distinguish the foreground object more precisely. Under these conditions,

    the foreground extraction process produces much less noise.


    4.1 DIFFICULTIES

    There were a couple of problems that occurred during the development of the project.

    The first was the lack of memory on the computer. Running the Matlab software takes a

    lot of memory. Using Matlab, the project reads a video file and processes every frame of

    the video. Assuming the video is running at 10 FPS, a twenty-second video will have

    two hundred frames of 320 by 240 pixels. Every pixel is needed to calculate the

    Gaussian Model and later used for extraction, and all of this data is held in memory. If a

    video has more than one hundred and fifty frames, Matlab will give an error stating that

    there was insufficient memory to process all the data. This problem was solved by using

    only every nth frame when creating the Gaussian Model. The resulting background

    model is almost identical to the model that uses every frame, even if only every fourth

    frame is considered for the background calculation, because the objects' movements are

    still sampled at a high pace. Additionally, during the execution of the frame

    differencing, the software utilizes only one frame at a time to avoid storing the same

    frame in memory twice.
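    The memory fix can be sketched as a running accumulation that holds only one frame at a time while skipping frames. Illustrative Python, with `step=4` matching the every-fourth-frame choice described above; `frame_iter` is assumed to yield grayscale frames one at a time.

    ```python
    def running_mean(frame_iter, step=4):
        """Accumulate a per-pixel running mean over every step-th frame,
        keeping only one frame plus the running totals in memory."""
        total, count = None, 0
        for idx, frame in enumerate(frame_iter):
            if idx % step:          # skip frames between samples
                continue
            if total is None:       # size the accumulator from the first frame
                total = [[0.0] * len(frame[0]) for _ in frame]
            for y, row in enumerate(frame):
                for x, p in enumerate(row):
                    total[y][x] += p
            count += 1
        return [[v / count for v in row] for row in total]
    ```

    The same pattern extends to the variance by keeping a second accumulator of squared values alongside the totals.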

    Another problem that was presented was the filling of the disfigured shadows. During

    noisy frames, figures often break apart and have holes within the body. An attempt was

    made to fill in the holes of the body; however, it was not successful. Since there are a lot

    of random movements that the human body is capable of, labeling a gap as an error or an

    actual gap of the picture is very difficult. For example, when a person crosses his arm


    against his waist, a gap is created between the arm and the body. In such a
    case, it is not easy to determine which holes to fill.

    4.2 FUTURE DEVELOPMENT

    There is still a lot of improvement to be made before the project becomes a
    complete product. Several effective filtering techniques were not used in this
    project, such as size filtering. Size filtering removes the small noise spots
    that appear from time to time: it calculates the size of each enclosed area and
    removes the area from the frame if it is under a certain threshold. There is,
    however, a drawback to this type of filtering. If a small part of the foreground
    object is separated from the main body by a background obstacle, that part risks
    being deleted if the threshold value is not set accordingly.
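    As a sketch of the proposed size filtering step (which was not implemented in
    the project), the following Python/NumPy function labels 4-connected foreground
    regions and discards those below a pixel-count threshold. The names
    `size_filter` and `min_size` are illustrative, not from the original code.

    ```python
    from collections import deque
    import numpy as np

    def size_filter(mask, min_size):
        """Keep only 4-connected foreground regions of at least min_size
        pixels; smaller regions are treated as noise and removed."""
        mask = np.asarray(mask, dtype=bool)
        out = np.zeros_like(mask)
        seen = np.zeros_like(mask)
        height, width = mask.shape
        for sy in range(height):
            for sx in range(width):
                if not mask[sy, sx] or seen[sy, sx]:
                    continue
                # Breadth-first search to collect one connected component.
                component = [(sy, sx)]
                seen[sy, sx] = True
                queue = deque(component)
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny, nx] and not seen[ny, nx]):
                            seen[ny, nx] = True
                            component.append((ny, nx))
                            queue.append((ny, nx))
                # Copy the component to the output only if it is large enough.
                if len(component) >= min_size:
                    for y, x in component:
                        out[y, x] = True
        return out
    ```

    The drawback mentioned above shows up directly in the choice of `min_size`: a
    detached hand or foot smaller than the threshold would be filtered out along
    with the noise.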

    Modeling is another technique that would help make the shadow video more
    accurate. Many research papers discuss matching a human model to the body motion
    in a video. Building a 3D human model is a lot of work due to the complexity of
    the human body, and its development would be a good project for a future
    EECE 496 student.


    5 CONCLUSION

    This report discussed the design and test results of a statistical extraction
    algorithm used for video foreground segmentation. The goal was to develop a
    software package that processes a video and outputs a shadow that accurately
    tracks the shapes and motions of any foreground objects. The project
    successfully achieved tracking of multiple foreground figures; however,
    noiseless extraction was not accomplished.

    With the given constraints and assumptions for this project, choosing a suitable
    algorithm was very important. A statistical approach was chosen to implement the
    extraction software because it is practical in most areas of concern. The
    project included both the setup of the sample video requiring foreground
    extraction and the actual program implementation. The software package was
    programmed in Matlab because of its graphics-processing capabilities. The system
    consists of two phases: an extraction phase and a filtering phase. The
    extraction phase subtracts a reference background model from each video frame
    and passes the result to the next phase for further processing. The filtering
    phase takes the subtracted images and applies morphological operations to create
    a shadow figure. Lastly, Canny edge detection finalizes the figure to obtain a
    smoothed silhouette.
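    The filtering phase's morphological step can be illustrated with a minimal
    binary closing (dilate, then erode, using a 3x3 cross-shaped structuring
    element) in Python/NumPy. This is a sketch of the kind of operation described,
    not the project's actual Matlab code.

    ```python
    import numpy as np

    def dilate(mask):
        """Binary dilation with a 3x3 cross (plus-shaped) structuring element."""
        m = np.asarray(mask, dtype=bool)
        out = m.copy()
        out[1:, :] |= m[:-1, :]   # neighbor above
        out[:-1, :] |= m[1:, :]   # neighbor below
        out[:, 1:] |= m[:, :-1]   # neighbor to the left
        out[:, :-1] |= m[:, 1:]   # neighbor to the right
        return out

    def erode(mask):
        """Binary erosion, expressed as the dual of dilation."""
        return ~dilate(~np.asarray(mask, dtype=bool))

    def close_mask(mask):
        """Morphological closing (dilate, then erode) to fill small holes
        and gaps in the extracted shadow figure."""
        return erode(dilate(mask))
    ```

    Closing fills one-pixel holes and hairline gaps in the silhouette while leaving
    the background untouched, which is the effect the filtering phase relies on
    before the Canny edge pass.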

    The resulting video met most of the objectives except for the accuracy of the
    shadow video. In certain frames there was often noise caused by an extreme
    difference in the background model's color intensity. This problem leaves future


    work on the project with great possibilities. Size filtering is definitely worth
    investigating, since it will help reduce unexpected noise, and future students
    can implement or add even better filtering techniques to move closer to the
    final product.

