Thesis: Identification of Billboards in a Live Handball Game
Table of Contents

ABSTRACT .......................................................................... 3
I. INTRODUCTION ................................................................... 4
  1.1. BACKGROUND ................................................................. 4
  1.2. IMAGE AND VIDEO ISSUES ..................................................... 7
  1.3. VARIOUS APPROACHES ......................................................... 9
II. IMAGE FEATURES FOR TRACKING .................................................. 12
  2.1. EDGES ..................................................................... 12
  2.2. COLOR ..................................................................... 14
  2.3. HISTOGRAM ................................................................. 16
III. DEFORMABLE TEMPLATE MATCHING ................................................ 18
  3.1. BASIC THEORY .............................................................. 19
    3.1.1. Bayes Theorem ......................................................... 19
    3.1.2. Bayesian Formulation of the Deformation ............................... 19
  3.2. DEFORMATION MODELS ........................................................ 21
  3.3. ALGORITHM ................................................................. 24
IV. CONDITIONAL DENSITY PROPAGATION (CONDENSATION) ............................... 27
  4.1. BASIC THEORY .............................................................. 28
    4.1.1. Modelling Shape and Motion ............................................ 28
    4.1.2. Discrete Time Propagation of State Density ............................ 28
    4.1.3. Temporal Propagation of Conditional Densities ......................... 29
    4.1.4. Dynamic Models ........................................................ 30
    4.1.5. Measurement ........................................................... 30
    4.1.6. Propagation ........................................................... 31
    4.1.7. Factored Sampling ..................................................... 32
  4.2. THE CONDENSATION ALGORITHM ................................................ 33
  4.3. TEMPLATE REPRESENTATION ................................................... 35
    4.3.1. B-spline Curves ....................................................... 35
    4.3.2. Template Curves ....................................................... 40
    4.3.3. Affine Representation of B-spline Curves .............................. 47
  4.4. CONDENSATION TRACKER ...................................................... 53
    4.4.1. Dynamic Model ......................................................... 53
    4.4.2. Observation Model ..................................................... 56
    4.4.3. Initialization ........................................................ 58
    4.4.4. Detailed Algorithm .................................................... 59
V. EXPERIMENT AND FINDINGS ....................................................... 63
  5.1. EXPERIMENT PURPOSE ........................................................ 63
  5.2. DATA SET .................................................................. 64
  5.3. ERROR MEASUREMENT ......................................................... 66
    5.3.1. Mean Square Error ..................................................... 66
    5.3.2. Confidence Interval ................................................... 68
  5.4. EXPERIMENTS ON DEFORMABLE TEMPLATE MATCHING ............................... 69
    5.4.1. Experimental Results .................................................. 69
    5.4.2. Findings and Discussions .............................................. 76
  5.5. EXPERIMENTS ON CONDENSATION ALGORITHM ..................................... 78
    5.5.1. Define an Optimal Template ............................................ 78
    5.5.2. Coefficients A0, A1, and B ............................................ 79
    5.5.3. Result and Findings ................................................... 85
  5.6. FURTHER DISCUSSIONS ...................................................... 104
    5.6.1. Optimal Definition of Final Q at Measurement Step .................... 104
    5.6.2. Optimal Number of Particles .......................................... 105
    5.6.3. Initialization and the Stability of the Filter ....................... 106
VI. COMPARISON OF TEMPLATE MATCHING AND CONDENSATION ............................ 107
VII. CONCLUSION AND FUTURE IMPROVEMENTS ......................................... 109
APPENDIX ........................................................................ 112
  A. GRADIENT MAGNITUDE FOR PIXEL REPRESENTATION ................................ 112
  B. COLOR MODEL CONVERSION FROM RGB TO HSI ..................................... 112
  C. ESTIMATION OF A0, A1, AND B BY MAXIMUM LIKELIHOOD ESTIMATION METHOD ....... 113
  D. IMAGES FROM THE EXPERIMENT ................................................. 115
    D.1. The Results of Section 5.4 Experiments on Deformable Template Matching 115
    D.2. A0s, A1s and Bs Calculated from the Training Data in Section 5.5.2 .... 124
    D.3. Results of Section 5.5 Experiments on Condensation ..................... 125
  E. MATLAB CODE ................................................................ 131
REFERENCES ...................................................................... 159
Abstract

Replacing the commercial billboards around the field in a live TV transmission of a
ballgame has huge commercial potential. The fewer parties are involved in this
process, the more profitable it is. This thesis addresses part of this process for a
handball game: how to track a billboard in a live handball game in real time, without
knowledge of the camera conditions. The information provided beforehand is the
approximate location of the cameras and the designs of the billboards. We present two
methods for detecting non-rigid objects, namely deformable template matching and
the condensation algorithm, and evaluate their accuracy and speed.

The template matching method seeks the object that best matches a deformable
template using the edge direction and the gradient magnitude of the edge. It can detect
the right object quite accurately. However, it is too slow to achieve real-time tracking.

The condensation algorithm predicts the location of the target object using a non-linear
dynamic model. Following this, the observation model determines the new
probability distribution for the next step by comparing the edge features of the
samples with a B-spline template; B-spline curves are flexible enough to represent
various shapes. The condensation algorithm is flexible and fast enough to achieve
real-time tracking. However, it is difficult to create an appropriate dynamic model
suitable for many different settings.

Last but not least, we wish to thank our supervisor, Kim Streenstrup Pederson, for his
advice and encouragement. We also thank Maz Spork at Bopa Vision for materials as
well as advice from a practical point of view.
I. Introduction
1.1. Background

The analysis and manipulation of live TV transmissions has huge commercial
potential. For example, while a TV station in Denmark is broadcasting a live football
match from England, the TV station could replace the advertisement billboards in the
stadium with Danish advertisements without the viewers in Denmark noticing the
underlying process. It is even more flexible and profitable if the TV station in
Denmark can achieve this replacement without any information from other parties,
such as the location and properties of the involved cameras. To implement such a
system, several problems need to be solved.
This project has been done in collaboration with the Danish company Bopa Vision,
which would like to create a product for real-time commercial replacement for live
TV of, for instance, sports events. Bopa Vision provided us with parts of a
videotaped handball game, several pictures with entire billboards, and a map of the
approximate locations of the billboards. The given video sequence consists of many
clips1 taken by different cameras.
Replacing billboards in a live handball game involves a lot of work, as depicted in
Figure 1: identifying clips, identifying the location of the advertisement billboards in
each frame, handling changes in the appearance of the billboards caused by motion
blur, zooming, shadows, and obstacles in front of the billboards, and placing another
commercial in the right place by adjusting for changes in global illumination, without
losing picture quality or introducing delays in the transmission.
1 By a clip we mean a sequence of frames taken by one camera.
[Figure 1 (flow diagram): the video sequence and the designs of the existing
billboards feed into clip identification; for each clip, billboards are identified in real
time, yielding their positions or a transformation; the billboard designs are then
replaced in real time with the new designs, and the clips are reconnected into a
sequence. Illumination, motion blur, zooming, shadow, and occlusion complicate the
identification step.]

Figure 1 Diagram of the billboard replacement process
Given the limited scope of our project, we will mainly focus on the second part of
such a system: given a clip, we will detect the relevant billboard and provide either
the transformation parameters or the location of the billboard corresponding to each
frame in the video sequence. Even though Bopa Vision’s request is to detect all
billboards in the scene, we focus on the primary stage of detecting one billboard at a
time. A robust tracker should be able to deal with whatever happens in the scene, such
as shadows, lighting changes, overlapping objects, and objects coming into the scene
or moving out of it (Stauffer and Grimson, 2000).
In the identification process, we will deal with the following issues:
1. Should the process be done frame by frame? Can it be done on an entire clip?
2. How should the billboard be defined? We are supposed to have only limited
information about the scene beforehand, such as the picture of a billboard. Is
it possible to detect billboards only from their shape? And if that is the case,
will connected billboards be considered as one?
3. What should the output be? What kind of coordinate descriptions should be
given to describe the locations of the billboards? Since the output should be
ready for the replacement process, it should also identify different kinds of
advertisements.
4. How should distorted billboards be handled? Even though every input is a clip
taken by one camera, the camera zooms and pans all the time, which can result
in motion blur. Furthermore, the lighting conditions in the stadium may
change. These factors make the identification rather difficult.
5. How should a billboard which is not totally visible be identified? Sometimes
one or more objects will be in front of a billboard, making part of the
billboard invisible. So we might see only part of the billboard, or a billboard
which is broken up by obstacles.
6. How should we identify obstacles and generate masks for them? As mentioned
in Issue 5, it is often the case that a few obstacles will be in front of the
billboards. We need to identify these obstacles and make masks for them, so
that the frame can be ready for the replacement process. These masks are very
important, as we need to put the objects back in their original places after
replacing the billboard.
Regarding Issue 1, we start by detecting a template frame by frame, even though the
motion between two consecutive frames is in general not very large. We need to find
the accurate location of the template in order to maintain the sequence’s continuity.
On the other hand, it is impossible to know beforehand when the camera makes a
sudden big pan. If we skip a frame where the camera pans a lot, it is very likely that
we will lose track of the template.
Regarding Issue 2, among many image features, we will discuss edges, colors,
histograms and shapes. In particular, we will look at edges and mainly use this feature
in our algorithms.
Regarding Issue 3, the output could be the coordinates of each billboard. However, all
billboards move in the same way, based on some transformation function. Therefore
it is also possible to predict the locations of all billboards in a frame if we concentrate
on detecting this common transformation function. We will consider this issue in
Chapter IV.
Regarding Issue 5, we will perform a few experiments on the case where a few
players run in front of a billboard. However, we leave a more thorough discussion of
this issue, together with Issues 4 and 6, for future research.
1.2. Image and Video Issues

In order to successfully locate and track objects in a video sequence, we need to
understand the features of the target objects and some issues concerning the video
sequence.
Useful features of an image are normally the ones that are detectable by the human
eye, such as color, texture, and edges. These features are often used in image
segmentation and object recognition. In our project, each billboard has its unique
features. Most of the billboards are rectangular, containing text of a certain color and
font. But color itself might not be effective for identifying billboards in our project,
because some billboards have the same colors, and the background scene may also
have the same color as the billboards. So it is natural that we will start the tracking by
representing each billboard by a unique feature. This could be the shape of the logos
shown on the billboard plus some additional color information. In order to identify
the features of each billboard, we will use the color and edge details of each frame.
We will discuss color and image edges in Chapter II.
Another source of concern is the motion in the video sequence. In video sequences,
objects are blurred and the view changes frequently due to the zooming and panning
of the camera. This makes tracking more difficult. Moreover, most target objects do
not have exactly the same features as the template image, because of motion blur as
well as occlusions, lighting changes, etc. Even the same object’s features will change
from frame to frame due to the above-mentioned factors, and the object will be
deformed by the zooming and panning of the camera as well as by changes in the
three-dimensional (3D) orientation of the billboard. Moving objects in a
video sequence have inherent motion blur, especially fast-moving objects. Very often
an object is partly occluded in the scene; in our project, for example, the billboards
are often hidden behind the handball players, and parts of a billboard may be outside
the screen. This can cause failures in locating and tracking the billboards. Figure 2
shows an example of an EL GIGANTEN billboard being occluded by a player.
Figure 2 A frame from the video sequence
One player is in front of the ELGIGANTEN billboard.
Figure 3 is an example of motion blur taken from one of the clips. The images blur a
lot when the camera turns quickly to follow the ball. On top of that, players who are
running fast blur a lot as well.

Figure 3 Images with motion blur (taken from the 120th frame in ’10-2.avi’)
The player on the left and the billboard behind her are heavily blurred because she is running faster than the other players.
In our video sequence, the camera moves constantly, sometimes very fast, in order to
follow the ball and the players. As a result, almost all the billboards we want to track
are motion blurred.
The task at this stage of the commercial replacement project, detecting billboards, is
to provide the accurate location of the billboard in each frame, where another
billboard can then be inserted. But when the replacement billboard is put into the
scene, the same motion blur needs to be generated in order to make the scene look
natural. It looks odd, for example, if a blurred edge of the original billboard is visible
at the border of the new, replaced billboard.
Due to time limitations, we will only cover motion blur and occlusion briefly later in
the thesis.
Our thesis will be based on the following assumptions:

1) A sequence of frames in the same clip is taken by the same camera.

2) The following information is given: (1) the designs of all billboards, (2) the
layout of the billboards in the sports arena (the relative positioning of
billboards and cameras).

3) Billboards have to be detected even when they are occluded or partly
invisible on the screen.

4) All kinds of billboards can be detected by the system we implement.

5) Real-time performance can be achieved by the use of programming languages
such as C++. We have, however, decided to use Matlab and to lower the speed
requirement, reflecting the expected gains in speed from using a faster
programming language.

6) Clips are already made, which means that separating a video sequence into
clips is out of the scope of this project. The provided video sequence is an
MPEG2 file, which Matlab cannot read. Therefore the clips are converted into
AVI files, which Matlab can read. We will use these converted images.
Somewhat surprisingly, the number of frames is reduced when the MPEG2 file
is converted to an AVI file, and thereby the data set becomes smaller than the
original one.
1.3. Various Approaches
There are several real-time tracking approaches using various image features for
different purposes. For video surveillance systems, where the camera neither moves
nor zooms, background subtraction works well, because it detects only the pixels
which have changed their colors or intensities, depending on whether it uses color or
gray scale images2 (Stauffer and Grimson, 2000; Lipton et al., 1998). Lipton et al.
mention, however, that background subtraction is not robust to changes in object size,
orientation and lighting conditions, which happen often in a handball game.
Combining more features with background subtraction to improve accuracy and/or
speed, Berriss et al. (2003) investigate a color-based approach for the MPEG-7
standard, and Koller et al. (1994) use a contour tracker and an affine motion model for
robust real-time traffic scene surveillance. Background subtraction is effective only
when the objects change neither their shape nor size.
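The idea of background subtraction described above can be sketched in a few lines: a pixel is marked as foreground when its intensity differs from a fixed background model by more than a threshold. This is a bare-bones illustration, not the adaptive model of Stauffer and Grimson; the background values, frame values, and threshold are invented for the example.

```python
def background_subtraction(frame, background, threshold):
    """Return a binary mask: 1 where the frame differs from the background model."""
    return [
        [1 if abs(f - b) > threshold else 0 for f, b in zip(frow, brow)]
        for frow, brow in zip(frame, background)
    ]

# Toy gray scale data: a static background and one new frame.
background = [
    [10, 10, 10],
    [10, 10, 10],
]
frame = [
    [10, 80, 10],   # a bright object has entered the middle of the top row
    [10, 10, 12],   # a small change that stays within the noise threshold
]
mask = background_subtraction(frame, background, 20)
```

Note how the fixed background model immediately explains the weakness cited above: a camera pan or zoom changes every pixel at once, so the whole frame is flagged as foreground.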
Another application of real-time tracking of deformable objects is area tracking, for
example, tracking a speaker’s head in a video conference. Fieguth and Terzopoulos
(1997) develop a very fast color-based model for tracking a speaker’s head. For this
purpose, precise position information is not needed as long as the object is visible on
the screen.
On the contrary, accuracy is one of the important requirements for our project. In
addition, billboards not only change their size but also deform under the projection
from the three-dimensional (3D) viewing frame onto the two-dimensional (2D)
viewing plane. Therefore, we need more precise information on the positions or
transformations of the deformable objects in every frame. The deformable template
matching addressed by Jain et al. (1996, 1998) may be able to achieve this goal by
detecting the precise positions or transformations of an object in non-linear systems.
A disadvantage of this approach is that it is time consuming (Kervrann and Heitz,
1994). The length of the processing time also depends on the capacity of the computer
used for the analysis.

If deformable template matching is too slow to achieve on-line tracking, another
approach suitable for on-line tracking is to predict the motion, as Yang and Waibel
(1996) suggest. Kalman filtering is a popular approach for linear tracking when clutter
does not occur in the image (Isard and Blake, 1996). In the handball game we need to
deal with clutter, because players often overlap in front of billboards.
2 Explanation of color and intensity follows in Chapter 2.
Particle filtering, or the condensation algorithm (Isard and Blake, 1996), is considered
because this non-linear tracker can keep tracking even in clutter.
We will discuss image features for tracking first in Chapter 2; then the first method,
deformable template matching, in Chapter 3; the second method, the conditional
density propagation method, in Chapter 4; the experiments on the two methods in
Chapter 5; the comparison of the two methods in Chapter 6; and finally the
conclusions of our work in Chapter 7.
II. Image Features for Tracking

In order to locate and track the billboards in the video sequence, we first need to
define the features of these billboards. This is done by making a unique template for
each billboard. The template T can be represented in different ways. For example, T
can be a histogram representing a certain pattern of the billboard. T can also be a
closed or open curve representing the shape of the billboard.
2.1. Edges

Edges are very important image features in image processing. They are points with
high intensity contrast and characterize the boundaries of objects contained in an
image. Using the edge information of an image also significantly reduces the amount
of data while preserving the important structural properties of the image. An edge is
characterized by its gradient magnitude3 and gradient direction4. Among the available
edge detectors, we use the Canny edge operator, which attains (i) a low probability of
error, (ii) good localization and (iii) only one response to a single edge, all at the same
time. A low probability of error corresponds to maximizing the signal-to-noise ratio,
because the two are related by monotonically decreasing functions. Canny (1983)
found that optimizing the product of the first two conflicting criteria, (i) and (ii),
provides a single best solution. The operator takes a gray scale image, smoothes it by
Gaussian convolution in order to suppress small intensity differences, and marks, in a
binary image, where the intensity discontinuity is higher than a given threshold
value5. Pixels whose gradient magnitudes are higher than the threshold are considered
edges and shown in white (binary value 1) on the edge map, whereas the other pixels
are black with the binary value 0.
3 The gradient magnitude is the difference in intensities between neighboring pixels in the two
directions x and y. For an image function f it is defined as
  ∇f = [ (∂f/∂x)² + (∂f/∂y)² ]^(1/2) = [ Gx² + Gy² ]^(1/2) ≈ |Gx| + |Gy|
(Gonzalez, 2001). The vector ∇f = [Gx, Gy]^T is called the gradient. More details are given in the
appendix.
4 The gradient direction is the direction angle of the gradient vector. At a point (x, y) it is defined as
  θ = tan⁻¹(Gy / Gx).
The angle is measured with respect to the x-axis. The edge direction is perpendicular to the direction of
the gradient vector (Gonzalez, 2001).
5 The threshold value should be a decimal number between zero and one.
A pixel is considered an edge if its gradient magnitude is larger than the threshold
value we give.
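The gradient-and-threshold step described above can be sketched as follows. This is a simplified illustration using plain finite differences and the |Gx| + |Gy| approximation from footnote 3, not the full Canny operator (there is no Gaussian smoothing, non-maximum suppression or hysteresis); the tiny image and the threshold value are invented for the example.

```python
def gradient_magnitude(img):
    """Approximate the gradient magnitude by |Gx| + |Gy| with forward differences."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = img[y][x + 1] - img[y][x]   # intensity difference in x
            gy = img[y + 1][x] - img[y][x]   # intensity difference in y
            mag[y][x] = abs(gx) + abs(gy)
    return mag

def edge_map(img, threshold):
    """Binary edge map: 1 where the gradient magnitude exceeds the threshold."""
    mag = gradient_magnitude(img)
    return [[1 if m > threshold else 0 for m in row] for row in mag]

# A tiny gray scale image with a vertical intensity step between columns 1 and 2.
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
edges = edge_map(image, 0.5)
```

Only the column where the intensity jumps is marked as an edge; raising the threshold above 1.0 would make even this strong edge disappear, which is exactly the effect shown in Figure 4.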
The following figures show the importance of choosing the right threshold for
defining edges. The same picture, Figure 4.a), yields different edge images when
different Canny edge thresholds are given. Figures 4.b) and 4.c) clearly show that
different threshold values influence the result when the edge feature is used in image
analysis. Figure 4.b), with the threshold 0.4, keeps the strong edges of the logo ’EL
GIGANTEN’ as well as other, less important edges, whereas most of the players’
edges and even the ‘EL GIGANTEN’ billboard’s boundary disappear with the
threshold 0.6 in Figure 4.c).
a) Original image
b) Edge map with Canny edge threshold 0.4
c) Edge map with Canny edge threshold 0.6
Figure 4 Edge pictures by different threshold values
It is very important to choose a suitable threshold value, so that the edges that need to
be detected are visible in the edge map. It is, at the same time, a very difficult task. In
this thesis, we try to find a good threshold value through trials and visual judgment.
We will further discuss how the threshold affects the tracking performance in
Chapter 5.
2.2. Color

Color is another important and useful descriptor of objects in an image. Using color as
a feature has advantages: firstly, processing color is relatively fast and, secondly,
color is orientation invariant (Yang and Waibel, 1996). There are many color models
in use today for organizing the variety of colors.

The RGB color model (Red, Green, Blue), the HSI model (Hue, Saturation, Intensity),
and the HSV model (Hue, Saturation, Value) are the ones most frequently used in
digital image processing. The RGB model is used by most digital imaging devices
(e.g. monitors and color cameras), including those behind our video sequence and
images. In the RGB model, a color is represented in terms of the amount of red, green
and blue light it contains.
The HSI model describes a color in the way it is perceived by the human eye. Hue
represents pure color. In this color model, hue is measured as an angle between 0 and
360 degrees, starting from red and passing through yellow, green, cyan, blue and
magenta. For example, the hue of pure green is 120 degrees. Saturation represents
how much a pure color is diluted by white light. It is measured between 0 and 1,
where 0 indicates absence of color and 1.0 indicates a pure color. Intensity, or
brightness, is an achromatic notion. Gray scale images are shown in intensity. Edges
are defined only by the gradient magnitude of the intensity. We convert color images
into gray scale images and use the intensity values to find edges. The HSI model is
obtained by conversion from the RGB model6.
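The RGB-to-HSI conversion mentioned here can be sketched as below. The thesis gives its own formula in Appendix B; this is the common textbook version (cf. Gonzalez, 2001), with R, G and B normalized to [0, 1], and the function name is ours.

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert normalized RGB to (hue in degrees, saturation, intensity)."""
    i = (r + g + b) / 3.0
    if i == 0:
        return 0.0, 0.0, 0.0                 # black: hue and saturation undefined
    s = 1.0 - 3.0 * min(r, g, b) / (r + g + b)
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    theta = math.degrees(math.acos(num / den)) if den != 0 else 0.0
    h = theta if b <= g else 360.0 - theta   # hue wraps past 180 degrees when B > G
    return h, s, i

# Pure green should give hue 120 degrees and full saturation,
# matching the description in the text.
h, s, i = rgb_to_hsi(0.0, 1.0, 0.0)
```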
We can use color information to differentiate the target objects from their
surroundings. So color might be used as an image feature to identify the target, or at
least to significantly reduce the search area.
Figure 5 is a frame image from the video sequence. If, for example, we want to track
billboard ‘e’, we can limit the search area in each frame image to the areas where the
color information is similar to billboard ‘e’, i.e., the combination of red and white.
Figure 5 Areas taken out by color information
left: the original picture, right: red areas selected based on the color in ‘e.’ The right picture is made using Photoshop.
Color information is helpful in identifying the target. It can, at the same time, be
misleading. One problem with images is that the same objects might have different
colors and intensities when the lighting conditions change or there are shadows. This
occurs particularly often in our project. The billboard images for the templates were
taken separately, under different circumstances from the actual game in the video
sequence. In the live match broadcast, however, the lighting conditions are
6 The conversion formula from RGB to HSI is given in the appendix.
different, and they even change often during the match. Moreover, there are many
shadows cast by the players in front of the billboards. So when we use the template
color as the sample color and try to find areas with a similar color in the frame, a high
tolerance level includes a lot of unnecessary area, so the reduction of the search area
is not very significant; a low tolerance level, on the other hand, carries the risk of
ignoring important areas.
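The tolerance trade-off described above can be illustrated with a small sketch that keeps only the pixels whose color is close to a template color. The template color, frame values, the L1 color distance, and both tolerance values are invented for the example.

```python
def color_mask(frame, template_color, tolerance):
    """Mark pixels whose RGB distance to the template color is within tolerance."""
    tr, tg, tb = template_color
    return [
        [1 if abs(r - tr) + abs(g - tg) + abs(b - tb) <= tolerance else 0
         for (r, g, b) in row]
        for row in frame
    ]

template_red = (200, 30, 30)
frame = [
    [(205, 35, 25), (90, 90, 90)],   # billboard pixel, gray background
    [(190, 60, 40), (205, 25, 35)],  # shadowed billboard pixel, billboard pixel
]

loose = color_mask(frame, template_red, 80)   # high tolerance: more area kept
tight = color_mask(frame, template_red, 20)   # low tolerance: shadowed pixel lost
```

The loose mask keeps the shadowed billboard pixel but would also admit more clutter in a real frame; the tight mask rejects clutter but drops the shadowed pixel, which is exactly the risk mentioned above.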
2.3. Histogram

The histogram can be another useful feature for tracking. One of the problems is that
intensity is vulnerable to various unknown and uncontrollable factors which change
constantly (Weng et al. 1993), such as lighting and shadow. Even though lighting and
shadow affect the intensity, the histogram, or at least its distribution, may keep the
characteristics of the target object. When the billboard is occluded by players or other
objects, however, the histogram of the target billboard changes, and then it is difficult
to identify the correct target. Figure 7 shows the difference in the histograms of a
billboard with and without the occlusion shown in Figure 6.
a) without occlusion b) with occlusion
Figure 6 Billboards without and with occlusion
a) and b) are neighbouring billboards in the same frame; therefore lighting effects are similar. a) taken from 10-1.avi, 30th frame (415 ≤ x ≤ 565, 165 ≤ y ≤ 220); b) taken from 10-1.avi, 30th frame (555 ≤ x ≤ 705, 155 ≤ y ≤ 215)
a) without occlusion b) with occlusion
Figure 7 Histogram of the intensity of Figure 6 a) and b)
The x-axis represents the intensity value from 0 to 255 and the y-axis represents the number of pixels with the corresponding intensity.
We can see a clear difference in the histograms between the intensity values of 50 and
100. The histogram changes every time the billboard is occluded. Players move
suddenly and irregularly as they try to trick their opponents. Since we have no clue
about when occlusion occurs, the histogram does not seem to be very useful in our
case.
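As an illustration of how a histogram comparison behaves under occlusion, the following sketch compares the intensity histogram of an unoccluded billboard patch with an occluded one. The bin count, pixel values, and the L1 distance measure are illustrative choices, not the thesis's own.

```python
def histogram(pixels, bins=8, max_value=256):
    """Histogram of intensity values in [0, max_value), with `bins` equal bins."""
    counts = [0] * bins
    width = max_value / bins
    for p in pixels:
        counts[min(int(p / width), bins - 1)] += 1
    return counts

def l1_distance(h1, h2):
    """Sum of absolute bin differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

# Toy patches: a bright logo (intensity 200) on a dark background (intensity 40).
billboard = [200] * 60 + [40] * 40
# A player (intensity 90) covers half of the logo pixels.
occluded = [200] * 30 + [40] * 40 + [90] * 30

d = l1_distance(histogram(billboard), histogram(occluded))
```

The occluding player both removes mass from the logo's bin and adds a new bin, so the distance to the template histogram grows with the occluded area, which is why the histogram is unreliable here.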
III. Deformable Template Matching
An object in two-dimensional (2D) video images cannot avoid non-rigid7
transformation from the original 3D viewing frame. Objects change their shapes; for
example, two parallel lines look as if they meet at the end if we see them from the
side. This is called shearing. Even though the shearing is too small to notice, it is not
zero. Therefore it is dangerous to use a rigid scheme, because a target object may be
considered different from the template under transformations other than translation,
scaling, and rotation (Jain et al. 1996).
On the other hand, a deformable template can ‘deform’ itself to fit the image features
by non-rigid transformations, meaning various transformations that are usually
more complex than translation, scaling, and rotation (Jain et al., 1996). The
deformable template matching method is an approach often used for object
localization and identification of non-rigid shapes (Jain et al. 1996, 1998, 2000, Tate
and Takefuji 2002). The basic idea is to search for a target object which minimizes
an energy function defined by a Bayesian objective function. The template moves
around and is deformed in order to match the boundary of objects in an input
image. The candidate which achieves the minimum energy is considered the target. Thus
deformable template matching is more versatile and flexible in dealing with various
non-rigid transformations. It is also applicable to tracking (Jain et al. 2000, Kervrann
and Heitz, 1994). Moreover, templates can be of any shape, even one whose exact form
is not known in advance; a template can also be given as a hand-drawn sketch (Jain et al. 1996).
The disadvantage of this template matching method is that it may take too much time
to achieve real-time tracking; it takes six minutes to process a whole 256×256 image
(Kervrann and Heitz, 1994). Among the methods above, the one introduced by Jain et al. (1996)
seems fast enough for consideration. Later Jain and Zhong (2000) combined color
and texture as well as shape for object localization in still images.
7 Rigid transformation includes translation, scaling, and rotation, where the shape does not change.
3.1. Basic Theory
3.1.1. Bayes Theorem

Before explaining the basic theory we briefly give the definition of Bayes theorem,
which we use for both template matching and condensation.

Bayes theorem is defined as

p(A|B) = p(B|A) p(A) / p(B)    eq. 1

p(A|B) is the posterior probability distribution, called a posteriori or the posterior.
p(B|A) is called the likelihood and is the generative model including noise, whereas
p(A) provides the prior probability of the model and is therefore called a priori or
the prior. p(B) is a constant in the model and works as a normalization term. We can
obtain the posterior probability based on the prior available knowledge.
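As a small numeric illustration of eq. 1, the following sketch uses hypothetical probabilities (the numbers are invented for illustration only):

```python
# Tiny numeric check of Bayes theorem (eq. 1): posterior = likelihood * prior / evidence.
# Hypothetical events: A = "pixel belongs to the billboard", B = "pixel is bright".
p_A = 0.3                    # prior p(A)
p_B_given_A = 0.8            # likelihood p(B|A)
p_B_given_notA = 0.2
# Evidence p(B), the normalization term, by total probability:
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

p_A_given_B = p_B_given_A * p_A / p_B   # posterior p(A|B)
print(f"p(A|B) = {p_A_given_B:.3f}")    # ≈ 0.632
```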
3.1.2. Bayesian Formulation of the Deformation
The deformable template matching method combines both structural knowledge
and local image features, which makes it versatile in incorporating object
variations. We try to implement the deformable template matching method as
described by Jain et al. (1996, 2000) to locate the target in one still frame at first.
Later we would like to extend it to the image sequence as described by Jain et al. (2000).
Our template deformation model has the following structures.
1) A binary contour map of the prototype template describing a real shape (gray
levels 0 and 255)
2) A set of parametric deformation transformations
3) A probabilistic model of deformation which weights the possible deformation
levels.
The templates consist of a set of points on the object contour, which can be either
closed or open. This contour is represented with white pixels (gray level of 255)
surrounded by dark pixels (gray level of 0) elsewhere.
Deformations are defined by classes, such as locations, color, and texture. The points
are moved by the function (x, y) → (x, y) + (χ_x(x, y), χ_y(x, y)), where χ_x and χ_y
are displacement functions. More details about this projection are described
later. The deformed template is then represented as follows, using the rotation angle θ,
scaling factor s, translation d = (d_x, d_y), and local deformation parameter ξ on a
different orthogonal basis:

ℑ_{ξ,θ,s,d}(x, y) = ℑ_0( s · R_θ[(x, y) + χ^ξ(x, y)] + (d_x, d_y) )    eq. 2

where ℑ_0 denotes the prototype template and R_θ is the rotation by an angle θ.
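The transformation of eq. 2 can be sketched directly on a set of template points. This is a minimal numpy sketch of our reconstruction of eq. 2 (displacement, then rotation and scaling, then translation); the corner points in the demonstration are hypothetical.

```python
import numpy as np

def deform_points(pts, chi, theta, s, d):
    """Apply eq. 2 to template points: displace by chi, rotate by theta,
    scale by s, then translate by d. pts and chi are (N, 2) arrays; d is (2,)."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    # Row vectors: (pts + chi) @ R.T rotates each point by R.
    return s * (pts + chi) @ R.T + d

# Hypothetical template: four corner points of a unit-square billboard contour.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
chi = np.zeros_like(pts)  # no local deformation in this sketch
out = deform_points(pts, chi, theta=np.pi / 2, s=2.0, d=np.array([5.0, 0.0]))
# e.g. (1, 0) rotates to (0, 1), scales to (0, 2), translates to (5, 2)
```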
The posterior probability density of the deformed template given the input image
(p(DT|I)) is obtained by combining the prior probability density of the deformed
template (p(DT)) and the likelihood of the input image given the deformed template
(p(I|DT)) using the Bayes theorem (equation 1):

p(DT|I) = p(I|DT) p(DT) / p(I)
This probability problem can be replaced by an energy function problem, so that the
template with less transformation has less total energy, the sum of the internal and
external energy. The internal energy is model driven and measures how much the
template is deformed from its original shape. The external energy is data driven and
measures the goodness of fit between the template ℑ and the input image I.
3.2. Deformation Models
First, a prototype template should be defined. The prototype template is the prior
shape information of the object of interest. It contains the edge or boundary
information. In our case, the templates are the binary edge maps of given pictures of
the billboards.
We assume that the template edge map is drawn on a unit square S = [0, 1]². The
displacement functions χ_x(x, y) and χ_y(x, y) are continuous and satisfy the following
boundary conditions: χ_x(0, y) ≡ χ_x(1, y) ≡ χ_y(x, 0) ≡ χ_y(x, 1) ≡ 0. First the binary
edge map space is spanned by the following orthogonal bases defined by Amit et al.
(1991), so that we can measure the deformation levels relative to these axes (Jain et
al., 1996):

e_m^x(x, y) = (2 sin(πmx) cos(πmy), 0)
e_m^y(x, y) = (0, 2 cos(πmx) sin(πmy))    eq. 3
where m = 1, 2, … controls the locality and smoothness. As m increases, these
basis functions vary from global and smooth to local and coarse. Using the
deformation parameters ξ = {(ξ_m^x, ξ_m^y), m = 1, 2, …}, which project the displacement
functions on the orthogonal basis, the deformation in the finite range for m is defined
as

χ^ξ(x, y) = (χ_x^ξ(x, y), χ_y^ξ(x, y)) = Σ_{m=1}^{M} (ξ_m^x · e_m^x + ξ_m^y · e_m^y) / λ_m    eq. 4

where λ_m = 2απ²m², m = 1, 2, …, are the normalizing constants. In this way
complex deformations are represented as a function of m and ξ_m. The smaller ξ_m is,
the smaller the deformation is; therefore the deformed template is closer to the
original shape. Naturally the prototype template is the most likely prior shape, and a
template closer to the original shape is more likely generated than largely displaced
ones, which at the same time have larger internal energy.
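Equations 3 and 4 can be sketched as follows; this is a minimal numpy version of the reconstructed formulas, assuming the normalizing constants λ_m = 2απ²m² and the division by λ_m read off above.

```python
import numpy as np

def basis(m, x, y):
    """Orthogonal deformation bases of eq. 3 (Amit et al., 1991)."""
    ex = np.array([2 * np.sin(np.pi * m * x) * np.cos(np.pi * m * y), 0.0])
    ey = np.array([0.0, 2 * np.cos(np.pi * m * x) * np.sin(np.pi * m * y)])
    return ex, ey

def displacement(x, y, xi_x, xi_y, alpha=1.0):
    """Displacement field of eq. 4: weighted bases summed over m = 1..M,
    each term divided by the normalizing constant lambda_m = 2*alpha*pi^2*m^2."""
    d = np.zeros(2)
    for m, (xx, xy) in enumerate(zip(xi_x, xi_y), start=1):
        lam = 2 * alpha * np.pi ** 2 * m ** 2
        ex, ey = basis(m, x, y)
        d += (xx * ex + xy * ey) / lam
    return d

d = displacement(0.25, 0.25, xi_x=[1.0], xi_y=[0.0])
# d[0] = 1/(2*pi^2); the x-displacement vanishes on the square's x-boundaries,
# as required by the boundary conditions above.
```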
Assuming that ξ_m^x and ξ_m^y are independent, that the ξ_m are independent of each
other, and that the deformation of the template ℑ is independent identically
distributed zero-mean Gaussian with variance σ², the internal energy is given as:

ε(ℑ^ξ) = Σ_{m=1}^{M} ((ξ_m^x)² + (ξ_m^y)²) / (2σ²)    eq. 5
A smaller variance σ² implies a more rigid template. The smaller the deformation
parameters ξ_m are, the smaller the internal energy is. So a template which is close to
the original shape is favoured.
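A minimal sketch of the internal energy of eq. 5:

```python
import numpy as np

def internal_energy(xi_x, xi_y, sigma=1.0):
    """Internal energy of eq. 5: sum of squared deformation parameters over
    2*sigma^2. Small deformations (templates close to the prototype) get
    small energy; a small sigma makes the template more rigid."""
    xi_x = np.asarray(xi_x)
    xi_y = np.asarray(xi_y)
    return float(np.sum(xi_x ** 2 + xi_y ** 2) / (2 * sigma ** 2))

print(internal_energy([0.0, 0.0], [0.0, 0.0]))   # undeformed prototype -> 0.0
print(internal_energy([1.0], [1.0], sigma=1.0))  # (1 + 1) / 2 -> 1.0
```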
The external energy measures how the template contours fit with the edges in the
input edge map image data I. The better they fit, the smaller the energy is. The
important variables for the external energy are the distance between the contour and
its nearest edges and the difference in their edge directions. The gradient of edges
based on the Canny method is used as the direction. First the template is placed in the
edge potential field, which is defined by the positions and the directions of the input
image as follows:

Φ(x, y) = −exp{−ρ D(x, y)}    eq. 6

where D(x, y) = (δx² + δy²)^{1/2} is the distance to the nearest edge point in I and ρ is a
smoothing factor which controls the degree of smoothness of the potential field. The
larger ρ is, the smaller the change in the absolute value of Φ(x, y) is; in this way a
larger ρ smoothes the potential field more. Fitness of the edge is measured in
combination with the distance between the two edges and the difference in their
directions. Combining edge directions and distances can decrease the risk of a false
match. The distance enters through Φ(x, y), and the difference in edge direction is
measured by the cosine of β(x, y), the angle between the edge direction at (x, y) on the
template and the direction of its nearest edge. Let Θ represent all rigid
transformations, translation d, scaling s, and rotation θ, that is Θ = {d, s, θ}; then the
external energy is defined as:

ε(ℑ^ξ, Θ, I) = (1/n_T) Σ_{(x,y)} (1 + Φ(x, y) cos(β(x, y)))
             = (1/n_T) Σ_{(x,y)} (1 − exp{−ρ D(x, y)} cos(β(x, y)))    eq. 7

n_T is the number of pixels on the template. In order to make the terms in the sum
positive, the constant 1 is added.
We would like to minimize the total energy function given by combining equation 5
and equation 7:

E(ℑ^ξ, Θ, I) = (1/n_T) Σ_{(x,y)} (1 − exp{−ρ D(x, y)} cos(β(x, y))) + Σ_{m=1}^{M} ((ξ_m^x)² + (ξ_m^y)²) / (2σ²)    eq. 8

This equation should be minimized with respect to the deformation parameter ξ and
the rigid transformation parameter Θ.
There are several local minima. Jain et al. (1996) employ a multiresolution
approach, which can quickly find good solutions. Starting the search
from the coarsest stage with a large value of ρ in equation 6, insignificant dips are
smoothed away. There are then fewer spurious local minima, which makes it easier to
roughly locate the global minimum and helps the template quickly move there. This
process is repeated at a finer stage in order to move the template even closer to the
global minimum. By narrowing the search area the exact location is detected. This
multiresolution approach reduces the number of iteration steps and deformation parameters
because at every step we do not need a thorough search to detect the exact location.
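The coarse-to-fine idea can be sketched as a schedule over ρ; `energy_at` is a hypothetical callable standing in for evaluating eq. 8 at one candidate position, and the ρ values are invented for illustration.

```python
def coarse_to_fine(positions, energy_at, rhos=(1.0, 0.3, 0.1), keep=10):
    """Multiresolution sketch: rank candidates at the coarsest stage first,
    keep only the best ones, then re-rank the survivors at finer stages.
    `energy_at(pos, rho)` is a hypothetical stand-in for eq. 8."""
    candidates = list(positions)
    for rho in rhos:  # coarse -> fine schedule over the smoothing factor
        candidates = sorted(candidates, key=lambda p: energy_at(p, rho))[:keep]
    return candidates[0]

# Toy energy with a single minimum at position 3 (ignores rho):
best = coarse_to_fine(range(10), lambda p, rho: abs(p - 3))
print(best)  # -> 3
```

The narrowing `keep` step is what saves work: only a handful of candidates survive into the more expensive fine stages.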
3.3. Algorithm

First we try with the simplest conditions in order to check whether the method is
robust or not and whether processing time is within an acceptable range. Some
assumptions are made in order to make the primary experiment as simple as possible.
First of all, our template is composed of some points manually selected on the
billboard edges. Even though these points are selected on a curve, they are
treated as individual points and the relationships between them are not utilized.
Secondly, since the billboard shapes do not deform much in the video sequence, the
main changes can be regarded as translation, rotation, and scaling, and we
assume that the change in the internal energy term can be ignored. We consider only
the external energy part in equation 8. Therefore we do not control the locality and
smoothness, because the controlling parameter m is only involved in the internal
energy.
Thirdly, we limit the translation field to the edges in the input image, because
applying template matching to the entire image takes time, at least five seconds (Jain
et al., 1996). Jain and Zhong (2000) use color and texture information in order to
limit the search area by region-based screening. First we use only edge features: we
place one of the template points on the input image pixels which are considered
edges by the Canny method.
More general cases will be examined if the simplest algorithm works well.
Preprocessing - Define Prototype Template - on the billboard image

(i) Create an edge map from a given billboard image using the Canny edge
method with a threshold value of 0.4, which by visual inspection seems to
nicely retrieve the contours of the logos. The edge map stores the
normalized directional vector (v_x, v_y) at every pixel which is considered
an edge.
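A minimal sketch of the directional part of step (i). This is not a full Canny detector (no smoothing, non-maximum suppression, or hysteresis); it only computes and normalizes the finite-difference gradient that would supply the stored (v_x, v_y) vectors.

```python
import numpy as np

def gradient_directions(gray):
    """Normalized gradient direction (v_x, v_y) at every pixel, a sketch of
    the per-edge-pixel vectors stored in step (i). np.gradient returns the
    derivative along axis 0 (rows, i.e. y) first, then axis 1 (columns, x)."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    mag[mag == 0] = 1.0  # avoid division by zero on flat regions
    return gx / mag, gy / mag

# Toy image with a vertical intensity step: gradient points along +x there.
img = np.zeros((5, 5))
img[:, 3:] = 255.0
vx, vy = gradient_directions(img)
```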
(ii) On the edge map, manually select some points on the contour or edge
which can represent the billboard, as the prototype template. Ideally we should
take into account the fit of all pixels on the template; however, we use
only some points in order to save processing time, based on the assumption
that the quality of the search is still maintained. Figure 8 shows an example. The
12 manually selected red * points represent the billboard with the logo
‘tøj.’ Remember that our template is represented only by points;
relationships between them, such as the curves where they lie, are ignored.
Figure 8 Template representing ‘tøj’
Deformable Template Matching - on the frame image

(iii) Create a Canny edge map of the selected pixels on the frame as done in (i) and
store the normalized directional vectors of each of the selected pixels. Create a
‘distance map’ of the selected pixels as follows. Each pixel should store the
shortest distance from the edge points. In order to minimize the calculation,
calculate distances only along their normal.
[Figure content: a pixel grid in which most cells hold the default ‘distance’ value 250, while cells near the template points hold the actual distances 1, 2, or 3.]
Figure 9 Nearest edge searching mechanism
At first a very large number, such as 250, is assigned to each pixel as its ‘distance.’ If a pixel is within a certain distance along the normal from a selected point on the template curve, the ‘distance’ assigned to this pixel is overwritten by the actual distance.
Set all pixels to a large default value, such as 250. We only take into
account pixels within a certain distance of the edge points in order to reduce the
computational load. Replace the default values along the normals whenever the real
distances are smaller than the default.
(iv) Place the template so that the first point fits on each edge pixel on the frame
image. This process amounts to translating the template to every possible
location. Calculate the external energy using equation 7 and find the template
which gives the minimum energy. The rigid transformation range is defined as
rotation range: −π/6 (−30 degrees) to π/6 (30 degrees) with an interval of
π/12 (15 degrees), and scaling range: 1/2 to 2 with an interval of 1/8, as a trial.
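Step (iv) can be sketched as an exhaustive translation search (the rotation and scaling loops are omitted for brevity); `external_energy` is a hypothetical callable scoring one placed template via eq. 7, and the toy scorer below simply sums distances to a target point set.

```python
import numpy as np

def best_translation(template_offsets, edge_pixels, external_energy):
    """Step (iv) sketch: anchor the template's first point on every edge pixel
    of the frame, evaluate the external energy there, and keep the placement
    with the minimum energy."""
    best_pos, best_e = None, np.inf
    for p in edge_pixels:
        placed = template_offsets + p   # translate the whole template
        e = external_energy(placed)
        if e < best_e:
            best_pos, best_e = p, e
    return best_pos, best_e

# Hypothetical demo: a two-point template and a toy scorer.
offsets = np.array([[0, 0], [1, 0]])
edges = np.array([[0, 0], [5, 5]])
targets = np.array([[5, 5], [6, 5]])
score = lambda pts: float(np.abs(pts - targets).sum())
pos, e = best_translation(offsets, edges, score)  # -> pos = [5, 5], e = 0.0
```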
IV. Conditional Density Propagation (Condensation)

The deformable template matching method may be too slow for real-time tracking. In
this chapter, we discuss the Condensation (Conditional Density Propagation)
algorithm for feature tracking. It is more likely than deformable template matching
to achieve real-time video tracking for untrained data.
Condensation, introduced by Blake and Isard (1995, 1998), is based on the idea
that it is possible to track an object in subsequent images if motion and shape are
modelled (Isard and Blake, 1996). It is a particle filtering algorithm which represents
a tracked object’s state using an entire probability distribution. A probability density
function describing the likely state of the objects is propagated over time using a
dynamic model. The measurements influence the probability function and allow the
incorporation of new objects into the tracking scheme.
In general there are two cases for the probability distribution of states and
measurements: 1) both dynamics and measurement are linear, or 2) they are nonlinear. Even a
slight non-linearity due to objects’ overlap results in multiple peaks in the probability
distribution of a tracked object’s state, which makes tracking very difficult. Laptev and
Lindeberg (2003) argue that a feature tracking approach may fail when objects
overlap or split because the assumed constant appearance of an object changes.
However, Forsyth and Ponce (2003) suggest that there is an algorithm that often
works for states in a low-dimensional space.
Unlike Kalman filtering8, the Condensation tracker can have more than one local
maximum, thus making it multimodal. It can track an object correctly in a non-linear
dynamic model without losing its way, even after hitting a local maximum.
Moreover, this framework is not vulnerable to background clutter.
A billboard in the given still images is used as a template. Our template is represented
as a curve since we use not only straight boundary lines but also letters on the
billboard.

8 Kalman filtering assumes that the likelihood of the image data given the sample configuration is Gaussian distributed. Since a Gaussian is unimodal, meaning only one local maximum is allowed, the tracker cannot represent simultaneous alternatives in clutter, and once the tracker is distracted, it never recovers.

A spline is flexible and often used to produce a smooth curve given a set of
control points (Baker, 2004). A set of weights is distributed along the strip to keep
the spline smooth. Among various splines, B-spline curves, with ‘B’ standing for
‘basis’, are suitable for our purpose because they are more flexible than other splines
in locally adjusting the curvature; they can represent sharp bends and even corners as
well (Buss, 2003).
We will discuss more details of the condensation algorithm and the representation
of the template in the following sections: Sections 1 and 2 cover the basic theory
of the condensation algorithm; Section 3 discusses B-spline curves and describes our
algorithm for creating templates; Section 4 gives details and definitions for our
condensation tracker.
4.1. Basic Theory
4.1.1. Modelling Shape and Motion

First a model is made in order to track the object in an image sequence. This model
should characterise the object features and be resistant to motion over time.
Theoretically, B-spline curves could be parameterised by their control points and
blending functions. Control points move over time, but blending functions stay the
same. At every frame new curves are placed based on the new positions of the control
points.
Secondly, given the target object represented as a curve, the tracking problem is to
estimate the motion of this curve. The modelling of motion is to specify the likely
dynamics of the curve over time relative to the template using the probability
densities. Some reasonable functions can be chosen for these densities.
4.1.2. Discrete Time Propagation of State Density

Tracking techniques exploit the previous history of the image features’ motion to
predict the positions of these features in the next frame (Trucco and Verri, 1998). For
digital analysis, the propagation process is defined at discrete time t. The following
terms are used in the condensation algorithm:

x_t: the state of the object at time t
z_t: the set of image features at time t

Then the history of the state at time t is (x_1, x_2, ..., x_t), and the history of the
observation data at time t is (z_1, z_2, ..., z_t).

Theoretically an observation density that characterises the statistical variability of z
given x can be estimated for x_t given z_t at any time t (Blake and Isard, 1996). The
next state is estimated from the current measurement, the current state, and the motion model.
4.1.3. Temporal Propagation of Conditional Densities

The probability density propagation is well described by a so-called ‘Fokker-Planck’
equation 9, given by Isard and Blake (1996), where the density for x_t drifts
deterministically and diffuses stochastically. The condensation algorithm is aimed to
address this situation in general. The deterministic components drift the entire body of
the probability density together as shown in Figure 10. Then it is diffused by the
random component of the stochastic part, which results in spreading or increasing
uncertainty and changes the probability density distribution.
9 The Fokker-Planck equation describes the probability density of the position and velocity of particles over time. It was first formulated to describe stochastically the motion of particles in a fluid.
Figure 10 Probability density propagation over a discrete time-step
There are three phases: drift due to the deterministic component of object dynamics; diffusion due to the random component; reactive reinforcement due to observations (Blake and Isard, 1998).
4.1.4. Dynamic Models
We make the general assumption that the probabilistic framework of our dynamic
model is based on a Markov chain, a succession of elements each of which is
generated from the preceding elements, where the future depends only on the present
regardless of the earlier past. The state is then assumed to depend only on the
previous state, independent of its earlier history:

p(x_t | x_1, x_2, ..., x_{t-1}) = p(x_t | x_{t-1})    eq. 9

Using a second-order stochastic differential equation in discrete time, the dynamics are
entirely determined by the conditional density p(x_t | x_{t-1}). We will discuss additional
details of the model later.
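For a linear model of the form used later in the prediction step (x_t = A x_{t-1} + B w_t), sampling from p(x_t | x_{t-1}) is a one-liner; the constant-velocity matrices in the demonstration are hypothetical.

```python
import numpy as np

def sample_next_state(x_prev, A, B, rng):
    """Draw x_t ~ p(x_t | x_{t-1}) for a linear stochastic model
    x_t = A x_{t-1} + B w_t, with w_t standard Gaussian white noise."""
    w = rng.standard_normal(x_prev.shape[0])
    return A @ x_prev + B @ w

rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0],    # hypothetical constant-velocity model:
              [0.0, 1.0]])   # state = (position, velocity)
B = np.zeros((2, 2))         # process noise switched off for this check
x1 = sample_next_state(np.array([0.0, 2.0]), A, B, rng)
# position advances by the velocity: x1 = (2.0, 2.0)
```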
4.1.5. Measurement

The observations z_t are assumed to be independent of both their earlier history and the
previous states. Figure 11 depicts this relationship between x and z.
x_1 ↔ x_2 ↔ ... ↔ x_{t-1} ↔ x_t
z_1   z_2   ...   z_{t-1}   z_t

Figure 11 Each measurement z is dependent only on its current state x; ↔ shows the dependency
Using this relationship, that z_i is dependent only on x_i, the probability of the
measurements given the states is expressed as follows:

p(z_1, ..., z_{t-1}, x_t | x_1, ..., x_{t-1}) = p(x_t | x_1, ..., x_{t-1}) p(z_1, ..., z_{t-1} | x_1, ..., x_{t-1})
= p(x_t | x_1, ..., x_{t-1}) ∏_{i=1}^{t-1} p(z_i | x_i)    eq. 10

Therefore the observation process is defined only by the conditional density
p(z_t | x_t) at each time t. We assume that p(z_t | x_t) does not depend on time, i.e. it
is stationary. This assumption then leads to:

p(z_t | x_t) = p(z | x)    eq. 11
The term on the right is called the observation model. Details about our model are
discussed later.
4.1.6. Propagation

The conditional state density p(x_t | z_1, z_2, ..., z_t) gives all necessary information
about the state x_t at time t as determined by the entire history of the data. Based on
Bayes theorem, the propagation of the state density over time is defined as

p(x_t | z_1, ..., z_t) = k_t p(z_t | x_t) p(x_t | z_1, ..., z_{t-1})    eq. 12

where k_t is a normalization constant which does not depend on x_t. The first term
p(z_t | x_t) is called the likelihood and the latter term p(x_t | z_1, ..., z_{t-1}) is known as
the effective prior; it is actually a prediction taken from the posterior at t−1. It can be
written as follows:

p(x_t | z_1, ..., z_{t-1}) = ∫ p(x_t | x_{t-1}) p(x_{t-1} | z_1, ..., z_{t-1}) dx_{t-1}    eq. 13

The prior p(x_{t-1} | z_1, ..., z_{t-1}) at t−1 is the posterior distribution of x_{t-1}.
Here Bayes theorem is used in a sequence; the prior is based on the information up to
t-1, and the information at time t is added to update it as a posterior. On the other
hand, in template matching the same Bayes theorem is used in the same frame to
deform a template in a proper way.
4.1.7. Factored Sampling

The factored sampling algorithm can deal with non-Gaussian observations in an
image sequence. Weighting samples based on the current sample set creates an
artificial conditional situation for predicting the future state x_t by p(x_t | z_1, ..., z_{t-1}).
As described above, the posterior density p(x | z) offers all the necessary information
on x. This is obtained by applying Bayes theorem, equation 1:

p(x | z) = k p(z | x) p(x)    eq. 14

where k is a normalization constant which does not depend on x. Since the
posterior is not always computable, sampling is done in two steps. First a sample set
{s_1, s_2, ..., s_N} with N samples is generated for all possible x values based on the prior
density p(x). Then each sample s_n is chosen with probability

π_n = p(z | s_n) / Σ_{j=1}^{N} p(z | s_j)    eq. 15
Due to this weight, the samples which are more likely to happen have a higher
probability of being chosen. Extending this idea with time evolution, the probability
or weight at any time t is given as

π_n = p(z_t | x_t = s_{n,t}) = p(z_t | s_{n,t}) / Σ_{j=1}^{N} p(z_t | s_{j,t})    eq. 16

When new samples have been selected at every time t+1, a new set of weights is
calculated based on the new sample set.
4.2. The Condensation Algorithm

The condensation algorithm is an iterative process of sampling, predicting the state of
the next time-step, and measuring the position of the object through an image
sequence. The factored sampling approach is used in the sampling phase. At each
iteration step t, a sample set {s_{n,t}, n = 1, ..., N} is updated with corresponding weights
{π_n}, which approximates the conditional state density p(x_t | z_1, ..., z_t). We
introduce another parameter, a cumulative probability {c_n}, in order to reduce the
computational load of factored sampling. The sum of the weights is normalized so that
c_N = 1. We generate a random number between 0 and 1; if this number is between
c_{i-1,t} and c_{i,t}, then s_{i,t} is selected as a sample. In this way N random number
selections determine the N samples at the next step t+1. c_n is defined as

c_{0,t} = 0
c_{n,t} = c_{n-1,t} + π_{n,t}    eq. 17
We construct a new sample set {s_{n,t}, π_{n,t}, c_{n,t}} at time step t from the old sample
set {s_{n,t-1}, π_{n,t-1}, c_{n,t-1}} at t−1 as follows:

1. Select a sample s'_{n,t}:
1) Generate a random number r ∈ [0, 1], uniformly distributed.
2) Find the smallest j for which c_{j,t-1} ≥ r.10
3) Set s'_{n,t} = s_{j,t-1}

2. Predict s_{n,t} for each n = 1, ..., N by sampling from

p(x_t | x_{t-1} = s'_{n,t})    eq. 18

In this project, the dynamics are described by a linear stochastic differential
equation as explained later. Therefore the new sample value can be generated
as s_{n,t} = A s'_{n,t} + B w_{n,t}, where w_{n,t} is a standard normal random number
representing Gaussian white noise11 and BB^T is the process noise covariance.

3. Measure and weight the new position in terms of the measured feature z_t:

π_{n,t} = p(z_t | x_t = s_{n,t})    eq. 19

Then normalize so that Σ_n π_{n,t} = 1. The cumulative probability is updated as

c_{0,t} = 0
c_{n,t} = c_{n-1,t} + π_{n,t}    (n = 1, ..., N)

Update the new sample set {s_{n,t}, π_{n,t}, c_{n,t}}.

Figure 12 Condensation algorithm (taken from Blake and Isard, 1996)
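The three steps above can be sketched compactly for a linear dynamic model; `measure` is a hypothetical stand-in for the observation density p(z_t | x_t).

```python
import numpy as np

def condensation_step(samples, weights, A, B, measure, rng):
    """One iteration of the algorithm in Figure 12:
    1. select N old samples with probability proportional to their weights,
    2. predict each as s = A s' + B w (Gaussian white noise w),
    3. re-weight by the observation density `measure` and renormalize."""
    pi = np.asarray(weights, dtype=float)
    c = np.cumsum(pi / pi.sum())
    idx = np.searchsorted(c, rng.random(len(samples)))  # step 1: select
    new = []
    for i in idx:                                       # step 2: predict
        w = rng.standard_normal(B.shape[1])
        new.append(A @ samples[i] + B @ w)
    pi_new = np.array([measure(s) for s in new])        # step 3: measure
    return new, pi_new / pi_new.sum()

# Hypothetical 1D demo: two samples, no process noise, observation near 0.
rng = np.random.default_rng(0)
samples = [np.array([0.0]), np.array([10.0])]
A, B = np.eye(1), np.zeros((1, 1))
measure = lambda s: np.exp(-abs(s[0]))  # stand-in for p(z_t | x_t)
new, w = condensation_step(samples, [0.5, 0.5], A, B, measure, rng)
```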
10 A fast way is a binary subdivision search, which checks whether the given number fits in the lower half of the given non-descending set: if the element in the middle is larger than the given number, the number fits in the lower half. Then divide the chosen half into two and repeat the same check until we find where the number fits in the given sequence. This binary search is very quick because the number of iteration steps grows only logarithmically (Sestoft, 1998). 11 Gaussian white noise is a zero-mean random noise whose values are independent and identically distributed.
4.3. Template Representation

There are several different kinds of splines for representing a smooth curve, such as
the Hermite spline, the Bézier spline and the B-spline.
Hermite spline is described by a cubic polynomial and interpolates two end control
points and has specified first derivatives at the end points (Baker, 2004). Hermite
curve segments are connected so that the neighboring segments share control points.
Bézier spline approximates any arbitrary number of given control points. The degree
of the curve is determined by the number of control points. Some very useful
properties of the Bézier curve are that the curve connects or passes through the two
end control points and that the first derivatives at the end points can be given as a
multiple of the vector from the end point to the next control point. The B-spline also
approximates a set of control points, but is more general than the Bézier spline. Its two
advantages over Bézier splines are: 1) the degree of a B-spline can be set
independently from the number of control points, and 2) the shape of B-splines can be
controlled locally, whereas a change of a control point affects the whole Bézier
curve. The Bézier spline is a special case of the B-spline.
We use B-spline curves for our template because we can determine the degree of the
curve no matter how many control points we need. For example it is possible to use a
first-order linear spline to represent a line and a cubic polynomial to represent a letter.
Furthermore B-splines can represent a sharp bend.
4.3.1. B-spline Curves
4.3.1.1. Definition of B-spline Curves

B-splines approximate a set of control points, which indicate the general shape of the
curve. The contribution of each control point is weighted and combined by blending
functions.
Using u as a parameter, blending functions B(u) = [B_1(u), B_2(u), ..., B_n(u)], and X =
[x_1, x_2, ..., x_n]^T 12 and Y = [y_1, y_2, ..., y_n]^T, which hold the coordinates of the
control points (x_i, y_i), a parametric representation of a curve is given by

r(u) = (x(u), y(u)),  u_start ≤ u ≤ u_end    eq. 20

where x(u) = B(u)X and y(u) = B(u)Y.

The parameter u traces the curve from one end to the other, where u_start < u_end are the
values of u at the two end points. The range of u can be defined by any real numbers;
however, starting from zero is simple and easy for computational purposes. u_end can be
any positive number. Sometimes it is convenient if u_end = 1, so that the range is normalized.
A curve is subdivided into n sections and each subinterval endpoint is called a knot
(Baker, 2004). Each blending function is defined over d intervals of the total range of
u = [u_start, u_end]. The entire set of knots composes a knot vector, which should be a
nondecreasing sequence; that is, knots can be any values as long as u_j ≤ u_{j+1}.
Blending functions for B-spline curves are defined by the Cox-de Boor recursion
formulas (de Boor, 1986).
B_{k,1}(u) = 1 if u_k ≤ u < u_{k+1}, 0 otherwise    eq. 21

B_{k,d}(u) = w_{k,d}(u) B_{k,d-1}(u) + (1 − w_{k+1,d}(u)) B_{k+1,d-1}(u)

with

w_{k,d}(u) = (u − u_k) / (u_{k+d-1} − u_k) if u_{k+d-1} ≠ u_k, 0 otherwise.

One minor exception is that the last point u_{k+1} = u_end is included in the last nonzero function B_{k,1}. That is, if u_k < u_{k+1} = u_end, then B_{k,1} = 1 for u_k ≤ u ≤ u_{k+1}.
12 T represents transpose here. X and Y should be column vectors for calculation purposes.
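The Cox-de Boor recursion of eq. 21 and the curve of eq. 20 can be sketched directly (1-based indices k as in the text; the end-point exception of eq. 21 is handled explicitly). The quadratic demonstration reproduces a Bézier curve, the special case of B-splines.

```python
import numpy as np

def cox_de_boor(k, d, u, knots):
    """Blending function B_{k,d}(u) of eq. 21 (1-based k; knots[j-1] = u_j).
    The end-point exception is handled by treating u == u_end as belonging
    to the last nonzero interval."""
    uk, uk1 = knots[k - 1], knots[k]
    if d == 1:
        if uk <= u < uk1 or (u == knots[-1] and uk < uk1 == knots[-1]):
            return 1.0
        return 0.0
    def w(k, d, u):
        den = knots[k + d - 2] - knots[k - 1]   # u_{k+d-1} - u_k
        return (u - knots[k - 1]) / den if den != 0 else 0.0
    return (w(k, d, u) * cox_de_boor(k, d - 1, u, knots)
            + (1 - w(k + 1, d, u)) * cox_de_boor(k + 1, d - 1, u, knots))

def bspline_point(u, ctrl, d, knots):
    """r(u) of eq. 20: control points weighted by their blending functions."""
    n = len(ctrl)
    return sum(cox_de_boor(k, d, u, knots) * np.asarray(ctrl[k - 1])
               for k in range(1, n + 1))

# Open uniform knots with d = 3, n = 3: a quadratic Bezier curve.
knots = [0, 0, 0, 1, 1, 1]
ctrl = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
mid = bspline_point(0.5, ctrl, 3, knots)  # -> [1.0, 1.0]
```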
Each blending function B_{k,d}(u) corresponds to a control point; therefore there are as
many blending functions as control points. Each control point acts as a weight on the
corresponding blending function in the weighted sum r(u). The B-spline
blending functions are polynomials of order d−1, where d is the degree parameter
(Baker, 2004). The degree parameter can be any integer value from 2 up to the number of
control points, n.
B-spline curves possess the following properties (Baker, 2004):

• The polynomial curve has degree d−1 and C^{d-2} continuity13 over the range of u.
• The curve is described with n control points and as many blending functions.
• Each blending function B_{k,d}(u) is defined over d subintervals between [u_k, u_{k+d}).
• A knot vector is composed of d+n knot values.
• Each subinterval of the B-spline curve is influenced by only d control points and accordingly d blending functions.
• Any one control point can affect the shape of at most d subintervals.
• A B-spline curve is defined only in the interval [u_d, u_{n+1})14, since each strip of the curve is defined by d blending functions, but some of them are not defined beyond this interval.

This is illustrated in Figure 13 using an example with d = 4 and n = 6. The knot vector is
composed of 10 knots. A blending function, for example B_{3,4}, is defined between [u_3, u_7),
and a B-spline curve is defined between [u_4, u_7).
13 dth-order parametric continuity, C^d continuity, means that all of the first, second, ..., dth parametric derivatives exist and are continuous at u (Baker, 2004). 14 Much of the literature (Buss, 2003, Baker, 2004, de Boor, 1986) states the number of control points as n+1, starting from CP_0 and ending with CP_n. For our convenience in programming, the control points are numbered from CP_1 to CP_n, in total n control points. We stick to this notation in our thesis as well as in our programming.
[Figure content: the blending functions B_{1,4}, ..., B_{6,4} plotted over the knots u_1, ..., u_{10}.]

Figure 13 The ranges over which blending functions are defined
An example with d = 4 and n = 6. Bold numbers on the curve give the number of blending functions defined on the corresponding interval. A B-spline is defined only where 4 blending functions are defined, which in this example means between [u_4, u_7).
By choosing B_{k,d}(u) using equation 21, the constraint of a partition of unity is kept:

Σ_{k=1}^{n} B_{k,d}(u) = 1    eq. 22

Since every B_{k,d}(u) is nonnegative, a B-spline curve lies within the convex hull of at
most d+1 control points (Baker, 2004).
B-splines are represented by control points and blending functions. Control points influence the shape of the curve; however, they usually do not lie on the curve itself. If control points do lie on the curve, it is because of repeated knots, as with the end knots of open splines. Each time a knot is repeated, the corresponding point on the curve loses one degree of continuity; so if three knots have the same value, i.e. the knot is repeated twice, continuity decreases by two degrees. Since the curve loses continuity of its derivatives at the repeated knots, it loses its good smoothness (Buss, 2003).
4.3.1.2. Classification of B-spline Curves
B-splines are generally classified by knot vector type; there are three types, namely uniform, open uniform and nonuniform (Baker, 2004).
1. Uniform B-spline curves
The interval between knot values is constant, such as {-1.0, -0.5, 0.0, 0.5, 1.0, 1.5}. It
is more convenient when a knot vector is normalized such as {0.0, 0.2, 0.4, 0.6, 0.8,
1.0} or when a knot vector starts with zero and the interval between two neighbouring
knots is one, such as {0, 1, 2, 3, 4, 5}.
One of the properties of uniform B-spline curves is that their blending functions are periodic, because the denominators in equation 21 are fixed at (d−1) times the knot interval:

B_{k,d}(u) = B_{k+1,d}(u + Δu) = B_{k+2,d}(u + 2Δu)    eq. 23

Periodicity can reduce computation time.
2. Open uniform B-spline curves (Open B-spline curves)
Open uniform B-splines are the same as uniform B-splines except at both ends, where
knot values are repeated d times, such as {0, 0, 0, 0, 1, 2, 3, 3, 3, 3} when d = 4 and n
= 6, and {0, 0, 0, 0.5, 1, 1, 1} when d = 3 and n = 4. Open uniform B-spline curves
have similar properties to Bézier spline curves; for example, the curve connects the first and the last control points. In fact, Bézier spline curves are a special case of B-splines in which the knot vector contains only zeros and ones, such as {0, 0, 0, 0, 1, 1, 1, 1}.
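Such open uniform knot vectors are easy to build programmatically. The sketch below is a small illustration (not part of our implementation) that constructs an unnormalized open uniform knot vector with the end knot values repeated d times, reproducing the examples above:

```python
def open_uniform_knots(d, n):
    """Open uniform knot vector for order d and n control points.

    The end knot values are repeated d times, giving n + d knots in
    total, with the interior knots spaced one apart.
    """
    if n < d:
        raise ValueError("need at least d control points")
    interior = list(range(1, n - d + 1))  # the n - d interior knots: 1, 2, ...
    last = interior[-1] + 1 if interior else 1
    return [0] * d + interior + [last] * d
```

For example, `open_uniform_knots(4, 6)` gives {0, 0, 0, 0, 1, 2, 3, 3, 3, 3}, and `open_uniform_knots(4, 4)` gives the Bézier knot vector {0, 0, 0, 0, 1, 1, 1, 1}.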
3. Nonuniform B-spline curves
Nonuniform B-spline curves are defined by any values and intervals for knot vectors
as long as the knot sequence is not descending. Knot vectors can have multiple knot
values and unequal spacing, such as {0, 1, 3, 3, 4} and {0, 0, 0, 1, 1, 3, 3, 3}.
Nonuniform B-splines provide more flexibility in controlling a curve shape because
the range of each blending function, which is defined by the interval of two knots, can
be given in different lengths.
4.3.1.3. First-Order Derivatives of Blending Functions
Later we need the first-order derivatives of the blending functions in order to obtain the normal to the curve at arbitrary points on the curve. Given the first-order derivative (x', y'), where x' = dx(u)/du and y' = dy(u)/du, at an arbitrary point r(u), the normal is defined as (y', −x') / sqrt(x'^2 + y'^2).
From equation 20, the first derivative of a curve r(u) depends only on the first derivatives of the blending functions, since the coordinates of the control points are fixed. The first-order derivative of B_{k,d}(u) is likewise calculated using the two blending functions one order lower, B_{k,d−1}(u) and B_{k+1,d−1}(u) (de Boor, 1986):

DB_{k,d}(u) = z_{k,d} B_{k,d−1}(u) − z_{k+1,d} B_{k+1,d−1}(u)    eq. 24

where

z_{k,d} = (d−1) / (u_{k+d−1} − u_k)   if u_{k+d−1} ≠ u_k,
z_{k,d} = 0                           otherwise.

Therefore the first-order derivatives can be calculated for a given u value in just the same way as the blending functions in our algorithm.
4.3.2. Template Curves
Our template is general and is created on a billboard image by manually selecting knots. The knot vectors are therefore nonuniform, and for convenience the knot values are normalized as u_start = u_1 = 0 ≤ u_j ≤ u_{j+1} ≤ u_end = 1. The degree of our curve is no more than 3 (d ≤ 4), because cubic polynomials in general provide a good trade-off between flexibility, controllability and computational complexity (Baker, 2004).
Our template curve should be drawn on the image through some selected points on image edges. Therefore neither the positions of the control points nor the u parameters on the curve are available; instead, we know the coordinates of points that the curve should interpolate. Based on the information available, we therefore create the curve the opposite way round: we interpolate B-splines using points on the image edges as knots and then calculate the coordinates of the control points. Buss (2003) explains this method.
The u parameters are approximated by the so-called chord length parameterization method, where u_k is chosen such that

u_k − u_{k−1} = ||P_k − P_{k−1}||    eq. 25

where P_k is the coordinate of the kth interpolated point on the curve, as shown in Figure 14. Buss (2003) finds that the chord length parameterization method can provide a successfully smooth curve.
Figure 14 Example of the chord length parameterization
The dotted line represents the approximated length ||P_k − P_{k−1}|| of the real curve length between knots u_k and u_{k−1}.
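The chord length parameterization of equation 25, together with the normalization, can be sketched as follows; each knot value is the accumulated chord length divided by the total chord length:

```python
import math

def chord_length_knots(points):
    """Normalized knot values u_1..u_m for the interpolated points.

    Successive differences u_k - u_{k-1} are proportional to
    ||P_k - P_{k-1}|| (eq. 25); the values are scaled so that
    u_1 = 0 and u_m = 1.
    """
    dists = [math.dist(p, q) for p, q in zip(points[1:], points[:-1])]
    total = sum(dists)
    knots, acc = [0.0], 0.0
    for d in dists:
        acc += d
        knots.append(acc / total)
    return knots
```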
The next section describes our interpolation method based on Buss (2003). At the
tracking stage, control points are transformed according to our transformation method
described later, whereas corresponding blending functions are kept throughout the
time sequence.
4.3.2.1. Interpolating with B-splines
Our strategy for interpolating smooth B-splines is first to select knot positions, second to calculate the chord length parameters for these knots, third to evaluate the blending functions at these parameters, and fourth to obtain the control points.
The preconditions are:
1) A curve connects or passes through the first and the last control points
2) The order of the curve is at most 4 (d ≤ 4), i.e. the polynomials are at most cubic
3) Knots are manually selected on the image so that the coordinates of knots are
available
4) Spacing between knots is nonuniform
5) Multiple selection of the same point as a knot is not allowed15
6) Knot values are normalized
The two end points are used as two control points, from precondition 1). As for 2), a linear B-spline (d = 2) can be used for a straight line and a cubic polynomial B-spline (d = 4) for a smoothly bending curve. As for 6), the curve is easier to use later if the precise range of knot values is known and fixed.
The explanation below assumes the curve is of degree three (d = 4). Given points on the curve, [P_1, P_2, P_3, ..., P_m]^16, and their parameters [u_1, u_2, ..., u_{m−1}, u_m] with u_j < u_{j+1} for all j (from precondition 5), the knot vector should repeat the two end knots four times:

[u_1, u_1, u_1, u_1, u_2, ..., u_{m−1}, u_m, u_m, u_m, u_m]
= [0, 0, 0, 0, ||P_2 − P_1|| / L, (||P_2 − P_1|| + ||P_3 − P_2||) / L, ..., 1, 1, 1, 1]    eq. 26

where L = sum_{i=1}^{m−1} ||P_{i+1} − P_i|| is the total chord length.
15 Repeated knots disable the following matrix calculation because the determinant becomes zero (see equation 29). In reality it is nearly impossible to click exactly the same position twice; therefore we think this assumption does not limit the generality of the template for the condensation algorithm.
16 The number of control points n ≠ the number of selected points on the curve m when d ≠ 2. The reason is explained in Table 1.
There are m + 2×(d−1) = m + 6 knots. The number of control points must then be (m + 2×(d−1)) − d = m + 2. That is, m + 2 control points should be determined from m known points.
The following table shows, for each d = 2, 3 and 4, the relationship between the number of known conditions (the points to be interpolated), the number of control points, and how many additional conditions are required to calculate all control points.
Degree | Number of knots | Number of known points | Number of control points | Number of missing conditions
2      | m+2             | m                      | m                        | 0
3      | m+4             | m                      | m+1                      | 1
4      | m+6             | m                      | m+2                      | 2

Table 1 Numbers of known conditions and missing information given m interpolated points
Table 1 shows that from m given interpolated points we need to determine m+1 control points when d = 3, and m+2 control points when d = 4. That means one more condition is required to calculate all control points when d = 3, and two more conditions are required when d = 4. Buss (2003) therefore suggests making 'one more arbitrary assumption' to fulfil the minimum conditions needed to identify all control points: when d = 4, the first derivatives at u = 0 and u = 1 are set to zero, which is equivalent to assuming that the first two control points coincide and that the last two control points coincide as well. When d = 3, only one extra condition is required, so the additional condition is only that the first derivative at u = 0 is zero. Then we can calculate m distinct control points from m interpolated points.
4.3.2.2. B-spline Drawing Algorithm
The goal is to create a nonuniform B-spline curve on an image, using a set of manually selected points on the image as knots, and to return the coordinates of the control points as described above. The first-order derivatives are also calculated by this algorithm.
Given the interpolated points, the knot values are calculated using chord length parameterization as described in the previous section and then normalized. The end knots, 0 and 1, are repeated d times to create a knot vector as given in equation 26.
Taking d = 4 as an example, each interpolated point is defined as follows:

P(u_k) = B_{k−3,4}(u_k) CP_{k−3} + B_{k−2,4}(u_k) CP_{k−2} + B_{k−1,4}(u_k) CP_{k−1} + B_{k,4}(u_k) CP_k,
u_k ≤ u < u_{k+1}, 4 ≤ k ≤ m+2    eq. 27

where m denotes the number of selected points on the curve and CP_k is the kth control point.
Regardless of the degree, there are m distinct control points and as many interpolated points. We use only the m distinct points, and the corresponding equations, for the linear matrix calculation in order to avoid a zero determinant. With the condition that the two end points are used as control points, equation 27 can be put into a single matrix equation:
[P_1, p_2, p_3, ..., p_m]^T =
[ 1, 0, ..., 0 ;
  0, B_{2,4}(u_5), B_{3,4}(u_5), B_{4,4}(u_5), 0, ..., 0 ;
  0, 0, B_{3,4}(u_6), B_{4,4}(u_6), B_{5,4}(u_6), 0, ..., 0 ;
  ... ;
  0, ..., 0, 1 ] · [CP_1, CP_2, ..., CP_m]^T    eq. 28

We can solve this equation for the control points by multiplying both sides by the inverse of the blending function matrix B. In case the inverse B^{−1} does not exist, the pseudo-inverse B^+ = (B^T B)^{−1} B^T is used instead (Hartley and Zisserman, 2003).
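This solution step can be sketched numerically with NumPy; the fallback to the pseudo-inverse mirrors the remark above, and the matrix and points in the test are made-up illustrative values:

```python
import numpy as np

def solve_control_points(B, points):
    """Solve B · CP = P (eq. 28) for the control point coordinates.

    B is the square blending-function matrix and points holds the
    interpolated points as an (m, 2) array; the pseudo-inverse
    (B^T B)^-1 B^T is used when B is singular.
    """
    B = np.asarray(B, dtype=float)
    P = np.asarray(points, dtype=float)
    if abs(np.linalg.det(B)) > 1e-12:
        return np.linalg.solve(B, P)
    return np.linalg.pinv(B) @ P
```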
We also need to explain how each blending function is calculated. We know the u value of each interpolated point. For each point, each of the blending functions B_{k,d}(u), k ∈ [1, n+d−1], is calculated recursively from the d order-1 blending functions [B_{k,1}(u), ..., B_{k+d−1,1}(u)].
In the computation process of the blending functions, first, when d = 1, only one blending function B_{k,1}(u) = 1 for u_k ≤ u < u_{k+1}, and the other first-order functions B_{i,1}(u) = 0 for i ∈ [1, n−1], i ≠ k. Each blending function influences two blending functions at one order higher, as shown in Figure 15; for example, B_{k,1}(u) influences B_{k,2}(u) and B_{k−1,2}(u).
Figure 15 How the blending functions influence each other

Given only B_{k,1}(u) = 1 at d = 1, all blending functions at d = 2, 3 and 4 are calculated as shown in Table 2.

k        | d = 1 | d = 2       | d = 3                                           | d = 4
1 .. k−4 | 0     | 0           | 0                                               | 0
k−3      | 0     | 0           | 0                                               | (1 − w_{k−2,4})·B_{k−2,3}
k−2      | 0     | 0           | (1 − w_{k−1,3})·B_{k−1,2}                       | w_{k−2,4}·B_{k−2,3} + (1 − w_{k−1,4})·B_{k−1,3}
k−1      | 0     | 1 − w_{k,2} | w_{k−1,3}·B_{k−1,2} + (1 − w_{k,3})·B_{k,2}     | w_{k−1,4}·B_{k−1,3} + (1 − w_{k,4})·B_{k,3}
k        | 1     | w_{k,2}     | w_{k,3}·B_{k,2}                                 | w_{k,4}·B_{k,3}
k+1 .. n | 0     | 0           | 0                                               | 0

Table 2 How the blending functions at k, d are calculated
The definition of the scalar w_{k,d} and of the blending function B_{k,d} is given in equation 21. The first derivatives of the blending functions are influenced by the same two lower-order blending functions: for example, DB_{k,d} is calculated using B_{k,d−1} and B_{k+1,d−1}. Therefore the first derivatives can be calculated by the same algorithm with different weights.
A step-by-step algorithm for producing a B-spline curve, given a u value as input, is given as follows.

Algorithm
1. Manually select a number of points {P_1, P_2, P_3, ..., P_m} on the image.
2. Compute normalized knot values for these points using chord length parameterization as in equation 26.
   a) Approximate the distance between every two neighbouring points by the chord length, as in equation 25.
   b) Assign to each point of {p_1, p_2, p_3, ..., p_m} the accumulated distance from p_1.
   c) Set {p'_1, p'_2, p'_3, ..., p'_m} = {0, ||p_2 − p_1|| / L, (||p_2 − p_1|| + ||p_3 − p_2||) / L, ..., 1}, where L = sum_{i=1}^{m−1} ||p_{i+1} − p_i||.
3. Create a knot vector and a set of sample points.
   a) Create the knot vector by adding d−1 zeros at the beginning and as many ones after the last knot, e.g. v = {0, 0, 0, 0, p'_2, p'_3, ..., p'_{m−1}, 1, 1, 1, 1} for d = 4.
4. Calculate B_{k,d}(u) and DB_{k,d}(u) for each u of the interpolated points. When the first-order derivatives DB_{k,d}(u) are not required, their calculation is skipped.
   a) For each knot value u, assign B_{k,1}(u) = 1 if u_k ≤ u < u_{k+1}, and 0 otherwise, for k = [1, 2, ..., n+d−1].
   b) Calculate B_{k,2}(u) using B_{k,1}(u) for k = [1, 2, ..., n].
   c) Calculate DB_{k,2}(u) using B_{k,1}(u) for k = [1, 2, ..., n].
   d) Repeat this process until B_{k,d}(u) and DB_{k,d}(u) are obtained using B_{k,d−1}(u) for k = [1, 2, ..., n].
   e) Repeat the process from a) to d) for each u value.
5. Calculate the coordinates of the control points:

   [X, Y] = B^{−1} · [P(x), P(y)]

   where X = [CP_1(x), CP_2(x), ..., CP_n(x)]^T and similarly for Y; CP_k(x) denotes the x coordinate of the kth control point, with CP_1 = P_1 and CP_n = P_m as explained in section 4.3.2.1. P(x) = [CP_1(x), P_2(x), ..., P_{m−1}(x), CP_n(x)]^T and similarly for P(y), where P_i denotes the ith sample point. B is the n×n square blending function matrix, each of whose rows contains the blending functions [B_{1,d}(u), B_{2,d}(u), ..., B_{n,d}(u)] evaluated at the u value of the corresponding sample point of P. The pseudo-inverse is used when det(B) = 0.^17
6. Return the coordinates of the control points and the first-order derivatives at the sample points.
4.3.3. Affine Representation of B-spline Curves

4.3.3.1. Definition of the Affine Representation
In the tracking process, the template is transformed to find a reasonable match. As mentioned at the beginning of chapter III, objects are deformed by the projection of the 3D viewing frame onto the 2D viewing plane. However, since a billboard is a rigid and planar shape, we approximate the transformation of billboards by an affine transformation, much as Weng et al. (1989) assumed a rigid scene for their image matching. Since the affine transformation keeps parallel lines parallel, the projective effect of depth in the 3D view frame on the 2D viewing plane, as shown in Figure 16, is lost.
17 The case det(B) = 0 did not occur during our experiments, partly because we do not allow repeated knots except at the end knots.
Figure 16 Non-affine projective transformation from 3D to 2D
In the affine transformation, only six affine degrees of freedom are required to describe a curve: translation vertically and horizontally, rotation, and scaling vertically, horizontally and in the diagonal direction, which means equal scaling vertically and horizontally (Blake et al., 1995). In our project we aim at tracking a template represented by B-spline curves, which is defined by the control points and the corresponding blending functions. Tracking all control points individually allows too many degrees of freedom and would be complicated and unstable. Instead, we reduce the number of variables to six without losing accuracy, based on the known fact that a billboard, represented as a planar shape, can be described by the six affine degrees of freedom (Blake et al., 1995).

A B-spline curve can thus be parameterized by a six-dimensional linear vector-valued function Q. The relationship between the coordinates of the control points and Q is defined as follows (Blake et al., 1995):
(X; Y) = W Q + (X̄; Ȳ)    eq. 29

or

Q = M [ (X; Y) − (X̄; Ȳ) ]    eq. 30

where the matrices W and M define the relationship between Q and the template (X̄, Ȳ).
The template (X̄, Ȳ) consists of the column vectors of the x and y coordinates of the n control points, respectively. Using the notation 1 and 0 for n×1 column vectors of all ones and all zeros, respectively, the space of Q-vectors is spanned by the following basis vectors: {(1; 0), (0; 1), (X̄; 0), (0; Ȳ), (0; X̄), (Ȳ; 0)} (Blake et al., 1995). Then the 2n×6 matrix W can be defined as follows:

W = ( 1, 0, X̄, 0, 0, Ȳ ;
      0, 1, 0, Ȳ, X̄, 0 )    eq. 31
Then the pseudo-inverse of W, called M, is given as follows:

M = (W^T H W)^{−1} W^T H    eq. 32

where H is a 2n×2n matrix,

H = ( ∫_0^n B(u)^T B(u) du, 0 ;
      0, ∫_0^n B(u)^T B(u) du )
Using this expression, a curve is rewritten as

r(u, t) = U(u) Q(t)    eq. 33

where

U(u) = ( B(u), 0 ;
         0, B(u) ) · W

and B(u) = [B_1(u), B_2(u), ..., B_n(u)].
As can be seen from equation 30, any planar B-spline curve with n control points can be expressed by a six-dimensional vector Q, which greatly reduces computation time because n is usually much larger than six.
4.3.3.2. Transformation in Q Space
The following relationship between Q and a control point (X̄(i), Ȳ(i)) is obtained by inserting equation 31 into equation 29:

( X(i) )   ( 1, 0, X̄(i), 0, 0, Ȳ(i) )
( Y(i) ) = ( 0, 1, 0, Ȳ(i), X̄(i), 0 ) · [Q(1), Q(2), Q(3), Q(4), Q(5), Q(6)]^T + ( X̄(i) ; Ȳ(i) )    eq. 34
This can be rewritten as

X = Q(1)·1 + Q(3)·X̄ + Q(6)·Ȳ + X̄
Y = Q(2)·1 + Q(4)·Ȳ + Q(5)·X̄ + Ȳ
This means that the ith template control point is transformed by Q as follows:

( X(i) )   ( Q(3)+1, Q(6)   )   ( X̄(i) )   ( Q(1) )
( Y(i) ) = ( Q(5),   Q(4)+1 ) · ( Ȳ(i) ) + ( Q(2) )    eq. 35
The influence of each of the six elements is illustrated with the simplest template, a square with the four corner coordinates P1 = (−1, −1), P2 = (1, −1), P3 = (1, 1) and P4 = (−1, 1), as shown in Figure 17.

Figure 17 Template for checking the Q space transformation
50
Then the template coordinate vectors are X̄ = [−1, 1, 1, −1]^T and Ȳ = [−1, −1, 1, 1]^T. The Q vectors are chosen so that one of the six elements has a non-zero value, and X̄, Ȳ and each Q-vector are inserted into equation 34 to calculate the transformed template.
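Equation 35 is easy to check numerically. The sketch below applies a Q-vector to the square template and reproduces the translation, scaling and shearing behaviour described for Figures 18 to 20:

```python
import numpy as np

def transform_template(Q, Xbar, Ybar):
    """Apply the six-vector Q to the template control points (eq. 35)."""
    Q1, Q2, Q3, Q4, Q5, Q6 = Q
    A = np.array([[Q3 + 1.0, Q6],
                  [Q5, Q4 + 1.0]])
    pts = np.stack([Xbar, Ybar])            # 2 x n matrix of coordinates
    out = A @ pts + np.array([[Q1], [Q2]])  # linear part plus translation
    return out[0], out[1]
```

For instance, Qa = [1, 0, 0, 0, 0, 0] shifts every x coordinate by one, and Qe = [0, 0, 1, 0, 0, 0] doubles the x coordinates.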
Figure 18 shows the influence of Q(1) and Q(2). Only the value of Q(1) is set in a) Qa = [1,0,0,0,0,0]^T and b) Qb = [−1,0,0,0,0,0]^T, whereas only the value of Q(2) is set in c) Qc = [0,1,0,0,0,0]^T and d) Qd = [0,−1,0,0,0,0]^T. We can see that Q(1) and Q(2) translate the template in the x direction and the y direction by Q(1) and Q(2), respectively.
a) Qa=[1,0,0,0,0,0]T b) Qb=[-1,0,0,0,0,0] T c) Qc=[0,1,0,0,0,0]T d) Qd=[0,-1,0,0,0,0]T
Figure 18 Influence of Q(1) and Q(2) The template is transformed by Qa, Qb, Qc, and Qd.
The influence of Q(3) and Q(4) is tested in the same way, using another set of Q vectors which have a non-zero value in either Q(3) or Q(4).

Figure 19 shows that Q(3) and Q(4) scale the x coordinates and the y coordinates with respect to the y-axis and the x-axis by Q(3)+1 and Q(4)+1, respectively. We can naturally assume that Q(3)+1 and Q(4)+1 are always non-negative, because it makes no sense to scale in a negative direction, which would imply that a billboard is flipped as if reflected in a mirror.
a) Qe=[0,0,1,0,0,0]T b) Qf=[0,0,-1,0,0,0] T c) Qg=[0,0,0,1,0,0]T d) Qh=[0,0,0,-1,0,0] T
Figure 19 Influence of Q(3) and Q(4)
The template is transformed by Qe, Qf, Qg, and Qh.
Finally, the influence of Q(5) and Q(6) is tested in the same way, using another set of Q vectors which have a non-zero value in either Q(5) or Q(6). Figure 20 shows that Q(5) and Q(6) shear relative to the y direction and the x direction, respectively.
a) Qk=[0,0,0,0,1,0]T b) Ql=[0,0,0,0,-1,0] T c) Qp=[0,0,0,0,0,1]T d) Qq=[0,0,0,0,0,-1]T
Figure 20 Influence of Q(5) and Q(6) The template is transformed by Qk, Ql, Qp, and Qq.
In general an affine transformation is a composition of rotations, scaling, shearing and translations; when the transformation is along the (x, y) axes and centered at the origin, it can be expressed as:

( X(i) )   ( sc_x·cosθ, −sh_x·sinθ )   ( X̄(i) )   ( t_x )
( Y(i) ) = ( sh_y·sinθ,  sc_y·cosθ ) · ( Ȳ(i) ) + ( t_y )    eq. 36

where sc_x and sc_y denote scaling in the x and y directions, sh_x and sh_y denote shearing along the x and y directions, and t_x and t_y denote translation in the x and y directions, respectively. The transformation includes a rotation only when appropriate values are given for cosθ and sinθ, which must satisfy the following equation, obtained by combining equations 35 and 36:

cos²θ + sin²θ = ((Q(3)+1)/sc_x) · ((Q(4)+1)/sc_y) − (Q(5)/sh_y) · (Q(6)/sh_x) = 1

This condition is never met when three out of the four elements Q(3), Q(4), Q(5) and Q(6) are zero; therefore the above examples are not rotated. The following example in Figure 21, with Q = [0.5, −0.5, 0.25, 0.25, 1, −1]^T, shows the rotation effect as well.
Figure 21 Transformation with Q = [0.5, −0.5, 0.25, 0.25, 1, −1]^T
As such, we track these six transformation values instead of the control points themselves. Knowing the control points of the template (X̄, Ȳ), we can always recover the control points of a target object (X, Y) by equation 34.
4.4. Condensation Tracker

4.4.1. Dynamic Model
As mentioned earlier in section 4.1.4, a particle's state at time t depends on its history. Based on the assumption that the object dynamics form a Markov chain, the new state is conditional only on the immediately preceding state, which can be written as:

p(x_t | x_1, x_2, ..., x_{t−1}) = p(x_t | x_{t−1})    eq. 37

Using a second-order stochastic differential equation in discrete time, p(x_t | x_{t−1}) can be modelled by:

x_t − x̄ = a(x_{t−1} − x̄) + b ω_t    eq. 38
where the ω_t are independent vectors of independent standard Gaussian variables and, as such, can be regarded as Gaussian white noise, and x̄ here is the mean value of x_{t−1}.^18 The coefficient a controls the deterministic drift component of the dynamic model: it determines the oscillatory behaviour of the system, such as its modes, natural frequencies and damping constants. The coefficient b adjusts the stochastic part of the system and couples the noise into the deterministic dynamics.
This dynamic model is applied in our 6D affine Q space. Let

x_t = ( Q_{t−1} ; Q_t ),  x̄ = ( Q̄ ; Q̄ ),  a = ( 0, I ; A_0, A_1 ),  b = ( 0, 0 ; 0, B ).

Equation 38 can then be written as

Q_t = A_0 Q_{t−2} + A_1 Q_{t−1} + (I − A_0 − A_1) Q̄ + B ω_t,  t ≥ 2    eq. 39

Q̄ represents the transformation to the mean position: it displaces the template to the mean position, meaning that the transformation by Q_t is represented relative to this mean position. The next state at t depends only on the two previous states Q_{t−1} and Q_{t−2}.
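A single prediction step of equation 39 can be sketched as follows; A_0, A_1 and B are assumed to be given 6×6 matrices, and ω_t is drawn as standard Gaussian noise:

```python
import numpy as np

def predict_Q(Q_prev2, Q_prev1, Q_mean, A0, A1, B, rng=None):
    """One step of the second-order dynamics in Q space (eq. 39)."""
    if rng is None:
        rng = np.random.default_rng()
    omega = rng.standard_normal(len(Q_mean))  # Gaussian white noise
    return (A0 @ Q_prev2 + A1 @ Q_prev1
            + (np.eye(len(Q_mean)) - A0 - A1) @ Q_mean + B @ omega)
```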
The coefficients A_0, A_1 and B are important for a good prediction of the next state. They can be estimated from a training data set or set to good default values. We first set likely default values for A_0, A_1 and B; if this does not work well, we will improve the algorithm so that the coefficients are estimated from training data.
4.4.1.1. Estimation of the Coefficients A_0, A_1 and B
The more accurately the coefficients A_0, A_1 and B are estimated, the more robust the tracking can be. Since our tracking model is complicated, with both stochastic and dynamic elements, the motion can be tracked more accurately if the model has been trained, meaning that the model parameters have been estimated. Although the movement in live handball games is always irregular in terms of direction and speed, we assume that the camera moves in basically the same way within a game and from one game to the next, so that we can make use of the same training data.
18 It is confusing to use the same bar notation as for the template described in the earlier section, but this mean can be different from the template.
We use the maximum-likelihood estimation method, an 'effective' method for learning the dynamics from a previous Q sequence, introduced by Blake et al. (1995). They found that performance improves especially through ignoring background features and being able to follow rapid motions. The disadvantage of this method is that the tracker works well only for the specific shapes and motions used in training (Blake et al., 1995).

In Maximum Likelihood Estimation (MLE) we try to find the most likely discrete-time system parameters A_0, A_1 and B by maximizing the likelihood of the observed data set over all possible values of these parameters, following Blake et al. (1995). The deterministic and stochastic parameters are independent, so we can estimate them separately. The likelihood of the parameters A_0, A_1 and B with respect to a set of samples is given as

p(Q_1, Q_2, ..., Q_m | A_0, A_1, B) = prod_n p( ω_n = B^{−1}(Q_{n+2} − A_1 Q_{n+1} − A_0 Q_n) )    eq. 40
Remember that the ω_t are independent vectors of independent zero-mean Gaussian variables with variance BB^T. The probability density function of a Gaussian random variable w is given by

p(w) = (1 / (sqrt(2π) σ)) exp( −(w − µ)² / (2σ²) )    eq. 41

where µ is the mean of w and σ is its standard deviation.
The log-likelihood is often used instead, because the calculation becomes much simpler (multiplications turn into additions) and because the log function is monotonic: the maximum of equation 40 is attained at the same parameters as the maximum of its logarithm. The log-likelihood function L for the sequence of states Q_i given the parameters A_0, A_1 and B is then (Blake et al., 1995):

L(Q_1, Q_2, ..., Q_m, A_0, A_1, B) ≡ log p(Q_1, ..., Q_m | A_0, A_1, B) + const    eq. 42

Except for the constant, equation 42 can be rewritten by inserting equations 40 and 41 with µ = 0 and σ² = BB^T:

L(Q_1, Q_2, ..., Q_m, A_0, A_1, B)
= −(1/2) sum_{n=1}^{m−2} ||B^{−1}(Q_{n+2} − A_1 Q_{n+1} − A_0 Q_n)||² − (m−2) log det B    eq. 43
for a training sequence Q_1, ..., Q_m. It is in principle impossible to estimate B itself, but it is possible to estimate the covariance C = BB^T. Maximizing L with respect to A_0 and A_1 is independent of the value of C, because the deterministic parameters A_0 and A_1 and the stochastic parameter B can be estimated independently of each other. Going through the detailed proof and calculations given in appendix C, we obtain the parameters which maximize equation 43: A_0 and A_1 are the solutions of

S_20 − A_0 S_00 − A_1 S_10 = 0  and  S_21 − A_0 S_01 − A_1 S_11 = 0,

where S_ij = sum_{n=1}^{m−2} Q_{n+i} Q_{n+j}^T (i, j = 0, 1, 2), and B = sqrt( Z(Â_0, Â_1) / (m−2) ), where Z(Â_0, Â_1) denotes the sum of the outer products of the residuals Q_{n+2} − Â_1 Q_{n+1} − Â_0 Q_n (see appendix C).
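These estimation equations can be sketched numerically; the code below stacks the two normal equations into one block system, solves for A_0 and A_1, and returns the residual covariance C = BB^T. It is an illustrative reconstruction of the procedure, not our implementation:

```python
import numpy as np

def learn_dynamics(Q):
    """Estimate A0, A1 and C = B B^T from a training sequence Q (m x dim)."""
    Q = np.asarray(Q, dtype=float)
    m, dim = Q.shape

    def S(i, j):  # S_ij = sum_n Q_{n+i} Q_{n+j}^T
        return sum(np.outer(Q[n + i], Q[n + j]) for n in range(m - 2))

    M = np.block([[S(0, 0), S(0, 1)],
                  [S(1, 0), S(1, 1)]])
    rhs = np.hstack([S(2, 0), S(2, 1)])
    A01 = rhs @ np.linalg.pinv(M)            # [A0 A1] solving the normal equations
    A0, A1 = A01[:, :dim], A01[:, dim:]
    resid = Q[2:] - Q[1:-1] @ A1.T - Q[:-2] @ A0.T
    C = resid.T @ resid / (m - 2)            # covariance of the residual noise
    return A0, A1, C
```

On a noise-free sequence generated by known A_0 and A_1, the estimates recover the true coefficients and C is (numerically) zero.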
4.4.2. Observation Model
The purpose of the observation step is to measure how likely each predicted particle is, given the observed data z_t. Naturally, the observation model utilizes a set of image features, such as edges, color and histograms, to determine how well the predicted state of a sample matches the input or observation data z_t.

The observation process is defined by an observation density function p(z_t | x_t), which gives the probability of the measurement z_t for a given state x_t. As given in equation 11, p(z_t | x_t) = p(z | x), which means that the same density p(z | x) can be used throughout the tracking (Blake and Isard, 1998).
In a 2D image, the observation z is, in principle, the entire set of visible features. The observation density p(z | x) in two dimensions describes the distribution of the observed curve z(s) given the predicted curve r(s), 0 ≤ s ≤ 1, described by a state parameter x. Each possible observation is found by tracing normals from the predicted curve r. This does not always find the correct observation z, because edges arise from clutter as well as from foreground features. The observation density p(z | x) in 2D is defined as:

p(z | x) ∝ exp( −(1/(2σ²)) f(v_1; µ) )    eq. 44

where f(v; µ) = min(v², µ²), v_1 represents the distance from the predicted curve to the closest image feature, and µ is a constant which also limits the tracing to a finite distance by providing a definite value when no image feature is found within this distance. µ is thus the tracing range and can be any sufficiently large value. In this way we avoid selecting more than one feature per point on the predicted curve. The simplest discrete approximation of equation 44 replaces it by a sum of one-dimensional densities, with σ² = rM:

p(z | x) = exp( −(1/(2rM)) sum_{m=1}^{M} f( z_1(s_m) − r(s_m); µ ) )    eq. 45

where s_m = m/M. This means that the minimum distance is found and evaluated independently along M curve normals.
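Equation 45 reduces to a clipped sum of squared distances, sketched here; `distances` holds the nearest-feature distance measured along each of the M normals:

```python
import math

def observation_weight(distances, mu, r):
    """Unnormalized observation likelihood p(z|x) of eq. 45.

    Distances beyond mu are clipped by f(v; mu) = min(v^2, mu^2),
    so clutter far from the predicted curve cannot dominate.
    """
    M = len(distances)
    s = sum(min(v * v, mu * mu) for v in distances)
    return math.exp(-s / (2.0 * r * M))
```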
4.4.3. Initialization
As described in the condensation algorithm in section 4.2, tracking is an iterative process of sampling, predicting and measuring a sample set S = {s_{n,t}, n = 1...N} and updating the corresponding weights {π_n}, so a set of samples needs to be generated at t = 0 before the tracking starts. Furthermore, since the dynamic model is assumed to be a second-order linear stochastic differential equation (equation 39), we need at least the states at the first two time steps t = 0 and t = 1, i.e. x_1 and x_2, to start the condensation algorithm.
First, a set of random samples is generated, each sample represented by a state. At this stage, the likelihood of the states depends only on how much prior information about the targeted objects we have. If the initial guess of each parameter in the state lies within a rather large range, the likelihood of some samples may be quite low.

At time t = 0, N samples in Q space are 'randomly' thrown onto the first image of the video sequence using a random number rand drawn from a uniform distribution between zero and one. The range of each of the six Q elements, [Q_min(i), Q_max(i)], is given as input, and element Q(i) of a sample is selected as

Q(i) = Q_min(i) + rand × (Q_max(i) − Q_min(i)),  0 ≤ rand ≤ 1

The weight associated with each randomly generated sample is calculated directly using the observation model discussed above. At this time step no prediction is needed, since the motion has not started yet.
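The uniform initialization can be sketched as follows (the seed parameter is our addition, for reproducibility):

```python
import random

def init_samples(N, Qmin, Qmax, seed=0):
    """Draw N initial six-vector states uniformly from [Qmin(i), Qmax(i)]."""
    rng = random.Random(seed)
    return [[lo + rng.random() * (hi - lo) for lo, hi in zip(Qmin, Qmax)]
            for _ in range(N)]
```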
Next, at time t = 1, N new samples are generated from the set of N samples at t = 0 using their weights. There is, however, no relationship of the form of equation 39 between the samples at the first two steps, because equation 39 is not defined there. The first samples may lie far away from the target object, since the information about the template is not yet reflected.
Even though the tracking result might not be good for the first few time steps, the
condensation algorithm itself can gradually find a better candidate as t increases.
4.4.4. Detailed Algorithm
This section gives an overview of the detailed algorithm of the condensation tracker as a summary. First, a template curve is created from an image with billboards. Then the condensation tracker is run to find the best match.

4.4.4.1. Creating a Template
1. Define the template curve as described in the B-spline drawing algorithm in section 4.3.2.2 and obtain the normalized knot vector v and the coordinates of the control points.
2. Translate the template so that the first point lies at the origin, because the Q space transformation is defined with respect to x = 0 (the y-axis) and y = 0 (the x-axis), as explained in section 4.3.3.2.
3. Transform the template representation from control point coordinates into the Q space representation using equation 30 in section 4.3.3.1.
4.4.4.2. Running the Condensation Tracker
1. Generate a sample set with n particles, each particle state represented in Q space by six parameters as described in section 4.3.3.1. Each of the six parameter values is randomly chosen from a uniform distribution within a range defined by the user. The range should reflect the expected transformation of the state, such as translation, scaling and shearing.
2. At time t = 0, throw the sample set onto the first frame of the clip as described in section 4.4.3. Figure 22 depicts this process with an example.
Figure 22 n = 100 samples thrown based on uniformly distributed random sampling
100 particles are initially thrown onto the first frame of the video sequence, shown in red.
3. Create an edge map of the first frame image using the Canny edge detector.
4. Take the measurement at t = 0 and measure the fitness of all particles in the sample set as follows:
   i) Use equation 29 to compute the control point coordinates CP corresponding to the state Q_0 at t = 0.
   ii) Pick uniformly distributed points on the curve, for example curve parameters u = 0:1/20:1 (0 ≤ u ≤ 1), and calculate the blending functions B(u) at each point. Then obtain the coordinates of each point as r = B(u)·CP.
   iii) Calculate the first derivatives of the blending functions at each point r(u) in order to obtain the normal there. The first derivative gives the tangent vector to the curve at the corresponding point, as shown in Figure 23; given the tangent vector (x, y), the normal vector can be represented as (y, −x).

Figure 23 Normal and tangent at a point

   iv) Normalize the normal vectors.
   v) Set the default distance to a large constant. Measure the distance to the nearest edge along the curve normal from each curve point selected in ii). In Figure 24, for example, the black curve is the curve resulting from ii) and the red curve is the detected Canny edge of the image.
Figure 24 The nearest-edge searching process along the normal
The distance to the nearest edge is measured along the normal from each point.
5. Sum the shortest distances and calculate a weight using equation 20.
6. Normalize the weights of all the particles and calculate the cumulative weights for
each particle.
7. Define the optimum Q0 and recover the curve using Q0 and the control point coordinates as the result at t = 0. The optimum Q0 can be defined in different ways, for example as the weighted average or as the particle with the highest weight. We take the weighted average for this experiment because it may reflect various errors in an adequate way. Other possibilities are discussed later.
Figure 25 The weighted average Q0 of the initial stage
The weighted average of the samples in Figure 22; the black-colored 'e' is the first measurement result.
8. At t = 1, sample from the first n samples and redo steps 3 to 7 for the second frame. Up to this stage the initialization is finished, and a sample set of n particles s(Q, w, c) with corresponding states, weights and cumulative weights is obtained at t = 0 and t = 1.
9. From the third frame to the last frame of the clip, i.e. from t = 2 to t = number of frames in the clip, run the condensation algorithm: select samples, predict by the dynamic model

Q_t = A0 Q_{t-2} + A1 Q_{t-1} + (I − A0 − A1) Q̄ + B ω_t

then measure and weight by the observation model

p(z | x = r(s), µ(s)) ∝ exp( −(1/(2rM)) Σ_{m=1..M} f(z_m; µ) ), for s_m = m/M

In Figure 26 the picture on the top shows the 100 particles sampled, and the bottom picture shows their weighted average.
Figure 26 Result after the 3rd step
Top: the 100 particles after the resample and prediction steps on the 3rd frame of the video sequence. Bottom: their weighted average, the result after the measurement step for the 3rd frame.
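One full iteration of the select–predict–measure–weight loop can be sketched as follows. This is a hedged illustration, not the thesis implementation: the second-order autoregressive dynamic model is an assumption spelled out in the docstring, `edge_distance` is a hypothetical stand-in for the edge measurement of step 4, and the constant r plays the role of the variance in the observation model.

```python
import numpy as np

rng = np.random.default_rng(0)

def condensation_step(samples_t1, samples_t2, weights, A0, A1, B, Q_bar,
                      edge_distance, r=1.0):
    """One iteration of step 9: select, predict, measure, weight.

    Dynamic model (a second-order autoregression, assumed here):
        Q_t = A0 Q_{t-2} + A1 Q_{t-1} + (I - A0 - A1) Q_bar + B w_t
    `edge_distance(q)` is a hypothetical stand-in for step 4: the summed
    distance from the curve points of state q to the nearest Canny edge.
    """
    n, d = samples_t1.shape
    # Select: resample particle indices in proportion to their weights.
    idx = rng.choice(n, size=n, p=weights)
    # Predict with the dynamic model plus Gaussian process noise.
    drift = (np.eye(d) - A0 - A1) @ Q_bar
    noise = rng.standard_normal((n, d)) @ B.T
    pred = samples_t2[idx] @ A0.T + samples_t1[idx] @ A1.T + drift + noise
    # Measure and weight with the edge-based observation model.
    dist = np.array([edge_distance(q) for q in pred])
    w = np.exp(-dist / (2.0 * r))
    w /= w.sum()
    # The tracking output is the weighted average of the particles.
    return pred, w, w @ pred
```

Particles far from image edges receive exponentially small weights, so after resampling the cloud concentrates on states whose curves lie on strong edges, which is the behaviour the steps above describe.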
V. Experiment and Findings
We discussed in much detail the theory behind the deformable template matching and
condensation algorithm in the previous two chapters. In this chapter we will
investigate the performance of these two algorithms.
5.1. Experiment Purpose

As mentioned in the introduction, we would like to implement real-time tracking of billboards in a live-broadcast handball game with no information available beforehand other than the billboards' designs and the layout of the billboards and the cameras. The tracker should identify all billboards by their design. The positional and translation information should be precise enough to replace the billboards with other designs later19. In order to achieve natural replacement, occlusion should also be taken into account. As such, our experiments will mainly test the following issues, which we assume are essential for accurate real-time tracking:
1) Speed
The algorithm should be fast enough to achieve real-time tracking.
2) Accuracy
The algorithm should achieve an accuracy level at which the replacement of billboards is undetectable by human observation.
3) Good template to represent the uniqueness of the billboard.
The template is represented both as a set of points and as curves in the
experiments.
4) Generalization
Even though handball games differ from each other, we assume the cameras move mainly horizontally at different zoom levels. We hope to obtain a set of values that approximates the dynamic model and to use it for tracking in all handball games.
5) Handling of occlusion
19 As mentioned in chapter I, the replacement is out of the scope of this project.
How well the algorithms handle situations where occlusion occurs and where part of the billboard runs out of the scene.
5.2. Data Set
Even though our assumption is that clips are given, as mentioned in chapter I, what we have as data is a video sequence of one handball game in MPEG-2 format. Our training data and test data are therefore made out of this given video sequence. We split the original video sequence into clips; the training data and test data are sequences from clips taken by different cameras. Each sequence, which consists of 50 frames, is converted into an AVI file compressed with the IndeoVideo5 codec.
The training data consists of 18 different video sequences. The first nine are sequences taken from camera N20 and the last nine are taken from camera S. The locations of these two cameras are shown in Figure 27.
Figure 27 Reference of cameras N and S
Figure 28 shows our data set structure. The test data are video sequences different from the training data, taken by either camera N or camera S. We especially included some sequences where the target billboards are occluded or partly out of the scene. As described in 4) of section 5.1 Experiment Purpose, we would like to experiment on the generalization of the algorithms, so we hope that the set of values we get
20 The given camera layout picture shows one camera at the upper side of the playfield and three cameras at the opposite side. For simplicity, we refer to the camera at the upper side as camera N and the ones at the opposite side as camera S. (Since we do not know which of the three cameras was used for each clip, all the data we assumed to be taken by camera S were classified by pure eye observation.)
from the training data can be applied generally. As such, the test data are different from the training data.
Figure 28 Data set structure
Video clips taken by the camera at one side provide sequences used for training (frames t1 to t50) and for test (frames ta to ta+50); video clips taken by the camera at the other side provide sequences used for test (frames tc to tc+50).
As for templates, the given still billboard images are very big in terms of file size, so
we cut out only the billboard part and lowered the resolution in order to make processing in Matlab easier. The images in Figure 29 are the ones we used to obtain the templates. The 'e' at the bottom left is taken from a frame in the video sequence, which is why it is slightly blurred. Since we are only interested in the shape of the billboard, reducing the size and resolution of the original image does not affect the algorithm's performance. It does not matter whether the billboard image is taken from the video or drawn by users.
Figure 29 Billboard images used to form templates
5.3. Error Measurement

Neither Jain et al. (1998), Isard and Blake (1996) nor other relevant papers have quantified the error; the goodness of fit is judged visually. One reason might be that their tracking targets are surveillance or TV meetings, where the performance does not need to be very accurate as long as the camera can follow the targeted object. In our case, however, the accuracy of the location is as important as real-time tracking21.
5.3.1. Mean Square Error
In addition to the visual judgment, an error measurement is made in order to quantify the quality of tracking. Since the correct tracking data do not exist, we compare the tracking result with our manual tracking result, based on the assumption that both cannot avoid errors but that the errors in manual selection are within a reasonably acceptable range. The mean square error (MSE)22 is often used in error measurement and can be defined as

MSE = (1/n_T) Σ_{i=1..n_T} (X_i − P_i)²,  X_i, P_i ∈ ℝ²   eq. 46

where, in our case, n_T is the number of points defining the template (i.e. the points on the template in the deformable template matching and the control points of the template in the condensation algorithm), X_i is the ith point of the tracked result for a frame, and P_i is the corresponding ith point of the manually tracked result for that frame. The average error of a video sequence can then be written as
21 As mentioned in Chapter 1, we cannot achieve real-time tracking because of the processing speed of Matlab. Our goal is to make an algorithm which can run in real time under better circumstances.
22 MSE is determined by calculating the deviations of points from their true position, summing up the measurements, and then taking the square root of the sum.
avgError = (1/N) Σ_{t=1..N} MSE_t   eq. 47

where N is the number of frames in the video sequence and MSE_t is the mean square error of the tth frame. The standard deviation of the error for a sequence can then be written as

σ = sqrt( (1/(N − 1)) Σ_{t=1..N} (MSE_t − avgError)² )   eq. 48
Since the real tracking starts from the 3rd frame, where the tracking results become stable, the MSE, avgError and σ that we measure later in the experiments will not include the first two frames of each test sequence.
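Equations 46 to 48, with the first two frames skipped as just described, can be computed as in the following small sketch; the function names are ours, not from the thesis.

```python
import math

def mse(tracked, manual):
    """Equation 46: mean square error between the n_T tracked points X_i
    and the manually clicked points P_i (both lists of 2-D points)."""
    n = len(tracked)
    return sum((x[0] - p[0]) ** 2 + (x[1] - p[1]) ** 2
               for x, p in zip(tracked, manual)) / n

def sequence_error(per_frame_mse):
    """Equations 47 and 48: average error and standard deviation over a
    sequence, skipping the first two frames as described above."""
    mses = per_frame_mse[2:]          # tracking only stabilises at frame 3
    N = len(mses)
    avg = sum(mses) / N
    sigma = math.sqrt(sum((m - avg) ** 2 for m in mses) / (N - 1))
    return avg, sigma
```

For example, `mse([(0, 0), (1, 1)], [(0, 1), (1, 1)])` returns 0.5: only the first point deviates, by one pixel, and the squared deviation is averaged over the two points.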
In order to reduce the error in manual clicking, only the four corners of the billboard are clicked, and the error is measured from the four manually tracked corners and the four tracker-tracked corners in each frame, as in Figure 30. We believe this error is a good approximation of the error incurred in the tracking process.
Figure 30 Manually tracking the four corners of the billboard
The image is the result of manually tracking the 3rd frame of sequence tn2.avi.
5.3.2. Confidence Interval

Since our manual tracking may not be completely correct, we estimate the range within which the true tracking error lies with a certain probability, using the confidence interval (see Papoulis and Pillai, 2002).
We can estimate the correct value within a tolerance limit, which is defined as an interval estimator (Papoulis and Pillai, 2002). When the unknown θ is in the interval (θ1, θ2), 100×γ percent of the samples measured under the same condition fall within this range. Then (θ1, θ2) is called a γ confidence interval of θ. This relationship is represented in the following equation:

P{θ1 < θ < θ2} = γ   eq. 49

where the constant γ is the confidence coefficient of the estimate θ and δ = 1 − γ is called the confidence level. If the estimator of the mean η is unbiased, meaning that the mean of the sampling distribution can be shown to be equal to the estimated mean, and the density of the MSE is symmetrical about the mean, then the mean is in the middle of the interval (θ1, θ2), namely 2η = θ1 + θ2.
We define the variance and the error as follows. Two people individually click the same sequence. One of the clicking results is taken as the golden standard P_i and the other as X_i in equation 46. Then the average error and the standard deviation σ between the two clicked sequences are computed using equations 47 and 48 respectively23. We use this standard deviation as σ when calculating the confidence interval.
The error of the tracking result is defined as its difference from the golden standard.
First MSE is calculated by equation 46 using the golden standard as Pi and the result
of tracker as Xi. Then the average MSE of this tracking error, avgError is computed
using equation 47.
23 Following exactly Papoulis and Pillai (2002) we should measure the same sequence many times. However, we assume that clicking different frames provides almost the same effect of generalization of sampling.
Suppose the variance is known but the distribution of the MSE is unknown. Then the probability of the true error being in the interval estimate (avgError − σ/√(δN), avgError + σ/√(δN)) is larger than γ, where N is the number of frames24. This is represented as follows:

P{ avgError − σ/√(δN) < η < avgError + σ/√(δN) } > 1 − δ = γ

If we set the confidence level δ to 0.05, then 1/√δ = 4.47, and the confidence interval at confidence level δ is calculated as:

avgError − 4.47σ/√N < η < avgError + 4.47σ/√N   eq. 50

In the experiments testing the condensation algorithm, we will use equation 50 to calculate the confidence interval of the true tracking error with confidence coefficient 0.9525 for each test sequence.
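Equation 50 amounts to a distribution-free, Chebyshev-type interval, so it can be computed directly; a minimal sketch (the function name is ours):

```python
import math

def confidence_interval(avg_error, sigma, n_frames, delta=0.05):
    """Equation 50: distribution-free confidence interval for the true
    tracking error. With delta = 0.05, 1/sqrt(delta) = 4.47, giving
    avgError +/- 4.47 * sigma / sqrt(N)."""
    half_width = sigma / math.sqrt(delta * n_frames)
    return avg_error - half_width, avg_error + half_width
```

Because no distributional assumption is made about the MSE, the interval is wider than a Gaussian one at the same level (4.47σ/√N instead of 1.96σ/√N), which is the price of the guarantee holding for any distribution.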
5.4. Experiments on Deformable Template Matching
5.4.1. Experimental Results
The experiments are conducted on both a still image with 'tøj', which is one of the billboard images, and two frames from video sequences, as shown in Figure 31. The logo 'SPAR' is relatively clear, whereas 'EL GIGANTEN' is very blurred. Throughout the template matching experiments, the edge smoothing parameter ρ in equation 8 on page 23 is kept at 1, and the distance map is built up to 20 pixels from the template edge, with a Canny edge threshold value of 0.4.
24 As mentioned under equation 48, the first two frames are not included.
25 A confidence coefficient of 0.95 means that 95 percent of errors measured under the same condition fall within this range.
Figure 31 Images and their Canny edge images for testing deformable template matching
a) tøj b) El GIGANTEN c) SPAR. The Canny edge threshold used is 0.4.
The billboards' Canny edge images are shown in Figure 32, from which the templates are defined as described in the algorithm in section 3.3. Their original images are in Figure 29.
Figure 32 Edge maps of 'tøj,' 'EL GIGANTEN,' and 'SPAR' templates
a) tøj b) El GIGANTEN c) SPAR. The Canny edge threshold used is 0.4; original images are shown in Figure 29.
5.4.1.1. Experiment with Point Templates

We tested with different numbers of points to see whether the number of points affects the results. The points do not have to lie next to each other, because each point is independent. As mentioned earlier, it is very difficult to click exactly the corresponding points on a frame, especially when the points are not visible in the edge image. For example, the vertical lines between 'EL GIGANTEN' billboards disappear in the edge image of the frame (see Figure 31 b), so we should avoid selecting points
on this edge. It is easier to select points at the corners, but they are not suitable as template points because their edge directions are not uniquely determined. So, in order to click exactly the same points, we chose points that are easier to find, such as points on the extension of a straight horizontal line in G.
Table 3 below26 shows some of the results of experiments with different numbers of points representing the templates. The processing time to create a template is not included because it can be done beforehand. The full experiment results are in Table 3 in the Appendix. We experienced almost as many failed matches as good ones, although fewer failed results are shown in the table below. The quality of the fit is quantified by the MSE for results that did not clearly fail. As mentioned before, the calculated error includes the error in manually locating the exact corresponding points. Over a considerable number of experiments, we feel that the error measurement does not reflect the error very well, because the more points are involved, the higher the risk of clicking wrong parts.
Exp.No.1 | Figure No. | Template | Number of template points | Time to create a distance map2 (seconds) | Time to find the best match3 (seconds) | MSE
1 | Figure 33 | tøj | 4 | 3.4 | 27.0 | 0.5
2 | Figure 71 | tøj | 4 | 3.8 | 28.0 | 1.0
3 | Figure 72 | tøj | 4 | 3.3 | 27.0 | Failed5
4 | Figure 34 | tøj | 4 | 3.3 | 26.6 | Failed
5 | Figure 74 | EG4 | 6 | 3.8 | 43.9 | 4.0
6 | Figure 75 | EG | 6 | 3.8 | 43.8 | 1.2
7 | Figure 76 | EG | 11 | 3.8 | 47.9 | 2.9
8 | Figure 35 | EG | 11 | 3.8 | 49.8 | 2.5
9 | Figure 78 | EG | 11 | 3.9 | 47.5 | 3.6
10 | Figure 36 | EG | 11 | 3.9 | 50.7 | Failed
11 | Figure 80 | EG | 15 | 3.9 | 52.9 | Failed
12 | Figure 81 | SPAR | 8 | 3.9 | 46.8 | 1.1
13 | Figure 82 | SPAR | 8 | 3.9 | 45.0 | 12.1
14 | Figure 83 | SPAR | 10 | 3.8 | 44.9 | 1.0
15 | Figure 37 | SPAR | 10 | 3.8 | 45.0 | 0.7
16 | Figure 85 | SPAR | 4 | 3.8 | 41.4 | Failed
17 | Figure 38 | SPAR | 10 | 3.8 | 45.6 | Failed
1. Result images are shown below if the numbers are in bold font.
2. Time to create a distance map of the frame up to 20 pixels from edges.
3. Time to find the object which best matches the template.
4. EG stands for 'EL GIGANTEN.'
5. 'Failed' indicates that another object was located instead of the target.
Table 3 Results of template matching with point templates
26 Results images shown are those whose result numbers are in bold font in Table 3.
The most serious problem is the processing time. For every frame, it takes (time to create a distance map of the frame + time to find the best match) at least 30 seconds, and the more points involved, the longer it takes. We believe that real-time tracking cannot be achieved even on better hardware.
As for accuracy, 'tøj' shows a good match with small error even with a small number of points. Although 'EL GIGANTEN' is more blurred than 'SPAR,' the number of points needed to obtain a good result is even smaller. This implies that a good choice of template is more important than the number of points, as long as the number is sufficient to represent the uniqueness of the billboard. A good template captures the uniqueness of the billboard's design, such as the L in 'El GIGANTEN' and the curve between t and ø in 'tøj.' On the other hand, no part of the 'SPAR' billboard is distinctly different from other parts of the frame. When the number of points is larger than those in Table 3, the error becomes larger, presumably because of the larger error in manual clicking.
a) Template with 4 points b) Result
Figure 33 Result 1 with ‘tøj’ -4 points - successful Total time to find the match = 30 seconds, MSE = 0.5
a) Template with 4 points b) Result
Figure 34 Result 4 with 'tøj' - 4 points - failed Total time to find the match = 30 seconds
a) Template with 11 points b) Result Figure 35 Result 8 with ‘El GIGANTEN’ – 11 points – successful
Total time to find the match = 54 seconds, MSE = 2.5
a) Template with 11 points b) Result
Figure 36 Result 10 with ‘El GIGANTEN’ – 11 points – failed
Total time to find the match = 55 seconds
a) Template with 10 points b) Result
Figure 37 Result 15 with ‘SPAR’ - 10 points - successful Total time to find the match = 49 seconds, MSE = 0.7
a) Template with 10 points b) Result
Figure 38 Result 17 with ‘SPAR’ - 10 points - failed Total time to find the match = 49 seconds
Next, curves are used to represent the templates for the billboards 'EL GIGANTEN' and 'SPAR.' This time, manually selected points are used as knots, and a given number of uniformly distributed points between the first and last knots are used for matching. Therefore there is no problem in clicking corner points, because they are not directly used for matching. Moreover, we can use more points for matching than we actually click.
Table 4 shows the experiment results using curves as templates. Some of the good and failed results are shown in Figure 39 to Figure 42.
Exp.No. | Figure No.1 | Template | Number of points clicked2 | Number of points used3 | Time to create a distance map4 (seconds) | Time to find the best match5 (seconds) | MSE
18 | Figure 87 | EG6 | 6 | 11 | 3.8 | 44.7 | 1.7
19 | Figure 39 | EG | 6 | 11 | 3.8 | 48.2 | 3.1
20 | Figure 89 | EG | 6 | 11 | 3.8 | 48.2 | 3.9
21 | Figure 90 | EG | 6 | 14 | 3.8 | 51.0 | 2.3
22 | Figure 91 | EG | 6 | 14 | 3.8 | 51.0 | 2.7
23 | Figure 92 | EG | 10 | 16 | 3.8 | 51.7 | 1.3
24 | Figure 93 | EG | 10 | 16 | 3.8 | 45.0 | 0.7
25 | Figure 40 | EG | 6 | 11 | 3.9 | 51.9 | Failed7
26 | Figure 95 | EG | 10 | 11 | 3.9 | 51.0 | Failed
27 | Figure 96 | SPAR | 6 | 5 | 3.8 | 41.0 | 15.4
28 | Figure 97 | SPAR | 6 | 10 | 3.8 | 48.4 | 8.3
29 | Figure 98 | SPAR | 6 | 10 | 3.8 | 45.9 | 11.0
30 | Figure 99 | SPAR | 10 | 10 | 3.7 | 45.2 | 1.8
31 | Figure 100 | SPAR | 10 | 15 | 3.8 | 48.5 | 1.9
32 | Figure 41 | SPAR | 10 | 15 | 3.8 | 44.6 | 1.8
33 | Figure 42 | SPAR | 10 | 10 | 3.9 | 47.8 | Failed
34 | Figure 103 | SPAR | 15 | 15 | 3.8 | 52.7 | Failed
1. Result images are shown below if the numbers are in bold font; other images are in the appendix.
2. The number of points clicked to create a template curve.
3. The number of points used to find the best match.
4. Time to create a distance map of the frame up to 20 pixels from edges.
5. Time to find the object which best matches the template.
6. EG stands for 'EL GIGANTEN.'
7. 'Failed' indicates that another object was located instead of the target.
Table 4 Results of template matching with curve templates

In both the SPAR and EG cases, a template can be represented by a curve with six points. The more points are chosen, the more accurate the result is, but the more time it takes; it therefore saves labour to use more points for matching than are manually selected. When we use only as many points to find the best match as we have clicked, the results are often not good, even when many points are selected. Presumably this is because nearly the same points as those manually selected, which are often at corners, are used for finding the
best match. Therefore the match is better when more points are used for matching than were originally selected as knots.
However, more points are required for matching than in experiments 1-17, presumably because we cannot deliberately choose points that uniquely represent a billboard as we could in experiments 1-17. The choice of points is therefore less important than in the point template case.
a) Template with 6 points curve b) Result
Figure 39 Result 19 with 'EL GIGANTEN' - 6 points are clicked, 11 points sampled - successful Total time to find the match = 52 seconds, MSE = 3.1
a) Template with 6 points curve b) Result
Figure 40 Result 25 with 'EL GIGANTEN' - 6 points are clicked, 11 points sampled - failed Total time to find the match = 56 seconds
a) Template with 10 points b) Result
Figure 41 Result 32 with 'SPAR' - 10 points are clicked, 15 points sampled - successful Total time to find the match = 49 seconds, MSE = 1.8
a) Template with 10 points b) Result
Figure 42 Result 33 with 'SPAR' - 10 points are clicked, 10 points sampled - failed Total time to find the match = 49 seconds
5.4.2. Findings and Discussions
1) Speed
The above experiments show clearly that deformable template matching is very slow. It takes approximately half a minute to process an image with a simple background using only 4 points to represent the template. When the number of template points increases and the background becomes more complicated, meaning that there are more edges in the image, the processing time increases tremendously. On-line tracking is unlikely to be achievable even with faster computers or faster programming languages. Therefore we give up the idea of using this method for object tracking.
2) Accuracy
The deformable template matching method can locate target objects in images with
clear edges, regardless of blur and the number of template points, as long as these
points are enough to represent the unique shape of the template. This accuracy is very
likely achieved because not only the distance but also the edge direction between all
the edges selected on the template and the target object are used to check the
goodness of fit.
3) Good template to represent the uniqueness of the template
A unique template representation is the key to a successful matching. Curve templates
are better than point templates because it saves labor to select points enough to
represent a billboard, which contributes to reducing the error in manual clicking.
However, it becomes more difficult to define an object uniquely when the target object does not have very strong edges in the image. The deformable template matching uses only the edge feature for matching, so a well-defined, clear edge of the target object is essential in using this method. There is no guarantee that the edge points of the object we would like to track are visible in the edge map, because we use one Canny edge threshold value to process one sequence. As described in the algorithm, the template is placed on all the edge points of the edge map; if the point corresponding to the first point of the template is not visible in the edge map, the real object will never be located accurately in the frame. It is difficult to find an optimal threshold value to suit all the frames, or to define a different threshold value for each frame. We will discuss this further in the experiments on the condensation algorithm.
Of course, the more points are used, the more likely the uniqueness of the template is kept and the more likely the matching is to succeed. However, the processing time increases in proportion to the number of points, whereas the accuracy depends more on the quality of the template representation than on the number of points. So as long as the points are enough to represent the unique features of the template, a huge number of points is not necessary. However, it is difficult to know which points better represent the billboard and how many points are required at minimum.
4) Generalization
Once a good template is defined for a billboard, we believe it is possible to detect the same design no matter how the frames are taken, as long as the design is not drastically deformed, since, as assumed in section 3.3, deformation of the template is not taken into account in the algorithm. If deformation were included, even a deformed design could be detected; we will not try to include it, however, as it would take even more time to process a single frame.
We think template matching is a strong method for detecting the accurate position of the template. It is, however, a fundamental flaw that the processing speed is intolerably slow for tracking.
5.5. Experiments on Condensation Algorithm

Based on section 5.1 Experiment Purpose, many experiments were conducted on the condensation algorithm to show how different factors influence the performance of the tracker. These factors are mainly a good template representation, an accurate set of coefficients A0, A1 and B0, and an optimal number of particles to generate. As we discussed earlier in section 4.4.1 in Chapter 4, the coefficients A0, A1 and B0 are very important in successfully tracking the target, since they approximate the motion of the system. As such, we will classify the experiments according to how the coefficients A0, A1 and B0 are approximated.
5.5.1. Define an Optimal Template

A template can be represented by a single curve, as in Figure 25. However, the condensation tracker often fails to track a simple single curve when a similar shape, but of different size or rotation, is hit by a particle or a sample. Furthermore, in reality it is very difficult to represent the unique features of a billboard with only a single curve. As such, we use multiple curves to represent billboards.
Take the 'EL GIGANTEN' billboard for example; we can represent it with EL plus the billboard borders.
Figure 43 Template with multiple curves
As shown in Figure 43, the template is represented by six curves (third-order B-spline curves) and four straight lines (first-order B-spline curves). We denote these segments all together as a curve.
In the measurement step, we use the same Q to get the control points for each segment and thus eventually the segments of each particle. Using the same method as for measuring one curve, described in section 4.4.4.2 Running the Condensation Tracker, we measure the distance to the nearest edges for all the segments, and the sum of these edge distances is used to calculate the normalized weight. Let L(i) be the shortest distance for the ith segment; then the total distance can be written as

Total L = Σ_{i=1..6} L(i)

and after measuring Total L for every particle, we use equation 17 to get the normalized weight for each particle.
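In code, the multi-segment measurement just sums the per-segment distances before the weighting step; a hedged sketch, where the exponential form exp(−TotalL/2r) stands in for the normalized-weight formula of equation 17 (not reproduced here) and the function name is ours:

```python
import math

def template_weight(segment_distances_per_particle, r=1.0):
    """For each particle, sum the shortest edge distances L(i) over the
    template's segments (Total L = sum_i L(i)), then convert the totals
    into normalized weights with an exponential observation model."""
    totals = [sum(dists) for dists in segment_distances_per_particle]
    raw = [math.exp(-t / (2.0 * r)) for t in totals]
    s = sum(raw)
    return [w / s for w in raw]
```

A particle whose segments all lie on edges (all distances near zero) thus dominates the weight mass over a particle whose segments are far from any edge, which is what makes the multi-curve template more discriminative than a single curve.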
It would be ideal to include all the curves of a billboard in the template, but that would increase the computation time in the measurement step. In the experiments, three curves are used to represent each billboard. The curves chosen to represent a billboard are, first of all, ones that we assume can be detected by the Canny edge detector in the measurement step; secondly, these curves together should be a good representation of the billboard.
Figure 44 shows some examples of the templates used to represent some billboards.
a) ‘e’ template b) ‘El GIGANTEN’ template c) ‘SPAR’ template
Figure 44 Templates with multiple curves a) ‘e’ billboard is represented by the shape of ‘e’ and the billboard’s bottom line. b) ‘El GIGANTEN’ billboard is represented by the shape of E and the lines around E and the billboard’s bottom line. c) ‘SPAR’ billboard is represented by the shape of S and R and the lines around SPAR.
5.5.2. Coefficients A0, A1 and B
As mentioned earlier, the more accurate the coefficients A0, A1 and B, the more robust and accurate the tracking will be. As such, we would like to find out whether we can train or define a set of A0, A1 and B which is suitable for billboard tracking in all parts of the
handball games. We assume that the camera at each side moves in roughly the same pattern, even though every handball game has its own features. So the question is, supposing we can train on enough sequences from handball games to obtain A0, A1 and B, whether this trained set works well in tracking any billboard in general, or whether we can instead define a simple default set of A0, A1 and B.
(1) Estimated A0, A1 and B from training data

A training sequence Q1, ..., Qj, where j is the number of training samples, is needed in order to estimate A0, A1 and B (Blake et al. (1995) and Reynard et al. (1996)). In order to keep the situation simple, we use the four corner points of the billboard as the interpolated points to obtain a linear (first-order) B-spline curve to represent the template. Taking the 'e' billboard for example, the manually selected four yellow corner points seen in Figure 45 are the control points that need to be tracked manually in each frame of the video sequence. The Qs can then be calculated using equation 30.
Figure 45 Template for training
Yellow points are manually clicked points used to calculate control points. Red points are the calculated control points, which have been transformed so that the first control point is at the coordinate origin.
Using the control points obtained manually from the training data, the Qs are calculated and A0, A1 and B are estimated. We observed that during the video sequence the target billboard's size and direction change only very slightly, so the values of Q(2), Q(3), Q(4), Q(5) and Q(6) do not change much through this video sequence. Furthermore, Q(5) and Q(6) are very small, as there is not much shearing of the billboard's shape in this sequence. As mentioned earlier, the 5th column determines the rotation and shearing relative to the y coordinate and the 6th column determines the rotation and shearing relative to the x coordinate, so we expect the 5th and 6th rows of the A0, A1 and B matrices to be very small. This can be seen from Table 11 and Table 12 in the appendix, which show the A0s, A1s and Bs calculated from the training data.
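The estimation step can be sketched as a least-squares fit of a second-order model to the tracked Q sequence. This follows the spirit of the learning scheme in Blake et al. (1995) rather than reproducing it exactly; `estimate_dynamics` is our name, and taking B as a Cholesky square root of the residual covariance is our simplification.

```python
import numpy as np

def estimate_dynamics(Q):
    """Fit Q_t - Q_bar ~ A0 (Q_{t-2} - Q_bar) + A1 (Q_{t-1} - Q_bar) + B w_t
    to a manually tracked state sequence Q of shape (T, d) by least
    squares; B is taken as a Cholesky square root of the residual
    covariance. A sketch in the spirit of Blake et al. (1995)."""
    Q = np.asarray(Q, dtype=float)
    T, d = Q.shape
    Q_bar = Q.mean(axis=0)
    Z = Q - Q_bar
    X = np.hstack([Z[:-2], Z[1:-1]])        # regressors: states at t-2, t-1
    Y = Z[2:]                               # targets: states at t
    theta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    A0, A1 = theta[:d].T, theta[d:].T
    resid = Y - X @ theta
    cov = resid.T @ resid / max(T - 2, 1)
    B = np.linalg.cholesky(cov + 1e-9 * np.eye(d))   # noise "square root"
    return A0, A1, B, Q_bar
```

Because the billboard barely deforms in the training sequences, the rows of the fitted matrices corresponding to the shear parameters Q(5) and Q(6) come out small, matching the observation above.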
In order to observe how the camera moves, we depict the motion of the centre of the 'e' billboard (denoted C) from the training sequences we manually tracked in the following figures, where the x-axis is the ordinal number of the frame in the video sequence and the y-axis is C's x coordinate or y coordinate in each frame.
Figure 46 Horizontal motion of the centre of the 'e' billboard in a sequence — training data taken from camera N (training sequences e1.avi-e5.avi, en11.avi-en14.avi)
The x-axis is the ordinal number of the frame and the y-axis is C's x coordinate in each frame.
Figure 47 Vertical motion of the centre of the 'e' billboard in a sequence — training data taken from camera N (training sequences e1.avi-e5.avi, en11.avi-en14.avi)
The x-axis is the ordinal number of the frame and the y-axis is C's y coordinate in each frame.
Figure 48 Horizontal motion of the centre of the 'e' billboard in a sequence, training data taken from camera S (training sequences e6.avi-e10.avi, es11.avi-es14.avi). The x-axis is the ordinal number of the frame and the y-axis is C's x coordinate in each frame.

Figure 49 Vertical motion of the centre of the 'e' billboard in a sequence, training data taken from camera S (training sequences e6.avi-e10.avi, es11.avi-es14.avi). The x-axis is the ordinal number of the frame and the y-axis is C's y coordinate in each frame.
It is very difficult to observe a clear motion pattern in Figure 46 to Figure 49; however, they show that the motion difference between two consecutive frames is about 0-20 in the x coordinate and 0-10 in the y coordinate. The motion along the x coordinate is larger than that along the y coordinate. The horizontal motion direction changes: sometimes the billboard moves to the right and sometimes to the left. So instead of using the trained set of A0, A1 and B, a simple default set can probably be defined.
(2) Default coefficients A0, A1 and B

In the description of the prediction step, we stated that A0, A1 and B can take likely default values. The simplest A0, A1 and B matrices have non-zero values only on the diagonal. Our experiments show that the particles do not move when A0 and A1 are set as

A0 = A1 = 0.5 × I,

where I is the 6×6 identity matrix, and that there is no Gaussian noise effect when B is set as the 6×6 zero matrix.
When the diagonal values of the matrices A0 and A1 are set higher than 0.5, the particles move towards the right, and when the diagonal values are set lower than 0.5, the particles move towards the left. This finding is only based on visual observation of the experiments. We have not further explored how the elements of the matrices A0, A1 and B influence each other, due to the time limit. However, we would like to use the set of A0, A1 and B which we observed works fairly well for a few sequences, to test whether it can serve as a reasonable default. So, initially we set A0, A1 and B as:

A0 = 0.48 × I, A1 = 0.48 × I; B = diag([15 15 0.01 0.01 0.001 0.001])
During the tracking, the direction of the motion from time t-2 to t-1 then needs to be checked. One condition for this method to work is that we trust the tracker, that is to say, we consider the tracked particle to be the optimal target. Our assumption is that the next movement will be the same as the previous movement; therefore the movement at time t will be to the right if the movement from time t-2 to t-1 is to the right. Since the first value in the Q vector, Q(1), indicates the translation in the x direction, we can say that if Q_{t-1}(1) - Q_{t-2}(1) is positive, then at time t the motion is toward the right, and we update A0 and A1 as:

A0 = 0.52 × I; A1 = 0.52 × I

Otherwise, the motion at time t is toward the left and we update A0 and A1 as:

A0 = 0.48 × I; A1 = 0.48 × I

as illustrated in Figure 50.
Figure 50 Checking the motion direction and setting the default A0 and A1: if Q_{t-1}(1) - Q_{t-2}(1) < 0, Q_t is supposed to move to the left, i.e., A0 = A1 = 0.48 × eye(6); if Q_{t-1}(1) - Q_{t-2}(1) > 0, Q_t is supposed to move to the right, i.e., A0 = A1 = 0.52 × eye(6).
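The direction-checked prediction step described above can be sketched as follows. This is only a reading of the method, not the thesis implementation: the state layout and helper names are ours, and the dynamics are assumed to be Q_t = A0·Q_{t-2} + A1·Q_{t-1} + B·w_t with diagonal matrices.

```python
import random

# State Q = [tx, ty, sx, sy, shear_y, shear_x] (6 components, as in the text).
# Diagonal noise levels from the default B = diag([15 15 0.01 0.01 0.001 0.001]).
B_DIAG = [15, 15, 0.01, 0.01, 0.001, 0.001]

def predict(q_prev2, q_prev1, b_diag=B_DIAG):
    """Second-order prediction for one particle with direction-checked A0, A1.

    If Q(1), the x translation, grew from t-2 to t-1, the billboard is
    assumed to keep moving right, so A0 = A1 = 0.52*I; otherwise 0.48*I.
    """
    a = 0.52 if q_prev1[0] - q_prev2[0] > 0 else 0.48
    return [a * q2 + a * q1 + random.gauss(0.0, b)
            for q2, q1, b in zip(q_prev2, q_prev1, b_diag)]

random.seed(0)
# Billboard moving right (x translation grew from 500 to 510); with the
# noise switched off, the predicted x is 0.52 * (500 + 510) = 525.2.
q = predict([500, 150, 1, 1, 0, 0], [510, 150, 1, 1, 0, 0], [0] * 6)
```

Note that with A0 = A1 = 0.5·I and zero noise the prediction is the average of the two previous states, which is consistent with the observation above that the particles then do not move.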
We will compare the tracker's performance using the estimated A0, A1 and B with that using the default A0, A1 and B. Regarding the trained A0, A1 and B, five different situations are tested:
1) whether they can model the dynamics of video sequences taken from the same camera;
2) whether they can be used to model the dynamics of video sequences taken from other cameras;
3) whether tracking will be successful if we combine the training data from different cameras and obtain a corresponding set of A0, A1 and B to model the dynamics of any video sequence of a handball game, regardless of which camera is used;
4) whether the trained set of A0, A1 and B obtained from the locations of billboards at one place can be used to track billboards located at other places in the frame image; for example, whether we can use the A0, A1 and B trained on billboards at the corner, such as the 'e' billboard, to track billboards at the side of the playground, say the 'SPAR' billboard;
5) whether on-line training can be used to get A0, A1 and B.
5.5.3. Results and Findings
Many factors affect the performance of the condensation algorithm, such as the number of particles, the initial searching range in which the particles are generated, the coefficients A0, A1 and B, etc. It is very difficult to separate these factors in the experiments. As such, when doing experiments, we manually specify some values, such as the initial range, in order to observe the influence of each individual factor.
5.5.3.1 Experiment on the performance by number of samples

In order to conduct these experiments, we manually tracked the billboard in each test sequence and estimated A0, A1 and B for it. We consider these estimated A0, A1 and B accurate enough to represent the dynamic model of each test sequence. The initial range input for each test sequence can be found in Table 13 in the appendix.
Although Isard and Blake (1998) use more than 1000 samples, we use at most 300 because of the processing speed; the more particles are involved, the slower the implementation becomes. By observing the tracking process, we can see that the tracking result using 300 particles is better than that using 100 particles. With 300 particles, the tracker keeps tracking the target, and the tracking is accurate with respect to location, size and shearing factors. Even when particles hit places with similar curves, the tracker can still track the right target (see Figure 51).
Figure 51 The tracker successfully tracks the right target even though some particles are predicted to be around a similar curve nearby. The image is taken from the process of tracking the 'e' billboard in video sequence new11.avi.
Table 5 below shows the average tracking error for each video sequence we tested, together with the standard deviation. As mentioned earlier, the error in the table includes the error in manual tracking. So even though the error is big, such as 26.198 in ree11.avi, the tracking result by inspection is rather good, as shown in Figure 52.

Figure 52 Good tracking result despite a high error. The tracking result is rather good by inspection even though the average error measured is as high as 26.198. The images are taken from the tracking result sequence of ree11.avi.
Exp. No.27 | Test data (avi) | Result file28 (avi) | No. of particles | Confidence interval of the error η | Confidence interval of manual tracking error η'29 | Standard deviation
1 | new11 | 53 | 100 | 2.5871e+007 < η < 5871e+007 | 7.3299 < η' < 43.12 | 1.0016e+008
2 | '' | 54 | 300 | 12.0 < η < 31.0 | '' | 14.6378
3 | e1 | ree1100 | 100 | 91474 < η < 91486 | 1.6350 < η' < 14.7650 | 41853
4 | '' | ree11 | 300 | 20.3 < η < 32.0 | '' | 15.46
5 | tn3 | retn33 | 100 | 11.1 < η < 25.8 | 0.5081 < η' < 26.7419 | 10.848
6 | '' | retn3300 | 300 | 10.2 < η < 25.0 | '' | 10.519
7 | ts1 | rets110 | 100 | 50.1 < η < 59.5 | 3.8960 < η' < 18.2540 | 49.124
8 | '' | rets18 | 300 | 42.1 < η < 51.5 | '' | 38.33
9 | ts2 | rets22 | 100 | 6.3 < η < 16.5 | 3.7280 < η' < 23.5720 | 7.1146
10 | '' | rets21 | 300 | 4.7 < η < 14.9 | '' | 5.4373
Table 5 Errors in tracking using A0, A1 and B trained from the test sequence itself

In order to see how the tracker performs throughout the tracking process, Figure 53 below shows the MSE at each frame for the above test video sequences. The x axis is the ordinal number of the frame and the y axis is the MSE at each frame. Since the errors can be very big, the y axis uses a logarithmic scale, which is often used for plotting drastic changes.

27 Each experiment result is given a number, which makes it easier to refer to a particular experiment result later in the thesis.
28 Each experiment result is a new video sequence, where the tracking result is recorded.
29 If the confidence interval of the error η is below the maximum of η', we consider that the tracker successfully tracks the target.
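The success criterion in footnote 29 can be sketched as below. The thesis does not state which interval formula is used, so the normal-approximation 95% interval (mean ± 1.96·σ/√n) is our assumption, and the per-frame errors are invented; the η' bound 23.572 is the one listed for ts2 in Table 5.

```python
import math

def error_ci(errors, z=1.96):
    """Normal-approximation confidence interval for the mean error."""
    n = len(errors)
    mean = sum(errors) / n
    var = sum((e - mean) ** 2 for e in errors) / (n - 1)
    half = z * math.sqrt(var / n)
    return mean - half, mean + half

def tracked_successfully(errors, manual_err_max):
    """Footnote 29: success if the whole interval lies below max(eta')."""
    return error_ci(errors)[1] < manual_err_max

per_frame_errors = [5.1, 6.0, 4.8, 7.2, 5.5, 6.3]    # invented per-frame errors
ok = tracked_successfully(per_frame_errors, 23.572)  # interval well below 23.572
```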
Figure 53 Errors per frame, experiments no.1 to 10 (result sequences 53.avi, 54.avi, ree1100.avi, ree11.avi, retn3100.avi, retn3300.avi, rets110.avi, rets18.avi, rets22.avi, rets21.avi). The x-axis denotes the frame number and the y-axis denotes the MSE at each frame.

The above experiments show that the number of particle samples is a very important factor in deciding the performance of the tracker. The more samples we use, the better the tracking result is. Take experiments 1 and 2 for example: the tracking result using 300 particles is much better than that using 100 particles. When there are not enough particles, it is more likely that the target object is not hit by the particles, and as such the tracking result might be very wrong. However, the better performance is achieved by sacrificing the speed of the tracker, i.e., the tracker needs more time for tracking when the number of particle samples increases. It generally takes 5.1 seconds to process each frame using 100 particles and 8.0 seconds using 300 particles.
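A rough way to see why too few particles can miss the target entirely: if each particle landed on the target independently with probability p, the chance that at least one of n particles hits it would be 1 - (1-p)^n. The areas below are invented for illustration, and real particles are not independent uniform draws, so this is only a back-of-the-envelope argument.

```python
def hit_probability(n, p):
    """Chance that at least one of n independent particles hits the target."""
    return 1 - (1 - p) ** n

# Hypothetical 20 x 8 billboard inside a 100 x 50 search area
p = (20 * 8) / (100 * 50)
p100 = hit_probability(100, p)   # roughly 0.96
p300 = hit_probability(300, p)   # very close to 1
```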
It is very difficult to find a universal optimal number that suits all trackers under different kinds of conditions. Take experiments no.1 and 2 and experiments no.9 and 10 for example: compared with its performance using 100 particles in experiment no.1, the tracker's performance improves dramatically in experiment no.2 using 300 particles. However, we cannot see such an improvement between experiments no.9 and no.10. The reason is that the initial range inputs are different. As described in section 4.4.4 Detailed Algorithm of Chapter IV, a number of particles are generated in a certain searching area with certain rotation and shearing parameters, which are defined by the user's input Range30. From the table here, we can see that the initial range in which particles are generated for experiments no.1 and no.2 (100 x 50) is bigger than that of experiments no.9 and no.10 (50 x 30). Furthermore, the initial shearing factors are neglected in experiments no.1 and no.2, while they are much closer to the true values in experiments no.9 and no.10. As such, fewer particles are required to achieve a good tracking result. We can also observe from Table 5 that the standard deviation of the error decreases when the number of particles increases, which indicates that the condensation tracker becomes more stable.
Exp. No. | 1 & 2 | 9 & 10
Test Sequence | new11.avi | ts2.avi
Initial Range | Min. / Max. | Min. / Max.
Q0(1) | 480 / 580 | 100 / 150
Q0(2) | 120 / 170 | 190 / 220
Q0(3) | -0.5 / -0.25 | -0.5 / -0.25
Q0(4) | -0.5 / -0.25 | -0.5 / -0.25
Q0(5) | 0 / 0 | -0.15 / -0.1
Q0(6) | 0 / 0 | 0.2 / 0.4
We assume that at least 300 particles are needed in order to achieve an acceptable accuracy level using our tracker; however, this optimal number is only valid under the restriction that the initial range, within which the particles are generated, is a good guess of where the target is. If the variations of the particles' locations, sizes and shearing factors span a wide range, the optimal number of 300 might not hold.
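The initialization described in section 4.4.4, where particles are generated inside the user-supplied Range, might look like the following sketch. Drawing each component uniformly between its Min and Max is our assumption (the text does not state the distribution); the bounds are those listed above for experiments no.1 and no.2.

```python
import random

# Min/Max bounds for the six components of Q0, taken from the table above
# (sequence new11.avi, experiments no.1 and no.2; shearing neglected).
RANGE = [(480.0, 580.0),   # Q0(1): x translation
         (120.0, 170.0),   # Q0(2): y translation
         (-0.5, -0.25),    # Q0(3)
         (-0.5, -0.25),    # Q0(4)
         (0.0, 0.0),       # Q0(5): shearing neglected
         (0.0, 0.0)]       # Q0(6): shearing neglected

def init_particles(n, bounds):
    """Draw n initial particles uniformly inside the search Range."""
    return [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]

random.seed(1)
particles = init_particles(300, RANGE)
```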
5.5.3.2. Experiment on the performance using trained A0, A1 and B based on another sequence

(1) A0, A1 and B trained from sequences taken by the same camera.
(2) A0, A1 and B trained from sequences taken by different cameras.
(3) Tracking a non-corner billboard with A0, A1 and B trained from a corner billboard.

30 Later in Chapter VII, we briefly mention that we might be able to obtain the values for Range automatically by using the deformable template matching method.
(1) A0, A1 and B trained from sequences taken by the same camera.
Based on the findings in the experiments using different numbers of samples, we use 300 particles in the experiments in Table 6, and the initial range for each test sequence is the same as the one used in the previous experiments no.1 to no.10.
Exp. No. | Test data (avi) | Result file (avi) | Coefficients A0, A1 and B | Confidence interval of the error η | Confidence interval of manual tracking error η'31 | Standard deviation
11 | e1 | ree13 | trained from e1.avi-e5.avi | 5348 < η < 5360 | 1.6350 < η' < 14.765 | 2813.7
12 | tn3 | retn32 | '' | 2193 < η < 2207 | 0.5081 < η' < 26.742 | 1229.5
13 | new11 | new11-3 | '' | 3595 < η < 3614 | 7.3299 < η' < 43.12 | 4277.2
14 | e1 | ree15 | trained from e1.avi-e5.avi, en11.avi-en14.avi | 13799 < η < 13811 | 1.6350 < η' < 14.765 | 5111.2
15 | tn3 | retn37 | '' | 2939 < η < 2954 | 0.5081 < η' < 26.742 | 462.49
16 | new11 | renew116 | '' | 45.2 < η < 64.3 | 7.3299 < η' < 43.12 | 74.344
17 | ts1 | rets16 | trained from e6.avi-e10.avi, es11.avi-es14.avi | 2511 < η < 2521 | 3.8960 < η' < 18.2540 | 1934.8
18 | ts2 | rets24 | '' | 8.8 < η < 19.0 | 3.7280 < η' < 23.5720 | 9.9144
Table 6 Errors in tracking using A0, A1 and B trained from sequences taken by the same camera

We can see that the errors are very big except in experiment no.18, where the initial range is very limited and close to the true state of the billboard. The failure has many reasons. First of all, the coefficients A0, A1 and B do not represent the motion model as accurately as those trained from each test sequence itself. In different sequences the camera moves with different speed and zooming; as such, the oscillatory motion trained from many sequences is only a general representation of all the sequences, and the noise coupled to the dynamics model also increases. Take experiments no.13 and no.16 for example: test sequence new11.avi is taken by camera S, and training data e1.avi to e5.avi are also taken by the same camera S, but all with lower zooming than that used in new11.avi. Training data en11.avi to en14.avi are taken by camera S with zooming similar to that in new11.avi. When we use A0, A1 and B trained only from e1.avi to e5.avi, the tracking error is approximately 3500-3600. The tracker only manages to track the rough location of the target for a few frames and loses the track completely very quickly. When we include en11.avi to en14.avi in the training data

31 If the confidence interval of the error η is below the maximum of η', we consider that the tracker successfully tracks the target.
and re-estimate A0, A1 and B, the tracker's performance becomes better and the average error decreases to approximately 45-60. Figure 104 in the appendix shows the MSE per frame for the above experiments. We can observe that the tracker follows the right location of the target for most of the frames throughout the sequence. We believe that if the training data are large enough and carefully chosen, a good approximation of the coefficients A0, A1 and B can be obtained.
In order to further test whether coefficients A0, A1 and B trained from data taken by one camera can be used in tracking sequences taken by the same camera, we further manually limit the initial range as in Table 14, so that the searching area and the initial state are very close to the true location and state of the target. With these new range inputs, we conduct experiments no.11 to no.17 again, and the results are shown in Table 15 in the appendix. It is still very difficult to conclude whether coefficients A0, A1 and B trained from data taken by one camera are stable and good enough to be used in tracking, as only the re-run of experiment no.11 (experiment no.19) shows a satisfactory result; the other experiments still have very large errors. The only conclusion we can make is that the more accurately the coefficients A0, A1 and B represent the system motion model, the better the tracker performs. The tracker's performance is better and more stable when the coefficients A0, A1 and B are trained from the sequence itself, as in experiments no.2, 4, 6, 8 and 10.
We observed in the above experiments that in many failed cases the tracker keeps on tracking the same wrong target, even though sometimes the real target is hit by one of the particles. If the tracker had tracked various different locations with big variations, we could have concluded that we cannot use A0, A1 and B trained from other sequences; however, we cannot observe such phenomena. Take experiment no.24 for example: Figure 54 shows some of the tracking results when tracking billboard 'e'. The tracker tracked the right target until the 20th frame; it then keeps on tracking the corner of billboard 'Harboe' from the 20th frame till the end of the test sequence ts1.avi.
Figure 54 Tracking results from experiment no.24 (15th to 36th frames)

We can further observe from experiments no.12 and no.20, where we tracked the billboard 'e' in the test video sequence tn3.avi, that the tracker jumps to another target at almost the same frame in both experiments. The same occurred in experiments no.17 and no.24, where we tracked the billboard 'e' in the test video sequence ts1.avi.

a) top: retn32.avi (exp. no.12), bottom: retn3R1.avi (exp. no.20) b) top: rets16.avi (exp. no.17), bottom: rets1R3.avi (exp. no.24)
Figure 55 Error per frame for experiments no.12 and no.20 and experiments no.17 and no.24. The x-axis denotes the frame number and the y-axis denotes the MSE at each frame.
Another very important factor affecting the performance of the tracker is thus the set of edges detected by the Canny edge detector in the measurement step. As discussed in Chapter II Image Features for Tracking, different Canny thresholds result in different edge images. A low threshold value might keep many edges in the searching area of the edge image of each frame, which might mislead the tracker, as shown in the figure below. When the threshold is set to 0.3, the 'e' billboard edges are retained; however, the edges of the nearby billboard are kept as well. If many particles hit both red circles, the finally located target might not be the true target at all. So instead of tracking 'e', the tracker might track the nearby similar curves, or the final target will be located in the middle of the two red circles.
a) Edge map of the 1st frame of the test video sequence tn3.avi
b) Tracking result of the 15th frame (experiment no.20): the location of the target is in the middle of the two red circles in a)
c) Tracking result of the 15th frame (experiment no.20): the target is the nearby similar curve in the red circles in a) instead of 'e'.
Figure 56 Influence of the Canny edge threshold value on the tracker's performance

If the Canny edge threshold is set higher, the nearby edges might not be kept in the edge image, but at the same time the target's edges might be excluded as well. It is very difficult to set a single threshold value that suits all occasions, especially when the camera changes zooming or moves very fast throughout a sequence, because the target might become more blurred due to motion blur or being out of focus. As shown in Figure 57, in the video sequence tn2.avi the camera pans very fast. Even though the camera zooms in this video sequence, the edges of the target are not fully visible due to strong motion blur. The left corner edges of the target are missing (as in the red circle in the figure), so the tracker is misled to the edges of the players.
a) Tracking result of the 4th frame (experiment no.25) b) Part of the edge image, as in Table 16 in the appendix
Figure 57 Influence of motion blur on the tracker's performance

The failure caused by the edge image can also be seen later in image d) of Figure 58 in this chapter, where we use the default A0, A1 and B for tracking. In all the experiments we actually use different thresholds for different sequences in order to exclude or reduce the influence of the Canny edge threshold value. During actual real-time tracking, however, we cannot change the threshold value manually. So one optimal threshold value needs to be defined beforehand, or the tracker should automatically generate different thresholds to suit tracking under different zooming effects; we leave this for future improvement due to the time limit.
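The threshold trade-off discussed above can be illustrated with a much-simplified one-dimensional stand-in for the Canny detector: thresholding the gradient magnitude of an intensity profile. Real Canny adds smoothing, non-maximum suppression and hysteresis, and the intensities below are invented.

```python
# Intensity profile across a scanline: a strong billboard edge (0 -> 10),
# a weaker nearby edge (10 -> 13) and the falling billboard edge (13 -> 0).
signal = [0, 0, 10, 10, 10, 13, 13, 13, 0, 0]
grad = [abs(b - a) for a, b in zip(signal, signal[1:])]  # finite differences

def edge_positions(grad, threshold):
    """Positions whose gradient magnitude exceeds the threshold."""
    return [i for i, g in enumerate(grad) if g > threshold]

low_t  = edge_positions(grad, 2)   # low threshold keeps the weak edge too
high_t = edge_positions(grad, 5)   # high threshold drops the weak edge
```

A low threshold keeps both the target's edges and the nearby clutter (three positions survive); a high one drops the weak edge, and in the worst case it could drop a weak target edge as well.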
Besides the problem with the edges, the way we calculate the weight of each particle in the measurement step has limitations as well. Unlike in the template matching, the gradient direction information is not used. As such, the tracker only checks the distance to the nearest edge, and thereby the shape of the curve is effectively ignored.
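The measurement step as described, weighting a particle by the distance from its curve points to the nearest edge while ignoring gradient direction, might look like the following sketch; the Gaussian weighting function and all coordinates are our assumptions.

```python
import math

# Edge pixels of an (invented) edge image and two candidate curves.
edges = [(10, 10), (10, 11), (10, 12), (40, 40)]

def nearest_edge_distance(point, edges):
    return min(math.dist(point, e) for e in edges)

def particle_weight(curve_points, edges, sigma=5.0):
    """Weight decays with the mean nearest-edge distance of the curve points.

    Only distances enter the weight, so, as noted above, the shape of the
    curve and the edge gradient direction are effectively ignored.
    """
    d = sum(nearest_edge_distance(p, edges) for p in curve_points) / len(curve_points)
    return math.exp(-(d * d) / (2 * sigma ** 2))

on_edges = particle_weight([(10, 10), (10, 12)], edges)   # lies on edges
far_away = particle_weight([(30, 30), (35, 35)], edges)   # far from edges
```

A curve lying exactly on edge pixels gets the maximum weight even if the edges belong to a different but similarly curved billboard, which matches the failure mode observed in the experiments above.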
(2) A0, A1 and B trained from sequences taken by different cameras

Based on the results of the above experiments, we do not expect the tracker to track the target successfully using A0, A1 and B trained from sequences taken by another camera or by different cameras. Table 7 and Table 8 show that the tracker fails to track the right target in all cases. The errors per frame are shown in Figure 107 in the appendix.
Exp. No. | Test data | Result file | Confidence interval of the error η | Confidence interval of manual tracking error η'32 | Standard deviation
26 | e1.avi | ree1R7.avi | 48792 < η < 43804 | 1.6350 < η' < 14.7650 | 44102
27 | tn3.avi | retn3R5.avi | 5784 < η < 5800 | 0.5081 < η' < 26.7419 | 866.8
28 | new11.avi | renew11R11.avi | 383.2 < η < 402.2 | 7.3299 < η' < 43.12 | 185.13
29 | ts1.avi | rets111.avi | 10282 < η < 10292 | 3.8960 < η' < 18.2540 | 3091.2
30 | ts2.avi | rets25.avi | 7022 < η < 7033 | 3.7280 < η' < 23.5720 | 6269.7
Table 7 Errors in tracking using A0, A1 and B trained from sequences taken by another camera

Exp. No. | Test data | Result file | Confidence interval of the error η | Confidence interval of manual tracking error η'33 | Standard deviation
31 | e1.avi | ree1R8.avi | 22435 < η < 22447 | 1.6350 < η' < 14.7650 | 11864
32 | tn3.avi | retn3R4.avi | 4944 < η < 4959 | 0.5081 < η' < 26.7419 | 623.92
33 | new11.avi | renew11R12.avi | 106.3 < η < 125.3 | 7.3299 < η' < 43.12 | 76.814
34 | ts1.avi | rets1K9.avi | 14091 < η < 14101 | 3.8960 < η' < 18.2540 | 21723
35 | ts2.avi | rets26.avi | 1539 < η < 1550 | 3.7280 < η' < 23.5720 | 2732.5
Table 8 Errors in tracking using A0, A1 and B trained from sequences taken by both cameras

32 If the confidence interval of the error η is below the maximum of η', we consider that the tracker successfully tracks the target.
33 If the confidence interval of the error η is below the maximum of η', we consider that the tracker successfully tracks the target.
(3) Tracking non-corner billboards with A0, A1 and B trained from a corner billboard

Most of the non-corner billboards are occluded because players are in front of them. As such, the experiments on tracking a non-corner billboard with A0, A1 and B trained from a corner billboard are shown in section 5.5.3.5 Experiment on occluded billboards and billboards which partly run out of the scene.
5.5.3.3. Experiment on tracking with default A0, A1 and B

As discussed earlier in section 5.5.2, we would like to test whether we can use a likely default set of A0, A1 and B as the coefficients. Table 9 and Figure 58 below clearly show the error for each test sequence. As in the experiments done earlier, the tracker failed to track the right target for test sequences e1.avi and tn3.avi; instead, the tracker keeps on tracking almost the same wrong places. We assume the reasons are the same as those discussed so far.

In experiments no.41 to 43, the tracker managed to track the target's rough location, though the scaling and shearing are not as accurate as in experiments no.2, 4, 6, 8 and 10, where we used the coefficients A0, A1 and B trained from the sequence itself. With the default A0, A1 and B, the tracker's performance is as unstable as in the earlier experiments (no.11-no.35). As shown in experiment no.42, when a player runs near the billboard, the bottom edge line of the target becomes invisible; furthermore, the players have strong edges in the searching area. As such, the tracker jumps to the player and starts tracking the wrong target, even though it had successfully tracked the right target for 19 frames.
Exp. No. | Test data | Result file | Confidence interval of the error η | Confidence interval of manual tracking error η'34 | Standard deviation | Comment
39 | e1.avi | ree1Default2.avi | 21484 < η < 21496 | 1.6350 < η' < 14.7650 | 13728 | initial ranges used are those used in experiments no.19-no.24
40 | tn3.avi | retn3Default2.avi | 5663 < η < 5678 | 0.5081 < η' < 26.7419 | 2065.5 | ''
41 | new11.avi | renew11Default2.avi | 639.3 < η < 658.4 | 7.3299 < η' < 43.12 | 2287.1 | ''
42 | ts1.avi | rets1Default1.avi | 8283 < η < 8292 | 3.8960 < η' < 18.2540 | 8351.7 | ''
43 | ts2.avi | rets2Default1.avi | 109.7 < η < 120.0 | 3.7280 < η' < 23.5720 | 45.091 | ''
Table 9 Errors in tracking using default A0, A1 and B
a) ree1Default2.avi: The tracker failed to track the target. (exp. no.39)
b) retn3Default2.avi: The tracker keeps on tracking the same wrong place as in other experiments, partly due to the edge image, as described earlier in this section. (exp. no.40)
c) renew11Default2.avi: The tracker keeps on tracking the right target in most of the frames. The high peak error in the pink circle occurs when the tracker jumps to the nearby 'harboe' billboard. (exp. no.41)
d) rets1Default1.avi: At the 20th frame, the tracker jumps to the player as the bottom edge line is no longer visible. (exp. no.42)
e) rets2Default1.avi: The tracker keeps on tracking the target throughout the sequence, but the scaling and shearing are not accurate. (exp. no.43)
Figure 58 Tracking result and error per frame using default A0, A1 and B

34 If the confidence interval of the error η is below the maximum of η', we consider that the tracker successfully tracks the target.

5.5.3.4. On-line training

We observed from the above experiments that both the default and the trained A0, A1
and B from the same video sequence give a fairly good tracking result; we would therefore like to combine them to do on-line training. The questions are when and how often we should train on the data. Our experiments show that it is not a good idea to start training from the very beginning, because initially the particles are randomly chosen and do not represent the system's motion well. We start to train from the 31st frame. The tracking result is not as accurate as when using A0, A1 and B trained from the test sequence itself, but the tracked curve is very close to the target. Furthermore, we can also conclude that it is not necessary to train a new set of A0, A1 and B for every frame; however, it is difficult to find an optimal interval for training. The figure below shows some of the tracking results using on-line training. Two experiment results can be seen in Figure 109 and Table 17 in the appendix.
Figure 59 Tracking using on-line trained A0, A1 and B (31st, 36th, 41st and 46th frames), some results from experiment no.44
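The on-line re-estimation could, for one state component, be done by least squares on the tracked states, fitting q_t ≈ a0·q_{t-2} + a1·q_{t-1}. Treating each diagonal component independently and using synthetic noiseless data are our simplifications; in practice B would additionally be estimated from the fit residuals.

```python
def fit_a0_a1(q):
    """Least-squares fit of q[t] ~ a0*q[t-2] + a1*q[t-1] via 2x2 normal equations."""
    s00 = s01 = s11 = b0 = b1 = 0.0
    for t in range(2, len(q)):
        x0, x1, y = q[t - 2], q[t - 1], q[t]
        s00 += x0 * x0; s01 += x0 * x1; s11 += x1 * x1
        b0 += x0 * y;   b1 += x1 * y
    det = s00 * s11 - s01 * s01
    return (s11 * b0 - s01 * b1) / det, (s00 * b1 - s01 * b0) / det

# Synthetic x-translation track generated by a0 = 0.3, a1 = 0.72 (no noise)
q = [100.0, 101.0]
for _ in range(20):
    q.append(0.3 * q[-2] + 0.72 * q[-1])
a0, a1 = fit_a0_a1(q)   # recovers approximately (0.3, 0.72)
```

Retraining only every few frames, as suggested above, then amounts to calling such a fit on a sliding window of the tracked Q vectors instead of at every frame.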
5.5.3.5. Experiment on occluded billboards and billboards which partly run out of the scene

It is very difficult to train A0, A1 and B by manually tracking the billboards at each side of the playfield, as most of the billboards at each side are either totally or partly covered by the players, or totally or partly out of the scene. Table 10 below shows the experimental results of tracking billboards at the side of the playfield using both A0, A1 and B trained by tracking the corner billboard in training data taken by the same camera, and A0, A1 and B trained by tracking the side billboard in the test sequence itself. Because experiments no.16 and no.18 show very good tracking results using A0, A1 and B trained from sequences taken by the same camera, we test only these two video sequences: in test sequence ts2.avi, billboard 'e BOKS' is occluded, and in test sequence new11.avi, billboard 'EL GIGANTEN', in the pink circle shown in Figure 60, is also occluded.
a) 1st frame of new11.avi. The billboard we name EL1 eventually goes out of the screen; the billboard we name EL2 has players in front of it.
b) 1st frame of ts2.avi
Figure 60 Video sequences used to test occlusion cases
Exp. No. | Test data | Result file | Coefficients A0, A1 and B | Confidence interval of the error η | Confidence interval of manual tracking error η'35 | Standard deviation
46 | ts2.avi | reBoksTs25.avi | trained by tracking the side billboard itself in the same sequence | 686.0 < η < 696.0 | 3.7280 < η' < 23.5720 | 758.07
47 | '' | reBoksTs24.avi | trained by manually tracking the corner billboard | 2383 < η < 2393 | '' | 1075.4
48 | new11.avi | reelg03.avi | trained by tracking the side billboard itself in the same sequence | 4444 < η < 4463 | 7.3299 < η' < 43.12 | 3232.4
49 | '' | elgan12.avi | trained by manually tracking the corner billboard | 45044 < η < 45064 | '' | 35028
Table 10 Errors in tracking using A0, A1 and B trained by manually tracking the corner billboard and those trained by tracking the side billboard itself in the same sequence (experiments no.46-49)
35 If the confidence interval of the error η is below the maximum of η', we consider that the tracker successfully tracks the target.

As shown in Table 10, using the coefficients A0, A1 and B trained by tracking the corner billboard in training data taken by the same camera, the tracker's performance is not satisfactory. In the case of tracking the 'eBOKS' billboard, the tracker managed to track the rough location of the billboard, but the rotation of the tracking result is very wrong from the very beginning, even though in the initial two frames it tracked almost the correct target (see Figure 61).
Figure 61 Tracking result of experiment no.47 (reBoksTs24.avi), 1st to 4th frames

On the contrary, the tracking result using the coefficients A0, A1 and B trained by tracking the 'eBOKS' billboard in the same test video sequence is much better. We assume one of the reasons is that we treat the billboards as affine transformed and ignore the projection from 3D to 2D images.

Tracking 'EL GIGANTEN' is very difficult. We can observe that the tracking result is very bad. One reason is clearly the Canny edge threshold, as we have observed in all the previous experiments; when we lower the Canny edge threshold, the error seems to decrease. Since our measurement step uses only the edge information to define the degree of fitness, the edges in each frame are very decisive for the tracker's performance. The figure below shows the Canny edge images of some frames. As we observed, there are seldom clear edges for the 'EL GIGANTEN' billboard, even though we have lowered the Canny edge threshold to 0.3 or 0.1. The strong edges for this billboard are the bottom line and the lines around the text 'EL GIGANTEN.' These lines are, however, not a unique representation of the 'EL GIGANTEN' billboard.
Figure 62 Canny edges of the 'ELGIGANTEN' billboard (1st, 6th, 14th and 24th frames)

Further to testing the tracking result for cases like occlusion and parts of the image running out of the frame, we experiment on the video sequences eOcclusion.avi, sparOut.avi and brother.avi: in eOcclusion.avi, billboard 'e' is occluded in the last few frames; in sparOut.avi, billboard 'SPAR' runs out of the frame image in the last few frames; and in brother.avi, billboard 'brother' is occluded from the start. The results are shown in Table 18 in the appendix.

Figure 63 Test data (eOcclusion.avi, sparOut.avi, brother.avi), experiments no.50-52
We observe that when occlusion occurs, the tracker's performance is affected. Take 'eOcllusion3.avi' for example: before the occlusion happens, the tracker's performance is pretty good. When occlusion occurs, as long as it does not significantly change the shape of the curve, the tracker manages to locate the target, as shown in Figure 64. But when the occlusion covers most of the curve, the tracker fails to track the target.

Figure 64 Tracking an occluded object (11th to 26th frames), experiment no.50
For the case of tracking billboard 'brother,' the result is not satisfactory. We analyse that one reason is that the occlusion occurs right from the start, so it is difficult to locate the right candidate in the first few frames, and after a few very wrong tracking steps it is difficult for the tracker to recover the target again. Another reason for the bad result is that the 'brother' billboard is like the 'EL GIGANTEN' billboard: they are rather long. As such, when we increase the searching area, the number of particles might also need to be increased in order to get a satisfactory tracking result. Furthermore, most of the billboards on the side are the same billboards lying one after another, so it is very likely that the tracker will switch to tracking another billboard with the same pattern instead of the one we target.
103
5.6. Further Discussions

5.6.1. Optimal Definition of the Final Q at the Measurement Step

There are n Q vectors at every time step for n samples, because each sample has a unique 6×1 Q vector for its transformation. Since we have no clue about what exactly the correct transformation is, an optimal Q vector should be defined based on convincing reasoning.

Our tracker calculates a weighted average of the Q vectors, because it takes all samples into account based on their probabilities. The weighted average can be a good alternative when the distribution is symmetric and steeply unimodal, as shown in Figure 65 a).

a) b) c)
Figure 65 Weighted average in different distributions

However, when the distribution is asymmetric as in Figure 65 b), the weighted average may not always lead to a good result, because the value with the highest weight does not contribute much. Instead, the mode, as shown in Figure 66 a), reflects the real probability.

a) b) c)
Figure 66 Mode in different distributions
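The difference between the weighted average and the mode can be made concrete with a small numerical sketch (Python here for brevity, although our implementation is in Matlab; all names and numbers are illustrative). With particles clustered around two billboards, the weighted mean falls between the clusters while the highest-weight particle stays on one:

```python
import numpy as np

def weighted_mean(states, weights):
    """Weighted average of particle states (our tracker's estimate)."""
    return np.average(np.asarray(states, dtype=float), axis=0, weights=weights)

def mode_state(states, weights):
    """State of the single particle with the highest weight (the mode)."""
    return np.asarray(states)[int(np.argmax(weights))]

# Two clusters of 1-D particle states around 0 and 10 (two billboards).
states = np.array([0.0, 0.1, -0.1, 10.0, 9.9, 10.1])
weights = np.array([0.18, 0.15, 0.15, 0.22, 0.15, 0.15])

est_mean = weighted_mean(states, weights)   # lands between the clusters
est_mode = mode_state(states, weights)      # stays on one cluster
```

Here `est_mean` is 5.2, in between the two targets, whereas `est_mode` is 10.0, on one of them.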
When there are two peak modes as shown in Figure 66 b) and c), using the mode is better than using the weighted average. At most handball games, there are always a few identical billboards in a scene. These billboards are placed either one after another or at different places. Our current algorithm gives only one best result. As such, if two billboards are hit by the particles, their weights will both be high, and we face the problem as in Figure 66 c): the weighted average will fail to track either good candidate; instead, the final result will lie somewhere in between the two targets.

All the discussions so far handle tracking only a single billboard. Supposing the tracking is correct and there are a few of the same billboards in a scene, it might be a good idea to set up a threshold to check the fitness; all the particles whose fitness degree is above the threshold can then represent the multiple targets.

5.6.2. Optimal Number of Particles

Isard and Blake (1998) argue that, with 1,000 or more samples, a particle filter's performance improves a lot in tracking a shaking object at a natural speed in a six-dimensional shape space. Our results support their findings as well, even though we tested with fewer samples (100, 200 and 300) due to running-time constraints. We assume that, using a larger number of particle samples and a faster computer language than Matlab, our tracker's performance would be much better. As mentioned before, an increased number of particles increases the computation time, so it is important to find an optimal number of particles where the required quality and the speed of tracking are both at an acceptable level.

We observed that, when using the trained A0, A1 and B, after some frames the particle samples are more or less clustered at the same place, as shown in Figure 67. As such, we assume that we can reduce the number of particles after a certain number of frames by re-sampling only those particles which have higher probabilities and ignoring the ones whose probabilities are very low.
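The proposed reduction can be sketched as follows (a hypothetical Python illustration, not our Matlab code; `keep_fraction` is an assumed tuning parameter):

```python
import numpy as np

def reduce_particles(states, weights, keep_fraction=0.5):
    """Keep only the highest-probability particles once the filter is
    stable, then renormalise the surviving weights."""
    w = np.asarray(weights, dtype=float)
    n_keep = max(1, int(len(w) * keep_fraction))
    keep = np.argsort(w)[-n_keep:]          # indices of the heaviest particles
    new_w = w[keep] / w[keep].sum()
    return np.asarray(states)[keep], new_w

states = np.arange(10.0)                    # ten dummy 1-D states
weights = np.array([.01, .01, .01, .01, .01, .05, .1, .2, .3, .3])
s2, w2 = reduce_particles(states, weights, keep_fraction=0.4)
# the four heaviest particles survive and their weights sum to 1 again
```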
1st frame-randomly throw particles in the range area 3rd frame
7th frame
Figure 67 Particles are almost of the same size and shearing as tracking continues.
5.6.3. Initialization and the Stability of the Filter

As mentioned earlier in this chapter, in order to run the condensation algorithm, we need to generate a number of particle samples and assign each particle a weight
according to how well the particle fits the template. Since the initial particles are randomly generated, the results for the first two frames are often inaccurate. As long as the results of the first two steps are not badly wrong, the condensation algorithm will gradually locate the target through iterative sampling, and the particle filter's performance becomes stable (as shown in Figure 68). It is very
important that the template is unique enough to prevent the tracker jumping to another
similar object. It also requires that the curves chosen as the template should be strong
edges so that they will be visible in the binary edge image.
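The initialization step can be sketched like this (an illustrative Python version of the idea, not our Matlab code; the range values merely echo the style of Table 13):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_particles(n, q_min, q_max):
    """Draw n initial 6-D Q states uniformly inside a user-given range,
    with uniform weights; the measurement step then reweights them."""
    q_min = np.asarray(q_min, dtype=float)
    q_max = np.asarray(q_max, dtype=float)
    states = rng.uniform(q_min, q_max, size=(n, len(q_min)))
    weights = np.full(n, 1.0 / n)
    return states, weights

# A range in the style of Table 13 (translation, scaling, shearing terms).
q_min = [480, 120, -0.5, -0.5, 0.0, 0.0]
q_max = [580, 170, -0.25, -0.25, 0.0, 0.0]
states, weights = init_particles(200, q_min, q_max)
```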
1st frame 2nd frame 3rd frame 4th frame
Figure 68 Initial results & tracking results
However, in real-time tracking, the video sequence cannot be trained beforehand, so only the coefficients A0, A1 and B trained from previous games are used to approximate the system's motion model. In such a case, an accurate initialization is preferable to ensure successful tracking.

VI. Comparison of Template Matching and Condensation

In chapter III and chapter IV, we discussed the deformable template matching and condensation algorithms respectively and demonstrated these with experiments
discussed in chapter V. The two methods have many common features. First of all, a
good unique representation of the template is required in both methods. Secondly, the
shape of the template can be either rigid or non-rigid (deformable), although, in the
template matching method, we considered only the rigid case. Thirdly, both the
deformable template matching and the condensation tracker use edge information to
check the fitness level, i.e., we use the edge information to approximate the posterior
probability of each sample (each location in the template matching method). In order
to approximate the posterior probability, the distances between particle’s curves and
their nearest edges in the frame image are calculated.
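The fitness computation described above can be sketched as follows (a simplified Python illustration with a brute-force nearest-edge search; our Matlab code is different, and the Gaussian weighting constant `sigma` is an assumption of this sketch):

```python
import numpy as np

def edge_distance_score(curve_points, edge_map, sigma=2.0):
    """Approximate a particle's (unnormalised) posterior weight from the
    distances between its curve points and the nearest edge pixels in a
    binary edge map. Brute force; a distance transform would be faster."""
    edges = np.argwhere(edge_map)             # (row, col) of all edge pixels
    pts = np.asarray(curve_points, dtype=float)
    # distance from every curve point to its nearest edge pixel
    d = np.sqrt(((pts[:, None, :] - edges[None, :, :]) ** 2).sum(-1)).min(1)
    # smaller distances -> higher probability
    return float(np.exp(-(d ** 2).mean() / (2 * sigma ** 2)))

edge_map = np.zeros((20, 20), dtype=bool)
edge_map[10, :] = True                        # a single horizontal edge line
on_edge  = edge_distance_score([[10, 3], [10, 15]], edge_map)
off_edge = edge_distance_score([[2, 3],  [2, 15]], edge_map)
# a curve lying on the edge scores higher than one far away
```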
Both methods also have unique features. First, in the template matching algorithm, the gradient directions of the template edge and the frame image edge are used in approximating the posterior probability. Secondly, in the template matching, the
object is located in each frame by minimizing the total energy and does not utilize any
relationship between the frames such as the motion pattern. As such, the result from
the previous frame does not affect the result from the next frame and the tracking
from frame to frame is totally independent. We could utilize the result from the
previous frame by limiting the search region around the previous location of the target
object based on the assumption that the object does not move too quickly.
In contrast, the condensation tracker is a combination of factored sampling and a stochastic model for object motion. A number of particles are generated, each of which has a state, defined as a Q-space representation of the template curves, and an associated weight. The states of the particles at time t depend on their states at time t-1. As such, the result from the previous frame affects the result in the next frame. The final object location is obtained as the weighted average location of the particles. This approach is more robust and time-saving.

It is clear that the condensation algorithm is more suitable for real-time tracking with respect to speed. At the same time, condensation has its difficulties, such as initialization and inaccuracy at the beginning of the tracking. Deformable template matching, on the other hand, can locate the right object right from the start, given a good representation of the template and a clear edge image. This is achieved by a huge amount of computation, which is very time-consuming.

Since each method has its advantages and disadvantages, we can try to utilize both of them. For example, it may be a good idea to use the deformable template matching to locate the object initially and then define the initial particles' range according to the location of the detected object. As such, an automatic initialization becomes possible. Of course, this will cause problems when a new clip starts, since new particles need to be generated, and the long processing time required by the template matching method will cause delays in the broadcasting.
VII. Conclusion and Future Improvements

We have implemented algorithms which can achieve real-time tracking of a billboard in a handball game. Specifically, we studied two algorithms, deformable template matching and the condensation algorithm, made a program for each of these, and tested them in terms of speed and accuracy. We have also discussed what constitutes a good template representation and the handling of occlusion.

Tracking in real time is difficult, especially when the background is cluttered and the motion is complicated. The deformable template matching successfully finds a target by searching for edges which minimize the potential energy by penalizing the deformation of the template. It takes a lot of time even to process just one frame; we think it is more suitable for locating objects and retrieving them from a database, where processing speed is not crucial.

The condensation algorithm, on the other hand, combines statistical factored sampling for non-Gaussian observations in a sequence with deterministic and stochastic effects in a dynamic model. This algorithm is fast enough to track non-rigid motion even with a complicated background. Motion blur, however, is a problem for robust tracking. A good number of samples is required to improve the accuracy, at the cost of processing time. Tracking can be achieved in real time if a programming language faster than Matlab is used. Furthermore, the number of control points can also affect the processing time: the more control points used, the more accurately the template is represented, and the fewer points used in the measurement step, the faster the performance.

The following features can be added to the tracker in the future in order to improve the speed and the accuracy of the performance.

First, our tracker uses a Q-space representation of the curve as the state of the particles and uses edge information in the measurement step to check how well each
particle fits the template. However, other state parameters and image features can also
be used. For example, we can define the state as a searching window. Instead of
checking the nearest edges, we can compare the histogram or color combinations of
each searching window with that of the template window and then use the difference
to approximate the probability.
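One possible form of such a histogram measurement is sketched below (Python, illustrative; the Bhattacharyya coefficient is one assumed choice of similarity measure, not one fixed by this thesis):

```python
import numpy as np

def hist_similarity(window_a, window_b, bins=16):
    """Bhattacharyya coefficient between the normalised intensity
    histograms of two image windows (1 = identical distributions)."""
    ha, _ = np.histogram(window_a, bins=bins, range=(0.0, 1.0))
    hb, _ = np.histogram(window_b, bins=bins, range=(0.0, 1.0))
    pa = ha / max(ha.sum(), 1)
    pb = hb / max(hb.sum(), 1)
    return float(np.sqrt(pa * pb).sum())

rng = np.random.default_rng(1)
template  = rng.uniform(0.4, 0.6, size=(8, 8))   # mid-gray billboard patch
similar   = rng.uniform(0.4, 0.6, size=(8, 8))
different = rng.uniform(0.0, 0.2, size=(8, 8))   # dark background patch
s1 = hist_similarity(template, similar)
s2 = hist_similarity(template, different)
# s1 exceeds s2, so the similarity can approximate the probability
```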
Secondly, we can eliminate particles with low likelihood, as Rui and Chen (2001) suggest. In order to do so, we should find a good threshold value to eliminate very unlikely particles. Then all the particles will be our targets as long as their values obtained from the measurement step are higher than the threshold. This threshold value will also help us to realize that the targets are no longer in the scene or that the tracker has lost track; in such a case, we should reinitialize the particles. Alternatively, we can reduce the number of particles when the tracker becomes stable in order to reduce the computational load, and subsequently the processing time, without lowering the quality. Through the iterations, the more likely particles gradually get higher probabilities and the less likely particles get lower ones. Until good particles have been selected, it is important to have many samples; afterwards the number does not matter much. We should find a reasonable number of iterations for the tracker to stabilize, and after that number of iterations we can reduce the number of particles.

Thirdly, in order to make the replacement as natural as possible, in the future we need to deal with motion blur and occlusion. If we can estimate the amount of motion blur in the video sequence, we can remove it when getting the edge map of the frames and put the same motion blur back when inserting the new billboard. Furthermore, we need to identify players and make masks for them, so that we can detect occlusion and put the players back after inserting new billboards.

Finally, a combination of template matching and condensation might improve the tracker's performance. Since the reliability of the condensation tracker improves a lot if the initial Q-space state is sampled close to the target, initialization can be done through template matching. This can be feasible even if it takes one minute for initialization, as long as the delay in presentation time is kept equal for every frame and the sequence looks natural (Lu, 1997). Jain et al. (1998) also suggest the possibility of combining Kalman filtering or other prediction schemes with the template matching.
The template matching can also be improved in some respects. First, as we mentioned earlier, when implementing the deformable template matching method, we made the assumption that the template does not deform, so that we could ignore the internal energy part in equation 8. However, a 3D-to-2D projection is likely to deform the shape of an object. Even though the deformation is small, it should be included in the energy minimization calculation. At the same time, this inclusion takes additional computation, which results in a longer processing time.

Secondly, we could utilize the variable m in equation 3 to change the locality and smoothness of the model. The original algorithm uses a threshold to decide whether an object is the target object; if it is a good candidate, m is incremented by 1. Such an elaborate screening mechanism searches objects more thoroughly, which may result in more computational load together with more accuracy. In addition, we have no idea which threshold we should use; we choose the object with the minimum energy defined by equation 7, which means that we can detect only one object.

Finally, Jain and Zhong (2000) discuss that, in the first step, region screening by texture and color information is quite useful for reducing processing time. Combining edge features together with other features might improve the speed and accuracy of this step as well.
Appendix
A. Gradient Magnitude for Pixel Representation
The gradient magnitude is defined as

$\nabla f = \left[ \left( \frac{\partial f}{\partial x} \right)^2 + \left( \frac{\partial f}{\partial y} \right)^2 \right]^{1/2} = \left[ G_x^2 + G_y^2 \right]^{1/2} \approx |G_x| + |G_y|$   eq. 51

for an image function f (Gonzalez, 2001), where $[G_x \; G_y]^T$ is called the gradient. Often intensities are assigned to each pixel as shown in Figure 69.
z1 z2 z3
z4 z5 z6
z7 z8 z9
Figure 69 A 3×3 region of an image, where the z values are intensity (gray-level) values

In this case, the gradient magnitude is calculated as

$\nabla f \approx |z_9 - z_5| + |z_8 - z_6|$   eq. 52

or, if we take gradients in four directions instead of two,

$\nabla f \approx |(z_7 + 2z_8 + z_9) - (z_1 + 2z_2 + z_3)| + |(z_3 + 2z_6 + z_9) - (z_1 + 2z_4 + z_7)|$   eq. 53
The corresponding masks for the former are called Roberts cross-gradient operators
and those for the latter are Sobel operators.
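Equation 53 can be checked with a small sketch (Python, illustrative; the 3×3 regions are made-up examples):

```python
import numpy as np

def sobel_gradient_magnitude(z):
    """Gradient magnitude at the centre of a 3x3 region using the Sobel
    operators, i.e. eq. 53 with z1..z9 read row by row."""
    z1, z2, z3, z4, z5, z6, z7, z8, z9 = np.asarray(z, dtype=float).ravel()
    gx = (z7 + 2 * z8 + z9) - (z1 + 2 * z2 + z3)   # vertical derivative
    gy = (z3 + 2 * z6 + z9) - (z1 + 2 * z4 + z7)   # horizontal derivative
    return abs(gx) + abs(gy)

flat = np.full((3, 3), 5.0)             # constant region: no edge
step = np.array([[0.0, 0.0, 0.0],       # horizontal intensity step
                 [0.0, 0.0, 0.0],
                 [1.0, 1.0, 1.0]])
m1 = sobel_gradient_magnitude(flat)     # 0 on a flat region
m2 = sobel_gradient_magnitude(step)     # strong response across the step
```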
B. Color Model Conversion from RGB to HSI
$H = \begin{cases} \theta & \text{if } B \le G \\ 360 - \theta & \text{if } B > G \end{cases}$   eq. 54

with

$\theta = \cos^{-1} \left\{ \dfrac{\tfrac{1}{2}\left[(R-G) + (R-B)\right]}{\left[(R-G)^2 + (R-B)(G-B)\right]^{1/2}} \right\}$   eq. 55

$S = 1 - \dfrac{3}{R+G+B}\,\min(R, G, B)$   eq. 56

$I = \dfrac{R+G+B}{3}$   eq. 57
We convert the color information into the intensity values by equation 57 in order to
calculate the gradient and subsequently find edges.
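Equations 54-57 can be sketched directly (Python, illustrative; inputs are assumed to be in [0, 1] and H is returned in degrees):

```python
import numpy as np

def rgb_to_hsi(r, g, b):
    """Convert one RGB triple (values in [0,1]) to HSI using eqs. 54-57."""
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b))
    # clip guards against rounding just outside arccos's domain
    theta = np.degrees(np.arccos(np.clip(num / max(den, 1e-12), -1.0, 1.0)))
    h = theta if b <= g else 360.0 - theta          # eq. 54
    i = (r + g + b) / 3.0                           # eq. 57
    s = 0.0 if i == 0 else 1.0 - min(r, g, b) / i   # eq. 56
    return h, s, i

h, s, i = rgb_to_hsi(1.0, 0.0, 0.0)   # pure red: fully saturated, hue 0
```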
C. Estimation of A0, A1 and B by the Maximum Likelihood Estimation Method

First we maximize the log-likelihood with respect to $A_0$ and $A_1$. Ignoring the second term of equation 43 on page 56, which is independent of $A_0$ and $A_1$, this is equivalent to minimizing the following with respect to $A_0$ and $A_1$:

$f(A_0, A_1) = \sum_{n=1}^{m-2} \left\| B^{-1} \left( Q_{n+2} - A_1 Q_{n+1} - A_0 Q_n \right) \right\|^2$   eq. 58

Expanding the squared norm, this can be written as

$f(A_0, A_1) = \mathrm{tr}(C^{-1} Z)$   eq. 59

where $C = B B^T$ and

$Z = S_{22} - S_{21} A_1^T - S_{20} A_0^T - A_1 S_{12} + A_1 S_{11} A_1^T + A_1 S_{10} A_0^T - A_0 S_{02} + A_0 S_{01} A_1^T + A_0 S_{00} A_0^T$

with

$S_{ij} = \sum_{n=1}^{m-2} Q_{n+i} Q_{n+j}^T$, $i, j = 0, 1, 2$   eq. 60

From equation 58, $f$ is non-negative and quadratic, so the minimum of zero is achieved when

$Q_{n+2} - A_1 Q_{n+1} - A_0 Q_n = 0$

Multiplying this condition by $Q_n^T$ and by $Q_{n+1}^T$ respectively and summing over $n$ gives

$\left( Q_{n+2} - A_1 Q_{n+1} - A_0 Q_n \right) Q_n^T: \quad S_{20} - A_1 S_{10} - A_0 S_{00} = 0$
$\left( Q_{n+2} - A_1 Q_{n+1} - A_0 Q_n \right) Q_{n+1}^T: \quad S_{21} - A_1 S_{11} - A_0 S_{01} = 0$   eq. 61

In this way the solutions $\hat{A}_0$ and $\hat{A}_1$ are obtained independently of $C$.

Now we estimate $C$. Rewriting equation 43 on page 56,

$L = -\tfrac{1}{2} \mathrm{tr}(C^{-1} Z) + \tfrac{1}{2}(m-2) \log \det C^{-1}$

Taking the derivative with respect to $C^{-1}$ (using the identity $\partial \det(M) / \partial M \equiv \det(M)\,(M^{-1})^T$) and setting it to zero,

$\nabla_{C^{-1}} L = -\tfrac{1}{2} Z + \tfrac{1}{2}(m-2)\, C = 0$

Fixing $\hat{A}_0$ and $\hat{A}_1$ obtained in equation 61, the optimum $C$ which maximizes $L$ is given as

$\hat{C} = \frac{1}{m-2} Z(\hat{A}_0, \hat{A}_1)$   eq. 62

Since $C$ is a covariance matrix, $B$ can simply be obtained as a matrix square root of $C$:

$B = C^{1/2}$
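The estimation in equations 61 and 62 can be sketched numerically (Python, illustrative; `estimate_dynamics` is a hypothetical helper, and the matrix square root is taken by eigendecomposition). For a noise-free scalar sequence generated with known coefficients, the estimates recover them:

```python
import numpy as np

def estimate_dynamics(Q):
    """Least-squares estimates of A0, A1 and B for the model
    Q[n+2] = A1 Q[n+1] + A0 Q[n] + B w[n]  (equations 58-62).
    Q is an (m, d) array of state vectors."""
    Q = np.asarray(Q, dtype=float)
    m = len(Q)
    S = {(i, j): sum(np.outer(Q[n + i], Q[n + j]) for n in range(m - 2))
         for i in (0, 1, 2) for j in (0, 1)}
    # Equation 61 in block form: [A0 A1] [[S00 S01],[S10 S11]] = [S20 S21]
    lhs = np.block([[S[0, 0], S[0, 1]], [S[1, 0], S[1, 1]]])
    rhs = np.hstack([S[2, 0], S[2, 1]])
    A = rhs @ np.linalg.inv(lhs)
    d = Q.shape[1]
    A0, A1 = A[:, :d], A[:, d:]
    # Equation 62: residual covariance, then B as its matrix square root
    R = np.array([Q[n + 2] - A1 @ Q[n + 1] - A0 @ Q[n] for n in range(m - 2)])
    C = R.T @ R / (m - 2)
    w, V = np.linalg.eigh(C)               # C is symmetric PSD
    B = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T
    return A0, A1, B

# Noise-free scalar sequence with A0 = 0.5, A1 = 0.3
q = [1.0, 2.0]
for _ in range(20):
    q.append(0.3 * q[-1] + 0.5 * q[-2])
A0, A1, B = estimate_dynamics(np.array(q)[:, None])
```

With no process noise, the recovered B is (numerically) zero.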
D. Images from the Experiment
D.1. The Results of Section 5.4 Experiments on Deformable Template Matching
D. 1. 1. Experiment with Point Templates Given in Table 3
In the following figures blue points are manually clicked points and red points are the
best matched points on the target object.
a) Template with 4 points b) Result
Figure 70 Result 1 with ‘tøj’ -4 points - successful (The same figure as Figure 33) Total time to find the match = 30 seconds, MSE = 0.5
a) Template with 4 points b) Result
Figure 71 Result 2 with ‘tøj’ -4 points – successful Total time to find the match = 32 seconds, MSE = 1.0
a) Template with 4 points b) Result
Figure 72 Result 3 with ‘toj’ – 4 points - failed
Total time to find the best match = 30 seconds, MSE = 4578
a) Template with 4 points b) Result
Figure 73 Result 4 with ‘toj’ – 4 points - failed (The same figure as Figure 34) Total time to find the best match = 30 seconds, MSE = 4578
a) Template with 6 points b) Result
Figure 74 Result 5 with ‘El GIGANTEN’ -6 points – successful
Total time to find the best match = 47 seconds, MSE = 4.0
a) Template with 6 points b) Result
Figure 75 Result 6 with ‘El GIGANTEN’ – 6 points - successful Total time to find the best match = 47 seconds, MSE = 1.2
a) Template with 11 points b) Result
Figure 76 Result 7 with ‘El GIGANTEN’ – 11 points - successful
Total time to find the best match = 52 seconds, MSE = 2.9
a) Template with 11 points b) Result
Figure 77 Result 8 with ‘El GIGANTEN’ – 11 points – successful
(The same figure as Figure 35) Total time to find the best match = 54 seconds, MSE = 2.5
a) Template with 11 points b) Result
Figure 78 Result 9 with ‘El GIGANTEN’ – 11 points – successful Total time to find the best match = 51 seconds, MSE = 3.6
a) Template with 11 points b) Result
Figure 79 Result 10 with ‘El GIGANTEN’ – 11 points – failed
(The same figure as Figure 36) Total time to find the best match = 55 seconds
a) Template with 15 points b) Result
Figure 80 Result 11 with ‘El GIGANTEN’ – 15 points – failed Total time to find the best match = 57 seconds
a) Template with 8 points b) Result
Figure 81 Result 12 with ‘SPAR’ - 8 points – successful Total time to find the best match = 51 seconds, MSE = 1.1
a) Template with 8 points b) Result
Figure 82 Result 13 with ‘SPAR’ - 8 points - successful Total time to find the best match = 49 seconds, MSE = 12.1
a) Template with 10 points b) Result
Figure 83 Result 14 with ‘SPAR’ - 10 points – successful Total time to find the best match = 49 seconds, MSE = 1.0
a) Template with 10 points b) Result
Figure 84 Result 15 with ‘SPAR’ - 10 points - successful (The same figure as Figure 37) Total time to find the best match = 49 seconds, MSE = 0.7
a) Template with 4 points b) Result
Figure 85 Result 16 with ‘SPAR’ - 4 points - failed Total time to find the best match = 45 seconds
a) Template with 10 points b) Result
Figure 86 Result 17 with ‘SPAR’ - 10 points - failed (The same figure as Figure 38) Total time to find the best match = 49 seconds
D.1.2. Experiment with Curve Templates Given in Table 4
In the following pictures, blue points are manually clicked points in order to create a
curve. Red points are control points of the curve. Yellow points are sampled for
searching the target object.
a) Template with 6 points curve b) Result
Figure 87 Result 18 with ‘EL GIGANTEN’ - 6 points are clicked, 11 points sampled - successful Total time to find the best match = 49 seconds, MSE = 1.7
a) Template with 6 points curve b) Result
Figure 88 Result 19 with ‘EL GIGANTEN’ - 6 points are clicked, 11 points sampled - successful (The same figure as Figure 39) Total time to find the best match = 52 seconds, MSE = 3.1
a) Template with 6 points curve b) Result
Figure 89 Result 20 with ‘El GIGANTEN’ - 6 points are clicked, 11 points sampled - successful Total time to find the best match = 52 seconds, MSE = 3.9
a) Template with 6 points curve b) Result
Figure 90 Result 21 with ‘EL GIGANTEN’ - 6 points are clicked, 15 points sampled - successful Total time to find the best match = 55 seconds, MSE = 2.3
a) Template with 6 points curve b) Result
Figure 91 Result 22 with ‘EL GIGANTEN’ - 6 points are clicked, 15 points sampled - successful Total time to find the best match = 55 seconds, MSE = 2.7
a) Template with 10 points curve b) Result
Figure 92 Result 23 with ‘EL GIGANTEN’ - 10 points are clicked, 16 points sampled - successful Total time to find the best match = 56 seconds, MSE = 1.3
a) Template with 10 points curve b) Result
Figure 93 Result 24 with ‘EL GIGANTEN’ - 10 points are clicked, 16 points sampled - successful Total time to find the best match = 49 seconds, MSE = 2.6
a) Template with 6 points curve b) Result
Figure 94 Result 25 with ‘EL GIGANTEN’ - 6 points are clicked, 11 points sampled - failed (The same figure as Figure 40) Total time to find the best match = 56 seconds
a) Template with 10 points curve b) Result
Figure 95 Result 26 with ‘EL GIGANTEN’ - 10 points are clicked, 11 points sampled - failed Total time to find the best match = 55 seconds
a) Template with 6 points curve b) Result
Figure 96 Result 27 with ‘SPAR’ - 6 points are clicked, 5 points sampled - successful Total time to find the best match = 45 seconds, MSE = 15.4
a) Template with 6 points b) Result
Figure 97 Result 28 with ‘SPAR’ - 6 points are clicked, 10 points sampled - successful Total time to find the best match = 52 seconds, MSE = 8.3
a) Template with 6 points b) Result
Figure 98 Result 29 with ‘SPAR’ - 6 points are clicked, 10 points sampled - successful Total time to find the best match = 50 seconds, MSE = 11
a) Template with 10 points b) Result
Figure 99 Result 30 with ‘SPAR’ - 10 points are clicked, 10 points sampled - successful Total time to find the best match = 49 seconds, MSE = 1.8
a) Template with 10 points b) Result
Figure 100 Result 31 with ‘SPAR’ - 10 points are clicked, 15 points sampled - successful Total time to find the best match = 52 seconds, MSE = 1.9
a) Template with 10 points b) Result
Figure 101 Result 32 with ‘SPAR’ - 10 points are clicked, 15 points sampled - successful (The same figure as Figure 41) Total time to find the best match = 49 seconds, MSE = 1.9
a) Template with 10 points b) Result
Figure 102 Result 33 with ‘SPAR’ - 10 points are clicked, 10 points sampled - failed (The same figure as Figure 42) Total time to find the best match = 49 seconds
a) Template with 15 points b) Result
Figure 103 Result 34 with ‘SPAR’ - 15 points are clicked, 15 points sampled - failed Total time to find the best match = 52 seconds
D.2. A0s, A1s and Bs Calculated from the Training Data in Section 5.5.2

(Matrices A0, A1 and B for camera N)
Table 11 A0, A1 and B estimated from the training data taken from camera N

(Matrices A0, A1 and B for camera S)
Table 12 A0, A1 and B estimated from the training data taken from camera S
D.3. Results of Section 5.5 Experiments on Condensation
Test Sequence new11.avi e1.avi tn3.avi ts1.avi/ ts2.avi
Initial Range Min. Max. Min. Max. Min. Max. Min. Max. Min. Max.
Q0(1) 480 580 380 430 400 450 135 175 100 150
Q0 (2) 120 170 145 175 160 180 165 185 190 220
Q0 (3) -0.5 -0.25 -0.8 -0.5 -0.8 -0.5 -0.5 -0.25 -0.5 -0.25
Q0 (4) -0.5 -0.25 -0.8 -0.5 -0.8 -0.5 -0.5 -0.25 -0.5 -0.25
Q0 (5) 0 0 0 0 0 0 0 0 -0.15 -0.1
Q0 (6) 0 0 0 0 0 0 0 0 0.2 0.4
Table 13 Initial range where particles are generated in each test video sequence—used for
experiment no.1-no.18
ree13.avi retn32. avi
new11-3.avi ree15.avi
retn37.avi renew116.avi
rets16.avi rets24.avi
Figure 104 Errors per frame –experiment no. 11 to 18 The x-axis denotes the frame number and the y-axis denotes MSE at each frame.
Test Sequence new11.avi e1.avi tn3.avi ts1.avi/ ts2.avi
Initial Range Min. Max. Min. Max. Min. Max. Min. Max. Min. Max.
Q0(1) 500 550 380 410 400 440 155 200 100 150
Q0 (2) 145 175 145 165 155 175 180 190 190 220
Q0 (3) -0.1 -0.3 -0.8 -0.5 -0.8 -0.5 -0.5 -0.25 -0.5 -0.25
Q0 (4) -0.1 -0.3 -0.8 -0.5 -0.8 -0.5 -0.5 -0.25 -0.5 -0.25
Q0 (5) 0.01 0.02 0.01 0.02 0.001 0.01 -0.1 -0.08 -0.15 -0.1
Q0 (6) 0.004 0.005 0.02 0.03 0.008 0.01 0.2 0.4 0.2 0.4
Table 14 Initial range where particles are generated in each test video sequence—used for
experiment no.19-no.24
Exp. No. / Corresp. Exp. No. | Testing data | Result file | Coefficients A0, A1 and B | Confidence interval of the error η | Confidence interval of the manual tracking error η′ (36) | Standard deviation
19/11 e1.avi ree1R1.avi
Trained from e1.avi-e5.avi
5.8<η <17.5 1.6350<η ’<14.765 5.4843
20/12 tn3.avi retn3R1.avi ´´ 3499<η <3514 0.5081<η ’<26.741 2134.3
36 If the confidence interval of the error η is below the maximum of η′, we consider that the tracker successfully tracks the target.
21/13 new11.avi
new11R1.avi ´´smaller and smaller 924.6<η <943.7 7.3299<η ’<43.12 563.56
22/15 e1.avi ree1R5.avi
Trained from e1.avi-
e5.avi,en11.avi-en14.avi
28293<η <28305 1.6350<η ’<14.765 13179
23/16 new11.avi
renew11R13.avi ´´ 46.0<η <65.0 7.3299<η ’<43.12 159.78
24/17 ts1.avi rets1R3.avi
Trained from e6.avi-e10.avi,es11.avi-es14.avi
4739<η <4748 3.8960<η ’<18.254 3587.2
Table 15 Errors in tracking using A0, A1 and B trained from the sequences taken by the same camera (experiment no. 19-24)
ree1R1.avi retn3R1.avi
renew11R1.avi ree1R5.avi
rets1R3.avi
Figure 105 Errors as per frame –experiment no. 19 to 24 The x-axis denotes the frame number and the y-axis denotes MSE at each frame.
Exp. No. | Test data | Result file | Coefficients A0, A1 and B | Confidence interval of the error η | Confidence interval of the manual tracking error η′ (37) | Standard deviation
25 tn2.avi
retn204.avi
Trained from sequences taken by the same
camera (e1.avi-e5.avi,en11.avi-
en14.avi)
86976<η <87002 4.9268<η ’<40.173 71314
initial range= [380 410 145 155 -0.3 -0.2 -0.3 -0.2 0.01 -0.02 0.07 0.1], threshold =0.5
Table 16 Errors in tracking using A0, A1 and B trained from the sequences taken by the same camera (experiment no. 25)
37 If the confidence interval of the error η is below the maximum of η′, we consider that the tracker successfully tracks the target.
Figure 106 Errors as per frame –experiment no. 25
The x-axis denotes the frame number and the y-axis denotes MSE at each frame.
ree1R7.avi retn3R5.avi
renew11R11.avi rets111.avi
rets25.avi
Figure 107 Errors as per frame –experiment no. 26-no.30 The x-axis denotes the frame number and the y-axis denotes MSE at each frame.
ree1R8.avi retn3R4.av
renew11R12.avi rets1K9.avi
rets26.avi
Figure 108 Errors as per frame –experiment no. 31-no.35 The x-axis denotes the frame number and the y-axis denotes MSE at each frame.
Exp. No. | Test data | Result file | avgError (default coefficients A0, A1 and B) | avgError (coefficients A0, A1 and B estimated from the trained data of the previous 30 frames) | Confidence interval of the error η | Confidence interval of the manual tracking error η′ (38) | Total standard deviation
44 | new11.avi | 4.avi | 25.201 | 72.124 | 35.2<η<54.3 | 7.3299<η′<43.12 | 58.344
45 | ts2.avi | rets2On3.avi | 38.859 | 560.64 | 1087<η<1098 | 3.7280<η′<23.5720 | 1982.6
Table 17 Errors in tracking the template ‘e’ using on-line trained A0, A1 and B-exp no.44, 45
Figure 109 Error per frame –experiment no. 44 and no.45 The x-axis denotes the frame number and the y-axis denotes MSE at each frame.
reBoksTs24.avi reBoksTs25.avi
38 If the confidence interval of the error η is below the maximum of η′, we consider that the tracker successfully tracks the target.
reelg03.avi Elgan12.avi
Figure 110 Error per frame –experiment no. 46 and no.49 The x-axis denotes the frame number and the y-axis denotes MSE at each frame.
Coefficients A0, A1 and B: estimated from the trained data of the same sequence
Experiment factor | Occlusion occurs later | Occlusion from start | Out of image
Experiment No. | 50 | 51 | 52
Confidence interval of the error η | 12044<η<12068 | 3274<η<3323 | 35989<η<36013
Confidence interval of the manual tracking error η′ (39) | 6.0286<η′<29.0214 | 7.3091<η′<103.9091 | 7.0562<η′<30.293
Standard deviation of the error | 37409 | 959.19 | 37845
File name of the result | eOcllusion3.avi | brother4.avi | spar9.avi
Table 18 Errors (experiment no. 50-52)
a) Error per frame–experiment no.50 b) Error per frame –experiment no.51
c) Error per frame –experiment no.52 d) Error per frame –experiment no.53
e) Error per frame –experiment no.54 f) Error per frame–experiment no.55
Figure 111Error per frame –experiment no. 50-52 The x-axis denotes the frame number and the y-axis denotes MSE at each frame.
39 If the confidence interval of the error η is below the maximum of η′, we consider that the tracker successfully tracks the target.
E. Matlab Code

1. Functions used for both template matching and condensation

function [BF, DB1] = BleFuncGivenV2(u, v, d)
% Create a blending function for a parameter u on the curve
% from a given knot vector v
% input:  v = [v1, v2, v3, ...] = [0,0,0,0, u1, ..., u,u,u,u]
%         the first and last numbers are repeated 4 times.
%         u -- parameter on the curve
%         d -- degree of the B-splines
% output: BF  -- blending function (B1, B2, B3, ..., Bn+1)
%         DB1 -- first derivative of the blending function at u
m = length(v);            % number of knots n+d+1
B = zeros(m-d, d);        % B = number of control points x degree
DB = zeros(m-d-1, d-1);
if u >= v(d) & u <= v(m-d+1)
    idx = length(find(u > v));  % the range of non-zero Bk,1 is u(idx)~u(idx+1)
    if u == 0
        B(1:d, 1) = 1;
    else
        B(idx, 1) = 1;
    end
    for i = 2:d
        [B, DB] = calcACol(B, DB, v, u, m-d, i);
    end
end
BF = B(:,d)';
DB1 = DB(:,d-1)';

function [B, DB] = calcACol(B, DB, v, u, bfn, d)
% calculate the dth column of the blending function matrix, col(= n+1) x degree
w = zeros(1, bfn);
z = zeros(1, bfn);
for k = 1:bfn
    if v(k+d-1) - v(k) ~= 0
        z(k) = 1/(v(k+d-1) - v(k));
        w(k) = (u - v(k))*z(k);
    end
end
for k = 1:bfn-1
    B(k,d) = B(k,d-1)*w(k) + B(k+1,d-1)*(1 - w(k+1));
    DB(k,d-1) = B(k,d-1)*(d-1)*z(k) - B(k+1,d-1)*(d-1)*z(k+1);
end
B(bfn,d) = B(bfn,d-1)*w(bfn);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [BF, DB1] = BleFuncGivenVp(u, v, p, d)
% Create a blending function for a parameter u on the curve
% from a given knot vector v
% input:  v = [v1, v2, v3, ...] = [0, ..., 0, u1, ..., 1, ..., 1]
%         the first and last numbers are repeated d times.
%         u -- parameter on the curve
%         p -- number of interpolated points
%         d -- degree of the B-splines
% output: BF  -- blending function (B1, B2, B3, ..., Bn+1)
%         DB1 -- first derivative of the blending function at u
if d == 4
    tempv = zeros(1, p+5);
    for i = 1:length(tempv)
        tempv(i) = v(i+1);
    end
    v = tempv;
end
m = length(v);        % number of knots n+d+1
B = zeros(p, d);      % B = number of control points x degree
DB = zeros(p-1, d-1);
if u >= v(1) & u <= v(end)
    idx = length(find(u >= v));
    if idx > p
        idx = p;
    end
    % the range of non-zero Bk,1 is u(idx)~u(idx+1)
    if u == 0
        B(1:d, 1) = 1;
    else
        B(idx, 1) = 1;
    end
    for i = 2:d
        [B, DB] = calcACol(B, DB, v, u, p, i);
    end
end
BF = B(:,d)';
DB1 = DB(:,d-1)';

function [B, DB] = calcACol(B, DB, v, u, bfn, d)
% calculate the dth column of the blending function matrix, col(= n+1) x degree
w = zeros(1, bfn);
z = zeros(1, bfn);
for k = 1:bfn
    if v(k+d-1) - v(k) ~= 0
        z(k) = 1/(v(k+d-1) - v(k));
        w(k) = (u - v(k))*z(k);
    end
end
for k = 1:bfn-1
    B(k,d) = B(k,d-1)*w(k) + B(k+1,d-1)*(1 - w(k+1));
    DB(k,d-1) = B(k,d-1)*(d-1)*z(k) - B(k+1,d-1)*(d-1)*z(k+1);
end
B(bfn,d) = B(bfn,d-1)*w(bfn);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% function [eout,ax,ay] = cannyEdge(varargin) % Create a binary edge map based on the given threshold value % using Canny edge method. % Arranged by edge function in matlab. % input: template image and canny edge threshold % output: eout -- binary edge map - 1 implies edge and 0 otherwise % ax, ay -- gradient magnitude in x and y direction [a,method,thresh,sigma,H,kx,ky] = parse_inputs(varargin{:}); % Transform to a double precision intensity image if necessary if ~isa(a, 'double') a = im2double(a); end m = size(a,1); n = size(a,2); rr = 2:m-1; cc=2:n-1; % The output edge map: e = repmat(false, m, n); if strcmp(method,'canny') % Magic numbers GaussianDieOff = .0001; PercentOfPixelsNotEdges = .7; % Used for selecting thresholds ThresholdRatio = .4; % Low thresh is this fraction of the high. % Design the filters - a gaussian and its derivative pw = 1:30; % possible widths ssq = sigma*sigma; width = max(find(exp(-(pw.*pw)/(2*sigma*sigma))>GaussianDieOff)); if isempty(width) width = 1; % the user entered a really small sigma end t = (-width:width); gau = exp(-(t.*t)/(2*ssq))/(2*pi*ssq); % the gaussian 1D filter % Find the directional derivative of 2D Gaussian (along X-axis) % Since the result is symmetric along X, we can get the derivative along % Y-axis simply by transposing the result for X direction. [x,y]=meshgrid(-width:width,-width:width); dgau2D=-x.*exp(-(x.*x+y.*y)/(2*ssq))/(pi*ssq); % Convolve the filters with the image in each direction % The canny edge detector first requires convolution with % 2D gaussian, and then with the derivitave of a gaussian. % Since gaussian filter is separable, for smoothing, we can use % two 1D convolutions in order to achieve the effect of convolving % with 2D Gaussian. We convolve along rows and then columns. %smooth the image out
    aSmooth = imfilter(a, gau, 'conv', 'replicate');        % run the filter across rows
    aSmooth = imfilter(aSmooth, gau', 'conv', 'replicate'); % and then across columns

    % apply directional derivatives
    ax = imfilter(aSmooth, dgau2D, 'conv', 'replicate');
    ay = imfilter(aSmooth, dgau2D', 'conv', 'replicate');

    mag = sqrt((ax.*ax) + (ay.*ay));
    if any(ay)
        theta = atan(ay./ax);
    end
    magmax = max(mag(:));
    if magmax > 0
        mag = mag / magmax; % normalize
    end

    % Select the thresholds
    if isempty(thresh)
        [counts,x] = imhist(mag, 64);
        highThresh = min(find(cumsum(counts) > PercentOfPixelsNotEdges*m*n)) / 64;
        lowThresh = ThresholdRatio*highThresh;
        thresh = [lowThresh highThresh];
    elseif length(thresh) == 1
        highThresh = thresh;
        if thresh >= 1
            error('The threshold must be less than 1.');
        end
        lowThresh = ThresholdRatio*thresh;
        thresh = [lowThresh highThresh];
    elseif length(thresh) == 2
        lowThresh = thresh(1);
        highThresh = thresh(2);
        if (lowThresh >= highThresh) | (highThresh >= 1)
            error('Thresh must be [low high], where low < high < 1.');
        end
    end

    % The next step is to do the non-maximum suppression.
    % We will accrue indices which specify ON pixels in the strong edge map.
    % The array e will become the weak edge map.
    idxStrong = [];
    for dir = 1:4
        idxLocalMax = cannyFindLocalMaxima(dir,ax,ay,mag);
        idxWeak = idxLocalMax(mag(idxLocalMax) > lowThresh);
        e(idxWeak) = 1;
        idxStrong = [idxStrong; idxWeak(mag(idxWeak) > highThresh)];
    end

    rstrong = rem(idxStrong-1, m)+1;
    cstrong = floor((idxStrong-1)/m)+1;
    e = bwselect(e, cstrong, rstrong, 8);
    e = bwmorph(e, 'thin', 1); % Thin double (or triple) pixel wide contours
end

if nargout == 0,
    imshow(e);
else
    eout = e;
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%   Local Function : cannyFindLocalMaxima
%
function idxLocalMax = cannyFindLocalMaxima(direction,ix,iy,mag);
%
% This sub-function helps with the non-maximum suppression in the Canny
% edge detector. The input parameters are:
%
%   direction - the index of which direction the gradient is pointing,
%               read from the diagram below. direction is 1, 2, 3, or 4.
%   ix        - input image filtered by derivative of Gaussian along x
%   iy        - input image filtered by derivative of Gaussian along y
%   mag       - the gradient magnitude image
%
% there are 4 cases:
%
%                         The X marks the pixel in question, and each
%         3     2         of the quadrants for the gradient vector
%       O----O----O       fall into two cases, divided by the 45
%     4 |         | 1     degree line. In one case the gradient
%       |         |       vector is more horizontal, and in the other
%       O    X    O       it is more vertical. There are eight
%       |         |       divisions, but for the non-maximum suppression
%    (1)|         |(4)    we are only worried about 4 of them since we
%       O----O----O       use symmetric points about the center pixel.
%         (2) (3)

[m,n,o] = size(mag);

% Find the indices of all points whose gradient (specified by the
% vector (ix,iy)) is going in the direction we're looking at.
switch direction
    case 1
        idx = find((iy<=0 & ix>-iy) | (iy>=0 & ix<-iy));
    case 2
        idx = find((ix>0 & -iy>=ix) | (ix<0 & -iy<=ix));
    case 3
        idx = find((ix<=0 & ix>iy) | (ix>=0 & ix<iy));
    case 4
        idx = find((iy<0 & ix<=iy) | (iy>0 & ix>=iy));
end

% Exclude the exterior pixels
if ~isempty(idx)
    v = mod(idx,m);
    extIdx = find(v==1 | v==0 | idx<=m | (idx>(n-1)*m));
    idx(extIdx) = [];
end

ixv = ix(idx);
iyv = iy(idx);
gradmag = mag(idx);

% Do the linear interpolations for the interior pixels
switch direction
    case 1
        d = abs(iyv./ixv);
        gradmag1 = mag(idx+m).*(1-d) + mag(idx+m-1).*d;
        gradmag2 = mag(idx-m).*(1-d) + mag(idx-m+1).*d;
    case 2
        d = abs(ixv./iyv);
        gradmag1 = mag(idx-1).*(1-d) + mag(idx+m-1).*d;
        gradmag2 = mag(idx+1).*(1-d) + mag(idx-m+1).*d;
    case 3
        d = abs(ixv./iyv);
        gradmag1 = mag(idx-1).*(1-d) + mag(idx-m-1).*d;
        gradmag2 = mag(idx+1).*(1-d) + mag(idx+m+1).*d;
    case 4
        d = abs(iyv./ixv);
        gradmag1 = mag(idx-m).*(1-d) + mag(idx-m-1).*d;
        gradmag2 = mag(idx+m).*(1-d) + mag(idx+m+1).*d;
end
idxLocalMax = idx(gradmag>=gradmag1 & gradmag>=gradmag2);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
%   Local Function : parse_inputs
%
function [I,Method,Thresh,Sigma,H,kx,ky] = parse_inputs(varargin)
% OUTPUTS:
%   I      Image Data
%   Method Edge detection method
%   Thresh Threshold value
%   Sigma  standard deviation of Gaussian
%   H      Filter for zero-crossing detection
%   kx,ky  From directionality vector
error(nargchk(1,5,nargin));
I = varargin{1};
copy_checkinput(I,{'double','logical','uint8','uint16'},...
    {'nonsparse','2d'},mfilename,'I',1);

% Defaults
Method = 'canny';
Thresh = [];
Direction = 'both';
Sigma = 2;
H = [];
K = [1 1];
methods = {'canny','prewitt','sobel','marr-hildreth','log','roberts','zerocross'};
directions = {'both','horizontal','vertical'};

% Now parse the nargin-1 remaining input arguments.
% First get the strings - we do this because the interpretation of the
% rest of the arguments will depend on the method.
nonstr = []; % ordered indices of non-string arguments
for i = 2:nargin
    if ischar(varargin{i})
        str = lower(varargin{i});
        j = strmatch(str,methods);
        k = strmatch(str,directions);
        if ~isempty(j)
            Method = methods{j(1)};
            if strcmp(Method,'marr-hildreth')
                warning('''Marr-Hildreth'' is an obsolete syntax, use ''LoG'' instead.');
            end
        elseif ~isempty(k)
            Direction = directions{k(1)};
        else
            error(['Invalid input string: ''' varargin{i} '''.']);
        end
    else
        nonstr = [nonstr i];
    end
end

% Now get the rest of the arguments
switch Method
    case 'canny'
        Sigma = 1.0;         % Default std dev of Gaussian for Canny
        threshSpecified = 0; % Threshold is not yet specified
        for i = nonstr
            if prod(size(varargin{i}))==2 & ~threshSpecified
                Thresh = varargin{i};
                threshSpecified = 1;
            elseif prod(size(varargin{i}))==1
                if ~threshSpecified
                    Thresh = varargin{i};
                    threshSpecified = 1;
                else
                    Sigma = varargin{i};
                end
            elseif isempty(varargin{i}) & ~threshSpecified
                % Thresh = [];
                threshSpecified = 1;
            else
                error('Invalid input arguments');
            end
        end
    otherwise
        error('Invalid input arguments');
end

if Sigma <= 0
    error('Sigma must be positive');
end

switch Direction
    case 'both',
        kx = K(1); ky = K(2);
    case 'horizontal',
        kx = 0; ky = 1; % Directionality factor
    case 'vertical',
        kx = 1; ky = 0; % Directionality factor
    otherwise
        error('Unrecognized direction string');
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function v = chordLengthPara(P,d)
% Calculate the knot parameters using the Chord Length
% Parameterization method.
%
% input:  P -- [p1(x,y); p2(x,y); ...; pn+1(x,y)]' interpolated points
%         d -- degree (cubic d=4, quadratic d=3, linear d=2)
% output: v -- normalized knot vector for open B-splines
m = length(P);
% Calculate the distance between two neighboring pixels
Lsqrt = sqrt(sum((P(1:m-1,:)-P(2:m,:))'.^2));
totalL = sum(Lsqrt);
% Normalize the knot vector
s = cumsum(Lsqrt)/totalL;
knots = [0, s];
% Pad zeros and ones at both ends of the knot vector so that the
% curve interpolates the two end control points.
v = [zeros(1,d-1), knots, ones(1,d-1)];

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function CP = find_Control_Points_byCL(kv, P, d)
% Find the coordinates of n+1 control points for a curve defined by n+1
% interpolated points on the curve and the corresponding blending function.
% When d==4, d==3, d==2:
%   knots length(v) = n+7, n+5, n+3
%   control points  = n+3, n+2, n+1
%   conditions      = n+1, n+1, n+1
% We need more conditions when d==4 and d==3, or we should reduce the
% number of control points.
% Assume that the first control point cp0 repeats twice when d==3, and
% that the first and last control points cp0 and cpn repeat twice when d==4.
% input:  kv = [v1, v2, v3, ...] = [0,...,0, u1,...,1,...,1] knot vector
%              -- the first and last numbers are repeated d times
%         P = [p1(x,y); p2(x,y); ...; pn+1(x,y)] coordinates of
%             interpolated points
%         d = degree of the B-spline
% output: CP = [cp1(x,y), ..., cpn+1(x,y)] coordinates of control points
p = length(P);
B = zeros(p);
B(1,1) = 1;     % the first control point = first knot
B(end,end) = 1; % the last control point = last knot
% Calculate the value of the blending functions and the first
% derivatives at each knot.
for i = 2:p-1
    [BF, DB1] = BleFuncGivenVp(kv(d+i-1), kv, p, d);
    B(i,:) = BF;
end
% Calculate the coordinates of the control points using the inverse of B.
% If det(B) == 0, the pseudo-inverse is used instead.
if det(B) ~= 0
    CP = inv(B)*P;
else
    CP = inv(B'*B)*B'*P;
end
% Add the first and last knots as control points.
% When d==4, repeat both the first and last control points; when d==3,
% repeat the first control point. When d==2, do nothing.
switch d
    case 4
        CP = [CP(1,:); CP; CP(end,:)];
    case 3
        CP = [CP(1,:); CP];
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [AX,AY] = normaliz2(ax, ay);
% Normalize a vector (ax(i,j), ay(i,j))
% input:  ax -- x coordinates matrix
%         ay -- y coordinates matrix
% output: AX -- normalized x coordinates matrix
%         AY -- normalized y coordinates matrix
[m,n] = size(ax);
AX = ones(m,n);
AY = ones(m,n);
for i = 1:m
    for j = 1:n
        d = sqrt(ax(i,j)^2 + ay(i,j)^2);
        if d ~= 0
            AX(i,j) = ax(i,j)/d;
            AY(i,j) = ay(i,j)/d;
        end
    end
end
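The knot placement in chordLengthPara above can be summarized in a few lines of Python. This is a sketch under the thesis's convention that d counts the order (cubic d=4), so d-1 extra copies are padded at each end, giving d repeated end knots in total:

```python
import math

def chord_length_knots(points, d):
    """Clamped knot vector by chord-length parameterization: interior
    knots are cumulative chord lengths normalized to [0, 1], padded with
    repeated end knots so the curve interpolates the endpoints."""
    chords = [math.dist(p, q) for p, q in zip(points, points[1:])]
    total = sum(chords)
    s, knots = 0.0, [0.0]
    for c in chords:
        s += c
        knots.append(s / total)
    return [0.0] * (d - 1) + knots + [1.0] * (d - 1)
```

For three points with chord lengths 3 and 5 and d = 2, this gives the knot vector [0, 0, 3/8, 1, 1].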
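The automatic threshold rule in cannyEdge above picks the high threshold as the gradient-magnitude quantile given by PercentOfPixelsNotEdges, with the low threshold a fixed ratio of it. A Python sketch of that histogram rule (the helper name and the flat list input are my assumptions; `mag` holds normalized magnitudes in [0, 1]):

```python
def canny_thresholds(mag, pct_not_edges=0.7, ratio=0.4, nbins=64):
    """Pick Canny hysteresis thresholds from a gradient-magnitude
    histogram: the high threshold is the upper edge of the first bin at
    which the cumulative count exceeds pct_not_edges of all pixels, and
    the low threshold is a fixed fraction of the high one."""
    counts = [0] * nbins
    for v in mag:
        b = min(int(v * nbins), nbins - 1)  # bin index for value v
        counts[b] += 1
    total, cum = len(mag), 0
    for b, c in enumerate(counts):
        cum += c
        if cum > pct_not_edges * total:     # first bin crossing the quantile
            high = (b + 1) / nbins
            return ratio * high, high
    return ratio, 1.0
```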
2. Functions used for template matching

function [dMap,ix,iy,iX,iY,nrI,Ie] = disMap(I,thresh,d)
% Calculate the distance map and the corresponding nearest
% edge point's unit vector.
% input:  I -- frame image
%         thresh -- Canny edge threshold
%         d -- range to check the distance from edge points
% output: dMap -- distance map
%         ix, iy -- coordinates of the edges
%         iX, iY -- normalized directional vectors of the nearest edge
%         nrI -- number of edge pixels
%         Ie -- edge map by Canny
tic
%I = imread(I);
I = rgb2gray(I);
[m,n] = size(I);
% Since the frame is blurred, use a lower threshold to get better edges
[Ie,ix1,iy1] = cannyEdge(I,'canny',thresh);
% Normalize the directional vector
[IX, IY] = normaliz2(ix1,iy1);
% Get indices for the edge points
[iy, ix] = find(Ie==1);
% Get the distance map for the frame as far as d from the edge points
nrI = length(iy);
dMap = 250*ones(m,n); % initial distance map
iY = ones(m,n);       % used to record directional vectors
iX = ones(m,n);
for i = 1:nrI
    dirY = IY(iy(i),ix(i));
    dirX = IX(iy(i),ix(i));
    for j = -d:1:d
        r = round(iy(i)+j*dirY);
        c = round(ix(i)+j*dirX);
        if r>1 & r<m & c>1 & c<n % do not include the border
            if dMap(r,c) > abs(j)
                dMap(r,c) = abs(j);
                iY(r,c) = dirY;
                iX(r,c) = dirX;
            end
        end
    end
end
toc

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function MSE = errorTM(tx, ty, mx, my)
% Calculate the mean square error.
% The tracking result is compared to the manually selected result.
% input:  tx, ty -- column vectors with n tracked coordinates
%         mx, my -- column vectors with n manually selected coordinates
% output: MSE -- error per frame

% the number of points
q = size(tx, 1);
% Distance between the tracked and manually selected points
dx = tx - mx;
dy = ty - my;
SE = (dx'*dx + dy'*dy);
MSE = SE/q;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [mx, my] = manualTrack(I, n)
% Manually track for error measurement and show the result.
% input:  I -- input image
%         n -- number of interpolated points
% output: mx -- a column vector of x coordinates of
%               manually clicked points
%         my -- a column vector of y coordinates of
%               manually clicked points
%I = imread(I);
[BW, ax, ay] = cannyEdge(rgb2gray(I), 'canny', 0.4);
figure(2);
imshow(BW);
[mx, my] = ginput(n);
hold on;
for i = 1:n
    plot(mx(i), my(i), 'r*');
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [mmx, mmy] = manualTrackCurve(I, n, d, k)
% Manually track for error measurement and show the result.
% input:  I -- input image
%         n -- number of interpolated points
%         d -- degree
%         k -- interval of sample points
% output: mmx -- a column vector of x coordinates of uniformly
%                distributed points based on manually clicked knots
%         mmy -- a column vector of y coordinates of uniformly
%                distributed points based on manually clicked knots
%I = imread(I);
[BW, ax, ay] = cannyEdge(rgb2gray(I), 'canny', 0.4);
figure(2);
imshow(BW);
[mx, my] = ginput(n);
hold on;
m = size(mx,1); % number of interpolated points
% Calculate knot values using the Chord Length parameterization method
P = [mx, my];
v = chordLengthPara(P,d);
% Find control points given the knot vector, interpolated points and the degree
CP = find_Control_Points_byCL(v, P, d);
for i = 1:m
    plot(P(i,1), P(i,2), 'b*');
end
% Sample, draw and save uniformly distributed points
r = [];
for i = 0:k
    u = i*1/k;
    [BF, DB1] = BleFuncGivenV2(u, v, d);
    r = [r; BF*CP];
    plot(r(i+1,1), r(i+1,2), 'y.:');
end
r = round(r);
mmx = r(:,1);
mmy = r(:,2);
hold off

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [E,finY,finX] = objFunc(tx,ty,tX,tY,nrT,In,threshIn,threshT,d,p)
% Implement the objective function.
% input:  T -- template
%         In -- frame image
%         d -- the range where the distance will be calculated
%         p -- smoothing factor in the objective function
%         threshIn/T -- Canny edge threshold for the input image/template
%         tx, ty -- coordinates of the edges of the template points
%         tX, tY -- normalized directional vectors of the nearest
%                   edge of the template points
% output: E -- minimum energy
%         finY -- y coordinates of the final findings
%         finX -- x coordinates of the final findings
tic;
[dMap,ix,iy,iX,iY,nrI,Ie] = disMap(In,threshIn,d);
t1 = toc;
tic;
[m,n] = size(Ie);
% Move the template over the frame image and check the minimum E.
minE = inf;
% Translate so that the first point is on the origin
Teyy = ty - ty(1);
Texy = tx - tx(1);
% Rotate the template
for r = -1/12*pi:1/24*pi:1/12*pi
    TeyY = Teyy*cos(r) - Texy*sin(r);
    TexY = Texy*cos(r) + Teyy*sin(r);
    TXX = tX*cos(r) + tY*sin(r);
    TYY = tY*cos(r) - tX*sin(r);
    % Scale based on the 1st edge point
    for sc = 1/2:1/8:2
        nTey = round(sc*TeyY);
        nTex = round(sc*TexY);
        % Translate the template 'over' the frame image
        for index = 1:nrI
            R = iy(index);
            C = ix(index);
            sTey = nTey + R;
            sTex = nTex + C;
            % guarantee all the points are within the image
            if all(max(sTey)<m & min(sTey)>0 & max(sTex)<n & min(sTex)>1)
                E = 0;
                % calculate the energy
                for a = 1:nrT
                    % the original location used to find the directional vector
                    tr = sTey(a);
                    tc = sTex(a);
                    cosA = abs(iX(tr,tc)*TXX(a) + iY(tr,tc)*TYY(a));
                    E = E + 1 - exp(-p*dMap(tr,tc))*cosA;
                end
                EE = E/nrT; % objective function energy
                if EE < minE % find the min E
                    minE = EE;
                    finY = sTey;
                    finX = sTex;
                    sss = sc;
                    sssr = r;
                end
            end
        end
    end
end

figure(5)
imshow(Ie);
hold on
for q = 1:nrT
    plot(finX(q),finY(q),'r*');
end
hold off
t2 = toc;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function r = runFindCurve(I,n,d,k)
% Create a B-spline curve, then sample and return uniformly
% distributed points.
% input:  I -- image
%         n -- number of interpolated points
%         d -- degree
%         k -- interval of sample points
% output: r -- uniformly distributed points sampled on the curve
imshow(I);
hold on;
P = ginput(n);
% Calculate knot values using the Chord Length parameterization method
v = chordLengthPara(P,d);
CP = find_Control_Points_byCL(v, P, d);
% Draw selected points
for i = 1:n
    plot(P(i,1), P(i,2), 'b*');
end
% Draw control points
for i = 1:length(CP)
    plot(CP(i,1), CP(i,2), 'r*');
end
% Save and draw uniformly distributed points
r = [];
for i = 0:k
    u = i*1/k;
    [BF, DB1] = BleFuncGivenV2(u, v, d);
    r = [r; BF*CP];
    plot(r(i+1,1), r(i+1,2), 'y.:');
end
r = round(r);
hold off

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [tx,ty,tX,tY,nrT,P] = tempPoints(T,thresh,n)
% Define the sample points of the template manually.
% input:  T -- template image
%         thresh -- Canny edge threshold
% output: tx, ty -- coordinates of input points
%         tX, tY -- normalized directional vectors
%                   on the rounded coordinates of points
%         nrT -- number of input points
%T = imread(T);
T = rgb2gray(T);
[Te,tx1,ty1] = cannyEdge(T,'canny',thresh);
[TX, TY] = normaliz2(tx1,ty1);
P = selectPoints(Te, n);
tx = P(:,1);
ty = P(:,2);
nrT = length(tx);
tX = zeros(nrT,1);
tY = tX;
for i = 1:nrT
    tX(i,1) = TX(ty(i,1),tx(i,1));
    tY(i,1) = TY(ty(i,1),tx(i,1));
end

function [P] = selectPoints(I,n)
% input:  I -- image
%         n -- number of interpolated points
% output: P -- input points
imshow(I);
hold on;
P = ginput(n);
P = round(P);
for i = 1:n
    plot(P(i,1), P(i,2), 'b*');
end
hold off

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [tx,ty,tX,tY,nrT,P] = tempPointsCurve(T, thresh, n, d, k)
% Select points for finding the best match, given the degree of the
% curve, the number of knots and the number of samples on the curve.
% input:  T -- template image
%         thresh -- Canny edge threshold
%         n -- number of interpolated points
%         d -- degree
%         k -- number of sample points for finding the best match
% output: tx, ty -- coordinates of input points
%         tX, tY -- normalized directional vectors on the
%                   rounded coordinates of points
%         nrT -- number of input sample points
%         P -- sampled points on the curve to find a best match
%T = imread(T);
T = rgb2gray(T);
% Obtain a binary edge map and the gradient magnitude of each edge point
[Te, tx1, ty1] = cannyEdge(T,'canny',thresh);
[TX, TY] = normaliz2(tx1,ty1);
% Create a curve and get points for finding the best match
[P] = runFindCurve(Te, n, d, k);
tx = P(:,1);
ty = P(:,2);
% Save normalized directional vectors
nrT = length(tx);
tX = zeros(nrT,1);
tY = tX;
for i = 1:nrT
    tX(i,1) = TX(ty(i,1),tx(i,1));
    tY(i,1) = TY(ty(i,1),tx(i,1));
end
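The inner loop of objFunc above scores one placement of the template points: each point contributes 1 - exp(-p*dist)*|cos|, where dist is the distance-map value at the point and cos is the dot product between the nearest edge's unit direction and the template point's unit direction, so low energy means the template lies close to edges of agreeing orientation. A Python sketch with hypothetical data structures (points as (row, col) pairs, directions as unit-vector tuples):

```python
import math

def template_energy(pts, dmap, edge_dirs, tmpl_dirs, p=0.1):
    """Energy of one template placement: averages 1 - exp(-p*dist)*|cos|
    over all template points. dmap is a 2D distance map; edge_dirs maps
    (row, col) to the nearest edge's unit direction."""
    E = 0.0
    for (r, c), t in zip(pts, tmpl_dirs):
        dist = dmap[r][c]
        ex, ey = edge_dirs[(r, c)]
        cos = abs(ex * t[0] + ey * t[1])
        E += 1 - math.exp(-p * dist) * cos
    return E / len(pts)
```

A point sitting exactly on an edge with matching orientation contributes zero energy; a point far from every edge contributes close to 1.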
3. Functions used for condensation

function num = binarySubdivide(cumDist, r, n)
% Find the smallest number > r by binary subdivision search.
% Binary subdivision search is a fast way to find where the given
% number should fit. The code is given in ITU's.
% input:  cumDist -- array to be searched
%         r -- number whose position in cumDist is searched
%         n -- largest index in cumDist
% output: num -- index of the minimum number which satisfies > r
high = n;
low = 0;
while (high > (low+1))
    middle = round((high + low)/2);
    if r > cumDist(middle)
        low = middle;
    else
        high = middle;
    end
end
num = low + 1;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [Temp1, Temp2, Temp3, Temp4] = create3temp(I, d1, d2, d3, d4)
% Create a template with 3 curves (Temp4 and d4 are used for
% error measurement).
% input:  I -- billboard image from which the template can be created
%         d -- the degree of the B-spline curve
% output: Temp -- curve segment
%         Temp.CP -- control points
%         Temp.v -- knot values used to get the control points
%         Temp.d -- B-spline curve's degree
imshow(I);
hold on;
Temp1 = interpolate(d1);
Temp2 = interpolate(d2);
Temp3 = interpolate(d3);
Temp4 = interpolate(d4);
hold off

function Temp = interpolate(d);
% Create an interpolated curve with the given degree d.
% input:  d -- degree
% output: Temp -- template containing its knot vector, control points
%                 and degree
P = ginput;
% Calculate knot values using the Chord Length parameterization method
v = chordLengthPara(P,d);
% Calculate the control points
CP = find_Control_Points_byCL(v, P, d);
% Draw the curve
x = [];
y = [];
for u = 0:1/20:1
    [BF, DB1] = BleFuncGivenV2(u, v, d);
    r = BF*CP;
    x = [x, r(1)];
    y = [y, r(2)];
end
line(x, y, 'Color', 'r');
Temp.v = v;
Temp.CP = CP;
Temp.d = d;
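The lookup implemented by binarySubdivide above is the standard bisection search over cumulative particle weights. A 0-based Python sketch:

```python
def binary_subdivide(cum, r):
    """Smallest index i with cum[i] > r (equivalently >= r for a
    cumulative-weight array), found by bisection. cum must be
    non-decreasing with cum[-1] >= r."""
    low, high = -1, len(cum) - 1
    while high > low + 1:
        mid = (high + low) // 2
        if r > cum[mid]:
            low = mid
        else:
            high = mid
    return high
```

Resampling then draws r ~ U(0, 1) once per particle and copies the particle whose cumulative-weight interval contains r, which is what reS later in this appendix does.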
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [newR,dRR] = getCurveAndNormal3(Q,Temp)
% Construct the predicted curve from the predicted Q and
% get the normals of the curve.
% input:  Q -- the shape-space vector of the new curve
%         Temp -- the control point info
% output: newR -- the new curve
%         dRR -- the normals of the curve (not normalized)

% 1st step: get the new control points from the given Q.
% Get the W matrix from the template control points
Nc = length(Temp.CP(:,1));
One = ones(Nc,1);
Zero = zeros(Nc,1);
W = [One Zero Temp.CP(:,1) Zero Zero Temp.CP(:,2);
     Zero One Zero Temp.CP(:,2) Temp.CP(:,1) Zero];
% Get the new control points from Q, W, Temp
nCP = W*Q + [Temp.CP(:,1); Temp.CP(:,2)];
newCP = [nCP(1:Nc), nCP(Nc+1:2*Nc)];

% 2nd step: get the coordinates of certain points on the curve
% and the corresponding normal vectors
newR = [];
dR = [];
for u = 0:1/20:1 % the interval can be changed
    % get the blending functions and the first derivatives of the curve
    [BF, DB1] = BleFuncGivenV2(u, Temp.v, Temp.d);
    r = BF*newCP;
    newR = [newR; r];
    dr = DB1*newCP(1:(Nc-1),:);
    dR = [dR; dr];
end
% When the tangent vector (the 1st derivative) is (dx,dy),
% then the normal vector is (-dy, dx)
dRR = [-dR(:,2), dR(:,1)];

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [weight, c, QM] = Meas3(newQ, frame, r, thresh, Temp1, Temp2, Temp3, D);
% Measurement step:
% measure all the predicted particles' possibilities.
% input:  frame -- the current frame
%         r -- st = sqrt(rM) = 0.3
%         Temp -- the control points and the degree of the
%                 B-spline curve segment of the template
%         D -- the length along the curve normal within which
%              we need to check for edge points
% output: weight -- new normalized weight of each particle
%         c -- new normalized cumulated weight
%         QM -- the best matching curve (the average)
% using edge info
% function uses equation 45
[M N] = size(frame(:,:,1)); % size of the frame image
n = length(newQ(1,:));
c = ones(1,n);
% Get the edges of the frame
[Ie,ix1,iy1] = cannyEdge(rgb2gray(frame),'canny',thresh);
% Measure all three curve segments
for i = 1:n
    L1(:,i) = Poisson(newQ(:,i), Temp1, Ie, r, M, N, D);
    L2(:,i) = Poisson(newQ(:,i), Temp2, Ie, r, M, N, D);
    L3(:,i) = Poisson(newQ(:,i), Temp3, Ie, r, M, N, D);
end
LL = [L1',L2',L3'];
LL = LL';
weight = exp(-0.5*1/r*sqrt(sum(LL.^2)));
% Add the 3 weights for each sample and normalize
sumW = sum(weight);
if sumW ~= 0
    weight = (weight/sumW);
end
c = cumsum(weight);
QM = newQ*weight'; % the tracking result -- the average

function L = Poisson(newQ, Temp, Ie, r, M, N, D)
% Measurement by the observation model.
% input:  newQ -- new Q vector
%         Temp -- template including the knot vector, control points and
%                 degree of the curve
%         Ie -- binary edge map of the frame
%         r -- st = sqrt(rM) = 0.3
%         M, N -- the size of the frame Ie, rows and columns, respectively
%         D -- the length along the curve normal within which
%              we need to check for edge points
% output: L -- the distance to the nearest edge from the sample points

% Reconstruct the curve and get its normals
[newR, dRR] = getCurveAndNormal3(newQ,Temp);
% Get the normalized normals of the curve
[X,Y] = normaliz2(dRR(:,1),dRR(:,2));
nr = length(Y);
L = 40*ones(nr,1); % initialize the nearest edge distance as 40 (250 is too big)
% Find the nearest edge
for j = 1:nr-1
    for z = -D:1:D
        ro = round(newR(j,2)+z*Y(j));
        c = round(newR(j,1)+z*X(j));
        if ro>1 & ro<M & c>1 & c<N % do not include the border
            if Ie(ro,c) == 1
                if L(j) > abs(z)
                    L(j) = abs(z);
                end
            end
        end
    end
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function pf1 = pfIn(range,n)
% Initialize the particle filter - create n particles (t=0).
% Each particle is a shape-space vector.
% input:  range -- 12x1 vector indicating the initial guess of
%                  the range of the six parameters of Q:
%                  [min. of Q(1), max. of Q(1), min. of Q(2), ...,
%                   max. of Q(6)]
%         n -- number of samples
% output: pf1 -- initial particle set
%                pf1.q1 ~ pf1.q6 -- each value of the Q shape-space vector
%                pf1.w -- weight (probability)
%                pf1.c -- the cumulated weight
rand('state', sum(100*clock));
pf1.n = n;
pf1.w = zeros(n,1);
pf1.c = zeros(n,1);
for i = 1:n
    % define the six values for the Q shape-space vector
    pf1.q1(i) = [rand*(range(2)-range(1))+range(1)];
    pf1.q2(i) = [rand*(range(4)-range(3))+range(3)];
    pf1.q3(i) = [rand*(range(6)-range(5))+range(5)];
    pf1.q4(i) = [rand*(range(8)-range(7))+range(7)];
    pf1.q5(i) = [rand*(range(10)-range(9))+range(9)];
    pf1.q6(i) = [rand*(range(12)-range(11))+range(11)];
end
pf1.bar = zeros(6,1);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function pp = plotAll3(Q,file,Temp4,Temp2,Temp3)
% Plot the frame image with the tracking result.
% input:  Q -- the Q shape-space vector
%         file -- the frame image
%         Temp -- template with the knot vector, the control points
%                 and the degree of the curve
imshow(file);
hold on
pp = plotEach(Q,file,Temp4);
pp2 = plotEach(Q,file,Temp2);
pp3 = plotEach(Q,file,Temp3);
hold off

function pp = plotEach(Q, file, Temp)
pp = [];
Nc = length(Temp.CP(:,1));
One = ones(Nc,1);
Zero = zeros(Nc,1);
W = [One Zero Temp.CP(:,1) Zero Zero Temp.CP(:,2);
     Zero One Zero Temp.CP(:,2) Temp.CP(:,1) Zero];
for e = 1:size(Q,2)
    % get new control points from Q, W, TempCP
    nCP = W*Q(:,e) + [Temp.CP(:,1); Temp.CP(:,2)];
    newCP = [nCP(1:Nc), nCP(Nc+1:2*Nc)];
    pp = [pp, newCP];
    % 2nd step: get the coordinates of certain points on the curve.
    % In order to use the line function, the x and y coordinates should
    % be saved in individual columns.
    x = [];
    y = [];
    for u = 0:1/20:1 % the interval can be changed
        [BF, DB1] = BleFuncGivenV2(u, Temp.v, Temp.d);
        r = BF*newCP;
        x = [x, r(1)];
        y = [y, r(2)];
    end
    line(x, y);
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function newQ = Pr1(PF,Qplus1,A0,A1,B,n,QM)
% Prediction step:
% use the dynamic model to predict the resampled particles'
% (at time t-1) Qs at time t,
% using the function Qn+2 = A0*Qn + A1*Qn+1 + (I-A0-A1)*Qbar + B*Wn, or
% Xt = xBar + (Xtminus-xBar)*A + B*Noise
% input:  Q -- resampled PFn
%         Qplus1 -- resampled PFnplus1
%         A0 -- 6x6 matrix
%         A1 -- 6x6 matrix
%         B -- 6x6 matrix
%         QM -- result from the previous time step t-1
% output: newQ -- the predicted state at time t
newQ = [];
Q = [[PF.q1]; [PF.q2]; [PF.q3]; [PF.q4]; [PF.q5]; [PF.q6]];
I = 1*eye(6);
for i = 1:n
    noise = randn(6,1);
    newq = A0*Q(:,i) + A1*Qplus1(:,i) + (I-A0-A1)*QM + B*noise;
    newQ = [newQ, newq];
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
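The dynamic model applied by Pr1 above is a second-order autoregressive process around the mean shape: q_next = A0*q_prev + A1*q_curr + (I - A0 - A1)*q_bar + B*noise. For clarity, this sketch treats A0, A1 and the noise gain as scalars (the thesis uses 6x6 matrices over the full shape-space vector):

```python
import random

def predict(q_prev, q_curr, a0, a1, q_bar, b_scale):
    """One prediction step of a scalar second-order AR model: a
    deterministic drift toward (and around) the mean q_bar, plus
    Gaussian diffusion scaled by b_scale."""
    drift = a0 * q_prev + a1 * q_curr + (1 - a0 - a1) * q_bar
    return drift + b_scale * random.gauss(0.0, 1.0)
```

Note that when q_prev = q_curr = q_bar and the noise gain is zero, the state stays put: the mean shape is a fixed point of the drift.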
function mov = readfile(filename)
% Read an AVI sequence.
% input:  filename -- the name of the AVI file as ' .avi'
% output: mov -- avifile
mov = aviread(filename);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function Qplus1 = reS(pfMinus)
% Construct a new sample set at time t from the old one at time t-1.
% First generate a random number r, uniformly distributed.
% Then find, by binary subdivision, the smallest j for which ct-1(j) >= r.
% Finally set st(n) = st-1(j).
% input:  pfMinus -- sample set at t-1
% output: [q1T,q2T,q3T,q4T,q5T,q6T] -- sample set's states at time t
n = pfMinus.n;
Qplus1 = zeros(6,n);
for i = 1:n
    r = rand(1);
    smallest = binarySubdivide(pfMinus.c, r, n);
    Qplus1(1,i) = pfMinus.q1(smallest);
    Qplus1(2,i) = pfMinus.q2(smallest);
    Qplus1(3,i) = pfMinus.q3(smallest);
    Qplus1(4,i) = pfMinus.q4(smallest);
    Qplus1(5,i) = pfMinus.q5(smallest);
    Qplus1(6,i) = pfMinus.q6(smallest);
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [sS, ppp] = runPFQ3curves(mov,range,n,A0,A1,B,r,thresh,Temp1,Temp2,Temp3,Temp4,D)
% Run the particle filter to track the target object.
% This code can only track one target at a time.
% input:  mov -- AVI video sequence
%         range -- 12x1 vector indicating the initial guess
%                  of the range of the six parameters of Q
%         n -- nr. of particles
%         A -- dynamic model -- reflects drift
%         B -- dynamic model -- reflects diffusion by noise
%         r -- as in equation 45 in the thesis report
%         thresh -- the Canny edge threshold
%         Temp.CP -- the control points of the template
%         Temp.v -- the normalized knot vector on the template curve
%         Temp.d -- degree of the B-splines
%         D -- the length along the curve normal, within
%              which we need to check for edge points
sS = [];  % record the Q vector, which gives the transformation info in each frame
ppp = []; % used to record control points for the error measurement later
nrF = length(mov); % nr. of frames in the clip
% Generate n particles within the range
pf0 = pfIn(range,n); % get the initial particles at t=0
Q0 = [[pf0.q1]; [pf0.q2]; [pf0.q3]; [pf0.q4]; [pf0.q5]; [pf0.q6]];

% Initialization -- get the sample sets with corresponding weights for
% the 1st and 2nd frame.
pp = plotAll3(Q0,mov(1).cdata,Temp4,Temp2,Temp3); % plot all the particles
pause(1/50);

% Create a new AVI file to record the tracking result
tracker = avifile('reelg03.avi');
tracker.Compression = 'Indeo5'; % the compression format
tracker.fps = 25;               % frames per second

% For the 1st frame: measure, plot and get weights for the particles
[weight1, c1, QM] = Meas3(Q0, mov(1).cdata, r,thresh,Temp1,Temp2,Temp3,D);
pp = plotAll3(QM,mov(1).cdata,Temp4,Temp2,Temp3);
pause(1/50);
frame = getframe(gca);
tracker = addframe(tracker, frame); % write the new frames into this file
PF = upD1(Q0,weight1,c1,QM); % update the particles

% For the 2nd frame: measure, plot and get weights for the particles
%Qplus = reS(PF); % since we do not do any prediction at t=2, this resampling is not very meaningful
[weight2, c2, QM] = Meas3(Q0, mov(2).cdata, r,thresh,Temp1,Temp2,Temp3,D);
pp = plotAll3(QM,mov(2).cdata,Temp4,Temp2,Temp3);
frame = getframe(gca);
tracker = addframe(tracker, frame);
pause(1/50);
PFplus1 = upD1(Q0,weight2,c2,QM);
% Get the initialized particles at t=1 and t=2, i.e. PF and PFplus1
%...............................................................

% Run the condensation algorithm from the 3rd frame
for i = 3:50
    % step 1: resample the particle set at time t=t+2
    Qplus1 = reS(PFplus1);
    % step 2: prediction
    newQ = Pr1(PF,Qplus1,A0,A1,B,n,QM);
    pp = plotAll3(newQ,mov(i).cdata,Temp4,Temp2,Temp3);
    pause(1/50);
    % step 3: observation model -- measurement
    [weight, c, QM] = Meas3(newQ, mov(i).cdata, r,thresh,Temp1,Temp2,Temp3,D);
    sS = [sS,QM];
    % show the result
    pp = plotAll3(QM,mov(i).cdata,Temp4,Temp2,Temp3);
    pause(1/50);
    frame = getframe(gca);
    tracker = addframe(tracker, frame);
    ppp = [ppp,pp];
    % update particles at t=t and t=t+1
    [PF,PFplus1] = upD(newQ,PFplus1,weight,c,QM);
end
tracker = close(tracker);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [PF,PFplus1] = upD(newQ,PFplus1,weight,c,QM)
% Update the particles at time t+1.
% input:  newQ -- result from the prediction step at time t
%         PFplus1 -- particles at time t=t-1
%         weight -- result from the measurement
%         c -- result from the measurement
PF = PFplus1;
N = length(newQ(1,:));
PFplus1.n = N;
PFplus1.w = weight;
PFplus1.c = c;
for i = 1:N
    PFplus1.q1(i) = newQ(1,i);
    PFplus1.q2(i) = newQ(2,i);
    PFplus1.q3(i) = newQ(3,i);
    PFplus1.q4(i) = newQ(4,i);
    PFplus1.q5(i) = newQ(5,i);
    PFplus1.q6(i) = newQ(6,i);
end
PFplus1.bar = QM;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function pf = upD1(Q0,weight,c,QM)
% Update the particles at time t=1 and t=2.
% input:  Q0 -- the Q0 generated from pfIn
%         weight -- result of the weight from the measurement
%         c -- result of the cumulated weight from the measurement
N = length(Q0(1,:));
pf.n = N;
pf.w = weight;
pf.c = c;
for i = 1:N
    pf.q1(i) = Q0(1,i);
    pf.q2(i) = Q0(2,i);
    pf.q3(i) = Q0(3,i);
    pf.q4(i) = Q0(4,i);
    pf.q5(i) = Q0(5,i);
    pf.q6(i) = Q0(6,i);
end
pf.bar = QM;
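The measurement step in Meas3 above weights each particle by exp(-1/(2r)*||L||), where L collects the nearest-edge distances of the particle's curve sample points, then normalizes the weights and accumulates them for the resampler. A Python sketch (one list of per-sample-point distances per particle):

```python
import math

def measure_weights(distances, r=0.3):
    """Likelihood weighting for a particle filter: weight is
    exp(-1/(2r) * ||L||) per particle, normalized over all particles,
    with the cumulative sums returned for binary-subdivision resampling."""
    w = [math.exp(-0.5 / r * math.sqrt(sum(d * d for d in L)))
         for L in distances]
    total = sum(w)
    if total != 0:
        w = [x / total for x in w]
    # cumulative weights, used by the binary-subdivision resampler
    c, cum = [], 0.0
    for x in w:
        cum += x
        c.append(cum)
    return w, c
```

A particle whose curve lies on the image edges (all distances zero) dominates the normalized weights; the last cumulative weight is 1 by construction.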
4. Functions used for error measurement of condensation

function [MSE, MSEperF, ST] = errorCP(cp1,cp2)
% Calculate the mean squared error.
% The tracking result is compared to the manually selected result.
% input:  cp1 -- tracker result
%         cp2 -- manually tracked result
% output: MSE -- average error of a sequence
%         MSEperF -- mean square error per frame
%         ST -- standard deviation of MSE

% The length of the sequence is 48, since we do not want to take into
% account the first 2 frames, which are not really tracked by the
% particle filter.
MSEperF = zeros(1,48);
for i = 1:48
    tt = (cp1(:,(2*i-1):(2*i))-cp2(:,(2*i-1):(2*i))).*(cp1(:,(2*i-1):(2*i))-cp2(:,(2*i-1):(2*i)));
    MSEperF(1,i) = sum(sum(tt,2))/2;
end
MSE = mean(MSEperF);
ST = std(MSEperF);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [A0,A1,B] = estimateAB(setQ)
% Estimate A0, A1, and B from the history (Q1,Q2,...,Qt).
% input:  setQ = [Q1, Q2, ..., Qn] 6xn matrix
% output: A0, A1, B -- coefficients for the dynamic model
[h,m] = size(setQ);
S = zeros(6,6,3*3); % S = [S(0,0), S(0,1), S(0,2), ..., S(2,2)]
for i = 0:2
    for j = 0:2
        sumQ = zeros(6);
        for k = 1:m-2
            tempQ = setQ(1:h,k+i)*setQ(1:h,k+j)';
            sumQ = tempQ + sumQ;
        end
        S(1:6,1:6,i*3+j+1) = sumQ;
    end
end
A0 = zeros(6);
A1 = zeros(6);
if det(S(1:6,1:6,4))~=0 & det(S(1:6,1:6,5))~=0
    S0 = S(1:6,1:6,1)*inv(S(1:6,1:6,4)) - S(1:6,1:6,2)*inv(S(1:6,1:6,5));
    if det(S0)~=0
        A0 = (S(1:6,1:6,7)*inv(S(1:6,1:6,4)) - S(1:6,1:6,8)*inv(S(1:6,1:6,5)))*inv(S0)
    end
end
if det(S(1:6,1:6,1))~=0 & det(S(1:6,1:6,2))~=0
    S1 = S(1:6,1:6,4)*inv(S(1:6,1:6,1)) - S(1:6,1:6,5)*inv(S(1:6,1:6,2));
    if det(S1)~=0
        A1 = (S(1:6,1:6,7)*inv(S(1:6,1:6,1)) - S(1:6,1:6,8)*inv(S(1:6,1:6,2)))*inv(S1)
    end
end
Z = S(1:6,1:6,9) + A1*S(1:6,1:6,5)*A1' + A0*S(1:6,1:6,1)*A0' ...
    - S(1:6,1:6,8)*A1' - S(1:6,1:6,7)*A0' + A1*S(1:6,1:6,4)*A0' ...
    - A1*S(1:6,1:6,6) - A0*S(1:6,1:6,3) + A0*S(1:6,1:6,2)*A1';
C = Z/(m-2);
B = sqrtm(C)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [min, max] = EstMean(cp1,cp2,trackmse)
% Quantify the manually tracked error and the confidence interval.
% The standard deviation is assumed to come from two manual clickings.
% Suppose a normal distribution.
% When the distribution is not known (Tchebycheff inequality):
%   zu = 3.16, u = 0.90
%   zu = 4.47, u = 0.95
% input:  cp1 -- manually tracked result 1 (gold standard)
%         cp2 -- manually tracked result 2
%         trackmse -- mean error of the tracked sequence
% output: min -- minimum of the estimated mean
%         max -- maximum of the estimated mean
zu = 4.47;
n = 18;
MSEperF = zeros(1,n);
for i = 3:20
    tt = (cp1(:,(2*i-1):(2*i))-cp2(:,(2*i-1):(2*i))).*(cp1(:,(2*i-1):(2*i))-cp2(:,(2*i-1):(2*i)));
    MSEperF(1,i-2) = sum(sum(tt,2))/2;
end
s = std(MSEperF);
min = trackmse - zu*s/sqrt(n);
max = trackmse + zu*s/sqrt(n);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [min, max] = EstMean2(cp1,cp2)
% Quantify the manually tracked error and its confidence interval.
% Suppose one clicking result is the gold standard.
% The standard deviation is assumed to come from two manual clickings.
% The mean error is taken from the same set.
% The output is the confidence interval of the error.
% Assume zu = 2, meaning u = 0.975.
% When the distribution is not known (Tchebycheff inequality):
%   zu = 3.16, u = 0.90
155
% zu = 4.47, u = 0.95 % input cp1 -- manually tracked result 1 (golden standrad) % cp2 -- manually tracked result 2 % input can be a sequence of tracker tracking % output: min -- minimum of the estimated mean % max -- maximum of the estimated mean zu = 4.47; n = 18; MSEperF=zeros(1,n); for i=3:20 tt=(cp1(:,(2*i-1):(2*i))-cp2(:,(2*i-1):(2*i))).*(cp1(:,(2*i-1):(2*i))-cp2(:,(2*i-1):(2*i))); MSEperF(1,i)=sum(sum(tt,2))/2; end MSE = mean(MSEperF); s = std(MSEperF); min = MSE+zu*s/sqrt(n); max = MSE+zu*s/sqrt(n); %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% function [CPP, Q, Cxy]=findQ(filename,n1,n2,CP) % Use manual tracking to get the Q and the control points % used only in the experiment to estimate A0,A1,B and do the error % measurement % input: filename -- the video sequence % n1,n2 -- the starting and the ending frame of the sequence % that we want to manually track % CP -- the template control points % output:CPP -- the manually tracked control points for each frame % Q -- Q for each frame % Cxy -- the coordinates of the center of the control points % for each frame CPP=[]; Q=[]; Cxy=[];%the center of the template for i=n1:n2 imshow(filename(i).cdata); P=ginput; hold on v = chordLengthPara(P,2); cp = find_Control_Points_byCL(v,P, 2);%get the control points for one frame CPP=[CPP, cp];%record the control points for n1:n2 frames cxy=sum(cp)/4;% get the center of the template Cxy=[Cxy;cxy];% recorde the ceter points %get Q %first get blending function n = length(P); B = zeros(n); B(1,1) = 1; % the first control point = first knot
156
B(end, end) = 1; % the last control point = last knot for j = 2:n-1 [BF, DB1]= BleFuncGivenVp(v(2+j-1), v, n, 2); B(j,:) = BF; end %then calculate W and H, then M Nc = size(cp,1); One = ones(Nc,1); Zero = zeros(Nc,1); W = [One Zero CP(:,1) Zero Zero CP(:,2); Zero One Zero CP(:,2) CP(:,1) Zero]; sumBs = B'*B; H = [sumBs, zeros(Nc, Nc); zeros(Nc, Nc), sumBs]; M = inv(W'*H*W)*W'*H; % calculate Q q = M *[cp(:,1)-CP(:,1); cp(:,2)-CP(:,2)]; Q=[Q,q]; i end %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% function plotMSEperF(MSEperF) % Plot the mean square error per frame in logarithmic scale semilogy(MSEperF,'b-'); hold off %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% function CP=temCPforError(im) %Used only for the experiments %Get the template control points coordinates %Use CP to calculate Q for training data to calculate A,B. %input: im --template image %output: CP --the template curve's control points(the curve used % to do the error measurement) % the control points are transformed so that the 1st points is at the % coordinate origin imshow(im); hold on axis auto P=ginput; v = chordLengthPara(P,2); m = size(P,1); for i= 1:m plot(P(i,1), P(i,2), 'y*'); end CP1 = find_Control_Points_byCL(v,P, 2); n=size(CP1,1); CP=zeros(n,2); for j=1:n CP(j,1)=CP1(j,1)-CP1(1,1); CP(j,2)=CP1(j,2)-CP1(1,2); end
157
for k=1:n plot(CP1(k,1), CP1(k,2), 'r*'); end x = []; y = []; for u = 0:1/20:1 [BF, DB1] = BleFuncGivenV2(u, v, 2); r = BF*CP1; x = [x, r(1)]; y = [y, r(2)]; end line(x, y); hold off
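For reference, the per-frame error of errorCP and the Tchebycheff confidence interval of EstMean2 can be written compactly outside MATLAB. The sketch below is an illustrative NumPy translation (the function names are mine), assuming the same column layout: with 0-based indexing, the x/y coordinates of frame i occupy columns 2i and 2i+1.

```python
import numpy as np

def mse_per_frame(cp1, cp2):
    """Mean squared error per frame between two tracked control-point
    sequences. cp1, cp2: (n_points, 2*n_frames) arrays, with the x/y
    columns of frame i at columns 2i and 2i+1 (0-based)."""
    n_frames = cp1.shape[1] // 2
    diff = cp1 - cp2
    # per frame: sum of squared x/y differences over all control
    # points, halved, as in the thesis code (errorCP)
    return np.array([np.sum(diff[:, 2*i:2*i+2] ** 2) / 2
                     for i in range(n_frames)])

def chebyshev_ci(mse, zu=4.47):
    """Confidence interval for the mean error when the error
    distribution is unknown (Tchebycheff inequality; zu = 4.47
    corresponds to u = 0.95, as in EstMean2). ddof=1 matches the
    n-1 normalization of MATLAB's std."""
    n = len(mse)
    half = zu * np.std(mse, ddof=1) / np.sqrt(n)
    m = np.mean(mse)
    return m - half, m + half
```

Note that `ddof=1` is needed because NumPy's `std` divides by n by default, while MATLAB's divides by n-1.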
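The determinant-guarded normal-equation algebra of estimateAB can equivalently be obtained from a single least-squares solve. The following NumPy sketch fits the same second-order dynamic model Q_t = A1 Q_{t-1} + A0 Q_{t-2} + B w_t; the function name estimate_dynamics is mine, and `lstsq` replaces the explicit S-matrix computation of the thesis code.

```python
import numpy as np

def estimate_dynamics(Q):
    """Least-squares fit of Q_t = A1 @ Q_{t-1} + A0 @ Q_{t-2} + B @ w_t
    from a history of deformation vectors.
    Q: (d, n) array, one column per frame. Returns A1, A0, B."""
    d, n = Q.shape
    Y = Q[:, 2:]                          # Q_t for t = 2 .. n-1
    X = np.vstack([Q[:, 1:-1],            # Q_{t-1}
                   Q[:, :-2]])            # Q_{t-2}
    # solve Y = [A1 A0] X in the least-squares sense
    coef, *_ = np.linalg.lstsq(X.T, Y.T, rcond=None)
    A1, A0 = coef.T[:, :d], coef.T[:, d:]
    # B B' = covariance of the model residual
    resid = Y - A1 @ Q[:, 1:-1] - A0 @ Q[:, :-2]
    C = resid @ resid.T / (n - 2)
    # symmetric matrix square root via eigendecomposition
    # (clipping guards against tiny negative eigenvalues)
    w, V = np.linalg.eigh(C)
    B = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
    return A1, A0, B
```

For noise-free data generated by known A1 and A0, the fit recovers the coefficients exactly whenever the stacked regressor matrix has full rank, and B comes out as (numerically) zero.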
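Both findQ and temCPforError rely on chordLengthPara, defined elsewhere in this appendix, to assign B-spline parameter values to the clicked points. As a point of reference, a typical chord-length parameterization assigns each point a parameter proportional to the accumulated distance along the polyline, scaled to [0, 1]; a minimal NumPy sketch (the function name is mine) might look like:

```python
import numpy as np

def chord_length_param(P):
    """Chord-length parameterization of a point sequence.
    P: (m, 2) array of points. Returns m parameter values in [0, 1],
    proportional to cumulative distance along the polyline."""
    seg = np.linalg.norm(np.diff(P, axis=0), axis=1)  # segment lengths
    t = np.concatenate([[0.0], np.cumsum(seg)])       # cumulative length
    return t / t[-1]                                  # normalize to [0, 1]
```

This is only the standard textbook formulation; the thesis version additionally takes the spline degree as a second argument and may build the knot vector differently.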