8/2/2019 Tracking Revised)
Object Tracking by Exploiting Adaptive Region-wise Linear
Subspace Representations and Adaptive Templates in an Iterative
Particle Filter
Ming-Che Ho, Cheng-Chin Chiang, Ying-Yu Su
Department of Computer Science and Information Engineering, National Dong Hwa University, Shoufeng,
Hualien, Taiwan, 974
Abstract
Aiming at tracking visual objects under harsh conditions, such as partial occlusions, illumination changes, and appearance variations, this paper proposes an iterative particle filter incorporated with an adaptive region-wise linear subspace (RWLS) representation of objects. The iterative particle filter employs a coarse-to-fine scheme to decisively generate particles that convey better hypothetical estimates of the tracking parameters. As a result, higher tracking accuracy can be achieved by aggregating the good hypothetical estimates from the particles. Accompanying the iterative particle filter, the RWLS representation is specially designed to tackle the partial occlusion problem, which often causes tracking failure. Moreover, the RWLS representation is made adaptive by exploiting an efficient incremental updating mechanism, which adapts the RWLS to gradual changes in object appearances and illumination conditions. Additionally, we also propose an adaptive mechanism that continuously adjusts the object templates so that the varying appearances of tracked objects can be well handled. Experimental results demonstrate that the proposed approach achieves better performance than other related prior art.
Keywords: object tracking, region-wise linear subspace (RWLS), iterative particle filter, incremental PCA
Corresponding author: Cheng-Chin Chiang. Tel.: +886-3-8634027; fax: +886-3-8634010. Email address: [email protected] (Cheng-Chin Chiang)
Preprint submitted to Pattern Recognition Letters July 8, 2011
1. Introduction
Visual object tracking is a core task in most computer vision applications and has been intensively researched over the past decade. Typical objects to be tracked include faces, hands, cars, and human bodies. A wide spectrum of potential applications, such as tele-conferencing, video surveillance, human-machine interaction, and intelligent transportation systems, has been developed for our daily life.
Visual object tracking is challenging due to the intrinsic and extrinsic variations in tracking conditions. Intrinsic variations, the variations that appear on the tracked objects themselves, include dynamic changes of object poses, geometries, colors, and textures. In contrast, extrinsic variations are induced by environmental conditions, including illumination changes, cluttered backgrounds, and partial occlusions of the tracked objects. Whichever kind of variation appears, the difficulty of object tracking increases if the tracking is based on matching object appearances.
Tracking by matching object appearances, sometimes termed template matching, is a common approach to object tracking ?. Under different poses and illumination, the object appearance varies continuously during tracking. Hence, tracking objects with one or several fixed templates of object appearances is not feasible in practical applications. An adaptive template representation becomes essential for handling the varying object appearances during tracking.
Besides varying poses and illumination, partial occlusion of a tracked object is another major cause of tracking failure. A partial occlusion causes a discrepancy between the appearance of the occluded object and the template, which often fails template-based tracking. Therefore, handling partial occlusions demands a flexible representation of object appearances that is highly tolerant of missing local parts of object appearances.
Due to the dynamic pose changes of an object during its motion, an object tracking method is usually required to estimate the pose of the tracked object. The pose of an object is usually characterized with parameters relating to the position, scale/dimension,
and rotation of the object. Hence, the tracking problem is sometimes referred to as a pose estimation problem. Devising a method that accurately estimates the pose parameters of the tracked object under the challenges of various appearance variations is the kernel task and the final goal of visual object tracking research.
Motivated by the technical demands and research goals mentioned above, the work presented in this paper aims at developing a robust and effective solution to the object tracking problem. This solution encompasses a method to estimate the pose parameters of visual objects and an adaptive, flexible representation to tackle the problems of varying illumination and partial occlusions. The proposed method for estimating object poses is an iterative particle filter, which offers improved performance over other particle filter methods. The proposed object representation is an RWLS representation, which enables the tracking of partially occluded objects. To adapt the RWLS representation to intrinsic and extrinsic variations during tracking, an incremental updating scheme is also designed to update the bases of the subspace efficiently with each up-to-date input. Besides the adaptive linear subspace representation, we also devise an incremental updating mechanism to adapt the object template to varying object appearances. With the iterative particle filter, the adaptive RWLS representation, and the adaptive object template, the problem of pose estimation under partial occlusions, illumination changes, and varying object appearances can be handled very well.
The rest of this paper is organized as follows. Section 2 presents a brief survey of related work. Section 3 then introduces the adaptive RWLS representation of objects and the adaptive incremental updating scheme of the linear subspace for dynamic appearance variations. Section 4 presents a preliminary review of the conventional particle filter and then proposes the iterative particle filter method for pose estimation. The mechanisms to handle partial occlusions and the adaptive templates for handling varying object appearances are also described in this section. Section 5 shows the experimental results and compares the performance with those of other related prior art. Section 6 concludes the paper.
2. Related Work
Several studies (????) have shown the high efficacy of the linear subspace in many applications of object tracking and recognition. ? proposed a pre-trained view-based eigenspace representation for object tracking, called Eigentracking. The appearance variations of objects with different poses are captured, to a limited extent, by samples of many different views. ? developed an efficient affine tracking scheme to deal with changing illumination by exemplar training under a variety of lighting conditions. However, these approaches may still encounter exceptional cases for untrained poses or illumination conditions. Moreover, these methods require the storage of, and the effort of collecting, a large set of samples for building the linear subspace.
To reduce the cost of storage and the effort of off-line linear subspace training, online subspace learning (???) offers an alternative solution. The key merit of online subspace learning methods is to incrementally update the subspace whenever a new sample becomes available. The updating requires no access to the past samples, and thus no storage for accumulating the past data is necessary. Following this principle, ? presented an incremental principal component analysis (PCA) to update the linear subspace. ? employed the incremental PCA to update the appearance model for face recognition and tracking. However, the incremental PCA method can build a biased subspace if the new samples contain outliers or out-of-date noise. ? developed an efficient incremental updating algorithm which incorporates a forgetting factor to wear down the influence of older samples. Their empirical results show that the incremental method can tolerate larger pose variations and illumination changes. However, the problem of partial occlusions still cannot be well solved.
As to pose estimation, previous work can be divided into two categories: the deterministic approach and the stochastic approach. The deterministic approach, including the template-based algorithms (??) and the mean-shift algorithm (??), estimates the object poses without introducing any random process into the estimation. One typical example of the template-based algorithms is the algorithm proposed by ?, which revises the Lucas-Kanade optical flow algorithm (?) into an efficient inverse compositional (IC) algorithm. This algorithm estimates the poses of objects that undergo different motions through gradient-descent error minimization. The major weakness of gradient-descent error minimization is that it can become trapped in local minima, which may lead to undesirable tracking results. This method is also error-prone when tracking objects with larger appearance changes. ? presented the mean-shift tracking algorithm using the Bhattacharyya distance to calculate the similarity between the color density distributions of the template and the tracked object. Since the color density distribution is a global visual feature that is not sensitive to local distortion of object appearances, the mean-shift algorithm can tolerate partial occlusions to some extent. Nonetheless, it is difficult to attain accurate estimation of object poses with such a coarse-level visual feature. Some extensions of the mean-shift algorithm (??) capture spatial information by calculating the means and covariances corresponding to the color bins, making the pose estimation more accurate and robust.
In contrast to the deterministic approach, the stochastic approach to object tracking typically estimates the object pose parameters, which are usually modeled with random variables, through a random process. The instance values of all modeled pose parameters at a certain moment are collectively called the state of the tracked object. The Kalman filter is a well-known method for state space estimation based on a linear stochastic model of system dynamics. The Kalman filter estimates the true values of the measurements by predicting a value, estimating the uncertainty of the predicted value, and computing a weighted average of the predicted value and the measured value. The particle filter is a generalized extension of the Kalman filter because it assumes no linearity in the system dynamics. In addition, the random noise in the stochastic process can be non-Gaussian. The pioneering work on the particle filter is the CONDENSATION algorithm proposed by ?. Some extensions of the particle filter have also been developed to enhance the efficiency and effectiveness of visual object tracking. ? further proposed the ICONDENSATION algorithm, which incorporates an auxiliary blob tracker into the CONDENSATION algorithm. The tracking of the auxiliary blob tracker is
based on well-segmented regions of interest with homogeneous colors. Unfortunately, another challenging problem encountered in this method is how to robustly segment the object into desirable regions.
The estimation results of the particle filter depend significantly on a set of randomly generated particles, each carrying a hypothetical instance of the estimated state. The final state estimation is aggregated from the state instances of all particles. To achieve better aggregation, the number of particles is usually kept large so that the bad influence of outlier particles can be reduced. However, a larger number of particles does not necessarily guarantee better hypothetical instances on the particles. Moreover, the computational burden of state estimation also increases with the number of generated particles. Therefore, a good design is to devise a mechanism that decisively improves the goodness of the state instances conveyed by the particles, so that the number of particles required to achieve accurate state estimation can be reduced.
3. The Adaptive Region-wise Linear Subspace (RWLS) Representation
3.1. RWLS Representation
Principal component analysis (PCA) is a well-known technique for linear subspace representation. By PCA, a transformation $U$ can be derived to transform a data vector $x$ in a higher-dimensional space into another data vector $c$ in a lower-dimensional space, i.e., $x = \bar{x} + Uc$, where $\bar{x}$ is the mean vector of the collected data set. Inspired by the region-based face recognition approach proposed by ?, we adopt a region-wise linear subspace representation for object appearances. By partitioning an object appearance into several regions, each represented with a linear subspace, a partially occluded object can still be tracked as long as one or more regions are not occluded. Thus, the robustness of tracking a partially occluded object can be effectively enhanced by the region-wise representation.
Concerning the way to partition the object appearance into regions, a simple and regular partitioning scheme is preferred for easier subimage cropping and region tracking. Hence, the proposed region-wise representation exploits the simplest way to uniformly partition
each object appearance into $k \times k$ rectangular regions, where $k$ is a design choice. For ease of reference, we use the notation $R_k$ to denote the scheme of $k \times k$ region partitioning. For example, the partitioning scheme $R_1$ treats the whole object appearance as a single region, while $R_2$ partitions the object appearance into $2 \times 2$ equal-size regions.
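As a concrete illustration, the $R_k$ partitioning can be sketched in a few lines of NumPy; the function name and the divisibility assumption below are ours, not the paper's.

```python
import numpy as np

def partition_regions(appearance, k):
    """Split an object appearance (an H x W array) into k x k equal-size
    rectangular regions, as in the R_k partitioning scheme.
    For simplicity this sketch assumes H and W are divisible by k."""
    h, w = appearance.shape
    rh, rw = h // k, w // k
    return [appearance[r * rh:(r + 1) * rh, c * rw:(c + 1) * rw]
            for r in range(k) for c in range(k)]
```

For instance, applying $R_2$ to a $32 \times 32$ appearance yields four $16 \times 16$ regions, each of which is then modeled by its own linear subspace.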
3.2. Incremental Subspace Updating
In PCA, a linear subspace can be built by solving the eigenproblem on the covariance matrix computed from a collection of samples. However, such batch processing on a fixed set of samples cannot well model the appearance variations of a moving object over time. Provided that the frame rate of the video camera is fast enough, e.g., more than 30 frames per second, we can assume that all appearance changes of objects occur gently. To adapt the RWLS to the up-to-date conditions, we need to recompute the eigenvectors of the new regional covariance matrices updated with newly arriving samples. Nonetheless, this recomputation of covariance matrices and eigenvectors would require storing all past samples and would also demand a high cost in time.
? proposed an efficient method to incrementally update the eigenvectors without storing the past samples. This method updates the current eigenvectors using only the newest sample. One problem with this method is that the subspace may be improperly updated by incoming outliers or noisy samples. To avoid this problem, we propose a revised incremental subspace updating scheme, called the weighted incremental PCA (WI-PCA). The merit of the WI-PCA is to update the subspace with a newly arriving sample only if this sample is reliable enough. To this end, each incoming sample is associated with a weight value, which is inversely related to the residual computed on approximating the sample by its lower-dimensional subspace representation. If the residual is large, meaning that this sample is very likely an outlier or noise with respect to the current subspace, then the associated weight value is small. In what follows, we present the formal derivation of the WI-PCA.
Let $C_N$ be the covariance matrix computed from $\{x_i\}_{1 \le i \le N}$, and $C_{N+1}$ be the new covariance matrix obtained after adding a new sample $x_{N+1}$. Similarly, the mean vectors before and after adding the new sample $x_{N+1}$ are denoted by $\bar{x}_N$ and $\bar{x}_{N+1}$, respectively. Both $C_{N+1}$ and $\bar{x}_{N+1}$ can be derived recursively as follows:

$$\bar{x}_{N+1} = \frac{1}{\sum_{i=1}^{N+1} \omega_i}\left(\Big(\sum_{i=1}^{N} \omega_i\Big)\bar{x}_N + \omega_{N+1} x_{N+1}\right), \quad (1)$$

$$C_{N+1} = \frac{\sum_{i=1}^{N} \omega_i}{\sum_{i=1}^{N+1} \omega_i}\, C_N + \frac{\omega_{N+1} \sum_{i=1}^{N} \omega_i}{\left(\sum_{i=1}^{N+1} \omega_i\right)^2}\, \Delta x\, \Delta x^T, \quad (2)$$

where $\Delta x = x_{N+1} - \bar{x}_N$, and $\omega_i$ is the weight associated with $x_i$. Note that the above computations involve no past samples in $\{x_i\}_{1 \le i \le N}$. The value of the term $\sum_{i=1}^{N} \omega_i$ can also be incrementally updated and stored in a variable, say $\Omega_N$, on the arrival of each new sample. In effect, this term sums up the weights of the past samples. Here, we introduce a forgetting factor $f$, with $0 < f \le 1$, into the adaptive update of $\Omega_N$, i.e., $\Omega_{N+1} = f\Omega_N + \omega_{N+1}$. The forgetting factor lowers the importance of the past samples. Accordingly, the adaptive computation of $\bar{x}_{N+1}$ and $C_{N+1}$ in Eq. (1) and Eq. (2) can be rewritten as

$$\bar{x}_{N+1} = \frac{1}{\Omega_{N+1}}\left(f\Omega_N \bar{x}_N + \omega_{N+1} x_{N+1}\right), \quad (3)$$

$$C_{N+1} = \frac{f\Omega_N}{\Omega_{N+1}}\, C_N + \frac{\omega_{N+1} f\Omega_N}{\Omega_{N+1}^2}\, \Delta x\, \Delta x^T. \quad (4)$$

In physical meaning, the associated weight $\omega_i$ in Eq. (1) and Eq. (2) differentiates the influence of the sample $x_i$ on the PCA process. Noisy or outlier samples are assigned smaller weights to reduce their influence. As mentioned previously, we relate the weight value of a sample $x_i$ to its approximation residual under the current linear subspace. According to PCA, the approximation residual can be computed as $r_i = x_i - \bar{x} - Uc_i$. Accordingly, the weight $\omega_i$ of $x_i$ is set as $\omega_i = \exp(-\kappa \|r_i\|)$, where $\kappa$ is a constant controlling the rate of change of the weight with respect to the magnitude of the residual; we set it to 1 in our implementation.
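The update rules of Eqs. (3)-(4), together with the residual-based weight, can be sketched as below; the parameter names `f` and `kappa` follow the text, while the function interface itself is an assumption for illustration.

```python
import numpy as np

def wipca_update(mean, cov, Omega, x_new, U, f=0.95, kappa=1.0):
    """One weighted incremental update of the sample mean and covariance
    (a sketch of Eqs. (3)-(4)).
    mean: current mean; cov: current covariance C_N;
    Omega: accumulated weight sum; U: current subspace basis (columns)."""
    # Residual of the new sample w.r.t. the current subspace
    c = U.T @ (x_new - mean)
    r = x_new - mean - U @ c
    w = np.exp(-kappa * np.linalg.norm(r))    # small weight for outliers
    Omega_new = f * Omega + w                 # forgetting factor f
    dx = x_new - mean
    mean_new = (f * Omega * mean + w * x_new) / Omega_new
    cov_new = (f * Omega / Omega_new) * cov \
        + (w * f * Omega / Omega_new ** 2) * np.outer(dx, dx)
    return mean_new, cov_new, Omega_new, w
```

A sample lying exactly in the current subspace has zero residual and therefore receives the maximal weight of 1, whereas a far-off outlier contributes almost nothing to the updated mean and covariance.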
Let $U$ be the matrix whose columns comprise the current set of eigenvectors obtained from the PCA on $N$ samples, $\{x_i\}_{1 \le i \le N}$. When the new incoming sample $x_{N+1}$ becomes available, the new set of eigenvectors must be recomputed and stored in the matrix $U'$. The underlying eigenproblem of the PCA can then be formulated as

$$C_{N+1} U' = U' \Lambda' \quad (5)$$

where $\Lambda'$ is a diagonal matrix whose diagonal elements are the eigenvalues corresponding to the eigenvectors in $U'$. Due to the orthogonality of both $U$ and $U'$, the new eigenvectors after adding the new sample can be considered a rotated version of the old eigenvectors, i.e., $U' = RU$, where $R$ is an orthonormal rotation matrix. Equivalently, this can be written as

$$U' = U R' \quad (6)$$

where $R' = U^T R U$ is another rotation matrix. From Eqs. (5) and (6), we have

$$U^T C_{N+1} U R' = R' \Lambda'. \quad (7)$$

Consequently, Eq. (7) is the equation of a new eigenproblem: the solution for the rotation matrix $R'$ is exactly the set of eigenvectors of the composite matrix $U^T C_{N+1} U$. Since this composite matrix has a much lower dimension than the matrix $C_{N+1}$, the computational complexity of deriving its eigenvectors is also much lower. After finding the rotation matrix $R'$ from the composite matrix, the new eigenvectors can be obtained from Eq. (6).
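A minimal sketch of this rotation-based update, assuming `U` holds orthonormal basis columns and using `numpy.linalg.eigh` for the small symmetric eigenproblem:

```python
import numpy as np

def update_eigenvectors(U, C_new):
    """Recompute the subspace eigenvectors after a covariance update by
    solving the small eigenproblem of the composite matrix U^T C U
    (Eq. (7)): R' contains its eigenvectors, and U' = U R' (Eq. (6))."""
    composite = U.T @ C_new @ U            # much smaller than C_new itself
    evals, R = np.linalg.eigh(composite)   # symmetric matrix -> eigh
    order = np.argsort(evals)[::-1]        # sort by decreasing eigenvalue
    return U @ R[:, order], evals[order]
```

Because $R'$ is orthonormal, the rotated basis $U' = UR'$ keeps orthonormal columns, so the subspace projection in later frames remains well defined.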
4. Visual Object Tracking by Iterative Particle Filter
4.1. Particle Filter
A particle filter formulates the tracking problem with a state prediction equation,

$$x_k = f(x_{k-1}, u_k) \quad (8)$$

and a measurement (or observation) function,

$$z_k = h(x_k, n_k) \quad (9)$$
where $x_k \in \mathbb{R}^n$ and $z_k \in \mathbb{R}^d$ are respectively the vector of state parameters and the measurement (or observation) at time $k$, $u_k$ and $n_k$ are independent and identically distributed (i.i.d.) random vectors of process noise and measurement noise, respectively, and the functions $f(\cdot)$ and $h(\cdot)$ respectively define a prediction model of the state parameter vector and a measurement function with respect to the given state parameter vector. The measurement is generally modeled by a likelihood function $p(z_k|x_k)$.
The Sequential Importance Resampling (SIR) method (??), a well-known state parameter estimation method for the particle filter, approximates the posterior expectation $E[x_k|z_{1:k}]$ by aggregating a set of weighted particles $S_k = \{(x_k^i, w_k^i)\}_{1 \le i \le N_s}$, where the weight $w_k^i$ approximates the relative posterior probability of the stochastically generated particle and satisfies $\sum_{i=1}^{N_s} w_k^i = 1$. The aggregated state estimation is

$$\hat{x}_k = E[x_k|z_{1:k}] \approx \sum_{i=1}^{N_s} w_k^i x_k^i. \quad (10)$$

With respect to the measurement $z_k^i$ induced by the hypothetical estimate $x_k^i$ of particle $i$, the particle weight $w_k^i$ for the current frame turns out to be

$$w_k^i \propto w_{k-1}^i \, p(z_k^i|x_k^i). \quad (11)$$
One common problem with the particle filter is the degeneracy problem, which occurs after several iterations of re-weighting (?): all but one particle may end up with negligible weight, implying that most computational effort is wasted on updating particles that contribute nothing to the approximation of $p(x_k|z_{1:k})$. A remedial operation is to perform a resampling process on the particles if the number of effective particles is too small. To determine the appropriate timing for the resampling process, a criterion for the degeneracy is defined as $N_{eff} = 1/\sum_{i=1}^{N_s}(w_k^i)^2$, which logically indicates the number of particles with effective weights. The resampling process is thus initiated if $N_{eff} \le N_T$, where $N_T$ is a predefined threshold. In the resampling process, each ineffective particle is replaced with a new particle which carries a state instance stochastically perturbed from an
existing particle with a higher particle weight. All new particles after the resampling process have equal weight (i.e., $w_{k-1}^i = 1/N_s$), implying that $w_k^i$ relates only to $p(z_k^i|x_k^i)$ according to Eq. (11).
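The degeneracy criterion and the resampling step can be sketched as follows; multinomial resampling is one common choice, and the conventional filter reviewed here does not prescribe a specific sampling scheme.

```python
import numpy as np

def effective_particles(weights):
    """Degeneracy criterion N_eff = 1 / sum_i (w_k^i)^2, computed on
    normalized weights."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def resample(particles, weights, rng):
    """Multinomial resampling: draw N_s particles with probability
    proportional to their weights; survivors all get weight 1/N_s."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```

With uniform weights, $N_{eff}$ equals the number of particles; in the fully degenerate case, where a single particle carries all the weight, it drops to 1, triggering the resampling when it falls below $N_T$.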
4.2. The Iterative Particle Filter
In the SIR particle filter, the particle weights are related to the likelihood value $p(z_k^i|x_k^i)$. In our method, we design the weight of each particle to be proportional to a criterion function $G(z_k^i|x_k^i)$, which quantitatively defines the goodness of the observation $z_k^i$ with respect to the hypothetical state parameter $x_k^i$. Instead of employing the conditional resampling strategy of the SIR particle filter, we perform the resampling process unconditionally at every frame $k$. Furthermore, when tracking on each frame, a filtering process and the resampling process are iteratively performed for a fixed number of iterations to enhance the goodness of the surviving particles. This is why we call the proposed particle filter an iterative particle filter.
4.2.1. Models of State Transition and Measurement
For tracking an object with a particle filter, we represent the state parameters with the location and the dimension of the tracked object on each video frame. Hence, the state parameters are encoded in a vector $x = (x, y, w, h)^T$, defining the object's bounding box, which has its upper-left corner situated at $(x, y)$ and a dimension of $w \times h$ pixels on the frame.

As formulated in Eqs. (8) and (9), a particle filter requires a state transition model and a measurement model. To characterize the object motion, a discrete equal-velocity equation is adopted for modeling the position parameters of the object, i.e., $v_k = p_{k-1} - p_{k-2}$, where $p_k = (x_k, y_k, 0, 0)^T$ denotes the position of the object's bounding box at frame $k$. For simplicity, the perturbation $u_k$ in Eq. (8) is defined as a random vector $u_k = (\Delta x_k, \Delta y_k, \Delta w_k, \Delta h_k)^T$. This vector adds small random deviations, $\Delta x_k$, $\Delta y_k$, $\Delta w_k$, and $\Delta h_k$, to the estimated position and dimension of the object. Combining the velocity and
the random perturbations, the state transition model can be defined as

$$x_k = x_{k-1} + v_k + u_k. \quad (12)$$

Given the state parameter vector $x_k$, the measurement on the input frame $I_k$ is defined as the appearance of the tracked object, i.e., $z_k = I_k(x_k)$, where $I_k(x_k)$ denotes the subimage enclosed by the bounding box specified by $x_k$. Here, no random noise, such as the $n_k$ in Eq. (9), is assumed for the measurement model.
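A sketch of this transition model, assuming zero-mean Gaussian perturbations (the text only requires small random deviations, so the noise model and standard deviations below are illustrative assumptions):

```python
import numpy as np

def propagate_state(x_prev, x_prev2, rng, sigma=(2.0, 2.0, 1.0, 1.0)):
    """State transition of Eq. (12): x_k = x_{k-1} + v_k + u_k, with the
    equal-velocity term v_k = p_{k-1} - p_{k-2} applied to the position
    entries only, plus small random perturbations u_k."""
    v = np.zeros(4)
    v[:2] = x_prev[:2] - x_prev2[:2]   # velocity of (x, y) only
    u = rng.normal(0.0, sigma)         # (dx, dy, dw, dh)
    return x_prev + v + u
```

With the noise switched off, the model simply extrapolates the bounding-box position at constant velocity while keeping the dimension unchanged.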
4.2.2. Iterative Filtering and Resampling
The particle weight in our design is computed from a quantitative function $G(z_k^i|x_k^i)$ which evaluates the goodness of the hypothetical observation on each particle. Suppose that $t_k$ is the adaptive template used for tracking the target object. The quantitative function $G(z_k^i|x_k^i)$ can be designed in terms of the matching error between the observation $z_k^i = I_k(x_k^i)$ and the template $t_k$. Based on the linear subspace representation, the particle weight is designed as

$$w_k^i = G(z_k^i|x_k^i) = \exp\left(-\gamma \left\| U_k^T z_k^i - U_k^T t_k \right\|^2_{\lambda_k}\right), \quad (13)$$

where the columns of the matrix $U_k$ contain the eigenvectors of the current linear subspace. The matching error in Eq. (13) involves a weighted norm, which is defined for $a = (a_1, a_2, \ldots, a_m)^T$ and $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_m)^T$ as

$$\|a\|^2_{\lambda} = \sum_{i=1}^{m} \lambda_i a_i^2. \quad (14)$$

The parameter $\gamma$ in Eq. (13) is a small positive value that controls the sensitivity of the goodness value to changes of the weighted norm. The elements of the weight vector $\lambda_k$ reflect the importance of the elements of the compared vectors in calculating the norm. Inside the compressed adaptive template $U_k^T t_k$, if the values of an element present only a small variance over a period of time, meaning that this element is stable and reliable under the current appearance variations, then this element should gain a higher weight $\lambda_i$ in computing the norm. The detailed procedure for obtaining the weight vector $\lambda_k$ and the adaptive template $t_k$ is presented later in Section 4.4.
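Eqs. (13)-(14) translate directly into code; `gamma` stands for the sensitivity parameter of Eq. (13), whose concrete value is application-dependent.

```python
import numpy as np

def weighted_norm_sq(a, lam):
    """Weighted squared norm ||a||^2_lam = sum_i lam_i * a_i^2 (Eq. (14))."""
    return float(np.sum(lam * a ** 2))

def goodness(z, t, U, lam, gamma=0.01):
    """Particle goodness G(z|x) = exp(-gamma * ||U^T z - U^T t||^2_lam)
    (Eq. (13)): both the observation and the template are compared in the
    compressed subspace spanned by the columns of U."""
    d = U.T @ z - U.T @ t
    return float(np.exp(-gamma * weighted_norm_sq(d, lam)))
```

An observation identical to the template attains the maximal goodness of 1, and the goodness decays as the weighted subspace distance grows.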
When tracking an object on a given video frame, we use the final state estimation of the previous frame as the seed to stochastically generate hypothetical state instances for the particles. The generation of state instances follows the stochastic state transition model given in Eq. (12). The goodness of each particle is then evaluated according to Eq. (13). With the particles and their goodness values (say $S_k = \{(x_k^i, w_k^i = G(z_k^i|x_k^i))\}_{i=1}^{N_s}$), a filtering operation removes the particles with lower goodness values, and the remaining particles are

$$S'_k = \mathrm{filter}(S_k, \theta) = \{(x_k^i, w_k^i) \mid ((x_k^i, w_k^i) \in S_k) \wedge (w_k^i \ge \theta)\}, \quad (15)$$

where $\theta = 1.2 \min\{w_k^i\}_{i=1}^{N_s}$ is set as a 20% increment over the minimum weight. With the particles in the new particle set $S'_k$, the resampling process of the particle filter is then performed. In the resampling process, the particles are resampled with probability proportional to their weights. For each sampled particle, a small random perturbation $u_k$, as defined in Eq. (12), is applied to the carried state instance to increase the opportunity of escaping from a locally optimal estimation. The random perturbation is designed to decrease with the iterations, i.e., $u'_k = \beta^{iter} u_k$ for $\beta \in (0, 1)$, to ensure the final convergence of the state estimation after several iterations. On each frame, the filtering operation and the resampling process are performed alternately for several runs to enhance the goodness of the remaining particles.
Let $S^*_k$ be the particle set obtained from the final run of the resampling process. The weight of each particle in $S^*_k$ is re-evaluated according to Eq. (13). Each weight is then normalized by dividing its value by the sum of all weights. Finally, the aggregation scheme is performed to infer the final estimation from the normalized weights, i.e.,

$$\hat{x}_k = \sum_{(x_k^i, w_k^i) \in \mathrm{filter}(S^*_k, \theta)} w_k^i x_k^i. \quad (16)$$
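The per-frame loop can be sketched as follows; `score` stands in for the goodness function $G$ of Eq. (13), and the Gaussian perturbation model, the fallback when filtering empties the set, and the specific constants are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

def iterative_particle_filter(seed_state, score, rng, n_particles=100,
                              n_iters=3, sigma0=4.0, beta=0.5):
    """Sketch of the iterative filtering/resampling loop: generate particles
    around the seed state, drop those below theta = 1.2 * min weight
    (Eq. (15)), resample the survivors with a perturbation that shrinks by
    beta each iteration, then aggregate the weighted mean (Eq. (16))."""
    d = seed_state.size
    states = seed_state + rng.normal(0.0, sigma0, size=(n_particles, d))
    for it in range(n_iters):
        w = np.array([score(s) for s in states])
        keep = w >= 1.2 * w.min()            # filtering step, Eq. (15)
        if not keep.any():                   # guard: keep the best particle
            keep = w == w.max()
        states, w = states[keep], w[keep]
        idx = rng.choice(len(states), size=n_particles, p=w / w.sum())
        sigma = sigma0 * beta ** (it + 1)    # shrinking perturbation
        states = states[idx] + rng.normal(0.0, sigma, size=(n_particles, d))
    w = np.array([score(s) for s in states])
    keep = w >= 1.2 * w.min()
    if not keep.any():
        keep = w == w.max()
    states, w = states[keep], w[keep]
    w = w / w.sum()
    return (w[:, None] * states).sum(axis=0)  # aggregation, Eq. (16)
```

Because each round keeps only the better-scoring hypotheses and perturbs them less and less, the particle cloud contracts toward high-goodness states, which is exactly the coarse-to-fine behavior described above.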
To illustrate the gradual improvement of the tracking as the iterations proceed, Figure 1 demonstrates an example of tracking a fast-moving hand. The white box in Figure 1(a) shows
Figure 1: The gradually improved particle estimates of the proposed iterative particle filter for 3 iterations of particle resampling. (a) Tracking result on a frame at time t − 1; (b) the particle estimates of the three iterations of particle resampling, illustrated respectively with white boxes, gray boxes, and black boxes, on the frame at time t; (c) the final aggregated tracking result on the frame at time t.
the tracking result on a certain frame. On tracking the hand in the next frame, shown in Figure 1(b), the white boxes illustrate the estimates from 100 particles generated in the first iteration. After the second iteration, the estimates from these particles are illustrated with gray boxes. Apparently, the estimates get closer to the true position of the hand compared with the estimates of the first iteration. When the third iteration is completed, the generated estimates, illustrated with black boxes, are even better than those of the second iteration. The final aggregated estimation is shown with the white box in Figure 1(c). For the conventional particle filter, this kind of fast-moving object would demand a large number of particles (e.g., >600) and require large perturbations to attain good tracking results. Additionally, the tracking results on different video frames may drift unstably because of the introduced large perturbations.
4.3. Handling of Partial Occlusions
Owing to the RWLS representation, the matching between each observation and the object template is also performed in a region-wise manner. Intuitively, the matching errors of occluded regions would be larger than those of un-occluded regions. Let $z_k(r)$ be the image observation corresponding to region $r$, and $t_k(r)$ be the corresponding regional sub-image of the object template. The regional matching error on this region is computed by

$$Err(z_k(r)) = \left\| U_k^T z_k(r) - U_k^T t_k(r) \right\|_{\lambda_k}, \quad (17)$$
where the matrix $U_k$ contains the eigenvectors of the current linear subspace, and the weighted norm $\|\cdot\|_{\lambda_k}$ is defined in Eq. (14). A region $r$ is claimed to be occluded if its regional matching error satisfies the following condition:

$$Err(z_k(r)) > \mathrm{mean}(\{Err(z_k(r))\}_{r=1}^{R}) + \rho \cdot \mathrm{stdv}(\{Err(z_k(r))\}_{r=1}^{R}), \quad (18)$$

where $R$ is the total number of partitioned regions, and the two functions $\mathrm{mean}(S)$ and $\mathrm{stdv}(S)$ compute respectively the mean and standard deviation of the data in a given data set $S$. The constant $\rho$ controls the allowed error deviation from the averaged regional matching errors. For each particle, if more than half of the partitioned regions of the hypothetical observation of this particle are identified as occluded, then this particle is discarded. For each video frame, if no particle remains after the identification of occluded regions, then the particle filter skips the tracking on the current frame and proceeds to the next frame.
For handling partial occlusions, the matching errors of occluded regions should not be included in the final matching error between the observation and the template. Otherwise, the tracking may be failed by these occluded regions. Hence, we refine the matching error between the object observation and the object template as

$$Err(z_k) = \frac{1}{R - |S_{occ}|} \sum_{r \notin S_{occ}} Err(z_k(r)), \quad (19)$$

where $S_{occ}$ is the set containing all occluded regions and $|S_{occ}|$ denotes the number of occluded regions.
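Eqs. (18)-(19) can be sketched as follows, with `rho` standing for the deviation constant in Eq. (18) and the regional errors assumed to be precomputed via Eq. (17):

```python
import numpy as np

def occluded_regions(region_errors, rho=1.0):
    """Flag region r as occluded when its matching error exceeds
    mean + rho * stdv of all regional errors (Eq. (18))."""
    e = np.asarray(region_errors, dtype=float)
    thresh = e.mean() + rho * e.std()
    return set(np.nonzero(e > thresh)[0].tolist())

def overall_error(region_errors, occ):
    """Average matching error over un-occluded regions only (Eq. (19))."""
    e = np.asarray(region_errors, dtype=float)
    keep = [r for r in range(len(e)) if r not in occ]
    return e[keep].sum() / len(keep)
```

A single badly matching region, typical of a local occlusion, is thereby excluded from the final error instead of dragging down the whole match.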
4.4. Adaptive Template Updating
The incremental updating of the linear subspace adapts the subspace representation to the up-to-date inputs. However, even with the up-to-date subspace representation, the stored object template is still likely to be out-of-date. Therefore, adaptively updating the template is another important mechanism for handling the appearance variations of objects. We call the adaptively updated template an adaptive template.
The adaptive template is updated with the appearance of the object tracked on the most recent frame. Nonetheless, the template updating should be made conditional to avoid improper influence from disturbances. If no region of the tracked object on the current frame is identified as occluded, meaning that the tracked observation has no fatal disturbances in appearance, then the template can be updated with the tracked observation. Let $t_k$ be the adaptive template at frame $k$, and $z_k$ be the corresponding tracked observation. The update is performed according to

$$t_{k+1} = (1 - \alpha) t_k + \alpha z_k, \quad (20)$$

where $\alpha \in (0, 1)$ controls the rate of the updating. A larger value of $\alpha$ means faster adaptation of the template toward the new object appearance. Empirically, the value of this parameter highly depends on the rate of appearance changes. The parameter is normally set below 0.05 for objects with normal moving speeds and slowly changing illumination, and around 0.05 to 0.5 for faster variations in object appearances and illumination.
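The conditional update of Eq. (20) is a one-liner once the occlusion test of Section 4.3 has run; the interface below is our assumption for illustration.

```python
import numpy as np

def update_template(t, z, occluded, alpha=0.05):
    """Conditional template update of Eq. (20): blend in the newly tracked
    observation only when no region was flagged as occluded."""
    if occluded:                       # any occluded region -> keep template
        return t
    return (1.0 - alpha) * t + alpha * z
```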
Since the template is adaptive, some elements of the template vector may change frequently due to the appearance changes of the tracked object. Such varying elements may unstably affect the calculation of the weighted norm between the observed appearance and the object template in Eq. (13) and Eq. (17), and consequently lead to wrong tracking results. Hence, we introduce the weight vector in Eq. (13) and Eq. (17) to reduce such a negative effect. Recall that the elements of the weight vector assign different importance factors to the elements of the compared vectors in calculating the weighted norm. Highly varying elements in the template should be given low importance values. Therefore, the importance of each element of the template vector can be quantitatively evaluated in terms of the variance computed from the element values collected from the tracked objects on past frames. Similar to the adaptive updating in Eq. (1) and Eq. (2), the variances of the elements of the
template vector represented in the linear subspace can be incrementally updated by

\[ \mu_{k+1} = \frac{1}{k+1} \left( k \mu_k + U^{t}_{k+1} t_{k+1} \right), \tag{21} \]

\[ \Sigma_{k+1} = \frac{k}{k+1} \Sigma_k + \frac{k}{(k+1)^2} \left( U^{t}_{k+1} t_{k+1} - \mu_{k+1} \right) \left( U^{t}_{k+1} t_{k+1} - \mu_{k+1} \right)^{t}, \tag{22} \]

\[ \omega_{k+1} = \mathrm{expv}\left( \mathrm{diag}\left( -\tfrac{1}{2} \left( U^{t}_{k+1} t_{k+1} - \mu_{k+1} \right) \Sigma^{-1}_{k+1} \left( U^{t}_{k+1} t_{k+1} - \mu_{k+1} \right)^{t} \right) \right), \tag{23} \]
where diag(M) denotes the vector composed of the diagonal elements of a matrix M and expv(a = (a_1, a_2, ..., a_d)^t) = (exp(a_1), exp(a_2), ..., exp(a_d))^t. In summary, the proposed method adapts both the template and the linear subspace. Eq. (20) defines the way to adapt the template in the original image space. Meanwhile, the eigenspace is incrementally updated by the proposed method presented in Section 3.2. Eqs. (21)-(23) define the way to adaptively compute the weight vector \omega_k required for calculating the matching residual defined in Eq. (17). Algorithm 1 lists the detailed steps of the proposed iterative particle filter.
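The element-wise weight computation of Eqs. (21)-(23) can be sketched for the simplified case where only the diagonal of the covariance (the per-element variance) is kept. Each subspace coefficient of the template maintains a running mean and variance; highly varying coefficients receive exponentially small importance weights. The struct and names are illustrative assumptions, not from the paper.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Running per-element statistics of the template's subspace coefficients.
struct ElementStats {
    std::vector<double> mean, var;
};

// One incremental update for sample count k (k samples seen so far),
// returning the importance weights: stable elements get weight near 1,
// highly varying elements get exponentially small weight.
std::vector<double> updateWeights(ElementStats& s,
                                  const std::vector<double>& coeff,
                                  std::size_t k) {
    std::vector<double> w(coeff.size());
    for (std::size_t i = 0; i < coeff.size(); ++i) {
        double newMean = (k * s.mean[i] + coeff[i]) / (k + 1);      // cf. Eq. (21)
        double d = coeff[i] - newMean;
        s.var[i] = (k * s.var[i]) / (k + 1)
                 + (k * d * d) / double((k + 1) * (k + 1));         // cf. Eq. (22)
        s.mean[i] = newMean;
        // cf. Eq. (23): exp of minus half the normalized squared deviation.
        w[i] = s.var[i] > 0.0 ? std::exp(-0.5 * d * d / s.var[i]) : 1.0;
    }
    return w;
}
```

Keeping only the diagonal is a deliberate simplification for illustration; the paper's Eq. (23) extracts the diagonal of the full quadratic form.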
5. Experimental Results and Performance Comparison
To evaluate the performance of the proposed algorithm, some experiments are conducted to track objects on six testing video sequences acquired in real-world environments. Two sequences are available at http://www.cs.toronto.edu/dross/ivt/, presenting appearance variations, including illumination changes, pose changes, and facial expressions, on the tracked objects. The other four sequences are captured with our camcorder and present cases of partial occlusions, size variations, and fast motions. The tracking algorithm is implemented in C++ on Microsoft Visual Studio using an Intel Pentium 4 2.8 GHz CPU. The assessed processing speed is about 14.7 frames per second for 100 particles.
When initializing the tracking of a sequence, we manually specify the rectangular bounding box for the target object on the first frame. The boxed target object appearance is then resized to a 24x24 object template. Then, we randomly translate and scale the bounding box with a small random perturbation 100 times to acquire 100 samples to build the initial linear subspaces of the partitioned regions. Two region partitioning schemes, R1 and R2, are
Algorithm 1 The Proposed Tracking Algorithm with Occlusion Handling
1: Given a particle set S_{k-1} = {x^i_{k-1}, 1/N_s}, i = 1, ..., N_s, the target template t_{k-1}, and the subspace model Omega_{k-1} = {mu_{k-1}, U_{k-1}, C_{k-1}} at frame k-1.
2: Set occ_flag = 0 to indicate no occlusion.
3: Set iter = 1.
4: for i = 1 : N_s do
5:   Propagate the particle set for the initial iteration by x^i_{k,iter} = x^i_{k-1} + v_k + u_k.
6:   Evaluate the measurement z^i_{k,iter} corresponding to the state x^i_{k,iter}.
7:   Update the weight w^i_{k,iter} by Eq. (13).
8: end for
9: Normalize the weights w^i_{k,iter} = w^i_{k,iter} / sum_{i=1}^{N_s} w^i_{k,iter}, for 1 <= i <= N_s.
10: for iter = 2 : Iter do
11:   Generate the seed sample set S'_{k,iter} = {x'^j_{k,iter}, w'^j_{k,iter}} = filter(S_{k,iter-1}, .) by Eq. (15).
12:   Set c^0_k = 0.
13:   for j = 1 : J do
14:     c^j_k = c^{j-1}_k + w'^j_{k,iter}.
15:   end for
16:   Normalize the cumulative probabilities c^j_k = c^j_k / c^J_k, for 1 <= j <= J.
17:   for i = 1 : N_s do
18:     Generate a uniformly distributed random number r in [0, 1].
19:     Find the smallest j for which c^j_k >= r.
20:     Propagate the particle set x^i_{k,iter} = x'^j_{k,iter} + lambda^{iter} u_k.
21:     Evaluate the measurement z^i_{k,iter} corresponding to the state x^i_{k,iter}.
22:     Update the weight w^i_{k,iter} by Eq. (13).
23:   end for
24:   Normalize the weights w^i_{k,iter} = w^i_{k,iter} / sum_{i=1}^{N_s} w^i_{k,iter}.
25: end for
26: Perform the filtering function S'_k = filter(S_{k,iter}, .).
27: Normalize the weights w^i_{k,iter} = w^i_{k,iter} / sum_{i=1}^{N_s} w^i_{k,iter}.
28: Estimate the state x_k by Eq. (16).
29: Set occ_flag according to the matching error by Eq. (18).
30: if occ_flag = 0 then
31:   Update the template t_k by Eq. (20) and the subspace model Omega_k as presented in Section 3.2.
32: end if
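The resampling core of steps 12-19 of Algorithm 1 builds the normalized cumulative weights of the seed samples and then inverts that cumulative distribution with a uniform draw r: the chosen seed particle is the one that gets perturbed in the next iteration. The sketch below assumes the weights are already available; names are illustrative.

```cpp
#include <cstddef>
#include <vector>

// Build the cumulative weights c^j_k (steps 12-16), then pick the smallest
// index j whose normalized cumulative weight reaches r (steps 18-19).
std::size_t sampleSeedIndex(const std::vector<double>& weights, double r) {
    std::vector<double> cum(weights.size());
    double c = 0.0;
    for (std::size_t j = 0; j < weights.size(); ++j) {
        c += weights[j];
        cum[j] = c;
    }
    for (std::size_t j = 0; j < cum.size(); ++j) {
        if (cum[j] / c >= r)   // normalize by c^J_k and invert the CDF
            return j;
    }
    return cum.size() - 1;     // guard against floating-point round-off
}
```

Heavier particles occupy larger spans of the cumulative distribution and are therefore selected proportionally more often.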
Table 1: Parameter settings for the proposed iterative particle filter
forgetting factor: [0.05, 1]   0.2   0.7   1.5   [0.03, 0.1]
used for comparison. The numbers of eigenvectors used to represent the linear subspaces of R1 and R2 are 50 and 15, respectively. Only 100 particles are generated by our iterative particle filter when tracking each testing sequence.
The settings of the other parameters, including the forgetting factor, are listed in Table 1. The crucial parameters are the forgetting factor and the template learning rate \alpha, which should be set according to the rate of appearance changes of the tracked object. The ranges of the uniformly distributed random vector u_k = (x_k, y_k, w_k, h_k)^T are

\[ x_k \sim U[-12, 12], \quad y_k \sim U[-12, 12], \quad w_k \sim U[-0.035, 0.035], \quad h_k \sim U[-0.035, 0.035]. \]
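Drawing the uniform perturbation vector u_k can be sketched as follows, with translations in pixels and width/height changes as scale factors, using the ranges quoted above. The struct and function names are illustrative assumptions.

```cpp
#include <random>

// One perturbation sample u_k = (dx, dy, dw, dh): pixel translations in
// [-12, 12] and relative size changes in [-0.035, 0.035].
struct Perturbation { double dx, dy, dw, dh; };

Perturbation drawPerturbation(std::mt19937& rng) {
    std::uniform_real_distribution<double> trans(-12.0, 12.0);
    std::uniform_real_distribution<double> scale(-0.035, 0.035);
    return { trans(rng), trans(rng), scale(rng), scale(rng) };
}
```

Each particle propagation step adds one such draw (possibly shrunk per iteration) to a seed particle's state.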
5.1. Experimental Results
The experiment conducted on the first testing video sequence is to track a human face (dudek) with different head poses and facial expressions. Figure 2 demonstrates some snapshots of the tracking results. At the bottom of each illustrated snapshot, the thumbnail images from left to right show the tracked target, the template, the subspace mean, the approximation error (residual) image, and the approximated image, respectively. For comparison, Figure 2 simultaneously illustrates the tracking results using the following four different combinations of adaptive mechanisms:
1. a fixed template and a fixed subspace representation,
2. an adaptive template and a fixed subspace representation,
3. a fixed template and an adaptive subspace representation, and
4. an adaptive template and an adaptive subspace representation.
The results show evidently that an adaptive template combined with an adaptive subspace representation attains the best performance. The combinations with no adaptive templates
fail to track the faces from Frame #796 onward, while the one with both the adaptive template and the adaptive subspace correctly tracks the faces on all frames.
Another video sequence, tracking the face of another person (ming-hsuan), is also tested. Figure 3 shows snapshots of the tracking results comparing the use of the fixed subspace and the adaptive subspace. Both compared methods use the adaptive template. Note that this testing sequence contains large illumination variations on some frames (#400 and #1200). The top row of Figure 3, which illustrates the tracking results of the fixed subspace, shows that the face cannot be well tracked on Frame #1420. However, this frame can still be correctly tracked by the adaptive subspace, as shown in the bottom row of Figure 3.
Frame 362 | Frame 684 | Frame 934
Figure 2: Face tracking on a sequence (dudek) with variations in poses and facial expressions. Column 1: the tracking results with a fixed template and a fixed subspace model; Column 2: the tracking results with a fixed template and an adaptive subspace model; Column 3: the tracking results with an adaptive template and a fixed subspace model; Column 4: the tracking results with an adaptive template and an adaptive subspace model.
Two other sequences are tested to examine the efficacy of the RWLS in handling partial occlusions. The first sequence is the video of a moving toy tank which is gradually occluded
Frame 1 | Frame 400 | Frame 1200 | Frame 1420
Frame 840 | Frame 1120 | Frame 1200 | Frame 1420
Figure 3: Face tracking on another sequence (ming-hsuan) with large variations in illumination and poses. The top row shows the tracking results with an adaptive template and a fixed subspace representation. The bottom row shows the tracking results with an adaptive template and an adaptive subspace representation.
by another scene object during its motion. The maximal occluded area during its motion is about 50%. Figure 4 shows the tracking results on some frames for the region partitioning schemes R1 and R2. Note that the character N shown at the left side of each snapshot of the scheme R2 indicates that the corresponding partitioned region is automatically identified as an un-occluded region by the proposed tracking algorithm. On the contrary, a region identified as an occluded region is labeled with its region identification number. As shown in Figure 4, the scheme R2 successfully tracks the occluded tank, while the scheme R1 fails. This result verifies the good capability of the region-wise tracking of objects in handling partial occlusions. The other testing sequence tracks a Chinese character printed on an aluminium foil package. As shown in Figure 5, the maximal occluded area during its motion is about 40% of the size of this Chinese character. The results again demonstrate the superiority of the proposed region-wise tracking of objects in handling partial occlusions.
Frame 1 | Frame 295 | Frame 308 | Frame 500
Figure 4: Tracking a moving toy tank with severe occlusions during its motion. Top row: the results for the R1 representation. Bottom row: the results for the R2 representation.
Frame 1 | Frame 57 | Frame 77 | Frame 97
Figure 5: Tracking a Chinese character on a moving aluminium foil package with partial occlusions. Top row: the results for the R1 representation. Bottom row: the results for the R4 representation. The objects tracked by the region partitioning R4 are more accurate in object size on Frame 77 and Frame 97.
5.2. Performance Comparisons with Other Particle Filters
The performance of the proposed iterative particle filter is compared with those of the SIR particle filter and a general version of the particle filter, called the GPF in this paper. The GPF performs only one iteration of particle generation. For a more balanced comparison, both the adaptive subspace and the adaptive template are exploited in the SIR and GPF. Two video sequences are tested for the performance comparison. One sequence shows a car moving away from the camera at a normal speed, and the other shows a doll moved by a fast-moving hand. On testing the video sequences, both SIR and GPF use 300 particles, while our iterative particle filter uses 75 particles in each iteration. The number of iterations is set to four.
As shown in Figure 6, both SIR and GPF fail to track the car accurately when the car moves farther away in the car sequence. This result indicates that SIR and GPF cannot handle the size variations of the car well. In contrast, the proposed iterative particle filter shows a better capability in tracking objects with size variations, according to the results in Figure 6. The performance comparison for the doll sequence is shown in Figure 7. Note that the high moving speed of the hand causes motion blurs on some frames. Consequently, both SIR and GPF fail to track the doll from Frame #130 onward. Again, the proposed iterative particle filter demonstrates its better performance in tracking fast-moving objects in this test, successfully tracking the doll on every frame of the sequence.
5.3. Performance Comparison with Hall's Incremental PCA
The proposed WI-PCA is an improved variant of the incremental PCA proposed by Hall et al. Hence, we use three sequences, including the car sequence, the doll sequence, and the jal sequence, as the benchmarks for comparing the performance of Hall's method and our method. Note that the adaptive template is also exploited in both methods for this test.
To quantitatively evaluate the tracking accuracy for comparison, we compute the position error of the tracking result on each video frame. The position error is defined to be

\[ \mathrm{Err}_{pos} = \sum_{i=1}^{4} \left\| X(i) - \hat{X}(i) \right\|^2, \]

where X(i) and \hat{X}(i), for 1 <= i <= 4, are respectively the
Frame 1 | Frame 62 | Frame 77 | Frame 115
Figure 6: Tracking a car moving away from the camera using the SIR, the GPF (using 300 particles), and the proposed particle filter (using 4 iterations with 75 particles per iteration) on rows 1, 2, and 3, respectively.
Frame 109 | Frame 127 | Frame 130 | Frame 137
Figure 7: Tracking a doll moved rapidly by a hand. The top two rows are the tracking results for SIR and GPF, respectively. The bottom row shows the tracking results of the proposed particle filter.
corresponding corners of the bounding boxes of the tracked object and the ground truth.
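The position error can be sketched as the sum, over the four bounding-box corners, of the squared distances between the tracked corner X(i) and the ground-truth corner. This follows the squared-distance reading of the definition above; the struct and names are illustrative assumptions.

```cpp
#include <array>

// A 2-D corner point of a bounding box.
struct Pt { double x, y; };

// Err_pos = sum over the four corresponding corners of the squared
// Euclidean distance between the tracked box and the ground-truth box.
double positionError(const std::array<Pt, 4>& tracked,
                     const std::array<Pt, 4>& truth) {
    double err = 0.0;
    for (int i = 0; i < 4; ++i) {
        double dx = tracked[i].x - truth[i].x;
        double dy = tracked[i].y - truth[i].y;
        err += dx * dx + dy * dy;
    }
    return err;
}
```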
After five rounds of tracking on each sequence, Table 2 lists the position errors averaged over all frames for these three sequences. The results show that the accuracy of the proposed WI-PCA is slightly better than that of Hall's approach. In the test, we find that Hall's approach may improperly update the linear subspace with samples of bad object appearances drawn from incorrect tracking results. On the contrary, our proposed WI-PCA can ignore the bad samples during updating through the designed weighting mechanism. Furthermore, the forgetting factor introduced in our WI-PCA gives higher influence to the up-to-date good samples, while Hall's method treats the out-of-date samples equally.
Table 2: The statistics of the tracking position errors for three video sequences.
                  jal          car          doll
WI-PCA method     5.35±0.02    6.50±1.16    7.42±0.31
Hall's method     6.45±0.23    7.25±2.14    7.60±0.40
To further demonstrate the effectiveness of the proposed WI-PCA in modeling object appearance, we also compare the reconstruction errors between the WI-PCA method and Hall's method. Fifty eigenvectors are selected to reconstruct the appearance of the object for both compared methods. In addition to the previous three benchmark sequences, we also include the two face sequences (dudek and ming-hsuan) used in Section 5.1 for evaluation. Table 3 presents the computed reconstruction errors averaged over all frames for five runs of tracking on each sequence. The reconstruction error is defined to be

\[ \mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( I(i) - \hat{I}(i) \right)^2 }, \]

where I(i) and \hat{I}(i) are respectively the i-th pixel of the template image and the reconstructed image, and N is the total number of pixels in the template. The results in Table 3 show that the proposed WI-PCA better represents the varying appearances of objects.
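The per-pixel RMSE above can be sketched directly over vectorized images; the function name is an illustrative assumption, and both images are assumed to have the same length.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// RMSE = sqrt( (1/N) * sum_i (I(i) - I_hat(i))^2 ) between the template
// image and its subspace reconstruction, both given as pixel vectors.
double reconstructionRmse(const std::vector<double>& image,
                          const std::vector<double>& reconstruction) {
    double sum = 0.0;
    for (std::size_t i = 0; i < image.size(); ++i) {
        double d = image[i] - reconstruction[i];
        sum += d * d;
    }
    return std::sqrt(sum / static_cast<double>(image.size()));
}
```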
5.4. Performance Comparison with Mean-Shift Algorithm
We further compare the proposed tracker with the mean-shift algorithm and the CAMShift algorithm on the doll sequence and the jal sequence. As described in Section
Table 3: Statistics of the reconstruction error (RMSE per pixel).
                  dudek          ming-hsuan     jal            car            doll
WI-PCA method     0.768±0.005    0.996±0.002    0.955±0.003    1.352±0.074    1.422±0.001
Hall's method     0.739±0.117    1.306±0.018    0.997±0.011    1.517±0.017    2.132±0.116
2, since the mean-shift algorithm uses the global color histogram as the visual feature for object tracking, accurate estimation of object poses becomes more difficult. Furthermore, the mean-shift tracker is error-prone under varying illumination because the color histogram is sensitive to illumination. Figure 8 demonstrates some snapshots of the tracking results on the two testing sequences. The tracked results of the proposed tracker and the mean-shift tracker are drawn with solid white boxes and dashed yellow boxes, respectively. Figure 8 shows that the mean-shift tracker loses the doll when the doll moves rapidly (#130, #200), while the proposed tracker tracks it very well until the end of the sequence. Figure 8 also shows that the mean-shift tracker fails to track the face under illumination changes (#40, #170). Figures 9 (a) and (b) plot respectively the position errors of the tracked object on each frame of the doll and jal sequences for the compared trackers.
Frame 110 | Frame 130 | Frame 182 | Frame 209
Frame 40 | Frame 80 | Frame 160 | Frame 197
Figure 8: Snapshots of the tracked object on the doll sequence and the jal sequence comparing the proposed tracker, the mean-shift tracker, and the CAMShift tracker. The dashed yellow boxes are the results of the mean-shift tracker; the solid red boxes are the results of the CAMShift tracker; the solid white boxes are the results of the proposed tracker.
6. Concluding Remarks
This paper designs an improved particle filter and an RWLS representation for visual object tracking. With the RWLS representation, the proposed tracking method partitions the object into k×k independent regions. The independent tracking of these regions enables the proposed method to ignore the occluded regions and continue tracking the un-occluded regions. Thus, partial occlusions can be effectively handled during the tracking.
To enhance the adaptability of the linear subspace, an adaptive subspace learning model, the WI-PCA, which can efficiently and incrementally update the built subspace, is proposed. This adaptive learning model can adapt well to the variations in object appearances and illumination. In addition, the WI-PCA seeks to modulate the bad influence from outliers, noisy inputs, and out-of-date data by introducing a weighting mechanism and a forgetting factor into the adaptation. Besides the adaptive subspace, the object template for tracking is also made adaptive so that the dynamic appearance variations of the tracked object can be handled even better.
On the particle filter framework, we propose an iterative particle filter to improve over the traditional particle filters. The improved particle filter features a better strategy of iterative particle generation, which is designed to guarantee the gradually improved quality of the generated particles. The improved particle quality leads to improved tracking accuracy after the particle aggregation. The experimental results demonstrate the effectiveness and the superiority of the proposed algorithm in tracking objects undergoing various pose changes, partial occlusions, and illumination variations.
(a) The doll sequence.
(b) The jal sequence.
Figure 9: The comparison of position errors of the tracked objects on the doll sequence and the jal sequence for the mean-shift tracker, the CAMShift tracker, and the proposed tracker.