
  • 8/2/2019 Tracking Revised)


Object Tracking by Exploiting Adaptive Region-wise Linear Subspace Representations and Adaptive Templates in an Iterative Particle Filter

    Ming-Che Ho, Cheng-Chin Chiang, Ying-Yu Su

    Department of Computer Science and Information Engineering, National Dong Hwa University, Shoufeng,

    Hualien, Taiwan, 974

    Abstract

Aiming at tracking visual objects under harsh conditions, such as partial occlusions, illumination changes, and appearance variations, this paper proposes an iterative particle filter incorporated with an adaptive region-wise linear subspace (RWLS) representation of objects. The iterative particle filter employs a coarse-to-fine scheme to decisively generate particles that convey better hypothetic estimates of the tracking parameters. As a result, higher tracking accuracy can be achieved by aggregating the good hypothetic estimates from the particles. Accompanying the iterative particle filter, the RWLS representation is specially designed to tackle the partial occlusion problem, which often causes tracking failure. Moreover, the RWLS representation is made adaptive by exploiting an efficient incremental updating mechanism, which can adapt the RWLS to gradual changes in object appearances and illumination conditions. Additionally, we also propose an adaptive mechanism to continuously adjust the object templates so that the varying appearances of tracked objects can be well handled. Experimental results demonstrate that the proposed approach achieves better performance than other related prior arts.

Keywords: object tracking, region-wise linear subspace (RWLS), iterative particle filter, incremental PCA

Corresponding author: Cheng-Chin Chiang. Tel.: +886-3-8634027; fax: +886-3-8634010. Email address: [email protected] (Cheng-Chin Chiang)

    Preprint submitted to Pattern Recognition Letters July 8, 2011


    1. Introduction

Visual object tracking is a core task in most computer vision applications and has been intensively researched over the past decade. The typical objects to be tracked include faces, hands, cars, human bodies, etc. A wide spectrum of potential applications, such as tele-conferencing, video surveillance, human-machine interaction, and intelligent transportation systems, have been developed for our daily life.

Visual object tracking is challenging due to the intrinsic and extrinsic variations in tracking conditions. Intrinsic variations, the variations that appear on tracked objects, may include dynamic changes of object poses, geometries, colors, and textures. In contrast, extrinsic variations are induced by environmental conditions, including illumination changes, cluttered backgrounds, and partial occlusions of tracked objects. No matter what kind of variations appear, the difficulty of object tracking increases if the tracking is based on the matching of object appearances.

Tracking by matching object appearances, sometimes termed template matching, is a common approach to object tracking ?. Under different poses and illumination, the object appearance varies continuously during the tracking. Hence, tracking objects with one or several fixed templates of object appearances is not feasible in practical applications. An adaptive template representation becomes essential in handling the varying object appearances during the tracking.

Besides varying poses and illumination, partial occlusion of a tracked object is another major cause of tracking failure. A partial occlusion causes a discrepancy between the appearances of the occluded object and the template, which often fails template-based tracking. Therefore, handling partial occlusions demands a flexible representation of object appearances that reveals high tolerance to the missing of some local parts of object appearances.

Due to the dynamic pose changes of an object during its motion, an object tracking method is usually required to estimate the pose of the tracked object. The pose of an object is usually characterized with parameters relating to the position, scale/dimension, and rotation of the object. Hence, the tracking problem is sometimes referred to as the pose estimation problem. Inventing a good way to accurately estimate the pose parameters of the tracked object under the challenges from various appearance variations is actually the kernel task and also the final goal of the research on visual object tracking.

Motivated by the demanded technical designs and research goals mentioned above, the work presented in this paper aims at developing a robust and effective solution to the object tracking problem. This solution encompasses a method to estimate the pose parameters of visual objects and an adaptive and flexible representation to tackle the problems of varying illumination and partial occlusions. The proposed method to estimate object poses is an iterative particle filter, which offers improved performance over other particle filter methods. The proposed object representation is an RWLS representation which enables the tracking of partially occluded objects. To adapt the RWLS representation to intrinsic and extrinsic variations during the tracking, an incremental updating scheme is also designed to update the bases of the subspace with each up-to-date input in an efficient way. Besides the adaptive linear subspace representation, we also devise an incremental updating mechanism to adapt the object template to the varying object appearances. With the iterative particle filter, the adaptive RWLS representation, and the adaptive object template, the problems of pose estimation under partial occlusions, illumination changes, and varying object appearances can be simultaneously handled very well.

The rest of this paper is organized as follows. Section 2 presents a brief survey of related work. Section 3 then introduces the adaptive RWLS representation of objects and the adaptive incremental updating scheme of the linear subspace for dynamic appearance variations. Section 4 presents a preliminary review of the conventional particle filter and then proposes the iterative particle filter method for pose estimation. The mechanism to handle partial occlusions and the adaptive templates for handling varying object appearances are also described in this section. Section 5 shows the experimental results and compares the performance with those of other related prior arts. Section 6 gives the conclusions to end this paper.


    2. Related Work

Several studies (????) have shown the high efficacy of the linear subspace in many applications of object tracking and recognition. ? proposed a pre-trained view-based eigenspace representation for object tracking, called Eigentracking. The appearance variations of objects with different poses are limitedly captured by samples of many different views. ? developed an efficient affine tracking scheme to deal with changing illumination by exemplar training under a variety of lighting conditions. However, these approaches may still encounter exceptional cases for untrained poses or illumination conditions. Moreover, these methods require the storage and the effort of collecting a large set of samples for building the linear subspace.

To reduce the cost of storage and the effort of off-line linear subspace training, online subspace learning (???) offers an alternative solution. The key merit of online subspace learning methods is to incrementally update the subspace whenever any new sample becomes available. The updating requires no access to the past samples, and thus no storage for accumulating the past data is necessary. Following this principle, ? presented an incremental principal component analysis (PCA) to update the linear subspace. ? employed the incremental PCA to update the appearance model for face recognition and tracking. However, the incremental PCA method can build a biased subspace if the new samples contain outliers or out-of-date noise. ? developed an efficient incremental updating algorithm which incorporates a forgetting factor to wear down the influence of older samples. Their empirical results show that the incremental method can tolerate larger pose variations and illumination changes. However, the problem of partial occlusions still cannot be well solved.

As to pose estimation, previous work can be divided into two categories: the deterministic approach and the stochastic approach. The deterministic approach, including the template-based algorithms (??) and the mean-shift algorithm (??), estimates the object poses without introducing any random process into the estimation. One typical example of the template-based algorithms is the algorithm proposed by ?, which revises the Lucas-Kanade optical flow algorithm (?) into an efficient inverse compositional (IC) algorithm. This algorithm estimates the poses of objects that undergo different motion by a process of gradient-descent error minimization. The major weakness of gradient-descent error minimization is the problem of getting trapped in local minima during the minimization, which may lead to undesirable tracking results. This method is also error-prone when tracking objects with larger appearance changes. ? presented the mean-shift tracking algorithm, which uses the Bhattacharyya distance to calculate the similarity between the color density distributions of the template and the tracked object. Since the color density distribution is a global visual feature which is not sensitive to local distortion of object appearances, the mean-shift algorithm can tolerate partial occlusions to some extent. Nonetheless, it is difficult to attain accurate estimation of object poses using such a coarse-level visual feature. Some extensions of the mean-shift algorithm (??) capture spatial information by calculating the means and covariances corresponding to the color bins, making the pose estimation more accurate and robust.

In contrast to the deterministic approach, the stochastic approach to object tracking typically estimates the object pose parameters, which are usually modeled with random variables, through a random process. The instance values of all modeled pose parameters at a certain moment are collectively called the state of the tracked object. The Kalman filter is a well-known method for state space estimation based on a linear stochastic model of system dynamics. The Kalman filter produces estimates of the true values of measurements by predicting a value, estimating the uncertainty of the predicted value, and computing a weighted average of the predicted value and the measured value. The particle filter is a generalized extension of the Kalman filter because it assumes no linearity of the system dynamics. In addition, the random noise in the stochastic process can be non-Gaussian. The pioneering work on the particle filter is the CONDENSATION algorithm proposed by ?. Some extensions of the particle filter have also been developed to enhance the efficiency and effectiveness of visual object tracking. ? further proposed the ICONDENSATION algorithm, which incorporates an auxiliary blob tracker into the CONDENSATION algorithm. The tracking of the auxiliary blob tracker is based on well-segmented regions of interest with homogeneous colors. Unfortunately, another challenging problem encountered in this method is how to robustly segment the object into desirable regions.

The estimation results of the particle filter depend significantly on a set of randomly generated particles, each carrying a hypothetic instance of the estimated state. The final state estimation is aggregated from the state instances on all particles. To achieve better aggregation, the number of particles is usually kept large so that the bad influence of outlier particles can be reduced. However, a larger number of particles does not necessarily guarantee better hypothetic instances on the particles. Moreover, the computational burden of the state estimation also increases with the number of generated particles. Therefore, a good design is to devise a mechanism that decisively improves the goodness of the state instances conveyed on particles, so that the number of particles required to achieve accurate state estimation can be reduced.

    3. The Adaptive Region-wise Linear Subspace (RWLS) Representation

    3.1. RWLS Representation

Principal component analysis (PCA) is a well-known technique of linear subspace representation. With PCA, a transformation $U$ can be derived to transform a data vector $x$ in a higher dimensional space into another data vector $c$ in a lower dimensional space, i.e., $x = \bar{x} + Uc$, where $\bar{x}$ is the mean vector of the collected data set. Inspired by the region-based face recognition approach proposed by ?, we adopt a region-wise linear subspace representation for object appearance. By partitioning an object appearance into several regions, each represented with a linear subspace, a partially occluded object can still be tracked as long as one or more regions are not occluded. Thus, the robustness of tracking a partially occluded object can be effectively enhanced by the region-wise representation.

Concerning the way to partition the object appearance into regions, a simple and regular partitioning scheme is preferred for easier subimage cropping and region tracking. Hence, the proposed region-wise representation exploits the simplest way: uniformly partitioning each object appearance into k×k rectangular regions, where k is a design choice. For ease of reference, we use the notation Rk to denote the scheme of k×k region partitioning. For example, the partitioning scheme R1 treats the whole object appearance as a single region, while R2 partitions the object appearance into 2×2 equal-size regions.
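The Rk partitioning above can be sketched in a few lines. This is a minimal illustration assuming grayscale appearances stored as NumPy arrays; the function name and the use of `np.array_split` for near-equal regions are our choices, not the paper's:

```python
import numpy as np

def partition_regions(appearance, k):
    """Partition an object appearance (an H x W array) into k x k
    rectangular regions (the Rk scheme); returns the k*k subimages
    in row-major order."""
    h, w = appearance.shape[:2]
    row_blocks = np.array_split(np.arange(h), k)
    col_blocks = np.array_split(np.arange(w), k)
    return [appearance[np.ix_(r, c)] for r in row_blocks for c in col_blocks]
```

For R2, a 32×32 appearance yields four 16×16 regions, each of which would then be modeled by its own linear subspace.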

    3.2. Incremental Subspace Updating

In PCA, a linear subspace can be built by solving the eigenproblem of the covariance matrix computed from a collection of samples. However, such batch processing on a fixed set of samples cannot well model the appearance variations of a moving object over time. Provided that the frame rate of the video camera is fast enough, e.g., more than 30 frames per second, we can assume that all appearance changes of objects occur gently. To adapt the RWLS to up-to-date conditions, we need to recompute the eigenvectors of the new regional covariance matrices updated with newly arriving samples. Nonetheless, this recomputation of covariance matrices and eigenvectors would require storing all past samples and also demands a high cost in time.

? proposed an efficient method to incrementally update the eigenvectors without storing the past samples. This method updates the current eigenvectors using only the newest sample. One problem with this method is that the subspace may be improperly updated by incoming outliers or noisy samples. To avoid this problem, we propose a revised incremental subspace updating scheme, called the weighted incremental PCA (WI-PCA). The merit of the WI-PCA is to update the subspace using a newly arriving sample only if this sample is reliable enough. To this end, each incoming sample is associated with a weight value, which is inversely proportional to the residual computed when approximating the sample using its lower-dimensional subspace representation. If the residual is large, meaning that this sample is very likely to be an outlier or noise with respect to the current subspace, then the associated weight value is small. In what follows, we present the formal derivations of the WI-PCA.

Let $C_N$ be the covariance matrix computed from $\{x_i\}_{1 \le i \le N}$, and $C_{N+1}$ be the new covariance matrix obtained after adding a new sample $x_{N+1}$. Similarly, the mean vectors before and after adding the new sample $x_{N+1}$ are denoted by $\bar{x}_N$ and $\bar{x}_{N+1}$, respectively. Both $C_{N+1}$ and $\bar{x}_{N+1}$ can be derived recursively as follows:

$$\bar{x}_{N+1} = \frac{1}{\sum_{i=1}^{N+1}\omega_i}\left(\sum_{i=1}^{N}\omega_i\,\bar{x}_N + \omega_{N+1}\,x_{N+1}\right), \qquad (1)$$

$$C_{N+1} = \frac{\sum_{i=1}^{N}\omega_i}{\sum_{i=1}^{N+1}\omega_i}\,C_N + \frac{\omega_{N+1}\sum_{i=1}^{N}\omega_i}{\left(\sum_{i=1}^{N+1}\omega_i\right)^2}\,\Delta x\,\Delta x^T, \qquad (2)$$

where $\Delta x = x_{N+1} - \bar{x}_N$, and $\omega_i$ is the weight associated with $x_i$. Note that the above computations involve no past samples in $\{x_i\}_{1 \le i \le N}$. The value of the term $\sum_{i=1}^{N}\omega_i$ can also be incrementally updated and stored in a variable, say $\Omega_N$, on the arrival of each new sample. In effect, this term sums up the weights of the past samples. Here, we introduce a forgetting factor $f$, for $0 < f \le 1$, into the adaptive update of $\Omega_N$, i.e., $\Omega_{N+1} = f\,\Omega_N + \omega_{N+1}$. The forgetting factor aims to lower the importance of the past samples. Accordingly, the adaptive computation of $\bar{x}_{N+1}$ and $C_{N+1}$ in Eq. (1) and Eq. (2) can be rewritten as

$$\bar{x}_{N+1} = \frac{1}{\Omega_{N+1}}\left(f\,\Omega_N\,\bar{x}_N + \omega_{N+1}\,x_{N+1}\right), \qquad (3)$$

$$C_{N+1} = \frac{f\,\Omega_N}{\Omega_{N+1}}\,C_N + \frac{\omega_{N+1}\,f\,\Omega_N}{(\Omega_{N+1})^2}\,\Delta x\,\Delta x^T. \qquad (4)$$

In physical meaning, the associated weight $\omega_i$ in Eq. (1) and Eq. (2) differentiates the influence of the sample $x_i$ on the PCA process. Noisy or outlier samples are assigned smaller weights to reduce their influence. As mentioned previously, we can relate the weight value of a sample $x_i$ to the approximation residual under the current linear subspace. According to PCA, the approximation residual can be computed by $r_i = x_i - \bar{x} - U c_i$. Accordingly, the weight $\omega_i$ of $x_i$ can be set as $\omega_i = \exp(-k\,\|r_i\|)$, where $k$ is a constant controlling the rate of change of the weight with respect to the magnitude of the residual; we set it to 1 in our implementation.
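As a sketch, Eqs. (3)-(4) together with the residual-based weight can be implemented as follows. This is a minimal illustration; the function name, the argument defaults, and the optional basis argument `U` are our choices, not the paper's:

```python
import numpy as np

def wipca_update(mean, cov, Omega, x_new, f=0.95, k=1.0, U=None):
    """One weighted incremental update of the mean and covariance
    (Eqs. (3)-(4)) with forgetting factor f. The weight of the new
    sample decays with its reconstruction residual (outlier damping)."""
    if U is not None:
        c = U.T @ (x_new - mean)
        r = x_new - mean - U @ c            # residual r_{N+1}
        w = np.exp(-k * np.linalg.norm(r))  # omega_{N+1}
    else:
        w = 1.0                             # no basis yet: full weight
    Omega_new = f * Omega + w               # running weight sum
    mean_new = (f * Omega * mean + w * x_new) / Omega_new        # Eq. (3)
    dx = (x_new - mean).reshape(-1, 1)      # deviation from OLD mean
    cov_new = (f * Omega / Omega_new) * cov \
        + (w * f * Omega / Omega_new ** 2) * (dx @ dx.T)         # Eq. (4)
    return mean_new, cov_new, Omega_new
```

With `f=1` and unit weights, the update reduces to the exact batch mean and covariance recursions, which is a quick sanity check on the algebra.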

Let $U$ be the matrix whose columns comprise the current set of eigenvectors obtained from the PCA on $N$ samples, $\{x_i\}_{1 \le i \le N}$. When the new incoming sample $x_{N+1}$ becomes available, the new set of eigenvectors must be recomputed and stored in a matrix $U'$. The underlying eigenproblem of the PCA can then be formulated as

$$C_{N+1} U' = U' \Lambda, \qquad (5)$$

where $\Lambda$ is a diagonal matrix whose diagonal elements are the eigenvalues corresponding to the eigenvectors in $U'$. Due to the orthogonality of both $U$ and $U'$, the new eigenvectors after adding the new sample can be considered as a rotated version of the set of old eigenvectors, i.e., $U' = RU$, where $R$ is an orthonormal rotation matrix. Equivalently, this can be written as

$$U' = U R', \qquad (6)$$

where $R' = U^T R U$ is another rotation matrix. From Eqs. (5) and (6), we have

$$U^T C_{N+1} U R' = R' \Lambda. \qquad (7)$$

Consequently, Eq. (7) is the equation of a new eigenproblem. The solution for the rotation matrix $R'$ is exactly the set of eigenvectors of the composite matrix $U^T C_{N+1} U$. Since this composite matrix has a much lower dimension than the matrix $C_{N+1}$, the computational complexity of deriving its eigenvectors is also much lower. After finding the rotation matrix $R'$ from the composite matrix, the new eigenvectors can be obtained from Eq. (6).
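The rotation-based update of Eqs. (5)-(7) can be sketched as follows, assuming `C_new` is the updated covariance from Eq. (4) and `U` the current orthonormal basis; the helper name and the descending-eigenvalue reordering are our choices:

```python
import numpy as np

def rotate_basis(U, C_new):
    """Update the subspace basis after a covariance update (Eqs. (5)-(7)):
    solve the small eigenproblem of the composite matrix U^T C_{N+1} U
    for the rotation R', then rotate the old basis, U' = U R'."""
    M = U.T @ C_new @ U              # m x m composite matrix, m << data dim
    evals, R = np.linalg.eigh(M)     # eigenvectors of the composite matrix
    order = np.argsort(evals)[::-1]  # keep descending eigenvalue order
    return U @ R[:, order], evals[order]
```

The cost is dominated by an m×m eigendecomposition rather than one on the full covariance, which is the efficiency argument made in the text.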

    4. Visual Object Tracking by Iterative Particle Filter

    4.1. Particle Filter

A particle filter formulates the tracking problem by a state prediction equation,

$$x_k = f(x_{k-1}, u_k), \qquad (8)$$

and a measurement (or observation) function,

$$z_k = h(x_k, n_k), \qquad (9)$$

where $x_k \in R^n$ and $z_k \in R^d$ are respectively the vector of state parameters and the measurement (or observation) at time $k$, $u_k$ and $n_k$ are respectively independent and identically distributed (i.i.d.) random vectors of process noise and measurement noise, and the functions $f(\cdot)$ and $h(\cdot)$ respectively define a prediction model of the state parameter vector and a measurement function with respect to the given state parameter vector. The measurement is generally modeled by a likelihood function $p(z_k|x_k)$.

The Sequential Importance Resampling (SIR) method (??), a well-known state parameter estimation method for the particle filter, approximates the expectation of the state under the posterior $p(x_k|z_{1:k})$ by aggregating a set of weighted particles $S_k = \{x_k^i, w_k^i\}_{1 \le i \le N_s}$, where the weights $w_k^i$ approximate the relative posterior probabilities of the stochastically generated particles and satisfy $\sum_{i=1}^{N_s} w_k^i = 1$. The aggregated state estimation is

$$\hat{x}_k = E[x_k|z_{1:k}] \approx \sum_{i=1}^{N_s} w_k^i\,x_k^i. \qquad (10)$$

With respect to the measurement $z_k^i$ induced by the hypothetic estimate $x_k^i$ on a particle $i$, the particle weight $w_k^i$ for the current frame turns out to be

$$w_k^i \propto w_{k-1}^i\,p(z_k^i|x_k^i). \qquad (11)$$

One common problem with the particle filter is the degeneracy problem, which occurs after several iterations of re-weighting (?). This problem occurs when all but one particle have negligible weight, implying that most computational effort is wasted on updating particles that contribute nothing to the approximation of $p(x_k|z_{1:k})$. A remedial operation is to perform a resampling process on the particles if the number of effective particles is too small. To determine the appropriate timing for the resampling process, a criterion for the degeneracy is defined as $N_{eff} = 1/\sum_{i=1}^{N_s}(w_k^i)^2$, which logically indicates the number of particles with effective weights. The resampling process is thus initiated if $N_{eff} \le N_T$, where $N_T$ is a predefined threshold. In the resampling process, each ineffective particle is replaced with a new particle which carries a state instance stochastically perturbed from an existing particle with a higher particle weight. All new particles after the resampling process have equal weight (i.e., $w_{k-1}^i = 1/N_s$), implying that $w_k^i$ relates only to $p(z_k^i|x_k^i)$ according to Eq. (11).
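The degeneracy criterion and the SIR resampling step can be sketched as follows. This is a generic illustration, not the paper's code; multinomial resampling via `rng.choice` is one common choice:

```python
import numpy as np

def effective_sample_size(weights):
    """N_eff = 1 / sum_i (w_k^i)^2 for normalized weights."""
    return 1.0 / np.sum(np.asarray(weights) ** 2)

def resample(particles, weights, rng):
    """Multinomial resampling: draw particles in proportion to their
    weights and reset all weights to 1/Ns, so that on the next frame
    w_k^i tracks only the likelihood term in Eq. (11)."""
    ns = len(particles)
    idx = rng.choice(ns, size=ns, p=weights)
    return particles[idx], np.full(ns, 1.0 / ns)
```

With uniform weights, `effective_sample_size` equals Ns; in the fully degenerate case (one particle holding all the weight) it equals 1, matching the intuition behind the N_T threshold test.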

    4.2. The Iterative Particle Filter

In the SIR particle filter, the particle weights are related to the likelihood value $p(z_k^i|x_k^i)$. In our method, we design the weight of each particle to be proportional to a criterion function $G(z_k^i|x_k^i)$, which quantitatively defines the goodness of the observation $z_k^i$ with respect to the hypothetic state parameter $x_k^i$. Instead of employing the conditional resampling strategy of the SIR particle filter, we perform the resampling process unconditionally at every frame $k$. Furthermore, when tracking on each frame, a filtering process and the resampling process are iteratively performed for a fixed number of iterations to enhance the goodness of the surviving particles for tracking. This is why we call the proposed particle filter an iterative particle filter.

    4.2.1. Models of State Transition and Measurement

For tracking an object with a particle filter, we represent the state parameters with the location and the dimension of the tracked object on each video frame. Hence, the state parameters are encoded with a vector $x = (x, y, w, h)$, defining the object's bounding box, which has its upper-left corner situated at $(x, y)$ and a dimension of $w \times h$ pixels on the frame.

As formulated in Eqs. (8) and (9), a particle filter requires a state transition model and a measurement model. To characterize the object motion, a discrete equal-velocity equation is adopted for modeling the position parameters of the object, i.e., $v_k = p_{k-1} - p_{k-2}$, where $p_k = (x_k, y_k, 0, 0)^T$ denotes the position of the object's bounding box at frame $k$. For simplicity, the perturbation $u_k$ in Eq. (8) is defined as a random vector $u_k = (\Delta x_k, \Delta y_k, \Delta w_k, \Delta h_k)^T$. This vector adds small random deviations, $\Delta x_k$, $\Delta y_k$, $\Delta w_k$, and $\Delta h_k$, to the estimated position and dimension of the object. Combining the velocity and the random perturbations, the state transition model can be defined as

$$x_k = x_{k-1} + v_k + u_k. \qquad (12)$$

Given the state parameter vector $x_k$, the measurement on the input frame $I_k$ is defined as the appearance of the tracked object, i.e., $z_k = I_k(x_k)$, where $I_k(x_k)$ denotes the subimage enclosed by the bounding box specified by $x_k$. Here, no random noise, such as the $n_k$ in Eq. (9), is assumed for the measurement model.
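Sampling Eq. (12) for one particle can be sketched as follows, assuming the two previous state estimates are available; the Gaussian form and the noise scales are our assumptions, since the paper does not specify the perturbation distribution here:

```python
import numpy as np

def transition(x_prev, x_prev2, rng, sigma=(2.0, 2.0, 1.0, 1.0)):
    """Equal-velocity state transition (Eq. (12)) for x = (x, y, w, h):
    v_k = p_{k-1} - p_{k-2} shifts only the position entries, and u_k
    adds small random deviations to position and dimension."""
    v = np.array([x_prev[0] - x_prev2[0], x_prev[1] - x_prev2[1], 0.0, 0.0])
    u = rng.normal(0.0, sigma)   # random perturbation u_k, one per entry
    return x_prev + v + u
```

Each particle draws its own `u`, so a cloud of hypothetic bounding boxes forms around the constant-velocity prediction.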

    4.2.2. Iterative Filtering and Resampling

The particle weight in our design is computed from a quantitative function $G(z_k^i|x_k^i)$ which evaluates the goodness of the hypothetic observation on each particle. Suppose that $t_k$ is the adaptive template used for tracking the target object. The quantitative function $G(z_k^i|x_k^i)$ can be designed in terms of the matching error between the observation $z_k^i = I_k(x_k^i)$ and the template $t_k$. Based on the linear subspace representation, the particle weight is designed as

$$w_k^i = G(z_k^i|x_k^i) = \exp\left(-\beta\,\|U^T z_k^i - U^T t_k\|^2_{\lambda_k}\right), \qquad (13)$$

where the columns of the matrix $U$ contain the eigenvectors of the current linear subspace. The matching error in Eq. (13) involves a weighted norm, which is defined as

$$\|a\|^2_{\lambda_k} = \|(a_1, a_2, \ldots, a_m)^T\|^2_{\lambda_k=(\lambda_1,\lambda_2,\ldots,\lambda_m)^T} = \sum_{i=1}^{m} \lambda_i\,a_i^2. \qquad (14)$$

The parameter $\beta$ in Eq. (13) is a small positive value that controls the sensitivity of the goodness value to changes of the weighted norm. The elements of the weight vector $\lambda_k$ reflect the importance of the elements of the vector in calculating the norm. Inside the compressed adaptive template $U^T t_k$, if an element presents only a small variance over a period of time, meaning that this element is stable and reliable under the current appearance variations, then this element should gain a higher weight ($\lambda_i$) in computing the norm. The detailed procedure for obtaining the weight vector $\lambda_k$ and the adaptive template $t_k$ is presented later in Section 4.4.
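Eqs. (13) and (14) amount to a weighted squared distance between subspace projections; a minimal sketch follows (the symbol `beta` for the small positive sensitivity parameter and the function names are our choices):

```python
import numpy as np

def weighted_sq_norm(a, lam):
    """||a||^2_lambda = sum_i lambda_i * a_i^2 (Eq. (14))."""
    return float(np.sum(lam * a ** 2))

def goodness(z, t, U, lam, beta=0.01):
    """G(z|x) = exp(-beta * ||U^T z - U^T t||^2_lambda) (Eq. (13)):
    project observation and template onto the subspace, then score
    their weighted distance."""
    d = U.T @ z - U.T @ t
    return float(np.exp(-beta * weighted_sq_norm(d, lam)))
```

A perfect match yields a goodness of 1, and the score decays toward 0 as the projected observation drifts from the projected template, faster along the high-λ (stable) coordinates.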

When tracking an object on a certain video frame, we use the final state estimation of the previous frame as the seed to stochastically generate hypothetic state instances on the particles. The generation of state instances follows the stochastic state transition model given in Eq. (12). The goodness of each particle is then evaluated according to Eq. (13). With the particles and their goodness values (say $S_k = \{(x_k^i, w_k^i = G(z_k^i|x_k^i))\}_{i=1}^{N_s}$), a filtering operation removes the particles with lower goodness values, and the remaining particles are

$$S'_k = filter(S_k, \theta) = \{(x_k^i, w_k^i)\ |\ ((x_k^i, w_k^i) \in S_k) \wedge (w_k^i \ge \theta)\}, \qquad (15)$$

where $\theta = 1.2 \cdot \min\{w_k^i\}_{i=1}^{N_s}$ is set as a 20% increment over the minimum value of the weights. With the particles in the new particle set $S'_k$, the resampling process of the particle filter is then performed. In the resampling process, the particles are resampled according to a probability proportional to their weights. For each sampled particle, a small random perturbation $u_k$, as defined in Eq. (12), is applied to the carried state instance to increase the opportunity of escaping from a locally optimal estimation. The random perturbation is designed to decrease with the iterations, i.e., $u'_k = \gamma^{iter}\,u_k$, for $\gamma \in (0, 1)$, to ensure the final convergence of the state estimation after several iterations. On each frame, the filtering operation and the resampling process are iteratively performed in turn for several runs to enhance the goodness of the remaining particles for tracking.

Let $S''_k$ be the particle set obtained from the final run of the resampling process. The weight of each particle in $S''_k$ is re-evaluated according to Eq. (13). Each weight is then normalized by dividing its value by the sum of all weights. Finally, the aggregation scheme is performed to infer the final estimation from the normalized weights, i.e.,

$$\hat{x}_k = \sum_{(x_k^i,\,w_k^i)\,\in\,filter(S''_k,\,\theta)} w_k^i\,x_k^i. \qquad (16)$$
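The per-frame loop of filtering (Eq. (15)), weight-proportional resampling with a shrinking perturbation, and aggregation (Eq. (16)) can be sketched as follows; `weigh` and `perturb` stand in for Eq. (13) and the perturbation of Eq. (12), and the guard for the all-equal-weight case is our addition:

```python
import numpy as np

def iterate_particles(particles, weigh, perturb, n_iters=3, gamma=0.5,
                      rng=None):
    """One frame of the iterative particle filter: drop particles below
    theta = 1.2 * min(w) (Eq. (15)), resample the survivors in proportion
    to their weights with a perturbation that shrinks as gamma**iter, and
    finally aggregate the normalized weights (Eq. (16))."""
    rng = rng or np.random.default_rng()
    for it in range(n_iters):
        w = np.array([weigh(p) for p in particles])
        keep = w >= 1.2 * w.min()       # filtering, Eq. (15)
        if not keep.any():              # guard: all weights equal
            keep = w == w.max()
        particles, w = particles[keep], w[keep]
        idx = rng.choice(len(particles), size=len(particles), p=w / w.sum())
        scale = gamma ** (it + 1)       # shrinking perturbation
        particles = np.array([perturb(particles[i], scale) for i in idx])
    w = np.array([weigh(p) for p in particles])
    w = w / w.sum()                     # normalize for aggregation
    return (w[:, None] * particles).sum(axis=0)   # Eq. (16)
```

Each pass prunes the worst hypotheses and re-seeds around the best ones, so the surviving cloud contracts toward high-goodness states, which is the coarse-to-fine behavior the text describes.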

To illustrate the gradual improvement of the tracking over the iterations, Figure 1 demonstrates an example of tracking a fast-moving hand.

Figure 1: The gradually improved particle estimates of the proposed iterative particle filter for 3 iterations of particle resampling. (a) Tracking result on a frame at time t-1, (b) the particle estimates of the three iterations of particle resampling, illustrated respectively with white boxes, gray boxes, and black boxes, on the frame at time t, and (c) the final aggregated tracking result on the frame at time t.

The white box in Figure 1(a) shows the tracking result on a certain frame. On tracking the hand in the next frame, shown in Figure 1(b), the white boxes illustrate the estimates from 100 particles generated in the first iteration. After the second iteration, the estimates from these particles are illustrated with gray boxes. Apparently, the estimates get closer to the true position of the hand compared to the estimates in the first iteration. When the third iteration is completed, the generated estimates, illustrated with black boxes, are even better than the estimates in the second iteration. The final aggregated estimation is shown with the white box in Figure 1(c). For the conventional particle filter, this kind of fast-moving object would demand a large number of particles (e.g., >600) and require large perturbations to attain good tracking results. Additionally, the tracking results on different video frames may drift unstably because of the introduced large perturbations.

    4.3. Handling of Partial Occlusions

    Owing to the RWLS representation, the matching between each observation and theobject template is also perormed in a region-wise manner. Intuitively, the matching errors

    o occluded regions would be larger than those o un-occluded regions. Let zk(r) be the image

    observation corresponding to region r, and tk(r) be the corresponding regional sub-image

    on the object template. The regional matching error on this region is computed by

    Err(zk(r)) = Utkzk(r) U

    tktk(r)k , (17)

    14

  • 8/2/2019 Tracking Revised)

    15/29

where the matrix U_k contains the eigenvectors of the current linear subspace, and the weighted norm ||·||_{ω_k} is defined in Eq. (14). A region r is claimed to be occluded if its regional matching error satisfies the following condition

Err(z_k(r)) > mean({Err(z_k(r))}_{r=1}^{R}) + λ · stdv({Err(z_k(r))}_{r=1}^{R}),  (18)

where R is the total number of partitioned regions and the two functions mean(S) and stdv(S) compute respectively the mean and the standard deviation of the data in a given data set S. The constant λ controls the allowed error deviation from the averaged regional matching errors. For each particle, if more than a half of the partitioned regions on the hypothetic observation of this particle are identified as occluded regions, then this particle is discarded. For each video frame, if no particle remains after the identification of occluded regions, then the particle filter skips the tracking on the current frame and proceeds to the next frame.

For handling partial occlusions, the matching errors of occluded regions should not be included in the final matching error between the observation and the template. Otherwise, the tracking may be misled by these occluded regions. Hence, we refine the matching error between the object observation and the object template as

Err(z_k) = (1 / (R - |S_occ|)) Σ_{r ∉ S_occ} Err(z_k(r)),  (19)

where S_occ is the set containing all occluded regions and |S_occ| denotes the number of occluded regions.
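The region-wise occlusion test of Eqs. (17)-(19) can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the helper names (`regional_errors`, `occluded_regions`, `refined_error`) and the constant `lam` (standing in for the deviation constant of Eq. (18)) are our assumptions, the weighted norm is taken to be the weighted 2-norm suggested by Eq. (14), and vectors are plain Python lists.

```python
import math
from statistics import mean, stdev

def regional_errors(z_regions, t_regions, U_rows, w):
    """Eq. (17): weighted norm between the subspace projections of each
    observed region z_k(r) and the matching template region t_k(r).
    U_rows holds the eigenvectors as rows; w is the weight vector of the
    weighted norm (forms assumed, following Eq. (14))."""
    errs = []
    for z, t in zip(z_regions, t_regions):
        # project both region vectors onto the subspace: U^T z and U^T t
        pz = [sum(u[i] * z[i] for i in range(len(z))) for u in U_rows]
        pt = [sum(u[i] * t[i] for i in range(len(t))) for u in U_rows]
        errs.append(math.sqrt(sum(w[j] * (pz[j] - pt[j]) ** 2
                                  for j in range(len(pz)))))
    return errs

def occluded_regions(errs, lam):
    """Eq. (18): a region is occluded if its error exceeds the mean
    regional error by more than lam standard deviations."""
    thr = mean(errs) + lam * stdev(errs)
    return {r for r, e in enumerate(errs) if e > thr}

def refined_error(errs, occ):
    """Eq. (19): average the matching error over un-occluded regions only."""
    kept = [e for r, e in enumerate(errs) if r not in occ]
    return sum(kept) / len(kept)
```

With one region much worse than the rest, only that region is flagged and the refined error is unaffected by it, which is exactly the behavior that lets the tracker survive a partial occlusion.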

    4.4. Adaptive Template Updating

The incremental updating of the linear subspace adapts the subspace representation to the up-to-date inputs. However, even with the up-to-date subspace representation, the stored object template is still likely to be out-of-date. Therefore, adaptively updating the template is another important mechanism for handling the appearance variations of objects. We call the adaptively updated template an adaptive template.


The adaptive template is updated with the appearance of the object tracked on the most recent frame. Nonetheless, the template updating should be made conditional to avoid the improper influence of disturbances. According to the tracked object on the current frame, if no region is identified as an occluded region, meaning that the tracked observation has no fatal disturbances in appearance, then the template can be updated with the tracked observation. Let t_k be the adaptive template at frame k, and z_k be the corresponding tracked observation. The update is performed according to

t_{k+1} = (1 - α) t_k + α z_k,  (20)

where α ∈ (0, 1) controls the rate of the updating. A larger value of α means a faster adaptation of the template toward the new object appearance. Empirically, the value of this parameter highly depends on the rate of appearance changes. The parameter is normally set below 0.05 for objects with normal moving speeds and slowly changing illumination, and around 0.05-0.5 for faster variations in object appearances and illumination.
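The conditional update of Eq. (20) is a per-pixel exponential moving average; a minimal sketch (the function name and the flat-list template representation are our assumptions):

```python
def update_template(t_k, z_k, alpha=0.05):
    """Eq. (20): blend the current template t_k with the newly tracked
    observation z_k; alpha in (0, 1) sets the adaptation rate."""
    return [(1.0 - alpha) * t + alpha * z for t, z in zip(t_k, z_k)]
```

With α = 0.05, the template retains 95% of its previous content on every frame, so a new appearance is absorbed gradually and a single disturbed observation cannot overwrite the template.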

Since the template is adaptable, some elements of the template vector may change frequently due to the appearance changes of the tracked object. Such varying elements may unstably affect the calculation of the weighted norm between the observed appearance and the object template in Eq. (13) and Eq. (17) and consequently lead to wrong tracking results. Hence, we introduce the weight vector in Eq. (13) and Eq. (17) to reduce such a negative effect. Recall that the elements of the weight vector assign different importance factors to the elements of the compared vectors when calculating the weighted norm. Highly varying elements in the template should be given low importance values. Therefore, the importance of each element in the template vector can be quantitatively evaluated in terms of the variance computed from the element values collected from the tracked objects on past frames. Similar to the adaptive updating in Eq. (1) and Eq. (2), the variances of the elements in the


template vector represented in the linear subspace can be incrementally updated by

μ_{k+1} = (1 / (k + 1)) (k μ_k + U_{k+1}^T t_{k+1}),  (21)

Σ_{k+1} = (k / (k + 1)) Σ_k + (k / (k + 1)^2) (U_{k+1}^T t_{k+1} - μ_{k+1}) (U_{k+1}^T t_{k+1} - μ_{k+1})^T,  (22)

ω_{k+1} = expv(diag(-(1/2) (U_{k+1}^T t_{k+1} - μ_{k+1}) Σ_{k+1}^{-1} (U_{k+1}^T t_{k+1} - μ_{k+1})^T)),  (23)

where diag(M) denotes a vector composed of the diagonal elements of a matrix M and expv(a = (a_1, a_2, ..., a_d)^T) = (exp(a_1), exp(a_2), ..., exp(a_d))^T. In summary, the proposed method adapts both the template and the linear subspace. Eq. (20) defines the way to adapt the template in the original image space. Meanwhile, the eigenspace is incrementally updated by the proposed method presented in Section 3.2. Eqs. (21)-(23) define the way to adaptively compute the weight vector ω_k required for calculating the matching residual defined in Eq. (17). Algorithm 1 lists the detailed steps of the proposed iterative particle filter.
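If the covariance Σ is treated as diagonal, Eqs. (21)-(23) reduce to per-element running-mean and running-variance updates followed by a Gaussian weighting of each element. The sketch below makes that diagonal assumption explicit; the function name, the `eps` regularizer, and the flat-list representation are our additions, not the paper's:

```python
import math

def update_weights(mu, var, k, p, eps=1e-8):
    """Per-element (diagonal-covariance) sketch of Eqs. (21)-(23).
    mu, var: running mean and variance of the template's subspace
    coefficients after k frames; p = U^T t_{k+1} is the projection of
    the new template. Returns the updated (mu, var, weight vector)."""
    n = len(p)
    # Eq. (21): incremental mean update
    mu_new = [(k * mu[i] + p[i]) / (k + 1) for i in range(n)]
    # Eq. (22): incremental variance update (diagonal of the covariance)
    var_new = [(k / (k + 1)) * var[i]
               + (k / (k + 1) ** 2) * (p[i] - mu_new[i]) ** 2
               for i in range(n)]
    # Eq. (23): Gaussian importance weight per element; a stable element
    # gets a weight near 1, a highly varying one a weight near 0
    w = [math.exp(-0.5 * (p[i] - mu_new[i]) ** 2 / (var_new[i] + eps))
         for i in range(n)]
    return mu_new, var_new, w
```

This matches the stated intent: elements whose values swing widely across frames receive low importance in the weighted norms of Eq. (13) and Eq. (17).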

    5. Experimental Results and Performance Comparison

To evaluate the performance of the proposed algorithm, some experiments are conducted to track objects in six testing video sequences acquired in real-world environments. Two sequences are available at http://www.cs.toronto.edu/dross/ivt/, presenting appearance variations, including illumination changes, pose changes, and facial expressions, on the tracked objects. The other four sequences are captured with our camcorder and present cases of partial occlusions, size variations, and fast motions. The tracking algorithm is implemented in C++ on Microsoft Visual Studio using an Intel Pentium 4 2.8 GHz CPU. The assessed processing speed is about 14.7 frames per second for 100 particles.

When initializing the tracking of a sequence, we manually specify the rectangular bounding box for the target object on the first frame. The boxed target object appearance is then resized to a 24×24 object template. Then, we randomly translate and scale the bounding box by a small random perturbation 100 times to acquire 100 samples to build the initial linear subspaces of the partitioned regions. Two region partitioning schemes, R1 and R2, are


Algorithm 1 The Proposed Tracking Algorithm with Occlusion Handling

1: Given a particle set S_{k-1} = {x^i_{k-1}, 1/N_s}_{i=1}^{N_s}, the target template t_{k-1}, and the subspace model {x̄_{k-1}, U_{k-1}, C_{k-1}} at frame k-1.
2: Set occ_flag = 0 to indicate no occlusion.
3: Set iter = 1.
4: for i = 1 : N_s do
5:   Propagate the particle set for the initial iteration by x^i_{k,iter} = x^i_{k-1} + v_k + u_k.
6:   Evaluate the measurement z^i_{k,iter} corresponding to the state x^i_{k,iter}.
7:   Update the weight w^i_{k,iter} by Eq. (13).
8: end for
9: Normalize the weights w^i_{k,iter} = w^i_{k,iter} / Σ_{i=1}^{N_s} w^i_{k,iter}, for 1 ≤ i ≤ N_s.
10: for iter = 2 : Iter do
11:   Generate the seed sample set according to the filtering function S_{k,iter} = {x^j_{k,iter}, w^j_{k,iter}} = filter(S_{k,iter-1}, ·) by Eq. (15).
12:   Set c^0_k = 0.
13:   for j = 1 : J do
14:     c^j_k = c^{j-1}_k + w^j_{k,iter}.
15:   end for
16:   Normalize the cumulative probabilities c^j_k = c^j_k / c^J_k, for 1 ≤ j ≤ J.
17:   for i = 1 : N_s do
18:     Generate a uniformly distributed random number r ∈ [0, 1].
19:     Find the smallest j for which c^j_k ≥ r.
20:     Propagate the particle by x^i_{k,iter} = x^j_{k,iter} + a perturbation u_k scaled down with the iteration number iter.
21:     Evaluate the measurement z^i_{k,iter} corresponding to the state x^i_{k,iter}.
22:     Update the weight w^i_{k,iter} by Eq. (13).
23:   end for
24:   Normalize the weights w^i_{k,iter} = w^i_{k,iter} / Σ_{i=1}^{N_s} w^i_{k,iter}.
25: end for
26: Perform the filtering function S_k = filter(S_{k,iter}, ·).
27: Normalize the weights w^i_{k,iter} = w^i_{k,iter} / Σ_{i=1}^{N_s} w^i_{k,iter}.
28: Estimate the state x_k by Eq. (16).
29: Set occ_flag according to the matching error by Eq. (18).
30: if occ_flag = 0 then
31:   Update the template t_k by Eq. (20) and the subspace model as presented in Section 3.2.
32: end if
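The core resampling step of Algorithm 1, lines 12-20 (cumulative weights, inverse-CDF seed selection, and the perturbation that shrinks with the iteration number), can be sketched as below. The function name, the scalar `shrink` decay factor, and the box-shaped uniform perturbation are our assumptions for illustration:

```python
import random

def resample_and_perturb(seeds, weights, n_particles, perturb, shrink, it):
    """Sketch of lines 12-20 of Algorithm 1: accumulate the seed weights
    into normalized cumulative probabilities, pick each new particle's
    seed by inverse-CDF sampling, then add a perturbation whose range
    shrinks with the iteration number (the coarse-to-fine step)."""
    # lines 12-16: cumulative probabilities c^j_k, normalized by c^J_k
    cum, acc = [], 0.0
    for w in weights:
        acc += w
        cum.append(acc)
    cum = [c / acc for c in cum]
    particles = []
    for _ in range(n_particles):
        r = random.random()                               # line 18
        j = next(i for i, c in enumerate(cum) if c >= r)  # line 19: smallest j with c^j_k >= r
        scale = perturb * (shrink ** it)                  # shrinking perturbation range
        particles.append([x + random.uniform(-scale, scale)
                          for x in seeds[j]])             # line 20
    return particles
```

Because high-weight seeds occupy larger intervals of the cumulative distribution, later iterations concentrate the particles around the good hypotheses while the shrinking perturbation refines them locally.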


Table 1: Parameter settings for the proposed iterative particle filter

forgetting factor [0.05, 1]    0.2    0.7    1.5    α [0.03, 0.1]

used for comparison. The numbers of eigenvectors used in representing the linear subspaces of R1 and R2 are 50 and 15, respectively. Only 100 particles are generated in our iterative particle filter when tracking each testing sequence.

The settings of the other parameters, including the forgetting factor, are listed in Table 1. The crucial parameters are the forgetting factor and the learning rate α of the template, which should be set according to the rate of appearance changes of the tracked object. The ranges of the uniformly distributed random vector u_k = (Δx_k, Δy_k, Δw_k, Δh_k)^T are Δx_k ∼ U[-12, 12], Δy_k ∼ U[-12, 12], Δw_k ∼ U[-0.035, 0.035], Δh_k ∼ U[-0.035, 0.035].

    5.1. Experimental Results

The experiment conducted on the first testing video sequence is to track a human face (dudek) with different head poses and facial expressions. Figure 2 demonstrates some snapshots of the tracking results. At the bottom of each illustrated snapshot, the thumbnail images from left to right show the tracked target, the template, the subspace mean, the approximation error (residual) image, and the approximated image, respectively. For comparison, Figure 2 simultaneously illustrates the tracking results of the following four different combinations of adaptive mechanisms:

1. a fixed template and a fixed subspace representation,
2. an adaptive template and a fixed subspace representation,
3. a fixed template and an adaptive subspace representation, and
4. an adaptive template and an adaptive subspace representation.

The results show convincingly that an adaptive template combined with an adaptive subspace representation attains the best performance. The combinations with no adaptive templates


fail to track the face from Frame #796 onward, while the one with both the adaptive template and the adaptive subspace correctly tracks the face on all frames.

Another video sequence for tracking the face of another person (ming-hsuan) is also tested. Figure 3 shows snapshots of the tracking results for comparing the use of the fixed subspace and the adaptive subspace. Both compared methods use the adaptive template. Note that this testing sequence contains large illumination variations on some frames (#400 and #1200). The top row in Figure 3, which illustrates the tracking results of the fixed subspace, shows that the face cannot be well tracked on Frame #1420. However, this frame still can be correctly tracked by the adaptive subspace, as shown in the bottom row of Figure 3.

Frame 362    Frame 684    Frame 934

Figure 2: Face tracking on a sequence (dudek) with variations in poses and facial expressions. Column 1: the tracking results with a fixed template and a fixed subspace model; Column 2: the tracking results with a fixed template and an adaptive subspace model; Column 3: the tracking results with an adaptive template and a fixed subspace model; Column 4: the tracking results with an adaptive template and an adaptive subspace model.

Two other sequences are tested to examine the efficacy of the RWLS in handling partial occlusions. The first sequence is the video of a moving toy tank which is gradually occluded


Frame 1    Frame 400    Frame 1200    Frame 1420
Frame 840    Frame 1120    Frame 1200    Frame 1420

Figure 3: Face tracking on another sequence (ming-hsuan) with large variations in illumination and poses. The top row shows the tracking results with an adaptive template and a fixed subspace representation. The bottom row shows the tracking results with an adaptive template and an adaptive subspace representation.

by another scene object during its motion. The maximal occluded area during its motion is about 50%. Figure 4 shows the tracking results on some frames for the region partitioning schemes R1 and R2. Note that the character N shown at the left side of each snapshot of the scheme R2 indicates that the corresponding partitioned region is automatically identified as an un-occluded region by the proposed tracking algorithm. On the contrary, a region which is identified as an occluded region is labeled with its region identification number. As shown in Figure 4, the scheme R2 successfully tracks the occluded tank, while the scheme R1 fails. This result verifies the good capability of the region-wise tracking of objects in handling partial occlusions. The other testing sequence requires tracking a Chinese character printed on an aluminium foil package. As shown in Figure 5, the maximal occluded area during its motion is about 40% of the size of this Chinese character. The results demonstrate again the superiority of the proposed region-wise tracking of objects in handling partial occlusions.


Frame 1    Frame 295    Frame 308    Frame 500
Frame 1    Frame 295    Frame 308    Frame 500

Figure 4: Tracking a moving toy tank with severe occlusions during its motion. Top row: the results for the R1 representation. Bottom row: the results for the R2 representation.

Frame 1    Frame 57    Frame 77    Frame 97
Frame 1    Frame 57    Frame 77    Frame 97

Figure 5: Tracking a Chinese character on a moving aluminium foil package with partial occlusions. Top row: the results for the R1 representation. Bottom row: the results for the R4 representation. The objects tracked by the region partitioning R4 are more accurate in object size on Frame 77 and Frame 97.


    5.2. Performance Comparisons with Other Particle Filters

The performance of the proposed iterative particle filter is compared with those of the SIR particle filter and a general version of the particle filter, called the GPF in this paper. The GPF performs only one iteration of particle generation. For a more balanced comparison, both the adaptive subspace and the adaptive template are exploited in the SIR and the GPF. Two video sequences are tested for the performance comparison. One sequence is a car moving away from the camera at a normal speed and the other is a doll moved by a fast-moving hand. In testing the video sequences, both SIR and GPF use 300 particles, while our iterative particle filter uses 75 particles in each iteration. The number of iterations is set to four.

As shown in Figure 6, both SIR and GPF fail to track the car accurately when the car moves further away in the car sequence. This result indicates that SIR and GPF cannot well handle the size variations of the car. In contrast, the proposed iterative particle filter shows better capability in tracking objects with size variations, according to the results in Figure 6. The performance comparison for the doll sequence is shown in Figure 7. Note that the high moving speed of the hand causes motion blur on some frames. Consequently, both SIR and GPF fail to track the doll from Frame #130 onward. Again, the proposed iterative particle filter demonstrates its better performance in tracking fast-moving objects in this test. The proposed iterative particle filter successfully tracks the doll on every frame of the sequence.

5.3. Performance Comparison with Hall's Incremental PCA

The proposed WI-PCA is an improved variant of the incremental PCA proposed by Hall et al. Hence, we use three sequences, including the car sequence, the doll sequence, and the jal sequence, as the benchmarks for comparing the performance of Hall's method and our method. Note that the adaptive template is also exploited in both methods for the test.

To quantitatively evaluate the tracking accuracy for comparison, we compute the position error of the tracking result on each video frame. The position error is defined to be Err_pos = sqrt(Σ_{i=1}^{4} ||X(i) - X̂(i)||^2), where X(i) and X̂(i), for 1 ≤ i ≤ 4, are respectively the


Frame 1    Frame 62    Frame 77    Frame 115

Figure 6: Tracking a car moving away from the camera using the SIR, the GPF (using 300 particles), and the proposed particle filter (using 4 iterations with 75 particles per iteration) on rows 1, 2, and 3, respectively.


Frame 109    Frame 127    Frame 130    Frame 137
Frame 109    Frame 127    Frame 130    Frame 137
Frame 109    Frame 127    Frame 130    Frame 137

Figure 7: Tracking a doll moved rapidly by a hand. The top two rows are the tracking results for SIR and GPF, respectively. The bottom row shows the tracking results of the proposed particle filter.


corresponding corners of the bounding boxes of the tracked object and the ground truth.
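Under this definition, the corner-based position error can be sketched as follows (the function name and the representation of corners as (x, y) pairs are our assumptions):

```python
import math

def position_error(tracked_corners, gt_corners):
    """Corner-based position error: the root of the summed squared
    distances between the four tracked bounding-box corners X(i) and
    the ground-truth corners X^(i)."""
    sq = sum((x - gx) ** 2 + (y - gy) ** 2
             for (x, y), (gx, gy) in zip(tracked_corners, gt_corners))
    return math.sqrt(sq)
```

Using all four corners, rather than only the box center, penalizes errors in scale and aspect ratio as well as in position.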

After five rounds of tracking on each sequence, Table 2 lists the position errors averaged over all frames for these three sequences. The results show that the accuracy of the proposed WI-PCA is slightly better than that of Hall's approach. In the test, we find that Hall's approach may improperly update the linear subspace with samples of bad object appearances drawn from incorrect tracking results. On the contrary, our proposed WI-PCA can ignore the bad samples during updating through the designed weighting mechanism. Furthermore, the forgetting factor introduced in our WI-PCA gives higher influence to the up-to-date good samples, while Hall's method treats the out-of-date samples equally.

Table 2: The statistics of the tracking position errors for three video sequences.

                 jal          car          doll
WI-PCA method    5.35±0.02    6.50±1.16    7.42±0.31
Hall's method    6.45±0.23    7.25±2.14    7.60±0.40

To further demonstrate the effectiveness of the proposed WI-PCA in modeling object appearance, we also compare the reconstruction errors of the WI-PCA method and Hall's method. Fifty eigenvectors are selected to reconstruct the appearance of the object for both compared methods. In addition to the previous three benchmark sequences, we also include the two face sequences (dudek and ming-hsuan) used in Section 5.1 for evaluation. Table 3 presents the computed reconstruction errors averaged over all frames for five runs of tracking on each sequence. The reconstruction error is defined to be RMSE = sqrt((1/N) Σ_{i=1}^{N} (I(i) - Î(i))^2), where I(i) and Î(i) are respectively the ith pixel of the template image and the reconstructed image, and N is the total number of pixels in the template. The results in Table 3 show that the proposed WI-PCA better represents the varying appearances of objects.
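The per-pixel RMSE above can be sketched in a few lines, assuming both images are flattened to equal-length pixel lists (the function name is ours):

```python
import math

def rmse(template, reconstructed):
    """Per-pixel RMSE between the template image I and its subspace
    reconstruction I^, both given as flat lists of N pixel values."""
    n = len(template)
    return math.sqrt(sum((a - b) ** 2
                         for a, b in zip(template, reconstructed)) / n)
```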

    5.4. Performance Comparison with Mean-Shift Algorithm

We further compare the proposed tracker with the mean-shift algorithm and the CAMShift algorithm on the doll sequence and the jal sequence. As described in Section


Table 3: Statistics of the reconstruction errors (RMSE per pixel).

                 dudek          ming-hsuan     jal            car            doll
WI-PCA method    0.768±0.005    0.996±0.002    0.955±0.003    1.352±0.074    1.422±0.001
Hall's method    0.739±0.117    1.306±0.018    0.997±0.011    1.517±0.017    2.132±0.116

2, since the mean-shift algorithm uses the global color histogram as the visual feature for object tracking, accurate estimation of object poses becomes more difficult. Furthermore, the mean-shift tracker is error-prone under varying illumination because the color histogram is sensitive to illumination. Figure 8 demonstrates some snapshots of the tracking results on the two testing sequences. The tracked results of the proposed tracker and the mean-shift tracker are drawn with solid white boxes and dashed yellow boxes, respectively. Figure 8 shows that the mean-shift tracker loses the doll when the doll moves rapidly (#130, #200), while the proposed tracker tracks very well until the end of the sequence. Figure 8 also shows that the mean-shift tracker fails to track the face under illumination changes (#40, #170). Figures 9(a) and 9(b) plot the per-frame position errors of the tracked object on the doll sequence and the jal sequence, respectively, for the compared trackers.

Frame 110    Frame 130    Frame 182    Frame 209
Frame 40    Frame 80    Frame 160    Frame 197

Figure 8: Snapshots of the tracked object on the doll sequence and the jal sequence for comparing the proposed tracker, the mean-shift tracker, and the CAMShift tracker. The dashed yellow boxes are the results of the mean-shift tracker; the solid red boxes are the results of the CAMShift tracker; the solid white boxes are the results of the proposed tracker.


    6. Concluding Remarks

This paper designs an improved particle filter and an RWLS representation for visual object tracking. With this RWLS representation, the proposed tracking method partitions the object into k×k independent regions. The independent tracking of these regions enables the proposed method to ignore the occluded regions and continue tracking the un-occluded regions. Thus, partial occlusions can be effectively handled during the tracking.

To enhance the adaptability of the linear subspace, an adaptive subspace learning model, the WI-PCA, which can efficiently and incrementally update the built subspace, is proposed. This adaptive learning model can well adapt to the variations in object appearances and illumination. In addition, the WI-PCA seeks to moderate the bad influence of outliers, noisy inputs, and out-of-date data by introducing a weighting mechanism and a forgetting factor into the adaptation. Besides the adaptive subspace, the object template for tracking is also made adaptive so that the dynamic appearance variations of the tracked object can be handled even better.

Within the particle filter framework, we propose an iterative particle filter that improves over the traditional particle filters. The improved particle filter features a better strategy of iterative particle generation, designed to guarantee gradually improved quality of the generated particles. The improved particle quality leads to improved tracking accuracy after the particle aggregation. The experimental results demonstrate the effectiveness and the superiority of the proposed algorithm in tracking objects undergoing various pose changes, partial occlusions, and illumination variations.


(a) The doll sequence.
(b) The jal sequence.

Figure 9: The comparison of position errors of the tracked objects on the doll sequence and the jal sequence for the mean-shift tracker, the CAMShift tracker, and the proposed tracker.
