
  • 8/2/2019 Tracking Revised)


Object Tracking by Exploiting Adaptive Region-wise Linear Subspace Representations and Adaptive Templates in an Iterative Particle Filter

    Ming-Che Ho, Cheng-Chin Chiang, Ying-Yu Su

    Department of Computer Science and Information Engineering, National Dong Hwa University, Shoufeng,

    Hualien, Taiwan, 974

    Abstract

Aiming at tracking visual objects under harsh conditions, such as partial occlusions, illumination changes, and appearance variations, this paper proposes an iterative particle filter incorporated with an adaptive region-wise linear subspace (RWLS) representation of objects. The iterative particle filter employs a coarse-to-fine scheme to decisively generate particles that convey better hypothetic estimates of the tracking parameters. As a result, higher tracking accuracy can be achieved by aggregating the good hypothetic estimates from the particles. Accompanying the iterative particle filter, the RWLS representation is specially designed to tackle the partial occlusion problem, which often causes tracking failure. Moreover, the RWLS representation is made adaptive by exploiting an efficient incremental updating mechanism, which can adapt the RWLS to gradual changes in object appearances and illumination conditions. Additionally, we also propose an adaptive mechanism to continuously adjust the object templates so that the varying appearances of tracked objects can be well handled. Experimental results demonstrate that the proposed approach achieves better performance than other related prior arts.

Keywords: object tracking, region-wise linear subspace (RWLS), iterative particle filter, incremental PCA

Corresponding author: Cheng-Chin Chiang. Tel.: +886-3-8634027; fax: +886-3-8634010. Email address: [email protected] (Cheng-Chin Chiang)

    Preprint submitted to Pattern Recognition Letters July 8, 2011


    1. Introduction

Visual object tracking is a core task in most computer vision applications and has been intensively researched over the past decade. The typical objects to be tracked include faces, hands, cars, human bodies, etc. A wide spectrum of potential applications, such as tele-conferencing, video surveillance, human-machine interaction, and intelligent transportation systems, have been developed for our daily life.

Visual object tracking is challenging due to the intrinsic and extrinsic variations in tracking conditions. Intrinsic variations, the variations that appear on tracked objects, may include dynamic changes of object poses, geometries, colors, and textures. In contrast, extrinsic variations are induced by environmental conditions, including illumination changes, cluttered backgrounds, and partial occlusions of tracked objects. No matter what kind of variations appear, the difficulty of object tracking increases if the tracking is based on the matching of object appearances.

Tracking by matching object appearances, sometimes termed template matching, is a common approach to object tracking ?. Under different poses and illumination, the object appearance varies continuously during the tracking. Hence, tracking objects with one or several fixed templates of object appearances is not feasible in practical applications. An adaptive template representation becomes essential in handling the varying object appearances during the tracking.

Besides varying poses and illumination, partial occlusion of a tracked object is another major cause of tracking failure. A partial occlusion causes a discrepancy between the appearances of the occluded object and the template, which often fails template-based tracking. Therefore, handling partial occlusions demands a flexible representation of object appearances that reveals high tolerance to the missing of some local parts of object appearances.

Due to the dynamic pose changes of an object during its motion, an object tracking method is usually required to estimate the pose of the tracked object. The pose of an object is usually characterized with parameters relating to the position, scale/dimension, and rotation of the object. Hence, the tracking problem is sometimes referred to as the pose estimation problem. Inventing a good way to accurately estimate the pose parameters of the tracked object under the challenges from various appearance variations is actually the kernel task and also the final goal of the research on visual object tracking.

Motivated by the demanded technical designs and research goals mentioned above, the work presented in this paper aims at developing a robust and effective solution to the object tracking problem. This solution encompasses a method to estimate the pose parameters of visual objects and an adaptive and flexible representation to tackle the problems of varying illumination and partial occlusions. The proposed method to estimate object poses is an iterative particle filter, which offers improved performance over other particle filter methods. The proposed object representation is an RWLS representation which enables the tracking of partially occluded objects. To adapt the RWLS representation to intrinsic and extrinsic variations during the tracking, an incremental updating scheme is also designed to update the bases of the subspace with each up-to-date input in an efficient way. Besides the adaptive linear subspace representation, we also devise an incremental updating mechanism to adapt the object template to the varying object appearances. With the iterative particle filter, the adaptive RWLS representation, and the adaptive object template, the problems of pose estimation under partial occlusions, illumination changes, and varying object appearances can be simultaneously handled very well.

The rest of this paper is organized as follows. Section 2 presents a brief survey of related work. Section 3 then introduces the adaptive RWLS representation of objects and the adaptive incremental updating scheme of the linear subspace for dynamic appearance variations. Section 4 presents a preliminary review of the conventional particle filter and then proposes the iterative particle filter method for pose estimation. The mechanism to handle partial occlusions and the adaptive templates for handling varying object appearances are also described in this section. Section 5 shows the experimental results and compares the performance with those of other related prior arts. Section 6 gives the conclusions to end this paper.


    2. Related Work

Several studies (????) have shown the high efficacy of the linear subspace in many applications of object tracking and recognition. ? proposed a pre-trained view-based eigenspace representation for object tracking, called Eigentracking. The appearance variations of objects with different poses are limitedly captured by samples of many different views. ? developed an efficient affine tracking scheme to deal with changing illumination by exemplar training under a variety of lighting conditions. However, these approaches may still encounter exceptional cases for untrained poses or illumination conditions. Moreover, these methods require the storage and the effort of collecting a large set of samples for building the linear subspace.

To reduce the cost of storage and the effort of off-line linear subspace training, online subspace learning (???) offers an alternative solution. The key merit of online subspace learning methods is to incrementally update the subspace whenever any new sample becomes available. The updating requires no access to the past samples, and thus no storage for accumulating the past data is necessary. Following this principle, ? presented an incremental principal component analysis (PCA) to update the linear subspace. ? employed the incremental PCA to update the appearance model for face recognition and tracking. However, the incremental PCA method can build a biased subspace if the new samples contain outliers or out-of-date noise. ? developed an efficient incremental updating algorithm which incorporates a forgetting factor to wear down the influence of older samples. Their empirical results show that the incremental method can tolerate larger pose variations and illumination changes. However, the problem of partial occlusions still cannot be well solved.

As to pose estimation, previous work can be divided into two categories: the deterministic approach and the stochastic approach. The deterministic approach, including the template-based algorithms (??) and the mean-shift algorithm (??), estimates the object poses without introducing any random process into the estimation. One typical example of the template-based algorithms is the algorithm proposed by ?, which revises the Lucas-Kanade optical flow algorithm (?) into an efficient inverse compositional (IC) algorithm. This algorithm estimates the poses of objects that undergo different motion by a process of gradient-descent error minimization. The major weakness of gradient-descent error minimization is the problem of getting trapped in local minima during the minimization, which may lead to undesirable tracking results. This method is also error-prone when tracking objects with larger appearance changes. ? presented the mean-shift tracking algorithm, which uses the Bhattacharyya distance to calculate the similarity between the color density distributions of the template and the tracked object. Since the color density distribution is a global visual feature which is not sensitive to local distortion of object appearances, the mean-shift algorithm can tolerate partial occlusions to some extent. Nonetheless, it is difficult to attain accurate estimation of object poses using such a coarse-level visual feature. Some extensions of the mean-shift algorithm (??) capture spatial information by calculating the means and covariances corresponding to the color bins, making the pose estimation more accurate and robust.

In contrast to the deterministic approach, the stochastic approach to object tracking typically estimates the object pose parameters, which are usually modeled with random variables, through a random process. The instance values of all modeled pose parameters at a certain moment are collectively called the state of the tracked object. The Kalman filter is a well-known method for state space estimation based on a linear stochastic model of system dynamics. The Kalman filter produces estimates of the true values of measurements by predicting a value, estimating the uncertainty of the predicted value, and computing a weighted average of the predicted value and the measured value. The particle filter is a generalized extension of the Kalman filter because it assumes no linearity of the system dynamics. In addition, the random noise in the stochastic process can be non-Gaussian. The pioneering work on the particle filter is the CONDENSATION algorithm proposed by ?. Some extensions of the particle filter have also been developed to enhance the efficiency and effectiveness of visual object tracking. ? further proposed the ICONDENSATION algorithm, which incorporates an auxiliary blob tracker into the CONDENSATION algorithm. The tracking of the auxiliary blob tracker is based on well-segmented regions of interest with homogeneous colors. Unfortunately, another challenging problem encountered in this method is how to robustly segment the object into desirable regions.

The estimation results of the particle filter depend significantly on a set of randomly generated particles, each carrying a hypothetic instance of the estimated state. The final state estimation is aggregated from the state instances on all particles. To achieve better aggregation, the number of particles is usually kept large so that the bad influence of outlier particles can be reduced. However, a larger number of particles does not necessarily guarantee better hypothetic instances on the particles. Moreover, the computational burden of the state estimation also increases with the number of generated particles. Therefore, a good design is to devise a mechanism that decisively improves the goodness of the state instances conveyed on particles, so that the number of particles required to achieve accurate state estimation can be reduced.

    3. The Adaptive Region-wise Linear Subspace (RWLS) Representation

    3.1. RWLS Representation

Principal component analysis (PCA) is a well-known technique of linear subspace representation. With PCA, a transformation $U$ can be derived to transform a data vector $x$ in a higher dimensional space into another data vector $c$ in a lower dimensional space, i.e., $x = \bar{x} + Uc$, where $\bar{x}$ is the mean vector of the collected data set. Inspired by the region-based face recognition approach proposed by ?, we adopt a region-wise linear subspace representation for object appearance. By partitioning an object appearance into several regions, each represented with a linear subspace, a partially occluded object can still be tracked as long as one or more regions are not occluded. Thus, the robustness of tracking a partially occluded object can be effectively enhanced by the region-wise representation.

Concerning the way to partition the object appearance into regions, a simple and regular partitioning scheme is preferred for easier subimage cropping and region tracking. Hence, the proposed region-wise representation exploits the simplest way: uniformly partitioning each object appearance into k×k rectangular regions, where k is a design choice. For ease of reference, we use the notation Rk to denote the scheme of k×k region partitioning. For example, the partitioning scheme R1 treats the whole object appearance as a single region, while R2 partitions the object appearance into 2×2 equal-size regions.
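The Rk partitioning above can be sketched in a few lines. This is a minimal illustration assuming grayscale appearances stored as NumPy arrays; the function name and the use of `np.array_split` for near-equal regions are our choices, not the paper's:

```python
import numpy as np

def partition_regions(appearance, k):
    """Partition an object appearance (an H x W array) into k x k
    rectangular regions (the Rk scheme); returns the k*k subimages
    in row-major order."""
    h, w = appearance.shape[:2]
    row_blocks = np.array_split(np.arange(h), k)
    col_blocks = np.array_split(np.arange(w), k)
    return [appearance[np.ix_(r, c)] for r in row_blocks for c in col_blocks]
```

For R2, a 32×32 appearance yields four 16×16 regions, each of which would then be modeled by its own linear subspace.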

    3.2. Incremental Subspace Updating

In PCA, a linear subspace can be built by solving the eigenproblem of the covariance matrix computed from a collection of samples. However, such batch processing on a fixed set of samples cannot well model the appearance variations of a moving object over time. Provided that the frame rate of the video camera is fast enough, e.g., more than 30 frames per second, we can assume that all appearance changes of objects occur gently. To adapt the RWLS to up-to-date conditions, we need to recompute the eigenvectors of the new regional covariance matrices updated with newly arriving samples. Nonetheless, this recomputation of covariance matrices and eigenvectors would require storing all past samples and also demands a high cost in time.

? proposed an efficient method to incrementally update the eigenvectors without storing the past samples. This method updates the current eigenvectors using only the newest sample. One problem with this method is that the subspace may be improperly updated by incoming outliers or noisy samples. To avoid this problem, we propose a revised incremental subspace updating scheme, called the weighted incremental PCA (WI-PCA). The merit of the WI-PCA is to update the subspace using a newly arriving sample only if this sample is reliable enough. To this end, each incoming sample is associated with a weight value, which is inversely proportional to the residual computed when approximating the sample using its lower-dimensional subspace representation. If the residual is large, meaning that this sample is very likely to be an outlier or noise with respect to the current subspace, then the associated weight value is small. In what follows, we present the formal derivations of the WI-PCA.

Let $C_N$ be the covariance matrix computed from $\{x_i\}_{1 \le i \le N}$, and $C_{N+1}$ be the new covariance matrix obtained after adding a new sample $x_{N+1}$. Similarly, the mean vectors before and after adding the new sample $x_{N+1}$ are denoted by $\bar{x}_N$ and $\bar{x}_{N+1}$, respectively. Both $C_{N+1}$ and $\bar{x}_{N+1}$ can be derived recursively as follows:

$$\bar{x}_{N+1} = \frac{1}{\sum_{i=1}^{N+1}\omega_i}\left(\sum_{i=1}^{N}\omega_i\,\bar{x}_N + \omega_{N+1}\,x_{N+1}\right), \qquad (1)$$

$$C_{N+1} = \frac{\sum_{i=1}^{N}\omega_i}{\sum_{i=1}^{N+1}\omega_i}\,C_N + \frac{\omega_{N+1}\sum_{i=1}^{N}\omega_i}{\left(\sum_{i=1}^{N+1}\omega_i\right)^2}\,\Delta x\,\Delta x^T, \qquad (2)$$

where $\Delta x = x_{N+1} - \bar{x}_N$, and $\omega_i$ is the weight associated with $x_i$. Note that the above computations involve no past samples in $\{x_i\}_{1 \le i \le N}$. The value of the term $\sum_{i=1}^{N}\omega_i$ can also be incrementally updated and stored in a variable, say $\Omega_N$, on the arrival of each new sample. In effect, this term sums up the weights of the past samples. Here, we introduce a forgetting factor $f$, for $0 < f \le 1$, into the adaptive update of $\Omega_N$, i.e., $\Omega_{N+1} = f\,\Omega_N + \omega_{N+1}$. The forgetting factor aims to lower the importance of the past samples. Accordingly, the adaptive computation of $\bar{x}_{N+1}$ and $C_{N+1}$ in Eq. (1) and Eq. (2) can be rewritten as

$$\bar{x}_{N+1} = \frac{1}{\Omega_{N+1}}\left(f\,\Omega_N\,\bar{x}_N + \omega_{N+1}\,x_{N+1}\right), \qquad (3)$$

$$C_{N+1} = \frac{f\,\Omega_N}{\Omega_{N+1}}\,C_N + \frac{\omega_{N+1}\,f\,\Omega_N}{(\Omega_{N+1})^2}\,\Delta x\,\Delta x^T. \qquad (4)$$

In physical meaning, the associated weight $\omega_i$ in Eq. (1) and Eq. (2) differentiates the influence of the sample $x_i$ on the PCA process. Noisy or outlier samples are assigned smaller weights to reduce their influence. As mentioned previously, we can relate the weight value of a sample $x_i$ to the approximation residual under the current linear subspace. According to PCA, the approximation residual can be computed by $r_i = x_i - \bar{x} - U c_i$. Accordingly, the weight $\omega_i$ of $x_i$ can be set as $\omega_i = \exp(-k\,\|r_i\|)$, where $k$ is a constant controlling the rate of change of the weight with respect to the magnitude of the residual; we set it to 1 in our implementation.
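As a sketch, Eqs. (3)-(4) together with the residual-based weight can be implemented as follows. This is a minimal illustration; the function name, the argument defaults, and the optional basis argument `U` are our choices, not the paper's:

```python
import numpy as np

def wipca_update(mean, cov, Omega, x_new, f=0.95, k=1.0, U=None):
    """One weighted incremental update of the mean and covariance
    (Eqs. (3)-(4)) with forgetting factor f. The weight of the new
    sample decays with its reconstruction residual (outlier damping)."""
    if U is not None:
        c = U.T @ (x_new - mean)
        r = x_new - mean - U @ c            # residual r_{N+1}
        w = np.exp(-k * np.linalg.norm(r))  # omega_{N+1}
    else:
        w = 1.0                             # no basis yet: full weight
    Omega_new = f * Omega + w               # running weight sum
    mean_new = (f * Omega * mean + w * x_new) / Omega_new        # Eq. (3)
    dx = (x_new - mean).reshape(-1, 1)      # deviation from OLD mean
    cov_new = (f * Omega / Omega_new) * cov \
        + (w * f * Omega / Omega_new ** 2) * (dx @ dx.T)         # Eq. (4)
    return mean_new, cov_new, Omega_new
```

With `f=1` and unit weights, the update reduces to the exact batch mean and covariance recursions, which is a quick sanity check on the algebra.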

Let $U$ be the matrix whose columns comprise the current set of eigenvectors obtained from the PCA on $N$ samples, $\{x_i\}_{1 \le i \le N}$. When the new incoming sample $x_{N+1}$ becomes available, the new set of eigenvectors must be recomputed and stored in a matrix $U'$. The underlying eigenproblem of the PCA can then be formulated as

$$C_{N+1} U' = U' \Lambda, \qquad (5)$$

where $\Lambda$ is a diagonal matrix whose diagonal elements are the eigenvalues corresponding to the eigenvectors in $U'$. Due to the orthogonality of both $U$ and $U'$, the new eigenvectors after adding the new sample can be considered as a rotated version of the set of old eigenvectors, i.e., $U' = RU$, where $R$ is an orthonormal rotation matrix. Equivalently, this can be written as

$$U' = U R', \qquad (6)$$

where $R' = U^T R U$ is another rotation matrix. From Eqs. (5) and (6), we have

$$U^T C_{N+1} U R' = R' \Lambda. \qquad (7)$$

Consequently, Eq. (7) is the equation of a new eigenproblem. The solution for the rotation matrix $R'$ is exactly the set of eigenvectors of the composite matrix $U^T C_{N+1} U$. Since this composite matrix has a much lower dimension than the matrix $C_{N+1}$, the computational complexity of deriving its eigenvectors is also much lower. After finding the rotation matrix $R'$ from the composite matrix, the new eigenvectors can be obtained from Eq. (6).
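The rotation-based update of Eqs. (5)-(7) can be sketched as follows, assuming `C_new` is the updated covariance from Eq. (4) and `U` the current orthonormal basis; the helper name and the descending-eigenvalue reordering are our choices:

```python
import numpy as np

def rotate_basis(U, C_new):
    """Update the subspace basis after a covariance update (Eqs. (5)-(7)):
    solve the small eigenproblem of the composite matrix U^T C_{N+1} U
    for the rotation R', then rotate the old basis, U' = U R'."""
    M = U.T @ C_new @ U              # m x m composite matrix, m << data dim
    evals, R = np.linalg.eigh(M)     # eigenvectors of the composite matrix
    order = np.argsort(evals)[::-1]  # keep descending eigenvalue order
    return U @ R[:, order], evals[order]
```

The cost is dominated by an m×m eigendecomposition rather than one on the full covariance, which is the efficiency argument made in the text.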

    4. Visual Object Tracking by Iterative Particle Filter

    4.1. Particle Filter

A particle filter formulates the tracking problem by a state prediction equation,

$$x_k = f(x_{k-1}, u_k), \qquad (8)$$

and a measurement (or observation) function,

$$z_k = h(x_k, n_k), \qquad (9)$$

where $x_k \in R^n$ and $z_k \in R^d$ are respectively the vector of state parameters and the measurement (or observation) at time $k$, $u_k$ and $n_k$ are respectively independent and identically distributed (i.i.d.) random vectors of process noise and measurement noise, and the functions $f(\cdot)$ and $h(\cdot)$ respectively define a prediction model of the state parameter vector and a measurement function with respect to the given state parameter vector. The measurement is generally modeled by a likelihood function $p(z_k|x_k)$.

The Sequential Importance Resampling (SIR) method (??), a well-known state parameter estimation method for the particle filter, approximates the expectation of the state under the posterior $p(x_k|z_{1:k})$ by aggregating a set of weighted particles $S_k = \{x_k^i, w_k^i\}_{1 \le i \le N_s}$, where the weights $w_k^i$ approximate the relative posterior probabilities of the stochastically generated particles and satisfy $\sum_{i=1}^{N_s} w_k^i = 1$. The aggregated state estimation is

$$\hat{x}_k = E[x_k|z_{1:k}] \approx \sum_{i=1}^{N_s} w_k^i\,x_k^i. \qquad (10)$$

With respect to the measurement $z_k^i$ induced by the hypothetic estimate $x_k^i$ on a particle $i$, the particle weight $w_k^i$ for the current frame turns out to be

$$w_k^i \propto w_{k-1}^i\,p(z_k^i|x_k^i). \qquad (11)$$

One common problem with the particle filter is the degeneracy problem, which occurs after several iterations of re-weighting (?). This problem occurs when all but one particle have negligible weight, implying that most computational effort is wasted on updating particles that contribute nothing to the approximation of $p(x_k|z_{1:k})$. A remedial operation is to perform a resampling process on the particles if the number of effective particles is too small. To determine the appropriate timing for the resampling process, a criterion for the degeneracy is defined as $N_{eff} = 1/\sum_{i=1}^{N_s}(w_k^i)^2$, which logically indicates the number of particles with effective weights. The resampling process is thus initiated if $N_{eff} \le N_T$, where $N_T$ is a predefined threshold. In the resampling process, each ineffective particle is replaced with a new particle which carries a state instance stochastically perturbed from an existing particle with a higher particle weight. All new particles after the resampling process have equal weight (i.e., $w_{k-1}^i = 1/N_s$), implying that $w_k^i$ relates only to $p(z_k^i|x_k^i)$ according to Eq. (11).
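The degeneracy criterion and the SIR resampling step can be sketched as follows. This is a generic illustration, not the paper's code; multinomial resampling via `rng.choice` is one common choice:

```python
import numpy as np

def effective_sample_size(weights):
    """N_eff = 1 / sum_i (w_k^i)^2 for normalized weights."""
    return 1.0 / np.sum(np.asarray(weights) ** 2)

def resample(particles, weights, rng):
    """Multinomial resampling: draw particles in proportion to their
    weights and reset all weights to 1/Ns, so that on the next frame
    w_k^i tracks only the likelihood term in Eq. (11)."""
    ns = len(particles)
    idx = rng.choice(ns, size=ns, p=weights)
    return particles[idx], np.full(ns, 1.0 / ns)
```

With uniform weights, `effective_sample_size` equals Ns; in the fully degenerate case (one particle holding all the weight) it equals 1, matching the intuition behind the N_T threshold test.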

    4.2. The Iterative Particle Filter

In the SIR particle filter, the particle weights are related to the likelihood value $p(z_k^i|x_k^i)$. In our method, we design the weight of each particle to be proportional to a criterion function $G(z_k^i|x_k^i)$, which quantitatively defines the goodness of the observation $z_k^i$ with respect to the hypothetic state parameter $x_k^i$. Instead of employing the conditional resampling strategy of the SIR particle filter, we perform the resampling process unconditionally at every frame $k$. Furthermore, when tracking on each frame, a filtering process and the resampling process are iteratively performed for a fixed number of iterations to enhance the goodness of the surviving particles for tracking. This is why we call the proposed particle filter an iterative particle filter.

    4.2.1. Models of State Transition and Measurement

For tracking an object with a particle filter, we represent the state parameters with the location and the dimension of the tracked object on each video frame. Hence, the state parameters are encoded with a vector $x = (x, y, w, h)$, defining the object's bounding box, which has its upper-left corner situated at $(x, y)$ and a dimension of $w \times h$ pixels on the frame.

As formulated in Eqs. (8) and (9), a particle filter requires a state transition model and a measurement model. To characterize the object motion, a discrete equal-velocity equation is adopted for modeling the position parameters of the object, i.e., $v_k = p_{k-1} - p_{k-2}$, where $p_k = (x_k, y_k, 0, 0)^T$ denotes the position of the object's bounding box at frame $k$. For simplicity, the perturbation $u_k$ in Eq. (8) is defined as a random vector $u_k = (\Delta x_k, \Delta y_k, \Delta w_k, \Delta h_k)^T$. This vector adds small random deviations, $\Delta x_k$, $\Delta y_k$, $\Delta w_k$, and $\Delta h_k$, to the estimated position and dimension of the object. Combining the velocity and the random perturbations, the state transition model can be defined as

$$x_k = x_{k-1} + v_k + u_k. \qquad (12)$$

Given the state parameter vector $x_k$, the measurement on the input frame $I_k$ is defined as the appearance of the tracked object, i.e., $z_k = I_k(x_k)$, where $I_k(x_k)$ denotes the subimage enclosed by the bounding box specified by $x_k$. Here, no random noise, such as the $n_k$ in Eq. (9), is assumed for the measurement model.
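Sampling Eq. (12) for one particle can be sketched as follows, assuming the two previous state estimates are available; the Gaussian form and the noise scales are our assumptions, since the paper does not specify the perturbation distribution here:

```python
import numpy as np

def transition(x_prev, x_prev2, rng, sigma=(2.0, 2.0, 1.0, 1.0)):
    """Equal-velocity state transition (Eq. (12)) for x = (x, y, w, h):
    v_k = p_{k-1} - p_{k-2} shifts only the position entries, and u_k
    adds small random deviations to position and dimension."""
    v = np.array([x_prev[0] - x_prev2[0], x_prev[1] - x_prev2[1], 0.0, 0.0])
    u = rng.normal(0.0, sigma)   # random perturbation u_k, one per entry
    return x_prev + v + u
```

Each particle draws its own `u`, so a cloud of hypothetic bounding boxes forms around the constant-velocity prediction.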

    4.2.2. Iterative Filtering and Resampling

The particle weight in our design is computed from a quantitative function $G(z_k^i|x_k^i)$ which evaluates the goodness of the hypothetic observation on each particle. Suppose that $t_k$ is the adaptive template used for tracking the target object. The quantitative function $G(z_k^i|x_k^i)$ can be designed in terms of the matching error between the observation $z_k^i = I_k(x_k^i)$ and the template $t_k$. Based on the linear subspace representation, the particle weight is designed as

$$w_k^i = G(z_k^i|x_k^i) = \exp\left(-\beta\,\|U^T z_k^i - U^T t_k\|^2_{\lambda_k}\right), \qquad (13)$$

where the columns of the matrix $U$ contain the eigenvectors of the current linear subspace. The matching error in Eq. (13) involves a weighted norm, which is defined as

$$\|a\|^2_{\lambda_k} = \|(a_1, a_2, \ldots, a_m)^T\|^2_{\lambda_k=(\lambda_1,\lambda_2,\ldots,\lambda_m)^T} = \sum_{i=1}^{m} \lambda_i\,a_i^2. \qquad (14)$$

The parameter $\beta$ in Eq. (13) is a small positive value that controls the sensitivity of the goodness value to changes of the weighted norm. The elements of the weight vector $\lambda_k$ reflect the importance of the elements of the vector in calculating the norm. Inside the compressed adaptive template $U^T t_k$, if an element presents only a small variance over a period of time, meaning that this element is stable and reliable under the current appearance variations, then this element should gain a higher weight ($\lambda_i$) in computing the norm. The detailed procedure for obtaining the weight vector $\lambda_k$ and the adaptive template $t_k$ is presented later in Section 4.4.
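Eqs. (13) and (14) amount to a weighted squared distance between subspace projections; a minimal sketch follows (the symbol `beta` for the small positive sensitivity parameter and the function names are our choices):

```python
import numpy as np

def weighted_sq_norm(a, lam):
    """||a||^2_lambda = sum_i lambda_i * a_i^2 (Eq. (14))."""
    return float(np.sum(lam * a ** 2))

def goodness(z, t, U, lam, beta=0.01):
    """G(z|x) = exp(-beta * ||U^T z - U^T t||^2_lambda) (Eq. (13)):
    project observation and template onto the subspace, then score
    their weighted distance."""
    d = U.T @ z - U.T @ t
    return float(np.exp(-beta * weighted_sq_norm(d, lam)))
```

A perfect match yields a goodness of 1, and the score decays toward 0 as the projected observation drifts from the projected template, faster along the high-λ (stable) coordinates.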

When tracking an object on a certain video frame, we use the final state estimation of the previous frame as the seed to stochastically generate hypothetic state instances on the particles. The generation of state instances follows the stochastic state transition model given in Eq. (12). The goodness of each particle is then evaluated according to Eq. (13). With the particles and their goodness values (say $S_k = \{(x_k^i, w_k^i = G(z_k^i|x_k^i))\}_{i=1}^{N_s}$), a filtering operation removes the particles with lower goodness values, and the remaining particles are

$$S'_k = filter(S_k, \theta) = \{(x_k^i, w_k^i)\ |\ ((x_k^i, w_k^i) \in S_k) \wedge (w_k^i \ge \theta)\}, \qquad (15)$$

where $\theta = 1.2 \cdot \min\{w_k^i\}_{i=1}^{N_s}$ is set as a 20% increment over the minimum value of the weights. With the particles in the new particle set $S'_k$, the resampling process of the particle filter is then performed. In the resampling process, the particles are resampled according to a probability proportional to their weights. For each sampled particle, a small random perturbation $u_k$, as defined in Eq. (12), is applied to the carried state instance to increase the opportunity of escaping from a locally optimal estimation. The random perturbation is designed to decrease with the iterations, i.e., $u'_k = \gamma^{iter}\,u_k$, for $\gamma \in (0, 1)$, to ensure the final convergence of the state estimation after several iterations. On each frame, the filtering operation and the resampling process are iteratively performed in turn for several runs to enhance the goodness of the remaining particles for tracking.

Let $S''_k$ be the particle set obtained from the final run of the resampling process. The weight of each particle in $S''_k$ is re-evaluated according to Eq. (13). Each weight is then normalized by dividing its value by the sum of all weights. Finally, the aggregation scheme is performed to infer the final estimation from the normalized weights, i.e.,

$$\hat{x}_k = \sum_{(x_k^i,\,w_k^i)\,\in\,filter(S''_k,\,\theta)} w_k^i\,x_k^i. \qquad (16)$$
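The per-frame loop of filtering (Eq. (15)), weight-proportional resampling with a shrinking perturbation, and aggregation (Eq. (16)) can be sketched as follows; `weigh` and `perturb` stand in for Eq. (13) and the perturbation of Eq. (12), and the guard for the all-equal-weight case is our addition:

```python
import numpy as np

def iterate_particles(particles, weigh, perturb, n_iters=3, gamma=0.5,
                      rng=None):
    """One frame of the iterative particle filter: drop particles below
    theta = 1.2 * min(w) (Eq. (15)), resample the survivors in proportion
    to their weights with a perturbation that shrinks as gamma**iter, and
    finally aggregate the normalized weights (Eq. (16))."""
    rng = rng or np.random.default_rng()
    for it in range(n_iters):
        w = np.array([weigh(p) for p in particles])
        keep = w >= 1.2 * w.min()       # filtering, Eq. (15)
        if not keep.any():              # guard: all weights equal
            keep = w == w.max()
        particles, w = particles[keep], w[keep]
        idx = rng.choice(len(particles), size=len(particles), p=w / w.sum())
        scale = gamma ** (it + 1)       # shrinking perturbation
        particles = np.array([perturb(particles[i], scale) for i in idx])
    w = np.array([weigh(p) for p in particles])
    w = w / w.sum()                     # normalize for aggregation
    return (w[:, None] * particles).sum(axis=0)   # Eq. (16)
```

Each pass prunes the worst hypotheses and re-seeds around the best ones, so the surviving cloud contracts toward high-goodness states, which is the coarse-to-fine behavior the text describes.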

To illustrate the gradual improvement of the tracking over the iterations, Figure 1 demonstrates an example of tracking a fast-moving hand.

Figure 1: The gradually improved particle estimates of the proposed iterative particle filter for 3 iterations of particle resampling. (a) Tracking result on a frame at time t-1, (b) the particle estimates of the three iterations of particle resampling, illustrated respectively with white boxes, gray boxes, and black boxes, on the frame at time t, and (c) the final aggregated tracking result on the frame at time t.

The white box in Figure 1(a) shows the tracking result on a certain frame. On tracking the hand in the next frame, shown in Figure 1(b), the white boxes illustrate the estimates from 100 particles generated in the first iteration. After the second iteration, the estimates from these particles are illustrated with gray boxes. Apparently, the estimates get closer to the true position of the hand compared to the estimates in the first iteration. When the third iteration is completed, the generated estimates, illustrated with black boxes, are even better than the estimates in the second iteration. The final aggregated estimation is shown with the white box in Figure 1(c). For the conventional particle filter, this kind of fast-moving object would demand a large number of particles (e.g., >600) and require large perturbations to attain good tracking results. Additionally, the tracking results on different video frames may drift unstably because of the introduced large perturbations.

    4.3. Handling of Partial Occlusions

    Owing to the RWLS representation, the matching between each observation and theobject template is also perormed in a region-wise manner. Intuitively, the matching errors

    o occluded regions would be larger than those o un-occluded regions. Let zk(r) be the image

    observation corresponding to region r, and tk(r) be the corresponding regional sub-image

    on the object template. The regional matching error on this region is computed by

    Err(zk(r)) = Utkzk(r) U

    tktk(r)k , (17)

    14

  • 8/2/2019 Tracking Revised)

    15/29

where the matrix U_k contains the eigenvectors of the current linear subspace, and the weighted norm ||·||_{ω_k} is defined in Eq. (14). A region r is claimed to be occluded if its regional matching error satisfies the following condition

Err(z_k(r)) > mean({Err(z_k(r))}_{r=1}^{R}) + λ · stdv({Err(z_k(r))}_{r=1}^{R}),  (18)

where R is the total number of partitioned regions and the two functions mean(S) and stdv(S) compute respectively the mean and the standard deviation of the data in a given data set S. The constant λ controls the allowed error deviation from the averaged regional matching errors. For each particle, if more than a half of the partitioned regions on the hypothetic observation of this particle are identified as occluded regions, then this particle is discarded. For each video frame, if no particle remains after the identification of occluded regions, then the particle filter skips the tracking on the current frame and proceeds to the next frame.

For handling partial occlusions, the matching errors of occluded regions should not be included in the final matching error between the observation and the template. Otherwise, the tracking may be misled by these occluded regions. Hence, we refine the matching error between the object observation and the object template as

Err(z_k) = (1 / (R - |S_occ|)) Σ_{r ∉ S_occ} Err(z_k(r)),  (19)

where S_occ is the set containing all occluded regions and |S_occ| denotes the number of occluded regions.
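The region-wise occlusion test of Eqs. (17)-(19) can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the helper names (`regional_errors`, `occluded_regions`, `refined_error`) and the constant `lam` (standing in for the deviation constant of Eq. (18)) are our assumptions, the weighted norm is taken to be the weighted 2-norm suggested by Eq. (14), and vectors are plain Python lists.

```python
import math
from statistics import mean, stdev

def regional_errors(z_regions, t_regions, U_rows, w):
    """Eq. (17): weighted norm between the subspace projections of each
    observed region z_k(r) and the matching template region t_k(r).
    U_rows holds the eigenvectors as rows; w is the weight vector of the
    weighted norm (forms assumed, following Eq. (14))."""
    errs = []
    for z, t in zip(z_regions, t_regions):
        # project both region vectors onto the subspace: U^T z and U^T t
        pz = [sum(u[i] * z[i] for i in range(len(z))) for u in U_rows]
        pt = [sum(u[i] * t[i] for i in range(len(t))) for u in U_rows]
        errs.append(math.sqrt(sum(w[j] * (pz[j] - pt[j]) ** 2
                                  for j in range(len(pz)))))
    return errs

def occluded_regions(errs, lam):
    """Eq. (18): a region is occluded if its error exceeds the mean
    regional error by more than lam standard deviations."""
    thr = mean(errs) + lam * stdev(errs)
    return {r for r, e in enumerate(errs) if e > thr}

def refined_error(errs, occ):
    """Eq. (19): average the matching error over un-occluded regions only."""
    kept = [e for r, e in enumerate(errs) if r not in occ]
    return sum(kept) / len(kept)
```

With one region much worse than the rest, only that region is flagged and the refined error is unaffected by it, which is exactly the behavior that lets the tracker survive a partial occlusion.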

    4.4. Adaptive Template Updating

The incremental updating of the linear subspace adapts the subspace representation to the up-to-date inputs. However, even with the up-to-date subspace representation, the stored object template is still likely to be out-of-date. Therefore, adaptively updating the template is another important mechanism for handling the appearance variations of objects. We call the adaptively updated template an adaptive template.


The adaptive template is updated with the appearance of the object tracked on the most recent frame. Nonetheless, the template updating should be made conditional to avoid the improper influence of disturbances. According to the tracked object on the current frame, if no region is identified as an occluded region, meaning that the tracked observation has no fatal disturbances in appearance, then the template can be updated with the tracked observation. Let t_k be the adaptive template at frame k, and z_k be the corresponding tracked observation. The update is performed according to

t_{k+1} = (1 - α) t_k + α z_k,  (20)

where α ∈ (0, 1) controls the rate of the updating. A larger value of α means a faster adaptation of the template toward the new object appearance. Empirically, the value of this parameter highly depends on the rate of appearance changes. The parameter is normally set below 0.05 for objects with normal moving speeds and slowly changing illumination, and around 0.05-0.5 for faster variations in object appearances and illumination.
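The conditional update of Eq. (20) is a per-pixel exponential moving average; a minimal sketch (the function name and the flat-list template representation are our assumptions):

```python
def update_template(t_k, z_k, alpha=0.05):
    """Eq. (20): blend the current template t_k with the newly tracked
    observation z_k; alpha in (0, 1) sets the adaptation rate."""
    return [(1.0 - alpha) * t + alpha * z for t, z in zip(t_k, z_k)]
```

With α = 0.05, the template retains 95% of its previous content on every frame, so a new appearance is absorbed gradually and a single disturbed observation cannot overwrite the template.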

Since the template is adaptable, some elements of the template vector may change frequently due to the appearance changes of the tracked object. Such varying elements may unstably affect the calculation of the weighted norm between the observed appearance and the object template in Eq. (13) and Eq. (17) and consequently lead to wrong tracking results. Hence, we introduce the weight vector in Eq. (13) and Eq. (17) to reduce such a negative effect. Recall that the elements of the weight vector assign different importance factors to the elements of the compared vectors when calculating the weighted norm. Highly varying elements in the template should be given low importance values. Therefore, the importance of each element in the template vector can be quantitatively evaluated in terms of the variance computed from the element values collected from the tracked objects on past frames. Similar to the adaptive updating in Eq. (1) and Eq. (2), the variances of the elements in the


template vector represented in the linear subspace can be incrementally updated by

μ_{k+1} = (1 / (k + 1)) (k μ_k + U_{k+1}^T t_{k+1}),  (21)

Σ_{k+1} = (k / (k + 1)) Σ_k + (k / (k + 1)^2) (U_{k+1}^T t_{k+1} - μ_{k+1}) (U_{k+1}^T t_{k+1} - μ_{k+1})^T,  (22)

ω_{k+1} = expv(diag(-(1/2) (U_{k+1}^T t_{k+1} - μ_{k+1}) Σ_{k+1}^{-1} (U_{k+1}^T t_{k+1} - μ_{k+1})^T)),  (23)

where diag(M) denotes a vector composed of the diagonal elements of a matrix M and expv(a = (a_1, a_2, ..., a_d)^T) = (exp(a_1), exp(a_2), ..., exp(a_d))^T. In summary, the proposed method adapts both the template and the linear subspace. Eq. (20) defines the way to adapt the template in the original image space. Meanwhile, the eigenspace is incrementally updated by the proposed method presented in Section 3.2. Eqs. (21)-(23) define the way to adaptively compute the weight vector ω_k required for calculating the matching residual defined in Eq. (17). Algorithm 1 lists the detailed steps of the proposed iterative particle filter.
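If the covariance Σ is treated as diagonal, Eqs. (21)-(23) reduce to per-element running-mean and running-variance updates followed by a Gaussian weighting of each element. The sketch below makes that diagonal assumption explicit; the function name, the `eps` regularizer, and the flat-list representation are our additions, not the paper's:

```python
import math

def update_weights(mu, var, k, p, eps=1e-8):
    """Per-element (diagonal-covariance) sketch of Eqs. (21)-(23).
    mu, var: running mean and variance of the template's subspace
    coefficients after k frames; p = U^T t_{k+1} is the projection of
    the new template. Returns the updated (mu, var, weight vector)."""
    n = len(p)
    # Eq. (21): incremental mean update
    mu_new = [(k * mu[i] + p[i]) / (k + 1) for i in range(n)]
    # Eq. (22): incremental variance update (diagonal of the covariance)
    var_new = [(k / (k + 1)) * var[i]
               + (k / (k + 1) ** 2) * (p[i] - mu_new[i]) ** 2
               for i in range(n)]
    # Eq. (23): Gaussian importance weight per element; a stable element
    # gets a weight near 1, a highly varying one a weight near 0
    w = [math.exp(-0.5 * (p[i] - mu_new[i]) ** 2 / (var_new[i] + eps))
         for i in range(n)]
    return mu_new, var_new, w
```

This matches the stated intent: elements whose values swing widely across frames receive low importance in the weighted norms of Eq. (13) and Eq. (17).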

    5. Experimental Results and Performance Comparison

To evaluate the performance of the proposed algorithm, some experiments are conducted to track objects in six testing video sequences acquired in real-world environments. Two sequences are available at http://www.cs.toronto.edu/dross/ivt/, presenting appearance variations, including illumination changes, pose changes, and facial expressions, on the tracked objects. The other four sequences are captured with our camcorder and present cases of partial occlusions, size variations, and fast motions. The tracking algorithm is implemented in C++ on Microsoft Visual Studio using an Intel Pentium 4 2.8 GHz CPU. The assessed processing speed is about 14.7 frames per second for 100 particles.

When initializing the tracking of a sequence, we manually specify the rectangular bounding box for the target object on the first frame. The boxed target object appearance is then resized to a 24×24 object template. Then, we randomly translate and scale the bounding box by a small random perturbation 100 times to acquire 100 samples to build the initial linear subspaces of the partitioned regions. Two region partitioning schemes, R1 and R2, are


Algorithm 1 The Proposed Tracking Algorithm with Occlusion Handling

1: Given a particle set S_{k-1} = {x^i_{k-1}, 1/N_s}_{i=1}^{N_s}, the target template t_{k-1}, and the subspace model {x̄_{k-1}, U_{k-1}, C_{k-1}} at frame k-1.
2: Set occ_flag = 0 to indicate no occlusion.
3: Set iter = 1.
4: for i = 1 : N_s do
5:   Propagate the particle set for the initial iteration by x^i_{k,iter} = x^i_{k-1} + v_k + u_k.
6:   Evaluate the measurement z^i_{k,iter} corresponding to the state x^i_{k,iter}.
7:   Update the weight w^i_{k,iter} by Eq. (13).
8: end for
9: Normalize the weights w^i_{k,iter} = w^i_{k,iter} / Σ_{i=1}^{N_s} w^i_{k,iter}, for 1 ≤ i ≤ N_s.
10: for iter = 2 : Iter do
11:   Generate the seed sample set according to the filtering function S_{k,iter} = {x^j_{k,iter}, w^j_{k,iter}} = filter(S_{k,iter-1}, ·) by Eq. (15).
12:   Set c^0_k = 0.
13:   for j = 1 : J do
14:     c^j_k = c^{j-1}_k + w^j_{k,iter}.
15:   end for
16:   Normalize the cumulative probabilities c^j_k = c^j_k / c^J_k, for 1 ≤ j ≤ J.
17:   for i = 1 : N_s do
18:     Generate a uniformly distributed random number r ∈ [0, 1].
19:     Find the smallest j for which c^j_k ≥ r.
20:     Propagate the particle by x^i_{k,iter} = x^j_{k,iter} + a perturbation u_k scaled down with the iteration number iter.
21:     Evaluate the measurement z^i_{k,iter} corresponding to the state x^i_{k,iter}.
22:     Update the weight w^i_{k,iter} by Eq. (13).
23:   end for
24:   Normalize the weights w^i_{k,iter} = w^i_{k,iter} / Σ_{i=1}^{N_s} w^i_{k,iter}.
25: end for
26: Perform the filtering function S_k = filter(S_{k,iter}, ·).
27: Normalize the weights w^i_{k,iter} = w^i_{k,iter} / Σ_{i=1}^{N_s} w^i_{k,iter}.
28: Estimate the state x_k by Eq. (16).
29: Set occ_flag according to the matching error by Eq. (18).
30: if occ_flag = 0 then
31:   Update the template t_k by Eq. (20) and the subspace model as presented in Section 3.2.
32: end if
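The core resampling step of Algorithm 1, lines 12-20 (cumulative weights, inverse-CDF seed selection, and the perturbation that shrinks with the iteration number), can be sketched as below. The function name, the scalar `shrink` decay factor, and the box-shaped uniform perturbation are our assumptions for illustration:

```python
import random

def resample_and_perturb(seeds, weights, n_particles, perturb, shrink, it):
    """Sketch of lines 12-20 of Algorithm 1: accumulate the seed weights
    into normalized cumulative probabilities, pick each new particle's
    seed by inverse-CDF sampling, then add a perturbation whose range
    shrinks with the iteration number (the coarse-to-fine step)."""
    # lines 12-16: cumulative probabilities c^j_k, normalized by c^J_k
    cum, acc = [], 0.0
    for w in weights:
        acc += w
        cum.append(acc)
    cum = [c / acc for c in cum]
    particles = []
    for _ in range(n_particles):
        r = random.random()                               # line 18
        j = next(i for i, c in enumerate(cum) if c >= r)  # line 19: smallest j with c^j_k >= r
        scale = perturb * (shrink ** it)                  # shrinking perturbation range
        particles.append([x + random.uniform(-scale, scale)
                          for x in seeds[j]])             # line 20
    return particles
```

Because high-weight seeds occupy larger intervals of the cumulative distribution, later iterations concentrate the particles around the good hypotheses while the shrinking perturbation refines them locally.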


Table 1: Parameter settings for the proposed iterative particle filter

forgetting factor [0.05, 1]    0.2    0.7    1.5    α [0.03, 0.1]

used for comparison. The numbers of eigenvectors used in representing the linear subspaces of R1 and R2 are 50 and 15, respectively. Only 100 particles are generated in our iterative particle filter when tracking each testing sequence.

The settings of the other parameters, including the forgetting factor, are listed in Table 1. The crucial parameters are the forgetting factor and the learning rate α of the template, which should be set according to the rate of appearance changes of the tracked object. The ranges of the uniformly distributed random vector u_k = (Δx_k, Δy_k, Δw_k, Δh_k)^T are Δx_k ∼ U[-12, 12], Δy_k ∼ U[-12, 12], Δw_k ∼ U[-0.035, 0.035], Δh_k ∼ U[-0.035, 0.035].

    5.1. Experimental Results

The experiment conducted on the first testing video sequence is to track a human face (dudek) with different head poses and facial expressions. Figure 2 demonstrates some snapshots of the tracking results. At the bottom of each illustrated snapshot, the thumbnail images from left to right show the tracked target, the template, the subspace mean, the approximation error (residual) image, and the approximated image, respectively. For comparison, Figure 2 simultaneously illustrates the tracking results of the following four different combinations of adaptive mechanisms:

1. a fixed template and a fixed subspace representation,
2. an adaptive template and a fixed subspace representation,
3. a fixed template and an adaptive subspace representation, and
4. an adaptive template and an adaptive subspace representation.

The results show convincingly that an adaptive template combined with an adaptive subspace representation attains the best performance. The combinations with no adaptive templates


fail to track the face from Frame #796 onward, while the one with both the adaptive template and the adaptive subspace correctly tracks the face on all frames.

Another video sequence for tracking the face of another person (ming-hsuan) is also tested. Figure 3 shows snapshots of the tracking results for comparing the use of the fixed subspace and the adaptive subspace. Both compared methods use the adaptive template. Note that this testing sequence contains large illumination variations on some frames (#400 and #1200). The top row in Figure 3, which illustrates the tracking results of the fixed subspace, shows that the face cannot be well tracked on Frame #1420. However, this frame still can be correctly tracked by the adaptive subspace, as shown in the bottom row of Figure 3.

Frame 362    Frame 684    Frame 934

Figure 2: Face tracking on a sequence (dudek) with variations in poses and facial expressions. Column 1: the tracking results with a fixed template and a fixed subspace model; Column 2: the tracking results with a fixed template and an adaptive subspace model; Column 3: the tracking results with an adaptive template and a fixed subspace model; Column 4: the tracking results with an adaptive template and an adaptive subspace model.

Two other sequences are tested to examine the efficacy of the RWLS in handling partial occlusions. The first sequence is the video of a moving toy tank which is gradually occluded


Frame 1    Frame 400    Frame 1200    Frame 1420
Frame 840    Frame 1120    Frame 1200    Frame 1420

Figure 3: Face tracking on another sequence (ming-hsuan) with large variations in illumination and poses. The top row shows the tracking results with an adaptive template and a fixed subspace representation. The bottom row shows the tracking results with an adaptive template and an adaptive subspace representation.

by another scene object during its motion. The maximal occluded area during its motion is about 50%. Figure 4 shows the tracking results on some frames for the region partitioning schemes R1 and R2. Note that the character N shown at the left side of each snapshot of the scheme R2 indicates that the corresponding partitioned region is automatically identified as an un-occluded region by the proposed tracking algorithm. On the contrary, a region which is identified as an occluded region is labeled with its region identification number. As shown in Figure 4, the scheme R2 successfully tracks the occluded tank, while the scheme R1 fails. This result verifies the good capability of the region-wise tracking of objects in handling partial occlusions. The other testing sequence requires tracking a Chinese character printed on an aluminium foil package. As shown in Figure 5, the maximal occluded area during its motion is about 40% of the size of this Chinese character. The results demonstrate again the superiority of the proposed region-wise tracking of objects in handling partial occlusions.


Frame 1    Frame 295    Frame 308    Frame 500
Frame 1    Frame 295    Frame 308    Frame 500

Figure 4: Tracking a moving toy tank with severe occlusions during its motion. Top row: the results for the R1 representation. Bottom row: the results for the R2 representation.

Frame 1    Frame 57    Frame 77    Frame 97
Frame 1    Frame 57    Frame 77    Frame 97

Figure 5: Tracking a Chinese character on a moving aluminium foil package with partial occlusions. Top row: the results for the R1 representation. Bottom row: the results for the R4 representation. The objects tracked by the region partitioning R4 are more accurate in object size on Frame 77 and Frame 97.


    5.2. Performance Comparisons with Other Particle Filters

The performance of the proposed iterative particle filter is compared with those of the SIR particle filter and a general version of the particle filter, called the GPF in this paper. The GPF performs only one iteration of particle generation. For a more balanced comparison, both the adaptive subspace and the adaptive template are exploited in the SIR and the GPF. Two video sequences are tested for the performance comparison. One sequence is a car moving away from the camera at a normal speed and the other is a doll moved by a fast-moving hand. In testing the video sequences, both SIR and GPF use 300 particles, while our iterative particle filter uses 75 particles in each iteration. The number of iterations is set to four.

As shown in Figure 6, both SIR and GPF fail to track the car accurately when the car moves further away in the car sequence. This result indicates that SIR and GPF cannot well handle the size variations of the car. In contrast, the proposed iterative particle filter shows better capability in tracking objects with size variations, according to the results in Figure 6. The performance comparison for the doll sequence is shown in Figure 7. Note that the high moving speed of the hand causes motion blur on some frames. Consequently, both SIR and GPF fail to track the doll from Frame #130 onward. Again, the proposed iterative particle filter demonstrates its better performance in tracking fast-moving objects in this test. The proposed iterative particle filter successfully tracks the doll on every frame of the sequence.

5.3. Performance Comparison with Hall's Incremental PCA

The proposed WI-PCA is an improved variant of the incremental PCA proposed by Hall et al. Hence, we use three sequences, including the car sequence, the doll sequence, and the jal sequence, as the benchmarks for comparing the performance of Hall's method and our method. Note that the adaptive template is also exploited in both methods for the test.

To quantitatively evaluate the tracking accuracy for comparison, we compute the position error of the tracking result on each video frame. The position error is defined to be Err_pos = sqrt(Σ_{i=1}^{4} ||X(i) - X̂(i)||^2), where X(i) and X̂(i), for 1 ≤ i ≤ 4, are respectively the


Frame 1    Frame 62    Frame 77    Frame 115

Figure 6: Tracking a car moving away from the camera using the SIR, the GPF (using 300 particles), and the proposed particle filter (using 4 iterations with 75 particles per iteration) on rows 1, 2, and 3, respectively.


Frame 109    Frame 127    Frame 130    Frame 137
Frame 109    Frame 127    Frame 130    Frame 137
Frame 109    Frame 127    Frame 130    Frame 137

Figure 7: Tracking a doll moved rapidly by a hand. The top two rows are the tracking results for SIR and GPF, respectively. The bottom row shows the tracking results of the proposed particle filter.


corresponding corners of the bounding boxes of the tracked object and the ground truth.
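Under this definition, the corner-based position error can be sketched as follows (the function name and the representation of corners as (x, y) pairs are our assumptions):

```python
import math

def position_error(tracked_corners, gt_corners):
    """Corner-based position error: the root of the summed squared
    distances between the four tracked bounding-box corners X(i) and
    the ground-truth corners X^(i)."""
    sq = sum((x - gx) ** 2 + (y - gy) ** 2
             for (x, y), (gx, gy) in zip(tracked_corners, gt_corners))
    return math.sqrt(sq)
```

Using all four corners, rather than only the box center, penalizes errors in scale and aspect ratio as well as in position.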

After five rounds of tracking on each sequence, Table 2 lists the position errors averaged over all frames for these three sequences. The results show that the accuracy of the proposed WI-PCA is slightly better than that of Hall's approach. In the test, we find that Hall's approach may improperly update the linear subspace with samples of bad object appearances drawn from incorrect tracking results. On the contrary, our proposed WI-PCA can ignore the bad samples during updating through the designed weighting mechanism. Furthermore, the forgetting factor introduced in our WI-PCA gives higher influence to the up-to-date good samples, while Hall's method treats the out-of-date samples equally.

Table 2: The statistics of the tracking position errors for three video sequences.

                 jal          car          doll
WI-PCA method    5.35±0.02    6.50±1.16    7.42±0.31
Hall's method    6.45±0.23    7.25±2.14    7.60±0.40

To further demonstrate the effectiveness of the proposed WI-PCA in modeling object appearance, we also compare the reconstruction errors of the WI-PCA method and Hall's method. Fifty eigenvectors are selected to reconstruct the appearance of the object for both compared methods. In addition to the previous three benchmark sequences, we also include the two face sequences (dudek and ming-hsuan) used in Section 5.1 for evaluation. Table 3 presents the computed reconstruction errors averaged over all frames for five runs of tracking on each sequence. The reconstruction error is defined to be RMSE = sqrt((1/N) Σ_{i=1}^{N} (I(i) - Î(i))^2), where I(i) and Î(i) are respectively the ith pixel of the template image and the reconstructed image, and N is the total number of pixels in the template. The results in Table 3 show that the proposed WI-PCA better represents the varying appearances of objects.
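The per-pixel RMSE above can be sketched in a few lines, assuming both images are flattened to equal-length pixel lists (the function name is ours):

```python
import math

def rmse(template, reconstructed):
    """Per-pixel RMSE between the template image I and its subspace
    reconstruction I^, both given as flat lists of N pixel values."""
    n = len(template)
    return math.sqrt(sum((a - b) ** 2
                         for a, b in zip(template, reconstructed)) / n)
```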

    5.4. Performance Comparison with Mean-Shift Algorithm

We further compare the proposed tracker with the mean-shift algorithm and the CAMShift algorithm on the doll sequence and the jal sequence. As described in Section


Table 3: Statistics of the reconstruction errors (RMSE per pixel).

                 dudek          ming-hsuan     jal            car            doll
WI-PCA method    0.768±0.005    0.996±0.002    0.955±0.003    1.352±0.074    1.422±0.001
Hall's method    0.739±0.117    1.306±0.018    0.997±0.011    1.517±0.017    2.132±0.116

2, since the mean-shift algorithm uses the global color histogram as the visual feature for object tracking, accurate estimation of object poses becomes more difficult. Furthermore, the mean-shift tracker is error-prone under varying illumination because the color histogram is sensitive to illumination. Figure 8 demonstrates some snapshots of the tracking results on the two testing sequences. The tracked results of the proposed tracker and the mean-shift tracker are drawn with solid white boxes and dashed yellow boxes, respectively. Figure 8 shows that the mean-shift tracker loses the doll when the doll moves rapidly (#130, #200), while the proposed tracker tracks very well until the end of the sequence. Figure 8 also shows that the mean-shift tracker fails to track the face under illumination changes (#40, #170). Figures 9(a) and 9(b) plot the per-frame position errors of the tracked object on the doll sequence and the jal sequence, respectively, for the compared trackers.

Frame 110    Frame 130    Frame 182    Frame 209
Frame 40    Frame 80    Frame 160    Frame 197

Figure 8: Snapshots of the tracked object on the doll sequence and the jal sequence for comparing the proposed tracker, the mean-shift tracker, and the CAMShift tracker. The dashed yellow boxes are the results of the mean-shift tracker; the solid red boxes are the results of the CAMShift tracker; the solid white boxes are the results of the proposed tracker.


    6. Concluding Remarks

This paper designs an improved particle filter and an RWLS representation for visual object tracking. With this RWLS representation, the proposed tracking method partitions the object into k×k independent regions. The independent tracking of these regions enables the proposed method to ignore the occluded regions and continue tracking the un-occluded regions. Thus, partial occlusions can be effectively handled during the tracking.

To enhance the adaptability of the linear subspace, an adaptive subspace learning model, the WI-PCA, which can efficiently and incrementally update the built subspace, is proposed. This adaptive learning model can well adapt to the variations in object appearances and illumination. In addition, the WI-PCA seeks to moderate the bad influence of outliers, noisy inputs, and out-of-date data by introducing a weighting mechanism and a forgetting factor into the adaptation. Besides the adaptive subspace, the object template for tracking is also made adaptive so that the dynamic appearance variations of the tracked object can be handled even better.

Within the particle filter framework, we propose an iterative particle filter that improves over the traditional particle filters. The improved particle filter features a better strategy of iterative particle generation, designed to guarantee gradually improved quality of the generated particles. The improved particle quality leads to improved tracking accuracy after the particle aggregation. The experimental results demonstrate the effectiveness and the superiority of the proposed algorithm in tracking objects undergoing various pose changes, partial occlusions, and illumination variations.


(a) The doll sequence.
(b) The jal sequence.

Figure 9: The comparison of position errors of the tracked objects on the doll sequence and the jal sequence for the mean-shift tracker, the CAMShift tracker, and the proposed tracker.
