SE263 Video Analytics Course Project Initial Report Presented by M. Aravind Krishnan, SERC, IISc X. Mei and H. Ling, ICCV’09

Post on 25-Feb-2016

TRANSCRIPT

Page 1: SE263 Video Analytics  Course Project Initial Report

SE263 Video Analytics Course Project Initial Report

Presented by M. Aravind Krishnan, SERC, IISc

X. Mei and H. Ling, ICCV’09

Page 2: SE263 Video Analytics  Course Project Initial Report

The aim of the course project is to implement and, if possible, improve the work of Xue Mei and Haibin Ling on visual tracking, as described in their paper "Robust Visual Tracking using l1 Minimization".

Here "improve" means accelerating execution using special processing hardware called Graphics Processing Units (GPUs).

Page 3: SE263 Video Analytics  Course Project Initial Report

OVERVIEW

1. I will begin by explaining the work done in the paper and the mathematical tools used to achieve its results:

   1. Bayesian state inference framework (the particle filter), used to predict the affine state of the object.

   2. Sparse representation of the tracking target.

   3. Non-negativity constraints.

   4. l1 minimization.

   5. Template update.

2. This will be followed by a brief overview of Graphics Processing Units (GPUs) and how they can be used for general-purpose computation.

3. Finally, the parts of the algorithm best suited to execution on a GPU are proposed.

Page 4: SE263 Video Analytics  Course Project Initial Report

Templates

• A sample/collection of possible views of the object, whose linear combination can be used to represent the tracked object in the frame.

• Two types of templates are considered in this paper: target templates and trivial templates.

• Target templates deal with varying lighting conditions, poses, etc.

• Trivial templates deal with occlusions, noise, background clutter, etc.

Page 5: SE263 Video Analytics  Course Project Initial Report
Page 6: SE263 Video Analytics  Course Project Initial Report

Templates continued

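• Target templates are used densely in the representation (most of their coefficients are non-zero), and hence are few in number.

• Trivial templates are used sparsely in the representation (only a few of their coefficients are non-zero), and hence can be large in number.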
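The way the two template types combine can be sketched as follows. This is a minimal illustration, assuming that templates are vectorised image patches stacked as columns, and that the trivial templates are the columns of an identity matrix (one per pixel) together with their negatives, so that every coefficient can stay non-negative. The sizes and the "occlusion" are made up.

```python
import numpy as np

def build_dictionary(targets: np.ndarray) -> np.ndarray:
    """Stack target templates with positive and negative trivial templates.

    targets: (d, n) array, one vectorised target template per column.
    Returns B = [T, I, -I] of shape (d, n + 2d).
    """
    d = targets.shape[0]
    identity = np.eye(d)
    return np.hstack([targets, identity, -identity])

# Hypothetical toy sizes: 4 target templates of 6 pixels each.
rng = np.random.default_rng(0)
T = rng.random((6, 4))
B = build_dictionary(T)

# A candidate equal to template 0 with pixel 2 brightened by 0.5 (a toy occlusion).
c = np.zeros(B.shape[1])
c[0] = 1.0       # target-template coefficient
c[4 + 2] = 0.5   # one positive trivial-template coefficient (pixel 2)
y = B @ c        # y = T[:, 0] + 0.5 * e_2
```

Only two of the 16 coefficients are non-zero here: one target template explains the appearance and one trivial template absorbs the corrupted pixel.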

Page 7: SE263 Video Analytics  Course Project Initial Report

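State of the object being tracked:

$x_t = (\alpha_1, \alpha_2, \alpha_3, \alpha_4, t_x, t_y)$

$\alpha_1, \dots, \alpha_4$: 2D deformation parameters

$t_x, t_y$: 2D translation parameters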
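To make the six-parameter state concrete, here is a minimal sketch of how such a state could map template coordinates into the frame. The ordering of the parameters inside the 2x3 matrix, the function names, and the template size are assumptions for illustration only.

```python
import numpy as np

def affine_matrix(state):
    """Turn a 6-vector (a1, a2, a3, a4, tx, ty) into a 2x3 affine matrix.

    The placement of a1..a4 in the matrix is an assumed convention.
    """
    a1, a2, a3, a4, tx, ty = state
    return np.array([[a1, a2, tx],
                     [a3, a4, ty]])

def warp_points(state, points):
    """Map (n, 2) template coordinates into frame coordinates."""
    A = affine_matrix(state)
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return homogeneous @ A.T

# Corners of a hypothetical 12x15 template.
corners = np.array([[0.0, 0.0], [12.0, 0.0], [12.0, 15.0], [0.0, 15.0]])

# Identity deformation plus a translation of (40, 25).
pure_translation = np.array([1.0, 0.0, 0.0, 1.0, 40.0, 25.0])
moved = warp_points(pure_translation, corners)
```

With an identity deformation block the corners are simply shifted by the translation; non-identity values of the first four parameters would scale, rotate, or shear the patch.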

Page 8: SE263 Video Analytics  Course Project Initial Report

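If $z_{1:t} = \{z_1, \dots, z_t\}$ denotes the observations up to time t, then the predicted distribution of the object state $x_t$ is given by the recursive computation

$$p(x_t \mid z_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_{1:t-1})\, dx_{t-1}$$

and, once the observation $z_t$ is available, it is updated using Bayes' rule:

$$p(x_t \mid z_{1:t}) = \frac{p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1})}{p(z_t \mid z_{1:t-1})}$$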
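A particle filter approximates this predict/update recursion with weighted samples. The sketch below is a toy one-dimensional bootstrap filter, not the tracker itself: the Gaussian motion and observation models, the noise levels, and the observation values are all assumptions chosen for illustration.

```python
import numpy as np

def particle_filter_step(particles, weights, observation, rng,
                         motion_std=1.0, obs_std=1.0):
    """One resample / predict / update cycle of a 1-D bootstrap particle filter."""
    n = len(particles)
    # Resample: draw particles in proportion to their current weights.
    idx = rng.choice(n, size=n, p=weights)
    # Predict: propagate each particle through an assumed random-walk motion model.
    predicted = particles[idx] + rng.normal(0.0, motion_std, size=n)
    # Update: weight each particle by an assumed Gaussian observation likelihood.
    log_lik = -0.5 * ((observation - predicted) / obs_std) ** 2
    log_lik -= log_lik.max()  # guard against underflow
    new_weights = np.exp(log_lik)
    return predicted, new_weights / new_weights.sum()

rng = np.random.default_rng(0)
particles = rng.normal(0.0, 5.0, size=2000)   # broad initial belief
weights = np.full(2000, 1.0 / 2000)

for z in [3.0, 3.2, 2.9]:                     # made-up observations
    particles, weights = particle_filter_step(particles, weights, z, rng)

estimate = float(np.sum(weights * particles)) # posterior mean estimate
```

The weighted particles play the role of $p(x_t \mid z_{1:t})$; in the tracker the scalar state would be replaced by the six affine parameters.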

Page 9: SE263 Video Analytics  Course Project Initial Report
Page 10: SE263 Video Analytics  Course Project Initial Report

"filtering" refers to determining the distribution of a latent variable at a specific time, given all observations up to that time; particle filters are so named because they allow for approximate "filtering" using a set of "particles" (differently-weighted samples of the distribution). -Wikipedia

Page 11: SE263 Video Analytics  Course Project Initial Report

l1 minimization

Page 12: SE263 Video Analytics  Course Project Initial Report
Page 13: SE263 Video Analytics  Course Project Initial Report

Non negativity
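As an illustration of l1 minimization under a non-negativity constraint, the sketch below minimizes $\|Bc - y\|_2^2 + \lambda \|c\|_1$ subject to $c \ge 0$ with a projected proximal-gradient (ISTA) iteration. This is a simpler technique swapped in for brevity, not the interior-point method discussed in these slides, and the dictionary layout $[T, I, -I]$, the value of $\lambda$, and the toy data are assumptions.

```python
import numpy as np

def nonneg_l1_solve(B, y, lam=0.01, iters=500):
    """Minimise ||B c - y||_2^2 + lam * ||c||_1 subject to c >= 0 (projected ISTA)."""
    # Step size 1/L, where L = 2 * sigma_max(B)^2 bounds the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * np.linalg.norm(B, 2) ** 2)
    c = np.zeros(B.shape[1])
    for _ in range(iters):
        grad = 2.0 * B.T @ (B @ c - y)
        # For c >= 0 the l1 term is linear, so the proximal step is a shifted projection.
        c = np.maximum(0.0, c - step * (grad + lam))
    return c

# Toy problem: 3 unit-norm target templates of 8 pixels, plus trivial templates.
rng = np.random.default_rng(0)
d, n = 8, 3
T = rng.random((d, n))
T /= np.linalg.norm(T, axis=0)
B = np.hstack([T, np.eye(d), -np.eye(d)])

y = T[:, 1].copy()
y[2] += 0.5                      # corrupt one pixel
c = nonneg_l1_solve(B, y)
residual = np.linalg.norm(B @ c - y)
```

Every coefficient stays non-negative by construction, and the negative trivial templates are what allow a pixel to be corrected downward as well as upward.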

Page 14: SE263 Video Analytics  Course Project Initial Report

Optimization

Convex optimization: interior-point method

The method uses the preconditioned conjugate gradients (PCG) algorithm to compute the search direction. The run time is determined by the total number of PCG steps required over all iterations multiplied by the cost of one PCG step. This process can be accelerated by GPUs.
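To show where the cost of a PCG step comes from, here is a minimal sketch of conjugate gradients with a Jacobi (diagonal) preconditioner on a small symmetric positive-definite system. The choice of preconditioner and the test matrix are assumptions; the solver used in the paper may differ.

```python
import numpy as np

def pcg(A, b, tol=1e-10, max_steps=200):
    """Solve A x = b for symmetric positive-definite A using Jacobi-preconditioned CG.

    Returns the solution and the number of PCG steps taken.
    """
    m_inv = 1.0 / np.diag(A)        # inverse of the diagonal preconditioner
    x = np.zeros_like(b)
    r = b - A @ x                   # residual
    z = m_inv * r                   # preconditioned residual
    p = z.copy()                    # search direction
    rz = r @ z
    steps = 0
    for steps in range(1, max_steps + 1):
        Ap = A @ p                  # the matrix-vector product dominates each step
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = m_inv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, steps

rng = np.random.default_rng(0)
M = rng.random((20, 20))
A = M.T @ M + 20.0 * np.eye(20)     # symmetric positive definite by construction
b = rng.random(20)
x, steps = pcg(A, b)
```

Each step costs one matrix-vector product plus a few vector operations, which is the kind of work a GPU handles well.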

Page 15: SE263 Video Analytics  Course Project Initial Report

Algorithm for template update
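The algorithm on this slide did not survive in the transcript. The sketch below follows only the rule stated in the algorithm review (replace the lowest-similarity template when the highest similarity falls below a threshold). Cosine similarity, the threshold value, and the function names are assumptions, and the paper's actual update may include further details.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectorised patches (assumed similarity measure)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def update_templates(templates, tracked, threshold=0.9):
    """Replace the least similar template if no template matches the tracked patch well.

    templates: (d, n) array, one template per column; tracked: (d,) array.
    Returns (templates, replaced_index), with replaced_index None when nothing changes.
    """
    sims = np.array([cosine_similarity(templates[:, i], tracked)
                     for i in range(templates.shape[1])])
    if sims.max() >= threshold:
        return templates, None          # a template still matches well enough
    worst = int(sims.argmin())
    updated = templates.copy()
    updated[:, worst] = tracked         # swap in the newly tracked appearance
    return updated, worst

templates = np.eye(3)[:, :2]            # two toy templates: e0 and e1
novel = np.array([0.1, 0.2, 1.0])       # unlike both templates
familiar = np.array([1.0, 0.05, 0.0])   # close to template 0

updated, replaced = update_templates(templates, novel)
unchanged, kept = update_templates(templates, familiar)
```

Here the novel patch replaces template 0, the less similar of the two, while the familiar patch leaves the template set untouched.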

Page 16: SE263 Video Analytics  Course Project Initial Report

Review of Algorithm

1. Frame 1:

   a. Manually detect the object to be tracked.

   b. Initialize the target templates with random variations of the object.

2. Generate a set of N states around the current state Xt, with each of the 6 affine parameters modeled as an independent Gaussian variable.

3. Represent each of the N generated states as a sparse linear combination of target and trivial templates by solving the l1 minimization problem $\min_c \|Bc - y\|_2^2 + \lambda \|c\|_1$.

4. Calculate p(Xt|Z1:t) by determining the importance weights wi = p(zt|xt), which are in turn determined from the errors/residuals in projecting the tracked object onto each of the solutions of step 3.

5. Update the templates if the highest similarity between the templates and the newly tracked object is less than a threshold, by replacing the lowest-similarity template with the newly tracked object.
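The sampling and weighting steps of the review can be sketched as below. The standard deviations, the exponential form of the weight, and the value of `alpha` are assumptions, and the residual used here is a made-up stand-in for the l1 reconstruction error of each candidate.

```python
import numpy as np

def sample_states(current, stds, n, rng):
    """Draw n candidate states; each affine parameter gets independent Gaussian noise."""
    return current + rng.normal(0.0, 1.0, size=(n, current.size)) * stds

def importance_weights(residuals, alpha=0.05):
    """Map residuals to normalised weights, w_i proportional to exp(-alpha * r_i^2)."""
    log_w = -alpha * np.asarray(residuals) ** 2
    log_w -= log_w.max()                  # avoid underflow before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

rng = np.random.default_rng(0)
current = np.array([1.0, 0.0, 0.0, 1.0, 40.0, 25.0])    # (a1, a2, a3, a4, tx, ty)
stds = np.array([0.01, 0.005, 0.005, 0.01, 4.0, 4.0])   # hypothetical spreads
states = sample_states(current, stds, n=200, rng=rng)

# Stand-in residual: distance of each candidate's translation from a made-up true position.
true_position = np.array([43.0, 24.0])
residuals = np.linalg.norm(states[:, 4:] - true_position, axis=1)

weights = importance_weights(residuals)
best_state = states[np.argmax(weights)]   # candidate with the smallest residual
```

In the tracker, each residual would come from solving the l1 problem for that candidate, which is why the per-candidate solves dominate the run time.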

Page 17: SE263 Video Analytics  Course Project Initial Report

Working of a GPU

• Consists of a large number of ALUs. Banks of ALUs with shared memory are called cores.

• An average CPU consists of up to 4 SIMD units.

• A GPU consists of 32-128 SIMD units.

• A Tesla C1060 unit available in SERC will be used to try to speed up the optimization process, and hence the whole algorithm.

Page 18: SE263 Video Analytics  Course Project Initial Report

The functionality of GPUs – Data Parallelism

• GPUs are extremely good at executing the same instruction across large amounts of data, e.g. vector addition, matrix-vector multiplication, BLAS routines, etc.

• The major bottleneck of this algorithm is the convex optimization performed using the interior-point method. It involves matrix-vector operations between the same matrix and around N different vectors (one per sampled state). These operations are independent of one another and can therefore be parallelized readily, giving a large speedup if done carefully.
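The data-parallel structure can be seen even on a CPU: multiplying the same matrix by N different vectors is equivalent to a single matrix-matrix product (GEMM in BLAS terms), which is the form a GPU library can execute in one call. The sizes below are hypothetical and NumPy is used only to demonstrate the equivalence.

```python
import numpy as np

def matvec_looped(B, C):
    """N separate matrix-vector products, one per column of C."""
    return np.stack([B @ C[:, i] for i in range(C.shape[1])], axis=1)

def matvec_batched(B, C):
    """The same work expressed as one matrix-matrix product."""
    return B @ C

rng = np.random.default_rng(0)
d, m, n_vectors = 180, 200, 64        # hypothetical sizes
B = rng.random((d, m))                # shared matrix
C = rng.random((m, n_vectors))        # one vector per column

looped = matvec_looped(B, C)
batched = matvec_batched(B, C)
```

Both give the same result; the batched form exposes all N products at once, so no product has to wait for another.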

Page 19: SE263 Video Analytics  Course Project Initial Report

Architecture of GPU

Page 20: SE263 Video Analytics  Course Project Initial Report
Page 21: SE263 Video Analytics  Course Project Initial Report

Goals and tasks of project

• Dividing the minimization algorithm among the cores of the GPU, and figuring out the optimal grid configuration.

• Optimizing to perform the whole task with minimal data transfer from CPU to GPU, and running the algorithm in real time using just one kernel invocation for a long video.

• Achieve a frame rate > 30 fps on the Tesla C1060.

• Achieve a frame rate of 18 fps or more using the ATI Mobility Radeon HD 5650 graphics processor with 1 GB of internal memory available in my laptop (requires porting to OpenCL; subject to time constraints).
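The frame-rate targets translate directly into time budgets: 30 fps leaves about 33.3 ms per frame and 18 fps about 55.6 ms. The small sketch below does this arithmetic; the particle count of 600 is a hypothetical value used only to show how the budget divides across candidates if they were processed one after another.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps

def per_particle_budget_us(fps: float, n_particles: int) -> float:
    """Time available per candidate state, in microseconds, if processed serially."""
    return 1e6 / (fps * n_particles)

tesla_budget = frame_budget_ms(30)              # target for the Tesla C1060
radeon_budget = frame_budget_ms(18)             # target for the Radeon HD 5650
serial_share = per_particle_budget_us(30, 600)  # hypothetical 600 particles
```

The serial share per candidate is small, which is why processing the candidates in parallel matters for reaching these targets.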

Page 22: SE263 Video Analytics  Course Project Initial Report

Thank you