
EE 7150

Comparison of Soccer Player Video Tracking Algorithms

Due on Tuesday, December 10, 2013

Cody Lawyer

December 10, 2013


Contents

Abstract

Introduction

Technical Discussion

Results

Conclusion

References

Appendix


Abstract

Tracking players in sporting events using video cameras is an interesting problem. Tracking players allows attributes to be measured that could not normally be measured otherwise, such as fitness and positioning. It is also a difficult problem because in some sports there are many players on the field, the players occlude each other, and the camera may be moving. A variety of methods have been proposed and implemented to track players. The mean shift algorithm and the continuously adaptive mean shift algorithm both use the color histogram of the target to perform the tracking. The particle filter method uses randomly placed particles and state estimation for tracking. This project implements and compares the mean shift algorithm, the continuously adaptive mean shift algorithm, and the particle filter method using MATLAB and a soccer video dataset.


Introduction

The problem of tracking players using video is commonly studied. It is difficult because the camera can be non-stationary, many players can be on the field, and the players can occlude each other, among other difficulties.

In [1], the mean shift algorithm for tracking objects is described. In [2], a modified mean shift algorithm called the continuously adaptive mean shift algorithm is described. Both methods use a color histogram to represent and track the target. In [3], a multiple hypothesis tracker is described which implements camera parameter detection, player detection, and player tracking using multiple camera angles. In [4], player tracking is implemented using a Kalman filter. Finally, in [5], player tracking is implemented using a particle filter.

In this project, the mean shift algorithm, the continuously adaptive mean shift algorithm, and the particle filter algorithm were selected to be implemented and compared. The mean shift and continuously adaptive mean shift algorithms were chosen because they are similar, so once one is implemented the other is not difficult to implement. The particle filter method was chosen because it is an interesting method and is very different from the other two algorithms.

The rest of this paper is organized as follows. First, a technical discussion describes the details of the three methods. Next, the dataset used for comparison is described, followed by a description of the MATLAB simulation results. Finally, a conclusion is drawn on which of the three methods is best for tracking players. The references used, as well as an appendix listing the MATLAB code, are also included.

Technical Discussion

Mean Shift Algorithm

For the mean shift algorithm [1], the user is first presented with the first frame of the video to select which player to track. The frame is converted from RGB color space to HSV color space because the hue and saturation values in HSV are less susceptible to changes in brightness. A small window around the point the user selects is used to calculate the target hue histogram. The histogram is formed by placing each pixel in a bin based on its value. The histogram is then used to calculate the backprojection of the entire frame. The backprojection is formed by replacing each pixel with the value of the bin it would be placed in within the histogram. As shown in Figure 1, this gives greater values to areas that match the target to be tracked.


Figure 1: Example Backprojection
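
As a rough illustration, the histogram and backprojection steps might be written in MATLAB as follows. This is a minimal sketch, not the attached calcHueBackprojection.m; the frame image frame, the selected window [x0 y0 w0 h0], and the bin count are assumed inputs.

% Minimal sketch of the hue histogram and backprojection step (illustrative,
% not the attached calcHueBackprojection.m). Assumes frame, x0, y0, w0, h0 exist.
nBins = 16;                                    % number of hue bins (tuning parameter)
hsv   = rgb2hsv(frame);                        % convert frame from RGB to HSV
hue   = hsv(:, :, 1);                          % hue channel, values in [0, 1]

win        = hue(y0:y0+h0-1, x0:x0+w0-1);      % user-selected target window
binIdx     = min(floor(win * nBins) + 1, nBins);
targetHist = accumarray(binIdx(:), 1, [nBins 1]);
targetHist = targetHist / sum(targetHist);     % normalized target hue histogram

allIdx   = min(floor(hue * nBins) + 1, nBins); % bin index of every pixel in the frame
backproj = reshape(targetHist(allIdx(:)), size(hue));  % backprojection image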

A window of backprojection values around the target is selected. The following moments

are then computed.

$$M_{00} = \sum_x \sum_y I(x, y)$$

$$M_{10} = \sum_x \sum_y x\,I(x, y)$$

$$M_{01} = \sum_x \sum_y y\,I(x, y)$$

Using the moments, the mean x and mean y location of the window can be calculated.

$$x_c = \frac{M_{10}}{M_{00}}, \qquad y_c = \frac{M_{01}}{M_{00}}$$

The window is then shifted to the calculated location. If the difference between the previous

location and the new location is smaller than a user defined value, the next frame is loaded.

If the difference is large, the new moments and location are calculated until the window

shifts less than the user-defined amount. This is done until the target leaves the video frame or until the end of the video.
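
A minimal sketch of the mean shift iteration for one frame is shown below. It assumes the backprojection image backproj from the earlier sketch and a tracking window described by its top-left corner (x, y), width w, and height h; the threshold and iteration limit are illustrative.

% One frame of mean shift (sketch): shift the window to the centroid of the
% backprojection values until the shift falls below a user-defined threshold.
epsShift = 1;                                   % stop when the shift is under 1 pixel
maxIter  = 20;                                  % safety limit on iterations
for it = 1:maxIter
    roi = backproj(y:y+h-1, x:x+w-1);           % backprojection values in the window
    [cols, rows] = meshgrid(1:w, 1:h);
    M00 = sum(roi(:));                          % zeroth moment
    M10 = sum(sum(cols .* roi));                % first moment in x
    M01 = sum(sum(rows .* roi));                % first moment in y
    xc  = M10 / M00;                            % centroid within the window
    yc  = M01 / M00;
    dx  = round(xc - (w + 1) / 2);              % shift relative to the window center
    dy  = round(yc - (h + 1) / 2);
    x   = x + dx;                               % move the window (bounds checks omitted)
    y   = y + dy;
    if abs(dx) < epsShift && abs(dy) < epsShift
        break;                                  % converged; load the next frame
    end
end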

The mean shift algorithm requires tuning depending on the application. The tracking window size needs to be adjusted to be similar in size to the expected target. Also, the convergence threshold on the shift between locations can be adjusted to trade off speed against the precision of the track.

CAMshift Algorithm


The cam shift algorithm [2] is a modified version of the mean shift algorithm. Instead of using a fixed window size, it adjusts the size of the window throughout the tracking to better track through occlusions and changes in size.

The cam shift algorithm is implemented by starting with the standard mean shift algorithm. The window size is then adjusted as a function of $M_{00}$ after it is calculated. This function needs to be tuned to the particular application, but in my implementation it is set as follows.

$$s = 2\sqrt{M_{00}}$$

The window width is set to 1.5s and the window height is set to 2s. The values that s is

multiplied by can be tuned to the particular application.
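
In code, the resize step simply follows the mean shift iteration for each frame. A sketch under the same assumptions as the mean shift sketch above (note that the scale of M00, and therefore the multipliers, depends on how the backprojection is normalized):

% CAMShift window resize (sketch): recompute the window size from the zeroth
% moment after the mean shift iteration has converged for this frame.
s = 2 * sqrt(M00);            % scale estimate from the zeroth moment
w = max(4, round(1.5 * s));   % window width; the factor 1.5 is a tuning parameter
h = max(4, round(2.0 * s));   % window height; the factor 2.0 is a tuning parameter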

Particle Filter

The particle filter can be used to track objects. It is implemented as described in [5] and [6]. The tracked object is represented by a collection of randomly generated particles. Each particle has a state and a weight. The state is represented by

$$x_t = [x \;\; y \;\; r \;\; g \;\; \dot{x} \;\; \dot{y}]$$

where $x$ and $y$ are the location of the particle in the x and y dimensions, $r$ and $g$ are the red and green chromaticity values of the particle, and $\dot{x}$ and $\dot{y}$ are the velocity of the particle in the x and y dimensions. The chromaticity values are between 0 and 1.

The particles and their weights are updated each frame using observed data. For this application, the observed data is the set of extracted player regions from the frame, each of which is represented by

$$C = (e, p, c)$$

where $e$ is the list of edge points for the region, $p$ is the center of mass of the region, and $c$ is the average color of the region.
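
One possible MATLAB layout for the particle set and the per-frame region observations is sketched below; the field names, placeholder values, and number of particles are illustrative choices, not the attached m-files.

% Particle set (sketch): each row is one particle state [x y r g xdot ydot].
Ns        = 200;                       % number of particles (tuning parameter)
particles = zeros(Ns, 6);              % particle states
weights   = ones(Ns, 1) / Ns;          % particle weights, initially uniform

% One observed player region C = (e, p, c) for the current frame.
region.edges    = zeros(0, 2);         % e: list of [x y] edge points of the region
region.centroid = [0 0];               % p: [x y] center of mass of the region
region.color    = [0 0];               % c: average [r g] chromaticity of the region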

Player region detection is implemented via various common digital image processing tasks.

Using the first frame of the video, a playing field mask is calculated. This is used to filter

out sideline clutter. First, the hue histogram of the entire image is calculated. Since the

image is mostly grass, the histogram will have a peak at green hues. The backprojection

of the entire frame is calculated using the histogram and is thresholded. This is shown in

Figure 2 and represents the grass areas in the image.


Figure 2: Playing Field Mask Thresholded Backprojection

The largest areas of this backprojection are selected by finding the connected regions and

counting the pixels. These are assumed to be the large areas inside the playing field. Those

areas are combined by taking their convex hull. This results in Figure 3, which is the playing field mask.

Figure 3: Playing Field Mask
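
A sketch of the field mask computation is shown below, using Image Processing Toolbox routines; the threshold, minimum area, and the variable firstFrame are illustrative assumptions.

% Playing field mask (sketch): backprojection of the whole-frame hue histogram,
% thresholded, small areas removed, then the convex hull of what remains.
hsv     = rgb2hsv(firstFrame);                       % firstFrame: first frame of the video
hue     = hsv(:, :, 1);
nBins   = 16;
binIdx  = min(floor(hue * nBins) + 1, nBins);
hueHist = accumarray(binIdx(:), 1, [nBins 1]) / numel(hue);
grassBP = reshape(hueHist(binIdx(:)), size(hue));    % peaks on grass-colored pixels

grassThresh = grassBP > 0.5 * max(grassBP(:));       % threshold (tuning parameter)
grassLarge  = bwareaopen(grassThresh, 5000);         % keep only the large grass areas
fieldMask   = bwconvhull(grassLarge);                % convex hull = playing field mask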

The next step is to determine the player regions in each frame. First, the complement of the thresholded backprojection is combined with the playing field mask. This results in Figure 4, which has the players as regions but also has additional clutter such as the playing field markings.

Figure 4: Player Backprojection

To filter out the clutter and separate players that are close together, image erosion is applied to the backprojection. Regions that are too large, small, wide, or tall are also removed. This results in Figure 5, which shows the player regions. The edge points, center of mass, and average color are calculated for each region and passed to the particle filter each frame.

Figure 5: Player Regions
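
A sketch of the player region detection for one frame follows, continuing the variable names from the field mask sketch; the structuring element size and the size and shape limits are illustrative tuning values.

% Player region detection (sketch): complement of the thresholded grass
% backprojection inside the field mask, eroded, then filtered by size and shape.
playerBW = ~grassThresh & fieldMask;                 % non-grass pixels on the field
playerBW = imerode(playerBW, strel('disk', 2));      % remove clutter, separate players

cc    = bwconncomp(playerBW);
stats = regionprops(cc, 'Area', 'BoundingBox', 'Centroid');
keep  = false(cc.NumObjects, 1);
for k = 1:cc.NumObjects
    bb      = stats(k).BoundingBox;                  % [x y width height]
    keep(k) = stats(k).Area > 50 && stats(k).Area < 2000 ...  % size limits (tuned)
           && bb(3) < 60 && bb(4) < 120;             % width and height limits (tuned)
end
playerRegions = stats(keep);          % regions whose features go to the particle filter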

On the first frame, the region to be tracked is selected by the user. The particles are distributed uniformly over the region. For each new frame, the particle states are updated using the previous frame's particles plus noise, and the weights are updated using the player region data.

The x and y positions of the particles are updated using the following equations.

$$x_t = x_{t-1} + \dot{x}_{t-1} + v_x, \qquad v_x \sim N(0, \sigma_x)$$

$$y_t = y_{t-1} + \dot{y}_{t-1} + v_y, \qquad v_y \sim N(0, \sigma_y)$$

The red and green chromaticity values are updated using the following equations.

$$r_t = r_{t-1} + v_r, \qquad v_r \sim N(0, \sigma_r)$$

$$g_t = g_{t-1} + v_g, \qquad v_g \sim N(0, \sigma_g)$$

Finally, the x and y velocity values are updated using the following equations.

$$\dot{x}_t = \gamma \dot{x}_{t-1} + (1 - \gamma)(x_t - x_{t-1}), \qquad \gamma \in [0, 1]$$

$$\dot{y}_t = \gamma \dot{y}_{t-1} + (1 - \gamma)(y_t - y_{t-1})$$
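
The propagation step can be written directly from these equations. A sketch follows, continuing the particle array layout from the earlier sketch; the noise standard deviations and gamma are illustrative tuning values.

% Particle propagation (sketch): rows of particles are [x y r g xdot ydot].
sigma = [2, 2, 0.02, 0.02];                 % noise std devs [sx sy sr sg] (tuned)
gamma = 0.7;                                % velocity smoothing factor in [0, 1]

prev = particles;                           % states from the previous frame
particles(:,1) = prev(:,1) + prev(:,5) + sigma(1) * randn(Ns, 1);   % x position
particles(:,2) = prev(:,2) + prev(:,6) + sigma(2) * randn(Ns, 1);   % y position
particles(:,3) = prev(:,3) + sigma(3) * randn(Ns, 1);               % red chromaticity
particles(:,4) = prev(:,4) + sigma(4) * randn(Ns, 1);               % green chromaticity
particles(:,5) = gamma * prev(:,5) + (1 - gamma) * (particles(:,1) - prev(:,1)); % x velocity
particles(:,6) = gamma * prev(:,6) + (1 - gamma) * (particles(:,2) - prev(:,2)); % y velocity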

After the particles have been updated for the current frame, the weight of each particle is calculated. The weights are calculated using the player region data. The error vector for each particle is computed as follows: $e_x$ and $e_y$ are zero if the particle lies inside a player region; otherwise each is set to the distance to the nearest edge point. $e_r = C_r - r_t$, where $C_r$ is the average red chromaticity of the region and $r_t$ is the red chromaticity value of the particle for the current frame. Similarly, $e_g = C_g - g_t$, where $C_g$ is the average green chromaticity of the region and $g_t$ is the green chromaticity value of the particle for the current frame. $e_{\dot{x}}$ and $e_{\dot{y}}$ are set to zero because the velocities are not updated using any player region data, only the movement of the particles. The error vector is defined as $e_d = [e_x \; e_y \; e_r \; e_g \; e_{\dot{x}} \; e_{\dot{y}}]^T$.

The weight of each particle is calculated using

$$w = \exp\left(-\frac{e_d^T R^{-1} e_d}{2}\right)$$

where $R$ is a covariance matrix used to weight the errors by different amounts if necessary. For example, if the color data is noisy in a particular application, those errors can be weighted less.
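
A sketch of the weight update against a single detected region follows; the covariance values, the inRegion test, and the use of the nearest edge point distance for both e_x and e_y are assumptions made for illustration, not the attached m-files.

% Particle weight update (sketch) against one detected region.
R    = diag([25, 25, 0.01, 0.01, 1, 1]);    % error covariance matrix (tuning parameter)
Rinv = inv(R);
for i = 1:Ns
    ed = zeros(6, 1);                       % error vector [ex ey er eg exdot eydot]
    d  = hypot(region.edges(:,1) - particles(i,1), ...
               region.edges(:,2) - particles(i,2));
    if ~inRegion(i)                         % inRegion: logical flags, assumed computed
        ed(1) = min(d);                     % distance to the nearest edge point
        ed(2) = min(d);
    end
    ed(3) = region.color(1) - particles(i,3);    % red chromaticity error
    ed(4) = region.color(2) - particles(i,4);    % green chromaticity error
    weights(i) = exp(-0.5 * (ed' * Rinv * ed));  % w = exp(-ed' R^-1 ed / 2)
end
weights = weights / sum(weights);           % normalize the weights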

A problem with the particle filter is that after a few iterations most of the particles will have very low weights. This problem can be mostly prevented by resampling the particles after the weights are calculated each frame. This is done by replacing the particles with small weights with copies of particles that have large weights. First, a CDF $C(i)$ is calculated from the weights of the particles. A random starting point $u_1$ is drawn from $U[0, N_s^{-1}]$, where $N_s$ is the number of particles. Then, moving along the CDF using $u_j = u_1 + N_s^{-1}(j - 1)$, the index $i$ is advanced while $u_j > C(i)$ until a particle with a large enough weight is found. The new particle is assigned $x_k^{j*} = x_k^{i}$ and its weight is assigned $w_k^{j*} = N_s^{-1}$.
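
This systematic resampling step is short in MATLAB; a sketch follows (it mirrors the scheme just described, not necessarily the attached resampleSIR.m).

% Systematic resampling (sketch): low-weight particles are replaced with
% copies of high-weight particles by walking along the weight CDF.
C  = cumsum(weights);                       % CDF of the particle weights
u1 = rand / Ns;                             % random start drawn from U[0, 1/Ns]
newParticles = zeros(size(particles));
i = 1;
for j = 1:Ns
    uj = u1 + (j - 1) / Ns;
    while uj > C(i)
        i = i + 1;                          % advance until C(i) exceeds uj
    end
    newParticles(j, :) = particles(i, :);   % copy the selected particle
end
particles = newParticles;
weights   = ones(Ns, 1) / Ns;               % all weights reset to 1/Ns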

Finally, the tracked location of the object is the weighted mean of the particle locations. There is a large amount of tuning of the particle filter for a particular application: the number of particles $N_s$, the particle noise parameters $\sigma_x$, $\sigma_y$, $\sigma_r$, $\sigma_g$, the smoothing factor $\gamma$, and the player region detection method all need to be tuned. Also, some applications with low process noise are not well suited for particle filters, as described below in the results.
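
The state estimate itself is then a one-line computation (sketch, using the weights computed for the current frame).

% Tracked location = weighted mean of the particle positions (sketch).
trackPos = weights' * particles(:, 1:2);    % 1x2 [x y] estimate for this frame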


Results

Dataset

The dataset [7] consisted of six videos of the same clip of a soccer match, each recorded from a different stationary camera viewpoint. The resolution of each camera was 1920 by

1080 pixels. The dataset provided ground truth tracking data for each of the players in the

video. An example frame from one of the cameras is shown in Figure 6.

Figure 6: Sample Frame From Dataset

Results

Each of the methods described was tested by tracking the same player and comparing the results to the truth data. The mean shift algorithm did fairly well, with the track degrading as the player starts to leave the frame. The video of this is available at https://www.youtube.com/watch?v=vcH0O8cc7ho. The cam shift algorithm also performs fairly well, with the track not being lost as much as the player leaves the frame. The video of this is available at https://www.youtube.com/watch?v=k4pXwkEeEU8.

The particle filter did not track the object. A large number of different settings were tried and none appeared to help the problem. The filter appeared to be working for a few frames but collapsed to a single particle after that. Upon further reading, this happens when there is a small amount of process noise in the data, which means the data is not well suited for a particle filter. The paper that was used to implement the particle filter used soccer video with moving cameras, while the dataset I was using had stationary cameras. This is what I believe caused the problem.

To quantify the results, the root mean square error of each track was calculated. The result for the mean shift algorithm was $RMSE_{MS} = 227.8530$. The result for the cam shift algorithm was $RMSE_{CS} = 209.0335$. Figure 7 shows a plot of the tracking data and the truth data.

Figure 7: Tracking Results
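
For reference, a sketch of how such an RMSE can be computed, assuming track and truth are N-by-2 arrays of per-frame [x y] positions and the error is measured as per-frame Euclidean distance:

% RMSE of a track against ground truth (sketch).
err  = track - truth;                       % per-frame [dx dy] position error
rmse = sqrt(mean(sum(err.^2, 2)));          % root mean square Euclidean error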

Conclusion

This project was a comparison of soccer player video tracking algorithms. The mean shift

algorithm, the cam shift algorithm, and the particle filter method were all detailed and

implemented in MATLAB. They were compared using a soccer video tracking dataset. The

mean shift algorithm and the cam shift algorithm both did a reasonably good job at tracking

the player. The cam shift algorithm did a better job when comparing root mean square

errors. This is expected because the cam shift algorithm is a modified version of the mean

shift algorithm. The particle filter method did not work for this dataset. The collection of

particles collapsed to a single location after a few iterations. This is likely due to low process noise, possibly caused by the stationary cameras in this dataset.


References

1. L. W. Kheng, "Mean Shift Tracking", Computer Vision and Pattern Recognition CS4243, National University of Singapore.

2. G. R. Bradski, "Computer Vision Face Tracking For Use in a Perceptual User Interface", Intel Technology Journal, Q2 1998.

3. M. Beetz, S. Gedikli, J. Bandouch, B. Kirchlechner, N. von Hoyningen-Huene, and A. C. Perzylo, "Visually Tracking Football Games Based on TV Broadcasts", in Proceedings of IJCAI, pp. 2066-2071, January 2007.

4. M. Xu, J. Orwell, L. Lowey, and D. Thirde, "Architecture and algorithms for tracking football players with multiple cameras", IEE Proceedings - Vision, Image and Signal Processing, vol. 152, no. 2, pp. 232-241, 8 April 2005.

5. A. Dearden, Y. Demiris, and O. Grau, "Tracking Football Player Movement From a Single Moving Camera Using Particle Filters", in Proceedings of CVMP 2006, pp. 29-37, IET Press, 2006.

6. M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking", IEEE Transactions on Signal Processing, vol. 50, no. 2, 2002.

7. T. D'Orazio, M. Leo, N. Mosca, P. Spagnolo, and P. L. Mazzeo, "A Semi-Automatic System for Ground Truth Generation of Soccer Video Sequences", in Proceedings of the 6th IEEE International Conference on Advanced Video and Signal Based Surveillance, Genoa, Italy, September 2-4, 2009.


Appendix

The following m-files are attached:

calcHueBackprojection.m

CAMShiftTracking.m

meanShiftTracking.m

ParticleFilterTracking.m

resampleSIR.m

SIRParticleFilter.m
