
REALTIME OBJECT-OF-INTEREST TRACKING BY LEARNING COMPOSITE PATCH-BASED TEMPLATES

Yuanlu Xu, Hongfei Zhou, Qing Wang*, Liang Lin
Sun Yat-sen University, Guangzhou, China

2012 International Conference on Image Processing

INTRODUCTION

Objective

To track people under partial occlusions or significant appearance variations, as Fig. 1 illustrates.

Contributions

A novel model maintenance approach for patch-based tracking.
A more concise and effective texture descriptor.

Fig. 1. Difficulties of tracking: partial occlusions (left two), significant appearance variations (right two).

OUR METHODS

Composite Patch-based Templates (CPT) Model

We extract fixed-size image patches from the tracking window and from the four sub-regions above, below, left and right of the tracking window, forming the patch set of the tracking target and the patch set of the background, respectively. For each image patch, different types of features are applied to capture the local statistics:
A. Histogram of oriented gradients (HOG), to capture edges.
B. Center-symmetric local binary patterns (CS-LBP), as illustrated in Fig. 3, to capture texture.
C. Color histogram, to capture flatness.

Fig. 2. Illustration of the CPT model.

Fig. 3. Illustration of the CS-LBP descriptor: the CS-LBP operator compares intensities in center-symmetric directions.
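As a concrete illustration of the CS-LBP operator in Fig. 3, the sketch below compares the four center-symmetric neighbour pairs of each pixel in a grayscale patch and histograms the resulting 4-bit codes. It is a minimal sketch only: the 8-neighbour layout, the comparison threshold, and the 16-bin histogram are assumptions of this example, not parameters taken from the poster.

```python
import numpy as np

def cs_lbp_histogram(patch, threshold=3.0):
    """Minimal CS-LBP sketch for a 2-D grayscale patch (0-255 values):
    compare the four center-symmetric neighbour pairs of every interior
    pixel and histogram the resulting 4-bit codes (16 bins)."""
    p = patch.astype(np.float32)
    # The four center-symmetric pairs of the 8-neighbourhood, expressed
    # as shifted views aligned on the interior (H-2) x (W-2) region.
    pairs = [
        (p[:-2, :-2],  p[2:, 2:]),    # top-left  vs bottom-right
        (p[:-2, 1:-1], p[2:, 1:-1]),  # top       vs bottom
        (p[:-2, 2:],   p[2:, :-2]),   # top-right vs bottom-left
        (p[1:-1, 2:],  p[1:-1, :-2]), # right     vs left
    ]
    code = np.zeros(p[1:-1, 1:-1].shape, dtype=np.int32)
    for bit, (a, b) in enumerate(pairs):
        code |= ((a - b) > threshold).astype(np.int32) << bit
    hist, _ = np.histogram(code, bins=16, range=(0, 16))
    return hist / max(hist.sum(), 1)  # normalized 16-bin descriptor
```

Because only center-symmetric pairs are compared, each pixel yields a 4-bit code instead of an 8-bit one, so the histogram shrinks from 256 to 16 bins; this is what makes CS-LBP the dimension-reduced texture descriptor highlighted in the conclusion.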

The CPT model is initialized by selecting the image patches from the tracking window that maximize the difference from the background.
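The poster does not spell out the selection criterion, so the following is only a plausible sketch: each target patch is scored by its feature distance to the nearest background patch, and the top-k most discriminative patches are kept. The chi-square distance, the helper names, and the value of k are assumptions of this example.

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-8):
    """Chi-square distance between two normalized histograms
    (HOG, CS-LBP, or color) -- an assumed choice of metric."""
    return 0.5 * float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

def select_initial_templates(target_feats, background_feats, k=20):
    """Keep the k target patches farthest from their nearest background
    patch, i.e. the most discriminative ones (illustrative criterion)."""
    scores = [min(chi2_distance(f, g) for g in background_feats)
              for f in target_feats]
    order = np.argsort(scores)[::-1]  # most discriminative first
    return [target_feats[i] for i in order[:k]]
```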

Tracking Algorithm

To infer the tracking target location, as shown in Fig. 4, we independently find the best match of each template (green rectangle) within its surrounding search area (blue rectangle), and then compute the target location from these matches using the offset of each template and its discriminability weight.
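A plausible form of this formulation, assuming a discriminability-weighted vote of the per-template matches, is sketched below; the symbols (match position, template offset, discriminability weight) are introduced here for illustration and need not match the poster's notation.

```latex
% Hedged sketch: template i matches at position x_i, stores an offset
% Delta_i relative to the tracking-window origin, and carries a
% discriminability weight w_i; the target location is the weighted
% average of the back-projected votes.
\[
  \hat{\mathbf{x}} \;=\;
  \frac{\sum_{i} w_i \,\bigl(\mathbf{x}_i - \Delta_i\bigr)}
       {\sum_{i} w_i}
\]
```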

Fig. 4. Finding the best match of each template.

To maintain the CPT model online, we propose a new maintenance algorithm that picks new patches and fuses them into the CPT model, as shown in Fig. 5. By thresholding the matching distance, we obtain the matching templates in two successive frames. Constructing the candidate template set from the new frame and the inferred target location, we obtain two feature sets, one from the inferred tracking window and one from the background, and re-select the most discriminative templates from the fusion of the matching and candidate sets to form the new CPT model.

Fig. 5. Finding the most discriminative patches among the matching templates and the candidate templates to maintain the CPT model online: the matching template set and the candidate template set are fused in an excess model, from which the new CPT model is re-selected.
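A minimal sketch of one maintenance step is given below, under the assumption that discriminability is measured as the distance to the nearest background patch; the threshold, model size, distance, and function names are illustrative, not taken from the poster.

```python
import numpy as np

def maintain_cpt(matched_feats, match_dists, candidate_feats,
                 background_feats, match_thresh=0.25, model_size=20):
    """One online-maintenance step (illustrative sketch):
    1. keep templates whose matching distance is below a threshold,
    2. pool them with candidate patches from the new tracking window,
    3. re-select the patches farthest from the background."""
    # 1. matching templates in two successive frames
    matching = [f for f, d in zip(matched_feats, match_dists)
                if d < match_thresh]
    # 2. "excess" model: fusion of matching and candidate templates
    excess = matching + list(candidate_feats)
    # 3. discriminability = squared distance to the nearest background patch
    def discriminability(f):
        return min(float(np.sum((f - g) ** 2)) for g in background_feats)
    excess.sort(key=discriminability, reverse=True)
    return excess[:model_size]  # the new CPT model
```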

EXPERIMENTS

We collect four test videos to verify our approach: two videos of the human face from the Multiple Instance Learning (MIL) dataset and two surveillance videos from the internet. A number of sampled tracking results are shown in Fig. 6. We compute the average tracking error against manually labeled ground truth of the target location and compare with two state-of-the-art algorithms, MIL and Ensemble Tracking, as shown in Fig. 7.

Fig. 6. Sampled results of our tracking method on targets with severe body variations and large-scale occlusions (sequences: Girl, Face Occlusion, Camera 1, Camera 2).

Fig. 7. Quantitative comparisons.
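The poster does not state which error measure is used; a common choice consistent with an average tracking error against labeled target locations is the mean center-location error, sketched below under that assumption.

```python
import numpy as np

def average_center_error(pred_boxes, gt_boxes):
    """Mean Euclidean distance (pixels) between predicted and ground-truth
    box centers; boxes are (x, y, w, h) per frame. Assumes the poster's
    'average tracking error' refers to a center-location error."""
    pred = np.asarray(pred_boxes, dtype=np.float32)
    gt = np.asarray(gt_boxes, dtype=np.float32)
    pred_c = pred[:, :2] + pred[:, 2:] / 2.0
    gt_c = gt[:, :2] + gt[:, 2:] / 2.0
    return float(np.mean(np.linalg.norm(pred_c - gt_c, axis=1)))
```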

CONCLUSION

• The novel maintenance approach selects effective composite templates from the fusion of the matching templates and the candidate set, and outperforms other state-of-the-art algorithms in tracking targets under various challenges.
• The CS-LBP descriptor is an effective dimension-reduced texture descriptor.


FOR FURTHER INFORMATION

Please contact [email protected], [email protected].
