Chapter 5: Multi-Cue 3D Model-Based Object Tracking
DESCRIPTION
Visual Perception and Robotic Manipulation, Springer Tracts in Advanced Robotics. Chapter 5: Multi-Cue 3D Model-Based Object Tracking. Geoffrey Taylor and Lindsay Kleeman, Intelligent Robotics Research Centre (IRRC), Department of Electrical and Computer Systems Engineering, Monash University.
TRANSCRIPT
Chapter 5
Multi-Cue 3D Model-Based Object Tracking
Geoffrey Taylor
Lindsay Kleeman
Intelligent Robotics Research Centre (IRRC)
Department of Electrical and Computer Systems Engineering
Monash University, Australia
Visual Perception and Robotic Manipulation
Springer Tracts in Advanced Robotics
Taylor and Kleeman, Visual Perception and Robotic Manipulation, Springer Tracts in Advanced Robotics
Contents
• Motivation and Background
• Overview of proposed framework
• Kalman filter
• Colour tracking
• Edge tracking
• Texture tracking
• Experimental results
Introduction
• Research aim:
– Enable a humanoid robot to manipulate a priori unknown objects in an unstructured office or domestic environment.
• Previous results:
– Visual servoing
– Robust 3D stripe scanning
– 3D segmentation, object modelling
Metalman
Why Object Tracking?
• Metalman uses visual servoing to execute manipulations: control signals are calculated from observed relative pose of gripper and object.
• Object tracking allows Metalman to:
– Handle dynamic scenes
– Detect unstable grasps
– Detect motion from accidental collisions
– Compensate for calibration errors in kinematic and camera models
Why Multi-Cue?
• Individual cues only provide robust tracking under limited conditions:
– Edges fail in low contrast and are distracted by texture
– Textures are not always available and are distracted by reflections
– Colour gives only a partial pose
• Fusion of multiple cues provides robust tracking in unpredictable conditions.
Multi-Cue Tracking
• Mainly applied to 2D feature-based tracking.
• Sequential cue tracking:
– Selector (focus of attention) followed by a tracker
– Can be extended to a multi-level selector/tracker framework (Toyama and Hager 1999).
• Cue integration:
– Voting, fuzzy logic (Kragić and Christensen 2001)
– Bayesian fusion, probabilistic models
– ICondensation (Isard and Blake 1998)
Proposed framework
• 3D model-based tracking: models extracted using segmentation of range data from the stripe scanner.
• Colour (selector), edges and texture (trackers) optimally fused in a Kalman filter framework.
Colour + range scan → Textured polygonal models
Kalman filter
• Optimally estimate the object state x_k given the previous state x_{k-1} and new measurements y_k.
• System state comprises the pose and velocity screw:
x_k = [p_k, v_k]^T
• State prediction (constant-velocity dynamics):
p*_k = p_{k-1} + v_{k-1}·Δt,  v*_k = v_{k-1}
• State update:
x_k = x*_k + K_k [ y_k − y*(x*_k) ]
• Need a measurement prediction function y*(x*_k) for each cue.
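The predict/update cycle above can be sketched for a linear measurement model. The state layout, noise covariances and 1-D toy measurement below are illustrative assumptions, not the chapter's actual pose parameterisation:

```python
import numpy as np

def kf_step(x, P, y, F, H, Q, R):
    """One predict/update cycle of a linear Kalman filter."""
    # Predict with constant-velocity dynamics: x*_k = F x_{k-1}
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: x_k = x*_k + K_k (y_k - y*(x*_k)), with y*(x) = H x here
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (y - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Toy 1-D example: state = [pose, velocity], only the pose is observed.
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])        # p* = p + v*dt, v* = v
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)                         # assumed process noise
R = np.array([[0.1]])                        # assumed measurement noise

x, P = np.zeros(2), np.eye(2)
rng = np.random.default_rng(0)
for k in range(1, 50):                       # object moves at 1 unit/frame
    y = np.array([k * dt + rng.normal(0.0, 0.1)])
    x, P = kf_step(x, P, y, F, H, Q, R)
print(x)                                     # estimate near [49, 1]
```

The same structure carries over to the full 6-DOF pose; only the dimensions of F, H and the noise matrices change.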
Measurements
• For each new frame, predict the object pose and project the model onto the image to define a region of interest (ROI):
– only process within the ROI, to eliminate distractions and reduce computational expense
Captured frame, predicted pose & ROI
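Defining the ROI amounts to projecting the model vertices at the predicted pose and taking their image-plane bounding box. A minimal pinhole-camera sketch, where the focal length, principal point and cube model are all assumed values:

```python
import numpy as np

def predict_roi(vertices, R, t, f, c):
    """Project model vertices at the predicted pose (R, t) into the
    image and return their bounding box as the ROI."""
    pts = vertices @ R.T + t                # model frame -> camera frame
    uv = f * pts[:, :2] / pts[:, 2:3] + c   # perspective projection
    lo = np.floor(uv.min(axis=0)).astype(int)
    hi = np.ceil(uv.max(axis=0)).astype(int)
    return lo, hi                           # ROI corners (u0, v0), (u1, v1)

# Assumed example: a unit cube 5 m in front of the camera.
cube = np.array([[x, y, z] for x in (-0.5, 0.5)
                           for y in (-0.5, 0.5)
                           for z in (-0.5, 0.5)])
R = np.eye(3)                               # predicted orientation
t = np.array([0.0, 0.0, 5.0])               # predicted position
lo, hi = predict_roi(cube, R, t, f=500.0, c=np.array([320.0, 240.0]))
print(lo, hi)
```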
Colour Tracking
• Colour filter created from an RGB histogram of the object texture
• Image processing:
– Apply the filter to the ROI
– Calculate the centroid of the largest connected blob
• Measurement prediction:
– Project the centroid of the model vertices at the predicted pose onto the image plane
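The histogram filter and blob-centroid steps can be sketched as follows. The 16-bin quantisation, threshold and synthetic red-on-blue image are assumptions standing in for the chapter's colour filter, and the sketch assumes at least one object-coloured pixel is present:

```python
import numpy as np
from collections import deque

def colour_centroid(img, hist, thresh=0.5):
    """Centroid of the largest connected blob of object-coloured pixels.
    `hist` is a 16x16x16 RGB histogram learned from the object texture;
    bin values above `thresh` mark object colour."""
    bins = (img // 16).reshape(-1, 3)       # quantise to 16 levels/channel
    mask = hist[bins[:, 0], bins[:, 1], bins[:, 2]] > thresh
    mask = mask.reshape(img.shape[:2])
    # Label 4-connected components with BFS and keep the largest.
    labels = np.zeros(mask.shape, dtype=int)
    best, best_size, next_label = 0, 0, 0
    for sy, sx in zip(*np.nonzero(mask)):
        if labels[sy, sx]:
            continue
        next_label += 1
        labels[sy, sx] = next_label
        size, q = 0, deque([(sy, sx)])
        while q:
            y, x = q.popleft()
            size += 1
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = next_label
                    q.append((ny, nx))
        if size > best_size:
            best, best_size = next_label, size
    ys, xs = np.nonzero(labels == best)
    return xs.mean(), ys.mean()

# Synthetic ROI: a red 10x10 blob on a blue background (values assumed).
hist = np.zeros((16, 16, 16))
hist[15, 0, 0] = 1.0                        # histogram peak at pure red
img = np.zeros((40, 40, 3), dtype=np.uint8)
img[:, :, 2] = 255                          # blue background
img[10:20, 10:20] = (255, 0, 0)             # the object blob
cx, cy = colour_centroid(img, hist)
print(cx, cy)
```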
Edge Tracking
• To avoid texture, only consider silhouette edges
• Image processing:
– Extract directional edge pixels (Sobel masks)
– Combine with colour data to extract silhouette edges
– Match edge pixels to projected model edge segments
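Extracting directional edge pixels with 3×3 Sobel masks can be sketched in pure NumPy; the magnitude threshold below is an assumed value:

```python
import numpy as np

def sobel_edges(gray, thresh=100.0):
    """Directional edge pixels via 3x3 Sobel masks (correlation sketch):
    returns pixel coordinates and the gradient direction at each."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T                                # vertical-gradient mask
    H, W = gray.shape
    gx, gy = np.zeros((H, W)), np.zeros((H, W))
    for dy in range(3):                      # accumulate over mask taps
        for dx in range(3):
            patch = gray[dy:H - 2 + dy, dx:W - 2 + dx]
            gx[1:-1, 1:-1] += kx[dy, dx] * patch
            gy[1:-1, 1:-1] += ky[dy, dx] * patch
    mag = np.hypot(gx, gy)                   # gradient magnitude
    theta = np.arctan2(gy, gx)               # gradient direction
    ys, xs = np.nonzero(mag > thresh)
    return xs, ys, theta[ys, xs]

# Vertical step edge: dark left half, bright right half.
gray = np.zeros((10, 10))
gray[:, 5:] = 255.0
xs, ys, th = sobel_edges(gray)
print(sorted(set(int(x) for x in xs)))       # edge pixel columns
```

The per-pixel direction is what later lets edge pixels be matched to model edge segments of compatible orientation.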
Edge Tracking
• Fit a line to the matched points for each segment and extract its angle and mean position
• Measurement prediction:
– Project model vertices to the image plane
– For each model edge, calculate the angle and the distance to the measured mean point
Texture Tracking
• Textures represented as 8×8 pixel templates with high spatial variation of intensity
• Image processing:
– Render the textured object in the predicted pose
– Apply a feature detector (Shi and Tomasi 1994)
– Extract templates and match them to the captured frame by SSD (sum of squared differences)
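SSD template matching can be sketched as a brute-force search; a real tracker would restrict the search to a small window around the predicted feature position, and the random frame below is synthetic test data:

```python
import numpy as np

def ssd_match(frame, template):
    """Find the (x, y) position in the frame that minimises the sum of
    squared differences against the template (brute-force sketch)."""
    th, tw = template.shape
    fh, fw = frame.shape
    best, best_pos = np.inf, (0, 0)
    for y in range(fh - th + 1):
        for x in range(fw - tw + 1):
            d = frame[y:y + th, x:x + tw] - template
            ssd = float(np.sum(d * d))       # sum of squared differences
            if ssd < best:
                best, best_pos = ssd, (x, y)
    return best_pos

# Plant an 8x8 template in a random frame and recover its location.
rng = np.random.default_rng(1)
frame = rng.random((32, 32))
template = frame[12:20, 5:13].copy()         # patch taken at (x=5, y=12)
print(ssd_match(frame, template))
```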
Texture Tracking
• Apply outlier rejection:
– Consistent motion vectors
– Invertible matching
• Calculate the 3D position of texture features on the surface of the model
• Measurement prediction:
– Project the 3D surface features at the current pose onto the image plane
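The invertible-matching test can be sketched as a mutual-consistency check on a matrix of match scores; the small distance matrix below is made-up illustrative data:

```python
import numpy as np

def mutual_matches(dist):
    """Invertible matching: keep pair (i, j) only when template i's best
    frame feature is j AND frame feature j's best template is i.
    `dist[i, j]` holds the SSD between template i and frame feature j."""
    fwd = dist.argmin(axis=1)   # best frame feature for each template
    bwd = dist.argmin(axis=0)   # best template for each frame feature
    return [(i, int(j)) for i, j in enumerate(fwd) if bwd[j] == i]

# Three templates, two frame features: templates 0 and 1 both prefer
# feature 0, so only the mutually best pairs survive.
dist = np.array([[0.10, 0.90],
                 [0.15, 0.90],
                 [0.80, 0.20]])
print(mutual_matches(dist))     # [(0, 0), (2, 1)]
```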
Experimental Results
• Three tracking scenarios:
– Poor visual conditions
– Occluding obstacles
– Rotation about an axis of symmetry
• Off-line processing of captured video sequences:
– Direct comparison of tracking performance using edges only, texture only, and multimodal fusion.
– Actual processing rate is about 15 frames/s
Poor Visual Conditions
Colour, texture and edge tracking
Poor Visual Conditions
Texture only / Edges only
Occlusions
Colour, texture and edge tracking
Occlusions
Texture only / Edges only
Occlusions
Tracking precision
Symmetrical Objects
Colour, texture and edge tracking
Symmetrical Objects
Object orientation
Conclusions
• Fusion of multimodal visual features overcomes weaknesses in individual cues, and provides robust tracking where single cue tracking fails.
• The proposed framework is extensible; additional modalities can be fused provided a suitable measurement model is devised.
Open Issues
• Include additional modalities:
– optical flow (motion)
– depth from stereo
• Calculate measurement errors as part of feature extraction, to populate the measurement covariance matrix.
• Modulate size of ROI to reflect current state covariance, so ROI automatically increases as visual conditions degrade, and decreases under good conditions to increase processing speed.