Autonomous Segmentation of Near-Symmetric Objects through Vision
and Robotic Nudging
Wai Ho Li and Lindsay Kleeman
Abstract— This paper details a robust and accurate segmentation method for near-symmetric objects placed on a table of known geometry. Here we define visual segmentation as the problem of isolating all portions of an image that belong to a physically coherent object. The term Near-Symmetric is used as our method can segment objects with some non-symmetric parts, such as a coffee mug and its handle. Using bilateral symmetry, this problem is solved autonomously and robustly through the aid of physical action provided by a robot manipulator. Our proposed approach does not require prior models of target objects and assumes no previously collected background statistics. Instead, our approach relies on a precise robotic nudge to generate the object motion necessary for segmentation. Experiments performed on ten objects show that our model-free approach can autonomously and accurately segment a variety of objects. These experiments also indicate that our segmentation approach is not adversely affected when operating in cluttered scenes and can segment multi-coloured and transparent objects in a robust manner.
I. INTRODUCTION
A. Motivation
Object segmentation is an important sensory process for
robots using vision. It allows a robot to build accurate
internal models of its surroundings by isolating regions of
images that correspond to objects in the real world. Multi-
scale computer vision object recognition methods, such as SIFT [1] and Haar boosted cascades [2], can imbue a robot
with the ability to robustly detect and classify modeled
objects. However, training such schemes to recognize objects
requires many hand-labeled and well segmented images of
positive and negative examples. Precious human resources
are required to obtain this kind of training data. For very large
object sets, the amount of time and effort required can be
prohibitive. The autonomous process described in this paper
attempts to address this problem by obtaining accurate object
segmentations robustly without the need for human aid or
intervention.
Another motivating factor is to provide a segmentation
process that is highly autonomous. By limiting target objects to those with bilateral symmetry, a model-free approach
can be applied, which allows us to abandon the a priori
assumptions and offline training demanded by other seg-
mentation approaches. For example, our method can operate
on transparent objects as we do not assume any temporal
constancy or colour uniformity in an object’s appearance.
Wai Ho Li and Lindsay Kleeman are with the Department of Electrical and Computer Systems Engineering, Monash University, Clayton Campus, Melbourne, Australia. [email protected], [email protected]
This work is intended for use in domestic robotics ap-
plications as there are many objects with symmetry in
most households. However, the sensing parts of the process,
namely locating points of interest using symmetry triangu-
lation and object segmentation by folded frame difference,
are applicable to other robotic tasks. The overall aim is to
provide robots with general methods of dealing with common
household objects such as cups, bottles and cans, without the
burden of mandatory offline training for every new object.
As our approach assumes nothing about the appearance of
the robot manipulator, the actuation of target objects can be
provided by any manipulator capable of performing a robotic nudge as described in Section III, including a human hand.
B. Contributions
Segmentation using robotic action has been explored in
the past, most recently by Fitzpatrick et al. [3], [4]. Their
approach uses a poking action, which sweeps the end effector
across the workspace. The presence of an object is detected
when visual motion increases due to contact with the moving
effector. Their segmentation method uses frames just before
and after this point of contact. No planning is performed
prior to robotic action. Assuming the target object is not
deformed by the poking action, objects of any shape can be
segmented.

The main contributions of our work are as follows. Firstly,
by limiting our scope to near-symmetric objects, locations of
interest are found prior to the application of robotic action.
This is achieved by clustering the intersections between
stereo triangulated symmetry axes and a table plane. By
avoiding dense stereo approaches, we can also localize
transparent objects with bilateral symmetry. Details of our
stereo triangulation approach, including a comparison of
results against dense stereo, can be found in [5].
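As a geometric aside (the full method and its evaluation are in [5]), a symmetry line detected in one image back-projects, together with that camera's centre, to a plane in space; the triangulated 3D axis of symmetry is the intersection of the left and right cameras' planes. The sketch below illustrates this under the assumption that each plane is already expressed in a common world frame as a unit normal n and offset d; it is not the exact implementation of [5].

```python
import numpy as np

def triangulate_axis(n1, d1, n2, d2):
    """Intersect two back-projected planes to obtain a 3D symmetry axis.

    Each camera's symmetry line defines a plane n . x = d in space.
    Returns (p0, direction): a point on the axis and its unit direction,
    or None when the two planes are (nearly) parallel.
    """
    direction = np.cross(n1, n2)
    norm = np.linalg.norm(direction)
    if norm < 1e-9:
        return None                     # planes nearly parallel: no axis
    direction /= norm
    # Point satisfying both plane equations, closest to the origin
    A = np.stack([n1, n2, direction])
    b = np.array([d1, d2, 0.0])
    p0 = np.linalg.solve(A, b)
    return p0, direction
```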
Limited by the use of elastic actuators in their manipulator,
the approach of Fitzpatrick et al. applies an imprecise
poking action to objects. In contrast, our method uses a
short, accurate robotic nudge, applied only to locations of interest. In experiments, we show that our method does not
tip over tall objects such as empty bottles and does not
damage fragile objects such as ceramic mugs. This level
of gentleness in object manipulation is not demonstrated in
the work of Fitzpatrick et al. While neither method addresses
the problem of end effector obstacle avoidance, the small
workspace footprint of the robotic nudge should make path
planning easier.
Finally, while appearing similar at a glance, our approach
to visual segmentation is very different to that of Fitzpatrick
et al. Their approach uses video frames during robotic
action, around the time of contact between the end effector
and object. Due to their motion-based initiation, bad frame
timing with respect to the time of contact can produce
poor segmentations. This is highlighted in Figure 11 of
[4], which shows that their end effector can be included in
the segmentation results. This problem never occurs in our
approach as we use video frames that are temporally further
apart, captured before and after robot action. Also, near-
empty segmentations can be returned by their approach. Our
approach will only perform segmentation if object motion is
detected during the nudge and the subsequent stereo tracking
remains convergent. The satisfaction of these conditions
prevents poor segmentations due to insufficient or unexpected
object motion.
C. System Overview
Fig. 1. Robot System Components
The components of our robot system are shown in Fig-
ure 1. The stereo cameras consist of two Videre Design 1394
CMOS cameras verged together at around 15 degrees from
parallel. These cameras capture 640x480 images at 25Hz
during nearly all parts of the segmentation process, except
for high resolution 1280x960 snapshots of the scene taken
before and after the robotic nudge. The PUMA 260 robot arm
has six degrees of freedom. The calibration grid is used to
perform camera-arm calibration and to estimate the geometry of the table plane. Details of both are described in Section II-A.
Our autonomous segmentation process is summarized in
Figure 2. The robot begins by surveying the scene for
interesting locations to explore. The details of this process
are described in Section II. Once an interesting location
has been found, the robot manipulator nudges the target
location. If motion is detected during the nudge, stereo
tracking is initiated to keep track of the moving object.
Section III describes the robotic nudge and stereo tracking.

Fig. 2. Autonomous Segmentation Flowchart

If tracking converges, the object is segmented using the method
described in Section IV.
Bilateral symmetry is used as the primary visual feature
throughout all stages of the process. Our Fast Bilateral
Symmetry Detection [6] scheme, herein referred to simply
as symmetry detection, is used to locate lines of symmetry
within input images. The noise robustness of our detection
method is crucial when performing segmentation in visually
cluttered environments.
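To give a feel for what symmetry detection computes, the sketch below casts it as a Hough-style vote: every pair of edge pixels votes for the mirror line between them, binned in polar (r, theta) space. This brute-force version is illustrative only; the detector in [6] achieves real-time rates through angle constraints and other optimisations, and the bin counts, peak selection and the lack of non-maximum suppression here are arbitrary simplifications.

```python
import numpy as np

def symmetry_lines(edge_pts, img_diag, n_r=180, n_theta=180, top_k=3):
    """Hough-style vote for reflective symmetry lines.

    edge_pts: (N, 2) float array of edge pixel coordinates (x, y).
    Each pair of edge points votes for the line that mirrors one onto
    the other: it passes through their midpoint and its normal is the
    vector joining them. Lines use the polar form
    r = x*cos(theta) + y*sin(theta).
    """
    acc = np.zeros((n_r, n_theta))
    for i in range(len(edge_pts) - 1):
        d = edge_pts[i + 1:] - edge_pts[i]            # vectors to later points
        mid = (edge_pts[i + 1:] + edge_pts[i]) / 2.0  # pair midpoints
        theta = np.arctan2(d[:, 1], d[:, 0]) % np.pi  # normal angle in [0, pi)
        r = mid[:, 0] * np.cos(theta) + mid[:, 1] * np.sin(theta)
        t_bin = np.minimum((theta / np.pi * n_theta).astype(int), n_theta - 1)
        r_bin = np.clip(((r / img_diag + 1) / 2 * n_r).astype(int), 0, n_r - 1)
        np.add.at(acc, (r_bin, t_bin), 1)             # accumulate votes
    flat = np.argsort(acc, axis=None)[::-1][:top_k]   # strongest bins
    r_bins, t_bins = np.unravel_index(flat, acc.shape)
    return [((rb / n_r * 2 - 1) * img_diag, tb / n_theta * np.pi)
            for rb, tb in zip(r_bins, t_bins)]        # (r, theta) per line
```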
II. FINDING INTERESTING LOCATIONS
A. System Calibration
This section details the methods used to calibrate our
robotic system. Firstly, the stereo cameras are calibrated
using the MATLAB camera calibration toolbox [7]. The
intrinsic parameters of each camera in the stereo pair are
obtained individually. This is followed by a calibration to
obtain the extrinsics of the stereo system. After this, the
camera system can be used to triangulate locations in 3D
space. The geometry of the table is found by fitting a plane
to the checkerboard corners, the locations of which are found
using stereo triangulation.

Prior to calibration, a grid of points is drawn on the table
using the robot manipulator with a special pen attachment.
Using this grid, the corners of the calibration grid are placed
at known locations in the manipulator's coordinate frame. The
same corners are triangulated using the stereo cameras to find
their coordinates relative to the camera frame of reference.
The arm-camera calibration is performed by solving the
Absolute Orientation problem, finding the transformation to
map the corner points from one frame of reference to another.
We use the PCA-based solution proposed by [8].
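For illustration, minimal sketches of the two calibration steps are given below: a least-squares plane fit to the triangulated checkerboard corners, and the rigid alignment of corresponding point sets following the SVD formulation of Arun et al. [8]. The function names, the SVD plane fit, and the reflection guard are our own illustrative choices, not the exact implementation.

```python
import numpy as np

def fit_table_plane(pts):
    """Least-squares plane through triangulated checkerboard corners.

    pts: (N, 3) array. Returns a unit normal n and offset off such
    that n . x = off for points x on the table plane.
    """
    centroid = pts.mean(axis=0)
    _, _, Vt = np.linalg.svd(pts - centroid)
    n = Vt[-1]                                # direction of least variance
    return n, float(np.dot(n, centroid))

def absolute_orientation(P_arm, P_cam):
    """Least-squares rigid transform mapping camera-frame points onto
    arm-frame points, after Arun et al. [8].

    P_arm, P_cam: (N, 3) arrays of corresponding 3D points.
    Returns R (3x3) and t (3,) with P_arm ~= R @ P_cam + t.
    """
    mu_a, mu_c = P_arm.mean(axis=0), P_cam.mean(axis=0)
    H = (P_cam - mu_c).T @ (P_arm - mu_a)     # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # reflection guard
    R = Vt.T @ D @ U.T
    t = mu_a - R @ mu_c
    return R, t
```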
B. Clustering Symmetry Intersects
Symmetry lines are detected in the left and right video
frames to provide data to a clustering algorithm. All pos-
sible pairings of symmetry lines between the left and right
images are triangulated to form 3D axes of symmetry using
the method described in our previous paper [5]. In our
experiments, three symmetry lines are detected for each
image, resulting in a maximum of nine triangulated axes of symmetry. Symmetry axes that lie outside the robot
manipulator’s workspace are left out of the clustering data.
Axes that are more than 10 degrees from being perpendicular
to the table plane are also rejected.
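A minimal sketch of this validity test and of the axis-table intersection is given below, assuming the axis is supplied as a point p0 with unit direction d, and the table as a unit normal n with offset plane_off (so that n . x = plane_off on the table).

```python
import numpy as np

def axis_table_intersection(p0, d, n, plane_off, max_tilt_deg=10.0):
    """Validate a 3D symmetry axis and intersect it with the table.

    Returns the intersection point on the table plane, or None when
    the axis is more than max_tilt_deg from the table normal (the
    rejection test of Section II-B).
    """
    cos_tilt = abs(np.dot(d, n))
    if cos_tilt < np.cos(np.radians(max_tilt_deg)):
        return None                       # axis too far from perpendicular
    t = (plane_off - np.dot(n, p0)) / np.dot(n, d)
    return p0 + t * d                     # 3D point on the table plane
```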
The intersections between valid symmetry axes and the
table plane are collected over 25 pairs of video frames and
recorded as 2D locations on the table plane. This collection
of locations are grouped into clusters using a modified QT
algorithm [9]. The QT clustering algorithm does not require
any prior knowledge of the number of actual clusters. This is
important as we are not making any assumptions concerning
the number of objects on the table. The QT algorithm also
provides a way to limit the diameter of clusters, reducing the likelihood of clusters that include symmetry lines from
multiple objects. The original QT algorithm was modified
with the addition of a cluster quality threshold. The quality
threshold is used to ignore clusters formed by symmetry
axes that occur in less than half of all collected frames. The
geometric centroids of the clusters provide the robot with a
list of interesting locations to explore. A nudge is performed
on the valid location closest to the camera. A location is
deemed invalid if the robot gripper will collide with other
locations of interest during a nudge.
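The sketch below renders the modified QT procedure over the collected table-plane locations. It is a simplified, unoptimised version: max_diameter is the cluster diameter cap, and min_count stands in for our quality threshold of half the number of collected frame pairs (e.g. 13 when 25 frame pairs are collected).

```python
import numpy as np

def qt_cluster(points, max_diameter, min_count):
    """Modified QT clustering [9] of axis-table intersection points.

    points: 2D table-plane locations collected over the video frames.
    Clusters supported by fewer than min_count points are ignored
    (our added quality threshold). Returns cluster centroids,
    largest cluster first.
    """
    pts = [np.asarray(p, dtype=float) for p in points]
    centroids = []
    while pts:
        best = []
        for seed in range(len(pts)):
            cand = [pts[seed]]
            rest = [p for k, p in enumerate(pts) if k != seed]
            while rest:
                # QT rule: add the point that keeps the candidate tightest
                diams = [max(np.linalg.norm(a - b)
                             for a in cand + [p] for b in cand + [p])
                         for p in rest]
                j = int(np.argmin(diams))
                if diams[j] > max_diameter:
                    break
                cand.append(rest.pop(j))
            if len(cand) > len(best):
                best = cand
        if len(best) < min_count:
            break            # no remaining cluster meets the quality threshold
        centroids.append(np.mean(best, axis=0))
        kept_ids = {id(p) for p in best}
        pts = [p for p in pts if id(p) not in kept_ids]
    return centroids
```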
III. OBJECT MANIPULATION: THE ROBOTIC NUDGE

A. Motion Control
Fig. 3. Side view of Robotic Nudge
The motion of the robot gripper during a nudge is shown in
Figures 3 and 4. The L-shaped protrusion is made of sponge
to provide damping during contact, which is especially
important when nudging brittle objects such as ceramic cups.
The L-shaped sponge also allows the application of pushing
force at a height which is very close to the table plane.

Fig. 4. Top-Down view of Robotic Nudge

By applying force to the bottom of objects, nudged objects are
less likely to tip over. An example nudge captured by the
right camera is shown in Figure 5.
Fig. 5. Consecutive video frames from the right camera during a nudge. The frames are taken from the P1-P2-P1 portion of the nudge motion
The nudge begins by lowering the gripper from P0 to P1.
The height of the gripper at location P0 is well above the
height of the tallest expected object. Dmax is set to ensure that
the L-shaped sponge will not hit the largest expected object
during its descent. After arriving at P1, the gripper travels
towards P2. Dmin is selected such that the gripper will make
contact with the smallest expected object before arriving at
P2. The nudge motion ensures that the gripper never visually
crosses the object’s symmetry line when viewed from the
right camera. The gripper then retreats back through P1 to
P0. In early tests, the gripper was moved directly from P2
back to P0. This knocked over tapered objects such as the
blue cup in Figure 8 due to friction between the soft sponge
and the object’s outer surface.
In the overhead view of Figure 4, the nudge vector is
perpendicular to the line formed between the focal point of the right camera and the target object’s symmetry line,
assuming one is present at the location to explore. This
choice of motion will nudge the object horizontally across the
camera’s image. This reduces the scale change of the target
object and also lowers the probability of glancing contact,
improving the quality of segmentation. After a location of
interest has been found, P0, P1 and P2 are determined
based on the camera’s location. Using inverse kinematics,
linearly-interpolated encoder values are generated at run time
to move the gripper smoothly between these three points.
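One plausible parameterisation of this geometry is sketched below. The height parameters and the side from which the gripper approaches are illustrative assumptions; only the construction of the approach direction (in the table plane, perpendicular to the camera-to-target ray) follows the text.

```python
import numpy as np

def nudge_waypoints(target, cam_pos, table_n, d_max, d_min, h_low, h_high):
    """Waypoints P0, P1 and P2 for the robotic nudge (Figures 3 and 4).

    target: 3D point where the symmetry axis meets the table.
    cam_pos: focal point of the right camera. table_n: unit table normal.
    The approach direction lies in the table plane, perpendicular to the
    camera-to-target ray, so the nudge moves the object across the image
    rather than in depth. d_max clears the largest expected object during
    the descent; d_min guarantees contact with the smallest one.
    """
    target = np.asarray(target, dtype=float)
    ray = target - np.asarray(cam_pos, dtype=float)
    ray -= np.dot(ray, table_n) * table_n      # project ray onto the table plane
    ray /= np.linalg.norm(ray)
    u = np.cross(table_n, ray)                 # in-plane, perpendicular to ray
    u /= np.linalg.norm(u)                     # push side: arbitrary choice here
    p1 = target - d_max * u + h_low * table_n  # low approach point
    p0 = target - d_max * u + h_high * table_n # high point directly above P1
    p2 = target - d_min * u + h_low * table_n  # end of push, close to the axis
    return p0, p1, p2
```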
B. Obtaining Visual Feedback by Stereo Tracking
When the gripper begins its descent at P 0, the right
camera image is monitored for motion. Motion detection
is performed at a coarse resolution using 8x8 pixel cells.
Cells with two times the motion of the global average are
labeled as moving. This block motion algorithm is the same
as the one used in our symmetry tracking paper [10]. To
prevent ego motion of the robot manipulator from being interpreted as object motion, the object’s symmetry line is
used as a visual barrier. As the robot gripper never crosses
the symmetry line, motion detection is only performed on
the green region in Figure 3.
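A sketch of this coarse motion test follows, with the symmetry-line barrier supplied as a per-pixel mask. The normalisation details are illustrative choices rather than the exact implementation of [10].

```python
import numpy as np

def detect_motion_cells(prev, curr, object_side_mask, cell=8, gain=2.0):
    """Coarse block motion detection with a symmetry-line barrier.

    prev, curr: greyscale frames (H, W). object_side_mask: boolean
    (H, W) mask that is True only on the object's side of the symmetry
    line (the green region in Figure 3), so the gripper's ego motion
    is never counted. A cell is labeled moving if its motion energy
    exceeds gain times the global average.
    """
    diff = np.abs(curr.astype(np.float32) - prev.astype(np.float32))
    diff[~object_side_mask] = 0.0              # apply the visual barrier
    h, w = diff.shape
    hc, wc = h // cell, w // cell
    blocks = diff[:hc * cell, :wc * cell].reshape(hc, cell, wc, cell)
    cell_motion = blocks.mean(axis=(1, 3))     # per-cell motion energy
    global_motion = cell_motion.mean()         # average over all cells
    return cell_motion > gain * max(global_motion, 1e-6)
```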
Once motion has been detected, the robot begins stereo
tracking on the target object’s symmetry line. A Kalman
filter is used to track the polar parameters of the target
symmetry line. The tracking system is identical to the one
described in our previous work on real time monocular
symmetry tracking [10]. The monocular tracker is replicated
twice to perform stereo tracking. Visual segmentation will
only take place if tracking converges to a symmetry axis
roughly perpendicular to the table plane. This prevents poor segmentation caused by insufficient object motion.
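As an illustration only, a constant-velocity Kalman filter over the polar line parameters (r, theta) might be structured as below, one instance per camera. The noise levels and the covariance-trace convergence test are placeholder assumptions; the actual tracker is described in [10].

```python
import numpy as np

class SymmetryLineKF:
    """Constant-velocity Kalman filter over the polar parameters
    (r, theta) of a symmetry line. A minimal sketch of the tracker
    structure; tuning and details are in [10]."""

    def __init__(self, r0, th0, q=1e-3, rn=1e-2):
        self.x = np.array([r0, th0, 0.0, 0.0])   # [r, theta, r_dot, theta_dot]
        self.P = np.eye(4)
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = 1.0        # constant velocity, dt = 1 frame
        self.H = np.zeros((2, 4))
        self.H[0, 0] = self.H[1, 1] = 1.0        # we measure (r, theta) only
        self.Q = q * np.eye(4)
        self.R = rn * np.eye(2)

    def step(self, z):
        # Predict
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Update with a measured (r, theta) from the symmetry detector
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x

    def converged(self, tol=0.05):
        # One plausible convergence test: small uncertainty in (r, theta)
        return np.trace(self.P[:2, :2]) < tol
```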
Videos of the robotic nudge and stereo tracking can be
downloaded from:
www.ecse.monash.edu.au/centres/irrc/li_iro08.php
IV. OBJECT SEGMENTATION
A. Object Segmentation by Folded Frame Difference
(a) Before Nudge (b) After Nudge (c) Frame Difference
(d) Folded Difference (e) Symmetry Filled (f) Segmentation Result
Fig. 6. Segmentation by Folded Frame Difference. Note that the Folded Difference and Symmetry Filled images are rotated such that the object’s symmetry line is vertical
Segmentation is performed using the object motion gen-
erated by the robotic nudge. Figure 6 illustrates the major
steps of segmentation. Figure 6(a) and Figure 6(b) are
images taken by the right camera before and after the nudge.
The absolute frame difference between the before and after
images is shown in Figure 6(c). The green lines are the
object’s symmetry lines before and after the nudge, found
using our symmetry detector. Note that thresholding the raw
frame difference will produce a mask that includes many
background pixels. The mask will also have a large gap
at the center of low-texture objects, such as the clear cup
in the example. Using the object’s symmetry lines, we can
overcome these problems.
Figure 6(d) shows the folded frame difference of the
object. This image is produced by removing the frame dif-
ference pixels between the two symmetry lines. This process
folds the frame difference image together as if it were printed on
a piece of paper, pressing the creases at the symmetry lines
together. Changes in the orientation of the object’s symmetry
lines before and after the nudge are removed prior to folding.
This folding process removes the excess area of the motion
mask autonomously and reduces the size of the motion gap
at the center of the moved object’s frame difference.
After folding, a small gap still remains in the frame
difference. This can be seen in Figure 6(d) as a dark vertical
section inside the cup-like shape. To remedy this, we again
exploit object symmetry to our advantage. Recall that the folding step merges the symmetry lines of the object in the
before and after frames. Using this newly merged symmetry
line as a mirror, we search for motion on either side of it.
A pixel is considered moving if its frame difference value
is above a threshold. The folded difference image is rotated
so that the merged symmetry line is vertical. The widest
pair of moving pixels bisected by the object’s symmetry line
are recorded for each row of the image. This produces a
symmetric contour of the object. By filling the interior of this
contour, we produce the image in Figure 6(e). Note that this
filling approach retains the non-symmetric parts of objects.
The final segmentation result in Figure 6(f) is obtained by
thresholding the symmetry filled difference image.
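The sketch below condenses the widest-pair search and filling into code. It assumes the folded difference image has already been rotated so that the merged symmetry line lies on the vertical column c, and it omits the final thresholding of the filled image; it is a simplified rendering rather than the exact implementation.

```python
import numpy as np

def fill_symmetric_contour(folded_diff, c, thresh):
    """Fill the symmetric motion contour of a folded frame difference.

    folded_diff: absolute frame difference (H, W) after folding,
    rotated so the merged symmetry line is the vertical column c.
    For each row, the widest pair of moving pixels bisected by
    column c is found and the span between them is filled, as in
    Figure 6(e).
    """
    h, w = folded_diff.shape
    moving = folded_diff > thresh
    mask = np.zeros((h, w), dtype=bool)
    k = min(c, w - c - 1)                 # how far the mirror reach extends
    for y in range(h):
        left = moving[y, c - 1::-1][:k]   # pixels left of c, nearest first
        right = moving[y, c + 1:c + 1 + k]
        pairs = np.flatnonzero(left & right)
        if pairs.size:
            d = pairs[-1] + 1             # widest symmetric pair on this row
            mask[y, c - d:c + d + 1] = True
    return mask
```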
V. SEGMENTATION EXPERIMENT RESULTS
Segmentation experiments were carried out on ten ob-
jects of different size, shape, texture and colour. Trans-
parent, multi-coloured and partially symmetric objects are
also included. Objects are set against different backgrounds,
ranging from plain to cluttered. All segmentation results are
obtained autonomously by our robot without any human aid.
Objects in our scenes cast many shadows due to four bright
fluorescent ceiling light sources illuminating the table. For
safety reasons, a flashing warning beacon is active during
robot motion, periodically casting red light on the table when
the robot manipulator is powered.

Due to space constraints, some segmentation results have
been left out. They can be found at:
www.ecse.monash.edu.au/centres/irrc/li_iro08.php
A. Cups without Handles
The white cup in Figure 7 poses a challenge to our
segmentation process not because of its imperfect symmetry,
but because of its shape. Due to its narrow stem-like bottom
half, the nudge produces very small shifts in the object’s
location, creating a narrow and weak contour of pixels in
the frame difference.

Fig. 7. Partially Symmetric White Cup

Fig. 8. Blue Cup

As seen from the resulting segmen-
tation, our algorithm is able to handle this kind of object.
Figure 8 shows detection results for a symmetric cup against
background clutter. Lastly, Figure 9 illustrates the robustness
and accuracy of our segmentation process. The robot was
able to autonomously obtain a very clean segmentation of a transparent cup against background clutter.
B. Mugs with Handles
The mugs in Figures 10 and 11 test the robustness of
our segmentation approach for objects with non-symmetric
parts. The handles of both mugs are successfully included
in the segmentation results. The multi-coloured mug in
Figure 11 was chosen for additional reasons. Firstly, it is a
multi-coloured object with intensities similar to background
shadows.

Fig. 9. Transparent Cup in Clutter

Fig. 10. White Mug

Fig. 11. Multi-coloured Mug

This is the reason why the segmentation result is
quite noisy around the bottom of the mug. Secondly, it is a
brittle object made of ceramic. Its successful manipulation
provides evidence of the gentle nature of our robotic nudge.
C. Drink Bottles
Fig. 12. Small Water-Filled Bottle
The water-filled bottle in Figure 12 is used to test the
strength and accuracy of the robotic nudge. Due to its
small size and weight, the nudge must be accurate and
firm to produce enough object motion for segmentation. The segmentation result shows that the nudge can actuate small
and dense objects.
The remaining test objects are empty plastic drink bottles.
They are lightweight and have high centers of gravity, making
them very easy to tip over. During the nudge, their symmetry lines
tend to wobble, which provides noisy measurements to the
symmetry trackers. As such, these objects test the robustness
of stereo tracking and the robotic nudge. Figure 13 shows
a successful segmentation of a textured bottle against a
plain background. Figure 14 is a similar experiment repeated
Fig. 13. Textured Bottle
Fig. 14. Textured Bottle in Clutter
against background clutter. Finally, Figure 15 contains two
segmentation results for a transparent bottle. Note the accu-
rate segmentation obtained for the transparent bottle, which
produces a very weak motion signature when nudged.
VI. CONCLUSION
Our segmentation approach performs robustly and accu-
rately on near-symmetric objects in cluttered environments. By using the robotic nudge, the entire segmentation process
is carried out autonomously. Multi-coloured and transpar-
ent objects, as well as objects with non-symmetric parts,
are handled in a robust manner. We have shown that our
approach can segment objects of varying visual appearance
autonomously, shifting the burden of training data collection
from the user to the robot.
End effector obstacle avoidance and path planning, espe-
cially in situations where non-symmetric objects are present
in the nudge path, are left to future work. As our symmetry
detection method uses edge pixels as input, our segmentation
approach is visually orthogonal to those that use pixel
information, such as colour and image gradient. In situations where the target object is non-symmetric, approaches relying
on other features can be applied synergetically.
Our objection to stereo optical flow and graph cuts is their
reliance on object surface information, which is completely
unreliable for transparent and reflective objects. However,
if the opaqueness of an object has been confirmed, these
approaches can be used with our robotic nudge. As the
geometry of our table plane is known, a stereo approach to
segmentation can further improve segmentation by removing
the object shadow which is present in some of the results.
Fig. 15. Transparent Bottle
VII. ACKNOWLEDGMENTS
Thanks go to Steve Armstrong for his help with repairing
the PUMA 260 manipulator and the anonymous reviewers
for their insightful comments.
REFERENCES
[1] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” IJCV, vol. 60, no. 2, pp. 91–110, November 2004.
[2] P. Viola and M. J. Jones, “Rapid object detection using a boosted cascade of simple features,” in IEEE CVPR, 2001.
[3] P. Fitzpatrick, “First contact: an active vision approach to segmentation,” in Proceedings of Intelligent Robots and Systems (IROS 2003), vol. 3. IEEE, October 2003, pp. 2161–2166.
[4] P. Fitzpatrick and G. Metta, “Grounding vision through experimental manipulation,” in Philosophical Transactions of the Royal Society: Mathematical, Physical, and Engineering Sciences, 2003, pp. 2165–2185.
[5] W. H. Li and L. Kleeman, “Fast stereo triangulation using symmetry,” in Australasian Conference on Robotics and Automation, 2006.
[6] W. H. Li, A. M. Zhang, and L. Kleeman, “Fast global reflectional symmetry detection for robotic grasping and visual tracking,” in Australasian Conference on Robotics and Automation, 2005.
[7] J.-Y. Bouguet, “Camera calibration toolbox for Matlab,” Online, July 2006, http://www.vision.caltech.edu/bouguetj/calib_doc/.
[8] K. S. Arun, T. S. Huang, and S. D. Blostein, “Least-squares fitting of two 3-D point sets,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 9, pp. 698–700, 1987.
[9] L. J. Heyer, S. Kruglyak, and S. Yooseph, “Exploring expression data: Identification and analysis of coexpressed genes,” Genome Research, vol. 9, pp. 1106–1115, 1999.
[10] W. H. Li and L. Kleeman, “Real time object tracking using reflectional symmetry and motion,” in IEEE/RSJ Conference on Intelligent Robots and Systems, 2006, pp. 2798–2803.