
Leaving Flatland: Efficient Real-Time Three-Dimensional Perception and Motion Planning

Radu Bogdan Rusu
Artificial Intelligence Center, SRI International, 333 Ravenswood Avenue, Menlo Park, California 94025
Intelligent Autonomous Systems, Technische Universität München, Boltzmannstrasse 3, 85748 Garching b. München, Germany

Aravind Sundaresan and Benoit Morisset
Artificial Intelligence Center, SRI International, 333 Ravenswood Avenue, Menlo Park, California 94025

Kris Hauser
Computer Science Department, Stanford University, Stanford, California 94305

Motilal Agrawal
Artificial Intelligence Center, SRI International, 333 Ravenswood Avenue, Menlo Park, California 94025

Jean-Claude Latombe
Computer Science Department, Stanford University, Stanford, California 94305

Journal of Field Robotics 26(10), 841–862 (2009) © 2009 Wiley Periodicals, Inc. Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/rob.20313


Michael Beetz
Intelligent Autonomous Systems, Technische Universität München, Boltzmannstrasse 3, 85748 Garching b. München, Germany

Received 5 January 2009; accepted 21 July 2009

In this article we present the complete details of the architecture and implementation of Leaving Flatland, an exploratory project that attempts to surmount the challenges of closing the loop between autonomous perception and action on challenging terrain. The proposed system includes comprehensive localization, mapping, path planning, and visualization techniques for a mobile robot to operate autonomously in complex three-dimensional (3D) indoor and outdoor environments. In doing so we integrate robust visual odometry localization techniques with real-time 3D mapping methods from stereo data to obtain consistent global models annotated with semantic labels. These models are used by a multiregion motion planner that adapts existing two-dimensional planning techniques to operate in 3D terrain. All the system components are evaluated on a variety of real-world data sets, and their computational performance is shown to be favorable for high-speed autonomous navigation. © 2009 Wiley Periodicals, Inc.

1. INTRODUCTION

Imagine mobile robots that can autonomously navigate in the same environments at the same speeds and with the same reliability and flexibility as humans, without requiring any prior preparation. Such robots are necessary for many tasks that require long-term navigation, such as environment exploration, rescue operations, performance of scientific experiments, and object delivery in an unknown environment. The ultimate goal of the Leaving Flatland project is to realize the necessary mapping and navigation methods for this task. To this end, the project studies the key challenges in closing the loop on autonomous perception and action in challenging terrain.

To demonstrate the capability of our system, we based our application scenario on the RHex mobile robot (Saranli, Buehler, & Koditschek, 2001), a highly capable six-legged robot developed by Boston Dynamics and illustrated in Figure 1. RHex can run at high speed on flat ground, crawl over rocky terrain, and climb stairs. It is equipped with an inertial measurement unit (IMU) and a stereo camera, which we use to keep track of its position and build a three-dimensional (3D) model of the environment that is continuously updated as the robot moves. Periodic model updates are sent to a multiregion motion planner, which generates 3D trajectories to be executed by the robot. These trajectories consist of a sequence of primitive gait commands sent to the robot's onboard controller. The planner is typically asked to reach a nearby goal within the robot's visual range, usually chosen by either a human operator or some supervisory behavior. Long-range goals are achieved by frequent replanning. To increase the mobility of the robot and help the planner select the most appropriate gait for the terrain the robot is to navigate, we annotate the 3D model with semantic labels. The motion planner uses the inferred terrain labels to decompose the terrain locally into two-dimensional (2D) regions and selects fast 2D planning techniques within each region.

Figure 1. The RHex mobile robot.


Each individual component of our system was tested on data sets acquired in both outdoor and indoor environments. Our experiments show that the 3D mapping module can operate in real time, averaging about 250 ms per map update. We note that it is not necessary to update the map unless there is a significant change in the camera's view. The motion planner can be queried approximately once or twice a second, depending on the environment's complexity.

The remainder of this paper is organized as follows. Section 2 describes similar research initiatives as well as related work on particular dimensions of our project. Section 3 describes the system architecture. The visual odometer that is integrated with the IMU is described in Section 4. In Section 5 we present our 3D mapping pipeline in detail and provide insight into its implementation performance. Section 6 describes the motion planner used to control the robot, and in Section 7 we present a direct application of our system for the purpose of assisted teleoperation. We conclude and list directions for future work in Section 8.

2. RELATED WORK

Typical state-of-the-art mapping and navigation techniques use 2D discrete representations of the world, such as occupancy maps (Konolige et al., 2006; Thrun et al., 2006; Vincent et al., 2008), to plan robot motion. Occupancy maps define obstacles in a crude binary sense (see Figure 2), which overly simplifies the complex interactions between the robot and the environment. For instance, obstacles can be context-specific, such as a ledge that is an obstacle when approached from a lower level but is the ground while the robot is on it. These shortcomings can be addressed by systems using elevation maps (Simeon & Dacre-Wright, 1993), which work well for presurveyed natural terrain. But both these representations are inherently 2D, and they cannot represent environments with multiple overlapping levels, such as multilevel tunnel systems or multistory buildings.

One special category of maps is the hybrid 2D–3D map, such as those of Iocchi and Pellegrini (2007), Mozos, Triebel, Jensfelt, Rottmann, and Burgard (2007), Schroter, Weber, Beetz, and Radig (2004), and Vasudevan, Gachter, Nguyen, and Siegwart (2007), where 2D laser sensors are used to create a map for navigation and additional semantics are acquired through vision. For example, in Mozos et al. (2007) places are semantically labeled as corridors, rooms, and doorways. A similar decomposition of maps into regions and gateways is presented in Schroter et al. (2004). The advantage of these hybrid representations is that the computational costs are kept down by using well-studied 2D SLAM (simultaneous localization and mapping) techniques for localization and pose estimation, whereas the problem of place labeling is solved through feature descriptors and machine learning. However, by reducing the dimensionality of the mapping to two dimensions, the world geometry is lost. Also, the label classes need to be learned a priori through supervised learning, and it is therefore unclear whether these representations scale well.

Figure 2. Classical occupancy maps in mobile robotics. Top: Centibots (Vincent et al., 2008); left: Grand Challenge (Thrun et al., 2006); right: Learning Applied to Ground Robots (Konolige et al., 2006).

Semantic mapping interpretations in three dimensions are used in Nuechter, Surmann, and Hertzberg (2003) and Vandapel, Huber, Kapuria, and Hebert (2004): either generic architectural elements (floor, ceiling, walls, doors) identified from the relationships between geometric features (parallel, orthogonal, above, under, equal height) are taken into account (Nuechter et al., 2003), or segmentation methods based on 3D point statistics (eigenvalue decomposition) are applied (Vandapel et al., 2004). The classes obtained from outdoor LADAR data in Vandapel et al. (2004) are clutter (for grass and trees), linear (for wires and branches), and surface (for ground terrain and rocks). However, these mapping interpretations can be applied only to data coming from high-accuracy laser sensors and are unsuitable for mapping with stereo cameras, for example, where the levels of noise are much higher and the available computational time is lower.

Other research (Biber, Andreasson, Duckett, & Schilling, 2004; Diebel & Thrun, 2006; Liu, Emery, Chakrabarti, Burgard, & Thrun, 2001; Weingarten, Gruener, & Siegwart, 2003) has addressed 3D mapping and reconstruction for mobile robots that use laser range finders or stereo cameras. These approaches lack any additional semantics, however, and have not been demonstrated to work in real time with a fully autonomous robot system. More recent works (Munoz, Vandapel, & Hebert, 2008; Posner, Cummins, & Newman, 2008) use probabilistic methods such as Markov random fields for 3D scene labeling of sensed laser data, but their results are not yet suitable for fast real-time mapping and require a priori manual segmentation and learning of the desired output classes.

Search-based (Simeon & Dacre-Wright, 1993) and numerical optimization (Howard, Knepper, & Kelly, 2008) techniques have been used for mobile robot path planning in 2D elevation maps, with movement costs defined to penalize highly sloped and uneven terrain. Search has also been applied to footstep planning for biped robots in uneven terrain (Chestnutt, Kuffner, Nishiwaki, & Kagami, 2003). In fully 3D environments, path planning has been addressed with a configuration space sampling approach (Hauser, Bretl, Latombe, & Wilcox, 2006), but this is prohibitively expensive for real-time use.

3. SYSTEM OVERVIEW

The system architecture is presented in Figure 3. The main modules, namely the 3D mapping module and the motion planner, are highlighted together with their inputs and outputs. Our system does not include a high-level task planner, and we assume that the goal pose is either known a priori, computed externally, or simply given by the user as a target.

As shown in Figure 1, the only sensing devices attached to the RHex mobile robot are a stereo camera and an IMU. The information provided by these sensors is fed to the visual odometry (VO) module, which estimates a global camera pose by tracking characteristic features between consecutive frames (Agrawal & Konolige, 2006). The IMU is used to correct bad pose estimates computed solely from the stereo disparity in situations in which the images are blurred due to excessive camera motion (see Section 4 for details). Because the relationship between the camera coordinate frame and the robot coordinate frame is known and fixed, the global camera pose is used to compute a global robot pose.

The 3D mapping module described in Section 5 uses the stereo data and the estimated global pose to compute a polygonal model of the scene. Additionally, it makes use of geometric heuristic rules to annotate individual polygons, as well as patches of polygons, with semantic class labels (Rusu, Sundaresan, Morisset, Agrawal, & Beetz, 2008).

The polygonal map, semantic labels, and a given target goal pose constitute the inputs to the motion planner module, which is described in Section 6. Our motion planner uses a new decomposition technique, which breaks down the 3D workspace into 2D regions, in order to achieve real-time online replanning. This enables the use of established algorithms for mobile robot planning, while at the same time permitting specialized planning techniques for different terrain types. One straightforward advantage of our approach is that we can use simpler, fast methods for flat ground and reserve more complex techniques for uneven terrain. The output of the motion planner is a sequence of gaits and waypoints to be sent to the robot. We use gaits that have been designed specifically for RHex, including high-speed running on flat and uneven terrain (Saranli et al., 2001) and stair-climbing gaits (Moore & Buehler, 2001).

Figure 3. The general architecture of Leaving Flatland.

4. VISUAL ODOMETER

The accuracy of the global map depends directly on the precision with which we evaluate the pose of the robot with respect to its previous location in the world. This evaluation needs to be performed online, as robustly as possible, while the robot is moving. Another way to formulate the problem is to determine the 3D robot pose increment (i.e., the 3D transformation) with respect to its former location, such that the overlap between the current and former camera views has minimal error in a distance metric sense. This process is called registration, and its accuracy determines the quality of the final model. Instead of employing a classical registration framework (e.g., iterative closest point), our solution uses a visual odometer.

As input, the VO takes a series of monocular and disparity images and computes the camera motion (3D rotation and 3D translation) between those views. This motion is estimated by tracking characteristic features from one image to the other (Agrawal & Konolige, 2006). In general the system is flexible and works reasonably well with any type of 2D features. In our case, we use the CenSurE (Center Surround Extrema) features proposed in Konolige, Agrawal, and Sola (2007), which gave slightly better results on our data sets. The motion is then scored using the pixel reprojection errors in both cameras, and its inliers are evaluated using a disparity space homography (Konolige, Agrawal, & Iocchi, 2005). The hypothesis with the best score (maximum number of inliers) is used as the starting point for a nonlinear optimization routine.

Given the pose of the robot in an arbitrary global coordinate system, each point cloud Pt computed in the robot's camera frame can be registered to the fixed global coordinate system. Figure 4 illustrates the VO registration process for two individual local point clouds obtained from different viewpoints.
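As a concrete illustration of this registration step, the sketch below maps a camera-frame point into the global frame using a VO pose. For brevity it applies a yaw-only rotation, whereas the real system applies a full 3D rotation; the pose layout and function name are assumptions, not the authors' API.

```python
import math

def register_point(p, pose):
    """Map a camera-frame point into the global frame using a VO pose.

    pose = (tx, ty, tz, yaw); a yaw-only rotation is used here for brevity,
    while the actual system applies a full 3D rotation and translation.
    """
    tx, ty, tz, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    x, y, z = p
    # rotate about the vertical axis, then translate into the global frame
    return (c * x - s * y + tx, s * x + c * y + ty, z + tz)
```

Applying this to every point of Pt registers the whole cloud to the fixed global coordinate system.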

Because the VO algorithm computes new camera poses at about 10 Hz, the variation between two consecutive poses is typically small, and it would therefore be highly inefficient to update the 3D model at the same frequency. We therefore update the model only if the change in the camera pose since the last update is greater than a threshold (we use 0.1 m for the translation and 0.1 rad for the rotation). Figure 5 shows results obtained on two different data sets (dumpster and stairs) after aligning approximately 3,000 and 2,000 stereo frames, respectively.

Figure 4. Point cloud registration using VO: the two images on the left show data from two frames approximately 2 s apart. Both frames have been registered to a global coordinate frame using the visual odometer module, and the result of the registration is shown in the images on the right.
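The update gating just described can be sketched as follows; the pose tuple layout and function name are hypothetical, and the rotation change is approximated here by the largest per-axis angle difference.

```python
import math

TRANS_THRESH = 0.1  # metres, as in the text
ROT_THRESH = 0.1    # radians, as in the text

def should_update_map(last_pose, cur_pose):
    """Return True if the pose change since the last map update exceeds
    either threshold. Poses are (x, y, z, roll, pitch, yaw) tuples."""
    dx = cur_pose[0] - last_pose[0]
    dy = cur_pose[1] - last_pose[1]
    dz = cur_pose[2] - last_pose[2]
    translation = math.sqrt(dx * dx + dy * dy + dz * dz)
    rotation = max(abs(cur_pose[i] - last_pose[i]) for i in (3, 4, 5))
    return translation > TRANS_THRESH or rotation > ROT_THRESH
```

With these thresholds, the roughly 10 Hz pose stream triggers far fewer model updates when the robot moves slowly.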

The accuracy of the global map depends on the quality of the VO pose estimate. Owing to excessive camera motions, and in particular to rapid rotational accelerations of the robot, some images can appear blurred (see the top-left-hand part of Figure 6). In addition, some areas of the environment do not contain enough features for the VO tracker. If the system skips too many frames, the global pose estimate will be erroneous. If several VO failures occur consecutively, the system could lose track of the robot motion and the integrity of the 3D model could be compromised.

Figure 5. Globally aligned PCD for two data sets (top: bird's-eye view; bottom: close-up view). (a) Stairs (≈2,000 frames) and (b) dumpster (≈3,000 frames).

These effects can be attenuated by integrating an IMU and using it to fill in when VO fails to estimate a pose. In this version of our work, we did not integrate the translational information given by the IMU; we simply used the angle data to update the last pose computed by visual odometry. Figure 6 presents the corrections in roll (bottom left) and pitch (bottom right) before and after the IMU integration with VO. Even if some error in pitch persists, the drift in z is reduced by approximately 65% (top right). In the sequence in Figure 6, the robot navigates a ramp, and the initial and final positions of the robot are on level ground; the initial and final values of the z coordinate, pitch, and roll should therefore be zero.

Figure 6. Global pose corrections by integrating an IMU into the VO system (top right and bottom row) for a blurred frame (top left).
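A minimal sketch of this fallback logic, under the assumption that poses are (x, y, z, roll, pitch, yaw) tuples; the names and the VO-failure flag are illustrative, not the system's actual interface.

```python
def fuse_pose(last_vo_pose, imu_angles, vo_ok, vo_pose=None):
    """Keep the VO pose when tracking succeeded; otherwise hold the last
    known position and substitute the IMU orientation (the IMU's
    translational information is ignored, as in the text)."""
    if vo_ok:
        return vo_pose
    x, y, z = last_vo_pose[:3]       # hold the last known position
    roll, pitch, yaw = imu_angles    # substitute the IMU orientation
    return (x, y, z, roll, pitch, yaw)
```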

5. 3D MAPPING

The objective of the 3D mapping module is to build and update a model of the scene in a fixed global coordinate frame, which can then be used for planning a 3D path. The global coordinate frame can be chosen arbitrarily and is fixed for the duration of each sequence. The mapping module uses the image-disparity data obtained from the stereo sensor as input. We employ the VO module described in the preceding section to keep track of the pose of the robot. The pose with respect to the fixed global coordinate frame can be used to register the point cloud data (PCD) to that frame. The registered data are passed through a mapping pipeline that builds and updates a 3D model of the scene. We introduce the following notation in order to explain the role of each submodule in the 3D mapping pipeline.

• P(t) = {x1(t), x2(t), ...} is the registered PCD set, i.e., the set of 3D points in the global coordinate frame acquired at time t. Each xi is a 3D point with coordinates (xi, yi, zi)′ in the global coordinate frame and an optional color vector (ri, gi, bi)′.

• ML(t), MG(t), and MF(t) denote the local, global, and final models at time t.

The 3D mapping pipeline consists of the following steps at a time instant t.

1. Decompose the PCD P(t) using a dynamic octree (Section 5.1).
2. Compute the local model ML(t) from the decomposed P(t) (Section 5.2).


Figure 7. 3D mapping pipeline.

3. Merge the local model ML(t) with the global model from the previous frame, MG(t−1), and compute the updated and refined global model MG(t) (Section 5.3).
4. Add semantic labels to compute the final model MF(t) (Section 5.4).

The processing pipeline is presented in Figure 7. The output of the module is a 3D polygonal model annotated with semantic labels. Each of the steps listed above is described in detail in the following sections.
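The four steps above can be sketched as a per-frame update loop. The helper functions below are trivial stand-ins so the skeleton runs; they are assumptions standing in for the submodules of Sections 5.1-5.4, not the authors' implementation.

```python
# Trivial stand-ins for the real submodules of Sections 5.1-5.4.
def octree_decompose(points):               # Section 5.1
    return {0: points}

def fit_local_polygons(cells):              # Section 5.2
    return list(cells.values())

def merge_and_refine(local, global_model):  # Section 5.3
    return global_model + local

def label_semantics(model):                 # Section 5.4
    return [("unknown", poly) for poly in model]

def mapping_step(points_t, global_model):
    """One pipeline update at time t: steps 1-4 above."""
    cells = octree_decompose(points_t)                    # step 1
    local_model = fit_local_polygons(cells)               # step 2
    merged = merge_and_refine(local_model, global_model)  # step 3
    return label_semantics(merged)                        # step 4
```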

5.1. Spatial Decomposition

The PCD obtained from the stereo sensor at each frame, P(t), can contain hundreds of thousands of points. As the robot moves, it acquires points from regions that overlap the existing model as well as from previously unexplored regions. From a real-time processing point of view, we are faced with two challenges. First, we need to efficiently handle and represent redundant data, as storing or processing the accumulated set of points becomes unmanageable after only a few data frames. Second, we need to be able to process and update only the relevant parts of the model. The second aspect becomes important as the size of our model grows and it becomes impractical to update the model as a whole. The choice of data representation is therefore critical to the efficiency of the entire system. Furthermore, point-based representations are unsuitable for 3D motion planners, which need to perform many geometric queries on the model, such as distance and collision checks.


Our solution is to represent the PCD using a set of convex polygons that approximates the real data as accurately as possible. At any given time, we refer to the existing model or data as the global model or data. The new data obtained at each time frame are referred to as the local data (PCD) and the corresponding model as the local model. Our global polygonal model is updated whenever we obtain new (local) data, at a frequency compatible with the speed of robot motion. As mentioned earlier, we are interested in the ability to process or update only the relevant part of the global model, i.e., the region that overlaps the local data. To enable this, we perform an octree decomposition of the volume of interest into cells. The size of a cell can be set based on a number of factors determined by the application, such as the required accuracy of the model and the expected terrain; it is usually on the order of a few centimeters to tens of centimeters. To facilitate the merging of local and global models, all octree decompositions are performed so that the cells are perfectly aligned, by snapping them to a common grid. This ensures that there exists a one-to-one correspondence between the cells in the local model (ML(t)) and the cells of the global model (MG) wherever there is an overlap.

To obtain a local octree, we perform a spatial decomposition of the local PCD in each frame (P(t)). The bounds of the local volume of interest are determined by the bounds of P(t), as illustrated in Figure 8. The size of the cell at the leaf of the octree is fixed (lSIZE), and the points that belong to each cell are represented by a suitable model as described in the following section. The model for each cell is computed using points from only that cell, and because the cells are nonoverlapping, it is possible to perform this operation in parallel if desired. For the first acquired frame, the bounds of the global octree are the same as the bounds of the local octree. Subsequently, we grow the global octree dynamically to accommodate the point data from later frames.

The data points we obtain are typically contiguous and occupy only a fraction of the volume of interest (which is a cube), and thus the octree representation is an efficient way to both represent and access the data.
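The snapping of cells to a common grid can be made concrete by indexing leaf cells relative to a shared origin, so that local and global octrees agree on cell boundaries wherever they overlap. The function name and the 0.1-m cell size below are assumptions.

```python
import math

def cell_key(point, cell_size=0.1):
    """Integer index of the leaf cell containing a 3D point.

    Because indices are computed against a common origin rather than each
    frame's own bounds, overlapping local and global cells receive the same
    key, giving the one-to-one cell correspondence described above."""
    return tuple(math.floor(c / cell_size) for c in point)
```

Two points falling in the same 10-cm cell map to the same key regardless of which frame produced them.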

5.2. Polygonal Modeling

The main goal of the modeling step is to represent the large number of points in a manner that is both easy for the motion planner to use and efficient in terms of storage and speed. The PCD are therefore converted into a local polygonal model, reducing the size of the model by several orders of magnitude, from several hundred thousand points to a few hundred polygons. As described in Section 5.1, we assign each point to a cell in the local octree. For each occupied cell, we compute a set of convex polygons to represent the points contained within, so that the scene is approximated by a set of local planar patches. This constitutes our local model, ML(t). We choose convex polygons over other models not only because searching for complicated nonlinear shape models (e.g., cylinders, spheres) in a point cloud is a costly process, but also because such shapes would not add much semantic information in our applications. The planar polygon representation is relatively inexpensive to compute and store and easy to manipulate.

Figure 8. Spatial decomposition using fixed-size octrees for two local PCD sets. Only the occupied cells are drawn.

The individual steps that lead to the creation of the local polygonal model ML(t) are listed in Algorithm 1. Note that because steps 5–12 are computed independently for each cell, a parallel scheme that boosts the overall computation efficiency of the model by using multiple CPU cores (Soares, Menier, Raffin, & Roch, 2007), when available, can easily be implemented. We used Nmin = 5 and Npoly = 1 in our experiments.

Algorithm 1 Compute a local polygonal model for a given point cloud view

Require: a point cloud dataset P(t)
1: set Npoly, the number of polygons per cell
2: set Nmin, the minimum number of points per cell
3: build a local octree representation OL for P(t)
4: for every occupied cell i in OL do
5:   let Ci be the set of all points in cell i
6:   if |Ci| > Nmin then
7:     search for a maximum of Npoly planes in Ci
8:     for each plane found, j = 1:Npoly do
9:       project all inliers Ci,j onto the jth plane
10:      construct a 2D polygonal hull that approximates Ci,j and add it to ML(t)
11:    end for
12:  end if
13: end for

The search for planar models is performed using a RMSAC (Randomized M-Estimator SAmple Consensus) estimation method, which adds a Td,d test, as proposed in Chum and Matas (2002), to an MSAC estimator (Torr & Zisserman, 2000). For each plane, the inliers are projected onto the plane, and then a convex polygon that contains the projections is computed. We keep track of the support of the polygon, i.e., the number of inliers. The use of convex polygons is justified by the fact that, locally, they are good approximations to the underlying regular geometric structures present in the scene, and they provide an advantage over triangulation methods or concave hulls (alpha shapes) because they are fast to compute, geometrically stable, and free of thresholds or parameters. Figure 9 presents the polygonal fitting results for a given scene in the stairs sequence, and Figure 10 presents the resultant polygonal models for the two data sets shown in Figure 5.
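To make the plane-search step concrete, here is a hedged re-implementation of the per-cell fitting using plain RANSAC in place of the paper's RMSAC estimator with the Td,d test; the convex-hull construction (line 10 of Algorithm 1) is omitted. The 2-cm inlier distance and all names are assumptions.

```python
import random

def fit_plane_ransac(points, iters=100, inlier_dist=0.02, seed=0):
    """Best plane n.x + d = 0 for a cell's points: (normal, d, inliers)."""
    rng = random.Random(seed)
    best_n, best_d, best_inliers = None, 0.0, []
    for _ in range(iters):
        a, b, c = rng.sample(points, 3)
        # plane normal from the cross product of two edge vectors
        u = [b[i] - a[i] for i in range(3)]
        v = [c[i] - a[i] for i in range(3)]
        n = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
        norm = sum(x * x for x in n) ** 0.5
        if norm < 1e-12:        # degenerate (collinear) sample, skip
            continue
        n = tuple(x / norm for x in n)
        d = -sum(n[i] * a[i] for i in range(3))
        inliers = [p for p in points
                   if abs(sum(n[i] * p[i] for i in range(3)) + d) < inlier_dist]
        if len(inliers) > len(best_inliers):
            best_n, best_d, best_inliers = n, d, inliers
    return best_n, best_d, best_inliers
```

The inliers of the winning plane would then be projected onto it and enclosed in a 2D convex hull, as in Algorithm 1.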

5.3. Merging and Refinement

The local model ML(t) is merged with the global model MG(t) at the local cell level. Depending on the chosen local octree cell size, one polygon per cell (Npoly = 1) may be sufficient. For each pair of overlapping octree cells from the local and global models, we compare the respective polygons based on their relative orientation and distance. The "distance" between polygons is determined by computing the integral of the distance of a point along the circumference of the smaller polygon from the plane of the bigger polygon, normalized by the circumference of the smaller polygon. The orientation between the polygons is the angle between their normals. Two polygons are judged to be similar if both the "distance" and the angle between the polygons are less than their respective thresholds; the distance threshold is 2 cm, and the angle threshold is 15 deg. If the polygons are similar, we construct a new polygon that is the "average" of the two polygons weighted by their inlier support: the best plane fit is computed using the vertices of both polygons weighted by the number of inliers in each, the points are projected onto this new plane, and a convex polygon is fitted to the projections. If the polygons are dissimilar, we replace the model with the polygon that has the larger support. In the general case in which Npoly > 1, we merge the polygons such that the number of polygons in the cell is capped at Npoly.

Figure 9. Point cloud model simplification using polygonal fitting. From left to right: overlapping aligned point cloud views, combined PCD and polygonal model, and the resultant polygonal model.

Figure 10. Resultant polygonal models for two data sets: stairs (left, ≈2,000 frames) and dumpster (right, ≈3,000 frames). Top: bird's-eye view; bottom: close-up view.
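The polygon-similarity test of Section 5.3 can be sketched as below. The boundary integral is approximated by averaging the distances of the smaller polygon's vertices from the larger polygon's plane (a simplification of the circumference-weighted integral), and the helper names are assumptions; the 15-deg and 2-cm thresholds are from the text.

```python
import math

ANGLE_THRESH = math.radians(15.0)  # from the text
DIST_THRESH = 0.02                 # 2 cm, from the text

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def polygons_similar(small_verts, small_normal, big_normal, big_d):
    """True if the smaller polygon lies close to the bigger polygon's
    plane (n.x + d = 0) and their unit normals are nearly parallel."""
    cos_angle = min(1.0, abs(_dot(small_normal, big_normal)))
    if math.acos(cos_angle) > ANGLE_THRESH:
        return False
    # vertex-averaged distance approximating the boundary integral
    mean_dist = (sum(abs(_dot(big_normal, v) + big_d) for v in small_verts)
                 / len(small_verts))
    return mean_dist < DIST_THRESH
```

Similar polygons would then be replaced by their inlier-weighted "average"; dissimilar ones keep the polygon with the larger support.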

To perform the refinements, our algorithm considers only the cells in the global model that have been updated as a result of merging in polygons from the local model. We perform simple outlier removal by erasing isolated polygons (i.e., polygons that have no neighbors). Because a single location can be observed several times during the mapping process, some surfaces may generate duplicate polygons in adjacent cells (see Figure 11). We identify these duplicates by searching neighboring cells for similar polygons that are very close to each other.

The premise for reducing the polygonal model is based on the geometric relationships between adjacent polygons, namely the angle and the "distance" between them. If neighboring polygons are similar, the polygon with smaller support is removed. If, within a 2 × 2 × 2 cluster of eight cells, we find that all the polygons are similar to each other, we replace them with a single "super" polygon in order to simplify the representation. Our algorithm is similar to the one presented in Weingarten et al. (2003), with the only difference being that our polygonal simplification is in fact a weighted reestimation of the original polygons and thus more precise. In addition, we perform lone-polygon filtering in the same step. It is possible to extend this procedure recursively, which is a future extension of our project. Experimentally, we observed that this procedure reduced the global number of polygons in M_G by a factor of 2.5 on average.
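The similarity test and weighted re-estimation described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the angle and distance thresholds and the (normal, centroid, inlier-count) polygon summary are assumptions.

```python
import math

# Assumed thresholds; the paper does not publish the exact values it uses.
ANGLE_THRESH_DEG = 10.0   # maximum angle between the two plane normals
DIST_THRESH = 0.02        # maximum centroid-to-plane distance (m)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def similar(n1, c1, n2, c2):
    """True if two planar polygons (unit normal n, centroid c) are close
    enough in orientation and offset to be considered duplicates."""
    if abs(dot(n1, n2)) < math.cos(math.radians(ANGLE_THRESH_DEG)):
        return False                              # normals disagree
    offset = [a - b for a, b in zip(c1, c2)]
    return abs(dot(n2, offset)) < DIST_THRESH     # planes nearly coincide

def merge_plane(n1, c1, w1, n2, c2, w2):
    """Weighted re-estimation of the merged plane; the weights are the
    polygons' inlier counts, as in the text."""
    n = [w1 * a + w2 * b for a, b in zip(n1, n2)]
    norm = math.sqrt(dot(n, n))
    n = [a / norm for a in n]
    c = [(w1 * a + w2 * b) / (w1 + w2) for a, b in zip(c1, c2)]
    return n, c
```

After such a merge, the supporting points would be re-projected onto the merged plane and a new convex polygon fitted, as in the single-cell case.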

5.4. Semantic Labeling

Because our motion planner can manipulate several types of motion primitives, each of them adapted to a specific terrain type, it is important to help the planner select the most appropriate primitive based on the type of terrain it has to navigate. To enable this, we annotate the polygonal model M_G^(t) with semantic class labels based on heuristic rules, thus creating the final model M_F^(t).

The labeling is performed using geometric heuristic rules that examine the local terrain geometry at and around each polygon. We use the following five labels:

• Level: Polygons that form an angle less than 15 deg with the horizontal plane.

Figure 11. Reducing the model complexity through merging and duplicate removal. (a) The polygonal reduction factor for four different data sets. (b) Model before refinement. (c) Model after refinement.

• Ground: Polygons that form an angle less than 15 deg with the ground plane and are connected to the ground plane.

• Vertical: Polygons that form an angle greater than 75 deg with the horizontal plane.

• Stairs: Groups of adjacent polygons that form steps, i.e., belong to planes that are parallel or perpendicular to the horizontal plane.

• Unknown: Everything else.

After refining and labeling the global model, we annotate ground polygons. To begin, we label polygons that are within 15 deg of and 2 cm from the ground plane (the plane described by z = 0) as ground. A wave is then propagated from this set of "ground" polygons. At each iteration, the neighbors of the seed "ground" polygons are considered, and all neighboring polygons whose angle is less than 15 deg from the ground plane and that are within 2 cm of the seed polygon are also labeled as ground. With this ground propagation, a surface with a smooth inclination (up or down) is still considered flat ground. RHex can traverse small slopes and unevenness with ease, so ignoring minor irregularities helps speed up the motion planner. Figure 12 presents the semantic annotations for the scene in Figure 9, and Figure 13 presents the overall semantic models for the two data sets shown in Figure 5.
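The ground-propagation wave above can be sketched as a breadth-first flood fill. The reduction of each polygon to a (tilt, height) pair and the adjacency lists are simplifying assumptions for illustration; the 15 deg and 2 cm thresholds are from the text.

```python
from collections import deque

ANGLE_MAX_DEG = 15.0   # from the paper: within 15 deg of the ground plane
HEIGHT_STEP = 0.02     # from the paper: within 2 cm of the seed polygon

def propagate_ground(polys, neighbors):
    """Grow the 'ground' label outward from seed polygons with a BFS wave.

    polys: list of (tilt_deg, z) pairs -- each polygon reduced to its angle
    with the horizontal plane and its centroid height (a simplification).
    neighbors: adjacency lists, neighbors[i] = indices adjacent to polygon i.
    Returns the set of polygon indices labeled ground.
    """
    # Seeds: near-horizontal polygons within 2 cm of the z = 0 plane.
    ground = {i for i, (tilt, z) in enumerate(polys)
              if tilt < ANGLE_MAX_DEG and abs(z) < HEIGHT_STEP}
    queue = deque(ground)
    while queue:
        i = queue.popleft()
        for j in neighbors[i]:
            if j in ground:
                continue
            tilt, z = polys[j]
            # A neighbor must be near-horizontal and within 2 cm of the
            # seed, so a smooth incline keeps accumulating the label.
            if tilt < ANGLE_MAX_DEG and abs(z - polys[i][1]) < HEIGHT_STEP:
                ground.add(j)
                queue.append(j)
    return ground
```

Note how a chain of gently rising polygons is absorbed into ground, while a near-vertical polygon stops the wave.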

Most of the above surface class labels are assigned directly while performing the region-growing operations, through dot products and distance computations. The only exception to this rule is the stairs class, which is treated separately. Algorithm 2 presents the individual steps that lead to the labeling of a polygon or group of polygons as being part of the stairs class.

Algorithm 2 Stairs labeling algorithm

Require: a set of horizontal (m1) and vertical (m2) polygonal models
1: if size of m1 ≥ 1 and size of m2 ≥ 1 then
2:   compute the geometric centroids of m1 and m2 as c_m1 and c_m2
3:   for each horizontal centroid in c_m1 do
4:     search for nearest vertical neighbors in c_m2 where ‖c_m1, c_m2‖_2 < d_th
5:     if neighbors found then
6:       add both neighbors and the query centroid to a stairs set
7:     end if
8:   end for
9: end if

The d_th threshold dictates the desired search space for each candidate stair step. In Algorithm 2, it describes the maximum radius of a sphere

Figure 12. Semantic labeling of polygonal surfaces based on heuristic geometrical rules. The labeled classes are flat and level terrain (green and yellow), vertical structures (red), stairs (purple), and unclassified or unknown (gray).

containing the nearest neighbors (from c_m2) of each horizontal centroid in c_m1. Setting the parameter to a high value causes the algorithm to search for more neighbors, which can add computational burden. In our implementation, we set the parameter to 0.2 m, as this provided an intuitive way to restrict the stair step size while at the same time giving good results for all data sets that we tested against.
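Algorithm 2 can be sketched as follows. The brute-force neighbor scan stands in for the nearest-neighbor search of the implementation, and the tagged output set is an illustrative simplification; the 0.2 m radius is from the text.

```python
import math

D_TH = 0.2  # search radius used in the paper (m)

def label_stairs(horizontal_centroids, vertical_centroids, d_th=D_TH):
    """Sketch of Algorithm 2: for each horizontal polygon centroid, look
    for vertical polygon centroids within d_th; matched pairs of treads
    and risers form the stairs set."""
    stairs = set()
    for i, ch in enumerate(horizontal_centroids):
        for j, cv in enumerate(vertical_centroids):
            if math.dist(ch, cv) < d_th:        # Euclidean distance test
                stairs.add(("h", i))            # the query (tread) centroid
                stairs.add(("v", j))            # its vertical (riser) match
    return stairs
```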

5.5. Hybrid Model

In this section we describe a hybrid model that consists of sampled PCD and polygonal planes and can

Figure 13. Semantically annotated polygonal models for two data sets (top: bird's-eye view; bottom: close-up view). (a) Stairs (≈2,000 frames) and (b) dumpster (≈3,000 frames).

Figure 14. Hybrid model visualized remotely. Ground and level cells are plotted as polygons and the rest as points.

be used for visualization. We note that although a polygonal model is convenient for mapping, it is not the most suitable for visualization. One of the possible applications of this project is to enable remote visualization of the scene, for instance, by a teleoperator. In this case it is necessary to transmit the polygonal model over a bandwidth-limited network. The goal is to provide a visually rich as well as schematically coherent view for the remote operator in an efficient manner. The full PCD are rich in terms of visualization but require a large bandwidth for transmission, whereas the polygonal model is a compact representation but is not suitable for parts of the scene that are more complex, such as vegetation. We present a scheme in which parts of the scene are represented using either the polygonal model or sampled PCD in order to better visualize the scene in an efficient manner. It is possible to extend our system to allow the operator to select a schematic or actual view, or even a user-specified combination of the two. We present one such view of the hybrid model in Figure 14.
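The per-cell representation choice described above (polygons for ground and level cells, points for everything else, as in Figure 14) might look like this sketch. The cell dictionary layout and the fixed point budget are assumptions, not the paper's wire format.

```python
def hybrid_payload(cells, max_points_per_cell=50):
    """Choose a transmission payload per cell: the compact polygon for
    ground/level surfaces, a subsampled point set elsewhere (e.g., for
    vegetation). Cells are hypothetical dicts with 'label', 'polygon',
    and 'points' keys."""
    payload = []
    for cell in cells:
        if cell["label"] in ("ground", "level") and cell["polygon"]:
            payload.append(("polygon", cell["polygon"]))
        else:
            # Naive truncation; any decimation scheme would do here.
            pts = cell["points"][:max_points_per_cell]
            payload.append(("points", pts))
    return payload
```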

5.6. 3D Mapping Performance

To evaluate the system's performance, we monitored both the computation time and memory usage for each component of our 3D mapping module. The mapping pipeline is split into the following components:

• Stereo + VO: produces the disparity image from the stereo system and computes the global pose estimate.

• PCD→LocalOctree: produces the local PCD from the disparity image and creates/updates the local octree.

• GlobalOctree growing: expands the global octree's boundaries when needed.

• LocalOctree→Polygon→GlobalOctree: generates the polygonal model and merges it with the existing model.

• GlobalOctree→GlobalModel: refines the model (polygonal merging, polygonal growing, outlier removal).

• Semantic (Functional) Labeling: annotates the model.

Figure 15 presents the results for the data sets that correspond to the models presented in Figure 5. The majority of the model-building components have low computational requirements. Almost consistently, Stereo + VO requires the majority of resources. To address this, we implemented the Stereo + VO component in a process separate from the other (model-building) components. By concurrently running the two processes, the global map can be updated every 250 ms on average.

Figure 16 shows the memory usage plot for the same two data sets presented above. To illustrate the system's operation better, we analyzed the amount of

Figure 15. Computational time of each 3D mapping component for the data sets presented in Figure 5.

memory allocated at each frame, when new data are received and have to be integrated into the model. Therefore, the two curves, blue and red, show the quantity of memory occupied before and after M_L^(t) is created.

6. MOTION PLANNING

The motion planning subsystem is periodically called to generate a trajectory that connects the current position of the robot to a user-designated target. The planner is given a map of the environment built by the 3D mapping system. Its output is a trajectory that consists of a sequence of gait primitives to be executed by the robot and the intermediate milestones. We seek a trajectory with low (but not necessarily optimal) execution cost. We define execution cost as a function of local terrain characteristics, penalizing slopes and unevenness, as in Simeon and Dacre-Wright (1993).

6.1. Overview

Usually, our planner treats RHex like a mobile robot on a 2D elevation map (Simeon & Dacre-Wright, 1993). As mentioned in the Introduction, the elevation map approach cannot represent environments with multiple overlapping levels. Therefore, our planner starts by decomposing the 3D environment model into regions that are locally 2D, using a grid-based approach. Furthermore, to speed up planning, our planner chooses planning techniques that are specialized to the semantic type of the region's ground surface. When planning on uneven outdoor terrain, we must assess terrain traversability and the stability of robot configurations, in addition to testing

Figure 16. Memory usage for the creation of the final polygonal models M(t)F for the data sets presented in Figure 5.

Figure 17. Illustrating the terrain decomposition step for a 2D slice of terrain.

for collision. These assessments are moderately expensive. However, they are unnecessary on flat terrain, where planning can be considerably simplified. Similar simplifications can be made when planning on stairs and steep slopes.

To implement this idea, the planner has a repertoire of primitive planners that work only when the robot interacts with certain semantic types of terrain. The planner uses a two-phase algorithm to generate multiregion paths. First, the planner builds a large graph of configurations, with edges representing candidate primitive planning queries. This graph is used in the second phase to help choose primitive planning queries that have low computational cost and are likely to yield a path to the goal. Several single-mode trajectories are then combined to produce a trajectory from the start to the goal.

6.2. Terrain Decomposition

The terrain decomposition step decomposes the empty portion of 3D space into volumetric cells, which are bounded on the top and bottom by terrain surfaces (or no surface at all) and on the sides by vertical faces. A region is defined as a collection of cells whose upper and lower terrain surfaces have the same semantic label and are connected along boundary faces. The lower surface of a region can be represented as an elevation map. If two regions have connected boundary faces, we say they are adjacent. Thus, from any polygonal model M, we can compute the region graph R, where nodes are regions and edges connect adjacent regions. Any path in 3D space that does not intersect M corresponds to a path in R. It is a 3D analog of the trapezoidal decomposition, except that regions are not convex in general. R could potentially be computed exactly from the model M [Figures 17(a) and 17(b)], but for large models of tens of thousands of polygons this would be computationally expensive, and the structure would become extremely large. Instead, we use an approximation in which the polygons are projected and rasterized onto an x–y grid.

The approximation is computed as follows. First, we define the function OVERLAP(c, M), which computes the set of polygons in M whose projection on the x–y plane overlaps the 2D grid cell c. Let p_1, ..., p_k be the polygons in OVERLAP(c, M). For each polygon p_i, let [a_i, b_i] be the vertical range of the intersection between p_i and c. Let S = ⋃_{i=1}^{k} [a_i, b_i] be the vertical range of c intersected by surfaces, and let F = R \ S be the vertical range that is free. For each connected component f of F, we output a cell whose top and bottom surfaces are identified by the polygons defining f's upper and lower bounds [Figure 17(c)].

The total running time of the approximate decomposition is O(CK log K), where C is the number of grid cells and K is the maximum number of polygons overlapping a single grid cell. The sets OVERLAP(c, M) can be computed for all grid cells in O(CK) time by rasterizing the polygons. The decomposition of each F takes O(K log K) time. Because there are C grid cells, the overall running time is O(CK log K).
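The per-cell interval step (building S and extracting the connected components of F) can be sketched as follows. Clipping to a finite [z_min, z_max] stands in for the unbounded vertical axis R; those bounds are assumptions for illustration.

```python
def free_ranges(occupied, z_min=-10.0, z_max=10.0):
    """Per-cell interval computation of the decomposition: merge the
    surfaces' vertical ranges [a_i, b_i] into S, then return the connected
    components of the free range F (the complement of S within
    [z_min, z_max]). Each returned interval becomes one volumetric cell,
    bounded by the surfaces defining its lower and upper ends."""
    merged = []
    for a, b in sorted(occupied):            # merge overlapping [a_i, b_i]
        if merged and a <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], b)
        else:
            merged.append([a, b])
    free, lo = [], z_min
    for a, b in merged:                      # gaps between merged intervals
        if a > lo:
            free.append((lo, a))
        lo = max(lo, b)
    if lo < z_max:
        free.append((lo, z_max))
    return free
```

For example, a floor spanning [0, 0.2] and a ceiling spanning [1.0, 1.2] leave one traversable cell between them and one above.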

A flood-fill algorithm is used to collect cells into regions and determine the adjacencies between regions [Figure 17(d)]. This is implemented with an O(n log n) breadth-first search. Additionally, we reduce the total number of regions by aggregating regions that differ only by their upper surface. This reduces the decomposition's complexity, which speeds up the multiregion planner. During the aggregation process, we prevent incorporating a new cell into a region if it would cause the x–y projection

of the region to self-overlap. Because the number of cells is O(CK), the overall algorithm takes O(KC log CK) time, limited by the running time of the breadth-first search.

6.3. Primitive Planners

We define a mode m as a set of regions in which a single primitive planner can be applied. We also define a validity condition VALID(m) that holds if (1) the regions of m induce a connected subgraph of R (that is, the subset of the free space defined by m is connected) and (2) the regions in m do not overlap in the x–y plane (so a primitive planner can assume that m is locally 2D).

A primitive planning query is specified with a (valid) mode, a start configuration, and a goal configuration. The query terminates when the planner either produces a feasible trajectory, determines that no path exists, or exceeds a user-specified time limit.

We use the following primitive planners, classified by the type of region(s) on which they are applicable:

• Unlabeled terrain. We use an A* search on a discretized (x, y, θ) state space (Simeon & Dacre-Wright, 1993). Successors are generated by enumerating all gait primitives. At each generated configuration, we check for collisions with the region's ceiling and compute the execution cost. To compute the execution cost, we compute the terrain T underneath the robot and fit a plane p to T (minimizing distance in a least-squares sense). Then, cost is defined as a weighted sum of the following criteria:

1. The roll and tilt of p relative to the heading of the robot. Penalizes steep slopes.

2. The average squared deviation of T from p (area-weighted). Penalizes rough terrain.

3. The maximum deviation of T from p. Penalizes rough terrain.

4. The surface normal averaged over T (area-weighted). We use the pitch and roll as features. Penalizes steep slopes.

5. The standard deviation of the normals over T from the average normal. We use the pitch and roll as features. Penalizes rough terrain.

6. The area of the projection of T to the x–y plane, divided by the area of the robot's geometry projected to the x–y plane. Penalizes regions with missing data.

7. A constant term, which represents the base movement cost on flat terrain. For example, moving backward is less efficient than moving forward and is assigned a higher constant term.

Weights are currently tuned by hand, although in the future learning techniques may be used to match weights to traversability, execution time, and energy expenditure (Shirkhodaie, Amrani, & Tunstel, 2004). For the A* search, we use a straight-line distance heuristic. The planner finds the trajectory with minimum cost [see Figure 18(a)].

• Flat ground. We use a sample-based motion planner called SBL (single-query, bidirectional sampling, lazy collision checking) (Sanchez & Latombe, 2002), which builds trees bidirectionally in configuration space to connect the start and goal. SBL is typically much faster than an A* search and uses random sampling to avoid discretization artifacts. We use an (x, y, θ) configuration space and propagate either car-like paths or turn-and-drive paths, depending on the user's preference [see Figure 18(b)]. The robot's elevation, pitch, and roll are constant, so an elevation map to assess stability is not required. We check for collisions with the region's ceiling, at samples and at uniformly spaced configurations along paths.

• Stairs or steep slopes. We restrict the robot's heading to face either up or down the slope and use the stair-climbing gait primitive. The planner examines a one-dimensional (1D) line in the configuration space for collision avoidance.

• Flat ground and stairs, or flat ground and steep slopes. The same as above.

The unlabeled terrain planner is also used as a backup when no other primitive planner is applicable.

Note that the unlabeled terrain primitive planner can be applied to any mode to which any other primitive planner can be applied.
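The planner selection implied by this classification can be sketched as a dispatch on the mode's region labels. The planner names here are illustrative stand-ins, and the exact label sets accepted by each planner are assumptions based on the list above.

```python
def choose_planner(region_labels):
    """Pick a primitive planner for a valid mode from the semantic labels
    of its regions. The A* unlabeled-terrain planner is the universal
    fallback, per the text."""
    labels = set(region_labels)
    if labels and labels <= {"ground", "level"}:
        return "sbl_flat_ground"          # sample-based planner, flat ground
    if "stairs" in labels and labels <= {"ground", "level", "stairs"}:
        return "stair_gait_1d"            # fixed heading, 1D line search
    return "astar_unlabeled"              # backup: A* with terrain costing
```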

Figure 18. Examples of paths planned by primitive planners.

6.4. Multiregion Planner

Each primitive planner can be applied only to modes whose regions are connected and do not overlap in the x–y plane (otherwise, this is considered an invalid mode).

The multiregion planner selects (valid) modes and initial and final configurations for primitive planner queries and chooses the fastest planner for the given query. We use a two-phase algorithm, which is similar to the one used in motion planning for a climbing robot (Bretl, Lall, Latombe, & Rock, 2004).

In the first phase, the algorithm builds a network of configurations, including the start and goal configurations and sampled configurations that transition between modes. We call this the transition graph. Each edge in the transition graph corresponds to a potential primitive planning query. Rather than immediately making the query, an edge is assigned a heuristic cost that includes an estimate of the planning and execution costs. The second phase of the algorithm refines the transition graph, making primitive planning queries that correspond to each edge.

The transition graph G is built using a breadth-first search. To expand a configuration q at a given mode m, we consider all modes m′ obtained by either (1) adding a region r′ adjacent to some region r in m or (2) removing a region in m. For each m′ for which VALID(m′) holds, we sample a set T of configurations that transition from m to m′ and add them to G (Figure 19). If m contains the goal configuration q_g, we then add q_g to T. For each transition configuration q′ ∈ T, we connect an edge between q and q′. The process is repeated for some number of iterations, at least until G contains a path to q_g [Figures 20(a)–20(d)].

To sample a configuration q′ that transitions from m to m′, we use the following procedure. First, consider the case in which m′ was obtained by adding a region r′ to m. To maximize the amount of terrain covered by simpler primitive planners, we try to find a q′ that remains in m but is close to r′. This can easily be done by sampling q′ in the terrain-decomposition cells that border r′ and a region r in m and then shifting q′ so that it is not supported by r′. If q′ violates

Figure 19. Configuration space sampling at the junction of two different regions (transitions).

Figure 20. Illustrating multiregion search for a circular robot in 2D terrain. In (c), transition configurations are drawn shaded gray. Nodes are labeled with their mode. In (e), primitive planners find a valid trajectory for the first edge but not the second.

the constraints of either m or m′, it is rejected. Second, consider the case in which m′ removes a region r from m. If the robot at configuration q is supported by r, we sample q′ in the terrain cells that border r and a region r′ in m′ and shift q′ out of r. Otherwise, we simply set q′ = q.

In the algorithm's second phase, we try to find the trajectories that correspond to edges in G. For each edge e = (q, q′) in G, we set the edge cost c_e equal to a heuristic c(q, q′) that estimates the sum of the planning and execution costs. We then perform Dijkstra's algorithm to find the lowest-cost path from start to goal. This path is then refined by executing single-mode planning queries along each unsolved edge e along the path. If a single-mode trajectory is successfully found between q and q′, we mark e as solved and replace c_e with the trajectory's actual execution cost. If not, we remove e from the graph [Figure 20(e)]. This procedure is repeated until a feasible trajectory to the goal is found [Figure 20(f)].

We also adaptively improve estimates of the heuristic edge costs. For each primitive planning query, we measure the computation time and then use the result to update the planning cost estimate. For a primitive planner p, the planning cost c_p(q, q′) between q and q′ is assumed to follow a linear model a_p d(q, q′) + b_p. We improve our estimates of a_p and b_p over time, using a recursive least-squares estimation. We use a similar procedure to adaptively estimate the execution cost.
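The adaptive cost model can be sketched as an online least-squares fit of the linear model c_p = a_p d + b_p. A closed-form recurrence over accumulated sums stands in here for the paper's recursive least-squares update, and the initial coefficient values are assumptions.

```python
class PlanningCostModel:
    """Online least-squares fit of the linear planning-cost model
    c_p(q, q') = a_p * d(q, q') + b_p, updated after every primitive
    planning query with the measured computation time."""

    def __init__(self, a0=1.0, b0=0.0):
        self.a, self.b = a0, b0
        # Accumulated sums for the 1D least-squares normal equations.
        self.n = self.sd = self.sc = self.sdd = self.sdc = 0.0

    def update(self, d, measured_cost):
        """Fold one (distance, measured cost) observation into the fit."""
        self.n += 1
        self.sd += d
        self.sc += measured_cost
        self.sdd += d * d
        self.sdc += d * measured_cost
        denom = self.n * self.sdd - self.sd ** 2
        if denom > 1e-12:                   # need >= 2 distinct distances
            self.a = (self.n * self.sdc - self.sd * self.sc) / denom
            self.b = (self.sc - self.a * self.sd) / self.n

    def estimate(self, d):
        """Predicted planning cost for an edge of distance d."""
        return self.a * d + self.b
```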

6.5. Experiments

We tested the planner with and without semantic labeling. First, we tested a problem with a uniformly flat terrain of more than 20,000 triangles, with a point

Figure 21. Motion planning example based on semantic polygonal labels for the "dumpster" data set.

target 5 m away from the robot's initial position. The decomposition step took approximately 220 ms. On unlabeled terrain, the multiregion planning stage took 184 ms to generate a path (averaged over 10 runs). Labeled as flat ground, it took 53 ms. We then tested a two-level problem that consisted of flat ground and stairs, with a point target on a second level, approximately 2 m above the robot. The decomposition step took approximately 200 ms. Multiregion planning took 886 ms on unlabeled terrain and 134 ms on labeled terrain. Similar increases in computational speed were observed in several other experiments.

Experiments on terrain generated by the 3D mapping module (e.g., Figure 21) show that the planner can generally produce a path to targets up to approximately 5 m away from the robot's initial position in less than a second. The running time can vary significantly depending on the size of the model, the shape of obstacles, and the performance of the heuristics.

7. APPLICATION: ASSISTED TELEOPERATION

Teleoperating a robot is especially difficult when the robot is not in sight. The feedback from the robot's cameras is not a sufficient source of information to guarantee collision-free navigation and the appropriate selection of motion primitives by a user. Our system fulfills these objectives for the user in real time. To combine user commands with our autonomous navigation system, we added some intuitive controls to specify the task of the motion planner. A user's simple direction, given via a joystick, is automatically translated into a complex trajectory in the 3D

Figure 22. Example of intuitive control using a joystick. The user describes a general direction (red line), and the motion planner adapts it to a correct trajectory (yellow polyline).

model with the correct series of motion primitives (Figure 22). The user can also control the aggressiveness of the motion planning to modify the level of risk taken by the robot (Figure 23).

7.1. Intuitive Control

The motion planner described above assumed that a goal configuration was specified. This level of task specification is inefficient for a human operator, who most likely would be using a joystick or console to command the robot. Here, we describe how the planner can be adapted to handle more intuitive tasks, such as a point target or line following (see Figure 22).

We assume that the user's input is translated into an early termination penalty function p(q) defined over configuration space, which penalizes the planner if it chooses to terminate planning at a path ending in q. For example, a point target t would define a penalty function p(q) = a‖t − x(q)‖, where a is a positive constant and x(q) is the center of the robot at configuration q. The function can be shifted uniformly by an arbitrary constant without affecting the planner.

We implement early termination as follows: When building the transition graph G, we do not add the goal configuration q_g as a node in G. Rather, we add a virtual edge e between each transition configuration q and a virtual goal configuration. We assign the cost c_e = p(q) to each of these edges. We also allow the planner to terminate early within a mode, not necessarily at a transition. To do this, we augment G with configurations sampled inside a single mode. The second phase of the planner proceeds as usual, except that virtual edges are treated as solved and the virtual goal is treated as the de facto goal.
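The point-target penalty above can be sketched as a small factory function. Representing a configuration as (x, y, θ) is a simplifying assumption consistent with the planner's state spaces; in the planner, each transition configuration q receives a virtual edge to the virtual goal with cost c_e = p(q).

```python
import math

def point_target_penalty(target, a=1.0):
    """Early-termination penalty for a point target t:
    p(q) = a * ||t - x(q)||, where x(q) is the robot center at
    configuration q = (x, y, theta) and a is the aggressiveness
    scale factor described in Section 7.2."""
    def p(q):
        x, y, _theta = q
        # Planar distance from the robot center to the target point.
        return a * math.hypot(target[0] - x, target[1] - y)
    return p
```

Raising `a` makes stopping short of the target expensive, which is exactly the aggressiveness behavior described next.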

Figure 23. Plans to reach a target on unlabeled terrain, with (a) low aggression and (b) 5× higher aggression.

7.2. Aggressiveness Modulation

The early termination penalty function relies on the definition of a scale factor a. We denote this factor the aggressiveness. A high aggressiveness value causes the planner to (1) produce trajectories with higher execution costs and (2) spend more time trying to find a path before terminating. A low aggressiveness makes the planner act greedily and reactively and be more risk-averse. Figure 23 illustrates an example.

8. CONCLUSION AND FUTURE WORK

In this article, we presented the theoretical and implementation details of Leaving Flatland, a comprehensive system for real-time 3D mapping and planning for general-purpose mobile robots, with an emphasis on the RHex robot.

The system includes mapping and localization techniques, coupled with higher level polygonal modeling for stereo data. The resulting annotations provide heuristic semantic labels for different terrain types, which are extremely helpful for the motion planner in choosing the most appropriate gaits for the robot. The motion planner uses a new decomposition technique to convert a 3D model into locally 2D regions. Existing mobile robot planning techniques can be used within each region, and our planner uses a repertoire of techniques specialized for certain terrain types, such as flat ground and stairs.

The performance of the mapping and planning subsystems, in terms of both modeling quality and speed, allows the deployment of the overall system on real robots and makes the architecture suitable for fast, real-time 3D navigation.

Our short-term goals include system integration work and testing on the RHex robot in realistic reconnaissance and search-and-rescue scenarios. We plan to improve the mapping and localization modules by incorporating loop-closure techniques to globally align maps gathered over long periods of time. We plan to improve the accuracy of the 3D map through better polygon fitting, especially with respect to multiple resolutions. We also plan to further develop our preliminary work on assisted teleoperation.

ACKNOWLEDGMENTS

The authors thank Edward Van Reuth and DARPA for support on the "Leaving Flatland" project (contract #FA8650-04-C-7136), as well as the Cognition for Technical Systems (CoTeSys) cluster of excellence at the Technische Universität München for partial support.

REFERENCES

Agrawal, M., & Konolige, K. (2006, August). Real-time lo-calization in outdoor environments using stereo visionand inexpensive GPS. In International Conference onPattern Recognition, Hong Kong.

Biber, P., Andreasson, H., Duckett, T., & Schilling, A. (2004,October). 3D modeling of indoor environments by amobile robot with a laser scanner and panoramic cam-era. In IEEE/RSJ International Conference on Intelli-gent Robotics Systems, Sendai, Japan.

Bretl, T., Lall, S., Latombe, J.-C., & Rock, S. (2004, July).Multi-step motion planning for free-climbing robots.In WAFR, Zeist, Netherlands.

Chestnutt, J., Kuffner, J., Nishiwaki, K., & Kagami, S. (2003,October). Planning biped navigation strategies in com-plex environments. In IEEE International ConferenceHumanoid Robots, Karlsruhe, Germany.

Journal of Field Robotics DOI 10.1002/rob

Page 22: Leaving Flatland: Efficient Real-Time Three …aravind/documents/rusu09-flatland.pdfLeaving Flatland: Efficient Real-Time Three-Dimensional Perception and Motion Planning Radu Bogdan

862 • Journal of Field Robotics—2009

Chum, O., & Matas, J. (2002). Randomized RANSAC withTd,d test. In Image and vision computing (vol. 22,pp. 837–842). Elsevier.

Diebel, J., & Thrun, S. (2006). An application of Markov ran-dom fields to range sensing. In Y. Weiss, B. Scholkopf,and J. Platt (Eds.), Advances in neural informationprocessing systems 18 (pp. 291–298). Cambridge, MA:MIT Press.

Hauser, K., Bretl, T., Latombe, J.-C., & Wilcox, B. (2006,July). Motion planning for a six-legged lunar robot. InWAFR, New York.

Howard, T.M., Knepper, R.A., & Kelly, A. (2008). Con-strained optimization path following of wheeledrobots in natural terrain. Experimental Robotics, 39,343–352.

Iocchi, L., & Pellegrini, S. (2007, July). Building 3D mapswith semantic elements integrating 2D laser, stereo vi-sion and INS on a mobile robot. In 2nd ISPRS Interna-tional Workshop 3D-ARCH, Zurich, Switzerland.

Konolige, K., Agrawal, M., Bolles, R.C., Cowan, C., Fis-chler, M., & Gerkey, B.P. (2006, July). Outdoor map-ping and navigation using stereo vision. In Interna-tional Symposium on Experimental Robotics (ISER),Rio de Janeiro, Brazil.

Konolige, K., Agrawal, M., & Iocchi, L. (2005, Jan-uary). Real-time detection of independent mo-tion using stereo. In IEEE Workshop on Motion(WACV/MOTION), Breckenridge, CO.

Konolige, K., Agrawal, M., & Sola, J. (2007, November). Large scale visual odometry for rough terrain. In International Symposium on Robotics Research, Hiroshima, Japan.

Liu, Y., Emery, R., Chakrabarti, D., Burgard, W., & Thrun, S. (2001, June). Using EM to learn 3D models of indoor environments with mobile robots. In ICML, Williamstown, MA (pp. 329–336).

Moore, E.Z., & Buehler, M. (2001, September). Stable stair climbing in a simple hexapod robot. In International Conference on Climbing and Walking Robots, Karlsruhe, Germany.

Mozos, O.M., Triebel, R., Jensfelt, P., Rottmann, A., & Burgard, W. (2007). Supervised semantic labeling of places using information extracted from laser and vision sensor data. Robotics and Autonomous Systems, 55(5), 391–402.

Munoz, D., Vandapel, N., & Hebert, M. (2008, June). Directional associative Markov network for 3-D point cloud classification. In Fourth International Symposium on 3D Data Processing, Visualization and Transmission, Atlanta, GA.

Nuechter, A., Surmann, H., & Hertzberg, J. (2003, October). Automatic model refinement for 3D reconstruction with mobile robots. In Proceedings of the 4th IEEE International Conference on Recent Advances in 3D Digital Imaging and Modeling (3DIM’03), Banff, Canada.

Posner, I., Cummins, M., & Newman, P. (2008, June). Fast probabilistic labeling of city maps. In Proceedings of Robotics: Science and Systems IV, Zurich, Switzerland.

Rusu, R.B., Sundaresan, A., Morisset, B., Agrawal, M., & Beetz, M. (2008, October). Leaving Flatland: Realtime 3D stereo semantic reconstruction. In Proceedings of the International Conference on Intelligent Robotics and Applications (ICIRA) 2008, Wuhan, China.

Sanchez, G., & Latombe, J.-C. (2002). On delaying collision checking in PRM planning: Application to multi-robot coordination. International Journal of Robotics Research, 21(1), 5–26.

Saranli, U., Buehler, M., & Koditschek, D.E. (2001). RHex: A simple and highly mobile hexapod robot. International Journal of Robotics Research, 20(7), 616–631.

Schroter, D., Weber, T., Beetz, M., & Radig, B. (2004, August). Detection and classification of gateways for the acquisition of structured robot maps. In Proceedings of 26th Pattern Recognition Symposium (DAGM), Tübingen, Germany.

Shirkhodaie, A., Amrani, R., & Tunstel, E. (2004). Visual terrain mapping for traversable path planning of mobile robots. Intelligent Robots and Computer Vision XXII: Algorithms, Techniques, and Active Vision. Proceedings of the SPIE, 5608, 118–127.

Simeon, T., & Dacre-Wright, B. (1993, July). A practical motion planner for all-terrain mobile robots. In IEEE/RSJ International Conference on Intelligent Robotic Systems, Yokohama, Japan.

Soares, L., Menier, C., Raffin, B., & Roch, J.-L. (2007, March). Parallel adaptive octree carving for real-time 3D modeling. In IEEE Virtual Reality Conference, Charlotte, NC.

Thrun, S., Montemerlo, M., Dahlkamp, H., Stavens, D., Aron, A., Diebel, J., Fong, P., Gale, J., Halpenny, M., Hoffmann, G., Lau, K., Oakley, C., Palatucci, M., Pratt, V., Stang, P., Strohband, S., Dupont, C., Jendrossek, L.-E., Koelen, C., Markey, C., Rummel, C., van Niekerk, J., Jensen, E., Alessandrini, P., Bradski, G., Davies, B., Ettinger, S., Kaehler, A., Nefian, A., & Mahoney, P. (2006). Stanley: The robot that won the DARPA Grand Challenge. Journal of Field Robotics, 23, 661–692.

Torr, P., & Zisserman, A. (2000). MLESAC: A new robust estimator with application to estimating image geometry. Computer Vision and Image Understanding, 78, 138–156.

Vandapel, N., Huber, D., Kapuria, A., & Hebert, M. (2004, April). Natural terrain classification using 3-D ladar data. In IEEE International Conference on Robotics and Automation, Barcelona, Spain (vol. 5, pp. 5117–5122).

Vasudevan, S., Gachter, S., Nguyen, V., & Siegwart, R. (2007). Cognitive maps for mobile robots—An object based approach. Robotics and Autonomous Systems, 55(5), 359–371.

Vincent, R., Fox, D., Ko, J., Konolige, K., Limketkai, B., Morisset, B., Ortiz, C., Schulz, D., & Stewart, B. (2008). Distributed multirobot exploration, mapping, and task allocation. In D.A. Shapiro and D.G.A. Kaminka (Eds.), Annals of Math and Artificial Intelligence (AMAI), Special Issue on Multi-Robot Coverage, Search, and Exploration, 52, 229–255.

Weingarten, J., Gruener, G., & Siegwart, R. (2003, June). A fast and robust 3D feature extraction algorithm for structured environment reconstruction. In International Conference on Advanced Robotics, Coimbra, Portugal.