
A Viewpoint Planning and Navigation Algorithm for Mobile Robots using Depth Images

Davide Valeriani, Dario Lodi Rizzini, Fabio Oleari, Stefano Caselli
Dipartimento di Ingegneria dell'Informazione, University of Parma, Italy

{davide.v,dlr,oleari,caselli}@ce.unipr.it

Abstract

In this paper, we illustrate a viewpoint planning and local navigation algorithm for mobile robot exploration using 3D perception. Laser scans and stereo camera data are read and composed into a single point cloud. The latest acquired point clouds are registered and used to detect relevant objects and obstacles in space. To demonstrate the capability of the proposed perception system, a viewpoint planning application has been developed. The algorithm selects a goal object, computes a viewpoint to observe the region occluded by that object, and plans a path to reach the desired configuration. The navigation method executes the computed path and reaches the goal. Experiments have assessed the capability of the system to plan using three-dimensional occupancy information and to find new viewpoints in cluttered environments.

1 Introduction

The capability of navigating in unstructured and cluttered environments is crucial in service robotics. A typical human-populated environment is full of objects, possibly with protuberant parts, and may not be faithfully represented by a map built with a planar sensor. Furthermore, it would be convenient to qualify semantically interesting obstacles as objects and to obtain a complete view of the interesting parts of the environment. Thus, the perception of three-dimensional shapes allows a mobile robot to perform high-level tasks and to achieve context-aware interactions with the environment.

The diffusion of cheap 3D sensors like range cameras and tilted laser scanners has popularised navigation in three-dimensional space and scene interpretation. The point cloud data structure provides a common representation for measurements acquired from multiple sensors. Since the point clouds acquired by range sensors are often rather accurate, navigation algorithms may exploit both occupancy and semantic information. Standard mobile robots move on a plane and their configuration can often be represented by two position variables and one orientation variable. This convenient assumption, which avoids complex motion planning, holds even for mobile robots equipped with manipulators, since the robotic arm is usually idle during navigation. A worst-case estimate of the free space can be obtained by projecting the points of the cloud on the motion plane. Several navigation algorithms operate under this assumption [1; 2]. However, a planning technique exploiting such an approximation may not detect narrow passages among obstacles, since the free space is usually underestimated. More sophisticated methods discriminate between obstacles at different heights [3; 4].

Furthermore, 3D sensors allow a detailed perception of shapes, which can improve performance in object detection, robot motion and manipulation in cluttered environments [5]. Most research on 3D perception has focused on scene segmentation and object recognition, and several effective algorithms have been proposed. However, detection is often addressed as a separate sensor-guided task that weakly interacts with mobile robot navigation. As a matter of fact, exploration of the environment can benefit from a raw semantic assessment of the 3D obstacle shapes and their relative placement [6; 7].

In this paper, we present an integrated system comprising viewpoint planning and a local navigation algorithm for mobile robot exploration using 3D perception. The proposed system acquires and stores sensor measurements from both a range finder and a stereo camera pair into a point cloud. The latest measurements belonging to a sliding time window are aligned and accumulated in a single point cloud. The point cloud is used to build a planar local occupancy grid map, which is used for planning the robot path. The planar occupancy map is built using only points which may collide with the robot according to their height from the ground. Although the construction of the map is performed using a standard local grid map component that projects the 3D points on the ground plane, the points are selected so as to avoid unnecessary restriction of the free space. Moreover, the point cloud is divided into clusters of strongly connected points.

To demonstrate the perception capability of the proposed system, a viewpoint planning application has been implemented. In particular, a cluster of interest is selected according to conditions on its size and relative position, and the application determines a new viewpoint to achieve the observation of the region occluded by the object. Then, the path for reaching the viewpoint is planned using the occupancy grid map built by projecting the point cloud on a plane. The local map keeps a short-term memory of the obstacles, since most of the considered objects are ephemeral elements of the environment and their location can change rather frequently. The planned trajectory is executed using a variant of the dynamic window method, while the local map is updated during the robot motion. Finally, the system checks whether the goal is reachable and a new goal is planned using the latest observations.

The main contribution of this work is the implementation of a complete robotic system that exploits 3D sensor sources for both occupancy and semantic segmentation using standard components. Moreover, the proposed approach avoids unnecessary restrictions of the free space available in the local map and achieves a safe navigation behaviour in cluttered environments.

The paper is organized as follows. Section 2 reviews the state of the art in navigation and semantic segmentation using 3D perception. Section 3 illustrates the acquisition, registration and processing of the point clouds obtained from the sensors. Section 4 discusses the construction of the local map and mobile robot navigation. Section 5 presents the viewpoint planning application. Section 6 presents the experiments performed to test the implemented system and Section 7 discusses the results.

2 Related Work

Three-dimensional perception has increasingly been used to perceive obstacles and perform collision-free navigation. For most purposes, the point clouds representing samples of occupied space can be converted into a standard planar representation by projecting the points on the ground plane. Since mobile robots, like humans, move on such a surface, this approximation can reasonably be applied to most navigation problems where the robot keeps far away from obstacles. In [8] the 3D data acquired by an MS Kinect are used only to represent on the plane the obstacles that may collide with the robot. In modern automated warehouses, the movement of goods is increasingly managed by autonomous robots whose perception of the environment and safety usually rely on planar laser scanners. Despite the reliability of these sensors, problems may arise in the case of hanging obstacles. In [9] the use of a time-of-flight camera for obstacle detection is investigated as a safety device for Automated Guided Vehicles, but no sensor fusion is considered. Mapping and navigation with an MS Kinect sensor are also investigated in [2]. In particular, this work presents a comparison between 2D mapping by projecting points on the ground and 3D feature-based SLAM. Marder-Eppstein et al. [1] illustrate a system that uses a voxel grid map of the environment, but a planar cost map is still built by projecting voxel map columns and by expanding the robot footprint around obstacles. In [10] the points of the acquired depth images are sampled and fitted into planes. Planes are used to perform both localization and navigation. Hornung et al. [4] propose to use both a voxel map and a multi-layer occupancy grid map to check collisions. Such a solution allows the robot to move closer to obstacles when the robot does not collide with the grid map at a given height. The challenge is due to the complexity of 3D collision checking. A fast method to incrementally build 3D occupancy maps is illustrated in [3]. This paper is one of the few works addressing collision checking in space. However, the reported experiments are performed offline on a previously acquired dataset. 3D perception has also been applied to navigation, exploration and localization of humanoid robots. Maier et al. [11] propose to use an octree occupancy grid map for localization, but path planning and navigation are executed on a projected planar map. These works mainly focus on how to use 3D perception for localization, mapping and navigation, and use point clouds to infer spatial occupancy information.

A limited number of works exploit object detection as a means to semantically segment the environment and to plan the execution of meaningful navigation tasks. The detected objects are often used to enrich maps with higher-order semantic information [12]. The work in [7] presents a sensor model that takes into account object detection and the robot viewpoint. A convenient path is then planned as a trade-off between gaining additional information about an object hypothesis and the cost of executing such a trajectory. Aydemir et al. [13] propose an algorithm for discovering objects using relations between objects. In particular, the observation of a specific object is used by the view planner as a clue for the presence of other objects (e.g. if a table is partially observed, the robot looks for possible objects on the rest of the table). Since the state of a mobile robot is described by its position and orientation on the ground plane, the view planning task is usually formulated as a planar problem.


Figure 1: The rigid bar with the ARToolkit marker used for calibration (a) and the estimation of the relative reference frames (b).

3 Perception System

This section illustrates the 3D perception system developed in this work and the segmentation of the acquired point cloud. This task is the core of the whole robotic system, since it allows the estimation of collision-free paths and the selection of navigation goals. The 3D perception system consists of a pair of Logitech C270 webcams and a Sick LMS100 planar laser scanner. The two cameras are fixed to a bar with approximately vertical orientation and a baseline of 12 cm in order to compose a stereo camera. Both the laser scanner and the camera are placed on a Pioneer 3DX mobile robot and are oriented towards the front of the robot. The stereo camera acquires colored point clouds that can be used to detect objects and protruding 3D shapes, but it has a limited field of view (about 20°) and returns rather sparse and noisy range measurements [14]. On the other hand, planar range finders are part of the standard sensor equipment since they allow an accurate representation of occupied regions on the scanning plane, but they are scarcely useful for object recognition. To overcome these limitations, sensor fusion is performed by accumulating the synchronized measurements into a single point cloud. Furthermore, the latest point clouds are aligned and registered in order to build a short-term, more extensive representation of the environment.

In the following, we present the sensor calibration, the processing and registration of point clouds, and the object detection components.

3.1 Calibration

The aim of the calibration procedure is the estimation of the intrinsic and extrinsic parameters that make the stereo camera and range finder work correctly. The intrinsic parameters are related to the camera optics and other device features and are rather stable over time. The extrinsic parameters encode the relative position among the devices. An accurate estimation of these parameters is required to compute the disparity image by matching homologous points between the two cameras and to achieve a consistent point cloud from the fusion of range finder and stereo camera measurements. In particular, sensor fusion is achieved by overlapping measurements expressed w.r.t. the same reference frame. Furthermore, the assessment of the sensor pose w.r.t. the robot reference frame influences the results of point cloud registration.

The estimation of the intrinsic and extrinsic parameters of stereo vision is a long-standing problem and several tools are available to perform calibration. In particular, we used the ROS package camera_calibration, which allows joint estimation of intrinsic and extrinsic parameters using a checkerboard of known size. The relative position and orientation of the laser scanner frame w.r.t. the robot can be measured manually with satisfactory accuracy. On the other hand, the relative pose between the stereo camera and the robot is difficult to assess and may easily change due to accidental bumps. To overcome this problem, a target marker has been fixed to a rigid bar so that its pose (marker) w.r.t. the robot frame (base_link) is stable. The transformation from base_link to marker can be manually measured with satisfactory accuracy. The pose of the marker w.r.t. the stereo camera frame (baseline12) is estimated with the ARToolkit library [15]. The rigid bar with the marker is shown in Figure 1(a). Thus, camera measurements can be straightforwardly transformed from the baseline12 to the base_link reference frame. Figure 1(b) shows all the frames used in this work. Hence, all the sensor data can be expressed w.r.t. the robot and overlapped.
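As an illustration, the following C++ sketch shows how a camera cloud could be moved into the robot frame using tf and PCL once the calibration transforms are published. This is not the authors' implementation: the function name is hypothetical, only the frame names base_link and baseline12 come from the paper, and error handling is omitted.

#include <ros/ros.h>
#include <tf/transform_listener.h>
#include <tf_conversions/tf_eigen.h>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/common/transforms.h>

// Transform a stereo cloud from the camera frame into the robot frame (sketch).
pcl::PointCloud<pcl::PointXYZRGB>::Ptr
toRobotFrame(const pcl::PointCloud<pcl::PointXYZRGB>::ConstPtr& camera_cloud,
             const tf::TransformListener& listener)
{
  // Look up the latest transform from the stereo camera frame to the robot base.
  tf::StampedTransform stamped;
  listener.lookupTransform("base_link", "baseline12", ros::Time(0), stamped);

  // Convert the tf transform to an Eigen affine transform and apply it to the cloud.
  Eigen::Affine3d affine;
  tf::transformTFToEigen(stamped, affine);

  pcl::PointCloud<pcl::PointXYZRGB>::Ptr robot_cloud(new pcl::PointCloud<pcl::PointXYZRGB>);
  pcl::transformPointCloud(*camera_cloud, *robot_cloud, affine.cast<float>());
  robot_cloud->header.frame_id = "base_link";
  return robot_cloud;
}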

In future work, we plan to add a plate on the rigid bar, under the marker, that can be observed by the range finder. The laser scanner would observe a segment of the target plate, thereby improving the estimation of the sensor orientation and position. Thus, the position and orientation of the laser w.r.t. the marker would be measured directly and more accurately by the range finder.

3.2 Point Cloud Processing

The sensor data acquired by the laser scanner and the stereo camera are combined into a single point cloud that is used to navigate and to detect objects in the scene. The depth measurements tend to be sparse, due to failures in homologous point matching, and to be limited by the narrow camera field of view. For these reasons, the latest k point clouds are registered and accumulated.

Algorithm 1 shows the main operations performed to obtain a consistent point cloud. First, the sensor data must be expressed in a single frame, the robot frame, in order to enable sensor fusion operations. Frame conversion is performed by applying to the point cloud the transformation matrix obtained in the calibration step. Due to the different acquisition rates, a software synchronization between the measurements is performed before merging them into a single consistent point cloud.

Then, three filtering operations are performed to clean the point cloud and remove noise from the camera data. The first one removes points with NaN values, corresponding to points without depth generated by the stereo algorithm. The second filtering operation aims at removing isolated points, which are likely due to noisy perception or algorithm failures, and is performed using statistical outlier removal (SOR). The last one removes points too close to the camera, which are thus more subject to acquisition noise.
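A minimal PCL sketch of this filtering chain is given below. The function name and the numeric thresholds are illustrative assumptions, not values reported in the paper.

#include <vector>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/filters/filter.h>              // removeNaNFromPointCloud
#include <pcl/filters/statistical_outlier_removal.h>
#include <pcl/filters/passthrough.h>

typedef pcl::PointCloud<pcl::PointXYZRGB> Cloud;

Cloud::Ptr filterCameraCloud(const Cloud::ConstPtr& in)
{
  // 1. Remove NaN points produced where stereo matching failed.
  Cloud::Ptr no_nan(new Cloud);
  std::vector<int> index;
  pcl::removeNaNFromPointCloud(*in, *no_nan, index);

  // 2. Statistical outlier removal: discard points far from their neighbourhood.
  Cloud::Ptr no_outliers(new Cloud);
  pcl::StatisticalOutlierRemoval<pcl::PointXYZRGB> sor;
  sor.setInputCloud(no_nan);
  sor.setMeanK(50);                 // neighbours used for the statistics (assumed)
  sor.setStddevMulThresh(1.0);      // threshold in standard deviations (assumed)
  sor.filter(*no_outliers);

  // 3. Discard points too close to the camera along the optical axis.
  Cloud::Ptr filtered(new Cloud);
  pcl::PassThrough<pcl::PointXYZRGB> pass;
  pass.setInputCloud(no_outliers);
  pass.setFilterFieldName("z");
  pass.setFilterLimits(0.5, 5.0);   // keep points between 0.5 m and 5 m (assumed)
  pass.filter(*filtered);
  return filtered;
}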

The corresponding laser scan point coordinates are transformed to the robot reference frame and processed for sensor fusion. Due to the lower amount of data and the simpler data structure, this operation is fast. The stereo camera point cloud and the laser point cloud are then merged by a sensor fusion operation, with priority given to the laser scan due to the higher accuracy of that sensor, to obtain a single point cloud.

Finally, the accumulation of the latest k point clouds is performed. The previously acquired point clouds are stored in a FIFO vector and aligned to the current point cloud at iteration t. It is sufficient to estimate the relative pose T^{t-1}_t of the robot at time t-1 w.r.t. the current robot pose at time t and to apply this transformation to all the clouds stored in a vector V. The computational complexity of the alignment is O(N), where N is the number of point clouds to be aligned. The estimation of T^{t-1}_t is achieved using a 3D variant of the Iterative Closest Point (ICP) algorithm, which is implemented in the Point Cloud Library (PCL) [16]. Before the execution of ICP, a regular spatial discretization is achieved by keeping the mean point of each occupied voxel. This operation reduces the computation time of ICP and reduces association errors due to an irregular distribution of points.
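The following sketch shows how this step could be realised with PCL's VoxelGrid and IterativeClosestPoint classes. The leaf size and the ICP settings are assumptions; the paper does not report the actual values.

#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/filters/voxel_grid.h>
#include <pcl/registration/icp.h>

typedef pcl::PointCloud<pcl::PointXYZRGB> Cloud;

Eigen::Matrix4f estimateRelativePose(const Cloud::ConstPtr& previous,
                                     const Cloud::ConstPtr& current)
{
  // Regularize the point distribution: keep one representative point per voxel.
  pcl::VoxelGrid<pcl::PointXYZRGB> grid;
  grid.setLeafSize(0.05f, 0.05f, 0.05f);   // 5 cm voxels (assumed)

  Cloud::Ptr prev_ds(new Cloud), curr_ds(new Cloud);
  grid.setInputCloud(previous);
  grid.filter(*prev_ds);
  grid.setInputCloud(current);
  grid.filter(*curr_ds);

  // Estimate the transform that aligns the previous cloud to the current one.
  pcl::IterativeClosestPoint<pcl::PointXYZRGB, pcl::PointXYZRGB> icp;
  icp.setInputSource(prev_ds);
  icp.setInputTarget(curr_ds);
  icp.setMaximumIterations(30);            // assumed
  icp.setMaxCorrespondenceDistance(0.2);   // metres, assumed

  Cloud aligned;
  icp.align(aligned);
  return icp.getFinalTransformation();     // transform from the previous to the current frame
}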

Next, the point clouds in V are added together to obtain a denser point cloud. This point cloud may cover a wider field of view, in particular when the robot changes its orientation. However, accumulation of unaligned point clouds would be deleterious, because it would overlap sensor data referring to different scenes. Therefore, before point cloud addition, the points of P^{t-i}_t and P^t_t are associated using the PCL implementation of the kd-tree algorithm. If the number of points that have a corresponding element in the other point cloud exceeds a given threshold, the two point clouds are fused; otherwise, this step is skipped. Eventually, the perception system has all sensor data stored in a single point cloud that is used for different purposes.
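A possible implementation of this overlap test is sketched below; the neighbour radius and the overlap ratio are assumed values, since the paper does not specify them.

#include <vector>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/kdtree/kdtree_flann.h>

typedef pcl::PointCloud<pcl::PointXYZRGB> Cloud;

// Count how many points of the aligned older cloud have a neighbour in the current
// cloud within a small radius; fuse only if the ratio exceeds a threshold.
bool sufficientOverlap(const Cloud::ConstPtr& aligned_old,
                       const Cloud::ConstPtr& current,
                       double radius = 0.05, double min_ratio = 0.3)
{
  if (aligned_old->empty() || current->empty())
    return false;

  pcl::KdTreeFLANN<pcl::PointXYZRGB> tree;
  tree.setInputCloud(current);

  std::vector<int> indices;
  std::vector<float> sq_dists;
  size_t matched = 0;
  for (size_t i = 0; i < aligned_old->size(); ++i) {
    if (tree.radiusSearch(aligned_old->points[i], radius, indices, sq_dists, 1) > 0)
      ++matched;
  }
  return static_cast<double>(matched) / aligned_old->size() >= min_ratio;
}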

The execution of all the steps of Algorithm 1 and of the sensor data acquisition requires about 6.44 s on a notebook equipped with an Intel Core i7-2620M at 2.7 GHz and 8 GB of RAM. Although planning tasks do not require a high update rate, such an execution time does not meet typical time constraints. Thus, point cloud processing is computationally too expensive for the on-board laptops or embedded CPUs commonly mounted on mobile robots. In order to reduce the complexity, the most expensive operations in Algorithm 1 have been identified and a simplified pipeline has been implemented. In particular, the execution time is almost halved by skipping statistical outlier removal. Furthermore, distributing acquisition and point cloud processing on two separate hosts has reduced the computation time below 1 s without significantly reducing the quality of the output.

Algorithm 1: Point Cloud Processing

Data: L: laser scan data; C: camera data with disparity
Result: P: point cloud obtained by sensor fusion

while ros::ok() do
    store L and C in point clouds PL and PC respectively;
    express PL and PC w.r.t. the robot frame;
    filter NaN values from PC;
    if outlier removal enabled then
        filter outlier values from PC using SOR;
    end
    filter PC values too close to the camera using PassThrough;
    Pt ← PL ∪ PC;                        /* spatial sensor fusion */
    if first point cloud processed then
        insert Pt into the point cloud vector V;
    else
        if V is full then
            pop the oldest point cloud;
        end
        compute the transformation matrix that aligns Pt to Pt-1;
        Ptot ← ∅;                         /* temporal sensor fusion */
        forall Pi in V do
            apply the transformation matrix to Pi;
            Ptot ← Ptot ∪ Pi;
        end
        P ← Ptot;
    end
end

3.3 Object Detection

The point cloud obtained by registration and sensor fusion in the previous step can be used to detect objects in the scene. In this context, objects correspond to point clusters lying on the ground plane and matching given size and shape conditions. Cloud segmentation allows the identification of candidate goals for robotic tasks and of interesting regions that are occluded in the current view. Algorithm 2 shows in pseudocode the main operations needed to identify objects.

Algorithm 2: Object Detection

Data: P: point cloud containing acquired sensor data
Result: CP: cluster point clouds; OP: object point cloud

while ros::ok() do
    if floor removal enabled then
        remove the floor from P;
    end
    extract clusters from P and store them in C;
    forall Ci in C do
        CPi ← ∅;
        OPi ← ∅;
        minDist ← ∞;
        forall points pi in Ci do
            dpi ← euclidean distance of pi;
            if minDist > dpi then
                minDist ← dpi;
            end
            push pi into OPi;
            color pi with the cluster color;
            push pi into CPi;
        end
        push CPi into CP;
        compute the dimensions of the cluster bounding box;
        if the cluster satisfies the user settings then
            OP ← OPi;
        end
    end
end

First of all, the dominant plane is removed using the SACSegmentation method included in PCL. Normally, the dominant plane is the ground plane on which the objects stand. The ground may not be detected when the point cloud is not dense enough or the camera orientation leaves out the floor. In order to reduce the processing time, it is possible to disable this function and use a plane removal that relies on a priori knowledge of the plane equation.
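A minimal sketch of this step using pcl::SACSegmentation and pcl::ExtractIndices follows; the inlier distance threshold is an assumed value.

#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/ModelCoefficients.h>
#include <pcl/segmentation/sac_segmentation.h>
#include <pcl/filters/extract_indices.h>
#include <pcl/sample_consensus/method_types.h>
#include <pcl/sample_consensus/model_types.h>

typedef pcl::PointCloud<pcl::PointXYZRGB> Cloud;

Cloud::Ptr removeDominantPlane(const Cloud::ConstPtr& in)
{
  pcl::ModelCoefficients::Ptr coefficients(new pcl::ModelCoefficients);
  pcl::PointIndices::Ptr inliers(new pcl::PointIndices);

  // Fit the dominant plane (normally the ground) with RANSAC.
  pcl::SACSegmentation<pcl::PointXYZRGB> seg;
  seg.setModelType(pcl::SACMODEL_PLANE);
  seg.setMethodType(pcl::SAC_RANSAC);
  seg.setDistanceThreshold(0.02);          // 2 cm inlier threshold (assumed)
  seg.setInputCloud(in);
  seg.segment(*inliers, *coefficients);

  // Keep everything except the plane inliers.
  Cloud::Ptr objects(new Cloud);
  pcl::ExtractIndices<pcl::PointXYZRGB> extract;
  extract.setInputCloud(in);
  extract.setIndices(inliers);
  extract.setNegative(true);
  extract.filter(*objects);
  return objects;
}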

Next, clusters of unconnected points are extracted from the point cloud using the euclidean cluster extraction method. A cluster is a set of points that are similar according to features and position in space. Only the relevant clusters remain after setting proper minimum and maximum numbers of points.

The main operations carried out by the cluster extraction method are the following (a PCL-based sketch is given after the list):

• initialization of an empty list C of clusters and an empty queue Q of points to analyze;

• for each point pi ∈ P, where P is the point cloud:

  – add pi to Q;

  – for each point pj ∈ Q:

    ∗ search the set Pj of neighbour points within a sphere of radius r < dth, where dth is a tolerance value;

    ∗ for each neighbour pk ∈ Pj, check whether it has already been processed; if not, add it to Q;

  – add Q to the cluster list C and reset Q to an empty queue;

• the procedure ends when all points pi ∈ P have been processed and inserted in the cluster list C.
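In practice, this procedure is provided by PCL. The sketch below uses pcl::EuclideanClusterExtraction; the cluster tolerance and the size limits are assumed values chosen for illustration.

#include <vector>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/search/kdtree.h>
#include <pcl/segmentation/extract_clusters.h>

typedef pcl::PointCloud<pcl::PointXYZRGB> Cloud;

std::vector<pcl::PointIndices> extractClusters(const Cloud::ConstPtr& cloud)
{
  // Spatial search structure used for the neighbourhood queries.
  pcl::search::KdTree<pcl::PointXYZRGB>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZRGB>);
  tree->setInputCloud(cloud);

  std::vector<pcl::PointIndices> clusters;
  pcl::EuclideanClusterExtraction<pcl::PointXYZRGB> ec;
  ec.setClusterTolerance(0.05);    // d_th: 5 cm neighbour radius (assumed)
  ec.setMinClusterSize(100);       // discard tiny clusters (assumed)
  ec.setMaxClusterSize(25000);     // discard the background (assumed)
  ec.setSearchMethod(tree);
  ec.setInputCloud(cloud);
  ec.extract(clusters);            // one PointIndices entry per cluster
  return clusters;
}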

For each extracted cluster, a dedicated point cloud containing only the cluster points is created. Moreover, the euclidean distance between a cluster and the robot is calculated as d = √(x_p² + y_p² + z_p²), where (x_p, y_p, z_p) are the coordinates of the cluster point closest to the robot. The dimensions of the bounding box containing the cluster are also calculated to estimate the object size. If the object size is compatible with the dimensions defined by the user, the cluster is identified as a candidate object of interest. Otherwise, the next cluster is processed. The output of the simple object detection component is a list of point clouds along with their centroid positions and bounding boxes, which can be used by the path planner to compute a goal configuration.
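The per-cluster quantities described above can be computed, for instance, as in the following sketch; the structure and function names are hypothetical.

#include <cmath>
#include <limits>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/common/common.h>      // getMinMax3D
#include <pcl/common/centroid.h>    // compute3DCentroid

typedef pcl::PointCloud<pcl::PointXYZRGB> Cloud;

struct ClusterInfo {
  Eigen::Vector4f centroid;
  pcl::PointXYZRGB bbox_min, bbox_max;
  double min_distance;              // distance of the closest cluster point to the robot
};

ClusterInfo describeCluster(const Cloud& cluster)
{
  ClusterInfo info;
  pcl::compute3DCentroid(cluster, info.centroid);
  pcl::getMinMax3D(cluster, info.bbox_min, info.bbox_max);

  info.min_distance = std::numeric_limits<double>::max();
  for (size_t i = 0; i < cluster.size(); ++i) {
    const pcl::PointXYZRGB& p = cluster.points[i];
    // Points are expressed in the robot frame, so the norm is the distance to the robot.
    double d = std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z);
    if (d < info.min_distance) info.min_distance = d;
  }
  return info;
}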

4 Navigation

The main purpose of 3D perception is to enable safe navigation for the mobile robot. Several works presented in Section 2 reduce the role of 3D perception in navigation to a projection on the ground plane. In this way, however, 3D obstacle detection incurs a worst-case scenario. In the approach described in this paper, a 3D object is projected on the ground plane only if a part of the robot collides with the obstacle in at least one orientation. Thus, the ability to navigate through enclosed passages (such as under a table) is retained, while ensuring safety and efficient planning using a local 2D map. For example, the robot used in our experiments, shown in Figure 4, is equipped with a fork lift to grab specific objects and must avoid collisions with hanging parts. The navigation behaviour can be divided into two tasks: construction of a local map, to identify occupied areas, and path planning and navigation, to compute and execute a safe path to the goal.

The task considered in this work is the detection and complete observation of objects in the scene, rather than the construction of a complete representation of the whole explored environment. Object categories of interest include chairs, tables, furniture and small human artifacts that can be moved. Hence, a global map is not required in this case, due to the presence of ephemeral information.

The first step of the implemented navigation behaviour is the projection on a plane of the point cloud obtained in the previous tasks. In fact, although 3D perception describes the distribution of obstacles in space, the mobile robot moves on a plane (the ground) and a 3D local map is not required. However, the free space should not be unnecessarily restricted by projecting all points. Points above the maximum height of the robot are not considered, while different rules may be defined by identifying different layers and footprints along the robot height. Cells around every obstacle are marked as inflated obstacles, since the mobile robot is approximated as a point. One contribution of this paper, thus, is to limit the restriction of the free space, even though the construction of the local map is performed through a standard component like the costmap_2d node.
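The height-based selection can be sketched as follows; the ground clearance and the robot height are assumptions, not the parameters of the actual platform.

#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/filters/passthrough.h>

typedef pcl::PointCloud<pcl::PointXYZRGB> Cloud;

// Keep only points that could actually hit the robot body, then drop z so that
// the remaining points can feed a planar cost map.
Cloud::Ptr projectCollidingPoints(const Cloud::ConstPtr& in,
                                  double ground_clearance = 0.05,  // assumed
                                  double robot_height = 1.0)       // assumed
{
  // Keep only points between the ground clearance and the top of the robot.
  Cloud::Ptr band(new Cloud);
  pcl::PassThrough<pcl::PointXYZRGB> pass;
  pass.setInputCloud(in);
  pass.setFilterFieldName("z");
  pass.setFilterLimits(ground_clearance, robot_height);
  pass.filter(*band);

  // Project the remaining points on the ground plane.
  for (size_t i = 0; i < band->size(); ++i)
    band->points[i].z = 0.0f;
  return band;
}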

For correct planning, it is necessary to construct a short-term memory local map that integrates the sensor data acquired in the last time interval. This map is obtained by registration of the point clouds projected on the motion plane. The point clouds are previously aligned as described in Section 3.2. Odometry provides further information used in the creation of the local map.

The local map (Figure 2) is organized as an expanded occupancy map to avoid collisions. The map creation process is performed by the costmap_2d node, included in the move_base ROS package.

Figure 2: Example of short-term memory local map shown using Rviz.

5 Application: viewpoint planning

In this section, a viewpoint planning application of the developed system is presented. In particular, the 3D perception system has been used to compute a trajectory allowing the mobile robot to acquire different views of an object in the scene. In cluttered environments, objects are often responsible for occlusions. Hence, the proposed application aims at detecting possible new viewpoints beyond the closest object. The system selects the target object using two criteria:

• dimensions: the selected object must have dimensions within the ranges set by the user;

• distance: the selected object is the one closest to the robot.

The selected object is approximated by its centroid. From this value, the goal configuration is set at a certain distance, selected by the user. In order to see the object from the rear, the goal orientation is set opposite w.r.t. the current robot orientation, as shown in Figure 3. In particular, if (x_c, y_c, z_c) are the coordinates of the object centroid, d_c is the object depth and x_0 is the user-defined distance, the goal coordinates are set as [x_c + d_c + x_0, y_c, z_c]^T and the orientation is towards the initial position of the robot.
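A sketch of this goal computation as a ROS pose message is given below. The function name and the frame are assumptions; only the coordinate formula and the opposite orientation come from the paper.

#include <cmath>
#include <geometry_msgs/PoseStamped.h>
#include <tf/transform_datatypes.h>

// Place the goal behind the selected object along the robot's forward axis and
// orient it back towards the initial robot position.
geometry_msgs::PoseStamped computeViewpointGoal(double xc, double yc, double zc, // centroid
                                                double dc,                       // object depth
                                                double x0)                       // user-defined distance
{
  geometry_msgs::PoseStamped goal;
  goal.header.frame_id = "base_link";   // assumed frame

  goal.pose.position.x = xc + dc + x0;
  goal.pose.position.y = yc;
  goal.pose.position.z = zc;

  // Opposite orientation w.r.t. the current robot heading (yaw = pi).
  goal.pose.orientation = tf::createQuaternionMsgFromYaw(M_PI);
  return goal;
}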

After setting the goal, a local trajectory planner computes the best path to reach it, considering the short-term memory local map constructed in the previous step. Motion planning is performed by the nav_core node, included in the ROS navigation package [1]. This node provides both local and global planners that use odometry to estimate the robot position. However, in this work only the local planner has been used. The local trajectory planner uses an implementation of the Dynamic Window algorithm [17], whose main execution steps are shown in Algorithm 3.

Algorithm 3: Main execution steps of the DWA algorithm

Data: M: short-term memory local map
Result: C: velocity commands for the robot motors

while true do
    discretely sample the robot control space (dx, dy, dθ);
    forall sampled velocities do
        perform a forward simulation from the robot's current state to predict what would happen if the sampled velocity were applied for a (short) period of time;
        if the resulting trajectory crosses an obstacle then
            discard the trajectory;
        else
            assign a score to the trajectory depending on its proximity to obstacles, to the goal and to the global path, and on the speed;
        end
    end
    pick the highest-scoring trajectory and send the associated velocity to the mobile base;
end

The planner produces as output the linear and angular speed values to be sent to the mobile robot to perform the movement. If the goal is unreachable, due to the absence of a free path to reach it, the planner fails and the robot stops its motion. The robot also stops when it reaches the goal position with the correct orientation. In both cases, the robot can move on to the next task, such as lifting the object or searching for other interesting entities in the environment.

6 Experiments

Figure 3: View of the complete navigation system: the short-term memory local map, the detected object enclosed in a black bounding box, the goal configuration shown as a yellow arrow, and the planned trajectory shown in green.

Figure 4: Robot platform used in the experiments (a) and experimental environment (b).

Figure 4(a) shows the Pioneer 3DX wheeled robot used in our experiments. The robot is equipped with a Sick LMS100 laser scanner and a stereo camera consisting of two Logitech C270 cameras mounted in an aluminium case. The system uses ROS (Robot Operating System), an open source framework that includes several tools and libraries for robot programming. In this work, the costmap_2d and nav_core standard nodes have been used for short-term memory map computation and navigation behaviour, respectively. The values of the main parameters used in our experiments are shown in Table 1 and Table 2.

Point cloud processing is subdivided between two nodes. The cloud_processing node is entrusted with sensor fusion and point cloud registration, while the cloud_clustering node is responsible for point cloud segmentation, object detection and viewpoint planning. The second node depends on the result of the first one; in fact, the same point cloud, processed by the cloud_processing node, is used for both navigation and object detection. Before starting an experiment, the calibration bar described in Section 3 is mounted on the robot. After a few seconds, the calibration application is able to estimate the position and orientation of the left camera. The rigid bar is then unmounted and the calibration parameters are published to the other nodes.

Table 1: Main parameters used in the costmap_2d node

Parameter                   Value
obstacle_range              5.0 m
raytrace_range              5.0 m
inflation_radius            0.24 m
observation_sources         point_cloud_sensor
sensor_frame                base_link
data_type                   PointCloud2
topic                       /cloud
marking                     true
clearing                    true
observation_persistence     5.0 s
expected_update_rate        5.0 Hz
global_frame                /odom
robot_base_frame            base_link
update_frequency            5.0 Hz
publish_frequency           5.0 Hz
static_map                  false
rolling_window              true
width                       6.0 m
height                      6.0 m
resolution                  0.05 m

Table 2: Main parameters used in the nav_core node

Parameter                     Value
max_vel_x                     0.2 m/s
min_vel_x                     0.1 m/s
max_rotational_vel            1.0 rad/s
min_in_place_rotational_vel   0.4 rad/s
acc_lim_th                    3.2 m/s^2
acc_lim_x                     2.5 m/s^2
acc_lim_y                     2.5 m/s^2
holonomic_robot               true

Experiments were carried out in an indoor cluttered environment comprising tables, packages, chairs, and other kinds of objects. The mobile robot was positioned in the center of the room (Figure 4(b)). The closest objects were at a distance of about 1.5 m, since the environment used for the experiments was cluttered. Initial tests showed that the sensor data acquired from the stereo camera are quite noisy, especially the disparity image, so at least part of the filtering operations described in Section 3.2 is required. Laser scanner data, instead, are very accurate, so no filtering was applied.

The robot tends to choose the objects in front of it due to the constrained field of view. In such a cluttered environment it is likely to find a matching object in front. If the robot does not find any object, the search continues in other regions and directions, looking for farther objects. After having identified the object, the mobile robot reaches the region occluded by the object by planning and executing a trajectory. The robot reaches the back of the target object while avoiding obstacles, including those outside the scanning plane of the laser scanner, thereby proving the usefulness of 3D perception.

A sequence of images referring to a trial is shown in Figure 5. A black bar (shown on the left in the images) blocks robot navigation on one side. This bar cannot be perceived by the laser scanner, because it is above its scanning plane, whereas it is correctly perceived by the stereo camera. The robot uses this information to compute an occupancy grid suitable for safe navigation. Indeed, as shown in the figure, the robot reaches the object from the opposite free side.

Figure 5: An experimental test (left to right, top to bottom). The robot keeps to the left because on the right side of the obstacle a bar protrudes at a height that may lead to a collision.

Several experiments have been performed in order to test the effectiveness and robustness of the proposed system. For quantitative analysis, two controlled series of ten trials each have been carried out in an environment similar to the one in Figure 5. In the first set of trials the whole perception, planning and navigation application was run on a single laptop on board the robot (Intel Core i7-2620M at 2.7 GHz, 8 GB RAM). The results are shown in Table 3 (row (a)). The robot successfully accomplished the exploration and planning task in 6 trials out of 10, with an average task execution time of about 42 s for the successful trials. Failures are due to the computational complexity of 3D perception data processing. Indeed, the robot requires a new command before a time-out elapses, otherwise it stops and performs a shaky motion. When the new command is not received, the odometry error increases due to the irregular motion, thereby affecting the alignment of sensor data. Moreover, the navigation algorithm relies on the occupancy grid map representing obstacles and unknown space: a temporal mismatch between the current map and the aligned point cloud may thus result in map inconsistencies.

A second set of trials has been performed by splitting the computational load between the on-board laptop and a more powerful workstation with an Intel Core i7-3770 at 3.6 GHz and 8 GB of RAM. In particular, the on-board laptop handles the acquisition of sensor data and the execution of robot motion, while the workstation performs the expensive point cloud processing task. Results for the second set of trials are shown in Table 3 (row (b)).

Table 3: Experimental results for the reference task: success rate and task execution time. Processing: (a) fully on-board, (b) distributed.

        Trials   Success Rate   Execution Time [s]
                                Mean     St. Dev.
(a)     10       60%            42.67    2.33
(b)     10       80%            30.69    2.35

In this case, the system succeeds in accomplishing the task in 8 out of 10 trials. Some failures still occur due to the computationally expensive 3D processing, thereby emphasizing the need for powerful computation for advanced sensor data processing. In the second set of trials, the time required to reach the target configuration is shorter, since the robot stops less frequently to wait for incoming commands.

7 Conclusion

In this paper, we have presented an integrated system for viewpoint planning and mobile robot exploration using 3D perception. The system acquires both laser scans and disparity images, and accumulates the sensor data into a single point cloud. The last k point clouds are registered to achieve a better representation of the scene. The 3D data are used both to build an extended occupancy grid map, which takes into account obstacles at different heights and not only on a few planes, and to detect objects and interesting occluded regions, allowing the robot to perform safe goal-oriented navigation in the real scene.

Objects are detected by partitioning the point cloud into loosely connected components that meet requirements on size and shape. A specific object of interest is selected as a goal, and the algorithm computes a viewpoint for achieving its complete observation. A path is planned to reach the goal using the local map, and the robot moves toward the goal configuration.

The developed 3D perception and navigation system has been tested with real-time sensor data acquisition on a physical robot. The experiments have proved its effectiveness in detecting relevant objects on the ground plane, in obtaining a short-term local map of the environment, in planning a collision-free path and in reaching the defined goal. Furthermore, they have shown the relevance of computation time constraints for the correct execution of a navigation task. Future work will address applications exploiting 3D perception, such as global map computation or object classification based on multiple acquired viewpoints.

Acknowledgment

The authors wish to thank the reviewers for their constructive suggestions about how to improve the paper.

References

[1] E. Marder-Eppstein, E. Berger, T. Foote, B. Gerkey, and K. Konolige, "The Office Marathon: Robust navigation in an indoor office environment," in Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 2010, pp. 300-307.

[2] A. Oliver, S. Kang, B. Wunsche, and B. MacDonald, "Using the Kinect as a Navigation Sensor for Mobile Robotics," in Proc. of the Conf. on Image and Vision Computing New Zealand (IVCNZ), 2012, pp. 509-514.

[3] B. Lau, C. Sprunk, and W. Burgard, "Incremental Updates of Configuration Space Representations for Non-Circular Mobile Robots with 2D, 2.5D, or 3D Obstacle Models," in Proc. of the European Conference on Mobile Robots (ECMR), 2011.

[4] A. Hornung, M. Phillips, E. Jones, M. Bennewitz, M. Likhachev, and S. Chitta, "Navigation in Three-dimensional Cluttered Environments for Mobile Manipulation," in Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 2012, pp. 423-429.

[5] M. Quigley, S. Batra, S. Gould, E. Klingbeil, Q. Le, A. Wellman, and A. Ng, "High-accuracy 3D sensing for mobile manipulation: Improving object detection and door opening," in Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 2009, pp. 2816-2822.

[6] K. Sjöö, A. Aydemir, and P. Jensfelt, "Object search on a mobile robot using relational spatial information," in Proc. of the Int. Conf. on Intelligent Autonomous Systems (IAS), 2010.

[7] J. Velez, G. Hemann, A. Huang, I. Posner, and N. Roy, "Active exploration for robust object detection," in Proc. of the Int. Joint Conf. on Artificial Intelligence (IJCAI), 2011, pp. 2752-2757.

[8] J. Cunha, E. Pedrosa, C. Cruz, A. Neves, and N. Lau, "Using a Depth Camera for Indoor Robot Localization and Navigation," in RSS Workshop on RGB-D: Advanced Reasoning with Depth Cameras, 2011.

[9] R. Bostelman, T. Hong, and R. Madhavan, "Towards AGV safety and navigation advancement: obstacle detection using a TOF range camera," in Proc. of the 12th Int. Conf. on Advanced Robotics (ICAR), 2005, pp. 460-467.

[10] J. Biswas and M. Veloso, "Depth Camera Based Indoor Mobile Robot Localization and Navigation," in Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), 2012.

[11] D. Maier, A. Hornung, and M. Bennewitz, "Real-Time Navigation in 3D Environments Based on Depth Camera Data," in Proc. of the Int. Conf. on Humanoid Robots (HUMANOIDS), 2012.

[12] B. Douillard, D. Fox, and F. Ramos, "Laser and vision based outdoor object mapping," in Proc. of Robotics: Science and Systems (RSS), 2008.

[13] A. Aydemir, M. Göbelbecker, A. Pronobis, K. Sjöö, and P. Jensfelt, "Plan-based Object Search and Exploration using Semantic Spatial Knowledge in the Real World," in Proc. of the European Conference on Mobile Robots (ECMR), 2011.

[14] F. Oleari, D. Lodi Rizzini, and S. Caselli, "A Low-Cost Stereo System for 3D Object Recognition," in Proc. of the Int. Conf. on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, Sept 2013.

[15] H. Kato and M. Billinghurst, "Marker tracking and HMD calibration for a video-based augmented reality conferencing system," in Proc. of the Int. Workshop on Augmented Reality, 1999.

[16] R. Rusu and S. Cousins, "3D is here: Point Cloud Library (PCL)," in Proc. of the IEEE Int. Conf. on Robotics & Automation (ICRA), Shanghai, China, May 9-13 2011.

[17] D. Fox, W. Burgard, and S. Thrun, "The Dynamic Window Approach to Collision Avoidance," IEEE Robotics & Automation Magazine, March 1997.
