


Countering Drift in Visual Odometry for Planetary Rovers by Registering Boulders in Ground and Orbital Images

Emmanouil Hourdakis and Manolis Lourakis

Abstract— Visual Odometry (VO) is a proven technology for planetary exploration rovers, facilitating their localization with a small error over medium-sized trajectories. However, due to VO's incremental mode of operation, its estimation error accumulates over time, resulting in considerable drift for long trajectories. This paper proposes a global localization method that counters VO drift by matching boulders extracted from overhead and ground images and using them periodically to re-localize the rover and refine VO estimates. The performance of the proposed method is evaluated with the aid of overhead imagery of different resolutions. Experimental results demonstrate that a very terse representation, consisting of approximate boulder locations only, suffices for significantly improving the accuracy of VO over long traverses.

I. INTRODUCTION

Exploration of planets, and especially of Mars, has in recent years been among the main objectives of missions planned by space agencies like ESA and NASA. These missions include activities such as the characterization of geological material, the detection of water presence, the identification of climate change or the assessment of human habitability potential. Autonomous rovers are indispensable to planetary science, facilitating both in situ exploration and selective collection of samples to be returned to Earth. Consequently, planetary rovers should possess advanced mobility capabilities, which in turn leads to the requirement that they should be able to accurately localize themselves over long traverses using sensory input and on-board processing.

To support this functionality, vision-based approaches have been extensively used to determine a rover's position using on-board cameras [1]. In particular, research has focused on variations of the VO paradigm [2], [3], which refers to the process of determining the position and orientation (i.e. pose) of a camera rig by successively analyzing the images it acquires over time. Localization in an extraterrestrial environment cannot rely on features such as line segments or special structures such as planar surfaces, which abound in terrestrial man-made urban settings. Martian terrain, for instance, is dusty and has variable morphology, being occasionally covered with self-occluding boulders and geological elements of similar luminance. To account for this, planetary VO relies on naturally occurring point features defined by the local image content.

An inherent drawback of VO is that it determines camera pose for each frame in an incremental fashion, therefore it is sensitive to errors that accumulate over time. Furthermore, VO estimates the position of the rover relative to its starting location. While this might suffice for small to medium-sized traverses, complex planetary missions that involve longer traverses demand that the rover's position be expressed in an absolute coordinate system. Such a global localization capacity [4] will allow future missions to overcome potential landing uncertainties introduced by localization errors during descent, and determine the position of a site using a planet's cartographic reference system. Apart from planetary rovers, this capacity can also be employed in GPS-denied environments on Earth.

The authors are with the Institute of Computer Science, Foundation for Research and Technology - Hellas, N. Plastira 100, Vassilika Vouton, GR 700 13, Heraklion, Greece. Work partially supported by the ESA SPARTAN extension activity SEXTANT, ref. 4000103357/11/NL/EK. Images used for evaluation are courtesy of Marcos Avilés.

This paper proposes a global localization method that is able to determine the pose of a rover by matching boulders among images obtained from its on-board cameras and orthoimages pre-acquired by the planet's orbiters. Localizing a robot outdoors using prior knowledge of the environment has been addressed by several authors, e.g. see [5] and references therein. These approaches, however, are tailored to terrestrial environments and employ features such as roads, line segments and building footprints or sensors such as GPS and LIDAR that are only found on Earth. Thus, the primary contribution of the proposed method is to demonstrate that localization improvements can be based on a compact representation derived from common geological structures such as boulders. Due to its reliance on mean statistics to describe boulders and the use of geometrical information to derive position estimates, the method is resilient to changes in illumination and ground morphology. Experiments demonstrate that the rover can localize itself over long distances, with less than 1% positional error over the entire traverse.

II. PREVIOUS WORK

VO for planetary rovers has demonstrated important breakthroughs over the last decade, yielding fairly accurate localization results for small and medium-sized traverses (see [6] for a review of the algorithms implemented on NASA's Mars Exploration Rovers (MERs) and [3] for our recent proposal). Even though VO can compute highly accurate relative motion estimates, it suffers from drift errors, which accumulate over time and cannot be recovered from in the absence of more elaborate knowledge of the environment. One way to rectify those errors is to employ a visual simultaneous localization and mapping (VSLAM) approach. VSLAM is the process of incrementally updating a map of an unknown environment whilst localizing the rover within the said map [7]. However, VSLAM techniques do not scale well with the number of feature matches, thus are quite demanding for the limited computational resources available on a planetary rover (for example, the radiation-hardened RAD6000 and RAD750 flight processors, respectively on-board the MERs and Curiosity, are capable of executing up to only 35 and 400 MIPS). Furthermore, VSLAM is more effective when a vehicle closes loops, which is not common for a planetary rover. Another possibility is the so-called windowed bundle adjustment [8], which performs a local optimization over a few most recent frames. Still, even that is challenging for the computational capacity of current rovers.

To mitigate the problems due to the accumulated VO error at a manageable computational cost, researchers have resorted to using additional, non-visual sensors or prior information regarding the environment. In [9], for example, orientation sensors on-board the rover were used to correct the VO orientation error and eventually reduce the drift to a linear function of the distance traveled. Sparse bundle adjustment and an Inertial Measurement Unit (IMU) were employed in [10] to correct VO drift. Other works combine a LIDAR sensor with geo-referenced Digital Elevation Maps (DEMs), in order to align orbital and ground images [11]. Similarly, [12] creates a DEM using the rover's stereo images, which is then matched against the DEM obtained from an orbiter. Localization in [13] is based on matching a DEM built by the rover against an orbital terrain map, combined with aligning the horizon curve detected by the rover with that rendered from the orbital DEM. In [14], panoramic ground images are warped to obtain bird's eye views that are then matched to satellite images with standard patch descriptors. Despite achieving remarkable localization results, the aforementioned methods rely on information that is not always readily available on planetary rovers. For example, DEMs must be reconstructed from images of the planet's surface, and require considerable storage capacity as well as special hardware such as GPUs to be rendered fast [13]. Moreover, in contrast to stereo cameras which are already being used by rovers for non-navigation purposes (e.g. to discern geological information), the use of a LIDAR sensor would require additional installations, increasing the vehicle's weight, complexity and power consumption. More relevant to this work, [4] performs localization by matching networks of regions of interest that are extracted from orbital and ground images using a classification scheme.

III. PROPOSED METHOD

Our proposed localization method uses as features boulders from orthorectified aerial images of a planet's surface (a.k.a. orbital images). The locations of these boulders can be known accurately via offline processing of images acquired from orbit. While traversing the ground, the rover extracts boulders from its images and attempts to match them against the overhead ones. Upon a successful match, the rover can update its VO estimate, thereby reducing drift. In this sense, boulders serve as known landmarks allowing the rover to relocalize itself with respect to them. The aforementioned procedure is implemented by a pipeline which (i) estimates local relative displacement using VO, (ii) derives a statistical descriptor for boulders in the ground and orbital images, (iii) registers the ground and orbital images by estimating the transformation that minimizes a misalignment error defined on boulder locations and (iv) re-initializes VO using the estimated ground-orbital transformation. The following subsections present each step in detail; a high-level sketch of the overall loop is given below.

Fig. 1: Different snapshots from the ground stereo sequence, showing regions with mineral deposits and soil, occlusion between boulders and shadows.
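To make the four-stage loop concrete, the following Python sketch shows how the stages could be wired together. The paper does not publish code (its implementation is in Matlab), so the stage implementations are injected as callables with hypothetical names (vo_step, detect_boulders, register, correct); only the 30-frame refinement interval comes from the experiments of Section IV.

```python
import numpy as np

def localization_loop(stereo_stream, orbital_boulders, pose0,
                      vo_step, detect_boulders, register, correct,
                      refine_every=30):
    """Drive stages (i)-(iv). The stage implementations are supplied as
    callables; refine_every=30 matches the interval used in Section IV."""
    pose = np.asarray(pose0)        # rover pose in the orbital frame (4x4)
    local_map = []                  # ground boulder centroids since last fix
    for t, (left, right) in enumerate(stereo_stream, start=1):
        pose = pose @ vo_step(left, right)                    # (i) incremental VO
        local_map.extend(detect_boulders(left, right, pose))  # (ii) boulders
        if t % refine_every == 0 and local_map:
            T = register(np.array(local_map), orbital_boulders)  # (iii) ICP
            pose = correct(pose, T)                           # (iv) re-init VO
            local_map = []
    return pose
```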

Even though high resolution imagery from the HiRISE instrument on the Mars Reconnaissance Orbiter [15] and the Martian rovers' stereo cameras is publicly available, no public dataset exists that includes information from both these sources for the same traverse. Consequently, to evaluate our method, we employed a set of synthetic images that correspond to a terrain with morphological properties similar to those found in a Martian environment. The dataset consists of a stereo image sequence with accurate ground truth and an orthoimage depicting an area that includes the ground traverse, as would be seen from a high-flying orbiter. The dataset simulates a Mars-like surface featuring rocks of various sizes and their shadows, as well as sand and soil segments (see Fig. 1).

A. Visual Odometry

A feature-based VO pipeline is assigned the task of providing local motion estimates. It assumes that the rover's imaging system consists of two parallel forward-looking cameras, a configuration that eliminates the well-known depth/scale ambiguity [16] and simplifies the estimation of motion when the rover is stationary [17]. VO employs SIFT for feature detection & matching and absolute orientation on reconstructed sparse 3D point clouds for motion estimation. More details on our VO pipeline are given next.

1) Feature detection and matching: SIFT is a popular technique for extracting point features and their descriptors [18]. SIFT features are detected as scale-space extrema and then described using weighted spatial histograms of gradient orientations. Considering that planetary rovers do not move fast, successive images are not expected to differ considerably, thus scale-invariance of features is not essential. Therefore, in this work, the computationally expensive scale-space filtering for determining feature locations is avoided and SIFT features are detected at a single scale. For each of these features, a SIFT descriptor capturing the distribution of orientations in a region around it is computed. For increased efficiency, SIFT descriptors are computed without orientation normalization according to the dominant gradient direction.
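As an illustration, the sketch below approximates this scheme with OpenCV: single-scale detection is emulated with a corner detector, and SIFT descriptors are computed at keypoints with a fixed size and the orientation pinned to zero, so neither scale-space search nor dominant-orientation normalization takes place. The patch size and corner-detector parameters are illustrative assumptions, not values from the paper.

```python
import cv2

def single_scale_sift(gray, patch_size=16.0, max_corners=800):
    """Detect features at a single scale and describe them with SIFT
    descriptors whose orientation is fixed to zero."""
    pts = cv2.goodFeaturesToTrack(gray, max_corners,
                                  qualityLevel=0.01, minDistance=7)
    # Fixed size and angle: SIFT.compute() describes the keypoints as
    # given, so no scale-space search or orientation assignment occurs.
    kps = [cv2.KeyPoint(float(x), float(y), patch_size, 0.0)
           for [[x, y]] in pts]
    sift = cv2.SIFT_create()
    kps, desc = sift.compute(gray, kps)
    return kps, desc
```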

Prior to estimating 3D motion, the 2D motion of image points has to be determined by matching them across images. Point matching proceeds according to the standard distance ratio matching strategy [18], as follows. Given an image pair, matches are identified by finding the two nearest neighbors of each interest point from the first image among those in the second, and only accepting a match if the distance to the closest neighbor is less than a fixed fraction of that to the second closest neighbor. This fraction can be raised to select more matches or lowered to select only the most reliable. Distances among SIFT descriptors are usually computed with the Euclidean (L2) norm. Here, higher quality matches are obtained by substituting L2 with the χ² distance which originates from the χ² test statistic [19]. This is a histogram distance that takes into account the fact that in many natural histograms, the difference between large bins is less important than the difference between small bins and should therefore be reduced. Compared to more elaborate distances, the use of the χ² distance has been found to offer a good performance / computational cost trade-off.
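In code, the ratio test with the χ² distance can look as follows. This is a minimal NumPy sketch; the 0.8 ratio is an assumed value in the spirit of [18], as the paper does not report the fraction it uses.

```python
import numpy as np

def chi2_distances(d, D, eps=1e-10):
    """Chi-squared distance of descriptor d to every row of D:
    0.5 * sum_i (d_i - D_i)^2 / (d_i + D_i); valid since SIFT
    descriptors are nonnegative histograms."""
    diff = D - d
    return 0.5 * np.sum(diff * diff / (D + d + eps), axis=1)

def ratio_match(desc1, desc2, ratio=0.8):
    """Keep a match only if the closest neighbor is clearly better
    (by the given fraction) than the second closest."""
    matches = []
    for i, d in enumerate(desc1):
        dist = chi2_distances(d, desc2)
        j1, j2 = np.argsort(dist)[:2]
        if dist[j1] < ratio * dist[j2]:
            matches.append((i, j1))
    return matches
```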

2) Motion estimation: Using the feature detection and matching techniques described above, a number of points are detected and matched between the two stereo views at a certain time instant t. Knowledge of the stereo calibration parameters allows the estimation via triangulation of the 3D points giving rise to the matched image projections. Triangulation recovers 3D points as the intersections of back-projected rays defined by the matched image projections and the camera centers. Considering that there is no guarantee that back-projected rays will actually intersect in space (i.e., they may be skew), matched image points should be refined prior to triangulation so as to exactly satisfy the underlying epipolar geometry. This is achieved by computing the points on the epipolar lines that are closest to the original ones, using the Sampson approximation of the distance of points to epipolar lines [20].
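A compact OpenCV sketch of this step is shown below. Note that cv2.correctMatches applies an exact epipolar correction, used here as a stand-in for the cheaper Sampson approximation [20] employed in the paper; the projection matrices P1, P2 and fundamental matrix F are assumed to come from the stereo calibration.

```python
import cv2
import numpy as np

def triangulate_stereo(P1, P2, F, pts1, pts2):
    """Refine matched pixels to satisfy the epipolar geometry of F,
    then triangulate with 3x4 projection matrices P1, P2.
    pts1, pts2: Nx2 arrays of matched pixel coordinates."""
    # Exact correction onto the epipolar lines (stand-in for Sampson).
    p1, p2 = cv2.correctMatches(F, pts1.reshape(1, -1, 2),
                                   pts2.reshape(1, -1, 2))
    # Linear triangulation of the corrected correspondences.
    X = cv2.triangulatePoints(P1, P2, p1.reshape(-1, 2).T,
                                       p2.reshape(-1, 2).T)
    return (X[:3] / X[3]).T          # Nx3 Euclidean 3D points
```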

As the two cameras move, detected feature points are tracked over time in both stereo views from time t to t+1. By triangulating the tracked points in time t+1, the camera system motion can be computed as the rigid transformation bringing the 3D points of time t in agreement with those obtained at time t+1. Determining the rigid body transformation that aligns two matched 3D point sets is known as the absolute orientation problem. Its solution involves the minimization of the mean squared error between the point sets under the sought transformation and is expressed analytically with the aid of unit quaternions [21]. Since incorrectly matched points caused by phenomena such as occlusions, depth discontinuities, repetitive patterns, etc. cannot be completely avoided in descriptor-based matching, care must be taken so that their influence on the estimation of motion is limited. This is achieved by embedding the estimation of motion in a robust regression framework based on RANSAC [22], which ensures the detection and elimination of spurious matches.
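A sketch of the unit-quaternion absolute orientation solution [21], wrapped in a basic RANSAC loop [22], is given below; the inlier threshold (in meters) and iteration count are illustrative assumptions.

```python
import numpy as np

def horn_absolute_orientation(A, B):
    """Rigid (R, t) minimizing mean squared error ||R a + t - b|| over
    matched Nx3 point sets A, B, via Horn's unit-quaternion solution."""
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    S = (A - ca).T @ (B - cb)                 # 3x3 cross-covariance
    # Horn's symmetric 4x4 matrix N built from S.
    tr = np.trace(S)
    d = np.array([S[1, 2] - S[2, 1], S[2, 0] - S[0, 2], S[0, 1] - S[1, 0]])
    N = np.empty((4, 4))
    N[0, 0] = tr
    N[0, 1:] = N[1:, 0] = d
    N[1:, 1:] = S + S.T - tr * np.eye(3)
    w, V = np.linalg.eigh(N)
    qw, qx, qy, qz = V[:, -1]                 # eigenvector of largest eigenvalue
    R = np.array([
        [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - qz*qw),     2*(qx*qz + qy*qw)],
        [2*(qx*qy + qz*qw),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - qx*qw)],
        [2*(qx*qz - qy*qw),     2*(qy*qz + qx*qw),     1 - 2*(qx*qx + qy*qy)]])
    return R, cb - R @ ca

def ransac_motion(A, B, iters=200, thresh=0.05):
    """Robustly estimate (R, t) from 3D-3D matches with outliers;
    thresh is an illustrative inlier distance in meters."""
    rng = np.random.default_rng(0)
    best_in = None
    for _ in range(iters):
        idx = rng.choice(len(A), size=3, replace=False)   # minimal sample
        R, t = horn_absolute_orientation(A[idx], B[idx])
        inliers = np.linalg.norm(A @ R.T + t - B, axis=1) < thresh
        if best_in is None or inliers.sum() > best_in.sum():
            best_in = inliers
    return horn_absolute_orientation(A[best_in], B[best_in])  # refit on inliers
```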

Fig. 2: Overhead image detail with visible boulders.

B. Boulder detection

Automatic detection of boulders in surface images is a difficult problem, since no assumptions can be made regarding the ground morphology, boulder size, texture or color (cf. Fig. 2). For example, images acquired on Mars have small color and illumination variations, therefore shadows drastically reduce the discernibility of boulders [23]. To detect boulders, existing approaches have relied on edge detection [24], estimation of protrusion from a locally fitted ground plane [25], or template-based matching [26]. Nevertheless, even if a reliable algorithm for boulder detection were available, ground images differ substantially compared to overhead ones due to their very different vantage points. Differences in resolution, lighting and time of observation further complicate the situation. Therefore, boulder matching using appearance cues is particularly challenging. The approach adopted here is to base boulder detection and matching on geometrical information. Boulder detection in orbital and ground images is described next.

1) Orbital boulder detection: Overhead images are processed using adaptive Otsu thresholding to convert them to binary, followed by 15×15 median filtering to reduce noise while preserving edges. Connected components are then extracted from the resulting binary image and their area is calculated. Only those components whose area is above a certain threshold are retained and their centroids are computed. Fig. 3 demonstrates the result of this process. Since overhead images are geo-registered, the computed centroids are readily associated with cartographic coordinates.

Fig. 3: Connected components shown as white blobs, corresponding to boulders detected in the image of Fig. 2. Red crosses mark the centroids of sufficiently large components.
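A minimal OpenCV sketch of this stage follows. It uses global Otsu thresholding as a stand-in for the adaptive variant mentioned above, assumes boulders appear darker than the surrounding terrain, and the minimum area threshold is an illustrative value.

```python
import cv2

def orbital_boulder_centroids(orbital_gray, min_area=50):
    """Binarize an overhead image, denoise, and return centroids of
    sufficiently large connected components (candidate boulders)."""
    # Global Otsu; THRESH_BINARY_INV assumes boulders are darker than
    # the terrain (flip if the contrast is reversed).
    _, binary = cv2.threshold(orbital_gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    binary = cv2.medianBlur(binary, 15)            # 15x15 median filtering
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary)
    keep = stats[1:, cv2.CC_STAT_AREA] >= min_area  # skip background label 0
    return centroids[1:][keep]                      # Nx2 pixel coordinates
```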

2) Ground boulder detection: Ground boulders are extracted with the following sequence of operations. First, images are segmented using an approach based on mean shift [27]. Mean shift is a non-parametric kernel density estimation technique, based on iterative clustering. It was employed in this work due to its robustness to parameter definition, as it works well with clusters of arbitrary size, shape and number, and its ability to preserve discontinuities in an image after smoothing. To reduce over-segmentation, a post-processing step that iteratively merges neighboring regions with similar intensity characteristics is applied to the output of mean shift. Small regions are eliminated from further consideration as it is unlikely that they are visible in the overhead images, while the remaining regions are assigned a unique label. Fig. 4 shows representative segmentation results for the first left frame of the sequence. It is evident that mean shift is able to preserve edges on the boulders. This effect drastically improves the segmentation results, since boulders found in the said scenery usually exhibit several color discontinuities on their surface, causing other algorithms to fail.

Fig. 4: Segmentation of an image from the rover's left camera. Original image (left), segmented image superimposed semi-transparently on the original (right).
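The sketch below approximates this stage with OpenCV's pyrMeanShiftFiltering (color mean-shift smoothing) followed by a simple labeling of quasi-uniform regions. It is a rough stand-in for the full segmentation of [27] plus iterative region merging, and all parameters (radii, edge thresholds, minimum region area) are illustrative assumptions.

```python
import cv2
import numpy as np

def segment_ground_image(bgr, spatial_rad=12, color_rad=18, min_region=400):
    """Mean-shift smooth a rover image, then label large quasi-uniform
    regions separated by intensity edges."""
    smoothed = cv2.pyrMeanShiftFiltering(bgr, spatial_rad, color_rad)
    gray = cv2.cvtColor(smoothed, cv2.COLOR_BGR2GRAY)
    # Edges of the smoothed image delimit regions; label the remainder.
    edges = cv2.Canny(gray, 40, 120)
    n, labels = cv2.connectedComponents((edges == 0).astype(np.uint8))
    # Discard small regions, unlikely to be visible from orbit.
    for lbl in range(1, n):
        if np.count_nonzero(labels == lbl) < min_region:
            labels[labels == lbl] = 0
    return labels
```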

Segmented regions are temporally associated between successive frames with the aid of reconstructed 3D points. Specifically, SIFT features extracted from the segmented regions are matched between stereo views and used to recover 3D points via triangulation. For all n segmented regions in frame t and m segmented regions in frame t+1, an n×m voting matrix M is formed. Each new reconstructed 3D point is assigned two labels, one for the segmented region it projects into in frame t and another for that in frame t+1, and casts one vote to the corresponding row and column of M. These labels are associated in consecutive frames by solving the assignment problem for M with the Munkres algorithm [28]. Labels with no vote in frame t are discarded, while labels with no vote in frame t+1 are considered as corresponding to newly seen boulders. Fig. 5 illustrates how this voting scheme succeeds in temporally matching boulders.

Fig. 5: Boulders matched between successive time instants after voting and grouping the 3D points. Identically colored pixels share the same boulder label.

Due to SIFT feature mismatches and occasional over-segmentation, there is a strong possibility for point clouds originating from the same boulder to be assigned different labels. To prevent this, we perform agglomerative clustering on reconstructed point coordinates, which employs their Euclidean distances to form groups of uniformly clustered 3D points that represent boulders. Fig. 6 illustrates the reassignment of labels to point clouds after clustering, which permits points from the same boulder that have been assigned different labels to be merged into one. After grouping, we use the mean of each 3D point cloud as a representation for each boulder. Due to relying on the mean statistic of strictly geometrical information, shadows on a boulder, terrain morphological variations and changes in illumination do not compromise the quality of the boulder representation.

Fig. 6: 3D points projected on the orbital image, drawn in different colors according to their labels. Left image shows points before clustering and right after. (Colors are assigned independently to each image.)
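The association and grouping steps can be sketched with SciPy as follows: linear_sum_assignment solves the rectangular assignment problem (an extension of Munkres in the spirit of [28]), and fclusterdata performs the agglomerative grouping. The 0.5 m distance cutoff is an illustrative assumption.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.cluster.hierarchy import fclusterdata

def associate_labels(labels_t, labels_t1, n, m):
    """labels_t[k], labels_t1[k]: region labels (0..n-1 and 0..m-1) that
    the k-th reconstructed 3D point projects into at times t and t+1."""
    M = np.zeros((n, m))
    for a, b in zip(labels_t, labels_t1):
        M[a, b] += 1                          # one vote per 3D point
    # Maximize total votes == minimize -M (rectangular Munkres).
    rows, cols = linear_sum_assignment(-M)
    # Keep only assignments supported by at least one vote.
    return [(r, c) for r, c in zip(rows, cols) if M[r, c] > 0]

def group_boulders(points3d, cutoff=0.5):
    """Agglomerative clustering of Nx3 points by Euclidean distance;
    returns one mean point per cluster (the boulder representation)."""
    cl = fclusterdata(points3d, t=cutoff, criterion='distance',
                      method='single', metric='euclidean')
    return np.array([points3d[cl == c].mean(axis=0)
                     for c in np.unique(cl)])
```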

C. Ground-orbital registration

Ground boulders are extracted over short image subsequences with the aforementioned procedure. The centroids of the extracted 3D point clouds are transformed to a common coordinate frame using the underlying VO estimates and then projected vertically to obtain a local 2D map. Local maps made up of detected ground boulders are registered with the 2D map formed by overhead boulders, using the Iterative Closest Point (ICP) algorithm with the point-to-point distance metric [29]. ICP iterates between establishing putative correspondences between two point sets and estimating a geometric transformation that registers them. For ICP to converge correctly, the initial pose of a local map should be sufficiently close to that of the overhead map, which in turn ensures that sufficient overlap exists between the two. Numerous enhancements to the basic ICP algorithm have been proposed that aim to widen its convergence basin, accelerate its rate of convergence or increase its robustness to local minima, outlying points and noise [30]. In this work, the following two enhancements were found to improve convergence speed and accuracy: motion parameter extrapolation using the method outlined in [29], and rejection of 20% to 40% of the worst matches.
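A minimal point-to-point ICP sketch for the 2D boulder maps is given below. It implements worst-match rejection and a closed-form similarity fit per iteration, while the motion parameter extrapolation of [29] is omitted for brevity; the iteration count is illustrative and the 30% rejection fraction sits within the 20% to 40% range reported above.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_2d(src, dst, iters=50, reject_frac=0.3):
    """Point-to-point ICP between 2D maps src (ground boulders) and dst
    (overhead boulders), rejecting the worst reject_frac of matches each
    iteration. Returns scale s, rotation R (2x2) and translation t such
    that dst ~ s * R @ src + t."""
    s, R, t = 1.0, np.eye(2), np.zeros(2)
    tree = cKDTree(dst)
    for _ in range(iters):
        cur = s * src @ R.T + t
        dist, idx = tree.query(cur)                 # nearest overhead boulder
        keep = dist.argsort()[:int(len(src) * (1 - reject_frac))]
        a, b = src[keep], dst[idx[keep]]
        # Closed-form 2D similarity fit (Umeyama-style) on kept pairs.
        ca, cb = a.mean(axis=0), b.mean(axis=0)
        H = (a - ca).T @ (b - cb)
        U, S, Vt = np.linalg.svd(H)
        D = np.diag([1, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T
        s = (S * np.diag(D)).sum() / ((a - ca) ** 2).sum()
        t = cb - s * R @ ca
    return s, R, t
```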

As a byproduct, ICP matching yields the similarity transformation that aligns the ground with the overhead boulders. This transformation is then used for correcting the 2D position and in-plane rotational components of the VO estimate. To reduce the computational overhead, the rover selects for matching only those overhead boulders that are in the vicinity of its current position as estimated using VO. It is noted that the application of ICP assumes that the starting point of the rover trajectory is approximately known in the orbital map. When this is not the case, initial ground-orbital map registration can be based on a less constrained, albeit more computationally expensive scheme, such as geometric hashing [31].

IV. EXPERIMENTAL RESULTS

Results from the experimental evaluation of a Matlab implementation of the method are presented in this section. A sequence of 1667 synthetic stereo images with a resolution of 512×384 pixels and a 66°×52° field of view was employed. The images emulated a stereo camera pair mounted at a height of 0.7 m above the ground, inclined at an angle of 39° with respect to the horizon and with a baseline equal to 0.12 m. An S-shaped, 100 m long rover traverse was simulated. Ground truth poses were available for all images of the traverse. Plain VO along the entire traverse estimated the final location of the rover with a positional error around 2.5 m. The simulated orbital orthoimage was 15360×15360 pixels and depicted an area of 76.8×76.8 m² using a resolution of 0.005 m per pixel. VO estimates were used to transform boulders extracted from the ground images into a common reference frame. The accumulated VO error was rectified every 30 frames, based on the ground-orbital boulder registration.

To test the limits of the method, and determine whether it is resilient to lower resolution orbital images, we have evaluated its accuracy when applied to orbital images that were resampled with increasingly lower resolutions. More specifically, the original orbital image was shrunk to 75%, 50% and 25% of its dimensions and the proposed method was applied to all these lower resolution orbital images. Fig. 7(a) illustrates that the proposed method yields highly accurate positional estimates for all resolutions tested, maintaining the corresponding error to less than 1% of the traveled distance over the entire 100 m traverse. The translational error between an estimate t̂ and the true translation t is computed as ‖t̂ − t‖, whereas the rotational error between an estimated rotation matrix R̂ and the true R is arccos((trace(R̂⁻¹R) − 1)/2) and corresponds to the amount of rotation about a unit vector that transfers R̂ to R. A comparison of the rotational errors for the trajectories computed using VO with and without orbital correction is included in Fig. 7(b). From these plots, it is clear that the incorporation of orbital imagery in the localization pipeline achieves considerable improvements in the accuracy of VO estimates, even with orbital images of low resolution. It is noteworthy that the proposed VO refinement performs worse than plain VO at the beginning of the trajectory, especially when lower resolution orbital images are used. This is because the accuracy of VO corrections estimated with the proposed method is limited by the orbital image resolution, whereas plain VO is still quite accurate as drift has not yet accumulated significantly. It is also pointed out that VO refinements, either correct or wrong, cause the sudden jumps observed in the orientation error for the two lower resolutions in Fig. 7(b) (cyan and magenta curves).

Fig. 7: Positional (top) and orientation (bottom) errors over the entire traverse. Plain VO error is plotted in red, while the remaining curves plot the translational error after correction, for different orbital image resolutions: 15360×15360 (green), 11520×11520 (blue), 7680×7680 (cyan) and 3840×3840 (magenta).
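In code, the two error measures read as a direct transcription of the formulas above:

```python
import numpy as np

def translational_error(t_est, t_true):
    """Euclidean distance between estimated and true translation."""
    return np.linalg.norm(t_est - t_true)

def rotational_error(R_est, R_true):
    """Angle (radians) of the rotation taking R_est to R_true:
    arccos((trace(R_est^-1 R_true) - 1) / 2); R_est^-1 == R_est.T
    for rotation matrices."""
    c = (np.trace(R_est.T @ R_true) - 1.0) / 2.0
    return np.arccos(np.clip(c, -1.0, 1.0))   # clip guards numerical noise
```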

Table I summarizes the method's translation and orientation errors at the end of the traverse, along with the number of boulders used during the matching process in each case. It is worth mentioning that even the coarsest of the simulated orbital images has a resolution considerably higher than that provided by current high-resolution orbital imagers such as the HiRISE instrument (i.e. 2 cm vs. 25 cm per pixel). However, imagery of suitable detail can be obtained by super-resolution techniques such as [32], which reconstructs higher resolution imagery at 5 cm per pixel from multiple overlapping orbital images.

TABLE I: Final position and orientation errors (in meters & degrees) for increasingly coarser orbital image resolutions.

Resolution    | Positional Error | Rotational Error | #Boulders
15360×15360   | 0.7743           | 0.3809           | 8593
11520×11520   | 0.9939           | 0.5378           | 6348
7680×7680     | 0.6954           | 0.9254           | 1951
3840×3840     | 1.2522           | 1.2457           | 2522

V. CONCLUSION

Global localization is an important capacity for future planetary exploration missions. Images acquired by orbiters of a planet have opened up the possibility of localizing a planetary rover in a global reference frame, independently of its initial location and traveled distance. This paper has suggested a method that is able to correct the drift in VO by registering boulders detected in ground and orbital images. Due to its reliance on geometrical information, the method is able to match such images despite their large appearance differences.

With respect to its implementation, the proposed method is well-suited to the low-end hardware available on planetary rovers, as it is both computationally and memory efficient. The feature matching process reduces each boulder to a single mean statistic, which does not overwhelm configurations with limited storage. In addition, several parts of the boulder detection pipeline can be parallelized and therefore can be accelerated with an FPGA-based implementation. As the experimental results demonstrate, the method can be applied successfully to orbital maps of quite coarse resolution. One drawback of the method is that it only provides corrections to surge (longitudinal) and sway (lateral) movements but not heave (vertical). To remedy this, one should obtain a 3D representation of the boulders of the orbital image, or consider additional constraints during the matching process (e.g. using different transforms for the projection of the ground boulders on the orbital image).

REFERENCES

[1] L. Matthies et al., "Computer vision on Mars," Int. J. Comput. Vision, vol. 75, no. 1, pp. 67–92, 2007.

[2] D. Nistér, O. Naroditsky, and J. R. Bergen, "Visual odometry," in CVPR, vol. 1, 2004, pp. 652–659.

[3] G. Lentaris, I. Stamoulias, D. Soudris, and M. Lourakis, "HW/SW co-design and FPGA acceleration of visual odometry algorithms for rover navigation on Mars," IEEE Trans. Circuits Syst. Video Technol., 2015, to appear.

[4] E. Boukas, A. Gasteratos, and G. Visentin, "Towards orbital based global rover localization," in Proc. ICRA, 2015, pp. 2874–2881.

[5] R. Kümmerle, B. Steder, C. Dornhege, A. Kleiner, G. Grisetti, and W. Burgard, "Large scale graph-based SLAM using aerial images as prior information," Auton. Robots, vol. 30, no. 1, pp. 25–39, 2011.

[6] M. Maimone, Y. Cheng, and L. Matthies, "Two years of visual odometry on the Mars exploration rovers," J. Field Robot., vol. 24, no. 3, pp. 169–186, 2007.

[7] A. J. Davison, I. D. Reid, N. D. Molton, and O. Stasse, "MonoSLAM: Real-time single camera SLAM," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 6, pp. 1052–1067, 2007.

[8] E. Mouragnon, M. Lhuillier, M. Dhome, F. Dekeyser, and P. Sayd, "Real time localization and 3D reconstruction," in CVPR, vol. 1, 2006, pp. 363–370.

[9] C. F. Olson, L. H. Matthies, M. Schoppers, and M. W. Maimone, "Rover navigation using stereo ego-motion," Robot. Auton. Syst., vol. 43, no. 4, pp. 215–229, 2003.

[10] K. Konolige, M. Agrawal, and J. Sola, "Large-scale visual odometry for rough terrain," in Robotics Research. Springer, 2011, pp. 201–212.

[11] P. J. Carle, P. T. Furgale, and T. D. Barfoot, "Long-range rover localization by matching lidar scans to orbital elevation maps," J. Field Robot., vol. 27, no. 3, pp. 344–370, 2010.

[12] J. W. Hwangbo, K. Di, and R. Li, "Integration of orbital and ground image networks for the automation of rover localization," in ASPRS 2009 Annual Conference, 2009.

[13] A. Nefian, X. Boyssounousse, L. Edwards, T. Kim, E. Hand, J. Rhizor, M. Deans, G. Bebis, and T. Fong, "Planetary rover localization within orbital maps," in ICIP. IEEE, 2014.

[14] A. Viswanathan, B. R. Pires, and D. Huber, "Vision based robot localization by ground to satellite matching in GPS-denied situations," in IEEE IROS, 2014, pp. 192–198.

[15] A. S. McEwen et al., "Mars Reconnaissance Orbiter's High Resolution Imaging Science Experiment (HiRISE)," J. Geophys. Res.: Planets (1991–2012), vol. 112, no. E5, 2007.

[16] M. Lourakis and X. Zabulis, "Accurate scale factor estimation in 3D reconstruction," in CAIP, ser. LNCS, vol. 8047. Springer Berlin Heidelberg, 2013, pp. 498–506.

[17] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. Cambridge University Press, 2004.

[18] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vision, vol. 60, no. 2, pp. 91–110, 2004.

[19] Y. Rubner, J. Puzicha, C. Tomasi, and J. M. Buhmann, "Empirical evaluation of dissimilarity measures for color and texture," Comput. Vis. Image Und., vol. 84, no. 1, pp. 25–43, 2001.

[20] P. Sampson, "Fitting conic sections to very scattered data: An iterative refinement of the Bookstein algorithm," Computer Graphics and Image Processing, vol. 18, no. 1, pp. 97–108, 1982.

[21] B. K. P. Horn, "Closed-form solution of absolute orientation using unit quaternions," J. Opt. Soc. Am. A, vol. 4, no. 4, pp. 629–642, 1987.

[22] M. A. Fischler and R. C. Bolles, "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography," Commun. ACM, vol. 24, no. 6, pp. 381–395, 1981.

[23] D. R. Thompson and R. Castano, "Performance comparison of rock detection algorithms for autonomous planetary geology," in Aerosp. Conf. IEEE, 2007, pp. 1–9.

[24] R. Castano et al., "Onboard autonomous rover science," in Aerosp. Conf. IEEE, 2007, pp. 1–13.

[25] V. Gor, R. Castano, R. Manduchi, R. Anderson, and E. Mjolsness, "Autonomous rock detection for Mars terrain," Space, pp. 1–14, 2001.

[26] V. C. Gulick, R. L. Morris, M. A. Ruzon, and T. L. Roush, "Autonomous image analyses during the 1999 Marsokhod rover field test," J. Geophys. Res.: Planets (1991–2012), vol. 106, no. E4, pp. 7745–7763, 2001.

[27] D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 5, pp. 603–619, 2002.

[28] F. Bourgeois and J.-C. Lassalle, "An extension of the Munkres algorithm for the assignment problem to rectangular matrices," Commun. ACM, vol. 14, no. 12, pp. 802–804, 1971.

[29] P. Besl and N. McKay, "A method for registration of 3-D shapes," IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 2, pp. 239–256, 1992.

[30] S. Rusinkiewicz and M. Levoy, "Efficient variants of the ICP algorithm," in 3DIM, 2001, pp. 145–152.

[31] Y. Lamdan and H. Wolfson, "Geometric hashing: A general and efficient model-based recognition scheme," in ICCV, 1988, pp. 238–249.

[32] Y. Tao and J.-P. Muller, "Super-resolution of repeat-pass orbital imagery at 400 km altitude to obtain rover-scale imagery at 5 cm," in European Planetary Science Congress, vol. 9, EPSC2014-201, 2014.
