

TOWARDS ROBUST LEARNING-BASED POSE ESTIMATION OF NONCOOPERATIVE SPACECRAFT

Tae Ha Park¹, Sumant Sharma¹, and Simone D’Amico¹
¹Space Rendezvous Laboratory, Stanford University, Durand Building, 496 Lomita Mall, Stanford, CA 94305

Abstract. This work presents the latest effort by Stanford’s Space Rendezvous Laboratory to develop a hardware-in-the-loop testbed for testing vision-based navigation algorithms. Along with the software image rendering pipeline, the testbed was used to create the Spacecraft Pose Estimation Dataset, which was publicly released for an international competition on satellite pose estimation. This work also presents a novel Convolutional Neural Network (CNN) architecture that scored fourth place in the pose estimation competition, and it explores texture randomization as part of the training procedure of the CNN. By randomizing the texture of the spacecraft in the synthetic imagery at the training stage, the CNN can generalize to spaceborne imagery without additional training.

Introduction. The ability to accurately determine and track the pose (i.e., the relative position and attitude) of a noncooperative client spacecraft with a minimal set of hardware is an enabling technology for current and future on-orbit servicing and debris removal missions such as the RemoveDEBRIS mission by Surrey Space Centre [1], the Phoenix program by DARPA [2], and the Restore-L mission by NASA [3]. In particular, performing on-board pose estimation is key to the real-time generation of the approach trajectory and control updates. The use of a single monocular camera to perform pose estimation is especially attractive due to the low power and mass requirements posed by small spacecraft such as CubeSats. Current state-of-the-art approaches employ image processing techniques to detect relevant features in a 2D image, which are then matched against features of a known 3D model of the client spacecraft in order to extract relative attitude and position information. However, these approaches are known to lack robustness under extreme illumination conditions and against the dynamic Earth background in space imagery. Moreover, they are computationally demanding during pose initialization due to the large search space in determining the feature correspondences between the 2D image and the 3D model.
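To make the computational burden concrete, the following is a minimal Python/OpenCV sketch of the hypothesize-and-test structure of such a classical pose initializer. The function name, model points, and camera matrix are illustrative placeholders, not the implementation of any method cited here.

```python
import cv2
import numpy as np

def classical_pose_init(image, model_points_3d, camera_matrix):
    # Detect 2D features in the image (ORB is used here as an example).
    orb = cv2.ORB_create(nfeatures=500)
    keypoints, _ = orb.detectAndCompute(image, None)
    points_2d = np.array([kp.pt for kp in keypoints], dtype=np.float64)

    # The 2D-3D correspondences are unknown a priori; real pipelines must
    # hypothesize and test many candidate pairings, which is the expensive
    # step described in the text. For brevity, we pair points naively and
    # let RANSAC search for a geometrically consistent subset.
    n = min(len(points_2d), len(model_points_3d))
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(model_points_3d, dtype=np.float64)[:n], points_2d[:n],
        camera_matrix, distCoeffs=None,
        reprojectionError=8.0, iterationsCount=1000)
    return (rvec, tvec) if ok else None
```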

In order to overcome these shortcomings, several authors have recently proposed to use deep Convolutional Neural Networks (CNNs) to perform pose estimation. Notably, the recent work of Sharma and D’Amico introduced a CNN with three branches that solves for the pose using state-of-the-art object detection and Gauss-Newton algorithms [4]. The same work also introduced the Spacecraft PosE Estimation Dataset (SPEED) benchmark, which contains 15,300 images consisting of synthetic and actual camera images of a mock-up of the Tango spacecraft from the PRISMA mission [5, 6]. However, significant challenges must be addressed before such deep learning-based pose estimation algorithms can be applied in space missions. Most importantly, neural networks are known to lack robustness to data distributions different from the one used during training, and it must be verified that these algorithms can meet the accuracy requirements on spaceborne imagery even when trained solely on synthetically generated images. This is especially challenging since spaceborne imagery can contain textures, surface illumination properties, and other unmodeled camera artifacts that cannot be perfectly replicated in synthetic imagery. Since spaceborne images are expensive to acquire, the CNN must be able to address this issue with minimal or no access to the properties of spaceborne imagery.

Figure 1. The Testbed for Rendezvous and Optical Navigation (TRON) facility at Stanford’s Space Rendezvous Laboratory (SLAB).

Contributions. The primary contribution of this work is the presentation of the Space Rendezvous Laboratory’s (SLAB) recent effort to develop a hardware-in-the-loop testbed to validate vision-based navigation algorithms. Specifically, this work explains the continued development of the Testbed for Rendezvous and Optical Navigation (TRON) facility shown in Figure 1. The facility consists of two six-degrees-of-freedom robotic arms, one mounted on the ground to hold a mockup model of a satellite or an asteroid, and another on a ceiling-mounted rail drive to hold a camera. The facility is also equipped with custom LED wall panels to simulate Earth albedo and a xenon short-arc lamp to simulate collimated sunlight in various orbit regimes. As a whole, TRON allows for capturing real imagery of a desired target under high-fidelity illumination conditions and across a wide range of poses tracked by Vicon motion capture cameras. These real images have different statistical distributions compared to synthetic images, so they provide a unique opportunity to validate the generalization capability of spaceborne computer vision algorithms. Specifically, the real imagery of TRON augments the synthetic imagery in SPEED, which was publicly released to provide a common benchmark for satellite pose estimation algorithms. SPEED was also used in the recent Satellite Pose Estimation Challenge (SPEC)† organized by SLAB and the Advanced Concepts Team of the European Space Agency. Over 48 international teams participated in this five-month competition, and the winning team achieved sub-degree attitude and centimeter-level position accuracies on the synthetic test image set.

The secondary contribution of this work is a novel method to enable efficient pose estimation based on a CNN. The problem of pose estimation is decoupled into object detection and pose estimation networks. Pose estimation is performed by regressing the 2D locations of the spacecraft’s surface keypoints and then solving the Perspective-n-Point (PnP) problem [7]. The extracted keypoints have known correspondences to those in the 3D model, since the CNN is trained to predict them in a pre-defined order. This design choice allows for bypassing computationally expensive feature matching through algorithms such as RANSAC [8] and directly using publicly available PnP solvers only once per image. The proposed architecture scored fourth place in SPEC and is shown to be fast and robust to a variety of illumination conditions and inter-spacecraft separations ranging from 3 to over 30 meters.
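The following is a minimal sketch of this keypoint-then-PnP step, assuming a hypothetical callable keypoint_cnn that outputs the 2D pixel locations of N surface keypoints in a fixed, pre-defined order, so the i-th prediction already corresponds to the i-th point of the known 3D model; the names and signatures are placeholders, not the architecture itself.

```python
import cv2
import numpy as np

def estimate_pose(image, keypoint_cnn, model_points_3d, camera_matrix):
    # Regress the 2D keypoints; shape (N, 2), same order as model_points_3d.
    points_2d = np.asarray(keypoint_cnn(image), dtype=np.float64)

    # Correspondences are fixed by construction, so no RANSAC-style
    # matching is needed: a single PnP solve per image (e.g., EPnP [7])
    # recovers the relative attitude and position.
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(model_points_3d, dtype=np.float64), points_2d,
        camera_matrix, distCoeffs=None, flags=cv2.SOLVEPNP_EPNP)
    return (rvec, tvec) if ok else None
```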

The tertiary contribution of this work is the introduction of a novel training procedure that improves the robustness of the CNN to spaceborne imagery when trained solely on synthetic images. Specifically, inspired by the recent work of Geirhos et al. [9], the technique of texture randomization is introduced as part of the training procedure of the CNN. Geirhos et al. suggest that CNNs tend to focus on the local texture of the target object; randomizing the object texture using the Neural Style Transfer (NST) technique [10] therefore forces the CNN to instead learn the global shape of the object. Following their work, a new dataset is generated by applying NST to a custom synthetic dataset that has the same pose distribution as the SPEED dataset. It is shown that a network exposed to the new texture-randomized dataset during training performs better on spaceborne images without having been trained on them.
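As an illustration only, the following Python sketch shows texture randomization as a training-time augmentation in the spirit of [9, 10]; the stylize callable stands in for an NST routine and is an assumption, not the pipeline used in this work.

```python
import random
from typing import Callable, Sequence

def texture_randomized_batch(images: Sequence, style_images: Sequence,
                             stylize: Callable, stylize_prob: float = 0.5):
    """Replace the texture of a random subset of training images with
    textures drawn from style images, keeping the pose labels unchanged."""
    out = []
    for img in images:
        if random.random() < stylize_prob:
            # `stylize(content, style)` is assumed to be an NST routine;
            # it transfers the texture of `style` onto `img` while
            # preserving the spacecraft's global shape.
            img = stylize(img, random.choice(style_images))
        out.append(img)
    return out
```

Mixing stylized and original images (rather than stylizing all of them) follows the intuition that the network should remain accurate on the original rendering distribution while being discouraged from relying on local texture cues.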

Overall, this work presents the current state of SLAB’s capability to perform hardware-in-the-loop spaceborne computer vision tasks. The continuous development of the TRON facility and SPEED improves the means of training and validating various pose estimation algorithms and of gauging their capability to generalize to imagery of different statistical distributions. This work also presents a novel CNN for pose estimation of noncooperative spacecraft and a training mechanism that improves the CNN’s performance on spaceborne imagery without additional training.

†https://kelvins.esa.int/satellite-pose-estimation-challenge/home/

References.

[1] J. L. Forshaw, G. S. Aglietti, N. Navarathinam, H. Kadhem, T. Salmon, A. Pisseloup, E. Joffre, T. Chabot, I. Retat, R. Axthelm, et al., “RemoveDEBRIS: An in-orbit active debris removal demonstration mission,” Acta Astronautica, vol. 127, pp. 448–463, 2016.

[2] B. Sullivan, D. Barnhart, L. Hill, P. Oppenheimer, B. L. Benedict, G. V. Ommering, L. Chappell, J. Ratti, and P. Will, “DARPA Phoenix Payload Orbital Delivery system (PODs): ‘FedEx to GEO’,” AIAA SPACE 2013 Conference and Exposition, 2013.

[3] B. B. Reed, R. C. Smith, B. J. Naasz, J. F. Pellegrino, and C. E. Bacon, “The Restore-L servicing mission,” AIAA SPACE 2016, 2016.

[4] S. Sharma and S. D’Amico, “Pose estimation for non-cooperative rendezvous using neural networks,” in 2019 AAS/AIAA Astrodynamics Specialist Conference, Ka’anapali, Maui, HI, January 13-17, 2019.

[5] S. D’Amico, M. Benn, and J. L. Jørgensen, “Pose estimation of an uncooperative spacecraft from actual space imagery,” International Journal of Space Science and Engineering, vol. 2, no. 2, p. 171, 2014.

[6] S. D’Amico, P. Bodin, M. Delpech, and R. Noteborn, “PRISMA,” in Distributed Space Missions for Earth System Monitoring, Space Technology Library (M. D’Errico, ed.), vol. 31, ch. 21, pp. 599–637, 2013.

[7] V. Lepetit, F. Moreno-Noguer, and P. Fua, “EPnP: An accurate O(n) solution to the PnP problem,” International Journal of Computer Vision, vol. 81, no. 2, pp. 155–166, 2008.

[8] M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography,” Readings in Computer Vision, pp. 726–740, 1987.

[9] R. Geirhos, P. Rubisch, C. Michaelis, M. Bethge, F. A. Wichmann, and W. Brendel, “ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness,” in International Conference on Learning Representations, 2019.

[10] P. T. Jackson, A. A. Abarghouei, S. Bonner, T. P. Breckon, and B. Obara, “Style augmentation: Data augmentation via style randomization,” 2018.

2nd RPI Space Imaging Workshop. Saratoga Springs, NY. 28-30 October 2019