
A Self-Locating and Map-Constructing Robot based on ORB-SLAM

Lili Chen, Mingyu Yang, Yue Dai, Wenhao Peng
Dept. of Electrical and Computer Engineering, University of Michigan
Email: {chenlili, mingyuy, rudydai, pwhum}@umich.edu

Abstract—This paper presents a real-time ORB-SLAM system implemented on a Raspberry Pi (RPi) as a design project for EECS 452 DSP Design Laboratory. The system estimates the robot's motion locally, without the aid of GPUs or remote servers, and sends the results over WiFi to a remote PC, where the estimated route is displayed. The whole system is implemented in C++, and various open source libraries are used for feature extraction, nonlinear optimization, and the graphical interface. The system is validated in a university lab and a university classroom.

I. INTRODUCTION

A. Background

Human beings have never stopped exploring unknown areas. Two fundamental tasks in exploring an unknown area are building a map of the surroundings (map-constructing) and determining one's own location in that map (self-locating). Solutions to these two tasks are not trivial. Traditional GPS-based solutions work well outdoors, but they cannot provide accurate results in complex scenarios such as indoor, underground, outer space, and undersea environments. Therefore, more sophisticated techniques are needed to tackle these two tasks.

Fig. 1. Robots can work well under complex conditions like outer space, undersea, and underground, where human beings cannot reach [1].

B. Simultaneous Localization And Mapping

Although map-constructing and robot self-locating seem like two separate tasks, researchers usually deal with them together because they form a "chicken-or-egg" problem. To know the robot's location, a map is needed; to create a map, the robot needs an accurate estimate of its own location. This joint task is usually called the Simultaneous Localization And Mapping (SLAM) problem.

Solving the SLAM problem is not trivial because the robot location and the map are both unknown. The result depends solely on matching landmarks between consecutive observations. Picking wrong landmarks for robot tracking may cause catastrophic consequences because estimation errors accumulate [1]. Hard as SLAM is, it is very useful for indoor vacuum cleaners, surveillance with unmanned aerial vehicles, reef monitoring, and more [1].

C. Current Solutions

There are many approaches to implementing SLAM. Some are realized with expensive LiDARs ($159 [2]) and depth-sensing cameras ($169 [3]) [4]. Others are realized with binocular vision, which can have synchronization issues between the two cameras [5].

One major issue with visual SLAM techniques is their huge demand for computation power, which makes them almost impossible to implement on simple processors like the Raspberry Pi. To realize real-time map-constructing and self-locating on simple processors, some researchers use remote servers and external GPUs to handle image processing and SLAM computation [6], [7].

In summary, there are three major issues with current SLAM implementations: 1) expensive hardware, 2) synchronization among components in a complex system, and 3) huge computation demand.

D. Our Approach

Our goal is to build a simple, low-cost, and self-contained visual SLAM system that runs on a Raspberry Pi with real-time localization and map construction. To achieve this, we use only one Raspberry Pi camera (monocular), and all estimation is based on the captured images. In addition, we use a simple feature extraction algorithm (ORB), downsample our input images, and avoid convolutional neural networks to reduce the computation load.

There is a similar research project that implements monocular SLAM algorithms on a Raspberry Pi [6]. However, it only treats the Raspberry Pi as a camera and sends images to remote servers for further processing. In fact, the Raspberry Pi is a capable processor that can handle complex computation on its own, so we decided to make our system "self-contained" and not rely on any external devices.

E. Paper Structure

This paper is organized as follows. Section II elaborates our major signal processing flow. Section III presents our hardware and software implementation platforms. Section IV talks about our project progress. Section V demonstrates the results of our map-constructing and self-locating robot traveling in an indoor environment. Finally, Section VI gives a conclusion of this project.

II. DSP DESIGN

A. High Level Signal Processing Flow

Fig. 2. Our signal processing flow includes both the RPi and the user interface (UI). Red blocks are our own designs and green blocks are adapted from open source projects.

The high-level DSP architecture is shown in Fig. 2. On the Raspberry Pi, an RGB image is first captured by the Pi camera and downsampled to 640×480 pixels. After that, we extract features using ORB feature descriptors, which are then used to estimate camera poses. After pose estimation, the resulting poses are sent to the UI via WiFi, and finally the trajectory is presented on the remote PC. Each block is described in detail in the following sections.
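As a rough illustration of this flow, the C++ sketch below outlines the per-frame loop using OpenCV. EstimatePose and SendPoseToUI are hypothetical placeholders for the tracking and communication stages described in the following sections, not functions from our code base.

    // Minimal sketch of the per-frame flow on the RPi (hypothetical helpers).
    #include <opencv2/opencv.hpp>
    #include <vector>

    cv::Mat EstimatePose(const std::vector<cv::KeyPoint>& kps, const cv::Mat& desc); // placeholder
    void SendPoseToUI(const cv::Mat& pose);                                          // placeholder

    int main() {
        cv::VideoCapture cap(0);                          // Pi camera (assuming a V4L2 driver)
        cv::Ptr<cv::ORB> orb = cv::ORB::create();         // ORB extractor
        cv::Mat frame, small, gray;
        while (cap.read(frame)) {
            cv::resize(frame, small, cv::Size(640, 480)); // downsample
            cv::cvtColor(small, gray, cv::COLOR_BGR2GRAY);
            std::vector<cv::KeyPoint> kps;
            cv::Mat desc;
            orb->detectAndCompute(gray, cv::noArray(), kps, desc); // feature extraction
            cv::Mat pose = EstimatePose(kps, desc);       // tracking (Section II-D)
            SendPoseToUI(pose);                           // TCP over WiFi to the UI (Section II-E)
        }
        return 0;
    }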

B. Image Downsampling

The most important constraint on the self-locating and map-constructing robot is its processing speed. With a high processing speed, accuracy and user experience improve at the same time. Without image downsampling, the frame rate of our system is only about 0.5 fps, and features change drastically between consecutive frames, which makes it difficult to match common keypoints. As a result, tracking loss happens frequently.

Theoretically, the smaller the image, the faster the processing speed and the higher the frame rate. However, there is a trade-off between the downsampling scale and the number of matched features. Although a high downsampling rate improves the frame rate a lot, fewer features are matched due to the reduced quality of the picture. As shown in Fig. 3 and Fig. 4, for the same picture, the number of matched features at size 512×384 is 359, while the number of matched features at size 640×480 is 469. At the same time, the quality of the extracted features drops when the downsampling rate is high, which can be illustrated by the bad keypoints on the carpet in Fig. 3. The decreasing number of matched keypoints and the increasing number of bad keypoints degrade the system accuracy considerably, especially when the robot makes a turn. As shown in Fig. 5 and Fig. 6, the same set of images, which contains two 90° turns, is used to test the performance of the two downsampling scales. It is obvious that when a higher downsampling scale is used, the trajectory is less accurate.
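To make the trade-off concrete, a helper of the following kind (our own illustrative sketch built on standard OpenCV calls, not project code) can count cross-checked ORB matches between two consecutive frames at a chosen size, e.g. 640×480 versus 512×384:

    // Count cross-checked ORB matches between two frames at a given image size.
    #include <opencv2/opencv.hpp>
    #include <vector>

    int CountOrbMatches(const cv::Mat& f1, const cv::Mat& f2, const cv::Size& size) {
        cv::Mat a, b;
        cv::resize(f1, a, size);
        cv::resize(f2, b, size);
        cv::Ptr<cv::ORB> orb = cv::ORB::create(1000);
        std::vector<cv::KeyPoint> k1, k2;
        cv::Mat d1, d2;
        orb->detectAndCompute(a, cv::noArray(), k1, d1);
        orb->detectAndCompute(b, cv::noArray(), k2, d2);
        cv::BFMatcher matcher(cv::NORM_HAMMING, /*crossCheck=*/true); // Hamming distance for binary descriptors
        std::vector<cv::DMatch> matches;
        matcher.match(d1, d2, matches);
        return static_cast<int>(matches.size());
    }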

Fig. 3. Matched features, shown in green boxes, when the image size is 512×384 pixels, have worse quality.

Fig. 4. Matched features, shown in green boxes, when the image size is 640×480 pixels, have better quality.

Fig. 5. The trajectory of two 90° turns when the image size is 512×384 pixels is inaccurate.

After trying different downsampling scales, a 640×480 pixel size is chosen as the best trade-off between processing speed and accuracy.


Fig. 6. The trajectory of two 90° turns when the image size is 640×480 pixels is accurate.

C. ORB Feature Extraction

ORB stands for Oriented FAST and Rotated BRIEF; it was first introduced by Rublee et al. as an alternative to SIFT or SURF that is more than two orders of magnitude faster [10]. With ORB, the computational load is greatly decreased, which makes a real-time SLAM system possible.

As its name indicates, ORB consists of Oriented FAST (oFAST) as the feature detector and Rotated BRIEF (rBRIEF) as the feature descriptor [10]. FAST and its variants are the method of choice for finding keypoints in real-time systems that match visual features [13], [14]. FAST is useful in Parallel Tracking and Mapping systems because it is efficient and finds reasonable corner keypoints [15]. However, FAST does not provide orientation information, which is key to feature matching. To handle this problem, oFAST introduces the intensity centroid as a simple measure of orientation, which is shown to perform similarly to gradient-based measures [10]. BRIEF is a recent feature descriptor that uses simple binary tests between pixels in a smoothed image patch [12]. Its performance is similar to SIFT in many aspects, including robustness to lighting, blur, and perspective distortion. However, it is very sensitive to in-plane rotation. rBRIEF solves this problem by first steering all keypoints by the corresponding patch angle, and then applying a subset of binary tests learned from 300k keypoints to increase the variance of each binary test while reducing the correlation between them [10].

To make ORB scale-invariant, a Gaussian pyramid is used to simulate different scalings of the same image so that matched keypoints can be found across scales. A pyramid with 8 levels was reported to give the best feature matching results in previous work [10]. Such a structure can even be used in high-speed tracking, such as tracking a car moving on a highway, where the scale of the same keypoints changes drastically between consecutive frames. In this project, however, the robot does not move very fast, so the scale of a given keypoint does not change much. An 8-level pyramid would not improve the feature extraction much but would reduce the processing speed a lot. For this reason, a 4-level pyramid is applied to improve the processing speed while keeping the performance almost the same.
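With OpenCV's ORB implementation, for example, the pyramid depth is a constructor parameter; an illustrative configuration of the reduced pyramid (the feature count and scale factor shown are common defaults, not measured settings) is:

    // ORB extractor with a 4-level scale pyramid instead of the default 8.
    #include <opencv2/features2d.hpp>

    cv::Ptr<cv::ORB> MakeOrb() {
        return cv::ORB::create(/*nfeatures=*/1000,
                               /*scaleFactor=*/1.2f,  // ratio between adjacent pyramid levels
                               /*nlevels=*/4);        // shallower pyramid: faster, still scale-tolerant
    }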

D. Pose Estimation

1) Overall description: After ORB feature extraction, the keypoint information is used to estimate the camera pose, which is defined as a 3×4 matrix [R|t], where R is a 3×3 rotation matrix representing the orientation of the camera and t is a 3×1 translation vector giving the position of the camera in world coordinates.
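As a hedged illustration (not code from our system), composing [R|t] from R and t and projecting a world point with the camera intrinsic matrix K can be written with OpenCV as follows; all matrices are assumed to be of type CV_32F:

    // Compose a 3x4 pose [R|t] and project a world point into the image plane.
    #include <opencv2/opencv.hpp>

    cv::Mat ComposePose(const cv::Mat& R, const cv::Mat& t) {   // R: 3x3, t: 3x1
        cv::Mat T(3, 4, CV_32F);
        R.copyTo(T(cv::Rect(0, 0, 3, 3)));                      // left 3x3 block
        t.copyTo(T(cv::Rect(3, 0, 1, 3)));                      // right column
        return T;
    }

    cv::Point2f Project(const cv::Mat& T, const cv::Mat& K, const cv::Mat& Xw) {
        cv::Mat Xh = (cv::Mat_<float>(4, 1) << Xw.at<float>(0), Xw.at<float>(1),
                      Xw.at<float>(2), 1.f);                    // homogeneous world point
        cv::Mat x = K * (T * Xh);                               // 3x1 homogeneous image point
        return {x.at<float>(0) / x.at<float>(2), x.at<float>(1) / x.at<float>(2)};
    }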

We derive our work from an existing ORB-SLAM project [10]. Our overall procedure is shown in Fig. 7. At first, an initialization process is needed to give an initial estimate of the 3D map points and the current camera pose. Then, the pose of the current frame is coarsely estimated using only the last frame, which is called initial pose estimation. If there are too few matching points, we declare that tracking is lost. If tracking is still alive, a finer estimation is conducted by projecting all related map points onto the current frame, which is called tracking the local map. After that, we can move the SLAM system. During motion, as a new frame comes in, if it does not share enough 3D map points with the map, the current frame is treated as a new keyframe, which is used to create new map points and estimate the route. Next, the positions of the 3D map points and all related keyframe poses are optimized again by local bundle adjustment.

Fig. 7. Overall procedure of pose estimation in our work is simple. The system keeps tracking its position and determines whether a new keyframe is needed to describe its environment by counting the common map points. Every time a new keyframe is inserted, a local bundle adjustment is performed.

Different from the previous work [10], we discard the loop closing process, which compares the current frame with all the keyframes generated and stored in the system. If the current frame is similar to a keyframe, the system declares that a loop is detected and runs a global bundle adjustment to optimize all the poses and map points within that loop. The loop closing technique is extremely helpful. However, to detect a loop, each time a new frame is captured we need to compare it with every single keyframe recorded, which is time-consuming and limits our processing speed. Without loop closing, the frame rate is improved greatly.

More details of each block are summarized in the following subsections based on our understanding of the open source code [11].

2) Initialization: The system starts by setting the first frame as the initial frame so that subsequent frames can be used for initialization. For each subsequent image, the system finds the matched keypoints between the current frame and the initial frame. If the number of matched keypoints is less than a threshold (100 in our system), this frame cannot be used for initialization and the initialization starts over. Otherwise, our system uses RANSAC to compute and score a homography matrix and a fundamental matrix. If one of the scores exceeds a certain threshold, our system constructs the map points from the corresponding matrix. Otherwise, our system checks whether the next frame satisfies this condition. If a suitable frame is found, our system sets up the map points, marks the initial frame and the current frame as keyframes, and then performs bundle adjustment to optimize the existing map points. If the number of overlapping map points in those frames is smaller than a threshold, the system resets. If it is larger than the threshold, the initialization process is done.
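A sketch of this two-model check with standard OpenCV calls is shown below; the helper name, thresholds, and scoring are illustrative, and the scoring in the actual code base is more elaborate:

    // Hedged sketch of the homography/fundamental-matrix initialization check.
    #include <opencv2/opencv.hpp>
    #include <algorithm>
    #include <vector>

    bool EnoughSupport(const std::vector<cv::Point2f>& p0,
                       const std::vector<cv::Point2f>& p1) {
        if (p0.size() < 100) return false;                               // too few matches: restart
        cv::Mat inlH, inlF;
        cv::findHomography(p0, p1, cv::RANSAC, 3.0, inlH);               // planar-scene model
        cv::findFundamentalMat(p0, p1, cv::FM_RANSAC, 3.0, 0.99, inlF);  // general-scene model
        int nH = cv::countNonZero(inlH);
        int nF = cv::countNonZero(inlF);
        // If either model explains enough matches, the initial map points are
        // triangulated from the better one and both frames become keyframes.
        return std::max(nH, nF) > 80;                                    // illustrative threshold
    }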

3) Initial Pose Estimation: Initial pose estimation is done by first inferring the camera pose with a constant-velocity motion model. In other words, the motion of the current frame is assumed to be identical to that of the last frame. Then, the matched 3D map points of the last frame are projected onto the current frame, and the projection error is minimized over the possible poses. In essence, this is a non-convex optimization problem. The g2o library is adopted to solve this optimization problem in C++.
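The constant-velocity prediction itself is a single matrix composition. Below is a sketch using 4×4 homogeneous world-to-camera poses (our own notation, not the project's variable names):

    // Predict the current pose from the last two poses (constant-velocity model).
    #include <opencv2/core.hpp>

    cv::Mat PredictPose(const cv::Mat& T_last, const cv::Mat& T_prev) {  // 4x4, CV_32F
        cv::Mat velocity = T_last * T_prev.inv();  // motion between the last two frames
        return velocity * T_last;                  // assume the same motion continues
    }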

4) Tracking Lost Detection: After feature matching, the system counts the matched points between the current frame and the previous frame. If the count is less than a certain threshold (100 in our project), the system declares that these two frames are not correlated enough to recover the camera pose; this situation is called tracking loss. If there are more than 100 matched points, we continue to refine the camera pose estimate.

5) Track Local Map: After the camera pose is initially estimated, all the map points in the local map that can be seen in the current frame are projected onto the current frame. The 3D-2D projection error is optimized over the current frame pose. This yields a better estimate of the camera pose since it uses all the available map points stored in the local map. After the camera pose is optimized, the system rechecks whether there are enough matches in the current frame with respect to the map point database. If so, tracking succeeds; otherwise, tracking is lost.

6) New Keyframe Insertion: The conditions for the current frame to become a keyframe are: 1) the number of tracked points between the current frame and the previous keyframe is smaller than a certain threshold and there are enough potential map points in the current frame; and 2) enough frames have passed and no local mapping is in progress, or too many frames have passed. If both conditions are satisfied, the current frame is inserted as a keyframe.

After a new keyframe is inserted, for each unmatched ORB keypoint in the newly inserted keyframe, we search for a match among the unmatched points in other keyframes. The ORB pairs are then triangulated using the estimated poses of the corresponding keyframes. After that, to accept the new points, positive depth in the camera, parallax, reprojection error, and scale consistency are checked.
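The triangulation step can be sketched with OpenCV's cv::triangulatePoints, given the two keyframes' 3×4 projection matrices P = K[R|t]; the acceptance checks listed above are applied to the returned points and are omitted here:

    // Triangulate matched keypoints from two keyframes (P1, P2: 3x4, CV_32F).
    #include <opencv2/opencv.hpp>
    #include <vector>

    cv::Mat TriangulatePairs(const cv::Mat& P1, const cv::Mat& P2,
                             const std::vector<cv::Point2f>& pts1,
                             const std::vector<cv::Point2f>& pts2) {
        cv::Mat points4D;
        cv::triangulatePoints(P1, P2, pts1, pts2, points4D);  // 4xN homogeneous coordinates
        return points4D;  // callers still check depth, parallax, reprojection error, scale
    }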

7) Local Bundle Adjustment: After generating new map points, a local bundle adjustment is performed to optimize the latest processed keyframes which share common map points. The optimization variables for local bundle adjustment are not only the keyframe poses but also the positions of the map points. Map points marked as outliers are discarded during the optimization.

E. Communication

We found that when the RPi is used for both SLAM computation and UI drawing, with the display forwarded to a remote machine through SSH, there is a huge delay between the display and the SLAM system. The delay comes from two parts: 1) the trajectory-drawing thread has a heavy computation load that is not well suited to the RPi's computation ability and slows down the other threads; and 2) the SSH link is slow for screen display due to its security overhead.

To reduce the computation load and the latency between the robot's vision and the displayed trajectory, we move the display thread from the RPi to the PC. The RPi is only used for capturing images, image downsampling, ORB extraction, and pose estimation. After pose estimation, the RPi sends the estimation results to the PC through TCP sockets, and a user interface on the PC is used for trajectory display. Fig. 8 shows the delay difference between SSH and TCP/IP.

Fig. 8. Delay difference between an SSH link and a TCP/IP link is huge [8], [9]. The TCP/IP link is about 7x faster than the SSH link.

Fig. 9 shows the data flow between the RPi and the user interface, which involves serializing and deserializing a matrix. A camera pose matrix generated on the RPi is wrapped into a TCP packet, transmitted over the network to the UI, and then unwrapped.

Fig. 9. Only a string of text is transmitted over the network, so the load on the link is much lighter than with SSH graphics. This string carries the pose matrix information and allows a real-time display at the UI.

The current camera pose and all the keyframe poses are transmitted from the RPi to the UI. Two threads run in parallel: the current camera pose is estimated in the tracking thread, and the keyframe poses are determined in the local mapping thread. As a result, two different ports are used to transmit and receive camera poses and keyframe poses separately. To transmit a 4×4 camera pose matrix, a cv::Mat object is first serialized into a vector of 16 floating-point numbers, which is 64 bytes long. To prevent issues caused by different endiannesses on different systems, the 64-byte vector is encoded into a 128-byte vector of hexadecimal characters. This vector is then sent out using the TCP/IP protocol via WiFi. On the user interface side, the reverse is done to recover the camera matrix data, and the results are put into two containers that store all the received camera poses and all the received keyframe poses for display purposes.
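A sketch of this serialization is shown below (illustrative; the project's actual encoding routine may differ in detail). The 16 floats of a 4×4 CV_32F pose are flattened and each byte is written as two hexadecimal characters, giving a 128-character ASCII string that can be pushed through a text-oriented TCP stream.

    // Flatten a 4x4 float pose matrix into a 128-character hex string.
    #include <opencv2/core.hpp>
    #include <cstring>
    #include <string>

    std::string PoseToHex(const cv::Mat& pose) {          // pose: 4x4, CV_32F
        static const char kHex[] = "0123456789abcdef";
        std::string out;
        out.reserve(128);
        for (int r = 0; r < 4; ++r) {
            for (int c = 0; c < 4; ++c) {
                float v = pose.at<float>(r, c);
                unsigned char bytes[4];
                std::memcpy(bytes, &v, sizeof v);          // raw bytes of the float
                for (unsigned char b : bytes) {
                    out.push_back(kHex[b >> 4]);           // high nibble
                    out.push_back(kHex[b & 0x0F]);         // low nibble
                }
            }
        }
        return out;                                        // 16 floats x 8 hex chars = 128
    }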

Since local bundle adjustment optimizes the poses of previous keyframes, we need to keep updating the keyframe poses on the UI. To do that, we send all the keyframe poses and replace the old ones on the UI each time the number of keyframes increases by 10.

F. User Interface

Fig. 10. The UI displays the keyframes generated in the system and an estimate of the current location. The keyframes are represented by the blue markers, and the estimate is represented by the green marker.

In the UI, all the information stored in the camera pose container and the keyframe container is plotted, as shown in Fig. 10. The OpenGL library is used to plot the poses in 3D. The keyframes are represented by the blue markers, and the current camera pose is represented by the green marker. The green line connects all the camera poses to show the estimated trajectory of the robot. Users can change the perspective with the mouse, choose whether the center of view follows the camera, and press "CTRL+Z" whenever they want to quit the user interface.

III. IMPLEMENTATION

A. Use Scenario

Fig. 11. Use scenario of our SLAM system. A controller gives motion commands to the robot to move the SLAM system around. The user remotely controls the robot, which explores an unknown terrain and sends back real-time localization and mapping information.

The use scenario is shown in Fig. 11. There is a robot and a UI. The robot traverses an unknown terrain following the radio controls from the user, and it transmits SLAM results back to the UI.

B. Implementation Overview

Fig. 12. At a high level, the SLAM system consists of a robot and a UI. All image processing and SLAM computation are conducted on the RPi. The laptop is only used for the UI.


The implementation is summarized in Fig. 12. The controller is the stock radio controller for the RC car, and the battery is realized with a portable charger and a USB power splitter. The RPi runs Ubuntu MATE and a set of open source libraries, such as OpenCV, and it wraps the SLAM results in TCP packets, which are then transmitted over the network to the UI. The UI runs Ubuntu and uses OpenGL and Pangolin to draw the interactive 3D interface for the user.

C. Implementation Platforms

1) Software Platforms: Due to the complex nature of the SLAM task, various libraries are used to streamline the development of the code.

a) Ubuntu MATE: We need access to the Linux standard headers to use libraries that are designed for Linux, so Raspbian is replaced by Ubuntu MATE.

b) GNU C++: We use GNU C++ together with the latest Boost and chrono libraries to realize an efficient multi-threaded implementation of our algorithm and to obtain accurate timestamps for our frames. We also use the Linux socket headers to perform TCP/IP communication.

c) OpenCV: We use OpenCV to quickly develop code for matrix transformations of both the images and the features.

d) Pangolin: Pangolin is used to draw the SLAM results in an interactive 3D interface.

e) DBoW: DBoW is a library we use for ORB feature descriptors.

f) Eigen3 and g2o: Eigen3 is needed by g2o, which is used for performing non-linear optimizations.

2) Hardware Platforms: Because SLAM is a computation-demanding task, we have optimized the hardware environment for the RPi by keeping its temperature low and its supply voltage stable.

a) RPi: We use the RPi together with its monocular camera to perform SLAM computations and to collect images.

b) RPi Cooling Case: An RPi case with heat sinks and a cooling fan replaces the stock case for improved mechanical support and cooling.

c) Portable Chargers with a USB Power Splitter: The official RPi power supply outputs a continuous 5.1 V, 2.5 A, which gives the RPi the current it needs for best performance. However, many portable chargers are designed for phones and tablets, and the ones we have can only supply 5.1 V, 2.1 A. We often get low-voltage warnings on the screen when powering the RPi with a single portable charger, so we use a USB power splitter to combine the power from two portable chargers. The two portable chargers can support the RPi for continuous SLAM operation throughout the design expo.

d) RC Car: We use a stock RC car to carry the portable chargers and the RPi and to hold the monocular camera in place, facing forward. The RC car is either carried by hand or placed on the ground and driven with the remote radio controller. The number of wire turns in the motor of this RC car is reduced to lower the torque and ensure smooth acceleration and deceleration when the radio control button is toggled.

D. Camera Calibration

Each camera has different imaging characteristics. These characteristics can be represented by a matrix called the camera intrinsic matrix. Our system needs to take it into account to avoid distortion in 3D-2D projections. In this project, we apply the OpenCV camera calibration library to calibrate the Pi camera. To get a better calibration result, pictures containing a chessboard at the corners are needed. The calibration procedure is shown in Fig. 13.
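A compact sketch of that calibration flow with OpenCV is shown below; the board dimensions, square size, and image list are placeholders rather than our exact setup:

    // Chessboard calibration sketch: returns the 3x3 camera intrinsic matrix K.
    #include <opencv2/opencv.hpp>
    #include <vector>

    cv::Mat CalibrateFromImages(const std::vector<cv::Mat>& images,            // assumed non-empty
                                cv::Size board = cv::Size(9, 6),               // inner corners
                                float square = 0.025f) {                       // square size in meters
        std::vector<std::vector<cv::Point3f>> objPoints;
        std::vector<std::vector<cv::Point2f>> imgPoints;
        std::vector<cv::Point3f> obj;
        for (int r = 0; r < board.height; ++r)
            for (int c = 0; c < board.width; ++c)
                obj.emplace_back(c * square, r * square, 0.f);                 // planar board model
        for (const cv::Mat& img : images) {
            std::vector<cv::Point2f> corners;
            if (cv::findChessboardCorners(img, board, corners)) {
                imgPoints.push_back(corners);
                objPoints.push_back(obj);
            }
        }
        cv::Mat K, dist;
        std::vector<cv::Mat> rvecs, tvecs;
        cv::calibrateCamera(objPoints, imgPoints, images[0].size(), K, dist, rvecs, tvecs);
        return K;
    }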

Fig. 13. Calibration procedure

As shown in the left part of Fig. 14, without camera calibration, a 90-degree turn in reality gives only a 120-degree turn in the viewer. After a careful camera calibration, the trajectory becomes more accurate.

Fig. 14. Left: 90-degree turn results before calibration. Right: U-turn results after calibration. Turns can be accurately recorded after camera calibration.

E. Budget

TABLE I
PRICES OF THE COMPONENTS USED IN OUR SYSTEM ARE REASONABLE AND COMPETITIVE.

Component            Price
RPi w/ Camera        $35
Cooling Case         $9.99
4000 mAh Charger     $24.95 × 2
MicroSD              $29.99
Power Splitter       $5.99

The cost of each component in our system is listed in Table I. The budget for this project is $250, and we are well within budget.


IV. PROGRESS

We started working on this project in mid-February and implemented our SLAM algorithm on our robot by Milestone 1. Our system was slow and not robust at that time, although it worked. We then focused on improving the efficiency and making our system more robust. We also designed a user-friendly viewer to show our results after Milestone 2. The detailed tasks are shown in Fig. 15.

Fig. 15. Timeline of our project. We implemented our SLAM algorithm on our robot by Milestone 1. Then we made improvements to our system to enhance efficiency and created a user-friendly viewer to present our results.

V. RESULTS

A. Processing Speed

By downsampling images, reducing the ORB pyramid level, skipping loop closing, and implementing an efficient communication link between the RPi and the UI, the frame rate of the system is increased from about 0.5 fps to about 3.5 fps. With this frame rate, the RPi can perform real-time self-locating and map-constructing.

B. Field Trials

Field trials are performed in a university lab and in a university classroom. The university lab trial is used for validating the 2D mapping and localization accuracy, and the university classroom trial is used for further validating the 3D mapping and localization functionality.

1) University Lab: The robot is held in hand and carried around a university lab. There are chairs placed randomly in the lab, and we have to walk around these chairs. The SLAM system maps the trajectory we have traversed accurately. The results are displayed in our UI, and the top view overlaid on the lab layout is shown in Fig. 16.

2) University Classroom: The robot is held in hand and walked around a university classroom. The robot starts in the middle of the first edge of the square classroom and is walked along three edges of the room. Then, the robot is carried back at a lower height along the third edge it has already traversed. The results are displayed in two views: the top view is shown in Fig. 17, and the side view is shown in Fig. 18.

Fig. 16. The keyframes recorded by the SLAM system in a university lab are overlaid on the floorplan of that lab. The trajectory indicated by the keyframes closely matches the layout of the lab.

Fig. 17. The keyframes recorded by the SLAM system in a university classroom, seen from above and displayed on our user interface, show that the recorded trajectory closely matches the layout of the classroom.

Fig. 18. The keyframes recorded by the SLAM system in a university classroom, seen from the side and displayed on our user interface, show that the robot is able to recognize its past trajectory based on the map it has generated, localize itself, and then update its map to represent the new path it has traversed.

C. Limitation

1) Frame Rate: The current frame rate can cause problems when the robot is turning. Fig. 19 shows that when the robot makes a sharp turn at the same speed as when it moves straight forward, the features on the outer side change drastically, which causes blur and makes it harder for ORB to find matched keypoints and estimate the camera pose. Sometimes tracking is lost at the sharp turn. The temporary solution is to slow down the robot in general, but the ultimate solution is to increase the frame rate.

Fig. 19. The turning problem of the camera is caused by a rapid change in velocity. Blue lines are camera positions and the black arrows are camera movement directions.

2) Scenario Limitation: The self-locating and map-constructing robot does not work very well when there are only a few objects around it. For example, the field trial fails when the robot is running in the aisle on the second floor of the EECS building. This is because the aisles are uniform and have few textures, which makes it difficult for the ORB feature extractor to detect enough matched keypoints. As shown in Fig. 20, only 89 matched keypoints can be detected in the picture captured in the aisle, whereas 487 matched keypoints can be detected in the picture captured in the lab. To ensure a successful demonstration for this project, some symbols such as arrows, squares, and triangles are drawn on the whiteboard, and some chairs are placed around the demo field to add features.

Fig. 20. Left: Difficult scenario in the EECS second-floor aisle. Right: Easy scenario in the EECS second-floor lab. There are many more objects that can be detected as keypoints in the easier scenario.

3) Sight Height: Due to the small size of the RC car, the camera sits low, which causes the floor to occupy most of the image. However, the floor does not provide useful information. A greater camera height might help, but this requires a redesign of the robot.

D. Future Work

Right now, we only use purchased components for sensing. If we were given more time, we would want to build distance-sensing devices to measure the distance from the robot to its surrounding objects. We would also want to build our own robot with strong off-road capabilities to explore inaccessible terrains.

VI. CONCLUSION

In conclusion, a working real-time ORB-SLAM system is implemented on a Raspberry Pi. With only one monocular camera and a Raspberry Pi, the cost of the entire system is low compared to systems with dedicated sensors. With ORB feature descriptors and our further optimizations, the computation load is low enough for self-contained operation on a Raspberry Pi. With a simple hardware configuration, successful demonstrations are performed. This system wraps up all we have learned in EECS 452.

VII. ACKNOWLEDGEMENT

We would like to extend our gratitude to Prof. Hun-Seok Kim, Dr. Kurt Metzger, and GSIs Siddharth Venkatesan and Carl Steinhauser for their help with this project. We have distributed our work evenly among the team members. The major contributions of each team member are as follows:

• RC Car Modification: Yue Dai
• RPi Peripherals: Wenhao Peng and Lili Chen
• User Interface Design: Mingyu Yang
• TCP/IP Communication Link Design: Yue Dai and Mingyu Yang
• Camera Calibration: Lili Chen and Wenhao Peng
• SLAM Field Trial: All members
• Documentation: All members
• Data Structure Improvement: Wenhao Peng and Lili Chen
• Algorithm Efficiency Improvement: Yue Dai

REFERENCES

[1] Ais.informatik.uni-freiburg.de, 2018. [Online]. Available: http://ais.informatik.uni-freiburg.de/teaching/ss12/robotics/slides/12-slam.pdf. [Accessed: 18-Apr-2018].

[2] Amazon.com, 2018. [Online]. Available: https://www.amazon.com/Neato-LIDAR-Laser-distance-sensor/dp/B00QSUJPSY. [Accessed: 18-Apr-2018].

[3] "Intel® RealSense™ Depth Cameras", Click.intel.com, 2018. [Online]. Available: https://click.intel.com/realsense.html. [Accessed: 18-Apr-2018].

[4] Ocw.mit.edu, 2018. [Online]. Available: https://ocw.mit.edu/courses/aeronautics-and-astronautics/16-412j-cognitive-robotics-spring-2005/projects/1aslam_blas_repo.pdf. [Accessed: 18-Apr-2018].

[5] "Binocular Vision-SLAM Using Improved SIFT Algorithm - IEEE Conference Publication", Ieeexplore.ieee.org, 2018. [Online]. Available: https://ieeexplore.ieee.org/document/5473273/. [Accessed: 18-Apr-2018].

[6] Courses.cit.cornell.edu, 2018. [Online]. Available: https://courses.cit.cornell.edu/ece6930/ECE6930_Spring16_Final_MEng_Reports/SLAM/Real-time%20ROSberryPi%20SLAM%20Robot.pdf. [Accessed: 18-Apr-2018].

[7] S. N. Sinha, J.-M. Frahm, M. Pollefeys, and Y. Genc, "GPU-based Video Feature Tracking and Matching", Workshop on Edge Computing Using New Commodity Architectures, 2006.

[8] Training.nectar.org.au, 2018. [Online]. Available: http://training.nectar.org.au/package07/sections/images/SSHKeys.png. [Accessed: 18-Apr-2018].

[9] Vidyakv.files.wordpress.com, 2018. [Online]. Available: https://vidyakv.files.wordpress.com/2011/12/cs-120-3-3341.png?w=580. [Accessed: 18-Apr-2018].

[10] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós, "ORB-SLAM: A Versatile and Accurate Monocular SLAM System", IEEE Transactions on Robotics, 2015.

[11] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós, ORB_SLAM2. https://github.com/raulmur/ORB_SLAM2.

[12] H. Strasdat, J. M. M. Montiel, and A. J. Davison, "Scale drift-aware large scale monocular SLAM," in Proc. Robotics: Science and Systems, Zaragoza, Spain, Jun. 2010.

[13] O. D. Faugeras and F. Lustman, "Motion and structure from motion in a piecewise planar environment," Int. J. Pattern Recog. Artif. Intell., vol. 2, no. 3, pp. 485–508, 1988.

[14] W. Tan, H. Liu, Z. Dong, G. Zhang, and H. Bao, "Robust monocular SLAM in dynamic environments," in Proc. IEEE Int. Symp. Mixed and Augmented Reality, Adelaide, Australia, Oct. 2013, pp. 209–218.

[15] B. Williams, M. Cummins, J. Neira, P. Newman, I. Reid, and J. D. Tardós, "A comparison of loop closing techniques in monocular SLAM," Robot. Auton. Syst., vol. 57, no. 12, pp. 1188–1197, 2009.