
Human-Robot Collaborative Remote Object Search

Jun Miura, Shin Kadekawa, Kota Chikaarashi, and Junichi Sugiyama

Department of Computer Science and Engineering, Toyohashi University of Technology

Proc. 13th Int. Conf. on Intelligent Autonomous Systems, Padova, Italy, July 2014.

Abstract. Object search is one of the typical tasks for remotely-controlled service robots. Although object recognition technologies have been well developed, an efficient search strategy (or viewpoint planning method) is still an issue. This paper describes a new approach to human-robot collaborative remote object search. An analogy for our approach is a child riding on a parent's shoulders: a user controls a fish-eye camera on a remote robot to change views and search for a target object independently of the robot. Combined with a certain level of automatic search capability on the robot's side, this collaboration can realize an efficient target object search. We developed an experimental system to show the feasibility of the approach.

Keywords: Human-robot collaboration, object search, observation planning.

1 Introduction

Demands for remotely-controlled mobile robots are increasing in many application areas such as disaster response and human support. One of the important tasks for such robots is object search. To find an object, a robot continuously changes its position and examines various parts of the environment. An object search task is thus roughly composed of viewpoint planning and object recognition. Although technologies for object recognition have been well developed with recent high-performance sensors and the use of informative visual features, viewpoint planning is still a challenging problem.

Exploration planning [1, 2] is viewpoint planning aimed at building a description of the whole workspace, where efficient space coverage is usually the goal. Concerning object search, Tsotsos and his group have been developing a general, statistical framework of visual object search [3, 4]. Saidi et al. [5] take a similar approach for object search by a humanoid. Aydemir et al. [6] utilize high-level knowledge on spatial relations between objects to select low-level search strategies. We have also developed algorithms for efficient mapping and object search in unknown environments (called environment information summarization). Masuzawa and Miura [7, 8] formulated this problem as a combination of greedy exploration ordering of unknown sub-regions and a statistical optimization of viewpoint planning for object verification. Boussard and Miura [9] formulated the same problem as an MDP and presented an efficient solution using LRTDP [10]. These works aim at improving the performance and efficiency of automatic object search.

A human operator sometimes controls or supports robotic exploration and/or object search in a tele-operation context, where interface design is an important issue. Various design approaches are possible depending on how much the robot controller and the operator each contribute to actual robot actions. When the operator mainly controls the motion of the robot, an informative display of the remote scene is required. Fong et al. [11] proposed a sensor fusion display for vehicle tele-operation which provides visual depth cues by displaying data from a heterogeneous set of range sensors. Suzuki [12] developed a vision system combining views from a conventional camera and an omnidirectional one to provide a more informative view of a remote scene. Saitoh et al. [13] proposed a 2D-3D integrated interface using an omnidirectional camera and a 3D range sensor. Shiroma et al. [14] showed that a bird's-eye view could provide a better display for mobile robot tele-operation than a panoramic or a conventional camera.

The idea of safeguarded teleoperation (e.g., [15]) is often adopted, in which the operator gives a higher-level command and the robot executes it while maintaining safety. Shared autonomy is a concept in which a human and a robot collaborate with comparable contributions. Sawaragi et al. [16] deal with an ecological interface design for shared autonomy in the tele-operation of a mobile robot; the interface provides sufficient information for evoking the operator's natural response. These works do not assume a high level of autonomy in the robot system.

This paper describes a new type of human-robot collaboration in remote object search. We assume the robot has a sufficient level of autonomy for achieving the task. Since a human's scene recognition ability is usually better than a robot's, however, a human operator also observes the remote scene and helps the robot by giving advice on the target object location. An analogy of our approach is riding on shoulders: a boy on his father's shoulders searches for a target object and tells its location to the father, while the father is also searching for it. The boy can also see which direction the father is focusing on and/or moving toward. We realize this relationship by putting a fish-eye camera on a humanoid robot and making the camera's focus of attention remotely controllable.

The rest of the paper is organized as follows. Section 2 explains the hardware and software configuration of the system. Section 3 describes the automatic object search strategy that the robot takes. Section 4 describes the camera interface for the operator and the human-robot interaction in the collaborative object search. Section 5 summarizes the paper and discusses future work.

2 Overview of the System

Fig. 1 shows the hardware and software configuration of the system. The robot we use is HIRO, an upper-body humanoid by Kawada, mounted on an omnidirectional mobile base. It has three types of sensors. An RGB-D camera (Kinect) on the head is used for detecting tables and object candidates. Two CCD cameras at both wrists are used for recognizing objects based on their textures. Three laser range finders (LRFs) (UHG-08LX by Hokuyo) on the mobile base are used for SLAM (simultaneous localization and mapping).

Fig. 1. System configuration. On the remote robot side, the robot PC runs the table and object recognition, hand/arm and neck control, mobile base control, mapping, voice recognition, and fish-eye image server modules, connected to the RGB-D camera, hand cameras, LRFs, and fish-eye camera. On the operator side, the operator PC runs the fish-eye image client and the 3D mouse controller, connected to the 3D mouse and the headset.

The robot is equipped with a fish-eye camera (NM33-UVCT by Opto Inc.) for providing the operator with an image of the remote scene. The operator can extract a perspective image of any direction using a 3D mouse (SpaceNavigator by 3Dconnexion Inc.) so that he/she can search anywhere at the remote site for a target object. The interface also shows the operator the state of the robot, that is, where it is and where it is searching for the target object. A headset is used by the operator to give voice commands to the robot.

The software is composed of multiple functional modules, shown by rounded rectangles in Fig. 1. Each module is realized as an RT component in the RT-middleware environment [17], which supports modularized software development. We use an implementation of RT-middleware by AIST, Japan [18].
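To illustrate what such a module looks like in code, the following is a minimal, schematic sketch of a data-flow RT component in the style generated by OpenRTM-aist's component template. The component name VoiceCommandPublisher, its port name, and the placeholder command string are hypothetical, and the actual modules in Fig. 1 may use different ports and data types; module registration boilerplate is omitted.

```cpp
// Schematic RT component that publishes recognized voice commands as strings
// on an out-port (hypothetical module; the real components may differ).
#include <rtm/Manager.h>
#include <rtm/DataFlowComponentBase.h>
#include <rtm/DataInPort.h>
#include <rtm/DataOutPort.h>
#include <rtm/idl/BasicDataTypeSkel.h>
#include <string>

class VoiceCommandPublisher : public RTC::DataFlowComponentBase {
public:
  explicit VoiceCommandPublisher(RTC::Manager* manager)
      : RTC::DataFlowComponentBase(manager),
        m_cmdOut("command", m_cmd) {}

  RTC::ReturnCode_t onInitialize() override {
    addOutPort("command", m_cmdOut);   // expose the out-port to other modules
    return RTC::RTC_OK;
  }

  RTC::ReturnCode_t onExecute(RTC::UniqueId /*ec_id*/) override {
    // In the real system the string would come from the speech recognizer;
    // here a placeholder command is written once per execution cycle.
    const std::string recognized = "left table";   // hypothetical result
    m_cmd.data = CORBA::string_dup(recognized.c_str());
    m_cmdOut.write();
    return RTC::RTC_OK;
  }

private:
  RTC::TimedString m_cmd;                    // data carried on the port
  RTC::OutPort<RTC::TimedString> m_cmdOut;   // out-port "command"
};
```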

3 Automatic Object Search

3.1 Algorithm of automatic object search

The task of the robot is to fetch a specific object in a room. Object candidates are assumed to be on tables. The robot thus starts by finding tables in the room, and then moves on to candidate detection and target recognition for fetching. We deal only with rectangular parallelepipeds as objects, and extract and store the visual features (i.e., a color histogram and SIFT descriptors [19], explained below) of the target object in advance. The sizes of the objects are also known. The algorithm of automatic object search is summarized in Fig. 2.

Step 1: Detect all tables in the room using the RGB-D camera by turning the neck.

Step 2: Choose the nearest unexplored table and approach it.
(a) If such a table exists, go to Step 3.
(b) If not, go back to the initial position and report failure.

Step 3: Search the table for target object candidates using the color histogram.
(a) If candidates are found, go to Step 4.
(b) If not, go to Step 2 (search another table).

Step 4: Approach the candidates and recognize them using SIFT-based recognition/pose estimation.
(a) If the target object is recognized, grasp it and go back to the initial position.
(b) If not, go to Step 2 (search another table).

Fig. 2. Algorithm for automatic object search.
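One natural reading of the table-selection rule in Step 2 is a greedy nearest-first ordering of the unexplored tables. The following minimal, self-contained sketch illustrates that ordering only; the Pose2D type, the example coordinates, and the assumption that "nearest" is measured from the robot's current position are illustrative, not the implemented system's code.

```cpp
// Sketch of the greedy table-visiting order in Step 2 of Fig. 2:
// repeatedly pick the nearest unexplored table from the current position.
#include <cmath>
#include <cstdio>
#include <vector>

struct Pose2D { double x, y; };   // illustrative 2D position (m)

static double dist(const Pose2D& a, const Pose2D& b) {
  return std::hypot(a.x - b.x, a.y - b.y);
}

// Returns the order in which tables would be approached (indices into `tables`).
std::vector<int> greedy_table_order(Pose2D robot, const std::vector<Pose2D>& tables) {
  std::vector<bool> explored(tables.size(), false);
  std::vector<int> order;
  for (size_t k = 0; k < tables.size(); ++k) {
    int best = -1;
    for (size_t i = 0; i < tables.size(); ++i)
      if (!explored[i] && (best < 0 || dist(robot, tables[i]) < dist(robot, tables[best])))
        best = static_cast<int>(i);
    explored[best] = true;
    order.push_back(best);
    robot = tables[best];          // the robot now stands at that table
  }
  return order;
}

int main() {
  // Hypothetical layout: three tables, the first one nearest to the start pose.
  Pose2D start{0.0, 0.0};
  std::vector<Pose2D> tables{{2.0, 1.5}, {3.0, 2.5}, {4.0, 1.0}};
  for (int i : greedy_table_order(start, tables)) std::printf("table %d\n", i);
}
```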

3.2 Object detection and recognition routines

Table detection. Tables are detected from the point cloud data taken by the Kinect using PCL (Point Cloud Library) [20]. Assuming that the heights of tables are between 70 [cm] and 90 [cm], planar segments with vertical normals in that height range are detected using a RANSAC-based algorithm. Fig. 3 shows a table detection result.

Fig. 3. Table detection: (a) test scene, (b) height filtering, (c) detected table.
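A minimal sketch of this height-filter-plus-RANSAC step with PCL is given below. The thresholds mirror the 70-90 cm assumption, but the angular tolerance, the distance threshold, and the assumption that the cloud is expressed in a frame whose z axis points upward are illustrative choices, not the parameters of the implemented system.

```cpp
// Sketch of table-plane detection: keep points in the 0.70-0.90 m height band,
// then fit a plane whose normal is (nearly) vertical with RANSAC.
// Assumes the cloud is expressed in a frame whose z axis points upward.
#include <cmath>
#include <pcl/ModelCoefficients.h>
#include <pcl/PointIndices.h>
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/filters/passthrough.h>
#include <pcl/segmentation/sac_segmentation.h>
#include <pcl/sample_consensus/method_types.h>
#include <pcl/sample_consensus/model_types.h>

pcl::PointIndices::Ptr detect_table_plane(
    const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud,
    pcl::ModelCoefficients& plane) {
  // Height filtering (cf. Fig. 3(b)): tables are assumed to be 0.70-0.90 m high.
  pcl::PointCloud<pcl::PointXYZ>::Ptr band(new pcl::PointCloud<pcl::PointXYZ>);
  pcl::PassThrough<pcl::PointXYZ> pass;
  pass.setInputCloud(cloud);
  pass.setFilterFieldName("z");
  pass.setFilterLimits(0.70f, 0.90f);
  pass.filter(*band);

  // RANSAC plane fit constrained to near-vertical normals (cf. Fig. 3(c)).
  pcl::SACSegmentation<pcl::PointXYZ> seg;
  seg.setModelType(pcl::SACMODEL_PERPENDICULAR_PLANE);
  seg.setAxis(Eigen::Vector3f(0.0f, 0.0f, 1.0f));   // plane normal close to +z
  seg.setEpsAngle(10.0 * M_PI / 180.0);             // illustrative tolerance
  seg.setMethodType(pcl::SAC_RANSAC);
  seg.setDistanceThreshold(0.02);                   // illustrative (2 cm)
  seg.setInputCloud(band);

  pcl::PointIndices::Ptr inliers(new pcl::PointIndices);
  seg.segment(*inliers, plane);
  return inliers;                                   // empty if no table was found
}
```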

Candidate detection on a table. Once a table to approach is determined, the robot moves to a position at a certain distance from the table. The point cloud data is again analyzed using PCL to extract the data corresponding to objects on the estimated table plane. The extracted data are clustered into objects, each of which is characterized by a Hue histogram. A normalized cross-correlation (NCC) is then calculated between the model histogram and the data histogram to judge whether an object is a candidate. Fig. 4 illustrates the process of candidate detection.

Fig. 4. Candidate detection: input image, detected objects, and target candidates, together with the Hue histogram (0-360 deg.) of the target. The rightmost object is the target.
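The Hue-histogram comparison can be sketched with OpenCV as follows; cv::compareHist with the correlation metric plays the role of the NCC score. The bin count and the acceptance threshold are illustrative values, not those of the implemented system.

```cpp
// Sketch of candidate screening by Hue histogram: build a normalized Hue
// histogram for an object region and compare it with the target's model
// histogram by correlation (an NCC-like score).
#include <opencv2/opencv.hpp>

cv::Mat hue_histogram(const cv::Mat& bgr_roi, int bins = 30 /* illustrative */) {
  cv::Mat hsv;
  cv::cvtColor(bgr_roi, hsv, cv::COLOR_BGR2HSV);
  int channels[] = {0};                    // Hue channel
  int histSize[] = {bins};
  float hueRange[] = {0.0f, 180.0f};       // OpenCV stores 8-bit Hue as 0-179
  const float* ranges[] = {hueRange};
  cv::Mat hist;
  cv::calcHist(&hsv, 1, channels, cv::Mat(), hist, 1, histSize, ranges);
  cv::normalize(hist, hist, 1.0, 0.0, cv::NORM_L1);   // make histograms comparable
  return hist;
}

// True if the region is similar enough to the target model to be a candidate.
bool is_candidate(const cv::Mat& bgr_roi, const cv::Mat& model_hist,
                  double threshold = 0.8 /* illustrative */) {
  double score = cv::compareHist(hue_histogram(bgr_roi), model_hist,
                                 cv::HISTCMP_CORREL);
  return score > threshold;
}
```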

Fig. 5. Object recognition and pose estimation: the hand camera view and the detected target with its estimated pose.

Fig. 6. Movement of the robot for detection and recognition: from the initial position, the robot moves to a position for candidate detection in front of the table and then to positions for recognition and pose estimation of the target candidates. Two target candidates on the left are grouped and recognized from a single position.

Object recognition using SIFT. Each target object candidate is verified using a hand camera. SIFT features are extracted in each candidate object region and matched with those of the model. If the number of matched features is above a threshold, the target object is considered verified. The pose of the object can then be calculated from the pairs of 2D (image) and 3D (model) feature positions. We use the cv::solvePnP function in the OpenCV library [21] for pose estimation. Fig. 5 illustrates the object recognition and pose estimation procedure.
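A condensed sketch of this verification step with OpenCV is shown below. The ratio-test constant, the minimum match count, and the assumption that each model descriptor has an associated 3D point on the object (the ObjectModel struct) are illustrative, not the system's actual data structures.

```cpp
// Sketch of SIFT-based verification and pose estimation: match image features
// against the model, accept the candidate if enough matches survive a ratio
// test, then recover the object pose with cv::solvePnP from 2D-3D pairs.
#include <opencv2/opencv.hpp>
#include <vector>

struct ObjectModel {
  cv::Mat descriptors;                    // SIFT descriptors of the model
  std::vector<cv::Point3f> points3d;      // 3D position of each model feature
};

bool verify_and_estimate_pose(const cv::Mat& gray_roi, const ObjectModel& model,
                              const cv::Mat& K, const cv::Mat& dist,
                              cv::Mat& rvec, cv::Mat& tvec,
                              int min_matches = 12 /* illustrative */) {
  auto sift = cv::SIFT::create();
  std::vector<cv::KeyPoint> kps;
  cv::Mat desc;
  sift->detectAndCompute(gray_roi, cv::noArray(), kps, desc);
  if (desc.empty()) return false;

  // Lowe-style ratio test on 2-nearest-neighbour matches.
  cv::BFMatcher matcher(cv::NORM_L2);
  std::vector<std::vector<cv::DMatch>> knn;
  matcher.knnMatch(desc, model.descriptors, knn, 2);

  std::vector<cv::Point2f> img_pts;
  std::vector<cv::Point3f> obj_pts;
  for (const auto& m : knn) {
    if (m.size() == 2 && m[0].distance < 0.75f * m[1].distance) {
      img_pts.push_back(kps[m[0].queryIdx].pt);
      obj_pts.push_back(model.points3d[m[0].trainIdx]);
    }
  }
  if (static_cast<int>(img_pts.size()) < min_matches) return false;  // not the target

  // Pose of the object relative to the hand camera from the 2D-3D pairs.
  return cv::solvePnP(obj_pts, img_pts, K, dist, rvec, tvec);
}
```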

3.3 Mobile base control

The omnidirectional mobile base uses four actuated casters with a differential drive mechanism [22]. The mobile base is also equipped with three LRFs, which are used for ICP-based ego-motion estimation provided by the Mobile Robot Programming Toolkit (MRPT) [23].

For detecting candidates on a table, the robot moves to a position at a certain distance (about 1 [m]) from the table so that the whole tabletop can be observed. For recognizing the target object, the robot approaches the table to obtain a sufficient number of SIFT features with the hand camera. The positions for recognition are determined considering the placement of the target candidates on the table; nearby candidates are grouped to reduce the number of movements in front of the table. The robot keeps a right-angle position to the table in both observations. Fig. 6 illustrates a typical movement of the robot.
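The positioning described here amounts to a small geometric computation. The sketch below derives a goal pose at a given stand-off distance, facing the table squarely, from the table center and the outward unit normal of its near edge; these inputs, the function name, and the 1 m default are illustrative assumptions about the interface to the table-detection step.

```cpp
// Sketch: compute a goal pose for the mobile base that stands `standoff`
// metres away from the table and faces it at a right angle.
// The table center and the outward unit normal of its near edge are assumed
// to come from the table-detection step (illustrative interface).
#include <cmath>

struct Pose2D { double x, y, theta; };   // position (m) and heading (rad)

Pose2D observation_pose(double table_x, double table_y,
                        double edge_nx, double edge_ny,   // outward unit normal
                        double standoff = 1.0 /* ~1 m for candidate detection */) {
  Pose2D goal;
  goal.x = table_x + standoff * edge_nx;        // back off along the edge normal
  goal.y = table_y + standoff * edge_ny;
  goal.theta = std::atan2(-edge_ny, -edge_nx);  // face the table (opposite the normal)
  return goal;
}
```

For target recognition, the same computation with a smaller stand-off would place the robot close enough to the grouped candidates for the hand camera.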

3.4 Arm and hand control

The hands of the robot are used for placing a camera above a candidate object for recognition, as well as for pick-and-place operations. In the case of recognition, the upper surface of each candidate is observed. The camera pose is defined in advance with respect to the coordinate system attached to the corresponding surface. For pick and place, we use a predefined set of grasping and approaching poses, also defined with respect to the surface coordinates. We implemented a point cloud-based collision check procedure, which can check both robot-to-object collisions and self-collisions. Fig. 7 shows examples of hand movements with collision checks.

Fig. 7. Hand motion generation: (a) simulation/action for observation; (b) simulation/action for picking.
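A minimal sketch of how a predefined camera or grasp pose expressed in a surface coordinate frame is mapped into the robot frame with Eigen is given below; the frame names and the example offset are illustrative, and the collision check itself is omitted.

```cpp
// Sketch: a camera/grasp pose is stored relative to the coordinate system of
// an object surface; composing it with the estimated surface pose gives the
// pose to command in the robot frame. The offsets below are illustrative.
#include <cmath>
#include <Eigen/Geometry>

// robot_T_surface: pose of the object's surface frame in the robot frame
// (e.g., obtained from the SIFT/solvePnP result via the hand camera frame).
// surface_T_grasp: predefined camera/grasp pose relative to that surface.
Eigen::Isometry3d grasp_in_robot_frame(const Eigen::Isometry3d& robot_T_surface,
                                       const Eigen::Isometry3d& surface_T_grasp) {
  return robot_T_surface * surface_T_grasp;
}

// Example of a predefined pose: hover 10 cm above the surface, looking down.
Eigen::Isometry3d example_approach_pose() {
  Eigen::Isometry3d T = Eigen::Isometry3d::Identity();
  T.translation() = Eigen::Vector3d(0.0, 0.0, 0.10);                  // 10 cm offset
  T.linear() = Eigen::AngleAxisd(M_PI, Eigen::Vector3d::UnitX()).toRotationMatrix();
  return T;
}
```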

3.5 Automatic object search experiment

We performed automatic object search experiments. Fig. 8 shows the experimental scene. The robot examines the tables in right-to-left order because the rightmost table is the nearest to the initial position. Since the target object is on the leftmost table from the robot's point of view, the robot ends up examining every table for candidate detection.

Fig. 8. Experimental scene. There are three tables, and the target object is on the leftmost table with respect to the robot.

Fig. 9 shows snapshots of an automatic object search. The search process was as follows. After detecting the three tables in the room (Step 1), the robot first moved to the rightmost one and found a candidate (Step 2). Since the candidate was not the target (Step 3), the robot moved to the center table, where no candidates were found (Step 4). It then moved to the leftmost table and found a candidate (Step 5). Since this candidate was recognized as the target, the robot picked it up (Step 6) and brought it back to the initial position (Step 7).

Fig. 9. Snapshots of an experiment of automatic object search; the views from the robot are shown on the right at each step. (a) Step 1: detect three tables. (b) Step 2: move to the first table and find a candidate. (c) Step 3: fail to recognize (no target object). (d) Step 4: move to the second table and find no candidates. (e) Step 5: move to the last table and find a candidate. (f) Step 6: succeed in recognizing and grasping the target. (g) Step 7: go back to the initial position.

4 Collaborative Object Search

The child on his father's shoulders mentioned in Sec. 1 is the analogy for the collaborative object search in this work. The child does not walk by himself but looks around to search for a target object independently, and once he finds it, he tells his father the location of the target. This is a kind of interruption of the father's action, and the father takes the advice and moves there.

To provide an independent view to the operator, a fish-eye camera is put on the mobile base and made controllable by the operator. The operator changes the view to search for the target and gives verbal advice to the robot.

Fig. 10. Fish-eye camera and example views: (a) setting of the fish-eye camera; (b) view of the remote scene and the robot; (c) zoomed view of the center table in (b); (d) zoomed view of the left table in (b).

4.1 Fish-eye camera-based interface

The fish-eye camera is set at the rear of the robot as shown in Fig. 10(a). This setting enables the operator to view not only the remote scene but also the robot's state (see Fig. 10(b)). The camera has a function for extracting an arbitrary part of the fish-eye image and converting it to a perspective image. The operator can thus control the pan/tilt/zoom of a virtual camera using the 3D mouse. Fig. 10(b)-(d) show examples of images taken from the same robot position.
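The camera performs this extraction itself; the sketch below is only an illustrative software equivalent showing how a pan/tilt/zoom perspective view can be resampled from a fish-eye image, assuming an ideal equidistant projection (r = f * theta) and assumed calibration values (f_fish, the fish-eye image center, and the rotation convention).

```cpp
// Sketch of extracting a virtual perspective view (pan/tilt/zoom) from a
// fish-eye image under an ideal equidistant model r = f_fish * theta.
// Assumed calibration and rotation conventions; the real camera does this itself.
#include <algorithm>
#include <cmath>
#include <opencv2/opencv.hpp>

cv::Mat virtual_view(const cv::Mat& fisheye, double pan, double tilt, double zoom,
                     double f_fish, cv::Point2d c_fish, cv::Size out_size) {
  const double f_persp = zoom * out_size.width / 2.0;   // zoom sets the virtual focal length
  const cv::Point2d c_out(out_size.width / 2.0, out_size.height / 2.0);

  // Virtual-camera rotation: pan about the vertical (y) axis, then tilt about x.
  cv::Matx33d Rpan(std::cos(pan), 0, std::sin(pan),
                   0, 1, 0,
                   -std::sin(pan), 0, std::cos(pan));
  cv::Matx33d Rtilt(1, 0, 0,
                    0, std::cos(tilt), -std::sin(tilt),
                    0, std::sin(tilt), std::cos(tilt));
  cv::Matx33d R = Rpan * Rtilt;

  cv::Mat mapx(out_size, CV_32FC1), mapy(out_size, CV_32FC1);
  for (int v = 0; v < out_size.height; ++v) {
    for (int u = 0; u < out_size.width; ++u) {
      // Ray through the output pixel, rotated into the fish-eye camera frame
      // (z = optical axis), then projected with the equidistant model.
      cv::Vec3d d = R * cv::Vec3d((u - c_out.x) / f_persp, (v - c_out.y) / f_persp, 1.0);
      cv::Vec3d dn = d * (1.0 / cv::norm(d));
      double theta = std::acos(std::min(1.0, std::max(-1.0, dn[2])));
      double phi = std::atan2(dn[1], dn[0]);
      double r = f_fish * theta;
      mapx.at<float>(v, u) = static_cast<float>(c_fish.x + r * std::cos(phi));
      mapy.at<float>(v, u) = static_cast<float>(c_fish.y + r * std::sin(phi));
    }
  }
  cv::Mat out;
  cv::remap(fisheye, out, mapx, mapy, cv::INTER_LINEAR);   // sample the fish-eye image
  return out;
}
```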

4.2 Voice command-based instruction

The operator uses voice commands to instruct the robot to take a better action than its current one. Since the task (i.e., target object search) is simple, the instructions are also simple enough to be used easily by the operator. Table 1 summarizes the voice instructions and the corresponding robot actions they invoke.

Table 1. Voice instructions and the corresponding robot actions.

  Voice instruction            | Robot action
  “Hiro” (name of the robot)   | Stop the current action
  “Left table”                 | Look at the table on the left
  “More to the left”           | Look at the table next to the left one
  “Right table”                | Look at the table on the right
  “More to the right”          | Look at the table next to the right one
  “Come back”                  | Come back to the initial position
  “Search there”               | Move to the table in front of the robot for search
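The mapping in Table 1 is essentially a small dispatch table. A minimal sketch is shown below; the phrases come from Table 1, while the RobotAction enum and the lookup interface are hypothetical and not the implemented system's API.

```cpp
// Sketch of dispatching recognized voice phrases to robot actions
// (phrases from Table 1; the RobotAction enum and lookup are illustrative).
#include <map>
#include <optional>
#include <string>

enum class RobotAction {
  Stop, LookLeftTable, LookFurtherLeft, LookRightTable, LookFurtherRight,
  ComeBack, SearchThere
};

std::optional<RobotAction> dispatch(const std::string& phrase) {
  static const std::map<std::string, RobotAction> table = {
      {"Hiro",              RobotAction::Stop},
      {"Left table",        RobotAction::LookLeftTable},
      {"More to the left",  RobotAction::LookFurtherLeft},
      {"Right table",       RobotAction::LookRightTable},
      {"More to the right", RobotAction::LookFurtherRight},
      {"Come back",         RobotAction::ComeBack},
      {"Search there",      RobotAction::SearchThere}};
  auto it = table.find(phrase);
  if (it == table.end()) return std::nullopt;   // unrecognized phrase: ignore
  return it->second;
}
```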

4.3 Collaborative search experiments

Fig. 11 shows snapshots of a collaborative object search when the target exists in the scene. After detecting the three tables in the room (Step 1), the robot moved toward the rightmost one. During this movement, the operator found the target on the leftmost table and said “Hiro” to stop the robot (Step 2). The operator then said “left table” and the robot looked at the center table (Step 3). Since the target was on the table to the left of that one (note that the operator could see which table the robot was looking at through the fish-eye camera), the operator further said “more to the left” and the robot looked at the leftmost table (Step 4). The operator then said “search there” and the robot moved to that table (Step 5). As the robot found a candidate, it approached the table (Step 6). Since the candidate was recognized as the target, the robot picked it up (Step 7) and brought it back to the initial position (Step 8).

Fig. 12 shows snapshots of a collaborative object search when the target does not exist in the scene. After detecting the three tables, the robot moved to the rightmost one and further approached it because a candidate was found (Step 1). Recognition using the hand camera failed (Step 2). While the robot was moving to the next table, the operator noticed that there was no target object in the room and said “Hiro” to stop the robot (Step 3). The robot stopped searching and came back to the initial position (Step 4).

4.4 Comparison of automatic and collaborative search

In collaborative search, the operator observes the remote scene from a distant place through the fish-eye camera and a display and, once he finds the target, interrupts the robot and instructs it where to search (or orders it to stop searching). Appropriate advice from the operator keeps the robot from examining tables without targets, thereby reducing the total cost of the search.

We compared the automatic and the collaborative object search in terms of the total search time, the number of tables examined for candidates, and the number of candidates examined for target recognition. Table 2 summarizes the comparison results. The collaborative search is more efficient than the automatic one thanks to the timely advice from the operator to the robot.

5 Conclusions and Future Work

This paper has described a novel type of human-robot collaboration for object search. In analogy to the child-on-the-shoulders case, the operator examines the remote scene through a camera on the robot, by which he can observe the state of the robot as well as the scene, and gives timely advice to the robot. We have implemented an experimental system and shown in several preliminary experiments that the collaborative object search is more efficient than the automatic one.

Fig. 11. Snapshots of an experiment of collaborative object search; the views from the robot are shown on the right at each step. (a) Step 1: detect three tables. (b) Step 2: while the robot moves to the first table, the operator finds the target object and orders it to stop. (c) Step 3: the operator says “left table” and the robot looks at the center table. (d) Step 4: the operator says “more to the left” and the robot shifts its focus to the leftmost one. (e) Step 5: the operator says “search there” and the robot moves to the table. (f) Step 6: detect a candidate and approach the table. (g) Step 7: recognize the target and pick it up. (h) Step 8: go back to the initial position.

The current system deals only with objects on tables. Extending the search space to other locations (e.g., shelves) is desirable; this will require enriching the voice instructions so that various places can be specified. Communication between the operator and the robot could also be more interactive, since the current communication is unidirectional (from operator to robot). The robot may want to actively ask about the probable location of a target object, or ask the operator to examine some place where the robot thinks objects probably exist. Such kinds of interaction, which may also be observed in actual child-father interactions, are expected to make the collaborative search much more efficient.

Fig. 12. Snapshots of another experiment of collaborative object search, in which the target object does not exist in the scene; the views from the robot are shown on the right at each step. (a) Step 1: the robot approaches the first table to recognize the candidate. (b) Step 2: recognition fails. (c) Step 3: while the robot moves to the second table, the operator notices that no target exists and says “Hiro” to stop the robot. (d) Step 4: the robot stops searching and comes back to the initial position.

Table 2. Comparison of automatic and collaborative search.

Case 1: the target object exists in the room.
  method        | search time    | # of tables examined | # of candidates examined
  automatic     | 5 min. 20 sec. | 3                    | 2
  collaborative | 3 min. 33 sec. | 1                    | 1

Case 2: the target object does not exist in the room.
  method        | search time    | # of tables examined | # of candidates examined
  automatic     | 4 min. 47 sec. | 3                    | 2
  collaborative | 3 min. 36 sec. | 1                    | 1

References

1. A.A. Makarenko, S.B. Williams, F. Bourgault, and H.F. Durrant-Whyte. An Experiment in Integrated Exploration. In Proceedings of 2002 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 534–539, 2002.

2. R. Martinez-Cantin, N. de Freitas, R. Brochu, J. Castellanos, and A. Doucet. A Bayesian Exploration-Exploitation Approach for Optimal Online Sensing and Planning with a Visually Guided Mobile Robot. Autonomous Robots, Vol. 27, pp. 93–103, 2009.

3. Y. Ye and J.K. Tsotsos. Sensor Planning for 3D Object Search. Computer Vision and Image Understanding, Vol. 73, No. 2, pp. 145–168, 1999.

4. K. Shubina and J.K. Tsotsos. Visual Search for an Object in a 3D Environment Using a Mobile Robot. Computer Vision and Image Understanding, Vol. 114, pp. 535–547, 2010.

5. F. Saidi, O. Stasse, K. Yokoi, and F. Kanehiro. Online Object Search with a Humanoid Robot. In Proceedings of 2007 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 1677–1682, 2007.

6. A. Aydemir, K. Sjoo, J. Folkesson, A. Pronobis, and P. Jensfelt. Search in the Real World: Active Visual Object Search Based on Spatial Relations. In Proceedings of 2011 IEEE Int. Conf. on Robotics and Automation, pp. 2818–2824, 2011.

7. H. Masuzawa and J. Miura. Observation Planning for Efficient Environment Information Summarization. In Proceedings of 2009 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 5794–5780, 2009.

8. H. Masuzawa and J. Miura. Observation Planning for Environment Information Summarization with Deadlines. In Proceedings of 2010 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 30–36, 2010.

9. M. Boussard and J. Miura. Observation Planning for Object Search by a Mobile Robot with Uncertain Recognition. In Proceedings of the 12th Int. Conf. on Intelligent Autonomous Systems, 2012. F3B.5 (CD-ROM).

10. B. Bonet and H. Geffner. Labeled RTDP: Improving the Convergence of Real-Time Dynamic Programming. In Enrico Giunchiglia, Nicola Muscettola, and Dana S. Nau, editors, Proceedings of the 13th Int. Conf. on Automated Planning and Scheduling (ICAPS-2003), pp. 12–31. AAAI, 2003.

11. T. Fong, C. Thorpe, and C. Baur. Advanced Interfaces for Vehicle Teleoperation: Collaborative Control, Sensor Fusion Displays, and Remote Driving Tools. Autonomous Robots, Vol. 11, pp. 77–85, 2001.

12. S. Suzuki. A Vision System for Remote Control of Mobile Robot to Enlarge Field of View in Horizontal and Vertical. In Proceedings of 2011 IEEE Int. Conf. on Robotics and Biomimetics, pp. 8–13, 2011.

13. K. Saitoh, T. Machida, K. Kiyokawa, and H. Takemura. A 2D-3D Integrated Interface for Mobile Robot Control using Omnidirectional Images and 3D Geometric Models. In Proceedings of 2006 IEEE/ACM Int. Symp. on Mixed and Augmented Reality, pp. 173–176, 2006.

14. N. Shiroma, N. Sato, Y. Chiu, and F. Matsuno. Study on Effective Camera Images for Mobile Robot Teleoperation. In Proceedings of the 13th IEEE Int. Workshop on Robot and Human Interactive Communication, pp. 107–112, 2004.

15. T. Fong, C. Thorpe, and C. Baur. A Safeguarded Teleoperation Controller. In Proceedings of 2001 IEEE Int. Conf. on Advanced Robotics, 2001.

16. T. Sawaragi, T. Shiose, and G. Akashi. Foundations for Designing an Ecological Interface for Mobile Robot Teleoperation. Robotics and Autonomous Systems, Vol. 31, pp. 193–207, 2000.

17. N. Ando, T. Suehiro, and T. Kotoku. A Software Platform for Component Based RT System Development: OpenRTM-aist. In Proceedings of the 1st Int. Conf. on Simulation, Modeling, and Programming for Autonomous Robots (SIMPAR ’08), pp. 87–98, 2008.

18. OpenRTM. http://openrtm.org/openrtm/en/.

19. D.G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. of Computer Vision, Vol. 60, No. 2, pp. 91–110, 2004.

20. Point Cloud Library. http://pointclouds.org/.

21. OpenCV. http://opencv.org/.

22. Y. Ueno, T. Ohno, K. Terashima, H. Kitagawa, K. Funato, and K. Kakihara. Novel Differential Drive Steering System with Energy Saving and Normal Tire using Spur Gear for Omni-directional Mobile Robot. In Proceedings of the 2010 IEEE Int. Conf. on Robotics and Automation, pp. 3763–3768, 2010.

23. MRPT. http://www.mrpt.org/.