Knowledge-based Control of a Humanoid Robot

Dongkyu Choi, Yeonsik Kang, Heonyoung Lim, and Bum-Jae You

Abstract— Today, there is increased interest and effort in using cognitive architectures to control robotic platforms. Recent advances in robots' essential capabilities have contributed to this trend, which aims to meet the growing demand for high-level control mechanisms. Cognitive architectures traditionally aim for general intelligence, and they hold great potential for use in robots that are now increasingly capable of complex tasks. In this paper, we introduce one such architecture, ICARUS, used in a robotic environment to provide knowledge-based control for a humanoid robot, MAHRU. We show some experimental observations in the Blocks World domain.

I. INTRODUCTION

Recently, researchers in the field of robotics have shown increased interest in the use of cognitive architectures [1]. There have been various efforts to use them to control robotic platforms, which now possess the essential capabilities required for interesting high-level tasks. These architectures, designed as computational models of human cognition, commit to a specific knowledge representation and the processes that work over it. They provide mechanisms for inferring the state of the world, making decisions based on the computed state, and applying the actions that are needed. In most cases, they also support learning in various ways, including, but not limited to, learning from problem solving, learning from observations, and learning from failures [2][3][4][5].

One such architecture, ICARUS, has a long history of continuous development and revision, and it is specifically designed for physical domains [3]. Previously, it was used successfully in various domains, which include a simulated Blocks World, the FreeCell solitaire game [3], a simulation of urban driving [6], and a first-person shooter game, Urban Combat [7]. These domains provided challenging environments for the ICARUS architecture, with different levels of complexity. They served as important testbeds on which most of the components and features of the architecture were evaluated and tested. However, there has always been the desire and need for a domain that involves some type of physical machinery.

In this paper, we report an application of the ICARUS architecture in a simulated robotic environment. The simulation is a realistic computational representation of a humanoid robot, MAHRU [8]. As a preliminary step before the actual integration of the architecture and the humanoid, we have devised an interface between ICARUS and the simulation. We selected a familiar but often challenging domain, the Blocks World, for our first set of experiments. As a simplified version of a book shelving task, we created a scenario in which the robot must sort blocks of different colors and stack them into towers of same-colored blocks.

D. Choi is with the Cognitive Systems Laboratory, Center for the Study of Language and Information, Stanford University, Stanford, CA 94305, USA. [email protected] (corresponding author)

Y. Kang and B.-J. You are with the Center for Cognitive Robotics Research, Korea Institute of Science and Technology, Seoul, Korea. [email protected], [email protected]

H. Lim is with the Robust Design Engineering Laboratory, Seoul National University, Seoul, Korea. [email protected]

In the following sections, we first review the fundamentals of the ICARUS architecture and provide more details on the humanoid robot, MAHRU, and its simulation. Then we present some experimental observations in the Blocks World. We conclude after a discussion of related and future work.

II. REVIEW OF ICARUS ARCHITECTURE

The ICARUS architecture is one of the systems that grew out of the cognitive architecture movement. It shares some basic features with other architectures, like Soar [2] and ACT-R [9]. The architecture commits to a specific representation of knowledge, and distinguishes long-term and short-term memories. It has separate memories for its conceptual knowledge that describes the state of the environment, and its skill knowledge that specifies procedures to achieve specific goals. The system provides an organized way to specify domain knowledge that is readily available for execution, and supports various methods for learning to acquire the knowledge automatically.

In this section, we describe ICARUS’ commitment to a certain representation of knowledge and to the memories that store that knowledge. Then we show the processes that work over these memories, including the inference of the current situation, the evaluation of skills, and the execution of actions in the world.

A. Representation and Memories

Based on psychological evidence, ICARUS distinguishes conceptual and skill knowledge, and the architecture has separate memories for them. It further distinguishes long-term and short-term knowledge: a long-term conceptual memory stores definitions of its conceptual knowledge, or concepts, while a short-term conceptual memory, or belief memory, houses instances of these concepts that are true in the current state. Similarly, a long-term skill memory stores definitions of ICARUS’ procedural knowledge, or skills, and a short-term skill memory records instantiated skills. These skill instances are, however, closely tied to the goals they achieve, so the short-term skill memory also serves as a goal memory and stores substantially more information than the skill instances themselves.

TABLE I

SOME SAMPLE CONCEPTS FOR OUR ROBOT, MAHRU, IN THE SIMULATED BLOCKS WORLD DOMAIN.

((holding ?gripper ?block)
 :percepts ((gripper ?gripper status ?block)
            (block ?block)))

((hand-empty ?gripper)
 :percepts ((gripper ?gripper status ?status))
 :tests    ((eq ?status 'empty)))

((clear ?block)
 :percepts  ((block ?block))
 :relations ((not (on ?other ?block))))

Table I shows some sample concepts used in the simulated Blocks World domain. Concept definitions start with a head, which includes the name of the concept and its arguments, which are often variables marked with a preceding question mark, as in ?gripper and ?block. The definitions also have several optional fields, like :percepts, which includes perceptual matching conditions, :tests, which specifies conditions among matched variables, and :relations, which refers to other concepts. The first two concepts shown are primitive, in the sense that they consist solely of variable matchings against objects in the world, like a gripper and a block, and conditions among these variables. The last concept, however, is non-primitive, since it refers to another concept. In this case, the reference is negated, and the concept (clear ?block) matches when the predicate (on ?other ?block) is not true in the current state.
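To make the matching process concrete, the following is a minimal sketch in Python (rather than ICARUS’ own Lisp-style notation) of how a primitive concept such as (hand-empty ?gripper) could be matched against perceived objects. The dictionary-based percept format and the helper function are illustrative assumptions, not part of the actual ICARUS implementation.

# Illustrative sketch only; percepts are modeled as one dictionary per object.
percepts = [
    {"type": "gripper", "id": "g-l", "status": "empty"},
    {"type": "gripper", "id": "g-r", "status": "block3"},
    {"type": "block",   "id": "block3", "color": "red"},
]

def hand_empty_instances(percepts):
    """Return belief instances analogous to (hand-empty ?gripper)."""
    beliefs = []
    for p in percepts:
        # :percepts condition -- the perceived object must be a gripper
        # :tests condition    -- its status attribute must equal 'empty'
        if p["type"] == "gripper" and p["status"] == "empty":
            beliefs.append(("hand-empty", p["id"]))
    return beliefs

print(hand_empty_instances(percepts))   # [('hand-empty', 'g-l')]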

In Table II, we provide some sample skills for MAHRU in this domain. As with concepts, ICARUS’ skills consist of a head and several optional fields. While a :percepts field serves the same purpose as in concepts, a :start field specifies the preconditions of the skill, an :actions field lists the actions it implies in the world, and a :subgoals field includes a subgoal decomposition for the particular skill. The first example shown is a primitive skill, which directly refers to basic actions the ICARUS agent can apply in the environment. The rest of the skills, however, are non-primitive, and they include references to subgoals that, in turn, lead to other skills. In this way, both concepts and skills in the ICARUS architecture are hierarchically organized, and they provide a rich vocabulary to describe the complex states and procedures required by domains in the real world.
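The distinction between primitive and non-primitive skills can be rendered in a few lines of Python, as sketched below. The Skill class and its fields are our own simplification for illustration; ICARUS itself stores skills in the Lisp-style notation of Table II.

# Illustrative sketch only; not the ICARUS representation itself.
from dataclasses import dataclass, field

@dataclass
class Skill:
    head: str                                      # e.g. "(on ?block1 ?block2)"
    skill_id: int
    start: list = field(default_factory=list)      # preconditions (:start)
    actions: list = field(default_factory=list)    # direct actions (:actions)
    subgoals: list = field(default_factory=list)   # decomposition (:subgoals)

    @property
    def primitive(self):
        # A primitive skill acts directly; a non-primitive one decomposes.
        return bool(self.actions) and not self.subgoals

stacked = Skill("(stacked ?block ?to)", 3,
                start=["(stackable ?gripper ?block ?to)"],
                actions=["(*release ?block ?x ?y (+ ?z ?h) pb)"])

on = Skill("(on ?block1 ?block2)", 16,
           subgoals=["(stackable ?gripper ?block1 ?block2)",
                     "(stacked ?block1 ?block2)"])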

The ICARUS architecture also has a goal memory. This memory stores the agent’s top-level goals and their subgoals, with the skill instances it intends to execute, or has executed, for them. The information is stored in the form of goal stacks, as shown in Table III. Here, the top-level goal is (color-sorted), which means that all the blocks are sorted by their colors in towers. The system retrieved a skill instance with id 19 that is known to achieve the goal. Also shown is the execution stack on the particular cycle, which we will discuss later in this section.
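As a rough Python rendering of one such goal-memory entry, the sketch below mirrors the fields that appear in Table III; the dictionary layout is our own simplification and not the internal ICARUS data structure.

# Illustrative rendering of a goal-memory entry (cf. Table III); simplified.
goal_entry = {
    "objective": "(color-sorted)",                                  # top-level goal
    "intention": {"skill": "(color-sorted)", "id": 19, "bindings": None},
    "execution-stack": [                                            # most abstract first
        {"skill": "(color-sorted)", "id": 19, "bindings": {"?color": "green"}},
        {"skill": "(one-color-sorted green)", "id": 17,
         "bindings": {"?block1": "block2", "?block2": "block4", "?color": "green"}},
        # ... continues down to the primitive skill selected on this cycle
    ],
}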

TABLE II

SOME SAMPLE SKILLS FOR OUR ROBOT, MAHRU, IN THE SIMULATED BLOCKS WORLD DOMAIN.

((stacked ?block ?to)                ID: 3
 :percepts ((gripper ?gripper)
            (block ?block)
            (block ?to x ?x y ?y z ?z height ?h))
 :start    ((stackable ?gripper ?block ?to))
 :actions  ((*release ?block ?x ?y (+ ?z ?h) pb)))

((on ?block1 ?block2)                ID: 16
 :percepts ((block ?block1)
            (block ?block2))
 :subgoals ((stackable ?gripper ?block1 ?block2)
            (stacked ?block1 ?block2)))

((one-color-sorted ?color)           ID: 17
 :percepts ((block ?block1 color ?color)
            (block ?block2))
 :start    ((same-color ?block1 ?block2)
            (clear ?block2))
 :subgoals ((on ?block2 ?block1)
            (one-color-sorted ?color)))

((one-color-sorted ?color)           ID: 18
 :percepts ((block ?block1 color ?color)
            (block ?block2)
            (block ?block3))
 :start    ((same-color ?block1 ?block2)
            (same-color ?block1 ?block3)
            (on ?block2 ?block1))
 :subgoals ((clear ?block3)
            (on ?block3 ?block2)
            (one-color-sorted ?color)))

((color-sorted)                      ID: 19
 :start    ((not-color-sorted ?color))
 :subgoals ((one-color-sorted ?color)
            (color-sorted)))

B. Inference, Evaluation, and Execution

The ICARUS architecture runs in distinct cycles. On each cycle, it performs a series of cognitive processes, including the perception of its surroundings, the inference of concept instances based on the perceived data, the evaluation of skills under the current state, and the execution of actions implied by the chosen skill instance. Fig. 1 shows ICARUS’ operation on each cycle. At the beginning of each cycle, ICARUS receives information on the objects it can perceive from the environment. This information is deposited in the perceptual buffer. ICARUS then matches its concepts stored in the long-term conceptual memory against the perceived objects, infers all the concept instances that are true in the current state, and stores them in its belief memory.

Based on the current beliefs, the architecture evaluates its hierarchical skills and finds a skill path that is most relevant to the situation. Shown in the :EXECUTION STACK field in Table III is an example of an executable skill path. In this case, the goal is (color-sorted), and the system retrieves a non-primitive skill with the same name, with id 19. This skill refers to another skill, (one-color-sorted green), and it, in turn, refers to (on block4 block2). This continues until it reaches a primitive skill, (putdown-in-the-center block4 t).

TABLE III

AN EXAMPLE OF A GOAL STACK STORED IN ICARUS’ GOAL MEMORY.

(A SIMPLIFIED VERSION IS SHOWN FOR EASY REFERENCE.)

[(GOAL :CHAINTYPE SKILL :GOALTYPE PRIMARY
  :OBJECTIVE (COLOR-SORTED)
  :INTENTION ((COLOR-SORTED) ID: 19 BINDINGS: NIL
  :EXECUTION STACK#1
   (((COLOR-SORTED) ID: 19
     BINDINGS: ((?COLOR . GREEN))
    ((ONE-COLOR-SORTED GREEN) ID: 17
     BINDINGS: ((?BLOCK1 . BLOCK2) (?BLOCK2 . BLOCK4) (?COLOR . GREEN))
    ((ON BLOCK4 BLOCK2) ID: 16
     BINDINGS: ((?BLOCK1 . BLOCK4) (?BLOCK2 . BLOCK2))
    ((STACKABLE ?GRIPPER BLOCK4 BLOCK2) ID: 12
     BINDINGS: ((?GRIPPER2 . G-R) (?GRIPPER1 . G-L) (?BLOCK . BLOCK4) (?TO . BLOCK2))
    ((REACHABLE-BY-BOTH-ARMS BLOCK4) ID: 14
     BINDINGS: ((?FROM . T) (?BLOCK . BLOCK4) (?GRIPPER . G-L))
    ((PUTDOWN-IN-THE-CENTER BLOCK4 ?TO) ID: 4
     BINDINGS: ((?HEIGHT . 0.89) (?Z . 0.89) (?Y . 0.0) (?X . 0.0)
                (?TO . T) (?GRIPPER . G-L) (?BLOCK . BLOCK4))))]

Fig. 1. A schematic of execution cycles in the ICARUS architecture. Rectangles denote the environment (far right) and ICARUS’ buffers and memories, while oval shapes denote processes that work over them.

This path represents the decision ICARUS made at the moment, with the most abstract skill on top and the most specific one at the bottom.
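To summarize the cycle just described, the following is a minimal Python sketch of the perceive-infer-select-execute loop. The callables passed in (perceive, infer_beliefs, find_skill_path, execute) are placeholders for the architectural processes, and none of the names correspond to the actual ICARUS code.

# Illustrative sketch of ICARUS-style cycles; all names are hypothetical.
def run_cycles(perceive, infer_beliefs, find_skill_path, execute,
               goals, max_cycles=100):
    for cycle in range(1, max_cycles + 1):
        percepts = perceive()                       # fill the perceptual buffer
        beliefs = infer_beliefs(percepts)           # populate the belief memory
        if all(goal in beliefs for goal in goals):
            return cycle                            # all top-level goals achieved
        path = find_skill_path(goals, beliefs)      # most relevant executable path
        if path is not None:
            execute(path[-1])                       # act on the most specific skill
        # The next cycle re-perceives the world, so unexpected outcomes are
        # handled simply by selecting a new skill path under the new beliefs.
    return None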

Once an executable skill path is found, the architecture executes the actions implied by the skill path to change its environment. This, in turn, changes ICARUS’ perception on the next cycle. Then the system continues to the next step of the procedure. It is notable that this cyclic operation gives reactivity to the ICARUS architecture while it stays goal-oriented. Therefore, ICARUS agents can readily adapt to unexpected situations or outcomes. For example, if the robot fails to pick up a block or loses its grip on one, ICARUS will perceive the unexpected outcome in its subsequent cycle and correct the situation through retrials. We will say more about ICARUS’ ability to recover from failures later in this paper. In the next section, we introduce our robot platform, MAHRU, in more detail.

III. A HUMANOID ROBOT: MAHRU

We integrated the ICARUS architecture with a simulation of a humanoid robot, MAHRU, developed at Korea Institute of Science and Technology. The simulated environment is a realistic representation of the actual robot, and many components of the current interface will transfer directly to the eventual integration of ICARUS and MAHRU. In this section, we provide more details on the humanoid platform and its simulation.

A. Hardware Platform

As shown in Fig. 2, MAHRU is a humanoid robot platform that stands 150 cm tall and weighs about 57 kg. It can reach a maximum speed of 1.2 km/h. The robot has 12 degrees of freedom (DOF) in its legs and 20 DOF in its two arms, including 4 DOF in its grippers. The robot has an IMU (inertial measurement unit) sensor mounted on its body, and it is equipped with four force/torque sensors on its arms and legs. For real-time control, the robot runs on a PC with a real-time Linux (RTAI/Xenomai) operating system. It uses communication lines that follow the IEEE 1394 protocol to deliver control signals between the main PC and its DSP-based sub-controllers [10][11].

The robot is specifically designed as a network-based humanoid, which can send various sensor data to external servers over a wireless network to perform high-level recognition, inference, and decision-making. This type of robot costs less than a standalone system and has a longer operating time per battery charge. Furthermore, it provides more flexibility in the features and services the robot can offer.

Fig. 2. A humanoid platform, MAHRU

B. Simulated Platform

We used a 3-D simulation of MAHRU in SimStudio [12], a simulation package with a built-in dynamics engine that solves complex rigid-body contact problems, including collision detection and resolution. The environment provides models for various sensors and actuators, and it allows real-time analysis of the simulated robotic system. Fig. 3 shows a screenshot of the simulation environment.

Using the virtual sensor and actuator simulations in the software, we encoded in the simulator the same motion control algorithm used on the actual robot. Since this algorithm controls all of the low-level robot motions, the simulator can interface with higher-level controllers (in this case, the ICARUS architecture) through various text-based commands [13]. For example, we can trigger a grasping motion by issuing a command like "grab object with id 1", and have the robot walk forward by sending the command "walk forward 1 meter".
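As an illustration of how a higher-level controller can drive this interface, the helper below maps a primitive action to a command string. Only the two example commands quoted above are taken from the simulator; the mapping function itself and any further command grammar are assumptions on our part.

# Illustrative mapping from primitive actions to simulator text commands.
def action_to_command(action, **args):
    """Return a text command for the simulator (hypothetical helper)."""
    if action == "grab":
        return f"grab object with id {args['object_id']}"
    if action == "walk-forward":
        return f"walk forward {args['meters']} meter"
    raise ValueError(f"no command mapping for action {action!r}")

# e.g. action_to_command("grab", object_id=1) -> "grab object with id 1"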

Fig. 3. 3-D dynamic simulation environment

IV. EXPERIMENTAL OBSERVATIONS

We integrated the ICARUS architecture and the MAHRU humanoid robot with a TCP connection between them, unlike some previous work on ICARUS that used shared memory structures [7]. This choice reflects the fact that our robot is designed to be network-based, so high-level control systems for this robot should run on a server connected through a wireless interface. We gave the integrated system some basic tasks in the simulation of the Blocks World. By doing so, we wanted to verify that ICARUS can indeed make high-level decisions for the humanoid in this domain, and that MAHRU can execute these control commands smoothly and achieve its goal.
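The listing below sketches the kind of TCP bridge we have in mind: a client on the control side opens a socket to the simulator, ships one text command, and waits for a one-line reply. The host, port, newline framing, and reply format are assumptions made for illustration; the actual wire protocol of our interface is not specified here.

# Hypothetical TCP client for shipping text commands to the simulator.
import socket

def send_command(command, host="127.0.0.1", port=5000, timeout=5.0):
    """Send one text command and return the simulator's one-line reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall((command + "\n").encode("utf-8"))
        with sock.makefile("r", encoding="utf-8") as reply_stream:
            return reply_stream.readline().strip()

# e.g. send_command("grab object with id 1")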

In the simulated Blocks World, we created a set of tasks with an initial condition in which seven blocks are on the table: two red blocks, three green blocks, and two blue blocks. Table IV shows the tasks we gave our robot, MAHRU, with the corresponding goal encoded for each task and the number of cycles it took until completion when the probability of action failures in the environment was set to zero.

TABLE IV

BLOCKS WORLD TASKS GIVEN TO MAHRU

goal                  task                                                   cycles
(on block0 block1)    build a tower of two red blocks                             5
(on block0 block6)    build a tower of two blocks, requiring a hand change        7
(color-sorted)        build towers of same-colored blocks                        13

The first task is building a tower of two red blocks, block0 and block1, and therefore achieving the goal (on block0 block1). The task requires a simple movement of a block: grasping block0, then moving it onto block1 and releasing it. It took five cycles for the ICARUS-controlled MAHRU to complete this task.

The second task is more complicated, although the goal is to build a similar tower by achieving (on block0 block6). In this case, one block is on the right-hand side of the robot and the other is on its left-hand side. Due to the limited reach of each hand, MAHRU cannot move the first block onto the second with one hand; it must change hands partway through to achieve the goal. Therefore, MAHRU first grasps block0 with its right hand and moves it to the center of the table, where both of its hands can reach. Then it uses its left hand to grasp the block and move it onto block6 on its left side. This task took MAHRU two more cycles, finishing the execution in seven cycles.

The two tasks so far require two different procedures for goals of the form (on ?block1 ?block2). The first procedure simply moves one block on top of another, while the second involves a hand change during the move. In ICARUS, we can have multiple disjunctive skills for a given goal, representing procedures for different preconditions. We will see another example of disjunctive skills below, during our discussion of the third task.
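A minimal sketch of how such disjunctive skills can be chosen is shown below: the candidates indexed under the same goal head are examined in turn, and the first one whose preconditions hold in the current beliefs is selected. The flat representation of beliefs and preconditions is a simplification for illustration, not the ICARUS matching procedure itself.

# Illustrative selection among disjunctive skills that share one goal head.
def choose_skill(candidates, beliefs):
    """Return the id of the first candidate whose preconditions all hold.

    candidates: list of (skill_id, preconditions) pairs for the same goal
    beliefs:    set of ground literals currently believed to be true
    """
    for skill_id, preconditions in candidates:
        if all(p in beliefs for p in preconditions):
            return skill_id
    return None

# For (on ?block1 ?block2), one candidate might apply when a single hand can
# reach both blocks, and another when a hand change is needed in between.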

The last task is also the most complicated, and it is worth discussing in more detail. Fig. 4 shows the sequence of actions ICARUS commanded MAHRU to perform to achieve the color-sorting goal. Here, the robot must build three different towers: one with the two red blocks, another with the three green blocks, and yet another with the two blue blocks. For this task, ICARUS decomposes the top-level goal, (color-sorted), into three different subgoals, which are distinct instances of (one-color-sorted ?color) for the three colors red, green, and blue.

The system then chooses a color at random, and in this case it works on the red blocks first (see 1–3 in Fig. 4). Here, it uses a particular definition of the skill, the one with id 17 shown earlier in Table II. This skill is more general than the other one, with id 18, and its preconditions match when there are at least two blocks of a color and one of them is clear. Once this skill finishes, the system selects the next color, green (see 4–9).

Fig. 4. Screenshots of MAHRU performing the color sorting of blocks.

In this case, the system first uses the same skill as above and builds a green tower of two blocks, with another green block still on the table. Then it uses the skill with id 18, whose more specific preconditions match: there are three green blocks, and two of them are already stacked. After the system successfully completes this procedure, it builds a blue tower in the same way as it did with the red blocks (see 10–12).

As noted earlier, we obtained the results above with the probability of action failures set to zero. When we increased the probability, ICARUS sometimes faced unexpected outcomes, such as failing to grab a block or dropping a block somewhere other than its intended location. In such cases, the architecture reacted to the unexpected situation and persistently attempted to fix the failure, or proceeded with a different skill that is known to work in the new situation. Of course, these failures caused an inevitable increase in the number of cognitive cycles ICARUS needed to complete its tasks. For instance, at a 10% probability of action failures, it took 17.1 cycles on average for ICARUS to complete the color-sorting task, an increase of approximately 30%.

We verified that the ICARUS architecture is able to make high-level decisions in the robotic Blocks World domain even under the possibility of action failures, and that the humanoid, MAHRU, can execute the commands from ICARUS correctly. The physical restrictions of the robot platform add complexity in various dimensions, but the architecture successfully manages to produce valid procedures to achieve the given goals.

V. RELATED AND FUTURE WORK

Recently, we have seen increased interest in cognitive architectures among researchers in robotics and related fields. Cognitive psychologists and researchers in artificial intelligence have developed these architectures as steps toward general intelligence over a relatively long period of time, and they usually provide stable architectural support for robotic applications in many different directions. For instance, Benjamin et al. [15] developed a new architecture specifically for robots based on the Soar architecture. They used the system with a mobile robot designed to interact with people under different circumstances.

Kieras and Meyer [14] used the EPIC architecture for human-computer interaction. Taking advantage of the architecture’s multi-task performance, they showed that modeling human-computer interaction in EPIC results in precise computational models. The EPIC architecture successfully simulates human behavior, including visual and auditory processes.

Meanwhile, other researchers have developed their own architectures for the cognitive control of robots. Kawamura et al. [16] developed a humanoid robot, ISAC, and extended the system into a cognitive architecture. The architecture was multiagent-based, with attention, memory, and internal rehearsal happening in parallel.

Kasderidis [17] introduced the GNOSYS architecture as an execution system for robotic agents. It provides many features found in traditional cognitive architectures, including execution of multiple goals, dynamic prioritization, and opportunistic behavior.

Although our current work shows some promising results, our research on using the ICARUS architecture with the robotic platform is still at a preliminary stage. We will continue this work by replacing the simulation with the actual robot, MAHRU, and testing the integrated system in the Blocks World. Then we will attempt to use more of the capabilities supported by the ICARUS architecture in the robotic environment, especially learning from problem solving and learning from observations. We would also like to use more of the features our robot has. For example, the robot’s capability to move around on its legs will open up new possibilities for its applications, and it will certainly challenge the ICARUS architecture’s ability to control more complicated behaviors. We plan to find another domain that involves all of these additional capabilities, which are readily available in our integrated system. We hope to report more on this topic in the near future.

VI. CONCLUSIONS

In this paper, we integrated a cognitive architecture, ICARUS, and a humanoid robot, MAHRU. We tested the integrated system in a robotic Blocks World domain, and it successfully performed tasks at different levels of complexity, which included a task of building towers of same-colored blocks from an initial condition with blocks of three different colors. Although the integration still used a simulation of the actual robot, the simulated environment provided a realistic representation of the platform with an adjustable probability of action failures, to which ICARUS reacted properly. Therefore, we are confident that the overall system will work smoothly with the actual robot, and we hope to report on that integration in the near future.

VII. ACKNOWLEDGMENTS

This research was funded in part by an individual research contract for Project No. 2E20870 from Korea Institute of Science and Technology. Discussions with Pat Langley, Chunki Park, and Negin Nejati contributed to many ideas presented here.

REFERENCES

[1] A. Newell, Unified Theories of Cognition, Harvard University Press, Cambridge, MA; 1990.
[2] J.E. Laird, P.S. Rosenbloom, and A. Newell, Chunking in Soar: The anatomy of a general learning mechanism, Machine Learning, vol. 1, 1986, pp. 11–46.
[3] P. Langley, D. Choi, and S. Rogers, Acquisition of hierarchical reactive skills in a unified cognitive architecture, Cognitive Systems Research, vol. 10, 2009, pp. 316–332.
[4] R.M. Jones and P. Langley, A Constrained Architecture for Learning and Problem Solving, Computational Intelligence, vol. 21, 2005, pp. 480–502.
[5] N. Nejati, P. Langley, and T. Konik, "Learning Hierarchical Task Networks by Observation", in Proceedings of the Twenty-Third International Conference on Machine Learning, Pittsburgh, PA, 2006.
[6] D. Choi, M. Morgan, C. Park, and P. Langley, "A testbed for evaluation of architectures for physical agents", in Proceedings of the AAAI-2007 Workshop on Evaluating Architectures for Intelligence, Vancouver, BC, 2007.
[7] D. Choi, T. Konik, N. Nejati, C. Park, and P. Langley, "Structural transfer of cognitive skills", in Proceedings of the Eighth International Conference on Cognitive Modeling, Ann Arbor, MI, 2007.
[8] B.-J. You, Y. Choi, M. Jeong, D. Kim, Y. Oh, C. Kim, J. Cho, M. Park, and S. Oh, "Network-based humanoids 'Mahru' and 'Ahra'", in Proceedings of the International Conference on Ubiquitous Robots and Ambient Intelligence, 2005, pp. 376–379.
[9] J.R. Anderson, Rules of the Mind, Lawrence Erlbaum, Hillsdale, NJ; 1993.
[10] S. Kim, C. Kim, and J.H. Park, "Human-like arm motion generation for humanoid robots using motion capture database", in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2006.
[11] Y.-H. Chang, Y. Oh, D. Kim, and S. Hong, "Balance control in whole body coordination framework for biped humanoid robot MAHRU-R", in Proceedings of the IEEE International Symposium on Robot and Human Interactive Communication, 2008.
[12] SimLab Co., Ltd., "SimStudio2 User Manual (English, v2.4.1)", http://www.simstudio.co.kr/english/.
[13] H. Lim, Y. Kang, J. Lee, J. Kim, and B.-J. You, "Multiple humanoid cooperative control system for heterogeneous humanoid team", in Proceedings of the IEEE International Symposium on Robot and Human Interactive Communication, 2008.
[14] D.E. Kieras and D.E. Meyer, An overview of the EPIC architecture for cognition and performance with application to human-computer interaction, Human Computer Interaction, vol. 12, 1997, pp. 391–438.

[15] D.P. Benjamin, D. Lonsdale, and D. Lyons, "A cognitive robotics approach to comprehending human language and behaviors", in Proceedings of the Conference on Human-Robot Interaction, 2007.
[16] K. Kawamura, S.M. Gordon, P. Ratanaswasd, E. Erdemir, and J.F. Hall, Implementation of cognitive control for a humanoid robot, International Journal of Humanoid Robotics, vol. 5, 2008, pp. 547–586.
[17] S. Kasderidis, "An executive system for cognitive agents", in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2008.
