Toward a New Generation of Virtual Humans for Interactive Experiences

Jeff Rickel and Stacy Marsella, USC Information Sciences Institute
Jonathan Gratch, Randall Hill, David Traum, and William Swartout, USC Institute for Creative Technologies

Imagine yourself as a young lieutenant in the US Army on your first peacekeeping mission. You must help another group, called Eagle1-6, inspect a suspected weapons cache. You arrive at a rendezvous point, anxious to proceed with the mission, only to see your platoon sergeant looking upset as smoke rises from one of your platoon's vehicles and a civilian car. A seriously injured child lies on the ground, surrounded by a distraught woman and a medic from your team. You ask what happened and your sergeant reacts defensively. He casts an angry glance at the mother and says, "They rammed into us, sir. They just shot out from the side street and our driver couldn't see them." Before you can think, an urgent radio call breaks in: "This is Eagle1-6. Where are you guys? Things are heating up. We need you here now!" From the side street, a CNN camera team appears. What do you do now, lieutenant?

Interactive virtual worlds provide a powerful medium for entertainment and experiential learning. Army lieutenants can gain valuable experience in decision-making in scenarios like the example above. Others can use the same technology for entertaining role-playing even if they never have to face such situations in real life. Similarly, students can learn about, say, ancient Greece by walking through its virtual streets, visiting its buildings, and interacting with its people. Scientists and science fiction fans alike can experience life in a colony on Mars long before the required infrastructure is in place. The range of worlds that people can explore and experience with virtual-world technology is unlimited, ranging from factual to fantasy and set in the past, present, or future.

Our goal is to enrich such worlds with virtual humans: autonomous agents that support face-to-face interaction with people in these environments in a variety of roles, such as the sergeant, medic, or even the distraught mother in the example above. Existing virtual worlds, such as military simulations and computer games, often incorporate virtual humans with varying degrees of intelligence. However, these characters' ability to interact with human users is usually very limited: Typically, users can shoot at them and they can shoot back. Those characters that support more collegial interactions, such as in children's educational software, are usually very scripted and offer human users no ability to carry on a dialogue. In contrast, we envision virtual humans that cohabit virtual worlds with people and support face-to-face dialogues situated in those worlds, serving as guides, mentors, and teammates.

Although our goals are ambitious, we argue here that many key building blocks are already in place. Early work on embodied conversational agents1 and animated pedagogical agents2 has laid the groundwork for face-to-face dialogues with users. Our prior work on Steve3,4 (see Figure 1) is particularly relevant. Steve is unique among interactive animated agents because it can collaborate with people in 3D virtual worlds as an instructor or teammate. Our goal with Steve is to integrate the latest advances from separate research communities into a single agent architecture. While we continue to add more sophistication, we have already implemented a core set of capabilities and applied it to the Army peacekeeping example described earlier.

Virtual humans, autonomous agents that support face-to-face interaction in a variety of roles, can enrich interactive virtual worlds. Toward that end, the Mission Rehearsal Exercise project involves an ambitious integration of core technologies centered on a common representation of task knowledge.




Mission Rehearsal Exercise

Steve supports many capabilities required for face-to-face collaboration with people in virtual worlds. Like earlier intelligent tutoring systems, he can help students by providing feedback on their actions and by answering questions, such as "What should I do next?" and "Why?" However, because Steve has an animated body and cohabits the virtual world with students, he can interact with them in ways that previous disembodied tutors could not. For example, he can lead students around the virtual world, demonstrate tasks, guide their attention through his gaze and pointing gestures, and play the role of a teammate whose activities they can monitor.

Steve's behavior is not scripted. Rather, Steve consists of a set of general, domain-independent capabilities operating over a declarative representation of domain tasks. We can apply Steve to a new domain simply by giving him declarative knowledge of the virtual world (its objects, their relevant simulator state variables, and their spatial properties) and the tasks that he can perform in that world. We give task knowledge to Steve using a relatively standard hierarchical plan representation. Each task model consists of a set of steps (each a primitive action or another task), a set of ordering constraints on those steps, and a set of causal links. The causal links describe each step's role in the task; each link specifies that one step achieves a particular goal that is a precondition for a second step (or for the task's termination). Steve's general capabilities use such knowledge to construct a plan for completing a task from any given state of the world, revise the plan when the world changes unexpectedly, and maintain a collaborative dialogue with his student and teammates.
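To make this representation concrete, the sketch below shows one way such a hierarchical task model could be encoded. The class names and the secure-area example fields are illustrative assumptions, not the project's actual data structures.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Step:
    """A primitive action or a reference to another task."""
    name: str
    is_primitive: bool = True

@dataclass
class CausalLink:
    """Step `producer` achieves `goal`, a precondition of `consumer`
    (or of task termination when consumer is None)."""
    producer: str
    goal: str
    consumer: Optional[str]

@dataclass
class TaskModel:
    name: str
    steps: List[Step] = field(default_factory=list)
    ordering: List[Tuple[str, str]] = field(default_factory=list)  # (before, after)
    causal_links: List[CausalLink] = field(default_factory=list)

# Illustrative fragment of the peacekeeping domain (names are hypothetical).
secure_area = TaskModel(
    name="secure-area",
    steps=[Step("announce-to-squads"), Step("secure-12-4"), Step("secure-4-8")],
    ordering=[("announce-to-squads", "secure-12-4"),
              ("announce-to-squads", "secure-4-8")],
    causal_links=[CausalLink("secure-12-4", "sector-12-4-secure", None),
                  CausalLink("secure-4-8", "sector-4-8-secure", None)],
)
print(secure_area.name, len(secure_area.steps), "steps")
```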

Steve's capabilities are well suited for training people on complex, well-defined tasks, such as equipment operation and maintenance. However, virtual worlds like the peacekeeping example introduce new requirements. To create a more engaging and emotional experience for users, agents must have realistic bodies, emotions, and distinct personalities. To support more flexible interaction with users, agents must use sophisticated natural language capabilities. Finally, to provide more realistic perceptual capabilities and limitations for dynamic virtual worlds, agents such as Steve need a humanlike model of perception.

We are addressing these requirements in an ambitious new project called the Mission Rehearsal Exercise (MRE).5 In addition to extending Steve with several new domain-independent capabilities, we implemented the peacekeeping scenario as an example application to guide our research. Figure 2 shows a screen shot from our current implementation. The system displays the visual scene on an eight-foot-tall screen that wraps around the user in a 150-degree arc with a 12-foot radius. Immersive audio software uses 10 audio channels and two subwoofer channels to envelop a participant in spatialized sounds that include general ambience (such as crowd noise) and triggered effects (such as explosions or helicopter flyovers). We render the graphics, including static scene elements and special effects, with Multigen/Paradigm's Vega. The simulator itself, or a human operator using a graphical interface, triggers external events, such as radio transmissions from Eagle1-6, a medevac helicopter, and a command center.

Three Steve agents interact with the human user (who plays the role of lieutenant): the sergeant, the medic, and the mother. All other virtual humans (a crowd of locals and four squads of soldiers) are scripted characters implemented in Boston Dynamics' PeopleShop. The lieutenant talks with the sergeant to assess the situation, issue orders (which the sergeant carries out through four squads of soldiers), and ask for suggestions. The lieutenant's decisions influence the way the situation unfolds, culminating in a glowing news story praising the user's actions or a scathing news story exposing decision flaws and describing their sad consequences.

This sort of interactive experience clearly has both entertainment and training applications. The US Army is well aware of the difficulty of preparing officers to face such dilemmas in foreign cultures under stressful conditions. By training in immersive, realistic worlds, officers can gain valuable experience. The same technology could power a new generation of games or educational software, letting people experience exciting adventures in roles that are richer and more interactive than those that current software supports.

In the remainder of this article, we describe the key areas in which we are extending Steve. Although there has been extensive prior research in each of these areas, we cannot simply plug in different modules representing the state of the art from the different research communities. Researchers developed the state of the art in each area independently from the others, so our fundamental research challenge is to understand the dependencies among them. Our integration revolves around a common representation for task knowledge. In addition to providing a hub for integration, this approach provides generality: By changing the task knowledge, we can apply our virtual humans to new virtual worlds in different domains.

Figure 1. Steve, an interactive agent that functions as a collaborative instructor or teammate in a virtual world, describes a power light.

Virtual human bodies

Research in computer graphics has made great strides in modeling human body motion. Most relevant to our objectives is the research focusing on real-time control of human figures. Within that area, some work uses forward and inverse kinematics to synthesize body motions dynamically so that they can achieve desired end positions for body parts while avoiding collisions with objects along the way. Other work focuses on dynamically sequencing motion segments created with key-frame animation or motion capture. This approach achieves more realistic body motions at the expense of flexibility. Both approaches have reached a sophisticated level of maturity, and much current research focuses on combining them to achieve both realism and flexibility.

We designed Steve to accommodate different bodies. His motor-control module accepts abstract motor commands from his cognition module and sends detailed commands to his body through a generic API. Integrating a new body into a Steve agent simply requires adding a layer of code to map that API onto the body's API. Steve's original body, developed by Marcus Thibaux at the USC Information Sciences Institute, generates all motions dynamically using an efficient set of algorithms. However, the body does not have legs (Steve moves by floating around), and its face has a limited range of expressions that do not support synchronizing lip movements to speech. Furthermore, Steve's repertoire of arm gestures is limited to pointing. By integrating a more advanced body onto Steve, we can achieve more realistic motion with little or no modification to Steve's other modules.

For MRE, we integrated new bodies developed by Boston Dynamics Incorporated, as shown in Figure 2. The bodies use the human figure models and animation algorithms from PeopleShop, but BDI extended them in several ways to suit our needs. First, BDI integrated new faces (developed by Haptek Incorporated) to support lip-movement synchronization and facial expressions. Second, while the basic PeopleShop software primarily supports dynamic sequencing of primitive motion fragments, BDI combined their motion-capture approach with procedural animation to provide more flexibility, primarily in the areas of gaze and arm gestures. The new gaze capability lets Steve direct his body to look at any arbitrary object or point in the virtual world. In addition to specifying the gaze target, we can specify the manner of the gaze, including the speed of different body parts (eyes, head, and torso) and the degree to which they orient toward the target.
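As a rough illustration of the kind of abstract motor command the cognition module might send through such an API, the sketch below defines a hypothetical gaze request; the field names and the adapter class are our own invention, not the actual Steve or PeopleShop interfaces.

```python
from dataclasses import dataclass

@dataclass
class GazeCommand:
    """Abstract gaze request: where to look and in what manner."""
    target: str                   # object name or 3D point in the world
    eye_speed: float = 1.0        # relative speeds of each body part
    head_speed: float = 0.7
    torso_speed: float = 0.3
    head_alignment: float = 1.0   # 0..1: how fully the head orients to the target
    torso_alignment: float = 0.5  # 0..1: how fully the torso follows

class BodyAdapter:
    """Maps abstract motor commands onto a specific body's animation API."""
    def gaze(self, cmd: GazeCommand) -> None:
        # A real adapter would translate this into calls on the body's own
        # gaze controller; here we only report what would be requested.
        print(f"look at {cmd.target}: head {cmd.head_alignment:.0%}, "
              f"torso {cmd.torso_alignment:.0%}")

BodyAdapter().gaze(GazeCommand(target="injured-child", torso_alignment=0.2))
```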

The gesture capability uses an interesting hybrid of motion capture and procedural animation. Motion capture helped create the basic repertoire of gestures. However, we decompose gestures into stages (such as preparation, stroke, and retraction), and can change the timing of each gesture stage to achieve different effects. Moreover, gestures typically have two extremes: a small, restrained version of the gesture and a large, emphatic version. A Steve agent can dynamically generate any gesture between these extremes by specifying a weighted combination of the two. Thus, these new bodies leverage the realism of motion capture while providing the flexibility of procedural animation.
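The weighted combination can be pictured as a per-joint interpolation between the two captured extremes. The sketch below is a schematic reading of that idea under our own simplified pose representation, not the BDI implementation.

```python
def blend_gesture(restrained: dict, emphatic: dict, weight: float) -> dict:
    """Interpolate per-joint poses between a restrained and an emphatic
    version of the same gesture. weight=0 gives the restrained extreme,
    weight=1 the emphatic extreme."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return {joint: (1.0 - weight) * restrained[joint] + weight * emphatic[joint]
            for joint in restrained}

# Hypothetical single-angle-per-joint poses (degrees) for a pointing stroke.
restrained = {"shoulder": 20.0, "elbow": 35.0, "wrist": 5.0}
emphatic = {"shoulder": 70.0, "elbow": 10.0, "wrist": 15.0}
print(blend_gesture(restrained, emphatic, weight=0.4))
```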

Task-oriented dialogue

Spoken dialogue is crucial for collaboration. Students must be able to ask their virtual human instructors a wide range of questions. Teammates must communicate to coordinate their activities, including giving commands and requests, asking for and offering status reports, and discussing options. Without spoken-dialogue capabilities, virtual humans cannot fully collaborate with people in virtual worlds.

Steve used commercial speech recognition and synthesis products to communicate with human students and teammates. However, like many virtual humans, it had no true natural-language-understanding capabilities and understood only a relatively small set of preselected phrases. There are two problems with this approach that make it impractical for ambitious applications such as MRE. First, it is very labor intensive to add broad coverage, since each phrase and semantic representation must be individually added. For example, in a domain such as the MRE, the same concept (such as the third squad) could be referred to in multiple ways (such as "third squad," "where are they," "what are they doing"). It would be very cumbersome to add a separate phrase for each referring expression and predicate about the concept. Moreover, adding a new similar concept (such as "fourth squad") requires duplicating all this effort. In contrast, a grammar-based approach allows one to achieve the same coverage by merely adding individual lexical entries to an existing grammar. In addition to increasing efficiency, this approach can help achieve better understanding when the system recognizes only parts of a complete phrase.

The second problem with using a phrase-based speech recognizer without full natural-language dialogue capabilities is that all the recognition is done by the speech recognizer itself rather than by modules that have access to the evolving task and dialogue context. A rich task model, such as the one Steve uses, can help choose a more sensible interpretation than a speech recognizer alone.

Figure 2. An interactive peacekeeping scenario featuring (from left to right) a sergeant, a mother, and a medic.

For these reasons, in the MRE project we extended the spoken dialogue capabilities to include

• A domain-specific finite-state speech recognizer with a vocabulary of several hundred words, allowing recognition of thousands of distinct utterances
• A finite-state semantic parser that produces partial semantic representations of information expressed in text strings
• A dialogue model that explicitly represents aspects of the social context6,7 and is oriented toward multiple participants and face-to-face communication8
• A dialogue manager that recognizes dialogue acts from utterances, updates the dialogue model, and selects new content for the system to say
• A natural language generator that can produce nuanced English expressions, depending on the agent's personality and emotional state as well as the selected content9
• An expressive speech synthesizer capable of speaking in different voice modes depending on factors such as proximity (speaking or shouting) and illocutionary force (command or normal speech)

These new components, along with Steve's task model, give our agents a very rich and flexible dialogue capability: The agent can answer questions and perform or negotiate about directives following user- or mixed-initiative strategies rather than forcing the user to follow a strict system-guided script.
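Read end to end, these components form a pipeline from the user's speech to the agent's reply. The sketch below strings together stand-in functions for each stage to show the data flow only; every function body here is a placeholder of our own, not the actual MRE modules.

```python
def recognize_speech(audio: bytes) -> str:
    """Stand-in for the finite-state speech recognizer."""
    return "secure the area"

def parse_semantics(text: str) -> dict:
    """Stand-in for the finite-state semantic parser (partial representation)."""
    return {"act": "command", "action": "secure", "object": "area"}

def manage_dialogue(frame: dict, state: dict) -> dict:
    """Stand-in for the dialogue manager: recognize the dialogue act,
    update the dialogue model, and select content to say."""
    state.setdefault("obligations", []).append(frame)
    return {"act": "accept", "addressee": "lt"}

def generate_text(content: dict, personality: str) -> str:
    """Stand-in for the natural language generator."""
    return "Yes sir." if personality == "formal" else "On it."

def synthesize(text: str, mode: str = "normal") -> bytes:
    """Stand-in for the expressive speech synthesizer."""
    return text.upper().encode() if mode == "shout" else text.encode()

state: dict = {}
reply = generate_text(manage_dialogue(parse_semantics(recognize_speech(b"")), state),
                      personality="formal")
print(synthesize(reply))
```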

Dialogue behavior and reasoning make crucial use of Steve's task model, both for determining the full intent of an underspecified command and for deciding what to do next in a given situation. Consider the task model fragment in Figure 3, which is annotated with a sequence of dialogue actions, dialogue state updates, and the sergeant's dialogue goals. The task model encodes social information for team tasks, including both who is responsible for the task (R) and who has authority to allow the task to happen (A). In this task model, the lieutenant has the responsibility for rendering aid. However, some subtasks, such as securing the local area, can be delegated to the sergeant, who, in turn, can delegate subtasks to squad leaders.

In Figure 3, the sergeant's task focus is initially on the Render Aid task. When the lieutenant issues the command to secure the area (utterance U11), the sergeant recognizes the command as referring to a subaction of Render Aid in the current task model (Task 2). As a direct effect of the lieutenant issuing a command to perform this task, the lieutenant becomes committed to the task, the sergeant has an obligation to perform the task, and the task becomes authorized. Because the sergeant already agrees that this is an appropriate next step, he is able to accept it with utterance U12, which also commits him to perform the action. The sergeant then pushes this task into his task model focus and begins plans to carry it out. In this case, because it is a team task requiring actions of other teammates, the sergeant, as team leader, must announce the task to the other team members. Thus, the system forms a communicative goal to make this announcement. Before the sergeant can issue this announcement, he must make sure he has the squad leaders' attention and has them engaged in conversation. He forms a goal to open a new conversation so that he can produce the announcement. Then his focus can turn to the individual tasks for each squad leader. As each one enters the sergeant's focus, he issues the command that commits the sergeant and authorizes the troops to carry it out. When the troops move into action, it signals that they understood the sergeant's order and adopted his plan. When the task completes, the conversation between sergeant and squad leaders finishes and the sergeant turns his attention to other matters.
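To illustrate the bookkeeping this walkthrough describes, the sketch below applies the effects of a command and an acceptance to a small dialogue state. The rule bodies and names are simplified paraphrases of the behavior described above, not the project's dialogue manager.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

@dataclass
class DialogueState:
    committed: Set[Tuple[str, str]] = field(default_factory=set)    # (agent, task)
    obligations: Set[Tuple[str, str]] = field(default_factory=set)  # (agent, task)
    authorized: Set[str] = field(default_factory=set)
    focus: List[str] = field(default_factory=list)                  # task focus stack

def on_command(state: DialogueState, speaker: str, addressee: str, task: str) -> None:
    """Effect of an order: speaker commits, addressee is obliged, task is authorized."""
    state.committed.add((speaker, task))
    state.obligations.add((addressee, task))
    state.authorized.add(task)

def on_accept(state: DialogueState, speaker: str, task: str) -> None:
    """Effect of acceptance: speaker commits and pushes the task into focus."""
    state.committed.add((speaker, task))
    state.focus.append(task)

state = DialogueState(focus=["render-aid"])
on_command(state, "lt", "sgt", "secure-area")   # U11: "secure the area"
on_accept(state, "sgt", "secure-area")          # U12: "yes sir"
print(state)
```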

Figure 3. A task model fragment with dialogue-related behavior.

Emotions

It is hard to imagine an entertaining or compelling character that doesn't express emotion. Skilled animators use emotional behaviors to create a sense of empathy and drama and to fill their creations with a rich mental life. The growth in learning games builds on the theory that entertainment value translates into greater student enthusiasm for instruction and better learning. But beyond creating a sense of engagement, emotion appears to play a central role in teaching. Tutors frequently convey emotion to motivate or reprimand students.

Furthermore, unemotional agents such as the original version of Steve are simply unrealistically rational as teammates. In many training situations like the MRE, students must learn how their teammates are likely to react under stress, because learning to monitor teammates and adapt to their errors is an important aspect of team training. So, Steve's lack of emotions hampers his performance as an instructor and teammate, and it makes him less engaging for interactive entertainment applications.

Fortunately, research on computational emotion models has exploded in recent years. Some of that work is particularly well suited to the type of task reasoning that forms the basis of the Steve system.10,11 We integrated these models with Steve and significantly broadened their scope.12,13 This work is motivated by psychological theories of emotion that emphasize the relationship between emotions, cognition, and behavior.

Figure 4 illustrates the basic model. The appraisal process at the emotional model's core results in an emotional state that changes in response to changes in the environment or to changes in an agent's beliefs, desires, or intentions. Verbal and nonverbal cues manifest this emotional state through facial displays, gestures, and other kinds of body language, such as fidgeting, gaze aversion, or shoulder rubbing. But emotions don't serve merely to modulate surface behavior. They are also powerful motivators. People typically cope with emotions by acting on the world or by acting internally to change their goals or beliefs. For example, the sergeant's defensive response at the beginning of this article is a classic example of emotion-focused coping: shifting blame in response to strong negative emotions. The agent's coping mechanism models this kind of behavior by forming intentions to act or altering internal beliefs and desires in response to strong emotions.

Figure 5 illustrates some details of the appraisal process, which characterizes events in terms of several features (such as valence, intensity, and responsibility) that are mapped to specific emotions. For example, an action in the world that threatens an agent's goals would cause an agent to appraise the action as undesirable. If the action might occur in the future, an agent would appraise it as an unconfirmed threat, which would be subsequently mapped to fear. If the action has already occurred, the confirmed threat would be mapped to distress. Figure 5 shows a task representation from the mother's perspective in the peacekeeping scenario. She wants her child to be healthy and believes that will occur if the troops stay and treat the child. This potentially beneficial action leads to an expression of hope. If the troops leave, their leaving threatens her plans and leads to appraisals of distress and anger.
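A minimal reading of this mapping, using made-up feature names and a toy rule table rather than the project's actual appraisal model, might look like the following.

```python
from dataclasses import dataclass

@dataclass
class Appraisal:
    desirable: bool      # does the event further or threaten the agent's goals?
    confirmed: bool      # has the event already occurred?
    other_responsible: bool = False  # is another agent appraised as to blame?

def map_to_emotion(a: Appraisal) -> str:
    """Toy appraisal-to-emotion rules following the examples in the text.
    The desirable-and-confirmed case ("joy") is our assumption."""
    if a.desirable:
        return "joy" if a.confirmed else "hope"
    if not a.confirmed:
        return "fear"                                   # unconfirmed threat
    return "anger" if a.other_responsible else "distress"  # confirmed threat

# The mother's perspective: troops treating the child vs. troops leaving.
print(map_to_emotion(Appraisal(desirable=True, confirmed=False)))                          # hope
print(map_to_emotion(Appraisal(desirable=False, confirmed=True, other_responsible=True)))  # anger
```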

In contrast, we can look at coping as the inverse of appraisal. For example, to discharge a strong emotion about some situation, one obvious strategy would be to change one or more of the appraised factors that contribute to the emotion. Coping operates on the same representations as the appraisals (the agent's beliefs, goals, and plans), but in reverse, to make a direct or indirect change that would have a desirable impact on the original appraisal. For example, the sergeant feels distress because he is potentially responsible for an action with an undesirable outcome (the boy is injured). He could cope with the stress by reversing the undesirable outcome (perhaps by forming an intention to help the boy) or by shifting blame. Currently, we use a crude personality-trait model to assert preferences over alternative coping strategies. We are working on extending this model.
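Treating coping as the inverse of appraisal suggests enumerating changes to the appraised factors and letting a personality profile rank them. The strategies and weights below are illustrative stand-ins for the crude personality-trait model mentioned above, not its actual contents.

```python
def candidate_coping_strategies(appraisal: dict):
    """Enumerate ways to alter the factors behind a distressing appraisal.
    `appraisal` is a dict like {"outcome": "boy-injured", "self_responsible": True}."""
    strategies = []
    if appraisal.get("self_responsible"):
        # Emotion-focused coping: change the belief about responsibility.
        strategies.append(("shift-blame", {"self_responsible": False}))
    # Problem-focused coping: form an intention that reverses the outcome.
    strategies.append(("form-intention-to-repair",
                       {"planned_action": "treat-" + appraisal["outcome"]}))
    return strategies

def choose_strategy(strategies, personality: dict):
    """Pick the strategy the personality profile weights most highly."""
    return max(strategies, key=lambda s: personality.get(s[0], 0.0))

defensive_sergeant = {"shift-blame": 0.8, "form-intention-to-repair": 0.4}
appraisal = {"outcome": "boy-injured", "self_responsible": True}
print(choose_strategy(candidate_coping_strategies(appraisal), defensive_sergeant))
```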

Appraisal and coping work together to create dynamic external behavior: appraisal leads to coping behaviors that in turn lead to a reappraisal of the agent-environment relationship. This model of appraisal, coping, and reappraisal begins to approach the richness and subtle dynamics necessary to create entertaining and engaging characters.

Figure 4. The model of emotion, in which an agent appraises the emotional significance of events in terms of their impact on the agent's goals and plans.

Figure 5. An emotional appraisal that builds on Steve's explicit task models. The appraisal mechanism assesses the relationship between the events and the agent's goals.

Humanlike perception

One obstacle to creating believable interaction with a virtual human is omniscience. When human participants realize that a character in a computer game or a virtual world can see through walls or instantly knows events that have occurred well outside its perceptual range, they lose the illusion of reality and become frustrated with a system that gives the virtual humans an unfair advantage. Many current applications of virtual humans finesse the issue of how to model perception and spatial reasoning. Steve's original version was no exception. Steve was omniscient; he received messages from the virtual world simulator describing every state change relevant to his task model, regardless of his current location or attention state. Without a realistic model of human attention and perception, we had no principled basis to limit Steve's access to these state changes.

Recent research might provide that principled basis. Randall Hill, for example, has developed a model of perceptual resolution based on psychological theories of human perception.14,15 Hill's model predicts the level of detail at which an agent will perceive objects and their properties in the virtual world. He applied his model to synthetic fighter pilots in simulated war exercises. Complementary research by Sonu Chopra-Khullar and Norman Badler provides a model of visual attention for virtual humans.16 Their work, which is also based on psychological research, specifies the types of visual attention required for several basic tasks (such as locomotion, object manipulation, or visual search), as well as the mechanisms for dividing attention among multiple tasks.

We have begun to put these principles into practice, beginning with making Steve's model of perception more realistic. We implemented a model that simulates many of the limitations of human perception, both visual and aural, and limited Steve's simulated visual perception to 190 horizontal degrees and 90 vertical degrees. The level of detail Steve perceives about objects is high, medium, or low, depending on where the object is in Steve's field of view and whether Steve is giving attention to it. Steve can perceive both dynamic and static objects in the environment. Steve perceives dynamic objects, under the control of a simulator, by filtering updates that the system periodically broadcasts.
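A rough sketch of such a filter appears below. The overall field-of-view limits come from the text; the angular cutoffs between high, medium, and low detail are chosen only for illustration and are not the implemented values.

```python
import math

FOV_HORIZONTAL = 190.0  # degrees, from the article
FOV_VERTICAL = 90.0

def perceived_detail(h_offset_deg: float, v_offset_deg: float, attending: bool):
    """Return 'high', 'medium', 'low', or None for an object at the given angular
    offsets from the agent's gaze direction. Thresholds are illustrative only."""
    if abs(h_offset_deg) > FOV_HORIZONTAL / 2 or abs(v_offset_deg) > FOV_VERTICAL / 2:
        return None                      # outside the field of view
    eccentricity = math.hypot(h_offset_deg, v_offset_deg)
    if attending and eccentricity < 15:
        return "high"                    # near the gaze direction and attended
    if eccentricity < 60:
        return "medium"
    return "low"                         # far periphery

print(perceived_detail(5, 2, attending=True))     # high
print(perceived_detail(80, 10, attending=False))  # low
print(perceived_detail(120, 0, attending=False))  # None: beyond the FOV limit
```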

The simulator does not represent some objects, such as buildings and trees, in the same way that it represents dynamic objects, which means Steve won't perceive them in the same manner that he perceives dynamic objects. Instead, Steve perceives the locations of buildings, trees, and other static objects by using the scene graph and an edge-detection algorithm to determine the locations of these objects. The system encodes this information in a cognitive map along with the locations of exits, which Steve can infer using a space-representation algorithm.17 We will eventually use the cognitive map for way-finding and other spatial tasks.18

We model aural perception by estimating the sound pressure levels of objects in the environment and determining their individual and cumulative effects on each listener on the basis of the distances and directions of the sources. This lets the agents perceive aural events involving objects not in the visual field of view. For example, the sergeant can perceive that a vehicle is approaching from behind, prompting him to turn around and look to see whether it is the lieutenant. Another effect of modeling aural perception is that some sound events can mask others. A helicopter flying overhead can make it impossible to hear someone speaking in normal tones a few feet away. The noise might prompt the virtual speaker to shout and might also prompt the Steve agent to cup his ear to indicate that he cannot hear.

We have made great progress toward virtual humans that collaborate with people in virtual worlds, but much work remains. The implemented peacekeeping scenario serves as a valuable test bed for our research. It provides concrete examples of challenging research issues and serves as a basis for evaluating our virtual humans with real users. We continue to extend Steve's individual capabilities, but we are especially focusing on the interdependencies among them, such as how emotions and personality affect each capability and how Steve's external behavior reflects its internal cognitive and emotional state. While our goals are ambitious, the potential payoff is high: Virtual humans that support rich interactions with people pave the way toward a new generation of interactive systems for entertainment and experiential learning.

Acknowledgments

The Office of Naval Research funded the original research on Steve under grant N00014-95-C-0179 and AASERT grant N00014-97-1-0598. The Department of the Army funds the MRE project under contract DAAD 19-99-D-0046. We thank our many colleagues who are contributing to the project. Shri Narayanan leads a team working on speech recognition. Mike van Lent, Changhee Han, and Youngjun Kim are working with Randall Hill on perception. Ed Hovy, Deepak Ravichandran, and Michael Fleischman are working on natural-language understanding and generation. Lewis Johnson, Kate LaBore, Shri Narayanan, and Richard Whitney are working on speech synthesis. Kate LaBore is developing the behaviors of the scripted characters. Larry Tuch wrote the story line with creative input from Richard Lindheim and technical input on Army procedures from Elke Hutto and General Pat O'Neal. Sean Dunn, Sheryl Kwak, Ben Moore, and Marcus Thibaux created the simulation infrastructure. Chris Kyriakakis and Dave Miraglia are developing the immersive sound effects. Jacki Morie, Erika Sass, Michael Murguia, and Kari Birkeland created the graphics for the Bosnian town. Any opinions, findings, and conclusions expressed in this article are those of the authors and do not necessarily reflect the views of the Department of the Army.

    References

1. J. Cassell et al., Embodied Conversational Agents, MIT Press, Cambridge, Mass., 2000.

2. W.L. Johnson, J.W. Rickel, and J.C. Lester, "Animated Pedagogical Agents: Face-to-Face Interaction in Interactive Learning Environments," Int'l J. Artificial Intelligence in Education, vol. 11, 2000, pp. 47-78.

3. J. Rickel and W.L. Johnson, "Animated Agents for Procedural Training in Virtual Reality: Perception, Cognition, and Motor Control," Applied Artificial Intelligence, vol. 13, nos. 4-5, June-Aug. 1999, pp. 343-382.

4. J. Rickel and W.L. Johnson, "Virtual Humans for Team Training in Virtual Reality," Proc. 9th Int'l Conf. Artificial Intelligence in Education, IOS Press, Amsterdam, 1999, pp. 578-585.


5. W. Swartout et al., "Toward the Holodeck: Integrating Graphics, Sound, Character, and Story," Proc. 5th Int'l Conf. Autonomous Agents, ACM Press, New York, 2001, pp. 409-416.

6. D.R. Traum, A Computational Theory of Grounding in Natural Language Conversation, doctoral dissertation, Dept. of Computer Science, Univ. of Rochester, Rochester, N.Y., 1994.

7. C. Matheson, M. Poesio, and D. Traum, "Modeling Grounding and Discourse Obligations Using Update Rules," Proc. 1st Conf. North Am. Chapter of the Assoc. for Computational Linguistics, Morgan Kaufmann, San Francisco, 2000, pp. 1-8.

8. D. Traum and J. Rickel, "Embodied Agents for Multiparty Dialogue in Immersive Virtual Worlds," to be published in Proc. 1st Int'l Joint Conf. Autonomous Agents and Multiagent Systems, ACM Press, New York, 2002.

9. M. Fleischman and E. Hovy, "Emotional Variation in Speech-Based Natural Language Generation," to be published in Proc. 2nd Int'l Natural Language Generation Conf., 2002.

10. J. Gratch, "Émile: Marshalling Passions in Training and Education," Proc. 4th Int'l Conf. Autonomous Agents, ACM Press, New York, 2000, pp. 325-332.

11. S.C. Marsella, W.L. Johnson, and C. LaBore, "Interactive Pedagogical Drama," Proc. 4th Int'l Conf. Autonomous Agents, ACM Press, New York, 2000, pp. 301-308.

12. J. Gratch and S. Marsella, "Tears and Fears: Modeling Emotions and Emotional Behaviors in Synthetic Agents," Proc. 5th Int'l Conf. Autonomous Agents, ACM Press, New York, 2001, pp. 278-285.

13. S. Marsella and J. Gratch, "A Step Toward Irrationality: Using Emotion to Change Belief," to be published in Proc. 1st Int'l Joint Conf. Autonomous Agents and Multiagent Systems, ACM Press, New York, 2002.

14. R. Hill, "Modeling Perceptual Attention in Virtual Humans," Proc. 8th Conf. Computer Generated Forces and Behavioral Representation, SISO, Orlando, Fla., 1999, pp. 563-573.

15. R. Hill, "Perceptual Attention in Virtual Humans: Toward Realistic and Believable Gaze Behaviors," Proc. AAAI Fall Symp. Simulating Human Agents, AAAI Press, Menlo Park, Calif., 2000, pp. 46-52.

16. S. Chopra-Khullar and N.I. Badler, "Where to Look? Automating Attending Behaviors of Virtual Human Characters," Proc. 3rd Int'l Conf. Autonomous Agents, ACM Press, New York, 1999, pp. 16-23.

17. W.K. Yeap and M.E. Jefferies, "Computing a Representation of the Local Environment," Artificial Intelligence, vol. 107, no. 2, Feb. 1999, pp. 265-301.

18. R. Hill, C. Han, and M. van Lent, "Applying Perceptually Driven Cognitive Mapping to Virtual Urban Environments," to be published in Proc. 14th Annual Conf. Innovative Applications of Artificial Intelligence, AAAI Press, Menlo Park, Calif., 2002.


The Authors

Jeff Rickel is a project leader at the University of Southern California's Information Sciences Institute and a research assistant professor in the Department of Computer Science. His research interests include intelligent agents for education and training, especially animated agents that collaborate with people in virtual reality. He received his PhD in computer science from the University of Texas at Austin. He is a member of the ACM and the AAAI. Contact him at the USC Information Sciences Inst., 4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292; [email protected]; www.isi.edu/~rickel.

Stacy Marsella is a project leader at the University of Southern California's Information Sciences Institute. His research interests include computational models of emotion, nonverbal behavior, social interaction, interactive drama, and the use of simulation in education. He received his PhD in computer science from Rutgers University. Contact him at the USC Information Sciences Inst., 4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292; [email protected]; www.isi.edu/~marsella.

Jonathan Gratch heads up the stress and emotion project at the University of Southern California's Institute for Creative Technologies and is a research assistant professor in the Department of Computer Science. His research interests include cognitive science, emotion, planning, and the use of simulation in training. He received his PhD in computer science from the University of Illinois. Contact him at the USC Inst. for Creative Technologies, 13274 Fiji Way, Marina del Rey, CA 90292; [email protected]; www.ict.usc.edu/~gratch.

Randall Hill is the deputy director of technology at the University of Southern California's Institute for Creative Technologies. His research interests include modeling perceptual attention and cognitive mapping in virtual humans. He received his PhD in computer science from the University of Southern California. He is a member of the AAAI. Contact him at the USC Inst. for Creative Technologies, 13274 Fiji Way, Marina del Rey, CA 90292; [email protected].

David Traum is a research scientist at the University of Southern California's Institute for Creative Technologies and a research assistant professor of computer science. His research interests include collaboration and dialogue communication between human and artificial agents. He received his PhD in computer science from the University of Rochester. Contact him at the USC Inst. for Creative Technologies, 13274 Fiji Way, Marina del Rey, CA 90292; [email protected]; www.ict.usc.edu/~traum.

William Swartout is the director of technology at the University of Southern California's Institute for Creative Technologies and a research associate professor of computer science at USC. His research interests include virtual humans, immersive virtual reality, knowledge-based systems, knowledge representation, knowledge acquisition, and natural language generation. He received his PhD in computer science from the Massachusetts Institute of Technology. He is a fellow of the AAAI and a member of the IEEE Intelligent Systems editorial board. Contact him at the USC Inst. for Creative Technologies, 13274 Fiji Way, Marina del Rey, CA 90292; [email protected].