
ORIGINAL ARTICLE

Marc Fabri · David Moore · Dave Hobbs

Mediating the expression of emotion in educational collaborative virtual environments: an experimental study

Received: 3 September 2002 / Accepted: 2 October 2003 / Published online: 5 February 2004
© Springer-Verlag London Limited 2004

M. Fabri (✉) · D. Moore
ISLE Research Group, Leeds Metropolitan University, Leeds, UK
E-mail: [email protected]
Tel.: +44-113-2832600
Fax: +44-113-2833182

D. Hobbs
School of Informatics, University of Bradford, Bradford, UK

Virtual Reality (2004) 7: 66-81
DOI 10.1007/s10055-003-0116-7

Abstract The use of avatars with emotionally expressive faces is potentially highly beneficial to communication in collaborative virtual environments (CVEs), especially when used in a distance learning context. However, little is known about how, or indeed whether, emotions can effectively be transmitted through the medium of a CVE. Given this, an avatar head model with limited but human-like expressive abilities was built, designed to enrich CVE communication. Based on the facial action coding system (FACS), the head was designed to express, in a readily recognisable manner, the six universal emotions. An experiment was conducted to investigate the efficacy of the model. Results indicate that the approach of applying the FACS model to virtual face representations is not guaranteed to work for all expressions of a particular emotion category. However, given appropriate use of the model, emotions can effectively be visualised with a limited number of facial features. A set of exemplar facial expressions is presented.

Keywords Avatar · Collaborative virtual environment · Emotion · Facial expression

1 Introduction

This paper outlines an experimental study to investigate the use of facial expressions for humanoid user representations as a means of non-verbal communication in collaborative virtual environments (CVEs).

The intention is to establish detailed knowledge about how facial expressions can be effectively and efficiently visualised in CVEs.

We start by arguing for the insufficiency of existing distance communication media in terms of emotional context and means for emotional expression, and propose that this problem could be overcome by enabling people to meet virtually in a CVE and engage in quasi face-to-face communication via their avatars. We further argue that the use of avatars with emotionally expressive faces is potentially highly beneficial to communication in CVEs.

However, although research in the field of CVEs has been proceeding for some time now, the representation of user embodiments, or avatars, in most systems is still relatively simple and rudimentary [1]. In particular, virtual environments are often poor in terms of the emotional cues that they convey [2]. Accordingly, the need for sophisticated ways to reflect emotions in virtual embodiments has been pointed out repeatedly in recent investigations [3, 4].

In the light of this, a controlled experiment was conducted to investigate the applicability of non-verbal means of expression, particularly the use of facial expressions, via avatars in CVE systems. It is the purpose of the experiment to establish whether and how emotions can effectively be transmitted through the medium of a CVE.

2 A case for a CVE for education

Today's information society provides us with numerous technological options to facilitate human interaction over a distance, in real time or asynchronously: telephony, electronic mail, text-based chat, video-conferencing systems. These tools are useful, and indeed crucial, for people who cannot come together physically but need to discuss, collaborate on, or even dispute certain matters. Distance learning programmes make extensive use of such technologies to enable communication between spatially separated tutors and learners, and between learners and fellow learners [5]. Extensive research [6, 7, 8] has shown that such interaction is crucial for the learning process, for the purpose of mutual reflection on actions and problem solutions, for motivation and stimulation as well as assessment and control of progress, and has given rise to a growing body of literature in computer supported collaborative learning (CSCL), cf. [9, 10].

However, when communicating over a distance through media tools, the emotional context is often lost, as well as the ability to express emotional states in the way one is accustomed to in face-to-face conversations. When using text-based tools, important indicators like accentuation, emotion and change of emotion or intonation are difficult to mediate [11]. Audio conferencing tools can alleviate some of these difficulties but lack ways to mediate non-verbal means of communication such as facial expressions, posture or gesture. These channels, however, play an important role in human interaction and it has been argued that the socio-emotional content they convey is vital for building relationships that need to go beyond purely factual and task-oriented communication [11].

Video conferencing can alleviate some of the shortcomings concerning body language and the visual expression of a participant's emotional state. Daly-Jones et al. [12] identify several advantages of video conferencing over high quality audio conferencing, in particular the vague awareness of an interlocutor's attentional focus. However, because of the non-immersive character of typical video-based interfaces, conversational threads during meetings can easily break down when people are distracted by external influences or have to change the active window, for example, to handle electronically shared data [13].

CVEs are a potential alternative to these communication tools, aiming to overcome the lack of emotional and social context whilst at the same time offering a stimulating and integrated framework for conversation and collaboration. Indeed, it can be argued that CVEs represent a communication technology in their own right due to the highly visual and interactive character of the interfaces that allow communication and the representation of information in new, innovative ways. Users are likely to be actively engaged in interaction with the virtual world and with other inhabitants. In the distance learning discipline in particular, this high-level interactivity, in which the users' senses are engaged in the action and they "feel" they are participating in it, is seen as an essential factor for effective and efficient learning [14].

3 The need for emotionally expressive avatars

The term "non-verbal communication" is commonly used to describe all human communication events which transcend the spoken or written word [15]. Non-verbal communication plays a substantial role in human interpersonal behaviour. Social psychologists argue that more than 65% of the information exchanged during a person-to-person conversation is carried on the non-verbal band [16]. Argyle [17] sees non-verbal behaviour taking place whenever one person influences another by means of facial expressions, gestures, body posture, bodily contact, gaze and pupil dilation, spatial behaviour, clothes, appearance, or non-verbal vocalisation (e.g. a murmur).

A particularly important aspect of non-verbal communication is its use to convey information concerning the emotional state of interlocutors. Whenever one interacts with another person, that other person's emotional expressions are monitored and interpreted, and the other person is doing the same [18]. Indeed, the ability to judge the emotional state of others is considered an important goal in human perception [19], and it is argued that from an evolutionary point of view, it is probably the most significant function of interpersonal perception. Since different emotional states are likely to lead to different courses of action, it can be crucial for survival to be able to recognise emotional states, in particular anger or fear, in another person. Similarly, Argyle [17] argues that the expression of emotion, in the face or through the body, is part of a wider system of natural human communication that has evolved to facilitate social life. Keltner [20] showed that, for example, embarrassment is an appeasement signal that helps reconcile relations when they have gone awry, a way of apologising for making a social faux-pas. Again, recent findings in psychology and neurology suggest that emotions are also an important factor in decision-making, problem solving, cognition and intelligence in general [19, 21, 22, 23].

Of particular importance from the point of view of education, it has been argued that the ability to show emotions, empathy and understanding through facial expressions and body language is central to ensuring the quality of tutor-learner and learner-learner interaction [24]. Acceptance and understanding of ideas and feelings, encouraging and criticising, silence, questioning: all involve non-verbal elements of interaction [15, 24].

Given this, it can be argued that CSCL technologies ought to provide for at least some degree of non-verbal, and in particular emotional, communication. For instance, the pedagogical agent STEVE [25] is used in a virtual training environment for control panel operation. STEVE has the ability to give instant praise or express criticism via hand and head gestures depending on a student's performance. Concerning CVE technology in particular, McGrath and Prinz [26] call for appropriate ways to express presence and awareness in order to aid communication between inhabitants, be it full verbal communication or non-verbal presence in silence.

Thalmann [1] sees a direct relation between the quality of a user's representation and his ability to interact with the environment and with other users. Even avatars with rather primitive expressive abilities can potentially cause strong emotional responses in people using a CVE system [27]. It appears, then, that the avatar can readily take on a personal role, thereby increasing the sense of togetherness, the community feeling. The avatar potentially becomes a genuine representation of the underlying individual, not only visually, but also within a social context.

It is argued, then, that people's naturally developed skill to "read" emotional expressions is potentially highly beneficial to communication in CVEs in general, and educational CVEs in particular. The emotionally expressive nature of an interlocutor's avatar may be able to aid the communication process and provide information that would otherwise be difficult to mediate.

4 Modelling an emotionally expressive avatar

Given that emotional expressiveness would be a desirable attribute of a CVE, the issue becomes one of how such emotional expressions can be mediated. Whilst all of the different channels for non-verbal communication (face, gaze, gesture, posture) can in principle be mediated in CVEs to a certain degree, our current work focuses on the face. In the real world, it is the face that is the most immediate indicator of the emotional state of a person [28].

While physiology looks beneath the skin, physiognomy stays on the surface studying facial features and lineaments; it is the art of judging character or the emotional state of an individual from the features of the face [29]. The face reflects interpersonal attitudes, provides feedback on the comments of others and is regarded as the primary source of information after human speech [15]. Production (encoding) and recognition (decoding) of distinct facial expressions constitute a signalling system between humans [30]. Surakka and Hietanen [31] see facial expressions of emotion clearly dominating over vocal expressions of emotion; Knapp [15] generally considers facial expressions the primary site for communication of emotional states.

Indeed, most researchers even suggest that the ability to classify facial expressions of an interlocutor is a necessary prerequisite for the inference of emotion. It appears that there are certain key stimuli in the human face that support cognition. Zebrowitz [32] found that, for example, in the case of an infant's appearance, these key stimuli can by themselves trigger favourable emotional responses. Strongman [18] points out that humans make such responses not only to the expression but also to what is believed to be the "meaning" behind the expression.

Our work therefore concentrates on the face. To model an emotionally expressive avatar face, the work of [33] was followed. It was found that there are six universal facial expressions, corresponding to the following emotions: surprise, anger, fear, happiness, disgust/contempt, and sadness. This categorisation is widely accepted, and considerable research has shown that these basic emotions can be accurately communicated by facial expressions [32, 34]. Indeed, it is held that the expression, and to an extent the recognition, of these six emotions has an innate basis. They can be found in all cultures, and correspond to distinctive patterns of physiognomic arousal. Figure 1 shows sample photographs depicting the six universal emotions, together with the neutral expression (from [35], used with permission).

4.1 Describing facial expressions

Great effort has gone into the development of scoring systems for facial movements. These systems attempt to describe and quantify objectively all visually discriminating units of facial action seen in adults. For the purpose of analysis, the face is typically broken down into three areas:

1. the brows and forehead
2. the eyes, eyelids and the root of the nose
3. the lower face with the mouth, nose, cheeks, and chin

These are the areas which appear to be capable of independent movement. In order to describe the visible muscle activity in the face comprehensively, the facial action coding system (FACS) was developed [36]. FACS is based on highly detailed anatomical studies of human faces and results from a major body of work. It has formed the basis for numerous series of experiments in social psychology, computer vision and computer animation [37, 38, 39, 40].

A facial expression is a high-level description of facial motions, which can be decomposed into certain muscular activities, i.e., relaxation or contraction, called "action units" (AUs). FACS identifies 58 action units, which separately or in various combinations are capable of characterising any human expression. An AU corresponds to an action produced by one or a group of related muscles. Action Unit 1, for example, is the inner-brow-raiser, a contraction of the central frontalis muscle. Action Unit 7 is the lid-tightener, tightening the eyelids and thereby narrowing the eye opening.

FACS is usually coded from video or photographs, and a trained human FACS coder decomposes an observed expression into the specific AUs that occurred, their duration, onset, and offset time [37]. From this system, some very specific details can be learnt about facial movement for different emotional expressions of humans in the real world. For instance, the brow seems capable of the fewest positions and the lower face the most [15]. Certain emotions also seem to manifest themselves in particular areas of the face. The best predictors for anger, for example, are the lower face and the brows/forehead area, whereas sadness is most revealed in the area around the eyes [15].
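To make the coding idea concrete, the sketch below shows one minimal way such an AU decomposition could be represented in software. The data structure and field names are illustrative assumptions rather than part of FACS or of the system described here; only the two example action units (AU1 and AU7) come from the text above.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ActionUnitEvent:
    """One occurrence of a FACS action unit in an observed expression."""
    au: int           # FACS action unit number, e.g. 1 = inner brow raiser
    name: str         # descriptive label
    onset_s: float    # time the action becomes visible (seconds)
    offset_s: float   # time the action disappears (seconds)

    @property
    def duration_s(self) -> float:
        return self.offset_s - self.onset_s

# A hypothetical coding of a brief brow movement, using the two AUs
# mentioned in the text (AU1 inner brow raiser, AU7 lid tightener).
observed_expression: List[ActionUnitEvent] = [
    ActionUnitEvent(au=1, name="inner brow raiser", onset_s=0.2, offset_s=1.1),
    ActionUnitEvent(au=7, name="lid tightener", onset_s=0.4, offset_s=1.0),
]

for event in observed_expression:
    print(f"AU{event.au} ({event.name}): {event.duration_s:.1f}s")
```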

For our current modelling work, then, FACS was adapted to generate the expression of emotions in the virtual face, by applying a limited number of relevant action units to the animated head. Figure 2 shows photographs of some alternative expressions for the anger emotion category, together with the corresponding virtual head expressions as modelled by our avatar. Equivalent representations exist for all remaining universal emotions (and the neutral expression). All photographs are taken from the Pictures of Facial Affect databank [35].

4.2 Keeping it simple

Interest in modelling the human face has been strong in the computer graphics community since the 1980s. The first muscle-based model of an animated face, using geometric deformation operators to control a large number of muscle units, was developed by Platt and Badler [41]. This was developed further by modelling the anatomical nature of facial muscles and the elastic nature of human skin, resulting in a dynamic muscle model [40, 42].

The approach adopted in this study, however, is feature-based and therefore less complex than a realistic simulation of real-life physiology. It is argued that it is not necessary, and indeed may be counter-productive, to assume that a "good" avatar has to be a realistic and very accurate representation of the real world physiognomy. We argue this partly on the ground that early evidence suggested that approaches aiming to reproduce the human physics in detail may in fact be wasteful [43].

Fig. 1 The six universal emotions and neutral expressions

Fig. 2 Photographs showing variations of anger, with corresponding virtual heads

Indeed, this has been described as the "uncanny valley" [44], a concept originally created to predict human psychological reaction to humanoid robots (see Fig. 3, adapted from [45]). When plotting human reaction against robot movement, the curve initially shows a steady upward trend. That trend continues until the robot reaches a reasonably human quality. The curve then plunges down dramatically, even evoking a negative emotional response. A nearly human robot is considered irritating and repulsive. The curve only rises again once the robot eventually reaches a complete resemblance to humans.

It is postulated that human reaction to avatars is similarly characterised by an uncanny valley. An avatar designed to suspend disbelief but which is only nearly realistic may be equally confusing, not accepted, or even considered repulsive. In any event, Hindmarsh et al. [46] suggest that even with full realism and full perceptual capabilities of physical human bodies in virtual space, opportunities for employing more inventive and evocative ways of expression would probably be lost if the focus is merely on simulating the real world, with its rules, habits and limitations.

It may be more appropriate, and indeed more supportive to perception and cognition, to represent issues in simple or unusual ways. Godenschweger et al. [47] found that minimalist drawings of body parts, showing gestures, were generally easier to recognise than more complex representations. Further, Donath [48] warns that because the face is so highly expressive and humans are so adept in reading (into) it, any level of detail in 3D facial rendering could potentially provoke the interpretation of various social messages. If these messages are unintentional, the face will arguably be hindering rather than helping communication.

Again, there is evidence that particularly distinctive faces can convey emotions more efficiently than normal faces [32, 49, 50], a detail regularly employed by caricaturists. The human perception system can recognise physiognomic clues, in particular facial expressions, from very few visual stimuli [51].

To summarise, rather than simulating the real world accurately, we aim to take advantage of humans' innate cognitive abilities to perceive, recognise and interpret distinctive physiognomic clues. With regard to avatar expressiveness and the uncanny valley, we are targeting the first summit of the curve (see Fig. 3), where human emotional response is maximised while employing a relatively simple avatar model.

4.3 Modelling facial expressions

In order to realise such an approach in our avatar work, we developed an animated virtual head with a limited number of controllable features. It is loosely based on the H-Anim specification [52] developed by the international panel that develops the virtual reality modeling language (VRML). H-Anim specifies seven control parameters:

1. the left eyeball
2. the right eyeball
3. the left eyebrow
4. the right eyebrow
5. the left upper eyelid
6. the right upper eyelid
7. the temporomandibular (for moving the jaw)

Early in the investigation it became evident, however, that eyeball movement was not necessary as the virtual head was always in direct eye contact with the observer. We also found that although we were aiming at a simple model, a single parameter for moving and animating the mouth area (the temporomandibular) was insufficient for the variety of expressions required in the lower face area.

Consequently, the H-Anim basis was developed further and additional features were derived from, and closely mapped to, FACS action units. This allowed for greater freedom, especially in the mouth area. It has to be noted that while FACS describes muscle movement, our animated head was not designed to emulate such muscle movement faithfully, but to achieve a visual effect very similar to the result of muscle activity in the human face.

It turned out that it is not necessary for the entire set of action units to be reproduced in order to achieve the level of detail envisaged for the current face model. In fact, reducing the number of relevant action units is not an uncommon practice for simple facial animation models [53, 54], and this study used a subset of 11 action units (see Table 1).

Table 1 Reduced set of action units

AU   Facial action code      Muscular basis
1    Inner brow raiser       Frontalis, pars medialis
2    Outer brow raiser       Frontalis, pars lateralis
4    Brow lowerer            Depressor glabellae, depressor supercilii, corrugator
5    Upper lid raiser        Levator palpebrae superioris
7    Lid tightener           Orbicularis oculi, pars palpebralis
10   Upper lip raiser        Levator labii superioris, caput infraorbitalis
12   Lip corner puller       Zygomatic major
15   Lip corner depressor    Triangularis
17   Chin raiser             Mentalis
25   Lips part               Depressor labii, relaxation of mentalis or orbicularis oris
26   Jaw drop (mouth only)   Masseter, relaxation of temporal and internal pterygoids

Fig. 3 The "uncanny valley"
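As an illustration of how such a reduced action unit set might be held by an animation system, the sketch below lists the 11 action units of Table 1 and composes an expression as a set of AU intensities. The dictionary layout and the Expression helper are assumptions made for illustration; the actual head described in this paper was built on VRML/H-Anim control parameters rather than on this structure.

```python
# The 11 FACS action units retained for the virtual head (see Table 1).
REDUCED_AU_SET = {
    1: "inner brow raiser",
    2: "outer brow raiser",
    4: "brow lowerer",
    5: "upper lid raiser",
    7: "lid tightener",
    10: "upper lip raiser",
    12: "lip corner puller",
    15: "lip corner depressor",
    17: "chin raiser",
    25: "lips part",
    26: "jaw drop (mouth only)",
}

class Expression:
    """An expression as a mapping from action unit number to a normalised
    intensity in the range 0.0-1.0 (not the FACS A-E grades)."""

    def __init__(self, **au_intensities: float):
        # Keys are given as 'au1', 'au12', ...; values are clamped intensities.
        self.aus = {}
        for key, value in au_intensities.items():
            number = int(key.lstrip("au"))
            if number not in REDUCED_AU_SET:
                raise ValueError(f"AU{number} is not part of the reduced set")
            self.aus[number] = max(0.0, min(1.0, value))

    def describe(self) -> str:
        return ", ".join(f"AU{n} {REDUCED_AU_SET[n]} ({i:.1f})"
                         for n, i in sorted(self.aus.items()))

# e.g. a smile driven mainly by the lip corner puller with parted lips
smile = Expression(au12=0.8, au25=0.4)
print(smile.describe())
```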


The relevant animation control parameters required to model facial features that correspond to these 11 action units are illustrated in Fig. 4.

As an example, Fig. 5 shows four variations of the sadness emotion, as used in the experiment. Note the wider eye opening in 1, and the change of angle and position of the eyebrows.

Certain facial features have deliberately been omitted to keep the number of control parameters, and action units, low. For example, AU12 (the lip corner puller) normally involves a change in cheek appearance. The virtual head, however, shows AU12 only in the mouth corners. Also, the virtual head showing AU26 (the jaw drop) does not involve jawbone movement but is characterised solely by the relaxation of the mentalis muscle, resulting in a characteristic opening of the mouth. These omissions were considered tolerable, as they did not appear to change the visual appearance of the expression significantly. Accordingly, neither the statistical analysis nor feedback from participants indicated a disadvantage of doing so.

In summary, then, we argue that the virtual face model introduced above is a potentially effective and efficient means for conveying emotion in CVEs. By reducing the facial animation to a minimal set of features believed to display the most distinctive area segments of the six universal expressions of emotion (according to [28]), we take into account findings from cognitive and social psychology. These findings suggest that there are internal, probably innate, physiognomic schemata that support face perception and emotion recognition in the face [55]. This recognition process works with even a very limited set of simple but distinctive visual clues [17, 51].

5 Experimental investigation

We argue, then, that there is a strong prima facie case that the proposed virtual head, with its limited but human-like expressive abilities, is a potentially effective and efficient means to convey emotions in virtual environments, and that the reduced set of action units and the resulting facial animation control parameters are sufficient to express, in a readily recognisable manner, the six universal emotions.

We have experimentally investigated this prima facie argument, comparing recognition rates of virtual head expressions with recognition rates based on photographs of faces for which FACS action unit coding, as well as recognition rates from human participants, was available. These photographs were taken from [35]. A detailed description of the experimental setup is presented in this section. The aims of the experiment were (a) to investigate the use of simple but distinctive visual clues to mediate the emotional and social state of a CVE user, and (b) to establish the most distinctive and essential features of an avatar facial expression.

Given these aims, the experiment was designed to address the following working hypothesis: "For a well-defined subset that includes at least one expression per emotion category, recognition rates of the virtual head model and of the corresponding photographs are comparable".

5.1 Design

The independent variable (IV) in this study is the stimulus material presented to the participants. The facial expressions of emotion are presented in two different ways, as FACS training photographs or as emotions displayed by the animated virtual head. Within each of these two factors, there are seven sub-levels (the six universal expressions of emotion and neutral). The dependent variable (DV) is the success rate achieved when assigning the presented expressions of emotion to their respective categories.

Two control variables (CVs) can be identified: the cultural background of participants and their previous experience in similar psychological experiments. Since the cultural background of participants potentially may affect their ability to recognise certain emotions in the face [32], this factor was neutralised by ensuring that all participants had broadly the same ability concerning the recognition of emotion. In the same manner, it was checked that none of the participants had previous experience with FACS coding or related psychological experiments, as this may influence perception abilities due to specifically developed skills.

We adopted a one-factor, within-subjects design (also known as a repeated measures design) for the experiment. The factor comprises two levels, photograph or virtual face, and each participant performs under both conditions:

– Condition A: emotions depicted by the virtual head
– Condition B: emotions shown by persons on FACS photographs

Twenty-nine participants took part in the experiment (17 female, 12 male), with an age range of 22 to 51 years. All participants were volunteers. None had classified facial expressions or used FACS before. None of the participants worked in facial animation, although some were familiar with 3D modelling techniques in general.

5.2 Procedure

The experiment involved three phases: a pre-test questionnaire, a recognition exercise and a post-test questionnaire. Each participant was welcomed by the researcher and seated at the workstation where the experiment would be conducted. The researcher then gave the participant an overview of what was expected of him/her and what to expect during the experiment. Care was taken not to give out information that might bias the user. The participants were assured that they themselves were not under evaluation and that they could leave the experiment at any point if they felt uncomfortable. Participants were then presented with the pre-test questionnaire, which led into the recognition exercise. From this moment, the experiment ran automatically, via a software application, and no further experimenter intervention was required.

The actual experiment was preceded by a pilot test with a single participant. This participant was not part of the participant group in the later experiment. The pilot run confirmed that the software designed to present the stimulus material and collect the data was functioning correctly, and also that a 20-minute duration per participant was realistic. Furthermore, it gave indications that the questionnaire items possess the desired qualities of measurement and discriminability.

The pre-test questionnaire (Fig. 6) collected information about the participant in relation to their suitability for the experiment.

The Cancel button allowed the experiment to be aborted at any stage, in which case all data collected so far was deleted. Back and Next buttons were displayed depending on the current context. A screen collecting further data about the participant's background on FACS, as well as possible involvement in similar experiments, followed the pre-test questionnaire.

Before the recognition task started, a "practice" screen, illustrating the actual recognition screen and giving information about the choice of emotion categories and the functionality of buttons and screen elements, was shown to the participant.

During the recognition task, each participant was shown 28 photographs and 28 corresponding virtual head images, mixed together in a randomly generated order that was the same for all participants. Each of the six emotion categories was represented in four variations, and four variations of the neutral face were also shown. The variations were defined not by intensity, but by differences in expression of the same emotion. The controllable parameters of the virtual head were adjusted so that they corresponded with the photographs.
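A minimal sketch of how such a stimulus set and fixed presentation order could be assembled is given below. The 2 × 7 × 4 structure and the category names come from the text; the function name and the use of a fixed random seed to obtain an identical order for every participant are illustrative assumptions.

```python
import random
from itertools import product

CONDITIONS = ["photograph", "virtual_head"]
CATEGORIES = ["surprise", "fear", "disgust", "anger", "happiness", "sadness", "neutral"]
VARIATIONS = [1, 2, 3, 4]

def build_stimulus_order(seed: int = 42):
    """Return the 56 stimuli (2 conditions x 7 categories x 4 variations)
    in a randomised order that is identical for every participant."""
    stimuli = list(product(CONDITIONS, CATEGORIES, VARIATIONS))
    rng = random.Random(seed)   # fixed seed, so every participant sees the same order
    rng.shuffle(stimuli)
    return stimuli

order = build_stimulus_order()
assert len(order) == 56
print(order[:3])  # e.g. [('virtual_head', 'anger', 2), ...]
```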

All material was presented in digitised form, i.e., as virtual head screenshots and scanned photographs, respectively. Each of the six emotion categories was represented in four variations. In addition, there were four variations of the neutral face. Each participant was therefore asked to classify 56 expressions (two conditions × seven emotion categories × four variations per category).

Fig. 4 Controllable features of the virtual head

Fig. 5 Variations within the emotion category sadness

All virtual head images depicted the same male model throughout, whereas the photographs showed several people, expressing a varying number of emotions (21 images showing male persons, eight female).

The order of expressions in terms of categories and variations was randomised but the same for all participants. Where the facial atlas did not provide four distinctive variations of a particular emotion category, or the virtual head could not show the variation because of the limited set of animation parameters, a similar face was repeated.

The face images used in the task were cropped to display the full face, including hair. Photographs were scaled to 320 × 480 pixels, whereas virtual head images were slightly smaller at 320 × 440 pixels. The data collected for each facial expression of emotion consisted of:

– the type of stimulus material
– the expression depicted by each of the facial areas
– the emotion category expected
– the emotion category picked by the participant
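As a sketch of how each response might be logged, the record below mirrors the four items listed above, with the "expression per facial area" item split into the three facial areas defined earlier. The field names and the CSV output are assumptions for illustration, not the actual logging format of the study's software.

```python
import csv
from dataclasses import dataclass, asdict

@dataclass
class TrialRecord:
    stimulus_type: str       # "photograph" or "virtual_head"
    brow_area: str           # expression depicted in brows/forehead
    eye_area: str            # expression depicted in eyes, lids, root of the nose
    mouth_area: str          # expression depicted in the lower face
    expected_category: str   # emotion the stimulus was meant to show
    chosen_category: str     # category picked by the participant

record = TrialRecord("virtual_head", "raised", "wide open", "parted, tense",
                     expected_category="fear", chosen_category="surprise")

with open("responses.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(asdict(record).keys()))
    writer.writeheader()
    writer.writerow(asdict(record))
```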

A "recognition screen" (Fig. 7) displayed the images and provided buttons for participants to select an emotion category. In addition to the aforementioned seven categories, two more choices were offered. The "Other..." choice allowed entry of a term that, according to the participant, described the shown emotion best but was not part of the categories offered. If none of the emotions offered appeared to apply, and no other emotion could be named, the participant was able to choose "Don't know".

On completion of the recognition task, the software presented the post-test questionnaire (Fig. 8) to the participant. This collected various quantitative and qualitative data, with a view to complementing the data collected during the recognition task. The Next button was enabled only on completion of all rows.

6 Results

The overall number of pictures shown was 1624 (29 participants × 56 pictures per participant). On average, a participant took 11 minutes to complete the experiment, including the pre-test and the post-test questionnaires. Results show that recognition rates varied across emotion categories, as well as between the two conditions. Figure 9 summarises the results.

Fig. 6 The pre-test questionnaire

Fig. 7 The recognition screen


Surprise, fear, happiness and neutral show slightly higher recognition rates for the photographs, while in the categories anger and sadness the virtual faces are more easily recognised than their counterparts. Disgust stands out as it shows a very low score for virtual faces (around 20%), in contrast to the result for photographs of disgust, which is over 70%.

Overall, results clearly suggest that recognition rates for photographs (78.6% overall) are significantly higher than those for virtual heads (62.2% overall). The Mann-Whitney test confirms this, even at a significance level of 1%. However, a closer look at the recognition rates of particular emotions reveals that all but one emotion category have at least one photograph-virtual head pair with comparable results, demonstrating that recognition was as successful with the virtual head as it was with the directly corresponding photographs. Figure 10 shows recognition rates for these "top" virtual heads in each category. Disgust still stands out as a category with "poor" results for the virtual head.

Fig. 8 The post-test questionnaire

Fig. 9 A summary of recognition rates

Fig. 10 A summary of recognition rates for selected images
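To illustrate the kind of test reported here, the sketch below runs a Mann-Whitney U comparison on per-participant recognition rates for the two conditions. The rates are invented, and the use of scipy with a one-sided alternative is an assumption for illustration; the paper reports only that the test was significant at the 1% level.

```python
from scipy.stats import mannwhitneyu

# Hypothetical per-participant recognition rates (proportion correct)
# under the two conditions; values are invented for illustration only.
photo_rates = [0.82, 0.79, 0.86, 0.75, 0.80, 0.77, 0.83, 0.78]
virtual_rates = [0.64, 0.59, 0.68, 0.55, 0.66, 0.61, 0.63, 0.60]

# Test whether photograph recognition rates exceed virtual head rates.
statistic, p_value = mannwhitneyu(photo_rates, virtual_rates, alternative="greater")
print(f"U = {statistic}, p = {p_value:.4f}")
if p_value < 0.01:
    print("Difference significant at the 1% level")
```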

Results also indicate that recognition rates vary significantly between participants. The lowest scoring individual recognised 30 out of 56 emotions correctly (54%), and the highest score was 48 (86%). Those who achieved better results did so homogeneously between virtual heads and photographs. Lower scoring participants were more likely to fail to recognise the virtual heads rather than the photographs.

The expressions of emotion identified as being most distinctive are shown in Fig. 11. Each expression is coded according to FACS with corresponding action units. Some action units are binary, i.e., they are applied or not, while other action units have an associated intensity scoring. Intensity can vary from A (weakest) to E (strongest). The study results would recommend use of these particular expressions, or "exemplars", for models with a similarly limited number of animation control parameters:

Surprise (AUs 1C 2C 5C 26) is a very brief emotion, shown mostly around the eyes. Our "exemplary surprise face" features high raised eyebrows and raised upper lids. The lower eyelids remain in the relaxed position. The open mouth is relaxed, not tense. Unlike the typical human surprise expression, the virtual head does not actually drop the jaw bone. The evidence is that this does not have an adverse effect, however, considering that 80% of all participants classified the expression correctly.

Fear (AUs 1B 5C L10A 15A 25) usually has a distinctive appearance in all three areas of the face [28]. The variation which proved to be most successful in our study is characterised by raised, slightly arched eyebrows. The eyes are wide open as in surprise, and the lips are parted and tense. This is in contrast to the open but relaxed "surprise" mouth. There is an asymmetry in that the left upper lip is slightly raised.

Disgust (AUs 4C 7C 10A) is typically shown in the mouth and nose area [28]. The variation with the best results is characterised mainly by the raised upper lip (AU10) together with tightened eyelids. It has to be stressed that disgust was the least successful category, with only 30% of the participants assigning this expression correctly.

Our anger face (AUs 2A 4B 7C 17B) features lowered brows that are drawn together. Accordingly, the eyelids are tightened, which makes the eyes appear to be staring out in a penetrating fashion. Lips are pressed firmly together with the corners straight, a result of the chin raiser AU17.

Happiness (AUs 12C, 25) turned out to be easy to recognise: in most cases a lip corner puller (AU12) is sufficient. In our exemplary face, the eyes are relaxed and the mouth corners are pulled up. The virtual head does not allow a change to the cheek appearance, nor does it allow for wrinkles to appear underneath the eyes. Such smiles without cheek or eye involvement are sometimes referred to as non-enjoyment or non-Duchenne smiles, after the 19th century French neurologist Duchenne de Boulogne [31].

The sadness expression (AUs 1D 4D 15A 25) that was most successful has characteristic brow and eye features. The brows are raised in the middle while the outer corners are lowered. This affects the eyes, which are triangulated with the inner corner of the upper lids raised. The slightly raised lower eyelid is not necessarily typical [33] but, in this case, increases the sadness expression. The corners of the lips are down.
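Gathered in one place, the exemplar expressions above could be encoded as the following lookup table. The AU and intensity codes are those reported in the text; the dictionary structure itself is merely an illustrative way of storing them.

```python
# Exemplar expressions and their FACS codes as reported above.
# Letters A-E denote intensity (A weakest, E strongest); "L" marks a
# left-sided (asymmetric) action; codes without a letter carry no intensity score.
EXEMPLAR_EXPRESSIONS = {
    "surprise": ["1C", "2C", "5C", "26"],
    "fear": ["1B", "5C", "L10A", "15A", "25"],
    "disgust": ["4C", "7C", "10A"],
    "anger": ["2A", "4B", "7C", "17B"],
    "happiness": ["12C", "25"],
    "sadness": ["1D", "4D", "15A", "25"],
}

def aus_for(emotion: str) -> list[str]:
    """Look up the action unit code of an exemplar expression."""
    return EXEMPLAR_EXPRESSIONS[emotion]

print(aus_for("anger"))  # ['2A', '4B', '7C', '17B']
```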

Fig. 11 The most distinctive expressions

Table 2 Error matrix for emotion categorisation. Rows give the presented category; columns give the response category. Each cell shows the proportion of responses as virtual head / photograph.

Category    Surprise    Fear        Disgust     Anger       Happiness   Sadness     Neutral     "Other" or "Don't know"
Surprise    .67 / .85   .06 / .07   .00 / .00   .00 / .01   .23 / .00   .00 / .00   .01 / .00   .03 / .08
Fear        .15 / .19   .41 / .73   .00 / .04   .30 / .00   .03 / .00   .03 / .00   .02 / .00   .06 / .03
Disgust     .01 / .02   .02 / .00   .22 / .77   .39 / .14   .01 / .00   .04 / .00   .10 / .01   .21 / .07
Anger       .03 / .04   .00 / .04   .00 / .03   .77 / .72   .02 / .00   .03 / .03   .11 / .05   .05 / .09
Happiness   .01 / .00   .01 / .00   .01 / .00   .01 / .00   .64 / .84   .03 / .00   .26 / .15   .04 / .02
Sadness     .06 / .00   .09 / .10   .00 / .00   .00 / .01   .01 / .01   .85 / .66   .03 / .09   .01 / .07
Neutral     .03 / .00   .03 / .00   .01 / .00   .00 / .01   .00 / .02   .11 / .01   .78 / .94   .04 / .02


6.1 Recognition errors

The errors made by participants when assigning expressions to categories are presented in Table 2. The matrix shows which categories have been confused, and compares virtual heads with photographs. Rows give the per cent occurrence of each response. In the original presentation, confusion values above 10% are shaded light grey, above 20% dark grey, and above 30% black.
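A row-normalised error matrix of this kind can be derived directly from the trial log. The sketch below shows one way this could be done, with invented example responses and an assumed ordering of the categories.

```python
from collections import Counter, defaultdict

CATEGORIES = ["surprise", "fear", "disgust", "anger",
              "happiness", "sadness", "neutral", "other/don't know"]

def error_matrix(trials):
    """trials: iterable of (expected_category, chosen_category) pairs.
    Returns {expected: {chosen: proportion}} with each row summing to 1."""
    counts = defaultdict(Counter)
    for expected, chosen in trials:
        counts[expected][chosen] += 1
    matrix = {}
    for expected, row in counts.items():
        total = sum(row.values())
        matrix[expected] = {c: row[c] / total for c in CATEGORIES}
    return matrix

# Invented example: three fear stimuli, one of them mistaken for anger.
demo = [("fear", "fear"), ("fear", "anger"), ("fear", "fear")]
print(error_matrix(demo)["fear"]["anger"])  # 0.333...
```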

6.1.1 Disgust and anger

Table 2 shows that the majority of confusion errors were made in the category disgust, an emotion frequently confused with anger. When examining results for virtual heads only, anger (39%) was picked almost twice as often as disgust (22%). Further, with faces showing disgust, participants often felt unable to select any given category and instead picked "Don't know", or suggested an alternative emotion. These alternatives were, for example, aggressiveness, hatred, irritation, or self-righteousness.

Ekman and Friesen [28] describe disgust (or contempt) as an emotion that often carries an element of condescension toward the object of contempt. People feeling disgusted by other people, or their behaviour, tend to feel morally superior to them. Our observations confirm this tendency, for where "other" was selected instead of the expected "disgust", the suggested alternative was often in line with the interpretation found by [28].

6.1.2 Fear and surprise

The error matrix (Table 2) further reveals that fear was often mistaken for surprise, a tendency that was also observed in several other studies (see [34]). It is stated that a distinction between the two emotions can be observed with high certainty only in "literate" cultures, but not in "pre-literate", visually isolated cultures. Social psychology states that experience, and therefore expression, of fear and surprise often happen simultaneously, such as when fear is felt suddenly due to an unexpected threat [28]. The appearance of fear and surprise is also similar, with fear generally producing a more tense facial expression. However, fear differs from surprise in three ways:

1. Whilst surprise is not necessarily pleasant or unpleasant, even mild fear is unpleasant.

2. One can be afraid of something familiar that is certainly going to happen (for example a visit to the dentist), whereas something familiar or expected can hardly be surprising.

3. Whilst surprise usually disappears as soon as it is clear what the surprising event was, fear can last much longer, even when the nature of the event is fully known.

These indicators allow for the differentiation of whether a person is afraid or surprised. All three have to do with the context and timing of the fear-inspiring event, factors that are not perceivable from a still image.

In accordance with this, Poggi and Pelachaud [56] found that emotional information is not only contained in the facial expression itself, but also in the performatives of a communicative act: suggesting, warning, ordering, imploring, approving and praising. Similarly, Bartneck [49] observed significantly higher recognition rates when still images of facial expressions were shown in a dice game context, compared to a display without any context. In other words, the meaning and interpretation of an emotional expression can depend on the situation in which it is shown. This strongly suggests that in situations where the facial expression is animated or displayed in context, recognition rates can be expected to be higher.

6.1.3 Fear and anger

The relationship between fear and anger is similar to that between fear and surprise. Both can occur simultaneously, and their appearance often blends. What is striking is that all confusions were made with virtual faces, whilst not even one of the fear photographs was categorised as anger. This may suggest that the fear category contained some relatively unsuitable examples of modelled facial expressions. An examination of the results shows that there was one artefact in particular that was regularly mistaken for anger.

Figure 12 shows an expression with the appearance of the eyes being characteristic of fear. The lower eyelid is visibly drawn up and appears to be very tense. Both eyebrows are slightly raised and drawn together. The lower area of the face also shows clear characteristics of fear, such as the slightly opened mouth with stretched lips that are drawn together. In contrast, an angry mouth has the lips either pressed firmly together or open in a "squarish" shape, as if to shout.

However, 18 out of 29 times this expression was categorised as anger. In anger, as in fear, eyebrows can be drawn together. But unlike the fearful face, which shows raised eyebrows, the angry face features a lowered brow. Generally, we have found that subtle changes to upper eyelids and brows had a significant effect on the expression overall, which is in line with findings for real-life photographs [28].

The eyebrows in Fig. 12 are only slightly raised from the relaxed position, but perhaps not enough to give the desired impression. Another confusing indicator is the furrowed shape of the eyebrows, since a straight line or arched brows are more typical for fear.

In contrast, the expression in Fig. 13 is identical to the expression in Fig. 12 apart from the eyebrows, which are now raised and arched, thereby changing the facial expression significantly and making it less ambiguous and distinctively "fearful".

Fig. 12 Fear expression, variation A

Fig. 13 Fear expression, variation B

6.2 Post-experiment questionnaire results

After completing the recognition task, participants were asked to complete a questionnaire and were invited to comment on any aspect of the experiment. Responses to the latter are discussed in the next section of this paper. The questionnaire comprised eleven questions, each one answered on a scale from 0-4, with 0 being total disagreement and 4 being total agreement. Table 3 shows the average values per question.

Table 3 Post-experiment questionnaire results

No.  Statement                                            Score (0 = disagree, 4 = agree)
1    The interface was easy to use                        3.8
2    More emotion categories would have been better       2.3
3    Emotions were easy to recognise                      2.2
4    The real people showed natural emotions              2.6
5    I responded emotionally to the pictures              2.0
6    It was difficult to find the right category          1.9
7    The "recognisability" of the emotions varied a lot   2.5
8    The real-life photographs looked posed               2.2
9    The choice of emotions was sufficient                2.9
10   Virtual faces showed easily recognisable emotions    2.7
11   The virtual head looked alienating                   2.9

7 Discussion and conclusions

The experiment followed the standard practice for expression recognition experiments by preparing the six universal emotions as pictures of avatar faces or photographs of real human faces and showing these pictures to participants, who were asked to say what emotion they thought each photograph or picture portrayed [57]. Photographs were selected from the databank "Pictures of Facial Affect" solely based on their high recognition rates. This was believed to be the most appropriate method, aiming to avoid the introduction of factors that would potentially disturb results, such as gender, age or ethnicity.

Furthermore, the photographs are considered standardised facial expressions of emotions and exact AU coding is available for them. This ensures concurrent validity, since performance in one test (virtual head) is related to another, well reputed test (FACS coding and recognition). Potential order effects induced by the study's repeated measures design were neutralised by presenting the artefacts of the two conditions in a mixed random order.

Further confidence in the results derives from the fact that participants found the interface easy to use (Table 3, statement 1), implying that results were not distorted by extraneous user interface factors. Similarly, although participants tended to feel that the photographs looked posed (Table 3, statement 8), they nevertheless tended to see them as showing real emotion (Table 3, statement 4). Again, despite some ambivalence in the matter (Table 3, statements 2 and 9), participants were on the whole happy with the number of categories of emotion offered in the experiment. This is not unexpected since the facial expressions were showing merely the offered range of emotions, and it supports the validity of our results. However, the slight agreement indicates that more categories could potentially have produced more satisfaction in participants when making their choice. Two participants noted explicitly in their comments that they would have preferred a wider choice of categories.

Having established the validity of the experimental procedure and results, an important conclusion to be drawn is that the approach of applying the reduced FACS model to virtual face representations is not guaranteed to work for all expressions, or all variations of a particular emotion category. This is implied by the finding that recognition rates for the photographs were significantly higher than those for the virtual heads. Further evidence is supplied in the post-experiment questionnaire data. Two participants, for example, noted that on several occasions the virtual face expression was not distinctive enough, and two other participants noted that the virtual head showed no lines or wrinkles and that recognition might have been easier with these visual cues.

Nevertheless, our data also suggests that, when applying the FACS model to virtual face representations, emotions can effectively be visualised with a very limited number of facial features and action units. For example, in respect of the "top scoring" virtual heads, emotion recognition rates are, with the exception of the "disgust" emotion, comparable to those of their corresponding real-life photographs. These top-scoring expressions are exemplar models for which detailed AU scoring is available. They potentially build a basis for emotionally expressive avatars in collaborative virtual environments and hence for the advantages of emotionally enriched CVEs argued for earlier.

No categorisation system can ever be complete. Although accepted categories exist, emotions can vary in intensity and inevitably there is a subjective element to recognition. When modelling and animating facial features, however, our results suggest that such ambiguity in interpretation can be minimised by focussing on, and emphasising, those visual clues that are particularly distinctive. Although it remains to be corroborated through further studies, it is believed that such simple, pure emotional expressions could fulfil a useful role in displaying explicit, intended communicative acts which can therefore help interaction in a CVE. They can provide a basis for emotionally enriched CVEs, and hence for the benefits of such technology being used, for example, within distance learning as argued for earlier.

It should perhaps be noted, however, that such pure forms of emotion are not generally seen in real life, as many expressions occurring in face-to-face communication between humans are unintended or automatic reactions. They are often caused by a complex interaction of several simultaneous emotions, vividly illustrated in Picard's example of a marathon runner who, after winning a race, experiences a range of emotions: "tremendously happy for winning the race, surprised because she believed she would not win, sad that the race was over, and a bit fearful because during the race she had acute abdominal pain" [21].

With regard to our own work, such instinctive reactions could be captured and used to control an avatar directly, potentially allowing varying intensities and blends of facial expressions to be recognised and modelled onto avatar faces. However, this study has deliberately opted for an avatar that can express clearly, and unambiguously, exactly what the controlling individual wants it to express, since this is one way in which people may want to use CVE technology.

Another issue concerns consistency. Social psychology suggests, as do our own findings, that an emotion's recognisability depends on how consistently it is shown on a face. Furthermore, most emotions, with the exception of sadness, become clearer and more distinctive when their intensity increases. There are indications that in cases where the emotion appeared to be ambiguous at first, the photographs contained subtle clues as to what emotion is displayed, enabling the viewer to assign the emotion after closer inspection. These clues appear to be missing in the virtual head artefacts, suggesting the need either to emphasise distinctive and unambiguous features, or to enhance the model by adding visual cues that help identify variations of emotion more clearly. For further work on emotions in real-time virtual environment interactions, the authors aim to concentrate on the former.

Overall, it should be noted that many of the artefacts classified by participants under the "Other..." choice are actually close to the emotion category expected, confirming that the facial expressions in those cases were not necessarily badly depicted. This highlights the importance of having a well-defined vocabulary when investigating emotions, a problem that is not new to the research community and that has been discussed at length over the years (see [33] for an early comparison of emotion dimensions vs. categories, also [32, 58]).

The experimental work discussed in this paper provides strong evidence that creating avatar representations based on the FACS model, but using only a limited number of facial features, allows emotions to be effectively conveyed, giving rise to recognition rates that are comparable with those of the corresponding real-life photographs. Effectiveness has been demonstrated through good recognition rates for all but one of the emotion categories, and efficiency has been established since a reduced feature set was found to be sufficient to build a successfully recognised core set of avatar facial expressions.

In consequence, the top-scoring expressions illustrated earlier may be taken to provide a sound basis for building emotionally expressive avatars to represent users (which may in fact be agents or human users) in CVEs. When modelling and animating facial features, potential ambiguity in interpretation can be minimised by focussing on, and emphasising, particularly distinctive visual clues of a particular emotion. We have proposed a set of expressions that fulfil this. These are not necessarily the most distinctive clues for a particular emotion as a whole, but those that we found to be very distinctive for that emotion category.

8 Further work

It is planned to extend the work in a variety of ways. The data reveals that certain emotions were confused more often than others, most notably disgust and anger. This was particularly the case for the virtual head expressions. Markham and Wang [59] observed a similar link between these two emotions when showing photographs of faces to children. Younger children (aged 4–6) in particular tended to group certain emotions together, while older children (aged 10+) were typically found to have the ability to differentiate correctly. In view of the findings from the current study, this may indicate that although adults can differentiate emotions well in day-to-day social interactions, the limited clues provided by the virtual head make observers revert to a less experience-based and more instinct-based manner of categorising them. However, more work will be necessary to investigate this possibility.
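Such confusions can be tabulated directly from raw recognition responses; the sketch below uses invented response data purely to illustrate the bookkeeping:

```python
# Sketch of how confusions between emotion categories (such as disgust being
# read as anger) can be tabulated from recognition responses. The response
# data below is invented purely for illustration.

from collections import Counter, defaultdict

def confusion_table(responses):
    """responses: iterable of (displayed_emotion, chosen_label) pairs."""
    table = defaultdict(Counter)
    for displayed, chosen in responses:
        table[displayed][chosen] += 1
    return table

fake_responses = [
    ("disgust", "anger"), ("disgust", "disgust"), ("disgust", "anger"),
    ("anger", "anger"), ("anger", "anger"), ("anger", "disgust"),
]

for displayed, counts in confusion_table(fake_responses).items():
    rate = counts[displayed] / sum(counts.values())
    print(f"{displayed}: recognised {rate:.0%}, responses {dict(counts)}")
```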

Two other studies also found disgust often confused with anger [49, 60] and concluded that the lack of morph targets, or visual clues, around the nose was a likely cause. In humans, disgust is typically shown around the mouth and nose [28], and although our model features a slightly raised lip (AU10), there is no movement of the nose. This strongly suggests that, to improve the distinctiveness of the disgust expression in a real-time animated model, the nose should be included in the animation, as should the relevant action unit AU9, which is responsible for "nose wrinkling". Given this, we have now developed an animated model of the virtual head that is capable of lifting and wrinkling the nose to express disgust.
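A sketch of this extension, assuming an expression is stored as a set of action units mapped onto named morph targets (the function and target names are hypothetical, not those of our implementation):

```python
# Sketch: extending the disgust prototype with AU9 ("nose wrinkling") so that
# the nose region is animated as well as the slightly raised upper lip (AU10).
# Function and morph-target names are invented for illustration.

DISGUST_BEFORE = {10: "upper lip raiser"}                       # mouth only
DISGUST_AFTER  = {9: "nose wrinkler", 10: "upper lip raiser"}   # mouth and nose

def morph_targets_for(action_units: dict) -> list:
    """Map each action unit onto a named morph target of the animated head."""
    return [f"AU{num}_{name.replace(' ', '_')}" for num, name in sorted(action_units.items())]

print(morph_targets_for(DISGUST_BEFORE))  # ['AU10_upper_lip_raiser']
print(morph_targets_for(DISGUST_AFTER))   # ['AU9_nose_wrinkler', 'AU10_upper_lip_raiser']
```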

The experimental results, in particular the relatively high number of "Other" and "Don't know" responses, indicate that limiting the number of categories of emotion might have had a negative effect on the recognition success rates. It might be that allowing more categories, and/or offering a range of suitable descriptions for an emotion category (such as joy, cheerfulness and delight to complement happiness), would yield still higher recognition rates, and future experiments will address this.
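Scoring against such a broadened vocabulary could, for instance, treat any label from a synonym set as a correct response; the synonym lists below are illustrative only:

```python
# Sketch: scoring recognition responses when several verbal labels are accepted
# as synonyms for a category (e.g. joy, cheerfulness or delight all counting as
# "happiness"). The synonym sets are invented for illustration.

SYNONYMS = {
    "happiness": {"happiness", "joy", "cheerfulness", "delight"},
    "sadness":   {"sadness", "sorrow", "unhappiness"},
    "anger":     {"anger", "rage", "annoyance"},
}

def is_correct(displayed: str, chosen_label: str) -> bool:
    """A response counts as correct if the chosen label belongs to the
    synonym set of the displayed emotion category."""
    return chosen_label.lower() in SYNONYMS.get(displayed, {displayed})

print(is_correct("happiness", "delight"))   # True under the broader vocabulary
print(is_correct("happiness", "surprise"))  # False
```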

Similarly, although this work concentrates on the face as the primary channel for conveying emotions, it must be seen in a wider context in which the entire humanoid representation of a user can in principle act as the communication device in CVEs. The experiments discussed here set the foundation for further work on emotional postures and the expression of attitude through such a virtual embodiment, drawing for example on the work of [61] on posture, [62] on gestures, or [4] on spatial behaviour and gestures.

A further contextual aspect of emotional recognition concerns the conversational milieu within which emotions are expressed and recognised. Context plays a crucial role in emotion expression and recognition: the effective, accurate mediation of emotion is closely linked with the situation and with other, related, communicative signals. A reliable interpretation of facial expressions is often not possible if it fails to take cognisance of the context in which they are displayed. One would expect, therefore, that recognition of avatar representations of emotion will be higher when contextualised. This assumption requires empirical investigation, however, and future experiments are planned to address this.

Bartneck [49] distinguishes between the recognisability of a facial expression of emotion and its "convincingness", seeing the latter as more important, and further experimental work will enable study of how this distinction plays out in a virtual world. It is predicted that timing will affect convincingness in a virtual world. For example, showing surprise over a period of, say, a minute would, at the very least, send confusing or contradictory signals. It will also be possible to investigate this and, more generally, what impact the mediation of emotions has on conversational interchanges.
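One way to model such timing (a sketch only; the onset, apex and offset durations are illustrative, not empirically derived) is a simple piecewise-linear intensity envelope:

```python
# Sketch of time-limited display of an expression, reflecting the point that
# convincingness depends on timing: surprise held for a minute would send
# contradictory signals. Durations are illustrative values in seconds.

def expression_intensity(t: float, onset=0.2, apex=0.6, offset=0.4) -> float:
    """Piecewise-linear envelope: ramp up during onset, hold during apex,
    decay during offset, and return to neutral afterwards."""
    if t < 0:
        return 0.0
    if t < onset:
        return t / onset
    if t < onset + apex:
        return 1.0
    if t < onset + apex + offset:
        return 1.0 - (t - onset - apex) / offset
    return 0.0

# A brief burst of surprise (about 1.2 s in total) rather than a minute-long stare
for t in (0.0, 0.1, 0.5, 1.0, 1.5):
    print(t, round(expression_intensity(t), 2))
```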

A further contextual issue concerns culture. Although emotions exist universally, there can be cultural differences concerning when emotions are displayed [32]. It appears that people in various cultures differ in what they have been taught about managing or controlling their facial expressions of emotion. Ekman and Friesen [28] call these cultural norms "display rules". Display rules prescribe whether, and if so when, an emotion is supposed to be fully expressed, masked, lowered or intensified. For instance, it has been observed that Japanese men are often reluctant to show unpleasant emotions in the physical presence of others. Interestingly, these cultural differences can also affect the recognition of emotions. In particular, Japanese people reportedly have more difficulty than others recognising negative expressions of emotion, an effect that may reflect a lack of perceptual experience with such expressions because of cultural proscriptions against displaying them [32]. How such cultural differences might play out in a virtual world is an important open question.

Finally, the authors wish to explore how the results concerning the mediation of emotions via avatars might be beneficially used to help people with autism. A commonly, if not universally, held view of the nature of autism is that it involves a "triad of impairments" [63]. First, there is a social impairment: the person with autism finds it hard to relate to, and empathise with, other people. Secondly, there is a communication impairment: the person with autism finds it hard to understand and use verbal and non-verbal communication. Finally, there is a tendency to rigidity and inflexibility in thinking, language and behaviour. Much current thinking is that this triad is underpinned by a "theory of mind deficit": people with autism may have difficulty in understanding mental states and in ascribing them to themselves or to others.

CVE technology of the sort discussed in this paper could potentially provide a means by which people with autism might communicate with others (autistic or non-autistic) and thus circumvent their social and communication impairment and sense of isolation. Further, as well as this prosthetic role, the technology can also be used for purposes of practice and rehearsal. For this to help combat any theory of mind problem, users would need to be able to recognise the emotions being displayed via the avatars. The findings reported in the current paper give grounds for confidence that the technology will be useful in such a role, but this needs to be investigated in practice [64, 65].

Much remains to be investigated, therefore, concerning the educational use of the emerging CVE technology. It is hoped that the work reported in this paper will help set the foundation for further work on the mediation of emotions in virtual worlds.

Acknowledgements Photographs from the CD-ROM Pictures of Facial Affect [35] are used with permission. Original virtual head geometry by Geometrek. Detailed results of this study, as well as the virtual head prototypes, are available online at http://www.leedsmet.ac.uk/ies/comp/staff/mfabri/emotion.

References

1. Thalmann D (2001) The role of virtual humans in virtual environment technology and interfaces. In: Earnshaw R, Guedj R and Vince J (eds) Frontiers of human-centred computing, online communities and virtual environments. Springer, Berlin Heidelberg New York

2. Fleming B, Dobbs D (1999) Animating facial features and expressions. Charles River Media, Boston, MA

3. Dumas C, Saugis G, Chaillou C, Degrande S and Viaud M (1998) A 3-D interface for cooperative work. In: Proceedings of the Conference on Collaborative Virtual Environments, Manchester, UK, 17–19 June 1998

4. Manninen T, Kujanpaa T (2002) Non-verbal communication forms in multi-player game sessions. In: Faulkner X, Finlay J, Detienne F (eds) People and computers XVI—memorable yet invisible. BCS Press, London, UK

5. Atkins H, Moore D, Hobbs D and Sharpe S (2001) Learning style theory and computer mediated communication. In: Proceedings of ED-Media, Tampere, Finland, 25–30 June 2001

6. Laurillard D (1993) Rethinking university teaching. Routledge, London, UK

7. Moore M (1993) Three types of interaction. In: Harry K, John M and Keegan D (eds) Distance education: new perspectives. Routledge, London, UK

8. Johnson D, Johnson R (1994) Cooperative learning in the culturally diverse classroom. In: DeVillar, Faltis, Cummins (eds) Cultural diversity in schools. State University of New York Press, Albany, NY

9. Webb N (1995) Constructive activity and learning in collaborative small groups. Educ Psychol 87(3):406–423

10. Wu A, Farrell R and Singley M (2002) Scaffolding group learning in a collaborative networked environment. In: Proceedings of CSCL 2002, Boulder, CO, 7–11 January 2002

11. Lisetti C, Douglas M and LeRouge C (2002) Intelligent affective interfaces: a user-modeling approach for telemedicine. In: Proceedings of the International Conference on Universal Access in HCI, New Orleans, LA, 5–10 August 2002

12. Daly-Jones O, Monk A and Watts L (1998) Some advantages of video conferencing over high-quality audio conferencing: fluency and awareness of attentional focus. Int J Hum-Comp Stud 49(1):21–58

13. McShea J, Jennings S and McShea H (1997) Characterising user control of video conferencing in distance education. In: Proceedings of CAL-97, Exeter, UK, 25–26 March 1997

14. Fabri M, Gerhard M (2000) The virtual student: user embodiment in virtual learning environments. In: Orange G, Hobbs D (eds) International perspectives on tele-education and virtual learning environments. Ashgate, Aldershot, UK

15. Knapp M (1978) Nonverbal communication in human interaction. Holt Rinehart Winston, New York, NY

16. Morris D, Collett P, Marsh P and O'Shaughnessy M (1979) Gestures, their origin and distribution. Jonathan Cape, London, UK

17. Argyle M (1988) Bodily communication (second edition). Methuen, New York, NY

18. Strongman K (1996) The psychology of emotion (fourth edition). Wiley, New York

19. Dittrich W, Troscianko T, Lea S and Morgan D (1996) Perception of emotion from dynamic point-light displays presented in dance. Perception 25:727–738

20. Keltner D (1995) Signs of appeasement: evidence for the distinct displays of embarrassment, amusement and shame. Pers Soc Psychol 68(3):441–454

21. Picard R (1997) Affective computing. MIT Press, Cambridge, MA

22. Lisetti C, Schiano D (2000) Facial expression recognition: where human-computer interaction, artificial intelligence and cognitive science intersect. Prag Cognit 8(1):185–235

23. Damasio A (1994) Descartes' error: emotion, reason and the human brain. Avon, New York

24. Cooper B, Brna P and Martins A (2000) Effective affective in intelligent systems—building on evidence of empathy in teaching and learning. In: Paiva A (ed) Affective interactions: towards a new generation of computer interfaces. Springer, Berlin Heidelberg New York

25. Johnson W (1998) Pedagogical agents. In: Computers in education proceedings, Beijing, China

26. McGrath A, Prinz W (2001) All that is solid melts into software. In: Churchill, Snowdon, Munro (eds) Collaborative virtual environments—digital places and spaces for interaction. Springer, Berlin Heidelberg New York

27. Durlach N, Slater M (2002) Meeting people virtually: experiments in shared virtual environments. In: Schroeder R (ed) The social life of avatars. Springer, Berlin Heidelberg New York

28. Ekman P, Friesen W (1975) Unmasking the face. Prentice Hall, Englewood Cliffs, NJ

29. New Oxford Dictionary of English. Oxford University Press, Oxford, UK

30. Russell J, Fernandez-Dols J (1997) The psychology of facial expression. Cambridge University Press, Cambridge, UK

31. Surakka V, Hietanen J (1998) Facial and emotional reactions to Duchenne and non-Duchenne smiles. Int J Psychophys 29(1):23–33

32. Zebrowitz L (1997) Reading faces: window to the soul? Westview, Boulder, CO

33. Ekman P, Friesen W and Ellsworth P (1972) Emotion in the human face: guidelines for research and an integration of findings. Pergamon, New York

34. Ekman P (1999) Facial expressions. In: Dalgleish T, Power M (eds) Handbook of cognition and emotion. Wiley, New York

35. Ekman P, Friesen W (1975) Pictures of facial affect. University of California Press, San Francisco, CA

36. Ekman P, Friesen W (1978) Facial action coding system. Consulting Psychologists Press, San Francisco, CA

37. Bartlett M (1998) Face image analysis by unsupervised learning and redundancy reduction. Dissertation, University of California

38. Pelachaud C, Badler N and Steedman M (1996) Generating facial expressions for speech. Cog Sci 20(1):1–46

39. Ekman P, Rosenzweig L (eds) What the face reveals: basic and applied studies of spontaneous expression using the facial action coding system. Oxford University Press, Oxford, UK

40. Terzopoulos D, Waters K (1993) Analysis and synthesis of facial image sequences using physical and anatomical models. Patt Anal Mach Intell 15(6):569–579

41. Platt S, Badler N (1981) Animating facial expression. ACM SIGGRAPH 15(3):245–252

42. Parke F (1982) Parameterized modeling for facial animation. IEEE Comp Graph Appl 2(9):61–68

43. Benford S, Bowers J, Fahlen L, Greenhalgh C and Snowdon D (1995) User embodiment in collaborative virtual environments. In: Proceedings of CHI 1995, Denver, CO, 7–11 May 1995

44. Mori M (1982) The buddha in the robot. Tuttle, Boston, MA

45. Reichardt J (1978) Robots: fact, fiction and prediction. Thames & Hudson, London, UK

46. Hindmarsh J, Fraser M, Heath C and Benford S (2001) Virtually missing the point: configuring CVEs for object-focused interaction. In: Churchill E, Snowdon D and Munro A (eds) Collaborative virtual environments: digital places and spaces for interaction. Springer, Berlin Heidelberg New York

47. Godenschweger F, Strothotte T and Wagener H (1997) Rendering gestures as line drawings. In: Proceedings of the Gesture Workshop 1997. Springer, Berlin Heidelberg New York

48. Donath J (2001) Mediated faces. In: Beynon M, Nehaniv C and Dautenhahn K (eds) Cognitive technology: instruments of mind. Warwick, UK

49. Bartneck C (2001) Affective expressions of machines. In: Proceedings of CHI 2001, Seattle, WA, 31 March–5 April 2001

50. Ellis H (1990) Developmental trends in face recognition. Bullet Brit Psychol Soc 3:114–119

51. Dittrich W (1991) Facial motion and the recognition of emotions. Psychol Beit 33(3/4):366–377

52. H-Anim Working Group. Specification for a standard VRML humanoid. http://www.h-anim.org

53. Yacoob Y, Davis L (1994) Computing spatio-temporal representations of human faces. In: Proceedings of the Computer Vision and Pattern Recognition Conference, Seattle, WA, June 1994

54. Essa I, Pentland A (1995) Coding, analysis, interpretation, and recognition of facial expressions. IEEE Trans Patt Anal Mach Intellig 19(7):757–763

55. Neisser U (1976) Cognition and reality. Freeman, San Francisco, CA

56. Poggi I, Pelachaud C (2000) Emotional meaning and expression in animated faces. In: Paiva A (ed) Affective interactions: towards a new generation of computer interfaces. Springer, Berlin Heidelberg New York

57. Rutter D (1990) Non-verbal communication. In: Eysenck M (ed) The Blackwell dictionary of cognitive psychology. Blackwell, Oxford, UK

58. Wehrle T, Kaiser S (2000) Emotion and facial expression. In: Paiva A (ed) Affective interactions: towards a new generation of computer interfaces. Springer, Berlin Heidelberg New York

59. Markham R, Wang L (1996) Recognition of emotion by Chinese and Australian children. Cross-Cult Psychol 27(5):616–643

60. Spencer-Smith J, Innes-Ker A, Wild H and Townsend J (2002) Making faces with action unit morph targets. In: Proceedings of the Artificial Intelligence and Simulated Behaviour Conference, London, UK, 3–5 April 2002

61. Coulson M (2002) Expressing emotion through body movement: a component process approach. In: Proceedings of the Artificial Intelligence and Simulated Behaviour Conference, London, UK, 3–5 April 2002

62. Capin T, Pandzic I, Thalmann N and Thalmann D (1999) Realistic avatars and autonomous virtual humans in VLNET networked virtual environments. In: Earnshaw R, Vince J (eds) Virtual worlds on the Internet. IEEE Computer Science Press, Washington, DC

63. Wing L (1996) Autism spectrum disorders. Constable Robinson, London, UK

64. Moore D, McGrath P and Thorpe J (2000) Computer aided learning for people with autism—a framework for research and development. Innov Educ Train Intl 37(3):218–228

65. Moore D, Taylor J (2001) Interactive multimedia systems for people with autism. Educ Med 25(3):169–177
