

A Flexible Architecture for Virtual Humans in Networked Collaborative Virtual Environments

Igor Sunday Pandzic 1, Tolga K. Capin 2, Elwin Lee 1, Nadia Magnenat Thalmann 1, Daniel Thalmann 2

1 MIRALab - CUI, University of Geneva, 24 rue du Général-Dufour, CH1211 Geneva 4, Switzerland
2 Computer Graphics Lab., Swiss Federal Institute of Technology (EPFL), CH1015 Lausanne, Switzerland

Abstract

Complex virtual human representation provides more natural interaction and communication among participants in networked virtual environments, hence it is expected to increase the sense of being together within the same virtual world. We present a flexible framework for the integration of virtual humans in networked collaborative virtual environments. A modular architecture allows flexible representation and control of the virtual humans, whether they are controlled by a physical user using all sorts of tracking and other devices, or by an intelligent control program turning them into autonomous actors. The modularity of the system allows for fairly easy extensions and integration with new techniques, making it interesting also as a testbed for various domains from "classic" VR to psychological experiments. We present results in terms of functionalities, example applications and measurements of performance and network traffic with an increasing number of participants in the simulation.

Keywords: virtual humans, networked virtual environments, virtual environment software architecture, experimental evaluation.

1. Introduction

Networked Collaborative Virtual Environments (NCVE) are often described as systems that allow users to feel as if they were together in a shared Virtual Environment. Indeed, the feeling of "being together" is extremely important for collaboration, as well as for the sense of presence felt by participants. A very important factor for the feeling of being together in a virtual world is the way users perceive each other: their embodiment. In a broader sense, this includes not just the graphical appearance but also the way movements, actions and emotions are represented.

Although Networked Collaborative Virtual Environments have been around as a topic of research for quite some time, in most of the existing systems the embodiments are fairly simple, ranging from primitive cube-like appearances and non-articulated human-like or cartoon-like avatars to articulated body representations using rigid body segments 1-4. Ohya et al. 5 report the use of human representations with animated bodies and faces in a virtual teleconferencing application. In the approach we adopted when developing the Virtual Life Network (VLNET) system 6-9, more sophisticated virtual humans are simulated, including full anatomically based body articulation, skin deformations and facial animation (Figure 1). Managing multiple instances of such complex representations and the involved data flow in the context of a Networked Collaborative Virtual Environment, while maintaining versatile input and interaction options, requires careful consideration of the software architecture to use. In view of the complexity of the task and the versatility we wanted to achieve, a highly modular, multiprocess architecture was a logical choice. Some of the modules are fixed as part of the system core, while the others are replaceable external processes, allowing greater flexibility in controlling the events in the environment, in particular the movement and facial expressions of the virtual humans.

EUROGRAPHICS ’97 / D.W. Fellner and L. Szirmay-Kalos (Guest Editors), Volume 16, (1997), Number 3. The Eurographics Association 1997.


We first introduce in general the problems linked with involving Virtual Humans in NCVEs. The next section presents the overall properties of the VLNET system, followed by a detailed description of the software architecture adopted for the system: the VLNET server, then the VLNET client with its processes, engines, interfaces and drivers. The results section covers some experimental applications of VLNET, as well as performance and network measurements with an increasing number of users. Finally we present our conclusions and discuss possibilities for future work.

2. Virtual Humans in NCVEs

The participant representation in a networked VE system has several functions:

• perception (to see if anyone is around)

• localization (to see where the person is)

• identification (to recognize the person)

• visualization of interest focus (to see where the person's attention is directed)

• visualization of actions (to see what the person is doing)

• communication (lip movement in sync with speech, facial expressions, gestures)

Virtual Humans can fulfil all these functions in an intuitive, natural way resembling the way we achieve these tasks in real life. Even with limited sensor information, a virtual human frame can be constructed in the virtual world, reflecting the activities of the real user. Slater and Usoh 10 indicate that such a body, even if crude, already increases the sense of presence that the participants feel. The participants visualize the environment through the eyes of their virtual actor, and move their virtual body by different means of body control. In addition, introducing human-like autonomous actors for various tasks increases the level of interaction within the virtual environment.

Introducing virtual humans in the distributed virtual environment is a complex task combining several fields of expertise 6. The principal components are:

• virtual human simulation, involving real time animation/deformation of bodies and faces

• virtual environment simulation, involving visual database management and rendering techniques with real time optimisations

• networking, involving communication of various types of data with varying requirements in terms of bitrate, error resilience and latency

• interaction, involving support of different devices and paradigms

• artificial intelligence (in case autonomous virtual humans are involved), involving decision making processes and autonomous behaviors

Each of these components represents in itself an area of research, and most of them are very complex. When combining them, the interaction between components and their impact on each other have to be considered. For example, using virtual humans sets new requirements on interaction, which has to allow not only simple interaction with the environment, but at the same time the visualization of the actions through the body and face representing the user; the necessity to communicate data through the network forces a more compact representation of face and body animation data.

Figure 1: An example session of the Virtual Life Network system with virtual humans.

Considering the total complexity of the above components, a divide-and-conquer approach is a logical choice. By splitting the complex task into modules (see Figure 3), each with a precise function and with well defined interfaces between them, several advantages are achieved:

• high flexibility

• easier software management, especially in a teamwork environment

• higher performance

• leveraging the power of multiprocessor hardware or distributed environments when available

Flexibility is particularly important, because of the multitude of emerging hardware and software technologies that can potentially be linked with NCVE systems (various input devices and techniques, AI algorithms, real-time data sources driving multi-user applications). This is especially interesting in a research environment where a NCVE system can be used as a testbed for research in the fields of AI, psychology, medical information systems etc. In general, a good NCVE system must allow the implementation of different applications while transparently performing its basic tasks (networking, user representation, interaction, rendering...) and letting the application programmer concentrate on the application-specific problems.

From the software management point of view, a monolithic system of this complexity would be extremely difficult to manage, in particular by a team of programmers.

By carefully assigning tasks to processes and synchronizing them intelligently, higher performance can be achieved 11.

Finally, a multi-process system will naturally harness the power of multi-processor hardware if it is available. It is also possible to distribute modules on several hosts.

Once it is decided to split the system into modules, roughly corresponding to the components listed above, it is necessary to define in detail the task of each module and the means of communication between them. The next section describes in detail how this is done in the case of the Virtual Life Network system.

3. Virtual Life Network

Based on the considerations from the previous section we have developed the Virtual Life Network (VLNET) system. From the networking point of view, VLNET is based on a fairly simple client/server architecture. The next two subsections discuss in more detail the server and the client architecture.

3.1. VLNET Server

A VLNET server site consists of an HTTP server and a VLNET Connection Server. They can serve several worlds, which can be either VLNET files or VRML 1.0 files. For each world, a World Server is spawned as necessary, i.e. when a client requests a connection to that particular world. The life of a World Server ends when all clients are disconnected.

Figure 2 schematically depicts a VLNET server site with several connected clients. A VLNET session is initiated by a Client connecting to a particular world designated by a URL. The Client first fetches the world database from the HTTP server using the URL. After that it extracts the host name from the URL and connects to the VLNET Connection Server on the same host. The Connection Server spawns the World Server for the requested world if one is not already running, and sends the Client the port address of the World Server. Once the connection is established, all communication between the clients in a particular world passes through the World Server.

Figure 2: Connection of several clients to a VLNET server site

In order to reduce the total network load, the World Server filters messages by checking the users' viewing frusta in the virtual world and distributing messages only on an as-needed basis. Clients keep the possibility of overriding this mechanism by requesting a higher delivery assurance level for a particular message, e.g. for heartbeat messages of a dead reckoning algorithm 12.
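The distribution loop implied by this filtering could look like the sketch below; the data structures, the in_frustum test and the assured flag are hypothetical names, while the behavior (frustum-based filtering with a per-message override) is the one described above.

    /* Hypothetical World Server distribution loop. */
    typedef struct { float x, y, z; } Vec3;
    typedef struct { Vec3 apex, dir; float half_angle; } Frustum; /* simplified */
    typedef struct { int sender; int assured; char body[80]; } Message;
    typedef struct { int id; Frustum view; } Client;
    typedef struct { int num_clients; Client clients[64]; Vec3 pos[64]; } World;

    int  in_frustum(const Frustum *f, Vec3 p);      /* assumed visibility test */
    void send_message(Client *c, const Message *m); /* assumed network send    */

    void distribute(World *w, const Message *msg)
    {
        for (int i = 0; i < w->num_clients; i++) {
            Client *c = &w->clients[i];
            if (c->id == msg->sender)
                continue;                 /* never echo a message to its sender */
            if (msg->assured ||           /* client requested assured delivery  */
                in_frustum(&c->view, w->pos[msg->sender]))
                send_message(c, msg);     /* otherwise the message is filtered  */
        }
    }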


Figure 3: Virtual Life Network system overview. The legend distinguishes: internal VLNET processes, which can be changed only by recompiling VLNET; logical entities within the VLNET main process, called engines; internal shared memory segments for data exchange within internal processes, not accessible to users; external shared memory interfaces, accessible to the users through defined APIs; external processes (called drivers), which can be programmed by the user using the defined APIs and are replaceable and sometimes optional; and external devices, sometimes optional or replaceable. If any of the drivers runs on a remote host, the network interface is automatically installed between the driver and the VLNET core.

3.2. VLNET Client

The design of the VLNET Client is highly modular, with functionalities split into a number of processes. Figure 3 presents an overview of the modules and their connections. VLNET has an open architecture, with a set of interfaces allowing a user with some programming knowledge to access the system core and change or extend the system by plugging custom-made modules, called drivers, into the VLNET interfaces. In the next subsections we explain in some detail the VLNET Core with its various processes, as well as the drivers and the possibilities for system extension they offer.

3.2.1. VLNET Core

The VLNET core is a set of processes, interconnected through shared memory, that perform the basic VLNET functions. The Main Process performs higher level tasks, like object manipulation, navigation and body representation, while the other processes provide services for networking (Communication Process), database loading and maintenance (Database Process) and rendering (Cull Process and Draw Process).

3.2.1.1. The Main Process

The Main Process consists of five logical entities, called engines, covering different aspects of VLNET. It also initializes the session and spawns all other processes and drivers. Each engine is equipped with an interface for the connection of external drivers.

The Object Behavior Engine takes care of predefined object behaviors, like rotation or falling, and has an interface allowing different behaviors to be programmed using external drivers.

The Navigation and Object Manipulation Engine takes care of the basic user input: navigation, picking and displacement of objects. It provides an interface for the navigation driver. If no navigation driver is activated, standard mouse navigation is provided internally.

The Body Representation Engine is responsible for the deformation of the body. For any given body posture (defined by a set of joint angles) this engine will provide a deformed body ready to be rendered. The body representation is based on the Humanoid body model 13. This engine provides the interface for changing the body posture. A standard Body Posture Driver is provided; it also connects to the navigation interface to get the navigation information, then uses the Walking Motor and the Arm Motor 9,14 to generate natural body movement based on the navigation.

The Facial Representation Engine provides the synthetic faces with the possibility to change expressions or the facial texture. The Facial Expression Interface is used for this task. It can be used to animate a set of parameters defining the facial expression. The facial representation is a polygon mesh model with Free Form Deformations simulating muscle actions 15.

The Video Engine manages the streaming of dynamic textures to the objects in the environment and their correct mapping. Its interface gives the user the possibility to stream video textures onto any object(s) in the environment. The facial texture mentioned in connection with the Facial Representation Engine is a special case and is handled by both engines.

Audio is also a very important feature, especially for human-to-human communication, and should be available. Audio can be added to the VLNET client, similarly to the other streams, using an audio engine. Adding sound to the VLNET client is currently in progress.

All the engines in the VLNET core process are coupled to the main shared memory and to the message queue. They use the data from the culling process in order to perform "computation culling". This means that operations are performed for user embodiments and other objects only when they are within the field of view; e.g. there is no need to compute facial expressions if the face is not visible at the moment. A substantial speedup is achieved using this technique.
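As an illustration of computation culling, the sketch below gates per-embodiment work on a visibility flag assumed to be written by the culling process; the structure layout and routine names are invented for the example.

    /* Hypothetical per-frame engine update using computation culling. */
    typedef struct { int visible; float joints[72]; float mpa[64]; } Embodiment;
    typedef struct { int num_users; Embodiment users[64]; } SharedMem;

    void deform_face(Embodiment *e);  /* assumed engine routines */
    void deform_body(Embodiment *e);

    void update_embodiments(SharedMem *shm)
    {
        for (int i = 0; i < shm->num_users; i++) {
            Embodiment *e = &shm->users[i];
            if (!e->visible)      /* flag produced by the culling process */
                continue;         /* skip face and body deformation       */
            deform_face(e);       /* apply the current facial expression  */
            deform_body(e);       /* apply the current joint angles       */
        }
    }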

3.2.1.2. Cull and Draw Processes

The Cull and Draw processes access the main shared memory and perform the functions of culling and drawing, as their names suggest. These processes are standard SGI Performer processes.

3.2.1.3. The Communication Process

The Communication Process handles all network connections and communication in VLNET. All other processes and engines are connected with it through the Message Queue. They fill the queue with outgoing messages for the Communication Process to send. The messages are sent to the server, distributed, and received by the Communication Processes of the other clients. The Communication Process puts these incoming messages into the Message Queue, from where the other processes and engines can read them and react. All messages in VLNET use a standard message packet. The packet has a standard header, determining the sender and the message type, and the message body. The message body content depends on the message type but is always of the same size (80 bytes), satisfying all message types in VLNET. For certain types of messages (positions, body postures, facial expressions) a dead-reckoning algorithm is implemented within the Communication Process 12. The video data from the Video Engine is a special case and is handled using a separate communication channel. It is given lower priority than the other data. By isolating the communications in this separate process, and by keeping the VLNET Server relatively simple, we retain the possibility of switching relatively easily to a completely different network topology, e.g. multicasting instead of client/server.
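In C, the fixed-size packet described above might be declared as follows. Only the sender, the message type and the 80-byte body are stated in the text; the field widths and the example payload view are assumptions.

    /* Hypothetical layout of the fixed-size VLNET message packet. */
    #include <stdint.h>

    #define VLNET_BODY_SIZE 80    /* stated size, fits every message type */

    typedef struct {
        uint32_t sender;          /* id of the originating client         */
        uint32_t type;            /* e.g. position, posture, expression   */
        uint8_t  body[VLNET_BODY_SIZE]; /* interpretation depends on type */
    } VlnetPacket;

    /* Illustrative payload view for a body-posture message: 72 joint
       angles as 32-bit floats would not fit in 80 bytes, so some form of
       quantization is implied; this 8-bit encoding is purely an example. */
    typedef struct {
        uint8_t joint_angle[72];  /* quantized joint angles */
        uint8_t reserved[8];
    } PosturePayload;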

3.2.1.4. The Data Base Process

The Data Base Process takes care of the off-line fetching and loading of objects and user representations. By keeping these time-consuming operations in a separate process, non-blocking operation of the system is assured.


3.2.2. The Drivers

The drivers provide a simple and flexible means to access and control all the complex functionalities of VLNET. Simple, because each driver is programmed using a very small API that basically consists of exchanging crucial data with VLNET through shared memory. Flexible, because using various combinations of drivers it is possible to support all sorts of input devices, ranging from the mouse to the camera with complex gesture recognition software; to control all the movements of the body and face using those devices; to control objects in the environment and stream video textures onto them; and to build any amount of artificial intelligence in order to produce autonomous or semi-autonomous virtual humans in the networked virtual environment.

The Drivers are directly tied to the Engines in the VLNET Main Process. Each engine provides a shared memory interface to which a driver can connect. Most drivers are optional, and the system will provide minimal functionality (plain navigation and manipulation of objects) without any drivers. The drivers are spawned by the VLNET Main Process at the beginning of the session, based on the command line, where any combination of drivers can be specified. The drivers can be spawned on the local host or on a remote host, in which case transparent networking interface processes are inserted on both hosts. In a simple case, as with most drivers shown in figure 3, a driver controls only one engine. However, it is possible to control more than one engine with a single driver, ensuring synchronization and cooperation.
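The driver API itself is not listed here; the skeleton below is therefore a hedged guess at a minimal driver, with invented names (VlnetInterface, vlnet_open_interface, vlnet_post) standing in for the real shared-memory API.

    /* Hypothetical skeleton of a VLNET driver: attach to an engine's
       shared memory interface, then repeatedly publish control data. */
    #include <unistd.h>

    #define NUM_MPAS 64                     /* illustrative MPA count */
    enum { VLNET_FACIAL_EXPRESSION };       /* illustrative selector  */

    typedef struct VlnetInterface VlnetInterface;     /* opaque handle */
    VlnetInterface *vlnet_open_interface(int which);  /* assumed API   */
    void vlnet_post(VlnetInterface *i, const void *d, int size);
    void read_device(float *mpa);  /* e.g. camera tracking or a GUI menu */

    int main(void)
    {
        VlnetInterface *ifc = vlnet_open_interface(VLNET_FACIAL_EXPRESSION);
        for (;;) {
            float mpa[NUM_MPAS];
            read_device(mpa);                   /* sample the input device */
            vlnet_post(ifc, mpa, sizeof mpa);   /* hand the MPAs over      */
            usleep(40000);                      /* about one frame period  */
        }
    }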

3.2.2.1. Driver types

The Facial Expression Driver is used to control the expressions of the user's face. The expressions are defined using Minimal Perceptible Actions (MPAs) 16. The MPAs provide a complete set of basic facial actions, and using them it is possible to define any facial expression. Examples of existing facial expression drivers include a driver that uses the video signal from a camera to track facial features and map them into the MPAs describing expressions 17, and a driver that lets the user choose from a menu of expressions or emotions to show on his face. The facial expression driver is optional.

The Body Posture Driver controls the motion of the user's body. The postures are defined using a set of joint angles corresponding to the 72 degrees of freedom of the skeleton model used in VLNET. An obvious example of using this driver is direct motion control using magnetic trackers 18. A more complex driver is used to control body motion in the general case when trackers are not used. This driver also connects to the Navigation Interface and uses the navigation trajectory to generate the walking motion and arm motion. It also imposes constraints on the Navigation Driver, e.g. not allowing the hand to move further than arm length or take an unnatural posture. This is the standard body posture driver, which is spawned by the system unless another driver is explicitly requested.
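To show how a navigation trajectory can be turned into posture updates, here is a hedged sketch of the standard driver's core step; the phase-based cycle and the motor signatures are assumptions about how the Walking Motor and Arm Motor 9,14 are driven, not their published interfaces.

    /* Hypothetical core of the standard Body Posture Driver. */
    #include <math.h>

    typedef struct { float x, y, z; } Vec3;

    static float vec3_dist(const Vec3 *a, const Vec3 *b)
    {
        float dx = a->x - b->x, dy = a->y - b->y, dz = a->z - b->z;
        return sqrtf(dx * dx + dy * dy + dz * dz);
    }

    void walking_motor(float speed, float phase, float joints[72]); /* assumed */
    void arm_motor(float speed, float phase, float joints[72]);     /* assumed */

    void posture_step(const Vec3 *pos, const Vec3 *prev, float dt,
                      float *phase, float joints[72])
    {
        /* Walking speed follows the distance covered since the last frame. */
        float speed = vec3_dist(pos, prev) / dt;

        /* Advance the walking cycle proportionally to the speed. */
        *phase = fmodf(*phase + speed * dt, 1.0f);

        /* The motors map (speed, phase) to the 72 joint angles. */
        walking_motor(speed, *phase, joints);
        arm_motor(speed, *phase, joints);
    }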

The Navigation Driver is used for navigation, hand movement, head movement, basic object manipulation and basic system control. The basic manipulation includes picking objects up, carrying them and letting them go, as well as grouping and ungrouping of objects. The system control provides access to some system functions that are usually accessed by keystrokes, e.g. changing drawing modes, toggling texturing or displaying statistics. Typical examples are a spaceball driver, a tracker+glove driver and an extended mouse driver (with GUI console). There is also an experimental facial navigation driver letting the user navigate using his head movements and facial expressions tracked by a camera 17. If no navigation driver is used, internal mouse navigation is activated within the Navigation Engine.

The Object Behavior Driver is used to control the behavior of objects. Currently it is limited to controlling motion and scaling. Examples include the control of a ball in a tennis game and the control of the graphical representation of stock values in a virtual stock exchange.

The Video Driver is used to stream video textures (but possibly also static textures) onto any object in the environment. The alpha channel can be used for blending, achieving effects of mixing real and virtual objects/persons. This type of driver is also used to stream facial video onto the user's face for facial communication 19.

4. Results

4.1. Applications

Several experimental applications were developed using the VLNET system. Some snapshots are presented in figure 4.

4.1.1. Entertainment

NCVE systems lend themselves to the development of all sorts of multi-user games. We have had successful demonstrations of chess and other games played between Switzerland and Singapore, as well as between Switzerland and several European countries.


Figure 4: Snapshots from applications

A virtual tennis game has been developed 20 where the user plays against an opponent who is an autonomous virtual human. The referee is also an autonomous virtual human, capable of refereeing the game and communicating the points, faults etc. by voice.

Currently a multi-user adventure game is under development.

4.1.2. Teleshopping

In collaboration with Chopard, a watch company in Geneva, a teleshopping application was successfully demonstrated between Geneva and Singapore. The users were able to communicate with each other within a rich virtual environment representing a watch gallery with several watches on display. They could examine the watches together, exchange bracelets, and finally choose a watch.

4.1.3. Medical education

In collaboration with our colleagues working on a medical project, we have used 3D data of human organs, muscles and the skeleton reconstructed from MRI images 21 to build a small application in the field of medical education. The goal is to teach a student the position, properties and function of a particular muscle. In the virtual classroom several tools are at the professor's disposal. MRI, CT and anatomical slice images are on the walls, showing the slices where the muscle is visible. A 3D reconstruction of the skeleton is available together with the muscle, allowing the student to examine the shape of the muscle and its points of attachment to the skeleton. Finally, an autonomous virtual human with a simple behavior is there to demonstrate the movements resulting from a contraction of the discussed muscle.

4.1.4. Stock exchange

Currently an application is being developed to visualize in 3D the real time updates of stock exchange data and to allow interactions of users with the data and with each other.


4.2. Performance and networking

Considering the graphical and computational complexity of the human representations used in VLNET, we are currently not aiming for a system scalable to a large number of users, but rather trying to obtain a high quality experience for a small number of users. The graphs of performance and network traffic (figures 6 and 7) show that the system is indeed not scalable to any larger number of participants. Nevertheless, the results are reasonable for a small number of users and, more importantly, their analysis permits gaining insight into the steps needed to ensure better scalability.

4.2.1. Experiment design

In order to conveniently simulate a growing number of users, we have designed simple drivers to generate some reasonable motion and facial expressions. The navigation driver we used generates a random motion within a room, and the facial expression driver plays a facial animation sequence repeatedly. By launching several clients on different hosts using these drivers, we can easily simulate a session with a number of persons walking in the room and emoting using their faces.
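A hedged sketch of such a simulation driver follows; the room bounds, step size and the vlnet_post_position call are invented for the illustration, since the text only states that the driver produces random motion within a room.

    /* Hypothetical simulated-user navigation driver: random walk in a room. */
    #include <stdlib.h>
    #include <unistd.h>

    #define ROOM_SIZE 10.0f   /* assumed room extent in meters  */
    #define STEP      0.05f   /* assumed displacement per frame */

    void vlnet_post_position(float x, float y, float z);  /* stand-in API */

    static float frand(void) { return (float)rand() / (float)RAND_MAX; }

    int main(void)
    {
        float x = ROOM_SIZE * frand(), z = ROOM_SIZE * frand();
        for (;;) {
            /* Take a small random step, clamped to the room. */
            x += STEP * (2.0f * frand() - 1.0f);
            z += STEP * (2.0f * frand() - 1.0f);
            if (x < 0.0f) x = 0.0f; else if (x > ROOM_SIZE) x = ROOM_SIZE;
            if (z < 0.0f) z = 0.0f; else if (z > ROOM_SIZE) z = ROOM_SIZE;

            vlnet_post_position(x, 0.0f, z);  /* publish the new position */
            usleep(40000);                    /* about 25 updates per sec */
        }
    }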

Although we looked for a way to simulate something close to a real multi-user session in a controlled way, there is one difference: the simulated users move their bodies and faces all the time. In a real session, the real users would probably make pauses and thus generate less activity. We therefore expect somewhat better results in a real session than the ones shown here, although for practical reasons we did not run such tests with real users.

In order to evaluate the overhead induced by the use of a high level body representation in the NCVE, we have undertaken three series of measurements: with full body representation, with simplified body representation and without a body representation. The full body representation involves a complex graphical representation (approx. 10 000 polygons) and deformation algorithms. The simplified body representation consists of a body with reduced graphical complexity (approx. 1 500 polygons), with facial deformations and with a simplified body animation based on the displacement of rigid body elements (no deformation). The tests without a body representation were made for the sake of comparison; to mark the positions of users we used simple geometric shapes, and no facial or body animation is involved. Figure 5 shows some snapshots from the measurement sessions.

The network traffic (figure 7) was measured on the server. We measured incoming and outgoing traffic in kilobits per second.

For the performance measurements (figure 6), we connected one standard, user-controlled client in order to be able to control the point of view. The rest of the clients were user simulations as described above. Performance is measured in frames per second. Since it varies depending on the point of view because of rendering and computation culling, we needed to ensure consistent results. We chose to take measurements with a view where all the user embodiments are within the field of view, which represents the minimal performance for a given number of users in the scene. The performance was measured on a Silicon Graphics Indigo Maximum Impact workstation with a 200 MHz R4400 processor and 128 MB RAM.

4.2.2. Analysis of performance results

Figure 6 shows the variation of performance with respect to the number of users in the simulation, with different complexities of body representation as explained in the previous subsection. It is important to notice that this is the minimal performance, i.e. the one measured at the moment when all the users' embodiments are actually within the field of view. Rendering and computation culling boost the performance when the user is not looking at all the other embodiments, because they are not rendered and the deformation calculations are not performed for them. The embodiments that are out of the field of view do not decrease performance significantly, which means that the graph in figure 6 can also be interpreted as the peak performance when looking at N users, regardless of the total number of users in the scene. For example, even if the total number of participants is 9, performance can still be 9 frames per second at a moment when only one user is in the field of view. This makes the system much more usable than an initial look at the graphs might suggest.

We have also measured the percentage of time spent on the two main tasks: rendering and application-specific computation. With any number of users, the rendering takes around 65% of the time, and the rest is spent on the application-specific computation, i.e. mostly the deformation of faces and bodies. For the simplified body representation, the rendering time percentage within a frame is 58%. On machines with less powerful graphics hardware, an even greater percentage of the total time is dedicated to rendering.

The results show that the use of a complex body representation induces a considerable overhead on the rendering side and somewhat less on the computation side. The fact that the performance can be boosted significantly by simply changing the body representation proves that the system framework does not induce an overhead in itself. The system is scalable in the sense that for particular hardware and requirements a setup can be found that produces satisfying performance. Most importantly, this proves that the system should lend itself to the implementation of extended Level of Detail techniques, automatically managing the balance between performance and quality of human representation.

4.2.3. Analysis of the network results

Figure 7 shows the bit rates measured on the VLNET server during a session with a varying number of simulated users walking within a room, as described in the subsection on experiment design. We measured the incoming and outgoing traffic separately, then calculated the total.

Obviously, the incoming traffic grows linearly with the number of users, each user generating the same bitrate. It can be remarked that the bitrate generated per user is roughly 30 Kbit/sec, covering the transmission of body positions, body postures and facial expressions.

It is worth remarking that the In traffic curve measured at the server also corresponds to the maximal incoming traffic on a client. The client will receive the incoming bitstream corresponding to the graph in situations where all the embodiments of the other users are in the viewing frustum of the local user, i.e. in the worst case. Otherwise, when looking at N users, the incoming traffic will correspond to the value for N users on the graph. This is similar to the situation with the performance measurements.

The outgoing traffic represents the distribution of the data to all the clients that need it. The Total traffic is the sum of the incoming and outgoing traffic. The Projected traffic curve represents the traffic calculated mathematically for the case of total distribution (all to all). The filtering technique at the server (section 3.1) ensures that messages are distributed only on an as-needed basis and keeps the Total curve well below the Projected curve. Further reduction is achieved using the dead-reckoning technique 12, as illustrated by the curve "Total DR".
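For reference, the Projected (all-to-all) curve can be reconstructed from the measured per-user bitrate. Assuming N users each generating r of roughly 30 Kbit/sec, and the server forwarding every incoming message to the N-1 other clients, the projected server traffic would be

    T_{in} = Nr, \qquad T_{out} = N(N-1)\,r, \qquad T_{projected} = T_{in} + T_{out} = N^2 r

so for N = 9 this gives roughly 2430 Kbit/sec, of the same order as the top of the Projected curve in figure 7; the frustum filtering and dead reckoning keep the measured Total well below this bound.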

Network traffic is the same when using the full and the simplified body representations, because the same messages are transferred. When no body representation is used, less network traffic is generated, because there is no need to send messages about body postures and facial expressions. A user without a body representation sends only position updates, generating approximately 8 Kbit/sec. The total traffic curve without a body representation is shown in figure 7 with the label "No body".

5. Conclusions

We have discussed the problems involved in introducing high level human representations in Networked Collaborative Virtual Environments and presented one solution to this problem: the Virtual Life Network. The system's modular architecture, the function of each component and their interconnections were described in detail. The various experimental applications highlighted in the results section show that this kind of system lends itself to a wide variety of collaborative applications. The performance and network measurements quantify the overhead induced on the CPU and the network by the introduction of complex virtual humans. They show that the overhead is considerable, but at the same time they show the potential of the system towards scalability.

6. Future work

The work on more realistic real time body representations and performance improvements, as well as on methods for capturing and interpreting body and face movement, is ongoing. The performance results indicate a path for future research in the direction of more scalable body representations. The Level of Detail technique can be extended to the body representation not only in the graphical but also in the procedural sense, by switching to simpler animation/deformation techniques for lower levels of detail. This method can be further extended to reduce network traffic.

Networking performance is only a small part of the evaluation of the system. Our current work is focused on the evaluation of other aspects, such as the visual quality of the human representation and its effects on the participants' acceptance.

7. Acknowledgments

This research is financed by "Le Programme Prioritaire en Telecommunications de Fonds National Suisse de la Recherche Scientifique" and the TEN-IBC project VISINET.

Numerous colleagues at LIG and MIRALab have helped this research by providing libraries, body and environment models, and scenarios for applications, in particular Eric Chauvineau, Hansrudi Noser, Marlene Poizat, Laurence Suhner and Jean-Claude Moussaly. Dr. Jean Fasel, CMU, University of Geneva, provided advice for the medical application.


Figure 5: Snapshots from performance and network measurements: a) full bodies; b) simplified bodies; c) no body representation.


Figure 6: Minimal performance with respect to the number of users in the session. [Graph: frames/sec (0 to 30) versus number of users (1 to 9); curves for Full body, Simple body and No body.]

Figure 7: Network traffic measurements with respect to the number of users in the session. [Graph: Kbit/sec (0 to 2500) versus number of users (1 to 9); curves for In, Out, Total, Projected, No body and Total DR.]


References

1. Barrus J. W., Waters R. C., Anderson D. B., "Locales and Beacons: Efficient and Precise Support For Large Multi-User Virtual Environments", Proceedings of IEEE VRAIS, 1996.

2. Carlsson C., Hagsand O., "DIVE - a Multi-User Virtual Reality System", Proceedings of IEEE VRAIS '93, Seattle, Washington, 1993.

3. Macedonia M.R., Zyda M.J., Pratt D.R., Barham P.T., Zestwitz, "NPSNET: A Network Software Architecture for Large-Scale Virtual Environments", Presence: Teleoperators and Virtual Environments, Vol. 3, No. 4, 1994.

4. Singh G., Serra L., Png W., Wong A., Ng H., "BrickNet: Sharing Object Behaviors on the Net", Proceedings of IEEE VRAIS '95, 1995.

5. Ohya J., Kitamura Y., Kishino F., Terashima N., "Virtual Space Teleconferencing: Real-Time Reproduction of 3D Human Images", Journal of Visual Communication and Image Representation, Vol. 6, No. 1, pp. 1-25, 1995.

6. Capin T.K., Pandzic I.S., Noser H., Magnenat Thalmann N., Thalmann D., "Virtual Human Representation and Communication in VLNET Networked Virtual Environments", IEEE Computer Graphics and Applications, Special Issue on Multimedia Highways, 1997.

7. Capin T.K., Pandzic I.S., Magnenat Thalmann N., Thalmann D., "Virtual Humans for Representing Participants in Immersive Virtual Environments", Proceedings of FIVE '95, London, 1995.

8. Thalmann D., Capin T.K., Magnenat Thalmann N., Pandzic I.S., "Participant, User-Guided, Autonomous Actors in the Virtual Life Network VLNET", Proc. ICAT/VRST '95, pp. 3-11.

9. Pandzic I.S., Capin T.K., Magnenat Thalmann N., Thalmann D., "Motor Functions in the VLNET Body-Centered Networked Virtual Environment", Proc. of 3rd Eurographics Workshop on Virtual Environments, Monte Carlo, 1996.

10. Slater M., Usoh M., "Body Centered Interaction in Immersive Virtual Environments", in Artificial Life and Virtual Reality, N. Magnenat Thalmann, D. Thalmann, eds., John Wiley, pp. 1-10, 1994.

11. Rohlf J., Helman J., "IRIS Performer: A High Performance Multiprocessing Toolkit for Real-Time 3D Graphics", Proc. SIGGRAPH '94, 1994.

12. Capin T.K., Pandzic I.S., Thalmann D., Magnenat Thalmann N., "A Dead-Reckoning Algorithm for Virtual Human Figures", Proc. VRAIS '97, IEEE Press, 1997.

13. Boulic R., Capin T., Huang Z., Kalra P., Lintermann B., Magnenat-Thalmann N., Moccozet L., Molet T., Pandzic I., Saar K., Schmitt A., Shen J., Thalmann D., "The Humanoid Environment for Interactive Animation of Multiple Deformable Human Characters", Proceedings of Eurographics '95, 1995.

14. Boulic R., Magnenat-Thalmann N., Thalmann D., "A Global Human Walking Model with Real Time Kinematic Personification", The Visual Computer, Vol. 6(6), 1990.

15. Kalra P., Mangili A., Magnenat Thalmann N., Thalmann D., "Simulation of Facial Muscle Actions Based on Rational Free Form Deformations", Proc. Eurographics '92, pp. 59-69, 1992.

16. Kalra P., "An Interactive Multimodal Facial Animation System", PhD Thesis nr. 1183, EPFL, 1993.

17. Pandzic I.S., Kalra P., Magnenat-Thalmann N., Thalmann D., "Real-Time Facial Interaction", Displays, Vol. 15, No. 3, 1994.

18. Molet T., Boulic R., Thalmann D., "A Real Time Anatomical Converter for Human Motion Capture", Proc. of Eurographics Workshop on Computer Animation and Simulation, 1996.

19. Pandzic I.S., Capin T.K., Magnenat Thalmann N., Thalmann D., "Towards Natural Communication in Networked Collaborative Virtual Environments", Proc. FIVE '96, Pisa, Italy, 1996.

20. Noser H., Pandzic I.S., Capin T.K., Magnenat Thalmann N., Thalmann D., "Playing Games through the Virtual Life Network", Proceedings of Artificial Life V, Nara, Japan, 1996.

21. Beylot P., Gingins P., Kalra P., Magnenat-Thalmann N., Maurel W., Thalmann D., Fasel J., "3D Interactive Topological Modeling using Visible Human Dataset", Proceedings of EUROGRAPHICS '96, Poitiers, France, 1996.