Auditory virtual environments: basics and applications for interactive simulations




*Corresponding author. Fax: +49-2407-57799. E-mail address: winfried.krebber@head.acoustics.de (W. Krebber).

Signal Processing 80 (2000) 2307–2322

Auditory virtual environments: basics and applications for interactive simulations

Winfried Krebber*, Hans-Wilhelm Gierlich, Klaus Genuit

HEAD acoustics GmbH, Elbertstr. 30a, 52134 Herzogenrath, Germany

Received 30 July 1999; received in revised form 29 November 1999

Dedicated to Prof. Dr. H.D. Lüke on the occasion of his 65th birthday

Abstract

Basic principles of auditory virtual environments, based on head-related acoustics, are explained, beginning with a short survey on binaural technology: binaural recording, playback and synthesis. Emphasis is laid on sound reproduction by loudspeaker arrangements. Features and limits provided by four-loudspeaker arrangements in rooms and car cabins are discussed in detail. The impact of the integration of vibration excitation is also discussed. Requirements for interactive, immersive auditory virtual environments are presented in terms of update rates, latency and complexity. Applications are discussed, focussing on driving simulation. © 2000 Elsevier Science B.V. All rights reserved.


0165-1684/00/$ - see front matter © 2000 Elsevier Science B.V. All rights reserved.
PII: S0165-1684(00)00119-5


Keywords: Auditory virtual environments; Binaural technologies; Interactive and immersive virtual environments; Driving simulation; Vibro-acoustic reproduction; Sound localization

1. Introduction

The simulation of virtual environments is often associated with the generation of three-dimensional visual sequences. Three-dimensional audio presentation as well as tactile and low-frequency vibration cues are then regarded as of minor importance. However, the simulation of three-dimensional audio cues (besides tactile and vibration cues) will increase the realism of a multi-modal representation quite significantly. For a number of applications, the visual sensation may be the more important one, while the auditory sensation only supports the overall impression of immersiveness. Non-visible objects, however, can only be recognized by the auditory sensation initiated by that object [2]. For example, the user is able to notice actions occurring behind him only by the auditory signals emitted during those actions. In addition, the auditory feedback can be implemented in an almost perfect manner [3,28]. This is not possible for the visual feedback.

Virtual environment (VE) applications span a wide range of complexity. More complex systems are used for medical environments and driving simulation, while low-cost solutions are preferred for so-called multimedia PCs and entertainment applications. This paper describes an auditory feedback concept suitable for complex systems which require high quality.

2. Acoustical approach

From the acoustical point of view, two approaches may generally be used to generate auditory virtual environments (AVE): room-related or head-related techniques.

Room-related techniques are based on a specific recording of the sound field using specialized microphone arrangements and an adequate playback (simulation) of the sound field using loudspeakers.

This approach can be found, e.g., in eidophony [32] or ambisonic techniques [15]. Eidophony, for example, includes 36 channels in its latest development phase. To each of them a spatial solid-angle field is allocated during recording and, in a similar manner, during playback by an appropriate loudspeaker arrangement.

Another approach is the use of a spatial loudspeaker matrix, i.e. various signals are fed through various loudspeaker arrangements, e.g. Dolby surround [19]. Although loudspeaker playback is used, no specific microphone arrangements are required. Such technologies may be classified more accurately as "discrete panning techniques", since the sound intended to determine the direction is fed selectively to the closest loudspeaker in the system.

In a number of applications where headphone use is not desirable, the techniques described above are used. Many of the multi-channel systems have been designed to produce a spatial sound representation for a large audience. Those systems (Cinemascope, Dolby stereo, Dolby surround) are used especially in connection with visual representation, e.g. in cinemas. Common to all these systems is that true-to-original playback in the sense of authentic reproduction of sound events is either not possible or rather limited. Even in eidophony, true-to-original reproduction of the sound situation is inherently impossible (see [29]). It is, however, generally accepted that these systems allow the listener a spatial classification of sound sources.

Head-related techniques (binaural techniques) refer to the human head as the sound receiver. The main idea is the authentic reproduction of hearing events. The binaural approach is based on the principle that an authentic reproduction of the input signals measured in both ear canals is sufficient to reproduce the auditory event. This idea resulted in the development of technology for binaural playback and recording (e.g. artificial heads), and for binaural analysis and synthesis. The basic idea has a long history [4]. The first experiments with an artificial head were carried out in 1886 at the Bell Laboratories. In 1939, a forerunner of the modern artificial head was developed at Philips by de Boer and Vermeulen [8], the first head used for electroacoustic transmission. Further developments occurred in Berlin, Göttingen and Aachen [24,20,22]. The first professionally used artificial head was introduced by Kürer et al. (1969) and has been built since 1973. Today various artificial heads are commercially available for various applications [13]. Binaural processing has also been improved significantly during recent years, allowing modification and post-processing of binaural recordings as well as binaural synthesis, i.e. the creation of auditory virtual environments based on binaural techniques.

Fig. 1. Basic elements of the human HRTF.

3. Binaural technologies

3.1. Head-related transfer functions (HRTFs)

Human hearing analyses the input signals of both ears with respect to interaural signal differences as well as monaural cues [4]. Both effects are included in the head-related transfer function (HRTF) that describes the characteristics of the signal path from a sound source at a certain point in space to both ears of the "receiver". The HRTF is a function of frequency and spatial position. Left/right perception is dominated by interaural cues, while up/down and front/back perception is derived mainly from monaural cues.
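As an illustration of the two cue types, the sketch below estimates the interaural time difference (ITD) by cross-correlating a left/right HRIR pair, and the interaural level difference (ILD) from their RMS ratio. The toy impulse responses and the 48 kHz sampling rate are assumptions for the example, not data from the paper:

```python
import numpy as np

def interaural_cues(hrir_left, hrir_right, fs):
    """Estimate ITD (seconds, positive = left ear lags) via
    cross-correlation, and ILD (dB, positive = left ear louder)
    via the RMS ratio of the two impulse responses."""
    corr = np.correlate(hrir_left, hrir_right, mode="full")
    lag = int(np.argmax(corr)) - (len(hrir_right) - 1)
    itd = lag / fs
    rms_l = np.sqrt(np.mean(hrir_left ** 2))
    rms_r = np.sqrt(np.mean(hrir_right ** 2))
    ild = 20.0 * np.log10(rms_l / rms_r)
    return itd, ild

# Toy example: a source on the right reaches the right ear first
# (left ear 5 samples later) and louder (left attenuated by 0.5).
fs = 48000
h_r = np.zeros(64); h_r[10] = 1.0
h_l = np.zeros(64); h_l[15] = 0.5
itd, ild = interaural_cues(h_l, h_r, fs)
```

For this toy pair the estimate yields an ITD of 5 samples (about 104 microseconds) and an ILD of about -6 dB, matching the construction of the example.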

The HRTF is determined by diffractions, resonances and reflections caused by acoustically relevant elements such as head, torso, shoulder, pinna, cavum conchae and the ear canal including the ear drum at its end. All influences can be divided into direction-dependent and direction-independent components (see Fig. 1).

The influence of the torso becomes visible in the frequency range below 1 kHz. Diffraction due to the torso leads to increases or decreases of ±3 dB, depending on direction. In a similar way, the shoulder has a direction-dependent influence on the HRTF. Shoulder diffraction leads to increases or decreases of about ±5 dB in the frequency range below 2 kHz. The distance between shoulder and ear canal entrance is critical and determines the magnitude and location on the frequency axis of increases and decreases.

The influence of the basic head structure is strongly directional as well. If a broadband sound source is moved around a test subject in the horizontal plane, the level at the ear changes according to source position. If the source is on the far side of the head, the signal is lowpass-filtered with a cut-off frequency of about 1 kHz [10]. Level variations of as much as −10 to +15 dB occur in the frequency range above 1 kHz as the sound source is moved from the far to the near side. Because the ear canal entrance is not located exactly in the middle of the head, a signal interference somewhat similar to comb filtering occurs, depending on the direction of incidence [10].

The cavum conchae contributes a broadband amplification of 20 dB maximum in the frequency range of about 1–10 kHz, independent of direction. In addition, diffraction effects at the pinna result in peaks and troughs in the magnitude of the HRTFs in the frequency range above about 2 kHz, depending on the direction.

Typical monaural HRTFs (i.e. referred to the front position in the horizontal plane) can be seen in Fig. 2. Level variations in the range of ±25 dB above 300 Hz can be obtained as the sound source is moved from the contralateral to the ipsilateral side.

The influence of the ear canal is entirely direction-independent. Therefore, artificial heads used for free-field applications do not include a simulation of the ear canal.

Fig. 2. Monaural HRTF (i.e. referred to 0°) for three angles of sound incidence in the horizontal plane: 90°, 180° and −90°.

Everybody possesses the same acoustically relevant elements, but in different geometrical dimensions. Thus, HRTFs differ from person to person in a complex way. An "average" HRTF set, necessary e.g. for designing an artificial head with average transmission characteristics, cannot be achieved by simple linear averaging. A better approach is to average the geometrical data of different test subjects. Investigations have shown that the correct position of all acoustically effective elements, especially the exact positioning of the pinna, has to be considered [10].

For virtual environment applications either individual or averaged HRTFs, e.g. measured at an artificial head with averaged geometrical dimensions, can be used. Instead of measuring HRTFs, models can also be used that calculate HRTFs from geometrical dimensions. The input parameters for such a model are basically the geometrical data of the acoustically relevant components such as torso, shoulder, head and ear [10,35].

It could be shown that large differences in localization can be expected if different artificial heads are used [9]. However, it is possible to obtain very small localization errors when choosing a good artificial head for recording. The localization error is then comparable to that which can be achieved when using the listener's own head for recording by inserting probe microphones in the ear canal. Localization problems arise mainly from the following construction differences:

- Insufficient signal-to-noise ratio: Inherent noise introduced by electronic circuits leads to "in-head localization" of the noise signal, which does not include any binaural information. If the noise level is above the threshold of human hearing, the localization of binaurally recorded sounds may be disturbed.

- Incorrect equalization: As mentioned above, the correct hearing event can be expected if the sound pressure at the ear canal entrance is reproduced exactly. For that purpose an equalization (see later) has to be introduced due to linear distortions of the recording and playback arrangement. Incorrect equalization may affect the localization capabilities due to incorrect monaural cues.

- Different dimensions of acoustically relevant components: The dimensions of the various acoustically relevant components and their positioning are the most critical points when constructing an artificial head or a modeled HRTF set. Since the relevant dimensions of test subjects vary significantly, it is impossible to construct an average head or an average HRTF set providing optimum localization cues for everyone. However, different methods could be used to define an average HRTF set/artificial head providing optimum averaged localization cues.

Fig. 3. The equalization principle for artificial heads and headphones.

3.2. The general principles of equalization

Equalization of binaurally recorded signals is needed in order
(a) to achieve loudspeaker compatibility,
(b) to be able to compare signals to those obtained with conventional microphones,
(c) to reproduce the correct ear signal at the listener's ear.

The principle of the equalization is shown in Fig. 3. For better understanding and for practical reasons, the equalization is divided into an equalization applied to the artificial head signal and one applied to the headphone reproduction.

The equalizations typically used are:

- Free-field equalization: The output signal is equalized such that for 0° sound incidence under free-field conditions the transfer function of the artificial head is independent of frequency.

- Diffuse-field equalization: The output signal is equalized such that in a diffuse sound field the transfer function of the artificial head is independent of frequency.

- Independent-of-direction equalization: The output signal is equalized such that only the influence of the non-directional components of the HRTF (see Fig. 1) is equalized.
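The free-field equalization defined above can be sketched as an inverse filter built from the measured 0° free-field response; by construction, applying it to that 0° response yields a frequency-independent (flat) transfer function. The toy 0° HRIR below is a made-up example, not measured data:

```python
import numpy as np

def free_field_equalizer(hrtf_0deg, eps=1e-6):
    """Inverse filter (frequency domain) that flattens the
    artificial head's 0-degree free-field response.  hrtf_0deg is
    the complex frequency response for frontal incidence; eps
    guards against division by near-zero magnitudes."""
    mag = np.maximum(np.abs(hrtf_0deg), eps)
    return np.exp(-1j * np.angle(hrtf_0deg)) / mag

# Applying the equalizer to the 0-degree response itself must give
# a flat (unity) transfer function -- the defining property above.
h0 = np.array([1.0, 0.6, -0.2, 0.1])   # toy 0-degree HRIR (assumed)
H0 = np.fft.rfft(h0)
EQ = free_field_equalizer(H0)
flat = H0 * EQ
```

For any direction other than 0°, multiplying the HRTF by this equalizer leaves exactly the direction-dependent deviation from the frontal response, which is what the listener is meant to hear.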

3.3. Binaural synthesis

Binaural synthesis means that the binaural sound signals perceived by both human ears have to be synthesized. For that purpose head-related impulse responses (HRIRs) are measured, describing the sound paths from an object at a specific position to both ear canals. An HRIR set can be measured with an artificial head or with individual test persons. Measurements using the artificial head can be performed with high accuracy, reproducibility and spatial resolution [12]. Because of interindividual differences, HRIRs measured with an artificial head are not the optimum solution for a specific user. For a small number of users, individual HRIR sets measured with microphones located in the ear canal of the specific user may be a good choice. In order to get a sufficient spatial resolution, such HRIR sets have to be interpolated. For multi-user applications, either an HRIR set derived from artificial head measurements or a database of individual HRIR sets can be used. In this case a quick selection method has to be developed in order to get the best-fitting HRIR set.

Fig. 4. Structure of the auditory display for one sound source.

During the simulation, the monophonic sound source is convolved with the HRIRs (left and right) corresponding to the actual position of the sound source with regard to the listener's head. Reflections at walls etc. can be considered by additional secondary sound sources. After the convolution with the HRIR filters, all sound components are added in order to get the binaural output signal. Fig. 4 shows a typical structure of an auditory display performing the signal processing for one sound source. The interaural delay (realized with two delay boxes 'T' in Fig. 4) is separated from the HRIR filter in order to get short HRIR filters. An additional delay for each reflection simulates the length of the reflection path, while the prefilters simulate the reflection characteristics of the reflecting surface and/or the transmission characteristics of the sound path, i.e. attenuation coefficients of transmitted virtual objects, for example a helmet worn by the listener.
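The per-source signal path of Fig. 4 (a per-ear delay 'T' followed by a short HRIR filter, per channel) can be sketched as below; the impulse-like HRIRs and the 8-sample path difference are illustrative assumptions, not values from the paper:

```python
import numpy as np

def render_source(mono, hrir_l, hrir_r, delay_l, delay_r):
    """One sound source of the auditory display in Fig. 4:
    a per-ear integer delay (interaural/path delay) followed by
    convolution with the short direction-dependent HRIR."""
    left = np.convolve(np.concatenate([np.zeros(delay_l), mono]), hrir_l)
    right = np.convolve(np.concatenate([np.zeros(delay_r), mono]), hrir_r)
    n = max(len(left), len(right))
    left = np.pad(left, (0, n - len(left)))
    right = np.pad(right, (0, n - len(right)))
    return left, right

# Toy example: unit-impulse source to the left of the head, so the
# right-ear path is 8 samples longer and attenuated to 0.4.
mono = np.array([1.0])
out_l, out_r = render_source(mono, np.array([1.0]), np.array([0.4]), 0, 8)
```

Reflections would be rendered by further calls of the same routine (one secondary source per reflection, with its own delay and prefilter) and summed into the binaural output.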

4. Immersive auditory virtual environments

A VE system should provide an immersive VE. Therefore, a realistic simulation of visual, auditory and tactile sensations is required. There are some important differences between binaural room simulation on the one hand and the generation of an auditory interactive scenario on the other. Room simulation is performed in order to get a very accurate simulation of a given room, for example a concert hall. Room simulation algorithms use image sound source models or ray tracing [5,38] and are normally based on geometrical acoustics. A lot of enhancements were introduced during the last 10 years; probably the most important ones are the consideration of diffuse wall reflections and the convolution of received reflections with head-related impulse responses in order to get a binaural room impulse response.

Normally, the room impulse response is calculated for a limited number of sound source locations and listener positions. By dividing the impulse response into direct sound and early reflections, and early and late reverberation (parts which can be handled with different complexity), the calculation effort can be reduced [17]. However, a sufficient room simulation still requires from some seconds up to several days of calculation time, depending on the complexity of the room and the calculation power available. Compared to the room simulation, the time needed for the convolution of the monophonic input signal with the binaural room impulse response can be neglected in most applications.

In auditory interactive scenarios everything is quite different. The room impulse response and the binaural output signals have to be calculated in real time. Each delay reduces the impression of immersiveness. Sound-emitting objects as well as the listener move around in the virtual space. To achieve a consistent virtual environment avoiding any audible spatial steps, the room simulation as well as the signal-processing elements (HRIRs, prefilters, time delays) have to be updated at least 30 times per second. The head orientation has to be measured by a tracking sensor with the same update rate.

Besides the update rate, latency influences the quality of auditory virtual environments. Especially for the interaction between the user and the virtual environment, latency is the most important parameter. Actions performed by the user ask for a quick response within the virtual environment. A critical value that should not be exceeded is a time lag of about 100 ms [31].
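To make the 100 ms limit concrete, a rough latency budget can be tallied from the scene update interval, the head-tracker delay and the audio buffering. All figures below are illustrative assumptions, not measurements from the paper:

```python
# Minimal latency-budget check against the ~100 ms limit.
fs = 44100                 # audio sampling rate in Hz (assumed)
block = 512                # DSP block size in samples (assumed)
tracker_ms = 15.0          # assumed head-tracker measurement delay
update_ms = 1000.0 / 30    # one update interval at the 30 Hz scene rate

audio_ms = 2 * block / fs * 1000.0   # input + output block buffering
total_ms = tracker_ms + update_ms + audio_ms
ok = total_ms <= 100.0
```

With these assumed numbers the budget sums to roughly 72 ms, leaving headroom for the room-simulation update itself; doubling the block size or halving the update rate would quickly consume it.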

Compared to concert hall simulation (see above), results have to be calculated extremely fast. So there is no doubt that the quality has to be reduced in order to meet those requirements (see also [30]).


Fig. 5. Principle of the Doppler effect. S: sound source, L: listener, vS: velocity of the sound source, vL: velocity of the listener.

Only a limited number of reflections (10–50) can be calculated directly; they have to be assumed to be geometrical reflections. All remaining reflections have to be simulated by a simplified reverberation algorithm using recursive filter structures. For sources or listeners moving with high velocity, the Doppler shift has to be taken into account.

After the sound field parameters have been calculated, the binaural output signals have to be generated by a real-time audio signal-processing system. For each sound source, three binaural signal components have to be calculated in real time: direct sound, early reflections and late reverberation. Due to possible movements of sources and listener, at least the direct sound and early reflections have to be realized by time-variant directional filters as already shown in Fig. 4. Due to the real-time conditions, parallel processing is needed.

4.1. Doppler-shifts

For fast-moving objects (and listeners), the well-known Doppler shift has to be considered. Fig. 5 shows the principle of the Doppler-shift calculation.

For the general situation that both the listener and the sound source are moving, only the source velocity component pointing towards the listener, and vice versa, is considered. A general expression can be found for the Doppler shift f/f0:

f = f0 (c + vL) / (c − vS),   c > vS,

where c is the speed of sound, vS the velocity component of the source towards the listener and vL the velocity component of the listener towards the source. This equation can also be used to adjust the sampling frequency of a given pre-recorded source sound in order to get the correct Doppler shift for the simulation. Thus, a time-variant up-/down-sampling algorithm has to be implemented in the auditory display for fast-moving objects. The Doppler shift has to be applied before the binaural coding; otherwise the HRIR would be changed as well. Since the secondary sound sources may move differently with regard to the listener, a different Doppler shift has to be calculated for each reflection. More complex solutions are discussed in [36].
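A minimal sketch of the shift factor and the corresponding resampling, under the assumption of piecewise-constant velocities and simple linear interpolation (a real auditory display would use a block-wise, higher-quality time-variant resampler):

```python
import numpy as np

def doppler_factor(c, v_source, v_listener):
    """Doppler frequency ratio f/f0 = (c + vL) / (c - vS).
    Velocities are the components pointing towards the other
    party (positive = approaching); requires c > v_source."""
    assert c > v_source
    return (c + v_listener) / (c - v_source)

def resample(signal, factor):
    """Naive resampler: read the input 'factor' times faster via
    linear interpolation, raising the pitch by exactly that factor
    (a sketch, not production code)."""
    n_out = int(len(signal) / factor)
    t = np.arange(n_out) * factor
    return np.interp(t, np.arange(len(signal)), signal)

# Source approaching at 20 m/s, listener at rest, c = 343 m/s:
k = doppler_factor(343.0, 20.0, 0.0)   # > 1, i.e. pitch shifted up
shifted = resample(np.sin(2 * np.pi * 0.01 * np.arange(1000)), k)
```

Per the text, each secondary source (reflection) would get its own factor, and the resampling is applied to the monophonic source before the HRIR filtering.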

4.2. Reverberation

Fig. 6 shows the main structure of a binaural reverberation unit simulating the late reverberation, which is calculated from a weighted sum of all sound sources. This simplification is based on the fact that localization of sound sources is dominated by the first reflections, while the late reverberation component generates a more diffuse impression. A prefilter is used in order to adjust the timbre of the reverberation signal to that of the early reflections. Recursive reverberation algorithms are described by many authors, e.g. [33,34,27,18]. The algorithms introduced by Moorer and Jot allow the implementation of a frequency-dependent reverberation time. Fig. 6 shows the structure according to Moorer's proposal.

The system was tested with various combinations of comb filters and all-pass filters for each channel. A low number of elements seems to be sufficient for most VE applications. In real rooms the late reverberation signal produces a diffuse spatial impression. That impression can be approximated by uncorrelated reverberation signals for the left and right channel, resulting from slightly different filter coefficients.
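A Moorer-style late-reverberation channel (parallel feedback combs followed by an all-pass per channel) can be sketched as below. All delay lengths and gains are arbitrary illustrative values; the slightly different comb delays per channel implement the left/right decorrelation mentioned above:

```python
import numpy as np

def comb(x, delay, g):
    """Feedback comb filter: y[n] = x[n] + g * y[n - delay]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n] + (g * y[n - delay] if n >= delay else 0.0)
    return y

def allpass(x, delay, g):
    """Schroeder all-pass: y[n] = -g*x[n] + x[n-delay] + g*y[n-delay]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        xd = x[n - delay] if n >= delay else 0.0
        yd = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + xd + g * yd
    return y

def reverb_channel(x, comb_delays, g, ap_delay=225, ap_g=0.7):
    """Parallel combs summed, then one series all-pass (Moorer-style)."""
    y = sum(comb(x, d, g) for d in comb_delays)
    return allpass(y, ap_delay, ap_g)

# Slightly different comb delays per channel decorrelate left and
# right, approximating the diffuse late-reverberation impression.
x = np.zeros(2000); x[0] = 1.0
left_ir = reverb_channel(x, [347, 441, 579], 0.8)
right_ir = reverb_channel(x, [353, 449, 587], 0.8)
corr = np.corrcoef(left_ir, right_ir)[0, 1]
```

The mutually prime delay lengths avoid coinciding echoes, and the reduced interchannel correlation is exactly the decorrelation effect described in the text.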

5. Sound reproduction

5.1. Introduction

Fig. 6. Binaural late reverberation unit, basic structure according to Moorer [27], with N comb filters and M all-pass filters in each channel.

As mentioned above, binaural technology is based on the idea of reproducing the ear signals in order to reproduce the complete hearing sensation, as described in [28,23]. This idea implies the use of headphones, since only headphone reproduction ensures that no crosstalk between the two channels of the binaural signal is introduced by the reproduction system, i.e. the right ear receives only the signal recorded in the right ear and the left ear only that recorded in the left ear.

Sometimes, e.g. in driving simulation, headphone reproduction is not desirable in order to achieve a virtual situation in which all aspects are close to reality. In that case, loudspeaker reproduction of all sounds is required.

During the development of the artificial head as described in [26], theoretical research and numerous experiments with loudspeaker reproduction have been performed. Since head-related interaural time differences are included in the HRTF, the spatial definition of auditory events in a loudspeaker playback situation based on artificial head signals often delivers better results than solutions using coincident or semi-coincident microphone techniques. The HRTF is still present in loudspeaker playback, but without coloration of timbre, due to the normalizing equalization (e.g. free-field or diffuse-field equalization). Head-related time and frequency domain information remains in the loudspeaker reproduction despite the non-binaural presentation mode, giving excellent imaging and transparency.

5.2. Two-loudspeaker arrangement

Arrangements for loudspeaker reproduction using crosstalk-canceling techniques have been described in [7,6,25]. Both analog and digital processing techniques may be used, although digital techniques allow a more accurate canceling due to the accuracy in filtering and delay. The principle of such an arrangement is shown in Fig. 7. Under ideal conditions (anechoic or semi-anechoic chamber, exact positioning of the listener, correct equalization) it is possible to achieve the same auditory events as obtained when listening with headphones. Sometimes even a better frontal localization is reported compared to headphone reproduction. The disadvantage of this method, however, is that these environmental conditions cannot easily be realized in living rooms, studios, cars or other more or less reverberant environments.
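At a single frequency bin, the crosstalk canceller amounts to inverting the 2×2 matrix of speaker-to-ear transfer functions, so that each ear receives only its own binaural channel. The symmetric toy matrix below is an assumed example, and the result holds only under the ideal (anechoic, exactly positioned) conditions named above:

```python
import numpy as np

def ctc_filters(H):
    """Crosstalk-cancellation network for a two-loudspeaker setup.
    H is the 2x2 matrix of speaker-to-ear transfer functions at one
    frequency bin (rows: left/right ear, cols: left/right speaker).
    Returns C with H @ C = I, so the ears receive the two binaural
    channels without crosstalk."""
    return np.linalg.inv(H)

# Toy frequency bin: direct paths with gain 1, crosstalk gain 0.4.
H = np.array([[1.0, 0.4],
              [0.4, 1.0]])
C = ctc_filters(H)
ear = H @ C @ np.array([0.7, -0.2])   # binaural signal reaches the ears intact
```

A broadband canceller repeats this inversion per bin (regularized where H is near-singular), which is why digital filtering and delay accuracy matter, as noted above.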

5.3. Four-loudspeaker arrangements in rectangular rooms

Another reproduction procedure used quite frequently is a four-loudspeaker arrangement as shown in Fig. 8. The loudspeakers are typically positioned equidistantly in a square formation around a central point (see Fig. 8). Other arrangements are possible, e.g. specific arrangements in a car. The two left-hand loudspeakers receive only the same free-field equalized artificial head signal of the left-hand channel. The right-hand side is arranged similarly.

Fig. 7. Principle arrangement for crosstalk cancellation using two loudspeakers in anechoic conditions.

Fig. 8. Four-loudspeaker arrangement for display of binaural recordings.

If techniques other than headphone reproduction are used for sound reproduction, each has to be compared with the results of listening tests using headphones for reproduction. This includes localization tests as well as distance localization tests. A detailed description of such tests can be found, e.g., in [11]. For the evaluation of the reference condition (headphone playback) as well as for loudspeaker reproduction, listening tests are to be carried out using test stimuli recorded in anechoic conditions for various positions in the horizontal plane. For the test results described here, the 360° angle around the artificial head in this plane is divided into 12 equal segments of 30° each, corresponding to the face of a clock (front = 12, behind = 6, right = 3, left = 9). These numbers are also used by the test subjects when later identifying the hearing direction. The test stimulus itself consists of a pseudo-random noise with a white spectrum.
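The clock-face protocol can be scored with small helpers like the following. The mapping of clock positions to azimuths follows the text; the front/back-inversion criterion (mirroring about the 3-9 o'clock axis) is our assumption about how direction inversions would be counted, not a rule stated in the paper:

```python
def clock_to_deg(h):
    """Clock-face response (12 = front, 3 = right, 6 = behind,
    9 = left) mapped to azimuth in degrees, clockwise from front."""
    return (h % 12) * 30

def angle_error(presented_h, answered_h):
    """Smallest angular deviation between two clock positions,
    accounting for wrap-around at 360 degrees."""
    d = abs(clock_to_deg(presented_h) - clock_to_deg(answered_h)) % 360
    return min(d, 360 - d)

def is_inversion(presented_h, answered_h):
    """Front/back inversion: the answer mirrors the presented
    direction about the left-right (3-9 o'clock) axis."""
    return (answered_h != presented_h
            and (answered_h % 12) == ((6 - presented_h) % 12))
```

Averaging `angle_error` over subjects and directions, and counting `is_inversion` hits separately, yields the kind of per-direction statistics reported for these tests.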


Fig. 9. Difference of localization errors between headphone reproduction and four-loudspeaker reproduction in various rooms. For each direction, in steps of 30°, the difference of localization errors, averaged over all subjects, is presented referred to the maximum angle deviation. In addition are shown: the percentage of "in-head localizations" (sound source localized inside the head), the percentage of direction inversions, and the angle deviation relative to the total, i.e. the localization error in degrees averaged over all listeners (and sound directions), referred to the maximum angle deviation.

The tests for loudspeaker reproduction with test subjects were carried out in a test room with high sound absorption, in a typical office room and in a reverberant room. The test arrangement is shown in Fig. 8. The loudspeakers also have to be adjusted to the equalization selected during the artificial head recording in order to obtain accurate playback. The test subjects stated the hearing direction orally. The test results were derived from the judgements of 12 subjects. Using this type of test, the results shown in Fig. 9 were achieved.

As can be seen in Fig. 9, there are hardly any differences between the results of the hearing tests carried out in the different rooms. Compared with headphone playback, a similarly good or even better localization of sound sources in front of and behind the artificial head is demonstrated. In contrast, laterally displaced sound sources cannot be localized as well, because the hearing direction fans out. These effects are well explained by summing localization, which can occur in the frontal as well as the rear hearing event space. In contrast to headphone playback, recognition of direction directly from the left or right (directions 3 and 9) is much more difficult. The reason could be the marked difference in the way the outer ear filters the ear input signals from the front and rear. An accumulation of "in-head localizations" for directions 6 and particularly 12, such as occurs in headphone playback, cannot be identified in the case of the four-loudspeaker arrangement. Furthermore, tests reported in [11] show that even a displacement of the listener does not lead to a complete confusion of localization. Certainly, the sound source location is shifted towards the direction of the listener's displacement; all sources appear virtually closer to the direction of displacement.

In Fig. 10 the results of distance localization tests are shown. The test material used was a speech sequence with a duration of approximately 2.5 s, recorded in a room 8.4 m long, 3.7 m wide and 3 m high. The average reverberation time was 0.5 s. The recordings were conducted at distances of 1, 2 and 3 m using an artificial head. Positive numbers



Fig. 10. Difference of the percentage of correct distance localizations (numbers in percent, headphone reproduction – loudspeaker reproduction).

expressed as percentages indicate a deterioration of the distance localization compared to headphone reproduction; negative numbers correspond to an improvement. Generally it can be seen that, e.g. in an office room (Fig. 10), the distance of a virtual sound source is fairly well localized, although results closer to the original localization can be obtained in anechoic rooms (see [39]).

5.4. Four-loudspeaker arrangements in car cabins

Based on the experiments described above, it seems obvious that this kind of loudspeaker reproduction should also be suitable for vehicle simulation. The first experiments conducted in a car using a similar arrangement inside the vehicle were found to be very promising; the localization of typical car sounds achieved by this arrangement was perceived very well. For the setup within the car cabin the loudspeakers already installed can be used; however, they have to be equalized carefully. In order to investigate the quality of this kind of arrangement, localization experiments were performed. To obtain the best results for both headphone and loudspeaker reproduction, individually selected HRTF sets were used for the auditory display under test.

Within the European project AUDIS a databaseincluding 35 individual HRTF sets has been builtup [1]. For each of the 35 HRTF sets a noise burstvirtually rotating on a circle around the listener'shead in the horizontal plane was generated usingthe HRTFs of the speci"c HRTF set. The subjectslistened to this rotating burst via headphones aswell as via a loudspeaker arrangement in a carcabin.
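The rotating-burst stimulus can be sketched roughly as block-wise convolution of the noise with the head-related impulse responses (HRIRs) of successive directions. The function and data layout below are illustrative assumptions, not the AUDIS implementation; a real renderer would crossfade or interpolate between directions to avoid switching artifacts:

```python
import numpy as np

def rotating_burst(hrir_set, noise, block_len=512):
    """Render a noise burst as a source rotating in the horizontal plane.
    hrir_set: dict mapping azimuth in degrees (steps of 30) to a pair
    (hrir_left, hrir_right) of impulse responses.
    Each block of the noise is convolved with the HRIR pair of the next
    direction; the direction advances once per block."""
    azimuths = sorted(hrir_set.keys())
    left, right = [], []
    for i in range(0, len(noise), block_len):
        block = noise[i:i + block_len]
        az = azimuths[(i // block_len) % len(azimuths)]  # next direction on the circle
        hl, hr = hrir_set[az]
        left.append(np.convolve(block, hl))
        right.append(np.convolve(block, hr))
    return np.concatenate(left), np.concatenate(right)
```

With 12 directions in 30° steps, one pass through the azimuth list corresponds to one full virtual rotation around the listener's head.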

For both reproduction techniques, seven subjects were asked to describe verbally the hearing event they perceived while listening to the moving noise burst. Each subject found at least three HRTF sets supporting the perception of a sound virtually rotating on a circle. The subjects were then asked to select the best HRTF set out of those well-fitting ones. With one exception, all subjects selected different sets for the two reproduction techniques. These two best HRTF sets – one for headphone reproduction and one for loudspeaker reproduction – were used for the next experiment.

Subsequent to the selection of the best HRTF set, the subjects participated in a localization test performed for both reproduction techniques. As stimulus, a noise burst of 270 ms, repeated twice, was presented. The virtual positions were randomized within the horizontal plane in steps of 30°. Thus, the subjects had to select one of 12 possible positions in space.

The results are shown in Fig. 11 (see also [21]). Localization errors can be classified as shown in Table 1. Results obtained from seven subjects give an indication of the main effects. For loudspeaker reproduction, the perception is more diffuse, as indicated by the increase in two-step errors, but front/back confusions are only slightly increased.

6. The integration of vibration simulation

6.1. Introduction

Realism of virtual environments can be significantly enhanced by the integration of feedback channels that address not only the ears but the whole body: very low-frequency airborne sound as well as



Fig. 11. Result of the localization test. Left diagram: results for headphone reproduction; right diagram: results for loudspeaker reproduction.

Table 1. Classification of localization errors

Type                      Headphone   Loudspeaker
One-step error (30°)      33.33%      26.19%
Two-step error (60°)      0%          11.9%
Front/back confusion      10.7%       15.48%
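A response-classification helper along the lines of Table 1 might look like the sketch below. The tie-breaking rules (e.g. how a mirrored response is distinguished from a plain angular error) are assumptions; the paper does not specify them:

```python
def classify_error(presented, perceived, step=30):
    """Classify a localization response (angles in degrees, horizontal plane,
    0 = front). Categories follow Table 1: correct, one-step (30 deg),
    two-step (60 deg), front/back confusion (response mirrored about the
    interaural left-right axis)."""
    # Front/back mirror of the presented direction about the left-right axis.
    mirrored = (180 - presented) % 360
    # Smallest angular distance between response and presented direction.
    delta = (perceived - presented) % 360
    diff = min(delta, 360 - delta)
    if perceived % 360 == mirrored and diff > step:
        return "front/back confusion"
    if diff == 0:
        return "correct"
    if diff == step:
        return "one-step error"
    if diff == 2 * step:
        return "two-step error"
    return "larger error"
```

Applying this to every (presented, perceived) pair and counting the categories per reproduction technique yields percentages of the kind listed in Table 1.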

structure vibration. For that purpose, the binaural technology described above has to be extended. For recording, multi-channel measurement systems may be used that allow the recording of acoustical and vibrational data simultaneously. As a simplification for some applications (e.g. driving simulation for training purposes), it is sufficient to generate those vibration components directly from the binaural recording by lowpass filtering and equalization. For playback, suitable playback arrangements have to be found.
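The simplified derivation of a vibration signal from a binaural recording can be illustrated with a minimal one-pole lowpass; this is only a placeholder for a proper filter and equalization stage, and the cutoff frequency is an assumption:

```python
import numpy as np

def vibration_from_binaural(x, fs, fc=100.0):
    """Derive a seat/steering-wheel vibration drive signal from one channel
    of a binaural recording by lowpass filtering.
    x: mono signal, fs: sample rate in Hz, fc: cutoff frequency in Hz.
    Uses a first-order IIR lowpass as the simplest possible sketch."""
    alpha = 1.0 - np.exp(-2.0 * np.pi * fc / fs)  # one-pole smoothing coefficient
    y = np.empty_like(x, dtype=float)
    state = 0.0
    for i, sample in enumerate(x):
        state += alpha * (sample - state)  # y[n] = y[n-1] + alpha*(x[n] - y[n-1])
        y[i] = state
    return y
```

The output would then be equalized for the particular excitation device (seat shaker or steering-wheel actuator) before playback.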

6.2. Noise and vibration exposure in vehicles

The vibrational situation in a passenger compartment can be divided into two main categories:
• Vibrational excitation through operational devices, i.e. engine, transmission system, wheels and suspension system. A typical example is the second order of a four-cylinder engine.
• Vibrational impact by "comfort features", such as power windows, electric sunroof, power seats and electrical mirrors. The electrical devices used here primarily cause low-frequency noise components ("booming") and vibrations.

At present, there is no detailed research yet on the dependencies between vibrational and acoustical perception. Examinations have shown a trade-off phenomenon between sound and vibration when the vibration level is in the range of the perception threshold: in this case the loudness is judged higher when vibrations are present [37]. Experience in dealing with complaints about vehicles has shown that normally the consideration of vibrations at the passenger's seat and of the rotational vibrations at the steering wheel is sufficient for a first approach. These vibrations represent the major part of the relevant influences on the judgement. For particular devices – for example, power windows – the excitation of other points of the car body may be considered. Introductory research tests within the European research project OBELICS (BRPR CT96-0242) have shown that the use of combined vibro-acoustic playback systems leads to more reliable judgements of sound characteristics and sound quality. Based on this, a suitable vibro-acoustic playback system may consist of



Fig. 12. Configuration of a sound simulation subsystem in a car, including the vibro-acoustic playback system as well as the elements necessary to achieve an interactive virtual environment for driving simulation: control elements, vehicle dynamics simulation, and sound simulation using binaural playback for non-moving sources and binaural synthesis for moving sources.

airborne sound via headphone(s), low-frequency sound (20–150 Hz) via subwoofer(s), and vibrations at the steering wheel and seat via excitation devices [14]. The set-up of such a system is shown in Fig. 12.

7. Applications

7.1. Driving simulation

Driving simulation is doubtless one of the most interesting applications of virtual environment (VE) technologies. The driver controlling a virtual vehicle feels immersed in the virtual world if he receives plausible feedback to his actions. The most important actions to be considered by the simulator are interactions with control elements such as the steering wheel or handlebar, clutch, brake, accelerator pedal, etc., and head and body movements (the latter especially for motorcycles).

The most important feedback components are inertial feedback, visual feedback and vibro-acoustic feedback. In comparison to other VE applications, a "mixed reality" scenario is implemented [16]: inertial, visual, acoustical and vibrational feedback is simulated, but some elements are real. A real passenger compartment or motorcycle mock-up is normally used, since simulating them would introduce a lot of effort without any enhancement of the driving simulation. For the same reason, real control elements are normally not replaced by sensory gloves or other means.

In most cases a driving simulation system is controlled by a central computer hosting models of the virtual world and the vehicle dynamics and interacting with all feedback and sensory subsystems; a strategy manager takes the decisions in all situations.

The acoustical feedback system is normally designed as a subsystem. Since it is a feedback simulation system, real-time data flow is always directed from the central controller to the acoustical subsystem. Since a lot of data are relevant only for the acoustical representation, it is advantageous to use a local database containing all sounds produced by the vehicle or any other object in the virtual world, as well as all relevant information on all acoustically relevant objects existing in the virtual world.

An overview of a typical acoustical feedback system is given in Fig. 13. Messages coming from the controller are routed by the message handler to the sound source model and/or the auditory renderer. The sound source model selects the required sound from the database or generates it synthetically (see below). All sounds required have to be provided by the sound generation hardware in parallel and in real time.
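The message-handler/database structure of Fig. 13 can be sketched as follows; all class, method and message names are hypothetical, not taken from the paper:

```python
class SoundDatabase:
    """Local database holding prerecorded or synthetic sounds for every
    acoustically relevant object in the virtual world."""
    def __init__(self):
        self._sounds = {}

    def store(self, key, samples):
        self._sounds[key] = samples

    def fetch(self, key):
        return self._sounds[key]


class MessageHandler:
    """Routes update messages coming from the central controller to the
    sound source model and/or the auditory renderer (cf. Fig. 13)."""
    def __init__(self):
        self._routes = {}

    def register(self, msg_type, handler):
        # A message type may have several consumers, e.g. both the
        # sound source model and the auditory renderer.
        self._routes.setdefault(msg_type, []).append(handler)

    def dispatch(self, msg_type, payload):
        for handler in self._routes.get(msg_type, []):
            handler(payload)
```

The key design point is the one-way data flow: the controller only dispatches messages, and the acoustical subsystem resolves everything else from its local database, so no real-time queries travel back to the central computer.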

For sounds produced by objects moving with respect to the driver, additional signal processing has to be done for the simulation of Doppler shifts and the binaural coding of the sound emitted by



Fig. 13. Structure of the acoustical feedback system. Abbreviations used: VE = virtual environment, DB = database.

those objects. Two components shown in Fig. 13 perform that task:
• The auditory renderer calculates all parameters necessary for the simulation of Doppler shifts and binaural coding, based on a local database of the environment on the one hand and the last update message routed by the message handler on the other.
• The auditory display performs the real-time signal processing using the specialized audio signal-processing hardware mentioned above.
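The Doppler-shift parameter computed by the auditory renderer can be illustrated with the standard moving-source/moving-listener frequency ratio. This is a textbook sketch under the assumption of straight-line propagation in still air, not the renderer's actual algorithm (see also [36]):

```python
import math

def doppler_factor(source_vel, listener_vel, source_pos, listener_pos, c=343.0):
    """Frequency scaling factor for a moving source and listener in 2-D:
    f_observed = factor * f_emitted.
    Positions and velocities are (x, y) tuples in m and m/s; c is the
    speed of sound in m/s."""
    dx = source_pos[0] - listener_pos[0]
    dy = source_pos[1] - listener_pos[1]
    dist = math.hypot(dx, dy)
    ux, uy = dx / dist, dy / dist          # unit vector listener -> source
    v_s = source_vel[0] * ux + source_vel[1] * uy    # source radial speed (away > 0)
    v_l = listener_vel[0] * ux + listener_vel[1] * uy  # listener radial speed (toward > 0)
    return (c + v_l) / (c + v_s)
```

For example, a source 100 m ahead approaching the listener at 34.3 m/s yields a factor of 343/308.7 ≈ 1.11, i.e. the perceived pitch is about 11% higher; the renderer would re-evaluate this factor on every position update.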

For the driving situation the following sounds have to be taken into account:
• Engine sound, dependent on engine speed and torque, but not moving in space.
• Tire sound, dependent on speed and road conditions, but not moving in space.
• Wind noise, dependent on speed, but not moving in space.
• Sounds produced by other objects moving in space, especially other vehicles. For vehicles this sound depends on vehicle speed and orientation. Other objects may be pedestrians, birds, etc.
• Background sounds, not moving in space.
• Commands to the driver, not moving in space.
Sound components not moving in space are reproduced by binaural playback, while moving sounds are generated by binaural synthesis.

Depending on the user requirements, more or fewer simplifications can be made concerning the generation of the different sound components. If the system has to achieve an auditory impression very close to that perceived in a specific real vehicle, a lot of recordings of that vehicle have to be stored in the local database shown in Fig. 13. If a "good impression" is sufficient, synthetically generated or more generic sounds can be used instead.

8. Other applications

Auditory virtual environments are not limited to driving simulation. Auditory feedback systems structured as described above are used in various applications. Examples are:
• Research on auditory and multi-modal perception. SCATLAB, a spatially coordinated auditory/tactile interactive scenario, was built up in the European research project SCATIS (ESPRIT 6358). Emphasis is laid on the auditory and tactile aspects of VE, so no visual feedback is implemented.

• Medical applications. VETIR, also a European research project (TIDE 1216), focussed on the development of a VE system for the rehabilitation of patients with motor dexterity disabilities.



• Another interesting application is the enhancement of video conferencing systems. Sound quality and speech intelligibility can be enhanced if the several speakers are placed at different virtual positions. Simple binaural synthesis modules can easily be implemented on standard notebooks; networking via the Internet then enables virtual conferencing with natural acoustics.

• Speech communication systems that include acoustical warning signals, as used in aircraft cockpits or control stations, can also be enhanced by an auditory display as described in this paper. Situational awareness as well as speech communication can be improved if the pilot (or controller) is supported by an auditory display. This application has been investigated in the AUDIS project (ESPRIT 22352).

• Sound design. A number of sound quality aspects can be investigated using playback of artificial head recordings, but in most applications the quality of such investigations can be enhanced if the test is conducted in a real environment. Thus, the SoundCar described above has been used as a powerful tool in sound design for some years. A further enhancement can be achieved by tests under real driving conditions. Tests in real cars, however, are expensive or even impossible. In order to achieve reproducible results for several cars and several conditions, driving simulation seems to be an alternative solution. In addition, simulators allow the evaluation of cars, situations and/or conditions which are hardly available in real life with real cars on real roads. Thus, acoustic simulation techniques can help to reduce the effort for sound engineering significantly. The virtual environment is stored as a sound database consisting of different components which are recombined into complete scenarios during the simulation phase. Normally, the database is derived from acoustical measurements of the real car under several conditions covering the range to be simulated. However, sounds produced by off-line simulators can also be included, e.g. the output of an engine simulation software package. In that way the sound of a completely virtual car can be evaluated. Up to now, most sound design activities start when the first prototypes are available. Acoustic simulation techniques help to shift these activities to earlier stages of development.

Acknowledgements

Some parts of the work described in this paper have been supported by the European Commission within the R&D projects MORIS (ESPRIT 20521), AUDIS (ESPRIT 22352) and OBELICS (BRPR CT96-0242).

References

[1] AUDIS catalogue of human HRTFs. A demo version is available at EAA Documenta Acustica.
[2] D.R. Begault, 3-D Sound for Virtual Reality and Multimedia, AP Professional, 1994.
[3] J. Blauert, The auditory representation in virtual reality, Proceedings of the 15th ICA, Trondheim, 1995, Vol. III, pp. 207ff.
[4] J. Blauert, Spatial Hearing, MIT Press, Cambridge, MA, 1983.
[5] J. Borish, Extension of the image model to arbitrary polyhedra, J. Acoust. Soc. Am. 75 (1984) pp. 1827ff.
[6] D.H. Cooper, J.L. Bauck, Prospects for transaural recording, J. Audio Eng. Soc. 37 (1/2) (1989) 3–19.
[7] P. Damaske, V. Mellert, Ein Verfahren zur richtungstreuen Schallabbildung des oberen Halbraumes über zwei Lautsprechern, Acustica 22 (1969) 154–162.
[8] K. de Boer, R. Vermeulen, Eine Anlage für einen Schwerhörigen, Philips Technische Rundschau 4 (1939) 329–332.
[9] K. Genuit, Optimierung eines Kunstkopf-Aufnahmesystems, 12. Tonmeistertagung 1981, Tagungsband 1982, pp. 218–243.
[10] K. Genuit, Ein Modell zur Beschreibung von Außenohrübertragungseigenschaften, Dissertation, RWTH Aachen, 1984.
[11] K. Genuit, H.W. Gierlich, U. Künzli, Improved possibilities of binaural recording and playback techniques, 92nd AES Convention, Vienna, 24–27 March 1992, preprint 3332.
[12] K. Genuit, N. Xiang, Measurements of artificial head transfer functions for auralization and virtual auditory environment, Proceedings of the 15th ICA, Trondheim, 1995, Vol. II, pp. 469ff.
[13] K. Genuit, W. Brennecke, S. Peus, Standardisierung der Richtcharakteristik von Kunstkopf-Mess-Systemen, Fortschritte der Akustik – DAGA 96, Bonn, pp. 274–275.
[14] K. Genuit, J. Poggenburg, The influence of vibrations on the subjective judgement of a vehicle's interior noise, Noise-Con 98, Ypsilanti, MI, USA.
[15] M.A. Gerzon, Ambisonics in multichannel broadcasting and video, J. Audio Eng. Soc. 33 (1985) 859–871.
[16] H.W. Gierlich, W. Krebber, Auditory feedback for virtual environment – an application for multi-processor architecture, Proceedings of FIVE '96, Pisa, Italy, 19–20 December 1996, pp. 79–83.
[17] R. Heinz, Entwicklung und Beurteilung von computergestützten Methoden zur binauralen Raumsimulation, Ph.D. Thesis, RWTH Aachen, 1994.
[18] J.M. Jot, Efficient models for reverberation and distance rendering in computer music and virtual audio reality, International Computer Music Conference, Thessaloniki, Greece, September 1997.
[19] S. Julstrom, A high-performance surround sound process for home video, J. Audio Eng. Soc. 35.
[20] R. Kürer, G. Plenge, H. Wilkens, Verfahren zur hörrichtigen Aufnahme und Wiedergabe von Schallereignissen und Vorrichtung zu seiner Durchführung, Pat. No. 19 27 401, Deutsche Patentanmeldung v. 29 May 1969.
[21] W. Krebber, H.W. Gierlich, Auditory displays using loudspeaker reproduction, Joint Meeting EAA/ASA, Berlin, 1999.
[22] P. Laws, H.J. Platte, Ein spezielles Konzept zur Realisierung eines Kunstkopfes für die kopfbezogene stereophone Aufnahmetechnik, ITG-Fachberichte, No. 56, VDE Verlag, 1977, pp. 192–198.
[23] H. Lehnert, Binaurale Raumsimulation: Ein Computermodell zur Erzeugung virtueller Umgebungen, Ph.D. Thesis, Ruhr University Bochum, Shaker Verlag, Aachen, 1992.
[24] V. Mellert, Construction of a dummy head after new measurements of threshold of hearing, J. Acoust. Soc. Am. 51 (1972) 1359–1361.
[25] H. Möller, Reproduction of artificial head recordings through loudspeakers, J. Audio Eng. Soc. 37 (1/2) (1989) 30–33.
[26] H. Möller, Fundamentals of binaural technology, Appl. Acoust. 36 (1992) 171–218.
[27] J.A. Moorer, About this reverberation business, Computer Music J. 3 (2) (1979) 3–28.
[28] H.J. Platte, Zur Bedeutung der Außenohrübertragungseigenschaften für den Nachrichtenempfänger "menschliches Gehör", Ph.D. Thesis, RWTH Aachen, 1979.
[29] H.J. Platte, K. Genuit, Kann eine elektroakustische Übertragung mit Lautsprecherwiedergabe originalgetreu sein?, 5. NTG Hörrundfunktagung, Mannheim, 1980, VDE-Verlag, Berlin, Proceedings, pp. 51–58.
[30] J. Sahrhage, H. Strauss, Realisierung einer auditiv/taktilen virtuellen Umgebung, DAGA 96.
[31] J. Sandvad, Dynamic aspects of auditory virtual environments, Proceedings of the 100th AES Convention, Copenhagen, 11–14 May 1996.
[32] P. Scherer, Ein neues Verfahren der raumbezogenen Stereophonie mit verbesserter Übertragung der Rauminformation, Rundfunktechnische Mitteilungen, 1977, pp. 196–204.
[33] M.R. Schröder, Digital simulation of sound transmission in reverberant spaces, J. Acoust. Soc. Am. 47 (2) (1970) 424–431.
[34] M.R. Schröder, Natural sounding artificial reverberation, J. Audio Eng. Soc. 10 (1962) 219–223.
[35] R. Sottek, K. Genuit, Physical modeling of individual head-related transfer functions (HRTFs), Joint Meeting ASA/EAA/DEGA, Berlin, 1999, Acta Acustica 85 (Suppl. 1) (1999) S236.
[36] H. Strauss, Implementing Doppler shifts for virtual auditory environments, 104th AES Convention, Amsterdam, preprint 4687, 1998.
[37] Y. Tamura, M. Hiraga, The effect of low frequency vibration on sound quality of the automobile audio system – proposal of a new automotive audio system.
[38] M. Vorländer, Ein Strahlverfolgungs-Verfahren zur Berechnung von Schallfeldern in Räumen, Acustica 65 (1988) pp. 138ff.
[39] N. Xiang, K. Genuit, H.W. Gierlich, Investigations on a new reproduction procedure for binaural recordings, 95th AES Convention, New York, 7–10 October 1993, preprint 3732.
