An exploration into the effectiveness of 3D Sound on video games

By Jonathan Tozer

BIRMINGHAM CITY UNIVERSITY

BSc (Hons) Sound Engineering and Production

April 2010

Supervisor: Islah Ali-Maclachan

ABSTRACT

This project set out to explore how binaural audio, or 3D sound, can be utilised in the medium of video games, the advantages it can bring and the processes that the technology must undergo in order for the phenomenon to be perceived correctly. Alongside extensive research into psychoacoustics and the digital implementation of 3D sound, comparisons have been drawn with stereo and 5.1 surround sound through testing conducted on human subjects. All three dispersion methods were tested in order to find the localisation accuracy and effectiveness of each. The main influence on localisation accuracy is seen to be the head-related transfer function (HRTF) and how it is used in the creation of 3D sound, as there are many possible ways to apply it. This project used generalised HRTFs for testing and ultimately found 3D sound to be weak under this test method; this was found across a number of subjects in comparison with surround sound and even stereo.

My greatest appreciation goes to Izzy for helping me through this project and giving me

support and guidance throughout. Also, I would like to thank Andy Bourbon and the

rest of the staff at the School of Digital Media for their constant support over the last

three years.

And finally to my parents for their support, guidance and love. Thank you.

Table of Contents

• Glossary
• List of Figures and Plates
1.0 Introduction
1.1 Background
1.2 Problem Definition
1.3 Scope
1.4 Rationale
1.5 Aims and Objectives
2.0 Review of Existing Knowledge
2.1 Introduction
2.2 The effect of 3D sound on video games
2.2.1 3D sound or surround sound
2.2.2 Headphones and loudspeakers
2.3 The auditory space
2.3.1 Azimuth and elevation perception
2.3.2 The lateralization paradigm
2.3.3 Interaural timing difference
2.3.4 Interaural intensity difference
2.3.5 The precedence effect
2.3.6 Head movement
2.3.7 Moving sources
2.4 The cone of confusion
2.5 Head-Related Transfer Function
2.5.1 Individualised and non-individualised HRTF
2.6 Distance, depth and the environment of the virtual world
2.6.1 Distance and depth
2.6.2 Apparent source width
2.6.3 Inside-The-Head localisation
2.7 The implementation of 3D sound
2.7.1 Current game audio techniques
2.7.1.1 Current use of 3D sound in videogames
2.7.1.2 The use of 3D sound outside of videogames
2.7.2 3D sound implementation
2.7.2.1 HRTF implementation
2.7.2.2 Generalised HRTF
2.8 Conclusion
3.0 Methodology
3.1 Initial testing method
3.2 Final testing design
3.3 Testing procedure design
3.4 Subjects used
3.5 Experimental set up
3.6 Questionnaire design
3.7 Experimental procedure
4.0 Results
4.1 Stereo localisation
4.2 Surround Sound localisation
4.3 3D sound localisation
4.4 Localising in the presence of reverb
4.5 Localising differing frequencies
4.6 Favoured form of localisation
5.0 Discussion
5.1 Actual use of 3D sound in video games
6.0 Conclusion
7.0 Recommendations for further work
8.0 References
9.0 Bibliography
10.0 Appendices

Glossary

AE- Amphiotik Enhancer

API- Application Programming Interface

DSP- Digital Signal Processing

FIR- Finite Impulse Response

FPS- First Person Shooter

HRTF- Head-Related Transfer Function

IHL- Inside-The-Head Localisation

IID- Interaural Intensity Difference

ITD- Interaural Timing Difference

MAA- Minimum Audible Angle

List of Figures and Plates

Figure No. Title
1 Recommended 7.1 Set up 8”-12” from TV
2 Azimuth and Elevation
3 Cones of Confusion
4 The differing HRTF of multiple subjects
5 Default Distance Model used by Waves Art Acoustic Environmental Modelling
6 Recommended 5.1 Set up 8”-12” from TV
7 Azimuth and Elevation Sources for Speech Stimuli
8 Test 1 Part 1
9 Test 1 Part 4
10 Test 2 Part 3
11 With which dispersion method was it easiest to localise sound?
12 Regardless of cost which method would you enjoy playing games through most?
13 Test 5 Parts 3 and 6

Plate No. Title
1 HRTF Measurement of a Human Subject

1.0 INTRODUCTION

1.1 Background

3D sound is an audio format which enables the recreation of a full 3D landscape through a set of standard headphones (Begault, 1994). Unlike other sound dispersion formats that enable surround sound (including 5.1 and 7.1), 3D sound allows sound sources to be perceived from all apparent directions on both the azimuth and elevation planes.

This project hopes to find whether there is any real advantage in implementing 3D sound in video games and to detail the effects that this sound dispersion method will have on the player in comparison with the methods in use today.

3D sound can also be referred to as binaural audio; however, this project will always

refer to the phenomenon as 3D sound.

1.2 Problem definition

As technological capabilities have grown over recent years, so has the demand for more lifelike games. Although graphics and visuals have taken a huge leap forward in terms of realism, audio in videogames remains vastly limited in terms of sound dispersion and overall quality (Bridgett, 2007). With a large number of players using the speakers built into a TV, it seems that however much audio improves in terms of sound design, it will be limited by the placement of speakers in the home.

By using 3D sound the gamer will be able to perceive sound from all dimensions of the gaming world just by wearing a set of headphones. It will appear to the gamer as if they are inside the head of the character being controlled on screen, hearing sounds from behind, from above and from any other possible direction. This will help to create an almost real-life experience for the gamer and make video games more captivating in general. The effect will only work with certain styles of game owing to their game play and design; these may include genres such as FPS and 3rd person games.

1.3 Scope

From beginning to end this project has been around one year in length and has aimed to cover important subjects ranging from psychoacoustics and sound source localisation through to the advancements in immersion that 3D sound brings.

This project focuses more on what needs to be considered and how it affects the listener than on how it is achieved. In the book 3D Sound for Virtual Reality and Multimedia, Begault (1994) discusses the process of implementing the dispersion technique in virtual reality applications. Although this is a critical part of creating a 3D soundscape, the project has had nowhere near enough time to research and understand the process fully; the implementation is nevertheless covered briefly.

The design and build of the practical tests have also taken many months, but the tests are intended to be a critical part of forming the final conclusion of the project.

1.4 Rationale

Although there has been research into the implementation of 3D sound for games and other virtual reality systems, this project hopes to give an outline of the theories and processes involved and to mention any major problems that developers could face. More than this, the project hopes to find the differences that 3D sound will make to the player in comparison with other, more popular dispersion methods and to find out whether it will be of actual benefit.

3D sound is intended to aid not only game developers in creating more immersive and enjoyable games, but also the players. Players will gain an advantage in localisation, which will be of particular help when playing online against other players because the direction of sound sources can be detected more quickly. Although it may be argued that this advantage is already at hand with surround sound systems, the added bonus of elevation detection is not. Both 5.1 and 7.1 surround sound headphones have been produced and are available to purchase in high street stores; this may indicate that players are in fact actively seeking this increased advantage over opponents (Howley, 2009).

1.5 Aim and objectives

Aim

To establish if 3D Sound can be effectively utilised in video games to create a more

lifelike gaming experience.

Objectives

• To research the sound distribution methods that are currently used within this area of the media, the way each is delivered and the advantages and disadvantages of each.

• To investigate methods of sound dispersion in other areas of the media and beyond, helping to provide technical information on sound source localisation.

• To investigate psychoacoustics and how humans are able to localise sound, helping to discuss how sounds will be recreated in a virtual 3D environment.

• To develop a number of high quality pieces of audio for a piece of game footage, each utilising a different form of sound dispersion, for testing purposes.

• To conduct a number of tests covering localisation in a space and the overall effectiveness of each sound dispersion method.

• To analyse and evaluate test results in order to conclude whether 3D sound is suitable for the medium of video games.

2.0 REVIEW OF EXISTING KNOWLEDGE

2.1 Introduction

In order to create a fully 3D interactive soundscape in which a player can perceive full 3D sound, it is important that there is a solid understanding of human sound source localisation as well as of many other factors, such as how sound behaves within a real space. Begault (1994) explains that this is necessary so that, either in real time or algorithmically, the characteristics of a particular sound source can be predicted and processed in a way that ensures full replication not only of the space but also of the ears of the controlled on-screen character.

It is also important to analyse the current state of sound dispersion techniques in order

to create a comparison between the production and reproduction of 3D sound against

other technological formats that are currently used.

2.2 The effect of 3D sound on video games

3D sound will have many positive effects on how video games are played, the first of which involves immersion in the medium. With the use of 3D sound the player will hear exactly what the on-screen character is hearing, including what is happening above, below, behind and in front, making the audio seem almost real (Gehring, 1997). Gehring goes on to discuss the advantages of 3D sound in multiple areas, the benefits the effect adds to gaming and how it can be achieved, including the hardware and computer capabilities that are needed and whether the audio is consumed through headphones or loudspeakers.

3D sound can also be used to the player’s advantage, as localisation becomes much easier, meaning that any events taking place outside the player’s view can be attended to as they would be in the real world (Tullis, 2006). Whereas Gehring looks specifically at 3D sound, Tullis discusses the advantages that surround sound brings to gaming as a whole and how it can be adapted to multiple game genres, from action to sport.

3D sound would most likely work better in some games than in others because of the way the games are made and played. For example, a single player FPS game would arguably utilise 3D sound most effectively, as the player sees through the character’s eyes and hears through the character’s ears. 3rd person games, however, may be slightly more difficult, as the developer has to decide at what point the 3D sound is processed, i.e. at the ears of the on-screen character or at the position of the camera. Furthermore, when there are multiple players in one room 3D sound can become impractical, as all players would have to wear headphones; using loudspeakers to create the effect would become almost impossible.

2.2.1 3D sound or surround sound?

Surround sound comes in a variety of formats dependent upon how many speakers are utilised (5.1, 7.1 etc.) and, generally, the more speakers used, the greater the sense of space around the listener’s head (Silva, 2010). However, whilst these systems are effective they are also costly. For maximum effectiveness the speakers also need to be set at fixed points based upon the listener position, and localisation is said to be weakest at the points between the loudspeakers (Holman, 2008). Surround sound speakers can ultimately be very complex to understand and set up (as seen in fig. 1), as not only are there many different types but the placement of the speakers in the home may have a huge effect on the media. Surround headphones are also available; however, these will only give the illusion of surround sound on a horizontal axis, not a full 3D illusion.

Fig 1- Recommended 7.1 set up 8”-12” from TV

Dolby (2010)

2.2.2 Headphones and Loudspeakers

For the effect of 3D sound to be truly maximised it is recommended that playback is delivered through headphones, although it is possible to reproduce the effect through loudspeakers. When delivering audio through loudspeakers there must be as little crosstalk as possible (audio that, in this situation, is shared between ears). This is to ensure that each ear hears only the intended signal and not the signal from the other loudspeaker. If crosstalk is too great the 3D image will be disrupted, and even the slightest head movement can disrupt the effect. Headphones give a dedicated speaker to each ear, meaning that crosstalk and head movement do not become a problem (Cheenne, 2002).

Eliminating crosstalk would be very difficult as the conditions for each ear would need

to be almost perfect in order for the effect to be perceived. This would ultimately leave

the phenomenon incredibly unstable; it can be seen that using loudspeakers for 3D

sound is generally very impractical.

2.3 The auditory space

In order for a listener to pinpoint a perceived sound there needs to be a position of

reference; this will enable the location of a sound source to be described through

height, direction and distance. These measurements of localisation will also aid in

describing the psychoacoustic properties of the subject ahead.

The two main measurements for describing the location of a sound source are azimuth and elevation (Blauert, 1974). These two conventions aid in describing the perception of a source within a sphere around the listener’s head but do not offer a description of distance (Fig. 2).

Fig. 2 – Azimuth and Elevation (Begault, 1994, p3)
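
As an illustration of the two angular measurements, the short Python sketch below converts an azimuth and elevation pair into a unit direction vector. The axis convention (x to the listener's right, y straight ahead, z upwards, azimuth measured from straight ahead towards the right) and the function name are assumptions made for this example and are not taken from Blauert or Begault. Because only angles are supplied, the result always has unit length, which reflects the fact that azimuth and elevation alone carry no distance information.

```python
import math


def direction_from_angles(azimuth_deg, elevation_deg):
    """Return a unit vector (x, y, z) pointing towards a source.

    Assumed convention (illustrative only): azimuth 0 is straight ahead,
    positive azimuth is towards the listener's right, elevation 0 is ear
    level and positive elevation is upwards.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = math.sin(az) * math.cos(el)  # right (+) or left (-)
    y = math.cos(az) * math.cos(el)  # front (+) or back (-)
    z = math.sin(el)                 # up (+) or down (-)
    return (x, y, z)


# A source directly to the right at ear level, and one directly overhead.
print(direction_from_angles(90.0, 0.0))
print(direction_from_angles(0.0, 90.0))
```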

2.3.1 Azimuth and elevation perception

In discussing the localisation of a sound source there are many psychoacoustic

theories that need to be addressed in order to understand the implementation and

effects that 3D sound will have.

Begault (1994) describes the most important cue in localising sound as the difference

of the waveform at each ear. Interaural Intensity Differences (IID) and Interaural Time

Differences (ITD) are said to be the two main cues in the localisation of sources on the

azimuth plane; these two concepts make up the duplex theory (Rayleigh, 1907).

2.3.2 The lateralization paradigm

Lateralization is a topic discussed at length by Begault (1994, p39), in which different psychoacoustic cues are manipulated in order to find the “relative sensitivity of physiological mechanisms.” Lateralization arises when sound perception occurs inside the head and the sound is produced by manipulating ITDs and IIDs over headphones. Begault goes on to explain that manipulating these cues can lead to a somewhat accurate prediction of the physiological properties of human sound localisation.

Lateralization will attempt to copy Interaural Differences.

2.3.3 Interaural time difference

ITD describes the difference in the time at which a sound reaches each ear. The ear furthest from the source will receive the sound a very short time after the ear closest to it. This is due to the separation of the ears and the irregular shape of the human head. The phase shift between the ears then resolves the direction of the sound source (Angus, Howard. 1996).

It is also indicated that the phase shift begins to become less effective above 1 kHz

(Rumsey, 2001) and the phenomenon becomes obsolete at around 1.5 kHz (Angus,

Howard. 1996).
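
The size of the ITD can be estimated with a simple geometric model. The sketch below uses Woodworth's spherical-head approximation, ITD = (r / c)(θ + sin θ), which is not part of the sources cited above and is introduced here purely as an assumption. The head radius of 8.75 cm and the speed of sound of 343 m/s are likewise assumed typical values, and the function name is illustrative.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C
HEAD_RADIUS = 0.0875    # m, assumed average head radius


def itd_woodworth(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Estimate the ITD in seconds for a distant source at ear level.

    Uses Woodworth's spherical-head formula, which this sketch only
    applies to azimuths between 0 (straight ahead) and 90 degrees.
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("this sketch only covers azimuths from 0 to 90 degrees")
    theta = math.radians(azimuth_deg)
    return (radius / c) * (theta + math.sin(theta))


for angle in (0, 30, 60, 90):
    print(f"{angle:>2} degrees: {itd_woodworth(angle) * 1e6:.0f} microseconds")
```

Under these assumptions the ITD grows from zero for a source straight ahead to roughly 0.66 ms for a source directly to one side.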

2.3.4 Interaural intensity difference

IIDs become effective above around 1.5 kHz owing to the head shadowing effect, which occurs when the wavelength is small relative to the size of the human head (Rumsey, 2001).

When a sound source moves away from the median plane and further along the azimuth plane, the ear furthest from the source receives an audible reduction in level because of obstruction by the head (Angus, Howard. 1996).
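
The link between frequency and head shadowing can be seen by comparing the wavelength of a tone with the width of the head. The sketch below is illustrative only: the head width of 17.5 cm and the speed of sound of 343 m/s are assumed values, the function names are hypothetical, and the simple comparison is not a model of diffraction around the head.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C
HEAD_WIDTH = 0.175      # m, assumed ear-to-ear width


def wavelength(frequency_hz, c=SPEED_OF_SOUND):
    """Return the wavelength in metres of a tone at the given frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return c / frequency_hz


def shorter_than_head(frequency_hz, head_width=HEAD_WIDTH):
    """True when the wavelength is shorter than the assumed head width."""
    return wavelength(frequency_hz) < head_width


for f in (500, 1500, 4000):
    print(f"{f} Hz: wavelength {wavelength(f) * 100:.1f} cm, "
          f"shorter than head: {shorter_than_head(f)}")
```

With these figures the wavelength at 1.5 kHz is about 23 cm, which is of the same order as the width of the head, whereas at 4 kHz it is under 9 cm and the head casts a much more effective acoustic shadow.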

2.3.5 The precedence effect

The precedence effect, also referred to as the Haas effect, is described as “the law of the first wavefront” (Blauert, 1974). Humans attend to the first sound that reaches the ear and not to what is perceived afterwards as reverberation. Any reflection of a sound source that arrives within 30 ms of the original sound is fused with it, and any reflection arriving more than 30 ms after it is perceived as an echo (Angus, Howard 1996).

It should also be noted that the precedence effect is not so much altered by waveforms

but is affected by arrival times. Pierce (1999) explains that this effect will still occur if

one speaker in a stereo system is put out of phase.
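
The 30 ms figure quoted above can be expressed as a simple rule. The sketch below labels each reflection as fused with the direct sound or heard as a separate echo, using only its delay. The function name and the single fixed threshold are simplifying assumptions; in practice the boundary depends on the type of signal.

```python
FUSION_LIMIT_MS = 30.0  # threshold quoted in the text above


def classify_reflections(delays_ms, limit_ms=FUSION_LIMIT_MS):
    """Label each reflection delay (in ms after the direct sound)."""
    labels = []
    for delay in delays_ms:
        if delay < 0:
            raise ValueError("a reflection cannot arrive before the direct sound")
        # Arrival time, not waveform shape, decides the label in this sketch.
        labels.append("fused" if delay <= limit_ms else "echo")
    return labels


print(classify_reflections([5.0, 18.0, 30.0, 45.0, 120.0]))
```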

2.3.6 Head movement

In order to localise a sound effectively humans, when aware of a sound source, will

attend to its direction by turning the head to face the sound. Humans will do this in

order to minimize timing and intensity differences by centring the image directly in

front, helping to resolve the cone of confusion (see section 2.4) and to gain visual

information on the source also (Begault, 1994).

If a set of loudspeakers is set up to create the 3D sound illusion, even the slightest head movement will cause the sensation to be broken; if the audio is replayed through headphones, however, movement of the head will not matter (Cheenne, 2002).

2.3.7 Moving sources

Experiments have found that the Minimum Audible Angle (MAA) of a moving source can be larger than that of a fixed source, ranging from 1 degree for a fixed source up to 3 degrees for a moving one. This can, however, change with the location, type and volume of the sound (Begault, 1994).

2.4 The cone of confusion

The cone of confusion describes the ambiguous cues that timing and intensity differences alone would produce if there were no outer ear and the human head were spherical: every source position on the cone gives the same ITD and IID. In practice the outer ear and other parts of the body make two perfectly matched sets of ITDs and IIDs very unlikely; with these two cues alone, however, human sound localisation would be far less accurate (Yost, 1997).

Fig 3- Cones of Confusion (Begault, Wenzel. 2005)
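
The ambiguity can be demonstrated with a deliberately simplified model in which the head is ignored and the ears are treated as two points a fixed distance apart. This far-field, two-point model, the ear spacing of 17.5 cm and the function name are all assumptions made for illustration. Under it, a source at 30 degrees (front right) and its mirror image at 150 degrees (back right) produce the same ITD, so timing alone cannot separate front from back.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C
EAR_SPACING = 0.175     # m, assumed distance between the two ears


def itd_two_point(azimuth_deg, spacing=EAR_SPACING, c=SPEED_OF_SOUND):
    """ITD in seconds for a distant source, treating the ears as two points.

    The path-length difference is spacing * sin(azimuth), with azimuth
    measured from straight ahead (0) round to directly behind (180).
    """
    return spacing * math.sin(math.radians(azimuth_deg)) / c


front = itd_two_point(30.0)   # source in front and to the right
back = itd_two_point(150.0)   # mirror-image source behind and to the right
print(f"front: {front * 1e6:.1f} us, back: {back * 1e6:.1f} us")
print("indistinguishable by ITD alone:", math.isclose(front, back))
```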

2.5 The head-related transfer function

The Head-Related Transfer Function (HRTF) is the measurement of the filtering of sound by the pinnae (outer ears) before it reaches the ear drum (Cheng, Wakefield. 2001). The complex structure of the outer ear leads to a sensation of elevation and to the ability to distinguish between sources in front of and behind the head.

The asymmetrical pinnae modify the spectral information of a sound dependent on its source position; this complexity causes different delays and resonances which are unique to the location of a sound, creating a unique HRTF for every position (Begault, 1994).

It should also be noted that some HRTF measurements will also include the reflections

made off the shoulders and the torso as these two areas will also aid in sound source

localisation.

Within 3D sound it is said that the most effective way to replicate the location of a

sound source is to do so at the point closest to the ear drum manipulating the natural

process of localisation in the closest possible manner. Because of this the spectral

manipulation of HRTF is one of the main components in creating a 3D soundscape.

2.5.1 Individualised and non- individualised HRTF

From person to person the size and shape of the outer ear will differ and generally no

two sets of ears will be exactly the same, meaning that the personal set of HRTFs for a

single individual will be unique.

If the HRTF of every user were known the accuracy of 3D sound would be much higher. However, as it would be highly impractical to determine the HRTFs of every single user, it is possible instead to measure the HRTFs of multiple people and so obtain a general HRTF (Maher, Reed. 2009).

Begault (1994) mentions the possibility of listening through artificial pinnae but also

addresses the fact that a pair of ears good at localising would in turn become worse if

the artificial pinnae were not as effective.

Wenzel et al. (1993) conducted an experiment on a number of test subjects in which a single ‘generic’ HRTF was used, taken from one person who possessed good localisation abilities on both the azimuth and elevation planes. Each listener’s own ability to localise was compared with their ability to localise through the generic HRTF over headphones. The experiment concluded that the vast majority of the subjects were able to locate sources to much the same degree as with their own personal HRTF. It was also noted, however, that listeners experienced an increased rate of front-to-back confusion.

Fig. 4- The differing HRTFs of multiple subjects

(Truax, B. 1999)

2.6 Distance, depth and the environment of the virtual world

To aid 3D sound in becoming totally immersive it is also important to cover how players

can perceive the distance of a sound source in the virtual world. It is also necessary to

cover the environmental context in which the source then emits to aid in realism.

2.6.1 Distance and depth

The distance of a sound source is how far away the source appears to be, whilst depth is the front-to-back distance of a sound (Rumsey, 2001).

Sources perceived from a distance compared to sources close are said to hold the

following characteristics:

• “Quieter (extra distance travelled)

• Less high frequency content (air absorption)

• More reverberant (in reflective environment)

• Less difference between time of direct sound and first floor reflection

• Attenuated ground reflection”

Rumsey (2001) p 35

The loudness cue is an extremely important factor in the judging of distance for the

human brain, however, if the listener has no prior knowledge of the intensity of a sound

this factor becomes ambiguous on its own.

The loudness of reverberation is a further cue for distance perception, in which the loudness of the direct sound is judged against the loudness of the reverberation. The level of a sound that travels directly from the source to the listener falls by 6 dB, a halving of sound pressure, at every doubling of distance, whereas the fall in reverberation amplitude is not as great. The direct-to-reverberant amplitude ratio is therefore lower for more distant objects, which is why distant objects appear more reverberant (Gardner, 1999).
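
The distance cues described above can be put into numbers. The sketch below applies the inverse-distance law to the direct sound, giving a fall of about 6 dB per doubling of distance, and holds the reverberant level constant to show how the direct-to-reverberant ratio shrinks for more distant sources. The constant reverberant level of -20 dB, the reference values and the function names are assumptions made for illustration and are not the distance model shown in Fig. 5.

```python
import math


def direct_level_db(distance_m, ref_level_db=0.0, ref_distance_m=1.0):
    """Level of the direct sound in dB relative to its level at ref_distance_m."""
    if distance_m <= 0 or ref_distance_m <= 0:
        raise ValueError("distances must be positive")
    return ref_level_db - 20.0 * math.log10(distance_m / ref_distance_m)


def direct_to_reverb_db(distance_m, reverb_level_db=-20.0):
    """Direct-to-reverberant ratio in dB, assuming a constant reverberant level."""
    return direct_level_db(distance_m) - reverb_level_db


for d in (1.0, 2.0, 4.0, 8.0):
    print(f"{d:>4.1f} m: direct {direct_level_db(d):6.2f} dB, "
          f"direct-to-reverb {direct_to_reverb_db(d):6.2f} dB")
```

In this simplified picture each doubling of distance removes about 6 dB from the direct sound while the reverberation stays put, so the source sounds progressively more reverberant as it moves away.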

Fig 5. Default Distance Model used by Waves Art Acoustic Environmental Modelling.

(Gardner, 1999, p6)

2.6.2 Apparent source width

The Apparent Source Width (ASW) is the space a sound source appears to fill. Reflections arriving up to around 80 ms after the direct sound will seemingly broaden the ASW of a source, dependent on the delay of the early reflections (Ando, Sato. 2002).

2.6.3 Inside-the-head localisation

The apparent feeling that a sound source is generated from within the head is named Inside-The-Head Localisation (IHL). It is most obvious when wearing headphones and can occur when using 3D sound. It is believed that this phenomenon is due to HRTFs, head movement and reverb and not to the structural shape of the head (Begault, 1994). The effect in which sound sources appear outside the head is “externalisation”, which is also linked to “spaciousness”, a term describing the sense of space within a room (Rumsey, 2001). The use of reverberation will greatly reduce IHL over headphones (Begault, 1994).

2.7 The implementation of 3D sound

Before discussing how 3D sound can be implemented into games it is important to look

at how this is achieved already with the games of today.

2.7.1 Current game audio techniques

Currently there are a number of tools that allow the game developer and audio designer to place sounds within a 3D landscape. These placements can be interactive or non-interactive and can trigger sound bites or music, and the tools can be presented in the form of a GUI or in a coding language (Walder, 2006).

An API built into games consoles and soundcards is a means of replaying a programmed sound bite, triggering it at a certain time with knowledge of where it is within a space and what type of space it is in (Hagon, Muschett. 2002).

The sound will be presented to the player via panning so that the player is able to

locate the source, orientation and speed of the object. How the sound is emitted will all

contribute to this.
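
As a simple example of how panning places a source between two loudspeakers, the sketch below implements a constant-power (sine/cosine) pan law. This particular law is a common choice but is an assumption here; the sources cited do not state which law any given game engine or API uses. The function name and the pan range of -1 (fully left) to +1 (fully right) are also illustrative.

```python
import math


def constant_power_pan(pan):
    """Return (left_gain, right_gain) for a pan position between -1 and +1.

    The gains follow a quarter-circle so that left**2 + right**2 stays at 1,
    keeping the perceived loudness roughly constant as the source moves.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must lie between -1 and +1")
    angle = (pan + 1.0) * math.pi / 4.0  # maps -1..+1 onto 0..pi/2
    return math.cos(angle), math.sin(angle)


for position in (-1.0, -0.5, 0.0, 0.5, 1.0):
    left, right = constant_power_pan(position)
    print(f"pan {position:+.1f}: left {left:.3f}, right {right:.3f}")
```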

The ambiance of the sound is applied to the source based upon the surroundings of

the object and includes the likes of delay, reverb and Doppler shift. (Farnell, 2006).

Wave-Tracing is a means of emitting the sound of reverb based upon the geometry of

the room and will change rapidly based on the player’s movement through

environments (Hagon, Muschett. 2002).

2.7.1.1 Current use of 3D sound in video games

Even though it seems that 3D sound is not utilised much at all within mainstream gaming, there has been a lot of work on the creation of 3D sound engines; some have been successful whilst others have not.

Companies including QSound, Sensaura and Aureal have all created 3D sound and reverb engines for use with PC games, each adopting a different type of 3D sound processing technique to play back the audio over headphones, 2-channel, 4-channel and surround sound systems (Hagon, Muschett 2002). Possibly the most renowned engine of them all, A3D by Aureal, was discontinued after a legal battle with Creative Labs (Anon, 1998).

More recently, however, GHOST Binaural Audio have released a fully working iPhone application utilising 3D sound named “Aves” (Action=Reaction Games, n.d.). Popular game audio software including FMOD and the Miles Sound System by RAD Tools also allows for 3D sound implementation (Tandieflt, M. 2009), (RAD Tools, 2010). It is interesting to note that the game “Aves”, at the time of writing, has a rating of 2.5 out of 5 and very average reviews on the official iTunes site (Apple, 2010).

2.7.1.2 The use of 3D sound outside of video games

3D sound can be found outside of the video gaming industry, but interestingly not just

within the entertainment industry. 3D sound has been utilised by Advanced Simulation

Technology Inc. specifically for military training in the USA. The technology is

applicable to many training scenarios including gunner training, as well as convoy and

flight training (ASTi, 2010).

3D sound also entered the film industry for a brief period in 1993, when binaural audio was used within a headset in the IMAX theatre, allowing for 3D visuals and 3D sound simultaneously. The headset concept did not seem to be successful (Schoenherr, 1999).

2.7.2 3D Sound implementation

Begault (1994) highlights four factors that will have an effect on the implementation of 3D sound in consoles. These are:

• The use of the sound in the video game, whether it is a sound effect triggered by on-screen activity or music, and its meaning, i.e. warning, guidance and motivation sounds.

• The function of the audio interface: whether the sounds are mixed in real time or are pre-mixed.

• How well the player will be able to localise sound and how far the developer is asking the player to take localisation; sometimes this may not be practical for the player.

• The available resources, including time, money and DSP limitations.

Gehring (1997) suggests that “the hardware to deliver realistic binaural audio is already in place”, based upon 16-bit stereo soundcards. It has, however, been said more recently that this cannot be done because of the complex algorithms involved and the processing power available (Nanostuff, 2009). The creators at Action=Reaction Games have been able to prove this claim incorrect: if Apple’s iPhone is capable of reproducing 3D sound, far more powerful gaming consoles such as the PlayStation 3 and Xbox 360 will also be capable.

2.7.2.1 HRTF implementation

As previously discussed the use of individualised HRTFs will greatly increase the

effectiveness of 3D sound, however, it is also known that there may not be a practical

way to implement everyone’s personal transfer function into the audio engine. It is

important to know how an HRTF is captured in order to determine if it is possible over a

wider scale.

The use of FIR filters will allow for the implementation of HRTFs into the realm of DSP

by using a number of delay and gain variants to effectively recreate the pinnae

(Begault, 1994).
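
The role of the FIR filter can be sketched in a few lines of Python. Each ear's HRTF is represented in the time domain by a short impulse response, and a mono signal is convolved with the left and right responses to give a two-channel output. The coefficients below are invented placeholders chosen only to show a delay and level difference between the ears; they are not measured HRTF data, and the function names are assumptions.

```python
def fir_filter(signal, coefficients):
    """Convolve a signal with FIR coefficients and return the full-length output."""
    output = [0.0] * (len(signal) + len(coefficients) - 1)
    for n, sample in enumerate(signal):
        for k, coeff in enumerate(coefficients):
            # Each coefficient is a gain applied at a delay of k samples.
            output[n + k] += sample * coeff
    return output


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono signal with a left and a right impulse response."""
    return fir_filter(mono, hrir_left), fir_filter(mono, hrir_right)


# Placeholder impulse responses for a source on the listener's right:
# the right ear receives the sound first and louder (values are invented).
HRIR_RIGHT = [0.9, 0.2, 0.05]
HRIR_LEFT = [0.0, 0.0, 0.5, 0.15]

left, right = render_binaural([1.0, 0.0, 0.0], HRIR_LEFT, HRIR_RIGHT)
print("left ear: ", left)
print("right ear:", right)
```

A real implementation would select or interpolate a measured pair of impulse responses for each source position and would normally use a faster convolution method; the nested loop is used here only because it makes the delay and gain structure of the filter visible.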

There are multiple ways of collecting HRTF measurements that will later be used by many listeners. As previously discussed, the two main ways are to use the HRTF of a person with good localisation or to use the average HRTF of a number of subjects. By placing a microphone at particular points in the ear canal it is possible to record the HRTF of an individual by playing a tone through a set of loudspeakers positioned at fixed points.

Plate 1. HRTF measurement of a human subject.

Interface Laboratory (n.d)

2.7.2.2 Generalised HRTF

Generalised HRTFs are used solely for reasons of practicality, as it would be incredibly difficult to measure the HRTF of every single player in the way shown in the image above. There is a multitude of ways in which HRTF measurements can be averaged, including averaging the physical model of the ear and averaging the spectral content of each person’s ear (Begault, 1994).
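
One of the averaging approaches mentioned, averaging spectral content, can be illustrated as follows. The sketch takes the magnitude response of several subjects at the same set of frequency bins and returns the mean at each bin. Averaging the values in decibels, the function name and the example figures are all assumptions made for illustration; they are not measured data and this is not presented as Begault's procedure.

```python
def average_magnitude_db(responses_db):
    """Average several magnitude responses (in dB) bin by bin.

    responses_db is a list with one entry per subject; each entry is a
    list of magnitudes in dB at the same frequency bins.
    """
    if not responses_db:
        raise ValueError("at least one response is required")
    n_bins = len(responses_db[0])
    if any(len(response) != n_bins for response in responses_db):
        raise ValueError("all responses must cover the same frequency bins")
    count = len(responses_db)
    return [sum(response[i] for response in responses_db) / count
            for i in range(n_bins)]


# Invented magnitudes for three hypothetical subjects at four frequency bins.
subjects = [
    [0.0, 3.0, -6.0, -12.0],
    [1.0, 5.0, -9.0, -10.0],
    [-1.0, 4.0, -3.0, -14.0],
]
print(average_magnitude_db(subjects))
```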

2.8 Conclusion

The above notes many of the processes that are involved in the creation of 3D sound

for video games and the advantages and extra excitement that the concept can bring.

However, the information that has been found also details that there is a great amount

of extra thought that needs to be dedicated to the technology. It is evident that there is

a long process in creating the technology and there are many factors that can improve

the experience and the effectiveness of the illusion for the player. Full understanding of

creating the perfect 3D sound audio engine is yet to be realised.

3.0 METHODOLOGY

The following section will discuss the design and execution process of the primary

experiment for this project.

3.1 Initial testing method

The initial testing method has developed over the course of the project and has

evolved from an original idea in order to produce more scientifically analytical results

that would help in forming a more defined conclusion to the main aim.

Originally it was planned that three versions of a video would be produced, two incorporating the main dispersion methods of today (5.1 and stereo) and one incorporating 3D sound. The selected video was a three-minute game trailer that used a camera view similar in style to an FPS video game. All of the original audio was muted and was to be replaced by recording and sampling various Foley sounds, as would be done for a major video game. The audio was then to be panned in the three ways listed: stereo, 5.1 and 3D sound. The videos would then be played to a number of subjects, who would compare and contrast them by answering general and more detailed questions. However, it was concluded that the results of such testing would not be very scientifically detailed, and that the answers given could be shaped by the quality of the sound design rather than by the sound dispersion method itself.


3.2 Final testing design

The final testing method hoped to investigate how well people can localise sound over

the three dispersion methods that are mentioned above by playing a series of samples

that can be found in everyday life; this testing would also be done without the use of

visuals. The test hoped to address perceived direction, height and distance through

each method.

By conducting this test it would be possible to find if 3D sound has any real advantage

over surround sound.

When designing the experiment it was decided that the factors needed to be as controlled as possible, as this would yield the most accurate results.

Three samples were used: a person whistling, fingers clicking and a bucket being tapped. Each has a different main frequency range, which made it possible to explore the success of localisation at different frequencies. The samples were recorded in an anechoic room to ensure that reverb would not affect the results; any reverb that was used was added synthetically afterwards.

A Studio Projects B-1 microphone was used, recorded through a MOTU 896 interface.

The “bucket tap” sample had a fundamental frequency of 139Hz, the “whistle” sample had a fundamental of 1809Hz and the “finger clicking” sample had a fundamental of 773Hz.

Testing mainly required the subjects to note down the perceived azimuth of a sound on a circular grid, imagining the centre of the circle as the subject's head, and to note down the perceived elevation of the sound on a separate chart.

The samples were panned in two programs: Amphiotik Enhancer (AE), which allows for the panning of both 3D sound and stereo audio, and Logic Pro 8 for 5.1.

Because the choice of HRTF is a major issue within 3D sound, CIAIR HRTF data provided by Nagoya University in Japan was used; this data is shipped with the software (Holistiks, 2010). Relying on a single set of HRTF data may have introduced limitations. Built into the program was one other popular set of HRTF data, that of the KEMAR dummy provided by MIT labs. By utilising both sets it would have been possible to find out to what degree HRTFs actually affect 3D sound localisation.

Both the stereo and 3D sound samples were panned using the Amphiotik Enhancer software at random points around the listener's head, making sure that there was a fair variation of positions, distances and heights across all of the samples.

The reason that AE was also used for stereo is that the illusion of left and right would still be similar; however, the illusion of total 3D immersion would be very difficult to achieve. The extra dimension of elevation also appears within the stereo samples, adding another factor to the test that could show interesting results.

As AE cannot export audio in 5.1, it was decided that the surround sound panner in Logic Pro 8 would be used to spatialise the audio. As with AE, the three samples were panned to various points around the listener's head; however, no elevation was added as this feature is not available.

AE also factors in elements of the room in an attempt to create a believable illusion; these include room size, the material of the walls and the distance from the sound source to the walls, floor and ceiling.

Reverb was synthetically added to 50% of the samples. This enabled the testing to show whether humans are able to localise better in the virtual world with the added feature of reverb. The reverb was added in AE for the 3D sound and stereo samples, and in Logic Pro for 5.1. As the Space Designer Audio Unit in Logic Pro is currently more flexible than the reverb unit found in AE, it was set to emulate the AE reverb as closely as possible; however, it was still very difficult to make the two reverbs sound similar.

3.3 Testing procedure design

Levitin (1999) describes controlled testing as having two major factors: random assignment and identical experimental conditions for each subject.

There were 12 subjects in total (see section 3.4). Each subject heard 18 audio samples: 6 for 5.1, 6 for stereo and 6 for 3D sound, although 12 samples were prepared for each dispersion method. Every subject heard 9 samples with reverb and 9 without, and heard each of the three sounds 6 times. This ensured that the test was fair, with every subject hearing an equivalent set of samples under controlled variables.
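The counts above can be checked with a short sketch. It assumes, purely for illustration, that a subject's playlist is every combination of dispersion method, sound and reverb condition; how the 12 samples prepared for each method were actually divided between subjects is not specified here.

```python
from collections import Counter
from itertools import product

METHODS = ["stereo", "5.1", "3D sound"]
SOUNDS = ["whistle", "finger click", "bucket tap"]
REVERB = [True, False]

# Assumed allocation: one clip for every method / sound / reverb combination.
playlist = list(product(METHODS, SOUNDS, REVERB))

method_counts = Counter(method for method, _, _ in playlist)
sound_counts = Counter(sound for _, sound, _ in playlist)
reverb_counts = Counter(has_reverb for _, _, has_reverb in playlist)

print(len(playlist), dict(method_counts), dict(sound_counts), dict(reverb_counts))
```

Under this assumption each subject hears 18 clips: 6 per method, 6 of each sound, 9 with reverb and 9 without, which matches the figures given.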


All testing conditions were exactly the same. (See Section 3.5 for experimental set up).

3.4 Subjects used

It was decided that there would be 12 subjects for this test: 6 who would be considered trained (2 or more years' experience working in the audio field) and 6 untrained (little or no experience), helping to analyse any differences that may occur between these two groups.

As it was expected that the results would be invariant between subjects it was felt that

12 subjects would be a fair number (Levitin, 1999).

3.5 Experimental setup

As the aim of this test was to investigate how well humans can localise sound, it was felt that, to make it largely applicable to video games, all the equipment used should represent what the average consumer may have at home.

The equipment that was used was as follows:

• Standard 19” TV for stereo output

• Sony MDR150 DJ Headphones for 3D sound output

• Logitech X 530 5.1-CH PC multimedia home theatre speaker system for 5.1

playback

The speakers were set up based upon recommendations from Dolby Digital (2010) in

order to obtain maximum immersion from the surround sound system.


Fig 6- Recommended 5.1 set up 8”-12” from TV

Dolby (2010)

It is important to note that before the test all subjects were briefed on each dispersion

method and the benefits and disadvantages of each.

3.6 Questionnaire design

Please find a copy of the questionnaire in appendix B.

In asking the subjects to describe the location of a source, the use of circles was inspired by experiments on human localisation of speech by Begault and Wenzel (1993), in which the subjects had to dot the location of a source on a grid (Fig. 7).

Fig 7- Azimuth and Elevation Sources for Speech Stimuli

(Begault and Wenzel, 1993)

The questions asked after the main part of the test were intended to add context to the results. For example, if a subject noted down completely inaccurate results but said that it was very easy to localise the sound sources, it would be interesting to explore why this could be the case. Furthermore, if a subject localised very well through 3D sound but said that he/she found localisation easy through 5.1 surround sound, this could also be questioned, ultimately allowing for expansion across a number of topics that could arise throughout testing.

3.7 Experimental procedure

Before the testing began all of the subjects were made aware of the aim of the report and why the test was being conducted, and were shown an example questionnaire to ensure correct completion.

As the samples were being played the subjects noted down where each sample was located. Once the localisation test was complete the listener filled out the questions so that the results could be expanded upon.

The test was limited to a 10 minute time period to ensure that the listeners did not

begin to fatigue.

For each subject the samples were played in a differing order to allow for counterbalancing, helping to avoid ambiguous results that may occur through testing (Lane, 2007). The order for each subject was specified in advance to ensure that all subjects heard the same clips but in differing orders. This helped to reduce the effect of any errors that the subjects may have made during testing, leaving the results more justified and comparable.
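A simple way to counterbalance presentation order is to rotate the running order from one subject to the next. The sketch below is an assumption about how such orders could be generated, not a record of the orders actually used; the clip labels are hypothetical.

```python
from collections import Counter


def rotated_orders(items, n_subjects):
    """Give each subject the same items, starting one position later each time."""
    count = len(items)
    return [items[s % count:] + items[:s % count] for s in range(n_subjects)]


CLIPS = ["clip A", "clip B", "clip C"]  # hypothetical labels
orders = rotated_orders(CLIPS, 12)

# How often each clip is heard first across the 12 subjects.
first_counts = Counter(order[0] for order in orders)
print(orders[0], orders[1], orders[2])
print(dict(first_counts))
```

With 12 subjects and three items, each item is heard first by four subjects, so any effect of position is spread evenly across the items.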

4.0 RESULTS

The experiment gave interesting results that proved to be fairly inconsistent across all sound dispersion methods. Both trained and untrained listeners showed both accurate and inaccurate results throughout the experiment; however, given this inconsistency, the accurate results could possibly be put down to lucky guessing. It can also be seen that some people are better at localisation than others.

A result is deemed accurate when the listener has identified the correct quadrant and has placed the source within 15 degrees of its actual angle, unless stated otherwise.
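The accuracy criterion can be expressed as a short function. In this sketch it is assumed that azimuth is measured in degrees clockwise from straight ahead, that the quadrants are divided at 0, 90, 180 and 270 degrees, and that an error of exactly 15 degrees still counts as accurate; none of these conventions is stated explicitly in the test design.

```python
def quadrant(azimuth_deg):
    """Return 0 to 3 for the 90 degree quadrant containing the azimuth."""
    return int((azimuth_deg % 360) // 90)


def angular_error(actual_deg, perceived_deg):
    """Smallest angle between two azimuths, allowing for wrap-around at 360."""
    diff = abs(actual_deg - perceived_deg) % 360
    return min(diff, 360 - diff)


def is_accurate(actual_deg, perceived_deg, tolerance_deg=15):
    """Accurate means the correct quadrant and within the tolerance of the actual angle."""
    same_quadrant = quadrant(actual_deg) == quadrant(perceived_deg)
    return same_quadrant and angular_error(actual_deg, perceived_deg) <= tolerance_deg


print(is_accurate(40, 50))   # same quadrant, 10 degrees out
print(is_accurate(85, 95))   # 10 degrees out but in the neighbouring quadrant
print(is_accurate(10, 40))   # same quadrant but 30 degrees out
```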

The subjects were also asked to score various factors from 1 to 10; 1 being extremely

easy and 10 being very difficult.

For the following results brown indicates the actual placement of the sound, the black

indicates trained listeners and the untrained listeners are shown in red.

4.1 Stereo localisation

In stereo there was no real difference between trained and untrained listeners in distinguishing left from right. The left and right detection rate was as follows: 27 of 36 sounds were detected correctly by trained listeners and 26 of 36 by untrained listeners. Although the vast majority were perceived correctly, distance and exact azimuth were judged somewhat less accurately.

Although general localisation of sound sources is fairly high, perfect determination is rare but does appear across different listeners (fig. 8). It was found that these particular listeners, both trained and untrained, were consistent throughout the test.

Perceiving and judging distance could be considered a somewhat more difficult task, as there is no scale on the questionnaire indicating the actual physical size of the circle; even if there were, the listener would not have any visual or remembered reference for distance. It was therefore expected that the distance judgements would not be especially accurate. However, it can be seen that when a listener is accurate in judging the location, the determination of distance is also mostly correct.

Fig. 8 – Test 1 Part 1

There are also points in the experiment which show that the accuracy of sound source

localisation is low but the judging of distance is accurate.


Stereo projection also shows a high rate of front to back confusion, which would be expected as there are no visual cues and the placement of the loudspeakers makes it difficult to recognise what should be in front and what behind.


The results for the judging of height were completely inconsistent and no subject was particularly accurate; again, without the aid of visual cues, judging height proved difficult for the subjects.

When asked how difficult it was to localise sound using a stereo dispersion method on

a scale of 1 to 10 both trained and untrained listeners listed an average of 6.

4.2 Surround sound localisation

It would be expected that the rate of correct left and right localisation would be higher, and front to back confusion lower, when using a surround sound system; however, the left and right differentiation rate and overall accuracy were somewhat lower. 26 of 36 sound sources were correctly perceived as left or right by trained listeners; for untrained listeners the rate was considerably lower at 19 of 36. The accuracy of localisation was also not very high compared to stereo: although some listeners localised accurately using this dispersion method, the majority did not.

No elevation was used in the surround sound samples, but the subjects were not told this. Interestingly, the subjects did note various changes in height for this part of the test.

When asked how difficult it was to localise sound using surround sound dispersion

method on a scale of 1 to 10 trained listeners listed an average of 3 and untrained

listed an average of 6.

4.3 3D Sound localisation

3D sound proved to be the most difficult dispersion method for localising a sound source. It had the lowest rate of correct left and right distinction: 18 of 36 for trained listeners and only 13 of 36 for untrained. Interestingly, 6 of the 18 sources that were localised correctly were in fact localised perfectly, the largest number of perfect localisations across the methods.


The perception of distance was not as accurate as it seemed to be for the stereo and surround dispersion methods, and was weakest with 3D sound.

There was also less front-to-back confusion than expected, even without the use of visual cues.

When asked how difficult it was to localise sound using the 3D sound dispersion

method on a scale of 1 to 10 trained listeners listed an average of 5 and untrained

listed an average of 4.
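The left and right detection counts reported in sections 4.1 to 4.3 are easier to compare as percentages. The short sketch below only restates the counts already given (each out of 36); the variable names are illustrative.

```python
TOTAL_SOUNDS = 36

# Correct left/right judgements reported for each method and group.
correct = {
    "stereo": {"trained": 27, "untrained": 26},
    "5.1": {"trained": 26, "untrained": 19},
    "3D sound": {"trained": 18, "untrained": 13},
}

rates = {
    method: {group: round(100 * count / TOTAL_SOUNDS, 1) for group, count in groups.items()}
    for method, groups in correct.items()
}

for method, groups in rates.items():
    print(method, groups)
```

This gives 75.0% and 72.2% for stereo, 72.2% and 52.8% for 5.1, and 50.0% and 36.1% for 3D sound (trained and untrained respectively).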

4.4 Localising in the presence of reverb

When asked how difficult it was to localise sound in the presence of reverb on a scale

of 1 to 10 trained listeners listed an average of 7.5 and untrained listed an average of

7.

It seems that as the reverb becomes more obvious it confuses localisation, especially in 3D sound, to a greater degree than subtle reverberation does. When reverberation is increased past 30ms, reflections begin to be perceived as echoes and listeners become disorientated and unsure of where the sound originates. This is especially apparent with distance, and is possibly due to factors associated with the precedence effect.
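To give a sense of scale for the 30ms figure, the sketch below converts a reflection delay into the extra distance the reflected sound must travel. It assumes that the 30ms refers to the delay between the direct sound and a reflection, and takes the speed of sound as 343 m/s (dry air at roughly 20 °C); both are assumptions made for illustration.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C
ECHO_THRESHOLD_MS = 30.0    # threshold quoted in the text


def extra_path_m(delay_ms):
    """Extra distance travelled by a reflection that arrives delay_ms after the direct sound."""
    return SPEED_OF_SOUND_M_S * delay_ms / 1000.0


def is_heard_as_echo(delay_ms, threshold_ms=ECHO_THRESHOLD_MS):
    """True when the delay exceeds the threshold at which reflections are heard separately."""
    return delay_ms > threshold_ms


print(round(extra_path_m(ECHO_THRESHOLD_MS), 2))
print(is_heard_as_echo(45.0), is_heard_as_echo(10.0))
```

On these assumptions a reflection needs to travel roughly 10 metres further than the direct sound before it reaches the 30ms threshold, which suggests the effect is tied to large virtual rooms or distant virtual walls.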

It can be seen that out of 54 samples containing reverb 24 were localised accurately.


4.5 Localising differing frequencies

It can be seen that there is differing localisation success across the three samples

used with the whistle having the worst success rate. It is clear that both the bucket

tapping and finger clicking were the easiest to localise and this is proven by the

accuracy of certain subjects both trained and untrained.

4.6 Favoured form of localisation

Interestingly, there is a clear divide in the favoured sound dispersion method between the trained and untrained groups, with the trained subjects opting for surround sound and the majority of untrained subjects opting for 3D sound.

It can be seen that the untrained listeners are split across all three of the dispersion methods when asked with which it was easiest to localise sound; however, the majority chose 3D sound as the favoured form of localisation.

All trained subjects answered that surround sound was the most effective sound

dispersion method and then went on to answer that surround sound is the most

favoured projection technique.

[Bar chart, scale 0 to 70. Categories: Surround Sound, 3D, Stereo, More or Less The Same. Series: Trained, Untrained.]

Fig. 11- With which dispersion method is it easiest to localise sound?


[Bar chart, scale 0 to 70. Categories: Surround Sound, 3D, Stereo, No Preference. Series: Trained, Untrained.]

Fig. 12- Regardless of cost, which method would you enjoy playing video games through most?

5.0 Discussion

The results suggest two major factors:

• 3D sound gives the weakest form of sound source localisation

• Surround Sound is generally the favoured form of sound localisation and seems

to be considerably more useful in localisation of sound sources around the

head.

The main reason 3D sound localisation was not so successful is most likely the difference between individualised and non-individualised HRTFs. As previously mentioned, a generalised set of HRTFs obtained by Nagoya University, Japan was used. If a listener's personal HRTFs were completely different to the Nagoya HRTFs, localisation would be poor; if, on the other hand, the listener's HRTFs were coincidentally close to the Nagoya HRTFs, localisation would in turn be more accurate. This is supported by the untrained listener in test 5, who seemed to be fairly consistent across most tests.


Part 3

Part 6

Fig. 13 – Test 5 Parts 3 and 6

It can be seen that although the listener was not totally accurate there is a noticeable

degree of consistency across both 3D sound tests.

There are other circumstances in which listeners both trained and untrained are

completely inaccurate throughout testing.


Part 3

Part 6

Fig. 14 - Test 3 Parts 3 and 6

The fact that these particular subjects are inaccurate could be put down to weak

localisation skills, however, it is also highly possible that the used HRTF data is

completely different to the test subject’s own individualised HRTFs.

There was much front to back confusion across all the tests. Without the use of visual cues this was to be expected, especially with the stereo recordings, although it was present throughout all of the recordings. When playing video games, gamers will have the aid of visual cues to distinguish the direction of a sound. For example, if a player has trouble distinguishing the direction of a sound, much like in fig. 15, and is unsure whether the sound is coming from in front or behind, on-screen actions will help the player determine this.


Part 3

Part 6

Fig. 15 – Test 2 Parts 3 and 6

It was also found that the test subjects only really experienced front to back confusion for sounds directly in front and behind; sounds placed to the sides were not really affected. Maher and Reed (2009) found a similar pattern in the front left and rear right quadrants, and explained that the possible causes could be an “intrinsic nuance” of the HRTFs used or potentially the spectral content of the sounds used during the test. Front to back confusion appeared an equal number of times across both the whistle (1809Hz) and bucket (139Hz) samples.

Interestingly, all but one of the obvious front to back confusion errors were made when listening to completely anechoic recordings. The reverb tails of the recordings, although sometimes confusing to the listener, help the listener understand where the sound is originating from in terms of both direction and distance (Rumsey, 2001). Too much reverb, however, can cause auditory effects such as flutter and can disorientate the listener, which was evident throughout testing.


Subjects constantly commented on the confusion that reverb brought to localisation, but this seemed only to be the case when the reverb was clearly obvious and, due to the placement of the recordings in the virtual space, created echoes and flutter. Furthermore, within the questionnaire most subjects noted that reverb made localisation considerably more difficult, but no subject made any comment when the reverb was only slight, even though they may have known it was present.

It is possible that subjects became confused due to the precedence effect, in which the listener takes the direction of the first sound heard to be the direction of the source (Rumsey, 2001), although this may be considered unusual as the sound travelling directly from the source should arrive first. As some sounds incorporate flutter due to the positioning of the sound source, listeners may be unable to tell which is the original sound and which is the delayed echo.

This would not normally be expected if these experiments were replicated with actual physical objects emitting sounds in the real world. However, these ambiguous occurrences could possibly be put down to the effectiveness of the software, as AE attempts to factor in all elements discussed within section 3.0.

Head movement would not have affected any of the results as each subject sat in exactly the same position for each test. The only possible difference between tests would be the height of each subject, which should not have had a major effect on the results.

The nature of each sample is related to the rate of localisation accuracy, and it is believed that this could be due to the spectral properties of the sound. Almost all subjects commented that the whistle sample was the hardest to locate, and the results also show that this clip had the worst localisation success; the two other samples had a greater rate of localisation accuracy.

In earlier tests on frequency localisation Blauert (1974) found that certain frequency ranges, named ‘directional bands’, are associated with particular areas around the head: bands around 1200Hz and 12000Hz relate to the rear of the azimuth plane, 300-600Hz and 3000-6000Hz to the front, and a band near 8000Hz to above the head. These figures should therefore show increased success for all three examples in the rear and for samples close to the head; however, the results fail to show any particular relationship of this kind. It seems that localisation accuracy is very similar on all sides. With the added problem of front to back confusion and the lack of visual cues it is difficult to know if the results truly reflect this theory.

Height was also an ambiguous factor within this dispersion method, with the vast majority of subjects perceiving the audio in an unintended position. However, a small number of subjects were accurate when perceiving height in some circumstances. Reasons for this can include the factors discussed in section 4.1, in which the subject is unsure of perceived distance because there are no visual or cognitive cues.

5.1 Actual Use of 3D Sound in video games

It is clear from the results that 3D sound is as yet largely too ineffective to be used in video games; however, this may be due to a number of factors, including the overall effectiveness of the software used. It is evident that companies have already invested time and money into creating 3D sound audio engines and have succeeded in doing so; however, these engines have not yet been utilised in mainstream gaming.

Interestingly, when the subjects were asked which method they would prefer to play video games through, trained subjects answered surround sound whereas untrained subjects answered 3D sound, although the majority did not find this method the most effective. There are a multitude of reasons why this could be the case.

Trained users are most likely familiar with binaural audio and knew about the phenomenon before testing began, meaning there was a possibility of biased opinions within testing, although all subjects were asked to answer with an open mind. Some untrained users, however, did not previously know about this effect, and the novelty and originality of the effect possibly caused this set of subjects to be fonder of this method by the end of testing.

Although some subjects may find 3D sound more effective than other methods, it remains that, for this testing at least, this particular method was the weakest, and it is possible that this is due to individualised and non-individualised HRTFs. Although the set of HRTFs used was generalised, it is possible that there were some ambiguous areas that made localisation easier and/or harder at different points around the head.


Concerning the current development of 3D sound outside of this research, one cannot be sure of the quality of the engines that have been produced, as they have yet to be heard. As a number of companies have tried and have possibly not succeeded in breaking into commercial media, it is possible that developers either do not see a market for 3D sound or do not see the quality as fit for inclusion in games.

6.0 CONCLUSION

To summarise, it is evident that further work must go into the creation of effective HRTF emulation in order for 3D sound to become a success; although the format has shown some success during this experiment, other existing sound dispersion methods seem to be stronger.

Although 3D sound did struggle to be effective, it can be seen that progress has been made in this area over a number of years. However, no accessible game utilising 3D sound was released until very recently, and that release has not proven to be a great success.

Possibly the weakest and most ambiguous factor of all was height: localising height through all sound dispersion methods proved difficult for the majority of the subjects. Distinguishing between front and back also became difficult for subjects at some stages.

It has been shown that many areas need to be considered in order to create a lifelike virtual soundscape, with each factor bearing as much relevance as the next; this must be considered at all times if audio in video games is to become more immersive and realistic. 3D sound technology is also used outside of video gaming, meaning it could be possible to bring this technology into this particular virtual reality world; furthermore, advances in video gaming may be able to bring 3D sound into other mainstream media such as film.

The general aim and all objectives, with the exception of one, were completed and

each brought valuable information to the project that helped to form an in depth

understanding of the subject at hand. The objective that was not met was altered in

order to improve the validity of the results greatly improving the final findings.


It was felt that the initial research was greatly beneficial, as much information already exists on 3D sound, more so than originally anticipated. This helped to form a solid understanding and also aided in creating the experiments; efforts could then be directed at finding new information.

It is still believed that 3D sound can bring about a gaming experience of heightened

immersion and further enjoyment at a lower cost and with less hassle; however, 3D

sound is currently confusing and inaccurate and ultimately needs to be perfected in

order for this technology to reach mainstream gaming.

7.0 RECOMMENDATIONS FOR FURTHER WORK

If this project were to be continued it would be ideal to create a 3D sound engine for a

short “level” in a playable video game in order to determine the actual effectiveness of

the phenomenon.

However, before this can be achieved, an increased level of research will need to be placed into HRTFs. A method utilising one or more sets of HRTFs should ideally be developed in order for all listeners to be completely comfortable and for localisation to be as accurate as possible. This could include generalised HRTFs, HRTFs of particularly good localisers or even a format that allows personalised HRTFs, possibly through exact replicas, adjustments of pre-existing sets or the ability to choose between multiple sets.

In repeating this particular experiment it would be interesting to conduct the testing across a wider range of subjects of different ages using different forms of HRTFs; this would help to form a better understanding of the true effect of this factor.

It would also be interesting to use visual cues during testing to see if this influences the results in any way; the visual cues may not necessarily have to coincide with the position of the sound.


8.0 REFERENCES

Angus J. A. S, Howard D. M (1999). Acoustics and Psychoacoustics. 4th ed. Oxford:

Focal Press.

Anon. (1998). Aureal and Create Engage in Legal Skirmish. [online] Available:

http://web.archive.org/web/19990829025202/www.aureal.com/cgi-

bin/pub/display.pl?template=press_aur_detail.htm&serial=76. Last accessed

04/03/2010

Apple. (2010). Aves. [online] Available:

http://itunes.apple.com/us/app/aves/id321295493?mt=8. Last accessed 15/04/2010

ASTi. (2010). ACE 3D "Soundfield Reconstruction". [online] Available: http://www.asti-

usa.com/telestra4/ace/3dsound/advantages.html. Last accessed 04/03/2010

Begault, D. R (1994). 3D sound for virtual reality and multimedia. London: Academic Press, p3, p39.

Blauert, J (1974). Spatial Hearing. Cambridge, Massachusetts: The MIT Press

Bridgett, R. (2007). Designing for Next-Gen Game Audio. [online] Available:

http://www.develop-online.net/features/65/Designing-for-Next-Gen-Game-Audio. Last

accessed 23/4/2010

Cheene, J (2005). Handbook for Sound Engineers ed Ballou, G.M. 3rd ed. Oxford:

Focal Press.

Cheng, C, Wakefield, G (2001) Introduction to Head-Related Transfer Functions:

Representations of HRTF in time, frequency and space, Audio Engineering Society, 49

Dolby. (2010). Little Things Make a Big Difference. [online] Available:

http://www.dolby.com/consumer/setup/index.html. Last accessed 12 March 2010

Farnell, A (2006). Designing Sound. London: Applied Scientific Press.


HOLISTIKS. (2010). AMPHIOTIK ENHANCER ST . [online] Available:

http://www.holistiks.com/amphiotik/modules.php?name=_hes_Documents&file=_produ

cts_amen_st. Last accessed 20/03/2010

Gardner (1999) 3D Audio and Acoustic Environment Modelling Waves Arts Inc

Gehring, B. (1997). Why 3D Sound Through Headphones?. [online] Available:

www.fp3d.com/papers/whyheadphones.pdf. Last accessed 20/03/2010

Holman, T (2008). Surround Sound. 2nd ed. Oxford: Focal Press

Howley, L. (2009). Turtle Beach HPA2 PC Headset Review. [online] Available:

http://pcgamingcorner.com/wordpress/?p=1485. Last accessed 27/4/2010.

Lane DM (2007). Counterbalancing. [online] Available:

http://davidmlane.com/hyperstat/A128919.html. Last accessed 15/04/2010

Interface Labs. (n.d). Spatial Sound Research. [online] Available:

http://interface.cipic.ucdavis.edu/CIL_html/CIL_research.htm. Last accessed 12 March

2010

Maher, Reed (2009) An Investigation of Early Reflection’s Effects on Front-Back

Localisation of Spatial Audio, Audio Engineering Society International Convention 127,

New York, USA 2009

NanoStuff (1999). Why don't game developers actively pursue binaural sound

technologies?. [online] Available: http://www.reddit.com/r/gaming/comments/ablrh. Last

accessed 04/03/2010.

Pierce. J (1999). Music, Cognition and Computerized Sound ed. Cook, P. R.

Cambridge, Massachusetts: The MIT Press.

RAD Tools (2010) Miles Sound System Game Developer Magazine, 17

Rayleigh, L (1907) On our perception of sound source direction Philosophical Magazine 13

Rumsey, F (2001). Spatial Hearing. Oxford: Focal Press.


Silva, R. (2010). 5.1 vs 7.1 Channel Home Theatre Receivers - Which is Right For

You?. [online] Available:

http://hometheater.about.com/od/hometheateraudiobasics/qt/5-1vs7-1diff.htm. Last

accessed 12/03/2010

Schoenherr. (1999). IMAX film format. [online] Available:

http://history.sandiego.edu/gen/filmnotes/imax.html. Last accessed 04/03/2010

Tandefelt, M. (2009). True Binaural 3D/HRTF. [online] Available:

http://www.torquepowered.com/community/forums/viewthread/86563. Last accessed

04/03/2010

Truax, B. (1999). Binaural Hearing. [online] Available: http://www.sfu.ca/sonic-studio/handbook/Binaural_Hearing.html. Last accessed 03/03/2010.

Tullis, M. (2006). Video Game Surround Sound for the Next Generation. [online] Available:

http://www.dolby.com/consumer/experience/dolbycast/transcript/3-games-in-surround-sound.html.

Last accessed 12 March 2010

Walder, C (2006) Intelligent Audio for Games, Audio Engineering Society International

Convention, 120, Paris, France 2006

Yost (1993) Perceptual Models for Auditory Localization Audio Engineering Society

International Conference, 12, Copenhagen, Denmark 1993

9.0 BIBLIOGRAPHY

Ando, Sato (2001) Apparent Source Width (ASW) of Complex Noises in relation to

Interaural Cross Correlation Function, Kobe University, Japan

Barry, D, Coyle, E, Lawlor, B (2004) Real Time Sound Source Separation: Azimuth

Discrimination and Resynthesis, Audio Engineering Society International Convention

117, San Francisco, USA 2004

Cha, Ryu, Seo (2008) Implementation of 3D sound using grouped HRTF, Audio

Engineering Society International Conference 34, Jeju Island, Korea, 2008

Dale, W (1999) A Machine-Independent 3D positional sound application

programmer interface to spatial audio engines Audio Engineering Society International

Conference, 16 Rovaniemi, Finland 1999

Dolhasz, A (2009) Microphone Arrays for Surround Sound Mixing and Recording,

Birmingham City University

Furse, R (2009) Building an OpenAL Implementation using Ambisonics, Audio

Engineering Society International Conference 35, London, England 2009

Griesinger, D (2009) Architectural Acoustics: Perception and Binaural Effects in

Architectural Acoustics, Acoustical Society of America, 125

Griesinger, D (2001) The psychoacoustics of listening Area, Depth, Envelopment, in

Surround Recordings and their relationship to microphone technique, Audio

Engineering Society International Conference 19, Schloss Elmau, Germany 2001

Hiipakka, J (n.d) Implementation of 3D Sound in a Virtual Room, Helsinki University of

Technology

Huopaniemi. J (1999) Virtual Acoustics and 3-D Sound in multimedia signal

processing, Helsinki University of Technology

Jin. C, Corderoy. A, Carlile. S et al (2000) Spectral Cues in Human Sound Source

Localisation, University of Sydney

Kistler, L, Wightman, F (1990) Hearing in 3 Dimensions: Sound Localisation, Audio

Engineering Society International Conference 8, Washington D.C., USA 1990

Kistler, L, Wightman, F (1991) A model of head-related transfer functions based on

principal components analysis and minimum-phase reconstruction, Acoustical Society of

America, 91

Lluis-Garcia, Mlynek et al (2004) Advanced 3D Audio Algorithms by Flexible, Low

Level Application Programming Interface, Audio Engineering Society International

Convention 116, Berlin, Germany 2004

Moore, B (1999) Controversies in Spatial Audio, Audio Engineering Society

International Conference 16, Rovaniemi, Finland 1999

Neukom, M (2007) Ambisonic Panning, AES convention paper 7297, 123

Schmidt, B (2002) Playing with sound: Audio Hardware and Software on XBOX, Audio

Engineering Society UK conference 17, London, England 2002

Sen, R A (n.d) A system for HRTF calibration through comparison of test sounds, U.S

Patent Application

Wenzel. E, Miller. J, Abel J (2000) Sound Lab: A real-time software based system for

the study of spatial hearing, Audio Engineering Society International Convention 108,

Paris, France 2000

Yamada et al (1978) OUT-OF-HEAD Localisation headphone listening device, US

Patent

Wenzel. E, Kistler. M, Wightman. F (1993) Localising using nonindividualised head

related transfer functions, Acoustical Society of America, 94

9.0 APPENDICES

Appendix A

Minimum Audible Angle (MAA)

The MAA for the detection of a source varies around the listener’s head. Directly in

front of the listener, the MAA is around 1 degree in azimuth and 3 degrees in

elevation. These figures become progressively larger for sounds located

behind the listener (Holman, 2008).
