An exploration into the effectiveness of 3D
Sound on video games
By Jonathan Tozer
BIRMINGHAM CITY UNIVERSITY
BSc (Hons) Sound Engineering and Production
April 2010
Supervisor: Islah Ali-Maclachan
ABSTRACT
This project set out to explore how binaural audio, or 3D sound, can be utilised in
the medium of video games, the advantages it can bring and the processes that the
technology must undergo in order for the effect to be perceived correctly. Apart
from extensive research into psychoacoustics and the digital implementation of
3D sound, comparisons have been drawn with stereo and 5.1 surround sound through
testing conducted on human subjects. All three dispersion methods were tested in
order to find the localisation accuracy and effectiveness of each. The main
influence on localisation accuracy is seen to be the head-related transfer function
(HRTF) and how it is used in the creation of 3D sound, as there are many possible
ways to apply it. This project used generalised HRTFs for testing and ultimately
found 3D sound to be weak under this test method; this was found across a number of
subjects, in comparison with surround sound and even stereo.
My greatest appreciation goes to Izzy for helping me through this project and giving me
support and guidance throughout. Also, I would like to thank Andy Bourbon and the
rest of the staff at the School of Digital Media for their constant support over the last
three years.
And finally to my parents for their support, guidance and love. Thank you.
Table of Contents
• Glossary
• List of Figures and Plates
1.0 Introduction
1.1 Background
1.2 Problem Definition
1.3 Scope
1.4 Rationale
1.5 Aims and Objectives
2.0 Review of Existing Knowledge
2.1 Introduction
2.2 The effect of 3D sound on video games
2.2.1 3D sound or surround sound
2.2.2 Headphones and loudspeakers
2.3 The auditory space
2.3.1 Azimuth and elevation perception
2.3.2 The lateralization paradigm
2.3.3 Interaural timing difference
2.3.4 Interaural intensity difference
2.3.5 The precedence effect
2.3.6 Head movement
2.3.7 Moving sources
2.4 The cone of confusion
2.5 Head-Related Transfer Function
2.5.1 Individualised and non-individualised HRTF
2.6 Distance, depth and the environment of the virtual world
2.6.1 Distance and depth
2.6.2 Apparent source width
2.6.3 Inside-the-head localisation
2.7 The implementation of 3D sound
2.7.1 Current game audio techniques
2.7.1.1 Current use of 3D sound in videogames
2.7.1.2 The use of 3D sound outside of videogames
2.7.2 3D sound implementation
2.7.2.1 HRTF implementation
2.7.2.2 Generalised HRTF
2.8 Conclusion
3.0 Methodology
3.1 Initial testing method
3.2 Final testing design
3.3 Testing procedure design
3.4 Subjects used
3.5 Experimental set up
3.6 Questionnaire design
3.7 Experimental procedure
4.0 Results
4.1 Stereo localisation
4.2 Surround Sound localisation
4.3 3D sound localisation
4.4 Localising in the presence of reverb
4.5 Localising differing frequencies
4.6 Favoured form of localisation
5.0 Discussion
5.1 Actual use of 3D sound in video games
6.0 Conclusion
7.0 Recommendations for further work
8.0 References
9.0 Bibliography
10.0 Appendices
Glossary
AE- Amphiotik Enhancer
API- Application Programming Interface
DSP- Digital Signal Processing
FIR- Finite Impulse Response
FPS- First Person Shooter
HRTF- Head-Related Transfer Function
IHL- Inside-The-Head Localisation
IID- Interaural Intensity Difference
ITD- Interaural Timing Difference
MAA- Minimum Audible Angle
List of Figures and Plates
Figure No. Title
1 Recommended 7.1 Set up 8”-12” from TV
2 Azimuth and Elevation
3 Cones of Confusion
4 The differing HRTF of multiple subjects
5 Default Distance Model used by Waves Art Acoustic Environmental Modelling
6 Recommended 5.1 Set up 8”-12” from TV
7 Azimuth and Elevation Sources for Speech Stimuli
8 Test 1 Part 1
9 Test 1 Part 4
10 Test 2 Part 3
11 With which dispersion method was it easiest to localise sound?
12 Regardless of cost which method would you enjoy playing games through most?
13 Test 5 Parts 3 and 6
14 Test 3 Parts 3 and 6
15 Test 2 Parts 3 and 6
Plate No. Title
1 HRTF Measurement of a Human Subject
1.0 INTRODUCTION
1.1 Background
3D sound is an audio format which enables the recreation of a full 3D landscape
through a set of standard headphones (Begault, 1994). Unlike other sound dispersion
formats that enable surround sound (including 5.1 and 7.1), 3D sound allows sound
sources to be perceived from all apparent directions on both the azimuth and elevation
planes.
This project aims to establish whether there is any real advantage in implementing 3D
sound in video games and to detail the effects that this sound dispersion method will
have on the player in comparison to the methods in use today.
3D sound can also be referred to as binaural audio; however, this project will always
refer to the phenomenon as 3D sound.
1.2 Problem definition
As technological capabilities have grown over recent years so has the demand for
more lifelike games. Although graphics and visuals have taken a huge leap forward
in terms of realism, audio in videogames remains limited in terms of sound dispersion
and overall quality (Bridgett, 2007). With a large number of players using
speakers built into a TV, it seems that however much audio improves in terms of
sound design it will be limited by the placement of speakers in the home.
By using 3D sound the gamer will be able to perceive sound from all directions in the
gaming world just by wearing a set of headphones. It will appear to the gamer as if
they are inside the head of the character being controlled on screen, hearing sounds
from behind, from above and from any other possible direction. This will help to create
an almost real-life experience for the gamer and make video games more captivating in
general. The effect will only suit certain styles of game because of their game play and
design; these may include genres such as FPS and 3rd Person.
1.3 Scope
From beginning to end this project has run for around one year and has aimed to cover
important subjects from psychoacoustics and sound source localisation
through to the advancements in immersion that 3D sound brings.
This project focuses more on what needs to be considered and how it affects
the listener than on how it is achieved. In the book 3D Sound for Virtual Reality and
Multimedia, Begault (1994) discusses the process of implementing the dispersion
technique in virtual reality applications. Although this is a critical part of creating a 3D
soundscape, this project has had nowhere near enough time to research and
understand the process fully; the implementation is therefore covered only briefly within
this project.
The design and build of the practical tests have also taken many months, and the tests
are intended to be a critical part of forming the final conclusion of the project.
1.4 Rationale
Although there has been research into the implementation of 3D sound for games
and other virtual reality systems, this project aims to outline the theories
and processes involved and to mention any major problems that developers
could face. More than this, the project aims to find the difference that 3D sound
makes to the player in comparison to other, more popular dispersion methods and to
find out whether it is of actual benefit.
3D sound has the potential to aid not only game developers in creating more immersive
and enjoyable games but also the players. Players will have an advantage in
localisation, which will be of particular help when playing online against other players
because the direction of sound sources can be detected more quickly. Although it may be
argued that this advantage is already at hand with surround sound systems, the added
bonus of elevation detection is not. Both 5.1 and 7.1 surround sound headphones have been
produced and are available to purchase in high street stores; this may indicate
that players are in fact actively seeking this advantage over opponents
(Howley, 2009).
1.5 Aim and objectives
Aim
To establish if 3D Sound can be effectively utilised in video games to create a more
lifelike gaming experience.
Objectives
• To research the forms of sound distribution that are currently used
within this area of the media, the way each is delivered and the advantages and
disadvantages of each.
• To investigate methods of sound dispersion in other areas of the media and beyond,
helping to provide technical information on sound source localisation.
• To investigate psychoacoustics and how humans are able to localise sound,
helping to discuss how sounds will be recreated in a virtual 3D environment.
• To develop a number of high quality pieces of audio for a piece of game footage,
each utilising a different form of sound dispersion for testing purposes.
• To conduct a number of tests covering localisation in a space and the overall
effectiveness of each sound dispersion method.
• To analyse and evaluate test results in order to conclude whether 3D sound is suitable
for the medium of video games.
2.0 REVIEW OF EXISTING KNOWLEDGE
2.1 Introduction
In order to create a fully 3D interactive soundscape in which a player can perceive full
3D sound it is important that there is a solid understanding of human sound source
localisation as well as many other factors, such as how sound will operate within a real
space. Begault (1994) explains that this is necessary so that, either in real time or
algorithmically, the characteristics of a particular sound source can be predicted and
processed in a way that ensures full replication of not only the space but also the ears
of the controlled on-screen character.
It is also important to analyse the current state of sound dispersion techniques in order
to create a comparison between the production and reproduction of 3D sound against
other technological formats that are currently used.
2.2 The effect of 3D sound on video games
3D sound will have many positive effects on how video games are played, the first of
which involves immersion in the medium. With the use of 3D sound the player will hear
exactly what the character is hearing on screen, including what is happening above,
below, behind and in front, making the interpretation of the audio seem almost real
(Gehring, 1997). Gehring goes on to discuss the advantages of 3D sound in multiple
areas, the benefits the effect adds to gaming and ultimately how it can be achieved,
including the hardware and computer capabilities that are needed and how the audio
will be consumed, whether through headphones or loudspeakers.
3D sound can also be used to the advantage of players as localisation becomes much
easier, meaning any events that take place outside of the player’s view can be
attended to as one would in the real world (Tullis, 2006). Whereas Gehring takes an
approach specifically looking at 3D sound, Tullis discusses the advantages surround
sound brings to gaming as a whole and how this can be adapted to
multiple game genres, from action to sport.
3D sound would most likely work better in some games than in others because of
the way the games are made and played. For example, a single player FPS game
would arguably utilise 3D sound most effectively, as the player sees through the
character’s eyes and hears through the character’s ears. 3rd person games, however, may be
slightly more difficult as the developer has to decide at what point the 3D sound is
processed, i.e. at the ears of the on-screen character or at the position of the camera.
Furthermore, when there are multiple players in one room 3D sound can become
impractical, as all players would have to wear headphones; the use of loudspeakers to
create the effect would become almost impossible.
2.2.1 3D sound or surround sound?
Surround sound can come in a variety of formats depending upon how many
speakers are utilised (5.1, 7.1 etc.) and, generally, using more speakers will bring a
greater sense of space around the listener’s head (Silva, 2010). However,
whilst these systems are effective they are also costly. For maximum effectiveness the
speakers also need to be set at fixed points based upon the listener position; at the
points between the loudspeakers localisation is said to be weakest (Holman, 2008).
Surround sound systems can ultimately be very complex to understand and set up (as
seen in fig. 1), as not only are there many different types but the placement of the
speakers in the home may have a huge effect on the result. Surround headphones
are also available; however, these will only give the illusion of surround
sound on a horizontal axis, not a full 3D illusion.
Fig 1- Recommended 7.1 set up 8”-12” from TV
Dolby (2010)
2.2.2 Headphones and Loudspeakers
For the effect of 3D sound to be truly maximised it is recommended that playback is
delivered through headphones, although it is possible to reproduce the effect through
loudspeakers. When delivering audio through loudspeakers there must be as little
crosstalk as possible (audio that, in this situation, is shared between ears). This
is to ensure that each ear hears only the intended signal and not the signal from the
other loudspeaker. If crosstalk is too great the 3D image will be disrupted, and even
the slightest head movement can disrupt the effect. Headphones give a dedicated
speaker to each ear, meaning crosstalk and head movement do not become a problem
(Cheenne, 2002).
Eliminating crosstalk would be very difficult, as the conditions for each ear would need
to be almost perfect in order for the effect to be perceived. This would leave
the phenomenon incredibly unstable, so using loudspeakers for 3D
sound is generally very impractical.
2.3 The auditory space
In order for a listener to pinpoint a perceived sound there needs to be a position of
reference; this enables the location of a sound source to be described through
height, direction and distance. These measurements of localisation will also aid in
describing the psychoacoustic properties discussed in the sections that follow.
The two main measurements for describing the location of a sound source are azimuth
and elevation (Blauert, 1974). These two conventions describe the perceived position
of a source on a sphere around the listener’s head but do not offer a description
of distance (Fig. 2).
Fig. 2 – Azimuth and Elevation (Begault, 1994, p3)
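As a minimal sketch of how azimuth and elevation, together with a distance value, could describe a source position inside a game engine, the Python function below converts the three measurements into listener-centred Cartesian coordinates. The axis convention (0 degrees azimuth straight ahead, positive azimuth to the listener's right, 0 degrees elevation at ear level) and the function name are assumptions made for illustration; they are not taken from Begault (1994).

```python
import math


def spherical_to_cartesian(azimuth_deg, elevation_deg, distance_m):
    """Convert a listener-centred direction into (x, y, z) in metres.

    Assumed convention: x points to the listener's right, y points
    straight ahead and z points up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = distance_m * math.cos(el) * math.sin(az)  # left/right
    y = distance_m * math.cos(el) * math.cos(az)  # front/back
    z = distance_m * math.sin(el)                 # up/down
    return x, y, z


# A source 2 m away, directly to the right at ear level.
print(spherical_to_cartesian(90, 0, 2.0))
```

Azimuth and elevation alone only fix a direction; the distance argument is what places the source at a point in the virtual space.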
2.3.1 Azimuth and elevation perception
In discussing the localisation of a sound source there are many psychoacoustic
theories that need to be addressed in order to understand the implementation of 3D
sound and the effects that it will have.
Begault (1994) describes the most important cue in localising sound as the difference
between the waveforms at each ear. Interaural Intensity Differences (IID) and Interaural
Time Differences (ITD) are said to be the two main cues in the localisation of sources on
the azimuth plane; these two concepts make up the duplex theory (Rayleigh, 1907).
2.3.2 The lateralization paradigm
Lateralization is a topic discussed at length by Begault (1994, p39), in which different
psychoacoustic cues are manipulated in order to find the “relative sensitivity of
physiological mechanisms.” Lateralization arises when a sound is perceived inside the
head and is produced by manipulating ITDs and IIDs over headphones. Begault goes on
to explain that manipulating these cues can lead to a somewhat accurate prediction of
the physiological properties of human sound localisation.
Lateralization experiments therefore attempt to reproduce interaural differences
artificially.
2.3.3 Interaural time difference
ITD describes the difference in the time at which a sound reaches each ear. The ear
furthest from the source will receive the sound a very short time after the ear
closest to it. This is due to the separation of the ears and the irregular shape of the
human head. The resulting phase shift between the ears is then used to resolve the
direction of the sound source (Angus, Howard. 1996).
It is also indicated that the phase shift begins to become less effective above 1 kHz
(Rumsey, 2001) and that the cue becomes ineffective at around 1.5 kHz (Angus,
Howard. 1996).
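To give a sense of scale, the sketch below estimates ITD using Woodworth's spherical-head approximation, ITD = (r / c)(θ + sin θ). This formula is not taken from the sources cited above; it is a commonly used simplification that treats the head as a rigid sphere and the source as distant, and the head radius and speed of sound are assumed typical values.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius
SPEED_OF_SOUND = 343.0   # m/s in air at roughly 20 degrees C (assumed)


def woodworth_itd(azimuth_deg):
    """Approximate ITD in seconds for a far-field source.

    Valid for azimuths between 0 (straight ahead) and 90 degrees
    (directly to one side) under the spherical-head assumption.
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must lie between 0 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))


for azimuth in (0, 30, 60, 90):
    print(azimuth, round(woodworth_itd(azimuth) * 1e6), "microseconds")
```

Under these assumptions the largest ITD, for a source directly to one side, is well under one millisecond, which illustrates how short the delays involved are.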
2.3.4 Interaural intensity difference
IIDs become effective above around 1.5 kHz due to the head shadowing effect, which
occurs when the wavelength is small relative to the size of the human head (Rumsey, 2001).
When a sound source moves away from the median plane and further around the
azimuth plane, the ear furthest from the source will receive an audible reduction in level
due to obstruction by the head (Angus, Howard. 1996).
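The relationship between wavelength and head size can be checked with simple arithmetic. The sketch below compares the wavelength of a tone with an assumed head diameter of 0.175 m; both this figure and the speed of sound are illustrative assumptions rather than values from Rumsey (2001).

```python
SPEED_OF_SOUND = 343.0    # m/s in air (assumed)
HEAD_DIAMETER_M = 0.175   # assumed typical head diameter


def wavelength_m(frequency_hz):
    """Wavelength in metres of a tone at the given frequency."""
    return SPEED_OF_SOUND / frequency_hz


def head_to_wavelength_ratio(frequency_hz):
    """Head diameter expressed as a multiple of the wavelength."""
    return HEAD_DIAMETER_M / wavelength_m(frequency_hz)


for frequency in (250, 1500, 6000):
    print(frequency, "Hz:",
          round(wavelength_m(frequency), 3), "m,",
          "ratio", round(head_to_wavelength_ratio(frequency), 2))
```

A ratio well below 1 means the wave is much longer than the head and bends around it with little loss, whereas a ratio above 1 means the head is large enough to cast an acoustic shadow.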
2.3.5 The precedence effect
The precedence effect can also be referred to as the Haas effect and is described as
“the law of the first wavefront” (Blauert, 1974). It is known that humans will attend to
the first sound that reaches the ear rather than to the reflections that follow it.
Any reflection from a sound source that arrives within 30ms of the
original sound will be fused with it, and any reflection arriving more than 30ms after
will be perceived as an echo (Angus, Howard 1996).
It should also be noted that the precedence effect depends on arrival times rather than
on the shape of the waveform. Pierce (1999) explains that the effect will still occur if
one speaker in a stereo system is put out of phase.
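The 30ms rule described above can be expressed as a short sketch. The function name and the decision to treat a delay of exactly 30ms as fused are assumptions made for illustration; in practice the boundary varies with the type of sound.

```python
FUSION_WINDOW_MS = 30.0  # threshold quoted in the text above


def classify_reflection(delay_ms):
    """Label a reflection by its delay after the direct sound."""
    if delay_ms < 0:
        raise ValueError("a reflection cannot arrive before the direct sound")
    # Within the window the reflection merges with the direct sound;
    # beyond it the reflection is heard as a separate echo.
    return "fused" if delay_ms <= FUSION_WINDOW_MS else "echo"


for delay in (5, 30, 45, 120):
    print(delay, "ms ->", classify_reflection(delay))
```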
2.3.6 Head movement
In order to localise a sound effectively humans, when aware of a sound source, will
attend to its direction by turning the head to face the sound. Humans will do this in
order to minimize timing and intensity differences by centring the image directly in
front, helping to resolve the cone of confusion (see section 2.4) and to gain visual
information on the source also (Begault, 1994).
If a set of loudspeakers is set up to create the 3D sound illusion, even the slightest
head movement will cause the sensation to be broken; however, if it is being replayed
through headphones, any movement of the head will not matter (Cheenne, D. J, 2002).
2.3.7 Moving sources
Experiments have found that the Minimum Audible Angle (MAA) of a moving source
can be larger than that of a fixed source: around 1 degree for a fixed source and up to
3 degrees for a moving one. This can, however, change depending on the location, type
and volume of the sound (Begault, 1994).
2.4 The cone of confusion
The cone of confusion relates to the ambiguous cues that timing and intensity
differences would produce if there were no outer ear and the human head were
spherical. In practice the outer ear and other parts of the body mean that two source
positions rarely produce perfectly matched ITDs and IIDs; however, it is important to
note that with these two cues alone the human ability to localise sound would be less
accurate (Yost, 1997).
Fig 3- Cones of Confusion (Begault, Wenzel. 2005)
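The ambiguity can be illustrated with a deliberately simplified model in which the ears are two points with no head or outer ear between them, so that ITD depends only on the sine of the azimuth. This model and its ear spacing are assumptions made for illustration and are not taken from Yost (1997).

```python
import math

EAR_SPACING_M = 0.175    # assumed distance between the two ears
SPEED_OF_SOUND = 343.0   # m/s in air (assumed)


def simple_itd(azimuth_deg):
    """Far-field ITD in seconds for two point receivers with no head.

    Azimuth is measured clockwise from straight ahead, so 150 degrees
    is behind the listener and to the right.
    """
    return (EAR_SPACING_M / SPEED_OF_SOUND) * math.sin(math.radians(azimuth_deg))


def front_back_mirror(azimuth_deg):
    """Azimuth of the position mirrored through the interaural axis."""
    return 180.0 - azimuth_deg


front = 30.0
back = front_back_mirror(front)
print(front, simple_itd(front))
print(back, simple_itd(back))
```

The two positions give the same ITD even though one is in front of the listener and one is behind, which is the front-to-back ambiguity that the outer ear helps to resolve.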
2.5 The head-related transfer function
The Head-Related Transfer Function (HRTF) is a measurement of the filtering of
sound by the pinnae (outer ears) before it reaches the ear drum (Cheng,
Wakefield. 2001). The complex structure of the outer ear leads to a sensation of
elevation and the ability to distinguish between sources in front of and behind the head.
The asymmetrical pinnae modify the spectral content of a sound depending on
its source position; their complex shape causes delays and resonances which are
unique to the location of the sound, creating a unique HRTF for every position
(Begault, 1994).
It should also be noted that some HRTF measurements will also include the reflections
made off the shoulders and the torso as these two areas will also aid in sound source
localisation.
Within 3D sound it is said that the most effective way to replicate the location of a
sound source is to do so at the point closest to the ear drum manipulating the natural
process of localisation in the closest possible manner. Because of this the spectral
manipulation of HRTF is one of the main components in creating a 3D soundscape.
2.5.1 Individualised and non- individualised HRTF
From person to person the size and shape of the outer ear will differ and generally no
two sets of ears will be exactly the same, meaning that the personal set of HRTFs for a
single individual will be unique.
If the HRTF of every user was known the accuracy of 3D sound would be much higher;
however, as it would be highly impractical to determine the HRTFs of every single
user, it is possible instead to measure the HRTFs of multiple people to obtain a general
HRTF (Maher, Reed. 2009).
Begault (1994) mentions the possibility of listening through artificial pinnae but also
addresses the fact that a pair of ears good at localising would in turn become worse if
the artificial pinnae were not as effective.
Wenzel et al (1993) conducted an experiment on a number of test subjects in which a
single ‘generic’ HRTF was used, taken from a single person who possessed good
localisation abilities on both the azimuth and elevation planes. Each listener’s ability
to localise with their own ears was compared with their ability to localise
through the generic HRTF over headphones. The experiment concluded that the
vast majority of the subjects were able to locate sources to much the same degree as
with their own personal HRTF. It was also noted, however, that listeners
experienced an increased rate of front-to-back confusion.
Fig. 4- The differing HRTFs of multiple subjects
(Truax, B. 1999)
2.6 Distance, depth and the environment of the virtual world
To aid 3D sound in becoming totally immersive it is also important to cover how players
can perceive the distance of a sound source in the virtual world. It is also necessary to
cover the environmental context in which the source then emits to aid in realism.
2.6.1 Distance and depth
The distance of a sound source is how far away the source appears to be, whilst depth
is the front-to-back distance of a sound (Rumsey, 2001).
Sources perceived from a distance, compared to sources close by, are said to have the
following characteristics:
• “Quieter (extra distance travelled)
• Less high frequency content (air absorption)
• More reverberant (in reflective environment)
• Less difference between time of direct sound and first floor reflection
• Attenuated ground reflection”
Rumsey (2001) p 35
The loudness cue is an extremely important factor in the judging of distance by the
human brain; however, if the listener has no prior knowledge of the intensity of a sound
this cue becomes ambiguous on its own.
The loudness of reverberation is a further cue for distance perception, in which the
loudness of the direct sound is judged against the loudness of the reverberation. The
level of a sound that travels directly from the source to the listener falls by
6dB (a halving of sound pressure) at every doubling of distance, whereas the reduction
in reverberation amplitude is not as great. The direct-to-reverberant amplitude ratio is
therefore lower for more distant sources, which is why distant sources appear more
reverberant (Gardner, 1999).
Fig 5. Default Distance Model used by Waves Art Acoustic Environmental Modelling.
(Gardner, 1999, p6)
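The 6dB-per-doubling rule and its effect on the direct-to-reverberant ratio can be sketched as follows. This is not the model shown in Fig 5; it assumes a free-field inverse-distance law for the direct sound and, as a simplification, a reverberation level that does not change with distance. The reference distance and the reverberation level are hypothetical values.

```python
import math


def direct_level_db(distance_m, reference_m=1.0):
    """Level of the direct sound relative to its level at reference_m."""
    return -20.0 * math.log10(distance_m / reference_m)


def direct_to_reverb_db(distance_m, reverb_level_db=-20.0, reference_m=1.0):
    """Direct-to-reverberant ratio in dB.

    Assumes the reverberant level is the same everywhere in the room.
    """
    return direct_level_db(distance_m, reference_m) - reverb_level_db


for distance in (1.0, 2.0, 4.0, 8.0):
    print(distance, "m:",
          round(direct_level_db(distance), 1), "dB direct,",
          round(direct_to_reverb_db(distance), 1), "dB direct-to-reverb")
```

Each doubling of distance lowers the direct level by about 6dB while the assumed reverberant level stays fixed, so the ratio between them shrinks and the source sounds more reverberant.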
2.6.2 Apparent source width
The Apparent Source Width (ASW) is the space a sound source appears to fill.
Reflections arriving up to around 80ms after the direct sound will seemingly broaden the
ASW of a source, depending on the delay of the early reflections (Ando, Sato. 2002).
2.6.3 Inside-the-head localisation
The apparent feeling that a sound source is generated from within the head is
named Inside-The-Head Localisation (IHL). It is most obvious when wearing headphones
and can occur when using 3D sound. It is believed that this phenomenon is due to
HRTFs, head movement and reverb and not the structural shape of the head (Begault,
1994). The opposite effect, in which sound sources appear outside the head, is
“externalisation”, which is also linked to “spaciousness”, a term describing the sense of
space within a room (Rumsey, 2001). The use of reverberation will greatly reduce IHL
over headphones (Begault, 1994).
2.7 The implementation of 3D sound
Before discussing how 3D sound can be implemented into games it is important to look
at how this is achieved already with the games of today.
2.7.1 Current game audio techniques
Currently there are a number of tools that allow the game developer and audio
designer to place sounds within a 3D landscape. These sounds can be interactive or
non-interactive and can trigger sound effects or music; the tools can be presented in
the form of a GUI or in code (Walder, 2006).
An API, built into games consoles and soundcards, is a means of triggering
a programmed sound at a certain time with knowledge of where it is within a
space and what type of space it is in (Hagon, Muschett. 2002).
The sound is presented to the player via panning so that the player is able to
judge the location, orientation and speed of the object. How the sound is emitted
contributes to all of these.
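As an illustration of panning in its simplest stereo form, the sketch below uses a constant-power pan law, a widely used technique that keeps the combined power of the two channels the same at every pan position. The sources cited above do not specify which pan law game engines use, so this choice and the function name are assumptions.

```python
import math


def constant_power_pan(pan):
    """Return (left_gain, right_gain) for a pan position.

    pan runs from -1.0 (fully left) through 0.0 (centre)
    to 1.0 (fully right).
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must lie between -1 and 1")
    angle = (pan + 1.0) * math.pi / 4.0  # maps pan onto 0..pi/2
    return math.cos(angle), math.sin(angle)


for position in (-1.0, -0.5, 0.0, 0.5, 1.0):
    left, right = constant_power_pan(position)
    print(position, round(left, 3), round(right, 3))
```

Because the squared gains always sum to one, a source keeps the same overall loudness as it moves between the speakers, but this kind of panning only conveys left-to-right position and carries no elevation or front-to-back information.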
The ambience of the sound is applied to the source based upon the surroundings of
the object and includes effects such as delay, reverb and Doppler shift (Farnell, 2006).
Wave-Tracing is a means of generating reverb based upon the geometry of
the room, and the result will change rapidly as the player moves through
environments (Hagon, Muschett. 2002).
2.7.1.1 Current use of 3D sound in video games
Even though it seems that 3D sound is not utilised much at all within mainstream
gaming, there has been a lot of work on the creation of 3D sound engines; some have
been successful whilst others have not.
Companies including QSound, Sensura and Aureal have all created 3D sound and
reverb engines for use with PC games, each taking a different approach to 3D sound
processing in order to play back the audio over headphones, 2-channel, 4-channel
and surround sound systems (Hagon, Muschett 2002). Possibly the most renowned
engine of them all, A3D by Aureal, was discontinued after a legal battle with
Creative Labs (Anon, 1998).
However, more recently GHOST Binaural Audio have released a fully working iPhone
application utilising 3D sound named “Aves” (Action=Reaction Games, n.d). Popular
game audio software, including FMOD and the Miles Sound System by RAD Tools, also
allows for 3D sound implementation (Tandieflt, M. 2009), (RAD Tools, 2010). It is
interesting to note that “Aves”, at the time of writing, has a rating of 2.5 out of 5 and
very average reviews on the official iTunes site (Apple, 2010).
2.7.1.2 The use of 3D sound outside of video games
3D sound can be found outside of the video gaming industry, but interestingly not just
within the entertainment industry. 3D sound has been utilised by Advanced Simulation
Technology Inc. specifically for military training in the USA. The technology is
applicable to many training scenarios including gunner training, as well as convoy and
flight training (ASTi, 2010).
3D sound also entered the film industry for a brief period in 1993, when IMAX theatres
used binaural audio within a headset, allowing for 3D visuals and 3D
sound simultaneously. The headset concept did not seem to be successful
(Schoenherr, 1999).
2.7.2 3D Sound implementation
Begault (1994) highlights four factors that will have an effect on the implementation of
3D sound in consoles. These are:
• The use of the sound in the video game, whether it is a sound effect triggered
by on-screen activity or music, and its meaning, i.e. warning, guidance and
motivation sounds.
• Function of the Audio Interface- whether the sounds are mixed in
real time or are pre-mixed.
• How well the player will be able to localise sound and how far the
developer is asking the player to rely on localisation; sometimes this may not be
practical for the player.
• The available resources, including time, money and DSP limitations.
Gehring (1997) suggests that “the hardware to deliver realistic binaural audio is already
in place” based upon 16-bit stereo soundcards; however, it has more
recently been said that this cannot be done because of the complexity of the
algorithms and the processing power available (Nanostuff, 2009). The creators at
Action=Reaction Games have shown this claim to be incorrect: if
Apple’s iPhone is capable of reproducing 3D sound, more powerful
gaming consoles such as the Playstation 3 and XBOX 360 will also be capable.
2.7.2.1 HRTF implementation
As previously discussed, the use of individualised HRTFs will greatly increase the
effectiveness of 3D sound; however, it is also known that there may not be a practical
way to implement everyone’s personal transfer function in the audio engine. It is
important to know how an HRTF is captured in order to determine whether this is
possible on a wider scale.
FIR filters allow HRTFs to be implemented in the realm of DSP
by combining a number of delayed and scaled copies of the signal to recreate the
filtering effect of the pinnae (Begault, 1994).
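A minimal sketch of this idea is shown below: a mono signal is filtered once with a left-ear impulse response and once with a right-ear impulse response to give a two-channel binaural signal. The impulse responses used here are invented toy values chosen only to show a delay and level difference between the ears; real HRTF filters are measured and contain many more coefficients.

```python
def fir_filter(signal, taps):
    """Direct-form FIR filter: y[n] = sum over k of taps[k] * signal[n - k]."""
    output = [0.0] * (len(signal) + len(taps) - 1)
    for n, sample in enumerate(signal):
        for k, gain in enumerate(taps):
            # Each tap adds a copy of the input delayed by k samples
            # and scaled by the tap's gain.
            output[n + k] += sample * gain
    return output


def binauralise(mono, hrir_left, hrir_right):
    """Filter one mono signal with a separate impulse response per ear."""
    return fir_filter(mono, hrir_left), fir_filter(mono, hrir_right)


# Hypothetical impulse responses for a source on the listener's left:
# the right ear receives the sound one sample later and at half the level.
toy_left = [1.0]
toy_right = [0.0, 0.5]

left, right = binauralise([1.0, 0.0, 0.0], toy_left, toy_right)
print(left)
print(right)
```

In a real engine the pair of impulse responses would be swapped for a different measured pair whenever the source moves to a new azimuth or elevation.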
There are multiple ways of collecting HRTF measurements for later use by many
listeners. As previously discussed, the two main ways are to use the HRTF
of a person with good localisation or to use the average HRTF of a number of subjects.
By placing a microphone at particular points in the ear canal it is possible to record
the HRTF of an individual by playing a tone through a set of loudspeakers positioned at
fixed points.
Plate 1. HRTF measurement of a human subject.
Interface Laboratory (n.d)
2.7.2.2 Generalised HRTF
Generalised HRTFs are used solely for reasons of practicality,
as it would be incredibly difficult to measure the HRTF of every single player in the way
shown in the image above. There is a multitude of ways in which HRTF measurements
can be averaged, including averaging the physical model of the ear as well as averaging
the spectral content of each person’s ear (Begault, 1994).
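The second approach, averaging spectral content, could be sketched as below. The example assumes that each subject's response for one source position is already available as a list of magnitudes in dB at the same frequency bins; the data values are invented for illustration, and averaging in dB is only one of several possible choices.

```python
def average_magnitude_db(subject_responses):
    """Average several subjects' magnitude responses bin by bin.

    subject_responses is a list with one list of dB values per subject,
    all measured at the same frequency bins for the same source position.
    """
    if not subject_responses:
        raise ValueError("at least one subject is required")
    bin_count = len(subject_responses[0])
    if any(len(response) != bin_count for response in subject_responses):
        raise ValueError("all subjects must use the same frequency bins")
    subject_count = len(subject_responses)
    return [sum(column) / subject_count for column in zip(*subject_responses)]


# Hypothetical data: two subjects, three frequency bins.
print(average_magnitude_db([[0.0, -6.0, 3.0], [2.0, -10.0, 1.0]]))
```

Averaging in this way smooths out the sharp peaks and notches that differ between individuals, which is one plausible reason a generalised HRTF may localise less precisely than a personal one.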
2.8 Conclusion
The review above outlines many of the processes that are involved in the creation of 3D
sound for video games and the advantages and extra excitement that the concept can
bring. However, the information that has been found also shows that a great amount
of extra thought needs to be dedicated to the technology. It is evident that creating
the technology is a long process and that there are many factors that can improve
the experience and the effectiveness of the illusion for the player. A full understanding
of how to create the perfect 3D sound audio engine is yet to be realised.
3.0 METHODOLOGY
The following section will discuss the design and execution process of the primary
experiment for this project.
3.1 Initial testing method
The initial testing method has developed over the course of the project and has
evolved from an original idea in order to produce more scientifically analytical results
that would help in forming a more defined conclusion to the main aim.
Originally it was planned that three versions of a video would be produced, two
incorporating the main dispersion methods of today (5.1 and stereo) and one using
3D sound. The selected video was a three minute long game trailer that used a camera
view similar to the style of an FPS video game. All of the original audio was completely
muted and was to be replaced by recording and sampling various Foley sounds in the
same way as would be done for a major video game. The audio was then to be panned
in the three different ways listed: stereo, 5.1 and 3D sound. Each version would be
played to a number of subjects in order to compare and contrast them by asking
general and more detailed questions. However, it was concluded that the results this
testing would produce would not be very scientifically detailed, and that the answers
people gave could be shaped by the quality of the sound design and not by the sound
dispersion method itself.
3.2 Final testing design
The final testing method hoped to investigate how well people can localise sound over
the three dispersion methods that are mentioned above by playing a series of samples
that can be found in everyday life; this testing would also be done without the use of
visuals. The test hoped to address perceived direction, height and distance through
each method.
By conducting this test it would be possible to find if 3D sound has any real advantage
over surround sound.
When designing the experiment it was decided that the factors needed to be as
controlled as possible, as this would give the most accurate results.
Three samples were used: a person whistling, fingers clicking and a bucket being
tapped. Each has a different main frequency range, which made it possible to explore
the success of accurate localisation at different frequencies. The samples were
recorded in an anechoic room to ensure that reverb would not affect the results; any
reverb that was used was added synthetically afterward.
A Studio Projects B-1 microphone was used into a Motu 896 interface.
The “bucket tap” sample had a fundamental frequency of 139Hz, the “finger clicking”
sample had a fundamental of 773Hz and the “whistle” sample had a fundamental of
1809Hz.
Testing mainly required the subjects to note down the perceived azimuth location of a
sound on a circular grid, imagining the centre of the circle as the subject’s head, and
to note down the perceived elevation of the sound on a separate chart.
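A dot marked on such a grid can be converted into an azimuth angle and a relative distance for analysis. The sketch below is one possible convention and is an assumption rather than the method used in this project: the head is at the origin, +y is straight ahead, +x is to the listener’s right, and azimuth runs clockwise from the front. The function name `dot_to_azimuth` is hypothetical.

```python
import math


def dot_to_azimuth(x, y):
    """Return (azimuth_degrees, radius) for a dot marked at (x, y).

    Assumed convention: the listener's head is at (0, 0), +y is straight
    ahead, +x is to the right, and azimuth runs clockwise from the front
    in the range [0, 360).
    """
    azimuth = math.degrees(math.atan2(x, y)) % 360.0
    radius = math.hypot(x, y)
    return azimuth, radius


# A dot directly to the listener's right, one grid unit away.
right_azimuth, right_radius = dot_to_azimuth(1.0, 0.0)
```

The radius is only meaningful relative to the size of the printed circle, since the grid carries no physical scale.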
The samples were panned in two programs: Amphiotik Enhancer (AE) for 3D sound
and stereo, and Logic Pro 8 for 5.1.
Because the choice of HRTF is a major issue within 3D sound, the CIAIR HRTF data
provided by Nagoya University in Japan, which is shipped with the software, was used
(Holistiks, 2010). This particular technique may have introduced limitations
concerning the HRTF data. One other popular set of HRTF data is built into the
program: the KEMAR dummy set provided by MIT labs. By utilising both sets it would
have been possible to find out to what degree HRTFs actually affect 3D sound
localisation.
Both the stereo and 3D sound samples were panned using the Amphiotik Enhancer
software at random points around the listener’s head, making sure that there was a fair
variation of positions, distances and heights across all of the samples.
The reason that AE was also used for stereo is due to the fact that the illusion of left
and right would still be similar; however, the illusion of total 3D immersion would be
very difficult to achieve. The extra dimension of elevation also appears within the
stereo samples adding another factor to the test that could show interesting results.
As AE does not include the ability to export audio in 5.1, it was decided that the
surround sound panner in Logic Pro 8 would be used to spatialise the audio. As with
AE, the three samples were panned at various points around the listener’s head;
however, no elevation was added as this feature is not available.
AE also factors in elements of the room in an attempt to create a believable illusion;
these include room size, material of walls and distance from sound source to walls,
floors and ceilings.
Reverb was synthetically added to 50% of the samples. This enabled the testing to
show whether humans are able to localise better in the virtual world with the added
feature of reverb. The reverb was added in AE for the 3D sound and stereo samples,
and in Logic Pro for 5.1. As the Space Designer Audio Unit in Logic Pro is currently
more flexible than the reverb unit found in AE, the reverb was matched to the closest
possible degree; however, it was still very difficult to make the two reverbs sound
similar.
3.3 Testing procedure design
Levitin (1999) describes controlled testing as having two major factors: random
assignment and identical experimental conditions for each subject.
There were 12 subjects in total (see section 3.4) and each subject was to hear a set of
18 audio samples: 6 for 5.1, 6 for stereo and 6 for 3D sound. There were, however, 12
samples for each dispersion method. Each subject would still hear 9 samples that
utilise reverb and 9 that do not, and would hear each of the three sounds 6 times. This
ensured that the test was fair, with every subject hearing the same number of samples
of each kind under controlled variables.
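The per-subject counts above can be checked with a short sketch. It assumes, purely for illustration, that a subject hears one clip for every combination of dispersion method, sound and reverb condition; this is one allocation consistent with the stated totals and is not necessarily the allocation used in the experiment.

```python
from collections import Counter
from itertools import product

METHODS = ("stereo", "5.1", "3D sound")
SOUNDS = ("whistle", "finger click", "bucket tap")
REVERB = (True, False)

# One clip per combination of method, sound and reverb condition.
playlist = list(product(METHODS, SOUNDS, REVERB))

per_method = Counter(method for method, _, _ in playlist)
per_sound = Counter(sound for _, sound, _ in playlist)
per_reverb = Counter(reverb for _, _, reverb in playlist)
```

Under this assumption the playlist holds 18 clips, with 6 per method, 6 per sound, and 9 with reverb against 9 without.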
All testing conditions were exactly the same. (See Section 3.5 for experimental set up).
3.4 Subjects used
It was decided that there would be 12 subjects for this test, 6 of whom would be
considered trained (having 2 or more years’ experience working in the audio field) and
the other 6 untrained (little or no experience), helping to analyse any differences that
may occur between these two groups.
As it was expected that the results would be invariant between subjects, it was felt that
12 subjects would be a fair number (Levitin, 1999).
3.5 Experimental setup
As the aim of this test was to investigate how well humans can localise sound, it was
felt that to make it largely applicable to video games all the equipment used should
represent what the average consumer may have at home.
The equipment that was used was as follows:
• Standard 19” TV for stereo output
• Sony MDR150 DJ Headphones for 3D sound output
• Logitech X 530 5.1-CH PC multimedia home theatre speaker system for 5.1
playback
The speakers were set up based upon recommendations from Dolby Digital (2010) in
order to obtain maximum immersion from the surround sound system.
Fig 6- Recommended 5.1 set up 8”-12” from TV
Dolby (2010)
It is important to note that before the test all subjects were briefed on each dispersion
method and the benefits and disadvantages of each.
3.6 Questionnaire design
Please find a copy of the questionnaire in appendix B.
In asking the subjects to describe the location of a source, the use of circles was
inspired by experiments on human localisation of speech by Begault and Wenzel
(1993), in which the subjects had to dot the location of a source on a grid (Fig. 7).
Fig 7- Azimuth and Elevation Sources for Speech Stimuli
(Begault and Wenzel, 1993)
The questions that were asked after the main part of the test were intended to add
context to the results. For example, if a subject noted down completely inaccurate
results but said that it was very easy to localise the sound sources, it would be
interesting to explore why this could be the case. Furthermore, if a subject localised
very well through 3D sound but said that he/she found localisation easy through 5.1
surround sound, this could also be questioned, ultimately allowing for expansion
across a number of topics that could arise throughout testing.
3.7 Experimental procedure
Before the testing began all of the subjects were made aware of the aim of the report
and why the test was being conducted, and were also shown an example questionnaire
to ensure correct completion.
As the samples were being played the subjects were to note down where the said
sample was located. Once the localisation test was complete the listener was to fill out
the questions in order for the results to be expanded upon.
The test was limited to a 10 minute time period to ensure that the listeners did not
begin to fatigue.
For each subject the samples were played in differing orders to allow for
counterbalancing, helping to avoid ambiguous results that may occur through testing
(Lane, 2007). Each order was specified in advance to ensure that the subjects heard
the same clips but in differing orders. This would help to avoid errors that the subjects
may have made during testing, leaving the results more justified and comparable.
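One simple way to produce such orders is to rotate the clip list by one position for each successive subject. The sketch below is an assumed scheme for illustration, since the exact ordering used is not specified here; `rotated_orders` is a hypothetical name.

```python
def rotated_orders(clips, n_subjects):
    """Give each subject the same clips, starting one position later each time."""
    n = len(clips)
    return [
        [clips[(start + i) % n] for i in range(n)]
        for start in range(n_subjects)
    ]


clips = ["A", "B", "C", "D"]
orders = rotated_orders(clips, 3)
```

Every subject hears the same set of clips, but no two consecutive subjects start on the same clip, so any effect of fatigue or learning is spread across the clips.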
4.0 RESULTS
The experiment gave interesting results that proved to be fairly inconsistent across all
sound dispersion methods. Both trained and untrained listeners show both accurate
and inaccurate results throughout the experiment; however, due to this inconsistency
the accurate results could possibly be put down to luck through guessing. It can also be
seen that some people are better at localisation than others.
Results are deemed accurate when the listener has identified the correct quadrant and
has determined the angle within this quadrant to within 15 degrees, unless stated
otherwise.
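This criterion can be expressed as a short check. The sketch assumes azimuths in degrees measured clockwise from the front and quadrants divided at 0, 90, 180 and 270 degrees; these conventions and the function names are assumptions made for illustration.

```python
def quadrant(azimuth):
    """Quadrant index 0-3 for an azimuth in degrees (0 = front, clockwise)."""
    return int((azimuth % 360.0) // 90.0)


def angular_error(actual, perceived):
    """Smallest absolute difference between two azimuths, in degrees."""
    diff = abs(actual - perceived) % 360.0
    return min(diff, 360.0 - diff)


def is_accurate(actual, perceived, tolerance=15.0):
    """True if the response is in the correct quadrant and within tolerance."""
    return (quadrant(actual) == quadrant(perceived)
            and angular_error(actual, perceived) <= tolerance)
```

For example, a source at 40 degrees perceived at 50 degrees counts as accurate, whereas a source at 85 degrees perceived at 95 degrees does not, because the response falls in the neighbouring quadrant even though the error is only 10 degrees.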
The subjects were also asked to score various factors from 1 to 10; 1 being extremely
easy and 10 being very difficult.
For the following results brown indicates the actual placement of the sound, the black
indicates trained listeners and the untrained listeners are shown in red.
4.1 Stereo localisation
Stereo localisation shows no real difference between trained and untrained listeners
in distinguishing left and right from a stereo source. The left and right detection rate
was as follows: 27 of 36 sounds were detected correctly by trained listeners and 26 of
36 by untrained listeners. Although the vast majority were perceived correctly,
distance and accurate azimuth direction were somewhat more inaccurate.
Although general localisation of sound sources is fairly high, perfect determination is
rare but does appear across different listeners (fig. 8). It was found that these
particular listeners were consistent throughout the test, and that they included both
trained and untrained listeners.
Perceiving and judging distance could be considered a somewhat more difficult task,
as there is no set scale on the questionnaire that indicates the actual physical size of
the circle; even if there were, the listener would not have any visual or remembered
reference for distance. It was therefore already expected that the distance cues would
not be very accurate. However, it can be seen that when a listener is accurate in
judging the location, the determination of distance is also mostly correct.
Fig. 8 – Test 1 Part 1
There are also points in the experiment which show that the accuracy of sound source
localisation is low but the judging of distance is accurate.
Fig. 9 – Test 1 Part 4
Stereo projection also shows a high rate of front to back confusion, which would be
expected as there are no visual cues and the placement of the loudspeakers means it
is difficult to recognise what should be in front and what behind.
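A front to back confusion can be identified by mirroring the true azimuth about the axis running through the ears and testing whether the response lies near the mirror image instead of the true position. The sketch below assumes azimuths in degrees clockwise from the front and a 15 degree tolerance; the function names are hypothetical.

```python
def angular_error(a, b):
    """Smallest absolute difference between two azimuths, in degrees."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def front_back_mirror(azimuth):
    """Mirror an azimuth about the left-right (interaural) axis."""
    return (180.0 - azimuth) % 360.0


def is_front_back_confusion(actual, perceived, tolerance=15.0):
    """True if the response is near the mirrored position but not the true one."""
    mirrored = front_back_mirror(actual)
    return (angular_error(mirrored, perceived) <= tolerance
            and angular_error(actual, perceived) > tolerance)
```

Under this definition a source at 90 degrees (directly to the side) can never be counted as confused, because its mirror image coincides with its true position.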
The results for the judging of height were completely inconsistent and no subject was
very accurate; again, without the aid of visual cues, judging height proved difficult for
the subjects.
When asked how difficult it was to localise sound using a stereo dispersion method on
a scale of 1 to 10 both trained and untrained listeners listed an average of 6.
4.2 Surround sound localisation
It would be expected that the rate of correct left to right localisation would be higher
and front to back confusion would be lower when using surround sound systems;
however, the left and right differentiation rate and overall accuracy are somewhat
lower. 26 of 36 sound sources were correctly perceived between left and right by
trained listeners; for untrained listeners, however, the rate was considerably lower at
19 of 36.
The actual accuracy of localisation was also not very high compared to stereo. It can
be seen that although some listeners localise accurately using this dispersion method,
the majority do not.
No elevation properties were used in the surround sound samples, but the subjects
were not told this. Interestingly, the subjects did indeed note various changes in height
for this part of the test.
When asked how difficult it was to localise sound using surround sound dispersion
method on a scale of 1 to 10 trained listeners listed an average of 3 and untrained
listed an average of 6.
4.3 3D Sound localisation
3D sound proved to be the most difficult dispersion method for localising a sound
source. 3D sound had the lowest rate of left and right distinction: 18 of 36 for trained
and only 13 of 36 for untrained listeners. Interestingly, 6 of the 18 sources that were
localised correctly were in fact localised perfectly, the largest number of perfect
localisations across the three methods.
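For comparison, the left and right detection counts reported in sections 4.1 to 4.3 can be converted to percentages. The sketch below uses only the counts stated in the text; the variable and function names are hypothetical.

```python
# Correct left/right detections out of 36, as reported in sections 4.1-4.3.
CORRECT_OUT_OF_36 = {
    "stereo": {"trained": 27, "untrained": 26},
    "5.1 surround": {"trained": 26, "untrained": 19},
    "3D sound": {"trained": 18, "untrained": 13},
}


def percentage(correct, total=36):
    """Detection rate as a percentage, rounded to one decimal place."""
    return round(100.0 * correct / total, 1)


rates = {
    method: {group: percentage(count) for group, count in groups.items()}
    for method, groups in CORRECT_OUT_OF_36.items()
}
```

This gives 75.0% and 72.2% for stereo, 72.2% and 52.8% for 5.1 surround, and 50.0% and 36.1% for 3D sound (trained and untrained respectively).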
The perception of distance was not as accurate as it was for the stereo and surround
dispersion methods and is at its weakest within 3D sound.
Fig. 10 – Test 2 Part 3
Also there was less front-to-back confusion than expected even without the use of
visual cues.
When asked how difficult it was to localise sound using the 3D sound dispersion
method on a scale of 1 to 10 trained listeners listed an average of 5 and untrained
listed an average of 4.
4.4 Localising in the presence of reverb
When asked how difficult it was to localise sound in the presence of reverb on a scale
of 1 to 10 trained listeners listed an average of 7.5 and untrained listed an average of
7.
It seems that as the reverb becomes more obvious it confuses the listener’s
localisation, especially in 3D sound, to a greater degree than subtle reverberation
does. When reverberation is increased past 30ms, reflections begin to be perceived as
echoes and listeners become disorientated and unsure of where the sound originates
from. This is especially apparent with distance, and is possibly due to factors
associated with the precedence effect.
It can be seen that out of 54 samples containing reverb 24 were localised accurately.
4.5 Localising differing frequencies
It can be seen that localisation success differs across the three samples used, with
the whistle having the worst success rate. Both the bucket tapping and finger clicking
were the easiest to localise, as shown by the accuracy of certain subjects, both trained
and untrained.
4.6 Favoured form of localisation
Interestingly, there is a clear divide in the favoured form of sound dispersion between
the trained and untrained groups, with the trained subjects opting for surround sound
and the majority of the untrained subjects opting for 3D sound.
It can be seen that the untrained listeners are split across all three of the dispersion
methods when asked with which it was easiest to localise sound; however, the majority
opted for 3D sound as the favoured form of localisation.
All trained subjects answered that surround sound was the most effective sound
dispersion method and then went on to answer that surround sound is the most
favoured projection technique.
[Bar chart: trained and untrained responses for Surround Sound, 3D, Stereo and More
or Less the Same; vertical axis 0 to 70]
Fig. 11- With which dispersion method is it easiest to localise sound?
[Bar chart: trained and untrained responses for Surround Sound, 3D, Stereo and No
Preference; vertical axis 0 to 70]
Fig. 12- Regardless of cost, which method would you enjoy playing video games
through most?
5.0 DISCUSSION
The results suggest two major factors:
• 3D sound gives the weakest form of sound source localisation
• Surround Sound is generally the favoured form of sound localisation and seems
to be considerably more useful in localisation of sound sources around the
head.
The main reason 3D sound localisation was not so successful is most likely down to
the difference between individualised and non-individualised HRTFs. As previously
mentioned, a set of HRTFs obtained by Nagoya University, Japan was used for all
listeners. This means that if a listener’s personal HRTFs were completely different
from the Nagoya HRTFs, localisation would be poor; if, on the other hand, the HRTFs
of a listener were coincidentally close to the Nagoya HRTFs, localisation would in turn
be more accurate. This is supported by the untrained listener in test 5, who seemed to
be fairly consistent across most tests.
Part 3
Part 6
Fig. 13 – Test 5 Parts 3 and 6
It can be seen that although the listener was not totally accurate there is a noticeable
degree of consistency across both 3D sound tests.
There are other circumstances in which listeners both trained and untrained are
completely inaccurate throughout testing.
Part 3
Part 6
Fig. 14 - Test 3 Parts 3 and 6
The fact that these particular subjects are inaccurate could be put down to weak
localisation skills, however, it is also highly possible that the used HRTF data is
completely different to the test subject’s own individualised HRTFs.
There was much front to back confusion across all the tests, and without the use of
visual cues this was to be expected, especially when listening to the stereo recordings,
although it was present throughout all of the recordings. When playing video games,
gamers will have the aid of visual cues to distinguish the direction of a sound. For
example, if a player has trouble distinguishing the direction of a sound, much like in
fig. 15, and is unsure whether the sound is coming from in front or behind, on-screen
actions will help the player determine this.
Part 3
Part 6
Fig. 15 – Test 2 Parts 3 and 6
It was also found that the test subjects were only really experiencing front to back
confusion for sounds directly in front and behind; sounds placed to the sides were not
really affected by this. Maher and Reed (2009) found a similar pattern in the front left
and rear right quadrants, and explained that the possible causes could be “intrinsic
nuance” of the HRTFs used or potentially the spectral content of the sounds used
during this test. Front to back confusion appeared an equal number of times across
both the whistle (1809Hz) and bucket (139Hz) samples.
Interestingly, all but one of the obvious front to back confusion errors were made when
listening to completely anechoic recordings. The reverb tails of the recordings,
although sometimes confusing to the listener, will help the listener fully understand
where the sound is originating from in terms of both direction and distance (Rumsey,
2001). Too much reverb, however, can cause auditory effects such as flutter and can
disorientate the listener, as was seen throughout testing.
Subjects constantly commented on the confusion that reverb brought to localisation,
but this seemed only to be the case when the reverb was obvious and when the
placement of the recordings in the virtual space created echoes and flutter.
Furthermore, within the questionnaire most subjects noted that reverb made
localisation considerably more difficult, but no subject made any comment when the
reverb was only slight, even if they were possibly aware of its presence.
It is possible that subjects became confused due to the precedence effect, in which
the listener will consider the direction of the first sound heard to be the direction of the
source (Rumsey, 2001), although this may be considered unusual as the sound
emitted directly from the source should arrive first. As some sounds incorporate flutter
due to the positioning of the sound source, listeners may be unable to tell which is the
original sound and which is the delayed echo.
This would not normally be expected if these experiments were replicated with actual
physical objects emitting sounds in the real world. However, it could be possible to put
these ambiguous occurrences down to the effectiveness of the software, as AE
attempts to factor in all of the room elements discussed within section 3.0.
Head movement would not have affected any of the results, as each subject was sat in
exactly the same position for each test. The only possible difference between each test
would be the height of each subject, which should not have had a major effect on the
results.
The nature of each sample is related to the rate of localisation accuracy, and it is
believed that this could be due to the spectral properties of the sound. Almost all
subjects commented that the whistle sample was the hardest to locate, and the results
also show that this clip had the worst localisation success; the two other samples
used, however, had a greater rate of localisation accuracy.
In earlier tests on frequency localisation, Blauert (1974) found that certain frequency
ranges, named ‘directional bands’, are associated with particular areas around the
head: bands around 1200Hz and 12000Hz relate to the rear of the azimuth plane,
300-600Hz and 3000-6000Hz to the front, and a band near 8000Hz to above the
head. These figures should therefore show an increased success for all three
examples in the rear and for samples close to the head; however, the results fail to
show any particular relationship to this. It seems that the accuracy of localisation is
very similar on all sides. With the added problem of front to back confusion and the
lack of visual cues it is difficult to know if the results truly reflect this theory.
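The directional bands quoted above can be written as a simple lookup. The front bands use the ranges given in the text; the widths of the bands described only as “around” or “near” a frequency are not stated, so a tolerance of plus or minus 10% is assumed here purely for illustration. The names are hypothetical.

```python
# (direction, low_hz, high_hz); band edges marked "assumed" are not from
# the text and use an illustrative +/-10% tolerance.
DIRECTIONAL_BANDS = [
    ("front", 300.0, 600.0),
    ("front", 3000.0, 6000.0),
    ("rear", 1080.0, 1320.0),      # around 1200Hz, assumed width
    ("rear", 10800.0, 13200.0),    # around 12000Hz, assumed width
    ("above", 7200.0, 8800.0),     # near 8000Hz, assumed width
]


def directional_band(frequency_hz):
    """Return the direction associated with a frequency, or None if no band matches."""
    for direction, low, high in DIRECTIONAL_BANDS:
        if low <= frequency_hz <= high:
            return direction
    return None
```

A lookup of this kind considers a single frequency at a time, whereas real sounds contain energy across many frequencies, so it is only a rough guide to which direction a sound might favour.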
Height was also an ambiguous factor within this dispersion method, with the vast
majority of subjects perceiving the audio in an unintended position. However, there
were a small number of subjects who were accurate when perceiving height in some
circumstances. Reasons for this can include the factors discussed in section 4.1, in
which the subject is unsure of perceived distance as there are no visual or cognitive
cues.
5.1 Actual Use of 3D Sound in video games
It is clear from the results that 3D sound is as yet largely too ineffective to be used in
video games; however, this could be due to a number of factors including the overall
effectiveness of the software used. It is evident that companies have already invested
time and money into creating 3D sound audio engines and have succeeded in doing
so; however, these engines have not yet been utilised in mainstream gaming.
Interestingly, when the subjects were asked which method each would prefer to play
video games through, trained subjects answered surround sound whereas untrained
subjects answered 3D sound, although the majority did not find this method the most
effective. There are a multitude of reasons why this could be the case.
Trained users are most likely familiar with binaural audio and knew about the
phenomenon before testing began, meaning there was a possibility of biased opinions
within testing, although all subjects were asked to answer with an open mind. Some
untrained users, however, did not know about this effect previously, and the curiosity
and originality of the effect possibly caused this set of subjects to be fonder of this
method by the end of testing.
Although some subjects may find 3D sound more effective than other methods, it
remains that, for this testing at least, this particular method was the weakest, and it is
possible that this is due to the difference between individualised and
non-individualised HRTFs. Although the set of HRTFs used was generalised, it is
possible that there were some ambiguous areas that made localisation easier and/or
harder at different points around the head.
Concerning the current development of 3D sound outside of this research, one cannot
be sure of the quality of the engines that have been manufactured as they have yet to
be heard. As a number of companies have tried and have possibly not succeeded in
breaking into commercial media, it is possible that either developers do not see a
market for 3D sound or developers do not see the quality as fit for inclusion in games.
6.0 CONCLUSION
To summarise, it is evident that further work must go into the creation of effective HRTF
emulation in order for 3D sound to become a success; although the format has
shown some success during this experiment, other existing sound dispersion methods
seem to be stronger.
Although 3D sound did struggle to be effective, it can be seen that progress has been
made in this area over a number of years; however, no accessible game utilising 3D
sound was released until very recently, and it has not proven to be a great success.
Possibly the weakest and most ambiguous factor of all was height: localising height
through all sound dispersion methods proved difficult for the majority of the subjects.
Distinguishing between front and back also became difficult for subjects at some
stages.
It has been shown that many areas need to be considered in order to create a lifelike
virtual soundscape with all factors bearing as much relevance as the next, this must be
considered at all times if audio in video games is to become more immersive and
realistic. It can be seen that 3D sound technology is also used outside of video gaming
meaning it could be possible to bring this technology into this particular virtual reality
world, furthermore, the advancement in video gaming may be able to bring 3D sound
into other mainstream media including the likes of film.
The general aim and all objectives, with the exception of one, were completed and
each brought valuable information to the project that helped to form an in depth
understanding of the subject at hand. The objective that was not met was altered in
order to improve the validity of the results greatly improving the final findings.
It was felt that the initial research was greatly beneficial as much information already
exists on 3D sound, more so than originally anticipated. This helped to form a solid
understanding and also aided in creating the experiments; efforts could then be
directed at finding new information.
It is still believed that 3D sound can bring about a gaming experience of heightened
immersion and further enjoyment at a lower cost and with less hassle; however, 3D
sound is currently confusing and inaccurate and ultimately needs to be perfected in
order for this technology to reach mainstream gaming.
7.0 RECOMMENDATIONS FOR FURTHER WORK
If this project were to be continued it would be ideal to create a 3D sound engine for a
short “level” in a playable video game in order to determine the actual effectiveness of
the phenomenon.
However, before this can be achieved, an increased level of research will need to be
placed into HRTFs. A method utilising one or more sets of HRTFs should ideally be
developed in order for all listeners to be completely comfortable and for localisation to
be as accurate as possible. This could include generalised HRTFs, HRTFs of
particularly good localisers or even a format that allows personalised HRTFs, possibly
through exact replicas, adjustments of pre-existing sets or the ability to choose between
multiple sets.
In repeating this particular experiment it would be interesting to conduct the testing
across a wider range of subjects of different ages using different forms of HRTFs; this
would help to form a better understanding of the true effect of this factor.
It would also be interesting to use visual cues during testing to see if this influences
the results in any way; the visual cues may not necessarily have to coincide with the
position of the sound.
8.0 REFERENCES
Angus J. A. S, Howard D. M (1999). Acoustics and Psychoacoustics. 4th ed. Oxford:
Focal Press.
Anon. (1998). Aureal and Create Engage in Legal Skirmish. [online] Available:
http://web.archive.org/web/19990829025202/www.aureal.com/cgi-
bin/pub/display.pl?template=press_aur_detail.htm&serial=76. Last accessed
04/03/2010
Apple. (2010). Aves. [online] Available:
http://itunes.apple.com/us/app/aves/id321295493?mt=8. Last accessed 15/04/2010
ASTi. (2010). ACE 3D "Soundfield Reconstruction". [online] Available: http://www.asti-
usa.com/telestra4/ace/3dsound/advantages.html. Last accessed 04/03/2010
Begault, D. R (1994). 3D sound for virtual reality and multimedia. London: Academic
Press, p3, p39.
Blauert, J (1974). Spatial Hearing. Cambridge, Massachusetts: The MIT Press
Bridgett, R. (2007). Designing for Next-Gen Game Audio. [online] Available:
http://www.develop-online.net/features/65/Designing-for-Next-Gen-Game-Audio. Last
accessed 23/4/2010
Cheene, J (2005). Handbook for Sound Engineers ed Ballou, G.M. 3rd ed. Oxford:
Focal Press.
Cheng, C, Wakefield, G (2001) Introduction to Head-Related Transfer Functions:
Representations of HRTF in time, frequency and space, Audio Engineering Society, 49
Dolby. (2010). Little Things Make a Big Difference. [online] Available:
http://www.dolby.com/consumer/setup/index.html. Last accessed 12 March 2010
Farnell, A (2006). Designing Sound. London: Applied Scientific Press.
HOLISTIKS. (2010). AMPHIOTIK ENHANCER ST . [online] Available:
http://www.holistiks.com/amphiotik/modules.php?name=_hes_Documents&file=_produ
cts_amen_st. Last accessed 20/03/2010
Gardner (1999) 3D Audio and Acoustic Environment Modelling Waves Arts Inc
Gehring, B. (1997). Why 3D Sound Through Headphones?. [online] Available:
www.fp3d.com/papers/whyheadphones.pdf. Last accessed 20/03/2010
Holman, T (2008). Surround Sound. 2nd ed. Oxford: Focal Press
Howley, L. (2009). Turtle Beach HPA2 PC Headset Review. [online] Available:
http://pcgamingcorner.com/wordpress/?p=1485. Last accessed 27/4/2010.
Lane DM (2007). Counterbalancing. [online] Available:
http://davidmlane.com/hyperstat/A128919.html. Last accessed 15/04/2010
Interface Labs. (n.d). Spatial Sound Research. [online] Available:
http://interface.cipic.ucdavis.edu/CIL_html/CIL_research.htm. Last accessed 12 March
2010
Maher, Reed (2009) An Investigation of Early Reflection’s Effects on Front-Back
Localisation of Spatial Audio, Audio Engineering Society International Convention 127,
New York, USA 2009
NanoStuff (1999). Why don't game developers actively pursue binaural sound
technologies?. [online] Available: http://www.reddit.com/r/gaming/comments/ablrh. Last
accessed 04/03/2010.
Pierce. J (1999). Music, Cognition and Computerized Sound ed. Cook, P. R.
Cambridge, Massachusetts: The MIT Press.
RAD Tools (2010) Miles Sound System Game Developer Magazine, 17
Rayleigh, L (1907) On our perception of sound source direction Philosophical
Magazine 13
Rumsey, F (2001). Spatial Hearing. Oxford: Focal Press.
Silva, R. (2010). 5.1 vs 7.1 Channel Home Theatre Receivers - Which is Right For
You?. [online] Available:
http://hometheater.about.com/od/hometheateraudiobasics/qt/5-1vs7-1diff.htm. Last
accessed 12/03/2010
Schoenherr. (1999). IMAX film format. [online] Available:
http://history.sandiego.edu/gen/filmnotes/imax.html. Last accessed 04/03/2010
Tandefelt, M. (2009). True Binaural 3D/HRTF. [online] Available:
http://www.torquepowered.com/community/forums/viewthread/86563. Last accessed
04/03/2010
Truax, B. (1999). Binaural Hearing. [online] Available:
http://www.sfu.ca/sonic-studio/handbook/Binaural_Hearing.html. Last accessed
03/03/2010.
Tullis, M. (2006). Video Game Surround Sound for the Next Generation. [online]
Available:
http://www.dolby.com/consumer/experience/dolbycast/transcript/3-games-in-surround-sound.html.
Last accessed 12/03/2010
Walder, C (2006) Intelligent Audio for Games, Audio Engineering Society International
Convention, 120, Paris, France 2006
Yost (1993) Perceptual Models for Auditory Localization Audio Engineering Society
International Conference, 12, Copenhagen, Denmark 1993
9.0 BIBLIOGRAPHY
Ando, Sato (2001) Apparent Source Width (ASW) of Complex Noises in relation to
Interaural Cross Correlation Function, Kobe University, Japan
Barry, D, Coyle, E, Lawlor, B (2004) Real Time Sound Source Separation: Azimuth
Discrimination and Resynthesis, Audio Engineering Society International Convention
117, San Francisco, USA 2004
Cha, Ryu, Seo (2008) Implementation of 3D sound using grouped HRTF, Audio
Engineering Society International Conference 34, Jeju Island, Korea, 2008
Dale, W (1999) A Machine-Independent 3D positional sound application
programmer interface to spatial audio engines, Audio Engineering Society International
Conference 16, Rovaniemi, Finland 1999
Dolhasz, A (2009) Microphone Arrays for Surround Sound Mixing and Recording,
Birmingham City University
Furse, R (2009) Building an OpenAL Implementation using Ambisonics, Audio
Engineering Society International Conference 35, London, England 2009
Griesinger, D (2009) Architectural Acoustics: Perception and Binaural Effects in
Architectural Acoustics, Acoustical Society of America, 125
Griesinger, D (2001) The psychoacoustics of listening Area, Depth, Envelopment, in
Surround Recordings and their relationship to microphone technique, Audio
Engineering Society International Conference 19, Schloss Elmau, Germany 2001
Hiipakka, J (n.d) Implementation of 3D Sound in a Virtual Room, Helsinki University of
Technology
Huopaniemi. J (1999) Virtual Acoustics and 3-D Sound in multimedia signal
processing, Helsinki University of Technology
Jin. C, Corderoy. A, Carlile. S et al (2000) Spectral Cues in Human Sound Source
Localisation, University of Sydney
Kistler, L, Wightman, F (1990) Hearing in 3 Dimensions: Sound Localisation, Audio
Engineering Society International Conference 8, Washington D.C, USA 1990
Kistler, L, Wightman, F (1991) A model of head-related transfer functions based on
principal components analysis and minimum-phase reconstruction, Acoustical Society of
America, 91
Lluis-Garcia, Mlynek et al (2004) Advanced 3D Audio Algorithms by Flexible, Low
Level Application Programming Interface, Audio Engineering Society International
Convention 116, Berlin, Germany 2004
Moore, B (1999) Controversies in Spatial Audio, Audio Engineering Society
International Conference 16, Rovaniemi, Finland 1999
Neukom, M (2007) Ambisonic Panning, AES convention paper 7297, 123
Schmidt, B (2002) Playing with sound: Audio Hardware and Software on XBOX, Audio
Engineering Society UK conference 17, London, England 2002
Sen, R A (n.d) A system for HRTF calibration through comparison of test sounds, US
Patent Application
Wenzel. E, Miller. J, Abel J (2000) Sound Lab: A real-time software based system for
the study of spatial hearing, Audio Engineering Society International Convention 108,
Paris, France 2000
Yamada et al (1978) OUT-OF-HEAD Localisation headphone listening device, US
Patent
Wenzel. E, Kistler. M, Wightman. F (1993) Localisation using nonindividualised
head-related transfer functions, Acoustical Society of America, 94
9.0 APPENDICES
Appendix A
Minimum Audible Angle (MAA)
The MAA is the smallest change in a source’s angular position that a listener can
detect, and it varies around the listener’s head. Directly in front, the MAA is around
1 degree in the azimuth plane and 3 degrees in the elevation plane. These figures
become progressively larger for sounds located behind the listener (Holman, 2008).