Audio in VEs - Ruth Aylett
Post on 14-Jul-2018
Use of audio in VEs
Important but still under-utilised channel for HCI, including virtual environments.
Speech recognition for hands-free input
All computers now have sound output
– At least a beep
– Usually CD-quality stereo sound
Conventional stereo places a sound in any spot between the left and right loudspeakers.
– In true 3D sound, the source can be placed in any location: right or left, up or down, near or far.
Potential uses
Associate sounds with particular events
Associate sounds with static objects
Associate sound with the motion of an object
Use localised sound to attract attention to an object
Use ambient sounds to add to the feeling of immersion
Use sounds to add to the feeling of realism
Use speech to communicate with devices or avatars
Use sound as a warning or alarm signal
Overall impact
High-quality audio provides:
– Increased realism
• Reinforces visuals
– Strong immersive sense
• World exists beyond part that is seen
– Strong positional cues
– Extra information about the environment
– The shape of the world
What does ‘High Quality’ mean?
VR sound environment
VR equipment creates a difficult sound environment.
CAVE
– Stand in a glass box
– Pretend you can’t hear the echoes
– Also hard to place speakers
Semi-immersive VR theatre
– Sit in a big cylinder
– Reflects sound in a very strange way
Workbench
Not so bad but still...
– Big flat screen 1 metre in front of you
– Sound coming from surround speakers
– Creates echoes inappropriate for scene
Much VR audio is based on very high quality headphones
– Use head tracking to get position and orientation
– Play to user
– Problem solved! - well, no
[Figure: monaural source, phantom source, L speaker, R speaker]
Stereo sound
In the entertainment industry, stereo was the first successful commercial product involving spatial sound.
To place sound on the left, send its signal to the left loudspeaker; to place it on the right, send its signal to the right loudspeaker.
*and if the speakers are wired "in phase" and if the listener is more or less midway between the speakers and if the room is not too acoustically irregular
Stereo techniques
If same signal sent to both speakers*, phantom source seems to originate from point midway between them.
Crossfading signal from one speaker to the other gives impression of source moving continuously between the two positions.
Simple crossfading cannot create impression of source outside of line segment between speakers.
Can also shift the location of the phantom source by exploiting the precedence effect (delay).
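The crossfading idea can be sketched in a few lines of Python. This is a minimal illustration, not a method given in the slides: the function names `crossfade_gains` and `pan_mono` are hypothetical, and the sine/cosine (constant-power) gain law is an assumed choice among several possible crossfade curves.

```python
import math

def crossfade_gains(position):
    """Return (left_gain, right_gain) for a pan position in [0, 1].

    0.0 places the phantom source at the left speaker, 1.0 at the right.
    The sine/cosine (constant-power) law is an assumed choice, not one
    taken from the slides.
    """
    if not 0.0 <= position <= 1.0:
        # Simple crossfading cannot place a source outside the speakers.
        raise ValueError("position must lie between 0 and 1")
    angle = position * math.pi / 2
    return math.cos(angle), math.sin(angle)

def pan_mono(samples, position):
    """Turn a list of mono samples into a list of (left, right) pairs."""
    left_gain, right_gain = crossfade_gains(position)
    return [(s * left_gain, s * right_gain) for s in samples]
```

Sweeping `position` from 0 to 1 over successive audio blocks moves the phantom source from the left speaker to the right one; at 0.5 both speakers get the same gain, which corresponds to the midway phantom source.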
A world of sound
You are surrounded by sound all the time
– Silence is unheard of!
The environment affects (shapes?) the sound you hear
– Size, shape, materials
Rendering sound: auralisation
To generate correct echoes must model sound behaviour in the space
– Rooms are complex
– Filled with different materials
• Reflective, absorbent, frequency filtering
Just like rendering light
Putting sound into a VE
What sound?
‘Ambient’ sounds
– ‘Surround’ sound
– Often use recorded sounds
Positional sounds
– Designed to give a strong sense of something happening in a particular place
– Also often provided by using recorded sounds
Positional sound
Using sound to create the sense of active things in the environment
– Enhances presence
– Enhances immersion
Need to deal with many components
– Reflections (echoes)
– Diffraction effects
City models
VisClim: scene in Linköping’s Stora torget
Surrounding environment
– Vehicles?
• Several roads nearby
– People?
• Many people in the square
– Weather noise effects
• Rainfall
• Snowfall (no sound but damping effect)
Air Traffic control
Simulation
– No ‘ambient’ sound required
– No aircraft noises
– No realism wanted?
Positional warnings?
– Designed to draw the user’s attention to the location of a problem
– Which may be out of the field of view
Creating positional sound
Amplitude
– (or more)
Synchronisation
– Audio delays
Frequency
– Head-Related Transfer Function (HRTF)
Amplitude
Generate audio from position sources
Calculate amplitude from distance
Include damping factors
– Air conditions
– Snow
– Directional effect of the ears
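As a rough sketch of "calculate amplitude from distance", the following assumes the free-field inverse-distance law for a point source (amplitude proportional to 1/distance). The function `distance_gain` and its `damping` and `reference_distance` parameters are hypothetical names for illustration; `damping` is a single multiplier standing in for air conditions or snow, and the directional effect of the ears is not modelled here.

```python
import math

def distance_gain(source_pos, listener_pos, damping=1.0, reference_distance=1.0):
    """Gain for a point source, using the free-field inverse-distance law.

    damping is a hypothetical multiplier in [0, 1] standing in for air
    conditions or snow; reference_distance is the distance (in metres) at
    which the undamped gain is 1.0. Both are illustrative parameters.
    """
    distance = math.dist(source_pos, listener_pos)
    # Clamp so that sources closer than the reference do not blow up the gain.
    distance = max(distance, reference_distance)
    return damping * reference_distance / distance
```

For example, a source 2 m from the listener gets half the gain of one at 1 m, and a snow scene could be approximated by passing a `damping` value below 1.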
Synchronisation
Ears are very precise instruments
Very good at hearing when something happens after something else
– Sound travels slowly (c. 340 m/s in air): different distance to each ear
Use this to help define direction
– Difference in amplitude gives only very approximate direction information
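The arrival-time difference between the ears can be estimated directly from the two path lengths. The sketch below is a simplified model under stated assumptions: the ears are two points 0.18 m apart (an assumed round number), the head itself is ignored, the listener faces along +y in a 2D plane, and `interaural_time_difference` is a hypothetical name.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s in air, the approximate figure from the slide
EAR_SPACING = 0.18      # metres between the ears; an assumed round number

def interaural_time_difference(source_xy, head_xy=(0.0, 0.0)):
    """Arrival-time difference in seconds between the two ears.

    Positive values mean the sound reaches the right ear first. The listener
    faces along +y, with the ears modelled as two points on the x axis.
    """
    hx, hy = head_xy
    left_ear = (hx - EAR_SPACING / 2, hy)
    right_ear = (hx + EAR_SPACING / 2, hy)
    t_left = math.dist(source_xy, left_ear) / SPEED_OF_SOUND
    t_right = math.dist(source_xy, right_ear) / SPEED_OF_SOUND
    return t_left - t_right
```

With these numbers the largest possible difference, for a source directly to one side, is 0.18 / 340 s, roughly half a millisecond. A renderer could apply this as a delay on the channel feeding the far ear.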
What is 3D sound?
Able to position sounds all around a listener.
Sounds created by loudspeakers/headphones: perceived as coming from arbitrary points in space.
Conventional stereo systems generally cannot position sounds to side, rear, above, below
Some commercial products claim 3D capability - e.g. stereo multimedia systems marketed as having “3D technology”. But usually untrue.
3D positional sound
Humans have stereo ears
Two sound pulse impacts
– One difference in amplitude
– One difference in time of arrival
How is it that a human can resolve sound in 3D?
– Should only be possible in 2D?
Frequency
Frequency responses of the ears change in different directions
– Role of pinnae
– You hear a different frequency filtering in each ear
– Use that data to work out 3D position information
Head-Related Transfer Function
Unconscious use of time delay, amplitude difference, and tonal information at each ear to determine the location of the sound.
– Known as sound localisation cues.
– Sound localisation by human listeners has been studied extensively.
Transformation of sound from a point in space to the ear canal can be measured accurately
– Head-Related Transfer Functions (HRTFs).
Measurements are usually made by inserting miniature microphones into ear canals of a human subject or a manikin.
HRTFs
HRTFs are 3D
– Depend on ear shape (pinnae) and resonant qualities of the head!
– Allows positional sound to be 3D
Computationally difficult
– Originally done in special hardware (Convolvotron)
– Can now be done in real-time using DSP
HRTFs
First series of HRTF measurement experiments in 1994 by Bill Gardner and Keith Martin, Machine Listening Group at MIT Media Lab.
Data from these experiments made available for free on the web.
Picture shows Gardner and Martin with dummy used for experiment - called a KEMAR dummy.
A measurement signal is played by a loudspeaker and recorded by the microphones in the dummy head.
HRTFs
Recorded signals processed by computer, derives two HRTFs (left and right ears) corresponding to sound source location.
– HRTF typically consists of several hundred numbers
– describes time delay, amplitude, and tonal transformation for particular sound source location to left and right ears of the subject.
Measurement procedure repeated for many locations of sound source relative to head
– database of hundreds of HRTFs describing sound transformation characteristics of a particular head.
HRTFs
Mimic process of natural hearing
– reproducing sound localisation cues at the ears of listener.
Use pair of measured HRTFs as specification for a pair of digital audio filters.
Sound signal processed by digital filters and listened to over headphones
– Reproduces sound localisation cues for each ear
– listener should perceive sound at the location specified by the HRTFs.
This process is called binaural synthesis (binaural signals are defined as the signals at the ears of a listener).
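A minimal sketch of binaural synthesis: an HRTF expressed in the time domain is an impulse response, so "a pair of digital audio filters" can be realised by convolving the mono signal with one impulse response per ear. The function names are hypothetical, and the two impulse responses below are made-up toy values, not measured HRTF data.

```python
def convolve(signal, impulse_response):
    """Direct-form FIR filtering: full convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def binaural_synthesis(mono, hrir_left, hrir_right):
    """Filter a mono signal with one impulse response per ear."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy impulse responses for a source on the listener's right; made-up values,
# not measured data. The left ear hears the sound later and quieter.
HRIR_RIGHT = [1.0, 0.3]
HRIR_LEFT = [0.0, 0.0, 0.5, 0.2]

left, right = binaural_synthesis([1.0, 0.0, -1.0], HRIR_LEFT, HRIR_RIGHT)
```

In a real system the impulse responses would be several hundred samples long, taken from a measured database for the desired source direction, and the convolution would normally be done with a faster method than this double loop.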
The problem
Rendering audio is really, really hard
Much bigger problem than lighting
Material properties are more complex
– Can’t fake it as easily
– Properties are always a problem
Good methods exist but problem too computationally hard for these to be in general use at present
What is possible now
Constraint is real time audio rendering
– Must adapt to dynamic user who moves unpredictably
Simple (reflectionless) stereo positional sound
– Using amplitude
– Using synchronisation
– Using HRTF frequency filtering
Useful for audio cues and simple environmental sounds
What about surround sound?
Principal format for digital discrete surround is the "5.1 channel" system.
The 5.1 name stands for five channels (in front: left, right and centre, and behind: left surround and right surround) of full-bandwidth audio (20 Hz to 20 kHz)
– sixth channel at times contains additional bass information to maximise the impact of scenes such as explosions, etc.
– This channel has a narrow frequency response (3 Hz to 120 Hz), thus sometimes referred to as the ".1" channel.
What about surround sound?
Surround sound systems NOT true 3D audio systems - just collection of more speakers.
Various commercial surround sound formats - for home entertainment, Dolby is big name.
– Dolby Surround Digital.
Lots of other proprietary approaches - e.g. the BattleChair (pictured).
Dolby Headphone
Dolby Headphone: based on a proprietary algorithm, presumably similar to HRTFs, originally developed by Australian company Lake Technology
– attempts to produce convincing surround-sound effects through ordinary stereo headphones.
Technology originally developed for VR or tele-conferencing applications but not marketed for consumer applications.
A more genuine 3D audio system developed by UK company Sensaura.
Voice interaction
Voice input for control
– Continuous? Discrete?
Voice output for information
– Positional - alerts
– Non-positional - ‘voice over’
– Character-based - social channel
Voice output
Voice synthesis
– Computer strings together set of phonemes (basic language sound units)
– Problems with articulation: sounds robotic
Unit selection voices
– Uses large database created from real voices
– Plus sophisticated algorithm for putting bits together
– Good results but need very large memory (1 gigabyte) to hold database
– Takes lots of time and expertise to create ‘voice’