GENERATIVE FOOTSTEPS: SOUNDS FOR FILM POSTPRODUCTION
by
Julián Téllez Méndez
A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF BACHELOR OF SCIENCE IN THE SCHOOL OF AUDIO ENGINEERING
MIDDLESEX UNIVERSITY
JUNE 2013
ABSTRACT.
This dissertation adds to the research in post-production practices
by using generative audio to digitally re-construct Foley stages. The
rationale for combining generative audio with Foley processes is to
analyse the possible implementation of new technology that could
benefit from Foley practices in low-budget films. This research project
also intersects sound synthesis, signal analysis and user interaction,
where a behavioural analysis based on ground reaction forces was
prototyped.
ACKNOWLEDGEMENT.
I would like to dedicate this dissertation to Andy J. Farnell, whose
expertise helped me immensely in writing this essay. To Gillian
McIver, Helena Hollis and Philippa Embley, who never ceased
helping me right until the very end.
To God for His divine intervention in this academic achievement.
To my mother Zoraida Méndez and grandmother Bertha Daza for
making me a better person. To my aunts Martha and Bellky
Méndez for their unconditional support; without you, nothing
would have been possible.
TABLE OF CONTENTS.
CHAPTER 1 INTRODUCTION.
1.1 SIGNIFICANCE OF THIS STUDY.
1.2 PROBLEM STATEMENT.
1.3 LAYOUT OF DISSERTATION.
CHAPTER 2 GENERATIVE FOOTSTEP SOUNDS.
2.1 LITERATURE REVIEW.
2.1.1 INTRODUCTION.
2.1.2 SOUND TEXTURES.
2.1.3 DEFINITIONS AND PRINCIPLES OF GRANULAR SYNTHESIS.
2.1.4 STOCHASTIC ANALYSIS.
2.1.5 PROCEDURAL AUDIO IN RESPONSE TO FOOTSTEP MODELLING.
2.1.6 SUMMARY.
2.2 METHODOLOGY.
2.2.1 INTRODUCTION.
2.2.2 OBJECTIVES.
2.2.3 PARAMETERS.
2.2.3.1 The Grain Envelope Analysis.
2.2.3.2 The Grain Dynamics.
2.2.3.3 Footstep-modelling.
2.2.3.4 The Ground Reaction Force.
2.2.4 PROCEDURES.
2.2.4.1 Pure Data.
2.2.4.2 Arduino.
2.2.5 ARCHITECTURE.
2.2.6 SUMMARY.
CHAPTER 3 EVALUATION.
3.1 INTRODUCTION.
3.2 QUANTITATIVE DATA.
3.2.1 THE DATA COLLECTION METHOD.
3.2.2 RESEARCH FINDINGS.
3.2.2.1 STATISTICAL ANALYSIS.
3.2.2.2 T-Test.
3.2.2.3 Chi-Square.
3.2.3 THE RESULTS AND EVALUATION.
3.3 QUALITATIVE DATA.
3.3.1 DATA COLLECTION METHOD.
3.3.2 RESEARCH FINDINGS.
3.3.2.1 One-to-One Interview.
3.3.2.2 e-Interviewing.
CHAPTER 4 CONCLUSION.
APPENDICES.
APPENDIX A.
APPENDIX B.
APPENDIX C.
APPENDIX D.
APPENDIX E.
APPENDIX F.
APPENDIX G.
REFERENCES.
BIBLIOGRAPHY.
LIST OF TABLES.
Table 1: Average Quality.
Table 2: Chi-Square.
Table 3: Expected Values.
LIST OF FIGURES.
Figure 1: Sound Texture Extraction.
Figure 2: Gaussian Window.
Figure 3: Output list.
Figure 4: Transient Detector.
Figure 5: Grain Dynamics.
Figure 6: GRF Exemplified.
Figure 7: The Gait Phase.
Figure 8: GRF Distribution in Pure Data.
Figure 9: PD Environment.
Figure 10: The Cloud.
Figure 11: Code in Arduino.
Figure 12: Architecture.
Figure 13: Prototype.
Figure 14: Polynomial Curves.
Figure 15: Question 2.
Figure 16: Question 3.
Figure 17: T-Test in Excel.
CHAPTER 1 INTRODUCTION.
This project will focus on the use of granular synthesis
techniques for dynamically generated audio with a main
emphasis on film post-production. In particular, footstep
modelling will be studied extensively. The results will be
compared with those obtained from previously recorded content
including Foley and several location recordings. Creating
dynamically generated audio, otherwise known as Procedural
Audio (PA), is a practice that involves using programmable sound
structures. This allows the user to manipulate audio by
establishing the input, internal and output parameters to
ultimately develop a non-repetitive and meaningful sound
(Farnell, 2010).
Different types of technology have called on a number of
methods to attempt to provide a quick and efficient solution for
audio, especially on interactive applications such as video games.
Many of these sources and methods are discussed below,
however it is beyond the scope of this study to try and resolve
these issues once and for all. They will undoubtedly cause
controversy and debate for many years to come. This work, on
the other hand, aims to contribute to the existing evidence that
should add to a better understanding of generated audio. This
study will highlight the need to continue the research and
development of new technology that will help to encompass
generative audio.
1.1 Significance of this Study.
This study will aim to gather and provide the existing theories
in an effort to expand on, clarify and support them. Various books
and academic papers have been extensively examined in order
to tailor a perspective that can justify the reason for this study,
which aims to answer three very specific questions:
• Why are generative audio and sound modelling so important?
• How can they be applied and what methods have been
developed?
• What benefits can generative audio bring to the
Post-production industry?
In order to define the scope of this study, I have chosen to
investigate and analyse the process of modelling sound for Foley
footsteps. The purpose of this will be to study existing footstep
models, especially those exhibited in Andy Farnell’s book,
Designing Sound. Based on the studies carried out by authors
such as Roberto Bresin and Perry R. Cook, I will attempt to
formulate a footstep modelling method. This study is a compelling
effort to promote structured sound models in the post-production
industry. It will also be beneficial to sound design professionals
and students, as it will provide information and performance
evaluations of certain methods previously used in accordance
with footstep modelling. Moreover, it will be helpful to the
post-production industry and independent sound professionals,
as it will inform them more in the area of generative audio.
1.2 Problem Statement.
The human ear can only discern a limited number of sounds.
This selective attention, otherwise known as “the cocktail party
effect”, focuses on a particular stimulus, while filtering out a range
of other stimuli (Moray, 1959). Recent observations have shown
that one in seven people are able to recall information from
irrelevant sources (Wood and Cowan, 1995). Sonic events need
to happen in order for one to be able to differentiate between a
single or a continuous stream of events (Strobl, Eckel and
Rocchesso, 2006). For this reason, sensible decisions regarding
what sounds should be heard at particular times are imperative.
Sonic content has the power to enhance the narrative of a film, but it
can also distract one’s attention and create discomfort.
Over the years, re-creation and re-recording of all human
sounds in a film has been refined into an art named Foley. This
process consists of several important steps and individuals and it
can be used to soften the audio as well as to heighten scenes.
According to Vanessa Ament, a former Foley artist, many films
contain so many different sounds that the listener’s ears can
easily become overwhelmed (Ament, 2009). Foley stages are
unique in the sense that they are built with various surfaces,
covering concrete, wood, carpet, tile, linoleum, dirt, sand and even
water.
One of the reasons why low-budget films sound amateur is the
lack of recording facilities and particularly the lack of Foley stages.
This dissertation will add to research in Foley practices by using
generative audio to digitally re-construct these stages. The
rationale for combining generative audio with Foley processes is
to analyse the possible implementation of new technology that
could benefit from Foley practices in low-budget films.
Throughout the last thirty years, customised libraries have
been an essential part of post-production work. Recording assets
have become an increasing commodity; a single library can
easily compile over ten thousand individual samples. According
to David Lewis Yewdall, it will literally take years to get to know
a library (Yewdall, 2007). Having thousands of sounds collected
has relatively simplified sound design; however, sound libraries
on their own are nothing but an agglomeration of samples.
Excellent editors can create very realistic and convincing sounds,
but they will never sound as authentic as custom-recorded ones.
1.3 Layout of Dissertation.
This study will be structured in the following way. Chapter 2
consists of a literature review and the methodology. The literature
review will study the background framework of sound textures,
and different approaches for the creation of these textures will be
highlighted; it will then introduce granular synthesis and explain
how it serves to structure generative audio. An attempt will be
made to analyse the evolution of dynamically generated audio in
response to sound modelling. Accordingly, the methodology will
be discussed and both the anatomy and actions of the foot will be
examined in detail to gain a greater understanding of the modes
and dynamics of gait movement. This research project will follow
a post-positivist approach where cause and effect thinking is
reinforced. Chapter three will evaluate and analyse all data
collected in an attempt to convey a structure for footstep
modelling, which will be summarised and concluded in Chapter
four.
CHAPTER 2 GENERATIVE FOOTSTEP SOUNDS.
2.1 Literature Review.
2.1.1 Introduction.
This chapter provides a review of the literature and secondary
data related to sound texture, granular synthesis and footsteps
modelling. Accordingly, this chapter will initially discuss the
principle of sound texture, presenting some examples and
observations on the subject. Consequently, it will proceed to
define granular synthesis, followed by a description and analysis
of the evolution and development of procedural audio in response
to sound modelling. The concept of procedural audio (PA) is
closely linked to programming as it uses routines, subroutines
and methods to create, reshape and synthesise sound in real time,
thus there will also be an analysis of how these two relate to one
another. Finally, there will be a critical analysis of the benefits and
challenges of implementing procedural audio in post-production,
as well as a consideration of the failures of its implementation.
2.1.2 Sound textures.
Further studies in everyday listening, led by Gaver (Gaver,
1993) have served as the foundation for understanding sound
and hearing, particularly in the analysis and synthesis of sounds
with procedural audio. By separating contact objects from any
interaction, Gaver described individual impacts as a continuous
waveform, which characterises the force they introduce to the
object, suggesting that there may be information for interactions
that are invariant over objects; in this particular case, it is the
force exerted when a person’s body is in contact with the ground
(see figure 1). This particular topic will be examined extensively
in the upcoming subheadings. In the virtual world, interaction is
represented in terms of energy passed through a filter allowing
objects to be modelled independently. With regards to footstep
modelling, Farnell, who reflects on the generation and control of
complex signals, has also extensively researched the behaviour
and intention of sound. “Reflecting on the complexity of walking,
you will understand why film artists still dig Foley pits to produce
the nuance of footsteps, and why sampled audio is an inflexible
choice” (Farnell, 2010).
Figure 1: Sound Texture Extraction (Gaver, 1993, p 293).
Despite new contributions to this concept being theoretical, a
few implementations such as the Foley Automatic developed by
Kees van den Doel, Paul G. Kry and Dinesh K. Pai, have proven
to deliver high-quality synthetic sound. The Foley Automatic is
composed of a dynamics simulator, a graphics renderer and an
audio modeller. Interactive audio depends upon world events
where order and timing are not usually pre-determined.
According to Farnell, the common principle, which makes audio
interactive, is the need for user input. In an attempt to represent
emotional qualities, sounds need to adapt to match the mood of the
user (Farnell, 2007).
This project is based on Gaver’s foundation analysis and
synthesis of sounds, which involves an iterative process of
analysing recorded material and synthesising a duplicate on the
basis of the analysis. As described by Gaver, the criteria for
sound texture is based on conveying information about a given
aspect of the event as opposed to being perceptibly identical to
the original sound (Gaver, 1993). Nicolas Saint-Arnaud defined
sound texture in terms of constant long-term characteristics and
an attention span: “A sound texture should exhibit similar
characteristics over time. It can have local structure and
randomness but the characteristics of the fine structure must
remain constant on the large scale. A sound texture is
characterized by its sustain… Attention span is the maximum
between events before they become distinct. High level
characteristics must be exposed within the attention span of a
few seconds” (Saint-Arnaud, 1995).
Different studies have broadly approached the question of how
to perform a sound segmentation in order to create a sonic event
that resembles the original. However, no up-to-date applications
for producing sound textures are available, and the practice is still
based on manually editing recorded sound material. An increasing
number of analysis and synthesis methods for sound textures have
been formulated in the past few years, at a notable intersection of
many fields such as signal analysis, sound synthesis modelling,
information retrieval and computer graphics (Strobl, Eckel and
Rocchesso, 2006). In the context of footstep modelling, granular
synthesis arguably presents the best approach; this research
therefore studies the principles of granular synthesis in an
attempt to collect information that could lead to a better-structured
and more concise sound model.
2.1.3 Definitions and Principles of Granular Synthesis.
The concept of granular synthesis has existed for many years,
based on the arrow (or fletcher’s) paradox stated by Zeno, which
divides time into points as opposed to segments: “if everything
when it occupies an equal space is at rest, and if that which is in
locomotion is always occupying such space at any moment, the
flying arrow is therefore motionless” (Aristotle, 239). Albert
Einstein also predicted that ultrasonic vibration could occur on
the quantum level of atomic structure, which led to the concept of
acoustical quanta (Roads, 2001).
Consequently, there are various descriptions and definitions of
granular synthesis that are in existence. British scientist, Dennis
Gabor proposed that “All sounds can be decomposed into a
family of functions obtained by time and frequency shifts of a
single Gaussian particle. Any sound can be decomposed into an
appropriate combination of thousands of elementary grains”
(Gabor, 1946); such a statement was significant in the
development of time frequency analysis, and set the starting
point for granular synthesis. Roads, who implemented granular
sound processing in the digital domain, has also made several
contributions. In his book Microsound, he stated that “sound can
be considered as a succession of frames passing by at a rate too
fast to be heard as discrete events; sounds can be broken down
into a succession of events on a smaller time scale” (Roads,
2001).
For the purpose of this research project, the description
provided by Gabor with a slight variation on the pure Gaussian
curve (see figure 2) will be adopted (Farnell, 2010). A Tukey
envelope, also known as the cosine-tapered window, will be used;
this envelope attempts to smoothly set the waveform to zero at
the boundaries, evolving from a rectangle to a Hanning envelope
(Harris, 1978). It is useful to briefly consider the principles of
granular synthesis and how these affect audio. According to
Roads, (2001) a micro-acoustic event contains a waveform,
typically between one thousandth of a second and one tenth of a
second, shaped by an amplitude envelope. The components of
any grain of sound approach the minimum perceivable time for
duration, frequency and amplitude, creating time and frequency
domain information. By combining grains over time, sonic
atmospheres are created. However, granular synthesis requires
a broader amount of control data, which is usually controlled by
the user in global terms, leaving the synthesis algorithm to fill in
the details.
Figure 2: Gaussian Window (Roads, 2001, p87).
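The Tukey envelope just described can be sketched in a few lines of Python. This is an illustrative stand-in for the Pure Data implementation used in this project; the sample rate, grain duration and taper ratio below are arbitrary choices, not values taken from the patch.

```python
import math

def tukey_window(n, alpha=0.5):
    """Cosine-tapered (Tukey) window of n samples.

    alpha=0 gives a rectangular window, alpha=1 a full Hanning window;
    intermediate values taper only the edges, smoothly forcing the
    grain to zero at its boundaries to avoid clicks between grains.
    """
    edge = alpha * (n - 1) / 2.0
    w = []
    for i in range(n):
        if i < edge:                        # rising cosine taper
            w.append(0.5 * (1 + math.cos(math.pi * (i / edge - 1))))
        elif i <= (n - 1) - edge:           # flat middle section
            w.append(1.0)
        else:                               # falling cosine taper
            w.append(0.5 * (1 + math.cos(
                math.pi * ((i - (n - 1) + edge) / edge))))
    return w

def make_grain(sr=44100, dur=0.03, freq=440.0, alpha=0.5):
    """A single short sine grain shaped by a Tukey envelope."""
    n = int(sr * dur)
    win = tukey_window(n, alpha)
    return [win[i] * math.sin(2 * math.pi * freq * i / sr)
            for i in range(n)]
```

Because the taper starts and ends at zero, successive grains can be overlapped without the level mismatch that causes periodic clicking.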
Gabor (1946) observed that any signal could be expanded in
terms of elementary acoustical quanta by a process, which
includes time analysis. Grain envelopes and durations vary in a
frequency-dependent manner. However, it is the waveform within
the grain, which is the most important parameter, as it can vary
from grain to grain or be a fixed wave throughout the grain’s
duration. This implementation pointed out the biggest flaw of time
granulation, a constant level mismatch at the beginning and end
of every sampled grain, creating a micro-transient between
grains and thus, resulting in a periodic clicking sound. More
recent work has shown that when grain envelopes are
overlapped, it creates a seamless cross-fade between them
(Jones and Parks, 1988). Numerous generative audio content
has been created and extensively developed using the principles
of acoustical quanta, allowing sound designers to easily sample,
synthesise and shape audio content; producing complex but
controllable sounds with a relatively small Central Processing
Unit (CPU) usage. According to Roads, a grain generator is a
basic digital synthesis instrument, which consists of a wavetable
where amplitude is controlled by a Gaussian envelope. In this
project, the global organisation of the grains will follow an
asynchronous system, which means that the grains will be
encapsulated in regions or ‘clouds’ which are controlled by a
stochastic or chaotic algorithm.
2.1.4 Stochastic Analysis.
Stochastic event modelling is a process that involves random
variables where X = {X(t) ; 0 ≤ t < ∞}, on the synthesis level,
the aim is to generate a signal that can vary continuously
according to various parameters. Dynamic stochastic synthesis is
a concept that has existed for the last fifty years, composers such
as Xenakis, have speculated about the possibility of synthesising
completely new sonic waveforms on the basis of probability
(Harley, 2004). Xenakis’ proposals, as alternatives to the usual
methods of sound synthesis, take the form of five different
strategies (Roads, 1996):
1. The direct use of probability distributions such as
Gaussian and exponential.
2. Combining probability functions through multiplications.
3. Combining probability functions through addition (over
time).
4. Using random variables of amplitude and time as functions
of other variables.
5. Going to and fro between events using variables.
Roads describes how the user could control the grain ‘cloud’ by
adjusting certain parameters, which include (Roads, 2001):
1. The start-time and duration.
2. The grain’s duration.
3. The density of grains per second.
4. The frequency band of the cloud.
5. The amplitude envelope of the cloud.
6. Their spatial dispersion.
All these considerations will be tested and further explained in
the upcoming subheadings, where the effects of different grain
durations, densities and irregularities will be examined in more
detail.
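These cloud parameters can be illustrated with a small Python scheduler that scatters grains asynchronously across a region. The uniform distributions, parameter names and ranges below are assumptions for demonstration only, not the stochastic algorithm actually used in this project.

```python
import random

def schedule_cloud(start=0.0, cloud_dur=2.0, density=50,
                   grain_dur=(0.01, 0.05), freq_band=(200.0, 2000.0),
                   seed=None):
    """Asynchronously scatter grain onsets over a 'cloud'.

    Each grain gets a random onset inside the cloud, a random duration
    and a random frequency drawn uniformly from the given ranges.
    Returns a list of (onset, duration, frequency) tuples, sorted by
    onset, ready to be rendered by a grain generator.
    """
    rng = random.Random(seed)
    n_grains = int(density * cloud_dur)     # density = grains per second
    grains = []
    for _ in range(n_grains):
        onset = start + rng.uniform(0.0, cloud_dur)
        dur = rng.uniform(*grain_dur)
        freq = rng.uniform(*freq_band)
        grains.append((onset, dur, freq))
    return sorted(grains)
```

The user sets only the global terms (start, duration, density, bands), leaving the algorithm to fill in the per-grain details, which is exactly the division of labour described above.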
2.1.5 Procedural Audio in Response to Footstep Modelling.
Having examined the principles of granular synthesis and
determined a suitable definition for this study, it is vital to
understand the evolution and growth of procedural audio and
how the development of technology and interactive demand has
shaped its expansion. Traditionally in films, pre-recorded
samples are commonly used to simulate diegetic sounds such as
footsteps. All sonic material needs to be gathered in order to
represent what is being shown on the screen; to achieve a high
level of fidelity, sound libraries and directly recorded sounds are
implemented (Mott, 1990). However, this approach has several
disadvantages: sampled sounds are repetitive, and location
recording is not always the best or easiest option. Recent
synthetic sound models have seen an increase in interest, with
several algorithms making it possible to create sounding
objects through the use of physical principles (Cook,
2002). Despite the recognised advantages and benefits of
procedural audio, a review of the literature in this area has
revealed that the adoption of procedural audio is relatively low.
This is one of the areas of challenge that this research project
seeks to address, however before this can be measured and an
accurate research instrument created, it is also useful to
understand the state of procedural audio in the sonic industry. It
appears evident from the literature that procedural audio and
programming are inextricably linked (Javerlein, 2000). Generally,
the latter measures the success of the outcome; however, the
concept of procedural audio has been further redefined in terms
of a combination of linear, recorded, interactive, adaptive,
sequenced, synthetic, generative and artificial intelligence (AI)
audio, which suggests that what is of greatest importance in
procedural audio is the meaning we give to the input, internal
states and output of the systems. Having taken all this into
account, if programming is only a means by which one creates
meaningful sound, where does the conflict lie?
The problem with procedural audio is that there are no sets
containing the sound of a specific object and, if there are, there
is no way of searching for them. Farnell strongly believes that a
better approach to producing sound requires more traditional
mathematical approaches based on engineering and physics
(Farnell, 2007). However, dynamically generated sound is not the
answer to all these problems; there are plenty of areas where it
fails to replace recorded sound, such as dialogues and music
scores. Even though methods for research and development
have been established, practical issues continue to affect the
realism of dynamically generated sound.
Sound designers, who have adapted their skills and learned
new tools, are in the process of finding equilibrium between data
and procedural models, which is not a fast or a complete process.
Perhaps one of the greatest disadvantages of generated audio is
that it still cannot encapsulate the significant sounds of life.
Post-production sound effects seem to fall into the psychological
rather than technical category, in most cases, they reveal through
sound the acoustic landscape in which we live. Associations of
everyday sound play a decisive part in the language of sound
imagery, but they can easily be confused. “One of the reasons for
this is that we often see without hearing” (Balazs, 1949).
According to Béla Balázs, the Hungarian-Jewish film critic, “there
is a very considerable difference between our visual and acoustic
education”. We are far more used to visual forms than sound
forms; this is because we have become accustomed to seeing
and then hearing, making it rather difficult to draw conclusions
about a concrete object just by listening to it. The relationship
between visuals and sound will be further explained in section
3.2.2.1, Statistical Analysis. Sample-based audio has proven to be
successful, because its principle is to represent our acoustic
world; however, it is an impractical method as it fails to change in
accordance with the visible source. On the other hand, a single
procedural structure could accurately replace an entire sound
library; the problem does not lie in its principles but in that it
attempts to represent motifs associated with various situations in
film rather than our acoustic world. Having generated a great deal
of sample-based audio, production companies have drastically
changed our perception of sound through film, associating
melodies and sound to specific objects or situations, making it
particularly difficult for new content to take over.
2.1.6 Summary.
This literature review has studied the background and
evolution of dynamically generated audio and has also analysed
its evolution in parallel to developments in technology. It is clear
that whilst procedural audio has many obvious advantages, its
acceptance has been lower than expected and various reasons
have been suggested to explain why this might be the case. This
section has also briefly mentioned some footstep modelling
followed by a critical analysis of the benefits and challenges of
implementing procedural audio in post-production. The following
section will present the methodology that will be used during this
study.
2.2 Methodology.
2.2.1 Introduction.
In this chapter, the objectives, parameters and procedures
used in this research project are described, especially those
involved in developing dynamically generated sound, where the
process for creating a footstep-modelling analysis will be
explained.
2.2.2 Objectives.
The general objectives of this research project are:
• To review the existing knowledge on sound textures and
footstep modelling.
• To develop a method for the creation of dynamic sound
textures.
• To incorporate the previously mentioned method in
footstep sound modelling.
2.2.3 Parameters.
According to Yonathan Bard, models are designed to explain the
relationships between quantities that can be measured
independently (Bard, 1974). To understand these relationships, a
set of parameters needs to be introduced.
2.2.3.1 The Grain Envelope Analysis.
The system architecture of this model extracts and analyses
the signal with an envelope follower, which outputs the signal’s
root mean square (RMS). All significant peaks are located once
the threshold has been set. If no threshold is selected, all peaks
above 50 dB will be segmented into individual events. In order to
ensure that peaks are tracked accurately a Hanning window,
sized in samples (1024 default), has been set. Once the
envelope has marked all the significant peaks, the DSP will then
output and list all the events. Figure 3 shows a simple example of
the envelope follower’s listing process.
Figure 3: Output list.
The numbers shown in Figure 3 are expressed in milliseconds
and are applied to mark the cut-off points between events.
Significant sub events can sometimes be found within the events,
for this reason the sample gets normalised, which makes the
peak-to-peak transient recognition much more effective. This
process, however, is strictly for event recognition and is not
used as part of any playback; thus, the signal-to-noise ratio is not
raised at any moment. Each particle noise event can be
pitch-shifted, reversed, stretched and smoothed. In his analysis
of walking sounds, Cook suggested that in order to gel the sonic
events, a short and exponentially decaying noise burst should be
added, which has proven to be an exceptional addition to this
algorithm.
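A minimal Python sketch of this segmentation stage might look as follows. It is an assumption-laden stand-in for the Pure Data patch: it uses a rectangular RMS window rather than the Hanning window mentioned above, it interprets the 50 dB threshold as −50 dBFS, and it reports onsets in milliseconds, as in the output list of Figure 3.

```python
import math

def rms_envelope(signal, win=1024, hop=256):
    """Sliding-window RMS of a signal, one value per hop."""
    env = []
    for start in range(0, max(1, len(signal) - win + 1), hop):
        frame = signal[start:start + win]
        env.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return env

def detect_events(signal, sr=44100, win=1024, hop=256, thresh_db=-50.0):
    """List onset times (ms) where the RMS envelope crosses the threshold.

    An event starts when the envelope rises above the threshold and
    ends when it falls back below it, mimicking the transient
    detector's cut-off list.
    """
    thresh = 10 ** (thresh_db / 20.0)            # dB -> linear amplitude
    onsets, active = [], False
    for i, level in enumerate(rms_envelope(signal, win, hop)):
        if not active and level > thresh:
            onsets.append(1000.0 * i * hop / sr)  # frame index -> ms
            active = True
        elif active and level <= thresh:
            active = False
    return onsets
```

The returned list of millisecond values plays the same role as Figure 3's output list: cut-off points between events, to be refined by normalisation and peak-to-peak transient recognition.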
According to Roads, time appears to be reversible at the
quantum level, meaning that grains or ‘events’ can be reversed in
time. Moreover, if the grain envelope is symmetrical, the reversed
event should sound exactly the same. In Pure Data (PD), this
was easily achieved by simply reversing the output list, which
turned out to be a success as it gave the sound texture a
time-reversible feature. However, as the overall amplitude of the
samples synthesised was not symmetric, it was impossible to
demonstrate that the waveform of a grain and its reversed form
were identical. Figure 4 shows the envelope analysis process.
Figure 4: Transient Detector.
2.2.3.2 The Grain Dynamics.
In order to provide a more comprehensive interaction with the
sampled signal, several dynamics such as density, duration and
pitch of the grain were implemented. The grain density was easily
achieved by dividing the number of grains by a thousand. On the
other hand, the duration and pitch required a more precise
adjustment.
Identifying the pitch of a sound texture is extremely difficult, as
it does not possess a harmonic spectrum. However, if properly
arranged, it is possible to perceive the sound texture as being
higher or lower. Frequency and time are inversely proportional
at the micro level (Gabor, 1947). Therefore,
expanding or shortening a grain has inverse repercussions on its
frequency bandwidth, which results in an evident change of
timbral character. In order to achieve an accurate timbral change,
a two-octave bandwidth was introduced. Figure 5 shows the ‘patch’
implemented to transform the pitch of a selected event.
Figure 5: Grain Dynamics.
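The pitch transformation can be sketched as simple resampling, which ties grain duration and frequency together exactly as described: playing a grain faster shortens it and raises its spectrum, and vice versa. This Python sketch is illustrative only; the ±24 semitone clamp mirrors the two-octave range, and linear interpolation is an arbitrary implementation choice, not the method used in the patch.

```python
def pitch_shift_grain(grain, semitones):
    """Repitch a grain by linear-interpolation resampling.

    Positive semitones shorten the grain and raise its frequency
    content; negative semitones do the opposite. The shift is clamped
    to +/-24 semitones for a two-octave range.
    """
    semitones = max(-24, min(24, semitones))
    ratio = 2.0 ** (semitones / 12.0)       # playback-speed ratio
    n_out = max(1, int(len(grain) / ratio))
    out = []
    for i in range(n_out):
        pos = i * ratio                     # read position in the source
        j = int(pos)
        frac = pos - j
        a = grain[j]
        b = grain[j + 1] if j + 1 < len(grain) else grain[j]
        out.append(a + frac * (b - a))      # linear interpolation
    return out
```

Grain density, by contrast, needs no such machinery: as stated above it is simply the grain count scaled by a constant.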
2.2.3.3 Footstep-modelling.
This section describes how particles are extracted based on
Physically Inspired Stochastic Event Modelling (PhISEM).
According to Cook, who has extensively researched this area, the
parameterisation of walking sounds should involve interaction,
preferably provoked by friction or pressure from the feet. A
stochastic approach, a non-deterministic sequence of random
variables, models the probability that particles will make noise;
the sound probability is constant at each time step (Cook, 2002).
Studies have shown the human ability to perceive source
characteristics of a natural auditory event. From various analyses
applied on walking sounds, a relationship between auditory
events and acoustic structure was found. This study considered
sounds of walking and running footstep sequences on different
textures. Textures such as gravel, snow and grass were chosen,
this was motivated by the assumption that a noisy and rich sound
spectra will still be perceived by the ear as a natural sound.
Studies carried out by Roberto Bresin, who has extensively
studied new models for sound control, have shown how a double
support is created when both feet are on the ground at the same
time, suggesting there are no silent intervals between two
adjacent steps. However, not specifying a time constraint
between two particular events will blend them into a unison
texture; therefore, an ‘attention span’ has to be created between
steps in order to perceive them as separate events
(Saint-Arnaud, 1995). According to Bresin, legato and staccato
can be associated with walking and running respectively. Some of
his recent work has reported a strong connection between motion
and music performance.
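The PhISEM idea above can be caricatured in a few lines of Python: at each time step every particle makes a noise burst with a fixed probability, and the accumulated bursts decay exponentially, as in Cook's description. The particle count, probability and decay constant here are invented for illustration and are not parameters of the model developed in this project.

```python
import random

def phisem_step_events(n_steps=100, n_particles=30, p_sound=0.1,
                       decay=0.95, seed=None):
    """PhISEM-style sketch of stochastic particle excitation.

    Each particle fires with the same constant probability at every
    time step; active bursts pile up and decay exponentially. Returns
    the summed excitation energy per step, which would normally drive
    a noise source and filter.
    """
    rng = random.Random(seed)
    energy = 0.0
    out = []
    for _ in range(n_steps):
        # constant sound probability per particle per time step
        fired = sum(1 for _ in range(n_particles)
                    if rng.random() < p_sound)
        energy = energy * decay + fired     # decaying noise bursts
        out.append(energy)
    return out
```

Sweeping p_sound and decay over a step's duration is one way to mimic the texture differences between gravel, snow and grass noted above.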
Having stated several parameters that directly influence
walking sounds, it is evident that large libraries of pre-recorded
sounds do not contain every possible scenario, which greatly
compromises the sonic appreciation.
2.2.3.4 The Ground Reaction Force.
A footstep sound is a combination of multiple impact sounds
between the foot (exciter) and the floor (resonator). This model
has chosen to separate both components and consider the
exciter as an input for different types of resonators. In other
words, by extracting the pressure exerted by one’s foot, different
modes can be extracted and implemented to recreate the sounds
of different kinds of floors. In the field of mechanics the pressure
exerted by one’s body is called the Ground Reaction Force
(GRF), which derives from Newton’s third law: “To every action
there is always opposed an equal reaction: or the mutual actions
of two bodies upon each other are always equal, and directed to
contrary parts” (Newton, Motte and Machin, 2010). The
architecture of this model will use the ground reaction force
principles to find and analyse the forces that intervene in the
creation of the multiple impact sounds that constitute a footstep.
It will then apply the analysed forces to the different resonators,
creating an opposed equal reaction that will be later translated
into sound. See figure 6. In order to analyse the forces involved in
the foot’s motion, it is important to understand how they are
distributed. A normal gait is composed of two phases, a stance
phase (60%) and a swing phase (40%). The stance phase is
composed of five categories, initial contact, loading response,
mid-stance, terminal stance and pre-swing.
Figure 6: GRF Exemplified.
(http://epicmartialarts.wordpress.com/tag/ground-reaction-force/)
The swing phase consists of an initial swing, a mid-swing and
a terminal swing (Porter, 2007). All these phases exert different
forces, making it incredibly hard to translate all of these
micro-movements into sound. Farnell has proposed analysing
the gait phases not as individual events, but as a distribution of
forces. As a result, three phases become apparent as shown in
figure 7 (Farnell, 2010):
1. The Contact Phase: The heel makes contact with the ground
and the ankle rotates the foot.
2. The Mid-stance Phase: The body’s weight is shifted onto the
outer tarsal.
3. The Propulsive Phase: The foot rolls along the ground ending
up on its toes.
Figure 7: The Gait Phase.
(http://naturalrunningcenter.com/2012/06/21/walking-vs-running-gaits/)
Ideally, each gait cycle would generate identical GRF
distributions; however, they can change significantly as the
walking pace and ground level change. If this were not the case,
two complete footsteps would be sufficient to generate a walking
pattern. This introduces another variable, the movement of the
body, which fluctuates above and below the sum of the left and
right feet’s GRFs. Andy J. Farnell explained in his book Designing
Sound, the three different modes of movement (Farnell, 2010):
1. Creeping: Minimises pressure changes, which
diminishes the sound.
2. Walking: Maximises locomotion while minimising
energy expenditure.
3. Running: Accelerates locomotion.
Figure 8 exemplifies the Ground Reaction Force distribution of a
gait phase, in which the body’s weight is transferred onto the heel.
Sometimes, before the weight is completely transferred, a
transient force is experienced just before the loading response;
surprisingly, this force exceeds the normal standing force. The
weight’s distribution between the heel coming down and the toe
pushing off evens out just before the propulsive phase, where the
body’s weight is entirely on the feet.
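The distribution described above can be sketched numerically. The following is a hypothetical illustration, not the Pd patch itself: a normalised, M-shaped GRF profile built from two overlapping half-sine bumps, where the first peak briefly exceeds the standing force of 1.0 body weight. All parameter values are assumptions chosen only to reproduce the peak–dip–peak shape.

```python
import math

def grf_profile(t, transient=0.15):
    """Illustrative single-stance GRF profile, normalised to body
    weight (1.0 = standing force). t runs from 0.0 (heel contact)
    to 1.0 (toe-off). Two overlapping half-sine bumps give the
    classic M-shape: a loading peak that briefly exceeds standing
    force, a mid-stance dip, and a propulsive peak. Hypothetical
    sketch; the bump widths and the `transient` overshoot are
    assumptions, not measured values."""
    loading = (1.0 + transient) * math.sin(math.pi * min(1.0, t / 0.6))
    propulsive = 1.1 * math.sin(math.pi * max(0.0, (t - 0.4) / 0.6))
    return max(loading, propulsive)
```

Sampling this profile near the contact, mid-stance and propulsive phases reproduces the transient overshoot and the dip between the two peaks discussed above.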
Figure 8: GRF Distribution in Pure Data.
2.2.4 Procedures.
This section describes the instruments and architecture
involved in the creation of this research project. It aims to
establish an efficient workflow that could later be applied to
future work. This section will also explain how diverse theories
and models will be tested and how relevant data will be collected.
2.2.4.1 Pure Data.
The demonstration prototype that accompanies this research
project has been built using this platform. In order to run this
software, or ‘patch’, Pure Data 0.44.0 is required. Pure Data (PD)
is an open-source visual programming language developed by
Miller Puckette. It is a real-time graphical programming
environment for audio, video and graphics processing. PD was
chosen partly because it is designed for real-time processing
and because it allows fast modification of parameters, making it
extremely interactive and user-friendly (see figure 9).
Figure 9: PD Environment.
Figure 10: The Cloud.
2.2.4.2 Arduino.
In order to establish a more interactive communication
between the user and the ‘patch’, a piezo-resistive force sensor
was implemented (see figure 13). The prototyping platform
Arduino UNO creates a link between PD and the force sensor.
When pressure is applied to the sensor, Arduino reads the
analogue pin, whose value ranges from 0 to 1023, and transmits
it to the object ‘comport 9600’ in PD. Figure 11 illustrates this
process.
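The scaling step implied above can be sketched on the receiving side. This is a hypothetical illustration of the 0–1023 mapping, not the actual Arduino sketch or Pd patch shown in the figures; the `sensitivity` gain is an assumption standing in for the patch’s sensor-sensitivity control.

```python
def scale_reading(raw, sensitivity=1.0):
    """Map a 10-bit Arduino analogue reading (0-1023) to a
    normalised force value in the range 0.0-1.0, mirroring what
    PD receives through the 'comport' object at 9600 baud.
    `sensitivity` is a hypothetical gain; the output is clipped
    so that noise spikes cannot exceed full scale."""
    force = (raw / 1023.0) * sensitivity
    return max(0.0, min(1.0, force))
```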
Figure 11: Code in Arduino.
2.2.5 Architecture.
This footstep model has been inspired by Perry
Cook and Andy J. Farnell’s approaches to
walking sounds. Their investigations into
parametrised synthesis, especially granularity,
have been of great help. Figure 12 illustrates the
signal flow of this prototype. Based on Roads’
idea of user control, this ‘patch’ routes all the
information to a common ‘cloud’ (see figure 10)
where the user can easily modify the dynamics of
the grain, as well as the sensitivity of the feet
sensors. All seven parameters mentioned in
section 2.1.4 (see page 11) were taken into
account when designing this ‘patch’. The sensors
define the start time and duration of this process
(1). The grain duration is specified by the option
‘smooth’, which divides its input into a 100ms
window and adds it to the transient’s size (2).
Similarly, the density of grains per second
(grains/1000ms) is specified by the option ‘grains’
(3). Two band-pass filters determine the
frequency band of the cloud (4). An amplitude
envelope and a freeverb~ (a custom PD reverb)
have also been incorporated, giving the user the
option of custom-shaping the signal before it
reaches the output (5 & 6).
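As a rough illustration of the grain parameters just described (the ‘grains’ density and the ‘smooth’ duration window), the scheduling logic might be sketched as follows. Function names and the uniform-random onset model are assumptions for illustration, not objects in the actual Pd patch.

```python
import random

def grain_duration(transient_ms, smooth_window_ms=100):
    """Grain duration as described for the 'smooth' option:
    a 100 ms window added to the transient's size."""
    return smooth_window_ms + transient_ms

def schedule_grains(density, total_ms=1000, seed=None):
    """Return sorted grain onset times (in ms) for a one-second
    cloud, given a density in grains per 1000 ms. Onsets are drawn
    uniformly at random, a simple stand-in for the asynchronous
    scheduling of the grain cloud (hypothetical sketch)."""
    rng = random.Random(seed)
    return sorted(rng.uniform(0.0, total_ms) for _ in range(density))
```

For example, a density of 50 grains with a 20 ms transient yields fifty onsets spread over the second, each grain lasting 120 ms.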
Figure 12: Architecture.
Figure 13: Prototype.
In order to accurately transcribe and digitise the sensor’s
information, a split-phase and a polynomial curve have been
incorporated. The split-phase converts the input given by the
sensors into a signal that can be later scanned by the Phasor~
object in PD. It combines both feet and creates a time constraint
between them, defining an ‘Attention Span’ that fools the ear
into perceiving both inputs as separate events (Saint-Arnaud,
1995). The polynomial curve is defined by the equation (Farnell,
2010):
f(x) = −1.5n (x³ − x)(1 − x), where 0 ≤ n < 1
Figure 14 illustrates the envelope of the polynomial curve for
the minimum and maximum values of n. These curves create a
small envelope for each of the three gait phases aforementioned.
See Figure 8. As mentioned in section 2.2.3.1, a burst of noise
has also been added to the ‘patch’, which contributes to the
randomness of the stochastic analysis and helps to mask any
imperfections of the grain selection, if any. A low-pass filter has
been attached to the noise generator, so that high frequencies
can be added or filtered out. This white noise is triggered directly
by the sensor pad.
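Farnell’s polynomial envelope can be evaluated directly; a minimal sketch:

```python
def gait_envelope(x, n):
    """Polynomial envelope f(x) = -1.5n(x^3 - x)(1 - x) from
    Farnell (2010), for 0 <= x <= 1 and 0 <= n < 1. Since
    x^3 - x <= 0 on [0, 1], the curve is non-negative, zero at
    both ends, and peaks in between, giving the small
    per-phase envelope described above."""
    if not (0.0 <= n < 1.0):
        raise ValueError("n must satisfy 0 <= n < 1")
    return -1.5 * n * (x ** 3 - x) * (1.0 - x)
```

With n = 0.5, for instance, the envelope is zero at x = 0 and x = 1 and positive at x = 0.5; with n = 0 it vanishes everywhere, matching the minimum curve in figure 14.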
In order to evaluate the accuracy and precision of these
methods, external feedback will be collected; this will be
explained further in the following chapter.
Figure 14: Polynomial Curves.
2.2.6 Summary.
This methodology has extensively analysed the existing
knowledge on sound textures with granular synthesis in order to
develop a method for the creation of dynamically generated
textures (see page 16). It then proceeded to integrate the
aforementioned model into a footstep model created from the
behavioural analysis conducted in section 2.2.3.3 (see page 19).
It has also described the architecture of the prototype designed
as part of this research. The following chapter will present the
Evaluation Process that was used for this project.
CHAPTER 3 EVALUATION.
3.1 Introduction.
The evaluation process presented in this study uses a mixed
method design. According to John W. Creswell, analysing both
quantitative and qualitative data helps to understand the research
problem thoroughly (Creswell, 2002). A mixed method design is
based upon pragmatic statements, which accept the truth as a
normative argument. Interesting opinions have been given
regarding mixed methods; however, the issue of distinguishing
between aesthetic assumptions has not yet been addressed
(Sale, Lohfeld and Brazil, 2002). This research project will use a
sequential explanatory mixed methods design; according to
Creswell (2002), this method is “the most
straightforward of the six major mixed method approaches”,
which is an advantage as it organises data more efficiently. This
method collects and analyses quantitative data and then goes on
to collect and analyse qualitative data. Kenneth R. Howe, an
educational researcher, stated that researchers should forge
ahead only with what works. Following this statement, this study
introduced three topics in order to structure the design of this
research project: Priority, Implementation and Integration
(Creswell, Plano Clark, Gutmann & Hanson, 2003).
a) Which of these methods, quantitative or qualitative
will be emphasised in this study?
b) Will data collection come in sequence or in
chronological stages?
c) How will this data be integrated?
Special priority will be given to quantitative data, with the
qualitative results used to support those obtained in the
quantitative stage. For the purposes of efficiency, data was collected and
integrated in chronological stages, which offered a more
comprehensive and broader landscape of the gleaned
information.
3.2. Quantitative Data.
Michel Chion explained, in his book Audio-Vision, how
sounds can objectively evoke impressions without necessarily
relating to their source (Chion, 1990). A combination of
synchronism and synthesis, coined by Chion as synchresis,
describes the mental fusion between sounds and visuals, when
they occur simultaneously. According to Chion (Chion, 1990,
p115), when a precise expectation of sound is set up, synchresis
predisposes the spectator to accept the sound he or she hears.
With regards to footsteps, Chion refers to synchresis as
unstoppable stating that, “We can therefore use just about any
sound effects for these footsteps that we might desire” (Chion,
1990, p. 64). As an example, he referred to the film comedy Mon
Oncle, by the French filmmaker Jacques Tati, where a variety of
noises for human footsteps, involving Ping-Pong balls and
glass objects were used. One of the purposes of this survey was
to demonstrate how synchronised sound textures could fool the
ear into thinking that real footsteps are being played. In order to
achieve this, a total of ten clips were played to an audience,
containing a mixture of Foley, location recordings and generated
sounds. A non-probability sampling approach was
used for this research project, as it is not the purpose of this study
to infer from the sample to the general population but to add to
the knowledge of this study.
3.2.1 The Data Collection Method.
Data collection mostly consisted of observations, where several
audio samples were compared to those created with the
aforementioned model. A self-developed survey was structured,
containing items of different formats such as multiple-choice and
dichotomous questions. Colin Robson describes surveys as a
very effective method in collecting data from a specific population,
or a sample from that population (Robson, 2002). Similarly,
they are widely accepted as a key tool in conducting and applying
research methods (Rossi, Wright and Anderson, 1983). The
survey consisted of five questions, which were divided into two
sections. The first section of this analysis asked questions related
to the participants’ status (Audio or Film student). The second
section measured the participants’ ability to differentiate between
recorded and dynamically generated sounds (See Appendix A).
This model sought to understand the individuals’ perception of
diegetic sounds.
The quantitative data was collected on the 29th and 30th May
2013 at the School of Audio Engineering (SAE) House in east London,
U.K. The survey was distributed to a specific population of
students (Audio and Film students). In total thirty individuals were
given surveys. Based on Howe’s statement (see 3.1), the goals
of the surveys were to identify what sound textures participants
believed to be real. Two independent variables were introduced;
these were Recorded and Generated Sounds, which were played
at random to the participants. As mentioned above, a total of ten
short clips, containing five different sound textures, were prepared
for this survey. The first group of participants surveyed were
mostly Audio students; a brief explanation of ‘attention span’
and of the layout of the audio was given prior to the survey.
first participants were asked to listen to just the audio of the
short-clips. A fifteen-second gap between clips was given, not
only for them to draw their own conclusions (as an informal
conversational interview) but also to allow their short-term
memory to ‘forget’ the sonic information, which they had gathered.
According to George A. Miller, the duration of our short-term memory
seems to be between fifteen and thirty seconds (Miller, 1956).
This way, the average audio-visual span disappears from one’s
mind, allowing new data to be processed clearly. The second part
of the survey combined both picture and sound. The structure of
the survey (see Appendix A) contained three basic questions,
which were aimed at investigating the participants’ relation to
sound libraries. The questions how would you rate the content?
and what do you look for in sound libraries? were an excellent
start, which led to an open debate conducted after the survey.
This offered even more data for discussion and research.
3.2.2 Research Findings
This section describes the results of the survey by initially
assessing the descriptive statistics in order to specify the different
variables and characteristics that were measured. An analysis of
the remaining variables and aspects of the survey will also be
presented. As described in the previous section, the research
population comprised thirty research participants, using a
non-probability sampling approach. The quantitative variables of
this project were collected on two different days, as it was very
difficult to integrate both audio and film students together. In
order to accurately measure both departments, this research
project has surveyed a total of fifteen audio students, one audio
specialist, thirteen film students and two film specialists. The first
part of the survey (see Appendix A) established how many
participants had used sound libraries for their particular projects.
As seen in figure 15, when asked about the content of such
libraries in question two, 40% of both audio and film students
thought their quality was poor.
Figure 15: Question 2. (Audio students: Consistent high quality 20%, Generally good 20%, Quality varies 20%, Poor 40%. Film students: Quality varies 60%, Poor 40%.)
However, ‘poor’ is a very vague term. Are
the contents of these libraries poor in sonic quality? Or are they
poor because they do not meet the user’s needs? In order to
clarify this concept, a follow-up question was introduced:
What do you look for in sound libraries? As shown in figure 16,
Audio and Film students look for very different and specific
material. 67% of the audio students surveyed specifically looked
for ambience sounds, whereas 57% of the film students surveyed
looked for Foley sounds. Several hypotheses can be
drawn from these results. The perception of sound in film goes
far beyond the pure physics of the sonic spectrum. Throughout
history, film producers have chosen to artificially construct the
sound of their films (Gorbman, 1976).
Advances in technology have expanded the creative
possibilities of filmmakers and sound designers; the difference
lies in how these sonic experiences are created. Based on the
data collected, one could easily assume that film students have
an internal approach to sound (Chion, 1990). Physiological
sounds such as breathing and moans, or more subjective sounds
such as a memory or a mental voice can easily be achieved by
using Foley and ADR (Automated Dialogue Replacement)
practices, which might explain why their main concern, when
browsing through a sound library, are Foley sounds. On the other
hand, audio students seek to describe the ‘soundscape’ of the
picture, either by recreating the sonic characteristics of the
environment or by artificially creating a completely new sonic
environment.
Figure 16: Question 3. (Audio students: Foley 16%, Ambience 67%, Other 17%. Film students: Foley 57%, Ambience 29%, FX 14%.)
Another question arises from these two hypotheses. This is,
how is the quality of such libraries perceived, if their contents are
listened to as part of a group of sounds? This is a very important
question as it strives to understand our perception of artificially
constructed sound. Being able to generate audio
means nothing if it does not work in the context it was designed
for. In order to understand this matter, the aforementioned
footstep sounds (see 2.2.3.3) were played along with ambience
sounds as well as different sound effects and dynamics. The
results will be analysed in the next section.
3.2.2.1 Statistical Analysis
This section examines the results of the statistical analysis
collected from the second part of the survey. It tries to understand
how generative audio can be implemented in postproduction
processes. It should be noted at the outset of this analysis that
this research followed a non-probability sampling technique. According to
researchers such as Frederick J. Gravetter, convenience sampling
is probably the most adequate method to use when the
population to be investigated is too large. Participants were
therefore selected based on their accessibility and proximity to
the researcher. Although convenience sampling does not offer
any guarantee of a representative sample, it collects basic data
that could later be analysed or used as a pilot study (Gravetter,
2011, p. 151). In order to ensure that each variable was properly
evaluated, they were examined one at a time, and a series of visual
displays was created to help explain the relationships between
the variables examined in this study. A total of ten short clips
were presented to the participants to answer the question, where
do you think unrealistic sounds have been placed? Participants
were given a scale from one to five to rate each clip’s realism. The
films that were used for this experiment were: Terminator 2:
Judgment Day (1991), mixed by the American sound designer
Gary Rydstrom; Pulp Fiction (1994), mixed by David Bartlett;
Mon Oncle (1958), produced by Jacques Tati; and Here (2013),
produced as part of my portfolio. Clips one, three, four and five
were re-mixed in order to introduce the footsteps generated by
the ‘patch’ developed. The purpose of this experiment was to
determine what combination of sounds seemed the most realistic
to the participant. The results of this experiment are shown in
Appendix B. This research study conducted a T-Test and a
Chi-squared test. The aim was to understand whether there was
a significant difference between how participants rated the clips
with generated sounds and how they rated the clips with
recorded sounds. As noted in section 1.2 (see page 3) this
dissertation aims to add to the research in Foley practices by
using generative audio. It is not therefore a comparative analysis
between recorded and generative audio. A combination of
generated footsteps was presented to the participants in clips 1,
3, 4 and 5. Table 1 shows the average ‘quality’ that the
participants gave to generated and recorded audio respectively.
PARTICIPANT   GENERATED AUDIO   RECORDED AUDIO
1             3.25              2.16
2             4.00              2.50
3             2.50              3.50
4             3.75              3.50
5             1.00              2.16
6             4.50              3.50
7             3.25              3.16
8             3.75              2.83
9             2.50              2.50
10            2.50              2.30
11            2.50              2.50
12            3.00              2.00
13            3.25              2.50
14            2.75              4.00
15            4.00              2.75
16            2.30              3.00
17            2.16              4.50
18            3.50              3.25
19            2.16              3.75
20            2.30              2.75
21            3.00              2.50
22            2.00              2.30
23            3.25              2.16
24            3.00              3.50
25            2.16              3.00
26            3.50              3.25
27            3.00              2.75
28            3.00              3.00
29            3.25              2.00
30            4.00              3.00
AVERAGE       2.969333333       2.885666667
STDEV.        0.75013991        0.619625489
Table 1: Average Quality.
As seen in table 1, it is possible to conclude that there is no
statistical difference between the perceived quality of generated
and recorded audio; this conclusion is based on their standard
deviation values, which clearly show that the average values
of both groups overlap. In order to critically assess these
values, a T-test was conducted, aimed at understanding
how reliable these differences were likely to be.
3.2.2.2 T-Test
• Null Hypothesis H0: (GA = RA). There is no discernible sonic difference between recorded audio and
generated audio.
• Alternative Hypothesis H1: (GA < RA). Recorded audio possesses better sonic qualities. Therefore, there is a
significant difference between recorded audio and generated audio.
• Alternative Hypothesis H2: (GA > RA).
Generative audio possesses better sonic qualities. Therefore, there is
a significant difference between recorded audio and generated audio.
All data was computed using Microsoft Excel (figure 17).
Additionally, this set of results was compared to those obtained
at www.graphpad.com (see Appendix C), from where this
research concluded that the two tailed probability (p) value of the
data equalled 0.639. This probability value does not provide
enough evidence to reject the Null Hypothesis (H0), as there is
no evidence to prove that there is a significant difference
between recorded and generated audio. However, this does not
mean that the Null Hypothesis is true. A couple of conclusions
can be drawn from this test:
• The population surveyed could not discern between recorded and
generated audio.
• An average of 3 (Good Quality) was given to the clips containing
generative audio (See Appendix A).
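The reported p value can be reproduced from the summary statistics in Table 1. The sketch below computes Welch's t statistic using only the standard library; the original test was run in Excel and at www.graphpad.com, so this is a cross-check, not the original procedure.

```python
import math

# Summary statistics taken from Table 1 (30 participants per condition).
N = 30
MEAN_GEN, SD_GEN = 2.969333333, 0.75013991
MEAN_REC, SD_REC = 2.885666667, 0.619625489

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic for two independent samples with
    unequal variances."""
    return (m1 - m2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

t = welch_t(MEAN_GEN, SD_GEN, N, MEAN_REC, SD_REC, N)
# t is about 0.47; with roughly 56 degrees of freedom this is
# consistent with the two-tailed p of 0.639 quoted in the text.
```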
Figure 17: T-Test in Excel.
3.2.2.3 Chi-Square
Null Hypothesis H0: (As = Fs)
There is no difference between how Audio and Film students
perceive audio ‘quality’.
DEPARTMENT    GENERATED AUDIO   RECORDED AUDIO   GRAND TOTAL
AUDIO         3.1               2.790666667      5.890666667
FILM          2.838666667       2.980666667      5.819333333
GRAND TOTAL   5.938666667       5.771333333      11.71
Table 2: Chi-Square.
EXPECTED VALUES:
DEPARTMENT    GENERATED AUDIO   RECORDED AUDIO   GRAND TOTAL
AUDIO         2.987421501       2.903245166      5.890666667
FILM          2.951245166       2.868088168      5.819333333
GRAND TOTAL   5.938666667       5.771333333      11.71
Table 3: Expected Values.
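The expected values and the quoted p value can be reproduced from the grand totals. The sketch below derives each expected cell as row total × column total / grand total and, for one degree of freedom, obtains the p value via the complementary error function; this is a stdlib cross-check of the Excel computation, not the original spreadsheet.

```python
import math

# Observed mean ratings from Table 2
# (rows: Audio, Film; columns: Generated, Recorded).
obs = [[3.1, 2.790666667],
       [2.838666667, 2.980666667]]

row_totals = [sum(row) for row in obs]
col_totals = [sum(col) for col in zip(*obs)]
grand = sum(row_totals)

# Expected values as in Table 3: row total * column total / grand total.
exp = [[rt * ct / grand for ct in col_totals] for rt in row_totals]

# Chi-square statistic and its p value for one degree of freedom,
# using the identity P(X > x) = erfc(sqrt(x / 2)) when df = 1.
chi2 = sum((obs[i][j] - exp[i][j]) ** 2 / exp[i][j]
           for i in range(2) for j in range(2))
p = math.erfc(math.sqrt(chi2 / 2.0))  # about 0.895, as reported
```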
The p value obtained from Excel was 0.895, which means that
this project cannot reject the null hypothesis; therefore, there
is no evidence of a difference in how audio and film students
perceive sound. Moreover, the independent chi-square contributions
for the audio and film students were 0.008607859 and 0.008713374
respectively, both far below the critical value at the 0.05
significance level, which further highlights why this hypothesis
cannot be rejected.
3.2.3 The Results and Evaluation.
As mentioned in section 3.2, the main purpose of collecting
quantitative data was to demonstrate how synchresis could trick
the human ear into thinking that real footsteps are being played
on screen. From the information collated, it is easy to conclude that
whether or not there is a significant difference between recorded
and generated audio, the outcome of the latter has the potential
of being equally as good as recorded audio. The last part of this
section sought to understand whether film students had a more
internal approach to sound and just how different this approach
was from that of audio students. It later became apparent that
there is no actual difference between how audio and film students
perceive sound; this is probably due to a level of subjectivity that
is always present. It was therefore not possible to verify such
statements.
3.3 Qualitative Data.
Qualitative research is often criticised for lacking rigour, where
the terms ‘reliable’ and ‘valid’ are usually associated with data
obtained by quantitative methods. However, in this mixed method
design the qualitative data collected is oriented to support the
findings of the quantitative phase. This qualitative research was
divided into two sections, a one-to-one interview with Andy
Farnell and a post-survey discussion supported by an e-mail
interview with Gillian McIver. The Norwegian psychologist Steinar
Kvale expressed in his book ‘Doing Interviews’ (Kvale, 2008)
that in order to successfully conduct an interview, pilot testing
must be implemented. This pilot testing was conducted informally
with audio students as a conversational interview, where new and
interesting follow-up questions helped to refine the
topics discussed in both interviews. Researchers such as
Creswell, Goodchild and Turner have broadly studied mixed
method designs. According to Creswell, the advantages of a
mixed design are its easy implementation and in-depth
exploration of quantitative data (Creswell, 2002). However,
quantitative results may show no significant differences, and the
whole process can be slow, as it requires a lengthy amount of
time to complete.
3.3.1 Data Collection Method.
According to Monique Hennink, the author of Qualitative
Research Methods, an in-depth interview “is a one-to-one
method of data collection that involves an interviewer and an
interviewee discussing specific topics in depth” (Hennink, 2011). I
had the opportunity to arrange a face-to-face interview with Andy
Farnell. This fifteen-minute in-depth interview is available to listen
to online at
www.juliantellez.com/interactiveaudio/Farnell.wav. The purpose
of this interview was to gain further knowledge of the efficiency,
design and implementation of generative audio. Five
conversational questions were put to Farnell; not only did
he give clear insight into all the aforementioned topics,
but he also shared his perspectives with regard to the needs of
audio and film.
In order to assist with the results obtained by the survey, a
couple of interviews were conducted. The structure of these
standardised, open-ended interviews included five questions
where the content was grounded in the results of the statistical
analysis, which was extracted from the survey. The participants,
A. J. Farnell and Gillian McIver (a Canadian filmmaker, writer and
visual artist), were interviewed using a standardised interview
approach to ensure that the same general areas of information
were collected from both of them. Additionally, Paul Groom,
Alessandro Ugo and Daria Fissoun (Film specialists) were also
contacted. As described by Sharan B. Merriam (Merriam, 1998),
in regards to qualitative data, collection and analysis occurred
simultaneously. According to McNamara (McNamara, 2008),
there is potentially a lack of consistency in the way questions are
posed, meaning that respondents may or may not be answering
the same questions. For this reason, the interviews were conducted
via e-mail, not only to ensure consistency between them but
also to make it easier for the participants to analyse the questions,
allowing them to contribute as much detailed information as they
desired.
3.3.2 Research Findings.
This section presents the conclusions from the data collected;
the first section describes in detail the interview design for Farnell.
Subsequently the second section further expands upon the
conclusions, which were drawn on completion of the first
interview.
3.3.2.1 One-to-One Interview.
In answer to the question what do you think are the
possibilities of module-based DSPs such as PD becoming a
prominent audio engine solution?, Andy expressed that DSPs are
intended to fill the ‘gap’ between the user’s level of expertise and
the high-level user interaction offered by DAWs (Digital Audio
Workstations). However, as far as the possibilities go, their
flexibility has earned them a place in the audio industry.
A follow-up question was introduced, asking about the flexibility
of DSPs and how this flexibility is perceived when linked to
generated audio. Farnell described an apparent hierarchical stack
that constitutes generated sounds: behaviour, model and
implementation. When asked which of them was most important,
he emphasised that ‘design’ (behaviour plus model) was more
important than implementation, adding “… when you have a great
model, then you can use various kinds of methods…”; the
outcomes will be equally good, because the behavioural
analysis, which encapsulates model and method, facilitates the
implementation. Some proof of this, he noted, is the work that
was done by Dan Stowell from Queen Mary University of London’s
research group, the Centre for Digital Music. Stowell has re-written most of the
examples from Farnell’s textbook in SuperCollider
instead of PD. Farnell stressed that although implementation is
exchangeable, there is still a huge gap between the design and
the user’s implementation. Physically controlled implementation,
as proposed by Farnell, is the best way to research this issue. In
answer to the question, do you think generative sound could
potentially meet the needs of the film industry? Farnell introduced
a very interesting analogy, where he related generative sounds
as the beginning of a more sophisticated approach to audio. “… I
think in the next ten years you will have a CGA (Computer
Generated Audio) in Hollywood… CGA is much more powerful
than CGI (Computer Generated Imagery) because there is a
spectrum where they can be mixed with traditional techniques…
Most people won’t know the difference between generated and
recorded audio” (Farnell, 2013). Personally, I found this
interview, and especially the aforementioned analogy, very
inspiring. I believe that it is possible to restructure the
post-production workflow by analysing and designing the sound
of a particular location stage, so that the sounds created by
performers could be used at any location.
3.3.2.2 e-Interviewing.
E-mail interviewing turned out to be more flexible,
convenient and less obtrusive than a conventional interview.
However, as it took a lot longer than the previous discussions and
interviews, only the information provided by Gillian McIver will be
analysed (Appendix D). The questions were introduced
generically in order to get more objective answers. The rationale
to this stems from a short discussion I had with some film
students where they expressed discontent with audio, especially
sound libraries. In answer to the question why is it that film-audio
is secondary in the film industry? McIver outlined that the
problem does not lie in the industry but in education, mentioning
that there was a clear division between both departments, so if
the problem lies in education, how can both parties overcome
difficulties such as correct audio replacement and authentic sonic
representations? Just like a DSP fills the gap between expertise
and interaction, I believe that there is a gap where the expertise
of signal processing can meet the production needs by means of
interaction. When asked about the emphasis the film industry
puts on the creation of sound technology, McIver replied: “Most
do not think about it.” Judging by this answer, one could conclude
that if any sound technology aimed at the film industry
were to be developed in the near future, it would have to be
embedded and, more importantly, interactive and user-friendly.
CHAPTER 4 CONCLUSION.
The techniques used for the generation and control of grain
signals were studied extensively throughout this research project.
A special emphasis was placed on structuring a footstep model
that enabled an instant interaction between the user and the DSP.
It encompassed some of the studies carried out by A. J. Farnell,
P. R. Cook and R. Bresin. In adherence to these studies, a
process of evaluation and testing was also conducted alongside
the footstep method, formulated in this research project. It was a
compelling effort to promote generative audio in the
postproduction industry.
The analysis of sound synthesis with procedural audio was
reviewed in great detail, where different approaches for the
creation of sound textures were highlighted. Consequently, it
defined the evolution and development of generated audio in
response to sound modelling. This was achieved by structuring
the associations between everyday sounds and sound imagery.
The criteria used for this project convey information that
characterises an individual sound by the force that the body
exerts upon it (Gaver, 1993). From the evidence given by the
aforementioned authors, a study of the background and evolution
of dynamically generated audio was collected; this outlined its
advantages and drawbacks. A complete separation between
contact objects and interaction was achieved.
The main findings create an intersection between sound
synthesis (see section 2.1.3, p. 8), signal analysis (see section
2.1.5, p. 12) and user interaction (see section 2.2.5, p. 26)
(Strobl, Eckel and Rocchesso, 2006). Additionally, an evaluation
phase was introduced, in which several statistical tests were
conducted in order to corroborate the findings (see section
3.2.2.1, p. 35).
As noted at the end of sections 3.3.2.1 and 3.3.2.2 (pp. 43-44),
sound technology has enormous potential, which will most
certainly be explored in years to come. Recent advances have
placed sound technology in a very prominent position, allowing for
efficient interaction and productivity. As far as footstep modelling
goes, there are endless possibilities (in terms of sound textures)
where further studies can be conducted. I have emphasised the
importance of user interactivity throughout this research project;
by adding GRF recognition, this study addresses that requirement,
allowing the ‘patch’ to identify the user’s gait characteristics (see
section 2.2.3.4, p. 20). However, it is still a prototype, and some
adjustments will be made in the near future.
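A deliberately simplified sketch of the kind of GRF-based gait recognition described in section 2.2.3.4 is given below, in Python rather than the Pure Data/Arduino prototype. The threshold and the synthetic force trace are illustrative assumptions: a step is counted each time the force rises through the threshold, and the peak force of each contact phase is recorded.

```python
def gait_features(grf, sample_rate_hz, threshold=50.0):
    """Extract simple gait characteristics from a ground-reaction-force trace.

    Counts steps by threshold crossing, records the peak force of each
    contact phase, and derives cadence in steps per minute. A simplified
    sketch, not the dissertation's implementation.
    """
    steps, peaks = 0, []
    in_contact = False
    peak = 0.0
    for f in grf:
        if not in_contact and f >= threshold:
            in_contact, peak = True, f   # contact phase begins: new step
            steps += 1
        elif in_contact:
            if f >= threshold:
                peak = max(peak, f)      # still in contact: track the peak
            else:
                in_contact = False       # contact phase ends
                peaks.append(peak)
    if in_contact:
        peaks.append(peak)
    duration_s = len(grf) / sample_rate_hz
    cadence = steps / duration_s * 60.0 if duration_s else 0.0
    return {"steps": steps, "peak_forces": peaks, "cadence_spm": cadence}

# Two synthetic contacts in a 2-second trace sampled at 10 Hz:
trace = [0, 0, 200, 600, 300, 0, 0, 0, 250, 800, 400, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(gait_features(trace, 10.0))
# -> {'steps': 2, 'peak_forces': [600, 800], 'cadence_spm': 60.0}
```

Peak force and cadence are exactly the kind of per-user characteristics the patch could use to tune grain amplitude and step timing.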
The principles of GRF apply to every mass on Earth; it would
certainly be interesting to recreate any sound by simply extracting
sound textures from the environment (Gaver, 1993).
This piece of work is intended to promote the use of generated
audio in the film industry. As this research has shown, there are
numerous applications for these methods within the
post-production sector. However, further research and study are
necessary in order to make generated audio a standard practice.
APPENDICES.
Appendix A.
Survey. 29th May 2013 London, U.K.
Footstep synthesis.
Please take a moment to analyse the clips. When you’re done, please answer the following questions:
ABOUT YOU.
□ Audio Student. □ Film Student. □ None.
How would you rate the content of these libraries?
□ Consistent high quality. □ Generally good. □ Quality varies. □ Poor quality.
Have you ever used sound libraries? □ Yes. □ No.
What do you look for in sound libraries? □ Foley sounds. □ Ambience sounds. □ Fx. □ Other. _______________
ABOUT THE CLIPS. Please rate the clips on a scale from 1 to 5: (1) poor, (2) fair, (3) good, (4) very good, (5) outstanding.
CLIP    1   2   3   4   5
1       □   □   □   □   □
2       □   □   □   □   □
3       □   □   □   □   □
4       □   □   □   □   □
5       □   □   □   □   □
6       □   □   □   □   □
7       □   □   □   □   □
8       □   □   □   □   □
9       □   □   □   □   □
10      □   □   □   □   □
Thank you for your participation!
Appendix B.
[Bar charts: survey ratings (poor to outstanding) of Clips 1-9, charted separately for audio students and film students.]
Appendix C.
[Bar charts: survey ratings (poor to outstanding) of Clip 10, charted separately for audio students and film students.]
Appendix D.
Appendix E.
Transient’s representation patch:
REFERENCES.
Ament, V. (2009). The Foley Grail: The Art of Performing Sound for Film, Games, and Animation. Oxford: Focal Press.
Balazs, B. (1949). Theory of Film: Sound. London: Dennis Dobson Ltd.
Bard, Y. (1974). Nonlinear Parameter Estimation. New York: Academic Press.
Chion, M. (1990). Audio Vision. New York: Columbia University Press.
Cook, P. (2002). Real Sound Synthesis for Interactive Applications. Massachusetts: A K Peters, Ltd.
Creswell, J.W. (2002). Research Design: Qualitative, Quantitative and Mixed Methods Approaches. New York: SAGE Publications Ltd.
Creswell, J.W., Plano Clark, V. & Hanson, W. (2003). Advanced Mixed Methods Research Design. Thousand Oaks: SAGE Publications Ltd.
Farnell, A. (2007). Marching Onwards: Procedural Synthetic Footsteps for Video Games and Animation. Proceedings of the Pure Data Convention.
Farnell, A. (2010). Designing Sound. London: MIT Press.
Gabor, D. (1946). Theory of communication. Journal of the Institution of Electrical Engineers, Part III, 93, 429-457.
Gabor, D. (1947). Acoustical quanta and the theory of hearing. Nature, 159, 591-594.
Gaver, W. (1993). How Do We Hear in the World?: Explorations in Ecological Acoustics. Ecological Psychology, 5 (4), 292-297.
Gorbman, C. (1976). Teaching the Soundtrack. Quarterly Review of Film and Video.
Gravetter, F.J. & Wallnau, L.B. (2011). Essentials of Statistics for the Behavioral Sciences. 7th ed. Belmont, CA: Thomson/Wadsworth.
Harley, J. (2004). Xenakis: His Life in Music. New York: Routledge. 215-218.
Harris, F. (1978). On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform. Proceedings of the IEEE.
Hennink, M., Hutter, I. & Bailey, A. (2011). Qualitative Research Methods. New Jersey: SAGE Publications Ltd.
Järveläinen, H. (2000). Algorithmic Musical Composition. Helsinki University of Technology, Tik-111.080 Seminar on Content Creation.
Jones, D. & Parks, T. (1988). Generation and Combination of Grains for Music Synthesis. Computer Music Journal, 12 (2), 27-34.
McNamara, C. (2008). General Guidelines for Conducting Interviews. Available: http://managementhelp.org/businessresearch/interviews.htm. Last accessed 8th May 2013.
Merriam, S.B. (1998). Qualitative Research and Case Study Applications in Education. San Francisco: Jossey-Bass.
Miller, G.A. (1956). The Magical Number Seven, Plus or Minus Two: Some Limits on our Capacity for Processing Information. The Psychological Review.
Moray, N. (1959). Attention in dichotic listening: Affective cues and the influence of instructions. Quarterly Journal of Experimental Psychology, 11, 56-60.
Mott, R. (1990). Sound Effects: Radio, TV and Film. Boston: Focal Press.
Newton, Sir I., Motte, A. & Machin, J. (2010). The Mathematical Principles of Natural Philosophy, Volume 1. Charleston, South Carolina: Nabu Press.
Porter, D. & Schon, L. (2007). Baxter's The Foot and Ankle in Sport. 2nd ed. Missouri: Mosby.
Roads, C. (2001). Microsound. London: MIT Press. 85-118.
Saint-Arnaud, N. (1991). Classification of Sound Textures. Master of Science in Telecommunications. Université Laval, Quebec.
Sale, J., Lohfeld, L. & Brazil, K. (2002). Revisiting the Quantitative-Qualitative Debate: Implications for Mixed-Methods Research. Netherlands: Kluwer Academic Publishers.
Strobl, G., Eckel, G. & Rocchesso, D. (2006). Sound Texture Modelling: A Survey. Proceedings of the Sound and Music Computing Conference.
Wood, N. & Cowan, N. (1995). The Cocktail Party Phenomenon Revisited: How Frequent Are Attention Shifts to One's Name in an Irrelevant Auditory Channel? Journal of Experimental Psychology: Learning, Memory and Cognition, 21 (1), 225-260.
Yewdall, D. (2011). The Practical Art of Motion Picture Sound. 4th ed. Oxford: Focal Press.
59
BIBLIOGRAPHY.
Ament, V. (2009). The Foley Grail: The Art of Performing Sound for Film, Games, and Animation. Oxford: Focal Press.
Balazs, B. (1949). Theory of Film: Sound. London: Dennis Dobson Ltd.
Bard, Y. (1974). Nonlinear Parameter Estimation. New York: Academic Press.
Bresin, R., Friberg, A. & Dahl, S. (2001). Toward a New Model for Sound Control. Proceedings of the COST G-6 Conference on Digital Audio Effects.
Bresin, R. & Fontana, F. (2003). Physics-Based Sound Synthesis and Control: Crushing, Walking and Running by Crumpling Sounds. Proceedings of the XIV Colloquium on Musical Informatics.
Chion, M. (1990). Audio Vision. New York: Columbia University Press.
Cook, P. (1999). Toward Physically-Informed Parametric Synthesis of Sound Effects. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
Cook, P. (2002). Real Sound Synthesis for Interactive Applications. Massachusetts: A K Peters, Ltd.
Cook, P. (2002). Modeling Bill's Gait: Analysis and Parametric Synthesis of Walking Sounds. Audio Engineering Society 22nd International Conference. 1-3.
Creswell, J.W. (2002). Research Design: Qualitative, Quantitative and Mixed Methods Approaches. New York: SAGE Publications Ltd.
Creswell, J.W., Plano Clark, V. & Hanson, W. (2003). Advanced Mixed Methods Research Design. Thousand Oaks: SAGE Publications Ltd.
Dahl, S. (2000). The playing of an accent: Preliminary observations from temporal and kinematic analysis of percussionists. Journal of New Music Research, 29 (3), 225-234.
Dannenberg, R. & Derenyi, I. (1998). Combining Instrument and Performance Models for High-Quality Music Synthesis. Carnegie Mellon University, Pennsylvania.
Farnell, A. (2007). Marching Onwards: Procedural Synthetic Footsteps for Video Games and Animation. Proceedings of the Pure Data Convention.
Farnell, A. (2010). Designing Sound. London: MIT Press.
Forrester, M. (2006). Auditory Perception and Sound as Event: Theorising Sound Imagery in Psychology. Available: http://www.kent.ac.uk/arts/sound-journal/index.html. Last accessed 8th May 2013.
Gabor, D. (1946). Theory of communication. Journal of the Institution of Electrical Engineers, Part III, 93, 429-457.
Gabor, D. (1947). Acoustical quanta and the theory of hearing. Nature, 159, 591-594.
Gaver, W. (1993). How Do We Hear in the World?: Explorations in Ecological Acoustics. Ecological Psychology, 5 (4), 292-297.
Gorbman, C. (1976). Teaching the Soundtrack. Quarterly Review of Film and Video.
Gravetter, F.J. & Wallnau, L.B. (2011). Essentials of Statistics for the Behavioral Sciences. 7th ed. Belmont, CA: Thomson/Wadsworth.
Hahn, J., Geigel, J., Gritz, L., Takala, T. & Mishra, S. (1995). An Integrated Approach to Audio and Motion. Journal of Visualization and Computer Animation, 6 (2), 109-129.
Harley, J. (2004). Xenakis: His Life in Music. New York: Routledge. 215-218.
Harris, F. (1978). On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform. Proceedings of the IEEE.
Hennink, M., Hutter, I. & Bailey, A. (2011). Qualitative Research Methods. New Jersey: SAGE Publications Ltd.
Howe, K.R. (1988). Against the Quantitative-Qualitative Incompatibility Thesis or Dogmas Die Hard. Educational Researcher.
Järveläinen, H. (2000). Algorithmic Musical Composition. Helsinki University of Technology, Tik-111.080 Seminar on Content Creation.
Jenkins, J. & Ellis, C. (2007). Using Ground Reaction Forces from Gait Analysis: Body Mass as a Weak Biometric. Fifth International Conference on Pervasive Computing.
Jones, D. & Parks, T. (1988). Generation and Combination of Grains for Music Synthesis. Computer Music Journal, 12 (2), 27-34.
Lostchocolatelab. (2010). Audio Implementation Greats No. 8: Procedural Audio Now. Available: http://designingsound.org/2010/09/audio-implementation-greats-8-procedural-audio-now/. Last accessed 8th May 2013.
McNamara, C. (2008). General Guidelines for Conducting Interviews. Available: http://managementhelp.org/businessresearch/interviews.htm. Last accessed 8th May 2013.
Merriam, S.B. (1998). Qualitative Research and Case Study Applications in Education. San Francisco: Jossey-Bass.
Milicevic, M. (2008). Film Sound Beyond Reality: Subjective Sound in Narrative Cinema. Available: http://filmsound.org/articles/beyond.htm#pet5. Last accessed 8th May 2013.
Miller, G.A. (1956). The Magical Number Seven, Plus or Minus Two: Some Limits on our Capacity for Processing Information. The Psychological Review.
Moray, N. (1959). Attention in dichotic listening: Affective cues and the influence of instructions. Quarterly Journal of Experimental Psychology, 11, 56-60.
Mott, R. (1990). Sound Effects: Radio, TV and Film. Boston: Focal Press.
Newton, Sir I., Motte, A. & Machin, J. (2010). The Mathematical Principles of Natural Philosophy, Volume 1. Charleston, South Carolina: Nabu Press.
Nordahl, R., Serafin, S. & Turchet, L. (2009). Extraction of Ground Reaction Forces for Real Time Synthesis of Walking Sounds. Proceedings of the Audio Mostly Conference.
Nordahl, R., Serafin, S. & Turchet, L. (2010). Sound Synthesis and Evaluation of Interactive Footsteps for Virtual Reality Applications.
O'Brien, J., Cook, P. & Essl, G. (2001). Synthesising Sounds from Physically Based Motion. Computer Graphics Proceedings, Annual Conference Series.
Porter, D. & Schon, L. (2007). Baxter's The Foot and Ankle in Sport. 2nd ed. Missouri: Mosby.
Roads, C. (1988). Introduction to Granular Synthesis. Computer Music Journal, 12 (2), 11-13.
Roads, C. (1996). The Computer Music Tutorial. Massachusetts: MIT Press. 338-342.
Roads, C. (2001). Microsound. London: MIT Press. 85-118.
Robson, C. (2002). Real World Research: A Resource for Social Scientists and Practitioner-Researchers. 2nd ed. New Jersey: Wiley.
Rowe, R. (1993). Interactive Music Systems: Machine Listening and Composing. Cambridge: MIT Press.
Rowe, R. (1999). The Aesthetics of Interactive Music Systems. Contemporary Music Review, 18 (3), 83-87.
Saint-Arnaud, N. (1991). Classification of Sound Textures. Master of Science in Telecommunications. Université Laval, Quebec.
Sale, J., Lohfeld, L. & Brazil, K. (2002). Revisiting the Quantitative-Qualitative Debate: Implications for Mixed-Methods Research. Netherlands: Kluwer Academic Publishers.
Strobl, G., Eckel, G. & Rocchesso, D. (2006). Sound Texture Modelling: A Survey. Proceedings of the Sound and Music Computing Conference.
Strobl, G. (2007). Parametric Sound Texture Generator. Graz University, Styria.
Truax, B. (1993). Time-Shifting and Transposition of Sampled Sound with a Real-Time Granulation Technique. Proceedings of the International Computer Music Conference.
Turchet, L. & Serafin, S. (2011). A Preliminary Study on Sound Delivery Methods for Footstep Sounds. Proceedings of the 14th International Conference on Digital Audio Effects.
Turner, D. (2010). Qualitative Interview Design: A Practical Guide for Novice Investigators. The Qualitative Report, 15, 754-760.
Wood, N. & Cowan, N. (1995). The Cocktail Party Phenomenon Revisited: How Frequent Are Attention Shifts to One's Name in an Irrelevant Auditory Channel? Journal of Experimental Psychology: Learning, Memory and Cognition, 21 (1), 225-260.
Yewdall, D. (2011). The Practical Art of Motion Picture Sound. 4th ed. Oxford: Focal Press.