GENERATIVE FOOTSTEPS: SOUNDS FOR FILM POSTPRODUCTION
by
Julián Téllez Méndez
A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF BACHELOR OF SCIENCE IN THE SCHOOL OF AUDIO ENGINEERING
MIDDLESEX UNIVERSITY
JUNE 2013
ABSTRACT.
This dissertation adds to the research in post-production practices
by using generative audio to digitally re-construct Foley stages. The
rationale for combining generative audio with Foley processes is to
analyse the possible implementation of new technology that could
benefit from Foley practices in low-budget films. This research project
also intersects sound synthesis, signal analysis and user interaction,
where a behavioural analysis based on ground reaction forces was
prototyped.
ACKNOWLEDGEMENT.
I would like to dedicate this dissertation to Andy J. Farnell, whose
expertise helped me immensely in writing this essay. To Gillian
McIver, Helena Hollis and Philippa Embley, who never ceased
helping me right until the very end.
To God for His divine intervention in this academic achievement.
To my mother Zoraida Méndez and grandmother Bertha Daza for
making me a better person. To my aunts Martha and Bellky
Méndez for their unconditional support; without you, nothing
would have been possible.
TABLE OF CONTENTS.
CHAPTER 1 INTRODUCTION.
1.1 SIGNIFICANCE OF THIS STUDY.
1.2 PROBLEM STATEMENT.
1.3 LAYOUT OF DISSERTATION.
CHAPTER 2 GENERATIVE FOOTSTEP SOUNDS.
2.1 LITERATURE REVIEW.
2.1.1 INTRODUCTION.
2.1.2 SOUND TEXTURES.
2.1.3 DEFINITIONS AND PRINCIPLES OF GRANULAR SYNTHESIS.
2.1.4 STOCHASTIC ANALYSIS.
2.1.5 PROCEDURAL AUDIO IN RESPONSE TO FOOTSTEP MODELLING.
2.1.6 SUMMARY.
2.2 METHODOLOGY.
2.2.1 INTRODUCTION.
2.2.2 OBJECTIVES.
2.2.3 PARAMETERS.
2.2.3.1 The Grain Envelope Analysis.
2.2.3.2 The Grain Dynamics.
2.2.3.3 Footstep-modelling.
2.2.3.4 The Ground Reaction Force.
2.2.4 PROCEDURES.
2.2.4.1 Pure Data.
2.2.4.2 Arduino.
2.2.5 ARCHITECTURE.
2.2.6 SUMMARY.
CHAPTER 3 EVALUATION.
3.1 INTRODUCTION.
3.2 QUANTITATIVE DATA.
3.2.1 THE DATA COLLECTION METHOD.
3.2.2 RESEARCH FINDINGS.
3.2.2.1 STATISTICAL ANALYSIS.
3.2.2.2 T-Test.
3.2.2.3 Chi-Square.
3.2.3 THE RESULTS AND EVALUATION.
3.3 QUALITATIVE DATA.
3.3.1 DATA COLLECTION METHOD.
3.3.2 RESEARCH FINDINGS.
3.3.2.1 One-to-One Interview.
3.3.2.2 e-Interviewing.
CHAPTER 4 CONCLUSION.
APPENDICES.
APPENDIX A.
APPENDIX B.
APPENDIX C.
APPENDIX D.
APPENDIX E.
APPENDIX F.
APPENDIX G.
REFERENCES.
BIBLIOGRAPHY.
LIST OF TABLES.
Table 1: Average Quality.
Table 2: Chi-Square.
Table 3: Expected Values.
LIST OF FIGURES.
Figure 1: Sound Texture Extraction.
Figure 2: Gaussian Window.
Figure 3: Output list.
Figure 4: Transient Detector.
Figure 5: Grain Dynamics.
Figure 6: GRF Exemplified.
Figure 7: The Gait Phase.
Figure 8: GRF Distribution in Pure Data.
Figure 9: PD Environment.
Figure 10: The Cloud.
Figure 11: Code in Arduino.
Figure 12: Architecture.
Figure 13: Prototype.
Figure 14: Polynomial Curves.
Figure 15: Question 2.
Figure 16: Question 3.
Figure 17: T-Test in Excel.
CHAPTER 1 INTRODUCTION.
This project will focus on the use of granular synthesis
techniques for dynamically generated audio with a main
emphasis on film post-production. In particular, footstep
modelling will be studied extensively. The results will be
compared with those obtained from previously recorded content
including Foley and several location recordings. Creating
dynamically generated audio, otherwise known as Procedural
Audio (PA), is a practice that involves using programmable sound
structures. This allows the user to manipulate audio by
establishing the input, internal and output parameters to
ultimately develop a non-repetitive and meaningful sound
(Farnell, 2010).
Different types of technology have called on a number of
methods to attempt to provide a quick and efficient solution for
audio, especially on interactive applications such as video games.
Many of these sources and methods are discussed below,
however it is beyond the scope of this study to try and resolve
these issues once and for all. They will undoubtedly cause
controversy and debate for many years to come. This work, on
the other hand, aims to contribute to the existing evidence that
should add to a better understanding of generated audio. This
study will highlight the need to continue the research and
development of new technology that will help to encompass
generative audio.
1.1 Significance of this Study.
This study will aim to gather and provide the existing theories
in an effort to expand on, clarify and support them. Various books
and academic papers have been extensively examined in order
to tailor a perspective that can justify the reason for this study,
which aims to answer three very specific questions:
• Why are generative audio and sound modelling so important?
• How can they be applied and what methods have been
developed?
• What benefits can generative audio bring to the
Post-production industry?
In order to define the scope of this study, I have chosen to
investigate and analyse the process of modelling sound for Foley
footsteps. The purpose of this will be to study existing footstep
models, especially those exhibited in Andy Farnell’s book,
Designing Sound. Based on the studies carried out by authors
such as Roberto Bresin and Perry R. Cook, I will attempt to
formulate a footstep modelling method. This study is a compelling
effort to promote structured sound models in the post-production
industry. It will also be beneficial to sound design professionals
and students, as it will provide information and performance
evaluations of certain methods previously used in accordance
with footstep modelling. Moreover, it will be helpful to the
post-production industry and independent sound professionals,
as it will inform them more in the area of generative audio.
1.2 Problem Statement.
The human ear can only discern a limited number of sounds.
This selective attention, otherwise known as “the cocktail party
effect”, focuses on a particular stimulus, while filtering out a range
of other stimuli (Moray, 1959). Recent observations have shown
that one in seven people are able to recall information from
irrelevant sources (Wood and Cowan, 1995). Sonic events need
to happen in order for one to be able to differentiate between a
single or a continuous stream of events (Strobl, Eckel and
Rocchesso, 2006). For this reason, sensible decisions regarding
what sounds should be heard at particular times are imperative.
Sonic content has the power to enhance the narrative of a film, but it
can also distract one’s attention and create discomfort.
Over the years, re-creation and re-recording of all human
sounds in a film has been refined into an art named Foley. This
process consists of several important steps and individuals and it
can be used to soften the audio as well as to heighten scenes.
According to Vanessa Ament, a former Foley artist, many films
contain so many different sounds that the listener’s ears can
easily become overwhelmed (Ament, 2009). Foley stages are
unique in the sense that they are built with various surfaces,
covering concrete, wood, carpet, tile, linoleum, dirt, sand and even
water.
One of the reasons why low-budget films sound amateur is the
lack of recording facilities and particularly the lack of Foley stages.
This dissertation will add to research in Foley practices by using
generative audio to digitally re-construct these stages. The
rationale for combining generative audio with Foley processes is
to analyse the possible implementation of new technology that
could benefit from Foley practices in low-budget films.
Throughout the last thirty years, customised libraries have
been an essential part of post-production work. Recording assets
have become an increasing commodity; a single library can
easily compile over ten thousand individual samples. According
to David Lewis Yewdall, it will literally take years to get to know
a library (Yewdall, 2007). Having thousands of sounds collected
has relatively simplified sound design; however, sound libraries
on their own are nothing but an agglomeration of samples.
Excellent editors can create very realistic and convincing sounds,
but they will never sound as authentic as custom-recorded ones.
1.3 Layout of Dissertation.
This study will be structured in the following way. Chapter 2
consists of a literature review and the methodology. The literature
review will study the background framework of sound textures,
and different approaches for the creation of these textures will be
highlighted; it will then introduce granular synthesis and explain
how it serves to structure generative audio. An attempt will be
made to analyse the evolution of dynamically generated audio in
response to sound modelling. Accordingly, the methodology will
be discussed and both the anatomy and actions of the foot will be
examined in detail to gain a greater understanding of the modes
and dynamics of gait movement. This research project will follow
a post-positivist approach where cause and effect thinking is
reinforced. Chapter three will evaluate and analyse all data
collected in an attempt to convey a structure for footstep
modelling, which will be summarised and concluded in Chapter
four.
CHAPTER 2 GENERATIVE FOOTSTEP SOUNDS.
2.1 Literature Review.
2.1.1 Introduction.
This chapter provides a review of the literature and secondary
data related to sound texture, granular synthesis and footsteps
modelling. Accordingly, this chapter will initially discuss the
principle of sound texture, presenting some examples and
observations on the subject. Consequently, it will proceed to
define granular synthesis, followed by a description and analysis
of the evolution and development of procedural audio in response
to sound modelling. The concept of procedural audio (PA) is
closely linked to programming as it uses routines, subroutines
and methods to create, reshape and synthesise sound in real time,
thus there will also be an analysis of how these two relate to one
another. Finally, there will be a critical analysis of the benefits and
challenges of implementing procedural audio in post-production,
as well as a consideration of the failures of its implementation.
2.1.2 Sound textures.
Further studies in everyday listening, led by Gaver (Gaver,
1993) have served as the foundation for understanding sound
and hearing, particularly in the analysis and synthesis of sounds
with procedural audio. By separating contact objects from any
interaction, Gaver described individual impacts as a continuous
waveform, which characterises the force they introduce to the
object, suggesting that there may be information for interactions
that are invariant over objects; in this particular case, it is the
force exerted when a person’s body is in contact with the ground
(see figure 1). This particular topic will be examined extensively
in the upcoming subheadings. In the virtual world, interaction is
represented in terms of energy passed through a filter allowing
objects to be modelled independently. With regards to footstep
modelling, Farnell, who reflects on the generation and control of
complex signals, has also extensively researched the behaviour
and intention of sound. “Reflecting on the complexity of walking,
you will understand why film artists still dig Foley pits to produce
the nuance of footsteps, and why sampled audio is an inflexible
choice” (Farnell, 2010).
Figure 1: Sound Texture Extraction (Gaver, 1993, p 293).
Despite new contributions to this concept being theoretical, a
few implementations such as the Foley Automatic developed by
Kees van den Doel, Paul G. Kry and Dinesh K. Pai, have proven
to deliver high-quality synthetic sound. The Foley Automatic is
composed of a dynamics simulator, a graphics renderer and an
audio modeller. Interactive audio depends upon world events
where order and timing are not usually pre-determined.
According to Farnell, the common principle, which makes audio
interactive, is the need for user input. In an attempt to represent
emotional qualities, sounds need to adapt to match the mood of the
user (Farnell, 2007).
This project is based on Gaver’s foundation analysis and
synthesis of sounds, which involves an iterative process of
analysing recorded material and synthesising a duplicate on the
basis of the analysis. As described by Gaver, the criteria for
sound texture is based on conveying information about a given
aspect of the event as opposed to being perceptibly identical to
the original sound (Gaver, 1993). Nicolas Saint-Arnaud defined
sound texture in terms of constant long-term characteristics and
an attention span: “A sound texture should exhibit similar
characteristics over time. It can have local structure and
randomness but the characteristics of the fine structure must
remain constant on the large scale. A sound texture is
characterized by its sustain… Attention span is the maximum
between events before they become distinct. High level
characteristics must be exposed within the attention span of a
few seconds” (Saint-Arnaud, 1995).
Different studies have broadly approached the question of how
to perform a sound segmentation in order to create a sonic event
that resembles the original. However, no up-to-date applications
for producing sound textures are available, and the practice is still
based on manually editing recorded sound material. An increasing
number of analysis and synthesis methods for sound textures have
been formulated in the past few years, at a notable intersection of
many fields such as signal analysis, sound synthesis modelling,
information retrieval and computer graphics (Strobl, Eckel and
Rocchesso, 2006). In the context of footstep modelling, granular
synthesis arguably presents the best approach; this research
therefore studies the principles of granular synthesis in an
attempt to collect information that could lead to a better-structured
and more concise sound model.
2.1.3 Definitions and Principles of Granular Synthesis.
The concept of granular synthesis has existed for many years,
based on the arrow (or fletcher’s) paradox stated by Zeno, which
divides time into points as opposed to segments: “if everything
when it occupies an equal space is at rest, and if that which is in
locomotion is always occupying such space at any moment, the
flying arrow is therefore motionless” (Aristotle, 239). Albert
Einstein also predicted that ultrasonic vibration could occur on
the quantum level of atomic structure, which led to the concept of
acoustical quanta (Roads, 2001).
Consequently, there are various descriptions and definitions of
granular synthesis that are in existence. British scientist, Dennis
Gabor proposed that “All sounds can be decomposed into a
family of functions obtained by time and frequency shifts of a
single Gaussian particle. Any sound can be decomposed into an
appropriate combination of thousands of elementary grains”
(Gabor, 1946); such a statement was significant in the
development of time frequency analysis, and set the starting
point for granular synthesis. Roads, who implemented granular
sound processing in the digital domain, has also made several
contributions. In his book Microsound, he stated that “sound can
be considered as a succession of frames passing by at a rate too
fast to be heard as discrete events; sounds can be broken down
into a succession of events on a smaller time scale” (Roads,
2001).
For the purpose of this research project, the description
provided by Gabor with a slight variation on the pure Gaussian
curve (see figure 2) will be adopted (Farnell, 2010). A Tukey
envelope, also known as the cosine-tapered window, will be used;
this envelope attempts to smoothly set the waveform to zero at
the boundaries, evolving from a rectangle to a Hanning envelope
(Harris, 1978). It is useful to briefly consider the principles of
granular synthesis and how these affect audio. According to
Roads, (2001) a micro-acoustic event contains a waveform,
typically between one thousandth of a second and one tenth of a
second, shaped by an amplitude envelope. The components of
any grain of sound approach the minimum perceivable time for
duration, frequency and amplitude, creating time and frequency
domain information. By combining grains over time, sonic
atmospheres are created. However, granular synthesis requires
a broader amount of control data, which is usually controlled by
the user in global terms, leaving the synthesis algorithm to fill in
the details.
Figure 2: Gaussian Window (Roads, 2001, p87).
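The Tukey envelope just described can be sketched in a few lines of Python. This is an illustrative stand-in for the Pure Data implementation used in this project; the sample rate, grain duration and taper ratio below are arbitrary choices, not values taken from the patch.

```python
import math

def tukey_window(n, alpha=0.5):
    """Cosine-tapered (Tukey) window of n samples.

    alpha=0 gives a rectangular window, alpha=1 a full Hanning window;
    intermediate values taper only the edges, smoothly forcing the
    grain to zero at its boundaries to avoid clicks between grains.
    """
    edge = alpha * (n - 1) / 2.0
    w = []
    for i in range(n):
        if i < edge:                        # rising cosine taper
            w.append(0.5 * (1 + math.cos(math.pi * (i / edge - 1))))
        elif i <= (n - 1) - edge:           # flat middle section
            w.append(1.0)
        else:                               # falling cosine taper
            w.append(0.5 * (1 + math.cos(
                math.pi * ((i - (n - 1) + edge) / edge))))
    return w

def make_grain(sr=44100, dur=0.03, freq=440.0, alpha=0.5):
    """A single short sine grain shaped by a Tukey envelope."""
    n = int(sr * dur)
    win = tukey_window(n, alpha)
    return [win[i] * math.sin(2 * math.pi * freq * i / sr)
            for i in range(n)]
```

Because the taper starts and ends at zero, successive grains can be overlapped without the level mismatch that causes periodic clicking.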
Gabor (1946) observed that any signal could be expanded in
terms of elementary acoustical quanta by a process, which
includes time analysis. Grain envelopes and durations vary in a
frequency-dependent manner. However, it is the waveform within
the grain, which is the most important parameter, as it can vary
from grain to grain or be a fixed wave throughout the grain’s
duration. This implementation pointed out the biggest flaw of time
granulation, a constant level mismatch at the beginning and end
of every sampled grain, creating a micro-transient between
grains and thus, resulting in a periodic clicking sound. More
recent work has shown that when grain envelopes are
overlapped, it creates a seamless cross-fade between them
(Jones and Parks, 1988). Numerous generative audio content
has been created and extensively developed using the principles
of acoustical quanta, allowing sound designers to easily sample,
synthesise and shape audio content; producing complex but
controllable sounds with a relatively small Central Processing
Unit (CPU) usage. According to Roads, a grain generator is a
basic digital synthesis instrument, which consists of a wavetable
where amplitude is controlled by a Gaussian envelope. In this
project, the global organisation of the grains will follow an
asynchronous system, which means that the grains will be
encapsulated in regions or ‘clouds’ which are controlled by a
stochastic or chaotic algorithm.
2.1.4 Stochastic Analysis.
Stochastic event modelling is a process that involves random
variables where X = {X(t) ; 0 ≤ t < ∞}, on the synthesis level,
the aim is to generate a signal that can vary continuously
according to various parameters. Dynamic stochastic synthesis is
a concept that has existed for the last fifty years, composers such
as Xenakis, have speculated about the possibility of synthesising
completely new sonic waveforms on the basis of probability
(Harley, 2004). Xenakis’ proposals, as alternatives to the usual
methods of sound synthesis, take the form of five different
strategies (Roads, 1996):
1. The direct use of probability distributions such as
Gaussian and exponential.
2. Combining probability functions through multiplications.
3. Combining probability functions through addition (over
time).
4. Using random variables of amplitude and time as functions
of other variables.
5. Going to and fro between events using variables.
Roads describes how the user could control the grain ‘cloud’ by
adjusting certain parameters, which include (Roads, 2001):
1. The start-time and duration.
2. The grain’s duration.
3. The density of grains per second.
4. The frequency band of the cloud.
5. The amplitude envelope of the cloud.
6. Their spatial dispersion.
All these considerations will be tested and further explained in
the upcoming subheadings, where the effects of different grain
durations, densities and irregularities will be examined in more
detail.
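These cloud parameters can be illustrated with a small Python scheduler that scatters grains asynchronously across a region. The uniform distributions, parameter names and ranges below are assumptions for demonstration only, not the stochastic algorithm actually used in this project.

```python
import random

def schedule_cloud(start=0.0, cloud_dur=2.0, density=50,
                   grain_dur=(0.01, 0.05), freq_band=(200.0, 2000.0),
                   seed=None):
    """Asynchronously scatter grain onsets over a 'cloud'.

    Each grain gets a random onset inside the cloud, a random duration
    and a random frequency drawn uniformly from the given ranges.
    Returns a list of (onset, duration, frequency) tuples, sorted by
    onset, ready to be rendered by a grain generator.
    """
    rng = random.Random(seed)
    n_grains = int(density * cloud_dur)     # density = grains per second
    grains = []
    for _ in range(n_grains):
        onset = start + rng.uniform(0.0, cloud_dur)
        dur = rng.uniform(*grain_dur)
        freq = rng.uniform(*freq_band)
        grains.append((onset, dur, freq))
    return sorted(grains)
```

The user sets only the global terms (start, duration, density, bands), leaving the algorithm to fill in the per-grain details, which is exactly the division of labour described above.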
2.1.5 Procedural Audio in Response to Footstep Modelling.
Having examined the principles of granular synthesis and
determined a suitable definition for this study, it is vital to
understand the evolution and growth of procedural audio and
how the development of technology and interactive demand has
shaped its expansion. Traditionally in films, pre-recorded
samples are commonly used to simulate diegetic sounds such as
footsteps. All sonic material needs to be gathered in order to
represent what is being shown on the screen; to achieve a high
level of fidelity, sound libraries and directly recorded sounds are
implemented (Mott, 1990). However, this approach has several
disadvantages: sampled sounds are repetitive, and location
recording is not always the best or easiest option. Recent
synthetic sound models have seen an increase in interest, with
several algorithms making it possible to create sounding
objects through the use of physical principles (Cook,
2002). Despite the recognised advantages and benefits of
procedural audio, a review of the literature in this area has
revealed that the adoption of procedural audio is relatively low.
This is one of the areas of challenge that this research project
seeks to address, however before this can be measured and an
accurate research instrument created, it is also useful to
understand the state of procedural audio in the sonic industry. It
appears evident from the literature that procedural audio and
programming are inextricably linked (Javerlein, 2000). Generally,
the latter measures the success of the outcome; however, the
concept of procedural audio has been further redefined in terms
of a combination of linear, recorded, interactive, adaptive,
sequenced, synthetic, generative and artificial intelligence (AI)
audio, which suggests that what is of greatest importance in
procedural audio is the meaning we give to the input, internal
states and output of the systems. Having taken all this into
account, if programming is only a means by which one creates
meaningful sound, where does the conflict lie?
The problem with procedural audio is that there are no sets
containing the sound of a specific object and, if there are, there
is no way of searching for them. Farnell strongly believes that a
better approach to producing sound requires more traditional
mathematical approaches based on engineering and physics
(Farnell, 2007). However, dynamically generated sound is not the
answer to all these problems; there are plenty of areas where it
fails to replace recorded sound, such as dialogues and music
scores. Even though methods for research and development
have been established, practical issues continue to affect the
realism of dynamically generated sound.
Sound designers, who have adapted their skills and learned
new tools, are in the process of finding equilibrium between data
and procedural models, which is not a fast or a complete process.
Perhaps one of the greatest disadvantages of generated audio is
that it still cannot encapsulate the significant sounds of life.
Post-production sound effects seem to fall into the psychological
rather than technical category, in most cases, they reveal through
sound the acoustic landscape in which we live. Associations of
everyday sound play a decisive part in the language of sound
imagery, but they can easily be confused. “One of the reasons for
this is that we often see without hearing” (Balazs, 1949).
According to Béla Balázs, the Hungarian-Jewish film critic, “there
is a very considerable difference between our visual and acoustic
education”. We are far more used to visual forms than sound
forms; this is because we have become accustomed to seeing
and then hearing, making it rather difficult to draw conclusions
about a concrete object just by listening to it. The relationship
between visuals and sound will be further explained in section
3.2.2.1, Statistical Analysis. Sample-based audio has proven to be
successful, because its principle is to represent our acoustic
world; however, it is an impractical method as it fails to change in
accordance with the visible source. On the other hand, a single
procedural structure could accurately replace an entire sound
library; the problem does not lie in its principles but in that it
attempts to represent motifs associated with various situations in
film rather than our acoustic world. Having generated a great deal
of sample-based audio, production companies have drastically
changed our perception of sound through film, associating
melodies and sound to specific objects or situations, making it
particularly difficult for new content to take over.
2.1.6 Summary.
This literature review has studied the background and
evolution of dynamically generated audio and has also analysed
its evolution in parallel to developments in technology. It is clear
that whilst procedural audio has many obvious advantages, its
acceptance has been lower than expected and various reasons
have been suggested to explain why this might be the case. This
section has also briefly mentioned some footstep modelling
followed by a critical analysis of the benefits and challenges of
implementing procedural audio in post-production. The following
section will present the methodology that will be used during this
study.
2.2 Methodology.
2.2.1 Introduction.
In this chapter, the objectives, parameters and procedures
used in this research project are described, especially those
involved in developing dynamically generated sound, where the
process for creating a footstep-modelling analysis will be
explained.
2.2.2 Objectives.
The general objectives of this research project are:
• To review the existing knowledge on sound textures and
footstep modelling.
• To develop a method for the creation of dynamic sound
textures.
• To incorporate the previously mentioned method in
footstep sound modelling.
2.2.3 Parameters.
According to Yonathan Bard, models are designed to explain the
relationships between quantities that can be measured
independently (Bard, 1974). To understand these relationships, a
set of parameters needs to be introduced.
2.2.3.1 The Grain Envelope Analysis.
The system architecture of this model extracts and analyses
the signal with an envelope follower, which outputs the signal’s
root mean square (RMS). All significant peaks are located once
the threshold has been set. If no threshold is selected, all peaks
above 50 dB will be segmented into individual events. In order to
ensure that peaks are tracked accurately a Hanning window,
sized in samples (1024 default), has been set. Once the
envelope has marked all the significant peaks, the DSP will then
output and list all the events. Figure 3 shows a simple example of
the envelope follower’s listing process.
Figure 3: Output list.
The numbers shown in Figure 3 are expressed in milliseconds
and are applied to mark the cut-off points between events.
Significant sub events can sometimes be found within the events,
for this reason the sample gets normalised, which makes the
peak-to-peak transient recognition much more effective. This
process, however, is strictly for event recognition and is not
used as part of any playback; thus, the signal-to-noise ratio is not
raised at any moment. Each particle noise event can be
pitch-shifted, reversed, stretched and smoothed. In his analysis
of walking sounds, Cook suggested that in order to gel the sonic
events, a short and exponentially decaying noise burst should be
added, which has proven to be an exceptional addition to this
algorithm.
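A minimal Python sketch of this segmentation stage might look as follows. It is an assumption-laden stand-in for the Pure Data patch: it uses a rectangular RMS window rather than the Hanning window mentioned above, it interprets the 50 dB threshold as −50 dBFS, and it reports onsets in milliseconds, as in the output list of Figure 3.

```python
import math

def rms_envelope(signal, win=1024, hop=256):
    """Sliding-window RMS of a signal, one value per hop."""
    env = []
    for start in range(0, max(1, len(signal) - win + 1), hop):
        frame = signal[start:start + win]
        env.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return env

def detect_events(signal, sr=44100, win=1024, hop=256, thresh_db=-50.0):
    """List onset times (ms) where the RMS envelope crosses the threshold.

    An event starts when the envelope rises above the threshold and
    ends when it falls back below it, mimicking the transient
    detector's cut-off list.
    """
    thresh = 10 ** (thresh_db / 20.0)            # dB -> linear amplitude
    onsets, active = [], False
    for i, level in enumerate(rms_envelope(signal, win, hop)):
        if not active and level > thresh:
            onsets.append(1000.0 * i * hop / sr)  # frame index -> ms
            active = True
        elif active and level <= thresh:
            active = False
    return onsets
```

The returned list of millisecond values plays the same role as Figure 3's output list: cut-off points between events, to be refined by normalisation and peak-to-peak transient recognition.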
According to Roads, time appears to be reversible at the
quantum level, meaning that grains or ‘events’ can be reversed in
time. Moreover, if the grain envelope is symmetrical, the reversed
event should sound exactly the same. In Pure Data (PD), this
was easily achieved by simply reversing the output list, which
turned out to be a success as it gave the sound texture a
time-reversible feature. However, as the overall amplitude of the
samples synthesised was not symmetric, it was impossible to
demonstrate that the waveform of a grain and its reversed form
were identical. Figure 4 shows the envelope analysis process.
Figure 4: Transient Detector.
2.2.3.2 The Grain Dynamics.
In order to provide a more comprehensive interaction with the
sampled signal, several dynamics such as density, duration and
pitch of the grain were implemented. The grain density was easily
achieved by dividing the number of grains by a thousand. On the
other hand, the duration and pitch required a more precise
adjustment.
Identifying the pitch of a sound texture is extremely difficult, as
it does not possess a harmonic spectrum. However, if properly
arranged, it is possible to perceive the sound texture as being
higher or lower. Frequency and time are inversely proportional
at the micro level (Gabor, 1947). Therefore,
expanding or shortening a grain has inverse repercussions on its
frequency bandwidth, which results in an evident change of
timbral character. In order to achieve an accurate timbral change,
a two-octave bandwidth was introduced. Figure 5 shows the ‘patch’
implemented to transform the pitch of a selected event.
Figure 5: Grain Dynamics.
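The pitch transformation can be sketched as simple resampling, which ties grain duration and frequency together exactly as described: playing a grain faster shortens it and raises its spectrum, and vice versa. This Python sketch is illustrative only; the ±24 semitone clamp mirrors the two-octave range, and linear interpolation is an arbitrary implementation choice, not the method used in the patch.

```python
def pitch_shift_grain(grain, semitones):
    """Repitch a grain by linear-interpolation resampling.

    Positive semitones shorten the grain and raise its frequency
    content; negative semitones do the opposite. The shift is clamped
    to +/-24 semitones for a two-octave range.
    """
    semitones = max(-24, min(24, semitones))
    ratio = 2.0 ** (semitones / 12.0)       # playback-speed ratio
    n_out = max(1, int(len(grain) / ratio))
    out = []
    for i in range(n_out):
        pos = i * ratio                     # read position in the source
        j = int(pos)
        frac = pos - j
        a = grain[j]
        b = grain[j + 1] if j + 1 < len(grain) else grain[j]
        out.append(a + frac * (b - a))      # linear interpolation
    return out
```

Grain density, by contrast, needs no such machinery: as stated above it is simply the grain count scaled by a constant.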
2.2.3.3 Footstep-modelling.
This section describes how particles are extracted based on
Physically Inspired Stochastic Event Modelling (PhISEM).
According to Cook, who has extensively researched this area, the
parameterisation of walking sounds should involve interaction,
preferably provoked by friction or pressure from the feet. A
stochastic approach, a non-deterministic sequence of random
variables, models the probability that particles will make noise;
the sound probability is constant at each time step (Cook, 2002).
Studies have shown the human ability to perceive source
characteristics of a natural auditory event. From various analyses
applied on walking sounds, a relationship between auditory
events and acoustic structure was found. This study considered
sounds of walking and running footstep sequences on different
textures. Textures such as gravel, snow and grass were chosen,
this was motivated by the assumption that a noisy and rich sound
spectra will still be perceived by the ear as a natural sound.
Studies carried out by Roberto Bresin, who has extensively
studied new models for sound control, have shown how a double
support is created when both feet are on the ground at the same
time, suggesting there are no silent intervals between two
adjacent steps. However, not specifying a time constraint
between two particular events will blend them into a unison
texture; therefore, an ‘attention span’ has to be created between
steps in order to perceive them as separate events
(Saint-Arnaud, 1995). According to Bresin, legato and staccato
can be associated with walking and running respectively. Some of
his recent work has reported a strong connection between motion
and music performance.
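The PhISEM idea above can be caricatured in a few lines of Python: at each time step every particle makes a noise burst with a fixed probability, and the accumulated bursts decay exponentially, as in Cook's description. The particle count, probability and decay constant here are invented for illustration and are not parameters of the model developed in this project.

```python
import random

def phisem_step_events(n_steps=100, n_particles=30, p_sound=0.1,
                       decay=0.95, seed=None):
    """PhISEM-style sketch of stochastic particle excitation.

    Each particle fires with the same constant probability at every
    time step; active bursts pile up and decay exponentially. Returns
    the summed excitation energy per step, which would normally drive
    a noise source and filter.
    """
    rng = random.Random(seed)
    energy = 0.0
    out = []
    for _ in range(n_steps):
        # constant sound probability per particle per time step
        fired = sum(1 for _ in range(n_particles)
                    if rng.random() < p_sound)
        energy = energy * decay + fired     # decaying noise bursts
        out.append(energy)
    return out
```

Sweeping p_sound and decay over a step's duration is one way to mimic the texture differences between gravel, snow and grass noted above.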
Having stated several parameters that directly influence
walking sounds, it is evident that large libraries of pre-recorded
sounds do not contain every possible scenario, which greatly
compromises the sonic appreciation.
2.2.3.4 The Ground Reaction Force.
A footstep sound is a combination of multiple impact sounds
between the foot (exciter) and the floor (resonator). This model
has chosen to separate both components and consider the
exciter as an input for different types of resonators. In other
words, by extracting the pressure exerted by one’s foot, different
modes can be extracted and implemented to recreate the sounds
of different kinds of floors. In the field of mechanics the pressure
exerted by one’s body is called the Ground Reaction Force
(GRF), which derives from Newton’s third law: “To every action
there is always opposed an equal reaction: or the mutual actions
of two bodies upon each other are always equal, and directed to
contrary parts” (Newton, Motte and Machin, 2010). The
architecture of this model will use the ground reaction force
principles to find and analyse the forces that intervene in the
creation of the multiple impact sounds that constitute a footstep.
It will then apply the analysed forces to the different resonators,
creating an opposed equal reaction that will be later translated
into sound. See figure 6. In order to analyse the forces involved in
the foot’s motion, it is important to understand how they are
distributed. A normal gait is composed of two phases, a stance
phase (60%) and a swing phase (40%). The stance phase is
composed of five categories, initial contact, loading response,
mid-stance, terminal stance and pre-swing.
Figure 6: GRF Exemplified.
(http://epicmartialarts.wordpress.com/tag/ground-reaction-force/)
The swing phase consists of an initial swing, a mid-swing and
a terminal swing (Porter, 2007). All these phases exert different
forces, making it incredibly hard to translate all of these
micro-movements into sound. Farnell has proposed analysing
the gait phases not as individual events, but as a distribution of
forces. As a result, three phases become apparent as shown in
figure 7 (Farnell, 2010):
1. The Contact Phase: The heel makes contact with the ground
and the ankle rotates the foot.
2. The Mid-stance Phase: The body’s weight is shifted onto the
outer tarsal.
3. The Propulsive Phase: The foot rolls along the ground ending
up on its toes.
Figure 7: The Gait Phase.
(http://naturalrunningcenter.com/2012/06/21/walking-vs-running-gaits/)
Ideally, each gait cycle would generate identical GRF
distributions; however, they can change significantly as the
walking pace and ground level change. If this were not the case,
two complete footsteps would be sufficient to generate a walking
pattern. This introduces another variable, the movement of the
body, which fluctuates above and below the sum of the left and
right feet’s GRFs. Andy J. Farnell explained in his book Designing
Sound, the three different modes of movement (Farnell, 2010):
1. Creeping: Minimises pressure changes, which
diminishes the sound.
2. Walking: Maximises locomotion while minimising
energy expenditure.
3. Running: Accelerates locomotion.
Figure 8 exemplifies the Ground Reaction Force distribution of a
gait phase, in which the body’s weight is transferred onto the heel.
Sometimes, before the weight is completely transferred, a
transient force is experienced just before the loading response;
surprisingly, this force exceeds the normal standing force. The
weight’s distribution between the heel coming down and the toe
pushing off evens out just before the propulsive phase, where the
body’s weight is entirely on the feet.
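The distribution described above can be sketched numerically. The following is a hypothetical illustration, not the Pd patch itself: a normalised, M-shaped GRF profile built from two overlapping half-sine bumps, where the first peak briefly exceeds the standing force of 1.0 body weight. All parameter values are assumptions chosen only to reproduce the peak–dip–peak shape.

```python
import math

def grf_profile(t, transient=0.15):
    """Illustrative single-stance GRF profile, normalised to body
    weight (1.0 = standing force). t runs from 0.0 (heel contact)
    to 1.0 (toe-off). Two overlapping half-sine bumps give the
    classic M-shape: a loading peak that briefly exceeds standing
    force, a mid-stance dip, and a propulsive peak. Hypothetical
    sketch; the bump widths and the `transient` overshoot are
    assumptions, not measured values."""
    loading = (1.0 + transient) * math.sin(math.pi * min(1.0, t / 0.6))
    propulsive = 1.1 * math.sin(math.pi * max(0.0, (t - 0.4) / 0.6))
    return max(loading, propulsive)
```

Sampling this profile near the contact, mid-stance and propulsive phases reproduces the transient overshoot and the dip between the two peaks discussed above.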
Figure 8: GRF Distribution in Pure Data.
2.2.4 Procedures.
This section describes the instruments and architecture
involved in the creation of this research project. It aims to
establish an efficient workflow that could later be applied to
future work. This section will also explain how diverse theories
and models will be tested and how relevant data will be collected.
2.2.4.1 Pure Data.
The demonstration prototype that accompanies this research
project has been built using this platform. In order to run this
software, or ‘patch’, Pure Data 0.44.0 is required. Pure Data (PD)
is an open-source visual programming language developed by
Miller Puckette. It is a real-time graphical programming
environment for audio, video and graphics processing. PD was
chosen partly because it is designed for real-time processing
and because it allows fast modification of parameters, making it
extremely interactive and user-friendly (see figure 9).
Figure 9: PD Environment.
Figure 10: The Cloud.
2.2.4.2 Arduino.
In order to establish a more interactive communication
between the user and the ‘patch’, a piezo-resistive force sensor
was implemented (see figure 13). The prototyping platform
Arduino UNO creates a link between PD and the force sensor.
When pressure is applied to the sensor, Arduino reads the
analogue pin, whose value ranges from 0 to 1023, and transmits
it to the object ‘comport 9600’ in PD. Figure 11 illustrates this
process.
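The scaling step implied above can be sketched on the receiving side. This is a hypothetical illustration of the 0–1023 mapping, not the actual Arduino sketch or Pd patch shown in the figures; the `sensitivity` gain is an assumption standing in for the patch’s sensor-sensitivity control.

```python
def scale_reading(raw, sensitivity=1.0):
    """Map a 10-bit Arduino analogue reading (0-1023) to a
    normalised force value in the range 0.0-1.0, mirroring what
    PD receives through the 'comport' object at 9600 baud.
    `sensitivity` is a hypothetical gain; the output is clipped
    so that noise spikes cannot exceed full scale."""
    force = (raw / 1023.0) * sensitivity
    return max(0.0, min(1.0, force))
```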
Figure 11: Code in Arduino.
2.2.5 Architecture.
This footstep model has been inspired by Perry
Cook and Andy J. Farnell’s approaches to
walking sounds. Their investigations into
parametrised synthesis, especially granularity,
have been of great help. Figure 12 illustrates the
signal flow of this prototype. Based on Roads’
idea of user control, this ‘patch’ routes all the
information to a common ‘cloud’ (see figure 10)
where the user can easily modify the dynamics of
the grain, as well as the sensitivity of the feet
sensors. All seven parameters mentioned in
section 2.1.4 (see page 11) were taken into
account when designing this ‘patch’. The sensors
define the start time and duration of this process
(1). The grain duration is specified by the option
‘smooth’, which divides its input into a 100ms
window and adds it to the transient’s size (2).
Similarly, the density of grains per second
(grains/1000ms) is specified by the option ‘grains’
(3). Two band-pass filters determine the
frequency band of the cloud (4). An amplitude
envelope and a freeverb~ (a custom PD reverb)
have also been incorporated, giving the user the
option of custom-shaping the signal before it
reaches the output (5 & 6).
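As a rough illustration of the grain parameters just described (the ‘grains’ density and the ‘smooth’ duration window), the scheduling logic might be sketched as follows. Function names and the uniform-random onset model are assumptions for illustration, not objects in the actual Pd patch.

```python
import random

def grain_duration(transient_ms, smooth_window_ms=100):
    """Grain duration as described for the 'smooth' option:
    a 100 ms window added to the transient's size."""
    return smooth_window_ms + transient_ms

def schedule_grains(density, total_ms=1000, seed=None):
    """Return sorted grain onset times (in ms) for a one-second
    cloud, given a density in grains per 1000 ms. Onsets are drawn
    uniformly at random, a simple stand-in for the asynchronous
    scheduling of the grain cloud (hypothetical sketch)."""
    rng = random.Random(seed)
    return sorted(rng.uniform(0.0, total_ms) for _ in range(density))
```

For example, a density of 50 grains with a 20 ms transient yields fifty onsets spread over the second, each grain lasting 120 ms.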
Figure 12: Architecture.
Figure 13: Prototype.
In order to accurately transcribe and digitise the sensor’s
information, a split-phase and a polynomial curve have been
incorporated. The split-phase converts the input given by the
sensors into a signal that can be later scanned by the Phasor~
object in PD. It combines both feet and creates a time constraint
between them, defining an ‘Attention Span’ that fools the ear
into perceiving both inputs as separate events (Saint-Arnaud,
1995). The polynomial curve is defined by the equation (Farnell,
2010):
f(x) = −1.5n (x³ − x)(1 − x), where 0 ≤ n < 1
Figure 14 illustrates the envelope of the polynomial curve for
the minimum and maximum values of n. These curves create a
small envelope for each of the three gait phases aforementioned.
See Figure 8. As mentioned in section 2.2.3.1, a burst of noise
has also been added to the ‘patch’, which contributes to the
randomness of the stochastic analysis and helps to mask any
imperfections of the grain selection, if any. A low-pass filter has
been attached to the noise generator, so that high frequencies
can be added or filtered out. This white noise is triggered directly
by the sensor pad.
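Farnell’s polynomial envelope can be evaluated directly; a minimal sketch:

```python
def gait_envelope(x, n):
    """Polynomial envelope f(x) = -1.5n(x^3 - x)(1 - x) from
    Farnell (2010), for 0 <= x <= 1 and 0 <= n < 1. Since
    x^3 - x <= 0 on [0, 1], the curve is non-negative, zero at
    both ends, and peaks in between, giving the small
    per-phase envelope described above."""
    if not (0.0 <= n < 1.0):
        raise ValueError("n must satisfy 0 <= n < 1")
    return -1.5 * n * (x ** 3 - x) * (1.0 - x)
```

With n = 0.5, for instance, the envelope is zero at x = 0 and x = 1 and positive at x = 0.5; with n = 0 it vanishes everywhere, matching the minimum curve in figure 14.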
In order to evaluate the accuracy and precision of these
methods, external feedback will be collected; this will be
explained further in the following chapter.
Figure 14: Polynomial Curves.
2.2.6 Summary.
This methodology has extensively analysed the existing
knowledge on sound textures with granular synthesis in order to
develop a method for the creation of dynamically generated
textures (see page 16). It then proceeded to integrate the
aforementioned model into a footstep model created from the
behavioural analysis conducted in section 2.2.3.3 (see page 19).
It has also described the architecture of the prototype designed
as part of this research. The following chapter will present the
Evaluation Process that was used for this project.
CHAPTER 3 EVALUATION.
3.1 Introduction.
The evaluation process presented in this study uses a mixed
method design. According to John W. Creswell, analysing both
quantitative and qualitative data helps to understand the research
problem thoroughly (Creswell, 2002). A mixed method design is
based upon pragmatic statements, which accept the truth as a
normative argument. Interesting opinions have been given
regarding mixed methods; however, the issue of distinguishing
between aesthetic assumptions has not yet been addressed
(Sale, Lohfeld and Brazil, 2002). This research project will use a
sequential explanatory mixed methods design; according to
Creswell (2002), this method is “the most
straightforward of the six major mixed method approaches”,
which is an advantage as it organises data more efficiently. This
method collects and analyses quantitative data and then goes on
to collect and analyse qualitative data. Kenneth R. Howe, an
educational researcher, stated that researchers should forge
ahead only with what works. Following this statement, this study
introduced three topics in order to structure the design of this
research project: Priority, Implementation and Integration
(Creswell, Plano Clark, Gutmann & Hanson, 2003).
a) Which of these methods, quantitative or qualitative
will be emphasised in this study?
b) Will data collection come in sequence or in
chronological stages?
c) How will this data be integrated?
Special priority will be given to quantitative data, with the
qualitative results used to support those obtained in the
quantitative stage. For the purposes of efficiency, data was collected and
integrated in chronological stages, which offered a more
comprehensive and broader landscape of the gleaned
information.
3.2. Quantitative Data.
Michel Chion explained, in his book Audio-Vision, how
sounds can objectively evoke impressions without necessarily
relating to their source (Chion, 1990). A combination of
synchronism and synthesis, coined by Chion as synchresis,
describes the mental fusion between sounds and visuals, when
they occur simultaneously. According to Chion (Chion, 1990,
p115), when a precise expectation of sound is set up, synchresis
predisposes the spectator to accept the sound he or she hears.
With regards to footsteps, Chion refers to synchresis as
unstoppable stating that, “We can therefore use just about any
sound effects for these footsteps that we might desire” (Chion,
1990, p. 64). As an example, he referred to the film comedy Mon
Oncle, by the French filmmaker Jacques Tati, where a variety of
noises for human footsteps, involving Ping-Pong balls and
glass objects were used. One of the purposes of this survey was
to demonstrate how synchronised sound textures could fool the
ear into thinking that real footsteps are being played. In order to
achieve this, a total of ten clips were played to an audience,
containing a mixture of Foley, location recordings and generated
sounds. A non-probability sampling approach was
used for this research project, as it is not the purpose of this study
to infer from the sample to the general population but to add to
the knowledge of this study.
3.2.1 The Data Collection Method.
Data collection mostly consisted of observations, where several
audio samples were compared to those created with the
aforementioned model. A self-developed survey was structured,
containing items of different formats such as multiple-choice and
dichotomous questions. Colin Robson describes surveys as a
very effective method in collecting data from a specific population,
or a sample from that population (Robson, 2002). Similarly,
they are widely accepted as a key tool in conducting and applying
research methods (Rossi, Wright and Anderson, 1983). The
survey consisted of five questions, which were divided into two
sections. The first section of this analysis asked questions related
to the participants’ status (Audio or Film student). The second
section measured the participants’ ability to differentiate between
recorded and dynamically generated sounds (See Appendix A).
This model sought to understand the individuals’ perception of
diegetic sounds.
The quantitative data was collected on the 29th and 30th May
2013 at the School of Audio Engineering (SAE) House in east London,
U.K. The survey was distributed to a specific population of
students (Audio and Film students). In total thirty individuals were
given surveys. Based on Howe’s statement (see 3.1), the goals
of the surveys were to identify what sound textures participants
believed to be real. Two independent variables were introduced;
these were Recorded and Generated Sounds, which were played
at random to the participants. As mentioned above, a total of ten
short clips, containing five different sound textures, were prepared
for this survey. The first group of participants surveyed were
mostly Audio students; a brief explanation of ‘attention span’
and of the layout of the audio was given prior to the survey.
first participants were asked to listen to just the audio of the
short-clips. A fifteen-second gap between clips was given, not
only for them to draw their own conclusions (as an informal
conversational interview) but also to allow their short-term
memory to ‘forget’ the sonic information, which they had gathered.
According to George A. Miller, the duration of our short-term memory
seems to be between fifteen and thirty seconds (Miller, 1956).
This way, the average audio-visual span disappears from one’s
mind, allowing new data to be processed clearly. The second part
of the survey combined both picture and sound. The structure of
the survey (see Appendix A) contained three basic questions,
which were aimed at investigating the participants’ relation to
sound libraries. The questions how would you rate the content?
and what do you look for in sound libraries? were an excellent
start, which led to an open debate conducted after the survey.
This offered even more data for discussion and research.
3.2.2 Research Findings
This section describes the results of the survey by initially
assessing the descriptive statistics in order to specify the different
variables and characteristics that were measured. An analysis of
the remaining variables and aspects of the survey will also be
presented. As described in the previous section, the research
population comprised thirty research participants, using a
non-probability sampling approach. The quantitative variables of
this project were collected on two different days, as it was very
difficult to integrate both audio and film students together. In
order to accurately measure both departments, this research
project has surveyed a total of fifteen audio students, one audio
specialist, thirteen film students and two film specialists. The first
part of the survey (see Appendix A) established how many
participants had used sound libraries for their particular projects.
As seen in figure 15, when asked about the content of such
libraries in question two, 40% of both audio and film students
thought their quality was poor.
Figure 15: Question 2. (Audio students: Consistent high quality 20%, Generally good 20%, Quality varies 20%, Poor 40%. Film students: Quality varies 60%, Poor 40%.)
However, ‘poor’ is a very vague term. Are
the contents of these libraries poor in sonic quality? Or are they
poor because they do not meet the user’s needs? In order to
clarify this concept, a follow-up question was introduced:
What do you look for in sound libraries? As shown in figure 16,
Audio and Film students look for very different and specific
material. 67% of the audio students surveyed specifically looked
for ambience sounds, whereas 57% of the film students surveyed
looked for Foley sounds. Several hypotheses can be
drawn from these results. The perception of sound in film goes
far beyond the pure physics of the sonic spectrum. Throughout
history, film producers have chosen to artificially construct the
sound of their films (Gorbman, 1976).
Advances in technology have expanded the creative
possibilities of filmmakers and sound designers; the difference
lies in how these sonic experiences are created. Based on the
data collected, one could easily assume that film students have
an internal approach to sound (Chion, 1990). Physiological
sounds such as breathing and moans, or more subjective sounds
such as a memory or a mental voice can easily be achieved by
using Foley and ADR (Automated Dialogue Replacement)
practices, which might explain why their main concern, when
browsing through a sound library, are Foley sounds. On the other
hand, audio students seek to describe the ‘soundscape’ of the
picture, either by recreating the sonic characteristics of the
environment or by artificially creating a completely new sonic
environment.
Figure 16: Question 3. (Audio students: Foley 16%, Ambience 67%, Other 17%. Film students: Foley 57%, Ambience 29%, FX 14%.)
Another question arises from these two hypotheses. This is,
how is the quality of such libraries perceived, if their contents are
listened to as part of a group of sounds? This is a very important
question as it strives to understand our perception of artificially
constructed sound. Being able to generate audio
means nothing if it does not work in the context it was designed
for. In order to understand this matter, the aforementioned
footstep sounds (see 2.2.3.3) were played along with ambience
sounds as well as different sound effects and dynamics. The
results will be analysed in the next section.
3.2.2.1 Statistical Analysis
This section examines the results of the statistical analysis
collected from the second part of the survey. It tries to understand
how generative audio can be implemented in postproduction
processes. It should be noted at the outset of this analysis that
this research followed a non-probability sampling technique. According to
researchers such as Frederick J. Gravetter, convenience sampling
is probably the most adequate method to use when the
population to be investigated is too large. Participants were
therefore selected based on their accessibility and proximity to
the researcher. Although convenience sampling does not offer
any guarantee of a representative sample, it collects basic data
that could later be analysed or used as a pilot study (Gravetter,
2011, p. 151). In order to ensure that each variable was properly
evaluated, they were examined one at a time, and a series of visual
displays was created to help explain the relationships between
the variables examined in this study. A total of ten short clips
were presented to the participants to answer the question, where
do you think unrealistic sounds have been placed? Participants
were given a scale from one to five to rate each clip’s realism. The
films that were used for this experiment were: Terminator 2:
Judgment Day (1991), mixed by the American sound designer
Gary Rydstrom; Pulp Fiction (1994), mixed by David Bartlett;
Mon Oncle (1958), produced by Jacques Tati; and Here (2013),
produced as part of my portfolio. Clips one, three, four and five
were re-mixed in order to introduce the footsteps generated by
the ‘patch’ developed. The purpose of this experiment was to
determine what combination of sounds seemed the most realistic
to the participant. The results of this experiment are shown in
Appendix B. This research study conducted a T-Test and a
Chi-squared test. The aim was to understand whether there was
a significant difference between how participants rated the clips
with generated sounds and how they rated the clips with
recorded sounds. As noted in section 1.2 (see page 3) this
dissertation aims to add to the research in Foley practices by
using generative audio. It is not therefore a comparative analysis
between recorded and generative audio. A combination of
generated footsteps was presented to the participants in clips 1,
3, 4 and 5. Table 1 shows the average ‘quality’ that the
participants gave to generated and recorded audio respectively.
PARTICIPANT   GENERATED AUDIO   RECORDED AUDIO
1             3.25              2.16
2             4.00              2.50
3             2.50              3.50
4             3.75              3.50
5             1.00              2.16
6             4.50              3.50
7             3.25              3.16
8             3.75              2.83
9             2.50              2.50
10            2.50              2.30
11            2.50              2.50
12            3.00              2.00
13            3.25              2.50
14            2.75              4.00
15            4.00              2.75
16            2.30              3.00
17            2.16              4.50
18            3.50              3.25
19            2.16              3.75
20            2.30              2.75
21            3.00              2.50
22            2.00              2.30
23            3.25              2.16
24            3.00              3.50
25            2.16              3.00
26            3.50              3.25
27            3.00              2.75
28            3.00              3.00
29            3.25              2.00
30            4.00              3.00
AVERAGE       2.969333333       2.885666667
STDEV.        0.75013991        0.619625489
Table 1: Average Quality.
As seen in table 1, it is possible to conclude that there is no
statistical difference between the perceived quality of generated
and recorded audio; this conclusion is based on their standard
deviation values, which clearly show that the average values
of both groups overlap. In order to critically assess these
values, a T-test was conducted, aimed at understanding
how reliable these differences were likely to be.
3.2.2.2 T-Test
• Null Hypothesis H0: (GA = RA). There is no discernible sonic difference between recorded audio and
generated audio.
• Alternative Hypothesis H1: (GA < RA). Recorded audio possesses better sonic qualities. Therefore, there is a
significant difference between recorded audio and generated audio.
• Alternative Hypothesis H2: (GA > RA).
Generative audio possesses better sonic qualities. Therefore, there is
a significant difference between recorded audio and generated audio.
All data was computed using Microsoft Excel (figure 17).
Additionally, this set of results was compared to those obtained
at www.graphpad.com (see Appendix C), from where this
research concluded that the two tailed probability (p) value of the
data equalled 0.639. This probability value does not provide
enough evidence to reject the Null Hypothesis (H0), as there is
no evidence to prove that there is a significant difference
between recorded and generated audio. However, this does not
mean that the Null Hypothesis is true. A couple of conclusions
can be drawn from this test:
• The population surveyed could not discern between recorded and
generated audio.
• An average of 3 (Good Quality) was given to the clips containing
generative audio (See Appendix A).
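The reported p value can be reproduced from the summary statistics in Table 1. The sketch below computes Welch's t statistic using only the standard library; the original test was run in Excel and at www.graphpad.com, so this is a cross-check, not the original procedure.

```python
import math

# Summary statistics taken from Table 1 (30 participants per condition).
N = 30
MEAN_GEN, SD_GEN = 2.969333333, 0.75013991
MEAN_REC, SD_REC = 2.885666667, 0.619625489

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic for two independent samples with
    unequal variances."""
    return (m1 - m2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

t = welch_t(MEAN_GEN, SD_GEN, N, MEAN_REC, SD_REC, N)
# t is about 0.47; with roughly 56 degrees of freedom this is
# consistent with the two-tailed p of 0.639 quoted in the text.
```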
Figure 17: T-Test in Excel.
3.2.2.3 Chi-Square
Null Hypothesis H0: (As = Fs)
There is no difference between how Audio and Film students
perceive audio ‘quality’.
DEPARTMENT    GENERATED AUDIO   RECORDED AUDIO   GRAND TOTAL
AUDIO         3.1               2.790666667      5.890666667
FILM          2.838666667       2.980666667      5.819333333
GRAND TOTAL   5.938666667       5.771333333      11.71
Table 2: Chi-Square.
EXPECTED VALUES:
DEPARTMENT    GENERATED AUDIO   RECORDED AUDIO   GRAND TOTAL
AUDIO         2.987421501       2.903245166      5.890666667
FILM          2.951245166       2.868088168      5.819333333
GRAND TOTAL   5.938666667       5.771333333      11.71
Table 3: Expected Values.
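The expected values and the quoted p value can be reproduced from the grand totals. The sketch below derives each expected cell as row total × column total / grand total and, for one degree of freedom, obtains the p value via the complementary error function; this is a stdlib cross-check of the Excel computation, not the original spreadsheet.

```python
import math

# Observed mean ratings from Table 2
# (rows: Audio, Film; columns: Generated, Recorded).
obs = [[3.1, 2.790666667],
       [2.838666667, 2.980666667]]

row_totals = [sum(row) for row in obs]
col_totals = [sum(col) for col in zip(*obs)]
grand = sum(row_totals)

# Expected values as in Table 3: row total * column total / grand total.
exp = [[rt * ct / grand for ct in col_totals] for rt in row_totals]

# Chi-square statistic and its p value for one degree of freedom,
# using the identity P(X > x) = erfc(sqrt(x / 2)) when df = 1.
chi2 = sum((obs[i][j] - exp[i][j]) ** 2 / exp[i][j]
           for i in range(2) for j in range(2))
p = math.erfc(math.sqrt(chi2 / 2.0))  # about 0.895, as reported
```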
The p value obtained from Excel was 0.895, which means that
this project cannot reject the null hypothesis; therefore, there
is no evidence of a difference in how audio and film students
perceive sound. Moreover, the independent chi-square contributions
for the audio and film students were 0.008607859 and 0.008713374
respectively, both far below the critical value at the 0.05
significance level, which further highlights why this hypothesis
cannot be rejected.
3.2.3 The Results and Evaluation.
As mentioned in section 3.2, the main purpose of collecting
quantitative data was to demonstrate how synchresis could trick
the human ear into thinking that real footsteps are being played
on screen. From the information collated, it is easy to conclude that
whether or not there is a significant difference between recorded
and generated audio, the outcome of the latter has the potential
of being equally as good as recorded audio. The last part of this
section sought to understand whether film students had a more
internal approach to sound and just how different this approach
was from that of audio students. It later became apparent that
there is no actual difference between how audio and film students
perceive sound; this is probably due to a level of subjectivity that
is always present. It was therefore not possible to verify such
statements.
3.3 Qualitative Data.
Qualitative research is often criticised for lacking rigour, where
the terms ‘reliable’ and ‘valid’ are usually associated with data
obtained by quantitative methods. However, in this mixed method
design the qualitative data collected is oriented to support the
findings of the quantitative phase. This qualitative research was
divided into two sections, a one-to-one interview with Andy
Farnell and a post-survey discussion supported by an e-mail
interview with Gillian McIver. The Norwegian psychologist Steinar
Kvale expressed in his book ‘Doing Interviews’ (Kvale, 2008)
that in order to successfully conduct an interview, pilot testing
must be implemented. This pilot testing was conducted informally
with audio students as a conversational interview, where new and
interesting follow-up questions helped to refine the
topics discussed in both interviews. Researchers such as
Creswell, Goodchild and Turner have broadly studied mixed
method designs. According to Creswell, the advantages of a
mixed design are its easy implementation and in-depth
exploration of quantitative data (Creswell, 2002). However,
quantitative results may show no significant differences, and the
whole process can be slow, as it requires a lengthy amount of
time to complete.
3.3.1 Data Collection Method.
According to Monique Hennink, the author of Qualitative
Research Methods, an in-depth interview “is a one-to-one
method of data collection that involves an interviewer and an
interviewee discussing specific topics in depth” (Hennink, 2011). I
had the opportunity to arrange a face-to-face interview with Andy
Farnell. This fifteen-minute in-depth interview is available to listen
to online at
www.juliantellez.com/interactiveaudio/Farnell.wav. The purpose
of this interview was to gain further knowledge of the efficiency,
design and implementation of generative audio. Five
conversational questions were put to Farnell; not only did
he give clear insight into all the aforementioned topics,
but he also shared his perspectives with regard to the needs of
audio and film.
In order to assist with the results obtained by the survey, a
couple of interviews were conducted. The structure of these
standardised, open-ended interviews included five questions
where the content was grounded in the results of the statistical
analysis, which was extracted from the survey. The participants,
A. J. Farnell and Gillian McIver (a Canadian filmmaker, writer and
visual artist), were interviewed using a standardised interview
approach to ensure that the same general areas of information
were collected from both of them. Additionally, Paul Groom,
Alessandro Ugo and Daria Fissoun (Film specialists) were also
contacted. As described by Sharan B. Merriam (Merriam, 1998),
in regards to qualitative data, collection and analysis occurred
simultaneously. According to McNamara (McNamara, 2008),
there is potentially a lack of consistency in the way questions are
posed, meaning that respondents may or may not be answering
the same questions. For this reason, the interviews were conducted
via e-mail, not only to ensure consistency between them but
also to make it easier for the participants to analyse the questions,
allowing them to contribute as much detailed information as they
desired.
3.3.2 Research Findings.
This section presents the conclusions from the data collected;
the first section describes in detail the interview design for Farnell.
Subsequently the second section further expands upon the
conclusions, which were drawn on completion of the first
interview.
3.3.2.1 One-to-One Interview.
In answer to the question what do you think are the
possibilities of module-based DSPs such as PD becoming a
prominent audio engine solution?, Andy expressed that DSPs are
intended to fill the ‘gap’ between the user’s level of expertise and
the high-level user interaction offered by DAWs (Digital Audio
Workstations). However, as far as the possibilities go, their
flexibility has earned them a place in the audio industry.
A follow-up question was introduced, asking about the flexibility
of DSPs and how this flexibility is perceived when linked to
generated audio. Farnell described an apparent hierarchical stack
that constitutes generated sounds: behaviour, model and
implementation. When asked which of them was most important,
he emphasised that ‘design’ (behaviour plus model) was more
important than implementation, adding “… when you have a great
model, then you can use various kinds of methods…”; the
outcomes will be equally good, because the behavioural
analysis, which encapsulates model and method, facilitates the
implementation. Some proof of this, he noted, is the work that
was done by Dan Stowell from Queen Mary University of London’s
research group, the Centre for Digital Music. Stowell has re-written most of the
examples from Farnell’s textbook in SuperCollider
instead of PD. Farnell stressed that although implementation is
exchangeable, there is still a huge gap between the design and
the user’s implementation. Physically controlled implementation,
as proposed by Farnell, is the best way to research this issue. In
answer to the question, do you think generative sound could
potentially meet the needs of the film industry? Farnell introduced
a very interesting analogy, where he related generative sounds
as the beginning of a more sophisticated approach to audio. “… I
think in the next ten years you will have a CGA (Computer
Generated Audio) in Hollywood… CGA is much more powerful
than CGI (Computer Generated Imagery) because there is a
spectrum where they can be mixed with traditional techniques…
Most people won’t know the difference between generated and
recorded audio” (Farnell, 2013). Personally, I found this
interview, and especially the aforementioned analogy, very
inspiring. I believe that it is possible to restructure the
post-production workflow by analysing and designing the sound
of a particular location stage, so that the sounds created by
performers could be used at any location.
3.3.2.2 e-Interviewing.
E-mail interviewing turned out to be more flexible,
convenient and less obtrusive than a conventional interview.
However, as it took a lot longer than the previous discussions and
interviews, only the information provided by Gillian McIver will be
analysed (Appendix D). The questions were introduced
generically in order to get more objective answers. The rationale
to this stems from a short discussion I had with some film
students where they expressed discontent with audio, especially
sound libraries. In answer to the question why is it that film-audio
is secondary in the film industry? McIver outlined that the
problem does not lie in the industry but in education, mentioning
that there was a clear division between both departments, so if
the problem lies in education, how can both parties overcome
difficulties such as correct audio replacement and authentic sonic
representations? Just like a DSP fills the gap between expertise
and interaction, I believe that there is a gap where the expertise
of signal processing can meet the production needs by means of
interaction. When asked about the emphasis the film industry
puts on the creation of sound technology, McIver replied: “Most
do not think about it.” Judging by this answer, one could conclude
that if any sound technology aimed at the film industry
were to be developed in the near future, it would have to be
embedded and, more importantly, interactive and user-friendly.
CHAPTER 4 CONCLUSION.
The techniques used for the generation and control of grain
signals were studied extensively throughout this research project.
A special emphasis was placed on structuring a footstep model
that enabled an instant interaction between the user and the DSP.
It encompassed some of the studies carried out by A. J. Farnell,
P. R. Cook and R. Bresin. In adherence to these studies, a
process of evaluation and testing was also conducted alongside
the footstep method, formulated in this research project. It was a
compelling effort to promote generative audio in the
postproduction industry.
The analysis of sound synthesis with procedural audio was
reviewed in great detail, where different approaches for the
creation of sound textures were highlighted. Consequently, it
defined the evolution and development of generated audio in
response to sound modelling. This was achieved by structuring
the associations between everyday sounds and sound imagery.
The criteria used for this project convey information that
characterises an individual sound by the force that the body
exerts upon it (Gaver, 1993). From the evidence given by the
aforementioned authors, a study of the background and evolution
of dynamically generated audio was collected; this outlined its
advantages and drawbacks. A complete separation between
contact objects and interaction was achieved.
The main findings create an intersection between sound
synthesis (see section 2.1.3, p. 8), signal analysis (see section
2.1.5, p. 12) and user interaction (see section 2.2.5, p. 26)
(Strobl, Eckel and Rocchesso, 2006). Additionally, an evaluation
phase was introduced, in which several statistical tests were
conducted in order to corroborate the findings (see section
3.2.2.1, p. 35).
As noted at the end of sections 3.3.2.1 and 3.3.2.2 (pp. 43-44),
sound technology has enormous potential, which will most
certainly be explored in years to come. Recent advances have
placed sound technology in a very prominent position, allowing for
efficient interaction and productivity. As far as footstep modelling
goes, there are endless possibilities (in terms of sound textures)
where further studies can be conducted. I have emphasised the
importance of user interactivity throughout this research project;
by adding GRF recognition, this study addresses that requirement,
allowing the ‘patch’ to identify the user’s gait characteristics (see
section 2.2.3.4, p. 20). However, it is still a prototype, and some
adjustments will be made in the near future.
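A deliberately simplified sketch of the kind of GRF-based gait recognition described in section 2.2.3.4 is given below, in Python rather than the Pure Data/Arduino prototype. The threshold and the synthetic force trace are illustrative assumptions: a step is counted each time the force rises through the threshold, and the peak force of each contact phase is recorded.

```python
def gait_features(grf, sample_rate_hz, threshold=50.0):
    """Extract simple gait characteristics from a ground-reaction-force trace.

    Counts steps by threshold crossing, records the peak force of each
    contact phase, and derives cadence in steps per minute. A simplified
    sketch, not the dissertation's implementation.
    """
    steps, peaks = 0, []
    in_contact = False
    peak = 0.0
    for f in grf:
        if not in_contact and f >= threshold:
            in_contact, peak = True, f   # contact phase begins: new step
            steps += 1
        elif in_contact:
            if f >= threshold:
                peak = max(peak, f)      # still in contact: track the peak
            else:
                in_contact = False       # contact phase ends
                peaks.append(peak)
    if in_contact:
        peaks.append(peak)
    duration_s = len(grf) / sample_rate_hz
    cadence = steps / duration_s * 60.0 if duration_s else 0.0
    return {"steps": steps, "peak_forces": peaks, "cadence_spm": cadence}

# Two synthetic contacts in a 2-second trace sampled at 10 Hz:
trace = [0, 0, 200, 600, 300, 0, 0, 0, 250, 800, 400, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(gait_features(trace, 10.0))
# -> {'steps': 2, 'peak_forces': [600, 800], 'cadence_spm': 60.0}
```

Peak force and cadence are exactly the kind of per-user characteristics the patch could use to tune grain amplitude and step timing.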
The principles of GRF apply to every mass on Earth; it would
certainly be interesting to recreate any sound by simply extracting
sound textures from the environment (Gaver, 1993).
This piece of work is intended to promote the use of generated
audio in the film industry. As this research has shown, there are
numerous applications for these methods within the
post-production sector. However, further research and study are
necessary in order to make generated audio a standard practice.
APPENDICES.
Appendix A.
Survey. 29th May 2013 London, U.K.
Footstep synthesis.
Please take a moment to analyse the clips. When you’re done, please answer the following questions:
ABOUT YOU.
□ Audio Student. □ Film Student. □ None.
How would you rate the content of these libraries?
□ Consistent high quality. □ Generally good. □ Quality varies. □ Poor quality.
Have you ever used sound libraries? □ Yes. □ No.
What do you look for in sound libraries? □ Foley sounds. □ Ambience sounds. □ Fx. □ Other. _______________
ABOUT THE CLIPS. Please rate the clips on a scale from 1 to 5: (1) poor, (2) fair, (3) good, (4) very good, (5) outstanding.
CLIP    1   2   3   4   5
1       □   □   □   □   □
2       □   □   □   □   □
3       □   □   □   □   □
4       □   □   □   □   □
5       □   □   □   □   □
6       □   □   □   □   □
7       □   □   □   □   □
8       □   □   □   □   □
9       □   □   □   □   □
10      □   □   □   □   □
Thank you for your participation!
Appendix B.
[Bar charts: survey ratings (poor to outstanding) of Clips 1-9, charted separately for audio students and film students.]
Appendix C.
[Bar charts: survey ratings (poor to outstanding) of Clip 10, charted separately for audio students and film students.]
Appendix D.
Appendix E.
Transient’s representation patch:
REFERENCES.
Ament, V. (2009). The Foley Grail: The Art of Performing Sound for Film, Games, and Animation. Oxford: Focal Press.
Balazs, B. (1949). Theory of Film: Sound. London: Dennis Dobson Ltd.
Bard, Y. (1974). Nonlinear Parameter Estimation. New York: Academic Press.
Chion, M. (1990). Audio Vision. New York: Columbia University Press.
Cook, P. (2002). Real Sound Synthesis for Interactive Applications. Massachusetts: A K Peters, Ltd.
Creswell, J.W. (2002). Research Design: Qualitative, Quantitative and Mixed Methods Approaches. New York: SAGE Publications Ltd.
Creswell, J.W., Plano Clark, V. & Hanson, W. (2003). Advanced Mixed Methods Research Design. Thousand Oaks: SAGE Publications Ltd.
Farnell, A. (2007). Marching Onwards: Procedural Synthetic Footsteps for Video Games and Animation. Proceedings of the Pure Data Convention.
Farnell, A. (2010). Designing Sound. London: MIT Press.
Gabor, D. (1946). Theory of communication. Journal of the Institution of Electrical Engineers, Part III, 93, 429-457.
Gabor, D. (1947). Acoustical quanta and the theory of hearing. Nature, 159, 591-594.
Gaver, W. (1993). How Do We Hear in the World?: Explorations in Ecological Acoustics. Ecological Psychology, 5 (4), 292-297.
Gorbman, C. (1976). Teaching the Soundtrack. Quarterly Review of Film and Video.
Gravetter, F.J. & Wallnau, L.B. (2011). Essentials of Statistics for the Behavioral Sciences. 7th ed. Belmont, CA: Thomson/Wadsworth.
Harley, J. (2004). Xenakis: His Life in Music. New York: Routledge. 215-218.
Harris, F. (1978). On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform. Proceedings of the IEEE.
Hennink, M., Hutter, I. & Bailey, A. (2011). Qualitative Research Methods. New Jersey: SAGE Publications Ltd.
Järveläinen, H. (2000). Algorithmic Musical Composition. Helsinki University of Technology, Tik-111.080 Seminar on Content Creation.
Jones, D. & Parks, T. (1988). Generation and Combination of Grains for Music Synthesis. Computer Music Journal, 12 (2), 27-34.
McNamara, C. (2008). General Guidelines for Conducting Interviews. Available: http://managementhelp.org/businessresearch/interviews.htm. Last accessed 8th May 2013.
Merriam, S.B. (1998). Qualitative Research and Case Study Applications in Education. San Francisco: Jossey-Bass.
Miller, G.A. (1956). The Magical Number Seven, Plus or Minus Two: Some Limits on our Capacity for Processing Information. The Psychological Review.
Moray, N. (1959). Attention in dichotic listening: Affective cues and the influence of instructions. Quarterly Journal of Experimental Psychology, 11, 56-60.
Mott, R. (1990). Sound Effects: Radio, TV and Film. Boston: Focal Press.
Newton, Sir I., Motte, A. & Machin, J. (2010). The Mathematical Principles of Natural Philosophy, Volume 1. Charleston, South Carolina: Nabu Press.
Porter, D. & Schon, L. (2007). Baxter's The Foot and Ankle in Sport. 2nd ed. Missouri: Mosby.
Roads, C. (2001). Microsound. London: MIT Press. 85-118.
Saint-Arnaud, N. (1991). Classification of Sound Textures. Master of Science in Telecommunications. Université Laval, Quebec.
Sale, J., Lohfeld, L. & Brazil, K. (2002). Revisiting the Quantitative-Qualitative Debate: Implications for Mixed-Methods Research. Netherlands: Kluwer Academic Publishers.
Strobl, G., Eckel, G. & Rocchesso, D. (2006). Sound Texture Modelling: A Survey. Proceedings of the Sound and Music Computing Conference.
Wood, N. & Cowan, N. (1995). The Cocktail Party Phenomenon Revisited: How Frequent Are Attention Shifts to One's Name in an Irrelevant Auditory Channel? Journal of Experimental Psychology: Learning, Memory and Cognition, 21 (1), 225-260.
Yewdall, D. (2011). The Practical Art of Motion Picture Sound. 4th ed. Oxford: Focal Press.
59
BIBLIOGRAPHY.
Ament, V. (2009). The Foley Grail: The Art of Performing Sound for Film, Games, and Animation. Oxford: Focal Press.
Balazs, B. (1949). Theory of Film: Sound. London: Dennis Dobson Ltd.
Bard, Y. (1974). Nonlinear Parameter Estimation. New York: Academic Press.
Bresin, R., Friberg, A. & Dahl, S. (2001). Toward a New Model for Sound Control. Proceedings of the COST G-6 Conference on Digital Audio Effects.
Bresin, R. & Fontana, F. (2003). Physics-Based Sound Synthesis and Control: Crushing, Walking and Running by Crumpling Sounds. Proceedings of the XIV Colloquium on Musical Informatics.
Chion, M. (1990). Audio Vision. New York: Columbia University Press.
Cook, P. (1999). Toward Physically-Informed Parametric Synthesis of Sound Effects. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
Cook, P. (2002). Real Sound Synthesis for Interactive Applications. Massachusetts: A K Peters, Ltd.
Cook, P. (2002). Modeling Bill's Gait: Analysis and Parametric Synthesis of Walking Sounds. Audio Engineering Society 22nd International Conference. 1-3.
Creswell, J.W. (2002). Research Design: Qualitative, Quantitative and Mixed Methods Approaches. New York: SAGE Publications Ltd.
Creswell, J.W., Plano Clark, V. & Hanson, W. (2003). Advanced Mixed Methods Research Design. Thousand Oaks: SAGE Publications Ltd.
Dahl, S. (2000). The playing of an accent: Preliminary observations from temporal and kinematic analysis of percussionists. Journal of New Music Research, 29 (3), 225-234.
Dannenberg, R. & Derenyi, I. (1998). Combining Instrument and Performance Models for High-Quality Music Synthesis. Carnegie Mellon University, Pennsylvania.
Farnell, A. (2007). Marching Onwards: Procedural Synthetic Footsteps for Video Games and Animation. Proceedings of the Pure Data Convention.
Farnell, A. (2010). Designing Sound. London: MIT Press.
Forrester, M. (2006). Auditory Perception and Sound as Event: Theorising Sound Imagery in Psychology. Available: http://www.kent.ac.uk/arts/sound-journal/index.html. Last accessed 8th May 2013.
Gabor, D. (1946). Theory of communication. Journal of the Institution of Electrical Engineers, Part III, 93, 429-457.
Gabor, D. (1947). Acoustical quanta and the theory of hearing. Nature, 159, 591-594.
Gaver, W. (1993). How Do We Hear in the World?: Explorations in Ecological Acoustics. Ecological Psychology, 5 (4), 292-297.
Gorbman, C. (1976). Teaching the Soundtrack. Quarterly Review of Film and Video.
Gravetter, F.J. & Wallnau, L.B. (2011). Essentials of Statistics for the Behavioral Sciences. 7th ed. Belmont, CA: Thomson/Wadsworth.
Hahn, J., Geigel, J., Gritz, L., Takala, T. & Mishra, S. (1995). An Integrated Approach to Audio and Motion. Journal of Visualization and Computer Animation, 6 (2), 109-129.
Harley, J. (2004). Xenakis: His Life in Music. New York: Routledge. 215-218.
Harris, F. (1978). On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform. Proceedings of the IEEE.
Hennink, M., Hutter, I. & Bailey, A. (2011). Qualitative Research Methods. New Jersey: SAGE Publications Ltd.
Howe, K.R. (1988). Against the Quantitative-Qualitative Incompatibility Thesis or Dogmas Die Hard. Educational Researcher.
Järveläinen, H. (2000). Algorithmic Musical Composition. Helsinki University of Technology, Tik-111.080 Seminar on Content Creation.
Jenkins, J. & Ellis, C. (2007). Using Ground Reaction Forces from Gait Analysis: Body Mass as a Weak Biometric. Fifth International Conference on Pervasive Computing.
Jones, D. & Parks, T. (1988). Generation and Combination of Grains for Music Synthesis. Computer Music Journal, 12 (2), 27-34.
Lostchocolatelab. (2010). Audio Implementation Greats No. 8: Procedural Audio Now. Available: http://designingsound.org/2010/09/audio-implementation-greats-8-procedural-audio-now/. Last accessed 8th May 2013.
McNamara, C. (2008). General Guidelines for Conducting Interviews. Available: http://managementhelp.org/businessresearch/interviews.htm. Last accessed 8th May 2013.
Merriam, S.B. (1998). Qualitative Research and Case Study Applications in Education. San Francisco: Jossey-Bass.
Milicevic, M. (2008). Film Sound Beyond Reality: Subjective Sound in Narrative Cinema. Available: http://filmsound.org/articles/beyond.htm#pet5. Last accessed 8th May 2013.
Miller, G.A. (1956). The Magical Number Seven, Plus or Minus Two: Some Limits on our Capacity for Processing Information. The Psychological Review.
Moray, N. (1959). Attention in dichotic listening: Affective cues and the influence of instructions. Quarterly Journal of Experimental Psychology, 11, 56-60.
Mott, R. (1990). Sound Effects: Radio, TV and Film. Boston: Focal Press.
Newton, Sir I., Motte, A. & Machin, J. (2010). The Mathematical Principles of Natural Philosophy, Volume 1. Charleston, South Carolina: Nabu Press.
Nordahl, R., Serafin, S. & Turchet, L. (2009). Extraction of Ground Reaction Forces for Real Time Synthesis of Walking Sounds. Proceedings of the Audio Mostly Conference.
Nordahl, R., Serafin, S. & Turchet, L. (2010). Sound Synthesis and Evaluation of Interactive Footsteps for Virtual Reality Applications.
O'Brien, J., Cook, P. & Essl, G. (2001). Synthesising Sounds from Physically Based Motion. Computer Graphics Proceedings, Annual Conference Series.
Porter, D. & Schon, L. (2007). Baxter's The Foot and Ankle in Sport. 2nd ed. Missouri: Mosby.
Roads, C. (1988). Introduction to Granular Synthesis. Computer Music Journal, 12 (2), 11-13.
Roads, C. (1996). The Computer Music Tutorial. Massachusetts: MIT Press. 338-342.
Roads, C. (2001). Microsound. London: MIT Press. 85-118.
Robson, C. (2002). Real World Research: A Resource for Social Scientists and Practitioner-Researchers. 2nd ed. New Jersey: Wiley.
Rowe, R. (1993). Interactive Music Systems: Machine Listening and Composing. Cambridge: MIT Press.
Rowe, R. (1999). The Aesthetics of Interactive Music Systems. Contemporary Music Review, 18 (3), 83-87.
Saint-Arnaud, N. (1991). Classification of Sound Textures. Master of Science in Telecommunications. Université Laval, Quebec.
Sale, J., Lohfeld, L. & Brazil, K. (2002). Revisiting the Quantitative-Qualitative Debate: Implications for Mixed-Methods Research. Netherlands: Kluwer Academic Publishers.
Strobl, G., Eckel, G. & Rocchesso, D. (2006). Sound Texture Modelling: A Survey. Proceedings of the Sound and Music Computing Conference.
Strobl, G. (2007). Parametric Sound Texture Generator. Graz University, Styria.
Truax, B. (1993). Time-Shifting and Transposition of Sampled Sound with a Real-Time Granulation Technique. Proceedings of the International Computer Music Conference.
Turchet, L. & Serafin, S. (2011). A Preliminary Study on Sound Delivery Methods for Footstep Sounds. Proceedings of the 14th International Conference on Digital Audio Effects.
Turner, D. (2010). Qualitative Interview Design: A Practical Guide for Novice Investigators. The Qualitative Report, 15, 754-760.
Wood, N. & Cowan, N. (1995). The Cocktail Party Phenomenon Revisited: How Frequent Are Attention Shifts to One's Name in an Irrelevant Auditory Channel? Journal of Experimental Psychology: Learning, Memory and Cognition, 21 (1), 225-260.
Yewdall, D. (2011). The Practical Art of Motion Picture Sound. 4th ed. Oxford: Focal Press.