Developing Interactive Electronic Systems for
Improvised Music
Jason Alder
Advisor: Jos Herfs
ArtEZ hogeschool voor de kunsten
2012
Contents
INTRODUCTION ii
1. EVOLUTION OF ELECTRONICS IN MUSIC 1
2. IMPROVISATION 5
3. ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING 16
4. ARCHITECTURE 27
   A. CLASSIFICATION PARADIGMS 27
   B. LISTENER 33
   C. ANALYZER 39
   D. COMPOSER 59
5. CONCLUSION 69
REFERENCES 73
Introduction
This paper will discuss how one can develop an interactive electronics
system for improvisation, looking at how such a system differs from one
designed for composed music, and what elements are necessary for it to
“listen, analyze, and respond” musically. It will examine the nature of
improvisation and intelligence, and, through discussions of research in the
fields of cognition during musical improvisation and of artificial intelligence,
gather insight into how an interactive system must be developed so that it, too,
maintains an improvisational nature. Previously developed systems will be
examined, analyzing how their design concepts can be used as a platform from
which to build, as well as what can be changed or improved, through an analysis
of various components in the system I am currently designing, made especially
for non-idiomatic improvisation.
The use of electronics with acoustic instruments in music generally stems
from the goal of opening up possibilities and exploring a new sonic palette. There
is a wealth of approaches to how the electronics are implemented: a fixed
performance, as in tape-playback pieces; effects that manipulate the acoustic
sound, like guitar pedals; or pre-recorded and sequenced material triggered at
certain moments. A human is often controlling these electronics, whether the
performer or another person behind a computer or other medium, but the
possibility of the electronics controlling themselves brings some interesting
ideas to the improvisation world. With the advances in technology and computer
science, it is possible to create an interactive music system that will “interpret a
live performance to affect music generated or
modified by computers” (Winkler, 1998). Using software such as Max/MSP, the
development of a real-time interactive system that “listens” to and “analyzes” the
playing of an improviser, and “responds” in a musical way, making its own
“choices”, is closer to fact than the science-fiction imagery it may impart.
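As a rough sketch of this “listen, analyze, respond” cycle, the following Python fragment models the three stages as methods of a single object. The class, the averaging feature, and the interval-based response rule are illustrative assumptions made for this paper, not the actual Max/MSP implementation discussed later:

```python
import random

class InteractiveSystem:
    """Illustrative listen-analyze-respond cycle (not the Max/MSP patch itself)."""

    def __init__(self):
        self.heard = []  # pitches captured from the live performer

    def listen(self, pitch):
        """Capture one incoming event, e.g. a detected MIDI pitch."""
        self.heard.append(pitch)

    def analyze(self):
        """Derive a simple feature: the average of the last eight pitches heard."""
        recent = self.heard[-8:]
        return sum(recent) / len(recent) if recent else None

    def respond(self):
        """Make a 'choice': answer near the performer's register, with variation."""
        center = self.analyze()
        if center is None:
            return None
        return int(center) + random.choice([-7, -5, 0, 5, 7])

system = InteractiveSystem()
for p in [60, 62, 64, 65, 67]:  # performer plays a C-major fragment
    system.listen(p)
response = system.respond()
```

Feeding the sketch a short performed fragment yields a response pitch near the performer's register, a toy version of the system “making its own choices”.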
1. Evolution of Electronics in Music
An initial question some may have when considering improvisation with a
computer is, “Why?” More specifically, “Why improvise with a computer when
you could improvise with other humans?” The use of electronics in music is not
an entirely new concept. The Theremin, developed in 1919, is one of the earliest
electronic instruments1. Utilizing two antennae, one for frequency and the other
for amplitude, it produces music through pitches created with oscillators. The
instrument is played by varying the distance of one’s hands to each of the
antennae: moving the right hand towards and away from the antenna connected
to the frequency circuit changes the sounding pitch, while the other hand does
the same with respect to the amplitude antenna to change the volume (Rowe,
1993). Throughout the 20th century, more and more instruments utilizing
electric current were developed, for example monophonic keyboard instruments
like the Sphärophone (1927), Dynaphone (1927–8), and the Ondes Martenot
(1928). These first attempts at electronic instruments were often modeled to try
to provide characteristics of acoustic instruments. Polyphonic inventions such as
the Givelet (1929) and Hammond Organ (1935) became more commercially
successful as replacements for pipe organs, although the distinct characteristic
sound of the Hammond also gave rise to those wanting to experiment with its
sonic possibilities beyond the traditional manner (Manning, 2004).
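The Theremin's two-antenna control described above can be sketched numerically. The distance-to-frequency and distance-to-amplitude mappings below are illustrative assumptions for the sake of the sketch, not measurements of the instrument's actual circuit:

```python
import math

SAMPLE_RATE = 44100  # samples per second

def theremin_sample(n, pitch_distance, volume_distance):
    """One output sample of a toy Theremin model: the closer a hand is to an
    antenna, the higher the frequency (pitch antenna) or the louder the
    amplitude (volume antenna). Both mappings are invented for illustration."""
    freq = 220.0 + 880.0 / max(pitch_distance, 0.05)  # Hz, rises as hand nears antenna
    amp = min(1.0, 1.0 / max(volume_distance, 1.0))   # 0.0 .. 1.0
    return amp * math.sin(2 * math.pi * freq * n / SAMPLE_RATE)

# One second of audio for fixed hand positions:
samples = [theremin_sample(n, pitch_distance=0.5, volume_distance=2.0)
           for n in range(SAMPLE_RATE)]
```

Varying the two distance parameters over time would produce the continuous pitch and volume glides characteristic of the instrument.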
As has been the case throughout the development of music, the change and
development of new technology opens doors and minds to previously
1 For an explanation and demonstration of Theremin playing, see http://www.youtube.com/watch?v=cd4jvtAr8JM
unexplored musical territory. Chopin and Liszt were inspired “by the huge
dramatic sound of a new piano design. The brilliance and loudness of the thicker
strings was made possible by the development of the one-piece cast-iron frame
around 1825” (Winkler, 1998). In late 1940s Paris, Pierre Schaeffer
was making Musique Concrète using the new recording technology available by
way of the phonograph and magnetic tape, and “the invention of the guitar
pickup in the 1930s was central to the later development of rock and roll. So it
makes sense today, as digital technology provides new sounds and performance
capabilities, that old instruments are evolving and new instruments are being
built to fully realize this new potential” (Winkler, 1998).
Balilla Pratella, an Italian futurist, published his Manifesto of Futurist
Musicians in 1910 calling for “the rejection of traditional musical principles and
method of teaching and the substitution of free expression, to be inspired by
nature in all its manifestations”, and in his Technical Manifesto of Futurist Music
(1911) he urged that composers should “master all expressive technical and dynamic
elements of instrumentation and regard the orchestra as a sonorous universe in
a state of constant mobility, integrated by an effective fusion of all its constituent
parts” and their work should reflect “all forces of nature tamed by man through
his continued scientific discoveries, […] the musical soul of crowds, of great
industrial plants, of trains, of transatlantic liners, of armored warships, of
automobiles, of airplanes” (Manning, 2004). In response, Luigi Russolo published
his manifesto The Art of Noises:
“Musical sound is too limited in qualitative variety of timbre. The most complicated of orchestras reduce themselves to four or five classes of instruments differing in timbre: instruments played with the bow, plucked instruments, brass-winds, wood-winds and percussion
instruments… We must break out of this narrow circle of pure musical sounds and conquer the infinite variety of noise sounds.” (Russolo, 1913)
John Cage’s interest in improvisation and indeterminacy was an influence
on the composers of the sixties who first began experimenting with electronic
music in a live situation. Gordon Mumma’s Hornpipe (1967), “an interactive
live-electronic work for solo hornist, cybersonic console, and a performance space,”
used microphones to capture and analyze the performance of the solo horn
player, as well as the resonance and acoustic properties of the performance
space. The horn player is free to choose pitches, which in turn affects the
electronics in the “cybersonic console”. The electronic processing emanating from
the speakers then changes the acoustic resonance of the space, which is
re-processed by the electronics, thus creating an “interactive loop” (Cope, 1977).
Morton Subotnick worked with electrical engineer Donald Buchla to create the
multimedia opera Ascent Into Air (1983), with “interactive computer processing
of live instruments and computer-generated music, all under the control of two
cellists who are part of a small ensemble of musicians on stage” (Winkler, 1998).
Subotnick later worked with Marc Coniglio to create Hungers (1986), a staged
piece where electronic music and video were controlled by the musicians.
Winkler comments on the “element of magic” in live interactive music,
where the “computer responds ‘invisibly’ to the performer”, and the heightened
drama of observing the impact that the actions of the clearly defined roles of
computer and performer have on one another. He continues by saying that “since
the virtue of the computer is that it can do things human performers cannot do, it
is essential to break free from the limitations of traditional models and develop
new forms that take advantage of the computer’s capabilities” (Winkler, 1998).
The role of electronics in music is that of innovation. The aural possibilities,
and a computer’s ability to perform actions that humans cannot, create a world
of options not previously available. Utilizing these options fulfills Russolo’s
futurist vision, and using these tools for improvisation expands the potential
output of an electronics system. By allowing artificial indeterminacy, human
constraints dissipate and doors open to the potential of otherwise
unimaginable results.
2. Improvisation
The question of how one makes a computer capable of improvising is one
of the crucial elements in the task of developing an interactive improvisational
system. As a computer is not self-aware, how can it make “choices” and respond
in a musical manner? To address this issue, I looked to the nature of
improvisation. What is it that is actually happening when one improvises? What
is the improviser thinking about in order to play the “correct” notes, such that it
sounds like music, as opposed to a random collection of pitches or sounds? Some
may have a notion that improvisation is just a free-for-all, where the player can
do anything they wish, but this is clearly not the case. If one were to listen to an
accomplished jazz pianist play a solo, as well as an accomplished classical pianist
play a cadenza, each would likely make their improvisation sound easy and
effortless, and flow in its style. But if the roles were reversed, and the jazz
pianist played a Mozart cadenza and the classical pianist played a solo in a jazz
standard, there would likely be a clear difference in how they sound. The music
theorist Leonard Meyer defines style as:
“a replication of patterning, whether in human behavior or in the artifacts produced by human behavior, that results from a series of choices made within some set of constraints… [which] he has learned to use but does not himself create… Rather they are learned and adopted as part of the historical/cultural circumstances of individuals or group” (Meyer, 1989).
There are traits and traditions particular to each style that make a piece of music
sound the way it does, and be identified as being in that style. Without the proper
training and knowledge of rhythmic and harmonic development and particular
important traits for each style, a player cannot properly improvise within it,
Chapter 2 6
which is why one would hear such a difference between the classical and jazz
pianists improvising in the same pieces.
Improvisation takes elements from material and patterns of its associated
musical culture. “The improviser’s choices in any given moment may be
unlimited, but they are not unconstrained” (Berkowitz, 2010). Mihály
Csikszentmihályi, a psychologist specializing in the study of creativity, states:
“Contrary to what one might expect from its spontaneous nature, musical improvisation depends very heavily on an implicit musical tradition, on tacit rules… It is only with reference to a thoroughly internalized body of works performed in a coherent style that improvisation can be performed by the musician and understood by the audience” (Csikszentmihályi and Rich, 1997).
These traditions and rules are the conventions that stand as a basis, a
common language, for the performer to communicate to the listeners. They are
the referent, defined by psychologist and improviser Jeff Pressing as “an
underlying formal scheme or guiding image specific to a given piece, used by the
improviser to facilitate the generation and editing of improvised behavior…”
(Pressing, 1984). The ethnomusicologist Bruno Nettl calls the referent a “model”
for the improviser to “ha[ve] something given to work from – certain things that
are at the base of the performance, that he uses as the ground on which he
builds” (Nettl, 1974). The referents, or models, are the musical elements such as
melodies, chord patterns, bass lines, motifs, etc., used as the basis to build the
improvisation. They provide the structural outline and the material, but are part
of the larger knowledge base necessary, which is “built into long term memory”
(Pressing, 1998).
It is also necessary to have “rapid, real-time thought and action” (Berkowitz,
2010) to successfully incorporate this musical information into a unique,
improvised piece of music. Pressing says:
“The improviser must effect real-time sensory and perceptual coding, optimal attention allocation, event interpretation, decision-making, prediction (of the actions of others), memory storage and recall, error correction, and movement control, and further, must integrate these processes into an optimally seamless set of musical statements that reflect both a personal perspective on musical organization and a capacity to affect listeners” (Pressing, 1998).
Through study and practice, the referents become ingrained into the playing of
the improviser, and the note-to-note level of playing can be recalled
automatically, allowing the improviser to focus more on the higher-level musical
processes, such as form, continuity, feeling, etc.
Aaron Berkowitz, in his book The Improvising Mind: Cognition and
Creativity in the Musical Moment, studies which elements of improvisation are
conscious or unconscious decisions. He finds that “some conventions and rules
are accessible to consciousness, while others may function without conscious
awareness” (Berkowitz, 2010). These elements of memory are related directly to
the learning process, as stated by psychologist Arthur Reber:
“There can be no learning without memorial capacity; if there is no memory of past events, each occurrence is, functionally, the first. Equivalently, there can be no memory of information in the absence of acquisition; if nothing has been learned, there is nothing to store” (Reber, 1993).
The learning process can be separated into two forms, implicit and explicit.
Implicit learning is defined as:
“The acquisition of knowledge about the underlying structure of a complex stimulus environment by a process which takes place naturally, simply and without conscious operations… a non-conscious and automatic abstraction of the structural nature of the material arrived at from experience of instances,”
whereas explicit learning is:
“A more conscious operation where the individual makes and tests hypotheses in a search for structure… [;] the learner searching for information and building then testing hypotheses… [;] or, because we can communicate using language… assimilation of a rule following explicit instructions” (Ellis, 1994).
The important difference between implicit and explicit learning is the conscious
effort required by explicit learning but not by implicit learning. It is also possible to learn
implicit information during explicit learning. Berkowitz gives the example of
learning a foreign language, and memorizing phrases in the new language by
explicitly focusing on features of the words, phrases, sounds, and structures, but
at the same time implicitly learning other attributes of language (Berkowitz,
2010).
Similarly, implicit memory is defined as “memory that does not depend on
conscious recollection,” and explicit memory as “memory that involves conscious
recollection” (Eysenck and Keane, 2005). The relationship between learning and
memory is not necessarily direct and can change. Something learned implicitly
can be consciously, and thus explicitly, analyzed, and explicit knowledge can
become implicit “through practice, exposure, drills, etc.…” (Gass and Selinker,
2008).
In Berkowitz’s interviews with classical pianist Robert Levin, Levin
describes his thought processes, or sometimes lack thereof, while he improvises.
While being explicitly aware of the overall musical picture as it is happening, he
is not thinking on a note-by-note basis of what he is doing, or what he will do. He
allows his fingers to move implicitly, the years and years of practice guiding
them in the right directions. He says of the process:
“I began to realize you’re just going to have to let go of it and go wherever you go. The way jazz people do: you have this syntactical thing just the way they have their formulas, you’ve got the basics of architecturally how a cadenza works and its sectionalization, which can be abstracted from all of these cadenzas, and then you just have to accept the fact that there’s going to be some disorder… When I play, I am reacting… your fingers play a kind of, how shall I say, a potentially fateful role in all this, because if your fingers get ahead of your brain when you’re improvising, you get nonsense or you get emptiness. I never, and I mean never, say ‘I’m going to modulate to f-sharp major now,’ or ‘I’m going to use a dominant seventh now,’ or ‘I’m going to use a syncopated figure now…’ I do not for one millisecond when I’m improvising think what it is I’m going to be doing. I don’t say, ‘Oh I think it’s about time to end now…’” (Levin, 2007).
Berkowitz focuses on comparing improvising with language production.
When speaking in one’s native language, there is not a word-by-word analysis of
what is going to be said. The overall direction of the statement is known, but one
is not thinking word-by-word, nor about specific grammatical rules. These are
implicit elements that manifest during speaking. Children, when learning to
speak, are able to do so without any explicitly taught grammar; they simply
learn what sounds “right”. There is also no acute awareness of the physical
aspects of speech, such as tongue, lip, and larynx position (Berkowitz, 2010).
These just fall into their learned positions in the body’s muscle memory. This
lack of direct cognition during spontaneous speech production is the same as in
improvising. Once one has learned and internalized the vocabulary and
grammatical rules to the point where they are automatically and implicitly
recalled, one can “leave nearly everything to the fingers and to chance” (Czerny, 1839).
Achieving this level of competence comes from the development of one’s
“toolbox”, or Knowledge Base. Pianist Malcolm Bilson cites the collection of ideas
for this toolbox, from the internalization of repertoire and exercises, as one of
the elements of learning to improvise (Bilson, 2007). Once the material has been
stored in the toolbox, it can be drawn upon spontaneously during improvisation,
but it is through the practice and refinement of the skill of improvising that one
can “link up novel combinations of actions in real-time and chang[e] chosen
aspects of them”, giving one “the ability to construct new, meaningful pathways
in an abstract cognitive space” (Pressing, 1984). This process of refinement and
vocabulary development is largely implicit, in contrast to the explicit rote
learning of chords and harmonic progressions (Berkowitz, 2010).
While Levin acknowledges that his fingers play a “fateful role” in
improvising, and that there is a lack of cognition of what exactly they will do, he
says also:
“I get to a big fermata, I think, ‘What am I going to do now? Oh, I’ll do that.’ So there’s a bit of that, but not the sense of doing it every two bars” (Levin, 2007).
This creates a dichotomy in the thinking process. On one hand, there is no
thinking, only allowing the fingers to move; on the other, there is an overall
sense of direction and of where the fingers need to go, and “get[ting] reasonably
lucky most of the time” (Levin, 2007). Psychologist Patricia
Nardone describes this “creator-witness dichotomy” (Berkowitz, 2010) as
“…ensuring spontaneity while yielding to it…[,] being present and not present to
musical processes: a divided consciousness… [,] exploring a musical terrain that
is familiar and unfamiliar…” She discusses this further:
“One dialectic process is that while improvising musicians are present to and within the musical process, they are also concomitantly allowing musical possibilities to emerge pre-reflectively, effortlessly, and unprompted. Conversely, while musicians are outside the improvisational process and fully observant of it, they are paradoxically directing and ensuring the process itself. A second dialectical paradox is that in improvisation there is an intention to direct and ensure spontaneous musical variations while allowing the music itself to act as a guide toward a familiar domain. A third dialectical paradox is that while being present to and within the process of musical improvisation, musicians concomitantly allow the music to guide them toward an unfamiliar terrain. Conversely, while being outside the musical process and fully observant of it, musicians paradoxically intend the music toward a terrain that is familiar to them” (Nardone, 1997).
Paul Berliner speaks of the physicality of the improvisation process on the body,
“through its motor sensory apparatus, it interprets and responds to sounds and
physical impressions, subtly informing or reshaping mental concepts” (Berliner,
1994). This physicality in improvisation can also be likened to that of
spontaneous speech. One needs the effortless mechanical skills of, most often,
their hands to play their instrument just as a speaker needs the mechanical skills
of tongue, mouth, and larynx, as well as a proficiency of the syntax of music and
language to effectively communicate (Berkowitz, 2010). Czerny also speaks of
the creator-witness in reference to a speaker that “does not think through each
word and phrase in advance… [but] must… have the presence of mind… to
adhere constantly to his plan…” (Czerny, 1836).
Regarding this creator-witness dichotomy, Levin describes his
thoughts once he is done improvising: “After I’m finished doing it, I… have no
idea what I played” (Levin, 2005). To this Berkowitz poses the questions, “Is not
some memory of what is occurring during the improvisation necessary if the
performer is to make it from point a to point b? Or can this only prove to be a
hindrance?” (Berkowitz, 2010). The answer to this lies in the findings on implicit
and explicit memory. With time, the practiced and honed skill of improvising
enters implicit memory as motoric reactions, even though the actions
themselves cannot be explicitly remembered. The improviser may begin with an
idea, but is then led by the movements of the fingers, allowing the music to “flow
from moment to moment magically manifest[ing], without a need to know or
remember where one has been or where one is going. In improvised
performance, the boundaries between creator and witness, past and future, and
music and musician dissolve into the musical moment” (Berkowitz, 2010).
Willem J.M. Levelt describes the processes for the generation of speech in
his book Speaking as:
Conceptualization. In this process, one plans “the communicative intention by selecting the information whose expression may realize the communicative goals.” In other words, one plans the idea(s) behind the intended message in a preverbal fashion.
Formulation. In this process, the conceptualized message is translated into linguistic structure (i.e., grammatical and phonological encoding of the intended message take place). This phrase is converted into a phonetic or articulatory plan, which is a motor program to be executed by the larynx, tongue, lips, etc.
Articulation. This is the process of actual motor execution of the message, that is, overt speech.
Self-monitoring and self-repair. By using the speech comprehension system that is also used to understand the speech of others, the speaker monitors what he or she is saying and how he or she is saying it on all levels from word choice to social context. If errors occur, the speaker must correct them (Levelt, 1989; Berkowitz, 2010).
The application of these ideas to improvisation is logical. The overall
improvisation is the concept, the form, structure, and style is the formulation,
playing the music is the articulation, and as the music is happening the
performer is monitoring the output and making corrections.
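Levelt's four processes, mapped onto improvisation as above, can be sketched as a simple pipeline. The stage functions and the musical encodings they use are illustrative assumptions, not a model from the literature:

```python
def conceptualize():
    """Plan the preverbal 'idea': here, an overall contour to aim for."""
    return {"direction": "ascending", "length": 4}

def formulate(concept):
    """Encode the idea into concrete musical structure (MIDI pitches)."""
    start = 60
    step = 2 if concept["direction"] == "ascending" else -2
    return [start + i * step for i in range(concept["length"])]

def articulate(phrase):
    """'Play' the phrase: a stand-in for actual motor execution / sound output."""
    return ["note:%d" % p for p in phrase]

def monitor_and_repair(phrase, low=0, high=127):
    """Check the output and correct errors, like a speaker self-repairing."""
    return [min(max(p, low), high) for p in phrase]

concept = conceptualize()
phrase = monitor_and_repair(formulate(concept))
output = articulate(phrase)  # ['note:60', 'note:62', 'note:64', 'note:66']
```

Each stage consumes the previous stage's result, mirroring how the concept, formulation, articulation, and self-monitoring feed into one another.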
Improvisation can also, however, be likened to learning a foreign language
rather than a native language. Following Levelt’s processes, one is much more
conscious of what the conceptualized statement is, the formulation of the
translation and ordering of the words, and the correctly articulated
pronunciation. Sometimes, particularly when beginning, the monitoring and
repair section is not even achievable, as one does not even know that there was a
mistake. It can be that the foreign language learner may have knowledge and
understanding of the rules of sentence construction, but is not able to formulate
them in a manner for an effective conversation. Berkowitz analogizes this to
Levin’s descriptions of learning to improvise, and the balance between thinking
too much about what he was doing and just allowing his fingers to go. Thinking
about the referent and overall structure interfered with the fingers and
the note-by-note implicit level of playing. Michel Paradis says that the foreign
language speaker “may either use automatic processes or controlled processes,
but not both at the same time… Implicit competence cannot be placed under the
conscious control of explicit knowledge” (Paradis, 1994).
Finding a balance between planning and execution in speech and
improvisation is thus necessary. Eysenck and Keane estimate that 70 percent of
spoken language uses recurrent word combinations, and thus pre-formulation is
one tool for finding this balance (Eysenck and Keane, 2005). From a musical
perspective, this is akin to combining elements from the “toolbox,” allowing for
more attention to be paid to the referent.
Improvisation occurs constantly in everyday life. Consider, for example, the
decision to drive to the store. There must be a general plan;
one must know the way and the best route to take, but what happens in between
is unknown. Encountering other cars, traffic lights, road construction, a dog
running across the street, etc., can all change the originally intended plan, and
the ability to immediately react and adapt to the situation is imperative. Befitting
of this example, Berkowitz says:
“Improvisation cannot exist without constraints, and that live performance will always require some degree of improvisation as its events unfold. Improvisation needs to operate within a system even when the resultant music transcends that system. Moreover, no performance situation – improvised or otherwise – exists in which all variables can be entirely predetermined” (Berkowitz, 2010).
Similarly, Levin states:
“The fact of the matter is that you are who you have been in the process of being who you will be, and in nothing that you do will you suddenly – as an artist or a person – come out with something that you have never done before in any respect. There will be quite possibly individual elements in a performance that are wildly and pathbreakingly different from anything that you’ve done before, but what about the rest and what kind of persona and consistency of an artist would you have if there was no way to connect these things…?” (Levin, 2007).
The key elements learned about improvisation here are the spontaneous
development and recombination of previously learned material and the lack of
specific conscious decisions, yet maintaining an overall view of the direction the
music is going. The musical decisions that come from spontaneous
recombination are sourced from the musician’s training and study, and what
patterns have been learned and have found their way into the implicit memory.
This is why classical and jazz pianists will improvise differently to the same
music; they have different “toolboxes”. It can then also be said that whatever
goes into the toolbox will have an effect on the output. The training that a
musician receives will be represented by the music produced. This is important
to consider for the development of an electronic music system; the contents of its
toolbox will reflect its output. Once an understanding of the nature of
improvisation has been established, the application of these principles to the
computer is the next step.
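The observation that whatever goes into the toolbox shapes the output can be made concrete in a short sketch. The Toolbox class and its transposition-based recombination rule below are hypothetical illustrations, not the design of the system presented in chapter 4:

```python
import random

class Toolbox:
    """Hypothetical knowledge base: what goes in constrains what comes out."""

    def __init__(self):
        self.patterns = []  # the internalized vocabulary

    def learn(self, pattern):
        """Store a pattern, as repertoire and exercises fill a player's toolbox."""
        self.patterns.append(list(pattern))

    def improvise(self, n_phrases=3):
        """Spontaneously recombine stored material into a new line."""
        line = []
        for _ in range(n_phrases):
            phrase = random.choice(self.patterns)  # recall learned material
            shift = random.choice([-2, 0, 2])      # vary it, don't just repeat it
            line.extend(note + shift for note in phrase)
        return line

jazz = Toolbox()
jazz.learn([60, 63, 65, 66])  # a blues fragment
jazz.learn([67, 66, 65, 63])
line = jazz.improvise()  # every note is derived from the learned fragments
```

A toolbox filled with jazz fragments can only produce jazz-derived lines; filling it with different material would change the output accordingly, which is exactly the design consideration raised above.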
3. Artificial Intelligence and Machine Learning
The notion of a computer “making choices” in improvisation has been
mentioned here. There is an implication that to make a choice, one must be
capable of some amount of intelligence, which introduces the question, “What is
intelligence?” One might consider the solving of complex equations by a highly
gifted mathematician, or the moves performed by a chess master, or the
diagnoses of disease by a doctor, as being intelligent. However, the tasks
performed by all of these humans can also be accomplished by a computer,
which is not typically considered intelligent. As Eduardo Reck Miranda
says, “the problem is that once a machine is capable of performing such types of
activities, we tend to cease to consider these activities as intelligent. Intelligence
will always be that unknown aspect of the human mind that has not yet been
understood or simulated” (Miranda, 2000). Defining intelligence may be a
contentious task, so we will instead look to its attributes. Widmer points out that
“the ability to learn is undoubtedly one of the central aspects, if not the defining
criterion, of intelligence and intelligent behavior. While it is difficult to come up
with a general and generally agreed definition of intelligence, it seems quite
obvious that we would refuse to call something ‘intelligent’ if it cannot adapt at
all to changes in its environment, i.e., if it cannot learn” (Widmer, 2000).
It is quickly recognized that, even as research and technology in the field of
artificial intelligence advance toward bringing “musicality to computer music, no
model has yet come close to the complex subtleties created by humans” (Winkler,
1998), a sentiment echoed by Widmer’s statement that although computers and
software can “extract general, common performance patterns; the fine artistic
details are certainly beyond their reach” (Widmer, 2000). Although Miranda
claims that “from a pragmatic point of view, the ultimate goal of Music and AI
[Artificial Intelligence] research is to make computers behave like skilled
musicians” (Miranda, 2000), it is clear that a machine is not human, and any
attempts to create an intelligent computer are merely tasks of trying to recreate
processes of the brain.
So the focus becomes one of determining what these processes are,
accomplished by looking at the desired end result. When creating a model,
attention is paid to the original design and the details necessary to copy it. But is
the goal really to create a system that is a copy of a human? One of the desirable
attributes of a computer is exactly that it is not human, such as its ability to
handle and process large amounts of data and perform calculations with a speed
and accuracy far greater than that of a human. Dannenberg speaks of the
advantages of relying on a computer’s skills and its ability to “compose complex
textures that are manipulated according to musical input. For example, a dense
cloud of notes might be generated using pitches or harmony implied by an
improvising soloist. A dense texture is quite simple to generate by computer, but
it is hard to imagine an orchestra producing a carefully sculpted texture while
simultaneously listening to and arranging pitch material from a soloist”
(Dannenberg, 2000). Rowe points out that human limitation and variability was
precisely an element that led to the use of electronics in music (Rowe, 1993) and
Bartók comments on the use of the mechanized pianola that “took advantage of
all the possibilities offered by the absence of restraints that are an outcome of
the structure of the human hand” (Bartók, 1937).
Michael Young identifies a resulting attribute of what he calls a “living”
computer as being “unimagined music, its unresolved and unknown
characteristics offering a genuine reason for machine-human collaboration.” If
the computer is to “extend, not parody, human creative behaviour, machine
music should not emulate established styles or practices, or be measured
according to any associated, alleged aesthetic” (Young, 2008). It is the discovery
of new ideas and material through the use of computers in music to “create new
musical relationships that may exist only between humans and computers in a
digital world” (Winkler, 1998) that drives the continuing research in the
development of computers in music.
Looking at these factors it can be seen that a desired system may “behave
in a human-like manner in some respects but in a non-human-like manner in
other respects [… Exhibiting] appropriate behavior… in a manner which leads to
a certain goal” (Marsden, 2000). Referring to Widmer’s earlier quote about
intelligence, that goal is the ability to learn.
This then brings the question, “What is learning?” Russell and Norvig define
it as “behaving better as a result of experience” (Russell and Norvig, 1995); while
Michalski states that it is “constructing or modifying representations of what is
being experienced” (Michalski, 1986). These two definitions address different
elements of learning: improvement of behavior, as stated by Russell and Norvig,
and acquisition of knowledge of the surroundings, as stated by Michalski.
Marsden summarizes by saying that one key feature of an intelligent animal is its
ability to learn spontaneously from its experiences and adapt future actions as a
response to this, and that a second feature is being able to perform in unfamiliar
environments of which they have no previous knowledge, “tolerably well.” As
such, a goal of Artificial Intelligence is the capacity to learn and apply this
learning in unfamiliar situations (Marsden, 2000).
How, then, does a computer accomplish learning in its quest for
intelligence? Widmer cites Michalski’s definition, “learning as the extraction of
knowledge from observations or data”, as the “dominant paradigm in machine
learning research”, with examples of “classification and prediction rules (Clark
and Niblett, 1989, Quinlan, 1990), decision trees (Quinlan, 1986, 1993), or logic
programs (Lavrac and Dzeroski, 1994)” (Widmer, 2000). Through the use of
algorithms, a computer is able to assess data and make comparisons for
purposes of classification. For example, from a stream of pitches an algorithm
can analyze music to “look for collections of notes which form a series, or… check
collections of notes to see if they form a series” (Wiggins and Smaill, 2000).
Learning is thus accomplished through observation of data, allowing the
computer to classify notes as part of a defined series, or to look for a
series within the notes. Empirical predictions based on trends and probabilities
can be made using generalizations based upon these observations. It is possible
to analyze a stream of notes, looking at intervallic relationships, to determine the
likelihood of what the next note will be. For instance, if the software sees
the ascending stepwise motion of the incoming pitches F G A, it could
reasonably assume that the next note will be a B. Coupled with some
programmed information akin to the knowledge “toolbox” discussed in the
previous section about improvisation, the computer could make even more
robust analyses on the basis of tonality to predict upcoming notes, thus
knowing that B-flat is also a likely possibility. As the computer continues to
analyze and find trends and patterns in a piece of music, its Knowledge Base can
grow and assign more accurate weights to the probabilities of certain notes. In
this respect, the learning corresponds to “behaving better as a result of
experience.”
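This empirical prediction step can be illustrated with a small Python sketch. The function name, candidate set, and probability weights below are hypothetical illustrations, not taken from any of the systems discussed:

```python
# Hypothetical sketch: predicting a likely next pitch from recent intervals.
# Pitches are MIDI note numbers; the weights are illustrative only.

def predict_next(pitches):
    """Given recent MIDI pitches, return candidate next pitches with weights."""
    if len(pitches) < 3:
        return {}
    a, b, c = pitches[-3:]
    candidates = {}
    # Ascending stepwise motion (e.g. F G A) suggests the line continues up.
    if 1 <= b - a <= 2 and 1 <= c - b <= 2:
        candidates[c + 2] = 0.5   # whole step up (A -> B)
        candidates[c + 1] = 0.3   # half step up (A -> Bb), plausible tonally
        candidates[c] = 0.2       # repetition is always possible
    return candidates

weights = predict_next([65, 67, 69])   # F G A
# 71 (B) is the highest-weighted continuation, 70 (Bb) second.
```

As the Knowledge Base grows, such weights would be adjusted from observed trends rather than fixed by hand.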
Music theorist Heinrich Schenker says that repetition is “the basis of music
as art. It creates musical form, just as the association of ideas from a pattern in
nature creates the other forms of art” (Schenker, 1954). For this reason, the
ability to recognize patterns is an important one for computers, and a key feature
for music systems. Patterns occur in music at all levels, including “pitch,
time, dynamics and timbre dimensions of notes, chords and harmony, contours
and motion, tension and so on” (Rolland and Ganascia, 2000). Scale structures,
melodic sequences, rhythms, and chord progressions are all based on the
repetition of patterns. The cognitive processes of expectation and anticipation
derive from the brain’s ability to pick out and identify patterns (Simon and
Sumner, 1968). A cadential chord progression of a V resolving to vi, for instance,
is called a deceptive cadence. Typically in Western music, the chord pattern
should resolve to I, and because the pattern does not go where the listener
expects or anticipates that it will, the listener has been deceived.
Robert Rowe’s software Cypher uses the concept of anticipation to predict
the performer’s playing by looking for patterns in real-‐time. In this sense, Cypher
is learning based on Russell and Norvig’s definition, “behaving better as a result
of experience”. Once Cypher detects the first half of a recognized pattern, it
assumes that it will be continued, and can then respond to this information as
appropriate (Rowe, 1993). The recognition and extraction of patterns involves
“detecting parts of the source material that have been repeated, or
approximately repeated, sufficiently to be considered prominent”. Some
questions raised by Rolland and Ganascia are: “How should ‘parts’ be selected?”,
“What is ‘approximate repetition’?”, “What is ‘sufficiently’?”, and “What
algorithms can be designed and implemented?” (Rolland and Ganascia, 2000). The manner in
which these questions are answered depends on the nature of the music and
how the pattern information is to be used by the software.
Rowe defines two goals in pattern processing as “1) learning to recognize
important sequential structures from repeated exposure to musical examples
(pattern induction), and 2) matching new input against these learned structures
(pattern matching).” Additional information can also be collected from the
patterns, such as the frequency and context of occurrence, and the relationships
between them. Transposition and retrograde are two such relationships that can
enrich the capabilities of the pattern identifier. Other
enrichment can be the ability to recognize differences with the addition or
omission of notes, metric and rhythmic displacements, altered phrasing and
articulation, and ornamentation (Rolland and Ganascia, 2000).
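Rowe’s two goals can be made concrete with a deliberately simplified Python sketch. Representing phrases by their successive intervals, one of the enrichments just mentioned, lets transposed repetitions match; the function names and the “sufficiently” threshold are illustrative assumptions:

```python
from collections import Counter

def intervals(phrase):
    """Represent a phrase by its successive intervals, so transpositions match."""
    return tuple(b - a for a, b in zip(phrase, phrase[1:]))

def induce_patterns(stream, length, min_count=2):
    """Pattern induction: collect interval patterns repeated 'sufficiently' often."""
    counts = Counter(intervals(stream[i:i + length])
                     for i in range(len(stream) - length + 1))
    return {p for p, n in counts.items() if n >= min_count}

def match(new_phrase, learned):
    """Pattern matching: does new input fit a learned structure?"""
    return intervals(new_phrase) in learned

# A C-D-E motif (MIDI 60 62 64), then the same shape transposed to G-A-B:
stream = [60, 62, 64, 67, 69, 71]
learned = induce_patterns(stream, 3)
print(match([53, 55, 57], learned))  # F G A has the same shape, so True
```

A fuller implementation would also tolerate approximate repetition (added or omitted notes, rhythmic displacement), which this exact-match sketch does not attempt.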
There will be an inherent bias from the system developer as to the decision
of what constitutes “sufficiently” prominent material to be analyzed. Widmer
addresses the fact that bias can occur in the “representation language in which
the learning system can represent its hypotheses” and that one must “be very
conscious of, and explicit about, any assumptions that guide his/her choice […] of
representation language” (Widmer, 2000). Rowe stresses that it is “critical to
take care that the parameters of the representation preserve salient aspects of
the musical flow” (Rowe, 1993), and Miranda cites, “Designers of AI systems
require knowledge representation techniques that provide representational
power and modularity. They must capture the knowledge needed for the system
and provide a framework to assist the systems designer to easily organize this
knowledge (Bench-Capon, 1990; Luger and Stubblefield, 1989).” The point here
is to be mindful of how musical information is expressed to the computer. For
example, in a piece of music there could exist two phrases, one a C-major scale,
the other an Eb-major scale. If these were represented as note names (Fig. 1),
the two phrases would be regarded as not matching. However, if they were
represented as intervals (Fig. 2), counted as the number of semitones between
notes (note the ‘-’ for the value of note1: an interval requires two notes, so
analysis cannot begin until the second note is played), then the phrases would
be considered matches, and the computer could choose to take an action on the
basis of the knowledge that there is scalar activity occurring. Another example
concerns rhythm. For instance, a phrase could be played all in half-notes, and
then again all in quarter-notes. If the analysis looked solely at the lengths of
the notes and phrases, the two would not match. However, if the lengths of the
notes were represented as ratios compared to the previous note, in this example
all 1:1, then there would be a match. These are merely two very simple examples
of the way the representative language can impact the analysis results. This is
not to say that phrase analysis should be based solely on one piece of
information or the other, nor that the differences should be disregarded. The
information that the melodic line is the same intervals but transposed, and that the rhythmic pattern
is the same but double speed, is also important data that must be expressed and
recorded as a separate point of analysis. This illustrates examples of how data
can be interpreted by “abandon[ing] the note level and learn[ing] expression
rules directly at the level of musical structures” (Widmer, 2000).
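The two representation examples above can be made concrete in a short Python sketch. The scales and durations are the ones from the text; the function names are illustrative:

```python
def as_intervals(pitches):
    """Pitch list -> semitone steps between successive notes."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def as_ratios(durations):
    """Duration list -> length of each note relative to the previous one."""
    return [b / a for a, b in zip(durations, durations[1:])]

c_major  = [60, 62, 64, 65, 67, 69, 71, 72]   # C-major scale, MIDI notes
eb_major = [63, 65, 67, 68, 70, 72, 74, 75]   # Eb-major scale
print(c_major == eb_major)                              # False: names differ
print(as_intervals(c_major) == as_intervals(eb_major))  # True: same shape

halves   = [1000, 1000, 1000, 1000]   # durations in ms
quarters = [500, 500, 500, 500]
print(halves == quarters)                        # False
print(as_ratios(halves) == as_ratios(quarters))  # True: all ratios are 1:1
# The transposition (+3 semitones) and the tempo relation (x2) remain
# available as separate analysis data: eb_major[0] - c_major[0], etc.
```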
For ways to describe these musical structures, we will look again to
comparisons in language. Crucial to the understanding of a language is the
knowledge of the grammar, which must be based on mathematical formalism to
correctly assess the function of each element of a sentence (Chomsky, 1957).
Miranda uses an example of the sentence “A musician composes the music.” To
put this sentence in mathematical terms, the knowledge will be represented in
variables:
        Phrase1    Phrase2
note1   C          Eb
note2   D          F
note3   E          G
note4   F          Ab
note5   G          Bb
note6   A          C
note7   B          D
note8   C          Eb

Fig. 1

        Phrase1    Phrase2
note1   -          -
note2   2          2
note3   2          2
note4   1          1
note5   2          2
note6   2          2
note7   2          2
note8   1          1

Fig. 2
S = NS + VS (Sentence = Noun Sentence + Verb Sentence)
A musician + composes the music
NS = A + N (Noun Sentence = Article + Noun)
A + musician
VS = V + NS (Verb Sentence = Verb + Noun Sentence)
composes + the music
Describing the sentence with variables allows for substitutions from a set:
A = {the, a, an}
N = {dog, computer, music, musician, coffee}
V = {composes, makes, hears}
So the formula S = NS + VS could yield the sentence “The dog hears a computer”,
but it could also produce “The coffee makes a dog”. These mathematical
formalisms help to describe the rules of the language, but don’t prevent these
sorts of nonsense errors. For that, a certain amount of semantic rules or context
must also be supplied to the system, which can be explored through the use of
Artificial Neural Networks (ANN).
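Miranda’s rewrite rules and substitution sets can be sketched as a tiny generator in Python. It produces both the sensible and the nonsense sentences, which is exactly the point: the grammar alone cannot rule the nonsense out:

```python
import itertools

# Substitution sets from the text:
A = ["the", "a", "an"]                                    # articles
N = ["dog", "computer", "music", "musician", "coffee"]    # nouns
V = ["composes", "makes", "hears"]                        # verbs

def sentences():
    """S = NS + VS, NS = A + N, VS = V + NS: every string the grammar yields."""
    for a1, n1, v, a2, n2 in itertools.product(A, N, V, A, N):
        yield f"{a1} {n1} {v} {a2} {n2}"

all_sentences = list(sentences())
print("the dog hears a computer" in all_sentences)   # True
print("the coffee makes a dog" in all_sentences)     # True: well-formed nonsense
```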
ANNs, also called “connectionism” or “parallel distributed processing” (PDP),
are models based on biological neural networks, or broadly speaking, the way the
human brain operates. The important elements of an ANN are that the neurons,
or nodes, are independent and simultaneously operating; they are
interconnected, feeding information between each other; and they are able to
learn based on input data and adapt the weights of their interconnections
(Toiviainen, 2000). The basic model of an ANN consists of a number of input and
output nodes that are connected to each other at different weights. As each input
node receives information, it passes it to the others for more processing and
outputs a result. The weights of the connections determine how much influence
the data has, and these weights adjust themselves as the data is acquired and
reviewed. If the processed output corresponds to the expected output from the
training, the connection weight is strengthened, and conversely if it is not the
expected output then the weight is weakened.
ANNs can be trained through data sets to learn what result a certain input
should obtain. Using the example of the data set above, an ANN could learn
correct semantics by having correct sentences “read” to it. By training on this
data, for example, “The dog hears a computer”, “A musician composes the music”,
“A computer makes the music”, “A dog hears the coffee”, the network can adjust
the weights of the connections between words, learning that certain words are
more likely to follow others, while some will never follow others, as in “The
coffee composes an dog”. This principle can be applied similarly in music.
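The training idea can be sketched as a simple word-transition weighting scheme in Python. This is a toy stand-in for a real ANN, not an implementation of one, but it shows how connection weights strengthen with exposure to correct examples:

```python
from collections import defaultdict

def train(corpus):
    """Strengthen the weight of each word-to-word connection seen in training."""
    weights = defaultdict(float)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            weights[(prev, nxt)] += 1.0   # reinforce observed connections
    return weights

corpus = [
    "the dog hears a computer",
    "a musician composes the music",
    "a computer makes the music",
    "a dog hears the coffee",
]
w = train(corpus)
print(w[("dog", "hears")])           # 2.0: seen twice, connection strengthened
print(w[("coffee", "composes")])     # 0.0: never seen, so never reinforced
```

A real network would also weaken connections on incorrect output rather than simply leaving them at zero, but the reinforcement direction is the same.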
Cypher uses a neural network in chord identification to determine “the
central pitch of a local harmonic area” (Rowe, 1993). To broadly summarize its
operations, it uses twelve input nodes, each corresponding to one pitch class
regardless of octave, which activate when their pitch is played. Each input node
then sends a message to the six different chord theories of which it could be a
part (based on triad formations). For example, if a C is played, it sends a “+”
message to the chord theories of C major, c minor, F major, f minor, Ab major,
and a minor. It also sends a “-” message to all the other chord theories. Doing this
with every note received, Cypher begins to determine what the harmonic area is
based on the most prevalent chords. This information is then fed into another
network to determine the key. The key theories most affected are those that
could be the tonic, dominant, or subdominant of the arriving chord. So, a C major
chord would send a “+” message to the key theories of C major, F major, f minor,
and G major, and a “-” message to the rest.
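The voting scheme can be illustrated with a small Python sketch. This is a loose reimplementation of the idea as summarized above, not Rowe’s actual code, and the chord-naming convention (uppercase major, lowercase minor) is only illustrative:

```python
# Each arriving pitch class votes "+" for every triad that contains it
# and "-" for the rest; the most-supported theory wins.

NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

def triads():
    """All 24 major/minor triad theories as pitch-class sets."""
    chords = {}
    for root in range(12):
        chords[NOTE_NAMES[root]]         = {root, (root + 4) % 12, (root + 7) % 12}
        chords[NOTE_NAMES[root].lower()] = {root, (root + 3) % 12, (root + 7) % 12}
    return chords

def vote(scores, pitch_class, chords):
    for name, members in chords.items():
        scores[name] += 1 if pitch_class in members else -1

chords = triads()
scores = {name: 0 for name in chords}
for pc in [0, 4, 7, 0]:          # C E G C arriving from the performer
    vote(scores, pc, chords)
best = max(scores, key=scores.get)
print(best)                       # "C": C major is the most supported theory
```

The same voting structure can then feed a second layer in which chords support key theories, as described for Cypher’s key network.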
As the computer continues to learn through observations of the musical
environment, the data can be stored into a database for retrieval. As new
information comes in, the system can analyze and reference it to the database,
making decisions based on the previous material. In this way, learning occurs
initially through Michalski’s definition, and then by Russell and Norvig’s. The
potential of what information the system extracts from its analysis is huge.
Anything that can be represented in a language understood by the computer is
possible, and the task then lies within the creativity of the system designer. In
addition to the note and rhythm examples already given, patterns could be found
in dynamics and volume, density of sound, speed, register, timbre, etc.
4. Architecture
Rowe’s Cypher consists of “two main components, the listener and the
player. The listener (or analysis section) characterizes performances
represented by streams of MIDI data. The player (or composition section)
generates and plays music material” (Rowe, 1993). Most important with regard
to an improvisation system is that Cypher listens and generates music in
real-time, without triggering previously recorded or sequenced material, and
without following a timeline-based score as a reference.
4a. Classification Paradigms
Rowe makes a distinction in the classification of interactive systems,
separating the paradigms between Score-driven and Performance-driven systems.
Score-driven systems:
“Use predetermined event collections, or stored musical fragments, to match against music arriving at the input. They are likely to organize events using the traditional categories of beat, meter, and tempo. Such categories allow the composer to preserve and employ familiar ways of thinking about temporal flow, such as specifying some events to occur on the downbeat of the next measure or at the end of every fourth bar.”
As compared to Performance-driven systems, which:
“Do not anticipate the realization of any particular score. In other words, they do not have a stored representation of the music they expect to find at the input. Further, performance-driven programs tend not to employ traditional metric categories but often use more general parameters, involving perceptual measures such as density and regularity, to describe the temporal behavior of music coming in” (Rowe, 1993).
The importance in making this distinction is in how the software handles the
incoming data regarding the live performer, and what techniques must be used
to respond. A score-driven system uses just that: a score, or some representation
of a score, programmed into the software for it to follow and to which the
incoming signal is matched. Just as a conductor will follow notes and rhythms as
indications as to where the players are, a score-based system is programmed to
also identify certain moments or characteristics to know where the player is,
such as pitches, intervals, rhythms, and phrases. A score-driven system can also
lead the performance, functioning from a clock and reacting at certain moments
according to the elapsed time since the beginning of the piece (or section, or
other defined onset). As these event markers are found,
the score-based system is programmed to perform a function associated with
certain events. For example, play x chord when the performer arrives at y note,
or add delay to this phrase, or harmonize this section, etc.
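The event-marker idea can be sketched as a toy score follower in Python. The marker pitches and action names here are hypothetical, purely for illustration:

```python
# A minimal score-driven sketch: the "score" is a list of marker pitches,
# each bound to an action to trigger when the performer reaches it.

score = [
    (64, "add delay"),         # when the performer arrives at E4...
    (67, "play chord"),        # ...then G4...
    (72, "harmonize section"), # ...then C5
]

def follow(incoming_pitches, score):
    """Advance through the score as each expected marker pitch arrives."""
    actions, position = [], 0
    for pitch in incoming_pitches:
        if position < len(score) and pitch == score[position][0]:
            actions.append(score[position][1])
            position += 1
    return actions

performed = [60, 62, 64, 65, 67, 69, 72]
print(follow(performed, score))
# ['add delay', 'play chord', 'harmonize section']
```

Real score followers must of course tolerate missed and wrong notes; this sketch only shows the marker-to-action binding.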
In contrast, the performance-driven system does not follow a score or have
any information about the specific performance pre-programmed. It does not
know, for example, that in measure 54 there will be a cadence leading to a key
change. These systems react based on other information they receive, the
specifics of which will be discussed later. Because performance-driven systems are not
dependent on prior knowledge of the upcoming music, these systems are clearly
better suited for an improvisational setting.
George Lewis, a jazz trombonist, began building and performing with his
interactive system, Voyager, in the late seventies. He says of it:
“The computer was regarded as ‘just another musician in the band.’ Hours were spent in the tweaking stage, listening to and adjusting the real-time output of the computer, searching for a range of behavior that was compatible with human musicians. By compatible, I mean that music transmits information about its source. An improviser
(anyone, really) takes the presence or absence of certain sonic activities as a guide to what is going on.
When I speak of musical ‘interaction’, I mean that the interaction takes place in the manner of two improvisers that have their own ‘personalities.’ The program’s extraction of important features from my activity is not reintroduced directly, but used to condition and guide a separate process of real-time algorithmic composition.
The performer interacts with the audible results of this process, just as the program interacts with the audible results of what I am thinking about musically; neither party to the communication has final authority to force a certain outcome; no one is ‘in charge.’ I communicate with such programs only by means of my own musical behavior” (Lewis, 1994).
This approach is a guideline on which the development of my interactive
system is based. The improviser and computer are independent of each other
with their own voice and musical personality. They are not directly controlling,
but rather interacting with and influencing each other, the same way in which a
human duo improvisation would occur. This exemplifies another paradigm, that
of Instrument vs. Player. In an instrumental system, the effect of the computer is
that of adding to and enhancing the input signal with the intention of being an
extension of it, much like many guitar effects pedals. The result is as though the
combined elements are one player and the music would be heard as a solo. In the
instrumental paradigm, the performer is controlling the direction of the
electronics. A player system could also behave like an instrumental system at
times, but the intention is to construct an artificial player with its own musical
presence, personality, and behavior. The degree to which it follows the input
signal varies, and in an improvisational setting neither performer nor computer
is controlling, but rather influencing each other. In this way, the result is more
like a duet (Rowe, 1993; Winkler, 1998). Voyager is an example of the Player
paradigm, which is the goal of the interactive music system described here.
Rowe identifies three stages of an interactive system’s processing chain:
sensing, where the input data is collected; processing, where the computer
interprets the information it has sensed and makes decisions based on it; and
response, where the system produces its own output (Rowe, 1993). From this
point these stages will be referred to respectively as the Listener, Analyzer, and
Composer components.
The elements of the interactive music system described here have been
designed for a monophonic wind instrument, specifically clarinet and bass
clarinet. With that in mind, there are certain characteristics that have developed
as a response to the particular needs of this instrument, as well as some that
have been neglected, such as addressing the possibilities offered by a polyphonic
instrument. There are some basic technical requirements that won’t be
discussed in much detail, but which should be stated.
First is a computer with the software Max/MSP from the company Cycling
’74,2 with which the patch will be written. A patch is the name for a program
written within Max/MSP. This is one of the most widely used applications for creating
live electronic music. One of the beneficial features is the ability to create
modular components. That is, an element designed to perform a certain task or
function can be created on its own as a separate patch and incorporated into
2 Max/MSP is commercially available from www.cycling74.com. A free application developed by Miller Puckette, the author of Max/MSP, is Pure Data (PD), available from www.puredata.info. PD functions very similarly to Max/MSP, but not without some differences. Most notable of these is the availability of third-party objects, some of which will be discussed here.
larger patches as a subpatch. Not only does this ease troubleshooting, by being
able to verify that individual modules work on their own, but it also encourages
sharing within the community of users. It is very common practice for small
objects, abstractions, or patches that one has created to be made available for
others to use in their own works. It can greatly reduce time consumption if an
object or patch already exists that will perform the task one needs it to, without
having to program it entirely oneself. Patches are also adaptable, so that if the
originally conceived function doesn’t operate in the exact way needed for a
different project, small modifications can be made to incorporate it correctly.
The modularity also enables one’s own work to be reused in future projects.
The second requirement is a soundcard capable of accepting two
microphone inputs, and the third is two microphones: a standard dynamic or
condenser mic, and a contact mic.
Fig. 3 shows an input chain utilizing the two microphones. MIC 1 is the
standard microphone for capturing the sound of the instrument and MIC 2 is the
contact microphone. A contact microphone is a special piezo that reacts to
vibrations rather than sound waves. The contact MIC 2 in Fig. 3 acts as a gate for
the signal from MIC 1. A threshold is set for MIC 2, as seen in the subpatch p vca
in Fig. 4, whereby any signal below the threshold closes the gate and no signal
from MIC 1 will pass. By placing the contact microphone on the instrument, it
will open the gate when the vibrations of the instrument exceed the threshold, as
when playing, and allow the signal from the standard MIC 1 to pass. Using this
method helps prevent extraneous room noise from passing through the
microphone, and can also be used to capture data more accurately.
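The gating idea of Fig. 3 can be sketched in Python. The real p vca subpatch operates on live audio in Max/MSP, so the per-sample threshold test below is only a conceptual stand-in for envelope-based gating, with an arbitrary threshold value:

```python
# Conceptual sketch of the two-microphone gate: MIC 1 is the sound signal,
# MIC 2 the contact mic on the instrument body.

def gate(mic1_block, mic2_block, threshold=0.05):
    """Pass MIC 1 samples only while the contact-mic (MIC 2) level exceeds
    the threshold; otherwise output silence."""
    return [m1 if abs(m2) > threshold else 0.0
            for m1, m2 in zip(mic1_block, mic2_block)]

room_noise = gate([0.2, 0.3, 0.1], [0.01, 0.02, 0.01])  # instrument at rest
playing    = gate([0.2, 0.3, 0.1], [0.5, 0.6, 0.4])     # instrument vibrating
print(room_noise)   # [0.0, 0.0, 0.0] -- gate closed, room noise rejected
print(playing)      # [0.2, 0.3, 0.1] -- gate open, signal passes
```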
Fig. 3- Input Chain
Fig. 4- p vca subpatch, developed by Jos Zwaanenburg3
3 Jos Zwaanenburg: http://web.mac.com/cmtnwt/iWeb/CMTNWT/Teachers/0D06AA24-D6CF-11DA-9F63-000A95C1C7A6.html
4b. Listener
The Listener is the stage of the system that collects the data from the input
signal, and it is here that the decision must be made of what the relevant data to
be collected is. Cypher uses the information from pitch, velocity, duration, and
onset time, represented in MIDI format. From this it makes other analytical
classifications like register, speed (horizontal density), single notes versus
chords (vertical density), and loudness. One of the major limitations of Cypher, as
it was written in the late eighties/early nineties, is the representation of data
only as MIDI. The MIDI protocol strips away other important elements such as
timbre, which can also supply information about the overtone partials in a pitch,
and noisiness and brightness of a sound. MIDI also principally limits the pitches
to the well-‐tempered scale, although extra Continuous Controller information
can be added to introduce pitch bends. Additionally, it doesn’t make use of the
live audio signal and therefore the Composer stage can only create pitch-based
music from digital synthesis and not from transformation of the original sound,
more of which will be discussed later.
Technology has advanced since the development of Cypher, and computers
today are much faster and hardware more sophisticated and capable of handling
DSP (Digital Signal Processing). DSP allows the analysis of an audio signal so that
timbral information can be included, as well as the representation of the true
pitch in hertz. Since DSP uses the live audio signal, it is also possible to affect it
in the Composer stage, adding transformational effects like delay, transposition
and harmonization, ring modulation, distortion, etc.
Using some Max/MSP objects such as analyzer~ created by Tristan Jehan4,
data can be extracted such as pitch, loudness, brightness, noisiness, Bark scale,
attack, and sinusoidal peaks of the partials. Pitch is represented in both hertz and
a decimalized MIDI note, which allows for either tempered or untempered use of
the data. For example, MIDI note 60.25 is equal to a C that is 25 cents sharp. Two
approaches to the use of the data can be taken, either noting the exact tuning of
the pitch, or the tempered note regardless of tuning discrepancies, depending on
the intended use. The loudness value measures the input signal volume in
decibels. Brightness is a timbral measure of the spectral centroid, or the
perceived brightness of the sound, whereas noisiness is a timbral measure of
spectral flatness, on a scale of 0 to 1: 0 is more “peaky,” like a pure sine
wave, whose energy is concentrated in a small number of peaks in the signal
spectrum, whereas 1 is more “noisy,” like white noise, where peaks at all
frequencies are of the same power and create a flat spectrum. The Bark scale
measures the loudness of certain frequency bands that are associated with
hearing (Zwicker and Fastl, 1990). An attack is reported whenever the loudness
increases by a specified amount within a specified time, and the sinusoidal peaks
of the partials report the frequencies and amplitudes of a specified number of
overtone partials in the signal.
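The decimal-MIDI representation mentioned above follows from the standard conversion between frequency and MIDI note number, assuming the common tuning reference A4 = 440 Hz = MIDI 69:

```python
import math

def hz_to_midi(hz):
    """Decimal MIDI note from frequency (A4 = 440 Hz = MIDI 69)."""
    return 69 + 12 * math.log2(hz / 440.0)

def midi_to_hz(note):
    return 440.0 * 2 ** ((note - 69) / 12)

note = hz_to_midi(265.42)          # a slightly sharp middle C
print(round(note, 2))              # 60.25: C, about 25 cents sharp
tempered = round(note)             # 60: nearest well-tempered pitch
cents = round((note - tempered) * 100)
print(tempered, cents)             # 60 25
```

Either the decimal value (exact tuning) or the rounded value (tempered pitch) can then be stored, depending on the intended use described above.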
Another object similar to analyzer~ is sigmund~, created by Miller
Puckette5. It provides some of the same data, although some of it is formatted or
functions differently. Pitch is available as a continuously outputted decimal MIDI
4 Tristan Jehan: http://web.media.mit.edu/~tristan/maxmsp.html
5 Miller Puckette: http://crca.ucsd.edu/~msp/software.html
note, but not as hertz; sigmund~ also has a parameter, notes, which outputs the
pitch at the beginning attack of a note rather than continuously. This can be
useful when dealing with an unstable pitch such as from a wind instrument,
which is making constant minute fluctuations, and the desired data is that of the
principal pitch. Loudness is reported, but as linear amplitude rather than as
decibels. Sinusoidal components are also available, but organized differently.
Sigmund~ outputs the sinusoids in order of amplitude, whereas analyzer~ does
so in order of frequency. This difference can affect which frequencies are
reported, depending on how many sinusoids are asked for. For example, if three
peaks are requested from each object, analyzer~ will output the lowest three
partials, but sigmund~ will output the three partials with the highest amplitude.
The choice of which to use again lies in how the data will be used. Sigmund~ does
not provide data for brightness, noisiness, attack, or Bark scale.
In addition to the inherent data available from analyzer~ and sigmund~, the
duration of a note can be calculated by measuring the time between the onset of
a note and when either the pitch changes or the volume drops to 0. Fig. 5
demonstrates receiving the data from midivelocity and, upon receipt of a
non-zero value, starting the timer. Midivelocity sends a zero at the end of every note and is
described in more detail in the discussion of the Analyzer component. When the
timer receives this zero message, it stops, and the time between start and
stop gives the duration of the note in milliseconds.
Fig. 5- Note Duration
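The logic of Fig. 5 can be modeled in a few lines of Python; the class and method names here are illustrative, not part of the patch:

```python
# Model of the note-duration timer: start on a nonzero velocity, stop on
# the zero that midivelocity sends at the end of every note.

class NoteDurationTimer:
    def __init__(self):
        self.onset = None

    def velocity(self, value, now_ms):
        """Feed velocity messages with their arrival time; returns the note
        duration in ms when a note ends, else None."""
        if value > 0 and self.onset is None:
            self.onset = now_ms          # note begins: start the timer
            return None
        if value == 0 and self.onset is not None:
            duration = now_ms - self.onset
            self.onset = None            # note ends: report and reset
            return duration
        return None

t = NoteDurationTimer()
t.velocity(80, 1000)          # onset at t = 1000 ms
print(t.velocity(0, 1450))    # 450 -- the note lasted 450 ms
```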
A common problem of computer electronics is that of pitch detection in
real-‐time. It is difficult for the computer to correctly analyze analog pitch,
especially at fast tempi. With MIDI controllers such as keyboards, EWIs
(Electronic Wind Instruments), or electronic percussion the MIDI information
can be transferred immediately and note names can be understood based on
which key or combination of keys is pressed. With an analog signal, the computer
must first try to interpret the pitch to determine what note it hears, which
creates latency. In a fast passage it is likely that the computer will miss or
misinterpret some notes. In relation to a “live” human duo improvisation, one
player will surely not be able to recreate every single note that the other has
played, but will understand the overall shape and idea. Young also recognizes the
need for a broader analysis as it pertains to freely improvised music (Young,
2008). Since the genre is not reliant on precise harmonic relationships and
rhythms, it is sometimes better to not focus on capturing every individual note,
but instead to focus on phrases.
Max/MSP allows for recording into a buffer~, a “storage space” for the
audio signal. Other objects can call upon the recording in the buffer for playback
and manipulations to the signal can be made. Buffers can be of different lengths,
but an initial choice must be made as to what that size will be. When the buffer
has been filled, it continues recording back at the beginning, overwriting the
previous contents. Making the size too small could potentially mean that
previously played and relevant material is no longer accessible, so it is better to
err on the large side. There is an upper limit, however, based on factors such as
the computer’s available memory. Fig. 6 shows a buffer of ten minutes called
improv1. When the Record to Buffer toggle is on, the signal is recorded, as shown
by the waveform, and the clocker object is started. The time from clocker
correlates to the current recording position in the buffer, buffertime, and this
data can be used to reference specific points of the recording. If the buffer
reaches the end and restarts at the beginning, clocker is reset as well.
Fig. 6- Recording Buffer
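The wrap-around behavior of buffer~ and clocker described above can be sketched outside Max/MSP. The following Python fragment is an illustration only (the class and method names are mine, not Max objects): a fixed-length buffer whose record head wraps around, and whose current position corresponds to buffertime.

```python
# Illustrative sketch of a wrap-around recording buffer (not a Max object).
class RecordingBuffer:
    def __init__(self, length_ms, samplerate=44100):
        self.samples = [0.0] * int(length_ms / 1000 * samplerate)
        self.samplerate = samplerate
        self.write_pos = 0       # current sample index, like clocker's readout
        self.wrapped = False     # True once old material starts being overwritten

    def record(self, block):
        for s in block:
            self.samples[self.write_pos] = s
            self.write_pos += 1
            if self.write_pos == len(self.samples):  # buffer full:
                self.write_pos = 0                   # restart at the beginning
                self.wrapped = True

    def buffertime_ms(self):
        # position of the record head, comparable to the clocker time
        return self.write_pos / self.samplerate * 1000
```

Once `wrapped` is True, material older than the buffer length has been overwritten, which is precisely why the global timestamp discussed below is also needed.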
A global time component can also be used, measuring the overall time from
the start of the performance. Fig. 7 demonstrates a simple way of achieving this.
The timer receives a bang from inlet1 to start counting. Inlet1 would be
connected to the Global Start, which could be the opening of the patch, or
another start button used to begin the patch for performance. Inlet2 receives a
bang at the beginning of each event, which causes timer to output the current
time in milliseconds. This timestamp can be used in the data collection as a way
to identify each event.
Fig. 7- Global Time
Rhythm is of course another important element of music that should be
discussed. Previous systems have devised methods of interpreting rhythms and
tempi. Rowe, Winkler, and Cope each discuss techniques to gather this
information in their books, to which I refer the interested reader. In the context
of free improvisation, however, the necessity for this exact information is less
important because the style is free from constraints of a unifying tempo and
meter. More important aspects are the general amount of activity within a period
of time (horizontal density), the time elapsed between events (delta time), and
the length of events (duration).
4c. Analyzer
From the Listener component the data needs to be sent for interpretation
in the Analyzer. In addition to analysis, this section will also create the database
for storage and retrieval. There is a multitude of ways to analyze the data
depending on what parameters are needed or desired for the Composer Section.
Fig. 8 shows a patch that analyzes for pitch, pitch class, interval, register, lowest
pitch, highest pitch, number of note occurrences, loudness, note duration, delta
time, and horizontal density, as well as the timbral characteristics brightness and
noisiness. Data for the beginning and ending of phrases, the globaltime, and
buffertime are also recorded. The characteristic descriptors are sent to individual
databases, a global (master) database, and a phrase database. As each new
phrase is completed, it is compared against the previous phrases to determine
which is the closest match.
There are four elements used for organizational purposes: an index, a phrase
number, and globaltime and buffertime stamps. The index is the counter in
the upper-left corner of Fig. 8, counting every single event as it occurs, received
from the object r midinote, which is sending from analyzer~ in another patch. To
the right is the phrasemarker subpatch shown in Fig. 9. Globaltime begins
counting at the start of the performance, activated here when the Record to
Buffer toggle from Fig. 6 is clicked, and does not stop for the entire duration of
the performance. Buffertime is similar, but is meant to keep a record of the
onset times of events happening in relation to the current position in the buffer.
The time will be the same as globaltime until the buffer is filled and starts over,
also resetting buffertime. The reason for tracking both times is precisely because
of this possibility. If, for example, the performance has exceeded the buffer length,
causing it to start over, but data from the previous cycle of the buffer needs to be
used, it can be referenced using the globaltime, as using buffertime could relate to
new data in the buffer. However, only referencing from globaltime will not be
effective if the necessity is to playback current material from the buffer. In this
case the position in the buffer from buffertime is needed.
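The relationship between the two timestamps can be sketched as follows. This Python fragment is an illustration of the logic, not part of the Max patch; it assumes a buffer that wraps every `buffer_len_ms` milliseconds, and the function name is mine.

```python
def buffer_position(event_globaltime, now_globaltime, buffer_len_ms):
    """Return the buffer offset (ms) that still holds the event's audio,
    or None if the buffer has since wrapped past it and overwritten it."""
    age = now_globaltime - event_globaltime
    if age >= buffer_len_ms:
        return None                              # audio overwritten by a newer cycle
    return event_globaltime % buffer_len_ms      # equals the stored buffertime stamp
```

Under this sketch, globaltime is the stable identifier for data lookup, while the returned offset is what a playback object would actually need.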
Fig. 8- Analyzer Component
The designer can independently determine what might constitute a phrase.
Rowe uses discontinuities in characteristics as an indication, with different
characteristics applying different weights in the determination of phrase
boundaries. He gives the example that discontinuities in timing are weighted
more heavily than those in dynamics; meaning changes of dynamics are less
likely to signal a phrase boundary than changes in the timing. When the amount
of change of the different features exceeds a threshold, a phrase is marked. He
also notes that, by the nature of this phrase finding, the discontinuities cannot be
found until they’ve already occurred (Rowe, 1993).
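Rowe's weighted-discontinuity idea can be sketched as follows. The features, weights, and threshold here are illustrative assumptions (not Rowe's actual values); the only constraint taken from the text is that timing changes weigh more heavily than dynamic changes.

```python
# Illustrative weights: timing discontinuities count more than dynamics.
WEIGHTS = {"delta_time": 3.0, "pitch": 1.5, "loudness": 0.5}

def is_phrase_boundary(prev_event, event, threshold=100.0):
    """Flag a boundary when the weighted sum of feature changes
    between consecutive events exceeds the threshold."""
    score = sum(WEIGHTS[f] * abs(event[f] - prev_event[f]) for f in WEIGHTS)
    return score > threshold
```

As Rowe notes, a boundary detected this way is always found after the fact: the discontinuity must already have occurred before it can be measured.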
Saxophonist and programmer Ben Carey uses silence as an indication of
phrase separation in his interactive system _derivations (Carey, 2011). When the
audio signal volume drops to 0, or another determined threshold, for a user-defined
length of time, a phrase marker can be introduced. Fig. 9 demonstrates a
method of achieving this in Max/MSP. The patch receives the loudness signal
named envelope. When the signal level drops to 0, it starts the clocker. If the
elapsed time reaches the threshold of 500 milliseconds, a bang is sent. This bang
indicates that a phrase has been finished, but it is also useful to know when
the next phrase begins. To indicate this, the bang is stored in onebang until a
non-zero value allows it to output, indicating the beginning of a new phrase. The
non-zero value also stops clocker, which then waits for another silence to begin
counting again.
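The logic of Fig. 9 can be sketched in Python as an offline illustration of the clocker/onebang behavior, using the 500 ms threshold from the text. The function and parameter names are mine, not Carey's.

```python
def mark_phrases(envelope, times_ms, silence_level=0.0, gap_ms=500):
    """envelope/times_ms: parallel lists of level readings and their times.
    Returns (phrase_end_time, next_phrase_start_time) pairs."""
    markers = []
    silence_began = None
    ended = False
    for t, level in zip(times_ms, envelope):
        if level <= silence_level:
            if silence_began is None:
                silence_began = t              # the clocker starts here
            if not ended and t - silence_began >= gap_ms:
                end_time = silence_began       # phrase ended when silence began
                ended = True                   # the "bang" is now held
        else:
            if ended:
                markers.append((end_time, t))  # non-zero releases it: new phrase
            silence_began = None
            ended = False
    return markers
```

Marking the phrase end at the onset of silence, rather than 500 ms later, keeps the stored phrase boundaries aligned with the audio in the buffer.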
Fig. 9- Phrase Marker
The note-related material is next to the right in Fig. 8, starting with those
concerning pitch. The first record is the actual pitch in MIDI note-number format.
Note 57, as shown in Fig. 8, corresponds to the pitch A3. The pitch class can then
be calculated, resulting in the pitch without regard to octave. It is shown in Fig. 8
as A-2 simply because Max does not have the capability to display the
note name without an octave indication, and -2 is the lowest octave. This display
is only for the benefit of the user to easily see the pitch class; the information
to be recorded is in numeric values, in this case 9 for the note A (C=0, C#=1, etc.).
The interval is calculated by subtracting the previous note from the current,
resulting in the number of semitones between them, and register is calculated by
dividing the pitch by 12; integer division yields a whole-number
classification of register. The lowest and highest pitch are recorded twice, both
globally and on a phrase-by-phrase basis, and using a histo keeps a record of the
number of times a note is played.
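These per-note derivations can be sketched in Python (an illustration of the Max operations, assuming the middle C = MIDI 60 convention, under which note 57 is A3):

```python
def describe(note, previous_note):
    """Derive per-event pitch descriptors from MIDI note numbers."""
    return {
        "pitch": note,
        "pitchclass": note % 12,           # 9 = A (C=0, C#=1, ...)
        "interval": note - previous_note,  # semitones from the previous note
        "register": note // 12,            # integer division gives the register
    }
```

The values match the database excerpts shown later: note 55 following 56 yields pitch class 7, interval -1, and register 4.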
Loudness is received from analyzer~ in decibel format, whereas the
midivelocity is in MIDI format. MIDI keyboards send note-on messages when a
key is depressed, but also a note-off message with a velocity of 0 upon its release.
Midivelocity is calculated with a note-off function so that it operates in the same
manner. A note-off is sent either when the note changes, when the volume from
envelope drops below a threshold (40 in Fig. 10), or when the volume increases by a
specified percentage after a specified time. The drop below the threshold
compensates for the fact that the envelope won't fall to 0 immediately
after the player stops, and so more accurately marks the note-off time. The
percentage threshold measures the envelope level every 50 milliseconds and
divides it by the previous value. If the increase is above the set percentage, then a
note-off is reported. The principle is similar to the attack data sent by analyzer~;
however, in analyzer~ it is measured by an increase in decibels within a given
time. The method described in Fig. 10 was developed with wind instruments in
mind, accounts for small spikes during tonguing, and was found to be more
accurate in reporting attacks. It allows for the note-off message not only with
staccato, but also with legato tonguing. An appropriate threshold should be
personalized for each player and instrument, however.
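The note-off conditions of Fig. 10 can be sketched as follows. The floor of 40 follows the text's example; the 50% rise figure and the function name are illustrative assumptions, and, as noted above, such thresholds should be tuned per player and instrument.

```python
def note_off(prev_note, note, prev_level, level, floor=40, rise_pct=50):
    """Decide whether a note-off should be reported for the previous note."""
    if note != prev_note:
        return True                       # the pitch changed
    if level < floor:
        return True                       # envelope dropped below the floor
    if prev_level > 0 and (level / prev_level - 1) * 100 >= rise_pct:
        return True                       # sudden rise: re-attack on the same pitch
    return False
```

The third condition is what catches legato tonguing: the envelope never falls to silence, but the articulation spike between repeated pitches still registers as a boundary.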
Fig. 10- Midi Velocity with Note-off
The velocity values with the note-off messages help to determine note
duration, as discussed earlier with Fig. 5. The delta time between the end of one
event and the beginning of the next can be calculated similarly with a timer. The
horizontal density is a measure of the number of notes that occur in a space of
time. Fig. 11 demonstrates calculating this by counting the number of notes in a
phrase and dividing the sum by the length of the phrase in milliseconds. The
multiplication by 1000 (yielding notes per second) and rounding to an integer
merely produce a more comparable number to assign to the phrase for classification.
Fig. 11- Horizontal Density
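The Fig. 11 calculation amounts to a one-liner (a Python restatement, not the patch itself): notes per phrase divided by phrase length in milliseconds, scaled by 1000 to notes per second, and rounded to an integer for easier phrase-to-phrase comparison.

```python
def horizontal_density(note_count, phrase_length_ms):
    """Notes per second for a phrase, rounded for classification."""
    return round(note_count / phrase_length_ms * 1000)
```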
The individual databases collect the information from every event for each
descriptor separately. They are kept in a coll database stamped with the indexing
number and the phrase to which they belong. The data in Fig. 12 shows an
example from the pitch database. The first numbers of each line, 10-‐20, indicate
the indexing number, the second indicates the phrase number, and the final
number is the pitch expressed as a MIDI note. Individual databases are kept for
pitch, pitch class, interval, register, loudness, duration, and deltatime. Highest and
lowest pitch, number of note occurrences, and horizontal density are already
statistical data, based on a broader spectrum, so they do not have their own coll.
Brightness and noisiness are likewise excluded from the individual databases
because their data flows continuously, rather than on a per-event basis, so they
are recorded in a different manner that will be described later.
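The layout of an individual coll can be modelled as a dictionary keyed by the event index, mirroring the layout in Fig. 12. This is an illustration of the data shape only, not Max's actual storage.

```python
# Each individual database maps event index -> (phrase number, value).
pitch_coll = {}

def store(coll, index, phrase, value):
    coll[index] = (phrase, value)

store(pitch_coll, 10, 2, 56)   # corresponds to "10, 2 56;" in Fig. 12
store(pitch_coll, 13, 3, 57)
```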
The master coll keeps all the individual data as well as timestamps from
globaltime and buffertime, organized by the index. The data in Fig. 13 reads index,
phrase, globaltime, buffertime, pitch, pitchclass, interval, register, loudness, note
duration, and deltatime.
One can see that some of the data doesn’t make sense, such as the duration
values for index 10. Fig. 13 shows a note duration of 0 and delta time of 0, yet a
difference of 519 between the start times of indices 10 and 11. There are a
couple of factors that can contribute to misleading data, one being complications
with the Listener component. Further adjustments need to be made in the input
chain by tweaking levels and thresholds to more accurately capture good data
and filter out mistakes.
A second contributing factor, although it does not appear to be the case in
this instance, is timing delay. Although data is flowing
extremely quickly in the computer, the patch still ultimately follows a series of
events, which can create slight inconsistencies. As the measurements are being
recorded in milliseconds, which are generally imperceptible, some amount of
leeway is acceptable.
A more holistic viewpoint was discussed earlier in the section about the
Listener component in regard to the nature of improvisation, an imperfect affair
anyway. While striving for accurate data is the goal, accepting the imperfections
can also bring a more “human” element. The comparison was made to a “live”
human duo setting, and the fact that one player will not obtain all the
information provided by the other, but will understand a more general idea of
the phrase. Rowe expresses that the point is not to “’reverse engineer’ human
listening but rather to capture enough musicianship.” With its phrase analysis,
the Analyzer can take this approach to interpreting what it hears as well. By
computing the averages of the characteristic descriptors for each phrase, a
generalized description can be rendered and assigned to each one.
The phrase coll is the largest database, keeping records of not only all the
characteristics held in the master coll, but also of the highest and lowest pitch,
horizontal density, brightness, noisiness, the global and buffer end timestamps,
and the phrase match and confidence level. For each of the descriptors, apart from
the timestamps and highest and lowest pitch, the means and standard deviations
are calculated for the phrase and stored in the phrase coll (Fig. 14), creating what
Thomas Ciufo calls a “perceptual identity” (Ciufo, 2005). At the end of each
phrase, these values are sent for comparison against the means and standard
deviations of all the previous phrases. The phrase with the most matches is
reported with a confidence level, the percentage of matches. This data is added to
the phrase coll as well as to its own separate matches coll to keep track of which
phrases matched to which descriptors for later retrieval.

10, 2 56;
11, 2 55;
12, 2 50;
13, 3 57;
14, 3 61;
15, 4 61;
16, 4 62;
17, 4 56;
18, 4 64;
19, 4 65;
20, 4 63;
Fig. 12- Pitch Coll Database

10, 2 17386 17386 56 8 -2 4 -28.907839 0 0;
11, 2 17905 17905 55 7 -1 4 -19.907631 228 0;
12, 2 18598 18598 50 2 -5 4 -27.446226 464 0;
13, 3 22499 22499 57 9 7 4 -19.360497 3436 3342;
14, 3 22826 22826 61 1 4 5 -24.470776 3436 3342;
15, 4 24033 24033 61 1 0 5 -34.994293 930 884;
16, 4 24359 24359 62 2 1 5 -31.124811 930 884;
17, 4 24729 24729 56 8 -6 4 -27.600847 930 884;
18, 4 25102 25102 64 4 8 5 -28.859121 696 0;
19, 4 25565 25565 65 5 1 5 -32.421593 271 0;
20, 4 25893 25893 63 3 -2 5 -31.064672 420 0;
Fig. 13- Master Coll Database
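The phrase-comparison step described above can be sketched as follows. Note that the matching criterion here, where a descriptor "matches" when the new phrase's mean falls within one standard deviation of the stored phrase's mean, is an assumption for illustration; the actual patch may use a different test. The best match is the stored phrase with the most matching descriptors, and the confidence level is the percentage of descriptors that matched.

```python
def best_match(new_phrase, stored_phrases):
    """Phrases are dicts of descriptor -> (mean, std).
    Returns (best phrase id, confidence as a percentage)."""
    best_id, best_count = None, -1
    for pid, stored in stored_phrases.items():
        count = sum(
            1 for d, (mean, std) in stored.items()
            if d in new_phrase and abs(new_phrase[d][0] - mean) <= std
        )
        if count > best_count:
            best_id, best_count = pid, count
    confidence = round(100 * best_count / len(new_phrase))
    return best_id, confidence
```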
Carey explores the concept of long-‐term memory with his _derivations. He
has incorporated the ability to save databases and load them into the system in
the future. This Rehearsal Database includes all the data that _derivations
gathered during a previous use of the system, as well as the saved recording
from the buffer. Loading previous databases allows the system to make use of
what it has learned before “with an already rich vocabulary of phrases and
spectral information” (Carey, 2011).
Fig. 14- Phrase Coll Database
Fig. 15- Phrase Matcher
The collection of information into the individual databases helps to create a
system that is learning based on Michalski’s definition, “constructing or
modifying representations of what is being experienced”. The incorporation of
the phrase-‐matching component is the starting point to also bring it in line with
Russell and Norvig’s definition, “behaving better as a result of experience”. The
arrival of information into the individual colls is akin to implicit learning, and
actively matching this against other memories exhibits explicit learning behavior.
The system has had, and has made notes of, previous experiences, and the
phrase-‐matching allows it to start comparing new experiences to the old ones
and make decisions based on what it has learned. For example, in Fig. 15 phrase
34 is best matched to phrase 19 with a confidence level of 25%. The Analyzer
could decide to use data from the matching parameters of phrases 34 and 19
(pitch, pitchclass, and brightness) to send to the Composer. Or, it could decide to
use the data from the non-‐matching parameters, or perhaps it decides to just use
data from brightness. Phrase matching could also use weighting to allow certain
descriptors to play a more dominant role in determining which phrases match.
Using the confidence level enables an additional level of matching, and the
Analyzer could choose to match data only with phrases that have a confidence
level at least as high. The means and standard deviations of the input signal
could also be calculated in real-‐time and analyzed in another instance of the
phrase matcher, calculating real-‐time matches to previous phrases
characteristics. The Analyzer could then determine, for instance, that the
performer is currently playing notes with short durations, and decide to
accompany by playing a phrase or phrase fragment from the buffer of
predominantly long notes. The possibilities are limited only by the
creativity and knowledge of the system developer.
The concern of bias from the developer was mentioned earlier, and it is
here and with the Composer component that it can be most evident. With the
Analyzer, the bias can result from the ways the system handles decision-‐making,
whereas with the Composer it could be from the sonic and musical aesthetic of
the developer, and what types of compositional techniques are used. Widmer
cautioned that the choice of representation language can introduce bias; here,
his warning applies to how the decision-making is programmed.
It is important to not create solely finite conditional statements (if x occurs,
then do y), as this leads to predictable behavior, not befitting an
improvisational system. A better condition would be: “if x occurs, then do y or z
or q or l or w, or…” etc., where each variable is an appropriate response to the x
condition. An example in a live improvisation is that Player 1 is improvising fast
notes, mainly in a lower register, but sometimes will play a long, high note.
Player 2 hears this high note as a unique musical idea that he wants to utilize,
and decides on possible options to do so, such as matching the long, high note; or
playing short, low notes; or harmonizing the note; or using it as a starting
note on which to base another phrase, etc. These decisions are all implicit
responses of Player 2 that will manifest themselves naturally during
improvisation. An even better condition would be to replace "if x occurs" with
"if x occurs a (randomly generated number) of times", and for each then
statement also to have variable factors, and then to have this entire
conditional if-then statement active only at some times.
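Such a variable conditional can be sketched as follows. The response names, the one-to-five trigger range, and the activation probability are all illustrative assumptions; the point is the structure: a randomized trigger count, intermittent activation, and a choice among several appropriate responses.

```python
import random

class ImprovRule:
    """'If x occurs a (random) number of times, sometimes do y or z or ...'"""
    def __init__(self, responses, active_prob=0.6):
        self.responses = responses
        self.active_prob = active_prob
        self.needed = random.randint(1, 5)   # "if x occurs n times"
        self.count = 0

    def on_event(self):
        self.count += 1
        if self.count < self.needed or random.random() > self.active_prob:
            return None                      # rule stays dormant this time
        self.count = 0
        self.needed = random.randint(1, 5)   # re-randomize the trigger count
        return random.choice(self.responses) # "then do y or z or q ..."
```

Each instance of such a rule becomes one tool in the toolbox described below: the system's responses remain appropriate, but never exactly repeatable.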
By making multiple instances of this type of condition available for different
actions, a toolbox is built up. The system will respond based on its
programmed knowledge, and therefore may react similarly to a previous time,
but never in the exact same way. It will be predictable in that its responses make
sense in the moment and sometimes will make the same decision as it had in
some previous instance, but unpredictable in what the output will be. This
exemplifies Levin’s quote previously stated in regard to improvisation:
“The fact of the matter is that you are who you have been in the process of being who you will be, and in nothing that you do will you suddenly, as an artist or a person, come out with something that you have never done before in any respect. There will be quite possibly
individual elements in a performance that are wildly and pathbreakingly different from anything that you’ve done before, but what about the rest and what kind of persona and consistency of an artist would you have if there was no way to connect these things…?” (Levin, 2007).
The system will have its own personality and sound, the same way that people
are able to hear Miles Davis, or John Coltrane, or any number of musicians, and
immediately know that it is them playing, even though they are not playing
anything exactly as they have played it before.
How the Analyzer makes the decisions of which action to take after making
an analysis, or of which if-then condition to activate, is tied also to the discussion
of improvisation. Discussed earlier was the fact that improvisers are aware of
larger, global-scale, explicit elements, but the fine details are just motoric,
implicit, responses. An interactive system can reconstruct this condition with the
use of constrained randomization.
John Cage experimented with randomness and indeterminacy in the forties
and fifties, using algorithmic and random procedures as compositional tools, to
select options or set musical parameters (Winkler, 1998). This is related to
improvisation in that the outcome is unknown until it happens. Algorithms are
not cognitive and thus cannot make creative decisions, but they can, however,
“produce non-arbitrary changes in state… manifest[ed] as a ‘decision’ when it
modifies the audio environment… [I]t has the affect of intention” (Young, 2008).
Young continues to say that the unpredictable output of both performer and
computer should not be achieved through “simple sonification of rules or sheer
randomness. There should be a critical engagement between intended
behaviours, an appraisal of potential behaviours and response to actual sonic
realisations and their unfolding history.” A certain amount of randomization
occurs during improvisation, but it is still within a context. The constraint is
what makes it still sound like music, as opposed to pure chaotic randomness. It is
very easy to generate completely random output within Max/MSP, but it is also
possible to use parameters to frame the randomization, as illustrated in the
several types of procedures in Fig. 16. Fig. 16c-i are part of a collection from
Karlheinz Essl.6 They provide useful expansions on randomization procedures.
Fig. 16a) generates a random integer between 0 and 9.
Fig. 16b) generates a random integer between 0 and 9, within 3 integers of the previous generation.
Fig. 16c) generates an integer between 0 and 9 where adjacent outputs are adjacent numbers.
Fig. 16d) generates an integer between 0 and 9, ensuring no immediate repetitions.
Fig. 16e) generates an integer between 0 and 9 with a 30% chance of repetition.
Fig. 16f) generates an integer between 0 and 9 without repeats until all numbers have been generated.
Fig. 16g) generates a floating-point decimal number between -10 and 9.99999.
Fig. 16h) uses the drunk object and will generate any float number up to 5 decimal places between -10 and 9.99999, using a Brownian linear scale.
Fig. 16i) generates an integer between 0 and 5 using a Markov chain, a table of transitional probability.
6 Karlheinz Essl: http://www.essl.at/
Fig. 16- Random procedures
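A few of these constrained procedures can be sketched in Python. These are illustrative stand-ins for the behaviors in Fig. 16, not ports of Essl's objects.

```python
import random

def random_walk(prev, lo=0, hi=9, step=3):        # cf. Fig. 16b
    """Random integer within `step` of the previous output, clamped to range."""
    return max(lo, min(hi, prev + random.randint(-step, step)))

def no_repeat(prev, lo=0, hi=9):                  # cf. Fig. 16d
    """Random integer with no immediate repetition."""
    n = random.randint(lo, hi)
    while n == prev:
        n = random.randint(lo, hi)
    return n

def series(lo=0, hi=9):                           # cf. Fig. 16f
    """All values once, in random order, before any repeats (a row)."""
    vals = list(range(lo, hi + 1))
    random.shuffle(vals)
    return vals

def markov_step(state, table):                    # cf. Fig. 16i
    """Next state drawn from a table of transitional probabilities."""
    choices, weights = zip(*table[state].items())
    return random.choices(choices, weights=weights)[0]
```

As noted below, feeding the Analyzer's data into the ranges of such generators (for instance, the current lowest and highest pitch) anchors the randomization to the musical performance.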
Some of the useful applications in music can already be seen, particularly
with Fig. 16c, which can generate stepwise motion, and Fig. 16f, which can
generate a twelve-‐tone row. All of the parameter settings, or arguments, given in
the descriptions of the figures represent those illustrated, but can all be changed.
The random generators are not limited to producing only numbers between 0
and 9. The arguments for each of these objects can be linked to the data collected
by the Analyzer to create randomizations that have a reference to the musical
performance. For example, the lowest pitch and highest pitch could be fed to the
between object in Fig. 16g to generate pitches within the same range.
Rowe uses another instance of an Analyzer in Cypher that listens to the
output of the Composer. He calls this the Critic. The decisions the Composer has
made about what music it will produce are sent to the Critic for analysis before
being sent to the sound generators, fitting Levelt's fourth process of speech
processing, self-monitoring and self-repair. This allows the system to make
modifications before actually creating the music. Rowe acknowledges that
“evaluating musical output can look like an arbitrary attempt to codify taste,”
and the capacity for the system to have “aesthetic decision making” skills is
“arbitrary”, and it needs “a set of rules [that] controls which changes will be
made to a block of music material exhibiting certain combinations of attributes”
(Rowe, 1993). This is again a viable source of bias. It could be argued that
including various rules helps to maintain musicality that a computer cannot
inherently have, but the counter-‐argument can easily be made as to how this
definition of musicality is written. It is again important that the reactions of the
Critic aren’t represented by strict rules, but the use of probability weights can
help maintain a learning paradigm. For example, if in one phrase the live
performer played loudly and the computer responded by playing quietly, the
Critic could increase the probability weight that the next time the performer
plays quietly, the computer will play loudly, as in a solo/comping exchange
situation. Representing this musical possibility as a strict rule would not be
conducive to improvisation, but incorporating it into the toolbox as one
possibility, with parameters that weigh how probable it is that the action is
appropriate, is.
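The probability-weight idea can be sketched as follows. The response names and the update amount are illustrative; the point is that the Critic nudges weights rather than enforcing a rule, so a successful exchange makes a response more likely, never mandatory.

```python
import random

# Candidate responses with adjustable probability weights.
weights = {"play_loudly": 1.0, "play_quietly": 1.0, "stay_silent": 1.0}

def reinforce(response, amount=0.25):
    """Critic judged this response effective: make it more likely next time."""
    weights[response] += amount

def choose_response():
    """Weighted random choice among the candidate responses."""
    options, w = zip(*weights.items())
    return random.choices(options, weights=w)[0]

reinforce("play_loudly")   # e.g. after a successful solo/comping exchange
```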
Another possible way to incorporate a critic is by analyzing the output of
the Composer with the response from the performer. In a duo improvisation,
each player is responding to each other, taking in what the other has played and
making musical comments, described by Hodson as “a self-‐altering process: the
musical materials improvised by each musician re-‐enter the system, potentially
serving as input to which the other performers may respond” (Hodson, 2007).
By analyzing how the live performer reacts to the computer, the system can
learn about its own composing as well, and what “works” or not. Decisions can
be made based on whether the performer is cooperating or trying to take the
music in a different direction. In this way, the critique is based on the
performance and interaction of the moment, rather than codified rules.
4d. Composer
“Improvisation defies clear definition. Even though most musicians have difficulty explaining what it is, many can tell you the basic way that they approach it. Unlike jazz, which often deals with the improvisatory rules in a kind of gamelike exchange of modes and melodies, electronic music often lacks the qualities of rhythm, harmony, and melody that many jazz musicians rely on. Instead, electronic music improvisation is sound: the shape of the envelope; timbre; rhythm; layers or filtering; effects (echo, delay, ring modulation, etc.); amplitude; and duration. A seasoned improviser learns how to listen to many layers of sound activity as part of a performance” (Holmes, 2002).
Thom Holmes’ quote gives important insight for the approach to
developing the Composer component of an electronic improvising system. Not
only is it applicable to electronic improvisation, but also to the genre of free
improvisation as a whole. Previous systems like Robert Rowe’s Cypher or George
Lewis’ Voyager created MIDI-based improvisations, which are focused on the
note and rhythm paradigm. With the DSP capabilities of today, the musical realm
for electronics is expanded exponentially. While pitch and rhythm are certainly
still appropriate musical considerations, the world of sound design, with the
ability to sculpt, manipulate, and synthesize, has become an equally viable option.
There are three types of compositional methods available to a computer:
sequencing, transformation, and generation (Rowe, 1993). Sequenced music is
predetermined in some way, traditionally as a MIDI sequence, but can also be
prerecorded audio that is triggered to play back. Algorithms that produce a fixed
response, such as those that do not use indeterminate variables, are also
considered sequenced. Transformation takes the original material and changes it
in some way to produce variations. This can range from obvious transformations,
like adding a trill to a note or passing the signal through effects like a ring
modulator, to more intricate variations like creating a retrograde inversion or
playing the signal backwards, to a complex re-synthesis of the entire sound
spectrum. Generative composition uses algorithms with very little source
material to produce music on its own. It could make use of information like a
scale set from which to choose pitches, but the lines produced are unique choices
from within the scale. Sound design techniques like additive or vector synthesis
are also generative composition. Within the context of improvisation,
transformative and generative composition are the most useful techniques and
will be the ones addressed here.
The options for the capabilities of the Composer are limitless. It is in the
development of this component, the building of the toolbox, that the designer’s
creativity can unleash. Some of the transformational techniques that Cypher is
capable of include:
Accelerator- shortens the durations between events.
Accenter- puts dynamic accents on some of the events in the event block.
Arpeggiator- unpacks chord events into collections of single-note events, where each of the new events contains one note from the original chord.
Backward- takes all the events in the incoming block and reverses their order.
Basser- plays the root of the leading chord identification theory, providing a simple bass line against the music being analyzed.
Chorder- will make a four-note chord from every event in the input block.
Decelerator- lengthens the duration between events.
Flattener- flattens out the rhythmic presentation of the input events, setting all offsets to 250ms and all durations to 200ms.
Glisser- adds short glissandi to the beginning of each event in the input block.
Gracer- appends a series of quick notes leading up to each event in the input block. Every event that comes in will have 3 new notes added before it.
Harmonizer- modifies the pitch content of the incoming event block to be consonant with the harmonic activity currently in the input.
Inverter- takes the events in the input block and moves them to pitches that are equidistant from some point of symmetry, on the opposite side of that point from where they started. All input events are inverted around the point of symmetry.
Looper- the loop module will repeat the events in the input block, taken as a whole.
Louder- adds crescendo to the events in the input block.
Obbligato- adds an obbligato line high in the pitch range to accompany harmonically whatever activity is happening below it.
Ornamenter- adds small, rapid figures encircling each event in the input block.
Phrase- temporally separates groups of events in the input block.
Quieter- adds decrescendo to the events in the input block.
Sawer- adds four pitches to each input event, in a kind of sawtooth pattern.
Solo- is the first step in the development of a fourth kind of algorithmic style, lying between the transformative and purely generative techniques.
Stretcher- affects the duration of events in the input block, stretching them beyond their original length.
Swinger- modifies the offset time of events in the input block. The state variable swing is multiplied with the offset of every other event; a value of swing equaling two will produce the 2:1 swing feel in originally equally spaced events.
Thinner- reduces the density of events in the input block.
TightenUp- aligns events in the input block with the beat boundary.
Transposer- changes the pitch level of all the events in the input block by some constant amount.
Tremolizer- adds three new events to each event in the input block. New events have a constant offset of 100ms, surrounding the pitch with either two new above and one below, or two new below and one above.
Triller- adds four new events to each event in the input block as a trill either above or below the original pitch. (Rowe, 1993)
These transformations are rather easy to accomplish within the MIDI domain,
but many can also be applied in DSP. Of Rowe’s transformational techniques, the
ones that are easily accomplished in direct relation to a phrase can be put into
three categories: time-domain, pitch-domain, and volume-domain. Those in the
time-domain include: accelerator, decelerator, looper, phrase, and stretcher;
pitch-domain include: chorder, harmonizer, inverter, and transposer; and volume-domain
are: louder and quieter. Backward is also an easy time transformation,
but functions differently than Rowe's. Rather than a retrograde as he describes, it
is possible to play backwards like spinning a vinyl LP record backwards. A
retrograde is also possible, but a more complicated task that will be discussed
later.
Time-stretching is possible using objects such as the supervp~ (Super Phase
Vocoder) collection7 and grainstretch~8, allowing for speeding up or slowing
down audio in the buffer without changing the pitch. These objects, as well as
native objects like groove~, can also be used for looping, phrase-making, and
backwards playback. Supervp~ and grainstretch~ are also capable of pitch-shifting
for harmonizing and transposition. Other Fast Fourier Transform (FFT)
objects like gizmo~ also perform pitch-shifting, and can be used for inversions.
This can be accomplished with the same process used to create a MIDI
inversion, shown in Fig. 17. This patch functions just as Rowe describes, inverted
around middle C, or MIDI note 60. In this example a G (MIDI note 79) is played,
⁷ SuperVP is available from IRCAM: http://anasynth.ircam.fr/home/english/software/supervp
⁸ Grainstretch~ was written by Timo Rozendal: http://www.timorozendal.nl/?p=456
nineteen semitones above middle C, which is then inverted to an F (MIDI note 41), nineteen semitones below. The pitches are converted to their frequencies in
hertz, and the inverted pitch is divided by the original to find the transposition
factor. This value is sent to gizmo~ (inside the pfft~ patcher) to transpose the
incoming signal from the performer, producing an inverted accompaniment. The
crescendo and decrescendo volume transformations are as easy as increasing or
decreasing the amplitude over the length of the phrase playback.
Fig. 17- FFT Inversion
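The arithmetic behind the patch can be sketched as follows, assuming standard equal temperament (A4 = 440 Hz); the function names are mine, not labels from the patch.

```python
def midi_to_hz(note):
    """Equal-tempered MIDI note number to frequency in hertz (A4 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)

def inversion_factor(note, axis=60):
    """Transposition factor for a pitch shifter (such as gizmo~) that
    inverts the incoming pitch around the axis, middle C by default."""
    inverted = 2 * axis - note           # mirror the pitch around the axis
    return midi_to_hz(inverted) / midi_to_hz(note)

# G above middle C (MIDI 79) mirrors to F below (MIDI 41),
# nineteen semitones on either side of the axis.
factor = inversion_factor(79)            # 2 ** (-38 / 12), roughly 0.111
```

Sending this factor to the pitch shifter transposes the incoming signal down by thirty-eight semitones, turning the played G into the inverted F.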
The other transformations Rowe uses, such as the retrograde, require
adjustments to individual events within a phrase. The transformations can be
applied similarly, but either data from the individual colls needs to be accessed to
determine where the events occur within the buffer, or other techniques need to
be used to manipulate the individual notes.
The examples of the objects above in the time and pitch domains can also be used in much more creative ways using DSP. The supervp~ objects have many options for cross-synthesizing one signal with another for vocoding and filtering applications, and grainstretch~'s granular transformations can create a wealth of possibilities. The sinusoidal data from sigmund~ can also be used in a transformational manner with a generative aspect. Fig. 18 demonstrates a simple synthesizer that uses oscillators to generate sine waves using the
frequencies and amplitudes of the overtones from the input signal. Each
frequency can also be transposed individually, or on a global level, and the
amplitudes can be swapped to different frequencies. The drunksposition
subpatch uses a random generator that can give a vibrato effect, with varying
degrees of speed and width, using a transposition function. This synthesizer
could be used as an effect on the input signal or using a phrase from the buffer.
Other typical effects, such as delay, distortion, ring modulation, chorus, flanger, and envelope filters, are also transformational options for the Composer and can all easily be added to the signal chain.
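The resynthesis idea can be sketched outside Max/MSP as plain additive synthesis; the partial list below stands in for sigmund~'s reported peaks, and the drunk-walk factor is only a loose text analogue of the drunksposition subpatch, whose actual implementation may differ.

```python
import math
import random

def additive_synth(partials, duration=0.1, sr=44100, transpose=1.0):
    """Sum one sine oscillator per (frequency, amplitude) pair, with an
    optional global transposition applied to every partial."""
    n = int(duration * sr)
    out = [0.0] * n
    for freq, amp in partials:
        f = freq * transpose
        for i in range(n):
            out[i] += amp * math.sin(2 * math.pi * f * i / sr)
    return out

def drunk_factor(base=1.0, width=0.01):
    """A small random transposition step around base, giving a
    vibrato-like drift when called repeatedly."""
    return base * (1.0 + random.uniform(-width, width))

partials = [(220.0, 0.5), (440.0, 0.25), (660.0, 0.125)]  # mock analysis peaks
signal = additive_synth(partials, transpose=drunk_factor())
```

Per-partial transposition, or swapping amplitudes between frequencies, would only require varying `f` or `amp` inside the loop rather than applying one global factor.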
Generative composition uses the completion of processes and algorithms to create music. Pre-existing material is not necessary, but the generation can be based on set parameters. Fig. 16f is an example of a generative algorithm that, when the max is set to 12, would produce the numbers for a twelve-tone serial
row. Using these as MIDI pitch classes, octave displacements could be made and
the notes sent to sound generators for further realization. The pitches could
easily be played as MIDI output, or converted to frequencies and sent to other
generators, like one of the oscillators of Fig. 18.
Fig. 18- Overtone Synth
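The idea of a generative twelve-tone source can be sketched with a simple shuffle; I am assuming Fig. 16f amounts to drawing each of the numbers 0-11 exactly once in random order, so this stands in for, rather than reproduces, the algorithm shown there.

```python
import random

def serial_row(n=12):
    """Draw each pitch class 0..n-1 exactly once, in random order.
    With n = 12 this yields the numbers for a twelve-tone serial row."""
    row = list(range(n))
    random.shuffle(row)
    return row

def realize(row, octave=5):
    """A simple realization: map each pitch class into one octave of
    MIDI notes; octave displacement could be randomized instead."""
    return [12 * octave + pc for pc in row]

row = serial_row()
notes = realize(row)
```

The resulting MIDI notes could be sent out directly or converted to frequencies for an oscillator bank like the one in Fig. 18.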
Similar formalisms can be used for timing. Using Brownian motion from Fig.
16h, Essl also created a patch to generate rhythms. In Fig. 19, a sound is
produced every 51-1000 milliseconds (entry delays, ED). The ED-value of 12 indicates that there are twelve permutations available (the row index), each assigned to a value between 51 and 1000 milliseconds. The Brown factor determines how close each output is to the previous generation, 0 creating a constant and 1 creating
pure randomness. Fig. 20 combines these components to generate notes with a
rhythm and articulation. The rhythm generator is enhanced with the durations,
so that it creates notes that occur within a space of time from each other, but also
last differing amounts of time. The pitch and durations are sent to a MIDI
soundbank, an oscillator synthesizer, or both simultaneously. Arguments for
these randomization modules can be taken from data from the Analyzer to make
the output more relevant to the input signal. Further, the expansion of the
toolbox can continue to enhance the generation from the Composer, such as by
including data in regard to scales and modes. From this, the melody generator
could have a more limiting set from which to compose, and formulas for
rhythmic composition could create a more metered pulse.
Fig. 19 - Essl Brownian Rhythm Generator
Fig. 20- Essl Brownian Pitch-Rhythm-Articulation Generator
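The Brownian behaviour can be sketched as a bounded drunk walk; the step rule below is my assumption about how the Brown factor scales the walk, not a transcription of Essl's patch.

```python
import random

def brown_step(previous, brown, low=51.0, high=1000.0):
    """One step of a bounded drunk walk between low and high.
    brown = 0 repeats the previous value; brown = 1 allows a jump
    anywhere in the range (pure randomness)."""
    step = brown * (high - low)
    value = previous + random.uniform(-step, step)
    return min(high, max(low, value))

def entry_delays(count, brown=0.2, start=300.0):
    """Successive entry delays (ED) in milliseconds, each a Brownian
    step away from the last, after the Fig. 19 rhythm generator."""
    delays, current = [], start
    for _ in range(count):
        current = brown_step(current, brown)
        delays.append(current)
    return delays

eds = entry_delays(8, brown=0.2)   # eight EDs, all within 51-1000 ms
```

Feeding values from the Analyzer in as `start`, `brown`, or the range bounds is one way such a module could be made responsive to the input signal.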
Besides note-based synthesis, Max/MSP is also capable of soundscape
creation. One simple example is Fig. 21 from Alessandro Cipriani and Maurizio
Giri’s book Electronic Music and Sound Design demonstrating a white noise
generator with a frequency filter. Adjusting the parameters of the filter creates a
wide spectrum of sonic variety. Other synthesis can be produced through
combining and manipulating oscillators of different waveform shapes (sine,
sawtooth, square, triangle), used in conjunction with envelope filters. Combining,
layering, and using the output from one compositional element to affect and
influence another are all methods to further create interesting results. The
output from these soundscape generations can also be used for cross-synthesis
transformation with the input signal or the buffer. The possibilities of sound
design within Max/MSP are huge, and discussing them all is beyond the scope of
this paper. For further study, I refer the interested reader to Cipriani and Giri’s
book.
Fig. 21- Cipriani/Giri- Noise Filtering
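A text-only analogue of that patch, white noise through a single one-pole low-pass, can suggest how much variety one filter parameter already yields; this is my own sketch of the idea, not Cipriani and Giri's actual example.

```python
import random

def filtered_noise(n, cutoff=0.1, seed=None):
    """White noise through a one-pole low-pass smoother.
    cutoff in (0, 1]: small values darken the noise, 1 leaves it raw."""
    rng = random.Random(seed)
    out, y = [], 0.0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)     # white noise sample
        y += cutoff * (x - y)          # one-pole low-pass step
        out.append(y)
    return out

dark = filtered_noise(4410, cutoff=0.05, seed=1)    # muffled rumble
bright = filtered_noise(4410, cutoff=1.0, seed=1)   # unfiltered noise
```

Sweeping `cutoff` over time, or driving it from the Analyzer's data, would already turn this single generator into an evolving soundscape element.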
This section has discussed the design structure and architectural
requirements for an improvisational system. Differences between score-driven and performance-driven paradigms, as well as instrumental and player
paradigms, were described as models for the interactive system. The
architecture was defined in three components, the Listener, Analyzer, and
Composer. The Listener accepts and collects the input; the Analyzer processes, makes decisions about, and stores the data; and the Composer produces music either sequentially, transformationally, or generatively. The incorporation of constrained indeterminacy helps to maintain an improvisational yet musically relevant nature.
5. Conclusion
The focus of this paper has been on the development of an interactive
electronics system for improvised music. It has considered how the use of
electronics has evolved over time and its role in music. There was discussion
about the nature of improvisation and brain processes relating to cognition
while playing, and it was learned that improvising is an automatic response
based on learned elements in one’s musical “toolbox”. The concept of learning as
a basis for intelligence was then discussed, along with ways that this can be
achieved artificially with a computer. After these theoretical constructs were
gathered, the development of the software system itself was examined.
With performance-driven, player paradigms identified as the best approaches for interactive improvisation, Robert Rowe's Cypher was used as a model and point
of discussion. The components of the Listener, Analyzer, and Composer of my
own interactive system were analyzed with reference to what was discovered
about improvisation and learning. By creating a database and referencing new
knowledge to it, the computer is able to learn and make informed choices. By
building a “toolbox” of musical knowledge, coupled with constrained
indeterminacy, the system is able to make music in the same theoretical manner
as improvising musicians.
Further developments in my own system need to include expanding on the
Composer and building more compositional tools for it to use. This can become
daunting as the options and possibilities are so numerous. It is important to have
a diverse toolbox for the system to work from to keep the music fresh and from
becoming predictable, but it is also very easy to become trapped in a state of
trying to incorporate every little thing possible, using all sorts of different
generational and transformational techniques. On the one hand, the larger the toolbox, the less prone to repetition of sonic character the system will be. On the other hand, the model of human improvisers shows that such repetition is the reality of improvisation. Although there is a plethora of recombinations from the toolbox
possible, the fact remains that there is virtually nothing an improviser will play
that he hasn’t played in some way before. So a compromising balance in the
system development has to be struck to account for this. Once more
compositional elements have been built, I need to focus again on the Analyzer
and determine the best ways for it to communicate to the Composer. I still need
to develop the decision-making tools that determine how it will use the learned data to
respond in a musical manner. Further development of the analysis itself can still
be done as well. I’d like to look more into the use of probability equations and
neural networking as learning tools to integrate into the system. Refinements
can also be made to the input chain, finding the best settings for correct data
collection and responsiveness.
I am also interested in exploring non-auditory communication within improvisation. Eye-contact and other visual cues can also be important aspects of musical communication, and it might be possible to include them in the system via
Jitter, the visual component of Max/MSP. There are tools capable of shape and
color tracking using just the built-in web-camera of a laptop with Jitter, so the
possibility of integrating visual cues is certainly there. Further research would
need to be done as to the best way to do this within the framework of
improvisation. I imagine the research would be in regard to what visual cues
different improvisers notice from their fellow musicians, and how they interpret
them. I can also see this line of development as becoming extremely complex, as
subtle visual cues can also be very subjective and vary between people, so the
focus of how this information would be used in an interactive improvisation
would need to be defined.
My goal in developing this system is initially for my own use as a solo tool,
but I would also like to expand it for use in my electro-acoustic improvisation
duo with a saxophonist, and then possibly for an even larger ensemble. One way
to do this would simply be to use two instances of the patch, but this is more
likely to result in three separate duos performing at once, that of
clarinet/electronics 1, saxophone/electronics 2, and clarinet/saxophone. The
two electronics systems would not be communicating directly with each other,
nor with the other player. For more coherence, it would be best for all the information to be fed to a central point somewhere in the chain, so that the final result would be either a full trio or a quartet ensemble. The difference would be whether
the electronics are designed to be two separate systems, each interacting with a
live performer, but as well as with each other to create a quartet; or one
electronic system responding to the live performers equally and creating a trio.
I anticipate it would take about another year to fully develop the patch in
the direction I’m currently taking with it, and perhaps a little more time to really
test and tweak it. Expanding it for multiple players might take another few
months of developmental work, and the inclusion of video, with all the
possibilities it introduces and the research needed to find the best ways to
include it, could easily add another year. Once the system is done I would allow it
to be distributed to other electro-acoustic improvisers to use, pending any
licensing restrictions with any third party objects or abstractions that are used.
However, I also hope that this paper has been informative enough to help guide
people in building their own systems, for those so inclined. As mentioned in the
paper, there will be an inherent bias imposed by the developer influencing the
output, so the more people that build their own systems, the broader the
repertoire on the whole becomes.
References
Bartók, Béla. 1976. “Mechanical Music” in Béla Bartók Essays, ed. Benjamin Suchoff. London: Faber & Faber
Bench-Capon, T.J.M. 1990. Knowledge Representation: An Approach to Artificial Intelligence. London: Academic Press
Berkowitz, Aaron L. 2010. The Improvising Mind: Cognition and Creativity in the Musical Moment. New York: Oxford University Press
Berliner, Paul. 1994. Thinking in Jazz: The Infinite Art of Improvisation. Chicago: University of Chicago Press
Bilson, Malcolm. 2007. Interview by Aaron L. Berkowitz, Ithaca, NY, August 12
Carey, Ben. 2011. Email discussions throughout 2011-2012. Ben Carey Website. Retrieved March 7, 2012. http://www.bencarey.net/#25f/custom_plain
Chomsky, Noam. 1957. Syntactic Structures. The Hague: Mouton
Cipriani, Alessandro and Maurizio Giri. 2009. Electronic Music and Sound Design: Theory and Practice with Max/MSP, volume 1, trans. David Stutz, 2010. Rome: ConTempoNet s.a.s.
Ciufo, Thomas. 2005. “Beginners Mind: An Environment for Sonic Improvisation” in International Computer Music Conference Proceedings
Cope, David. 1977. New Music Composition. New York: Schirmer Books.
Csikszentmihályi, Mihály and Grant Jewell Rich. 1997. “Musical Improvisation: A Systems Approach,” in Creativity in Performance, ed. Keith Sawyer. Greenwich: Ablex Publishing
Czerny, Carl. 1836. A Systematic Introduction to Improvisation on the Pianoforte, Op.200, Vienna, trans. and ed. Alice L Mitchell, 1983. New York: Longman
Czerny, Carl. 1839. Letters to a Young Lady on the Art of Playing the Pianoforte, from the Earliest Rudiments to the Highest Stage of Cultivation, Vienna, trans. J.A. Hamilton, 1851. New York: Firth, Pond and Co.
Dannenberg, Roger. 2000. “Dynamic Programming for Interactive Systems” in Readings in Music and Artificial Intelligence, ed. Eduardo Reck Miranda. Amsterdam: Harwood Academic Publishers
Ellis, Nick. 1994. “Implicit and Explicit Language Learning - An Overview,” in Implicit and Explicit Learning of Languages, ed. Nick Ellis. London: Academic Press
Eysenck, Michael W. and Mark T. Keane. 2005. Cognitive Psychology: A Student’s Handbook, 5th edn. East Sussex: Psychology Press
Gass, Susan M. and Larry Selinker. 2008. Second Language Acquisition, An Introductory Course, 3rd edn. New York: Routledge
Hodson, Robert. 2007. Interaction, Improvisation, and Interplay in Jazz. New York: Routledge
Holmes, Thom. 2002. Electronic and Experimental Music, second edition. New York: Routledge
Levelt, Willem J.T. 1989. Speaking. Cambridge: MIT Press
Levin, Robert. 2005. “Lecture 8,” Harvard University Course “Literature and Arts B-52: Mozart’s Piano Concertos,” Sanders Theater, Harvard University, Cambridge, MA, October 14
Levin, Robert. 2007. Interview by Aaron L. Berkowitz, Cambridge, MA, September 10
Luger, G.F. and W.A. Stubblefield. 1989. Artificial Intelligence and the Design of Expert Systems. Redwood City: Benjamin/Cummings
Manning, Peter. 2004. Electronic and Computer Music. New York: Oxford University Press
Marsden, Alan. 2000. “Music, Intelligence and Artificiality” in Readings in Music and Artificial Intelligence, ed. Eduardo Reck Miranda. Amsterdam: Harwood Academic Publishers
Meyer, Leonard. 1989. Style and Music: Theory, History, and Ideology. Philadelphia: University of Pennsylvania Press
Michalski, R.S. 1986. “Understanding the Nature of Learning: Issues and Research Directions” in Machine Learning: An Artificial Approach, vol. II, eds. R.S. Michalski, T. Mitchell, and J. Carbonell. Los Altos, CA: Morgan Kaufmann
Miranda, Eduardo Reck. 2000. “Regarding Music, Machines, Intelligence and the Brain: An Intro to Music and AI” in Readings in Music and Artificial Intelligence, ed. Eduardo Reck Miranda. Amsterdam: Harwood Academic Publishers
Nardone, Patricia L. 1997. “The Experience of Improvisation in Music: A Phenomenological Psychological Analysis,” PhD diss., Saybrook Institute
Nettl, Bruno. 1974. “Thoughts on Improvisation: A Comparative Approach,” The Musical Quarterly 60
Paradis, Michel. 1994. “Neurolinguistic Aspects of Implicit and Explicit Memory: Implications for Bilingualism and SLA,” in Implicit and Explicit Learning of Languages, ed. Nick Ellis. London: Academic Press
Pratella, Balilla. 1910. “Manifesto of Futurist Musicians”. Milan: Open statement
Pratella, Balilla. 1911. “Technical Manifesto of Futurist Music”. Milan: Open statement
Pressing, Jeff. 1984. “Cognitive Processes in Improvisation,” in Cognitive Processes in the Perception of Art, eds. W. Ray Crozier and Anthony J. Chapman. Amsterdam: Elsevier
Pressing, Jeff. 1998. “Psychological Constraints on Improvisational Expertise and Communication,” in In the Course of Performance: Studies in the World of Musical Improvisation, eds. Bruno Nettl and Melinda Russell. Chicago: University of Chicago Press
Reber, Arthur. 1993. Implicit Learning and Tacit Knowledge: An Essay on the Cognitive Unconscious. New York: Oxford University Press
Rolland, Pierre-Yves and Jean-Gabriel Ganascia. 2000. “Musical Pattern Extraction and Similarity Assessment” in Readings in Music and Artificial Intelligence, ed. Eduardo Reck Miranda. Amsterdam: Harwood Academic Publishers
Rowe, Robert. 1993. Interactive Music Systems. Cambridge: MIT Press
Russell, S.J. and P. Norvig. 1995. Artificial Intelligence: A Modern Approach. Englewood Cliffs, NJ: Prentice Hall
Russolo, Luigi. 1913. “The Art of Noises”. Milan: Open statement to Balilla Pratella
Schenker, Heinrich. 1954. Harmony, trans. Elisabeth Mann Borgese. Chicago: University of Chicago Press
Simon, H. and R.K. Sumner. 1968. “Patterns in Music” in Formal Representations of Human Judgement. New York: John Wiley & Sons
Toiviainen, Petri. 2000. “Symbolic AI versus Connectionism in Music Research” in Readings in Music and Artificial Intelligence, ed. Eduardo Reck Miranda. Amsterdam: Harwood Academic Publishers
Widmer, Gerhard. 2000. “On the Potential of Machine Learning for Music Research” in Readings in Music and Artificial Intelligence, ed. Eduardo Reck Miranda. Amsterdam: Harwood Academic Publishers
Wiggins, Geraint and Alan Smaill. 2000. “Musical Knowledge: What can AI bring to the musician?” in Readings in Music and Artificial Intelligence, ed. Eduardo Reck Miranda. Amsterdam: Harwood Academic Publishers
Winkler, Todd. 1998. Composing Interactive Music: Techniques and Ideas Using Max. Cambridge: MIT Press
Young, Michael. 2008. “NN Music: Improvising with a ‘Living’ Computer” in CMMR 2007, LNCS 4969, eds. R. Kronland-Martinet, S. Ystad, and K. Jensen. Berlin Heidelberg: Springer-Verlag
Zwicker, E. and H. Fastl. 1990. Psychoacoustics, Facts and Models. Berlin: Springer Verlag