Developing Interactive Electronic Systems for
Improvised Music
Jason Alder
Advisor: Jos Herfs
ArtEZ hogeschool voor de kunsten
2012
Contents
INTRODUCTION ii
1. EVOLUTION OF ELECTRONICS IN MUSIC 1
2. IMPROVISATION 5
3. ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING 16
4. ARCHITECTURE 27
   A. CLASSIFICATION PARADIGMS 27
   B. LISTENER 33
   C. ANALYZER 39
   D. COMPOSER 59
5. CONCLUSION 69
REFERENCES 73
Introduction
This paper will discuss how one can develop an interactive electronics
system for improvisation, looking at how such a system differs from one
designed for composed music, and what elements are necessary for it to
“listen, analyze, and respond” musically. It will examine the nature of
improvisation and intelligence, and, through discussions of research in the
fields of cognition during musical improvisation and of artificial intelligence,
gather insight into how an interactive system must be developed so that it, too,
maintains an improvisational nature. Previously developed systems will be
examined, analyzing how their design concepts can be used as a platform from
which to build, as well as what can be changed or improved, through an analysis
of various components in the system I am currently designing, made especially
for non-idiomatic improvisation.
The use of electronics with acoustic instruments in music generally stems
from the goal of opening up possibilities and exploring a new sonic palette. There
is a wealth of approaches to how the electronics are implemented: a fixed
performance, as in tape-playback pieces; effects that manipulate the acoustic
sound, like guitar pedals; or pre-recorded and sequenced material triggered at
certain moments. A human is often controlling these electronics, whether the
performer or another person behind a computer or other medium, but the
possibility of the electronics controlling themselves brings some interesting
ideas to the improvisation world. With the advances in technology and computer
science, it is possible to create an interactive music system that will “interpret a
live performance to affect music generated or
modified by computers” (Winkler, 1998). Using software such as Max/MSP, the
development of a real-time interactive system that “listens” to and “analyzes” the
playing of an improviser, and “responds” in a musical way, making its own
“choices”, is closer to fact than the science-fiction imagery it may impart.
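As a rough sketch of this “listen, analyze, respond” cycle, the following Python fragment models the three stages as methods of a single object. The class, the averaging feature, and the interval-based response rule are illustrative assumptions made for this paper, not the actual Max/MSP implementation discussed later:

```python
import random

class InteractiveSystem:
    """Illustrative listen-analyze-respond cycle (not the Max/MSP patch itself)."""

    def __init__(self):
        self.heard = []  # pitches captured from the live performer

    def listen(self, pitch):
        """Capture one incoming event, e.g. a detected MIDI pitch."""
        self.heard.append(pitch)

    def analyze(self):
        """Derive a simple feature: the average of the last eight pitches heard."""
        recent = self.heard[-8:]
        return sum(recent) / len(recent) if recent else None

    def respond(self):
        """Make a 'choice': answer near the performer's register, with variation."""
        center = self.analyze()
        if center is None:
            return None
        return int(center) + random.choice([-7, -5, 0, 5, 7])

system = InteractiveSystem()
for p in [60, 62, 64, 65, 67]:  # performer plays a C-major fragment
    system.listen(p)
response = system.respond()
```

Feeding the sketch a short performed fragment yields a response pitch near the performer's register, a toy version of the system “making its own choices”.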
1. Evolution of Electronics in Music
An initial question some may have when considering improvisation with a
computer is, “Why?” More specifically, “Why improvise with a computer when
you could improvise with other humans?” The use of electronics in music is not
an entirely new concept. The Theremin, developed in 1919, is one of the earliest
electronic instruments1. Utilizing two antennae, one for frequency and the other
for amplitude, it produces music through pitches created with oscillators. The
instrument is played by varying the distance of one’s hands to each of the
antennae: moving the right hand towards and away from the antenna connected
to the frequency circuit changes the sounding pitch, while the other hand does
the same with respect to the amplitude antenna to change the volume (Rowe,
1993). Throughout the 20th century, more and more instruments utilizing
electric current were developed, for example monophonic keyboard instruments
like the Sphärophone (1927), Dynaphone (1927–8), and the Ondes Martenot
(1928). These first attempts at electronic instruments were often modeled to try
to provide characteristics of acoustic instruments. Polyphonic inventions such as
the Givelet (1929) and Hammond Organ (1935) became more commercially
successful as replacements for pipe organs, although the distinct characteristic
sound of the Hammond also gave rise to those wanting to experiment with its
sonic possibilities beyond the traditional manner (Manning, 2004).
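The Theremin's two-antenna control described above can be sketched numerically. The distance-to-frequency and distance-to-amplitude mappings below are illustrative assumptions for the sake of the sketch, not measurements of the instrument's actual circuit:

```python
import math

SAMPLE_RATE = 44100  # samples per second

def theremin_sample(n, pitch_distance, volume_distance):
    """One output sample of a toy Theremin model: the closer a hand is to an
    antenna, the higher the frequency (pitch antenna) or the louder the
    amplitude (volume antenna). Both mappings are invented for illustration."""
    freq = 220.0 + 880.0 / max(pitch_distance, 0.05)  # Hz, rises as hand nears antenna
    amp = min(1.0, 1.0 / max(volume_distance, 1.0))   # 0.0 .. 1.0
    return amp * math.sin(2 * math.pi * freq * n / SAMPLE_RATE)

# One second of audio for fixed hand positions:
samples = [theremin_sample(n, pitch_distance=0.5, volume_distance=2.0)
           for n in range(SAMPLE_RATE)]
```

Varying the two distance parameters over time would produce the continuous pitch and volume glides characteristic of the instrument.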
As has been the case throughout the development of music, the change and
development of new technology opens doors and minds to previously
1 For an explanation and demonstration of Theremin playing, see http://www.youtube.com/watch?v=cd4jvtAr8JM
unexplored musical territory. Chopin and Liszt were inspired “by the huge
dramatic sound of a new piano design. The brilliance and loudness of the thicker
strings was made possible by the development of the one-piece cast-iron frame
around 1825” (Winkler, 1998). In late 1940s Paris, Pierre Schaeffer
was making Musique Concrète using the new recording technology available by
way of the phonograph and magnetic tape, and “the invention of the guitar
pickup in the 1930s was central to the later development of rock and roll. So it
makes sense today, as digital technology provides new sounds and performance
capabilities, that old instruments are evolving and new instruments are being
built to fully realize this new potential” (Winkler, 1998).
Balilla Pratella, an Italian futurist, published his Manifesto of Futurist
Musicians in 1910 calling for “the rejection of traditional musical principles and
method of teaching and the substitution of free expression, to be inspired by
nature in all its manifestations”, and in his Technical Manifesto of Futurist Music
(1911) he urged that composers should “master all expressive technical and dynamic
elements of instrumentation and regard the orchestra as a sonorous universe in
a state of constant mobility, integrated by an effective fusion of all its constituent
parts” and their work should reflect “all forces of nature tamed by man through
his continued scientific discoveries, […] the musical soul of crowds, of great
industrial plants, of trains, of transatlantic liners, of armored warships, of
automobiles, of airplanes” (Manning, 2004). In response, Luigi Russolo published
his manifesto The Art of Noises:
“Musical sound is too limited in qualitative variety of timbre. The most complicated of orchestras reduce themselves to four or five classes of instruments differing in timbre: instruments played with the bow, plucked instruments, brass-winds, wood-winds and percussion
instruments… We must break out of this narrow circle of pure musical sounds and conquer the infinite variety of noise sounds.” (Russolo, 1913)
John Cage’s interest in improvisation and indeterminacy was an influence
on the composers of the sixties who first began experimenting with electronic
music in a live situation. Gordon Mumma’s Hornpipe (1967), “an interactive
live-electronic work for solo hornist, cybersonic console, and a performance space,”
used microphones to capture and analyze the performance of the solo horn
player, as well as the resonance and acoustic properties of the performance
space. The horn player is free to choose pitches, which in turn affects the
electronics in the “cybersonic console”. The electronic processing emanating from
the speakers then changes the acoustic resonance of the space, which is
re-processed by the electronics, thus creating an “interactive loop” (Cope, 1977).
Morton Subotnick worked with electrical engineer Donald Buchla to create the
multimedia opera Ascent Into Air (1983), with “interactive computer processing
of live instruments and computer-generated music, all under the control of two
cellists who are part of a small ensemble of musicians on stage” (Winkler, 1998).
Subotnick later worked with Marc Coniglio to create Hungers (1986), a staged
piece where electronic music and video were controlled by the musicians.
Winkler comments on the “element of magic” in live interactive music,
where the “computer responds ‘invisibly’ to the performer”, and the heightened
drama of observing the impact that the actions of the clearly defined roles of
computer and performer have on one another. He continues by saying that “since
the virtue of the computer is that it can do things human performers cannot do, it
is essential to break free from the limitations of traditional models and develop
new forms that take advantage of the computer’s capabilities” (Winkler, 1998).
The role of electronics in music is that of innovation. The aural possibilities,
and a computer’s ability to perform actions that humans cannot, create a world
of options not previously available. Utilizing these options fulfills Russolo’s
futurist vision, and using these tools for improvisation expands the potential
output of an electronics system. By allowing artificial indeterminacy, human
constraints dissipate and doors open to the potential of otherwise
unimaginable results.
2. Improvisation
The question of how one makes a computer capable of improvising is one
of the crucial elements in the task of developing an interactive improvisational
system. As a computer is not self-aware, how can it make “choices” and respond
in a musical manner? To address this issue, I looked to the nature of
improvisation. What is it that is actually happening when one improvises? What
is the improviser thinking about in order to play the “correct” notes, such that it
sounds like music, as opposed to a random collection of pitches or sounds? Some
may have a notion that improvisation is just a free-for-all, where the player can
do anything they wish, but this is clearly not the case. If one were to listen to an
accomplished jazz pianist play a solo, as well as an accomplished classical pianist
play a cadenza, each would likely make their improvisation sound easy and
effortless, and flow in its style. But if the roles were reversed, and the jazz
pianist played a Mozart cadenza and the classical pianist played a solo in a jazz
standard, there would likely be a clear difference in how they sound. The music
theorist Leonard Meyer defines style as:
“a replication of patterning, whether in human behavior or in the artifacts produced by human behavior, that results from a series of choices made within some set of constraints… [which] he has learned to use but does not himself create… Rather they are learned and adopted as part of the historical/cultural circumstances of individuals or group” (Meyer, 1989).
There are traits and traditions particular to each style that make a piece of music
sound the way it does, and be identified as being in that style. Without the proper
training and knowledge of rhythmic and harmonic development and particular
important traits for each style, a player cannot properly improvise within it,
Chapter 2 6
which is why one would hear such a difference between the classical and jazz
pianists improvising in the same pieces.
Improvisation takes elements from material and patterns of its associated
musical culture. “The improviser’s choices in any given moment may be
unlimited, but they are not unconstrained” (Berkowitz, 2010). Mihály
Csikszentmihályi, a psychologist specializing in the study of creativity, states:
“Contrary to what one might expect from its spontaneous nature, musical improvisation depends very heavily on an implicit musical tradition, on tacit rules… It is only with reference to a thoroughly internalized body of works performed in a coherent style that improvisation can be performed by the musician and understood by the audience” (Csikszentmihályi and Rich, 1997).
These traditions and rules are the conventions that stand as a basis, a
common language, for the performer to communicate to the listeners. They are
the referent, defined by psychologist and improviser Jeff Pressing as “an
underlying formal scheme or guiding image specific to a given piece, used by the
improviser to facilitate the generation and editing of improvised behavior…”
(Pressing, 1984). The ethnomusicologist Bruno Nettl calls the referent a “model”
for the improviser to “ha[ve] something given to work from – certain things that
are at the base of the performance, that he uses as the ground on which he
builds” (Nettl, 1974). The referents, or models, are the musical elements such as
melodies, chord patterns, bass lines, motifs, etc., used as the basis to build the
improvisation. They provide the structural outline and the material, but are part
of the larger knowledge base necessary, which is “built into long term memory”
(Pressing, 1998).
It is also necessary to have “rapid, real-time thought and action” (Berkowitz,
2010) to successfully incorporate this musical information into a unique,
improvised piece of music. Pressing says:
“The improviser must effect real-time sensory and perceptual coding, optimal attention allocation, event interpretation, decision-making, prediction (of the actions of others), memory storage and recall, error correction, and movement control, and further, must integrate these processes into an optimally seamless set of musical statements that reflect both a personal perspective on musical organization and a capacity to affect listeners” (Pressing, 1998).
Through study and practice, the referents become ingrained into the playing of
the improviser, and the note-to-note level of playing can be recalled
automatically, allowing the improviser to focus more on the higher-level musical
processes, such as form, continuity, feeling, etc.
Aaron Berkowitz, in his book The Improvising Mind: Cognition and
Creativity in the Musical Moment, studies which elements of improvisation are
conscious or unconscious decisions. He finds that “some conventions and rules
are accessible to consciousness, while others may function without conscious
awareness” (Berkowitz, 2010). These elements of memory are related directly to
the learning process, as stated by psychologist Arthur Reber:
“There can be no learning without memorial capacity; if there is no memory of past events, each occurrence is, functionally, the first. Equivalently, there can be no memory of information in the absence of acquisition; if nothing has been learned, there is nothing to store” (Reber, 1993).
The learning process can be separated into two forms, implicit and explicit.
Implicit learning is defined as:
“The acquisition of knowledge about the underlying structure of a complex stimulus environment by a process which takes place naturally, simply and without conscious operations… a non-conscious and automatic abstraction of the structural nature of the material arrived at from experience of instances,”
whereas explicit learning is:
“A more conscious operation where the individual makes and tests hypotheses in a search for structure… [;] the learner searching for information and building then testing hypotheses… [;] or, because we can communicate using language… assimilation of a rule following explicit instructions” (Ellis, 1994).
The important difference between implicit and explicit learning is the conscious
effort required by explicit learning but not by implicit learning. It is also possible to learn
implicit information during explicit learning. Berkowitz gives the example of
learning a foreign language, and memorizing phrases in the new language by
explicitly focusing on features of the words, phrases, sounds, and structures, but
at the same time implicitly learning other attributes of language (Berkowitz,
2010).
Similarly, implicit memory is defined as “memory that does not depend on
conscious recollection,” and explicit memory as “memory that involves conscious
recollection” (Eysenck and Keane, 2005). The relationship between learning and
memory is not necessarily direct and can change. Something learned implicitly
can be consciously, and thus explicitly, analyzed, and explicit knowledge can
become implicit “through practice, exposure, drills, etc.…” (Gass and Selinker,
2008).
In Berkowitz’s interviews with classical pianist Robert Levin, Levin
describes his thought processes, or sometimes lack thereof, while he improvises.
While being explicitly aware of the overall musical picture as it is happening, he
is not thinking on a note-by-note basis of what he is doing, or what he will do. He
allows his fingers to move implicitly, the years and years of practice guiding
them in the right directions. He says of the process:
“I began to realize you’re just going to have to let go of it and go wherever you go. The way jazz people do: you have this syntactical thing just the way they have their formulas, you’ve got the basics of architecturally how a cadenza works and its sectionalization, which can be abstracted from all of these cadenzas, and then you just have to accept the fact that there’s going to be some disorder… When I play, I am reacting… your fingers play a kind of, how shall I say, a potentially fateful role in all this, because if your fingers get ahead of your brain when you’re improvising, you get nonsense or you get emptiness. I never, and I mean never, say ‘I’m going to modulate to f-sharp major now,’ or ‘I’m going to use a dominant seventh now,’ or ‘I’m going to use a syncopated figure now…’ I do not for one millisecond when I’m improvising think what it is I’m going to be doing. I don’t say, ‘Oh I think it’s about time to end now…’” (Levin, 2007).
Berkowitz focuses on comparing improvising with language production.
When speaking in one’s native language, there is not a word-by-word analysis of
what is going to be said. The overall direction of the statement is known, but one
is not thinking word-by-word, nor about specific grammatical rules. These are
implicit elements that manifest during speaking. Children, when learning to
speak, are able to do so without any explicitly taught grammar; they simply
learn what sounds “right”. There is also no acute awareness of the physical
aspects of speech, such as tongue, lip, and larynx position (Berkowitz, 2010).
These just fall into their learned positions in the body’s muscle memory. This
lack of direct cognition during spontaneous speech production is the same as in
improvising. Once one has learned and internalized the vocabulary and
grammatical rules to the point where they are automatically and implicitly
recalled, one can “leave nearly everything to the fingers and to chance” (Czerny, 1839).
Achieving this level of competence comes from the development of one’s
“toolbox”, or Knowledge Base. Pianist Malcolm Bilson cites the collection of ideas
for this toolbox, from the internalization of repertoire and exercises, as one of
the elements of learning to improvise (Bilson, 2007). Once the material has been
stored in the toolbox, it can be drawn upon spontaneously during improvisation,
but it is through the practice and refinement of the skill of improvising that one
can “link up novel combinations of actions in real-time and chang[e] chosen
aspects of them”, giving one “the ability to construct new, meaningful pathways
in an abstract cognitive space” (Pressing, 1984). This process of refinement and
vocabulary development is largely implicit, in contrast to the explicit rote
learning of chords and harmonic progressions (Berkowitz, 2010).
While Levin acknowledges that his fingers play a “fateful role” in
improvising, and that there is a lack of cognition of what exactly they will do, he
says also:
“I get to a big fermata, I think, ‘What am I going to do now? Oh, I’ll do that.’ So there’s a bit of that, but not the sense of doing it every two bars” (Levin, 2007).
This creates a dichotomy in the thinking process. On one hand, there is no
thinking, only allowing the fingers to move; on the other, there is an overall
sense of direction and of where the fingers need to go, and “get[ting] reasonably
lucky most of the time” (Levin, 2007). Psychologist Patricia
Nardone describes this “creator-witness dichotomy” (Berkowitz, 2010) as
“…ensuring spontaneity while yielding to it…[,] being present and not present to
musical processes: a divided consciousness… [,] exploring a musical terrain that
is familiar and unfamiliar…” She discusses this further:
“One dialectic process is that while improvising musicians are present to and within the musical process, they are also concomitantly allowing musical possibilities to emerge pre-reflectively, effortlessly, and unprompted. Conversely, while musicians are outside the improvisational process and fully observant of it, they are paradoxically directing and ensuring the process itself. A second dialectical paradox is that in improvisation there is an intention to direct and ensure spontaneous musical variations while allowing the music itself to act as a guide toward a familiar domain. A third dialectical paradox is that while being present to and within the process of musical improvisation, musicians concomitantly allow the music to guide them toward an unfamiliar terrain. Conversely, while being outside the musical process and fully observant of it, musicians paradoxically intend the music toward a terrain that is familiar to them” (Nardone, 1997).
Paul Berliner speaks of the physicality of the improvisation process on the body,
“through its motor sensory apparatus, it interprets and responds to sounds and
physical impressions, subtly informing or reshaping mental concepts” (Berliner,
1994). This physicality in improvisation can also be likened to that of
spontaneous speech. One needs the effortless mechanical skills of, most often,
their hands to play their instrument just as a speaker needs the mechanical skills
of tongue, mouth, and larynx, as well as a proficiency of the syntax of music and
language to effectively communicate (Berkowitz, 2010). Czerny also speaks of
the creator-witness in reference to a speaker that “does not think through each
word and phrase in advance… [but] must… have the presence of mind… to
adhere constantly to his plan…” (Czerny, 1836).
Regarding this creator-witness dichotomy, Levin describes his
thoughts once he is done improvising: “After I’m finished doing it, I… have no
idea what I played” (Levin, 2005). To this Berkowitz poses the questions, “Is not
some memory of what is occurring during the improvisation necessary if the
performer is to make it from point a to point b? Or can this only prove to be a
hindrance?” (Berkowitz, 2010). The answer to this lies in the findings on implicit
and explicit memory. With time, the practiced and honed skill of improvising
enters implicit memory as motoric reactions, even though the actions
themselves cannot be explicitly remembered. The improviser may begin with an
idea, but is then led by the movements of the fingers, allowing the music to “flow
from moment to moment magically manifest[ing], without a need to know or
remember where one has been or where one is going. In improvised
performance, the boundaries between creator and witness, past and future, and
music and musician dissolve into the musical moment” (Berkowitz, 2010).
Willem J.M. Levelt describes the processes for the generation of speech in
his book Speaking as:
Conceptualization. In this process, one plans “the communicative intention by selecting the information whose expression may realize the communicative goals.” In other words, one plans the idea(s) behind the intended message in a preverbal fashion.
Formulation. In this process, the conceptualized message is translated into linguistic structure (i.e., grammatical and phonological encoding of the intended message take place). This phrase is converted into a phonetic or articulatory plan, which is a motor program to be executed by the larynx, tongue, lips, etc.
Articulation. This is the process of actual motor execution of the message, that is, overt speech.
Self-monitoring and self-repair. By using the speech comprehension system that is also used to understand the speech of others, the speaker monitors what he or she is saying and how he or she is saying it on all levels from word choice to social context. If errors occur, the speaker must correct them (Levelt, 1989; Berkowitz, 2010).
The application of these ideas to improvisation is logical. The overall
improvisation is the concept, the form, structure, and style is the formulation,
playing the music is the articulation, and as the music is happening the
performer is monitoring the output and making corrections.
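Levelt's four processes, mapped onto improvisation as above, can be sketched as a simple pipeline. The stage functions and the musical encodings they use are illustrative assumptions, not a model from the literature:

```python
def conceptualize():
    """Plan the preverbal 'idea': here, an overall contour to aim for."""
    return {"direction": "ascending", "length": 4}

def formulate(concept):
    """Encode the idea into concrete musical structure (MIDI pitches)."""
    start = 60
    step = 2 if concept["direction"] == "ascending" else -2
    return [start + i * step for i in range(concept["length"])]

def articulate(phrase):
    """'Play' the phrase: a stand-in for actual motor execution / sound output."""
    return ["note:%d" % p for p in phrase]

def monitor_and_repair(phrase, low=0, high=127):
    """Check the output and correct errors, like a speaker self-repairing."""
    return [min(max(p, low), high) for p in phrase]

concept = conceptualize()
phrase = monitor_and_repair(formulate(concept))
output = articulate(phrase)  # ['note:60', 'note:62', 'note:64', 'note:66']
```

Each stage consumes the previous stage's result, mirroring how the concept, formulation, articulation, and self-monitoring feed into one another.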
Improvisation can also, however, be likened to learning a foreign language
rather than a native language. Following Levelt’s processes, one is much more
conscious of what the conceptualized statement is, the formulation of the
translation and ordering of the words, and the correctly articulated
pronunciation. Sometimes, particularly when beginning, the monitoring and
repair section is not even achievable, as one does not even know that there was a
mistake. It can be that the foreign language learner may have knowledge and
understanding of the rules of sentence construction, but is not able to formulate
them in a manner for an effective conversation. Berkowitz analogizes this to
Levin’s descriptions of learning to improvise, and the balance between thinking
too much about what he was doing and just allowing his fingers to go. Thinking
about the referent and overall structure interfered with the fingers and
the note-by-note implicit level of playing. Michel Paradis says that the foreign
language speaker “may either use automatic processes or controlled processes,
but not both at the same time… Implicit competence cannot be placed under the
conscious control of explicit knowledge” (Paradis, 1994).
Finding a balance between planning and execution in speech and
improvisation is thus necessary. Eysenck and Keane estimate that 70 percent of
spoken language uses recurrent word combinations, and thus pre-formulation is
one tool for finding this balance (Eysenck and Keane, 2005). From a musical
perspective, this is akin to combining elements from the “toolbox,” allowing for
more attention to be paid to the referent.
Improvisation occurs constantly in everyday life. Consider, for example, the
decision to drive to the store. There must be a general plan;
one must know the way and the best route to take, but what happens in between
is unknown. Encountering other cars, traffic lights, road construction, a dog
running across the street, etc., can all change the originally intended plan, and
the ability to immediately react and adapt to the situation is imperative. Befitting
of this example, Berkowitz says:
“Improvisation cannot exist without constraints, and that live performance will always require some degree of improvisation as its events unfold. Improvisation needs to operate within a system even when the resultant music transcends that system. Moreover, no performance situation – improvised or otherwise – exists in which all variables can be entirely predetermined” (Berkowitz, 2010).
Similarly, Levin states:
“The fact of the matter is that you are who you have been in the process of being who you will be, and in nothing that you do will you suddenly – as an artist or a person – come out with something that you have never done before in any respect. There will be quite possibly individual elements in a performance that are wildly and pathbreakingly different from anything that you’ve done before, but what about the rest and what kind of persona and consistency of an artist would you have if there was no way to connect these things…?” (Levin, 2007).
The key elements learned about improvisation here are the spontaneous
development and recombination of previously learned material and the lack of
specific conscious decisions, yet maintaining an overall view of the direction the
music is going. The musical decisions that come from spontaneous
recombination are sourced from the musician’s training and study, and what
patterns have been learned and have found their way into the implicit memory.
This is why classical and jazz pianists will improvise differently to the same
music; they have different “toolboxes”. It can then also be said that whatever
goes into the toolbox will have an effect on the output. The training that a
musician receives will be represented by the music produced. This is important
to consider for the development of an electronic music system; the contents of its
toolbox will reflect its output. Once an understanding of the nature of
improvisation has been established, the application of these principles to the
computer is the next step.
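The observation that whatever goes into the toolbox shapes the output can be made concrete in a short sketch. The Toolbox class and its transposition-based recombination rule below are hypothetical illustrations, not the design of the system presented in chapter 4:

```python
import random

class Toolbox:
    """Hypothetical knowledge base: what goes in constrains what comes out."""

    def __init__(self):
        self.patterns = []  # the internalized vocabulary

    def learn(self, pattern):
        """Store a pattern, as repertoire and exercises fill a player's toolbox."""
        self.patterns.append(list(pattern))

    def improvise(self, n_phrases=3):
        """Spontaneously recombine stored material into a new line."""
        line = []
        for _ in range(n_phrases):
            phrase = random.choice(self.patterns)  # recall learned material
            shift = random.choice([-2, 0, 2])      # vary it, don't just repeat it
            line.extend(note + shift for note in phrase)
        return line

jazz = Toolbox()
jazz.learn([60, 63, 65, 66])  # a blues fragment
jazz.learn([67, 66, 65, 63])
line = jazz.improvise()  # every note is derived from the learned fragments
```

A toolbox filled with jazz fragments can only produce jazz-derived lines; filling it with different material would change the output accordingly, which is exactly the design consideration raised above.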
3. Artificial Intelligence and Machine Learning
The notion of a computer “making choices” in improvisation has been
mentioned here. There is an implication that to make a choice, one must be
capable of some amount of intelligence, which introduces the question, “What is
intelligence?” One might consider the solving of complex equations by a highly
gifted mathematician, or the moves performed by a chess master, or the
diagnoses of disease by a doctor, as being intelligent. However, the tasks
performed by all of these humans can also be accomplished by a computer,
which is not typically considered intelligent. As Eduardo Reck Miranda
says, “the problem is that once a machine is capable of performing such types of
activities, we tend to cease to consider these activities as intelligent. Intelligence
will always be that unknown aspect of the human mind that has not yet been
understood or simulated” (Miranda, 2000). Defining intelligence may be a
contentious task, so we will instead look to its attributes. Widmer points out that
“the ability to learn is undoubtedly one of the central aspects, if not the defining
criterion, of intelligence and intelligent behavior. While it is difficult to come up
with a general and generally agreed definition of intelligence, it seems quite
obvious that we would refuse to call something ‘intelligent’ if it cannot adapt at
all to changes in its environment, i.e., if it cannot learn” (Widmer, 2000).
It is quickly recognized that, even as research and technology in the field of
artificial intelligence advance toward bringing “musicality to computer music, no
model has yet come close to the complex subtleties created by humans” (Winkler,
1998), a sentiment echoed by Widmer’s statement that although computers and
software can “extract general, common performance patterns; the fine artistic
details are certainly beyond their reach” (Widmer, 2000). Although Miranda
claims that “from a pragmatic point of view, the ultimate goal of Music and AI
[Artificial Intelligence] research is to make computers behave like skilled
musicians” (Miranda, 2000), it is clear that a machine is not human, and any
attempts to create an intelligent computer are merely tasks of trying to recreate
processes of the brain.
So the focus becomes one of determining what these processes are,
accomplished by looking at the desired end result. When creating a model,
attention is paid to the original design and the details necessary to copy it. But is
the goal really to create a system that is a copy of a human? One of the desirable
attributes of a computer is exactly that it is not human, such as its ability to
handle and process large amounts of data and perform calculations with a speed
and accuracy far greater than that of a human. Dannenberg speaks of the
advantages of relying on a computer’s skills and its ability to “compose complex
textures that are manipulated according to musical input. For example, a dense
cloud of notes might be generated using pitches or harmony implied by an
improvising soloist. A dense texture is quite simple to generate by computer, but
it is hard to imagine an orchestra producing a carefully sculpted texture while
simultaneously listening to and arranging pitch material from a soloist”
(Dannenberg, 2000). Rowe points out that human limitation and variability was
precisely an element that led to the use of electronics in music (Rowe, 1993) and
Bartók comments on the use of the mechanized pianola that “took advantage of
all the possibilities offered by the absence of restraints that are an outcome of
the structure of the human hand” (Bartók, 1937).
Michael Young identifies a resulting attribute of what he calls a “living”
computer as being “unimagined music, its unresolved and unknown
characteristics offering a genuine reason for machine-human collaboration.” If
the computer is to “extend, not parody, human creative behaviour, machine
music should not emulate established styles or practices, or be measured
according to any associated, alleged aesthetic” (Young, 2008). It is the discovery
of new ideas and material through the use of computers in music to “create new
musical relationships that may exist only between humans and computers in a
digital world” (Winkler, 1998) that drives the continuing research in the
development of computers in music.
Looking at these factors it can be seen that a desired system may “behave
in a human-like manner in some respects but in a non-human-like manner in
other respects [… Exhibiting] appropriate behavior… in a manner which leads to
a certain goal” (Marsden, 2000). Referring to Widmer’s earlier quote about
intelligence, that goal is the ability to learn.
This then brings the question, “What is learning?” Russell and Norvig define
it as “behaving better as a result of experience” (Russell and Norvig, 1995); while
Michalski states that it is “constructing or modifying representations of what is
being experienced” (Michalski, 1986). These two definitions address different
elements of learning: improvement of behavior, as stated by Russell and Norvig,
and acquisition of knowledge of the surroundings, as stated by Michalski.
Marsden summarizes by saying that one key feature of an intelligent animal is its
ability to learn spontaneously from its experiences and adapt future actions as a
response to this, and that a second feature is being able to perform in unfamiliar
environments of which they have no previous knowledge, “tolerably well.” As
such, a goal of Artificial Intelligence is the capacity to learn and apply this
learning in unfamiliar situations (Marsden, 2000).
How, then, does a computer accomplish learning in its quest for
intelligence? Widmer cites Michalski’s definition, “learning as the extraction of
knowledge from observations or data”, as the “dominant paradigm in machine
learning research”, with examples of “classification and prediction rules (Clark
and Niblett, 1989, Quinlan, 1990), decision trees (Quinlan, 1986, 1993), or logic
programs (Lavrac and Dzeroski, 1994)” (Widmer, 2000). Through the use of
algorithms, a computer is able to assess data and make comparisons for
purposes of classification. For example, from a stream of pitches an algorithm
can analyze music to “look for collections of notes which form a series, or… check
collections of notes to see if they form a series” (Wiggins and Smaill, 2000).
Learning is thus accomplished through observation of data, allowing the
computer to classify notes as part of a defined series, or to look for a
series within the notes. Empirical predictions based on trends and probabilities
can be made using generalizations based upon these observations. It is possible
to analyze a stream of notes, looking at intervallic relationships, to determine the
likelihood of what the next note will be. For instance, if the software sees
the ascending stepwise motion of the incoming pitches F G A, it could
reasonably assume that the next note will be a B. Coupled with some
programmed information akin to the knowledge “toolbox” discussed in the
previous section about improvisation, the computer could make even more
robust analyses on the basis of tonality to predict upcoming notes, thus
knowing that B-flat is also a likely possibility. As the computer continues to
analyze and find trends and patterns in a piece of music, its Knowledge Base can
grow and assign more accurate weights to the probabilities of certain notes. In
this respect, the learning corresponds to “behaving better as a result of
experience.”
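This empirical prediction step can be illustrated with a small Python sketch. The function name, candidate set, and probability weights below are hypothetical illustrations, not taken from any of the systems discussed:

```python
# Hypothetical sketch: predicting a likely next pitch from recent intervals.
# Pitches are MIDI note numbers; the weights are illustrative only.

def predict_next(pitches):
    """Given recent MIDI pitches, return candidate next pitches with weights."""
    if len(pitches) < 3:
        return {}
    a, b, c = pitches[-3:]
    candidates = {}
    # Ascending stepwise motion (e.g. F G A) suggests the line continues up.
    if 1 <= b - a <= 2 and 1 <= c - b <= 2:
        candidates[c + 2] = 0.5   # whole step up (A -> B)
        candidates[c + 1] = 0.3   # half step up (A -> Bb), plausible tonally
        candidates[c] = 0.2       # repetition is always possible
    return candidates

weights = predict_next([65, 67, 69])   # F G A
# 71 (B) is the highest-weighted continuation, 70 (Bb) second.
```

As the Knowledge Base grows, such weights would be adjusted from observed trends rather than fixed by hand.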
Music theorist Heinrich Schenker says that repetition is “the basis of music
as art. It creates musical form, just as the association of ideas from a pattern in
nature creates the other forms of art” (Schenker, 1954). For this reason, the
ability to recognize patterns is an important one for computers, and a key feature
for music systems. Patterns occur in music at all levels, including “pitch,
time, dynamics and timbre dimensions of notes, chords and harmony, contours
and motion, tension and so on” (Rolland and Ganascia, 2000). Scale structures,
melodic sequences, rhythms, and chord progressions are all based on the
repetition of patterns. The cognitive processes of expectation and anticipation
derive from the brain’s ability to pick out and identify patterns (Simon and
Sumner, 1968). A cadential chord progression of a V resolving to vi, for instance,
is called a deceptive cadence. Typically in Western music, the chord pattern
should resolve to I, and because the pattern does not go where the listener
expects or anticipates that it will, the listener has been deceived.
Robert Rowe’s software Cypher uses the concept of anticipation to predict
the performer’s playing by looking for patterns in real-‐time. In this sense, Cypher
is learning based on Russell and Norvig’s definition, “behaving better as a result
of experience”. Once Cypher detects the first half of a recognized pattern, it
assumes that it will be continued, and can then respond to this information as
appropriate (Rowe, 1993). The recognition and extraction of patterns involves
“detecting parts of the source material that have been repeated, or
approximately repeated, sufficiently to be considered prominent”. Some
questions raised by Rolland and Ganascia are: “How should ‘parts’ be selected?”,
“What is ‘approximate repetition’?”, “What is ‘sufficiently’?”, and “What
algorithms can be designed and implemented?” (Rolland and Ganascia, 2000). The manner in
which these questions are answered depends on the nature of the music and
how the pattern information is to be used by the software.
Rowe defines two goals in pattern processing as “1) learning to recognize
important sequential structures from repeated exposure to musical examples
(pattern induction), and 2) matching new input against these learned structures
(pattern matching).” Additional information can also be collected from the
patterns, such as the frequency and context of occurrence, and the relationships
between them. Transposition and retrograde are two such relationships that can
enrich the capabilities of the pattern identifier. Other
enrichment can be the ability to recognize differences with the addition or
omission of notes, metric and rhythmic displacements, altered phrasing and
articulation, and ornamentation (Rolland and Ganascia, 2000).
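Rowe’s two goals can be made concrete with a deliberately simplified Python sketch. Representing phrases by their successive intervals, one of the enrichments just mentioned, lets transposed repetitions match; the function names and the “sufficiently” threshold are illustrative assumptions:

```python
from collections import Counter

def intervals(phrase):
    """Represent a phrase by its successive intervals, so transpositions match."""
    return tuple(b - a for a, b in zip(phrase, phrase[1:]))

def induce_patterns(stream, length, min_count=2):
    """Pattern induction: collect interval patterns repeated 'sufficiently' often."""
    counts = Counter(intervals(stream[i:i + length])
                     for i in range(len(stream) - length + 1))
    return {p for p, n in counts.items() if n >= min_count}

def match(new_phrase, learned):
    """Pattern matching: does new input fit a learned structure?"""
    return intervals(new_phrase) in learned

# A C-D-E motif (MIDI 60 62 64), then the same shape transposed to G-A-B:
stream = [60, 62, 64, 67, 69, 71]
learned = induce_patterns(stream, 3)
print(match([53, 55, 57], learned))  # F G A has the same shape, so True
```

A fuller implementation would also tolerate approximate repetition (added or omitted notes, rhythmic displacement), which this exact-match sketch does not attempt.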
There will be an inherent bias from the system developer as to the decision
of what constitutes “sufficiently” prominent material to be analyzed. Widmer
addresses the fact that bias can occur in the “representation language in which
the learning system can represent its hypotheses” and that one must “be very
conscious of, and explicit about, any assumptions that guide his/her choice […] of
representation language” (Widmer, 2000). Rowe stresses that it is “critical to
take care that the parameters of the representation preserve salient aspects of
the musical flow” (Rowe, 1993), and Miranda cites, “Designers of AI systems
require knowledge representation techniques that provide representational
power and modularity. They must capture the knowledge needed for the system
and provide a framework to assist the systems designer to easily organize this
knowledge (Bench-Capon, 1990; Luger and Stubblefield, 1989).” The point here
is to be mindful of how musical information is expressed to the computer. For
example, in a piece of music there could exist two phrases, one a C-major scale,
the other an Eb-major scale. If these were represented as note names (Fig. 1),
the two phrases would be regarded as not matching. However, if they were
represented as intervals (Fig. 2), counted as the number of semitones between
notes (note the ‘-’ for the value of note1: an interval requires two notes, so
analysis cannot begin until the second note is played), then the phrases would
be considered matches, and the computer could choose to take an action on the
basis of the knowledge that there is scalar activity occurring. Another example
concerns rhythm. For instance, a phrase could be played all in half-notes, and
then again all in quarter-notes. If the analysis looked solely at the lengths of
the notes and phrases, the two would not match. However, if the lengths of the
notes were represented as ratios compared to the previous note, in this example
all 1:1, then there would be a match. These are merely two very simple examples
of the way the representative language can impact the analysis results. This is
not to say that phrase analysis should be based solely on one piece of
information or the other, nor that the differences should be disregarded. The
information that the melodic line is the same intervals but transposed, and that the rhythmic pattern
is the same but double speed, is also important data that must be expressed and
recorded as a separate point of analysis. This illustrates examples of how data
can be interpreted by “abandon[ing] the note level and learn[ing] expression
rules directly at the level of musical structures” (Widmer, 2000).
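The two representation examples above can be made concrete in a short Python sketch. The scales and durations are the ones from the text; the function names are illustrative:

```python
def as_intervals(pitches):
    """Pitch list -> semitone steps between successive notes."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def as_ratios(durations):
    """Duration list -> length of each note relative to the previous one."""
    return [b / a for a, b in zip(durations, durations[1:])]

c_major  = [60, 62, 64, 65, 67, 69, 71, 72]   # C-major scale, MIDI notes
eb_major = [63, 65, 67, 68, 70, 72, 74, 75]   # Eb-major scale
print(c_major == eb_major)                              # False: names differ
print(as_intervals(c_major) == as_intervals(eb_major))  # True: same shape

halves   = [1000, 1000, 1000, 1000]   # durations in ms
quarters = [500, 500, 500, 500]
print(halves == quarters)                        # False
print(as_ratios(halves) == as_ratios(quarters))  # True: all ratios are 1:1
# The transposition (+3 semitones) and the tempo relation (x2) remain
# available as separate analysis data: eb_major[0] - c_major[0], etc.
```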
For ways to describe these musical structures, we will look again to
comparisons in language. Crucial to the understanding of a language is the
knowledge of the grammar, which must be based on mathematical formalism to
correctly assess the function of each element of a sentence (Chomsky, 1957).
Miranda uses an example of the sentence “A musician composes the music.” To
put this sentence in mathematical terms, the knowledge will be represented in
variables:
        Phrase1    Phrase2
note1   C          Eb
note2   D          F
note3   E          G
note4   F          Ab
note5   G          Bb
note6   A          C
note7   B          D
note8   C          Eb

Fig. 1

        Phrase1    Phrase2
note1   -          -
note2   2          2
note3   2          2
note4   1          1
note5   2          2
note6   2          2
note7   2          2
note8   1          1

Fig. 2
S = NS + VS (Sentence = Noun Sentence + Verb Sentence)
A musician + composes the music
NS = A + N (Noun Sentence = Article + Noun)
A + musician
VS = V + NS (Verb Sentence = Verb + Noun Sentence)
composes + the music
Describing the sentence with variables allows for substitutions from a set:
A = {the, a, an}
N = {dog, computer, music, musician, coffee}
V = {composes, makes, hears}
So the formula S = NS + VS could yield the sentence “The dog hears a computer”,
but it could also produce “The coffee makes a dog”. These mathematical
formalisms help to describe the rules of the language, but don’t prevent these
sorts of nonsense errors. For that, a certain amount of semantic rules or context
must also be supplied to the system, which can be explored through the use of
Artificial Neural Networks (ANN).
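Miranda’s rewrite rules and substitution sets can be sketched as a tiny generator in Python. It produces both the sensible and the nonsense sentences, which is exactly the point: the grammar alone cannot rule the nonsense out:

```python
import itertools

# Substitution sets from the text:
A = ["the", "a", "an"]                                    # articles
N = ["dog", "computer", "music", "musician", "coffee"]    # nouns
V = ["composes", "makes", "hears"]                        # verbs

def sentences():
    """S = NS + VS, NS = A + N, VS = V + NS: every string the grammar yields."""
    for a1, n1, v, a2, n2 in itertools.product(A, N, V, A, N):
        yield f"{a1} {n1} {v} {a2} {n2}"

all_sentences = list(sentences())
print("the dog hears a computer" in all_sentences)   # True
print("the coffee makes a dog" in all_sentences)     # True: well-formed nonsense
```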
ANNs, also called “connectionism” or “parallel distributed processing” (PDP),
are models based on biological neural networks, or broadly speaking, the way the
human brain operates. The important elements of an ANN are that the neurons,
or nodes, are independent and simultaneously operating; they are
interconnected, feeding information between each other; and they are able to
learn based on input data and adapt the weights of their interconnections
(Toiviainen, 2000). The basic model of an ANN consists of a number of input and
output nodes that are connected to each other at different weights. As each input
node receives information, it passes it to the others for more processing and
outputs a result. The weights of the connections determine how much influence
the data has, and these weights adjust themselves as the data is acquired and
reviewed. If the processed output corresponds to the expected output from the
training, the connection weight is strengthened, and conversely if it is not the
expected output then the weight is weakened.
ANNs can be trained through data sets to learn what result a certain input
should obtain. Using the example of the data set above, an ANN could learn
correct semantics by having correct sentences “read” to it. By training on this
data, for example, “The dog hears a computer”, “A musician composes the music”,
“A computer makes the music”, “A dog hears the coffee”, the network can adjust
the weights of the connections between words, learning that certain words are
more likely to follow others, while some will never follow others, as in “The
coffee composes an dog”. This principle can be applied similarly in music.
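The training idea can be sketched as a simple word-transition weighting scheme in Python. This is a toy stand-in for a real ANN, not an implementation of one, but it shows how connection weights strengthen with exposure to correct examples:

```python
from collections import defaultdict

def train(corpus):
    """Strengthen the weight of each word-to-word connection seen in training."""
    weights = defaultdict(float)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            weights[(prev, nxt)] += 1.0   # reinforce observed connections
    return weights

corpus = [
    "the dog hears a computer",
    "a musician composes the music",
    "a computer makes the music",
    "a dog hears the coffee",
]
w = train(corpus)
print(w[("dog", "hears")])           # 2.0: seen twice, connection strengthened
print(w[("coffee", "composes")])     # 0.0: never seen, so never reinforced
```

A real network would also weaken connections on incorrect output rather than simply leaving them at zero, but the reinforcement direction is the same.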
Cypher uses a neural network in chord identification to determine “the
central pitch of a local harmonic area” (Rowe, 1993). To broadly summarize its
operations, it uses twelve input nodes, each corresponding to one pitch class
regardless of octave, which activate when their pitch is played. Each input node
then sends a message to the six different chord theories of which it could be a
part (based on triad formations). For example, if a C is played, it sends a “+”
message to the chord theories of C major, c minor, F major, f minor, Ab major,
and a minor. It also sends a “-” message to all the other chord theories. Doing this
with every note received, Cypher begins to determine what the harmonic area is
based on the most prevalent chords. This information is then fed into another
network to determine the key. The key theories most affected are those that
could be the tonic, dominant, or subdominant of the arriving chord. So, a C major
chord would send a “+” message to the key theories of C major, F major, f minor,
and G major, and a “-” message to the rest.
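The voting scheme can be illustrated with a small Python sketch. This is a loose reimplementation of the idea as summarized above, not Rowe’s actual code, and the chord-naming convention (uppercase major, lowercase minor) is only illustrative:

```python
# Each arriving pitch class votes "+" for every triad that contains it
# and "-" for the rest; the most-supported theory wins.

NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

def triads():
    """All 24 major/minor triad theories as pitch-class sets."""
    chords = {}
    for root in range(12):
        chords[NOTE_NAMES[root]]         = {root, (root + 4) % 12, (root + 7) % 12}
        chords[NOTE_NAMES[root].lower()] = {root, (root + 3) % 12, (root + 7) % 12}
    return chords

def vote(scores, pitch_class, chords):
    for name, members in chords.items():
        scores[name] += 1 if pitch_class in members else -1

chords = triads()
scores = {name: 0 for name in chords}
for pc in [0, 4, 7, 0]:          # C E G C arriving from the performer
    vote(scores, pc, chords)
best = max(scores, key=scores.get)
print(best)                       # "C": C major is the most supported theory
```

The same voting structure can then feed a second layer in which chords support key theories, as described for Cypher’s key network.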
As the computer continues to learn through observations of the musical
environment, the data can be stored into a database for retrieval. As new
information comes in, the system can analyze and reference it to the database,
making decisions based on the previous material. In this way, learning occurs
initially through Michalski’s definition, and then by Russell and Norvig’s. The
potential of what information the system extracts from its analysis is huge.
Anything that can be represented in a language understood by the computer is
possible, and the task then lies within the creativity of the system designer. In
addition to the note and rhythm examples already given, patterns could be found
in dynamics and volume, density of sound, speed, register, timbre, etc.
4. Architecture
Rowe’s Cypher consists of “two main components, the listener and the
player. The listener (or analysis section) characterizes performances
represented by streams of MIDI data. The player (or composition section)
generates and plays music material” (Rowe, 1993). Most important with regard
to an improvisation system is that Cypher listens and generates music in
real-time, without triggering previously recorded or sequenced material, and
without following a timeline-based score as a reference.
4a. Classification Paradigms
Rowe makes a distinction in the classification of interactive systems,
separating the paradigms between Score-driven and Performance-driven systems.
Score-driven systems:
“Use predetermined event collections, or stored musical fragments, to match against music arriving at the input. They are likely to organize events using the traditional categories of beat, meter, and tempo. Such categories allow the composer to preserve and employ familiar ways of thinking about temporal flow, such as specifying some events to occur on the downbeat of the next measure or at the end of every fourth bar.”
As compared to Performance-driven systems, which:
“Do not anticipate the realization of any particular score. In other words, they do not have a stored representation of the music they expect to find at the input. Further, performance-driven programs tend not to employ traditional metric categories but often use more general parameters, involving perceptual measures such as density and regularity, to describe the temporal behavior of music coming in” (Rowe, 1993).
The importance in making this distinction is in how the software handles the
incoming data regarding the live performer, and what techniques must be used
to respond. A score-driven system uses just that: a score, or some representation
of a score, programmed into the software for it to follow and to which the
incoming signal is matched. Just as a conductor will follow notes and rhythms as
indications as to where the players are, a score-based system is programmed to
also identify certain moments or characteristics to know where the player is,
such as pitches, intervals, rhythms, and phrases. A score-driven system can also
lead the performance, functioning from a clock and reacting at certain moments
according to the elapsed time since the beginning of the piece (or section, or
other defined onset). As these event markers are found,
the score-based system is programmed to perform a function associated with
certain events. For example, play x chord when the performer arrives at y note,
or add delay to this phrase, or harmonize this section, etc.
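The event-marker idea can be sketched as a toy score follower in Python. The marker pitches and action names here are hypothetical, purely for illustration:

```python
# A minimal score-driven sketch: the "score" is a list of marker pitches,
# each bound to an action to trigger when the performer reaches it.

score = [
    (64, "add delay"),         # when the performer arrives at E4...
    (67, "play chord"),        # ...then G4...
    (72, "harmonize section"), # ...then C5
]

def follow(incoming_pitches, score):
    """Advance through the score as each expected marker pitch arrives."""
    actions, position = [], 0
    for pitch in incoming_pitches:
        if position < len(score) and pitch == score[position][0]:
            actions.append(score[position][1])
            position += 1
    return actions

performed = [60, 62, 64, 65, 67, 69, 72]
print(follow(performed, score))
# ['add delay', 'play chord', 'harmonize section']
```

Real score followers must of course tolerate missed and wrong notes; this sketch only shows the marker-to-action binding.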
In contrast, the performance-driven system does not follow a score or have
any information about the specific performance pre-programmed. It does not
know, for example, that in measure 54 there will be a cadence leading to a key
change. These systems react based on other information they receive, the
specifics of which will be discussed later. Because performance-driven systems are not
dependent on prior knowledge of the upcoming music, these systems are clearly
better suited for an improvisational setting.
George Lewis, a jazz trombonist, began building and performing with his
interactive system, Voyager, in the late seventies. He says of it:
“The computer was regarded as ‘just another musician in the band.’ Hours were spent in the tweaking stage, listening to and adjusting the real-time output of the computer, searching for a range of behavior that was compatible with human musicians. By compatible, I mean that music transmits information about its source. An improviser
(anyone, really) takes the presence or absence of certain sonic activities as a guide to what is going on.
When I speak of musical ‘interaction’, I mean that the interaction takes place in the manner of two improvisers that have their own ‘personalities.’ The program’s extraction of important features from my activity is not reintroduced directly, but used to condition and guide a separate process of real-time algorithmic composition.
The performer interacts with the audible results of this process, just as the program interacts with the audible results of what I am thinking about musically; neither party to the communication has final authority to force a certain outcome; no one is ‘in charge.’ I communicate with such programs only by means of my own musical behavior” (Lewis, 1994).
This approach is a guideline on which the development of my interactive
system is based. The improviser and computer are independent of each other
with their own voice and musical personality. They are not directly controlling,
but rather interacting with and influencing each other, the same way in which a
human duo improvisation would occur. This exemplifies another paradigm, that
of Instrument vs. Player. In an instrumental system, the effect of the computer is
that of adding to and enhancing the input signal with the intention of being an
extension of it, much like many guitar effects pedals. The result is as though the
combined elements are one player and the music would be heard as a solo. In the
instrumental paradigm, the performer is controlling the direction of the
electronics. A player system could also behave like an instrumental system at
times, but the intention is to construct an artificial player with its own musical
presence, personality, and behavior. The degree to which it follows the input
signal varies, and in an improvisational setting neither performer nor computer
is controlling, but rather influencing each other. In this way, the result is more
like a duet (Rowe, 1993; Winkler, 1998). Voyager is an example of the Player
paradigm, which is the goal of the interactive music system described here.
Rowe identifies three stages of an interactive system’s processing chain:
sensing, where the input data is collected; processing, where the computer
interprets the information it has sensed and makes decisions based on it; and
response, where the system produces its own output (Rowe, 1993). From this
point these stages will be referred to respectively as the Listener, Analyzer, and
Composer components.
The elements of the interactive music system described here have been
designed for a monophonic wind instrument, specifically clarinet and bass
clarinet. With that in mind, there are certain characteristics that have developed
as a response to the particular needs of this instrument, as well as some that
have been neglected, such as addressing the possibilities offered by a polyphonic
instrument. There are some basic technical requirements that won’t be
discussed in much detail, but which should be stated.
First is a computer with the software Max/MSP from the company Cycling
’74,2 with which the patch will be written. A patch is the name for a program
written within Max/MSP. This is one of the most widely used applications for creating
live electronic music. One of the beneficial features is the ability to create
modular components. That is, an element designed to perform a certain task or
function can be created on its own as a separate patch and incorporated into
2 Max/MSP is commercially available from www.cycling74.com. A free application developed by Miller Puckette, the author of Max/MSP, is Pure Data (PD), available from www.puredata.info. PD functions very similarly to Max/MSP, but not without some differences. Most notable of these is the availability of third-party objects, some of which will be discussed here.
larger patches as a subpatch. Not only does this ease troubleshooting, by being
able to verify that individual modules work on their own, but it also encourages
sharing within the community of users. It is very common practice for small
objects, abstractions, or patches that one has created to be made available for
others to use in their own works. It can greatly reduce time consumption if an
object or patch already exists that will perform the task one needs it to, without
having to program it entirely oneself. Patches are also adaptable, so that if the
originally conceived function doesn’t operate in the exact way needed for a
different project, small modifications can be made to incorporate it correctly.
The modularity also enables one’s own work to be reused in future projects.
The second requirement is a soundcard capable of accepting two
microphone inputs, and the third is two microphones: a standard dynamic or
condenser mic, and a contact mic.
Fig. 3 shows an input chain utilizing the two microphones. MIC 1 is the
standard microphone for capturing the sound of the instrument and MIC 2 is the
contact microphone. A contact microphone is a special piezo that reacts to
vibrations rather than sound waves. The contact MIC 2 in Fig. 3 acts as a gate for
the signal from MIC 1. A threshold is set for MIC 2, as seen in the subpatch p vca
in Fig. 4, whereby any signal below the threshold closes the gate and no signal
from MIC 1 will pass. By placing the contact microphone on the instrument, it
will open the gate when the vibrations of the instrument exceed the threshold, as
when playing, and allow the signal from the standard MIC 1 to pass. Using this
method helps prevent extraneous room noise from passing through the
microphone, and can also be used to capture data more accurately.
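The gating idea of Fig. 3 can be sketched in Python. The real p vca subpatch operates on live audio in Max/MSP, so the per-sample threshold test below is only a conceptual stand-in for envelope-based gating, with an arbitrary threshold value:

```python
# Conceptual sketch of the two-microphone gate: MIC 1 is the sound signal,
# MIC 2 the contact mic on the instrument body.

def gate(mic1_block, mic2_block, threshold=0.05):
    """Pass MIC 1 samples only while the contact-mic (MIC 2) level exceeds
    the threshold; otherwise output silence."""
    return [m1 if abs(m2) > threshold else 0.0
            for m1, m2 in zip(mic1_block, mic2_block)]

room_noise = gate([0.2, 0.3, 0.1], [0.01, 0.02, 0.01])  # instrument at rest
playing    = gate([0.2, 0.3, 0.1], [0.5, 0.6, 0.4])     # instrument vibrating
print(room_noise)   # [0.0, 0.0, 0.0] -- gate closed, room noise rejected
print(playing)      # [0.2, 0.3, 0.1] -- gate open, signal passes
```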
Fig. 3- Input Chain
Fig. 4- p vca subpatch, developed by Jos Zwaanenburg3
3 Jos Zwaanenburg: http://web.mac.com/cmtnwt/iWeb/CMTNWT/Teachers/0D06AA24-D6CF-11DA-9F63-000A95C1C7A6.html
4b. Listener
The Listener is the stage of the system that collects the data from the input
signal, and it is here that the decision must be made of what the relevant data to
be collected is. Cypher uses the information from pitch, velocity, duration, and
onset time, represented in MIDI format. From this it makes other analytical
classifications like register, speed (horizontal density), single notes versus
chords (vertical density), and loudness. One of the major limitations of Cypher, as
it was written in the late eighties/early nineties, is the representation of data
only as MIDI. The MIDI protocol strips away other important elements such as
timbre, which can also supply information about the overtone partials in a pitch,
and noisiness and brightness of a sound. MIDI also principally limits the pitches
to the well-‐tempered scale, although extra Continuous Controller information
can be added to introduce pitch bends. Additionally, it doesn’t make use of the
live audio signal and therefore the Composer stage can only create pitch-based
music from digital synthesis and not from transformation of the original sound,
more of which will be discussed later.
Technology has advanced since the development of Cypher, and computers
today are much faster and hardware more sophisticated and capable of handling
DSP (Digital Signal Processing). DSP allows the analysis of an audio signal so that
timbral information can be included, as well as the representation of the true
pitch in hertz. Since DSP uses the live audio signal, it is also possible to affect it
in the Composer stage, adding transformational effects like delay, transposition
and harmonization, ring modulation, distortion, etc.
Using some Max/MSP objects such as analyzer~ created by Tristan Jehan4,
data can be extracted such as pitch, loudness, brightness, noisiness, Bark scale,
attack, and sinusoidal peaks of the partials. Pitch is represented in both hertz and
a decimalized MIDI note, which allows for either tempered or untempered use of
the data. For example, MIDI note 60.25 is equal to a C that is 25 cents sharp. Two
approaches to the use of the data can be taken, either noting the exact tuning of
the pitch, or the tempered note regardless of tuning discrepancies, depending on
the intended use. The loudness value measures the input signal volume in
decibels. Brightness is a timbral measure of the spectral centroid, or the
perceived brightness of the sound, whereas noisiness is a timbral measure of
spectral flatness, on a scale of 0 to 1: 0 is more “peaky,” like a pure sine
wave, whose energy is concentrated in a small number of peaks in the signal
spectrum, whereas 1 is more “noisy,” like white noise, where peaks at all
frequencies are of the same power and create a flat spectrum. The Bark scale
measures the loudness of certain frequency bands that are associated with
hearing (Zwicker and Fastl, 1990). An attack is reported whenever the loudness
increases by a specified amount within a specified time, and the sinusoidal peaks
of the partials report the frequencies and amplitudes of a specified number of
overtone partials in the signal.
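The decimal-MIDI representation mentioned above follows from the standard conversion between frequency and MIDI note number, assuming the common tuning reference A4 = 440 Hz = MIDI 69:

```python
import math

def hz_to_midi(hz):
    """Decimal MIDI note from frequency (A4 = 440 Hz = MIDI 69)."""
    return 69 + 12 * math.log2(hz / 440.0)

def midi_to_hz(note):
    return 440.0 * 2 ** ((note - 69) / 12)

note = hz_to_midi(265.42)          # a slightly sharp middle C
print(round(note, 2))              # 60.25: C, about 25 cents sharp
tempered = round(note)             # 60: nearest well-tempered pitch
cents = round((note - tempered) * 100)
print(tempered, cents)             # 60 25
```

Either the decimal value (exact tuning) or the rounded value (tempered pitch) can then be stored, depending on the intended use described above.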
Another object similar to analyzer~ is sigmund~, created by Miller
Puckette5. It provides some of the same data, although some of it is formatted or
functions differently. Pitch is available as a continuously outputted decimal MIDI
4 Tristan Jehan: http://web.media.mit.edu/~tristan/maxmsp.html
5 Miller Puckette: http://crca.ucsd.edu/~msp/software.html
note, but not as hertz; sigmund~ also has a parameter, notes, which outputs the
pitch at the beginning attack of a note rather than continuously. This can be
useful when dealing with an unstable pitch such as from a wind instrument,
which is making constant minute fluctuations, and the desired data is that of the
principal pitch. Loudness is reported, but as linear amplitude rather than as
decibels. Sinusoidal components are also available, but organized differently.
Sigmund~ outputs the sinusoids in order of amplitude, whereas analyzer~ does
so in order of frequency. This difference can affect which frequencies are
reported, depending on how many sinusoids are asked for. For example, if three
peaks are requested from each object, analyzer~ will output the lowest three
partials, but sigmund~ will output the three partials with the highest amplitude.
The choice of which to use again lies in how the data will be used. Sigmund~ does
not provide data for brightness, noisiness, attack, or Bark scale.
In addition to the inherent data available from analyzer~ and sigmund~, the
duration of a note can be calculated by measuring the time between the onset of
a note and when either the pitch changes or the volume drops to 0. Fig. 5
demonstrates receiving the data from midivelocity and, upon receipt of a
non-zero value, starting the timer. Midivelocity sends a zero at the end of every note and is
described in more detail in the discussion of the Analyzer component. When the
timer receives this zero message, it stops, and the time between start and
stop gives the duration of the note in milliseconds.
Fig. 5- Note Duration
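The logic of Fig. 5 can be modeled in a few lines of Python; the class and method names here are illustrative, not part of the patch:

```python
# Model of the note-duration timer: start on a nonzero velocity, stop on
# the zero that midivelocity sends at the end of every note.

class NoteDurationTimer:
    def __init__(self):
        self.onset = None

    def velocity(self, value, now_ms):
        """Feed velocity messages with their arrival time; returns the note
        duration in ms when a note ends, else None."""
        if value > 0 and self.onset is None:
            self.onset = now_ms          # note begins: start the timer
            return None
        if value == 0 and self.onset is not None:
            duration = now_ms - self.onset
            self.onset = None            # note ends: report and reset
            return duration
        return None

t = NoteDurationTimer()
t.velocity(80, 1000)          # onset at t = 1000 ms
print(t.velocity(0, 1450))    # 450 -- the note lasted 450 ms
```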
A common problem of computer electronics is that of pitch detection in
real-‐time. It is difficult for the computer to correctly analyze analog pitch,
especially at fast tempi. With MIDI controllers such as keyboards, EWIs
(Electronic Wind Instruments), or electronic percussion the MIDI information
can be transferred immediately and note names can be understood based on
which key or combination of keys is pressed. With an analog signal, the computer
must first try to interpret the pitch to determine what note it hears, which
creates latency. In a fast passage it is likely that the computer will miss or
misinterpret some notes. In relation to a “live” human duo improvisation, one
player will surely not be able to recreate every single note that the other has
played, but will understand the overall shape and idea. Young also recognizes the
need for a broader analysis as it pertains to freely improvised music (Young,
2008). Since the genre is not reliant on precise harmonic relationships and
rhythms, it is sometimes better to not focus on capturing every individual note,
but instead to focus on phrases.
Max/MSP allows for recording into a buffer~, a “storage space” for the
audio signal. Other objects can call upon the recording in the buffer for playback
and manipulations to the signal can be made. Buffers can be of different lengths,
but an initial choice must be made as to what that size will be. When the buffer
has been filled, it continues recording back at the beginning, overwriting the
previous contents. Making the size too small could potentially mean that
previously played and relevant material is no longer accessible, so it is better to
err on the large side. There is an upper limit, however, based on factors such as
the computer’s available memory. Fig. 6 shows a buffer of ten minutes called
improv1. When the Record to Buffer toggle is on, the signal is recorded, as shown
by the waveform, and the clocker object is started. The time from clocker
correlates to the current recording position in the buffer, buffertime, and this
data can be used to reference specific points of the recording. If the buffer
reaches the end and restarts at the beginning, clocker is reset as well.
Fig. 6- Recording Buffer
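The wrap-around behavior of buffer~ and clocker described above can be sketched outside Max/MSP. The following Python fragment is an illustration only (the class and method names are mine, not Max objects): a fixed-length buffer whose record head wraps around, and whose current position corresponds to buffertime.

```python
# Illustrative sketch of a wrap-around recording buffer (not a Max object).
class RecordingBuffer:
    def __init__(self, length_ms, samplerate=44100):
        self.samples = [0.0] * int(length_ms / 1000 * samplerate)
        self.samplerate = samplerate
        self.write_pos = 0       # current sample index, like clocker's readout
        self.wrapped = False     # True once old material starts being overwritten

    def record(self, block):
        for s in block:
            self.samples[self.write_pos] = s
            self.write_pos += 1
            if self.write_pos == len(self.samples):  # buffer full:
                self.write_pos = 0                   # restart at the beginning
                self.wrapped = True

    def buffertime_ms(self):
        # position of the record head, comparable to the clocker time
        return self.write_pos / self.samplerate * 1000
```

Once `wrapped` is True, material older than the buffer length has been overwritten, which is precisely why the global timestamp discussed below is also needed.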
A global time component can also be used, measuring the overall time from
the start of the performance. Fig. 7 demonstrates a simple way of achieving this.
The timer receives a bang from inlet1 to start counting. Inlet1 would be
connected to the Global Start, which could be the opening of the patch, or
another start button used to begin the patch for performance. Inlet2 receives a
bang at the beginning of each event, which causes timer to output the current
time in milliseconds. This timestamp can be used in the data collection as a way
to identify each event.
Fig. 7- Global Time
Rhythm is of course another important element of music that should be
discussed. Previous systems have devised methods of interpreting rhythms and
tempi. Rowe, Winkler, and Cope each discuss techniques to gather this
information in their books, to which I refer the interested reader. In the context
of free improvisation, however, the necessity for this exact information is less
important because the style is free from constraints of a unifying tempo and
meter. More important aspects are the general amount of activity within a period
of time (horizontal density), the time elapsed between events (delta time), and
the length of events (duration).
4c. Analyzer
From the Listener component the data needs to be sent for interpretation
in the Analyzer. In addition to analysis, this section will also create the database
for storage and retrieval. There is a multitude of ways to analyze the data
depending on what parameters are needed or desired for the Composer Section.
Fig. 8 shows a patch that analyzes for pitch, pitch class, interval, register, lowest
pitch, highest pitch, number of note occurrences, loudness, note duration, delta
time, and horizontal density, as well as the timbral characteristics brightness and
noisiness. Data for the beginning and ending of phrases, the globaltime, and
buffertime are also recorded. The characteristic descriptors are sent to individual
databases, a global (master) database, and a phrase database. As each new
phrase is completed, it is compared against the previous phrases to determine
which is the closest match.
There are four elements used for organizational purposes: an index, a phrase
number, and globaltime and buffertime stamps. The index is the counter in
the upper-left corner of Fig. 8, counting every single event as it occurs, received
from the object r midinote, which is sending from analyzer~ in another patch. To
the right is the phrasemarker subpatch shown in Fig. 9. Globaltime begins
counting at the start of the performance, activated here when the Record to
Buffer toggle from Fig. 6 is clicked, and does not stop for the entire duration of
the performance. Buffertime is similar, but is meant to keep a record of the
onset times of events happening in relation to the current position in the buffer.
The time will be the same as globaltime until the buffer is filled and starts over,
also resetting buffertime. The reason for tracking both times is precisely because
of this possibility. If, for example, the performance has exceeded the buffer length,
causing it to start over, but data from the previous cycle of the buffer needs to be
used, it can be referenced using the globaltime, as using buffertime could relate to
new data in the buffer. However, only referencing from globaltime will not be
effective if the necessity is to playback current material from the buffer. In this
case the position in the buffer from buffertime is needed.
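The relationship between the two timestamps can be sketched as follows. This Python fragment is an illustration of the logic, not part of the Max patch; it assumes a buffer that wraps every `buffer_len_ms` milliseconds, and the function name is mine.

```python
def buffer_position(event_globaltime, now_globaltime, buffer_len_ms):
    """Return the buffer offset (ms) that still holds the event's audio,
    or None if the buffer has since wrapped past it and overwritten it."""
    age = now_globaltime - event_globaltime
    if age >= buffer_len_ms:
        return None                              # audio overwritten by a newer cycle
    return event_globaltime % buffer_len_ms      # equals the stored buffertime stamp
```

Under this sketch, globaltime is the stable identifier for data lookup, while the returned offset is what a playback object would actually need.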
Fig. 8- Analyzer Component
The designer can independently determine what might constitute a phrase.
Rowe uses discontinuities in characteristics as an indication, with different
characteristics applying different weights in the determination of phrase
boundaries. He gives the example that discontinuities in timing are weighted
more heavily than those in dynamics; meaning changes of dynamics are less
likely to signal a phrase boundary than changes in the timing. When the amount
of change of the different features exceeds a threshold, a phrase is marked. He
also notes that, by the nature of this phrase finding, the discontinuities cannot be
found until they’ve already occurred (Rowe, 1993).
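Rowe's weighted-discontinuity idea can be sketched as follows. The features, weights, and threshold here are illustrative assumptions (not Rowe's actual values); the only constraint taken from the text is that timing changes weigh more heavily than dynamic changes.

```python
# Illustrative weights: timing discontinuities count more than dynamics.
WEIGHTS = {"delta_time": 3.0, "pitch": 1.5, "loudness": 0.5}

def is_phrase_boundary(prev_event, event, threshold=100.0):
    """Flag a boundary when the weighted sum of feature changes
    between consecutive events exceeds the threshold."""
    score = sum(WEIGHTS[f] * abs(event[f] - prev_event[f]) for f in WEIGHTS)
    return score > threshold
```

As Rowe notes, a boundary detected this way is always found after the fact: the discontinuity must already have occurred before it can be measured.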
Saxophonist and programmer Ben Carey uses silence as an indication of
phrase separation in his interactive system _derivations (Carey, 2011). When the
audio signal volume drops to 0, or another determined threshold, for a user-defined
length of time, a phrase marker can be introduced. Fig. 9 demonstrates a
method of achieving this in Max/MSP. The patch receives the loudness signal
named envelope. When the signal level drops to 0, it starts the clocker. If the
elapsed time reaches the threshold of 500 milliseconds, a bang is sent. This bang
indicates that a phrase has been finished, but it is also useful to know when
the next phrase begins. To indicate this, the bang is stored in onebang until a
non-zero value allows it to output, indicating the beginning of a new phrase. The
non-zero value also stops clocker, which then waits for another silence to begin
counting again.
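The logic of Fig. 9 can be sketched in Python as an offline illustration of the clocker/onebang behavior, using the 500 ms threshold from the text. The function and parameter names are mine, not Carey's.

```python
def mark_phrases(envelope, times_ms, silence_level=0.0, gap_ms=500):
    """envelope/times_ms: parallel lists of level readings and their times.
    Returns (phrase_end_time, next_phrase_start_time) pairs."""
    markers = []
    silence_began = None
    ended = False
    for t, level in zip(times_ms, envelope):
        if level <= silence_level:
            if silence_began is None:
                silence_began = t              # the clocker starts here
            if not ended and t - silence_began >= gap_ms:
                end_time = silence_began       # phrase ended when silence began
                ended = True                   # the "bang" is now held
        else:
            if ended:
                markers.append((end_time, t))  # non-zero releases it: new phrase
            silence_began = None
            ended = False
    return markers
```

Marking the phrase end at the onset of silence, rather than 500 ms later, keeps the stored phrase boundaries aligned with the audio in the buffer.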
Fig. 9- Phrase Marker
The note-related material is next to the right in Fig. 8, starting with those
concerning pitch. The first record is the actual pitch in MIDI note-number format.
Note 57, as shown in Fig. 8, corresponds to the pitch A3. The pitch class can then
be calculated, resulting in the pitch without regard to octave. It is shown in Fig. 8
as A-2 simply because Max does not have the capability to display the
note name without an octave indication, and -2 is the lowest octave. This display
is only for the benefit of the user to easily see the pitch class; the information
to be recorded is in numeric values, in this case 9 for the note A (C=0, C#=1, etc.).
The interval is calculated by subtracting the previous note from the current,
resulting in the number of semitones between them, and register is calculated by
dividing the pitch by 12; integer division yields a whole-number
classification of register. The lowest and highest pitch are recorded twice, both
globally and on a phrase-by-phrase basis, and using a histo keeps a record of the
number of times a note is played.
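These per-note derivations can be sketched in Python (an illustration of the Max operations, assuming the middle C = MIDI 60 convention, under which note 57 is A3):

```python
def describe(note, previous_note):
    """Derive per-event pitch descriptors from MIDI note numbers."""
    return {
        "pitch": note,
        "pitchclass": note % 12,           # 9 = A (C=0, C#=1, ...)
        "interval": note - previous_note,  # semitones from the previous note
        "register": note // 12,            # integer division gives the register
    }
```

The values match the database excerpts shown later: note 55 following 56 yields pitch class 7, interval -1, and register 4.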
Loudness is received from analyzer~ in decibel format, whereas the
midivelocity is in MIDI format. MIDI keyboards send note-on messages when a
key is depressed, but also a note-off message with a velocity of 0 upon its release.
Midivelocity is calculated with a note-off function so that it operates in the same
manner. A note-off is sent either when the note changes, when the volume from
envelope drops below a threshold (40 in Fig. 10), or when the volume increases by a
specified percentage after a specified time. The drop below the threshold
compensates for the fact that the envelope won't fall to 0 immediately
after the player stops, and so more accurately marks the note-off time. The
percentage threshold measures the envelope level every 50 milliseconds and
divides it by the previous value. If the increase is above the set percentage, then a
note-off is reported. The principle is similar to the attack data sent by analyzer~;
however, in analyzer~ it is measured by an increase in decibels within a given
time. The method described in Fig. 10 was developed with wind instruments in
mind, accounts for small spikes during tonguing, and was found to be more
accurate in reporting attacks. It allows for the note-off message not only with
staccato, but also with legato tonguing. An appropriate threshold should be
personalized for each player and instrument, however.
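The note-off conditions of Fig. 10 can be sketched as follows. The floor of 40 follows the text's example; the 50% rise figure and the function name are illustrative assumptions, and, as noted above, such thresholds should be tuned per player and instrument.

```python
def note_off(prev_note, note, prev_level, level, floor=40, rise_pct=50):
    """Decide whether a note-off should be reported for the previous note."""
    if note != prev_note:
        return True                       # the pitch changed
    if level < floor:
        return True                       # envelope dropped below the floor
    if prev_level > 0 and (level / prev_level - 1) * 100 >= rise_pct:
        return True                       # sudden rise: re-attack on the same pitch
    return False
```

The third condition is what catches legato tonguing: the envelope never falls to silence, but the articulation spike between repeated pitches still registers as a boundary.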
Fig. 10- Midi Velocity with Note-off
The velocity values with the note-off messages help to determine note
duration, as discussed earlier with Fig. 5. The delta time between the end of one
event and the beginning of the next can be calculated similarly with a timer. The
horizontal density is a measure of the number of notes that occur in a space of
time. Fig. 11 demonstrates calculating this by counting the number of notes in a
phrase and dividing the sum by the length of the phrase in milliseconds. The
multiplication by 1000 (yielding notes per second) and rounding to an integer
merely produce a more comparable number to assign to the phrase for classification.
Fig. 11- Horizontal Density
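The Fig. 11 calculation amounts to a one-liner (a Python restatement, not the patch itself): notes per phrase divided by phrase length in milliseconds, scaled by 1000 to notes per second, and rounded to an integer for easier phrase-to-phrase comparison.

```python
def horizontal_density(note_count, phrase_length_ms):
    """Notes per second for a phrase, rounded for classification."""
    return round(note_count / phrase_length_ms * 1000)
```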
The individual databases collect the information from every event for each
descriptor separately. They are kept in a coll database stamped with the indexing
number and the phrase to which they belong. The data in Fig. 12 shows an
example from the pitch database. The first numbers of each line, 10-‐20, indicate
the indexing number, the second indicates the phrase number, and the final
number is the pitch expressed as a MIDI note. Individual databases are kept for
pitch, pitch class, interval, register, loudness, duration, and deltatime. Highest and
lowest pitch, number of note occurrences, and horizontal density are already
statistical data, based on a broader spectrum, so they do not have their own coll.
Brightness and noisiness are likewise excluded from the individual databases
because their data flows continuously, rather than on a per-event basis, so they
are recorded in a different manner that will be described later.
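The layout of an individual coll can be modelled as a dictionary keyed by the event index, mirroring the layout in Fig. 12. This is an illustration of the data shape only, not Max's actual storage.

```python
# Each individual database maps event index -> (phrase number, value).
pitch_coll = {}

def store(coll, index, phrase, value):
    coll[index] = (phrase, value)

store(pitch_coll, 10, 2, 56)   # corresponds to "10, 2 56;" in Fig. 12
store(pitch_coll, 13, 3, 57)
```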
The master coll keeps all the individual data as well as timestamps from
globaltime and buffertime, organized by the index. The data in Fig. 13 reads index,
phrase, globaltime, buffertime, pitch, pitchclass, interval, register, loudness, note
duration, and deltatime.
One can see that some of the data doesn’t make sense, such as the duration
values for index 10. Fig. 13 shows a note duration of 0 and delta time of 0, yet a
difference of 519 between the start times of indices 10 and 11. There are a
couple of factors that can contribute to misleading data, one being complications
with the Listener component. Further adjustments need to be made in the input
chain by tweaking levels and thresholds to more accurately capture good data
and filter out mistakes.
A second contributing factor, although it does not appear to be the case in
this instance, is timing delay. Although data is flowing
extremely quickly in the computer, the patch still ultimately follows a series of
events, which can create slight inconsistencies. As the measurements are being
recorded in milliseconds, which are generally imperceptible, some amount of
leeway is acceptable.
A more holistic viewpoint was discussed earlier in the section about the
Listener component in regard to the nature of improvisation, an imperfect affair
anyway. While striving for accurate data is the goal, accepting the imperfections
can also bring a more “human” element. The comparison was made to a “live”
human duo setting, and the fact that one player will not obtain all the
information provided by the other, but will understand a more general idea of
the phrase. Rowe expresses that the point is not to “’reverse engineer’ human
listening but rather to capture enough musicianship.” With its phrase analysis,
the Analyzer can take this approach to interpreting what it hears as well. By
computing the averages of the characteristic descriptors for each phrase, a
generalized description can be rendered and assigned to each one.
The phrase coll is the largest database, keeping records of not only all the
characteristics held in the master coll, but also of the highest and lowest pitch,
horizontal density, brightness, noisiness, the global and buffer end timestamps,
and the phrase match and confidence level. For each of the descriptors, apart from
the timestamps and highest and lowest pitch, the means and standard deviations
are calculated for the phrase and stored in the phrase coll (Fig. 14), creating what
Thomas Ciufo calls a “perceptual identity” (Ciufo, 2005). At the end of each
phrase, these values are sent for comparison against the means and standard
deviations of all the previous phrases. The phrase with the most matches is
reported with a confidence level, the percentage of matches. This data is added to
the phrase coll as well as to its own separate matches coll to keep track of which
phrases matched to which descriptors for later retrieval.

10, 2 56;
11, 2 55;
12, 2 50;
13, 3 57;
14, 3 61;
15, 4 61;
16, 4 62;
17, 4 56;
18, 4 64;
19, 4 65;
20, 4 63;
Fig. 12- Pitch Coll Database

10, 2 17386 17386 56 8 -2 4 -28.907839 0 0;
11, 2 17905 17905 55 7 -1 4 -19.907631 228 0;
12, 2 18598 18598 50 2 -5 4 -27.446226 464 0;
13, 3 22499 22499 57 9 7 4 -19.360497 3436 3342;
14, 3 22826 22826 61 1 4 5 -24.470776 3436 3342;
15, 4 24033 24033 61 1 0 5 -34.994293 930 884;
16, 4 24359 24359 62 2 1 5 -31.124811 930 884;
17, 4 24729 24729 56 8 -6 4 -27.600847 930 884;
18, 4 25102 25102 64 4 8 5 -28.859121 696 0;
19, 4 25565 25565 65 5 1 5 -32.421593 271 0;
20, 4 25893 25893 63 3 -2 5 -31.064672 420 0;
Fig. 13- Master Coll Database
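The phrase-comparison step described above can be sketched as follows. Note that the matching criterion here, where a descriptor "matches" when the new phrase's mean falls within one standard deviation of the stored phrase's mean, is an assumption for illustration; the actual patch may use a different test. The best match is the stored phrase with the most matching descriptors, and the confidence level is the percentage of descriptors that matched.

```python
def best_match(new_phrase, stored_phrases):
    """Phrases are dicts of descriptor -> (mean, std).
    Returns (best phrase id, confidence as a percentage)."""
    best_id, best_count = None, -1
    for pid, stored in stored_phrases.items():
        count = sum(
            1 for d, (mean, std) in stored.items()
            if d in new_phrase and abs(new_phrase[d][0] - mean) <= std
        )
        if count > best_count:
            best_id, best_count = pid, count
    confidence = round(100 * best_count / len(new_phrase))
    return best_id, confidence
```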
Carey explores the concept of long-‐term memory with his _derivations. He
has incorporated the ability to save databases and load them into the system in
the future. This Rehearsal Database includes all the data that _derivations
gathered during a previous use of the system, as well as the saved recording
from the buffer. Loading previous databases allows the system to make use of
what it has learned before “with an already rich vocabulary of phrases and
spectral information” (Carey, 2011).
Fig. 14- Phrase Coll Database
Fig. 15- Phrase Matcher
The collection of information into the individual databases helps to create a
system that is learning based on Michalski’s definition, “constructing or
modifying representations of what is being experienced”. The incorporation of
the phrase-‐matching component is the starting point to also bring it in line with
Russell and Norvig’s definition, “behaving better as a result of experience”. The
arrival of information into the individual colls is akin to implicit learning, and
actively matching this against other memories exhibits explicit learning behavior.
The system has had, and has made notes of, previous experiences, and the
phrase-‐matching allows it to start comparing new experiences to the old ones
and make decisions based on what it has learned. For example, in Fig. 15 phrase
34 is best matched to phrase 19 with a confidence level of 25%. The Analyzer
could decide to use data from the matching parameters of phrases 34 and 19
(pitch, pitchclass, and brightness) to send to the Composer. Or, it could decide to
use the data from the non-‐matching parameters, or perhaps it decides to just use
data from brightness. Phrase matching could also use weighting to allow certain
descriptors to play a more dominant role in determining which phrases match.
Using the confidence level enables an additional level of matching, and the
Analyzer could choose to match data only with phrases that have a confidence
level at least as high. The means and standard deviations of the input signal
could also be calculated in real-‐time and analyzed in another instance of the
phrase matcher, calculating real-‐time matches to previous phrases
characteristics. The Analyzer could then determine, for instance, that the
performer is currently playing notes with short durations, and decide to
accompany by playing a phrase or phrase fragment from the buffer of
predominantly long notes. The possibilities are limited only by the
creativity and knowledge of the system developer.
The concern of bias from the developer was mentioned earlier, and it is
here and with the Composer component that it can be most evident. With the
Analyzer, the bias can result from the ways the system handles decision-‐making,
whereas with the Composer it could be from the sonic and musical aesthetic of
the developer, and what types of compositional techniques are used. Widmer
cautioned that the choice of representation language can introduce bias; here,
his warning applies to how the decision-making is programmed.
It is important to not create solely finite conditional statements (if x occurs,
then do y), as this leads to predictable behavior, not befitting an
improvisational system. A better condition would be: “if x occurs, then do y or z
or q or l or w, or…” etc., where each variable is an appropriate response to the x
condition. An example in a live improvisation is that Player 1 is improvising fast
notes, mainly in a lower register, but sometimes will play a long, high note.
Player 2 hears this high note as a unique musical idea that he wants to utilize,
and decides on possible options to do so, such as matching the long, high note; or
playing short, low notes; or harmonizing the note; or using it as a starting
note on which to base another phrase, etc. These decisions are all implicit
responses of Player 2 that will manifest themselves naturally during
improvisation. An even better condition would be to replace "if x occurs" with
"if x occurs a (randomly generated number) of times", and for each then
statement also to have variable factors, and then to have this entire
conditional if-then statement active only at some times.
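Such a variable conditional can be sketched as follows. The response names, the one-to-five trigger range, and the activation probability are all illustrative assumptions; the point is the structure: a randomized trigger count, intermittent activation, and a choice among several appropriate responses.

```python
import random

class ImprovRule:
    """'If x occurs a (random) number of times, sometimes do y or z or ...'"""
    def __init__(self, responses, active_prob=0.6):
        self.responses = responses
        self.active_prob = active_prob
        self.needed = random.randint(1, 5)   # "if x occurs n times"
        self.count = 0

    def on_event(self):
        self.count += 1
        if self.count < self.needed or random.random() > self.active_prob:
            return None                      # rule stays dormant this time
        self.count = 0
        self.needed = random.randint(1, 5)   # re-randomize the trigger count
        return random.choice(self.responses) # "then do y or z or q ..."
```

Each instance of such a rule becomes one tool in the toolbox described below: the system's responses remain appropriate, but never exactly repeatable.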
By making multiple instances of this type of condition available for different
actions, a toolbox is built up. The system will respond based on its
programmed knowledge, and therefore may react similarly to a previous time,
but never in the exact same way. It will be predictable in that its responses make
sense in the moment and sometimes will make the same decision as it had in
some previous instance, but unpredictable in what the output will be. This
exemplifies Levin’s quote previously stated in regard to improvisation:
“The fact of the matter is that you are who you have been in the process of being who you will be, and in nothing that you do will you suddenly, as an artist or a person, come out with something that you have never done before in any respect. There will be quite possibly
individual elements in a performance that are wildly and pathbreakingly different from anything that you’ve done before, but what about the rest and what kind of persona and consistency of an artist would you have if there was no way to connect these things…?” (Levin, 2007).
The system will have its own personality and sound, the same way that people
are able to hear Miles Davis, or John Coltrane, or any number of musicians, and
immediately know that it is them playing, even though they are not playing
anything exactly as they have played it before.
How the Analyzer makes the decisions of which action to take after making
an analysis, or of which if-then condition to activate, is tied also to the discussion
of improvisation. Discussed earlier was the fact that improvisers are aware of
larger, global-scale, explicit elements, but the fine details are just motoric,
implicit, responses. An interactive system can reconstruct this condition with the
use of constrained randomization.
John Cage experimented with randomness and indeterminacy in the forties
and fifties, using algorithmic and random procedures as compositional tools, to
select options or set musical parameters (Winkler, 1998). This is related to
improvisation in that the outcome is unknown until it happens. Algorithms are
not cognitive and thus cannot make creative decisions, but they can, however,
“produce non-arbitrary changes in state… manifest[ed] as a ‘decision’ when it
modifies the audio environment… [I]t has the affect of intention” (Young, 2008).
Young continues to say that the unpredictable output of both performer and
computer should not be achieved through “simple sonification of rules or sheer
randomness. There should be a critical engagement between intended
behaviours, an appraisal of potential behaviours and response to actual sonic
realisations and their unfolding history.” A certain amount of randomization
occurs during improvisation, but it is still within a context. The constraint is
what makes it still sound like music, as opposed to pure chaotic randomness. It is
very easy to generate completely random output within Max/MSP, but it is also
possible to use parameters to frame the randomization, as illustrated in the
several types of procedures in Fig. 16. Fig. 16c-i are part of a collection from
Karlheinz Essl.6 They provide useful expansions on randomization procedures.
Fig. 16a) generates a random integer between 0 and 9.
Fig. 16b) generates a random integer between 0 and 9, within 3 integers of the previous generation.
Fig. 16c) generates an integer between 0 and 9 where adjacent outputs are adjacent numbers.
Fig. 16d) generates an integer between 0 and 9, ensuring no immediate repetitions.
Fig. 16e) generates an integer between 0 and 9 with a 30% chance of repetition.
Fig. 16f) generates an integer between 0 and 9 without repeats until all numbers have been generated.
Fig. 16g) generates a floating-point decimal number between -10 and 9.99999.
Fig. 16h) uses the drunk object and will generate any float number up to 5 decimal places between -10 and 9.99999, using a Brownian linear scale.
Fig. 16i) generates an integer between 0 and 5 using a Markov chain, a table of transitional probability.
6 Karlheinz Essl: http://www.essl.at/
Fig. 16- Random procedures
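A few of these constrained procedures can be sketched in Python. These are illustrative stand-ins for the behaviors in Fig. 16, not ports of Essl's objects.

```python
import random

def random_walk(prev, lo=0, hi=9, step=3):        # cf. Fig. 16b
    """Random integer within `step` of the previous output, clamped to range."""
    return max(lo, min(hi, prev + random.randint(-step, step)))

def no_repeat(prev, lo=0, hi=9):                  # cf. Fig. 16d
    """Random integer with no immediate repetition."""
    n = random.randint(lo, hi)
    while n == prev:
        n = random.randint(lo, hi)
    return n

def series(lo=0, hi=9):                           # cf. Fig. 16f
    """All values once, in random order, before any repeats (a row)."""
    vals = list(range(lo, hi + 1))
    random.shuffle(vals)
    return vals

def markov_step(state, table):                    # cf. Fig. 16i
    """Next state drawn from a table of transitional probabilities."""
    choices, weights = zip(*table[state].items())
    return random.choices(choices, weights=weights)[0]
```

As noted below, feeding the Analyzer's data into the ranges of such generators (for instance, the current lowest and highest pitch) anchors the randomization to the musical performance.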
Some of the useful applications in music can already be seen, particularly
with Fig. 16c, which can generate stepwise motion, and Fig. 16f, which can
generate a twelve-‐tone row. All of the parameter settings, or arguments, given in
the descriptions of the figures represent those illustrated, but can all be changed.
The random generators are not limited to producing only numbers between 0
and 9. The arguments for each of these objects can be linked to the data collected
by the Analyzer to create randomizations that have a reference to the musical
performance. For example, the lowest pitch and highest pitch could be fed to the
between object in Fig. 16g to generate pitches within the same range.
Rowe uses another instance of an Analyzer in Cypher that listens to the
output of the Composer. He calls this the Critic. The decisions the Composer has
made about what music it will produce are sent to the Critic for analysis before
being sent to the sound generators, fitting Levelt's fourth process of speech
processing, self-monitoring and self-repair. This allows the system to make
modifications before actually creating the music. Rowe acknowledges that
“evaluating musical output can look like an arbitrary attempt to codify taste,”
and the capacity for the system to have “aesthetic decision making” skills is
“arbitrary”, and it needs “a set of rules [that] controls which changes will be
made to a block of music material exhibiting certain combinations of attributes”
(Rowe, 1993). This is again a viable source of bias. It could be argued that
including various rules helps to maintain musicality that a computer cannot
inherently have, but the counter-‐argument can easily be made as to how this
definition of musicality is written. It is again important that the reactions of the
Critic aren’t represented by strict rules, but the use of probability weights can
help maintain a learning paradigm. For example, if in one phrase the live
performer played loudly and the computer responded by playing quietly, the
Critic could increase the probability weight that the next time the performer
plays quietly, the computer will play loudly, as in a solo/comping exchange
situation. Representing this musical possibility as a strict rule would not be
conducive to improvisation, but incorporating it into the toolbox as one
possibility, with parameters that weigh how probable it is that the action is
appropriate, is.
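The probability-weight idea can be sketched as follows. The response names and the update amount are illustrative; the point is that the Critic nudges weights rather than enforcing a rule, so a successful exchange makes a response more likely, never mandatory.

```python
import random

# Candidate responses with adjustable probability weights.
weights = {"play_loudly": 1.0, "play_quietly": 1.0, "stay_silent": 1.0}

def reinforce(response, amount=0.25):
    """Critic judged this response effective: make it more likely next time."""
    weights[response] += amount

def choose_response():
    """Weighted random choice among the candidate responses."""
    options, w = zip(*weights.items())
    return random.choices(options, weights=w)[0]

reinforce("play_loudly")   # e.g. after a successful solo/comping exchange
```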
Another possible way to incorporate a critic is by analyzing the output of
the Composer with the response from the performer. In a duo improvisation,
each player is responding to each other, taking in what the other has played and
making musical comments, described by Hodson as “a self-‐altering process: the
musical materials improvised by each musician re-‐enter the system, potentially
serving as input to which the other performers may respond” (Hodson, 2007).
By analyzing how the live performer reacts to the computer, the system can
learn about its own composing as well, and what “works” or not. Decisions can
be made based on whether the performer is cooperating or trying to take the
music in a different direction. In this way, the critique is based on the
performance and interaction of the moment, rather than codified rules.
4d. Composer
“Improvisation defies clear definition. Even though most musicians have difficulty explaining what it is, many can tell you the basic way that they approach it. Unlike jazz, which often deals with the improvisatory rules in a kind of gamelike exchange of modes and melodies, electronic music often lacks the qualities of rhythm, harmony, and melody that many jazz musicians rely on. Instead, electronic music improvisation is sound: the shape of the envelope; timbre; rhythm; layers or filtering; effects (echo, delay, ring modulation, etc.); amplitude; and duration. A seasoned improviser learns how to listen to many layers of sound activity as part of a performance” (Holmes, 2002).
Thom Holmes’ quote gives important insight for the approach to
developing the Composer component of an electronic improvising system. Not
only is it applicable to electronic improvisation, but also to the genre of free
improvisation as a whole. Previous systems like Robert Rowe’s Cypher or George
Lewis’ Voyager created MIDI-based improvisations, which are focused on the
note and rhythm paradigm. With the DSP capabilities of today, the musical realm
for electronics is expanded exponentially. While pitch and rhythm are certainly
still appropriate musical considerations, the world of sound design, with the
ability to sculpt, manipulate, and synthesize, has become an equally viable option.
There are three types of compositional methods available to a computer:
sequencing, transformation, and generation (Rowe, 1993). Sequenced music is
predetermined in some way, traditionally as a MIDI sequence, but can also be
prerecorded audio that is triggered to play back. Algorithms that produce a fixed
response, such as those that do not use indeterminate variables, are also
considered sequenced. Transformation takes the original material and changes it
in some way to produce variations. This can range from obvious transformations,
like adding a trill to a note or passing the signal through effects like a ring
modulator, to more intricate variations like creating a retrograde inversion or
playing the signal backwards, to a complex re-synthesis of the entire sound
spectrum. Generative composition uses algorithms with very little source
material to produce music on its own. It could make use of information like a
scale set from which to choose pitches, but the lines produced are unique choices
from within the scale. Sound design techniques like additive or vector synthesis
are also generative composition. Within the context of improvisation,
transformative and generative composition are the most useful techniques and
will be the ones addressed here.
The options for the capabilities of the Composer are limitless. It is in the
development of this component, the building of the toolbox, that the designer’s
creativity can unleash. Some of the transformational techniques that Cypher is
capable of include:
Accelerator- shortens the durations between events.
Accenter- puts dynamic accents on some of the events in the event block.
Arpeggiator- unpacks chord events into collections of single-note events, where each of the new events contains one note from the original chord.
Backward- takes all the events in the incoming block and reverses their order.
Basser- plays the root of the leading chord identification theory, providing a simple bass line against the music being analyzed.
Chorder- will make a four-note chord from every event in the input block.
Decelerator- lengthens the duration between events.
Flattener- flattens out the rhythmic presentation of the input events, setting all offsets to 250ms and all durations to 200ms.
Glisser- adds short glissandi to the beginning of each event in the input block.
Gracer- appends a series of quick notes leading up to each event in the input block. Every event that comes in will have 3 new notes added before it.
Harmonizer- modifies the pitch content of the incoming event block to be consonant with the harmonic activity currently in the input.
Inverter- takes the events in the input block and moves them to pitches that are equidistant from some point of symmetry, on the opposite side of that point from where they started. All input events are inverted around the point of symmetry.
Looper- the loop module will repeat the events in the input block, taken as a whole.
Louder- adds crescendo to the events in the input block.
Obbligato- adds an obbligato line high in the pitch range to accompany harmonically whatever activity is happening below it.
Ornamenter- adds small, rapid figures encircling each event in the input block.
Phrase- temporally separates groups of events in the input block.
Quieter- adds decrescendo to the events in the input block.
Sawer- adds four pitches to each input event, in a kind of sawtooth pattern.
Solo- is the first step in the development of a fourth kind of algorithmic style, lying between the transformative and purely generative techniques.
Stretcher- affects the duration of events in the input block, stretching them beyond their original length.
Swinger- modifies the offset time of events in the input block. The state variable swing is multiplied with the offset of every other event; a value of swing equaling two will produce the 2:1 swing feel in originally equally spaced events.
Thinner- reduces the density of events in the input block.
TightenUp- aligns events in the input block with the beat boundary.
Transposer- changes the pitch level of all the events in the input block by some constant amount.
Tremolizer- adds three new events to each event in the input block. New events have a constant offset of 100ms, surrounding the pitch with either two new above and one below, or two new below and one above.
Triller- adds four new events to each event in the input block as a trill either above or below the original pitch. (Rowe, 1993)
These transformations are rather easy to accomplish within the MIDI domain,
but many can also be applied in DSP. Of Rowe’s transformational techniques, the
ones that are easily accomplished in direct relation to a phrase can be put into
three categories: time-domain, pitch-domain, and volume-domain. Those in the
time-domain include: accelerator, decelerator, looper, phrase, and stretcher;
pitch-domain include: chorder, harmonizer, inverter, and transposer; and volume-domain
are: louder and quieter. Backward is also an easy time transformation,
but functions differently than Rowe's. Rather than a retrograde as he describes, it
is possible to play backwards like spinning a vinyl LP record backwards. A
retrograde is also possible, but a more complicated task that will be discussed
later.
Time-stretching is possible using objects such as the supervp~ (Super Phase
Vocoder) collection7 and grainstretch~8, allowing for speeding up or slowing
down audio in the buffer without changing the pitch. These objects, as well as
native objects like groove~, can also be used for looping, phrase-making, and
backwards playback. Supervp~ and grainstretch~ are also capable of pitch-shifting
for harmonizing and transposition. Other Fast Fourier Transform (FFT)
objects like gizmo~ also perform pitch-shifting, and can be used for inversions.
This can be accomplished with the same process used to create a MIDI
inversion, shown in Fig. 17. This patch functions just as Rowe describes, inverted
around middle C, or MIDI note 60. In this example a G (MIDI note 79) is played,
⁷ SuperVP is available from IRCAM: http://anasynth.ircam.fr/home/english/software/supervp
⁸ Grainstretch~ was written by Timo Rozendal: http://www.timorozendal.nl/?p=456
nineteen semitones above middle C, which is then inverted to an F (MIDI note 41), nineteen semitones below. The pitches are converted to their frequencies in
hertz, and the inverted pitch is divided by the original to find the transposition
factor. This value is sent to gizmo~ (inside the pfft~ patcher) to transpose the
incoming signal from the performer, producing an inverted accompaniment. The
crescendo and decrescendo volume transformations are as easy as increasing or
decreasing the amplitude over the length of the phrase playback.
Fig. 17- FFT Inversion
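The arithmetic behind the patch can be sketched as follows, assuming standard equal temperament (A4 = 440 Hz); the function names are mine, not labels from the patch.

```python
def midi_to_hz(note):
    """Equal-tempered MIDI note number to frequency in hertz (A4 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)

def inversion_factor(note, axis=60):
    """Transposition factor for a pitch shifter (such as gizmo~) that
    inverts the incoming pitch around the axis, middle C by default."""
    inverted = 2 * axis - note           # mirror the pitch around the axis
    return midi_to_hz(inverted) / midi_to_hz(note)

# G above middle C (MIDI 79) mirrors to F below (MIDI 41),
# nineteen semitones on either side of the axis.
factor = inversion_factor(79)            # 2 ** (-38 / 12), roughly 0.111
```

Sending this factor to the pitch shifter transposes the incoming signal down by thirty-eight semitones, turning the played G into the inverted F.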
The other transformations Rowe uses, such as the retrograde, require
adjustments to individual events within a phrase. The transformations can be
applied similarly, but either data from the individual colls needs to be accessed to
determine where the events occur within the buffer, or other techniques need to
be used to manipulate the individual notes.
The examples of the objects above in the time and pitch domains can also be used in much more creative ways using DSP. The supervp~ objects have many options for cross-synthesizing one signal with another for vocoding and filtering applications, and grainstretch~'s granular transformations can create a wealth of possibilities. The sinusoidal data from sigmund~ can also be used in a transformational manner with a generative aspect. Fig. 18 demonstrates a simple synthesizer that uses oscillators to generate sine waves using the
frequencies and amplitudes of the overtones from the input signal. Each
frequency can also be transposed individually, or on a global level, and the
amplitudes can be swapped to different frequencies. The drunksposition
subpatch uses a random generator that can give a vibrato effect, with varying
degrees of speed and width, using a transposition function. This synthesizer
could be used as an effect on the input signal or using a phrase from the buffer.
Other typical effects, such as delay, distortion, ring modulation, chorus, flanger, and envelope filters, are also transformational options for the Composer and can all easily be added to the signal chain.
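The resynthesis idea can be sketched outside Max/MSP as plain additive synthesis; the partial list below stands in for sigmund~'s reported peaks, and the drunk-walk factor is only a loose text analogue of the drunksposition subpatch, whose actual implementation may differ.

```python
import math
import random

def additive_synth(partials, duration=0.1, sr=44100, transpose=1.0):
    """Sum one sine oscillator per (frequency, amplitude) pair, with an
    optional global transposition applied to every partial."""
    n = int(duration * sr)
    out = [0.0] * n
    for freq, amp in partials:
        f = freq * transpose
        for i in range(n):
            out[i] += amp * math.sin(2 * math.pi * f * i / sr)
    return out

def drunk_factor(base=1.0, width=0.01):
    """A small random transposition step around base, giving a
    vibrato-like drift when called repeatedly."""
    return base * (1.0 + random.uniform(-width, width))

partials = [(220.0, 0.5), (440.0, 0.25), (660.0, 0.125)]  # mock analysis peaks
signal = additive_synth(partials, transpose=drunk_factor())
```

Per-partial transposition, or swapping amplitudes between frequencies, would only require varying `f` or `amp` inside the loop rather than applying one global factor.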
Generative composition uses the completion of processes and algorithms to create music. Pre-existing material is not necessary, but the generation can be based on set parameters. Fig. 16f is an example of a generative algorithm that, when the max is set to 12, would produce the numbers for a twelve-tone serial
row. Using these as MIDI pitch classes, octave displacements could be made and
the notes sent to sound generators for further realization. The pitches could
easily be played as MIDI output, or converted to frequencies and sent to other
generators, like one of the oscillators of Fig. 18.
Fig. 18- Overtone Synth
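The idea of a generative twelve-tone source can be sketched with a simple shuffle; I am assuming Fig. 16f amounts to drawing each of the numbers 0-11 exactly once in random order, so this stands in for, rather than reproduces, the algorithm shown there.

```python
import random

def serial_row(n=12):
    """Draw each pitch class 0..n-1 exactly once, in random order.
    With n = 12 this yields the numbers for a twelve-tone serial row."""
    row = list(range(n))
    random.shuffle(row)
    return row

def realize(row, octave=5):
    """A simple realization: map each pitch class into one octave of
    MIDI notes; octave displacement could be randomized instead."""
    return [12 * octave + pc for pc in row]

row = serial_row()
notes = realize(row)
```

The resulting MIDI notes could be sent out directly or converted to frequencies for an oscillator bank like the one in Fig. 18.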
Similar formalisms can be used for timing. Using Brownian motion from Fig.
16h, Essl also created a patch to generate rhythms. In Fig. 19, a sound is
produced every 51-1000 milliseconds (entry delays, ED). The ED-value of 12 indicates that there are twelve permutations available (the row index), each assigned to a value between 51 and 1000 milliseconds. The Brown factor determines how close each output is to the previous generation, 0 creating a constant and 1 creating
pure randomness. Fig. 20 combines these components to generate notes with a
rhythm and articulation. The rhythm generator is enhanced with the durations,
so that it creates notes that occur within a space of time from each other, but also
last differing amounts of time. The pitch and durations are sent to a MIDI
soundbank, an oscillator synthesizer, or both simultaneously. Arguments for
these randomization modules can be taken from data from the Analyzer to make
the output more relevant to the input signal. Further, the expansion of the
toolbox can continue to enhance the generation from the Composer, such as by
including data in regard to scales and modes. From this, the melody generator
could have a more limiting set from which to compose, and formulas for
rhythmic composition could create a more metered pulse.
Fig. 19 - Essl Brownian Rhythm Generator
Fig. 20- Essl Brownian Pitch-Rhythm-Articulation Generator
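The Brownian behaviour can be sketched as a bounded drunk walk; the step rule below is my assumption about how the Brown factor scales the walk, not a transcription of Essl's patch.

```python
import random

def brown_step(previous, brown, low=51.0, high=1000.0):
    """One step of a bounded drunk walk between low and high.
    brown = 0 repeats the previous value; brown = 1 allows a jump
    anywhere in the range (pure randomness)."""
    step = brown * (high - low)
    value = previous + random.uniform(-step, step)
    return min(high, max(low, value))

def entry_delays(count, brown=0.2, start=300.0):
    """Successive entry delays (ED) in milliseconds, each a Brownian
    step away from the last, after the Fig. 19 rhythm generator."""
    delays, current = [], start
    for _ in range(count):
        current = brown_step(current, brown)
        delays.append(current)
    return delays

eds = entry_delays(8, brown=0.2)   # eight EDs, all within 51-1000 ms
```

Feeding values from the Analyzer in as `start`, `brown`, or the range bounds is one way such a module could be made responsive to the input signal.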
Besides note-based synthesis, Max/MSP is also capable of soundscape
creation. One simple example is Fig. 21 from Alessandro Cipriani and Maurizio
Giri’s book Electronic Music and Sound Design demonstrating a white noise
generator with a frequency filter. Adjusting the parameters of the filter creates a
wide spectrum of sonic variety. Other synthesis can be produced through
combining and manipulating oscillators of different waveform shapes (sine,
sawtooth, square, triangle), used in conjunction with envelope filters. Combining,
layering, and using the output from one compositional element to affect and
influence another are all methods to further create interesting results. The
output from these soundscape generations can also be used for cross-synthesis
transformation with the input signal or the buffer. The possibilities of sound
design within Max/MSP are huge, and discussing them all is beyond the scope of
this paper. For further study, I refer the interested reader to Cipriani and Giri’s
book.
Fig. 21- Cipriani/Giri- Noise Filtering
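A text-only analogue of that patch, white noise through a single one-pole low-pass, can suggest how much variety one filter parameter already yields; this is my own sketch of the idea, not Cipriani and Giri's actual example.

```python
import random

def filtered_noise(n, cutoff=0.1, seed=None):
    """White noise through a one-pole low-pass smoother.
    cutoff in (0, 1]: small values darken the noise, 1 leaves it raw."""
    rng = random.Random(seed)
    out, y = [], 0.0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)     # white noise sample
        y += cutoff * (x - y)          # one-pole low-pass step
        out.append(y)
    return out

dark = filtered_noise(4410, cutoff=0.05, seed=1)    # muffled rumble
bright = filtered_noise(4410, cutoff=1.0, seed=1)   # unfiltered noise
```

Sweeping `cutoff` over time, or driving it from the Analyzer's data, would already turn this single generator into an evolving soundscape element.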
This section has discussed the design structure and architectural
requirements for an improvisational system. Differences between score-driven and performance-driven paradigms, as well as instrumental and player
paradigms, were described as models for the interactive system. The
architecture was defined in three components, the Listener, Analyzer, and
Composer. The Listener accepts and collects the input; the Analyzer processes, makes decisions about, and stores the data; and the Composer produces music either sequentially, transformationally, or generatively. The incorporation of constrained indeterminacy helps to maintain an improvisational yet musically relevant nature.
5. Conclusion
The focus of this paper has been on the development of an interactive
electronics system for improvised music. It has considered how the use of
electronics has evolved over time and its role in music. There was discussion
about the nature of improvisation and brain processes relating to cognition
while playing, and it was learned that improvising is an automatic response
based on learned elements in one’s musical “toolbox”. The concept of learning as
a basis for intelligence was then discussed, along with ways that this can be
achieved artificially with a computer. After these theoretical constructs were
gathered, the development of the software system itself was examined.
With performance-driven, player paradigms identified as the best approaches for interactive improvisation, Robert Rowe's Cypher was used as a model and point
of discussion. The components of the Listener, Analyzer, and Composer of my
own interactive system were analyzed with reference to what was discovered
about improvisation and learning. By creating a database and referencing new
knowledge to it, the computer is able to learn and make informed choices. By
building a “toolbox” of musical knowledge, coupled with constrained
indeterminacy, the system is able to make music in the same theoretical manner
as improvising musicians.
Further developments in my own system need to include expanding on the
Composer and building more compositional tools for it to use. This can become
daunting as the options and possibilities are so numerous. It is important to have
a diverse toolbox for the system to work from to keep the music fresh and from
becoming predictable, but it is also very easy to become trapped in a state of
trying to incorporate every little thing possible, using all sorts of different
generational and transformational techniques. On the one hand, the larger the toolbox, the less prone to repetition of sonic character the system will be. On the other hand, the model of human improvisers shows that such repetition is the reality of improvisation. Although there is a plethora of recombinations from the toolbox
possible, the fact remains that there is virtually nothing an improviser will play
that he hasn’t played in some way before. So a compromising balance in the
system development has to be struck to account for this. Once more
compositional elements have been built, I need to focus again on the Analyzer
and determine the best ways for it to communicate to the Composer. I still need
to develop the decision-making tools that determine how it will use the learned data to
respond in a musical manner. Further development of the analysis itself can still
be done as well. I’d like to look more into the use of probability equations and
neural networking as learning tools to integrate into the system. Refinements
can also be made to the input chain, finding the best settings for correct data
collection and responsiveness.
I am also interested in exploring non-auditory communication within improvisation. Eye-contact and other visual cues can also be important aspects of musical communication, and it might be possible to include them in the system via
Jitter, the visual component of Max/MSP. There are tools capable of shape and
color tracking using just the built-in web-camera of a laptop with Jitter, so the
possibility of integrating visual cues is certainly there. Further research would
need to be done as to the best way to do this within the framework of
improvisation. I imagine the research would be in regard to what visual cues
different improvisers notice from their fellow musicians, and how they interpret
them. I can also see this line of development as becoming extremely complex, as
subtle visual cues can also be very subjective and vary between people, so the
focus of how this information would be used in an interactive improvisation
would need to be defined.
My goal in developing this system is initially for my own use as a solo tool,
but I would also like to expand it for use in my electro-acoustic improvisation
duo with a saxophonist, and then possibly for an even larger ensemble. One way
to do this would simply be to use two instances of the patch, but this is more
likely to result in three separate duos performing at once, that of
clarinet/electronics 1, saxophone/electronics 2, and clarinet/saxophone. The
two electronics systems would not be communicating directly with each other,
nor with the other player. For more coherence, it would be best for all the information to be fed to a central point somewhere in the chain, so that the final result would be either a full trio or a quartet ensemble. The difference would be whether
the electronics are designed to be two separate systems, each interacting with a
live performer, but as well as with each other to create a quartet; or one
electronic system responding to the live performers equally and creating a trio.
I anticipate it would take about another year to fully develop the patch in
the direction I’m currently taking with it, and perhaps a little more time to really
test and tweak it. Expanding it for multiple players might take another few
months of developmental work, and the inclusion of video, with all the
possibilities it introduces and the research needed to find the best ways to
include it, could easily add another year. Once the system is done I would allow it
to be distributed to other electro-acoustic improvisers to use, pending any
licensing restrictions with any third party objects or abstractions that are used.
However, I also hope that this paper has been informative enough to help guide
people in building their own systems, for those so inclined. As mentioned in the
paper, there will be an inherent bias imposed by the developer influencing the
output, so the more people that build their own systems, the broader the
repertoire on the whole becomes.
References
Bartók, Béla. 1976. “Mechanical Music” in Béla Bartók Essays, ed. Benjamin Suchoff. London: Faber & Faber
Bench-Capon, T.J.M. 1990. Knowledge Representation: An Approach to Artificial Intelligence. London: Academic Press
Berkowitz, Aaron L. 2010. The Improvising Mind: Cognition and Creativity in the Musical Moment. New York: Oxford University Press
Berliner, Paul. 1994. Thinking in Jazz: The Infinite Art of Improvisation. Chicago: University of Chicago Press
Bilson, Malcolm. 2007. Interview by Aaron L. Berkowitz, Ithaca, NY, August 12
Carey, Ben. 2011. Email discussions throughout 2011-2012. Ben Carey Website. Retrieved March 7, 2012. http://www.bencarey.net/#25f/custom_plain
Chomsky, Noam. 1957. Syntactic Structures. The Hague: Mouton
Cipriani, Alessandro and Maurizio Giri. 2009. Electronic Music and Sound Design: Theory and Practice with Max/MSP, volume 1, trans. David Stutz, 2010. Rome: ConTempoNet s.a.s.
Ciufo, Thomas. 2005. “Beginners Mind: An Environment for Sonic Improvisation” in International Computer Music Conference Proceedings
Cope, David. 1977. New Music Composition. New York: Schirmer Books.
Csikszentmihályi, Mihály and Grant Jewell Rich. 1997. “Musical Improvisation: A Systems Approach,” in Creativity in Performance, ed. Keith Sawyer. Greenwich: Ablex Publishing
Czerny, Carl. 1836. A Systematic Introduction to Improvisation on the Pianoforte, Op.200, Vienna, trans. and ed. Alice L Mitchell, 1983. New York: Longman
Czerny, Carl. 1839. Letters to a Young Lady on the Art of Playing the Pianoforte, from the Earliest Rudiments to the Highest Stage of Cultivation, Vienna, trans. J.A. Hamilton, 1851. New York: Firth, Pond and Co.
Dannenberg, Roger. 2000. “Dynamic Programming for Interactive Systems” in Readings in Music and Artificial Intelligence, ed. Eduardo Reck Miranda. Amsterdam: Harwood Academic Publishers
Ellis, Nick. 1994. “Implicit and Explicit Language Learning - An Overview,” in Implicit and Explicit Learning of Languages, ed. Nick Ellis. London: Academic Press
Eysenck, Michael W. and Mark T. Keane. 2005. Cognitive Psychology: A Student’s Handbook, 5th edn. East Sussex: Psychology Press
Gass, Susan M. and Larry Selinker. 2008. Second Language Acquisition, An Introductory Course, 3rd edn. New York: Routledge
Hodson, Robert. 2007. Interaction, Improvisation, and Interplay in Jazz. New York: Routledge
Holmes, Thom. 2002. Electronic and Experimental Music, second edition. New York: Routledge
Levelt, Willem J.T. 1989. Speaking. Cambridge: MIT Press
Levin, Robert. 2005. “Lecture 8,” Harvard University Course “Literature and Arts B-52: Mozart’s Piano Concertos,” Sanders Theater, Harvard University, Cambridge, MA, October 14
Levin, Robert. 2007. Interview by Aaron L. Berkowitz, Cambridge, MA, September 10
Luger, G.F. and W.A. Stubblefield. 1989. Artificial Intelligence and the Design of Expert Systems. Redwood City: Benjamin/Cummings
Manning, Peter. 2004. Electronic and Computer Music. New York: Oxford University Press
Marsden, Alan. 2000. “Music, Intelligence and Artificiality” in Readings in Music and Artificial Intelligence, ed. Eduardo Reck Miranda. Amsterdam: Harwood Academic Publishers
Meyer, Leonard. 1989. Style and Music: Theory, History, and Ideology. Philadelphia: University of Pennsylvania Press
Michalski, R.S. 1986. “Understanding the Nature of Learning: Issues and Research Directions” in Machine Learning: An Artificial Approach, vol. II, eds. R.S. Michalski, T. Mitchell, and J. Carbonell. Los Altos, CA: Morgan Kaufmann
Miranda, Eduardo Reck. 2000. “Regarding Music, Machines, Intelligence and the Brain: An Intro to Music and AI” in Readings in Music and Artificial Intelligence, ed. Eduardo Reck Miranda. Amsterdam: Harwood Academic Publishers
Nardone, Patricia L. 1997. “The Experience of Improvisation in Music: A Phenomenological Psychological Analysis,” PhD diss., Saybrook Institute
Nettl, Bruno. 1974. “Thoughts on Improvisation: A Comparative Approach,” The Musical Quarterly 60
Paradis, Michel. 1994. “Neurolinguistic Aspects of Implicit and Explicit Memory: Implications for Bilingualism and SLA,” in Implicit and Explicit Learning of Languages, ed. Nick Ellis. London: Academic Press
Pratella, Balilla. 1910. “Manifesto of Futurist Musicians”. Milan: Open statement
Pratella, Balilla. 1911. “Technical Manifesto of Futurist Music”. Milan: Open statement
Pressing, Jeff. 1984. “Cognitive Processes in Improvisation,” in Cognitive Processes in the Perception of Art, eds. W. Ray Crozier and Anthony J. Chapman. Amsterdam: Elsevier
Pressing, Jeff. 1998. “Psychological Constraints on Improvisational Expertise and Communication,” in In the Course of Performance: Studies in the World of Musical Improvisation, eds. Bruno Nettl and Melinda Russell. Chicago: University of Chicago Press
Reber, Arthur. 1993. Implicit Learning and Tacit Knowledge: An Essay on the Cognitive Unconscious. New York: Oxford University Press
Rolland, Pierre-Yves and Jean-Gabriel Ganascia. 2000. “Musical Pattern Extraction and Similarity Assessment” in Readings in Music and Artificial Intelligence, ed. Eduardo Reck Miranda. Amsterdam: Harwood Academic Publishers
Rowe, Robert. 1993. Interactive Music Systems. Cambridge: MIT Press
Russell, S.J. and P. Norvig. 1995. Artificial Intelligence: A Modern Approach. Englewood Cliffs, NJ: Prentice Hall
Russolo, Luigi. 1913. “The Art of Noises”. Milan: Open statement to Balilla Pratella
Schenker, Heinrich. 1954. Harmony, trans. Elisabeth Mann Borgese. Chicago: University of Chicago Press
Simon, H. and R.K. Sumner. 1968. “Patterns in Music” in Formal Representations of Human Judgement. New York: John Wiley & Sons
Toiviainen, Petri. 2000. “Symbolic AI versus Connectionism in Music Research” in Readings in Music and Artificial Intelligence, ed. Eduardo Reck Miranda. Amsterdam: Harwood Academic Publishers
Widmer, Gerhard. 2000. “On the Potential of Machine Learning for Music Research” in Readings in Music and Artificial Intelligence, ed. Eduardo Reck Miranda. Amsterdam: Harwood Academic Publishers
Wiggins, Geraint and Alan Smaill. 2000. “Musical Knowledge: What can AI bring to the musician?” in Readings in Music and Artificial Intelligence, ed. Eduardo Reck Miranda. Amsterdam: Harwood Academic Publishers
Winkler, Todd. 1998. Composing Interactive Music: Techniques and Ideas Using Max. Cambridge: MIT Press
Young, Michael. 2008. “NN Music: Improvising with a ‘Living’ Computer” in CMMR 2007, LNCS 4969, eds. R. Kronland-Martinet, S. Ystad, and K. Jensen. Berlin Heidelberg: Springer-Verlag
Zwicker, E. and H. Fastl. 1990. Psychoacoustics, Facts and Models. Berlin: Springer Verlag