
A Computer Model for the Schillinger System of Musical Composition

Matthew Rankin

A thesis submitted in partial fulfillment of the degree of

Bachelor of Science (Honours) at
The Department of Computer Science

Australian National University

August 2012


© Matthew Rankin


Except where otherwise indicated, this thesis is my own original work.

Matthew Rankin
28 August 2012


Acknowledgements

The author wishes to sincerely thank Dr. Henry Gardner for his extremely valuable assistance, insight and encouragement; Dr. Ben Swift also for his continuous encouragement and academic mentorship; Jim Cotter for igniting what was a smouldering interest in algorithmic composition and more recently providing participants for the listening experiment; and Mia for her unyielding, belligerent optimism.



Abstract

A system for the automated composition of music utilising the procedures of Joseph Schillinger has been constructed. Schillinger was a well-known music theorist and composition teacher in New York between the first and second World Wars who developed a formalism later published as The Schillinger System of Musical Composition [Schillinger 1978]. In the past the theories contained in these volumes have generally not been treated in a sufficiently rigorous fashion to enable the automatic generation of music, partly because they contain mathematical errors, notational inconsistencies and elements of ‘pseudo-science’ [Backus 1960]. This thesis presents ways of resolving these issues and a computer system which can generate compositions using Schillinger’s formalism. By means of the analysis of data gathered from a rigorous listening survey and the results from an automatic genre classifier, the output of the system has been validated as possessing intrinsic musical merit and containing a reasonable degree of stylistic diversity within the broad categories of Jazz and Western Classical music. These results are encouraging, and warrant further development of the software into a flexible tool for composers and content creators.


Contents

Acknowledgements

Abstract

1 Background
   1.1 Introduction
   1.2 Introduction to the Schillinger System
      1.2.1 Schillinger in Computer-aided Composition Literature
      1.2.2 Motivation
      1.2.3 Criticism
   1.3 Summary of this Thesis

2 Overview of Computer-aided Composition
   2.1 Dominant Paradigms in Computer-aided Composition
      2.1.1 Style Imitation versus Genuine Composition
      2.1.2 Push-button versus Interactive
      2.1.3 Data-driven versus Knowledge-engineered
      2.1.4 Musical Domain Knowledge versus Emergent Behaviour
   2.2 Formal Computational Approaches
      2.2.1 Markov Models
      2.2.2 Artificial Neural Networks
      2.2.3 Generative Grammars and Finite State Automata
      2.2.4 Case-based Reasoning and Fuzzy Logic
      2.2.5 Evolutionary Algorithms
      2.2.6 Chaos and Fractals
      2.2.7 Cellular Automata
      2.2.8 Swarm Algorithms
   2.3 The Automated Schillinger System in Context

3 Implementation of the Schillinger System
   3.1 Introduction
      3.1.1 A Brief Refresher
      3.1.2 The Impromptu Environment
   3.2 Theory of Rhythm
      3.2.1 Rhythms from Interference Patterns
      3.2.2 Synchronisation of Multiple Patterns
      3.2.3 Extending Rhythmic Material Using Permutations
      3.2.4 Rhythms from Algebraic Expansion
   3.3 Theory of Pitch Scales
      3.3.1 Flat and Symmetric Scales
      3.3.2 Tonal Expansions
      3.3.3 Nearest-Tone Voice-leading
      3.3.4 Deriving Simple Harmonic Progressions From Symmetric Scales
   3.4 Variations of Music by Means of Geometrical Projection
      3.4.1 Geometric Inversion and Expansion
      3.4.2 Splicing Harmonies Using Inversion
   3.5 Theory of Melody
      3.5.1 The Axes of Melody
      3.5.2 Superimposition of Rhythm and Pitch on Axes
      3.5.3 Types of Motion Around the Axes
      3.5.4 Building Melodic Compositions
   3.6 Structure of the Automated Schillinger System
      3.6.1 Rhythm Generators
      3.6.2 Harmonic and Melodic Modules
      3.6.3 Parameter Settings
   3.7 Parts of Schillinger’s Theories Not Utilised
   3.8 Discussion

4 Results and Evaluation
   4.1 Introduction
   4.2 Common Methods of Evaluation
   4.3 Automated Schillinger System Output
   4.4 Assessing Stylistic Diversity
      4.4.1 Overview of Automated Genre Classification
      4.4.2 Choice of Software
      4.4.3 Classification Experiment
      4.4.4 Preparation of MIDI files
      4.4.5 Classifier Configuration
      4.4.6 Classification Results
   4.5 Assessing Musical Merit
      4.5.1 Listening Survey Design
      4.5.2 Listening Experiment
      4.5.3 Quantitative Analysis and Results
      4.5.4 Qualitative Analysis
         4.5.4.1 Methodology
         4.5.4.2 Analysis and Results
         4.5.4.3 Genre and Style
   4.6 Discussion

5 Conclusion
   5.1 Summary of Contribution
   5.2 Avenues for Future Work

A Samples of Output
   A.1 Harmony #1
   A.2 Harmony #2
   A.3 Harmony #3
   A.4 Melody #1
   A.5 Melody #2
   A.6 Melody #3

B Listening Survey

C Function List
   C.1 Rhythmic Resultants — Book I: Ch. 2, 4, 5, 6, 12
   C.2 Rhythmic Variations — Book I: Ch. 9, 10, 11
   C.3 Rhythmic Grouping and Synchronisation — Book I: Ch. 3, 8
   C.4 Rhythmic Generators
   C.5 Scale Generation — Book II: Ch. 2, 5, 7, 8
   C.6 Scale Conversions — Book II: Ch. 5, 9
   C.7 Harmony from Pitch Scales — Book II: Ch. 5, 9
   C.8 Geometric Variations — Book III: Ch. 1, 2
   C.9 Melodic Functions — Book IV: Ch. 3, 4, 5, 6, 7

Bibliography


Chapter 1

Background

1.1 Introduction

Almost since the inception of the discipline of computing, people have been using computers to compose and generate music. This is perhaps unsurprising given the importance of algorithmic principles in much compositional thinking throughout musical history. The use of computers for music has mostly been driven by the desires of composers to generate interesting and unique new material.

Recognising the distinction between the composition of musical scores and other forms of music and sound generation, [Anders and Miranda 2011] have proposed the use of the term ‘computer-aided composition’ to refer to one area of what is more broadly known as ‘computer music’, a discipline which also encompasses the arts of sound synthesis and signal processing [Roads 1996]. This thesis is concerned with computer-aided composition: in particular, the computer-realisation of the musical formalism of Joseph Schillinger [Schillinger 1978]. Some authors prefer the term ‘algorithmic composition’ to refer to computer-aided composition [Nierhaus 2009]. In this thesis the two terms will be used interchangeably.

Joseph Schillinger was a Ukrainian-born composer, teacher and music theorist who was active in New York from the 1920s until his death in 1943. Schillinger’s lasting influence as a theorist and teacher exerted itself through famous students such as George Gershwin, Benny Goodman and Glenn Miller; and several distinguished television and radio composers [Quist 2002]. The distillation of his life’s work is contained in three large volumes. Two of these constitute The Schillinger System of Musical Composition [Schillinger 1978]. The third volume, The Mathematical Basis of the Arts [Schillinger 1976] was intended to be broader in scope and generalise much of his prior work in music to visual art and design. The Schillinger System attempted to differentiate itself from other accepted musical treatises by pursuing a more ‘scientific’ approach to composition. It consequently eschewed restrictive systems of rules created from the empirical analysis of Classical styles, as well as the notion of composition by ‘intuition’. Instead it promoted a range of quasi-mathematical methods for the construction of musical material. The system was intended to be of practical use by working composers — George Gershwin famously wrote the opera Porgy and Bess while studying under Schillinger [Duke 1947].


Schillinger’s work has frequently been mentioned in passing by researchers working in the field of computer-aided composition, but rarely addressed in any detail. There are several examples of similar individual algorithms that have been incorporated into computer-aided composition systems, but most of these systems focus on specific computational paradigms which are unrelated to the rest of Schillinger’s work. To the best of the author’s knowledge, only one other system dedicated specifically to the automation of Schillinger’s procedures exists in the form of publicly available software, and no such system has been referred to in the academic literature. This thesis will therefore provide the first formal presentation and evaluation of an ‘automated Schillinger System’. From here onwards, this term will be used to refer to the computer implementation being presented, while the term ‘Schillinger System’ will be used as a short form of The Schillinger System of Musical Composition.

1.2 Introduction to the Schillinger System

The two volumes of the Schillinger System [Schillinger 1978] consist of twelve books presented as individual ‘theories’. Each of these theories is an exposition of Schillinger’s musical philosophy combined with his technical discussions pertaining to general principles and explicit procedures. They include numerous examples of the procedures being carried out by hand, and lengthy annotations by the editors who published the work after Schillinger’s death.

The collection of theories is listed below. The work in its entirety is a formidable 1640 pages. Consequently, the scope of this thesis has only allowed for the first four theories to be considered in detail.

I Theory of Rhythm

II Theory of Pitch-scales

III Variations of Music by Means of Geometrical Projection

IV Theory of Melody

V Special Theory of Harmony

VI Correlation of Harmony and Melody

VII Theory of Counterpoint

VIII Instrumental Forms

IX General Theory of Harmony

X Evolution of Pitch Families

XI Theory of Composition

XII Theory of Orchestration


An existing software program known as StrataSynch by David McClanahan1 is the only other known automated system to make explicit use of Schillinger’s theories. It implements the generation of four-part diatonic harmony using books V and VIII, and a single chapter from book I. The system described in this thesis extends beyond the scope of that system to a more versatile form of harmony generation utilising books I, II and III; and to the generation of single-voice melodic compositions utilising books I–IV.

1 www.capsces.com/stratasync

1.2.1 Schillinger in Computer-aided Composition Literature

In an extended commentary on computer music from 1956–1986, Ames acknowledged the algorithmic nature of Schillinger’s work without pointing the reader to any known computer implementation, and noted that it had become ‘all but forgotten’ [Ames 1987]. Schillinger’s work was discussed in greater detail by Degazio, who again pointed out how much of it was presumably amenable to computer implementation, and highlighted how particular properties of the Theory of Rhythm would enable self-similar musical structures to be generated, thus relating it to the exploration of fractals in computer music [Degazio 1988]. The ability of the system to generate fractal structures was also identified by Miranda [Miranda 2001]. Miranda further noted the interesting rhythmic possibilities of using algebraic expansions and symmetrical patterns of interference, both of which are also explored in the Theory of Rhythm. More recently Nierhaus gave a cursory mention of Schillinger in the epilogue of a survey of algorithmic composition, implicitly acknowledging that it is possible to be adapted but also failing to cite any example of an implementation [Nierhaus 2009].

Although the discussion of a specific implementation is lacking, algorithms similar to those in Schillinger’s Theory of Melody were used with apparent success in early work by Myhill (cited in [Ames 1987]) and later by Miranda as part of a musical data structure used by agents in a swarm algorithm [Miranda 2003]. Furthermore, there are numerous examples of algorithms which use permutation in a similar manner to Schillinger’s Theory of Rhythm, and plenty of examples of systems which use inversion and retrograde techniques in a manner similar to Schillinger’s ‘geometrical projections’. There is no suggestion being made that these particular techniques originate from Schillinger’s system alone; indeed their use can be found throughout the history of Western musical composition [Nierhaus 2009].

1.2.2 Motivation

If many of the procedures expounded by Schillinger are not unique (this is not to suggest that none of them are), then the value of his treatise is that it collates them together, with each one presented in the context of the others and potentially useful interrelationships drawn. One of the motivations for adapting the Schillinger system is therefore the fact that it incorporates many algorithmic techniques which are demonstrably useful in computer-aided composition on their own, but have not been extensively tested together in the absence of other prevailing computational paradigms. Another motivation is the fact that other oft-cited treatises on music theory in algorithmic composition contain rules which are derived from existing music, such as Piston’s Harmony [Piston 1987]. Conversely, Schillinger’s work purports to have taken a more universal approach that does not draw its rules from the analysis of any particular musical corpus. For this reason it is ostensibly likely to be able to produce compositions which do not fall into the category of ‘style imitation’, which Nierhaus identified as being overwhelmingly dominant in the field [Nierhaus 2009]. Instead, it should allow for a measure of stylistic diversity. As will be discussed in chapters 3 and 4 of this thesis, these notions are contentious and worthy of investigation.

1.2.3 Criticism

The very premise of Schillinger’s work is controversial by virtue of the fact that it effectively condemns previous theories and methodologies as inadequate [Backus 1960]. As a result it has attracted rigorous scrutiny by various authors. A 1946 review by Barbour [Barbour 1946] examined each of the ‘achievements’ of the Schillinger System listed in a preface by the editors, and concluded that none of them were substantiated. Barbour also listed a number of errors and inconsistencies which highlighted the work’s fundamental lack of a sound scientific or mathematical basis.

Schillinger’s work was derided extensively by Backus [Backus 1960]. Dubbing it both ‘pseudo-science’ and ‘pseudo-mathematics’, he surveyed the first four volumes in some detail, pointing out that many descriptions of procedures are unnecessarily verbose and laced with undefined jargon; that the musical significance of them is based on numerology rather than any appropriately cited research; that much of the symbolic notation serves to obfuscate rather than clarify the expression of sometimes trivial mathematical ideas; and finally that several mathematical definitions are simply incorrect. Backus thus raised many important issues concerning the formal interpretation of Schillinger’s techniques which are tackled in chapter 3 of this thesis.

Neither Backus nor Barbour commented on whether Schillinger’s procedures were of any use to contemporary composers for generating musical material. In light of their resounding criticism, it is significant that other authors have considered many of the theories to be demonstrably useful in practice, or cited testimony from successful composers suggesting as much [Degazio 1988]. The composer Jeremy Arden published a PhD thesis documenting the study and utilisation of the Schillinger System from a compositional perspective [Arden 1996], concluding that the Theory of Rhythm and Theory of Pitch Scales offered many useful techniques. Although he swiftly dismissed the Theory of Melody as ‘too cumbersome’ to be of practical use, similar principles to those contained in that theory have been found useful in other contexts as mentioned above in section 1.2.1. There is therefore no absolute consensus which would wholly discourage computer implementations of the Schillinger System.


1.3 Summary of this Thesis

In this thesis, the automated Schillinger System designed by the author will be presented and evaluated. To begin with, chapter 2 will survey both the dominant paradigms and the specific computational approaches in the field of computer-aided composition. This theoretical basis will serve to position the automated Schillinger System within the academic literature.

The details of the software implementation of the four initial books of the Schillinger System listed in section 1.2 will be presented in chapter 3. Alongside the requisite technical discussion, chapter 3 will provide a comprehensive outline of the bulk of the procedures contained in these books. Perhaps more importantly, it will also identify the inherent difficulties in translating a formalism designed for composers into a model able to be represented computationally, including the resolution of Schillinger’s notational and practical inconsistencies and the necessity for a raft of new procedures to sensibly link the theories together.

The evaluation of musical output is a perennial problem in this inter-disciplinary field, and few authors tend to venture beyond subjective conclusions drawing on their own musical backgrounds. However, one method of more rigorous evaluation consists of the enlisting of a ‘team of experts’ to supply qualitative data for analysis. Such an approach has been used to study the output of the system presented here. Additionally, the burgeoning field of automatic genre classification has been engaged as a means of quantitatively assessing the statistical characteristics of the output. Together these forms of analysis aim to establish both the intrinsic musical merit and stylistic diversity of the automated Schillinger System. These experiments and their results will be presented in chapter 4.

The recently released four-part harmony system by McClanahan and the active pursuit of new forms of representation for Schillinger’s ideas, embodied by the online Schillinger CHI Project2, suggest a resurgence of interest in automating parts of the Schillinger System. The software presented in this thesis aims to contribute to this momentum, and is amenable to development beyond its current state as a ‘push-button’ music generator into a modular interface that could be used by composers and multimedia content creators. Many potential avenues for future research are explored in chapter 5.

2 http://schillinger.destinymanifestation.com/


Chapter 2

Overview of Computer-aided Composition

This chapter will give a broad overview of the field of computer-aided composition, in order to place the automated Schillinger System in context, and to position this thesis as an addition to the computer music literature.

As remarked upon by Supper [Supper 2001], the distinctions between compositional ideas, realisation in the musical score, and auditory perception are clearly bounded in a computing context. As this thesis is focusing on computer-aided composition rather than attempting to encompass the entire field of computer music, this overview does not include algorithms which take music generation beyond the level of symbolic representation into digital audio. Instead, it is presumed that the symbolic data generated by composition algorithms can be further mapped to musical notation, MIDI data1 or audio data depending on the application.
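As a concrete illustration of this symbolic level (a minimal sketch, not part of the system described in this thesis; all names and values are illustrative assumptions), the following Python fragment maps a list of (pitch class, octave, duration-in-beats) events onto MIDI note numbers and tick-based note-on/note-off events, which could subsequently be written out as MIDI data or rendered as notation.

    # Illustrative mapping from symbolic events to MIDI-style data.
    NOTE_OFFSETS = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
                    "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}

    def to_midi_note(pitch, octave):
        """Map a pitch class and octave to a MIDI note number (C4 = 60)."""
        return 12 * (octave + 1) + NOTE_OFFSETS[pitch]

    def to_midi_events(melody, ticks_per_beat=480):
        """Turn (pitch, octave, beats) triples into note-on/note-off event tuples."""
        events, time = [], 0
        for pitch, octave, beats in melody:
            note = to_midi_note(pitch, octave)
            duration = int(beats * ticks_per_beat)
            events.append(("note_on", note, time))
            events.append(("note_off", note, time + duration))
            time += duration
        return events

    # A short symbolic fragment: a C major arpeggio in quarter notes.
    fragment = [("C", 4, 1), ("E", 4, 1), ("G", 4, 1), ("C", 5, 1)]
    for event in to_midi_events(fragment):
        print(event)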

[Supper 2001] made a further taxonomic observation which is relevant to this chapter. He distinguished between:

1. the modelling of musically-oriented algorithmic procedures to produce encodings of established music theories;

2. procedures individual to a ‘composer-programmer’ where the code produces a unique class of pieces based upon the composer’s individual expertise; and

3. experiments with algorithms from extra-musical fields such as dynamic systems or machine learning.

In fact, there are many instances where individual implementations bear relevance to two or three of Supper’s categories, and his is only one of a number of possible taxonomies for describing computer-aided composition — section 2.1 lists a variety of other significant distinctions within the algorithmic composition literature. However, it is safe to observe that much recent academic research in computer-aided composition is based primarily on the application of pre-existing extra-musical algorithms to music, thus falling into Supper’s third category. Section 2.2 describes this literature.

1 MIDI stands for Musical Instrument Digital Interface. It is the dominant protocol for handling symbolic musical information in computer systems and hardware synthesizers.


Figure 2.1 provides a visualisation of the array of computational approaches used in the field, as discussed in section 2.2. These are connected by dashed lines which represent their algorithmic or mathematical similarity, and roughly partitioned in terms of their use within the various paradigms discussed in section 2.1.

[Figure 2.1: Approaches to Computer-aided Composition. The original diagram relates Markov chains, generative grammars, FSA, ATNs, L-systems, constraint programming, case-based reasoning, fuzzy logic, genetic algorithms, IGAs, genetic programming, artificial neural nets, cellular automata, chaos, fractals, swarm algorithms, non-musical data streams, musical "expert systems" and the automated Schillinger System, partitioned by the use of musical domain knowledge and by whether each approach is data-driven.]


As this chapter will be limited to the discussion of systems designed with the ultimate goal of composing music, other research areas such as computer auralisation, computational creativity and automated musicological analysis, despite being closely related to the success of particular algorithmic composition approaches, will not be explored per se. Discussions of computer style recognition, expressive musical performance and output evaluation are relevant to the experiments presented in chapter 4 and will be included there in the appropriate places.

2.1 Dominant Paradigms in Computer-aided Composition

Before commencing a description of the common algorithm families used in this field, it will be useful to outline several overarching (and often competing) paradigms. These are partly representative of differing philosophical approaches to automatic music generation, and partly to do with historical shifts in emphasis on computational approaches, which are in turn the result of past developments in artificial intelligence and the modelling of natural phenomena.

2.1.1 Style Imitation versus Genuine Composition

The reproduction of specific musical styles (‘style imitation’) constitutes the majority of algorithmic composition literature. Its dominance was testified to by Nierhaus in the epilogue of his comprehensive survey of algorithmic composition [Nierhaus 2009]. The styles in question are either those of particular individual composers, or those exemplified by the music of a particular culture or historical period. Style imitation is not limited to any particular group of computer algorithms, but is frequently the paradigm used by most of the approaches in figure 2.1 that encode musical domain knowledge.

The reason for the dominance of style imitation is somewhat evident when one considers the large quantity of work dedicated specifically to four-voice chorale harmonisation [Pachet and Roy 2001]. This form of composition is perhaps the most thoroughly studied in the musicological literature due to the enormous quantity of ‘exemplar’ works courtesy of European Baroque and Classical composers. Consequently, a well-established set of rules of varying levels of strictness has been empirically derived from this corpus over the course of several centuries, and this theoretical framework lends itself to being expressed as an optimisation problem in the context of ‘correct’ four-part harmony writing. Since optimisation problems sit comfortably within the realm of computer science, this style of composition is the most readily approachable by computer scientists. It has been pointed out by Allan that chorale harmonisation is “the closest thing we have to a precisely defined problem” [Allan 2002]. Any music generated within formal, recognisable stylistic boundaries is able to be evaluated either objectively or with a degree of authority by human listeners.

Conversely, the concept of ‘genuine composition’ [Nierhaus 2009] is problematic in computer music for the reason that genuinely new and different results are virtually impossible to validate using quantitative methods, and very much at the mercy of individual musical taste when it comes to human scrutiny. Nevertheless, while academic work in this area is traditionally less common it is still pursued in earnest, especially by researchers utilising chaos theory or algorithms with emergent behaviours.

2.1.2 Push-button versus Interactive

An algorithmic composition system which delivers a self-contained musical fragment, complete composition or an endless stream of musical material with real-time playback requiring no human intervention after the setting of initial parameters may be referred to as a ‘push-button’ or ‘black-box’ system. Examples of well-documented push-button systems range from Hiller and Isaacson’s early experiments forming the Illiac Suite [Hiller and Isaacson 1959] to Cope’s Experiments in Musical Intelligence [Cope 2005]. Most four-part harmonisation systems also fall into this category.

Systems which generate music using continual human feedback are perhaps more frequently cited as being successful. This paradigm has been referred to in terms of a human-computer ‘feedback loop’ [Harley 1995] and features in a variety of composition algorithms which are designed to either incorporate real-time human behaviour into their generative process or perform a gradual optimisation tailored to a user’s musical preference. Examples include interactive genetic algorithms using ‘human fitness functions’ [Biles and Eign 1995]; systems which allow a user to generate raw material and then modify a set of parameters to develop it further [Zicarelli 1987]; systems which allow the user to influence the generation of material from a more abstracted perspective [Beyls 1990]; systems which learn iteratively by ‘listening’ to a user’s live performance [Thom 2000]; and systems which map a user’s physical movement [Gartland-Jones 2002] or brain-wave activity [Miranda 2001] to a subset of the algorithm’s parameter space in real-time. Many authors have argued that these areas of research hold greater promise than push-button systems, based on the notion that the acts of composition (and improvisation) are fundamentally human activities dependent on human interaction.

There also exists a body of software which functions as a kind of ‘blank slate’ for composers. These programs are usually modular in the sense that individual pre-existing algorithms can be interfaced arbitrarily, and there is often the scope for ‘composer-programmers’ to extend their functionality. Examples range from the early MUSICOMP by Robert Baker [Hiller and Baker 1964] to the more advanced Max by David Zicarelli [Zicarelli 2002]. Such environments are interactive by their very definition, however once the template for a composition is completed by the composer, in many cases they arguably function as push-button systems. More recently, the advent of ‘live coding’ has been made possible by environments like Impromptu [Sorensen and Gardner 2010]. These environments are specifically designed to facilitate the coding of musical procedures during performance or improvisation.


2.1.3 Data-driven versus Knowledge-engineered

In computer-aided composition a ‘data-driven’ solution relies on a database of existing musical works on which to perform pattern extraction, statistical machine learning or case-based reasoning to derive musical knowledge. By contrast, a ‘knowledge-engineered’ system requires the coding of musical knowledge in the form of procedures or the manual population of a knowledge base. In figure 2.1, these alternative paradigms have been used to categorise various computational approaches on the left of the diagram.

An expert system combines a knowledge base of facts or predicates, ‘if-then-else’ rules and heuristics, with some kind of inference engine to perform logical problem solving in a particular problem domain [Coats 1988; Connell and Powell 1990]. Such a system requires the acquisition of knowledge either automatically or through a human ‘domain expert’ [Mingers 1986]. The front end may be interactive (the user inputs queries or data) or non-interactive (fully automated). There is generally also the prerequisite that an expert system is capable of both objectively judging its output using the same knowledge base, and tracing the decision path that led to the output for the user to analyse [Coats 1988].
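To make the shape of such a system concrete, the following toy Python sketch (invented purely for illustration, and not drawn from any of the systems cited in this chapter) couples a small hand-coded rule base over Roman-numeral chord transitions with a trivial inference step that both filters candidate transitions and reports which rules each candidate violates, mirroring the traceability requirement mentioned above.

    # Hand-coded 'if-then' style rules over Roman-numeral chord symbols (illustrative only).
    RULES = [
        ("avoid V-IV retrogression", lambda prev, nxt: not (prev == "V" and nxt == "IV")),
        ("resolve vii to I",         lambda prev, nxt: prev != "vii" or nxt == "I"),
        ("no immediate repetition",  lambda prev, nxt: prev != nxt),
    ]

    def check_transitions(prev_chord, candidates):
        """Return each candidate chord with the names of any rules it violates."""
        report = []
        for nxt in candidates:
            violated = [name for name, rule in RULES if not rule(prev_chord, nxt)]
            report.append((nxt, violated))
        return report

    # Trace the 'decision path' for each candidate following a V chord.
    for chord, violations in check_transitions("V", ["I", "IV", "V", "vi"]):
        print(chord, "OK" if not violations else violations)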

The inherent flaws of expert systems are well-known. One problem is that as a system’s parameter space becomes more ambitious, the knowledge base of rules tends to expand exponentially. In algorithmic composition this has led to optimisation problems in four-part harmonisation which become computationally intractable above a certain polyphonic density or beyond a certain length, as found by Ebcioglu [Ebcioglu 1988]. Beyls also cited the ‘complexity barrier’ inherent in musical expert systems, and further noted the lack of graceful degradation in situations with incomplete or absent knowledge [Beyls 1991]. Phon-Amnuaisuk mentioned the common problem of arbitrating between contradictory voice-leading rules [Phon-Amnuaisuk 2004]. One of Mingers’ main criticisms of expert systems in general was that a rule base must always be incomplete when built from only a sample of all possible data [Mingers 1986].

In knowledge-engineered musical expert systems, the most significant obstacle is the time-consuming encoding of a sufficient quantity of expert knowledge to allow the system to compose anything non-trivial. For style imitation, a further problem is that many rules inherent to a particular style may not be obvious even to experts, or may not be possible to adequately express in the required format. Sabater et al. articulated an underlying issue of rule-based style imitation: “the rules don’t make the music, it is the music which makes the rules” [Sabater et al. 1998]. For these reasons, the data-driven approach has become favoured by many researchers. Some of these authors have advocated for alternative ‘connectionist’ approaches to uncover the implicit knowledge of a musical corpus rather than attempt to find explicit rules — their solutions typically perform supervised learning of the corpus using artificial neural networks.


2.1.4 Musical Domain Knowledge versus Emergent Behaviour

In figure 2.1 the two paradigms of musical domain knowledge and emergent behaviour have been split vertically. The application of musical domain knowledge in computer-aided composition generally leads to a set of either implicit or explicit musical rules being enforced, something practically unavoidable except in cases where completely random behaviour is sought for aesthetic reasons. The approach is often, but not always, aligned with style imitation. Such examples found in the literature are usually broadly referred to as ‘musical expert systems’, but not all such approaches necessarily fall into this category if the accepted meaning of the term ‘expert system’ in computer science literature is enforced [Mingers 1986].

Miranda has suggested that rule-based composition systems lack ‘expression’ due to their inability to break rules, citing a famous quote by Frederico Richter: “In music, rules are made to be broken. Good composers are those who manage to break them well” [Miranda 2001]. This perceived fundamental flaw with the knowledge-based approach has provided inspiration for many researchers to look instead to paradigms which focus on dynamic or emergent behaviour, such as chaos, cellular automata and agent interaction in virtual swarms. Evolutionary algorithms have also been explored extensively, because although they are usually designed to operate in a musical knowledge domain, they do so in a fundamentally stochastic manner rather than by applying generative rules [Biles 2007].

The dichotomy between knowledge-based music and ‘emergent’ music was identified by Blackwell and Bentley, who separated the algorithmic composition field into ‘A-type’ and ‘I-type’ systems [Blackwell and Bentley 2002]. These labels respectively refer to systems that rely on encoded musical knowledge, and those that map the data streams from swarms, dynamic systems, chaotic attractors, natural phenomena or human activity to musical output. Beyls posited an equivalent delineation of ‘symbolic’ versus ‘sub-symbolic’ algorithms [Beyls 1991]. The emergent or sub-symbolic paradigm seeks to “interpret rather than generate” [Blackwell and Bentley 2002], and is therefore usually associated with Nierhaus’s notion of genuine composition [Nierhaus 2009]. However, a caveat which authors choosing this path have encountered was pointed out by Miranda: the biggest difficulty when using non-musical processes for algorithmic composition is deciding how to translate the data stream into a representation which is musically meaningful [Miranda 2001].

2.2 Formal Computational Approaches

This section will explain the specific algorithmic approaches that have been applied to computer-aided composition. It will be seen that many of these approaches have strong mathematical similarities (as shown in figure 2.1), and may produce statistically equivalent results depending on how they are implemented. As such, the organisation of this section does not strictly separate the algorithms based purely on their mathematical or purported musical properties. It does however indicate the range of distinct approaches to be found in the algorithmic composition literature.


The topics covered are grouped roughly into those that compose music using a statistical or probabilistic model of a style or corpus (Markov models and artificial neural networks); those which are most frequently associated with the ‘expert system’ paradigm in terms of being driven by systems of generative rules and constraints (formal grammars, finite state automata, case-based reasoning and fuzzy logic); and those which map the data from an extra-musical process onto a musical parameter space (chaos, fractals, cellular automata and swarm algorithms). For the most part the first two categories may be thought of as encoding ‘implicit’ and ‘explicit’ musical knowledge respectively. Evolutionary algorithms do not fall neatly into this particular taxonomy because although they encode musical knowledge, they navigate the space of musical possibilities stochastically.

2.2.1 Markov Models

Markov models were the earliest established extra-musical approach to computer-aided composition to be widely adopted. In a survey of the first three decades of algorithmic composition, Ames cited several examples of their use from the 1950s onwards by composers such as Lejaren Hiller and Iannis Xenakis [Ames 1987]. Cohen described a number of early applications of the probabilistic replication of musical styles, treating what are essentially Markov chains as a musical application of Information Theory. Cohen’s notion of composition being regarded as simply “selecting acceptable sequences from a random source” is a potential motivation for using the technique for style imitation, suggesting that “the degree of selectivity of the works of composers is . . . a parameter of their style” [Cohen 1962]. Their relative ease of implementation has perhaps also contributed to their popularity in computer music [Ames 1989].

A simple Markov model consists of a collection of states and a collection of transition probabilities for moving between states in discrete time steps [Ames 1989]. The probabilities of states leading to one another may be represented by a ‘transition matrix’. The state space is discrete, and in musical applications, finite. A Markov chain is obtained by selecting an initial state and then generating a sequence of states using the transition matrix.

How this model is utilised in algorithmic composition differs between implementations. States can be used, for example, to represent individual pitches, chords or durations; or they may be used to represent individual Markov chains of length n, which is equivalent to enforcing a dependency on events n time steps into the past. A Markov model in which all transitions depend on the previous n transitions is an nth-order Markov model; these are commonly used to instil a measure of context-sensitivity and thus encode musical objects at the phrase or cadence level. States may also represent entire vectors of potentially interdependent musical parameters, something utilised by Xenakis in the form of ‘screens’ [Xenakis 1992].

The transition matrix may be either constructed by hand, or derived empirically by performing an automated analysis on a database of existing musical works. The latter amounts to encoding each work as a sequence of states, and determining the transition probabilities by the relative tallies of each transition (analogous to the experiments carried out by A. A. Markov himself using Russian texts [Ames 1989]). These options correspond with Cohen’s labels of ‘synthetic’ and ‘analytic-synthetic’ [Cohen 1962]. Both approaches are present in the literature, and the choice has depended principally on whether the user is attempting to generate a particular aesthetic for an individual composition [Ames 1989] or performing style imitation, where the purpose is for the randomly generated output to inherit the generalised musical rules implicit in the corpus [Cohen 1962].
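The following minimal Python sketch (a toy example, not an implementation used elsewhere in this thesis, and with an invented corpus) illustrates the analytic-synthetic path: transition tallies are gathered from a small set of pitch sequences, normalised into a first-order transition matrix, and then used to generate a new chain from a chosen initial state.

    import random
    from collections import defaultdict

    def build_transition_matrix(corpus):
        """Estimate P(next | current) from relative tallies of observed transitions."""
        tallies = defaultdict(lambda: defaultdict(int))
        for sequence in corpus:
            for current, nxt in zip(sequence, sequence[1:]):
                tallies[current][nxt] += 1
        return {state: {nxt: count / sum(row.values()) for nxt, count in row.items()}
                for state, row in tallies.items()}

    def generate_chain(matrix, start, length, rng=random.Random(0)):
        """Select an initial state, then sample successive states from the matrix."""
        state, chain = start, [start]
        for _ in range(length - 1):
            successors = matrix.get(state)
            if not successors:                      # no observed continuation
                break
            state = rng.choices(list(successors), weights=list(successors.values()))[0]
            chain.append(state)
        return chain

    corpus = [["C", "D", "E", "D", "C"],
              ["C", "E", "G", "E", "C"],
              ["D", "E", "C", "D", "E"]]
    matrix = build_transition_matrix(corpus)
    print(generate_chain(matrix, "C", 8))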

Examples of the use of Markov chains for algorithmic composition are numerous. Ames documented his use of the technique to develop works for monophonic solo instruments [Ames 1989]. In his program, the transition matrix is hand-crafted, and the entries define the probabilities of melodic intervals, note durations, articulations and registers. Hiller and Isaacson’s Experiment 4 from the Illiac Suite operated in much the same manner [Hiller and Isaacson 1959]. Cambouropoulos applied Markov chains to the construction of 16th century motet melodies in the style of the composer Palestrina [Cambouropoulos 1994]. His approach also used hand-crafted transition matrices for melodic intervals and note durations; these were developed through manual statistical analysis of Palestrina’s melodies. Other authors have used a data-driven approach: Biyikoglu ‘trained’ a Markov model using the statistical analysis of a corpus of Bach’s chorales to generate four-part harmonisations [Biyikoglu 2003], while Allan solved the same chorale harmonisation problem using Hidden Markov Models [Allan 2002]. Allan’s solution uses one Hidden Markov Model to generate chord ‘skeletons’ (the notes of the melody are treated as observations ‘emitted’ by hidden harmonic states), and two more to fill in the chords and provide ornamentation. It then uses constraint satisfaction procedures to prevent invalid chorales, and cross-entropy measured against unseen examples from the chorale set as a quantitative validation method.

The reported success of Markov models is varied. Allan concluded that coherent harmonisation can indeed be achieved via statistical examination of a corpus [Allan 2002], while in Ames’ assessment this often leads to “a garbled sense of the original style” [Ames 1989]. Biyikoglu suggested that Markov chains are not appropriate for modelling hierarchical relationships, but are capable of providing smooth harmonic changes [Biyikoglu 2003]. Cambouropoulos highlighted the potential for higher order chains to simulate a measure of musical context [Cambouropoulos 1994], however Baffioni et al. observed that chains of too high an order simply end up reproducing entire sections of the original corpus, and instead proposed a hierarchical organisation of separate Markov chains accounting for form, phrase and chord levels [Baffioni et al. 1981]. As Ames suggested, the fundamental problem with many of these models is that they provide an aural realisation of the probability distributions within a data set but cannot discern the methods behind its construction, and therefore serve as little more than “partial descriptions of non-random behaviour” [Ames 1989].


2.2.2 Artificial Neural Networks

Artificial neural networks (ANNs) are often used to investigate the notion of musical style, and have been successfully used to perform style and genre classification (see section 4.4.1). ANNs are well-suited to these tasks because they are particularly good at finding generalised statistical representations of their input data [Russell and Norvig 2003]. In algorithmic composition, they tend to be aimed squarely at style imitation for this reason. The original motivations for pursuing this ‘connectionist’ approach as an alternative to expert systems were summarised by Todd, who championed ANNs as a way to gracefully handle complex hidden associations within a data set, as well as numerous ‘exceptions’ to the established musical rules which would normally inflate the knowledge-base of an expert system [Todd 1989]. Hornel and Menzel commented on neural networks’ abilities to circumvent the problem of rule explosion inherent in building sophisticated expert systems for style imitation [Hornel and Menzel 1998].

ANNs are loosely modelled on the architecture of the brain [Russell and Norvig 2003]. Networks are built of simple computational units known as ‘perceptrons’, which are analogous to the function of individual biological neurons. A perceptron calculates a weighted aggregate of its inputs, subtracts a ‘threshold’ value and ‘fires’ by passing the result through a differentiable activation function such as a sigmoid or hyper-tangent. The most common practical implementation of a neural network is known as a ‘multi-layer perceptron’ (MLP). This normally consists of a layer of ‘hidden’ neurons connected to both a set of inputs representing the input dimensions of the training set, and a set of output neurons which represent the output dimensions. The basic function of a neural network is to learn associations between input vectors and target output vectors by adjusting randomly initialised weights along network connections. A popular method for doing this is ‘gradient descent back propagation’, in which the input vectors are fed forward through the network and the mean-squared error between the output and target vectors is gradually reduced (subject to a scalar ‘learning rate’) over some number of epochs using the derivative of the error function. In this way the weights come to form a statistical generalisation of the training set through repeated exposure to input vectors. In musical applications, the outputs are normally fed back into the inputs to form a ‘recurrent neural network’ (RNN), and a technique such as back propagation through time (BPTT) can then be used to model temporal relationships in the corpus [Mozer 1994]. Neurons which feed back into themselves may also be used to implement short term neural ‘memory’. To compose new music using an RNN, a trained network is simply seeded with a new input vector and the outputs are recorded for some number of iterations.
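The generation step just described can be sketched schematically as follows (a toy Python fragment, not drawn from any cited system). It uses a single hidden layer with sigmoid activations and feeds each sampled output back in as the next input; the weights are random stand-ins for a trained network, so the resulting 'melody' is purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    n_pitches, n_hidden = 12, 16                          # 12 pitch classes, one hidden layer
    W_in = rng.normal(0.0, 0.5, (n_hidden, n_pitches))    # input-to-hidden weights
    W_out = rng.normal(0.0, 0.5, (n_pitches, n_hidden))   # hidden-to-output weights
    b_hidden = np.zeros(n_hidden)                         # 'threshold' values (biases)
    b_out = np.zeros(n_pitches)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def forward(x):
        """One pass: weighted aggregates, thresholds, then sigmoid activations."""
        hidden = sigmoid(W_in @ x + b_hidden)
        output = sigmoid(W_out @ hidden + b_out)
        return output / output.sum()                      # interpret outputs as a distribution

    def generate(seed_pitch, n_notes):
        """Seed the network with one pitch and record the sampled outputs."""
        x = np.zeros(n_pitches)
        x[seed_pitch] = 1.0
        melody = [seed_pitch]
        for _ in range(n_notes - 1):
            pitch = int(rng.choice(n_pitches, p=forward(x)))
            melody.append(pitch)
            x = np.zeros(n_pitches)                       # feed the chosen note back as input
            x[pitch] = 1.0
        return melody

    print(generate(seed_pitch=0, n_notes=8))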

Todd’s original system restricted the domain to monophonic melodies represented using the dimensions of pitch and duration [Todd 1989]. He combined two different network types — a three-layer RNN with individual neural feedback loops to model temporal melodic behaviour at the note level, and a standard MLP which, when trained, acted as a static mapping function from fixed input sequences to output sequences [Todd 1989]. Mozer implemented an RNN that learned and composed single-voice melodies with accompaniment, called CONCERT [Mozer 1994]. It improved on Todd’s work in various ways, such as using a probabilistic interpretation of the network outputs, and more sophisticated data structures for musical representation. Mozer’s network inputs represented 49 pitches over four octaves. Hornel and Menzel described a neural network system called HARMONET with the ability to harmonise chorale melodies, and a counterpart system MELONET for composing melodies [Hornel and Menzel 1998]. Both of their approaches used a combination of ANNs for the ‘creative’ work and constraint-based evaluation for the ‘book-keeping’. ANNs have also been used as fitness evaluators in evolutionary algorithms as one way of alleviating both the inadequacy of objective musical fitness functions and the ‘fitness bottleneck’ caused by human intervention (see section 2.2.5). For instance, Spector and Alpern used a three-layer MLP trained on the repertoire of jazz saxophonist Charlie Parker which was used to classify members of a population as either ‘good’ or ‘bad’ [Spector and Alpern 1995].

The aesthetic products from ANNs are also reported as being mixed. Mozer’s results when attempting to compose in the style of Bach were reported to be ‘reasonable’, but his experiments on European folk-tunes were less successful [Mozer 1994]. Hornel and Menzel’s compositions using HARMONET and MELONET, on the other hand, were evaluated as ‘very competent’, and showed that ANNs could be used to imitate characteristics strongly associated with a composer’s style [Hornel and Menzel 1998]. Todd avoided a judgement of merit regarding his ANN-composed melodies, stating only that they were “more or less unpredictable and therefore musically interesting” [Todd 1989]. A common criticism of most ANN approaches is that they essentially learn the statistical equivalent of a set of complex Markov transition matrices, and are therefore only slightly more capable than Markov chains of modelling higher order musical structure [Mozer 1994]. Phon-Amnuaisuk points out that they learn only ‘unstructured knowledge’ [Phon-Amnuaisuk 2004]. Eck and Schmidhuber have offered a potential remedy to this problem by using ‘long short term memory’ (LSTM) to allow for some association of temporally distant events manifesting as medium-scale musical structure. Their method resulted in the ‘successful’ production of improvisations over fixed Bebop chord sequences [Eck and Schmidhuber 2002].

2.2.3 Generative Grammars and Finite State Automata

Algorithmic composition systems incorporating generative grammars are what are most commonly referred to as musical ‘expert systems’, because they presuppose an encoding of explicit domain-specific rules, irrespective of whether those rules are encoded by hand or extracted automatically from a corpus. The attraction of this method is that it is capable of encoding the established musical knowledge of musicological texts, and it also provides a way to generate coherent musical structure at multiple hierarchical levels, while at the same time allowing for a large space of complex sequences [Steedman 1984]. Many of the generative grammar systems are informed by the work of Chomsky regarding linguistic syntax [Chomsky 1957], and later work by Lerdahl and Jackendoff [Lerdahl and Jackendoff 1983] which builds upon the musicological analysis theories of Schenker [Schenker 1954]. The generative grammar approach bears strong similarities to the implementation of finite state automata (FSA), and both grammars and FSA have been shown to function identically to Markov chains in certain circumstances [Roads and Wieneke 1979; Pachet and Roy 2001]. Material obtained by applying the production rules of a generative grammar is most often filtered using a knowledge-base of constraints which define the legal musical properties of the system [Anders and Miranda 2011].

A generative grammar can be described as consisting of an alphabet of non-terminal tokens N, an alphabet of terminal tokens T, an initial root token Σ and a set of production or rewrite rules P of the form A → B, where A and B are token strings [Roads and Wieneke 1979]. A grammar G is represented formally by the tuple G = (N, T, Σ, P), and music is generated by establishing a set of musical tokens such as pitches, rhythms or chord types, and designing a set of production rules that implement legal musical progressions. Chomsky’s taxonomy of type 0, 1, 2 and 3 grammars (‘free’, ‘context-sensitive’, ‘context-free’ and ‘finite state’) [Chomsky 1957] is relevant to music production. For instance, Roads and Wieneke observed that grammar types 0 and 3 are inadequate for achieving structural coherence [Roads and Wieneke 1979].
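As a toy illustration of this formalism (the production rules and probabilities below are invented for illustration and not taken from any cited system), the following Python sketch rewrites non-terminal tokens into terminal chord symbols in the manner of a small stochastic context-free grammar whose "PHRASE" token plays the role of the root token Σ.

    import random

    rng = random.Random(3)

    # P: each non-terminal maps to weighted right-hand sides; anything else is a terminal.
    PRODUCTIONS = {
        "PHRASE":  [(["I", "PHRASE", "CADENCE"], 0.3),
                    (["I", "vi", "CADENCE"], 0.4),
                    (["I", "IV", "CADENCE"], 0.3)],
        "CADENCE": [(["ii", "V", "I"], 0.6),
                    (["IV", "V", "I"], 0.4)],
    }

    def expand(symbol):
        """Recursively rewrite a symbol until only terminal chord tokens remain."""
        if symbol not in PRODUCTIONS:               # terminal token (a chord symbol)
            return [symbol]
        rules, weights = zip(*PRODUCTIONS[symbol])
        rhs = rng.choices(rules, weights=weights)[0]
        return [token for part in rhs for token in expand(part)]

    # Expanding the root yields a chord sequence such as "I vi ii V I".
    print(" ".join(expand("PHRASE")))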

Rader utilised stochastic grammars in an early implementation of a Classical style imitator [Rader 1974]. The system he devised was a ‘round’ generator, wherein each incarnation of the melody is constrained to consonantly harmonise with itself at regular temporal displacements. It used an extensive set of production rules with assigned probabilities, and a set of constraints. Domain knowledge was derived from traditional harmonic theory, in this case Walter Piston’s treatise Harmony [Piston 1987]. Holtzman described a system in which the production rules of multiple grammar types were implemented along with ‘meta-production’ rules [Holtzman 1981], thus constituting the knowledge and meta-knowledge of an expert system [Mingers 1986]. These were accompanied by common transformational operations such as inversion, retrograde and transposition, and used to reproduce a work by the composer Arnold Schoenberg [Holtzman 1981]. Steedman modelled jazz 12-bar blues chord sequences with context-free grammars [Steedman 1984], using an approach informed directly by the musicological work of Lerdahl and Jackendoff [Lerdahl and Jackendoff 1983]. Ebcioglu produced what was, according to Pachet and Roy [Pachet and Roy 2001], the first real solution to the four-part chorale harmonisation problem [Ebcioglu 1988]. His system implemented an exhaustive optimisation process using multiple automata and sets of constraints based on traditional harmonic rules for generating chord skeletons, pitches and rhythms from an initial melody. Storino et al. used a manually encoded generative grammar to compose pieces in the style of the Italian composer Legrenzi [Storino et al. 2007]. Both Zimmerman [Zimmermann 2001] and Hedelin [Hedelin 2008] have used grammars to generate large compositional structures which are then filled with chord skeletons using Riemann chord notation [Mickselsen 1977], before finally being fleshed out with note-level information — the aim being to bring form and construction closer to one another instead of relying on a single set of production rules to generate ‘incidental’ musical structure [Hedelin 2008].

Cope’s system Experiments in Musical Intelligence (EMI) uses a type of FSA called an augmented transition network (ATN), which is combined with a ‘reflexive pattern matcher’ to form a data-driven expert system [Cope 1992]. The analysis of a manually encoded and annotated corpus of works is performed using a method purportedly informed by the work of Schenker [da Silva 2003]. This method is referred to by Cope as SPEAC, which is an acronym for the possible chord classifications ‘statement’, ‘preparation’, ‘extension’, ‘antecedent’ and ‘consequent’ depending on a chord’s makeup and context. A ‘signature dictionary’ of statistically significant recurring musical fragments of between 1 and 8 intervals is built using the pattern matcher [da Silva 2003]. To produce new works, the ATN implements a set of production rules designed to stochastically generate a new SPEAC sequence, and constraint systems are applied to determine the final pitch, duration and note velocity information. EMI has been used to compose thousands of works which closely mimic the styles of famous composers including Bach, Chopin, Beethoven, Bartok, and Cope himself. More recently, an ‘oeuvre’ of around one thousand selected works in a wide range of styles produced by the system has itself been established as a style database, which Cope has used to feed interactively back into an updated system, known as Emily Howell, based on the same ‘recombination’ principles [Cope 2005]. Cope associates the notion of a prolonged style imitation feedback loop with his proposed definition of creativity, arguing that such a process is difficult to formally distinguish from the human creative process [Cope 2005].

In general, systems incorporating some form of generative grammar imbued with explicit musical knowledge have been found to give more convincing musical results for style imitation than the statistically oriented approaches of Markov chains and ANNs. Pachet and Roy concluded that the chorale harmonisation problem had essentially been ‘solved’ by expert systems [Pachet and Roy 2001]. The compositions produced by Cope’s programs have become well known for their quality [da Silva 2003]. Storino et al. found that grammar-based systems were frequently capable of successfully fooling audiences of musicians into believing that computer-composed works were in fact human-composed [Storino et al. 2007]. However, many of these approaches still suffer from problems common to expert systems generally, including the encoding of large enough knowledge bases [Coats 1988] and the potential for intractability due to combinatorial explosion [Pachet and Roy 2001]. Steedman noted that simple grammars will always produce correct musical syntax, but have a natural propensity to generate music with no semantics: the encoding of musical meaning is an extremely difficult problem [Steedman 1984]. Miranda has claimed that the biggest weakness of these systems, in the context of composing genuinely new music, is their innate inability to break rules [Miranda 2001].

2.2.4 Case-based Reasoning and Fuzzy Logic

Case-based reasoning (CBR) and fuzzy logic also fall within the expert system paradigm because they implement architectures that couple a knowledge-base with an inference engine to generate musical sequences [Sabater et al. 1998]. CBR systems rely on a database of previous valid musical ‘cases’ from which to infer new knowledge, and are therefore inherently data-driven, even though they may further incorporate a set of immutable knowledge-engineered rules or constraints [Pereira et al. 1997]. A CBR system uses past experience to solve new problems by storing previous observations in a ‘case base’ and adapting them for use in new solutions when similar or identical problems are presented [Ribeiro et al. 2001].

Sabater et al. used case-based reasoning, supported by a set of musical rules, to generate melody harmonisations [Sabater et al. 1998]. The rules represent ‘general’ knowledge derived from traditional harmonic theory, while the cases in the database represent the ‘concrete’ knowledge of a musical corpus. Their system consists of a CBR engine with a case base, and a rule module which only suggests a solution when the CBR fails to find an example of a past solution for a particular scenario (in this case a note to be harmonised) using a ‘naïve’ search. Successful solutions to problems are added to the case base for future use. The system conforms to the traditional notion of an expert system which encodes domain knowledge, problem solving knowledge and meta-level knowledge [Connell and Powell 1990].

Ribeiro et al. implemented an interactive program called MuzaCazUza which uses a CBR system to generate melodic compositions [Ribeiro et al. 2001]. The case base is populated with works by Bach. In this system, case retrieval is done using a metric based on Schoenberg’s ‘chart of regions’ [Schoenberg 1969] and an indexing system to compare a present case with a stored case. The case with the closest match is considered. After each retrieval phase, a musical transformation such as repetition, inversion, retrograde, transposition, or random mutation is applied by the user, and an ‘adaptation’ phase simply drags non-diatonic notes into their closest diatonic positions. The authors suggest continually feeding the results of a CBR system back into the case base, thus creating a model not unlike the one proposed by Cope [Cope 2005]. Pereira et al. used a similar system to Ribeiro et al., this time with a case base consisting of the works of the composer Seixas [Pereira et al. 1997]. Their CBR engine is modelled on cognitive aspects of creativity — ‘preparation’, that is, the loading of the problem and case base; ‘incubation’, which consists of CBR retrieval and ranking based on a similarity metric; ‘illumination’, which is the adaptation of the retrieved case to the current composition; and ‘verification’, which in this case is the analysis by human experts. During the incubation stage, the standard ‘musically meaningful’ transformations of inversion, retrograde and transposition are employed to expand the system’s ability to generate new music.

According to Sabater et al. the combination of rule and case-based reasoning methods is especially useful in situations where it is both difficult to find a large enough corpus, and inappropriate to work only with general rules [Sabater et al. 1998]. Pereira et al. believe that CBR systems offer far more scope for producing music that is different from the originals than musical grammars inferred from a corpus [Pereira et al. 1997].

At least one musical expert system based on fuzzy logic has been described in the literature. The system by Elsea [Elsea 1995] was implemented in Zicarelli’s Max environment [Zicarelli 2002]. The term ‘fuzzy logic’ is a potential misnomer, as the word ‘fuzzy’ refers not to the logic itself, but to the nature of the knowledge being represented [Zadeh 1965]. The knowledge base in a fuzzy system distinguishes itself by being made up of ‘linguistic’ rules with meanings that cannot be expressed by ‘crisp’ boolean logic. For instance, the fuzzy rule “If there have been too many firsts in a row, then root or second” [Elsea 1995] is a linguistic expression guiding the inference system to avoid prolonged sequences of first inversion chords. Calculations based on this rule are made possible by assigning fractional ‘membership values’ to the quantities of successive first inversion chords that could to some degree be considered ‘too many’. The final decision of whether to transition to a root or second inversion chord is made using a translation from fuzzy membership values to corresponding fuzzy values in the decision space, which are then ‘defuzzified’ to a single value using an algorithm such as Mamdani or Sugeno [Hopgood 2011]. This process is deterministic and constitutes a precise mapping. Sophisticated fuzzy expert systems may suffer the same problems of knowledge-engineering, ‘rule explosion’ and computational complexity as crisp expert systems, but they are considerably more graceful when handling missing, inconsistent or incomplete knowledge [Zeng and Keane 2005], and are therefore potentially more effective at making musically meaningful inferences using small corpora.
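As a rough illustration of the reasoning just described (and not a reconstruction of Elsea’s actual rules), the sketch below assigns a fractional membership value to ‘too many first inversions in a row’ and defuzzifies two competing rule outputs with a Sugeno-style weighted average; the breakpoints and crisp outputs are invented.

;; Membership value for "n successive first-inversion chords is too many".
;; The breakpoints (2 and 5) are invented for this sketch.
(define (too-many-firsts n)
  (cond ((<= n 2) 0.0)
        ((>= n 5) 1.0)
        (else (/ (- n 2) 3.0))))

;; Sugeno-style defuzzification over two invented rules:
;;   rule 1: "too many firsts -> move to the root position"  (crisp output 0)
;;   rule 2: "otherwise stay on the first inversion"          (crisp output 1)
(define (next-inversion n)
  (let* ((w1 (too-many-firsts n))
         (w2 (- 1.0 w1))
         (crisp (/ (+ (* w1 0.0) (* w2 1.0)) (+ w1 w2))))
    (if (< crisp 0.5) 'root 'first)))

;; (next-inversion 3) => first   (membership 1/3, so stay on the first inversion)
;; (next-inversion 5) => root    (membership 1, so move to the root position)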

2.2.5 Evolutionary Algorithms

The term ‘evolutionary algorithms’ refers to a collection of techniques inspired primarily by Darwinian natural selection [Husbands et al. 2007]. Two of these techniques which have been investigated in the field of algorithmic composition are genetic algorithms, and to a lesser extent genetic programming. These algorithms implement sophisticated heuristics for converging on locally optimal solutions in very large search spaces. The reason for their popularity in algorithmic composition is their ability to traverse diverse regions of a space of musical solutions stochastically. This is advantageous for musical optimisation problems like four-part harmonisation, because it makes them computationally tractable in comparison with expert system solutions like Ebcioglu’s [Ebcioglu 1988]. Furthermore, with a stochastic approach comes the apparent implication that new music unhindered by generative rules is possible [Gartland-Jones and Copley 2003]. Thus, while in non-artistic fields genetic algorithms and genetic programming are usually used to solve optimisation problems, in music they are also commonly exploited for their ‘exploration’ abilities, and are sometimes claimed to be analogous to elements of the human composition process [Gartland-Jones 2002].

Genetic algorithms (GA) are a heuristic search technique in which candidate solutions are represented as a population of strings or ‘chromosomes’ [Burton and Vladimirova 1999]. Each ‘gene’ of the chromosome represents a dimension of the solution space. A stochastic search process is controlled by a selection procedure based on individual ‘fitness’ and ‘reproductive’ operators to obtain successive generations of a population, and ‘mutation’ operators to randomly introduce new genetic material into an existing population. The search runs for a fixed number of generations, or until the fittest individual is somehow deemed fit enough to be the final solution. Reproductive operators typically implement ‘genetic crossover’ to merge a number of parents into an offspring, and mutation operators are used to modify individual genes or small sections of an offspring’s chromosome. In the simplest ‘traditional’ GA, individuals are represented by binary strings and genetic operators operate at the binary level, with crossover occurring at arbitrary points along the string and mutation operators causing random ‘bit flips’ [Engelbrecht 2007]. However, for algorithmic composition most authors have found it necessary to instill the evolutionary process with a measure of musical domain knowledge in order to radically enhance it. In particular, chromosomes are used to represent musical information at a higher level of abstraction, and ‘musically meaningful’ mutation operators are chosen, including the transformational procedures of inversion, reversal and transposition [Burton and Vladimirova 1999]. Fitness evaluation is usually cited as the most problematic aspect of GAs. Gartland-Jones and Copley classified genetic algorithms by their use of either ‘automatic’ (using an objective function or an ANN trained on a corpus) or ‘interactive’ (requiring human inspection/listening) fitness functions [Gartland-Jones and Copley 2003]. The latter are often referred to as interactive genetic algorithms (IGAs) [Biles 2001].
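A minimal sketch of the two operators described above, acting on chromosomes that are simply lists of MIDI pitches, is given below; this representation is invented for illustration and does not reproduce any of the surveyed systems.

;; take the first n elements of a list
(define (take-n lst n)
  (if (or (= n 0) (null? lst))
      '()
      (cons (car lst) (take-n (cdr lst) (- n 1)))))

;; one-point crossover: splice two parent chromosomes at a given point
(define (one-point-crossover parent-a parent-b point)
  (append (take-n parent-a point) (list-tail parent-b point)))

;; a 'musically meaningful' mutation: transpose the whole passage
(define (mutate-transpose chromosome interval)
  (map (lambda (p) (+ p interval)) chromosome))

;; (one-point-crossover '(60 62 64 65) '(67 65 64 62) 2) => (60 62 64 62)
;; (mutate-transpose '(60 62 64) 12) => (72 74 76)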

Phon-Amnuaisuk et al. used a GA to create traditional four-part harmonies [Phon-Amnuaisuk et al. 1999]. They relied on an objective knowledge-based fitness function for the evaluation of chromosomes. The chromosomes encoded short thematic passages, and the mutation operators included ‘perturbation’, which nudges a note in a single voice up or down a semitone; ‘swapping’, where chords are altered by swapping two random voices; ‘re-chord’, which randomly modifies the chord type; ‘phrase-start’, which mutates a phrase to begin on a root chord; and ‘phrase-end’, which mutates a phrase to end on a root chord. The main reproductive procedure involved splicing the chromosome strings at a random crossover point. The fitness function was a set of rules commonly listed in traditional voice-leading theories.

Biles presented a genetic algorithm called GenJam for generating monophonic jazz solos [Biles 1994]. GenJam initialises individuals within a population of melodic passages. It performs musically meaningful mutations such as inversion, reversal, rotation and transposition. The fitness of each individual in a generation is determined by a human operator, and the best individuals are used as the parents of the following generation. According to Biles, this feedback process converges on solos which match the taste of the human operator [Biles 1994]. The main disadvantage of this method is that the reliance on human feedback for evaluating fitness manifests as a bottleneck which makes the convergence process orders of magnitude slower than using objective fitness functions. Biles has addressed this problem by using entire audiences instead of individual users [Biles and Eign 1995], using ANNs for fitness functions [Biles et al. 1996], and removing fitness evaluation altogether by drawing the initial population from an established database of superior specimens [Biles 2001].

Genetic programming (GP) is an extension to the GA paradigm in which the individuals in the population are not vectors representing points in a solution space, but hierarchical expressions representing mathematical functions or the code for entire algorithms [Burton and Vladimirova 1999]. GP individuals are normally represented as expression tree structures; consequently the selection, reproduction and mutation mechanisms are designed specifically to operate on these structures [Engelbrecht 2007]. GP fitness functions are more commonly realised as error or ‘cost’ functions because GP is frequently applied to symbolic regression problems, but aside from these differences GP and GA implementations are fundamentally the same. Laine and Kuuskankare [Laine and Kuuskankare 1994], for instance, generated an initial population of melodies using simple mathematical operators and trigonometric functions, then evolved the population by performing crossover and mutation on subtrees. Longer and more complex musical phrases result from the increasing complexity of the population generations. Puente et al. used a GP technique to evolve context-free grammars for producing melodies in the style of a corpus of works by several famous composers [Puente et al. 2002]. In this instance the fitness function was simply a statistical comparison between the population members and the melodies from the corpus.

Burton and Vladimirova suggested that genetic techniques allow greater scope for musical possibilities, and often more subjective ‘realism’, than other approaches such as ANNs, which are restricted by training data; expert systems, which are often restricted by computational complexity and knowledge-engineering issues; and purely stochastic generators, which exhibit good unpredictability but ‘questionable musicality’ [Burton and Vladimirova 1999]. However, they and many other authors have acknowledged the perennial problem of designing effective fitness-evaluation methods that reduce the counter-productive dependence on human interaction — the ‘fitness bottleneck’ [Biles et al. 1996]. Additionally, many conundrums are ever-present in the tuning of genetic algorithm parameters, such as whether to implement ‘elitist’ selection policies that may converge too quickly to local optima, or policies that retain a high level of diversity and allow low-quality individuals to continue reproducing [Burton and Vladimirova 1999]. Phon-Amnuaisuk et al. discovered that despite the supposed advantages of using GAs for four-part harmonisation, a simple rule-based system was capable of achieving consistently better results as far as the GA’s fitness function was concerned [Phon-Amnuaisuk et al. 1999]. They attributed this to the GA’s lack of sufficient ‘meta-knowledge’, a natural trait for an expert system by virtue of the fact that the structure of the search process can be easily encoded in the program. They also noted the GA’s inability to guarantee globally optimal solutions (a caveat of stochastic search), and declared the GA model ill-suited to musical optimisation problems. Despite all this, both interactive and non-interactive GAs continue to be used successfully for tasks like jazz improvisation [Biles 2007] and the composition of thematic bridging sections between user-supplied ‘source’ and ‘target’ passages [Gartland-Jones 2002].

2.2.6 Chaos and Fractals

Approaches to algorithmic composition in the closely related fields of chaos and fractals have been popular as alternatives to the expert-system paradigm because of their tendency to exhibit recurrent patterns or multi-layered self-similarity, while at the same time being fundamentally unpredictable or complex [Harley 1995]. Both are linked to mathematical resultants of the behaviour of iterated function systems (IFS) and dynamical systems, and were introduced as alternative explanations for complex natural phenomena such as weather systems and the shape of coastlines [Mandelbrot 1983]. According to Harley [Harley 1995], their applicability to music has been influenced by the work of Lerdahl and Jackendoff, who provided convincing models for analysing musical self-similarity [Lerdahl and Jackendoff 1983]; and Voss and Clarke, who demonstrated that some music contains patterns which can be described using 1/f noise [Voss and Clarke 1978]. The non-musical, numerical data streams created by applying such algorithms are not usually termed ‘emergent behaviour’ because they are not generated by the interaction of a virtual environment of simple interacting units. However, they share the property of being able to generate complexity at the ‘macroscopic’ level from simplicity at the ‘microscopic’ level [Beyls 1991]. Furthermore, their successful conversion into musical information is at the mercy of the mapping problem noted by Miranda [Miranda 2001], a problem also faced by systems of emergent behaviour such as cellular automata and swarms.

Chaotic systems were explored by Bidlack as a means of using simple algorithms for endowing computer generated music with ‘natural’ qualities — for instance, those which can be found relating to either organic processes or divergent mathematical phenomena [Bidlack 1992]. Bidlack noted that the resultant complexity had more potential in computer synthesis, but suggested that the technique could be useful for perturbing musical structure at various levels of hierarchy, in order to instill a system with a measure of unpredictability. Dodge described a ‘musical fractal’ algorithm utilising 1/f noise, arguing along the lines of Voss and Clarke that 1/f noise represents a close fit to many dynamic phenomena found in nature [Dodge 1988]. He drew the analogy between his recursively ‘time-filling’ process and Mandelbrot’s recursively ‘space-filling’ curves. The time-filling fractal form is seeded by an initial pitch sequence, which is then filled in by 1/f noise and mapped to musical pitch, rhythm and amplitude. Harley produced an interactive algorithm that centres on a ‘generator’ which provides the output of a recursive logistic differential equation; a ‘mapping’ module which scales the output to a range specified by the user; a third module which provides statistical data on the generator’s output over specified time-frames to provide knowledge of high-level structures to the user; and a fourth module which the user controls to reorder the generator output in the process of translating it to musical parameters [Harley 1995]. These modules can be networked together in order to act as raw input or as input ‘biases’ for one another.

There are several examples in the algorithmic composition literature of the use of Lindenmayer Systems (L-Systems) for generating fractal-like structures. L-Systems were originally introduced to model the cellular growth of plants [Lindenmayer 1968], and first explored for musical applications by Prusinkiewicz [Prusinkiewicz 1986]. L-Systems are deterministic and expressed almost identically to Chomsky’s grammars, with the crucial difference being that instead of production rules applying sequentially, they are applied concurrently; this is what allows self-similar substructures to quickly propagate through what are exponentially expanding strings. The work by DuBois is a recent example of the use of L-Systems for musical composition [DuBois 2003]. The author separated the process into string production and string parsing, and noted that choosing the mapping scheme to use for the latter stage was critical to the aesthetic qualities of the result. He described various mapping schemes, such as ‘event mapping’, where a pre-compositional process assigns the tokens in the resulting one-dimensional string to events like notes, rests and chords; and ‘spatial mapping’, where tokens represent distances in pitch from the preceding note, and can be used to create block chords or combined with event mapping to create melodies. An additional scheme involves ‘parametric mapping’, where tokens are not assigned to musical parameters directly, but to controllers affecting the mapping of subsequent tokens to musical events. DuBois output the intermediate result as musical notation, which was then interpreted by professional performers [DuBois 2003].
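The parallel rewriting that distinguishes an L-System from a Chomsky grammar can be sketched in a few lines of Scheme; the two-symbol rule set below is the classic Fibonacci example and is not taken from any of the systems discussed here.

;; Deterministic 0L-system: every token is rewritten simultaneously at
;; each generation.
(define l-rules
  '((a . (a b))
    (b . (a))))

(define (rewrite-token tok rules)
  (let ((entry (assoc tok rules)))
    (if entry (cdr entry) (list tok))))

;; one generation: apply the rules to every token in parallel
(define (l-step tokens rules)
  (apply append (map (lambda (t) (rewrite-token t rules)) tokens)))

(define (l-generate axiom rules generations)
  (if (= generations 0)
      axiom
      (l-generate (l-step axiom rules) rules (- generations 1))))

;; (l-generate '(a) l-rules 4) => (a b a a b a b a)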

These approaches have offered alternatives to the reliance on both implicit and explicit musical domain knowledge, while allowing for the successful generation of coherent self-similar structures; and many authors have espoused their use in algorithmic composition in a general sense because of their scope for creating genuinely new musical material. However, they all ultimately put the user in charge of completing the act of composition by inventing a meaningful mapping from the data stream to musical parameters, which from a musical standpoint is hardly any different to the ‘auralisation’ of actual natural phenomena such as seismic activity [Boyd 2011] or tree-ring patterns.2

2.2.7 Cellular Automata

Cellular automata (CA) provide a means for the generation of complex emergent structures from the local interaction of simple, usually orthogonally-interconnected units. They have become a popular paradigm for exploring the analogies between mathematical models and biological phenomena. The motivation for the use of CA in computer-aided composition is cited by Miranda as being an expert system’s hard-wired inability to compose new musical styles [Miranda 2003]. Some types of CA bear a strong relationship to chaotic dynamical systems because they exhibit unpredictable behaviour at the macroscopic level despite being deterministic. This was formally identified by Wolfram, who devised a widely-referenced taxonomy for describing CA types [Wolfram 2002]. CA can also be described mathematically in terms of finite state automata [Neumann and Burks 1966], and ‘static’ L-Systems [DuBois 2003].

A CA consists of a grid of cells which begin in an arbitrary initial configuration and update their states at every time-step during execution. At a given time-step, t, the new state of a cell is determined by the state of its orthogonal neighbours at time t − 1 using a set of evolution rules specified before run-time. Cell states are usually binary or ternary, and cell types are often classified using a ‘KxRy’ notation, where x refers to the number of immediate neighbours and y refers to the radius of influence. CA are also classified according to their number of possible evolution rules, which is a function of the number of possible cell states, the radius of cell influence and the number of immediate neighbours. Wolfram’s taxonomy identified four different classes of CA behaviour [Wolfram 2002]:

Type 1 ‘convergent’, in which a static uniform grid state is quickly reached;

Type 2 ‘steady cycle’, in which stable repeating patterns quickly emerge;

Type 3 ‘chaotic’, in which no stable patterns emerge and any apparent structures are transient;

Type 4 ‘complex’, in which interesting patterns are perceivable but no stability occurs until after a large number of time steps.
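A single synchronous update of a one-dimensional binary CA, as described above, can be sketched as follows. The wrap-around neighbourhood and the choice of Wolfram’s elementary rule 110 are illustrative assumptions, not details of any of the systems surveyed in this section.

;; indices 0 .. n-1
(define (range n)
  (let loop ((i 0)) (if (= i n) '() (cons i (loop (+ i 1))))))

;; Wolfram's elementary rule 110 as a function of (left self right).
(define (rule-110 l c r)
  (if (memv (+ (* 4 l) (* 2 c) r) '(1 2 3 5 6)) 1 0))

;; One synchronous update step of a 1D binary CA with wrap-around edges.
(define (ca-step cells rule)
  (let ((n (length cells)))
    (map (lambda (i)
           (rule (list-ref cells (modulo (- i 1) n))
                 (list-ref cells i)
                 (list-ref cells (modulo (+ i 1) n))))
         (range n))))

;; (ca-step '(0 0 0 1 0 0 0) rule-110) => (0 0 1 1 0 0 0)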

The mapping of a CA to a musical parameter space is non-trivial, and as important to the act of composition as choosing the rule set. Frequently the resulting patterns are mapped to pitches restricted to a certain scale, such as chromatic, pentatonic or diatonic [Millen 2004]. Miranda distinguished between simplistic mappings of grid cells to MIDI note numbers and the more sophisticated method of mapping structural changes in groups of cells to higher-level musical structures [Miranda 2003]. Bilotta et al. identified analogous mapping categories of ‘local’ and ‘global’ [Bilotta et al. 2001]. They also use ‘indirect’ methods of manipulating the structure of the information contained in the CA before translating it into music [Bilotta and Pantano 2002]. Resultant structures characterised by researchers as ‘gliders’, ‘beetles’, ‘solitons’, ‘spiders’ and ‘beehives’ contain varying degrees of recognisable musical harmonies when mapped directly from cell states [Bilotta and Pantano 2002].

Miranda presented a CA system for algorithmic composition called CAMUS for mapping Conway’s Game of Life to a harmonic musical output using each cell’s coordinates [Miranda 2003]. Bilotta et al. described a series of musical works produced using a genetic algorithm to further evolve the musical information resulting from a mapping of a binary CA’s output to musical parameters [Bilotta et al. 2000]. They concluded that type 1 CA are good for rhythmic generation, types 2 and 4 are good for harmonic generation, and type 3 are less useful except with very simple initial conditions. CA also feature in several interactive compositional or improvisational tools. Millen presented such a system wherein the musical parameters that the cells map to can be altered by the user during performance in reaction to visual observation of the grid state [Millen 2004].

Dorin used boolean networks (BNs) instead of CA to produce complex polyrhythms [Dorin 2000]. BNs are one-dimensional configurations of binary state machines — that is, each unit performs a boolean operation using the inputs from its two neighbours. An autonomous, synchronous boolean network is a special case of a CA [Dorin 2000]. Dorin observed that it is rare for a BN’s stable pattern to be broken even when significantly perturbed in real-time, and that this makes them ideal for generating rhythmic material for live applications. Dorin also produced a CA mounted on the faces of a virtual cube called LIQUIPRISM, distinguishing it from the more common form of CA environment which models the surface of a torus [Dorin 2002]. A stochastic element is introduced by occasionally activating cells which have been in ‘off’ states after substantial periods of inactivity. The mapping from the CA to music in any given time-step is done through a process of eliminating cells which are not moving from off to on and then selecting a maximum of two cells from each face. Each face maps to a MIDI channel being fed into a synthesiser.

Miranda believes that CAs are appropriate tools for generating new material, but concedes that they seem better suited to synthesis than composition. In his estimation the musical results “tend to lack the cultural references that we normally rely on when appreciating music” [Miranda 2003]. Bilotta et al. noted that as a general rule, only a very small subset of the available rule sets give ‘appreciable’ musical results, but that certain configurations can generate ‘pleasant’ harmony [Bilotta et al. 2001]. Dorin has demonstrated that the combination of musical and visual output of CAs can manifest as effective multimedia art [Dorin 2002].

2.2.8 Swarm Algorithms

Some researchers have pursued music generation by modelling the interaction between simple agents in artificial swarms. This model is often promoted as a remedy to the ‘lack of expression’ inherent in knowledge-based systems [Blackwell and Bentley 2002]. The approach relies on the self-organisation of agents to form complex emergent spatial and temporal structures. Beyls’ view in the context of music generation was that “behaviour may be thought of as an alternative to knowledge” [Beyls 1990]. Although this principle is also fundamental to the use of cellular automata, swarm algorithms can instead be traced back to the work of Reynolds, who proposed the first algorithms for modelling the emergent geometrical organisation of birds and other animals [Reynolds 1987]. Swarm agents are therefore generally much more sophisticated than the cells in a CA, being instilled with mobility in 2D or 3D space, sets of goals, many possible ‘social’ interactions [Miranda 2003] or ‘personality traits’ [Bisig et al. 2011], and sometimes finite energy sources which must be replenished by the swarm environment [Beyls 1990]. However, the resulting data streams are generally still at the mercy of the mapping problem; that is, finding a meaningful translation from an extra-musical data stream to a musical parameter space [Miranda 2001].

Blackwell and Bentley’s composition system based on swarm behaviour is perhaps the most widely referenced [Blackwell 2007]. In a system called SWARMUSIC [Blackwell and Bentley 2002] the agents or ‘particles’ implement the simple behaviours of swarm attraction and repulsion within the environment of a 3D ‘box’. The authors argue that this style of behaviour constitutes a form of ‘swarm improvisation’, conceding that compositional structure generally cannot be achieved by such simple behaviour. A linear mapping occurs in three dimensions corresponding to the particles’ positions in the box from the perspective of a hypothetical viewer. These dimensions, which correspond to particles’ x, y and z coordinates, are MIDI duration, MIDI pitch and MIDI velocity. The default ranges of these parameters are constrained for the purpose of Blackwell and Bentley’s implementation. According to the authors, the purported success of the free improvisation system is due to its focus on swarm ‘collaboration’ and ‘expression’ – it develops its own ‘musical language’ rather than attempting to assume a pre-existing one [Blackwell and Bentley 2002].

Miranda described a system for producing music using a community of simple agents with ‘auditory, motor and cognitive skills’ who collectively ‘evolve’ a set of melodies, but without the use of a genetic algorithm [Miranda 2003]. This system is an example of a swarm approach that does not require the mapping from emergent structure to musical information. Miranda’s approach encodes melody using an abstract representation of pitch trajectories forming an overall contour. The contour elements dictate relative magnitudes of pitch changes, rather than the actual intervals. Agents are instilled with the goal of imitating what they ‘hear’, and so develop individual sets of (initially random) tunes by gauging the tunes’ success through reinforcement from other agents. Elements of tunes which are also exhibited by other members of the community are strengthened, and those elements which aren’t are eventually purged. In this way a communal musical ‘repertoire’ is established [Miranda 2003].

Bisig et al. [Bisig et al. 2011] discussed another example of a swarm approach to algorithmic composition, but this time they confronted what they term the ‘mapping challenge’ by proposing to shift the focus of musical creation from the mapping itself to the types of underlying structures created by the flocking simulation. Similar to Blackwell and Bentley’s system [Blackwell and Bentley 2002], this simulation implements neighbourhood forces of attraction and repulsion which determine the swarm’s behaviour. Agents are also endowed with ‘adaptive traits’ which change over time and affect their interaction with the rest of the swarm. The system’s architecture is split into three stages: the swarm itself, a module which interprets and codifies the behaviour of the swarm, and a musical engine which integrates elements of sample playback and granular synthesis. Different pieces are composed by changing the properties of the agents and their environment. Each composition is based on a ‘core idea’, such as the triggering of piano notes via swarm collisions, or the changing spatial distribution of agents to generate rhythms. The authors point out that the success of a swarm algorithm for generating music relies on the continual injection of human creativity in regard to the design of the mapping schemes and the design of the simple rules governing agent behaviour [Bisig et al. 2011].

2.3 The Automated Schillinger System in Context

The automated Schillinger System presented in chapter 3 of this thesis uses a set of generative and transformational procedures, each invoked sequentially and seeded with random numbers. It is not interactive and does not rely on a corpus of existing musical works. Although the generative procedures are necessarily rule-based, inasmuch as they are computable, the rules dictate the space of numerical patterns available at each stage of the composition process, rather than the space of legal musical combinations. Therefore, although the system clearly employs a form of implicit musical knowledge, whether or not it falls under the umbrella of style imitation is initially unclear. This question will be examined in detail in chapter 4. Furthermore, despite the fact that the system’s musical knowledge is essentially ‘engineered’, it may not be entirely correct to label it an expert system in the manner of Ebcioglu [Ebcioglu 1988] or Cope [Cope 1987], due to the fact that it does not use a knowledge-base/inference engine architecture [Mingers 1986]. In figure 2.1 a dashed line has been placed around the automated Schillinger System, which tentatively includes it in the realm of musical expert systems.

Schillinger’s system as a whole does not lend itself to the adaptation of any particular extra-musical computational approach listed in section 2.2, unlike other music theory treatises such as those by Piston [Piston 1987] and Hindemith [Hindemith 1945] which have been partially implemented using Markov chains [Rohrmeier 2011; Sorensen and Brown 2008]; or standard harmony texts which can be partially expressed as grammar-based optimisation problems [Ebcioglu 1988] or GA fitness functions [Phon-Amnuaisuk 2004]. Its automation therefore falls into Supper’s first category (algorithms which encode musical theory without the use of an established extra-musical approach), and partly into Supper’s second category (algorithms used as a direct manifestation of a composer’s expertise) [Supper 2001], due to the necessity for the programmer to define many aspects of the formal interfacing between Schillinger’s various theories. In the academic literature, the category into which the automated Schillinger System most readily falls is Ames’ definition of ‘bottom-up processing’, which refers to the piecing together of ‘kernels’ of primary material into larger compositions using transformation procedures [Ames 1987].

The system presented in this thesis positions itself as a particular collection of algorithms for music generation which have not been previously considered as a single entity for implementation, despite the fact that many of them are commonly used individually, and are thus familiar to computer music researchers in a variety of contexts. As can be seen in figure 2.1, the automated Schillinger System sits within a class of algorithms that process some form of musical domain knowledge, but do not rely on a data-driven or interactive approach to derive that knowledge. This causes it to fall outside of the most common approaches used by computer-aided composition researchers, but nevertheless into categories acknowledged by both Ames [Ames 1987] and Supper [Supper 2001].


Chapter 3

Implementation of the Schillinger System

3.1 Introduction

This chapter details the construction of an ‘automated Schillinger System’ based solely on The Schillinger System of Musical Composition. The books of the Schillinger System which have been considered in the scope of this work are Theory of Rhythm, Theory of Pitch-scales, Variations of Music by Means of Geometrical Projection, and Theory of Melody. Together these theories have been adapted to produce a pair of separate modules, one for composing harmonic passages and another for composing melodic pieces. Both modules operate using the ‘push-button’ paradigm and thus require no interaction with the user during the composition process.

Sections 3.2 to 3.5 of this chapter constitute a condensed summary of the first four books of Schillinger’s original text to the extent necessary to explain the fundamentals behind the current automated system. It will be seen that much of this content is problematic to realise as a computer implementation and requires the resolution of inconsistencies or inadequate definitions. Despite this, it is not the purpose of this chapter to critically evaluate the practical merit of Schillinger’s formalism, nor the mathematical or scientific correctness of any of Schillinger’s generalisations, all of which are matters of contention as noted in section 1.2.3.

Section 3.6 documents the software architecture of the automated Schillinger System and describes how Schillinger’s separate theories have been linked together to form the harmonic and melodic modules. It also describes various additional algorithms which have been necessary to complete this task. The final section (3.7) lists the parts of books I–IV which have been omitted from the current system for various reasons as discussed there.

The discussions of Schillinger’s procedures will not be accompanied by explicit references to his original text; however, a listing of the most important functions constituting the automated Schillinger System can be found in appendix C, and this list may be used to refer directly back to Schillinger’s volumes if desired.



3.1.1 A Brief Refresher

There are many musical terms used throughout this chapter that readers may not be familiar with, or that have different definitions in other disciplines. This section explains some terminology which should facilitate the discussion while minimising potential confusion. Many of these definitions are not rigorous in terms of their broader implications, but are nevertheless adequate in the current context.

Pitch/Tone The fundamental frequency of a sound with respect to a discrete system of musical tuning, in this case the 12-tone equally-tempered system featured on a standard piano keyboard.

Identity The name assigned to a pitch within a system of tuning.

Semitone The smallest distance between any two pitches in the aforementioned tuning system, produced by raising or lowering a pitch’s frequency by a factor of 2^(1/12) (the twelfth root of 2).

Interval The distance between two pitches measured in semitones.

Octave 12 semitones; the interval at which two pitches share the same identity as a result of their frequencies differing by a factor of 2.

Register A localised region of the pitch space, applied either as a general notion (for example ‘high’/‘middle’/‘low’) or as a specific range of pitches.

Scale A group of pitches or intervals which serve as a basis for generating musical pitch material.

Diatonic Relating to only the pitches belonging to a class of Western scales made up of seven tones.

Chromatic The property of pitches of a musical passage or scale being separated by semitones, or containing alterations to diatonic pitches.

Tonic The starting pitch in a scale, and/or the pitch that acts as the most important musical reference point for a given composition or passage.

Root The starting pitch in a scale.

Duration The length of time between the onset and conclusion of a sounding pitch, usually relative to some reference value or measurement. In this chapter the term ‘relative duration’ will be used specifically to refer to that which is relative to a minimum time-span of 1.

Note Usually interchangeable with pitch and identity, but also used to mean a discrete unit of musical information possessing duration.


Rhythm A sequence of durations.1

Voice A sequence of single notes related in succession.

Voice-leading The rules or procedures which apply when determining the movement of individual voices within the larger wholes of harmony and counterpoint.

Polyrhythm Multiple differing rhythms occurring simultaneously.

Texture A term encompassing various aspects of music such as its density in the temporal and spectral domains or its aesthetic ‘surface quality’.

Attack The temporal point of onset of a sounding pitch.

Dynamics Variations in loudness or intensity.

Modulation The change or period of change from one tonic to another.

MIDI Musical Instrument Digital Interface; the dominant protocol for passing symbolic musical information between both hardware and software synthesisers.

In addition to these terms, this chapter uses a standard known as ‘Scientific Pitch Notation’2, where a pitch’s label consists of its identity followed by its octave number. Pitches C4–B4 lie in the octave above and including middle-C on a piano keyboard. It should also be noted that MIDI note values range from 0–127, with the value 60 being equivalent to C4.3
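The convention is easy to express in code; the small helper below (not part of the thesis system) converts a MIDI note number to its Scientific Pitch Notation label.

;; MIDI note number -> Scientific Pitch Notation label (e.g. 60 -> "C4").
(define pitch-names
  '("C" "C#" "D" "D#" "E" "F" "F#" "G" "G#" "A" "A#" "B"))

(define (midi->spn midi)
  (let ((octave (- (quotient midi 12) 1))   ; MIDI octave numbering starts at -1
        (name (list-ref pitch-names (modulo midi 12))))
    (string-append name (number->string octave))))

;; (midi->spn 60) => "C4"    (midi->spn 0) => "C-1"    (midi->spn 127) => "G9"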

The use of Schillinger’s terminology will be kept to a minimum, because not all of it is especially helpful in simplifying the expression of ideas. Many problems with Schillinger’s heavy use of jargon were pointed out forcefully by Barbour [Barbour 1946] and Backus [Backus 1960]. Despite this, several of the terms are still useful because they serve as short-hand for certain data structures which will be referred to frequently. All instances of Schillinger’s terminology will be defined as needed.

3.1.2 The Impromptu Environment

The system is written in a programming environment called Impromptu, an interpreter with the advantage of built-in interfaces to MIDI and audio drivers. It also has the feature of being able to execute selected portions of the text buffer at the user’s behest, known as ‘live coding’ [Sorensen and Gardner 2010]; however, as the purpose of this program is to compose musical passages autonomously rather than facilitate real-time performances, this feature is not being exploited at present.

1 This is an extremely simplistic notion of rhythm which only applies to the current version of the automated Schillinger System.
2 This standard has been in use since its adoption by the Acoustical Society of America in 1939.
3 MIDI notes 0 and 127 correspond to C-1 and G9 respectively. These pitches exist well beyond the usable musical range.


The reason Impromptu has been used is that it allows for rapid development in the LISP-based language Scheme, which has been found by many authors in the field of algorithmic composition to be appropriate for representing musical information. The built-in MIDI interface also allows for instant musical feedback and hence much faster debugging of functions operating in the musical domain. Other algorithmic composition environments such as SuperCollider (supercollider.sourceforge.net) or Max [Zicarelli 2002] would have been equally appropriate for developing the automated Schillinger System.

The system outlined in this chapter manipulates two dimensions of musical information at the symbolic level (pitch and duration), which are able to be conveniently mapped to both MIDI data streams and musical notation. In the Impromptu environment, the LISP-style list format is used for coding. Many instances of list notation will accordingly be used throughout this chapter for illustrative purposes. Pitch is represented as MIDI note numbers. Duration is represented as both ‘relative durations’ during the composition process (defined in section 3.1.1), and at the output stage by durations numerically equivalent to those displayed in standard musical notation.

3.2 Theory of Rhythm

Schillinger’s Theory of Rhythm provides procedures which are mostly used to generate and manipulate sequences of relative durations. In this chapter Schillinger’s term ‘rhythmic resultant’ will be used to refer to a sequence of relative durations produced by a rhythmic procedure. Depending on the context, a rhythmic resultant will be treated as either a rhythm to be assigned to a pitch sequence, or a pattern with which to apply change at any structural level.

3.2.1 Rhythms from Interference Patterns

The ‘interference’ between any number of lists of integers is generated by treating the integers as temporal durations, superimposing the lists and forming a single new list out of the onsets of every duration. A small example is included in figure 3.1 to accompany this explanation.

Figure 3.1: The interference pattern generated from two lists. The top two lists (3 3) and (2 2 2) produce the resultant pattern (2 1 1 2).

The situation in the figure is expressed as follows:

interference-pattern((3 3) (2 2 2)) = (2 1 1 2)
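A minimal Scheme sketch of this procedure is given below. The helper names and the variadic argument convention are my own (the thesis functions listed in appendix C may differ), and the input lists are assumed to have equal total durations.

;; Convert a list of relative durations into the onset time of each duration.
(define (durations->onsets durs)
  (let loop ((ds durs) (t 0))
    (if (null? ds)
        '()
        (cons t (loop (cdr ds) (+ t (car ds)))))))

;; Merge two ascending onset lists, discarding duplicates.
(define (merge-onsets a b)
  (cond ((null? a) b)
        ((null? b) a)
        ((< (car a) (car b)) (cons (car a) (merge-onsets (cdr a) b)))
        ((> (car a) (car b)) (cons (car b) (merge-onsets a (cdr b))))
        (else (cons (car a) (merge-onsets (cdr a) (cdr b))))))

;; Convert onsets back into durations, given the total span of the pattern.
(define (onsets->durations onsets total)
  (if (null? (cdr onsets))
      (list (- total (car onsets)))
      (cons (- (cadr onsets) (car onsets))
            (onsets->durations (cdr onsets) total))))

;; Interference of any number of duration lists with equal totals.
(define (interference-pattern . lists)
  (let ((total (apply + (car lists))))
    (let loop ((ls (cdr lists)) (acc (durations->onsets (car lists))))
      (if (null? ls)
          (onsets->durations acc total)
          (loop (cdr ls) (merge-onsets acc (durations->onsets (car ls))))))))

;; (interference-pattern '(3 3) '(2 2 2)) => (2 1 1 2)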



A particular space of symmetrical rhythmic resultants called ‘primary resultants’ is formed by the interference between two integers, where each integer’s duration repeats until the point where they both synchronise. Figure 3.1 above shows the generation of a primary resultant from the arguments 2 and 3.

primary-resultant(2 3) = (2 1 1 2)

A ‘secondary resultant’ is generated by recursively calculating the interference pattern between a primary resultant and the same resultant offset by the larger of its two initial parameters, until it has a total duration of the square of the larger parameter. This is visualised in figure 3.2.

Figure 3.2: The secondary resultant of integers 2 and 3.

secondary-resultant(2 3) = (2 1 1 1 1 1 2)
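Continuing the sketch begun in section 3.2.1 (and reusing its helper functions), the primary and secondary resultants can be expressed as follows. The secondary resultant is built here by overlaying the primary pattern at successive offsets of the larger generator until the square of that generator is reached, which is equivalent to the recursive description above; the function names are again my own.

;; Repeat a single duration until it fills the given span.
(define (repeat-to x span)
  (if (= span 0) '() (cons x (repeat-to x (- span x)))))

(define (primary-resultant a b)
  (let ((span (lcm a b)))
    (interference-pattern (repeat-to a span) (repeat-to b span))))

(define (secondary-resultant a b)
  (let* ((big (max a b))
         (cycle (lcm a b))
         (span (* big big))
         (base (durations->onsets (primary-resultant a b))))
    (let loop ((offset 0) (acc '()))
      (if (> (+ offset cycle) span)
          (onsets->durations acc span)
          (loop (+ offset big)
                (merge-onsets acc (map (lambda (t) (+ t offset)) base)))))))

;; (primary-resultant 2 3)   => (2 1 1 2)
;; (secondary-resultant 2 3) => (2 1 1 1 1 1 2)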

The term ‘tertiary resultant’ will be used to refer to either one of a pair of rhythmic resultants which form a polyrhythm – one rhythm existing as the ‘lead’ and one as the ‘accompaniment’. In the current system the lead and accompaniment resultants are treated as separate entities (see section 3.7). This function accepts three integers instead of two, but otherwise uses the same interference method as for a primary resultant. The ‘lead’ resultant is the pattern formed by all three integers, while the ‘accompaniment’ is formed by the interference of their respective complementary factors with respect to a lowest common multiple. In line with Schillinger’s suggestion, the three-integer parameter lists for the tertiary resultant generator are limited to integers which belong to the same summation (Fibonacci) series.

tertiary-resultant-lead(2 3 5) = (2 1 1 1 1 2 1 1 2 2 1 1 2 2 1 1 2 1 1 1 1 2)

tertiary-resultant-accompaniment(2 3 5) = (6 4 2 3 3 2 4 6)

Three trivial ways of combining primary and secondary resultants to form modest self-contained rhythmic patterns are mentioned, each of which utilises a single pair of parameters. They are listed using Schillinger’s terms below:

Balance: a concatenation of the secondary resultant, the primary resultant, and the relative duration equivalent to the larger of the two parameters;

Expand: a concatenation of the primary resultant and the secondary resultant;


Contract: a concatenation of the secondary resultant and the primary resultant.

res-combo-balance(2 3) = (2 1 1 1 1 1 2 2 1 1 2 3)

res-combo-expand(2 3) = (2 1 1 2 2 1 1 1 1 1 2)

res-combo-contract(2 3) = (2 1 1 1 1 1 2 2 1 1 2)

3.2.2 Synchronisation of Multiple Patterns

Material may be obtained by the synchronisation of a rhythmic resultant with a sequence of arbitrary elements, the latter of which may represent pitch values or higher-level structural elements. There are two procedures used in this implementation which fall under this umbrella. The first procedure combines elements from the cyclic repetitions of each sequence until both sequences end simultaneously. The result of this is a pair of sequences each containing a number of elements equal to the lowest common multiple of the lengths of the two inputs. Figure 3.3 contains a musical example to illustrate the concept in visual terms.

Figure 3.3: The synchronisation of a duration pattern with a pitch sequence. Each pitch is paired with a duration, in a cyclic fashion, until both sequences end simultaneously.

The second procedure interprets the rhythmic resultant as a sequence of coefficients C of length m, and synchronises it with an arbitrary sequence of elements E of length n, such that element e(i mod n) is appended to the result c(i mod m) times. This continues until the last elements in both C and E are processed simultaneously. The results of this procedure are often used as parameter vectors for input to other procedures. In the following example, the element ‘0’ is repeated three times, ‘1’ is repeated twice, and so on.

coefficient-sync((3 2 1) (0 1)) = (0 0 0 1 1 0 1 1 1 0 0 1)
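A sketch of this coefficient synchronisation in Scheme (again using my own function names; the process runs for lcm(m, n) steps, at which point the last elements of C and E coincide):

(define (repeat-element x times)
  (if (= times 0) '() (cons x (repeat-element x (- times 1)))))

;; Append E[i mod n] exactly C[i mod m] times, for i = 0 .. lcm(m, n) - 1.
(define (coefficient-sync coeffs elements)
  (let* ((m (length coeffs))
         (n (length elements))
         (steps (lcm m n)))
    (let loop ((i 0))
      (if (= i steps)
          '()
          (append (repeat-element (list-ref elements (modulo i n))
                                  (list-ref coeffs (modulo i m)))
                  (loop (+ i 1)))))))

;; (coefficient-sync '(3 2 1) '(0 1)) => (0 0 0 1 1 0 1 1 1 0 0 1)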

3.2.3 Extending Rhythmic Material Using Permutations

Schillinger provides a small set of methods for building longer and more complex rhythmic patterns from the variations of short and simple ones. The predominant method of achieving variation throughout Schillinger’s system is by permutation. The ‘circular permutations’ of a sequence are a subset of the complete permutations of that sequence, formed by iteratively moving the last element of a sequence to the head or vice versa; for example:


circular-permutations(2 1 1) = ((2 1 1) (1 2 1) (1 1 2))

For most purposes the circular permutations are recommended by Schillinger because they retain substructures present in the original material. An example below shows the use of circular permutations to build a longer duration sequence — a ‘continuity’ to use Schillinger’s term — from a shorter one.

general-continuity(2 1 1) = (2 1 1 1 2 1 1 1 2)
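These two operations can be sketched as follows, rotating the last element to the head as in the example above; the function names are my own.

;; Move the last element of a list to the head.
(define (rotate-last-to-head lst)
  (let loop ((rest lst) (front '()))
    (if (null? (cdr rest))
        (cons (car rest) (reverse front))
        (loop (cdr rest) (cons (car rest) front)))))

;; All circular permutations, starting from the original ordering.
(define (circular-permutations lst)
  (let loop ((current lst) (count (length lst)))
    (if (= count 0)
        '()
        (cons current (loop (rotate-last-to-head current) (- count 1))))))

;; A 'continuity': the concatenation of all circular permutations.
(define (general-continuity lst)
  (apply append (circular-permutations lst)))

;; (circular-permutations '(2 1 1)) => ((2 1 1) (1 2 1) (1 1 2))
;; (general-continuity '(2 1 1))    => (2 1 1 1 2 1 1 1 2)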

Three further methods of deriving new patterns through circular permutations can apply to sequences which are already the required total duration. In the first two instances, the sequences are assumed to be primary, secondary or tertiary resultants.

• Split the sequence into a set, S, of n groups of equal total duration, such that n > 1 and is the smallest factor among the integers used to generate the original sequence. Select from the circular permutations of S.

• Split the sequence into a set of groups, S, where each group is of total duration n and n is the larger of the integers used to generate the original sequence. Select from the circular permutations of S.

• Select from the circular permutations of the original sequence.

3.2.4 Rhythms from Algebraic Expansion

A space of non-symmetrical rhythmic resultants can be obtained by a method of algebraic expansion. A relative duration sequence D of length n and total duration d is raised to a power x using a brute-force method with no intermediate summations. The resultant is a sequence of n^x terms with a total duration of d^x. For example:

(2 1 1)^2 = (4 2 2 2 1 1 2 1 1)

An additional important part of this procedure, as far as Schillinger is concerned, is overlaying the resultants of all powers 0, 1, ..., x to form a texturally rich polyrhythm. This is done by multiplying the elements of each resultant D^i (for i < x) by the scalar d^(x−i), so that every resultant has the same total duration d^x. As mentioned in section 3.7, polyrhythms are not within the scope of this work; instead the resultants are treated individually.
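The expansion itself is a term-wise multiplication with no intermediate summations, which can be sketched as:

;; Multiply every element of base by every element of seq, in order.
(define (expand-once base seq)
  (apply append
         (map (lambda (b) (map (lambda (s) (* b s)) seq)) base)))

;; Raise a relative duration sequence to the power x.
(define (algebraic-expansion seq x)
  (let loop ((i 0) (acc '(1)))
    (if (= i x)
        acc
        (loop (+ i 1) (expand-once acc seq)))))

;; (algebraic-expansion '(2 1 1) 2) => (4 2 2 2 1 1 2 1 1)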

3.3 Theory of Pitch Scales

Schillinger’s Theory of Pitch-scales contains both scale generation techniques and harmony generation techniques. Schillinger’s long and detailed theories of harmony have not been considered in the current scope of the work due to time constraints; instead, the harmony generator that is discussed in section 3.6.2 derives its initial chord progressions from the procedures in section 3.3.4.


In this section, and throughout the rest of the chapter, the term ‘scale’ will be used to refer to a sequence of intervals, while ‘pitch-scale’ will refer to a sequence of pitches instantiated from a scale using a tonic pitch. Algorithmic composition researchers tend to prefer one representation over the other depending on the nature of the problem being attempted; the automated Schillinger System uses both of these representations, each depending on the requirements of the procedure at hand. A scale is variously converted into a ‘local’ pitch-scale for some purposes and a ‘full’ pitch-scale for others: the local pitch-scale will contain one more pitch than the number of intervals in the scale, while the full pitch-scale is the enumeration of a scale over the entire span of the valid pitch range (in this case, MIDI note values 0–127).

3.3.1 Flat and Symmetric Scales

The first group of scales will be known as ‘flat’ scales. A flat scale is a list of intervals with no sub-lists. Such a scale is defined by Schillinger as having a range of less than one octave – that is, a maximum range of 11 semitones – and a number of intervals between 1 and 6. Aside from these two constraints, randomly generated flat scales are uniformly distributed over the space within the octave.

So-called ‘Western’ scales are a subset of the six-interval scales which, Schillinger argues, can be shown to be built from ‘tetrads’ — four-note combinations implying three-interval scales. The three-interval scales from which Western scales may be built are (2 2 1), (2 1 2), (1 2 2) and (1 3 1). An arbitrary Western scale is built by joining two of these sub-scales with a centre interval of 2, and subsequently removing the last interval (the last interval produces a repetition of the tonic at the octave which is not necessary for its completeness). For example, the scale known as harmonic minor is formed like so:

(2 1 2) (2) (1 3 1) → (2 1 2 2 1 3)

In this implementation, a boolean parameter passed to the flat scale generator specifies whether or not to restrict six-interval scales to Western scales (see the parameter settings in section 3.6.3).
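The construction rule can be sketched directly; here the two tetrad sub-scales are picked explicitly, whereas the thesis generator selects them at random.

;; The four three-interval 'tetrad' sub-scales from which Western scales
;; are built.
(define western-tetrads
  '((2 2 1) (2 1 2) (1 2 2) (1 3 1)))

;; Join two sub-scales with a centre interval of 2 and drop the final
;; interval (the octave repetition of the tonic).
(define (build-western-scale lower upper)
  (reverse (cdr (reverse (append lower '(2) upper)))))

;; (build-western-scale '(2 1 2) '(1 3 1)) => (2 1 2 2 1 3)   ; harmonic minor
;; (build-western-scale '(2 2 1) '(2 2 1)) => (2 2 1 2 2 2)   ; major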

A ‘symmetric’ scale consists of a group of identical sub-scales spaced at equal intervals over a specified number of roots which are relative to an arbitrary tonic. These scales may span one or more octaves. They are represented by a three-element set consisting of a flat scale, the number of roots and the interval between the roots. Though this is not the place to go into detail about the implications of twelve-tone equal temperament, it is enough to state that a number of roots equal to a factor of twelve is required for the scale to be both mappable onto the tuning system in question and repeat at some number of octaves while remaining ‘symmetric’. The possible forms of symmetric scale are listed in table 3.1.

In all nine cases the maximum range of the sub-scales is one semitone less than theroot interval, and the range is allowed to be zero. A symmetric scale is generated byrandomly selecting one of the nine types, and then selecting a random flat scale of theappropriate maximum range to be the sub-scale associated with each root. In many

Page 49: A Computer Model for the Schillinger System of Musical Composition

§3.3 Theory of Pitch Scales 37

Table 3.1: Symmetric Scale Properties

Roots Total Range Root Interval Total Range Root Interval

2 12 (1 8ve) 6 - -3 12 (1 8ve) 4 24 (2 8ves) 84 12 (1 8ve) 3 36 (3 8ves) 96 12 (1 8ve) 2 60 (5 8ves) 1012 12 (1 8ve) 1 132 (11 8ves) 11

cases a symmetric scale must be ‘flattened’ by concatenating a number of sub-scalesequal to the number of roots, each appended with an appropriate interval to fill in thespace between each sub-scale and its following root. Symmetric scales contain muchmore information than flat scales. How this information is used by the harmonic andmelodic modules is discussed in sections 3.3.4 and 3.5.2 respectively.

3.3.2 Tonal Expansions

The tonal expansion of a pitch-scale increases the total interval range of the pitch se-quence while retaining the pitch identities (that is, the same notes in potentially dif-ferent registers). The expansion of order zero is defined as the original setting of apitch-scale; or more precisely, one in which its total interval content could not be re-duced while retaining all the pitch identities. The first-order expansion of a pitch-scaleis generated by cycling through the pitches and selecting every second pitch from thetonic (pitches 1, 3, 5 and so on), skipping over repeated pitches. The pitches in thenew sequence are register-adjusted so that the sequence increases in pitch. An exam-ple tonal expansion is given below and is visualised in figure 3.4.

0th order: (c d e f g a)

1st order: (c e g d f a)

Order 0 expansion (original)

Order 1 expansion

Figure 3.4: The tonal expansion of a pitch-scale.

The ith-order tonal expansion is therefore attained by selecting every (i+ 1)th pitchin the 0th-order pitch-scale and transposing them into order of increasing pitch in thesame manner as above.

To perform the tonal expansion of an arbitrary melodic sequence, the originalpitch-scale S of the melodic pitches must be known. After performing an ith-ordertonal expansion on S to obtain S′, a scale ‘translation’ function maps the pitches in

Page 50: A Computer Model for the Schillinger System of Musical Composition

38 Implementation of the Schillinger System

the sequence from S to their corresponding positions in S′. Pitches in the originalsequence that are not in S are left unmodified.

The tonal expansion of a scale, as opposed to a pitch-scale, is necessary in manyinstances. In this case an arbitrary tonic is set, the scale is converted into a local pitch-scale, the above expansion procedure is performed, and the resulting pitch-scale isconverted back into a flat scale.

3.3.3 Nearest-Tone voice-leading

Nearest-tone voice-leading aims to minimise the total interval movement betweeneach voice from one chord to the next in a harmonic passage. This procedure is sug-gested by Schillinger in lieu of the specific voice-leading techniques he introduces inlater theories of harmony. It is applied in many places in his text, but only informally,such that many of the demonstrations do not represent ‘optimal’ solutions. For thisimplementation, it will be assumed that the aim of nearest-tone voice-leading is in factto produce chord progressions with optimised minimum voice movement.

An example is given here of optimal nearest-tone voice-leading between two four-voice chords A and B. Chord A consists of fixed pitches, while the pitches in chord Bcan be octave-transposed (that is, moved ±12x semitones, x > 0) and reordered.

A = (72 67 64 45)

B = (72 56 55 51)

The total interval movement between voices of the unmodified pair of chords is 12— this is the result that must be minimised. The interval resulting from aligning a notebi with a j is found by transposing bi to a register such that |bi′ − a j| ≤ 6. The algorithmimplemented in this system first generates an interval matrix M representing the idealalignments between all possible pairs of pitches in A and B, where both chords consistof n voices.

M(A, B) =

|b0′ − a0| . . . |bn′ − a0|...

...|b0′ − an| . . . |bn′ − an|

=

0 4 5 35 1 0 44 4 3 13 1 2 6

The optimal voice-leading combination can be found by converting the matrix M

into a graph with adjacent rows and columns fully interconnected, in which nodesrepresent costs; and tracing a shortest path between either pair of opposite sides withthe constraint that no row or column can be visited twice (this would imply re-usinga pitch from B). This is shown in figure 3.5. Unfortunately, for the general case thegreedy solution for this problem is usually sub-optimal, so the algorithm uses a re-cursive depth-first search with back-tracking and pruning to guarantee an optimalpath.

The optimal nodes visited during the search correspond to the voice-leading inter-vals created from the best alignment of the two chords: thus, tracing the resulting path

Page 51: A Computer Model for the Schillinger System of Musical Composition

§3.3 Theory of Pitch Scales 39

0 4 5 3

5 1 0 4

4 4 3 1

3 1 2 6

Figure 3.5: Nearest-tone voice-leading search graph. The dotted line represents the sub-optimal greedy solution; the solid line is the optimal solution found by back-tracking.

through the graph from one side to the other gives the pairs of pitches from A and Bthat should be aligned to each other using octave transposition. In this example, theoptimal voice movement is found by substituting B, through subsequent reorderingand octave-transposition of its original elements, with the chord (72 67 63 44).This gives a total interval movement of 2. The result is visualised in figure 3.6.

A

B

Figure 3.6: The result of performing nearest-tone voice-leading with a fixed chord A and ad-justable chord B

The computational complexity of nearest-tone voice-leading for a sequence of mchords with n voices is O(mn!). However, in practice it runs significantly faster thanthe worst case scenario which would be equivalent to a brute-force approach. The po-tential for troublesome execution times for lengthy harmonic passages is offset by thefact that n is usually small. n ≤ 7 is used in the current system, and this encompassesa large range of potential harmonic textures and densities.

Page 52: A Computer Model for the Schillinger System of Musical Composition

40 Implementation of the Schillinger System

3.3.4 Deriving Simple Harmonic Progressions From Symmetric Scales

The Schillinger System provides two ways of deriving harmonic progressions frompitch-scales. This section will outline both procedures as they have been implementedand discuss the problem of choosing between them automatically.

The first procedure converts a pitch-scale into a progression of chords which aren-note aggregates of the pitch-scale units, where m is the number of notes in the pitch-scale and 2 ≤ n ≤ m. The number of chords in the series will always be equal to m.Chords with roots towards the top of the pitch-scale must inherit pitches beyond thepitch-scale’s range, octave-transposed from below. An example is given in figure 3.7for the symmetric scale ((5) 3 8) with C4 as the tonic and n = 3. The Ti bracketsdenote the roots and their sub-scales.

T0 T1 T2

Figure 3.7: Procedure 1: Extraction of n = 3 triadic harmony from a symmetric scale

Using this chord progression a ’hybrid harmony’ consisting of n + 1 voices isformed by adding a bass line centred an octave below the tonic. The bass line con-sists of the notes of the pitch-scale with their total interval range contracted throughoctave transpositions. The upper parts of the hybrid harmony are then processedusing nearest-tone voice-leading (see figure 3.9).

T0 T1 T2 T3

Figure 3.8: Procedure 2: Extraction of sub-scale tonal expansions from symmetric scale

The second procedure for deriving harmony from a symmetric pitch-scale is totake the 1st-order tonal expansions of each sub-scale, and to treat the resulting pitchsequences as chords. When the second procedure is used, the number of voices n isnecessarily equal to the number of roots in the symmetric scale. An example is givenin figure 3.8 for the symmetric scale ((2 3 2) 4 9) with C4 as the tonic. The Tibrackets denote the roots and their sub-scales. As tonal expansions near the top of thescale often get quite high above the musical staves, they have been transposed to thesame register for the figure 3.8 example; this does not occur in the system.

Page 53: A Computer Model for the Schillinger System of Musical Composition

§3.4 Variations of Music by Means of Geometrical Progression 41

No bass line is added in the second procedure to form a hybrid harmony. Theharmonic progression is processed using nearest-tone voice-leading as before. Afterthis processing the harmonies from each procedure appear as in figure 3.9.

Procedure 1 ('hybrid harmony'):

Procedure 2:

Figure 3.9: Results of initial harmonic procedures after nearest-tone voice leading

Deciding between the two procedures is not clear-cut. Schillinger states that whenthe original setting of symmetrical pitch-scale is ‘acoustically acceptable’, it is appro-priate to use procedure 1; while a lack of acoustic acceptability should invoke proce-dure 2. This term is not defined by Schillinger, so in order to automate the decisionthe terminology is interpreted to mean “containing sufficiently large intervals, on av-erage, to avoid resulting cluster chords”. This implementation defines an acousticallyacceptable symmetric scale to be one possessing both mean and mode intervals of≥ 3 semitones when converted to a flat scale. The tendency is then for sub-scales withmany close intervals to be expanded.

Whether a scale is ‘acoustically acceptable’ or not has little bearing on how muchconsonance or dissonance a harmonic passage will contain after it has been processedfurther using the method in section 3.4.2. Moreover, it is generally not the nature ofSchillinger’s system to discriminate between consonant and dissonant harmonies, be-cause this undermines his holistic approach to musical style. Determining this prop-erty automatically without any the use of any kind of musical sensibility inserts aseemingly haphazard constraint.

3.4 Variations of Music by Means of Geometrical Progression

Schillinger’s geometrical variations correspond partially with aspects of other musicaltheories, such as Schoenberg’s ‘serial’ technique [Rufer 1965]. They are also similar tothe operators employed in a variety of algorithmic composition approaches in theacademic literature, such as mutation operators in genetic algorithms [Biles 1994].

3.4.1 Geometric Inversion and Expansion

The inversion I of a single pitch x occurs with respect to a pivot note p. The inversionof an entire chord simply maps each chord pitch in this way. The pivot in the examplein figure 3.10 is C5.

I(x, p) = p− (x− p)

Page 54: A Computer Model for the Schillinger System of Musical Composition

42 Implementation of the Schillinger System

Original:

Inversion:

Figure 3.10: Pitch inversion

The value of the pivot in almost all cases in Schillinger’s text is either chosen arbi-trarily or fixed as the tonic pitch of the passage being inverted. This implementationalways uses the tonic as the pivot. In the case of a sequence of pitches or chordsconstituting a melodic or harmonic sequence, the pitches can either be inverted in-place with the above formula, or in the temporal domain by reversing both the pitchsequence and its associated duration sequence. The taxonomy of Schillinger’s ‘geo-metrical inversions’ follows in table 3.2. The common names for equivalents used inother musical theories are added for reference.

Type Description Common terminology1 No modification —

2Reversal of both pitch sequence and rhythm

Retrograde (R)sequence

3Inversion of individual pitches followed by

Retrograde Inversion (RI)a type 2 reversal

4 Inversion of pitches only Inversion (I)

Table 3.2: Schillinger’s inversion taxonomy

The expansion of material can occur in either the durational dimension or the pitchdimension, as is the case with geometrical inversions. The nth order expansion E of asingle note x with respect to a pivot p is given by the formula below, and the expansionof a single chord is mapped in the same way as shown by the example in figure 3.11,where p = C4 and n = 2.

E(x, n, p) = p + n(x− p)

Original:

Expansion:

Figure 3.11: Pitch expansion

As with inversion, the pivot value is sometimes chosen arbitrarily by Schillingerbut is usually the tonic pitch. This implementation always uses the tonic pitch as thepivot. Note that the 1st order expansion maps to the original chord, while the 0th orderexpansion projects every pitch onto the pivot. The same formula is used to expandthe pitches of harmonic or melodic material. Expansion in the temporal domain isperformed by simply multiplying a sequence of durations by a scalar.

Page 55: A Computer Model for the Schillinger System of Musical Composition

§3.4 Variations of Music by Means of Geometrical Progression 43

3.4.2 Splicing Harmonies Using Inversion

Schillinger provides what will be referred to henceforth as a ‘splicing’ procedure. Itfirst generates a vector of inversion types by synchronising a rhythmic resultant witha list of possible inversion types (see the second procedure in section 3.2.2), then usesthe vector to select and concatenate the inversions of chords from an initial chordsequence. Counters keep track of the points in the initial chord sequence that theprocedure is up to. Type 1 and 4 inversions cause a counter to move forwards, whiletype 2 and 3 inversions cause a counter to move backwards. This process continuesuntil the resulting sequence is the same length as the original. Once this splicingprocedure is complete, the voices in the starting chord are shuffled, and an applicationof the nearest-tone voice-leading algorithm produces the final harmonic passage. Theshuffling rearranges the vertical structure of the chord without changing the chord’sidentity or the identities of any of the individual pitches. It has the effect of increasingthe potential for different harmonic textures and densities.

Figures 3.12–3.14 show an example of this process, using an initial chord sequenceC and a vector V, generated from the rhythmic resultant R and type sequence T. Thepivot used for the inversion is C4, which is the tonic pitch closest to the center of thefirst chord of C.

R = (2 1 1 2)

T = (3 4)

V = coefficient-sync(R, T) = (3 3 4 3 4 4)

Chord: 1

2

3

4

5

6

7

8

9

10

11

12

13

Figure 3.12: Original chord sequence C

Inversiontype:

3

4

3

4

3

4

3

4

3

Chord: 13

12

13

12

13

1

12

11

4

9

6

7

6

Figure 3.13: Spliced chord sequence using C and V

Inversions of harmonic sequences can be musically analysed in terms of their re-lationship to the tonic: in simple tonal music, for instance, the inverted tonic chord

Page 56: A Computer Model for the Schillinger System of Musical Composition

44 Implementation of the Schillinger System

Figure 3.14: Nearest-tone voice leading applied to sequence in figure 3.13

is equivalent to the subdominant with the opposite major/minor identity; and the in-verted tonic-relative chord is equivalent to either the counter-tonic or the secondarydominant depending on whether the tonality is major or minor. No further musicaldetail will be entered into, but it is appropriate to point out that inverting segmentsof a chord progression usually adds complexity in a way that can be considered mu-sically meaningful – it does not simply jumble the base material; nor is it a techniquelimited in practicality to 20th Century atonal music [Rufer 1965].

3.5 Theory of Melody

Melodic theories are far scarcer in musicological literature than harmonic theories, asobserved by Hornel and Menzel [Hornel and Menzel 1998]. At around the same timethat Schillinger was teaching his method in New York, the composer Paul Hindemithcommented on the ‘astounding’ fact that “. . . instruction in composition has never de-veloped a theory of melody” [Hindemith 1945]. Schillinger’s attempt to formalisemelody was therefore quite unusual. In a nutshell, his method for melodic genera-tion is to develop an abstract melodic contour, superimpose a rhythmic pattern andpitch-scale on the contour to obtain a melodic fragment, and then concatenate variousmanipulations of the fragment into a larger melodic composition with characteristicsof musical form. As will be seen, his theory contains many uncertainties and compli-cations for implementation.

3.5.1 The Axes of Melody

Schillinger’s concept of musical contours refers to combinations of linear segments,each with a specified pitch range and total duration. There is evidence presented in[Kohonen 1989] that the use of this idea dates back to at least 1719 with the composerVogt. There is also a reference to the far more recent work of Myhill in [Ames 1987]which used a similar technique in the context of computer-aided composition.

Schillinger describes melodies as sequences of pitch with duration in relation toa ‘primary axis’. This primary axis is, in fact, related to an extant statistical mode ofa particular passage or section (that is, the most commonly occurring pitch identity)and, for the most part, reduces to the tonic. As this thesis is concerned with gener-ating music, rather than analysing it, this definition is not especially relevant here.However, Schillinger’s Theory of Melody also uses the term more generally to mean anarbitrary ‘zero crossing’ pitch in a melodic contour; that is, the point of equilibrium

Page 57: A Computer Model for the Schillinger System of Musical Composition

§3.5 Theory of Melody 45

that all melodic movements occur in relation to.The term ‘secondary axis’ is used to refer to an individual segment of a melodic

contour. The type of a secondary axis dictates the general direction of its melodictrajectory in terms of its change in pitch relative to the primary axis. Henceforth theterm secondary axis will be referred to as simply ‘axis’, and a contour formed bymultiple axes will be referred to as a ‘system of axes’.

Axes which move away from the primary axis are referred to by Schillinger as‘unbalancing’, while those which move towards it are known as ‘balancing’. The un-balancing axes are thought of as implementing musical ‘tension’; the balancing axesmusical ‘release’. Schillinger’s axis types are most easily represented using the tax-onomy outlined in figure 3.15. The variable p shown in the diagram is known as the‘pitch basis’, which is the default height of an axis in semitones.

Unbalancing Balancing Stationary

primary axis

p

2p

-p

-2p

1 2

34

6 7

9 8

5

0

10

Figure 3.15: Taxonomy of axis types

Combination axis types are possible, as demonstrated in figure 3.16. These can beexpressed using any of the axis types in figure 3.15 with the proviso, inferred fromSchillinger’s examples, that a combination does not contain both an unbalancing anda balancing axis. The melodic contour can then ‘oscillate’ between the axes using somepattern of alternation, allowing for more elaborate contours.

Exactly how to oscillate between the axes in an axis combination is a concept that isexpressed only informally by Schillinger. As with his other ‘forms of motion’, whichwill be discussed in section 3.5.3, they are presented using hand-drawn continuoustrajectories which the composer is expected to convert to a discrete representationusing their own judgement. In this implementation, the pattern of alternation is in-cluded in the representation of the axes: for example, when mapping a discrete pitch-scale to the axis (1 4 (2 1)), two pitches map to axis type 1, followed by one pitchon axis type 4, and continuing cyclically. The treatment of these axis type combina-tions is otherwise the same as for the individual axis types, as described in sections3.5.2 and 3.5.3.

Finally, each axis is accompanied by a ‘pitch ratio’ P and a ‘time ratio’ T, which

Page 58: A Computer Model for the Schillinger System of Musical Composition

46 Implementation of the Schillinger System

primary axis

p

2p

-p

-2p

(0 6)

(1 4) (2 0 3)

Figure 3.16: Examples of axis combinations, which the contour alternates between.

act as coefficients for the pitch basis p and a time basis t. These parameters affect thespeed of changes in pitch, and the total interval range over which they occur. Figure3.17 illustrates this. The variable t is a relative duration that can be thought of asanalogous to the numerator of the music’s time-signature.

p

2p

-p

-2p

Type = 1P = 2T = 1

Type = (4 9)P = -1T = 2

t t t

primary axis

Figure 3.17: Time and pitch ratios, T and P, applied to axes, which alter their default rate andtotal range of change in pitch.

3.5.2 Superimposition of Rhythm and Pitch on Axes

The process for converting a melodic axis into a pitch sequence is split into two proce-dures. First, the points of intersection between the axis and the rhythmic attack pointsare established by multiplying the aggregate duration from the start of the axis by theaxis gradient for each point. The gradient of an axis is the ratio between the product ofthe pitch coefficient and pitch basis, and the product of the time coefficient and timebasis. The vertical components of the resulting intersection points can be interpretedas frequencies belonging to an arbitrary tuning system. An example using a system ofthree axes is shown in figure 3.18, in which the time basis is t = 4 and the pitch basis isp = 5. Schillinger gives no particular indication of how to arrive at these values, but it

Page 59: A Computer Model for the Schillinger System of Musical Composition

§3.5 Theory of Melody 47

can be inferred from his examples that p should be half the total interval range of thechosen scale (rounded upwards) and t can be chosen at random from an appropriaterange (see section 3.6.3).

2 2 2 31111111

Primary Axis

2t t t

2p

p

Figure 3.18: Superimposing rhythmic resultants onto an axis system. The duration attacks areprojected onto the axis to produce a set of intersection points.

The second procedure maps the vertical components of the intersection pointsonto discrete pitches within the standard Western tuning system, which is equivalentto the MIDI pitch space (twelve-tone equal temperament). In the case of flat scales,this requires the selected local scale to be instantiated as a full pitch-scale across therange of MIDI pitches using the tonic as the primary axis. The diagram in figure 3.19shows an example of the intersection points from figure 3.18 in relation to the intervalsof the flat scale (2 1 2 2 1 2).

For symmetric scales the superimposition method is similar, except that each sub-scale root is taken into account. First, the sub-scales are rearranged through octave-transposition, such that original distance r between roots is reduced to 12− r and thesub-scale whose root is the tonic is positioned in the middle of the other sub-scales.The melodic axes are then partitioned and shifted vertically, if required, so that seg-ments within distance p above the primary axis are assigned the primary axis, seg-ments within distance 2p above the primary axis are assigned the root 12− r above,and so on. Segments within distance p below the primary axis are assigned the root12 − r below, and so on. Separate pitch-scales for each root are then superimposedon their respective axes; from thereon the rest of the process for melodic generation isidentical.

It can be seen in figure 3.19 that most of the intersection points do not fall neatly in

Page 60: A Computer Model for the Schillinger System of Musical Composition

48 Implementation of the Schillinger System

Primary Axis

2

2

2

2

2

2

2

1

1

1

Ful

l Sca

le Loca

l Sca

le

a1

a2

a3

a4

a5

a6

Figure 3.19: Superimposition of the flat scale (2 1 2 2 1 2) on the system from figure 3.18.Vertical components of the intersection points must be adjusted to align with the pitches in thegiven scale on the left.

line with the discrete pitches of the scale. Schillinger stops short of providing rules forresolving each situation, focussing instead on the notions of ‘ascribed’ motion (mov-ing to the ‘outside’ of the axes), ‘inscribed’ motion (moving to the ‘inside’ of the axes)and various forms of discrete oscillatory motion; and leaving it the composer to exer-cise musical judgement. Consequently, the examples in Schillinger’s text do not fol-low any ostensible rules consistently enough to be extended to general cases. This isunderstandable from the outset, given his philosophy of reducing the presence of po-tential stylistic constraints in his system, but it does mean that automatically resolvingthe intersection points to scale pitches manifests as a significant obstacle in adaptingthe framework to computer implementation. This problem will be addressed in detailin section 3.5.3.

3.5.3 Types of Motion Around the Axes

This section outlines one possible algorithm for mapping the vertical componentsof the intersection points, found using the procedure shown in figure 3.18, onto thepitches of a discrete pitch-scale.

The most difficult part of the Theory of Melody to formally adapt is Schillinger’s no-tion of fine-grained oscillatory melodic motion relative to the axes. This is primarilybecause the different types of motion tend to be defined using hand-drawn continu-ous curves, which are generally intended to be converted to a discrete representation

Page 61: A Computer Model for the Schillinger System of Musical Composition

§3.5 Theory of Melody 49

using a composer’s musical judgement. To complicate matters further, Schillinger in-correctly defines the motions of sine and cosine, as Backus also noted [Backus 1960].Despite this, it is possible to derive a concrete framework that implements the typesof oscillation Schillinger intended to represent. They can be reduced to ‘inscribed’motion, ‘ascribed’ motion, ‘alternating’ motion and ‘revolving’ motion.

Inscribed and ascribed motion require intersection points to be dragged to scalepitches that are on the side of the axis closest to and furthest from the primary axis,respectively. Alternating motion requires a continuous crossing of the axis, as shownin figure 3.20(a), while revolving motion is supposed to follow a more ‘sine-like’ cross-ing of the axis as shown in figure 3.20(b).

0

1

-1

0

1

-1

0

1

-1

0

1

-1(a) (b)

Figure 3.20: Alternating and revolving motion types about an axis, represented here as zero

Although Schillinger’s definitions for these four motion types appear to be pre-sented in clear terms, the precise rules for applying them to axes with non-zero gra-dients can only be inferred through demonstration, and unfortunately the definitionsoften contradict his use of them in the provided examples. Therefore it has been neces-sary for this author to devise an appropriate algorithm from scratch in order to allowthe system to function (see algorithm 3.1 below). Two principles were adhered to inan attempt to avoid imparting too much of the author’s aesthetic influence on thesystem. Firstly, the algorithm is tuned to reproduce Schillinger’s examples as closelyas possible on average; and secondly, it is designed to tend away from sequences ofrepeated notes. The latter decision is based on a general compositional principle thatwas judged not to be inherently style-specific.

For implementation, the types of motion can be sufficiently encoded using thefollowing parameters.

• bias := inscribed (-1) | ascribed (1)

• alternating := true | false

• revolve := down (-1) | none (0) | up (1)

A ‘motion type’ is assigned to every axis as a (bias alternating revolving)tuple. The bias switches polarity at every intersection point if the alternating bitis set to true. Revolving motion is applied in a constant fashion regardless of thecurrent bias setting — if the revolve field is set to −1, the melody moves down inpitch, while if set to 1 the melody moves up in pitch. Only when the revolve fieldis zero is bias applied. If the revolve field is initialised to zero, no revolving motionoccurs at all; whereas initialising it to non-zero causes different results depending on

Page 62: A Computer Model for the Schillinger System of Musical Composition

50 Implementation of the Schillinger System

Algorithm 3.1 Resolve a sequence of intersection points X to pitches P, using pitch-scale S, motion parameters bias, alternating and revolve, and the axis gradient.

for all xi in X doif revolve = 1 then

pi = above(S, pi−1)nextrev = −1revolve = 0

else if revolve = −1 thenpi = below(S, pi−1)nextrev = 1revolve = 0

elseif bias and gradient are same polarity then

xi = xi+1end ifif xi falls exactly on a pitch-scale note then

pi = xielse

if xi is equidistant from below(S, xi) and above(S, xi) thenif bias = 1 then

pi = above(S, xi)w = below(S, xi)

elsepi = below(S, xi)w = above(S, xi)

end ifelse if above(S, xi) is closer to xi than below(S, xi) then

pi = above(S, xi)w = below(S, xi)

elsepi = below(S, xi)w = above(S, xi)

end ifif pi = pi−1 then

pi = wend if

end ifrevolve = nextrev

end ifif alternating then

bias = −biasend if

end for

N.B. when pi−1 or xi+1 exceed the range of i, the pitches corresponding exactly tothe start and end-points of the axis are assigned instead.

Page 63: A Computer Model for the Schillinger System of Musical Composition

§3.5 Theory of Melody 51

its initial polarity. This means that, in total, the parameters allow for twelve differentforms of motion.

In algorithm 3.1, the functions below(S, a) and above(S, a) are assumed to returnthe closest pitch from S which is below or above the point a. To illustrate how thisalgorithm applies to the scenario shown previously in figure 3.19, all twelve motioncombinations are documented in table 3.3 and figure 3.21 as they pertain to the firstaxis in that scenario, with the primary axis instantiated as C4.

Bias -1 1Alternating T F T F

Revolve Type -1 0 1 -1 0 1 -1 0 1 -1 0 1Label Cross-point

a1 60.00 60 60 60 60 60 60 62 62 62 62 62 62a2 62.50 63 63 63 62 62 62 63 63 63 63 63 63a3 63.75 65 65 62 63 63 60 65 65 62 65 65 62a4 65.00 67 67 67 65 65 65 65 65 65 67 67 67a5 66.25 65 65 68 63 67 67 63 67 67 65 68 68a6 67.50 70 70 70 67 68 68 67 68 68 70 70 70

Figure 3.21 references A B C D E F G H I J K L

Table 3.3: Resolution of points in figure 3.19 using the possible motion combinations, with C4

as the primary axis

A: B:

C:

D: E:

F:

G: H:

I:

J: K:

L:

Figure 3.21: Musical representation of the results in table 3.3

Figure 3.22 shows the result of applying the motion type (-1 false 0) to everyaxis in the figure 3.19 scenario.

Figure 3.22: Resolution of figure 3.19 using motion type (-1 false 0)

As a final case in point, Schillinger’s retrofitting of the opening of a compositionby J.S. Bach to a pair of axes to demonstrate the theory’s efficacy is compared with thesame pair of axes processed using the automated Schillinger System. This serves to

Page 64: A Computer Model for the Schillinger System of Musical Composition

52 Implementation of the Schillinger System

illustrate some of the issues that have been mentioned. The comparison can be foundin table 3.4 and figure 3.23.

Table 3.4: Modelling Bach: Schillinger’s representation and this system’s equivalent

Axis Parameter Schillinger’s text This system

1Axis type ‘ a

0 ’ (1 0 (1))Rhythm (-2 2 2 2 2 2) (-2 2 2 2 2 2)Motion ‘sine with increasing amplitude’ (-1 false 0)

2Axis type ‘b’ 2Rhythm (2 1 1 1 1 1 1 1 1 1 1) (2 1 1 1 1 1 1 1 1 1 1)Motion ‘sine+cos with constant amplitude’ (1 false -1)

Scale (2 2 1 2 2 2) (2 2 1 2 2 2)

Figure 3.23: Modelling Bach: comparison between Schillinger (left) and the this system (right)

The fact that the automated Schillinger System comes close to replicating the pas-sage from Bach is not intended to be a measure of its success. In fact, it raises thequestion of whether Schillinger’s system (and, by extension, the automated system)is really capable of generating music independent of style, or if it has simply beenmodelled off existing music using a different methodology to the treatises whichSchillinger hoped to supersede. In order to examine this question properly, it is neces-sary to collect data on the stylistic properties of the system’s output. The experimentsdesigned to do this can be found in chapter 4 of this thesis. In any case, the param-eters and the algorithm presented in this section provide a concrete specification ofaxis-relative motion which this author believes successfully encapsulates the ideasSchillinger expressed informally.

3.5.4 Building Melodic Compositions

A system of axes which has been converted to a sequence of pitches with an asso-ciated sequence of relative durations forms a melody. Depending on the stochasticparameters which have been used to generate it, the melody may be reasonably mu-sically self-contained, or it may constitute a short melodic fragment. In both cases thismelody is used as the basic material for building a complete melodic composition.This is done by appending the initial melody with a series of modifications of either

Page 65: A Computer Model for the Schillinger System of Musical Composition

§3.5 Theory of Melody 53

the melody or its individual axes. Schillinger suggests that these modifications can beany combination of the following:

• Tonal expansion

• Circular permutation

• Geometrical inversion (types 1–4)

• Geometrical expansion

The procedure which builds the melody takes a vector representing the sequencesof axes to use, and four vectors representing the respective sequences of modifications.As usual, Schillinger provides no formal guidelines for generating these vectors otherthan implying that the original melody should feature unmodified at the beginningand with minimal modification at the end of the composition. This basic constrainthas been implemented, as well as some other constraints which have been informedby Schillinger’s examples. In all instances below, L is the nominal length of the finalcomposition.

• The axis vector A is defined as {a0, a1, . . . , aL}; 0 ≤ ai ≤ n, where n is the numberof axes constituting the initial melody, and zero is used to denote the full initialmelody comprising all the axes in their initial order. The axis terms a1 . . . aL−1

are selected randomly with 10 percent weighting given to a value of zero and90 percent distributed evenly among rest, while the term a0 is restricted to zeroand aL is restricted evenly to either zero or the last axis in the system. Thesesimple constraints tend to generate melodic ‘expositions’ followed by sequencesof ‘developments’, and also tend to enforce similarity between the opening andclosing sections of compositions.

• The permutation vector P is defined as {p0, p1, . . . , pL}; 0 ≤ pi < length(ai).As per Schillinger’s recommendation, the permutations are restricted to circularpermutations in order to maintain the basic interval structure of the sequence.Terms p0 and pL are restricted to zero, while terms p1 . . . pL−1 are uniformlyrandom. The permutation of an axis applies to its pitches but not its durations.

• The tonal expansion vector S is defined as {s0, s1, . . . , sL}; si ∈ {0, 1}. The termsrefer to orders of tonal expansion as explained in section 3.3.2, and their proba-bilities are weighted equally. Higher orders are avoided because their intervalsquickly become enormous, and ‘collapsing’ the pitches (as used for geometricalexpansions — see below) loses the original shape of the melody, which is notintended by Schillinger in this case. s0 and sL are restricted to zero.

• The inversion vector I is defined as { j0, j1, . . . , jL}; 1 ≤ ji ≤ 4; that is, a selec-tion from the taxonomy of inversions presented in section 3.4, with 20 percentweighting given to type 1 (no inversion) and 80 percent distributed uniformlyamong the rest. The term j0 is restricted to zero, while jL is restricted evenly totype 1 or 4.

Page 66: A Computer Model for the Schillinger System of Musical Composition

54 Implementation of the Schillinger System

• The expansion vector E is defined as {e0, e1, . . . , eL}; ei ∈ {1, 2, 3, 5, 7}. The termsrefer to the orders of expansion as in section 3.4. Orders 4 and 6 are omitted(upon Schillinger’s recommendation) because they do nothing more than re-duce the space of pitches to a subset of order 2. These expansions frequentlyextend far beyond the range of the piano, so they are routinely ‘collapsed’ backto the register of the starting note of the sequence through octave transpositions.Geometric expansions are used sparingly because they modify the original ma-terial to the greatest extent. Thus a weighting of 60 percent is assigned to order1 (no expansion), with 40 percent distributed uniformly among the rest. e0 andeL are restricted to order 1.

A melodic composition can then be expressed as the sequence {M0, M1, . . . , ML},where Mi is built using the formula below. The order of operations has been inferredfrom the examination of Schillinger’s examples.

expgeometric(permute (invert (exptonal(ai), si), ji), pi), ei)

3.6 Structure of the Automated Schillinger System

All of the procedures described up to this point exist independently as a set of compo-sitional ‘building blocks’, and as such they cannot be used to compose music withoutbeing interfaced in some way. Although Schillinger’s theories regularly reference oneanother, in the first four books there are no formalised higher level procedures forcreating compositions from scratch. This section outlines the software solution thathas been devised by this author to encompass all four theories in a fully automatedsystem which can compose self-contained, single-voice melodic compositions, andmulti-voice harmonic passages. To orient the reader, a basic overview of the system’sarchitecture is contained in figure 3.24.

On the following page the reader will find a more comprehensive call graph of theautomated Schillinger System. This graph refers to all of the individual proceduresnecessary to summarise system’s architecture. The points in the system that the userinterfaces with can be found in the bottom left and top right corners (‘compose har-mony’ and ‘compose melody’). Red boxes surround the groups of procedures that areeither associated with or directly implement Schillinger’s theories in books I–IV.

Page 67: A Computer Model for the Schillinger System of Musical Composition

Splic

e

Harm

ony

Invert

Harm

ony

Invert

Voic

e

Contr

act

Pit

chR

ange

Expand V

oic

e

Sca

le T

ransl

ato

r

Genera

te S

ym

m.

Harm

ony

Neare

st-t

one

Voic

e L

eadin

gSca

le t

oB

asi

c H

arm

ony

Sym

metr

ic t

oSub-s

cale

s

Hybri

d H

arm

ony

Re-v

oic

eSta

rtin

g C

hord

Compose

Harm

ony

Compose

Melody

Random

Sym

m. Sca

le

Random

Sca

leR

andom

Flat

Sca

le

Adju

st R

egis

ter

Sca

leTo

nal Expansi

on

Aco

ust

ically

Acc

epta

ble

?

Build

Melo

dy

Random

Axis

Syst

em

Superi

mpose

Pit

ch/R

hyth

m

Genera

teB

uild

Para

ms.

Genera

teSeco

ndary

Axe

s

Genera

te

Rhyth

mG

roup A

ttack

s

Gro

up D

ura

tions

Convert

Basi

s

Coeff

. /

Gro

up

Synch

ronis

ati

on

Perm

uta

tion

Genera

tor

Pri

mary

Res.

Seco

ndary

Res.

Tert

iary

Res.

Inte

rfere

nce

Patt

ern

Resu

ltant

Gro

up b

y P

air

s

Random

Res.

From

Basi

s

Self-

conta

ined

Rhyth

m

Alg

ebra

ic

Exp.

Rhy.

Conti

nuit

y

GEO

METR

IC V

AR

IATIO

NS

TH

EO

RY O

F M

ELO

DY

TH

EO

RY O

F R

HYTH

M

TH

EO

RY O

F PIT

CH

SC

ALE

S

Au

tom

ate

d S

ch

illi

ng

er

Sys

tem

: C

all

Gra

ph

Page 68: A Computer Model for the Schillinger System of Musical Composition

56 Implementation of the Schillinger System

HarmonicModule

MelodicModule

Automated Schillinger System

Theory of Rhythm

Theory of PitchScales

GeometricVariations

Theory of Melody

Impromptu

Figure 3.24: Basic overview of the structure of the automated Schillinger System

The following sections describe the higher level procedures that were necessary tocomplete the automated system. As far as Schillinger’s system itself is concerned, theyare entirely arbitrary manifestations of this author’s interpretation of the formalismas a whole. This is somewhat problematic, and even though every effort has beenmade to impart as little aesthetic influence as possible through these procedures, suchinfluence is difficult to perceive in the system’s output and the lack of it cannot beguaranteed.

3.6.1 Rhythm Generators

Despite the abilities of the rhythmic procedures in section 3.2 to generate a vast spaceof content, one lingering aspect of the Theory of Rhythm that remains largely undefinedby Schillinger is how to select from it; this is left entirely to the composer’s musicaltaste. In lieu of any formal procedures, the current section describes this author’snecessary solution for providing rhythmic resultants to the harmonic and melodicmodules. As mentioned above, this solution is quite arbitrary — it has been designedto incorporate as much of the content produced by his procedures as possible.

The schematic in figure 3.25 shows how the automatic Schillinger System’s tworhythm generators are structured. Calling functions make one of the following re-quests, in which t is the time basis and T is the time ratio.

• Rhythm generator 1: generate-rhythm(t, T)

• Rhythm generator 2: random-resultant(t)

The functionality of each part is listed below.

• The primary, secondary and tertiary resultants are produced using pairs or triosof integers as described in section 3.2.1.

Page 69: A Computer Model for the Schillinger System of Musical Composition

§3.6 Structure of the Automated Schillinger System 57

1. Generate Rhythm

Group Durations

PermutationGenerator

PrimaryResultant

SecondaryResultant

TertiaryResultant

InterferencePattern

ResultantGroup by Pairs

2. RandomResultant

Self-containedRhythm

Algebraic Expansion

RhythmicContinuity

Figure 3.25: Call graph showing the structure of the rhythm generators

• Rhythm generator 2 (‘random resultant’) selects between the three kinds of re-sultants with equal probability. In line with Schillinger’s suggestion the inputsfor the tertiary resultant function are confined to trios of integers ≤ 9 drawnfrom the same Fibonacci sequence. Primary and secondary resultant inputs arealso confined to an enumerated set of possible pairs at Schillinger’s behest, withall integers i such that i ≤ 9. In all cases one of these integers is fixed as t.

• The function which generates random resultant combos in the manner shownin 3.2.1 does so by randomly generating both a primary and secondary resultantusing t, with the same constraints as rhythm generator 2.

• The permutation generator returns a random circular permutation of its input.

• The ‘Self-contained rhythm’ function first extracts a random sub-group G of du-ration t from a resultant R provided by rhythm generator 2. It then collects arandom resultant combo using t, algebraic expansions of G using powers 2 and3, and continuity patterns of all variation types listed in section 3.2.3 generatedfrom both R and G. Finally, it randomly selects a resultant from the subset of thecollection possessing total durations less than T× t.

• Rhythm generator 1 randomly selects from a ‘self-contained’ rhythm, a random(t × T)-duration sub-group of a resultant R provided by rhythm generator 2,and a random t-duration sub-group of R concatenated to a recursive call torhythm generator 1 with arguments t and T− 1.

Rhythm generator 1 is used by the melodic module to randomly generate rhythmic re-sultants of specific total durations that are then superimposed onto axes as described

Page 70: A Computer Model for the Schillinger System of Musical Composition

58 Implementation of the Schillinger System

in section 3.5. Rhythm generator 2 supplies only randomly selected symmetrical re-sultants of arbitrary total duration. These are used by the harmonic module to spliceharmonic inversions together as shown in section 3.4, by the melodic module to de-termine the pattern of alternation between the individual axes in a combination axis,and also by rhythm generator 1.

The rhythmic generators do not attempt to assess the inherent quality of a resul-tant or its applicability to the context it is required in. Instead, they make the as-sumption that all rhythms which satisfy the constraints t and T imposed by the callerare equally viable (and by implication, that Schillinger’s rhythmic procedures are do-ing something musically meaningful). Thus, in effect the rhythmic generator doesnothing more than impose a probability distribution across the space of all possibleresultants of a given total duration, as a side-effect of the generative procedures it hasat its disposal. To illustrate the point, figure 3.26 shows the relative frequency of allpossible resultants that are encompassed by the time basis t = 4, with T = 1.

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

Rhythmic resultants

(1 1

1 1

)

(1 1

2)

(1 2

1)

(2 1

1)

(2 2

)(1

3)

(3 1

)(4

)

Pro

babi

lity

of o

ccur

ence

Figure 3.26: The probability distribution imposed by rhythm generator 1 across the space ofrhythmic resultants for t = 4 and T = 1.

Degazio pointed out that Schillinger’s method of treating rhythmic cells as multi-level structural generators could be used to produce fractal structures [Degazio 1988].This possibility has not been pursued in the current scope of work because the har-monic and melodic modules contain only very limited opportunities to incorporatesuch structures. Additionally, given that the current thesis is concerned with adaptingSchillinger’s system as a music-generating entity in itself, the application of Degazio’sideas would likely fall outside of this goal.

3.6.2 Harmonic and Melodic Modules

The harmonic module uses rhythm generator 2 , the procedures pertaining to symmet-ric pitch-scales and the geometric variation procedures to build a harmonic passage.

Page 71: A Computer Model for the Schillinger System of Musical Composition

§3.6 Structure of the Automated Schillinger System 59

Virtually all of the required functionality for this process has already been discussedin sections 3.3 and 3.4; the module merely controls the data flow during the compo-sition process. Figure 3.27 contains a visual representation of the module’s operation.The constraints applied during composition can be found in section 3.6.3.

YESNO

RhythmicGenerator 2

RandomSymmetric Scale

Contract PitchRange

Hybrid Harmony

Symm. Scale toSub-scales

TonalExpansions

Symm. Scaleto Chords

HarmonySplicer

GeometricInversions

Nearest-toneVoice Leading

Output

Acoustically acceptable?

3.3.1

3.3.2

3.3.4

3.3.3

3.4.2 3.4.1

3.3.4

3.3.4 3.3.4

3.3.4

3.6.13.2

Figure 3.27: Harmonic module data flow, including relevant section numbers pertaining tothis chapter.

The melodic module incorporates all four of Schillinger’s theories that have beenexamined in previous sections. The composition process is visualised in figure 3.28.As with the harmonic module, the melodic module controls the data flow during thisprocess, thereby acting as an interface between Schillinger’s theories. However, so farthe process for generating a melody is only well defined if the axis system is alreadyknown (as was the case in for examples in section 3.5). Unfortunately Schillingerprovides no explicit method for generating axis systems, so this author has providedtwo further procedures to accomplish this task.

Page 72: A Computer Model for the Schillinger System of Musical Composition

60 Implementation of the Schillinger System

Generate AxisSystem

Random Flat orSymmetric Scale

GenerateSecondary Axes

Build MelodyGenerate

Build Parameters

SuperimposeRhythm and Pitch

onto Axes

RhythmGenerator 1

GeometricVariations

PermutationGenerator

Tonal Expansions(Scale Translator)

Output

3.6.23.5.1

3.6.13.2

3.6.2 3.3.1

3.5.23.5.3

3.5.4 3.5.4

3.4

3.3.2

3.2.3

Figure 3.28: Melodic module data flow, including relevant section numbers pertaining to thischapter.

The first produces a set of axis parameters: a sequence of axis types, a sequenceof time ratios, a sequence of pitch ratios, a time basis t and a ‘degree of motion’. Cur-rently, the axis types are influenced by the user in the form of ‘stimulus’ list such asthe following:

(u b u b)

A value of u indicates an ‘unbalanced’ axis, while b indicates a ‘balanced’ axis.These values are used to choose axis types (or combinations of axis types) at randomfrom the taxonomy in figure 3.15. The time basis, time ratio and pitch ratio associatedwith each axis are chosen at random from the ranges documented in section 3.6.3.The degree of motion is a concept included by the author to ensure that rather than

Page 73: A Computer Model for the Schillinger System of Musical Composition

§3.6 Structure of the Automated Schillinger System 61

the oscillatory motion types of each axis being selected at randomly from the twelvepossible types (see section 3.5.3), a relatively consistent amount of either angular orsmooth step-wise movement is applied from axis to axis. A degree of motion is se-lected at random from the range [1, 5]. The meaning of these options is describedbelow.

The second procedure in the chain, as observed in figure 3.28, is necessary to pro-vide a system of axes which can then undergo the superimposition process. Each axisoutput by this procedure consists of the corresponding axis type and pitch ratio Pgenerated by the first procedure; a rhythmic resultant provided by rhythm genera-tor 1 of total duration t× T; and a motion type of the form (bias, alternating,revolve). Table 3.5 shows how the motion type is influenced using the degree ofmotion by applying different probabilities to the individual parameters of the motiontype tuple. The u and b options in the bias column apply when the axis type is re-spectively unbalancing or balancing. Informally, the degrees range from guaranteedsmooth motion to guaranteed oscillatory motion with frequent melodic leaps.

Table 3.5: Probabilities of motion type parameters for different degrees of motion

DegreeBias Alternating Revolve

-1 1 T F -1 0 11 1 0 0 1 0 1 0

2b: 1 b: 0

0 1 0 1 0u: 0 u: 1

3b: 1 b: 0

0.2 0.8 0.25 0.5 0.25u: 0 u: 1

4b: 0.7 b: 0.3

0.5 0.5 0.25 0.5 0.25u: 0.3 u: 0.7

5 0.5 0.5 1 0 0.5 0 0.5

Once a melodic composition is generated, the module converts the resulting se-quence of relative durations (accompanying the pitch sequence) into a standard formappropriate to be mapped to musical notation, by dividing each relative duration bythe power of 2 closest to the time basis.

3.6.3 Parameter Settings

The table in this section (table 3.6) contains the parameter ranges that are wired intothe ‘push-button’ version of the automated Schillinger System. Within the specifiedranges, the actual values chosen are uniformly random for each execution of a mod-ule. The table does not include constraints that are in place according to Schillinger’sexplicit recommendations, thereby contributing directly to the modelling of the the-ories. This information is meant to complement the constraints introduced by theauthor as part of the process of adapting the procedures, such as those that were men-tioned in sections 3.5.3, 3.5.4 and 3.6.2. Generally speaking, the settings have beenchosen with the view to coaxing forth a representative cross-section of the system’s

Page 74: A Computer Model for the Schillinger System of Musical Composition

62 Implementation of the Schillinger System

Table 3.6: Parameter settings used by the author for the ‘push-button’ system

Section Parameter Range/setting

Harmonic module

No. symmetric sub-scale intervals [1, 6]Restrict 7-tone scales to Western false

Tonic note [C3, C5]Time basis for splicing [3, 9]

Possible inversion types [1, 4]

Melodic module

No. flat scale intervals [1, 7]No. symmetric sub-scale intervals [2, 6]

Flat scale range [5, 12]Restrict 7-tone scales to Western true

Tonic note [C3, C5]Time basis for rhythm [3, 9]

Nominal length [5, 9]Pitch ratio [1, 2]Time ratio [1, 4]

musical capability without requiring an enormous quantity of output and analysis.Specifically, each parameter has been given its current setting by the author for anyone of three reasons:

• To avoid unreasonably long computation times in the Impromptu environment;

• To reduce the presence of clusters in the output possessing particular anoma-lous characteristics, such as harmonies that contain only a single repeated chord,melodies with physically implausible intervals or music centered in extreme reg-isters;

• To implement musically logical lower or upper bounds that are not mentionedby Schillinger but are necessary to prevent output which is completely trivial,such as one-note harmonies or melodies5; or music that is absurdly long.

In future work, specific combinations of these parameters may be established thatserve as reliable prescriptions for stylistic or aesthetic properties in the music’s output.They could also be made individually controllable by the user as part of a graphicalor command-line interface. So far, the author has not been able to identify individualparameters that have a noticeable or measurable effect on the final output in terms ofits style.

3.7 Parts of Schillinger’s Theories Not Utilised

The content of books V–XII of the Schillinger System has not been used in either ofthe modules due to the restricted scope of this thesis. Additionally, several aspects

5 This is not to suggest that single-note melodies cannot be musically interesting. In this case however,they will certainly be trivial.

Page 75: A Computer Model for the Schillinger System of Musical Composition

§3.7 Parts of Schillinger’s Theories Not Utilised 63

of books I–IV have also been omitted from the project for various reasons. These arelisted below to help give a clear idea of the extent and limitation of the current work,and also as a reference for future work.

• The use of tertiary generators, variation techniques and algebraic expansionsfor producing poly-rhythmic textures has not been included because the systemdoes not currently incorporate a notion of polyphony. Polyphony is central tothe construction of more complex compositions, requiring the context of booksV–XII.

• The application of resultants and synchronisation to ‘instrumental forms’ is omit-ted because it pertains to instrumentation and orchestration, which are discussedin later books.

• Rests are not incorporated into the rhythmic generator for want of a more so-phisticated method determining their placement. Schillinger offers minimal ad-vice on the placement of rests.

• Rhythmic accents are not incorporated because they are only covered extremelybriefly and fall partly into the realm of Schillinger’s Theory of Dynamics.

• Schillinger’s ‘evolution of rhythm styles’ is omitted because it consists primarilyof an analytical discussion with reference to popular musical styles of his timeof writing, rather than any explicit generative procedures.

• The discussion of ‘rhythms of variable velocities’ is relevant to the field of ex-pressive performance rather than to algorithmic composition as such. The prob-lem of expressive performance is mentioned in chapter 4 of this thesis.

• The use of synchronisation to produce simple looping melodic forms from pitch-scales has not been incorporated into the melodic module because it does not fitwith the melodic axis paradigm, which is what the current melodic module isbuilt around. As it is presented, it also produces absolute rhythmic monotony,which has been avoided for this system’s melodies.

• Schillinger’s ‘evolution of pitch-scale families’ refers to the use of interference,subdivision, circular permutation and transposition to build a set of supposedlyrelated scales which may bring unity to a longer form piece. As both modules inthis system are focussed on smaller compositions, this concept has been aban-doned for the present time.

• The concept of ‘melodic modulation’, as discussed in the Theory of Pitch-scales;that is, concatenating the synchronised melodic forms mentioned above intolonger sequences using multiple pitch-scales with pivot sequences at the con-nection points, has not so far been incorporated into the melodic module. Againthis is due to it being largely incongruous with the axis paradigm. Schillinger’smethod of identifying and reusing motifs using this concept should also benoted.


• Producing melodic continuity from symmetric pitch-scale ‘contractions’ has been omitted for the same reasons as above.

• The accompaniment of the simple harmonic procedures in section 3.3.4 with melodic forms derived from the same pitch-scale has been omitted from the current implementation, because without significant human intervention it places too many restrictions on the harmonic module’s current method of harmonic generation.

• The concatenation of short melodies into longer melodies using only geometrical inversions has been avoided as a technique in itself, because the equivalent functionality exists in the melody builder as part of the somewhat more sophisticated melodic module.

• Geometrical expansions in the temporal domain have been left out of the melodic module for the time being because they produce quite drastic incongruities in what are currently short-form compositions. It may be more appropriate to include them once more explicit concepts of form and higher-level structure have been incorporated from later books.

• The geometrical expansion of harmonies is not currently performed, because it has the effect of simply projecting a chord progression from the $\sqrt[12]{2}$ tuning system into whole-tone ($\sqrt[6]{2}$), diminished ($\sqrt[4]{2}$), augmented ($\sqrt[3]{2}$) and tritone ($\sqrt[2]{2}$) systems. This technique was deemed unnecessarily limiting for short harmonic passages, but could be viable in the context of longer compositions.

• No attempt has been made to automate Schillinger’s notion of musical semantics because it is mostly in the form of philosophical discussion. The section on climax and resistance in relation to a ‘psychological dial’ is particularly noteworthy because in the past it has been referred to by successful film composers [Degazio 1988]. As explained in section 3.6.2, the user is currently in control of ‘seeding’ the melodic module with a set of abstract axis types, but no explicit musical meaning is drawn from their combination when building a composition.

• Schillinger’s application of melodic trajectories to generate short embellishments has not been used in the current system, but is fairly amenable to being added in the short term.

• The very brief discussion on melodic modulation in the context of axis systems is omitted because it was felt that it would be better considered in the future alongside Schillinger’s other discussions of melodic modulation in the context of pitch-scales.

• Finally, the use of ‘organic forms’ (melodic motifs or entire passages generated using number sequences related to the Fibonacci series) in melody generation has been omitted due to time constraints. These motifs could easily be incorporated into melodic compositions by giving the melody builder the opportunity to select them either as a possible variation or an alternative initial sequence. This requires the composition’s pitch-scale to be derived from the motif.

To summarise, the elements of Schillinger’s theories listed above have mostly been left out either due to time constraints or because they are too heavily related to theories in books V–XII to warrant further investigation without the additional context. All of the items stand to be revisited in future work.

3.8 Discussion

The construction of an algorithmic composition system based entirely on Schillinger’s theories has presented several hurdles. In particular, none of the first four books of the Schillinger System under consideration contain the means for formally interfacing each collection of procedures, and even some of the procedures which are amenable to computer realisation require significant reinterpretation to make this feasible. In both cases the author has been obliged to devise and implement algorithms not present in Schillinger’s theories, and it is possible that this has influenced the aesthetic characteristics of the system’s output in ways that are difficult to detect, something undesirable but unavoidable.

Nevertheless, this chapter has shown that the bulk of the material in these books can in fact be adapted to computer implementation. As far as the author can ascertain, this is the first system of its kind to be formally documented. Two modules have been presented that automatically compose harmonies and melodies using Schillinger’s theories in a non-interactive ‘push-button’ paradigm. These modules have been described in detail, and the points in the system’s operation where constraints on the output space are enforced have been documented. Of particular note is a new formal definition of Schillinger’s ‘forms of motion’ in section 3.5.3, which allows for the generation of melodies using the informal framework he provided in the Theory of Melody. This was followed by a comparison between the formal and informal procedures in the context of music by J. S. Bach, which has raised further pertinent questions about the nature of the automated Schillinger System’s output with regard to musical style. As it stands, this chapter’s content also provides a valuable resource for others wishing to approach Schillinger’s first four theories of composition, because it contains concise explanations of the majority of their generative procedures.

Up to this point the automated Schillinger System has been discussed in terms of its procedures, but not in terms of the quality or stylistic diversity of the music it is capable of producing. This is another matter entirely, which will be explored extensively in chapter 4 as a means of critically evaluating the system.


Chapter 4

Results and Evaluation

4.1 Introduction

An algorithmic composition system is of no use if it does not produce musically meaningful output. In a survey of the first three decades of computer-assisted composition, Ames acknowledged the evaluation of output to be a highly problematic but essential aspect of this research [Ames 1989]. Miranda has frequently noted the difficulty of verifying musical output without intervening human subjectivity [Miranda 2001; Miranda 2003]. Section 4.2 will briefly survey the most common methods of assessment employed by authors who have deemed it necessary to go beyond a cursory personal judgement. In the sections thereafter, informed by past methods of evaluation, two experimental methods will be described that have been used to gain some insight into the aesthetic and stylistic characteristics of the output from the automated Schillinger System.

The first experiment draws on the burgeoning field of musical information retrieval (MIR); in particular, automated genre classification. Section 4.4 presents a method for measuring the style and diversity of MIDI output using MIR-oriented machine learning software, and the corresponding results. The second experiment is a listening survey involving expert participants, which provides a useful collection of both quantitative and qualitative data from which to develop robust conclusions regarding the subjective properties of a representative group of samples of the system’s output. Section 4.5 describes the details of the listening survey and presents the results from it. Section 4.6 summarises and discusses the implications of the results of both experiments.

4.2 Common Methods of Evaluation

In describing the genetic algorithm-based improvisation system GenJam, Biles claimed that solos begin to yield “pleasing” results after five generations and “reasonable” results after ten generations [Biles 1994]. Johnson-Laird referred to the results of a constraint-satisfaction composition system as “simplistic but pleasing” [Johnson-Laird 1991]. Johanson and Poli, referring to a system using genetic programming, gave the concluding statement that “almost all of the generated individuals were pleasant to listen to” [Johanson and Poli 1998]. This kind of cursory subjective judgement by authors in the published literature is common. There is no suggestion being made here that these judgements are necessarily unjustified, but they are fundamentally unscientific, prone to bias and therefore unsatisfactory [Wiggins et al. 1993].

The formal assessment of the validity of musical passages has often been attempted using objective functions, mostly in the context of genetic algorithms where it is necessary to sort population members by fitness. These objective functions typically calculate a ‘penalty’ score based on how many and what kinds of rules in a knowledge base are broken [Phon-Amnuaisuk et al. 1999], or perform a statistical comparison to a corpus of musical exemplars [Puente et al. 2002]. Unfortunately these methods are limited to musical problems with well-defined, widely documented aesthetic constraints — namely traditional chorale harmonisation.1 Pearce and Wiggins have discussed more advanced frameworks intended to replace subjective judgements with extensive musical analysis, but they too can only operate within specific stylistic boundaries [Pearce and Wiggins 2001].
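To make the shape of such a penalty-style objective function concrete, the following minimal Python sketch scores a chord sequence (a list of equal-length tuples of MIDI pitches). The two rules and their weights are invented purely for illustration and are not taken from any of the systems cited above.

    # A minimal sketch of a penalty-based objective function. The rules and weights
    # here are illustrative only; real systems encode a much larger knowledge base
    # of style-specific constraints.

    def count_parallel_fifths(chords):
        """Count adjacent chord pairs in which some pair of voices forms a perfect
        fifth (modulo octaves) in both chords and both voices move. Direction of
        motion is ignored, so this is a simplification of the classical rule."""
        count = 0
        for a, b in zip(chords, chords[1:]):
            for i in range(len(a)):
                for j in range(i + 1, len(a)):
                    fifths = abs(a[i] - a[j]) % 12 == 7 and abs(b[i] - b[j]) % 12 == 7
                    if fifths and a[i] != b[i] and a[j] != b[j]:
                        count += 1
        return count

    def penalty(chords, max_leap=9):
        """Lower is better: a weighted sum of rule violations for a chord sequence."""
        score = 3 * count_parallel_fifths(chords)
        for a, b in zip(chords, chords[1:]):
            # Penalise melodic leaps larger than a major sixth in any single voice.
            score += sum(1 for x, y in zip(a, b) if abs(x - y) > max_leap)
        return score

In a genetic algorithm, a function of this kind would be inverted or negated to serve as the fitness by which population members are sorted.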

It is desirable to move beyond this kind of evaluation. For this reason some authors have undertaken more rigorous evaluations of output by involving one or more ‘musical experts’. Phon-Amnuaisuk engaged a senior musicology lecturer to mark computer output using the same criteria as first-year students of harmony [Phon-Amnuaisuk et al. 1999]. Hild et al. used “an audience of music professionals” who ranked the output of the system HARMONET to be on the level of an improvising organist [Hild et al. 1991]. Pereira et al. used “expert musicologists” to give a panel-style evaluation using criteria such as musical interest and musical reasoning [Pereira et al. 1997]. Storino et al. have concentrated on whether or not humans are able to distinguish human-composed music of a particular style from similar computer-composed music in controlled experiments [Storino et al. 2007].

In the human experiments above where the focus is not on fooling participants with style imitation but rather on seeking a genuine appraisal of merit, none of the methods or results are presented in the literature except anecdotally, and there is little evidence that they are particularly rigorous. This thesis will take the concept of assessing musical merit one step further by performing a far more in-depth survey of expert human participants using carefully designed criteria. The details of this study comprise section 4.5.

4.3 Automated Schillinger System Output

Before the details of the experiments designed for evaluation are presented, it is important to make clear exactly what is being evaluated.

The automated Schillinger System does not output audio data; instead it generates symbolic data constituting pitch and duration information in the form of LISP data structures (discussed briefly in section 3.1.2).

1 Even within this apparently well-defined problem space, the use of objective functions to guide musical quality is highly questionable, given that the exemplars of four-part chorale writing routinely break the rules of harmony that have supposedly been derived from them [Radicioni and Esposito 2006].


This has two implications: firstly, a process must take place in order to convert the symbolic data into audio, and secondly, such a process will necessarily add information pertaining to musical dimensions other than pitch and duration. The simplest solution is to map the pitch and duration information to raw MIDI output, using default values for the other musical dimensions (primarily tempo, timbre and note velocity). This method was used during development because it allowed instant feedback; the provision of audio and MIDI interfaces is one of the advantages of writing Scheme in Impromptu.
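As a concrete illustration of this mapping, the sketch below writes a list of (pitch, duration) pairs to a MIDI file with fixed defaults for tempo and velocity. It is an assumption-laden example rather than the thesis implementation: the system itself performs this step in Scheme through Impromptu’s MIDI interface, and the Python library (mido), function name and default values here are chosen only for the sake of the example.

    import mido

    def notes_to_midi(notes, path="melody.mid", tempo_bpm=120, velocity=80):
        """Write (MIDI pitch, duration in beats) pairs as a single-track MIDI file,
        using default values for tempo and note velocity."""
        mid = mido.MidiFile(ticks_per_beat=480)
        track = mido.MidiTrack()
        mid.tracks.append(track)
        track.append(mido.MetaMessage('set_tempo', tempo=mido.bpm2tempo(tempo_bpm)))
        for pitch, beats in notes:
            ticks = int(beats * mid.ticks_per_beat)
            track.append(mido.Message('note_on', note=pitch, velocity=velocity, time=0))
            track.append(mido.Message('note_off', note=pitch, velocity=0, time=ticks))
        mid.save(path)

    # Example: a C major arpeggio of quarter notes followed by a half note.
    notes_to_midi([(60, 1), (64, 1), (67, 1), (72, 2)])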

The plain pitch and duration data is sufficient for this chapter’s genre classification experiment; however, the audio generated for instant feedback is only adequate for verifying the correctness of the program. To assess the musical merit of the pitch and duration data, it needs to be heard in the context of a fully embodied parameter set, so as to avoid biasing or distracting the listener with the lack of variation in the dimensions which are not controlled by the system. This is especially important when the listeners undertaking the evaluation are musical experts with limited or no experience of computer-aided composition.

This issue has been identified by several authors working in the field of automated musical performance [Widmer and Goebl 2004; Arcos et al. 1998]. Kirke et al. provided a comprehensive survey of the approaches taken towards simulating the human performance of musical data sets [Kirke and Miranda 2009]. The goal of this field of research is to extend the realm of computer-generated parameters to the total symbolic parameter space of music, which would ultimately enable software to give expressive renditions of computer-generated compositions instead of just ‘robotic’ ones. In particular, it focuses on the context-sensitive prediction of tempo and note velocity information. The computational approaches include ‘expert’ non-learning performance systems, regression methods, neural networks, case-based reasoning systems, statistical graphical models, and evolutionary models.

Although automated expressive performance is clearly beyond the scope of this thesis, it is still necessary for the music to be presented to a human audience in the form of expressive performances. Such an approach using human performers has been used extensively by Cope, for reasons related to bias similar to those listed above [da Silva 2003]. In this case, however, to avoid the inconvenience of obtaining professional performances from multiple instrumentalists, a high-quality digital sound library has been used to provide the timbres for a series of performances recorded by the author using sequencing software. These sequences are subsequently rendered to audio. Figure 4.1 gives a visualisation of the entire process, which incorporates the open-source musical engraving software LilyPond to produce the intermediate output of standard musical notation. (LilyPond is also used to generate the MIDI files to be used for genre classification.) The reader, should they wish to briefly become listener, is directed to the audio samples on the CD accompanying the hard copy of this document. The samples are also available online.2

2 To access the MP3 files online, follow the hyper-links in the electronic copy of this document contained in table 4.3, located in section 4.5.1.


[Figure: flowchart linking the Schillinger System, LilyPond, PDF score, MIDI files, classifier, the author’s performance, sequencer, sound library, audio and human audience.]

Figure 4.1: Conversion process from list representation to audio

4.4 Assessing Stylistic Diversity

Both Schillinger and the editors of his published volumes make various claims to the effect that, in its capacity as a formalism designed for human composers, the essence of the Schillinger system is independent of any overbearing stylistic framework. The foreword by Henry Cowell, a distinguished composer and contemporary of Schillinger [Quist 2002], suggests that Schillinger’s system is capable of generating music in any style [Schillinger 1978]. The reasoning behind these views is that rather than encoding explicit style-specific musical knowledge like many other music theory treatises, the Schillinger System encodes implicit musical knowledge in the form of procedures which, for the most part, can be expressed mathematically (see chapter 3).

Given that the procedures have been adapted and implemented in the form of a computer system, the notions of style and diversity must be investigated, not simply to assess the credibility of the claims (it is not the express purpose of this section of the thesis to either validate or debunk them), but more importantly to determine whether or not the automated system could actually be used for generating material in a variety of musical contexts.

It is for this reason that the active research field of genre classification has been employed. The goal of using a classifier is two-fold: to find out which musical categories are assigned to the output of the automated Schillinger System, and to find out whether the output contains a notable degree of statistical diversity — something that would manifest as the frequent assignment of several different genres. If the classifier were to give statistically significant results, then it would be meaningful to compare them to the assertions regarding style and diversity collected from participants in the listening survey (see section 4.5.4).

Section 4.4.1 will give an overview of the field of automatic genre/style classification. This will serve as justification for the choice of software used to perform the experiment outlined in sections 4.4.3, 4.4.4 and 4.4.5. The results will be presented and discussed in section 4.4.6.


4.4.1 Overview of Automated Genre Classification

Automatically classifying musical genre or style by examining a file’s audio or symbolic (usually MIDI format) musical content has applications primarily in musical information retrieval and cognitive science. In the former case, the goal is to automate the human task of assigning genres to tracks in musical databases to facilitate searching, browsing and recommendation. In the latter, the goal is to discover the processes behind the human cognition of musical style, and often to try to determine how composer styles are manifested statistically or structurally. The computational approaches for each discipline have tended to be slightly different in the literature. MIR research focuses predominantly on statistical feature extraction and standard machine learning techniques. Style cognition research has a longer history, and has seen emphasis on grammatical and probabilistic models in addition to statistical feature extraction.

Scaringella et al. [Scaringella et al. 2006] provide a comprehensive survey of automatic genre classification, pointing out that it is an extremely non-trivial problem not only for technical reasons, but also due to many endemic problems with genre definitions themselves. One of these problems is the lack of a consistent semantic basis: labelling can derive from geographical origins (Latin), historical periods (Classical), instrumentation (Orchestral), composition techniques (Musique Concrète), subcultures (Jazz), or from terms which are coined arbitrarily in the media or by artists (Dubstep). Issues of scalability arise whenever new genres emerge from combinations of old ones. Pachet and Cazaly noted the utter lack of consensus on genre taxonomies among researchers and popular musical databases [Pachet and Cazaly 2000].

These problems cannot be ignored when designing classifiers. Scaringella argues that attempting to derive genre from audio requires the assumption that it is as much an intrinsic attribute of a title as tempo, which is “definitely questionable” [Scaringella et al. 2006]. Dannenberg et al. commented that higher-level musical intent appears “chaotic and unstructured” when viewed as low-level data streams [Dannenberg et al. 1997]. On the other hand, one particular study seems to provide good motivation for this line of research: Gjerdingen and Perrott found that humans with variable musical backgrounds were able to correctly categorise musical snippets of only 250 ms in 53 percent of cases, and snippets of 3 seconds in 72 percent of cases [Gjerdingen and Perrott 2008]. This result is convincing evidence that even untrained humans have an innate ability to recognise style from a small amount of data, which implies that the data must contain some measurable characteristics which make that possible. Therefore, in MIR the emphasis to date has been on the extraction of meaningful statistical features from short frames of audio data.

Statistical features extracted from audio fall into the broad categories of temporal, spectral, perceptual and energy content [Scaringella et al. 2006]. The precise feature extraction algorithms are numerous and need not be discussed here. Feature patterns are used to train models based on unsupervised clustering algorithms or supervised learning algorithms. In both cases the resulting model of pattern separation is used as the basis for the classification of new patterns extracted from unlabelled pieces of music. Various authors have reported success with an array of different algorithms and feature sets, for both audio and symbolic data [Scaringella et al. 2006]. The advantage of symbolic data is that reliably discerning musical statistics such as pitch and chord relationships is easily accomplished; a disadvantage is the absence of important spectral information.

Chai and Vercoe classified symbolic encodings of monophonic folk melodies as being Irish, German or Austrian using Hidden Markov Models, with an accuracy approaching 80 percent [Chai and Vercoe 2001]. The classification of symbolically encoded folk songs was also addressed by Bod, using probabilistic grammars to achieve 85 percent accuracy [Bod 2001]. Shan and Kuo trained a genre classifier using both MIDI harmonies and melodies [Shan and Kuo 2003]; they used a method combining a priori pattern finding with heuristics, which achieved an accuracy of 84 percent using just melodic features. Kiernan used self-organising maps to successfully partition audio into three classes representing the composers Frederick, Quantz and Bach [Kiernan 2000]. Ruppin et al. [Ruppin and Yeshurun 2006] used the K-nearest-neighbour algorithm to classify MIDI files as either Classical, Pop or Classical Japanese, with 85 percent accuracy. Kosina used K-nearest-neighbours to classify audio as Metal, Dance or Classical with 88 percent accuracy [Kosina 2002]. Xu et al. distinguished between Pop, Classical, Jazz and Rock audio using support-vector machines, with 96 percent accuracy [Xu et al. 2003]. Among the most comprehensive and successful work in MIR to date is that by McKay, who used a learning ensemble consisting of neural network and K-nearest-neighbour classifiers trained on MIDI files using 111 features and audio using 26 features, each weighted by sensitivity using a genetic algorithm. This system achieved a 9-genre classification accuracy of 98 percent [McKay 2010].

The majority of authors agree that improvement can be made by increasing the sophistication of the feature sets, but evidently there is still no widely accepted algorithm for making even extremely broad classifications. Some authors have deduced that the relatively small size of the datasets may be to blame — both McKay and Ponce de Leon et al. have concluded that song databases much larger than those currently in use are the key to assessing the real worth of particular combinations of feature sets and learning algorithms [McKay 2010; Ponce de Leon et al. 2004]. McKay also advocates the training of classifiers on both audio and symbolic features simultaneously. This requires perfect MIDI transcriptions of audio files, a rare commodity that will continue to rely on highly skilled human labour until significant advances are made in the field of automated polyphonic transcription [McKay 2010].

The recent release of a million-song feature-set for public use [Bertin-Mahieux et al. 2011] is likely to instigate the next generation of MIR research and a significant raising of the bar in the near future. In the meantime, it must be stressed that the assignment of genre labels to the automated Schillinger System’s output will be flawed to an extent; the purpose of the experiment is simply to determine whether the output’s statistical characteristics point more towards certain styles than others, and whether the output contains a notable degree of diversity.


4.4.2 Choice of Software

As described in section 4.3, the output of the automated Schillinger System requires conversion to audio for the human participants in the listening survey; however, only MIDI files could be used for the purpose of automated classification. The main reason for this is that the method for encoding audio from symbolic musical data in figure 4.1 is time-consuming, and it was desirable to classify a large number of compositions in order to obtain statistically significant results. The use of MIDI files meant that symbolic classification software was required.

Classification software designed specifically for MIR research is currently difficult to come by. Fortunately McKay has developed a suite for precisely this purpose called jMIR [McKay 2010], which may be used for both symbolic and audio files, and a predecessor called Bodhidharma [McKay 2004], which was designed specifically for working with MIDI files and is equivalent to using jMIR in symbolic mode. Bodhidharma was responsible for the winning entry at the 2005 MIREX music classification conference [McKay and Fujinaga 2005]. It extracts up to 111 selectable features, uses a hierarchical taxonomy of 9 root genres and 38 leaf genres, and uses a learning ensemble consisting of artificial neural network and K-nearest-neighbour classifiers [McKay 2004]. Furthermore, it is accompanied by a sizable training set of 950 MIDI files (referred to henceforth as the Bodhidharma set) intended for use with the hierarchical taxonomy. It is therefore arguably the best publicly available means for performing a classification experiment on MIDI data. Other options for MIDI feature extraction and analysis, such as Humdrum [Huron 2002] and The MIDI Toolbox [Eerola and Toiviainen 2004], were examined, but proved less comprehensive than Bodhidharma.

4.4.3 Classification Experiment

The goals of this experiment can be summarised as follows:

1. To find out which genres are automatically assigned to the Schillinger output;

2. To see if those assignments are significantly different for the outputs of the harmonic and melodic modules;

3. To test the hypothesis that the output from the Schillinger system is stylistically diverse.

The method used is outlined below:

1. Automatically generate sets of melodies and harmonies using the automated Schillinger System;

2. Establish appropriate configurations for training a classifier for each set;

3. Train separate classifiers on the Bodhidharma set using the two configurations;


4. Present the Schillinger sets to their respective classifiers to obtain genre labels;

5. Analyse the distribution of genre labels to satisfy each goal above.

4.4.4 Preparation of MIDI Files

The current version of the automated Schillinger System is effectively engineered as a ‘push-button’ solution consisting of separate modules for generating harmonies and melodies. The combinations of parameters controlling these modules, specified by the author, have been listed in section 3.6.3.

The melodic module accepts as input a vector of high-level axis types (see section 3.6.2). One hundred MIDI melodies were generated using the input (u u b b) — that is, a sequence of two ‘unbalancing’ axes followed by two ‘balancing’ axes. This set will be referred to as the 100M set. The harmonic module is fully automated. One hundred harmonies were generated, which will be referred to as the 100H set.3

Harmonies were encoded as MIDI files using one voice per track, in order to improve the performance of the feature extractor for pitch-class and textural features [McKay 2010].
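The sketch below illustrates this one-voice-per-track encoding under the same assumptions as the earlier MIDI example (mido as the library, block chords of equal duration); it is an illustration only, not the code used to produce the 100H set.

    import mido

    def harmony_to_midi(chords, path="harmony.mid", beats_per_chord=1.0, velocity=80):
        """Write a block-chord progression with one MIDI track per voice.
        `chords` is a list of equal-length pitch tuples, e.g. [(48, 60, 64, 67), ...]."""
        mid = mido.MidiFile(ticks_per_beat=480)
        ticks = int(beats_per_chord * mid.ticks_per_beat)
        for voice in range(len(chords[0])):
            track = mido.MidiTrack()
            mid.tracks.append(track)
            for chord in chords:
                track.append(mido.Message('note_on', note=chord[voice], velocity=velocity, time=0))
                track.append(mido.Message('note_off', note=chord[voice], velocity=0, time=ticks))
        mid.save(path)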

Ideally, to properly control the experiment, the Bodhidharma set should be modified to be in exactly the same format as the system’s output. This would mean creating one Bodhidharma set with all non-melodic tracks removed and another with all non-harmonic tracks removed, with appropriate rhythmic quantization applied to all note events. There are both practical and technical reasons why this could not be done in the required time-frame:

• Distinguishing between melodic and harmonic tracks is very problematic in some genres, despite being simple in others (those with lead vocals, for instance);

• Melodic content contributes to harmonic content, and the music’s functional context can easily change when the melody is absent;

• The two issues above mean that automating such a process could not give reliable results without implementing a complex set of algorithms for musical analysis. Such an implementation would be inordinately time-consuming within the scope of the thesis, as would the manual modifications otherwise required;

• With so much information missing, the classifier’s ability to train successfully on the Bodhidharma set may end up being too weak. If this were the case, then it might give credence to the notion that statistically similar harmonies or melodies can be adapted to multiple genres, but it could just as easily lead to meaningless classification results.

3 The first constraint in table 3.6 restricts harmonies to between 2 and 7 voices. This is deliberate, because anything thicker than 7 voices causes the nearest-tone voice leading algorithm to have an unreasonable execution time due to its computational complexity and the fact that Impromptu is an interpreter. See section 3.3.3.


Thus, a potentially less-than-ideal situation was settled upon to ensure the experiment was at least feasible.

4.4.5 Classifier Configuration

Bodhidharma’s strength as a MIR utility lies in the carefully designed set of 111 statistical features that are extracted from the MIDI data. These features are split into groups pertaining to instrumentation, texture, rhythm, dynamics, pitch, melody and chords. The complete list can be found in [McKay 2004]. In order to focus as closely as possible on the parameters controlled by the automated Schillinger System, the classifiers for the 100H and 100M sets were trained with certain features switched off for practical considerations, as shown below in table 4.1. For instance, it would not make sense to include the instrumentation features in the training patterns when every single member of the 100H and 100M sets uses the single default instrument of grand piano. McKay and Fujinaga pointed out that instrumentation features can have strong classification ability on their own [McKay and Fujinaga 2005]; it is therefore necessary to remove the possibility of all 200 samples being assigned a genre which is strongly defined by the presence of grand piano. Similar reasoning lies behind the ignoring of features relating to dynamics and, in the case of the block harmonies of the 100H set, rhythm.4

Table 4.1: Classification Experiments Using a 38-leaf Hierarchical Taxonomy

Feature Types            Default   100M   100H
Instrumentation          on        off    off
Texture                  on        on     on
Rhythm                   on        on     off
Dynamics                 on        off    off
Pitch Statistics         on        on     on
Melody                   on        on     on
Chords                   on        on     on
Root success rate (%)    84.7      67.0   80.0
Leaf success rate (%)    58.3      43.9   57.0

Table 4.2, found below, lists the parameter settings which have the most impact on the execution time and the classification accuracy for the training set. As Bodhidharma is flexible enough to allow training sessions which may run for impractical amounts of time (in the order of several CPU-weeks), it was necessary to make several compromises. The final configuration was slightly more liberal than one used by McKay which was deemed successful in [McKay 2004]. Using this configuration, the various combinations of extracted features led to the root and leaf classification success rates on the training set found in table 4.1.

4 In fact, Bodhidharma contains a bug that causes division by zero during the extraction of certain rhythmic features from MIDI sequences in which note events are perfectly quantized and regularly spaced — so the decision was further enforced by circumstance.


It should be noted that using a hierarchical taxonomy tends to hinder the assignment of correct root categories — when trained with Bodhidharma’s regular flat taxonomy, root success rates are generally above 95 percent [McKay 2004]. The leaf success rates for the training set, while not spectacular, are still impressive compared to the expected success rate of 2.63 percent for pure chance (that is, the random assignment of leaf genres), and hence should be adequate for gaining an insight into the characteristics of the 100M and 100H sets.

Table 4.2: Bodhidharma configuration

Preference                              Setting
Training/test split                     80:20
Cross validation                        NO
Weight multi-dimensional features       YES
Flat classifier ensemble                YES
Hierarchical classifier ensemble        YES
Round robin ensembles                   NO
Max GA generations                      105
Max change in GA error before abort     10e-5
Max NN epochs                           2000
Max change in NN error before abort     10e-7
Certainty threshold                     0.25

4.4.6 Classification Results

The classifier was trained on the Bodhidharma set. The resultant training time for the configuration described in section 4.4.5 was roughly 300 minutes. The 100M and 100H sets were then fed to the classifier to obtain genre labels. The assignment of genres for the two sets is presented in figures 4.2 and 4.3. In cases where multiple outputs of the neural network fired above the certainty threshold, multiple genres were assigned. This provision is widely considered to be representative of how genres are assigned by humans [Scaringella et al. 2006; McKay 2010], and is the reason for the relative genre assignments in the graphs summing to more than 100 percent.
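A small sketch of how this works in principle: each piece receives every genre whose classifier activation exceeds the certainty threshold, and the per-genre counts are then expressed as percentages of the number of pieces, so the percentages can sum to more than 100. The activation values below are placeholders, not Bodhidharma output.

    from collections import Counter

    def assign_genres(activations, threshold=0.25):
        """Return every genre whose output activation exceeds the certainty threshold."""
        return [genre for genre, value in activations.items() if value > threshold]

    # Placeholder activations for two classified pieces (not real classifier output).
    batch = [
        {"Jazz": 0.41, "Western Classical": 0.71, "Rock": 0.05},
        {"Jazz": 0.12, "Western Classical": 0.88, "Rock": 0.02},
    ]
    counts = Counter(label for piece in batch for label in assign_genres(piece))
    relative = {label: 100.0 * n / len(batch) for label, n in counts.items()}
    print(relative)   # can sum to more than 100% because a piece may receive several labels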

In figures 4.2 and 4.3, clustering is apparent in the broader genres of Jazz, Rhythm and Blues, and Western Classical. Many genres have not been assigned at all. There is also a significant difference between the assignment of harmonies and melodies. 100M was classified as 67 percent Jazz, 16 percent Rhythm and Blues and 82 percent Western Classical. Conversely, a convincing 100 percent of the 100H set is deemed to be Western Classical, with only 4 percent being assigned Jazz. These figures are apparently strong evidence that the output of the automated Schillinger System does in fact have salient statistical properties which are suggestive of particular styles, and that the melodic module has more diverse output than the harmonic module. These results will be discussed further in section 4.6, in the context of the data from the listening survey.


[Figure: bar chart of leaf classifier results, showing the relative assignment (0–100%) of the 38 leaf genres to the 100H and 100M sets.]

Figure 4.2: Leaf genres assigned to 200 samples from a 38-leaf hierarchical taxonomy

[Figure: bar chart of root classifier results, showing the relative assignment (0–100%) of the nine root genres (Country, Jazz, Modern Pop, Rap, Rhythm & Blues, Rock, Western Classical, Western Folk, Worldbeat) to the 100H and 100M sets.]

Figure 4.3: Root genres assigned to 200 samples from a 38-leaf hierarchical taxonomy


4.5 Assessing Musical Merit

Section 4.2 mentioned the inadequacy of informal assessments of computer-generated compositions. As the automated Schillinger System does not make any attempt to imitate a particular style, there is no objective function of any complexity that will be able to give an indication of the inherent quality of the compositions. Hence, there is a motivation to evaluate the system using a group of expert listeners, using a more rigorous and repeatable method than is typically found in the academic literature. The aim of the experiment is to gather and process subjective data as objectively as possible to correctly identify consensus or variation in collective opinion.

The following sections will outline the details of a listening experiment that has provided strong indications about the intrinsic musical merit of the material generated by the automated Schillinger System. Sections 4.5.1 and 4.5.2 will discuss the survey and the audio samples, and justify the decisions that went into their preparation. Section 4.5.3 will present the quantitative results from the sections of the survey involving Likert scales, and section 4.5.4 will present the qualitative results obtained from participants’ written responses using a process of analysis borrowed from the field of Grounded Theory.

4.5.1 Listening Survey Design

Unlike the situation in section 4.4.3, in which a batch of 200 output samples could be presented to a classifier, only a handful of samples can be presented to an audience. A group of six samples was therefore used, split evenly between the main system modules: three harmonies and three melodies. In order for this group of six samples to be properly representative of the range of output from the automated Schillinger System, a selection process was necessary, because it is possible for the system to produce a string of pieces utilising collections of parameters that effectively form clusters in terms of their resulting interval distributions or rhythmic content. In the literature, Holtzman acknowledged the necessity of selecting from the output in this way: “Ultimately, a composer must choose which generated utterances to use, how to interpret the data generated by the machine, and so on. The composer may be seen as a selector” [Holtzman 1981]. Cope selected exemplars from his system’s output to constitute a final representative collection [Cope 2005].

The decision to render the selected output as audio performances, to provide the listeners with a complete musical context, was informed by several authors who have faced the same decision. Hiller commented that “performance is, without a doubt, the best test of the results” [Hiller 1981]. Gartland-Jones followed a similar philosophy in a festival installation of an interactive GA system, where output was performed on guitar: “Recording the output on a real instrument enabled the perceived musicality of the fragments to be brought out, and provides additional musical dimensions” [Gartland-Jones 2002]. DuBois is of the same view – that the product of his L-System is an intermediate state requiring joint interpretation by the composer and performer to render it fit for consumption [DuBois 2003].


Title        Instrumentation   View (Appendix A)   Media URL
Harmony #1   Rhodes piano      A.1                 Listen
Harmony #2   Orchestra         A.2                 Listen
Harmony #3   Grand piano       A.3                 Listen
Melody #1    Clarinet          A.4                 Listen
Melody #2    Grand piano       A.5                 Listen
Melody #3    Violin            A.6                 Listen

Table 4.3: Output samples used in the listening survey

It should be noted that these issues are not relevant for all computer music systems, including those with output that is physically impossible to perform and those which are interactive during performance [Blackwell 2007; Biles 2007].

The method for generating performances from the system’s output for the listening survey was described earlier in section 4.3. To prevent the listeners from becoming bored with, and potentially biased against, the timbre of a single instrument, a variety of instruments was used. Table 4.3 lists the instruments used for each sample. These titles correspond with tracks 1–6 on the CD accompanying this thesis. The table also contains hyper-links for listening to the audio files online.

The survey was designed in consultation with Jim Cotter, a senior lecturer in composition at the Australian National University (ANU). The survey preamble encourages participants to provide entirely subjective opinions, and to judge musical merit against their own musical experiences instead of attempting to compare the samples to other computer-aided composition software. For each audio sample, listeners were asked to register their opinion of four different aspects of the music on a Likert scale, as well as to provide written opinions on what intrigued or bored them. Likert scales are widely used in many fields of research within the humanities; they are used to rank opinion strength and valence as shown in figure 4.4. Their symmetry allows a respondent to express impartiality. Five labels were used, with four extra nodes interspersed so that participants would feel free to register opinions between the labels.

[Figure: a nine-node Likert scale running from -4 to +4, with the labels Very negative, Negative, Neutral, Positive and Very positive at alternate nodes.]

Figure 4.4: Likert scale example

The Likert scales for each audio sample represented the dimensions gut reaction, interestingness, logic and predictability. The final page of the survey registered two further dimensions — diversity and uniqueness. Each term may be largely self-explanatory; however, they were deliberately not defined or clarified for the participants prior to the commencement of the experiment. Instead, it was intended for them to decide for themselves precisely what to listen for, rather than add the distraction of trying to reconcile worded definitions with what they were hearing. Explanations of the dimensions encompassed by the survey are itemised below.


• People’s gut reactions were recorded so that a measure could be obtained of whether the group actually enjoyed what they were listening to on a fundamentally aesthetic level. This question was placed at the top of each page to increase the likelihood of it being answered first in a spontaneous way. This kind of measure is obviously important if the point of a composition system is to produce music that people like.

• Interestingness is, broadly speaking, a measure of how well the music holds people’s attention and, as far as composition as an art-form is concerned, a measure of success. Miranda concluded that while computers can compose music, rule-based systems seldom produce interesting music [Miranda 2001]. Given that the automated Schillinger System is rule-based, it is clearly important to find out if it can produce interesting music or not.

• Logic was chosen as a subjective measure because several authors or their audiences have commented on the fact that despite computer compositions being ‘pleasing’ or ‘acceptable’, they are often criticised for lacking logical progression, development or higher-level structure [Pereira et al. 1997; Mozer 1994]. Although logic in terms of musical structural coherence can, to some extent, be measured quantitatively by searching for multilevel self-similarity in the manner of Lerdahl and Jackendoff [Lerdahl and Jackendoff 1983], it is still an important element to test subjectively because it has more than one possible interpretation.

• Predictability was used to roughly measure the ‘surprise’ factor (or lack thereof) which can either contribute to or detract from the other three elements. It is conceived as a subjective measure of information content, thus bearing some relation to work by Cohen [Cohen 1962] and Pinkerton [Pinkerton 1956], and also to Schillinger’s notion of the ‘psychological dial’ which has occasionally been referred to by film composers [Degazio 1988]. The neutral position on the Likert scale indicates a balance between predictable and unpredictable musical events in the minds of the listeners. It was expected that each listener’s ideal balance would lie at this position even if their respective tastes for unpredictability differed wildly. For this reason the extreme points of the scale were labelled ‘too predictable’ and ‘too unpredictable’ so that the relationship to musical merit could be more easily inferred.

• Diversity was intended to collect data to compare to the results of the automatic classification system, and to aid in interrogating the notion that Schillinger’s system is somehow neutral in a stylistic sense. It also helped in assessing how the system’s output might apply to different musical contexts in practice.

• Uniqueness was intended to gauge how different the music was from that which the audience had heard in the past. This question was included in order to add perspective to the interpretation of the other answers. For instance, if the group were to claim that they had essentially ‘heard it all before’, this might add credibility to positive or negative consensus in other questions.

• The survey’s final question was whether, as composers, the participants could imagine using the system themselves to generate raw musical material. The answers to this question may indicate whether a more advanced interactive version of the system would be adopted for experimentation if it were made available to the wider composition community.

The complete survey has been included in this document in Appendix B for reference.

4.5.2 Listening Experiment

A total of 28 survey participants, ranging from first-year undergraduates to postgraduates and lecturers, were recruited from the composition department at the ANU School of Music. Composers in particular were chosen because they are trained to possess a strong ear for multiple levels of musical structure, they tend to have an extremely diverse range of musical tastes and listening experiences, and they may also be able to perceive the construction of the samples in terms of their knowledge of compositional techniques. The survey procedure was approved by the ANU’s Human Research Ethics Committee.5 Undergraduates were requested to note their composition enrollment level (how far through their major they were). This field was marked ‘N/A’ by postgraduates. Each audio sample was played twice over loudspeakers while participants filled in the survey questions. Each first playing was followed by a 30-second pause and each second playing by a 60-second pause. Participants were then given time to fill in the section of general opinions regarding the group of compositions as a whole.

4.5.3 Quantitative Analysis and Results

This section describes box plot summaries of the data collected from the Likert scales for the six samples, found below in figure 4.5. The plots represent the dimensions gut reaction, interestingness, logic and predictability for the three harmony samples (H1–H3) and the three melody samples (M1–M3). Boxes represent interquartile ranges (the ‘middle 50 percent’ of opinion), diamonds indicate arithmetic means, red bars indicate medians, and ‘whiskers’ (the dashed lines) indicate extremes of opinion. There are no outliers. Given that each scale contains five labels with extra nodes in between them, the range for each dimension is [−4, 4]. No participant marked in between any of the nine nodes, so only integers were recorded. Two of the participants wrote comments on the general opinions page instead of answering the Likert scales. These answers were transferred verbatim into the text fields on that page to ensure no qualitative data was lost, and the opinions were converted into reasonable estimates of the corresponding Likert values.

5 Ethics protocol no. 2008/237
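For readers who wish to reproduce this style of summary, the following sketch produces one such horizontal box plot with matplotlib; the whiskers are set to the extremes of the data and the means are marked, matching the description above. The scores used here are placeholders, not the survey data.

    import matplotlib.pyplot as plt

    # Placeholder Likert scores (integers in [-4, 4]) for one dimension, keyed by sample.
    gut_reaction = {
        "H1": [1, 2, 0, -1, 3, 1], "H2": [0, -2, 2, 0, 1, -1], "H3": [2, 1, 3, 0, 2, 1],
        "M1": [1, 0, 2, 1, -1, 2], "M2": [2, 3, 1, 2, 0, 2],   "M3": [1, 1, 0, 2, -1, 1],
    }

    samples = list(gut_reaction)
    fig, ax = plt.subplots()
    ax.boxplot([gut_reaction[s] for s in samples],
               vert=False, showmeans=True, whis=(0, 100))  # whiskers at the extremes, no outliers
    ax.set_yticklabels(samples)
    ax.set_xlim(-4, 4)
    ax.set_title("Gut Reaction")
    plt.show()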


[Figure: four horizontal box plots on a scale from −4 to 4, one per dimension — (a) Gut Reaction, (b) Interestingness, (c) Logic and (d) Predictability — each showing samples H1–H3 and M1–M3.]

Figure 4.5: Box-plots representing individual samples

The gut reaction mean results in figure 4.5(a) range from exactly neutral for sample H2 to 1.43 for sample M2, which is tending towards the value of ‘like’ on the Likert scale. For all samples except H2, the interquartile box lies on the positive side of neutral. H2 appears to have polarised the audience the most, with the mean, median and interquartile box lying exactly on or centered around zero. The overall response for interestingness, shown in 4.5(b), was unequivocally positive, with all means lying on or above 1 and almost all of the interquartile data being above zero. The noticeably smaller interquartile boxes indicate a greater consensus of opinion. In figure 4.5(c), the unanimous perception of logic within M2 is striking. There is a greater range of means between samples (−0.86 to 2.14) and less consensus on each individual sample, indicated by most of the interquartile boxes being wider. In figure 4.5(d), the interquartile boxes for predictability are also generally wider, although the general perception is closer to neutral (a good balance between predictability and unpredictability). Samples H1 and, in particular, H2 were perceived unanimously as too unpredictable.

It is notable that samples H3 and M2, which have the highest means for gut reaction and logic, also have the two lowest means for predictability (suggesting they were the most predictable). Sample H2, which was the least liked according to its gut reaction, was also considered the most interesting (by a slight margin), the least logical and the most unpredictable.

The figure 4.5 plots suggest that overall, people enjoyed what they heard, and found it somewhat interesting and logical, but that each individual sample certainly polarised the audience to a degree, as indicated by the width of the interquartile boxes and the extent of the whiskers. The opinions of logic and predictability also appear to have differed significantly between samples, compared to the measures of gut reaction and interestingness.

[Figure: two horizontal box plots on a scale from −4 to 4: (a) Sample Aggregates, covering Gut Reaction, Interestingness, Logic and Predictability aggregated across all six samples; (b) General Opinions, covering Diversity, Interestingness, Logic, Predictability and Uniqueness.]

Figure 4.6: Box-plots representing overall opinion

The box plots in figure 4.6 give further promising indications of the intrinsic merit of the samples. Plot 4.6(a) was calculated by aggregating the data across all six samples for each dimension; hence it shows an extreme overall range of opinion, but it also shows that the average opinions on gut reaction, interestingness and logic were positive and predictability was close to ideal. Plot 4.6(b) represents the final page of the survey, which collected participants’ overall opinions of the set of samples after listening was concluded. Once again, there is the suggestion of an overall positive reaction for the measures which were used for each sample. It is interesting to note the strong correspondence between figures 4.6(a) and 4.6(b) for interestingness, logic and predictability. This indicates that opinions changed very little on average between the listening phase and the final page of the survey. The opinion of diversity is positive, which supports the idea that the automated Schillinger System may at least be useful in a variety of stylistic contexts. The only strongly negative measure is that of uniqueness, which is an assertion that the audience did not encounter anything especially unfamiliar.

Table 4.4: Kruskal-Wallis variance measure p for each dimension across all 6 samples

Dimension        Mean   Median   Std. Dev.   p         p with H2 removed
Gut Reaction     0.84   1        1.71        0.0125    0.1477
Interest         1.26   2        1.55        0.9605    0.9023
Logic            0.71   1        1.94        <0.0001   0.0004
Predictability   0.28   0        1.76        0.0031    0.2359

To corroborate the intuitive conclusions about diversity of opinion between samples drawn from visual inspection, the Kruskal-Wallis variance measure was applied to each dimension across the samples. This measure, expressed as p, falls below 0.01 if the data in a dimension contain statistically significant differences among subgroups. The Kruskal-Wallis results can be found in table 4.4 alongside the mean, median and standard deviation for each dimension across all six samples. Additionally, from observation of figure 4.5 it would appear that Harmony #2 (H2) elicited a rather different reaction from listeners compared to the rest of the samples. In order to validate that assertion, the Kruskal-Wallis measure was repeated with the H2 results removed from the data-set — this is included in the right-most column of table 4.4. The p values confirm the consensus among participants regarding the music’s interestingness and a varying perception of both logic and predictability across the different samples. They also confirm that sample H2 caused the high variance of predictability across samples. Sample H2 contained the most voices and arguably the highest degree of dissonance, which is perhaps what people reacted against.
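The computation itself is straightforward with standard statistical software; the sketch below shows the equivalent test using scipy, with placeholder scores standing in for the survey data.

    from scipy import stats

    # Placeholder Likert scores for one dimension, grouped by sample (not the survey data).
    logic = {
        "H1": [1, 0, 2, -1, 1], "H2": [-2, -1, 0, -3, -1], "H3": [2, 3, 1, 2, 2],
        "M1": [1, 2, 0, 1, 1],  "M2": [3, 2, 3, 2, 3],     "M3": [0, 1, 1, 0, 2],
    }

    # Kruskal-Wallis test across all six samples.
    statistic, p = stats.kruskal(*logic.values())
    print(f"p across all samples = {p:.4f}")

    # Repeat with H2 removed, as in the right-most column of table 4.4.
    statistic, p_no_h2 = stats.kruskal(*(v for k, v in logic.items() if k != "H2"))
    print(f"p with H2 removed = {p_no_h2:.4f}")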

The final survey question was whether or not the participants could imagine using this kind of system for generating musical material. The data from this question was collected in the form of ‘no’/‘maybe’/‘yes’ circled answers. These responses were encoded as -1, 0 and 1. The mean response of 0.07 — shown in figure 4.7 — substantiates the observation that most people circled ‘maybe’, and more people circled ‘yes’ than ‘no’.

[Figure: the mean response of 0.07 marked on a scale from −1 (No) through 0 (Maybe) to 1 (Yes) for the question “Would you use the system?”.]

Figure 4.7: The mean anticipated ‘usefulness’ of the automated Schillinger System


A Pearson’s correlation analysis is shown in figure 4.8 to see if any strong relationships exist between dimensions — in particular, whether the more ‘experienced’ composers, as inferred from undergraduate levels, had different opinions to those with less experience. For this to be possible, the values of ‘N/A’ collected from the survey were encoded as the value of 7, because all of the ‘N/A’ group were postgraduates and undergraduate levels fell between 1 and 5. It is notable that composition experience only correlated strongly with the opinion of uniqueness. This and other strong correlations to be deduced from the graph are summarised below; a minimal sketch of how such correlations can be computed follows the list.

Figure 4.8: Pearson’s correlation graph of the survey’s quantitative data

• Participants with more composition experience found the samples less unique (that is, more familiar);

• Participants who found the music less familiar found it more interesting;

• Participants generally found the highly logical samples to be too predictable;

• Participants who found the music interesting noted a higher level of diversity;

• Participants who registered the most positive gut reactions also found the music somewhat interesting and logical, suggesting that these properties are intrinsic to the enjoyment of music.
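The sketch below shows how such a pairwise Pearson correlation matrix can be computed with pandas. The column names mirror some of the survey dimensions and the values are placeholders; they are not the collected data.

    import pandas as pd

    # Placeholder per-participant values (not the survey data). 'experience' is the
    # composition enrollment level, with postgraduates encoded as 7 as described above.
    df = pd.DataFrame({
        "experience":   [1, 3, 5, 7, 7, 2],
        "gut_reaction": [1, 2, 0, 1, 2, 1],
        "interest":     [1, 2, 1, 2, 2, 0],
        "uniqueness":   [2, 1, 0, -2, -1, 1],
    })

    # Pairwise Pearson correlation coefficients between all dimensions.
    print(df.corr(method="pearson"))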


Generally speaking, the data from the Likert scales can be said to indicate a thoughtful and mostly positive response from the audience, with many divided opinions within individual samples and differing collective opinions across the group of samples. Furthermore, the composers showed a degree of curiosity about the system by indicating that they would entertain the idea of using it themselves. From a developer’s perspective this is an encouraging response, because it shows that expert listeners have acknowledged the musical merit and potential of the current state of the output. This provides an impetus for further exploring the implementation of Schillinger’s procedures.

4.5.4 Qualitative Analysis

4.5.4.1 Methodology

Each section of the survey incorporated a blank field in which participants could freely write about any elements of the music they believed to be intriguing or boring. These fields were deemed necessary in order to capture the nuances of opinion that would otherwise be lost in the small number of Likert dimensions.

Written responses provide a rich source of information that must be analysed using an established qualitative method. The principles of Grounded Theory were borrowed for this purpose. Grounded Theory originates with the work of Glaser and Strauss [Glaser and Strauss 1967] and is prominent in the fields of psychology and human-computer interaction [Lazar et al. 2010]. Glaser and Strauss pursued the basic idea that in fields where established theories often do not exist, but where data sources are abundant, it makes far more sense to allow hypotheses to emerge as part of the process of data collection and analysis, rather than to formulate them a priori. Thus the principles of data ‘coding’ and ‘emergent categories’ become important, as does the repeatability of the coding process. Coding is, in short, the conversion of human responses to a consistent short-hand which allows for general concepts to be represented, higher-level categories to be defined and robust relationships to be identified within or between data-sets [Lazar et al. 2010].

The purpose of using Grounded Theory for the listener responses was to develop a better understanding of how the listeners reacted to the audio samples. The coding process identified recurring keywords to help build this picture and allow concept categories to emerge. Since the data consisted of subjective evaluations, each instance of a category was assigned a valence of opinion (positive or negative) and a magnitude of opinion. An initial review of the data suggested that three levels of magnitude were sufficient (slight=1, moderate=2, strong=3). Category instances were then tallied and graphed to facilitate higher level conclusions. The resulting concept/category hierarchy should enable the experiment to be easily repeated with different participants and different audio samples.

This is an appropriate method to use on the current sample size: Guest et al. have found that in interview situations, new codes rarely tend to emerge after 12–15 interviews [Guest et al. 2006]. Survey responses are relatively short by comparison, but given that the subject matter was tightly constrained by the scenario, it was highly likely that the responses of 28 participants would contain enough data to make this process worthwhile. Furthermore, this particular use of Grounded Theory is warranted by the fact that, despite there being only one coder (the author, as multiple coders were not an option for organisational reasons), this coder was equipped with specialist domain knowledge on the subject (the author is a musician and composer) [Lazar et al. 2010]. This helped to ensure consistency and reliability of the coding process, and also to ensure the correct identification of the point of ‘theoretical saturation’; that is, the threshold beyond which no new categories emerge [Glaser and Strauss 1967].

4.5.4.2 Analysis and Results

During the initial phase of coding the participants’ responses, several categories rapidly presented themselves as elaborations of the Likert categories, including predictability, interestingness and logic; as well as form/structure, instrumentation/timbre, identifications of style or genre, and identifications of compositional techniques like repetition and variation. Understandably, several categories emerged commenting on aspects of the samples beyond the control of the system, such as the performance, dynamics and recording quality.

Consistency of coding is essential to the validity of Grounded Theory, especially since the interpretation of written opinions requires an unavoidable degree of subjectivity. Certain principles were followed which are listed below:

• Blank fields were ignored;

• Declaring that a sample had no boring aspects was viewed as a strong indication of general merit;

• A declaration that there was nothing intriguing about a piece was considered a strong indication of a lack of general merit;

• Valence of opinion was for the most part determined by whether or not the person was writing in the ‘intriguing’ or ‘boring’ field, unless it was otherwise obvious;

• Magnitude of opinion was inferred from any qualifiers or adjectives used, and whether the opinion was in agreement or contradiction with other opinions of the listener in the same section;

• For opinions to qualify as strong they had to either contain emotive language or be clearly unequivocal;

• Multiple categories could be assigned to single statements depending on the implications given by their wording;

• It was possible for the same statement to be assigned a positive or negative valence of opinion depending on the person’s taste;

• Multiple comments on different concepts within the same category were treated separately to retain information, so that, for instance, a positive reaction to perceived harmonic function would not be simply cancelled out by a negative reaction to harmonic voicing.

Some examples are given here for clarity. Codes are represented as three-element tuples of the following format:

{category, concept, opinion type}

The opinion “perfectly acceptable melody. Sounded great” was coded as {general, merit, +3}. “A touch too dissonant, seemingly a bit random” was coded as {dissonance, general, -1} and {predictability, unpredictable, -1}. These latter opinion types on their own could perhaps have been interpreted as ‘moderate’, but they were offset by what the same person found intriguing about the sample: “some very nice resolution. The range was quite vast”, implying that the dissonance was only a minor issue. This opinion was coded as {harmonic, function, +2} and {textural, range, +2}. ‘Moderately positive’ was chosen for both due to the presence of the word ‘some’ in the first sentence, which suggests that the very nice resolution was somewhat irregular, and the word ‘vast’, which can carry an emotive gravity but in this case was stated as more of a detached observation than an inherently meritorious characteristic. A total of 239 opinions were coded in this manner. Table 4.5 contains the resulting code concepts and emergent categories.
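The coded tuples lend themselves to a simple tally before plotting. The sketch below uses only the examples from this section; it is illustrative and is not the tooling that produced the results.

# Tally (category, signed magnitude) pairs; the tally later becomes the
# point size in the qualitative-analysis plot (figure 4.9).
from collections import Counter

coded_opinions = [
    ("general",        "merit",         +3),
    ("dissonance",     "general",       -1),
    ("predictability", "unpredictable", -1),
    ("harmonic",       "function",      +2),
    ("textural",       "range",         +2),
]

tallies = Counter((category, opinion) for category, _concept, opinion in coded_opinions)
for (category, opinion), count in sorted(tallies.items()):
    print(f"{category:>15}  {opinion:+d}  x{count}")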

Table 4.5: Emergent categories and associated code concepts

Abbr.   Category                           Concept
TMB     Instrumentation and timbre         Homogeneity; General
PRF     Aspects of the human performance   General
REC     Recording and mixing quality       Reverb
TMP     Tempi                              General; Stasis
DYN     Dynamics and articulation          Accentation; General
MOD     Mood and emotional content         Ambient; Happy/meandering; Nice/pretty
LEN     Length                             General
FRM     Form and structure                 Tension and release; General
RPV     Compositional Techniques           Repetition; Variation; Motif use
TON     Tonality                           Lack of; Modal; General
DIS     Dissonance                         General; Varying degrees of
PRD     Predictability                     Predictable; Unpredictable; Presence of rules
MDY     Melodic aspects                    Interest; Contour; Development; Range; Lack of direction; Lyricism; Logic; Polyphonic implication
TXT     Textural aspects                   Range; Density
STY     Comments on style or genre         Consistency; Specific composer; Specific style or historical period
RHY     Rhythmic aspects                   Stasis; Interest; Lack of rests; Complexity; Metre
HMY     Harmonic aspects                   Function/resolution; Interest; Logic; Voice leading; Direction/development; Lack of direction; Complexity; Stability; Implication
GEN     General musicality                 Simplicity; Merit; Potential; Nice ideas; Lack of diversity/contrast


[Two-panel plot: “Qualitative Analysis − Harmony” (left) and “Qualitative Analysis − Melody” (right); horizontal axes run from −3 to +3, and the vertical axes list the category abbreviations of table 4.5 (TMB, PRF, REC, TMP, DYN, MOD, LEN, FRM, RPV, TON, DIS, PRD, MDY, TXT, STY, RHY, HMY, GEN).]

Figure 4.9: Coded results of qualitative analysis of participant responses. The results for harmonies are contained in the left-hand graph; melodies in the right-hand graph.

The coded opinions, along with their associated valence and magnitude information, are plotted in figure 4.9. Magnitude and valence of opinion constitute the horizontal axes and the emergent categories constitute the vertical axes. The vertical axes are unordered; however, those categories that were not entirely relevant to the behaviour of the automated Schillinger System have been placed towards the top of the graph. The abbreviations can be deciphered using table 4.5. No information lies at zero magnitude for the simple reason that it would constitute a ‘null’ opinion, and none of these were expressed by respondents. The size of each point on the graph represents the tally of each opinion type for a particular category; a sketch of how such a plot can be produced is given after the list below. Figure 4.9 provides a great deal of information. The most important inferences are listed below.

• Judging by the general opinions row and a greater presence of points in the +3 column, participants thought the melodies were better than the harmonies;

• Predictably, most of the comments related specifically to harmonic and melodic properties. For both groups of samples the opinions offered were substantially more positive than negative;

• Only a small number of people made comments which did not shed any light on the success of the automated Schillinger System itself (the top five rows). This indicates that people were engaged and well aware of the parameters they were listening for;

• People could not help being unimpressed by the static rhythm of the harmonies, despite the fact that they knew to expect it. This suggests an initial focus for further development must be to treat harmony as integral to other contexts rather than a lone entity;

• There was a greater perception of actual compositional techniques taking place in the melodic samples, even though opinion on their success was divided;

• From the Likert scale data it was concluded that people generally perceived a balance of predictability and unpredictability. Figure 4.9 confirms that they mostly enjoyed whatever unpredictability or predictability they experienced.
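As referenced above, a plot in the style of figure 4.9 can be built from such tallies with a scatter whose marker area grows with the tally, as sketched below with matplotlib and invented numbers; this is not the plotting code behind the thesis figures.

# One row per category, signed opinion magnitude on the x-axis, and marker
# size proportional to the tally of that opinion type.
import matplotlib.pyplot as plt

tallies = {("HMY", +2): 9, ("HMY", -1): 4, ("MDY", +3): 6, ("RHY", -2): 5, ("GEN", +1): 7}
categories = sorted({cat for cat, _ in tallies})
row = {cat: i for i, cat in enumerate(categories)}

xs = [opinion for _, opinion in tallies]
ys = [row[cat] for cat, _ in tallies]
sizes = [40 * count for count in tallies.values()]

fig, ax = plt.subplots()
ax.scatter(xs, ys, s=sizes, alpha=0.6)
ax.set_xlim(-3.5, 3.5)
ax.set_yticks(range(len(categories)))
ax.set_yticklabels(categories)
ax.set_xlabel("Valence and magnitude of opinion")
plt.show()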

4.5.4.3 Genre and Style

Table 4.6 contains all genres or styles that were identified by participants, including styles supposedly identifying particular composers. In the table these are associated with the root genres found by the automated classifier to give some sense of comparison (see section 4.4). Plotting the ratio of the occurrences of root genres in table 4.6 against the classifier results in figure 4.3 is tempting, but this would not be particularly legitimate because the listening survey used only six samples and the genres identified by humans were mostly assigned to the group as a whole, which contained both harmonies and melodies. However, it is clear that the vast majority of comments on genre and style fell within the bounds of Western Classical music, and this is in striking concordance with the results of section 4.4.6.

If the attention is instead focussed within the genre of Western Classical music, which is an extremely broad genre, then the participant responses in table 4.6 do suggest a fair level of stylistic diversity which could perhaps not be captured by the modest collection of Western Classical sub-genres in McKay’s taxonomy (see figure 4.2).


Table 4.6: Genres suggested by participants

Classifier root genre    Genre or style identified
Jazz                     Jazz
Western Folk             Folk; Disney
Western Classical        Sibelius; Chopin; Late Romantic; Impressionist; Perpetuum Mobile; Shostakovich; Post-1920; Classical; Atonal; Stravinsky; Expressive tonal music; Non-traditional; Art Music; Minimalist; Etude; Bartok; Western Tonal Classical; 20th Century
Modern Pop               Pop; Muzak; New-age
Rock                     Progressive Rock

4.6 Discussion

The automated Schillinger System’s output has been evaluated using methods which are intended to improve upon those currently present in the computer music literature. The stylistic diversity of a group of 200 output samples has been measured using an automated genre classification system. The intrinsic musical merit of a group of six selected output samples, rendered with human performances, has been rigorously assessed by a group of expert human participants.

The results from the listening experiment are convincing. Collectively, the listeners registered positive responses regarding the music’s merit, in particular its likeability and interestingness. They decided that the music’s level of predictability was close to appropriate, and that there was some form of logic underlying its construction, although in these cases there was slightly less consensus. The application of a method of qualitative analysis from Grounded Theory revealed a multitude of complaints and compliments specific to various properties of the samples, which have provided a wealth of information to inform further development. Ultimately these contributed to an overall positive opinion of the system’s output.

The classification experiment suggested that the harmonies fell within the sweeping genre of Western Classical music, while the melodies were a somewhat more diverse split among Western Classical, Jazz, and Rhythm and Blues. These results are corroborated quite strongly by the list of styles and genres the human participants attributed to the samples in the listening experiment. These latter styles, however, do represent diversity within the genre of Western Classical music, showing the potential for the automated Schillinger System to be applied to a variety of musical contexts.

Additional experiments will be needed to address some lingering questions. For instance, it is unclear how much of an influence the quality of the audio rendering may have had on listeners’ perceptions of musical merit, or how much the choice of instrumentation influenced their interpretations of style. McKay and Fujinaga have suggested that instrumentation is a particularly important feature for automatically distinguishing genre [Mckay and Fujinaga 2005]. On the other hand, Aucouturier and Pachet found that for humans, style and timbre may not be so strongly correlated [Aucouturier and Pachet 2003]. It would almost certainly be unwise to revert to presenting raw MIDI data to an audience, but it could be informative to perform a similar experiment using high quality recordings limited to a single instrument.


Chapter 5

Conclusion

The Schillinger System of Musical Composition was intended to be used by students of composition and by working composers. Despite its self-proclaimed grounding in a school of thought that espoused rigorous scientific approaches to all forms of human endeavour, it was ultimately designed to stimulate real creativity in musical thinking. Schillinger almost certainly did not conceive of the formalism as a means of generating music automatically; in fact he stated quite plainly that success using his methods depended on “the ability to think” [Schillinger 1978]. Such a statement should not act as a deterrent, but it does force one to accept that an extensive formalism intended for the composition of new music cannot be so rigorous that it presents itself as a complete mathematical framework for computer implementation. The issues encountered in building the automated Schillinger System, as described in chapter 3, were therefore to be expected, and by necessity the resolutions of these issues required a modicum of creativity on the author’s part.

Rader stated the opinion that the goal of computer music is not to be aesthetically ‘perfect’, but to be indistinguishable from human-produced music [Rader 1974]. This goal has since been mostly superseded by the idea, echoed by Blackwell, that

the goal of automated composition research is not to replace human music making with an automatic machine . . . the desire is to find artificial music that is different from human expression, yet comprehensible [Blackwell 2007].

This line of thinking influenced the decision to use a listening experiment in this research to establish the intrinsic musical merit of the automated Schillinger System’s output, rather than attempt to bluff audiences with selections of human- and computer-composed pieces in the manner of Storino et al. [Storino et al. 2007]. It is also in concordance with many authors’ views that the ultimate goal of algorithmic composition should be to realise genuinely new music, rather than ‘recompose’ existing music.

5.1 Summary of Contribution

• This thesis has presented what appears to be the most comprehensive computer implementation of The Schillinger System of Musical Composition to date. Until now, no such system has been documented in the academic literature. The only alternative implementation is much narrower in scope and unpublished.

• Several extensions to and simplifications of Schillinger’s theories have been necessary in order for them to be fully implemented. This has applied particularly to Schillinger’s Theory of Melody, which has previously been dismissed as “completely obscure” by Backus [Backus 1960] and “too cumbersome for practical use” by Arden [Arden 1996]. As such, this thesis also contains the groundwork for developing a more concise and mathematically sound version of Schillinger’s theories for both composers and future researchers.

• The use of an automatic genre classification system to assess the style and musical diversity of the system’s output has shed some light on the characteristics of the automated Schillinger System. This has aided in an investigation of the claims of Schillinger and his editors to the effect that the system somehow operates independent of musical style [Schillinger 1978]; but more importantly it has given an indication of how useful the automated system may turn out to be in practical applications. This author is not aware of any previous attempt in the academic literature to measure the diversity of computer generated music using a genre classifier. The experiment is repeatable, and will provide increasingly accurate results as the field of musical information retrieval continues to mature.

• A rigorous listening survey with expert participants has been conducted to establish the intrinsic musical merit of samples from the system’s output, by presenting them as expressive human performances using a variety of instrumentation. The data collected has undergone both quantitative and qualitative analysis to precisely determine the range and strength of opinions formed by listeners. The paucity of thorough critical evaluations in the academic literature suggests that this kind of survey and analysis is rare, and could be more widely used in the future to measure the success of algorithmic composition systems.

• The results of both the classification and listening experiments strongly indicate that the automated Schillinger System’s compositions constitute a broad range of musical styles within the realms of Jazz and Western Classical music. Furthermore, the results of the listening experiment suggest that these compositions exhibit some musical merit and are generally enjoyable and interesting to listen to. Most of the 28 composers who participated in the survey also indicated a degree of interest, based on what they heard, in experimenting with the system for creative purposes. It can be concluded that the system described in this thesis represents a musically worthwhile addition to the computer-aided composition landscape.


5.2 Avenues for Future Work

The automated Schillinger System provides extensive scope for further research and development. An obvious initial task is to expand the system to incorporate as much of the content of books V–XII of the Schillinger System as possible, and to revisit some sections of books I–IV that were omitted either due to time constraints or other reasons listed in section 3.7. This will lead to a system capable of producing compositions with complete form and instrumentation, but it is likely that the types of difficulties so far encountered in adapting Schillinger’s formalism will continue.

In its current state, the implementation acts as a ‘push-button’ system without requiring human intervention during the construction of each piece. The problem with this paradigm is that the user is unable to ‘tune’ the musical surface or structural qualities to their own liking, or indeed exercise a deeper level of control to explore the individual Schillinger procedures for their own use. There are two possibilities that may address this.

The first is to retain the push-button interface, but develop a series of high-level aesthetic parameters for the user to tweak before each execution. At present, no specific aesthetic or stylistic constraints find their way into the system’s output other than those which are somehow inherent in Schillinger’s procedures, and those which are symptomatic of the constraints that were necessary to make the procedures amenable to computer implementation (such as a formal definition of Schillinger’s undefined term ‘acoustically acceptable’ — see section 3.3.4). This results in music ranging from extremely consonant to extremely dissonant, with a wide range of temporal and harmonic textures. As such, it will be necessary to find mappings from such high-level parameters to precise parameter combinations that control each section of the composition modules. The author is yet to identify combinations that constitute reliable prescriptions for particular aesthetics or styles. If such a model were successful it would engender two further practical uses — a tool for content creators to automatically produce music of a particular length and character to be rendered as audio via performance or synthesis, or a programmable plugin for applications with generative music requirements such as websites or computer games.
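Purely as a hypothetical illustration of the kind of mapping being proposed, none of the preset or parameter names below exist in the current implementation; they only stand in for the idea of expanding a high-level aesthetic choice into a precise parameter combination.

# Hypothetical sketch: preset and parameter names are invented for
# illustration and are not part of the automated Schillinger System.
AESTHETIC_PRESETS = {
    "consonant_sparse": {
        "scale_family": "flat_7_tone",
        "max_voices": 3,
        "dissonance_tolerance": 0.2,   # 0 = strict, 1 = anything goes
        "rhythm_generators": (3, 2),   # a small generator pair
    },
    "dissonant_dense": {
        "scale_family": "symmetric_large",
        "max_voices": 6,
        "dissonance_tolerance": 0.9,
        "rhythm_generators": (7, 5),
    },
}

def parameters_for(preset_name):
    # Resolve a user-facing preset to the low-level bundle that the
    # composition modules would consume.
    return AESTHETIC_PRESETS[preset_name]

print(parameters_for("consonant_sparse"))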

The second possibility is to devise a command line or graphical interface that would give users low level access to the individual composition procedures. Each procedure’s input could either be assigned to the output of another compatible procedure or generated randomly at the user’s behest. This would allow Schillinger’s individual theories to be explored by those simply interested in the Schillinger System itself, or allow for whole compositions to be built systematically with as much or as little control as desired. It would also eliminate the current reliance on procedures that the author has devised to interface the different theories. A brief example of how the terminal-based realisation of this concept might function is given in figure 5.1.

> s = (2 1 2 2 1 2)
> p = 5
> t = 4
> axis1 = (1 rhythm(t, 2) 2 (-1 false 0))
> axis2 = (2 rhythm(t, 1) 1 (-1 false 0))
> axis3 = (3 rhythm(t, 1) 1 (-1 false 0))
> axes = (axis1 axis2 axis3)
> M = superimpose(s, C4, p, t, axes)
> C = buildParams(axes, 8)
> pdf(buildMelody(s, M, C))

Figure 5.1: The potential functioning of a terminal-based interactive Schillinger System

Finally, as mentioned above in section 5.1, much work has gone into developing formal adaptations of procedures which were expressed inexactly by Schillinger. The prospect of reworking the entire Schillinger System into a significantly condensed complementary ‘handbook’ version, free of the obfuscatory verbosity and notational inconsistency lamented by [Barbour 1946], is enticing. As far as the author has been able to ascertain, no publication exists to serve this purpose. This resource would be particularly valuable to composers interested in Schillinger’s theories, as well as other developers of composition algorithms who might wish to program their own models of Schillinger’s procedures.

There is ongoing activity within the Schillinger Society [1] with the aim of encouraging a wider exploration and adoption of Schillinger’s work. This has been bolstered in recent years by online courses dedicated to the teaching of Schillinger’s methods. [2]

Moreover, the recent release of McClanahan’s four-part harmonisation program based on Schillinger’s Special Theory of Harmony, and further activity on the Schillinger CHI Project website [3], seem to indicate a recent surge of enthusiasm around possible computer implementations of the Schillinger System. Future development of the work presented in this thesis could form a significant contribution to this movement.

[1] www.schillingersociety.com
[2] See http://www.schillingersociety.com/moodle/ and http://www.ssm.uk.net/index.php
[3] http://schillinger.destinymanifestation.com/


Appendix A

Samples of Output

The system’s output, subsequent to being processed by LilyPond, consists of MIDI files and the corresponding musical notation in PDF format. This section contains the six example pieces used for the listening survey. Table 4.3 lists the instrumentation that was used to render each performance, and includes hyper-links for listening online.

A.1 Harmony #1

[Musical score: see the audio renderings linked from table 4.3.]

A.2 Harmony #2

[Musical score: see the audio renderings linked from table 4.3.]

A.3 Harmony #3

[Musical score: see the audio renderings linked from table 4.3.]

A.4 Melody #1

[Musical score: see the audio renderings linked from table 4.3.]

A.5 Melody #2

[Musical score: see the audio renderings linked from table 4.3.]

A.6 Melody #3

[Musical score: see the audio renderings linked from table 4.3.]

Appendix B

Listening Survey

The survey document that was used by participants is included for reference.


Listening Survey

You are being asked to evaluate six samples of the output of a computer-automated composition system. Answer on the basis of what you feel to be the intrinsic musical merit of each individual piece from your expert musical experience. The goal is not to compare the examples with each other, to a human composer, or to any other composition software that you may be familiar with. Your evaluation should draw on your appreciation of music and the art of composition.

Each sample will be played twice. The samples consist of three homophonic harmonies and three monophonic melodies.

For each sample you will be asked to register four opinions: your gut reaction, your evaluation of its interestingness, your evaluation of its overall musical logic, and your evaluation of how predictable it was. There is also a general section at the end of the survey with several more questions relating to the group of pieces as a whole.

Ideally your answers should be carefully considered subjective opinions. You are not expected to analyse any of the samples in terms of music theory.

Indicate your answers by marking in the appropriate circle on each scale, for example:

O––––––o––––––✓––––––o––––––O––––––o––––––O––––––o––––––O
Really dislike / Dislike / Neutral / Like / Really like

Please consider writing free-form answers to questions in the spaces provided. These can be as long or as short as you like, containing prose, keywords, etc – I want to know exactly what you are thinking.

You are allowed to leave individual answers blank if you wish, and you are free to opt out of this experiment completely if you are uncomfortable with any aspect of it.

Optional: please indicate which COMP Level (1-6) you are presently studying: ______ (Write 'N/A' if this does not apply to you)

Matt Rankin
29/03/12


Sample #1: “Harmony #1”

Gut reaction:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Really dislike / Dislike / Neutral / Like / Really like

Harmonic Interest:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Completely Uninteresting / Not Very Interesting / Neutral / Fairly Interesting / Very Interesting

Harmonic Logic:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Completely Illogical / Not Very Logical / Neutral / Fairly Logical / Very Logical

Predictability:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Too Predictable / Fairly Predictable / Balanced / Fairly Unpredictable / Too Unpredictable

What aspects intrigued you, if any:

What aspects bored you, if any:

Sample #2: “Harmony #2”

Gut reaction:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Really dislike / Dislike / Neutral / Like / Really like

Harmonic Interest:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Completely Uninteresting / Not Very Interesting / Neutral / Fairly Interesting / Very Interesting

Harmonic Logic:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Completely Illogical / Not Very Logical / Neutral / Fairly Logical / Very Logical

Predictability:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Too Predictable / Fairly Predictable / Balanced / Fairly Unpredictable / Too Unpredictable

What aspects intrigued you, if any:

What aspects bored you, if any:

Sample #3: “Harmony #3”

Gut reaction:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Really dislike / Dislike / Neutral / Like / Really like

Harmonic Interest:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Completely Uninteresting / Not Very Interesting / Neutral / Fairly Interesting / Very Interesting

Harmonic Logic:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Completely Illogical / Not Very Logical / Neutral / Fairly Logical / Very Logical

Predictability:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Too Predictable / Fairly Predictable / Balanced / Fairly Unpredictable / Too Unpredictable

What aspects intrigued you, if any:

What aspects bored you, if any:

Sample #4: “Melody #1”

Gut reaction:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Really dislike / Dislike / Neutral / Like / Really like

Melodic Interest:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Completely Uninteresting / Not Very Interesting / Neutral / Fairly Interesting / Very Interesting

Melodic Logic:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Completely Illogical / Not Very Logical / Neutral / Fairly Logical / Very Logical

Predictability:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Too Predictable / Fairly Predictable / Balanced / Fairly Unpredictable / Too Unpredictable

What aspects intrigued you, if any:

What aspects bored you, if any:

Sample #5: “Melody #2”

Gut reaction:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Really dislike / Dislike / Neutral / Like / Really like

Melodic Interest:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Completely Uninteresting / Not Very Interesting / Neutral / Fairly Interesting / Very Interesting

Melodic Logic:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Completely Illogical / Not Very Logical / Neutral / Fairly Logical / Very Logical

Predictability:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Too Predictable / Fairly Predictable / Balanced / Fairly Unpredictable / Too Unpredictable

What aspects intrigued you, if any:

What aspects bored you, if any:

Sample #6: “Melody #3”

Gut reaction:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Really dislike / Dislike / Neutral / Like / Really like

Melodic Interest:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Completely Uninteresting / Not Very Interesting / Neutral / Fairly Interesting / Very Interesting

Melodic Logic:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Completely Illogical / Not Very Logical / Neutral / Fairly Logical / Very Logical

Predictability:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Too Predictable / Fairly Predictable / Balanced / Fairly Unpredictable / Too Unpredictable

What aspects intrigued you, if any:

What aspects bored you, if any:

General Opinions

Rate the overall diversity of the material:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Very Similar / Fairly Similar / Neutral / Fairly Diverse / Very Diverse

Rate the overall interestingness of the material:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Completely Uninteresting / Not Very Interesting / Neutral / Fairly Interesting / Very Interesting

Rate the overall musical logic of the material:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Completely Illogical / Not Very Logical / Neutral / Fairly Logical / Very Logical

Rate the overall predictability of the material:

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
Too Predictable / Fairly Predictable / Balanced / Fairly Unpredictable / Too Unpredictable

How different is it to music you've heard before?

O––––––o––––––O––––––o––––––O––––––o––––––O––––––o––––––O
No Different / A Bit Different / Somewhat Different / Fairly Different / Very Different

Would you categorise it as belonging to any particular musical style or genre?

Were there any particular recurring features you found enjoyable or irritating?

Based on what you have heard today, could you imagine using this software as a compositional tool for your own purposes? (Please circle: Yes / No / Maybe)

Thanks for participating!


Appendix C

Function List

Not every function in the automated Schillinger System is included here; there are dozens more which are concerned with auxiliary and standard musical operations, as well as interfacing with Lilypond (see section 4.3). The listing is limited to those which are related specifically to the implementation of Schillinger’s methods and those which were necessary to interface the methods in a sensible fashion. Refer to chapter 3 for details, and the call graph in section 3.6 for an overview of the system’s structure. [1] The listing may also help to give some idea of the functions that would be available to the user in the proposed command-line interface mentioned in section 5.2. References back to Schillinger’s published volumes are included to aid further investigation.

C.1 Rhythmic Resultants — Book I: Ch. 2, 4, 5, 6, 12

interference_pattern
primary_resultant
secondary_resultant
tertiary_resultant
resultant_combo
algebraic_expansion
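As an indication of what the first of these functions computes, the sketch below derives a Schillinger-style resultant of interference for two generators (Book I, Ch. 2): two pulse trains are superimposed over their common span and the gaps between successive attacks form the resultant. It is an independent Python illustration, not the code of the system described in this thesis.

# Independent illustration of a resultant of interference r(a:b).
from math import lcm

def interference_pattern(a, b):
    span = lcm(a, b)
    attacks = sorted(set(range(0, span, a)) | set(range(0, span, b)) | {span})
    return [nxt - cur for cur, nxt in zip(attacks, attacks[1:])]

print(interference_pattern(3, 2))   # [2, 1, 1, 2]
print(interference_pattern(4, 3))   # [3, 1, 2, 2, 1, 3]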

C.2 Rhythmic Variations — Book I: Ch. 9, 10, 11

permutations_straight
permutations_circular
continuity_rhythmic
general_homogeneous_continuity

[1] Note that the graph in section 3.6 is a representation that has been further condensed to focus on the most important aspects of the system’s architecture. Not every function listed here is present on the diagram.


C.3 Rhythmic Grouping and Synchronisation — Book I: Ch. 3, 8

coefficient_sync
group_duration
group_attacks

C.4 Rhythmic Generators

random_resultant_from_basis
random_combo_from_basis
random_tertiary_resultant_from_basis
self_contained_rhythms
multiple_within_time_ratio
subdivide_basis
generate_rhythm
convert_basis

C.5 Scale Generation — Book II: Ch. 2, 5, 7, 8

flat_scale
flat_7_tone_scale
scale_tonal_expansion
symmetric_scale_small
symmetric_scale_large
random_scale

C.6 Scale Conversions — Book II: Ch. 5, 9

scale->pitch_scale
scale->full_pitch_scale
pitch_scale->scale
symmetric_scale->scale
symmetric_scale->pitch_scales
symmetric-scale?
extend_flat_scale
scale_translate

C.7 Harmony from Pitch Scales — Book II: Ch. 5, 9

acoustically_acceptable?
sub_chords
sub_chords_of_scale
nearest_tone_voice_leading
range
adjust_voice_register
adjust_harmony_register

C.8 Geometric Variations — Book III: Ch. 1, 2

invert_voice
invert_chord
invert_harmony
revoice_starting_chord
generate_spliced_harmony
compose_harmony
expand_voice
expand_chord
expand_harmony
contract_pitch_range

C.9 Melodic Functions — Book IV: Ch. 3, 4, 5, 6, 7

random_axis_system
generate_secondary_axes
partition_axis_system
adjust_axis_to_pitch_scale
superimpose_pitch_rhythm_on_secondary_axes
generate_continuity_parameters
build_melody
compose_melody


Bibliography

ALLAN, M. 2002. Harmonising chorales in the style of Johann Sebastian Bach. Master's thesis, School of Informatics, University of Edinburgh. (pp. 9, 14)

AMES, C. 1987. Automated composition in retrospect: 1956-1986. Leonardo 20, 2, 169–185. (pp. 3, 13, 28, 44)

AMES, C. 1989. The Markov process as a compositional model: A survey and tutorial. Leonardo 22, 2, 175–187. (pp. 13, 14, 67)

ANDERS, T. AND MIRANDA, E. R. 2011. Constraint programming systems for modeling music theories and composition. ACM Computing Surveys 43, 4 (Oct.), 30:1–30:38. (pp. 1, 17)

ARCOS, J. L., CANAMERO, D., AND LOPEZ DE MANTARAS, R. 1998. Affect-driven generation of expressive musical performances. In AAAI'98 Fall Symposium on Emotional and Intelligent (1998), pp. 1–6. AAAI Press.

ARDEN, J. 1996. Focussing the musical imagination: exploring in composition the ideas and techniques of Joseph Schillinger. PhD thesis, City University, London. (pp. 4, 96)

AUCOUTURIER, J.-J. AND PACHET, F. 2003. Representing musical genre: A state of the art. Journal of New Music Research 32. (p. 93)

BACKUS, J. 1960. Re: Pseudo-science in music. Journal of Music Theory 4, 2, 221–232. (pp. vii, 4, 31, 49, 96)

BAFFIONI, C., GUERRA, F., AND LALLI, L. T. 1981. Music and aleatory processes. In Proceedings of the 5-Tage-Kurs of the USP Mathematisierung, 1981 (Bielefeld University, 1981). (p. 14)

BARBOUR, J. M. 1946. The Schillinger System of Musical Composition by Joseph Schillinger. Notes 3, 3 (June), 274–283. (pp. 4, 31, 98)

BERTIN-MAHIEUX, T., ELLIS, D. P., WHITMAN, B., AND LAMERE, P. 2011. The million song dataset. In The 12th International Society for Music Information Retrieval Conference (2011). (p. 72)

BEYLS, P. 1990. Subsymbolic approaches to musical composition: A behavioural model. In Proceedings of the 1990 International Computer Music Conference (1990). (pp. 10, 26)

BEYLS, P. 1991. Chaos and creativity: The dynamic systems approach to musical composition. Leonardo Music Journal 1, 1, 31–36. (pp. 11, 12, 23)

BIDLACK, R. 1992. Chaotic systems as simple (but complex) compositional algorithms. Computer Music Journal 16, 3, 33–47. (p. 23)


BILES, J. A. 1994. Genjam: a genetic algorithm for generating jazz solos. In Proceedings of the 1994 International Computer Music Conference (San Francisco, 1994). International Computer Music Association. (pp. 21, 41, 67)

BILES, J. A. 2001. Autonomous GenJam: Eliminating the Fitness Bottleneck by Eliminating Fitness. In Genetic and Evolutionary Computation Conference Workshop on Non-routine Design with Evolutionary Systems (2001). (p. 21)

BILES, J. A. 2007. Evolutionary computation for musical tasks. In E. R. MIRANDA AND J. A. BILES Eds., Evolutionary computer music, Chapter 2, pp. 28–51. Springer. (pp. 12, 22)

BILES, J. A., ANDERSON, P., AND LOGGI, L. 1996. Neural network fitness functions for a musical IGA. In Proceedings of the International ICSC Symposium on Intelligent Industrial Automation (IIA'96) and Soft Computing (SOCO'96) (1996). (pp. 21, 22)

BILES, J. A. AND EIGN, W. G. 1995. Genjam populi: Training an IGA via audience-mediated performance. In Proceedings of the 1995 International Computer Music Conference, Volume 12 (1995). (pp. 10, 21)

BILOTTA, E. AND PANTANO, P. 2002. Synthetic harmonies: an approach to musical semiosis by means of cellular automata. Leonardo 35/1. (p. 25)

BILOTTA, E., PANTANO, P., AND COMUNICAZIONE, C. I. D. 2001. Artificial life music tells of complexity. In ALMMA (2001), pp. 17–28. (pp. 25, 26)

BILOTTA, E., PANTANO, P., AND TALARICO, V. 2000. Music generation through cellular automata: How to give life to strange creatures. In Proceedings of Generative Art GA (2000). (p. 25)

BISIG, D., SCHACHER, J., AND NEUKOM, N. 2011. Composing with swarm algorithms — creating interactive audio-visual pieces using flocking behaviour. In Proceedings of the International Computer Music Conference (Huddersfield, England, 2011). (pp. 26, 27)

BIYIKOGLU, K. 2003. A Markov model for chorale harmonization. In Proceedings of the 5th Triennial ESCOM Conference (Hanover University of Music and Drama, Germany, 2003). (p. 14)

BLACKWELL, T. 2007. Swarming and music. In E. R. MIRANDA AND J. A. BILES Eds., Evolutionary computer music, Chapter 9, pp. 194–217. Springer. (pp. 26, 79, 95)

BLACKWELL, T. AND BENTLEY, P. 2002. Improvised music with swarms. In Proceedings of the World Congress on Computational Intelligence, Volume 2 (Los Alamitos, CA, USA, 2002), pp. 1462–1467. IEEE Computer Society. (pp. 12, 26, 27)

BOD, R. 2001. Probabilistic grammars for music. In Belgian-Dutch Conference on Artificial Intelligence (Amsterdam, 2001). (p. 72)

BOYD, M. 2011. Review: John Luther Adams: The place where you go to listen: in search of an ecology of music. Computer Music Journal 35, 2 (June), 92–95. (p. 24)


BURTON, A. R. AND VLADIMIROVA, T. R. 1999. Generation of musical sequences with genetic techniques. Computer Music Journal 23, 4 (Dec.), 59–73. (pp. 20, 21, 22)

CAMBOUROPOULOS, E. 1994. Markov chains as an aid to computer assisted composition. Musical Praxis 1, 1. (p. 14)

CHAI, W. AND VERCOE, B. 2001. Folk music classification using hidden Markov models. In Proc. of International Conference on Artificial Intelligence (2001). (p. 72)

CHOMSKY, N. 1957. Syntactic Structures. Walter de Gruyter GmbH and Co., Berlin. (pp. 16, 17)

COATS, P. K. 1988. Why expert systems fail. Financial Management 17, 3, 77–86. (pp. 11, 18)

COHEN, J. E. 1962. Information theory and music. Behavioral Science 7, 2 (April), 137–163. (pp. 13, 14, 80)

CONNELL, N. A. D. AND POWELL, P. L. 1990. A comparison of potential applications of expert systems and decision support systems. Journal of the Operational Research Society 41, 5, 431–439. (p. 19)

COPE, D. 1987. An expert system for computer-assisted composition. Computer Music Journal 11, 4, 30–46. (p. 28)

COPE, D. 1992. Computer modeling of musical intelligence in EMI. Computer Music Journal 16, 2, 69–83. (p. 18)

COPE, D. 2005. Computer Models of Musical Creativity. MIT Press, Cambridge, Massachusetts. (pp. 10, 18, 19, 78)

DA SILVA, P. 2003. David Cope and Experiments in Musical Intelligence. (pp. 18, 69)

DANNENBERG, R. B., THOM, B., AND WATSON, D. 1997. A machine learning approach to musical style recognition. In Proceedings of the International Computer Music Conference (1997), pp. 344–347. (p. 71)

DEGAZIO, B. 1988. The Schillinger System of Musical Composition and contemporary computer music. In Proceedings of Diffusion! (Montreal, Canada, 1988). (pp. 3, 4, 58, 64, 80)

DODGE, C. 1988. ”Profile”: A musical fractal. Computer Music Journal 12, 3, 10–14. (p. 23)

DORIN, A. 2000. Boolean networks for the generation of rhythmic structure. In Proceedings of the Australasian Computer Music Conference (2000), pp. 38–45. (p. 25)

DORIN, A. 2002. Liquiprism: Generating polyrhythms with cellular automata. In Proceedings of the 2002 International Conference on Auditory Display (Kyoto, Japan, 2002). (pp. 25, 26)

DUBOIS, R. L. 2003. Applications of Generative String-substitution Systems in Computer Music. PhD thesis, Columbia University. (pp. 24, 78)


DUKE, V. 1947. Gershwin, Schillinger, and Dukelsky: Some reminiscences. The Musical Quarterly 33, 1, 102–115. (p. 1)

EBCIOGLU, K. 1988. An expert system for harmonizing four-part chorales. Computer Music Journal 12, 3, 43–51. (pp. 11, 17, 20, 28)

ECK, D. AND SCHMIDHUBER, J. 2002. Finding temporal structure in music: Blues improvisation with LSTM recurrent networks. In Neural Networks For Signal Processing XII, Proceedings of the 2002 IEEE workshop (2002), pp. 747–756. IEEE. (p. 16)

EEROLA, T. AND TOIVIAINEN, P. 2004. MIDI Toolbox: MATLAB Tools for Music Research. University of Jyvaskyla, Jyvaskyla, Finland. (p. 73)

ELSEA, P. 1995. Fuzzy logic and musical decisions. Technical report, University of California, Santa Cruz. (pp. 19, 20)

ENGELBRECHT, A. P. 2007. Computational Intelligence: An Introduction. Wiley and Sons Ltd., West Sussex. (pp. 21, 22)

GARTLAND-JONES, A. 2002. Can a genetic algorithm think like a composer? In Generative Art (2002). (pp. 10, 20, 22, 78)

GARTLAND-JONES, A. AND COPLEY, P. 2003. The suitability of genetic algorithms for musical composition. Contemporary Music Review 22, 3, 43–55. (pp. 20, 21)

GJERDINGEN, R. O. AND PERROTT, D. 2008. Scanning the dial: The rapid recognition of music genres. Journal of New Music Research 37, 2, 93–100. (p. 71)

GLASER, B. G. AND STRAUSS, A. L. 1967. The Discovery of Grounded Theory, Volume 20. Aldine. (pp. 86, 87)

GUEST, G., BUNCE, A., AND JOHNSON, L. 2006. How many interviews are enough? Field Methods 18, 1, 59–82. (p. 86)

HARLEY, J. 1995. Generative processes in algorithmic composition: Chaos and music. Leonardo 28, 3, 221–224. (pp. 10, 23)

HEDELIN, F. 2008. Formalising form: An alternative approach to algorithmic composition. Organized Sound 13, 3 (Dec.), 249–257. (p. 17)

HILD, H., FEULNER, J., AND MENZEL, W. 1991. HARMONET: A neural net for harmonizing chorales in the style of J. S. Bach. In NIPS'91 (1991), pp. 267–274. (p. 68)

HILLER, L. 1981. Composing with computers: A progress report. Computer Music Journal 5, 4, 7–21. (p. 78)

HILLER, L. AND ISAACSON, L. 1959. Experimental Music. McGraw-Hill, Westport, Connecticut. (pp. 10, 14)

HILLER, L. A. AND BAKER, R. A. 1964. Computer Cantata: A study in compositional method. Perspectives of New Music 3, 1, 62–90. (p. 10)

HINDEMITH, P. 1945. The Craft of Musical Composition, Volume 1. Associated Music Publishers, Inc., London. (pp. 28, 44)


HOLTZMAN, S. R. 1981. Using generative grammars for music composition. Computer Music Journal 5, 1, 51–64. (pp. 17, 78)

HOPGOOD, A. A. 2011. Intelligent Systems for Engineers and Scientists (Third ed.). CRC Press. (p. 20)

HORNEL, D. AND MENZEL, W. 1998. Learning musical structure and style with neural networks. Computer Music Journal 22, 4, 44–62. (pp. 15, 16, 44)

HURON, D. 2002. Music information processing using the Humdrum toolkit: Concepts, examples, and lessons. Computer Music Journal 26, 2 (July), 11–26. (p. 73)

HUSBANDS, P., COPELY, P., ELDRIDGE, A., AND MANDELIS, J. 2007. An introduction to evolutionary computing for musicians. In E. R. MIRANDA AND J. A. BILES Eds., Evolutionary Computer Music, Chapter 1, pp. 1–27. Springer. (p. 20)

JOHANSON, B. E. AND POLI, R. 1998. GP-music: An interactive genetic programming system for music generation with automated fitness raters. Technical Report CSRP-98-13 (May), University of Birmingham, School of Computer Science. (p. 68)

JOHNSON-LAIRD, P. N. 1991. Jazz improvisation: A theory at the computational level. In P. HOWELL, R. WEST, AND I. CROSS Eds., Representing Musical Structure, pp. 291–325. Academic Press. (p. 67)

KIERNAN, F. J. 2000. Score-based style recognition using artificial neural networks. In Cognition (2000). (p. 72)

KIRKE, A. AND MIRANDA, E. R. 2009. A survey of computer systems for expressive music performance. ACM Computing Surveys 42, 1 (Dec.), 3:1–3:41. (p. 69)

KOHONEN, T. 1989. A self-learning musical grammar, or “associative memory of the second kind”. In Proceedings of the 1989 International Joint Conference on Neural Networks (1989), pp. 1–5. (p. 44)

KOSINA, K. 2002. Music genre recognition. Master's thesis, University of Hagenberg. (p. 72)

LAINE, P. AND KUUSKANKARE, M. 1994. Genetic algorithms in musical style oriented generation. In Proceedings of the First IEEE Conference on Evolutionary Computation, 1994. IEEE World Congress on Computational Intelligence., Volume 2 (June 1994), pp. 858–862. (p. 22)

LAZAR, J., FENG, J., AND HOCHHEISER, H. 2010. Research Methods in Human-Computer Interaction (First ed.). John Wiley and Sons Ltd. (pp. 86, 87)

LERDAHL, F. AND JACKENDOFF, R. 1983. A Generative Theory of Tonal Music, Volume 7. MIT Press. (pp. 16, 17, 23, 80)

LINDENMAYER, A. 1968. Mathematical models for cellular interactions in development. Journal of Theoretical Biology 18, 3, 280–299. (p. 23)

MANDELBROT, B. B. 1983. The Fractal Geometry of Nature. W. H. Freeman and Company, New York. (p. 23)


MCKAY, C. 2004. Automatic genre classification of MIDI recordings. Master's thesis, McGill University. (pp. 73, 75, 76)

MCKAY, C. 2010. Automatic Music Classification with jMIR. PhD thesis, McGill University. (pp. 72, 73, 74)

MCKAY, C. AND FUJINAGA, I. 2005. The Bodhidharma system and the results of the MIREX 2005 symbolic genre classification contest. In International Conference on Music Information Retrieval (2005). (pp. 73, 75, 93)

MICKSELSEN, W. C. 1977. Hugo Riemann's Theory of Harmony. University of Nebraska Press. (p. 17)

MILLEN, D. 2004. An interactive cellular automata music application in Cocoa. In Proceedings of the 2004 International Computer Music Conference (San Francisco, 2004). (p. 25)

MINGERS, J. 1986. Expert systems-experiments with rule induction. The Journal of the Operational Research Society 37, 11, 1031–1037. (pp. 11, 12, 17, 28)

MIRANDA, E. 2001. Composing Music with Computers. Butterworth-Heinemann, Newton, MA, USA. (pp. 3, 10, 12, 18, 23, 26, 67, 80)

MIRANDA, E. R. 2003. On the music of emergent behavior: What can evolutionary computation bring to the musician? Leonardo 36, 1, 55–59. (pp. 3, 24, 25, 26, 27)

MOZER, M. C. 1994. Neural network music composition by prediction: Exploring the benefits of psychoacoustic constraints and multiscale processing. In Connection Science (1994), pp. 247–280. (pp. 15, 16)

NEUMANN, J. AND BURKS, A. 1966. Theory of self-reproduction automata. Urbana, IL, University of Illinois Press. (p. 24)

NIERHAUS, G. 2009. Algorithmic Composition: Paradigms of Automated Music Generation. Springer. (pp. 1, 3, 4, 9, 12)

PACHET, F. AND CAZALY, D. 2000. A taxonomy of musical genres. In Analysis, Volume 2 (2000), pp. 1238–1245. (p. 71)

PACHET, F. AND ROY, P. 2001. Musical harmonization with constraints: A survey. Constraints 6, 1, 7–19. (pp. 9, 17, 18)

PEARCE, M. AND WIGGINS, G. 2001. Towards a framework for the evaluation of machine compositions. In Proceedings of the AISB01 Symposium on AI and Creativity in Arts and Science. AISB (2001), pp. 22–32. (p. 68)

PEREIRA, F., GRILO, C., MACEDO, L., AND CARDOSO, A. 1997. Composing music with case-based reasoning. In Proceedings of Computational Models of Creative Cognition (Mind) (1997). (pp. 19, 68, 80)

PHON-AMNUAISUK, S. 2004. Logical representation of musical concepts (for analysis and composition tasks using computers). In SMC04 proceedings (2004). (pp. 11, 16, 28)


PHON-AMNUAISUK, S., TUSON, A., AND WIGGINS, G. 1999. Evolving musical harmonisation. In Reproduction (1999), pp. 1–9. Springer Verlag Wien. (pp. 21, 22, 68)

PINKERTON, R. C. 1956. Information theory and melody. Scientific American 194, 2, 77–86. (p. 80)

PISTON, W. 1987. Harmony (Fifth ed.). W. W. Norton and Company, Inc., New York. (pp. 4, 17, 28)

PONCE DE LEON, P. J., INESTA, J. M., AND PEREZ-SANCHO, C. 2004. A shallow description framework for musical style recognition. In Structural Syntactic and Statistical Pattern Recognition: Proceedings of the joint IAPR International Workshops, SSPR 2004 and SPR 2004 (Lisbon, Portugal, 2004), pp. 876–884.

PRUSINKIEWICZ, P. 1986. Score generation with L-systems. In Proceedings of the 1986 International Computer Music Conference (1986), pp. 455–457. (p. 23)

PUENTE, A. O., ALFONSO, R. S., AND MORENO, M. A. 2002. Automatic composition of music by means of grammatical evolution. SIGAPL APL Quote Quad 32, 4 (June), 148–155. (pp. 22, 68)

QUIST, N. 2002. Toward a reconstruction of the legacy of Joseph Schillinger. Notes 58, 4, 765–786. (pp. 1, 70)

RADER, G. M. 1974. A method for composing simple traditional music by computer. Communications ACM 17, 11 (Nov.), 631–638. (pp. 17, 95)

RADICIONI, D. AND ESPOSITO, R. 2006. Learning tonal harmony from Bach chorales. In Proceedings of the 7th International Conference on Cognitive Modelling, 2006 (2006). (p. 68)

REYNOLDS, C. W. 1987. Flocks, herds and schools: A distributed behavioral model. SIGGRAPH Computer Graphics 21, 4 (Aug.), 25–34. (p. 26)

RIBEIRO, P., PEREIRA, F. C., FERRAND, M., AND CARDOSO, A. 2001. Case-based melody generation with MuzaCazUza. In AISB'01 (2001). (p. 19)

ROADS, C. 1996. The Computer Music Tutorial. MIT Press, Cambridge, MA, USA. (p. 1)

ROADS, C. AND WIENEKE, P. 1979. Grammars as representations for music. Computer Music Journal 3, 1, 48–55. (p. 17)

ROHRMEIER, M. 2011. Towards a generative syntax of tonal harmony. Journal of Mathematics and Music 5, 1 (March), 35–53. (p. 28)

RUFER, J. 1965. Composition with Twelve Notes Related Only to One Another (Third ed.). Barrie and Rockliff, London. (pp. 41, 44)

RUPPIN, A. AND YESHURUN, H. 2006. MIDI music genre classification by invariant features. In Proceedings of the 7th International Conference on Music Information Retrieval (2006), pp. 397–399. (p. 72)

RUSSELL, S. AND NORVIG, P. 2003. Artificial Intelligence: A Modern Approach (Second ed.). Prentice Hall, New Jersey. (p. 15)

SABATER, J., ARCOS, J. L., AND DE MANTARAS, R. L. 1998. Using rules to support case-based reasoning for harmonizing melodies. In Multimodal Reasoning: Papers from the 1998 AAAI Spring Symposium (1998), pp. 147–151. (pp. 11, 18, 19)

SCARINGELLA, N., ZOIA, G., AND MLYNEK, D. 2006. Automatic genre classification of music content: a survey. Signal Processing Magazine, IEEE 23, 2, 133–141. (pp. 71, 72, 76)

SCHENKER, H. 1954. Harmony. University of Chicago Press, Chicago. (p. 17)

SCHILLINGER, J. 1976. The Mathematical Basis of the Arts. Da Capo, New York.(p. 1)

SCHILLINGER, J. 1978. The Schillinger System of Musical Composition. Da Capo, NewYork. (pp. vii, 1, 2, 70, 95, 96)

SCHOENBERG, A. 1969. Structural Functions of Harmony (Second ed.). W. W. Norton and Company, Inc. (p. 19)

SHAN, M.-K. AND KUO, F.-F. 2003. Music style mining and classification by melody. In IEICE Transactions on Information and Systems, Volume 1 (2003), pp. 1–6. IEEE. (p. 72)

SORENSEN, A. AND GARDNER, H. 2010. Programming with time: cyber-physical programming with impromptu. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA '10 (New York, NY, USA, 2010), pp. 822–834. ACM. (pp. 10, 31)

SORENSEN, A. C. AND BROWN, A. R. 2008. A computational model for the generation of orchestral music in the Germanic symphonic tradition: A progress report. In Sound : Space - The Australasian Computer Music Conference (Sydney, 2008), pp. 78–84. ACMA.

SPECTOR, L. AND ALPERN, A. 1995. Induction and recapitulation of deep musical structure. In Proceedings of International Joint Conference on Artificial Intelligence, IJCAI'95 Workshop on Music and AI (Montreal, Quebec, Canada, 20–25 August 1995). (p. 16)

STEEDMAN, M. J. 1984. A generative grammar for jazz chord sequences. Music Perception: An Interdisciplinary Journal 2, 1, 52–77. (pp. 16, 17, 18)

STORINO, M., DALMONTE, R., AND BARONI, M. 2007. An investigation on the perception of musical style. Music Perception: An Interdisciplinary Journal 24, 5 (June), 417–432. (pp. 17, 18, 68, 95)

SUPPER, M. 2001. A few remarks on algorithmic composition. Computer Music Journal 25, 1 (March), 48–53. (pp. 7, 28)

THOM, B. 2000. Artificial intelligence and real-time interactive improvisation. In AAAI-2000 Music and AI Workshop (Austin, Texas, 2000), pp. 35–39. (p. 10)

TODD, P. M. 1989. A connectionist approach to algorithmic composition. Computer Music Journal 13, 4, 27–43. (pp. 15, 16)

VOSS, R. F. AND CLARKE, J. 1978. 1/f noise in music: Music from 1/f noise. Journal of the Acoustical Society of America 63, 1, 258–263. (p. 23)

WIDMER, G. AND GOEBL, W. 2004. Computational models of expressive music performance: The state of the art. Journal of New Music Research 33, 203–216. (p. 69)

WIGGINS, G., MIRANDA, E., SMAILL, A., AND HARRIS, M. 1993. A Framework for the Evaluation of Music Representation Systems. Computer Music Journal 17, 3, 31–42. (p. 68)

WOLFRAM, S. 2002. A New Kind of Science. Wolfram Media. (pp. 24, 25)

XENAKIS, I. 1992. Formalized Music: Thought and Mathematics in Composition. Pendragon Press. (p. 13)

XU, C., MADDAGE, N., SHAO, X., CAO, F., AND TIAN, Q. 2003. Musical genre classification using support vector machines. In IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings., Volume 5 (April 2003), pp. 429–432. (p. 72)

ZADEH, L. 1965. Fuzzy sets. Information and Control 8, 338–353. (p. 20)

ZENG, X.-J. AND KEANE, J. 2005. Approximation capabilities of hierarchical fuzzy systems. IEEE Transactions on Fuzzy Systems 13, 5 (Oct.), 659–672. (p. 20)

ZICARELLI, D. 1987. M and Jam Factory. Computer Music Journal 11, 4, 13–29. (p. 10)

ZICARELLI, D. 2002. How I learned to love a program that does nothing. Computer Music Journal 26, 4 (Dec.), 44–51. (pp. 10, 19, 32)

ZIMMERMANN, D. 2001. Modelling musical structures. Constraints 6, 53–83. (p. 17)