
JOURNAL OF NEW MUSIC RESEARCH, 2017
https://doi.org/10.1080/09298215.2017.1367010

Simulating melodic and harmonic expectations for tonal cadences using probabilistic models
David R. W. Sears (a), Marcus T. Pearce (b), William E. Caplin (a) and Stephen McAdams (a)

(a) McGill University, Canada; (b) Queen Mary University of London, UK
ABSTRACT
This study examines how the mind's predictive mechanisms contribute to the perception of cadential closure during music listening. Using the Information Dynamics of Music model (or IDyOM), a finite-context (or n-gram) model that predicts the next event in a musical stimulus by acquiring knowledge through unsupervised statistical learning of sequential structure, to simulate the formation of schematic expectations, we predict the terminal melodic and harmonic events from 245 exemplars of the five most common cadence categories from the classical style. Our findings demonstrate that (1) terminal events from cadential contexts are more predictable than those from non-cadential contexts; (2) models of cadential strength advanced in contemporary cadence typologies reflect the formation of schematic expectations; and (3) a significant decrease in predictability follows the terminal note and chord events of the cadential formula.
ARTICLE HISTORY
Received 5 November 2016
Accepted 14 July 2017

KEYWORDS
Cadence; expectation; statistical learning; segmental grouping; n-gram models
1. Introduction
In the intellectual climate now prevalent, many scholars view the brain as a statistical sponge whose purpose is to predict the future (Clark, 2013). While descending a staircase, for example, even slightly misjudging the height or depth of each step could be fatal, so the brain predicts future steps by building a mental representation of the staircase, using incoming auditory, visual, haptic and proprioceptive cues to minimise potential prediction errors and update the representation in memory. Researchers sometimes call these representations schemata: "active, developing patterns" whose units are "serially organised, not simply as individual members coming one after the other, but as a unitary mass" (Bartlett, 1932, p. 201). Over the course of exposure, these schematic representations obtain greater specificity, thereby increasing our ability to navigate complex sensory environments and predict future outcomes.
Among music scholars, this view was first crystallised by Meyer (1956, 1967), with the resurgence of associationist theories in the cognitive sciences, which placed the brain's predictive mechanisms at the forefront of contemporary research in music psychology, following soon thereafter. Krumhansl (1990) has suggested, for example, that composers often exploit the brain's potential for prediction by organising events on the musical surface to reflect the kinds of statistical
CONTACT David R. W. Sears [email protected] Texas Tech University, J.T. & Margaret Talkington College of Visual & Performing Arts, Holden Hall, 1011 Boston Ave., Lubbock, TX 79409, USA
The underlying research materials for this article can be accessed at https://doi.org/10.1080/09298215.2017.1367010.
regularities that listeners will learn and remember. The tonal cadence is a case in point. As a recurrent temporal formula appearing at the ends of phrases, themes and larger sections in music of the common-practice period, the cadence provides perhaps the clearest instance of phrase-level schematic organisation in the tonal system. To be sure, cadential formulae flourished in eighteenth-century compositional practice by serving "to mark the breathing places in the music, establish the tonality, and render coherent the formal structure", thereby cementing their position "throughout the entire period of common harmonic practice" (Piston, 1962, p. 108). As a consequence, Sears (2015, 2016) has argued that cadences are learned and remembered as closing schemata, whereby the initial events of the cadence activate the corresponding schematic representation in memory, allowing listeners to form expectations for the most probable continuations in prospect. The subsequent realisation of those expectations then serves to close off both the cadence itself, and perhaps more importantly, the longer phrase-structural process that subsumes it.
There is a good deal of support for the role played by expectation and prediction in the perception of closure (Huron, 2006; Margulis, 2003; Meyer, 1956; Narmour, 1990), with scholars also sometimes suggesting that listeners possess schematic representations for cadences and other recurrent closing patterns
© 2017 Informa UK Limited, trading as Taylor & Francis Group
Downloaded by [Queen Mary University of London] at 02:52 22 November 2017

(Eberlein, 1997; Eberlein & Fricke, 1992; Gjerdingen, 1988; Meyer, 1967; Rosner & Narmour, 1992; Temperley, 2004). Yet currently very little experimental evidence justifies the links between expectancy, prediction, and the variety of cadences in tonal music or indeed, more specifically, in music of the classical style (Haydn, Mozart, and Beethoven), where the compositional significance of cadential closure is paramount (Caplin, 2004; Hepokoski & Darcy, 2006; Ratner, 1980; Rosen, 1972). This point is somewhat surprising given that the tonal cadence is the quintessential compositional device for suppressing expectations for further continuation (Margulis, 2003). The harmonic progression and melodic contrapuntal motion within the cadential formula elicit very definite expectations concerning the harmony, the melodic scale degree and the metric position of the goal event. As Huron puts it, "it is not simply the final note of the cadence that is predictable; the final note is often approached in a characteristic or formulaic manner. If cadences are truly stereotypic, then this fact should be reflected in measures of predictability" (2006, p. 154). If Huron is right, applying a probabilistic approach to the cadences from a representative corpus should allow us to examine these claims empirically.
This study applies and extends a probabilistic account of expectancy formation called the Information Dynamics of Music model (or IDyOM), a finite-context (or n-gram) model that predicts the next event in a musical stimulus by acquiring knowledge through unsupervised statistical learning of sequential structure, to examine how the formation, fulfilment, and violation of schematic expectations may contribute to the perception of cadential closure during music listening (Pearce, 2005). IDyOM is based on a class of Markov models commonly used in statistical language modelling (Manning & Schütze, 1999), the goal of which is to simulate the learning mechanisms underlying human cognition. Pearce explains,
It should be possible to design a statistical learning algorithm ... with no initial knowledge of sequential dependencies between melodic events which, given exposure to a reasonable corpus of music, would exhibit similar patterns of melodic expectation to those observed in experiments with human subjects. (Pearce, 2005, p. 152)
Unlike language models, which typically deal with unidimensional inputs, IDyOM generates predictions for multidimensional melodic sequences using the multiple viewpoints framework developed by Conklin (1988, 1990) and Conklin and Witten (1995), which is to say that Pearce's model generates predictions for viewpoints like chromatic pitch by combining predictions from a number of potential viewpoints using a set of simple heuristics to minimise model uncertainty (Pearce, Conklin, & Wiggins, 2005). In the past decade, studies have
demonstrated the degree to which IDyOM can simulate the responses of listeners in tasks involving melodic segmentation (Pearce, Müllensiefen, & Wiggins, 2010), subjective ratings of predictive uncertainty (Hansen & Pearce, 2014), subjective and psychophysiological emotional responses to expectancy violations (Egermann, Pearce, Wiggins, & McAdams, 2013), and behavioural (Omigie, Pearce, & Stewart, 2012; Pearce & Wiggins, 2006; Pearce, Ruiz, Kapasi, Wiggins, & Bhattacharya, 2010), electrophysiological (Omigie, Pearce, Williamson, & Stewart, 2013), and neural measures of melodic pitch expectations (Pearce, Ruiz, et al., 2010). And yet, the majority of these studies were limited to the simulation of melodic pitch expectations, so this investigation develops new representation schemes that also permit the probabilistic modelling of harmonic sequences in complex polyphonic textures.
To consider how IDyOM might simulate schematic expectations in cadential contexts, this study adopts a corpus-analytic approach, using the many methods of statistical inference developed in the experimental sciences to examine a few hypotheses about cadential expectancies. To that end, Section 2 provides a brief summary and discussion of the cadence concept, as well as the typology on which this study is based (Caplin, 1998, 2004), and then offers three hypotheses designed to examine the link between prediction and cadential closure. Next, Section 3 introduces the multiple viewpoints framework employed by IDyOM, and Section 4 describes the methods for estimating the conditional probability function for individual melodic or harmonic viewpoints using maximum-likelihood (ML) estimation and the prediction-by-partial-match (PPM) algorithm. We then present in Section 5 the corpus of expositions and the annotated cadence collection from Haydn's string quartets and describe Pearce's procedure for improving model performance by combining viewpoint models into a single composite prediction for each melodic or harmonic event in the sequence. Finally, Section 6 presents the results of the computational experiments, and Section 7 concludes by discussing limitations of the modelling approach and considering avenues for future research.
2. The classical cadence
Like many of the concepts in circulation in music scholarship (e.g. tonality, harmony, phrase, meter), the cadence concept has been extremely resistant to definition. To sort through the profusion of terms associated with cadence, Blombach (1987) surveyed definitions in eighty-one textbooks distributed around a median publication date of 1970. Her findings suggest that the cadence

is most frequently characterised as a time span, which consists of a conventionalised harmonic progression, and in some instances, a falling melody. In over half of the textbooks surveyed, these harmonic and melodic formulae are also classified into a compendium of cadence types, with the degree of finality associated with each type sometimes leading to comparisons with punctuation in language. However, many of these definitions also conceptualise the cadence as a point of arrival (Ratner, 1980), or time point, which marks the conclusion of an ongoing phrase-structural process, and which is often characterised as a moment of rest, quiescence, relaxation or repose. Thus a cadence is simultaneously understood as time span and time point, the former relating to its most representative (or recurrent) features (cadence as formula), the latter to the presumed boundary it precedes and engenders (cadence as ending) (Caplin, 2004).
The compendium of cadences and other conventional closing patterns associated with the classical period is enormous, but contemporary scholars typically cite only a few, which may be classified according to two fundamental types: those cadences for which the goal of the progression is tonic harmony (e.g. perfect authentic, imperfect authentic, deceptive, etc.), and those cadences for which the goal of the progression is dominant harmony (e.g. half cadences). Table 1 provides the harmonic and melodic characteristics for five of the most common cadence categories from Caplin's typology (1998, 2004). The perfect authentic cadence (PAC), which features a harmonic progression from a root-position dominant to a root-position tonic, as well as the arrival of the melody on scale degree 1̂, serves as the quintessential closing pattern not only for the high classical period (Gjerdingen, 2007), but for repertories spanning much of the history of Western music. The imperfect authentic cadence (IAC) is a melodic variant of the PAC category that replaces 1̂ with 3̂ (or, more rarely, 5̂) in the melody, and like the PAC category, typically appears at the conclusion of phrases, themes, or larger sections.
The next two categories represent cadential deviations, in that they initially promise a perfect authentic cadence, yet fundamentally deviate from the pattern's terminal events, thus failing to achieve authentic cadential closure at the expected moment Caplin calls "cadential arrival" (1998, p. 43). The deceptive cadence (DC) leaves harmonic closure somewhat open by closing with a non-tonic harmony, usually vi, but the melodic line resolves to a stable scale degree like 1̂ or 3̂, thereby providing a provisional sense of ending for the ongoing thematic process. The evaded cadence is characterised by a sudden interruption in the projected resolution of the cadential process. For example, instead of resolving to 1̂, the melody often leaps up to some other scale degree like
Table 1. The cadential types and categories, along with the harmonic and melodic characteristics and the count for each category in the cadence collection. Categories marked with an asterisk are cadential deviations.

Type  Category                     Harmony              Melody              N
I     Perfect authentic (PAC)      V–I                  1̂                  122
      Imperfect authentic (IAC)    V–I                  3̂ or 5̂            9
      Deceptive (DC)*              V–?, typically vi    1̂ or 3̂            19
      Evaded (EV)*                 V–?                  ?, typically 5̂     11
V     Half (HC)                    ?–V                  5̂, 7̂, or 2̂        84
5̂, thereby replacing the expected ending with material that clearly initiates the subsequent process. Thus, the evaded cadence projects no sense of ending whatsoever, as the events at the expected moment of cadential arrival, which should group backward by ending the preceding thematic process, instead group forward by initiating the subsequent process. Finally, the half cadence (HC) remains categorically distinct from both the authentic cadence categories and the cadential deviations, since its ultimate harmonic goal is dominant (and not tonic) harmony. The HC category also tends to be defined more flexibly than the other categories in that the terminal harmony may support any chord member in the soprano (i.e. 2̂, 5̂, or 7̂).
This study examines three claims about the link between prediction and cadential closure. First, if cadences serve as the most predictable, probabilistic, specifically envisaged formulae in all of tonal music (Huron, 2006; Meyer, 1956), we would expect terminal events from cadential contexts to be more predictable than those from non-cadential contexts even if both contexts share similar or even identical terminal events (e.g. tonic harmony in root position, 1̂ in the melody, etc.). Thus, Experiment 1 examines the hypothesis that cadences are more predictable than their non-cadential counterparts by comparing the probability estimates obtained from IDyOM for the terminal events from the PAC and HC categories (the two most prominent categories in tonal music) with those from non-cadential contexts that share identical terminal events.
Second, applications of cadence typologies like the one employed here often note the correspondence between cadential strength (or finality) on the one hand and expectedness (or predictability) on the other. Dunsby has noted, for example, that in Schoenberg's view, the experience of closure for a given cadential formula is only satisfying to the extent that it fulfils a stylistic expectation (1980, p. 125). This would suggest that the strength and specificity of our schematic expectations

formed in prospect and their subsequent realisation in retrospect contributes to the perception of cadential strength, where the most expected (i.e. probable) endings are also the most complete or closed. Sears (2015) points out that models of cadential strength advanced in contemporary cadence typologies typically fall into two categories: those that compare every cadence category to the perfect authentic cadence (Latham, 2009; Schmalfeldt, 1992), called the 1̂-schema model; and those that distinguish the PAC, IAC and HC categories from the cadential deviations because the former categories allow listeners to generate expectations as to how they might end, called the Prospective (or Genuine) Schemas model (Sears, 2015, 2016). In the 1̂-schema model, the half cadence represents the weakest cadential category; it is marked not by a deviation in the melodic and harmonic context at cadential arrival (such as the deceptive or evaded cadences), but rather by the absence of that content, resulting in the following ordering of the cadence categories based on their perceived strength: PAC > IAC > DC > EV > HC. In the Prospective Schemas model, however, the half cadence is a distinct closing schema that allows listeners to generate expectations for its terminal events, and so represents a stronger ending than the aforementioned cadential deviations, resulting in the ordering PAC > IAC > HC > DC > EV (for further details, see Sears, 2015). Experiment 2 directly compares these two models of cadential strength.
Third, a number of studies have supported the role played by predictive mechanisms in the segmentation of temporal experience (Brent, 1999; Cohen, Adams, & Heeringa, 2007; Elman, 1990; Kurby & Zacks, 2008; Pearce, Müllensiefen, et al., 2010; Peebles, 2011). In event segmentation theory (EST), for example, perceivers form working memory representations of "what is happening now", called event models, and discontinuities in the stimulus elicit prediction errors that force the perceptual system to update the model and segment activity into discrete time spans, called events (Kurby & Zacks, 2008). In the context of music, such discontinuities can take many forms: sudden changes in melody, harmony, texture, surface activity, rhythmic duration, dynamics, timbre, pitch register, and so on. What is more, when the many parameters affecting segmental grouping act together to produce closure at a particular point in a composition, cadential or otherwise, parametric congruence obtains (Meyer, 1973). Thus, Experiment 3 examines whether (1) the terminal event of a cadence, by serving as a predictable point of closure, is the most expected event in the surrounding sequence; and (2) the next event in the sequence, which initiates the subsequent musical process, is comparatively unexpected. Following EST, the hypothesis here is that unexpected events engender
prediction errors that lead the perceptual system to segment the event stream into discrete chunks (Kurby & Zacks, 2008). If the terminal events from genuine cadential contexts are highly predictable, then prediction errors for the comparatively unpredictable events that follow should force listeners to segment the preceding cadential material. For the cadential deviations, however, prediction errors should occur at, rather than following, the terminal events of the cadence.
3. Multiple viewpoints
Most natural languages consist of a finite alphabet of discrete symbols (letters), combinations of which form words, phrases, and so on. As a result, the mapping between the individual letter or word encountered in a printed text and its symbolic representation in a computer database is essentially one-to-one. Music encoding is considerably more complex. Notes, chords, phrases, and the like are characterised by a number of different features, and so regardless of the unit of meaning, digital encodings of individual events must concurrently represent multiple properties of the musical surface. To that end, many symbolic formats employ some variant of the multiple viewpoints framework first proposed by Conklin (1988, 1990) and Conklin and Witten (1995), and later extended and refined by Pearce (2005), Pearce et al. (2005), and Pearce and Wiggins (2004).
The multiple viewpoints framework accepts sequences of musical events that typically correspond to individual notes as notated in a score, but which may also include composite events like chords. Each event e consists of a set of basic attributes, and each attribute is associated with a type, τ, which specifies the properties of that attribute. The syntactic domain (or alphabet) of each type, [τ], denotes the set of all unique elements associated with that type, and each element of the syntactic domain also maps to a corresponding set of elements in the semantic domain, [[τ]]. Following Conklin, attribute types appear here in typewriter font to distinguish them from ordinary text. To represent a sequence of pitches as scale degrees derived from the twelve-tone chromatic scale, for example, the type chromatic scale degree (or csd) would consist of the syntactic set {0, 1, 2, . . . , 11} and the semantic set {1̂, ♭2̂, 2̂, . . . , 7̂}, where 0 represents 1̂, 7 represents 5̂, and so on (see Figure 1).
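As a minimal sketch of how such a syntactic mapping might be computed (the function name `csd` follows the paper's type label, but the function itself and its MIDI-number convention are our own assumptions, not code from IDyOM):

```python
def csd(cpitch: int, tonic: int) -> int:
    """Chromatic scale degree: semitones above the tonic, modulo 12.

    Maps a chromatic pitch (here, a MIDI note number) into the syntactic
    domain {0, 1, ..., 11}; 0 corresponds to scale degree 1, 7 to scale
    degree 5, and so on."""
    return (cpitch - tonic) % 12

# In C major (tonic = 60): C5 (72) -> 0 (degree 1), G4 (67) -> 7 (degree 5).
```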
Within this representation language, Conklin and Witten (1995) define several distinct classes of type, but this study examines just three: basic, derived and linked. Basic types are irreducible representations of the musical surface, which is to say that they cannot be derived from any other type. Thus, an attribute representing the sequence of pitches from the twelve-tone chromatic scale

Figure 1. Top: First violin part from Haydn's String Quartet in E, Op. 17/1, i, mm. 1–2. Bottom: Viewpoint representation.
(hereafter referred to as chromatic pitch, or cpitch) would serve as a basic type in Conklin's approach because it cannot be derived from a sequence of pitch classes, scale degrees, melodic intervals, or indeed, any other attribute. What is more, basic types represent every event in the corpus. For example, a sequence of melodic contours would not constitute a basic type because either the first or last events of the melody would receive no value. Indeed, an interesting property of the set of n basic types for any given corpus is that the Cartesian product of the domains of those types determines the event space for the corpus, denoted by ξ:
$$\xi = [\tau_1] \times [\tau_2] \times \cdots \times [\tau_n]$$
Each event consists of an n-tuple in a set of values corresponding to the set of basic types that determine the event space. ξ therefore denotes the set of all representable events in the corpus (Pearce, 2005).
As should now be clear from the examples given above, derived types like pitch class, scale degree, and melodic interval do not appear in the event space but are derived from one or more of the basic types. Thus, for every type in the encoded representation there exists a partial function, denoted by Ψτ, which maps sequences of events onto elements of type τ. The term viewpoint therefore refers to the function associated with its type, but for convenience Conklin and Pearce refer to viewpoints by the types they model.1 The function is partial because the output may be undefined for certain events in the sequence (denoted by ⊥). Again, viewpoints for attributes like melodic contour or melodic interval demonstrate this point, since either the first or last element will receive no value (i.e. it will be undefined).
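To make the notion of a partial viewpoint function concrete, here is an illustrative sketch (the name `melint` and the use of `None` to stand in for the undefined symbol ⊥ are our own assumptions):

```python
from typing import Optional, Sequence

def melint(events: Sequence[int]) -> Optional[int]:
    """A derived viewpoint as a partial function: maps a sequence of
    chromatic pitches onto the melodic interval (in semitones) between its
    final two events. The output is undefined (None, standing in for ⊥)
    when the sequence contains fewer than two events."""
    if len(events) < 2:
        return None
    return events[-1] - events[-2]
```

A one-event melody therefore receives no interval value, exactly as described above for the first element of a melodic interval viewpoint.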
Basic and derived types attempt to model the relations within attributes, but they fail to represent the relations between attributes. Prototypical utterances like cadences, for example, necessarily comprise a cluster of co-occurring features, so it is important to note that the relations between those features could be just as significant as their presence (or absence) (Gjerdingen, 1991). This is to say that the harmonic progression V–I presented in isolation does not provide sufficient grounds for the identification of a perfect authentic cadence, but the co-occurrence of that progression with 1̂ in the soprano, a six-four sonority preceding the root-position dominant, or a trill above the dominant makes such an interpretation far more likely. Linked viewpoints attempt to model correlations between these sorts of attributes by calculating the cross-product of their constituent types.

1 For basic types like cpitch, Ψτ is simply a projection function, thereby returning as output the same values it receives as input (Pearce, 2005, p. 59).
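The cross-product construction behind a linked viewpoint can be sketched as follows; both alphabets here are invented for the example:

```python
from itertools import product

# A linked viewpoint's alphabet is the cross-product of its constituent
# alphabets, so a model over the linked type can learn correlations between
# attributes -- e.g. that scale degree 1 in the soprano co-occurs with tonic
# harmony at perfect authentic cadences.
csd_domain = [0, 7, 11]            # scale degrees 1, 5, 7 (as csd values)
harmony_domain = ['I', 'V', 'vi']  # Roman-numeral chord labels

linked_domain = list(product(csd_domain, harmony_domain))
# 3 x 3 = 9 linked symbols; the PAC-defining pair is (0, 'I').
```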
4. Finite-context models
4.1. Maximum likelihood estimation
The goal of finite-context models like IDyOM is to derive from a corpus of example sequences a model that estimates the probability of event e_i given a preceding sequence of events e_1 to e_{i-1}, notated here as e_1^{i-1}. Thus, the function p(e_i | e_1^{i-1}) assumes that the identity of each event in the sequence depends only on the events that precede it. In principle, the length of the context is limited only by the length of the sequence e_1^{i-1}, but context models typically stipulate a global order bound such that the probability of the next event depends only on the previous n − 1 events, or p(e_i | e_{(i-n)+1}^{i-1}). Following the Markov assumption, the model described here is an (n − 1)th-order Markov model, but researchers also sometimes call it an n-gram model because the sequence e_{(i-n)+1}^{i} is an n-gram consisting of a context e_{(i-n)+1}^{i-1} and a single-event prediction e_i.
To estimate the conditional probability function p(e_i | e_{(i-n)+1}^{i-1}) for each event in the test sequence, IDyOM first acquires the frequency counts for a collection of such sequences from a training set. When the trained model is exposed to the test sequence, it then uses the frequency counts to estimate the probability distribution governing the identity of the next event in the sequence given the n − 1 preceding events (Pearce, 2005). In this case, IDyOM relies on maximum likelihood (ML) estimation:
$$p(e_i \mid e_{(i-n)+1}^{i-1}) = \frac{c(e_i \mid e_{(i-n)+1}^{i-1})}{\sum_{e \in A} c(e \mid e_{(i-n)+1}^{i-1})} \qquad (1)$$
The numerator represents the frequency count c for the n-gram consisting of the context e_{(i-n)+1}^{i-1} followed by e_i, and the denominator represents the sum of the frequency counts c associated with all of the possible events e in the alphabet A following the context e_{(i-n)+1}^{i-1}.
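Equation 1 can be sketched directly in Python; the function below is our own illustration (not IDyOM's implementation), counting n-grams in a toy corpus and returning the ML estimate:

```python
from collections import Counter

def ml_estimate(corpus, context, event, n):
    """Maximum-likelihood estimate of p(event | context), as in Equation 1:
    the count of the n-gram (context + event) divided by the summed counts
    of every n-gram sharing that context."""
    counts = Counter()
    for seq in corpus:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    ctx = tuple(context)[-(n - 1):]
    numerator = counts[ctx + (event,)]
    denominator = sum(c for ngram, c in counts.items() if ngram[:-1] == ctx)
    return numerator / denominator if denominator else 0.0

# After the context 'C D', this toy corpus continues with 'E' once and
# with 'C' once, so each continuation receives probability 1/2:
corpus = [['C', 'D', 'E'], ['C', 'D', 'C']]
p = ml_estimate(corpus, ['C', 'D'], 'E', n=3)  # 0.5
```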

4.2. Performance metrics
To evaluate model performance, the most common metrics derive from information-theoretic measures introduced by Shannon (1948, 1951). Returning to Equation 1, if the probability of e_i is given by the conditional probability function p(e_i | e_{(i-n)+1}^{i-1}), information content (IC) represents the minimum number of bits required to encode e_i in context (MacKay, 2003):
$$IC(e_i \mid e_{(i-n)+1}^{i-1}) = \log_2 \frac{1}{p(e_i \mid e_{(i-n)+1}^{i-1})} \qquad (2)$$
IC is inversely proportional to p and so represents the degree of contextual unexpectedness or surprise associated with e_i. Researchers often prefer to report IC over p because it has a more convenient scale (p can become vanishingly small), and since it also has a well-defined interpretation in data compression theory (Pearce, Ruiz, et al., 2010), we will prefer it in the analyses that follow.
Whereas IC represents the degree of unexpectedness associated with a particular event e_i in the sequence, Shannon entropy (H) represents the degree of contextual uncertainty associated with the probability distribution governing that outcome, where the probability estimates are independent and sum to one:
$$H(e_{(i-n)+1}^{i-1}) = \sum_{e \in A} p(e \mid e_{(i-n)+1}^{i-1})\, IC(e \mid e_{(i-n)+1}^{i-1}) \qquad (3)$$
H is computed by averaging the information content over all e in A following the context e_{(i-n)+1}^{i-1}. According to Shannon's equation, if the probability of a given outcome is 1, the probabilities for all of the remaining outcomes will be 0, and H = 0 (i.e. maximum certainty). If all of the outcomes are equally likely, however, H will be maximal (i.e. maximum uncertainty). Thus, one can assume that the best performing models will minimise uncertainty.
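Equations 2 and 3 translate to a few lines of Python (an illustrative sketch):

```python
import math

def information_content(p: float) -> float:
    """Equation 2: IC = log2(1/p), the surprise (in bits) of an outcome
    with probability p."""
    return math.log2(1.0 / p)

def entropy(distribution) -> float:
    """Equation 3: H is the expected information content over a probability
    distribution (assumed to sum to one); zero-probability outcomes
    contribute nothing to the sum."""
    return sum(p * information_content(p) for p in distribution if p > 0)

# A certain outcome yields H = 0 (maximum certainty); a uniform distribution
# over four outcomes yields the maximal H of 2 bits.
```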
In practice, we rarely know the true probability distribution of the stochastic process (Pearce & Wiggins, 2004), so it is often necessary to evaluate model performance using an alternative measure called cross entropy, denoted by H_m:
$$H_m(p_m, e_1^j) = -\frac{1}{j} \sum_{i=1}^{j} \log_2 p_m(e_i \mid e_1^{i-1}) \qquad (4)$$

Whereas H represents the average information content over all e in the alphabet A, H_m represents the average information content for the model probabilities estimated by p_m over all e in the sequence e_1^j. That is,
cross entropy provides an estimate of how uncertain a model is, on average, when predicting a given sequence of events (Manning & Schütze, 1999; Pearce & Wiggins, 2004). As a consequence, H_m is often used to evaluate the performance of context models for tasks like speech recognition, machine translation, and spelling correction because, as Brown and his co-authors put it, "models for which the cross entropy is lower lead directly to better performance" (Brown, Della Pietra, Della Pietra, Lai, & Mercer, 1992, p. 39).
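Equation 4 likewise admits a compact sketch; the function below assumes it is handed the probabilities a model assigned to each event of one sequence, in order:

```python
import math

def cross_entropy(model_probs) -> float:
    """Equation 4: the negative average log-probability (in bits) that a
    model assigns to the events of a sequence, each conditioned on its
    preceding context."""
    return -sum(math.log2(p) for p in model_probs) / len(model_probs)

# A model that assigns p = 0.5 to every event of a four-event sequence is,
# on average, 1 bit uncertain per event.
```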
4.3. Prediction by Partial Match
Because the number of potential patterns increases dramatically as the value of n increases, high-order models often suffer from the zero-frequency problem, in which n-grams encountered in the test set do not appear in the training set (Witten & Bell, 1991). To resolve this issue, IDyOM applies a data compression scheme called Prediction by Partial Match (PPM), which adjusts the ML estimate for each event in the sequence by combining (or smoothing) predictions generated at higher orders with less sparsely estimated predictions from lower orders (Cleary & Witten, 1984). Context models estimated with the PPM scheme typically use a procedure called back-off smoothing (or blending), which assigns some portion of the probability mass from each distribution to an escape probability using an escape method to accommodate predictions that do not appear in the training set. When a given event does not appear in the (n − 1)th-order distribution, PPM stores the escape probability and then iteratively backs off to lower order distributions until it predicts the event or reaches the zeroth-order distribution, at which point it transmits the probability estimate for a uniform distribution over A (i.e. where every event in the alphabet is equally likely). PPM then multiplies these probability estimates together to obtain the final (smoothed) estimate.
Unfortunately there is no sound theoretical basis for choosing the appropriate escape method (Witten & Bell, 1991), but two recent studies have demonstrated the potential of Moffat's (1990) method C to minimise model uncertainty in melodic and harmonic prediction tasks (Hedges & Wiggins, 2016; Pearce & Wiggins, 2004), so we employ that method here:
$$\gamma(e_{(i-n)+1}^{i-1}) = \frac{t(e_{(i-n)+1}^{i-1})}{\sum_{e \in A} c(e \mid e_{(i-n)+1}^{i-1}) + t(e_{(i-n)+1}^{i-1})} \qquad (5)$$
Escape method C represents the escape count t as the number of distinct symbols that follow the context e_{(i-n)+1}^{i-1}. To calculate the escape probability for events that do not appear in the training set, γ represents

the ratio of the escape count t to the sum of the frequency counts c and t for the context e_{(i-n)+1}^{i-1}. The appeal of this escape method is that it assigns greater weighting to higher-order predictions (which are more specific to the context) over lower-order predictions (which are more general) in the final probability estimate (Bunton, 1996; Pearce, 2005). Thus, Equation 1 can be revised in the following way:
$$\alpha(e_i \mid e_{(i-n)+1}^{i-1}) = \frac{c(e_i \mid e_{(i-n)+1}^{i-1})}{\sum_{e \in A} c(e \mid e_{(i-n)+1}^{i-1}) + t(e_{(i-n)+1}^{i-1})} \qquad (6)$$
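Equations 5 and 6 can be sketched as follows; the context counts below are invented for the example:

```python
def escape_probability(continuations: dict) -> float:
    """Equation 5 (escape method C): gamma = t / (sum of counts + t), where
    t is the number of distinct symbols observed after the context."""
    t = len(continuations)
    return t / (sum(continuations.values()) + t)

def alpha(event, continuations: dict) -> float:
    """Equation 6: the ML estimate discounted by the escape count t."""
    t = len(continuations)
    return continuations.get(event, 0) / (sum(continuations.values()) + t)

# Toy counts: suppose that after a dominant chord the corpus continues with
# I three times and vi once, so t = 2. The discounted estimates plus the
# escape probability then sum to one: 3/6 + 1/6 + 2/6 = 1.
counts_after_V = {'I': 3, 'vi': 1}
```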
The PPM scheme just described remains the canonical method in many context models (Cleary & Teahan, 1997), but Bunton (1997) has since provided a variant smoothing technique called mixtures that generally improves model performance, but which, following Chen and Goodman (1999), we refer to as interpolated smoothing (Pearce & Wiggins, 2004). The central idea behind interpolated smoothing is to compute a weighted combination of higher order and lower order models for every event in the sequence (regardless of whether that event features n-grams with nonzero counts), under the assumption that the addition of lower order models might generate more accurate probability estimates.2
Formally, interpolated smoothing estimates the probability function $p(e_i \mid e_{(i-n)+1}^{i-1})$ by recursively computing a weighted combination of the $(n-1)$th-order distribution with the $(n-2)$th-order distribution (Pearce, 2005; Pearce & Wiggins, 2004).
$$p\left(e_i \mid e_{(i-n)+1}^{i-1}\right) = \begin{cases} \alpha\left(e_i \mid e_{(i-n)+1}^{i-1}\right) + \gamma\left(e_{(i-n)+1}^{i-1}\right)\, p\left(e_i \mid e_{(i-n)+2}^{i-1}\right) & \text{if } e_{(i-n)+2}^{i-1} \neq \varepsilon \\[4pt] \dfrac{1}{|\mathcal{A}| + 1 - t(\varepsilon)} & \text{otherwise} \end{cases} \tag{7}$$
In the context of interpolated smoothing, it can be helpful to think of $\gamma$ as a weighting function, with $\alpha$ serving as the weighted ML estimate. Unlike the back-off smoothing procedure, which terminates at the first non-zero prediction, interpolated smoothing recursively adjusts the probability estimate for each order, regardless of whether the corresponding n-gram features a non-zero count, and then terminates with the probability estimate for $\varepsilon$, which represents a uniform distribution over $|\mathcal{A}| + 1 - t(\varepsilon)$ events (i.e. where every event in the alphabet is equally likely). Also note here that in the PPM scheme,
2 Context models like the one just described also often use a technique called exclusion, which improves the final probability estimate by reclaiming a portion of the probability mass in lower-order models that is otherwise wasted on redundant predictions (i.e. the counts for events that were predicted in the higher-order distributions do not need to be included in the calculation of the lower-order distributions).
the alphabet $\mathcal{A}$ increases by one event to accommodate the escape count $t$ but decreases by the number of events in $\mathcal{A}$ that never appear in the corpus.3
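Under stated assumptions (a `counts` table mapping each context tuple to its follower counts, method C escape counts, and no exclusion), the recursion of Equation (7) can be sketched as follows; all names are ours, and this simplifies the published scheme.

```python
def p_interp(e, context, counts, alphabet):
    """Interpolated smoothing (Equation 7): combine the weighted ML
    estimate alpha at the current order with the escape-weighted
    estimate from the next-lower order, bottoming out at a uniform
    distribution over |A| + 1 - t(epsilon) events."""
    if context is None:  # order -1: the terminating uniform distribution
        t_eps = len(counts.get((), {}))
        return 1.0 / (len(alphabet) + 1 - t_eps)
    followers = counts.get(tuple(context), {})
    c_total = sum(followers.values())
    t = len(followers)                          # escape count, method C
    shorter = context[1:] if context else None  # drop the oldest event
    lower = p_interp(e, shorter, counts, alphabet)
    if c_total == 0:                            # unseen context: defer downward
        return lower
    denom = c_total + t
    return followers.get(e, 0) / denom + (t / denom) * lower
```

With counts drawn from the toy sequence 0, 7, 0, 7, 5, 7 and a seven-symbol alphabet, the estimate for event 0 after context (7,) blends the first-order ratio 1/4 with the escape-weighted zeroth-order and uniform terms.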
4.4. Variable orders
The optimal order for context models depends on the nature of the corpus, 'which in the absence of a priori knowledge can only be determined empirically' (Pearce & Wiggins, 2004, p. 2). To resolve this issue, IDyOM employs an extension to PPM called PPM* (Cleary & Teahan, 1997), which includes contexts of variable length and thus 'eliminates the need to impose an arbitrary order bound' (Pearce & Wiggins, 2004, p. 6). In the PPM* scheme, the context length is allowed to vary for each event in the sequence, with the maximum context length selected using simple heuristics to minimise model uncertainty. Specifically, PPM* exploits the fact that the observed frequency of novel events is much lower than expected for contexts that feature exactly one prediction, called deterministic contexts. As a result, the entropy of the distributions estimated at or below deterministic contexts tends to be lower than in non-deterministic contexts. Thus, PPM* selects the shortest deterministic context to serve as the global order bound for each event in the sequence. If such a context does not exist, PPM* then selects the longest matching context.
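The deterministic-context heuristic can be sketched as below, again over a hypothetical `counts` table; this is a simplified reading of PPM*, not the full implementation.

```python
def ppm_star_context(history, counts):
    """Select the prediction context per the PPM* heuristic: among
    suffixes of `history` that appear in the counts table, return the
    shortest *deterministic* one (exactly one distinct continuation);
    failing that, return the longest matching suffix."""
    suffixes = [tuple(history[len(history) - k:])
                for k in range(len(history) + 1)]
    matching = [s for s in suffixes if s in counts]
    deterministic = [s for s in matching if len(counts[s]) == 1]
    if deterministic:
        return min(deterministic, key=len)
    return max(matching, key=len) if matching else ()
```

With the toy counts from above, the history (7, 0) ends in the deterministic context (0,) (always followed by 7), whereas (0, 7) has no deterministic suffix, so the longest match (7,) is used.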
5. Methods
5.1. The corpus
The corpus consists of symbolic representations of 50 sonata-form expositions selected from Haydn's string quartets (1771–1803). Table 2 presents the reference information, keys, time signatures and tempo markings for each movement. The corpus spans much of Haydn's mature compositional style (Opp. 17–76), with the majority of the expositions selected from first movements (28) or finales (11), and with the remainder appearing in inner movements (ii: 8; iii: 3). All movements were downloaded from the KernScores database in MIDI format.4 To ensure that each instrumental part would qualify as monophonic (a prerequisite for the analytical techniques that follow), all trills, extended string techniques, and other ornaments were removed. For events presenting extended string techniques (e.g. double or triple stops), note events in each part were retained that preserved the voice leading both within and between instrumental parts. Table 3 provides a few descriptive statistics concerning the number of note and chord events in each movement.
3 For a worked example of the PPM* method, see Sears (2016).
4 http://kern.ccarh.org/.
D. R. W. SEARS ET AL.
Table 2. Reference information (Opus number, work, movement, measures), keys (case denotes mode), time signatures and tempo markings for the exposition sections in the corpus.

Excerpt | Key | Time signature | Tempo marking
Op. 17, No. 1, i, mm. 1–43 | E | 4/4 | Moderato
Op. 17, No. 2, i, mm. 1–38 | F | 4/4 | Moderato
Op. 17, No. 3, iv, mm. 1–26 | E | 4/4 | Allegro molto
Op. 17, No. 4, i, mm. 1–53 | c | 4/4 | Moderato
Op. 17, No. 5, i, mm. 1–33 | G | 4/4 | Moderato
Op. 17, No. 6, i, mm. 1–73 | D | 6/8 | Presto
Op. 20, No. 1, iv, mm. 1–55 | E | 2/4 | Presto
Op. 20, No. 3, i, mm. 1–94 | g | 2/4 | Allegro con spirito
Op. 20, No. 3, iii, mm. 1–43 | G | 3/4 | Poco Adagio
Op. 20, No. 3, iv, mm. 1–42 | g | 4/4 | Allegro molto
Op. 20, No. 4, i, mm. 1–112 | D | 3/4 | Allegro di molto
Op. 20, No. 4, iv, mm. 1–49 | D | 4/4 | Presto scherzando
Op. 20, No. 5, i, mm. 1–48 | f | 4/4 | Allegro moderato
Op. 20, No. 6, ii, mm. 1–27 | E | cut | Adagio
Op. 33, No. 1, i, mm. 1–37 | b | 4/4 | Allegro moderato
Op. 33, No. 1, iii, mm. 1–40 | D | 6/8 | Andante
Op. 33, No. 2, i, mm. 1–32 | E | 4/4 | Allegro moderato
Op. 33, No. 3, iii, mm. 1–29 | F | 3/4 | Adagio
Op. 33, No. 4, i, mm. 1–31 | B | 4/4 | Allegro moderato
Op. 33, No. 5, i, mm. 1–95 | G | 2/4 | Vivace assai
Op. 33, No. 5, ii, mm. 1–30 | g | 4/4 | Largo
Op. 50, No. 1, i, mm. 1–60 | B | cut | Allegro
Op. 50, No. 1, iv, mm. 1–75 | B | 2/4 | Vivace
Op. 50, No. 2, i, mm. 1–106 | C | 3/4 | Vivace
Op. 50, No. 2, iv, mm. 1–86 | C | 2/4 | Vivace assai
Op. 50, No. 3, iv, mm. 1–74 | E | 2/4 | Presto
Op. 50, No. 4, i, mm. 1–64 | f | 3/4 | Allegro spirituoso
Op. 50, No. 5, i, mm. 1–65 | F | 2/4 | Allegro moderato
Op. 50, No. 5, iv, mm. 1–54 | F | 6/8 | Vivace
Op. 50, No. 6, i, mm. 1–54 | D | 4/4 | Allegro
Op. 50, No. 6, ii, mm. 1–25 | d | 6/8 | Poco Adagio
Op. 54, No. 1, i, mm. 1–47 | G | 4/4 | Allegro con brio
Op. 54, No. 1, ii, mm. 1–54 | C | 6/8 | Allegretto
Op. 54, No. 2, i, mm. 1–87 | C | 4/4 | Vivace
Op. 54, No. 3, i, mm. 1–58 | E | cut | Allegro
Op. 54, No. 3, iv, mm. 1–82 | E | 2/4 | Presto
Op. 55, No. 1, ii, mm. 1–36 | D | 2/4 | Adagio cantabile
Op. 55, No. 2, ii, mm. 1–76 | f | cut | Allegro
Op. 55, No. 3, i, mm. 1–75 | B | 3/4 | Vivace assai
Op. 64, No. 3, i, mm. 1–69 | B | 3/4 | Vivace assai
Op. 64, No. 3, iv, mm. 1–79 | B | 2/4 | Allegro con spirito
Op. 64, No. 4, i, mm. 1–38 | G | 4/4 | Allegro con brio
Op. 64, No. 4, iv, mm. 1–66 | G | 6/8 | Presto
Op. 64, No. 6, i, mm. 1–45 | E | 4/4 | Allegretto
Op. 71, No. 1, i, mm. 1–69 | B | 4/4 | Allegro
Op. 74, No. 1, i, mm. 1–54 | C | 4/4 | Allegro moderato
Op. 74, No. 1, ii, mm. 1–57 | G | 3/8 | Andantino grazioso
Op. 76, No. 2, i, mm. 1–56 | d | 4/4 | Allegro
Op. 76, No. 4, i, mm. 1–68 | B | 4/4 | Allegro con spirito
Op. 76, No. 5, ii, mm. 1–33 | F | cut | Largo. Cantabile e mesto
Table 3. Descriptive statistics for the corpus.
Instrumental part | N | M (SD) | Range

Note events
Violin 1 | 14,506 | 290 (78) | 133–442
Violin 2 | 10,653 | 213 (70) | 69–409
Viola | 9156 | 183 (63) | 79–381
Cello | 8463 | 169 (60) | 64–326

Chord events
Expansion^a | 20,290 | 406 (100) | 189–620

^a To identify chord events in polyphonic textures, full expansion duplicates overlapping note events at every unique onset time (Conklin, 2002).
To examine model predictions for the cadences in the corpus, we classified exemplars of the five cadence categories that achieve (or at least promise) cadential arrival in Caplin's cadence typology: PAC, IAC, HC, DC and EV (see Table 1). The corpus contains 270 cadences, but 15 cadences were excluded because either the cadential bass or soprano does not appear in the cello and first violin parts, respectively. Additionally, another 10 cadences were excluded because they imply more than one category (i.e. PAC–EV or DC–EV). Thus, for the analyses that follow, the cadence collection consists of 245 cadences.
Shown in the rightmost column of Table 1, the perfect authentic cadence and the half cadence represent the most prevalent categories, followed by the cadential deviations: the deceptive and evaded categories. The imperfect authentic cadence is the least common category, which perhaps reflects the late-century stylistic preference for perfect authentic cadential closure at the ends of themes and larger sections. This distribution also largely replicates previous findings for Mozart's keyboard sonatas (Rohrmeier & Neuwirth, 2011), so it may characterise the classical style in general.
5.2. Viewpoint selection
To select the appropriate viewpoints for the prediction of cadences in Haydn's string quartets, we have adopted Gjerdingen's (2007) schema-theoretic approach, which represents the core events of the cadence by the scale degrees and melodic contours of the outer voices (i.e. the two-voice framework), a coefficient representing the strength of the metric position (strong, weak), and a sonority, presented using figured-bass notation. Given the importance of melodic intervals in studies of recognition memory for melodies (Dowling, 1981), we might also add this attribute to Gjerdingen's list. However, for the majority of the encoded cadences from the cadence collection, the terminal events at the moment of cadential arrival appear in strong metric positions, and few of the cadences feature unexpected durations or inter-onset intervals at the cadential arrival, so we have excluded viewpoint models for rhythmic or metric attributes from the present investigation, concentrating instead on those viewpoints representing pitch-based (melodic or harmonic) expectations. What is more, IDyOM was designed to combine melodic predictions from two or more viewpoints by mapping the probability distributions over their respective alphabets back into distributions over a single basic viewpoint, such as the pitches of the twelve-tone chromatic scale (i.e. cpitch). Thus, for the purposes of model comparison it will also be useful to include cpitch as a baseline melodic model in the analyses that follow.
5.2.1. Note events
Four viewpoints were initially selected to represent note events in the outer parts: chromatic pitch (cpitch), melodic pitch interval (melint), melodic contour (contour), and chromatic scale degree (csd). As described previously, cpitch represents pitches as integers from 0 to 127 (in the MIDI representation, C4 is 60), and serves as the baseline model for the other melodic viewpoint models examined in this study. To derive sequences of melodic intervals, melint computes the numerical difference between adjacent events in cpitch, where ascending intervals are positive and descending
intervals are negative. The viewpoint contour then reduces the information present in melint, with all ascending intervals receiving a value of 1, all descending intervals a value of −1, and all lateral motion a value of 0. Finally, to relate cpitch to a referential tonic pitch class for every event in the corpus, we manually annotated the key, mode, modulations and pivot boundaries for each movement and then included the analysis in a separate text file to accompany the MIDI representation, both of which appear in the Supplementary materials for each movement in the corpus. Thus, every note event was associated with the viewpoints key and mode. The vector of keys assumes values in the set {0, 1, 2, ..., 11}, where 0 represents the key of C, 1 represents C♯ or D♭, and so on. Passages in the major and minor modes receive values of 0 and 1, respectively. The viewpoint csd then maps cpitch to key and reduces the resulting vector of chromatic scale degrees modulo 12, such that 0 denotes the tonic scale degree, 7 the dominant scale degree, and so on. By way of example, Figure 1 presents the viewpoint representation for the first violin part from the opening two measures of the first movement of Haydn's String Quartet in E, Op. 17/1.
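The three derived viewpoints can be sketched in a few lines. This illustration is ours (the function name is hypothetical), not the encoding pipeline used to build the corpus.

```python
def derive_viewpoints(cpitch, key):
    """Derive melint, contour, and csd from a cpitch (MIDI pitch)
    sequence and an annotated key (0 = C, 1 = C#/Db, ..., 11 = B)."""
    melint = [b - a for a, b in zip(cpitch, cpitch[1:])]  # signed intervals
    contour = [(i > 0) - (i < 0) for i in melint]         # 1, -1, or 0
    csd = [(p - key) % 12 for p in cpitch]                # 0 = tonic, 7 = dominant
    return melint, contour, csd
```

For a melody in a key three semitones above C (key = 3), the pitches 63, 67, 70, 63 map to the scale degrees 0, 4, 7, 0.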
As mentioned previously, IDyOM is capable of individually predicting any one of these viewpoints using the PPM* scheme, but it can also combine viewpoint models for note-event predictions of the same basic viewpoint (i.e. cpitch) using a weighted multiplicative combination scheme that assigns greater weights to viewpoints whose predictions are associated with lower entropy at that point in the sequence (Pearce et al., 2005). To determine the combined probability distribution for each event in the test sequence, IDyOM then computes the product of the weighted probability estimates from each viewpoint model for each possible value of the predicted viewpoint.
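One plausible reading of that combination scheme is a weighted geometric mean in which each model's weight is its relative entropy raised to a negative bias exponent, so confident (low-entropy) models dominate. The sketch and its parameter names are ours and simplify the published scheme.

```python
import math

def combine(distributions, bias=1.0):
    """Entropy-weighted multiplicative combination: `distributions` is
    a list of dicts over the same alphabet. Models whose predictions
    have lower relative entropy receive larger weights, and the
    weighted product is renormalised to sum to one."""
    weights = []
    for dist in distributions:
        h = -sum(p * math.log2(p) for p in dist.values() if p > 0)
        h_max = math.log2(len(dist))
        rel = max(h / h_max, 1e-6) if h_max > 0 else 1.0
        weights.append(rel ** -bias)
    w_sum = sum(weights)
    combined = {e: math.prod(d[e] ** (w / w_sum)
                             for d, w in zip(distributions, weights))
                for e in distributions[0]}
    z = sum(combined.values())
    return {e: p / z for e, p in combined.items()}
```

Combining a confident model ({0: 0.9, 7: 0.1}) with a maximally uncertain one ({0: 0.5, 7: 0.5}) yields a distribution pulled towards the confident model's prediction.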
Furthermore, IDyOM can automate the viewpoint selection process using a hill-climbing procedure called forward stepwise selection, which picks the combination of viewpoints that yields the richest structural representations of the musical surface and minimises model uncertainty. Given an empty set of viewpoints, the stepwise selection algorithm iteratively selects the viewpoint model additions or deletions that yield the most improvement in cross entropy, terminating when no addition or deletion yields an improvement (Pearce, 2005; Potter, Wiggins, & Pearce, 2007). To derive the optimal viewpoint system for the representation of melodic expectations, we employed stepwise selection for the following viewpoints: cpitch, melint, csd, and contour. In this case, IDyOM begins with the above set of viewpoint models, but also includes the linked viewpoints derived from that set (i.e. cpitch ⊗ melint,
cpitch ⊗ csd, cpitch ⊗ contour, melint ⊗ csd, melint ⊗ contour, csd ⊗ contour), resulting in a pool of ten individual viewpoint models from which to derive the optimal combination of viewpoints.
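The hill-climbing loop can be sketched as below, with `cross_entropy` a hypothetical stand-in for a full IDyOM evaluation of a candidate viewpoint system.

```python
def stepwise_select(candidates, cross_entropy):
    """Forward stepwise viewpoint selection: starting from the empty
    set, greedily apply whichever single addition or deletion most
    reduces cross entropy; stop when no move improves on the current
    best score."""
    current, best = frozenset(), float('inf')
    while True:
        moves = [current | {v} for v in candidates - current]
        moves += [current - {v} for v in current]
        scored = [(cross_entropy(m), m) for m in moves if m]  # skip empty set
        if not scored:
            return current, best
        score, move = min(scored, key=lambda pair: pair[0])
        if score >= best:
            return current, best
        current, best = move, score
```

With a toy scoring table in which melint alone scores 3.0 and melint plus csd scores 2.7, the procedure mirrors the two-step trajectory reported below for this corpus.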
Viewpoint selection derived the same combination of viewpoint models for the first violin and the cello. For this corpus, melint was the best-performing viewpoint model in the first step, receiving a cross entropy estimate of 3.006 in the first violin and 2.798 in the cello. In the second step, the combination of melint with the linked viewpoint csd ⊗ cpitch decreased the cross entropy estimate to 2.765 in the first violin and 2.556 in the cello. Including any of the remaining viewpoints did not improve model performance, so the stepwise selection procedure terminated with this combination of viewpoints. In Section 6, we refer to this viewpoint model as selection. What is more, the contour model received a much higher cross entropy estimate than the other viewpoint models, so we elected to exclude it from the experiments reported here. Thus, the final melodic viewpoint models selected for the present study are cpitch, melint, csd, and selection.
5.2.2. Chord events
To accommodate chord events, we have extended the multiple viewpoints framework by performing a full expansion of the symbolic encoding, which duplicates overlapping note events across the instrumental parts at every unique onset time (Conklin, 2002). This representation yielded two harmonic viewpoints: vertical interval class combination (vintcc) and chromatic scale degree combination (csdc). The viewpoint vintcc produces a sequence of chords that have analogues in figured-bass nomenclature by modelling the vertical intervals in semitones modulo 12 between the lowest instrumental part and the upper parts from cpitch. Unfortunately, however, the syntactic domain of vintcc is rather large; the domain of each vertical interval class between any two instrumental parts is {0, 1, 2, ..., 11, ⊥}, yielding 13 possible classes, so the number of combinatorial possibilities for combinations containing two, three, or four instrumental parts is 13³ − 1, or 2196 combinations.
To reduce the syntactic domain while retaining those chord combinations that approximate figured-bass symbols, Quinn (2010) assumed that the precise location and repeated appearance of a given interval in the instrumental texture are inconsequential to the identity of the combination. Adopting that approach here, we have excluded note events in the upper parts that double the lowest instrumental part at the unison or octave, allowed permutations between vertical intervals, and excluded interval repetitions. As a consequence, the first two criteria reduce the major triads ⟨4, 7, 0⟩ and ⟨7, 4, 0⟩ to ⟨4, 7, ⊥⟩, while the third criterion reduces the chords ⟨4, 4, 10⟩ and ⟨4, 10, 10⟩ to ⟨4, 10, ⊥⟩. This procedure dramatically reduces the potential domain of vintcc from 2196 to 232 unique vertical interval class combinations, though the corpus only contained 190 of the 232 possible combinations, reducing the domain yet further.
To relate each combination to an underlying tonic, the viewpoint csdc represents vertical sonorities as combinations of chromatic scale degrees that are intended to approximate Roman numerals. The viewpoint csdc includes the chromatic scale degrees derived from csd as combinations of two, three or four instrumental parts. Here, the number of possibilities increases exponentially to 13⁴ − 13, or 28,548 combinations, since the cello part is now encoded explicitly in combinations containing all four parts. Rather than treating permutable combinations as equivalent (e.g. ⟨0, 4, 7, ⊥⟩ and ⟨4, 7, 0, ⊥⟩), as was done for vintcc, it will also be useful to retain the chromatic scale degree in the lowest instrumental part in csdc and only permit permutations in the upper parts. Excluding voice doublings and permitting permutations in the upper parts reduces the potential domain of csdc to 2784, though in the corpus the domain reduced yet further to 688 distinct combinations.
Finally, a composite viewpoint was also created to represent those viewpoint models characterising pitch-based (i.e. melodic and harmonic) expectations more generally. To simulate the cognitive mechanisms underlying melodic segmentation, Pearce, Müllensiefen, et al. (2010) found it beneficial to combine viewpoint predictions for basic attributes like chromatic pitch, inter-onset interval, and offset-to-onset interval by multiplying the component probabilities, treating the overall probability of each note in the sequence as the joint probability of the individual basic attributes being predicted. Following their approach, the viewpoint model composite represents the product of the selection viewpoint model from the first violin (to represent melodic expectations) and the csdc viewpoint model (to represent harmonic expectations) at each unique onset time for which a note and chord event appear in the corpus. In this case, csdc was preferred to vintcc in the composite model because the former viewpoint explicitly encodes the chromatic scale-degree successions in the lowest instrumental part along with the relevant scale degrees from the upper parts.
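Since information content is a negative log probability, multiplying the component probabilities is equivalent to summing the component ICs; a trivial sketch (names ours):

```python
import math

def composite_ic(p_melodic, p_harmonic):
    """IC of the joint note/chord event at one onset: -log2 of the
    product of the component model probabilities, i.e. the sum of the
    component information contents, in bits."""
    return -math.log2(p_melodic * p_harmonic)
```

A melodic probability of 0.5 (1 bit) and a harmonic probability of 0.25 (2 bits) give a composite IC of 3 bits.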
5.3. Long-term vs. short-term
To improve model performance, IDyOM separately estimates and then combines two subordinate models trained on different subsets of the corpus for each viewpoint: a long-term model (LTM), which is trained on the entire corpus to simulate long-term, schematic knowledge; and a short-term model (STM), which is initially empty for each individual composition and then is trained incrementally to simulate short-term, dynamic knowledge (Pearce & Wiggins, 2012). As a result, the long-term model reflects inter-opus statistics from a large corpus of compositions, whereas the short-term model only reflects intra-opus statistics, some of which may be specific to that composition (Conklin & Witten, 1995; Pearce & Wiggins, 2004). Like the STM, the LTM may also be slightly improved by incrementally training on the composition being predicted, a configuration called LTM+. However, the STM only discards statistics when it reaches the end of the composition, so it far surpasses the supposed upper limits for short-term and working memory of around 10–12 s (Snyder, 2000), sometimes by several minutes. What is more, the STM should be irrelevant for the present purposes, since cadences exemplify the kinds of inter-opus patterns that listeners are likely to store in long-term memory. Thus, we have elected to omit the STM in the analyses that follow and only present the probability estimates from the LTM+.
5.4. Performance evaluation
Context models like IDyOM depend on a training set and a test set, but in this case the corpus must serve as both. To accommodate small corpora like this one, IDyOM employs a resampling approach called k-fold cross-validation (Dietterich, 1998), using cross entropy as a measure of performance (Conklin & Witten, 1995). The corpus is divided into k disjoint subsets containing the same number of compositions, and the LTM+ is trained k times on k − 1 subsets, each time leaving out a different subset for testing. IDyOM then computes an average of the k cross entropy values as a measure of the model's performance. Following Pearce and Wiggins (2004), we use 10-fold cross-validation for the models that follow.
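The resampling scheme can be sketched as follows, with `train_and_score` a hypothetical stand-in for training the LTM+ and averaging the per-event IC over the held-out fold.

```python
def kfold_cross_entropy(pieces, train_and_score, k=10):
    """k-fold cross-validation: partition the corpus into k disjoint
    folds, train on k-1 of them, score the held-out fold, and average
    the k cross entropy values (bits per event)."""
    folds = [pieces[i::k] for i in range(k)]
    scores = []
    for i, test_fold in enumerate(folds):
        # Training set = every fold except the held-out one.
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        scores.append(train_and_score(train, test_fold))
    return sum(scores) / k
```

With 20 pieces and k = 10, each training set contains 18 pieces and each test fold contains 2.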
6. Computational experiments
6.1. Experiment 1
The perfect authentic and half cadence categories account for 206 of the 245 cadences from the collection, so it seems reasonable that listeners with sufficient exposure to music of the classical style will form schematic expectations for the terminal events of exemplars from these two categories. What is more, if cadences are the most predictable formulae in all of tonal music, we should expect to find lower IC estimates for the terminal events from the aforementioned cadence categories compared to those from non-cadential closing contexts, even if both share similar or even identical terminal events. Thus, Experiment 1 examines the hypothesis that cadences are more predictable than their non-cadential counterparts.
6.1.1. Analysis
To compare the PAC and HC categories against non-cadential contexts exhibiting varying degrees of closure or stability, each of the viewpoints estimated by IDyOM was analysed for the terminal note events from the first violin and cello (represented by the viewpoints cpitch, melint, csd, and selection) and the terminal chord events from the entire texture (represented by the viewpoints vintcc, csdc, and composite) using a one-way analysis of variance (ANOVA) with a three-level between-groups factor called closure. To examine the IC estimates for the first (tonic) type, tonic closure consists of three levels: PAC, which consists of the IC estimates for the terminal events from the 122 exemplars of the PAC category; tonic, which consists of an equal-sized sample of events selected randomly from the corpus that appear in strong metric positions (i.e. appearing at the tactus level; see Sears, 2016) and feature tonic harmony in root position and any scale degree in the soprano; and non-tonic, which again consists of an equal-sized sample of events selected randomly from the corpus that appear in strong metric positions, but that feature any other harmony and any other scale degree in the soprano.

To examine the IC estimates for the second (dominant) type, dominant closure was designed in much the same way. HC consists of the IC estimates for the terminal events from the 84 exemplars of the HC category, while the other two levels consist of equal-sized samples of non-cadential events selected at random. Events from dominant appear in strong metric positions and feature dominant harmony in root position with any scale degree in the soprano, while events from other appear in strong metric positions but exclude events featuring tonic or dominant harmonies in root position. The assumption behind this additional exclusion criterion is that root-position tonic events are potentially more predictable than root-position dominants in half-cadential contexts (an assumption examined in greater detail in Experiment 2), so it was necessary here to provide a condition that allows us to compare the IC estimates for the terminal events in half-cadential contexts against those featuring other, presumably less stable harmonies and scale degrees.
For every between-groups factor examined in the experiments reported here, Levene's test for equality of variances revealed significant differences between groups for nearly every viewpoint model. Thus, we employ an alternative to Fisher's F ratio that is generally robust to heteroscedastic
data, called the Welch F ratio (Welch, 1951). To determine the effect size both for the Welch F ratio and for the planned comparisons described shortly, we use Cohen's (2008) recent notation of a common effect size measure called estimated ω².
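For reference, the Welch F ratio weights each group by the inverse of its variance-over-n rather than pooling variances. The sketch below is our implementation of the standard formula, not the authors' analysis code.

```python
def welch_f(groups):
    """Welch's F ratio for a one-way between-groups design with
    heterogeneous variances; `groups` is a list of observation lists."""
    k = len(groups)
    ns = [len(g) for g in groups]
    means = [sum(g) / n for g, n in zip(groups, ns)]
    variances = [sum((x - m) ** 2 for x in g) / (n - 1)
                 for g, m, n in zip(groups, means, ns)]
    w = [n / v for n, v in zip(ns, variances)]  # precision weights
    w_sum = sum(w)
    grand = sum(wi * mi for wi, mi in zip(w, means)) / w_sum
    numerator = sum(wi * (mi - grand) ** 2 for wi, mi in zip(w, means)) / (k - 1)
    lam = sum((1 - wi / w_sum) ** 2 / (n - 1) for wi, n in zip(w, ns))
    denominator = 1 + 2 * (k - 2) * lam / (k ** 2 - 1)
    return numerator / denominator
```

With equal group sizes and variances the statistic stays close to the classical F; for the toy groups [1, 2, 3], [2, 3, 4], [3, 4, 5] it equals 18/7.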
To address more specific hypotheses about the potential differences in the IC estimates for the terminal events from cadential and non-cadential contexts, each model also includes two planned comparisons that do not assume equal variances: the first to determine whether the IC estimates from the corresponding cadence category differ significantly from the two non-cadential levels (Cadences vs. Non-Cadences), and the second to determine whether the IC estimates from the corresponding cadence category differ significantly from the second (tonic or dominant) level of closure (PAC vs. Tonic or HC vs. Dominant). Because these additional tests increase the risk of committing a Type I error, we apply a Bonferroni correction to the planned comparisons.
6.1.2. Results
The top bar plots in Figure 2 display the mean IC estimates for the terminal note event in the first violin (left) and cello (right) for each level of tonic closure. Table 4 presents the omnibus statistics and planned comparisons. Beginning with the first violin, one-way ANOVAs of the IC estimates revealed a main effect for the viewpoints melint, csd, and the optimised combination selection, but the baseline viewpoint, cpitch, was not significant. Mean IC estimates also increased significantly from PAC to the non-cadential levels of tonic closure for melint, csd, and selection. Although this trend also emerged for the second planned comparison between PAC and tonic, only the melint model revealed a significant effect. Thus, the viewpoint models for the first violin demonstrated that terminal note events from cadential contexts are more predictable than those from non-cadential contexts.
For the cello, one-way ANOVAs revealed a main effect of tonic closure for every viewpoint, but the direction of the effect was reversed. Mean IC estimates decreased in every model from PAC to the non-cadential levels of tonic closure, as well as from PAC to tonic. Thus, contrary to our predictions, the terminal events in the cello from cadential contexts were actually less predictable than those from non-cadential contexts.
The bottom-left bar plot in Figure 2 displays the mean IC estimates for the terminal chord event, represented by vintcc and csdc, for each level of the between-groups factor. One-way ANOVAs again revealed a main effect of tonic closure for vintcc and csdc, with the mean IC estimates increasing from PAC to the non-cadential levels of tonic closure. The second planned comparison comparing PAC to tonic was not significant for either viewpoint model, however. Thus, for both models the terminal chord events from cadential contexts were more predictable than those from non-cadential contexts.
To represent the predictability of the harmony and melody in a single IC estimate for each note/chord event, we created a composite viewpoint that reflects the joint probability of csdc and selection_vl1. The bottom-right line plot in Figure 2 displays the mean IC estimates for the terminal composite event for each level of tonic closure. In this case, the one-way ANOVA revealed a significant main effect, with the mean IC estimates increasing from PAC to the non-cadential levels of closure, and the increase from PAC to tonic was marginally significant. As a result, composite demonstrated an ascending staircase across the levels of tonic closure, with PAC receiving the lowest IC estimates and non-tonic receiving the highest.
The top bar plots in Figure 3 display the mean IC estimates for the terminal note event in the first violin (left) and cello (right) for each level of dominant closure. Table 5 presents the omnibus statistics and planned comparisons. For the first violin, one-way ANOVAs revealed a main effect for every viewpoint model. As expected, the mean IC estimates also increased significantly from HC to the non-cadential levels of dominant closure for every model. The increase from HC to dominant was also significant for cpitch, melint, and selection, but not for csd.
For the cello, the mean IC estimates demonstrated a significant effect of dominant closure for csd, but the other viewpoint models were not significant. For csd, the mean IC estimates increased significantly from HC to the non-cadential levels. A similar trend also emerged for the second planned comparison between HC and dominant, but the effect was not significant. For the viewpoint models representing harmonic progressions, the mean IC estimates revealed main effects of dominant closure for vintcc and csdc, suggesting that the terminal note and chord events represented in the cello and the entire multi-voiced texture are more predictable in half-cadential contexts than in non-cadential contexts. The first planned comparison comparing HC with the two non-cadential levels was not significant for these viewpoint models, however. Finally, the composite viewpoint demonstrated a significant main effect of dominant closure, with the mean IC estimates increasing significantly from HC to the non-cadential levels, but not from HC to dominant.
6.1.3. Discussion
Both between-groups factors demonstrated significantly lower mean IC estimates for the terminal events from
Figure 2. Top: Bar plots of the mean information content (IC) estimated for the terminal note event in the first violin (left) and cello (right) for each level of tonic closure. Viewpoints include cpitch, melint, csd, and an optimised combination called selection, which represents melint and the linked viewpoint csd ⊗ cpitch. Bottom left: Bar plot of the mean IC estimated for the terminal vintcc and csdc for each level of tonic closure. Bottom right: Line plot of the mean IC estimated for the combination called composite, which represents the joint probability of selection_vl1 and csdc. Whiskers represent ±1 standard error.
Table 4. Analysis of variance and planned comparisons predicting the information content estimates from all viewpoint models with tonic closure.

Omnibus | PAC vs. Non-Cadence | PAC vs. Tonic
Viewpoint | df | Welch F | est. ω² | p | df | t | r | p | df | t | r | p

Note events, Violin 1
cpitch | 241.99 | 1.56 | .003 | NS | 241.58 | 1.43 | .09 | NS | 241.99 | 0.73 | .05 | NS
melint | 238.65 | 5.80 | .03 | .003 | 289.44 | 3.38 | .19 | .002 | 239.31 | 2.47 | .16 | .029
csd | 238.43 | 7.83 | .04
Figure 3. Top: Bar plots of the mean information content (IC) estimated for the terminal note event in the first violin (left) and cello (right) for each level of dominant closure. Viewpoints include cpitch, melint, csd, and an optimised combination called selection, which represents melint and the linked viewpoint csd ⊗ cpitch. Bottom left: Bar plot of the mean IC estimated for the terminal vintcc and csdc for each level of dominant closure. Bottom right: Line plot of the mean IC estimated for the combination called composite, which represents the joint probability of selection_vl1 and csdc. Whiskers represent ±1 standard error.
Table 5. Analysis of variance and planned comparisons predicting the information content estimates from all viewpoint models with dominant closure.

Omnibus | HC vs. Non-Cadence | HC vs. Dominant
Viewpoint | df | Welch F | est. ω² | p | df | t | r | p | df | t | r | p

Note events, Violin 1
cpitch | 164.25 | 4.82 | .02 | .009 | 191.26 | 3.11 | .22 | .004 | 165.44 | 2.55 | .19 | .024
melint | 159.52 | 5.89 | .03 | .003 | 227.92 | 3.42 | .22 | .002 | 146.00 | 2.99 | .24 | .007
csd | 162.09 | 8.14 | .04
Caplin's typology. We might hypothesise, for example, that the strength and specificity of our schematic expectations formed in prospect, and their subsequent realisation in retrospect, contribute to the perception of cadential strength, where the most expected (i.e. probable) endings are also the most complete or closed. From this point of view, the probabilities estimated by IDyOM might correspond with models of cadential strength advanced in contemporary cadence typologies.
6.2. Experiment 2
Recall that the cadence collection consists of exemplars from five categories in Caplin's typology: PAC, IAC, HC, DC, and EV. Sears (2015) recently classified models that estimate the closural strength of these categories into two general types: those that relate every cadential category to one essential prototype, called the 1-Schema model (Latham, 2009; Schmalfeldt, 1992); and those that distinguish the categories according to whether they allow listeners to form expectations as to how they might end, called the Prospective Schemas model (Sears, 2015) (see Section 2). Experiment 2 directly compares these two models of cadential strength.
6.2.1. Analysis
To compare the mean IC estimates for the terminal events from each cadence category, each of the viewpoints was again analysed for the terminal note events from the first violin and cello and the terminal chord events from the entire texture using a one-way ANOVA with a five-level between-groups factor called cadence category (PAC, IAC, HC, DC, and EV). To examine the potential differences in the IC estimates for the terminal events from each cadence category, each model includes two planned comparisons that do not assume equal variances, with a Bonferroni correction applied to the obtained statistics. In the first comparison, each level of cadence category was coded to represent two models of cadential strength: Prospective Schemas (PAC-IAC-HC-DC-EV) and 1-Schema (PAC-IAC-DC-EV-HC). Polynomial contrasts with linear and quadratic terms were then included to estimate the goodness of fit for each model. In what follows, we report the contrast whose linear or quadratic trend accounts for the greatest proportion of variance in the outcome variable. The second comparison examines the hypothesis that the genuine cadence categories in Caplin's typology elicit lower IC estimates on average than the cadential deviations (Genuine vs. Deviations).
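To make the contrast machinery concrete, here is a minimal sketch (not the study's actual analysis code) of a planned comparison that does not assume equal variances: a linear contrast over ordered group means with a Welch-Satterthwaite degrees-of-freedom correction. The IC values, group sizes, and the `welch_contrast` helper are invented for illustration.

```python
import numpy as np

def welch_contrast(groups, weights):
    """Planned comparison over group means using a Welch-style
    (unequal-variance) t statistic and Welch-Satterthwaite df."""
    means = np.array([np.mean(g) for g in groups])
    vars_ = np.array([np.var(g, ddof=1) for g in groups])
    ns = np.array([len(g) for g in groups], dtype=float)
    w = np.asarray(weights, dtype=float)
    estimate = np.dot(w, means)                 # contrast of group means
    se_terms = w**2 * vars_ / ns
    se = np.sqrt(se_terms.sum())
    df = se_terms.sum()**2 / (se_terms**2 / (ns - 1)).sum()
    return estimate / se, df

# Hypothetical IC values for the five cadence categories, ordered
# PAC, IAC, HC, DC, EV (toy numbers, not the study's data).
rng = np.random.default_rng(1)
groups = [rng.normal(loc, 1.0, size=n)
          for loc, n in zip([2.0, 2.5, 3.0, 3.5, 4.0], [50, 20, 40, 15, 11])]

# Linear polynomial contrast over the five ordered levels.
linear = [-2, -1, 0, 1, 2]
t, df = welch_contrast(groups, linear)
print(round(t, 2), round(df, 1))
```

A positive t here indicates an increasing linear trend across the ordered categories, which is the pattern the Prospective Schemas model predicts for the IC estimates.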
6.2.2. Results
Figure 4 displays line plots of the mean IC estimates for the terminal note event in the first violin (left) and cello
(right) for each level of cadence category. Table 6 presents the omnibus statistics and planned comparisons. For the first violin, the mean IC estimates revealed a main effect for the viewpoints cpitch, csd, and selection, but not for melint. Moreover, the best-fitting polynomial contrast revealed a positive (increasing) linear trend in the Prospective Schemas model (i.e. from the PAC to the EV categories) for every viewpoint model. The genuine cadence categories also received lower mean IC estimates than the cadential deviations in every model.
For the cello, the IC estimates also revealed a main effect of cadence category for every viewpoint model, and the Prospective Schemas model again produced the best fit, with polynomial contrasts revealing positive quadratic trends for cpitch and melint, but positive linear trends for csd and selection. The quadratic trend exhibited in the cpitch and melint models for the cello probably reflects the statistical preference for smaller melodic intervals in the corpus, resulting in lower mean IC estimates for categories that feature stepwise motion in the bass (HC and DC), and higher estimates for categories featuring large leaps (PAC, IAC, and EV). This trend was not demonstrated in the csd and selection viewpoint models, however, as the DC category received higher IC estimates relative to the other categories in these models, thereby resulting in positive linear trends. Presumably, the HC category received the lowest IC estimates on average because scale-degree successions like 4-5 are more common than successions like 5-1. And yet successions like 5-6 are also evidently less common than 5-1, hence the higher IC estimates for the DC category and the increasing linear trend, PAC-IAC-DC-EV. Finally, as expected, the genuine cadence categories received lower mean IC estimates than the cadential deviations in every model.
The left line plot in Figure 5 displays the mean IC estimates for the terminal chord event (represented by vintcc and csdc) for each level of cadence category. As before, the IC estimates revealed a main effect for vintcc and csdc, and the best-fitting polynomial contrast revealed a positive linear trend in the Prospective Schemas model for both models. The genuine cadence categories also received lower mean IC estimates than the cadential deviations for vintcc and csdc.
It is also noteworthy that the terminal events from the EV category generally received lower IC estimates than those from the DC category. Recall that evaded cadences are typically characterised not by a deviation in the harmonic progression (though such a deviation may take place), but rather by a sudden interruption in the projected resolution of the melody. In this collection, 10 of the 11 evaded cadences feature tonic harmony either in root position or in first inversion at the moment of

16 D. R. W. SEARS ET AL.
Figure 4. Line plots of the mean IC estimated for the terminal note event in the first violin (left) and cello (right) for each cadence category. Viewpoints include cpitch, melint, csd, and an optimised combination called selection, which represents melint and the linked viewpoint csd ⊗ cpitch. Whiskers represent ±1 standard error.
Table 6. Analysis of variance and planned comparisons predicting the information content estimates from all viewpoint models with cadence categories.

                      Omnibus                                  Prospective Schemas          Genuine vs. Deviations
Viewpoint    df      Welch F  est. ω²  p      Trend     df     t     r    p       df     t     r    p

Note events (Violin 1)
cpitch       32.42   3.19     .03      .026   Linear    13.47  3.40  .68  .013    17.19  3.64  .66  .006
melint       31.57   2.34     .02      NS     Linear    12.13  3.00  .65  .032    14.14  2.94  .62  .033
csd          29.95   3.02     .03      .033   Linear    18.99  3.43  .62  .008    30.51  3.17  .50  .009
selection    29.80   3.77     .04      .013   Linear    16.38  3.85  .69  .004    26.03  3.66  .58  .003

Cello
cpitch       31.18   8.86     .11      [remaining rows truncated in the source]

Figure 5. Left: Line plot of the mean IC estimated for the terminal vintcc and csdc for each cadence category. Right: Line plot of the mean IC estimated for the combination called composite, which represents the product of selectionvl1 and csdc. Whiskers represent ±1 standard error.
of cadential excerpts from Mozart's keyboard sonatas. The genuine cadence categories and the cadential deviations received the highest and lowest completion ratings, respectively, which in light of the present findings suggests that the perceived strength of the cadential ending corresponds with the strength of the schematic expectations it generates in prospect. But recall that the perception of closure also depends on the cessation of expectations following the terminal events of the cadence. That is, the strength of the potential boundary between two sequential events results in part from the increase in information content (or decrease in probability) from the first to the second event (i.e. the last event of one group to the first event of the following group). The preceding analyses examined terminal events from cadential and non-cadential contexts in isolation, so Experiment 3 considers the role played by schematic expectations in boundary perception and event segmentation by examining the time course of IC estimates surrounding the terminal events of the cadence.
6.3. Experiment 3
Experiment 3 examines two claims about the relationship between expectancy and boundary perception: (1) that the terminal event of a group is the most expected (i.e. predictable) event in the surrounding sequence; and (2) that the next event in the sequence is comparatively unexpected (i.e. unpredictable). Again, the hypothesis here is that unexpected events engender prediction errors that lead the perceptual system to segment the event stream into discrete chunks (Kurby & Zacks, 2008). The terminal events from the PAC, IAC and HC categories should be highly predictable, and prediction errors for the comparatively unpredictable events that follow should force listeners to segment the preceding cadential material. For the cadential deviations, however, prediction errors should occur at, rather than following, the terminal events of the cadence.
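The segmentation logic described above can be sketched as a simple threshold rule on an IC profile: flag a boundary wherever information content jumps sharply from one event to the next. The IC values and the threshold below are hypothetical toy numbers, not IDyOM output.

```python
# Hypothetical IC profile around a cadential terminus (toy values):
# IC falls to a minimum at the terminal event, then spikes at the
# onset of the next group.
ic = [3.1, 2.8, 2.4, 1.1, 4.6, 3.9, 3.5]

def boundary_after(ic, threshold=1.5):
    """Return indices i such that a group boundary is inferred
    between events i and i+1, i.e. wherever IC rises by at least
    the threshold from one event to the next."""
    return [i for i in range(len(ic) - 1)
            if ic[i + 1] - ic[i] >= threshold]

print(boundary_after(ic))  # the boundary follows the low-IC terminal event
```

In this toy profile the only qualifying rise follows the minimum-IC event, mirroring the hypothesis that the terminal event is maximally predictable and the event after it is comparatively unpredictable.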
6.3.1. Analysis
In addition to the between-groups factor of cadence category, Experiment 3 includes a between-groups factor of time that consists of three levels: et, which represents the terminal event of the group, and et-1 and et+1, which represent the immediately surrounding events. With more complex designs like this one, the number of significance tests can become prohibitively large, so we have restricted the investigation to the four viewpoints that serve as reasonable approximations of the two-voice framework characterising the classical cadence: selectionvl1 and selectionvc to represent the soprano and bass, respectively, and vintcc and csdc to each represent the entire texture. Experiment 3 analyses these viewpoints using a 5 × 3 two-way ANOVA with between-groups factors of cadence category (PAC, IAC, HC, DC, and EV) and time (et-1, et, et+1).
By moving from one to two between-groups factors, the number of omnibus statistics and planned comparisons necessarily increases, and since Levene's test also revealed heteroscedastic groups for all four of the 5 × 3 viewpoint ANOVAs, the risk of committing a Type I error is considerably greater here than in the previous experiments. In this case, the two hypotheses mentioned above concern the interaction between cadence category and time: namely, whether the IC estimates for each cadence category increase or decrease significantly from one event to the next. Thus, the following analysis ignores the main effects and concentrates only on the interaction term of the two-way ANOVA. If the interaction is significant, we report simple main effects, which represent one-way ANOVAs with time as a factor for each level of cadence category. Finally, to examine the potential differences in the IC estimates for the levels of time for each cadence category, each simple main effect included two planned comparisons with Bonferroni correction that do not assume equal variances: (1) whether the IC estimate for et is lower on average than the surrounding events, et-1 and et+1 (et vs. Surrounding); and (2) whether the IC

18 D. R. W. SEARS ET AL.
Figure 6. Time course of the mean IC estimated for the events surrounding the terminal note event in the first violin (top) and cello (bottom) for each cadence category using the viewpoint selection, which represents melint and the linked viewpoint csd ⊗ cpitch. The statistical analysis pertains to event numbers −1, 0 (or Terminus), and 1. Whiskers represent ±1 standard error.
estimate for et+1 is higher on average than the estimate for et (et vs. et+1).
6.3.2. Results
Figure 6 displays line plots of the mean IC estimates for the note events over time in the first violin (top) and cello (bottom) for each level of cadence category. Table 7 presents the omnibus statistics for the simple main effects and planned comparisons. To gain a more global picture of the IC time course, the line plots present the mean IC estimates for the seven-event sequence surrounding the terminal event of each cadence category, but the ANOVA models only consider the three events, −1, 0 (or Terminus), and 1. For the first violin, a two-way ANOVA of the mean IC estimates revealed a significant interaction between cadence category and time, F(8, 718) = 3.88, p < .001, est. ω² = .03. The mean IC estimates for each level of cadence category revealed simple main effects for PAC and HC, but the remaining categories were not significant.
Despite the non-significant simple main effects for the IAC, DC, and EV categories, simple planned comparisons revealed significant trends over time for every cadence category. As expected, the terminal event in the first violin received lower IC estimates on average than the immediately surrounding events for the genuine cadence categories and the DC category, though the latter trend was marginal. Thus, for cadences featuring melodies that resolve to presumably stable scale degrees like 1 or 3, the terminal event of the group is also the most predictable event in the sequence.
For the PAC, IAC, HC, and DC categories, the mean IC estimates increased significantly from et to et+1, thereby supporting the view that the strength of the perceptual boundary depends on the increase in information content following the terminal event of the cadence. And yet since the EV category replaces the expected terminal event in the melody with material that clearly initiates the subsequent process, often by leaping up to an unexpected scale degree like 5, one might therefore predict that a significant increase in information content should occur at (and not following) the expected terminal event of the group. This is exactly what we observe, with the expected terminal events from the EV category receiving the highest mean IC estimate in the sequence (see Table 7). Thus, the pattern of results from selectionvl1 is entirely consistent with the two main hypotheses: (1) the terminal event of a group is the most predictable event in the sequence, and (2) the next event is comparatively unpredictable. Here, the mean IC estimates for the first violin increased significantly following the predicted boundary for every cadence category in the collection.
For the cello, a two-way ANOVA of the mean IC estimates revealed a significant interaction between cadence category and time, F(8, 717) = 13.02, p < .001, est. ω² = .12. Excepting IAC, the mean IC estimates also revealed simple main effects for every level of cadence category. As expected, the terminal event in the cello received lower IC estimates on average than the immediately surrounding events for the HC category, but the trend was reversed for the PAC, DC and EV categories, and the trend for the IAC category was not significant.
For the HC category at least, the terminal event was also the most predictable event in the sequence. Furthermore, the significant increase in information content in the cello at the expected terminal event in the DC and EV categories is consistent with the behaviour of cadential deviations. For the former category, the bass typically resolves deceptively to scale degrees like 6, thereby violating expectations for 1, whereas the latter category evades the expected resolution by leaping to other scale degrees to support harmonies like I6. The significant increase in information content for the terminal event of the PAC category is somewhat more surprising, however. Recall from Experiment 2 that the mean IC estimates for the terminal events from each cadence category in the cello demonstrated a positive quadratic trend, with the HC category receiving the lowest IC estimates (see Figure 4). In that case, we suggested that small melodic intervals appear more abundantly in the corpus than large intervals, resulting in higher IC estimates for categories featuring large leaps (PAC, IAC and EV). From this point of view, it seems reasonable that the mean IC estimates for the cello would increase at et for categories featuring

Table 7. Analysis of variance and planned comparisons predicting the information content estimates from selectionvl1, selectionvc, and vintcc over time for each cadence category.

                        Omnibus                          et vs. Surrounding            et vs. et+1
Viewpoint     df       Welch F   est. ω²   p        df      t     r    p        df      t     r    p

Note events (selectionvl1)
PAC           236.76   67.63     .35       [remaining rows truncated in the source]

Figure 7. Time course of the mean IC estimated for the events surrounding the terminal chord event for vintcc (top) and csdc (bottom) for each cadence category. The statistical analysis pertains to event numbers −1, 0 (or Terminus), and 1. Whiskers represent ±1 standard error.
event in the sequence; and (2) the next event is comparatively unexpected (i.e. unpredictable).
7. Conclusions
This study examined three claims about the relationship between expectancy and cadential closure: (1) terminal events from cadential contexts are more predictable than those from non-cadential contexts; (2) models of cadential strength advanced in cadence typologies like the one employed here reflect the formation, violation, and fulfilment of schematic expectations; and (3) a significant decrease in predictability follows the terminal note and chord events of the cadential process. To that end, we created a corpus of Haydn string quartets to serve as a proxy for the musical experiences of listeners situated in the classical style, selected a number of viewpoints to represent suitable (i.e. cognitively plausible) representations of the musical surface, and then employed IDyOM, an n-gram model that predicts the next note or chord event in a musical stimulus through unsupervised learning of sequential structure, to simulate the formation of schematic expectations during music listening.
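As a rough, self-contained illustration of the modelling approach, the sketch below trains a first-order (bigram) model on toy scale-degree sequences and computes the information content IC(e | c) = −log2 P(e | c) with add-alpha smoothing. It is a minimal stand-in rather than IDyOM itself, which combines long- and short-term models over multiple viewpoints; the corpus and helper names here are invented.

```python
import math
from collections import defaultdict

def train_bigram(sequences):
    """Count bigram continuations over the training sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts

def information_content(counts, context, event, alpha=1.0):
    """IC(e | c) = -log2 P(e | c), with add-alpha smoothing so that
    unseen continuations receive a non-zero probability."""
    vocab = {e for cont in counts.values() for e in cont} | set(counts)
    cont = counts[context]
    total = sum(cont.values()) + alpha * len(vocab)
    return -math.log2((cont[event] + alpha) / total)

# Toy scale-degree sequences (not the study's corpus).
corpus = [[1, 2, 3, 4, 5, 1], [5, 1, 2, 3, 2, 1], [1, 7, 1, 2, 1]]
model = train_bigram(corpus)

ic_common = information_content(model, 2, 3)  # 2 -> 3 occurs twice
ic_rare = information_content(model, 2, 5)    # 2 -> 5 never occurs
print(ic_common < ic_rare)  # common continuations cost fewer bits
```

Frequently attested continuations receive low IC (high probability), while unattested ones receive high IC, which is the sense in which predictable cadential endings are "cheap" for the model.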
The findings from Experiment 1 indicate that the terminal note and chord events from perfect authentic cadences are more predictable than (1) non-cadential events featuring tonic harmony in root position and supporting any scale degree in the soprano, and (2) non-cadential events featuring any other harmony and any other scale degree in the soprano. For half cadences, significant effects were limited to the chord models (vintcc, csdc, and composite) and the csd viewpoint model, but the terminal events from half-cadential contexts were
still more predictable than those from non-cadential contexts. Experiment 2 provided strong evidence in support of the Prospective Schemas model of cadential strength (PAC-IAC-HC-DC-EV), with the genuine cadence categories (PAC, IAC, HC) and cadential deviations (DC, EV) in Caplin's typology eliciting the lowest and highest IC estimates on average, respectively. Finally, the results from Experiment 3 indicated that unexpected events, like those directly following the terminal note and chord events from genuine cadences, engender prediction errors that presumably lead the perceptual system to segment the event stream immediately following the cadential process.
Taken together, the reported findings support the role of expectancy in models of cadential closure, with the most complete or closed cadences also serving as the most expected or probable. Nevertheless, future studies will need to address a number of limitations in the current investigation. First, the rather meager sample size for three of the five cadence categories in the collection, as well as the corpus more generally, casts some doubt upon the generalisability of the reported findings. That the estimates from IDyOM correspond so well with theoretical predictions suggests that these findings may be robust to issues of sample size, but future studies should look to expand the collection considerably, as well as to consider how the relationship between expectancy and cadential closure varies for other genres and style periods.
Second, we selected individual viewpoints if existing theoretical or experimental evidence justified their inclusion, such as melint and csd in melodic contexts (Dowling, 1981; Krumhansl, 1990), and vintcc and csdc in harmonic contexts (Gjerdingen, 2007). Yet in a few instances, cpitch, which was only included among the melodic viewpoints to serve as a baseline for model comparison, produced similar results (see Experiment 2). One reason for this finding is that many of the viewpoints characterising melodic organisation in Haydn's string quartets systematically covary, such that statistical regularities governing the more cognitively plausible viewpoints (e.g. melint and csd) also appear in the less plausible ones (cpitch), albeit more weakly. In a melody composed in the key of C major, for example, B presumably functions as the leading tone, not because the twelve-tone chromatic universe regularly features this two-note sequence regardless of the particular tonal context, but because C typically follows B in the key of C major, forming one of the many statistical associations characterising the tonal system. Nevertheless, the tendency for small melodic intervals and the prevalence of certain keys in tonal music, wherein B is more likely to progress to C than to, say, A, ensures that IDyOM will learn statistical regularities in basic viewpoints like cpitch that are correlated with those found in other melodic viewpoints like melint or csd.
What is more, rather than assume that listeners expect specific intervals in a melodic sequence, as is IDyOM's approach using melint, existing models of melodic expectation typically theorise that listeners expect smaller melodic intervals regardless of the preceding context, a principle known as pitch proximity (e.g. Cuddy & Lunney, 1995; Margulis, 2005; Narmour, 1990; Schellenberg, 1997). Yet in this study, we only assume that listeners form expectations on the basis of statistical regularities among melodic intervals in a sequence. Vos and Troost (1989) have demonstrated, for example, that small intervals are far more common than large intervals in Western tonal music, so it should not be surprising that IDyOM produces higher probability estimates for smaller intervals, just as proximity-based models do. The difference between these two approaches is thus theoretical, rather than empirical, in that IDyOM bases its predictions on a theory of implicit statistical learning, whereas proximity-based models also sometimes base their assumptions on other (sensory or psychoacoustic) mechanisms. It is certainly possible that these mechanisms influence the preference for smaller over larger intervals in Western tonal music, or indeed the formation of expectations during music listening more generally, but we do not examine this assumption here.
Perhaps more importantly, the cross-entropy estimates for the melodic models in this study indicate that IDyOM was more certain about its predictions for more cognitively plausible viewpoints like melint and csd than for less plausible ones like cpitch, thereby reinforcing the view that representations of the musical surface are cognitively plausible to the degree that they minimise prediction errors for future events (Pearce & Wiggins, 2012). Indeed, if prediction is the primary function of the brain (Hawkins & Blakeslee, 2004, p. 89), and listeners learn to compress information during processing by only retaining representations of the musical surface that minimise uncertainty (Pearce & Wiggins, 2012),