JOURNAL OF NEW MUSIC RESEARCH, 2017
https://doi.org/10.1080/09298215.2017.1367010

Simulating melodic and harmonic expectations for tonal cadences using probabilistic models

David R. W. Sears a, Marcus T. Pearce b, William E. Caplin a and Stephen McAdams a
a McGill University, Canada; b Queen Mary University of London, UK

ABSTRACT
This study examines how the mind's predictive mechanisms contribute to the perception of cadential closure during music listening. Using the Information Dynamics of Music model (or IDyOM) to simulate the formation of schematic expectations—a finite-context (or n-gram) model that predicts the next event in a musical stimulus by acquiring knowledge through unsupervised statistical learning of sequential structure—we predict the terminal melodic and harmonic events from 245 exemplars of the five most common cadence categories from the classical style. Our findings demonstrate that (1) terminal events from cadential contexts are more predictable than those from non-cadential contexts; (2) models of cadential strength advanced in contemporary cadence typologies reflect the formation of schematic expectations; and (3) a significant decrease in predictability follows the terminal note and chord events of the cadential formula.

ARTICLE HISTORY
Received 5 November 2016; Accepted 14 July 2017

KEYWORDS
Cadence; expectation; statistical learning; segmental grouping; n-gram models

CONTACT David R. W. Sears [email protected]; Texas Tech University, J.T. & Margaret Talkington College of Visual & Performing Arts, Holden Hall, 1011 Boston Ave., Lubbock, TX 79409, USA. The underlying research materials for this article can be accessed at https://doi.org/10.1080/09298215.2017.1367010.

© 2017 Informa UK Limited, trading as Taylor & Francis Group

1. Introduction

In the intellectual climate now prevalent, many scholars view the brain as a 'statistical sponge' whose purpose is to predict the future (Clark, 2013). While descending a staircase, for example, even slightly misjudging the height or depth of each step could be fatal, so the brain predicts future steps by building a mental representation of the staircase, using incoming auditory, visual, haptic and proprioceptive cues to minimise potential prediction errors and update the representation in memory. Researchers sometimes call these representations schemata—'active, developing patterns' whose units are serially organised, not simply as individual members coming one after the other, but as a unitary mass (Bartlett, 1932, p. 201). Over the course of exposure, these schematic representations obtain greater specificity, thereby increasing our ability to navigate complex sensory environments and predict future outcomes.

Among music scholars, this view was first crystallised by Meyer (1956, 1967), with the resurgence of associationist theories in the cognitive sciences—which placed the brain's predictive mechanisms at the forefront of contemporary research in music psychology—following soon thereafter. Krumhansl (1990) has suggested, for example, that composers often exploit the brain's potential for prediction by organising events on the musical surface to reflect the kinds of statistical regularities that listeners will learn and remember. The tonal cadence is a case in point. As a recurrent temporal formula appearing at the ends of phrases, themes and larger sections in music of the common-practice period, the cadence provides perhaps the clearest instance of phrase-level schematic organisation in the tonal system. To be sure, cadential formulæ flourished in eighteenth-century compositional practice by serving to 'mark the breathing places in the music, establish the tonality, and render coherent the formal structure', thereby cementing their position 'throughout the entire period of common harmonic practice' (Piston, 1962, p. 108). As a consequence, Sears (2015, 2016) has argued that cadences are learned and remembered as closing schemata, whereby the initial events of the cadence activate the corresponding schematic representation in memory, allowing listeners to form expectations for the most probable continuations in prospect. The subsequent realisation of those expectations then serves to close off both the cadence itself, and perhaps more importantly, the longer phrase-structural process that subsumes it.

There is a good deal of support for the role played by expectation and prediction in the perception of closure (Huron, 2006; Margulis, 2003; Meyer, 1956; Narmour, 1990), with scholars also sometimes suggesting that listeners possess schematic representations for cadences and other recurrent closing patterns



(Eberlein, 1997; Eberlein & Fricke, 1992; Gjerdingen, 1988; Meyer, 1967; Rosner & Narmour, 1992; Temperley, 2004). Yet currently very little experimental evidence justifies the links between expectancy, prediction, and the variety of cadences in tonal music or indeed, more specifically, in music of the classical style (Haydn, Mozart, and Beethoven), where the compositional significance of cadential closure is paramount (Caplin, 2004; Hepokoski & Darcy, 2006; Ratner, 1980; Rosen, 1972). This point is somewhat surprising given that the tonal cadence is the quintessential compositional device for suppressing expectations for further continuation (Margulis, 2003). The harmonic progression and melodic contrapuntal motion within the cadential formula elicit very definite expectations concerning the harmony, the melodic scale degree and the metric position of the goal event. As Huron puts it, 'it is not simply the final note of the cadence that is predictable; the final note is often approached in a characteristic or formulaic manner. If cadences are truly stereotypic, then this fact should be reflected in measures of predictability' (2006, p. 154). If Huron is right, applying a probabilistic approach to the cadences from a representative corpus should allow us to examine these claims empirically.

This study applies and extends a probabilistic account of expectancy formation called the Information Dynamics of Music model (or IDyOM)—a finite-context (or n-gram) model that predicts the next event in a musical stimulus by acquiring knowledge through unsupervised statistical learning of sequential structure—to examine how the formation, fulfilment, and violation of schematic expectations may contribute to the perception of cadential closure during music listening (Pearce, 2005). IDyOM is based on a class of Markov models commonly used in statistical language modelling (Manning & Schütze, 1999), the goal of which is to simulate the learning mechanisms underlying human cognition. Pearce explains,

    It should be possible to design a statistical learning algorithm ... with no initial knowledge of sequential dependencies between melodic events which, given exposure to a reasonable corpus of music, would exhibit similar patterns of melodic expectation to those observed in experiments with human subjects. (Pearce, 2005, p. 152)

Unlike language models, which typically deal with unidimensional inputs, IDyOM generates predictions for multidimensional melodic sequences using the multiple viewpoints framework developed by Conklin (1988, 1990) and Conklin and Witten (1995), which is to say that Pearce's model generates predictions for viewpoints like chromatic pitch by combining predictions from a number of potential viewpoints using a set of simple heuristics to minimise model uncertainty (Pearce, Conklin, & Wiggins, 2005). In the past decade, studies have demonstrated the degree to which IDyOM can simulate the responses of listeners in tasks involving melodic segmentation (Pearce, Müllensiefen, & Wiggins, 2010), subjective ratings of predictive uncertainty (Hansen & Pearce, 2014), subjective and psychophysiological emotional responses to expectancy violations (Egermann, Pearce, Wiggins, & McAdams, 2013), and behavioural (Omigie, Pearce, & Stewart, 2012; Pearce & Wiggins, 2006; Pearce, Ruiz, Kapasi, Wiggins, & Bhattacharya, 2010), electrophysiological (Omigie, Pearce, Williamson, & Stewart, 2013), and neural measures of melodic pitch expectations (Pearce, Ruiz, et al., 2010). And yet, the majority of these studies were limited to the simulation of melodic pitch expectations, so this investigation develops new representation schemes that also permit the probabilistic modelling of harmonic sequences in complex polyphonic textures.

To consider how IDyOM might simulate schematic expectations in cadential contexts, this study adopts a corpus-analytic approach, using the many methods of statistical inference developed in the experimental sciences to examine a few hypotheses about cadential expectancies. To that end, Section 2 provides a brief summary and discussion of the cadence concept, as well as the typology on which this study is based (Caplin, 1998, 2004), and then offers three hypotheses designed to examine the link between prediction and cadential closure. Next, Section 3 introduces the multiple viewpoints framework employed by IDyOM, and Section 4 describes the methods for estimating the conditional probability function for individual melodic or harmonic viewpoints using maximum likelihood (ML) estimation and the prediction-by-partial-match (PPM) algorithm. We then present in Section 5 the corpus of expositions and the annotated cadence collection from Haydn's string quartets and describe Pearce's procedure for improving model performance by combining viewpoint models into a single composite prediction for each melodic or harmonic event in the sequence. Finally, Section 6 presents the results of the computational experiments, and Section 7 concludes by discussing limitations of the modelling approach and considering avenues for future research.

    2. The classical cadence

Like many of the concepts in circulation in music scholarship (e.g. tonality, harmony, phrase, meter), the cadence concept has been extremely resistant to definition. To sort through the profusion of terms associated with cadence, Blombach (1987) surveyed definitions in eighty-one textbooks distributed around a median publication date of 1970. Her findings suggest that the cadence


is most frequently characterised as a time span, which consists of a conventionalised harmonic progression, and in some instances, a falling melody. In over half of the textbooks surveyed, these harmonic and melodic formulæ are also classified into a compendium of cadence types, with the degree of finality associated with each type sometimes leading to comparisons with punctuation in language. However, many of these definitions also conceptualise the cadence as a point of arrival (Ratner, 1980), or time point, which marks the conclusion of an ongoing phrase-structural process, and which is often characterised as a moment of rest, quiescence, relaxation or repose. Thus a cadence is simultaneously understood as time-span and time-point, the former relating to its most representative (or recurrent) features (cadence as formula), the latter to the presumed boundary it precedes and engenders (cadence as ending) (Caplin, 2004).

The compendium of cadences and other conventional closing patterns associated with the classical period is enormous, but contemporary scholars typically cite only a few, which may be classified according to two fundamental types: those cadences for which the goal of the progression is tonic harmony (e.g. perfect authentic, imperfect authentic, deceptive, etc.), and those cadences for which the goal of the progression is dominant harmony (e.g. half cadences). Table 1 provides the harmonic and melodic characteristics for five of the most common cadence categories from Caplin's typology (1998, 2004). The perfect authentic cadence (PAC), which features a harmonic progression from a root-position dominant to a root-position tonic, as well as the arrival of the melody on 1, serves as the quintessential closing pattern not only for the high classical period (Gjerdingen, 2007), but for repertories spanning much of the history of Western music. The imperfect authentic cadence (IAC) is a melodic variant of the PAC category that replaces 1 with 3 (or, more rarely, 5) in the melody, and like the PAC category, typically appears at the conclusion of phrases, themes, or larger sections.

The next two categories represent cadential deviations, in that they initially promise a perfect authentic cadence, yet fundamentally deviate from the pattern's terminal events, thus failing to achieve authentic cadential closure at the expected moment Caplin calls 'cadential arrival' (1998, p. 43). The deceptive cadence (DC) leaves harmonic closure somewhat open by closing with a non-tonic harmony, usually vi, but the melodic line resolves to a stable scale degree like 1 or 3, thereby providing a provisional sense of ending for the ongoing thematic process. The evaded cadence is characterised by a sudden interruption in the projected resolution of the cadential process. For example, instead of resolving to 1, the melody often leaps up to some other scale degree like 5, thereby replacing the expected ending with material that clearly initiates the subsequent process. Thus, the evaded cadence projects no sense of ending whatsoever, as the events at the expected moment of cadential arrival, which should group backward by ending the preceding thematic process, instead group forward by initiating the subsequent process. Finally, the half cadence (HC) remains categorically distinct from both the authentic cadence categories and the cadential deviations, since its ultimate harmonic goal is dominant (and not tonic) harmony. The HC category also tends to be defined more flexibly than the other categories in that the terminal harmony may support any chord member in the soprano (i.e. 2, 5, or 7).

Table 1. The cadential types and categories, along with the harmonic and melodic characteristics and the count for each category in the cadence collection. Categories marked with an asterisk are cadential deviations.

Types  Categories                   Harmony             Melody           N
I      Perfect Authentic (PAC)      V-I                 1                122
       Imperfect Authentic (IAC)    V-I                 3 or 5           9
       Deceptive (DC)*              V-?, typically vi   1 or 3           19
       Evaded (EV)*                 V-?                 ?, typically 5   11
V      Half (HC)                    ?-V                 5, 7, or 2       84

This study examines three claims about the link between prediction and cadential closure. First, if cadences serve as the most predictable, probabilistic, specifically envisaged formulæ in all of tonal music (Huron, 2006; Meyer, 1956), we would expect terminal events from cadential contexts to be more predictable than those from non-cadential contexts even if both contexts share similar or even identical terminal events (e.g. tonic harmony in root position, 1 in the melody, etc.). Thus, Experiment 1 examines the hypothesis that cadences are more predictable than their non-cadential counterparts by comparing the probability estimates obtained from IDyOM for the terminal events from the PAC and HC categories—the two most prominent categories in tonal music—with those from non-cadential contexts that share identical terminal events.

Second, applications of cadence typologies like the one employed here often note the correspondence between cadential strength (or finality) on the one hand and expectedness (or predictability) on the other. Dunsby has noted, for example, that in Schoenberg's view, the experience of closure for a given cadential formula is only satisfying to the extent that it fulfils a 'stylistic expectation' (1980, p. 125). This would suggest that the strength and specificity of our schematic expectations


formed in prospect and their subsequent realisation in retrospect contributes to the perception of cadential strength, where the most expected (i.e. probable) endings are also the most complete or closed. Sears (2015) points out that models of cadential strength advanced in contemporary cadence typologies typically fall into two categories: those that compare every cadence category to the perfect authentic cadence (Latham, 2009; Schmalfeldt, 1992), called the 1-schema model; and those that distinguish the PAC, IAC and HC categories from the cadential deviations because the former categories allow listeners to generate expectations as to how they might end, called the Prospective (or Genuine) Schemas model (Sears, 2015, 2016). In the 1-schema model, the half cadence represents the weakest cadential category; it is marked not by a deviation in the melodic and harmonic context at cadential arrival (such as the deceptive or evaded cadences), but rather by the absence of that content, resulting in the following ordering of the cadence categories based on their perceived strength: PAC > IAC > DC > EV > HC. In the Prospective Schemas model, however, the half cadence is a distinct closing schema that allows listeners to generate expectations for its terminal events, and so represents a stronger ending than the aforementioned cadential deviations, resulting in the ordering PAC > IAC > HC > DC > EV (for further details, see Sears, 2015). Experiment 2 directly compares these two models of cadential strength.

Third, a number of studies have supported the role played by predictive mechanisms in the segmentation of temporal experience (Brent, 1999; Cohen, Adams, & Heeringa, 2007; Elman, 1990; Kurby & Zacks, 2008; Pearce, Müllensiefen, et al., 2010; Peebles, 2011). In event segmentation theory (EST), for example, perceivers form working memory representations of what is happening now, called event models, and discontinuities in the stimulus elicit prediction errors that force the perceptual system to update the model and segment activity into discrete time spans, called events (Kurby & Zacks, 2008). In the context of music, such discontinuities can take many forms: sudden changes in melody, harmony, texture, surface activity, rhythmic duration, dynamics, timbre, pitch register, and so on. What is more, when the many parameters effecting segmental grouping act together to produce closure at a particular point in a composition, cadential or otherwise, parametric congruence obtains (Meyer, 1973). Thus, Experiment 3 examines whether (1) the terminal event of a cadence, by serving as a predictable point of closure, is the most expected event in the surrounding sequence; and (2) the next event in the sequence, which initiates the subsequent musical process, is comparatively unexpected. Following EST, the hypothesis here is that unexpected events engender prediction errors that lead the perceptual system to segment the event stream into discrete chunks (Kurby & Zacks, 2008). If the terminal events from genuine cadential contexts are highly predictable, then prediction errors for the comparatively unpredictable events that follow should force listeners to segment the preceding cadential material. For the cadential deviations, however, prediction errors should occur at, rather than following, the terminal events of the cadence.

    3. Multiple viewpoints

Most natural languages consist of a finite alphabet of discrete symbols (letters), combinations of which form words, phrases, and so on. As a result, the mapping between the individual letter or word encountered in a printed text and its symbolic representation in a computer database is essentially one-to-one. Music encoding is considerably more complex. Notes, chords, phrases, and the like are characterised by a number of different features, and so regardless of the unit of meaning, digital encodings of individual events must concurrently represent multiple properties of the musical surface. To that end, many symbolic formats employ some variant of the multiple viewpoints framework first proposed by Conklin (1988, 1990) and Conklin and Witten (1995), and later extended and refined by Pearce (2005), Pearce et al. (2005), and Pearce and Wiggins (2004).

The multiple viewpoints framework accepts sequences of musical events that typically correspond to individual notes as notated in a score, but which may also include composite events like chords. Each event e consists of a set of basic attributes, and each attribute is associated with a type, $\tau$, which specifies the properties of that attribute. The syntactic domain (or alphabet) of each type, $[\tau]$, denotes the set of all unique elements associated with that type, and each element of the syntactic domain also maps to a corresponding set of elements in the semantic domain, $[[\tau]]$. Following Conklin, attribute types appear here in typewriter font to distinguish them from ordinary text. To represent a sequence of pitches as scale degrees derived from the twelve-tone chromatic scale, for example, the type chromatic scale degree (or csd) would consist of the syntactic set, {0, 1, 2, ..., 11}, and the semantic set, {1, ♯1/♭2, 2, ..., 7}, where 0 represents 1, 7 represents 5, and so on (see Figure 1).
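As a rough illustration of this mapping (a sketch in the spirit of the framework, not IDyOM's own implementation; the function name and MIDI encoding are assumptions), a csd-style attribute can be derived from a basic chromatic-pitch attribute by taking each pitch relative to a tonic:

```python
# Sketch of a derived viewpoint: chromatic scale degree (csd),
# computed from a basic chromatic-pitch attribute (cpitch).
# Names and encodings are illustrative, not IDyOM's internals.

def csd(cpitch: int, tonic: int) -> int:
    """Map a pitch (MIDI number) onto the syntactic domain {0, ..., 11},
    where 0 represents scale degree 1 and 7 represents scale degree 5."""
    return (cpitch - tonic) % 12

# A melody in C major (MIDI numbers): C5, D5, E5, G5
melody = [72, 74, 76, 79]
print([csd(p, tonic=60) for p in melody])  # [0, 2, 4, 7]
```

Here the syntactic elements 0, 2, 4 and 7 correspond to the semantic scale degrees 1, 2, 3 and 5.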

Within this representation language, Conklin and Witten (1995) define several distinct classes of type, but this study examines just three: basic, derived and linked. Basic types are irreducible representations of the musical surface, which is to say that they cannot be derived from any other type. Thus, an attribute representing the sequence of pitches from the twelve-tone chromatic scale


—hereafter referred to as chromatic pitch, or cpitch—would serve as a basic type in Conklin's approach because it cannot be derived from a sequence of pitch classes, scale degrees, melodic intervals, or indeed, any other attribute. What is more, basic types represent every event in the corpus. For example, a sequence of melodic contours would not constitute a basic type because either the first or last events of the melody would receive no value. Indeed, an interesting property of the set of n basic types for any given corpus is that the Cartesian product of the domains of those types determines the event space for the corpus, denoted by $\xi$:

$$\xi = [\tau_1] \times [\tau_2] \times \cdots \times [\tau_n]$$

Each event consists of an n-tuple in a set of values corresponding to the set of basic types that determine the event space. $\xi$ therefore denotes the set of all representable events in the corpus (Pearce, 2005).

Figure 1. Top: First violin part from Haydn's String Quartet in E, Op. 17/1, i, mm. 1–2. Bottom: Viewpoint representation.

As should now be clear from the examples given above, derived types like pitch class, scale degree, and melodic interval do not appear in the event space but are derived from one or more of the basic types. Thus, for every type $\tau$ in the encoded representation there exists a partial function, denoted by $\Psi_\tau$, which maps sequences of events onto elements of type $\tau$. The term viewpoint therefore refers to the function associated with its type, but for convenience Conklin and Pearce refer to viewpoints by the types they model.1 The function is partial because the output may be undefined for certain events in the sequence (denoted by $\perp$). Again, viewpoints for attributes like melodic contour or melodic interval demonstrate this point, since either the first or last element will receive no value (i.e. it will be undefined).

Basic and derived types attempt to model the relations within attributes, but they fail to represent the relations between attributes. Prototypical utterances like cadences, for example, are necessarily comprised of a cluster of co-occurring features, so it is important to note that the relations between those features could be just as significant as their presence (or absence) (Gjerdingen, 1991). This is to say that the harmonic progression V–I presented in isolation does not provide sufficient grounds for the identification of a perfect authentic cadence, but the co-occurrence of that progression with 1 in the soprano, a six-four sonority preceding the root-position dominant, or a trill above the dominant makes such an interpretation far more likely. Linked viewpoints attempt to model correlations between these sorts of attributes by calculating the cross-product of their constituent types.

1 For basic types like cpitch, $\Psi_\tau$ is simply a projection function, thereby returning as output the same values it receives as input (Pearce, 2005, p. 59).
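The cross-product idea behind linked viewpoints can be sketched in a few lines (an illustrative toy, not IDyOM's code; the domains and the harmony labels are assumptions): each event is represented as a tuple of its attribute values, so the model collects statistics over co-occurring values rather than over each attribute alone.

```python
# Sketch of a linked viewpoint: the syntactic domain is the
# Cartesian product of the constituent types' domains.
# Domains and labels are illustrative, not taken from IDyOM.

from itertools import product

harmony_domain = ["V", "I", "vi"]
csd_domain = [0, 4, 7]  # chromatic encodings of scale degrees 1, 3, 5

# The linked type's domain is the cross-product of the two domains
linked_domain = list(product(harmony_domain, csd_domain))
print(len(linked_domain))  # 9 = 3 x 3

# A perfect authentic cadence arrival as a linked event: tonic harmony
# co-occurring with scale degree 1 in the melody
pac_arrival = ("I", 0)
print(pac_arrival in linked_domain)  # True
```

The design point is that ("I", 0) is a single symbol in the linked alphabet, so its frequency is learned separately from the frequencies of "I" and 0 individually.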

    4. Finite-context models

    4.1. Maximum likelihood estimation

The goal of finite-context models like IDyOM is to derive from a corpus of example sequences a model that estimates the probability of event $e_i$ given a preceding sequence of events $e_1$ to $e_{i-1}$, notated here as $e_1^{i-1}$. Thus, the function $p(e_i \mid e_1^{i-1})$ assumes that the identity of each event in the sequence depends only on the events that precede it. In principle, the length of the context is limited only by the length of the sequence $e_1^{i-1}$, but context models typically stipulate a global order bound such that the probability of the next event depends only on the previous $n-1$ events, or $p(e_i \mid e_{(i-n)+1}^{i-1})$. Following the Markov assumption, the model described here is an $(n-1)$th-order Markov model, but researchers also sometimes call it an n-gram model because the sequence $e_{(i-n)+1}^{i}$ is an n-gram consisting of a context $e_{(i-n)+1}^{i-1}$ and a single-event prediction $e_i$.

To estimate the conditional probability function $p(e_i \mid e_{(i-n)+1}^{i-1})$ for each event in the test sequence, IDyOM first acquires the frequency counts for a collection of such sequences from a training set. When the trained model is exposed to the test sequence, it then uses the frequency counts to estimate the probability distribution governing the identity of the next event in the sequence given the $n-1$ preceding events (Pearce, 2005). In this case, IDyOM relies on maximum likelihood (ML) estimation.

$$p(e_i \mid e_{(i-n)+1}^{i-1}) = \frac{c(e_i \mid e_{(i-n)+1}^{i-1})}{\sum_{e \in \mathcal{A}} c(e \mid e_{(i-n)+1}^{i-1})} \tag{1}$$

The numerator represents the frequency count $c$ for the n-gram $e_{(i-n)+1}^{i}$, and the denominator represents the sum of the frequency counts $c$ associated with all of the possible events $e$ in the alphabet $\mathcal{A}$ following the context $e_{(i-n)+1}^{i-1}$.
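A minimal sketch of Equation 1: count n-grams in a training sequence and estimate the next-event distribution by maximum likelihood. This is a toy model, not IDyOM itself (which additionally smooths these estimates with the PPM algorithm discussed below); the chord-symbol sequence is invented for illustration.

```python
# Toy maximum likelihood n-gram estimator in the spirit of Equation 1.
from collections import Counter

def train_ngrams(sequence, n):
    """Collect frequency counts for each n-gram: a context of n-1 events
    followed by a single-event prediction."""
    counts = Counter()
    for i in range(len(sequence) - n + 1):
        context = tuple(sequence[i:i + n - 1])
        prediction = sequence[i + n - 1]
        counts[(context, prediction)] += 1
    return counts

def ml_estimate(counts, context, event, alphabet):
    """Equation 1: c(event | context) divided by the summed counts of
    every event in the alphabet following that context."""
    denominator = sum(counts[(context, e)] for e in alphabet)
    return counts[(context, event)] / denominator if denominator else 0.0

# Toy chord sequence: after the context ('ii', 'V'), 'I' occurs twice, 'vi' once
seq = ["ii", "V", "I", "ii", "V", "I", "ii", "V", "vi"]
counts = train_ngrams(seq, n=3)
print(ml_estimate(counts, ("ii", "V"), "I", set(seq)))  # 2/3, roughly 0.667
```

Note that an unseen event in a seen context receives probability zero under pure ML estimation, which is precisely the deficiency that smoothing schemes such as PPM address.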


4.2. Performance metrics

    To evaluate model performance, the most commonmet-rics derive from information-theoretic measures intro-duced by Shannon (1948, 1951). Returning to Equation1, if the probability of ei is given by the conditionalprobability function p(ei|ei1(in)+1), information content(IC) represents the minimum number of bits required toencode ei in context (MacKay, 2003).

\mathrm{IC}(e_i \mid e_{(i-n)+1}^{i-1}) = \log_2 \frac{1}{p(e_i \mid e_{(i-n)+1}^{i-1})}    (2)

IC is inversely proportional to p and so represents the degree of contextual unexpectedness or surprise associated with e_i. Researchers often prefer to report IC over p because it has a more convenient scale (p can become vanishingly small), and since it also has a well-defined interpretation in data compression theory (Pearce, Ruiz, et al., 2010), we will prefer it in the analyses that follow.

Whereas IC represents the degree of unexpectedness associated with a particular event e_i in the sequence, Shannon entropy (H) represents the degree of contextual uncertainty associated with the probability distribution governing that outcome, where the probability estimates are independent and sum to one.

H(e_{(i-n)+1}^{i-1}) = \sum_{e \in \mathcal{A}} p(e \mid e_{(i-n)+1}^{i-1}) \, \mathrm{IC}(e \mid e_{(i-n)+1}^{i-1})    (3)

H is computed by averaging the information content over all e in A following the context e_{(i-n)+1}^{i-1}. According to Shannon's equation, if the probability of a given outcome is 1, the probabilities for all of the remaining outcomes will be 0, and H = 0 (i.e. maximum certainty). If all of the outcomes are equally likely, however, H will be maximal (i.e. maximum uncertainty). Thus, one can assume that the best performing models will minimise uncertainty.
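Equations 2 and 3 can be sketched directly (illustrative helper names; the toy distribution is invented):

```python
import math

def information_content(p):
    """Equation 2: surprise in bits for an event with model probability p."""
    return math.log2(1.0 / p)

def entropy(dist):
    """Equation 3: expected IC over a context's predictive distribution."""
    return sum(p * information_content(p) for p in dist.values() if p > 0)

dist = {"C": 0.5, "G": 0.25, "E": 0.25}
ic_c = information_content(dist["C"])   # 1.0 bit: a fairly expected event
h = entropy(dist)                       # 1.5 bits of contextual uncertainty
```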

In practice, we rarely know the true probability distribution of the stochastic process (Pearce & Wiggins, 2004), so it is often necessary to evaluate model performance using an alternative measure called cross entropy, denoted by H_m.

H_m(p_m, e_1^{j}) = -\frac{1}{j} \sum_{i=1}^{j} \log_2 p_m(e_i \mid e_1^{i-1})    (4)

Whereas H represents the average information content over all e in the alphabet A, H_m represents the average information content for the model probabilities estimated by p_m over all e in the sequence e_1^j. That is, cross entropy provides an estimate of how uncertain a model is, on average, when predicting a given sequence of events (Manning & Schütze, 1999; Pearce & Wiggins, 2004). As a consequence, H_m is often used to evaluate the performance of context models for tasks like speech recognition, machine translation, and spelling correction because, as Brown and his co-authors put it, 'models for which the cross entropy is lower lead directly to better performance' (Brown, Della Pietra, Della Pietra, Lai, & Mercer, 1992, p. 39).

    4.3. Prediction by Partial Match

Because the number of potential patterns increases dramatically as the value of n increases, high-order models often suffer from the zero-frequency problem, in which n-grams encountered in the test set do not appear in the training set (Witten & Bell, 1991). To resolve this issue, IDyOM applies a data compression scheme called Prediction by Partial Match (PPM), which adjusts the ML estimate for each event in the sequence by combining (or smoothing) predictions generated at higher orders with less sparsely estimated predictions from lower orders (Cleary & Witten, 1984). Context models estimated with the PPM scheme typically use a procedure called backoff smoothing (or blending), which assigns some portion of the probability mass from each distribution to an escape probability using an escape method to accommodate predictions that do not appear in the training set. When a given event does not appear in the (n - 1)th-order distribution, PPM stores the escape probability and then iteratively backs off to lower-order distributions until it predicts the event or reaches the zeroth-order distribution, at which point it transmits the probability estimate for a uniform distribution over A (i.e. where every event in the alphabet is equally likely). PPM then multiplies these probability estimates together to obtain the final (smoothed) estimate.

Unfortunately there is no sound theoretical basis for choosing the appropriate escape method (Witten & Bell, 1991), but two recent studies have demonstrated the potential of Moffat's (1990) method C to minimise model uncertainty in melodic and harmonic prediction tasks (Hedges & Wiggins, 2016; Pearce & Wiggins, 2004), so we employ that method here.

\gamma(e_{(i-n)+1}^{i-1}) = \frac{t(e_{(i-n)+1}^{i-1})}{\sum_{e \in \mathcal{A}} c(e \mid e_{(i-n)+1}^{i-1}) + t(e_{(i-n)+1}^{i-1})}    (5)

Escape method C represents the escape count t as the number of distinct symbols that follow the context e_{(i-n)+1}^{i-1}. To calculate the escape probability for events that do not appear in the training set, γ represents


the ratio of the escape count t to the sum of the frequency counts c and t for the context e_{(i-n)+1}^{i-1}. The appeal of this escape method is that it assigns greater weight to higher-order predictions (which are more specific to the context) than to lower-order predictions (which are more general) in the final probability estimate (Bunton, 1996; Pearce, 2005). Thus, Equation 1 can be revised in the following way:

\alpha(e_i \mid e_{(i-n)+1}^{i-1}) = \frac{c(e_i \mid e_{(i-n)+1}^{i-1})}{\sum_{e \in \mathcal{A}} c(e \mid e_{(i-n)+1}^{i-1}) + t(e_{(i-n)+1}^{i-1})}    (6)

The PPM scheme just described remains the canonical method in many context models (Cleary & Teahan, 1997), but Bunton (1997) has since provided a variant smoothing technique called mixtures that generally improves model performance, but which, following Chen and Goodman (1999), we refer to as interpolated smoothing (Pearce & Wiggins, 2004). The central idea behind interpolated smoothing is to compute a weighted combination of higher-order and lower-order models for every event in the sequence, regardless of whether that event features n-grams with non-zero counts, under the assumption that the addition of lower-order models might generate more accurate probability estimates.2

Formally, interpolated smoothing estimates the probability function p(e_i | e_{(i-n)+1}^{i-1}) by recursively computing a weighted combination of the (n - 1)th-order distribution with the (n - 2)th-order distribution (Pearce, 2005; Pearce & Wiggins, 2004).

p(e_i \mid e_{(i-n)+1}^{i-1}) =
\begin{cases}
\alpha(e_i \mid e_{(i-n)+1}^{i-1}) + \gamma(e_{(i-n)+1}^{i-1}) \, p(e_i \mid e_{(i-n)+2}^{i-1}) & \text{if } e_{(i-n)+2}^{i-1} \neq \varepsilon \\[4pt]
\dfrac{1}{|\mathcal{A}| + 1 - t(\varepsilon)} & \text{otherwise}
\end{cases}    (7)

In the context of interpolated smoothing, it can be helpful to think of γ as a weighting function, with α serving as the weighted ML estimate. Unlike the backoff smoothing procedure, which terminates at the first non-zero prediction, interpolated smoothing recursively adjusts the probability estimate for each order, regardless of whether the corresponding n-gram features a non-zero count, and then terminates with the probability estimate for ε, which represents a uniform distribution over |A| + 1 - t(ε) events (i.e. where every event in the alphabet is equally likely). Also note here that in the PPM scheme,

2 Context models like the one just described also often use a technique called exclusion, which improves the final probability estimate by reclaiming a portion of the probability mass in lower-order models that is otherwise wasted on redundant predictions (i.e. the counts for events that were predicted in the higher-order distributions do not need to be included in the calculation of the lower-order distributions).

the alphabet A increases by one event to accommodate the escape count t but decreases by the number of events in A that never appear in the corpus.3
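Under these definitions, one reading of the interpolated scheme in Equation 7 can be sketched as a short recursion (illustrative only: `ppm_c` is our own name, and the exclusion technique described in the footnote is omitted, so the estimates are only approximately normalised):

```python
def ppm_c(counts, context, event, alphabet):
    """Blend the method-C estimate at each order with the gamma-weighted
    estimate from the next-shorter context, bottoming out in a uniform
    distribution over |A| + 1 - t(eps) events (cf. Equation 7)."""
    continuations = counts.get(tuple(context), {})
    t, total = len(continuations), sum(continuations.values())
    if total:
        gamma = t / (total + t)
        alpha = continuations.get(event, 0) / (total + t)
    else:
        gamma, alpha = 1.0, 0.0            # unseen context: escape outright
    if context:
        lower = ppm_c(counts, context[1:], event, alphabet)
    else:
        lower = 1.0 / (len(alphabet) + 1 - t)   # uniform floor
    return alpha + gamma * lower

counts = {("a",): {"b": 2}, (): {"a": 1, "b": 1}}
p = ppm_c(counts, ("a",), "b", {"a", "b"})   # 2/3 + 1/3 * 3/4 = 11/12
```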

    4.4. Variable orders

The optimal order for context models depends on the nature of the corpus, which in the absence of a priori knowledge can only be determined empirically (Pearce & Wiggins, 2004, p. 2). To resolve this issue, IDyOM employs an extension to PPM called PPM* (Cleary & Teahan, 1997), which includes contexts of variable length and thus eliminates the need to impose an arbitrary order bound (Pearce & Wiggins, 2004, p. 6). In the PPM* scheme, the context length is allowed to vary for each event in the sequence, with the maximum context length selected using simple heuristics to minimise model uncertainty. Specifically, PPM* exploits the fact that the observed frequency of novel events is much lower than expected for contexts that feature exactly one prediction, called deterministic contexts. As a result, the entropy of the distributions estimated at or below deterministic contexts tends to be lower than in non-deterministic contexts. Thus, PPM* selects the shortest deterministic context to serve as the global order bound for each event in the sequence. If such a context does not exist, PPM* then selects the longest matching context.
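The context-selection heuristic just described can be sketched as follows (a toy illustration; `ppm_star_order` and the counts table are our own, with contexts keyed by tuples of preceding events):

```python
def ppm_star_order(counts, context):
    """PPM*: return the shortest deterministic context (a suffix of `context`
    with exactly one observed continuation); otherwise fall back to the
    longest suffix with any observations."""
    suffixes = [context[len(context) - k:] for k in range(len(context) + 1)]
    for suffix in suffixes:                      # shortest first
        if len(counts.get(suffix, {})) == 1:
            return suffix
    for suffix in reversed(suffixes):            # longest matching context
        if counts.get(suffix):
            return suffix
    return ()

counts = {(): {"a": 2, "b": 3, "r": 3}, ("b",): {"r": 3}, ("a", "b"): {"r": 2}}
chosen = ppm_star_order(counts, ("a", "b"))      # ('b',) is deterministic
```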

    5. Methods

    5.1. The corpus

The corpus consists of symbolic representations of 50 sonata-form expositions selected from Haydn's string quartets (1771–1803). Table 2 presents the reference information, keys, time signatures and tempo markings for each movement. The corpus spans much of Haydn's mature compositional style (Opp. 17–76), with the majority of the expositions selected from first movements (28) or finales (11), and with the remainder appearing in inner movements (ii: 8; iii: 3). All movements were downloaded from the KernScores database in MIDI format.4 To ensure that each instrumental part would qualify as monophonic (a pre-requisite for the analytical techniques that follow) all trills, extended string techniques, and other ornaments were removed. For events presenting extended string techniques (e.g. double or triple stops), note events in each part were retained that preserved the voice leading both within and between instrumental parts. Table 3 provides a few descriptive statistics concerning the number of note and chord events in each movement.

3 For a worked example of the PPM* method, see Sears (2016).
4 http://kern.ccarh.org/.


Table 2. Reference information (Opus number, work, movement, measures), keys (case denotes mode), time signatures and tempo markings for the exposition sections in the corpus.

Excerpt | Key | Time signature | Tempo marking

Op. 17, No. 1, i, mm. 1–43 | E | 4/4 | Moderato
Op. 17, No. 2, i, mm. 1–38 | F | 4/4 | Moderato
Op. 17, No. 3, iv, mm. 1–26 | E | 4/4 | Allegro molto
Op. 17, No. 4, i, mm. 1–53 | c | 4/4 | Moderato
Op. 17, No. 5, i, mm. 1–33 | G | 4/4 | Moderato
Op. 17, No. 6, i, mm. 1–73 | D | 6/8 | Presto
Op. 20, No. 1, iv, mm. 1–55 | E | 2/4 | Presto
Op. 20, No. 3, i, mm. 1–94 | g | 2/4 | Allegro con spirito
Op. 20, No. 3, iii, mm. 1–43 | G | 3/4 | Poco Adagio
Op. 20, No. 3, iv, mm. 1–42 | g | 4/4 | Allegro molto
Op. 20, No. 4, i, mm. 1–112 | D | 3/4 | Allegro di molto
Op. 20, No. 4, iv, mm. 1–49 | D | 4/4 | Presto scherzando
Op. 20, No. 5, i, mm. 1–48 | f | 4/4 | Allegro moderato
Op. 20, No. 6, ii, mm. 1–27 | E | cut | Adagio
Op. 33, No. 1, i, mm. 1–37 | b | 4/4 | Allegro moderato
Op. 33, No. 1, iii, mm. 1–40 | D | 6/8 | Andante
Op. 33, No. 2, i, mm. 1–32 | E | 4/4 | Allegro moderato
Op. 33, No. 3, iii, mm. 1–29 | F | 3/4 | Adagio
Op. 33, No. 4, i, mm. 1–31 | B | 4/4 | Allegro moderato
Op. 33, No. 5, i, mm. 1–95 | G | 2/4 | Vivace assai
Op. 33, No. 5, ii, mm. 1–30 | g | 4/4 | Largo
Op. 50, No. 1, i, mm. 1–60 | B | cut | Allegro
Op. 50, No. 1, iv, mm. 1–75 | B | 2/4 | Vivace
Op. 50, No. 2, i, mm. 1–106 | C | 3/4 | Vivace
Op. 50, No. 2, iv, mm. 1–86 | C | 2/4 | Vivace assai
Op. 50, No. 3, iv, mm. 1–74 | E | 2/4 | Presto
Op. 50, No. 4, i, mm. 1–64 | f | 3/4 | Allegro spirituoso
Op. 50, No. 5, i, mm. 1–65 | F | 2/4 | Allegro moderato
Op. 50, No. 5, iv, mm. 1–54 | F | 6/8 | Vivace
Op. 50, No. 6, i, mm. 1–54 | D | 4/4 | Allegro
Op. 50, No. 6, ii, mm. 1–25 | d | 6/8 | Poco Adagio
Op. 54, No. 1, i, mm. 1–47 | G | 4/4 | Allegro con brio
Op. 54, No. 1, ii, mm. 1–54 | C | 6/8 | Allegretto
Op. 54, No. 2, i, mm. 1–87 | C | 4/4 | Vivace
Op. 54, No. 3, i, mm. 1–58 | E | cut | Allegro
Op. 54, No. 3, iv, mm. 1–82 | E | 2/4 | Presto
Op. 55, No. 1, ii, mm. 1–36 | D | 2/4 | Adagio cantabile
Op. 55, No. 2, ii, mm. 1–76 | f | cut | Allegro
Op. 55, No. 3, i, mm. 1–75 | B | 3/4 | Vivace assai
Op. 64, No. 3, i, mm. 1–69 | B | 3/4 | Vivace assai
Op. 64, No. 3, iv, mm. 1–79 | B | 2/4 | Allegro con spirito
Op. 64, No. 4, i, mm. 1–38 | G | 4/4 | Allegro con brio
Op. 64, No. 4, iv, mm. 1–66 | G | 6/8 | Presto
Op. 64, No. 6, i, mm. 1–45 | E | 4/4 | Allegretto
Op. 71, No. 1, i, mm. 1–69 | B | 4/4 | Allegro
Op. 74, No. 1, i, mm. 1–54 | C | 4/4 | Allegro moderato
Op. 74, No. 1, ii, mm. 1–57 | G | 3/8 | Andantino grazioso
Op. 76, No. 2, i, mm. 1–56 | d | 4/4 | Allegro
Op. 76, No. 4, i, mm. 1–68 | B | 4/4 | Allegro con spirito
Op. 76, No. 5, ii, mm. 1–33 | F | cut | Largo. Cantabile e mesto

    Table 3. Descriptive statistics for the corpus.

Instrumental part | N | M (SD) | Range

Note events
Violin 1 | 14,506 | 290 (78) | 133–442
Violin 2 | 10,653 | 213 (70) | 69–409
Viola | 9156 | 183 (63) | 79–381
Cello | 8463 | 169 (60) | 64–326

Chord events
Expansion^a | 20,290 | 406 (100) | 189–620

^a To identify chord events in polyphonic textures, full expansion duplicates overlapping note events at every unique onset time (Conklin, 2002).

To examine model predictions for the cadences in the corpus, we classified exemplars of the five cadence categories that achieve (or at least promise) cadential arrival in Caplin's cadence typology: PAC, IAC, HC, DC and EV (see Table 1). The corpus contains 270 cadences, but 15 cadences were excluded because either the cadential bass or soprano does not appear in the cello and first violin parts, respectively. Additionally, another 10 cadences were excluded because they imply more than one category (i.e. PAC–EV or DC–EV). Thus, for the analyses that follow, the cadence collection consists of 245 cadences.

Shown in the right-most column of Table 1, the perfect authentic cadence and the half cadence represent the most prevalent categories, followed by the cadential deviations: the deceptive and evaded categories. The imperfect authentic cadence is the least common category, which perhaps reflects the late-century stylistic preference for perfect authentic cadential closure at the ends of themes and larger sections. This distribution also largely replicates previous findings for Mozart's keyboard sonatas (Rohrmeier & Neuwirth, 2011), so this distribution may characterise the classical style in general.

    5.2. Viewpoint selection

To select the appropriate viewpoints for the prediction of cadences in Haydn's string quartets, we have adopted Gjerdingen's schema-theoretic approach (2007), which represents the core events of the cadence by the scale degrees and melodic contours of the outer voices (i.e. the two-voice framework), a coefficient representing the strength of the metric position (strong, weak), and a sonority, presented using figured-bass notation. Given the importance of melodic intervals in studies of recognition memory for melodies (Dowling, 1981), we might also add this attribute to Gjerdingen's list. However, for the majority of the encoded cadences from the cadence collection, the terminal events at the moment of cadential arrival appear in strong metric positions, and few of the cadences feature unexpected durations or inter-onset intervals at the cadential arrival, so we have excluded viewpoint models for rhythmic or metric attributes from the present investigation, concentrating instead on those viewpoints representing pitch-based (melodic or harmonic) expectations. What is more, IDyOM was designed to combine melodic predictions from two or more viewpoints by mapping the probability distributions over their respective alphabets back into distributions over a single basic viewpoint, such as the pitches of the twelve-tone chromatic scale (i.e. cpitch). Thus, for the purposes of model comparison it will also be useful to include cpitch as a baseline melodic model in the analyses that follow.

5.2.1. Note events

Four viewpoints were initially selected to represent note events in the outer parts: chromatic pitch (cpitch), melodic pitch interval (melint), melodic contour (contour), and chromatic scale degree (csd). As described previously, cpitch represents pitches as integers from 0 to 127 (in the MIDI representation, C4 is 60), and serves as the baseline model for the other melodic viewpoint models examined in this study. To derive sequences of melodic intervals, melint computes the numerical difference between adjacent events in cpitch, where ascending intervals are positive and descending intervals are negative. The viewpoint contour then reduces the information present in melint, with all ascending intervals receiving a value of 1, all descending intervals a value of -1, and all lateral motion a value of 0. Finally, to relate cpitch to a referential tonic pitch class for every event in the corpus, we manually annotated the key, mode, modulations and pivot boundaries for each movement and then included the analysis in a separate text file to accompany the MIDI representation, both of which appear in the Supplementary materials for each movement in the corpus. Thus, every note event was associated with the viewpoints key and mode. The vector of keys assumes values in the set {0, 1, 2, ..., 11}, where 0 represents the key of C, 1 represents C♯ or D♭, and so on. Passages in the major and minor modes receive values of 0 and 1, respectively. The viewpoint csd then maps cpitch to key and reduces the resulting vector of chromatic scale degrees modulo 12 such that 0 denotes the tonic scale degree, 7 the dominant scale degree, and so on. By way of example, Figure 1 presents the viewpoint representation for the first violin part from the opening two measures of the first movement of Haydn's String Quartet in E, Op. 17/1.
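The derived viewpoints just defined can be sketched in a few lines of Python (our own function names; IDyOM itself computes these internally):

```python
def melint(cpitch):
    """Melodic interval: signed difference between adjacent pitches."""
    return [b - a for a, b in zip(cpitch, cpitch[1:])]

def contour(cpitch):
    """Contour: sign of each interval (1 up, -1 down, 0 lateral)."""
    return [(d > 0) - (d < 0) for d in melint(cpitch)]

def csd(cpitch, key):
    """Chromatic scale degree: pitch relative to the tonic, modulo 12."""
    return [(p - key) % 12 for p in cpitch]

notes = [64, 67, 67, 62]        # MIDI: E4, G4, G4, D4
ints = melint(notes)            # [3, 0, -5]
signs = contour(notes)          # [1, 0, -1]
degrees = csd(notes, key=4)     # tonic E: [0, 3, 3, 10]
```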

As mentioned previously, IDyOM is capable of individually predicting any one of these viewpoints using the PPM* scheme, but it can also combine viewpoint models for note-event predictions of the same basic viewpoint (i.e. cpitch) using a weighted multiplicative combination scheme that assigns greater weights to viewpoints whose predictions are associated with lower entropy at that point in the sequence (Pearce et al., 2005). To determine the combined probability distribution for each event in the test sequence, IDyOM then computes the product of the weighted probability estimates from each viewpoint model for each possible value of the predicted viewpoint.
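A simplified sketch of such an entropy-weighted multiplicative combination follows; the published scheme (Pearce et al., 2005) uses a more elaborate weighting with a bias parameter, so treat the weight formula here as illustrative rather than definitive:

```python
import math

def entropy(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def combine(dists, bias=1.0):
    """Entropy-weighted geometric combination over a shared alphabet:
    lower-entropy (more confident) viewpoints receive larger exponents."""
    hmax = math.log2(len(dists[0]))
    weights = [(hmax / max(entropy(d), 1e-9)) ** bias for d in dists]
    wsum = sum(weights)
    raw = {e: math.prod(d[e] ** (w / wsum) for d, w in zip(dists, weights))
           for e in dists[0]}
    z = sum(raw.values())                 # renormalise the weighted product
    return {e: p / z for e, p in raw.items()}

d1 = {"C": 0.8, "E": 0.1, "G": 0.1}   # confident viewpoint (low entropy)
d2 = {"C": 0.4, "E": 0.3, "G": 0.3}   # uncertain viewpoint (high entropy)
merged = combine([d1, d2])            # dominated by the confident model
```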

Furthermore, IDyOM can automate the viewpoint selection process using a hill-climbing procedure called forward stepwise selection, which picks the combination of viewpoints that yields the richest structural representations of the musical surface and minimises model uncertainty. Given an empty set of viewpoints, the stepwise selection algorithm iteratively selects the viewpoint model additions or deletions that yield the most improvement in cross entropy, terminating when no addition or deletion yields an improvement (Pearce, 2005; Potter, Wiggins, & Pearce, 2007). To derive the optimal viewpoint system for the representation of melodic expectations, we employed stepwise selection for the following viewpoints: cpitch, melint, csd, and contour. In this case, IDyOM begins with the above set of viewpoint models, but also includes the linked viewpoints derived from that set (i.e. cpitch ⊗ melint, cpitch ⊗ csd, cpitch ⊗ contour, melint ⊗ csd, melint ⊗ contour, csd ⊗ contour), resulting in a pool of ten individual viewpoint models from which to derive the optimal combination of viewpoints.
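The hill-climbing procedure can be sketched as follows (a toy: the cost table stands in for the cross entropies of trained viewpoint systems, and all names are our own):

```python
def stepwise_select(candidates, evaluate):
    """Forward stepwise selection: greedily apply the single viewpoint
    addition or deletion that most reduces cross entropy; stop when no
    move improves on the current system."""
    selected, best = [], float("inf")
    while True:
        moves = [selected + [v] for v in candidates if v not in selected]
        moves += [[v for v in selected if v != w] for w in selected]
        scored = [(evaluate(m), m) for m in moves if m]
        if not scored:
            break
        score, move = min(scored, key=lambda sm: sm[0])
        if score >= best:                  # no addition or deletion helps
            break
        selected, best = move, score
    return selected, best

# Invented cost table standing in for trained-model cross entropies.
costs = {frozenset(["melint"]): 3.0,
         frozenset(["melint", "csd x cpitch"]): 2.77}
chosen, ce = stepwise_select(["cpitch", "melint", "csd x cpitch"],
                             lambda s: costs.get(frozenset(s), 4.0))
```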

Viewpoint selection derived the same combination of viewpoint models for the first violin and the cello. For this corpus, melint was the best performing viewpoint model in the first step, receiving a cross entropy estimate of 3.006 in the first violin and 2.798 in the cello. In the second step, the combination of melint with the linked viewpoint csd ⊗ cpitch decreased the cross entropy estimate to 2.765 in the first violin and 2.556 in the cello. Including any of the remaining viewpoints did not improve model performance, so the stepwise selection procedure terminated with this combination of viewpoints. In Section 6, we refer to this viewpoint model as selection. What is more, the contour model received a much higher cross entropy estimate than the other viewpoint models, so we elected to exclude it in the experiments reported here. Thus, the final melodic viewpoint models selected for the present study are cpitch, melint, csd, and selection.

5.2.2. Chord events

To accommodate chord events, we have extended the multiple viewpoints framework by performing a full expansion of the symbolic encoding, which duplicates overlapping note events across the instrumental parts at every unique onset time (Conklin, 2002). This representation yielded two harmonic viewpoints: vertical interval class combination (vintcc) and chromatic scale-degree combination (csdc). The viewpoint vintcc produces a sequence of chords that have analogues in figured-bass nomenclature by modelling the vertical intervals in semitones modulo 12 between the lowest instrumental part and the upper parts from cpitch. Unfortunately, however, the syntactic domain of vintcc is rather large; the domain of each vertical interval class between any two instrumental parts is {0, 1, 2, ..., 11, ⊥} (where ⊥ marks an absent part), yielding 13 possible classes, so the number of combinatorial possibilities for combinations containing two, three, or four instrumental parts is 13^3 - 1, or 2196 combinations.

To reduce the syntactic domain while retaining those chord combinations that approximate figured bass symbols, Quinn (2010) assumed that the precise location and repeated appearance of a given interval in the instrumental texture are inconsequential to the identity of the combination. Adopting that approach here, we have excluded note events in the upper parts that double the lowest instrumental part at the unison or octave, allowed permutations between vertical intervals, and excluded interval repetitions. As a consequence, the first two criteria reduce the major triads ⟨4, 7, 0⟩ and ⟨7, 4, 0⟩ to ⟨4, 7, ⊥⟩, while the third criterion reduces the chords ⟨4, 4, 10⟩ and ⟨4, 10, 10⟩ to ⟨4, 10, ⊥⟩. This procedure dramatically reduces the potential domain of vintcc from 2196 to 232 unique vertical interval class combinations, though the corpus only contained 190 of the 232 possible combinations, reducing the domain yet further.
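These reduction criteria can be sketched in a few lines (illustrative; `vintcc` here returns a sorted tuple of the surviving interval classes rather than the paper's ⊥-padded notation):

```python
def vintcc(bass, upper):
    """Vertical interval class combination: semitone intervals mod 12 above
    the bass, dropping unison/octave doublings of the bass, collapsing
    repeated intervals, and treating permutations as equivalent."""
    intervals = {(p - bass) % 12 for p in upper}
    intervals.discard(0)               # exclude doublings of the lowest part
    return tuple(sorted(intervals))

a = vintcc(48, [52, 55, 60])           # C major triad: (4, 7)
b = vintcc(48, [55, 52, 60])           # a permutation reduces identically
c = vintcc(50, [54, 54, 60])           # repeated third collapses: (4, 10)
```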

To relate each combination to an underlying tonic, the viewpoint csdc represents vertical sonorities as combinations of chromatic scale degrees that are intended to approximate Roman numerals. The viewpoint csdc includes the chromatic scale degrees derived from csd as combinations of two, three or four instrumental parts. Here, the number of possibilities increases exponentially to 13^4 - 13, or 28,548 combinations, since the cello part is now encoded explicitly in combinations containing all four parts. Rather than treating permutable combinations as equivalent (e.g. ⟨0, 4, 7, ⊥⟩ and ⟨4, 7, 0, ⊥⟩), as was done for vintcc, it will also be useful to retain the chromatic scale degree in the lowest instrumental part in csdc and only permit permutations in the upper parts. Excluding voice doublings and permitting permutations in the upper parts reduces the potential domain of csdc to 2784, though in the corpus the domain reduced yet further to 688 distinct combinations.

Finally, a composite viewpoint was also created to represent those viewpoint models characterising pitch-based (i.e. melodic and harmonic) expectations more generally. To simulate the cognitive mechanisms underlying melodic segmentation, Pearce, Müllensiefen, et al. (2010) found it beneficial to combine viewpoint predictions for basic attributes like chromatic pitch, inter-onset interval, and offset-to-onset interval by multiplying the component probabilities to reach an overall probability for each note in the sequence as the joint probability of the individual basic attributes being predicted. Following their approach, the viewpoint model composite represents the product of the selection viewpoint model from the first violin (to represent melodic expectations) and the csdc viewpoint model (to represent harmonic expectations) for each unique onset time for which a note and chord event appear in the corpus. In this case, csdc was preferred to vintcc in the composite model because the former viewpoint explicitly encodes the chromatic scale-degree successions in the lowest instrumental part along with the relevant scale degrees from the upper parts.

    5.3. Long-term vs. short-term

To improve model performance, IDyOM separately estimates and then combines two subordinate models trained on different subsets of the corpus for each viewpoint: a long-term model (LTM), which is trained on the entire corpus to simulate long-term, schematic knowledge; and a short-term model (STM), which is initially empty for each individual composition and then is trained incrementally to simulate short-term, dynamic knowledge (Pearce & Wiggins, 2012). As a result, the long-term model reflects inter-opus statistics from a large corpus of compositions, whereas the short-term model only reflects intra-opus statistics, some of which may be specific to that composition (Conklin & Witten, 1995; Pearce & Wiggins, 2004). Like the STM, the LTM may also be slightly improved by incrementally training on the composition being predicted, a configuration called LTM+. However, the STM only discards statistics when it reaches the end of the composition, so it far surpasses the supposed upper limits for short-term and working memory of around 10–12 s (Snyder, 2000), sometimes by several minutes. What is more, the STM should be irrelevant for the present purposes, since cadences exemplify the kinds of inter-opus patterns that listeners are likely to store in long-term memory. Thus, we have elected to omit the STM in the analyses that follow and only present the probability estimates from LTM+.

    5.4. Performance evaluation

Context models like IDyOM depend on a training set and a test set, but in this case the corpus will need to serve as both. To accommodate small corpora like this one, IDyOM employs a resampling approach called k-fold cross-validation (Dietterich, 1998), using cross entropy as a measure of performance (Conklin & Witten, 1995). The corpus is divided into k disjoint subsets containing the same number of compositions, and the LTM+ is trained k times on k - 1 subsets, each time leaving out a different subset for testing. IDyOM then computes an average of the k cross entropy values as a measure of the model's performance. Following Pearce and Wiggins (2004), we use 10-fold cross-validation for the models that follow.
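The resampling procedure can be sketched as follows (the fold-assignment scheme and the toy scoring function are our own inventions; a real scorer would train IDyOM on the training folds and return the held-out cross entropy):

```python
def kfold_cross_entropy(pieces, k, train_and_score):
    """k-fold cross-validation: train on k-1 folds, score the held-out
    fold, and average the k cross entropy estimates."""
    folds = [pieces[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        scores.append(train_and_score(train, test))
    return sum(scores) / k

# Toy scorer standing in for training a model and measuring cross entropy.
avg = kfold_cross_entropy(list(range(10)), 5, lambda tr, te: float(len(te)))
```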

    6. Computational experiments

    6.1. Experiment 1

The perfect authentic and half cadence categories account for 206 of the 245 cadences from the collection, so it seems reasonable that listeners with sufficient exposure to music of the classical style will form schematic expectations for the terminal events of exemplars from these two categories. What is more, if cadences are the most predictable formulæ in all of tonal music, we should expect to find lower IC estimates for the terminal events from the aforementioned cadence categories compared to those from non-cadential closing contexts even if they both share similar or even identical terminal events. Thus, Experiment 1 examines the hypothesis that cadences are more predictable than their non-cadential counterparts.

6.1.1. Analysis

To compare the PAC and HC categories against non-cadential contexts exhibiting varying degrees of closure or stability, each of the viewpoints estimated by IDyOM was analysed for the terminal note events from the first violin and cello (represented by the viewpoints cpitch, melint, csd, and selection) and the terminal chord events from the entire texture (represented by the viewpoints vintcc, csdc, and composite) using a one-way analysis of variance (ANOVA) with a three-level between-groups factor called closure. To examine the IC estimates for the first (tonic) type, tonic closure consists of three levels: PAC, which consists of the IC estimates for the terminal events from the 122 exemplars of the PAC category; tonic, which consists of an equal-sized sample of events selected randomly from the corpus that appear in strong metric positions (i.e. appearing at the tactus level; see Sears, 2016) and feature tonic harmony in root position and any scale degree in the soprano; and non-tonic, which again consists of an equal-sized sample of events selected randomly from the corpus that appear in strong metric positions, but that feature any other harmony and any other scale degree in the soprano.

To examine the IC estimates for the second (dominant) type, dominant closure was designed in much the same way. HC consists of the IC estimates for the terminal events from the 84 exemplars of the HC category, while the other two levels consist of equal-sized samples of non-cadential events selected at random. Events from dominant appear in strong metric positions and feature dominant harmony in root position with any scale degree in the soprano, while events from other appear in strong metric positions but exclude events featuring tonic or dominant harmonies in root position. The assumption behind this additional exclusion criterion is that tonic events in root position are potentially more predictable than root-position dominants in half-cadential contexts (an assumption examined in greater detail in Experiment 2), so it was necessary here to provide a condition that allows us to compare the IC estimates for the terminal events in half-cadential contexts against those featuring other, presumably less stable harmonies and scale degrees.

For every between-groups factor examined in the experiments reported here, Levene's test of the equality of variances revealed significant differences between groups for nearly every viewpoint model. Thus, we employ an alternative to Fisher's F ratio that is generally robust to heteroscedastic


data, called the Welch F ratio (Welch, 1951). To determine the effect size both for the Welch F ratio and for the planned comparisons described shortly, we use Cohen's (2008) notation for a common effect size measure called estimated ω².
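For reference, the Welch (1951) statistic can be computed directly from the group summaries, as in the sketch below (`welch_f` is our own name, and a production analysis would use a vetted statistics package rather than this hand-rolled version):

```python
def welch_f(groups):
    """Welch's (1951) heteroscedastic one-way F ratio; returns (F, df1, df2)."""
    k = len(groups)
    ns = [len(g) for g in groups]
    means = [sum(g) / n for g, n in zip(groups, ns)]
    variances = [sum((x - m) ** 2 for x in g) / (n - 1)
                 for g, m, n in zip(groups, means, ns)]
    w = [n / v for n, v in zip(ns, variances)]     # precision weights n_j / s_j^2
    grand = sum(wj * mj for wj, mj in zip(w, means)) / sum(w)
    a = sum(wj * (mj - grand) ** 2 for wj, mj in zip(w, means)) / (k - 1)
    lam = sum((1 - wj / sum(w)) ** 2 / (nj - 1) for wj, nj in zip(w, ns))
    f = a / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    return f, k - 1, (k ** 2 - 1) / (3 * lam)

# Identical groups: no between-group variance, so F = 0.
f, df1, df2 = welch_f([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
```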

To address more specific hypotheses about the potential differences in the IC estimates for the terminal events from cadential and non-cadential contexts, each model also includes two planned comparisons that do not assume equal variances: the first to determine whether the IC estimates from the corresponding cadence category differ significantly from the two non-cadential levels (Cadences vs. Non-Cadences), and the second to determine whether the IC estimates from the corresponding cadence category differ significantly from the second (tonic or dominant) level of closure (PAC vs. Tonic or HC vs. Dominant). Unfortunately, these additional tests increase the risk of committing a Type I error, so we apply a Bonferroni correction to the planned comparisons.

6.1.2. Results

The top bar plots in Figure 2 display the mean IC estimates for the terminal note event in the first violin (left) and cello (right) for each level of tonic closure. Table 4 presents the omnibus statistics and planned comparisons. Beginning with the first violin, one-way ANOVAs of the IC estimates revealed a main effect for the viewpoints melint, csd, and the optimised combination selection, but the baseline viewpoint, cpitch, was not significant. Mean IC estimates also increased significantly from PAC to the non-cadential levels of tonic closure for melint, csd, and selection. Although this trend also emerged for the second planned comparison between PAC and tonic, only the melint model revealed a significant effect. Thus, the viewpoint models for the first violin demonstrated that terminal note events from cadential contexts are more predictable than those from non-cadential contexts.

For the cello, one-way ANOVAs revealed a main effect of tonic closure for every viewpoint, but the direction of the effect was reversed. Mean IC estimates decreased in every model from PAC to the non-cadential levels of tonic closure, as well as from PAC to tonic. Thus, contrary to our predictions, the terminal events in the cello from cadential contexts were actually less predictable than those from non-cadential contexts.

The bottom-left bar plot in Figure 2 displays the mean IC estimates for the terminal chord event (represented by vintcc and csdc) for each level of the between-groups factor. One-way ANOVAs again revealed a main effect of tonic closure for vintcc and csdc, with the mean IC estimates increasing from PAC to the non-cadential levels of tonic closure. The second planned comparison comparing PAC to tonic was not significant for either viewpoint model, however. Thus, for both models the terminal chord events from cadential contexts were more predictable than those from non-cadential contexts.

To represent the predictability of the harmony and melody in a single IC estimate for each note/chord event, we created a composite viewpoint that reflects the joint probability of csdc and selectionvl1. The bottom-right line plot in Figure 2 displays the mean IC estimates for the terminal composite event for each level of tonic closure. In this case, the one-way ANOVA revealed a significant main effect, with the mean IC estimates increasing from PAC to the non-cadential levels of closure, and the increase from PAC to tonic was marginally significant. As a result, composite demonstrated an ascending staircase for the levels of tonic closure, with PAC receiving the lowest IC estimates and non-tonic receiving the highest IC estimates.

The top bar plots in Figure 3 display the mean IC estimates for the terminal note event in the first violin (left) and cello (right) for each level of dominant closure. Table 5 presents the omnibus statistics and planned comparisons. For the first violin, one-way ANOVAs revealed a main effect for every viewpoint model. As expected, the mean IC estimates also increased significantly from HC to the non-cadential levels of dominant closure for every model. The increase from HC to dominant was also significant for cpitch, melint, and selection, but not for csd.

For the cello, the mean IC estimates demonstrated a significant effect of dominant closure for csd, but the other viewpoint models were not significant. For csd, the mean IC estimates increased significantly from HC to the non-cadential levels. A similar trend also emerged for the second planned comparison between HC and dominant, but the effect was not significant. For the viewpoint models representing harmonic progressions, the mean IC estimates revealed main effects of dominant closure for vintcc and csdc, suggesting that the terminal note and chord events represented in the cello and the entire multi-voiced texture are more predictable in half-cadential contexts than in non-cadential contexts. The first planned comparison comparing HC with the two non-cadential levels was not significant for these viewpoint models, however. Finally, the composite viewpoint demonstrated a significant main effect of dominant closure, with the mean IC estimates increasing significantly from HC to the non-cadential levels, but not from HC to dominant.

6.1.3. Discussion

Both between-groups factors demonstrated significantly lower mean IC estimates for the terminal events from


• JOURNAL OF NEW MUSIC RESEARCH 13

Figure 2. Top: Bar plots of the mean information content (IC) estimated for the terminal note event in the first violin (left) and cello (right) for each level of tonic closure. Viewpoints include cpitch, melint, csd, and an optimised combination called selection, which represents melint and the linked viewpoint csd⊗cpitch. Bottom left: Bar plot of the mean IC estimated for the terminal vintcc and csdc for each level of tonic closure. Bottom right: Line plot of the mean IC estimated for the combination called composite, which represents the joint probability of selectionvl1 and csdc. Whiskers represent ±1 standard error.

Table 4. Analysis of variance and planned comparisons predicting the information content estimates from all viewpoint models with tonic closure.

Viewpoint | Omnibus (df, Welch F, est. ω², p) | PAC vs. Non-Cadence (df, t, r, p) | PAC vs. Tonic (df, t, r, p)

Note events, Violin 1:
cpitch | 241.99, 1.56, .003, NS | 241.58, 1.43, .09, NS | 241.99, 0.73, .05, NS
melint | 238.65, 5.80, .03, .003 | 289.44, 3.38, .19, .002 | 239.31, 2.47, .16, .029
csd | 238.43, 7.83, .04


Figure 3. Top: Bar plots of the mean information content (IC) estimated for the terminal note event in the first violin (left) and cello (right) for each level of dominant closure. Viewpoints include cpitch, melint, csd, and an optimised combination called selection, which represents melint and the linked viewpoint csd⊗cpitch. Bottom left: Bar plot of the mean IC estimated for the terminal vintcc and csdc for each level of dominant closure. Bottom right: Line plot of the mean IC estimated for the combination called composite, which represents the joint probability of selectionvl1 and csdc. Whiskers represent ±1 standard error.

Table 5. Analysis of variance and planned comparisons predicting the information content estimates from all viewpoint models with dominant closure.

Viewpoint | Omnibus (df, Welch F, est. ω², p) | HC vs. Non-Cadence (df, t, r, p) | HC vs. Dominant (df, t, r, p)

Note events, Violin 1:
cpitch | 164.25, 4.82, .02, .009 | 191.26, 3.11, .22, .004 | 165.44, 2.55, .19, .024
melint | 159.52, 5.89, .03, .003 | 227.92, 3.42, .22, .002 | 146.00, 2.99, .24, .007
csd | 162.09, 8.14, .04


Caplin's typology. We might hypothesise, for example, that the strength and specificity of our schematic expectations formed in prospect, and their subsequent realisation in retrospect, contribute to the perception of cadential strength, where the most expected (i.e. probable) endings are also the most complete or closed. From this point of view, the probabilities estimated by IDyOM might correspond with models of cadential strength advanced in contemporary cadence typologies.

    6.2. Experiment 2

Recall that the cadence collection consists of exemplars from five categories in Caplin's typology: PAC, IAC, HC, DC, and EV. Sears (2015) recently classified models that estimate the closural strength of these categories into two general types: those that relate every cadential category to one essential prototype, called the 1-Schema model (Latham, 2009; Schmalfeldt, 1992); and those that distinguish the categories according to whether they allow listeners to form expectations as to how they might end, called the Prospective Schemas model (Sears, 2015) (see Section 2). Experiment 2 directly compares these two models of cadential strength.

6.2.1. Analysis

To compare the mean IC estimates for the terminal events from each cadence category, each of the viewpoints was again analysed for the terminal note events from the first violin and cello and the terminal chord events from the entire texture using a one-way ANOVA with a five-level between-groups factor called cadence category (PAC, IAC, HC, DC, and EV). To examine the potential differences in the IC estimates for the terminal events from each cadence category, each model includes two planned comparisons that do not assume equal variances, with a Bonferroni correction applied to the obtained statistics. In the first comparison, each level of cadence category was coded to represent two models of cadential strength: Prospective Schemas (PAC–IAC–HC–DC–EV) and 1-Schema (PAC–IAC–DC–EV–HC). Polynomial contrasts with linear and quadratic terms were then included to estimate the goodness-of-fit for each model. In what follows, we report the contrast whose linear or quadratic trend accounts for the greatest proportion of variance in the outcome variable. The second comparison examines the hypothesis that the genuine cadence categories in Caplin's typology elicit lower IC estimates on average than the cadential deviations (Genuine vs. Deviations).
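The polynomial contrasts amount to weighting the five ordered category means with standard orthogonal coefficients; a large linear contrast indicates a monotonic trend, a large quadratic contrast a U- or inverted-U shape. The helper below is our own sketch of the idea (not the authors' code), and the IC means are hypothetical:

```python
import numpy as np

# Standard orthogonal polynomial coefficients for five ordered levels
LINEAR = np.array([-2, -1, 0, 1, 2])
QUADRATIC = np.array([2, -1, -2, -1, 2])

def contrast_value(category_means, weights):
    """Weighted sum of level means; sign and magnitude indicate the trend."""
    return float(np.dot(weights, category_means))

# Hypothetical mean IC estimates ordered PAC, IAC, HC, DC, EV:
means = np.array([2.0, 2.5, 3.0, 3.5, 4.0])  # a perfectly linear increase
```

For this perfectly linear increase, the linear contrast is positive (5.0) while the quadratic contrast is exactly zero, which is the pattern the Prospective Schemas model predicts for the first violin.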

6.2.2. Results

Figure 4 displays line plots of the mean IC estimates for the terminal note event in the first violin (left) and cello

(right) for each level of cadence category. Table 6 presents the omnibus statistics and planned comparisons. For the first violin, the mean IC estimates revealed a main effect for the viewpoints cpitch, csd, and selection, but not for melint. Moreover, the best-fitting polynomial contrast revealed a positive (increasing) linear trend in the Prospective Schemas model (i.e. from the PAC to the EV categories) for every viewpoint model. The genuine cadence categories also received lower mean IC estimates than the cadential deviations in every model.

For the cello, the IC estimates also revealed a main effect of cadence category for every viewpoint model, and the Prospective Schemas model again produced the best fit, with polynomial contrasts revealing positive quadratic trends for cpitch and melint, but positive linear trends for csd and selection. The quadratic trend exhibited in the cpitch and melint models for the cello probably reflects the statistical preference for smaller melodic intervals in the corpus, resulting in lower mean IC estimates for categories that feature stepwise motion in the bass (HC and DC), and higher estimates for categories featuring large leaps (PAC, IAC, and EV). This trend was not demonstrated in the csd and selection viewpoint models, however, as the DC category received higher IC estimates relative to the other categories in these models, thereby resulting in positive linear trends. Presumably, the HC category received the lowest IC estimates on average because scale-degree successions like 4–5 are more common than successions like 5–1. And yet successions like 5–6 are also evidently less common than 5–1, hence the higher IC estimates for the DC category and the increasing linear trend, PAC–IAC–DC–EV. Finally, as expected, the genuine cadence categories received lower mean IC estimates than the cadential deviations in every model.

The left line plot in Figure 5 displays the mean IC estimates for the terminal chord event (represented by vintcc and csdc) for each level of cadence category. As before, the IC estimates revealed a main effect for vintcc and csdc, and the best-fitting polynomial contrast revealed a positive linear trend in the Prospective Schemas model for both models. The genuine cadence categories also received lower mean IC estimates than the cadential deviations for vintcc and csdc.

It is also noteworthy that the terminal events from the EV category generally received lower IC estimates than those from the DC category. Recall that evaded cadences are typically characterised not by a deviation in the harmonic progression (though such a deviation may take place), but rather by a sudden interruption in the projected resolution of the melody. In this collection, 10 of the 11 evaded cadences feature tonic harmony either in root position or in first inversion at the moment of



Figure 4. Line plots of the mean IC estimated for the terminal note event in the first violin (left) and cello (right) for each cadence category. Viewpoints include cpitch, melint, csd, and an optimised combination called selection, which represents melint and the linked viewpoint csd⊗cpitch. Whiskers represent ±1 standard error.

Table 6. Analysis of variance and planned comparisons predicting the information content estimates from all viewpoint models with cadence categories.

Viewpoint | Omnibus (df, Welch F, est. ω², p) | Trend | Prospective Schemas (df, t, r, p) | Genuine vs. Deviations (df, t, r, p)

Note events, Violin 1:
cpitch | 32.42, 3.19, .03, .026 | Linear | 13.47, 3.40, .68, .013 | 17.19, 3.64, .66, .006
melint | 31.57, 2.34, .02, NS | Linear | 12.13, 3.00, .65, .032 | 14.14, 2.94, .62, .033
csd | 29.95, 3.02, .03, .033 | Linear | 18.99, 3.43, .62, .008 | 30.51, 3.17, .50, .009
selection | 29.80, 3.77, .04, .013 | Linear | 16.38, 3.85, .69, .004 | 26.03, 3.66, .58, .003

Cello:
cpitch | 31.18, 8.86, .11


Figure 5. Left: Line plot of the mean IC estimated for the terminal vintcc and csdc for each cadence category. Right: Line plot of the mean IC estimated for the combination called composite, which represents the product of selectionvl1 and csdc. Whiskers represent ±1 standard error.

of cadential excerpts from Mozart's keyboard sonatas. The genuine cadence categories and the cadential deviations received the highest and lowest completion ratings, respectively, which in light of the present findings suggests that the perceived strength of the cadential ending corresponds with the strength of the schematic expectations it generates in prospect. But recall that the perception of closure also depends on the cessation of expectations following the terminal events of the cadence. That is, the strength of the potential boundary between two sequential events results in part from the increase in information content (or decrease in probability) from the first to the second event (i.e. the last event of one group to the first event of the following group). The preceding analyses examined terminal events from cadential and non-cadential contexts in isolation, so Experiment 3 considers the role played by schematic expectations in boundary perception and event segmentation by examining the time course of IC estimates surrounding the terminal events of the cadence.

    6.3. Experiment 3

Experiment 3 examines two claims about the relationship between expectancy and boundary perception: (1) that the terminal event of a group is the most expected (i.e. predictable) event in the surrounding sequence; and (2) that the next event in the sequence is comparatively unexpected (i.e. unpredictable). Again, the hypothesis here is that unexpected events engender prediction errors that lead the perceptual system to segment the event stream into discrete chunks (Kurby & Zacks, 2008). The terminal events from the PAC, IAC and HC categories should be highly predictable, and prediction errors for the comparatively unpredictable events that follow should force listeners to segment the preceding cadential material. For the cadential deviations, however, prediction errors should occur at, rather than following, the terminal events of the cadence.
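The segmentation claim in (1) and (2) amounts to marking a boundary wherever information content jumps sharply from one event to the next. The toy function below is our own illustration of that idea; the threshold value is arbitrary and purely for demonstration:

```python
def boundaries_from_ic(ic_series, threshold=1.0):
    """Return indices i where IC rises by more than `threshold` from event
    i to event i+1, i.e. where a comparatively unpredictable event follows
    a predictable one, suggesting a perceptual boundary after event i."""
    return [i for i in range(len(ic_series) - 1)
            if ic_series[i + 1] - ic_series[i] > threshold]
```

For a hypothetical IC trace such as [3.1, 2.0, 1.2, 4.5, 3.0] (bits), the low-IC event at index 2 plays the role of a cadential terminus, and the jump to the following event marks the boundary.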

6.3.1. Analysis

In addition to the between-groups factor of cadence category, Experiment 3 includes a between-groups factor of time that consists of three levels: et, which represents the terminal event of the group, and et−1 and et+1, which represent the immediately surrounding events. With more complex designs like this one, the number of significance tests can become prohibitively large, so we have restricted the investigation to the four viewpoints that serve as reasonable approximations of the two-voice framework characterising the classical cadence: selectionvl1 and selectionvc to represent the soprano and bass, respectively, and vintcc and csdc to each represent the entire texture. Experiment 3 analyses these viewpoints using a 5 × 3 two-way ANOVA with between-groups factors of cadence category (PAC, IAC, HC, DC, and EV) and time (et−1, et, et+1).

By moving from one to two between-groups factors, the number of omnibus statistics and planned comparisons necessarily increases, and since Levene's test also revealed heteroscedastic groups for all four of the 5 × 3 viewpoint ANOVAs, the risk of committing a Type I error is considerably greater here than in the previous experiments. In this case, the two hypotheses mentioned above concern the interaction between cadence category and time: namely, whether the IC estimates for each cadence category increase or decrease significantly from one event to the next. Thus, the following analysis ignores the main effects and concentrates only on the interaction term of the two-way ANOVA. If the interaction is significant, we report simple main effects, which represent one-way ANOVAs with time as a factor for each level of cadence category. Finally, to examine the potential differences in the IC estimates for the levels of time for each cadence category, each simple main effect included two planned comparisons with Bonferroni correction that do not assume equal variances: (1) whether the IC estimate for et is lower on average than the surrounding events, et−1 and et+1 (et vs. Surrounding); and (2) whether the IC



Figure 6. Time course of the mean IC estimated for the events surrounding the terminal note event in the first violin (top) and cello (bottom) for each cadence category using the viewpoint selection, which represents melint and the linked viewpoint csd⊗cpitch. The statistical analysis pertains to event numbers −1, 0 (or Terminus), and 1. Whiskers represent ±1 standard error.

estimate for et+1 is higher on average than the estimate for et (et vs. et+1).

6.3.2. Results

Figure 6 displays line plots of the mean IC estimates for the note events over time in the first violin (top) and cello (bottom) for each level of cadence category. Table 7 presents the omnibus statistics for the simple main effects and planned comparisons. To gain a more global picture of the IC time course, the line plots present the mean IC estimates for the seven-event sequence surrounding the terminal event of each cadence category, but the ANOVA models only consider the three events, −1, 0 (or Terminus), and 1. For the first violin, a two-way ANOVA of the mean IC estimates revealed a significant interaction between cadence category and time, F(8, 718) = 3.88, p < .001, est. ω² = .03. The mean IC estimates for each level of cadence category revealed simple main effects for PAC and HC, but the remaining categories were not significant.

Despite the non-significant simple main effects for the IAC, DC, and EV categories, simple planned comparisons revealed significant trends over time for every cadence category. As expected, the terminal event in the first violin received lower IC estimates on average than the immediately surrounding events for the genuine cadence categories and the DC category, though the latter trend was marginal. Thus, for cadences featuring melodies that resolve to presumably stable scale degrees like 1 or 3, the terminal event of the group is also the most predictable event in the sequence.

For the PAC, IAC, HC, and DC categories, the mean IC estimates increased significantly from et to et+1, thereby supporting the view that the strength of the perceptual boundary depends on the increase in information content following the terminal event of the cadence. And yet since the EV category replaces the expected terminal event in the melody with material that clearly initiates the subsequent process (often by leaping up to an unexpected scale degree like 5), one might therefore predict that a significant increase in information content should occur at (and not following) the expected terminal event of the group. This is exactly what we observe, with the expected terminal events from the EV category receiving the highest mean IC estimate in the sequence (see Table 7). Thus, the pattern of results from selectionvl1 is entirely consistent with the two main hypotheses: (1) the terminal event of a group is the most predictable event in the sequence, and (2) the next event is comparatively unpredictable. Here, the mean IC estimates for the first violin increased significantly following the predicted boundary for every cadence category in the collection.

For the cello, a two-way ANOVA of the mean IC estimates revealed a significant interaction between cadence category and time, F(8, 717) = 13.02, p < .001, est. ω² = .12. Excepting IAC, the mean IC estimates also revealed simple main effects for every level of cadence category. As expected, the terminal event in the cello received lower IC estimates on average than the immediately surrounding events for the HC category, but the trend was reversed for the PAC, DC and EV categories, and the trend for the IAC category was not significant.

For the HC category at least, the terminal event was also the most predictable event in the sequence. Furthermore, the significant increase in information content in the cello at the expected terminal event in the DC and EV categories is consistent with the behaviour of cadential deviations. For the former category, the bass typically resolves deceptively to scale degrees like 6, thereby violating expectations for 1, whereas the latter category evades the expected resolution by leaping to other scale degrees to support harmonies like I6. The significant increase in information content for the terminal event of the PAC category is somewhat more surprising, however. Recall from Experiment 2 that the mean IC estimates for the terminal events from each cadence category in the cello demonstrated a positive quadratic trend, with the HC category receiving the lowest IC estimates (see Figure 4). In that case, we suggested that small melodic intervals appear more abundantly in the corpus than large intervals, resulting in higher IC estimates for categories featuring large leaps (PAC, IAC and EV). From this point of view, it seems reasonable that the mean IC estimates for the cello would increase at et for categories featuring



Table 7. Analysis of variance and planned comparisons predicting the information content estimates from selectionvl1, selectionvc, and vintcc over time for each cadence category.

Viewpoint | Omnibus (df, Welch F, est. ω², p) | et vs. Surrounding (df, t, r, p) | et vs. et+1 (df, t, r, p)

Note events, selectionvl1:
PAC | 236.76, 67.63, .35


Figure 7. Time course of the mean IC estimated for the events surrounding the terminal chord event for vintcc (top) and csdc (bottom) for each cadence category. The statistical analysis pertains to event numbers −1, 0 (or Terminus), and 1. Whiskers represent ±1 standard error.

event in the sequence; and (2) the next event is comparatively unexpected (i.e. unpredictable).

    7. Conclusions

This study examined three claims about the relationship between expectancy and cadential closure: (1) terminal events from cadential contexts are more predictable than those from non-cadential contexts; (2) models of cadential strength advanced in cadence typologies like the one employed here reflect the formation, violation, and fulfilment of schematic expectations; and (3) a significant decrease in predictability follows the terminal note and chord events of the cadential process. To that end, we created a corpus of Haydn string quartets to serve as a proxy for the musical experiences of listeners situated in the classical style, selected a number of viewpoints to represent suitable (i.e. cognitively plausible) representations of the musical surface, and then employed IDyOM (an n-gram model that predicts the next note or chord event in a musical stimulus through unsupervised learning of sequential structure) to simulate the formation of schematic expectations during music listening.
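IDyOM itself is a sophisticated variable-order model with short- and long-term components and escape-based smoothing; a stripped-down first-order analogue nonetheless conveys the core quantity it outputs, the information content IC(e) = −log₂ P(e | context). The class below is our own toy sketch, not IDyOM's implementation; add-one smoothing stands in for IDyOM's more elaborate scheme:

```python
import math
from collections import Counter, defaultdict

class BigramICModel:
    """Toy first-order analogue of an IDyOM viewpoint model: learn bigram
    counts from sequences without supervision, then report information
    content (in bits) for an event given the preceding one."""

    def __init__(self, sequences):
        # Event alphabet observed across the training sequences
        self.alphabet = sorted({e for seq in sequences for e in seq})
        self.counts = defaultdict(Counter)
        for seq in sequences:
            for prev, nxt in zip(seq, seq[1:]):
                self.counts[prev][nxt] += 1

    def prob(self, prev, nxt):
        """Add-one-smoothed conditional probability P(nxt | prev)."""
        c = self.counts[prev]
        return (c[nxt] + 1) / (sum(c.values()) + len(self.alphabet))

    def ic(self, prev, nxt):
        """Information content in bits: -log2 P(nxt | prev)."""
        return -math.log2(self.prob(prev, nxt))
```

Trained on scale-degree strings such as "54321", a common cadential continuation like 2→1 receives lower IC (it is more expected) than a rare one like 2→5, mirroring how IDyOM's estimates distinguish cadential from non-cadential terminal events.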

The findings from Experiment 1 indicate that the terminal note and chord events from perfect authentic cadences are more predictable than (1) non-cadential events featuring tonic harmony in root position and supporting any scale degree in the soprano, and (2) non-cadential events featuring any other harmony and any other scale degree in the soprano. For half cadences, significant effects were limited to the chord models (vintcc, csdc, and composite) and the csd viewpoint model, but the terminal events from half-cadential contexts were

still more predictable than those from non-cadential contexts. Experiment 2 provided strong evidence in support of the Prospective Schemas model of cadential strength (PAC–IAC–HC–DC–EV), with the genuine cadence categories (PAC, IAC, HC) and cadential deviations (DC, EV) in Caplin's typology eliciting the lowest and highest IC estimates on average, respectively. Finally, the results from Experiment 3 indicated that unexpected events (like those directly following the terminal note and chord events from genuine cadences) engender prediction errors that presumably lead the perceptual system to segment the event stream immediately following the cadential process.

Taken together, the reported findings support the role of expectancy in models of cadential closure, with the most complete or closed cadences also serving as the most expected or probable. Nevertheless, future studies will need to address a number of limitations in the current investigation. First, the rather meager sample size for three of the five cadence categories in the collection (as well as the corpus more generally) casts some doubt upon the generalisability of the reported findings. That the estimates from IDyOM correspond so well with theoretical predictions suggests that these findings may be robust to issues of sample size, but future studies should look to expand the collection considerably, as well as to consider how the relationship between expectancy and cadential closure varies for other genres and style periods.

Second, we selected individual viewpoints if existing theoretical or experimental evidence justified their inclusion, such as melint and csd in melodic contexts (Dowling, 1981; Krumhansl, 1990), and vintcc and csdc in harmonic contexts (Gjerdingen, 2007). Yet in a few instances, cpitch (which was only included among the melodic viewpoints to serve as a baseline for model comparison) produced similar results (see Experiment 2). One reason for this finding is that many of the viewpoints characterising melodic organisation in Haydn's string quartets systematically covary, such that statistical regularities governing the more cognitively plausible viewpoints (e.g. melint and csd) also appear in the less plausible ones (cpitch), albeit more weakly. In a melody composed in the key of C major, for example, B presumably functions as the leading tone, not because the twelve-tone chromatic universe regularly features this two-note sequence regardless of the particular tonal context, but because C typically follows B in the key of C major, forming one of the many statistical associations characterising the tonal system. Nevertheless, the tendency for small melodic intervals and the prevalence of certain keys in tonal music (wherein B is more likely to progress to C than to, say, A) ensures that IDyOM will learn statistical regularities in basic viewpoints like cpitch that are correlated to those found in other melodic viewpoints like melint or csd.

What is more, rather than assume that listeners expect specific intervals in a melodic sequence, as is IDyOM's approach using melint, existing models of melodic expectation typically theorise that listeners expect smaller melodic intervals regardless of the preceding context, a principle known as pitch proximity (e.g. Cuddy & Lunney, 1995; Margulis, 2005; Narmour, 1990; Schellenberg, 1997). Yet in this study, we only assume that listeners form expectations on the basis of statistical regularities among melodic intervals in a sequence. Vos and Troost (1989) have demonstrated, for example, that small intervals are far more common than large intervals in Western tonal music, so it should not be surprising that IDyOM produces higher probability estimates for smaller intervals, just as do proximity-based models. The difference in these two approaches is thus theoretical, rather than empirical, in that IDyOM bases its predictions on a theory of implicit statistical learning, whereas proximity-based models also sometimes base their assumptions on other (sensory or psychoacoustic) mechanisms. It is certainly possible that these mechanisms influence the preference for smaller over larger intervals in Western tonal music (or indeed, the formation of expectations during music listening more generally), but we do not examine this assumption here.

Perhaps more importantly, the cross entropy estimates for the melodic models in this study indicate that IDyOM was more certain about its predictions for more cognitively plausible viewpoints like melint and csd than for less plausible ones like cpitch, thereby reinforcing the view that representations of the musical surface are cognitively plausible to the degree that they minimise prediction errors for future events (Pearce & Wiggins, 2012). Indeed, if prediction is the primary function of the brain (Hawkins & Blakeslee, 2004, p. 89), and listeners learn to compress information during processing by only retaining representations of the musical surface that minimise uncertainty (Pearce & Wiggins, 2012),