
STAT 157 FINAL PROJECT

PROBABILITY IN MUSIC

Composer Styles, Music Perception, Identification, and Generation

by

Xu Deng 22821315 Yufan Hu 23063102 Yuqiu Shen 23094429

December 2014

Table of Contents

I. INTRODUCTION
II. DATA
III. CLASSICAL COMPOSER STYLES AND PITCH PROFILES
    3.1 INTRODUCTION
    3.2 DATA SELECTION AND CLEANING
    3.3 STYLE ANALYSIS THROUGH PITCH PROFILES
        3.3.1 Central Pitch Profile
        3.3.2 Range Profile
        3.3.3 Proximity Profile
    3.4 SUMMARY
IV. EXPECTATION AND ERROR DETECTION
    4.1 INTRODUCTION
    4.2 DATA: MELODIC EXPECTATION STUDY
    4.3 INTERPRETATION
    4.4 IMPROVEMENTS
V. MUSIC PREDICTION AND IDENTIFICATION
    5.1 INTRODUCTION
    5.2 DATA
    5.3 LOGISTIC REGRESSION ANALYSIS
        5.3.1 Chinese Folk Song Model
        5.3.2 British Child Ballad Model
        5.3.3 Traditional African Music Model
    5.4 MODEL PREDICTION RESULTS
        5.4.1 Chinese Folk Song Prediction Result
        5.4.2 British Child Ballads Prediction Result
        5.4.3 Traditional African Music Prediction
    5.5 MEAN SQUARED ERROR ANALYSIS
    5.6 SUMMARY
VI. MARKOV CHAINS AND MUSIC GENERATION
    6.1 INTRODUCTION
    6.2 DATA
    6.3 MARKOV CHAIN APPROACH
        6.3.1 Model 1
        6.3.2 Model 2
    6.4 SUMMARY
VII. CONCLUSION
REFERENCES
APPENDIX I: MIDI NOTES CONVERTER
APPENDIX II: MARKOV CHAIN GENERATION RESULTS MODEL 1
APPENDIX III: MARKOV CHAIN GENERATION RESULTS MODEL 2
APPENDIX IV: SOURCE R CODES


I. Introduction

The topic of our project is inspired by our belief that the composition of music can be regarded as a complex system of probabilities. This system is shaped by human perception, and from it arise the expectations upon which music is built. Different musical styles are associated with different probability relationships, and in the hope of perceiving and understanding music from a new perspective, we carried out a statistical analysis of music. We find the topic interesting not only because music is a common interest of the group members, but also because it connects two subject areas that are rarely linked and offers a new interpretation of the meaning of music. David Temperley's Music and Probability [1] first introduced us to the musical data formats that are available for statistical analysis. It also inspired many of the topics we could explore and the analyses we could perform with musical data.

Starting from Music and Probability and continuing with academic papers on related topics, we narrowed a broad array of potential subjects down to the analyses we focus on in this project. The project consists of four parts. The first part analyzes classical composers and their styles by calculating statistical distributions of pitches. The second part gives background on the power of human cognitive perception in forming musical expectations. If humans can easily distinguish and predict music, we would like to see whether music identification is also possible with logistic regression, which is the subject of the third part. Finally, in the last part, we use a simple first-order Markov chain to test how well music can be generated from probability.

[1] Temperley, 2007.

II. Data

The data we used were mainly collected from the KernScore website, a library of virtual musical scores in various data formats. For this project, we use the Melisma format to carry out our analyses. Each musical data file in the Melisma format consists of a list of notes, each with an on-time, an off-time (both in milliseconds), and a pitch. The on-time is the start time of the note, the off-time is its end time, and pitches are given as MIDI numbers [2]. For instance, the integer 60 stands for middle C, the C of octave 4 (C4). Each integer from 0 to 127 stands for one pitch, and consecutive integers are a semitone apart: 61 is C#4 and 69 is the A of octave 4 (A4). A complete MIDI note conversion chart is included in Appendix I. Figure 1 shows the first five lines of a sample Melisma format note file.

Figure 1: Sample Melisma Format Note File
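As a rough illustration of the format described above, the following Python sketch parses note lines and converts MIDI numbers to note names. It is not the R code used in this project. The `Note <on-time> <off-time> <pitch>` line layout, the function names, and the sample values are assumptions made for illustration.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_name(pitch):
    """Convert a MIDI number (0-127) to a name such as 'C4', with 60 = C4."""
    if not 0 <= pitch <= 127:
        raise ValueError(f"MIDI pitch out of range: {pitch}")
    octave = pitch // 12 - 1
    return f"{NOTE_NAMES[pitch % 12]}{octave}"


def parse_note_lines(lines):
    """Parse lines of the assumed form 'Note <on_ms> <off_ms> <pitch>'."""
    notes = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or fields[0] != "Note":
            continue  # skip blank lines and anything that is not a note event
        on_ms, off_ms, pitch = (int(x) for x in fields[1:])
        notes.append((on_ms, off_ms, pitch))
    return notes


# Made-up sample data, not taken from the project files.
sample = ["Note 0 250 60", "Note 250 500 61", "Note 500 1500 69"]
for on_ms, off_ms, pitch in parse_note_lines(sample):
    print(on_ms, off_ms, pitch, midi_to_name(pitch))
```

The octave formula `pitch // 12 - 1` follows from the convention stated in the text that 60 is C4.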

To avoid errors caused by performance style, we used quantized files as opposed to performance files. Quantized files are generated directly from a score, whereas performance files are generated from a live performance on a MIDI keyboard. In quantized files, all time points (on-times and off-times) are rounded to multiples of a fixed grid value, such as 250 milliseconds. For example, at a tempo of 60 quarter-note beats per minute (bpm), a note that lasts 250 milliseconds (the difference between its off-time and on-time is 250) is a sixteenth note, and a note that lasts 1000 milliseconds is a quarter note [3]. If an obtained data file was inconsistent with this format, we used R to transform the data into the desired form.
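The duration arithmetic in the example above can be checked with a short Python sketch. The function name and the default tempo are hypothetical; the only assumption is that the beat is a quarter note, so one beat lasts 60000 / bpm milliseconds.

```python
from fractions import Fraction


def note_value(duration_ms, bpm=60):
    """Return the note length as a fraction of a whole note.

    Assumes the beat is a quarter note, so one beat lasts 60000 / bpm ms.
    """
    quarter_ms = Fraction(60000, bpm)
    return Fraction(duration_ms) / quarter_ms / 4


print(note_value(250))   # 1/16, a sixteenth note at 60 bpm
print(note_value(1000))  # 1/4, a quarter note at 60 bpm
```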

[2] http://newt.phys.unsw.edu.au/jw/notes.html
[3] https://www.msu.edu/course/asc/232/song_project/dectalk_pages/note_to_%20ms.html

III. Classical Composer Styles and Pitch Profiles

3.1 Introduction

In this section, we explore the role that key plays in melodies. As numerous psychological and musical experiments have shown, listeners perceive music in terms of notes and chords relative to the key rather than in absolute terms.

With this interest in key patterns, we present the pitch profiles of four representative classical composers: Bach, Beethoven, Chopin, and Mozart. Our analysis of pitch profiles is mainly inspired by David Temperley’s pitch model. The Krumhansl-Schmuckler key-finding algorithm [4], which is based on a set of “key-profiles” representing the stability or compatibility of each pitch class relative to each key, also shaped our thinking. On this theoretical foundation, we generate pitch profiles from three perspectives on the characteristics and functions of key in melodies: the central pitch profile, the range profile, and the proximity profile.

To see how key helps form various musical techniques, genres, and styles, we graph the pitch profiles of the selected composers and analyze the similarities and differences among them. Ideally, the results will match our assumption, based on Temperley’s pitch model, that well-loved melodies share similar statistical patterns, with approximately normal distributions like those in Temperley’s analysis of the Essen Folksong Collection.

3.2 Data Selection and Cleaning

The four composers, Bach, Mozart, Beethoven, and Chopin, were selected on a few principles: they come from similar time periods; they are well-known composers for whom we can obtain a large enough database for analysis; and their music covers similar genres, which excludes possible outliers due to systematic genre differences. Table 1 summarizes their background information.

[4] Krumhansl & Schmuckler, 1986; Schmuckler & Tomovski, 1995.

Composer            Country   Years       Musical Style & Characteristics
J.S. Bach           Germany   1685-1750   Baroque. Counterpoint and harmony.
Wolfgang Mozart     Austria   1756-1791   Significant counterpoint. Covers all genres.
Ludwig Beethoven    Germany   1770-1827   Large, extended architectonic structure.
Frédéric Chopin     Poland    1810-1849   Instrumental ballades, mazurkas, waltzes, etc.

Table 1: Summary of Classical Composers

Background knowledge of classical music and composer history matters because, for instance, music theory tells us that Chopin's music was strongly influenced by the classical traditions of J.S. Bach and Mozart. We therefore also expect to see statistical similarities (in fact, some characteristics of counterpoint) among these composers.

We gratefully obtained the data on the four composers’ pieces from a public music analysis project at the Carnegie Mellon School of Computer Science [5]. The data are a chosen subset of the KernScore data for classical composers. For each composer, we selected around 15 piano pieces, including both the left-hand and right-hand parts, for analysis. We examined and selected the data carefully, distinguishing between the quantized and performance files of the same piece. Because the original data are inconsistent in format and cannot be used directly, we wrote a few functions and commands in R to clean the data into the desired Melisma format. To carry out the range and proximity analyses, we added columns for the time difference between on-time and off-time and for the pitch difference between consecutive notes.
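The two derived columns described above can be sketched in Python as follows. This is an illustration only, not the R cleaning code used in the project; the tuple layout and the function name are assumptions.

```python
def add_derived_columns(notes):
    """notes: list of (on_ms, off_ms, pitch) tuples in file order.

    Returns rows (on_ms, off_ms, pitch, duration_ms, interval), where interval
    is the pitch minus the previous note's pitch (None for the first note).
    """
    rows = []
    previous_pitch = None
    for on_ms, off_ms, pitch in notes:
        interval = None if previous_pitch is None else pitch - previous_pitch
        rows.append((on_ms, off_ms, pitch, off_ms - on_ms, interval))
        previous_pitch = pitch
    return rows


# Made-up notes: C4, then up 4 semitones to E4, then down 2 to D4.
for row in add_derived_columns([(0, 250, 60), (250, 500, 64), (500, 1500, 62)]):
    print(row)
```

A positive interval means the melody moves up and a negative one means it moves down, which is the sign convention the proximity analysis relies on.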

[5] http://www.cs.cmu.edu/afs/cs.cmu.edu/user/sleator/public/music-analysis/notefiles/misc/

3.3 Style Analysis through Pitch Profiles

3.3.1 Central Pitch Profile

By creating pitch profiles, we would like to find preliminary answers to two questions: what kinds of pitch sequences make a likely melody, and do those sequences differ between musical styles? The analysis starts by finding the range into which notes usually fall. Within this range, the central pitch of a melody is taken to be not the tonal center but the center of the range. We also measure the spread of the pitches around the mean pitch of each composer's music. We observe that most notes fall in the octaves above C4, and Table 2 shows that all four composers have central pitches slightly above 60. This suggests that the composers emphasize the higher-pitched right-hand part of the score, probably because the main melody is usually carried there.

Composer     Central Pitch   SD
Bach         63              9.644854
Mozart       65              10.08715
Beethoven    65              10.14119
Chopin       62              9.405644

Table 2: Summary of Central Pitches
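A summary like Table 2 could be computed along the following lines. This Python sketch assumes that the central pitch is the mean pitch rounded to the nearest MIDI number and that the SD is the sample standard deviation (as R's `sd()` would give); the text does not state either choice explicitly, and the pitches below are made up.

```python
import statistics


def central_pitch_summary(pitches):
    """Return (central pitch, SD) for a list of MIDI pitches.

    Assumption: central pitch = mean pitch rounded to an integer;
    SD = sample standard deviation (n - 1 in the denominator).
    """
    mean_pitch = statistics.mean(pitches)
    return round(mean_pitch), statistics.stdev(pitches)


central, sd = central_pitch_summary([60, 62, 64, 65, 67])
print(central, sd)
```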

3.3.2 Range Profile

In the range profile, a second distribution is created, centered on the central pitch. The range profiles of the four composers are presented as histograms in Figure 2. The black curve is the smoothed histogram, and the blue curve is a normal density whose parameters are the mean and variance calculated from each composer's data sample. The red vertical line marks the central pitch of each composer.


Figure 2: Range profiles of 4 Classical Composers
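The comparison drawn in Figure 2, an empirical pitch histogram against a normal density with the sample mean and variance, can be sketched numerically in Python. The function name and the sample pitches are hypothetical, and the sketch uses the sample standard deviation, which may differ from the choice made for the figure.

```python
from collections import Counter
from statistics import NormalDist, mean, stdev


def range_profile(pitches):
    """Map each observed pitch to (relative frequency, fitted normal density)."""
    counts = Counter(pitches)
    total = len(pitches)
    fitted = NormalDist(mean(pitches), stdev(pitches))
    return {p: (counts[p] / total, fitted.pdf(p)) for p in sorted(counts)}


# Made-up pitches, for illustration only.
for pitch, (freq, density) in range_profile([60, 62, 62, 64]).items():
    print(pitch, round(freq, 3), round(density, 3))
```

Because MIDI pitches are one unit apart, the relative frequency of each pitch and the density at that pitch are on a comparable scale, so a plateau such as the one described for Bach would appear as frequencies that stay below the fitted peak while exceeding the fitted curve further out.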

The melodies of each composer can be viewed as series of notes generated from the range profile. Interestingly, the shape of a composer's range profile reflects the composer's musical style. Owing to Beethoven’s compositional style, which is characterized by large, extended architectonic structures with solid juxtaposition of different keys, the distribution of pitches in his compositions fits a normal distribution fairly well. The range profiles of Bach and Mozart, however, show a less pronounced bell-shaped peak and instead a flattened plateau of pitches close to the central pitch. In Bach's range profile especially, we observe an extended spread of pitches from about 50 to 70, which suggests that Bach uses pitches across this range with roughly equal preference and is more likely than Beethoven to use a pitch far from the central pitch. Mozart, who was influenced by Bach's compositional style, shows similar characteristics, although less prominently. To better understand this phenomenon, we take a closer look at Bach’s compositional characteristics and explore the connections between them.

We use the software Music Animation Machine to visualize a few of Bach’s famous works from the database we built. Below are snapshots from two representative works: Bach, “Great” Fugue in G minor, BWV 542; and Bach, Prelude 22, Book 2, The Well-Tempered Clavier, B-flat.

Figure 3: Bach's piano roll

The visualization of the sample pieces reminds us of Bach’s intensive use of counterpoint and fugue. In music, counterpoint is the relationship between voices that are harmonically interdependent (polyphony) yet independent in rhythm and contour. A fugue, similarly, is a contrapuntal composition in which a short melody or phrase (the subject) is introduced by one part, successively taken up by others, and developed by interweaving the parts. Figure 3 shows these techniques as sections of parallel notes that recur at different time intervals. Their frequent use results in the extended range of frequently used pitches. In brief, as the parallel pitch sequences in Figure 3 suggest, counterpoint leads Bach to repeat similar pitch sequences (the same intervals and patterns of ascending and descending melodies) at different pitch levels and times, so that the range profile is expanded systematically. From the point of view of music theory, Bach's wide range profile could also be attributed to his preference for a classical tradition of writing music in sets of pieces that collectively cover all the major and minor keys of the chromatic scale. Well-known examples include Bach’s The Well-Tempered Clavier and Chopin’s 24 Preludes, Op. 28, both of which are included in our dataset. As in the case of Bach, we believe that other composers' favored techniques also explain the shapes and characteristics of their range profiles.

3.3.3 Proximity Profile

Given the approximately normal shape of the range profiles, we next ask how pitches form melodies. From personal experience, we would guess that intervals between adjacent notes should be relatively small, so that a melody sounds smooth, with no significant disruption. This guess is backed by the work of other scholars. The phenomenon that small intervals usually occur between adjacent notes, known as pitch proximity, is demonstrated statistically by von Hippel and Huron in Why Do Skips Precede Reversals? The Effect of Tessitura on Melodic Structure [6]. It also reflects a human preference in cognitive and auditory perception, which has been demonstrated in several experiments [7].

The histograms of pitch intervals (the pitch difference between two consecutive notes) for the four composers are plotted in Figure 4. Table 3 summarizes the statistics of the proximity distributions.

Composer     Mean of intervals   SD of intervals
Bach          0.001257862        12.33653
Mozart       -0.002214417        10.25252
Beethoven    -0.002545069        12.14284
Chopin       -0.006589786        12.51439

Table 3: Statistical summary of Proximity Profile

[6] von Hippel & Huron, 2000.
[7] Miller & Heise, 1950; Schellenberg, 1996; Deutsch, 1999; Temperley, 2007.

Figure 4: Proximity Profiles of 4 Classical Composers

In our data sample, the mean pitch difference for all four composers is very close to 0, and the intervals between adjacent notes form an approximately normal distribution centered on 0. The range of pitch intervals is wide, from -40 to 40, because the note files we analyzed are polyphonic and include the music played by both hands. Compared with the histograms of the range profiles, the proximity profiles are fairly symmetric around 0, which suggests that for all four composers, a move to a higher pitch is on average about as likely as a move to a lower one. However, the shapes of the histograms differ between composers. For Mozart and Bach, there is a very high probability that the next note lands within 4 semitones of the previous note. An example of a 4-semitone interval is C4 to E4. The similarity between Mozart's and Bach's proximity profiles once again shows the influence of Bach on Mozart. Chopin, on the other hand, seems to favor larger jumps between pitches, and Beethoven follows the normal distribution best, probably because of the wide variation of styles in his music.
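The quantities discussed in this subsection (the mean and SD of the intervals, and the share of intervals within 4 semitones) can be computed as in the Python sketch below. The function name, the `max_step` parameter, and the sample pitches are hypothetical, and the sketch is not the R code behind Table 3.

```python
import statistics


def proximity_summary(pitches, max_step=4):
    """Summarize the intervals between consecutive MIDI pitches."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    within = sum(1 for step in intervals if abs(step) <= max_step)
    return {
        "mean": statistics.mean(intervals),
        "sd": statistics.stdev(intervals),
        "share_within": within / len(intervals),
    }


# Made-up pitches: the intervals are +4, -2, +12, -2.
print(proximity_summary([60, 64, 62, 74, 72]))
```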

3.4 Summary

In the above analysis of the three kinds of pitch profiles, the central pitch and range profiles give insight into the fairly limited range of pitches used in classical music, and the proximity profiles suggest how pitch sequences are formed. We also see how musical techniques, genres, and styles influence pitch profiles. We therefore conclude that probability and statistics are able to capture musical characteristics.

IV. Expectation and Error Detection

4.1 Introduction

The pitch profiles, which describe basic characteristics of musical composition and style, suggest that there is some connection between the current note and the notes that follow it. One very important aspect of melody perception is expectation. From experience, we know that listeners form expectations about what will happen next. This is why we can tell when someone is singing out of tune. The fulfillment and denial of such expectations are important to musical affect and meaning.

4.2 Data: Melodic Expectation Study

Cuddy and Lunney performed a melodic expectation study in 1995. To understand human perception of melody formation, we analyzed the data from this study [8]. Cuddy and Lunney played participants a two-tone context followed by a continuation tone and had them judge the expectedness of the continuation tone on a scale of 1 to 7, with 1 being the least expected and 7 the most. Table 4 shows the eight two-tone contexts used in the study. For each context, a range of 25 possible continuation tones was used, spanning from an octave below to an octave above the second context tone. For example, for the two-tone context C4 and D4, the continuation tones range from D3 to D5 and include every semitone in between.

[8] http://theory.esm.rochester.edu/temperley/music-prob/data/cuddy-lunney-data

Two-tone Context   Continuation Tones
C4 D4              [D3-D5]
C4 Bb3             [Bb2-Bb4]
C4 Eb4             [Eb3-Eb5]
C4 A3              [A2-A4]
C4 A4              [A3-A5]
C4 Eb3             [Eb2-Eb4]
C4 Bb4             [Bb3-Bb5]
C4 D3              [D2-D4]

Table 4: Two-tone Contexts and Continuation Tones
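The stimulus set in Table 4 can be enumerated with a short Python sketch. The names below are hypothetical, the note-name lookup covers only the pitch names that appear in the table, and MIDI numbering with C4 = 60 is assumed as in Section II.

```python
# Pitch classes for the note names used in Table 4 only.
PITCH_CLASSES = {"C": 0, "D": 2, "Eb": 3, "A": 9, "Bb": 10}


def name_to_midi(name):
    """'C4' -> 60, 'Bb3' -> 58 (single-digit octaves, C4 = 60)."""
    letter, octave = name[:-1], int(name[-1])
    return PITCH_CLASSES[letter] + 12 * (octave + 1)


# Second context tones from Table 4; the first context tone is always C4.
SECOND_TONES = ["D4", "Bb3", "Eb4", "A3", "A4", "Eb3", "Bb4", "D3"]


def continuation_tones(second_tone):
    """All 25 MIDI pitches from an octave below to an octave above."""
    center = name_to_midi(second_tone)
    return list(range(center - 12, center + 13))


stimuli = [
    (name_to_midi("C4"), name_to_midi(tone), continuation)
    for tone in SECOND_TONES
    for continuation in continuation_tones(tone)
]
print(len(stimuli))  # 200
```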

With 8 two-tone contexts each having 25 continuation tones, the data include the mean expectedness ratings for all 200 stimuli. The results of the expectation study are valuable because the composed music that we hear and enjoy reveals only the pitch sequences that people most expect on average, whereas the expectation study measures the relative expectedness of many possible continuations, some of which are very rare in composed music.

4.3 Interpretation

Figure 5 presents a heat map of the expectedness ratings of the continuation tones. The vertical axis shows the 8 two-tone contexts, and the horizontal axis shows the 25 possible continuation tones relative to the second context tone. Darker colors indicate higher expectedness ratings (listeners found the continuation tone more expected given the context), and lighter colors indicate lower ratings. The graph is clearly darker in the middle and lighter on both sides. This suggests that, whichever two-tone context is used, listeners expect the continuation tone to be close to the previous note.


Figure 5: Expectedness Ratings of Continuation Tones

To examine the variation in expectedness ratings in greater detail, Figure 6 shows a line graph for the descending major second from C4 to B-flat 3.

Figure 6: Expectedness Ratings of Continuation Tones Following a Descending Major Second

In the graph, when the second context tone and the continuation tone form a minor third (±3), perfect fourth (±5), perfect fifth (±7), or perfect octave (±12), the ratings are higher than those of neighboring continuation tones. These intervals are known for being more consonant, or stable. On the other hand, the augmented fourth (±6) and the major seventh (±11), for example, have significantly lower ratings. These are often considered more dissonant intervals, heard infrequently in classical music but very commonly in jazz. Therefore, from the standpoint of musical expectation and error detection, Cuddy and Lunney's study shows that people tend to expect consonant sounds and to regard inharmonic sounds as errors. Although musical styles differ, we would on average expect the same pattern in music composition.
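The correspondence between semitone offsets and interval names used in this discussion follows standard music theory and can be written as a small lookup in Python. The function name is hypothetical; the sketch only names intervals and does not reproduce any expectedness ratings.

```python
INTERVAL_NAMES = {
    0: "unison",
    1: "minor second",
    2: "major second",
    3: "minor third",
    4: "major third",
    5: "perfect fourth",
    6: "augmented fourth (tritone)",
    7: "perfect fifth",
    8: "minor sixth",
    9: "major sixth",
    10: "minor seventh",
    11: "major seventh",
    12: "perfect octave",
}


def describe_interval(second_tone, continuation):
    """Name the interval from the second context tone to the continuation."""
    semitones = continuation - second_tone
    size = abs(semitones)
    if size not in INTERVAL_NAMES:
        raise ValueError(f"interval larger than an octave: {semitones}")
    if semitones == 0:
        return INTERVAL_NAMES[0]
    direction = "ascending" if semitones > 0 else "descending"
    return f"{direction} {INTERVAL_NAMES[size]}"


# Bb3 (MIDI 58) up to F4 (MIDI 65) is 7 semitones.
print(describe_interval(58, 65))
```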


4.4 Improvements

First, the influence of key on tone expectation is an interesting topic for deeper analysis. In Cuddy and Lunney's study, the two-tone contexts are given without a key context. It would be interesting to repeat the experiment with the key taken into consideration. For example, would listeners expect note sequences that stay in the same key, or would they be prepared to hear transitions from one key to another? In addition, this section analyzed only pitch expectation. Melodic expectation in general can be divided into pitch and rhythm: pitch expectation concerns what notes will occur, while rhythmic expectation concerns when they will occur. An additional analysis of rhythmic expectation would give a more comprehensive picture of musical expectation.

V. Music Prediction and Identification

5.1 Introduction

We observe that human beings can easily tell songs apart based on their notes, rhythm, and beat. They can also distinguish songs of different styles and from different regions. We want to know whether songs from the same geographical location share consistent characteristics that help people tell them apart from others. In this section, we build logistic regression models, as a simple machine learning process, to find a statistical model that helps explain how people distinguish music from different regions.

5.2 Data

The data set we use contains monophonic music from 38 British child ballads, 30 traditional

Chinese folk songs, and 24 traditional African songs, all taken from the KernScore Website. Unlike

in the analysis of polyphonic classical music in previous sections, we chose to fit our regression models on monophonic music because in monophonic music only one note sounds at a time, so the pitch-sequence ambiguity that arises when multiple notes sound simultaneously

can be avoided. To capture as many characteristics of each song as possible, we created several explanatory variables for the regression: the average pitch, the variance of the pitch, the range of the pitch, the difference between on-time and off-time (the note duration), the mean of that difference, the variance of that difference, key, major note, minor note, and time signature. Dummy variables indicate the geographic location and background of each song.
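As an illustration of how such per-song variables could be computed, the following is a minimal R sketch. It assumes a note table with columns Ontime, Offtime, and Pitch, as in the Melisma-style note files in the appendices; the helper name song_features and its output column names are hypothetical and are not taken from our source code.

```r
# Hypothetical helper: summarise one monophonic song as a single row of features.
# Assumes columns Ontime, Offtime (milliseconds) and Pitch (MIDI note number).
song_features <- function(notes) {
  dur <- notes$Offtime - notes$Ontime     # duration of each note
  data.frame(
    mean     = mean(notes$Pitch),         # average pitch
    var      = var(notes$Pitch),          # pitch variance
    low      = min(notes$Pitch),          # lowest pitch
    high     = max(notes$Pitch),          # highest pitch
    timediff = mean(dur),                 # mean note duration
    timevar  = var(dur)                   # variance of note duration
  )
}

# Toy input: the first four notes of the Model 2 note file in Appendix III
toy <- data.frame(Ontime  = c(0, 250, 500, 1000),
                  Offtime = c(250, 500, 1000, 1500),
                  Pitch   = c(71, 71, 69, 71))
feats <- song_features(toy)
print(feats)
```

Stacking one such row per song, together with the regional dummies, would give a data frame of the kind the regressions below operate on.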

5.3 Logistic Regression Analysis

Our statistical approach is to randomly choose half of the 92 songs (46 songs) as a training set and to use the other 46 songs as a test set. We use the training set to fit a logistic regression model for each of the three regions, and then use the 46 songs in the test set to assess how reliable those models are. We choose logistic regression to identify music from different regions because the response variables for our regressions are regional dummies. We

choose a logistic model over a linear probability model because our explanatory variables vary widely in magnitude (from several thousand milliseconds of time difference to only single-digit pitch variances) and a linear model fitted directly to a 0/1 response can predict values outside the probability range of 0 to 1.
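The split-and-fit procedure described above can be sketched in R as follows. This is only an illustration on simulated stand-in data, not on our songs: the column names mean_pitch, timediff, and is_chinese, the seed, and the simulated relationship are all assumptions made for the example.

```r
set.seed(157)                                   # arbitrary seed for reproducibility
n <- 92

# Simulated stand-in data (NOT the project's songs): two features and a 0/1 label
songs <- data.frame(mean_pitch = rnorm(n, 70, 4),
                    timediff   = rnorm(n, 500, 150))
songs$is_chinese <- rbinom(n, 1, plogis(0.3 * (songs$mean_pitch - 70)))

# Randomly choose half of the songs as the training set
train_idx <- sample(n, n / 2)

# Fit the logistic regression on the training half only
fit <- glm(is_chinese ~ mean_pitch + timediff,
           family = binomial, data = songs[train_idx, ])

# Predicted probabilities for the 46 held-out songs
p_test <- predict(fit, newdata = songs[-train_idx, ], type = "response")
print(summary(p_test))
```

Because `type = "response"` returns fitted probabilities rather than log-odds, every prediction lies between 0 and 1.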

5.3.1 Chinese Folk Song Model

In the Chinese folk song model, the response is a dummy variable, with 1 indicating that the song is a Chinese folk song and 0 indicating that the song is from Britain or Africa. To find the variables that best describe the data, we use a forward variable selection process. The final model includes pitch mean, pitch variance, lowest pitch (X1), highest pitch (X2), the difference between on-time and off-time, the variance of the difference, on-time (X1.1), off-time (X2.1), major or minor, and rhythm. The regression model can be written as:

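$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1\,\mathrm{mean} + \beta_2\,\mathrm{var} + \beta_3\,\mathrm{X1} + \beta_4\,\mathrm{X2} + \beta_5\,\mathrm{timediff} + \beta_6\,\mathrm{X1.1} + \beta_7\,\mathrm{timevar} + \beta_8\,\mathrm{X2.1} + \beta_9\,\mathrm{majorminor} + \beta_{10}\,\mathrm{RhythmI} + \beta_{11}\,\mathrm{RhythmII}$$

where $p$ is the probability that a song is a Chinese folk song. Let the right-hand side of the above equation be $A$; then

$$p = \frac{\exp(A)}{1+\exp(A)}$$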

Figure 7 summarizes the regression output of the Chinese Folk Song Model. From the R output, we find that the difference between on-time and off-time tends to be more statistically significant than the other variables.

Figure 7: Regression Outputs of the Chinese Folk Song Model
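To make the last step of the model concrete, the short R sketch below converts a few values of the linear predictor A into probabilities. The function name inv_logit and the example values of A are chosen for illustration only; they are not fitted values from Figure 7.

```r
# Inverse-logit: map a linear predictor A to a probability p = exp(A) / (1 + exp(A))
inv_logit <- function(A) exp(A) / (1 + exp(A))

A <- c(-2, 0, 2)          # illustrative values of the linear predictor
p <- inv_logit(A)
print(round(p, 3))

# Base R's plogis() computes the same quantity
print(all.equal(p, plogis(A)))
```

Large negative values of A give probabilities near 0, A = 0 gives exactly 0.5, and large positive values give probabilities near 1.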

5.3.2 British Child Ballad Model

As with the Chinese model, we fit the same model specification to the British songs (Figure 8). From the output, the difference between on-time and off-time is slightly more significant than

the other variables. Compared to the Chinese folk music model, the residual deviance and the AIC of the British ballad model are higher, which suggests that the logistic model captures Chinese folk music better.

Figure 8: Regression Outputs for British Child Ballads and African Music Models

5.3.3 Traditional African Music Model

Similarly, we did the same analysis on African music; the output result is a little different

from the other two models (Figure 8). The null deviance is relatively small compared to the other

two models. The residual deviance is the smallest among all three models.

5.4 Model Prediction Results

We then apply the fitted models to the other 46 songs to see whether they predict the origin of the music correctly.

5.4.1 Chinese Folk Song Prediction Result

In our song list, the Chinese traditional songs have indices from 1 to 30. Figure 9 shows the prediction results. Each number below the song index is the predicted probability that the song is

Chinese. If the predicted probability is approximately 0, we treat the song as predicted not to be Chinese; if it is approximately 1, we treat the song as predicted to be Chinese. The result shows that 13 out of 17 Chinese songs in the test set are predicted correctly, so the model identifies 76.47% of them correctly.

Figure 9: Chinese Folk Song Model Prediction Results
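The classification rule described above can be sketched in R as follows. The probabilities and labels are made up for illustration, and the 0.5 cut-off is an assumption: the text only says "approximately 0" and "approximately 1".

```r
# Hypothetical predicted probabilities for five test songs and their true labels
pred  <- c(0.98, 0.03, 0.91, 0.40, 0.75)
truth <- c(1, 0, 1, 1, 0)

# Classify as Chinese when the predicted probability exceeds the assumed cut-off
predicted_class <- as.integer(pred > 0.5)
accuracy <- mean(predicted_class == truth)
print(accuracy)

# The percentage quoted in the text follows directly from the raw counts
chinese_rate <- round(100 * 13 / 17, 2)
print(chinese_rate)
```

With a cut-off, songs whose predicted probability falls in the middle of the range are still assigned to one class, which is worth keeping in mind when reading the figures.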

5.4.2 British Child Ballads Prediction Result

British songs have indices from 55 to 92. Figure 10 shows that 16 songs are predicted

correctly as British child ballads. 80% of the British child ballads are predicted correctly.

Figure 10: British Child Ballads Model Prediction Results

5.4.3 Traditional African Music Prediction

African songs have indices from 31 to 54, and 6 songs out of 11 songs are predicted

correctly (Figure 11). Only 54.55% of African songs are predicted correctly.

Figure 11: Traditional African Model Prediction Results

The prediction results show that most British songs are predicted correctly, while the model predicts only about half of the African songs correctly. One reason is that there are only 24 African songs in the data set, with only 13 of them in the training set used to fit the African music model. By contrast, there are 38 British songs in total, which provides a larger training base for the model. As a result, the logistic regression model predicts more British songs correctly. These results support our assumption that there exists a model that helps explain how people identify music.

5.5 Mean Squared Error Analysis

To see how well the models fit our songs, we calculated the mean squared errors using three different strategies (the validation set approach, leave-one-out cross-validation, and k-fold cross-validation) and compared the errors across the three models. In Figure 12, the leave-one-out method gives the smallest errors for all three models. Since our data include only 92 songs, leave-one-out cross-validation is the best choice for evaluating the models. However, this method is computationally expensive and becomes very slow as the data set grows; in that case, k-fold cross-validation would be the better choice. In addition, we observe that despite the

fact that African music had the lowest identification success rate, the African model has the smallest errors under all three methods. This suggests that African music may be more consistent, in both rhythm and pitch, than the Chinese and British songs.

Figure 12: MSE Comparison
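The three error estimates compared above can be sketched in R with `cv.glm()` from the boot package. The data here are simulated stand-ins with a single predictor x, so the numbers it prints say nothing about our songs; the seed and the simulated relationship are assumptions for the example.

```r
library(boot)                                   # provides cv.glm()
set.seed(157)
n <- 92
sim <- data.frame(x = rnorm(n))                 # simulated stand-in data
sim$y <- rbinom(n, 1, plogis(1.2 * sim$x))

# 1. Validation-set approach: fit on one half, score the other half
train <- sample(n, n / 2)
fit_half <- glm(y ~ x, family = binomial, data = sim[train, ])
p_hat <- predict(fit_half, sim[-train, ], type = "response")
mse_validation <- mean((sim$y[-train] - p_hat)^2)

# 2. and 3. Leave-one-out and 5-fold cross-validation on all 92 rows
fit_all <- glm(y ~ x, family = binomial, data = sim)
mse_loocv <- cv.glm(sim, fit_all)$delta[1]          # K defaults to n (leave-one-out)
mse_kfold <- cv.glm(sim, fit_all, K = 5)$delta[1]

print(c(validation = mse_validation, loocv = mse_loocv, kfold = mse_kfold))
```

Leave-one-out refits the model once per observation (92 fits here), while 5-fold refits it only five times, which is why the latter scales better to larger data sets.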

5.6 Summary

Our analysis supports our assumption that there exist regression models that help explain how people identify music. Given the data constraints, the linear models built on the characteristics of the songs are able to predict the origin of the songs correctly with a relatively high probability. The prediction results could likely be improved with a larger data set. The coefficients of the time signatures and keys of the songs are currently statistically insignificant; this is because there is a wide range of possible keys and signatures but relatively few songs in the database to represent them all. Sometimes the key of a song in the test set is not covered at all in the training set. Obtaining a larger data set should improve the significance and accuracy of these coefficients. Nevertheless, our regression results show that music from different geographic locations does have distinct characteristics and styles, which allows both statistical models and human beings to distinguish among them.

VI. Markov Chains and Music Generation

6.1 Introduction

After analyzing composed music in the previous sections, we now want to know whether music in a similar style can be generated statistically, using probability systems and statistical characteristics calculated from selected data. Using Schubert's monophonic piano pieces as the database for building the probability systems, we look for models that can generate music with a Markov chain approach.

6.2 Data

The data used for music generation consist of 35 of Schubert's monophonic piano pieces, which include 6036 observations in total. Schubert's range and proximity profiles are plotted in Figure 13. Compared to the range profiles we have seen before, Schubert's covers a smaller range of pitches, because the pieces we used are monophonic (including only the main melody in the right hand) and shorter. Similar to the graphs in previous sections, Schubert's proximity profile is centered on 0, but it especially favors repeating the same note and moving to the note two semitones below the previous note.

Figure 13: Range and Proximity Profiles of Schubert
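For readers who want to reproduce this kind of profile, the following R sketch computes a range and a proximity profile from a short pitch sequence. The ten pitches are the first ten notes of the generated Model 2 note file in Appendix III, used here only as a convenient toy input rather than as Schubert data.

```r
# Toy pitch sequence (MIDI note numbers): first ten notes of Appendix III
pitches <- c(71, 71, 69, 71, 69, 69, 68, 66, 68, 66)

# Range profile: lowest and highest pitch used
pitch_range <- range(pitches)
print(pitch_range)

# Proximity profile: semitone step from each note to the next
intervals <- diff(pitches)
proximity <- table(intervals)
print(proximity)
```

An interval of 0 is a repeated note and an interval of -2 is a step down by two semitones, so the counts in `proximity` correspond to the bars of a proximity histogram.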

6.3 Markov Chain Approach

A Markov chain is a mathematical system that undergoes transitions from one state to another. Its transition matrix is a matrix of probabilities that describes these transitions. A first-order Markov chain assumes that the next state depends only on the current state and not on the sequence of events that preceded it. Although it is also possible to use a second-order Markov chain (in which the previous two notes determine the third), for the scope of our project we use only the first-order Markov chain as the foundation for music generation.

6.3.1 Model 1

Using R, we first compile the note files of all 35 songs together and create two large transition matrices, one for pitch and one for note duration. An example of the transition matrix for pitch is shown in Table 5. The transition matrix ranges from A3 to A5 and covers all the notes in two octaves. Each entry gives the probability of moving from the pitch named in the row to the pitch named in the column. If no transition between two pitches was observed in the 35 Schubert songs, the corresponding entry is 0. Each row of the matrix sums to 1.

Table 5: Transition Matrix for Pitch
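The construction of such a matrix can be sketched in base R as below. This is a simplified stand-in for the markovchain-based code in Appendix IV, run on a ten-note toy sequence (the first ten pitches of the Model 2 note file in Appendix III) rather than on the compiled Schubert data.

```r
# Toy pitch sequence (MIDI note numbers)
pitches <- c(71, 71, 69, 71, 69, 69, 68, 66, 68, 66)
states  <- sort(unique(pitches))

# Pair each note with the note that follows it
from <- factor(head(pitches, -1), levels = states)   # current note
to   <- factor(tail(pitches, -1), levels = states)   # next note
counts <- table(from, to)                            # transition counts

# Divide each row by its total so that every row is a probability distribution
trans <- prop.table(counts, margin = 1)
print(round(trans, 2))
```

Unobserved transitions appear as 0, as in Table 5. A pitch that only ever occurs as the very last note would have no outgoing transitions, and its row would need special handling.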

We randomly select the starting pitch and the duration of the starting note according to the empirical distribution of the first notes of the 35 songs. Then, given the starting pitch and duration, we use the transition matrices to draw each following pitch and duration and so generate our music sequence. In Table 5, notice that the highlighted probabilities on the diagonal are very high compared to the other possible choices. This observation, along with the proximity profile histogram in Figure 13, suggests that Schubert had a preference for repeating the same note.
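The generation step can be sketched in R as follows. The three-note state space, the transition matrix P, and the first-note distribution init are all hypothetical values invented for the example; the function name generate is likewise an assumption and is not the code used for the probability song.

```r
set.seed(157)                                   # arbitrary seed for reproducibility
states <- c("A4", "B4", "C5")                   # hypothetical three-note state space
P <- matrix(c(0.6, 0.3, 0.1,
              0.2, 0.5, 0.3,
              0.1, 0.4, 0.5),
            nrow = 3, byrow = TRUE, dimnames = list(states, states))
init <- c(A4 = 0.5, B4 = 0.3, C5 = 0.2)         # hypothetical first-note distribution

generate <- function(n, P, init) {
  out <- character(n)
  out[1] <- sample(names(init), 1, prob = init)   # draw the starting note
  for (i in seq_len(n)[-1]) {
    # draw the next note from the row of P that belongs to the current note
    out[i] <- sample(colnames(P), 1, prob = P[out[i - 1], ])
  }
  out
}

melody <- generate(20, P, init)
print(melody)
```

Note durations can be generated in the same way from a second transition matrix, with the two sequences then paired note by note.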

In our first model, 100 notes were generated from the transition matrices estimated from the 35 Schubert songs. The Melisma format note file, the transcribed sheet music, and the piano roll graph are included in Appendix II. The generated data were also converted into an audio file, the probability song. However, after listening to the probability song made from Schubert's music, we found some dissonances in the melody and a somewhat unpredictable rhythm, with notes that jump around and a pace that is sometimes slow and then suddenly faster; the transitions in the music are not very smooth. We think the probability song is less enjoyable than we expected because we used transition matrices built from the compiled data. The 35 songs were written in around 20 different keys, and in different time signatures as well. If a song is written in C major, for example, we are more likely to see C's, E's, and G's in the music. However, if the song is written in B-flat major, there is a higher probability of seeing B-flats and E-flats rather than E's and C's. Because all 35 songs are combined in our analysis, we generated our pitch sequence without taking the greater key context into consideration, which created chaotic sounds in the probability song.

6.3.2 Model 2

After recognizing the faults of Model 1, we created another model to improve our music generation process. In Model 2, we first randomly select one of the 35 Schubert songs and use that song alone to estimate the transition matrices for pitch and time. Only 50 notes were generated because

less data is available as input for estimating the transition matrices. Appendix III includes the Melisma format note file, the transcribed sheet music, and the piano roll graph of this second version of the probability song. Unlike the first song, the new song has a fixed key of A major, the same as the chosen Schubert song, as shown by the three sharps on the sheet music. Although different people judge music differently according to perception and taste, we find that the second version of the probability song sounds nicer because of its smoother transitions and fewer unexpected pitch sequences.

6.4 Summary

More improvements can be made to the music generation process using Markov chains. For example, the results may sound more like a Schubert song if a second-order Markov chain is estimated and used to produce the music, because more information is taken into account and the transitions can be even smoother when the next note depends on the previous two. However, a bigger database would be needed for a second-order Markov chain. In addition, an enjoyable piece of music has many characteristics beyond randomly generated combinations of pitches and rhythms. For example, repetitions or variations of a similar theme are often used to enhance musical effect. While the Markov chain model built in this project is a simple one, better models can be produced with more sophisticated analysis in statistics and music theory.
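To illustrate why a second-order chain needs more data, the R sketch below treats each pair of consecutive notes as one state and tabulates the note that follows. It was not part of our project; the input is the same ten-note toy sequence taken from the start of the Model 2 note file in Appendix III.

```r
# Toy pitch sequence (MIDI note numbers)
pitches <- c(71, 71, 69, 71, 69, 69, 68, 66, 68, 66)
n <- length(pitches)

# Second-order state: the pair (previous note, current note)
pair <- paste(pitches[1:(n - 2)], pitches[2:(n - 1)], sep = "-")
nxt  <- pitches[3:n]                      # the note that follows each pair

counts2 <- table(pair, nxt)               # transition counts per pair
trans2  <- prop.table(counts2, margin = 1)
print(round(trans2, 2))
print(dim(trans2))
```

Here eight observed transitions are spread over seven distinct pair states, so most rows rest on a single observation. With the same amount of data, a first-order chain has far fewer rows to estimate, which is the trade-off noted above.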

VII. Conclusion

Our analysis helped us answer the questions we raised at the beginning of our project. With the patterns we discovered in classical composers’ uses of pitches and keys, we are able to shed light on the connection between music and probability. We further use statistical methods to explain how people perceive and identify music. A closer look at the survey data collected in Cuddy and Lunney’s study helps to explain how people form musical expectations and detect errors in inharmonic pitches. Moreover, we used logistic regression models to identify music from different

regions using the regional characteristics of monophonic songs. Finally, we attempted to generate a probability song based on Markov chain transition matrices from 35 of Schubert’s melodies. We would conclude that even a person without any background in music, but with knowledge of probability, would still be able to interpret and write music!

There are many possible improvements for our project. Because both pitch and rhythm make up a melody, we could go beyond the in-depth analysis of the pitch profile and explore the rhythm profile as well. In addition, we explored patterns only in classical music, and the comparison between classical music and modern music remains an open question for us. Another challenge would be to come up with a way to analyze polyphonic music instead of monophonic music; however, a strong background in music theory is probably necessary to do so. Overall, one of the biggest constraints of our project is the limited amount of music that we can obtain in the desired Melisma format for data processing. A larger data set of music would let us better generalize our findings and produce stronger results. Furthermore, with more songs in our analysis, we expect that the probability song we attempted to generate could be further improved.

References

Cuddy, L. L., & Lunney, C. A. Expectancies generated by melodic intervals: Perceptual judgments of melodic continuity. Perception & Psychophysics, 57, 1995, pp. 451-462.

“Index of /afs/cs.cmu.edu/user/sleator/public/music-analysis/notefiles/misc.” Web. n.p. n.d.

Kaliakatsos-Papakostas, M. A., Epitropakis, M. G., & Vrahatis, M. N. Weighted Markov Chain Model for Musical Composer Identification. Applications of Evolutionary Computation: Lecture Notes in Computer Science, Vol. 6625, 2011, pp. 334-343.

Krumhansl, C. L., & Schmuckler, M. A. The Petroushka chord: A perceptual investigation. Music Perception, 4, 1986, pp. 153-184.

Miller, G. A., & Heise, G. A. The trill threshold. Journal of the Acoustical Society of America, 22, 1950, pp. 637-638.

Temperley, D. Music and Probability. The MIT Press, Cambridge, 2007.

von Hippel, P., & Huron, D. Why do skips precede reversals? The effect of tessitura on melodic structure. Music Perception, Vol. 18, No. 1, 2000, pp. 59-85.

Appendix I: MIDI Notes Converter (source: http://newt.phys.unsw.edu.au/jw/notes.html)

Appendix II: Markov Chain Generation Results, Model 1

Melisma Format Note File

Note  Ontime  Offtime  Pitch  Time_diff
1     0       500      70     500
2     500     1500     70     1000
3     1500    2000     73     500
4     2000    3500     71     1500
5     3500    4000     71     500
6     4000    4500     69     500
7     4500    6000     62     1500
8     6000    9000     71     3000
9     9000    10500    71     1500
10    10500   11000    71     500
11    11000   12500    73     1500
12    12500   13000    71     500
13    13000   13500    74     500
14    13500   14250    73     750
15    14250   14500    76     250
16    14500   14750    75     250
17    14750   15000    73     250
18    15000   18000    74     3000
19    18000   18333    76     333
20    18333   19833    73     1500
21    19833   20333    69     500
22    20333   20833    67     500
23    20833   21333    66     500
24    21333   21833    59     500
25    21833   23833    66     2000
26    23833   24583    68     750
27    24583   24833    75     250
28    24833   25333    75     500
29    25333   27833    76     2500
30    27833   28333    71     500
31    28333   29333    71     1000
32    29333   29833    72     500
33    29833   30083    70     250
34    30083   31083    71     1000
35    31083   31583    64     500
36    31583   33083    66     1500
37    33083   33583    71     500
38    33583   34083    69     500
39    34083   34583    70     500
40    34583   34833    69     250
41    34833   35833    76     1000
42    35833   36833    75     1000
43    36833   37333    74     500
44    37333   39833    76     2500
45    39833   40333    74     500
46    40333   41333    75     1000
47    41333   42333    73     1000
48    42333   43333    73     1000
49    43333   43833    74     500
50    43833   44333    74     500
51    44333   45333    74     1000
52    45333   45833    71     500
53    45833   46833    73     1000
54    46833   47833    69     1000
55    47833   48333    76     500
56    48333   48833    76     500
57    48833   50333    74     1500
58    50333   51333    71     1000
59    51333   52333    71     1000
60    52333   52833    73     500
61    52833   53333    71     500
62    53333   53833    75     500
63    53833   55333    76     1500
64    55333   56333    73     1000
65    56333   56833    71     500
66    56833   57333    69     500
67    57333   58833    72     1500
68    58833   59833    70     1000
69    59833   60333    67     500
70    60333   60833    67     500
71    60833   61833    69     1000
72    61833   62333    70     500
73    62333   62833    68     500
74    62833   63833    71     1000
75    63833   64333    73     500
76    64333   64833    73     500
77    64833   65583    73     750
78    65583   65833    77     250
79    65833   66583    75     750
80    66583   66833    73     250
81    66833   67333    76     500
82    67333   67833    76     500
83    67833   68333    77     500
84    68333   68833    74     500
85    68833   69083    78     250
86    69083   69333    74     250
87    69333   69833    74     500
88    69833   70833    72     1000
89    70833   71833    72     1000
90    71833   72333    70     500
91    72333   72833    68     500
92    72833   73333    70     500
93    73333   74333    69     1000
94    74333   74833    62     500
95    74833   75583    64     750
96    75583   75833    62     250
97    75833   76833    69     1000
98    76833   77083    67     250
99    77083   77333    71     250
100   77333   77583    72     250

Appendix II: Markov Chain Generation Results, Model 1

Piano Sheet

Piano Roll

Appendix III: Markov Chain Generation Results, Model 2

Melisma Format Note File

Note  Ontime  Offtime  Pitch  Time_diff
1     0       250      71     250
2     250     500      71     250
3     500     1000     69     500
4     1000    1500     71     500
5     1500    2000     69     500
6     2000    2500     69     500
7     2500    3000     68     500
8     3000    3750     66     750
9     3750    4000     68     250
10    4000    4500     66     500
11    4500    5000     64     500
12    5000    5125     62     125
13    5125    5250     61     125
14    5250    5500     61     250
15    5500    6000     61     500
16    6000    7000     59     1000
17    7000    7500     57     500
18    7500    8000     64     500
19    8000    8500     66     500
20    8500    9000     64     500
21    9000    9500     64     500
22    9500    10000    66     500
23    10000   10500    68     500
24    10500   11250    66     750
25    11250   11500    64     250
26    11500   12000    66     500
27    12000   12750    68     750
28    12750   13000    66     250
29    13000   13500    64     500
30    13500   14250    64     750
31    14250   14500    64     250
32    14500   14750    64     250
33    14750   15250    64     500
34    15250   15750    69     500
35    15750   16250    68     500
36    16250   16750    66     500
37    16750   17250    67     500
38    17250   17750    71     500
39    17750   18250    73     500
40    18250   19250    74     1000
41    19250   19750    62     500
42    19750   21250    64     1500
43    21250   22000    66     750
44    22000   22250    64     250
45    22250   22750    61     500
46    22750   23500    61     750
47    23500   23750    62     250
48    23750   24250    64     500
49    24250   24750    57     500
50    24750   25500    69     750

Appendix III: Markov Chain Generation Results, Model 2

Piano Sheet

Piano Roll

Appendix IV: Source R Codes

III. Pitch Profiles’ Sample R codes (Chopin Example)

# Transform function for format cleaning (note: this name masks base::transform)
transform <- function(old){
  noteoff = old[old$V1 == "Note-off",]
  noteoff$V1 = NULL
  noteon = old[old$V1 == "Note-on",]
  colnames(noteon) = c("Offtime", "Ontime", "Pitch")
  noteon = noteon[c(2,1,3)]
  noteon$Offtime = 0                          # placeholder, filled in below
  noteon$index = seq_along(noteon$Ontime)     # remember the original order
  # Match each note-on to its note-off by sorting both by pitch, then by time
  sorted_on = noteon[order(noteon$Pitch, noteon$Ontime),]
  sorted_off = noteoff[order(noteoff$V3, noteoff$V2),]
  sorted_on$Offtime = sorted_off$V2
  noteon = sorted_on[order(sorted_on$index),] # restore the original order
  noteon$index = NULL
  noteon
}

# Use transform() to clean up data that originally has 3 variables
Chopin2 = transform(Chopin2)

# Clean up data with 4 variables and rename the columns
Chopin1$V1 = NULL
colnames(Chopin1) = c("Ontime","Offtime","Pitch")

# Compile data for each composer
Chopin_all = rbind(Chopin1,Chopin2,Chopin3,Chopin4)
sum(is.na(Chopin_all$Ontime)) # check no NA (should be 0)
Chopin_all$Ontime = as.numeric(Chopin_all$Ontime)
Chopin_all$Offtime = as.numeric(Chopin_all$Offtime)
Chopin_all$Time_diff = Chopin_all$Offtime-Chopin_all$Ontime
Chopin_all$Pitch_diff = c(0,diff(Chopin_all$Pitch))

# GRAPH
# Normality of pitch (Chopin_all)
ChopinPitches = Chopin_all$Pitch
hist(ChopinPitches,freq=FALSE,ylim=c(0,0.08),breaks=40,main="Histogram of Chopin's pitches")
lines(density(Chopin_all$Pitch,na.rm=TRUE))
Chopin_minpitch = min(Chopin_all$Pitch)
Chopin_maxpitch = max(Chopin_all$Pitch)
x = seq(Chopin_minpitch,Chopin_maxpitch,length=500)
curve(dnorm(x,mean=mean(Chopin_all$Pitch),sd=sqrt(var(Chopin_all$Pitch))),col="blue",add=TRUE,yaxt="n")

# Normality of pitch intervals (Chopin_all pitch intervals)
ChopinPitchDiff = Chopin_all$Pitch_diff
hist(ChopinPitchDiff,freq=FALSE,breaks=80,main="Histogram of Chopin's Pitch Differences")
lines(density(Chopin_all$Pitch_diff,na.rm=TRUE))
Chopin_minpitchdiff = min(Chopin_all$Pitch_diff)
Chopin_maxpitchdiff = max(Chopin_all$Pitch_diff)
x = seq(Chopin_minpitchdiff,Chopin_maxpitchdiff,length=500)
curve(dnorm(x,mean=mean(Chopin_all$Pitch_diff),sd=sqrt(var(Chopin_all$Pitch_diff))),col="blue",add=TRUE,yaxt="n")

# Central pitch profile
all = rbind(beet_all,Mzt_all,Chopin_all)

all$Ontime = as.numeric(all$Ontime)
all$Offtime = as.numeric(all$Offtime)
all$Time_diff = all$Offtime-all$Ontime
all$Pitch_diff = c(0,diff(all$Pitch))

IV. Melodic Expectation Study Plots

# Data: http://theory.esm.rochester.edu/temperley/music-prob/data/cuddy-lunney-data
library(ggplot2)        # provides ggplot(), geom_tile(), etc.
library(RColorBrewer)   # provides brewer.pal()
f <- file.choose()
expectation = read.csv(f)
expectation$ratings = as.numeric(expectation$ratings)
expectation$continuation_tone = as.numeric(expectation$continuation_tone)

# HEAT MAP
exp_plot <- ggplot(expectation, aes(x=continuation_tone, y=two_tone_context, fill=ratings))
exp_plot + geom_tile() +
  scale_x_continuous(breaks = seq(-12, 12, by = 1)) +
  scale_fill_gradientn(colours=brewer.pal(7, "YlGnBu")) +
  theme(text = element_text(size=25))

# Descending Major Second
plot(x=expectation$continuation_tone[26:50], y=expectation$ratings[26:50], type="b",
     xlab="Continuation Tone", ylab="Ratings", ylim=c(1,7), xaxt="n", yaxt="n",
     main="Descending Major Second: C4-Bb3", col="darkblue")
axis(1, xaxp=c(-12, 12, 24))
axis(2, at = seq(1, 7))

V. Music Prediction and Identification Regressions

dff$Major.Minor = factor(dff$Major.Minor)
key = factor(dff$key)
dff$Rhythm.I = factor(dff$Rhythm.I)
dff$Rhythm.II = factor(dff$Rhythm.II)
# country: region label of each of the 92 songs
dc = 0 + (country == "Chinese")
da = 0 + (country == "African")
db = 0 + (country == "British")
dff$dc = dc
dff$da = da
dff$db = db
training = sample(92,46)

# Model Selection
selectmod = dff[training,][5:20]
smmodel = glm(dc ~ 1, data=selectmod, family=binomial)
big = formula(glm(dc ~ ., data=selectmod, family=binomial))
big.model = step(smmodel, direction = 'forward', scope = big)
big.model$anova

# Chinese Model
fullp = glm(dc ~ mean + var + X1 + X2 + timediff + X1.1 + timevar + X2.1 + Major.Minor + Rhythm.II, "binomial", data=selectmod)
summary(fullp)
pred = predict(fullp, dff, type="response")[-training]

# British Model
full2 = glm(db ~ mean + var + X1 + X2 + timediff + X1.1 + timevar + X2.1 + Major.Minor + Rhythm.II, "binomial", data=selectmod)
summary(full2)
pred2 = predict(full2, dff, type="response")[-training]

# African Model
full3 = glm(da ~ mean + var + X1 + X2 + timediff + X1.1 + timevar + X2.1 + Major.Minor + Rhythm.II, "binomial", data=selectmod)

summary(full3)
pred3 = predict(full3, dff, type="response")[-training]

# Errors
library(boot)  # provides cv.glm
# Note: fullp1, fullp2 and fullp3 are not defined in this listing;
# presumably they are the three models above refit for cross-validation.
prederror = mean((dff$dc[-training]-pred)^2)
glmerror = cv.glm(dff,fullp1)$delta[1]
glm4error = cv.glm(dff,fullp1,K=5)$delta[1]
bprederror = mean((dff$db[-training]-pred2)^2)
bglmerror = cv.glm(dff,fullp2)$delta[1]
bglm4error = cv.glm(dff,fullp2,K=5)$delta[1]
aprederror = mean((dff$da[-training]-pred3)^2)
aglmerror = cv.glm(dff,fullp3)$delta[1]
aglm4error = cv.glm(dff,fullp3,K=5)$delta[1]

chinese = c(prederror,glmerror,glm4error)
african = c(aprederror,aglmerror,aglm4error)
british = c(bprederror,bglmerror,bglm4error)
cv.error = c(chinese,african,british)
Model = c(rep("Chinese",3),rep("African",3),rep("British",3))
type = rep(c("Validation Set Error","Leave-One Out","K-fold Cross Validation"),3)
validation = rep(c("Test","Train","Train"),3)
error = data.frame(cv.error,validation,Model,type)

VI. Music Generation: Markov Chain

library(markovchain)

schub_list = list(schub1,schub2,schub3,schub4,schub5,schub6,schub7,schub8,schub9,schub10,
                  schub11,schub12,schub13,schub14,schub15,schub16,schub17,schub18,schub19,schub20,
                  schub21,schub22,schub23,schub24,schub25,schub26,schub27,schub28,schub29,schub30,
                  schub31,schub32,schub33,schub34,schub35)
# Stack all 35 songs into one data frame
schub_all = do.call(rbind, schub_list)
schub_all$Ontime = as.numeric(schub_all$V2)
schub_all$Offtime = as.numeric(schub_all$V3)
schub_all$Pitch = as.numeric(schub_all$V4)
schub_all$Time_diff = schub_all$V3-schub_all$V2
schub_all = na.omit(schub_all)
pitchnames = as.character(sort(unique(schub_all$V4)))
timenames = as.character(sort(unique(schub_all$Time_diff)))

############### MODEL 1 #################
# Make Transition matrix from schub_all combined
pitch_trans_m1 = as(markovchainFit(schub_all$V4)$estimate, "matrix")
time_trans_m1 = as(markovchainFit(schub_all$Time_diff)$estimate, "matrix")

############### MODEL 2 #################
# Clean files and Produce transition matrix for each
pitch_mlist = list()
time_mlist = list()
songlen = c()
n = 1
for (file in schub_list){
  combined = na.omit(cbind(file, "diff"=as.numeric(file$V3)-as.numeric(file$V2)))
  songlen = c(songlen, nrow(file))
  ### pitch transition matrices

  pitch_trans = as.data.frame(as(markovchainFit(combined$V4)$estimate, "matrix"))
  pitch_match_ind = match(colnames(pitch_trans), pitchnames)
  missing_pitch = pitchnames[-pitch_match_ind]
  # Pad with pitches that never occur in this song so all matrices share the same states
  pitch_trans[,missing_pitch] <- 0 #or 1/n
  pitch_trans[missing_pitch,] <- 0 #or 1/n
  pitch_trans = pitch_trans[,order(names(pitch_trans))]
  pitch_trans = pitch_trans[order(rownames(pitch_trans)),]
  pitch_trans = as.matrix(pitch_trans)
  pitch_mlist[[n]] = pitch_trans
  ### time diff transition matrices
  time_trans = as.data.frame(as(markovchainFit(combined$diff)$estimate, "matrix"))
  time_match_ind = match(colnames(time_trans), timenames)
  missing_time = timenames[-time_match_ind]
  time_trans[,missing_time] <- 0 #or 1/n
  time_trans[missing_time,] <- 0 #or 1/n
  time_trans = time_trans[,order(as.numeric(names(time_trans)))]
  time_trans = time_trans[order(as.numeric(rownames(time_trans))),]
  time_trans = as.matrix(time_trans)
  time_mlist[[n]] = time_trans
  n = n+1
}

# Simulate first order markov chain
# Using Model 1
set.seed(18)

# Starting pitch: drawn at random from the first notes of the 35 songs
start_pitch = list()
for (file in schub_list){
  start_pitch = c(start_pitch, file[1,4])
}
pitch0 = sample(start_pitch, 1)
pitch_mc_m1 <- new("markovchain", states=pitchnames, transitionMatrix=pitch_trans_m1)
pitch_sim_m1 = markovchainSequence(n=100, markovchain=pitch_mc_m1, t0=as.character(pitch0[[1]]))

# Starting duration: drawn at random from the first note durations
start_time = list()
for (file in schub_list){
  start_time = c(start_time, file[1,3]-file[1,2])
}
time0 = sample(start_time, 1)
time_mc_m1 <- new("markovchain", states=timenames, transitionMatrix=time_trans_m1)
time_sim_m1 = markovchainSequence(n=100, markovchain=time_mc_m1, t0=as.character(time0[[1]]))

# Cumulative durations give the onset and offset time of each simulated note
on_sim_m1 = cumsum(as.numeric(time_sim_m1))[-100]
off_sim_m1 = cumsum(as.numeric(time_sim_m1))
sim = data.frame(Ontime=c(0, as.numeric(on_sim_m1)), Offtime=as.numeric(off_sim_m1),
                 Pitch=as.numeric(pitch_sim_m1), Time_diff=time_sim_m1)
table = data.frame(time_sim_m1, pitch_sim_m1)

# Plot
library(ggplot2)
ggplot(sim, aes(colour=Pitch)) +
  geom_segment(aes(x=Ontime, xend=Offtime, y=Pitch, yend=Pitch), size=3) +
  xlab("Time (Milliseconds)")

# Using Model 2
pitch_mlist_m2 = list()
time_mlist_m2 = list()
n = 1
for (file in schub_list){
  combined = na.omit(cbind(file, "diff"=as.numeric(file$V3)-as.numeric(file$V2)))
  ### pitch transition matrices
  pitch_trans = as(markovchainFit(combined$V4)$estimate, "matrix")
  pitch_mlist_m2[[n]] = pitch_trans

  ### time diff transition matrices
  time_trans = as(markovchainFit(combined$diff)$estimate, "matrix")
  time_mlist_m2[[n]] = time_trans
  n = n+1
}

# Pick one song at random and simulate from its own transition matrices.
# Note: weight is not defined in this listing; presumably a vector of 35
# sampling weights (for example, based on songlen).
songid = sample(seq(1,35, by=1), 1, prob=weight)
pitch0 = schub_list[[songid]][1,4]
pnames = as.character(sort(unique(schub_list[[songid]]$V4)))
pitch_mc_m2 <- new("markovchain", states=pnames, transitionMatrix=pitch_mlist_m2[[songid]])
pitch_sim_m2 = markovchainSequence(n=50, markovchain=pitch_mc_m2, t0=as.character(pitch0))

schub_list[[songid]]$diff = schub_list[[songid]]$V3 - schub_list[[songid]]$V2
tnames = as.character(sort(unique(schub_list[[songid]]$diff)))
time_mc_m2 <- new("markovchain", states=tnames, transitionMatrix=time_mlist_m2[[songid]])
# time0 is the starting duration sampled in the Model 1 simulation
time_sim_m2 = markovchainSequence(n=50, markovchain=time_mc_m2, t0=as.character(time0[[1]]))

on_sim_m2 = cumsum(as.numeric(time_sim_m2))[-50]
off_sim_m2 = cumsum(as.numeric(time_sim_m2))
sim2 = data.frame(Ontime=c(0, as.numeric(on_sim_m2)), Offtime=as.numeric(off_sim_m2),
                  Pitch=as.numeric(pitch_sim_m2), Time_diff=time_sim_m2)
table_m2 = data.frame(time_sim_m2, pitch_sim_m2)
ggplot(sim2, aes(colour=Pitch)) +
  geom_segment(aes(x=Ontime, xend=Offtime, y=Pitch, yend=Pitch), size=3) +
  xlab("Time (Milliseconds)")

############### Schubert's Music Style #################
# Normality of Pitch
schub_meanpitch = mean(schub_all$Pitch) # 70.7
schub_maxpitch = max(schub_all$Pitch) #81
schub_minpitch = min(schub_all$Pitch) #57
hist(schub_all$Pitch, freq=FALSE, ylim=c(0,0.15), breaks=25,
     main="Histogram of Schubert's Pitches", xlab="Pitch")
lines(density(schub_all$Pitch, na.rm=TRUE))
x = seq(schub_minpitch, schub_maxpitch, length=500)
# Normal density with the sample mean and standard deviation, for comparison
curve(dnorm(x, mean=mean(schub_all$Pitch), sd=sqrt(var(schub_all$Pitch))),
      col="blue", add=TRUE, yaxt="n")
boxplot(schub_all$Pitch, add=TRUE, horizontal=TRUE, at=-0.0027, border="darkred",
        boxwex=0.007, outline=FALSE)
abline(v = mean(schub_all$Pitch), col = "darkred", lwd = 2)

# Normality of Pitch Diff
schub_all$Pitch_diff = c(0, diff(schub_all$Pitch))
schub_meanPitchdiff = mean(schub_all$Pitch_diff) # -0.0003 -> 0
schub_maxpitchdiff = max(schub_all$Pitch_diff) #17
schub_minpitchdiff = min(schub_all$Pitch_diff) #-16
hist(schub_all$Pitch_diff, freq=FALSE, ylim=c(0,0.25), breaks=33,
     main="Histogram of Schubert's Pitch Differences", xlab="Pitch Interval")
lines(density(schub_all$Pitch_diff, na.rm=TRUE))
x = seq(schub_minpitchdiff, schub_maxpitchdiff, length=500)
curve(dnorm(x, mean=mean(schub_all$Pitch_diff), sd=sqrt(var(schub_all$Pitch_diff))),
      col="blue", add=TRUE, yaxt="n")
boxplot(schub_all$Pitch_diff, add=TRUE, horizontal=TRUE, at=-0.005, border="darkred",
        boxwex=0.01, outline=FALSE)
abline(v = mean(schub_all$Pitch_diff), col = "red", lwd = 2)
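
The transition matrices in Section VI are fitted with markovchainFit. As a rough illustration of what such a first-order fit computes, the sketch below estimates a transition matrix in base R by row-normalizing transition counts, then simulates from it. It is only a sketch under stated assumptions: the pitch sequence is a hypothetical toy example (not the Schubert data), and the names pitches, counts, trans and simulate_chain are introduced here for illustration and are not part of the project code.

```r
# Toy pitch sequence (hypothetical MIDI note numbers, for illustration only)
pitches <- c(60, 62, 64, 62, 60, 62, 64, 65, 64, 62)

# States of the chain: every distinct pitch, as character labels
states <- as.character(sort(unique(pitches)))

# Count transitions from each note to the note that follows it
from <- factor(head(pitches, -1), levels = states)
to   <- factor(tail(pitches, -1), levels = states)
counts <- table(from, to)

# Row-normalize the counts: each row becomes a probability distribution
# over the next pitch (the maximum-likelihood estimate for a first-order chain)
trans <- prop.table(counts, margin = 1)

# Simulate n steps of the chain, starting from a given state
simulate_chain <- function(trans, start, n) {
  out <- character(n)
  current <- start
  for (i in seq_len(n)) {
    current <- sample(colnames(trans), 1, prob = trans[current, ])
    out[i] <- current
  }
  out
}

set.seed(18)
sim <- simulate_chain(trans, start = "60", n = 20)
```

In this toy sequence every pitch is followed by at least one other note, so no row of counts is empty. With real data, a pitch that appears only as the final note of a piece would give an all-zero row, which would have to be handled before normalizing.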