
Music and movement share a dynamic structure that supports universal expressions of emotion

Beau Sievers a,1, Larry Polansky b, Michael Casey b, and Thalia Wheatley a,1

a Department of Psychological and Brain Sciences and b Department of Music, Dartmouth College, Hanover, NH 03755

Edited by Dale Purves, Duke-National University of Singapore Graduate Medical School, Singapore, and approved November 7, 2012 (received for review May 28, 2012)

Music moves us. Its kinetic power is the foundation of human behaviors as diverse as dance, romance, lullabies, and the military march. Despite its significance, the music-movement relationship is poorly understood. We present an empirical method for testing whether music and movement share a common structure that affords equivalent and universal emotional expressions. Our method uses a computer program that can generate matching examples of music and movement from a single set of features: rate, jitter (regularity of rate), direction, step size, and dissonance/visual spikiness. We applied our method in two experiments, one in the United States and another in an isolated tribal village in Cambodia. These experiments revealed three things: (i) each emotion was represented by a unique combination of features, (ii) each combination expressed the same emotion in both music and movement, and (iii) this common structure between music and movement was evident within and across cultures.

cross-cultural | cross-modal

Music moves us, literally. All human cultures dance to music, and music's kinetic faculty is exploited in everything from military marches and political rallies to social gatherings and romance. This cross-modal relationship is so fundamental that in many languages the words for music and dance are often interchangeable, if not the same (1). We speak of music "moving" us and we describe emotions themselves with music and movement words like "bouncy" and "upbeat" (2). Despite its centrality to human experience, an explanation for the music-movement link has been elusive. Here we offer empirical evidence that sheds new light on this ancient marriage: music and movement share a dynamic structure.

A shared structure is consistent with several findings from research with infants. It is now well established that very young infants, even neonates (3), are predisposed to group metrically regular auditory events similarly to adults (4, 5). Moreover, infants also infer meter from movement. In one study, 7-mo-old infants were bounced in duple or triple meter while listening to an ambiguous rhythm pattern (6). When hearing the same pattern later without movement, infants preferred the pattern with intensity (auditory) accents that matched the particular metric pattern at which they were previously bounced. Thus, the perception of a "beat," established by movement or by music, transfers across modalities. Infant preferences suggest that perceptual correspondences between music and movement, at least for beat perception, are predisposed and therefore likely universal. By definition, however, infant studies do not examine whether such predispositions survive into adulthood after protracted exposure to culture-specific influences. For this reason, adult cross-cultural research provides important complementary evidence for universality.

Previous research suggests that several musical features are universal. Most of these features are low-level structural properties, such as the use of regular rhythms, preference for small-integer frequency ratios, hierarchical organization of pitches, and so on (7, 8). We suggest music's capacity to imitate biological dynamics, including emotive movement, is also universal, and that this capacity is subserved by the fundamental dynamic similarity of the domains of music and movement. Imitation of human physiological responses would help explain, for example, why "angry" music is faster and more dissonant than "peaceful" music. This capacity may also help us understand music's inductive effects: for example, the soothing power of lullabies and the stimulating, synchronizing force of military marching rhythms.

Here we present an empirical method for quantitatively comparing music and movement by leveraging the fact that both can express emotion. We used this method to test to what extent expressions of the same emotion in music and movement share the same structure; that is, whether they have the same dynamic features. We then tested whether this structure comes from biology or culture: that is, whether we are born with the predisposition to relate music and movement in particular ways, or whether these relationships are culturally transmitted. There is evidence that emotion expressed in music can be understood across cultures, despite dramatic cultural differences (9). There is also evidence that facial expressions and other emotional movements are cross-culturally universal (10–12), as Darwin theorized (13). A natural predisposition to relate emotional expression in music and movement would explain why music often appears to be cross-culturally intelligible when other fundamental cultural practices (such as verbal language) are not (14). To determine how music and movement are related, and whether that relationship is peculiar to Western culture, we ran two experiments. First, we tested our common structure hypothesis in the United States. Then we conducted a similar experiment in L'ak, a culturally isolated tribal village in northeastern Cambodia. We compared the results from both cultures to determine whether the connection between music and movement is universal. Because many musical practices are culturally transmitted, we did not expect both experiments to have precisely identical results. Rather, we hypothesized results from both cultures would differ in their details yet share core dynamic features enabling cross-cultural legibility.

General Method

We created a computer program capable of generating both music and movement; the former as simple, monophonic piano melodies, and the latter as an animated bouncing ball. Both were controlled by a single probabilistic model, ensuring there was an isomorphic relationship between the behavior of the music and the movement of the ball. This model represented both music and movement in terms of dynamic contour: how changes in the stimulus unfold over time. Our model for dynamic contour comprised five quantitative parameters controlled by on-screen slider bars. Stimuli were generated in real time, and manipulation of the slider bars resulted in immediate changes in the music being played or the animation being shown (Fig. 1).

Author contributions: B.S., L.P., and T.W. designed research; B.S. performed research; B.S., M.C., and T.W. analyzed data; B.S. and T.W. wrote the paper; and B.S. wrote the MaxMSP program and recorded the supporting media.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

1To whom correspondence may be addressed. E-mail: [email protected] or [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1209023110/-/DCSupplemental.


The five parameters corresponded to the following features: rate (as ball bounces or musical notes per minute, henceforth beats per minute or BPM), jitter (SD of interonset interval), direction of movement (ratio of downward to upward movements, controlling either pitch trajectory or ball tilt), step size (ratio of big to small movements, controlling pitch interval size or ball bounce height), and finally consonance/smoothness [quantified using Huron's (15) aggregate dyadic consonance measure and mapped to surface texture].

The settings for each of these parameters affected both the music and the movement such that certain musical features were guaranteed to correspond with certain movement features. The rate and jitter sliders controlled the rate and variation in interonset interval of events in both modalities. The overall contour of each melody or bounce sequence was determined by the combined positions of the direction of movement and step-size sliders. Absolute pitch position corresponded to the extent to which the ball was tilted forward or backward. Low pitches corresponded with the ball "leaning" forward, as though looking toward the ground, and high pitches corresponded with "leaning" backward, looking toward the sky. For music, the consonance slider controlled the selection of one of 38 possible 5-note scales, selected from the 12-note Western chromatic scale and sorted in order by their aggregate dyadic consonance (15) (SI Text). For movement, the consonance slider controlled the visual spikiness of the ball's surface. Dissonant intervals in the music corresponded to increases in the spikiness of the ball, and consonant intervals smoothed out its surface. Spikiness was dynamic in the sense that it was perpetually changing because of the probabilistic and continuously updating nature of the program; it did not influence the bouncing itself. Our choice of spikiness as a visual analog of auditory dissonance was inspired by the concept of auditory "roughness" described by Parncutt (16). We understand "roughness" as an apt metaphor for the experience of dissonance. We did not use Parncutt's method for calculating dissonance based on auditory beating (17, 18). To avoid imposing a particular physical definition of dissonance, we used values derived directly from aggregated empirical reports of listener judgments (15). Spikiness was also inspired by nonarbitrary mappings between pointed shapes and unrounded vowels (e.g., "kiki") and between rounded shapes and rounded sounds (e.g., "bouba"; refs. 19–21). The dissonance-spikiness mapping was achieved by calculating the dissonance of the melodic interval corresponding to each bounce, and dynamically scaling the spikiness of the surface of the ball proportionately.

Three of these parameters are basic dynamic properties: speed (BPM), direction, and step size. Regularity (jitter) and smoothness were added because of psychological associations with emotion [namely, predictability and tension (22)] that were not already captured by speed, step size, and direction. The number of features (five) was based on the intuition that this number created a large enough possibility space to provide a proof-of-concept test of the shared structure hypothesis without becoming unwieldy for participants. We do not claim that these five features optimally characterize the space.

These parameters were selected to accommodate the production of specific features previously identified with musical emotion (2). In the music domain, this set of features can be grouped as "timing" features (tempo, jitter) and "pitch" features (consonance, step size, and direction). Slider bars were presented with text labels indicating their function (SI Text). Each of the cross-modal (music-movement) mappings represented by these slider bars constituted a hypothesis about the relationship between music and movement. That is, based on their uses in emotional music and movement (2, 23), we hypothesized that rate, jitter, direction of movement, and step size have equivalent emotional function in both music and movement. Additionally, we hypothesized that both dissonance and spikiness would have negative valence, and that in equivalent cross-modal emotional expressions the magnitude of one would be positively correlated with the magnitude of the other.
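To make the mapping concrete, the sketch below shows how one set of slider values could drive a single shared probabilistic path that is then rendered either as pitches or as ball movement. This is an illustrative Python sketch under assumed names and scalings, not the authors' Max/MSP program; only the five features themselves come from the paper.

```python
# Illustrative sketch (not the original program): one five-parameter state
# drives one probabilistic "path," which is rendered either as a melody
# (pitch steps) or as ball movement (tilt), keeping the modalities isomorphic.
# All names and scalings here are hypothetical.
import random
from dataclasses import dataclass

@dataclass
class Sliders:
    rate_bpm: float        # events per minute (notes or bounces)
    jitter: float          # 0..0.99, irregularity of inter-onset timing
    downward_ratio: float  # probability a step moves down rather than up
    big_step_ratio: float  # probability of a large step vs. a small one
    consonance: float      # 0..1; scale choice (music) or surface spikiness (movement)

def generate_path(s: Sliders, n_events: int = 16, seed: int = 0) -> list[int]:
    """One shared sequence of positions on a discrete number line."""
    rng = random.Random(seed)
    pos, path = 0, []
    for _ in range(n_events):
        step = 2 if rng.random() < s.big_step_ratio else 1
        pos += -step if rng.random() < s.downward_ratio else step
        path.append(pos)
    return path

def render_music(path: list[int], base_midi_note: int = 60) -> list[int]:
    return [base_midi_note + p for p in path]   # pitch trajectory

def render_movement(path: list[int]) -> list[float]:
    return [p * 0.1 for p in path]              # ball tilt, arbitrary units

if __name__ == "__main__":
    sad_like = Sliders(46, 0.1, 0.8, 0.2, 0.5)  # hypothetical "sad"-like settings
    path = generate_path(sad_like, seed=1)
    print(render_music(path)[:5], render_movement(path)[:5])
```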

United States

Methods. Our first experiment took place in the United States with a population of college students. Participants (n = 50) were divided into two groups, music (n = 25) and movement (n = 25). Each participant completed the experiment individually and without knowledge of the other group. That is, each participant was told about either the music or the movement capability of the program, but not both.

After the study was described to the participants, written informed consent was obtained. Participants were given a brief demonstration of the computer program, after which they were allowed unlimited time to get used to the program through undirected play. At the beginning of this session, the slider bars were automatically set to random positions. Participants ended the play session by telling the experimenter that they were ready to begin the experiment. The duration of play was not recorded, but the modal duration was ∼5–10 min. To begin a melody or movement sequence, participants pressed the space bar on a computer keyboard. The music and movement output were continuously updated based on the slider bar positions such that participants could see (or hear) the results of their efforts as they moved the bars. Between music sequences, there was silence. Between movement sequences, the ball would hold still in its final position before resetting to a neutral position at the beginning of the next sequence.

After indicating they were ready to begin the experiment, participants were instructed to take as much time as needed to use the program to express five emotions: "angry," "happy," "peaceful," "sad," and "scared." Following Hevner (24), each emotion word was presented at the top of a block of five words with roughly the same meaning (SI Text). These word clusters were present on the screen throughout the entire duration of the experiment. Participants could work on each of these emotions in any order, clicking on buttons to save or reload slider bar settings for any emotion at any time. Only the last example of any emotion saved by the participant was used in our analyses. Participants could use all five sliders throughout the duration of the experiment, and no restrictions were placed on the order in which the sliders were used. For example, participants were free to begin using the tempo slider, then switch to the dissonance slider at any time, then back to the tempo slider again, and so on. In practice, participants constantly switched between the sliders, listening or watching the aggregate effect of all slider positions on the melody or ball movement.

Results. The critical question was whether subjects who used music to express an emotion set the slider bars to the same positions as subjects who expressed the same emotion with the moving ball.

Fig. 1. Paradigm. Participants manipulated five slider bars corresponding to five dynamic features to create either animations or musical clips that expressed different emotions.


To answer this question, the positions of the sliders for each modality (music vs. movement) and each emotion were analyzed using multiway ANOVA. Emotion had the largest main effect on slider position [F(2.97, 142.44) = 185.56, P < 0.001, partial η² = 0.79]. Partial η² reflects how much of the overall variance (effect plus error) in the dependent variable is attributable to the factor in question. Thus, 79% of the overall variance in where participants placed the slider bars was attributable to the emotion they were attempting to convey. This main effect was qualified by an Emotion × Slider interaction indicating each emotion required different slider settings [F(4.81, 230.73) = 112.90, P < 0.001; partial η² = 0.70].

Although we did find a significant main effect of Modality (music vs. movement) [F(1,48) = 4.66, P < 0.05], it was small (partial η² = 0.09) and did not interact with Emotion [Emotion × Modality: F(2.97, 142.44) = 0.97, P > 0.4; partial η² = 0.02]. This finding indicates slider bar settings for music and movement were slightly different from each other, regardless of the emotion being represented. We also found a three-way interaction between Slider, Emotion, and Modality. This interaction was significant but modest [F(4.81, 230.73) = 4.50, P < 0.001; partial η² = 0.09], and can be interpreted as a measure of the extent to which music and movement express different emotions with different patterns of dynamic features.
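As a minimal illustration of the statistic just described (not the authors' analysis code), partial eta-squared can be computed from the effect and error sums of squares; the values below are placeholders chosen only to echo the reported 0.79.

```python
# Partial eta-squared as defined above: the share of effect-plus-error
# variance attributable to a factor. Sums of squares here are placeholders.
def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    return ss_effect / (ss_effect + ss_error)

print(partial_eta_squared(79.0, 21.0))  # 0.79
```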

To investigate the similarity of emotional expressions, we conducted a Euclidean distance-based clustering analysis. This analysis revealed a cross-modal, emotion-based structure (Fig. 2).

These results strongly suggest the presence of a common structure. That is, within this experiment, rate, jitter, step size, and direction of movement functioned the same way in emotional music and movement, and aggregate dyadic dissonance was functionally analogous to visual spikiness. For our United States population, music and movement shared a cross-modal expressive code.
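The clustering analysis behind Fig. 2 can be sketched as follows, assuming one mean slider vector per emotion-by-modality cell and fusing clusters by mean (average-linkage) Euclidean distance, as described in the Fig. 2 caption. The numeric values are placeholders, not the study's data.

```python
# Sketch of an average-linkage Euclidean clustering over mean slider settings
# per (emotion, modality) cell. The numbers below are placeholders.
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import pdist

labels = ["angry-music", "angry-move", "happy-music", "happy-move",
          "sad-music", "sad-move"]
mean_settings = np.array([   # columns: rate, jitter, direction, step, consonance
    [0.9, 0.6, 0.8, 0.7, 0.2],
    [0.9, 0.5, 0.8, 0.8, 0.2],
    [0.8, 0.2, 0.3, 0.6, 0.9],
    [0.8, 0.3, 0.3, 0.6, 0.9],
    [0.2, 0.1, 0.8, 0.2, 0.6],
    [0.2, 0.2, 0.8, 0.3, 0.6],
])

# "average" linkage fuses clusters by the mean pairwise Euclidean distance
# between their members, matching the fusion rule stated in the caption.
Z = linkage(pdist(mean_settings, metric="euclidean"), method="average")
print([labels[i] for i in leaves_list(Z)])  # leaf order groups similar expressions
```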

Cambodia

Methods. We conducted our second experiment in L'ak, a rural village in Ratanakiri, a sparsely populated province in northeastern Cambodia. L'ak is a Kreung ethnic minority village that has maintained a high degree of cultural isolation. (For a discussion of the possible effects of modernization on L'ak, see SI Text.) In Kreung culture, music and dance occur primarily as a part of rituals, such as weddings, funerals, and animal sacrifices (25). Kreung music is formally dissimilar to Western music: it has no system of vertical pitch relations equivalent to Western tonal harmony, is constructed using different scales and tunings, and is performed on morphologically dissimilar instruments. For a brief discussion of the musical forms we observed during our visit, see SI Text.

The experiment we conducted in L'ak proceeded in the same manner as the United States experiment, except for a few modifications made after pilot testing. Initially, because most of the participants were illiterate, we simply removed the text labels from the sliders. However, in our pilot tests we found participants had difficulty remembering the function of each slider during the movement task. We compensated by replacing the slider labels for the movement task with pictures (SI Text). Instructions were conveyed verbally by a translator. (For a discussion of the translation of emotion words, see SI Text.) None of the participants had any experience with computers, so the saving/loading functionality of the program was removed. Whereas the United States participants were free to work on any of the five emotions throughout the experiment, the Kreung participants worked out each emotion one-by-one in a random order. There were no required repetitions of trials. However, when Kreung subjects requested to work on a different emotion than the one assigned, or to revise an emotion they had already worked on, that request was always granted. As with the United States experiment, we always used the last example of any emotion chosen by the participant. Rather than using a mouse, participants used a hardware MIDI controller (Korg nanoKontrol) to manipulate the sliders on the screen (Fig. 3A).

When presented with continuous sliders as in the United States experiment, many participants indicated they were experiencing decision paralysis and could not complete the task. To make the task comfortable and tractable we discretized the sliders, limiting each to three positions: low, medium, and high (SI Text). As with the United States experiment, participants were split into separate music (n = 42) and movement (n = 43) groups.

Results: Universal Structure in Music and Movement

There were two critical questions for the cross-cultural analysis: (i) Are emotional expressions universally cross-modal? and (ii) Are emotional expressions similar across cultures? The first question asks whether participants who used music to express an emotion set the slider bars to the same positions as participants who expressed the same emotion with the moving ball. This question does not examine directly whether particular emotional expressions are universal. A cross-modal result could be achieved even if different cultures have different conceptions of the same emotion (e.g., "happy" could be upward and regular in music and movement for the United States, but downward and irregular in music and movement for the Kreung). The second question asks whether each emotion (e.g., "happy") is expressed similarly across cultures in music, movement, or both.

To compare the similarity of the Kreung results to the United States results, we conducted three analyses. All three analyses required the United States and Kreung data to be in a comparable format; this was accomplished by making the United States data discrete. Each slider setting was assigned a value of low, medium, or high in accordance with the nearest value used in the Kreung experiment. The following sections detail these three analyses.
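The discretization step can be sketched as follows: each continuous United States slider value is mapped to whichever of the three Kreung positions is nearest. Only the 55 BPM "low" rate position mentioned later in the text is taken from the paper; the other anchor values are assumptions for illustration.

```python
# Sketch of discretizing continuous United States slider settings to the three
# positions used in the Kreung experiment (nearest of low/medium/high).
# Anchor values are placeholders except the "low" rate position of 55 BPM.
def discretize(value: float, anchors: dict[str, float]) -> str:
    """Return the label of the anchor closest to `value`."""
    return min(anchors, key=lambda label: abs(anchors[label] - value))

rate_anchors = {"low": 55.0, "medium": 215.0, "high": 400.0}  # BPM; medium/high assumed
print(discretize(46.0, rate_anchors))  # "low", as in the "sad" example later in the text
```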

Fig. 2. Music-movement similarity structure in the United States data. Clusters are fused based on the mean Euclidean distance between members. The data cluster into a cross-modal, emotion-based structure.

Fig. 3. (A) Kreung participants used a MIDI controller to manipulate the slider bar program. (B) L'ak village debriefing at the conclusion of the study.


See SI Text for additional analyses, including a linear discriminant analysis examining the importance of each feature (slider) in distinguishing any given emotion from the other emotions.

ANOVA. We z-scored the data for each parameter (slider) separately within each population. We then combined all z-scored data into a single, repeated-measures ANOVA with Emotion and Sliders as within-subjects factors and Modality and Population as between-subjects factors. Emotion had the largest main effect on slider position [F(3.76, 492.23) = 40.60, P < 0.001, partial η² = 0.24], accounting for 24% of the overall (effect plus error) variance. There were no significant main effects of Modality (music vs. movement) [F(1,131) = 0.004, P = 0.95] or Population (United States, Kreung) [F(1,131) < 0.001, P = 0.99] and no interaction between the two [F(1,131) = 1.15, P = 0.29].

This main effect of Emotion was qualified by an Emotion × Sliders interaction, indicating each emotion was expressed by different slider settings [F(13.68, 1791.80) = 38.22, P < 0.001; partial η² = 0.23]. Emotion also interacted with Population, albeit more modestly [F(3.76, 492.23) = 11.53, P < 0.001, partial η² = 0.08], and both were qualified by the three-way Emotion × Sliders × Population interaction, accounting for 7% of the overall variance in slider bar settings [F(13.68, 1791.80) = 10.13, P < 0.001, partial η² = 0.07]. This three-way interaction can be understood as how much participants' different emotion configurations could be predicted by their population identity. See SI Text for the z-scored means.

Emotion also interacted with Modality [F(3.76, 492.23) = 2.84, P = 0.02; partial η² = 0.02], which was qualified by the three-way Emotion × Modality × Sliders [F(13.68, 1791.80) = 4.92, P < 0.001; partial η² = 0.04] and the four-way Emotion × Modality × Sliders × Population interactions [F(13.68, 1791.80) = 2.8, P < 0.001; partial η² = 0.02]. All of these Modality interactions were modest, accounting for between 2% and 4% of the overall variance.

In summary, the ANOVA revealed that the slider bar configurations depended most strongly on the emotion being conveyed (Emotion × Slider interaction, partial η² = 0.23), with significant but small influences of modality and population (partial η²'s < 0.08).

Monte Carlo Simulation. Traditional ANOVA is well-suited to detecting mean differences given a null hypothesis that the means are the same. However, this test cannot capture the similarity between populations given the size of the possibility space. One critical advance of the present paradigm is that it allowed participants to create different emotional expressions within a large possibility space. Analogously, an ANOVA on distance would show that Boston and New York City do not share geographic coordinates, thereby rejecting the null hypothesis that these cities occupy the same space. Such a comparison would not test how close Boston and New York City are compared with distances between either city and every other city across the globe (i.e., relative to the entire possibility space). To determine the similarity between Kreung and United States data given the entire possibility space afforded by the five sliders, we ran a Monte Carlo simulation. The null hypothesis of this simulation was that there are no universal perceptual constraints on music-movement-emotion association, and that individual cultures may create music and movement anywhere within the possibility space. Showing instead that differences between cultures are small relative to the size of the possibility space strongly suggests music-movement-emotion associations are subject to biological constraints.

We represented the mean results of each experiment as a 25-dimensional vector (five emotions × five sliders), where each dimension has a range from 0.0 to 2.0. The goal of the Monte Carlo simulation was to see how close these two vectors were to each other relative to the size of the space they both occupy. To do this, we sampled the space uniformly at random, generating one million pairs of 25-dimensional vectors. Each of these vectors represented a possible outcome of the experiment. We measured the Euclidean distance between each pair to generate a distribution of intervector distances (mean = 8.11, SD = 0.97). The distance between the Kreung and United States mean result vectors was 4.24, which was 3.98 SDs away from the mean. Out of one million vector pairs, fewer than 30 pairs were this close together, suggesting it is highly unlikely that the similarities between the Kreung and United States results were because of chance (Fig. 4).

Taken together, the ANOVA and Monte Carlo simulation revealed that the Kreung and United States data were remarkably similar given the possibility space, and that the combined data were best predicted by the emotions being conveyed and least predicted by the modality used. The final analysis examined Euclidean distances between Kreung and United States data for each emotion separately.
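The logic of the simulation can be sketched as follows: draw random pairs of 25-dimensional result vectors, build a distribution of pairwise Euclidean distances, and locate the observed cross-cultural distance of 4.24 within it. This is not the original analysis code, and because the exact numerical coding of each slider dimension determines the distribution, the uniform 0.0–2.0 coding assumed here will not necessarily reproduce the reported mean of 8.11 and SD of 0.97.

```python
# Sketch of the Monte Carlo logic described above, not the original code.
# Per-dimension coding (here: uniform on 0.0-2.0) is an assumption, so the
# resulting distribution need not match the reported mean 8.11 / SD 0.97.
import numpy as np

rng = np.random.default_rng(0)
dims = 25                                   # five emotions x five sliders
n_pairs, chunk = 1_000_000, 100_000
observed = 4.24                             # Kreung vs. United States distance

dists = []
for _ in range(n_pairs // chunk):           # chunked to keep memory modest
    a = rng.uniform(0.0, 2.0, size=(chunk, dims))
    b = rng.uniform(0.0, 2.0, size=(chunk, dims))
    dists.append(np.linalg.norm(a - b, axis=1))
dists = np.concatenate(dists)

z = (observed - dists.mean()) / dists.std()
print(f"mean={dists.mean():.2f} sd={dists.std():.2f} z={z:.2f} "
      f"pairs at least as close as observed: {(dists <= observed).sum()}")
```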

Cross-Cultural Similarity by Emotion: Euclidean Distance. For this analysis, we derived emotional "prototypes" from the results of the United States experiment. This derivation was accomplished by selecting the median value for each slider for each emotion (for music and movement combined) from the United States results and mapping those values to the closest Kreung setting of "low," "medium," and "high." For example, the median rate for "sad" was 46 BPM in the United States sample. This BPM was closest to the "low" setting used in the Kreung paradigm (55 BPM). Using this method for all five sliders, the "sad" United States prototype was: low rate, low jitter, medium consonance, low ratio of big to small movements, and high ratio of downward to upward movements. We measured the similarity of each of the Kreung data points to the corresponding United States prototype by calculating the Euclidean distance between them.

For every emotion except "angry," this distance analysis revealed that Kreung results (for music and movement combined) were closer to the matching United States emotional prototypes than they were to any of the other emotional prototypes. In other words, the Kreung participants' idea of "sad" was more similar to the United States "sad" prototype than to any other emotional prototype, and this cross-cultural congruence was observed for all emotions except "angry." This pattern also held for the movement results when considered separately from music. When the music results were evaluated alone, three of the five emotions (happy, sad, and scared) were closer to the matching United States prototype than any nonmatching prototypes.


Fig. 4. Distribution of distances in the Monte Carlo simulation. Bold black line indicates where the similarity of United States and Kreung datasets falls in this distribution.


The three Kreung emotional expressions that were not closest to their matching United States prototypes were "angry" movement, "angry" music, and "peaceful" music; however, these had several matching parameters. For both cultures, "angry" music and "angry" movement were fast and downward. Although it was closer to the United States "scared" prototype, Kreung "angry" music matched the United States "angry" prototype in four of five parameters. Kreung "peaceful" music was closest to the United States "happy" prototype, and second closest to the United States "peaceful" prototype. In both cultures, "happy" music was faster than "peaceful" music, and "happy" movement was faster than "peaceful" movement.
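A sketch of the prototype comparison is shown below: each United States prototype is a vector of per-slider medians mapped to the discrete codes low = 0, medium = 1, high = 2, and a Kreung response is assigned to whichever prototype is nearest in Euclidean distance. The "sad" prototype follows the values given above; the other prototypes and the example response are placeholders.

```python
# Sketch of the prototype-distance comparison described above. Prototypes are
# coded low=0, medium=1, high=2; only the "sad" prototype comes from the text.
import numpy as np

sliders = ["rate", "jitter", "consonance", "big_step_ratio", "downward_ratio"]
us_prototypes = {
    "sad":   np.array([0, 0, 1, 0, 2]),   # low, low, medium, low, high (from text)
    "happy": np.array([2, 0, 2, 1, 0]),   # placeholder values
    "angry": np.array([2, 2, 0, 2, 2]),   # placeholder values
}

def nearest_prototype(kreung_response: np.ndarray) -> str:
    """Return the emotion label of the closest United States prototype."""
    return min(us_prototypes,
               key=lambda emotion: np.linalg.norm(us_prototypes[emotion] - kreung_response))

print(nearest_prototype(np.array([0, 1, 1, 0, 2])))  # hypothetical Kreung "sad" response
```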

Discussion

These data suggest two things. First, the dynamic features of emotion expression are cross-culturally universal, at least for the five emotions tested here. Second, these expressions have similar dynamic contours in both music and movement. That is, music and movement can be understood in terms of a single dynamic model that shares features common to both modalities. This ability is made possible not only by the existence of prototypical emotion-specific dynamic contours, but also by isomorphic structural relationships between music and movement.

The natural coupling of music and movement has been suggested by a number of behavioral experiments with adults. Friberg and Sundberg observed that the deceleration dynamics of a runner coming to a stop accurately characterize the final slowing at the end of a musical performance (26). People also prefer to tap to music at tempos associated with natural types of human movement (27), and common musical tempi appear to be close to some biological rhythms of the human body, such as the heartbeat and normal gait. Indeed, people synchronize the tempo of their walking with the tempo of the music they hear (but not to a similarly paced metronome), with optimal synchronization occurring around 120 BPM, a common tempo in music and walking. This finding led the authors to suggest that the "perception of musical pulse is due to an internalization of the locomotion system" (28), consistent more generally with the concept of embodied music cognition (29).

The embodiment of musical meter presumably recruits the putative mirror system comprised of regions that coactivate for perceiving and performing action (30). Consistent with this hypothesis, studies have demonstrated neural entrainment to beat (31, 32) indexed by beat-synchronous β-oscillations across auditory and motor cortices (31). This basic sensorimotor coupling has been described as creating a pleasurable feeling of being "in the groove" that links music to emotion (33, 34).

The capacity to imitate biological dynamics may also be expressed in nonverbal emotional vocalizations (prosody). Several studies have demonstrated better than chance cross-cultural recognition of several emotions from prosodic stimuli (35–39). Furthermore, musical expertise improves discrimination of tonal variations in languages such as Mandarin Chinese, suggesting common perceptual processing of pitch variations across music and language (40). It is thus possible, albeit to our knowledge not tested, that prosody shares the dynamic structure evinced here by music and movement. However, cross-modal fluency between music and movement may be particularly strong because of the more readily identifiable pitch contours and metric structure in music compared with speech (4, 41).

The close relationship between music and movement has attracted significant speculative attention from composers, musicologists, and philosophers (42–46). Only relatively recently have scientists begun studying the music-movement relationship empirically (47–50). This article addresses several limitations in this literature. First, using the same statistical model to generate music and movement stimuli afforded direct comparisons previously impossible because of different methods of stimulus creation. Second, modeling lower-level dynamic parameters (e.g., consonance), rather than higher-level constructs, decreased the potential for cultural bias (e.g., major/minor). Finally, by creating emotional expressions directly rather than rating a limited set of stimuli prepared in advance, participants could explore the full breadth of the possibility space.

Fitch (51) describes the human musical drive as an "instinct to learn," which is shaped by universal proclivities and constraints. Within the range of these constraints "music is free to 'evolve' as a cultural entity, together with the social practices and contexts of any given culture." We theorize that part of the "instinct to learn" is a proclivity to imitate. Although the present study focuses on emotive movement, music across the world imitates many other phenomena, including human vocalizations, birdsong, the sounds of insects, and the operation of tools and machinery (52–54).

We do not claim that the dynamic features chosen here describe the emotional space optimally; there are likely to be other useful features as well as higher-level factors that aggregate across features (37, 55–56). We urge future research to test other universal, cross-modal correspondences. To this end, labanotation, a symbolic language for notating dance, may be a particularly fruitful source. Based on general principles of human kinetics (57), labanotation scripts speed (rate), regularity, size, and direction of movement, as well as "shape forms" consistent with smoothness/spikiness. Other Laban features not represented here, but potentially useful for emotion recognition, include weight and symmetry. It may also be fruitful to test whether perceptual tendencies documented in one domain extend across domains. For example, innate (and thus likely universal) auditory preferences include: seven or fewer pitches per octave, consonant intervals, scales with unequal spacing between pitches (facilitating hierarchical pitch organization), and binary timing structures (see refs. 14 and 58 for reviews). Infants are also sensitive to hierarchical pitch organization (5) and melodic transpositions (see ref. 59 for a review). These auditory sensitivities may be the result of universal proclivities and constraints with implications extending beyond music to other dynamic domains, such as movement. Our goal was simply to use a small set of dynamic features that describe the space well enough to provide a test of cross-modal and cross-cultural similarity. Furthermore, although these dynamic features describe the space of emotional expression for music and movement, the present study does not address whether these features describe the space of emotional experience (60, 61).

Our model should not be understood as circumscribing the limits of emotional expression in music. Imitation of movement is just one way among many in which music may express emotion, as cultural conventions may develop independently of evolved proclivities. This explanation allows for cross-cultural consistency yet preserves the tremendous diversity of musical traditions around the world. Additionally, we speculate that, across cultures, musical forms will vary in terms of how emotions and their related physical movements are treated differentially within each cultural context. Similarly, the forms of musical instruments and the substance of musical traditions may in turn influence differential cultural treatment of emotions and their related physical movements. This interesting direction for further research will require close collaboration with ethnomusicologists and anthropologists.

By studying universal features of music we can begin to map its evolutionary history (14). Specifically, understanding the cross-modal nature of musical expression may in turn help us understand why and how music came to exist. That is, if music and movement have a deeply interwoven, shared structure, what does that shared structure afford and how has it affected our evolutionary path? For example, Homo sapiens is the only species that can follow precise rhythmic patterns that afford synchronized group behaviors, such as singing, drumming, and dancing (14). Homo sapiens is also the only species that forms cooperative alliances between groups that extend beyond consanguineal ties (62). One way to form and strengthen these social bonds may be through music: specifically, the kind of temporal and affective entrainment that music evokes from infancy (63). In turn, these musical entrainment-based bonds may be the basis for Homo sapiens' uniquely flexible sociality (64). If this is the case, then our evolutionary understanding of music is not simply reducible to the capacity for entrainment. Rather, music is the arena in which this and other capacities participate in determining evolutionary fitness.


The shared structure of emotional music and movement must be reflected in the organization of the brain. Consistent with this view, music and movement appear to engage shared neural substrates, such as those recruited by time-keeping and sequence learning (31, 65, 66). Dehaene and Cohen (67) offer the term "neuronal recycling" to describe how late-developing cultural abilities, such as reading and arithmetic, come into existence by repurposing brain areas evolved for older tasks. Dehaene and Cohen suggest music "recycles" or makes use of premusical representations of pitch, rhythm, and timbre. We hypothesize that this explanation can be pushed a level deeper: neural representations of pitch, rhythm, and timbre likely recycle brain areas evolved to represent and engage with spatiotemporal perception and action (movement, speech). Following this line of thinking, music's expressivity may ultimately be derived from the evolutionary link between emotion and human dynamics (12).

ACKNOWLEDGMENTS. We thank Dan Wegner, Dan Gilbert, and Jonathan Schooler for comments on previous drafts; George Wolford for statistical guidance; Dan Leopold for help collecting the United States data; and the Ratanakiri Ministry of Culture, Ockenden Cambodia, and Cambodian Living Arts for facilitating visits to L'ak and for assistance with Khmer-Kreung translation, as well as Trent Walker for English-Khmer translation. We also thank the editor and two anonymous reviewers for providing us with constructive comments and suggestions that improved the paper. This research was supported in part by a McNulty grant from The Nelson A. Rockefeller Center (to T.W.) and a Foreign Travel award from The John Sloan Dickey Center for International Understanding (to T.W.).

1. Baily J (1985) Musical Structure and Cognition, eds Howell P, Cross I, West R (Academic, London).
2. Juslin PN, Laukka P (2004) Expression, perception, and induction of musical emotions: A review and a questionnaire study of everyday listening. J New Music Res 33(3):217–238.
3. Winkler I, Háden GP, Ladinig O, Sziller I, Honing H (2009) Newborn infants detect the beat in music. Proc Natl Acad Sci USA 106(7):2468–2471.
4. Zentner MR, Eerola T (2010) Rhythmic engagement with music in infancy. Proc Natl Acad Sci USA 107(13):5768–5773.
5. Bergeson TR, Trehub SE (2006) Infants' perception of rhythmic patterns. Music Percept 23(4):345–360.
6. Phillips-Silver J, Trainor LJ (2005) Feeling the beat: Movement influences infant rhythm perception. Science 308(5727):1430.
7. Trehub S (2000) The Origins of Music, Chapter 23, eds Wallin NL, Merker B, Brown S (MIT Press, Cambridge, MA).
8. Higgins KM (2006) The cognitive and appreciative import of musical universals. Rev Int Philos 2006/4(238):487–503.
9. Fritz T, et al. (2009) Universal recognition of three basic emotions in music. Curr Biol 19:1–4.
10. Ekman P (1993) Facial expression and emotion. Am Psychol 48(4):384–392.
11. Izard CE (1994) Innate and universal facial expressions: Evidence from developmental and cross-cultural research. Psychol Bull 115(2):288–299.
12. Scherer KR, Banse R, Wallbott HG (2001) Emotion inferences from vocal expression correlate across languages and cultures. J Cross Cult Psychol 32(1):76–92.
13. Darwin C (2009) The Expression of the Emotions in Man and Animals (Oxford Univ Press, New York).
14. Brown S, Jordania J (2011) Universals in the world's musics. Psychol Music, 10.1177/0305735611425896.
15. Huron D (1994) Interval-class content in equally tempered pitch-class sets: Common scales exhibit optimum tonal consonance. Music Percept 11(3):289–305.
16. Parncutt R (1989) Harmony: A Psychoacoustical Approach (Springer, Berlin).
17. McDermott JH, Lehr AJ, Oxenham AJ (2010) Individual differences reveal the basis of consonance. Curr Biol 20(11):1035–1041.
18. Bidelman GM, Heinz MG (2011) Auditory-nerve responses predict pitch attributes related to musical consonance-dissonance for normal and impaired hearing. J Acoust Soc Am 130(3):1488–1502.
19. Köhler W (1929) Gestalt Psychology (Liveright, New York).
20. Ramachandran VS, Hubbard EM (2001) Synaesthesia: A window into perception, thought and language. J Conscious Stud 8(12):3–34.
21. Maurer D, Pathman T, Mondloch CJ (2006) The shape of boubas: Sound-shape correspondences in toddlers and adults. Dev Sci 9(3):316–322.
22. Krumhansl CL (2002) Music: A link between cognition and emotion. Curr Dir Psychol Sci 11(2):45–50.
23. Bernhardt D, Robinson P (2007) Affective Computing and Intelligent Interaction, eds Paiva A, Prada R, Picard RW (Springer, Berlin), pp 59–70.
24. Hevner K (1936) Experimental studies of the elements of expression in music. Am J Psychol 48(2):246–268.
25. United Nations Development Programme Cambodia (2010) Kreung Ethnicity: Documentation of Customary Rules (UNDP Cambodia, Phnom Penh, Cambodia). Available at www.un.org.kh/undp/media/files/Kreung-indigenous-people-customary-rules-Eng.pdf. Accessed June 29, 2011.
26. Friberg A, Sundberg J (1999) Does music performance allude to locomotion? A model of final ritardandi derived from measurements of stopping runners. J Acoust Soc Am 105(3):1469–1484.
27. Moelants D, Van Noorden L (1999) Resonance in the perception of musical pulse. J New Music Res 28(1):43–66.
28. Styns F, van Noorden L, Moelants D, Leman M (2007) Walking on music. Hum Mov Sci 26(5):769–785.
29. Leman M (2007) Embodied Music Cognition and Mediation Technology (MIT Press, Cambridge, MA).
30. Molnar-Szakacs I, Overy K (2006) Music and mirror neurons: From motion to 'e'motion. Soc Cogn Affect Neurosci 1(3):235–241.
31. Fujioka T, Trainor LJ, Large EW, Ross B (2012) Internalized timing of isochronous sounds is represented in neuromagnetic β oscillations. J Neurosci 32(5):1791–1802.
32. Nozaradan S, Peretz I, Missal M, Mouraux A (2011) Tagging the neuronal entrainment to beat and meter. J Neurosci 31(28):10234–10240.
33. Janata P, Tomic ST, Haberman JM (2012) Sensorimotor coupling in music and the psychology of the groove. J Exp Psychol Gen 141(1):54–75.
34. Koelsch S, Siebel WA (2005) Towards a neural basis of music perception. Trends Cogn Sci 9(12):578–584.
35. Bryant GA, Barrett HC (2008) Vocal emotion recognition across disparate cultures. J Cogn Cult 8(1-2):135–148.
36. Sauter DA, Eisner F, Ekman P, Scott SK (2010) Cross-cultural recognition of basic emotions through nonverbal emotional vocalizations. Proc Natl Acad Sci USA 107(6):2408–2412.
37. Banse R, Scherer KR (1996) Acoustic profiles in vocal emotion expression. J Pers Soc Psychol 70(3):614–636.
38. Thompson WF, Balkwill LL (2006) Decoding speech prosody in five languages. Semiotica 2006(158):407–424.
39. Elfenbein HA, Ambady N (2002) On the universality and cultural specificity of emotion recognition: A meta-analysis. Psychol Bull 128(2):203–235.
40. Marie C, Delogu F, Lampis G, Belardinelli MO, Besson M (2011) Influence of musical expertise on segmental and tonal processing in Mandarin Chinese. J Cogn Neurosci 23(10):2701–2715.
41. Zatorre RJ, Baum SR (2012) Musical melody and speech intonation: Singing a different tune. PLoS Biol 10(7):e1001372.
42. Smalley D (1996) The listening imagination: Listening in the electroacoustic era. Contemp Music Rev 13(2):77–107.
43. Susemihl F, Hicks RD (1894) The Politics of Aristotle (Macmillan, London), p 594.
44. Meyer LB (1956) Emotion and Meaning in Music (Univ of Chicago Press, Chicago, IL).
45. Truslit A (1938) Gestaltung und Bewegung in der Musik [Shape and Movement in Music] (Chr Friedrich Vieweg, Berlin-Lichterfelde). German.
46. Iyer V (2002) Embodied mind, situated cognition, and expressive microtiming in African-American music. Music Percept 19(3):387–414.
47. Gagnon L, Peretz I (2003) Mode and tempo relative contributions to "happy-sad" judgements in equitone melodies. Cogn Emotion 17(1):25–40.
48. Eitan Z, Granot RY (2006) How music moves: Musical parameters and listeners' images of motion. Music Percept 23(3):221–247.
49. Juslin PN, Lindström E (2010) Musical expression of emotions: Modelling listeners' judgments of composed and performed features. Music Anal 29(1-3):334–364.
50. Phillips-Silver J, Trainor LJ (2007) Hearing what the body feels: Auditory encoding of rhythmic movement. Cognition 105(3):533–546.
51. Fitch WT (2006) On the biology and evolution of music. Music Percept 24(1):85–88.
52. Fleming W (1946) The element of motion in baroque art and music. J Aesthet Art Crit 5(2):121–128.
53. Roseman M (1984) The social structuring of sound: The Temiar of peninsular Malaysia. Ethnomusicology 28(3):411–445.
54. Ames DW (1971) Taaken sàmàarii: A drum language of Hausa youth. Africa 41(1):12–31.
55. Vines BW, Krumhansl CL, Wanderley MM, Dalca IM, Levitin DJ (2005) Dimensions of emotion in expressive musical performance. Ann N Y Acad Sci 1060:462–466.
56. Russell JA (1980) A circumplex model of affect. J Pers Soc Psychol 39(6):1161–1178.
57. Laban R (1975) Laban's Principles of Dance and Movement Notation, 2nd edition, ed Lange R (MacDonald and Evans, London).
58. Stalinski SM, Schellenberg EG (2012) Music cognition: A developmental perspective. Top Cogn Sci 4(4):485–497.
59. Trehub SE, Hannon EE (2006) Infant music perception: Domain-general or domain-specific mechanisms? Cognition 100(1):73–99.
60. Gabrielsson A (2002) Emotion perceived and emotion felt: Same or different? Music Sci Special Issue 2001–2002:123–147.
61. Juslin PN, Västfjäll D (2008) Emotional responses to music: The need to consider underlying mechanisms. Behav Brain Sci 31(5):559–575, discussion 575–621.
62. Hagen EH, Bryant GA (2003) Music and dance as a coalition signaling system. Hum Nat 14(1):21–51.
63. Phillips-Silver J, Keller PE (2012) Searching for roots of entrainment and joint action in early musical interactions. Front Hum Neurosci, 10.3389/fnhum.2012.00026.
64. Wheatley T, Kang O, Parkinson C, Looser CE (2012) From mind perception to mental connection: Synchrony as a mechanism for social understanding. Social Psychology and Personality Compass 6(8):589–606.
65. Janata P, Grafton ST (2003) Swinging in the brain: Shared neural substrates for behaviors related to sequencing and music. Nat Neurosci 6(7):682–687.
66. Zatorre RJ, Chen JL, Penhune VB (2007) When the brain plays music: Auditory-motor interactions in music perception and production. Nat Rev Neurosci 8(7):547–558.
67. Dehaene S, Cohen L (2007) Cultural recycling of cortical maps. Neuron 56(2):384–398.


Supporting Information

Sievers et al. 10.1073/pnas.1209023110

SI Text

Data. Spreadsheets containing the raw data are attached. Dataset S1 includes all of the data from the United States experiment. Dataset S2 includes all of the data from the Kreung experiment, as well as discretized data from the United States experiment. The raw means by emotion are provided in Table S1, for each population. Table S2 provides the z-scored means of the discrete data by Emotion, Slider, and Population (corresponding to the cross-cultural ANOVA). These means were z-scored within each slider (using discrete data), for each population separately. These means are graphically portrayed in Fig. S1.

Fisher's Linear Discriminant Analysis. In an attempt to estimate which sliders were most important to the emotion categorization effect, we performed a Fisher's linear discriminant analysis between the slider values and their associated emotions for each population, for each modality. The numbers represent the importance of each feature (slider) in distinguishing the given emotion from all of the other emotions (higher numbers mean more importance). These data are presented in Table S3.

In the United States data, consonance was the most effective feature for discriminating each emotion from the other four emotions in both modalities, accounting for ∼60% of the total discrimination between emotions. The second most effective feature was direction (up/down), accounting for 26% (music) and 35% (movement). In the Kreung music data, rate and step size were most effective for discriminating each emotion compared with the other four; rate and consonance were most effective for the Kreung movement data.

It is important to note that low discriminant values do not necessarily imply unimportance for two reasons. First, this analysis only reveals the importance of each feature as a discriminant for each emotion when that emotion is compared with the other four emotions. That is, any other comparison (e.g., each emotion compared with a different subset of emotions) would yield different values. For example, although jitter may seem relatively unimportant for discriminating emotions based on the data in Table S3, jitter was a key feature for discriminating between particular emotion dyads (e.g., "scared" vs. "sad" in the United States data). Second, whether a parameter (slider) was redundant in this dataset is impossible to conclude because of potential interactions between parameters. Additional research is necessary to determine whether one or more features could be excluded without significant cost to emotion recognition. Such research would benefit from testing each feature in isolation (e.g., by holding others constant) to better elucidate its contribution to emotional expression within and across modalities and cultures.
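A rough Python analog of this analysis is sketched below, fitting a Fisher linear discriminant to slider settings with emotion as the class label. The data are random placeholders, and reading per-feature importance from absolute weights on the first discriminant is an assumption; it is not necessarily the metric behind Table S3.

```python
# Sketch of a Fisher linear discriminant analysis over slider settings with
# emotion as the class label (scikit-learn). Placeholder data; using absolute
# first-discriminant weights as "importance" is an assumption.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
sliders = ["rate", "jitter", "consonance", "step_size", "direction"]
emotions = np.repeat(["angry", "happy", "peaceful", "sad", "scared"], 25)
X = rng.uniform(0.0, 2.0, size=(emotions.size, len(sliders)))  # placeholder settings

lda = LinearDiscriminantAnalysis()
lda.fit(X, emotions)
importance = np.abs(lda.scalings_[:, 0])       # weight on the first discriminant
for name, w in zip(sliders, importance / importance.sum()):
    print(f"{name}: {w:.2f}")
```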

Multimedia. Audio and visual files of the emotional prototypes for both music and movement in the United States and Kreung experiments are available as Audios S1–S10 and Movies S1–S10. Each file contains three sequential, probabilistically generated examples based on the prototype settings, as explained in the cross-cultural Euclidean distance analysis.

Detailed Methods. Our computer program was created using Max/MSP (1), Processing (2), and OpenGL (3). Subjects were presented with an interface with slider bars corresponding to the five dimensions of our statistical model: rate (in beats per minute, or BPM), jitter (SD of rate), consonance/visual spikiness, step size, and step direction. The five sliders controlled parametric values fed to an algorithm that probabilistically moved the position of a marker around a discrete number line in real time. We will refer to the movements of this marker as a path. The position of the marker at each step in the generated path was mapped to either music or animated movement.

The number-line traversal algorithm can be split into two parts. The first part, called the metronome, controlled the timing of trigger messages sent to the second part, called the path generator, which kept track of and controlled movement on the number line. The tempo and jitter parameters were fed to the metronome, and the consonance, step size, and step direction parameters were fed to the path generator. When the subject pressed the space bar on the computer keyboard, the metronome turned on, sent 16 trigger messages to the path generator (variably timed as described below), and then turned off. The beginnings and endings of paths correspond to the on and off of the metronome.

Tempo was constrained to values between a minimum of 30 BPM and a maximum of 400 BPM. Jitter was expressed as a coefficient of the tempo, with a range between 0 and 0.99. When jitter was set to 0, the metronome would send out a stream of events at evenly spaced intervals as specified by the tempo slider. If the jitter slider was above 0, then specific per-event delay values were calculated nondeterministically as follows. Immediately before each event, a uniformly random value was chosen between 0 and the current value of the jitter slider. That value was multiplied by the period in milliseconds as specified by the tempo slider, and then the next event was delayed by a number of milliseconds equal to the result. These delays were specified on a per-event basis and applied to events after they left the metronome. No event was delayed for longer than the metronome period. This per-event delay was essentially a shifting or “sliding” of each event in the stream toward, but never past, the next note in the stream. Each shift left less empty space on one side of the note’s original position and more empty space on the other. This process ensured that tempo and jitter were independent. The effect was that as the value of the jitter slider increased, the precise timing of event onsets became less predictable but the mean event density remained the same.
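For clarity, the per-event delay rule can be sketched as follows (a Python reimplementation of the rule just described, not the original Max/MSP patch; the 16-event path length follows the text):

    import random

    def metronome_onsets(bpm, jitter, n_events=16):
        # Onset times (in ms) for one path: an even grid at the given tempo,
        # with each event delayed by a uniformly random fraction (0 to jitter)
        # of the period, so no event slides past the next grid point.
        period_ms = 60000.0 / bpm
        onsets = []
        for i in range(n_events):
            delay = random.uniform(0.0, jitter) * period_ms
            onsets.append(i * period_ms + delay)
        return onsets

Because each delay is applied relative to a fixed grid, increasing jitter makes onsets less predictable while the mean event density stays constant.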

The path generator can be conceived of as a “black box” with a memory slot, which could store one number and which responded to a small set of messages: reset, select next number, and output next number. Whenever the path generator was sent the reset message, a new starting position was picked and stored in the memory slot (the exact value of the starting position was constrained by the value of the scale choice slider, as explained below). Whenever the path generator was sent the select next number message, it picked a new number according to the constraints specified by the slider bars: first the size of the interval was selected, then the direction (up or down), then a specific number according to the position of the scale choice slider. The output next number message caused the path generator to output the next number to the music and motion generators, described below.

When selecting a new number, the path generator first chose a step size, or the distance between the previous number (stored in the memory slot) and the next. This value was calculated nondeterministically based on the position of the step size slider. The step size slider had a minimum value of 0 and a maximum value of 1. When choosing a step size, a uniformly random number between 0 and 1 was generated. This number was then used as the x value in the following equation, where a = the value of the step size slider:


r = 1 − ((1 − a)/a) · x          for x ≤ a
r = (a/(1 − a)) · (1 − x)        for x > a

The result r was multiplied by 4 and then rounded up to the nearest integer to give the step size of the event. As the value of the step size slider increased, the likelihood of a small step size decreased, and vice versa. If the slider was in the minimum position, all of the steps would be as small as possible; if it was in the maximum position, all of the steps would be as large as possible. If it was in the middle position, there would be an equal likelihood of all possible step sizes. Other positions skewed the distribution one way or the other, with higher values resulting in a larger average step size. Note that these step-size units did not correspond directly to the units of the number line; they were flexibly mapped to the number line as directed by the setting of the consonance parameter, as described below.

After the step size was chosen, the path generator determined the direction of the next step: up or down. As with step size, the step direction was calculated nondeterministically based on the position of the step direction slider. The step direction slider had a minimum value of 0 and a maximum value of 1. When choosing step direction, a uniformly random number between 0 and 1 was generated. If that number was less than or equal to the value of the step direction slider, then the next step would be downward; otherwise, the next step would be upward.
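The interval-selection rule can be summarized in a few lines (a sketch; the handling of the slider extremes, where the piecewise formula is undefined, is made explicit here although the text does not spell it out):

    import math
    import random

    def select_step(step_size_slider, direction_slider):
        # Both sliders range from 0 to 1. Returns (interval, direction).
        a, x = step_size_slider, random.random()
        if a <= 0.0:
            r = 0.0                                   # minimum slider: smallest steps
        elif a >= 1.0:
            r = 1.0                                   # maximum slider: largest steps
        elif x <= a:
            r = 1.0 - ((1.0 - a) / a) * x             # branch for x <= a
        else:
            r = (a / (1.0 - a)) * (1.0 - x)           # branch for x > a
        interval = math.ceil(r * 4)                   # multiply by 4, round up
        direction = 'down' if random.random() <= direction_slider else 'up'
        return interval, direction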

Finally, the number was mapped onto one of 38 unique scales. As the notion of a scale is drawn from Western music theory, this decision requires some elaboration. In Western music theory, a collection of pitches played simultaneously or in sequence may be heard as consonant or dissonant. The perception of a given musical note as consonant or dissonant is not a function of its absolute pitch value, but of the collection of intervals between all pitches comprising the current chord or phrase. The relationship between interval size and dissonance is nonlinear. For example, an interval of seven half steps, or a perfect fifth, is considered quite consonant, whereas an interval of six half steps, or a tritone, is considered quite dissonant. Intervallic distance, consonance/dissonance, and equivalency are closely related. If a collection of pitch classes x (a pitch-class set, or PC set) has the same set of intervallic relationships as another PC set y, those two PC sets will have the same degree of consonance and are transpositionally identical (and in certain conditions equivalent).

Absolute pitches also possess this property of transpositional equivalency. When the frequency of a note is doubled, it is perceived as belonging to the same pitch class. For example, the A key closest to the middle of a piano has a fundamental frequency of 440 Hz, but the A an octave higher has a fundamental frequency of 880 Hz; both are heard as an A. Western music divides the octave into 12 pitch classes, called the chromatic scale, from which all other scales are derived. Because we wanted to investigate musical dissonance and possible functional analogs in the modality of motion, our number-line scales were designed to be analogous to musical scales, where a number-line scale is a five-member subset of the chromatic set [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]. There are 792 such subsets of the chromatic set, many of which are (in the domain of music) transpositionally or inversionally equivalent. Our scale list was created by generating the prime forms of these subsets and then removing duplicates, yielding 38 unique scales (4). These scales were ordered by their aggregate dyadic consonance (5).
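The scale inventory itself is easy to reproduce (a sketch, not the program’s actual code; the ordering by aggregate dyadic consonance from ref. 5 is not shown):

    from itertools import combinations

    def set_class(pcs):
        # One canonical representative per set class: consider the set and its
        # inversion, rotate each, transpose each rotation to begin on 0, and
        # keep the lexicographically smallest form. Grouping by this canonical
        # form is equivalent to grouping by prime form for counting purposes.
        forms = []
        for s in (pcs, tuple((12 - p) % 12 for p in pcs)):
            s = sorted(set(s))
            for i in range(len(s)):
                rot = s[i:] + s[:i]
                forms.append(tuple((p - rot[0]) % 12 for p in rot))
        return min(forms)

    scales = {set_class(c) for c in combinations(range(12), 5)}
    print(len(scales))   # the five-member subsets collapse to 38 set classes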

The algorithm for generating a specific path across the number line was as follows. The number line consisted of the integers from 0 to 127, inclusive. When the algorithm began, three variables were stored. First, a starting-point offset between 0 and 11 was selected uniformly at random; then an octave bias variable was set to 5, and a scale position variable was set to 0. The starting-point offset was used to ensure that each musical phrase began on a different, randomly selected note, ensuring that no single context-determining “tonic” pitch or “root” scale degree could be identified on the basis of repetition. The current scale class was determined by using the scale position variable as an index to the array of scale elements specified by the position of the scale slider. For example, if the currently selected scale was [0, 3, 4, 7, 10] and the current scale position variable was 2, then the current scale class would be 4 (indices start from 0). The current position on the number line was given by multiplying the octave bias by 12, adding the starting-point offset, and then adding the current scale class value. For example, if the octave bias was 5, the starting-point offset was 4, and the scale class value was 7, then the current position on the number line would be 71.

When the select next number message was received, an interval and note direction were selected as described above. If the note direction was upward, then the new scale position value was given by the following:

(current scale position + new interval value) % 5

If the note direction was downward, then the new scale position value was given by:

(5 + current scale position − new interval value) % 5

Either of these conditions may imply a modular “wrapping around” of the set of possible values (0–4). If this was the case, then the current octave variable was either incremented by 1 in the case of an upward interval, or decremented by 1 in the case of a downward interval. If a step in the path would move the position on the number line outside of the allowed range, 12 was either added to or subtracted from the new position. This means that when the upper or lower boundary of the allowed pitch range was hit, the melody would simply stay in the topmost or bottommost octave, flattening out the overall pitch contour at the extremes. This process could result in occasional upward melodic intervals at the bottommost extreme or downward melodic intervals at the topmost extreme, despite the setting of the pitch direction slider. In practice, this rarely occurred, and it was confined to the more extreme “angry” emotional expressions, where pitch direction was maximally downward and step size was maximally large.
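Condensed into code, one step of this bookkeeping looks roughly like the following (a sketch; the state dictionary and the octave adjustment applied when clamping to the 0–127 range are one way of keeping the description consistent, not a transcription of the original program):

    def advance(state, interval, direction, scale):
        # state: {'octave': 5 at start, 'offset': 0-11, 'pos': index into scale}
        # scale: the currently selected five-member scale, e.g., [0, 3, 4, 7, 10]
        old = state['pos']
        if direction == 'up':
            new = (old + interval) % 5
            if new < old:                       # wrapped past the top of the scale
                state['octave'] += 1
        else:
            new = (5 + old - interval) % 5
            if new > old:                       # wrapped past the bottom of the scale
                state['octave'] -= 1
        state['pos'] = new
        number = state['octave'] * 12 + state['offset'] + scale[new]
        if number > 127:                        # stay in the topmost octave
            number -= 12
            state['octave'] -= 1
        elif number < 0:                        # stay in the bottommost octave
            number += 12
            state['octave'] += 1
        return number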

The subjects were divided into two groups. For the first group, number-line values were mapped to musical notes, and for the second group, number-line values were mapped to animated movement.

Our mapping from movement across a number line to Western music was straightforward, as its most significant modality-specific features were taken care of by the very design of the number-line algorithm. The division of pitches into pitch classes and scales is accounted for by the scale-class and scale-selection system used by the algorithm, as is the modulo-12 equivalency of pitch classes. Each number was mapped to a specific pitch, which was sounded as the algorithm selected the number. The number 60 was mapped to middle C, or C4. Movement of a distance of 1 on the number line corresponded to a pitch change of a half step, with higher numbers being higher in pitch. For example, 40 maps to E2, 0 maps to C−1, and 127 maps to G9. Notes were triggered via MIDI and played on the grand piano instrument included with Apple GarageBand.
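Under this mapping the number-line values coincide with MIDI note numbers (middle C = 60), which makes the worked examples easy to check with a small helper:

    def note_name(n):
        # Pitch name for number-line value n, with one half step per unit and C4 = 60.
        names = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
        return names[n % 12] + str(n // 12 - 1)

    print(note_name(60), note_name(40), note_name(127))   # C4 E2 G9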

Mapping from movement across a number line to animated movement was less straightforward. Our animated character was a red ellipsoid ball with cubic “eyes” (Fig. S2). The ball sat atop a rectangular dark gray “floor” on a light gray background. An ellipsoid was chosen because it can be seen as rotating around a center. The addition of eyes was intended to engage cognitive processes related to the perception of biological motion.


We wanted our subjects to perceive the ball as having its own subjectivity, such that it could be seen as capable of communicating or experiencing happiness, sadness, and so forth. The movement of our character (henceforth referred to as “the Ball”) was limited to bouncing up and down, rotating forward and backward, and modulating the spikiness of its surface. Technical details follow.

The Ball was drawn as a red 3D sphere composed of a limited number of triangular faces, which was transformed into an ellipsoid by scaling its y axis by a factor of 1.3. The Ball was positioned such that it appeared to be resting on a rectangular floor beneath it. Its base appeared to flatten where it made contact with the floor. The total visible height of the Ball when it was above the floor was 176 pixels; this was reduced to 168 pixels when the Ball was making contact with the floor. Its eyes were small white cubes located about 23% downward from the top of the ellipsoid. The Ball and the floor were rotated about the y axis such that it appeared the Ball was looking somewhere to the left of the viewer.

Every time the current position on the number line changed, the Ball would bounce. A bounce was a translation of the Ball to a position somewhere above its resting position and back down again. Bounce duration was equal to 93% of the current period of the metronome; the 7% reduction was intended to create a perceptible “landing” between each bounce. Bounce height was determined by the difference between the current position on the number line and the previous position. A difference of 1 resulted in a bounce height of 20 pixels, and each additional increase of 1 in the difference increased the bounce height by 13.33 pixels (e.g., a difference of 5 would result in a bounce height of 73.33 pixels). The Ball reached its translational apex when the bounce was 50% complete. The arc of the bounce followed the first half of a sine curve.
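The bounce geometry reduces to two small functions (Python for illustration; the original animation was written in Processing/OpenGL, and the handling of a zero difference is an assumption the text does not address):

    import math

    def bounce_height(delta):
        # Height in pixels for a change of |delta| on the number line.
        return 0.0 if delta == 0 else 20.0 + (abs(delta) - 1) * 13.33

    def bounce_offset(t_ms, period_ms, height):
        # Vertical offset of the Ball t_ms after an onset: half a sine curve
        # lasting 93% of the metronome period, with a 7% 'landing' at the end.
        duration = 0.93 * period_ms
        if t_ms >= duration:
            return 0.0
        return height * math.sin(math.pi * t_ms / duration)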

The Ball would rotate, leaning forward or backward, depending on the current number-line value. High values caused the Ball to lean backward, such that it appeared to look upward, and low values caused the Ball to lean forward or look down. When the current value of the number line was 60, the Ball’s angle of rotation was 0°. An increase of 1 on the number line decreased the Ball’s angle of rotation by 1°; conversely, a decrease of 1 on the number line increased the Ball’s angle of rotation by 1°. For example, if the current number-line value were 20, the Ball’s angle of rotation would be 40°; if the current number-line value were 90, the Ball’s angle of rotation would be −30°.

The Ball could also be more or less “spiky.” The amplitude of the spikes, or perturbations of the Ball’s surface, was analogically mapped to musical dissonance. The visual effect was achieved by adding noise to the x, y, and z coordinates of each vertex in the set of triangles comprising the Ball. Whenever a new position on the number line was chosen, the aggregate dyadic consonance of the interval formed by the new position and the previous position was calculated. The maximum aggregate dyadic consonance was 0.8; the minimum was −1.428. The results were scaled such that when the consonance value was 0.8, the spikiness value was 0, and when the consonance value was −1.428, the spikiness value was 0.2. A change in consonance of 0.1 thus corresponded to a change of 0.008977 in the spikiness value. For each vertex on the Ball’s surface, spikiness offsets for each of the three axes were calculated. Each spikiness offset was a number chosen uniformly at random between −1 and 1, which was then multiplied by the Ball’s original spherical radius times the current spikiness value.
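The lean and spikiness rules reduce to simple linear maps (a sketch with illustrative function names, not the original Processing code):

    import random

    def lean_angle(number_line_value):
        # 0 degrees at 60; higher values lean backward (negative angles).
        return 60 - number_line_value          # e.g., 20 -> 40, 90 -> -30

    def spikiness(consonance):
        # Aggregate dyadic consonance of 0.8 maps to 0; -1.428 maps to 0.2.
        return 0.2 * (0.8 - consonance) / (0.8 - (-1.428))

    def vertex_offset(radius, spike):
        # Per-axis displacement added to one vertex of the Ball's surface.
        return random.uniform(-1.0, 1.0) * radius * spike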

For the emotion labels, care was taken to avoid using words etymologically related to either music or movement (e.g., “upbeat” for “happy” or “downtrodden” for “sad”). See Fig. S3A for a screenshot of the United States experiment interface and the lists of emotion words presented to participants.

In the United States experiment, the labels for the slider bars changed between tasks as described in Fig. S3B. In the Kreung experiment, slider bars were not labeled in the music task, and were accompanied by icons in the movement task. See Fig. S4 for a screenshot of the slider bars during the Kreung movement task.

To confirm or reject the presence of a cross-cultural code, it needed to be possible for Kreung participants to select slider-bar positions that would create music and movement similar to those created by the United States participants. For this reason, the discretization values for the Kreung slider bars were derived from the United States data. These values are shown in Table S4.

For consonance, the extreme low value of 4 was chosen by taking the midpoint between the median consonance values for “angry” and “scared.” The extreme high value of 37 was the midpoint between the median values for “happy” and “peaceful.” The central value of 30 was chosen by taking the median value for “sad,” which was neither at the numeric middle nor at either of the endpoints. The values for the other sliders were selected similarly, with two exceptions. For rate, for which there was no emotion sitting reliably between the high and low extremes, the numeric middle between the extremes was chosen as the central value. For direction, because the values for “happy” and “peaceful” clustered around the center of the scale, the ideal center (50, neither up nor down) was chosen as the central value.
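As an illustration, the consonance cut points in Table S4 could be recomputed from the United States data along the following lines (a sketch; the per-emotion dictionary of settings would be built from Dataset S1, and, as noted above, rate and direction used different rules for the central value):

    import numpy as np

    def consonance_cuts(us_settings):
        # us_settings: dict mapping emotion name -> array of consonance settings.
        med = {e: np.median(v) for e, v in us_settings.items()}
        low = (med['angry'] + med['scared']) / 2       # midpoint of the two low medians
        high = (med['happy'] + med['peaceful']) / 2    # midpoint of the two high medians
        medium = med['sad']                            # the emotion between the extremes
        return low, medium, high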

Although this discretization did limit the number of possible settings of the slider bars, it did not substantially encourage the Kreung participants to use the same settings as the United States participants. With three possibilities for each of five slider bars, there were 3^5, or 243, possible settings available per emotion, only one of which corresponded to the choices of the United States participant population. That is, for each emotion, there was a 0.4% chance that the prototypical United States configuration would be chosen at random.

In the United States experiment, the sliders were automatically set to random positions at the beginning of each session. In the Kreung experiment, with discretized sliders offering three values per slider, all of the sliders were set to the most neutral, middle position. Subjects could press a button (the space bar on the computer keyboard) to begin a melody or movement sequence. The sliders could be moved both during and between sequences. The duration of each sequence was determined by the tempo setting of the slider. When the sliders were moved during a sequence, the melody or ball would change immediately in response. Between music sequences there was silence; between movement sequences, the ball would hold still in its final position before resetting to a neutral position at the beginning of the next sequence.

Slider Bar Reliability for Kreung Data. The discrete nature of the Kreung data afforded χ2 analyses to quantify the likelihood that parameters (sliders) were used randomly. The values in Table S5 indicate the likelihood that the distributions of slider positions in the Kreung data were due to chance (lower values indicate a lower likelihood of random positioning). As can be seen, the rate slider was used systematically (nonrandomly) for all emotions, across modalities. Other sliders varied in their reliability by emotion but were used nonrandomly for subsets of emotions.
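The exact test specification is not given here; one plausible reading is a per-slider, per-emotion χ2 goodness-of-fit test of the three discrete positions against a uniform (random) distribution, which the sketch below assumes:

    import numpy as np
    from scipy.stats import chisquare

    def slider_randomness_p(positions):
        # positions: list of chosen slider positions (0, 1, or 2) for one
        # emotion and one slider. Returns the p-value against uniform use.
        counts = np.bincount(np.asarray(positions), minlength=3)
        return chisquare(counts).pvalue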

L’ak and Modernization. L’ak, the village where we conducted our study, had no infrastructure for water, waste management, or electricity, although it was equipped with a gas-powered generator. The Kreung language is not mutually intelligible with Khmer, Cambodia’s official language, and has no writing system. L’ak and nearby villages maintain their own dispute-resolution practices separate from the Cambodian legal system. The Kreung practice an animist religion and have maintained related practices such as speaking with spirits and ritual animal sacrifice (6). Access to the village is limited by its remote location and by the difficulty of travel on unmaintained dirt roads, which require a four-wheel-drive vehicle and are impassable much of the year because of flooding. Almost none of the Kreung participants could speak or read Khmer, so communication was facilitated by an English-Khmer translator who worked in conjunction with a Khmer-Kreung translator who lived in the village.


Until large-scale logging operations started in Ratanakiri in the late 1990s, the tribal ethnic minorities in the area remained culturally isolated. The destruction of the forests made the traditional practices of slash-and-burn agriculture and periodic village relocation untenable, and the past decade has seen gradual, partial, and reluctant modernization (7). This modernization has been limited, and it has not resulted in sustained contact with Western culture via television, movies, magazines, books, or radio.

We conducted a survey of our participants to determine the extent of their exposure to non-Kreung culture. The survey included age, sex, cell phone ownership, time spent talking or listening to music on cell phones, time spent watching television, and time spent speaking Khmer vs. speaking Kreung. Very few of the Kreung participants owned cell phones, and those who did reported spending very little time using them to listen to music. None of the Kreung participants reported having listened to Western music. However, some reported having watched videotapes of Thai movies dubbed into Khmer for entertainment, and thus may have had passive exposure to Khmer and Thai music on video soundtracks. We should note again that most of our participants could not speak or understand Khmer.

We did not disqualify participants with passive exposure to Khmer and Thai music via videos, for the following reasons: (i) Khmer and Thai music are not Western. Their traditions, instruments, styles of singing, tuning systems, and use of vertical harmony (if any) are substantially different from those in Western music, so exposure to Khmer and Thai music would not acclimate our participants to Western musical conventions. (ii) Our computer program was not biased toward Western film music. The program excluded vertical harmony and systematic rhythmic variation, and it included many more scales than the familiar Western major and minor modes. Ultimately, both the Kreung and Western participants frequently chose settings outside the bounds of Western cliché.

Music and Dance in Kreung Culture. In Kreung culture, music and dance occur primarily as part of rituals such as weddings, funerals, and animal sacrifices (6). Kreung music and dance traditions have not been well documented. Interviews with Kreung musicians indicated that there is no formal standardization of tuning or temperament as is found in Western music, nor is there any system of vertical pitch relations equivalent to Western tonal harmony; Kreung music tends to be heterophonic in nature. Kreung musical instruments bear no obvious morphological relationship to Western instruments. Furthermore, although some Kreung and Khmer instruments are similar, most are profoundly different, and there is very little overlap between traditional Kreung music and that performed throughout the rest of Cambodia. The geographical isolation of the Kreung, combined with the pronounced formal dissimilarity of Kreung and Western music, made L’ak an ideal location for a test of cross-cultural musical universality.

We observed two musical forms in L’ak. The first was a gong orchestra, in which each performer plays a single gong in a prearranged rhythmic pattern, causing a greater melodic pattern to emerge from the ensemble, often accompanying group singing and dancing. The second was a heterophonic style of music based around a string instrument called the mem, accompanied by singing and wooden flutes. In this form, all players follow the same melodic line while adding loosely synchronized embellishments. The mem is an extremely quiet bowed monochord that uses the musician’s mouth as a resonating chamber. Traditionally the mem is bowed with a wooden or bamboo stick, and its sound is described as imitative of buzzing insects. Kreung music is passed along by pedagogical tradition, and the role of musician is performed primarily by those who are highly skilled and extensively trained. Examples of the two forms of Kreung music described here are included in Audios S11 and S12. These recordings are courtesy of Cambodian Living Arts (www.cambodianlivingarts.org) and Sublime Frequencies (www.sublimefrequencies.com).

We found the Kreung data to be substantially noisier than the data collected in the United States. We speculate that some of this noise was related to unfamiliarity with the experimental context, as described in the main text. However, additional noise may have been the result of unfamiliarity with the tuning, timbre, and scales used in our program, none of which are native to Kreung culture.

Kreung-English Translation: “Happy” vs. “Peaceful.” In the Kreung language there is no word that translates directly to “peaceful.” After extensive conversation with our translators, the closest words we could find were “sngap” and “sngap chet.” Idiomatically, these translate to something like “still heart,” which seemed to capture the essence of “peacefulness” we were looking for. However, “sngap” and “sngap chet” do not refer to emotion as a state of being, but instead refer to emotion as an active process. In particular, they refer to the process of having been angry and then experiencing that anger dissolve into happiness. For this reason, both words strongly connote happiness. Some of our subjects seemed to use these words as synonyms for happiness, occasionally even reporting that they had completed expressing “happy” after being asked to express “peaceful,” although never the other way around. The difficulty of understanding the concept of peacefulness cross-culturally may be consistent with a previous finding in the literature that reported peacefulness as the least successfully identified emotion compared with “happy,” “sad,” and “scary” (8). Nevertheless, the Kreung results show a distinct difference between “happy” and “peaceful”: “peaceful” music and “peaceful” motion both tended to be substantially slower than their “happy” counterparts, a relationship matching the findings of the experiment in the United States.

Comment on Clynes and Nettheim. Clynes and Nettheim attempted to show cross-modal, cross-cultural recognition of emotional expressions produced in the domain of touch and mapped to sound (9). However, there are several important differences between their experiments and the one reported here. Clynes and Nettheim used a forced-choice paradigm and created individual touch-to-sound mappings per emotion, as opposed to using fixed rules representing hypotheses about the relationship between the two domains. Clynes’s mapping decisions introduced intuitively generated, arbitrary pitch content specific to each emotion, suggesting that what was being tested was not a cross-modal relationship, but simply the effect of pitch on emotion perception. Clynes proposed the idea of “essentic forms”: fixed, short-time, essential emotional forms that are biologically determined and measured in terms of touch. Although this is interesting as a hypothesis, it is not confirmed by the available data (10), and it is not a model of feature-based cross-modal perception.

Informed Consent in the Kreung Village. Before the study began, we met with several villagers and described the series of studies we would be conducting. At this time we also discussed fair compensation. Together, we determined that a participant would be paid the same amount that they would have forfeited by not going to work in the field that day. We set up the equipment in the house of one of the Khmer-Kreung translators. Any adult villager could come to the house if he or she wanted to participate in the study. We did not solicit participation. As most of the villagers in L’ak cannot read or write, we did not obtain written consent. Instead, consent was implied by coming to the house to participate.


1. Zicarelli D (1998) in Proceedings of the 1998 International Computer Music Conference, ed Simoni M (Univ of Michigan, Ann Arbor), pp 463–466.
2. Reas C, Fry B (2006) Processing: Programming for the media arts. AI Soc 20(4):526–538.
3. Rost RJ (2004) OpenGL Shading Language (Addison-Wesley, Boston).
4. Forte A (1973) The Structure of Atonal Music (Yale Univ Press, New Haven).
5. Huron D (1994) Interval-class content in equally tempered pitch-class sets: Common scales exhibit optimum tonal consonance. Music Percept 11(3):289–305.
6. United Nations Development Programme Cambodia (2010) Kreung Ethnicity: Documentation of Customary Rules (UNDP Cambodia, Phnom Penh, Cambodia). Available at www.un.org.kh/undp/media/files/Kreung-indigenous-people-customary-rules-Eng.pdf. Accessed June 29, 2011.
7. Paterson G, Thomas A (2005) Commonplaces and Comparisons: Remaking Eco-Political Spaces in Southeast Asia (Regional Center for Social Science and Sustainable Development, Chiang Mai, Thailand).
8. Vieillard S, et al. (2008) Happy, sad, scary and peaceful musical excerpts for research on emotions. Cogn Emotion 22(4):218–237.
9. Clynes M, Nettheim N (1982) Music, Mind, and Brain (Plenum, New York), pp 47–82.
10. Trussoni SJ, O’Malley A, Barton A (1988) Human emotion communication by touch: A modified replication of an experiment by Manfred Clynes. Percept Mot Skills 66(2):419–424.

Fig. S1. Average values of z-scores (with SE bars) for each slider (i.e., feature) by each emotion and by population. The sign for the up/down slider was flipped for visualization purposes (positive values indicate upward tilt/pitch).


Fig. S2. The Ball.

Fig. S3. (A) Interface for the United States music task. (B) Slider labels by task modality.


Fig. S4. Interface for the Kreung movement task with icons as a mnemonic aid.

Table S1. Raw means for each slider by emotion for each population

Parameter              Range        Angry     Happy     Peaceful   Sad       Scared

United States means
Rate                   30–400       331.00    280.12    69.48      53.74     289.68
Jitter                 0–99         53.70     33.24     11.30      19.44     58.22
Consonance             0–37         8.00      32.00     32.34      22.66     10.08
Size (big/small)       0–100        67.92     49.36     23.66      26.28     52.28
Direction (up/down)    0–100        76.94     35.56     38.34      78.30     51.12

Kreung means
Rate                   0, 1, or 2   1.42      1.18      0.76       0.39      1.15
Jitter                 0, 1, or 2   1.06      0.84      0.80       1.09      1.00
Consonance             0, 1, or 2   1.01      1.33      1.31       1.34      0.99
Size (big/small)       0, 1, or 2   1.05      0.75      0.79       0.75      1.16
Direction (up/down)    0, 1, or 2   1.20      0.80      0.92       1.13      1.16

Table S2. Z-scored, discrete means for each slider by emotion, for each population

Parameter              Angry     Happy     Peaceful   Sad       Scared

United States discrete means
Rate                   0.90      0.58      −1.00      −1.07     0.60
Jitter                 0.46      0.05      −0.68      −0.36     0.53
Consonance             −0.77     0.68      0.72       0.09      −0.72
Size (big/small)       0.69      0.11      −0.64      −0.49     0.24
Direction (up/down)    0.66      −0.69     −0.59      0.71      −0.09

Kreung discrete means
Rate                   0.52      0.23      −0.26      −0.70     0.20
Jitter                 0.12      −0.15     −0.19      0.16      0.05
Consonance             −0.23     0.17      0.14       0.18      −0.26
Size (big/small)       0.17      −0.17     −0.13      −0.18     0.31
Direction (up/down)    0.19      −0.29     −0.15      0.10      0.15


Table S3. Fisher’s linear discriminant analysis

Music and motion       Angry     Happy     Peace     Sad       Scared    Total

Music and Motion
United States
Rate                   0.02      0.01      0.05      0.07      <0.01     0.16
Jitter                 0.01      <0.01     0.02      0.01      0.04      0.09
Consonance             0.58      0.92      0.38      0.18      0.89      2.95
Size (big/small)       0.04      0.01      0.03      0.04      0.01      0.13
Direction (up/down)    0.36      0.06      0.52      0.71      0.06      1.71
Kreung
Rate                   0.67      0.24      0.46      0.82      0.20      2.39
Jitter                 0.03      0.05      0.19      0.03      <0.01     0.31
Consonance             0.14      0.17      0.13      0.08      0.32      0.84
Size (big/small)       0.07      0.16      0.10      0.06      0.39      0.78
Direction (up/down)    0.09      0.39      0.11      <0.01     0.09      0.69

Music
United States
Rate                   0.04      0.01      0.04      0.12      <0.01     0.22
Jitter                 0.02      <0.01     0.07      0.01      0.09      0.20
Consonance             0.27      0.80      0.57      0.55      0.80      2.99
Size (big/small)       0.05      0.02      0.18      0.01      0.07      0.33
Direction (up/down)    0.63      0.16      0.13      0.32      0.03      1.27
Kreung
Rate                   0.30      0.50      0.08      0.95      0.10      1.93
Jitter                 0.05      0.04      0.04      <0.01     <0.01     0.15
Consonance             0.46      0.07      0.17      0.03      0.02      0.75
Size (big/small)       0.02      0.14      0.45      0.02      0.66      1.29
Direction (up/down)    0.16      0.26      0.26      <0.01     0.21      0.90

Motion
United States
Rate                   0.01      <0.01     0.05      0.03      0.01      0.11
Jitter                 <0.01     <0.01     0.01      0.02      0.02      0.07
Consonance             0.78      0.91      0.32      0.01      0.75      2.77
Size (big/small)       0.03      0.07      <0.01     0.11      0.11      0.33
Direction (up/down)    0.18      0.01      0.63      0.83      0.12      1.77
Kreung
Rate                   0.89      <0.01     0.65      0.62      0.22      2.39
Jitter                 <0.01     <0.01     0.27      0.07      <0.01     0.37
Consonance             0.01      0.20      0.06      0.20      0.73      1.20
Size (big/small)       0.06      0.09      <0.01     0.11      0.04      0.31
Direction (up/down)    0.04      0.70      0.02      0.01      0.01      0.78

Linear discriminant analysis: slider importance for discrimination of each emotion from all other emotions. Higher values indicate higher importance.

Table S4. Discretization values for Kreung slider bars (derived from United States data)

Parameter              Low     Medium     High

Rate                   55      187        320
Jitter                 4       25         61
Consonance             4       30         37
Size (big/small)       15      56         82
Direction (up/down)    39      50         85


Table S5. χ2 reliability of slider bar use by Kreung participants

Music and motion    Rate      Jitter    Consonance    Step size (big/small)    Direction (up/down)

Music and motion
Angry               <0.01     0.49      0.99          0.49                     <0.01
Happy               0.10      0.18      <0.01         0.02                     0.06
Peaceful            0.03      0.08      <0.01         <0.01                    0.65
Sad                 <0.01     0.56      <0.01         <0.01                    0.34
Scared              0.08      0.24      0.34          0.04                     0.08

Music
Angry               0.02      <0.01     0.02          0.81                     <0.01
Happy               <0.01     0.06      0.01          <0.01                    0.06
Peaceful            0.08      0.81      0.17          <0.01                    0.61
Sad                 <0.01     0.22      0.40          0.15                     0.75
Scared              0.11      0.06      0.42          0.01                     0.01

Motion
Angry               <0.01     0.32      0.02          0.21                     0.12
Happy               0.21      0.12      <0.01         0.32                     0.12
Peaceful            <0.01     0.03      <0.01         0.10                     0.91
Sad                 <0.01     0.30      <0.01         0.02                     0.42
Scared              0.21      0.85      0.74          0.85                     0.91

Movie S1. United States angry movement.


Movie S2. Kreung angry movement.


Movie S3. United States happy movement.


Movie S4. Kreung happy movement.


Movie S5. United States peaceful movement.


Movie S6. Kreung peaceful movement.


Movie S7. United States sad movement.


Movie S8. Kreung sad movement.


Movie S9. United States scared movement.


Audio S6. Kreung peaceful music.


Audio S7. United States sad music.


Audio S8. Kreung sad music.


Audio S9. United States scared music.


Audio S10. Kreung scared music.


Audio S11. Example of Kreung gong music, courtesy of Sublime Frequencies.


Audio S12. Example of Kreung mem music, Bun Hear, courtesy of Cambodian Living Arts.


Dataset S1. All data from the United States experiment, in continuous format.


Dataset S2. All data from the United States and Kreung experiments.


Kreung data were collected as discrete values (each slider had three positions: 0, 1, or 2). Continuous values from the United States data were converted to discrete values as described in the main text.
