Pitch Processing Experience: Comparison of Musicians and Tone-Language Speakers on
Measures of Auditory Processing and Executive Function
by
Stefanie Andrea Hutka
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Department of Psychology
University of Toronto
© Copyright by Stefanie Andrea Hutka 2015
Pitch Processing Experience: Comparison of Musicians and
Tone-Language Speakers on Measures of Auditory
Processing and Executive Function
Stefanie Andrea Hutka
Doctor of Philosophy
Department of Psychology
University of Toronto
2015
Abstract
Psychophysiological evidence supports an association between music and speech such
that experience in one domain is related to processing in the other. Musicianship has been
associated with benefits to auditory processing and executive function. It is unclear,
however, whether pitch processing experience in nonmusical contexts, namely speaking a
tone language, has comparable associations with auditory processing and executive
function. The present investigation aimed to clarify this association, with the overarching
goal of better understanding how two different types of pitch processing are linked to
perceptual and cognitive processing. If pitch-processing experience gained via
musicianship or tone-language use shapes perceptual and cognitive processes in similar
ways, then musicians and tone-language speakers (nonmusicians) should outperform
controls without music training or tone-language experience. This hypothesis was tested
in a series of experiments that measured behavioural and neural responses of tone-
language speakers and musicians on tasks of perception (pitch discrimination, pitch
encoding, i.e., representation of pitch-relevant information) and cognition (pitch memory,
visual working memory). Collectively, the findings reveal that benefits to auditory
processing are more closely associated with music training than with tone-language use.
When musicians and tone-language speakers performed comparably on behavioural
tasks, this occurred at the perceptual level (i.e., sound discrimination). Differential
responsiveness of tone-language speakers or musicians was evident at the neural level
(i.e., event-related potentials, brain-signal variability). Neither musicianship nor speaking
a tone language was associated with a benefit to visual working memory. These results
are discussed in relation to the respective contributions of nature and nurture to auditory
processing and visual working memory in musicians and tone-language speakers.
Acknowledgments
I would like to acknowledge the contributions of my supervisors, Dr. Claude Alain and
Dr. Randy McIntosh, and thank them for their guidance, knowledge, and support. I would
like to thank Dr. Sandra Trehub and Dr. Glenn Schellenberg for their roles as thesis
committee members, and for their valuable feedback. Furthermore, I wish to thank my
internal committee members, Dr. Craig Chambers and Dr. Bruce Schneider, as well as
my external appraiser, Dr. Ravi Krishnan, for their helpful comments on my dissertation.
I would also like to thank my collaborators on the studies described herein, including Dr.
Gavin Bidelman, Dr. Sylvain Moreno, Dr. Patrick Bermudez, Dr. Yunjo Lee, Dr. Sean
Hutchins, and Sarah Carpentier. I would also like to acknowledge the funding assistance
of the Natural Sciences and Engineering Research Council (NSERC), NSERC-Create:
Training in Auditory Cognitive Neuroscience, and the Ontario Graduate Scholarship.
Table of Contents
Acknowledgments .............................................................................................................. iv
Table of Contents ................................................................................................................ v
List of Tables ..................................................................................................................... ix
List of Figures ..................................................................................................................... x
Chapter 1 The Music–Language Association ..................................................................... 1
1.1 The link between music and speech ........................................................................... 1
1.2 Pitch, music, and tone languages ............................................................................... 3
1.3 Tone language: Associations with auditory processing ............................................. 4
1.4 Nature, nurture, and causality .................................................................................... 7
1.5 Executive function in musicians and tone language speakers ................................. 11
1.6 The present investigation ......................................................................................... 12
Chapter 2 Absolute Pitch and Tone-Language Experience: Associations with Pitch
Processing and Encoding in Musicians ............................................................................. 14
2.1 Introduction .............................................................................................................. 14
2.1.1 Definition of absolute pitch ............................................................................... 14
2.1.2 The link between absolute pitch and language .................................................. 15
2.1.3 The present study ............................................................................................... 16
2.2 Methods .................................................................................................................... 17
2.2.1 Participants ........................................................................................................ 17
2.2.2 Materials ............................................................................................................ 19
2.2.3 Procedure ........................................................................................................... 20
2.3 Results ...................................................................................................................... 22
2.3.1 Accuracy ............................................................................................................ 22
2.3.2 Response Time .................................................................................................. 23
2.4 Discussion ................................................................................................................ 25
2.4.1 Mechanisms underlying performance in AP possessors ................................... 26
2.4.2 No cumulative advantage of AP and tone-language experience ....................... 27
2.4.3 Speed-accuracy trade-off ................................................................................... 28
2.4.4 Limitations ......................................................................................................... 28
2.5 Conclusion ............................................................................................................... 30
Chapter 3 Music Training and Tone-Language Experience: Associations with Sound
Discrimination ................................................................................................................... 31
3.1 Introduction .............................................................................................................. 31
3.1.1 Revisiting the shared processing of music and speech ...................................... 31
3.1.2 The present study ............................................................................................... 32
3.1.3 Electroencephalography: Components of interest ............................................. 33
3.1.4 Hypotheses ......................................................................................................... 34
3.2 Methods .................................................................................................................... 35
3.2.1 Participants ........................................................................................................ 35
3.2.2 Cognitive tests ................................................................................................... 36
3.2.3 Behavioural tasks ............................................................................................... 37
3.2.4 EEG stimuli ....................................................................................................... 38
3.2.5 Procedure ........................................................................................................... 40
3.3 Results ...................................................................................................................... 44
3.3.1 Cognitive tests ................................................................................................... 44
3.3.2 Behavioural tasks ............................................................................................... 44
3.3.3 ERP data ............................................................................................................ 45
3.4 Discussion ................................................................................................................ 51
3.4.1 Musicianship and tone language: Behavioural measures .................................. 53
3.4.2 Auditory neurophysiological benefits of musicianship and tone language ....... 54
3.4.3 Dissociation between neural and perceptual processing of music/speech ......... 55
3.4.4 Modularity of music and speech processing ...................................................... 57
3.4.5 Limitations ......................................................................................................... 57
3.5 Conclusion ............................................................................................................... 58
Chapter 4 A Theoretical Discourse on the Use of Nonlinear Methods to Investigate the
Music-Language Association ............................................................................................ 60
4.1 Common acoustic processing in musicians and tone-language speakers ................ 60
4.2 A nonlinear approach to studying the music-language link ..................................... 63
4.2.1 The brain as a complex, nonlinear system ......................................................... 63
4.2.2 Application to the study of acoustic processing influenced by experience ....... 64
4.3 Brain signal variability ............................................................................................. 64
4.3.1 Current applications of BSV .............................................................................. 66
4.4 Moving from theory to application, in the context of the music-language
association ......................................................................................................................... 67
Chapter 5 Using Brain Signal Variability to Examine Differences between Musicians and
Tone Language Speakers .................................................................................................. 68
5.1 Introduction .............................................................................................................. 68
5.1.1 Brain signal variability: A recapitulation .......................................................... 68
5.1.2 The present investigation ................................................................................... 68
5.2 Methods .................................................................................................................... 70
5.2.1 EEG recording and pre-processing .................................................................... 70
5.2.2 Multiscale entropy analysis ............................................................................... 70
5.2.3 Spectral analysis ................................................................................................ 71
5.2.4 Statistical analysis .............................................................................................. 71
5.3 Results ...................................................................................................................... 73
5.3.1 Task PLS: Multiscale entropy and spectral data ............................................... 73
5.4 Discussion ................................................................................................................ 80
5.4.1 MSE data ........................................................................................................... 80
5.4.2 Comparing MSE results to the spectral analysis results .................................... 82
5.4.3 Comparisons to event-related potential findings (Chapter 3) ............................ 84
5.5 Limitations ............................................................................................................... 85
5.6 Conclusions .............................................................................................................. 85
Chapter 6 Tone Language, Musicianship, and Executive Function ................................. 87
6.1 Introduction .............................................................................................................. 87
6.1.1 Is tone-language experience associated with enhancement in executive function,
as is musicianship? ........................................................................................................ 87
6.1.2 Cognitive benefits in balanced, tone-language bilinguals ................................. 87
6.1.3 Working memory in tone-language speakers and musicians ............................ 88
6.1.4 The present investigation ................................................................................... 90
6.2 Methods .................................................................................................................... 91
6.2.1 Participants ........................................................................................................ 91
6.2.2 Measures ............................................................................................................ 93
6.2.3 Procedure ........................................................................................................... 95
6.3 Results ...................................................................................................................... 96
6.3.1 Correlations ....................................................................................................... 96
6.3.2 F0 DL ................................................................................................................. 97
6.3.3 Pitch memory ..................................................................................................... 98
6.3.4 Two-back task .................................................................................................... 99
6.3.5 ANCOVAs ....................................................................................................... 100
6.4 Discussion .............................................................................................................. 100
6.4.1 Working memory ............................................................................................. 100
6.4.2 Replication of auditory measures, and associated limitations ......................... 103
6.5 Conclusions ............................................................................................................ 104
Chapter 7 General Discussion ......................................................................................... 106
7.1 Summary ................................................................................................................ 106
7.2 Musicianship: Nature versus nurture ..................................................................... 109
7.3 Future directions .................................................................................................... 110
8 Appendices ................................................................................................................ 113
8.1 Chapter 2: Nonmusical Stimuli .............................................................................. 113
8.2 Chapter 3: N1 and P2 ............................................................................................. 114
8.2.1 Introduction ..................................................................................................... 114
8.2.2 Methods: Analysis window and statistics ........................................................ 115
8.2.3 Results ............................................................................................................. 115
8.2.4 Discussion ........................................................................................................ 119
References ....................................................................................................................... 121
Copyright Acknowledgements ........................................................................................ 145
List of Tables
Table 1. Demographic Information for the Four Participant Groups, N = 32.
Table 2. Means and Standard Errors of Mismatch Negativity Analysis Variables at Each
Level.
Table 3. Means and Standard Errors of Laterality Analysis Variables at Each Level.
Table 4. Means and Standard Errors of P3a Analysis Variables at Each Level.
Table 5. Means and Standard Errors of the Late Discriminative Negativity Analysis
Variables at Each Level.
Table 6. Correlations between Tasks Across All Groups.
Table 7. Correlations between Tasks, Displayed by Group.
Table S1. Nonmusical (Control) Stimuli Descriptions.
Table S2. Means and Standard Errors of N1 Analysis Variables at Each Level.
Table S3. Means and Standard Errors of P2 Analysis Variables at Each Level.
List of Figures
Figure 1. Group mean accuracy performance. **p < .001. Error bars indicate SE.
Figure 2. Group mean reaction time performance. **p < .01. Error bars indicate SE.
Figure 3. Spectrograms illustrating the standard, large, and small deviant stimuli for the
music (top row) and speech (bottom row) conditions. White lines indicate the
fundamental frequency of each tone or the first formant of each vowel.
Figure 4. Event-related potential (ERP) scalp topography for the mismatch negativity
(MMN) in the (a) large-deviant music, (b) large-deviant speech, (c) small-deviant music,
and (d) small-deviant speech conditions. The cluster of six electrodes is outlined on the
topography of Ms, as this group drove the significant between-group differences in all
conditions. Topographies show mean activation between two time points in each
condition, centered on the mean peak amplitude (190 to 200 ms for large deviants; 200 to
210 ms for small deviants).
Figure 5. A: Performance on the fundamental frequency (F0) difference limen (DL) task.
Musicians (M) and Cantonese-speaking participants (C) showed better pitch
discrimination than nonmusician (NM) controls. B: Performance on the first formant
frequency (F1) DL task. M showed superior discrimination of the first formant in speech
sounds, as compared to C and NM. **p ≤ .01. Error bars indicate SE.
Figure 6. ERP difference waves for each group and condition. Each waveform is an
average across six fronto-central channels (inset, F1, Fz, F2, FC1, FCz, FC2). M =
musicians; C = Cantonese speakers; NM = nonmusicians.
Figure 7. Mismatch negativity (MMN) peak amplitude between 100 and 250 ms for
each condition and group. The peak amplitude is the average peak of six fronto-central
electrodes (F1, Fz, F2, FC1, FCz, FC2). Error bars indicate SE. M = musicians; C =
Cantonese speakers; NM = nonmusicians.
Figure 8. Loss of information as a result of averaging individual trials in EEG. The
variation between individual trials (left) is lost as a result of the averaging procedure, as
evident in the averaged waveform (right).
Figure 9. First latent variable (LV1), between-groups comparison: Contrasting the EEG
response to the music and speech conditions across measures of multiscale entropy (left)
and spectral power (right). The bar graphs (with standard error bars) depict brain scores
that were significantly expressed across the entire data set as determined by permutation
at 95% confidence intervals. The image plot highlights the brain regions and timescale or
frequency at which a given contrast was most stable; values represent ~z scores and
negative values denote significance for the inverse condition effect.
Figure 10. MSE curves for all groups, averaged across all conditions, at the right angular
gyrus.
Figure 11. First latent variable (LV1), between-conditions comparison: Contrasting the
EEG response to the music and speech conditions across measures of multiscale entropy
(left) and spectral power (right) for nonmusicians. The bar graphs (with standard error
bars) depict brain scores that were significantly expressed across the entire data set as
determined by permutation tests at 95% confidence intervals. The image plot highlights
the brain regions and timescale or frequency at which a given contrast was most stable;
values represent ~z scores and negative values denote significance for the inverse
condition effect.
Figure 12. Second latent variable (LV2), between-conditions comparison: Contrasting the
EEG response to the music and speech conditions across measures of multiscale entropy
(left) and spectral power (right) for Cantonese speakers. The bar graphs (with standard
error bars) depict brain scores that were significantly expressed across the entire data set
as determined by permutation tests at 95% confidence intervals. The image plot
highlights the brain regions and timescale or frequency at which a given contrast was
most stable; values represent ~z scores and negative values denote significance for the
inverse condition effect.
Figure 13. A: Fundamental frequency difference limen task. B: Pitch memory task. C:
Visual two-back task.
Figure 14. Performance on the fundamental frequency (F0) difference limen task.
Musicians (M) showed superior pitch discrimination performance relative to
nonmusician (NM) controls. **p ≤ .001. Error bars indicate SE.
Figure 15. Between-group performance on the pitch memory task (A: d’ data; B: reaction
time data). A gradient in d’ performance is visible, such that M > TL > NM. M perform
faster than TL. There appears to be a speed-accuracy trade-off in TL, such that good
performance is accompanied by slower reaction times. **p ≤ .001, *p < .05. † =
marginally significant. Error bars indicate SE.
Figure 16. Between-group performance on the two-back task (A: accuracy data; B:
reaction time data). Group differences are not significant. Error bars indicate SE.
Figure 17. Between-group PIQ performance. Group differences are not significant. Error
bars indicate SE.
Figure S1. ERP waves for each group and condition prior to subtraction. Each waveform
is an average across six fronto-central electrodes (F1, Fz, F2, FC1, FCz, FC2). M =
musicians; C = Cantonese speakers; NM = nonmusicians.
Chapter 1 The Music–Language Association
1.1 The link between music and speech
In the last 30 years, there has been increasing interest in neurophysiological processing of music
and speech (e.g., Besson & Macar, 1987; Koelsch, Maess, Gunter, & Friederici, 2001; Parbery-
Clark, Skoe, & Kraus, 2009). Between 2000 and 2013 alone, the number of articles containing
the terms “neural” AND “overlap OR sharing” AND “music” AND “language OR speech” has
risen from just under 1000 (in 2000) to nearly 6000 (in 2013; Peretz, Vuvan, Lagrois, &
Armony, 2015). Both music and speech rely heavily on auditory learning, serving as models of
experience-dependent neural plasticity in auditory networks (Patel, 2014; see Chapter 1.4 for a
discussion of gene-environment interactions related to auditory learning in musicians, as well as
in speakers of certain types of languages). Their close association is evident in the shared, interactive brain
regions involved in music and speech processing1 (e.g., Koelsch et al., 2001; Maess, Koelsch,
Gunter, & Friederici, 2001; Patel, 2008; Slevc, Rosenberg, & Patel, 2009). For example, the
processing of melody and harmony activates brain regions traditionally associated with
language-specific processes, including Broca’s and Wernicke’s areas (Koelsch et al., 2001;
Maess et al., 2001). In addition, neural regions traditionally associated with higher-order
language comprehension (i.e., frontal areas, such as Brodmann Area 47) are active when trained
musicians process complex musical metre and rhythm (Vuust, Roepstorff, Wallentin, Mouridsen,
& Ostergaard, 2006).
Given the co-activations of neural regions involved in music and speech processing,
researchers have become interested in whether music training can benefit speech processing (see
Besson, Chobert, & Marie, 2011 for a review). For example, musicianship is associated with
superior perception of degraded speech (Bidelman & Krishnan, 2010), speech in noise (Parbery-
Clark, Skoe, Lam, & Kraus, 2009; Zendel & Alain, 2012), and intonation contours (e.g., Schön,
Magne, & Besson, 2004) as well as enhanced phonological awareness (see Tierney & Kraus,
2013 for a review) and binaural sound processing (e.g., Parbery-Clark, Strait, Hittner, & Kraus,
Links between music and languages are further supported by altered brain circuitry in
musicians at both cortical (e.g., Marie, Delogu, Lampis, Belardinelli, & Besson, 2011; Pantev,
Roberts, Schulz, Engelien, & Ross, 2001; Schön et al., 2004) and subcortical levels (e.g.,
Bidelman et al., 2011a; Wong, Skoe, Russo, Dees, & Kraus, 2007), which facilitates sensory-
perceptual and cognitive control of speech information (Bidelman, Hutka, & Moreno, 2013).

1 Note that co-activation of neural regions does not, by default, translate to the sharing of neural regions (see Peretz et al., 2015 for a discussion).
These subcortical circuits may be tuned by descending corticofugal projections from the
cortex (e.g., Krishnan, Xu, Gandour, & Cariani, 2005; Kraus & Chandrasekaran, 2010).
However, it is of interest to compare the two-way interactions of the cortical and subcortical
circuits with the proposals made by the reverse hierarchy theory of auditory processing (Ahissar,
Nahum, Nelken, & Hochstein, 2009). According to this theory, rapid perception is based only on
high-level representations (Ahissar et al., 2009). For example, a listener can identify a piece of
music without explicitly accessing the lower-level information used for this identification, such
as whether two successive notes ascend or descend (Ahissar et al., 2009). Thus, this theory
supports a model of perception that is driven primarily by
top-down influences, rather than a combination of bottom-up and top-down influences.
The association between music and speech is also the basis of Patel’s (2011) OPERA
hypothesis, which describes how music-driven adaptive plasticity may occur for the processing
of linguistic stimuli when five conditions are met: Overlap of brain networks for processing
speech and music; Precision, such that music processing requires greater precision than speech
processing; Emotion, such that music engages strong positive emotions; Repetition, such that
musical activities that engage this network are frequently repeated; and Attention, such that
focused attention is engaged by musical activities. When these conditions are met, neural
plasticity drives the auditory system to function with greater precision than required for speech
processing. Because music and speech share a common network, speech processing can benefit
from music training. The OPERA hypothesis was predicated on the shared syntactic integration
resource hypothesis (Patel, 2003), which claimed that music and language rely on shared,
limited processing resources and that these resources activate separable syntactic representations.
In other words, music training may tune the precision of auditory processing (i.e., spectral
acuity), which, in turn, facilitates linguistic processing.
1.2 Pitch, music, and tone languages
Pitch is structurally important to both music and speech. As described in Pfordresher and
Brown (2009), pitch information in music provides information about tonality (e.g., Krumhansl,
1990), harmonic changes (e.g., Holleran, Jones, & Butler, 1995), boundaries of phrases (e.g.,
Deliège, 1987), rhythm and metre (e.g., Holleran et al., 1995; Jones, 1987), and expectations
about future events (Narmour, 1990). Pitch information in speech conveys word stress, utterance
accent, phrasal meaning, and the speaker’s emotion (i.e., intonation; Cruttenden, 1997; Wong et
al., 2012; see Pfordresher & Brown, 2009). It is important in all verbal languages at the sentence
level (Rattanasone, Attina, Kasisopa, & Burnham, 2013). However, there is a gradient of the
precision of pitch use across languages. Namely, tone languages, unlike other types of
languages, use pitch phonemically (i.e., at the word level, Cutler, Dahan, & van Donselaar, 1997;
Yip, 2002).
Tone languages comprise 60 to 70 percent of the world’s languages (Rattanasone et al.,
2013), and are mostly found in Asia and Africa (Maddieson, 2013). Most Asian tone languages
consist of both level and contour tones (Rattanasone et al., 2013). From the onset to the offset of
a tone, level tones maintain a relatively stable pitch height (Rattanasone et al., 2013).
Conversely, contour tones show pitch height (i.e., pitch interval) changes; from onset to offset,
rising tones increase in pitch, whereas falling tones decrease in pitch (Rattanasone et al., 2013).
In contrast to Asian tone languages, many African tone languages convey lexical information via
pitch height, rather than via contours (Yip, 2002).
Notably, the tone language speakers tested in the current thesis spoke Asian tone
languages, and most prominently, Cantonese. Cantonese speakers were chosen because their
exposure to aspects of pitch would most closely approximate that gained via musical training, as
compared to other tone languages that would be accessible in a participant sample in Toronto. Of
all tone languages, Cantonese has one of the largest tonal inventories, comprising six tones –
three of which are level, and three of which are contour (Rattanasone et al., 2013; Wong et al.,
2012). These level pitch patterns are differentiable based on pitch height (Gandour, 1981;
Khouw & Ciocca, 2007). The proximity of adjacent tones is approximately one semitone (i.e., a 6%
difference in frequency, calculated from Peng, 2006), which is also the smallest distance found
between pitches in music (Bidelman, Hutka, et al., 2013). Note that this does not mean that
Cantonese language experience is on par with musicians’ auditory experience. Cantonese
speakers have less pitch processing experience than musicians, who have extensive experience
with twelve level tones (i.e., the number of semitones in a scale) at several octaves, the
processing of pitch contours as a result of the demands of musicianship, and the perception and
production of complex melodies and harmonies. Furthermore, musicians’ auditory demands
include processing simultaneous tones (e.g., chords), and attending to the tone quality (i.e.,
timbre) of their instrument, and other instruments around them. In comparison, tone language
speakers have lesser auditory demands, typically processing a single, sequential stream of
speech, without the same emphasis as musicians on tracking timbral cues. Because musicians face
higher auditory demands than tone-language speakers, one might predict greater benefits to
auditory processing in musicians than in tone-language speakers.
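For reference, the roughly 6% semitone spacing mentioned above follows directly from the equal-tempered tuning ratio, in which each semitone multiplies frequency by 2^(1/12). A minimal arithmetic sketch (in Python, illustrative only; the function name is mine, not drawn from the cited sources):

```python
def semitone_ratio(n=1):
    """Frequency ratio spanned by n equal-tempered semitones."""
    return 2 ** (n / 12)

# One semitone raises frequency by about 5.95%, consistent with the
# ~6% spacing cited for Cantonese level tones.
percent_step = (semitone_ratio(1) - 1) * 100  # ~5.95

# Twelve semitones span exactly one octave (a doubling of frequency).
octave = semitone_ratio(12)  # 2.0
```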
Though Cantonese may not confer the same experience as music training, Cantonese is
still arguably more demanding on the auditory system than most tone languages. One might posit
that a speaker of an African language such as Jukun (three level tones; Patel, 2008) would be
similarly skilled at perceiving minute changes in steady-state, level pitch. However, Cantonese
speakers have the experience of using three additional contour tones in addition to the level
pitches to convey lexical meaning, arguably increasing their amount of pitch-processing
experience (i.e., as compared to another language with fewer level and/or contour tones).
Furthermore, one could argue that processing contour tones is associated with greater auditory
task demands than the processing of level tones, making a tone language such as Cantonese a
particularly rich auditory experience.
1.3 Tone language: Associations with auditory processing
The precise use of pitch for both musicians and tone-language speakers raises the
question of whether experience with one type of pitch processing might be associated with pitch
processing in the other domain (Wong et al., 2012). However, unlike the association of
musicianship with spectral acuity and linguistic processing (Schellenberg & Peretz, 2008;
Schellenberg & Trehub, 2008; see also Bidelman, Hutka, et al., 2013), comparable evidence of
associations with tone-language experience is limited and conflicting (Bidelman, Hutka, et al.,
2013; Yip, 2002). Behavioural studies have revealed contradictory findings with respect to the
nonlinguistic pitch perception abilities of tone-language speakers, ranging from weak
associations (Giuliano, Pfordresher, Stanley, Narayana, & Wicha, 2011; Wong et al., 2012) to no
associations (Bent, Bradlow, & Wright, 2006; Bidelman et al., 2011b; Schellenberg & Trehub,
2008; Stagray & Downs, 1993). Evidence from brainstem-evoked responses suggests that tone-
language speakers and musicians are more accurate than nonmusicians at tracking the pitch of a
musical interval, a lexical tone, and musical chords (Bidelman et al., 2011a, 2011b). The
equivocal findings regarding tone-language experience and spectral acuity may be due to limitations of
the behavioural studies, including the heterogeneity of groups (e.g., pooling listeners across
multiple language backgrounds; Pfordresher & Brown, 2009) and the use of very simple
musical stimuli (Bidelman et al., 2011b).
Additional evidence for positive associations between tone language experience and
nonlinguistic domains comes from Wong et al. (2012), who examined the influence of one’s
spoken language on musical pitch processing in individuals with or without amusia.2 The
participants in this study were either tone-language speakers (Hong Kong Cantonese), or non-
tone-language speakers (Canadian French and English). Cantonese speakers showed enhanced
pitch perception abilities, compared to the English and French speakers. This effect remained
even after controlling for variables such as musical background and age. Both groups performed
comparably on measures of rhythmic perception, demonstrating that this benefit was specific to
pitch processing (Wong et al., 2012). When only examining the participants classified as amusic
(i.e., the lowest 5% of each respective group – a cut-off chosen because this is the typical
prevalence rate of amusia in Western populations, Wong et al., 2012), the Cantonese speakers
significantly outperformed the English and French speakers. This effect remained after
controlling for music training.
However, this positive link between tone-language experience and musical pitch
perception in amusics was not evident in another study (Nan, Sun, & Peretz, 2010). This null
finding may be explained by the fact that the participants in Nan et al. (2010) were Mandarin rather than
Cantonese speakers (Wong et al., 2012). Specifically, of the four lexical tones of Mandarin, only
one is level (Rattanasone et al., 2013), potentially placing less of a demand on the use of
contextual information for tone processing, as compared to the three level tones (out of a total of
2 Amusia is a neurogenetic disorder affecting music (pitch and rhythm) processing that affects approximately 4 to
6% of the Western (non-tone language speaking) population (Kalmus & Fry, 1980; Peretz et al., 2008).
six lexical tones) of Cantonese (Wong et al., 2012). Wong et al.’s (2012) findings support the
position that pitch processing is domain-general, such that if a person with amusia uses pitch to
convey lexical meaning at the word level, they may have better processing of musical pitch, as
compared to a non-tone-language-speaking amusic. This implies that the processing of musical
pitch and lexical tones share certain cognitive resources, in line with the OPERA hypothesis
(Patel, 2011). Contrary to this view, Zatorre and Baum (2012) have posited that pitch processing
in typical (i.e., non-amusic) listeners differs for music and speech, with fine-grained
representations required for music and coarse-grained representations for language.3
Suggestive associations between tone-language experience and auditory processing come
from the literature on absolute pitch (AP), which refers to the ability to name a note without a
reference pitch (Baggaley, 1974). Deutsch, Henthorn, Marvin, and Xu (2006) found that
approximately 53% of tone-language-speaking musicians (Mandarin) possessed AP, as
compared with approximately 7% of musicians who are speakers of non-tone languages. They
suggested that higher rates of AP in tone-language speakers stem from the fact that pitch is used
to distinguish the meanings of words in their native language, such that tone-language speakers
learn to associate tones with meaningful verbal labels. When these individuals begin music
training, learned pitch-meaning associations may facilitate the mapping of musical tones to note
names and hence the development of AP (Deutsch et al., 2006).
In summary, tone-language experience may be associated with auditory spectral acuity
(Bidelman et al., 2011a; Bidelman, Hutka et al., 2013; Deutsch et al., 2006; Wong et al., 2012),
although the evidence favouring this claim is mixed (Bent et al., 2006; Bidelman et al., 2011b,
Giuliano et al., 2011; Schellenberg & Trehub, 2008; Stagray & Downs, 1993). Because the tone-
language speakers in these studies were typically bilingual (they were usually tested in
non-tone-language-speaking countries), bilingualism may account for the observed effects. Like music training,
bilingualism has been studied as a model of sensory and cognitive enrichment (Krizman, Marian,
Shook, Skoe, & Kraus, 2012). At the sensory level, bilingualism has been found to affect neural
processing and the perception of sound (e.g., Spanish-English bilinguals: Krizman et al., 2012;
Mandarin-English bilinguals: Bidelman et al., 2011a, 2011b). At the cognitive level, bilingualism
3 This statement does not specifically address fine- versus coarse-grained representation in tone languages. It is
possible that fine-grained representations may be more relevant to tone languages than to non-tone languages.
has been linked with gains in inhibitory control, working memory, and attention, honed by
constantly suppressing a non-target language in favour of engaging a target language during
communication (Bialystok, 2011; Bialystok, Craik, & Luk, 2012). Via top-down control,
inhibition can shape auditory processing at the subcortical level (Fritz, Elhilali, David, &
Shamma, 2007). Thus, the cognitive benefits of bilingualism – particularly to inhibitory control –
may contribute to bilinguals’ enhanced auditory abilities at the sensory level (Krizman et al., 2012).
However, differences in cognitive processing between bilingual and monolingual young adults
are small (Bialystok et al., 2012) and depend on factors such as daily language use and the age
at which one began using both languages (Luk & Bialystok, 2013).
Indeed, there are claims that the cognitive advantage for bilinguals over monolinguals on
a range of executive-control tasks (e.g., interference, working-memory updating) and for a range
of age groups arises from publication bias favouring enhancement evidence (de Bruin, Treccani,
& Della Sala, 2015). Similar skepticism has been expressed by scholars who failed to find
reliable advantages for early bilinguals, balanced bilinguals (i.e., those with comparable
proficiency or use of both languages), or trilinguals on inhibitory control, monitoring, and task switching for
nonverbal tasks (Paap, Johnson, & Sawi, 2014). If bilingualism is not associated with gains to
executive function (at least not outside the auditory domain), then the claim that bilinguals’
sensory-level benefits are subject to top-down modulation is weakened. In short, if tone-language
experience does benefit auditory processing, these gains are unlikely to be attributable to
tone-language speakers’ bilingualism.
1.4 Nature, nurture, and causality
Musicians and tone-language speakers provide a means to investigate experience-
dependent neural plasticity in auditory networks. Earlier in this chapter (1.2), I discussed the
differential auditory demands associated with music training and speaking a tone language (i.e.,
“nurture”), and how these differential demands might predict each group’s performance on
auditory tasks. There are also differences in the contribution of “nature” to each group.
Variables such as musical aptitude, genetics, socioeconomic status (SES), and personality
may determine whether an individual takes music lessons. As discussed in Corrigall and
Schellenberg (2015), certain factors, such as increased levels of passion for music and musical
aptitude (i.e., natural musical ability, Schellenberg, 2015) can influence the likelihood of taking
music lessons (e.g., Macnamara, Hambrick, & Oswald, 2014; Ruthsatz, Detterman, Griscom, &
Cirullo, 2008). Furthermore, specific genes have been linked to many of these factors (Tan,
McPherson, Peretz, Berkovic, & Wilson, 2014), namely musical aptitude (e.g., Park et al., 2012;
Ukkola, Onkamo, Raijas, Karma, & Järvelä, 2009; Ukkola-Vuoti et al., 2013), musical
achievement (Hambrick & Tucker-Drob, 2015), and practice (Butkovic et al., 2015; Mosing et
al., 2014; Hambrick & Tucker-Drob, 2015). The gene AVPR1A on chromosome 12q has been
associated with music perception (Ukkola et al., 2009), music memory (Granot et al., 2007), and
music listening (Ukkola-Vuoti et al., 2011), while other loci on chromosome 8q are implicated in
absolute pitch and music perception (Pulli et al., 2008; Theusch, Basu, & Gitschier, 2009;
Ukkola-Vuoti et al., 2013). Yet another gene, SLC6A4 on chromosome 17q, has been linked to
music memory and choir participation (Morley et al., 2012). Therefore, individuals who possess
such genes may gravitate towards, and then continue, musical training (i.e., nurture
complementing and reinforcing nature; Schellenberg, 2011a).
Demographics also influence who becomes a musician. For example, musicians, as
compared to nonmusicians, often come from a higher family SES (e.g., Corrigall, Schellenberg,
& Misura, 2013; Müllensiefen et al., 2014; Sergeant & Thatcher, 1976; Schellenberg, 2006).
Furthermore, children enrolled in music lessons are more likely to be enrolled in extracurricular
activities other than music lessons (Corrigall et al., 2013; Schellenberg, 2006, 2011b). Musicians
also differ from nonmusicians in terms of personality traits (Corrigall et al., 2013; Corrigall &
Schellenberg, 2015). In Corrigall et al. (2013), openness-to-experience was the best predictor of
musical involvement, even when holding demographic variables and cognitive ability constant.
Furthermore, in a recent study, parents’ openness-to-experience predicted the duration of their
children’s musical training, even when controlling for the children’s demographic variables,
intelligence, and personality (Corrigall & Schellenberg, 2015). Given that personality has genetic
correlates (Bouchard, 2004; Matthews, Deary, & Whiteman, 2003), it is plausible that
individuals with certain personality traits might gravitate towards music lessons, continue
playing an instrument, and thus become the musical experts recruited for quasi-experimental
studies comparing musicians and nonmusicians.
Though there are several studies that have randomly assigned participants to music
lessons or a control condition (Chobert, Francois, Velay, & Besson, 2014; Francois, Chobert,
Besson, & Schön, 2013; Kraus et al., 2014; Moreno, Lee, Janus, & Bialystok, 2014; Moreno et
al., 2009), the scope of the benefits conferred by music training appears to be quite limited. For
example, regarding the claim that music training causes improvements in non-musical abilities, the
causal evidence in the five studies above is far weaker than the effects observed in
correlational or quasi-experimental research (Schellenberg, 2015). Such research typically
includes a group of highly-trained, adult musicians, and compares their performance to that of
nonmusicians (e.g., Oechslin, Van De Ville, Lazeyras, Hauert, & James, 2013; Palleson et al.,
2010; Parbery-Clark et al., 2013; Schön et al., 2004; Zuk et al., 2014). Thus, even if music
training can benefit auditory processing, such as speech perception or pitch processing, these
effects tend to be larger when musicians are self-selected in a cross-sectional design, rather than
when randomly assigned to training (Schellenberg, 2015). This observation supports a joint
influence of genes and environment on musicianship.
It is, however, notable that in studies using random assignment, participants are trained
for a much shorter period of time than professional-level musicians (e.g., four weeks in Moreno et
al., 2014; up to two years in Francois et al., 2013, and Kraus et al., 2014). To test whether long-
term training alone benefits auditory processing and cognition, participants would need to be
randomly assigned to either a music training or a control group, and formally trained for a
duration of time on par with that received by professional-level musicians (e.g., 15.9 years – the
average number of years of formal music training the musicians in this thesis received). Many
studies have indeed found correlations between the duration, quantity, or intensity of music
training and non-musical abilities (e.g., Brod & Opitz, 2012; Corrigall & Trainor, 2011; Strait,
O'Connell, Parbery-Clark, & Kraus, 2014). These correlational findings suggest that
training is important; however, they do not rule out the potential contributions of pre-existing
differences (e.g., genetics, SES, personality; Schellenberg, 2015).
Thus far, it has been established that musicianship is affected by both genetic
contributions and training (i.e., nature and nurture). Throughout this thesis, any benefits observed
in musicians, as well as the term “musicianship”, relate to a gene-environment interaction in this
group. It is acknowledged that one cannot make causal inferences from comparisons of
musicians and nonmusicians in a cross-sectional design, as there is no random assignment in
these designs (Corrigall et al., 2013). That is, pre-existing differences may determine who takes
music lessons, and these differences, rather than the training itself, may drive the effects
observed in musicians.
Unlike musicians, tone-language speakers are not subject to self-selection. Nearly
everyone learns a tone language if they are raised in a tone-language-speaking country (e.g.,
China, Thailand) – not just those who are gifted pitch processors. However, it is notable that
there has been some research conducted on the link between genetics and whether one speaks a
tone language (Dediu & Ladd, 2007), as well as interest in individual differences in learning
lexical tonal distinctions (Wong, Perrachione, & Parrish, 2007). Dediu and Ladd (2007)
examined the association between allele frequencies of genes related to brain growth and
development (ASPM and Microcephalin) and typological features of languages. They found a
link between genetic and linguistic diversity, such that certain alleles can bias language
acquisition and/or processing (Dediu & Ladd, 2007). This is not to say that there is a gene for a
specific language (e.g., a “Cantonese-speaker gene”). However, heritable structural and
functional differences in the brain may influence the acquisition and use of tone versus non-tone
languages (Dediu & Ladd, 2007).
Supporting the notion that there may be pre-existing differences related to learning
lexical pitch, Wong et al. (2007) examined how adult speakers of a non-tone language (English)
learned lexical tonal distinctions. In this study, functional magnetic resonance imaging (fMRI)
was used to examine the neural correlates of pitch-pattern discrimination before and after
training on the lexical tonal distinctions. Participants who excelled at learning the distinctions
showed increased brain activation in the left posterior superior temporal area after training.
However, those who reached a lower ceiling of performance showed increased activation in the
right superior temporal area and right inferior frontal gyrus (associated with nonlinguistic pitch
processing), and medial frontal areas (associated with increased use of memory and attention
resources; Wong et al., 2007). Neural activation differed between groups even before training,
suggesting that pre-existing neural differences contribute to learning lexical tonal pitch.
Collectively, these studies suggest that there may be genetic correlates at the population
level and/or individual differences related to the use of tone versus non-tone languages,
suggesting that tone language may not exclusively be the product of “nurture”. However, these
findings do not change the fact that speaking a tone language does not qualify as self-selection in
the way the term applies to musicians (i.e., one has a predisposition, seeks out an activity that
complements that predisposition, and continues training in that activity). I predict that
individuals who acquire pitch-processing abilities via a nature–nurture interaction (i.e.,
musicians) might perform better on auditory tasks than those who acquire such abilities via
nurture alone. This prediction is based on the possibility that self-selecting factors
(e.g., musical aptitude, genetics), in conjunction with music training, would be associated with
greater benefits to auditory processing, as compared to tone language speakers, who were not
self-selected for pitch processing abilities. I examine this prediction in relation to findings
throughout the thesis.
1.5 Executive function in musicians and tone language speakers
In addition to benefits in spectral acuity, musicianship has been associated with
enhancements to certain executive functions (see Section 6.1.3. for a detailed discussion on the
nature-nurture interaction underlying nonmusical, cognitive benefits in musicians versus
nonmusicians). Executive function is defined here as a collection of top-down mental functions
(Diamond, 2013), including planning, working memory, inhibition, mental flexibility, initiation
of action, and monitoring of action (Chan, Shum, Toulopoulou, & Chen, 2008). For example,
musicians, as compared to nonmusicians, have shown enhancements in auditory working
memory (Pallesen et al., 2010; Parbery-Clark, Skoe, & Kraus, 2009), response inhibition
(Moreno et al., 2011) as well as verbal fluency, processing speed, and task switching (Zuk,
Benjamin, Kenyon, & Gaab, 2014). Moreover, there is a positive correlation between executive
function and pitch identification (Hou et al., 2014). However, it is important to interpret such
correlational evidence with caution until causality can be assessed.
These findings may be accounted for by the multiple executive sub-skills recruited by
musicians, such as sustained attention, working memory, and goal-directed behaviour (Zuk et al.,
2014). These executive-function benefits may modulate the cognitive-linguistic enhancements
(e.g., selective attention for speech in noise, Parbery-Clark, Skoe, Lam, et al., 2009) that have
been observed in musicians (Moreno & Bidelman, 2014; Zuk et al., 2014). However, many of
the executive-level enhancements observed in musicians may be limited to the auditory domain,
and may not extend to non-auditory executive functions. For example, several studies have found
advantages in auditory but not visual working memory in musicians, as compared to
nonmusicians4 (e.g., Brandler & Rammsayer, 2003; Chan, Ho, & Cheung, 1998; Ho, Cheung, &
Chan, 2003; Parbery-Clark, Skoe, Lam, et al., 2009; Strait, Kraus, Parbery-Clark, & Ashley,
2010). If musicians’ cognitive enhancements are indeed confined to the auditory
domain, this may mean that any top-down modulation of sensory processing in musicians is
limited to the auditory system (i.e., within-domain top-down control rather than cross-domain
top-down control).
Furthermore, there is evidence to suggest that executive function may modulate lower-
level auditory processing in musicians via top-down control. I posit that this modulation might also
occur in tone-language speakers. Speaking a tone language involves relative pitch processing
(Xu, 1997, 1999; Xu & Wang, 2001), which has been shown to recruit working memory (Klein,
Coles, & Donchin, 1984; Wayman, Frisina, Walton, Hantz, & Crummer, 1992). Perhaps
working memory modulates lower-level (i.e., pitch-processing) abilities in tone-language
speakers via top-down control. The present thesis tests whether musicians and tone-language
speakers perform comparably to each other, and whether both outperform controls, on measures
of working memory and auditory processing (Chapter 6).
1.6 The present investigation
The overall objective of this thesis was to assess whether tone-language experience is
associated with benefits in auditory processing and executive function similar to those associated
with music training. Chapter 2 builds on the promising link between tone language and auditory
processing enhancements via AP, describing a behavioural study that examined how AP and
tone-language experience contribute to music processing and the representation of pitch-relevant
information (i.e., pitch encoding). Musicians with and without AP who speak a tone or non-tone
language were tested to determine if AP, tone-language experience, both, or neither are
associated with enhanced processing and encoding of pitch (i.e., one aspect of spectral acuity).
Although this study addressed the relative contribution of tone-language experience to spectral
acuity, it was difficult to ascertain the independent contributions of music and linguistic expertise
4 However, there are also findings of modality-independent memory enhancement in musicians (Bidelman, Hutka,
et al., 2013; George & Coch, 2011; Jakobson, Lewycky, Kilgour, & Stoesz, 2008).
because all participants were musicians. For this reason, all subsequent studies in the thesis
compared tone-language speakers who were nonmusicians with musicians who had no tone-
language experience and controls who had neither tone-language experience nor music training.
Chapter 3 describes a study that used behavioural tasks and electroencephalography
(EEG) to examine pitch discrimination of musical and vowel sounds by tone-language speakers,
musicians, and controls. This study examined the relation between tone-language experience and
spectral acuity in several ways. First, a mismatch-negativity paradigm was used for sound
discrimination, building on a well-established research literature on auditory processing (see
Näätänen, Paavilainen, Rinne, & Alho, 2007 for a review). Second, the musical and vowel
stimuli allowed for an examination of within- and cross-domain enhancements for the three
groups of participants. Third, the fundamental frequency (pitch) and first formant (timbre) were
manipulated to probe core spectral-processing abilities. EEG was selected as a response measure
not only for its high temporal resolution and cost-effectiveness but also to eliminate the
potentially confounding effects of scanner noise on auditory processing.
Chapter 4 provides a novel theoretical approach to the associations of tone-language
experience and musicianship with perception. To date, linear methods, such as averaging across
EEG trials, have been used to study the association between musicianship and language (e.g.,
Schön et al., 2004). The new theoretical approach applies non-linear analyses of brain activity to
clarify the relative contributions of musicianship and tone language-experience to auditory
processing and complement the linear measures that have been used in previous investigations of
music–language associations.
Chapter 5 applies this theoretical approach to the EEG data, examining non-linear
dependencies in the brain signal over multiple timescales (i.e., multiscale entropy), with the
application of data-driven multivariate statistical analysis known as partial least squares. These
results are then compared to the results of Chapter 3, which examined linear dependencies in the
data (i.e., averaging event-related potentials across trials).
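As background for readers unfamiliar with multiscale entropy, the analysis coarse-grains a signal at successive timescales and computes sample entropy at each scale. The sketch below is a generic illustration, not the thesis's actual analysis pipeline; the parameter choices m = 2 and r = 0.15 are common defaults rather than values reported here, and the tolerance is recomputed per scale for simplicity (canonical implementations fix it from the original signal's standard deviation):

```python
import numpy as np

def coarse_grain(x, scale):
    """Average consecutive, non-overlapping windows of length `scale`."""
    n = len(x) // scale
    return x[:n * scale].reshape(n, scale).mean(axis=1)

def sample_entropy(x, m=2, r=0.15):
    """Sample entropy: negative log of the conditional probability that
    subsequences matching for m points also match for m + 1 points,
    within a tolerance of r times the signal's standard deviation."""
    x = np.asarray(x, dtype=float)
    tol = r * x.std()

    def match_pairs(length):
        # Embed the series in `length` dimensions and count pairs of
        # subsequences whose Chebyshev distance is within tolerance.
        emb = np.array([x[i:i + length] for i in range(len(x) - length)])
        dist = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        return (np.sum(dist <= tol) - len(emb)) / 2  # exclude self-matches

    b, a = match_pairs(m), match_pairs(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

def multiscale_entropy(x, scales=range(1, 6)):
    """Sample entropy of the coarse-grained signal at each timescale."""
    x = np.asarray(x, dtype=float)
    return [sample_entropy(coarse_grain(x, s)) for s in scales]
```

Plotting these per-scale entropy values against timescale yields the entropy curves that the partial-least-squares analysis then contrasts across groups.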
Chapter 6 describes a behavioural study that used spectral-acuity and executive-function
(i.e., pitch memory, working memory) tasks to examine the range of plasticity associated with
tone-language experience. Chapter 7, the final chapter, summarizes the findings, and revisits the
role of nature and nurture in musicianship, as well as the role of nurture in tone language
experience. Potential applications and future directions of the current investigation are also
discussed.
Chapter 2 Absolute Pitch and Tone-Language Experience:
Associations with Pitch Processing and Encoding in Musicians
2.1 Introduction
2.1.1 Definition of absolute pitch
Absolute pitch (AP, also known as perfect pitch) is the ability to identify or produce a specific
pitch without a reference pitch (Baggaley, 1974). Individuals with AP have long-term memory
for musical pitch, remembering the pitches by name (Levitin, 1994). AP depends on pitch
memory and pitch labeling (Levitin, 1994). Pitch memory, which is the ability to maintain and
access stable, long-term representations of specific pitches in memory (Levitin 1994), is
commonly found in nonmusicians as well as musicians (Deutsch, 1987; Halpern, 1989;
Schellenberg & Trehub, 2003; Terhardt & Seewann, 1983; Terhardt & Ward, 1982). For
example, Schellenberg and Trehub (2003) found that university students without AP
distinguished familiar television soundtracks (instrumental) from versions that were pitch-shifted
by two semitones at 70% accuracy and from versions pitch-shifted by one semitone (i.e., the
smallest meaningful pitch difference in Western music) at 58% accuracy.
By contrast, pitch labeling—the ability to attach the correct musical label (e.g., D#, A440,
or Do) to isolated pitches—is rare and necessarily limited to those with music training (Levitin,
1994). In the United States and Europe, the prevalence of AP, and thus pitch labeling, is
estimated to be less than 0.01% (i.e., fewer than one in 10,000 people) in the general population
(Profita & Bidder, 1988). In musically-trained individuals, this percentage has been shown to
vary, depending on the type of institution or music program in which one is enrolled (Gregersen,
Kowalsky, Kohn, & Marvin, 1999). In a large survey of music students in the United States,
24.6% of conservatory students had AP. These numbers dropped to 7.3% in a university-based
school of music, and 4.7% in a liberal arts or state university music program (Gregersen et al.,
1999). AP ability is often associated with musical giftedness, the early onset of music training,
and speaking a tone language (Deutsch et al., 2006; Takeuchi & Hulse, 1993; Ward, 1999).
2.1.2 The link between absolute pitch and language
The prevalence of AP is significantly higher in countries that use a tone language than in
those that do not (Baharloo, Johnston, Service, Gitschier, & Freimer, 1998; Deutsch et al., 2006).
For example, Gregersen et al. (1999) found that among students who reported their ethnic
backgrounds as “Asian or Pacific Islander”, 49.3% of conservatory students, 25.7% of university
music program students, and 8.3% of liberal arts or state university music program students had AP.
These numbers are dramatically higher than what the authors observed in non-Asian music
students, particularly at the conservatory and university-music-program level. This spread
between AP prevalence in Asian versus non-Asian participants was also observed in Deutsch et
al. (2006), who found that approximately 53% of Mandarin-speaking conservatory musicians
possessed AP, whereas only approximately 7% of non-tone-language-speaking conservatory
musicians possessed AP.
The increased prevalence of AP in tone-language speakers may be related to early-
learned associations between pitches and lexical categories in tone-language speakers (Deutsch,
2013; Deutsch & Dooley, 2013; Deutsch et al., 2006; Lee & Lee, 2010). These associations may
increase the likelihood of developing AP, above and beyond the likelihood of
developing AP for a non-tone-language speaker who starts music training early in life.
Some evidence for this claim comes from the finding that, in conservatory music students who
begin music lessons at ages 4 to 5, approximately 60% of tone-language speakers develop AP,
while less than 20% of non-tone-language speakers develop AP (Deutsch et al., 2006). Some
have also found that tone-language (but not non-tone-language) speakers enunciate words in
their native language at a consistent pitch across occasions (Deutsch, Henthorn, & Dolson, 1999, 2004),
suggesting that tone-language speakers may also have more precise pitch memory than those
who speak a non-tone language.
Numerous studies have identified a relationship between AP and neural circuitry
underlying speech processing, specifically, in the planum temporale (PT)—a temporal lobe
region that, in the left hemisphere, contains Wernicke’s area, and is critically involved in
processing semantic meaning in speech (Deutsch & Dooley, 2013). In most individuals, the PT is
leftward-asymmetric (Geschwind & Levitsky, 1968), and this asymmetry is larger in those with
AP as compared to those without AP (Keenan, Thangaraj, Halpern, & Schlaug, 2001; Schlaug,
Jancke, Huang, & Steinmetz, 1995; Zatorre, Perry, Beckett, Westbury, & Evans, 1998).
Furthermore, individuals with AP have greater white matter connectivity in the left PT and
surrounding areas when performing a speech processing task (Oechslin, Meyer, & Jäncke, 2010).
Those with AP also have heightened white-matter connectivity between regions in the left
temporal lobe – an area responsible for speech-sound categorization (Loui, Li, Hohmann,
& Schlaug, 2011).
speech-related areas suggest different neural circuitry underlying their memory for speech
sounds. Taken together, these findings suggest that AP is linked to language as well as to music.
The interconnection across domains is further strengthened by evidence for enhanced
auditory processing in musicians and tone-language speakers. Bidelman, Hutka, et al. (2013)
demonstrated that pitch-processing experience, whether it stemmed from tone language or music
experience, was linked to similar benefits in lower-order (pitch-discrimination sensitivity,
processing speed) and higher-order (tonal memory, melodic discrimination) processes necessary
for robust music perception. This finding indicates that tone-language experience, even in the
absence of music training, can contribute to enhanced performance on music tasks.
2.1.3 The present study
Collectively, the findings on the AP–language link, particularly as related to tone
language, raise the question of how AP and tone-language background could affect behavioural
performance on tasks of music processing and encoding. Addressing this research question
would provide insights into the relationship between music and language as well as the
mechanisms of AP. If AP and tone language contribute to pitch processing and encoding, then
being an AP possessor and a tone-language speaker may yield a cumulative advantage in pitch
processing and encoding. Conversely, if AP and tone language represent independent domains,
they should not interact on tasks of pitch processing and encoding. To test these hypotheses, the
performance of tone-language-speaking musicians with and without AP was compared to that of
non-tone-language-speaking musicians with and without AP.5 A two-by-two between-groups design with zero-back and one-back tasks was used to investigate the processing and encoding of music stimuli.
5 This study has been published as Hutka and Alain (2015).
2.2 Methods
2.2.1 Participants
Thirty-five participants completed the study. Three participants were excluded due to
technical difficulties that resulted in incomplete data sets. Of the remaining 32 participants, there
were 17 females and 15 males, ranging in age from 18 to 28 (mean, M = 22.53; standard error,
SE = 0.51). All participants were instrumental musicians who had a minimum of seven years of
formal training (M = 17.36; SE = 0.78). There was no effect of gender on either accuracy, F(1,
30) = 1.59, p = .217, or reaction time, F < 1. Participants whose primary instrument was voice or
percussion were not recruited. All participants reported normal hearing and normal or corrected-
to-normal visual acuity. Participants were divided into four groups: (1) no AP, tone language; (2)
no AP, non-tone language; (3) AP, tone language, and (4) AP, non-tone language. Participants,
who had an average of 17.25 years of formal education (SE = 0.39), did not differ in education
across groups, F(3, 28) = 1.68, p = .195, η2p = .152. Table 1 displays additional demographic
information for each of the four participant groups. It is notable that participants with AP started
music lessons earlier and had more years of music training than participants without AP. Ethics
approval was granted by the Baycrest Research Ethics Board and the Department of Psychology
Ethics Review Committee at the University of Toronto. Participants were recruited by flyer and
referral, primarily from the University of Toronto’s Faculty of Music. All participants had Grade
Eight Royal Conservatory of Music accreditation or equivalent (i.e., entry requirements for many
university and college music programs) and were comfortable sight-reading in the treble clef.
Participants were compensated $10.00 per hour and were reimbursed for transit or parking. Each
participant completed one test session, which lasted approximately one-and-a-half hours. All
participants provided written, informed consent.
AP ability was assessed for each participant via an AP test created for this experiment.
The AP test consisted of 20 1500-ms piano tones, randomly chosen from across the eight octaves
of a piano keyboard. Tones were generated using Sibelius v3.0, exported as an audio file, and
normalized in Adobe Audition v1.5. Fundamental frequency values were based on an equal-tempered scale (A4 = 440 Hz). After hearing a note, participants wrote down the note name on a response sheet. If participants correctly identified 80% or more of the notes (16 or
more out of 20), they were considered AP-possessors. The range and randomization of the notes
made it difficult for anyone to receive a score of 16 or higher by chance. The 80% cut-off score
is based on procedures described in Ross, Gore, and Marks (2005) and Wu, Kirk, Hamm, and
Lim (2008).
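To illustrate why reaching the 80% cut-off by guessing is implausible, the binomial tail probability can be computed. The sketch below assumes a guessing probability of 1/12 per tone (one of 12 chromatic note names); the thesis does not state this exact chance model, so the figure is illustrative:

```python
from math import comb

# Assumed guessing model: 12 chromatic note names, so p = 1/12 per tone.
p = 1 / 12
n, cutoff = 20, 16  # 20 test tones; 16 correct (80%) required for AP status

# Binomial tail probability: P(X >= 16) = sum of C(20, k) * p^k * (1-p)^(n-k)
p_chance = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(cutoff, n + 1))
print(f"P(score >= {cutoff}/{n} by guessing) = {p_chance:.2e}")
```

Under this model, the probability of scoring 16 or higher by chance is on the order of 10^-14, consistent with the claim that such scores effectively cannot occur by guessing.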
Table 1
Demographic Information for the Four Participant Groups, N = 32.
To be included in a tone-language group, participants had to be native speakers of
Mandarin or Cantonese. A language questionnaire was administered to determine whether
participants were fluent in oral and written Mandarin or Cantonese as well as the age at which
they began to speak English fluently. The average age at which the tone-language speakers
learned English was 4.44 years (SE = 1.05). All tone-language participants reported that they
were still fluent in their native tone language at the time of the study.
Group | Gender distribution | Average age (years) | Average onset age of music training (years) | Average years of formal training | AP test score (% out of 100)

No AP, tone language (n = 7) | 3 females, 4 males | 21.43 (range 18-27, SE = 1.08) | 5.43 (range 2-8, SE = 0.86) | 16.00 (range 13-23, SE = 1.50) | 29.29 (range 5-65, SE = 4.96)

No AP, non-tone language (n = 10) | 4 females, 6 males | 22.00 (range 18-27, SE = 0.90) | 7.10 (range 3-13, SE = 0.72) | 14.90 (range 7-24, SE = 1.26) | 14.50 (range 0-30, SE = 4.15)

AP, tone language (n = 9) | 5 females, 4 males | 22.67 (range 19-27, SE = 0.95) | 3.72 (range 2-5, SE = 0.76) | 18.94 (range 15-23, SE = 1.32) | 96.67 (range 85-100, SE = 4.38)

AP, non-tone language (n = 6) | 5 females, 1 male | 24.50 (range 21-28, SE = 1.16) | 3.83 (range 2-6.5, SE = 0.93) | 20.67 (range 16-25, SE = 1.62) | 98.33 (range 90-100, SE = 5.36)
2.2.2 Materials
Auditory and visual stimuli were presented using Presentation v.14.1 under Windows XP.
This bi-modal task was chosen to create a realistic performance environment for the participants.
During performances, musicians usually read printed music while integrating auditory material
that matches or does not match what is on the printed page. Language-related stimuli (e.g.,
speech sounds) were not included to focus on musical pitch processing abilities.
All auditory stimuli had a sampling rate of 44.1 kHz with 16-bit resolution and were presented binaurally via Eartone 3A insert earphones (Indianapolis, IN) at an average level of 75 dB sound pressure level (SPL). The intensity of the stimuli was measured using a Larson-Davis sound level meter (Model 824, Provo, Utah). The music stimuli were presented in piano
timbre. There were three auditory, music-stimulus conditions: Interval, Tonal, and Atonal. The
interval stimuli consisted of two melodic (i.e., sequentially-presented) piano tones, with a total
duration of 2 s. Intervals ranged from perfect unison to an octave above or below a given note
(including perfect unison, minor second, major second, minor third, major third, perfect fourth,
augmented fourth, perfect fifth, minor sixth, major sixth, minor seventh, major seventh, and
perfect octave). There were two of each interval type, one ascending (i.e., lower note to higher note) and one descending (i.e., higher note to lower note), with the exception of the two perfect unison intervals, which have no direction, comprising a total of 26 interval stimuli. Each interval started on a different note. The tonal condition consisted of eight piano
tones, with a total duration of 5000 ms, arranged in a short melody. Tonal melodies followed the
scale pattern that defines the diatonic, major scale (all flat and sharp keys represented) and
harmonic minor scale in Western music. There were 60 different tonal melodies presented to
participants. The atonal condition consisted of eight piano tones with a total duration of 5000 ms, arranged in a short melody. There were 12 atonal melodies in total. The starting tone
for each atonal melody began on a unique note name (i.e., C, C#, D, D#, E, F, F#, G, G#, A, A#,
or B). The subsequent tones in the melody were selected to ensure they did not follow any tonal
conventions of Western musical theory (i.e., did not follow minor or major scale patterns;
featured disjunct leaps (e.g., major 7th) and/or highly chromatic passages). Music stimuli (both
auditory and visual) were created for this study using Sibelius 3.0. Auditory, non-music (i.e.,
control) stimuli consisted of 11 complex, environmental sounds, such as the sound of a keyboard
typing, presented for 1 s (see Appendix, Table S1 for complete list). All non-music stimuli
(auditory and visual) were part of a laboratory database of auditory and visual stimuli.
All visual stimuli were presented right-side-up for 4000 ms, in the middle of a computer
screen. Participants were seated approximately 85 cm from the computer screen. Music stimuli
consisted of quarter notes presented on a stave in the treble clef. Non-music stimuli consisted of
the visual analogue of the corresponding auditory, non-music stimuli (e.g., a picture of a dog as
the visual analogue of the sound of a dog barking). The inter-stimulus interval (ISI) was 2 s.
The musical stimuli were selected for their ecological validity, representing a range of
music stimuli that a musician might encounter. As Dooley and Deutsch (2011) note, some
studies found that AP possessors are subject to Stroop-like interference effects in artificial
situations, leading to the conclusion that AP is musically irrelevant (e.g., Miyazaki, 1993;
Miyazaki & Rakowski, 2002). Examples of artificial situations include the use of detuned
intervals or movable-do labels (see Dooley and Deutsch, 2011 for additional discussion). Such
artificial stimuli were avoided in the present study.
2.2.3 Procedure
Following completion of the AP test in a sound-attenuating booth, participants completed
a zero-back and one-back task (counterbalanced). Participants were familiarized with each task
by completing 20 practice trials of the zero- and one-back tasks, respectively. Prior to beginning
each familiarization task, participants were instructed to read all music stimuli from left to right.
Participants were also instructed to respond as quickly and as accurately as possible upon
deciding if the two stimuli did or did not match because they only had 4 s to make their response
after the visual stimulus was presented. Participants were then presented with either music
(interval, tonal, atonal) or non-music stimuli. After a 2-s ISI, in which a white fixation-cross was
presented mid-screen, a music or non-music visual stimulus was presented.
In the zero-back task, the visual stimulus matched or did not match the target auditory
stimulus. Participants indicated “match” or “mismatch” of the stimuli by pressing the
corresponding button on the keyboard. Participants completed 121 zero-back trials. In the one-
back task, participants were presented with an auditory stimulus followed by a visual distractor
(presented for 4 s), which was followed by a visual stimulus (presented for 4 s). The stimuli and
the distractor could be either non-music or music. Participants had to verbally identify the visual
distractor while keeping the preceding auditory stimulus in memory. For music stimuli,
participants identified the first note name in an interval or melody; for non-music stimuli,
participants identified the image on the monitor. A visual distractor was chosen over an auditory
distractor (which would vary in presentation length, depending on whether it was non-music, an
interval or a melody) such that the presentation duration of the distractor was uniform. The visual
stimulus presented after the distractor either matched or did not match the auditory stimulus.
Participants indicated “match” or “mismatch” of the stimuli by pressing the corresponding button
on the keyboard. Participants completed 60 one-back trials.
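To make the trial structure concrete, the one-back logic described above can be sketched as follows. This is an illustrative reconstruction, not the actual Presentation script; the stimulus names, the 50% match rate, and the helper function are assumptions:

```python
from dataclasses import dataclass
import random

@dataclass
class OneBackTrial:
    auditory: str    # target: interval, tonal melody, atonal melody, or non-music sound
    distractor: str  # visual distractor, shown for 4 s and identified verbally
    visual: str      # visual probe, shown for 4 s; match/mismatch response within 4 s
    match: bool      # whether the probe matches the auditory target

def make_one_back_trial(stimuli, rng):
    """Build one one-back trial: auditory target, visual distractor, visual probe."""
    target = rng.choice(stimuli)
    match = rng.random() < 0.5  # assumed 50% match rate
    probe = target if match else rng.choice([s for s in stimuli if s != target])
    return OneBackTrial(target, distractor=rng.choice(stimuli), visual=probe, match=match)

rng = random.Random(0)
stimulus_types = ["interval", "tonal melody", "atonal melody", "non-music"]
one_back = [make_one_back_trial(stimulus_types, rng) for _ in range(60)]  # 60 trials
print(len(one_back), "one-back trials generated")
```

The zero-back task omits the distractor field: the visual probe immediately follows the auditory target.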
2.2.3.1 Statistical analyses
Accuracy and response times were analyzed using a mixed factorial-design analysis of
variance (ANOVA). The between-subjects variables were AP status (AP or no AP) and tone-
language status (tone-language speaker6 or not a tone-language speaker). The repeated measures
were stimulus type (music vs non-music stimuli) and load (0-back vs 1-back). When appropriate,
degrees of freedom were adjusted with the Greenhouse-Geisser epsilon (ε), and all reported
probability estimates are based on the reduced degrees of freedom, although the original degrees
of freedom are reported. The Bonferroni correction for multiple comparisons was applied to pairwise comparisons. Statistical significance was set at alpha = 0.05. Partial eta-squared
(η2p) was used as the measure of effect size for ANOVAs.
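As a concrete illustration of the Bonferroni procedure applied to pairwise comparisons, the sketch below multiplies each uncorrected p-value by the number of comparisons (capped at 1). The data are simulated; the condition means loosely mirror the four stimulus conditions and are not the thesis data:

```python
from itertools import combinations
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Simulated accuracy (percent correct) for 32 participants per condition;
# illustrative values only, not the thesis data.
conditions = {
    "control": rng.normal(98.8, 1.0, 32),
    "tonal": rng.normal(94.5, 3.0, 32),
    "atonal": rng.normal(91.8, 4.0, 32),
    "interval": rng.normal(94.4, 3.0, 32),
}

pairs = list(combinations(conditions, 2))
m = len(pairs)  # number of pairwise comparisons (6)
for a, b in pairs:
    t, p = stats.ttest_rel(conditions[a], conditions[b])  # within-subject comparison
    p_bonf = min(1.0, p * m)  # Bonferroni adjustment
    print(f"{a} vs {b}: uncorrected p = {p:.4g}, Bonferroni p = {p_bonf:.4g}")
```

The cap at 1.0 keeps adjusted values interpretable as probabilities; a correction of 6 comparisons makes the effective per-test alpha 0.05/6 ≈ .0083.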
An analysis of covariance was also run on accuracy and reaction-time data because of a
significant group difference in age of onset of music lessons7, F(3, 28) = 4.32, p = .013, η2p =
.316. This effect was driven by tone-language speakers with AP having a younger mean age of
onset of music lessons (M = 3.72 years, SE = 0.76 years) than non-tone-language-speakers
without AP (M = 7.10 years, SE = 0.72 years; p = .013). As noted, early onset of music training
promotes the acquisition of AP (Baharloo et al., 1998; Miyazaki & Rakowski, 2002; Deutsch et
al., 2006, 2009). Two age-related factors have been speculated to be associated with the
6 There were no differences between Cantonese and Mandarin speakers’ performance on any of the dependent measures (accuracy, reaction time), Fs < 1.

7 Due to the challenges associated with finding participants who fit all of the eligibility criteria, it was difficult to match participants on this measure.
development of AP: the age of onset of formal music training and exposure to a “fixed-do”8
musical system before age 7 (Gregersen et al., 1999; Gregersen, Kowalsky, Kohn, & Marvin,
2001; Gregersen, Kowalsky, & Li, 2007). The assumption of homogeneity of regression slopes
was tested for each ANCOVA to ensure that the assumption was not violated.
2.3 Results
2.3.1 Accuracy
Figure 1 shows group mean accuracy across stimulus type and load. There was a
significant main effect of AP status, F(1, 28) = 18.990, p < .001, η2p = .404. Specifically,
participants with AP were more accurate than those without AP. There was no significant main
effect of tone-language-speaker status (F(1, 28) = 1.017, p = .332), nor a significant interaction
of AP and tone-language status (F < 1).
There was a significant effect of stimulus condition, F(3, 84) = 18.227, p < .001, η2p = .394, which was driven by the significant difference between performance on the control
stimuli (M = 98.77, SE = 0.38) and the other conditions, specifically the tonal melody (M =
94.51, SE = 0.82, p < .001), atonal melody (M = 91.81, SE = 1.98, p < .001), and interval
conditions (M = 94.42, SE = 0.83, p < .001). There was also a marginally significant difference
between tonal and atonal melodies, p = .081. There was a significant interaction between
condition and AP status, F(3, 84) = 4.723, p = .008, η2p = .144. Specifically, participants with AP significantly outperformed participants without AP for the tonal condition (AP: M = 98.201, SE = 1.150; no AP: M = 90.795, SE = 1.080, p < .001), the atonal condition (AP: M = 94.598, SE = 1.668; no AP: M = 88.936, SE = 1.567, p = .020), and the interval condition (AP: M = 97.742, SE = 1.208; no AP: M = 91.190, SE = 1.135, p < .001). For the non-musical condition, AP participants (M = 99.394, SE = .532) marginally outperformed the non-AP participants (M = 98.068, SE = .500, p = .079). There was no significant interaction between condition and tone-language status (F(3, 84) = 1.099, p = .347), nor a significant interaction among condition, AP status, and tone-language status, F < 1.

8 Note that the fixed-do system (i.e., the absolute association of labels with note names, such as “do” with “C”) contrasts with the relative, “moveable-do” system, in which a label (e.g., do) refers to the starting pitch of a given musical scale (Schellenberg & Trehub, 2008). It is notable that China uses the moveable-do system (Schellenberg & Trehub, 2008); thus, the higher rates of AP in Chinese music students (e.g., Deutsch et al., 2006; Gregersen et al., 1999) cannot be attributed to the absolute associations used in the fixed-do system.
There was no difference in accuracy between the zero- and one-back tasks, nor a significant interaction between task type (zero- versus one-back) and AP status, Fs < 1. Similarly, there
were no significant interactions between task type and tone-language status (F(1,28) = 1.216, p =
.279), nor between task type, AP status, and tone-language status, F(1,28) = 1.093, p = .305. No
other interactions were significant, Fs < 1.
2.3.1.1 ANCOVA
Significant group differences remained after controlling for the age of onset of formal
music training, F(3, 27) = 4.35, p = .013, η2p = .326. The covariate, age of onset of formal music
training, was unrelated to accuracy, F < 1.
Figure 1. Group mean accuracy performance. **p < .001. Error bars indicate SE.
2.3.2 Response Time
Figure 2 shows group mean response time across stimulus type and load. There was a
main effect of AP status, F(1, 28) = 15.216, p = .001, η2p = .352. Specifically, participants with
AP were faster than those without AP. There was no significant main effect of tone-language-
speaker status, nor a significant interaction of AP and tone-language status (Fs < 1).
There was a significant difference in reaction times across stimulus conditions, F(3, 84) =
130.178, p < .001 , η2p = .823, which was driven by the significant difference between the
control stimuli (M = 1074, SE = 42) and the other three conditions, specifically the tonal melody
(M = 1886, SE = 50), atonal melody (M =1966, SE = 50), and interval conditions (M = 1475, SE
= 60), all p < .001. The tonal and atonal conditions were also significantly slower than the
interval condition (both p < .001), which is to be expected given that the latter condition contained only two notes. There was no interaction between condition and AP status nor between condition and
tone language status, Fs < 1. The interaction between condition, AP status, and tone language
status was also not significant, F(3, 84) = 1.571, p = .202.
Participants performed faster on the zero-back (M = 1560, SE = 39) than on the one-back
task (M = 1640, SE = 46), F(1, 28) = 8.07, p = .008, η2p = .224. There were no interactions between task type (zero- versus one-back) and AP status (F(1, 28) = 2.602, p = .118) nor with tone-language status, F < 1. The interaction between task type, AP status, and tone-language status was not significant, F < 1, nor was the interaction between task type and condition (F(3,
84) = 1.804, p = .166). Lastly, the interaction between task type, condition, and AP status was
not significant, F(3, 84) = 1.107, p = .344.
Figure 2. Group mean reaction time performance. **p < .01. Error bars indicate SE.
2.3.2.1 ANCOVA
The group differences remained significant even after controlling for age of onset of
formal music training, F(3, 27) = 3.73, p = .023, η2p = .293. The covariate, age of onset of formal training, was unrelated to reaction time, F < 1.
2.4 Discussion
Adults with AP outperformed those without AP on measures of accuracy and response
time. Specifically, AP participants were more accurate across musical conditions than non-AP
participants, regardless of tone-language status. AP participants were also significantly faster
than their non-AP counterparts, when averaging across all conditions. There was no advantage of
having AP and speaking a tone language. Although speaking a tone language may increase the
likelihood of AP (Deutsch et al., 2006), AP rather than tone-language experience seemed to be
the main source of the pitch-encoding advantage in the present study. This pattern of results
remained significant even when controlling for the age of onset of music lessons.
The advantage in pitch processing and encoding observed for AP musicians may stem
from their use of both pitch-labeling and pitch-memory skills, as compared to the musicians
without AP, who only use pitch memory (Levitin, 1994). Other studies have found that AP
possessors performed better than non-AP possessors on tasks such as music-dictation and
interval-naming (Dooley & Deutsch, 2010, 2011), which presumably benefit from a combination
of pitch memory and pitch labelling skills (versus only pitch memory). This is not to say that
both groups’ pitch memory abilities are equal, and that only pitch labeling contributed to the
observed benefits in the former group. Perhaps AP musicians have better pitch memory than
those without AP, which, through interaction with the ability to label pitches, gives these
participants two powerful cues (i.e., pitch and the label) to use for encoding sound. In
comparison, those without AP may only use the pitch memory cue, which is not developed to the
same extent as pitch memory in AP musicians.
The present benefits observed in AP musicians may also be related to an association
between auditory digit span and AP ability. For instance, Deutsch and Dooley (2013) found that
AP possessors had a larger auditory digit span than their non-AP counterparts. According to
these authors, a large auditory span facilitates the development of associations between pitch and
verbal labels in early life, promoting AP acquisition. It is also possible that a larger auditory span
is a consequence of AP. Unfortunately, it would be difficult, if not impossible, to test the
causality underlying this association between auditory span and AP because one cannot
randomly assign participants to an AP or non-AP group. That is, the development of AP may be influenced by a number of factors, such as genetic influences (Gregersen et al., 1999; Gregersen,
Kowalsky, Kohn, & Marvin, 2001), early age of onset of formal music training and early
exposure to a fixed-do musical system (Gregersen et al., 1999, 2001, 2007), and speaking a tone
language (e.g., Deutsch et al., 2006; Gregersen et al., 1999). Therefore, AP cannot simply be
“trained”. Regardless of whether AP is the consequence of a larger auditory span, or vice versa,
this increased span may underlie the better behavioural performance observed in the present AP
participants. Future studies could include a measure of auditory span to establish the association
between task performance and span size, as well as an early age of onset of formal music training
and early exposure to a fixed-do musical system (Gregersen et al., 1999, 2001, 2007).
2.4.1 Mechanisms underlying performance in AP possessors
What mechanisms can account for the superior performance of individuals with AP
relative to those without AP? Four potential mechanisms include: hyper-connected temporal-lobe regions required for the perception and association of pitch (Loui et al., 2011); increased functional activity in superior temporal regions that are critical for sound perception and categorization (Loui, Zamm, & Schlaug, 2012); a gradient of AP ability (Miyazaki, 1988, 1990; Bermudez & Zatorre, 2009); and, relatedly, heightened tonal memory in certain types of AP possessors (Ross et al., 2005; Loui et al., 2011). First, musicians with AP show
higher white matter connectivity in brain regions responsible for pitch perception and
association, such as the posterior superior and middle temporal gyri, in both the left and right
hemisphere (Loui et al., 2011). Furthermore, AP musicians, as compared to musicians without
AP, showed increased functional activations in superior temporal regions critical for sound
perception and categorization, as well as increased activations in multisensory-integration areas
(Loui et al., 2012). The structural and functional differences between AP and non-AP musicians
may account for the current group differences in performance.
The behavioural outcomes of the present study may also be impacted by mechanisms
related to a spectrum of AP ability. That is, AP is not a binary trait (Loui et al., 2011), instead
reflecting a continuum of skill (Bermudez & Zatorre, 2009). Baharloo et al. (1998) categorized a
sample of AP musicians into four groups, according to where participants’ performance fell on a
distribution based on pure-tone and piano-timbre-based AP test scores. In their study, the scores of 12 musicians without AP on pure-tone- and piano-tone-based AP tests were combined with the scores of 12 randomly selected musicians with AP on the same tests. The mean pure- and
piano-tone test scores, and the standard errors for these means, were calculated. This process was
repeated 100 times, and the means of these 100 means and standard errors were calculated for
the pure-tone and piano-tone test scores (i.e., bootstrapping). Participants were classified as “AP-1” based on their performance on pure-tone AP tests (i.e., the ability to label any pitch regardless of its timbre or other attributes). According to Baharloo et al., “AP-2” and “AP-3”
groups included participants who likely had AP (i.e., strong, but not outstanding, pure-tone AP
test performance; note that the difference between AP-2 and AP-3 was not clearly outlined). The
AP-4 group included participants whose pitch-perception for pure tones was worse than the
performance of AP-1, AP-2, and AP-3 participants, despite excellent piano-tone AP test
performance. This pattern of performance distinguished AP-4 participants from the three other
groups, raising the possibility that the basis of AP-4 may differ from the other AP types. Other
studies have also found that participants who perform poorly on pure-tone-based AP tests
typically perform better when the AP-test tones use instrumental timbres and more familiar tones
(e.g., the white keys on a piano, Miyazaki, 1988, 1990). The current AP test could not reveal
which individuals had AP-1, AP-2, AP-3, or AP-4 because it used piano tones instead of pure
tones. If one could classify current task performance according to AP type, group differences in
performance might be observed. Indeed, some studies have found that AP type is related to
differences in tonal memory (Ross et al., 2005; Loui et al., 2011). For example, individuals with
“weaker” (i.e., non-AP-1) types of AP reportedly have heightened tonal memory, as contrasted with AP-1 possessors’ ability to encode the pitch of any auditory stimulus (Ross et al., 2005; Loui
et al., 2011). Assuming more than one type of AP possessor was included in the present sample,
different encoding processes might have contributed to the observed behavioural outcomes.
Future studies using sine-wave tones or a combination of sine wave and instrumental tones could
identify how performance on pitch processing and encoding tasks varies as a function of AP-
type. I predict that the present results would also be found using pure-tone rather than instrumental stimuli, assuming a homogeneous AP-1 participant group.
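The resampling procedure attributed to Baharloo et al. (1998) above can be sketched in code. The score distributions below are invented for illustration; only the procedure (combine the 12 non-AP scores with 12 randomly drawn AP scores, repeat 100 times, then average the resampled means and standard errors) follows the description:

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented pure-tone AP test scores (percent correct), for illustration only.
ap_scores = rng.normal(85, 8, 40)        # musicians with AP
non_ap_scores = rng.normal(30, 10, 12)   # the 12 musicians without AP

n_reps = 100
means, ses = [], []
for _ in range(n_reps):
    # Combine the 12 non-AP scores with 12 randomly selected AP scores
    sample = np.concatenate([non_ap_scores, rng.choice(ap_scores, 12, replace=False)])
    means.append(sample.mean())
    ses.append(sample.std(ddof=1) / np.sqrt(len(sample)))  # standard error of the mean

# Means of the 100 resampled means and standard errors
print(f"mean of means = {np.mean(means):.2f}, mean SE = {np.mean(ses):.2f}")
```

The resulting distribution of resampled means provides the reference against which individual participants' scores can be located, which is how the AP-1 through AP-4 categories were derived.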
2.4.2 No cumulative advantage of AP and tone-language experience
Interestingly, tone language and AP did not afford a cumulative advantage in pitch
processing and encoding. Other studies have examined the joint effects of music training
(granted, without special consideration of AP) and tone language. Cooper and Wang (2012)
recruited tone-language (Thai) and non-tone-language (English) speakers, who were musicians
or nonmusicians, to complete a Cantonese tone-word training task. Participants were trained to
identify words distinguished by five Cantonese tones. Participants who spoke Thai and/or were
musicians were better at Cantonese word learning. However, having both tone-language
experience and music training was not advantageous above and beyond either type of experience
alone. Similarly, Mok and Zuo (2012) investigated how music training affected native speakers
of a tone language. The authors had Cantonese and non-tone-language speakers with or without
music training perform discrimination tasks with Cantonese monosyllables and pure tones
resynthesized from Cantonese lexical tones. Although music training enhanced lexical tone
perception for non-tone-language speakers, it had little effect on the Cantonese speakers,
suggesting no joint advantage of music training and tone language experience on performance.
2.4.3 Speed-accuracy trade-off
There was no significant difference in accuracy on the zero-back versus one-back tasks, but reaction times were significantly slower for the one-back task as compared to the zero-back task. The increased reaction time for the one-back task suggests that the manipulation of increasing load was successful. The difference in reaction time but not accuracy may be explained by the speed-accuracy trade-off often seen in behavioural experiments (Zhai, Kong, & Ren, 2004).
Alternatively, it is possible that there were ceiling effects in the accuracy data. Because the
musicians were highly trained, they may have been highly skilled at identifying melodies,
intervals, and non-music sounds, with any group differences restricted to reaction time. The need
to refer back to music notation (e.g., observing that a certain passage must be repeated) is a
critical element of versatile music performance. Even a single “distractor” (i.e., in the one-back
task) may have been insufficiently distracting to participants. Further increases in memory load
(e.g., two-back task) could avoid ceiling effects in highly trained musicians.
2.4.4 Limitations
It is important to note the limitations of the current study. As mentioned earlier, future
studies could include pure tones, in addition to piano tones, in the AP test and stimuli set. This
would allow for the investigation of how different types of AP (e.g., AP-1, AP-2, etc.) are related
to task performance. The stimuli set could also benefit from more complex control stimuli, such
as a series of non-musical, complex sounds and corresponding visuals, as opposed to a single
sound and visual. Participants’ higher accuracy and faster reaction times for the control stimuli
might be due to the current stimuli being too simple (i.e., easier to encode as compared to the
musical conditions, which all included more than one item). More complex control stimuli, on
par with the complexity of the music stimuli, would make the comparison of musical and non-
musical stimuli more meaningful. That is, one could more clearly study whether the current
encoding effects are specific to a single category of stimuli (i.e., musical or non-musical),
perhaps speaking to the specificity of AP sound-encoding advantages.
There were also no measures of general cognitive ability in the present study. Although no studies have examined the relationship between IQ and AP, there is evidence for an association
between music training and IQ scores (e.g., Schellenberg, 2004). For example, Schellenberg
(2004) showed that children who received 36 weeks of music training exhibited slightly greater
increases in full-scale IQ, as compared to children in control groups (drama lessons or no
lessons). Furthermore, Schellenberg (2011a) posited that children with high IQs are more likely
than their lower-IQ peers to take music lessons and perform well on a variety of tests of
cognitive ability. Therefore, if children with higher IQ are more likely to take music lessons than
their lower-IQ peers, and AP development is associated with early exposure to music training
(Gregersen et al., 1999, 2001, 2007), then high-IQ children might be enrolled in music lessons earlier than their peers and thus have an increased opportunity to develop AP. Future studies
could include IQ measures to further explore the association between intelligence, age of onset
of music training, AP, and behavioural performance on processing and encoding tasks.
Finally, another limitation of this study was its small sample size. Recruitment was
challenging due to the specific eligibility requirements (highly-trained musicians, language
requirements, AP status). Future recruitment might be more feasible by collaborating with other
investigators or conducting online testing. The latter option is becoming increasingly popular in
psychology (Athos et al., 2007; Owen et al., 2010), including auditory research (Honing &
Ladinig, 2008), making it a potentially viable option for future behavioural studies on AP and
tone-language use. A larger sample size would have also allowed for testing whether speaking a
tone language with more tones (i.e., Cantonese) was associated with greater benefits to auditory
processing than one with fewer tones (i.e., Mandarin), via higher task demands in the former
case. Though Cantonese and Mandarin speakers did not differ in performance in the present
study, such differences might emerge, given a larger sample size.
2.5 Conclusion
The present results suggest that AP ability confers an advantage in processing and
encoding the music stimuli tested in this study. Tone-language use alone did not provide an advantage, and combining tone-language use with AP did not provide a pitch-encoding advantage beyond that afforded by AP alone. Therefore, although speaking a tone language is one factor that may increase the likelihood of AP (Deutsch et al., 2006), AP, rather than tone-language use, may be the main source of any pitch-encoding advantages. Other factors that may increase the likelihood of AP include genetic predispositions (Gregersen et al., 1999, 2001) and an early age of onset of formal music training, specifically early exposure to a fixed-do musical system (Gregersen et al., 1999, 2001, 2007).
When connecting these findings to the association between tone language and auditory
processing, it is important to note that though studying tone-language speakers is a good model
for comparisons with musicians due to both groups’ enhanced pitch acuity, the link between
tone-language speakers and AP may be limited. As discussed in Levitin and Rogers (2005),
whether or not one possesses absolute pitch is largely irrelevant to most musical tasks that
require relative (not absolute) pitch judgements. Furthermore, though this study addresses the
relative contribution of tone-language use to auditory processing, it is difficult to tease apart the
relative contributions of music and speech processing, as all participants were musicians. These
limitations demonstrate the need to test tone-language speakers who are nonmusicians, musicians
with no tone-language experience and no AP ability, and controls with neither tone-language
experience nor music experience. These questions were addressed in the subsequent study, which
used behavioural and EEG techniques.
Chapter 3 Music Training and Tone-Language Experience:
Associations with Sound Discrimination
3.1 Introduction
3.1.1 Revisiting the shared processing of music and speech
The shared, interactive processing among brain regions activated during music and
linguistic processing has been studied extensively (e.g., Bidelman et al., 2011a; Koelsch et al.,
2001; Maess et al., 2001; Merrill et al., 2012; Sammler et al., 2013; Slevc et al., 2009). However,
this overlap is not necessarily surprising, considering that any single neural region or structure is
often involved in several processes in the human brain (Anderson, 2010). Especially given the
similarities between music and speech (e.g., acoustic perception; motor production), one might
expect to observe overlapping neural regions in the processing of music and speech (Patel, 2014;
Peretz et al., 2015).
Furthermore, the co-activation of neural regions does not, by default, translate to the
sharing of neural circuitry for music and speech processing (see Peretz et al., 2015 for a review).
For example, Sammler et al. (2013) studied the role of the superior temporal gyri (STGs) in processing syntax in music and language, using intracranial recordings in temporal-lobe epilepsy patients. Though there was overlapping, bilateral activation of the STGs in both domains, there were also differences in hemispheric timing and in the involvement of frontal and temporal regions. Two
recent studies show a similar dissociation between co-activation and shared neural circuitry in
the processing of music and speech, using different neuroimaging methods (multi-voxel pattern
analysis, Merrill et al., 2012; fMRI adaptation, Armony, Auge, Angulo-Perkins, Peretz, &
Concha, 2015). These studies demonstrate that one can observe unique neural populations
associated with music and speech processing in brain regions shared by these domains (Peretz et
al., 2015). Though such studies provide compelling evidence for differential neural circuitry in
shared brain regions for music and speech processing via converging neuroimaging techniques,
the evidence for such differential neural circuitry is still scarce (Peretz et al., 2015). Indeed,
Peretz et al. (2015) still consider the question of overlap between music and speech processing
as an “open question for the field” (p. 5).
3.1.2 The present study
Evidence for the shared processing of music and speech raises the question of whether
music training and tone-language experience are associated with similar benefits to the auditory
processing of music and speech. There is some evidence to suggest that these two groups process
musical and linguistic pitch similarly at the subcortical level (Bidelman et al., 2011a, 2011b; note
that there were some subtle between-group differences in subcortical representations9). However,
behavioural findings that support a positive association between tone language and auditory
processing are mixed, as discussed in Chapter 1. Promising evidence for such an association
comes from Bidelman, Hutka et al. (2013), who found that musicians and Cantonese speakers
outperform controls on a variety of auditory tasks, such as a pitch discrimination task.
To date, the cortical mechanisms that may subserve the auditory processing similarities
between musicians and tone-language speakers have yet to be explored. The neural mechanisms
underlying shared enhancement in music and speech are proposed to be rooted in the auditory
system’s two-way feedback, such that descending, corticofugal projections from the cortex tune
subcortical circuits, while ascending projections from the subcortical regions tune cortical
circuits (cf. the reverse hierarchy theory of auditory processing, Ahissar et al., 2009; see Patel,
2011 for a discussion of this cortical-subcortical interplay). The influence of tone-language
experience on musical pitch processing may be shaped by this interplay. Indeed, tone-language
speakers have enhanced brainstem representation of pitch information (musical interval and
lexical tone, Bidelman et al., 2011a; tuned and detuned musical chords, Bidelman et al., 2011b)
comparable to the representations of musicians.
To better understand how tone-language experience and music training are associated
with auditory processing, the present study compared cortical neuroelectric activity elicited by
music and speech sounds in English-speaking musicians (i.e., extensive experience with musical
pitch), Cantonese speakers (i.e., extensive experience using pitch to distinguish lexical meaning),
9 For example, in Bidelman (2011a), Mandarin speakers exhibited greater pitch strength than nonmusicians, when
processing the rapid pitch changes in a tone language speech token (Mandarin tone 2, T2). Musicians exhibited greater pitch strength than the Mandarin speakers when processing a musical interval (major third), particularly on the onset of the second note of the interval. Musicians also showed greater pitch strength to two T2 sections that corresponded to a musical note in a diatonic scale. These findings were interpreted to suggest that brainstem responses are differentially shaped according to the salience of a given acoustic dimension to one’s domain (i.e., pitch processing in music versus a tone language; Bidelman et al., 2011a, p. 432).
and English-speaking nonmusicians (lacking experience both with musical pitch and with using pitch linguistically at the lexical level).10 Critically, musicians and, to a lesser degree, Cantonese speakers have pitch-perception experience that nonmusicians without tone-language experience lack. These eligibility criteria ensured that there was minimal overlap
between each group’s domains of pitch processing experience.
3.1.3 Electroencephalography: Components of interest
The mismatch-negativity (MMN) response was the objective assay of early cortical
sensitivity to music and speech sounds in this study. The MMN is a prominent component of an
event-related potential (ERP), serving as a neural index of detection of auditory change that is
thought to reflect early (i.e., bottom-up) processing in the auditory cortices (Näätänen,
Paavilainen, Rinne, & Alho, 2007). Previous studies have shown that changes in complex sounds
evoke larger MMN responses in musicians than in nonmusicians (Brattico et al., 2009; Brattico,
Tervaniemi, Näätänen, & Peretz, 2006; Fujioka, Trainor, Ross, Kakigi, & Pantev, 2004, 2005;
Koelsch, Schroger, & Tervaniemi, 1999; Tervaniemi, Rytkonen, Schroger, Ilmoniemi, &
Näätänen, 2001), indicating an advantage in automatic auditory processing (Koelsch et al.,
1999). In the current study, comparisons of MMN responses elicited by contrastive speech and
musical sounds in musicians and tone-language speakers made it possible to assess the degree to
which divergent forms of pitch experience influence early cortical auditory processing related to
speech and musical sound analysis.
In addition to the MMN, the P3a and late discriminative negativity (LDN; a sustained,
late-emerging, slow-wave component) were examined. The P3a sometimes follows the MMN
and is characterized by a frontocentrally distributed positive deflection thought to reflect an
involuntary attentional switch towards the deviant stimulus (Tervaniemi, Just, Koelsch,
Widmann, & Schröger, 2005; for reviews, see Escera, Alho, Schroger, & Winkler, 2000, and
Polich, 2007) and/or updating of working memory (Donchin & Coles, 1988; Polich, 2007). Past
work with passive listening has shown that musicians’ P3a response to sound habituates between
trial blocks, while nonmusicians show enhancement of the P3a between blocks (Seppanen,
Pesonen, & Tervaniemi, 2012). This difference in short-term plasticity between musicians and
10 This study has been published as Hutka, Bidelman, and Moreno (2015).
nonmusicians reveals that musicianship is associated with enhanced attentional abilities and
auditory feature encoding (Seppanen et al. 2012).
There are two common interpretations of the functional role of the LDN, both of which
imply top-down influences on auditory processing. Specifically, the LDN has been interpreted as
an index of automatic reorienting of attention following distraction by a deviant sound
(Shestakova, Huotilainen, Ceponiene, & Cheour, 2003; Wetzel, Widmann, Berti, & Schroger,
2006) and as an index of the regulation of higher-order auditory processing that follows the initial change
reflected by the MMN (Ceponiene et al., 2004; Horvath, Roeber, & Schroger, 2009; Putkinen,
Tervaniemi, & Huotilainen, 2013). Of interest, a recent report demonstrated that the LDN is
influenced by music training and language experience (Moreno, Lee, et al., 2014).
All participants were tested in two conditions, with a contrast in musical notes (differing only in pitch) or vowel stimuli (differing only in first-formant frequency) presented in separate blocks. This paradigm allowed us to test both within-domain (e.g., the note condition for the musician group; the vowel condition for the Cantonese group) and cross-domain (e.g., the note condition for the Cantonese group; the vowel condition for the musician group) auditory processing in the brain, as well as to examine pitch versus spectral (timbre) discrimination
enhancements in musicians and tone-language speakers. The magnitude of stimulus change
varied over two levels (large vs. small sound contrasts). Stimuli were presented in a multiple
oddball paradigm (e.g., Näätänen, Pakarinen, Rinne, & Takegata, 2004) to determine how music
and language experience are associated with sound discriminations of different complexity. In
addition to the electrophysiological responses, behavioural measures of pitch (fundamental
frequency, F0) and vowel (first formant, F1) discrimination were obtained. These tasks assessed
listeners’ perceptual acuity for changes in sound features within music and speech domains.
3.1.4 Hypotheses
If extensive experience with pitch and spectral information as a result of music training
and tone-language usage shapes the auditory system in similar ways, then both musicians and
Cantonese speakers would show enhanced MMN (discrimination), LDN (attentional
reorienting), and behavioural sound discrimination relative to nonmusicians. If this hypothesis
holds, then one would predict that neural and behavioural enhancements would extend across
stimulus types (i.e., compared to controls, musicians would demonstrate enhanced processing of
speech – namely, vowels, and Cantonese speakers would demonstrate enhanced processing of
musical sounds). These outcomes would suggest that both music and tone-language experience
are associated with superior automatic as well as top-down auditory processing (i.e., a
bidirectional association; Bidelman et al., 2011a; Bidelman, Hutka, et al., 2013).
3.2 Methods
3.2.1 Participants
Sixty-seven participants were recruited from the University of Toronto and Greater
Toronto Area. The data from four participants were lost due to technical difficulties (n = 3) or
attrition (n = 1). Of the remaining 63 participants, 3 were deemed outliers (3 standard deviations
above the mean on measures of difference limen) and excluded from subsequent analysis. No
one from the previous study (Chapter 2) participated in the present study. Each participant
completed questionnaires to assess language (Li, Sepanski, & Zhao, 2006; Wong & Perrachione,
2007) and music (Bidelman, Hutka, et al., 2013) background. English-speaking musicians
(hereafter referred to as Ms) (n = 21, 14 female) were amateur instrumentalists with at least eight
years of continuous training in Western classical music on their primary instrument (M = 15.43,
SD = 6.46 years), beginning at a mean age of 7.05 (SD = 3.32). All musicians had formal private
or group lessons within the past five years and currently played their instrument(s). These
inclusion criteria are consistent with many previous studies examining neuroplastic associations
with musicianship (e.g., Bidelman & Krishnan, 2010; Bidelman et al., 2011a).
English-speaking nonmusicians (n = 21, 14 female) had a maximum of 3 years of formal
music training on any combination of instruments throughout their lifetime (M = 0.81, SD =
1.40) and had not received formal instruction within the past five years. Both musicians and
nonmusicians had some exposure to a second language that was not a tone language (musicians:
90.48%, nonmusicians: 66.67%; mainly French or Spanish) but were classified as late learners
and/or had moderate to high levels of proficiency11 in their second language.
11 Participants rated the following aspects of their second language on a scale from 1 (very poor) to 7 (native proficiency): “Reading proficiency”, “writing proficiency”, “speaking fluency”, and “listening ability.” “Fluent” was defined as native proficiency in all four categories. Of the musicians who had some exposure to a second language (L2; n = 19), 15 had native proficiency in at least one of the four categories; 3 rated themselves as “good” (5) in at least one category; 1 rated themselves as “fair” (3) in at least one category. Of the nonmusicians who had some L2 exposure (n = 14), 12 had native proficiency in at least one of the four categories, while 2 rated themselves as “good”.
Cantonese-speaking participants (n = 18; 11 female) were considered late bilinguals (following the criteria of Bidelman et al., 2011a, and Chandrasekaran et al., 2009), beginning formal instruction in English at a mean age of 10.27 (SD = 5.13). All participants
were born and raised in mainland China or Hong Kong and reported using Cantonese on a
regular basis (43.53% daily use, SD = 29.79%). As with nonmusician participants, Cantonese
speakers had minimal music training throughout their lifetime (M = 0.78, SD = 0.94 years) and
had not received formal instruction in the past five years. Importantly, nonmusicians and
Cantonese speakers did not differ in their music training, p > .90. The three groups were closely
matched in age (musicians: M = 25.24, SD = 4.17; Cantonese speakers: M = 24.17, SD = 4.12;
nonmusicians: M = 23.38, SD = 4.07), p > .30, and years of formal education (musicians: M =
18.19, SD = 3.25; Cantonese speakers: M = 16.94, SD = 2.46; nonmusicians: M = 16.67, SD =
2.76), p > .10, and all were right-handed. All participants provided written, informed consent in
compliance with an experimental protocol approved by the Baycrest Centre Research Ethics
Committee. All received financial compensation for their time.
3.2.2 Cognitive tests
Participants’ general fluid intelligence and short-term memory capacity were measured to
rule out differences in cognitive ability among groups (e.g., Bidelman, Hutka, et al., 2013).
3.2.2.1 Raven’s Matrices
General fluid intelligence was measured with Raven’s Advanced Progressive Matrices
(Raven, Raven, & Court, 1998), which uses nonverbal material without cultural, language, or
social bias to assess individuals’ general cognitive ability. Each trial consisted of a 3×3 matrix of
line drawings depicting abstract patterns in all but the bottom-right cell. Participants selected the
missing pattern from among 6 to 8 alternatives and were given 10 min to complete the 29-item
battery. Items became progressively more difficult over the course of the test. Raw scores
(number correct) were recorded and used in subsequent analyses.
3.2.2.2 Corsi Blocks
A digital implementation of the Corsi blocks tapping test (Corsi, 1972) was used to gauge
each individual’s nonverbal short-term memory. On each trial, participants saw a 6×6 grid of
grey squares on the computer screen. A memory sequence was then presented by briefly
changing the colour of certain boxes in various locations on the screen. Participants were
required to recall the sequence in identical order by clicking on the target boxes. Sequence length
gradually increased from two to eight items, becoming progressively harder. Two repetitions
were presented for each span length. The longest span correctly recalled was an index of visual
short-term memory capacity.12
3.2.3 Behavioural tasks
Fundamental frequency difference limens (F0 DLs) and first formant difference limens
(F1 DLs) were measured for each participant using three-alternative, forced-choice (3AFC)
discrimination tasks (Bidelman & Krishnan, 2010). F0 DLs and F1 DLs were measured in
separate blocks. The F0 task used complex tones that varied in pitch. Individual tones contained
10 harmonics of the fundamental and were 200 ms in duration. The F1 task used synthetic
speech sounds that varied only in the first formant frequency (F1). For these stimuli, the F0 (115
Hz) as well as second (2500 Hz), third (3500 Hz), and fourth (4530 Hz) formants were kept
constant across vowels such that only F1 varied.
For a given trial in each task, participants heard three sequential intervals, two containing
an identical reference token (F0ref = 220 Hz for the F0 DL task; F1ref = 300 Hz for the F1 DL
task) and one containing a higher comparison, assigned randomly. Participants’ task was to
identify which of the three tokens (first, second, or third) differed from the other tokens.
Discrimination thresholds were measured using a 2-down, 1-up adaptive paradigm that tracks
71% correct performance on the psychometric function (Levitt, 1971). The initial frequency
difference between reference and comparison (ΔF) was set at 20% of F0ref/F1ref. Following two
consecutive correct responses, ΔF was decreased for the subsequent trial and increased following
a single incorrect response. ΔF was varied using a geometric step-size factor of two for the first
four reversals and was decreased to √2 thereafter. Fourteen reversals were measured, and the geometric mean of the last eight was used to compute each individual’s DL for the run,
12 It has been asserted that the backwards conditions of span tasks, namely the visual memory span (analogous to the current task) and digit span tasks, are more demanding of working memory processing than the forward conditions (see Wilde, Strauss, & Tulsky, 2004). However, when Wilde et al. (2004) assessed whether the backwards span was a more sensitive measure of working memory than the forward span, they found that the backwards span did not afford differential sensitivity above and beyond the forward span. Nonetheless, it is important to note that this does not mean that the forward and backward span tasks are equivalent measures. Future studies should strive to include both tasks for a more complete understanding of visual memory span.
calculated as the minimum percent change in F0/F1 that was detectable (i.e., ΔF/Fnom). F0 DLs
of two runs were averaged per listener to obtain a final estimate of each individual’s F0
discrimination threshold, i.e., the smallest change in pitch that listeners could reliably detect.
Similarly, F1 DLs of two runs were averaged per listener to obtain a final estimate of each
individual’s F1 discrimination threshold.
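For illustration, the adaptive rule and reversal averaging described above can be sketched as a small simulation. This is a minimal sketch, not the software used in the study; the `respond` callable, which stands in for a listener’s 3AFC response, is hypothetical:

```python
import math


def run_staircase(f_ref, respond, start_pct=20.0, n_reversals=14, n_avg=8):
    """2-down, 1-up adaptive track converging on ~71% correct (Levitt, 1971).

    f_ref       -- reference frequency in Hz (e.g., 220 Hz for the F0 task)
    respond     -- callable taking the current frequency difference (Hz)
                   and returning True for a correct 3AFC response
    start_pct   -- initial delta-F as a percentage of f_ref
    n_reversals -- total reversals to collect
    n_avg       -- number of final reversals averaged for the limen
    Returns the difference limen as a percent change relative to f_ref.
    """
    delta = f_ref * start_pct / 100.0
    step = 2.0                     # geometric step factor, first four reversals
    correct_streak = 0
    direction = None               # 'down' (harder) or 'up' (easier)
    reversals = []

    while len(reversals) < n_reversals:
        if respond(delta):
            correct_streak += 1
            if correct_streak == 2:            # two in a row -> make harder
                correct_streak = 0
                if direction == 'up':          # direction change = reversal
                    reversals.append(delta)
                direction = 'down'
                delta /= step
        else:                                  # one error -> make easier
            correct_streak = 0
            if direction == 'down':
                reversals.append(delta)
            direction = 'up'
            delta *= step
        if len(reversals) == 4:                # smaller steps after 4 reversals
            step = math.sqrt(2.0)

    tail = reversals[-n_avg:]
    geo_mean = math.exp(sum(math.log(r) for r in tail) / len(tail))
    return 100.0 * geo_mean / f_ref            # DL as percent change
```

With a deterministic responder whose threshold is 2 Hz against the 220-Hz reference, the track converges to a DL of roughly 1% of F0ref.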
3.2.4 EEG stimuli
EEGs were recorded using a passive, auditory oddball paradigm, consisting of two
conditions presented in separate blocks of music and speech (Figure 3).
Figure 3. Spectrograms illustrating the standard, large, and small deviant stimuli for the music
(top row) and speech (bottom row) conditions. White lines indicate the fundamental frequency of
each tone or the first formant of each vowel.
The order of conditions was counterbalanced across participants. The note condition consisted of
synthesized piano tones, created with Sibelius v7.1.3 and exported as .wav files. The notes
consisted of middle C (C4, F0 = 261.6 Hz), middle C mistuned by an increase of 0.5 semitones
(large deviant; 269.3 Hz; 2.9% increase in frequency from standard), and middle C mistuned by
an increase of 0.25 semitones (small deviant; 265.4 Hz; 1.4% increase in frequency from
standard). Note that these changes were selected because previous behavioural research has
demonstrated that both Cantonese speakers and musicians can distinguish between half-semitone
changes in a given melody better than controls, whereas musicians outperform Cantonese
speakers and controls when detecting a quarter-semitone change (Bidelman, Hutka, et al., 2013).
Tone durations were 300 ms, including 5-ms rise and fall time to reduce spectral splatter. Speech
stimuli consisted of three steady-state vowel sounds (Bidelman, Moreno, & Alain, 2013): [ʊ] as
in book, [a] as in pot, and [ʌ]13 as in but as the standard, large deviant, and small deviant (on the
border of categorical perception between the standard and large deviant; Bidelman, Moreno, et
al., 2013) respectively. The duration of each vowel was 250 ms, including 10-ms rise and fall
time. The standard vowel had an F1 of 430 Hz, the large deviant 730 Hz (41.1% increase in
frequency from standard), and the small deviant 585 Hz (26.5% increase in frequency from
standard). Speech tokens contained identical fundamental (F0), second (F2), and third (F3)
formant frequencies (F0: 100, F2: 1090, and F3: 2350 Hz), chosen to match prototypical
productions from a male speaker (Peterson & Barney, 1952). Speech stimuli were synthesized
with a cascade formant synthesizer implemented in MATLAB (The MathWorks) using
techniques described by Klatt and Klatt (1990). Stimulus onset asynchrony (SOA) was 1 s in
both conditions so that the stimulus repetition rates (and thus, neural adaptation effects) were
comparable for speech and music ERP recordings.
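The mistuned note frequencies reported above follow the equal-tempered relation f = f0 × 2^(n/12), where n is the shift in semitones. As a quick arithmetic check (the `mistune` helper is illustrative, not from the study):

```python
def mistune(f0_hz, semitones):
    """Shift a frequency upward by a number of equal-tempered semitones."""
    return f0_hz * 2.0 ** (semitones / 12.0)


c4 = 261.6                  # standard: middle C (C4), in Hz
large = mistune(c4, 0.5)    # large deviant, ~269.3 Hz (a ~2.9% increase)
small = mistune(c4, 0.25)   # small deviant, ~265.4 Hz
```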
The magnitude of F1 change between the standard and each speech deviant was chosen to
parallel the magnitude of change in the music standard and deviants. However, it is notable that a
greater magnitude of change was required to detect the standard-large deviant and standard-small
deviant changes for F1 than F0. This difference was informed by past findings showing that
participants require a larger percent change between two vowel sounds (i.e., F1) to detect a
difference, as compared to between two pitches (i.e., F0; Bidelman & Krishnan, 2010).
Specifically, in Bidelman and Krishnan (2010), musicians could detect a ~2% change between
two F1s, and a ~0.03% change between two F0s. Nonmusicians could detect a ~4% change
13 Note that this vowel sound is found in English and not in Cantonese (Zee, 1999). In contrast, the standard and large deviant vowels are found in both English and Cantonese (Zee, 1999). See Section 3.4.4 for a discussion of the implications of this vowel-use difference for the present results.
between two F1s, and a ~0.90% change between two F0s. Though these difference limens were
not measured using identical stimuli as used in the current study, they demonstrate that
participants require a greater change between F1s than between F0s to detect a difference
between stimuli. Pilot testing was used in the present study to determine the specific F0 and F1 standard-deviant changes that musicians and nonmusicians could reliably detect.
There were a total of 780 trials in each condition including 90 large deviants (12% of the
trials) and 90 small deviants (12% of the trials). Note that it has previously been demonstrated
that reliable MMN waves can be elicited in the presence of more than one deviant (Näätänen et
al., 2004). In a seminal study of MMN paradigms, Näätänen et al. (2004) compared a traditional
MMN paradigm with a new MMN paradigm using five auditory deviants. In the traditional
paradigm, one deviant was presented within a single sequence, for five sequences in total. In the
new paradigm, the five different deviants were presented within the same sequence. Each deviant
had a probability of 0.1. Therefore, in the traditional paradigm, 90% of the stimuli were
standards, while in the new paradigm, 50% were standards. The MMNs observed in the new
paradigm were equal to the MMNs observed in the traditional paradigm (Näätänen et al., 2004),
demonstrating that one can obtain five different MMNs in the time it would typically take to
obtain a single MMN.
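A multiple-oddball sequence with the proportions used here (780 trials, 90 large and 90 small deviants) can be generated along the following lines. The constraint that no two deviants occur back-to-back is an assumption added for illustration; the text does not specify the randomisation constraints:

```python
import random


def oddball_sequence(n_trials=780, n_large=90, n_small=90, seed=0):
    """Pseudo-randomised multi-deviant oddball sequence.

    Deviant slots are sampled without replacement, sorted, and spread
    out by adding each slot's rank, which guarantees at least one
    standard between any two deviants (an assumed constraint).
    """
    rng = random.Random(seed)
    n_dev = n_large + n_small
    slots = sorted(rng.sample(range(n_trials - n_dev + 1), n_dev))
    positions = [p + i for i, p in enumerate(slots)]   # gaps of >= 2
    labels = ['large'] * n_large + ['small'] * n_small
    rng.shuffle(labels)                                # interleave deviant types
    trials = ['standard'] * n_trials
    for pos, lab in zip(positions, labels):
        trials[pos] = lab
    return trials
```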
3.2.5 Procedure
Participants completed the cognitive tests (Raven’s Matrices and Corsi blocks) plus the
two difference limens tasks (F0 DL and F1 DL) (i.e., the behavioural battery) and the EEG
recording portions in counterbalanced order (P = 0.5 of starting with either the behavioural
battery or EEG recording).14 During EEG recording, participants sat in a comfortable chair and
watched a muted movie of their choice. They were instructed to attend to the movie and ignore
the sounds. Auditory stimuli were delivered binaurally from insert earphones (ER-3A) at an
intensity of 75 dB SPL. The test session lasted approximately 2 hours.
14 For the behavioural battery, either the cognitive or difference limens tests were administered first. For the former, either Corsi or Raven’s was administered first. For the latter, the F0 DL and F1 DL blocks were administered in random order. For the EEG recording, either the music or speech condition was presented first.
3.2.5.1 EEG recording and data analysis
EEGs were recorded using a 76-channel Biosemi Active Two-amplifier system (sampling
rate of 512 Hz) with electrodes placed around the scalp according to standard 10-20 locations
(Oostenveld & Praamstra, 2001). During EEG acquisition, all electrodes were referenced to the
CMS (Common Mode Sense) electrode, with the DRL (Driven Right Leg) electrode serving as
the common ground. Subsequent analyses were performed in EEGLAB (Delorme & Makeig,
2004) with custom routines coded in MATLAB. Data were re-referenced off-line to the
mastoids. Eye movements and artifacts were corrected in the continuous EEG using ICA
decomposition in EEGLAB. Excessively noisy channels were interpolated (two nearest
neighbour electrodes). EEG data were divided into epochs (−200 to 1000 ms), baseline-corrected to
the pre-stimulus interval and subsequently averaged in the time domain to obtain ERPs at each
electrode site for each response type (standards, deviants) and stimulus condition (musical notes,
vowels). Grand averaged ERPs were then digitally filtered (0.01-50 Hz, zero-phase response) for
response visualization and quantification.
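The epoching, baseline-correction, and time-domain averaging steps can be sketched with plain NumPy. This is a simplified illustration of the pipeline, not the EEGLAB/MATLAB routines actually used; re-referencing, ICA artifact correction, and filtering are omitted:

```python
import numpy as np


def epoch_and_average(eeg, events, sfreq=512, tmin=-0.2, tmax=1.0):
    """Cut epochs around event samples, baseline-correct, and average.

    eeg    -- (n_channels, n_samples) continuous recording
    events -- sample indices of stimulus onsets for one response type
    Returns the ERP as an (n_channels, n_epoch_samples) array.
    """
    pre = int(round(-tmin * sfreq))    # samples before onset
    post = int(round(tmax * sfreq))    # samples after onset
    epochs = []
    for onset in events:
        if onset - pre < 0 or onset + post > eeg.shape[1]:
            continue                   # skip epochs that run off the record
        ep = eeg[:, onset - pre:onset + post].astype(float)
        baseline = ep[:, :pre].mean(axis=1, keepdims=True)
        epochs.append(ep - baseline)   # subtract pre-stimulus mean
    return np.mean(epochs, axis=0)     # time-domain average -> ERP
```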
MMNs were computed by deriving difference waveforms, calculated by subtracting the ERPs to the standard stimuli from the corresponding deviant ERPs of the same sequence (i.e., deviant minus standard). The presence of the MMN at the mastoids was confirmed when
applying a common average reference. For each participant, MMN amplitude was measured as
the most negative peak in the 100- to 250-ms time window of difference waveforms in a fronto-
central electrode cluster (mean of F1, Fz, F2, FC1, FCz, FC2 electrodes). Similarly, P3a and the
LDN were identified in these same channels as the most positive peak in the 200- to 350-ms time
window (P3a) and the mean ERP amplitude in a latency window of 300 to 500 ms (LDN),
respectively. All component latencies were selected based on analysis windows specified in prior
research and visual inspection of the waveforms (Luck, 2005; Shestakova et al., 2003).
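Under these definitions, the component measures reduce to simple window operations on the deviant-minus-standard difference wave. A minimal sketch, assuming the inputs are one-dimensional ERPs already averaged over the fronto-central electrode cluster:

```python
import numpy as np


def component_measures(standard_erp, deviant_erp, sfreq=512, t0=-0.2):
    """MMN, P3a, and LDN measures from a difference wave.

    standard_erp, deviant_erp -- 1-D arrays over time (cluster means)
    t0                        -- epoch start time in seconds
    """
    diff = deviant_erp - standard_erp            # deviant minus standard

    def window(t_lo, t_hi):
        lo = int(round((t_lo - t0) * sfreq))     # seconds -> sample index
        hi = int(round((t_hi - t0) * sfreq))
        return diff[lo:hi]

    mmn = window(0.100, 0.250).min()     # most negative peak, 100-250 ms
    p3a = window(0.200, 0.350).max()     # most positive peak, 200-350 ms
    ldn = window(0.300, 0.500).mean()    # mean amplitude, 300-500 ms
    return mmn, p3a, ldn
```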
An analysis of two additional components, the auditory N1 (particularly the N1b subcomponent, most prominent at vertex electrodes at ~100 ms; Näätänen & Picton, 1987) and P2 waves, was also conducted on the ERPs prior to subtraction (see Appendices for introduction,
methods, results, and discussion; Figure S1 shows the standard, large, and small deviant for each
condition prior to subtraction). These two deflections have been found to be larger in musicians
than in nonmusicians (Bosnyak, Eaton, & Roberts, 2004; Pantev et al., 1998; Shahin, Bosnyak,
Trainor, & Roberts, 2003), making their analysis of particular interest for the present participant
groups. The analysis of the N1 and P2 also allowed for the examination of ERPs prior to
subtraction, providing an opportunity to verify whether the large and small deviants elicited a
change in amplitude from the standard, thus accounting for the difference waves on which the
following analyses were conducted. Although the examination of the N1 and P2 waves is relevant to the present investigation, it is tangential relative to the examination of the MMN, P3a, and LDN and is therefore included in the Appendices.
3.2.5.2 Statistical analysis
A univariate ANOVA was conducted for each cognitive and DL measure. Prior to
statistical analyses, F0 and F1 DL values were square-root transformed to satisfy normality and
homogeneity of variance assumptions required for parametric statistics. Note that when the
univariate ANOVAs were conducted on the raw F0 DL and F1 DL data (rather than on the
transformed data), the pattern of results remained the same (i.e., significant results remained
significant, and non-significant results remained non-significant). However, the bar graphs displaying the F0 DL and F1 DL results (Figure 5) show the raw means and standard error bars, as these values are easier to interpret than the square-root-transformed values.
For each of the MMN, P3a, and LDN measures, an ANOVA was conducted, with group
as the between-subjects factor, and stimulus type (music or speech) and deviant size (small or
large) as within-subjects factors. For all analyses, the dependent variable was the amplitude of a
cluster of fronto-central electrodes (average of F1, Fz, F2, FC1, FCz, FC2). For the MMN,
laterality effects were also examined, as visual inspection of the scalp topographies indicated the
possibility of between-group and between-condition differences (Figure 4). To this end, an
ANOVA was conducted, with group as the between-subjects variable, and stimulus type (music
or speech), deviant size (small or large), and laterality (left and right) electrode cluster as within-
subjects variables. The left electrode cluster was an average of a subset of left fronto-central
electrodes (AF3, F3, and F5); the right cluster was an average of right fronto-central electrodes
(AF4, F4, F6).
Figure 4. Event-related potential (ERP) scalp topography for the mismatch negativity (MMN) in
the (a) large-deviant music, (b) large-deviant speech, (c) small-deviant music, and (d) small-
deviant speech conditions. The cluster of six electrodes is outlined on the musicians' (M)
topography, as this group drove the significant between-group differences in all conditions.
Topographies show mean activation between two time points in each condition, centred on the
mean peak amplitude (190 to 200 ms for large deviants; 200 to 210 ms for small deviants).
Bonferroni corrections were applied to all pairwise contrasts to control for family-wise
error (α = 0.05). When appropriate, degrees of freedom were adjusted with the Greenhouse-
Geisser epsilon (ε) and all reported probability estimates are based on the reduced degrees of
freedom, although the original degrees of freedom are reported. Partial eta-squared (η2p) was
used as the measure of effect size for all ANOVAs.
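As a minimal illustration of the two corrections named above: Bonferroni adjustment multiplies each pairwise p-value by the number of contrasts, and the Greenhouse-Geisser correction scales both degrees of freedom by epsilon before the p-value is computed. The p-values and epsilon below are made-up examples, not results from this study:

```python
# Bonferroni adjustment for k pairwise contrasts: each raw p-value is
# multiplied by k (capped at 1) and compared against alpha = .05.
raw_p = {"M vs C": 0.920, "M vs NM": 0.0002, "C vs NM": 0.001}  # illustrative
k = len(raw_p)
adjusted = {pair: min(1.0, p * k) for pair, p in raw_p.items()}
significant = {pair: p_adj < 0.05 for pair, p_adj in adjusted.items()}

# Greenhouse-Geisser correction: multiply both degrees of freedom by the
# epsilon estimate before looking up the p-value; the original degrees of
# freedom are what get reported in the text.
epsilon = 0.85          # illustrative epsilon estimate
df1, df2 = 1, 57        # original degrees of freedom
df1_adj, df2_adj = epsilon * df1, epsilon * df2
```

This mirrors the convention stated above: probability estimates come from the reduced degrees of freedom, while the original degrees of freedom appear in the reported statistics.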
Correlations were examined, by group, i) between the behavioural auditory measures (F0 DL
and F1 DL) and ii) between the brain (MMN, P3a, and LDN) and behavioural (F0 DL, F1 DL,
Corsi span, and Raven's) measures, to assess the degree to which listeners' auditory neural
processing of speech/music predicted perceptual acuity in each domain. A false discovery rate
(FDR) procedure (Benjamini & Yekutieli, 2001) was used to correct for multiple correlation tests
with a threshold of α = .05. FDR-corrected results are reported.
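The Benjamini-Yekutieli procedure is available in statsmodels; the sketch below applies it to a hypothetical set of correlation p-values (statsmodels is an assumed dependency, and the values are placeholders):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical raw p-values from a family of correlation tests.
raw_p = np.array([0.004, 0.030, 0.120, 0.450, 0.800])

# Benjamini-Yekutieli FDR correction (valid under arbitrary dependence
# among tests), thresholded at alpha = .05.
reject, p_adj, _, _ = multipletests(raw_p, alpha=0.05, method="fdr_by")
```

Unlike the plain Benjamini-Hochberg procedure, the Yekutieli variant adds a penalty term that keeps the false discovery rate controlled even when the tests are correlated, which is appropriate for a family of correlations computed on the same participants.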
3.3 Results
3.3.1 Cognitive tests
There were no group differences on the Raven's or Corsi block scores, Fs < 1,
confirming that groups were well-matched in fluid intelligence and short-term memory.
3.3.2 Behavioural tasks
There was a significant group difference on the F0 DL task, F(2, 59) = 11.91, p < .001,
η2p = 0.295 (Figure 5A). Pairwise comparisons revealed that musicians did not perform
differently from Cantonese speakers (p = .920), but that musicians and Cantonese speakers
outperformed nonmusicians [musicians vs. nonmusicians: p < .001; Cantonese speakers vs.
nonmusicians: p = .003] (i.e., musicians = Cantonese speakers > nonmusicians).
Figure 5. A: Performance on the fundamental frequency (F0) difference limen (DL) task.
Musicians (M) and Cantonese-speaking participants (C) showed better pitch discrimination than
nonmusician (NM) controls. B: Performance on the first formant frequency (F1) DL task. M
showed superior discrimination of the first formant in speech sounds compared to C and NM.
** p ≤ .01. Error bars indicate SE.
F1 DLs also differed between groups, F(2, 59) = 8.94, p < .001, η2p = 0.239 (Figure 5B).
Pairwise comparisons revealed that musicians outperformed Cantonese speakers (p < .001) and
nonmusicians (p = .011); Cantonese speakers did not differ from nonmusicians (p = .776) (i.e.,
musicians > Cantonese speakers = nonmusicians).
3.3.3 ERP data
MMN scalp topographies, waveforms, and average peak amplitudes are shown for each
group and stimulus condition in Figures 4, 6, and 7, respectively.
Figure 6. ERP difference waves for each group and condition. Each waveform is an average
across six fronto-central channels (inset, F1, Fz, F2, FC1, FCz, FC2). M = musicians; C =
Cantonese speakers; NM = nonmusicians.
Figure 7. Mismatch negativity (MMN) peak amplitude between 100 and 250 ms for each
condition and group. The peak amplitude is the average peak of six fronto-central electrodes (F1,
Fz, F2, FC1, FCz, FC2). Error bars indicate SE. M = musicians; C = Cantonese speakers; NM =
nonmusicians.
3.3.3.1.1 MMN
There was a main effect of group on MMN amplitude, F(2, 57) = 15.71, p < .001, η2p = 0.355:
musicians had larger MMNs across all stimulus conditions than did Cantonese speakers
(p < .001) and nonmusicians (p < .001) (see Table 2 for
means and standard errors). Listeners in all three groups showed larger MMNs (i.e., enhanced
discrimination) for large deviants than for small deviants across both music and speech stimuli,
F(1, 57) = 6.453, p = .014, η2p = 0.102. All other main effects, as well as two- and three-way
interactions, were not significant, ps > .05. Thus, musicians had enhanced early cortical
discrimination relative to Cantonese speakers and nonmusicians across music and speech sounds.
Table 2
Means and Standard Errors of Mismatch Negativity Analysis Variables at Each Level.
Group   Stimulus   Deviant size   M        SE
M       Music      Large          -3.085   0.327
M       Music      Small          -2.815   0.364
M       Speech     Large          -2.989   0.271
M       Speech     Small          -2.585   0.291
C       Music      Large          -2.061   0.416
C       Music      Small          -1.292   0.236
C       Speech     Large          -1.934   0.241
C       Speech     Small          -1.577   0.185
NM      Music      Large          -2.081   0.248
NM      Music      Small          -1.534   0.245
NM      Speech     Large          -1.733   0.237
NM      Speech     Small          -1.826   0.247
Note. M = musicians; C = Cantonese speakers; NM = nonmusicians.
3.3.3.1.2 MMN laterality effects
The significant results from the MMN analysis (i.e., previous section) remained
significant after including laterality in the analysis. Pooling across groups, stimulus type, and
deviant size, the MMN was marginally stronger in the right than the left hemisphere, F(1, 57) =
3.76, p = .058, η2p = 0.062 (see Table 3 for means and standard errors). The interaction of
laterality, stimulus type, and deviant size was significant, F(1,57) = 7.91, p = .007, η2p = 0.122.
This interaction was driven by a main effect of deviant size in the right cluster, F(1,59) = 7.64, p
= .008, η2p = 0.115. Specifically, large deviants elicited stronger MMNs than small deviants. The
interaction of stimulus type and deviant size was also significant, F(1,59) = 5.42, p = .023, η2p =
.084. For the music condition, large deviants elicited stronger MMNs than small deviants,
F(1,59) = 4.32, p = .042, η2p = 0.068 (Footnote 15). All other two-, three-, and four-way
interactions were not significant, ps > .05.
Footnote 15. Following visual inspection of the data, there appeared to be between-group laterality differences in
the large-deviant music condition amplitudes. Pairwise comparisons were therefore conducted to examine these
differences. For the musicians, the right-hemisphere amplitude was significantly greater than the left-hemisphere
amplitude, t(20) = 2.408, p = .026. There was no significant difference between hemisphere amplitudes in the other
two groups [Cantonese: t(17) = 1.067, p = .301; nonmusicians: t(20) = 0.734, p = .472].
Table 3
Means and Standard Errors of Laterality Analysis Variables at Each Level.
Group   Laterality   Stimulus type   Deviant size   M        SE
M       Left         Music           Large          -2.578   0.300
M       Left         Music           Small          -2.496   0.334
M       Left         Speech          Large          -2.856   0.196
M       Left         Speech          Small          -2.340   0.251
M       Right        Music           Large          -3.139   0.269
M       Right        Music           Small          -2.513   0.330
M       Right        Speech          Large          -2.734   0.221
M       Right        Speech          Small          -2.395   0.236
C       Left         Music           Large          -1.959   0.374
C       Left         Music           Small          -1.266   0.213
C       Left         Speech          Large          -1.732   0.226
C       Left         Speech          Small          -1.404   0.220
C       Right        Music           Large          -2.177   0.429
C       Right        Music           Small          -1.341   0.197
C       Right        Speech          Large          -1.792   0.212
C       Right        Speech          Small          -1.560   0.216
NM      Left         Music           Large          -1.845   0.193
NM      Left         Music           Small          -1.539   0.267
NM      Left         Speech          Large          -1.585   0.224
NM      Left         Speech          Small          -1.651   0.244
NM      Right        Music           Large          -1.962   0.182
NM      Right        Music           Small          -1.375   0.257
NM      Right        Speech          Large          -1.566   0.193
NM      Right        Speech          Small          -1.916   0.225
Note. M = musicians; C = Cantonese speakers; NM = nonmusicians.
3.3.3.1.3 P3a
There was a significant three-way interaction of stimulus type, deviant size, and group,
F(2, 57) = 5.59, p = .006, η2p = 0.164 (see Table 4 for means and standard errors). For
musicians, the large deviant elicited a marginally more positive P3a than the small deviant,
F(1, 20) = 3.75, p = .067, η2p = 0.158. There was also a significant interaction of stimulus type
and deviant size in
musicians, F(1, 20) = 20.59, p < 0.001, η2p = 0.507. Specifically, for the music condition, the
large deviant had a more positive P3a than the small deviant, F(1, 20) = 19.37, p < 0.001, η2p =
0.492. For the speech condition, the small deviant had a more positive P3a than the large deviant,
F(1, 20) = 5.61, p = .028, η2p = 0.219. For Cantonese speakers, there was a more positive P3a for
the music than for the speech condition, F(1, 17) = 5.652, p = .029, η2p = 0.250. For
nonmusicians, there were no significant main effects or interactions (p > .05). These results
indicate that musicians had stronger involuntary switching of attention for large music deviants
than for small music deviants (as indexed by the P3a response), while the opposite pattern was
true for the speech condition in musicians. Lastly, Cantonese speakers showed stronger
involuntary switching of attention for musical sounds (i.e., pitch deviants) than for speech sounds
across both deviant sizes.
All other main effects and two-way interactions, including group as a variable, were not
significant, ps > .05. There was a significant interaction of stimulus type and deviant size, F(1,
57) = 14.40, p < .001, η2p =0.202. Specifically, for the music condition, the large deviant had a
more positive P3a than the small deviant, F(1, 57) = 8.90, p = .004, η2p = 0.135, when pooled
across groups. Across groups, one might predict that a large, as compared to a small, deviant
would be associated with an involuntary switch in attention (i.e., the large deviant is a more
obvious, “attention-grabbing” change). For the speech condition, the small deviant had a more
positive P3a than the large deviant, F(1, 57) = 5.36, p = .024, η2p = 0.086. Based on the
aforementioned logic, this finding is counterintuitive: The large deviant should be more likely to
elicit a shift in involuntary attention than a small deviant. These findings suggest that perhaps the
small speech deviant actually elicited a larger involuntary shift. This vowel sound was on the
border of categorical perception between the standard and large deviant (Bidelman, Moreno, et
al., 2013), and may thus have led to an involuntary attentional shift.
Table 4
Means and Standard Errors of P3a Analysis Variables at Each Level.
Group   Stimulus   Deviant size   M       SE
M       Music      Large          2.198   0.493
M       Music      Small          0.502   0.365
M       Speech     Large          0.407   0.314
M       Speech     Small          1.197   0.402
C       Music      Large          1.516   0.276
C       Music      Small          1.357   0.174
C       Speech     Large          0.660   0.267
C       Speech     Small          1.194   0.218
NM      Music      Large          1.023   0.352
NM      Music      Small          0.785   0.353
NM      Speech     Large          0.948   0.298
NM      Speech     Small          0.928   0.291
Note. M = musicians; C = Cantonese speakers; NM = nonmusicians.
3.3.3.1.4 LDN
There was a significant main effect of group on LDN mean amplitude, F(2, 57) = 4.56, p
= .015, η2p = 0.138 (see Table 5 for means and standard errors). Specifically, musicians had a
more negative LDN than Cantonese speakers, p = .012. There was no difference in LDN
amplitude between musicians and nonmusicians, p > .2, or between Cantonese speakers and
nonmusicians, p > .5. Pooling across groups and deviant sizes, the speech condition elicited a
more negative LDN than did the music condition, F(1, 57) = 6.48, p = .014, η2p = 0.102. There
was also a significant interaction of stimulus type and deviant size, F(1, 57) = 8.436, p = .005,
η2p = 0.129. In the speech condition, the large deviant elicited a more negative LDN than the
small deviant, t(59) = -3.597, p = .001, whereas there was no difference between the LDN
amplitudes elicited by the two music deviants (p = .498). Furthermore, the large speech deviant
elicited a more negative LDN than the large music deviant, t(59) = 3.391, p = .001, whereas
there was no difference in LDN amplitude between the small music and speech deviants
(p = .768). These results indicate that, across all groups, the speech condition elicited greater
top-down processing/re-orienting than the music condition, an effect driven largely by the large
speech deviant. That is, the large speech deviant elicited a more negative LDN than the small
speech deviant, suggesting that it was processed in a more top-down manner.
Recall that the small speech deviant (i.e., on the border of categorical perception between the
large deviant and standard) was associated with a larger P3a than the large speech deviant. It is
possible that, because the large speech deviant fell within a distinct perceptual category, it was
less distinct (i.e., elicited a smaller P3a/switch in attention) than the small speech deviant, and
was processed in a top-down manner. That is, because it was more aligned with a perceptual
category than the small speech deviant, it elicited a larger LDN as compared to the small speech
deviant. The LDN results suggest that musicians used top-down processing/re-orienting to a
greater extent than Cantonese speakers.
Table 5
Means and Standard Errors of the Late Discriminative Negativity Analysis Variables at Each
Level.
Group   Stimulus   Deviant size   M        SE
M       Music      Large          -0.808   0.297
M       Music      Small          -1.400   0.324
M       Speech     Large          -1.858   0.263
M       Speech     Small          -1.062   0.269
C       Music      Large          -0.274   0.221
C       Music      Small          -0.144   0.305
C       Speech     Large          -1.160   0.340
C       Speech     Small          -0.553   0.255
NM      Music      Large          -0.761   0.318
NM      Music      Small          -0.680   0.249
NM      Speech     Large          -1.207   0.218
NM      Speech     Small          -0.823   0.254
Note. M = musicians; C = Cantonese speakers; NM = nonmusicians.
3.3.3.1.5 Correlations
Correlations between F0 DL and F1 DL revealed a significant positive association in
musicians only, r = 0.605, p = .004. That is, better pitch discrimination was associated with
superior timbre discrimination and vice versa. There were no significant correlations between
behavioural and ERP measures, ps > .05, after FDR correction.
3.4 Discussion
By comparing cortical MMN responses to music and speech in musicians and tone-
language speakers, this study assessed possible enhancements in auditory neural processing
associated with music and tone-language experience compared to adults who were neither
musicians nor tone-language speakers. Across conditions, only musicians showed enhanced
MMN, suggesting that they had better automatic discrimination of music and speech sounds than
did Cantonese and nonmusician listeners. Cantonese experience was not associated with
increased ERP amplitude despite enhanced behavioural acuity for pitch. As expected, there was
clear differentiation between deviant size, with large deviants eliciting more pronounced
responses (i.e., more negative MMN and LDN) than small deviants.
There was no significant interaction between group, stimulus type (i.e., music versus
speech), and deviant magnitude (i.e., small versus large) for any of the ERP components.
Previously, behavioural melody discrimination tasks showed that musicians are more sensitive to
quarter-semitone changes (i.e., the size of the small music deviants in the present study) than are
Cantonese speakers and nonmusicians (Bidelman, Hutka et al., 2013). For half-semitone changes
(i.e., the size of the large music deviants in the present study), musicians previously
outperformed Cantonese speakers, who in turn outperformed nonmusicians (Bidelman, Hutka et
al., 2013). Based on these data, one might predict analogous changes in the MMN response.
However, it is possible that all musical stimuli elicited a similar MMN because the MMN is a
passive index of sound discrimination. Perhaps group differences only emerge when participants
pay attention to the stimuli (i.e., in the behavioural task).
Overall, speech and music stimuli generated comparable MMN and P3a responses in all
groups. This finding suggests that simple music and speech stimuli may engage similar neural
networks. However, for the LDN, speech sound contrasts elicited larger neural responses than
did the musical stimuli. The latter findings imply that the recruitment of top-down processing/re-
orienting is more pronounced for changes in timbre than for changes in pitch. Timbre is a highly
salient cue for listeners, providing critical information about a given sound source (Schellenberg
& Habashi, 2015). In contrast, pitch height is more reliant on situational factors (Schellenberg &
Habashi, 2015). Furthermore, discriminating sound sources (i.e., via timbre) is more
evolutionarily salient than discriminating different aspects of the same source (i.e., pitch;
Schellenberg & Habashi, 2015). Indeed, vocal timbre has been shown to be particularly salient,
enhancing memory for melodies relative to melodies presented in instrumental timbres, an effect
that may be related to the biological relevance of the voice (e.g., Weiss, Schellenberg, Trehub, &
Dawber, 2015; Weiss, Trehub, & Schellenberg, 2012; Weiss, Vanzella, Schellenberg, & Trehub,
2015).
Schellenberg and Habashi (2015) also demonstrated the importance of timbral cues in
comparison to pitch or tempo. Participants listened to previously unfamiliar melodies and were
tested for their recognition of the melodies 10 minutes, one day, or one week from the initial
exposure. Recognition ratings were collected for the old (initially-presented) melodies, as well as
for an equal number of new melodies. In the first of two experiments, half of the old melodies
were transposed by six semitones (i.e., change in key/pitch) or shifted in tempo. In the second
experiment, timbre was changed in half of the old melodies. Timbral changes negatively
impacted recognition after all three delays, whereas changes in pitch or tempo impaired
recognition only after the 10-minute and one-day delays. These results suggest that information
about timbre fades more slowly than information about pitch or tempo. Furthermore, these
effects were present in
listeners who were recruited without regard to music training, demonstrating that these results
are not limited to highly-trained musicians and/or individuals with AP.
3.4.1 Musicianship and tone language: Behavioural measures
At the behavioural level, musicians and Cantonese speakers performed better than
nonmusicians without tone-language experience. This finding adds further support to the
associations between auditory acuity for pitch and auditory experience, whether it involves
music or speech. Specifically, previous behavioural studies have reported musicians’ higher
perceptual acuity for pitch (e.g., Bidelman et al., 2011a, 2011b; Magne, Schön, & Besson, 2006;
Marques, Moreno, Castro, & Besson, 2007; Schön et al., 2004) and the timbral
characteristics of speech (Bidelman & Krishnan, 2010; Bidelman, Weiss, Moreno, & Alain,
2014; Chartrand & Belin, 2006).
Contrary to expectations, there was no behavioural advantage of tone-language
experience on the processing of speech timbre, with Cantonese participants performing no
differently from nonmusicians. If tone languages confer auditory processing benefits, those
benefits may be restricted to pitch processing. Whereas pitch is used to distinguish lexical
meaning in Cantonese but not in English (Cutler, Dahan, & van Donselaar, 1997; Yip, 2002),
timbre is a critical cue for source identification in general (see Schellenberg & Habashi, 2015, for
a discussion). Accordingly, the benefits of tone-language experience may be limited to pitch
processing (Bidelman et al., 2011b).
Selective benefits associated with the use of certain linguistic cues have been reported
previously. For example, long-term experience with duration cues from either language use or
music training predicts benefits in pre-attentive and attentive processing of duration cues in
nonspeech harmonic sounds (Marie et al., 2012). Moreover, speaking a tone language has been linked to enhanced pitch discrimination and the timing of auditory cortical responses to pitch
changes (Giuliano et al., 2010). Tone-language speakers also more readily imitate (via singing)
and discriminate musical pitch (Pfordresher & Brown, 2009). The findings suggest that tone-
language acquisition fine-tunes the processing of pitch, affecting pitch processing in linguistic
and non-linguistic domains (Bidelman et al., 2011b, Bidelman, Hutka, et al., 2013; Pfordresher
& Brown, 2009).
3.4.2 Auditory neurophysiological benefits of musicianship and tone language
Musicians’ superior behavioural discrimination of music and speech was reflected in
their MMN response, which was larger across all conditions, as compared with Cantonese
speakers and nonmusicians who did not speak a tone language. One possibility is that
musicianship tunes sensory mechanisms that subserve early discrimination in music and speech
domains (Bidelman et al., 2011b). Musicians’ superior discrimination of timbre, both at the
neural and behavioural level, also aligns with a wealth of data supporting music-to-language
associations or benefits in a number of language-related domains (e.g., Bidelman & Krishnan,
2010; Marques et al., 2007; Parbery-Clark, Skoe, Lam, et al., 2009). This superior discrimination
may be related to musicians’ predisposition for multiple aspects of superior auditory
discrimination, their broad range of auditory experiences (relative to the other groups), as well as
cross-domain benefits related to the auditory experience gained via music training (e.g., OPERA
hypothesis, Patel, 2011).
In addition to the superior sound processing of musicians at the automatic, cortical level
(i.e., MMN), they also showed enhanced top-down processing/reorienting (i.e., LDN) relative to
Cantonese speakers but not to nonmusician controls. These ERP findings only partially
corroborate previous studies that revealed enhanced subcortical (i.e., automatic) auditory
responses (e.g., Bidelman et al. 2011a; Bidelman et al. 2014; Musacchia, Sams, Skoe, & Kraus,
2007) and enhanced LDN (i.e., attentional reorienting/higher-order auditory processing) in
musicians (Putkinen et al., 2013; Moreno, Wodniecka, Tays, Alain, & Bialystok, 2014).
Furthermore, Cantonese speakers did not differ from nonmusicians without tone-language
experience (i.e., tone-language experience was not associated with an enhanced LDN relative to
controls).
There was no difference between groups in P3a amplitude. Previous research has
revealed P3a habituation over time in musicians and enhancement over time in nonmusicians,
which has been interpreted as musicianship honing attentional abilities and auditory feature
encoding (Seppänen et al., 2012). The shorter duration of testing in the present study than in the
previous study (25 versus 60 min) may account for the discrepant results.
Musicians showed stronger involuntary switching of attention for large music deviants
than for small music deviants (P3a). The difference between deviant sizes is surprising, given
that musicians can accurately identify half- and quarter-semitone changes in melody (e.g.,
Bidelman, Hutka, et al., 2013). The large deviant change is more obvious (i.e., easier to detect)
than the small deviant, perhaps accounting for this difference. Within the Cantonese group,
participants had stronger involuntary switching of attention for musical sounds (i.e., pitch
deviants) as indexed by the P3a, than for speech sounds across both deviant sizes. This finding
suggests that hearing fundamental frequency changes in a non-linguistic context elicited
attention reorientation in Cantonese speakers, who regularly use fundamental frequency in a
linguistic context. This finding may indicate that experience with pitch in a linguistic context
(i.e., Cantonese) can extend to attention reorientation of non-linguistic pitch. Future research on
Cantonese speakers could directly compare the P3a in response to pitch in a linguistic or non-
linguistic context, to better understand how this group processes pitch outside of their learned
(i.e., linguistic pitch) context.
3.4.3 Dissociation between neural and perceptual processing of music/speech
The ERP data reveal that musical experience—but not tone-language experience—is
associated with enhanced neural processing of music and speech information. This ERP
difference between Cantonese listeners and musicians implies that pitch and timbral elements of
music and speech are not as salient to tone-language speakers as they are to musicians (e.g.,
Bidelman et al., 2011a; Bidelman, Hutka, et al., 2013). However, the absence of neural
enhancements for music stimuli in Cantonese listeners is surprising in light of their behavioural
advantages for pitch processing.
Indeed, tone-language speakers’ behavioural enhancements for pitch processing were not
paralleled by neural enhancements. Previous work suggests that the engagement of cortical
circuitry subserving speech/music percepts depends on the cognitive relevance of the stimulus to
the listener (e.g., Abrams et al., 2011; Bidelman et al., 2011a, 2011b; Chandrasekaran et al.,
2009; Halpern, Martin, & Reed, 2008). For example, in response to musical stimuli, information
relayed from subcortical sensory structures engages higher-level cortical mechanisms subserving
musical pitch perception in musicians, a process that is not engaged in tone-language speakers
(Bidelman et al., 2011b). Indeed, strong correlations are observed between brain and behavioural
responses to musical chords for musicians but not for listeners lacking musical expertise (i.e.,
Cantonese and nonmusician participants; Bidelman et al., 2011b). Applying these findings to the
present study, auditory neural processing (as indexed by the MMN) seems to fully engage
higher-level perceptual mechanisms only in musicians (rather than in Cantonese participants or
nonmusicians). Similarly, timbral cues may be more salient to musicians than to Cantonese or
nonmusician participants (e.g., Bidelman et al., 2011a; Bidelman, Hutka, et al., 2013) as a result
of musicians’ extensive experience with differentiating timbre (e.g., distinguishing between
instruments when performing with other musicians). Higher auditory-processing demands of
music relative to language (e.g., Patel, 2011), as well as the contributions of nature and nurture to
musicianship, may account for musicians’ parallel enhancements in brain and behavioural
processing that is not observed in tone-language speakers.
It is also notable that there was no interaction between laterality and group in any
condition, suggesting that the lateralization of pitch and speech processing does not differ
between tone-language speakers and musicians. When collapsing across all other variables,
however, MMN responses were marginally right lateralized. Previous findings indicate that the
right hemisphere is specialized for processing the fine spectral features of musical stimuli,
whereas the left hemisphere is specialized for temporal processing (i.e., for speech perception;
see Zatorre, Belin, & Penhune, 2002 for a review). The current data are not consistent with this
right lateralization for music or left lateralization for speech (i.e., no significant interaction of
stimulus type and laterality). They suggest instead that participants were more focused on fine
spectral features than on temporal information in all stimuli. Another possibility is that the
current passive EEG task, using relatively simple auditory stimuli, was not sensitive enough to
detect lateralization differences. Other methods (e.g., fMRI, or an active rather than passive
condition) might be better suited to detecting such hemispheric differences.
3.4.4 Modularity of music and speech processing
The association between music and speech raises questions about whether the acoustic
cues in language and music rely on independent neural systems or a single, domain-general
processor (Slevc, 2012). The current data suggest that musicians have finely-tuned domain-
general processes, such that sound discrimination at the neural and behavioural level is enhanced
for both music and speech stimuli. This aligns with previous findings of musicians’ enhanced
general auditory processing, that is, spectral acuity above and beyond the processing of musical
stimuli (Kraus & Chandrasekaran, 2010). Future research could confirm this finding by
systematically (i.e., parametrically) varying spectral content without robust changes in pitch.
Differential associations between music and tone-language experience on ERPs suggest
that musicianship and language experience are associated with at least partially divergent neural
networks. That is, if the pitch experience derived from musicianship and tone language
experience shared a common neural mechanism, one would predict similar enhancements to ERP
components in both musicians and Cantonese speakers. The findings of Moreno, Wodniecka, et
al., (2014) support the view of different neural networks linked to musical and linguistic
experience, such that bilinguals and musicians exhibit different ERPs during an inhibition task.
Although inhibition is an executive function rather than a perceptual process (e.g., pitch
discrimination), the results of Moreno, Wodniecka et al. (2014) suggest that bilingualism and
musicianship have differential effects on the neural networks supporting a common ability.
3.4.5 Limitations
Future studies could address two limitations that might have contributed to the lack of
similar neural enhancements in musicians and Cantonese speakers. First, it is possible that the
use of pitch in a non-linguistic context was foreign to the Cantonese speakers (but not
musicians), and thus did not elicit as strong an MMN response as it did for musicians. Future
studies could measure the neural response to pitch in linguistic and non-linguistic contexts to
determine
whether the MMN amplitude in this group is influenced by such factors. Second, it is possible
that the vowel sounds used in the present study were biased, favouring native English speakers.
Specifically, though the standard and large deviant sounds are found in both English and
Cantonese, the small deviant vowel sound is not found in Cantonese (Zee, 1999). Perhaps the
lack of familiarity explains why the Cantonese participants did not show any differences in
MMN as compared to controls (i.e., they have less experience with this vowel than native
English speakers, who have heard and used it throughout their lives and may thus be better at
detecting differences between the standard and small vowel deviants). Future studies could
ensure that vowel stimuli consist of tokens that are used in both English and Cantonese, to
control for the amount of experience participants have with the stimuli.
3.5 Conclusion
This study tested the degree to which musicianship and tone-language experience are
associated with sound discrimination in behavioural and early cortical levels of auditory
processing. Consistent with previous reports (Bidelman, Hutka, et al., 2013), the present study
found that linguistic pitch experience and music training were associated with comparable
enhancement in basic pitch discrimination (as measured via F0 DLs). Only musicians showed
enhanced timbral processing (as measured by F1 DLs) relative to tone-language speakers and
nonmusicians. Parallel enhancements in spectral acuity at the behavioural and early cortical
levels were observed in musicians only. That is, tone-language users' advantages in pitch
discrimination that were observed behaviourally (e.g., Bidelman, Hutka, et al., 2013) were not
reflected in early cortical MMN responses to pitch changes. Although extensive music and tone-
language experience may enhance some aspects of auditory acuity (pitch discrimination), music
training may confer broader enhancements to auditory function, tuning pitch and timbre-related
neural processes.
An alternative explanation for the differences between neural and behavioural pitch-
processing in tone-language speakers is that mean activation over a cortical patch may not
adequately represent neural processes underlying the processing of sound, particularly pitch.
Musicians arguably have a greater range of experience with pitch (e.g., manipulating and
producing complex melodies and harmonies), as well as predispositions for superior pitch
processing abilities (i.e., Schellenberg, 2015), than do tone-language speakers. By this logic,
tone-language speakers should show neither neural responses to, nor behavioural benefits in,
pitch discrimination comparable to those of musicians. Since such a behavioural benefit was
observed in Cantonese speakers, it is possible that there are unique neural circuitries associated
with pitch processing in these individuals that were not adequately captured in ERP measures.
To investigate this possibility, I sought out a methodology that could detect nuanced
effects in the brain signal that might underlie the differences between auditory processing for
tone-language speakers and musicians, and that could be applied to the existing dataset. Both of
these requirements were met by the measurement of brain signal variability in the EEG data,
which examines the information processing capacity of the brain across multiple timescales
(Ghosh, Rho, McIntosh, Kotter, & Jirsa, 2008a; Heisz, Shedden, & McIntosh, 2012; McIntosh,
Kovacevic, & Itier, 2008; Misic, Mills, Taylor, & McIntosh, 2010). This approach
conceptualizes the brain as a nonlinear dynamical system and enables examination of
interactions between brain signal frequencies (Heisz & McIntosh, 2013). The following two
chapters explore this nonlinear approach, first at the theoretical level (Chapter 4, based on Hutka,
Bidelman, & Moreno, 2013), and then as applied to EEG data (Chapter 5). As Peretz et al.
(2015) have stated, converging neuroimaging evidence will be required before one can conclude
that neural overlap equates to neural sharing. The current nonlinear approach has the potential to
inform the distinction between overlapping neural regions versus distinct neural circuitries
for pitch processing as related to music and speech processing (as discussed in Peretz et al.,
2015).
Chapter 4 A Theoretical Discourse on the Use of Nonlinear Methods to
Investigate the Music-Language Association
4.1 Common acoustic processing in musicians and tone-language speakers
Based on past literature, one might assume that musicians and tone-language speakers
share acoustic processing resources, particularly when it comes to pitch processing. For instance,
Kraus and Chandrasekaran (2010) examined the relationship between music training and the
development of auditory skills, with an emphasis on the neural representation of pitch, timing,
and timbre in the human auditory brainstem. The authors posited that music training leads to
fine-tuning of all salient auditory signals, both musical and non-musical (Kraus &
Chandrasekaran, 2010). Further exploring these mechanisms, Besson, Chobert, and Marie (2011)
stated that when long-term experience in a domain impacts acoustic processing in another
domain (e.g., the use of pitch in a nonlinguistic or linguistic context), the findings can serve as
evidence for common acoustic processing. Similarly, when long-term experience in one domain
influences the build-up of abstract and specific percepts in another domain, results may serve as
evidence for cross-domain plasticity.
The notion of “musicianship tuning” has been extended to claims that music training
confers a range of enhanced sensory and cognitive processes. For example, Moreno and
Bidelman (2014) posited a multidimensional continuum model of common processing and cross-
domain plasticity. In this model, the extent of plasticity effects resulting from musicianship is
viewed as a spectrum along two orthogonal dimensions, namely Near-Far and Sensory-
Cognitive. The former describes the extent of plasticity (i.e., within a domain, or across
domains); the latter describes the level of affected processing, ranging from low-level sensory
processing specific to the auditory domain to high-level domain-general cognitive processes,
including executive function and language.
Evidence of overlap of neural regions involved in music and speech (discussed in
Chapters 1 and 3) appears to corroborate the notion of common acoustic processing in musicians,
as compared to nonmusicians. As mentioned earlier, however, co-activation of neural regions in
response to music and speech does not equate to the sharing of neural networks (Peretz
et al., 2015). Furthermore, in contrast to the evidence for common acoustic processing, the
findings described in Chapter 2 suggest that absolute pitch ability and tone-language experience
do not rely on the same mechanisms of pitch processing, such that absolute pitch—but not tone-
language experience—is associated with enhanced encoding of pitch. Additionally, there was no
cumulative effect of absolute pitch ability and speaking a tone language. If absolute pitch ability
and tone-language experience recruited a common pitch processor, then we might expect no
difference in pitch-encoding performance across all participants, because everyone would be
using the same pitch processor. Furthermore, all participants were musically trained, suggesting
they all shared a common baseline of finely-honed acoustic processing abilities. The data from
Chapter 2 therefore suggest that absolute pitch ability and tone-language use may not rely on a
common pitch processor. Alternatively, perhaps absolute pitch taps into additional auditory-
processing networks to facilitate pitch encoding, thus leading to superior performance, as
compared to tone-language speakers.
Cooper and Wang (2012) administered Cantonese tone-word training to tone language
(Thai) and non-tone language (English) speakers. These groups were further subdivided into
musicians and non-musicians. Participants were trained to identify words distinguished by five
Cantonese tones. Measures of music aptitude and phonemic tone identification were then
administered. Participants who were either Thai speakers or musicians were better at Cantonese
word learning than participants with neither type of experience. Having both tone-language
experience and musical training
was not advantageous, however, above and beyond either type of experience alone. These
findings suggest that the networks underlying the processing of verbal tones in musicians and
tone-language speakers confer similar but not cumulative behavioural benefits. That is, tone
language and musicianship may rely on a common pitch processor that is similarly honed by
both types of auditory experience. Similarly, Mok and Zuo (2012) investigated how music
training impacted lexical tone perception in native tone-language speakers. Cantonese and non-
tone language speakers with or without music training performed discrimination tasks with
Cantonese monosyllables and pure tones resynthesized from Cantonese lexical tones. Although
music training was predictive of enhanced lexical tone discrimination among non-tone language
speakers, it showed no such association among Cantonese speakers. These data also suggest that musicianship
and tone language hone common pitch processing abilities. Perhaps once these pitch processing
abilities are acquired via either musicianship or speaking a tone language, there is a ceiling
effect, such that pitch experience via musicianship and tone language does not confer any
additional advantage for pitch processing over either type of experience in isolation.16
Nevertheless, when one considers data from studies that examined the neural correlates
of pitch processing in musicians and tone-language speakers, it is less clear that the use of pitch
in a lexical (tone language) or non-lexical (musical) context similarly hones common pitch
processing abilities (Bidelman et al., 2011a, 2011b). In Bidelman et al. (2011a), both Mandarin
speakers and musicians had stronger brainstem responses to pitch tracking in a musical pitch
interval and in a lexical tone than nonmusician controls. However, in Bidelman et al. (2011b),
Mandarin speakers and musicians had stronger brainstem responses to tuned and detuned
musical chords as compared to controls, but only musicians showed superior pitch discrimination
on a behavioural task. By contrast, Chapter 3 revealed that tone-language speakers and musicians
had similar pitch discrimination ability when measured behaviourally, but only musicians had
enhanced neural responses to pitch changes.
Both of these factors (musicians’ broader range of pitch experience and possible predispositions
for superior pitch processing) may contribute to the larger MMNs observed in musicians, as
compared to the Cantonese and control groups in Chapter 3. However, the comparable
behavioural pitch discrimination performance for musicians and Cantonese speakers raises the
possibility of an alternative explanation. This discrepancy between the neural and behavioural
data in Cantonese speakers implies that there are some shared pitch processing abilities in
musicians and tone-language speakers, but that the networks that support this processing
manifest differently at the cortical level. If these networks do manifest differently at the cortical
level, the differences were too subtle to be detected in the MMN response in Chapter 3. As prefaced in
Section 3.5, the investigation of whether these networks manifest differently at the cortical level
would require an analytic procedure capable of detecting such nuanced effects in the brain
signal, which could also be applied to the existing dataset for direct comparison with the results
reported in Chapter 3. This procedure is the measurement of brain signal variability in EEG data,
which examines the information processing capacity of the brain across multiple timescales
(Heisz et al., 2012; Lippe, Kovacevic, & McIntosh, 2009; McIntosh et al., 2008; McIntosh et al.,
2014). At a more conceptual level, this method examines the brain as a nonlinear dynamical
system (Hutka et al., 2013), as contrasted with a linear, static view of brain activity (see Section
4.2). The next section will compare these linear and nonlinear approaches and expand on the
implications of this nonlinear approach.

16 Note that it may also be possible that the fine-grained nature of representations related to music experience may differ from that of tone-language experience.
4.2 A nonlinear approach to studying the music-language link
Empirical work on the association between music and language has relied on methods that
capture linear dependencies in the data, such as mean activation in or between neural regions
(e.g., Bidelman et al., 2011a, 2011b, Chapter 3). The linear approach captures brain activity as a
static entity (i.e., occurring at a single timescale). For example, in a linear approach to EEG,
waveforms are averaged together across trials. A loss of information is inherent to this process,
as the nonlinear stochastic activity that characterizes variability in each trial disappears as a
result of averaging (Figure 8). In contrast, a nonlinear approach can capture this variability
across time (see Hutka, Bidelman, & Moreno, 2013 for a discussion), thus moving from a static
view of the brain, to measuring the brain in alignment with its natural state (i.e., a complex
nonlinear system).
Figure 8. Loss of information as a result of averaging individual trials in EEG. The variation
between individual trials (left) is lost as a result of the averaging procedure, as evident in the
averaged waveform (right).
4.2.1 The brain as a complex, nonlinear system
Indeed, the brain itself is a complex nonlinear system (Bullmore & Sporns, 2009;
McKenna, McMullen, & Shlesinger, 1994), and requires a nonlinear model for greater
explanatory power of its functions (Hutka et al., 2013). Complex systems are typically
characterized as dynamic (i.e., they change with time), nonlinear (i.e., the effect is
disproportionate to the cause), multifaceted, open, unpredictable, self-organizing, and adaptive
(Larsen-Freeman, 1997, p. 142). Furthermore, the behaviour of a complex system, such as the
brain, does not emerge from any single component but instead from the interaction between its
ever-changing constituent components (Waldrop, 1992, p. 145). If we define the brain as a
complex, nonlinear system, then a linear analysis cannot provide a complete account of neural
functioning and must be complemented with nonlinear techniques.
4.2.2 Application to the study of acoustic processing influenced by experience
To further explore perceptual processing of music and speech in musicians and tone-
language speakers, nonlinear methods will be used. This approach holds the promise of revealing
the nuances that define and distinguish the pitch processing networks in musicians and tone-
language speakers. One nonlinear measure of brain signal variability has been successfully
applied to EEG data to examine how experience with a given stimulus manifests in the stochastic
interactions between brain frequencies (Heisz & McIntosh, 2013; Heisz et al., 2012). This
approach is ideally suited to the present research question, which examines how different types
of experience with a given acoustic cue (e.g., pitch) are differentially associated with brain signal
variability.
4.3 Brain signal variability
In the complex nonlinear system that is the brain, we find inherent variability (Faisal,
Selen, & Wolpert, 2008; Pinneo, 1966; Stein, Gossen, & Jones, 2005; Traynelis & Jaramillo,
1998), fluctuating across time both extrinsically (i.e., during a task; Deco, Jirsa, McIntosh,
Sporns, & Kotter, 2009; Deco, Jirsa, & McIntosh, 2011; Ghosh, Rho, McIntosh, Kotter, & Jirsa,
2008a, 2008b; Raichle, MacLeod, Snyder, Powers, Gusnard, & Shulman, 2001; Raichle &
Snyder, 2007) and intrinsically (i.e., at rest; Deco, Jirsa, & McIntosh, 2011). As discussed in
Faisal et al. (2008), variability arises from two sources: the
deterministic properties of a system (e.g., the initial state of neural circuitry will vary at the start
of each trial, leading to different neuronal and behavioural responses), and “noise”, that is,
disturbances that are not part of meaningful brain activity and thus interfere with meaningful
neural representations. The present work addresses the former type of variability, which reflects
meaningful brain activity rather than, for example, random artifacts inherent to the acquisition
of brain data (e.g., ocular/muscular perturbations or thermal noise from electrodes or MRI
scanners). This brain signal variability (BSV) is synonymous with the transient temporal
fluctuations in the brain signal (Deco et al., 2011; see Section 5.2.5 for BSV-related formulae);
its analysis can be applied to many different types of neuroimaging data.
For example, BSV has been analyzed in EEG (e.g., Heisz et al., 2012), fMRI (e.g.,
Garrett, Kovacevic, McIntosh, & Grady, 2010) and magnetoencephalography (MEG, e.g., Misic
et al., 2010). In the EEG study by McIntosh et al. (2008), BSV was examined using two
measures, namely principal component analysis (PCA, a linear method that was applied in a
nonlinear way) and multiscale entropy (MSE, a nonlinear metric). These measures are
sensitive to linear and nonlinear brain variability and differentiate between changes in the
temporal dynamics of a complex system and those of random variability (Costa, Goldberger, &
Peng, 2002, 2005). MSE indexes the temporal predictability of neural activity; it is calculated by
downsampling single-trial time series to progressively coarser-grained timescales and
calculating sample entropy (i.e., state variability) at each scale (Costa et al., 2005). Such a linear
versus nonlinear differentiation would be useful in qualifying the complexity of complementary
neural networks, particularly temporally sensitive networks such as those responsible for
language and music processing.
Recently, BSV has been found to convey important information about network dynamics,
such as integration of information (Garrett et al., 2013) and distinguishing long-range from local
connections (McIntosh et al., 2014). That is, BSV can serve to reveal a complex neural system
that has capacity for enhanced information processing and alternates between multiple functional
states (Raja-Beharelle et al., 2012). BSV thus affords the appropriate framework with which the
interaction of music and language can be studied, allowing us to view these two systems as
dynamically fluctuating across time.
As discussed in Garrett et al. (2013), the modeling of neural networks involves mapping
an integration of information across widespread brain regions, via emerging and disappearing
correlated activity between areas over time and across multiple timescales (Honey, Kotter,
Breakspear, & Sporns, 2007; Jirsa & Kelso, 2000). These transient changes result in fluctuating
temporal dynamics of the corresponding brain signal, such that more variable responses are
elicited by networks with more potential configurations, i.e., “brain states” (Garrett et al., 2013).
This signal variability is thought to represent the network’s information-processing capacity,
such that variability is positively associated with integration of information across the network
(Garrett et al., 2013). Thus, this variability is experience-dependent (rather than task-dependent),
making such representations a valuable addition to understanding the interaction of neural
mechanisms supporting auditory processing in musicians and tone-language speakers.
4.3.1 Current applications of BSV
The analysis of BSV from EEG, MEG, and fMRI is a new framework in cognitive
neuroscience data analysis. Several studies have focused on developmental applications of BSV,
finding that signal variance increases with age (McIntosh et al., 2008; Misic et al., 2010; Lippe et
al., 2009). Others have also used BSV to better understand brain networks. Garrett et al. (2010)
found that the standard deviation of BOLD signal was five times more predictive of brain age
(from age 20 to 85) than mean BOLD signal. In another study, Garrett et al. (2011) examined
how BOLD variability related to age, reaction time, and consistency in healthy younger (20 to 30
years) and older (56 to 85 years) adults on three cognitive tasks (perceptual matching, attentional
cueing, and delayed match-to-sample). Younger, faster, and more consistent performers
exhibited increased BOLD variability, establishing a functional basis for this often disregarded
measure. These studies collectively demonstrate the importance of shifting from a linear (e.g.,
mean neural response) to a nonlinear (e.g., entropy/variability) conception of complex brain
systems and their relationship to behaviour.
BSV has also been applied to the study of knowledge representation. Heisz et al. (2012)
tested whether BSV reflects functional network reconfiguration during memory processing of
faces. The amount of information associated with a particular face was manipulated (i.e., the
knowledge representation for each face; e.g., a famous face would have more information
associated with it, and thus, greater knowledge representation) while measuring BSV to capture
the EEG state variability. Across two experiments, the authors found greater BSV in response to
famous faces than to non-famous faces, and found that BSV increased with face familiarity.
Notably, these findings were not reflected in the mean ERP amplitude in the same dataset (Heisz
et al., 2012). Heisz et al. (2012) posited that cognitive processes in the perception of familiar
stimuli may engage a broader network of brain regions, which manifest as higher variability in
spatial and temporal brain dynamics (i.e., greater spatiotemporal changes in BSV).
The findings of Heisz et al. (2012) corroborate those of Tononi, Sporns, and Edelman
(1996), who found that the amount of information available for a given stimulus can be
determined by the extent to which the complexity of a stimulus matches its underlying system
complexity. For example, familiar stimuli would elicit a stronger match than novel stimuli, as
there would be more information available on the former, thus yielding greater BSV. These
findings collectively suggest that BSV increases as a result of the increased accumulation of
information within a neural network. Presumably, this type of “build-up” results from the
increased repertoire of brain responses associated with a given stimulus (Ghosh et al., 2008;
McIntosh et al., 2008; Tononi et al., 1994). These findings are applicable to understanding the
music-language link at a network level because brain responses associated with given stimuli
(i.e., differences between musical notes or lexical tones) should commensurately vary in BSV for
a group that has expertise with those stimuli (i.e., musicians or tone-language speakers).
4.4 Moving from theory to application, in the context of the music-language association
In summary, traditional approaches to understanding the brain (e.g., fMRI: mean
activation; ERPs: peak amplitudes) may not afford a complete understanding of neural activity
because they cannot capture nonlinear dependencies in the brain signal. Studies that have used
BSV to quantify knowledge representation (e.g., Heisz et al., 2012) suggest that BSV is a
promising and informative metric of plasticity. If the nonlinear, stochastic activity corresponding
to the neural processing of music and language could be measured in musicians and tone-
language speakers, one could potentially address how music and tone-language experience are
differentially associated with the neural networks underlying pitch processing. Chapter 5
describes a study in which the BSV of the EEG time-series data from Chapter 3 is measured
while participants listened to music and speech sounds.
Chapter 5 Using Brain Signal Variability to Examine Differences between
Musicians and Tone Language Speakers
5.1 Introduction
5.1.1 Brain signal variability: A recapitulation
The previous chapter outlined how nonlinear analyses could be used to gain a deeper
understanding of the music-language association. One specific nonlinear approach that holds
great potential for understanding the neural mechanisms underlying auditory processing in
musicians and tone-language speakers is the measurement of BSV. There is strong evidence
showing that BSV serves as a metric of neural-network dynamics, which provides valuable
information about these dynamics that could not be obtained through the sole measurement of
mean neural activity (e.g., using ERPs, Heisz et al., 2012; McIntosh et al., 2008; Vakorin et al.,
2011; see also: Garrett et al., 2011, Ghosh et al., 2008). Previous findings suggest that BSV
reflects the brain’s information processing capacity, such that a more variable signal indicates
greater cross-network information integration (e.g., Heisz et al. 2012; Misic et al., 2010). Studies
have shown that the more information available to a listener about a given stimulus, the greater
the BSV in response to that stimulus (Heisz et al. 2012; Misic et al. 2010). Variability should
therefore increase as a function of learning, such that the more information one acquires for a
stimulus, the greater the information carried in the brain signal (Heisz et al., 2012).17 For these
reasons, I posited that BSV might have great potential for studying group differences in auditory
processing of musicians and tone-language speakers.

17 Note that variability in BSV is determined both intra-trial and intra-subject. Thus, one may have two participants with highly variable BSV, but with different data values.

5.1.2 The present investigation
In the current investigation, the theoretical concept of using BSV to study the music-
language association was applied to the EEG data collected from the study described in Chapter
3 (please see Section 3.2 for information on participants, cognitive tests, and EEG stimuli).
While we observed that both musicians and Cantonese speakers showed superior performance on
a behavioural pitch discrimination task, as compared to controls, only musicians had an
enhanced MMN in response to pitch (i.e., indexing better automatic pitch discrimination), as
compared to Cantonese speakers and controls. One explanation for these results is that there are
unique neural circuitries associated with pitch processing in Cantonese speakers that were not
characterized in ERP measures. As discussed earlier, BSV has been shown to provide
information above and beyond what is learned from mean activation, and can index knowledge
representation supporting the processing of a given stimulus (e.g., Heisz et al., 2012). Therefore,
the current investigation measured BSV during these groups’ processing of pitch (as compared to
a non-pitch cue, namely speech timbre), with the objective of better understanding how
musicians and Cantonese speakers differ with respect to the information processing capacity of
neural networks supporting auditory processing. This examination would directly address
whether musicianship or tone-language experience is reflected in similar or different information
processing capacities of pitch versus timbre. Furthermore, comparing the spatiotemporal profile
of music and speech processing for each group could reveal similarities and differences. In
addition, measuring BSV would allow us to examine how experience with one auditory cue (e.g.,
pitch) transfers to the processing of another auditory cue (e.g., timbre).
To this end, we measured BSV of the EEG during auditory processing of music (pitch
variation) and vowels (timbre variation) in musicians, Cantonese speakers, and non-musician
controls. This design tested whether pitch processing is supported by common neural network
activations in musicians and Cantonese speakers. I hypothesized that if musical training and
speaking Cantonese similarly tune information processing supporting music and speech, then
both groups would show greater BSV supporting auditory processing relative to that of controls
(i.e., musicians = Cantonese speakers > controls). If auditory expertise and/or pre-existing
differences between musicians and Cantonese speakers differentially impact information
processing capacity, then one would predict different BSV between musicians and tone-language
speakers. This latter prediction would also manifest in unique spatiotemporal distributions for
each group, as each group would be using a different brain network to support processing of
pitch versus timbre.
5.2 Methods
5.2.1 EEG recording and pre-processing
Following the EEG recording and pre-processing described in Section 3.2.5.1, source
estimation was performed at 72 regions of interest defined in Talairach space (Diaconescu,
Alain, & McIntosh, 2011) using sLORETA (Pascual-Marqui, 2002), as implemented in
Brainstorm (Tadel, Baillet, Mosher, Pantazis, & Leahy, 2011). Source reconstruction was
constrained to the cortical mantle of the standardized brain template MNI/Colin27 defined by the
Montreal Neurological Institute in Brainstorm. Current density for one source orientation (X
component) was mapped at 72 brain regions of interest adapting the regional map coarse
parcellation scheme of the cerebral cortex developed in Kotter and Wanke (2005). MSE was
calculated on the source waveform at each region of interest (ROI) for each participant.
5.2.2 Multiscale entropy analysis
To characterize BSV, multiscale entropy (MSE; Costa et al., 2002, 2005) was measured, as it
indexes sample entropy (Richman & Moorman, 2000) across multiple timescales. MSE was calculated in two
steps using the algorithm available at www.physionet.org/physiotools/mse (Goldberger et al.,
2000). First, the EEG signal was progressively down-sampled into multiple coarse-grained
timescales where, for scale τ, the time series is constructed by averaging the data points with
non-overlapping windows of length τ. Each element of a coarse-grained time series, $y_j^{(\tau)}$, is
calculated according to Eq. (1):

$$y_j^{(\tau)} = \frac{1}{\tau} \sum_{i=(j-1)\tau + 1}^{j\tau} x_i, \qquad 1 \le j \le \frac{N}{\tau} \qquad (1)$$

where $x_i$ denotes the original time series of length $N$.
The number of scales is determined by the number of data points in the signal; the data in the
present study supported 12 timescales [sampling rate (512 Hz) × epoch duration (1200 ms) ÷ 50
time points per epoch = a maximum of 12 scales]. To convert a timescale into
milliseconds (ms), the timescale was divided by the EEG sampling rate (512 Hz).
Second, the algorithm calculates the sample entropy (SE) for each coarse-grained time
series (Eq. (2)):

$$SE(m, r, N) = -\ln \frac{n_{m+1}(r)}{n_{m}(r)} \qquad (2)$$

where $n_{m}(r)$ is the number of pairs of sequences of $m$ consecutive data points that match each other within the criterion $r$.
Sample entropy quantifies the predictability of a time series by calculating the conditional
probability that any two sequences of m consecutive data points that are similar to each other
within a certain criterion (r) will remain similar at the next point (m+1) in the data set (N), where
N is the length of the time series (Richman & Moorman 2000). In the present study, MSE was
calculated with pattern length set to m=5 and the similarity criterion was set to r=1. MSE
estimates were obtained for each participant as the mean across single trial entropy measures for
each timescale.
5.2.3 Spectral analysis
Power spectral density (PSD) was also measured for all trials. This spectral analysis was
conducted because previous studies suggested that changes in MSE tend to follow closely
changes in spectral power, while providing unique information about the data (Gudmundsson et
al., 2007; Lippe et al., 2009; McIntosh et al., 2008; McIntosh et al,. 2008; Misic et al., 2010).
Therefore, changes in sample entropy across sources and temporal scales were examined, as well
as at changes in PSD across sources and frequency bands.
Single-trial power spectra were computed using the Fast Fourier transform (FFT). To
capture the relative contribution from each frequency band, all time series were first normalized
to a mean of 0 and SD of 1. Given a sampling rate of 512 Hz and 614 data points per trial, the
effective frequency resolution was 0.834 Hz. Hence, all spectral analyses were constrained to a
bandwidth of 0.834-50 Hz.
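The single-trial spectral pipeline just described (normalization to mean 0 and SD 1, FFT, restriction to the 0.834-50 Hz band) can be sketched as follows. This is an illustrative reconstruction under the parameters given above (512 Hz sampling, 614 points per trial); the function name is an assumption:

```python
import numpy as np

def single_trial_psd(trial, fs=512.0, fmax=50.0):
    """Power spectrum of one EEG trial, normalized to mean 0 and SD 1
    so that the relative contribution of each frequency band is captured."""
    trial = np.asarray(trial, dtype=float)
    trial = (trial - trial.mean()) / trial.std()
    n = len(trial)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    power = np.abs(np.fft.rfft(trial)) ** 2 / n
    # Effective frequency resolution: fs / n (512 / 614 ≈ 0.834 Hz).
    resolution = fs / n
    keep = (freqs >= resolution) & (freqs <= fmax)
    return freqs[keep], power[keep]
```

For a 614-point trial, the lowest retained frequency bin sits at the 0.834 Hz resolution limit, matching the bandwidth reported above.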
5.2.4 Statistical analysis
5.2.4.1 Cognitive measures
A univariate ANOVA was run for each cognitive test.
5.2.4.2 Task Partial Least Squares Analysis
Task partial least squares analysis (PLS; McIntosh, Bookstein, Haxby, & Grady, 1996)
was used to assess between- and within-subjects changes in MSE during performance. Task PLS
is a multivariate statistical technique that employs singular value decomposition (SVD) to extract
latent variables (LVs) that capture the maximum covariance between the task design and neural
activity. The data matrix containing participants in each group by MSE values across the 72
brain regions and sampling scales was mean-centered with respect to the column grand average.
SVD was then applied to the matrix to generate mutually orthogonal LVs, with descending order
of magnitude of covariance accounted for. Each LV consisted of: (1) a pattern of design scores,
(2) a singular image showing the distribution across brain regions and sampling scales, (3) a
singular value representing the covariance between the design scores and the singular image
(McIntosh et al., 1996; McIntosh & Lobaugh, 2004).
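The decomposition described above can be sketched as follows. This is a simplified illustration of mean-centred task PLS, not the PLS software actually used; the function name, the input layout (participants stacked by group/condition cell, columns spanning the 72 ROIs × timescales), and the omission of the permutation and bootstrap steps are all simplifying assumptions:

```python
import numpy as np

def task_pls(data, group_sizes):
    """Mean-centred task PLS sketch. Rows of `data` are participants
    (stacked cell by cell), columns are MSE values over ROIs x timescales.
    Returns design scores (u), singular values (s), singular images (vt),
    and per-participant brain scores."""
    # Cell means: one row per group/condition cell.
    bounds = np.cumsum([0] + list(group_sizes))
    cell_means = np.array([data[bounds[i]:bounds[i + 1]].mean(axis=0)
                           for i in range(len(group_sizes))])
    # Mean-centre with respect to the column grand average.
    centred = cell_means - cell_means.mean(axis=0)
    # SVD yields mutually orthogonal LVs in descending order of covariance.
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    # Brain score: dot product of raw data with each singular image.
    brain_scores = data @ vt.T
    return u, s, vt, brain_scores
```

Each column of `brain_scores` indicates how strongly each participant expresses the corresponding latent variable's singular image, which is what permits group-wise confidence intervals on the effects.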
The statistical significance of each LV was determined using permutation tests (McIntosh
& Lobaugh, 2004). An LV was considered significant if its singular value was exceeded in fewer
than 5% of random permutations (i.e., p < .05). The reliability of each statistical effect was
assessed through bootstrap estimation of standard error confidence intervals of the singular
vector weights in each LV (Efron & Tibshirani, 1986). In the present study, this process allowed
for the assessment of the relative contribution of brain regions and timescales to each LV. Brain
regions with a ratio of singular vector weight to standard error greater than 3.0 (corresponding to
a 99% confidence interval) were considered reliable (Sampson, Streissguth, Barr, & Bookstein,
1989). Such effects are therefore designated as “reliably expressed” throughout the results
section. In addition, the dot product of an individual participant’s raw MSE data and the singular
image from the LV produces a brain score. The brain score is similar to a factor score that
indicates how strongly a participant expresses the patterns on the latent variable and allowed us
to estimate 95% confidence intervals for the effects in each group and task condition.
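The permutation logic for assessing an LV's significance can be sketched as a self-contained routine that recomputes the first singular value of the mean-centred group-means matrix under random reassignment of participants. This mirrors the procedure described above but is not the exact software used; the function names and the permutation count are assumptions:

```python
import numpy as np

def first_singular_value(data, group_sizes):
    """Largest singular value of the mean-centred cell-means matrix."""
    bounds = np.cumsum([0] + list(group_sizes))
    means = np.array([data[bounds[i]:bounds[i + 1]].mean(axis=0)
                      for i in range(len(group_sizes))])
    centred = means - means.mean(axis=0)
    return np.linalg.svd(centred, compute_uv=False)[0]

def lv_permutation_p(data, group_sizes, n_perm=500, seed=0):
    """P-value for LV1: the proportion of permutations in which shuffling
    participants across cells yields a singular value at least as large as
    the observed one (significant if p < .05)."""
    rng = np.random.default_rng(seed)
    observed = first_singular_value(data, group_sizes)
    exceed = sum(
        first_singular_value(data[rng.permutation(len(data))],
                             group_sizes) >= observed
        for _ in range(n_perm))
    return exceed / n_perm
```

A bootstrap analogue would instead resample participants within groups with replacement and examine the stability of the singular vector weights, yielding the weight-to-standard-error ratios used to flag reliably expressed regions.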
The large and small deviant conditions were combined into a single condition for all
analyses, as there were no differences in MSE or PSD between these conditions. For the
between-groups comparisons, the grand mean was subtracted over all groups and conditions. For
between-conditions comparisons, the mean was subtracted from each group, rather than across
all groups, thus displaying how condition effects are associated with group membership.
5.3 Results
5.3.1 Task PLS: Multiscale entropy and spectral data
All groups and conditions were entered into the task PLS; Figures 9, 11, and 12 show
both MSE and spectral data.
5.3.1.1 Between-group comparisons
When comparing groups across all conditions (Figure 9; see also Figure 10, which shows sample entropy curves for each timescale, averaged across all conditions), the first latent variable (LV1) of the MSE analysis captured greater sample entropy in the musician group as compared to the Cantonese group (LV1, p = .004, singular value = 1.0856, corresponding to 43.82% of the covariance). This difference was reliably expressed at both fine and coarse timescales across all neural ROIs, particularly in the right hemisphere. The largest effects were seen across
all timescales (particularly, in coarse scales) in the right inferior parietal, angular gyrus, and
primary somatosensory area; medial posterior cingulate; and bilateral primary motor, medial
premotor, precuneus, cuneus, and superior parietal area.
LV1 of the spectral analysis captured differences in the musician group as compared to
the control and Cantonese groups (LV1, p = .012, singular value = 0.1626, corresponding to
37.01% of the covariance). This difference was reliably expressed across frequencies that were
lower than 20 Hz (primarily theta/alpha band: 4-12 Hz) in a number of brain regions similar or
identical to those observed in the MSE results.
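The kind of band-limited spectral measure referenced here (e.g., theta/alpha power below 20 Hz) can be approximated with a simple Hann-windowed periodogram. This sketch is purely illustrative and does not reproduce the thesis's actual spectral pipeline; the sampling rate and band edges are assumptions.

```python
import numpy as np

def band_power(signal, fs, f_lo, f_hi):
    """Mean power spectral density within [f_lo, f_hi] Hz, estimated
    with a Hann-windowed periodogram of the full signal."""
    win = np.hanning(len(signal))
    spec = np.abs(np.fft.rfft(signal * win)) ** 2
    spec /= (win ** 2).sum() * fs  # scale to a density
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return spec[mask].mean()
```

For example, a pure 10 Hz tone yields far more power in a 4-12 Hz (theta/alpha) band than in an 18-30 Hz (beta) band.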
Collectively, PLS analyses revealed that each group could be distinguished based on the
variability (MSE) and spectral details of their EEG (particularly in the right hemisphere) when
listening to speech and music stimuli. Furthermore, in the areas in which these contrasts were
reliably expressed (e.g., right angular gyrus, Figure 10), musicians had the greatest sample
entropy across all conditions; Cantonese speakers had the lowest sample entropy; nonmusicians
were in between these two groups.
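For readers unfamiliar with the measure, the multiscale entropy values compared above are computed by coarse-graining the signal at each timescale and then taking its sample entropy. The following is a simplified sketch of that standard procedure; the tolerance setting is chosen for toy data and is not the parameterization used in the thesis.

```python
import numpy as np

def coarse_grain(x, scale):
    """Average consecutive non-overlapping windows of length `scale`."""
    n = len(x) // scale
    return x[:n * scale].reshape(n, scale).mean(axis=1)

def sample_entropy(x, m=2, r_factor=0.5):
    """SampEn: negative log of the conditional probability that segments
    matching for m points (within tolerance r) also match for m + 1."""
    r = r_factor * np.std(x)

    def match_count(length):
        seg = np.array([x[i:i + length] for i in range(len(x) - m - 1)])
        dist = np.abs(seg[:, None] - seg[None, :]).max(axis=2)
        return (dist <= r).sum() - len(seg)  # drop self-matches

    a, b = match_count(m + 1), match_count(m)
    return -np.log(a / b) if a > 0 else np.inf

def mse_curve(x, max_scale=5):
    """Sample entropy of the signal at successively coarser timescales."""
    return [sample_entropy(coarse_grain(x, s))
            for s in range(1, max_scale + 1)]
```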
Figure 9. First latent variable (LV1), between-groups comparison: Contrasting the EEG response
to the music and speech conditions across measures of multiscale entropy (left) and spectral
power (right). The bar graphs (with standard error bars) depict brain scores that were
significantly expressed across the entire data set as determined by permutation tests at 95% confidence intervals. The image plot highlights the brain regions and timescale or frequency at which a given contrast was most stable; values represent approximate z-scores, and negative values denote
significance for the inverse condition effect.
Figure 10. MSE curves for all groups, averaged across all conditions, at the right angular gyrus.
5.3.1.2 Between-conditions comparisons
LV1 for the MSE analysis (Figure 11) captured differences in sample entropy between
the music and speech conditions for nonmusicians (p = .002, singular value = 0.2518,
corresponding to 22.85% of the covariance). These differences were reliably expressed at fine
timescales in several left hemisphere areas, namely the anterior insula, centrolateral and
dorsomedial prefrontal cortex, frontal polar area, and secondary visual areas. Specifically,
greater information processing capacity for speech, as compared to music, was observed in these
left-hemisphere regions. Differences were also reliably expressed in the right primary and
secondary visual areas, and the cuneus. Namely, greater information processing capacity for
music, rather than speech, was observed in these right-hemisphere regions.
Similarly, LV1 of the spectral analysis captured spectral differences between the music
and speech conditions for nonmusicians (p < .001, singular value = 0.0533, corresponding to
25.28% of the covariance). Processing of music, as compared to speech, was reliably expressed
at frequencies below 10 Hz (e.g., theta, 4-7 Hz for the music condition) in multiple left
hemisphere regions, namely the left anterior insula, claustrum, centrolateral and dorsomedial
prefrontal cortex, frontal polar, parahippocampal cortex, thalamus, and dorsolateral and
ventrolateral premotor cortex. These differences were also expressed in the midline posterior
cingulate cortex, and the right cuneus, thalamus, and ventrolateral prefrontal cortex. Processing
of speech, as compared to music, was reliably expressed in frequencies above 12 Hz (e.g., beta,
12 - 18 Hz; gamma, 25 – 70 Hz for the speech stimuli) in multiple left-hemisphere areas, namely
the left anterior insula, centrolateral and dorsomedial prefrontal cortex, orbitofrontal cortex,
frontal polar, and dorsolateral premotor cortex. These differences were also expressed in the
right primary motor area, precuneus, and the dorsolateral prefrontal cortex.
Figure 11. First latent variable (LV1), between-conditions comparison: Contrasting the EEG
response to the music and speech conditions across measures of multiscale entropy (left) and
spectral power (right) for nonmusicians. The bar graphs (with standard error bars) depict brain
scores that were significantly expressed across the entire data set as determined by permutation
tests at 95% confidence intervals. The image plot highlights the brain regions and timescale or
frequency at which a given contrast was most stable; values represent approximate z-scores, and negative
values denote significance for the inverse condition effect.
LV2 for the MSE analysis (Figure 12) captured differences in sample entropy between
the music and speech conditions for Cantonese speakers (p = .052, singular value = 0.2029,
corresponding to 18.41% of the covariance). Specifically, greater information processing
capacity for music, rather than speech, was reliably expressed in the midline posterior cingulate
and retrosplenial cingulate cortex at fine timescales, and the primary visual area at coarse
timescales. Greater information processing capacity for speech, rather than music, was expressed
in the left medial premotor cortex and right medial premotor cortex at coarse timescales.
Similarly, LV2 of the spectral analysis captured spectral differences between the music
and speech conditions for Cantonese speakers (p = .036, singular value = 0.0382, corresponding
to 18.12% of the covariance). The processing of speech, as compared to music, was reliably
expressed at frequencies below 10 Hz (e.g., theta, 4-7 Hz) in the bilateral medial premotor
cortex. The processing of the music condition, as compared to speech, was reliably expressed in
low-frequency activity (e.g., theta, 4-7 Hz) in the left parahippocampal cortex, and right anterior
insula, ventral temporal cortex, and fusiform gyrus. Processing of music was also reliably
expressed at frequencies above 12 Hz (e.g., beta, 12 - 18 Hz; gamma, 25 – 70 Hz), in the midline
posterior and retrosplenial cingulate cortex, left superior parietal cortex, and bilateral primary
and secondary visual areas.
Figure 12. Second latent variable (LV2), between-conditions comparison: Contrasting the EEG
response to the music and speech conditions across measures of multiscale entropy (left) and
spectral power (right) for Cantonese speakers. The bar graphs (with standard error bars) depict
brain scores that were significantly expressed across the entire data set as determined by
permutation tests at 95% confidence intervals. The image plot highlights the brain regions and
timescale or frequency at which a given contrast was most stable; values represent approximate z-scores, and
negative values denote significance for the inverse condition effect.
The third latent variable (LV3), contrasting the music and speech conditions for the musician group, was not significant (MSE: p = .256; spectral analysis: p = .210). While it is possible that this effect would become significant with a larger sample size, the bootstrap-estimated standard errors were small, suggesting that this lack of an effect was reliable (i.e., a stable-zero estimate; see McIntosh & Lobaugh, 2004). The fact that musicians did not distinguish between music and speech stimuli is important because it suggests that this group used a similar neural architecture to process acoustic information, regardless of the stimulus domain (i.e., music ≈ speech).
Collectively, the between-condition analyses revealed that each group processed the
distinction between music and speech using a unique spatiotemporal network. LV1 showed that
nonmusicians had greater sample entropy and higher-frequency activity for speech than for music in several left-hemisphere areas. LV2 showed that Cantonese speakers had greater sample entropy for music than for speech, particularly in midline regions. The spectral analyses revealed that this
contrast was also expressed across multiple frequency bands. LV3, which was not significant,
suggested that musicians used similar neural networks to support the processing of both music
and speech stimuli. Alternatively, the passive paradigm and relatively simple auditory stimuli
might not have engaged different networks for pitch and timbre in musicians. Measuring BSV
during an active task in future studies would help eliminate this possibility.
5.4 Discussion
5.4.1 MSE data
By examining sample entropy between groups, this study sought to test if musicians and
tone-language (Cantonese) speakers have similar information processing capacity supporting
music and speech listening via distinct neural networks. Between groups, we found that
musicians had greater BSV (i.e., information processing capacity) than nonmusicians when
listening to both music and speech stimuli. Cantonese speakers had the lowest entropy of all
three groups for both stimulus conditions. Though this pattern of results was evident across
multiple neural regions and timescales, it was particularly prominent in right hemisphere regions
at coarse timescales. These data support the hypothesis that musicianship and the use of a tone
language are differentially associated with the information processing capacity supporting both
music (pitch) and speech (timbre) processing. This differential association may be due to
musicians’ extensive experience with pitch as compared to Cantonese speakers, as well as their pre-existing differences that distinguish them from nonmusicians (e.g., SES; musical aptitude;
personality; genetic factors, see Chapter 1.4 and Schellenberg, 2015 for a discussion). Future
studies could examine how each of these factors is related to the BSV supporting pitch
processing in musicians, Cantonese speakers, and controls, to better understand the contributions
of nature and nurture to BSV.
The finding that musicians’ increased BSV was most prominent in the right hemisphere
corroborates the finding that this hemisphere is engaged in fine spectral features of auditory
input, as compared to the left hemisphere, which is more specialized for temporal processing (see
Zatorre et al. 2002 for a review). Similarly, expression in coarse timescales suggests that the
information processing capacity of pitch and timbre is distributed across the brain, rather than
locally based (Vakorin et al. 2011). Collectively, our findings indicate that musicians’ processing
of fine spectral features – both for pitch and timbre – is likely supported by a wider network than
in Cantonese speakers and English-speaking nonmusicians. These data are consistent with other
evidence indicating that music training is associated with a wide range of benefits in spectral
processing (e.g., Bidelman & Krishnan 2010; Chandrasekaran et al. 2009; Parbery-Clark, Skoe,
Lam et al. 2009; Parbery-Clark et al. 2013; Schoen et al. 2004; Zendel & Alain 2012). These
effects may be due to the proliferation of brain networks supporting auditory processing in
musicians, as shaped by pre-existing differences and musical training.
In the between-conditions results, we found that each group engaged unique
spatiotemporal distributions to process the differences between music and speech. Nonmusicians
had greater information processing capacity for speech than music (Figure 11). This difference
was expressed primarily in several left hemisphere areas at fine timescales. The lateralization of
this result is consistent with reports that in musically naïve listeners, speech processing is more
left-lateralized than music, given the left hemisphere’s specialization for temporal processing
(see Zatorre et al. 2002 for a review). These findings also suggest that nonmusicians may have
greater, locally-based information processing capacity for speech, as compared to music (see
Vakorin et al. 2011). This group’s processing of music was right-lateralized, aligning with
evidence for right-hemisphere specialization for spectral processing (Zatorre et al., 2002).
Cantonese speakers had greater sample entropy for music as compared to speech (Figure
12). This distinction was expressed primarily in the midline posterior cingulate and retrosplenial
cingulate cortex at fine timescales. This finding suggests that Cantonese speakers’ use of lexical
pitch may manifest as greater sample entropy for this cue, as compared to timbre. This finding aligns with the idea that the more familiar one is with a stimulus, the greater the sample entropy associated with processing that stimulus (e.g., familiar versus unfamiliar faces; Heisz et al.,
2012). Finally, BSV in musicians did not distinguish between processing music and speech
sounds, which implies that the spectral acuity associated with extensive training in music is
associated with enhanced information processing capacity that supports both pitch and timbral
cues. Collectively, our data demonstrate that each group processes the distinction between music
and speech using a different spatiotemporal network. Furthermore, the activation patterns for
each group suggest a gradient of pitch processing capacity, which is consistent with the proposal
that the more experience one has with pitch (i.e., musicians > Cantonese > nonmusicians), the
greater sample entropy associated with processing this cue. Namely, nonmusicians had greater
sample entropy for speech as compared to music; Cantonese speakers had greater sample entropy for music than speech; musicians had similar levels of sample entropy for both
conditions. An analogous gradient was observed in behavioural data for a pitch memory task in
Bidelman et al. (2013). This gradient effect suggests that demands of each type of auditory
experience are associated with differential information processing capacities (von Stein &
Sarnthein, 2000).
5.4.2 Comparing MSE results to the spectral analysis results
The MSE analyses yielded some unique information that was not obtained in the spectral
analyses, as well as data that were complementary to the spectral analysis. Between-group
comparisons of sample entropy revealed that musicians had greater brain signal complexity than
tone-language speakers across all conditions. In contrast, spectral analyses revealed that
musicians’ processing of all conditions drew more heavily upon low, theta/alpha (4-12 Hz)
frequencies than the other groups. Low frequencies of the EEG have traditionally been
interpreted as reflecting long-range neural integration (von Stein & Sarnthein, 2000). Both the
MSE and spectral results were also observed in similar brain regions. Collectively, both types of
analyses suggest long-range and more “global” processing of auditory stimuli among musicians
compared to tone-language speakers or nonmusicians. However, unlike the spectral results, the
MSE data speak to the information processing capacity of the underlying networks.
This global processing aligns with multiple neuroimaging findings in which musicians
had increased inter-hemispheric communication as compared to nonmusicians. For example,
musicians – relative to nonmusicians – have a larger anterior corpus callosum, which supports such inter-hemispheric communication by connecting premotor, supplementary motor, and motor cortices (Schlaug et al., 1995). Numerous studies have since
found differences in the corpus callosum between musicians and nonmusicians (Hyde et al.,
2009; Schlaug et al., 2005; Schlaug et al., 2009; Schmithorst & Wilke, 2002; Steele et al., 2013),
particularly in regions connecting motor areas (Schlaug et al. 2005; Schlaug et al., 2009). These
differences may be honed by the bimanual coordination related to playing an instrument (Moore
et al., 2014), or by pre-existing differences that distinguish musicians from nonmusicians (see
Schellenberg, 2015 for a discussion).
Between-condition comparison of sample entropy revealed that each group showed
unique spatiotemporal distributions in their response to processing music and speech.
Nonmusicians had greater information processing capacity for speech than music at fine timescales in
several left hemisphere areas (e.g., anterior insula, centrolateral and dorsomedial prefrontal
cortex, frontal polar area). The spectral data revealed beta and gamma frequency activity when
processing speech (as compared to music) in similar neural regions as found in the MSE
analysis. High-frequency activity has been associated with local perceptual processing (von Stein
& Sarnthein, 2000), and is in accordance with the fine-timescale (i.e., local) activation observed
in our MSE analysis (Vakorin et al., 2011).
For the music condition, the spectral data from nonmusicians differed from the MSE
analyses. Specifically, low-frequency (theta) activation was associated with music processing in
many of the same regions that expressed higher frequencies when processing speech. These data
suggest that nonmusicians may utilize longer-range neural integration to process music (von
Stein & Sarnthein, 2000). However, this difference was not reflected in the MSE analysis (i.e.,
no increase in sample entropy at coarse timescales for the music condition), which implies that
nonmusicians do not have increased information processing capacity for music, relative to
speech. This interpretation is plausible because nonmusicians have experience casually listening
to music, yet they do not have the pitch-processing experience possessed by musicians – or, to a
lesser extent, Cantonese speakers.
In the MSE results for the Cantonese speakers, there was greater sample entropy for
music as compared to speech – a difference that was primarily expressed at fine timescales in
midline regions. Similarly, the spectral data showed that processing of music, as compared to
speech, was associated with beta and gamma frequencies in similar neural regions as in the MSE
results. Both the fine timescale and high-frequency activity suggest that the processing of music
versus speech in Cantonese speakers relies on locally rather than globally distributed networks
(Vakorin et al., 2011; von Stein & Sarnthein, 2000). There was also low-frequency (i.e., theta)
activation associated with processing music, particularly in several right hemisphere areas (e.g.,
anterior insula, ventral temporal cortex, and fusiform gyrus), and with processing speech in the
bilateral medial premotor cortex. This low-frequency activity suggests that Cantonese speakers utilize
long-range neural integration to process music and speech (von Stein & Sarnthein, 2000). This
finding is not consistent with either the locally based complexity observed in the MSE data or the local, high-frequency activity observed in the spectral data. Future
research could seek to clarify the global versus local nature of neural networks that support
music and speech processing in Cantonese speakers.
5.4.3 Comparisons to event-related potential findings (Chapter 3)
In Chapter 3, the MMN response was measured in the same three participant groups as an
index of early, automatic cortical discrimination of music and speech sounds. In that analysis,
only musicians showed an enhanced MMN response to both music and speech, which is
consistent with the current between-group effects observed in the present chapter. That is,
compared to Cantonese speakers and controls, musicians showed greater automatic processing
(Chapter 3; Hutka, Bidelman, & Moreno, 2015) and information processing capacity (present
study) used in the processing of both music and speech.
However, in Chapter 3, no differences were observed for any group in MMN amplitude
to music or speech stimuli. In contrast, between-condition differences in both sample entropy and spectral characteristics were observed in controls and Cantonese speakers. Furthermore, each group had a
unique spatiotemporal distribution in response to music and speech. Despite having lower
sample entropy than musicians or nonmusicians across all conditions, Cantonese speakers
showed greater sample entropy for music as compared to speech. These data suggest that
Cantonese speakers have larger information processing capacity for pitch than timbre. In
contrast, MMNs did not reveal a difference in automatic processing of music versus speech in
the Cantonese group (Chapter 3; Hutka et al. 2015). The differences between the MMN findings
and the present results suggest that the nonlinear analyses provided additional, more fine-grained
information about between-condition differences (see Chapter 4 and Hutka et al. 2013 for a
discussion). That is, the averaging conducted to increase the signal-to-noise ratio in ERP
analyses may eliminate important signal variability that carries information about brain
functioning (Chapter 4; Hutka et al. 2013).
5.5 Limitations
It is important to note that further research is required to delineate specific differences in
the nature of the underlying neural activity between groups and conditions. Between-group
differences in pitch processing demands may relate to the spatiotemporal differences observed at
present (i.e., musicians > Cantonese > controls), as well as pre-existing group differences (see
Section 5.4.1 for a discussion). Furthermore, it is plausible that musicians and Cantonese
speakers’ precise use of pitch is more similar to one another than Cantonese speakers’ and controls’ use of
pitch, explaining the observed between-conditions results. To specifically relate neural activity to
pitch processing demands, these demands would need to be precisely quantified. Future studies
could accomplish this by training naïve participants to distinguish between different numbers of
lexical tones (e.g., one group learns three tones, another learns four tones, etc.), and then
measuring BSV when processing learned (versus unlearned) tones.
5.6 Conclusions
The present data suggest that the use of pitch for musicians relative to tone-language
speakers is associated with different information processing capacities. Furthermore, each
group’s pitch processing was associated with a unique spatiotemporal distribution, suggesting
that musicianship and tone language do not share processing resources for pitch, but instead, use
different networks. These data also serve as a proof-of-concept of the theoretical premise
outlined in Chapter 4 (see Hutka et al., 2013), namely how applying a nonlinear approach to the
study of the music-language association can advance our knowledge of each domain, as well as
the role of experience-dependent plasticity and pre-existing differences.
86
The present chapter represented the conclusion of investigating how speaking a tone
language confers benefits to perceptual processes (i.e., spectral acuity), and how tone-language
speakers are similar to and different from musicians. As discussed in Chapter 1, the extent to
which using a tone language may confer benefits to executive function is still unknown. The
investigation of how speaking a tone language impacts executive function is thus the focus of
Chapter 6.
Chapter 6 Tone Language, Musicianship, and Executive Function
6.1 Introduction
6.1.1 Is tone-language experience associated with enhancement in executive function, as is musicianship?
The previous studies in this thesis examined associations of tone-language experience with
spectral acuity, especially as compared with musicians. Convergent evidence from behavioural
data as well as linear and nonlinear dependencies in neuroimaging data revealed some positive
associations between auditory processing and tone-language experience, as was the case for
musicians. Musicians and tone-language speakers appear to have similar pitch discrimination
abilities, as revealed in some behavioural tasks, despite musicians showing larger automatic
responses to pitch changes than tone-language speakers and nonmusicians who do not use a tone
language (Chapter 3). There were suggestions, moreover, of a gradient of pitch processing at the
neural level such that the more extensive experience one had with pitch processing, the greater
BSV observed during pitch processing (Chapter 5).
Because of claims that musicianship enhances skills beyond the perceptual realm such as
auditory working memory (Pallesen et al., 2010; Parbery-Clark, Skoe, & Kraus, 2009), verbal
and visual memory (e.g., George & Coch, 2011), response inhibition (Moreno et al., 2011),
verbal fluency, processing speed, and task switching (Zuk et al., 2014), it is reasonable to ask
whether tone-language experience is linked to comparable enhancements. Given that
musicianship and tone-language experience have comparable links to some auditory processes
(e.g., pitch processing), it is of interest to examine whether these processes extend to and are
modulated by executive function.
6.1.2 Cognitive benefits in balanced, tone-language bilinguals
The tone-language speakers in the present study, as well as in previous chapters, were
proficient in two languages. Note, however, that nearly 86% of the musicians and 75% of the
nonmusicians had some experience with a second language (see Section 6.3.1). The effects of
bilingualism on cognitive processing, and executive control in particular, have been studied
extensively (see Bialystok, Craik, & Luk, 2012 for a review). Many studies have focused on the
hypothesized bilingual advantage in inhibitory control (see Hilchey & Klein, 2011 for a review)
and working memory (e.g., Bialystok & Feng, 2010; Engel de Abreu, 2011; Morales, Calvo, &
Bialystok, 2013). For example, Morales, Calvo, and Bialystok (2013) showed that bilingual
children performed better than monolingual children regardless of working memory load;
furthermore, bilingual children excelled relative to monolingual children when tasks made
additional demands on executive function. However, other studies did not observe working
memory differences between bilingual and monolingual children (Bialystok & Feng, 2010; Engel de Abreu, 2011). Morales et al. (2013) posited, however, that these studies did not detect this advantage because both required verbal processing (Bialystok & Feng, 2010: recalling lists of words; Engel de Abreu, 2011: tasks involving words and digits) – a skill on which bilingual children tend to perform more poorly than monolinguals (e.g., Gollan, Montoya, & Werner, 2002; Portocarrero, Burright, & Donovik, 2007; Rosselli et al., 2000).
Unlike these two studies that involved verbal processing, the tasks used in Morales et al. (2013)
had low verbal requirements, reducing the possibility that verbal processing confounded their
results.
Notably, there does not appear to be a difference in cognitive abilities between bilinguals
who speak a tone language versus a non-tone language (task switching: Barac & Bialystok,
2012; receptive vocabulary: Bialystok, Luk, Peets, & Yang, 2010). More generally, others have
questioned the positive association between bilingualism and cognitive benefits, based on a
failure to find such associations (Paap et al., 2014) and a potential publication bias favoring
enhancement evidence (de Bruin et al., 2015). Based on this literature, one might predict that
balanced bilinguals who speak a tone language (e.g., Cantonese-English bilinguals) would
perform comparably to non-tone-language-speaking, balanced bilinguals and their non-tone-
language monolingual counterparts on working memory measures (Barac & Bialystok, 2012;
Bialystok, Luk, Peets, & Yang, 2010).
6.1.3 Working memory in tone-language speakers and musicians
If one examines the links between the use of lexical tone and enhanced working memory,
there is a possibility that tone-language speakers might show a working memory advantage over
non-tone-language speakers, regardless of bilingual status. Specifically, tone-language use
involves relative pitch processing (Xu, 1997, 1999; Xu & Wang, 2001), which recruits working
memory. For example, studies of absolute pitch (AP) possessors versus relative pitch (RP) possessors have found
that only RP processing is associated with a neural component (i.e., P300) that indexes working
memory (Klein, Coles, & Donchin, 1984, Wayman et al., 1992). Studies using positron emission
tomography and functional magnetic resonance imaging have found that areas involved in
monitoring pitch information are also more active during RP processing than during AP
processing in musicians (Zatorre et al., 1998). The involvement of RP and pitch monitoring in
using pitch to distinguish lexical meaning raises the possibility that tone-language speakers could
show improvements in auditory working memory as compared to non-tone-language speakers.
Such an improvement may be rooted in top-down modulation of perceptual benefits
(Moreno & Bidelman, 2014). Moreno and Bidelman (2014) argue that general enhancements in
executive function confer perceptual benefits in musicians (i.e., a top-down influence),
regardless of whether these executive functions are specific to the auditory domain. As Moreno
and Bidelman (2014) note, the concept of top-down, executive-level regulation of sensory
processes has received support from nonhuman (Fritz, Shamma, Elhilali, & Klein, 2003) as well
as human studies (Myers & Swan, 2012), such that increased feedback from prefrontal and
parietal regions enhances or inhibits activity in stimulus-selective sensory cortices. Thus, top-
down executive-level mechanisms may modulate lower-level benefits in musicians. By the same
logic, if speaking a tone language benefits working memory, then working memory may
modulate perceptual processing in this group.
There is some evidence that musicians show enhancement in certain types of working
memory. Musicians have shown enhanced verbal (auditory) but not visual working memory
when compared with nonmusicians (Brandler & Rammsayer, 2003; Chan et al., 1998; Ho et al.,
2003; Parbery-Clark, Skoe, Lam et al., 2009; Strait et al., 2010; Tierney et al., 2008). This
evidence suggests that the benefits of musicianship to working memory are stronger in auditory
than in non-auditory domains (Moreno & Bidelman, 2014). However, there are also findings of
modality-independent memory enhancement in musicians (Bidelman, Hutka, et al., 2013; George
& Coch, 2011; Jakobson et al., 2008). Furthermore, the claim that music training confers non-
musical, cognitive benefits has been criticized. For example, studies testing this association often
fail to take into account that individuals with high full-scale IQ (FSIQ) are more likely than
others to take music lessons and do well on any test administered to them (Schellenberg, 2011a).
These conflicting results indicate the need to examine the contribution of working memory to
lower-level processes in musicians as well as to include a measure of intelligence while doing so.
6.1.4 The present investigation
The present study examined how visual working memory¹⁸ is related to pitch memory
and to pitch discrimination (i.e., top-down modulation from outside the auditory domain) and
how pitch memory is related to pitch discrimination (i.e., top-down modulation from within the
auditory domain). These associations were tested in musicians, tone language speakers, and
controls (nonmusician, non-tone-language speakers). This study examined if tone-language users
exhibit enhanced visual working memory relative to non-tone-language users. If speaking a tone
language and musicianship are associated with enhanced visual working memory, then these
groups should perform better than controls on a working memory task. Note that one would
predict that tone-language speakers and musicians would not show an identical benefit over
controls. As mentioned earlier, musicians have far greater demands on their auditory system, as
well as possible pre-existing differences related to auditory processing benefits (e.g., Macnamara
et al., 2014) that may distinguish their performance on auditory tasks from that of tone-language
speakers. Furthermore, musicians regularly engage cognitive functions as part of their discipline
(e.g., auditory working memory; multitasking) due to the demands of playing music at an
advanced level (e.g., memorizing lengthy pieces, attending to auditory and visual cues from
other musicians while performing in an ensemble), which is not the case for tone-language
speakers.
To test this hypothesis, musicians, tone-language speakers, and controls were given a
two-back visual working-memory task. Participants also completed the F0 difference-limen task
from Chapter 3 and a short-term pitch memory task from Bidelman, Hutka, et al. (2013). These
tasks were used to probe how working memory outside the auditory domain (i.e., two-back
performance) is associated with an auditory difference limen task as well as a more cognitively-
demanding auditory task (pitch memory).
Performance on these two auditory tasks was also examined after holding visual working-
memory performance constant. If the lower-level auditory enhancements of musicianship and/or
18 Note that the phrase “visual working memory” is used to emphasize that there were no auditory stimuli presented to participants in this particular task. However, “verbal working memory” also accurately describes the present task, as participants hold and update information in their phonological loop, as per Baddeley and Hitch’s (1974) model of working memory.
speaking a tone language are driven by top-down modulation, then any benefits to pitch
discrimination or pitch memory observed in musicians and tone-language speakers should be
reduced after controlling for visual working memory. The Wechsler Abbreviated Scale of
Intelligence – Second Edition (Wechsler & Hsiao-Pin, 2011) was used to ensure there were no
group differences in fluid intelligence.
The present study examined tone-language experience from multiple languages rather
than Cantonese only as in much previous research (Bidelman, Hutka, et al., 2013; Chapter 3).
The rationale for the inclusion of multiple languages was that the performance of musicians,
Cantonese speakers, and nonmusician, non-Cantonese controls on pitch-difference limens and
pitch-memory tests is well-established. For example, the F0 DL results from Bidelman, Hutka, et
al. (2013) were replicated in Chapter 3. The inclusion of other tone languages made it possible to
examine whether specific tone languages have differential effects on auditory processing.
6.2 Methods
6.2.1 Participants
Sixty individuals participated in this study. Participants were recruited from the
Royal Conservatory of Music, the University of Toronto, and the Greater Toronto Area. Each
participant completed questionnaires to assess their language and musical background (same as
in Chapter 3). English-speaking musicians (n = 21, 10 female; age: M = 24.19 years, SD = 3.12
years) were amateur instrumentalists with at least 8 years of continuous training in Western
classical music on their primary instrument (M = 14.91 years, SD = 3.21), beginning in
childhood (M = 8.19 years, SD = 3.20). All musicians had formal private or group lessons within
the past five years and currently played their instrument(s).
English-speaking nonmusicians (n = 20, 15 female; age: M = 21.45 years, SD = 2.89) had
≤ 3 years of formal music training (M = 0.75 years, SD = 0.85) and had not received formal
instruction within the past five years. Both musicians and nonmusicians had some experience
with a non-tone second language (musicians: 85.71%, nonmusicians: 75.00%; mainly French or
Spanish), but were classified as late L2 learners and/or had moderate to high levels of
proficiency in their second language. Proficiency ratings were based on participant responses on
a scale with seven options, ranging from “very poor” to “fluent”. Specifically, of the 18
musicians who had some L2 experience, participants rated their L2 proficiency as follows: n = 2
as fluent; n = 5 as very good; n = 3 as good; n = 1 as moderate, and n = 7 as fair. For the 15
nonmusicians with some L2 experience, the following ratings were obtained: n = 4 as fluent; n =
1 as very good; n = 4 as good; n = 3 as moderate, and n = 3 as fair. Participants who rated their
L2 proficiency as “poor” or “very poor” were considered to have no L2 proficiency.
Tone-language speakers (n = 19; 12 female; age: M = 23.84 years, SD = 4.09) were late
bilinguals, having begun formal instruction in English after a mean age of 12.74 years (SD =
4.58). This group consisted of nine Cantonese speakers, eight Mandarin speakers, one Thai
speaker, and one Vietnamese speaker. All participants were born and raised in predominantly
tone-language-speaking countries (e.g., China, Thailand, Vietnam), and reported using their
native tone language on a regular basis (M = 38.89% of daily use, SD = 19.34%). As with
nonmusicians, tone-language speakers had minimal musical training (M = 0.42 years, SD = 0.84)
and no formal instruction in the past five years. Importantly, nonmusicians and tone-language
speakers did not differ in amount of music training, F(1, 37) = 1.479, p = 0.232, η2p = .038. All
participants were right-handed. Despite attempts to match participants’ years of education, there
was a significant group difference on this metric, F(2, 57) = 6.583, p = .003, η2p = .188.
Specifically, musicians had more formal education (M = 17.48 years, SD = 2.21) than the other
two groups (tone-language speakers: M = 15.90, SD = 1.49, p = .045; nonmusicians: M = 15.30, SD = 2.16, p = .003); the latter two groups did not differ from each other, p > .05. We posit that the additional
years of education musicians received relative to the other two groups was due to music-specific
training, during which few or no non-music courses would have been a part of the participants’
curriculum. Nearly all musicians were pursuing a first or second post-secondary music degree
via the Royal Conservatory of Music, during which performance on one’s instrument is the
primary focus of education. Thus, any attenuation of significant effects was likely due to
increased years of musical training in musicians.19 All participants provided written, informed
consent in compliance with an experimental protocol approved by the Baycrest Centre Research
Ethics Committee. All were provided financial compensation for their time.
19 Task performance was also analyzed with years of education partialled out. We conducted an analysis of covariance (ANCOVA) on each dependent variable, with years of education as the covariate. Though musicians had approximately two more years of education than the other groups, the pattern of results did not change when controlling for years of education, with the exception of pitch-memory reaction time. Namely, the significant group effect became marginal when controlling for years of education, F(2, 56) = 2.840, p = .067, η2p = .092 (driven by musicians being marginally faster than tone-language speakers, p = .067).
6.2.2 Measures
6.2.2.1 F0 DL task
See Section 3.2.3 for description of F0 DL task (see Figure 13A for a schematic
illustration).
Figure 13. A: Fundamental frequency difference limen task. B: Pitch memory task. C: Visual
two-back task.
6.2.2.2 Pitch memory task
The short-term pitch-memory task used in the present study was the same as that used in
Bidelman, Hutka, et al. (2013) (see Figure 13B for a schematic illustration). The task was
designed to test the relationship between musical and nonmusical cognitive abilities (cf. Russo et
al., 2012; Steinke et al., 1997). The task assessed short-term memory of pitch sequences. On
each trial, participants heard a four-note melody (350-ms complex tones). Following a 1.5-s
silent interval, participants were asked to judge as quickly as possible whether or not a probe
tone had been heard in the preceding sequence.
Individual pitches were selected randomly from the Western chromatic scale. Random
selection ensured that melodies were tonally ambiguous, thereby minimizing the chance that
sequences could be recalled based on musical labels (e.g., musical solfège: Do, Re, Mi, etc.).
Participants heard 50 trials during the course of a run, half of which were catch trials in which
the probe tone did not occur in the melody. Sensitivity (d') was computed using rates of hits (H) and false alarms (FA) for each run (i.e., d' = z(H) - z(FA), where z represents the z-transform).
Individual d' values were then averaged for two consecutive runs to yield a pitch-memory score.
Reaction time for correctly identified trials was calculated as the time lag between stimulus
offset and listeners’ response.
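As an illustration of the sensitivity computation above, the following sketch converts raw trial counts to d'. The trial counts are hypothetical, and the log-linear correction for hit or false-alarm rates of exactly 0 or 1 is an added assumption not described in the original task.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' = z(H) - z(FA), computed from raw trial counts.

    A log-linear correction (0.5 added to each cell) is applied so that
    hit or false-alarm rates of exactly 0 or 1 do not yield infinite
    z-scores; this correction is an added assumption, not part of the
    original task description.
    """
    h = (hits + 0.5) / (hits + misses + 1.0)                    # corrected hit rate
    fa = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf                                    # z-transform (inverse normal CDF)
    return z(h) - z(fa)

# Hypothetical run: 25 probe-present trials and 25 catch trials
print(round(d_prime(hits=22, misses=3, false_alarms=4, correct_rejections=21), 3))
```

In the study, d' values from two consecutive runs were then averaged to yield each participant's pitch-memory score.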
6.2.2.3 Two-back task
A visual two-back task (Esopenko et al., 2013) was administered to probe working
memory beyond the auditory domain (see Figure 13C for a schematic illustration). In each block,
participants were presented with numbers (1 through 8) on a computer screen. The numbers were
presented one at a time in the centre of a computer monitor, in a continuous sequence, with an
ISI of 500 ms. Presentation durations ranged from 800 ms to 975 ms, in 25-ms increments (i.e., to avoid predictability of stimulus presentation). Each block consisted of 72
stimuli, 24 of which were targets (i.e., the same stimulus presented two numbers earlier), and one
run consisted of 3 blocks. Participants were instructed to press a button on a keyboard as soon as
they detected a target. Number of correct and incorrect responses and reaction time were
recorded.
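A minimal sketch of the target definition and scoring just described (a target is any stimulus matching the one presented two items earlier); the digit sequence below is hypothetical:

```python
def two_back_targets(sequence):
    """Indices of targets: positions whose stimulus matches the one
    presented two items earlier."""
    return [i for i in range(2, len(sequence)) if sequence[i] == sequence[i - 2]]

def score_two_back(sequence, response_positions):
    """Count hits and false alarms, given the stimulus sequence and the
    set of positions at which the participant pressed the button."""
    targets = set(two_back_targets(sequence))
    hits = len(targets & response_positions)
    false_alarms = len(response_positions - targets)
    return hits, false_alarms

# Hypothetical sequence of digits 1-8, as in the task described above
seq = [3, 5, 3, 7, 3, 7, 2, 7]
print(two_back_targets(seq))   # indices 2, 4, 5, and 7 are targets
```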
6.2.2.4 Wechsler Abbreviated Scale of Intelligence - Second Edition
Two subtests (Block Design, and Matrix Reasoning) from the Wechsler Abbreviated
Scale of Intelligence - Second Edition (WASI-II; Wechsler & Hsiao-Pin, 2011) were
administered in a standardized order. In the Block Design subtest, participants were asked to
construct designs from red and white blocks to match the design of a target picture. In the Matrix
Reasoning subtest, participants saw a matrix of coloured drawings on each trial. One section of
the matrix was missing, and the participant was asked to select one of five options to complete
the missing section.
Standardized T scores (M = 50, SD = 10) were provided for each subtest. Each score was
based on norms from a large sample of American adults, calibrated separately for age.
Composite T scores from the Block Design and Matrix Reasoning subtests yielded a metric
called Performance IQ (PIQ), which was calculated for each participant. Note that FSIQ was not
measured because it requires T scores from a verbal subtest (Vocabulary) and Matrix Reasoning.
PIQ, consisting solely of non-verbal subtests, was considered appropriate for individuals whether
or not their first language was English. Verbal measures in a non-native language may depress
the scores of non-native relative to native speakers (e.g., Kaufman Brief Intelligence Test,
Schellenberg, 2011b).
6.2.3 Procedure
Participants completed the WASI-II, two runs of the F0 DL task, two runs of the pitch-
memory task, and three runs of the two-back task. The two auditory tasks were run on MATLAB
v2009b, and the two-back task was run on Presentation software (version 17), both programs
running under Windows 7. The order of tasks was counterbalanced.20 The WASI-II subtests
were always presented in the same order (Block Design followed by Matrix Reasoning).
Auditory stimuli were delivered binaurally via over-the-ear headphones (Beyerdynamic DT 770 PRO, Heilbronn, Germany). The session lasted approximately one hour.
6.2.3.1 Statistical analysis
Data for all blocks of a given task were averaged. Prior to statistical analyses, F0 values were square-root transformed to
satisfy normality and homogeneity of variance assumptions required for parametric statistics.
Note that when a univariate ANOVA was conducted on the raw F0 DL data (rather than the
transformed data), the pattern of results remained the same (i.e., significant results remained
significant, and non-significant results remained non-significant). However, the bar graph
displaying the F0 DL results (Figure 14) shows the raw F0 DL means and standard error bars, as
these values are easier to interpret than the square root values. For pitch memory, d’ and mean
reaction time were analyzed. For the two-back task, accuracy and reaction time were analyzed.
For the WASI-II, there was no significant between-groups difference in performance on the
Block Design or Matrix Reasoning subtests (i.e., the two subtests that comprise the PIQ metric).
Thus, only PIQ is reported in subsequent analyses. Between-groups differences were examined
via separate univariate ANOVAs for each task, with group as the between-subjects factor. Tukey’s correction for multiple comparisons was reported
for pairwise comparisons.
20 The session comprised the WASI-II, two blocks each of the F0 DL task and the pitch-memory task, and three blocks of the two-back task (i.e., eight items in total). The eight items were arranged according to a Latin square design (i.e., an eight x eight matrix; one row = order of tasks for one participant). If two blocks of the same auditory task fell in consecutive order, they were rearranged to keep the session as engaging as possible. The order for the first eight participants was repeated for each subsequent set of eight participants.
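The Latin square counterbalancing described in this footnote can be sketched with a standard cyclic construction; the construction and the task labels below are illustrative assumptions, as the thesis does not specify how its square was generated.

```python
def cyclic_latin_square(items):
    """Build an n x n Latin square by cyclic rotation: row r is the item
    list shifted by r, so each item appears exactly once per row and
    once per column. Row r gives the task order for participant r."""
    n = len(items)
    return [[items[(r + c) % n] for c in range(n)] for r in range(n)]

# Hypothetical labels for the eight session items
tasks = ["WASI", "F0DL-1", "F0DL-2", "PM-1", "PM-2", "2B-1", "2B-2", "2B-3"]
for row in cyclic_latin_square(tasks)[:2]:
    print(row)
```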
Analyses of covariance (ANCOVAs) were also conducted to test for top-down influences
on lower-level task performance. For pitch memory d’, two-back accuracy was held constant.
For the F0 DL data, two-back task accuracy was held constant. For the ANCOVAs as well as the
correlation analyses (described subsequently), only d’ for pitch memory and accuracy for the
two-back task were included (i.e., no reaction time data). This was done because it is difficult to
interpret the meaning of reaction time performance in isolation of accuracy measures. For
example, it is unclear whether a group with slow reaction time and high accuracy might perform
worse if pressed to give a faster response; similarly, the inverse is possible. The Bonferroni
correction for multiple comparisons was reported for pairwise comparisons. For all ANCOVAs,
the assumption of homogeneity of regression slopes was not violated. Furthermore, the covariate
was not significantly related to either dependent variable.
Correlations were also computed between F0 DL, d’, two-back accuracy, and PIQ, both across
all groups, and for each group.
6.3 Results
6.3.1 Correlations
Correlations between tasks across all groups (Table 6) were significant between the F0 DL task and d’; d’ and 2-back accuracy; and 2-back accuracy and PIQ. To more closely examine the association between tasks within groups, correlations between tasks for each group are reported (Table 7). The correlation between F0 DL and d’ was significant only in musicians. The correlation between 2-back accuracy and PIQ was significant only in nonmusicians.
Table 6.
Correlations between Tasks Across All Groups.

                      1        2        3
1 F0 DL
2 Pitch memory: d'   -.381**
3 2-back: Accuracy   -0.152    .277*
4 PIQ                -0.212    0.191    .282*

Note. ** p ≤ .01. * p ≤ .05.
Table 7.
Correlations between Tasks, Displayed by Group.

M
                      1        2        3
1 F0 DL
2 Pitch memory: d'   -.647**
3 2-back: Accuracy    0.145    0.087
4 PIQ                -0.084   -0.145   -0.009

TL
                      1        2        3
1 F0 DL
2 Pitch memory: d'   -0.446
3 2-back: Accuracy   -0.070    0.256
4 PIQ                -0.200    0.425    0.263

NM
                      1        2        3
1 F0 DL
2 Pitch memory: d'   -0.366
3 2-back: Accuracy   -0.151    0.303
4 PIQ                -0.194    0.285    .519*

Note. M = musicians; TL = tone language speakers; NM = nonmusicians. ** p ≤ .01. * p ≤ .05.
6.3.2 F0 DL
There was a significant between-groups difference in F0 DL performance [F(2, 57) =
8.072, p = .001, η2p = .221] such that musicians had lower pitch discrimination thresholds (i.e.,
better performance) than nonmusicians (p = .001; Figure 14). Musicians performed comparably
to tone-language speakers (p = .116); tone-language speakers performed comparably to
nonmusicians (p = .143).
Figure 14. Performance on the fundamental frequency (F0) difference limen task. Musicians (M)
showed superior pitch discrimination performance relative to nonmusician (NM) controls. **p ≤
.001. Error bars indicate SE.
6.3.3 Pitch memory
There was a significant between-groups difference in performance measured by d’ [F(2,
57) = 32.866, p < .001, η2p = .536]. Namely, musicians had higher d’ scores (i.e., better
performance) than both tone-language speakers (p < .001) and nonmusicians (p < .001; Figure
15A). Tone-language speakers marginally outperformed nonmusicians, p = .066. There was also
a significant between-groups difference in pitch-memory reaction time [F(2, 57) = 3.198, p =
.048, η2p = .101]. Musicians had faster reaction times than tone-language speakers (p = .038;
Figure 15B). There was no significant difference between the reaction times of musicians and
nonmusicians (p = .528) or between tone-language speakers and nonmusicians (p = .329).
Figure 15. Between-group performance on the pitch memory task (A: d’ data; B: reaction time
data). A gradient in d’ performance is visible, such that M > TL > NM. M perform faster than
TL. There appears to be a speed-accuracy trade-off in TL, such that good performance is
accompanied by slower reaction times. **p ≤ .001, *p < .05. † = marginally significant. Error
bars indicate SE.
6.3.4 Two-back task
There was no significant between-groups difference in two-back task accuracy [F(2, 57) = 1.498, p = .232, η2p = .050; Figure 16A] or mean reaction time [F(2, 57) = 2.262, p = .113, η2p = .074; Figure 16B].
Figure 16. Between-group performance on the two-back task (A: accuracy data; B: reaction time
data). Group differences are not significant. Error bars indicate SE.
There was also no significant between-groups difference in the PIQ score [F(2, 57) =
2.253, p = .114, η2p = .073; Figure 17]. In Figure 17, note that the apparent difference between
the nonmusicians and the other two groups is primarily driven by one nonmusician participant
who scored poorly on both Block Design and Matrix Reasoning subtests. This participant did not
score outside of the three standard deviation cut-off for either subtest or on other tasks, and
remained in the analysis. Note that removing this participant from the analysis increased the nonmusician group mean to M = 105.000, SD = 10.930 (from M = 102.950, SD = 14.043).
Figure 17. Between-group PIQ performance. Group differences are not significant. Error bars
indicate SE.
6.3.5 ANCOVAs
After controlling for two-back task accuracy, there was still a significant between-groups
difference in pitch memory d’ [F(2, 56) = 29.687, p < .001, η2p = .515; musicians outperformed
tone-language speakers and nonmusicians, p < .001, tone-language speakers marginally
outperformed nonmusicians, p = .069] and F0 DL performance [F(2, 56) = 7.129, p = .002, η2p =
.203; musicians outperformed nonmusicians, p = .001].
6.4 Discussion
6.4.1 Working memory
Musicians outperformed controls on the measure of pitch discrimination and outperformed both tone-language speakers and controls on the pitch-memory task. Tone-language speakers marginally outperformed controls on the pitch-memory
task. The tone-language group’s accuracy appeared to have been achieved at the expense of their
reaction time, which was slower than that achieved by musicians (p = .038; i.e., speed-accuracy
trade-off). Tone-language speakers did not show any enhancement to working memory, as
compared to musicians and controls. Indeed, there were no group differences in two-back task
performance. There were also no between-group differences in the WASI-II composite score.
These data suggest that the use of relative pitch within tone languages (Xu, 1997, 1999; Xu &
Wang, 2001) does not translate to any benefit outside the auditory domain, but may confer some
modest benefits over controls on pitch-memory performance. This lack of a working memory
enhancement in tone-language speakers (i.e., bilinguals) is also consistent with the reported
absence of cognitive-processing advantages for young adult bilinguals (de Bruin et al., 2015;
Paap et al., 2014). Note, however, that most musician and control participants had some
experience with a second language, which minimizes the likelihood of finding between-group
differences in cognitive performance. Such effects, if evident, would be revealed more readily
from comparisons of bilingual and monolingual speakers.
The lack of group differences on the visual working memory task does not align with the
numerous claims in the literature about the cognitive benefits of music training (e.g., George &
Coch, 2011; Jakobson et al., 2008). How can one account for these differences? First, let us
consider FSIQ in relation to the current finding. Specifically, the cognitive benefits of music
training documented in the literature might be confounded by the fact that high-IQ participants
tend to take music lessons, and relatedly, also do better on cognitive tasks (i.e., music training
does not cause cognitive benefits; Schellenberg, 2011a). To claim that there is a specific
association between nonmusical cognitive abilities and music training would require that such an
effect would remain when controlling for general intelligence (Schellenberg, 2009). The present
study included a measure of intelligence (see Section 6.2.2.4 for rationale for measuring PIQ
over FSIQ), on which between-group performance did not differ. Thus, when PIQ performance
was comparable across groups, working memory did not differ.
How can one account for the lack of a between-group PIQ difference, considering the
claim that high-IQ participants tend to take music lessons (Schellenberg, 2011a)? Perhaps the
answer can be found in the type of musicians being tested. The current musician sample studied
music exclusively, with the aim of becoming career musicians (beginning lessons, on average, at
age 8; SD = 3.20, and having formally studied music continuously for a mean of 14.91 years, SD
= 3.21). As discussed in Schellenberg (2011a), those who study music exclusively (i.e., instead
of something else, not in addition to something else) do not show any differences in intelligence
as compared to nonmusicians (Bialystok & DePape, 2009; Brandler & Rammsayer, 2003;
Helmbold, Rammsayer, & Altenmuller, 2005; Schellenberg & Moreno, 2010). Specifically,
Schellenberg & Moreno (2010) found no difference in intelligence between participants with an
average of 11 years of music lessons (i.e., a lower mean than in the present musician group), as
compared to nonmusicians.
Out of three recent studies that have found non-auditory, memory-specific benefits in
musicians (Bidelman, Hutka, et al., 2013; George & Coch, 2011; Jakobson et al., 2008), two did not
appear to test music students/professional musicians. George and Coch (2011), who found
benefits in musicians’ auditory and visual working memory as compared to nonmusicians, did
not report testing either university music students or professional musicians. Jakobson et al.
(2008), who found that musicians, as compared to nonmusicians, had verbal and nonverbal
memory benefits (i.e., learning, recall, and delayed recall tasks), did not test professional
musicians, though it is unclear whether their participant sample included university-level music students. Neither study administered a test of general intelligence, suggesting that these
benefits might indeed be related to self-selection via FSIQ in musicians (Schellenberg, 2008,
2009). Bidelman, Hutka, et al. (2013), who recruited their musician group primarily from the
University of Toronto’s Faculty of Music, only found a benefit in visuospatial short-term
memory (i.e., forward Corsi blocks task) in music students, a task that is not comparable to the visual working memory task used in the present study. These observations further support the claim
that only individuals who take music training in addition to something else (rather than instead of
something else) have a high FSIQ, which in turn, is related to superior performance on other
cognitive tests (Schellenberg, 2011b). This claim may therefore account for why the present
music students did not show superior PIQ performance, and relatedly, did not show superior
visual working memory performance, as compared to the other groups.
The ANCOVAs controlling for working memory accuracy in the present study helped
test whether visual working memory accounts for differences in pitch memory or pitch
discrimination (i.e., a top-down influence of executive function to lower-level functions). There
were still group differences in F0 DL performance after controlling for two-back accuracy, as
there was initially no group difference in performance on the latter measure. There were also
group differences in d’ after controlling for two-back accuracy. These results support the notion
that pitch discrimination and pitch memory performance are not modulated by top-down control
via visual working memory in musicians or tone-language speakers. The pattern of correlations
for each group also suggests that perceptual (F0 DL) and cognitive (pitch memory d’) auditory
tasks were not significantly correlated with visual working memory accuracy. These results,
coupled with the null finding for between-group visual working memory differences, provide
evidence against the model posited by Moreno and Bidelman (2014), as related to visual working
memory in university music students.
The correlation between d’ and F0 DL performance is also notable. Specifically, the
significant correlation between F0 DL performance and pitch memory d’ found in musicians, but
not the other two groups, suggests that any within-domain top-down modulation is only
associated with musicianship (and not speaking a tone language). The conclusion that musicians’
cognitive advantages are stronger in auditory than in non-auditory domains is consistent with
other studies that reveal musicians’ (as compared to nonmusicians’) enhanced auditory, but not
visual, working memory (e.g., Brandler & Rammsayer, 2003). It is possible that a similar
correlation was not observed in tone-language speakers because of lesser auditory demands as
well as lack of self-selecting factors, as compared to musicians.
6.4.2 Replication of auditory measures, and associated limitations
The current data partially replicate the findings from Bidelman, Hutka, et al. (2013), who
reported that musicians were more accurate and responded faster on the pitch-memory task than
did Cantonese speakers and nonmusicians. Cantonese speakers were more accurate but
responded more slowly than nonmusician controls. In the present study, there was a marginal gradient in pitch-memory accuracy (musicians > tone-language speakers > controls), with tone-language speakers showing a speed-accuracy trade-off similar to that in Bidelman, Hutka, et al. (2013). In
contrast to past findings (Bidelman, Hutka, et al., 2013; Chapter 3), the tone-language speakers
in the present study did not outperform controls on the F0 DL task. One possibility is that
Cantonese speakers had more pitch processing experience than the other language groups, and
thus outperformed them. Cantonese is more complex (three level tones; three contour tones;
Rattanasone et al., 2013, Wong et al., 2012) than Mandarin (one level tone, three contour tones;
Rattanasone et al., 2013), Thai (three level tones, two contour tones; Abramson, 1962;
Rattanasone et al., 2013), or Vietnamese (one level, five contour; Dung, Houng, & Boulakia, 1998).
Indeed, these differences were a part of the rationale for including heterogeneous tone language
groups, namely to observe if behavioural results would still resemble those previously obtained
in homogeneous Cantonese populations (Bidelman, Hutka, et al., 2013; Chapter 3). The present results
did not fully replicate these past findings, despite similar trends in the current data. These results
might lead one to conclude that using fewer tones in one’s language might be associated with
poorer behavioural performance, as compared to those who use a greater number of tones.
However, when examining the between-group differences on all behavioural measures, with the
groups defined according to first language, there were no differences (p’s > .05). This null
finding may be related to small sample sizes in each language group. Future work could repeat this study with a greater number of participants who speak, for example, Mandarin
(four tones), Thai (five tones), or Cantonese (six tones), to examine if better performance is
associated with greater linguistic pitch processing demands.
Another potential factor that may account for the differences between the present findings
and past literature is the percentage of daily use of one’s tone language. However, there were no
between-group differences in performance on the behavioural measures when controlling for the
percentage of daily use of one’s tone language (p’s > .05). However, when examining the mean
percent of daily tone-language use (38.89% of daily use, SD = 19.34%), and comparing it to that
of Chapter 3 (43.53% daily use, SD = 29.79 %), it is evident that the latter group used their
native tone language more often in daily life, albeit with a larger standard deviation. Perhaps the
interaction of speaking a complex tone language such as Cantonese combined with a high daily
use accounts for better performance on the F0 DL task. Future studies could examine how type
of tone language and daily use are related to F0 DL performance.
6.5 Conclusions
The findings from the present study suggest that neither tone language nor musicianship
is associated with advantages in visual working memory, as measured by a visual two-back task.
The lack of a benefit in musicians contrasts with the extensive literature showing nonmusical,
cognitive benefits in musicians relative to nonmusicians. However, this may be related to
differences in the type of musicians tested here (i.e., university music students) compared to
those tested in other studies (i.e., trained but not career musicians). Specifically, career
musicianship is not associated with the same FSIQ – and thus, cognitive – benefits as amateur
musicianship (Schellenberg, 2011a). Furthermore, top-down modulation of auditory abilities via
visual working memory may only be present in those with pre-existing differences in cognitive
function, perhaps accounting for the lack of top down effects observed in the present study. In
the auditory domain, musicians outperformed tone-language speakers and controls on the pitch-
memory and pitch-discrimination tasks. Tone-language speakers showed a marginal benefit over
105
controls on the former task. This result may be related to the higher auditory demands of
musicianship as well as pre-disposing factors that self-select those with superior pitch processing
for music training.
Chapter 7 General Discussion
7.1 Summary
The primary objective of the thesis was to assess whether tone-language experience was
associated with auditory-processing and executive-function benefits like those associated with
musicianship. Links between tone language and spectral acuity were first examined in Chapter 2,
in which musicians with absolute pitch but not tone-language experience showed enhanced pitch
encoding. These findings suggested that the pitch-processing advantages associated with
musicianship and tone language are independent. Because it was impossible to tease apart the
relative contributions of music and tone-language expertise on pitch processing in a population
of musicians, subsequent studies tested tone-language speakers who were nonmusicians,
musicians with no tone-language experience, and controls with neither tone-language experience
nor music experience.
Chapter 3 used behavioural measures and EEG to examine discrimination of music and
vowel sounds in tone-language speakers, musicians, and controls. This study established that
musicians and tone-language speakers performed similarly, and better than controls, on pitch
discrimination, but only musicians exhibited timbral processing advantages (i.e., first formant
discrimination) relative to tone-language speakers and controls, as revealed by brain and
behavioural measures. The findings suggested that tone-language users exhibit some pitch-
processing advantages observed in musicians, but these advantages are task-specific (i.e.,
relating to F0 cues). Interestingly, tone-language speakers’ enhanced pitch discrimination was
evident in behavioural but not in neural measures. This discrepancy between brain and
behaviour, as well as the possibility that musicianship and tone-language experience have different consequences (Chapter 2), prompted an inquiry into nonlinear means of uncovering nuances in
auditory processing networks in musicians and tone-language speakers.
Specifically, Chapter 4 focused on nonlinear approaches to perceptual processing
networks in musicians and tone-language speakers, complementing linear approaches in current
use in this domain. Chapter 5 applied this framework to the examination of nonlinear
dependencies in an EEG time series from Chapter 3. This analysis demonstrated that musicians,
tone-language speakers, and controls use different networks to support auditory processing of
speech and music. Furthermore, there was a gradient of pitch processing ability such that greater
experience with pitch was associated with greater sample entropy in pitch processing. In
contrast, neural data from the MMN analyses (Chapter 3) indicated that musicianship but not
tone-language experience was associated with neural enhancements that supported auditory
processing of pitch and timbre. In other words, automatic processing of pitch or timbre changes
did not differ between tone-language speakers and controls (nonmusicians who did not speak a
tone language) in Chapter 3, whereas MSE data showed processing differences between these
groups. The nonlinear data from Chapter 5 thus provide a nuanced view of the neural networks
that support auditory processing in musicians and tone-language speakers. This difference in
results between the two datasets demonstrates the value of multiple convergent techniques for
investigating auditory processing in musicians and tone-language speakers.
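Sample entropy, the measure underlying the multiscale entropy (MSE) analyses in Chapter 5, quantifies how irregular a time series is: it is the negative log of the conditional probability that two segments matching for m samples continue to match for m + 1. The sketch below is a generic illustration of SampEn(m, r) with commonly used parameter values (m = 2, r = 0.2 × SD); it is not the Chapter 5 pipeline, which applies the measure to coarse-grained EEG at multiple timescales.

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """Sample entropy SampEn(m, r) of a 1-D signal.

    Counts pairs of length-m templates matching within a tolerance of
    r * SD(x) (Chebyshev distance), then counts how many of those pairs
    still match at length m + 1; SampEn = -ln(A / B). Higher values
    indicate a less regular, less predictable signal.
    """
    x = np.asarray(x, dtype=float)
    tol = r * np.std(x)
    n = len(x)

    def count_matches(length):
        # All overlapping templates of the given length (N - m of them,
        # so both template sets have the same size, as in standard SampEn).
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            # Chebyshev distance from template i to every later template.
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= tol))
        return count

    b = count_matches(m)      # matches at length m
    a = count_matches(m + 1)  # matches at length m + 1
    return -np.log(a / b)
```

In MSE, a function like this would be applied to progressively downsampled (averaged) copies of the signal, yielding one entropy value per timescale; a regular signal (e.g., a sinusoid) yields lower sample entropy than white noise.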
Collectively, Chapters 2 through 5 examined similarities and differences in pitch
processing in tone-language speakers and musicians by means of multiple approaches, including
behavioural tasks, EEG, multiscale entropy, and spectral analysis. The overall conclusion that
emerges from this research is that pitch experience arising from tone-language use is not
associated with the same auditory-processing benefits that are evident among musicians.
However, tone-language speakers still exhibit better spectral acuity than controls. This gradient
effect might be related to two factors. First, there are differences in the auditory processing
demands associated with musicianship and speaking a tone language. Namely, musicians
constantly engage in a range of auditory tasks, such as pitch discrimination and pitch memory, as
part of their discipline, whereas the only aspect that differentiates tone-language speakers from
non-tone-language-speaking controls is the former group’s use of pitch at the phonemic level
(i.e., six lexical tones). Second, musicians are a self-selected sample as compared to tone-
language speakers: Pre-existing differences (e.g., pitch processing aptitude, SES, personality)
may lead certain individuals to music training, whereas everyone born in a tone-language-
speaking country will learn that language, regardless of predispositions that favour auditory
processing. Based on these two factors, it is plausible that musicians outperform tone-language
speakers, who in turn outperform controls on auditory tasks.
Chapter 6 sought to move beyond the auditory realm, and examined whether musicians
and tone-language speakers would show benefits on a visual working memory task, as compared
to controls. This investigation was motivated by two factors. First, tone language has been
reported to utilize relative pitch processing (Xu, 1997, 1999; Xu & Wang, 2001), which has been
associated with benefits to working memory (Klein, Coles, & Donchin, 1984; Wayman et al.,
1992). Notably, there is also a wealth of literature demonstrating that musicians outperform
nonmusicians on non-musical, memory-related tasks (e.g., George & Coch, 2011; Hansen,
Wallentin, & Vuust, 2013). Second, a recent theory posits that music training confers benefits to
cognitive and perceptual domains via top-down, executive level modulation (Moreno &
Bidelman, 2014). If speaking a tone language is associated with visual working memory benefits
via relative pitch processing, then this group would outperform controls on such a measure. Such
findings would parallel observations of enhanced visual working memory in musicians (e.g.,
George & Coch, 2011; Jakobson et al., 2008). Furthermore, if both musicians and tone-language
speakers show a benefit to visual working memory, then one might observe an association
between working memory performance and performance on auditory tasks (i.e., top-down
modulation).
To this end, Chapter 6 tested musicians, tone-language speakers, and controls on
perceptual and cognitive aspects of auditory processing and on visual working memory (i.e.,
visual two-back task). The results of this study revealed that neither tone language nor
musicianship was linked to advantages to visual working memory. Tone-language speakers
outperformed controls on pitch memory, but musicians outperformed tone-language speakers.
These findings lead to four conclusions. First, tone language is not associated with enhanced
working memory via relative pitch use. Second, there does not appear to be any top-down
modulation of cognitive and perceptual auditory tasks via visual working memory in either
musicians or tone-language speakers. Third, music students pursuing a professional career in
their discipline do not outperform tone-language speakers or controls on visual working memory
(performance may differ for amateur musicians who are not primarily pursuing music as a
career). On the surface, these findings seem contrary to the wealth of evidence supporting
nonmusical, memory-related benefits in musicians. However, this discrepancy may be explained
by pre-existing differences in cognitive skill between those who study music exclusively (those
in the current sample) versus those who pursue music training in addition to other endeavours
(those in prior research that demonstrated a visual working memory benefit in musicians).
Fourth, musicians outperformed tone-language speakers and controls on auditory tasks (pitch
memory; pitch discrimination), with a gradient effect emerging on the pitch-memory task
(musicians outperform tone-language speakers, who in turn outperform controls). This may be
related to the high demands of auditory processing as well as self-selection criteria associated
with musicians, which are not present in tone-language speakers.
Collectively, these data support the hypothesis that musicians and tone-language speakers
differ with regard to their auditory processing capacities. Speaking a tone language does not
confer additional benefits in pitch encoding for absolute pitch (AP) and non-AP musicians. Furthermore,
speaking a tone language is not associated with enhanced neural responses indexing pitch
discrimination. Conversely, AP (i.e., a music-related ability) is associated with better pitch
encoding, and musicians showed larger MMN responses to pitch and timbre discrimination as
well as better behavioural timbral discrimination than Cantonese speakers and controls. A
gradient effect emerged for the information processing capacities of musicians, Cantonese
speakers and controls, as well as for behavioural pitch memory performance. These effects may
be associated with the aforementioned differences in the auditory demands associated with each
group, as well as the self-selection factors related to musicianship. Cantonese speakers did,
however, perform similarly to musicians on pitch discrimination, suggesting that any similarities
between musicians and tone-language speakers may exist at a very basic, perceptual level, are
not modulated by executive function (i.e., visual working memory, Chapter 6), and are supported
by different information processing capacities for pitch (Chapter 5). Finally, no between-group
differences in visual working memory were observed, suggesting that visual working memory is
not especially honed in music students or in tone-language speakers, and thus, does not modulate
top-down control over lower-level auditory processing in these groups.
7.2 Musicianship: Nature versus nurture
As discussed in Chapter 1, music training can be viewed as a model of gene-environment
interaction (i.e., nature and nurture; Schellenberg, 2015). In contrast to musicians, tone-language
speakers are largely the product of environmental factors. Steps can also be taken to disentangle
the effects of nature versus nurture via the use of random assignment to music training versus a
control group in a longitudinal design (e.g., Chobert et al., 2014). However, it is notable that
random assignment to music training in a longitudinal design is associated with significant costs
(i.e., providing participants with free music lessons), and fraught with difficulties in
standardizing how lessons are taught, while still providing an environment typical of music
training (i.e., music lessons taught in person rather than via computer program, as in Moreno et
al., 2011). Strong correlational and quasi-experimental evidence is thus needed to justify the costs of such studies and to secure the funding that allows for their implementation. It is therefore important to acknowledge the role that correlational and
quasi-experimental studies have played, and continue to play, in advancing research in the
psychology and neuroscience of music. However, one must simultaneously recognize the
limitations regarding causal inferences that can be made from such studies. These same
statements can be applied to other studies that compare expert versus non-expert populations. For
example, in a study of expert phoneticians (i.e., individuals trained to analyze and transcribe
speech) and non-phoneticians, there were correlations between neural structure (e.g., left pars
opercularis size) and years of phonetic training experience (Golestani, Price, & Scott, 2011).
However, there were also structural between-group differences in the transverse gyri in the
auditory cortex (thought to be established in utero) in the phoneticians versus controls,
suggesting that pre-existing differences in the brain might also lead certain individuals to
gravitate towards the study of phonetics (Golestani et al., 2011). The contributions of nature and
nurture likely play a role in many other types of expertise, necessitating their consideration in
studies of experts versus non-experts.
Correlational and/or quasi-experimental studies can also attempt to mitigate the
possibility that a third variable, such as intelligence, other cognitive abilities, or demographics
(e.g., personality, SES, education) is driving performance in musically-trained participants (as
compared to other groups). The present thesis attempted to do this by including measures to
probe non-verbal intelligence and by matching or controlling for educational background (Chapters 3, 5, and 6). These findings lend credence to the view that the effects observed in musicians were
influenced by music training rather than by pre-existing differences in intelligence. However,
pre-existing differences in auditory abilities, personality, and socio-economic status could have
led to differences among groups that were observed in the present thesis. Future investigations
that implement a longitudinal design with training assigned randomly would be able to eliminate
the potential influence of these variables.
7.3 Future directions
Understanding how music training and speaking a tone language shape the brain and
behaviour is applicable to understanding, and eventually developing, rehabilitative interventions
for music and language that involve pitch processing. Individuals who are enrolled in music
therapy are unlikely to possess the pre-existing factors that self-select for music training.
Understanding the differences between pitch processing experience conferred by a nature/nurture
interaction (i.e., musicianship) versus only nurture (i.e., the limited auditory demands associated
with speaking a tone language) can shed light on how the neural circuitries related to pitch
processing can be shaped in individuals undergoing music therapy.
One intervention that relies heavily on pitch processing is melodic intonation therapy
(MIT), which is used to improve speech production in patients with non-fluent aphasia—a
profound speech-production impairment following left-hemispheric stroke (Albert, Sparks, &
Helm, 1973; Bonakdarpour, Eftekharzadeh, & Ashayeri, 2000; Laughlin, Naeser, & Gordon,
1979; Schlaug, Marchina, & Norton, 2008, 2009; Sparks, Helm, & Albert, 1974; Wilson,
Parsons, & Reutens, 2006). In MIT, a patient sings common phrases at a slow pace accompanied
by rhythmic, left-hand (i.e., contra-lesional) tapping; a hierarchical series of steps are followed,
which move from singing to speech. The components of MIT can be broken down into two parts,
namely the pitch-based component (i.e., singing) and a rhythm-based component (i.e., hand
tapping, which maps sound to action) (Schlaug et al., 2009). The benefits of MIT have
traditionally been ascribed to the pitch-based component, which stimulates the intact right-
hemisphere, eventually assuming the function of damaged left-hemisphere speech regions (see
Stahl, Kotz, Henseler, Turner, & Geyer, 2013; note that Stahl et al. also argue that the rhythmic component makes important contributions to MIT's efficacy). Recent
neuroimaging studies have shown that MIT enlarges the right arcuate fasciculus (AF; Schlaug et al., 2009), a white-matter tract that connects brain regions that enable auditory-motor interaction (e.g., superior temporal lobes, inferior frontal areas, premotor and motor regions; Catani & Mesulam, 2008; Wan & Schlaug, 2010). Notably, the AF is particularly well-developed in
professional singers, as compared to instrumental musicians and non-musician controls (Halwani
et al., 2011).
Recently, a new type of therapy called auditory-motor mapping training (AMMT) has been
derived from MIT, to help elicit vocal and verbal production in nonverbal or minimally verbal
children with autism (Wan et al., 2011). AMMT associates pitch with action, such that the
researcher sings words and phrases with social connotations with and to the child, while showing
the child pictures of the action, person, or object (Wan et al., 2011). Simultaneously, the
researcher guides the child’s hand to play two drum pads tuned to different pitches (Wan et al.,
2011). MIT and AMMT both demonstrate how music making, and specifically, pitch processing
in a musical context, can be used in individuals without music training to rehabilitate speech
capacities. The current thesis helped establish how pitch discrimination in musicians and tone-
language speakers manifests at the behavioural and neural level, and laid the groundwork for
studies that can further investigate how different experiences with pitch processing can tune the
neural circuitries involved in music and speech processing. This knowledge could eventually be
applied to optimize current music-based interventions for speech processing, and to develop new
interventions.
8 Appendices
8.1 Chapter 2: Nonmusical Stimuli
Table S1
Nonmusical (Control) Stimuli Descriptions.
Name Modality Description
Bird Auditory Bird chirping
Visual Picture of a bird
Camera Auditory Camera shutter sound
Visual Picture of a camera
Chicken Auditory Chicken clucking
Visual Picture of a chicken
Cow Auditory Cow mooing
Visual Picture of a cow
Dog Auditory Dog barking
Visual Picture of a dog
Duck Auditory Duck quacking
Visual Picture of a duck
Fly Auditory Fly buzzing
Visual Picture of a fly
Frog Auditory Frog croaking
Visual Picture of a frog
Horse Auditory Horse neighing
Visual Picture of a horse
Phone Auditory Phone ringing
Visual Picture of a phone
Typewriter Auditory Keys of a typewriter clicking
Visual Picture of a typewriter
8.2 Chapter 3: N1 and P2
8.2.1 Introduction
The auditory N1 usually occurs approximately 100 ms after stimulus onset and has a maximum
amplitude over frontocentral areas (Vaughan & Ritter, 1970) and/or the vertex (Picton, Hillyard,
Krausz, & Galambos, 1974). The proposed source of the N1 is the primary and associative
auditory cortex (Vaughan & Ritter, 1970). Specifically, Picton et al. (1999) found that the N1
with maximal amplitude at frontocentral and vertex regions is mainly generated by activity in the
supratemporal plane, likely in or slightly posterior to the primary auditory cortex. The N1 has
been posited to reflect sensory and physical properties of a stimulus, such as intensity, or timing
as compared to other stimuli (Näätänen & Picton, 1987). The auditory P2 wave follows the N1
wave at anterior and central scalp sites (Luck, 2005), spanning a latency range of 150 to 275 ms
(Dunn, Dunn, Languis, & Andrews, 1998). The P2 is primarily generated in the secondary
auditory cortex (Bosnyak et al., 2004; Pantev, Eulitz, Hampson, Ross, & Roberts, 1996; Picton et
al., 1999; Scherg, Vajsar, & Picton, 1989), while Picton et al. (1999) suggest that the posterior
regions of the frontal lobe may contribute to the later part of the scalp-recorded N1 and the P2
wave. The P2 has been elicited in a variety of cognitive tasks, such as selective attention
(Hackley, Woldorff, & Hillyard, 1990; Hillyard, Hink, Schwent, & Picton, 1973; Johnson, 1989)
and stimulus change (Näätänen, 1990).
The auditory N1 and P2 have been found to be larger in musicians as compared to non-
musicians (Bosnyak et al., 2004; Pantev et al., 1998; Shahin et al., 2003). Indeed, there is
convergent evidence that these components coincide with improved perception, as discussed by Tremblay, Ross, Inoue, McClannahan, and Collet (2014). Therefore, one might hypothesize that if
musicians have enhanced perception of sounds (reflected in an enhanced auditory N1 and P2 in
previous literature), and Cantonese speakers process auditory stimuli with similar spectral acuity
as musicians, then both groups would have a more pronounced N1 and P2 than controls.
In addition to providing a means to investigate auditory processing in musicians and
Cantonese speakers, the present analyses provided the opportunity to test how the large and
small deviants differed from the standard prior to subtraction. If change detection was elicited by
the deviants, then one would expect a more positive P2 in the standard than the deviant condition
(i.e., the greater the change detection, the more negative the P2 amplitude for the deviants, thus
indicating the MMN).
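The subtraction logic described above (deviant minus standard, with the MMN emerging as a negativity in the difference wave) can be sketched as follows. The arrays, sampling rate, and search window here are hypothetical placeholders for illustration, not the study's actual recording parameters.

```python
import numpy as np

# Hypothetical ERP averages: one amplitude (microvolts) per time sample,
# already averaged over trials and fronto-central electrodes.
fs = 500                              # assumed sampling rate (Hz)
t = np.arange(-0.1, 0.5, 1 / fs)      # epoch: -100 ms to +500 ms

def mmn_from_erps(standard, deviant, t, window=(0.10, 0.25)):
    """Deviant-minus-standard difference wave and its MMN peak.

    The MMN is taken as the most negative point of the difference wave
    inside a search window (100-250 ms here, an assumed placeholder).
    Returns (difference wave, peak amplitude, peak latency in seconds).
    """
    diff = deviant - standard
    mask = (t >= window[0]) & (t <= window[1])
    peak = np.argmin(diff[mask])
    return diff, diff[mask][peak], t[mask][peak]
```

A larger deviance should yield a more negative peak in the difference wave, mirroring the attenuated (more negative) deviant P2 described above.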
8.2.2 Methods: Analysis window and statistics
In the present investigation, the N1 and P2 waves were each measured by calculating the
mean amplitude between two given latencies (Luck, 2005). Each window spanned ± 20 ms around the component's group-mean peak latency. For the N1, the analysis window was 80 ms to 120 ms; the P2 analysis window was 160 ms to 200 ms. A mixed ANOVA was conducted on the N1 and
P2 waves, with group as the between-subjects variable, and stimulus type (music or speech) and
deviant size (standard, small, or large) as the within-subjects variables. Bonferroni corrections
were applied to all pairwise contrasts to control for family-wise error (α = 0.05). When
appropriate, the degrees of freedom were adjusted with the Greenhouse-Geisser epsilon (ε) and
all reported probability estimates are based on the reduced degrees of freedom, although the
original degrees of freedom are reported. Partial eta-squared (η2p) was used as the measure of
effect size for all ANOVAs.
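As a concrete illustration, the mean-amplitude measure reduces to averaging the ERP within the stated latency window. The sketch below assumes hypothetical amplitude and time arrays; the window bounds are those given above.

```python
import numpy as np

def mean_amplitude(erp, t, window):
    """Mean amplitude (in the ERP's units) within a latency window.

    erp: 1-D array of amplitudes, one per time sample.
    t:   matching array of sample times in seconds.
    window: (start, end) latencies in seconds, inclusive.
    """
    mask = (t >= window[0]) & (t <= window[1])
    return float(erp[mask].mean())

# Windows from the text: N1 = 80-120 ms, P2 = 160-200 ms.
N1_WINDOW = (0.080, 0.120)
P2_WINDOW = (0.160, 0.200)
```

These per-condition mean amplitudes would then serve as the dependent variable in the 3 (group) × 2 (stimulus type) × 3 (deviant size) mixed ANOVA described above.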
8.2.3 Results
8.2.3.1 N1
There was a marginal main effect of group on mean N1 amplitude, F(2, 57) = 2.886, p =
.064, η2p = .092 (see Table S2 for means and standard errors). Specifically, Cantonese speakers
had a marginally more negative N1 than nonmusicians (p = .094). There was no significant main
effect of stimulus type, F(1, 57) = 1.741, p = .192, nor was there a significant interaction of
stimulus type and group, F(2, 57) = 1.001, p = .374. There was a significant main effect of
deviant size [F(2, 114) = 5.170, p = .007, η2p = .083], such that the large deviant had a more
negative N1 than the standard (p = .035) and small deviant (p = .029). There was no significant
interaction between deviant size and group, F < 1. There was a significant interaction between
stimulus type and deviant size, F(2, 114) = 4.210, p = .027, η2p = .069. Namely, for the music
condition, there was no significant difference in N1 amplitude between the three levels of
deviant size, F < 1. For the speech condition, there was a significant difference in N1 amplitude
between deviant sizes, F(2, 56) = 7.702, p = .001, η2p = .216. Specifically, the large deviant’s N1
was significantly more negative than for the standard (p = .001) and the small deviant (p = .028).
There was no significant interaction between stimulus type, size, and group, F(2, 114) = 1.972, p
= .122.
Figure S1. ERP waves for each group and condition prior to subtraction. Each waveform is an
average across six fronto-central electrodes (F1, Fz, F2, FC1, FCz, FC2). M = musicians; C =
Cantonese speakers; NM = nonmusicians.
Table S2
Means and Standard Errors of N1 Analysis Variables at Each Level.
Group  Stimulus  Deviant size  M       SE
M      music     standard      -0.471  0.337
M      music     large         -0.505  0.385
M      music     small         -0.660  0.437
M      speech    standard      -0.590  0.277
M      speech    large         -1.606  0.372
M      speech    small         -0.643  0.233
C      music     standard      -1.567  0.227
C      music     large         -1.476  0.304
C      music     small         -1.278  0.347
C      speech    standard      -1.376  0.218
C      speech    large         -1.900  0.293
C      speech    small         -1.569  0.262
NM     music     standard      -0.781  0.346
NM     music     large         -0.791  0.461
NM     music     small         -0.477  0.315
NM     speech    standard      -0.565  0.265
NM     speech    large         -0.747  0.371
NM     speech    small         -0.593  0.290
Note. M = musicians; C = Cantonese speakers; NM = nonmusicians.
8.2.3.2 P2
There was no significant main effect of group on mean P2 amplitude, F < 1; see Table S3
for means and standard errors. There was a significant main effect of stimulus type [F(1, 57) =
261.821, p < .001, η2p = .821], such that the P2 for the music condition was more positive than
for the speech condition. There was a marginal interaction between stimulus type and group, F(2,
57) = 2.627, p = .081, η2p = .084. The music condition elicited a significantly more positive P2
than the speech condition in musicians [F(1, 57) = 99.406, p < .001, η2p = .636], Cantonese
speakers [F(1, 57) = 51.405, p < .001, η2p = .474], and nonmusicians [F(1, 57) = 122.759, p <
.001, η2p = .683]. There was a significant main effect of deviant size [F(2, 114) = 45.067, p <
.001, η2p = .442], such that the P2 for the standard was more positive than for the large (p < .001)
and small (p < .001) deviants. The P2 was also more positive for the small deviant than for the
large deviant (p = .004). There was a marginal interaction between deviant size and group, F(4,
114) = 2.286, p = .064, η2p = .074. Specifically, there was a significant difference between
deviant P2 amplitudes in musicians, F(2, 56) = 35.938, p < .001, η2p = .562, such that the P2 was
significantly more positive for the standard than for the large (p < .001) and small (p < .001)
deviant. There was also a significant difference between deviant P2 amplitudes in Cantonese
speakers, F(2, 56) = 9.235, p < .001, η2p = .248, where the P2 was significantly more positive for
the standard than for the large deviant (p < .001). The P2 deviant amplitudes also differed in
nonmusicians, F(2, 56) = 10.332, p < .001, η2p = .270, such that the P2 was significantly more
positive for the standard than for the large (p < .001) and small (p = .016) deviant. There was no
significant interaction between stimulus type and deviant size (F < 1), nor was there a significant
interaction between stimulus type, deviant size, and group (F < 1).
Table S3
Means and Standard Errors of P2 Analysis Variables at Each Level.
Group  Stimulus  Deviant size  M       SE
M      music     standard      3.422   0.334
M      music     large         1.709   0.449
M      music     small         2.260   0.502
M      speech    standard      0.302   0.322
M      speech    large         -1.641  0.325
M      speech    small         -1.110  0.402
C      music     standard      3.390   0.574
C      music     large         2.056   0.485
C      music     small         2.950   0.616
C      speech    standard      0.716   0.449
C      speech    large         -0.130  0.455
C      speech    small         0.166   0.522
NM     music     standard      3.321   0.381
NM     music     large         2.341   0.461
NM     music     small         2.915   0.406
NM     speech    standard      -0.157  0.400
NM     speech    large         -1.227  0.431
NM     speech    small         -0.974  0.396
Note. M = musicians; C = Cantonese speakers; NM = nonmusicians.
8.2.4 Discussion
Cantonese speakers had a marginally more negative N1 than nonmusicians, suggesting that Cantonese speakers may have processed the sensory and physical properties of all stimuli somewhat differently than the other groups. There were no between-group differences in P2
amplitude. Collectively, these findings do not replicate past findings showing an enhanced
auditory N1 and P2 for musicians as compared to nonmusicians (e.g., Bosnyak et al., 2004;
Pantev et al., 1998; Shahin et al., 2003), and instead suggest that musicianship and speaking a
tone language do not modulate the N1 or P2 for musical tones or vowel sounds. This is puzzling,
given the converging evidence that suggests that these components coincide with improved
perception (Tremblay et al., 2014). It is possible that the present stimuli were too simple or
familiar for all participants, thus yielding no group-specific enhancement in N1 or P2. However,
other studies have used comparably simple stimuli (e.g., piano tones in Pantev et al., 1998 and
Shahin et al., 2003; pure tones in Bosnyak et al., 2004). Thus, the simplicity or familiarity of the
stimuli may not account for the present lack of an enhanced N1 or P2 in musicians.
It is important to note that despite the evidence for an enhanced P2 coinciding with better
perceptual abilities, whether or not the P2 could serve as a biological marker of auditory learning
had not been studied until recently (Tremblay et al., 2014). To determine if the auditory evoked
P2 response is a biomarker of learning, Tremblay et al. (2014) taught native English speakers to
identify a new pre-voiced temporal cue that is not used phonemically in English; a comparison group was not trained to identify this pre-voicing contrast. Modulations in
brain activity were recorded using EEG and MEG. The P2 amplitude increased across repeated
EEG sessions for all groups, regardless of any change in perceptual performance – an effect that
was retained for months (Tremblay et al., 2014). The changes to P2 amplitude were attributed to
changes in neural activity associated with the acquisition process, rather than the learned
outcome itself (Tremblay et al., 2014). Perhaps the prolonged exposure to the same stimuli
similarly enhanced the P2 of all groups in the present investigation, accounting for why there
was no between-groups difference in P2 amplitude.
At a methodological level, the deviant stimuli elicited a change in amplitude from the
standard for both the N1 and P2, as hypothesized. For the N1, the large deviant had a more
negative amplitude than for the standard and small deviant – an effect that was pronounced in the
speech condition. For the P2, the standard was more positive than the large and small deviants;
furthermore, the small deviant had a more positive P2 than the large deviant. This pattern of
findings is expected given the MMN results. Specifically, the large deviant condition's P2
would have been attenuated (i.e., become more negative) upon elicitation of the MMN, which
overlaps with the P2. The peak MMN for the large deviant condition was indeed more negative
than that of the small deviant condition (Section 3.3.3.1.1). These findings thus confirm that the
deviant stimuli elicited a change relative to the standard and were modulated by the MMN, an expected
result, given the current ERP paradigm.
References
Abrams, D. A., Bhatara, A., Ryali, S., Balaban, E., Levitin, D. J., & Menon, V. (2011). Decoding temporal structure in music and speech relies on shared brain resources but elicits different fine-scale spatial patterns. Cereb Cortex, 21(7), 1507-1518. doi: 10.1093/cercor/bhq198
Abramson, A. S. (1962). The vowels and tones of Standard Thai: Acoustical measurements and experiments. Bloomington: Indiana University Research Centre in Anthropology, Folklore, and Linguistics.
Albert, M. L., Sparks, R. W., & Helm, N. A. (1973). Melodic intonation therapy for aphasia. Arch Neurol, 29(2), 130-131.
Anderson, M. L. (2010). Neural reuse: a fundamental organizational principle of the brain. Behav Brain Sci, 33(4), 245-266; discussion 266-313. doi: 10.1017/S0140525X10000853
Armony, J. L., Aubé, W., Angulo-Perkins, A., Peretz, I., & Concha, L. (2015). The specificity of neural responses to music and their relation to voice processing: An fMRI-adaptation study. Neurosci Lett, 593, 35-39. doi: 10.1016/j.neulet.2015.03.011
Athos, E. A., Levinson, B., Kistler, A., Zemansky, J., Bostrom, A., Freimer, N., & Gitschier, J. (2007). Dichotomy and perceptual distortions in absolute pitch ability. Proc Natl Acad Sci USA, 104(37), 14795-14800. doi: 10.1073/pnas.0703868104
Baddeley, A. D., & Hitch, G. J. (1974). Working memory. In G. H. Bower (Ed.), The psychology of learning and motivation, Volume 8: Advances in research and theory (pp. 47-89). New York: Academic Press.
Baggaley, J. (1974). Measurement of absolute pitch. Psychol Music, 2(2), 11-17.
Baharloo, S., Johnston, P. A., Service, S. K., Gitschier, J., & Freimer, N. B. (1998). Absolute pitch: an approach for identification of genetic and nongenetic components. Am J Hum Genet, 62(2), 224-231. doi: 10.1086/301704
Barac, R., & Bialystok, E. (2012). Bilingual effects on cognitive and linguistic development: role of language, cultural background, and education. Child Dev, 83(2), 413-422. doi: 10.1111/j.1467-8624.2011.01707.x
Barthelemy, M. (2004). Betweenness centrality in large complex networks. The European Physical Journal B-Condensed Matter and Complex Systems, 38(2), 163-168. doi: 10.1140/epjb/e2004-00111-4
Benjamini, Y., & Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann Stat, 1165-1188.
Bent, T., Bradlow, A. R., & Wright, B. A. (2006). The influence of linguistic experience on the cognitive processing of pitch in speech and nonspeech sounds. J Exp Psychol Hum Percept Perform, 32(1), 97-103. doi: 10.1037/0096-1523.32.1.97
Bermudez, P., & Zatorre, R. J. (2009). A distribution of absolute pitch ability as revealed by computerized testing. Music Percept, 27(2), 89-101. doi: 10.1525/mp.2009.27.2.89
Besson, M., Chobert, J., & Marie, C. (2011). Transfer of training between music and speech: Common processing, attention, and memory. Front Psychol, 2, 94. doi: 10.3389/fpsyg.2011.00094
Besson, M., & Macar, F. (1987). An event-related potential analysis of incongruity in music and other non-linguistic contexts. Psychophysiology, 24(1), 14-25.
Bialystok, E. (2011). Reshaping the mind: the benefits of bilingualism. Can J Exp Psychol, 65(4), 229-235. doi: 10.1037/a0025406
Bialystok, E., Craik, F. I., & Luk, G. (2012). Bilingualism: consequences for mind and brain. Trends Cogn Sci, 16(4), 240-250. doi: 10.1016/j.tics.2012.03.001
Bialystok, E., & Depape, A. M. (2009). Musical expertise, bilingualism, and executive functioning. J Exp Psychol Hum Percept Perform, 35(2), 565-574. doi: 10.1037/a0012735
Bialystok, E., & Feng, X. L. (2010). Language proficiency and its implications for monolingual and bilingual children (A. Y. Durgunoglu & C. Goldenberg Eds.). New York: Guilford Press.
Bialystok, E., Luk, G., Peets, K. F., & Yang, S. (2010). Receptive vocabulary differences in monolingual and bilingual children. Biling (Camb Engl), 13(4), 525-531. doi: 10.1017/S1366728909990423
Bidelman, G. M., Gandour, J. T., & Krishnan, A. (2011a). Cross-domain effects of music and language experience on the representation of pitch in the human auditory brainstem. J Cogn Neurosci, 23(2), 425-434. doi: 10.1162/jocn.2009.21362
Bidelman, G. M., Gandour, J. T., & Krishnan, A. (2011b). Musicians and tone-language speakers share enhanced brainstem encoding but not perceptual benefits for musical pitch. Brain Cogn, 77(1), 1-10. doi: 10.1016/j.bandc.2011.07.006
Bidelman, G. M., Hutka, S., & Moreno, S. (2013). Tone Language Speakers and Musicians Share Enhanced Perceptual and Cognitive Abilities for Musical Pitch: Evidence for Bidirectionality between the Domains of Language and Music. PLoS One, 8(4), e60676. doi: 10.1371/journal.pone.0060676
Bidelman, G. M., & Krishnan, A. (2010). Effects of reverberation on brainstem representation of speech in musicians and non-musicians. Brain Res, 1355, 112-125. doi: 10.1016/j.brainres.2010.07.100
Bidelman, G. M., Moreno, S., & Alain, C. (2013). Tracing the emergence of categorical speech perception in the human auditory system. Neuroimage, 79, 201-212. doi: 10.1016/j.neuroimage.2013.04.093
Bidelman, G. M., Weiss, M. W., Moreno, S., & Alain, C. (2014). Coordinated plasticity in brainstem and auditory cortex contributes to enhanced categorical speech perception in musicians. Eur J Neurosci, 40(4), 2662-2673. doi: 10.1111/ejn.12627
Bonakdarpour, B., Eftekharzadeh, A., & Ashayeri, H. (2000). Preliminary report on the effects of melodic intonation therapy in the rehabilitation of Persian aphasic patients. Iranian Journal of Medical Sciences, 25, 156-160.
Bonneville-Roussy, A., Lavigne, G. L., & Vallerand, R. J. (2011). When passion leads to excellence: The case of musicians. Psychol Music, 39(1), 123-138. doi: 10.1177/0305735609352441
Bosnyak, D. J., Eaton, R. A., & Roberts, L. E. (2004). Distributed auditory cortical representations are modified when non-musicians are trained at pitch discrimination with 40 Hz amplitude modulated tones. Cereb Cortex, 14(10), 1088-1099.
Bouchard, T. J. (2004). Genetic influence on human psychological traits: A survey. Curr Dir Psychol Sci, 13(4), 148-151. doi: 10.1111/j.0963-7214.2004.00295.x
Brattico, E., Pallesen, K. J., Varyagina, O., Bailey, C., Anourova, I., Jarvenpaa, M., . . . Tervaniemi, M. (2009). Neural discrimination of nonprototypical chords in music experts and laymen: an MEG study. J Cogn Neurosci, 21(11), 2230-2244. doi: 10.1162/jocn.2008.21144
Brattico, E., Tervaniemi, M., Näätänen, R., & Peretz, I. (2006). Musical scale properties are automatically processed in the human auditory cortex. Brain Res, 1117(1), 162-174. doi: 10.1016/j.brainres.2006.08.023
Brandler, S., & Rammsayer, T. H. (2003). Differences in mental abilities between musicians and non-musicians. Psychol Music, 31(2), 123-138. doi: 10.1177/0305735603031002290
Brod, G., & Opitz, B. (2012). Does it really matter? Separating the effects of musical training on syntax acquisition. Front Psychol, 3, 543. doi: 10.3389/fpsyg.2012.00543
Bullmore, E., & Sporns, O. (2009). Complex brain networks: graph theoretical analysis of structural and functional systems. Nat Rev Neurosci, 10(3), 186-198. doi: 10.1038/nrn2575
Catani, M., & Mesulam, M. (2008). The arcuate fasciculus and the disconnection theme in language and aphasia: history and current state. Cortex, 44(8), 953-961. doi: 10.1016/j.cortex.2008.04.002
Ceponiene, R., Lepisto, T., Soininen, M., Aronen, E., Alku, P., & Näätänen, R. (2004). Event-related potentials associated with sound discrimination versus novelty detection in children. Psychophysiology, 41(1), 130-141. doi: 10.1111/j.1469-8986.2003.00138.x
Chan, R. C. K., Shum, D., Toulopoulou, T., & Chen, E. Y. H. (2008). Assessment of executive functions: Review of instruments and identification of critical issues. Arch Clin Neuropsych, 23(2), 201-216. doi: 10.1016/j.acn.2007.08.010
Chandrasekaran, B., Krishnan, A., & Gandour, J. T. (2009). Relative influence of musical and linguistic experience on early cortical processing of pitch contours. Brain Lang, 108(1), 1-9. doi: 10.1016/j.bandl.2008.02.001
Chartrand, J. P., & Belin, P. (2006). Superior voice timbre processing in musicians. Neurosci Lett, 405(3), 164-167. doi: 10.1016/j.neulet.2006.06.053
Chobert, J., Francois, C., Velay, J. L., & Besson, M. (2014). Twelve months of active musical training in 8- to 10-year-old children enhances the preattentive processing of syllabic duration and voice onset time. Cereb Cortex, 24(4), 956-967. doi: 10.1093/cercor/bhs377
Cooper, A., & Wang, Y. (2012). The influence of linguistic and musical experience on Cantonese word learning. J Acoust Soc Am, 131(6), 4756-4769. doi: 10.1121/1.4714355
Corrigall, K. A., & Schellenberg, E. G. (2015). Predicting who takes music lessons: parent and child characteristics. Front Psychol, 6, 282. doi: 10.3389/fpsyg.2015.00282
Corrigall, K. A., & Trainor, L. J. (2011). Associations between length of music training and reading skills in children. Music Percept, 29(2), 147-155. doi: 10.1525/mp.2011.29.2.147
Corrigall, K. A., Schellenberg, E. G., & Misura, N. M. (2013). Music training, cognition, and personality. Front Psychol, 4(222), 1-10. doi: 10.3389/fpsyg.2013.00222
Corsi, P. M. (1972). Human memory and the medial temporal region of the brain [PhD thesis]. McGill University, Montreal.
Costa, M., Goldberger, A. L., & Peng, C. K. (2002). Multiscale entropy analysis of complex physiologic time series. Phys Rev Lett, 89(6), 068102.
Costa, M., Goldberger, A. L., & Peng, C. K. (2005). Multiscale entropy analysis of biological signals. Phys Rev E Stat Nonlin Soft Matter Phys, 71(2 Pt 1), 021906.
Cruttenden, A. (1997). Intonation (2nd ed.). Cambridge: Cambridge University Press.
Cutler, A., Dahan, D., & van Donselaar, W. (1997). Prosody in the comprehension of spoken language: a literature review. Lang Speech, 40(2), 141-201.
de Bruin, A., Treccani, B., & Della Sala, S. (2015). Cognitive advantage in bilingualism: an example of publication bias? Psychol Sci, 26(1), 99-107. doi: 10.1177/0956797614557866
Deco, G., Jirsa, V., McIntosh, A. R., Sporns, O., & Kotter, R. (2009). Key role of coupling, delay, and noise in resting brain fluctuations. Proc Natl Acad Sci U S A, 106(25), 10302-10307. doi: 10.1073/pnas.0901831106
Deco, G., Jirsa, V. K., & McIntosh, A. R. (2011). Emerging concepts for the dynamical organization of resting-state activity in the brain. Nat Rev Neurosci, 12(1), 43-56. doi: 10.1038/nrn2961
Dediu, D., & Ladd, D. R. (2007). Linguistic tone is related to the population frequency of the adaptive haplogroups of two brain size genes, ASPM and Microcephalin. P Natl Acad Sci USA, 104(26), 10944-10949. doi: 10.1073/pnas.0610848104
Deliege, I. (1987). Grouping conditions in listening to music: An approach to Lerdahl & Jackendoff's grouping preference rules. Music Percept, 4(4), 325-359.
Delogu, F., Lampis, G., & Belardinelli, M. O. (2010). From melody to lexical tone: musical ability enhances specific aspects of foreign language perception. Eur J Cogn Psychol, 22(1), 46-61. doi: 10.1080/09541440802708136
Delorme, A., & Makeig, S. (2004). EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J Neurosci Met, 134(1), 9-21. doi: 10.1016/j.jneumeth.2003.10.009
Deutsch, D. (1987). The tritone paradox: effects of spectral variables. Percept Psychophys, 41(6), 563-575.
Deutsch, D. (2013). Absolute pitch. In D. Deutsch (Ed.), The Psychology of Music (3rd ed., pp. 141-182). San Diego: Elsevier.
Deutsch, D., & Dooley, K. (2013). Absolute pitch is associated with a large auditory digit span: a clue to its genesis. J Acoust Soc Am, 133(4), 1859-1861. doi: 10.1121/1.4792217
Deutsch, D., Henthorn, T., & Dolson, M. (1999). Absolute pitch is demonstrated in speakers of tone languages. J Acoust Soc Am, 106(4), 2267-2267. doi: 10.1121/1.427738
Deutsch, D., Henthorn, T., & Dolson, M. (2004). Absolute pitch, speech, and tone language: Some experiments and a proposed framework. Music Percept, 21, 339-356. doi: 10.1525/mp.2004.21.3.339
Deutsch, D., Henthorn, T., Marvin, E., & Xu, H. (2006). Absolute pitch among American and Chinese conservatory students: prevalence differences, and evidence for a speech-related critical period. J Acoust Soc Am, 119(2), 719-722.
Diaconescu, A. O., Alain, C., & McIntosh, A. R. (2011). The co-occurrence of multisensory facilitation and cross-modal conflict in the human brain. J Neurophysiol, 106(6), 2896-2909. doi: 10.1152/jn.00303.2011
Diamond, A. (2013). Executive functions. Annu Rev Psychol, 64, 135-168.
Donchin, E., & Coles, M. G. (1988). Is the P300 component a manifestation of context updating? Behav Brain Sci, 11(03), 357-374.
Dooley, K., & Deutsch, D. (2010). Absolute pitch correlates with high performance on musical dictation. J Acoust Soc Am, 128(2), 890-893. doi: 10.1121/1.3458848
Dooley, K., & Deutsch, D. (2011). Absolute pitch correlates with high performance on interval naming tasks. J Acoust Soc Am, 130(6), 4097-4104. doi: 10.1121/1.3652861
Drayna, D., Manichaikul, A., de Lange, M., Snieder, H., & Spector, T. (2001). Genetic correlates of musical pitch recognition in humans. Science, 291(5510), 1969-1972. doi: 10.1126/science.291.5510.1969
Dung, D. T., Houng, T. T., & Boulakia, G. (1998). Intonation in Vietnamese (D. Hirst & A. Di Cristo Eds.). Cambridge: Cambridge University Press.
Dunn, B. R., Dunn, D. A., Languis, M., & Andrews, D. (1998). The relation of ERP components to complex memory processing. Brain Cogn, 36(3), 355-376. doi: 10.1006/brcg.1998.0998
Efron, B., & Tibshirani, R. (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Stat Sci, 1(1), 54-75.
Engel de Abreu, P. M. (2011). Working memory in multilingual children: is there a bilingual effect? Memory, 19(5), 529-537. doi: 10.1080/09658211.2011.590504
Ericsson, K. A., Krampe, R. T., & Tesch-Römer, C. (1993). The role of deliberate practice in the acquisition of expert performance. Psychol Rev, 100(3), 363-406. doi: 10.1037/0033-295X.100.3.363
Escera, C., Alho, K., Schröger, E., & Winkler, I. (2000). Involuntary attention and distractibility as evaluated with event-related brain potentials. Audiol Neuro-otol, 5(3-4), 151-166.
Esopenko, C., Kumar, P. K., Alain, C., Chow, T. W., McIntosh, A. R., Strother, S., & Levine, B. (2013). The interaction between traumatic brain injury and aging: Evidence from NHL alumni. J Int Neuropsych Soc, 19(S1), i-295.
Faisal, A. A., Selen, L. P., & Wolpert, D. M. (2008). Noise in the nervous system. Nat Rev Neurosci, 9(4), 292-303. doi: 10.1038/nrn2258
Francois, C., Chobert, J., Besson, M., & Schön, D. (2013). Music training for the development of speech segmentation. Cereb Cortex, 23(9), 2038-2043. doi: 10.1093/cercor/bhs180
Fritz, J., Shamma, S., Elhilali, M., & Klein, D. (2003). Rapid task-related plasticity of spectrotemporal receptive fields in primary auditory cortex. Nat Neurosci, 6(11), 1216-1223. doi: 10.1038/nn1141
Fritz, J. B., Elhilali, M., David, S. V., & Shamma, S. A. (2007). Auditory attention--focusing the searchlight on sound. Curr Opin Neurobiol, 17(4), 437-455. doi: 10.1016/j.conb.2007.07.011
Fujioka, T., Trainor, L. J., Ross, B., Kakigi, R., & Pantev, C. (2004). Musical training enhances automatic encoding of melodic contour and interval structure. J Cogn Neurosci, 16(6), 1010-1021. doi: 10.1162/0898929041502706
Fujioka, T., Trainor, L. J., Ross, B., Kakigi, R., & Pantev, C. (2005). Automatic encoding of polyphonic melodies in musicians and nonmusicians. J Cogn Neurosci, 17(10), 1578-1592. doi: 10.1162/089892905774597263
Gandour, J. T. (1981). Perceptual dimensions of tone: Evidence from Cantonese. J Chinese Linguist, 9(1), 20-36.
Garrett, D. D., Kovacevic, N., McIntosh, A. R., & Grady, C. L. (2010). Blood oxygen level-dependent signal variability is more than just noise. J Neurosci, 30(14), 4914-4921. doi: 10.1523/JNEUROSCI.5166-09.2010
Garrett, D. D., Kovacevic, N., McIntosh, A. R., & Grady, C. L. (2011). The importance of being variable. J Neurosci, 31(12), 4496-4503. doi: 10.1523/JNEUROSCI.5641-10.2011
Garrett, D. D., Samanez-Larkin, G. R., MacDonald, S. W., Lindenberger, U., McIntosh, A. R., & Grady, C. L. (2013). Moment-to-moment brain signal variability: a next frontier in human brain mapping? Neurosci Biobehav Rev, 37(4), 610-624. doi: 10.1016/j.neubiorev.2013.02.015
George, E. M., & Coch, D. (2011). Music training and working memory: an ERP study. Neuropsychologia, 49(5), 1083-1094. doi: 10.1016/j.neuropsychologia.2011.02.001
Geschwind, N., & Levitsky, W. (1968). Human brain: left-right asymmetries in temporal speech region. Science, 161(3837), 186-187.
Ghosh, A., Rho, Y., McIntosh, A. R., Kotter, R., & Jirsa, V. K. (2008a). Cortical network dynamics with time delays reveals functional connectivity in the resting brain. Cogn Neurodyn, 2(2), 115-120. doi: 10.1007/s11571-008-9044-2
Ghosh, A., Rho, Y., McIntosh, A. R., Kotter, R., & Jirsa, V. K. (2008b). Noise during rest enables the exploration of the brain's dynamic repertoire. PLoS Comput Biol, 4(10), e1000196. doi: 10.1371/journal.pcbi.1000196
Girvan, M., & Newman, M. E. (2002). Community structure in social and biological networks. Proc Natl Acad Sci USA, 99(12), 7821-7826. doi: 10.1073/pnas.122653799
Giuliano, R. J., Pfordresher, P. Q., Stanley, E. M., Narayana, S., & Wicha, N. Y. (2011). Native experience with a tone language enhances pitch discrimination and the timing of neural responses to pitch change. Front Psychol, 2, 146. doi: 10.3389/fpsyg.2011.00146
Goldberger, A. L., Amaral, L. A., Glass, L., Hausdorff, J. M., Ivanov, P. C., Mark, R. G., . . . Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation, 101(23), E215-220.
Golestani, N., Price, C. J., & Scott, S. K. (2011). Born with an ear for dialects? Structural plasticity in the expert phonetician brain. J Neurosci, 31(11), 4213-4220. doi: 10.1523/JNEUROSCI.3891-10.2011
Gollan, T. H., Montoya, R. I., & Werner, G. A. (2002). Semantic and letter fluency in Spanish-English bilinguals. Neuropsychology, 16(4), 562-576.
Granot, R. Y., Frankel, Y., Gritsenko, V., Lerer, E., Gritsenko, I., Bachner-Melman, R., . . . Ebstein, R. P. (2007). Provisional evidence that the arginine vasopressin 1a receptor gene is associated with musical memory. Evol Hum Behav, 28(5), 313-318.
Gregersen, P. K., Kowalsky, E., Kohn, N., & Marvin, E. W. (1999). Absolute pitch: prevalence, ethnic variation, and estimation of the genetic component. Am J Hum Genet, 65(3), 911-913. doi: 10.1086/302541
Gregersen, P. K., Kowalsky, E., Kohn, N., & Marvin, E. W. (2001). Early childhood music education and predisposition to absolute pitch: teasing apart genes and environment. Am J Med Genet, 98(3), 280-282.
Gregersen, P. K., Kowalsky, E., & Li, W. (2007). Reply to Henthorn and Deutsch: Ethnicity versus early environment: Comment on ‘Early Childhood Music Education and Predisposition to Absolute Pitch: Teasing Apart Genes and Environment’ by Peter K. Gregersen, Elena Kowalsky, Nina Kohn, and Elizabeth West Marvin [2000]. Am J Med Genet A, 143(1), 104-105.
Gudmundsson, S., Runarsson, T. P., Sigurdsson, S., Eiriksdottir, G., & Johnsen, K. (2007). Reliability of quantitative EEG features. Clin Neurophysiol, 118(10), 2162-2171. doi: 10.1016/j.clinph.2007.06.018
Guimera, R., Mossa, S., Turtschi, A., & Amaral, L. A. (2005). The worldwide air transportation network: Anomalous centrality, community structure, and cities' global roles. Proc Natl Acad Sci USA, 102(22), 7794-7799. doi: 10.1073/pnas.0407994102
Guimera, R., & Nunes Amaral, L. A. (2005). Functional cartography of complex metabolic networks. Nature, 433(7028), 895-900. doi: 10.1038/nature03288
Hackley, S. A., Woldorff, M., & Hillyard, S. A. (1990). Cross-modal selective attention effects on retinal, myogenic, brainstem, and cerebral evoked potentials. Psychophysiology, 27(2), 195-208.
Halpern, A. R. (1989). Memory for the absolute pitch of familiar songs. Mem Cognit, 17(5), 572-581.
Halpern, A. R., Martin, J. S., & Reed, T. D. (2008). An ERP study of major-minor classification in melodies. Music Percept, 25, 181-191. doi: 10.1525/mp.2008.25.3.181
Hambrick, D. Z., & Tucker-Drob, E. M. (2015). The genetics of music accomplishment: evidence for gene-environment correlation and interaction. Psychon Bull Rev, 22(1), 112-120. doi: 10.3758/s13423-014-0671-9
Hansen, M., Wallentin, M., & Vuust, P. (2013). Working memory and musical competence of musicians and non-musicians. Psychol Music, 41(6), 779-793. doi: 10.1177/0305735612452186
Heisz, J. J., & McIntosh, A. R. (2013). Applications of EEG neuroimaging data: Event-related potentials, spectral power, and multiscale entropy. J Vis Exp, 76, 50131. doi: 10.3791/50131.
Heisz, J. J., Shedden, J. M., & McIntosh, A. R. (2012). Relating brain signal variability to knowledge representation. Neuroimage, 63(3), 1384-1392. doi: 10.1016/j.neuroimage.2012.08.018
Helmbold, N., Rammsayer, T., & Altenmüller, E. (2005). Differences in primary mental abilities between musicians and nonmusicians. J Ind Diff, 26(2), 74-85.
Hillyard, S. A., Hink, R. F., Schwent, V. L., & Picton, T. W. (1973). Electrical signs of selective attention in the human brain. Science, 182(4108), 177-180.
Holleran, S., Jones, M. R., & Butler, D. (1995). Perceiving implied harmony: the influence of melodic and harmonic context. J Exp Psychol Learn Mem Cogn, 21(3), 737-753.
Honey, C. J., Kotter, R., Breakspear, M., & Sporns, O. (2007). Network structure of cerebral cortex shapes functional connectivity on multiple time scales. Proc Natl Acad Sci U S A, 104(24), 10240-10245. doi: 10.1073/pnas.0701519104
Honing, H., & Ladinig, O. (2008). The potential of the internet for music perception research: A comment on lab-based versus web-based studies. Empirical Musicology Review, 3(1), 4-7.
Horvath, J., Roeber, U., & Schroger, E. (2009). The utility of brief, spectrally rich, dynamic sounds in the passive oddball paradigm. Neurosci Lett, 461(3), 262-265. doi: 10.1016/j.neulet.2009.06.035
Hou, J., Chen, C., Wang, Y., Liu, Y., He, Q., Li, J., & Dong, Q. (2014). Superior pitch identification ability is associated with better executive functions. Psychomusicology: Music, Mind, and Brain, 24(2), 136.
Hutka, S.A., & Alain, C. (2015). The effects of absolute pitch and tone language on pitch processing and encoding in musicians. Music Percept, 32, 344-354. doi: 10.1525/mp.2015.32.4.344
Hutka, S., Bidelman, G. M., & Moreno, S. (2013). Brain signal variability as a window into the bidirectionality between music and language processing: moving from a linear to a nonlinear model. Front Psychol, 4, 984. doi: 10.3389/fpsyg.2013.00984
Hutka, S., Bidelman, G. M., & Moreno, S. (2015). Pitch expertise is not created equal: Cross-domain effects of musicianship and tone language experience on neural and behavioural discrimination of speech and music. Neuropsychologia, 71, 52-63. doi: 10.1016/j.neuropsychologia.2015.03.019
Hyde, K. L., Lerch, J., Norton, A., Forgeard, M., Winner, E., Evans, A. C., & Schlaug, G. (2009). Musical training shapes structural brain development. J Neurosci, 29(10), 3019-3025. doi: 10.1523/JNEUROSCI.5118-08.2009
Jakobson, L. S., Lewycky, S. T., Kilgour, A. R., & Stoesz, B. M. (2008). Memory for verbal and visual material in highly trained musicians. Music Percept, 26, 41-55. doi: 10.1525/mp.2008.26.1.41
Jirsa, V. K., & Kelso, J. A. (2000). Spatiotemporal pattern formation in neural systems with heterogeneous connection topologies. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics, 62(6 Pt B), 8462-8465.
Johnson, R., Jr. (1989). Developmental evidence for modality-dependent P300 generators: a normative study. Psychophysiology, 26(6), 651-667.
Jones, M. R. (1987). Dynamic pattern structure in music: recent theory and research. Percept Psychophys, 41(6), 621-634.
Kalmus, H., & Fry, D. B. (1980). On tune deafness (dysmelodia): frequency, development, genetics and musical background. Ann Hum Genet, 43(4), 369-382.
Keenan, J. P., Thangaraj, V., Halpern, A. R., & Schlaug, G. (2001). Absolute pitch and planum temporale. Neuroimage, 14(6), 1402-1408. doi: 10.1006/nimg.2001.0925
Khouw, E., & Ciocca, V. (2007). Perceptual correlates of Cantonese tones. J Phonetics, 35(1), 104-117.
Klatt, D. H., & Klatt, L. C. (1990). Analysis, synthesis, and perception of voice quality variations among female and male talkers. J Acoust Soc Am, 87, 820-857. doi: 10.1121/1.398894
Klein, M., Coles, M. G., & Donchin, E. (1984). People with absolute pitch process tones without producing a P300. Science, 223(4642), 1306-1309.
Koelsch, S., Maess, B., Gunter, T. C., & Friederici, A. D. (2001). Neapolitan chords activate the area of Broca. A magnetoencephalographic study. Ann N Y Acad Sci, 930, 420-421.
Koelsch, S., Schroger, E., & Tervaniemi, M. (1999). Superior pre-attentive auditory processing in musicians. Neuroreport, 10(6), 1309-1313.
Kotter, R., & Wanke, E. (2005). Mapping brains without coordinates. Philos Trans R Soc Lond B Biol Sci, 360(1456), 751-766. doi: 10.1098/rstb.2005.1625
Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory skills. Nat Rev Neurosci, 11(8), 599-605. doi: 10.1038/nrn2882
Kraus, N., Slater, J., Thompson, E. C., Hornickel, J., Strait, D. L., Nicol, T., & White-Schwoch, T. (2014). Music enrichment programs improve the neural encoding of speech in at-risk children. J Neurosci, 34(36), 11913-11918. doi: 10.1523/JNEUROSCI.1881-14.2014
Krishnan, A., Xu, Y., Gandour, J., & Cariani, P. (2005). Encoding of pitch in the human brainstem is sensitive to language experience. Cognitive Brain Res, 25, 161-168. doi: 10.1016/j.cogbrainres.2005.05.004
Krizman, J., Marian, V., Shook, A., Skoe, E., & Kraus, N. (2012). Subcortical encoding of sound is enhanced in bilinguals and relates to executive function advantages. P Natl Acad Sci USA, 109(20), 7877-7881.
Krumhansl, C. L. (1990). Cognitive foundations of musical pitch. New York: Oxford University Press.
Larsen-Freeman, D. (1997). Chaos/Complexity science and second language acquisition. Appl Linguist, 18(2), 141-165. doi: 10.1093/applin/18.2.141
Laughlin, S. A., Naeser, M. A., & Gordon, W. P. (1979). Effects of three syllable durations using the melodic intonation therapy technique. J Speech Hear Res, 22(2), 311-320.
Lee, C. Y., & Lee, Y. F. (2010). Perception of musical pitch and lexical tones by Mandarin-speaking musicians. J Acoust Soc Am, 127(1), 481-490. doi: 10.1121/1.3266683
Levitin, D. J. (1994). Absolute memory for musical pitch: evidence from the production of learned melodies. Percept Psychophys, 56(4), 414-423.
Levitin, D. J., & Rogers, S. E. (2005). Absolute pitch: perception, coding, and controversies. Trends Cogn Sci, 9(1), 26-33. doi: 10.1016/j.tics.2004.11.007
Levitt, H. (1971). Transformed up-down methods in psychoacoustics. J Acoust Soc Am, 49(2B), 467-477. doi: 10.1121/1.1912375
Li, P., Sepanski, S., & Zhao, X. (2006). Language history questionnaire: A Web-based interface for bilingual research. Behav Res Methods, 38, 202-210. doi: 10.3758/bf03192770
Lippe, S., Kovacevic, N., & McIntosh, A. R. (2009). Differential maturation of brain signal complexity in the human auditory and visual system. Front Hum Neurosci, 3, 48. doi: 10.3389/neuro.09.048.2009
Loui, P., Li, H. C., Hohmann, A., & Schlaug, G. (2011). Enhanced cortical connectivity in absolute pitch musicians: a model for local hyperconnectivity. J Cogn Neurosci, 23(4), 1015-1026. doi: 10.1162/jocn.2010.21500
Loui, P., Zamm, A., & Schlaug, G. (2012). Enhanced functional networks in absolute pitch. Neuroimage, 63(2), 632-640. doi: 10.1016/j.neuroimage.2012.07.030
Luck, S. J. (2005). An introduction to the event-related potential technique (2nd ed.). Cambridge, MA: MIT Press.
Luk, G., & Bialystok, E. (2013). Bilingualism is not a categorical variable: Interaction between language proficiency and usage. Journal of Cognitive Psychology, 25(5), 605–621. doi: 10.1080/20445911.2013.795574
Macnamara, B. N., Hambrick, D. Z., & Oswald, F. L. (2014). Deliberate practice and performance in music, games, sports, education, and professions: a meta-analysis. Psychol Sci, 25(8), 1608-1618. doi: 10.1177/0956797614535810
Maddieson, I. (2013). Tone. In M. S. Dryer & M. Haspelmath (Eds.), The world atlas of language structures online. Retrieved from http://wals.info/chapter/13
Maess, B., Koelsch, S., Gunter, T. C., & Friederici, A. D. (2001). Musical syntax is processed in Broca's area: an MEG study. Nat Neurosci, 4(5), 540-545. doi: 10.1038/87502
Magne, C., Schön, D., & Besson, M. (2006). Musician children detect pitch violations in both music and language better than nonmusician children: behavioral and electrophysiological approaches. J Cogn Neurosci, 18(2), 199-211. doi: 10.1162/jocn.2006.18.2.199
Marie, C., Delogu, F., Lampis, G., Belardinelli, M. O., & Besson, M. (2011). Influence of musical expertise on segmental and tonal processing in Mandarin Chinese. J Cogn Neurosci, 23(10), 2701-2715. doi: 10.1162/jocn.2010.21585
Marie, C., Magne, C., & Besson, M. (2011). Musicians and the metric structure of words. J Cogn Neurosci, 23(2), 294-305. doi: 10.1162/jocn.2010.21413
Marques, C., Moreno, S., Castro, S. L., & Besson, M. (2007). Musicians detect pitch violation in a foreign language better than nonmusicians: behavioral and electrophysiological evidence. J Cogn Neurosci, 19(9), 1453-1463. doi: 10.1162/jocn.2007.19.9.1453
Matthews, G., Deary, I. J., & Whiteman, M. C. (2003). Personality Traits. Cambridge, UK: Cambridge University Press.
McIntosh, A. R., Bookstein, F. L., Haxby, J. V., & Grady, C. L. (1996). Spatial pattern analysis of functional brain images using partial least squares. Neuroimage, 3(3 Pt 1), 143-157. doi: 10.1006/nimg.1996.0016
McIntosh, A. R., Kovacevic, N., & Itier, R. J. (2008). Increased brain signal variability accompanies lower behavioral variability in development. PLoS Comput Biol, 4(7), e1000106. doi: 10.1371/journal.pcbi.1000106
McIntosh, A. R., & Lobaugh, N. J. (2004). Partial least squares analysis of neuroimaging data: applications and advances. Neuroimage, 23 Suppl 1, S250-263. doi: 10.1016/j.neuroimage.2004.07.020
McIntosh, A. R., Vakorin, V., Kovacevic, N., Wang, H., Diaconescu, A., & Protzner, A. B. (2014). Spatiotemporal dependency of age-related changes in brain signal variability. Cereb Cortex, 24(7), 1806-1817. doi: 10.1093/cercor/bht030
McKenna, T. M., McMullen, T. A., & Shlesinger, M. F. (1994). The brain as a dynamic physical system. Neuroscience, 60(3), 587-605.
Merrett, D. L., Peretz, I., & Wilson, S. J. (2014). Neurobiological, cognitive, and emotional mechanisms in melodic intonation therapy. Front Hum Neurosci, 8, 401. doi: 10.3389/fnhum.2014.00401
Merrill, J., Sammler, D., Bangert, M., Goldhahn, D., Lohmann, G., Turner, R., & Friederici, A. D. (2012). Perception of words and pitch patterns in song and speech. Front Psychol, 3, 76. doi: 10.3389/fpsyg.2012.00076
Milovanov, R., Huotilainen, M., Esquef, P. A., Alku, P., Valimaki, V., & Tervaniemi, M. (2009). The role of musical aptitude and language skills in preattentive duration processing in school-aged children. Neurosci Lett, 460(2), 161-165. doi: 10.1016/j.neulet.2009.05.063
Misic, B., Mills, T., Taylor, M. J., & McIntosh, A. R. (2010). Brain noise is task dependent and region specific. J Neurophysiol, 104(5), 2667-2676. doi: 10.1152/jn.00648.2010
Miyazaki, K. (1988). Musical pitch identification by absolute pitch possessors. Percept Psychophys, 44(6), 501-512.
Miyazaki, K., & Rakowski, A. (2002). Recognition of notated melodies by possessors and nonpossessors of absolute pitch. Percept Psychophys, 64(8), 1337-1345.
Miyazaki, K. I. (1990). The speed of musical pitch identification by absolute-pitch possessors. Music Percept, 8(2), 177-188.
Miyazaki, K. I. (1993). Absolute pitch as an inability: Identification of musical intervals in a tonal context. Music Percept, 11, 55-71.
Miyazaki, K. I. (1995). Perception of relative pitch with different references: Some absolute-pitch listeners can’t tell musical interval names. Percept Psychophys, 57(7), 962-970.
Mok, P. P., & Zuo, D. (2012). The separation between music and speech: Evidence from the perception of Cantonese tones. J Acoust Soc Am, 132(4), 2711-2720.
Moore, E., Schaefer, R. S., Bastin, M. E., Roberts, N., & Overy, K. (2014). Can musical training influence brain connectivity? Evidence from diffusion tensor MRI. Brain Sci, 4(2), 405-427. doi: 10.3390/brainsci4020405
Morales, J., Calvo, A., & Bialystok, E. (2013). Working memory development in monolingual and bilingual children. J Exp Child Psychol, 114(2), 187-202. doi: 10.1016/j.jecp.2012.09.002
Moreno, S., Bialystok, E., Barac, R., Schellenberg, E. G., Cepeda, N. J., & Chau, T. (2011). Short-term music training enhances verbal intelligence and executive function. Psychol Sci, 22(11), 1425-1433. doi: 10.1177/0956797611416999
Moreno, S., & Bidelman, G. M. (2014). Examining neural plasticity and cognitive benefit through the unique lens of musical training. Hear Res, 308, 84-97. doi: 10.1016/j.heares.2013.09.012
Moreno, S., Lee, Y., Janus, M., & Bialystok, E. (2014). Short-Term Second Language and Music Training Induces Lasting Functional Brain Changes in Early Childhood. Child Dev. doi: 10.1111/cdev.12297
Moreno, S., Marques, C., Santos, A., Santos, M., Castro, S. L., & Besson, M. (2009). Musical training influences linguistic abilities in 8-year-old children: more evidence for brain plasticity. Cereb Cortex, 19(3), 712-723. doi: 10.1093/cercor/bhn120
Moreno, S., Wodniecka, Z., Tays, W., Alain, C., & Bialystok, E. (2014). Inhibitory Control in Bilinguals and Musicians: Event Related Potential (ERP) Evidence for Experience-Specific Effects. PLoS One, 9(4), e94169. doi: 10.1371/journal.pone.0094169
Morley, A. P., Narayanan, M., Mines, R., Molokhia, A., Baxter, S., Craig, G., . . . Craig, I. (2012). AVPR1A and SLC6A4 polymorphisms in choral singers and non-musicians: a gene association study. PLoS One, 7(2), e31763. doi: 10.1371/journal.pone.0031763
Mosing, M. A., Madison, G., Pedersen, N. L., Kuja-Halkola, R., & Ullén, F. (2014). Practice does not make perfect: No causal effect of music practice on music ability. Psychol Sci. doi: 10.1177/0956797614541990
Mullensiefen, D., Gingras, B., Musil, J., & Stewart, L. (2014). The musicality of non-musicians: an index for assessing musical sophistication in the general population. PLoS One, 9(2), e89642. doi: 10.1371/journal.pone.0089642
Musacchia, G., Sams, M., Skoe, E., & Kraus, N. (2007). Musicians have enhanced subcortical auditory and audiovisual processing of speech and music. Proc Natl Acad Sci U S A, 104(40), 15894-15898. doi: 10.1073/pnas.0701498104
Musacchia, G., Strait, D., & Kraus, N. (2008). Relationships between behavior, brainstem and cortical encoding of seen and heard speech in musicians and non-musicians. Hear Res, 241(1-2), 34-42. doi: 10.1016/j.heares.2008.04.013
Myers, E. B., & Swan, K. (2012). Effects of category learning on neural sensitivity to non-native phonetic categories. J Cogn Neurosci, 24(8), 1695-1708. doi: 10.1162/jocn_a_00243
Näätänen, R. (1990). The role of attention in auditory information processing as revealed by event-related potentials and other brain measures of cognitive function. Behav Brain Sci, 13(02), 201-233.
Näätänen, R., Paavilainen, P., Rinne, T., & Alho, K. (2007). The mismatch negativity (MMN) in basic research of central auditory processing: a review. Clin Neurophysiol, 118(12), 2544-2590. doi: 10.1016/j.clinph.2007.04.026
Näätänen, R., Pakarinen, S., Rinne, T., & Takegata, R. (2004). The mismatch negativity (MMN): towards the optimal paradigm. Clin Neurophysiol, 115(1), 140-144.
Näätänen, R., & Picton, T. (1987). The N1 wave of the human electric and magnetic response to sound: a review and an analysis of the component structure. Psychophysiology, 24(4), 375-425.
Nan, Y., Sun, Y., & Peretz, I. (2010). Congenital amusia in speakers of a tone language: association with lexical tone agnosia. Brain, 133(9), 2635-2642. doi: 10.1093/brain/awq178
Narmour, E. (1990). The analysis and cognition of basic melodic structures: The implication–realization model. Chicago: University of Chicago Press.
Oechslin, M. S., Meyer, M., & Jäncke, L. (2010). Absolute pitch—Functional evidence of speech-relevant auditory acuity. Cereb Cortex, 20(2), 447-455.
Oechslin, M. S., Van De Ville, D., Lazeyras, F., Hauert, C. A., & James, C. E. (2013). Degree of musical expertise modulates higher order brain functioning. Cereb Cortex, 23(9), 2213-2224. doi: 10.1093/cercor/bhs206
Oostenveld, R., & Praamstra, P. (2001). The five percent electrode system for high-resolution EEG and ERP measurements. Clin Neurophysiol, 112(4), 713-719.
Owen, A. M., Hampshire, A., Grahn, J. A., Stenton, R., Dajani, S., Burns, A. S., . . . Ballard, C. G. (2010). Putting brain training to the test. Nature, 465(7299), 775-778. doi: 10.1038/nature09042
Paap, K. R., Johnson, H. A., & Sawi, O. (2014). Are bilingual advantages dependent upon specific tasks or specific bilingual experiences? J Cogn Psychol, 26(6), 615-639. doi: 10.1080/20445911.2014.944914
Pallesen, K. J., Brattico, E., Bailey, C. J., Korvenoja, A., Koivisto, J., Gjedde, A., & Carlson, S. (2010). Cognitive control in auditory working memory is enhanced in musicians. PLoS One, 5(6), e11120. doi: 10.1371/journal.pone.0011120
Pantev, C., Eulitz, C., Hampson, S., Ross, B., & Roberts, L. E. (1996). The auditory evoked "off" response: sources and comparison with the "on" and the "sustained" responses. Ear Hear, 17(3), 255-265.
Pantev, C., Oostenveld, R., Engelien, A., Ross, B., Roberts, L. E., & Hoke, M. (1998). Increased auditory cortical representation in musicians. Nature, 392(6678), 811-814. doi: 10.1038/33918
Pantev, C., Roberts, L. E., Schulz, M., Engelien, A., & Ross, B. (2001). Timbre-specific enhancement of auditory cortical representations in musicians. Neuroreport, 12(1), 169-174.
Parbery-Clark, A., Skoe, E., & Kraus, N. (2009). Musical experience limits the degradative effects of background noise on the neural processing of sound. J Neurosci, 29(45), 14100-14107. doi: 10.1523/JNEUROSCI.3256-09.2009
Parbery-Clark, A., Skoe, E., Lam, C., & Kraus, N. (2009). Musician enhancement for speech-in-noise. Ear Hear, 30(6), 653-661.
Parbery-Clark, A., Strait, D. L., Hittner, E., & Kraus, N. (2013). Musical training enhances neural processing of binaural sounds. J Neurosci, 33(42), 16741-16747. doi: 10.1523/JNEUROSCI.5700-12.2013
Parbery-Clark, A., Strait, D. L., & Kraus, N. (2011). Context-dependent encoding in the auditory brainstem subserves enhanced speech-in-noise perception in musicians. Neuropsychologia, 49(12), 3338-3345. doi: 10.1016/j.neuropsychologia.2011.08.007
Park, H., Lee, S., Kim, H. J., Ju, Y. S., Shin, J. Y., Hong, D., . . . Seo, J. S. (2012). Comprehensive genomic analyses associate UGT8 variants with musical ability in a Mongolian population. J Med Genet, 49(12), 747-752. doi: 10.1136/jmedgenet-2012-101209
Pascual-Marqui, R. D. (2002). Standardized low-resolution brain electromagnetic tomography (sLORETA): technical details. Methods Find Exp Clin Pharmacol, 24 Suppl D, 5-12.
Patel, A. D. (2003). Language, music, syntax and the brain. Nat Neurosci, 6(7), 674-681. doi: 10.1038/nn1082
Patel, A. D. (2008). Music, language, and the brain. New York: Oxford University Press.
Patel, A. D. (2011). Why would musical training benefit the neural encoding of speech? The OPERA hypothesis. Front Psychol, 2, 1-14. doi: 10.3389/fpsyg.2011.00142
Patel, A. D. (2014). Can nonlinguistic musical training change the way the brain processes speech? The expanded OPERA hypothesis. Hear Res, 308, 98-108. doi: 10.1016/j.heares.2013.08.011
Peng, G. (2006). Temporal and tonal aspects of Chinese syllables: A corpus-based comparative study of Mandarin and Cantonese. J Chinese Linguist, 34(1), 134.
Peretz, I., Gosselin, N., Tillmann, B., Cuddy, L. L., Gagnon, B., Trimmer, C. G., . . . Bouchard, B. (2008). On-line identification of congenital amusia. Music Percept, 25(4), 331-343. doi: 10.1525/mp.2008.25.4.331
Peretz, I., Vuvan, D., Lagrois, M. É., & Armony, J. L. (2015). Neural overlap in processing music and speech. Philos Trans R Soc Lond B Biol Sci, 370(1664), 20140090. doi: 10.1098/rstb.2014.0090
Peterson, G. E., & Barney, H. L. (1952). Control methods used in a study of the vowels. J Acoust Soc Am, 24, 175-184. doi: 10.1121/1.1917300
Pfordresher, P. Q., & Brown, S. (2009). Enhanced production and perception of musical pitch in tone language speakers. Atten Percept Psychophys, 71(6), 1385-1398. doi: 10.3758/APP.71.6.1385
Picton, T. W., Alain, C., Woods, D. L., John, M. S., Scherg, M., Valdes-Sosa, P., . . . Trujillo, N. J. (1999). Intracerebral sources of human auditory-evoked potentials. Audiol Neurootol, 4(2), 64-79.
Picton, T. W., Hillyard, S. A., Krausz, H. I., & Galambos, R. (1974). Human auditory evoked potentials. I. Evaluation of components. Electroen Clin Neuro, 36(2), 179-190.
Pinneo, L. R. (1966). On noise in the nervous system. Psychol Rev, 73(3), 242-247.
Polich, J. (2007). Updating P300: an integrative theory of P3a and P3b. Clin Neurophysiol, 118(10), 2128-2148.
Portocarrero, J. S., Burright, R. G., & Donovick, P. J. (2007). Vocabulary and verbal fluency of bilingual and monolingual college students. Arch Clin Neuropsychol, 22(3), 415-422. doi: 10.1016/j.acn.2007.01.015
Profita, J., & Bidder, T. G. (1988). Perfect pitch. Am J Med Genet, 29(4), 763-771. doi: 10.1002/ajmg.1320290405
Protzner, A. B., Valiante, T. A., Kovacevic, N., McCormick, C., & McAndrews, M. P. (2010). Hippocampal signal complexity in mesial temporal lobe epilepsy: a noisy brain is a healthy brain. Arch Ital Biol, 148(3), 289-297.
Pulli, K., Karma, K., Norio, R., Sistonen, P., Göring, H. H., & Järvelä, I. (2008). Genome-wide linkage scan for loci of musical aptitude in Finnish families: evidence for a major locus at 4q22. J Med Genet, 45(7), 451-456. doi: 10.1136/jmg.2007.056366
Putkinen, V., Tervaniemi, M., & Huotilainen, M. (2013). Informal musical activities are linked to auditory discrimination and attention in 2-3-year-old children: an event-related potential study. Eur J Neurosci, 37(4), 654-661. doi: 10.1111/ejn.12049
Raichle, M. E., MacLeod, A. M., Snyder, A. Z., Powers, W. J., Gusnard, D. A., & Shulman, G. L. (2001). A default mode of brain function. Proc Natl Acad Sci U S A, 98(2), 676-682. doi: 10.1073/pnas.98.2.676
Raichle, M. E., & Snyder, A. Z. (2007). A default mode of brain function: a brief history of an evolving idea. Neuroimage, 37(4), 1083-1090; discussion 1097-1089. doi: 10.1016/j.neuroimage.2007.02.041
Raja Beharelle, A., Kovacevic, N., McIntosh, A. R., & Levine, B. (2012). Brain signal variability relates to stability of behavior after recovery from diffuse brain injury. Neuroimage, 60(2), 1528-1537. doi: 10.1016/j.neuroimage.2012.01.037
Rattanasone, N. X., Attina, V., Kasisopa, B., & Burnham, D. (2013). How to compare tones. In South and Southeast Asian Psycholinguistics. Cambridge: Cambridge University Press.
Ravasz, E., & Barabási, A. L. (2003). Hierarchical organization in complex networks. Phys Rev E Stat Nonlin Soft Matter Phys, 67(2 Pt 2), 026112.
Raven, J., Raven, J. C., & Court, J. H. (1998). Advanced progressive matrices. San Antonio, TX: Harcourt Assessment.
Richman, J. S., & Moorman, J. R. (2000). Physiological time-series analysis using approximate entropy and sample entropy. Am J Physiol Heart Circ Physiol, 278(6), H2039-2049.
Ross, D. A., Gore, J. C., & Marks, L. E. (2005). Absolute pitch: music and beyond. Epilepsy Behav, 7(4), 578-601. doi: 10.1016/j.yebeh.2005.05.019
Rosselli, M., Ardila, A., Araujo, K., Weekes, V. A., Caracciolo, V., Padilla, M., & Ostrosky-Solis, F. (2000). Verbal fluency and repetition skills in healthy older Spanish-English bilinguals. Appl Neuropsychol, 7(1), 17-24. doi: 10.1207/S15324826AN0701_3
Russo, F. A., Ives, D. T., Goy, H., Pichora-Fuller, M. K., & Patterson, R. D. (2012). Age-related difference in melodic pitch perception is probably mediated by temporal processing: empirical and computational evidence. Ear Hear, 33(2), 177-186. doi: 10.1097/AUD.0b013e318233acee
Ruthsatz, J., Detterman, D., Griscom, W. S., & Cirullo, B. A. (2008). Becoming an expert in the musical domain: It takes more than just practice. Intelligence, 36(4), 330-338. doi: 10.1016/j.intell.2007.08.003
Sammler, D., Koelsch, S., Ball, T., Brandt, A., Grigutsch, M., Huppertz, H. J., . . . Schulze-Bonhage, A. (2013). Co-localizing linguistic and musical syntax with intracranial EEG. Neuroimage, 64, 134-146. doi: 10.1016/j.neuroimage.2012.09.035
Sampson, P. D., Streissguth, A. P., Barr, H. M., & Bookstein, F. L. (1989). Neurobehavioral effects of prenatal alcohol: Part II. Partial least squares analysis. Neurotoxicol Teratol, 11(5), 477-491.
Schellenberg, E. G. (2004). Music lessons enhance IQ. Psychol Sci, 15(8), 511-514. doi: 10.1111/j.0956-7976.2004.00711.x
Schellenberg, E. G. (2006). Long-term positive associations between music lessons and IQ. J Educ Psychol, 98(2), 457-468. doi: 10.1037/0022-0663.98.2.457
Schellenberg, E. G. (2011a). Examining the association between music lessons and intelligence. Br J Psychol, 102(3), 283-302.
Schellenberg, E. G. (2011b). Music lessons, emotional intelligence, and IQ. Music Percept, 29(2), 185-194. doi: 10.1525/mp.2011.29.2.185
Schellenberg, E. G. (2015). Music training and speech perception: a gene-environment interaction. Ann N Y Acad Sci, 1337, 170-177. doi: 10.1111/nyas.12627
Schellenberg, E. G., & Moreno, S. (2010). Music lessons, pitch processing, and g. Psychol Music, 38(2), 209-221. doi: 10.1177/0305735609339473
Schellenberg, E. G., & Peretz, I. (2008). Music, language and cognition: unresolved issues. Trends Cogn Sci, 12(2), 45-46. doi: 10.1016/j.tics.2007.11.005
Schellenberg, E. G., & Trehub, S. E. (2003). Good pitch memory is widespread. Psychol Sci, 14(3), 262-266.
Schellenberg, E. G., & Trehub, S. E. (2008). Is there an Asian advantage for pitch memory? Music Percept, 25, 241-252. doi: 10.1525/mp.2008.25.3.241
Scherg, M., Vajsar, J., & Picton, T. W. (1989). A source analysis of the late human auditory evoked potentials. J Cogn Neurosci, 1(4), 336-355. doi: 10.1162/jocn.1989.1.4.336
Schlaug, G., Forgeard, M., Zhu, L., Norton, A., Norton, A., & Winner, E. (2009). Training-induced neuroplasticity in young children. Ann N Y Acad Sci, 1169, 205-208. doi: 10.1111/j.1749-6632.2009.04842.x
Schlaug, G., Jancke, L., Huang, Y., & Steinmetz, H. (1995). In vivo evidence of structural brain asymmetry in musicians. Science, 267(5198), 699-701.
Schlaug, G., Marchina, S., & Norton, A. (2008). From singing to speaking: Why singing may lead to recovery of expressive language function in patients with Broca's aphasia. Music Percept, 25(4), 315-323. doi: 10.1525/MP.2008.25.4.315
Schlaug, G., Marchina, S., & Norton, A. (2009). Evidence for plasticity in white-matter tracts of patients with chronic Broca's aphasia undergoing intense intonation-based speech therapy. Ann N Y Acad Sci, 1169, 385-394. doi: 10.1111/j.1749-6632.2009.04587.x
Schlaug, G., Norton, A., Overy, K., & Winner, E. (2005). Effects of music training on the child's brain and cognitive development. Ann N Y Acad Sci, 1060, 219-230. doi: 10.1196/annals.1360.015
Schmithorst, V. J., & Wilke, M. (2002). Differences in white matter architecture between musicians and non-musicians: a diffusion tensor imaging study. Neurosci Lett, 321(1-2), 57-60.
Schön, D., Magne, C., & Besson, M. (2004). The music of speech: music training facilitates pitch processing in both music and language. Psychophysiology, 41(3), 341-349. doi: 10.1111/1469-8986.00172.x
Seppanen, M., Pesonen, A. K., & Tervaniemi, M. (2012). Music training enhances the rapid plasticity of P3a/P3b event-related brain potentials for unattended and attended target sounds. Atten Percept Psychophys, 74(3), 600-612. doi: 10.3758/s13414-011-0257-9
Sergeant, D., & Thatcher, G. (1974). Intelligence, social status and musical abilities. Psychol Music, 2(2), 32-57. doi: 10.1177/030573567422005
Shahin, A., Bosnyak, D. J., Trainor, L. J., & Roberts, L. E. (2003). Enhancement of neuroplastic P2 and N1c auditory evoked potentials in musicians. J Neurosci, 23(13), 5545-5552.
Shestakova, A., Huotilainen, M., Ceponiene, R., & Cheour, M. (2003). Event-related potentials associated with second language learning in children. Clin Neurophysiol, 114(8), 1507-1512.
Slevc, L. R. (2012). Language and music: sound, structure, and meaning. Wiley Interdiscip Rev Cogn Sci, 3(4), 483-492. doi: 10.1002/wcs.1186
Slevc, L. R., Rosenberg, J. C., & Patel, A. D. (2009). Making psycholinguistics musical: self-paced reading time evidence for shared processing of linguistic and musical syntax. Psychon Bull Rev, 16(2), 374-381. doi: 10.3758/16.2.374
Sparks, R., Helm, N., & Albert, M. (1974). Aphasia rehabilitation resulting from melodic intonation therapy. Cortex, 10(4), 303-316.
Stagray, J. R., & Downs, D. (1993). Differential sensitivity for frequency among speakers of a tone and a nontone language. J Chinese Linguist, 21(1), 143-163.
Steele, C. J., Bailey, J. A., Zatorre, R. J., & Penhune, V. B. (2013). Early musical training and white-matter plasticity in the corpus callosum: evidence for a sensitive period. J Neurosci, 33(3), 1282-1290. doi: 10.1523/JNEUROSCI.3578-12.2013
Stein, R. B., Gossen, E. R., & Jones, K. E. (2005). Neuronal variability: noise or part of the signal? Nat Rev Neurosci, 6(5), 389-397. doi: 10.1038/nrn1668
Steinke, W. R., Cuddy, L. L., & Holden, R. R. (1997). Dissociation of musical tonality and pitch memory from nonmusical cognitive abilities. Can J Exp Psychol, 51(4), 316-334.
Strait, D. L., Kraus, N., Skoe, E., & Ashley, R. (2009). Musical experience and neural efficiency: effects of training on subcortical processing of vocal expressions of emotion. Eur J Neurosci, 29(3), 661-668. doi: 10.1111/j.1460-9568.2009.06617.x
Strait, D. L., O'Connell, S., Parbery-Clark, A., & Kraus, N. (2014). Musicians' enhanced neural differentiation of speech sounds arises early in life: developmental evidence from ages 3 to 30. Cereb Cortex, 24(9), 2512-2521. doi: 10.1093/cercor/bht103
Tadel, F., Baillet, S., Mosher, J. C., Pantazis, D., & Leahy, R. M. (2011). Brainstorm: a user-friendly application for MEG/EEG analysis. Comput Intell Neurosci, 2011, 879716. doi: 10.1155/2011/879716
Takeuchi, A. H., & Hulse, S. H. (1993). Absolute pitch. Psychol Bull, 113(2), 345-361.
Tan, Y. T., McPherson, G. E., Peretz, I., Berkovic, S. F., & Wilson, S. J. (2014). The genetic basis of music ability. Front Psychol, 5, 658. doi: 10.3389/fpsyg.2014.00658
Terhardt, E., & Seewann, M. (1983). Aural key identification and its relationship to absolute pitch. Music Percept, 1, 63-83.
Terhardt, E., & Ward, W. D. (1982). Recognition of musical key: Exploratory study. J Acoust Soc Am, 72, 26-33.
Tervaniemi, M., Just, V., Koelsch, S., Widmann, A., & Schröger, E. (2005). Pitch discrimination accuracy in musicians vs nonmusicians: an event-related potential and behavioral study. Exp Brain Res, 161(1), 1-10.
Tervaniemi, M., Rytkönen, M., Schröger, E., Ilmoniemi, R. J., & Näätänen, R. (2001). Superior formation of cortical memory traces for melodic patterns in musicians. Learn Mem, 8(5), 295-300. doi: 10.1101/lm.39501
Theusch, E., Basu, A., & Gitschier, J. (2009). Genome-wide study of families with absolute pitch reveals linkage to 8q24.21 and locus heterogeneity. Am J Hum Genet, 85(1), 112-119. doi: 10.1016/j.ajhg.2009.06.010
Thompson, W. F., Schellenberg, E. G., & Husain, G. (2004). Decoding speech prosody: do music lessons help? Emotion, 4(1), 46-64. doi: 10.1037/1528-3542.4.1.46
Tierney, A., & Kraus, N. (2013). Music training for the development of reading skills. Prog Brain Res, 207, 209-241. doi: 10.1016/B978-0-444-63327-9.00008-4
Tononi, G., Sporns, O., & Edelman, G. M. (1994). A measure for brain complexity: relating functional segregation and integration in the nervous system. Proc Natl Acad Sci U S A, 91(11), 5033-5037.
Tononi, G., Sporns, O., & Edelman, G. M. (1996). A complexity measure for selective matching of signals by the brain. Proc Natl Acad Sci U S A, 93(8), 3422-3427.
Traynelis, S. F., & Jaramillo, F. (1998). Getting the most out of noise in the central nervous system. Trends Neurosci, 21(4), 137-145.
Tremblay, K. L., Ross, B., Inoue, K., McClannahan, K., & Collet, G. (2014). Is the auditory evoked P2 response a biomarker of learning? Front Syst Neurosci, 8, 28. doi: 10.3389/fnsys.2014.00028
Ukkola, L. T., Onkamo, P., Raijas, P., Karma, K., & Järvelä, I. (2009). Musical aptitude is associated with AVPR1A-haplotypes. PLoS One, 4(5), e5534. doi: 10.1371/journal.pone.0005534
Ukkola-Vuoti, L., Kanduri, C., Oikkonen, J., Buck, G., Blancher, C., Raijas, P., . . . Järvelä, I. (2013). Genome-wide copy number variation analysis in extended families and unrelated individuals characterized for musical aptitude and creativity in music. PLoS One, 8(2), e56356. doi: 10.1371/journal.pone.0056356
Ukkola-Vuoti, L., Oikkonen, J., Onkamo, P., Karma, K., Raijas, P., & Järvelä, I. (2011). Association of the arginine vasopressin receptor 1A (AVPR1A) haplotypes with listening to music. J Hum Genet, 56(4), 324-329. doi: 10.1038/jhg.2011.13
Vakorin, V. A., Misic, B., Krakovska, O., & McIntosh, A. R. (2011). Empirical and theoretical aspects of generation and transfer of information in a neuromagnetic source network. Front Syst Neurosci, 5, 96. doi: 10.3389/fnsys.2011.00096
Vaughan, H. G., Jr., & Ritter, W. (1970). The sources of auditory evoked responses recorded from the human scalp. Electroen Clin Neuro, 28(4), 360-367.
von Stein, A., & Sarnthein, J. (2000). Different frequencies for different scales of cortical integration: from local gamma to long range alpha/theta synchronization. Int J Psychophysiol, 38(3), 301-313.
Vuust, P., Roepstorff, A., Wallentin, M., Mouridsen, K., & Ostergaard, L. (2006). It don't mean a thing... Keeping the rhythm during polyrhythmic tension, activates language areas (BA47). Neuroimage, 31(2), 832-841. doi: 10.1016/j.neuroimage.2005.12.037
Waldrop, M. M. (1992). Complexity: The emerging science at the edge of order and chaos. New York: Simon and Schuster.
Wan, C. Y., Bazen, L., Baars, R., Libenson, A., Zipse, L., Zuk, J., . . . Schlaug, G. (2011). Auditory-motor mapping training as an intervention to facilitate speech output in non-verbal children with autism: a proof of concept study. PLoS One, 6(9), e25505. doi: 10.1371/journal.pone.0025505
Ward, W. D. (1999). Absolute pitch. In D. Deutsch (Ed.), The psychology of music (2nd ed., pp. 265-298). San Diego: Academic Press.
Wayman, J. W., Frisina, R. D., Walton, J. P., Hantz, E. C., & Crummer, G. C. (1992). Effects of musical training and absolute pitch ability on event-related activity in response to sine tones. J Acoust Soc Am, 91(6), 3527-3531.
Wechsler, D., & Hsiao-Pin, C. (2011). WASI-II: Wechsler Abbreviated Scale of Intelligence. San Antonio, TX: Pearson.
Weiss, M. W., Schellenberg, E. G., Trehub, S. E., & Dawber, E. J. (2015). Enhanced processing of vocal melodies in childhood. Dev Psychol, 51(3), 370-377. doi: 10.1037/a0038784
Weiss, M. W., Trehub, S. E., & Schellenberg, E. G. (2012). Something in the way she sings: enhanced memory for vocal melodies. Psychol Sci, 23(10), 1074-1078. doi: 10.1177/0956797612442552
Weiss, M. W., Vanzella, P., Schellenberg, E. G., & Trehub, S. E. (2015). Pianists exhibit enhanced memory for vocal melodies but not piano melodies. Q J Exp Psychol, 1-12. doi: 10.1080/17470218.2015.1020818
Wetzel, N., Widmann, A., Berti, S., & Schröger, E. (2006). The development of involuntary and voluntary attention from childhood to adulthood: a combined behavioral and event-related potential study. Clin Neurophysiol, 117(10), 2191-2203. doi: 10.1016/j.clinph.2006.06.717
Wilde, N. J., Strauss, E., & Tulsky, D. S. (2004). Memory span on the Wechsler Scales. J Clin Exp Neuropsychol, 26(4), 539-549. doi: 10.1080/13803390490496605
Wilson, S. J., Parsons, K., & Reutens, D. C. (2006). Preserved singing in aphasia: A case study of the efficacy of melodic intonation therapy. Music Percept, 24(1), 23-36.
Wong, P. C., Ciocca, V., Chan, A. H., Ha, L. Y., Tan, L. H., & Peretz, I. (2012). Effects of culture on musical pitch perception. PLoS One, 7(4), e33424. doi: 10.1371/journal.pone.0033424
Wong, P. C., & Perrachione, T. K. (2007). Learning pitch patterns in lexical identification by native English-speaking adults. Appl Psycholinguist, 28(4), 565-585.
Wong, P. C., Perrachione, T. K., & Parrish, T. B. (2007). Neural characteristics of successful and less successful speech and word learning in adults. Hum Brain Mapp, 28(10), 995-1006. doi: 10.1002/hbm.20330
Wong, P. C., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nat Neurosci, 10(4), 420-422. doi: 10.1038/nn1872
Wu, C., Kirk, I. J., Hamm, J. P., & Lim, V. K. (2008). The neural networks involved in pitch labeling of absolute pitch musicians. Neuroreport, 19(8), 851-854. doi: 10.1097/WNR.0b013e3282ff63b1
Xu, Y. (1997). Contextual tonal variations in Mandarin. J Phonetics, 25(1), 61-83. doi: 10.1006/jpho.1996.0034
Xu, Y. (1999). Effects of tone and focus on the formation and alignment of f0 contours. J Phonetics, 27(1), 55-105. doi: 10.1006/jpho.1999.0086
Xu, Y., & Wang, Q. E. (2001). Pitch targets and their realization: Evidence from Mandarin Chinese. Speech Commun, 33(4), 319-337. doi: 10.1016/S0167-6393(00)00063-7
Yip, M. (2002). Tone. Cambridge: Cambridge University Press.
Zatorre, R. J., & Baum, S. R. (2012). Musical melody and speech intonation: singing a different tune. PLoS Biol, 10(7), e1001372. doi: 10.1371/journal.pbio.1001372
Zatorre, R. J., Belin, P., & Penhune, V. B. (2002). Structure and function of auditory cortex: music and speech. Trends Cogn Sci, 6(1), 37-46.
Zatorre, R. J., Perry, D. W., Beckett, C. A., Westbury, C. F., & Evans, A. C. (1998). Functional anatomy of musical processing in listeners with absolute pitch and relative pitch. Proc Natl Acad Sci U S A, 95(6), 3172-3177.
Zee, E. (1999). Chinese (Hong Kong Cantonese). Handbook of the International Phonetic Association (pp. 58-60). Cambridge: Cambridge University Press.
Zendel, B. R., & Alain, C. (2012). Musicians experience less age-related decline in central auditory processing. Psychol Aging, 27(2), 410-417. doi: 10.1037/a0024816
Zhai, S., Kong, J., & Ren, X. (2004). Speed–accuracy tradeoff in Fitts’ law tasks—on the equivalency of actual and nominal pointing precision. Int J Hum-Comput St, 61(6), 823-856.
Zuk, J., Benjamin, C., Kenyon, A., & Gaab, N. (2014). Behavioral and neural correlates of executive functioning in musicians and non-musicians. PLoS One, 9(6), e99868. doi: 10.1371/journal.pone.0099868
Copyright Acknowledgements

Chapter 2: Article originally published as Hutka, Stefanie A. and Claude Alain, “The Effects of
Absolute Pitch and Tone Language on Pitch Processing and Encoding in Musicians,” Music
Perception, Vol. 32, No. 4 (April 2015): pp. 344-354. © 2015 by Music Perception: University of
California Press.
Chapter 3: Article originally published as Pitch expertise is not created equal: Cross-domain
effects of musicianship and tone language experience on neural and behavioural discrimination
of speech and music, Vol. 71, Hutka, S., Bidelman, G. M., & Moreno, S., Copyright (2015),
reprinted with permission from Elsevier.
Chapter 3, Figure 3: Reprinted from Pitch expertise is not created equal: Cross-domain effects of
musicianship and tone language experience on neural and behavioural discrimination of speech
and music, Vol. 71, Hutka, S., Bidelman, G. M., & Moreno, S., p. 55, Copyright (2015), with
permission from Elsevier.
Chapter 3, Figure 4: Reprinted from Pitch expertise is not created equal: Cross-domain effects of
musicianship and tone language experience on neural and behavioural discrimination of speech
and music, Vol. 71, Hutka, S., Bidelman, G. M., & Moreno, S., p. 56, Copyright (2015), with
permission from Elsevier.
Chapter 3, Figure 5: Reprinted from Pitch expertise is not created equal: Cross-domain effects of
musicianship and tone language experience on neural and behavioural discrimination of speech
and music, Vol. 71, Hutka, S., Bidelman, G. M., & Moreno, S., p. 57, Copyright (2015), with
permission from Elsevier.
Chapter 3, Figure 6: Reprinted from Pitch expertise is not created equal: Cross-domain effects of
musicianship and tone language experience on neural and behavioural discrimination of speech
and music, Vol. 71, Hutka, S., Bidelman, G. M., & Moreno, S., p. 57, Copyright (2015), with
permission from Elsevier.
Chapter 3, Figure 7: Reprinted from Pitch expertise is not created equal: Cross-domain effects of
musicianship and tone language experience on neural and behavioural discrimination of speech
and music, Vol. 71, Hutka, S., Bidelman, G. M., & Moreno, S., p. 57, Copyright (2015), with
permission from Elsevier.
Chapter 4: Based on Hutka, S., Bidelman, G. M., & Moreno, S. (2013). Brain signal variability
as a window into the bidirectionality between music and language processing: moving from a
linear to a nonlinear model. Frontiers in Psychology, 4, 984. doi: 10.3389/fpsyg.2013.00984.
This is an open-access article distributed under the terms of the Creative Commons Attribution
Licence (CC BY). © 2013 Hutka, Bidelman and Moreno.
Chapter 4, Figure 8: Reprinted from Hutka, S., Bidelman, G. M., & Moreno, S. (2013). Brain
signal variability as a window into the bidirectionality between music and language processing:
moving from a linear to a nonlinear model. Frontiers in Psychology, 4, 984. doi:
10.3389/fpsyg.2013.00984. © 2013 Hutka, Bidelman and Moreno.