Pitch Processing Experience: Comparison of Musicians and Tone-Language Speakers on
Measures of Auditory Processing and Executive Function
by
Stefanie Andrea Hutka
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Department of Psychology
University of Toronto
© Copyright by Stefanie Andrea Hutka 2015
Pitch Processing Experience: Comparison of Musicians and
Tone-Language Speakers on Measures of Auditory
Processing and Executive Function
Stefanie Andrea Hutka
Doctor of Philosophy
Department of Psychology
University of Toronto
2015
Abstract
Psychophysiological evidence supports an association between music and speech such
that experience in one domain is related to processing in the other. Musicianship has been
associated with benefits to auditory processing and executive function. It is unclear,
however, whether pitch processing experience in nonmusical contexts, namely speaking a
tone language, has comparable associations with auditory processing and executive
function. The present investigation aimed to clarify this association, with the overarching
goal of better understanding how two different types of pitch processing are linked to
perceptual and cognitive processing. If pitch-processing experience gained via
musicianship or tone-language use shapes perceptual and cognitive processes in similar
ways, then musicians and tone-language speakers (nonmusicians) should outperform
controls without music training or tone-language experience. This hypothesis was tested
in a series of experiments that measured behavioural and neural responses of tone-
language speakers and musicians on tasks of perception (pitch discrimination, pitch
encoding, i.e., representation of pitch-relevant information) and cognition (pitch memory,
visual working memory). Collectively, the findings reveal that benefits to auditory
processing are more closely associated with music training than with tone-language use.
When musicians and tone-language speakers performed comparably on behavioural
tasks, this occurred at the perceptual level (i.e., sound discrimination). Differential
responsiveness of tone-language speakers or musicians was evident at the neural level
(i.e., event-related potentials, brain-signal variability). Neither musicianship nor speaking
a tone language was associated with a benefit to visual working memory. These results
are discussed in relation to the respective contributions of nature and nurture to auditory
processing and visual working memory in musicians and tone-language speakers.
Acknowledgments
I would like to acknowledge the contributions of my supervisors, Dr. Claude Alain and
Dr. Randy McIntosh, and thank them for their guidance, knowledge, and support. I would
like to thank Dr. Sandra Trehub and Dr. Glenn Schellenberg for their roles as thesis
committee members, and for their valuable feedback. Furthermore, I wish to thank my
internal committee members, Dr. Craig Chambers and Dr. Bruce Schneider, as well as
my external appraiser, Dr. Ravi Krishnan, for their helpful comments on my dissertation.
I would also like to thank my collaborators on the studies described herein, including Dr.
Gavin Bidelman, Dr. Sylvain Moreno, Dr. Patrick Bermudez, Dr. Yunjo Lee, Dr. Sean
Hutchins, and Sarah Carpentier. I would also like to acknowledge the funding assistance
of the Natural Sciences and Engineering Research Council (NSERC), NSERC-Create:
Training in Auditory Cognitive Neuroscience, and the Ontario Graduate Scholarship.
Table of Contents
Acknowledgments .............................................................................................................. iv
Table of Contents ................................................................................................................ v
List of Tables ..................................................................................................................... ix
List of Figures ..................................................................................................................... x
Chapter 1 The Music–Language Association ..................................................................... 1
1.1 The link between music and speech ........................................................................... 1
1.2 Pitch, music, and tone languages ............................................................................... 3
1.3 Tone language: Associations with auditory processing ............................................. 4
1.4 Nature, nurture, and causality .................................................................................... 7
1.5 Executive function in musicians and tone language speakers ................................. 11
1.6 The present investigation ......................................................................................... 12
Chapter 2 Absolute Pitch and Tone-Language Experience: Associations with Pitch
Processing and Encoding in Musicians ............................................................................. 14
2.1 Introduction .............................................................................................................. 14
2.1.1 Definition of absolute pitch ............................................................................... 14
2.1.2 The link between absolute pitch and language .................................................. 15
2.1.3 The present study ............................................................................................... 16
2.2 Methods .................................................................................................................... 17
2.2.1 Participants ........................................................................................................ 17
2.2.2 Materials ............................................................................................................ 19
2.2.3 Procedure ........................................................................................................... 20
2.3 Results ...................................................................................................................... 22
2.3.1 Accuracy ............................................................................................................ 22
2.3.2 Response Time .................................................................................................. 23
2.4 Discussion ................................................................................................................ 25
2.4.1 Mechanisms underlying performance in AP possessors ................................... 26
2.4.2 No cumulative advantage of AP and tone-language experience ....................... 27
2.4.3 Speed-accuracy trade-off ................................................................................... 28
2.4.4 Limitations ......................................................................................................... 28
2.5 Conclusion ............................................................................................................... 30
Chapter 3 Music Training and Tone-Language Experience: Associations with Sound
Discrimination ................................................................................................................... 31
3.1 Introduction .............................................................................................................. 31
3.1.1 Revisiting the shared processing of music and speech ...................................... 31
3.1.2 The present study ............................................................................................... 32
3.1.3 Electroencephalography: Components of interest ............................................. 33
3.1.4 Hypotheses ......................................................................................................... 34
3.2 Methods .................................................................................................................... 35
3.2.1 Participants ........................................................................................................ 35
3.2.2 Cognitive tests ................................................................................................... 36
3.2.3 Behavioural tasks ............................................................................................... 37
3.2.4 EEG stimuli ....................................................................................................... 38
3.2.5 Procedure ........................................................................................................... 40
3.3 Results ...................................................................................................................... 44
3.3.1 Cognitive tests ................................................................................................... 44
3.3.2 Behavioural tasks ............................................................................................... 44
3.3.3 ERP data ............................................................................................................ 45
3.4 Discussion ................................................................................................................ 51
3.4.1 Musicianship and tone language: Behavioural measures .................................. 53
3.4.2 Auditory neurophysiological benefits of musicianship and tone language ....... 54
3.4.3 Dissociation between neural and perceptual processing of music/speech ......... 55
3.4.4 Modularity of music and speech processing ...................................................... 57
3.4.5 Limitations ......................................................................................................... 57
3.5 Conclusion ............................................................................................................... 58
Chapter 4 A Theoretical Discourse on the Use of Nonlinear Methods to Investigate the
Music-Language Association ............................................................................................ 60
4.1 Common acoustic processing in musicians and tone-language speakers ................ 60
4.2 A nonlinear approach to studying the music-language link ..................................... 63
4.2.1 The brain as a complex, nonlinear system ......................................................... 63
4.2.2 Application to the study of acoustic processing influenced by experience ....... 64
4.3 Brain signal variability ............................................................................................. 64
4.3.1 Current applications of BSV .............................................................................. 66
4.4 Moving from theory to application, in the context of the music-language
association ......................................................................................................................... 67
Chapter 5 Using Brain Signal Variability to Examine Differences between Musicians and
Tone Language Speakers .................................................................................................. 68
5.1 Introduction .............................................................................................................. 68
5.1.1 Brain signal variability: A recapitulation .......................................................... 68
5.1.2 The present investigation ................................................................................... 68
5.2 Methods .................................................................................................................... 70
5.2.1 EEG recording and pre-processing .................................................................... 70
5.2.2 Multiscale entropy analysis ............................................................................... 70
5.2.3 Spectral analysis ................................................................................................ 71
5.2.4 Statistical analysis .............................................................................................. 71
5.3 Results ...................................................................................................................... 73
5.3.1 Task PLS: Multiscale entropy and spectral data ............................................... 73
5.4 Discussion ................................................................................................................ 80
5.4.1 MSE data ........................................................................................................... 80
5.4.2 Comparing MSE results to the spectral analysis results .................................... 82
5.4.3 Comparisons to event-related potential findings (Chapter 3) ............................ 84
5.5 Limitations ............................................................................................................... 85
5.6 Conclusions .............................................................................................................. 85
Chapter 6 Tone Language, Musicianship, and Executive Function ................................. 87
6.1 Introduction .............................................................................................................. 87
6.1.1 Is tone-language experience associated with enhancement in executive function,
as is musicianship? ........................................................................................................ 87
6.1.2 Cognitive benefits in balanced, tone-language bilinguals ................................. 87
6.1.3 Working memory in tone-language speakers and musicians ............................ 88
6.1.4 The present investigation ................................................................................... 90
6.2 Methods .................................................................................................................... 91
6.2.1 Participants ........................................................................................................ 91
6.2.2 Measures ............................................................................................................ 93
6.2.3 Procedure ........................................................................................................... 95
6.3 Results ...................................................................................................................... 96
6.3.1 Correlations ....................................................................................................... 96
6.3.2 F0 DL ................................................................................................................. 97
6.3.3 Pitch memory ..................................................................................................... 98
6.3.4 Two-back task .................................................................................................... 99
6.3.5 ANCOVAs ....................................................................................................... 100
6.4 Discussion .............................................................................................................. 100
6.4.1 Working memory ............................................................................................. 100
6.4.2 Replication of auditory measures, and associated limitations ......................... 103
6.5 Conclusions ............................................................................................................ 104
Chapter 7 General Discussion ......................................................................................... 106
7.1 Summary ................................................................................................................ 106
7.2 Musicianship: Nature versus nurture ..................................................................... 109
7.3 Future directions .................................................................................................... 110
8 Appendices ................................................................................................................ 113
8.1 Chapter 2: Nonmusical Stimuli .............................................................................. 113
8.2 Chapter 3: N1 and P2 ............................................................................................. 114
8.2.1 Introduction ..................................................................................................... 114
8.2.2 Methods: Analysis window and statistics ........................................................ 115
8.2.3 Results ............................................................................................................. 115
8.2.4 Discussion ........................................................................................................ 119
References ....................................................................................................................... 121
Copyright Acknowledgements ........................................................................................ 145
List of Tables
Table 1. Demographic Information for the Four Participant Groups, N = 32.
Table 2. Means and Standard Errors of Mismatch Negativity Analysis Variables at Each
Level.
Table 3. Means and Standard Errors of Laterality Analysis Variables at Each Level.
Table 4. Means and Standard Errors of P3a Analysis Variables at Each Level.
Table 5. Means and Standard Errors of the Late Discriminative Negativity Analysis
Variables at Each Level.
Table 6. Correlations between Tasks Across All Groups.
Table 7. Correlations between Tasks, Displayed by Group.
Table S1. Nonmusical (Control) Stimuli Descriptions.
Table S2. Means and Standard Errors of N1 Analysis Variables at Each Level.
Table S3. Means and Standard Errors of P2 Analysis Variables at Each Level.
List of Figures
Figure 1. Group mean accuracy performance. **p < .001. Error bars indicate SE.
Figure 2. Group mean reaction time performance. **p < .01. Error bars indicate SE.
Figure 3. Spectrograms illustrating the standard, large, and small deviant stimuli for the
music (top row) and speech (bottom row) conditions. White lines indicate the
fundamental frequency of each tone or the first formant of each vowel.
Figure 4. Event-related potential (ERP) scalp topography for the mismatch negativity
(MMN) in the (a) large-deviant music, (b) large-deviant speech, (c) small-deviant music,
and (d) small-deviant speech conditions. The cluster of six electrodes is outlined on the
topography of Ms, as this group drove the significant between-group differences in all
conditions. Topographies show mean activation between two time points in each
condition, centered on the mean peak amplitude (190 to 200 ms for large deviants; 200 to
210 ms for small deviants).
Figure 5. A: Performance on the fundamental frequency (F0) difference limen (DL) task.
Musicians (M) and Cantonese-speaking participants (C) showed better pitch
discrimination than nonmusician (NM) controls. B: Performance on the first formant
frequency (F1) DL task. M showed superior discrimination of the first formant in speech
sounds, as compared to C and NM. **p ≤ .01. Error bars indicate SE.
Figure 6. ERP difference waves for each group and condition. Each waveform is an
average across six fronto-central channels (inset, F1, Fz, F2, FC1, FCz, FC2). M =
musicians; C = Cantonese speakers; NM = nonmusicians.
Figure 7. Mismatch negativity (MMN) peak amplitude between 100 and 250 ms for
each condition and group. The peak amplitude is the average peak of six fronto-central
electrodes (F1, Fz, F2, FC1, FCz, FC2). Error bars indicate SE. M = musicians; C =
Cantonese speakers; NM = nonmusicians.
Figure 8. Loss of information as a result of averaging individual trials in EEG. The
variation between individual trials (left) is lost as a result of the averaging procedure, as
evident in the averaged waveform (right).
Figure 9. First latent variable (LV1), between-groups comparison: Contrasting the EEG
response to the music and speech conditions across measures of multiscale entropy (left)
and spectral power (right). The bar graphs (with standard error bars) depict brain scores
that were significantly expressed across the entire data set as determined by permutation
at 95% confidence intervals. The image plot highlights the brain regions and timescale or
frequency at which a given contrast was most stable; values represent ~z scores and
negative values denote significance for the inverse condition effect.
Figure 10. MSE curves for all groups, averaged across all conditions, at the right angular
gyrus.
Figure 11. First latent variable (LV1), between-conditions comparison: Contrasting the
EEG response to the music and speech conditions across measures of multiscale entropy
(left) and spectral power (right) for nonmusicians. The bar graphs (with standard error
bars) depict brain scores that were significantly expressed across the entire data set as
determined by permutation tests at 95% confidence intervals. The image plot highlights
the brain regions and timescale or frequency at which a given contrast was most stable;
values represent ~z scores and negative values denote significance for the inverse
condition effect.
Figure 12. Second latent variable (LV2), between-conditions comparison: Contrasting the
EEG response to the music and speech conditions across measures of multiscale entropy
(left) and spectral power (right) for Cantonese speakers. The bar graphs (with standard
error bars) depict brain scores that were significantly expressed across the entire data set
as determined by permutation tests at 95% confidence intervals. The image plot
highlights the brain regions and timescale or frequency at which a given contrast was
most stable; values represent ~z scores and negative values denote significance for the
inverse condition effect.
Figure 13. A: Fundamental frequency difference limen task. B: Pitch memory task. C:
Visual two-back task.
Figure 14. Performance on the fundamental frequency (F0) difference limen task.
Musicians (M) showed superior pitch discrimination performance relative to
nonmusician (NM) controls. **p ≤ .001. Error bars indicate SE.
Figure 15. Between-group performance on the pitch memory task (A: d’ data; B: reaction
time data). A gradient in d’ performance is visible, such that M > TL > NM. M perform
faster than TL. There appears to be a speed-accuracy trade-off in TL, such that good
performance is accompanied by slower reaction times. **p ≤ .001, *p < .05. † =
marginally significant. Error bars indicate SE.
Figure 16. Between-group performance on the two-back task (A: accuracy data; B:
reaction time data). Group differences are not significant. Error bars indicate SE.
Figure 17. Between-group PIQ performance. Group differences are not significant. Error
bars indicate SE.
Figure S1. ERP waves for each group and condition prior to subtraction. Each waveform
is an average across six fronto-central electrodes (F1, Fz, F2, FC1, FCz, FC2). M =
musicians; C = Cantonese speakers; NM = nonmusicians.
Chapter 1 The Music–Language Association
1.1 The link between music and speech
In the last 30 years, there has been increasing interest in neurophysiological processing of music
and speech (e.g., Besson & Macar, 1987; Koelsch, Maess, Gunter, & Friederici, 2001; Parbery-
Clark, Skoe, & Kraus, 2009). Between 2000 and 2013 alone, the number of articles containing
the terms “neural” AND “overlap OR sharing” AND “music” AND “language OR speech” has
risen from just under 1000 (in 2000) to nearly 6000 (in 2013; Peretz, Vuvan, Lagrois, &
Armony, 2015). Both music and speech rely heavily on auditory learning, serving as models of
experience-dependent neural plasticity in auditory networks (Patel, 2014; see Chapter 1.4 for a
discussion of gene-environment interactions related to auditory learning in musicians, as well as
in speakers of certain types of languages). Their close association is evident in the shared, interactive brain
regions involved in music and speech processing1 (e.g., Koelsch et al., 2001; Maess, Koelsch,
Gunter, & Friederici, 2001; Patel, 2008; Slevc, Rosenberg, & Patel, 2009). For example, the
processing of melody and harmony activates brain regions traditionally associated with
language-specific processes, including Broca’s and Wernicke’s areas (Koelsch et al., 2001;
Maess et al., 2001). In addition, neural regions traditionally associated with higher-order
language comprehension (i.e., frontal areas, such as Brodmann Area 47) are active when trained
musicians process complex musical metre and rhythm (Vuust, Roepstorff, Wallentin, Mouridsen,
& Ostergaard, 2006).
Given the co-activations of neural regions involved in music and speech processing,
researchers have become interested in whether music training can benefit speech processing (see
Besson, Chobert, & Marie, 2011 for a review). For example, musicianship is associated with
superior perception of degraded speech (Bidelman & Krishnan, 2010), speech in noise (Parbery-
Clark, Skoe, Lam, & Kraus, 2009; Zendel & Alain, 2012), and intonation contours (e.g., Schön,
Magne, & Besson, 2004) as well as enhanced phonological awareness (see Tierney & Kraus,
2013 for a review) and binaural sound processing (e.g., Parbery-Clark, Strait, Hittner, & Kraus,
Links between music and languages are further supported by altered brain circuitry in
musicians at both cortical (e.g., Marie, Delogu, Lampis, Belardinelli, & Besson, 2011; Pantev,
Roberts, Schulz, Engelien, & Ross, 2001; Schön et al., 2004) and subcortical levels (e.g.,
Bidelman et al., 2011a; Wong, Skoe, Russo, Dees, & Kraus, 2007), which facilitates sensory-
perceptual and cognitive control of speech information (Bidelman, Hutka, & Moreno, 2013).

1 Note that co-activation of neural regions does not, by default, translate to the sharing of neural regions (see Peretz et al., 2015 for a discussion).
These subcortical circuits may be tuned by descending corticofugal projections from the
cortex (e.g., Krishnan, Xu, Gandour, & Cariani, 2005; Kraus & Chandrasekaran, 2010).
However, it is of interest to compare the two-way interactions of the cortical and subcortical
circuits with the proposals made by the reverse hierarchy theory of auditory processing (Ahissar,
Nahum, Nelken, & Hochstein, 2009). According to this theory, rapid perception is based only on
high-level representations (Ahissar et al., 2009). For example, a listener can identify a piece of
music without explicitly accessing the lower-level information used for this identification, such
as whether two successive notes ascend or descend (Ahissar et al., 2009). Thus, this theory
supports a model of perception that is driven primarily by
top-down influences, rather than a combination of bottom-up and top-down influences.
The association between music and speech is also the basis of Patel’s (2011) OPERA
hypothesis, which describes how music-driven adaptive plasticity may occur for the processing
of linguistic stimuli when five conditions are met: Overlap of brain networks for processing
speech and music; Precision, such that music processing requires greater precision than speech
processing; Emotion, such that music engages strong positive emotions; Repetition, such that
musical activities that engage this network are frequently repeated; and Attention, such that
focused attention is engaged by musical activities. When these conditions are met, neural
plasticity drives the auditory system to function with greater precision than required for speech
processing. Because music and speech share a common network, speech processing can benefit
from music training. The OPERA hypothesis was predicated on the shared syntactic integration
resource hypothesis (Patel, 2003), which claimed that music and language rely on shared,
limited processing resources and that these resources activate separable syntactic representations.
In other words, music training may tune the precision of auditory processing (i.e., spectral
acuity), which, in turn, facilitates linguistic processing.
1.2 Pitch, music, and tone languages
Pitch is structurally important to both music and speech. As described in Pfordresher and
Brown (2009), pitch information in music provides information about tonality (e.g., Krumhansl,
1990), harmonic changes (e.g., Holleran, Jones, & Butler, 1995), boundaries of phrases (e.g.,
Deliège, 1987), rhythm and metre (e.g., Holleran et al., 1995; Jones, 1987), and expectations
about future events (Narmour, 1990). Pitch information in speech conveys word stress, utterance
accent, phrasal meaning, and the speaker’s emotion (i.e., intonation; Cruttenden, 1997; Wong et
al., 2012; see Pfordresher & Brown, 2009). It is important in all verbal languages at the sentence
level (Rattanasone, Attina, Kasisopa, & Burnham, 2013). However, there is a gradient of the
precision of pitch use across languages. Namely, tone languages, unlike other types of
languages, use pitch phonemically (i.e., at the word level, Cutler, Dahan, & van Donselaar, 1997;
Yip, 2002).
Tone languages comprise 60 to 70 percent of the world’s languages (Rattanasone et al.,
2013), and are mostly found in Asia and Africa (Maddieson, 2013). Most Asian tone languages
consist of both level and contour tones (Rattanasone et al., 2013). From the onset to the offset of
a tone, level tones maintain a relatively stable pitch height (Rattanasone et al., 2013).
Conversely, contour tones show pitch height (i.e., pitch interval) changes; from onset to offset,
rising tones increase in pitch, whereas falling tones decrease in pitch (Rattanasone et al., 2013).
In contrast to Asian tone languages, many African tone languages convey lexical information via
pitch height, rather than via contours (Yip, 2002).
Notably, the tone language speakers tested in the current thesis spoke Asian tone
languages, and most prominently, Cantonese. Cantonese speakers were chosen because their
exposure to aspects of pitch would most closely approximate that gained via musical training, as
compared to other tone languages that would be accessible in a participant sample in Toronto. Of
all tone languages, Cantonese has one of the largest tonal inventories, comprising six tones –
three of which are level, and three of which are contour (Rattanasone et al., 2013; Wong et al.,
2012). These level pitch patterns are differentiable based on pitch height (Gandour, 1981;
Khouw & Ciocca, 2007). The proximity of adjacent tones is approximately one semitone (i.e., a 6%
difference in frequency, calculated from Peng, 2006), which is also the smallest distance found
between pitches in music (Bidelman, Hutka, et al., 2013). Note that this does not mean that
Cantonese language experience is on par with musicians’ auditory experience. Cantonese
speakers have less pitch processing experience than musicians, who have extensive experience
with twelve level tones (i.e., the number of semitones in a scale) at several octaves, the
processing of pitch contours as a result of the demands of musicianship, and the perception and
production of complex melodies and harmonies. Furthermore, musicians’ auditory demands
include processing simultaneous tones (e.g., chords), and attending to the tone quality (i.e.,
timbre) of their instrument, and other instruments around them. In comparison, tone language
speakers have lesser auditory demands, typically processing a single, sequential stream of
speech, without the same emphasis as musicians on tracking timbral cues. Because musicians face
higher auditory demands than tone-language speakers, one might predict greater benefits to
auditory processing in musicians than in tone-language speakers.
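For reference, the roughly 6% semitone spacing mentioned above follows directly from the equal-tempered tuning ratio, in which each semitone multiplies frequency by 2^(1/12). A minimal arithmetic sketch (in Python, illustrative only; the function name is mine, not drawn from the cited sources):

```python
def semitone_ratio(n=1):
    """Frequency ratio spanned by n equal-tempered semitones."""
    return 2 ** (n / 12)

# One semitone raises frequency by about 5.95%, consistent with the
# ~6% spacing cited for Cantonese level tones.
percent_step = (semitone_ratio(1) - 1) * 100  # ~5.95

# Twelve semitones span exactly one octave (a doubling of frequency).
octave = semitone_ratio(12)  # 2.0
```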
Though Cantonese may not confer the same experience as music training, Cantonese is
still arguably more demanding on the auditory system than most tone languages. One might posit
that a speaker of an African language such as Jukun (three level tones; Patel, 2008) would be
similarly skilled at perceiving minute changes in steady-state, level pitch. However, Cantonese
speakers have the experience of using three additional contour tones in addition to the level
pitches to convey lexical meaning, arguably increasing their amount of pitch-processing
experience (i.e., as compared to another language with fewer level and/or contour tones).
Furthermore, one could argue that processing contour tones is associated with greater auditory
task demands than the processing of level tones, making a tone language such as Cantonese a
particularly rich auditory experience.
1.3 Tone language: Associations with auditory processing
The precise use of pitch for both musicians and tone-language speakers raises the
question of whether experience with one type of pitch processing might be associated with pitch
processing in the other domain (Wong et al., 2012). However, unlike the association of
musicianship with spectral acuity and linguistic processing (Schellenberg & Peretz, 2008;
Schellenberg & Trehub, 2008; see also Bidelman, Hutka, et al., 2013), comparable evidence of
associations with tone-language experience is limited and conflicting (Bidelman, Hutka, et al.,
2013; Yip, 2002). Behavioural studies have revealed contradictory findings with respect to the
nonlinguistic pitch perception abilities of tone-language speakers, ranging from weak
associations (Giuliano, Pfordresher, Stanley, Narayana, & Wicha, 2011; Wong et al., 2012) to no
associations (Bent, Bradlow, & Wright, 2006; Bidelman et al., 2011b; Schellenberg & Trehub,
2008; Stagray & Downs, 1993). Evidence from brainstem-evoked responses suggests that tone-
language speakers and musicians are more accurate than nonmusicians at tracking the pitch of a
musical interval, a lexical tone, and musical chords (Bidelman et al., 2011a, 2011b). The
equivocal findings regarding tone-language experience and spectral acuity may be due to limitations of
the behavioural studies, including the heterogeneity of groups (e.g., pooling listeners across
multiple language backgrounds; Pfordresher & Brown, 2009) and the use of very simple
musical stimuli (Bidelman et al., 2011b).
Additional evidence for positive associations between tone language experience and
nonlinguistic domains comes from Wong et al. (2012), who examined the influence of one’s
spoken language on musical pitch processing in individuals with or without amusia.2 The
participants in this study were either tone-language speakers (Hong Kong Cantonese), or non-
tone-language speakers (Canadian French and English). Cantonese speakers showed enhanced
pitch perception abilities, compared to the English and French speakers. This effect remained
even after controlling for variables such as musical background and age. Both groups performed
comparably on measures of rhythmic perception, demonstrating that this benefit was specific to
pitch processing (Wong et al., 2012). When only examining the participants classified as amusic
(i.e., the lowest 5% of each respective group – a cut-off chosen because this is the typical
prevalence rate of amusia in Western populations, Wong et al., 2012), the Cantonese speakers
significantly outperformed the English and French speakers. This effect remained after
controlling for music training.
However, this positive link between tone-language experience and musical pitch
perception in amusics was not evident in another study (Nan, Sun, & Peretz, 2010). This null
finding may be explained by the fact that the participants in Nan et al. (2010) were Mandarin rather than
Cantonese speakers (Wong et al., 2012). Specifically, of the four lexical tones of Mandarin, only
one is level (Rattanasone et al., 2013), potentially placing less of a demand on the use of
contextual information for tone processing, as compared to the three level tones (out of a total of
2 Amusia is a neurogenetic disorder affecting music (pitch and rhythm) processing that affects approximately 4 to
6% of the Western (non-tone language speaking) population (Kalmus & Fry, 1980; Peretz et al., 2008).
six lexical tones) of Cantonese (Wong et al., 2012). Wong et al.’s (2012) findings support the
position that pitch processing is domain-general, such that if a person with amusia uses pitch to
convey lexical meaning at the word level, they may have better processing of musical pitch, as
compared to a non-tone-language-speaking amusic. This implies that the processing of musical
pitch and lexical tones share certain cognitive resources, in line with the OPERA hypothesis
(Patel, 2011). Contrary to this view, Zatorre and Baum (2012) have posited that pitch processing
in typical (i.e., non-amusic) listeners differs for music and speech, with fine-grained
representations required for music and coarse-grained representations for language.3
Suggestive associations between tone-language experience and auditory processing come
from the literature on absolute pitch (AP), which refers to the ability to name a note without a
reference pitch (Baggaley, 1974). Deutsch, Henthorn, Marvin, and Xu (2006) found that
approximately 53% of tone-language-speaking musicians (Mandarin) possessed AP, as
compared with approximately 7% of musicians who are speakers of non-tone languages. They
suggested that higher rates of AP in tone-language speakers stem from the fact that pitch is used
to distinguish the meanings of words in their native language, such that tone-language speakers
learn to associate tones with meaningful verbal labels. When these individuals begin music
training, learned pitch-meaning associations may facilitate the mapping of musical tones to note
names and hence the development of AP (Deutsch et al., 2006).
In summary, tone-language experience may be associated with auditory spectral acuity
(Bidelman et al., 2011a; Bidelman, Hutka et al., 2013; Deutsch et al., 2006; Wong et al., 2012),
although the evidence favouring this claim is mixed (Bent et al., 2006; Bidelman et al., 2011b,
Giuliano et al., 2011; Schellenberg & Trehub, 2008; Stagray & Downs, 1993). Because the tone-
language speakers in these studies were typically bilingual (they were usually tested in
non-tone-language-speaking countries), bilingualism may account for the observed effects. Like music training,
bilingualism has been studied as a model of sensory and cognitive enrichment (Krizman, Marian,
Shook, Skoe, & Kraus, 2012). At the sensory level, bilingualism has been found to affect neural
processing and the perception of sound (e.g., Spanish-English bilinguals: Krizman et al., 2012;
Mandarin-English bilinguals: Bidelman et al., 2011a, 2011b). At the cognitive level, bilingualism
3 This statement does not specifically address fine- versus coarse-grained representation in tone languages. It is
possible that fine-grained representations may be more relevant to tone languages than to non-tone languages.
has been linked with gains in inhibitory control, working memory, and attention, honed by
constantly suppressing a non-target language in favour of engaging a target language during
communication (Bialystok, 2011; Bialystok, Craik, & Luk, 2012). Via top-down control,
inhibition can shape auditory processing at the subcortical level (Fritz, Elhilali, David, &
Shamma, 2007). Thus, the cognitive benefits of bilingualism – particularly to inhibitory control –
may contribute to bilinguals’ enhanced auditory abilities at the sensory level (Krizman et al., 2012).
However, differences in cognitive processing between bilingual and monolingual young adults
are small (Bialystok et al., 2012) and depend on factors such as daily language use and the age
at which one began using both languages (Luk & Bialystok, 2013).
Indeed, there are claims that the cognitive advantage for bilinguals over monolinguals on
a range of executive-control tasks (e.g., interference, working-memory updating) and for a range
of age groups arises from publication bias favouring enhancement evidence (de Bruin, Treccani,
& Della Sala, 2015). Similar skepticism has been expressed by scholars who failed to find
reliable advantages for early bilinguals, balanced bilinguals (i.e., those with comparable
proficiency or use of both languages), or trilinguals on inhibitory control, monitoring, and task switching for
nonverbal tasks (Paap, Johnson, & Sawi, 2014). If bilingualism is not associated with gains to
executive function (at least not outside the auditory domain), then the claim that bilinguals’
sensory-level benefits are subject to top-down modulation is weakened. In short, if tone-language
experience does benefit auditory processing, these gains are unlikely to be attributable to
tone-language speakers’ bilingualism.
1.4 Nature, nurture, and causality
Musicians and tone-language speakers provide a means to investigate experience-
dependent neural plasticity in auditory networks. Earlier in this chapter (1.2), I discussed the
differential auditory demands associated with music training and speaking a tone language (i.e.,
“nurture”), and how these differential demands might predict each group’s performance on
auditory tasks. There are also differences in the contribution of “nature” to each group.
Variables such as musical aptitude, genetics, socioeconomic status (SES), and personality
may determine whether an individual takes music lessons. As discussed in Corrigall and
Schellenberg (2015), certain factors, such as increased levels of passion for music and musical
aptitude (i.e., natural musical ability, Schellenberg, 2015) can influence the likelihood of taking
music lessons (e.g., Macnamara, Hambrick, & Oswald, 2014; Ruthsatz, Detterman, Griscom, &
Cirullo, 2008). Furthermore, specific genes have been linked to many of these factors (Tan,
McPherson, Peretz, Berkovic, & Wilson, 2014), namely musical aptitude (e.g., Park et al., 2012;
Ukkola, Onkamo, Raijas, Karma, & Järvelä, 2009; Ukkola-Vuoti et al., 2013), musical
achievement (Hambrick & Tucker-Drob, 2015), and practice (Butkovic et al., 2015; Mosing et
al., 2014; Hambrick & Tucker-Drob, 2015). The gene AVPR1A on chromosome 12q has been
associated with music perception (Ukkola et al., 2009), music memory (Granot et al., 2007), and
music listening (Ukkola-Vuoti et al., 2011), while other loci on chromosome 8q are implicated in
absolute pitch and music perception (Pulli et al., 2008; Theusch, Basu, & Gitschier, 2009;
Ukkola-Vuoti et al., 2013). Yet another gene, SLC6A4 on chromosome 17q, has been linked to
music memory and choir participation (Morley et al., 2012). Therefore, individuals who possess
such genes may gravitate towards, and then continue, musical training (i.e., nurture
complementing and reinforcing nature; Schellenberg, 2011a).
Demographics also influence who becomes a musician. For example, musicians, as
compared to nonmusicians, often come from a higher family SES (e.g., Corrigall, Schellenberg,
& Misura, 2013; Müllensiefen et al., 2014; Sergeant & Thatcher, 1976; Schellenberg, 2006).
Furthermore, children enrolled in music lessons are more likely to be enrolled in extracurricular
activities other than music lessons (Corrigall et al., 2013; Schellenberg, 2006, 2011b). Musicians
also differ from nonmusicians in terms of personality traits (Corrigall et al., 2013; Corrigall &
Schellenberg, 2015). In Corrigall et al. (2013), openness-to-experience was the best predictor of
musical involvement, even when holding demographic variables and cognitive ability constant.
Furthermore, in a recent study, parents’ openness-to-experience predicted the duration of their
children’s musical training, even when controlling for the children’s demographic variables,
intelligence, and personality (Corrigall & Schellenberg, 2015). Given that personality has genetic
correlates (Bouchard, 2004; Matthews, Deary, & Whiteman, 2003), it is plausible that
individuals with certain personality traits might gravitate towards music lessons, continue
playing an instrument, and thus become the musical experts recruited for quasi-experimental
studies comparing musicians and nonmusicians.
Though there are several studies that have randomly assigned participants to music
lessons or a control condition (Chobert, Francois, Velay, & Besson, 2014; Francois, Chobert,
Besson, & Schön, 2013; Kraus et al., 2014; Moreno, Lee, Janus, & Bialystok, 2014; Moreno et
al., 2009), the scope of the benefits conferred by music training appears to be quite limited. For
example, regarding the claim that music training causes improvements in non-musical abilities, the
causal evidence in the five studies above is far weaker than the effects observed in
correlational or quasi-experimental research (Schellenberg, 2015). Such research typically
includes a group of highly-trained, adult musicians, and compares their performance to that of
nonmusicians (e.g., Oechslin, Van De Ville, Lazeyras, Hauert, & James, 2013; Palleson et al.,
2010; Parbery-Clark et al., 2013; Schön et al., 2004; Zuk et al., 2014). Thus, even if music
training can benefit auditory processing, such as speech perception or pitch processing, these
effects tend to be larger when musicians are self-selected in a cross-sectional design, rather than
when randomly assigned to training (Schellenberg, 2015). This observation supports a joint
influence of genes and environment on musicianship.
It is, however, notable that in studies using random assignment, participants are trained
for a much shorter period of time than professional-level musicians (e.g., four weeks in Moreno et
al., 2014; up to two years in Francois et al., 2013, and Kraus et al., 2014). To test whether long-
term training alone benefits auditory processing and cognition, participants would need to be
randomly assigned to either a music training or a control group, and formally trained for a
duration of time on par with that received by professional-level musicians (e.g., 15.9 years – the
average number of years of formal music training the musicians in this thesis received). Many
studies have indeed found correlations between the duration, quantity, or intensity of music
training and non-musical abilities (e.g., Brod & Opitz, 2012; Corrigall & Trainor, 2011; Strait,
O'Connell, Parbery-Clark, & Kraus, 2014). These correlational findings suggest that
training is important; however, they do not rule out the potential contributions of pre-existing
differences (e.g., genetics, SES, personality; Schellenberg, 2015).
Thus far, it has been established that musicianship is affected by both genetic
contributions and training (i.e., nature and nurture). Throughout this thesis, any benefits observed
in musicians, as well as the term “musicianship”, relate to a gene-environment interaction in this
group. It is acknowledged that one cannot make causal inferences from comparisons of
musicians and nonmusicians in a cross-sectional design, as there is no random assignment in
these designs (Corrigall et al., 2013). That is, pre-existing differences may determine who takes
music lessons, and these differences, rather than the training itself, may drive the effects
observed in musicians.
Unlike musicians, tone-language speakers are not subject to self-selection. Nearly
everyone learns a tone language if they are raised in a tone-language-speaking country (e.g.,
China, Thailand) – not just those who are gifted pitch processors. However, it is notable that
there has been some research conducted on the link between genetics and whether one speaks a
tone language (Dediu & Ladd, 2007), as well as interest in individual differences in learning
lexical tonal distinctions (Wong, Perrachione, & Parrish, 2007). Dediu and Ladd (2007)
examined the association between allele frequencies of genes related to brain growth and
development (ASPM and Microcephalin) and typological features of languages. They found a
link between genetic and linguistic diversity, such that certain alleles can bias language
acquisition and/or processing (Dediu & Ladd, 2007). This is not to say that there is a gene for a
specific language (e.g., a “Cantonese-speaker gene”). However, heritable structural and
functional differences in the brain may influence the acquisition and use of tone versus non-tone
languages (Dediu & Ladd, 2007).
Supporting the notion that there may be pre-existing differences related to learning
lexical pitch, Wong et al. (2007) examined how adult speakers of a non-tone language (English)
learned lexical tonal distinctions. In this study, functional magnetic resonance imaging (fMRI)
was used to examine the neural correlates of pitch-pattern discrimination before and after
training on the lexical tonal distinctions. Participants who excelled at learning the distinctions
showed increased brain activation in the left posterior superior temporal area after training.
However, those who reached a lower ceiling of performance showed increased activation in the
right superior temporal area and right inferior frontal gyrus (associated with nonlinguistic pitch
processing), and medial frontal areas (associated with increased use of memory and attention
resources; Wong et al., 2007). Neural activation differed between groups even before training,
suggesting that pre-existing neural differences contribute to learning lexical tonal pitch.
Collectively, these studies suggest that there may be genetic correlates at the population
level and/or individual differences related to the use of tone versus non-tone languages,
suggesting that tone language may not exclusively be the product of “nurture”. However, these
findings do not change the fact that speaking a tone language does not qualify as self-selection in
the way the term applies to musicians (i.e., one has a predisposition, seeks out an activity that
complements that predisposition, and continues training in that activity). I predict that
individuals who acquire pitch-processing abilities via a nature–nurture interaction (i.e.,
musicians) might perform better on auditory tasks than those who acquire such abilities via
nurture alone. This prediction is based on the possibility that self-selecting factors
(e.g., musical aptitude, genetics), in conjunction with music training, would be associated with
greater benefits to auditory processing, as compared to tone language speakers, who were not
self-selected for pitch processing abilities. I examine this prediction in relation to findings
throughout the thesis.
1.5 Executive function in musicians and tone language speakers
In addition to benefits in spectral acuity, musicianship has been associated with
enhancements to certain executive functions (see Section 6.1.3. for a detailed discussion on the
nature-nurture interaction underlying nonmusical, cognitive benefits in musicians versus
nonmusicians). Executive function is defined here as a collection of top-down mental functions
(Diamond, 2013), including planning, working memory, inhibition, mental flexibility, initiation
of action, and monitoring of action (Chan, Shum, Toulopoulou, & Chen, 2008). For example,
musicians, as compared to nonmusicians, have shown enhancements in auditory working
memory (Pallesen et al., 2010; Parbery-Clark, Skoe, & Kraus, 2009), response inhibition
(Moreno et al., 2011) as well as verbal fluency, processing speed, and task switching (Zuk,
Benjamin, Kenyon, & Gaab, 2014). Moreover, there is a positive correlation between executive
function and pitch identification (Hou et al., 2014). However, it is important to interpret such
correlational evidence with caution until causality can be assessed.
These findings may be accounted for by the multiple executive sub-skills recruited by
musicians, such as sustained attention, working memory, and goal-directed behaviour (Zuk et al.,
2014). These executive-function benefits may modulate the cognitive-linguistic enhancements
(e.g., selective attention for speech in noise, Parbery-Clark, Skoe, Lam, et al., 2009) that have
been observed in musicians (Moreno & Bidelman, 2014; Zuk et al., 2014). However, many of
the executive-level enhancements observed in musicians may be limited to the auditory domain,
and may not extend to non-auditory executive functions. For example, several studies have found
advantages in auditory but not visual working memory in musicians, as compared to
nonmusicians4 (e.g., Brandler & Rammsayer, 2003; Chan, Ho, & Cheung, 1998; Ho, Cheung, &
Chan, 2003; Parbery-Clark, Skoe, Lam, et al., 2009; Strait, Kraus, Parbery-Clark, & Ashley,
2010). If musicians’ cognitive enhancements are indeed confined to the auditory
domain, this may mean that any top-down modulation of sensory processing in musicians is
limited to the auditory system (i.e., within-domain top-down control rather than cross-domain
top-down control).
Furthermore, there is evidence to suggest that executive function may modulate lower-
level auditory processing in musicians via top-down control. I posit that this modulation might also
occur in tone-language speakers. Speaking a tone language involves relative pitch processing
(Xu, 1997, 1999; Xu & Wang, 2001), which has been shown to recruit working memory (Klein,
Coles, & Donchin, 1984; Wayman, Frisina, Walton, Hantz, & Crummer, 1992). Perhaps
working memory modulates lower-level (i.e., pitch-processing) abilities in tone-language
speakers via top-down control. The present thesis tests whether musicians and tone-language
speakers perform comparably to each other, and whether both outperform controls, on measures
of working memory and auditory processing (Chapter 6).
1.6 The present investigation
The overall objective of this thesis was to assess whether tone-language experience is
associated with benefits in auditory processing and executive function similar to those associated
with music training. Chapter 2 builds on the promising link between tone language and auditory
processing enhancements via AP, describing a behavioural study that examined how AP and
tone-language experience contribute to music processing and the representation of pitch-relevant
information (i.e., pitch encoding). Musicians with and without AP who speak a tone or non-tone
language were tested to determine if AP, tone-language experience, both, or neither are
associated with enhanced processing and encoding of pitch (i.e., one aspect of spectral acuity).
Although this study addressed the relative contribution of tone-language experience to spectral
acuity, it was difficult to ascertain the independent contributions of music and linguistic expertise
4 However, there are also findings of modality-independent memory enhancement in musicians (Bidelman, Hutka,
et al., 2013; George & Coch, 2011; Jakobson, Lewycky, Kilgour, & Stoesz, 2008).
because all participants were musicians. For this reason, all subsequent studies in the thesis
compared tone-language speakers who were nonmusicians with musicians who had no tone-
language experience and controls who had neither tone-language experience nor music training.
Chapter 3 describes a study that used behavioural tasks and electroencephalography
(EEG) to examine pitch discrimination of musical and vowel sounds by tone-language speakers,
musicians, and controls. This study examined the relation between tone-language experience and
spectral acuity in several ways. First, a mismatch-negativity paradigm was used for sound
discrimination, building on a well-established research literature on auditory processing (see
Näätänen, Paavilainen, Rinne, & Alho, 2007 for a review). Second, the musical and vowel
stimuli allowed for an examination of within- and cross-domain enhancements for the three
groups of participants. Third, the fundamental frequency (pitch) and first formant (timbre) were
manipulated to probe core spectral-processing abilities. EEG was selected as a response measure
not only for its high temporal resolution and cost-effectiveness but also to eliminate the
potentially confounding effects of scanner noise on auditory processing.
Chapter 4 provides a novel theoretical approach to the associations of tone-language
experience and musicianship with perception. To date, linear methods, such as averaging across
EEG trials, have been used to study the association between musicianship and language (e.g.,
Schön et al., 2004). The new theoretical approach applies non-linear analyses of brain activity to
clarify the relative contributions of musicianship and tone language-experience to auditory
processing and complement the linear measures that have been used in previous investigations of
music–language associations.
Chapter 5 applies this theoretical approach to the EEG data, examining non-linear
dependencies in the brain signal over multiple timescales (i.e., multiscale entropy), with the
application of data-driven multivariate statistical analysis known as partial least squares. These
results are then compared to the results of Chapter 3, which examined linear dependencies in the
data (i.e., averaging event-related potentials across trials).
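As background for readers unfamiliar with multiscale entropy, the analysis coarse-grains a signal at successive timescales and computes sample entropy at each scale. The sketch below is a generic illustration, not the thesis's actual analysis pipeline; the parameter choices m = 2 and r = 0.15 are common defaults rather than values reported here, and the tolerance is recomputed per scale for simplicity (canonical implementations fix it from the original signal's standard deviation):

```python
import numpy as np

def coarse_grain(x, scale):
    """Average consecutive, non-overlapping windows of length `scale`."""
    n = len(x) // scale
    return x[:n * scale].reshape(n, scale).mean(axis=1)

def sample_entropy(x, m=2, r=0.15):
    """Sample entropy: negative log of the conditional probability that
    subsequences matching for m points also match for m + 1 points,
    within a tolerance of r times the signal's standard deviation."""
    x = np.asarray(x, dtype=float)
    tol = r * x.std()

    def match_pairs(length):
        # Embed the series in `length` dimensions and count pairs of
        # subsequences whose Chebyshev distance is within tolerance.
        emb = np.array([x[i:i + length] for i in range(len(x) - length)])
        dist = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        return (np.sum(dist <= tol) - len(emb)) / 2  # exclude self-matches

    b, a = match_pairs(m), match_pairs(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

def multiscale_entropy(x, scales=range(1, 6)):
    """Sample entropy of the coarse-grained signal at each timescale."""
    x = np.asarray(x, dtype=float)
    return [sample_entropy(coarse_grain(x, s)) for s in scales]
```

Plotting these per-scale entropy values against timescale yields the entropy curves that the partial-least-squares analysis then contrasts across groups.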
Chapter 6 describes a behavioural study that used spectral-acuity and executive-function
(i.e., pitch memory, working memory) tasks to examine the range of plasticity associated with
tone-language experience. Chapter 7, the final chapter, summarizes the findings, and revisits the
role of nature and nurture in musicianship, as well as the role of nurture in tone language
experience. Potential applications and future directions of the current investigation are also
discussed.
Chapter 2 Absolute Pitch and Tone-Language Experience:
Associations with Pitch Processing and Encoding in Musicians
2.1 Introduction
2.1.1 Definition of absolute pitch
Absolute pitch (AP, also known as perfect pitch) is the ability to identify or produce a specific
pitch without a reference pitch (Baggaley, 1974). Individuals with AP have long-term memory
for musical pitch, remembering the pitches by name (Levitin, 1994). AP depends on pitch
memory and pitch labeling (Levitin, 1994). Pitch memory, which is the ability to maintain and
access stable, long-term representations of specific pitches in memory (Levitin 1994), is
commonly found in nonmusicians as well as musicians (Deutsch, 1987; Halpern, 1989;
Schellenberg & Trehub, 2003; Terhardt & Seewann, 1983; Terhardt & Ward, 1982). For
example, Schellenberg and Trehub (2003) found that university students without AP
distinguished familiar television soundtracks (instrumental) from versions that were pitch-shifted
by two semitones at 70% accuracy and from versions pitch-shifted by one semitone (i.e., the
smallest meaningful pitch difference in Western music) at 58% accuracy.
By contrast, pitch labeling—the ability to attach the correct musical label (e.g., D#, A440,
or Do) to isolated pitches—is rare and necessarily limited to those with music training (Levitin,
1994). In the United States and Europe, the prevalence of AP, and thus pitch labeling, is
estimated to be less than 0.01% (i.e., fewer than one in 10,000 people) in the general population
(Profita & Bidder, 1988). In musically-trained individuals, this percentage has been shown to
vary, depending on the type of institution or music program in which one is enrolled (Gregersen,
Kowalsky, Kohn, & Marvin, 1999). In a large survey of music students in the United States,
24.6% of conservatory students had AP. These numbers dropped to 7.3% in a university-based
school of music, and 4.7% in a liberal arts or state university music program (Gregersen et al.,
1999). AP ability is often associated with musical giftedness, the early onset of music training,
and speaking a tone language (Deutsch et al., 2006; Takeuchi & Hulse, 1993; Ward, 1999).
2.1.2 The link between absolute pitch and language
The prevalence of AP is significantly higher in countries that use a tone language than in
those that do not (Baharloo, Johnston, Service, Gitschier, & Freimer, 1998; Deutsch et al., 2006).
For example, Gregersen et al. (1999) found that among students who reported their ethnic
backgrounds as “Asian or Pacific Islander”, 49.3% of conservatory students, 25.7% of university
music program students, and 8.3% of liberal arts or state university music program students had AP.
These numbers are dramatically higher than what the authors observed in non-Asian music
students, particularly at the conservatory and university-music-program level. This spread
between AP prevalence in Asian versus non-Asian participants was also observed in Deutsch et
al. (2006), who found that approximately 53% of Mandarin-speaking conservatory musicians
possessed AP, whereas only approximately 7% of non-tone-language-speaking conservatory
musicians possessed AP.
The increased prevalence of AP in tone-language speakers may be related to early-
learned associations between pitches and lexical categories in tone-language speakers (Deutsch,
2013; Deutsch & Dooley, 2013; Deutsch et al., 2006; Lee & Lee, 2010). These associations may
increase the likelihood of developing AP, above and beyond the likelihood of
developing AP for a non-tone-language speaker who starts music training early in life.
Some evidence for this claim comes from the finding that, in conservatory music students who
begin music lessons at ages 4 to 5, approximately 60% of tone-language speakers develop AP,
while less than 20% of non-tone-language speakers develop AP (Deutsch et al., 2006). Some
have also found that tone-language (but not non-tone-language) speakers enunciate words in
their native language at a consistent pitch across occasions (Deutsch, Henthorn, & Dolson, 1999, 2004),
suggesting that tone-language speakers may also have more precise pitch memory than those
who speak a non-tone language.
Numerous studies have identified a relationship between AP and neural circuitry
underlying speech processing, specifically, in the planum temporale (PT)—a temporal lobe
region that, in the left hemisphere, contains Wernicke’s area, and is critically involved in
processing semantic meaning in speech (Deutsch & Dooley, 2013). In most individuals, the PT is
leftward-asymmetric (Geschwind & Levitsky, 1968), and this asymmetry is larger in those with
AP as compared to those without AP (Keenan, Thangaraj, Halpern, & Schlaug, 2001; Schlaug,
Jancke, Huang, & Steinmetz, 1995; Zatorre, Perry, Beckett, Westbury, & Evans, 1998).
Furthermore, individuals with AP have greater white matter connectivity in the left PT and
surrounding areas when performing a speech processing task (Oechslin, Meyer, & Jäncke, 2010).
Those with AP also have heightened white-matter connectivity between regions in the left
temporal lobe – an area responsible for speech-sound categorization (Loui, Li, Hohmann,
& Schlaug, 2011).
speech-related areas suggest different neural circuitry underlying their memory for speech
sounds. Taken together, these findings suggest that AP is linked to language as well as to music.
The interconnection across domains is further strengthened by evidence for enhanced
auditory processing in musicians and tone-language speakers. Bidelman, Hutka, et al. (2013)
demonstrated that pitch-processing experience, whether it stemmed from tone language or music
experience, was linked to similar benefits in lower-order (pitch-discrimination sensitivity,
processing speed) and higher-order (tonal memory, melodic discrimination) processes necessary
for robust music perception. This finding indicates that tone-language experience, even in the
absence of music training, can contribute to enhanced performance on music tasks.
2.1.3 The present study
Collectively, the findings on the AP–language link, particularly as related to tone
language, raise the question of how AP and tone-language background could affect behavioural
performance on tasks of music processing and encoding. Addressing this research question
would provide insights into the relationship between music and language as well as the
mechanisms of AP. If AP and tone language contribute to pitch processing and encoding, then
being an AP possessor and a tone-language speaker may yield a cumulative advantage in pitch
processing and encoding. Conversely, if AP and tone language represent independent domains,
they should not interact on tasks of pitch processing and encoding. To test these hypotheses, the
performance of tone-language-speaking musicians with and without AP was compared to that of
non-tone-language-speaking musicians with and without AP.5 A two-by-two between-groups design with zero-back and one-back tasks was used to investigate the processing and encoding of music stimuli.
5 This study has been published as Hutka and Alain (2015).
2.2 Methods
2.2.1 Participants
Thirty-five participants completed the study. Three participants were excluded due to
technical difficulties that resulted in incomplete data sets. Of the remaining 32 participants, there
were 17 females and 15 males, ranging in age from 18 to 28 (mean, M = 22.53; standard error,
SE = 0.51). All participants were instrumental musicians who had a minimum of seven years of
formal training (M = 17.36; SE = 0.78). There was no effect of gender on either accuracy, F(1,
30) = 1.59, p = .217, or reaction time, F < 1. Participants whose primary instrument was voice or
percussion were not recruited. All participants reported normal hearing and normal or corrected-
to-normal visual acuity. Participants were divided into four groups: (1) no AP, tone language; (2)
no AP, non-tone language; (3) AP, tone language, and (4) AP, non-tone language. Participants,
who had an average of 17.25 years of formal education (SE = 0.39), did not differ in education
across groups, F(3, 28) = 1.68, p = .195, η2p = .152. Table 1 displays additional demographic
information for each of the four participant groups. It is notable that participants with AP started
music lessons earlier and had more years of music training than participants without AP. Ethics
approval was granted by the Baycrest Research Ethics Board and the Department of Psychology
Ethics Review Committee at the University of Toronto. Participants were recruited by flyer and
referral, primarily from the University of Toronto’s Faculty of Music. All participants had Grade
Eight Royal Conservatory of Music accreditation or equivalent (i.e., entry requirements for many
university and college music programs) and were comfortable sight-reading in the treble clef.
Participants were compensated $10.00 per hour and were reimbursed for transit or parking. Each
participant completed one test session, which lasted approximately one-and-a-half hours. All
participants provided written, informed consent.
AP ability was assessed for each participant via an AP test created for this experiment.
The AP test consisted of 20 1500-ms piano tones, randomly chosen from across the eight octaves
of a piano keyboard. Tones were generated using Sibelius v3.0, exported as an audio file, and
normalized in Adobe Audition v1.5. Fundamental frequency values were based on an equal-tempered scale (A4 = 440 Hz). After hearing a note, participants wrote down the note name on a response sheet. If participants correctly identified 80% or more of the notes (16 or
more out of 20), they were considered AP-possessors. The range and randomization of the notes
made it difficult for anyone to receive a score of 16 or higher by chance. The 80% cut-off score
is based on procedures described in Ross, Gore, and Marks (2005) and Wu, Kirk, Hamm, and
Lim (2008).
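To illustrate why reaching the 80% cut-off by guessing is implausible, the binomial tail probability can be computed. The sketch below assumes a guessing probability of 1/12 per tone (one of 12 chromatic note names); the thesis does not state this exact chance model, so the figure is illustrative:

```python
from math import comb

# Assumed guessing model: 12 chromatic note names, so p = 1/12 per tone.
p = 1 / 12
n, cutoff = 20, 16  # 20 test tones; 16 correct (80%) required for AP status

# Binomial tail probability: P(X >= 16) = sum of C(20, k) * p^k * (1-p)^(n-k)
p_chance = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(cutoff, n + 1))
print(f"P(score >= {cutoff}/{n} by guessing) = {p_chance:.2e}")
```

Under this model, the probability of scoring 16 or higher by chance is on the order of 10^-14, consistent with the claim that such scores effectively cannot occur by guessing.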
Table 1
Demographic Information for the Four Participant Groups, N = 32.
To be included in a tone-language group, participants had to be native speakers of
Mandarin or Cantonese. A language questionnaire was administered to determine whether
participants were fluent in oral and written Mandarin or Cantonese as well as the age at which
they began to speak English fluently. The average age at which the tone-language speakers
learned English was 4.44 years (SE = 1.05). All tone-language participants reported that they
were still fluent in their native tone language at the time of the study.
Group | Gender distribution | Average age (years) | Average onset age of music training (years) | Average years of formal training | AP test score (% out of 100)

No AP, tone language (n = 7) | 3 females, 4 males | 21.43 (range 18-27, SE = 1.08) | 5.43 (range 2-8, SE = 0.86) | 16.00 (range 13-23, SE = 1.50) | 29.29 (range 5-65, SE = 4.96)

No AP, non-tone language (n = 10) | 4 females, 6 males | 22.00 (range 18-27, SE = 0.90) | 7.10 (range 3-13, SE = 0.72) | 14.90 (range 7-24, SE = 1.26) | 14.50 (range 0-30, SE = 4.15)

AP, tone language (n = 9) | 5 females, 4 males | 22.67 (range 19-27, SE = 0.95) | 3.72 (range 2-5, SE = 0.76) | 18.94 (range 15-23, SE = 1.32) | 96.67 (range 85-100, SE = 4.38)

AP, non-tone language (n = 6) | 5 females, 1 male | 24.50 (range 21-28, SE = 1.16) | 3.83 (range 2-6.5, SE = 0.93) | 20.67 (range 16-25, SE = 1.62) | 98.33 (range 90-100, SE = 5.36)
2.2.2 Materials
Auditory and visual stimuli were presented using Presentation v.14.1 under Windows XP.
This bi-modal task was chosen to create a realistic performance environment for the participants.
During performances, musicians usually read printed music while integrating auditory material
that matches or does not match what is on the printed page. Language-related stimuli (e.g.,
speech sounds) were not included to focus on musical pitch processing abilities.
All auditory stimuli had a sampling rate of 44.1 kHz with 16-bit resolution and were presented binaurally via Eartone 3A insert earphones (Indianapolis, IN) at an average level of 75 dB sound pressure level (SPL). The intensity of the stimuli was measured using a Larson-Davis sound level meter (Model 824, Provo, Utah). The music stimuli were presented in piano
timbre. There were three auditory, music-stimulus conditions: Interval, Tonal, and Atonal. The
interval stimuli consisted of two melodic (i.e., sequentially-presented) piano tones, with a total
duration of 2 s. Intervals ranged from perfect unison to an octave above or below a given note
(including perfect unison, minor second, major second, minor third, major third, perfect fourth,
augmented fourth, perfect fifth, minor sixth, major sixth, minor seventh, major seventh, and
perfect octave). There were two of each interval type, one ascending (i.e., lower note to higher note) and one descending (i.e., higher note to lower note), with the exception of the two perfect unison intervals, which have no direction, comprising a total of 26 interval stimuli. Each interval started on a different note. The tonal condition consisted of eight piano
tones, with a total duration of 5000 ms, arranged in a short melody. Tonal melodies followed the
scale pattern that defines the diatonic, major scale (all flat and sharp keys represented) and
harmonic minor scale in Western music. There were 60 different tonal melodies presented to
participants. The atonal condition consisted of eight piano tones with a total duration of 5000 ms, arranged in a short melody. There were 12 atonal melodies in total. The starting tone
for each atonal melody began on a unique note name (i.e., C, C#, D, D#, E, F, F#, G, G#, A, A#,
or B). The subsequent tones in the melody were selected to ensure they did not follow any tonal
conventions of Western musical theory (i.e., did not follow minor or major scale patterns;
featured disjunct leaps (e.g., major 7th) and/or highly chromatic passages). Music stimuli (both
auditory and visual) were created for this study using Sibelius 3.0. Auditory, non-music (i.e.,
control) stimuli consisted of 11 complex, environmental sounds, such as the sound of a keyboard
typing, presented for 1 s (see Appendix, Table S1 for complete list). All non-music stimuli
(auditory and visual) were part of a laboratory database of auditory and visual stimuli.
All visual stimuli were presented right-side-up for 4000 ms, in the middle of a computer
screen. Participants were seated approximately 85 cm from the computer screen. Music stimuli
consisted of quarter notes presented on a stave in the treble clef. Non-music stimuli consisted of
the visual analogue of the corresponding auditory, non-music stimuli (e.g., a picture of a dog as
the visual analogue of the sound of a dog barking). The inter-stimulus interval (ISI) was 2 s.
The musical stimuli were selected for their ecological validity, representing a range of
music stimuli that a musician might encounter. As Dooley and Deutsch (2011) note, some
studies found that AP possessors are subject to Stroop-like interference effects in artificial
situations, leading to the conclusion that AP is musically irrelevant (e.g., Miyazaki, 1993;
Miyazaki & Rakowski, 2002). Examples of artificial situations include the use of detuned
intervals or movable-do labels (see Dooley and Deutsch, 2011 for additional discussion). Such
artificial stimuli were avoided in the present study.
2.2.3 Procedure
Following completion of the AP test in a sound-attenuating booth, participants completed
a zero-back and one-back task (counterbalanced). Participants were familiarized with each task
by completing 20 practice trials of the zero- and one-back tasks, respectively. Prior to beginning
each familiarization task, participants were instructed to read all music stimuli from left to right.
Participants were also instructed to respond as quickly and as accurately as possible upon
deciding if the two stimuli did or did not match because they only had 4 s to make their response
after the visual stimulus was presented. Participants were then presented with either music
(interval, tonal, atonal) or non-music stimuli. After a 2-s ISI, in which a white fixation-cross was
presented mid-screen, a music or non-music visual stimulus was presented.
In the zero-back task, the visual stimulus matched or did not match the target auditory
stimulus. Participants indicated “match” or “mismatch” of the stimuli by pressing the
corresponding button on the keyboard. Participants completed 121 zero-back trials. In the one-
back task, participants were presented with an auditory stimulus followed by a visual distractor
(presented for 4 s), which was followed by a visual stimulus (presented for 4 s). The stimuli and
the distractor could be either non-music or music. Participants had to verbally identify the visual
distractor while keeping the preceding auditory stimulus in memory. For music stimuli,
participants identified the first note name in an interval or melody; for non-music stimuli,
participants identified the image on the monitor. A visual distractor was chosen over an auditory
distractor (which would vary in presentation length, depending on whether it was non-music, an
interval or a melody) such that the presentation duration of the distractor was uniform. The visual
stimulus presented after the distractor either matched or did not match the auditory stimulus.
Participants indicated “match” or “mismatch” of the stimuli by pressing the corresponding button
on the keyboard. Participants completed 60 one-back trials.
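To make the trial structure concrete, the one-back logic described above can be sketched as follows. This is an illustrative reconstruction, not the actual Presentation script; the stimulus names, the 50% match rate, and the helper function are assumptions:

```python
from dataclasses import dataclass
import random

@dataclass
class OneBackTrial:
    auditory: str    # target: interval, tonal melody, atonal melody, or non-music sound
    distractor: str  # visual distractor, shown for 4 s and identified verbally
    visual: str      # visual probe, shown for 4 s; match/mismatch response within 4 s
    match: bool      # whether the probe matches the auditory target

def make_one_back_trial(stimuli, rng):
    """Build one one-back trial: auditory target, visual distractor, visual probe."""
    target = rng.choice(stimuli)
    match = rng.random() < 0.5  # assumed 50% match rate
    probe = target if match else rng.choice([s for s in stimuli if s != target])
    return OneBackTrial(target, distractor=rng.choice(stimuli), visual=probe, match=match)

rng = random.Random(0)
stimulus_types = ["interval", "tonal melody", "atonal melody", "non-music"]
one_back = [make_one_back_trial(stimulus_types, rng) for _ in range(60)]  # 60 trials
print(len(one_back), "one-back trials generated")
```

The zero-back task omits the distractor field: the visual probe immediately follows the auditory target.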
2.2.3.1 Statistical analyses
Accuracy and response times were analyzed using a mixed factorial-design analysis of
variance (ANOVA). The between-subjects variables were AP status (AP or no AP) and tone-
language status (tone-language speaker6 or not a tone-language speaker). The repeated measures
were stimulus type (music vs non-music stimuli) and load (0-back vs 1-back). When appropriate,
degrees of freedom were adjusted with the Greenhouse-Geisser epsilon (ε), and all reported
probability estimates are based on the reduced degrees of freedom, although the original degrees
of freedom are reported. The Bonferroni correction for multiple comparisons was applied to pairwise comparisons. Statistical significance was set at alpha = 0.05. Partial eta-squared
(η2p) was used as the measure of effect size for ANOVAs.
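As a concrete illustration of the Bonferroni procedure applied to pairwise comparisons, the sketch below multiplies each uncorrected p-value by the number of comparisons (capped at 1). The data are simulated; the condition means loosely mirror the four stimulus conditions and are not the thesis data:

```python
from itertools import combinations
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Simulated accuracy (percent correct) for 32 participants per condition;
# illustrative values only, not the thesis data.
conditions = {
    "control": rng.normal(98.8, 1.0, 32),
    "tonal": rng.normal(94.5, 3.0, 32),
    "atonal": rng.normal(91.8, 4.0, 32),
    "interval": rng.normal(94.4, 3.0, 32),
}

pairs = list(combinations(conditions, 2))
m = len(pairs)  # number of pairwise comparisons (6)
for a, b in pairs:
    t, p = stats.ttest_rel(conditions[a], conditions[b])  # within-subject comparison
    p_bonf = min(1.0, p * m)  # Bonferroni adjustment
    print(f"{a} vs {b}: uncorrected p = {p:.4g}, Bonferroni p = {p_bonf:.4g}")
```

The cap at 1.0 keeps adjusted values interpretable as probabilities; a correction of 6 comparisons makes the effective per-test alpha 0.05/6 ≈ .0083.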
An analysis of covariance was also run on accuracy and reaction-time data because of a
significant group difference in age of onset of music lessons7, F(3, 28) = 4.32, p = .013, η2p =
.316. This effect was driven by tone-language speakers with AP having a younger mean age of
onset of music lessons (M = 3.72 years, SE = 0.76 years) than non-tone-language-speakers
without AP (M = 7.10 years, SE = 0.72 years; p = .013). As noted, early onset of music training
promotes the acquisition of AP (Baharloo et al., 1998; Miyazaki & Rakowski, 2002; Deutsch et
al., 2006, 2009). Two age-related factors have been speculated to be associated with the
6 There were no differences between Cantonese and Mandarin speakers’ performance on any of the dependent measures (accuracy, reaction time), Fs < 1.

7 Due to the challenges associated with finding participants who fit all of the eligibility criteria, it was difficult to match participants on this measure.
development of AP: the age of onset of formal music training and exposure to a “fixed-do”8
musical system before age 7 (Gregersen et al., 1999; Gregersen, Kowalsky, Kohn, & Marvin,
2001; Gregersen, Kowalsky, & Li, 2007). The assumption of homogeneity of regression slopes
was tested for each ANCOVA to ensure that the assumption was not violated.
2.3 Results
2.3.1 Accuracy
Figure 1 shows group mean accuracy across stimulus type and load. There was a
significant main effect of AP status, F(1, 28) = 18.990, p < .001, η2p = .404. Specifically,
participants with AP were more accurate than those without AP. There was no significant main
effect of tone-language-speaker status (F(1, 28) = 1.017, p = .332), nor a significant interaction
of AP and tone-language status (F < 1).
There was a significant effect of stimulus condition, F(3, 84) = 18.227, p < .001, η2p = .394, which was driven by the significant difference between performance on the control
stimuli (M = 98.77, SE = 0.38) and the other conditions, specifically the tonal melody (M =
94.51, SE = 0.82, p < .001), atonal melody (M = 91.81, SE = 1.98, p < .001), and interval
conditions (M = 94.42, SE = 0.83, p < .001). There was also a marginally significant difference
between tonal and atonal melodies, p = .081. There was a significant interaction between
condition and AP status, F(3, 84) = 4.723, p = .008, η2p = .144. Specifically, participants with AP significantly outperformed participants without AP for the tonal condition (AP: M = 98.201, SE = 1.150; no AP: M = 90.795, SE = 1.080, p < .001), the atonal condition (AP: M = 94.598, SE = 1.668; no AP: M = 88.936, SE = 1.567, p = .020), and the interval condition (AP: M = 97.742, SE = 1.208; no AP: M = 91.190, SE = 1.135, p < .001). For the non-musical condition, AP participants (M = 99.394, SE = .532) marginally outperformed the non-AP participants (M = 98.068, SE = .500, p = .079). There was no significant interaction between condition and tone-language status (F(3, 84) = 1.099, p = .347), nor a significant interaction among condition, AP status, and tone-language status, F < 1.

8 Note that the fixed-do system (i.e., the absolute association of labels with note names, such as “do” with “C”) contrasts with the relative, “moveable-do” system, in which a label (e.g., do) refers to the starting pitch of a given musical scale (Schellenberg & Trehub, 2008). It is notable that China uses the moveable-do system (Schellenberg & Trehub, 2008); thus, the higher rates of AP in Chinese music students (e.g., Deutsch et al., 2006; Gregersen et al., 1999) cannot be attributed to the absolute associations used in the fixed-do system.
There was no difference in accuracy between the zero- and one-back tasks, nor a significant interaction between task type (zero- versus one-back) and AP status, Fs < 1. Similarly, there
were no significant interactions between task type and tone-language status (F(1,28) = 1.216, p =
.279), nor between task type, AP status, and tone-language status, F(1,28) = 1.093, p = .305. No
other interactions were significant, Fs < 1.
2.3.1.1 ANCOVA
Significant group differences remained after controlling for the age of onset of formal
music training, F(3, 27) = 4.35, p = .013, η2p = .326. The covariate, age of onset of formal music
training, was unrelated to accuracy, F < 1.
Figure 1. Group mean accuracy performance. **p < .001. Error bars indicate SE.
2.3.2 Response Time
Figure 2 shows group mean response time across stimulus type and load. There was a
main effect of AP status, F(1, 28) = 15.216, p = .001, η2p = .352. Specifically, participants with
AP were faster than those without AP. There was no significant main effect of tone-language-
speaker status, nor a significant interaction of AP and tone-language status (Fs < 1).
There was a significant difference in reaction times across stimulus conditions, F(3, 84) =
130.178, p < .001 , η2p = .823, which was driven by the significant difference between the
control stimuli (M = 1074, SE = 42) and the other three conditions, specifically the tonal melody
(M = 1886, SE = 50), atonal melody (M =1966, SE = 50), and interval conditions (M = 1475, SE
= 60), all p < .001. The tonal and atonal conditions were also significantly slower than the
interval condition (both p < .001), which is to be expected given that the latter condition contained only two notes. There was no interaction between condition and AP status nor between condition and
tone language status, Fs < 1. The interaction between condition, AP status, and tone language
status was also not significant, F(3, 84) = 1.571, p = .202.
Participants performed faster on the zero-back (M = 1560, SE = 39) than on the one-back
task (M = 1640, SE = 46), F(1, 28) = 8.07, p = .008, η2p = .224. There were no interactions between task type (zero- versus one-back) and AP status (F(1, 28) = 2.602, p = .118) nor with tone-language status, F < 1. The interaction between task type, AP status, and tone-language status was not significant, F < 1, nor was the interaction between task type and condition (F(3,
84) = 1.804, p = .166). Lastly, the interaction between task type, condition, and AP status was
not significant, F(3, 84) = 1.107, p = .344.
Figure 2. Group mean reaction time performance. **p < .01. Error bars indicate SE.
2.3.2.1 ANCOVA
The group differences remained significant even after controlling for age of onset of
formal music training, F(3, 27) = 3.73, p = .023, η2p = .293. The covariate, age of onset of formal training, was unrelated to reaction time, F < 1.
2.4 Discussion
Adults with AP outperformed those without AP on measures of accuracy and response
time. Specifically, AP participants were more accurate across musical conditions than non-AP
participants, regardless of tone-language status. AP participants were also significantly faster
than their non-AP counterparts, when averaging across all conditions. There was no advantage of
having AP and speaking a tone language. Although speaking a tone language may increase the
likelihood of AP (Deutsch et al., 2006), AP rather than tone-language experience seemed to be
the main source of the pitch-encoding advantage in the present study. This pattern of results
remained significant even when controlling for the age of onset of music lessons.
The advantage in pitch processing and encoding observed for AP musicians may stem
from their use of both pitch-labeling and pitch-memory skills, as compared to the musicians
without AP, who only use pitch memory (Levitin, 1994). Other studies have found that AP
possessors performed better than non-AP possessors on tasks such as music-dictation and
interval-naming (Dooley & Deutsch, 2010, 2011), which presumably benefit from a combination
of pitch memory and pitch labelling skills (versus only pitch memory). This is not to say that
both groups’ pitch memory abilities are equal, and that only pitch labeling contributed to the
observed benefits in the former group. Perhaps AP musicians have better pitch memory than
those without AP, which, through interaction with the ability to label pitches, gives these
participants two powerful cues (i.e., pitch and the label) to use for encoding sound. In
comparison, those without AP may only use the pitch memory cue, which is not developed to the
same extent as pitch memory in AP musicians.
The present benefits observed in AP musicians may also be related to an association
between auditory digit span and AP ability. For instance, Deutsch and Dooley (2013) found that
AP possessors had a larger auditory digit span than their non-AP counterparts. According to
these authors, a large auditory span facilitates the development of associations between pitch and
verbal labels in early life, promoting AP acquisition. It is also possible that a larger auditory span
is a consequence of AP. Unfortunately, it would be difficult, if not impossible, to test the
causality underlying this association between auditory span and AP because one cannot
randomly assign participants to an AP or non-AP group. That is, the development of AP may be influenced by a number of factors, such as genetic influences (Gregersen et al., 1999; Gregersen,
Kowalsky, Kohn, & Marvin, 2001), early age of onset of formal music training and early
exposure to a fixed-do musical system (Gregersen et al., 1999, 2001, 2007), and speaking a tone
language (e.g., Deutsch et al., 2006; Gregersen et al., 1999). Therefore, AP cannot simply be
“trained”. Regardless of whether AP is the consequence of a larger auditory span, or vice versa,
this increased span may underlie the better behavioural performance observed in the present AP
participants. Future studies could include a measure of auditory span to establish the association
between task performance and span size, as well as an early age of onset of formal music training
and early exposure to a fixed-do musical system (Gregersen et al., 1999, 2001, 2007).
2.4.1 Mechanisms underlying performance in AP possessors
What mechanisms can account for the superior performance of individuals with AP
relative to those without AP? Four potential mechanisms include: hyper-connected temporal-lobe regions required for the perception and association of pitch (Loui et al., 2011); increased functional activity in superior temporal regions that are critical for sound perception and categorization (Loui, Zamm, & Schlaug, 2012); a gradient of AP ability (Miyazaki, 1988, 1990; Bermudez & Zatorre, 2009); and, relatedly, heightened tonal memory in certain types of AP possessors (Ross et al., 2005; Loui et al., 2011). First, musicians with AP show
higher white matter connectivity in brain regions responsible for pitch perception and
association, such as the posterior superior and middle temporal gyri, in both the left and right
hemisphere (Loui et al., 2011). Furthermore, AP musicians, as compared to musicians without
AP, showed increased functional activations in superior temporal regions critical for sound
perception and categorization, as well as increased activations in multisensory-integration areas
(Loui et al., 2012). The structural and functional differences between AP and non-AP musicians
may account for the current group differences in performance.
The behavioural outcomes of the present study may also be impacted by mechanisms
related to a spectrum of AP ability. That is, AP is not a binary trait (Loui et al., 2011), instead
reflecting a continuum of skill (Bermudez & Zatorre, 2009). Baharloo et al. (1998) categorized a
sample of AP musicians into four groups, according to where participants’ performance fell on a
distribution based on pure-tone and piano-timbre-based AP test scores. In their study, the scores of 12 musicians without AP on pure-tone- and piano-tone-based AP tests were combined with the scores of 12 randomly selected musicians with AP on the same tests. The mean pure- and
piano-tone test scores, and the standard errors for these means, were calculated. This process was
repeated 100 times, and the means of these 100 means and standard errors were calculated for
the pure-tone and piano-tone test scores (i.e., bootstrapping). Participants were classified as “AP-1” based on their performance on pure-tone AP tests (i.e., the ability to label any pitch regardless of its timbre or other attributes). According to Baharloo et al., “AP-2” and “AP-3”
groups included participants who likely had AP (i.e., strong, but not outstanding, pure-tone AP
test performance; note that the difference between AP-2 and AP-3 was not clearly outlined). The
AP-4 group included participants whose pitch-perception for pure tones was worse than the
performance of AP-1, AP-2, and AP-3 participants, despite excellent piano-tone AP test
performance. This pattern of performance distinguished AP-4 participants from the three other
groups, raising the possibility that the basis of AP-4 may differ from the other AP types. Other
studies have also found that participants who perform poorly on pure-tone-based AP tests
typically perform better when the AP-test tones use instrumental timbres and more familiar tones
(e.g., the white keys on a piano, Miyazaki, 1988, 1990). The current AP test could not reveal
which individuals had AP-1, AP-2, AP-3, or AP-4 because it used piano tones instead of pure
tones. If one could classify current task performance according to AP type, group differences in
performance might be observed. Indeed, some studies have found that AP type is related to
differences in tonal memory (Ross et al., 2005; Loui et al., 2011). For example, individuals with
“weaker” (i.e., non-AP-1) types of AP reportedly have heightened tonal memory, as contrasted with AP-1 possessors’ ability to encode the pitch of any auditory stimulus (Ross et al., 2005; Loui
et al., 2011). Assuming more than one type of AP possessor was included in the present sample,
different encoding processes might have contributed to the observed behavioural outcomes.
Future studies using sine-wave tones or a combination of sine wave and instrumental tones could
identify how performance on pitch processing and encoding tasks varies as a function of AP-
type. I predict that the present results would also be found using pure-tone rather than instrumental stimuli, assuming a homogeneous AP-1 participant group.
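The resampling procedure attributed to Baharloo et al. (1998) above can be sketched in code. The score distributions below are invented for illustration; only the procedure (combine the 12 non-AP scores with 12 randomly drawn AP scores, repeat 100 times, then average the resampled means and standard errors) follows the description:

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented pure-tone AP test scores (percent correct), for illustration only.
ap_scores = rng.normal(85, 8, 40)        # musicians with AP
non_ap_scores = rng.normal(30, 10, 12)   # the 12 musicians without AP

n_reps = 100
means, ses = [], []
for _ in range(n_reps):
    # Combine the 12 non-AP scores with 12 randomly selected AP scores
    sample = np.concatenate([non_ap_scores, rng.choice(ap_scores, 12, replace=False)])
    means.append(sample.mean())
    ses.append(sample.std(ddof=1) / np.sqrt(len(sample)))  # standard error of the mean

# Means of the 100 resampled means and standard errors
print(f"mean of means = {np.mean(means):.2f}, mean SE = {np.mean(ses):.2f}")
```

The resulting distribution of resampled means provides the reference against which individual participants' scores can be located, which is how the AP-1 through AP-4 categories were derived.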
2.4.2 No cumulative advantage of AP and tone-language experience
Interestingly, tone language and AP did not afford a cumulative advantage in pitch
processing and encoding. Other studies have examined the joint effects of music training
(granted, without special consideration of AP) and tone language. Cooper and Wang (2012)
recruited tone-language (Thai) and non-tone-language (English) speakers, who were musicians
or nonmusicians, to complete a Cantonese tone-word training task. Participants were trained to
identify words distinguished by five Cantonese tones. Participants who spoke Thai and/or were
musicians were better at Cantonese word learning. However, having both tone-language
experience and music training was not advantageous above and beyond either type of experience
alone. Similarly, Mok and Zuo (2012) investigated how music training affected native speakers
of a tone language. The authors had Cantonese and non-tone-language speakers with or without
music training perform discrimination tasks with Cantonese monosyllables and pure tones
resynthesized from Cantonese lexical tones. Although music training enhanced lexical tone
perception for non-tone-language speakers, it had little effect on the Cantonese speakers,
suggesting no joint advantage of music training and tone language experience on performance.
2.4.3 Speed-accuracy trade-off
There was no significant difference in accuracy on the zero-back versus one-back tasks, but reaction times were significantly slower for the one-back task as compared to the zero-back task. The increased reaction time for the one-back task suggests that the manipulation of increasing load was successful. The difference in reaction time but not accuracy may be explained by the speed-accuracy trade-off often seen in behavioural experiments (Zhai, Kong, & Ren, 2004).
Alternatively, it is possible that there were ceiling effects in the accuracy data. Because the
musicians were highly trained, they may have been highly skilled at identifying melodies,
intervals, and non-music sounds, with any group differences restricted to reaction time. The need
to refer back to music notation (e.g., observing that a certain passage must be repeated) is a
critical element of versatile music performance. Even a single “distractor” (i.e., in the one-back
task) may have been insufficiently distracting to participants. Further increases in memory load
(e.g., two-back task) could avoid ceiling effects in highly trained musicians.
2.4.4 Limitations
It is important to note the limitations of the current study. As mentioned earlier, future
studies could include pure tones, in addition to piano tones, in the AP test and stimuli set. This
would allow for the investigation of how different types of AP (e.g., AP-1, AP-2, etc.) are related
to task performance. The stimuli set could also benefit from more complex control stimuli, such
as a series of non-musical, complex sounds and corresponding visuals, as opposed to a single
sound and visual. Participants’ higher accuracy and faster reaction times for the control stimuli
might be due to the current stimuli being too simple (i.e., easier to encode as compared to the
musical conditions, which all included more than one item). More complex control stimuli, on
par with the complexity of the music stimuli, would make the comparison of musical and non-
musical stimuli more meaningful. That is, one could more clearly study whether the current
encoding effects are specific to a single category of stimuli (i.e., musical or non-musical),
perhaps speaking to the specificity of AP sound-encoding advantages.
There were also no measures of general cognitive ability in the present study. Although no studies have examined the relationship between IQ and AP, there is evidence for an association
between music training and IQ scores (e.g., Schellenberg, 2004). For example, Schellenberg
(2004) showed that children who received 36 weeks of music training exhibited slightly greater
increases in full-scale IQ, as compared to children in control groups (drama lessons or no
lessons). Furthermore, Schellenberg (2011a) posited that children with high IQs are more likely
than their lower-IQ peers to take music lessons and perform well on a variety of tests of
cognitive ability. Therefore, if children with higher IQ are more likely to take music lessons than
their lower-IQ peers, and AP development is associated with early exposure to music training
(Gregersen et al., 1999, 2001, 2007), then high-IQ children might be enrolled in music lessons earlier than their peers and thus have an increased opportunity to develop AP. Future studies
could include IQ measures to further explore the association between intelligence, age of onset
of music training, AP, and behavioural performance on processing and encoding tasks.
Finally, another limitation of this study was its small sample size. Recruitment was
challenging due to the specific eligibility requirements (highly-trained musicians, language
requirements, AP status). Future recruitment might be more feasible by collaborating with other
investigators or conducting online testing. The latter option is becoming increasingly popular in
psychology (Athos et al., 2007; Owen et al., 2010), including auditory research (Honing &
Ladinig, 2008), making it a potentially viable option for future behavioural studies on AP and
tone-language use. A larger sample size would have also allowed for testing whether speaking a
tone language with more tones (i.e., Cantonese) was associated with greater benefits to auditory
processing than one with fewer tones (i.e., Mandarin), via higher task demands in the former
case. Though Cantonese and Mandarin speakers did not differ in performance in the present
study, such differences might emerge, given a larger sample size.
2.5 Conclusion
The present results suggest that AP ability confers an advantage in processing and
encoding the music stimuli tested in this study. Tone-language use alone did not provide an advantage, and combining tone-language use with AP did not provide a pitch-encoding advantage beyond that afforded by AP alone. Therefore, although speaking a tone language is one factor that may increase the likelihood of AP (Deutsch et al., 2006), AP, rather than tone-language use, may be the main source of any pitch-encoding advantages. Other factors that may increase the likelihood of AP include genetic predispositions (Gregersen et al., 1999, 2001) and an early age of onset of formal music training, specifically early exposure to a fixed-do musical system (Gregersen et al., 1999, 2001, 2007).
When connecting these findings to the association between tone language and auditory
processing, it is important to note that though studying tone-language speakers is a good model
for comparisons with musicians due to both groups’ enhanced pitch acuity, the link between
tone-language speakers and AP may be limited. As discussed in Levitin and Rogers (2005),
whether or not one possesses absolute pitch is largely irrelevant to most musical tasks that
require relative (not absolute) pitch judgements. Furthermore, though this study addresses the
relative contribution of tone-language use to auditory processing, it is difficult to tease apart the
relative contributions of music and speech processing, as all participants were musicians. These
limitations demonstrate the need to test tone-language speakers who are nonmusicians, musicians
with no tone-language experience and no AP ability, and controls with neither tone-language
experience nor music experience. These questions were addressed in the subsequent study, which
used behavioural and EEG techniques.
Chapter 3 Music Training and Tone-Language Experience:
Associations with Sound Discrimination
3.1 Introduction
3.1.1 Revisiting the shared processing of music and speech
The shared, interactive processing among brain regions activated during music and
linguistic processing has been studied extensively (e.g., Bidelman et al., 2011a; Koelsch et al.,
2001; Maess et al., 2001; Merrill et al., 2012; Sammler et al., 2013; Slevc et al., 2009). However,
this overlap is not necessarily surprising, considering that any single neural region or structure is
often involved in several processes in the human brain (Anderson, 2010). Especially given the
similarities between music and speech (e.g., acoustic perception; motor production), one might
expect to observe overlapping neural regions in the processing of music and speech (Patel, 2014;
Peretz et al., 2015).
Furthermore, the co-activation of neural regions does not, by default, translate to the
sharing of neural circuitry for music and speech processing (see Peretz et al., 2015 for a review).
For example, Sammler et al. (2013) studied the role of the superior temporal gyri (STGs) in processing syntax in music and language, using intracranial recordings in temporal-lobe epilepsy patients. Though there was overlapping, bilateral activation of the STGs in both domains, there were also differences in hemispheric timing and in the involvement of frontal and temporal regions. Two
recent studies show a similar dissociation between co-activation and shared neural circuitry in
the processing of music and speech, using different neuroimaging methods (multi-voxel pattern
analysis, Merrill et al., 2012; fMRI adaptation, Armony, Auge, Angulo-Perkins, Peretz, &
Concha, 2015). These studies demonstrate that one can observe unique neural populations
associated with music and speech processing in brain regions shared by these domains (Peretz et
al., 2015). Though such studies provide compelling evidence for differential neural circuitry in
shared brain regions for music and speech processing via converging neuroimaging techniques,
the evidence for such differential neural circuitry is still scarce (Peretz et al., 2015). Indeed,
Peretz et al. (2015) still consider the question of overlap between music and speech processing
as an “open question for the field” (p. 5).
3.1.2 The present study
Evidence for the shared processing of music and speech raises the question of whether
music training and tone-language experience are associated with similar benefits to the auditory
processing of music and speech. There is some evidence to suggest that these two groups process
musical and linguistic pitch similarly at the subcortical level (Bidelman et al., 2011a, 2011b; note
that there were some subtle between-group differences in subcortical representations9). However,
behavioural findings that support a positive association between tone language and auditory
processing are mixed, as discussed in Chapter 1. Promising evidence for such an association
comes from Bidelman, Hutka et al. (2013), who found that musicians and Cantonese speakers
outperform controls on a variety of auditory tasks, such as a pitch discrimination task.
To date, the cortical mechanisms that may subserve the auditory processing similarities
between musicians and tone-language speakers have yet to be explored. The neural mechanisms
underlying shared enhancement in music and speech are proposed to be rooted in the auditory
system’s two-way feedback, such that descending, corticofugal projections from the cortex tune
subcortical circuits, while ascending projections from the subcortical regions tune cortical
circuits (cf. the reverse hierarchy theory of auditory processing, Ahissar et al., 2009; see Patel,
2011 for a discussion of this cortical-subcortical interplay). The influence of tone-language
experience on musical pitch processing may be shaped by this interplay. Indeed, tone-language
speakers have enhanced brainstem representation of pitch information (musical interval and
lexical tone, Bidelman et al., 2011a; tuned and detuned musical chords, Bidelman et al., 2011b)
comparable to the representations of musicians.
To better understand how tone-language experience and music training are associated
with auditory processing, the present study compared cortical neuroelectric activity elicited by
music and speech sounds in English-speaking musicians (i.e., extensive experience with musical
pitch), Cantonese speakers (i.e., extensive experience using pitch to distinguish lexical meaning),
9 For example, in Bidelman (2011a), Mandarin speakers exhibited greater pitch strength than nonmusicians, when
processing the rapid pitch changes in a tone language speech token (Mandarin tone 2, T2). Musicians exhibited greater pitch strength than the Mandarin speakers when processing a musical interval (major third), particularly on the onset of the second note of the interval. Musicians also showed greater pitch strength to two T2 sections that corresponded to a musical note in a diatonic scale. These findings were interpreted to suggest that brainstem responses are differentially shaped according to the salience of a given acoustic dimension to one’s domain (i.e., pitch processing in music versus a tone language; Bidelman et al., 2011a, p. 432).
and English-speaking nonmusicians (lacking experience both with musical pitch and with using pitch linguistically at the lexical level).10 Critically, musicians and, to a lesser degree, Cantonese speakers have pitch-perception experience that nonmusicians without tone-language experience lack. These eligibility criteria ensured that there was minimal overlap
between each group’s domains of pitch processing experience.
3.1.3 Electroencephalography: Components of interest
The mismatch-negativity (MMN) response was the objective assay of early cortical
sensitivity to music and speech sounds in this study. The MMN is a prominent component of an
event-related potential (ERP), serving as a neural index of detection of auditory change that is
thought to reflect early (i.e., bottom-up) processing in the auditory cortices (Näätänen,
Paavilainen, Rinne, & Alho, 2007). Previous studies have shown that changes in complex sounds
evoke larger MMN responses in musicians than in nonmusicians (Brattico et al., 2009; Brattico,
Tervaniemi, Näätänen, & Peretz, 2006; Fujioka, Trainor, Ross, Kakigi, & Pantev, 2004, 2005;
Koelsch, Schroger, & Tervaniemi, 1999; Tervaniemi, Rytkonen, Schroger, Ilmoniemi, &
Näätänen, 2001), indicating an advantage in automatic auditory processing (Koelsch et al.,
1999). In the current study, comparisons of MMN responses elicited by contrastive speech and
musical sounds in musicians and tone-language speakers made it possible to assess the degree to
which divergent forms of pitch experience influence early cortical auditory processing related to
speech and musical sound analysis.
In addition to the MMN, the P3a and late discriminative negativity (LDN; a sustained,
late-emerging, slow-wave component) were examined. The P3a sometimes follows the MMN
and is characterized by a frontocentrally distributed positive deflection thought to reflect an
involuntary attentional switch towards the deviant stimulus (Tervaniemi, Just, Koelsch,
Widmann, & Schröger, 2005; for reviews, see Escera, Alho, Schroger, & Winkler, 2000, and
Polich, 2007) and/or updating of working memory (Donchin & Coles, 1988; Polich, 2007). Past
work with passive listening has shown that musicians’ P3a response to sound habituates between
trial blocks, while nonmusicians show enhancement of the P3a between blocks (Seppanen,
Pesonen, & Tervaniemi, 2012). This difference in short-term plasticity between musicians and
10 This study has been published as Hutka, Bidelman, and Moreno (2015).
nonmusicians reveals that musicianship is associated with enhanced attentional abilities and
auditory feature encoding (Seppanen et al. 2012).
There are two common interpretations of the functional role of the LDN, both of which
imply top-down influences on auditory processing. Specifically, the LDN has been interpreted as
an index of automatic reorienting of attention following distraction by a deviant sound
(Shestakova, Huotilainen, Ceponiene, & Cheour, 2003; Wetzel, Widmann, Berti, & Schroger,
2006) and as an index of the regulation of higher-order auditory processing that follows the initial change
reflected by the MMN (Ceponiene et al., 2004; Horvath, Roeber, & Schroger, 2009; Putkinen,
Tervaniemi, & Huotilainen, 2013). Of interest, a recent report demonstrated that the LDN is
influenced by music training and language experience (Moreno, Lee, et al., 2014).
All participants were tested in two conditions, with a contrast in musical notes (differing only in pitch) or vowel stimuli (differing only in first-formant frequency) presented in separate blocks. This paradigm allowed us to test both within-domain (e.g., the note condition for the musician group; the vowel condition for the Cantonese group) and cross-domain (e.g., the note condition for the Cantonese group; the vowel condition for the musician group) auditory processing in the brain, as well as to examine pitch versus spectral (timbre) discrimination
enhancements in musicians and tone-language speakers. The magnitude of stimulus change
varied over two levels (large vs. small sound contrasts). Stimuli were presented in a multiple
oddball paradigm (e.g., Näätänen, Pakarinen, Rinne, & Takegata, 2004) to determine how music
and language experience are associated with sound discriminations of different complexity. In
addition to the electrophysiological responses, behavioural measures of pitch (fundamental
frequency, F0) and vowel (first formant, F1) discrimination were obtained. These tasks assessed
listeners’ perceptual acuity for changes in sound features within music and speech domains.
3.1.4 Hypotheses
If extensive experience with pitch and spectral information as a result of music training
and tone-language usage shapes the auditory system in similar ways, then both musicians and
Cantonese speakers would show enhanced MMN (discrimination), LDN (attentional
reorienting), and behavioural sound discrimination relative to nonmusicians. If this hypothesis
holds, then one would predict that neural and behavioural enhancements would extend across
stimulus types (i.e., compared to controls, musicians would demonstrate enhanced processing of
speech – namely, vowels, and Cantonese speakers would demonstrate enhanced processing of
musical sounds). These outcomes would suggest that both music and tone-language experience
are associated with superior automatic as well as top-down auditory processing (i.e., a
bidirectional association; Bidelman et al., 2011a; Bidelman, Hutka, et al., 2013).
3.2 Methods
3.2.1 Participants
Sixty-seven participants were recruited from the University of Toronto and Greater
Toronto Area. The data from four participants were lost due to technical difficulties (n = 3) or
attrition (n = 1). Of the remaining 63 participants, 3 were deemed outliers (3 standard deviations
above the mean on measures of difference limen) and excluded from subsequent analysis. No
one from the previous study (Chapter 2) participated in the present study. Each participant
completed questionnaires to assess language (Li, Sepanski, & Zhao, 2006; Wong & Perrachione,
2007) and music (Bidelman, Hutka, et al., 2013) background. English-speaking musicians
(hereafter referred to as Ms) (n = 21, 14 female) were amateur instrumentalists with at least eight
years of continuous training in Western classical music on their primary instrument (M = 15.43,
SD = 6.46 years), beginning at a mean age of 7.05 (SD = 3.32). All musicians had formal private
or group lessons within the past five years and currently played their instrument(s). These
inclusion criteria are consistent with many previous studies examining neuroplastic associations
with musicianship (e.g., Bidelman & Krishnan, 2010; Bidelman et al., 2011a).
English-speaking nonmusicians (n = 21, 14 female) had a maximum of 3 years of formal
music training on any combination of instruments throughout their lifetime (M = 0.81, SD =
1.40) and had not received formal instruction within the past five years. Both musicians and
nonmusicians had some exposure to a second language that was not a tone language (musicians:
90.48%, nonmusicians: 66.67%; mainly French or Spanish) but were classified as late learners
and/or had moderate to high levels of proficiency11 in their second language.
11 Participants rated the following aspects of their second language on a scale from 1 (very poor) to 7 (native proficiency): “Reading proficiency”, “writing proficiency”, “speaking fluency”, and “listening ability.” “Fluent” was defined as native proficiency in all four categories. Of the musicians who had some exposure to a second language (L2; n = 19), 15 had native proficiency in at least one of the four categories; 3 rated themselves as “good” (5) in at least one category; 1 rated themselves as “fair” (3) in at least one category. Of the nonmusicians who had some L2 exposure (n = 14), 12 had native proficiency in at least one of the four categories, while 2 rated themselves as “good”.
Cantonese-speaking participants (n = 18; 11 female) were considered late bilinguals (following the criteria of Bidelman et al., 2011a, and Chandrasekaran et al., 2009), beginning formal instruction in English at a mean age of 10.27 (SD = 5.13). All participants
were born and raised in mainland China or Hong Kong and reported using Cantonese on a
regular basis (43.53% daily use, SD = 29.79%). As with nonmusician participants, Cantonese
speakers had minimal music training throughout their lifetime (M = 0.78, SD = 0.94 years) and
had not received formal instruction in the past five years. Importantly, nonmusicians and
Cantonese speakers did not differ in their music training, p > .90. The three groups were closely
matched in age (musicians: M = 25.24, SD = 4.17; Cantonese speakers: M = 24.17, SD = 4.12;
nonmusicians: M = 23.38, SD = 4.07), p > .30, and years of formal education (musicians: M =
18.19, SD = 3.25; Cantonese speakers: M = 16.94, SD = 2.46; nonmusicians: M = 16.67, SD =
2.76), p > .10, and all were right-handed. All participants provided written, informed consent in
compliance with an experimental protocol approved by the Baycrest Centre Research Ethics
Committee. All received financial compensation for their time.
3.2.2 Cognitive tests
Participants’ general fluid intelligence and short-term memory capacity were measured to
rule out differences in cognitive ability among groups (e.g., Bidelman, Hutka, et al., 2013).
3.2.2.1 Raven’s Matrices
General fluid intelligence was measured with Raven’s Advanced Progressive Matrices
(Raven, Raven, & Court, 1998), which uses nonverbal material without cultural, language, or
social bias to assess individuals’ general cognitive ability. Each trial consisted of a 3×3 matrix of
line drawings depicting abstract patterns in all but the bottom-right cell. Participants selected the
missing pattern from among 6 to 8 alternatives and were given 10 min to complete the 29-item
battery. Items became progressively more difficult over the course of the test. Raw scores
(number correct) were recorded and used in subsequent analyses.
3.2.2.2 Corsi Blocks
A digital implementation of the Corsi blocks tapping test (Corsi, 1972) was used to gauge
each individual’s nonverbal short-term memory. On each trial, participants saw a 6×6 grid of
grey squares on the computer screen. A memory sequence was then presented by briefly
changing the colour of certain boxes in various locations on the screen. Participants were
required to recall the sequence in identical order by clicking on the target boxes. Sequence length
gradually increased from two to eight items, becoming progressively harder. Two repetitions
were presented for each span length. The longest span correctly recalled was an index of visual
short-term memory capacity.12
3.2.3 Behavioural tasks
Fundamental frequency difference limens (F0 DLs) and first formant difference limens
(F1 DLs) were measured for each participant using three-alternative, forced-choice (3AFC)
discrimination tasks (Bidelman & Krishnan, 2010). F0 DLs and F1 DLs were measured in
separate blocks. The F0 task used complex tones that varied in pitch. Individual tones contained
10 harmonics of the fundamental and were 200 ms in duration. The F1 task used synthetic
speech sounds that varied only in the first formant frequency (F1). For these stimuli, the F0 (115
Hz) as well as second (2500 Hz), third (3500 Hz), and fourth (4530 Hz) formants were kept
constant across vowels such that only F1 varied.
For a given trial in each task, participants heard three sequential intervals, two containing
an identical reference token (F0ref = 220 Hz for the F0 DL task; F1ref = 300 Hz for the F1 DL
task) and one containing a higher comparison, assigned randomly. Participants’ task was to
identify which of the three tokens (first, second, or third) differed from the other tokens.
Discrimination thresholds were measured using a 2-down, 1-up adaptive paradigm that tracks
71% correct performance on the psychometric function (Levitt, 1971). The initial frequency
difference between reference and comparison (ΔF) was set at 20% of F0ref/F1ref. Following two
consecutive correct responses, ΔF was decreased for the subsequent trial and increased following
a single incorrect response. ΔF was varied using a geometric step-size factor of two for the first
four reversals and was decreased to √2 thereafter. Fourteen reversals were measured, and the geometric mean of the last eight was used to compute each individual’s DL for the run,
12 It has been asserted that the backwards conditions of span tasks, namely the visual memory span (analogous to the current task) and digit span tasks, are more demanding of working memory processing than the forward conditions (see Wilde, Strauss, & Tulsky, 2004). However, when Wilde et al. (2004) assessed whether the backwards span was a more sensitive measure of working memory than the forward span, they found that the backwards span did not afford differential sensitivity above and beyond the forward span. Nonetheless, it is important to note that this does not mean that the forward and backward span tasks are equivalent measures. Future studies should strive to include both tasks for a more complete understanding of visual memory span.
calculated as the minimum percent change in F0/F1 that was detectable (i.e., ΔF/Fnom). F0 DLs
of two runs were averaged per listener to obtain a final estimate of each individual’s F0
discrimination threshold, i.e., the smallest change in pitch that listeners could reliably detect.
Similarly, F1 DLs of two runs were averaged per listener to obtain a final estimate of each
individual’s F1 discrimination threshold.
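For illustration, the adaptive rule and reversal averaging described above can be sketched as a small simulation. This is a minimal sketch, not the software used in the study; the `respond` callable, which stands in for a listener’s 3AFC response, is hypothetical:

```python
import math


def run_staircase(f_ref, respond, start_pct=20.0, n_reversals=14, n_avg=8):
    """2-down, 1-up adaptive track converging on ~71% correct (Levitt, 1971).

    f_ref       -- reference frequency in Hz (e.g., 220 Hz for the F0 task)
    respond     -- callable taking the current frequency difference (Hz)
                   and returning True for a correct 3AFC response
    start_pct   -- initial delta-F as a percentage of f_ref
    n_reversals -- total reversals to collect
    n_avg       -- number of final reversals averaged for the limen
    Returns the difference limen as a percent change relative to f_ref.
    """
    delta = f_ref * start_pct / 100.0
    step = 2.0                     # geometric step factor, first four reversals
    correct_streak = 0
    direction = None               # 'down' (harder) or 'up' (easier)
    reversals = []

    while len(reversals) < n_reversals:
        if respond(delta):
            correct_streak += 1
            if correct_streak == 2:            # two in a row -> make harder
                correct_streak = 0
                if direction == 'up':          # direction change = reversal
                    reversals.append(delta)
                direction = 'down'
                delta /= step
        else:                                  # one error -> make easier
            correct_streak = 0
            if direction == 'down':
                reversals.append(delta)
            direction = 'up'
            delta *= step
        if len(reversals) == 4:                # smaller steps after 4 reversals
            step = math.sqrt(2.0)

    tail = reversals[-n_avg:]
    geo_mean = math.exp(sum(math.log(r) for r in tail) / len(tail))
    return 100.0 * geo_mean / f_ref            # DL as percent change
```

With a deterministic responder whose threshold is 2 Hz against the 220-Hz reference, the track converges to a DL of roughly 1% of F0ref.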
3.2.4 EEG stimuli
EEGs were recorded using a passive, auditory oddball paradigm, consisting of two
conditions presented in separate blocks of music and speech (Figure 3).
Figure 3. Spectrograms illustrating the standard, large, and small deviant stimuli for the music
(top row) and speech (bottom row) conditions. White lines indicate the fundamental frequency of
each tone or the first formant of each vowel.
The order of conditions was counterbalanced across participants. The note condition consisted of
synthesized piano tones, created with Sibelius v7.1.3 and exported as .wav files. The notes
consisted of middle C (C4, F0 = 261.6 Hz), middle C mistuned by an increase of 0.5 semitones
(large deviant; 269.3 Hz; 2.9% increase in frequency from standard), and middle C mistuned by
an increase of 0.25 semitones (small deviant; 265.4 Hz; 1.4% increase in frequency from
standard). Note that these changes were selected because previous behavioural research has
demonstrated that both Cantonese speakers and musicians can distinguish between half-semitone
changes in a given melody better than controls, whereas musicians outperform Cantonese
speakers and controls when detecting a quarter-semitone change (Bidelman, Hutka, et al., 2013).
Tone durations were 300 ms, including 5-ms rise and fall time to reduce spectral splatter. Speech
stimuli consisted of three steady-state vowel sounds (Bidelman, Moreno, & Alain, 2013): [ʊ] as
in book, [a] as in pot, and [ʌ]13 as in but as the standard, large deviant, and small deviant (on the
border of categorical perception between the standard and large deviant; Bidelman, Moreno, et
al., 2013) respectively. The duration of each vowel was 250 ms, including 10-ms rise and fall
time. The standard vowel had an F1 of 430 Hz, the large deviant 730 Hz (41.1% increase in
frequency from standard), and the small deviant 585 Hz (26.5% increase in frequency from
standard). Speech tokens contained identical fundamental (F0), second (F2), and third (F3)
formant frequencies (F0: 100, F2: 1090, and F3: 2350 Hz), chosen to match prototypical
productions from a male speaker (Peterson & Barney, 1952). Speech stimuli were synthesized
with a cascade formant synthesizer implemented in MATLAB (The MathWorks) using
techniques described by Klatt and Klatt (1990). Stimulus onset asynchrony (SOA) was 1 s in
both conditions so that the stimulus repetition rates (and thus, neural adaptation effects) were
comparable for speech and music ERP recordings.
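The mistuned note frequencies reported above follow the equal-tempered relation f = f0 × 2^(n/12), where n is the shift in semitones. As a quick arithmetic check (the `mistune` helper is illustrative, not from the study):

```python
def mistune(f0_hz, semitones):
    """Shift a frequency upward by a number of equal-tempered semitones."""
    return f0_hz * 2.0 ** (semitones / 12.0)


c4 = 261.6                  # standard: middle C (C4), in Hz
large = mistune(c4, 0.5)    # large deviant, ~269.3 Hz (a ~2.9% increase)
small = mistune(c4, 0.25)   # small deviant, ~265.4 Hz
```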
The magnitude of F1 change between the standard and each speech deviant was chosen to
parallel the magnitude of change in the music standard and deviants. However, it is notable that a
greater magnitude of change was required to detect the standard-large deviant and standard-small
deviant changes for F1 than F0. This difference was informed by past findings showing that
participants require a larger percent change between two vowel sounds (i.e., F1) to detect a
difference, as compared to between two pitches (i.e., F0; Bidelman & Krishnan, 2010).
Specifically, in Bidelman and Krishnan (2010), musicians could detect a ~2% change between
two F1s, and a ~0.03% change between two F0s. Nonmusicians could detect a ~4% change
13 Note that this vowel sound is found in English and not in Cantonese (Zee, 1999). In contrast, the standard and large deviant vowels are found in both English and Cantonese (Zee, 1999). See Section 3.4.4 for a discussion of the implications of this vowel-use difference for the present results.
between two F1s, and a ~0.90% change between two F0s. Though these difference limens were
not measured using identical stimuli as used in the current study, they demonstrate that
participants require a greater change between F1s than between F0s to detect a difference
between stimuli. Pilot testing was used in the present study to determine the specific F0 and F1 standard-deviant changes that musicians and nonmusicians could reliably detect.
There were a total of 780 trials in each condition including 90 large deviants (12% of the
trials) and 90 small deviants (12% of the trials). Note that it has previously been demonstrated
that reliable MMN waves can be elicited in the presence of more than one deviant (Näätänen et
al., 2004). In a seminal study of MMN paradigms, Näätänen et al. (2004) compared a traditional
MMN paradigm with a new MMN paradigm using five auditory deviants. In the traditional
paradigm, one deviant was presented within a single sequence, for five sequences in total. In the
new paradigm, the five different deviants were presented within the same sequence. Each deviant
had a probability of 0.1. Therefore, in the traditional paradigm, 90% of the stimuli were
standards, while in the new paradigm, 50% were standards. The MMNs observed in the new
paradigm were equal to the MMNs observed in the traditional paradigm (Näätänen et al., 2004),
demonstrating that one can obtain five different MMNs in the time it would typically take to
obtain a single MMN.
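A multiple-oddball sequence with the proportions used here (780 trials, 90 large and 90 small deviants) can be generated along the following lines. The constraint that no two deviants occur back-to-back is an assumption added for illustration; the text does not specify the randomisation constraints:

```python
import random


def oddball_sequence(n_trials=780, n_large=90, n_small=90, seed=0):
    """Pseudo-randomised multi-deviant oddball sequence.

    Deviant slots are sampled without replacement, sorted, and spread
    out by adding each slot's rank, which guarantees at least one
    standard between any two deviants (an assumed constraint).
    """
    rng = random.Random(seed)
    n_dev = n_large + n_small
    slots = sorted(rng.sample(range(n_trials - n_dev + 1), n_dev))
    positions = [p + i for i, p in enumerate(slots)]   # gaps of >= 2
    labels = ['large'] * n_large + ['small'] * n_small
    rng.shuffle(labels)                                # interleave deviant types
    trials = ['standard'] * n_trials
    for pos, lab in zip(positions, labels):
        trials[pos] = lab
    return trials
```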
3.2.5 Procedure
Participants completed the cognitive tests (Raven’s Matrices and Corsi blocks) plus the
two difference limens tasks (F0 DL and F1 DL) (i.e., the behavioural battery) and the EEG
recording portions in counterbalanced order (P = 0.5 of starting with either the behavioural
battery or EEG recording).14 During EEG recording, participants sat in a comfortable chair and
watched a muted movie of their choice. They were instructed to attend to the movie and ignore
the sounds. Auditory stimuli were delivered binaurally from insert earphones (ER-3A) at an
intensity of 75 dB SPL. The test session lasted approximately 2 hours.
14 For the behavioural battery, either the cognitive or difference limens tests were administered first. For the former, either Corsi or Raven’s was administered first. For the latter, the F0 DL and F1 DL blocks were administered in random order. For the EEG recording, either the music or speech condition was presented first.
3.2.5.1 EEG recording and data analysis
EEGs were recorded using a 76-channel Biosemi Active Two-amplifier system (sampling
rate of 512 Hz) with electrodes placed around the scalp according to standard 10-20 locations
(Oostenveld & Praamstra, 2001). During EEG acquisition, all electrodes were referenced to the
CMS (Common Mode Sense) electrode, with the DRL (Driven Right Leg) electrode serving as
the common ground. Subsequent analyses were performed in EEGLAB (Delorme & Makeig,
2004) with custom routines coded in MATLAB. Data were re-referenced off-line to the
mastoids. Eye movements and artifacts were corrected in the continuous EEG using ICA
decomposition in EEGLAB. Excessively noisy channels were interpolated (two nearest
neighbour electrodes). EEG data were divided into epochs (−200 to 1000 ms), baseline-corrected to
the pre-stimulus interval and subsequently averaged in the time domain to obtain ERPs at each
electrode site for each response type (standards, deviants) and stimulus condition (musical notes,
vowels). Grand averaged ERPs were then digitally filtered (0.01-50 Hz, zero-phase response) for
response visualization and quantification.
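The epoching, baseline-correction, and time-domain averaging steps can be sketched with plain NumPy. This is a simplified illustration of the pipeline, not the EEGLAB/MATLAB routines actually used; re-referencing, ICA artifact correction, and filtering are omitted:

```python
import numpy as np


def epoch_and_average(eeg, events, sfreq=512, tmin=-0.2, tmax=1.0):
    """Cut epochs around event samples, baseline-correct, and average.

    eeg    -- (n_channels, n_samples) continuous recording
    events -- sample indices of stimulus onsets for one response type
    Returns the ERP as an (n_channels, n_epoch_samples) array.
    """
    pre = int(round(-tmin * sfreq))    # samples before onset
    post = int(round(tmax * sfreq))    # samples after onset
    epochs = []
    for onset in events:
        if onset - pre < 0 or onset + post > eeg.shape[1]:
            continue                   # skip epochs that run off the record
        ep = eeg[:, onset - pre:onset + post].astype(float)
        baseline = ep[:, :pre].mean(axis=1, keepdims=True)
        epochs.append(ep - baseline)   # subtract pre-stimulus mean
    return np.mean(epochs, axis=0)     # time-domain average -> ERP
```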
MMNs were computed by deriving difference waveforms, calculated by subtracting the ERPs to the standard stimuli from the corresponding deviant ERPs of the same sequence (i.e., deviant minus standard). The presence of the MMN at the mastoids was confirmed when
applying a common average reference. For each participant, MMN amplitude was measured as
the most negative peak in the 100- to 250-ms time window of difference waveforms in a fronto-
central electrode cluster (mean of F1, Fz, F2, FC1, FCz, FC2 electrodes). Similarly, P3a and the
LDN were identified in these same channels as the most positive peak in the 200- to 350-ms time
window (P3a) and the mean ERP amplitude in a latency window of 300 to 500 ms (LDN),
respectively. All component latencies were selected based on analysis windows specified in prior
research and visual inspection of the waveforms (Luck, 2005; Shestakova et al., 2003).
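Under these definitions, the component measures reduce to simple window operations on the deviant-minus-standard difference wave. A minimal sketch, assuming the inputs are one-dimensional ERPs already averaged over the fronto-central electrode cluster:

```python
import numpy as np


def component_measures(standard_erp, deviant_erp, sfreq=512, t0=-0.2):
    """MMN, P3a, and LDN measures from a difference wave.

    standard_erp, deviant_erp -- 1-D arrays over time (cluster means)
    t0                        -- epoch start time in seconds
    """
    diff = deviant_erp - standard_erp            # deviant minus standard

    def window(t_lo, t_hi):
        lo = int(round((t_lo - t0) * sfreq))     # seconds -> sample index
        hi = int(round((t_hi - t0) * sfreq))
        return diff[lo:hi]

    mmn = window(0.100, 0.250).min()     # most negative peak, 100-250 ms
    p3a = window(0.200, 0.350).max()     # most positive peak, 200-350 ms
    ldn = window(0.300, 0.500).mean()    # mean amplitude, 300-500 ms
    return mmn, p3a, ldn
```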
An analysis of two additional components, the auditory N1 (particularly the N1b subcomponent, most prominent at vertex electrodes at ~100 ms; Näätänen & Picton, 1987) and P2 waves, was also conducted on the ERPs prior to subtraction (see Appendices for introduction,
methods, results, and discussion; Figure S1 shows the standard, large, and small deviant for each
condition prior to subtraction). These two deflections have been found to be larger in musicians
than in nonmusicians (Bosnyak, Eaton, & Roberts, 2004; Pantev et al., 1998; Shahin, Bosnyak,
Trainor, & Roberts, 2003), making their analysis of particular interest for the present participant
groups. The analysis of the N1 and P2 also allowed for the examination of ERPs prior to
subtraction, providing an opportunity to verify whether the large and small deviants elicited a
change in amplitude from the standard, thus accounting for the difference waves on which the
following analyses were conducted. Although the examination of the N1 and P2 waves is relevant to the present investigation, it is tangential relative to the examination of the MMN, P3a, and LDN and is therefore included in the Appendices.
3.2.5.2 Statistical analysis
A univariate ANOVA was conducted for each cognitive and DL measure. Prior to
statistical analyses, F0 and F1 DL values were square-root transformed to satisfy normality and
homogeneity of variance assumptions required for parametric statistics. Note that when the
univariate ANOVAs were conducted on the raw F0 DL and F1 DL data (rather than on the
transformed data), the pattern of results remained the same (i.e., significant results remained
significant, and non-significant results remained non-significant). However, the bar graphs displaying the F0 DL and F1 DL results (Figure 5) show the raw means and standard error bars, as these values are easier to interpret than the square-root-transformed values.
For each of the MMN, P3a, and LDN measures, an ANOVA was conducted, with group
as the between-subjects factor, and stimulus type (music or speech) and deviant size (small or
large) as within-subjects factors. For all analyses, the dependent variable was the amplitude of a
cluster of fronto-central electrodes (average of F1, Fz, F2, FC1, FCz, FC2). For the MMN,
laterality effects were also examined, as visual inspection of the scalp topographies indicated the
possibility of between-group and between-condition differences (Figure 4). To this end, an
ANOVA was conducted, with group as the between-subjects variable, and stimulus type (music
or speech), deviant size (small or large), and laterality (left and right) electrode cluster as within-
subjects variables. The left electrode cluster was an average of a subset of left fronto-central
electrodes (AF3, F3, and F5); the right cluster was an average of right fronto-central electrodes
(AF4, F4, F6).
Figure 4. Event-related potential (ERP) scalp topography for the mismatch negativity (MMN) in
the (a) large-deviant music, (b) large-deviant speech, (c) small-deviant music, and (d) small-
deviant speech conditions. The cluster of six electrodes is outlined on the musicians' (M)
topography, as this group drove the significant between-group differences in all conditions.
Topographies show mean activation between two time points in each condition, centred on the
mean peak amplitude (190 to 200 ms for large deviants; 200 to 210 ms for small deviants).
Bonferroni corrections were applied to all pairwise contrasts to control for family-wise
error (α = 0.05). When appropriate, degrees of freedom were adjusted with the Greenhouse-
Geisser epsilon (ε) and all reported probability estimates are based on the reduced degrees of
freedom, although the original degrees of freedom are reported. Partial eta-squared (η2p) was
used as the measure of effect size for all ANOVAs.
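As a minimal illustration of the two corrections named above: Bonferroni adjustment multiplies each pairwise p-value by the number of contrasts, and the Greenhouse-Geisser correction scales both degrees of freedom by epsilon before the p-value is computed. The p-values and epsilon below are made-up examples, not results from this study:

```python
# Bonferroni adjustment for k pairwise contrasts: each raw p-value is
# multiplied by k (capped at 1) and compared against alpha = .05.
raw_p = {"M vs C": 0.920, "M vs NM": 0.0002, "C vs NM": 0.001}  # illustrative
k = len(raw_p)
adjusted = {pair: min(1.0, p * k) for pair, p in raw_p.items()}
significant = {pair: p_adj < 0.05 for pair, p_adj in adjusted.items()}

# Greenhouse-Geisser correction: multiply both degrees of freedom by the
# epsilon estimate before looking up the p-value; the original degrees of
# freedom are what get reported in the text.
epsilon = 0.85          # illustrative epsilon estimate
df1, df2 = 1, 57        # original degrees of freedom
df1_adj, df2_adj = epsilon * df1, epsilon * df2
```

This mirrors the convention stated above: probability estimates come from the reduced degrees of freedom, while the original degrees of freedom appear in the reported statistics.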
Correlations were examined, by group, i) between the behavioural auditory measures (F0 DL
and F1 DL) and ii) between the brain (MMN, P3a, and LDN) and behavioural (F0 DL, F1 DL,
Corsi span, and Raven's) measures, to assess the degree to which listeners' auditory neural
processing of speech/music predicted perceptual acuity in each domain. A false discovery rate
(FDR) procedure (Benjamini & Yekutieli, 2001) was used to correct for multiple correlation tests
with a threshold of α = .05. FDR-corrected results are reported.
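The Benjamini-Yekutieli procedure is available in statsmodels; the sketch below applies it to a hypothetical set of correlation p-values (statsmodels is an assumed dependency, and the values are placeholders):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical raw p-values from a family of correlation tests.
raw_p = np.array([0.004, 0.030, 0.120, 0.450, 0.800])

# Benjamini-Yekutieli FDR correction (valid under arbitrary dependence
# among tests), thresholded at alpha = .05.
reject, p_adj, _, _ = multipletests(raw_p, alpha=0.05, method="fdr_by")
```

Unlike the plain Benjamini-Hochberg procedure, the Yekutieli variant adds a penalty term that keeps the false discovery rate controlled even when the tests are correlated, which is appropriate for a family of correlations computed on the same participants.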
3.3 Results
3.3.1 Cognitive tests
There were no group differences on the Raven's or Corsi block scores, Fs < 1,
confirming that groups were well-matched in fluid intelligence and short-term memory.
3.3.2 Behavioural tasks
There was a significant group difference on the F0 DL task, F(2, 59) = 11.91, p < .001,
η2p = 0.295 (Figure 5A). Pairwise comparisons revealed that musicians did not perform
differently from Cantonese speakers (p = .920), but that musicians and Cantonese speakers
outperformed nonmusicians [musicians vs. nonmusicians: p < .001; Cantonese speakers vs.
nonmusicians: p = .003] (i.e., musicians = Cantonese speakers > nonmusicians).
Figure 5. A: Performance on the fundamental frequency (F0) difference limen (DL) task.
Musicians (M) and Cantonese-speaking participants (C) showed better pitch discrimination than
nonmusician (NM) controls. B: Performance on the first formant frequency (F1) DL task. M
showed superior discrimination of the first formant in speech sounds compared to C and NM.
** p ≤ .01. Error bars indicate SE.
F1 DLs also differed between groups, F(2, 59) = 8.94, p < .001, η2p = 0.239 (Figure 5B).
Pairwise comparisons revealed that musicians outperformed Cantonese speakers (p < .001) and
nonmusicians (p = .011); Cantonese speakers did not differ from nonmusicians (p = .776) (i.e.,
musicians > Cantonese speakers = nonmusicians).
3.3.3 ERP data
MMN scalp topographies, waveforms, and average peak amplitudes are shown for each
group and stimulus condition in Figures 4, 6, and 7, respectively.
Figure 6. ERP difference waves for each group and condition. Each waveform is an average
across six fronto-central channels (inset, F1, Fz, F2, FC1, FCz, FC2). M = musicians; C =
Cantonese speakers; NM = nonmusicians.
Figure 7. Mismatch negativity (MMN) peak amplitude between 100 and 250 ms for each
condition and group. The peak amplitude is the average peak of six fronto-central electrodes (F1,
Fz, F2, FC1, FCz, FC2). Error bars indicate SE. M = musicians; C = Cantonese speakers; NM =
nonmusicians.
3.3.3.1.1 MMN
There was a main effect of group on MMN amplitude, F(2, 57) = 15.71, p < .001, η2p = 0.355:
musicians had larger MMNs across all stimulus conditions than did Cantonese speakers
(p < .001) and nonmusicians (p < .001) (see Table 2 for
means and standard errors). Listeners in all three groups showed larger MMNs (i.e., enhanced
discrimination) for large deviants than for small deviants across both music and speech stimuli,
F(1, 57) = 6.453, p = .014, η2p = 0.102. All other main effects, as well as two- and three-way
interactions, were not significant, ps > .05. Thus, musicians had enhanced early cortical
discrimination relative to Cantonese speakers and nonmusicians across music and speech sounds.
Table 2
Means and Standard Errors of Mismatch Negativity Analysis Variables at Each Level.
Group   Stimulus   Deviant size   M        SE
M       Music      Large          -3.085   0.327
M       Music      Small          -2.815   0.364
M       Speech     Large          -2.989   0.271
M       Speech     Small          -2.585   0.291
C       Music      Large          -2.061   0.416
C       Music      Small          -1.292   0.236
C       Speech     Large          -1.934   0.241
C       Speech     Small          -1.577   0.185
NM      Music      Large          -2.081   0.248
NM      Music      Small          -1.534   0.245
NM      Speech     Large          -1.733   0.237
NM      Speech     Small          -1.826   0.247
Note. M = musicians; C = Cantonese speakers; NM = nonmusicians.
3.3.3.1.2 MMN laterality effects
The significant results from the MMN analysis (i.e., previous section) remained
significant after including laterality in the analysis. Pooling across groups, stimulus type, and
deviant size, the MMN was marginally stronger in the right than the left hemisphere, F(1, 57) =
3.76, p = .058, η2p = 0.062 (see Table 3 for means and standard errors). The interaction of
laterality, stimulus type, and deviant size was significant, F(1,57) = 7.91, p = .007, η2p = 0.122.
This interaction was driven by a main effect of deviant size in the right cluster, F(1,59) = 7.64, p
= .008, η2p = 0.115. Specifically, large deviants elicited stronger MMNs than small deviants. The
interaction of stimulus type and deviant size was also significant, F(1,59) = 5.42, p = .023, η2p =
.084. For the music condition, large deviants elicited stronger MMNs than small deviants,
F(1,59) = 4.32, p = .042, η2p = 0.068 (Footnote 15). All other two-, three-, and four-way
interactions were not significant, ps > .05.
Footnote 15. Following visual inspection of the data, there appeared to be between-group laterality differences in
the large-deviant music condition amplitudes. Pairwise comparisons were therefore conducted to examine these
differences. For the musicians, the right-hemisphere amplitude was significantly greater than the left-hemisphere
amplitude, t(20) = 2.408, p = .026. There was no significant difference between hemisphere amplitudes in the other
two groups [Cantonese: t(17) = 1.067, p = .301; nonmusicians: t(20) = 0.734, p = .472].
Table 3
Means and Standard Errors of Laterality Analysis Variables at Each Level.
Group   Laterality   Stimulus type   Deviant size   M        SE
M       Left         Music           Large          -2.578   0.300
M       Left         Music           Small          -2.496   0.334
M       Left         Speech          Large          -2.856   0.196
M       Left         Speech          Small          -2.340   0.251
M       Right        Music           Large          -3.139   0.269
M       Right        Music           Small          -2.513   0.330
M       Right        Speech          Large          -2.734   0.221
M       Right        Speech          Small          -2.395   0.236
C       Left         Music           Large          -1.959   0.374
C       Left         Music           Small          -1.266   0.213
C       Left         Speech          Large          -1.732   0.226
C       Left         Speech          Small          -1.404   0.220
C       Right        Music           Large          -2.177   0.429
C       Right        Music           Small          -1.341   0.197
C       Right        Speech          Large          -1.792   0.212
C       Right        Speech          Small          -1.560   0.216
NM      Left         Music           Large          -1.845   0.193
NM      Left         Music           Small          -1.539   0.267
NM      Left         Speech          Large          -1.585   0.224
NM      Left         Speech          Small          -1.651   0.244
NM      Right        Music           Large          -1.962   0.182
NM      Right        Music           Small          -1.375   0.257
NM      Right        Speech          Large          -1.566   0.193
NM      Right        Speech          Small          -1.916   0.225
Note. M = musicians; C = Cantonese speakers; NM = nonmusicians.
3.3.3.1.3 P3a
There was a significant three-way interaction of stimulus type, deviant size, and group,
F(2, 57) = 5.59, p = .006, η2p = 0.164 (see Table 4 for means and standard errors). For
musicians, the large deviant elicited a marginally more positive P3a than the small deviant,
F(1, 20) = 3.75, p = .067, η2p = 0.158. There was also a significant interaction of stimulus type
and deviant size in
musicians, F(1, 20) = 20.59, p < 0.001, η2p = 0.507. Specifically, for the music condition, the
large deviant had a more positive P3a than the small deviant, F(1, 20) = 19.37, p < 0.001, η2p =
0.492. For the speech condition, the small deviant had a more positive P3a than the large deviant,
F(1, 20) = 5.61, p = .028, η2p = 0.219. For Cantonese speakers, there was a more positive P3a for
the music than for the speech condition, F(1, 17) = 5.652, p = .029, η2p = 0.250. For
nonmusicians, there were no significant main effects or interactions (p > .05). These results
indicate that musicians had stronger involuntary switching of attention for large music deviants
than for small music deviants (as indexed by the P3a response), while the opposite pattern was
true for the speech condition in musicians. Lastly, Cantonese speakers showed stronger
involuntary switching of attention for musical sounds (i.e., pitch deviants) than for speech sounds
across both deviant sizes.
All other main effects and two-way interactions, including group as a variable, were not
significant, ps > .05. There was a significant interaction of stimulus type and deviant size, F(1,
57) = 14.40, p < .001, η2p =0.202. Specifically, for the music condition, the large deviant had a
more positive P3a than the small deviant, F(1, 57) = 8.90, p = .004, η2p = 0.135, when pooled
across groups. Across groups, one might predict that a large, as compared to a small, deviant
would be associated with an involuntary switch in attention (i.e., the large deviant is a more
obvious, “attention-grabbing” change). For the speech condition, the small deviant had a more
positive P3a than the large deviant, F(1, 57) = 5.36, p = .024, η2p = 0.086. Based on the
aforementioned logic, this finding is counterintuitive: The large deviant should be more likely to
elicit a shift in involuntary attention than a small deviant. These findings suggest that perhaps the
small speech deviant actually elicited a larger involuntary shift. This vowel sound was on the
border of categorical perception between the standard and large deviant (Bidelman, Moreno, et
al., 2013), and may thus have led to an involuntary attentional shift.
Table 4
Means and Standard Errors of P3a Analysis Variables at Each Level.
Group   Stimulus   Deviant size   M       SE
M       Music      Large          2.198   0.493
M       Music      Small          0.502   0.365
M       Speech     Large          0.407   0.314
M       Speech     Small          1.197   0.402
C       Music      Large          1.516   0.276
C       Music      Small          1.357   0.174
C       Speech     Large          0.660   0.267
C       Speech     Small          1.194   0.218
NM      Music      Large          1.023   0.352
NM      Music      Small          0.785   0.353
NM      Speech     Large          0.948   0.298
NM      Speech     Small          0.928   0.291
Note. M = musicians; C = Cantonese speakers; NM = nonmusicians.
3.3.3.1.4 LDN
There was a significant main effect of group on LDN mean amplitude, F(2, 57) = 4.56, p
= .015, η2p = 0.138 (see Table 5 for means and standard errors). Specifically, musicians had a
more negative LDN than Cantonese speakers, p = .012. There was no difference in LDN
amplitude between musicians and nonmusicians, p > .2, or between Cantonese speakers and
nonmusicians, p > .5. Pooling across groups and deviant sizes, the speech condition elicited a
more negative LDN than did the music condition, F(1, 57) = 6.48, p = .014, η2p = 0.102. There
was also a significant interaction of stimulus type and deviant size, F(1, 57) = 8.436, p = .005,
η2p = 0.129. In the speech condition, the large deviant elicited a more negative LDN than the
small deviant, t(59) = -3.597, p = .001, whereas there was no difference between the LDN
amplitudes elicited by the two music deviants (p = .498). Furthermore, the large speech deviant
elicited a more negative LDN than the large music deviant, t(59) = 3.391, p = .001, whereas
there was no difference in LDN amplitude between the small music and speech deviants
(p = .768). These results indicate that, across all groups, the speech condition elicited greater
top-down processing/re-orienting than the music condition, an effect driven largely by the large
speech deviant. That is, the large speech deviant elicited a more negative LDN than the small
speech deviant, suggesting that it was processed in a more top-down manner.
Recall that the small speech deviant (i.e., on the border of categorical perception between the
large deviant and standard) was associated with a larger P3a than the large speech deviant. It is
possible that, because the large speech deviant fell within a distinct perceptual category, it was
less distinct (i.e., elicited a smaller P3a/switch in attention) than the small speech deviant, and
was processed in a top-down manner. That is, because it was more aligned with a perceptual
category than the small speech deviant, it elicited a larger LDN as compared to the small speech
deviant. The LDN results suggest that musicians used top-down processing/re-orienting to a
greater extent than Cantonese speakers.
Table 5
Means and Standard Errors of the Late Discriminative Negativity Analysis Variables at Each
Level.
Group   Stimulus   Deviant size   M        SE
M       Music      Large          -0.808   0.297
M       Music      Small          -1.400   0.324
M       Speech     Large          -1.858   0.263
M       Speech     Small          -1.062   0.269
C       Music      Large          -0.274   0.221
C       Music      Small          -0.144   0.305
C       Speech     Large          -1.160   0.340
C       Speech     Small          -0.553   0.255
NM      Music      Large          -0.761   0.318
NM      Music      Small          -0.680   0.249
NM      Speech     Large          -1.207   0.218
NM      Speech     Small          -0.823   0.254
Note. M = musicians; C = Cantonese speakers; NM = nonmusicians.
3.3.3.1.5 Correlations
Correlations between F0 DL and F1 DL revealed a significant positive association in
musicians only, r = 0.605, p = .004. That is, better pitch discrimination was associated with
superior timbre discrimination and vice versa. There were no significant correlations between
behavioural and ERP measures, ps > .05, after FDR correction.
3.4 Discussion
By comparing cortical MMN responses to music and speech in musicians and tone-
language speakers, this study assessed possible enhancements in auditory neural processing
associated with music and tone-language experience compared to adults who were neither
musicians nor tone-language speakers. Across conditions, only musicians showed enhanced
MMN, suggesting that they had better automatic discrimination of music and speech sounds than
did Cantonese and nonmusician listeners. Cantonese experience was not associated with
increased ERP amplitude despite enhanced behavioural acuity for pitch. As expected, there was
clear differentiation between deviant size, with large deviants eliciting more pronounced
responses (i.e., more negative MMN and LDN) than small deviants.
There was no significant interaction between group, stimulus type (i.e., music versus
speech), and deviant magnitude (i.e., small versus large) for any of the ERP components.
Previously, behavioural melody discrimination tasks showed that musicians are more sensitive to
quarter-semitone changes (i.e., the size of the small music deviants in the present study) than are
Cantonese speakers and nonmusicians (Bidelman, Hutka et al., 2013). For half-semitone changes
(i.e., the size of the large music deviants in the present study), musicians previously
outperformed Cantonese speakers, who in turn outperformed nonmusicians (Bidelman, Hutka et
al., 2013). Based on these data, one might predict analogous changes in the MMN response.
However, it is possible that all musical stimuli elicited a similar MMN because the MMN is a
passive index of sound discrimination. Perhaps group differences only emerge when participants
pay attention to the stimuli (i.e., in the behavioural task).
Overall, speech and music stimuli generated comparable MMN and P3a responses in all
groups. This finding suggests that simple music and speech stimuli may engage similar neural
networks. However, for the LDN, speech sound contrasts elicited larger neural responses than
did the musical stimuli. The latter findings imply that the recruitment of top-down processing/re-
orienting is more pronounced for changes in timbre than for changes in pitch. Timbre is a highly
salient cue for listeners, providing critical information about a given sound source (Schellenberg
& Habashi, 2015). In contrast, pitch height is more reliant on situational factors (Schellenberg &
Habashi, 2015). Furthermore, discriminating sound sources (i.e., via timbre) is more
evolutionarily salient than discriminating different aspects of the same source (i.e., pitch;
Schellenberg & Habashi, 2015). Indeed, vocal timbre has been shown to be particularly salient,
enhancing memory for melodies relative to melodies presented in instrumental timbres, an effect
that may be related to the biological relevance of the voice (e.g., Weiss, Schellenberg, Trehub, &
Dawber, 2015; Weiss, Trehub, & Schellenberg, 2012; Weiss, Vanzella, Schellenberg, & Trehub,
2015).
Schellenberg and Habashi (2015) also demonstrated the importance of timbral cues in
comparison to pitch or tempo. Participants listened to previously unfamiliar melodies and were
tested for their recognition of the melodies 10 minutes, one day, or one week from the initial
exposure. Recognition ratings were collected for the old (initially-presented) melodies, as well as
for an equal number of new melodies. In the first of two experiments, half of the old melodies
were transposed by six semitones (i.e., change in key/pitch) or shifted in tempo. In the second
experiment, timbre was changed in half of the old melodies. Timbral changes negatively
impacted recognition after all three delays, whereas changes in pitch or tempo impaired
recognition only after the 10-minute and one-day delays. These results suggest that information
about timbre fades more slowly than information about pitch or tempo. Furthermore, these
effects were present in
listeners who were recruited without regard to music training, demonstrating that these results
are not limited to highly-trained musicians and/or individuals with AP.
3.4.1 Musicianship and tone language: Behavioural measures
At the behavioural level, musicians and Cantonese speakers performed better than
nonmusicians without tone-language experience. This finding adds further support to the
associations between auditory acuity for pitch and auditory experience, whether it involves
music or speech. Specifically, previous behavioural studies have reported musicians’ higher
perceptual acuity for pitch (e.g., Bidelman et al., 2011a, 2011b; Magne, Schön, & Besson, 2006;
Marques, Moreno, Castro, & Besson, 2007; Schön et al., 2004) and the timbral
characteristics of speech (Bidelman & Krishnan, 2010; Bidelman, Weiss, Moreno, & Alain,
2014; Chartrand & Belin, 2006).
Contrary to expectations, there was no behavioural advantage of tone-language
experience on the processing of speech timbre, with Cantonese participants performing no
differently from nonmusicians. If tone languages confer auditory processing benefits, those
benefits may be restricted to pitch processing. Whereas pitch is used to distinguish lexical
meaning in Cantonese but not in English (Cutler, Dahan, & van Donselaar, 1997; Yip, 2002),
timbre is a critical cue for source identification in general (see Schellenberg & Habashi, 2015, for
a discussion). Accordingly, the benefits of tone-language experience may be limited to pitch
processing (Bidelman et al., 2011b).
Selective benefits associated with the use of certain linguistic cues have been reported
previously. For example, long-term experience with duration cues from either language use or
music training predicts benefits in pre-attentive and attentive processing of duration cues in
nonspeech harmonic sounds (Marie et al., 2012). Moreover, speaking a tone language has been linked to enhanced pitch discrimination and the timing of auditory cortical responses to pitch
changes (Giuliano et al., 2010). Tone-language speakers also more readily imitate (via singing)
and discriminate musical pitch (Pfordresher & Brown, 2009). The findings suggest that tone-
language acquisition fine-tunes the processing of pitch, affecting pitch processing in linguistic
and non-linguistic domains (Bidelman et al., 2011b, Bidelman, Hutka, et al., 2013; Pfordresher
& Brown, 2009).
3.4.2 Auditory neurophysiological benefits of musicianship and tone language
Musicians’ superior behavioural discrimination of music and speech was reflected in
their MMN response, which was larger across all conditions, as compared with Cantonese
speakers and nonmusicians who did not speak a tone language. One possibility is that
musicianship tunes sensory mechanisms that subserve early discrimination in music and speech
domains (Bidelman et al., 2011b). Musicians’ superior discrimination of timbre, both at the
neural and behavioural level, also aligns with a wealth of data supporting music-to-language
associations or benefits in a number of language-related domains (e.g., Bidelman & Krishnan,
2010; Marques et al., 2007; Parbery-Clark, Skoe, Lam, et al., 2009). This superior discrimination
may be related to musicians’ predisposition for multiple aspects of superior auditory
discrimination, their broad range of auditory experiences (relative to the other groups), as well as
cross-domain benefits related to the auditory experience gained via music training (e.g., OPERA
hypothesis, Patel, 2011).
In addition to the superior sound processing of musicians at the automatic, cortical level
(i.e., MMN), they also showed enhanced top-down processing/reorienting (i.e., LDN) relative to
Cantonese speakers but not to nonmusician controls. These ERP findings only partially
corroborate previous studies that revealed enhanced subcortical (i.e., automatic) auditory
responses (e.g., Bidelman et al. 2011a; Bidelman et al. 2014; Musacchia, Sams, Skoe, & Kraus,
2007) and enhanced LDN (i.e., attentional reorienting/higher-order auditory processing) in
musicians (Putkinen et al., 2013; Moreno, Wodniecka, Tays, Alain, & Bialystok, 2014).
Furthermore, Cantonese speakers did not differ from nonmusicians without tone-language
experience (i.e., tone-language experience was not associated with an enhanced LDN relative to
controls).
There was no difference between groups in P3a amplitude. Previous research has
revealed P3a habituation over time in musicians and enhancement over time in nonmusicians,
which has been interpreted as musicianship honing attentional abilities and auditory feature
encoding (Seppänen et al., 2012). The shorter duration of testing in the present study than in the
previous study (25 versus 60 min) may account for the discrepant results.
Musicians showed stronger involuntary switching of attention for large music deviants
than for small music deviants (P3a). The difference between deviant sizes is surprising, given
that musicians can accurately identify half- and quarter-semitone changes in melody (e.g.,
Bidelman, Hutka, et al., 2013). The large deviant change is more obvious (i.e., easier to detect)
than the small deviant, perhaps accounting for this difference. Within the Cantonese group,
participants had stronger involuntary switching of attention for musical sounds (i.e., pitch
deviants) as indexed by the P3a, than for speech sounds across both deviant sizes. This finding
suggests that hearing fundamental frequency changes in a non-linguistic context elicited
attention reorientation in Cantonese speakers, who regularly use fundamental frequency in a
linguistic context. This finding may indicate that experience with pitch in a linguistic context
(i.e., Cantonese) can extend to attention reorientation of non-linguistic pitch. Future research on
Cantonese speakers could directly compare the P3a in response to pitch in a linguistic or non-
linguistic context, to better understand how this group processes pitch outside of their learned
(i.e., linguistic pitch) context.
3.4.3 Dissociation between neural and perceptual processing of music/speech
The ERP data reveal that musical experience—but not tone-language experience—is
associated with enhanced neural processing of music and speech information. This ERP
difference between Cantonese listeners and musicians implies that pitch and timbral elements of
music and speech are not as salient to tone-language speakers as they are to musicians (e.g.,
Bidelman et al., 2011a; Bidelman, Hutka, et al., 2013). However, the absence of neural
enhancements for music stimuli in Cantonese listeners is surprising in light of their behavioural
advantages for pitch processing.
Indeed, tone-language speakers’ behavioural enhancements for pitch processing were not
paralleled by neural enhancements. Previous work suggests that the engagement of cortical
circuitry subserving speech/music percepts depends on the cognitive relevance of the stimulus to
the listener (e.g., Abrams et al., 2011; Bidelman et al., 2011a, 2011b; Chandrasekaran et al.,
2009; Halpern, Martin, & Reed, 2008). For example, in response to musical stimuli, information
relayed from subcortical sensory structures engages higher-level cortical mechanisms subserving
musical pitch perception in musicians, a process that is not engaged in tone-language speakers
(Bidelman et al., 2011b). Indeed, strong correlations are observed between brain and behavioural
responses to musical chords for musicians but not for listeners lacking musical expertise (i.e.,
Cantonese and nonmusician participants; Bidelman et al., 2011b). Applying these findings to the
present study, auditory neural processing (as indexed by the MMN) seems to fully engage
higher-level perceptual mechanisms only in musicians (rather than in Cantonese participants or
nonmusicians). Similarly, timbral cues may be more salient to musicians than to Cantonese or
nonmusician participants (e.g., Bidelman et al., 2011a; Bidelman, Hutka, et al., 2013) as a result
of musicians’ extensive experience with differentiating timbre (e.g., distinguishing between
instruments when performing with other musicians). Higher auditory-processing demands of
music relative to language (e.g., Patel, 2011), as well as the contributions of nature and nurture to
musicianship, may account for musicians’ parallel enhancements in brain and behavioural
processing that is not observed in tone-language speakers.
It is also notable that there was no interaction between laterality and group in any
condition, suggesting that the lateralization of pitch and speech processing does not differ
between tone-language speakers and musicians. When collapsing across all other variables,
however, MMN responses were marginally right lateralized. Previous findings indicate that the
right hemisphere is specialized for processing the fine spectral features of musical stimuli,
whereas the left hemisphere is specialized for temporal processing (i.e., for speech perception;
see Zatorre, Belin, & Penhune, 2002 for a review). The current data are not consistent with this
right lateralization for music or left lateralization for speech (i.e., no significant interaction of
stimulus type and laterality). They suggest instead that participants were more focused on fine
spectral features than on temporal information in all stimuli. Another possibility is that the
current passive EEG task, using relatively simple auditory stimuli, was not sensitive enough to
detect lateralization differences. Other methods (e.g., fMRI, or an active rather than passive
condition) might be better suited to detecting such hemispheric differences.
3.4.4 Modularity of music and speech processing
The association between music and speech raises questions about whether the acoustic
cues in language and music rely on independent neural systems or a single, domain-general
processor (Slevc, 2012). The current data suggest that musicians have finely-tuned domain-
general processes, such that sound discrimination at the neural and behavioural level is enhanced
for both music and speech stimuli. This aligns with previous findings of musicians’ enhanced
general auditory processing, that is, spectral acuity above and beyond the processing of musical
stimuli (Kraus & Chandrasekaran, 2010). Future research could confirm this finding by
systematically (i.e., parametrically) varying spectral content without robust changes in pitch.
Differential associations between music and tone-language experience on ERPs suggest
that musicianship and language experience are associated with at least partially divergent neural
networks. That is, if the pitch experience derived from musicianship and tone language
experience shared a common neural mechanism, one would predict similar enhancements to ERP
components in both musicians and Cantonese speakers. The findings of Moreno, Wodniecka, et
al., (2014) support the view of different neural networks linked to musical and linguistic
experience, such that bilinguals and musicians exhibit different ERPs during an inhibition task.
Although inhibition is an executive function rather than a perceptual process (e.g., pitch
discrimination), the results of Moreno, Wodniecka et al. (2014) suggest that bilingualism and
musicianship have differential effects on the neural networks supporting a common ability.
3.4.5 Limitations
Future studies could address two limitations that might have contributed to the lack of
similar neural enhancements in musicians and Cantonese speakers. First, it is possible that the
use of pitch in a non-linguistic context was foreign to the Cantonese speakers (but not
musicians), and thus did not elicit as strong an MMN response as it did for musicians. Future
studies could measure the neural response to pitch in linguistic and non-linguistic contexts to
determine
whether the MMN amplitude in this group is influenced by such factors. Second, it is possible
that the vowel sounds used in the present study were biased, favouring native English speakers.
Specifically, though the standard and large deviant sounds are found in both English and
Cantonese, the small deviant vowel sound is not found in Cantonese (Zee, 1999). Perhaps the
lack of familiarity explains why the Cantonese participants did not show any differences in
MMN as compared to controls (i.e., they have less experience with this vowel than native
English speakers, who have heard and used it throughout their lives and may thus be better at
detecting differences between the standard and small vowel deviants). Future studies could
ensure that vowel stimuli consist of tokens that are used in both English and Cantonese, to
control for the amount of experience participants have with the stimuli.
3.5 Conclusion
This study tested the degree to which musicianship and tone-language experience are
associated with sound discrimination in behavioural and early cortical levels of auditory
processing. Consistent with previous reports (Bidelman, Hutka, et al., 2013), the present study
found that linguistic pitch experience and music training were associated with comparable
enhancement in basic pitch discrimination (as measured via F0 DLs). Only musicians showed
enhanced timbral processing (as measured by F1 DLs) relative to tone-language speakers and
nonmusicians. Parallel enhancements in spectral acuity at the behavioural and early cortical
levels were observed in musicians only. That is, tone-language users' advantages in pitch
discrimination that were observed behaviourally (e.g., Bidelman, Hutka, et al., 2013) were not
reflected in early cortical MMN responses to pitch changes. Although extensive music and tone-
language experience may enhance some aspects of auditory acuity (pitch discrimination), music
training may confer broader enhancements to auditory function, tuning pitch and timbre-related
neural processes.
An alternative explanation for the differences between neural and behavioural pitch-
processing in tone-language speakers is that mean activation over a cortical patch may not
adequately represent neural processes underlying the processing of sound, particularly pitch.
Musicians arguably have a greater range of experience with pitch (e.g., manipulating and
producing complex melodies and harmonies), as well as predispositions for superior pitch
processing abilities (i.e., Schellenberg, 2015), than do tone-language speakers. By this logic,
tone-language speakers should show neither neural responses to, nor behavioural benefits in,
pitch discrimination comparable to those of musicians. Since such a behavioural benefit was
observed in Cantonese speakers, it is possible that there are unique neural circuitries associated
with pitch processing in these individuals that were not adequately captured in ERP measures.
To investigate this possibility, I sought out a methodology that could detect nuanced
effects in the brain signal that might underlie the differences between auditory processing for
tone-language speakers and musicians, and that could be applied to the existing dataset. Both of
these requirements were met by the measurement of brain signal variability in the EEG data,
which examines the information processing capacity of the brain across multiple timescales
(Ghosh, Rho, McIntosh, Kotter, & Jirsa, 2008a; Heisz, Shedden, & McIntosh, 2012; McIntosh,
Kovacevic, & Itier, 2008; Misic, Mills, Taylor, & McIntosh, 2010). This approach
conceptualizes the brain as a nonlinear dynamical system and enables examination of
interactions between brain signal frequencies (Heisz & McIntosh, 2013). The following two
chapters explore this nonlinear approach, first at the theoretical level (Chapter 4, based on Hutka,
Bidelman, & Moreno, 2013), and then as applied to EEG data (Chapter 5). As Peretz et al.
(2015) have stated, converging neuroimaging evidence will be required before one can conclude
that neural overlap equates to neural sharing. The current nonlinear approach has the potential to
inform the distinction between overlapping neural regions versus distinct neural circuitries
for pitch processing as related to music and speech processing (as discussed in Peretz et al.,
2015).
Chapter 4 A Theoretical Discourse on the Use of Nonlinear Methods to
Investigate the Music-Language Association
4.1 Common acoustic processing in musicians and tone-language speakers
Based on past literature, one might assume that musicians and tone-language speakers
share acoustic processing resources, particularly when it comes to pitch processing. For instance,
Kraus and Chandrasekaran (2010) examined the relationship between music training and the
development of auditory skills, with an emphasis on the neural representation of pitch, timing,
and timbre in the human auditory brainstem. The authors posited that music training leads to
fine-tuning of all salient auditory signals, both musical and non-musical (Kraus &
Chandrasekaran, 2010). Further exploring these mechanisms, Besson, Chobert, and Marie (2011)
stated that when long-term experience in a domain impacts acoustic processing in another
domain (e.g., the use of pitch in a nonlinguistic or linguistic context), the findings can serve as
evidence for common acoustic processing. Similarly, when long-term experience in one domain
influences the build-up of abstract and specific percepts in another domain, results may serve as
evidence for cross-domain plasticity.
The notion of “musicianship tuning” has been extended to claims that music training
confers a range of enhanced sensory and cognitive processes. For example, Moreno and
Bidelman (2014) posited a multidimensional continuum model of common processing and cross-
domain plasticity. In this model, the extent of plasticity effects resulting from musicianship is
viewed as a spectrum along two orthogonal dimensions, namely Near-Far and Sensory-
Cognitive. The former describes the extent of plasticity (i.e., within a domain, or across
domains); the latter describes the level of affected processing, ranging from low-level sensory
processing specific to the auditory domain to high-level domain-general cognitive processes,
including executive function and language.
Evidence of overlap of neural regions involved in music and speech (discussed in
Chapters 1 and 3) appears to corroborate the notion of common acoustic processing in musicians,
as compared to nonmusicians. As mentioned earlier, however, co-activation of neural regions in
response to music and speech does not equate to the sharing of neural networks (Peretz
et al., 2015). Furthermore, in contrast to the evidence for common acoustic processing, the
findings described in Chapter 2 suggest that absolute pitch ability and tone-language experience
do not rely on the same mechanisms of pitch processing, such that absolute pitch—but not tone-
language experience—is associated with enhanced encoding of pitch. Additionally, there was no
cumulative effect of absolute pitch ability and speaking a tone language. If absolute pitch ability
and tone-language experience recruited a common pitch processor, then we might expect no
difference in pitch-encoding performance across all participants, because everyone would be
using the same pitch processor. Furthermore, all participants were musically trained, suggesting
they all shared a common baseline of finely-honed acoustic processing abilities. The data from
Chapter 2 therefore suggest that absolute pitch ability and tone-language use may not rely on a
common pitch processor. Alternatively, perhaps absolute pitch taps into additional auditory-
processing networks to facilitate pitch encoding, thus leading to superior performance, as
compared to tone-language speakers.
Cooper and Wang (2012) administered Cantonese tone-word training to tone language
(Thai) and non-tone language (English) speakers. These groups were further subdivided into
musicians and non-musicians. Participants were trained to identify words distinguished by five
Cantonese tones. Measures of music aptitude and phonemic tone identification were then
administered. Participants who were either Thai speakers or musicians were better at Cantonese
word learning than participants with neither type of experience. Having both tone-language
experience and musical training
was not advantageous, however, above and beyond either type of experience alone. These
findings suggest that the networks underlying the processing of verbal tones in musicians and
tone-language speakers confer similar but not cumulative behavioural benefits. That is, tone
language and musicianship may rely on a common pitch processor that is similarly honed by
both types of auditory experience. Similarly, Mok and Zuo (2012) investigated how music
training impacted lexical tone perception in native tone-language speakers. Cantonese and non-
tone language speakers with or without music training performed discrimination tasks with
Cantonese monosyllables and pure tones resynthesized from Cantonese lexical tones. Although
music training was predictive of enhanced lexical tone discrimination among non-tone language
speakers, it showed no such association among Cantonese speakers. These data also suggest that musicianship
and tone language hone common pitch processing abilities. Perhaps once these pitch processing
abilities are acquired via either musicianship or speaking a tone language, there is a ceiling
effect, such that pitch experience via musicianship and tone language does not confer any
additional advantage for pitch processing over either type of experience in isolation.16
Nevertheless, when one considers data from studies that examined the neural correlates
of pitch processing in musicians and tone-language speakers, it is less clear that the use of pitch
in a lexical (tone language) or non-lexical (musical) context similarly hones common pitch
processing abilities (Bidelman et al., 2011a, 2011b). In Bidelman et al. (2011a), both Mandarin
speakers and musicians had stronger brainstem responses to pitch tracking in a musical pitch
interval and in a lexical tone than nonmusician controls. However, in Bidelman et al. (2011b),
Mandarin speakers and musicians had stronger brainstem responses to tuned and detuned
musical chords as compared to controls, but only musicians showed superior pitch discrimination
on a behavioural task. By contrast, Chapter 3 revealed that tone-language speakers and musicians
had similar pitch discrimination ability when measured behaviourally, but only musicians had
enhanced neural responses to pitch changes.
Both of these factors (musicians’ broader range of pitch experience and possible predispositions
for superior pitch processing) may contribute to the larger MMNs observed in musicians, as
compared to the Cantonese and control groups in Chapter 3. However, the comparable
behavioural pitch discrimination performance for musicians and Cantonese speakers raises the
possibility of an alternative explanation. This discrepancy between the neural and behavioural
data in Cantonese speakers implies that there are some shared pitch processing abilities in
musicians and tone-language speakers, but that the networks that support this processing
manifest differently at the cortical level. If these networks do manifest differently at the cortical
level, the differences were too subtle to be detected in the MMN response in Chapter 3. As prefaced in
Section 3.5, the investigation of whether these networks manifest differently at the cortical level
would require an analytic procedure capable of detecting such nuanced effects in the brain
signal, which could also be applied to the existing dataset for direct comparison with the results
reported in Chapter 3. This procedure is the measurement of brain signal variability in EEG data,
which examines the information processing capacity of the brain across multiple timescales
(Heisz et al., 2012; Lippe, Kovacevic, & McIntosh, 2009; McIntosh et al., 2008; McIntosh et al.,
2014). At a more conceptual level, this method examines the brain as a nonlinear dynamical
system (Hutka et al., 2013), as contrasted with a linear, static view of brain activity (see Section
4.2). The next section will compare these linear and nonlinear approaches and expand on the
implications of this nonlinear approach.

16 Note that it may also be possible that the fine-grained nature of representations related to music experience may differ from that of tone-language experience.
4.2 A nonlinear approach to studying the music-language link
Empirical work on the association between music and language has relied on methods that
capture linear dependencies in the data, such as mean activation in or between neural regions
(e.g., Bidelman et al., 2011a, 2011b, Chapter 3). The linear approach captures brain activity as a
static entity (i.e., occurring at a single timescale). For example, in a linear approach to EEG,
waveforms are averaged together across trials. A loss of information is inherent to this process,
as the nonlinear stochastic activity that characterizes variability in each trial disappears as a
result of averaging (Figure 8). In contrast, a nonlinear approach can capture this variability
across time (see Hutka, Bidelman, & Moreno, 2013 for a discussion), thus moving from a static
view of the brain, to measuring the brain in alignment with its natural state (i.e., a complex
nonlinear system).
Figure 8. Loss of information as a result of averaging individual trials in EEG. The variation
between individual trials (left) is lost as a result of the averaging procedure, as evident in the
averaged waveform (right).
4.2.1 The brain as a complex, nonlinear system
Indeed, the brain itself is a complex nonlinear system (Bullmore & Sporns, 2009;
McKenna, McMullen, & Shlesinger, 1994), and requires a nonlinear model for greater
explanatory power of its functions (Hutka et al., 2013). Complex systems are typically
characterized as dynamic (i.e., they change with time), nonlinear (i.e., the effect is
disproportionate to the cause), multifaceted, open, unpredictable, self-organizing, and adaptive
(Larsen-Freeman, 1997, p. 142). Furthermore, the behaviour of a complex system, such as the
brain, does not emerge from any single component but instead from the interaction between its
ever-changing constituent components (Waldrop, 1992, p. 145). If we define the brain as a
complex, nonlinear system, then a linear analysis cannot provide a complete account of neural
functioning and must be complemented with nonlinear techniques.
4.2.2 Application to the study of acoustic processing influenced by experience
To further explore perceptual processing of music and speech in musicians and tone-
language speakers, nonlinear methods will be used. This approach holds the promise of revealing
the nuances that define and distinguish the pitch processing networks in musicians and tone-
language speakers. One nonlinear measure of brain signal variability has been successfully
applied to EEG data to examine how experience with a given stimulus manifests in the stochastic
interactions between brain frequencies (Heisz & McIntosh, 2013; Heisz et al., 2012). This
approach is ideally suited to the present research question, which examines how different types
of experience with a given acoustic cue (e.g., pitch) are differentially associated with brain signal
variability.
4.3 Brain signal variability
In the complex nonlinear system that is the brain, we find inherent variability (Faisal,
Selen, & Wolpert, 2008; Pinneo, 1966; Stein, Gossen, & Jones, 2005; Traynelis & Jaramillo,
1998), fluctuating across time both extrinsically (i.e., during a task; Deco, Jirsa, McIntosh,
Sporns, & Kotter, 2009; Deco, Jirsa, & McIntosh, 2011; Ghosh, Rho, McIntosh, Kotter, & Jirsa,
2008a, 2008b; Raichle, MacLeod, Snyder, Powers, Gusnard, & Shulman, 2001; Raichle &
Snyder, 2007) and intrinsically (i.e., at rest; Deco, Jirsa, & McIntosh, 2011). As discussed in
Faisal et al. (2008), variability arises from two sources: the
deterministic properties of a system (e.g., the initial state of neural circuitry will vary at the start
of each trial, leading to different neuronal and behavioural responses), and “noise”, that is,
disturbances that are not part of meaningful brain activity and thus interfere with meaningful
neural representations. The present work addresses the former type of variability, which reflects
meaningful brain activity rather than, for example, random artifacts inherent to the acquisition
of brain data (e.g., ocular/muscular perturbations or thermal noise from electrodes or MRI
scanners). This brain signal variability (BSV) is synonymous with the transient temporal
fluctuations in the brain signal (Deco et al., 2011; see Section 5.2.5 for BSV-related formulae);
its analysis can be applied to many different types of neuroimaging data.
For example, BSV has been analyzed in EEG (e.g., Heisz et al., 2012), fMRI (e.g.,
Garrett, Kovacevic, McIntosh, & Grady, 2010) and magnetoencephalography (MEG, e.g., Misic
et al., 2010). In the EEG study by McIntosh et al. (2008), BSV was examined using two
measures, namely principal component analysis (PCA, a linear method that was applied in a
nonlinear way) and multiscale entropy (MSE, a nonlinear metric). These measures are
sensitive to linear and nonlinear brain variability and differentiate between changes in the
temporal dynamics of a complex system and those of random variability (Costa, Goldberger, &
Peng, 2002, 2005). MSE indexes the temporal predictability of neural activity; it is calculated by
downsampling single-trial time series to progressively coarser-grained timescales and
calculating sample entropy (i.e., state variability) at each scale (Costa et al., 2005). Such a linear
versus nonlinear differentiation would be useful in qualifying the complexity of complementary
neural networks, particularly temporally sensitive networks such as those responsible for
language and music processing.
Recently, BSV has been found to convey important information about network dynamics,
such as integration of information (Garrett et al., 2013) and distinguishing long-range from local
connections (McIntosh et al., 2014). That is, BSV can serve to reveal a complex neural system
that has capacity for enhanced information processing and alternates between multiple functional
states (Raja-Beharelle et al., 2012). BSV thus affords the appropriate framework with which the
interaction of music and language can be studied, allowing us to view these two systems as
dynamically fluctuating across time.
As discussed in Garrett et al. (2013), the modeling of neural networks involves mapping
an integration of information across widespread brain regions, via emerging and disappearing
correlated activity between areas over time and across multiple timescales (Honey, Kotter,
Breakspear, & Sporns, 2007; Jirsa & Kelso, 2000). These transient changes result in fluctuating
temporal dynamics of the corresponding brain signal, such that more variable responses are
elicited by networks with more potential configurations, i.e., “brain states” (Garrett et al., 2013).
This signal variability is thought to represent the network’s information-processing capacity,
such that variability is positively associated with integration of information across the network
(Garrett et al., 2013). Thus, this variability is experience-dependent (rather than task-dependent),
making such representations a valuable addition to understanding the interaction of neural
mechanisms supporting auditory processing in musicians and tone-language speakers.
4.3.1 Current applications of BSV
The analysis of BSV from EEG, MEG, and fMRI is a new framework in cognitive
neuroscience data analysis. Several studies have focused on developmental applications of BSV,
finding that signal variance increases with age (McIntosh et al., 2008; Misic et al., 2010; Lippe et
al., 2009). Others have also used BSV to better understand brain networks. Garrett et al. (2010)
found that the standard deviation of BOLD signal was five times more predictive of brain age
(from age 20 to 85) than mean BOLD signal. In another study, Garrett et al. (2011) examined
how BOLD variability related to age, reaction time, and consistency in healthy younger (20 to 30
years) and older (56 to 85 years) adults on three cognitive tasks (perceptual matching, attentional
cueing, and delayed match-to-sample). Younger, faster, and more consistent performers
exhibited increased BOLD variability, establishing a functional basis for this often disregarded
measure. These studies collectively demonstrate the importance of shifting from a linear (e.g.,
mean neural response) to a nonlinear (e.g., entropy/variability) conception of complex brain
systems and their relationship to behaviour.
BSV has also been applied to the study of knowledge representation. Heisz et al. (2012)
tested whether BSV reflects functional network reconfiguration during memory processing of
faces. The amount of information associated with a particular face was manipulated (i.e., the
knowledge representation for each face; e.g., a famous face would have more information
associated with it, and thus, greater knowledge representation) while measuring BSV to capture
the EEG state variability. Across two experiments, the authors found greater BSV in response to
famous faces than to non-famous faces, and found that BSV increased with face familiarity.
Notably, these findings were not reflected in the mean ERP amplitude in the same dataset (Heisz
et al., 2012). Heisz et al. (2012) posited that cognitive processes in the perception of familiar
stimuli may engage a broader network of brain regions, which manifest as higher variability in
spatial and temporal brain dynamics (i.e., greater spatiotemporal changes in BSV).
The findings of Heisz et al. (2012) corroborate those of Tononi, Sporns, and Edelman
(1996), who found that the amount of information available for a given stimulus can be
determined by the extent to which the complexity of a stimulus matches its underlying system
complexity. For example, familiar stimuli would elicit a stronger match than novel stimuli, as
there would be more information available on the former, thus yielding greater BSV. These
findings collectively suggest that BSV increases as a result of the increased accumulation of
information within a neural network. Presumably, this type of “build-up” results from the
increased repertoire of brain responses associated with a given stimulus (Ghosh et al., 2008;
McIntosh et al., 2008; Tononi et al., 1994). These findings are applicable to understanding the
music-language link at a network level because brain responses associated with given stimuli
(i.e., differences between musical notes or lexical tones) should commensurately vary in BSV for
a group that has expertise with those stimuli (i.e., musicians or tone-language speakers).
4.4 Moving from theory to application, in the context of the music-language association
In summary, traditional approaches to understanding the brain (e.g., fMRI: mean
activation; ERPs: peak amplitudes) may not afford a complete understanding of neural activity
because they cannot capture nonlinear dependencies in the brain signal. Studies that have used
BSV to quantify knowledge representation (e.g., Heisz et al., 2012) suggest that BSV is a
promising and informative metric of plasticity. If the nonlinear, stochastic activity corresponding
to the neural processing of music and language could be measured in musicians and tone-
language speakers, one could potentially address how music and tone-language experience are
differentially associated with the neural networks underlying pitch processing. Chapter 5
describes a study in which the BSV of the EEG time-series data from Chapter 3 is measured
while participants listened to music and speech sounds.
Chapter 5 Using Brain Signal Variability to Examine Differences between
Musicians and Tone Language Speakers
5.1 Introduction
5.1.1 Brain signal variability: A recapitulation
The previous chapter outlined how nonlinear analyses could be used to gain a deeper
understanding of the music-language association. One specific nonlinear approach that holds
great potential for understanding the neural mechanisms underlying auditory processing in
musicians and tone-language speakers is the measurement of BSV. There is strong evidence
showing that BSV serves as a metric of neural-network dynamics, which provides valuable
information about these dynamics that could not be obtained through the sole measurement of
mean neural activity (e.g., using ERPs, Heisz et al., 2012; McIntosh et al., 2008; Vakorin et al.,
2011; see also: Garrett et al., 2011, Ghosh et al., 2008). Previous findings suggest that BSV
reflects the brain’s information processing capacity, such that a more variable signal indicates
greater cross-network information integration (e.g., Heisz et al. 2012; Misic et al., 2010). Studies
have shown that the more information available to a listener about a given stimulus, the greater
the BSV in response to that stimulus (Heisz et al. 2012; Misic et al. 2010). Variability should
therefore increase as a function of learning, such that the more information one acquires for a
stimulus, the greater the information carried in the brain signal (Heisz et al., 2012).17 For these
reasons, I posited that BSV might have great potential for studying group differences in auditory
processing of musicians and tone-language speakers.

17 Note that variability in BSV is determined both intra-trial and intra-subject. Thus, one may have two participants with highly variable BSV, but with different data values.

5.1.2 The present investigation
In the current investigation, the theoretical concept of using BSV to study the music-
language association was applied to the EEG data collected from the study described in Chapter
3 (please see Section 3.2 for information on participants, cognitive tests, and EEG stimuli).
While we observed that both musicians and Cantonese speakers showed superior performance on
a behavioural pitch discrimination task, as compared to controls, only musicians had an
enhanced MMN in response to pitch (i.e., indexing better automatic pitch discrimination), as
compared to Cantonese speakers and controls. One explanation for these results is that there are
unique neural circuitries associated with pitch processing in Cantonese speakers that were not
characterized in ERP measures. As discussed earlier, BSV has been shown to provide
information above and beyond what is learned from mean activation, and can index knowledge
representation supporting the processing of a given stimulus (e.g., Heisz et al., 2012). Therefore,
the current investigation measured BSV during these groups’ processing of pitch (as compared to
a non-pitch cue, namely speech timbre), with the objective of better understanding how
musicians and Cantonese speakers differ with respect to the information processing capacity of
neural networks supporting auditory processing. This examination would directly address
whether musicianship or tone-language experience is reflected in similar or different information
processing capacities of pitch versus timbre. Furthermore, comparing the spatiotemporal profile
of music and speech processing for each group could reveal similarities and differences. In
addition, measuring BSV would allow us to examine how experience with one auditory cue (e.g.,
pitch) transfers to the processing of another auditory cue (e.g., timbre).
To this end, we measured BSV of the EEG during auditory processing of music (pitch
variation) and vowels (timbre variation) in musicians, Cantonese speakers, and non-musician
controls. This design tested whether pitch processing is supported by common neural network
activations in musicians and Cantonese speakers. I hypothesized that if musical training and
speaking Cantonese similarly tune information processing supporting music and speech, then
both groups would show greater BSV supporting auditory processing relative to that of controls
(i.e., musicians = Cantonese speakers > controls). If auditory expertise and/or pre-existing
differences between musicians and Cantonese speakers differentially impact information
processing capacity, then one would predict different BSV between musicians and tone-language
speakers. This latter prediction would also manifest in unique spatiotemporal distributions for
each group, as each group would be using a different brain network to support processing of
pitch versus timbre.
5.2 Methods
5.2.1 EEG recording and pre-processing
Following the EEG recording and pre-processing described in Section 3.2.5.1, source
estimation was performed at 72 regions of interest defined in Talairach space (Diaconescu,
Alain, & McIntosh, 2011) using sLORETA (Pascual-Marqui, 2002), as implemented in
Brainstorm (Tadel, Baillet, Mosher, Pantazis, & Leahy, 2011). Source reconstruction was
constrained to the cortical mantle of the standardized brain template MNI/Colin27 defined by the
Montreal Neurological Institute in Brainstorm. Current density for one source orientation (X
component) was mapped at 72 brain regions of interest adapting the regional map coarse
parcellation scheme of the cerebral cortex developed in Kotter and Wanke (2005). MSE was
calculated on the source waveform at each region of interest (ROI) for each participant.
5.2.2 Multiscale entropy analysis
To characterize BSV, multiscale entropy (MSE; Costa et al., 2002, 2005) was measured, as it
indexes sample entropy (Richman & Moorman, 2000) across multiple timescales. MSE was calculated in two
steps using the algorithm available at www.physionet.org/physiotools/mse (Goldberger et al.,
2000). First, the EEG signal was progressively down-sampled into multiple coarse-grained
timescales where, for scale τ, the time series is constructed by averaging the data points with
non-overlapping windows of length τ. Each element of a coarse-grained time series, $y_j^{(\tau)}$, is
calculated according to Eq. (1):

$$y_j^{(\tau)} = \frac{1}{\tau} \sum_{i=(j-1)\tau + 1}^{j\tau} x_i, \qquad 1 \le j \le \frac{N}{\tau} \qquad (1)$$

where $x_i$ denotes the original time series of length $N$.
The number of scales is determined by the number of data points in the signal; the data in the
present study supported 12 timescales [sampling rate (512 Hz) × epoch duration (1200 ms) ÷ 50
time points per epoch = a maximum of 12 scales]. To convert a timescale into
milliseconds (ms), the timescale was divided by the EEG sampling rate (512 Hz).
Second, the algorithm calculates the sample entropy (SE) for each coarse-grained time
series (Eq. (2)):

$$SE(m, r, N) = -\ln \frac{n_{m+1}(r)}{n_{m}(r)} \qquad (2)$$

where $n_{m}(r)$ is the number of pairs of sequences of $m$ consecutive data points that match each other within the criterion $r$.
Sample entropy quantifies the predictability of a time series by calculating the conditional
probability that any two sequences of m consecutive data points that are similar to each other
within a certain criterion (r) will remain similar at the next point (m+1) in the data set (N), where
N is the length of the time series (Richman & Moorman 2000). In the present study, MSE was
calculated with pattern length set to m=5 and the similarity criterion was set to r=1. MSE
estimates were obtained for each participant as the mean across single trial entropy measures for
each timescale.
5.2.3 Spectral analysis
Power spectral density (PSD) was also measured for all trials. This spectral analysis was
conducted because previous studies suggested that changes in MSE tend to follow closely
changes in spectral power, while providing unique information about the data (Gudmundsson et
al., 2007; Lippe et al., 2009; McIntosh et al., 2008; McIntosh et al,. 2008; Misic et al., 2010).
Therefore, changes in sample entropy across sources and temporal scales were examined, as well
as at changes in PSD across sources and frequency bands.
Single-trial power spectra were computed using the Fast Fourier transform (FFT). To
capture the relative contribution from each frequency band, all time series were first normalized
to a mean of 0 and SD of 1. Given a sampling rate of 512 Hz and 614 data points per trial, the
effective frequency resolution was 0.834 Hz. Hence, all spectral analyses were constrained to a
bandwidth of 0.834-50 Hz.
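The single-trial spectral pipeline just described (normalization to mean 0 and SD 1, FFT, restriction to the 0.834-50 Hz band) can be sketched as follows. This is an illustrative reconstruction under the parameters given above (512 Hz sampling, 614 points per trial); the function name is an assumption:

```python
import numpy as np

def single_trial_psd(trial, fs=512.0, fmax=50.0):
    """Power spectrum of one EEG trial, normalized to mean 0 and SD 1
    so that the relative contribution of each frequency band is captured."""
    trial = np.asarray(trial, dtype=float)
    trial = (trial - trial.mean()) / trial.std()
    n = len(trial)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    power = np.abs(np.fft.rfft(trial)) ** 2 / n
    # Effective frequency resolution: fs / n (512 / 614 ≈ 0.834 Hz).
    resolution = fs / n
    keep = (freqs >= resolution) & (freqs <= fmax)
    return freqs[keep], power[keep]
```

For a 614-point trial, the lowest retained frequency bin sits at the 0.834 Hz resolution limit, matching the bandwidth reported above.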
5.2.4 Statistical analysis
5.2.4.1 Cognitive measures
A univariate ANOVA was run for each cognitive test.
5.2.4.2 Task Partial Least Squares Analysis
Task partial least squares analysis (PLS; McIntosh, Bookstein, Haxby, & Grady, 1996)
was used to assess between- and within-subjects changes in MSE during performance. Task PLS
is a multivariate statistical technique that employs singular value decomposition (SVD) to extract
latent variables (LVs) that capture the maximum covariance between the task design and neural
activity. The data matrix containing participants in each group by MSE values across the 72
brain regions and sampling scales was mean-centered with respect to the column grand average.
SVD was then applied to the matrix to generate mutually orthogonal LVs, with descending order
of magnitude of covariance accounted for. Each LV consisted of: (1) a pattern of design scores,
(2) a singular image showing the distribution across brain regions and sampling scales, (3) a
singular value representing the covariance between the design scores and the singular image
(McIntosh et al., 1996; McIntosh & Lobaugh, 2004).
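The decomposition described above can be sketched as follows. This is a simplified illustration of mean-centred task PLS, not the PLS software actually used; the function name, the input layout (participants stacked by group/condition cell, columns spanning the 72 ROIs × timescales), and the omission of the permutation and bootstrap steps are all simplifying assumptions:

```python
import numpy as np

def task_pls(data, group_sizes):
    """Mean-centred task PLS sketch. Rows of `data` are participants
    (stacked cell by cell), columns are MSE values over ROIs x timescales.
    Returns design scores (u), singular values (s), singular images (vt),
    and per-participant brain scores."""
    # Cell means: one row per group/condition cell.
    bounds = np.cumsum([0] + list(group_sizes))
    cell_means = np.array([data[bounds[i]:bounds[i + 1]].mean(axis=0)
                           for i in range(len(group_sizes))])
    # Mean-centre with respect to the column grand average.
    centred = cell_means - cell_means.mean(axis=0)
    # SVD yields mutually orthogonal LVs in descending order of covariance.
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    # Brain score: dot product of raw data with each singular image.
    brain_scores = data @ vt.T
    return u, s, vt, brain_scores
```

Each column of `brain_scores` indicates how strongly each participant expresses the corresponding latent variable's singular image, which is what permits group-wise confidence intervals on the effects.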
The statistical significance of each LV was determined using permutation tests (McIntosh
& Lobaugh, 2004). An LV was considered significant if its singular value was exceeded in fewer
than 5% of random permutations (i.e., p < .05). The reliability of each statistical effect was
assessed through bootstrap estimation of standard error confidence intervals of the singular
vector weights in each LV (Efron & Tibshirani, 1986). In the present study, this process allowed
for the assessment of the relative contribution of brain regions and timescales to each LV. Brain
regions with a ratio of singular vector weight to standard error greater than 3.0 (corresponding to
a 99% confidence interval) were considered reliable (Sampson, Streissguth, Barr, & Bookstein,
1989). Such effects are therefore designated as “reliably expressed” throughout the results
section. In addition, the dot product of an individual participant’s raw MSE data and the singular
image from the LV produces a brain score. The brain score is similar to a factor score that
indicates how strongly a participant expresses the patterns on the latent variable and allowed us
to estimate 95% confidence intervals for the effects in each group and task condition.
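The permutation logic for assessing an LV's significance can be sketched as a self-contained routine that recomputes the first singular value of the mean-centred group-means matrix under random reassignment of participants. This mirrors the procedure described above but is not the exact software used; the function names and the permutation count are assumptions:

```python
import numpy as np

def first_singular_value(data, group_sizes):
    """Largest singular value of the mean-centred cell-means matrix."""
    bounds = np.cumsum([0] + list(group_sizes))
    means = np.array([data[bounds[i]:bounds[i + 1]].mean(axis=0)
                      for i in range(len(group_sizes))])
    centred = means - means.mean(axis=0)
    return np.linalg.svd(centred, compute_uv=False)[0]

def lv_permutation_p(data, group_sizes, n_perm=500, seed=0):
    """P-value for LV1: the proportion of permutations in which shuffling
    participants across cells yields a singular value at least as large as
    the observed one (significant if p < .05)."""
    rng = np.random.default_rng(seed)
    observed = first_singular_value(data, group_sizes)
    exceed = sum(
        first_singular_value(data[rng.permutation(len(data))],
                             group_sizes) >= observed
        for _ in range(n_perm))
    return exceed / n_perm
```

A bootstrap analogue would instead resample participants within groups with replacement and examine the stability of the singular vector weights, yielding the weight-to-standard-error ratios used to flag reliably expressed regions.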
The large and small deviant conditions were combined into a single condition for all
analyses, as there were no differences in MSE or PSD between these conditions. For the
between-groups comparisons, the grand mean was subtracted over all groups and conditions. For
between-conditions comparisons, the mean was subtracted from each group, rather than across
all groups, thus displaying how condition effects are associated with group membership.
5.3 Results
5.3.1 Task PLS: Multiscale entropy and spectral data
All groups and conditions were entered into the task PLS; Figures 9, 11, and 12 show
both MSE and spectral data.
5.3.1.1 Between-group comparisons
When comparing groups across all conditions (Figure 9; see also Figure 10, which shows sample entropy curves for each timescale, averaged across all conditions), the first latent variable (LV1) of the MSE analysis captured greater sample entropy in the musician group as compared to the Cantonese group (LV1, p = .004, singular value = 1.0856, corresponding to 43.82% of the covariance). This difference was reliably expressed at both fine and coarse timescales across all neural ROIs, particularly in the right hemisphere. The largest effects were seen across
all timescales (particularly, in coarse scales) in the right inferior parietal, angular gyrus, and
primary somatosensory area; medial posterior cingulate; and bilateral primary motor, medial
premotor, precuneus, cuneus, and superior parietal area.
LV1 of the spectral analysis captured differences in the musician group as compared to
the control and Cantonese groups (LV1, p = .012, singular value = 0.1626, corresponding to
37.01% of the covariance). This difference was reliably expressed across frequencies that were
lower than 20 Hz (primarily theta/alpha band: 4-12 Hz) in a number of brain regions similar or
identical to those observed in the MSE results.
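The kind of band-limited spectral measure referenced here (e.g., theta/alpha power below 20 Hz) can be approximated with a simple Hann-windowed periodogram. This sketch is purely illustrative and does not reproduce the thesis's actual spectral pipeline; the sampling rate and band edges are assumptions.

```python
import numpy as np

def band_power(signal, fs, f_lo, f_hi):
    """Mean power spectral density within [f_lo, f_hi] Hz, estimated
    with a Hann-windowed periodogram of the full signal."""
    win = np.hanning(len(signal))
    spec = np.abs(np.fft.rfft(signal * win)) ** 2
    spec /= (win ** 2).sum() * fs  # scale to a density
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return spec[mask].mean()
```

For example, a pure 10 Hz tone yields far more power in a 4-12 Hz (theta/alpha) band than in an 18-30 Hz (beta) band.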
Collectively, PLS analyses revealed that each group could be distinguished based on the
variability (MSE) and spectral details of their EEG (particularly in the right hemisphere) when
listening to speech and music stimuli. Furthermore, in the areas in which these contrasts were
reliably expressed (e.g., right angular gyrus, Figure 10), musicians had the greatest sample
entropy across all conditions; Cantonese speakers had the lowest sample entropy; nonmusicians
were in between these two groups.
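For readers unfamiliar with the measure, the multiscale entropy values compared above are computed by coarse-graining the signal at each timescale and then taking its sample entropy. The following is a simplified sketch of that standard procedure; the tolerance setting is chosen for toy data and is not the parameterization used in the thesis.

```python
import numpy as np

def coarse_grain(x, scale):
    """Average consecutive non-overlapping windows of length `scale`."""
    n = len(x) // scale
    return x[:n * scale].reshape(n, scale).mean(axis=1)

def sample_entropy(x, m=2, r_factor=0.5):
    """SampEn: negative log of the conditional probability that segments
    matching for m points (within tolerance r) also match for m + 1."""
    r = r_factor * np.std(x)

    def match_count(length):
        seg = np.array([x[i:i + length] for i in range(len(x) - m - 1)])
        dist = np.abs(seg[:, None] - seg[None, :]).max(axis=2)
        return (dist <= r).sum() - len(seg)  # drop self-matches

    a, b = match_count(m + 1), match_count(m)
    return -np.log(a / b) if a > 0 else np.inf

def mse_curve(x, max_scale=5):
    """Sample entropy of the signal at successively coarser timescales."""
    return [sample_entropy(coarse_grain(x, s))
            for s in range(1, max_scale + 1)]
```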
Figure 9. First latent variable (LV1), between-groups comparison: Contrasting the EEG response
to the music and speech conditions across measures of multiscale entropy (left) and spectral
power (right). The bar graphs (with standard error bars) depict brain scores that were
significantly expressed across the entire data set as determined by permutation tests at 95% confidence intervals. The image plot highlights the brain regions and timescale or frequency at which a given contrast was most stable; values represent approximate z-scores, and negative values denote
significance for the inverse condition effect.
Figure 10. MSE curves for all groups, averaged across all conditions, at the right angular gyrus.
5.3.1.2 Between-conditions comparisons
LV1 for the MSE analysis (Figure 11) captured differences in sample entropy between
the music and speech conditions for nonmusicians (p = .002, singular value = 0.2518,
corresponding to 22.85% of the covariance). These differences were reliably expressed at fine
timescales in several left hemisphere areas, namely the anterior insula, centrolateral and
dorsomedial prefrontal cortex, frontal polar area, and secondary visual areas. Specifically,
greater information processing capacity for speech, as compared to music, was observed in these
left-hemisphere regions. Differences were also reliably expressed in the right primary and
secondary visual areas, and the cuneus. Namely, greater information processing capacity for
music, rather than speech, was observed in these right-hemisphere regions.
Similarly, LV1 of the spectral analysis captured spectral differences between the music
and speech conditions for nonmusicians (p < .001, singular value = 0.0533, corresponding to
25.28% of the covariance). Processing of music, as compared to speech, was reliably expressed
at frequencies below 10 Hz (e.g., theta, 4-7 Hz for the music condition) in multiple left
hemisphere regions, namely the left anterior insula, claustrum, centrolateral and dorsomedial
prefrontal cortex, frontal polar, parahippocampal cortex, thalamus, and dorsolateral and
ventrolateral premotor cortex. These differences were also expressed in the midline posterior
cingulate cortex, and the right cuneus, thalamus, and ventrolateral prefrontal cortex. Processing
of speech, as compared to music, was reliably expressed in frequencies above 12 Hz (e.g., beta,
12 - 18 Hz; gamma, 25 – 70 Hz for the speech stimuli) in multiple left-hemisphere areas, namely
the left anterior insula, centrolateral and dorsomedial prefrontal cortex, orbitofrontal cortex,
frontal polar, and dorsolateral premotor cortex. These differences were also expressed in the
right primary motor area, precuneus, and the dorsolateral prefrontal cortex.
Figure 11. First latent variable (LV1), between-conditions comparison: Contrasting the EEG
response to the music and speech conditions across measures of multiscale entropy (left) and
spectral power (right) for nonmusicians. The bar graphs (with standard error bars) depict brain
scores that were significantly expressed across the entire data set as determined by permutation
tests at 95% confidence intervals. The image plot highlights the brain regions and timescale or
frequency at which a given contrast was most stable; values represent approximate z-scores, and negative
values denote significance for the inverse condition effect.
LV2 for the MSE analysis (Figure 12) captured differences in sample entropy between
the music and speech conditions for Cantonese speakers (p = .052, singular value = 0.2029,
corresponding to 18.41% of the covariance). Specifically, greater information processing
capacity for music, rather than speech, was reliably expressed in the midline posterior cingulate
and retrosplenial cingulate cortex at fine timescales, and the primary visual area at coarse
timescales. Greater information processing capacity for speech, rather than music, was expressed
in the left medial premotor cortex and right medial premotor cortex at coarse timescales.
Similarly, LV2 of the spectral analysis captured spectral differences between the music
and speech conditions for Cantonese speakers (p = .036, singular value = 0.0382, corresponding
to 18.12% of the covariance). The processing of speech, as compared to music, was reliably
expressed at frequencies below 10 Hz (e.g., theta, 4-7 Hz) in the bilateral medial premotor
cortex. The processing of the music condition, as compared to speech, was reliably expressed in
low-frequency activity (e.g., theta, 4-7 Hz) in the left parahippocampal cortex, and right anterior
insula, ventral temporal cortex, and fusiform gyrus. Processing of music was also reliably
expressed at frequencies above 12 Hz (e.g., beta, 12 - 18 Hz; gamma, 25 – 70 Hz), in the midline
posterior and retrosplenial cingulate cortex, left superior parietal cortex, and bilateral primary
and secondary visual areas.
Figure 12. Second latent variable (LV2), between-conditions comparison: Contrasting the EEG
response to the music and speech conditions across measures of multiscale entropy (left) and
spectral power (right) for Cantonese speakers. The bar graphs (with standard error bars) depict
brain scores that were significantly expressed across the entire data set as determined by
permutation tests at 95% confidence intervals. The image plot highlights the brain regions and
timescale or frequency at which a given contrast was most stable; values represent approximate z-scores, and
negative values denote significance for the inverse condition effect.
The third latent variable (LV3), contrasting the music and speech conditions for the musician group, was not significant (MSE: p = .256; spectral analysis: p = .210). While it is possible that this effect would become significant with a larger sample size, the bootstrap-estimated standard errors were small, suggesting that this lack of an effect was reliable (i.e., a stable-zero estimate; see McIntosh & Lobaugh, 2004). The fact that musicians did not distinguish between music and speech stimuli is important because it suggests that this group used a similar neural architecture to process acoustic information, regardless of the stimulus domain (i.e., music ≈ speech).
Collectively, the between-condition analyses revealed that each group processed the
distinction between music and speech using a unique spatiotemporal network. LV1 showed that
nonmusicians had greater sample entropy and higher-frequency activity for speech than for music in several left-hemisphere areas. LV2 showed that Cantonese speakers had greater sample entropy for music than for speech, particularly in midline regions. The spectral analyses revealed that this
contrast was also expressed across multiple frequency bands. LV3, which was not significant,
suggested that musicians used similar neural networks to support the processing of both music
and speech stimuli. Alternatively, the passive paradigm and relatively simple auditory stimuli
might not have engaged different networks for pitch and timbre in musicians. Measuring BSV
during an active task in future studies would help eliminate this possibility.
5.4 Discussion
5.4.1 MSE data
By examining sample entropy between groups, this study sought to test if musicians and
tone-language (Cantonese) speakers have similar information processing capacity supporting
music and speech listening via distinct neural networks. Between groups, we found that
musicians had greater BSV (i.e., information processing capacity) than nonmusicians when
listening to both music and speech stimuli. Cantonese speakers had the lowest entropy of all
three groups for both stimulus conditions. Though this pattern of results was evident across
multiple neural regions and timescales, it was particularly prominent in right hemisphere regions
at coarse timescales. These data support the hypothesis that musicianship and the use of a tone
language are differentially associated with the information processing capacity supporting both
music (pitch) and speech (timbre) processing. This differential association may be due to
musicians’ extensive experience with pitch as compared to Cantonese speakers, as well as their pre-existing differences that distinguish them from nonmusicians (e.g., SES; musical aptitude;
personality; genetic factors, see Chapter 1.4 and Schellenberg, 2015 for a discussion). Future
studies could examine how each of these factors is related to the BSV supporting pitch
processing in musicians, Cantonese speakers, and controls, to better understand the contributions
of nature and nurture to BSV.
The finding that musicians’ increased BSV was most prominent in the right hemisphere
corroborates the finding that this hemisphere is engaged in fine spectral features of auditory
input, as compared to the left hemisphere, which is more specialized for temporal processing (see
Zatorre et al. 2002 for a review). Similarly, expression in coarse timescales suggests that the
information processing capacity of pitch and timbre is distributed across the brain, rather than
locally based (Vakorin et al. 2011). Collectively, our findings indicate that musicians’ processing
of fine spectral features – both for pitch and timbre – is likely supported by a wider network than
in Cantonese speakers and English-speaking nonmusicians. These data are consistent with other
evidence indicating that music training is associated with a wide range of benefits in spectral
processing (e.g., Bidelman & Krishnan 2010; Chandrasekaran et al. 2009; Parbery-Clark, Skoe,
Lam et al. 2009; Parbery-Clark et al. 2013; Schoen et al. 2004; Zendel & Alain 2012). These
effects may be due to the proliferation of brain networks supporting auditory processing in
musicians, as shaped by pre-existing differences and musical training.
In the between-conditions results, we found that each group engaged unique
spatiotemporal distributions to process the differences between music and speech. Nonmusicians
had greater information processing capacity for speech than music (Figure 11). This difference
was expressed primarily in several left hemisphere areas at fine timescales. The lateralization of
this result is consistent with reports that in musically naïve listeners, speech processing is more
left-lateralized than music, given the left hemisphere’s specialization for temporal processing
(see Zatorre et al. 2002 for a review). These findings also suggest that nonmusicians may have
greater, locally-based information processing capacity for speech, as compared to music (see
Vakorin et al. 2011). This group’s processing of music was right-lateralized, aligning with
evidence for right-hemisphere specialization for spectral processing (Zatorre et al., 2002).
Cantonese speakers had greater sample entropy for music as compared to speech (Figure
12). This distinction was expressed primarily in the midline posterior cingulate and retrosplenial
cingulate cortex at fine timescales. This finding suggests that Cantonese speakers’ use of lexical
pitch may manifest as greater sample entropy for this cue, as compared to timbre. This finding aligns with the idea that the more familiar one is with a stimulus, the greater the sample entropy associated with processing that stimulus (e.g., familiar versus unfamiliar faces; Heisz et al.,
2012). Finally, BSV in musicians did not distinguish between processing music and speech
sounds, which implies that the spectral acuity associated with extensive training in music is
associated with enhanced information processing capacity that supports both pitch and timbral
cues. Collectively, our data demonstrate that each group processes the distinction between music
and speech using a different spatiotemporal network. Furthermore, the activation patterns for
each group suggest a gradient of pitch processing capacity, which is consistent with the proposal
that the more experience one has with pitch (i.e., musicians > Cantonese > nonmusicians), the
greater sample entropy associated with processing this cue. Namely, nonmusicians had greater
sample entropy for speech as compared to music; Cantonese speakers had greater sample entropy for music than speech; musicians had similar levels of sample entropy for both
conditions. An analogous gradient was observed in behavioural data for a pitch memory task in
Bidelman et al. (2013). This gradient effect suggests that demands of each type of auditory
experience are associated with differential information processing capacities (von Stein &
Sarnthein, 2000).
5.4.2 Comparing MSE results to the spectral analysis results
The MSE analyses yielded some unique information that was not obtained in the spectral
analyses, as well as data that were complementary to the spectral analysis. Between-group
comparisons of sample entropy revealed that musicians had greater brain signal complexity than
tone-language speakers across all conditions. In contrast, spectral analyses revealed that
musicians’ processing of all conditions drew more heavily upon low, theta/alpha (4-12 Hz)
frequencies than the other groups. Low frequencies of the EEG have traditionally been
interpreted as reflecting long-range neural integration (von Stein & Sarnthein, 2000). Both the
MSE and spectral results were also observed in similar brain regions. Collectively, both types of
analyses suggest long-range and more “global” processing of auditory stimuli among musicians
compared to tone-language speakers or nonmusicians. However, unlike the spectral results, the
MSE data speak to the information processing capacity of the underlying networks.
This global processing aligns with multiple neuroimaging findings in which musicians
had increased inter-hemispheric communication as compared to nonmusicians. For example,
musicians – relative to nonmusicians – have a larger anterior corpus callosum, which supports such inter-hemispheric communication by connecting premotor, supplementary motor, and motor cortices (Schlaug et al., 1995). Numerous studies have since
found differences in the corpus callosum between musicians and nonmusicians (Hyde et al.,
2009; Schlaug et al., 2005; Schlaug et al., 2009; Schmithorst & Wilke, 2002; Steele et al., 2013),
particularly in regions connecting motor areas (Schlaug et al. 2005; Schlaug et al., 2009). These
differences may be honed by the bimanual coordination related to playing an instrument (Moore
et al., 2014), or by pre-existing differences that distinguish musicians from nonmusicians (see
Schellenberg, 2015 for a discussion).
Between-condition comparison of sample entropy revealed that each group showed
unique spatiotemporal distributions in their response to processing music and speech.
Nonmusicians had greater information processing capacity for speech than music at fine timescales in
several left hemisphere areas (e.g., anterior insula, centrolateral and dorsomedial prefrontal
cortex, frontal polar area). The spectral data revealed beta and gamma frequency activity when
processing speech (as compared to music) in similar neural regions as found in the MSE
analysis. High-frequency activity has been associated with local perceptual processing (von Stein
& Sarnthein, 2000), and is in accordance with the fine-timescale (i.e., local) activation observed
in our MSE analysis (Vakorin et al., 2011).
For the music condition, the spectral data from nonmusicians differed from the MSE
analyses. Specifically, low-frequency (theta) activation was associated with music processing in
many of the same regions that expressed higher frequencies when processing speech. These data
suggest that nonmusicians may utilize longer-range neural integration to process music (von
Stein & Sarnthein, 2000). However, this difference was not reflected in the MSE analysis (i.e.,
no increase in sample entropy at coarse timescales for the music condition), which implies that
nonmusicians do not have increased information processing capacity for music, relative to
speech. This interpretation is plausible because nonmusicians have experience casually listening
to music, yet they do not have the pitch-processing experience possessed by musicians – or, to a
lesser extent, Cantonese speakers.
In the MSE results for the Cantonese speakers, there was greater sample entropy for
music as compared to speech – a difference that was primarily expressed at fine timescales in
midline regions. Similarly, the spectral data showed that processing of music, as compared to
speech, was associated with beta and gamma frequencies in similar neural regions as in the MSE
results. Both the fine timescale and high-frequency activity suggest that the processing of music
versus speech in Cantonese speakers relies on locally rather than globally distributed networks
(Vakorin et al., 2011; von Stein & Sarnthein, 2000). There was also low-frequency (i.e., theta)
activation associated with processing music, particularly in several right hemisphere areas (e.g.,
anterior insula, ventral temporal cortex, and fusiform gyrus), and with processing speech in the
bilateral medial premotor cortex. This low-frequency activity suggests that Cantonese speakers utilize
long-range neural integration to process music and speech (von Stein & Sarnthein, 2000). This
finding is not consistent with either the locally based complexity observed in the MSE data or the local, high-frequency activity observed in the spectral data. Future
research could seek to clarify the global versus local nature of neural networks that support
music and speech processing in Cantonese speakers.
5.4.3 Comparisons to event-related potential findings (Chapter 3)
In Chapter 3, the MMN response was measured in the same three participant groups as an
index of early, automatic cortical discrimination of music and speech sounds. In that analysis,
only musicians showed an enhanced MMN response to both music and speech, which is
consistent with the current between-group effects observed in the present chapter. That is,
compared to Cantonese speakers and controls, musicians showed greater automatic processing
(Chapter 3; Hutka, Bidelman, & Moreno, 2015) and information processing capacity (present
study) used in the processing of both music and speech.
However, in Chapter 3, no differences were observed for any group in MMN amplitude
to music or speech stimuli. In contrast, between-condition differences in both sample entropy and spectral characteristics were observed in controls and Cantonese speakers. Furthermore, each group had a
unique spatiotemporal distribution in response to music and speech. Despite having lower
sample entropy than musicians or nonmusicians across all conditions, Cantonese speakers
showed greater sample entropy for music as compared to speech. These data suggest that
Cantonese speakers have larger information processing capacity for pitch than timbre. In
contrast, MMNs did not reveal a difference in automatic processing of music versus speech in
the Cantonese group (Chapter 3; Hutka et al. 2015). The differences between the MMN findings
and the present results suggest that the nonlinear analyses provided additional, more fine-grained
information about between-condition differences (see Chapter 4 and Hutka et al. 2013 for a
discussion). That is, the averaging conducted to increase the signal-to-noise ratio in ERP
analyses may eliminate important signal variability that carries information about brain
functioning (Chapter 4; Hutka et al. 2013).
5.5 Limitations
It is important to note that further research is required to delineate specific differences in
the nature of the underlying neural activity between groups and conditions. Between-group
differences in pitch processing demands may relate to the spatiotemporal differences observed at
present (i.e., musicians > Cantonese > controls), as well as pre-existing group differences (see
Section 5.4.1 for a discussion). Furthermore, it is plausible that musicians and Cantonese
speakers’ precise use of pitch is more similar to one another than Cantonese speakers’ and controls’ use of
pitch, explaining the observed between-conditions results. To specifically relate neural activity to
pitch processing demands, these demands would need to be precisely quantified. Future studies
could accomplish this by training naïve participants to distinguish between different numbers of
lexical tones (e.g., one group learns three tones, another learns four tones, etc.), and then
measuring BSV when processing learned (versus unlearned) tones.
5.6 Conclusions
The present data suggest that the use of pitch for musicians relative to tone-language
speakers is associated with different information processing capacities. Furthermore, each
group’s pitch processing was associated with a unique spatiotemporal distribution, suggesting
that musicianship and tone language do not share processing resources for pitch, but instead, use
different networks. These data also serve as a proof-of-concept of the theoretical premise
outlined in Chapter 4 (see Hutka et al., 2013), namely how applying a nonlinear approach to the
study of the music-language association can advance our knowledge of each domain, as well as
the role of experience-dependent plasticity and pre-existing differences.
86
The present chapter represented the conclusion of investigating how speaking a tone
language confers benefits to perceptual processes (i.e., spectral acuity), and how tone-language
speakers are similar to and different from musicians. As discussed in Chapter 1, the extent to
which using a tone language may confer benefits to executive function is still unknown. The
investigation of how speaking a tone language impacts executive function is thus the focus of
Chapter 6.
Chapter 6 Tone Language, Musicianship, and Executive Function
6.1 Introduction
6.1.1 Is tone-language experience associated with enhancement in executive function, as is musicianship?
The previous studies in this thesis examined associations of tone-language experience with
spectral acuity, especially as compared with musicians. Convergent evidence from behavioural
data as well as linear and nonlinear dependencies in neuroimaging data revealed some positive
associations between auditory processing and tone-language experience, as was the case for
musicians. Musicians and tone-language speakers appear to have similar pitch discrimination
abilities, as revealed in some behavioural tasks, despite musicians showing larger automatic
responses to pitch changes than tone-language speakers and nonmusicians who do not use a tone
language (Chapter 3). There were suggestions, moreover, of a gradient of pitch processing at the
neural level such that the more extensive experience one had with pitch processing, the greater
BSV observed during pitch processing (Chapter 5).
Because of claims that musicianship enhances skills beyond the perceptual realm such as
auditory working memory (Pallesen et al., 2010; Parbery-Clark, Skoe, & Kraus, 2009), verbal
and visual memory (e.g., George & Coch, 2011), response inhibition (Moreno et al., 2011),
verbal fluency, processing speed, and task switching (Zuk et al., 2014), it is reasonable to ask
whether tone-language experience is linked to comparable enhancements. Given that
musicianship and tone-language experience have comparable links to some auditory processes
(e.g., pitch processing), it is of interest to examine whether these processes extend to and are
modulated by executive function.
6.1.2 Cognitive benefits in balanced, tone-language bilinguals
The tone-language speakers in the present study, as well as in previous chapters, were
proficient in two languages. Note, however, that nearly 86% of the musicians and 75% of the
nonmusicians had some experience with a second language (see Section 6.3.1). The effects of
bilingualism on cognitive processing, and executive control in particular, have been studied
extensively (see Bialystok, Craik, & Luk, 2012 for a review). Many studies have focused on the
hypothesized bilingual advantage in inhibitory control (see Hilchey & Klein, 2011 for a review)
and working memory (e.g., Bialystok & Feng, 2010; Engel de Abreu, 2011; Morales, Calvo, &
Bialystok, 2013). For example, Morales, Calvo, and Bialystok (2013) showed that bilingual
children performed better than monolingual children regardless of working memory load;
furthermore, bilingual children excelled relative to monolingual children when tasks made
additional demands on executive function. However, other studies did not observe working
memory differences between bilingual and monolingual children (Bialystok & Feng, 2010; Engel de Abreu, 2011). Morales et al. (2013) posited, however, that these studies did not detect this advantage because both required verbal processing (Bialystok & Feng, 2010: recalling lists of words; Engel de Abreu, 2011: tasks involving words and digits) – a skill on which bilingual children tend to perform more poorly than monolinguals (e.g., Gollan, Montoya, & Werner, 2002; Portocarrero, Burright, & Donovik, 2007; Rosselli et al., 2000).
Unlike these two studies that involved verbal processing, the tasks used in Morales et al. (2013)
had low verbal requirements, reducing the possibility that verbal processing confounded their
results.
Notably, there does not appear to be a difference in cognitive abilities between bilinguals
who speak a tone language versus a non-tone language (task switching: Barac & Bialystok,
2012; receptive vocabulary: Bialystok, Luk, Peets, & Yang, 2010). More generally, others have
questioned the positive association between bilingualism and cognitive benefits, based on a
failure to find such associations (Paap et al., 2014) and a potential publication bias favoring
enhancement evidence (de Bruin et al., 2015). Based on this literature, one might predict that
balanced bilinguals who speak a tone language (e.g., Cantonese-English bilinguals) would
perform comparably to non-tone-language-speaking, balanced bilinguals and their non-tone-
language monolingual counterparts on working memory measures (Barac & Bialystok, 2012;
Bialystok, Luk, Peets, & Yang, 2010).
6.1.3 Working memory in tone-language speakers and musicians
If one examines the links between the use of lexical tone and enhanced working memory,
there is a possibility that tone-language speakers might show a working memory advantage over
non-tone-language speakers, regardless of bilingual status. Specifically, tone-language use
involves relative pitch processing (Xu, 1997, 1999; Xu & Wang, 2001), which recruits working
memory. For example, studies of absolute pitch (AP) possessors versus relative pitch (RP) possessors have found
that only RP processing is associated with a neural component (i.e., P300) that indexes working
memory (Klein, Coles, & Donchin, 1984, Wayman et al., 1992). Studies using positron emission
tomography and functional magnetic resonance imaging have found that areas involved in
monitoring pitch information are also more active during RP processing than during AP
processing in musicians (Zatorre et al., 1998). The involvement of RP and pitch monitoring in
using pitch to distinguish lexical meaning raises the possibility that tone-language speakers could
show improvements in auditory working memory as compared to non-tone-language speakers.
Such an improvement may be rooted in top-down modulation of perceptual benefits
(Moreno & Bidelman, 2014). Moreno and Bidelman (2014) argue that general enhancements in
executive function confer perceptual benefits in musicians (i.e., a top-down influence),
regardless of whether these executive functions are specific to the auditory domain. As Moreno
and Bidelman (2014) note, the concept of top-down, executive-level regulation of sensory
processes has received support from nonhuman (Fritz, Shamma, Elhilali, & Klein, 2003) as well
as human studies (Myers & Swan, 2012), such that increased feedback from prefrontal and
parietal regions enhances or inhibits activity in stimulus-selective sensory cortices. Thus, top-
down executive-level mechanisms may modulate lower-level benefits in musicians. By the same
logic, if speaking a tone language benefits working memory, then working memory may
modulate perceptual processing in this group.
There is some evidence that musicians show enhancement in certain types of working
memory. Musicians have shown enhanced verbal (auditory) but not visual working memory
when compared with nonmusicians (Brandler & Rammsayer, 2003; Chan et al., 1998; Ho et al.,
2003; Parbery-Clark, Skoe, Lam et al., 2009; Strait et al., 2010; Tierney et al., 2008). This
evidence suggests that the benefits of musicianship to working memory are stronger in auditory
than in non-auditory domains (Moreno & Bidelman, 2014). However, there are also findings of
modality-independent memory enhancement in musicians (Bidelman, Hutka, et al., 2013; George
& Coch, 2011; Jakobson et al., 2008). Furthermore, the claim that music training confers non-
musical, cognitive benefits has been criticized. For example, studies testing this association often
fail to take into account that individuals with high full-scale IQ (FSIQ) are more likely than
others to take music lessons and do well on any test administered to them (Schellenberg, 2011a).
These conflicting results indicate the need to examine the contribution of working memory to
lower-level processes in musicians as well as to include a measure of intelligence while doing so.
6.1.4 The present investigation
The present study examined how visual working memory¹⁸ is related to pitch memory
and to pitch discrimination (i.e., top-down modulation from outside the auditory domain) and
how pitch memory is related to pitch discrimination (i.e., top-down modulation from within the
auditory domain). These associations were tested in musicians, tone language speakers, and
controls (nonmusician, non-tone-language speakers). This study examined if tone-language users
exhibit enhanced visual working memory relative to non-tone-language users. If speaking a tone
language and musicianship are associated with enhanced visual working memory, then these
groups should perform better than controls on a working memory task. Note that one would
predict that tone-language speakers and musicians would not show an identical benefit over
controls. As mentioned earlier, musicians have far greater demands on their auditory system, as
well as possible pre-existing differences related to auditory processing benefits (e.g., Macnamara
et al., 2014) that may distinguish their performance on auditory tasks from that of tone-language
speakers. Furthermore, musicians regularly engage cognitive functions as part of their discipline
(e.g., auditory working memory; multitasking) due to the demands of playing music at an
advanced level (e.g., memorizing lengthy pieces, attending to auditory and visual cues from
other musicians while performing in an ensemble), which is not the case for tone-language
speakers.
To test this hypothesis, musicians, tone-language speakers, and controls were given a
two-back visual working-memory task. Participants also completed the F0 difference-limen task
from Chapter 3 and a short-term pitch memory task from Bidelman, Hutka, et al. (2013). These
tasks were used to probe how working memory outside the auditory domain (i.e., two-back
performance) is associated with an auditory difference limen task as well as a more cognitively-
demanding auditory task (pitch memory).
Performance on these two auditory tasks was also examined after holding visual working-
memory performance constant. If the lower-level auditory enhancements of musicianship and/or
18 Note that the phrase “visual working memory” is used to emphasize that there were no auditory stimuli presented to participants in this particular task. However, “verbal working memory” also accurately describes the present task, as participants hold and update information in their phonological loop, as per Baddeley and Hitch’s (1974) model of working memory.
speaking a tone language are driven by top-down modulation, then any benefits to pitch
discrimination or pitch memory observed in musicians and tone-language speakers should be
reduced after controlling for visual working memory. The Wechsler Abbreviated Scale of
Intelligence – Second Edition (Wechsler & Hsiao-Pin, 2011) was used to ensure there were no
group differences in fluid intelligence.
The present study examined tone-language experience from multiple languages rather
than Cantonese only as in much previous research (Bidelman, Hutka, et al., 2013; Chapter 3).
The rationale for the inclusion of multiple languages was that the performance of musicians,
Cantonese speakers, and nonmusician, non-Cantonese controls on pitch-difference limens and
pitch-memory tests is well-established. For example, the F0 DL results from Bidelman, Hutka, et
al. (2013) were replicated in Chapter 3. The inclusion of other tone languages made it possible to
examine whether specific tone languages have differential effects on auditory processing.
6.2 Methods
6.2.1 Participants
Sixty individuals participated in this study. Participants were recruited from the
Royal Conservatory of Music, the University of Toronto, and the Greater Toronto Area. Each
participant completed questionnaires to assess their language and musical background (same as
in Chapter 3). English-speaking musicians (n = 21, 10 female; age: M = 24.19 years, SD = 3.12
years) were amateur instrumentalists with at least 8 years of continuous training in Western
classical music on their primary instrument (M = 14.91 years, SD = 3.21), beginning in
childhood (M = 8.19 years, SD = 3.20). All musicians had formal private or group lessons within
the past five years and currently played their instrument(s).
English-speaking nonmusicians (n = 20, 15 female; age: M = 21.45 years, SD = 2.89) had
≤ 3 years of formal music training (M = 0.75 years, SD = 0.85) and had not received formal
instruction within the past five years. Both musicians and nonmusicians had some experience
with a non-tone second language (musicians: 85.71%, nonmusicians: 75.00%; mainly French or
Spanish), but were classified as late L2 learners and/or had moderate to high levels of
proficiency in their second language. Proficiency ratings were based on participant responses on
a scale with seven options, ranging from “very poor” to “fluent”. Specifically, of the 18
musicians who had some L2 experience, participants rated their L2 proficiency as follows: n = 2
as fluent; n = 5 as very good; n = 3 as good; n = 1 as moderate, and n = 7 as fair. For the 15
nonmusicians with some L2 experience, the following ratings were obtained: n = 4 as fluent; n =
1 as very good; n = 4 as good; n = 3 as moderate, and n = 3 as fair. Participants who rated their
L2 proficiency as “poor” or “very poor” were considered to have no L2 proficiency.
Tone-language speakers (n = 19; 12 female; age: M = 23.84 years, SD = 4.09) were late
bilinguals, having begun formal instruction in English after a mean age of 12.74 years (SD =
4.58). This group consisted of nine Cantonese speakers, eight Mandarin speakers, one Thai
speaker, and one Vietnamese speaker. All participants were born and raised in predominantly
tone-language-speaking countries (e.g., China, Thailand, Vietnam), and reported using their
native tone language on a regular basis (M = 38.89% of daily use, SD = 19.34%). As with
nonmusicians, tone-language speakers had minimal musical training (M = 0.42 years, SD = 0.84)
and no formal instruction in the past five years. Importantly, nonmusicians and tone-language
speakers did not differ in amount of music training, F(1, 37) = 1.479, p = 0.232, η2p = .038. All
participants were right-handed. Despite attempts to match participants’ years of education, there
was a significant group difference on this metric, F(2, 57) = 6.583, p = .003, η2p = .188.
Specifically, musicians had more formal education (M = 17.48 years, SD = 2.21) than the other
two groups (tone-language speakers: M = 15.90, SD = 1.49, p = .045; nonmusicians: M = 15.30, SD = 2.16, p = .003); the latter two groups did not differ from each other, p > .05. We posit that the additional
years of education musicians received relative to the other two groups was due to music-specific
training, during which few or no non-music courses would have been a part of the participants’
curriculum. Nearly all musicians were pursuing a first or second post-secondary music degree
via the Royal Conservatory of Music, during which performance on one’s instrument is the
primary focus of education. Thus, any attenuation of significant effects was likely due to
increased years of musical training in musicians.19 All participants provided written, informed
consent in compliance with an experimental protocol approved by the Baycrest Centre Research
Ethics Committee. All were provided financial compensation for their time.
19 Task performance was also analyzed with years of education partialled out. We conducted an analysis of covariance (ANCOVA) on each dependent variable, with years of education as the covariate. Though musicians had approximately two more years of education than the other groups, the pattern of results did not change when controlling for years of education, with the exception of pitch-memory reaction time. Namely, the significant group effect became marginal when controlling for years of education, F(2, 56) = 2.840, p = .067, η2p = .092 (driven by musicians being marginally faster than tone-language speakers, p = .067).
6.2.2 Measures
6.2.2.1 F0 DL task
See Section 3.2.3 for description of F0 DL task (see Figure 13A for a schematic
illustration).
Figure 13. A: Fundamental frequency difference limen task. B: Pitch memory task. C: Visual
two-back task.
6.2.2.2 Pitch memory task
The short-term pitch-memory task used in the present study was the same as that used in
Bidelman, Hutka, et al. (2013) (see Figure 13B for a schematic illustration). The task was
designed to test the relationship between musical and nonmusical cognitive abilities (cf. Russo et
al., 2012; Steinke et al., 1997). The task assessed short-term memory of pitch sequences. On
each trial, participants heard a four-note melody (350-ms complex tones). Following a 1.5-s
silent interval, participants were asked to judge as quickly as possible whether or not a probe
tone had been heard in the preceding sequence.
Individual pitches were selected randomly from the Western chromatic scale. Random
selection ensured that melodies were tonally ambiguous, thereby minimizing the chance that
sequences could be recalled based on musical labels (e.g., musical solfège: Do, Re, Mi, etc.).
Participants heard 50 trials during the course of a run, half of which were catch trials in which
the probe tone did not occur in the melody. Sensitivity (d') was computed using rates of hits (H) and false alarms (FA) for each run (i.e., d' = z(H) - z(FA), where z represents the z-transform).
Individual d' values were then averaged for two consecutive runs to yield a pitch-memory score.
Reaction time for correctly identified trials was calculated as the time lag between stimulus
offset and listeners’ response.
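As an illustration of the sensitivity computation above, the following sketch converts raw trial counts to d'. The trial counts are hypothetical, and the log-linear correction for hit or false-alarm rates of exactly 0 or 1 is an added assumption not described in the original task.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' = z(H) - z(FA), computed from raw trial counts.

    A log-linear correction (0.5 added to each cell) is applied so that
    hit or false-alarm rates of exactly 0 or 1 do not yield infinite
    z-scores; this correction is an added assumption, not part of the
    original task description.
    """
    h = (hits + 0.5) / (hits + misses + 1.0)                    # corrected hit rate
    fa = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf                                    # z-transform (inverse normal CDF)
    return z(h) - z(fa)

# Hypothetical run: 25 probe-present trials and 25 catch trials
print(round(d_prime(hits=22, misses=3, false_alarms=4, correct_rejections=21), 3))
```

In the study, d' values from two consecutive runs were then averaged to yield each participant's pitch-memory score.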
6.2.2.3 Two-back task
A visual two-back task (Esopenko et al., 2013) was administered to probe working
memory beyond the auditory domain (see Figure 13C for a schematic illustration). In each block,
participants were presented with numbers (1 through 8) on a computer screen. The numbers were
presented one at a time in the centre of a computer monitor, in a continuous sequence, with an
ISI of 500 ms. Presentation durations ranged from 800 ms to 975 ms, in 25-ms increments (i.e., to avoid predictability of stimulus presentation). Each block consisted of 72
stimuli, 24 of which were targets (i.e., the same stimulus presented two numbers earlier), and one
run consisted of 3 blocks. Participants were instructed to press a button on a keyboard as soon as
they detected a target. Number of correct and incorrect responses and reaction time were
recorded.
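A minimal sketch of the target definition and scoring just described (a target is any stimulus matching the one presented two items earlier); the digit sequence below is hypothetical:

```python
def two_back_targets(sequence):
    """Indices of targets: positions whose stimulus matches the one
    presented two items earlier."""
    return [i for i in range(2, len(sequence)) if sequence[i] == sequence[i - 2]]

def score_two_back(sequence, response_positions):
    """Count hits and false alarms, given the stimulus sequence and the
    set of positions at which the participant pressed the button."""
    targets = set(two_back_targets(sequence))
    hits = len(targets & response_positions)
    false_alarms = len(response_positions - targets)
    return hits, false_alarms

# Hypothetical sequence of digits 1-8, as in the task described above
seq = [3, 5, 3, 7, 3, 7, 2, 7]
print(two_back_targets(seq))   # indices 2, 4, 5, and 7 are targets
```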
6.2.2.4 Wechsler Abbreviated Scale of Intelligence - Second Edition
Two subtests (Block Design, and Matrix Reasoning) from the Wechsler Abbreviated
Scale of Intelligence - Second Edition (WASI-II; Wechsler & Hsiao-Pin, 2011) were
administered in a standardized order. In the Block Design subtest, participants were asked to
construct designs from red and white blocks to match the design of a target picture. In the Matrix
Reasoning subtest, participants saw a matrix of coloured drawings on each trial. One section of
the matrix was missing, and the participant was asked to select one of five options to complete
the missing section.
Standardized T scores (M = 50, SD = 10) were provided for each subtest. Each score was
based on norms from a large sample of American adults, calibrated separately for age.
Composite T scores from the Block Design and Matrix Reasoning subtests yielded a metric
called Performance IQ (PIQ), which was calculated for each participant. Note that FSIQ was not
measured because it requires T scores from a verbal subtest (Vocabulary) and Matrix Reasoning.
PIQ, consisting solely of non-verbal subtests, was considered appropriate for individuals whether
or not their first language was English. Verbal measures in a non-native language may depress
the scores of non-native relative to native speakers (e.g., Kaufman Brief Intelligence Test,
Schellenberg, 2011b).
6.2.3 Procedure
Participants completed the WASI-II, two runs of the F0 DL task, two runs of the pitch-
memory task, and three runs of the two-back task. The two auditory tasks were run on MATLAB
v2009b, and the two-back task was run on Presentation software (version 17), both programs
running under Windows 7. The order of tasks was counterbalanced.20 The WASI-II subtests
were always presented in the same order (Block Design followed by Matrix Reasoning).
Auditory stimuli were delivered binaurally via over-the-ear headphones (Beyerdynamic DT 770 PRO, Heilbronn, Germany). The session lasted approximately one hour.
6.2.3.1 Statistical analysis
Data for all blocks of a given task were averaged. Prior to statistical analyses, F0 values were square-root transformed to
satisfy normality and homogeneity of variance assumptions required for parametric statistics.
Note that when a univariate ANOVA was conducted on the raw F0 DL data (rather than the
transformed data), the pattern of results remained the same (i.e., significant results remained
significant, and non-significant results remained non-significant). However, the bar graph
displaying the F0 DL results (Figure 14) shows the raw F0 DL means and standard error bars, as
these values are easier to interpret than the square root values. For pitch memory, d’ and mean
reaction time were analyzed. For the two-back task, accuracy and reaction time were analyzed.
For the WASI-II, there was no significant between-groups difference in performance on the
Block Design or Matrix Reasoning subtests (i.e., the two subtests that comprise the PIQ metric).
Thus, only PIQ is reported in subsequent analyses. Between-groups differences were examined
via separate univariate ANOVAs for each task, with group as the between-subjects factor. Tukey’s correction for multiple comparisons was reported
for pairwise comparisons.
20 The session comprised the WASI-II, two blocks each of the F0 DL task and the pitch-memory task, and three blocks of the two-back task (i.e., eight items in total). The eight items were arranged according to a Latin square design (i.e., an eight x eight matrix; one row = order of tasks for one participant). If two blocks of the same auditory task fell in consecutive order, they were rearranged to keep the session as engaging as possible. The order for the first eight participants was repeated for each subsequent set of eight participants.
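The Latin square counterbalancing described in this footnote can be sketched with a standard cyclic construction; the construction and the task labels below are illustrative assumptions, as the thesis does not specify how its square was generated.

```python
def cyclic_latin_square(items):
    """Build an n x n Latin square by cyclic rotation: row r is the item
    list shifted by r, so each item appears exactly once per row and
    once per column. Row r gives the task order for participant r."""
    n = len(items)
    return [[items[(r + c) % n] for c in range(n)] for r in range(n)]

# Hypothetical labels for the eight session items
tasks = ["WASI", "F0DL-1", "F0DL-2", "PM-1", "PM-2", "2B-1", "2B-2", "2B-3"]
for row in cyclic_latin_square(tasks)[:2]:
    print(row)
```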
Analyses of covariance (ANCOVAs) were also conducted to test for top-down influences
on lower-level task performance. For pitch memory d’, two-back accuracy was held constant.
For the F0 DL data, two-back task accuracy was held constant. For the ANCOVAs as well as the
correlation analyses (described subsequently), only d’ for pitch memory and accuracy for the
two-back task were included (i.e., no reaction time data). This was done because it is difficult to
interpret the meaning of reaction time performance in isolation of accuracy measures. For
example, it is unclear whether a group with slow reaction time and high accuracy might perform
worse if pressed to give a faster response; similarly, the inverse is possible. The Bonferroni
correction for multiple comparisons was reported for pairwise comparisons. For all ANCOVAs,
the assumption of homogeneity of regression slopes was not violated. Furthermore, the covariate
was not significantly related to either dependent variable.
Correlations were also computed between F0 DL, d’, two-back accuracy, and PIQ, both across
all groups, and for each group.
6.3 Results
6.3.1 Correlations
Correlations between tasks across all groups (Table 6) were significant between the F0 DL task and d’; d’ and 2-back accuracy; and 2-back accuracy and PIQ. To more closely examine the association between tasks within groups, correlations between tasks for each group are reported (Table 7). The correlation between F0 DL and d’ was significant only in musicians. The correlation between 2-back accuracy and PIQ was significant only in nonmusicians.
Table 6.
Correlations between Tasks Across All Groups.

                      1        2        3
1 F0 DL
2 Pitch memory: d'   -.381**
3 2-back: Accuracy   -0.152    .277*
4 PIQ                -0.212    0.191    .282*

Note. ** p ≤ .01. * p ≤ .05.
Table 7.
Correlations between Tasks, Displayed by Group.

M
                      1        2        3
1 F0 DL
2 Pitch memory: d'   -.647**
3 2-back: Accuracy    0.145    0.087
4 PIQ                -0.084   -0.145   -0.009

TL
                      1        2        3
1 F0 DL
2 Pitch memory: d'   -0.446
3 2-back: Accuracy   -0.070    0.256
4 PIQ                -0.200    0.425    0.263

NM
                      1        2        3
1 F0 DL
2 Pitch memory: d'   -0.366
3 2-back: Accuracy   -0.151    0.303
4 PIQ                -0.194    0.285    .519*

Note. M = musicians; TL = tone language speakers; NM = nonmusicians. ** p ≤ .01. * p ≤ .05.
6.3.2 F0 DL
There was a significant between-groups difference in F0 DL performance [F(2, 57) =
8.072, p = .001, η2p = .221] such that musicians had lower pitch discrimination thresholds (i.e.,
better performance) than nonmusicians (p = .001; Figure 14). Musicians performed comparably
to tone-language speakers (p = .116); tone-language speakers performed comparably to
nonmusicians (p = .143).
Figure 14. Performance on the fundamental frequency (F0) difference limen task. Musicians (M)
showed superior pitch discrimination performance relative to nonmusician (NM) controls. **p ≤
.001. Error bars indicate SE.
6.3.3 Pitch memory
There was a significant between-groups difference in performance measured by d’ [F(2,
57) = 32.866, p < .001, η2p = .536]. Namely, musicians had higher d’ scores (i.e., better
performance) than both tone-language speakers (p < .001) and nonmusicians (p < .001; Figure
15A). Tone-language speakers marginally outperformed nonmusicians, p = .066. There was also
a significant between-groups difference in pitch-memory reaction time [F(2, 57) = 3.198, p =
.048, η2p = .101]. Musicians had faster reaction times than tone-language speakers (p = .038;
Figure 15B). There was no significant difference between the reaction times of musicians and
nonmusicians (p = .528) or between tone-language speakers and nonmusicians (p = .329).
Figure 15. Between-group performance on the pitch memory task (A: d’ data; B: reaction time
data). A gradient in d’ performance is visible, such that M > TL > NM. M perform faster than
TL. There appears to be a speed-accuracy trade-off in TL, such that good performance is
accompanied by slower reaction times. **p ≤ .001, *p < .05. † = marginally significant. Error
bars indicate SE.
6.3.4 Two-back task
There was no significant between-groups difference in two-back task accuracy [F(2, 57) = 1.498, p = .232, η2p = .050; Figure 16A] or mean reaction time [F(2, 57) = 2.262, p = .113, η2p = .074; Figure 16B].
Figure 16. Between-group performance on the two-back task (A: accuracy data; B: reaction time
data). Group differences are not significant. Error bars indicate SE.
There was also no significant between-groups difference in the PIQ score [F(2, 57) =
2.253, p = .114, η2p = .073; Figure 17]. In Figure 17, note that the apparent difference between
the nonmusicians and the other two groups is primarily driven by one nonmusician participant
who scored poorly on both Block Design and Matrix Reasoning subtests. This participant did not
score outside of the three standard deviation cut-off for either subtest or on other tasks, and
remained in the analysis. Note that removing this participant from the analysis increased the nonmusician group mean to M = 105.000, SD = 10.930 (from M = 102.950, SD = 14.043).
Figure 17. Between-group PIQ performance. Group differences are not significant. Error bars
indicate SE.
6.3.5 ANCOVAs
After controlling for two-back task accuracy, there was still a significant between-groups
difference in pitch memory d’ [F(2, 56) = 29.687, p < .001, η2p = .515; musicians outperformed
tone-language speakers and nonmusicians, p < .001, tone-language speakers marginally
outperformed nonmusicians, p = .069] and F0 DL performance [F(2, 56) = 7.129, p = .002, η2p =
.203; musicians outperformed nonmusicians, p = .001].
6.4 Discussion
6.4.1 Working memory
Musicians outperformed controls on the measure of pitch discrimination and outperformed both tone-language speakers and controls on the pitch-memory task. Tone-language speakers marginally outperformed controls on the pitch-memory
task. The tone-language group’s accuracy appeared to have been achieved at the expense of their
reaction time, which was slower than that achieved by musicians (p = .038; i.e., speed-accuracy
trade-off). Tone-language speakers did not show any enhancement to working memory, as
compared to musicians and controls. Indeed, there were no group differences in two-back task
performance. There were also no between-group differences in the WASI-II composite score.
These data suggest that the use of relative pitch within tone languages (Xu, 1997, 1999; Xu &
Wang, 2001) does not translate to any benefit outside the auditory domain, but may confer some
modest benefits over controls on pitch-memory performance. This lack of a working memory
enhancement in tone-language speakers (i.e., bilinguals) is also consistent with the reported
absence of cognitive-processing advantages for young adult bilinguals (de Bruin et al., 2015;
Paap et al., 2014). Note, however, that most musician and control participants had some
experience with a second language, which minimizes the likelihood of finding between-group
differences in cognitive performance. Such effects, if evident, would be revealed more readily
from comparisons of bilingual and monolingual speakers.
The lack of group differences on the visual working memory task does not align with the
numerous claims in the literature about the cognitive benefits of music training (e.g., George &
Coch, 2011; Jakobson et al., 2008). How can one account for these differences? First, let us
consider FSIQ in relation to the current finding. Specifically, the cognitive benefits of music
training documented in the literature might be confounded by the fact that high-IQ participants
tend to take music lessons, and relatedly, also do better on cognitive tasks (i.e., music training
does not cause cognitive benefits; Schellenberg, 2011a). To claim that there is a specific
association between nonmusical cognitive abilities and music training would require that such an
effect would remain when controlling for general intelligence (Schellenberg, 2009). The present
study included a measure of intelligence (see Section 6.2.2.4 for rationale for measuring PIQ
over FSIQ), on which between-group performance did not differ. Thus, when PIQ performance
was comparable across groups, working memory did not differ.
How can one account for the lack of a between-group PIQ difference, considering the
claim that high-IQ participants tend to take music lessons (Schellenberg, 2011a)? Perhaps the
answer can be found in the type of musicians being tested. The current musician sample studied
music exclusively, with the aim of becoming career musicians (beginning lessons, on average, at
age 8; SD = 3.20, and having formally studied music continuously for a mean of 14.91 years, SD
= 3.21). As discussed in Schellenberg (2011a), those who study music exclusively (i.e., instead
of something else, not in addition to something else) do not show any differences in intelligence
as compared to nonmusicians (Bialystok & DePape, 2009; Brandler & Rammsayer, 2003;
Helmbold, Rammsayer, & Altenmuller, 2005; Schellenberg & Moreno, 2010). Specifically,
Schellenberg & Moreno (2010) found no difference in intelligence between participants with an
average of 11 years of music lessons (i.e., a lower mean than in the present musician group), as
compared to nonmusicians.
Out of three recent studies that have found non-auditory, memory-specific benefits in
musicians (Bidelman, Hutka, et al., 2013; George & Coch, 2011; Jakobson et al., 2008), two did not
appear to test music students/professional musicians. George and Coch (2011), who found
benefits in musicians’ auditory and visual working memory as compared to nonmusicians, did
not report testing either university music students or professional musicians. Jakobson et al.
(2008), who found that musicians, as compared to nonmusicians, had verbal and nonverbal
memory benefits (i.e., learning, recall, and delayed recall tasks), did not test professional
musicians, though it is unclear whether their participant sample included university-level music students. Neither study administered a test of general intelligence, suggesting that these
benefits might indeed be related to self-selection via FSIQ in musicians (Schellenberg, 2008,
2009). Bidelman, Hutka, et al. (2013), who recruited their musician group primarily from the
University of Toronto’s Faculty of Music, only found a benefit in visuospatial short-term
memory (i.e., forward Corsi blocks task) in music students, a task that is not comparable to the visual working memory task used in the present study. These observations further support the claim
that only individuals who take music training in addition to something else (rather than instead of
something else) have a high FSIQ, which in turn, is related to superior performance on other
cognitive tests (Schellenberg, 2011b). This claim may therefore account for why the present
music students did not show superior PIQ performance, and relatedly, did not show superior
visual working memory performance, as compared to the other groups.
The ANCOVAs controlling for working memory accuracy in the present study helped
test whether visual working memory accounts for differences in pitch memory or pitch
discrimination (i.e., a top-down influence of executive function to lower-level functions). There
were still group differences in F0 DL performance after controlling for two-back accuracy, as
there was initially no group difference in performance on the latter measure. There were also
group differences in d’ after controlling for two-back accuracy. These results support the notion
that pitch discrimination and pitch memory performance are not modulated by top-down control
via visual working memory in musicians or tone-language speakers. The pattern of correlations
for each group also suggests that perceptual (F0 DL) and cognitive (pitch memory d’) auditory
tasks were not significantly correlated with visual working memory accuracy. These results,
coupled with the null finding for between-group visual working memory differences, provide
evidence against the model posited by Moreno and Bidelman (2014), as related to visual working
memory in university music students.
The correlation between d’ and F0 DL performance is also notable. Specifically, the
significant correlation between F0 DL performance and pitch memory d’ found in musicians, but
not the other two groups, suggests that any within-domain top-down modulation is only
associated with musicianship (and not speaking a tone language). The conclusion that musicians’
cognitive advantages are stronger in auditory than in non-auditory domains is consistent with
other studies that reveal musicians’ (as compared to nonmusicians’) enhanced auditory, but not
visual, working memory (e.g., Brandler & Rammsayer, 2003). It is possible that a similar
correlation was not observed in tone-language speakers because of lesser auditory demands as
well as lack of self-selecting factors, as compared to musicians.
6.4.2 Replication of auditory measures, and associated limitations
The current data partially replicate the findings from Bidelman, Hutka, et al. (2013), who
reported that musicians were more accurate and responded faster on the pitch-memory task than
did Cantonese speakers and nonmusicians. Cantonese speakers were more accurate but
responded more slowly than nonmusician controls. In the present study, there was a marginal gradient in pitch-memory accuracy (musicians > tone-language speakers > controls), with tone-language speakers showing a speed-accuracy trade-off similar to that in Bidelman, Hutka, et al. (2013). In
contrast to past findings (Bidelman, Hutka, et al., 2013; Chapter 3), the tone-language speakers
in the present study did not outperform controls on the F0 DL task. One possibility is that
Cantonese speakers had more pitch processing experience than the other language groups, and
thus outperformed them. Cantonese is more complex (three level tones; three contour tones;
Rattanasone et al., 2013, Wong et al., 2012) than Mandarin (one level tone, three contour tones;
Rattanasone et al., 2013), Thai (three level tones, two contour tones; Abramson, 1962;
Rattanasone et al., 2013), or Vietnamese (one level, five contour; Dung, Houng, & Boulakia, 1998).
Indeed, these differences were a part of the rationale for including heterogeneous tone language
groups, namely to observe if behavioural results would still resemble those previously obtained
in homogeneous Cantonese populations (Bidelman, Hutka, et al., 2013; Chapter 3). The present results
did not fully replicate these past findings, despite similar trends in the current data. These results
might lead one to conclude that using fewer tones in one’s language might be associated with
poorer behavioural performance, as compared to those who use a greater number of tones.
However, when examining the between-group differences on all behavioural measures, with the
groups defined according to first language, there were no differences (p’s > .05). This null
finding may be related to small sample sizes in each language group. Future work could repeat this study with a greater number of participants who speak, for example, Mandarin
(four tones), Thai (five tones), or Cantonese (six tones), to examine if better performance is
associated with greater linguistic pitch processing demands.
Another potential factor that may account for the differences between the present findings
and past literature is the percentage of daily use of one’s tone language. However, there were no
between-group differences in performance on the behavioural measures when controlling for the
percentage of daily use of one’s tone language (p’s > .05). However, when examining the mean
percent of daily tone-language use (38.89% of daily use, SD = 19.34%), and comparing it to that
of Chapter 3 (43.53% daily use, SD = 29.79 %), it is evident that the latter group used their
native tone language more often in daily life, albeit with a larger standard deviation. Perhaps the
interaction of speaking a complex tone language such as Cantonese combined with a high daily
use accounts for better performance on the F0 DL task. Future studies could examine how type
of tone language and daily use are related to F0 DL performance.
6.5 Conclusions
The findings from the present study suggest that neither tone language nor musicianship
is associated with advantages in visual working memory, as measured by a visual two-back task.
The lack of a benefit in musicians contrasts with the extensive literature showing nonmusical,
cognitive benefits in musicians relative to nonmusicians. However, this may be related to
differences in the type of musicians tested here (i.e., university music students) compared to
those tested in other studies (i.e., trained but not career musicians). Specifically, career
musicianship is not associated with the same FSIQ – and thus, cognitive – benefits as amateur
musicianship (Schellenberg, 2011a). Furthermore, top-down modulation of auditory abilities via
visual working memory may only be present in those with pre-existing differences in cognitive
function, perhaps accounting for the lack of top down effects observed in the present study. In
the auditory domain, musicians outperformed tone-language speakers and controls on the pitch-
memory and pitch-discrimination tasks. Tone-language speakers showed a marginal benefit over
105
controls on the former task. This result may be related to the higher auditory demands of
musicianship as well as pre-disposing factors that self-select those with superior pitch processing
for music training.
Chapter 7 General Discussion
7.1 Summary
The primary objective of the thesis was to assess whether tone-language experience was
associated with auditory-processing and executive-function benefits like those associated with
musicianship. Links between tone language and spectral acuity were first examined in Chapter 2,
in which musicians with absolute pitch but not tone-language experience showed enhanced pitch
encoding. These findings suggested that the pitch-processing advantages associated with
musicianship and tone language are independent. Because it was impossible to tease apart the
relative contributions of music and tone-language expertise on pitch processing in a population
of musicians, subsequent studies tested tone-language speakers who were nonmusicians,
musicians with no tone-language experience, and controls with neither tone-language experience
nor music experience.
Chapter 3 used behavioural measures and EEG to examine discrimination of music and
vowel sounds in tone-language speakers, musicians, and controls. This study established that
musicians and tone-language speakers performed similarly, and better than controls, on pitch
discrimination, but only musicians exhibited timbral processing advantages (i.e., first formant
discrimination) relative to tone-language speakers and controls, as revealed by brain and
behavioural measures. The findings suggested that tone-language users exhibit some pitch-
processing advantages observed in musicians, but these advantages are task-specific (i.e.,
relating to F0 cues). Interestingly, tone-language speakers’ enhanced pitch discrimination was
evident in behavioural but not in neural measures. This discrepancy between brain and
behaviour, as well as the possibility that musicianship and tone-language experience have different consequences (Chapter 2), prompted an inquiry into nonlinear means of uncovering nuances in
auditory processing networks in musicians and tone-language speakers.
Specifically, Chapter 4 focused on nonlinear approaches to perceptual processing
networks in musicians and tone-language speakers, complementing linear approaches in current
use in this domain. Chapter 5 applied this framework to the examination of nonlinear
dependencies in an EEG time series from Chapter 3. This analysis demonstrated that musicians,
tone-language speakers, and controls use different networks to support auditory processing of
speech and music. Furthermore, there was a gradient of pitch processing ability such that greater
experience with pitch was associated with greater sample entropy in pitch processing. In
contrast, neural data from the MMN analyses (Chapter 3) indicated that musicianship but not
tone-language experience was associated with neural enhancements that supported auditory
processing of pitch and timbre. In other words, automatic processing of pitch or timbre changes
did not differ between tone-language speakers and controls (nonmusicians who did not speak a
tone language) in Chapter 3, whereas MSE data showed processing differences between these
groups. The nonlinear data from Chapter 5 thus provide a nuanced view of the neural networks
that support auditory processing in musicians and tone-language speakers. This difference in
results between the two datasets demonstrates the value of multiple convergent techniques for
investigating auditory processing in musicians and tone-language speakers.
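Sample entropy, the measure underlying the multiscale entropy (MSE) analyses in Chapter 5, quantifies how irregular a time series is: it is the negative log of the conditional probability that two segments matching for m samples continue to match for m + 1. The sketch below is a generic illustration of SampEn(m, r) with commonly used parameter values (m = 2, r = 0.2 × SD); it is not the Chapter 5 pipeline, which applies the measure to coarse-grained EEG at multiple timescales.

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """Sample entropy SampEn(m, r) of a 1-D signal.

    Counts pairs of length-m templates matching within a tolerance of
    r * SD(x) (Chebyshev distance), then counts how many of those pairs
    still match at length m + 1; SampEn = -ln(A / B). Higher values
    indicate a less regular, less predictable signal.
    """
    x = np.asarray(x, dtype=float)
    tol = r * np.std(x)
    n = len(x)

    def count_matches(length):
        # All overlapping templates of the given length (N - m of them,
        # so both template sets have the same size, as in standard SampEn).
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            # Chebyshev distance from template i to every later template.
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= tol))
        return count

    b = count_matches(m)      # matches at length m
    a = count_matches(m + 1)  # matches at length m + 1
    return -np.log(a / b)
```

In MSE, a function like this would be applied to progressively downsampled (averaged) copies of the signal, yielding one entropy value per timescale; a regular signal (e.g., a sinusoid) yields lower sample entropy than white noise.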
Collectively, Chapters 2 through 5 examined similarities and differences in pitch
processing in tone-language speakers and musicians by means of multiple approaches, including
behavioural tasks, EEG, multiscale entropy, and spectral analysis. The overall conclusion that
emerges from this research is that pitch experience arising from tone-language use is not
associated with the same auditory-processing benefits that are evident among musicians.
However, tone-language speakers still exhibit better spectral acuity than controls. This gradient
effect might be related to two factors. First, there are differences in the auditory processing
demands associated with musicianship and speaking a tone language. Namely, musicians
constantly engage in a range of auditory tasks, such as pitch discrimination and pitch memory, as
part of their discipline, whereas the only aspect that differentiates tone-language speakers from
non-tone-language-speaking controls is the former group’s use of pitch at the phonemic level
(i.e., six lexical tones). Second, musicians are a self-selected sample as compared to tone-
language speakers: Pre-existing differences (e.g., pitch processing aptitude, SES, personality)
may lead certain individuals to music training, whereas everyone born in a tone-language-
speaking country will learn that language, regardless of predispositions that favour auditory
processing. Based on these two factors, it is plausible that musicians outperform tone-language
speakers, who in turn outperform controls on auditory tasks.
Chapter 6 sought to move beyond the auditory realm, and examined whether musicians
and tone-language speakers would show benefits on a visual working memory task, as compared
to controls. This investigation was motivated by two factors. First, tone language has been
reported to utilize relative pitch processing (Xu, 1997, 1999; Xu & Wang, 2001), which has been
associated with benefits to working memory (Klein, Coles, & Donchin, 1984; Wayman et al.,
1992). Notably, there is also a wealth of literature demonstrating that musicians outperform
nonmusicians on non-musical, memory-related tasks (e.g., George & Coch, 2011; Hansen,
Wallentin, & Vuust, 2013). Second, a recent theory posits that music training confers benefits to
cognitive and perceptual domains via top-down, executive level modulation (Moreno &
Bidelman, 2014). If speaking a tone language is associated with visual working memory benefits
via relative pitch processing, then this group would outperform controls on such a measure. Such
findings would parallel observations of enhanced visual working memory in musicians (e.g.,
George & Coch, 2011; Jakobson et al., 2008). Furthermore, if both musicians and tone-language
speakers show a benefit to visual working memory, then one might observe an association
between working memory performance and performance on auditory tasks (i.e., top-down
modulation).
To this end, Chapter 6 tested musicians, tone-language speakers, and controls on
perceptual and cognitive aspects of auditory processing and on visual working memory (i.e.,
visual two-back task). The results of this study revealed that neither tone language nor
musicianship was linked to advantages to visual working memory. Tone-language speakers
outperformed controls on pitch memory, but musicians outperformed tone-language speakers.
These findings lead to four conclusions. First, tone language is not associated with enhanced
working memory via relative pitch use. Second, there does not appear to be any top-down
modulation of cognitive and perceptual auditory tasks via visual working memory in either
musicians or tone-language speakers. Third, music students pursuing a professional career in
their discipline do not outperform tone-language speakers or controls on visual working memory
(performance may differ for amateur musicians who are not primarily pursuing music as a
career). On the surface, these findings seem contrary to the wealth of evidence supporting
nonmusical, memory-related benefits in musicians. However, this discrepancy may be explained
by pre-existing differences in cognitive skill between those who study music exclusively (those
in the current sample) versus those who pursue music training in addition to other endeavours
(those in prior research that demonstrated a visual working memory benefit in musicians).
Fourth, musicians outperformed tone-language speakers and controls on auditory tasks (pitch
memory; pitch discrimination), with a gradient effect emerging on the pitch-memory task
(musicians outperform tone-language speakers, who in turn outperform controls). This may be
related to the high demands of auditory processing as well as self-selection criteria associated
with musicians, which are not present in tone-language speakers.
Collectively, these data support the hypothesis that musicians and tone-language speakers
differ with regard to their auditory processing capacities. Speaking a tone language does not
confer additional benefits in pitch encoding for absolute pitch (AP) and non-AP musicians. Furthermore,
speaking a tone language is not associated with enhanced neural responses indexing pitch
discrimination. Conversely, AP (i.e., a music-related ability) is associated with better pitch
encoding, and musicians showed larger MMN responses to pitch and timbre discrimination as
well as better behavioural timbral discrimination than Cantonese speakers and controls. A
gradient effect emerged for the information processing capacities of musicians, Cantonese
speakers and controls, as well as for behavioural pitch memory performance. These effects may
be associated with the aforementioned differences in the auditory demands associated with each
group, as well as the self-selection factors related to musicianship. Cantonese speakers did,
however, perform similarly to musicians on pitch discrimination, suggesting that any similarities
between musicians and tone-language speakers may exist at a very basic, perceptual level, are
not modulated by executive function (i.e., visual working memory, Chapter 6), and are supported
by different information processing capacities for pitch (Chapter 5). Finally, no between-group
differences in visual working memory were observed, suggesting that visual working memory is
not especially honed in music students or in tone-language speakers, and thus, does not modulate
top-down control over lower-level auditory processing in these groups.
7.2 Musicianship: Nature versus nurture
As discussed in Chapter 1, music training can be viewed as a model of gene-environment
interaction (i.e., nature and nurture; Schellenberg, 2015). In contrast to musicians, tone-language
speakers are largely the product of environmental factors. Steps can also be taken to disentangle
the effects of nature versus nurture via the use of random assignment to music training versus a
control group in a longitudinal design (e.g., Chobert et al., 2014). However, it is notable that
random assignment to music training in a longitudinal design is associated with significant costs
(i.e., providing participants with free music lessons), and fraught with difficulties in
standardizing how lessons are taught, while still providing an environment typical of music
training (i.e., music lessons taught in person rather than via computer program, as in Moreno et
al., 2011). Strong correlational and quasi-experimental evidence is thus needed to justify the costs of such studies and to secure the funding that allows for their implementation. It is therefore important to acknowledge the role that correlational and
quasi-experimental studies have played, and continue to play, in advancing research in the
psychology and neuroscience of music. However, one must simultaneously recognize the
limitations regarding causal inferences that can be made from such studies. These same
statements can be applied to other studies that compare expert versus non-expert populations. For
example, in a study of expert phoneticians (i.e., individuals trained to analyze and transcribe
speech) and non-phoneticians, there were correlations between neural structure (e.g., left pars
opercularis size) and years of phonetic training experience (Golestani, Price, & Scott, 2011).
However, there were also structural between-group differences in the transverse gyri in the
auditory cortex (thought to be established in utero) in the phoneticians versus controls,
suggesting that pre-existing differences in the brain might also lead certain individuals to
gravitate towards the study of phonetics (Golestani et al., 2011). The contributions of nature and
nurture likely play a role in many other types of expertise, necessitating their consideration in
studies of experts versus non-experts.
Correlational and/or quasi-experimental studies can also attempt to mitigate the
possibility that a third variable, such as intelligence, other cognitive abilities, or demographics
(e.g., personality, SES, education) is driving performance in musically-trained participants (as
compared to other groups). The present thesis attempted to do this by including measures to
probe non-verbal intelligence and by matching or controlling for educational background (Chapters 3, 5, and 6). These findings lend credence to the view that the effects observed in musicians were
influenced by music training rather than by pre-existing differences in intelligence. However,
pre-existing differences in auditory abilities, personality, and socio-economic status could have
led to differences among groups that were observed in the present thesis. Future investigations
that implement a longitudinal design with training assigned randomly would be able to eliminate
the potential influence of these variables.
7.3 Future directions
Understanding how music training and speaking a tone language shape the brain and
behaviour is applicable to understanding, and eventually developing, rehabilitative interventions
for music and language that involve pitch processing. Individuals who are enrolled in music
therapy are unlikely to possess the pre-existing factors that self-select for music training.
Understanding the differences between pitch processing experience conferred by a nature/nurture
interaction (i.e., musicianship) versus only nurture (i.e., the limited auditory demands associated
with speaking a tone language) can shed light on how the neural circuitries related to pitch
processing can be shaped in individuals undergoing music therapy.
One intervention that relies heavily on pitch processing is melodic intonation therapy
(MIT), which is used to improve speech production in patients with non-fluent aphasia—a
profound speech-production impairment following left-hemispheric stroke (Albert, Sparks, &
Helm, 1973; Bonakdarpour, Eftekharzadeh, & Ashayeri, 2000; Laughlin, Naeser, & Gordon,
1979; Schlaug, Marchina, & Norton, 2008, 2009; Sparks, Helm, & Albert, 1974; Wilson,
Parsons, & Reutens, 2006). In MIT, a patient sings common phrases at a slow pace accompanied
by rhythmic, left-hand (i.e., contra-lesional) tapping; a hierarchical series of steps are followed,
which move from singing to speech. The components of MIT can be broken down into two parts,
namely the pitch-based component (i.e., singing) and a rhythm-based component (i.e., hand
tapping, which maps sound to action) (Schlaug et al., 2009). The benefits of MIT have
traditionally been ascribed to the pitch-based component, which stimulates the intact right-
hemisphere, eventually assuming the function of damaged left-hemisphere speech regions (see
Stahl, Kotz, Henseler, Turner, & Geyer, 2013; note that Stahl et al. also argue that the rhythmic component makes important contributions to MIT's efficacy). Recent
neuroimaging studies have shown that MIT enlarges the right arcuate fasciculus (AF; Schlaug et al., 2009), a white-matter tract that connects brain regions that enable auditory-motor interaction (e.g., superior temporal lobes, inferior frontal areas, premotor and motor regions; Catani & Mesulam, 2008; Wan & Schlaug, 2010). Notably, the AF is particularly well-developed in
professional singers, as compared to instrumental musicians and non-musician controls (Halwani
et al., 2011).
Recently, a new type of therapy called auditory-motor mapping training (AMMT) has been
derived from MIT, to help elicit vocal and verbal production in nonverbal or minimally verbal
children with autism (Wan et al., 2011). AMMT associates pitch with action, such that the
researcher sings words and phrases with social connotations with and to the child, while showing
the child pictures of the action, person, or object (Wan et al., 2011). Simultaneously, the
researcher guides the child’s hand to play two drum pads tuned to different pitches (Wan et al.,
2011). MIT and AMMT both demonstrate how music making, and specifically, pitch processing
in a musical context, can be used in individuals without music training to rehabilitate speech
capacities. The current thesis helped establish how pitch discrimination in musicians and tone-
language speakers manifests at the behavioural and neural level, and laid the groundwork for
studies that can further investigate how different experiences with pitch processing can tune the
neural circuitries involved in music and speech processing. This knowledge could eventually be
applied to optimize current music-based interventions for speech processing, and to develop new
interventions.
8 Appendices
8.1 Chapter 2: Nonmusical Stimuli
Table S1
Nonmusical (Control) Stimuli Descriptions.
Name Modality Description
Bird Auditory Bird chirping
Visual Picture of a bird
Camera Auditory Camera shutter sound
Visual Picture of a camera
Chicken Auditory Chicken clucking
Visual Picture of a chicken
Cow Auditory Cow mooing
Visual Picture of a cow
Dog Auditory Dog barking
Visual Picture of a dog
Duck Auditory Duck quacking
Visual Picture of a duck
Fly Auditory Fly buzzing
Visual Picture of a fly
Frog Auditory Frog croaking
Visual Picture of a frog
Horse Auditory Horse neighing
Visual Picture of a horse
Phone Auditory Phone ringing
Visual Picture of a phone
Typewriter Auditory Keys of a typewriter clicking
Visual Picture of a typewriter
8.2 Chapter 3: N1 and P2
8.2.1 Introduction
The auditory N1 usually occurs approximately 100 ms after stimulus onset and has a maximum
amplitude over frontocentral areas (Vaughan & Ritter, 1970) and/or the vertex (Picton, Hillyard,
Krausz, & Galambos, 1974). The proposed source of the N1 is the primary and associative
auditory cortex (Vaughan & Ritter, 1970). Specifically, Picton et al. (1999) found that the N1
with maximal amplitude at frontocentral and vertex regions is mainly generated by activity in the
supratemporal plane, likely in or slightly posterior to the primary auditory cortex. The N1 has
been posited to reflect sensory and physical properties of a stimulus, such as intensity, or timing
as compared to other stimuli (Näätänen & Picton, 1987). The auditory P2 wave follows the N1
wave at anterior and central scalp sites (Luck, 2005), spanning a latency range of 150 to 275 ms
(Dunn, Dunn, Languis, & Andrews, 1998). The P2 is primarily generated in the secondary
auditory cortex (Bosnyak et al., 2004; Pantev, Eulitz, Hampson, Ross, & Roberts, 1996; Picton et
al., 1999; Scherg, Vajsar, & Picton, 1989), while Picton et al. (1999) suggest that the posterior
regions of the frontal lobe may contribute to the later part of the scalp-recorded N1 and the P2
wave. The P2 has been elicited in a variety of cognitive tasks, such as selective attention
(Hackley, Woldorff, & Hillyard, 1990; Hillyard, Hink, Schwent, & Picton, 1973; Johnson, 1989)
and stimulus change (Näätänen, 1990).
The auditory N1 and P2 have been found to be larger in musicians as compared to non-
musicians (Bosnyak et al., 2004; Pantev et al., 1998; Shahin et al., 2003). Indeed, there is
convergent evidence that these components coincide with improved perception, as discussed by Tremblay, Ross, Inoue, McClannahan, and Collet (2014). Therefore, one might hypothesize that if
musicians have enhanced perception of sounds (reflected in an enhanced auditory N1 and P2 in
previous literature), and Cantonese speakers process auditory stimuli with similar spectral acuity
as musicians, then both groups would have a more pronounced N1 and P2 than controls.
In addition to providing a means to investigate auditory processing in musicians and
Cantonese speakers, the present analyses provided the opportunity to test how the large and
small deviants differed from the standard prior to subtraction. If change detection was elicited by
the deviants, then one would expect a more positive P2 in the standard than the deviant condition
(i.e., the greater the change detection, the more negative the P2 amplitude for the deviants, thus
indicating the MMN).
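The subtraction logic described above (deviant minus standard, with the MMN emerging as a negativity in the difference wave) can be sketched as follows. The arrays, sampling rate, and search window here are hypothetical placeholders for illustration, not the study's actual recording parameters.

```python
import numpy as np

# Hypothetical ERP averages: one amplitude (microvolts) per time sample,
# already averaged over trials and fronto-central electrodes.
fs = 500                              # assumed sampling rate (Hz)
t = np.arange(-0.1, 0.5, 1 / fs)      # epoch: -100 ms to +500 ms

def mmn_from_erps(standard, deviant, t, window=(0.10, 0.25)):
    """Deviant-minus-standard difference wave and its MMN peak.

    The MMN is taken as the most negative point of the difference wave
    inside a search window (100-250 ms here, an assumed placeholder).
    Returns (difference wave, peak amplitude, peak latency in seconds).
    """
    diff = deviant - standard
    mask = (t >= window[0]) & (t <= window[1])
    peak = np.argmin(diff[mask])
    return diff, diff[mask][peak], t[mask][peak]
```

A larger deviance should yield a more negative peak in the difference wave, mirroring the attenuated (more negative) deviant P2 described above.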
8.2.2 Methods: Analysis window and statistics
In the present investigation, the N1 and P2 waves were each measured by calculating the
mean amplitude between two given latencies (Luck, 2005). Each window spanned ± 20 ms around the component's group-mean peak latency. For the N1, the analysis window was 80 ms to 120 ms; the P2 analysis window was 160 ms to 200 ms. A mixed ANOVA was conducted on the N1 and
P2 waves, with group as the between-subjects variable, and stimulus type (music or speech) and
deviant size (standard, small, or large) as the within-subjects variables. Bonferroni corrections
were applied to all pairwise contrasts to control for family-wise error (α = 0.05). When
appropriate, the degrees of freedom were adjusted with the Greenhouse-Geisser epsilon (ε) and
all reported probability estimates are based on the reduced degrees of freedom, although the
original degrees of freedom are reported. Partial eta-squared (η2p) was used as the measure of
effect size for all ANOVAs.
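As a concrete illustration, the mean-amplitude measure reduces to averaging the ERP within the stated latency window. The sketch below assumes hypothetical amplitude and time arrays; the window bounds are those given above.

```python
import numpy as np

def mean_amplitude(erp, t, window):
    """Mean amplitude (in the ERP's units) within a latency window.

    erp: 1-D array of amplitudes, one per time sample.
    t:   matching array of sample times in seconds.
    window: (start, end) latencies in seconds, inclusive.
    """
    mask = (t >= window[0]) & (t <= window[1])
    return float(erp[mask].mean())

# Windows from the text: N1 = 80-120 ms, P2 = 160-200 ms.
N1_WINDOW = (0.080, 0.120)
P2_WINDOW = (0.160, 0.200)
```

These per-condition mean amplitudes would then serve as the dependent variable in the 3 (group) × 2 (stimulus type) × 3 (deviant size) mixed ANOVA described above.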
8.2.3 Results
8.2.3.1 N1
There was a marginal main effect of group on mean N1 amplitude, F(2, 57) = 2.886, p =
.064, η2p = .092 (see Table S2 for means and standard errors). Specifically, Cantonese speakers
had a marginally more negative N1 than nonmusicians (p = .094). There was no significant main
effect of stimulus type, F(1, 57) = 1.741, p = .192, nor was there a significant interaction of
stimulus type and group, F(2, 57) = 1.001, p = .374. There was a significant main effect of
deviant size [F(2, 114) = 5.170, p = .007, η2p = .083], such that the large deviant had a more
negative N1 than the standard (p = .035) and small deviant (p = .029). There was no significant
interaction between deviant size and group, F < 1. There was a significant interaction between
stimulus type and deviant size, F(2, 114) = 4.210, p = .027, η2p = .069. Namely, for the music
condition, there was no significant difference in N1 amplitude between the three levels of
deviant size, F < 1. For the speech condition, there was a significant difference in N1 amplitude
between deviant sizes, F(2, 56) = 7.702, p = .001, η2p = .216. Specifically, the large deviant’s N1
was significantly more negative than for the standard (p = .001) and the small deviant (p = .028).
There was no significant interaction between stimulus type, size, and group, F(2, 114) = 1.972, p
= .122.
Figure S1. ERP waves for each group and condition prior to subtraction. Each waveform is an
average across six fronto-central electrodes (F1, Fz, F2, FC1, FCz, FC2). M = musicians; C =
Cantonese speakers; NM = nonmusicians.
Table S2
Means and Standard Errors of N1 Analysis Variables at Each Level.
Group  Stimulus  Deviant size  M       SE
M      music     standard      -0.471  0.337
M      music     large         -0.505  0.385
M      music     small         -0.660  0.437
M      speech    standard      -0.590  0.277
M      speech    large         -1.606  0.372
M      speech    small         -0.643  0.233
C      music     standard      -1.567  0.227
C      music     large         -1.476  0.304
C      music     small         -1.278  0.347
C      speech    standard      -1.376  0.218
C      speech    large         -1.900  0.293
C      speech    small         -1.569  0.262
NM     music     standard      -0.781  0.346
NM     music     large         -0.791  0.461
NM     music     small         -0.477  0.315
NM     speech    standard      -0.565  0.265
NM     speech    large         -0.747  0.371
NM     speech    small         -0.593  0.290
Note. M = musicians; C = Cantonese speakers; NM = nonmusicians.
8.2.3.2 P2
There was no significant main effect of group on mean P2 amplitude, F < 1; see Table S3
for means and standard errors. There was a significant main effect of stimulus type [F(1, 57) =
261.821, p < .001, η2p = .821], such that the P2 for the music condition was more positive than
for the speech condition. There was a marginal interaction between stimulus type and group, F(2,
57) = 2.627, p = .081, η2p = .084. The music condition elicited a significantly more positive P2
than the speech condition in musicians [F(1, 57) = 99.406, p < .001, η2p = .636], Cantonese
speakers [F(1, 57) = 51.405, p < .001, η2p = .474], and nonmusicians [F(1, 57) = 122.759, p <
.001, η2p = .683]. There was a significant main effect of deviant size [F(2, 114) = 45.067, p <
.001, η2p = .442], such that the P2 for the standard was more positive than for the large (p < .001)
and small (p < .001) deviants. The P2 was also more positive for the small deviant than for the
large deviant (p = .004). There was a marginal interaction between deviant size and group, F(4,
114) = 2.286, p = .064, η2p = .074. Specifically, there was a significant difference between
deviant P2 amplitudes in musicians, F(2, 56) = 35.938, p < .001, η2p = .562, such that the P2 was
significantly more positive for the standard than for the large (p < .001) and small (p < .001)
deviant. There was also a significant difference between deviant P2 amplitudes in Cantonese
speakers, F(2, 56) = 9.235, p < .001, η2p = .248, where the P2 was significantly more positive for
the standard than for the large deviant (p < .001). The P2 deviant amplitudes also differed in
nonmusicians, F(2, 56) = 10.332, p < .001, η2p = .270, such that the P2 was significantly more
positive for the standard than for the large (p < .001) and small (p = .016) deviant. There was no
significant interaction between stimulus type and deviant size (F < 1), nor was there a significant
interaction between stimulus type, deviant size, and group (F < 1).
Table S3
Means and Standard Errors of P2 Analysis Variables at Each Level.
Group  Stimulus  Deviant size  M       SE
M      music     standard      3.422   0.334
M      music     large         1.709   0.449
M      music     small         2.260   0.502
M      speech    standard      0.302   0.322
M      speech    large         -1.641  0.325
M      speech    small         -1.110  0.402
C      music     standard      3.390   0.574
C      music     large         2.056   0.485
C      music     small         2.950   0.616
C      speech    standard      0.716   0.449
C      speech    large         -0.130  0.455
C      speech    small         0.166   0.522
NM     music     standard      3.321   0.381
NM     music     large         2.341   0.461
NM     music     small         2.915   0.406
NM     speech    standard      -0.157  0.400
NM     speech    large         -1.227  0.431
NM     speech    small         -0.974  0.396
Note. M = musicians; C = Cantonese speakers; NM = nonmusicians.
8.2.4 Discussion
Cantonese speakers had a marginally more negative N1 than nonmusicians, suggesting that Cantonese speakers may have processed the sensory and physical properties of all stimuli somewhat differently than the other groups. There were no between-group differences in P2
amplitude. Collectively, these findings do not replicate past findings showing an enhanced
auditory N1 and P2 for musicians as compared to nonmusicians (e.g., Bosnyak et al., 2004;
Pantev et al., 1998; Shahin et al., 2003), and instead suggest that musicianship and speaking a
tone language do not modulate the N1 or P2 for musical tones or vowel sounds. This is puzzling,
given the converging evidence that suggests that these components coincide with improved
perception (Tremblay et al., 2014). It is possible that the present stimuli were too simple or
familiar for all participants, thus yielding no group-specific enhancement in N1 or P2. However,
other studies have used comparably simple stimuli (e.g., piano tones in Pantev et al., 1998 and
Shahin et al., 2003; pure tones in Bosnyak et al., 2004). Thus, the simplicity or familiarity of the
stimuli may not account for the present lack of an enhanced N1 or P2 in musicians.
It is important to note that despite the evidence for an enhanced P2 coinciding with better
perceptual abilities, whether or not the P2 could serve as a biological marker of auditory learning
had not been studied until recently (Tremblay et al., 2014). To determine if the auditory evoked
P2 response is a biomarker of learning, Tremblay et al. (2014) taught native English speakers to
identify a new pre-voiced temporal cue that is not used phonemically in English; a comparison group was not trained to identify this pre-voicing contrast. Modulations in
brain activity were recorded using EEG and MEG. The P2 amplitude increased across repeated
EEG sessions for all groups, regardless of any change in perceptual performance – an effect that
was retained for months (Tremblay et al., 2014). The changes to P2 amplitude were attributed to
changes in neural activity associated with the acquisition process, rather than the learned
outcome itself (Tremblay et al., 2014). Perhaps the prolonged exposure to the same stimuli
similarly enhanced the P2 of all groups in the present investigation, accounting for why there
was no between-groups difference in P2 amplitude.
At a methodological level, the deviant stimuli elicited a change in amplitude from the
standard for both the N1 and P2, as hypothesized. For the N1, the large deviant had a more
negative amplitude than for the standard and small deviant – an effect that was pronounced in the
speech condition. For the P2, the standard was more positive than the large and small deviants;
furthermore, the small deviant had a more positive P2 than the large deviant. This pattern of
findings is expected given the MMN results. Specifically, the large deviant condition's P2
would have been attenuated (i.e., become more negative) upon elicitation of the MMN, which
overlaps with the P2. The peak MMN for the large deviant condition was indeed more negative
than that of the small deviant condition (Section 3.3.3.1.1). These findings thus confirm that the
deviant stimuli elicited a change relative to the standard and were modulated by the MMN, an expected
result, given the current ERP paradigm.
References
Abrams, D. A., Bhatara, A., Ryali, S., Balaban, E., Levitin, D. J., & Menon, V. (2011). Decoding temporal structure in music and speech relies on shared brain resources but elicits different fine-scale spatial patterns. Cereb Cortex, 21(7), 1507-1518. doi: 10.1093/cercor/bhq198
Abramson, A. S. (1962). The vowels and tones of Standard Thai: Acoustical measurements and experiments. Bloomington: Indiana University Research Centre in Anthropology, Folklore, and Linguistics.
Albert, M. L., Sparks, R. W., & Helm, N. A. (1973). Melodic intonation therapy for aphasia. Arch Neurol, 29(2), 130-131.
Anderson, M. L. (2010). Neural reuse: a fundamental organizational principle of the brain. Behav Brain Sci, 33(4), 245-266; discussion 266-313. doi: 10.1017/S0140525X10000853
Armony, J. L., Aubé, W., Angulo-Perkins, A., Peretz, I., & Concha, L. (2015). The specificity of neural responses to music and their relation to voice processing: An fMRI-adaptation study. Neurosci Lett, 593, 35-39. doi: 10.1016/j.neulet.2015.03.011
Athos, E. A., Levinson, B., Kistler, A., Zemansky, J., Bostrom, A., Freimer, N., & Gitschier, J. (2007). Dichotomy and perceptual distortions in absolute pitch ability. Proc Natl Acad Sci USA, 104(37), 14795-14800. doi: 10.1073/pnas.0703868104
Baddeley, A. D., & Hitch, G. J. (1974). Working memory. In G. H. Bower (Ed.), The psychology of learning and motivation, Volume 8: Advances in research and theory (pp. 47-89). New York: Academic Press.
Baggaley, J. (1974). Measurement of absolute pitch. Psychol Music, 2(2), 11-17.
Baharloo, S., Johnston, P. A., Service, S. K., Gitschier, J., & Freimer, N. B. (1998). Absolute pitch: an approach for identification of genetic and nongenetic components. Am J Hum Genet, 62(2), 224-231. doi: 10.1086/301704
Barac, R., & Bialystok, E. (2012). Bilingual effects on cognitive and linguistic development: role of language, cultural background, and education. Child Dev, 83(2), 413-422. doi: 10.1111/j.1467-8624.2011.01707.x
Barthelemy, M. (2004). Betweenness centrality in large complex networks. The European Physical Journal B-Condensed Matter and Complex Systems, 38(2), 163-168. doi: 10.1140/epjb/e2004-00111-4
Benjamini, Y., & Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann Stat, 1165-1188.
Bent, T., Bradlow, A. R., & Wright, B. A. (2006). The influence of linguistic experience on the cognitive processing of pitch in speech and nonspeech sounds. J Exp Psychol Hum Percept Perform, 32(1), 97-103. doi: 10.1037/0096-1523.32.1.97
Bermudez, P., & Zatorre, R. J. (2009). A distribution of absolute pitch ability as revealed by computerized testing. Music Percept, 27(2), 89-101. doi: 10.1525/mp.2009.27.2.89
Besson, M., Chobert, J., & Marie, C. (2011). Transfer of training between music and speech: Common processing, attention, and memory. Front Psychol, 2, 94. doi: 10.3389/fpsyg.2011.00094
Besson, M., & Macar, F. (1987). An event-related potential analysis of incongruity in music and other non-linguistic contexts. Psychophysiology, 24(1), 14-25.
Bialystok, E. (2011). Reshaping the mind: the benefits of bilingualism. Can J Exp Psychol, 65(4), 229-235. doi: 10.1037/a0025406
Bialystok, E., Craik, F. I., & Luk, G. (2012). Bilingualism: consequences for mind and brain. Trends Cogn Sci, 16(4), 240-250. doi: 10.1016/j.tics.2012.03.001
Bialystok, E., & Depape, A. M. (2009). Musical expertise, bilingualism, and executive functioning. J Exp Psychol Hum Percept Perform, 35(2), 565-574. doi: 10.1037/a0012735
Bialystok, E., & Feng, X. L. (2010). Language proficiency and its implications for monolingual and bilingual children (A. Y. Durgunoglu & C. Goldenberg Eds.). New York: Guilford Press.
Bialystok, E., Luk, G., Peets, K. F., & Yang, S. (2010). Receptive vocabulary differences in monolingual and bilingual children. Biling (Camb Engl), 13(4), 525-531. doi: 10.1017/S1366728909990423
Bidelman, G. M., Gandour, J. T., & Krishnan, A. (2011a). Cross-domain effects of music and language experience on the representation of pitch in the human auditory brainstem. J Cogn Neurosci, 23(2), 425-434. doi: 10.1162/jocn.2009.21362
Bidelman, G. M., Gandour, J. T., & Krishnan, A. (2011b). Musicians and tone-language speakers share enhanced brainstem encoding but not perceptual benefits for musical pitch. Brain Cogn, 77(1), 1-10. doi: 10.1016/j.bandc.2011.07.006
Bidelman, G. M., Hutka, S., & Moreno, S. (2013). Tone Language Speakers and Musicians Share Enhanced Perceptual and Cognitive Abilities for Musical Pitch: Evidence for Bidirectionality between the Domains of Language and Music. PLoS One, 8(4), e60676. doi: 10.1371/journal.pone.0060676
Bidelman, G. M., & Krishnan, A. (2010). Effects of reverberation on brainstem representation of speech in musicians and non-musicians. Brain Res, 1355, 112-125. doi: 10.1016/j.brainres.2010.07.100
Bidelman, G. M., Moreno, S., & Alain, C. (2013). Tracing the emergence of categorical speech perception in the human auditory system. Neuroimage, 79, 201-212. doi: 10.1016/j.neuroimage.2013.04.093
Bidelman, G. M., Weiss, M. W., Moreno, S., & Alain, C. (2014). Coordinated plasticity in brainstem and auditory cortex contributes to enhanced categorical speech perception in musicians. Eur J Neurosci, 40(4), 2662-2673. doi: 10.1111/ejn.12627
Bonakdarpour, B., Eftekharzadeh, A., & Ashayeri, H. (2000). Preliminary report on the effects of melodic intonation therapy in the rehabilitation of Persian aphasic patients. Iranian Journal of Medical Sciences, 25, 156-160.
Bonneville-Roussy, A., Lavigne, G. L., & Vallerand, R. J. (2011). When passion leads to excellence: The case of musicians. Psychol Music, 39(1), 123-138. doi: 10.1177/0305735609352441
Bosnyak, D. J., Eaton, R. A., & Roberts, L. E. (2004). Distributed auditory cortical representations are modified when non-musicians are trained at pitch discrimination with 40 Hz amplitude modulated tones. Cereb Cortex, 14(10), 1088-1099.
Bouchard, T. J. (2004). Genetic influence on human psychological traits: A survey. Curr Dir Psychol Sci, 13(4), 148-151. doi: 10.1111/j.0963-7214.2004.00295.x
Brattico, E., Pallesen, K. J., Varyagina, O., Bailey, C., Anourova, I., Jarvenpaa, M., . . . Tervaniemi, M. (2009). Neural discrimination of nonprototypical chords in music experts and laymen: an MEG study. J Cogn Neurosci, 21(11), 2230-2244. doi: 10.1162/jocn.2008.21144
Brattico, E., Tervaniemi, M., Näätänen, R., & Peretz, I. (2006). Musical scale properties are automatically processed in the human auditory cortex. Brain Res, 1117(1), 162-174. doi: 10.1016/j.brainres.2006.08.023
Brandler, S., & Rammsayer, T. H. (2003). Differences in mental abilities between musicians and non-musicians. Psychol Music, 31(2), 123-138. doi: 10.1177/0305735603031002290
Brod, G., & Opitz, B. (2012). Does it really matter? Separating the effects of musical training on syntax acquisition. Front Psychol, 3, 543. doi: 10.3389/fpsyg.2012.00543
Bullmore, E., & Sporns, O. (2009). Complex brain networks: graph theoretical analysis of structural and functional systems. Nat Rev Neurosci, 10(3), 186-198. doi: 10.1038/nrn2575
Catani, M., & Mesulam, M. (2008). The arcuate fasciculus and the disconnection theme in language and aphasia: history and current state. Cortex, 44(8), 953-961. doi: 10.1016/j.cortex.2008.04.002
Ceponiene, R., Lepisto, T., Soininen, M., Aronen, E., Alku, P., & Näätänen, R. (2004). Event-related potentials associated with sound discrimination versus novelty detection in children. Psychophysiology, 41(1), 130-141. doi: 10.1111/j.1469-8986.2003.00138.x
Chan, R. C. K., Shum, D., Toulopoulou, T., & Chen, E. Y. H. (2008). Assessment of executive functions: Review of instruments and identification of critical issues. Arch Clin Neuropsych, 23(2), 201-216. doi: 10.1016/j.acn.2007.08.010
Chandrasekaran, B., Krishnan, A., & Gandour, J. T. (2009). Relative influence of musical and linguistic experience on early cortical processing of pitch contours. Brain Lang, 108(1), 1-9. doi: 10.1016/j.bandl.2008.02.001
Chartrand, J. P., & Belin, P. (2006). Superior voice timbre processing in musicians. Neurosci Lett, 405(3), 164-167. doi: 10.1016/j.neulet.2006.06.053
Chobert, J., Francois, C., Velay, J. L., & Besson, M. (2014). Twelve months of active musical training in 8- to 10-year-old children enhances the preattentive processing of syllabic duration and voice onset time. Cereb Cortex, 24(4), 956-967. doi: 10.1093/cercor/bhs377
Cooper, A., & Wang, Y. (2012). The influence of linguistic and musical experience on Cantonese word learning. J Acoust Soc Am, 131(6), 4756-4769. doi: 10.1121/1.4714355
Corrigall, K. A., & Schellenberg, E. G. (2015). Predicting who takes music lessons: parent and child characteristics. Front Psychol, 6, 282. doi: 10.3389/fpsyg.2015.00282
Corrigall, K. A., & Trainor, L. J. (2011). Associations between length of music training and reading skills in children. Music Percept, 29(2), 147-155. doi: 10.1525/mp.2011.29.2.147
Corrigall, K. A., Schellenberg, E. G., & Misura, N. M. (2013). Music training, cognition, and personality. Front Psychol, 4(222), 1-10. doi: 10.3389/fpsyg.2013.00222
Corsi, P. M. (1972). Human memory and the medial temporal region of the brain [PhD thesis]. McGill University, Montreal.
Costa, M., Goldberger, A. L., & Peng, C. K. (2002). Multiscale entropy analysis of complex physiologic time series. Phys Rev Lett, 89(6), 068102.
Costa, M., Goldberger, A. L., & Peng, C. K. (2005). Multiscale entropy analysis of biological signals. Phys Rev E Stat Nonlin Soft Matter Phys, 71(2 Pt 1), 021906.
Cruttenden, A. (1997). Intonation (2nd ed.). Cambridge: Cambridge University Press.
Cutler, A., Dahan, D., & van Donselaar, W. (1997). Prosody in the comprehension of spoken language: a literature review. Lang Speech, 40(2), 141-201.
de Bruin, A., Treccani, B., & Della Sala, S. (2015). Cognitive advantage in bilingualism: an example of publication bias? Psychol Sci, 26(1), 99-107. doi: 10.1177/0956797614557866
Deco, G., Jirsa, V., McIntosh, A. R., Sporns, O., & Kotter, R. (2009). Key role of coupling, delay, and noise in resting brain fluctuations. Proc Natl Acad Sci U S A, 106(25), 10302-10307. doi: 10.1073/pnas.0901831106
Deco, G., Jirsa, V. K., & McIntosh, A. R. (2011). Emerging concepts for the dynamical organization of resting-state activity in the brain. Nat Rev Neurosci, 12(1), 43-56. doi: 10.1038/nrn2961
Dediu, D., & Ladd, D. R. (2007). Linguistic tone is related to the population frequency of the adaptive haplogroups of two brain size genes, ASPM and Microcephalin. P Natl Acad Sci USA, 104(26), 10944-10949. doi: 10.1073/pnas.0610848104
Deliege, I. (1987). Grouping conditions in listening to music: An approach to Lerdahl & Jackendoff's grouping preference rules. Music Percept, 4(4), 325-359.
Delogu, F., Lampis, G., & Belardinelli, M. O. (2010). From melody to lexical tone: musical ability enhances specific aspects of foreign language perception. Eur J Cogn Psychol, 22(1), 46-61. doi: 10.1080/09541440802708136
Delorme, A., & Makeig, S. (2004). EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J Neurosci Met, 134(1), 9-21. doi: 10.1016/j.jneumeth.2003.10.009
Deutsch, D. (1987). The tritone paradox: effects of spectral variables. Percept Psychophys, 41(6), 563-575.
Deutsch, D. (2013). Absolute pitch. In D. Deutsch (Ed.), The Psychology of Music (3rd ed., pp. 141-182). San Diego: Elsevier.
Deutsch, D., & Dooley, K. (2013). Absolute pitch is associated with a large auditory digit span: a clue to its genesis. J Acoust Soc Am, 133(4), 1859-1861. doi: 10.1121/1.4792217
Deutsch, D., Henthorn, T., & Dolson, M. (1999). Absolute pitch is demonstrated in speakers of tone languages. J Acoust Soc Am, 106(4), 2267-2267. doi: 10.1121/1.427738
Deutsch, D., Henthorn, T., & Dolson, M. (2004). Absolute pitch, speech, and tone language: Some experiments and a proposed framework. Music Percept, 21, 339-356. doi: 10.1525/mp.2004.21.3.339
Deutsch, D., Henthorn, T., Marvin, E., & Xu, H. (2006). Absolute pitch among American and Chinese conservatory students: prevalence differences, and evidence for a speech-related critical period. J Acoust Soc Am, 119(2), 719-722.
Diaconescu, A. O., Alain, C., & McIntosh, A. R. (2011). The co-occurrence of multisensory facilitation and cross-modal conflict in the human brain. J Neurophysiol, 106(6), 2896-2909. doi: 10.1152/jn.00303.2011
Diamond, A. (2013). Executive functions. Annu Rev Psychol, 64, 135-168.
Donchin, E., & Coles, M. G. (1988). Is the P300 component a manifestation of context updating? Behav Brain Sci, 11(03), 357-374.
Dooley, K., & Deutsch, D. (2010). Absolute pitch correlates with high performance on musical dictation. J Acoust Soc Am, 128(2), 890-893. doi: 10.1121/1.3458848
Dooley, K., & Deutsch, D. (2011). Absolute pitch correlates with high performance on interval naming tasks. J Acoust Soc Am, 130(6), 4097-4104. doi: 10.1121/1.3652861
Drayna, D., Manichaikul, A., de Lange, M., Snieder, H., & Spector, T. (2001). Genetic correlates of musical pitch recognition in humans. Science, 291(5510), 1969-1972. doi: 10.1126/science.291.5510.1969
Dung, D. T., Houng, T. T., & Boulakia, G. (1998). Intonation in Vietnamese (D. Hirst & A. Di Cristo Eds.). Cambridge: Cambridge University Press.
Dunn, B. R., Dunn, D. A., Languis, M., & Andrews, D. (1998). The relation of ERP components to complex memory processing. Brain Cogn, 36(3), 355-376. doi: 10.1006/brcg.1998.0998
Efron, B., & Tibshirani, R. (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Stat Sci, 1(1), 54-75.
Engel de Abreu, P. M. (2011). Working memory in multilingual children: is there a bilingual effect? Memory, 19(5), 529-537. doi: 10.1080/09658211.2011.590504
Ericsson, K. A., Krampe, R. T., & Tesch-Römer, C. (1993). The role of deliberate practice in the acquisition of expert performance. Psychol Rev, 100(3), 363-406. doi: 10.1037/0033-295X.100.3.363
Escera, C., Alho, K., Schröger, E., & Winkler, I. (2000). Involuntary attention and distractibility as evaluated with event-related brain potentials. Audiol Neuro-otol, 5(3-4), 151-166.
Esopenko, C., Kumar, P. K., Alain, C., Chow, T. W., McIntosh, A. R., Strother, S., & Levine, B. (2013). The interaction between traumatic brain injury and aging: Evidence from NHL alumni. J Int Neuropsych Soc, 19(S1), i-295.
Faisal, A. A., Selen, L. P., & Wolpert, D. M. (2008). Noise in the nervous system. Nat Rev Neurosci, 9(4), 292-303. doi: 10.1038/nrn2258
Francois, C., Chobert, J., Besson, M., & Schön, D. (2013). Music training for the development of speech segmentation. Cereb Cortex, 23(9), 2038-2043. doi: 10.1093/cercor/bhs180
Fritz, J., Shamma, S., Elhilali, M., & Klein, D. (2003). Rapid task-related plasticity of spectrotemporal receptive fields in primary auditory cortex. Nat Neurosci, 6(11), 1216-1223. doi: 10.1038/nn1141
Fritz, J. B., Elhilali, M., David, S. V., & Shamma, S. A. (2007). Auditory attention--focusing the searchlight on sound. Curr Opin Neurobiol, 17(4), 437-455. doi: 10.1016/j.conb.2007.07.011
Fujioka, T., Trainor, L. J., Ross, B., Kakigi, R., & Pantev, C. (2004). Musical training enhances automatic encoding of melodic contour and interval structure. J Cogn Neurosci, 16(6), 1010-1021. doi: 10.1162/0898929041502706
Fujioka, T., Trainor, L. J., Ross, B., Kakigi, R., & Pantev, C. (2005). Automatic encoding of polyphonic melodies in musicians and nonmusicians. J Cogn Neurosci, 17(10), 1578-1592. doi: 10.1162/089892905774597263
Gandour, J. T. (1981). Perceptual dimensions of tone: Evidence from Cantonese. J Chinese Linguist, 9(1), 20-36.
Garrett, D. D., Kovacevic, N., McIntosh, A. R., & Grady, C. L. (2010). Blood oxygen level-dependent signal variability is more than just noise. J Neurosci, 30(14), 4914-4921. doi: 10.1523/JNEUROSCI.5166-09.2010
Garrett, D. D., Kovacevic, N., McIntosh, A. R., & Grady, C. L. (2011). The importance of being variable. J Neurosci, 31(12), 4496-4503. doi: 10.1523/JNEUROSCI.5641-10.2011
Garrett, D. D., Samanez-Larkin, G. R., MacDonald, S. W., Lindenberger, U., McIntosh, A. R., & Grady, C. L. (2013). Moment-to-moment brain signal variability: a next frontier in human brain mapping? Neurosci Biobehav Rev, 37(4), 610-624. doi: 10.1016/j.neubiorev.2013.02.015
George, E. M., & Coch, D. (2011). Music training and working memory: an ERP study. Neuropsychologia, 49(5), 1083-1094. doi: 10.1016/j.neuropsychologia.2011.02.001
Geschwind, N., & Levitsky, W. (1968). Human brain: left-right asymmetries in temporal speech region. Science, 161(3837), 186-187.
Ghosh, A., Rho, Y., McIntosh, A. R., Kotter, R., & Jirsa, V. K. (2008a). Cortical network dynamics with time delays reveals functional connectivity in the resting brain. Cogn Neurodyn, 2(2), 115-120. doi: 10.1007/s11571-008-9044-2
Ghosh, A., Rho, Y., McIntosh, A. R., Kotter, R., & Jirsa, V. K. (2008b). Noise during rest enables the exploration of the brain's dynamic repertoire. PLoS Comput Biol, 4(10), e1000196. doi: 10.1371/journal.pcbi.1000196
Girvan, M., & Newman, M. E. (2002). Community structure in social and biological networks. Proc Natl Acad Sci USA, 99(12), 7821-7826. doi: 10.1073/pnas.122653799
Giuliano, R. J., Pfordresher, P. Q., Stanley, E. M., Narayana, S., & Wicha, N. Y. (2011). Native experience with a tone language enhances pitch discrimination and the timing of neural responses to pitch change. Front Psychol, 2, 146. doi: 10.3389/fpsyg.2011.00146
Goldberger, A. L., Amaral, L. A., Glass, L., Hausdorff, J. M., Ivanov, P. C., Mark, R. G., . . . Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation, 101(23), E215-220.
Golestani, N., Price, C. J., & Scott, S. K. (2011). Born with an ear for dialects? Structural plasticity in the expert phonetician brain. J Neurosci, 31(11), 4213-4220. doi: 10.1523/JNEUROSCI.3891-10.2011
Gollan, T. H., Montoya, R. I., & Werner, G. A. (2002). Semantic and letter fluency in Spanish-English bilinguals. Neuropsychology, 16(4), 562-576.
Granot, R. Y., Frankel, Y., Gritsenko, V., Lerer, E., Gritsenko, I., Bachner-Melman, R., . . . Ebstein, R. P. (2007). Provisional evidence that the arginine vasopressin 1a receptor gene is associated with musical memory. Evol Hum Behav, 28(5), 313-318.
Gregersen, P. K., Kowalsky, E., Kohn, N., & Marvin, E. W. (1999). Absolute pitch: prevalence, ethnic variation, and estimation of the genetic component. Am J Hum Genet, 65(3), 911-913. doi: 10.1086/302541
Gregersen, P. K., Kowalsky, E., Kohn, N., & Marvin, E. W. (2001). Early childhood music education and predisposition to absolute pitch: teasing apart genes and environment. Am J Med Genet, 98(3), 280-282.
Gregersen, P. K., Kowalsky, E., & Li, W. (2007). Reply to Henthorn and Deutsch: Ethnicity versus early environment: Comment on ‘Early Childhood Music Education and Predisposition to Absolute Pitch: Teasing Apart Genes and Environment’ by Peter K. Gregersen, Elena Kowalsky, Nina Kohn, and Elizabeth West Marvin [2000]. Am J Med Genet A, 143(1), 104-105.
Gudmundsson, S., Runarsson, T. P., Sigurdsson, S., Eiriksdottir, G., & Johnsen, K. (2007). Reliability of quantitative EEG features. Clin Neurophysiol, 118(10), 2162-2171. doi: 10.1016/j.clinph.2007.06.018
Guimera, R., Mossa, S., Turtschi, A., & Amaral, L. A. (2005). The worldwide air transportation network: Anomalous centrality, community structure, and cities' global roles. Proc Natl Acad Sci USA, 102(22), 7794-7799. doi: 10.1073/pnas.0407994102
Guimera, R., & Nunes Amaral, L. A. (2005). Functional cartography of complex metabolic networks. Nature, 433(7028), 895-900. doi: 10.1038/nature03288
Hackley, S. A., Woldorff, M., & Hillyard, S. A. (1990). Cross-modal selective attention effects on retinal, myogenic, brainstem, and cerebral evoked potentials. Psychophysiology, 27(2), 195-208.
Halpern, A. R. (1989). Memory for the absolute pitch of familiar songs. Mem Cognit, 17(5), 572-581.
Halpern, A. R., Martin, J. S., & Reed, T. D. (2008). An ERP study of major-minor classification in melodies. Music Percept, 25, 181-191. doi: 10.1525/mp.2008.25.3.181
Hambrick, D. Z., & Tucker-Drob, E. M. (2015). The genetics of music accomplishment: evidence for gene-environment correlation and interaction. Psychon Bull Rev, 22(1), 112-120. doi: 10.3758/s13423-014-0671-9
Hansen, M., Wallentin, M., & Vuust, P. (2013). Working memory and musical competence of musicians and non-musicians. Psychol Music, 41(6), 779-793. doi: 10.1177/0305735612452186
Heisz, J. J., & McIntosh, A. R. (2013). Applications of EEG neuroimaging data: Event-related potentials, spectral power, and multiscale entropy. J Vis Exp, 76, 50131. doi: 10.3791/50131.
Heisz, J. J., Shedden, J. M., & McIntosh, A. R. (2012). Relating brain signal variability to knowledge representation. Neuroimage, 63(3), 1384-1392. doi: 10.1016/j.neuroimage.2012.08.018
Helmbold, N., Rammsayer, T., & Altenmüller, E. (2005). Differences in primary mental abilities between musicians and nonmusicians. J Ind Diff, 26(2), 74-85.
Hillyard, S. A., Hink, R. F., Schwent, V. L., & Picton, T. W. (1973). Electrical signs of selective attention in the human brain. Science, 182(4108), 177-180.
Holleran, S., Jones, M. R., & Butler, D. (1995). Perceiving implied harmony: the influence of melodic and harmonic context. J Exp Psychol Learn Mem Cogn, 21(3), 737-753.
Honey, C. J., Kotter, R., Breakspear, M., & Sporns, O. (2007). Network structure of cerebral cortex shapes functional connectivity on multiple time scales. Proc Natl Acad Sci U S A, 104(24), 10240-10245. doi: 10.1073/pnas.0701519104
Honing, H., & Ladinig, O. (2008). The potential of the internet for music perception research: A comment on lab-based versus web-based studies. Empirical Musicology Review, 3(1), 4-7.
Horvath, J., Roeber, U., & Schroger, E. (2009). The utility of brief, spectrally rich, dynamic sounds in the passive oddball paradigm. Neurosci Lett, 461(3), 262-265. doi: 10.1016/j.neulet.2009.06.035
Hou, J., Chen, C., Wang, Y., Liu, Y., He, Q., Li, J., & Dong, Q. (2014). Superior pitch identification ability is associated with better executive functions. Psychomusicology: Music, Mind, and Brain, 24(2), 136.
Hutka, S.A., & Alain, C. (2015). The effects of absolute pitch and tone language on pitch processing and encoding in musicians. Music Percept, 32, 344-354. doi: 10.1525/mp.2015.32.4.344
Hutka, S., Bidelman, G. M., & Moreno, S. (2013). Brain signal variability as a window into the bidirectionality between music and language processing: moving from a linear to a nonlinear model. Front Psychol, 4, 984. doi: 10.3389/fpsyg.2013.00984
Hutka, S., Bidelman, G. M., & Moreno, S. (2015). Pitch expertise is not created equal: Cross-domain effects of musicianship and tone language experience on neural and behavioural discrimination of speech and music. Neuropsychologia, 71, 52-63. doi: 10.1016/j.neuropsychologia.2015.03.019
Hyde, K. L., Lerch, J., Norton, A., Forgeard, M., Winner, E., Evans, A. C., & Schlaug, G. (2009). Musical training shapes structural brain development. J Neurosci, 29(10), 3019-3025. doi: 10.1523/JNEUROSCI.5118-08.2009
Jakobson, L. S., Lewycky, S. T., Kilgour, A. R., & Stoesz, B. M. (2008). Memory for verbal and visual material in highly trained musicians. Music Percept, 26, 41-55. doi: 10.1525/mp.2008.26.1.41
Jirsa, V. K., & Kelso, J. A. (2000). Spatiotemporal pattern formation in neural systems with heterogeneous connection topologies. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics, 62(6 Pt B), 8462-8465.
Johnson, R., Jr. (1989). Developmental evidence for modality-dependent P300 generators: a normative study. Psychophysiology, 26(6), 651-667.
Jones, M. R. (1987). Dynamic pattern structure in music: recent theory and research. Percept Psychophys, 41(6), 621-634.
Kalmus, H., & Fry, D. B. (1980). On tune deafness (dysmelodia): frequency, development, genetics and musical background. Ann Hum Genet, 43(4), 369-382.
Keenan, J. P., Thangaraj, V., Halpern, A. R., & Schlaug, G. (2001). Absolute pitch and planum temporale. Neuroimage, 14(6), 1402-1408. doi: 10.1006/nimg.2001.0925
Khouw, E., & Ciocca, V. (2007). Perceptual correlates of Cantonese tones. J Phonetics, 35(1), 104-117.
Klatt, D. H., & Klatt, L. C. (1990). Analysis, synthesis, and perception of voice quality variations among female and male talkers. J Acoust Soc Am, 87, 820-857. doi: 10.1121/1.398894
Klein, M., Coles, M. G., & Donchin, E. (1984). People with absolute pitch process tones without producing a P300. Science, 223(4642), 1306-1309.
Koelsch, S., Maess, B., Gunter, T. C., & Friederici, A. D. (2001). Neapolitan chords activate the area of Broca. A magnetoencephalographic study. Ann N Y Acad Sci, 930, 420-421.
Koelsch, S., Schroger, E., & Tervaniemi, M. (1999). Superior pre-attentive auditory processing in musicians. Neuroreport, 10(6), 1309-1313.
Kotter, R., & Wanke, E. (2005). Mapping brains without coordinates. Philos Trans R Soc Lond B Biol Sci, 360(1456), 751-766. doi: 10.1098/rstb.2005.1625
Kraus, N., & Chandrasekaran, B. (2010). Music training for the development of auditory skills. Nat Rev Neurosci, 11(8), 599-605. doi: 10.1038/nrn2882
Kraus, N., Slater, J., Thompson, E. C., Hornickel, J., Strait, D. L., Nicol, T., & White-Schwoch, T. (2014). Music enrichment programs improve the neural encoding of speech in at-risk children. J Neurosci, 34(36), 11913-11918. doi: 10.1523/JNEUROSCI.1881-14.2014
Krishnan, A., Xu, Y., Gandour, J., & Cariani, P. (2005). Encoding of pitch in the human brainstem is sensitive to language experience. Cognitive Brain Res, 25, 161-168. doi: 10.1016/j.cogbrainres.2005.05.004
Krizman, J., Marian, V., Shook, A., Skoe, E., & Kraus, N. (2012). Subcortical encoding of sound is enhanced in bilinguals and relates to executive function advantages. P Natl Acad Sci USA, 109(20), 7877-7881.
Krumhansl, C. L. (1990). Cognitive foundations of musical pitch. New York: Oxford University Press.
Larsen-Freeman, D. (1997). Chaos/Complexity science and second language acquisition. Appl Linguist, 18(2), 141-165. doi: 10.1093/applin/18.2.141
Laughlin, S. A., Naeser, M. A., & Gordon, W. P. (1979). Effects of three syllable durations using the melodic intonation therapy technique. J Speech Hear Res, 22(2), 311-320.
Lee, C. Y., & Lee, Y. F. (2010). Perception of musical pitch and lexical tones by Mandarin-speaking musicians. J Acoust Soc Am, 127(1), 481-490. doi: 10.1121/1.3266683
Levitin, D. J. (1994). Absolute memory for musical pitch: evidence from the production of learned melodies. Percept Psychophys, 56(4), 414-423.
Levitin, D. J., & Rogers, S. E. (2005). Absolute pitch: perception, coding, and controversies. Trends Cogn Sci, 9(1), 26-33. doi: 10.1016/j.tics.2004.11.007
Levitt, H. (1971). Transformed up-down methods in psychoacoustics. J Acoust Soc Am, 49(2B), 467-477. doi: 10.1121/1.1912375
Li, P., Sepanski, S., & Zhao, X. (2006). Language history questionnaire: A Web-based interface for bilingual research. Behav Res Methods, 38, 202-210. doi: 10.3758/bf03192770
Lippe, S., Kovacevic, N., & McIntosh, A. R. (2009). Differential maturation of brain signal complexity in the human auditory and visual system. Front Hum Neurosci, 3, 48. doi: 10.3389/neuro.09.048.2009
Loui, P., Li, H. C., Hohmann, A., & Schlaug, G. (2011). Enhanced cortical connectivity in absolute pitch musicians: a model for local hyperconnectivity. J Cogn Neurosci, 23(4), 1015-1026. doi: 10.1162/jocn.2010.21500
Loui, P., Zamm, A., & Schlaug, G. (2012). Enhanced functional networks in absolute pitch. Neuroimage, 63(2), 632-640. doi: 10.1016/j.neuroimage.2012.07.030
Luck, S. J. (2005). An introduction to the event-related potential technique (2nd ed.). Cambridge, MA: MIT Press.
Luk, G., & Bialystok, E. (2013). Bilingualism is not a categorical variable: Interaction between language proficiency and usage. Journal of Cognitive Psychology, 25(5), 605–621. doi: 10.1080/20445911.2013.795574
Macnamara, B. N., Hambrick, D. Z., & Oswald, F. L. (2014). Deliberate practice and performance in music, games, sports, education, and professions: a meta-analysis. Psychol Sci, 25(8), 1608-1618. doi: 10.1177/0956797614535810
Maddieson, I. (2013). Tone. In M. S. Dryer & M. Haspelmath (Eds.), The world atlas of language structures online. Retrieved from http://wals.info/chapter/13
Maess, B., Koelsch, S., Gunter, T. C., & Friederici, A. D. (2001). Musical syntax is processed in Broca's area: an MEG study. Nat Neurosci, 4(5), 540-545. doi: 10.1038/87502
Magne, C., Schön, D., & Besson, M. (2006). Musician children detect pitch violations in both music and language better than nonmusician children: behavioral and electrophysiological approaches. J Cogn Neurosci, 18(2), 199-211. doi: 10.1162/jocn.2006.18.2.199
Marie, C., Delogu, F., Lampis, G., Belardinelli, M. O., & Besson, M. (2011). Influence of musical expertise on segmental and tonal processing in Mandarin Chinese. J Cogn Neurosci, 23(10), 2701-2715. doi: 10.1162/jocn.2010.21585
Marie, C., Magne, C., & Besson, M. (2011). Musicians and the metric structure of words. J Cogn Neurosci, 23(2), 294-305. doi: 10.1162/jocn.2010.21413
Marques, C., Moreno, S., Castro, S. L., & Besson, M. (2007). Musicians detect pitch violation in a foreign language better than nonmusicians: behavioral and electrophysiological evidence. J Cogn Neurosci, 19(9), 1453-1463. doi: 10.1162/jocn.2007.19.9.1453
Matthews, G., Deary, I. J., & Whiteman, M. C. (2003). Personality Traits. Cambridge, UK: Cambridge University Press.
McIntosh, A. R., Bookstein, F. L., Haxby, J. V., & Grady, C. L. (1996). Spatial pattern analysis of functional brain images using partial least squares. Neuroimage, 3(3 Pt 1), 143-157. doi: 10.1006/nimg.1996.0016
McIntosh, A. R., Kovacevic, N., & Itier, R. J. (2008). Increased brain signal variability accompanies lower behavioral variability in development. PLoS Comput Biol, 4(7), e1000106. doi: 10.1371/journal.pcbi.1000106
McIntosh, A. R., & Lobaugh, N. J. (2004). Partial least squares analysis of neuroimaging data: applications and advances. Neuroimage, 23 Suppl 1, S250-263. doi: 10.1016/j.neuroimage.2004.07.020
McIntosh, A. R., Vakorin, V., Kovacevic, N., Wang, H., Diaconescu, A., & Protzner, A. B. (2014). Spatiotemporal dependency of age-related changes in brain signal variability. Cereb Cortex, 24(7), 1806-1817. doi: 10.1093/cercor/bht030
McKenna, T. M., McMullen, T. A., & Shlesinger, M. F. (1994). The brain as a dynamic physical system. Neuroscience, 60(3), 587-605.
Merrett, D. L., Peretz, I., & Wilson, S. J. (2014). Neurobiological, cognitive, and emotional mechanisms in melodic intonation therapy. Front Hum Neurosci, 8, 401. doi: 10.3389/fnhum.2014.00401
Merrill, J., Sammler, D., Bangert, M., Goldhahn, D., Lohmann, G., Turner, R., & Friederici, A. D. (2012). Perception of words and pitch patterns in song and speech. Front Psychol, 3, 76. doi: 10.3389/fpsyg.2012.00076
Milovanov, R., Huotilainen, M., Esquef, P. A., Alku, P., Valimaki, V., & Tervaniemi, M. (2009). The role of musical aptitude and language skills in preattentive duration processing in school-aged children. Neurosci Lett, 460(2), 161-165. doi: 10.1016/j.neulet.2009.05.063
Misic, B., Mills, T., Taylor, M. J., & McIntosh, A. R. (2010). Brain noise is task dependent and region specific. J Neurophysiol, 104(5), 2667-2676. doi: 10.1152/jn.00648.2010
Miyazaki, K. (1988). Musical pitch identification by absolute pitch possessors. Percept Psychophys, 44(6), 501-512.
Miyazaki, K., & Rakowski, A. (2002). Recognition of notated melodies by possessors and nonpossessors of absolute pitch. Percept Psychophys, 64(8), 1337-1345.
Miyazaki, K. I. (1990). The speed of musical pitch identification by absolute-pitch possessors. Music Percept, 8(2), 177-188.
Miyazaki, K. I. (1993). Absolute pitch as an inability: Identification of musical intervals in a tonal context. Music Percept, 11, 55-71.
Miyazaki, K. I. (1995). Perception of relative pitch with different references: Some absolute-pitch listeners can’t tell musical interval names. Percept Psychophys, 57(7), 962-970.
Mok, P. P., & Zuo, D. (2012). The separation between music and speech: Evidence from the perception of Cantonese tones. J Acoust Soc Am, 132(4), 2711-2720.
Moore, E., Schaefer, R. S., Bastin, M. E., Roberts, N., & Overy, K. (2014). Can musical training influence brain connectivity? Evidence from diffusion tensor MRI. Brain Sci, 4(2), 405-427. doi: 10.3390/brainsci4020405
Morales, J., Calvo, A., & Bialystok, E. (2013). Working memory development in monolingual and bilingual children. J Exp Child Psychol, 114(2), 187-202. doi: 10.1016/j.jecp.2012.09.002
Moreno, S., Bialystok, E., Barac, R., Schellenberg, E. G., Cepeda, N. J., & Chau, T. (2011). Short-term music training enhances verbal intelligence and executive function. Psychol Sci, 22(11), 1425-1433. doi: 10.1177/0956797611416999
Moreno, S., & Bidelman, G. M. (2014). Examining neural plasticity and cognitive benefit through the unique lens of musical training. Hear Res, 308, 84-97. doi: 10.1016/j.heares.2013.09.012
Moreno, S., Lee, Y., Janus, M., & Bialystok, E. (2014). Short-Term Second Language and Music Training Induces Lasting Functional Brain Changes in Early Childhood. Child Dev. doi: 10.1111/cdev.12297
Moreno, S., Marques, C., Santos, A., Santos, M., Castro, S. L., & Besson, M. (2009). Musical training influences linguistic abilities in 8-year-old children: more evidence for brain plasticity. Cereb Cortex, 19(3), 712-723. doi: 10.1093/cercor/bhn120
Moreno, S., Wodniecka, Z., Tays, W., Alain, C., & Bialystok, E. (2014). Inhibitory Control in Bilinguals and Musicians: Event Related Potential (ERP) Evidence for Experience-Specific Effects. PLoS One, 9(4), e94169. doi: 10.1371/journal.pone.0094169
Morley, A. P., Narayanan, M., Mines, R., Molokhia, A., Baxter, S., Craig, G., . . . Craig, I. (2012). AVPR1A and SLC6A4 polymorphisms in choral singers and non-musicians: a gene association study. PLoS One, 7(2), e31763. doi: 10.1371/journal.pone.0031763
Mosing, M. A., Madison, G., Pedersen, N. L., Kuja-Halkola, R., & Ullén, F. (2014). Practice does not make perfect: No causal effect of music practice on music ability. Psychol Sci. doi: 10.1177/0956797614541990
Mullensiefen, D., Gingras, B., Musil, J., & Stewart, L. (2014). The musicality of non-musicians: an index for assessing musical sophistication in the general population. PLoS One, 9(2), e89642. doi: 10.1371/journal.pone.0089642
Musacchia, G., Sams, M., Skoe, E., & Kraus, N. (2007). Musicians have enhanced subcortical auditory and audiovisual processing of speech and music. Proc Natl Acad Sci U S A, 104(40), 15894-15898. doi: 10.1073/pnas.0701498104
Musacchia, G., Strait, D., & Kraus, N. (2008). Relationships between behavior, brainstem and cortical encoding of seen and heard speech in musicians and non-musicians. Hear Res, 241(1-2), 34-42. doi: 10.1016/j.heares.2008.04.013
Myers, E. B., & Swan, K. (2012). Effects of category learning on neural sensitivity to non-native phonetic categories. J Cogn Neurosci, 24(8), 1695-1708. doi: 10.1162/jocn_a_00243
Näätänen, R. (1990). The role of attention in auditory information processing as revealed by event-related potentials and other brain measures of cognitive function. Behav Brain Sci, 13(02), 201-233.
Näätänen, R., Paavilainen, P., Rinne, T., & Alho, K. (2007). The mismatch negativity (MMN) in basic research of central auditory processing: a review. Clin Neurophysiol, 118(12), 2544-2590. doi: 10.1016/j.clinph.2007.04.026
Näätänen, R., Pakarinen, S., Rinne, T., & Takegata, R. (2004). The mismatch negativity (MMN): towards the optimal paradigm. Clin Neurophysiol, 115(1), 140-144.
Näätänen, R., & Picton, T. (1987). The N1 wave of the human electric and magnetic response to sound: a review and an analysis of the component structure. Psychophysiology, 24(4), 375-425.
Nan, Y., Sun, Y., & Peretz, I. (2010). Congenital amusia in speakers of a tone language: association with lexical tone agnosia. Brain, 133(9), 2635-2642. doi: 10.1093/brain/awq178
Narmour, E. (1990). The analysis and cognition of basic melodic structures: The implication–realization model. Chicago: University of Chicago Press.
Oechslin, M. S., Meyer, M., & Jäncke, L. (2010). Absolute pitch—Functional evidence of speech-relevant auditory acuity. Cereb Cortex, 20(2), 447-455.
Oechslin, M. S., Van De Ville, D., Lazeyras, F., Hauert, C. A., & James, C. E. (2013). Degree of musical expertise modulates higher order brain functioning. Cereb Cortex, 23(9), 2213-2224. doi: 10.1093/cercor/bhs206
Oostenveld, R., & Praamstra, P. (2001). The five percent electrode system for high-resolution EEG and ERP measurements. Clin Neurophysiol, 112(4), 713-719.
Owen, A. M., Hampshire, A., Grahn, J. A., Stenton, R., Dajani, S., Burns, A. S., . . . Ballard, C. G. (2010). Putting brain training to the test. Nature, 465(7299), 775-778. doi: 10.1038/nature09042
Paap, K. R., Johnson, H. A., & Sawi, O. (2014). Are bilingual advantages dependent upon specific tasks or specific bilingual experiences? J Cogn Psychol, 26(6), 615-639. doi: 10.1080/20445911.2014.944914
Pallesen, K. J., Brattico, E., Bailey, C. J., Korvenoja, A., Koivisto, J., Gjedde, A., & Carlson, S. (2010). Cognitive control in auditory working memory is enhanced in musicians. PLoS One, 5(6), e11120. doi: 10.1371/journal.pone.0011120
Pantev, C., Eulitz, C., Hampson, S., Ross, B., & Roberts, L. E. (1996). The auditory evoked "off" response: sources and comparison with the "on" and the "sustained" responses. Ear Hear, 17(3), 255-265.
Pantev, C., Oostenveld, R., Engelien, A., Ross, B., Roberts, L. E., & Hoke, M. (1998). Increased auditory cortical representation in musicians. Nature, 392(6678), 811-814. doi: 10.1038/33918
Pantev, C., Roberts, L. E., Schulz, M., Engelien, A., & Ross, B. (2001). Timbre-specific enhancement of auditory cortical representations in musicians. Neuroreport, 12(1), 169-174.
Parbery-Clark, A., Skoe, E., & Kraus, N. (2009). Musical experience limits the degradative effects of background noise on the neural processing of sound. J Neurosci, 29(45), 14100-14107. doi: 10.1523/JNEUROSCI.3256-09.2009
Parbery-Clark, A., Skoe, E., Lam, C., & Kraus, N. (2009). Musician enhancement for speech-in-noise. Ear Hear, 30(6), 653-661.
Parbery-Clark, A., Strait, D. L., Hittner, E., & Kraus, N. (2013). Musical training enhances neural processing of binaural sounds. J Neurosci, 33(42), 16741-16747. doi: 10.1523/JNEUROSCI.5700-12.2013
Parbery-Clark, A., Strait, D. L., & Kraus, N. (2011). Context-dependent encoding in the auditory brainstem subserves enhanced speech-in-noise perception in musicians. Neuropsychologia, 49(12), 3338-3345. doi: 10.1016/j.neuropsychologia.2011.08.007
Park, H., Lee, S., Kim, H. J., Ju, Y. S., Shin, J. Y., Hong, D., . . . Seo, J. S. (2012). Comprehensive genomic analyses associate UGT8 variants with musical ability in a Mongolian population. J Med Genet, 49(12), 747-752. doi: 10.1136/jmedgenet-2012-101209
Pascual-Marqui, R. D. (2002). Standardized low-resolution brain electromagnetic tomography (sLORETA): technical details. Methods Find Exp Clin Pharmacol, 24 Suppl D, 5-12.
Patel, A. D. (2003). Language, music, syntax and the brain. Nat Neurosci, 6(7), 674-681. doi: 10.1038/nn1082
Patel, A. D. (2008). Music, language, and the brain. New York: Oxford University Press.
Patel, A. D. (2011). Why would musical training benefit the neural encoding of speech? The OPERA hypothesis. Front Psychol, 2, 1-14. doi: 10.3389/fpsyg.2011.00142
Patel, A. D. (2014). Can nonlinguistic musical training change the way the brain processes speech? The expanded OPERA hypothesis. Hear Res, 308, 98-108. doi: 10.1016/j.heares.2013.08.011
Peng, G. (2006). Temporal and tonal aspects of Chinese syllables: A corpus-based comparative study of Mandarin and Cantonese. J Chinese Linguist, 34(1), 134.
Peretz, I., Gosselin, N., Tillmann, B., Cuddy, L. L., Gagnon, B., Trimmer, C. G., . . . Bouchard, B. (2008). On-line identification of congenital amusia. Music Percept, 25(4), 331-343. doi: 10.1525/mp.2008.25.4.331
Peretz, I., Vuvan, D., Lagrois, M. É., & Armony, J. L. (2015). Neural overlap in processing music and speech. Philos Trans R Soc Lond B Biol Sci, 370(1664), 20140090. doi: 10.1098/rstb.2014.0090
Peterson, G. E., & Barney, H. L. (1952). Control methods used in a study of the vowels. J Acoust Soc Am, 24, 175-184. doi: 10.1121/1.1917300
Pfordresher, P. Q., & Brown, S. (2009). Enhanced production and perception of musical pitch in tone language speakers. Atten Percept Psychophys, 71(6), 1385-1398. doi: 10.3758/APP.71.6.1385
Picton, T. W., Alain, C., Woods, D. L., John, M. S., Scherg, M., Valdes-Sosa, P., . . . Trujillo, N. J. (1999). Intracerebral sources of human auditory-evoked potentials. Audiol Neurootol, 4(2), 64-79.
Picton, T. W., Hillyard, S. A., Krausz, H. I., & Galambos, R. (1974). Human auditory evoked potentials. I. Evaluation of components. Electroen Clin Neuro, 36(2), 179-190.
Pinneo, L. R. (1966). On noise in the nervous system. Psychol Rev, 73(3), 242-247.
Polich, J. (2007). Updating P300: an integrative theory of P3a and P3b. Clin Neurophysiol, 118(10), 2128-2148.
Portocarrero, J. S., Burright, R. G., & Donovick, P. J. (2007). Vocabulary and verbal fluency of bilingual and monolingual college students. Arch Clin Neuropsychol, 22(3), 415-422. doi: 10.1016/j.acn.2007.01.015
Profita, J., & Bidder, T. G. (1988). Perfect pitch. Am J Med Genet, 29(4), 763-771. doi: 10.1002/ajmg.1320290405
Protzner, A. B., Valiante, T. A., Kovacevic, N., McCormick, C., & McAndrews, M. P. (2010). Hippocampal signal complexity in mesial temporal lobe epilepsy: a noisy brain is a healthy brain. Arch Ital Biol, 148(3), 289-297.
Pulli, K., Karma, K., Norio, R., Sistonen, P., Göring, H. H., & Järvelä, I. (2008). Genome-wide linkage scan for loci of musical aptitude in Finnish families: evidence for a major locus at 4q22. J Med Genet, 45(7), 451-456. doi: 10.1136/jmg.2007.056366
Putkinen, V., Tervaniemi, M., & Huotilainen, M. (2013). Informal musical activities are linked to auditory discrimination and attention in 2-3-year-old children: an event-related potential study. Eur J Neurosci, 37(4), 654-661. doi: 10.1111/ejn.12049
Raichle, M. E., MacLeod, A. M., Snyder, A. Z., Powers, W. J., Gusnard, D. A., & Shulman, G. L. (2001). A default mode of brain function. Proc Natl Acad Sci U S A, 98(2), 676-682. doi: 10.1073/pnas.98.2.676
Raichle, M. E., & Snyder, A. Z. (2007). A default mode of brain function: a brief history of an evolving idea. Neuroimage, 37(4), 1083-1090; discussion 1097-1089. doi: 10.1016/j.neuroimage.2007.02.041
Raja Beharelle, A., Kovacevic, N., McIntosh, A. R., & Levine, B. (2012). Brain signal variability relates to stability of behavior after recovery from diffuse brain injury. Neuroimage, 60(2), 1528-1537. doi: 10.1016/j.neuroimage.2012.01.037
Rattanasone, N. X., Attina, V., Kasisopa, B., & Burnham, D. (2013). How to compare tones. In South and Southeast Asian Psycholinguistics. Cambridge: Cambridge University Press.
Ravasz, E., & Barabási, A. L. (2003). Hierarchical organization in complex networks. Phys Rev E Stat Nonlin Soft Matter Phys, 67(2 Pt 2), 026112.
Raven, J., Raven, J. C., & Court, J. H. (1998). Advanced progressive matrices. San Antonio, TX: Harcourt Assessment.
Richman, J. S., & Moorman, J. R. (2000). Physiological time-series analysis using approximate entropy and sample entropy. Am J Physiol Heart Circ Physiol, 278(6), H2039-2049.
Ross, D. A., Gore, J. C., & Marks, L. E. (2005). Absolute pitch: music and beyond. Epilepsy Behav, 7(4), 578-601. doi: 10.1016/j.yebeh.2005.05.019
Rosselli, M., Ardila, A., Araujo, K., Weekes, V. A., Caracciolo, V., Padilla, M., & Ostrosky-Solis, F. (2000). Verbal fluency and repetition skills in healthy older Spanish-English bilinguals. Appl Neuropsychol, 7(1), 17-24. doi: 10.1207/S15324826AN0701_3
Russo, F. A., Ives, D. T., Goy, H., Pichora-Fuller, M. K., & Patterson, R. D. (2012). Age-related difference in melodic pitch perception is probably mediated by temporal processing: empirical and computational evidence. Ear Hear, 33(2), 177-186. doi: 10.1097/AUD.0b013e318233acee
Ruthsatz, J., Detterman, D., Griscom, W. S., & Cirullo, B. A. (2008). Becoming an expert in the musical domain: It takes more than just practice. Intelligence, 36(4), 330-338. doi: 10.1016/j.intell.2007.08.003
Sammler, D., Koelsch, S., Ball, T., Brandt, A., Grigutsch, M., Huppertz, H. J., . . . Schulze-Bonhage, A. (2013). Co-localizing linguistic and musical syntax with intracranial EEG. Neuroimage, 64, 134-146. doi: 10.1016/j.neuroimage.2012.09.035
Sampson, P. D., Streissguth, A. P., Barr, H. M., & Bookstein, F. L. (1989). Neurobehavioral effects of prenatal alcohol: Part II. Partial least squares analysis. Neurotoxicol Teratol, 11(5), 477-491.
Schellenberg, E. G. (2004). Music lessons enhance IQ. Psychol Sci, 15(8), 511-514. doi: 10.1111/j.0956-7976.2004.00711.x
Schellenberg, E. G. (2006). Long-term positive associations between music lessons and IQ. J Educ Psychol, 98(2), 457-468. doi: 10.1037/0022-0663.98.2.457
Schellenberg, E. G. (2011a). Examining the association between music lessons and intelligence. Br J Psychol, 102(3), 283-302.
Schellenberg, E. G. (2011b). Music lessons, emotional intelligence, and IQ. Music Percept, 29(2), 185-194. doi: 10.1525/mp.2011.29.2.185
Schellenberg, E. G. (2015). Music training and speech perception: a gene-environment interaction. Ann N Y Acad Sci, 1337, 170-177. doi: 10.1111/nyas.12627
Schellenberg, E. G., & Moreno, S. (2010). Music lessons, pitch processing, and g. Psychol Music, 38(2), 209-221. doi: 10.1177/0305735609339473
Schellenberg, E. G., & Peretz, I. (2008). Music, language and cognition: unresolved issues. Trends Cogn Sci, 12(2), 45-46. doi: 10.1016/j.tics.2007.11.005
Schellenberg, E. G., & Trehub, S. E. (2003). Good pitch memory is widespread. Psychol Sci, 14(3), 262-266.
Schellenberg, E. G., & Trehub, S. E. (2008). Is there an Asian advantage for pitch memory? Music Percept, 25, 241-252. doi: 10.1525/mp.2008.25.3.241
Scherg, M., Vajsar, J., & Picton, T. W. (1989). A source analysis of the late human auditory evoked potentials. J Cogn Neurosci, 1(4), 336-355. doi: 10.1162/jocn.1989.1.4.336
Schlaug, G., Forgeard, M., Zhu, L., Norton, A., Norton, A., & Winner, E. (2009). Training-induced neuroplasticity in young children. Ann N Y Acad Sci, 1169, 205-208. doi: 10.1111/j.1749-6632.2009.04842.x
Schlaug, G., Jancke, L., Huang, Y., & Steinmetz, H. (1995). In vivo evidence of structural brain asymmetry in musicians. Science, 267(5198), 699-701.
Schlaug, G., Marchina, S., & Norton, A. (2008). From singing to speaking: Why singing may lead to recovery of expressive language function in patients with Broca's aphasia. Music Percept, 25(4), 315-323. doi: 10.1525/MP.2008.25.4.315
Schlaug, G., Marchina, S., & Norton, A. (2009). Evidence for plasticity in white-matter tracts of patients with chronic Broca's aphasia undergoing intense intonation-based speech therapy. Ann N Y Acad Sci, 1169, 385-394. doi: 10.1111/j.1749-6632.2009.04587.x
Schlaug, G., Norton, A., Overy, K., & Winner, E. (2005). Effects of music training on the child's brain and cognitive development. Ann N Y Acad Sci, 1060, 219-230. doi: 10.1196/annals.1360.015
Schmithorst, V. J., & Wilke, M. (2002). Differences in white matter architecture between musicians and non-musicians: a diffusion tensor imaging study. Neurosci Lett, 321(1-2), 57-60.
Schön, D., Magne, C., & Besson, M. (2004). The music of speech: music training facilitates pitch processing in both music and language. Psychophysiology, 41(3), 341-349. doi: 10.1111/1469-8986.00172.x
Seppanen, M., Pesonen, A. K., & Tervaniemi, M. (2012). Music training enhances the rapid plasticity of P3a/P3b event-related brain potentials for unattended and attended target sounds. Atten Percept Psychophys, 74(3), 600-612. doi: 10.3758/s13414-011-0257-9
Sergeant, D., & Thatcher, G. (1974). Intelligence, social status and musical abilities. Psychol Music, 2(2), 32-57. doi: 10.1177/030573567422005
Shahin, A., Bosnyak, D. J., Trainor, L. J., & Roberts, L. E. (2003). Enhancement of neuroplastic P2 and N1c auditory evoked potentials in musicians. J Neurosci, 23(13), 5545-5552.
Shestakova, A., Huotilainen, M., Ceponiene, R., & Cheour, M. (2003). Event-related potentials associated with second language learning in children. Clin Neurophysiol, 114(8), 1507-1512.
Slevc, L. R. (2012). Language and music: sound, structure, and meaning. Wiley Interdiscip Rev Cogn Sci, 3(4), 483-492. doi: 10.1002/wcs.1186
Slevc, L. R., Rosenberg, J. C., & Patel, A. D. (2009). Making psycholinguistics musical: self-paced reading time evidence for shared processing of linguistic and musical syntax. Psychon Bull Rev, 16(2), 374-381. doi: 10.3758/16.2.374
Sparks, R., Helm, N., & Albert, M. (1974). Aphasia rehabilitation resulting from melodic intonation therapy. Cortex, 10(4), 303-316.
Stagray, J. R., & Downs, D. (1993). Differential sensitivity for frequency among speakers of a tone and a nontone language. J Chinese Linguist, 21(1), 143-163.
Steele, C. J., Bailey, J. A., Zatorre, R. J., & Penhune, V. B. (2013). Early musical training and white-matter plasticity in the corpus callosum: evidence for a sensitive period. J Neurosci, 33(3), 1282-1290. doi: 10.1523/JNEUROSCI.3578-12.2013
Stein, R. B., Gossen, E. R., & Jones, K. E. (2005). Neuronal variability: noise or part of the signal? Nat Rev Neurosci, 6(5), 389-397. doi: 10.1038/nrn1668
Steinke, W. R., Cuddy, L. L., & Holden, R. R. (1997). Dissociation of musical tonality and pitch memory from nonmusical cognitive abilities. Can J Exp Psychol, 51(4), 316-334.
Strait, D. L., Kraus, N., Skoe, E., & Ashley, R. (2009). Musical experience and neural efficiency: effects of training on subcortical processing of vocal expressions of emotion. Eur J Neurosci, 29(3), 661-668. doi: 10.1111/j.1460-9568.2009.06617.x
Strait, D. L., O'Connell, S., Parbery-Clark, A., & Kraus, N. (2014). Musicians' enhanced neural differentiation of speech sounds arises early in life: developmental evidence from ages 3 to 30. Cereb Cortex, 24(9), 2512-2521. doi: 10.1093/cercor/bht103
Tadel, F., Baillet, S., Mosher, J. C., Pantazis, D., & Leahy, R. M. (2011). Brainstorm: a user-friendly application for MEG/EEG analysis. Comput Intell Neurosci, 2011, 879716. doi: 10.1155/2011/879716
Takeuchi, A. H., & Hulse, S. H. (1993). Absolute pitch. Psychol Bull, 113(2), 345-361.
Tan, Y. T., McPherson, G. E., Peretz, I., Berkovic, S. F., & Wilson, S. J. (2014). The genetic basis of music ability. Front Psychol, 5, 658. doi: 10.3389/fpsyg.2014.00658
Terhardt, E., & Seewann, M. (1983). Aural key identification and its relationship to absolute pitch. Music Percept, 1, 63-83.
Terhardt, E., & Ward, W. D. (1982). Recognition of musical key: Exploratory study. J Acoust Soc Am, 72, 26-33.
Tervaniemi, M., Just, V., Koelsch, S., Widmann, A., & Schröger, E. (2005). Pitch discrimination accuracy in musicians vs nonmusicians: an event-related potential and behavioral study. Exp Brain Res, 161(1), 1-10.
Tervaniemi, M., Rytkönen, M., Schröger, E., Ilmoniemi, R. J., & Näätänen, R. (2001). Superior formation of cortical memory traces for melodic patterns in musicians. Learn Mem, 8(5), 295-300. doi: 10.1101/lm.39501
Theusch, E., Basu, A., & Gitschier, J. (2009). Genome-wide study of families with absolute pitch reveals linkage to 8q24.21 and locus heterogeneity. Am J Hum Genet, 85(1), 112-119. doi: 10.1016/j.ajhg.2009.06.010
Thompson, W. F., Schellenberg, E. G., & Husain, G. (2004). Decoding speech prosody: do music lessons help? Emotion, 4(1), 46-64. doi: 10.1037/1528-3542.4.1.46
Tierney, A., & Kraus, N. (2013). Music training for the development of reading skills. Prog Brain Res, 207, 209-241. doi: 10.1016/B978-0-444-63327-9.00008-4
Tononi, G., Sporns, O., & Edelman, G. M. (1994). A measure for brain complexity: relating functional segregation and integration in the nervous system. Proc Natl Acad Sci U S A, 91(11), 5033-5037.
Tononi, G., Sporns, O., & Edelman, G. M. (1996). A complexity measure for selective matching of signals by the brain. Proc Natl Acad Sci U S A, 93(8), 3422-3427.
Traynelis, S. F., & Jaramillo, F. (1998). Getting the most out of noise in the central nervous system. Trends Neurosci, 21(4), 137-145.
Tremblay, K. L., Ross, B., Inoue, K., McClannahan, K., & Collet, G. (2014). Is the auditory evoked P2 response a biomarker of learning? Front Syst Neurosci, 8, 28. doi: 10.3389/fnsys.2014.00028
Ukkola, L. T., Onkamo, P., Raijas, P., Karma, K., & Järvelä, I. (2009). Musical aptitude is associated with AVPR1A-haplotypes. PLoS One, 4(5), e5534. doi: 10.1371/journal.pone.0005534
Ukkola-Vuoti, L., Kanduri, C., Oikkonen, J., Buck, G., Blancher, C., Raijas, P., . . . Järvelä, I. (2013). Genome-wide copy number variation analysis in extended families and unrelated individuals characterized for musical aptitude and creativity in music. PLoS One, 8(2), e56356. doi: 10.1371/journal.pone.0056356
Ukkola-Vuoti, L., Oikkonen, J., Onkamo, P., Karma, K., Raijas, P., & Järvelä, I. (2011). Association of the arginine vasopressin receptor 1A (AVPR1A) haplotypes with listening to music. J Hum Genet, 56(4), 324-329. doi: 10.1038/jhg.2011.13
Vakorin, V. A., Misic, B., Krakovska, O., & McIntosh, A. R. (2011). Empirical and theoretical aspects of generation and transfer of information in a neuromagnetic source network. Front Syst Neurosci, 5, 96. doi: 10.3389/fnsys.2011.00096
Vaughan, H. G., Jr., & Ritter, W. (1970). The sources of auditory evoked responses recorded from the human scalp. Electroen Clin Neuro, 28(4), 360-367.
von Stein, A., & Sarnthein, J. (2000). Different frequencies for different scales of cortical integration: from local gamma to long range alpha/theta synchronization. Int J Psychophysiol, 38(3), 301-313.
Vuust, P., Roepstorff, A., Wallentin, M., Mouridsen, K., & Ostergaard, L. (2006). It don't mean a thing... Keeping the rhythm during polyrhythmic tension, activates language areas (BA47). Neuroimage, 31(2), 832-841. doi: 10.1016/j.neuroimage.2005.12.037
Waldrop, M. M. (1992). Complexity: The emerging science at the edge of order and chaos. New York: Simon and Schuster.
Wan, C. Y., Bazen, L., Baars, R., Libenson, A., Zipse, L., Zuk, J., . . . Schlaug, G. (2011). Auditory-motor mapping training as an intervention to facilitate speech output in non-verbal children with autism: a proof of concept study. PLoS One, 6(9), e25505. doi: 10.1371/journal.pone.0025505
Ward, W. D. (1999). Absolute pitch. In D. Deutsch (Ed.), The psychology of music (2nd ed., pp. 265-298). San Diego: Academic Press.
Wayman, J. W., Frisina, R. D., Walton, J. P., Hantz, E. C., & Crummer, G. C. (1992). Effects of musical training and absolute pitch ability on event-related activity in response to sine tones. J Acoust Soc Am, 91(6), 3527-3531.
Wechsler, D., & Hsiao-Pin, C. (2011). WASI-II: Wechsler Abbreviated Scale of Intelligence. San Antonio, TX: Pearson.
Weiss, M. W., Schellenberg, E. G., Trehub, S. E., & Dawber, E. J. (2015). Enhanced processing of vocal melodies in childhood. Dev Psychol, 51(3), 370-377. doi: 10.1037/a0038784
Weiss, M. W., Trehub, S. E., & Schellenberg, E. G. (2012). Something in the way she sings: enhanced memory for vocal melodies. Psychol Sci, 23(10), 1074-1078. doi: 10.1177/0956797612442552
Weiss, M. W., Vanzella, P., Schellenberg, E. G., & Trehub, S. E. (2015). Pianists exhibit enhanced memory for vocal melodies but not piano melodies. Q J Exp Psychol, 1-12. doi: 10.1080/17470218.2015.1020818
Wetzel, N., Widmann, A., Berti, S., & Schröger, E. (2006). The development of involuntary and voluntary attention from childhood to adulthood: a combined behavioral and event-related potential study. Clin Neurophysiol, 117(10), 2191-2203. doi: 10.1016/j.clinph.2006.06.717
Wilde, N. J., Strauss, E., & Tulsky, D. S. (2004). Memory span on the Wechsler Scales. J Clin Exp Neuropsychol, 26(4), 539-549. doi: 10.1080/13803390490496605
Wilson, S. J., Parsons, K., & Reutens, D. C. (2006). Preserved singing in aphasia: A case study of the efficacy of melodic intonation therapy. Music Percept, 24(1), 23-36.
Wong, P. C., Ciocca, V., Chan, A. H., Ha, L. Y., Tan, L. H., & Peretz, I. (2012). Effects of culture on musical pitch perception. PLoS One, 7(4), e33424. doi: 10.1371/journal.pone.0033424
Wong, P. C., & Perrachione, T. K. (2007). Learning pitch patterns in lexical identification by native English-speaking adults. Appl Psycholinguist, 28(4), 565-585.
Wong, P. C., Perrachione, T. K., & Parrish, T. B. (2007). Neural characteristics of successful and less successful speech and word learning in adults. Hum Brain Mapp, 28(10), 995-1006. doi: 10.1002/hbm.20330
Wong, P. C., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nat Neurosci, 10(4), 420-422. doi: 10.1038/nn1872
Wu, C., Kirk, I. J., Hamm, J. P., & Lim, V. K. (2008). The neural networks involved in pitch labeling of absolute pitch musicians. Neuroreport, 19(8), 851-854. doi: 10.1097/WNR.0b013e3282ff63b1
Xu, Y. (1997). Contextual tonal variations in Mandarin. J Phonetics, 25(1), 61-83. doi: 10.1006/jpho.1996.0034
Xu, Y. (1999). Effects of tone and focus on the formation and alignment of f0 contours. J Phonetics, 27(1), 55-105. doi: 10.1006/jpho.1999.0086
Xu, Y., & Wang, Q. E. (2001). Pitch targets and their realization: Evidence from Mandarin Chinese. Speech Commun, 33(4), 319-337. doi: 10.1016/S0167-6393(00)00063-7
Yip, M. (2002). Tone. Cambridge: Cambridge University Press.
Zatorre, R. J., & Baum, S. R. (2012). Musical melody and speech intonation: singing a different tune. PLoS Biol, 10(7), e1001372. doi: 10.1371/journal.pbio.1001372
Zatorre, R. J., Belin, P., & Penhune, V. B. (2002). Structure and function of auditory cortex: music and speech. Trends Cogn Sci, 6(1), 37-46.
Zatorre, R. J., Perry, D. W., Beckett, C. A., Westbury, C. F., & Evans, A. C. (1998). Functional anatomy of musical processing in listeners with absolute pitch and relative pitch. Proc Natl Acad Sci U S A, 95(6), 3172-3177.
Zee, E. (1999). Chinese (Hong Kong Cantonese). Handbook of the International Phonetic Association (pp. 58-60). Cambridge: Cambridge University Press.
Zendel, B. R., & Alain, C. (2012). Musicians experience less age-related decline in central auditory processing. Psychol Aging, 27(2), 410-417. doi: 10.1037/a0024816
Zhai, S., Kong, J., & Ren, X. (2004). Speed–accuracy tradeoff in Fitts’ law tasks—on the equivalency of actual and nominal pointing precision. Int J Hum-Comput St, 61(6), 823-856.
Zuk, J., Benjamin, C., Kenyon, A., & Gaab, N. (2014). Behavioral and neural correlates of executive functioning in musicians and non-musicians. PLoS One, 9(6), e99868. doi: 10.1371/journal.pone.0099868
Copyright Acknowledgements

Chapter 2: Article originally published as Hutka, Stefanie A. and Claude Alain, “The Effects of
Absolute Pitch and Tone Language on Pitch Processing and Encoding in Musicians,” Music
Perception, Vol. 32, No. 4 (April 2015): pp. 344-354. © 2015 by Music Perception: University of
California Press.
Chapter 3: Article originally published as Pitch expertise is not created equal: Cross-domain
effects of musicianship and tone language experience on neural and behavioural discrimination
of speech and music, Vol. 71, Hutka, S., Bidelman, G. M., & Moreno, S., Copyright (2015),
reprinted with permission from Elsevier.
Chapter 3, Figure 3: Reprinted from Pitch expertise is not created equal: Cross-domain effects of
musicianship and tone language experience on neural and behavioural discrimination of speech
and music, Vol. 71, Hutka, S., Bidelman, G. M., & Moreno, S., p. 55, Copyright (2015), with
permission from Elsevier.
Chapter 3, Figure 4: Reprinted from Pitch expertise is not created equal: Cross-domain effects of
musicianship and tone language experience on neural and behavioural discrimination of speech
and music, Vol. 71, Hutka, S., Bidelman, G. M., & Moreno, S., p. 56, Copyright (2015), with
permission from Elsevier.
Chapter 3, Figure 5: Reprinted from Pitch expertise is not created equal: Cross-domain effects of
musicianship and tone language experience on neural and behavioural discrimination of speech
and music, Vol. 71, Hutka, S., Bidelman, G. M., & Moreno, S., p. 57, Copyright (2015), with
permission from Elsevier.
Chapter 3, Figure 6: Reprinted from Pitch expertise is not created equal: Cross-domain effects of
musicianship and tone language experience on neural and behavioural discrimination of speech
and music, Vol. 71, Hutka, S., Bidelman, G. M., & Moreno, S., p. 57, Copyright (2015), with
permission from Elsevier.
Chapter 3, Figure 7: Reprinted from Pitch expertise is not created equal: Cross-domain effects of
musicianship and tone language experience on neural and behavioural discrimination of speech
and music, Vol. 71, Hutka, S., Bidelman, G. M., & Moreno, S., p. 57, Copyright (2015), with
permission from Elsevier.
Chapter 4: Based on Hutka, S., Bidelman, G. M., & Moreno, S. (2013). Brain signal variability
as a window into the bidirectionality between music and language processing: moving from a
linear to a nonlinear model. Frontiers in Psychology, 4, 984. doi: 10.3389/fpsyg.2013.00984.
This is an open-access article distributed under the terms of the Creative Commons Attribution
Licence (CC BY). © 2013 Hutka, Bidelman and Moreno.
Chapter 4, Figure 8: Reprinted from Hutka, S., Bidelman, G. M., & Moreno, S. (2013). Brain
signal variability as a window into the bidirectionality between music and language processing:
moving from a linear to a nonlinear model. Frontiers in Psychology, 4, 984. doi:
10.3389/fpsyg.2013.00984. © 2013 Hutka, Bidelman and Moreno.