janata, p., tillmann, b., & bharucha, j. j. (2002). listening to polyphonic music recruits...

7/27/2019 Janata, P., Tillmann, B., & Bharucha, J. J. (2002). Listening to Polyphonic Music Recruits Domain-general Attention

1/20

Cognitive, Affective, & Behavioral Neuroscience

2002, 2 (2), 121-140

Most natural environments contain many simultane-ously active sound sources, thereby presenting listenerswith a complex auditory scene. As with visual scenes, inwhich it is important to segregate objects from one anotherin order to be able to orient to and interact with the visual

environment, the auditory scene must also undergo ex-tensive analysis before individual sources can be identi-fied and segregated from others (Bregman, 1990). Naturalauditory scenes vary greatly in their content and complex-ity, and it is often necessary to focus attention selectively ona single sound source. The most famous example of an au-ditory scene that humans analyze is the cocktail party inwhich the voice of an individual must be separated fromthe voices of others (Cherry, 1953). Music forms thebasis for another intriguing and diverse class of auditoryscenes, which are created and appreciated in all humancultures.

Polyphonic music is a composite of multiple streams,in which the aggregate can be appreciated, yet individualcomponents can be analyzed. For example, when listeningto a blues band, it is relatively easy to focus attention se-lectively on the drums, bass, keyboards, or lead guitar.

Relevant acoustic features that facilitate stream segrega-tion are the relative pitch height and rate at which tonesare presented (Bregman & Campbell, 1971), timbre (Iver-son, 1995), rhythmic patterns (Jones, 1993), and spatial lo-cation (see Bregman, 1990, for a review). These featuresprovide a basis for selective attention in music, and theydefine auditory objects, sequences of which constitute au-ditory streams. For example, when the tones of two differ-ent melodies are interleaved temporally, a listener can dis-cern the individual melodies, provided they differ fromeach other along at least one feature dimensionfor ex-ample, pitch range, timbre, or loudness (Dowling, 1973).

The neural mechanisms underlying auditory attention toboth single and multiple streams of auditory informationhave been studied extensively, using event-related poten-tial (ERP) recordings from the human scalp (see Ntnen,1992, for a review). With few exceptions, however, audi-tory streams in most ERP and psychophysical experimentshave consisted of extremely simple stimuli, such as re-peated pure tones with occasional pitch or loudness de-viants. Experiments in which selective attention to one oftwo streams has been investigated generally have distin-guished the streams spatially by presenting a differentstimulus train to each ear (Alho, 1992; Hillyard, Hink,

Schwent, & Picton, 1973; Woldorff, Hackley, & Hillyard,1991). The neural mechanisms underlying selective atten-

121 Copyright 2002 Psychonomic Society, Inc.

This research was supported by NIH Grant P50 NS17778-18 to J.J.B.,

the National Institute of Drug Abuse, the McDonnel Foundation, andthe Dartmouth Brain Imaging Center. We thank Jeffrey L. Birk for tran-

scribing the Schubert excerpt into MIDI format for Experiment 2, Lau-ren Fontein for helping us to test subjects, Matthew Brett for providing

display routines, Souheil Inati for assistance in pulse sequence selectionand for providing file conversion routines, and Scott Grafton for helpful

comments. The manuscript benefited from the criticisms and sugges-tions of two anonymous reviewers. The data and stimuli from the ex-

periments reported in this paper are available upon request from thefMRI Data Center at Dartmouth College (http://www.fmridc.org) under

Accession Number 2-2002-112YT. Correspondence concerning this ar-ticle should be addressed to P. Janata, Department of Psychological and

Brain Sciences, 6207 Moore Hall, Dartmouth College, Hanover, NH03755 (e-mail: [email protected]).

Listening to polyphonic musicrecruits domain-general attention

and working memory circuits

PETR JANATA, BARBARA TILLMANN, and JAMSHED J. BHARUCHADartmouth College, Hanover, New Hampshire

Polyphonic music combines multiple auditory streams to create complex auditory scenes, thus pro-viding a tool for investigating the neural mechanisms that orient attention in natural auditory contexts.

Across two fMRI experiments, we varied stimuli and task demands in order to identify the cortical areasthat are activated during attentive listening to real music. In individual experiments and in a conjunctionanalysis of the two experiments, we found bilateral blood oxygen level dependent (BOLD) signal in-creases in temporal (the superior temporal gyrus), parietal (the intraparietal sulcus), and frontal (the pre-

central sulcus, the inferior frontal sulcus and gyrus, and the frontal operculum) areas during selective andglobal listening, as compared with passive rest without musical stimulation. Direct comparisons of thelistening conditions showed significant differences between attending to single timbres (instruments)

and attending across multiple instruments, although the patterns that were observed depended on the rel-ative demands of the tasks being compared. The overall pattern of BOLD signal increases indicated that

attentive listening to music recruits neural circuits underlying multiple forms of working memory, atten-tion, semantic processing, target detection, and motor imagery. Thus, attentive listening to music appearsto be enabled by areas that serve general functions, rather than by music-specific cortical modules.
http://www.fmridc.org/


2/20

122 JANATA, TILLMANN, AND BHARUCHA

tion to somewhat more complex streams consisting of re-peating phoneme and consonantvowel (CV) syllable to-kens have also been investigated (Hink, Hillyard, & Ben-son, 1978; Sams, Aulanko, Aaltonen, & Ntnen, 1990;Szymanski, Yund, & Woods, 1999a, 1999b). Selective at-tention in the presence of more than two streams has re-

ceived scant attention (Alain & Woods, 1994; Brochard,Drake, Botte, & McAdams, 1999; Woods & Alain, 2001).Functional neuroimaging (positron emission tomogra-

phy [PET] and functional magnetic resonance imaging[fMRI]) investigations of attention in auditory contextshave begun to provide converging evidence as to whichgeneral brain circuits are recruited during detection of tar-get objects within single streams with or without the pres-ence of secondary streams. In an fMRI experiment, Pughet al. (1996) presented both binaural (single stream) anddichotic (dual stream) conditions while subjects per-formed a CV or a frequency modulation (FM) sweep de-

tection task. The increased attentional demands in the di-chotic condition resulted in increased activation in theinferior parietal, inferior frontal, and auditory associationareas. In a PET/ERP experiment modeled on the typicalauditory oddball paradigm used in the ERP field, Tzourioet al., (1997) presented frequent low tones and rare hightones, under passive-listening conditions and with deviancedetection, randomly to the right or the left ear. Subjectsattended to and reported deviants presented to one of thetwo ears. Both Heschls gyrus (HG) and the planum tem-porale (PT) were active in the passive and attentive con-ditions, as compared with the rest condition, but there was

no significant activation difference in these areas betweenthe passive and the attentive conditions. However, selec-tive attention increased activation signif icantly, as com-pared with passive listening, in the supplementary motorarea (SMA), anterior cingulate, and precentral gyrus bi-laterally. These results and others (Linden et al., 1999; Za-torre, Mondor, & Evans, 1999) suggest that selective atten-tion operates on representations of auditory objects afterprocessing in the primary and secondary auditory areas iscomplete, although other fMRI and ERP data suggest thatselective attention operates on early auditory representa-tions (Jncke, Mirzazade, & Shah, 1999; Woldorff et al.,

1993; Woldorff & Hillyard, 1991). However, in the exper-iments cited above, the stimuli were acoustically simple,the contexts were unnatural, and attention was deployed inthe framework of a target detection task. Thus, the spe-cific requirements of target detection and decision makingwere combined with potentially more general attentionalprocesses.

In the music perception domain, Platel et al. (1997) per-formed a PET study of selective attention to musical fea-tures of short, simple tonal sequences. While listening tothe same body of material repeatedly, subjects identifiedthe category membership of individual sequences along

the feature dimensions of pitch, timbre, and rhythm. Thegoal of the study was to identify those brain areas thatuniquely represent or process these different feature di-mensions. By contrasting various combinations of activa-

tion mapsfor example, pitch versus timbre or rhythmversus pitch and timbre combinedseveral areas in thefrontal lobes were found to be differentially activated byattention to the different feature dimensions. In a recentPET study of attentive listening in musicians, Satoh,Takeda, Nagata, Hatazawa, and Kuzuhara (2001) found

that listening for target pitches in the alto voice of four-part harmonic progressions resulted in greater bilateral re-gional cerebral blood flow (rCBF) in superior parietal andfrontal areas, relative to listening for minor chord targetsin the same progressions. The frontal activations appear toinclude regions around the inferior frontal sulcus (IFS)and the precentral sulcus (PcS), the left-SMA/pre-SMA,and the orbitofrontal cortex.

Although the use of acoustically and contextually sim-ple stimuli is alluring from an experimental point of view,our brains have adapted to rich and complex acoustic en-vironments, and it is of interest to determine whether

stream segregation tasks using more complex stimuli ex-hibit brain response patterns similar to those using sim-pler stimuli. Therefore, just as experiments on the neuralmechanisms of visual processing have largely progressedbeyond the use of simple oriented bar stimuli to the use offaces and other complex objects, both natural and artifi-cial, presented individually (Haxby, Hoffman, & Gobbini,2000) or as natural scenes consisting of multiple objects(Coppola, White, Fitzpatrick, & Purves, 1998; Stanley, Li,& Dan, 1999; Treves, Panzeri, Rolls, Booth, & Wakeman,1999; Vinje & Gallant, 2000), we believe that functionalneuroimaging experiments using more complex acoustic

stimuli embedded in natural contexts may help identifythose brain regions that have adapted to the processing ofsuch stimuli. This type of neuroethological approach hasproven indispensable for identifying the neural substratesof behaviors in other species that depend on complex au-ditory stimulifor example, birdsong (Brenowitz, Mar-goliash, & Nordeen, 1997; Konishi, 1985).

We conducted two experiments in order to identify thebrain areas recruited during attentive listening to excerptsof real music and to identify the brain areas recruitedduring selective listening to one of several simultane-ously presented auditory streams. To this end, we varied

task demands, musical material, and trial structure acrossthe experiments. In the f irst experiment, subjects were in-structed to listen to the musical passages either globally/holistically, without focusing their attention on any par-ticular instrument, or selectively, by attentively trackingthe part played by a single instrument. Within the con-straints of the fMRI environment, we sought to imageblood oxygen level dependent (BOLD) signal changes asthe subjects engaged in relatively natural forms of musiclistening. Our objective of imaging natural attentive musiclistening posed a significant challenge, because we did notwant to contaminate normal listening processes with

additional demands of secondary tasks, such as target de-tection. Auditory attention is studied almost exclusively inthe context of target detection, in which a stream is moni-tored for the occurrence of single target events. By virtue


3/20

FUNCTIONAL IMAGING OF ATTENTIVE LISTENING TO MUSIC 123

of being able to measure the number of targets that are de-tected as a function of attentional orienting, this type ofdesign provides a powerful tool for studying attention.However, the subjective experience of listening attentivelyto music is not the same as performing a target detectiontask, and it is possible that brain dynamics vary with those

different forms of attentional processing.To further investigate this hypothesis with a second ex-periment, an excerpt of classical music was adapted foruse in a target detection task with two levels of attentionalinstructions. As in the first experiment, subjects were re-quired to orient their attention either globally or selec-tively. This time, the attentional orienting was coupled todetection of targets that could occur in any one of threestreams (global/divided-attending condition) or in thesingle attended stream (selective-attending condi tion).

Finally, we sought to identify the cortical areas in whichthe BOLD signal increased reliably during attentive music

perception across different experimental contexts. A con-junction analysis enabled us to identify the cortical areasthat were activated in both of the experiments, therebyidentifying processes common to attentive music listen-ing despite variation in the musical material and tasks.

EXPERIMENT 1

For this experiment, we chose a musical example con-sisting of two simultaneous melodies (streams). Specifi-cally, we selected excerpts from two baroque flute duets inwhich the melodic contour and rhythmic properties of the

two voices were very similar. The feature dimension ofprimary interest was timbre, which we manipulated by as-signing two perceptually very different timbres (Krum-hansl, 1989; McAdams, Winsberg, Donnadieu, Desoete,& Krimphoff, 1995) to the melodies. We focused on tim-bre because it is a potent cue for auditory stream segrega-tion and should facilitate orienting of attention to a singlemelody line and because we wanted to identify regions ofinterest for further studies of timbre processing.

MethodSubjects. Twelve subjects (7 females; mean age, 28.5 years; age

range, 2041) participated in the experiment: 8 fellows from the

2000 Summer Institute in Cognitive Neuroscience held at Dart-mouth College, 3 visitors to Dartmouth, and 1 member of the Dart-mouth community. All the subjects reported having normal hearing,and 11 of the 12 subjects were right-handed. The duration of the sub-jects formal musical training was 8.3 3.5 years, and the averageduration for which they had played one or more instruments was

17.4 6 6.9 years. All the subjects provided informed consent ac-cording to Dartmouth human subjects committee guidelines.

Stimuli and Procedure . Melodies were excerpted from the Vi-vace and Giga movements of the Sonata in F Major for two recordersby J. B. Loeillet (1712, op. 1, no. 4), thus providing two sets of twomelodies (streams) each (Figure 1A). All of the individual melodies

were stored as separate MIDI tracks, using MIDI sequencing soft-ware (Performer 6.03, Mark of the Unicorn). When stored as MIDIsequences, every note in all of the melodies could be equated in in-

tensity by assigning the same velocity value. Each melody could beassigned to an arbitrary timbre. We used the vibes and stringstimbres generated by an FM tone generator (TX802, Yamaha). These

two timbres occupy opposite corners of three-dimensional (3-D) tim-

bral space, as defined by multidimensional scaling solutions of per-ceptual similarity judgments (Krumhansl, 1989; McAdams et al.,1995). The most salient difference between the two timbres is the at-

tack time.The assignment of timbres to melodies was completely counter-

balanced so that attention was directed to each of the four melodies

played by each of the two timbres over the course of the experiment(Figure 1B). Every subject was presented with a complete set ofmelody/timbre combinations across the two blocks. The ordering ofblocks was counterbalanced across subjects. The counterbalanceddesign allowed us to compare directly the activations caused by se-lective attention to the two different timbres. Stimulus files corre-

sponding to each of the different orders were arranged in a single trackof an audio file (SoundEdit16, Macromedia). Scanner trigger andtiming pulses were arranged on a second track, and both tracks were

written to CD.The stimuli were delivered via ear-insert tubephones (ER-30,

Etymotic Research). In order to obtain better acoustic separation of

the music from the echo-planar imaging (EPI) pinging, the subjectwore ear muffs (HB-1000, Elvex), through which the tubephones

had been inserted. Prior to amplification, the audio signal from theCD was filtered with a 31-band 1/3 octave band equalizer (Model

351, Applied Research Technologies) to compensate for the atten-uation by the tubephones of frequencies above 1.5 kHz. The stim-uli were presented at ,102 dB SPL.

Each functional EPI run began with 30 sec of rest in order to adaptthe subject to the EPI sound and to allow the global signal intensityin the EPI images to stabilize. Eight seconds prior to the onset of eachtask epoch, a verbal cue indicated the task: listen, attend vibes, or at-tend strings. Task epochs were 30 sec long and were immediately fol-lowed by a 30-sec rest period (rest condition) prior to the next verbalcue. We began and ended each block with epochs in which the subjectswere asked to listen to the melodies in a holistic, integrated manner,rather than attempting to focus their attention on one timbre or the

other (listen condition). The subjects were instructed to focus their at-tention as best they could on the cued instrument during the attendepochs. Following the experiment, the subjects completed a question-naire and an interview about their musical training, task difficulty rat-ings, and the strategies they had used in performing the task.

Data acquisition . Data were acquired on a General Electric SignaHorizon Echospeed MRI scanner (1.5 T), fitted with a GE birdcage

head coil. Functional images were collected with a gradient echo EPIpulse sequence with the following parameters: TR 5 2 sec; TE 535 msec; field of view (FOV)5 2403 240 mm; flip angle (a)5 90;matrix size 5 643 64; resolution 5 3.75 3 3.75 3 5.0 mm; inter-slice spacing 5 0 mm. Twenty-seven axial slices were collected, pro-viding whole-brain coverage. Two sets of high-resolution T1-weightedanatomical images were also obtained. The first was a set of 27 slices

taken in the same planes as the functional images (coplanar) and ob-tained with a two-dimensional fast spin echo sequence with the fol-lowing parameters: TR5 650 msec; TE5 6.6 msec; FOV5 2403240 mm;a5 90; matrix size 5 256 3 256; resolution 5 0.937 30.937 3 5.0 mm; interslice spacing 5 0 mm. The second set con-sisted of 124 sagittal slices and was obtained with a 3-D SPGR se-

quence with the following parameters: TR 5 25 msec; TE 56.0 msec; FOV5 2403 240 mm;a5 25; matrix size 5 2563 192;resolution 5 1.23 0.937 3 0.937 mm.

Data processing. EPI volumes from the initial 30-sec adaptationphase of each run, during which the subjects were waiting for thetask to begin, were discarded prior to the analysis. SPM99 was used

for image preprocessing and statistical analyses (Friston et al.,1995). Unless otherwise specified, all algorithms ran with defaultsettings in the SPM99 distribution (http://www.fil.ion.ucl.ac.

uk/spm). In order to estimate and correct for subject motion, re-alignment parameters relative to the first image of the first run werecomputed for each of the functional runs. A mean functional image
http://www.fil.ion.ucl.ac.uk/spmhttp://www.fil.ion.ucl.ac.uk/spm


4/20


was constructed and used to coregister the functional images with

the coplanar anatomical images. The mutual information algorithmimplemented in SPM99 was used for coregistration (Maes, Col-

lignon, Vandermeulen, Marchal, & Suetens, 1997). The coplanarimages were then coregistered with the 3-D high-resolution images.These, in turn, were normalized to the Montreal Neurological Insti-tutes T1-template images that approximate Talairach space, and thenormalization parameters were applied to the functional images. De-fault parameters were used for the normalization, with the exceptionthat normalized images retained the original voxel size of 3.75 33.75 3 5 mm. Normalized images were smoothed with a 6 3 6 38 mm (FWHM) Gaussian smoothing kernel. Both individual and

group statistical parametric maps (SPMs) were constructed fromthese images.

Regression coefficients were estimated for rest, verbal cue, listen,

attend vibes, and attend strings conditions. The verbal cue onset eventswere modeled as consisting of an early and a late component, using amodified version of the basis function routine in SPM99. Gammafunctions for the early and the late components were specified by pass-

ing shape parameter values of 2 and 3 to the gamma probability den-

sity function specified in SPM99, and these were then orthogonalizedby the same procedure as the default set of SPM99 gamma functions.

Similarly, the onset of each attentional condition was modeled withthese gamma functions in order to emphasize the initial period of at-tentional orienting at the onset of the musical excerpt. The regressorsfor the rest, listen, and attend conditions were modeled as a boxcarwaveform convolved with the SPM99 canonical hemodynamic re-sponse function (HRF). Regressors for a linear trend and estimatedmotion parameters were also included in the model for each subject.

In order to identify general task-related BOLD signal increasesand for purposes of comparison with Experiment 2, SPMs were cre-

ated for the following contrasts: listen minus rest, and attend(pooledacross timbres) minus rest. These contrasts identify general regionsthat respond to the musical stimulus under the different attentional de-

mands. Because the attend conditions were also separated by timbre,we compared activation maps elicited by attending to different tim-bres, as compared with the rest condition. In addition, the focused-attending (attend) condition was compared directly with the global-

1 2 3 4 5

=120

6 7 8 9

10 11 12 13 14

15 16 17 18 19

1 2 3 4

= 105

5 6 7 8

9 10 11 12

13 14

Melody 2A

Melody 2B

Melody 1A

Excerpt 2 Excerpt 2 Excerpt 1Excerpt 1 Excerpt 1Excerpt 2

Melody 1B

Melody A VibesVibes

Vibes

VibesStrings

StringsStringsStringsMelody B

Listen Attend Attend Attend Attend ListenTask

Listen Rest Rest Rest Rest RestAttend Attend Attend Attend ListenTask

Excerpt 2: GigaExcerpt 1: Vivace

Vibes

Strings Vibes

Strings

Excerpt 2 Excerpt 2 Excerpt 1Excerpt 1 Excerpt 1Excerpt 2

Melody A

VibesVibes

Vibes

Vibes

Strings

Strings

StringsStrings

Melody B

Strings

Vibes Strings

Vibes

Run 1

Run 2

A

B

Rest Rest Rest Rest Rest

None None None None None

None None None None None

Stimulus

Stimulus

Figure 1. Design of Experiment 1. (A) The excerpts from two baroque flute duets used in the experiment. Each melody could bemapped to an arbitrary synthesized timbre, in this case vibes and strings timbres. (B) Counterbalancing scheme used in the ex-

periment. In each run, task blocks alternated with rest blocks. The subjects heard only echo-planar imaging ping ing during the restblocks. Streams A and B refer to the melodic lines shown abovefor example, Melody 1A and M elody 1B, respectively. The diagram

indicates which excerpt was played and which timbre was mapped to each stream (melodic line) in each task block. For task blocks inwhich the subjects focused their attention selectively on a single stream, the attended stream (timbre) is highlighted with a box.


5/20


listening (listen) condition to determine whether the different modesof attending to the music elicited different patterns of BOLD signals.

The contrast maps from each subject were then entered into arandom-effects analysis, and SPMs for the group data were created.

SPMs were thresholded atp, .01 (uncorrected). Activations signif-icant at the cluster level (extent threshold: clusters larger than the es-timated resolution element size of 19 voxels) were identified by

comparing SPMs projected onto the mean T1-weighted structuralimage (averaged across the subjects in the study) with the atlas ofDuvernoy (Duvernoy, 1999).

Results and DiscussionAll the subjects reported that they performed the task as

instructed (Table 1). Eleven of the 12 subjects judged thetwo timbres to be equally loud. However, the ratings of howeasy it was to attend to each timbre were more variable. Onaverage, each timbre could be attended to with relativeease, although the within-timbre ratings differed widelyacross subjects for both timbres. Some subjects had an eas-

ier time attending to vibes than to strings, whereas for othersubjects the opposite was true. The difficulty ratings forthe two timbres were within one point of each other for 5of the subjects.

Significant BOLD signal increases were observed inthree principal areas in the listenrest contrast (Table 2,Figure 2A). These included, bilaterally, the superior tem-poral gyrus (STG) spanning from the planum polare (PP)to the PT, thus encompassing the regions surrounding andincluding portions of HG. The SMA/pre-SMA showedincreased BOLD signals bilaterally. The third area of sig-nificantly increased BOLD signal was the right PcS.

When the subjects were instructed to attend selectively toone instrument or the other, the pattern of BOLD signal in-creases was largely the same as when they were instructedto listen more holistically. However, increased BOLD sig-nals during selective listening was noted along the left in-traparietal sulcus (IPS) and the supramarginal gyrus (Fig-ure 2A, 140 mm slice). Frontal BOLD signal increaseswere observed bilaterally in a swath stretching from the dor-sal aspect of the inferior frontal gyrus (IFG), along the IFSto the inferior PcS (Figure2A, slices120 to160). The pre-motor areas on the right were activated during both holisticlistening and selective attending, whereas the frontal and

parietal regions in the left hemisphere appeared to be mostaffected by the selective-listening task. Figure 2B showsthat a direct statistical comparison of the attend and the lis-ten conditions largely conformed to these observations(Table 2). In addition, selective attending resulted in signif-icantly greater BOLD signals in the posterior part of the leftSTG and the underlying superior temporal sulcus (STS). In

the opposite contrast, the listen condition exhibited clustersof significantly greater BOLD signals than did the selectiveattend condition bilaterally along the anterior calcarine sul-cus and in the rostral fusiform gyrus.

Because the stimulus materials were completely coun-terbalanced, we could explore whether there were any dif-

ferences in the responses to the two timbres. We observedno striking differences in the group images as a functionof the timbre that was attended (note the overlap of con-tours in Figure 2A). The direct comparison of attentivelylistening to each timbre yielded very small foci, althoughneither did these fall in the auditory cortex or surround-ing association areas nor were they closely apposed, asmight be expected from activation of an area possessinga timbre map (data not shown). Thus, we did not pursuethe issue further.

EXPERIMENT 2

Experiment 2 was performed in order to obtain objectivebehavioral verification, in the context of a target detectiontask, that subjects had oriented their attention as instructedduring the fMRI scans. In addition, we used an excerptfrom a Schubert trio to assess whether the activation pat-terns elicited with the stimulus materials of Experiment 1could be elicited with a musical excerpt consisting of threestreams. In contrast to the first experiment, Experiment 2used a timbral deviance detection task to assess the degreeto which the subjects were able to focus on a single instru-ment or divide their attention across instruments. Because

we needed to obtain behavioral responses during divided(global/ holistic) and selective attention conditions, ourability to use the musical excerpt as a control stimulus in apassive holistic-listening condition (as in Experiment1) wascompromised. To take the place of the passive-listeningcondition, we added a scrambledcondition in which thesubjects listened to a nonmusical stimulus that was derivedfrom the musical excerpt by filtering white noise with theaverage spectral characteristics in successive 1.5-sec win-dows of the original Schubert trio excerpt. Thus, very coarsespectrotemporal features were preserved, but stream andrhythmic cues were removed. Owing to the change in task

structure, the experimental protocol was modified from onein which 30-sec task epochs were alternated with 30-secrest epochs to one in which 15-sec trials were interleavedwith 8-sec rest periods.

MethodSubjects. Fifteen subjects (8 females; mean age, 18.73 years;

range, 1822) participated in behavioral pretesting of the stimulus ma-terial. Of these subjects, 5 (4 females; age range, 1819 years) partic-

ipated in the fMRI scanning sessions. Three additional subjects (2 fe-males; age range, 1920 years) participated in the fMRI scanningsession following a brief training session on the task. The average

amount of formal musical training in the behavioral pretest cohort was

4.4 3.76 years, and for the fMRI cohort it was 6.13 5.79 years. Allthe subjects had normal hearing.

Stimuli. The musical passage is shown in Figure 3. The instru-ments parts were transcribed into MIDI sequences to allow the in-troduction of timbral deviants at various points in the excerpt. The

Table 1

Average Ratings of Task Difficulty in Experiment 1

Rating Range

Relative loudness 3.96 0.3 34Attend to strings 2.736 1.5 15

Attend to vibes 2.56 1.2 15

NoteFor relative loudness, 15 strings, 75 vibes; for the attend con-ditions, 1 5 easy, 75 difficult.


6/20


Table2

AreasofActivationinExperiment1

ListenRest(Holistic)

Attend

Rest(Selective)

Listen

Attend

Attend

Listen

Lobe

Hemispher

e

Region

x

y

z

ZScore

x

y

z

ZS

core

x

y

z

Z

Score

x

y

z

Z

Score

Temporal

left

STG

256

624

25

4.0

4

260

21

5

25

4.6

4

260

215

45

5

.30

252

22

2

25

4.2

6

252

226

45

4

.49

268

230

45

4

.00

STS/STG

264

241

10

4.0

5

rostralFG

234

241

210

3.4

6

right

STG

260

230

40

4

.70

256

21

1

25

4.6

9

260

21

9

25

4.7

4

252

22

2

25

4.7

8

252

222

45

4

.83

rostralFG

226

256

215

3.3

9

collateralsulcus

230

252

425

3.1

6

Frontal

left

SMA

428

254

55

3.3

0

PcG

252

250

40

4.0

6

inferiorPcS

256

238

25

4

.22

252

254

20

3.1

5

IFS/IFG/MFG

245

222

30

4

.50

right

SMA

240

23

8

60

4.0

0

230

234

60

4

.05

rostralIFS

234

238

15

3.8

6

caudalIFS

245

211

30

4

.07

238

215

25

4.2

6

PcS

245

23

0

45

4.2

1

241

420

40

4

.91

superiorPcS

230

428

50

3

.67

PcG

241

628

60

3.4

4

MFG

245

23

4

55

3.5

3

Parietal

left

IPS

234

252

45

3

.20

supramarginalgyrus

249

245

40

3

.02

256

234

40

3.5

7

256

226

25

3.9

6

Occipital

left

anteriorcalcarinesulcus

428

260

425

4.2

1

right


211

249

425

3.5

0

Other

left

caudate

219

424

20

3

.80

right

GP/putamen

215

424

25

3

.66

anteriorthalamus

211

428

10

3

.25

cerebellum

2

19

275245

3.7

1

Note

STG,superiortemporalgyrus;STS,superiortemporalsulcus;FG,

fusiformgyrus;SMA,supplem

entarymotorarea;PcG,precentralgyru

s;PcS,precentralsul-

cus;IFS,

inferiorfrontalsulcus,IFG,

inferiorfrontalgyrus;MF

G,m

iddlefrontalgyrus;IPS,intraparietalsulcus;GP,globuspallidus.


7/20


note-on and note-off velocities were set to the same value within eachpart, and the relative note velocities between instruments were ad-justed to achieve comparable salience of the individual parts. Thestandard piano notes were rendered by a Korg SG-1D sampling grand

piano. Timbres for all violin and cello notes and for piano deviantswere rendered by an FM tone generator (TX802, Yamaha).

In addition to the original standard sequence, 4 sequences con-

taining deviants were constructed for each of the three instruments,resulting in a total of 12 deviant sequences. For any given deviantsequence, there was a deviant in a single instrument and one of fourlocations. The locations were distributed across early and late por-tions of the excerpt. In the case of the violin and cello, the deviantnotes encompassed one measurethat is, a window of 1,000 msec

whereas for the piano, the deviant notes took up two thirds of a mea-sure (666 msec). The deviant notes were played with a slightly dif-

ferent timbrefor example, one of the other violin sound patchesavailable on the Yamaha synthesizer. The standard and deviant se-quences were recorded with MIDI and audio sequencing software(Performer 6.03, MOTU). Each sequence was 15.5 sec long. The

same set of sequences was used for the selective and the divided at-tention conditions described below.

The time courses of root-mean square (RMS) power were com-

pared for the standard and the deviant sequences. Initially, the notevelocities of the deviants were adjusted until the RMS curves over-lapped for the standard and the deviant sequences. Unfortunately,deviant notes adjusted by this procedure were very difficult to de-tect and often sounded quieter than the surrounding standard notes.We therefore adjusted the velocities of deviants until detection per-

formance in subsequent groups of pilot subjects rose above chancelevels. This adjustment resulted in a combination of timbral and

Figure 2. Group (N5 12) images showing signif icant blood oxygen level dependent (BOLD) signal increases (p , .01) for two sets

of contrasts from Experiment 1, superimposed on the average T1-weighted anatomical image of the subject cohort. (A) Contrast ofgloba l listening (green blobs) and selective listening to each of the timbres (red and blue contours) with rest. (B) The direct contrast

of selective listening with global listening. Brain areas with signif icantly stronger BOLD signals during selective listening, as comparedwith global listening, are shown in a red gradient, whereas the opposite relationship is shown in a blue gradient. The white contour

line denotes the inclusion mask that shows the edges of the volume that contained data from all participants. Susceptibility artifactsignal dropout in orbitofrontal and inferotemporal regions is clearly seen in the top row of images.


8/20


loudness deviance. We must emphasize that the multidimensionalaspect of the deviants is tangential to the primary goal of this ex-

periment. In other words, we employed deviants in this task not be-cause we were trying to investigate target detection along specificfeature dimensions, but rather because we wanted objective verifi-

cation that the subjects were attending to the excerpts and wewanted to create two different types of this attention with thistaskdivided and selective.

In addition to the musical excerpts, we synthesized a control ex-cerpt (scrambled) designed to match the gross spectral characteris-tics of the original excerpt in 1.5-sec time windows. This was ac-complished as follows. First, the amplitude spectra of the standard

sequences were created for 64 consecutive short segments of 210

samples each (23.22-msec windows). These were then averaged toyield the average spectrum of a longer segment (216 samples,

1,486.08-msec windows). Next, the spectra of corresponding shortsegments of white noise were multiplied by the average spectrumof the longer window and were converted into the time domain.This process was repeated across the total duration of the originalexcerpt. In order to eliminate clicks owing to transients betweensuccessive noise segments, the beginning and end of each noise seg-ment were multiplied by linear ramps (3.63 msec). At shorter win-dow durations of average spectrum estimation, the filtered noisestimulus tended to retain the rhythmic qualities of the original ex-

cerpt and induced a weak sense of stream segregation between a lowand a high register in the noise. Since we wanted to minimize re-cruitment of musical processing or attention to one of multiplestreams in this control stimulus, we chose the 1.5-sec windows. Wedid not use envelope-matched white noise (Zatorre, Evans, & Meyer,1994), since that type of stimulus retained the rhythmic propertiesof the excerpt.

Procedure. The subjects were scanned during three consecutive10.5-min runs. Each run consisted of divided and selective attention

trials. During a divided attention trial, the subject heard the verbalcue, everything, indicating that the deviance, if it occurred, couldoccur in any of the three instruments. During the selective attention

trials, the subjects heard the name of one of the three instruments.They were informed that if a deviant was to occur, it would occuronly in the cued instrument. Deviants never occurred in the uncued

instruments. The subjects were instructed to press a left button afterthe excerpt finished playing if they detected a deviant and a rightbutton if they detected none. Since buttonpresses immediately fol-lowing targets would have contaminated activations arising from

the process of attentive listening with motor response activations,which would have introduced a requirement for additional controlconditions, the subjects were instructed to withhold their buttonpress

until the excerpt had finished playing. The attentional trials (dividedand selective) were interspersed with the scrambled excerpts, towhich the subjects listened passively, pressing the right button afterthe excerpt finished playing. Verbal cues were given 6 sec prior to theonset of the excerpt. The pause between the end of the excerpt andthe following verbal cue was 8.5 sec. Across the three runs, 12 devi-ant and 9 standard sequences were presented in each of the atten-tional conditions. Thus, the probability of hearing a sequence with adeviant was .57. Overall, 12 scrambled excerpts were presented. Note

that in the behavioral pretest, the probability of deviants was .50, andthere were no scrambled trials. The greater proportion of deviants inthe fMRI experiment arose from reducing the overall length of func-tional scans by eliminating 3 standard sequences during each run.

Stimuli were pseudorandomized so that the number and timbralidentity of deviants would be balanced across runs. Within each run,the ordering of stimuli was randomized. Event timing and stimulus

Piano

Cello

Violin

Figure 3. The Experiment 2 stimulus in musical notation. The arrows mark the starting and stopping points of the 15-sec ex-cerpt used as the stimulus. From F. Schubert, Trios fr Klavier, Violine, und Violincello, op. 100, D 929, HN 193, p. 96, mea-

sures 137 through 156, Copyright 1973 by G. Henle Verlag, Munich.


9/20


presentation were controlled by PsyScope (Cohen, MacWhinney,Flatt, & Provost, 1993). Stimuli were presented from the left audiochannel, which was subsequently split for stereo presentation.

Data acquisition . The equipment and protocols described in Ex-

periment 1 were used for data acquisition and stimulus presentation.A magnet-triggering pulse at the start of each run and event mark-ers at the onsets of verbal cues and music excerpts (sent from the right

audio channel of the computer running PsyScope), together with theoutput of the response button interface box (MRA Inc.), were mixed(MX-8SR, Kawai) and were recorded to an audio file by SoundEdit16 running on an independent iMac computer (Apple Computer).

Data processing. The initial nine EPI volumes, during which thesubjects performed no task, were discarded prior to any data analy-

sis. Procedures for image coregistration, spatial normalization, andsmoothing were the same as those in Experiment 1. A regressionmodel was specified in which experimental conditions were mod-

eled as either epochs or events. Conditions specified as epochs werethe rest condition, during which no stimulus was presented, the acous-tic control sequence (scrambled), and a total of eight music conditions

reflecting the divided and the selective attention conditions and eachof the four deviance types (cello, piano, violin, and none) that were

present in each of the attention conditions. Verbal cues and responseswere modeled as events. The onset times of verbal cues, stimulus ex-

cerpts, and buttonpresses, relative to the start of data acquisition oneach run, were extracted from the audio files recorded during eachrun, using custom-written event extraction scripts in Matlab (Math-

works). Epochs were modeled as a 15.5-sec boxcar convolved withthe canonical HRF in the SPM99 toolbox. In order to emphasize pro-cesses related to auditory stream selection at the beginning of a mu-sical excerpt, the onsets of each epoch in both the divided and the se-lective attention conditions were modeled as events. The verbal cueand attentional onset events were modeled as consisting of an earlyand a late component, as in Experiment 1. Response events weremodeled as an impulse convolved with the canonical HRF. The mo-tion parameters estimated during EPI realignment and a linear trend

were added to the model to remove additional variance.In order to facilitate comparison with the data from Experiment 1,

the beta parameter estimates from the selective attention condition

regressors were combined and compared with the beta parameterestimate for the rest condition. Similarly, the beta parameter estimatesfrom the divided attention condition regressors, which presumablypromoted active holistic listening, were combined and contrasted

with those for the rest condition. The response to the acoustic con-trol stimulus (scrambled) condition was also contrasted with the restcondition. Finally, the selective and divided attention conditions

were compared directly by calculating the difference between theirrespective beta parameter estimates. Group effects were estimatedfrom the contrast maps for individual subjects, thresholded atp, .01(magnitude) and one resolution element (24 voxels for the contrasts

with the the rest condition and 17 voxels for the direct comparisonof attention conditions) and were projected onto the mean T1-weighted anatomical image. Anatomical regions of significant BOLDsignal changes were identified as in Experiment 1.

Results and Discussion

Behavior. The best behavioral evidence in support ofdissociable neural mechanisms underlying selective anddivided attention in real musical contexts would be a costin detection accuracy during divided attention relative toselective attention. The ideal distribution of responses inthe target detection framework would be one in which tar-

gets occurring during divided attention would be missedthat is, detected at chance levelsand all targets in the se-lectively attended stream would be detected correctly.False alarms would also be lower under selective attentionconditions.

In the first set of calibration experiments, in which reg-ular headphones were used, the subjects (n 5 6) detecteddeviants under the divided attention condition at chancelevel (49%). In the selective attention condition, detectionof deviants was above chance (60%) and significantlyhigher than in the divided attention condition (paired ttest,

t52

4,p,

.01).The adjustment of target intensity levels proved moredifficult when fMRI tubephones were used. With a groupof 9 subjects, 71% of the divided attention targets weredetected on average, as compared with 81% of the selec-tive attention targets, demonstrating that targets could bereliably detected even under less than optimal listeningconditions. However, the difference between attentionalconditions only bordered on statistical significance (p5

.084). In practice, we found that we were unable to adjustthe amplitudes of the individual timbral deviants so that wecould observe a strong and reliable dissociation between

forms of attention in the detection of targets across sub-jects. The subjects showed different propensities in detect-ing deviants in the different instruments. Some detectedthe piano deviants better than the cello deviants, and forothers the opposite was true. Similarly, the accuracy withwhich the subjects detected deviants at each of the four po-sitions within the passage varied from subject to subject.The length of the passage (15sec), the overall length of thefMRI experiment (1.5 h), and our dependence on pre-recorded material prevented us from engaging in an adap-tive psychophysical procedure to determine each individ-uals threshold for each timbral deviant in the context of a

musical piece.Five subjects whose behavioral performance was better

for the selective than for the divided attention conditionduring the calibration experiments participated in thefMRI scanning session, as did 3 additional subjects whoreceived instruction and 12 training trials immediately pre-ceding the fMRI session. Detection of deviants was sig-nificantly above chance levels in both the selective (71%7% hits [mean SEM]; t5 2.0,p, .05) and the divided(73% 6%; t5 2.7,p , .02) conditions. These hit rateswere not significantly different from each other (pairedttest, t, 1 ). In addition, false alarm rates were low in both

the selective and the divided attention conditions (11% and8%, respectively).

Table 3 summarizes the subjects ratings of task diffi-culty. Ratings were made on an integer scale from 1 (veryeasy) to 7 (very difficult). The subjects found attending tothe violin part and detecting violin deviants relatively moredifficult, as compared with the other instruments, duringfMRI scanning. The subjects reported that the EPI ping-ing interfered with their ability to detect deviants in theviolin stream, because the two fell into the same register.We were aware of the interaction between the magnet andthe violin streams at the outset but were not concerned by

it because the de facto task of separating the violin fromthe magnet was consistent with our aim of investigatingprocesses underlying stream segregation in natural con-texts. In general, both the listen condition of Experiment 1and the divided attention condition in the present experi-


10/20


ment may involve a baseline level of selective attention as-sociated with segregating the music from the EPI pinging.The overall difficulty of the task was rated as moderatelydifficult both outside and inside the magnet, as was the dif-ficulty of detecting deviants when the instruction was toattend to all the instruments. Ratings of how often (1, often;

7, not often) the subjects found themselves ignoring themusic altogether indicated that the subjects almost alwaysattended to the music. These self-reports are in agreementwith the observed performance in the detection task.

Physiology. In contrast to Experiment1, the attentionalload in this experiment was high whenever the subjectsheard the musical excerpt. Not surprisingly, the patterns ofincreased BOLD signals were very similar in the dividedand the selective attention conditions, as compared withthe REST condition (Table4, Figure4A). This observationis consistent with the absence of behavioral differentiationof the two conditions. Despite the similar behavioral per-

formance and largely overlapping activation pattern, a di-rect comparison of the BOLD signal elicited under the dif-ferent attention conditions revealed significant differencesin the activation patterns (Table 4, Figure 4B). In the com-parisons with the rest condition, both conditions recruited,bilaterally, regions of the STG surrounding and includingthe HG, the pre-SMA, and the frontal operculum. Theright-hemisphere IFG activation extended more laterallythan on the left. On the right, an area around the intersec-tion of the superior precentral and superior frontal sulciand the posterior part of the middle frontal gyrus (MFG)was activated in both conditions, although in the selective

attention condition, the size of the cluster at this locationwas slightly smaller than the extent threshold. The rightIPS and the angular gyrus were activated in both atten-tional conditions, whereas the left IPS was activated pri-marily in the divided attention condition.

The divided attention condition was associated withtwo additional activation foci not observed in the previousexperiment. One was observed bilaterally in the ventro-rostral part of the MFG, although the cluster on the rightwas slightly smaller than the extent threshold used in com-puting the contrasts. The other was activation of the ante-rior cingulate gyrus just ventrolateral to the large pre-

SMA activation cluster.The direct comparison of the selective and the divided

attention conditions further highlighted the activation dif-ferences between the two conditions (Table 4, Figure 4B).In particular, during divided attention, the BOLD signal

was stronger in bilateral parietal, right superior frontal,and left anterior cingulate areas. In contrast, the selectiveattention condition was associated with a stronger BOLDsignal in the left fusiform gyrus and in a number of oc-cipital areas bilaterally, including the lingual gyrus, thelateral occipital sulcus, and the anterior calcarine sulcus.

Note that none of the latter areas showed significant ac-tivity for the selective attention condition, relative to therest condition.

When BOLD signal increases were determined for thescrambled condition, relative to the rest condition (Fig-ure 4A), only two areas exceeded the height and extentthresholds: the right STG in the vicinity of Heschls sulcus(x, y, z 5 60, 215, 10), and the left postcentral gyrus(x, y, z5252,219, 50). The only task during the acousticcontrol sequences was to listen to the sound and make a re-sponse with the right button when the sound terminated.

CONJUNCTION ANALYSISOF EXPERIMENTS 1 AND 2

Identification of Common Patterns Despite

Variations in Tasks and StimuliDespite the variability in stimulus materials and tasks,

the two experiments making up this study had two condi-tions in common. In one, the subjects attempted to focustheir attention selectively on one of either two or threedifferent instruments that played concurrently (attendcondition in Experiment 1, selective condition in Exper-iment 2), and in the other, they rested while listening to

the EPI pinging (rest condition in both experiments).Other conditions that were somewhat related across bothexperiments were those in which the subjects listened tothe music in a more global, integrative, holistic fashion(listen condition in Experiment 1, divided condition inExperiment 2). However, the latter conditions were not aswell equated, owing to the increased task demands in thesecond experiment during the divided attention condi-tion. Given these similarities across experiments, we wereinterested in identifying the set of cortical areas activatedby similar conditions in both experiments.

Problems of Specifying Baseline Conditionsand Control Tasks in the Functional

Neuroimaging of AuditionAdditional motivation for a conjunction analysis stems

from the observation that specification of baseline tasks

Table 3

Average Ratings of Task Difficulty

Overall Attend Detect Detect in

Difficulty Violin Cello Piano Violin Cello Piano Divided Ignore

First pretest 5.86 1.2 4.06 2.6 3.56 1.2 5.76 1.0 4.86 2.4 3.86 1.5 5.36 1.2 6.26 1.0 5.06 1.6Second pretest 4.76 1.1 3.66 1.6 3.76 1.9 3.16 1.5 3.76 1.7 4.36 1.5 3.06 1.3 5.16 1.5 4.96 2.2

fMRI 4.46 1.2 4.56 1.4 2.66 1.1 3.06 1.3 4.66 1.8 2.86 1.3 3.26 1.7 4.06 1.3 5.96 1.4

NoteRatings were made on an integer scale from 1 (easy) to 7 (difficult). Attend columns indicate how difficult it was to main-tain attention to each stream, whereas Detect columns indicate how difficult it was to detect deviants within each stream. Detect

in Divided refers to difficulty of detecting targets in the divided attention condition. Ignore indicates how often the subjects foundthemselves ignoring the music altogether (1, often; 7, not often).


11/20


Table4

AreasofactivationinExperiment2

Divided

Rest

SelectiveRes

t

DividedSelective

SelectiveDivided

Lobe

Hemisphe

re

Region

x

y

z

Z

Score

x

y

z

Z

Score

x

y

z

Z

Score

x

y

z

Z

Score

Temporal

left

STG

252219

5

4.1

6

256

0

0

3.2

9

256238

15

3.1

3

256

222

10

4.3

8

260

238

15

3.2

6

FG

226

268

210

3.1

right

STG

56211

5

3.7

9

56

11

10

3.7

8

56222

10

4.8

6

52

222

10

4.1

1

68226

10

3.4

Frontal

bilateral

pre-S

MA

0

4

45

4.0

7

0

19

45

4.7

6

0

19

45

4.7

2

0

34

40

3.9

9

left

pre-S

MA

24

8

50

3.3

4

frontaloperculum

238

19

5

4.2

1

238

19

5

3.8

9

230

22

0

3.7

8

rostralMFG

238

56

10

3.5

8

230

49

5

3.5

6

CS

234226

60

3.4

9

right

frontaloperculum

38

19

0

3.5

7

45

15

5

3.9

1

34

22

10

3.4

2

SFS

26

19

50

3.2

0

superiorPcS/SFS/MFG

302

4

55

4.8

30

24

55

3.9

2

MFG

34

4

65

2.8

6

Parietal

left

PoCG

256

222

50

3.0

9

IPS

241249

50

3.9

3

234252

40

3.4

1

230264

40

3.6

5

right

IPS

38249

50

3.4

7

IPS/angulargyrus

38256

60

4.0

8

41

256

55

3.8

7

34260

50

3.2

0

Limbic

bilateral

anteriorcingulatesulcus

28

30

30

4.0

4

posteriorcingulategyrus

0230

30

3.0

9

left

anteriorcingulate

28

34

25

3.2

8

Occipital

left

lingualgyrus

215

275

25

3.4

2

right

lingualgyrus

15

282

210

3.7

8

MOG/lateraloccipitalsulcus

41

279

25

3.6

9


15

260

5

3.6

3

Other

left

caudate

215

15

0

3.1

2

anteriorthalamus

222

4

10

3.4

3

cerebellum

211275245

3.2

4

222

268

230

3.1

7

right

caudate

15

4

10

3.7

19

4

15

3.6

2

thalamus

112

8

10

4.6

5

11215

5

3.6

2

Note

STG,superior

temporalgyrus;FG,

fusiformgyrus;SM

A,supplementarymotorarea;MFG,mi

ddlefrontalgyrus;CS,centralsulcus;SF

S,superiorfrontalsul-

cus;PcS,precentralsulcus;PoCG,postcentralgyrus;IPS,intraparietalsulcus;MOG,m

iddleoccipitalgyrus.


12/20


causes problems for PET/fMRI studies that try to disso-ciate levels of processing in audition, particularly as moreacoustically complex and structured stimuli are employed.The crux of the problem is twofold: matching the acousti-cal complexity of stimuli across conditions while inferring/confirming the mental state of subjects across conditionsin which they hear the same stimulus. Seemingly, the sim-plest solution to this is to present the identical stimulus inthe different conditions and independently vary the task re-quirementsfor example, passive listening versus atten-

tive listening. This type of design has been used to describemnemonic processing in the context of melodic perception(Zatorre et al., 1994), phonetic and pitch judgments aboutspeech syllables (Zatorre, Evans, Meyer, & Gjedde, 1992),

and auditory selective attention to tones and syllables(Benedict et al., 1998; Jncke et al., 1999; Tzourio et al.,1997). One argument against usingpassive listening as acontrol task is that the mental activity associated with it isnot well defined and may vary among subjects, particu-larly as stimulus complexity and naturalness increase. Forexample, one subject may ignore melodies, speech sounds,or tones and think, instead, about the time he or she is wast-ing in the scanner, whereas other subjects may attempt toshadow (sing/speak along with) the acoustic material.

Despite these problemswhich are likely to increase theamount of variance in the data that is unaccounted forpassive-listening conditions have traditionally served asan important intermediate state between resting in the ab-

Figure 4. Group (N5 8) images showing significant blood oxygen level dependent (BOLD) signal increases ( p , .01) for two sets of

contrasts from Experiment 2, superimposed on the average T1-weighted anatomical image of the subject cohort. (A) Contrast of di-vided attention (global) listening and rest is depicted with a red gradient. The contrasts of selective listening to each instrument, rel-

ative to rest, are denoted with a contour line of different color for each instrument. The responses to an acoustically matched controlstimulus are shown in green. (B) The direct comparison of selective and divided attention conditions. Areas with a stronger BOLD re-

sponse during selective attending, as compared with divided attending, are shown in red, whereas areas with a stronger BOLD responseduring divided attending are shown in blue.


13/20


sence of a stimulus and very directed processing of astimulus. For this reason, we used resting and passive lis-tening as control tasks in Experiment 1. We should cau-tion, however, that even the passive-listening conditionmay require a minimum of selective attention for segre-gating the music from the continuous pinging sound of

the EPI pulse sequence. The degree to which separatingthe music from the pinging requires attentive mecha-nisms, rather than preattentive automatic mechanisms,according to principles of auditory stream segregation(Bregman, 1990) remains to be determined.

Other studies of auditory attention dispense withacoustically matched control stimuli altogether, insteadcomparing BOLD signal levels during attention withsilent rest blocks (Celsis et al., 1999; Linden et al., 1999;Zatorre et al., 1999) or white noise bursts (Pugh et al.,1996). This approach provides a comprehensive view ofauditory processing in a particular task/stimulus condi-

tion, although it precludes dissociating attentional pro-cesses from those related more specifically to sensoryencoding and preattentive acoustic analysis. An addi-tional concern is that the cognitive state of a resting sub-

ject is unknown to the experimenter and may influencethe activation patterns derived from comparisons of taskand rest activations, possibly even obscuring activationsof interest. Despite these shortcomings, passive rest isoften the common denominator across experiments. Assuch, it affords a reasonably constant reference againstwhich task-specific activations can be contrasted withinexperiments, and these contrasts can then be entered into

conjunction analyses to investigate the commonalityacross experiments.

Conjunction Analysis MethodsFor the purposes of the conjunction analysis, the listen

and the divided conditions were grouped together (labeledglobal) to reflect the holistic, global listening that was sug-gested or required. The attend and the selective conditions,which collectively required focusing attention on individ-ual instrument streams, were grouped together as the se-lective conditions. Regions of SPMs (thresholded atp ,

.05) that overlapped in the two experiments are shown in

Figure 5 and Table 5 for the globalrest and selectiverestcontrasts, respectively. We applied a lower significance cri-terion than in the individual experiments, since the con-

junction analysis was essentially a screen for regions of in-terest for further studies on the topic of attention in realmusical contexts. The likelihood of observing a significantconjunction was 2.53 1023. We did not perform a globalversus selective conjunction analysis, because within eachexperiment, the global versus selective contrast would re-flect the difference between high-level cognitive states thatwere somewhat different in the two experiments. In otherwords, in Experiment 2, the contrast would be between two

attentionally demanding states in the context of target de-tection, whereas in Experiment 1, the contrast would be be-tween a demanding selective attention state (not requiringdetection of single targets) and a less demanding passive-

listening state. Thus, a direct comparison of the differencestates in a conjunction analysis would not be readily inter-pretable. The conjunction maps were projected onto the av-erage T1-weighted anatomical image of the Experiment 1subjects, and Brodmanns areas were assigned to the acti-vations, using the atlas of Duvernoy (Duvernoy, 1999) and

Brodmanns areal descriptions (Brodmann, 1909/1999).

Conjunction Analysis Results and DiscussionFigure 5 shows the results of the conjunction analysis

in red. In both experiments, the requirement to focus at-tention on a single instrument, relative to passive resting(selectiverest), resulted in bilateral activation of the STG,pre-SMA, frontal operculum/rostral insula, superior PcS,IPS, and cerebellum (Figure 5A). In addition, there wasright-lateralized activation in the thalamus and the caudatenucleus. A similar pattern was observed for the globalrestconjunction with respect to the STG, pre-SMA, superior

PcS, and IPS (Figure 5B).Also shown in Figure 5 are the activations that were

unique to Experiment1 (green patches) and Experiment2(yellow patches). When the activations unique to each ex-periment are compared with the conjoint activations, bothwithin a contrast and across contrasts, a picture emergesthat can be accounted for, tentatively, on the basis of taskdifferences across the two experiments. Two major differ-ences existed between the two experiments. First, Exper-iment 2 was a target detection task, but Experiment 1 wasnot. Second, owing to the different tasks, the attentionalcharacteristics were more closely matched between ex-

periments for the selective-listening conditions than forthe global-listening conditions. Thus, areas that are verysensitive to attentional load would be expected to show aconjunction in the selectiverest contrast, whereas a con-

junction would be less likely for the globalrest contrast,owing to mismatched attentional demands.

Several areas that were activated in both experiments inthe selectiverest comparison were activated to a greaterextent or exclusively in Experiment 2 for the globalrestcomparison. These included the frontal operculum/rostralinsula and the IPS bilaterally and the caudate nucleus andthe thalamus on the right. In addition, the medial frontal

activation extended more ventrally into the anterior cingu-late sulcus in Experiment 2, and there was an activationfocus in the posterior cingulate. The presence of these ac-tivation foci in the global-listening condition of Experi-ment 2, but not of Experiment1, is consistent with the factthat the global-listening condition was a demanding di-vided attention target detection task in Experiment 2 butwas a passive-listening task in Experiment 1.

Experiment1 exhibited unique bilateral activation alongthe IFS extending caudally into the superior PcS in the se-lectiverest contrast (Figure 5A, slices120 to 150). Inaddition, there was a unique parietal activation in the left

supramarginal gyrus. The activation of these areas mightbe explained by the strategy for focusing attention on asingle stream that was suggested to all the subjects in Ex-periment 1. The suggested strategy was to listen to each


14/20


instruments part as though one was trying to learn/mem-orize it. Given that the observed frontal areas are inti-mately involved in working memory functions (see theGeneral Discussion section), the suggested listening strat-egy may have facilitated recruitment of these areas duringthe selective-listening task. Because Experiment 2 was atarget detection task, the goal of detecting a deviant withina specified stream was deemed to be sufficient for orient-ing attention, and the listening strategy was not suggestedto the subjects.

GENERAL DISCUSSION

Our experiments were designed with several questionsin mind. First, we wanted to identify neural circuits that

are activated during attentive listening to music, as com-pared with rest, regardless of the form of attentive listen-ing. Second, we wanted to determine whether we coulddissociate activations in response to attending to a singlestream from more holistic attending. Third, we were curi-ous about how the results of our experiments using realmusical stimuli would differ from several studies of audi-tory attention that have used simpler acoustic stimuli.Similarly, we wondered what the differences might be be-tween attentive listening to real music in a target detectioncontext and attentive listening in a more natural context

without the constraints of target detection.This last point presents a significant dilemma. A hall-

mark of cognitive psychology research is inference of thecognitive state of a subject from behavioral responses. For

Figure 5. Conjunction analysis showing activation regions common to both experiments. The conjunctions were performed by using

the contrasts of global- and selective-listening conditions relative to rest, because rest was the only condition that was identical acrossexperiments. Contrasts from the individual experiments were thresholded atp , .05, and these contrast masks were superimposed to

identify common voxels (shown in red). Activations for each contrast that were unique to Experiment 1 are shown in green, whereasvoxels unique to Experiment 2 are shown in yellow. The white line around the edge of the brain denotes the conjunction of the data in-

clusion masks from the two experiments.


15/20


instance, how well a subject is paying attention to the fea-tures of a stimulus is usually inferred from his or her de-tection accuracy and response times to the target features.Our concern is that requiring subjects to make decisionsabout stimulus features or to detect targets simply in thename of verifying compliance with attentional instruc-

tions fundamentally alters the constellations of brainareas that are engaged by the mental states we are inter-ested in. Thus, it becomes almost impossible to study at-tentional states that occur outside the context of target de-tection or decision making. These considerations led usto vary the task demands across two experiments andcompare the results via a conjunction analysis of the twoexperiments. In addition to the target detection accuracydata obtained in Experiment 2, ratings were obtained (inboth experiments) of the difficulty each subject had ori-enting his or her attention as instructed to each of the in-struments. Although not as compelling as reaction time

or accuracy data, the ratings showed that the subjects atleast attempted to orient their attention as instructed.Thus, the subjective ratings and the results of the con-

junction analysis left us fairly confident that we had ob-tained fMRI measures of auditory attention to musicalstimuli, even outside of a target detection context.

Circuitry Underlying Attentionto Complex Musical Stimuli

For both the global and the selective attention condi-tions, we observed increased BOLD signals in the tempo-ral, frontal, and parietal areas during attentive listening to

excerpts of real music. This finding is in agreement with

recent studies that have supported the hypothesis (Zatorreet al., 1999) that a supramodal attention circuit consistingof the temporal, parietal, and frontal areas is recruited dur-ing the detection of auditory targets presented in simpleacoustic contexts (Benedict et al., 1998; Kiehl, Laurens,Duty, Forster, & Liddle, 2001; Linden et al., 1999; Pugh

et al., 1996; Sakai etal., 2000; Tzourio et al., 1997; Zatorreet al., 1999) and musical contexts (Satoh et al., 2001). Mostselective attention experiments in audition are designed astarget detection tasks in which subjects form a searchimage of a single target stimulusfor example, a pitch, asyllable, or a location. In contrast to these studies, our sub-

jects were required to maintain attentional focus on amelody carried by a specific timbre. Rather than selectinga single search image from a background of stimuli withinan attended stream, the subjects were required to select anentire stream in the presence of multiple streams. Despitethese differences, we observed an activation pattern across

both experiments that was very similar to that observedduring auditory selective attention for acoustic featuresunder dichotic listening to simple FM sweeps and syllablestimuli, relative to detection of noise bursts (Pugh et al.,1996). Common to the study of Pugh et al. and our presentset of experiments were activations of the STG, the IPS (bi-lateral for syllables, right-lateralized for FM sweeps), thebilateral IFS, the right-lateralized PcS, the SMA/pre-SMA,and the bilateral frontal operculum.

Our observation that the set of cortical areas activatedby attentive listening to music is very similar to the setthat is activated during processing of simpler acoustical

stimuli might be seen as problematic, given that we argue

Table 5Areas of Activation Common to Experiments 1 and 2

GlobalRest SelectiveRest

Lobe Hemisphere Region (Brodmann Areas) x y z x y z

Temporal left STG (41/42/22) 255 7 25 254 7 24

262 242 20 252 244 26right STG (41/42/22) 55 7 0 53 16 0

64 234 15 64 234 16STS (22) 60 228 25 60 231 25

Frontal bilateral pre-SMA (6) 0 10 54 0 4 55

left frontal operculum (44) 235 20 5IFG, p. operc. (44) 260 7 20superior PcS (6/8) 231 25 51 252 24 48

rostral MFG (10) 227 56 9241 49 9

right frontal operculum (44) 44 19 5 35 20 5IFS/PcS (6/8) 47 12 30 51 10 33

superior PcS (6/8) 39 22 60 37 22 57Parietal left IPS (7/39) 242 248 46

right IPS (7/39) 44 248 50IPS/angular gyrus (39) 35 260 45

Other left cerebellum 232 265 228right caudate 10 9 8 11 6 12

thalamus 11 217 12cerebellum 33 266 225

NoteIf more than one set of coordinates is given for one hemisphere of an anatomical location, the coordinates de-

note the extent, rather than the center, of the activation conjunctions. STG, superior temporal gyrus; STS, superiortemporal sulcus; SMA, supplementary motor area; IFG, inferior frontal gyrus; PcS, precentral sulcus; MFG, middle

frontal gyrus; IFS, inferior frontal sulcus; IPS, intraparietal sulcus.


16/20


that it is important to use natural stimuli. In other words,if natural stimuli are somehow superior to simple stim-uli, should they not result in activation of brain areas inwhich simple stimuli are not capable of eliciting a re-sponse? If responses to the two classes of stimuli are notdifferent from each other, what have we learned? Our

study has been a f irst step in charting out the neural cir-cuitry that is recruited as listeners guide their attention inreal music. To determine conclusively whether there arebrain areas specialized for the processing of real music,one would have to titrate the musical complexity of thestimulus set while holding the task and overall acousti-cal complexity constant. However, our study allows usto conclude that attentive processing in musical contextsengages a set of brain areas that is similar to the set en-gaged by attentive processing of simpler acoustic con-texts. This conclusion was not evident prior to our per-forming these experiments, since it might have been

argued that processing of simple stimuli does not needthe same kind of complex network that might be requiredby rich natural stimuli. In this regard, our results can beseen as a validation of experiments that have used sim-pler stimuli. We will now relate the functional circuitrywe observed for both global and selective attention con-ditions to that found in relation to other studies of audi-tory attention and to circuits for general working mem-ory, attention, semantic processing, target detection, andmotor imagery functions.

Selective Versus Global Listening

Auditory attention in the STG. Several studies havetried to determine whether attention influences activity inthe primary and auditory association areas along the STG.Numerous studies from the human electrophysiologicalliterature suggest that attention influences activity in theseareas (Ntnen, 1992), whereas the functional neuro-imaging literature is equivocal. Primarily because of het-erogeneity across experiments in the stimuli, tasks, andspecific statistical comparisons performed, the locus of at-tentional selection along the STG is uncertain. In severalfunctional neuroimaging studies, it has been observedthat there is very little, if any, modulation of the primary

and secondary auditory cortical areas of the STG inattendlisten comparisons (Benedict et al., 1998; Tzourioet al., 1997; Zatorre et al., 1992; Zatorre, Halpern, Perry,Meyer, & Evans, 1996) or in comparisons of activationduring attention to different stimulus dimensions (Bene-dict et al., 1998; Platel et al., 1997; Zatorre, Mondor, &Evans, 1999). However, a recent fMRI study focusing onsuperior temporal lobe activation suggests that althoughpassive listening itself shows increased BOLD signal lev-els, as compared with rest, both primary and secondaryauditory cortices are activated during target detection rel-ative to passive listening (Jncke et al., 1999).

In both experiments, we found the spatial extent of STGactivation to be similar in global- and selective-listeningconditions, as compared with rest conditions. We observedmore extensive activation bilaterally in response to atten-

tive listening to the musical stimulus than in response topassive listening to the acoustically matched control stim-ulus in Experiment 2 (Figure 4A). A direct comparison ofselective listening with passive global listening in Exper-iment 1 showed a significantly stronger BOLD signalduring selective listening in the left posterior STG/STS

(Figure 2B). This result might lead one to infer that thisis the temporal lobe region in which stream selection oc-curs. However, the same area showed no difference be-tween the selective and the divided attention conditionsin Experiment 2. These conditions were closely matchedin their attentional demands, suggesting that the left pos-terior STG/STS area observed in Experiment 1 is moresensitive to overall attentional load than to selection of asingle stream per se. Our findings do not allow more de-tailed inferences about mechanisms of attentional selec-tion of auditory streams that are characterized by a vari-ety of featuresfor example, timbral, temporal, and so

forththat might influence activity patterns within theSTG region.

Auditory attention beyond the STG. When we com-pared selective- and global-listening conditions directly,we observed significant differences in several brain re-gions. This was true for both experiments, although theactivation patterns differed. In Experiment1, selective lis-tening was associated with the temporo-parieto-frontalcircuitry alluded to above and described in more detailbelow. When the target detection conditions of Experi-ment 2 were compared directly, selective listening showedgreater activity in a number of visual areas in the occip-

ital lobe than in the temporo-parieto-frontal areas that wemight have expected, given the results of Experiment 1.However, the divided attention condition in Experiment 2showed stronger activation of the parietal and medialfrontal structures that were more active in the selective at-tention condition of Experiment 1. When compared withthe rest condition, both target detection conditions in Ex-periment2 exhibited activation of temporo-parieto-frontalcircuitry that was similar to that observed in the selective-attending condition of Experiment1. This set of contrastsacross the two experiments suggests that activation of thetemporo-parieto-frontal circuitry may depend more on the

degree of attentional load than on selection of singlestreams. Thus, we were able to delineate a cortical net-work associated with attentive listening to music, but wedid not attain our original objective of identifying areasthat are specific for selection of single acoustic streams inthe presence of other, simultaneously sounding streams.

There is scant evidence in the literature for where au-ditory stream selection areas might exist. One other func-tional neuroimaging study that has compared focused-and divided-listening conditions directly (detection oftarget syllables in a stream of syllables accompanied by asimultaneous stream of text) found no regions in which

rCBF was greater during selective listening (detection ofsyllables while ignoring text) than during divided listen-ing (detection of syllables while monitoring text for con-tent; Benedict et al., 1998). A direct comparison of di-


17/20


vided listening with focussed listening found activationof the left posterior MFG (x5242,y5 0,z5 48) . Thisregion is roughly homologous to right-hemisphere acti-vations that we found in the divided versus selective con-trast of Experiment 2 (Table 4).

A recent study in which selective listening and global

listening in music were compared found the parietal andfrontal areas to be more active during selective listeningfor specific notes in the alto harmony part of four-part har-mony compositions than during global listening for minorchords in the same pieces (Satoh et al., 2001). The set ofregions they found to be preferentially active during se-lective listening to the alto part is very similar to the re-gions we found in Experiment 2 to be significantly moreactive during divided attending than during selective at-tending when the conditions were compared directly. Thisapparent discrepancy might be explained in terms of over-all attentional load. Detecting targets in one of three si-

multaneous streams is more difficult than detecting tar-gets in a single stream. Similarly, listening for a target thatis a component note of a chord is more difficult than judg-ing whether or not the overall chord is a minor-soundingchord, because four notes played simultaneously in thesame timbre will tend to integrate into a whole from whichit is more difficult to pick out individual notes.

Domain-General Activation

of Fronto-Parietal CircuitryThe parieto-frontal network, consisting of the IPS, the

pre-SMA, the PcS, and the IFS, that was activated in our

tasks in combination with temporal areas is also recruitedby a diverse array of cognitive tasks that, at first glance,have very little in common with our stimuli and tasks. Apartial listing (see Cabeza & Nyberg, 2000, for a morecomprehensive compilation) includes studies of spatial at-tention (Corbetta, Kincade, Ollinger, McAvoy, & Shul-man, 2000; LaBar, Gitelman, Parrish, & Mesulam, 1999),temporal attention (Coull, Frith, Buchel, & Nobre, 2000;Coull & Nobre, 1998), and working memory tasks that in-volve memory for spatial locations, abstract and concreteobjects (LaBar et al., 1999; Mecklinger, Bosch, Grue-newald, Bentin, & von Cramon, 2000), verbal and non-

verbal material (Jonides et al., 1998; Kelley et al., 1998),the pitch of single notes in melodic contexts (Zatorre et al.,1994), temporal interval patterns (Sakai et al., 1999), andspatiotemporal patterns (Schubotz & von Cramon, 2001).Our results provide further support for the idea that thisfronto-parietal network serves domain-general atten-tional and working memory functions.

In our study, the IPS was activated bilaterally in bothexperiments, although the extent of activation was great-est in the divided attention condition of Experiment 2(Figure 4). In the selective attention condition of Experi-ment 1, we observed activation in left supramarginal and

IPS areas, along with bilateral frontal activation along thecaudal IFS and the PcS. This experiment differed fromExperiment 2 in that the subjects were simply requested toattend to single instruments, rather than perform a target

detection task. As was mentioned above, we suggested tothe subjects that it was easier to focus attention on a sin-gle instrument by listening to it as though they were try-ing to learn the part. A hypothesis for further study is thatthe left parieto-frontal activation pattern we observedspecifically reflects the attempt to encode the melodies.

Across the two experiments, we observed BOLD signalincreases during selective and global attention in fourfrontal regions: the IFS, the frontal operculum, the PcS,and the SMA/pre-SMA. Cortical regions along the IFS,and the portion of the IFG ventral to the IFS, particularlyin the left hemisphere, respond to semantic representationsof objects (see the review in Poldrack et al., 1999), selec-tion demands (Thompson-Schill, DEsposito, Aguirre, &Farah, 1997; Thompson-Schill, DEsposito, & Kan, 1999),and phonological processing (Poldrack et al., 2001; Pol-drack et al., 1999). Interestingly, BOLD signal increasesare observed in these same regions during discrimination

and detection of simple auditory objects that possess spec-tral transients (Mller, Kleinhans, & Courchesne, 2001),consistent with the regions role in phonological process-ing. In the latter study, activation of the dorsal IFG and theventro-medial IFG (frontal operculum) was bilateral.Across our experiments, the most consistent activationalong the extent of the PcS and the IFS, regardless of at-tentional condition, was in the right hemisphere. In theglobal attention conditions, activation in these structureswas largely absent on the left, whereas it was present onthe left in the selective attention conditions in Experi-ment1 (Figure5). The overall right-hemispheric bias in the

PcS/IFS region may reflect the nonsemantic aspect ofmusic, along the lines of the proposition that nonverbalworking memory functions are biased toward the righthemisphere (Buckner, Kelley, & Petersen, 1999; Kelleyet al., 1998). Although musical streams are highly struc-tured and bound together by features along multiple di-mensionsthat is, timbral, rhythmic, and melodictheypossess no obvious semanti

janata, p., tillmann, b., & bharucha, j. j. (2002). listening to polyphonic music recruits...

Documents