Supplementary Materials I: Proof of Concept Experiment

Methods

Participants. Sixteen Cantonese-English bilingual listeners (Mean age = 20.3 years, SD

= 3.38 years) and 16 native English listeners (Mean age = 27.0 years, SD = 9.07 years)

participated in the proof-of-concept experiment. Demographic and language learning

background information for these participants was obtained through a language background

questionnaire (Choi, Tong, Gu, Tong, & Wong, 2017). According to participants’ self-

reports, all Cantonese-English bilingual listeners were born in Hong Kong and were native

speakers of Cantonese. They began learning English as a second language in early childhood

(Mean age of onset of English learning = 2.8 years, SD = 1.48 years). Their self-assessment

of proficiency in Cantonese and English on a 5-point Likert scale (1 = poor; 5 = excellent)

indicated that their proficiencies in Cantonese and English were 4.73 (SD = .59) and 4.00 (SD

= .65) respectively. The mean frequencies of language use on a 5-point Likert scale (1 =

never; 5 = always) were 4.93 (SD = .26) and 3.73 (SD = 1.10) for Cantonese and English,

respectively. In contrast, the native English listeners had acquired English as their first

language and had not received any formal or informal instruction in Cantonese, nor had they

learned any other tone languages. The mean period of residing in Hong Kong for the native

English listeners was 2.2 years (SD = 3.84 years).

Materials and Procedure. The core stimuli for the English lexical stress discrimination

task were the four minimal pairs /'kaka/ - /ka'ka/, /'kɛkɛ/ - /kɛ'kɛ/, /'kiki/ - /ki'ki/, /'kɔkɔ/ -

/kɔ'kɔ/. The consonants and vowels were selected to be common in both Cantonese and

English in order to eliminate any possible influence of segmental contrasts. These were

recorded by two native American English speakers, one male and one female. The male

speaker was an English professor and the female speaker was an international graduate

student at The University of Hong Kong. They grew up in the United States and had acquired

English as their first language. At the time of recording, each had lived in Hong Kong for less

than 1.5 years. Prior to recording, they were given a list of English real words with

contrastive lexical stress patterns (e.g., IMport/imPORT; SUSpect/susPECT) to familiarize

them with iambic and trochaic stress patterns. During the recording, each pseudoword with a

marked lexical stress pattern (e.g., KAka) was presented on a computer screen. The speakers

recorded each pseudoword at their own pace. Each item was recorded three times, and the

best token (i.e., the most clearly articulated) was chosen by a speech-language pathologist

experienced in English phonetics. The selected tokens were normalized to 500ms.

The stimuli were presented using an AX discrimination paradigm. On each trial, a

minimal pair was presented to participants over professional quality stereo headphones with

an inter-stimulus interval of 500ms. Each minimal pair consisted of one male and one female

recording, with the order of presentation of male and female recordings counterbalanced. The

two stimuli on a trial were always the same pseudoword but could either have the same

lexical stress pattern (e.g., /'kaka/ vs. /'kaka/) or different lexical stress patterns (e.g., /'kaka/

vs. /ka'ka/). The frequency of occurrence of each minimal pair (/'kaka/ - /ka'ka/, /'kɛkɛ/ -

/kɛ'kɛ/, /'kiki/ - /ki'ki/, /'kɔkɔ/ - /kɔ'kɔ/) was balanced.

The use of two quite different voices was designed to prevent listeners from attending to

acoustic similarity, per se. Prior to the experiment, listeners were explicitly told that they

would be hearing items that either matched or mismatched on stress pattern. On each trial,

participants responded by pressing one of two keys on a keyboard to indicate whether the two

sounds had the same stress pattern or not. There were 128 trials (8 stimuli × 2 speaker orders

× 4 repetitions × 2 trial types). Accuracy and response time were recorded on each trial.
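As an illustration, the trial count above can be reproduced with a minimal Python sketch. The source does not say how the factors were crossed, so the sketch assumes that each of the 8 stimuli (4 pseudowords × 2 stress patterns) serves as the first item of a pair. The function name `build_trials` and the field names are hypothetical.

```python
from itertools import product

# Pseudowords and factor levels taken from the task description.
PSEUDOWORDS = ["kaka", "kɛkɛ", "kiki", "kɔkɔ"]
STRESS = ["initial", "final"]  # e.g., /'kaka/ vs. /ka'ka/
SPEAKER_ORDERS = [("male", "female"), ("female", "male")]
TRIAL_TYPES = ["same", "different"]
REPETITIONS = 4


def build_trials():
    """Return one dict per AX trial (assumed full crossing of all factors)."""
    trials = []
    for word, first_stress, order, trial_type, rep in product(
        PSEUDOWORDS, STRESS, SPEAKER_ORDERS, TRIAL_TYPES, range(REPETITIONS)
    ):
        if trial_type == "same":
            second_stress = first_stress
        else:
            # A "different" trial pairs the item with the other stress pattern.
            second_stress = "final" if first_stress == "initial" else "initial"
        trials.append(
            {
                "pseudoword": word,
                "first_stress": first_stress,
                "second_stress": second_stress,
                "first_speaker": order[0],
                "second_speaker": order[1],
                "trial_type": trial_type,
                "repetition": rep,
            }
        )
    return trials
```

Under these assumptions the list contains 4 × 2 × 2 × 2 × 4 = 128 trials, half of them same-stress trials, and every trial pairs one male and one female recording.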

Results

Prior to our main analyses, we first converted the raw data of the discrimination task to

signal detection theory’s sensitivity index (d’) by subtracting the z-transform of the false

alarm rate from the z-transform of the hit rate (Macmillan & Creelman, 2005). The hit rate

and false alarm rate were computed from the number of correct responses on different trials

and the number of incorrect responses on same trials, respectively.
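The d’ computation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis script used in the study. The function and argument names are hypothetical. The .99 and .01 bounds are taken from the note to Figure S1, and applying both bounds to both rates is an assumption.

```python
from statistics import NormalDist


def d_prime(hits, n_different, false_alarms, n_same, floor=0.01, ceiling=0.99):
    """Sensitivity index d' = z(hit rate) - z(false alarm rate).

    hits: correct responses on different trials
    false_alarms: incorrect responses on same trials
    Rates are clipped to [floor, ceiling] so the z-transform stays finite
    (assumed handling of perfect scores).
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = min(max(hits / n_different, floor), ceiling)
    fa_rate = min(max(false_alarms / n_same, floor), ceiling)
    return z(hit_rate) - z(fa_rate)
```

With 64 different trials and 64 same trials, a listener who makes no errors receives `d_prime(64, 64, 0, 64)`, which is approximately 4.65, the effective upper limit mentioned in the note to Figure S1.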

Figure S1 shows the scatter plot of the d’ scores for Cantonese-English bilingual

listeners and native English listeners. The mean d’ score for the Cantonese-English bilingual

listeners was 3.10 (SD = .84), versus a mean of 1.26 (SD = .77) for the native English

listeners. An independent samples t-test revealed that Cantonese-English bilingual listeners

discriminated English lexical stress significantly more accurately than native English

listeners, t(30) = 6.44, p < .001, d = 2.28. This difference provides preliminary confirmation

of our hypothesis that there is a facilitative effect of tone language experience on English

lexical stress discrimination. Cantonese-English bilingual listeners also had significantly

shorter response times than native English listeners, t(30) = -3.14, p < .01, d = 1.05. Thus, the

non-native advantage cannot be attributed to a speed-accuracy trade-off.
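For readers who wish to check the reported statistics against the summary values, the following minimal Python sketch computes an independent-samples t statistic and Cohen's d from group means, SDs, and sample sizes. It assumes a pooled-variance (equal variances) t-test and a pooled-SD definition of d; the source does not state which variants were used. The function name is hypothetical.

```python
from math import sqrt


def pooled_t_and_d(m1, sd1, n1, m2, sd2, n2):
    """Return (t, Cohen's d, df) from summary statistics.

    Assumes a pooled-variance t-test and d = mean difference / pooled SD.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    pooled_sd = sqrt(pooled_var)
    standard_error = pooled_sd * sqrt(1 / n1 + 1 / n2)
    t = (m1 - m2) / standard_error
    d = (m1 - m2) / pooled_sd
    return t, d, df
```

With the reported d’ summaries, `pooled_t_and_d(3.10, 0.84, 16, 1.26, 0.77, 16)` gives a t of about 6.46 and a d of about 2.28 on 30 degrees of freedom. The small gap from the reported t = 6.44 is what one would expect when working from rounded means and SDs.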

Figure S1. Scatterplot of the d’ scores for Cantonese-English bilingual listeners and native

English listeners. Using .99 and .01 as maximum hit and minimum false alarm rates

respectively, the effective upper limit of d’ is 4.65.

Supplementary Materials II

Language and Musical Backgrounds of the Participants

According to the language background questionnaires (Choi et al., 2017), all Cantonese-

English bilingual listeners were born in Hong Kong, had acquired Cantonese as their first

language, and had learned English as their second language in early childhood (Mean age of

onset of English learning = 3.46 years, SD = 1.45 years). On a scale from 1 (poor) to 5

(excellent), their self-rated proficiencies in Cantonese and English were 4.82 (SD = .39) and

4.04 (SD = .39) respectively. On a scale from 1 (never) to 5 (always) for frequency of

language use, the mean frequencies for Cantonese and English usage were 4.96 (SD = .19)

and 3.64 (SD = .73) respectively.

The native English listeners had acquired American English as their first language and

had not received any formal or informal instruction in Cantonese or any other tonal

languages. Adopting the criteria used in prior research (e.g., Cooper & Wang, 2012;

Wayland, Herrera, & Kaan, 2010), nine Cantonese-English bilingual listeners and five native

English listeners were identified as musicians. Cantonese-English bilingual musicians (M =

9.7 years, SD = 2.50 years) and native English musicians (M = 10.2 years, SD = 2.86 years)

had similar amounts of musical training, t(12) = -.36, p = .722.

Supplementary Materials III

Stimuli in the Lexical Stress Discrimination Task

Phonotactically legal pseudowords. Four minimal pairs, i.e., /'kiki/ -

/ki'ki/, /'kwikwi/ - /kwi'kwi/, /'kimkim/ - /kim'kim/, /'kwimkwim/ - /kwim'kwim/ were

recorded. These stimuli are non-words in both languages, but are phonotactically legal in

both. The CVCV or CVCCVC structures were used because they are the most common

word-shapes with repeated syllable structures in Cantonese and English (Battistella, 1990).

The consonants and vowels selected are common in both Cantonese and English.

Phonotactically illegal pseudowords. We recorded the four minimal pairs /'riri/ -

/ri'ri/, /'krikri/ - /kri'kri/, /'kivkiv/ - /kiv'kiv/, /'krivkriv/ - /kriv'kriv/. Although these stimuli

have CVCV or CVCCVC structures, they are not phonotactically legal in Cantonese. The

consonants /v/ and /r/, and the cluster /kr/ are present in English, but do not occur in

Cantonese. These stimuli are thus legal pseudowords in English, and non-words in

Cantonese.

Real word stimuli. We recorded four minimal pairs of real English words, /ˈpɚmɪt -

pɚˈmɪt/, /ˈsəspekt - səsˈpekt/, /ˈɪnsɚt - ɪnˈsɚt/, and /ˈimpɔrt - imˈpɔrt/ (permit, suspect, insert,

import). We chose these four minimal pairs because their lexical stress change (e.g., PERmit

versus perMIT) did not involve vowel reduction (which is a segmental change, rather than a

variation in suprasegmental information).

All stimuli were digitally recorded in a sound-shielded booth, produced by two native

American English speakers (one male and one female), at a sample rate of 48 kHz. The male

speaker (the third author) has extensive experience recording stimuli for speech experiments;

he has lived in the New York City region for most of his life. The female speaker was an

undergraduate student living in the New York City region. During the recording, the male

speaker was given a written list of the stimuli and asked to produce the stimuli with the

lexical stress patterns indicated in the list (e.g., PERmit). The female speaker produced the

items by imitating recordings of the male productions: Prior to recording each item on the

list, she heard the item produced by the male speaker. Each item was recorded three times,

and the best item was chosen by the same speech-language pathologist who judged the

stimuli for the proof of concept experiment. Each chosen item was normalized to 600ms in

Praat 5.4.02 (Institute of Phonetic Sciences, University of Amsterdam, the Netherlands). The

acoustic features of the naturally recorded stimuli are provided in Tables S1, S2, S3 and S4.

We then manipulated the acoustic parameters to create two additional stimulus sets: f0-only

stimuli and duration-and-intensity only stimuli, as described below.

F0-only stimuli. The three sets of stimuli were acoustically manipulated in Praat to

remove the duration and intensity cues. Specifically, the durations of the first and second

syllables were normalized to 300ms, and their intensities neutralized to 66dB.
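The size of these adjustments can be illustrated with simple arithmetic. The Python sketch below is not the Praat procedure itself. It only computes the time-scaling factor and linear amplitude gain implied by moving a syllable to the 300 ms and 66 dB targets, using the standard relation between a level change in dB and amplitude ratio. The function names are hypothetical.

```python
def duration_factor(original_ms, target_ms=300.0):
    """Factor by which a syllable must be stretched (>1) or compressed (<1)."""
    return target_ms / original_ms


def amplitude_gain(original_db, target_db=66.0):
    """Linear amplitude gain for a level change of (target - original) dB."""
    return 10 ** ((target_db - original_db) / 20)


# Example values for the male speaker's KIki token, taken from Table S3:
# first syllable 273 ms at 69.03 dB, second syllable 327 ms at 61.93 dB.
first = (duration_factor(273), amplitude_gain(69.03))
second = (duration_factor(327), amplitude_gain(61.93))
```

In this example the stressed first syllable is lengthened slightly and attenuated, while the unstressed second syllable is shortened and amplified, which removes the duration and intensity differences between the two syllables.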

Duration-and-intensity only stimuli. The three sets of stimuli were acoustically

manipulated to remove the dynamic f0 information. The f0 of each syllable was flattened to

120Hz (male recordings) or 210Hz (female recordings). These f0 values correspond to the

average fundamental frequency of male and female adults (Traunmüller & Eriksson, 1994).

Short-term Memory Task

We used two tasks that Zheng and Samuel (2018) had used to assess participants’

memory and intelligence. The task that measured short term memory was a version of the

children’s game “Simon” (retrieved from

http://www.freegames.ws/games/kidsgames/simon/mysimon.htm). In this task, there was a

round object split into four different wedges corresponding to different colors, i.e., red, green,

yellow and blue. On each trial, different colors would light up in a sequence, e.g., red-blue-

green. Participants were asked to remember and reproduce the sequence, e.g., red-blue-green.

The first trial presented a single color. Each time a participant gave the correct answer, the

sequence would increase by one, e.g., blue-yellow-green-red, until the participant failed to

reproduce the sequence. Participants completed five rounds, and we computed the

participant’s median score. In the usual game, each color corresponds to a specific musical

tone. We turned off the sound during the task to avoid the possible involvement of pitch-to-

color mapping.
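The scoring of this task can be sketched as follows. The source does not define the score for a round, so this minimal Python example assumes that it is the length of the longest sequence reproduced correctly before the first error. The function names are hypothetical.

```python
from statistics import median


def round_score(target, responses):
    """Length of the longest correctly reproduced sequence in one round.

    target: the full color sequence for the round
    responses: the participant's attempts at lengths 1, 2, 3, ...
    """
    score = 0
    for length, response in enumerate(responses, start=1):
        if list(response) != list(target[:length]):
            break  # the round ends at the first incorrect reproduction
        score = length
    return score


def median_score(round_scores):
    """Median score across a participant's rounds (five in this study)."""
    return median(round_scores)
```

For example, a participant who reproduces red and red-blue but then enters red-green-blue for the target red-blue-green would receive a score of 2 for that round under this assumption.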

Non-verbal Intelligence Task

Participants completed a set of non-verbal multiple-choice questions chosen from two

free intelligence tasks (retrieved from http://www.iq-test.com/free-iq-test/, and

http://www.quickiqtest.net). The questions were displayed on PowerPoint slides that

automatically moved on every 30 seconds. There were 14 questions, and the number of

correct answers was tallied to give the total score.

Table S1.

Original and modified durations of the naturally recorded stimuli originating from the male

speaker.

Stimuli     Overall             First syllable      Second syllable
            Original  Modified  Original  Modified  Original  Modified

Legal
kiKI        728       600       333       275       395       325
KIki        642       600       291       273       351       327
kimKIM      755       600       343       273       412       327
KIMkim      675       600       336       399       339       201
kwiKWI      714       600       289       243       425       357
KWIkwi      671       600       290       260       381       340
kwimKWIM    770       600       383       300       387       300
KWIMkwim    678       600       354       313       324       287

Illegal
kivKIV      913       600       363       237       550       363
KIVkiv      926       600       359       232       567       368
kriKRI      735       600       325       263       410       337
KRIkri      703       600       315       269       388       331
krivKRIV    999       600       393       236       606       364
KRIVkriv    951       600       407       257       544       343
riRI        663       600       344       243       319       357
RIri        769       600       368       260       401       340

Real words
imPORT      727       600       272       224       455       376
IMport      855       600       250       177       605       423
inSERT      748       600       169       139       579       461
INsert      816       600       166       124       650       476
perMIT      601       600       141       151       460       449
PERmit      600       600       141       155       459       445
susPECT     913       600       459       308       454       292
SUSpect     972       600       490       272       482       328

Note. Stressed syllables are capitalized. All values are represented in ms.

Table S2.

Original and modified durations of the naturally recorded stimuli originating from the

female speaker.

Stimuli     Overall             First syllable      Second syllable
            Original  Modified  Original  Modified  Original  Modified

Legal
kiKI        638       600       340       319       298       281
KIki        612       600       367       361       245       239
kimKIM      714       600       373       314       341       286
KIMkim      618       600       305       297       313       303
kwiKWI      692       600       355       306       337       294
KWIkwi      660       600       300       271       360       329
kwimKWIM    767       600       358       279       409       321
KWIMkwim    662       600       327       294       335       306

Illegal
kivKIV      878       600       364       247       514       353
KIVkiv      852       600       321       228       531       372
kriKRI      667       600       344       308       323       292
KRIkri      619       600       304       294       315       306
krivKRIV    843       600       352       252       491       348
KRIVkriv    1013      600       368       219       645       381
riRI        675       600       306       247       369       353
RIri        616       600       323       332       293       268

Real words
imPORT      832       600       349       250       483       350
IMport      829       600       403       290       426       310
inSERT      772       600       263       204       509       396
INsert      683       600       268       239       415       361
perMIT      796       600       277       209       519       391
PERmit      748       600       300       237       448       363
susPECT     1000      600       523       247       477       353
SUSpect     962       600       526       332       436       268

Note. Stressed syllables are capitalized. All values are represented in ms.

Table S3.

Fundamental frequency, duration and intensity values of the naturally recorded stimuli

originating from the male speaker.

Stimuli     First syllable                           Second syllable
            F0 (Hz)   Duration (ms)  Intensity (dB)  F0 (Hz)   Duration (ms)  Intensity (dB)

Legal
kiKI        127.12    275            66.03           162.57    325            66.81
KIki        176.19    273            69.03           98.24     327            61.93
kimKIM      121.19    273            64.90           121.38    327            67.52
KIMkim      205.40    399            68.89           98.43     201            60.72
kwiKWI      133.44    243            65.61           142.22    357            67.00
KWIkwi      189.53    260            66.46           95.78     340            66.59
kwimKWIM    123.16    300            63.52           121.04    300            68.26
KWIMkwim    170.04    313            67.63           87.12     287            64.84

Illegal
kivKIV      137.13    237            67.30           122.88    363            65.68
KIVkiv      166.85    232            68.82           99.55     368            63.87
kriKRI      133.27    263            65.60           144.29    337            66.50
KRIkri      190.56    269            68.10           97.10     331            63.36
krivKRIV    123.17    236            66.88           162.58    364            66.13
KRIVkriv    168.22    257            68.56           112.50    343            63.08
riRI        133.44    243            65.61           142.22    357            67.00
RIri        189.53    260            66.46           95.78     340            66.59

Real words
imPORT      109.81    224            58.19           123.29    376            67.77
IMport      163.19    177            59.91           103.00    423            66.77
inSERT      116.94    139            66.65           157.50    461            65.90
INsert      167.84    124            70.46           N/A*      476            62.97
perMIT      124.02    151            66.84           127.35    449            61.55
PERmit      168.02    155            69.34           188.23    445            57.52
susPECT     118.77    308            65.57           128.81    292            67.32
SUSpect     145.21    272            68.73           99.78     328            63.42

Note. Stressed syllables are capitalized. * = unable to obtain reliable f0 due to frication noise.

Table S4.

Fundamental frequency, duration and intensity values of the naturally recorded stimuli

originating from the female speaker.

Stimuli     First syllable                           Second syllable
            F0 (Hz)   Duration (ms)  Intensity (dB)  F0 (Hz)   Duration (ms)  Intensity (dB)

Legal
kiKI        215.92    319            65.28           214.86    281            67.65
KIki        297.58    361            67.01           235.04    239            65.62
kimKIM      235.17    314            65.55           226.74    286            67.4
KIMkim      271.63    297            68.1            231.5     303            64.28
kwiKWI      191.09    306            66.2            233.65    294            66.89
KWIkwi      347.38    271            68.37           240.89    329            64.29
kwimKWIM    230.07    279            64.87           248.49    321            67.64
KWIMkwim    325.53    294            68.55           233.43    306            63.03

Illegal
kivKIV      227       247            64.89           236       353            67.39
KIVkiv      299.83    228            68.53           236.92    372            64.81
kriKRI      235.22    308            65.37           264.89    292            67.55
KRIkri      283.04    294            67.12           233.36    306            65.92
krivKRIV    232.75    252            64.04           243.4     348            67.67
KRIVkriv    306.85    219            69.36           287.9     381            63.6
riRI        210.62    247            64.81           221.2     353            67.39
RIri        256.13    332            67.99           180.8     268            63.76

Real words
imPORT      221.69    250            62.96           239       350            67.86
IMport      262.08    290            68.23           205.04    310            62.43
inSERT      218.01    204            64.89           310.87    396            66.89
INsert      239.81    239            69.43           208.87    361            59.58
perMIT      227.81    209            69.19           260.65    391            64.32
PERmit      119.39    237            70.28           208.9     363            59.01
susPECT     210.62    247            64.81           221.2     353            67.39
SUSpect     256.13    332            67.99           180.8     268            63.76

Note. Stressed syllables are capitalized.

Supplementary Materials IV

Preliminary Analysis: Potential Group Differences in Control Measures

Before the main analysis, we tested whether the groups differed on non-verbal

intelligence and/or short-term memory. The Cantonese-English bilingual listeners had

significantly higher non-verbal intelligence scores (M = 11.40, SD = 1.96) than the native

English listeners (M = 9.10, SD = 2.71), t(58) = 3.77, p < .001, d = .97, but no significant

difference was found in short-term memory between Cantonese-English bilingual listeners

(M = 7.53, SD = 3.15) and native English listeners (M = 8.43, SD = 2.69), t(58) = -1.19, p

= .239. Thus, in the main analyses, only non-verbal intelligence was included as a covariate.

An alternative method for controlling for the group difference in non-verbal intelligence is

described below (removing participants from the two groups in a way that equalized the

average non-verbal intelligence scores between the groups).

Supplementary Analysis with Non-verbal Intelligence-matched Native and Non-native

Listeners

To match the two subject groups on their non-verbal intelligence scores, one third of the

Cantonese-English bilingual listeners (those at the upper end of the non-verbal intelligence

distribution), and one third of the native English listeners (those at the lower end of the non-

verbal intelligence distribution) were excluded, resulting in a data subset with 20 listeners per

group. In this data subset, Cantonese-English bilingual listeners (M = 10.50, SD = 1.79) and

native English listeners (M = 10.60, SD = 1.50) did not differ significantly in non-verbal

intelligence, t(38) = -.19, p = .849. This exclusion process led to a sample in which the native

English listeners (M = 9.55, SD = 2.26) had better short-term memory scores than the

Cantonese-English bilingual listeners (M = 7.75, SD = 2.77), t(38) = -2.25, p < .05, d = .71.

Thus, short-term memory was included as a covariate in the following analyses.
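The matching step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure's actual implementation: the function name is hypothetical, "one third" is taken as integer division of the group size, and ties at the cut-off are broken by sort order.

```python
def trim_for_matching(higher_group_scores, lower_group_scores):
    """Drop the top third of the higher-scoring group and the bottom third
    of the lower-scoring group, returning the retained scores of each.
    """
    n_drop_high = len(higher_group_scores) // 3
    n_drop_low = len(lower_group_scores) // 3
    # Sort ascending, then cut from the upper end of one group
    # and from the lower end of the other.
    kept_high = sorted(higher_group_scores)[: len(higher_group_scores) - n_drop_high]
    kept_low = sorted(lower_group_scores)[n_drop_low:]
    return kept_high, kept_low
```

With 30 listeners per group, as implied by the t(58) tests and the 20 listeners per group that remain, this removes 10 listeners from each group. In practice the function would be applied to participant records rather than bare scores, so that the retained listeners' d’ and memory scores carry over to the subsequent ANCOVA.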

Using the same factors as the main ANCOVA (but now with memory score as the

covariate), the ANCOVA produced the same significant main effect of group, F(1, 37) = 4.33,

p < .05, ηp² = .11: The Cantonese-English bilingual listeners (average d' = 2.37)

outperformed native English listeners (average d' = 1.80) on English lexical stress

discrimination, even after the groups were matched on non-verbal intelligence. The

ANCOVA also revealed significant two-way interactions between acoustic condition and

group, F(2, 74) = 5.76, p < .01, ηp² = .14, and between context and group, F(2, 74) = 4.46, p

< .05, ηp² = .11. Consistent with our central hypothesis, simple main effects analyses

revealed that Cantonese-English bilingual listeners outperformed native English listeners in

the all-cues condition, F(1, 37) = 9.29, p < .01, ηp² = .20, CI [.38, 1.88], and in the real word

context, F(1, 37) = 6.81, p < .05, ηp² = .16, CI [.18, 1.43], and marginally in the legal context,

F(1, 37) = 3.62, p = .065, ηp² = .09, CI [-.04, 1.10].

Supplementary Analysis of Response Times

To evaluate the possibility that the advantage found for the Cantonese-English

bilingual listeners was due to a speed-accuracy trade-off, we conducted a 3 × 3 × 2 mixed

ANCOVA on response times with acoustic condition (all-cues, pitch-only, and duration-and-

intensity-only) and phonotactic/lexical context (legal pseudoword, illegal pseudoword and

real word) as within-subjects factors, and group (Cantonese-English bilingual listeners and

native English listeners) as a between-subjects factor. Non-verbal intelligence was the

covariate. The main effect of group was not significant, F(1, 57) = .76, p = .387: Cantonese-

English bilingual listeners did not have significantly longer response times than native

English listeners when discriminating English lexical stress.

As shown in Figures S2 and S3, instead of having longer response times, Cantonese-

English bilingual listeners actually had shorter response times than native English listeners

under the all-cues and pitch-only conditions. This was also the case for the illegal and real

word contexts. This pattern produced significant interactions between

acoustic condition and group, F(2, 114) = 4.85, p < .05, ηp² = .08, and between

phonotactic/lexical context and group, F(2, 114) = 6.49, p < .01, ηp² = .10. Simple main

effects analyses confirmed that Cantonese-English bilingual listeners had significantly shorter

response times than native English listeners under the pitch-only condition, F(1, 57) = 4.01, p

= .05, ηp² = .07, CI [-337.75, .02], and marginally shorter response times in the real word

context, F(1, 57) = 3.37, p = .071, ηp² = .06, CI [-311.25, 13.45]. These results demonstrate

that the advantage found for the Cantonese-English bilingual listeners cannot be accounted

for by a speed-accuracy trade-off.

[Figure S2 plot. x-axis: Condition (All, f0, di); y-axis: Response time (ms), 0 to 800; legend: Cantonese-English bilingual listeners, Native English listeners.]

Figure S2. Mean response times for the Cantonese-English bilingual and native English

listeners across all acoustic conditions. All, f0 and di denote all cues, pitch-only, and

duration-and-intensity-only conditions respectively.

[Figure S3 plot. x-axis: Condition (Legal, Illegal, Real word); y-axis: Response time (ms), 0 to 700; legend: Cantonese-English bilingual listeners, Native English listeners.]

Figure S3. Mean response times for the Cantonese-English bilingual and native English

listeners across all phonotactic/lexical contexts.
