template design © 2008 perceptual compensation for /u/-fronting in american english kataoka, reiko...

TEMPLATE DESIGN © 2008

www.PosterPresentations.com

Perceptual compensation for /u/-fronting in American EnglishKATAOKA, Reiko ([email protected])

Department of Linguistics, University of California at Berkeley, 1203 Dwinelle Hall, Berkeley, California 94720-2650

Objectives

Summary and DiscussionMaterials and Methods (cont.) Results: Experiment 2

Citation

Introduction

Listener's identification of speech sounds are influenced by both perceived and expected characteristics due to the influence of surrounding sounds. For example, a vowel ambiguous between /i/ and /e/ is heard more often as /e/ when the precursor sentence has low F1 but it is heard as /i/ when the precursor has high F1 (Ladefoged & Broadbent 1957), and a greater degree of vowel undershoot is perceptually accepted in fast speech than in slow speech (Lindblom & Studdert-Kennedy 1967).

Later, Ohala and Feder (1994) showed that American listeners judge a vowel stimulus which is ambiguous between /i/ and /u/ more frequently as /u/ in alveolar context than in bilabial context, and do so both when the context is heard and when it is “restored”. The current study is an attempt to extend Ohala & Feder’s study with additional measures to reveal the locus of perceptual compensation in the human speech processing system.

Materials and Methods (cont.)

Presentation: • A stimulus was played after a precursor “I guess the word is…”.• Each stimulus was tested 5 times.• Fillers ([CiC], [C C], [CuC], and [C C], each three times) were ɨ ʉ

mixed in the trials to create genuine vowel quality variation. • Test stimulus and fillers were presented at random order. • During a trial, the two alternative words (/CiC/ and /CuC/ forms)

appear on the screen. (See Table 1.) The words on the screen will make subjects to ‘restore’ the onset and coda phoneme even when hearing the stimulus with white noise.

Task : Two-alternative forced-choice. The subjects determined which of the two words on the screen the word just heard was.

Subjects: 31 native speakers of American English (10-M, 21-F)Factors: 1) Consonantal context (3 conditions: Alveolar, Bilabial, Zero)2) Status of context (2 conditions: Acoustic or Restored)

Table 1. Factors tested, stimuli presented (e.g. [dyt]), and two alternatives shown on the screen for each stimulus.

EXPERIMENT 2Methods are same as Experiment 1 except that precursor was

uttered in 3 different speech rates.Subjects: Same as Experiment 1.Factors: Precursor speech rate (3 conditions: fast, medium, slow).

EXPERIMENT 3ProductionSubjects uttered a sentence “Say who again.” 10 times. From the

mid point of the vowel in ‘who’, F2 value was measured. Then, by using each subject’s other vowel productions (/i, , e, æ, a, ɪɔ, , , u/, 10 times), ʌ ʊ normalized mean F2 value for /u/ was obtained as a difference from mean F2 of all the vowels on the log scale (LN).

Perception

Table 3. # of responses by Context and Condition

(5 repetition X 31 listeners = 155 cell total)

Results: Experiment 3

•To replicate Ohala & Feder’s findings and further investigate the context effect on Reaction Time (RT) (Experiment 1)•To test the effect of speech rate on the degree of perceptual compensation (Experiment 2)•To investigate the relationship between speech production and perceptual compensation (Experiment 3)

Materials and Methods

EXPERIMENT 1Stimuli: Re-synthesized syllables [dyt], [byp], and [y] (to be used in Acoustic condition) and another set where white noise replaced the consonants (to be used in Restored condition) were created by:• Iterating single vowel period in the CV transition of ‘dude’ to obtain

a steady vowel (i.e. no formant transition) of 100 ms• Adding amplitude contour for the first and last 15 ms of the vowel• Adding F0 contour: 130 at onset 90 Hz at offset• Adding excised natural stop burst (/b/ or /d/) at vowel onset and

another (/p/ or /t/) 70 ms after the vowel offset, or (Fig. 1, a)

• Adding white noise in the place of natural stop burst (Fig. 1, b)

(a)

(b)

Figure 1. Waveforms of a /dVt/ stimulus used in Acoustic Context condition (a) and Restored Context condition (b).

Stimuli: 3 continua of 10 syllables each, varying by vowel backness (i.e., /dit/ - /dut/, /bip/ - /bup/, and /i/ - /u/ continuum) were created by the following process:

• Create 10 equal-step /i/ - /u/ continuum. The voice source was extracted from a sustained vowel, which was produced by the speaker of a precursor sentence, by applying an inverse filtering of LPC object of the vowel so that the voice of the stimuli matches to the voice of the precursor. All 10 vowels have identical formant structures except for F2 value, which varied from 2,300 Hz (step 1) to 860 Hz (step 10). See Fig. 2.

• Add amplitude contour and F0 contour as in Exp.1.• Add onset and coda consonant bursts as in Exp.1. Presentation and Task: Same as Experiment 1 except that each

stimulus was tested 4 times.Subjects: Same as Experiment 1.Factor: Consonantal context (3 conditions: Alveolar, Bilabial, Zero).

Figure 2. Spectrograms of 10-step /i/ - /u/ continuum. Formant frequency (Hz) for the vowels are: F1=375, F2 = variable, F3 = 2500, F4 = 3500, and F5 = 4500.

Results: Experiment 1

Table 2. # of responses by Context and Condition

(5 repetition X 31 listeners = 155 cell total)

Chi-Square test for association: # of /u/-response and ContextsAcoustic Condition [χ2=3.429 (2), p=0.180]Restored Condition [χ2=0.612 (2), p=0.736]

Figure 3. Reaction Time for /u/ response.

Effect of ContextsRestored: [F=4.5 (1.5, 43.3), p<0.05]Acoustic: [F=3.0 (1.8, 52.9), p=0.06]

FastMedSlow

Rate

Error Bars show Mean +/- 1.0 SE

Bars show Means

Alveolar Bilabial ZeroContext

0

200

400

600

496

589566

523 513

598 589569

498

Figure 4. Reaction Time for /u/ response.

Figure 5. % /u/-response as a function of stimulus number, in three different consonant conditions.

Context

Status Alveolar Bilabial Zero

Acoustic [dyt] [byp] [y]

Restored* [NyN] [NyN] [NyN]

Alternatives ‘deet’ or ‘doot’ ‘beep’ or ‘boop’ ‘ee’ or ‘oo’ Cond. Resp.Context

TotalG. TotalAlveolar Bilabial Zero

Restored /i/ 9 22 16 47462/u/ 145 132 138 415

Acoustic /i/ 9 36 10 55465/u/ 146 119 145 410

Total 309 309 309 927

Restored contextAcoustic context

Condition

Error Bars show Mean +/- 1.0 SE

Bars show Means

523499

624

528

580550

Alveolar Bilabial ZeroContext

0

200

400

600

Cond. Resp.Context

TotalG. TotalAlveolar Bilabial Zero

Fast /i/ 8 52 35 95465/u/ 147 103 120 370

Med. /i/ 7 53 11 71

/u/ 148 101 144 393 464

Slow /i/ 3 26 6 35465/u/ 152 129 149 430

Total 465 464 465 1395

Figure 6. Scatter plot of the perceptual /u/ space against normalized F2 of /u/.

Stimulus #10987654321

% /u/ R

esponse

100

80

60

40

20

0

Error bars: 95% CI

ZeroBilabialAlveolar

Consonant

Normalized F2 of /u/0.00-0.20-0.40-0.60

Perc

eptu

al /u

/ space

51.0

48.0

45.0

42.0

39.0 R Sq Linear = 0.168

back [u] ------------------------------------------- fronted [u]narr

ow

/u/ space ---

----

----

----

--- w

ide /u/ space

Figure 7. Metric for obtaining the perceptual space for the vowel /u/. The space is defined as an area under the % /u/-response function.

Harrington, J., Kleber, F., and Reubold, U. (2008). “Compensation for coarticulation, /u/-fronting, and sound change in standard southern British: An acoustic and perceptual study,” J. Acoust. Soc. Am. 123, 2825-2835.

Ladefoged, P., and Broadbent, D. E. (1957). “Information conveyed by vowels,” J. Acoust. Soc. Am. 29, 98-104.

Lindblom, B., and Studdert-Kennedy, M. (1967). “On the role of formant transitions in vowel recognition,” J. Acoust Soc. Am. 42, 830-843.

Ohala, J. J., and Feder, D. (1994). “Listeners’ identification of speech sounds is influenced by adjacent ‘restored’ phonemes,” Phonetica 51, 111-118.

•Listeners heard an ambiguous stimulus more often as /u/ in Alveolar than in Bilabial or Zero context, and did so both in Acoustic and Restored phoneme conditions (Experiment 1). Also, the /u/ identification function shifted as a function of consonantal context (Experiment 3). These results confirm listener’s ability to perceptually compensate for coarticulatory fronting of /u/ in alveolar contexts by shifting the category boundary. It also seems to show listeners’ ability to utilize mentally stored acoustic images of linguistic contexts to perform compensation.

•Alveolar context shortens the RT for /u/-identification (Experiment 1 & 2). This suggests that the perceptual system not only shifts the perceptual category boundary but also to facilitate perceptual processing for a category that is contrastive to the context.

•Speech rate of precursor influenced vowel identification in Bilabial and Zero context (Experiment 2), suggesting the listener’s ability to perform not only categorical but also gradient compensation. The effect was absent in Alveolar context. This could be due to ceiling effect in this particular experimental setting. This effect needs to be re-examined by using an improved method.

•Mild correlation was found between /u/ identification in perception and F2 measured from /u/ production (Experiment 3). Among those who produced /u/ with relatively low F2 (back /u/), many also accepted fronted /u/s as category members in perceptual vowel categorization task. Although previous studies (e.g. Harrington et. al., 2008) demonstrated robust link between production and perception, language users seem to be much more tolerant to speech variation (e.g. dialects, sociolects, style shifts, etc.) that differs from their own production patterns.

Effect of ContextFast: [F=6.0 (2, 48), p<0.05]Med: [F=9.3 (2, 52), p<0.001]Slow: [F=10.2 (2, 60), p<0.001]

template design © 2008 perceptual compensation for /u/-fronting in american english kataoka, reiko...

Documents

vowel stimulus

fast speech

speech production

precursor speech rate

context effect

alveolar context

bilabial context

effect of speech rate