template design © 2008 perceptual compensation for /u/-fronting in american english kataoka, reiko...
Post on 18-Dec-2015
220 views
TRANSCRIPT
TEMPLATE DESIGN © 2008
www.PosterPresentations.com
Perceptual compensation for /u/-fronting in American EnglishKATAOKA, Reiko ([email protected])
Department of Linguistics, University of California at Berkeley, 1203 Dwinelle Hall, Berkeley, California 94720-2650
Objectives
Summary and DiscussionMaterials and Methods (cont.) Results: Experiment 2
Citation
Introduction
Listener's identification of speech sounds are influenced by both perceived and expected characteristics due to the influence of surrounding sounds. For example, a vowel ambiguous between /i/ and /e/ is heard more often as /e/ when the precursor sentence has low F1 but it is heard as /i/ when the precursor has high F1 (Ladefoged & Broadbent 1957), and a greater degree of vowel undershoot is perceptually accepted in fast speech than in slow speech (Lindblom & Studdert-Kennedy 1967).
Later, Ohala and Feder (1994) showed that American listeners judge a vowel stimulus which is ambiguous between /i/ and /u/ more frequently as /u/ in alveolar context than in bilabial context, and do so both when the context is heard and when it is “restored”. The current study is an attempt to extend Ohala & Feder’s study with additional measures to reveal the locus of perceptual compensation in the human speech processing system.
Materials and Methods (cont.)
Presentation: • A stimulus was played after a precursor “I guess the word is…”.• Each stimulus was tested 5 times.• Fillers ([CiC], [C C], [CuC], and [C C], each three times) were ɨ ʉ
mixed in the trials to create genuine vowel quality variation. • Test stimulus and fillers were presented at random order. • During a trial, the two alternative words (/CiC/ and /CuC/ forms)
appear on the screen. (See Table 1.) The words on the screen will make subjects to ‘restore’ the onset and coda phoneme even when hearing the stimulus with white noise.
Task : Two-alternative forced-choice. The subjects determined which of the two words on the screen the word just heard was.
Subjects: 31 native speakers of American English (10-M, 21-F)Factors: 1) Consonantal context (3 conditions: Alveolar, Bilabial, Zero)2) Status of context (2 conditions: Acoustic or Restored)
Table 1. Factors tested, stimuli presented (e.g. [dyt]), and two alternatives shown on the screen for each stimulus.
EXPERIMENT 2Methods are same as Experiment 1 except that precursor was
uttered in 3 different speech rates.Subjects: Same as Experiment 1.Factors: Precursor speech rate (3 conditions: fast, medium, slow).
EXPERIMENT 3ProductionSubjects uttered a sentence “Say who again.” 10 times. From the
mid point of the vowel in ‘who’, F2 value was measured. Then, by using each subject’s other vowel productions (/i, , e, æ, a, ɪɔ, , , u/, 10 times), ʌ ʊ normalized mean F2 value for /u/ was obtained as a difference from mean F2 of all the vowels on the log scale (LN).
Perception
Table 3. # of responses by Context and Condition
(5 repetition X 31 listeners = 155 cell total)
Results: Experiment 3
•To replicate Ohala & Feder’s findings and further investigate the context effect on Reaction Time (RT) (Experiment 1)•To test the effect of speech rate on the degree of perceptual compensation (Experiment 2)•To investigate the relationship between speech production and perceptual compensation (Experiment 3)
Materials and Methods
EXPERIMENT 1Stimuli: Re-synthesized syllables [dyt], [byp], and [y] (to be used in Acoustic condition) and another set where white noise replaced the consonants (to be used in Restored condition) were created by:• Iterating single vowel period in the CV transition of ‘dude’ to obtain
a steady vowel (i.e. no formant transition) of 100 ms• Adding amplitude contour for the first and last 15 ms of the vowel• Adding F0 contour: 130 at onset 90 Hz at offset• Adding excised natural stop burst (/b/ or /d/) at vowel onset and
another (/p/ or /t/) 70 ms after the vowel offset, or (Fig. 1, a)
• Adding white noise in the place of natural stop burst (Fig. 1, b)
(a)
(b)
Figure 1. Waveforms of a /dVt/ stimulus used in Acoustic Context condition (a) and Restored Context condition (b).
Stimuli: 3 continua of 10 syllables each, varying by vowel backness (i.e., /dit/ - /dut/, /bip/ - /bup/, and /i/ - /u/ continuum) were created by the following process:
• Create 10 equal-step /i/ - /u/ continuum. The voice source was extracted from a sustained vowel, which was produced by the speaker of a precursor sentence, by applying an inverse filtering of LPC object of the vowel so that the voice of the stimuli matches to the voice of the precursor. All 10 vowels have identical formant structures except for F2 value, which varied from 2,300 Hz (step 1) to 860 Hz (step 10). See Fig. 2.
• Add amplitude contour and F0 contour as in Exp.1.• Add onset and coda consonant bursts as in Exp.1. Presentation and Task: Same as Experiment 1 except that each
stimulus was tested 4 times.Subjects: Same as Experiment 1.Factor: Consonantal context (3 conditions: Alveolar, Bilabial, Zero).
Figure 2. Spectrograms of 10-step /i/ - /u/ continuum. Formant frequency (Hz) for the vowels are: F1=375, F2 = variable, F3 = 2500, F4 = 3500, and F5 = 4500.
Results: Experiment 1
Table 2. # of responses by Context and Condition
(5 repetition X 31 listeners = 155 cell total)
Chi-Square test for association: # of /u/-response and ContextsAcoustic Condition [χ2=3.429 (2), p=0.180]Restored Condition [χ2=0.612 (2), p=0.736]
Figure 3. Reaction Time for /u/ response.
Effect of ContextsRestored: [F=4.5 (1.5, 43.3), p<0.05]Acoustic: [F=3.0 (1.8, 52.9), p=0.06]
FastMedSlow
Rate
Error Bars show Mean +/- 1.0 SE
Bars show Means
Alveolar Bilabial ZeroContext
0
200
400
600
496
589566
523 513
598 589569
498
Figure 4. Reaction Time for /u/ response.
Figure 5. % /u/-response as a function of stimulus number, in three different consonant conditions.
Context
Status Alveolar Bilabial Zero
Acoustic [dyt] [byp] [y]
Restored* [NyN] [NyN] [NyN]
Alternatives ‘deet’ or ‘doot’ ‘beep’ or ‘boop’ ‘ee’ or ‘oo’ Cond. Resp.Context
TotalG. TotalAlveolar Bilabial Zero
Restored /i/ 9 22 16 47462/u/ 145 132 138 415
Acoustic /i/ 9 36 10 55465/u/ 146 119 145 410
Total 309 309 309 927
Restored contextAcoustic context
Condition
Error Bars show Mean +/- 1.0 SE
Bars show Means
523499
624
528
580550
Alveolar Bilabial ZeroContext
0
200
400
600
Cond. Resp.Context
TotalG. TotalAlveolar Bilabial Zero
Fast /i/ 8 52 35 95465/u/ 147 103 120 370
Med. /i/ 7 53 11 71
/u/ 148 101 144 393 464
Slow /i/ 3 26 6 35465/u/ 152 129 149 430
Total 465 464 465 1395
Figure 6. Scatter plot of the perceptual /u/ space against normalized F2 of /u/.
Stimulus #10987654321
% /u/ R
esponse
100
80
60
40
20
0
Error bars: 95% CI
ZeroBilabialAlveolar
Consonant
Normalized F2 of /u/0.00-0.20-0.40-0.60
Perc
eptu
al /u
/ space
51.0
48.0
45.0
42.0
39.0 R Sq Linear = 0.168
back [u] ------------------------------------------- fronted [u]narr
ow
/u/ space ---
----
----
----
--- w
ide /u/ space
Figure 7. Metric for obtaining the perceptual space for the vowel /u/. The space is defined as an area under the % /u/-response function.
Harrington, J., Kleber, F., and Reubold, U. (2008). “Compensation for coarticulation, /u/-fronting, and sound change in standard southern British: An acoustic and perceptual study,” J. Acoust. Soc. Am. 123, 2825-2835.
Ladefoged, P., and Broadbent, D. E. (1957). “Information conveyed by vowels,” J. Acoust. Soc. Am. 29, 98-104.
Lindblom, B., and Studdert-Kennedy, M. (1967). “On the role of formant transitions in vowel recognition,” J. Acoust Soc. Am. 42, 830-843.
Ohala, J. J., and Feder, D. (1994). “Listeners’ identification of speech sounds is influenced by adjacent ‘restored’ phonemes,” Phonetica 51, 111-118.
•Listeners heard an ambiguous stimulus more often as /u/ in Alveolar than in Bilabial or Zero context, and did so both in Acoustic and Restored phoneme conditions (Experiment 1). Also, the /u/ identification function shifted as a function of consonantal context (Experiment 3). These results confirm listener’s ability to perceptually compensate for coarticulatory fronting of /u/ in alveolar contexts by shifting the category boundary. It also seems to show listeners’ ability to utilize mentally stored acoustic images of linguistic contexts to perform compensation.
•Alveolar context shortens the RT for /u/-identification (Experiment 1 & 2). This suggests that the perceptual system not only shifts the perceptual category boundary but also to facilitate perceptual processing for a category that is contrastive to the context.
•Speech rate of precursor influenced vowel identification in Bilabial and Zero context (Experiment 2), suggesting the listener’s ability to perform not only categorical but also gradient compensation. The effect was absent in Alveolar context. This could be due to ceiling effect in this particular experimental setting. This effect needs to be re-examined by using an improved method.
•Mild correlation was found between /u/ identification in perception and F2 measured from /u/ production (Experiment 3). Among those who produced /u/ with relatively low F2 (back /u/), many also accepted fronted /u/s as category members in perceptual vowel categorization task. Although previous studies (e.g. Harrington et. al., 2008) demonstrated robust link between production and perception, language users seem to be much more tolerant to speech variation (e.g. dialects, sociolects, style shifts, etc.) that differs from their own production patterns.
Effect of ContextFast: [F=6.0 (2, 48), p<0.05]Med: [F=9.3 (2, 52), p<0.001]Slow: [F=10.2 (2, 60), p<0.001]