Conclusions
Constriction Type does influence AV speech perception when it is visibly distinct
• Constriction is more effective than Articulator in this stimulus context
• critical constriction degree (fricatives) shows the strongest visual influence
Active Articulator had little visual effect
• labials did not have greater effect than linguals
• However, passive articulator differences may account for the strong /v/-/D/ effects
Articulatory Phonology implications for AV speech perception/production research
• gestural parameters may offer better (or additional) guidance than phonetic features
Précis
In audio-visual (AV) speech perception the two modalities convey largely complementary information (V: Place, A: Manner). But place can have low visibility, and manner can be visible. Articulatory Phonology and ecological/direct realist views imply that examining visible vs. audible gestural structure may offer novel insights.
Perceptual effects of active articulator vs. constriction degree were examined in a McGurk task using anterior consonants that differ visibly on both dimensions. Visual impact was greatest for incongruent A-V signals that used different articulators but the same constriction type, and was stronger for fricatives than for stops/glides, yet there was no articulator effect. Thus, constriction affects AV perception, more so than active articulator, in identification of visually distinct anterior consonants.
Background
Audio-visual (AV) speech perception shows modality-specific contributions (MacDonald & McGurk, 1978; VPAM: Summerfield, 1987):
• Audio provides primarily manner information
• Visual provides place of articulation information
Yet some qualifications apply to those assumptions: place and manner are imperfectly related to visibility:
• place (POA) visibility varies
  • labials vs. non-labials
  • also some visibility for some coronals
  • face dynamics re: other POA info (below)
• manner also varies: stops - fricatives - glides
• unclear how narrowly to define POA, e.g. /b v/
  • SAME: labial (broad transcription)
  • DIFFERENT: labiodental vs. bilabial (narrow transcription)
dynamic visual speech information is distributed across the talking face/head (Yehia et al., 1998)
• correlates with tongue as well as lip and jaw movements
• this info can guide intelligible audio synthesis
Articulatory Phonology (Browman & Goldstein, 1992, 2000) suggests an alternative: A-V perception re: articulatory gestures (cf. Fowler & Dekle, 1991)
• Active articulator: lower lip vs. tongue tip/blade
• Constriction degree: closed - critical - narrow
Results: Experiment 1
Gestural Incongruity Type main effect, p < .0001
Visual influence was strongest when A and V tokens differed in Articulator but shared the same Constriction degree
Gestural Incongruity x Articulator, p = .0084
The preceding effect was more pronounced when the video token used the lips rather than the tongue tip
Results cont’d: Experiment 1
Constriction Type main effect, p = .0001
Visual influence on perception was greater for fricatives than stops
Gestural Incongruity x Constriction, p < .0001
/v/-/D/ pairs showed the strongest visual effect, followed by a video stop paired with an opposite-articulator fricative
Articulator x Constriction, p < .017
Both fricatives had strong visual effects, but labial stop > lingual stop
Experiment 2
Gestural Incongruity Type main effect, p < .0001
Replicated that of Experiment 1
Constriction Type main effect, p = .0001
Again, fricatives had a larger visual effect than stops; in Exp. 2 they also exceeded glides
Results cont’d: Experiment 2
Gestural Incongruity x Articulator, p < .053 (marginal)
Largest visual effects for /v/-/D/; /b/-/d/ pairs and video fricative + audio stop/glide yielded the next-largest visual effect
Gestural Incongruity x Constriction, p < .0001
Replication/extension of the Exp. 1 interaction effect. /v/-/D/ showed the strongest effect by far. Video glides paired with an opposite-articulator stop/fricative were next-strongest.
AVSP’05
Influences of Visible Place Versus Manner Distinctions on Perception of Audio-Visual English CV Syllables
Catherine T. Best & Daniel Lazarek
Research Question: How do visible distinctions in active articulator and constriction degree contribute to AV speech perception?
[Figure: Incongruent Pair x Video Articulator. y-axis: VSI Score (0 to 1); x-axis: lips, tongue tip]
[Figure: Incongruent Pair x Video Constriction. y-axis: VSI Score (0 to 1); x-axis: fricatives, stops]
[Figure: Incongruent Pair Type. y-axis: VSI Score (0 to 1); categories: Same Articulator, Different Constriction; Different Articulator, Same Constriction; Different Articulator, Different Constriction]
[Figure: Video Constriction Type. y-axis: Visual Perception Index Score (0 to 1); x-axis: stops, fricatives, glides]
[Figure: Incongruent Pair x Video Articulator. y-axis: VSI Score (0 to 1); x-axis: lips, tongue tip; legend: Same Articulator, Different Constriction; Different Articulator, Same Constriction; Different Articulator, Different Constriction]
[Figure: Incongruent Pairs x Video Constriction. y-axis: VSI Score (0 to 1); x-axis: fricatives, stops, glides]
Method
Stimuli: anterior Cs (USA English) that differ visibly re:
Active Articulator
• lower lip
• tongue tip/blade
Constriction
• closed (stop)
• critical (fricative)
• narrow (glide) (included only in Exp. 2)
Subjects: English (USA): Exp 1 (n = 14), Exp 2 (n = 12)
Task: report C heard: AV-congruent & AV-incongruent
Data: Visual Speech Index (VSI), calculated on proportion correct audio identifications
VSI = [AVcongruent - AVincongruent]
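The VSI calculation above can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis code: the function names and the response lists are hypothetical, and it assumes a trial counts as correct when the reported consonant matches the audio consonant.

```python
from typing import Sequence


def proportion_correct(responses: Sequence[str], audio_targets: Sequence[str]) -> float:
    """Proportion of trials on which the reported consonant matches the audio consonant."""
    if len(responses) != len(audio_targets) or len(responses) == 0:
        raise ValueError("responses and audio_targets must be non-empty and equal in length")
    hits = sum(r == t for r, t in zip(responses, audio_targets))
    return hits / len(responses)


def visual_speech_index(p_congruent: float, p_incongruent: float) -> float:
    """VSI = proportion correct (AV-congruent) minus proportion correct (AV-incongruent)."""
    return p_congruent - p_incongruent


# Made-up responses for one hypothetical subject, audio /b/ on every trial.
audio = ["b", "b", "b", "b"]
congruent_responses = ["b", "b", "b", "b"]    # video also /b/
incongruent_responses = ["b", "d", "d", "d"]  # video shows a different consonant

p_con = proportion_correct(congruent_responses, audio)    # 1.0
p_inc = proportion_correct(incongruent_responses, audio)  # 0.25
vsi = visual_speech_index(p_con, p_inc)                   # 0.75
```

A larger VSI means that the incongruent video pulled more responses away from the audio consonant, that is, a stronger visual influence.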
                       lip    tongue
closed                 /b/    /d/
critical               /v/    /D/
narrow (Exp 2 only)    /w/    /j/
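The stimulus grid above assigns each consonant one active articulator and one constriction degree, so the three incongruent pair types named in the results can be derived mechanically. The sketch below is one possible encoding, offered as an illustration only: the `GESTURES` dictionary and the `incongruity_type` function are hypothetical names, and the articulator and constriction labels simply follow the grid.

```python
# Gestural parameters per consonant, copied from the stimulus grid:
# (active articulator, constriction degree). /w/ and /j/ occur in Exp. 2 only.
GESTURES = {
    "b": ("lip", "closed"),
    "d": ("tongue", "closed"),
    "v": ("lip", "critical"),
    "D": ("tongue", "critical"),
    "w": ("lip", "narrow"),
    "j": ("tongue", "narrow"),
}


def incongruity_type(audio: str, video: str) -> str:
    """Classify an audio/video consonant pair by which gestural parameters differ."""
    audio_articulator, audio_constriction = GESTURES[audio]
    video_articulator, video_constriction = GESTURES[video]
    same_articulator = audio_articulator == video_articulator
    same_constriction = audio_constriction == video_constriction
    if same_articulator and same_constriction:
        return "Congruent"
    if same_articulator:
        return "Same Articulator, Different Constriction"
    if same_constriction:
        return "Different Articulator, Same Constriction"
    return "Different Articulator, Different Constriction"


# The /v/-/D/ pair differs in articulator only.
pair_label = incongruity_type("v", "D")
```

Under this encoding, /v/-/D/ and /b/-/d/ both fall in the "Different Articulator, Same Constriction" category, while /b/-/v/ is "Same Articulator, Different Constriction".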
[Figure: Video Articulator x Video Manner. y-axis: VSI Score (0 to 1); x-axis (video): /b/ (lips, stop), /d/ (tongue tip, stop), /v/ (lips, fricative), /D/ (tongue tip, fricative)]