cross-modal prediction in speech perception carolina sánchez, agnès alsius, james t. enns &...

Cross-modal Prediction in Speech Perception

Carolina Sánchez, Agnès Alsius, James T. Enns & Salvador Soto-Faraco

Multisensory Research Group

Universitat Pompeu Fabra

Barcelona

Background

Multisensory integration: visual + auditory information improves speech perception

[Figure: auditory + visual performance; MSI (multisensory integration) enhancement]

Background

• Prediction within one sensory modality

• Many levels of information processing

– Phonological prediction: “This morning I went to the library and borrowed a … book” (DeLong, 2005; Pickering, 2007)

– Visual prediction: Visual search (Enns, 2008; Dambacher, 2009)

– Sensorimotor prediction: forward model (Wolpert, 1997)

Predictive coding (Pickering, 2007)

Hypothesis

• If prediction exists within a single modality, and if predictive coding models can account for prediction at a phonological level, then predictive coding could occur across different sensory modalities too.

Indirect evidence of cross-modal transfer in speech

van Wassenhove et al., 2005

ERPs:

• Amplitude reduction

• Shortened latency

/pa/: high visual saliency

/ka/: low visual saliency

Our study

• Visual prediction

• Auditory prediction

• Visual-to-auditory cross-modal prediction

• Auditory-to-visual cross-modal prediction

Visual prediction

Each trial: a context fragment followed by a target fragment, each with a visual stream (V) and an auditory stream (A)

• With informative visual context

• Without informative context

Context fragment: speech vs. non-speech

Task: AV match vs. AV mismatch

Results

[Figure: reaction time (msec) for AV match and AV mismatch targets, with informative visual context vs. without informative context]

* With previous context, participants responded faster than without it.

VISUAL PREDICTION

Auditory prediction

Each trial: a context fragment followed by a target fragment, each with a visual stream (V) and an auditory stream (A)

• With informative auditory context

Context fragment: speech vs. non-speech

Task: AV match vs. AV mismatch

Results

[Figure: reaction time (msec) for AV match and AV mismatch targets, with informative auditory context]

* With previous context, participants responded faster than without it.

AUDITORY PREDICTION

Visual vs. auditory prediction

[Figure, two panels of RTs (msec) for congruent and incongruent targets. Visual prediction: with informative visual context vs. without informative context (*). Auditory prediction: with informative auditory context (*).]

Conclusions

• Visual prediction

• Auditory prediction

Is this prediction cross-modal?

Predictability of vision-to-audition: design of the experiment

Each trial: visual stream (V) and auditory stream (A)

• Unimodal continued: AV match / AV mismatch

• Discontinued: AV match / AV mismatch

• Cross-modal continued: AV mismatch
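The trial types above can be sketched as a small data structure. This is a hypothetical encoding based on our reading of the design slides, not the authors' materials. We assume that "continued" means that a target stream carries the continuation of the context sentence. We also assume that in the unimodal continued match trials both target streams therefore continue it. Under that reading, the same five conditions cover the vision-to-audition and the audition-to-vision designs. Only the modality of the context changes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List, Tuple


class Modality(Enum):
    VISUAL = "visual"
    AUDITORY = "auditory"

    def other(self) -> "Modality":
        """Return the opposite modality."""
        return Modality.AUDITORY if self is Modality.VISUAL else Modality.VISUAL


@dataclass(frozen=True)
class Condition:
    name: str
    av_match: bool                  # do the target's V and A streams show the same utterance?
    continued: FrozenSet[Modality]  # target streams that continue the context sentence (assumed reading)


def conditions_for(context: Modality) -> List[Condition]:
    """Hypothetical encoding of the five trial types for a unimodal context fragment.

    `context` is the modality that carried the informative context:
    VISUAL for the vision-to-audition design, AUDITORY for audition-to-vision.
    """
    both = frozenset(Modality)  # iterating an Enum class yields its members
    return [
        Condition("unimodal continued", True, both),
        Condition("unimodal continued", False, frozenset({context})),
        Condition("discontinued", True, frozenset()),
        Condition("discontinued", False, frozenset()),
        Condition("cross-modal continued", False, frozenset({context.other()})),
    ]


def key_comparison(context: Modality) -> Tuple[Condition, Condition]:
    """Return the (cross-modal continued, discontinued mismatch) pair whose RT difference indexes cross-modal prediction."""
    conds = conditions_for(context)
    cross = next(c for c in conds if c.name == "cross-modal continued")
    disc = next(c for c in conds if c.name == "discontinued" and not c.av_match)
    return cross, disc
```

With a visual context, `key_comparison(Modality.VISUAL)` pairs a mismatch trial whose auditory target continues the context with a mismatch trial in which nothing continues it. A speed advantage for the first would then reflect prediction from vision to audition.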

Predictability of vision-to-audition: stimuli

[Figure: V and A streams in the three AV mismatch conditions: unimodal continued, discontinued, cross-modal continued]

Results

Participants were faster in the cross-modal continued condition than in the completely incongruent (discontinued) condition.

VISUAL-TO-AUDITORY PREDICTION

[Figure: reaction time (msec) in the unimodal continued, discontinued and cross-modal continued conditions; legend: Visual, Auditory; * marks the reported difference]
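The RT comparisons in these slides, such as cross-modal continued vs. discontinued, are within-participant contrasts. Below is a minimal sketch of one way to run such a contrast. It assumes a paired t-test over per-participant mean RTs computed from correct trials only. The trial tuple format, the condition labels and the choice of test are our assumptions, not necessarily the authors' analysis.

```python
import math
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, Iterable, Tuple

# Hypothetical trial format: (participant id, condition label, RT in msec, response correct?)
Trial = Tuple[str, str, float, bool]


def participant_means(trials: Iterable[Trial], condition: str) -> Dict[str, float]:
    """Mean RT per participant for one condition, using correct trials only (an assumption)."""
    rts = defaultdict(list)
    for participant, cond, rt_ms, correct in trials:
        if cond == condition and correct:
            rts[participant].append(rt_ms)
    return {p: mean(values) for p, values in rts.items()}


def paired_t(trials: Iterable[Trial], cond_a: str, cond_b: str) -> Tuple[float, float, int]:
    """Paired t statistic for cond_a minus cond_b over participants with data in both.

    Returns (mean difference in msec, t, degrees of freedom).
    """
    trials = list(trials)  # allow a one-shot iterator to be read twice
    a = participant_means(trials, cond_a)
    b = participant_means(trials, cond_b)
    shared = sorted(a.keys() & b.keys())
    if len(shared) < 2:
        raise ValueError("need at least two participants with data in both conditions")
    diffs = [a[p] - b[p] for p in shared]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    d_mean = mean(diffs)
    t = d_mean / (sd / math.sqrt(len(diffs)))
    return d_mean, t, len(diffs) - 1
```

A negative mean difference for `paired_t(trials, "cross-modal continued", "discontinued")` would correspond to faster responses in the cross-modal continued condition. Averaging within each participant first keeps the test paired at the level of participants rather than trials.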

Predictability of audition-to-vision: design of the experiment

Each trial: auditory stream (A) and visual stream (V)

• Unimodal continued: AV match / AV mismatch

• Discontinued: AV match / AV mismatch

• Cross-modal continued: AV mismatch

[Figure: reaction time (msec) in the AV mismatch conditions; legend: Visual, Auditory]

Results

We found no difference between the mismatch conditions.

NO AUDITORY-TO-VISUAL PREDICTION

Conclusions

• Some form of prediction from the visual to the auditory modality

• No prediction from the auditory to the visual modality

Does this prediction depend on the language?

Results (L1)

VISUAL-TO-AUDITORY PREDICTION IN NATIVE LANGUAGE

[Figure, Canadian participants with English sentences: reaction time (msec) in the AV mismatch conditions; legend: Visual, Auditory; * marks the reported difference]

[Figure, Spanish participants with Spanish sentences: same layout; * marks the reported difference]

[Figure, Canadian participants with English sentences: reaction time (msec) in the AV mismatch conditions; legend: Visual, Auditory]

[Figure, Spanish participants with Spanish sentences: same layout]

No differences between the mismatch conditions

No prediction from the auditory to the visual modality in the native language

Conclusions

• Some form of prediction from the visual to the auditory modality in L1

• No prediction from the auditory to the visual modality in L1

What happens with an unknown language?

Unknown language: visual to auditory

Canadian participants with Spanish sentences

NO VISUAL-TO-AUDITORY PREDICTION IN ANOTHER LANGUAGE

[Figure: reaction time (msec) in the AV mismatch conditions; legend: Visual, Auditory]


Unknown language: auditory to visual

[Figure, Spanish participants with English sentences: reaction time (msec) in the AV mismatch conditions; legend: Visual, Auditory]

[Figure, Canadian participants with Spanish sentences: same layout]

No differences between the mismatch conditions

No prediction from the auditory to the visual modality in another language


Conclusions

• No visual-to-auditory cross-modal prediction in an unknown language: some knowledge of the articulatory phonetics of the language seems to be required to benefit from predictive coding

• No auditory-to-visual cross-modal prediction

General Conclusions

• Unimodal prediction: from visual to visual and from auditory to auditory

• L1: ASYMMETRY
– Cross-modal prediction from the visual to the auditory modality
– No cross-modal prediction from the auditory to the visual modality

• Unknown language: previous knowledge of the language is necessary to make the prediction
– No cross-modal prediction from the visual to the auditory modality
– No cross-modal prediction from the auditory to the visual modality

Thanks to…

- Agnès Alsius, Postdoc, Queen’s University

- Antonia Najas, MA, Research Assistant, Universitat Pompeu Fabra

- Phil Jaekl, Postdoc, Universitat Pompeu Fabra

- All the people of the Vision Lab, UBC, Vancouver

Thanks for your attention!!
