
The role of pitch range and facial gestures in conveying prosodic meaning

JOAN BORRÀS-COMES Universitat Pompeu Fabra

Ph.D. Project

Supervisor: Dr. Pilar Prieto


Index

Abstract
Resum
0. General introduction
1. The role of pitch range in establishing intonational contrasts in Catalan
1.1 Introduction
1.2 Methodology
1.3 Results
2. Encoding of intonational contrasts as revealed by MMN
2.1 Introduction
2.2 Methodology
2.3 Results
3. Visual contribution to tune interpretation
3.1 Introduction
3.2 Methodology
3.3 Results
4. Disentangling facial gestures in audiovisual speech perception
References
Appendix 1. Commented references
Appendix 2. Work schedule


Abstract

Since the unexpected McGurk effect was first reported in 1976, studies on audiovisual communication have shown that the visual component plays a clear role in the perception of various aspects of communication typically associated with verbal prosody (McNeill 2005). Audiovisual cues for prosodic functions such as prominence, focus, and question intonation have been successfully investigated, and most of this work has described a correlated mode of processing, whereby vision partially duplicates acoustic information and helps in the decoding process (Dohen 2009). However, these studies have found a weak visual effect relative to a robustly strong auditory effect (Srinivasan & Massaro 2003).

The main goal of this thesis is to investigate the contribution of visual and tonal scaling cues, especially when acoustic information is ambiguous. We hold that there can be a complementary mode of processing, whereby vision provides information more efficiently than hearing. In Catalan, the nuclear intonation pattern of a statement (L+H* L%, i.e., a rising pitch accent followed by a low boundary tone) contrasts with that of an echo question (L+¡H* L%, realized with an upstepped pitch accent). However, this L+¡H* L% nuclear configuration can be used to express not only echo questions but also contrastive foci.

The research described here consists of three separate studies. The first study tested participants' identification of these three pragmatic meanings across an auditory continuum. Results highlighted the difference in tonal scaling between statements and the other two meanings. Moreover, the study revealed a categorical contrast between statements and questions and between contrastive foci and questions, but only a gradient difference between statements and contrastive foci. Study 2 was an event-related brain potentials (ERP) study which confirmed the categorical difference found in Study 1. A Mismatch Negativity (MMN) was found in each condition but, crucially, a greater MMN was observed for the contrast implying a phonological-categorical change, which suggests that intonational contrasts are encoded automatically in the auditory cortex.

Study 3 investigated the contribution of visual and acoustic cues in perceiving the (auditorily ambiguous) contrast between contrastive foci and echo questions. Results highlighted the crucial role of visual information in disambiguating the two meanings and also showed an auditory effect; reaction time measurements showed that the two factors interact significantly.

A fourth study currently underway will analyze which gestural elements guide listeners' interpretations through the use of computer-generated 3D avatars, in which each facial gesture can be manipulated separately. An additional gating experiment will investigate the temporal patterns of visual and auditory processing.


Resum

Since the unexpected McGurk effect was first reported in 1976, studies of audiovisual communication have highlighted the role of vision in the perception of various communicative aspects previously associated with verbal prosody (McNeill 2005). The audiovisual correlates of prosodic variables such as prominence and interrogativity have been investigated, and most of this work has described a correlated mode of processing, in which vision partially duplicates the acoustic information and helps in decoding (Dohen 2009). However, these studies have also stressed the weakness of the visual effects compared with the acoustic ones (Srinivasan & Massaro 2003).

The main goal of the thesis is to investigate the contribution of visual and tonal variables to the processing of prosodic information, especially when the acoustic information is ambiguous. Our hypothesis is that there is a complementary mode of processing, in which vision provides more information than audition. In Catalan, the nuclear intonation pattern of a declarative sentence (L+H* L%; a rising pitch accent followed by a low boundary tone) contrasts with that of an echo question (L+¡H* L%, realized with an upstepped pitch accent). However, this L+¡H* L% configuration can be used to express both contrastive foci and echo questions.

Study 1 assesses how participants identify these three pragmatic meanings across an auditory continuum. The results highlight the difference in pitch height between declaratives and the other two meanings. In addition, a categorical contrast appears between declaratives and interrogatives, and also between foci and interrogatives, but only a gradient difference between declaratives and foci. Study 2 confirms the categorical difference found between declaratives and interrogatives with an event-related brain potentials (ERP) experiment. The mismatch negativity (MMN) appears in every condition, but it is crucially larger when there is a change of category, which suggests that intonational contrasts may be encoded automatically in the auditory cortex.

Study 3 investigates the contribution of visual and acoustic variables to the perception of the (auditorily ambiguous) contrast between foci and interrogatives. The results indicate the crucial role of visual information in discriminating the two meanings and also show an auditory effect; reaction time measurements show that the two factors interact.

Study 4 will analyze which individual gestural elements guide listeners' interpretations. It will do so through computer-generated 3D avatars in which each facial gesture will be manipulated separately. An additional gating experiment will examine the temporal patterns of visual and auditory processing.


0. General introduction

Since the discovery of the unexpected McGurk effect (McGurk & MacDonald 1976), in which an auditory /ba/ combined with a visual /ga/ results in a /da/ percept, the strong influence of vision on speech perception in normal verbal communication has increasingly been recognized and replicated. It is almost impossible for people to talk naturally without gesturing. Moreover, studies on audiovisual communication have attested that gestures are framed by speech and that together they form a fully integrated system. It is only by looking at both gesture and speech that we can predict how people learn, remember, and solve problems (Goldin-Meadow 2005).

Recent studies on audiovisual speech have revealed that the visual component actually has a clear role in the perception of various aspects of communication typically associated with verbal prosody. The visual correlates of prominence and focus (movements such as eyebrow flashes, head nods, and beat gestures) boost the perception of focus and prominence (Cavé et al. 1996, Hadar et al. 1983, Krahmer & Swerts 2007, Swerts & Krahmer 2008, Dohen & Lœvenbruck 2009). Similarly, audiovisual cues for prosodic functions such as face-to-face grounding (Nakano et al. 2003) and question intonation (Srinivasan & Massaro 2003) have been successfully investigated, as have the audiovisual expressions of affective meanings such as uncertainty (Krahmer & Swerts 2005) and frustration (Barkhuysen et al. 2005).

Since gesture conveys information visually, gestures and the synchronous speech that accompanies them are assumed to be coexpressive but not redundant: gesture allows speakers to convey thoughts that may not easily fit into the categorical system that their conventional language offers (McNeill 1992/2005, Clark 1996, Kendon 1980, Goldin-Meadow et al. 1993, Goldin-Meadow & McNeill 1999). Most of the work on audiovisual prosody has described a correlated mode of processing, whereby vision partially duplicates acoustic information and helps in the decoding process. For example, it is well known that in noisy environments visual information provides a powerful assist in decoding speech, particularly for the hearing impaired (Sumby & Pollack 1954, Breeuer & Plomp 1984, Massaro 1987, Summerfield 1992, Grant & Walden 1996, Grant et al. 1998, Assmann & Summerfield 2004).

The majority of studies have found a weak visual effect relative to a robustly strong auditory effect. Dohen & Lœvenbruck (2009) showed that not only segmental but also suprasegmental perception of speech is multimodal (see Dohen 2009). They analyzed the production and perception of contrastive informational focus in French. Several production studies had measured the articulatory and facial correlates of contrastive prosodic focus in this language (Dohen et al. 2004, Dohen & Lœvenbruck 2005, Dohen et al. 2006), revealing that there are visible correlates of contrastive focus. In line with other studies (e.g., Krahmer & Swerts 2006 and Granström & House 2005, using animated talking heads), they found that prosodic contrastive focus was detectable from the visual modality alone and that the cues used for perception at least partly corresponded to those identified in the production studies (Dohen et al. 2004, Dohen & Lœvenbruck 2005).

Dohen & Lœvenbruck (2009) designed an experiment to avoid the ceiling effect observed when auditory perception of prosodic focus was close to 100% and a potential advantage of adding vision could not be measured. The speech-in-noise paradigm was not adequate here, since voicing is a robust feature in noisy environments, so the authors used whispered speech, in which there is no F0. They found that auditory-only perception was degraded and that adding vision clearly improved prosodic focus detection for whispered speech. Reaction time measurements showed that adding vision also reduced processing time. Further analyses of the data suggested that audition and vision are actually integrated for the perception of prosodic contrastive focus in French. Even in those cases in which the auditory information alone is sufficient for adequate perception (e.g., of prosodic focus), the duration of the underlying cognitive operations seems to be reduced when vision is added to audition (Dohen & Lœvenbruck 2009).

Srinivasan & Massaro (2003) showed that statements and questions are discriminated auditorily (on the basis of the F0 contour, amplitude, and duration) and visually (on the basis of eyebrow raising and head tilting). However, they found a much larger influence of the auditory cues than of the visual cues in this judgment. Their results are consistent with those reported for Swedish by House (2002), who found that visual cues such as eyebrow movement and slow vertical head tilting did not strongly signal interrogative intonation. Nevertheless, Srinivasan & Massaro (2003) also point out that the extended length of their sentences could have been responsible for non-optimal integration. Thus, they note that the use of a shorter test stimulus (e.g., "Sunny." / "Sunny?") might engage an optimal bimodal integration process, making statement/question identification a more automatic perceptual task and less of a cognitive decision-making process.


Our hypothesis is that a complementary mode of processing, whereby vision provides information more efficiently than hearing, is also possible, particularly for more ambiguous or underspecified parts of the speech stream.

The main goal of this thesis is to investigate the contribution of visual and tonal scaling cues to conveying prosodic meaning in Catalan, especially in cases where the acoustic information is ambiguous. We will first focus on the contribution of visual cues to expressing the potential difference between an echo question and a narrow contrastive focus, a case in which the auditory information is ambiguous for the listener.

In several Romance languages, there is a contrast between the nuclear intonation pattern of a statement, realized as a rising pitch accent followed by a low boundary tone (L+H* L%), and that of an echo question, realized with an upstepped rising pitch accent (L+¡H* L%) (Savino & Grice 2007 for Bari Italian); see Figure 1 below. In Study 1 we conducted a behavioral (identification) task with these auditory stimuli. Our results demonstrated that Catalan speakers perceive the two ends of the continuum as discrete categories (Borràs-Comes et al. 2010). Moreover, in Catalan, contrastive focus intonation is also expressed through L+¡H* L%, and therefore this particular nuclear configuration can have both an echo question reading and a contrastive focus reading. The results of this study are presented in Section 1.

After the pitch range contrast was tested with a series of behavioral tasks (Study 1), we undertook a study involving the measurement of event-related brain potentials (ERP) in cooperation with Carles Escera's group at the Universitat de Barcelona (Study 2; Borràs-Comes et al. 2009). We tested whether the intonational contrast between statements and echo questions elicits a specific Mismatch Negativity (MMN). Our results showed that the contrastive pitch range difference plays a decisive role in the elicitation of the MMN auditory evoked response (see Näätänen et al. 1997 for a segmental contrast). We selected four stimuli from an auditory continuum (steps 0, 5, 10, and 15), with the same physical distance between adjacent pairs. Two of these pairs involved an allophonic difference (0-5 and 10-15) and one a categorical difference (5-10). Each test pair constituted an oddball block, in which the lower-pitched stimulus acted as the standard (STD) and the higher-pitched one as the deviant (DEV). Statistical analyses revealed an MMN in each condition but, crucially, the contrast that implied a phonological-categorical change elicited a greater MMN amplitude, which suggests that intonational contrasts in the target language can be encoded automatically in the auditory cortex. The results of this study are presented in Section 2.

In Study 3, we conducted a semantically motivated identification task dealing with the contrast between contrastive foci and echo questions. In order to investigate the contribution of visual and acoustic cues to this contrast, we used both visual and auditory input. The auditory stimuli consisted of a continuum created by modifying the F0 peak height of the noun phrase petita [pə.'ti.tə] ('little', fem.) in 6 steps (1.2 semitones between adjacent steps). The visual stimuli consisted of two videotaped sequences representing the typical facial gestures used when producing the two F0 contours. An important difference between these two gestures is eyebrow configuration (see Fig. 11). An identification task was carried out in which auditory and visual stimuli were crossed. The results of this task highlighted the crucial role of the visual information in disambiguating the two meanings (F (1175, 1) = 77.000, p < .001) and also showed an auditory effect (F (1175, 5) = 77.000, p < .001). Reaction times did not differ significantly across video stimuli or auditory stimuli but, crucially, we found that the two factors interacted significantly (F (1175, 5) = 1.716, p = .005): shorter RTs were observed when low tones were matched with the contrastive focus movie and high tones with the echo question movie. Thus Study 3 demonstrates a previously unrecognized phenomenon in the phonology of intonation, namely, that the perception of the speaker's facial expression is the key factor in disambiguating two otherwise ambiguous F0 patterns. These results are presented in Section 3.

In a projected fourth study, we will analyze which gestural elements are responsible for guiding the listener to one interpretation or another. This study will be undertaken together with the research groups led by Josep Blat and Núria Sebastián-Gallés at the Universitat Pompeu Fabra. The auditory stimuli for this study will be a continuum created by modifying the F0 peak height of the noun phrase Marina [mə.'ɾi.nə] (proper name) in 4 steps (2 semitones between adjacent steps). The visual stimuli will consist of several computer-generated 3D avatars in which each facial gesture, namely eyebrow position, eyelid closure, and head movement, will be manipulated separately. Each gesture element will appear in four degrees between its typical configuration in echo question productions and its typical configuration in corrective contrastive focus productions (64 possible visual conditions). Finally, a gating experiment is also being conducted to investigate the temporal patterns of visual and auditory processing.


1. The role of pitch range in establishing intonational contrasts in Catalan

1.1 Introduction

In Catalan, the same rising nuclear pitch accent L+H* is used in three different sentence types, namely statements, contrastive foci, and echo questions. Since the peak height of the rising pitch accent seems to indicate sentence type, we hypothesized that these three pragmatic meanings would be differentiated by pitch accent range.

(1) a. — Com la vols, la cullera? 'What type of spoon do you want?'
       — Petita[, sisplau]. '[I want a] small [spoon, please].'
    b. — Volies una cullera gran, no? 'You wanted a big spoon, didn't you?'
       — PETITA[, la vull, i no gran]. '[I want a] little [one, and not a big one].'
    c. — Jo la vull petita, la cullera. 'I want a little spoon.'
       — Petita?[, n'estàs segur?] '[A] little [one]? [Are you sure?]'

In early analyses of the Interactive Atlas of Catalan Intonation (Prieto & Cabré 2008; see also Prieto, in press), Catalan dialectal data showed that the intonation contours of these three sentence types typically differ in their pitch accent height. While the rising pitch accent of statements is produced with a narrow pitch range (Fig. 1, left panel), that of echo questions is produced with a much wider pitch range (Fig. 1, right panel).

Figure 1. Waveforms, F0 contours, and Cat_ToBI transcription of the utterance La petita 'The small one' produced with a statement meaning (left panel) and an echo question meaning (right panel).

In accordance with these data, our initial hypothesis was that these three sentence types might be distributed over three well-differentiated areas of the pitch range (see Fig. 2).


Figure 2. Idealized intonational contours for the neutral statement, corrective contrastive focus, and echo question interpretations.

1.2. Methodology

Twenty native speakers participated in two semantically motivated identification tasks. Experiment 1 (a congruency test) analyzed participants' acceptance of each stimulus occurring within each of the three communicative contexts. In Experiment 2 the participants had to identify which of the three meanings each isolated stimulus conveyed.

The stimuli for these tasks (see Fig. 3) formed a continuum created by modifying the F0 peak height of the noun phrase petita [pə.'ti.tə] ('little', fem.) in 11 steps (1.2 semitones between adjacent steps). Natural productions of the two extreme contours (echo question and statement) were read by a male native speaker of Catalan, and these utterances served as the source utterances for our stimuli. The speech manipulation was performed with Praat (Boersma & Weenink 2008). The original noun phrase was pronounced with a rising-falling contour, L+H* L%. The rising movement was realised as a 100 ms high plateau starting 30 ms after the onset of the accented syllable /'ti/, and was preceded by a low plateau for the syllable [pə] (102.4 Hz, 100 ms). The posttonic syllable [tə] was realized with a low plateau (94.5 Hz, 180 ms). The peak height continuum ranged from 105.3 Hz to 208.7 Hz.

Figure 3. Schematic contour of the pitch manipulation.
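As an illustration of how such a scaling continuum can be computed, the following Python sketch (a hypothetical reconstruction, not the script actually used) derives the peak frequency of each step from the lowest peak value and the 1.2-semitone step size reported above.

```python
# Sketch: compute the F0 peak values of an 11-step continuum spaced
# 1.2 semitones apart (assumed construction; the actual stimuli were
# resynthesized with Praat).

BASE_PEAK_HZ = 105.3      # lowest peak of the continuum (statement end)
STEP_SEMITONES = 1.2      # distance between adjacent steps
N_STEPS = 11

def semitones_to_ratio(st: float) -> float:
    """Frequency ratio corresponding to a distance in semitones."""
    return 2.0 ** (st / 12.0)

continuum = [BASE_PEAK_HZ * semitones_to_ratio(i * STEP_SEMITONES)
             for i in range(N_STEPS)]

for i, hz in enumerate(continuum):
    print(f"step {i:2d}: peak = {hz:6.1f} Hz")
# Step 10 comes out at ~210.6 Hz, close to the 208.7 Hz ceiling reported
# for the recorded stimuli.
```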


For Experiment 1 we recorded the communicative contexts shown in (1) as produced by a female native speaker of Catalan. The experiment was set up in the psychology software E-Prime, version 2.0 (Psychology Software Tools 2009), with which response frequencies and RTs were automatically recorded.

1.3. Results

Figure 4 shows the results of Experiment 1. A one-way ANOVA revealed a main effect of linguistic context on sentence interpretation (F (3582, 2) = 16.579, p < .001). Tukey HSD post-hoc tests revealed significant differences between the Statement and Question contexts (p < .001) and between the Correction and Question contexts (p < .001), but no significant difference between Statement and Correction (p = .549).

Figure 5 shows the results of Experiment 2. Identification responses as Statement have their statistical mode (n = 82) at stimulus 1, Correction at stimulus 4 (n = 64), and Question at stimulus 10 (n = 116). We compared each combination of possible responses, and in all cases p < .001. However, Tukey HSD post-hoc tests comparing the responses given for each stimulus revealed a clear difference in the number of significant pairwise differences between stimuli found in each comparison: Statement vs. Question (n = 44), Correction vs. Question (n = 41), and, crucially, Statement vs. Correction (n = 17).

Figure 4. Mean rate of acceptance of each auditory stimulus within each communicative context. The horizontal dashed line marks the randomness boundary at 0.3.

Figure 5. Identifications as each pragmatic meaning of each of the auditory stimuli. The horizontal dashed line marks the randomness boundary at n = 40.
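For readers who want to reproduce this style of analysis, here is a minimal Python sketch using scipy and statsmodels (not the software actually employed), run on made-up per-trial acceptance scores; the variable names and data are illustrative only.

```python
# Sketch: one-way ANOVA plus Tukey HSD post-hoc comparisons over
# hypothetical per-trial acceptance scores grouped by context.
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
# Hypothetical data: binary acceptance scores for three contexts.
statement = rng.integers(0, 2, 200)
correction = rng.integers(0, 2, 200)
question = rng.integers(0, 2, 200)

f_val, p_val = f_oneway(statement, correction, question)
print(f"one-way ANOVA: F = {f_val:.3f}, p = {p_val:.3f}")

scores = np.concatenate([statement, correction, question])
groups = ["Statement"] * 200 + ["Correction"] * 200 + ["Question"] * 200
print(pairwise_tukeyhsd(scores, groups, alpha=0.05))
```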


The results of this study lead us to conclude that rising pitch accents in Catalan can convey a categorical difference between statements and questions and between corrective foci and questions, but only a gradient difference between statements and corrective foci. Statements are realized with an L+H* nuclear pitch accent, while corrective foci and echo questions are realized with an upstepped L+¡H* nuclear accent.

2. Encoding of intonational contrasts as revealed by MMN

2.1. Introduction

Näätänen et al. (1997) showed that the phonological role of a vowel stimulus plays a decisive role in the elicitation of the mismatch negativity (MMN) auditory evoked response, an electrophysiological response that can be measured by subtracting the averaged response to a set of standard stimuli from the averaged response to rarer deviant stimuli and taking the amplitude of this difference wave in a given time window. In Näätänen et al.'s (1997) study, speakers of two languages were presented, in an oddball paradigm, with stimulus pairs drawn from both languages. When the (vowel) contrast was phonological in the native language, the MMN response had a larger amplitude than when the contrast was between exemplars within a single sound category. Kazanina et al. (2006) and Shestakova et al. (2002) showed that allophonic variation (acoustically varying exemplars within a sound category) does not result in the elicitation of the MMN. Given the MMN's sensitivity to phonological categorical contrasts, we decided to test whether the intonational contrast between statements and echo questions found in the behavioral tasks (see Study 1) could elicit a specific MMN response.

2.2. Methodology

Study 2 mainly involved measuring event-related brain potentials while subjects were exposed to three pseudorandom oddball stimulus pairs, with the aim of finding electrophysiological evidence for the categorical distinction between statements and echo questions based on a scaling difference.

We first carried out a pilot experiment (Fig. 6) consisting of a double identification task between a statement meaning and an echo question meaning (see Study 1). The set of stimuli was a continuum of 16 steps (0.6 semitones between adjacent steps). The results of this behavioral experiment were used to select the 4 stimuli used in the main experiment. We selected steps 0, 5, 10, and 15. Steps 0 and 5 were categorized as statements, and steps 10 and 15 as echo questions.

Figure 6. Summary of the results of the behavioral pilot experiment.

Figure 7. Idealized intonational contours of the stimuli used in the ERP study.

As Figure 6 shows, the same physical distance was kept between each pair of stimuli, namely 3 semitones (see also Fig. 7). Note, however, that this created two allophonic differences (between steps 0 and 5, and between 10 and 15) and one categorical difference (between 5 and 10). Each pair of stimuli constituted an oddball block in our ERP study and, according to our hypothesis, the stimulus pair 5-10 should trigger the largest MMN amplitude.

Twenty-four Central Catalan native speakers participated in the ERP experiment. Importantly, subjects were instructed to watch a silent video and to ignore the auditory stimulation. The three oddball blocks were presented in random order. Within each block, the lower-pitched stimulus acted as the standard (STD, 80% of trials) and the higher-pitched one as the deviant (DEV, 20%).
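As a sketch of how such a block can be generated, the following Python snippet builds a pseudorandom STD/DEV sequence with an 80/20 split. The figure of 180 deviants per block is taken from the epoch counts reported below; the constraint that no two deviants occur in a row is a common convention in MMN designs and is an assumption here, not a detail taken from the study.

```python
# Sketch: pseudorandom oddball block with 80% standards and 20% deviants.
# Assumes 180 deviants per block and no two consecutive deviants (a common
# MMN convention, assumed here rather than taken from the original design).
import random

def oddball_block(n_dev: int = 180, p_dev: float = 0.2, seed: int = 1):
    n_std = round(n_dev * (1 - p_dev) / p_dev)   # 720 standards
    rng = random.Random(seed)
    # Pick which of the n_std + 1 "gaps" around the standards hold a deviant;
    # one deviant per gap guarantees no two deviants are adjacent.
    dev_gaps = set(rng.sample(range(n_std + 1), n_dev))
    seq = []
    for gap in range(n_std + 1):
        if gap in dev_gaps:
            seq.append("DEV")
        if gap < n_std:
            seq.append("STD")
    return seq

block = oddball_block()
print(len(block), "trials,", block.count("DEV"), "deviants:", block[:15])
```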

The EEG was recorded from 33 channels (10-20 system) at a 512 Hz sampling rate, with the reference on the tip of the nose and the ground electrode on the sternum. Continuous EEG data were band-pass filtered off-line between 1 and 20 Hz and epoched from -100 to 600 ms after stimulus onset, for deviants and standards in each block separately (180 trials per condition). The MMN was defined as the most negative peak in a time window of 80-200 ms post-deviance onset in the deviant-minus-standard difference waves. ERP amplitudes were retrieved from the difference waves at 225-275 ms post stimulus onset to explore effects in the N1 time range, and at 285-315 ms to explore effects in the MMN time range.
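The following Python sketch illustrates the peak-picking definition just given (most negative point of the deviant-minus-standard difference wave in the 80-200 ms window). The ERP arrays are simulated placeholders; only the windowing logic mirrors the description above.

```python
# Sketch: deviant-minus-standard difference wave and MMN peak picking.
# Hypothetical arrays stand in for the averaged ERPs at one electrode.
import numpy as np

FS = 512                                   # sampling rate (Hz)
t = np.arange(-0.100, 0.600, 1 / FS)       # epoch from -100 to 600 ms

rng = np.random.default_rng(0)
std_avg = rng.normal(0, 0.5, t.size)       # simulated standard ERP (uV)
dev_avg = std_avg - 2.0 * np.exp(-((t - 0.15) ** 2) / (2 * 0.02 ** 2))

diff_wave = dev_avg - std_avg              # deviant minus standard

# MMN: most negative peak 80-200 ms after deviance onset.
win = (t >= 0.080) & (t <= 0.200)
mmn_idx = np.argmin(diff_wave[win])
mmn_latency = t[win][mmn_idx]
mmn_amplitude = diff_wave[win][mmn_idx]
print(f"MMN: {mmn_amplitude:.2f} uV at {mmn_latency * 1000:.0f} ms")
```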


2.3. Results

Independent one-sample t-tests revealed the existence of an MMN in each condition (1st contrast: t(18) = –2.476, p < .05; 2nd contrast: t(18) = –6.119, p < 10⁻⁵; 3rd contrast: t(18) = –3.467, p < .005). Figure 8 shows the deviant-minus-standard difference waves elicited for all three conditions at Fz re-referenced to M1. A one-factor ANOVA with three levels revealed an effect of the acoustic contrast in the N1 time range, with negativity increasing with the pitch of the stimulus (F(2,36) = 3.633, p < .05). Crucially, the central pair of stimuli (5-10), which involves the same acoustic change as the other contrasts but implies a phonological-categorical change, elicited a greater MMN amplitude, although this effect was only marginally significant (F(2,36) = 2.270, p = .118, over the Fz, F4, FC1, FC2, and Cz electrodes).

Figure 8. Deviant-minus-standard difference waves elicited for all three conditions at Fz re-referenced to M1 (contrast 1 in green, 2 in black, 3 in red).

Figure 9 shows the scalp potential distribution maps at two time windows (N1 and MMN) extracted from the deviant-minus-standard difference waves (combined mastoids reference). The typical MMN fronto-central negativity with polarity inversion at the mastoids can be seen in all three conditions.

Figure 9. Scalp potential distribution maps at two time windows (N1 and MMN) extracted from the deviant-minus-standard difference waves (combined mastoids reference); the three panels correspond to the 1st, 2nd, and 3rd contrasts.


Our ERP study thus revealed a stronger MMN brain response when listeners heard pairs of intonational contours that were phonologically contrastive than when they heard pairs of non-contrasting contours separated by the same acoustic distance. These results suggest that intonational contrasts in the target language are encoded automatically in the auditory cortex.

3. Visual contribution to tune interpretation: The role of facial gestures in combination with tonal scaling

3.1. Introduction

It has been shown that an L+¡H* L% nuclear configuration is ambiguous: it can be used to express both contrastive foci and echo questions. The goal of Study 3 was therefore to investigate the role of visual cues in disambiguating the meaning of two otherwise ambiguous F0 patterns (see Figure 1 above).

Figure 10 shows a static example of the facial gestures that can be associated with each of the three meanings under study, namely, statement, contrastive focus, and echo question meanings.

Figure 10. Examples of the video frames used for each interpretation of the auditory sequence in the gating task: statement (left panel), contrastive focus (middle panel), and echo question (right panel).

We sought to investigate how tonal scaling and facial gestures interact when Catalan listeners try to identify a contrastive focus, a statement, or an echo question. Our main hypothesis was that visual cues would play a crucial role in disambiguating the meaning of two otherwise ambiguous F0 patterns.

3.2. Methodology

Twenty native speakers participated in an experiment consisting of a semantically motivated identification task dealing with the contrast between corrective/contrastive foci and echo questions. The auditory stimuli were derived from the same materials as in Study 1: a continuum created by modifying the F0 peak height of the noun phrase petita [pə.'ti.tə] ('little', fem.) in 6 steps (1.2 semitones between adjacent steps).

A male native speaker of Catalan was videotaped pronouncing both possible interpretations of the intonational contour. From these two video files three static images were extracted: one for the initial neutral gesture, another simultaneous with the H intonational peak, and a third representing the final state of the utterance (see Fig. 11). These three static images were aligned in time with the syllables of the auditory stimuli, and sequences of intermediate frames were then interpolated between the target points.

Figure 11. Video frames used for each interpretation of the auditory sequence (echo question and contrastive focus).

Each auditory stimulus was presented simultaneously with each of the two visual stimuli. When perceiving these audiovisual stimuli, subjects had to decide which interpretation was more likely for each stimulus by pressing the corresponding computer key. The task consisted of five blocks, in each of which every stimulus in the continuum was presented to the subjects in randomized order. Response frequencies and RTs were automatically recorded in E-Prime.
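A minimal Python sketch of this presentation structure (2 videos crossed with the 6-step continuum, repeated over 5 randomized blocks) is given below; condition names and the trial-list format are illustrative, not taken from the E-Prime script.

```python
# Sketch: build a randomized presentation list for the audiovisual
# identification task: 2 visual stimuli x 6 auditory steps x 5 blocks.
import itertools
import random

VIDEOS = ["contrastive_focus", "echo_question"]
AUDIO_STEPS = range(1, 7)          # 6-step F0 continuum
N_BLOCKS = 5

rng = random.Random(42)
trials = []
for block in range(1, N_BLOCKS + 1):
    block_trials = [{"block": block, "video": v, "audio_step": a}
                    for v, a in itertools.product(VIDEOS, AUDIO_STEPS)]
    rng.shuffle(block_trials)      # every stimulus once per block, random order
    trials.extend(block_trials)

print(len(trials), "trials")       # 2 x 6 x 5 = 60
print(trials[:3])
```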

3.3. Results

Figure 12 shows the mean identification rate for each video stimulus depending on the auditory stimulus (left panel) and the corresponding mean RTs (right panel). One can observe a clear preference for visual cues in the listeners' main decisions, but also a crucial interaction between the visual information and the auditory stimuli.

Figure 12. Mean identification rate for each video stimulus depending on the auditory stimulus (left panel) and mean reaction times (right panel).

An ANOVA revealed an effect of visual stimulus on identification responses (F (1175, 1) = 77.000, p < .001) and an effect of auditory stimulus (F (1175, 5) = 77.000, p < .001). However, the interaction between the two factors was not significant (F (1175, 5) = 77.000, p = .885).

Nevertheless, the reaction time graph clearly shows an interaction between the auditory and visual information: when a question-based visual stimulus occurred with a low-pitched auditory stimulus, an important delay appeared in the response even if the identification response was Question. This was also the case when focus-based visual stimuli occurred with high-pitched auditory stimuli. For reaction times there was neither an auditory nor a visual main effect but, importantly, the interaction between the two factors was statistically significant (F (1175, 5) = 1.716, p = .005).

These findings were corroborated in a second experiment in which several visual steps were interpolated between the two original ones (using a face-morphing technique). Crucially, for the less pronounced sets of gestures the role of the auditory stimuli increased.

An additional pilot experiment using the gating paradigm (Grosjean 1996) was carried out. A set of gated utterances was presented as stimuli, with one subset exemplifying the statement, another the contrastive focus, and a third the echo question. These utterances occurred in the three possible modalities (auditory-visual, AV; auditory only, A; and visual only, V). Preliminary results revealed the following (a sketch of how such gated stimuli can be constructed follows the list):

- In the visual conditions, echo questions were recognized immediately (from the first gate). In this case, no differences appeared that depended on the presence of simultaneous auditory input. The two other meanings (statement and contrastive focus) were discriminated later (after the fifth gate), mostly when participants perceived the gestural configuration as strongly marked and therefore as belonging to a focused type.

- The recognition point was reached first in the AV condition (between the first and the fourth gate), followed closely by the V condition. Responses in the A condition were late (after the ninth gate).
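A minimal sketch of gate construction, assuming cumulative fragments of fixed duration (the actual gate size and alignment used in the pilot are not specified here):

```python
# Sketch: build cumulative "gates" from an audio signal, as in the gating
# paradigm (Grosjean 1996). The 100 ms gate size and the dummy signal are
# illustrative assumptions, not parameters of the pilot experiment.
import numpy as np

def make_gates(signal: np.ndarray, fs: int, gate_ms: int = 100) -> list:
    """Return successively longer fragments, each gate_ms longer than the last."""
    step = int(fs * gate_ms / 1000)
    n_gates = int(np.ceil(len(signal) / step))
    return [signal[: step * (i + 1)] for i in range(n_gates)]

fs = 44_100
utterance = np.zeros(int(fs * 0.9))            # stand-in for a 900 ms recording
gates = make_gates(utterance, fs)
print([round(len(g) / fs, 2) for g in gates])  # 0.1, 0.2, ..., 0.9 s
```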

4. Disentangling facial gestures in audiovisual speech perception

In order to find out which gestural elements are responsible for guiding listeners' interpretations, a planned Study 4 will consist of a fine-grained analysis of the gestural cues involved in the perception of meaning, by means of 3D-modelled stimuli (see Fig. 13 for a set of examples). This study will be undertaken together with the research groups led by Josep Blat and Núria Sebastián-Gallés at the Universitat Pompeu Fabra.


Figure 13. Examples of 3D avatar faces, illustrating some of the possibilities that this technique permits.

Wierzbicka (2000) covers the fundamental assumptions underlying a new approach to the study of the human face (see also Mandler 1997). She stresses the need to distinguish the "semantics of human faces" from the "psychology of human faces". The basis for the interpretation of facial gestures is, above all, experiential (see also Dohen 2009). In addition, certain components of facial behavior have constant, context-independent meanings. Facial expressions can therefore convey meanings comparable to the meanings of verbal utterances, and their semantic analysis must distinguish between the context-independent invariant and its contextual interpretations (Clark 1996, Kendon 1980, McNeill 1992).

The auditory stimuli for this study will be a continuum created by modifying the F0 peak height of the noun phrase Marina [mə.'ɾi.nə] (feminine proper name) in 4 steps (2 semitones between adjacent steps). Natural productions will be read by a male native speaker of Catalan, and these utterances will serve as the source utterances for our stimuli. Speech manipulations will be performed with Praat (Boersma & Weenink 2008) so as to obtain stimuli parallel to those created for the noun phrase petita in the previous studies.

The visual stimuli for this study will consist of several computer-generated 3D avatars in which we will separately manipulate each facial gesture involved in the change in pragmatic meaning, namely eyebrow position, eyelid closure, and head movement. Each gesture element will appear in four degrees ranging from its typical configuration in echo question productions to its typical configuration in corrective contrastive focus productions, leading to 64 possible visual conditions. The sentences will be presented in the audiovisual modality. There will be one utterance (Marina) × 4 auditory steps × 64 visual steps, yielding a total of 256 conditions.
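The factorial structure of this design can be enumerated directly; the short Python sketch below does so with illustrative parameter names and an assumed 1-4 coding of the gesture degrees.

```python
# Sketch: enumerate the planned Study 4 design: 3 gesture parameters x
# 4 degrees each (4^3 = 64 visual conditions) crossed with 4 auditory
# steps, i.e. 256 audiovisual conditions. Parameter names follow the text.
from itertools import product

GESTURE_DEGREES = range(1, 5)   # 1 = focus-like ... 4 = question-like (assumed coding)
AUDIO_STEPS = range(1, 5)       # 4-step F0 continuum (2 semitones apart)

conditions = [
    {"eyebrow": b, "eyelid": e, "head": h, "audio_step": a}
    for b, e, h, a in product(GESTURE_DEGREES, GESTURE_DEGREES,
                              GESTURE_DEGREES, AUDIO_STEPS)
]
print(len(conditions))          # 256
print(conditions[0])
```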

References

Barkhuysen, P.; Krahmer, E.; Swerts, M. (2005). Problem detection in human-machine interactions based on facial expressions of users. Speech Communication, 45(3), 343-359.
Boersma, P.; Weenink, D. (2008). Praat: doing phonetics by computer (version 5.0.09). Computer program. Online <http://www.fon.hum.uva.nl/praat/>.
Borràs-Comes, J.; Costa-Faidella, J.; Prieto, P.; Escera, C. (2009). Encoding of intonational contrasts as revealed with the mismatch negativity (MMN). II BRAINGLOT Workshop. Barcelona, 2009.
Borràs-Comes, J.; Vanrell, M. M.; Prieto, P. (2010). The role of pitch range in establishing intonational contrasts in Catalan. Proceedings of Speech Prosody 2010. Chicago.
Cavé, C.; Guaïtella, I.; Bertrand, R.; Santi, S.; Harlay, F.; Espesser, R. (1996). About the relationship between eyebrow movements and F0 variations. Proceedings of the International Conference on Spoken Language Processing (ICSLP). Philadelphia, 2175-2179.
Clark, H. H. (1996). Using language. New York: Cambridge University Press.
Dohen, M. (2009). Speech through the ear, the eye, the mouth and the hand. In A. Esposito, A. Hussain, and M. Marinaro (eds.), Multimodal Signals: Cognitive and Algorithmic Issues, 24-39. Berlin/Heidelberg: Springer.
Dohen, M.; Lœvenbruck, H. (2005). Audiovisual production and perception of contrastive focus in French: A multispeaker study. Proceedings of Interspeech 2005, 2413-2416.
Dohen, M.; Lœvenbruck, H. (2009). Interaction of audition and vision for the perception of prosodic contrastive focus. Language and Speech, 52(2/3), 177-206.
Dohen, M.; Lœvenbruck, H.; Cathiard, M.-A.; Schwartz, J.-L. (2004). Visual perception of contrastive focus in reiterant French speech. Speech Communication, 44, 155-172.
Dohen, M.; Lœvenbruck, H.; Hill, H. (2006). Visual correlates of prosodic contrastive focus in French: Description and inter-speaker variabilities. Proceedings of Speech Prosody 2006, 221-224.
Goldin-Meadow, S.; McNeill, D. (1999). The role of gesture and mimetic representation in making language the province of speech. In M. C. Corballis and S. Lea (eds.), The descent of mind. Oxford: Oxford University Press, 155-172.
Goldin-Meadow, S.; Alibali, M. W.; Church, R. B. (1993). Transitions in concept acquisition: Using the hand to read the mind. Psychological Review, 100, 279-297.
Granström, B.; House, D. (2005). Audiovisual representation of prosody in expressive speech communication. Speech Communication, 46, 473-484.
Grosjean, F. (1996). Gating. Language and Cognitive Processes, 11(6), 597-604.
Hadar, U.; Steiner, T. J.; Grant, E. C.; Rose, F. C. (1983). Head movement correlates of juncture and stress at sentence level. Language and Speech, 26, 117-129.
House, D. (2002). Perception of question intonation and facial gestures. Proceedings of Fonetik, 44(1), 41-44.
Kazanina, N.; Phillips, C.; Idsardi, W. (2006). The influence of meaning on the perception of speech sounds. Proceedings of the National Academy of Sciences, 103(13), 11381-11386.
Kendon, A. (1980). Gesticulation and speech: Two aspects of the process of utterance. In M. R. Key (ed.), The relationship of verbal and nonverbal communication, 207-228. The Hague: Mouton.
Krahmer, E.; Swerts, M. (2005). How children and adults produce and perceive uncertainty in audiovisual speech. Language and Speech, 48(1), 29-54.
Krahmer, E.; Swerts, M. (2006). Perceiving focus. In C. M. Lee (ed.), Topic and Focus: A Cross-Linguistic Perspective, 121-137. Dordrecht: Kluwer.
Krahmer, E.; Swerts, M. (2007). The effects of visual beats on prosodic prominence: Acoustic analyses, auditory perception and visual perception. Journal of Memory and Language, 57(3), 396-414.
Mandler, G. (1997). Foreword. In Russell and Fernández-Dols (eds.), The Psychology of Facial Expression. Cambridge: Cambridge University Press.
McGurk, H.; MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-748.
McNeill, D. (1992). Hand and mind. Chicago: University of Chicago Press.
McNeill, D. (2005). Gesture and thought. Chicago: University of Chicago Press.
Näätänen, R.; Lehtokoski, A.; Lennes, M.; Cheour, M.; Huotilainen, M.; Iivonen, A.; Vainio, M.; Alku, P.; Ilmoniemi, R. J.; Luuk, A.; Allik, J.; Sinkkonen, J.; Alho, K. (1997). Language-specific phoneme representations revealed by electric and magnetic brain responses. Nature, 385(6615), 432-434.
Nakano, Y. I.; Reinstein, G.; Stocky, T.; Cassell, J. (2003). Towards a model of face-to-face grounding. Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL). Sapporo, Japan.
Prieto, P.; Cabré, T. (coords.) (2008). Atles interactiu de l'entonació del català. Online <http://prosodia.uab.cat/atlesentonacio/>.
Prieto, P. (in press). The intonational phonology of Catalan. In S. A. Jun (ed.), Prosodic Typology 2. Oxford: Oxford University Press.
Psychology Software Tools (2009). E-Prime (version 2.0). Computer program. Online <http://www.pst-net.com/>.
Savino, M.; Grice, M. (2007). The role of pitch range in realising pragmatic contrasts – The case of two question types in Italian. Proceedings of ICPhS XVI. Saarbrücken.
Shestakova, A.; Brattico, E.; Huotilainen, M.; Galunov, V.; Soloviev, A.; Sams, M.; Ilmoniemi, R. J.; Näätänen, R. (2002). Abstract phoneme representations in the left temporal cortex: magnetic mismatch negativity study. Neuroreport, 13(14), 1813-1816.
Srinivasan, R. J.; Massaro, D. W. (2003). Perceiving prosody from the face and voice: Distinguishing statements from echoic questions in English. Language and Speech, 46(1), 1-22.
Swerts, M.; Krahmer, E. (2004). More about brows: A cross-linguistic analysis-by-synthesis study. In C. Pelachaud and Zs. Ruttkay (eds.), From Brows to Trust: Evaluating Embodied Conversational Agents, 191-216. Dordrecht: Kluwer Academic Publishers.
Swerts, M.; Krahmer, E. (2008). Facial expressions and prosodic prominence: Comparing modalities and facial areas. Journal of Phonetics, 36(2), 219-238.
Wierzbicka, A. (2000). The semantics of human facial expressions. Pragmatics & Cognition, 8(1), 147-183.


Appendix 1. Commented references

Dohen, M. (2009). Speech through the ear, the eye, the mouth and the hand. In A. Esposito, A. Hussain, and M. Marinaro (eds.), Multimodal Signals: Cognitive and Algorithmic Issues, 24-39. Berlin/Heidelberg: Springer.

This article explores the multimodal aspect of speech, in both perception and production. It focuses on segmental phonology and, especially, on suprasegmentals, analyzing various aspects of audiovisual communication such as hand and mouth movements. The paper describes many mechanisms associated with the presence or absence of contrastive focus in declarative sentences, a major theme in our Studies 3 and 4.

Näätänen, R.; Lehtokoski, A.; Lennes, M.; Cheour, M.; Huotilainen, M.; Iivonen, A.; Vainio, M.; Alku, P.; Ilmoniemi, R. J.; Luuk, A.; Allik, J.; Sinkkonen, J.; Alho, K. (1997). Language-specific phoneme representations revealed by electric and magnetic brain responses. Nature, 385(6615), 432-434.

This article pioneered the use of event-related potentials (ERP) in the study of phonetic and phonological variables. Of particular interest are its approach to the issue and the questions surrounding the analysis of the mismatch negativity (MMN). The experiment compared the brain activity of two populations with different first languages. Results showed that each group reacted differently to the same stimuli, depending on whether the contrast was phonetic or phonological in their language. Our Study 2 seeks to test whether this finding extends to suprasegmentals.

Prieto, P. (in press). The intonational phonology of Catalan. In S. A. Jun (ed.), Prosodic Typology 2. Oxford: Oxford University Press.

This article collects and systematizes the inventory of Catalan intonational configurations. It introduces the Tones and Break Indices model and applies it to Catalan (Cat_ToBI). The conclusions of this publication, as well as its detailed coverage of individual cases (made tangible in the Interactive Atlas of Catalan Intonation), define the acoustic issues that must be addressed in this thesis.


Savino, M.; Grice, M. (2007). The role of pitch range in realising pragmatic contrasts – The case of two question types in Italian. Proceedings of ICPhS XVI. Saarbrücken.

This study was essential for the design of our Study 1. The authors point out that in Bari Italian the same pitch accent is used for two types of questions, information-seeking questions and confirmation-seeking questions. Their perception study is closely linked with ours. This article is also one of the first to show that pitch range variation between two pitch accents may involve a categorical pragmatic difference.

Srinivasan, R. J.; Massaro, D. W. (2003). Perceiving prosody from the face and voice: Distinguishing statements from echoic questions in English. Language and Speech, 46(1), 1-22.

This article discusses the audiovisual perception of the contrast between statements and questions, another major theme of this thesis. The methodology used consisted of a three-dimensional manipulation of real images, and it is particularly relevant for the design and analysis of our Studies 3 and 4. Its discussion of the role of vision relative to that of the acoustic information is also of special interest.


Appendix 2. Work schedule

March 2010 - July 2010

• Study 1 will be written up as an article. It will be presented orally at the international conference Speech Prosody 2010 (Chicago, Illinois, USA). The article will be submitted for publication.

• The analysis of the results of the Study 2 experiments will be completed. Writing of the corresponding article will begin.

• Study 3 will be written up as an article. It will be presented orally at the 12th Conference on Laboratory Phonology (Albuquerque, New Mexico, USA). The article will be submitted for publication.

• The methodology of Study 4 will be finalized. The experiments it includes will be carried out. Writing of the corresponding article will begin.

• A new Study 5 using the gating paradigm will be started. Its methodology will be finalized and the experiments will begin to be run.

September 2010 - February 2011

• Study 1 will be presented at the 20th Colloquium on Generative Grammar (CGG20) (Barcelona).

• Study 2 will be presented at the international conference Fourth European Conference on Tone and Intonation (TIE4) (Stockholm, Sweden). The corresponding article will be completed and then submitted for publication.

• Study 3 will be presented at the Fourth European Conference on Tone and Intonation (TIE4) (Stockholm, Sweden).

• The write-up of Study 4 as an article will be completed and then submitted for publication.

• The article corresponding to Study 5 will be written and then submitted for publication.

• A Study 6 will be started that will test the results found in Studies 1 and 3 in the domain of boundary tones (and no longer only pitch accents).

March 2011 - July 2011

• Study 4 will be presented at the 17th International Congress of Phonetic Sciences (ICPhS) (Hong Kong, China).

• Study 5 will be presented at the international conference Phonetics and Phonology in Iberia (PaPI '11) (Tarragona). It will also be presented at the 17th International Congress of Phonetic Sciences (ICPhS) (Hong Kong, China).

• The experiments for Study 6 will be completed and analyzed. The corresponding article will be written and submitted for publication.

September 2011 - February 2012

• Studies 4 and 6 will be presented at an international conference.

• The thesis will be written and deposited.

March 2012

• The thesis will be defended.