The statistical analysis of acoustic correlates of speech rhythm

Denise Duarte*, Universidade Federal de Goiás and Universidade de São Paulo

Antonio Galves, Universidade de São Paulo

Nancy L. Garcia, Universidade Estadual de Campinas

Ricardo Maronna*, Universidad de La Plata

http://www.ime.usp.br/~tycho

* : authors who presented the paper


1. Introduction

Data description: two corpora

- “20 sentences”: 20 sentences spoken three times by two female native speakers each of Brazilian Portuguese (BP) and European Portuguese (EP) (segmented by Flaviane R. Fernandes and Janaisa M. Viscardi)

- “RNM” (Ramus, Nespor and Mehler, 1999): 20 sentences in each of English, Polish, Dutch, French, Spanish, Catalan, Italian and Japanese (5 sentences uttered by each of 4 female speakers)

Purposes

- Apply the RNM approach to the enlarged data set.

- Present alternative descriptive statistical measures.

- Analyze the effect of dropping the last vocalic interval of each sentence.

- Introduce a probability model for duration, which allows for improved descriptions and hypothesis testing.

- Use this model to give statistical support to the rhythmic class hypothesis.

The RNM statistics

For each sentence of the corpus the following are computed:

∆C, ∆V = standard deviations of the durations of the consonantal and vocalic intervals, respectively

%V = proportion of time spent on vocalic intervals

Values are averaged for each speaker.
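The three statistics defined above can be sketched in a few lines. This is only an illustration: the data layout (a list of `(kind, duration)` pairs), the function names and the sample durations are assumptions, and the slides do not say whether the standard deviation divides by n or n-1 (the population form is used here).

```python
from statistics import mean, pstdev


def rnm_statistics(intervals):
    """Return (%V, delta_C, delta_V) for one sentence.

    `intervals` is a list of (kind, duration) pairs, where kind is "V"
    (vocalic) or "C" (consonantal). This layout is hypothetical.
    """
    vocalic = [d for kind, d in intervals if kind == "V"]
    consonantal = [d for kind, d in intervals if kind == "C"]
    # %V: share of the total duration spent on vocalic intervals.
    percent_v = 100.0 * sum(vocalic) / (sum(vocalic) + sum(consonantal))
    # Assumption: population standard deviation (divide by n).
    return percent_v, pstdev(consonantal), pstdev(vocalic)


def speaker_average(sentences):
    """Average each statistic over all sentences of one speaker."""
    per_sentence = [rnm_statistics(s) for s in sentences]
    return tuple(mean(column) for column in zip(*per_sentence))


# Made-up durations in seconds, for illustration only.
sentence = [("C", 0.08), ("V", 0.10), ("C", 0.12),
            ("V", 0.06), ("C", 0.05), ("V", 0.14)]
percent_v, delta_c, delta_v = rnm_statistics(sentence)
```

`speaker_average` takes the list of all sentences uttered by one speaker and returns the averaged triple.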

%V, ∆C and ∆V for the ten languages

Language   ∆V     ∆C     %V
Polish     2.51   5.14   41
English    4.64   5.35   40.1
Dutch      4.23   5.33   42.3
French     3.78   4.39   43.6
Spanish    3.32   4.74   43.8
Italian    4.00   4.81   45.2
Catalan    3.68   4.52   45.6
Japanese   4.02   3.56   53.1
EP         4.33   5.57   45.3
BP         4.01   4.53   49.1

%V vs. ∆C for the ten languages

%V and ∆C for individual speakers

[Figure: ∆C (vertical axis, 0.03 to 0.07) plotted against %V (horizontal axis, 0.35 to 0.55), one point per speaker: four each for ca, du, en, es, fr, it, ja and po, plus EP1, EP2, BP1 and BP2.]

∆V vs. ∆C for the ten languages

[Figure: ∆C (vertical axis, 3.6 to 5.6) plotted against ∆V (horizontal axis, 2.5 to 4.5), one point per language: Polish, English, Dutch, French, Spanish, Italian, Catalan, Japanese, EP and BP.]

3. Alternative analysis

3.1 Dropping the last vocalic interval

- The last vocalic interval is an important source of variability.

- It was observed that in BP and EP there is a stretching of final vocalic intervals.

- New data set: for each language, omit the last vocalic interval of each sentence, and also the subsequent consonantal interval, if one exists.

The data without the last vocalic interval

[Figure: ∆C (vertical axis, 3.6 to 5.1) plotted against %V (horizontal axis, 40 to 54), one point per language: jap, bp, cat, ita, spa, fre, dut, ep, pol and eng.]
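The construction of the data set without the last vocalic interval can be sketched as follows. The `(kind, duration)` representation and the function name are assumptions made for illustration; the sketch also assumes that adjacent segments of the same kind have already been merged, so that at most one consonantal interval follows the last vocalic one.

```python
def drop_last_vocalic(intervals):
    """Return a copy of the sentence without its last vocalic interval
    and without the consonantal interval that follows it, if any.

    `intervals` is a list of (kind, duration) pairs with kind "V" or "C"
    (a hypothetical layout, not taken from the original study).
    """
    v_positions = [i for i, (kind, _) in enumerate(intervals) if kind == "V"]
    if not v_positions:
        # No vocalic interval at all: nothing to drop.
        return list(intervals)
    last_v = v_positions[-1]
    # Whatever follows the last vocalic interval is the trailing
    # consonantal interval, so truncating at last_v removes both.
    return list(intervals[:last_v])


# Made-up durations in seconds, for illustration only.
sentence = [("C", 0.08), ("V", 0.10), ("C", 0.12), ("V", 0.14), ("C", 0.05)]
trimmed = drop_last_vocalic(sentence)
```

The statistics %V, ∆C and ∆V are then recomputed on the trimmed sentences in the same way as on the complete ones.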

Location of BP and EP speakers in the %V vs. ∆C Plane – complete sentences

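The effect of the last vocalic interval in BP and EP: individual values

[Figure: ∆C (vertical axis, 4.0 to 5.5) plotted against %V (horizontal axis, 40 to 54). Points: the language means bp and ep and the individual speakers BP sp1, BP sp2, EP sp1 and EP sp2, each shown twice, once for complete sentences and once with the suffix "wlv" (apparently "without last vocalic interval").]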

3.2 Robust statistics

“Robust” = insensitive to extreme values.

Simplest robust measure of location: replace the mean by the median. To find the median of a set of numbers, sort them and pick the one in the middle.

Simplest robust measure of dispersion: replace the standard deviation by the median absolute deviation (MAD).

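With the sorted values written as x_(1) ≤ x_(2) ≤ ... ≤ x_(n):

\[
\operatorname{median}(x) =
\begin{cases}
x_{(m)} & \text{with } m = \dfrac{n+1}{2}, \text{ if } n \text{ is odd},\\[6pt]
\dfrac{x_{(m)} + x_{(m+1)}}{2} & \text{with } m = \dfrac{n}{2}, \text{ if } n \text{ is even}.
\end{cases}
\]

\[
\operatorname{MAD}(x) = \operatorname{med}\bigl(\,\lvert x_i - \operatorname{med}(x) \rvert\,\bigr)
\]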
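The median and the MAD as defined above translate directly into code. The following is a minimal sketch; the function names are illustrative, and no rescaling constant is applied to the MAD because the slides do not mention one.

```python
def median(values):
    """Median: middle sorted value, or mean of the two middle ones."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                 # n odd: x_(m) with m = (n + 1) / 2
    return (xs[mid - 1] + xs[mid]) / 2  # n even: mean of x_(m), x_(m+1), m = n / 2


def mad(values):
    """Median absolute deviation: median of |x_i - median(x)|."""
    center = median(values)
    return median([abs(x - center) for x in values])


# One extreme value barely moves the median or the MAD,
# whereas it would inflate the mean and the standard deviation.
clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]
```

Replacing 5 by 100 in the example leaves the median at 3 and the MAD at 1, which is the sense in which these measures are robust.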

Robust statistics

[Figure: DCmad (vertical axis, 3.0 to 5.5) plotted against PVmed (horizontal axis, 40 to 54), one point per language: eng, pol, dut, fre, esp, ita, cat, jap, bp and ep.]

4. A probability model for duration

The former analysis is descriptive. Finding a parametric family of probability distributions that fits the data closely would have two advantages:

- It may yield a better description of the data.

- It allows us to make inferences, i.e., to extend results from the “sample” (the data set) to the “population” (the set of all potential sentences).

4.1 Histograms show similar asymmetrical shapes

[Figure: histogram of consonantal intervals for Dutch (horizontal axis: time, -0.050 to 0.445; vertical axis: 0 to 40).]

[Figure: histogram of consonantal intervals for Italian (horizontal axis: time, 0.000 to 0.275; vertical axis: 0 to 30).]

Several distributions were tried: log-normal, Weibull, exponential and Gamma.

The Gamma gave the best fit, as assessed by quantile-quantile plots (QQ-plots): given a data set and a theoretical distribution, plot the quantiles of the latter against those of the empirical distribution.
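The points of a Gamma QQ-plot can be computed as in the sketch below. It is an illustration under stated assumptions, not the procedure used in the study: the plotting positions (i + 0.5)/n are one common convention, the Gamma parameters are taken as given, and the quantile function is obtained by a plain series-plus-bisection routine written with the standard library only (in practice a statistics library would normally supply it).

```python
import math


def gamma_cdf(x, alpha, beta):
    """CDF of the Gamma distribution with shape alpha and scale beta,
    from the power series of the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    z = x / beta
    term = total = 1.0
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (alpha + n)
        total += term
    prefactor = math.exp(alpha * math.log(z) - z - math.lgamma(alpha + 1.0))
    return min(1.0, prefactor * total)


def gamma_quantile(p, alpha, beta):
    """Invert the CDF by bisection, for 0 < p < 1."""
    lo, hi = 0.0, beta
    while gamma_cdf(hi, alpha, beta) < p:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if gamma_cdf(mid, alpha, beta) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def qq_points(data, alpha, beta):
    """Pairs (theoretical quantile, empirical quantile) for a QQ-plot."""
    xs = sorted(data)
    n = len(xs)
    # Assumption: plotting positions (i + 0.5) / n.
    return [(gamma_quantile((i + 0.5) / n, alpha, beta), x)
            for i, x in enumerate(xs)]


# Made-up durations and parameters, for illustration only.
points = qq_points([0.04, 0.07, 0.09, 0.12, 0.20], alpha=4.0, beta=0.025)
```

If the Gamma model fits, the points lie close to the diagonal; systematic curvature indicates a poor fit.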

Gamma distribution

It has two parameters, α and β, controlling shape and scale (size), respectively. Small α gives high asymmetry; α = 1 gives the exponential distribution; large α approximates the normal.

The parameters are related to the mean µ and standard deviation σ by

\[ \mu = \alpha\beta, \qquad \sigma^2 = \alpha\beta^2 \]
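Inverting the two relations between (α, β) and (µ, σ) gives α = µ²/σ² and β = σ²/µ. The sketch below codes both directions. Using these formulas to estimate the parameters from a sample mean and standard deviation is the method of moments, which is an assumption of this example; the slides do not state which estimator was used.

```python
import math


def gamma_moments(alpha, beta):
    """Mean and standard deviation of a Gamma(shape=alpha, scale=beta)."""
    return alpha * beta, math.sqrt(alpha) * beta


def gamma_from_moments(mu, sigma):
    """Shape and scale obtained by solving mu = alpha * beta and
    sigma**2 = alpha * beta**2 (method of moments, an assumption here)."""
    alpha = (mu / sigma) ** 2
    beta = sigma ** 2 / mu
    return alpha, beta


# Made-up values: mean duration 0.1 and standard deviation 0.05.
alpha, beta = gamma_from_moments(0.1, 0.05)
mu, sigma = gamma_moments(alpha, beta)
```

With these made-up values the shape comes out as 4 and the scale as 0.025, and converting back recovers the mean and standard deviation.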

Estimated Gamma parameters: vocalic intervals

Estimated Gamma parameters: consonantal intervals

Mean values for vocalic and consonantal intervals

Language   Consonantal   Vocalic
English    0.1070        0.07310
Dutch      0.1000        0.07580
Polish     0.1020        0.07110
EP         0.1000        0.07800
French     0.0950        0.07360
Spanish    0.0970        0.07430
Italian    0.0970        0.07710
Catalan    0.0890        0.07440
BP         0.0950        0.08580
Japanese   0.0780        0.08780

5. Hypothesis testing

In view of the close relationship between α and β, we may use one of the two, or a function of both, to represent relevant features.

Based on the RNM results, we choose the model standard deviation:

\[ \sigma = \beta\,\alpha^{1/2} \]

Standard deviation of Gamma for the ten languages: complete sentences

[Figure: values of stdC (axis from 0.035 to 0.055) for Jap, Fre, Cat, Esp, Ita, BP, Dut, Pol, EP and Eng.]

Rhythmic class hypothesis

We represent the rhythmic class hypothesis by the following statistical model:

1. The syllabic languages (Italian, Spanish, French, Catalan, BP) have the same standard deviation, say σ1.

2. The accentual ones (Polish, Dutch, English, EP) share another, say σ2.

3. σ1, σ2 and the standard deviation for Japanese, σ3, are different.

Results

To test the model, we first tested (1) and (2) by means of the likelihood ratio test, which yielded a p-value of 0.91. This means that the equality of the σ's within rhythmic classes is highly compatible with the data (a small p-value would indicate rejection).

Then we tested the null hypothesis that some of σ1, σ2, σ3 are equal, which was rejected with a p-value of 0.0012, thus giving statistical evidence that the three are different.
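The mechanics of a likelihood ratio test can be sketched as follows. The sketch assumes the usual large-sample chi-square approximation for the statistic, which the slides do not confirm; the log-likelihood values and the degrees of freedom in the example are made up and are not the ones behind the p-values reported above.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with df degrees of freedom,
    from the series of the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total
    return max(0.0, 1.0 - lower)


def likelihood_ratio_test(loglik_null, loglik_full, df):
    """Return (statistic, p-value).

    loglik_null: maximized log-likelihood under the restricted model
    loglik_full: maximized log-likelihood under the unrestricted model
    df: number of parameters fixed by the restriction
    """
    statistic = 2.0 * (loglik_full - loglik_null)
    return statistic, chi2_sf(statistic, df)


# Hypothetical numbers, for illustration only.
statistic, p_value = likelihood_ratio_test(-10.0, -9.0, df=2)
```

A large p-value means the restricted model (for example, equal σ within a class) fits nearly as well as the unrestricted one; a small p-value means the restriction is rejected.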

Acknowledgments

We want to thank Franck Ramus, Marina Nespor and Jacques Mehler, who generously made their unpublished data available to us. We also thank Janaisa Viscardi and Flaviane Fernandes for the segmentation of the acoustic data.

The “20 sentences” corpus

The following sentences of the “20 sentences” corpus were considered in the statistical analysis. They were chosen for the quality of the acoustic signal and to avoid dubious cases of labeling.

1. A modernização foi satisfatória.
5. A falta de modernização é catastrófica.
6. O trabalho da pesquisadora foi publicado.
8. O governador aceitou a modernização.
9. A falta de autoridade foi alarmante.
11. A catalogadora compreendeu o trabalho da pesquisadora.
12. A professora discutiu a gramaticalidade.
15. A procura da gramaticalidade é o nosso objetivo.
16. A pesquisadora perdeu autoridade.
18. A autoridade cabe ao governador.
20. A gramaticalidade das frases foi conseguida.

Grants supporting the research

FAPESP grant n. 98/3382-0 (Projeto Temático Rhythmic patterns, parameter setting and language change)

PRONEX grant 66.2177/1996-6 (Núcleo de Excelência Critical phenomena in probability and stochastic processes)

CNPq grant 465928/2000-5 (Probabilistic tools for pattern identification applied to linguistics)

Related papers and references

Abercrombie, D. (1967). Elements of general phonetics. Chicago: Aldine.

Grabe, E. and Low, E. L. (2000). Acoustic correlates in rhythmic class. Paper presented at the 7th conference on laboratory phonology, Nijmegen.

Lloyd, J. (1940). Speech signal in telephony. London.

Mehler, J., Jusczyk, P., Dehaene-Lambertz, G., Bertoncini, N. and Amiel-Tison, C. (1988). A precursor of language acquisition in young infants. Cognition 29: 143-178.

Nazzi, T., Bertoncini, N. and Mehler, J. (1998). Language discrimination by newborns: towards an understanding of the role of the rhythm. Journal of Experimental Psychology: Human Perception and Performance 24 (3): 756-766.

Nespor, M. (1990). On the rhythm parameter in phonology. In Iggy Roca (ed.), Logical issues in language acquisition, 157-175.

Ramus, F. and Mehler, J. (1999). Language identification with suprasegmental cues: a study based on speech resynthesis. JASA 105: 512-521.

Ramus, F., Nespor, M. and Mehler, J. (1999). Correlates of linguistic rhythm in the speech signal. Cognition 73: 265-292.

Frota, S. and Vigário, M. (2001). On the correlates of rhythm distinctions: the European/Brazilian Portuguese case. To be published in Probus.

Appendix: the meaning of a p-value

Consider the situation of testing a statistical hypothesis. To fix ideas, suppose that we have samples from two populations, and we want to test the hypothesis that both have the same (unknown) mean. Of course, even if the hypothesis is true, the two sample means will be different, due to sampling variability.

To test the hypothesis, we compute a number T from our data (the so-called “test statistic”) which measures the discrepancy between the data and the hypothesis. In our example T will depend on the difference between the sample means. If T is very large, we have statistical evidence against the hypothesis. What is a rational definition of “large”?

Suppose our data yield T = 3.5, and that we compute the probability p that, if the hypothesis is true, we obtain a value of T greater than 3.5. This is the so-called “p-value” of the test. If, say, p = 0.002, this means that, if the means are equal, we would be observing an exceptionally large value (since a larger one is observed only with probability 0.2%). Thus we would have grounds to reject the hypothesis.
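The definition of a p-value can be made concrete with a small two-sample example. The sketch below uses an exact permutation test, chosen here only because it needs no distributional assumptions; it is not the test used in the study. T is taken as the absolute difference between the sample means, the data are made up, and the count uses "at least as large as" rather than "strictly greater than", as is customary for permutation tests.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(sample_a, sample_b):
    """Exact permutation p-value for equality of two means.

    T = |mean(a) - mean(b)|. Under the hypothesis, every way of splitting
    the pooled data into groups of the original sizes is equally likely,
    so p is the fraction of splits whose T is at least the observed one.
    """
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    observed = abs(mean(sample_a) - mean(sample_b))
    count = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        t = abs(mean(group_a) - mean(group_b))
        total += 1
        if t >= observed - 1e-12:  # small tolerance for rounding
            count += 1
    return count / total


# Made-up samples, for illustration only.
p = permutation_p_value([1, 2, 3], [4, 5, 6])
```

Full enumeration is feasible only for small samples, since the number of splits grows combinatorially; larger samples would need random sampling of splits or a large-sample approximation.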