lecture 3:phonetics and spoken language - universitetet … · lecture 3:phonetics and spoken...

27
Lecture 3:Phonetics and spoken language Pierre Lison, Language Technology Group (LTG) Department of Informatics Fall 2012, September 3 2012 fredag 31. august 2012 @ 2012, Pierre Lison - INF5820 course Outline Introduction to phonetics Spoken language syntax Summary 2 fredag 31. august 2012

Upload: hoangdien

Post on 17-Apr-2018

220 views

Category:

Documents


1 download

TRANSCRIPT

Lecture 3:Phonetics and spoken language

Pierre Lison, Language Technology Group (LTG)

Department of Informatics

Fall 2012, September 3 2012

fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Outline

• Introduction to phonetics

• Spoken language syntax

• Summary

2fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Outline

•Introduction to phonetics

• Articulatory phonetics

• Pronunciation variation

• Acoustic phonetics

• Spoken language syntax

• Summary

3fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Phonetics

• A basic knowledge of phonetics is useful when working on spoken dialogue systems

• How does speech recognition work?

• Why is speech recognition difficult?

• How can we detect the intonation of an utterance?

• How does speech synthesis work?

• The next slides will introduce some of the key concepts we will use later

4fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Phonetics and phonology

• Phonetics = scientific study of speech sounds

• Articulatory phonetics: how are sounds produced by the vocal organs?

• Acoustic phonetics: what are the acoustic properties of speech sounds?

• Auditory phonetics: how are sounds perceived by the human ears and brain? (we’ll skip that part)

• Phonology: scientific study of how sounds are organised in a given language

5fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Phonetics

• The basic phonetic unit is the phone, which is a distinctive speech sound

• The International Phonetic Alphabet is a standard for transcribing the sounds of all human languages

• Currently 107 letters, 52 diacritics, and four prosodic marks

• A specific language (e.g. English or Norwegian) will only use a subset of these sounds

• Other compatible transcription codes, such as ARPAbet for U.S. English

6fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

IPA transcription

7

Bokmål

Nordavinden og sola kranglet om hvem av dem som var den sterkeste. Da kom det en mann gående med en varm frakk på seg. De blei enige om at den som først kunne få mannen til å ta av seg frakken skulle gjelde for sterkere enn den andre. Så blåste nordavinden av all si makt, men dess mer han blåste, dess tettere trakk mannen frakken rundt seg, og til sist gav nordavinden opp. Da skinte sola fram så godt og varmt, og straks tok mannen av seg frakken. Og så måtte nordavinden innrømme at sola var den sterkeste av dem.

IPA (Oslo)

[ ˈˈnuːɾɑˌʋinˑn ̩ɔ ˈsuːln ̩ˈˈkɾɑŋlət ɔm ˈʋem ɑ dem sɱ̩ ˈʋɑː ɖɳ̩ ˈˈstæɾk̥əstə ˌdɑˑ ˈkʰɔmː de n ˈmɑnː ˌgɔˑənə me n ˈʋɑɾm ˈfɾɑkː pɔ ˌsæ di ble ˈˈeːnjə ɔm ɑt ˈdenː sɱ ̩ˈføʂt̠ ˌkʰʉnˑə fɔ ˈmɑnːn ̩ tɔ ˈˈtʰɑː ɑ sæ ˈfɾɑkːən ˌskʉlˑə ˈˈjelːə fɔ ɖɳ̩ ˈˈstæɾkəstə ɑ ˌdemˑ ˈsoː ˈˈbloːstə ˈˈnuːɾɑˌʋinːn ̩ ɑ ˈʔɑlː sin ˈmɑkʰtʰ men ju ˈmeːɾ ɦɑm ˈˈbloːstə ju ˈˈtʰetːəɾə ˌtɾɑkˑ ˈmɑnːn ̩ ˈfɾɑkːən ˈɾʉnt sæ ɔ tə ˈsist ˌmɔtˑə ˈˈnuːɾɑˌʋinˑn ̩ˈˈjiː ˌɔpʰˑ ˌdɑː ˈˈʂintə ˈsuːln ̩ˈfɾem ˌsɔˑ ˈgɔtʰː ɔ ˈʋɑɾmtʰ ɑt̚ ˈmɑnːən ˈstɾɑks ˌmɔtˑə ˈˈtɑː ɑ sæ ˈfɾɑkːən ɔ ˈsoː ˌmɔtˑə ˈˈnuːɾɑˌʋinˑn ̩ ˈinːˌɾømˑə ɑt ˈsuːln ̩ ˈʋɑːɾ n ̩ ˈˈstæɾk̥əstə ʔɑ ˈdemː]

IPA(Fyresdal)

[ ˈˈnuːɾɑˌʋinˑn ̩ ɔ ˈsuːla ˈˈkɾɑŋlɑ um ˈkʋenː ɑʋ ˌd ̠æi sɔɱ ˈʋɑː d ̠ən ˈˈstæɾ̥kɑstə ˈdoː ˈkɔmː də ɛn ˈmɑnː ˈˈgɑŋgɑndə mə æn ˈʋɑɾmə ˈfɾɑkːə pɔ ˌseˑg dæi blæɪ ˈˈʔeːnigə um ɑt dæn sɔɱ ˈfysː ˌkʰunˑə fɔ ˈmɑnːən tə ɔ ˈˈtʰɑːk ɑʋ sə ˈfɾɑçːən ˌskʉlːə ˈˈɾeknɑs feɾ ɛ̃n ˈˈstæɾkɑst ɑʋ ˌd ̠ɛɪ ˈsoː ˈbluːs ˈˈnuːɾɑˌʋinˑn ̩ ɑʋ ˈɑlː si ˈmɑkʰtʰ mən tɪ ˈmæi hɑn ˈbluːs tɪ ˈˈtʰetːɑɾə ˈdɾuːg ˈmɑnːən ˈfɾɑçːən ˈɾunt seɰ̥ ɔ tɪ ˈs̠l ̥ʉtː ˈˈmɔtːə ˈˈnuːɾɑˌʋinˑn ̩ʔ ˌjeˑ ˈupːʰ ˈd ̥oː ˈʂæin ˈsuːla ˈfɾɑmː sɔ ˈgøʰtː ɔ ˈʋɑɾmt ɑt ˈmɑnːn ̩ me ˈʔæi ˈgɔŋ ˈˈmɔtːə ˈˈtʰɑːkə ˈɑːʋ sə ˈfɾɑçːən ɔ ˈsoː ˈˈmɔtːə ˈˈnuːɾɑˌʋinˑn ̩ ˈʋeːˌçɛn ̠ːə ʔɑt ˈsuːla ˈʋɑː dən ˈˈstæɾ̥kɑ̰st ɑʋ ˈðæ ̰ɪ̥]

[Nordavinden og sola, en norsk dialektprøvedatabase på nettet, NTNU, http://www.ling.hf.ntnu.no/nos/]

fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

The speech chain

8

[Denes and Pinson (1993), «The speech chain»]

fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Speech production

• Sounds are variations in air pressure

• How are they produced?

• An air supply: the lungs (we usually speak by breathing out)

• A sound source setting the air in motion (e.g. vibrating) in ways relevant to speech production: the larynx, in which the vocal folds are located

• A set of 3 filters modulating this sound: the pharynx, the oral tract (teeth, tongue, palate,lips, etc.) and the nasal tract

9fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Speech production

• Some languages also rely on sounds not produced by the vibration of the vocal folds, such as click languages (e.g. Khoisan family in southern Africa):

10fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Speech production

11fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Vocal tract

12

visualisation of the vocal tract via magnetic resonance imaging [MRI]:

[Speech Production and Articulation Knowledge Group, University of Southern California. http://sail.usc.edu/span/ ]

fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Sound classes

• Voiced sounds are made when the vocal folds are vibrating

• e.g. [b], [d], [g], [v], [z], and all the vowels

• Voiceless sounds are made without such vibration

• e.g. [p], [t], [k], [f], [s]

13fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Sound classes

• Phones divided in 2 (+1) main classes:

• Consonants: may be voiced or voiceless, narrow to complete closure of vocal tract

• Vowels: nearly always voiced, little obstruction of vocal tract, often louder and longer than consonants

• Glides (or semivowels) between consonants and vowels, e.g. [y] and [w]

14fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Sound classes: consonants

• Consonants are realised by restricting the airflow

• They can differ in three ways:

• The place of articulation: where does the obstruction occur in the vocal tract?

• The manner of articulation: how does this restriction occur?

• Whether they are voiced or voiceless

15fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Sound classes: consonants

• Place of articulation:

• Labial: with the lips, such as [p],[b],[m]

• Coronal: with tip or blade of tongue, such as [s], [t], [d]

• Guttural: with the back of the oral cavity, e.g. [k], [g],[ŋ]

• Manner of articulation:

• Stop: airflow completely blocked then released, such as [b], [t], [k]

• Nasal: air passing through nasal cavity, such as [n],[m],[ŋ]

• Fricatives: airflow constricted but not cut off, such as [f],[v], [s]

• Approximants: articulators close together, but not enough to create turbulent airflow, such as [w],[y],[l],[r]

16fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Sound classes: consonants (English)

17

App

roxi

man

ts

Coronal «Guttural»

Stop

follo

wed

by

fric

ativ

e

fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Sound classes: vowels

• Relevant parameters for vowels:

• vowel height: height of highest part of the tongue

• vowel backness: location of this high point in the oral tract

• Shape of the lips

• In some languages, distinction between short and long vowels

18fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Pronunciation variation

• Words can be pronounced very differently due to various phonetic processes:

• E.g. aspirations, assimilations, deletions

• Co-articulation: anticipation of next phone by articulators, or perseverance of previous movement from last phone

• Influence of various contextual factors: age, gender, environment, rate of speech, dialect, register

19fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Pronunciation variation

• Concept of phoneme:

• Abstraction over a set of phones/speech sounds

• But regarded as a single «sound» within a particular language, in opposition to other sounds

• «the smallest segmental unit of sound employed to form meaningful contrasts between utterances»

• Example: the English /t/ can regroup [th], [ɾ] and [t]

20fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Pronunciation variation

21fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Acoustic phonetics

• A (speech) sound is a variation of air pressure

• This variation originates from the speaker’s speech organs

• We can plot a wave showing the changes in air pressure over time (zero value being the normal air pressure)

• A wave can be defined using sine functions such as:

22

y(t) = A ∗ sin(2πft)

time variable (continuous)output signal (in our case, air pressure)

as a function of timeamplitude

frequency of the signal

fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 courseTime (s)

0 2-3

3

0

untitled

0 0.5 1 1.5 2-3

-2

-1

0

1

2

3

Waveforms

• The frequency f of a wave is the number of times a second that the wave repeats itself (i.e. the number of cycles per second)

• The period T is the duration of one cycle, with

• The amplitude A corresponds to the maximum value reached by the wave

23

• Basic example:

• Wave drawn for a duration of 2 seconds

• Amplitude A = 3

• Frequency f = 2

T = 0.5 s

T =1

f

fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Speech waveforms

• Of course, speech is more complex than a simple sine function

• But it can still be described using the same mathematical apparatus, in terms of frequency, amplitude etc.

24

Time (s)0.04047 1.479

-0.2938

0.251

0

untitled

fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Speech waveforms

25

Time (s)1.126 1.157

-0.1201

0.136

0

untitled

• We can see about 4 cycles in the waveform, which means a frequency of about 4/0.03 ≈129 Hz

• If we zoom on the waveform from previous slide, and look at the time between 1.126 and 1.157 s:

fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Speech waveforms

• Important measures of a speech signal:

1. The fundamental frequency F0: lowest frequency of the sound wave, corresponding to the speed of vibration of the vocal folds

26

For the human voice, F0 ranges from 85-180 Hz for males, and 165-255 Hz for females

fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Speech waveforms

2. Its intensity: the signal power normalised to the human auditory threshold, measured in dB (decibels):

27

for a sample of N time points t1,... tN

P0 is the human auditory threshold, = 2 x 10-5 Pa

Note: dB scale is logarithmic, not linear!

Intensity = 10 log10Power

P0= 10 log10

1

NP0

N�

i=1

y(ti)2

Power =1

N

N�

i=1

y(ti)2

fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Speech waveforms

• Why are F0 and the signal intensity important measures?

• F0 correlates with the pitch of the voice, and the analysis of the pitch movement for an utterance will give us its intonation

• The signal intensity gives us an indication of the loudness of the speech sound

28

fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Pitch

29

Time (s)0.3906 1.284

-0.3904

0.3681

0

slice1

Time (s)0.3906 1.284

Pitc

h (H

z)

75

300

slice1Time (s)

0.3449 1.187-0.2713

0.2002

0

slice

Time (s)0.3449 1.187

Pitc

h (H

z)

75

300

slice

«The ball is red» «Is the ball red?»

Interrogative utterance → rising intonation at the end

fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Time (s)3.806 4.904-1

0.7604

0

slice

Time (s)3.806 4.90450

100

Inte

nsity

(dB

)

slice

50

60

70

80

90

100

Loudness

30

Time (s)

1.848 3.012-0.2549

0.2366

0

slice

Time (s)1.848 3.01250

100

Inte

nsity

(dB

)

slice

50

60

70

80

90

100

Time (s)3.806 4.904-1

0.7604

0

slice

Time (s)3.806 4.90450

100

Inte

nsity

(dB

)

slice

50

60

70

80

90

100

Time (s)1.848 3.012

-0.2549

0.2366

0

slice

Time (s)1.848 3.01250

100

Inte

nsity

(dB

)

slice

50

60

70

80

90

100

«low intensity» «high intensity»

fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Spectral analysis

• Possible to derive basic phonetic features (such as pitch or loudness) directly from the waveform

• But usually, the actual phones cannot be recognised

• For this, we need to use a different representation, in terms of the signal’s component frequencies (spectral analysis)

• Phones often have a distinctive «spectral signature»

31fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Spectral analysis

• Basic idea: a wave can be represented as a superposition of many sine waves with distinct frequencies

• This is called Fourier analysis (operation: Fourier transform)

32

Time (s)0 2

-4.492

4.492

0

sineWithNoise

0 0.5 1 1.5 2

-4

-3

-2

-1

0

1

2

3

4

Example: the wave on the left is a combination of two sine:

y(t) = 3sin(2π2t) + 1.5sin(2π21t)

• The first sine has a slow frequency (f=2 Hz) and an amplitude A = 3.0

• The second sine has a faster frequency (f = 21 Hz) and amplitude A = 1.5

fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Spectral analysis

• We can then draw a spectrum showing the component frequencies of the signal and their corresponding amplitudes

• Spectrum for the waveform from the previous slide:

33

Frequency (Hz)0 30

Soun

d pr

essu

re le

vel (

dB/H

z)

60

80

100

0 5 10 15 20 25 30

We clearly see two peaks, one at frequency 2 Hz, and one at frequency 21 Hz

(as expected from the formula)

fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Spectral analysis

• The spectral analysis transforms a signal encoded in the time domain into one in the frequency domain

• Goal is to determine the frequency content of a signal

34

Time (s)0 2

-4.492

4.492

0

sineWithNoise

0 0.5 1 1.5 2

-4

-3

-2

-1

0

1

2

3

4

Frequency (Hz)0 30

Soun

d pr

essu

re le

vel (

dB/H

z)

60

80

100

0 5 10 15 20 25 30

fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Spectral analysis

• Below is the waveform for the phone [e] in the word «Here», associated with its spectral analysis

• We notice that some frequencies are more important than others

35

Frequency (Hz)0 6000

Soun

d pr

essu

re le

vel (

dB/H

z)

20

40

60

Time (s)0.4082 0.4394

-0.3454

0.2817

0

untitled

fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Spectral analysis

36fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Spectral analysis

• Finally, we can draw a spectrogram, which shows how the different frequencies making up a waveform change over time

• horizontal axis shows time while vertical axis shows frequencies

• The darkness of a point corresponds to the amplitude of the frequency component at that time

37

Time (s)0 1.707

-0.4086

0.4157

0

untitled

Time (s)0 1.707

0

5000

Freq

uenc

y (H

z)

untitled

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.60

1000

2000

3000

4000

5000

For instance, this frequency range (±1800-4000 Hz) is high at time 0.2 s

fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Spectral analysis

• From the spectrogram, we can view the formants of the speech sound, shown as dark bars

• The formants are denoted F1, F2, ...

• We can e.g. distinguish vowels based on these formants

38

Time (s)1.81 2.530

5000

Freq

uenc

y (H

z)

slice

0

1000

2000

3000

4000

5000

Time (s)13.57 14.390

5000

Freq

uenc

y (H

z)

0

1000

2000

3000

4000

5000

[a] [ae]

fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Outline

• Introduction to phonetics

•Spoken language syntax

• Summary

39fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Spoken language syntax

• The structure of spoken language is markedly different than written language:

• Presence of numerous disfluencies (filled pauses, repetitions, corrections, repairs)

• Meta-communicative dialogue acts, where the speaker comments on his own «performance»

• Non-sentential utterances (NSUs) which need to be interpreted from the context

40fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Disfluencies

• Speakers construct their utterances «as they go», incrementally

• Production leaves a trace in the speech stream

• Presence of multiple disfluencies

• Pauses, fillers («øh», «um», «liksom»)

• Fragments

• repetitions («the the ball»), corrections («the ball err mug»), repairs («the bu/ ball»)

41fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Shriberg’s disfluency model

• Internal structure of a disfluency:

• reparandum: part of the utterance which is edited out

• interregnum: (optional) filler

• repair: part meant to replace the reparandum

Book a ticket to Boston� �� �reparandum

uh I mean� �� �interregnum

to Denver� �� �repair

[Shriberg (1994), «Preliminaries to a Theory of Speech Disfluencies», Ph.D thesis]

42fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Basic examples of disfluencies

• Repetitions

• Corrections:

• Rephrasing/completion:

robot please give me the ball� �� �reparandum

yes� �� �interregnum

the red one on your left� �� �repair

exactly

robot now go to the hallway� �� �reparandum

the hallway� �� �repair

ok and then turn right� �� �reparandum

no sorry I mean� �� �interregnum

left����repair

43fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

General remarks on disfluencies

• All parts of a disfluency may carry meaning relevant for interpretation

• Even filled pauses such as «uh» and «um»

• Levelt: reparandum and repair are of syntactic types that could be joined by a conjunction

• Pervasive phenomena: about 6% of the words in spontaneous speech are «edited»

[Levelt W. (1983), « Monitoring and self-repair in speech», in Cognitive Science.]

44fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

More complex disfluencies

så gikk jeg e flytta vi til Nesøya da begynte jeg på barneskolen der og så har jeg gått på Landøya ungdomsskole # som ligger ## rett over broa nesten # rett med Holmen

jeg gikk på Bryn e skole som lå rett ved der vi bodde den gangen e barneskole videre på Hauger ungdomsskole

da hadde alle hele på skolen skulle liksom # spise julegrøt og det va- det var bare en mandel og da var jeg som fikk den da ble skikkelig sånn " wow # jeg har fått den " ble så glad

45fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Non-sentential utterances

• Non-sentential utterances = utterances lacking an overt predicate

• Also called fragments or elliptical utterances

• Examples:

• «Should I take the ball?» → «yes indeed»

• «Please go the kitchen» → «go where?»

• «Task completed» → «brilliant!»

• «First take left after the corner» → «and afterwards?»

[J. Ginzburg (2008), Semantics for conversation]

46fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Non-sentential utterances

• Pervasive in spoken dialogue:

• ± 30% of the utterances, depending on the corpus

• Why are they so common?

• Their «full» meaning can often be easily retrieved from the context, with very little mental effort

• They make the turns both shorter and more effective (cf. Grice’s Maxim of Quantity)

47fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Non-sentential utterances

• Example from Harold Pinter’s play «Betrayal»:

(a) Emma: We have a flat.(b) Robert: Ah, I see. (Pause) Nice? (Pause) A flat. It’s quite well established then, your . . . uh . . . affair?(c) Emma: Yes.(d) Robert: How long?(e) Emma: Some time.(f) Robert: But how long exactly?(g) Emma: Five years.(h) Robert: Five years?

[J. Ginzburg (2008), Semantics for conversation]

48fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Non-sentential utterances

• The meaning of non-sentential utterances is (practically always) context-dependent

• Their meaning arises through the interaction itself

• This can lead to ambiguities in the resolution:

[R. Fernandez (2006), «Non sentential utterances in dialogue: classification, resolution and use», PhD thesis]

49

1. A: When are they going to open the new main station? B: Tomorrow2. A: They are going to open the station today. B: Tomorrow3. A: They are going to open the station tomorrow. B: Tomorrow

(short answer)

(correction)

(acknowledgement)

fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Outline

• Introduction to phonetics

• Spoken language syntax

•Summary

50fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Summary

• We introduced some basic concepts of phonetics:

• Speech sounds are produced by a complex interaction of several organs (lungs, vocal folds, vocal tract)

• The same vowel or consonant can be pronounced very differently depending on the context

• We can perform acoustic analysis of the speech signal to determine properties such as pitch or loudness

• Using spectral analysis, we can also distinguish different phones (looking at the formants on the spectrogram)

51fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

Summary

• We have also seen that the structure of spoken language can be very different from written language:

• Presence of disfluencies (filled pauses, repairs, repetitions, corrections)

• Non-sentential utterances are omnipresent, and they require contextual interpretation

52fredag 31. august 2012

@ 2012, Pierre Lison - INF5820 course

This Friday

• On Friday, we’ll describe the software architectures for spoken dialogue systems

• What are the main components of a dialogue system?

• How are these components interconnected?

• We’ll also introduce our focus domain: human-robot interaction

53fredag 31. august 2012