today’s class the science of talking€¦ · the science of talking: speech is highly variable...
TRANSCRIPT
The Science of Talking:
Speech is highly variable
Rachel Baker
Speech Hearing and Phonetic Sciences, UCL
02/12/2010
Today’s class
• Presentation with short tasks
• Further reading
2
Variability in speech
Between-speaker variability
3
‘The seven scouts came back’
Spk 1
Spk 2
Variability in speech
At the sentence level these differences become even
more striking….
4
Variability in speech
• Between-speaker variability
– Anatomical factors
• Vocal tract dimensions
• Vocal fold length/thickness
– Sociolinguistic factors
• Regional accent
• Other speech production differences linked to gender, age,
‘social group’ membership etc…
– Idiosyncratic/individual differences in production
• ‘jaw-movers’ vs. ‘non jaw-movers’ when producing high/low
vowels
5
Variability in speech
• Within-speaker variability
– Speech rate
– Physical health
– Mental health
– Speaking style: who speaking to, topic of conversation
etc..
– Pathologies (e.g. glossectomy)
6
Communication is a highly dynamic
process…
Maximising
communication
ease
Minimising
speaker effort
Different speaking styles reflect this trade-off
black and white1st
attempt
2nd
attempt
7
Today’s topic
Clear speech
Why bother investigating this?
8
Speaking style: CLEAR SPEECH
In what situations do we need to speak clearly??
Noisy
background
Hearing
impairment
To children
Non-native
speaker of a
language
Formal
occasion, e.g.
presentation,
lecture
9
Reasons for communication difficulties
- Noisy background
- Hearing
impairment
-Public speaking
Degraded audition Reduced language knowledge Reduced attention
- To children
- Non-native speaker
- Public speaking?
- Noisy background
- To children
- Public speaking
10
What are the acoustic-phonetic characteristics
of ‘clear speech’?
Clear
speech?
Approach 1:
Inherently clear
speakers
Approach 2:
‘casual’ vs.
‘clear’ speaking
styles
11
Approach 1: ‘Inherently clear speakers’
1. Recordings of 45 speakers of South-Eastern British
English
2. Listening tests:
• Intelligibility of words in noise
• Subjective ratings of the voice
3. Acoustic-phonetic measurements
4. Correlation between acoustic-phonetic measures and
speech intelligibility [Hazan and Markham, 2004]
Can we find specific acoustic-phonetic features of a voice
that make it particularly clear?
12
Rate the speakers….
13
af-03
am-14
af-06 af-16
am-05am-10
Female speakers
Male speakers
Overall Error rates
SPEAKER
am
-14
af-
03
am
-13
cm-0
6a
f-1
5a
m-1
2cf
-09
cf-0
3a
f-0
7cm
-03
af-
08
am
-17
am
-16
cm-0
1a
m-1
8a
m-0
3cf
-08
cf-0
6a
m-0
6a
f-1
7a
f-1
8a
m-0
2a
m-0
9cm
-02
af-
04
af-
19
am
-05
af-
16
af-
11
af-
13
cm-0
5cf
-04
af-
09
cf-0
1a
f-1
0a
m-1
9cm
-04
am
-07
af-
12
af-
02
af-
21
am
-08
am
-10
af-
14
af-
06
Err
or
rate
(%
)
20
10
0
14
What aspects of the voice are correlated with
speech clarity?
• Amount of energy in mid frequency regions
• Degree of vowel articulation
• Speech rate15
But ‘clear speakers’ can achieve ‘clarity’ in
different ways…
Speaker Intelligibility Energy in
voice
Articulation Speech rate
Af-06 1st 1st = 6th 3rd
Af-14 2nd = 19th = 11th 16th
Am-10 2nd = 9th 40th 17th
16
Can we make our speech clearer?
Voice coaching
• Drama students recorded before and after voice
training
[CDD/RADA/LAMDA SAVEP project, Shewell and colleagues]
Year 1 Year 3
17
What are the characteristics of ‘clear speech’?
Clear
speech?
Approach 1:
Inherently clear
speakers
Approach 2:
‘casual’ vs.
‘clear’ speaking
styles
18
• Read sentence normally
• Read sentence as it talking to someone who is
hearing impaired
Approach 2: comparing speaking ‘normally’
and ‘clearly’
19
Record the following sentence:
- once speaking normally
- once as if speaking to someone who is hearing
impaired
Your turn
“The young children loved the beach”
20
21
casua
l
clear
The young children loved the beach
children - 35ms
children – 85ms
[i]
2984Hz
[i]
2689Hz
What features are enhanced in ‘clear’
speaking styles, relative to ‘casual’ speech?
• Similar features to ‘inherently-clear’ speakers
Clearer vowel articulation
Slower speech rate
Higher energy in mid frequency range
+ Higher pitch and pitch range
22
What are the drawbacks of collecting data
like this?
23
Why is it important to collect interactive
spontaneous clear speech data?
• Read speech might be over/under-estimating clear speech
in an interaction
• Trade-off between communication ease and lack of effort
• Different types of clear speech?
24
Focus: investigating clear speech with
‘communicative intent’
What ways do people modify their speech
when they are in a situation where they are
not easily understood?
25
How can we collect spontaneous speech?
A collaborative task for two people is useful if
you need:
• Clear and casual speaking styles for each
participant
• Similar topics and words in both styles
• High quality recordings
26
One type of task: Diapix task (interactive spot
the difference game)
• Pairs communicate via headsets to find 12 differences
27
Talker
A
Talker BNormal channels
Communication barriers to encourage clear
speech
• Talking to someone with a hearing impairment
Simulation of a cochlear implant
3 channel noise vocoder
• Talking to someone in a noisy background
Multi-talker babble
• Talking to someone who cannot speak or
understand English well
Native Chinese/Korean speaker with low score on English
test28
Do listeners perceive the ‘communication
barrier’ speech as clearer than casual
speech?
29
BABBLE
NB
VOCODER
• Clarity ratings for VOC
and BAB sig. higher
than for NB speech
CLEAR
Are there acoustic-phonetic differences
between the ‘communication barrier’ clear
speech and casual speech?
30
mean
energy
fundamental
frequency
mean word
duration
clear (BAB)y
clear (VOC)y
casual / ‘normal’
speech
clear (L2)y
high
low
F1 and F2
range
What adaptations might be expected from
different communication barriers?
31
‘Barrier’ Effect of barrier on
listener
Helpful
clarifications
Unhelpful
clarifications
• Enhanced vowel
space
• Slower speech rate
Babble • Masking of key
acoustic cues
• Heightened
thresholds
• Higher mid-
frequency energy
• Higher F0/F0 range
• Enhanced vowel
space
• Slower speech rate
• Poor spectral
resolution
• Loss of voicing
information
• Enhanced vowel
space
• Slower speech rate
Noise-
excited
Vocoder
• Higher mid-
frequency energy
• Higher F0/F0 range
Do the data support these predictions?
32
BAB VOC
Helpful dimensions for BAB condition onlyHelpful dimensions for both conditions
% change significantly greater in BAB
than VOC
No significant differences in % change
between BAB and VOC
Can we make our speech clearer?
Artificial enhancements
• Normal sentence
– Slowed-down
– With more mid-frequency energy
– With extended pitch range/mean
33
Further investigations of clear speech
• Clear speech in languages other than English
• At what age are we able to produce clear speech?
• Comparisons of child directed speech and pet
directed speech….
34