Philip Harrison
J P French Associates &
Department of Language & Linguistic Science,
York University
IAFPA 2006 Annual Conference
Göteborg, Sweden
Variability of Formant Measurements – Part 2
2
Summary
• Briefly recap previous analysis & last year’s presentation
• New analysis & results
• PhD research
• Questions
3
Study
• Aim: Investigate the variability of formant measurements which exists both within and between different software programs currently used in the field of forensic phonetics.
– 3 programs – Praat, Multispeech & Wavesurfer
– 3 analysis parameters – LPC order, analysis (frame/window) width, pre-emphasis
– Word list – 5 vowel categories – 6 tokens per category – read 3 times – total = 90 tokens
– 2 speakers – Peter French & me
– 2 simultaneous recordings – microphone & telephone
4
Results & Analysis
• Scripts used to obtain 37,260 individual formant measurements using LPC formant trackers
• Analysis – microphone data only
– Initial observations of raw formant data
– Quantitative analysis of results
– Statistical analysis
5
My F1s from Praat – LPC Variation
[Figure: F1 frequency (Hz, axis 0 to 3500) plotted against token number (1 to 90), with one series per LPC order (6, 8, 10, 12, 14, 16, 18). Tokens are grouped by vowel category: FLEECE, TRAP, PALM, GOOSE, SCHWA.]
6
The Plot Shows…
• Scripts work – (used in fault finding)
• Vowel categories clear
• Greatest deviation – LPC orders 6 & 8
• Orders 10 to 18 very similar for FLEECE, GOOSE & SCHWA
• Generated many more plots for all formants, parameters & software
– Lots of variation
– Difficult to interpret
7
Quantitative Analysis
• Quantitative Difference Analysis
– No absolute measurement to compare formants with – outcome of analysis, not directly comparable with acoustic reality
– Difference calculated between value obtained with default analysis settings and value obtained with varied settings
– Absolute difference calculated for each formant then averaged by vowel category
– Shows variation between two analyses
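The difference calculation described above could be sketched roughly as follows. This is only an illustration, not the script used in the study: the function name `mean_abs_diff_by_vowel`, the record layout and the example values are all assumptions.

```python
from collections import defaultdict

def mean_abs_diff_by_vowel(records):
    """Average |varied - default| per vowel category.

    records: iterable of (vowel_category, default_hz, varied_hz) tuples,
    one per token, for a single formant (hypothetical layout).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for vowel, default_hz, varied_hz in records:
        sums[vowel] += abs(varied_hz - default_hz)
        counts[vowel] += 1
    return {vowel: sums[vowel] / counts[vowel] for vowel in sums}

# Invented F1 values (Hz): default settings vs. one varied setting.
example = [
    ("FLEECE", 260.0, 275.0),
    ("FLEECE", 255.0, 250.0),
    ("TRAP", 770.0, 740.0),
    ("TRAP", 765.0, 775.0),
]
result = mean_abs_diff_by_vowel(example)
print(result)
```

Taking the absolute value before averaging stops positive and negative deviations from cancelling, so the figure reflects the size of the variation rather than its direction.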
8
Observations
• Numerical analysis confirmed impression from plots
• Clear differences between vowel categories, speakers, formants, software & settings
• Complex set of results with no clear patterns
9
Statistical Analysis
• Paired t-test between measurements from default settings and varied settings for each vowel category
– Null hypothesis – altering analysis settings has no effect
– Experimental hypothesis – altering analysis settings has an effect
• Number of significant ‘hits’ summed – max 15
• Higher number = greater variation in formant measurements
• 2 significance levels – 0.01 & 0.05
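A minimal sketch of the paired t-test and the counting of significant 'hits' is given below. It uses only the Python standard library and obtains the p-value by numerically integrating the t density; a statistics package would normally do this. All function names and the example values are assumptions, not the analysis actually run.

```python
import math
from statistics import mean, stdev

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value, integrating the density over [0, |t|] by Simpson's rule."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

def paired_t_test(default, varied):
    """Return (t, p) for paired measurements of the same tokens."""
    diffs = [v - d for d, v in zip(default, varied)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, two_sided_p(t, len(diffs) - 1)

def count_hits(p_values, alpha):
    """Number of tests significant at the given level."""
    return sum(p < alpha for p in p_values)

# Invented F1 values (Hz) for five tokens of one vowel category.
default_f1 = [300, 310, 305, 295, 320]
varied_f1 = [310, 322, 313, 309, 329]
t_stat, p_val = paired_t_test(default_f1, varied_f1)
print(round(t_stat, 2), count_hits([p_val], 0.01), count_hits([p_val], 0.05))
```

Running `count_hits` once with 0.01 and once with 0.05 over the p-values from all tests gives the two summed scores.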
10
Conclusions
• Hoped to have clear patterns, able to produce set of guidelines/recommendations
• Patterns only at specific, detailed level
• Very clear that many factors affect formant measurements
• No software is obviously better than others
• Care should be taken when measuring formants
11
New Work!!!
• Initial data contained obviously incorrect measurements
• Discard measurements – criterion?
• Determine acceptable band
– Spectrograms – no
– Formant bandwidths – no (attempted)
– LPC tracker & spectrogram – no (attempted)
– Spectrum of selection – yes but still encountered problems
• Band limit 300 Hz – impressionistic
12
Spectrum Measurements
• Used to determine centre of 300 Hz acceptable band
• Spectrum with 260 Hz bandwidth – same as default spectrogram
• Measured peaks F1, F2 & F3
• Issues/problems
– Windowed -> biased to centre of selection
– Formant peaks not always clear – some tokens ignored
– Double peaks – highest peak measured
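The "highest peak measured" rule for double peaks could be sketched as below. This assumes the spectrum is available as a list of (frequency, level) pairs and that a search range is chosen per formant. The function name `highest_peak`, the search range and the values are hypothetical; the measurements in the study were not necessarily made this way.

```python
def highest_peak(spectrum, low_hz, high_hz):
    """Return (freq, level) of the highest local maximum within [low_hz, high_hz].

    spectrum: list of (frequency_hz, level_db) pairs in ascending frequency.
    Returns None when the range contains no local maximum (token ignored).
    """
    best = None
    for i in range(1, len(spectrum) - 1):
        freq, level = spectrum[i]
        if not (low_hz <= freq <= high_hz):
            continue
        # A local maximum is higher than its left neighbour and not lower than its right.
        if level > spectrum[i - 1][1] and level >= spectrum[i + 1][1]:
            if best is None or level > best[1]:
                best = (freq, level)
    return best

# Invented spectrum slice with a double peak at 250 Hz and 350 Hz.
spectrum = [(200, 30.0), (250, 42.0), (300, 38.0), (350, 45.0), (400, 33.0), (450, 20.0)]
f1_peak = highest_peak(spectrum, 200, 500)
print(f1_peak)
```

Returning `None` when no peak is found mirrors the decision to ignore tokens whose formant peaks were not clear.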
13
Analysis of Accepted Measurements
• Analyse LPC variation only – other parameters more stable – not altered
• No accurate reference which raw measurements can be judged against
• Accepted results provide indication of accuracy & consistency
• Clear patterns in accepted formants
• Condense results – % accepted per vowel category
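The acceptance step and the condensing of results into a percentage could look roughly like this. The sketch assumes the 300 Hz band is the full width, centred on the spectrum measurement (so ±150 Hz); the slides do not state this explicitly. The names and the example values are also assumptions.

```python
from collections import defaultdict

BAND_WIDTH_HZ = 300.0  # assumed to be the full width, i.e. +/- 150 Hz

def is_accepted(lpc_hz, reference_hz, band_width_hz=BAND_WIDTH_HZ):
    """True if the LPC value lies inside the band centred on the spectrum peak."""
    return abs(lpc_hz - reference_hz) <= band_width_hz / 2

def percent_accepted(measurements):
    """Percentage of accepted measurements per (vowel, LPC order).

    measurements: iterable of (vowel, lpc_order, lpc_hz, reference_hz).
    """
    accepted = defaultdict(int)
    total = defaultdict(int)
    for vowel, order, lpc_hz, reference_hz in measurements:
        key = (vowel, order)
        total[key] += 1
        if is_accepted(lpc_hz, reference_hz):
            accepted[key] += 1
    return {key: 100.0 * accepted[key] / total[key] for key in total}

# Invented F1 measurements (Hz) for two LPC orders.
example = [
    ("FLEECE", 10, 265.0, 258.0),
    ("FLEECE", 10, 300.0, 258.0),
    ("FLEECE", 6, 900.0, 258.0),
    ("FLEECE", 6, 270.0, 258.0),
]
summary = percent_accepted(example)
print(summary)
```

Plotting these percentages against LPC order, one series per vowel category, gives one panel per program, speaker, recording and formant.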
14
Plot of Accepted Results
[Figure: Praat, Me, Mic, F1. Percentage accepted (0 to 100) against LPC order (6 to 18 in steps of 2), one series per vowel category: FLEECE, TRAP, PALM, GOOSE, SCHWA.]
15
Me Microphone Accepted
[Figure: 3 × 3 grid of plots, columns Praat, Multispeech, Wavesurfer and rows F1, F2, F3. Each panel shows percentage accepted against LPC order (6 to 18 in steps of 2 for Praat and Multispeech; 10 to 18 in steps of 1 for Wavesurfer).]
16
Me Telephone Accepted
[Figure: 3 × 3 grid of plots, columns Praat, Multispeech, Wavesurfer and rows F1, F2, F3. Each panel shows percentage accepted (0 to 100) against LPC order (6 to 18 in steps of 2 for Praat and Multispeech; 10 to 18 in steps of 1 for Wavesurfer).]
17
JPF Microphone Accepted
[Figure: 3 × 3 grid of plots, columns Praat, Multispeech, Wavesurfer and rows F1, F2, F3. Each panel shows percentage accepted (0 to 100) against LPC order (6 to 18 in steps of 2 for Praat and Multispeech; 10 to 18 in steps of 1 for Wavesurfer).]
18
JPF Telephone Accepted
[Figure: 3 × 3 grid of plots, columns Praat, Multispeech, Wavesurfer and rows F1, F2, F3. Each panel shows percentage accepted (0 to 100) against LPC order (6 to 18 in steps of 2 for Praat and Multispeech; 10 to 18 in steps of 1 for Wavesurfer).]
19
General Patterns
• Praat & Multispeech – bell curves
– Most consistent setting – Praat 10, Multispeech 10 to 14
– Curves shifted to left (lower LPC) for phone
• Wavesurfer – horizontal
– Different behaviour to Praat & Multispeech
– Some very weak results – especially F3
– For me, better results for phone recording (also true for Praat & Multispeech)
• Most consistent setting – Praat LPC 10
• Again variation across vowel category, speaker, formant, software & condition
20
Microphone vs Telephone
• Künzel (2001):
– Landline phone vs microphone
– Largest F1 difference in region of 14% for close vowels
• Byrne & Foulkes (2004):
– GSM mobile phone vs microphone
– F1 average 29% higher for GSM
• Not big differences for F2 & F3
• Current data (spectral comparisons) – only 2 speakers
21
Comparison Tables
Me
        F1    F1 % Diff   F2     F2 % Diff   F3     F3 % Diff
FLEECE  258   26          2171   0           2891   0
TRAP    771   0           1394   1           2632   -1
PALM    690   6           1125   -1          2626   -2
GOOSE   260   33          1748   0           2242   0
SCHWA   502   0           1486   1           2513   -1

JPF
        F1    F1 % Diff   F2     F2 % Diff   F3     F3 % Diff
FLEECE  254   13          2140   0           2551   0
TRAP    661   2           1413   -1          2306   0
PALM    607   6           1037   -1          2439   0
GOOSE   269   11          1105   -1          2222   0
SCHWA   528   1           1330   0           2274   0
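A percentage difference of this kind could be computed as in the sketch below. It assumes the telephone value is expressed relative to the microphone value, which the slide does not state, and uses invented numbers rather than the table values.

```python
def percent_diff(mic_hz, phone_hz):
    """Telephone value relative to the microphone value, in percent (assumed direction)."""
    return 100.0 * (phone_hz - mic_hz) / mic_hz

# Invented example: 250 Hz via microphone, 300 Hz via telephone.
example_diff = percent_diff(250.0, 300.0)
print(round(example_diff))  # 20
```

A positive result means the telephone measurement is higher than the microphone one, the direction reported for F1 in the studies cited earlier.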
22
General Observations
• LPC tracks for phone recordings more stable, easier to measure
– Less ‘information’ above F3
– Possibly pre-filter recordings?
• Different LPC orders produce better tracks for different formants of the same token
– Contradicts my previous advice to keep LPC setting constant across vowel categories
23
PhD Next Steps
• Use synthesised speech
• Formant values specified
• Repeat software experiments
• Other factors to investigate
– Pitch
– Voice quality
– Interaction of analysis parameters
24
Other Potential Areas of Investigation for PhD
• Effects of GSM coding & transmission
• Acoustic environments
• Pseudo-formants – source???
• Mouth/telephone distance & orientation
• Any other ideas…?
25
Questions
?
Thanks to Peter French & Paul Foulkes