Speech Technology Center
Speaker Identification based on the statistical analysis of F0
Pavel Labutin, Sergey Koval, Andrey Raev
St. Petersburg, Russia
24.07.2007 www.speechpro.com
Report overview
The problem of F0 usage in forensic speaker identification
Main challenges
Proposed method
Results
Conclusion
The problem of F0 usage in forensic speaker identification
F0 analysis is an obligatory stage in forensic speaker identification.
Remedial legislation demands that forensic investigation of speech evidence be comprehensive. Because pitch reflects important properties of the human voice, it must be investigated in the forensic examination of a speech recording.
Typical F0 usage in speaker identification:
Automatic F0 detection
Some data smoothing
Simple F0 statistics comparison
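The typical pipeline listed above can be sketched roughly as follows. This is only an illustration and not the authors' software: the median-filter window, the convention that 0 marks an unvoiced frame, and the function names `median_smooth` and `simple_f0_stats` are all assumptions.

```python
import numpy as np

def median_smooth(f0, window=5):
    """Median-smooth an F0 track; unvoiced frames (0) are left as 0."""
    f0 = np.asarray(f0, dtype=float)
    half = window // 2
    out = f0.copy()
    for i in range(len(f0)):
        if f0[i] <= 0:
            continue
        seg = f0[max(0, i - half): i + half + 1]
        seg = seg[seg > 0]              # ignore unvoiced neighbours
        out[i] = np.median(seg)
    return out

def simple_f0_stats(f0):
    """Basic statistics over the voiced frames of an F0 track."""
    voiced = np.asarray(f0, dtype=float)
    voiced = voiced[voiced > 0]
    return {"mean": float(voiced.mean()), "median": float(np.median(voiced)),
            "min": float(voiced.min()), "max": float(voiced.max())}

# Hypothetical track (Hz) with one octave error at index 3.
track = [0, 120, 122, 240, 121, 119, 0, 0, 118, 117]
smoothed = median_smooth(track)
stats = simple_f0_stats(smoothed)
```

In this toy track the smoothing removes the isolated 240 Hz value before the statistics are taken, which is the kind of detection error the later slides address with expert control.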
Main challenges in F0 usage for forensic speaker identification
Fig. 1. F0 curve for a telephone conversation of a suspect. At the 15th second he received important information, and the average F0 rose by 70 Hz. Vertical axis – frequency (Hz), horizontal axis – time (sec).
Low speech quality of real police recordings
SNR is usually below 15 dB
Frequency range is limited
Speech signal distortions (compression, nonlinear frequency response of channel equipment, tape recorders, etc.)
High intra-speaker F0 variability
Strong dependence of F0 statistics on speaker state and speech style
The method discussed
Three stages:
1. Reliable F0 detection
2. F0 detection control and correction
3. F0 statistics analysis and comparison
F0 detection algorithm:
two-pass method;
uses summation of multiple harmonics in the spectral domain;
noise cancellation and adaptation for speech signals of very low quality;
good results in field applications;
implemented in expert software (SIS) and used in real forensic examinations.
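The slides give no further detail on the detector, so the following is only a generic single-pass sketch of harmonic summation in the spectral domain, without the two-pass logic or noise cancellation mentioned above. The function name `harmonic_sum_f0`, the 1/h harmonic weights, the search range, and the FFT size are assumptions chosen for illustration.

```python
import numpy as np

def harmonic_sum_f0(frame, sr, fmin=60.0, fmax=400.0, n_harmonics=5, n_fft=8192):
    """Estimate F0 of one frame by summing spectral magnitude at harmonics."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed, n_fft))
    bin_hz = sr / n_fft
    candidates = np.arange(fmin, fmax, 1.0)      # 1 Hz candidate grid
    scores = np.zeros(len(candidates))
    for k, f in enumerate(candidates):
        for h in range(1, n_harmonics + 1):
            idx = int(round(h * f / bin_hz))
            if idx < len(spectrum):
                # 1/h weighting (an assumption) penalises sub-harmonic candidates
                scores[k] += spectrum[idx] / h
    return float(candidates[int(np.argmax(scores))])

# Synthetic test frame: 150 Hz tone with five harmonics, 8 kHz sampling.
sr = 8000
t = np.arange(512) / sr
frame = sum(np.sin(2 * np.pi * h * 150.0 * t) / h for h in range(1, 6))
f0_est = harmonic_sum_f0(frame, sr)
```

Summing over several harmonics lets the estimate survive when the fundamental itself is attenuated, which matters for band-limited telephone channels.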
Fig. 2. Waveform (upper window) and F0 curve (thin yellow curve) superimposed on the cepstrogram (bottom window). In the cepstrogram [7], the degree of shading corresponds to the degree of signal periodicity at that point in frequency and time. Vertical axis – frequency (Hz), horizontal axis – time (sec).
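How the cepstrogram in SIS is computed is not stated in the slides. As a rough illustration of why a cepstrum reflects periodicity, the sketch below computes the real cepstrum of a single frame and reads the pitch period from its strongest peak; a cepstrogram would stack such frames over time. The function name, window choice, and search range are assumptions.

```python
import numpy as np

def cepstral_periodicity(frame, sr, fmin=60.0, fmax=400.0):
    """Return (f0_hz, peak_height) from the real cepstrum of one frame."""
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-10)
    cepstrum = np.fft.irfft(log_mag)
    q_lo = int(sr / fmax)            # shortest period searched, in samples
    q_hi = int(sr / fmin)            # longest period searched, in samples
    peak = q_lo + int(np.argmax(cepstrum[q_lo:q_hi]))
    return sr / peak, float(cepstrum[peak])

# Synthetic pulse train with an 80-sample period (100 Hz at 8 kHz).
sr = 8000
pulses = np.zeros(1024)
pulses[::80] = 1.0
f0_cep, height = cepstral_periodicity(pulses, sr)
```

The peak height can serve as a periodicity score for that frame, which is one plausible reading of the shading described in the caption.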
F0 detection accuracy control and correction
Fig. 3. Waveform (upper window) and F0 curve (thin yellow line in bottom window). The correspondence between the real F0 and the calculated curve is unknown and uncontrolled. Vertical axis – frequency (Hz), horizontal axis – time (sec).
F0 detection accuracy control and correction
Fig. 4. Waveform (upper window), cepstrogram (signal periodicity function, middle window) and F0 curve (thin yellow curve) superimposed on the cepstrogram (bottom window). In the cepstrogram [7], the degree of shading corresponds to the degree of signal periodicity at that point in frequency and time. Vertical axis – frequency (Hz), horizontal axis – time (sec).
F0 detection accuracy control and correction
Fig. 5. Waveform (upper window), initially detected F0 curve (yellow curve) superimposed on the cepstrogram (middle window), and the F0 curve graphically corrected by the expert, superimposed on the cepstrogram (bottom window). In the cepstrogram [7], the degree of shading corresponds to the degree of signal periodicity at that point in frequency and time. Vertical axis – frequency (Hz), horizontal axis – time (sec).
Statistical F0 features used
Pitch values are transformed to a logarithmic scale, and then statistical pitch features are calculated.
The typical set of statistical parameters:
Average value, Hz;
Maximum, Hz;
Minimum, Hz;
Maximum -3%, Hz;*
Minimum +1%, Hz;
Median, Hz;
Percent of areas with rising pitch, %;*
Pitch logarithm variation;*
Pitch logarithm distribution asymmetry (skewness);*
Pitch logarithm distribution excess (kurtosis);
Average velocity of pitch change, %/sec;
Pitch logarithm derivative variation;
Pitch logarithm derivative distribution asymmetry;
Pitch logarithm derivative distribution excess;
Average velocity of pitch rise, %/sec;*
Average velocity of pitch fall, %/sec.
*The asterisk indicates the statistical features weighted more heavily in the general metric for speaker identification.
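A few of the listed features can be sketched as below. The exact definitions used in SIS are not given in the slides, so several choices here are assumptions: "variation" is taken as the standard deviation of log F0, the log-F0 derivative is used as an approximation of relative change per second, the input is one contiguous voiced stretch, and the 10 ms frame step and all key names are illustrative.

```python
import numpy as np

def log_f0_features(f0_hz, frame_step_s=0.01):
    """A subset of the listed statistics from voiced F0 values (Hz)."""
    f0 = np.asarray(f0_hz, dtype=float)
    log_f0 = np.log(f0)
    d = np.diff(log_f0) / frame_step_s       # approx. relative change per second
    centred = log_f0 - log_f0.mean()
    std = log_f0.std()
    rising, falling = d[d > 0], d[d < 0]
    return {
        "average_hz": float(f0.mean()),
        "median_hz": float(np.median(f0)),
        "log_variation": float(std),
        "log_asymmetry": float((centred ** 3).mean() / std ** 3),
        "log_excess": float((centred ** 4).mean() / std ** 4 - 3.0),
        "percent_rising": float(100.0 * np.mean(d > 0)),
        "avg_rise_pct_per_s": float(100.0 * rising.mean()) if rising.size else 0.0,
        "avg_fall_pct_per_s": float(-100.0 * falling.mean()) if falling.size else 0.0,
    }

# Hypothetical five-frame contour: two rising steps, two falling steps.
features = log_f0_features([100, 110, 121, 110, 100])
```

Working on log F0 makes rises and falls of equal ratio symmetric, so in this toy contour the average rise and fall velocities come out equal.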
General identification metric
The deviation of every statistical parameter was calculated for every file pair in the corpus.
The distributions of the deviations were built for "same–different" (different-speaker) and "same–same" (same-speaker) pairs.
The False Acceptance (FA) and False Rejection (FR) functions and the Equal Error Rate (EER) were calculated for every statistical parameter.
The general identification metric was constructed as a weighted sum of the separate statistical parameters.
The weights were selected to minimize the EER for the given speech database.
For the general weighted metric, the FA and FR curves and the EER were calculated.
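The evaluation steps above can be sketched as follows. This is a minimal illustration under stated assumptions: a pair is accepted as the same speaker when its deviation is at or below a threshold, the EER is read at the threshold where FA and FR are closest, and the weight search that minimises the EER is not shown. The function names and the toy deviations are hypothetical.

```python
import numpy as np

def fa_fr_eer(same_dev, diff_dev):
    """FA and FR curves over thresholds, plus the EER, for one deviation measure."""
    same_dev = np.asarray(same_dev, dtype=float)
    diff_dev = np.asarray(diff_dev, dtype=float)
    thresholds = np.sort(np.concatenate([same_dev, diff_dev]))
    # Accept as "same speaker" when deviation <= threshold (assumption).
    fr = np.array([(same_dev > t).mean() for t in thresholds])    # false rejection
    fa = np.array([(diff_dev <= t).mean() for t in thresholds])   # false acceptance
    i = int(np.argmin(np.abs(fa - fr)))
    return thresholds, fa, fr, float((fa[i] + fr[i]) / 2.0)

def general_metric(deviations, weights):
    """Weighted sum of per-parameter deviations, one row per file pair."""
    return np.asarray(deviations, dtype=float) @ np.asarray(weights, dtype=float)

# Toy deviations for same-speaker and different-speaker pairs.
_, fa, fr, eer = fa_fr_eer([0.1, 0.2, 0.3, 0.9], [0.4, 0.8, 1.0, 1.2])
combined = general_metric([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.25])
```

To evaluate the general metric, the same `fa_fr_eer` function would be applied to the combined scores of same-speaker and different-speaker pairs.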
Speech database used for training and testing
A speaker identification algorithm was developed and trained using the STC corpus RUSTEN.
RUSTEN includes:
126 speakers (67 women and 59 men) in 5 sessions over 5 different analog telephone lines (including public telephones on noisy streets and in underground stations), real spontaneous dialogs;
and 130 speakers (61 women and 69 men) in 2–10 sessions over different digital telephone lines, about 1000 files of high-quality digital phone channel conversations.
RUSTEN: Russian Switched Telephone Network speech database (STC), 2003. S0050, ELDA - Evaluations and Language resources Distribution Agency.
An example of F0 feature detection in SIS software
Fig. 6. An example of a working window of the SIS software showing the results of F0 statistics comparison for two speakers.
Such screenshots are typically inserted into the expert examination conclusion to illustrate F0 statistical analysis results.
Pitch of two files with different average values
Fig. 7. Cepstrograms of two compared speech files. The same speaker with different speech styles. According to the pitch statistical analysis the speakers are the same, although the average pitch values differ significantly: 154 Hz and 135 Hz respectively.
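To give a sense of scale for the two averages in Fig. 7, the gap can be expressed as a relative difference and in semitones. The semitone conversion is the standard musical formula and is not taken from the slides.

```latex
\frac{154 - 135}{135} \approx 0.141 \;(14.1\,\%), \qquad
12 \log_2\!\left(\frac{154}{135}\right) \approx 2.28 \ \text{semitones}
```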
Results of method testing

Tonal speech duration. Each cell lists All / Men / Women.

Test      10 sec template       20 sec template       40 sec template       80 sec template
10 sec    17.7 / 25.2 / 26.6
20 sec    16.7 / 23.7 / 24.9    15.2 / 21.7 / 22.6
40 sec    16.1 / 23.0 / 23.8    14.4 / 20.6 / 21.1    13.2 / 19.1 / 19.0
80 sec    15.6 / 22.1 / 23.1    13.6 / 19.5 / 19.8    12.3 / 17.8 / 17.5    10.9 / 16.2 / 15.0

Table 1 shows the results of speaker identification using F0 statistical analysis. The test database includes about 1600 speech files from 256 speakers: real dialogs over the public telephone network, on both analog and digital channels.
Results of speaker discrimination using only the averaged F0 value

Tonal speech duration. Values are listed by increasing template duration (10, 20, 40, 80 sec).

Test      Group   Values
10 sec    Men     32.0
20 sec    Men     31.1  30.1
40 sec    Men     30.5  30.1
80 sec    Men     30.1  28.8  27.9  27.5
80 sec    All     17.4 (80 sec template)

Table 2 shows the results of speaker identification using only one commonly used F0 feature: the average F0 value. The test database includes about 1600 speech files from 256 speakers: real dialogs over the public telephone network, on both analog and digital channels.
An example of FA and FR curves. Average F0
An example of FA and FR curves. F0 min + 3%
An example of FA and FR curves. General metric
CONCLUSION
A method of forensic speaker identification based on the statistical analysis of F0 is described.
The reliability of the method was tested on a large amount of real telephone conversation material.
A method that works well on real forensic speech recordings is described for detecting F0 and for checking and correcting the detected F0 curve.
The method is implemented in expert software (SIS) and used in everyday forensic examination practice.
PERSPECTIVES
The same method of statistical F0 analysis is used for the diagnostics of an unknown speaker's anthropometric features, such as age, height, weight, etc.
Preliminary results are promising.
In addition to the statistical F0 analysis, we propose that experts perform a detailed structural analysis of the F0 curve.
In particular, to measure the maximum, minimum, range and timing of F0 movement within the accented syllable of a phrase or within voiced hesitation pauses.
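The structural measurements proposed above could look roughly like this for one labelled segment. The slides do not define "timing", so it is taken here as the offset of the F0 extremes from the segment start; the segment boundaries, the use of 0 for unvoiced frames, and the function name are likewise assumptions.

```python
import numpy as np

def segment_f0_measures(times_s, f0_hz, start_s, end_s):
    """Max, min, range and timing of F0 inside one labelled segment."""
    t = np.asarray(times_s, dtype=float)
    f0 = np.asarray(f0_hz, dtype=float)
    mask = (t >= start_s) & (t <= end_s) & (f0 > 0)    # voiced frames only
    seg_t, seg_f0 = t[mask], f0[mask]
    i_max, i_min = int(np.argmax(seg_f0)), int(np.argmin(seg_f0))
    return {
        "max_hz": float(seg_f0[i_max]),
        "min_hz": float(seg_f0[i_min]),
        "range_hz": float(seg_f0[i_max] - seg_f0[i_min]),
        "time_of_max_s": float(seg_t[i_max] - start_s),
        "time_of_min_s": float(seg_t[i_min] - start_s),
    }

# Hypothetical accented syllable between 0.005 s and 0.075 s, 10 ms frames.
times = np.arange(10) * 0.01
contour = [0, 100, 105, 112, 120, 118, 110, 104, 0, 0]
measures = segment_f0_measures(times, contour, 0.005, 0.075)
```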
Thank you for your attention