automatic language identification
TRANSCRIPT
![Page 1: Automatic Language Identification](https://reader033.vdocuments.net/reader033/viewer/2022060123/55967bdc1a28abdb368b45f9/html5/thumbnails/1.jpg)
야옹
miyav miao
mjaumiaou
mnau
meongニャン
meow
miauw
mjá
mjav
meamh
?
![Page 2: Automatic Language Identification](https://reader033.vdocuments.net/reader033/viewer/2022060123/55967bdc1a28abdb368b45f9/html5/thumbnails/2.jpg)
Automatic Language Identification
(LiD)
Alexander Hosford
@awhosford
![Page 3: Automatic Language Identification](https://reader033.vdocuments.net/reader033/viewer/2022060123/55967bdc1a28abdb368b45f9/html5/thumbnails/3.jpg)
The LiD Problem
• The identification of a given spoken language from a speech signal
• 94% of global population speak only 6% of
world languages
![Page 4: Automatic Language Identification](https://reader033.vdocuments.net/reader033/viewer/2022060123/55967bdc1a28abdb368b45f9/html5/thumbnails/4.jpg)
Real World Situations
• Automation of LiD is desirable
• Offers many benefits to international service
industries
• Hotels, Airports, Global Call Centres
![Page 5: Automatic Language Identification](https://reader033.vdocuments.net/reader033/viewer/2022060123/55967bdc1a28abdb368b45f9/html5/thumbnails/5.jpg)
Language Differences
• Languages contain information that makes
one discernable from the other;
– Phonemes
– Prosody
– Phonotactics
– Syntax
![Page 6: Automatic Language Identification](https://reader033.vdocuments.net/reader033/viewer/2022060123/55967bdc1a28abdb368b45f9/html5/thumbnails/6.jpg)
Human Abilities
• Bias towards native language arises in infancy
• Prosodic features some of the first cues to be
recognised
• Humans can make a reasonable estimate on language
heard within 3-‐4 seconds of audition
• Even unfamiliar languages may be plausibly judged
![Page 7: Automatic Language Identification](https://reader033.vdocuments.net/reader033/viewer/2022060123/55967bdc1a28abdb368b45f9/html5/thumbnails/7.jpg)
Previous Attempts
• Attempts as early as 1974 – USAF work, therefore
classified.
• Methodological Investigations as early as 1977 (House &
Neuburg)
• Studies for the most part center on phonotactic constrains
and phoneme modeling
• Raw acoustic waveforms have been visited (Kwasny 1993)
![Page 8: Automatic Language Identification](https://reader033.vdocuments.net/reader033/viewer/2022060123/55967bdc1a28abdb368b45f9/html5/thumbnails/8.jpg)
A Simpler Approach?
• Phonotactic approaches require expert linguistic knowledge
• Phoneme and phonotactic modeling time
consuming
• Given the speed of human LiD abilities,
discrimination most likely based on acoustic
features
![Page 9: Automatic Language Identification](https://reader033.vdocuments.net/reader033/viewer/2022060123/55967bdc1a28abdb368b45f9/html5/thumbnails/9.jpg)
System Overview
Pre-processed Speech Utterance nPVI
MFCC Vectors
Onset Detection
Pitch Contour
WEKA ToolsARFF File Generation
SuperColliderSystemCmd
SuperCollider
![Page 10: Automatic Language Identification](https://reader033.vdocuments.net/reader033/viewer/2022060123/55967bdc1a28abdb368b45f9/html5/thumbnails/10.jpg)
Feature Extraction
• Spectral Information – MFCC Vectors
• Pitch contour information
• Handled by SCMIR Library (Collins, 2010)
• Speech Rhythm – Normalised Pairwise Variability
Index (nPVI)
Feature Type Implemented Measure
Spectral Content MFCC Vector uGen
Pitch Contour Tartini uGen
Speech Rhythm nPVI function
![Page 11: Automatic Language Identification](https://reader033.vdocuments.net/reader033/viewer/2022060123/55967bdc1a28abdb368b45f9/html5/thumbnails/11.jpg)
Classification
• Handled by the WEKA toolkit
• Built in Multilayer Perceptron
• Called from the command line through a
SuperCollider system command
![Page 12: Automatic Language Identification](https://reader033.vdocuments.net/reader033/viewer/2022060123/55967bdc1a28abdb368b45f9/html5/thumbnails/12.jpg)
Comparisons Made
• 66 language pairs from 12 languages
• A comparison within language families
• A comparison of all 12 languages
• Averaged & segmented data
![Page 13: Automatic Language Identification](https://reader033.vdocuments.net/reader033/viewer/2022060123/55967bdc1a28abdb368b45f9/html5/thumbnails/13.jpg)
Results Within Families
*Data averaged across files
20.00
30.00
40.00
50.00
60.00
70.00
80.00
90.00
100.00
4 MFCC 13 MFCC Tartini 41 MFCC Tartini All Features
Germanic
Romantic
Slavic
SinoAltaic
Mean
![Page 14: Automatic Language Identification](https://reader033.vdocuments.net/reader033/viewer/2022060123/55967bdc1a28abdb368b45f9/html5/thumbnails/14.jpg)
Results Within Families
*Data in 5 second segments
20.00
30.00
40.00
50.00
60.00
70.00
80.00
90.00
100.00
4 MFCC 13 MFCC Tartini 41 MFCC Tartini
Germanic
Romantic
Slavic
SinoAltaic
Mean
![Page 15: Automatic Language Identification](https://reader033.vdocuments.net/reader033/viewer/2022060123/55967bdc1a28abdb368b45f9/html5/thumbnails/15.jpg)
Results From All Languages
10.00
20.00
30.00
40.00
50.00
60.00
70.00
80.00
90.00
100.00
4 MFCC 4 MFCC Tartini 4 MFCC Tartini nPVI
13 MFCC 13 MFCC Tartini 13 MFCC Tartini nPVI
41 MFCC 41 MFCC Tartini 41 MFCC Tartini nPVI
Averaged
Mean
5s Segments
Mean
![Page 16: Automatic Language Identification](https://reader033.vdocuments.net/reader033/viewer/2022060123/55967bdc1a28abdb368b45f9/html5/thumbnails/16.jpg)
Extensions
• nPVI function to use vowel onsets
• Phonemic Segmentation
• A Larger dataset
• Better Efficiency
• Real-‐time operation
![Page 17: Automatic Language Identification](https://reader033.vdocuments.net/reader033/viewer/2022060123/55967bdc1a28abdb368b45f9/html5/thumbnails/17.jpg)
Robustness
• Real world signals are very different from
processed ‘clean’ data.
• ‘Ideal’ LiD systems – independent & robust
• A need to analyse only the part of the signal that matters.
![Page 18: Automatic Language Identification](https://reader033.vdocuments.net/reader033/viewer/2022060123/55967bdc1a28abdb368b45f9/html5/thumbnails/18.jpg)
CASA
• A computational modeling of human
‘Auditory Scene Analysis’ (Bregman 1990)
• Separation of signal into component parts
and reconstitution into meaningful ‘streams’
![Page 19: Automatic Language Identification](https://reader033.vdocuments.net/reader033/viewer/2022060123/55967bdc1a28abdb368b45f9/html5/thumbnails/19.jpg)
Using Noisy Signals
I’m open to suggestions!
![Page 20: Automatic Language Identification](https://reader033.vdocuments.net/reader033/viewer/2022060123/55967bdc1a28abdb368b45f9/html5/thumbnails/20.jpg)
[email protected] 07814 692 549 @awhosford