A Framework for Bangla Text to Speech Synthesis
Conference presentation slides for my paper at the 16th ICCIT conference, 2013.
Authors
K. M. Azharul Hasan, Muhammad Hozaifa, Sanjoy Dutta, Rafsan Zani Rabbi
Presented By
Sanjoy Dutta
Department of Computer Science & Engineering
Khulna University of Engineering and Technology, Khulna, Bangladesh.
Contents
• Problem Statement
• Factors for Speech Synthesis in Bangla
• Proposed Framework
  • Rules and Structure Development
  • Syllable Parser Development
• Audio File Selection and Normalization
• Experimental Analysis & Results
• Conclusion
Problem Statement
• Develop a framework for Bangla Text to Speech Synthesis.
Factors for Speech Synthesis in Bangla
• Sequential flow of diphones
A diphone is a pair of adjacent phonemes in which the transition between the two is modelled, usually from the middle of the first phoneme to the middle of the second.
A phoneme is a sound, or a group of different sounds, perceived to have the same function by speakers of the language or dialect in question. For example, the English /k/ phoneme occurs in "skill" and "school".
• Position vs. Pronunciation
Consonants and vowels occur in three kinds of position:
Consonant-Vowel (CV)
Vowel-Consonant (VC)
Vowel-Consonant-Vowel (VCV)
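The two factors above can be illustrated with a short sketch. It assumes a romanized phoneme list and a hypothetical vowel set (VOWELS), neither of which comes from the slides. It labels each phoneme as a consonant (C) or vowel (V) and lists the adjacent phoneme pairs that a diphone inventory would need to cover.

```python
# Hypothetical romanized vowel symbols, used only for illustration.
VOWELS = {"a", "i", "u", "e", "o"}


def cv_pattern(phonemes):
    """Return a position pattern such as 'CV' or 'VCV' for a phoneme sequence."""
    return "".join("V" if p in VOWELS else "C" for p in phonemes)


def diphones(phonemes):
    """Return the adjacent phoneme pairs in order of occurrence."""
    return list(zip(phonemes, phonemes[1:]))


print(cv_pattern(["k", "a"]))        # CV
print(cv_pattern(["a", "k"]))        # VC
print(cv_pattern(["a", "k", "a"]))   # VCV
print(diphones(["a", "k", "a"]))     # [('a', 'k'), ('k', 'a')]
```

A real Bangla system would work on Bangla script and its own phoneme inventory; the romanized symbols only keep the example readable.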
Proposed Framework Structure and Rules
• Text Normalization:
Transforming text into a single standard form.
Used in text-to-speech conversion to expand numbers, dates, acronyms, and abbreviations.
Text normalization is also applied to handle Position vs. Pronunciation.
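As a rough illustration of the number-expansion part of text normalization, the sketch below replaces each Bangla digit with its spoken name. The function name normalize_digits and the digit-by-digit reading are assumptions made for this example; the rule tables on the slides are not recoverable from this transcript, and a full system would read years and larger numbers as whole values rather than digit by digit.

```python
# Bangla digits mapped to their spoken names (digit-by-digit reading only).
DIGIT_WORDS = {
    "০": "শূন্য", "১": "এক", "২": "দুই", "৩": "তিন", "৪": "চার",
    "৫": "পাঁচ", "৬": "ছয়", "৭": "সাত", "৮": "আট", "৯": "নয়",
}


def normalize_digits(text):
    """Replace each Bangla digit in text with its spoken name."""
    pieces = []
    for ch in text:
        if ch in DIGIT_WORDS:
            # Pad with spaces so adjacent digits become separate words.
            pieces.append(" " + DIGIT_WORDS[ch] + " ")
        else:
            pieces.append(ch)
    # Collapse the extra whitespace introduced by the padding.
    return " ".join("".join(pieces).split())


print(normalize_digits("১২"))
```

Text without digits passes through unchanged apart from whitespace collapsing.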
Normalization rules for ‘ ’
Normalization rules for ‘ - - -’
Syllable Parser Development
Syllable Parser In Action
Audio File Selection and Normalization
Bangla has 39 consonants and 11 vowels in total.
After reduction:
28 independent consonants
8 vowels (the vowel ‘ ’ is the exception)
Finally, 224 (28 × 8) audio files are needed for the syllables.
The 28 consonants are combined with 5 vowels to generate 140 (28 × 5) diphones.
In summary, 401 audio files need to be created: 9 vowels, 28 consonants, 224 syllables, and 140 diphones.
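The audio file count can be checked with a few lines of arithmetic. The variable names are chosen for this example; the figures are the ones given on the slides.

```python
consonants = 28          # independent consonants after reduction
syllable_vowels = 8      # vowels combined with each consonant for syllables
diphone_vowels = 5       # vowels combined with each consonant for diphones
standalone_vowels = 9    # vowel recordings listed in the summary

syllables = consonants * syllable_vowels        # 28 * 8
diphone_files = consonants * diphone_vowels     # 28 * 5
total = standalone_vowels + consonants + syllables + diphone_files

print(syllables, diphone_files, total)  # 224 140 401
```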
Experimental Analysis and Results
Strategy of Analysis:
Sample Input Test: Various news articles from news portals
Listener Selection: Anonymous persons chosen randomly
Accuracy Analysis:
\[ \text{Accuracy} = \frac{\text{Words listeners were able to hear clearly on the 1st attempt} \times 100}{\text{Total no. of words in every sample}} \]
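The accuracy measure can be written as a small function. This is a minimal sketch: the function name and the word counts in the usage line are hypothetical and are not results from the paper.

```python
def accuracy(clear_words, total_words):
    """Percentage of words heard clearly on the first attempt."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return clear_words * 100 / total_words


# Hypothetical counts, for illustration only.
print(accuracy(45, 50))  # 90.0
```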
Experiment Result
Listening Factors:
• Duration Synchronization and Merging
• Numerical values such as years
Constraints in Sample 1:
, , ,
, , ,
Constraints in Sample 2:
, , , , ,
,
Limitations and Future Work
Detect noun and adjective words, namely
( ) Noun and
( ) Adjective.
Both words should follow rule 3(a), but they do not, and their pronunciation is different.
CONCLUSION
We believe the proposed framework can be useful for Bangla TTS development, as it can detect Bangla words with a minimal number of audio files.
Thank You !!!