
First International Conference on Persian Language and Script Processing, Faculty of Electrical and Computer Engineering, Semnan University
(conf.semnan.ac.ir/uploads/conferance_khat/English/134.pdf)

Comparison of Farsi Vowel Intonation with

Different Languages for Teaching and Preserving

Original Accent

Mohammad Savargiv

Department of Electrical, Computer and IT Engineering,

Islamic Azad University, Qazvin Branch, Iran

[email protected]

Azam Bastanfard

Islamic Republic of Iran Broadcast University

[email protected]

ABSTRACT—Voice intonation is one of the criteria for the proper expression of phonemes and words. In language teaching in particular, applying the appropriate intonation to phonemes and words, regardless of sentence type (interrogative, affirmative, etc.), depends on learning. Comparison and reconstruction of voice intonation are two important challenges in Computer Assisted Language Learning. This paper presents a method based on discrete signal processing. Using this method, the user can see the similarity between the intonation of his or her own voice and that of a source voice. In addition, the durations of the sound and silence intervals of the user's voice are reconstructed according to the source voice, so the user can hear the reconstructed phoneme in his or her own voice. Applications of this method include teaching standard Farsi pronunciation to non-Farsi speakers, speech therapy, animation, and e-learning.

Keywords: Intonation comparison, voice reconstruction.

I. INTRODUCTION

The usual way to learn a spoken language is to listen to speech and imitate its intonation. Given technological advances in computer-aided education and discrete signal processing, teaching the intonation of words and phonemes by computer is a challenging problem in Farsi. This paper first studies voice intonation. Voice intonation belongs to the phonetic system of a language. Together with two related topics, stress and hesitation, intonation forms a set of features that the alphabet cannot represent; these are called suprasegmental features.

However, these features can be indicated to some extent by punctuation: "?" for interrogative sentences, "." for affirmative sentences, "!" for exclamatory sentences, and so on. This is not restricted to Farsi; it is common to almost all languages of the world. A further case that uses no written signs is the intonation of phonemes and words and the way they are pronounced; here the only differences are phoneme duration and loudness.

Intonation has two important aspects. First, phonetic clauses are widely used in conversation; without words or grammatical rules, they convey the emotional state of the speaker [7] (for example: yoo-hoo, wow, ooh). Second, intonation changes meaning. If the intonation at the end is rising, the utterance is understood as interrogative regardless of its grammatical structure. Likewise, if the intonation at the end is falling, the utterance is understood as affirmative even if its structure is interrogative. Hence, from a conversational viewpoint, vowel intonation at the end of sentences is important.

As shown in Table 1, in the falling type the sound starts low, rises slowly, and comes down again until it reaches silence; affirmative intonation usually behaves this way. In the rising type the sound starts low, rises slowly, and does not decrease; interrogative intonation usually behaves this way. This is illustrated in Figure 1.
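The rising/falling distinction described above can be sketched as a simple trend test on a per-frame pitch or energy contour. The following function is an illustrative heuristic, not part of the paper's method; the tail fraction and the mean-comparison rule are assumptions:

```python
def contour_type(contour, tail_frac=0.3):
    """Classify an intonation contour as 'rising' or 'falling' by the
    trend over its final portion. `contour` is a list of pitch or
    energy values, one per analysis frame."""
    n = max(2, int(len(contour) * tail_frac))
    tail = contour[-n:]
    # Simple trend test: compare the mean of the second half of the
    # tail with the mean of the first half.
    mid = len(tail) // 2
    start_mean = sum(tail[:mid]) / mid
    end_mean = sum(tail[mid:]) / (len(tail) - mid)
    return "rising" if end_mean > start_mean else "falling"
```

On a contour that keeps climbing this returns "rising" (interrogative-like); on one that falls back toward silence it returns "falling" (affirmative-like).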

TABLE 1: FORMAL AND PHONETIC SENTENCE TYPES AND THEIR DIFFERENT INTONATIONS, WITHOUT CHANGE IN GRAMMATICAL STRUCTURE.


Figure 1: Solid line: affirmative intonation curve; dashed line: interrogative intonation curve.

The important point is that in different languages, the intonation applied to vowels is influenced by the accent of that language, which in turn differs from other languages and dialects. Hence the way a second language (L2) is spoken is heavily influenced by the native-language accent.

In the following, Section II explains the motivation and main goals of this work. Section III investigates the details of the proposed method. Section IV validates the proposed method using numerical results obtained from the calculations.

II. WORK MOTIVATION AND BACKGROUND

The two main goals of this paper are teaching the correct intonation of words and phonemes in Farsi and, likewise, applying appropriate intonation to a phoneme using discrete signal processing techniques on a computer. In the proposed method, the user listens to the source voice, tries to imitate its intonation, and repeats it to the computer. The computer rates the user's utterance after each attempt; this rating is the similarity between the source voice intonation and the user voice intonation, as a percentage. Then, using sound-modification techniques, the user's voice is reconstructed so that the durations of its sound intervals match the source voice. Details of this process are given in the "Proposed Method" section.

The idea of teaching pronunciation by computer is a subcategory of computer-assisted training [4, 5, 10, 11, 13]. Among previous efforts in computer-assisted language learning is an article by Bastanfard et al. [1] from 2010, which presents software for teaching language to children undergoing speech therapy; phonetic intonation is not considered there. Another article, by Kim [2] in 2006, examines the reliability of automatic speech recognition (ASR) software used to teach English pronunciation.

A well-known technique in this area is Computer Assisted Pronunciation Training (CAPT), presented in 2009 by Felps et al. [11]. It includes several speakers in its database, which helps users improve comprehension of their second language.

In an article by Arias et al. [5] in 2009, the Viterbi alignment technique is used to obtain vowel-duration information. In 2006, Mathur et al. [3] introduced a new technique for increasing or decreasing vowel length, named the "basic model of 44 segments", which uses a time-domain equation to model pressure-wave propagation inside this approximation of the vocal tract.

According to Mathur [3], "intonation helps identify grammatical structure in speech, rather as punctuation does in writing". Accent morphing is also discussed in articles by Vaseghi et al. [6, 8, 9]. The main idea is to examine the first to fifth formants in three dialects (British, American, and Australian English), compare them, and convert each into the others, using the Linear Prediction (LP) technique and a 2D-HMM to estimate the two-dimensional formant curves of vowels in the three dialects. The Hidden Markov Model was the first prominent algorithm to effectively address the major parts of continuous speech [18]. Programs based on current ASR systems rely on template pattern matching and work using dynamic programming or other time-normalization techniques [2].

Techniques such as cepstral coefficients and Mel-frequency cepstral coefficients (MFCC) have been suggested for determining how words are pronounced. Their main problems are vulnerability to noise and incomplete coverage of intonation information: these two techniques extract short-time information from frames, while intonation information is long-time information. To solve these problems, LPC-derived cepstral coefficients have been introduced, but their computational cost is high. Techniques based on linear prediction coefficients and perceptual linear prediction have also been discussed; they perform like MFCC and have similar problems.

Most applications in this area are provided by software companies, have commercial aspects, and use black-box algorithms; they are classified as sound-editing tools. In most research on pronunciation and accent, the reference voice databases are limited to topics related to linguistics. In this paper there is no such restriction, and the main subject is phoneme

Intonation type   Sentence type    Example
Falling           Affirmative      علی به خانه رفت.  /ælі:/be/kha:ne/rΛft/.  ("Ali went home.")
Rising            Interrogative    علی به خانه رفت؟  /ælі:/be/kha:ne/rΛft/?  ("Did Ali go home?")


intonation imitation. In other words, linguistic sound databases can indeed be used as the reference voice, but, as mentioned, intonation imitation by the user is not limited to Farsi, and this technique can be used to teach the intonation of other languages. As stated, the goals of this paper are teaching the original intonation of words and phonemes in Farsi, its application in speech therapy, and also teaching non-Farsi speakers to speak Farsi with a fluent accent. The advantages of this approach are that the similarity between the user's voice intonation and the source voice intonation is shown as a percentage; that the user can see the similarity of the two voices in graphic charts; and that the user can listen to his or her own voice or the source voice an unlimited number of times, issuing the comparison command only when ready to produce a similar intonation. Applications of this technique include teaching the intonation of exclamatory phonemes, vocative phonemes, and so on in Farsi, as well as speech therapy, animation, and e-learning. A further benefit, which comes from using computers for teaching, is a private environment free of the stress of speaking incorrectly in front of people, as in a classroom.

III. PROPOSED METHOD

The suggested steps are implemented as follows. First, the user listens to the source voice and then tries to imitate its intonation; the user's voice is given to the computer as input. After noise reduction and detection of the silence and sound intervals, if the number of intervals in the user voice equals that in the source voice, the similarity between them is calculated. The user can then order the voice to be reconstructed and hear his or her own reconstructed voice. It is important to note that there is no restriction on the choice of source voice, the number of repetitions, or the number of reconstructions. The algorithm can be divided into two steps. Step I: compare the intonation of the two voices and rate their similarity as a percentage. Step II: reconstruct the interval lengths of the user voice according to the source voice.

A. Intonation Comparison

After receiving the user voice and the source voice, noise reduction must be performed, since the user may be in a noisy environment and no special hardware device is used for recording (Figure 2). Then, to determine the number of audio intervals, the silence intervals are removed.
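The silence-removal step can be sketched with a simple frame-energy detector. The paper does not specify its detector, so the frame size and relative threshold below are assumptions:

```python
import numpy as np

def find_audio_intervals(x, sr, frame_ms=20, rel_threshold=0.05):
    """Return (start, end) sample indices of non-silent intervals,
    marking a frame as active when its energy exceeds a fraction of
    the maximum frame energy. A sketch of the silence-removal step;
    the threshold rule is an assumption, not the paper's detector."""
    frame = max(1, int(sr * frame_ms / 1000))
    n = len(x) // frame
    energies = np.array([np.sum(x[i*frame:(i+1)*frame] ** 2) for i in range(n)])
    active = energies > rel_threshold * energies.max()
    intervals, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i * frame                  # interval opens
        elif not a and start is not None:
            intervals.append((start, i * frame))  # interval closes
            start = None
    if start is not None:                      # signal ends mid-interval
        intervals.append((start, n * frame))
    return intervals
```

The number of detected intervals is what the first comparison between the two voices checks.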

The first comparison between the voices is done here: if the numbers of detected audio intervals in the two voices are not equal, the user is asked to repeat, and the next steps are performed only once equality between the numbers of corresponding audio intervals is established. In the next step, as shown in Figure 3, the energy intensity per time unit of each audio interval of the user voice, obtained from formula 1, is compared with the corresponding audio interval of the source voice.

After this calculation, the mean similarity of the voices is shown to the user. The proposed method is shown in Figure 4.
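The comparison step can be sketched as follows. Formula 1 is not reproduced in this text, so energy intensity per time unit is assumed here to be the sum of squared samples divided by the interval duration, and the per-interval score is assumed to be the min/max ratio of the two intensities, averaged and expressed as a percentage:

```python
import numpy as np

def energy_per_unit_time(x, sr):
    """Energy intensity per time unit of one audio interval (assumed
    form of formula 1: sum of squared samples over duration in seconds)."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(x ** 2)) / (len(x) / sr)

def intonation_similarity(src_intervals, usr_intervals, sr):
    """Mean similarity (percent) between corresponding audio intervals.
    Per-pair score = min/max ratio of energy intensities (an assumption)."""
    assert len(src_intervals) == len(usr_intervals), "interval counts differ"
    scores = []
    for s, u in zip(src_intervals, usr_intervals):
        es, eu = energy_per_unit_time(s, sr), energy_per_unit_time(u, sr)
        scores.append(min(es, eu) / max(es, eu))
    return 100.0 * sum(scores) / len(scores)
```

Identical intervals score 100%; an interval with half the source's energy intensity scores 50%.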

Figure 2: The wave charts and spectrogram of the source voice (a) before and (b) after noise reduction.


B. Duration Reconstruction of the Audio Intervals

At this stage, the pattern extracted from the source voice is mapped onto the corresponding user voice intervals. Since the voice reconstruction here concerns length, we need the starting points, endpoints, and durations of the audio intervals. After these calculations, the elasticity coefficient of each audio interval is obtained from formula 2.

Elasticity Coefficient = (length of user audio interval) / (length of corresponding source audio interval)    (2)
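The formula body is missing from this copy of the paper, but the definition above can be recovered from Table 2, where each coefficient equals the non-native interval length divided by the source interval length. A direct check against the first rows of the table:

```python
def elasticity_coefficient(user_len, source_len):
    """Formula 2, as recovered from Table 2: ratio of the user (non-native)
    audio interval length to the corresponding source interval length."""
    return user_len / source_len

# Table 2, row 1: user interval 2025..10643, source interval 7450..16362
c = elasticity_coefficient(10643 - 2025, 16362 - 7450)  # ≈ 0.9670
```

Row 2 likewise gives (13759 − 2940) / (16362 − 7450) ≈ 1.2140, matching the table.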

In the next step, the elasticity coefficient and its corresponding audio interval in the user voice are sent to the SOLAFS function, which stretches the audio interval. The length can increase or decrease; the amount is governed by the elasticity coefficient obtained from the division of interval lengths. This process is carried out for each audio interval of the user voice. Finally, after silent intervals are inserted between the reconstructed voice intervals, the fully reconstructed user voice is obtained. The suggested steps are shown in Figure 5.

C. Signal Duration Change Algorithm

One way to change the duration of an audio signal is the SOLAFS algorithm [15, 16], which works by increasing or decreasing the number of pitch cycles. Its advantage over similar techniques is that SOLAFS does not change the pitch identity of the voice, so the identity of the restored voice remains that of the input voice.

This technique is based on the Synchronized Overlap-Add (SOLA) technique for reconstructing signal duration. It takes as input the user voice interval, the elasticity coefficient, and some other parameters, such as the window length and the amount of overlap, and then changes the length of the audio input. Whether the length increases or decreases depends on the stretch factor. In other words, SOLAFS is a time-domain signal processing method.
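A simplified SOLA-style time stretcher can be sketched as follows. This is a generic overlap-add sketch, not the authors' SOLAFS implementation; the window, hop, and search parameters are illustrative:

```python
import numpy as np

def sola_stretch(x, alpha, frame=1024, sa=512, search=64):
    """Time-stretch x by factor alpha (output length ~ alpha * len(x))
    with a simplified Synchronized Overlap-Add (SOLA): segments taken
    every `sa` samples are re-placed every round(sa * alpha) samples,
    nudged by up to `search` samples to the best waveform alignment,
    then cross-faded, so pitch is preserved."""
    x = np.asarray(x, dtype=float)
    ss = int(round(sa * alpha))          # synthesis hop
    search = min(search, ss - 1)         # keep the output growing each step
    out = x[:frame].copy()
    pos = sa
    while pos + frame <= len(x):
        seg = x[pos:pos + frame]
        target = len(out) - frame + ss   # nominal paste position
        best_k, best_c = 0, -np.inf
        for k in range(-search, search + 1):
            p = target + k
            ov = len(out) - p            # overlap length at this offset
            if ov <= 0 or ov > frame:
                continue
            a, b = out[p:], seg[:ov]
            denom = np.linalg.norm(a) * np.linalg.norm(b)
            c = np.dot(a, b) / denom if denom > 0 else 0.0
            if c > best_c:
                best_c, best_k = c, k
        p = target + best_k
        ov = len(out) - p
        fade = np.linspace(0.0, 1.0, ov)  # linear cross-fade in the overlap
        out[p:] = (1.0 - fade) * out[p:] + fade * seg[:ov]
        out = np.concatenate([out, seg[ov:]])
        pos += sa
    return out
```

Stretching a tone by alpha = 1.5 yields roughly 1.5 times the input length while keeping the waveform period, which is the property the paper relies on to preserve voice identity.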

IV. RESULTS

The numerical results in Table 2 and the diagram in Figure 6 indicate that the lengths of the reconstructed voice intervals and the source voice intervals are exactly equal, and that the reconstructed voice has good audio quality. The presented method is implemented in MATLAB; Figure 7 shows the user interface. In this interface, the spectrogram and waveform diagrams of the user voice and

Figure 4: The first stage of the proposed method.

1. Determine the start point, end point, and length of each audio interval of the user voice and the source voice.

2. Calculate the elasticity coefficient of the audio intervals using formula 2.

3. Send the elasticity coefficient and the corresponding user voice interval to the SOLAFS function.

4. Append the output of the signal-duration-change algorithm to the reconstructed voice.

5. Repeat steps 2 to 4 for each audio interval of the user voice.

6. Insert silent intervals between the reconstructed voice intervals according to the silent intervals in the source voice.

Figure 5: The audio interval duration reconstruction steps.
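The six steps above can be sketched as a loop over matched intervals. `naive_stretch` below is a hypothetical stand-in for SOLAFS, included only to keep the sketch self-contained; interval boundaries are given as (start, stop) sample indices as in Table 2:

```python
import numpy as np

def naive_stretch(x, factor):
    """Hypothetical stand-in for SOLAFS: linear-interpolation resampling
    that changes duration by `factor`. Unlike SOLAFS this alters pitch;
    it is used here only so the sketch runs without a SOLAFS routine."""
    n = int(round(len(x) * factor))
    return np.interp(np.linspace(0, len(x) - 1, n), np.arange(len(x)), x)

def reconstruct_user_voice(user, src_intervals, usr_intervals, time_stretch):
    """Sketch of the six steps of Figure 5: stretch each user audio
    interval to the length of its source counterpart, and re-insert
    the source voice's silent gaps between the intervals."""
    out, prev_src_end = [], 0
    for (s0, s1), (u0, u1) in zip(src_intervals, usr_intervals):
        out.append(np.zeros(s0 - prev_src_end))             # step 6: source-length silence
        coeff = (u1 - u0) / (s1 - s0)                       # step 2: formula 2
        out.append(time_stretch(user[u0:u1], 1.0 / coeff))  # steps 3-4: to source length
        prev_src_end = s1                                   # step 5: next interval
    return np.concatenate(out)
```

Because each interval is stretched by the reciprocal of its elasticity coefficient, the output timeline matches the source voice exactly, as Table 2 reports (every reconstructed interval length equals the source length).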



source voice are displayed, and before issuing a comparison or reconstruction command the user can judge from these diagrams how similar his or her voice intonation is to the source voice intonation.

The design of this interface has three parts: first, eight charts at the top and their associated buttons; second, the comparison button and the numerical result; and third, four charts at the bottom and their associated buttons. In the first part, by clicking the "Source Voice" and "Source Filtering" buttons, the user can hear the source voice in its original and noise-reduced forms and view the related diagrams.

Then, to record a voice, the user clicks the "Input Your Voice" button and has an interval equal to the length of the source voice in which to speak. The "Replay" and "Filtering the Voice" buttons let the user hear his or her voice in the original and noise-reduced forms and view the related diagrams. In this part the user may, an unlimited number of times, enter a voice, listen to it, and view the related diagrams.

The first stage of the proposed method starts with a click on the "Comparison" button, which runs the similarity calculation. If the numbers of audio intervals in the two voices are equal, the diagrams of energy intensity per time unit and the similarity percentage are shown (Figure 3); otherwise an error message is issued and the user is asked to re-enter the voice. The second stage of the proposed method is performed by clicking the "Voice Reconstructing" button. As in the first part of the interface, the "Replay" and "Filtering Reconstructed Voice" buttons make it possible to listen to the reconstructed voice in its original and noise-reduced forms and to view the related diagrams. Details of these stages were described in the proposed method section.

V. CONCLUSIONS AND FUTURE WORK

Voice intonation is one of the criteria for the proper expression of phonemes and words; in other words, intonation plays an important role in conveying meaning and expressing emotional state. Most previous techniques involve heavy statistical calculation and processing.

no   Source voice               Non-native voice           Elasticity    Reconstructed
     start   stop    length     start   stop    length     coefficient   length
1    7450    16362   8912       2025    10643   8618       0.9670        8912
2    7450    16362   8912       2940    13759   10819      1.2140        8912
3    7450    16362   8912       2572    4909    2337       0.2622        8912
4    7450    16362   8912       3845    8284    4439       0.4981        8912
5    7450    16362   8912       3366    8375    5009       0.5621        8912
6    7450    16362   8912       11492   14818   3326       0.3732        8912
7    7450    16362   8912       2099    17831   15732      1.7653        8912
8    7450    16362   8912       5042    15645   10603      1.1897        8912
9    7450    16362   8912       3024    5974    2950       0.3310        8912
10   7450    16362   8912       16061   18725   2664       0.2989        8912


Figure 7: The implementation of the proposed comparison and reconstruction method.

Figure 6: Experimental results diagram

In this paper, by contrast, the first principle is simplicity. The paper presents a new method that uses discrete signal processing and low-volume computation to reduce computational cost and increase performance; its advantage is the use of simpler algorithms instead of complex calculations.

The flowchart in Figure 4 and the algorithm in Figure 5 express this point. As future work, a method could be provided to change the user's voice formants according to the source voice formants.

VI. REFERENCES

[1] A. Bastanfard, N. Attaran Rezaei, M. Mottaghizadeh and M. Fazel, "A Novel Multimedia Educational Speech Therapy System for Hearing Impaired Children", PCM, LNCS 6298, Part II, pp. 705-715, (2010).

[2] I. S. Kim, "Automatic Speech Recognition: Reliability and Pedagogical Implications for Teaching Pronunciation", Educational Technology & Society, 9 (1), pp. 322-334, (2006).

[3] S. Mathur, B. H. Story, J. J. Rodriguez, "Vocal-Tract Modeling: Fractional Elongation of Segment Lengths in a Waveguide Model with Half-Sample Delays", IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, issue 5, pp. 1754-1762, (2006).

[4] N. Moustroufas, V. Digalakis, "Automatic pronunciation evaluation of foreign speakers using unknown text", Computer Speech and Language 21, pp. 219-230, (2007).

[5] J. P. Arias, N. B. Yoma, H. Vivanco, "Automatic Intonation Assessment for Computer Aided Language Learning", Speech Communication, Volume 52, Issue 3, pp. 254-267, (2009).

TABLE 2: THE NUMERICAL RESULTS


[6] S. Vaseghi, Q. Yan, "Analysis, Modelling and Synthesis of Formants of British, American and Australian Accents", ICASSP, pp. 712-715, (2003).

[7] P. Y. Oudeyer, "The production and recognition of emotions in speech: features and algorithms", Int. J. Human-Computer Studies 59, pp. 157-183, (2003).

[8] E. Turajlic, D. Rentzos, S. Vaseghi, C. H. Ho, "Evaluation of Methods for Parametric Formant Transformation in Voice Conversion", ICASSP, pp. 724-727, (2003).

[9] D. Rentzos, S. Vaseghi, Q. Yan, C. H. Ho, "Voice Conversion through Transformation of Spectral and Intonation Features", ICASSP 2004, Proc. Vol. I, pp. 21-24, (2004).

[10] K. Popowski, B. Piorkowska, E. Szpilewski, "Application of vowel allophones transforms for sentence intonation in Polish TTS system", SPECOM'2006, St. Petersburg, pp. 25-29, (2006).

[11] D. Felps, H. Bortfeld, R. Gutierrez-Osuna, "Foreign Accent Conversion in Computer Assisted Pronunciation Training", Speech Communication 51, pp. 920-932, (2009).

[12] M. Eskenazi, "Using Automatic Speech Processing for Foreign Language Pronunciation Tutoring: Some Issues and a Prototype", Language Learning & Technology, Volume 2, Number 2, pp. 62-76, (1999).

[13] Y. Ohkawa, M. Suzuki, H. Ogasawara, A. Ito, S. Makino, "A speaker adaptation method for non-native speech using learners' native utterances for computer-assisted language learning systems", Speech Communication 51, pp. 875-882, (2009).

[14] M. Huckvale, K. Yanagisawa, "Spoken Language Conversion with Accent Morphing", in: Proc. ISCA Speech Synthesis Workshop, Bonn, Germany, pp. 64-70, (2007).

[15] D. Hejna, B. Musicus, "The SOLAFS Time-Scale Modification Algorithm", Technical Report, Bolt Beranek & Newman, (1991).

[16] Zh. Jun, T. Wei, C. Yanpu, G. Yue, "Parameters Evaluation of SOLA Algorithm for Time-scale Modification", International Journal of Speech Technology, Volume 10, Numbers 2-3, pp. 89-94, (2009).

[17] A. Ranjan, R. Balakrishnan, M. Chignell, "Searching in Audio: The Utility of Transcripts, Dichotic Presentation, and Time-compression", CHI 2006 Proceedings, Search & Navigation: Mobiles & Audio, pp. 721-730, (2006).

[18] T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, T. Kitamura, "Duration Modeling for HMM-Based Speech Synthesis", in ICSLP, pp. 939, (1998).

[19] S. Nordrum, S. Erler, D. Garstecki, S. Dhar, "Comparison of Performance on the Hearing in Noise Test Using Directional Microphones and Digital Noise Reduction Algorithms", American Journal of Audiology, Vol. 15, pp. 81-91, (2006).

[20] Y. Huang, J. Benesty, J. Chen, "Analysis and Comparison of Multichannel Noise Reduction Methods in a Common Framework", IEEE Trans. Audio, Speech, Language Processing, vol. 16, no. 5, pp. 957-968, (2008).

[21] R. Brueckmann, A. Scheidig, H. M. Gross, "Adaptive Noise Reduction and Voice Activity Detection for Improved Verbal Human-Robot Interaction Using Binaural Data", IEEE International Conference on Robotics and Automation, Rome, pp. 1782-1787, (2007).

[22] X. Wang, M. J. Munro, "Computer-based training for learning English vowel contrasts", System 32, pp. 539-552, (2004).

[23] P. Jande, "Spoken language annotation and data-driven modelling of phone-level pronunciation in discourse context", Speech Communication 50, pp. 126-141, (2008).

[24] L. Neumeyer, H. Franco, V. Digalakis, M. Weintraub, "Automatic scoring of pronunciation quality", Speech Communication 30, pp. 83-93, (2000).