A Mandarin edutainment system integrated virtual learning environments
Available online at www.sciencedirect.com
Speech Communication 55 (2013) 71–83
Yue Ming a,*, Qiuqi Ruan a, Guodong Gao b
a Institute of Information Science, Beijing Jiaotong University, Beijing 100044, PR China
b Beijing Traffic Control Technology Co. Ltd., Beijing 100044, PR China
Received 3 July 2010; received in revised form 1 June 2012; accepted 28 June 2012
Available online 10 July 2012
Abstract
In this paper, a novel Mandarin edutainment system is developed for learning Mandarin in immersive, interactive Virtual Learning Environments (VLE). Our system mainly comprises two parts: speech technology support and virtual 3D game design. First, 3D face recognition technology is introduced to distinguish different learners and provide personalized learning services based on the characteristics of the individuals. Then, a Mandarin pronunciation recognition and assessment scheme is constructed using state-of-the-art speech processing technology. Because Mandarin rhythm differs distinctly from that of Western languages, we integrate prosodic parameters into the recognition and evaluation model to highlight Mandarin characteristics and improve the evaluation performance. In order to promote the engagement of foreign learners, we embed our technology framework into a Virtual Reality (VR) game environment. The character design reflects traditional Chinese culture, and the plots take into account both pronunciation learning and learners' interest, while providing scoring feedback simultaneously. In the experimental design, we first test the correlation of recognition results and machine scores with the different errors and human scores. Then, we evaluate the usability, likeability, and knowledgeability of the whole VLE system. We divide the learners into three categories in terms of their Mandarin levels, and they provide feedback via a questionnaire. The results show that our system can effectively promote foreign learners' engagement and improve their Mandarin level.
© 2012 Elsevier B.V. All rights reserved.
Keywords: Mandarin learning; Pronunciation evaluation; Virtual Reality (VR); Edutainment; Virtual learning environment (VLE); 3D face recognition
This work is supported by the National Natural Science Foundation (60973060), the Research Fund for the Doctoral Program (20080004001), the Beijing Program (YB20081000401), and the Fundamental Research Funds for the Central Universities (2009YJS025). The Informedia Digital Video Understanding Lab at Carnegie Mellon University provided portions of the experimental materials and environments. The authors would also like to thank the Associate Editor and the anonymous reviewers for their helpful comments.
* Corresponding author. Tel.: +86 10 51682936. E-mail address: firstname.lastname@example.org (Y. Ming).
0167-6393/$ - see front matter © 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.specom.2012.06.007
1. Introduction
Computer-assisted language learning (CALL) (Amory et al., 1998; Conati and Zhou, 2002; Kearney, 2004) is a continuously developing topic. Since communication among different countries has become increasingly frequent, more and more people urgently need to grasp one or more foreign languages. Mandarin, as one of the most widely spoken languages, is being given greater attention. With the rapid development of speech processing technology, automatic pronunciation recognition and evaluation have become well suited to learners studying the language by themselves. However, although the traditional speech-based classes have laid a solid foundation for identifying the levels of the learners and correcting incorrect pronunciation, it is easy to become bored with extended learning without any interaction. In order to better simulate real-life communication and provide personalized services, a virtual learning environment (VLE) is merged into the speech technical support, and 3D face recognition engages the learners in live communication according to individual needs. The immersive, interactive 3D virtual games can be designed to promote extensive levels of motivation by providing free-form storylines based on the learners' ages, native languages, and cultural backgrounds. In our system, the learner takes the role of a commander as soon as he or she has been identified by 3D face recognition technology. The learner controls the gestures and behaviors of the virtual characters they build, based on their preferences. They can also communicate with virtual humans via interactive dialogs and actions. In this paper, the complete virtual Mandarin learning environment is constructed, composed of real-time pronunciation processing and the interactive scene design.
First, pronunciation recognition and assessment is an indispensable indicator for real-time language learning. In the last several decades, considerable research has been devoted to this area, providing a solid technical foundation for our system. The SRI speech group (Franco et al., 1997; Neumeyer et al., 2000; Franco et al., 2000; Neumeyer et al., 1996) first proposed a scheme that can verify the overall quality of the learners' pronunciation. The speech groups of Cambridge University and the AI lab of MIT (Witt and Young, 2000; Witt, 1999) developed joint research for the CALL system. Their work can be used to effectively identify pronunciation errors in Western languages, and the evaluation was based on phone-level pronunciation, not the syllable level. The VICK system (Cucchiarini et al., 1998; Cucchiarini et al., 1997), constructed by the University of Nijmegen, extended the evaluation to the prosodic level. They summarized the relationship between human scoring and the effect of prosodic information, including fluency, segmental quality and stress. The results of the investigation showed that prosody is crucial for smooth pronunciation and communication. The research from Tokyo University and Kyoto University (Raux and Kawahara, 2002; Tsubota et al., 2004) analyzed the degree of importance of the different phonemes during language learning. The different kinds of pronunciation errors were also investigated in terms of proficiency. Currently, structural representation has been introduced to help pronunciation assessment, which can effectively reflect the structure and high-level language semantics spoken by non-native learners (Minematsu, 2004; Asakawa et al., 2005). de Wet et al. (2009) exploited a scheme for large-scale oral pronunciation assessment. Their research focused on proficiency and listening comprehension for fairly advanced students of English as a second language.
However, these systems, which are based on Western languages, cannot be used directly for Mandarin learning because of their distinctive differences from Mandarin. A broad investigation of the pronunciation variances between Mandarin and Western alphabetic languages shows that prosodic and tone features play an important role in accurate recognition and evaluation. In our system, we incorporate a prosodic model into the traditional speech recognition framework. In the experimental section, we analyze in detail the detection and evaluation of Mandarin pronunciation in a real-time audio-interaction system.
Our surveys demonstrate that simple audio interaction for Mandarin learning lacks intelligence and interest. For example, learners using the traditional learning system cannot perceive real interactive environments and take the corresponding actions according to their understanding of language knowledge. From the ecological perspective (Hodges, 2009), language generation originates from the interaction between persons and their lived environments. In ecolinguistics, applying an agent's actions can reflect the interrelational transactions between the learners and their simulated environments and convey the culture and policies that sustain a learner's behavior. Speech technology only provides a reduced view of the notion of emergence in terms of precision, rigor and success. The ecological approach (Van Lier, 2000) asserts that the perception and personal behaviors of the learners, and particularly the interaction in which the learners participate, are critical to the conceptual understanding of Mandarin utterances. Immersing the learners in an interactive environment can effectively help them acquire language skills while interacting within the environment.
There have already been a number of research efforts in language education that aim to promote the motivation of learners (Young, 2004; Young et al., 2000). In one prominent example, Lewis Johnson et al. at Alelo, Inc. (Johnson et al., xxxx) have embraced a game method for language learning for many years. In their system, they construct a real-life community by integrating scenario dialogs and cultural common sense, and their technological and pedagogical innovations have delivered quick and economical effectiveness. However, for Mandarin, they have not developed an immersive, interactive VLE that effectively reflects Mandarin characteristics and cultural backgrounds. In this paper, we focus on building a complete Mandarin edutainment system by incorporating Mandarin pronunciation recognition and evaluation, and immersing the student in interactive virtual learning games involving Chinese history and culture.
The rest of this paper is organized as follows: in Section 2, we introduce the outline of our system and summarize the main contributions of our system. Then, a real-time Mandarin recognition and pronunciation evaluation scheme is described in Section 3. After discussing the construction of VLE in Section 4, we propose our Mandarin educational virtual game. In Section 5, we evaluate the performance of our system based on accuracy of pronunciation recognition and assessment, usability, likeability, and knowledgeability of our VLE game design. Finally, we conclude in Section 6.
2. System framework and main contributions
According to previous research (Amory et al., 1998; Conati and Zhou, 2002; Kearney, 2004; Hodges, 2009; Van Lier, 2000; Young, 2004; Young et al., 2000), we focus on the major challenges affecting real-time interactive Mandarin pronunciation learning and present a new system framework in a Virtual Reality (VR) game environment (Virtual Reality is a term that applies to computer-simulated environments that create a lifelike experience and can simulate physical presence in places in the real world, as well as in imaginary worlds, with sound delivered through speakers (Hodges, 2009)). First, 3D face recognition is used to identify the different learners and provide personalized learning materials according to their preferences (Ming and Ruan, 2012). Then, the Mandarin learning system consists of four important elements: speech recognition interface, pronunciation evaluation, virtual game environment, and the plot development, as shown in Fig. 1. In the past few decades, considerable efforts have been devoted to speech recognition, and this technology has become quite mature for standard Mandarin. The speech recognition interface can be introduced to identify the speaking contents of the foreign learners and display the recognition results on the screen for reference. Based on the distinctive characteristics of Mandarin, the pronunciation model used in our proposed system is combined with prosodic parameters. Then, the confidence of pronunciation detection, as the evaluating standard, can be converted to the score of the foreign speaker's Mandarin pronunciation. The results in the experimental section show that the machine scores of pronunciation evaluation in our system are quite close to the human scores.
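The conversion from detection confidence to a pronunciation score can be illustrated with a minimal sketch. The mapping below is an assumption made only for illustration: this section does not state the formula, and the function name `confidence_to_score`, the logistic form, and the `midpoint` and `slope` parameters are hypothetical.

```python
import math


def confidence_to_score(confidence: float,
                        midpoint: float = 0.0,
                        slope: float = 1.0) -> float:
    """Map a real-valued detection confidence to a score between 0 and 100.

    Assumption: the confidence behaves like a log-likelihood ratio, so a
    logistic squashing is a plausible (not the authors') conversion.
    midpoint is the confidence that yields a score of 50, and slope
    controls how quickly the score saturates.
    """
    z = slope * (confidence - midpoint)
    # Clamp the exponent so math.exp cannot overflow on extreme inputs.
    z = max(-50.0, min(50.0, z))
    return 100.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    for c in (-3.0, 0.0, 3.0):
        print(c, round(confidence_to_score(c), 1))
```

In practice, `midpoint` and `slope` would be fitted so that the machine scores track human ratings, which is the comparison reported in the experimental section.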
In our system, the learners are exposed to a VR game environment, which is about a young man's Eastern adventure. At the beginning of the scenario, each learner can design an intelligent virtual character for himself according to his personal preference. The character can be placed in the virtual environment and reflects the real-time communication based on the learner's Mandarin pronunciation. Once the character appears, a magic box is prepared for him, which can take him to an island according to his choices. There is a guardian angel ruling the island, and the angel has a conversation with the character in Mandarin.
Fig. 1. The flow chart of our proposed Mandarin edutainment system.
The score of the pronunciation evaluation contributes to the plot development. The system processes the learner's Mandarin utterance, comparing the recognition results with the standard answers. If the answer spoken by the learner is completely illogical in relation to the content of the angel's question, the angel will repeat the question, and the system will set the output score to 0 for this turn. Otherwise, if the answer is reasonable, the system will assign different scenarios based on the evaluation score. A series of exciting adventures in the East then begins, combining Mandarin pronunciation and cultural backgrounds in a VR game environment. For beginners, the game is relatively easy, and winning the game is quite simple. As Mandarin pronunciation improves, the game difficulty level gradually increases. The pronunciation score threshold can also be updated correspondingly based on the learner's progress, and the relevant rewards can be provided. If no learning progress is being made, the punishment system will begin to work. If consecutive errors, including low pronunciation scores and unreasonable answers, exceed five, the severe punishment will send the virtual character back to the beginning of the game or the last checkpoint. If learners speak and behave quite well, a set of incentive mechanisms will provide special magic cards for reducing the difficulty of the conversation, skipping questions or even entering more interesting scenes. Both the punishment and incentive strategies are used to increase playability and fun in our Mandarin learning edutainment system.
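The scoring and plot rules described above can be sketched as a small state machine. This is an illustrative sketch under stated assumptions, not the authors' implementation: only the zero score for illogical answers and the limit of five consecutive errors come from the text, while the names `LearnerState` and `process_turn`, the starting threshold of 60, the threshold step of 2, and the reward mark of 90 are hypothetical.

```python
from dataclasses import dataclass

MAX_CONSECUTIVE_ERRORS = 5   # from the text: more than five triggers punishment
REWARD_SCORE = 90.0          # hypothetical mark for earning a magic card
THRESHOLD_STEP = 2.0         # hypothetical increase after each success
THRESHOLD_CAP = 95.0         # hypothetical upper bound on the pass mark


@dataclass
class LearnerState:
    threshold: float = 60.0  # hypothetical starting pass mark
    consecutive_errors: int = 0
    magic_cards: int = 0


def process_turn(state: LearnerState, answer_is_reasonable: bool,
                 raw_score: float) -> tuple[float, str]:
    """Apply one question/answer turn and return (score, outcome)."""
    # An illogical answer scores 0 and the angel repeats the question.
    score = raw_score if answer_is_reasonable else 0.0

    if score < state.threshold:
        state.consecutive_errors += 1
        if state.consecutive_errors > MAX_CONSECUTIVE_ERRORS:
            # Severe punishment: back to the last checkpoint (or the start).
            state.consecutive_errors = 0
            return score, "return_to_checkpoint"
        if not answer_is_reasonable:
            return score, "repeat_question"
        return score, "easier_scenario"

    # Success: reset the error run, raise the bar, maybe award a card.
    state.consecutive_errors = 0
    state.threshold = min(THRESHOLD_CAP, state.threshold + THRESHOLD_STEP)
    if score >= REWARD_SCORE:
        state.magic_cards += 1
    return score, "advance"


if __name__ == "__main__":
    learner = LearnerState()
    print(process_turn(learner, True, 95.0), learner)
```

Keeping the threshold inside the learner state lets the difficulty follow each learner's progress, as the text describes, without changing the turn logic.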
In our framework, a novel scheme is proposed to improve the Mandarin pronunciation level in a VR game environment. The main contributions of our scheme can be summarized as follows:
1. Accuracy: Extensive investigation of the differences between Mandarin and Western languages shows that prosodic features are the cornerstones of good pronunciation, especially the four Mandarin tones. A syllable takes its meaning from its sound and tone, which is difficult for foreign learners. In our system, we combine the prosodic model with the classical speech processing framework, which can provide promising results for Mandarin recognition and evaluation. Detailed technology analysis will be given in the following section. The experimental results also verify this point. Once the learners have mastered tones and rhythm, everything else will fall into place and great progress will be made.
2. Usability: In our system,...