A Mandarin edutainment system integrated virtual learning environments

Yue Ming a,*, Qiuqi Ruan a, Guodong Gao b

a Institute of Information Science, Beijing Jiaotong University, Beijing 100044, PR China
b Beijing Traffic Control Technology Co. Ltd., Beijing 100044, PR China

Received 3 July 2010; received in revised form 1 June 2012; accepted 28 June 2012
Available online 10 July 2012

Available online at www.sciencedirect.com
Speech Communication 55 (2013) 71-83

Abstract

In this paper, a novel Mandarin edutainment system is developed for learning Mandarin in immersive, interactive Virtual Learning Environments (VLE). Our system is mainly comprised of two parts: speech technology support and virtual 3D game design. First, 3D face recognition technology is introduced to discriminate between different learners and provide personalized learning services based on the characteristics of the individuals. Then, a Mandarin pronunciation recognition and assessment scheme is constructed using state-of-the-art speech processing technology. Because Mandarin rhythm differs distinctively from that of Western languages, we integrate prosodic parameters into the recognition and evaluation model to highlight Mandarin characteristics and improve the evaluation performance. In order to promote the engagement of foreign learners, we embed our technology framework into a Virtual Reality (VR) game environment. The character design reflects traditional Chinese culture, and the plots effectively give consideration to both pronunciation learning and learners' interest, while providing scoring feedback simultaneously. In the experimental design, we first test the correlation of recognition results and machine scores with the different errors and human scores. Then, we evaluate the usability, likeability, and knowledgeability of the whole VLE system. We divide the learners into three categories in terms of their Mandarin levels, and they provide feedback via a questionnaire. The results show that our system can effectively promote foreign learners' engagement and improve their Mandarin level.
© 2012 Elsevier B.V. All rights reserved.

Keywords: Mandarin learning; Pronunciation evaluation; Virtual Reality (VR); Edutainment; Virtual learning environment (VLE); 3D face recognition

This work is supported by National Natural Science Foundation (60973060), the Research Fund for the Doctoral Program (20080004001) and Beijing Program (YB20081000401) and the Fundamental Research Funds for the Central Universities (2009YJS025). The Informedia digital video understanding lab at Carnegie Mellon University provided portions of the experimental materials and environments. The authors would also like to thank the Associate Editor and the anonymous reviewers for their helpful comments.
* Corresponding author. Tel.: +86 10 51682936. E-mail address: myname35875235@126.com (Y. Ming).

0167-6393/$ - see front matter © 2012 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.specom.2012.06.007

1. Introduction

Computer-assisted language learning (CALL) (Amory et al., 1998; Conati and Zhou, 2002; Kearney, 2004) is a continuously developing topic. As communication among different countries has become increasingly frequent, more and more people urgently need to grasp one or more foreign languages. Mandarin, as one of the most widely spoken languages, is being given greater attention. With the rapid development of speech processing technology, automatic pronunciation recognition and evaluation make it practical for learners to study the language by themselves. However, although the traditional speech-based classes have laid a solid foundation for identifying the levels of the learners and correcting incorrect pronunciation, it is easy to become bored with extended learning without any interaction. In order to better simulate real-life communication and provide personalized services, a virtual learning environment (VLE) is merged into the speech technical support, and 3D face recognition engages the learners in live communication according to individual needs. The immersive, interactive 3D virtual games can be designed to promote extensive levels of motivation by providing free-form storylines based on the learners' ages, native languages, and cultural backgrounds. In our system, the learner takes the role of a commander as soon as the learner has been identified by 3D face recognition technology. The learner controls the gestures and behaviors of the virtual characters built by them, based on their preferences. They can also communicate with virtual humans via interactive dialogs and actions. In this paper, the complete virtual Mandarin learning environment is constructed, composed of real-time pronunciation processing and the interactive scene design.

First, pronunciation recognition and assessment is an indispensable indicator for real-time language learning. In the last several decades, considerable research has been devoted to this area, providing a solid technical foundation for our system. The SRI speech group (Franco et al., 1997; Neumeyer et al., 2000; Franco et al., 2000; Neumeyer et al., 1996) first proposed a scheme which can verify the overall quality of the learner's pronunciation. The speech groups of Cambridge University and the AI lab of MIT (Witt and Young, 2000; Witt, 1999) developed joint research for the CALL system. Their work can be used to effectively identify the pronunciation errors of Western languages, and the evaluation was based on phone-level pronunciation, not the syllable level. The VICK system (Cucchiarini et al., 1998; Cucchiarini et al., 1997), constructed by the University of Nijmegen, extended the evaluation to the prosodic level. They summarized the relationship between human scoring and the effect of prosodic information, including fluency, segmental quality and stress. The results of the investigation showed that prosody is crucial for smooth pronunciation and communication. The research from Tokyo University and Kyoto University (Raux and Kawahara, 2002; Tsubota et al., 2004) analyzed the degree of importance of the different phonemes during language learning. The different kinds of pronunciation errors were also investigated in terms of proficiency. Currently, structural representation has been introduced to help pronunciation assessment, which can effectively reflect the structure and high-level language semantics spoken by non-native learners (Minematsu, 2004; Asakawa et al., 2005). de Wet et al. (2009) exploited a scheme for large-scale oral pronunciation assessment. Their research focused on proficiency and listening comprehension for fairly advanced students of English as a second language. However, these systems, which are based on Western languages, cannot directly be used for Mandarin learning because Mandarin differs distinctively from them. A wide investigation of the pronunciation variances between Mandarin and Western alphabetic languages shows that prosodic and tone features play an important role in accurate recognition and evaluation. In our system, we incorporate a prosodic model into the traditional speech recognition framework. In the experimental section, we give a detailed analysis of the detection and evaluation of Mandarin pronunciation in a real-time audio-interaction system.

Our surveys demonstrate that simple audio interaction for Mandarin learning lacks intelligence and interest. For example, learners using the traditional learning system cannot perceive real interactive environments and take the corresponding actions according to their understanding of language knowledge. From the ecological perspective (Hodges, 2009), language generation originates from the interaction between persons and their lived environments. In ecolinguistics, an agent's actions can reflect the interrelational transactions between the learners and their simulated environments, and help in understanding the culture and policies that shape a learner's behavior. Speech technology only provides a reduced view of the notion of emergence in terms of precision, rigor and success. The ecological approach (Van Lier, 2000) asserts that the perception and personal behaviors of the learners, and particularly the interaction in which the learners participate, are critical to the conceptual understanding of Mandarin utterances. Immersing the learners in an interactive environment can effectively help them acquire language skills while interacting within the environments.

There have already been a number of research efforts in language education that aim to promote the motivation of learners (Young, 2004; Young et al., 2000). As one prominent example, Lewis Johnson et al. at Alelo, Inc. (Johnson et al., xxxx) have embraced a game method for language learning for many years. In their system, they construct a real-life community by integrating scenario dialogs and cultural common sense, and their technological and pedagogical innovations have proven quick and economical. However, for Mandarin, they have not developed an immersive, interactive VLE that effectively reflects Mandarin characteristics and cultural backgrounds. In this paper, we focus on building a complete Mandarin edutainment system by incorporating Mandarin pronunciation recognition and evaluation, and immersing the student in interactive virtual learning games involving Chinese history and culture.

The rest of this paper is organized as follows: in Section 2, we introduce the outline of our system and summarize the main contributions of our system. Then, a real-time Mandarin recognition and pronunciation evaluation scheme is described in Section 3. After discussing the construction of VLE in Section 4, we propose our Mandarin educational virtual game. In Section 5, we evaluate the performance of our system based