Post on 26-Dec-2015
HLT R&D in South Africa
HLT Collaboration between South Africa
and the Low Countries Workshop
24 November 2008
Noordhoek, South Africa
Overview
Specific R&D challenges
Areas of active research: text processing, speech processing, applications of HLT
Main projects: current and recent
Research institutions active in HLT
Main R&D sponsors
Specific R&D Challenges
Incompleteness of basic linguistic knowledge
Scarcity of resources: linguistic data, technology components
Uniqueness of user populations and languages
Research areas (1)
Text processing: computational morphological analysis, POS tagging, spelling checkers, grammar checkers, machine translation, machine-aided translation, computational lexicography, wordnets
Research focus: development of basic required components and tools; data collection and corpus development; technology transfer, cross-language learning, bootstrapping, language distances; morphological analysis (MA) for agglutinative languages
Research areas (2)
Speech processing: automatic speech recognition (ASR), text-to-speech (TTS), spoken dialogue systems; phonetic investigations for HLT; speaker verification, spoken language identification (S-LID); speech tools (diarization, channel normalisation, speech detection)
Research focus: development of basic required components and tools; data collection and corpus development; technology transfer, cross-language learning, bootstrapping, language distances; timing information in speech; multi-accent and multilingual acoustic modelling; higher-order Markov models and other non-standard acoustic models
Research areas (3)
Applications of HLT: telephone-based information systems, computer-assisted language learning, document proofing tools, accessibility devices, mobile devices
Main R&D initiatives
Department of Arts and Culture (DAC): applications that support multilingualism, especially related to government service delivery
DAC A: Spelling checkers
DAC B: Machine-aided translation
DAC C: Lwazi: multilingual telephony-based information delivery
Department of Science and Technology (DST): directed research in HLT aimed at addressing SA national priorities
National HLT Network projects
International collaborative projects
Various individual research projects
Main R&D projects
Text processing:
Computational morphological analysis: Unisa
Spelling checkers: DAC A
Machine translation: EtsaTrans, DAC B
Speech:
Phonetic investigations: NHN PAST
ASR/TTS/spoken dialogue systems: AST, Limpopo ASR, OpenPhone, Lwazi (DAC C)
Mobile E-learning for Africa (MELFA)
UNISA Computational Morphological Analysis
Development of parsing tools for Bantu languages: computational morphological analysers, disambiguators, syntactic parsers
Development of supporting resources for development and testing, including extensive underlying machine-readable lexicons
Status: initiated in 2002 (for the isiZulu morphological analyser); various prototypes under development (isiZulu, isiXhosa, Siswati, isiNdebele, Northern Sotho and Setswana); extended until 2010
Principal researchers: Sonja Bosch (Project Leader), Laurette Pretorius, Ansu Berg, Axel Fleisch, Albert Kotze, Petro Kotze, Memezi Mfusi, Lydia Mojapelo, Rigardt Pretorius, Linda van Huyssteen, Biffy Viljoen
Sponsor: NRF
DAC A: Spelling checkers for public administration domain
Development of spelling checkers for 10 of the official SA languages, specifically for use in government departments. The spelling checkers for isiNdebele, isiXhosa, isiZulu and Siswati include morphological analysers for effective spellchecking of these agglutinative languages.
Status: Final evaluation by client in progress
Principal researchers: MJ Puttkammer (NWU), S Pilon (NWU), DJ Prinsloo (UP), SE Bosch (Unisa)
Sponsor: Department of Arts and Culture, CTexT
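The DAC A slide says that spelling checkers for the agglutinative languages need morphological analysers. The toy sketch below illustrates why, under several assumptions:

- It uses a drastically simplified isiZulu verb template: subject concord + present marker ya + verb root + final vowel a.
- The morpheme list is tiny and hand-made.
- The functions analyse, is_correct and gloss are hypothetical names.
- This is not how the DAC A tools are implemented.

The checker does not store every word form. It accepts a word if the word can be segmented into known morphemes in the permitted order.

```python
from itertools import product

# Toy morpheme inventory, for illustration only (hypothetical, NOT the DAC A lexicon).
# It loosely follows a simplified isiZulu verb template:
#   subject concord + present-tense marker + verb root + final vowel
SUBJECT_CONCORDS = {"ngi": "1sg", "u": "2sg", "si": "1pl", "ni": "2pl", "ba": "3pl"}
TENSE_MARKERS = {"ya": "present"}
ROOTS = {"bon": "see", "hamb": "go", "fund": "learn", "thand": "love", "sebenz": "work"}
FINAL_VOWELS = {"a": "indicative"}


def analyse(word):
    """Return every (concord, tense, root, final vowel) split that rebuilds `word`."""
    return [
        parts
        for parts in product(SUBJECT_CONCORDS, TENSE_MARKERS, ROOTS, FINAL_VOWELS)
        if "".join(parts) == word
    ]


def is_correct(word):
    """Accept a word if the toy analyser finds at least one valid segmentation."""
    return bool(analyse(word.lower()))


def gloss(word):
    """Render the first analysis as a morpheme-by-morpheme gloss, or None if unknown."""
    analyses = analyse(word.lower())
    if not analyses:
        return None
    sc, tm, root, fv = analyses[0]
    return f"{sc}-{tm}-{root}-{fv} ({SUBJECT_CONCORDS[sc]}, {TENSE_MARKERS[tm]}, '{ROOTS[root]}')"


if __name__ == "__main__":
    # The last word is a deliberate typo (transposed letters).
    for w in ["Ngiyabona", "siyahamba", "bayafunda", "siyahmaba"]:
        print(w, is_correct(w), gloss(w))
    # A flat word list needs one entry per combination; the analyser stores
    # only the morphemes plus the ordering rule.
    n_forms = len(SUBJECT_CONCORDS) * len(TENSE_MARKERS) * len(ROOTS) * len(FINAL_VOWELS)
    n_morphemes = len(SUBJECT_CONCORDS) + len(TENSE_MARKERS) + len(ROOTS) + len(FINAL_VOWELS)
    print(n_forms, "word forms from", n_morphemes, "morphemes")
```

Even this toy template yields 25 valid forms from 12 morphemes. Real verbs add further slots, such as object concords, negation and other tense/aspect markers. Each new slot multiplies the number of word forms but only adds to the morpheme inventory. This is why a plain word list scales poorly for these languages.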
EtsaTrans Machine Translation
Development of a functional machine translation system. Focus domain: mainly administrative documents. Main languages: English to Afrikaans, Afrikaans to English. Other languages: English to Xhosa, English to Southern Sotho.
Harvesting previously translated information to create parallel corpora
Status: initiated in 2003, ongoing; prototypes in use
Principal researchers: JA Naudé, L Jordaan
Sponsor: UFS
DAC B: Machine-aided translation tools
Development of translation tools: an integrated translation environment (ITE), word translators, machine translation systems for three language pairs, a terminology management system and a document management system
Status: under development (2007-2010); all tools, data and research output to be made publicly available
Principal researchers: HJ Groenewald, S Pilon (NWU), DJ Prinsloo (UP)
Sponsor: DAC
NHN PAST: Phonetics for Advanced Speech Technology
Technology-orientated investigation and description of the vowel system of the Sotho languages and of tone in the Sotho and Nguni languages
Status: initiated May 2008; due for completion June 2009
Principal researchers: E. Barnard (Meraka), B. Khoali (independent consultant), D. Wissing (NWU), S. Zerbian (Wits)
Sponsor: National HLT Network (DST/Meraka)
African Speech Technologies (AST)
Development of a multilingual telephone-based hotel reservation system. Developed corpora and technology components (TTS, ASR, dialogue systems) for SAE, Afrikaans, isiZulu, isiXhosa and Sesotho.
Status: completed 2004; gave rise to the commercial company Catchword; data available for research purposes (release imminent)
Principal researchers: J.C. Roux, E.C. Botha, J. du Preez and various collaborators
Sponsor: DACST (Innovation Fund)
Limpopo ASR
Development of baseline automatic speech recognition systems for the major languages of the Limpopo Province: Sepedi (Sesotho sa Leboa), Setswana, Tshivenda and Xitsonga. Telephone speech data collection and manual annotation.
Extension to text-to-speech synthesis and domain-specific prototype dialogue systems
Status: baseline ASR systems completed (2004-2006); extension ongoing
Principal researchers: HJ Oosthuizen and MJD Manamela
Sponsor: Telkom and other industry partners
OpenPhone
Demonstrated use of telephone-based information services in providing health information in a rural setting.
Automated health information system that provides information to caregivers looking after HIV-positive children living in the vicinity of Gaborone in Botswana
Includes Setswana TTS and ASR development
Status: Completed 2008, currently live.
http://www.meraka.org.za/hlt_projects_ophone.htm
Principal researchers: Etienne Barnard, Marelie Davel, Madelaine Plauche
Sponsor: OSI/OSISA, DST
Lwazi
Development and piloting of a fully open-source multilingual telephone-based information system: ASR and TTS systems in 11 official languages; ASR and TTS integrated into a telephony platform; open-source resources and tools; various pilots, the first significant pilot being with DPSA Community Development Workers
Status: initiated September 2006; on track for completion September 2009
Principal researchers: Etienne Barnard, Marelie Davel, Gerhard van Huyssteen
Sponsor: DAC
Mobile E-learning for Africa (MELFA)
Mobile solutions for on-site literacy training and skills development for workers in the building and construction industry
Includes text-to-speech and speech-to-speech translation. Initially, 30 test persons in the Western Cape are involved in testing the modules for interactive m-learning.
Status: Initiated in 2007, completing in 2009.
Principal researchers: JC Roux (Project leader, SA), A Visagie, H Engelbrecht, A Magnusdottir, P Scholtz.
Sponsor: Danida (Danish government organisation)
Research institutions: Text

| Institution | Areas of interest | Size¹ | Language focus |
|---|---|---|---|
| UNISA (University of South Africa) | Morphological analysis, POS disambiguation, syntactic parsing | 8/2 | Bantu family languages |
| CTexT (North-West University) | Document proofing tools, machine-aided translation, machine translation, computer-assisted language learning, syntactic parsing | 2/8 | Afrikaans (other official languages, African languages) |
| UP (University of Pretoria) | Morphological analysis, POS disambiguation, syntactic parsing, computational lexicography | 2/0 | Sepedi |
| UWC (University of the Western Cape) | POS disambiguation, computational lexicography, localization, machine translation | 2/x | isiXhosa |
| Wits (University of the Witwatersrand) | Morphological analysis | 1/0 | isiZulu |
| UFS (University of the Free State) | Machine-aided translation, machine translation | 1/0 | English/Afrikaans (Sesotho/English, isiXhosa/English) |

¹ Size: senior researchers / postgraduate students
Research institutions: Speech

| Institution | Areas of interest | Size | Language focus |
|---|---|---|---|
| SU-CLaST (University of Stellenbosch) | ASR, TTS, spoken dialogue systems, speaker verification, S-LID, computer-assisted language learning, machine translation, speech-to-speech translation | 6/6 | SAE, isiXhosa, Afrikaans |
| Meraka (CSIR Meraka Institute) | ASR, TTS, spoken dialogue systems, tone modelling, pronunciation modelling, speaker verification, language distances, channel normalisation, S-LID | 4/15 | All SA official languages |
| Wits (University of the Witwatersrand) | Tone modelling, TTS | 2/1 | Sotho and Nguni languages |
| Limpopo (University of Limpopo) | ASR, TTS, language modelling | 1/2 | Sepedi, Xitsonga, Tshivenda, Setswana |
Main R&D sponsors
Department of Arts and Culture (DAC): applications that support multilingualism, especially related to government service delivery
Department of Science and Technology (DST): directed research in HLT aimed at addressing SA national priorities
National Research Foundation (NRF): support for individual researchers
Industry: addressing industry-specific needs: ASR/TTS (Telkom, Intelleca, IBM, Google and others), spelling checkers (Microsoft), speech processing tools (Grintek, Armscor), speech-to-speech translation (Armscor)
International donor funding: addressing developmental needs: Open Society Initiative (OSI/OSISA), Danish Danida, UK Department for International Development (DfID), Canada's International Development Research Centre (IDRC), and others
Host institutions (universities, CSIR, etc.)