Source: demo.clab.cs.cmu.edu/sp2016-11731/slides/langin10/amharic.pdf
What the አማርኛ is Amharic?
Amharic basics
● Working language of Ethiopia's federal government (the only one as of 2016)
○ Also spoken in Eritrea, Canada, the US, Israel, and Sweden
● Originates with the Amhara ethnic group of Ethiopia's Amhara Region
● ~22 million speakers, 14.8 million monolingual
● Semitic language: the second most widely spoken, after Arabic
Fidel
● abugida
○ consonant + vowel = character
● 36 consonants × 7 vowels = 252 characters
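The consonant × vowel grid is mirrored in how Unicode encodes fidel: the Ethiopic block (U+1200 to U+137F) stores most consonants as rows of 8 code points, where slots 0 to 6 are the seven vowel orders and slot 7 often holds an extra (frequently labialized) form. The sketch below splits a character into its consonant row and vowel order with plain arithmetic. It assumes a regular row layout; the `decompose`/`compose` names and the vowel labels in `ORDER_VOWELS` are our own illustrative choices, and irregular rows (some labialized series, numerals, punctuation) are not handled.

```python
# Sketch: decompose fidel via the row-of-8 layout of the Unicode Ethiopic block.
# Assumption: only regular consonant rows are handled; irregular rows,
# numerals and punctuation in the same block are out of scope.

ETHIOPIC_START = 0x1200
ETHIOPIC_END = 0x137F

# Vowel of each of the 7 orders, in the transliteration style of the slides.
ORDER_VOWELS = ["ä", "u", "i", "a", "e", "ə", "o"]


def decompose(ch: str) -> tuple[str, int]:
    """Split a fidel character into (first-order consonant form, vowel order 0-6)."""
    cp = ord(ch)
    if not ETHIOPIC_START <= cp <= ETHIOPIC_END:
        raise ValueError(f"{ch!r} is not in the Ethiopic block")
    row, order = divmod(cp - ETHIOPIC_START, 8)
    if order > 6:
        raise ValueError(f"{ch!r} is an 8th-slot form, not one of the 7 orders")
    base = chr(ETHIOPIC_START + row * 8)  # the first-order form names the consonant
    return base, order


def compose(base: str, order: int) -> str:
    """Inverse of decompose: select vowel order `order` of consonant `base`."""
    return chr(ord(base) + order)


for ch in "መጣ":
    base, order = decompose(ch)
    print(ch, "=", base, "+", ORDER_VOWELS[order])
```

Because each consonant's orders are contiguous, switching the vowel of a syllable is a simple offset within its row.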
Characteristics
እሱ ወደ ከተማ መጣ
Ǝssu wädä kätäma mäṭṭa.
he to city came
'He came to the city.'
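The romanization above writes doubled consonants (Ǝssu, mäṭṭa) that the fidel spelling does not contain. A minimal sketch that romanizes the sentence mechanically, syllable by syllable, makes this visible: the output has single consonants where the slide has geminates. The lookup table is a hypothetical, hand-picked subset covering only the consonants in this sentence, and the glottal onset of the አ row is left unwritten to match the slide.

```python
# Sketch: syllable-by-syllable romanization of the example sentence.
# Assumption: CONSONANTS covers only the rows needed for this one sentence.

ORDER_VOWELS = ["ä", "u", "i", "a", "e", "ə", "o"]

# First-order code point of each consonant row -> Latin consonant.
CONSONANTS = {
    0x1218: "m",       # መ
    0x1230: "s",       # ሰ
    0x1270: "t",       # ተ
    0x12A0: "",        # አ (glottal onset, left unwritten as in the slide)
    0x12A8: "k",       # ከ
    0x12C8: "w",       # ወ
    0x12F0: "d",       # ደ
    0x1320: "\u1e6d",  # ጠ -> ṭ
}


def romanize(text: str) -> str:
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x1200 <= cp <= 0x137F:
            row, order = divmod(cp - 0x1200, 8)
            base = 0x1200 + row * 8
            if base in CONSONANTS and order < 7:
                out.append(CONSONANTS[base] + ORDER_VOWELS[order])
                continue
        out.append(ch)  # spaces and characters outside the table pass through
    return "".join(out)


print(romanize("እሱ ወደ ከተማ መጣ"))
```

Nothing in the script tells the romanizer that the s of እሱ or the ṭ of መጣ is doubled, so that information has to come from a lexicon or a model.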
● SOV word order
● prepositions and genitives precede noun heads; the definite article is a suffix
○ head-final, left-branching
● Three-radical (triconsonantal) roots, typical of Semitic languages
○ Vowel patterns placed between the 3 root consonants derive related words, e.g. a noun from a verb
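Root-and-pattern morphology can be pictured as filling the slots of a template with the three root consonants. The sketch below assumes templates written as strings in which the digits 1, 2 and 3 mark the root slots and every other character is copied unchanged. The root s-b-r and both templates are hypothetical illustrations of the mechanism, not a claim about attested Amharic forms.

```python
# Sketch of root-and-pattern ("three-radical") word formation.
# Assumption: digits 1-3 in a template are slots for the root consonants;
# all other characters (vowels, etc.) are copied as-is.


def interdigitate(root: tuple[str, str, str], template: str) -> str:
    """Fill the digit slots of `template` with the consonants of `root`."""
    out = []
    for ch in template:
        if ch in "123":
            out.append(root[int(ch) - 1])  # slot n takes the n-th radical
        else:
            out.append(ch)
    return "".join(out)


root = ("s", "b", "r")  # hypothetical example root
for template in ["1ä22ä3ä", "1ə23"]:
    print(template, "->", interdigitate(root, template))
```

Note that a template can itself repeat a slot ("22"), so gemination may be part of the pattern rather than of the root.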
Challenges
● no standard romanization
● reordering (SOV vs. English SVO)
● gemination
○ consonant doubling is not written, although it is contrastive, so distinct words share one spelling (homographs)
● implicit articles
● rich morphology
○ Affixes express much of the meaning
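Because affixes carry much of the meaning, MT systems often split words into stem and affixes first. Below is a minimal rule-based stand-in for such segmentation on romanized text: it greedily strips known prefixes, then known suffixes, keeping at least a 2-character stem. The affix lists are a tiny illustrative subset (prepositional prefixes bä-, lä-, kä-, genitive yä-, plural -očč, definite -u), `MIN_STEM` is an arbitrary choice, and the cited MT work may segment quite differently.

```python
# Sketch: greedy affix stripping on romanized Amharic.
# Assumption: PREFIXES/SUFFIXES are a small illustrative subset, not a full
# morphology; MIN_STEM is an arbitrary safeguard.

PREFIXES = ["bä", "lä", "kä", "yä"]  # e.g. bä- 'in/at', yä- genitive
SUFFIXES = ["očč", "u"]              # plural -očč, definite -u
MIN_STEM = 2                          # never leave a stem shorter than this


def segment(word: str) -> list[str]:
    prefixes, suffixes = [], []
    changed = True
    while changed:  # strip prefixes from the left
        changed = False
        for p in PREFIXES:
            if word.startswith(p) and len(word) - len(p) >= MIN_STEM:
                prefixes.append(p + "+")
                word = word[len(p):]
                changed = True
                break
    changed = True
    while changed:  # strip suffixes from the right, outermost first
        changed = False
        for s in SUFFIXES:
            if word.endswith(s) and len(word) - len(s) >= MIN_STEM:
                suffixes.insert(0, "+" + s)
                word = word[: -len(s)]
                changed = True
                break
    return prefixes + [word] + suffixes


print(segment("bäbetočču"))
print(segment("kätäma"))
```

The second call shows the weakness of pure string rules: kätäma ('city', from the example sentence) is wrongly split as kä+ täma, which is why practical segmenters add a lexicon or statistics.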
Previous work
● Word alignment with a distributional approach
● Phrase-based MT with word segmentation
● Teaching NLP in Addis Ababa (future…?)
Resources
● 232,653-word corpus from the European Language Resources Association (ELRA)
○ legal and news domains, cleanly transliterated
● 219,430-word corpus from the Ethiopian Parliament
● Quran, Bible
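The distributional word-alignment approach mentioned above relies on co-occurrence across sentence-aligned text. As a minimal sketch, assuming the Dice coefficient as the association score (one common choice; we do not claim it is the one used in the cited work), each source word is paired with the target word it co-occurs with most consistently. The three-sentence corpus uses hypothetical placeholder tokens, not real data.

```python
# Sketch: Dice-coefficient word association over a sentence-aligned corpus.
# Assumption: toy corpus with placeholder tokens (e* source, a* target).
from collections import Counter
from itertools import product

corpus = [
    ("e1 e2", "a1 a2"),
    ("e1 e3", "a1 a3"),
    ("e2 e3", "a2 a3"),
]


def dice_scores(pairs):
    """Dice(s, t) = 2 * joint(s, t) / (count(s) + count(t)), counted per sentence."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        joint.update(product(s_words, t_words))  # every co-occurring pair
    return {
        (s, t): 2 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in joint.items()
    }


def best_translation(scores, src_word):
    candidates = {t: v for (s, t), v in scores.items() if s == src_word}
    return max(candidates, key=candidates.get)


scores = dice_scores(corpus)
for w in ["e1", "e2", "e3"]:
    print(w, "->", best_translation(scores, w))
```

Correct pairs co-occur in every sentence where either word appears (score 1.0), while wrong pairs share only one sentence (score 0.5); with real corpora the sizes listed above matter because rare words give unreliable counts.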
References
Amharic. Ethnologue. http://www.ethnologue.com/language/amh. Accessed 26 January 2016.
Amharic alphabet, pronunciation and language. Omniglot. http://www.omniglot.com/writing/amharic.htm. Accessed 26 January 2016.
Amsalu, S. 2006. Data-driven Amharic-English bilingual lexicon acquisition. LREC (Genoa, 2006), 281-286.
Amsalu, S. & Gibbon, D. 2005. Finite state morphology of Amharic. In Proceedings of RANLP.
Argaw, A. A. & Asker, L. 2007. An Amharic stemmer: reducing words to their citation forms. In Proceedings of the 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources (Semitic '07). Association for Computational Linguistics, Stroudsburg, PA, USA, 104-110.
Gambäck, B., Eriksson, G. & Fourla, A. 2005. Natural language processing at the school of information studies for Africa. In Proceedings of the Second ACL Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics. Association for Computational Linguistics, Stroudsburg, PA, 49-56.
Language Amharic. The World Atlas of Language Structures Online. http://wals.info/languoid/lect/wals_code_amh. Accessed 26 January 2016.
OPUS: The Open Parallel Corpus. http://opus.lingfil.uu.se/. Accessed 26 January 2016.
Teshome, M. G. & Besacier, L. 2012. Preliminary experiments on English-Amharic statistical machine translation. In SLTU, 36-41.