2004 October 25, Vilnius Slide No 1
A Tool: Morphological Analyzer / Synthesizer for Lithuanian
Vytautas Zinkevičius
VDU KLC (Centre of Computational Linguistics, Vytautas Magnus University)
vytas @donelaitis.vdu.
2004 October 25, Vilnius Slide No 2
Introduction
• Importance of the morphology level for the Lithuanian language technologies
• Difficulties caused by Lithuanian morphology
2004 October 25, Vilnius Slide No 3
Lexicon
• Lexis: roots / pseudo roots (70 thousand, from 120,000 lexemes)
• Morphology: combinations of affixes + grammatical meanings provided by the combinations
• Software for accessing the Lexicon
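The split between lexis (roots) and morphology (affix combinations with their grammatical meanings) can be illustrated with a small sketch. The Python below is not the tool's implementation: the tiny `ROOTS` and `ENDINGS` tables, the function names `analyze` and `synthesize`, and the feature labels are assumptions made for illustration, and they cover only a few forms of two masculine words.

```python
# Toy lexicon (hypothetical data, not the tool's real lexicon).
# Lexis: root -> (lemma, part of speech)
ROOTS = {
    "nam": ("namas", "noun"),       # "house"
    "šimt": ("šimtas", "numeral"),  # "hundred"
}

# Morphology: ending -> grammatical meaning carried by that ending
ENDINGS = {
    "as": {"NUMBER": "singular", "CASE": "nominative"},
    "o": {"NUMBER": "singular", "CASE": "genitive"},
    "ai": {"NUMBER": "plural", "CASE": "nominative"},
    "ų": {"NUMBER": "plural", "CASE": "genitive"},
}


def analyze(wordform):
    """Return every root + ending split that both tables accept."""
    word = wordform.lower()
    results = []
    for i in range(1, len(word)):
        root, ending = word[:i], word[i:]
        if root in ROOTS and ending in ENDINGS:
            lemma, pos = ROOTS[root]
            results.append({"LEMMA": lemma, "POS": pos, **ENDINGS[ending]})
    return results


def synthesize(root, number, case):
    """Build a wordform from a known root and a grammatical meaning."""
    if root not in ROOTS:
        return None
    wanted = {"NUMBER": number, "CASE": case}
    for ending, meaning in ENDINGS.items():
        if meaning == wanted:
            return root + ending
    return None


print(analyze("Šimtų"))
print(synthesize("nam", "plural", "genitive"))
```

Analysis and synthesis share the same two tables, which is one reason to keep lexis and morphology separate. A real lexicon would also need stem alternations (for example tūkstantis / tūkstančių), which this sketch ignores.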
2004 October 25, Vilnius Slide No 4
• Creating the Tool
• Problem of Ambiguity
• A Demo of the Tool
http://donelaitis.vdu.lt/~vytas/lemo/angl/lemo_down.htm
2004 October 25, Vilnius Slide No 5
Implementation of the tool in application programs and systems
• Spelling checkers (e.g. Lithuanian spellcheckers for Microsoft Office 97-2000)
• Grammatical tagging of the Lithuanian text corpus at CCL VMU
• Implementation of the Tool in SProUT (a multilingual shallow text processing system, Language Technology Lab, DFKI)
• Compilation of the "Frequency Dictionary of Contemporary Lithuanian" (Grumadienė L., Žilinskienė V., Dažninis dabartinės rašomosios lietuvių kalbos žodynas, Vilnius, 1997-1998)
2004 October 25, Vilnius Slide No 6
Morphological Analysis in SProUT
Lithuanian text:
Šimtų tūkstančių ar milijono ir daugiau metų, per kuriuos atsirado žmogus, procesas vyko toli nuo dabartinės Lietuvos teritorijos.
The Result of the Morphological analysis:
Šimtų TYPE=wordform
    LEMMA={POS=numeral <šimtas>}
    GRAMM_MEANING={POS=numeral + GROUP_OF_NUMERAL=cardinal + GENDER=masculine + NUMBER=plural + CASE=genitive}
tūkstančių TYPE=wordform
    LEMMA={POS=numeral <tūkstantis>}
    GRAMM_MEANING={POS=numeral + GROUP_OF_NUMERAL=cardinal + GENDER=masculine + CASE=genitive}
ar TYPE=wordform
    LEMMA={POS=particle <ar>}
    GRAMM_MEANING={POS=particle <ar>}
    LEMMA={POS=conjunction <ar>}
    GRAMM_MEANING={POS=conjunction <ar>}
    LEMMA={POS=onomatopoeic_interjection <ar>}
    GRAMM_MEANING={POS=onomatopoeic_interjection <ar>}
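The output above shows the ambiguity problem directly: "ar" receives three competing readings (particle, conjunction, onomatopoeic interjection). As a rough sketch of how such output could be consumed downstream, the Python below parses the `GRAMM_MEANING={...}` lines into dictionaries and counts the readings per wordform. The function names `parse_gramm_meaning` and `readings` are hypothetical, and the parsing rules are inferred only from the sample shown on this slide, not from SProUT documentation.

```python
import re


def parse_gramm_meaning(line):
    """Parse 'GRAMM_MEANING={A=x + B=y}' into a feature -> value dict."""
    match = re.search(r"GRAMM_MEANING=\{(.*)\}", line)
    if match is None:
        raise ValueError("not a GRAMM_MEANING line: " + line)
    features = {}
    for part in match.group(1).split("+"):
        name, _, value = part.strip().partition("=")
        # Drop a trailing <lemma> annotation, as in 'POS=particle <ar>'.
        value = re.sub(r"\s*<[^>]*>\s*$", "", value)
        features[name] = value
    return features


def readings(token_lines):
    """Collect one feature dict per GRAMM_MEANING line of a wordform."""
    return [
        parse_gramm_meaning(line)
        for line in token_lines
        if line.strip().startswith("GRAMM_MEANING=")
    ]


# The analysis lines for "ar", copied from the sample output.
ar_lines = [
    "LEMMA={POS=particle <ar>}",
    "GRAMM_MEANING={POS=particle <ar>}",
    "LEMMA={POS=conjunction <ar>}",
    "GRAMM_MEANING={POS=conjunction <ar>}",
    "LEMMA={POS=onomatopoeic_interjection <ar>}",
    "GRAMM_MEANING={POS=onomatopoeic_interjection <ar>}",
]

ar_readings = readings(ar_lines)
print(len(ar_readings), [r["POS"] for r in ar_readings])
```

A wordform with more than one reading needs disambiguation from context before it can be tagged; the analyzer itself only enumerates the candidates.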
2004 October 25, Vilnius Slide No 7
Foundations and projects
• Lithuanian State Science and Studies Foundation: 1994 reg. no. 94-299/4D, contract no. 41; 1995 reg. no. 95-241/7E, contract no. 159.
• "Lithuanian language recognition and generation at morphological level" - in the National Lithuanian language committee program "Lithuanian language in informational society 2000 - 2006"
2004 October 25, Vilnius Slide No 8
The End
Thank You