A Tool: Morphological Analyzer / Synthesizer for Lithuanian

Posted on 30-Dec-2015

2004 October 25, Vilnius Slide No 1

A Tool: Morphological Analyzer / Synthesizer for Lithuanian

Vytautas Zinkevičius

VDU KLC

vytas@donelaitis.vdu

2004 October 25, Vilnius Slide No 2

Introduction

• Importance of the morphology level for the Lithuanian language technologies

• Difficulties caused by Lithuanian morphology

2004 October 25, Vilnius Slide No 3

Lexicon

• Lexis: 70 thousand roots / pseudo roots from 120,000 lexemes

• Morphology: combinations of affixes + grammatical meanings provided by the combinations

• Software for accessing the Lexicon
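The lexicon organization on this slide (roots on one side, affix combinations with their grammatical meanings on the other) can be illustrated with a minimal Python sketch. It is not the tool's actual data format or algorithm: the dictionaries `ROOTS` and `ENDINGS`, the functions `analyze` and `synthesize`, and the handful of entries are hypothetical and chosen only for illustration. The sketch also ignores stem alternations and prefixes.

```python
# Toy illustration only: the entries and the data layout are assumptions,
# not the contents or format of the actual lexicon.
ROOTS = {
    "šimt": {"lemma": "šimtas", "pos": "numeral"},
    "nam": {"lemma": "namas", "pos": "noun"},
}

# Each ending maps to the grammatical meanings it can express.
ENDINGS = {
    "as": [{"number": "singular", "case": "nominative"}],
    "o": [{"number": "singular", "case": "genitive"}],
    "ų": [{"number": "plural", "case": "genitive"}],
}


def analyze(wordform):
    """Return every reading obtained by splitting the word into root + ending."""
    word = wordform.lower()
    readings = []
    for split in range(1, len(word)):
        root, ending = word[:split], word[split:]
        if root in ROOTS and ending in ENDINGS:
            for meaning in ENDINGS[ending]:
                readings.append({**ROOTS[root], **meaning})
    return readings


def synthesize(root, number, case):
    """Return every wordform of a known root that carries the requested meaning."""
    if root not in ROOTS:
        return []
    wanted = {"number": number, "case": case}
    return [root + ending for ending, meanings in ENDINGS.items()
            if wanted in meanings]


print(analyze("Šimtų"))
print(synthesize("nam", "plural", "genitive"))
```

Analysis and synthesis share the same two tables here, which is the point of the sketch: one lexicon serves both directions.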

2004 October 25, Vilnius Slide No 4

• Creating the Tool

• Problem of Ambiguity

• A Demo of the Tool

http://donelaitis.vdu.lt/~vytas/lemo/angl/lemo_down.htm

2004 October 25, Vilnius Slide No 5

Implementation of the tool in application programs and systems

• Spelling Checkers (e.g. Lithuanian Spellcheckers for Microsoft Office’97-2000)

• Grammatical tagging of the Lithuanian text corpus at CCL VMU

• Implementation of the Tool in SProUT (a multilingual shallow text processing system, Language Technology Lab, DFKI)

• Used in the process of compiling the "Frequency Dictionary of Contemporary Lithuanian" (Grumadienė L., Žilinskienė V., Dažninis dabartinės rašomosios lietuvių kalbos žodynas, - Vilnius, 1997-1998).

2004 October 25, Vilnius Slide No 6

Morphological Analysis in SProUT

Lithuanian text:

Šimtų tūkstančių ar milijono ir daugiau metų, per kuriuos atsirado žmogus, procesas vyko toli nuo dabartinės Lietuvos teritorijos.

The Result of the Morphological analysis:

Šimtų TYPE=wordform

LEMMA={POS=numeral <šimtas>}

GRAMM_MEANING={POS=numeral + GROUP_OF_NUMERAL=cardinal + GENDER=masculine + NUMBER=plural + CASE=genitive}

tūkstančių TYPE=wordform

LEMMA={POS=numeral <tūkstantis>}

GRAMM_MEANING={POS=numeral + GROUP_OF_NUMERAL=cardinal + GENDER=masculine + CASE=genitive}

ar TYPE=wordform

LEMMA={POS=particle <ar>}

GRAMM_MEANING={POS=particle <ar>}

LEMMA={POS=conjunction <ar>}

GRAMM_MEANING={POS=conjunction <ar>}

LEMMA={POS=onomatopoeic_interjection <ar>}

GRAMM_MEANING={POS=onomatopoeic_interjection <ar>}
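The output shown on this slide can be read into a simple data structure, which also makes the ambiguity of a form such as "ar" (three readings) easy to detect. The following is a minimal sketch, assuming the layout is exactly as printed above: a `form TYPE=...` line, followed by alternating `LEMMA={...}` and `GRAMM_MEANING={...}` lines. The names `SAMPLE`, `parse_features`, and `parse_analysis` are hypothetical and not part of SProUT or the tool.

```python
import re

# Two tokens copied from the slide; the parser assumes this exact layout.
SAMPLE = """\
Šimtų TYPE=wordform
LEMMA={POS=numeral <šimtas>}
GRAMM_MEANING={POS=numeral + GROUP_OF_NUMERAL=cardinal + GENDER=masculine + NUMBER=plural + CASE=genitive}
ar TYPE=wordform
LEMMA={POS=particle <ar>}
GRAMM_MEANING={POS=particle <ar>}
LEMMA={POS=conjunction <ar>}
GRAMM_MEANING={POS=conjunction <ar>}
LEMMA={POS=onomatopoeic_interjection <ar>}
GRAMM_MEANING={POS=onomatopoeic_interjection <ar>}
"""

TOKEN_RE = re.compile(r"^(\S+) TYPE=(\w+)$")
LEMMA_RE = re.compile(r"^LEMMA=\{POS=(\w+) <(.+)>\}$")
MEANING_RE = re.compile(r"^GRAMM_MEANING=\{(.*)\}$")


def parse_features(body):
    """Turn 'A=x + B=y' into {'A': 'x', 'B': 'y'}."""
    features = {}
    for part in body.split(" + "):
        # Drop a trailing '<lemma>' if present, as in 'POS=particle <ar>'.
        key, _, value = part.split(" <")[0].partition("=")
        features[key.strip()] = value.strip()
    return features


def parse_analysis(text):
    """Group LEMMA / GRAMM_MEANING pairs under the token line that precedes them."""
    tokens = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if m := TOKEN_RE.match(line):
            tokens.append({"form": m.group(1), "type": m.group(2), "readings": []})
        elif m := LEMMA_RE.match(line):
            tokens[-1]["readings"].append(
                {"pos": m.group(1), "lemma": m.group(2), "features": {}})
        elif m := MEANING_RE.match(line):
            tokens[-1]["readings"][-1]["features"] = parse_features(m.group(1))
    return tokens


for token in parse_analysis(SAMPLE):
    print(token["form"], len(token["readings"]), "reading(s)")
```

A token with more than one entry in `readings` is morphologically ambiguous and would need disambiguation from context.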

2004 October 25, Vilnius Slide No 7

Foundations and projects

• Lithuanian State Science and Studies Foundation: 1994 reg. no. 94-299/4D, contract no. 41; 1995 reg. no. 95-241/7E, contract no. 159.

• "Lithuanian language recognition and generation at morphological level" - in the National Lithuanian language committee program "Lithuanian language in informational society 2000 - 2006"

2004 October 25, Vilnius Slide No 8

The End

Thank You
