2004 October 25, Vilnius Slide No 1

A Tool: Morphological Analyzer / Synthesizer for Lithuanian

Vytautas Zinkevičius

VDU KLC

[email protected]

Slide No 2

Morphological Analyzer / Synthesizer for Lithuanian

Introduction

• Importance of the morphological level for Lithuanian language technologies

• Difficulties caused by Lithuanian morphology

Slide No 3

Lexicon

• Lexis: roots / pseudo roots, 70 thousand, from 120,000 lexemes

• Morphology: combinations of affixes + the grammatical meanings provided by the combinations

• Software for accessing the Lexicon
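The lexicon layout on this slide (roots / pseudo roots, combinations of affixes carrying grammatical meanings, and software for accessing the Lexicon) can be illustrated with a minimal Python sketch of analysis and synthesis. Everything in it is an assumption for illustration: the tables `STEMS` and `ENDINGS`, the tag names, and the functions `analyze` and `synthesize` are hypothetical, cover only a few forms of two numerals, and are not the tool's actual data format or interface. Storing the alternating stem *tūkstanč-* as its own entry is just one plausible reading of "pseudo roots".

```python
import unicodedata

# Hypothetical stem table: stem -> (lemma, part of speech, ending class).
# "tūkstanč" is the alternating stem used before -io/-ių; keeping it as a
# separate entry is an assumption made for this sketch.
STEMS = {
    "šimt": ("šimtas", "numeral", "as"),
    "tūkstant": ("tūkstantis", "numeral", "is_nominative"),
    "tūkstanč": ("tūkstantis", "numeral", "is_oblique"),
}

# Hypothetical ending classes: ending -> grammatical meaning (only a few cases).
ENDINGS = {
    "as": {
        "as": {"NUMBER": "singular", "CASE": "nominative"},
        "o": {"NUMBER": "singular", "CASE": "genitive"},
        "ų": {"NUMBER": "plural", "CASE": "genitive"},
    },
    "is_nominative": {
        "is": {"NUMBER": "singular", "CASE": "nominative"},
    },
    "is_oblique": {
        "io": {"NUMBER": "singular", "CASE": "genitive"},
        "ių": {"NUMBER": "plural", "CASE": "genitive"},
    },
}


def analyze(word):
    """Return all (lemma, features) readings found by splitting word into stem + ending."""
    # Normalize so precomposed and decomposed letters (e.g. ū) compare equal.
    w = unicodedata.normalize("NFC", word).lower()
    readings = []
    for i in range(1, len(w) + 1):
        stem, ending = w[:i], w[i:]
        if stem not in STEMS:
            continue
        lemma, pos, ending_class = STEMS[stem]
        features = ENDINGS[ending_class].get(ending)
        if features is not None:
            readings.append((lemma, {"POS": pos, **features}))
    return readings


def synthesize(lemma, features):
    """Return all forms of lemma whose grammatical meaning equals features."""
    forms = []
    for stem, (stem_lemma, pos, ending_class) in STEMS.items():
        if stem_lemma != lemma:
            continue
        for ending, feats in ENDINGS[ending_class].items():
            if {"POS": pos, **feats} == features:
                forms.append(stem + ending)
    return forms


print(analyze("Šimtų"))
# → [('šimtas', {'POS': 'numeral', 'NUMBER': 'plural', 'CASE': 'genitive'})]
print(synthesize("tūkstantis", {"POS": "numeral", "NUMBER": "plural", "CASE": "genitive"}))
# → ['tūkstančių']
```

Analysis and synthesis here read the same two tables in opposite directions, which is one simple way a single lexicon can serve both an analyzer and a synthesizer. A real Lithuanian lexicon would need far more ending classes and stem alternations than this toy subset.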

Slide No 4

• Creating the Tool

• Problem of Ambiguity

• A Demo of the Tool

http://donelaitis.vdu.lt/~vytas/lemo/angl/lemo_down.htm

Slide No 5

Implementation of the tool in application programs and systems

• Spelling checkers (e.g. Lithuanian spelling checkers for Microsoft Office 97-2000)

• Grammatical tagging of the Lithuanian text corpus at CCL VMU

• Integration of the tool into SProUT (a multilingual shallow text processing system developed at the Language Technology Lab, DFKI)

• Used in compiling the "Frequency Dictionary of Contemporary Lithuanian" (Grumadienė L., Žilinskienė V. Dažninis dabartinės rašomosios lietuvių kalbos žodynas. Vilnius, 1997-1998).

Slide No 6

Morphological Analysis in SProUT

Lithuanian text:

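Šimtų tūkstančių ar milijono ir daugiau metų, per kuriuos atsirado žmogus, procesas vyko toli nuo dabartinės Lietuvos teritorijos.

(English: The process of the hundreds of thousands or a million and more years during which man emerged took place far from the territory of present-day Lithuania.)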

The result of the morphological analysis:

Šimtų TYPE=wordform

LEMMA={POS=numeral <šimtas>}

GRAMM_MEANING={POS=numeral + GROUP_OF_NUMERAL=cardinal + GENDER=masculine + NUMBER=plural + CASE=genitive}

tūkstančių TYPE=wordform

LEMMA={POS=numeral <tūkstantis>}

GRAMM_MEANING={POS=numeral + GROUP_OF_NUMERAL=cardinal + GENDER=masculine + CASE=genitive}

ar TYPE=wordform

LEMMA={POS=particle <ar>}

GRAMM_MEANING={POS=particle <ar>}

LEMMA={POS=conjunction <ar>}

GRAMM_MEANING={POS=conjunction <ar>}

LEMMA={POS=onomatopoeic_interjection <ar>}

GRAMM_MEANING={POS=onomatopoeic_interjection <ar>}
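The word form *ar* above receives three readings (particle, conjunction, onomatopoeic interjection), which is the ambiguity problem listed earlier in the talk: the analyzer returns every reading and leaves the choice to later disambiguation. A minimal Python sketch of how such multi-reading results could be stored and printed in the layout shown on this slide; the `Reading` class and `format_analysis` function are hypothetical, and the layout is imitated from the slide rather than taken from SProUT's own code.

```python
from dataclasses import dataclass, field


@dataclass
class Reading:
    """One hypothetical analysis of a word form: lemma, POS and ordered features."""
    lemma: str
    pos: str
    features: list[tuple[str, str]] = field(default_factory=list)


def format_analysis(word: str, readings: list[Reading]) -> str:
    """Lay out all readings of word in the TYPE/LEMMA/GRAMM_MEANING style of the slide."""
    lines = [f"{word} TYPE=wordform"]
    for r in readings:
        lines.append(f"LEMMA={{POS={r.pos} <{r.lemma}>}}")
        if r.features:
            # Inflected words: POS followed by the grammatical features.
            meaning = " + ".join([f"POS={r.pos}"] + [f"{k}={v}" for k, v in r.features])
        else:
            # Uninflected words: the slide repeats POS and lemma here.
            meaning = f"POS={r.pos} <{r.lemma}>"
        lines.append(f"GRAMM_MEANING={{{meaning}}}")
    return "\n".join(lines)


# The three readings of "ar" from the slide; none is preferred at this stage.
AR = [
    Reading("ar", "particle"),
    Reading("ar", "conjunction"),
    Reading("ar", "onomatopoeic_interjection"),
]

print(format_analysis("ar", AR))
```

Running it prints the seven lines of the *ar* analysis exactly as they appear above. Keeping readings as a list, rather than a single best guess, lets a downstream tagger or grammar rule discard the readings that do not fit the context.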

Slide No 7

Foundations and projects

• Lithuanian State Science and Studies Foundation: 1994 reg. no. 94-299/4D, contract no. 41; 1995 reg. no. 95-241/7E, contract no. 159.

• Project "Lithuanian language recognition and generation at the morphological level" within the National Lithuanian Language Committee program "Lithuanian language in the information society 2000-2006"

Slide No 8

The End

Thank You