Software Applications for Processing Romanian Texts. Demonstration and Comparison

Sanda Cherata

Babeş-Bolyai University, Faculty of Letters

Software Applications

• The Romanian Morphological Dictionary (DMR) – Software ITC SA – RoLingva, www.rolingva.ro
• LEXICON – for updating attributes in lexical entries
• SIASTRO-AM – phrase analysis of noun, adjective, adverb, verb and prepositional phrases
• ETR – term extractor for Romanian specialised texts

DMR

• Paradigm of a given lemma
  • classic form
  • stem + termination
• Accents
• Syllabification
• Morphological analysis of a given word

LEXICON

• Specifying attributes for lexico-morphological classes
• Designed to collect data from multiple users
• Friendly interface

SIASTRO-AM

• Lexico-morphological analysis
• Parsing of noun, adjective, adverb, verb and prepositional phrases
  • Uses a lexicon based on DMR, enriched with new lexical and syntactic attributes added with the LEXICON application
  • Outputs an annotated text

SIASTRO-AM – Tags for text elements

• {F – Start sentence, F} – End sentence
• {C – Start word, C} – End word
• {N – Start unknown word, N} – End unknown word
• {D – Start number, D} – End number
• {S – Start punctuation sign, S} – End punctuation sign
• {L – Start hyphen, L} – End hyphen
• {I – Start ignored sequence, I} – End ignored sequence
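The element tags can be read back out of an annotated text with a small script. The sketch below is illustrative only and is not part of SIASTRO-AM: it assumes that every element opens with "{X" and closes with the same letter followed by "}", and that elements inside a sentence are not nested. The function name `split_elements` and the sample string are hypothetical.

```python
import re

# Tag letters and their meanings, as listed on the slide.
TAG_NAMES = {
    "F": "sentence",
    "C": "word",
    "N": "unknown word",
    "D": "number",
    "S": "punctuation sign",
    "L": "hyphen",
    "I": "ignored sequence",
}

# Assumption: an element opens with "{X" and closes with the same letter
# followed by "}"; elements inside a sentence are assumed not to be nested.
ELEMENT = re.compile(r"\{([CNDSLI])(.*?)\1\}", re.DOTALL)


def split_elements(annotated):
    """Return (element kind, raw content) pairs found in an annotated sentence."""
    return [
        (TAG_NAMES[m.group(1)], m.group(2).strip())
        for m in ELEMENT.finditer(annotated)
    ]


# Hypothetical sample; the spacing of real SIASTRO-AM output may differ.
sample = "{F {N xyzN} {L -L} {D 42D} {S .S} F}"
elements = split_elements(sample)
```

The sentence tags ({F ... F}) are deliberately left out of the pattern so that the elements inside a sentence are returned one by one.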

SIASTRO-AM – Tags for words

{C word (part of speech + grammatical category + grammatical category + ... , part of speech + grammatical category + grammatical category + ...) syllabification + accent position: , (...) , ... (...) syllabification + accent position: + lemma + : ... C}

• "," inside the parentheses separates parts of speech
• "," after "syllabification + accent position:" separates homographs

Example:

{C date (vrb+p_fp+, sbt+fdpn+fisn+fipn+fvpa+, adj+fdpn+fisn+fipn+fvpa+) da-te+2:+da+:+dată+:+dat+:C}
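To make the word-tag layout concrete, here is a minimal parsing sketch for the "date" example from the slide. It is not the SIASTRO-AM implementation. It assumes a single group of analyses in parentheses (the homograph case with several groups is not handled), that the first field after the parentheses is "syllabification+accent position", and that the remaining ":"-terminated fields are lemmas. The names `WORD_TAG` and `parse_word_tag` are hypothetical.

```python
import re

# Assumed layout: {C word (analysis, analysis, ...) syllabification+accent:+lemma+:...C}
WORD_TAG = re.compile(
    r"\{C\s*(?P<word>\S+)\s*\((?P<analyses>.*?)\)\s*(?P<tail>.*?)C\}",
    re.DOTALL,
)


def parse_word_tag(tag):
    """Split one {C ... C} word tag into its fields (single-group case only)."""
    m = WORD_TAG.search(tag)
    if m is None:
        raise ValueError("not a {C ... C} word tag")

    # "," separates parts of speech; "+" separates the grammatical categories.
    analyses = []
    for chunk in m.group("analyses").split(","):
        parts = [p for p in chunk.strip().split("+") if p]
        if parts:
            analyses.append({"pos": parts[0], "categories": parts[1:]})

    # Fields after the parentheses are terminated by ":".
    fields = [f.strip("+ ") for f in m.group("tail").split(":")]
    fields = [f for f in fields if f]
    if not fields:
        raise ValueError("missing syllabification field")
    syllabification, _, accent = fields[0].partition("+")

    return {
        "word": m.group("word"),
        "analyses": analyses,
        "syllabification": syllabification,
        "accent_position": int(accent) if accent.isdigit() else None,
        "lemmas": fields[1:],
    }


example = (
    "{C date (vrb+p_fp+, sbt+fdpn+fisn+fipn+fvpa+, "
    "adj+fdpn+fisn+fipn+fvpa+) da-te+2:+da+:+dată+:+dat+:C}"
)
parsed = parse_word_tag(example)
```

For the example, the three analyses (vrb, sbt, adj) line up with the three lemmas "da", "dată" and "dat". The pairing of analyses with lemmas is inferred from the example rather than stated on the slide.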

ETR – Desktop

ETR – Menu bar

ETR – Files menu

Files Menu – New Project

Files Menu – New Project – Files

Files Menu – New Project – Files (continued)

Files Menu – New Project – Subject Fields

Files Menu – New Project – Abbreviations

Files Menu – New Project – Initialisms

File Menu – Open Project

File menu – Contexts

File menu – Terms

File menu – Terminological forms

View menu

View menu (continued)

Export menu

ETR – Term Extraction

ETR – Contexts

ETR – Move term to Terminological form

ETR – Terminological Forms – contexts

Source text

ETR – Terminological Form

ETR – Future Developments

• Syntactical analysis
• Enriching the terminological form by adding new terminological features