Use of WordNet and on-line dictionaries to build EN-SK synsets (experimental tool)

Posted on 31-Dec-2015

Ján GENČI

Technical University of Košice, Slovakia

genci@tuke.sk

Plan

• WordNet, EuroWordNet + Slovak language

• Motivation

• Solution

• Results

• Future plans

WordNet, EuroWordNet

• Well-known projects

• WordNet defines the meanings of English words and their relationships (it defines synsets)

• EuroWordNet (EWN) is a very similar multilingual project

• EWN does not contain the Slovak language (there is no Slovak WN)

Motivation

• Text classification tasks require dimensionality reduction and intelligent search:
  – Morphological database
  – Something like WordNet

Our approach

• We decided to try using on-line dictionaries to map Slovak meanings to WordNet synset entries

• Two approaches:
  – Intersection of the translations of each member of the EN synset
  – Intersection of the translations of related words

Architecture

[Diagram components: input word, Synset Builder, WordNet DB, local DB, Internet on-line dictionaries]

Synset “members” translation

• According to WordNet, the word computer has 2 meanings, specified by 2 synsets:
  – {computer, computing machine, computing device, data processor, electronic computer, information processing system}
  – {calculator, reckoner, figurer, estimator, computer}

• The result is formed as the intersection of the translations of the synset members
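The member-intersection step can be sketched as below. This is a minimal illustration, not the tool's code: `TOY_EN_SK` is a hypothetical hand-made dictionary standing in for the on-line dictionaries, `translate` and `synset_translation` are illustrative names, and skipping members that the dictionary does not know is an assumption of the sketch.

```python
# Hypothetical toy EN->SK dictionary standing in for the on-line
# dictionaries the tool queries; entries are illustrative only.
TOY_EN_SK = {
    "computer": {"počítač", "počtár"},
    "computing machine": {"počítač"},
    "data processor": {"počítač", "procesor"},
    "calculator": {"kalkulačka", "počtár"},
    "reckoner": {"počtár"},
}


def translate(word, dictionary):
    """Return the translations of `word` (empty set if the dictionary lacks it)."""
    return set(dictionary.get(word, ()))


def synset_translation(members, dictionary):
    """Intersect the translations of all synset members the dictionary knows."""
    partial = [translate(m, dictionary) for m in members]
    # Assumption: members with no dictionary entry are skipped rather than
    # forcing the whole intersection to be empty.
    partial = [p for p in partial if p]
    if not partial:
        return set()
    return set.intersection(*partial)


sense_1 = ["computer", "computing machine", "computing device", "data processor",
           "electronic computer", "information processing system"]
sense_2 = ["calculator", "reckoner", "figurer", "estimator", "computer"]
print(synset_translation(sense_1, TOY_EN_SK))  # {'počítač'}
print(synset_translation(sense_2, TOY_EN_SK))  # {'počtár'}
```

With a single-member synset such as `["computer"]`, nothing is intersected, so both candidates survive; this is why a synset with only one member is a problem for this approach.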

Translation of related words

• Based on the hyponym/hyperonym relationship between words:
  – Related words are translated
  – The result is formed as the intersection of the partial translations
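The slide does not say exactly what a "partial translation" is. One possible reading, sketched below under that assumption: related words (here, compound hyponyms such as "dining table") are translated, each translation is split into Slovak tokens, and intersecting the token sets leaves the shared head word. The hyponym lists, the dictionary entries and both function names are hypothetical; only hyponyms are used, for brevity.

```python
# Hypothetical toy data: a few hyponyms for two senses of "table",
# and hand-made dictionary entries for them (illustrative only).
RELATED = {
    "table (furniture)": ["dining table", "kitchen table", "worktable"],
    "table (data)": ["periodic table", "correlation table"],
}
TOY_PHRASES_EN_SK = {
    "dining table": {"jedálenský stôl"},
    "kitchen table": {"kuchynský stôl"},
    "worktable": {"pracovný stôl"},
    "periodic table": {"periodická tabuľka prvkov"},
    "correlation table": {"korelačná tabuľka"},
}


def partial_translation(word, dictionary):
    """Set of Slovak tokens occurring in any translation of `word`."""
    return {tok for phrase in dictionary.get(word, ()) for tok in phrase.split()}


def translate_via_related(related_words, dictionary):
    """Intersect the partial translations of the related words."""
    partial = [partial_translation(w, dictionary) for w in related_words]
    partial = [p for p in partial if p]  # skip words the dictionary lacks
    return set.intersection(*partial) if partial else set()


print(translate_via_related(RELATED["table (furniture)"], TOY_PHRASES_EN_SK))  # {'stôl'}
print(translate_via_related(RELATED["table (data)"], TOY_PHRASES_EN_SK))  # {'tabuľka'}
```

The token match works here only because the head nouns appear in the same form; inflected Slovak forms would need lemmatization first.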

Results

• The tool supports 4 Slovak and 2 Czech on-line dictionaries (the Slovak dictionaries seem to come from a single source)

• The result depends on:
  – Number of members in the synset (a single member is a problem)
  – Related words
  – Quality(?) of the dictionary

Results (cont.)

• Parts of speech are sometimes mixed (nouns and adjectives)

• We implemented a “multilingual view”

• The approach is time-consuming (quite slow), so results are stored in the database
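The slides say only that results are stored in a database. A minimal sketch of such a cache, assuming SQLite from Python's standard library: the table layout, synset-id strings such as "computer.n.01", and the functions `open_store`, `cached_translation` and `multilingual_view` are hypothetical. Keying rows by language also gives a simple way to assemble a multilingual view of one synset.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a hypothetical translation cache."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS translation ("
        " synset_id TEXT NOT NULL,"
        " lang TEXT NOT NULL,"
        " word TEXT NOT NULL,"
        " PRIMARY KEY (synset_id, lang, word))"
    )
    return conn


def cached_translation(conn, synset_id, lang, compute):
    """Return stored words, or call the slow `compute()` once and store its result."""
    rows = conn.execute(
        "SELECT word FROM translation WHERE synset_id = ? AND lang = ?",
        (synset_id, lang),
    ).fetchall()
    if rows:
        return {w for (w,) in rows}
    words = compute()  # e.g. querying the on-line dictionaries
    conn.executemany(
        "INSERT OR IGNORE INTO translation VALUES (?, ?, ?)",
        [(synset_id, lang, w) for w in sorted(words)],
    )
    conn.commit()
    return set(words)


def multilingual_view(conn, synset_id):
    """Group all stored translations of one synset by language."""
    view = {}
    for lang, word in conn.execute(
        "SELECT lang, word FROM translation WHERE synset_id = ? ORDER BY lang, word",
        (synset_id,),
    ):
        view.setdefault(lang, []).append(word)
    return view
```

In this sketch an empty result is not cached, so a synset with no translation is recomputed on every call.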

Examples

Word: computer

Example

Word: table

Future work (plans)

• To deal with the “dictionary problem”

• To eliminate mixed parts of speech in the results (at least for Slovak, using a morphological database)

• To connect other languages
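The planned part-of-speech cleanup might look roughly like this. It is a minimal sketch, assuming the morphological database can be queried as a mapping from a Slovak word form to its possible parts of speech; `TOY_MORPH_DB`, the tag names and `filter_by_pos` are hypothetical.

```python
# Hypothetical toy morphological database: Slovak word form -> possible
# parts of speech. A real morphological database would replace this dict.
TOY_MORPH_DB = {
    "počítač": {"noun"},
    "počítačový": {"adj"},
    "stôl": {"noun"},
    "stolový": {"adj"},
}


def filter_by_pos(candidates, pos, morph_db):
    """Keep candidates whose possible parts of speech include `pos`.

    Words missing from the morphological database are kept, since their
    part of speech cannot be checked.
    """
    kept = set()
    for word in candidates:
        tags = morph_db.get(word)
        if tags is None or pos in tags:
            kept.add(word)
    return kept


print(filter_by_pos({"počítač", "počítačový"}, "noun", TOY_MORPH_DB))  # {'počítač'}
```

Keeping unknown words is a design choice: it avoids silently losing correct translations at the cost of letting some mixed parts of speech through.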

• Local copy of the new web page

• Addresses:
  – http://ruzin.fei.tuke.sk/~laposp
  – http://ruzin.fei.tuke.sk/~sudynova (new one)

Thank you!
