Use of WordNet and on-line dictionaries to build EN-SK synsets (experimental tool)
Ján GENČI
Technical University of Košice, Slovakia
2
Plan
• WordNet, EuroWordNet + Slovak language
• Motivation
• Solution
• Results
• Future plans
3
WordNet, EuroWordNet
• Well-known projects
• WordNet defines the meanings of English words and the relationships between them (it defines synsets)
• EuroWordNet (EWN) is a very similar multilingual project
• EWN does not cover the Slovak language (no Slovak WN)
4
Motivation
• Text classification tasks require reduction of dimensionality and intelligent search
– Morphological database
– Something like WordNet
5
Our approach
• We decided to try using on-line dictionaries to map Slovak meanings to WordNet synset entries
• Two approaches:
– Intersection of the translations of each member of an EN synset
– Intersection of the translations of related words
6
Architecture
[Diagram components: input word; Synset Builder; WordNet DB; local DB; on-line dictionaries (Internet)]
7
Synset “members” translation
• According to WN, the word computer has 2 meanings, specified by 2 synsets:
– {computer, computing machine, computing device, data processor, electronic computer, information processing system}
– {calculator, reckoner, figurer, estimator, computer}
• The result is formed as the intersection of the translations of the synset members
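
The intersection step can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: `TOY_DICT` is a hypothetical EN-SK word list made up for the example (it is not output from any of the on-line dictionaries), and the names `translate` and `synset_translation` are assumptions. Skipping members that have no dictionary entry is also an assumed design choice; without it, a single untranslatable member would empty the result.

```python
# Hypothetical EN -> SK toy dictionary, for illustration only.
# It is NOT real output of the on-line dictionaries used by the tool.
TOY_DICT = {
    "computer": {"počítač", "počtár", "kalkulátor"},
    "calculator": {"kalkulačka", "kalkulátor", "počtár"},
    "reckoner": {"počtár", "kalkulátor"},
    "figurer": {"počtár"},
    "estimator": {"odhadca", "počtár"},
}


def translate(word, dictionary):
    """Return the set of translations of one word (empty set if unknown)."""
    return set(dictionary.get(word, ()))


def synset_translation(members, dictionary):
    """Intersect the translations of all synset members.

    Assumption: members without any dictionary entry are skipped,
    so they do not force an empty result.
    """
    translated = [translate(m, dictionary) for m in members]
    translated = [t for t in translated if t]
    if not translated:
        return set()
    return set.intersection(*translated)


if __name__ == "__main__":
    synset = ["calculator", "reckoner", "figurer", "estimator", "computer"]
    print(synset_translation(synset, TOY_DICT))
```

With a one-member synset the intersection is just that member's full translation list, which is why such synsets give no disambiguation.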
8
Translation of related words
• Based on the hyponym/hyperonym relationship between words:
– Related words are translated
– The result is formed as the intersection of the partial translations
9
Results
• We support 4 Slovak and 2 Czech on-line dictionaries (the Slovak dictionaries seem to come from one source)
• The result depends on:
– Number of members in the synset (a single member is a problem)
– Related words
– Quality(?) of the dictionary
10
Results (cont.)
• Parts of speech are sometimes mixed (nouns and adjectives)
• We implemented “multilingual view”
• The approach is time-consuming (quite slow), so results are stored in the database
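
Storing results so that a slow dictionary lookup runs only once per word could look roughly like the sketch below. It assumes SQLite as the local database and a caller-supplied `fetch` function standing in for the on-line dictionary request; the table names and the functions `open_cache` and `cached_translate` are hypothetical and not taken from the tool.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the local cache; the schema is an assumption."""
    conn = sqlite3.connect(path)
    # 'lookup' records every word already queried, even if nothing was found.
    conn.execute("CREATE TABLE IF NOT EXISTS lookup (word TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS translation ("
        "word TEXT, target TEXT, PRIMARY KEY (word, target))"
    )
    return conn


def cached_translate(conn, word, fetch):
    """Return translations of word, calling fetch(word) only on a cache miss."""
    seen = conn.execute(
        "SELECT 1 FROM lookup WHERE word = ?", (word,)
    ).fetchone()
    if seen is None:
        targets = set(fetch(word))  # slow step: on-line dictionary query
        with conn:  # commit both inserts as one transaction
            conn.execute("INSERT INTO lookup (word) VALUES (?)", (word,))
            conn.executemany(
                "INSERT INTO translation (word, target) VALUES (?, ?)",
                [(word, t) for t in targets],
            )
        return targets
    rows = conn.execute(
        "SELECT target FROM translation WHERE word = ?", (word,)
    )
    return {row[0] for row in rows}
```

Recording the word in a separate table means that words with no translation are also remembered and are not queried again.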
11
Examples
word computer
12
13
14
15
Example
word table
16
17
Future work (plans)
• To deal with “dictionary problem”
• To eliminate mixed parts of speech in the results (at least for Slovak language, using morphological database)
• To add other languages
18
• Local copy of new webpage
• Addresses
– http://ruzin.fei.tuke.sk/~laposp
– http://ruzin.fei.tuke.sk/~sudynova (new one)
19
Thank you!