Related terms search based on WordNet / Wiktionary and its application in ontology matching RCDL'2009 St. Petersburg Institute for Informatics and Automation of RAS Feiyu Lin, A. Krizhanovsky (andrew.krizhanovsky at gmail.com) Jönköping University, Sweden

Related terms search based on WordNet / Wiktionary

and its application in ontology matching

RCDL'2009

St. Petersburg Institute

for Informatics and Automation of RAS

Feiyu Lin, A. Krizhanovsky (andrew.krizhanovsky at gmail.com)

Jönköping University, Sweden

Contents

Wiki and Wiktionary intro

MRD, parser and Wiktionaries comparison

Correlation of relatedness measures

Experiment scheme

Result and comparison

Results, applications and future

Goal

Is it possible to find related terms using the current version of Wiktionary as successfully as with WordNet?

for ontology matching,

for application in text search systems, etc.

What are the advantages?

Wiki-resources

Distributed users and authors (edit pages)

Centralized storage (e.g. MySQL, Apache, PHP)

Set of hyper linked articles

Each article has one or more categories (tree)

* Example: http://en.wikipedia.org

Wiktionary is a free-content multilingual dictionary

Wiktionary data: +, -, simplicity & complexity

− Different Wiktionaries have different levels of standardization.

− The data grows fast but is created by a huge community (so a parser has to be very stable)

+ Rich data

+ thesaurus (synonyms, antonyms)

+ phrase books

+ etymologies

+ pronunciations

+ sample quotations

+ translations

+ Fast growing data

+ Interwiki (additional data)

+ GNU FDL

Wiktionary machine-readable dictionary: database scheme

Size of Wiktionaries

WordNet (2006): 150,000 words, 115,000 synsets

A shortest path in Russian Wiktionary
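The slide's figure is not in the transcript, so the following is only a minimal sketch of how a shortest path between two words could be found. It assumes the dictionary's semantic relations (synonyms and so on) have already been loaded into an in-memory graph and are treated as symmetric. The class `RelationGraph` and its methods are hypothetical names for illustration, not part of the project's code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical in-memory graph of words linked by semantic relations.
class RelationGraph {
    // Adjacency list: word -> words directly related to it.
    private final Map<String, Set<String>> neighbors = new HashMap<>();

    // Adds an undirected edge (assumption: relations are symmetric).
    void addRelation(String a, String b) {
        neighbors.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        neighbors.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Breadth-first search. Returns the words on one shortest path,
    // or an empty list if either word is unknown or they are not connected.
    List<String> shortestPath(String from, String to) {
        if (!neighbors.containsKey(from) || !neighbors.containsKey(to)) {
            return Collections.emptyList();
        }
        Map<String, String> parent = new HashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        parent.put(from, null); // the start word has no predecessor
        queue.add(from);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(to)) {
                // Walk back through the predecessors to rebuild the path.
                List<String> path = new ArrayList<>();
                for (String w = to; w != null; w = parent.get(w)) {
                    path.add(w);
                }
                Collections.reverse(path);
                return path;
            }
            for (String next : neighbors.get(current)) {
                if (!parent.containsKey(next)) {
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }
}
```

The number of edges on the returned path (list size minus one) can serve as a simple distance between two terms: the shorter the path, the more closely related the words.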

Correlation of relatedness measures

Correlation with human judgments (353-TC) of relatedness measures based on WordNet, English Wikipedia, and Russian Wiktionary
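The transcript does not say which correlation coefficient was used. As an illustration only, the sketch below computes Spearman's rank correlation between human scores and the scores of a relatedness measure for the same word pairs. The class name `RankCorrelation` and the choice of Spearman's coefficient are assumptions.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper: Spearman rank correlation between two score lists.
class RankCorrelation {
    // Assigns 1-based ranks; tied values share the average of their ranks.
    static double[] ranks(double[] values) {
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int k = 0; k < n; k++) {
            order[k] = k;
        }
        Arrays.sort(order, Comparator.comparingDouble(idx -> values[idx]));
        double[] rank = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            // Extend j over the run of values equal to values[order[i]].
            while (j + 1 < n && values[order[j + 1]] == values[order[i]]) {
                j++;
            }
            double average = (i + j) / 2.0 + 1.0;
            for (int k = i; k <= j; k++) {
                rank[order[k]] = average;
            }
            i = j + 1;
        }
        return rank;
    }

    // Pearson correlation; yields NaN if either input is constant.
    static double pearson(double[] x, double[] y) {
        double meanX = Arrays.stream(x).average().orElse(0.0);
        double meanY = Arrays.stream(y).average().orElse(0.0);
        double cov = 0.0, varX = 0.0, varY = 0.0;
        for (int k = 0; k < x.length; k++) {
            double dx = x[k] - meanX;
            double dy = y[k] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    // Spearman correlation = Pearson correlation of the ranks.
    static double spearman(double[] human, double[] measure) {
        if (human.length != measure.length) {
            throw new IllegalArgumentException("score lists differ in length");
        }
        return pearson(ranks(human), ranks(measure));
    }
}
```

A rank-based coefficient is a natural fit here because human ratings and path-based measures live on different scales; only the ordering of word pairs is compared.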

Eight largest Wiktionary editions (March 2008)

Applications of a machine-readable dictionary (MRD)

Thesaurus data:

Related Terms Search

Search request extension (by synonyms) / request

reformulation (in search systems)

Request recognition in question-answering systems

Word sense disambiguation

Media data (audio + pictures)

Language learning
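The search request extension by synonyms listed above could look roughly like the sketch below. It assumes the synonyms have already been extracted from the dictionary into a plain map. The class `QueryExpander` and the sample words are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical query expander backed by a synonym map.
class QueryExpander {
    // word -> its synonyms (assumed to come from the thesaurus data)
    private final Map<String, List<String>> synonyms;

    QueryExpander(Map<String, List<String>> synonyms) {
        this.synonyms = synonyms;
    }

    // Returns the original terms followed by their synonyms,
    // without duplicates and in a stable order.
    List<String> expand(List<String> queryTerms) {
        Set<String> expanded = new LinkedHashSet<>(queryTerms);
        for (String term : queryTerms) {
            expanded.addAll(synonyms.getOrDefault(term, Collections.emptyList()));
        }
        return new ArrayList<>(expanded);
    }
}
```

Keeping the original terms first lets a search system weight them higher than the added synonyms, if it chooses to.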

Work plan: done and todo

Russian Wiktionary

• Extraction (by RE)

– Definition

– Relations (synonyms…)

– Translation

– Audio

– Graphics

• Database API

• Visualization (MRD browser)

• Quiz & tests (test application)

Russian Wiktionary

• Database scheme

– Definition

– Relations (synonyms…)

– Translation

– Audio

– Graphics

• Database API

English Wiktionary

Implementation

Software based on Synarcher code

Java

MySQL or SQLite database

JUnit test framework

Results

The scheme of the experiment for calculating the semantic relatedness measure based on Russian Wiktionary data

The parser of Russian Wiktionary

Database scheme designed

Database API implemented in Java

Compared the results of related terms search based on Wiktionary and WordNet

Project site (Wiki tool kit)

http://code.google.com/p/wikokit/

Future work

Finish creating the MRD

Database and software

Russian Wiktionary and English Wiktionary

Visualization (JavaFX)

MRD browser

Quiz & tests (learning application)

Online application (Java Web-start)

Thank you!