Intelligent Database Systems Lab, N.Y.U.S.T. I.M.

Unsupervised word sense disambiguation for Korean through the acyclic weighted digraph using corpus and dictionary

Presenter: Chun-Ping Wu
Authors: Yeohoon Yoon, Choong-Nyoung Seon, Songwook Lee, Jungyun Seo

IPM 2007

國立雲林科技大學 National Yunlin University of Science and Technology

Outline

Motivation
Objective
Methodology
Experiments
Conclusion
Comments

Motivation

Word sense disambiguation (WSD) is a common problem in natural language processing.

Traditional approaches consider only co-occurrence probability.

Sample: I deposit some money in the bank.

Options:
  bank = financial institution?
  bank = embankment; shore?
  bank = a row; a set?

Objective

To construct a WSD system that is easy to implement because it learns all polysemous words at once, and that covers every polysemous word listed in the machine-readable dictionary (MRD).

To consider the relation between each sense of the context words and the sense of the target word.

Sample: I deposit some money in the bank.

Answer: bank = financial institution

Methodology

Learning step
  Similarity matrix
  Word vector
  Vector representations of sense definitions in the MRD

Disambiguation step
  Definition of the acyclic weighted digraph
  Selecting context words
  Constructing the acyclic weighted digraph
  Searching for the optimal path on the acyclic weighted digraph

Methodology

Learning step
  Similarity matrix
  Word vector
  Vector representations of sense definitions in the MRD
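
The three learning-step items can be illustrated with a minimal Python sketch. It is not the authors' implementation: the slides do not give the formulas, so sentence-level co-occurrence counts, cosine similarity, and summing word vectors over a definition are all assumptions here, and the function names, toy corpus, and toy dictionary are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_word_vectors(sentences):
    """Word vector (assumed form): counts of words sharing a sentence with the word."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for i, word in enumerate(sentence):
            for j, other in enumerate(sentence):
                if i != j:
                    vectors[word][other] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(word_vectors):
    """Similarity matrix (assumed form): pairwise cosine between all words."""
    words = sorted(word_vectors)
    return {a: {b: cosine(word_vectors[a], word_vectors[b]) for b in words}
            for a in words}


def sense_vector(definition_words, word_vectors):
    """Sense vector (assumed form): sum of the vectors of the definition's words."""
    total = Counter()
    for word in definition_words:
        total.update(word_vectors.get(word, Counter()))
    return total


# Hypothetical toy corpus and dictionary definitions, for illustration only.
corpus = [
    ["deposit", "money", "bank"],
    ["money", "bank", "account"],
    ["river", "bank", "water"],
    ["river", "water", "shore"],
]
definitions = {
    "bank/finance": ["money", "account"],
    "bank/river": ["river", "shore"],
}

vectors = build_word_vectors(corpus)
matrix = similarity_matrix(vectors)
finance_score = cosine(vectors["deposit"], sense_vector(definitions["bank/finance"], vectors))
river_score = cosine(vectors["deposit"], sense_vector(definitions["bank/river"], vectors))
print(finance_score > river_score)
```

On this toy data the financial sense of "bank" scores higher against the context word "deposit" than the river sense does.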


Methodology

Disambiguation step
  Definition of the acyclic weighted digraph
  Selecting context words
  Constructing the acyclic weighted digraph
  Searching for the optimal path on the acyclic weighted digraph
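
The disambiguation step can be sketched as a search over a layered graph. This is an illustration under assumptions, not the authors' code: each selected word contributes one layer of candidate senses, edges run only from one layer to the next (so the digraph is acyclic), and the optimal path is taken to be the one with the largest sum of edge weights. The function name, sense labels, and weights are hypothetical.

```python
def best_sense_path(layers, weight):
    """Viterbi-style search for the highest-weight path through layered senses.

    layers: list of lists of sense labels, one list per word, in sentence order.
    weight(a, b): edge weight from sense a in one layer to sense b in the next.
    Returns (score, path) for the path with the largest total edge weight.
    """
    # best[sense] = (score of the best path ending at this sense, that path)
    best = {sense: (0.0, [sense]) for sense in layers[0]}
    for layer in layers[1:]:
        new_best = {}
        for sense in layer:
            # Keep only the best-scoring predecessor for each sense.
            score, path = max(
                ((prev_score + weight(prev, sense), prev_path)
                 for prev, (prev_score, prev_path) in best.items()),
                key=lambda item: item[0],
            )
            new_best[sense] = (score, path + [sense])
        best = new_best
    return max(best.values(), key=lambda item: item[0])


# Hypothetical senses and edge weights for "I deposit some money in the bank."
layers = [
    ["deposit/payment", "deposit/sediment"],
    ["money"],
    ["bank/finance", "bank/river"],
]
toy_weights = {
    ("deposit/payment", "money"): 0.9,
    ("deposit/sediment", "money"): 0.1,
    ("money", "bank/finance"): 0.8,
    ("money", "bank/river"): 0.2,
}

score, path = best_sense_path(layers, lambda a, b: toy_weights.get((a, b), 0.0))
print(path)  # ['deposit/payment', 'money', 'bank/finance']
```

Because every sense of every context word is a node, the chosen path fixes a sense for the context words as well as for the target word.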


Experiments

System results


Experiment on English
  The accuracy of the system is 30.7% on average.
  This accuracy is very low, for the following reasons:
    The selected context words are not appropriate, even though context words are crucial: they decide which sense of the target word is best.
    Mapping English senses to Korean through an English-Korean dictionary loses some information.
    Stemming errors prevented the system from finding the correct root of a verb in the MRD.

Conclusion

The method considers the relationship between each sense of the context words and the sense of the target word.

It uses the Viterbi algorithm to reduce computational complexity.

The system performed poorly on English (30.7% accuracy) but achieved reasonable performance, 76.4% accuracy, on semantically ambiguous Korean words.

The method could be applied to other languages by studying their language characteristics.
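
The saving from the Viterbi algorithm can be made concrete by counting work on a layered sense graph. The sketch below assumes edges connect only adjacent layers; the function names are hypothetical and the sense counts are made-up numbers, not figures from the paper.

```python
from math import prod


def exhaustive_path_count(sense_counts):
    """Number of complete sense paths an exhaustive search would have to score."""
    return prod(sense_counts)


def viterbi_edge_count(sense_counts):
    """Number of edges a Viterbi search evaluates: one per pair of adjacent senses."""
    return sum(a * b for a, b in zip(sense_counts, sense_counts[1:]))


# Hypothetical number of senses for five consecutive words.
sense_counts = [3, 4, 2, 5, 3]
print(exhaustive_path_count(sense_counts), viterbi_edge_count(sense_counts))
```

For these hypothetical counts, exhaustive search scores 360 complete paths, while the Viterbi search evaluates 45 edges. The first number grows multiplicatively with sentence length and the second only additively.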

Comments

Advantage
  The method considers the relationship between each sense of the context words and the sense of the target word.
  It uses the Viterbi algorithm to reduce computational complexity.

Drawback
  The system performs better on Korean (76.4% accuracy) than on English (30.7%).

Application
  Word sense disambiguation