multilingual document mining and navigation using self-organizing maps

Intelligent Database Systems Lab, 國立雲林科技大學 (National Yunlin University of Science and Technology)
Presenter: Keng-Yu Lin
Authors: Hsin-Chang Yang, Han-Wei Hsiao, Chung-Hong Lee
IPM, 2011



Page 1: Multilingual document mining and navigation using self-organizing maps

Intelligent Database Systems Lab

國立雲林科技大學 National Yunlin University of Science and Technology

Multilingual document mining and navigation using self-organizing maps

Presenter: Keng-Yu Lin
Authors: Hsin-Chang Yang, Han-Wei Hsiao, Chung-Hong Lee

IPM, 2011

Page 2: Multilingual document mining and navigation using self-organizing maps


Outline

· Motivation
· Objectives
· Methodology
· Experiments
· Conclusions
· Comments

Page 3: Multilingual document mining and navigation using self-organizing maps

Motivation

· A monolingual interface may limit use by people who are unfamiliar with the language.

Page 4: Multilingual document mining and navigation using self-organizing maps


Objectives

· To propose an approach that could automatically arrange multilingual Web pages into a multilingual Web directory to break the language barriers in Web navigation.

Page 5: Multilingual document mining and navigation using self-organizing maps

Methodology

· Preprocessing: word segmentation, stopword elimination, stemming, keyword selection

· Encoding: all keywords of all documents are collected to build a vocabulary VE. A document is encoded into a binary vector according to the keywords that occur in it.

Example: Xi = [0, 1, 1, 0, 1, 0, 1, 1]
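The encoding step can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `build_vocabulary` and `encode`, the sample keywords, and the choice to sort the vocabulary alphabetically are all assumptions made for the example.

```python
def build_vocabulary(documents):
    """Collect every keyword from every document into one ordered vocabulary."""
    # Sorting is an assumption; any fixed ordering of the keywords would do.
    return sorted({kw for doc in documents for kw in doc})


def encode(doc_keywords, vocabulary):
    """Return a binary vector: 1 where the vocabulary keyword occurs in the document."""
    present = set(doc_keywords)
    return [1 if kw in present else 0 for kw in vocabulary]


# Hypothetical keyword lists, as they might look after preprocessing.
docs = [
    ["map", "cluster", "web"],
    ["web", "directory", "language"],
    ["cluster", "language", "keyword"],
]

vocab = build_vocabulary(docs)
vectors = [encode(d, vocab) for d in docs]

for keywords, vec in zip(docs, vectors):
    print(keywords, "->", vec)
```

Each vector has one position per vocabulary keyword, so all documents share the same dimensionality regardless of their length.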


Page 6: Multilingual document mining and navigation using self-organizing maps

Methodology

SOM Algorithm

=> document cluster map (DCM)
=> keyword cluster map (KCM)
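A rough sketch of how a trained SOM might yield the two maps is given below. The slides do not show the algorithm's details, so several choices here are assumptions: the linear decay of learning rate and neighborhood radius, the Gaussian neighborhood, labeling each document to its winning (closest) neuron for the DCM, and labeling each keyword to the neuron with the largest weight component for that keyword for the KCM. All function names are hypothetical.

```python
import math
import random


def winner(weights, x):
    """Index of the neuron whose weight vector is closest to x (squared Euclidean)."""
    return min(range(len(weights)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(weights[j], x)))


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM on the given vectors; returns one weight vector per neuron."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # assumed linear decay
        radius = max(radius0 * (1.0 - frac), 0.5)  # assumed linear shrink with a floor
        for x in vectors:
            wr, wc = divmod(winner(weights, x), cols)
            for j, w in enumerate(weights):
                r, c = divmod(j, cols)
                d2 = (r - wr) ** 2 + (c - wc) ** 2
                h = math.exp(-d2 / (2.0 * radius * radius))  # Gaussian neighborhood
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


def document_cluster_map(weights, vectors):
    """DCM sketch: neuron index -> indices of documents whose winner is that neuron."""
    dcm = {}
    for i, x in enumerate(vectors):
        dcm.setdefault(winner(weights, x), []).append(i)
    return dcm


def keyword_cluster_map(weights, vocabulary):
    """KCM sketch: neuron index -> keywords whose weight component peaks at that neuron."""
    kcm = {}
    for k, kw in enumerate(vocabulary):
        best = max(range(len(weights)), key=lambda j: weights[j][k])
        kcm.setdefault(best, []).append(kw)
    return kcm


# Hypothetical binary document vectors over a six-keyword vocabulary.
sample_vocab = ["cluster", "directory", "keyword", "language", "map", "web"]
sample_vectors = [[1, 0, 0, 0, 1, 1], [0, 1, 0, 1, 0, 1], [1, 0, 1, 1, 0, 0]]

som = train_som(sample_vectors, rows=2, cols=2)
print("DCM:", document_cluster_map(som, sample_vectors))
print("KCM:", keyword_cluster_map(som, sample_vocab))
```

Because both maps are read off the same trained neurons, a neuron's documents and its keywords can be associated with each other, which is what makes the pair of maps useful for building a directory.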

Page 7: Multilingual document mining and navigation using self-organizing maps

Methodology

· Determining dominating clusters algorithm


Page 8: Multilingual document mining and navigation using self-organizing maps

Methodology

· Evaluation of quality of generated hierarchies

(C1, C3) = 4
(C3, C5) = 3
(C1, C5) = 3
PK = (4 + 3 + 3) / 3 = 3.33
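The arithmetic in this example can be reproduced with a few lines of code. The slide does not state what the value attached to each cluster pair measures, so the sketch below only averages the listed values; the names `pair_values` and `mean_pair_value` are hypothetical.

```python
# Values for each cluster pair, copied from the example above.
pair_values = {
    ("C1", "C3"): 4,
    ("C3", "C5"): 3,
    ("C1", "C5"): 3,
}


def mean_pair_value(values):
    """Average the per-pair values over the number of pairs."""
    return sum(values.values()) / len(values)


pk = mean_pair_value(pair_values)
print(f"PK = {pk:.2f}")  # prints "PK = 3.33"
```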

Page 9: Multilingual document mining and navigation using self-organizing maps

Methodology

· Multilingual web directory generation: semantic similarity, structural similarity

Page 10: Multilingual document mining and navigation using self-organizing maps

Experiments

Page 11: Multilingual document mining and navigation using self-organizing maps

Conclusions

· The approach is fully automated and requires no human intervention.

· The result of the alignment can be applied to tackle tasks such as multilingual information retrieval.


Page 12: Multilingual document mining and navigation using self-organizing maps

Comments

· Advantage: the research results can help people overcome language barriers.

· Applications: multilingual information retrieval.