Intelligent Database Systems Lab
國立雲林科技大學 National Yunlin University of Science and Technology
Multilingual document mining and navigation using self-organizing maps
Presenter: Keng-Yu Lin
Authors: Hsin-Chang Yang, Han-Wei Hsiao, Chung-Hong Lee
IPM, 2011
Outline
· Motivation
· Objectives
· Methodology
· Experiments
· Conclusions
· Comments
Motivation
· A monolingual interface may limit adoption among users who are unfamiliar with its language.
Objectives
· To propose an approach that could automatically arrange multilingual Web pages into a multilingual Web directory to break the language barriers in Web navigation.
Methodology
· Preprocessing: word segmentation, stopword elimination, stemming, keyword selection
· Encoding: all keywords of all documents are collected to build a vocabulary VE. Each document is then encoded as a binary vector that marks which vocabulary keywords occur in it.
Ex: Xi = [0, 1, 1, 0, 1, 0, 1, 1]
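The preprocessing and encoding steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the stopword list, the `naive_stem` suffix rule, and the choice to keep every remaining stem as a keyword are all assumptions made for the example, and the sample documents are invented.

```python
import re

# Toy stopword list (assumption); a real system would use a fuller,
# language-specific list.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "for"}

def naive_stem(word):
    """Strip a few common English suffixes (stand-in for a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Segment into words, drop stopwords, and stem what remains.

    Keyword selection is simplified here: every remaining stem is kept.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(w) for w in words if w not in STOPWORDS]

def build_vocabulary(docs_keywords):
    """Collect every keyword of every document into a sorted vocabulary."""
    return sorted({kw for kws in docs_keywords for kw in kws})

def encode(keywords, vocabulary):
    """Binary vector: 1 where the vocabulary keyword occurs in the document."""
    present = set(keywords)
    return [1 if kw in present else 0 for kw in vocabulary]

# Invented sample documents, for illustration only.
docs = [
    "Clustering multilingual documents",
    "The maps of document clusters",
]
docs_keywords = [preprocess(d) for d in docs]
vocabulary = build_vocabulary(docs_keywords)
vectors = [encode(kws, vocabulary) for kws in docs_keywords]

print(vocabulary)  # ['cluster', 'document', 'map', 'multilingual']
print(vectors)     # [[1, 1, 0, 1], [1, 1, 1, 0]]
```

Each position in a vector corresponds to one vocabulary entry, so all documents share the same dimensionality regardless of their length.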
Methodology
· SOM algorithm
=> document cluster map (DCM)
=> keyword cluster map (KCM)
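A small self-organizing map over binary document vectors might look like the sketch below. The slide names only the SOM algorithm and its two outputs, so everything else here is an assumption: the linear decay of learning rate and radius, the Gaussian neighborhood, the rule that a document joins its best-matching neuron (DCM), and the rule that a keyword labels a neuron when the corresponding weight component exceeds a hypothetical `threshold` (KCM).

```python
import math
import random

def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda j: sum((wi - xi) ** 2 for wi, xi in zip(weights[j], x)),
    )

def grid_distance(a, b, cols):
    """Euclidean distance between neurons a and b on the map grid."""
    (ra, ca), (rb, cb) = divmod(a, cols), divmod(b, cols)
    return math.hypot(ra - rb, ca - cb)

def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns one weight vector per neuron."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # learning rate decays
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks
        for x in rng.sample(vectors, len(vectors)):  # shuffled pass
            bmu = best_matching_unit(weights, x)
            for j, w in enumerate(weights):
                d = grid_distance(bmu, j, cols)
                if d <= radius:
                    h = math.exp(-(d * d) / (2.0 * radius * radius))
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])  # pull toward input
    return weights

def document_cluster_map(weights, vectors):
    """DCM (assumed rule): each document joins its best-matching neuron."""
    dcm = {j: [] for j in range(len(weights))}
    for i, x in enumerate(vectors):
        dcm[best_matching_unit(weights, x)].append(i)
    return dcm

def keyword_cluster_map(weights, vocabulary, threshold=0.5):
    """KCM (assumed rule): a keyword labels a neuron when the matching
    weight component exceeds the threshold."""
    return {
        j: [kw for kw, wk in zip(vocabulary, w) if wk > threshold]
        for j, w in enumerate(weights)
    }

# Invented toy data, for illustration only.
vocabulary = ["cluster", "document", "map", "multilingual"]
vectors = [[1, 1, 0, 1], [1, 1, 1, 0], [0, 0, 1, 0]]
weights = train_som(vectors, rows=2, cols=2)
dcm = document_cluster_map(weights, vectors)
kcm = keyword_cluster_map(weights, vocabulary)
```

Both maps are derived from the same trained weights, which is what lets document clusters and keyword clusters be related neuron by neuron.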
Methodology
· Algorithm for determining dominating clusters
Methodology
· Evaluating the quality of the generated hierarchies
Ex: (C1, C3) = 4, (C3, C5) = 3, (C1, C5) = 3
PK = (4 + 3 + 3) / 3 = 3.33
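The arithmetic in the example can be reproduced with a short sketch. The pairwise values are copied from the slide; what each value measures is not stated there, so the code treats them as given numbers and only averages them. The names `pair_values` and `mean_pairwise` are hypothetical.

```python
# Pairwise values copied from the example; their meaning is not stated
# on the slide, so they are treated as given inputs.
pair_values = {
    ("C1", "C3"): 4,
    ("C3", "C5"): 3,
    ("C1", "C5"): 3,
}

def mean_pairwise(values):
    """Average the per-pair values."""
    return sum(values.values()) / len(values)

pk = mean_pairwise(pair_values)
print(f"PK = {pk:.2f}")  # PK = 3.33
```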
Methodology
· Multilingual Web directory generation: semantic similarity, structural similarity
Experiments
Conclusions
· The approach is fully automated and requires no human intervention.
· The result of the alignment can be applied to tackle tasks such as multilingual information retrieval.
Comments
· Advantage: the research results can help people break the language barrier.
· Applications: multilingual information retrieval.