self-organizing semantic maps and its application to word alignment in japanese-chinese parallel
DESCRIPTION
Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel. Advisor : Dr. Hsu Graduate : Chun Kai Chen Author : Qing Ma, Kyoko Kanzaki, Yujie Zhang, Masaki Murata, Hitoshi Isahara. Neural Networks 17 (2004) 1241–1253. Outline. Motivation - PowerPoint PPT PresentationTRANSCRIPT
Intelligent Database Systems Lab
國立雲林科技大學National Yunlin University of Science and Technology
Advisor : Dr. HsuGraduate : Chun Kai ChenAuthor : Qing Ma, Kyoko Kanzaki, Yujie Zhang, Masaki Murata, Hitoshi Isahara
Self-organizing semantic maps and its application to word alignment in Japanese-Chinese parallel
Neural Networks 17 (2004) 1241–1253
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Outline
Motivation Objective Introduction Self-organizing monolingual semantic maps Experimental Results Conclusions Personal Opinion
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Motivation
A number of corpus-based statistical approaches have been used to compute word similarity
It is difficult to recognize the relationships between groups or the relationships between words within groups
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Objective
We need a technique that can map words from a very large lexicon into a small semantic space
A visible representation where words with similar meanings are placed at the same or neighboring points so that the distance between the points represents the semantic similarity in the words
Semantic maps can be automatically constructed with self-organization
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Introduction
Presents a method of self-organizing monolingual semantic maps for Chinese and Japanese using SOM for specific purpose
To construct semantic maps of nouns from the point of view of the adnominal constituents
Extended to the construction of Japanese–Chinese bilingual semantic maps
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Self-organizing monolingual semantic maps
Data coding•Baseline method•Frequency term-weighting method•TFIDF term-weighting method
dij is the word similarity
。。。。。。
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Data coding
Word wi can be defined by a set of its co-occurring words as
V(wi) is the input to the SOMonly reflects the relationships between a pair of words
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Data coding method
Baseline method
─ dij : word similarity
─ ai & aj: are the numbers of co-occurring words of wi and wj
─ cij : is the number of co-occurring words that both wi and wj have in common
Frequency term-weighting method TFIDF term-weighting method
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.
Table 1Comparative results for various coding methods and clustering
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Evaluation methods
Numerical evaluation─ precision─ recall─ F-measure
Intuitive evaluation─ our ‘common sense’
Comparison with other methods─ multivariate statistical analyses
Intelligent Database Systems Lab
國立雲林科技大學National Yunlin University of Science and Technology
Experimental Results
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.TFIDF comparison with PCA
Fig. 2. Chinese semantic map using principal component analysisFig. 1. Chinese semantic map based on TFIDF term-weighted coding
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.
Table 2Clustering results with TFIDF term-weighted coding
The underlined words are those classified into incorrect areas.
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Semantic map comparison with PCA
Fig. 3. Japanese semantic map based on the TFIDF term-weighted coding method
Fig. 4. Japanese semantic map using principal component analysis
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Self-organizing bilingual semantic maps(1/2)
When a translation pair of sentences like
Each Japanese word can therefore be automatically aligned to a Chinese word from this map by measuring its distance
If the Chinese word keyi (can) is closest to the Japanese word seta (can), then the Japanese word seta (can) is regarded as being aligned to the Chinese word keyi (can)
(Japanese) keiei toppu ga tei seichou jidai teichaku wo jikkan shite iru koto wo ukagawa seta.(Chinese) youci keyi kanchu, zuigao jingyingzhe shengan jingji ren tingliu zai dishu zengzhang shidai.(English) We can see that upper management has realized that the economy is fixed in an eras of slow growth.
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Self-organizing bilingual semantic maps(2/2)
A small-scale (10 translation pairs) experimental comparison with the baseline method
Comparison with hierarchical clustering and multivariate statistical analysis
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Data coding(1/2)
(Japanese) keiei toppu ga tei seichou jidai teichaku wo jikkan shite iru koto wo ukagawa seta.(Chinese) youci keyi kanchu, zuigao jingyingzhe shengan jingji ren tingliu zai dishu zengzhang shidai.(English) We can see that upper management has realized that the economy is fixed in an eras of slow growth.
Ji (i=1,.,m) are Japanese words forming the Japanese sentenceCi (i=1,.,n) are Chinese words forming the translated Chinese sentence
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Data coding(2/2)
is a co-occurring word of Ji
is the normalized co-occurrence frequency
is a co-occurring word of either or severals of Jj1;.; Jj;ni
is the normalized co-occurrence frequency
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Semantic map comparison with PCA
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Semantic map comparison with Baseline
Table 3Word alignment result obtained from semantic map
Table 4Baseline word alignment results
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Conclusions and Future Work
Proposed a method of self-organizing monolingual semantic maps for Japanese and Chinese
Experimental results proved that these maps were generally consistent with our intuition
Comparison demonstrated that the hierarchical clustering technique is inferior to SOM in terms of classifying ability
Furthermore, multivariate statistical analysis such as principal component analysis and factor analysis gave worse results
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Conclusions and Future Work
An extension to the automatic construction of bilingual semantic maps of Japanese and Chinese
Develop an automatic method of transforming both Japanese and Chinese words
Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.Personal Opinion
…..