Selecting Proper Lexical Paraphrase for Children
Post on 06-Jul-2015
Tomoyuki Kajiwara, Hiroshi Matsumoto and Kazuhide Yamamoto. Selecting Proper Lexical Paraphrase for Children. The 25th Conference on Computational Linguistics and Speech Processing (ROCLING 2013), pp.59-73 (2013.10)
Selecting Proper Lexical Paraphrase for Children
Tomoyuki Kajiwara, Hiroshi Matsumoto, Kazuhide Yamamoto
Nagaoka University of Technology
Lexical Paraphrase for Children
Elementary school Japanese dictionary
【大詰め : final stage】芝居の最後の場面 (The last scene of the play)
Newspaper for Adults
Total annual number of vocabulary : 200,000 words
大詰めの大一番 (Big match of the final stage)
Newspaper for Children
Basic Vocabulary to Learn : 5,404 words
最後の大一番 (Big match of the last)
The target word is selected by its similarity to the headword
BVL : Basic Vocabulary to Learn
• General Vocabulary (GV) : vocabulary registered in the general dictionary
• Vocabulary to Learn (VL), 25,000 words : vocabulary registered in the elementary school dictionary
• Basic Vocabulary to Learn (BVL), 5,404 words : vocabulary that elementary school students can use sufficiently
• Basic Vocabulary, 2,000 words : the minimum vocabulary necessary for daily living
Paraphrase to BVL from GV and VL : reading assistance for elementary school students
Related Works
• Paraphrase utilizing a dictionary
– headword → headword
  • Fujita et al. (2000), Mino and Tanaka (2011)
– headword → word from the end of the definition statement
  • Kaji et al. (2002), Mino and Tanaka (2011), Kajiwara and Yamamoto (2013)
"The definition statements are simpler than the headwords"
"The last segment represents the meaning of the headword"
Problem of Related Works
Definition
【大詰め】芝居の最後の場面
【final stage】the last scene of the play
Paraphrase
✕ 大詰めの大一番 → 場面の大一番 (Big match of the final stage → Big match of the scene)
✔ 大詰めの大一番 → 最後の大一番 (Big match of the final stage → Big match of the last)
Appropriate target words are not always found at the end of definitions
Proposed Method
Proposed Method (1/2)
• Acquisition of the Target Word Candidates
① Difficult word is extracted
② Entries of the difficult word are searched
③ Words are extracted if they are the same part-of-speech as the difficult word
Example
① Original Sentence : ・・・ professor ・・・
② Japanese Dictionary :
【professor】People of status as professor.
【professor】Status as professor.
【professor】Teach learning and skill.
【professor】University teacher.
③ Extracted words : People, Status, Professor, Learning, Skill, University, Teacher
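The acquisition steps ① to ③ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `TOY_DICTIONARY`, its part-of-speech tags, and the function name `acquire_candidates` are hypothetical, and a real system would obtain the tagged definitions from a morphological analyzer rather than from a hand-written table.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech)

# Hypothetical toy dictionary: headword -> definition statements,
# each already split into (word, part-of-speech) pairs.
TOY_DICTIONARY: Dict[str, List[List[Token]]] = {
    "professor": [
        [("people", "noun"), ("of", "particle"), ("status", "noun"),
         ("as", "particle"), ("professor", "noun")],
        [("status", "noun"), ("as", "particle"), ("professor", "noun")],
        [("teach", "verb"), ("learning", "noun"),
         ("and", "particle"), ("skill", "noun")],
        [("university", "noun"), ("teacher", "noun")],
    ],
}


def acquire_candidates(difficult_word: str, pos: str,
                       dictionary: Dict[str, List[List[Token]]]) -> List[str]:
    """Collect words from all definitions that share the difficult word's POS."""
    candidates: List[str] = []
    # Step 2: look up every entry (definition statement) of the difficult word.
    for definition in dictionary.get(difficult_word, []):
        for word, word_pos in definition:
            # Step 3: keep words with the same part-of-speech, without duplicates.
            if word_pos == pos and word not in candidates:
                candidates.append(word)
    return candidates


print(acquire_candidates("professor", "noun", TOY_DICTIONARY))
# prints: ['people', 'status', 'professor', 'learning', 'skill', 'university', 'teacher']
```

Unlike taking only the last word of each definition, this keeps every same-part-of-speech word, so the later selection step has the whole definition to choose from.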
Proposed Method (2/2)
• Selection of the Proper Target Word
④ Simple words are extracted
⑤ Similarities of meaning are calculated
⑥ Simple word with the highest similarity is selected
Example
Candidates : People, Status, Professor, Learning, Skill, University, Teacher
④ Simple words (in Basic Vocabulary to Learn) : People, Learning, University, Skill, Teacher
⑤ Similarities (in the order shown on the slide) : 0.17, 0.11, 0.08, 0.13, 0.25
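A minimal sketch of steps ④ to ⑥ follows. The names `TOY_BVL`, `TOY_SCORES`, `toy_similarity` and `select_target` are hypothetical. The scores are the five values from the slide paired with the five simple words in the order they are listed, which is an assumption about the slide layout, and the lookup table merely stands in for the WordNet-based similarity.

```python
from typing import Callable, Dict, Iterable, Optional, Set

# Hypothetical stand-in for the Basic Vocabulary to Learn (5,404 words in the talk).
TOY_BVL: Set[str] = {"people", "learning", "university", "skill", "teacher"}

# Assumed pairing of the slide's scores with the simple words, in listed order.
TOY_SCORES: Dict[str, float] = {
    "people": 0.17, "learning": 0.11, "university": 0.08,
    "skill": 0.13, "teacher": 0.25,
}


def toy_similarity(difficult_word: str, candidate: str) -> float:
    """Placeholder for a thesaurus-based similarity; here just a table lookup."""
    return TOY_SCORES.get(candidate, 0.0)


def select_target(difficult_word: str,
                  candidates: Iterable[str],
                  simple_vocabulary: Set[str],
                  similarity: Callable[[str, str], float]) -> Optional[str]:
    """Return the simple candidate most similar to the difficult word, if any."""
    # Step 4: keep only candidates that are in the simple vocabulary.
    simple_words = [w for w in candidates if w in simple_vocabulary]
    if not simple_words:
        return None
    # Steps 5 and 6: score each simple word and take the highest.
    # max() returns the first maximal item, so ties are broken by list order.
    return max(simple_words, key=lambda w: similarity(difficult_word, w))


candidates = ["people", "status", "professor", "learning",
              "skill", "university", "teacher"]
print(select_target("professor", candidates, TOY_BVL, toy_similarity))
# prints: teacher
```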
Experiments
Comparative Methods
• Acquisition of the Target Word Candidates
One word is extracted from the end of each definition statement if it is the same part-of-speech as the difficult word
• Selection of the Proper Target Word
Weighted voting by the following methods
  • Frequency
  • Co-occurrence frequency
  • Point-wise Mutual Information
  • Tri-gram frequency
  • Cosine similarity between document vectors
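Weighted voting over the five selection methods can be sketched as below. The slides do not give the weights or the exact voting rule, so the function `weighted_vote`, the method names used as keys, and all numbers here are hypothetical: each method casts one vote for its preferred candidate, and the candidate with the largest summed weight wins.

```python
from collections import defaultdict
from typing import Dict


def weighted_vote(votes: Dict[str, str], weights: Dict[str, float]) -> str:
    """votes: method name -> candidate it chose; weights: method name -> weight."""
    totals: Dict[str, float] = defaultdict(float)
    for method, candidate in votes.items():
        # A method without a listed weight contributes nothing.
        totals[candidate] += weights.get(method, 0.0)
    # Candidate with the largest total weight.
    return max(totals, key=lambda c: totals[c])


# Hypothetical votes and weights, for illustration only.
example_votes = {
    "frequency": "people",
    "cooccurrence": "teacher",
    "pmi": "teacher",
    "trigram": "people",
    "cosine": "skill",
}
example_weights = {
    "frequency": 1.0, "cooccurrence": 2.0, "pmi": 1.5,
    "trigram": 1.0, "cosine": 0.5,
}

print(weighted_vote(example_votes, example_weights))
# prints: teacher
```

Setting every weight to the same value turns this into weightless voting.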
Experimental Setup
• Experimental object : 152 difficult words
– Do not appear in BVL
– Appear more than 50 times in the Mainichi Newspaper published in 2000
– Include paraphrasable simple words in the definition statements
• Dictionary : Three Japanese dictionaries
• Thesaurus : Japanese WordNet
Procedure (1/2)
• Experiments on the 52 difficult words
– Decide the weights
• Experiments on the 100 difficult words
– Weighted voting
• Evaluation
– Judged by three evaluators
– Decided by majority vote
– Definition of "paraphrasable" : the difficult word can be replaced with the simple word in the original sentence
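The majority-vote evaluation can be expressed as a tiny helper. The function name `majority_paraphrasable` is hypothetical, and the sketch assumes each evaluator gives a yes/no judgment.

```python
from typing import Sequence


def majority_paraphrasable(judgments: Sequence[bool]) -> bool:
    """True when more than half of the evaluators judged the word paraphrasable."""
    return sum(judgments) * 2 > len(judgments)


print(majority_paraphrasable([True, True, False]))   # prints: True
print(majority_paraphrasable([True, False, False]))  # prints: False
```

With three evaluators a tie cannot occur, so the decision is always defined.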
Procedure (2/2)
① Difficult word is extracted
Original Sentence : ・・・ professor ・・・
② Entries of the professor are searched
Japanese Dictionary :
【professor】People of status as professor.
【professor】Status as professor.
【professor】Teach learning and skill.
【professor】University teacher.
③ Nouns are extracted
People, Status, Professor, Learning, Skill, University, Teacher
④ Simple words are extracted
Basic Vocabulary to Learn : People, Learning, University, Skill, Teacher
⑤ Similarities of meaning are calculated
Similarities (in the order shown on the slide) : 0.17, 0.11, 0.08, 0.13, 0.25
Result (1/3)
• Acquisition of the Target Word Candidates
– More paraphrasable simple words are acquired
– Only 3.2 points difference
Many paraphrasable simple words appear at the end of definition statements

Method        Number of paraphrasable words    Percentage of paraphrasable words
Proposed      165 / 221                        74.7 %
Comparative   158 / 221                        71.5 %
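The percentages and the 3.2-point gap follow directly from the counts on the slide, as this short check shows. The helper name `percentage` is introduced here only for illustration.

```python
def percentage(count: int, total: int) -> float:
    """Share of paraphrasable words, in percent."""
    return count / total * 100


proposed = percentage(165, 221)
comparative = percentage(158, 221)

print(f"{proposed:.1f} {comparative:.1f} {proposed - comparative:.1f}")
# prints: 74.7 71.5 3.2
```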
Result (2/3)
[Bar chart, axis from 0 to 70. Two series : acquisition by comparative method, acquisition by proposed method. Selection methods compared :]
【Baseline】Randomness
【Proposed】WordNet-similarity
(1) Frequency
(2) Co-occurrence Frequency
(3) Point-wise Mutual Information
(4) Tri-gram frequency
(5) Cosine similarity
Result (3/3)
[Bar chart, axis from 0 to 70. Two series : acquisition by comparative method, acquisition by proposed method. Selection methods compared :]
【Baseline】Randomness
【Proposed】WordNet-similarity
A) Weightless voting by comparative methods (1)-(5)
B) Weighted voting by comparative methods (1)-(5)
C) Weightless voting that adds the WordNet-similarity to A)
D) Weighted voting that adds the WordNet-similarity to B)
The methods utilizing frequency or context information selected a paraphrasable word
Erroneous Examples (1/2)
• Two or more simple words have the highest similarity
Example
• Original : A summary of the main points.
• Definition : 【Points】essential, score, game, spot
essential : similarity 1.0
score : similarity 1.0
game : similarity 1.0
spot : similarity 1.0
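A tie like this one is easy to detect before committing to a single target word. The sketch below uses the four scores from the example; the function name `top_candidates` and the tolerance parameter are hypothetical. What to do with a tie, for instance falling back to frequency or context information, is left open here.

```python
from typing import Dict, List


def top_candidates(scores: Dict[str, float], tolerance: float = 1e-9) -> List[str]:
    """Return every candidate whose score equals the maximum (within tolerance)."""
    if not scores:
        return []
    best = max(scores.values())
    return [word for word, score in scores.items() if best - score <= tolerance]


points_scores = {"essential": 1.0, "score": 1.0, "game": 1.0, "spot": 1.0}
tied = top_candidates(points_scores)
print(tied)           # prints: ['essential', 'score', 'game', 'spot']
print(len(tied) > 1)  # prints: True, so similarity alone cannot decide
```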
Erroneous Examples (2/2)
• The non-paraphrasable word has the highest similarity
Example
• Original : I can play the program during recording.
• Definition : 【Play】Use the garbage again. What was gone once again regains power and life.
use : paraphrasable, similarity 0.8
power : non-paraphrasable, similarity 1.0
The methods utilizing frequency or context information selected a paraphrasable word
Conclusion
We paraphrase a difficult word into the simple word with the highest similarity, using the whole definition statements
• Acquisition of the Target Word Candidates
– More paraphrasable simple words are acquired
– Many of them appear at the end of definitions
• Selection of the Proper Target Word
The selection based on the similarity is better than the selection by frequency or context information