A CROSS-LINGUAL ANNOTATION PROJECTION
APPROACH FOR RELATION DETECTION
The 23rd International Conference on Computational Linguistics (COLING 2010)
August 24th, 2010, Beijing

Seokhwan Kim (POSTECH), Minwoo Jeong (Saarland University),
Jonghoon Lee (POSTECH), Gary Geunbae Lee (POSTECH)
Contents
• Introduction
• Methods
Cross-lingual Annotation Projection for Relation Detection
Noise Reduction Strategies
• Evaluation
• Conclusion
What’s Relation Detection?
• Relation Extraction
To identify semantic relations between a pair of entities
ACE RDC
• Relation Detection (RD)
• Relation Categorization (RC)
Example: "Jan Mullins, owner of Computer Recycler Incorporated, said that …"
→ Owner-Of (Jan Mullins, Computer Recycler Incorporated)
What’s the Problem?
• Many supervised machine learning approaches have been
successfully applied to the RDC task
(Kambhatla, 2004; Zhou et al., 2005; Zelenko et al., 2003; Culotta
and Sorensen, 2004; Bunescu and Mooney, 2005; Zhang et al.,
2006)
• Datasets for relation detection
Labeled corpora for supervised learning
Available for only a few languages
• English, Chinese, Arabic
No resources for other languages
• Korean
Cross-lingual Annotation Projection
• Goal
To learn a relation detector without significant annotation effort
• Method
To leverage parallel corpora, projecting relation annotations from the
source language LS onto the target language LT
Cross-lingual Annotation Projection
• Previous Work
Part-of-speech tagging (Yarowsky and Ngai, 2001)
Named-entity tagging (Yarowsky et al., 2001)
Verb classification (Merlo et al., 2002)
Dependency parsing (Hwa et al., 2005)
Mention detection (Zitouni and Florian, 2008)
Semantic role labeling (Pado and Lapata, 2009)
• To the best of our knowledge, no previous work has addressed the
RDC task
Overall Architecture

Annotation (source side):
Sentences in LS → Preprocessing (POS tagging, parsing) → NER → Relation Detection → Annotated sentences in LS

Projection (target side):
Sentences in LT → Preprocessing (POS tagging, parsing) → Word alignment with LS → Projection → Annotated sentences in LT

(Both sides start from the same parallel corpus.)
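The projection step in this architecture can be sketched as follows. This is a minimal illustration, not the authors' implementation; the span/alignment data structures are assumed for the example.

```python
def project_span(source_span, alignment):
    """Project a source-side token span onto the target side.

    source_span: (start, end) token indices in the source sentence, inclusive.
    alignment:   iterable of (src_idx, tgt_idx) word-alignment pairs.
    Returns the sorted list of aligned target token indices (may be empty).
    """
    start, end = source_span
    targets = {t for s, t in alignment if start <= s <= end}
    return sorted(targets)


# Example: an entity mention covering source tokens 1-2 is aligned
# to target tokens 3 and 4, so its projection is the span [3, 4].
alignment = [(0, 0), (1, 3), (2, 4), (3, 1)]
print(project_span((1, 2), alignment))  # [3, 4]
```

A relation instance in LT is then formed from the projections of the two source-side entity mentions, which is why alignment errors propagate directly into the projected annotations.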
How to Reduce Noise?
• Error Accumulation
Numerous errors can be generated and accumulated through the
annotation projection procedure:
• Preprocessing for LS and LT
• NER for LS
• Relation Detection for LS
• Word Alignment between LS and LT
• Noise Reduction
A key factor in improving the performance of annotation projection
How to Reduce Noise?
• Noise Reduction Strategies (1)
Alignment Filtering
• Based on heuristics
A projection for an entity mention should be based on alignments between
contiguous word sequences
Both an entity mention in LS and its projection in LT should include at
least one base noun phrase
The projected instance in LT should satisfy clausal agreement with the
original instance in LS
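The first heuristic, contiguity of the aligned target tokens, can be checked mechanically; the base-NP and clausal-agreement heuristics would additionally require a chunker and a parser. A hypothetical sketch of the contiguity filter:

```python
def is_contiguous(target_indices):
    """Heuristic 1: accept a projected mention only if its aligned
    target tokens form one contiguous word sequence."""
    idx = sorted(target_indices)
    return bool(idx) and idx == list(range(idx[0], idx[-1] + 1))


print(is_contiguous([3, 4, 5]))  # True  -> projection accepted
print(is_contiguous([3, 5]))     # False -> projection rejected (gap at 4)
```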
How to Reduce Noise?
• Noise Reduction Strategies (2)
Alignment Correction
• Based on a bilingual dictionary for entity mentions
Each entry of the dictionary is a pair of an entity mention in LS and its
translation or transliteration in LT
FOR each entity ES in LS
    RETRIEVE counterpart ET from DICT(E-T)
    SEEK ET in the sentence ST in LT
    IF matched THEN
        MAKE new alignment ES-ET
    ENDIF
ENDFOR

(Example: the dictionary entry BCD - βγ adds a corrected alignment between the source span B C D and the target span β γ.)
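The pseudocode above can be sketched concretely as follows. This is a minimal sketch, not the authors' code; the token-level representation of entities and translations is an assumption for the example.

```python
def correct_alignments(src_entities, tgt_tokens, dictionary):
    """For each source entity mention, look up its translation or
    transliteration in the bilingual dictionary; if that translation
    occurs in the target sentence, record a new alignment for the
    matching target span (indices into tgt_tokens)."""
    corrections = {}
    for entity in src_entities:
        translation = dictionary.get(entity)
        if translation is None:
            continue  # entity not covered by the dictionary
        n = len(translation)
        for i in range(len(tgt_tokens) - n + 1):
            if tgt_tokens[i:i + n] == translation:
                corrections[entity] = list(range(i, i + n))
                break
    return corrections


# Mirrors the slide's example: the dictionary entry BCD - βγ re-aligns
# the source span "B C D" to the target span "β γ".
tgt = ["α", "β", "γ", "δ", "ε"]
print(correct_alignments([("B", "C", "D")], tgt,
                         {("B", "C", "D"): ["β", "γ"]}))
# {('B', 'C', 'D'): [1, 2]}
```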
How to Reduce Noise?
• Noise Reduction Strategies (3)
Assessment-based Instance Selection
• Based on the reliability of each projected instance in LT
Evaluated by the confidence score of monolingual relation detection for
the original counterpart instance in LS
Only instances with scores larger than the threshold value θ are accepted
(Example: with threshold θ = 0.7, an instance with confidence 0.9 is accepted, while one with confidence 0.6 is rejected.)
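The selection rule is a simple threshold filter over the source-side confidence scores; a hypothetical sketch:

```python
def select_instances(instances, theta=0.7):
    """Keep only projected instances whose source-side relation-detection
    confidence is larger than the threshold theta."""
    return [inst for inst, conf in instances if conf > theta]


# The slide's example: conf 0.9 passes theta = 0.7, conf 0.6 does not.
projected = [("instance-A", 0.9), ("instance-B", 0.6)]
print(select_instances(projected, theta=0.7))  # ['instance-A']
```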
Experimental Setup
• Dataset
English-Korean parallel corpus
• 454,315 aligned sentence pairs in English and Korean
• Word-aligned with GIZA++
Korean RDC corpus
• Annotated following the LDC guidelines for the ACE RDC corpus
• 100 news documents in Korean
835 sentences
3,331 entity mentions
8,354 relation instances
Experimental Setup
• Preprocessors
English
• Stanford Parser (Klein and Manning, 2003)
• Stanford Named Entity Recognizer (Finkel et al., 2005)
Korean
• Korean POS Tagger (Lee et al., 2002)
• MST Parser (McDonald et al., 2006)
Experimental Setup
• Relation Detection for English Sentences
Tree kernel-based SVM classifier
• Training Dataset
ACE 2003 corpus
• 674 documents
• 9,683 relation instances
• Model
Shortest path enclosed subtrees kernel (Zhang et al., 2006)
• Implementation
SVM-Light (Joachims, 1998)
Tree Kernel Tools (Moschitti, 2006)
Experimental Setup
• Relation Detection for Korean Sentences
Tree kernel-based SVM classifier
• Training Dataset
Half of the Korean RDC corpus (baseline)
Projected instances
• Model
Shortest path dependency kernel (Bunescu and Mooney, 2005)
• Implementation
SVM-Light (Joachims, 1998)
Tree Kernel Tools (Moschitti, 2006)
Experimental Setup
• Experimental Sets
Combinations of noise reduction strategies
• (S1: Heuristic, S2: Dictionary, S3: Assessment)
1. Baseline
Trained with only half of the Korean RDC corpus
2. Baseline + Projections (no noise reduction)
3. Baseline + Projections (S1)
4. Baseline + Projections (S1 + S2)
5. Baseline + Projections (S3)
6. Baseline + Projections (S1 + S3)
7. Baseline + Projections (S1 + S2 + S3)
Experimental Setup
• Evaluation
On the second half of the Korean RDC corpus
• The first half is for the baseline
On true entity mentions with true chaining of coreference
Evaluated by Precision/Recall/F-measure
Experimental Results

| Model | No assessment (P / R / F) | With assessment (P / R / F) |
|---|---|---|
| baseline | 60.5 / 20.4 / 30.5 | – |
| baseline + projection | 22.5 / 6.5 / 10.0 | 29.1 / 13.2 / 18.2 |
| baseline + projection (heuristics) | 51.4 / 15.5 / 23.8 | 56.1 / 22.9 / 32.5 |
| baseline + projection (heuristics + dictionary) | 55.3 / 19.4 / 28.7 | 59.8 / 26.7 / 36.9 |
Findings
• Non-filtered projections were poor (F 10.0, far below the baseline F 30.5)
• The alignment heuristics were helpful (F 23.8), but the result was still much worse than the baseline
• The dictionary was also helpful (F 28.7), but still worse than the baseline
• The assessment-based instance selection boosted performance in every configuration
• The combined strategies achieved better performance than the baseline (F 36.9 vs. 30.5)
Conclusion
• Summary
A cross-lingual annotation projection for relation detection
Three strategies for noise reduction
Projected instances from an English-Korean parallel corpus helped to
improve task performance
• when combined with the noise reduction strategies
• Future work
A cross-lingual annotation projection for relation categorization
More elaborate strategies for noise reduction to improve the
projection performance for relation extraction
Q&A