kbp2014 entity linking scorer

19
KBP2014 Entity Linking Scorer Xiaoman Pan, Qi Li, Heng Ji, Xiaoqiang Luo, Ralph Grishman [email protected]

Upload: talli

Post on 22-Feb-2016

55 views

Category:

Documents


0 download

DESCRIPTION

KBP2014 Entity Linking Scorer. Xiaoman Pan, Qi Li, Heng Ji , Xiaoqiang Luo , Ralph Grishman jih @ rpi.edu. Overview. We will apply two steps to evaluate the KBP2014 Entity Discovery and Linking results. 1. Clustering score 2. Entity Linking ( Wikification ) F-score. Overview. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: KBP2014 Entity Linking Scorer

KBP2014 Entity Linking Scorer

Xiaoman Pan, Qi Li, Heng Ji, Xiaoqiang Luo, Ralph [email protected]

Page 2: KBP2014 Entity Linking Scorer

Overview We will apply two steps to evaluate the KBP2014

Entity Discovery and Linking results.

1. Clustering score2. Entity Linking (Wikification) F-score

Page 3: KBP2014 Entity Linking Scorer

Overview We use a tuple

doc-id, start, end, entity-type, kb-id⟨ ⟩to represent each entity mention, where a special type of kb-id is NIL.

Let s be an entity mention in the system output, g be an corresponding gold-standard. An output mention s matches a reference mention g iff:

1. s.doc-id = g.doc-id,2. s.start = g.start, s.end = g.end, 3. s.entity-type = g.entity-type,4. and s.kb-id = g.kb-id.

Page 4: KBP2014 Entity Linking Scorer

Clustering score

In this step, we only concern clustering performance. Thus, change all mentions’ kb_id to NIL.

We will apply three metrics to evaluate the clustering score B-Cubed metric CEAF metric Graph Edit Distance (G-Edit)

Page 5: KBP2014 Entity Linking Scorer

B-Cubed

Page 6: KBP2014 Entity Linking Scorer

B-Cubed

Page 7: KBP2014 Entity Linking Scorer
Page 8: KBP2014 Entity Linking Scorer
Page 9: KBP2014 Entity Linking Scorer

CEAF CEAF (Luo 2005) Idea: a mention or entity should not be credited

more than once Formulated as a bipartite matching problem

A special ILP problem efficient algorithm: Kuhn-Munkres

We will use CEAFm as the official scoring metric because it’s more sensitive to cluster size than CEAFe

Page 10: KBP2014 Entity Linking Scorer

CEAF (Luo, 2005)

Page 11: KBP2014 Entity Linking Scorer
Page 12: KBP2014 Entity Linking Scorer
Page 13: KBP2014 Entity Linking Scorer

G-Edit The first step, evaluate the name mention

tagging F-measure The second step, evaluate the overlapped

mentions by using graph-based edit distance To do:

Find a theoretical proof that the connected component based method is the unique optimal solution

Try to make the matching more sensitive to cluster size

Page 14: KBP2014 Entity Linking Scorer
Page 15: KBP2014 Entity Linking Scorer
Page 16: KBP2014 Entity Linking Scorer
Page 17: KBP2014 Entity Linking Scorer
Page 18: KBP2014 Entity Linking Scorer

Wikification F-score

Existing Entity Linking (Wikification) F-score

Page 19: KBP2014 Entity Linking Scorer