distant supervision and multir - indian institute of...

Distant Supervision and MultiR Happy Mittal

Post on 27-Jan-2020


Page 1: Distant Supervision and MultiR - Indian Institute of ...mausam/courses/col864/spring2017/slides/06-multir.pdf · Steve Jobs founded Apple Apple was founded by Steve Jobs X1 X2 1 CEO-Of

Distant Supervision and MultiR

Happy Mittal


We will discuss

• Distant Supervision [Mintz et al, 2009]

• MultiR [Hoffmann et al, 2011]


Relation Instance Extraction

• Fully Supervised Learning
  • Labeled corpora of sentences.
  • Suffers from small dataset, domain bias.

• Unsupervised Learning
  • Cluster patterns to identify relations.
  • Large corpora available.
  • Can’t give name to relations identified.

• Bootstrap Learning
  • Give initial seed patterns and facts.
  • Generate more facts and patterns.
  • Suffers from semantic drift.

• Distant Supervision
  • Combines advantages of above approaches.

Example : Hrithik Roshan’s Movie Kaabil features love affair between two blind people.

Actor(Hrithik Roshan, Kaabil)


Distant Supervision [Mintz et al 2009]

Inputs :

• Sentences (Ex : Wikipedia articles)

• Knowledge base (Ex : Freebase)

  Person          Birth Place
  Edwin Hubble    Marshfield
  ….              ….

Generate training data. HOW ?

Assumption : Fact r(e1,e2) => Every sentence having entities e1 and e2 specifies relation r.
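
The labeling heuristic above can be sketched in a few lines. This is a minimal illustration, not the pipeline of Mintz et al: the toy knowledge base, the extra sentences, the function name `label_sentences`, and the use of plain substring matching in place of a real NER tagger are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: (entity1, entity2) -> relation name.
KB: Dict[Tuple[str, str], str] = {
    ("Edwin Hubble", "Marshfield"): "Birthplace",
}

def label_sentences(sentences: List[str],
                    kb: Dict[Tuple[str, str], str]) -> List[Tuple[str, str, str, str]]:
    """Strong assumption: every sentence that mentions both e1 and e2
    of a KB fact r(e1, e2) becomes a positive training example for r."""
    examples = []
    for sent in sentences:
        for (e1, e2), rel in kb.items():
            # Substring matching stands in for entity recognition (assumption).
            if e1 in sent and e2 in sent:
                examples.append((sent, e1, e2, rel))
    return examples

sentences = [
    "Astronomer Edwin Hubble was born in Marshfield, Missouri.",
    "Edwin Hubble visited Marshfield as an adult.",  # hypothetical sentence
    "Marshfield is a city in Missouri.",
]
training = label_sentences(sentences, KB)
for example in training:
    print(example)
```

The second sentence is labeled Birthplace even though it does not express that relation, which is exactly the noise the strong assumption introduces.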


Distant Supervision (Generating training data)

• Example sentence : Astronomer Edwin Hubble was born in Marshfield, Missouri.

• Features :
  • Lexical Features

    o Entity Types of both entities.

      NE1   NE2   Label
      PER   LOC   Birthplace

    o Words between entities and their POS tags.

      NE1   Middle                                NE2   Label
      PER   [was/VERB born/VERB in/CLOSED]        LOC   Birthplace

    o Window of k words to left and right, k∈{0,1,2}

      Left Window      NE1   Middle                            NE2   Right window   Label
      []               PER   [was/VERB born/VERB in/CLOSED]    LOC   []             Birthplace
      [Astronomer]     PER   [was/VERB born/VERB in/CLOSED]    LOC   [,]            Birthplace
      [#,Astronomer]   PER   [was/VERB born/VERB in/CLOSED]    LOC   [,Missouri]    Birthplace

  • Syntactic Features

    o Dependency Path between entities.

    o Window node in dependency path.
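
The lexical feature rows for the Edwin Hubble sentence can be reproduced with a short sketch. The function name `lexical_features`, the tuple representation, and the POS tags given to tokens outside the middle span are assumptions for illustration; "#" is used as the sentence-boundary pad, as in the k = 2 row.

```python
from typing import List, Sequence, Tuple

def lexical_features(tokens: List[str], pos: List[str],
                     e1: Tuple[int, int], e2: Tuple[int, int],
                     types: Tuple[str, str],
                     ks: Sequence[int] = (0, 1, 2)) -> list:
    """e1 and e2 are (start, end) token spans, with e1 before e2.
    Returns one (left window, NE1, middle, NE2, right window) tuple per k."""
    # Words between the two entities, each paired with its POS tag.
    middle = [f"{w}/{t}" for w, t in zip(tokens[e1[1]:e2[0]], pos[e1[1]:e2[0]])]
    feats = []
    for k in ks:
        left = tokens[max(0, e1[0] - k):e1[0]]
        left = ["#"] * (k - len(left)) + left      # pad at sentence start
        right = tokens[e2[1]:e2[1] + k]
        right = right + ["#"] * (k - len(right))   # pad at sentence end
        feats.append((tuple(left), types[0], tuple(middle), types[1], tuple(right)))
    return feats

tokens = ["Astronomer", "Edwin", "Hubble", "was", "born", "in",
          "Marshfield", ",", "Missouri", "."]
# Only the tags of "was born in" come from the slide; the rest are placeholders.
pos = ["NOUN", "NOUN", "NOUN", "VERB", "VERB", "CLOSED", "NOUN", ".", "NOUN", "."]
feats = lexical_features(tokens, pos, e1=(1, 3), e2=(6, 7), types=("PER", "LOC"))
for f in feats:
    print(f)
```

Each tuple would be conjoined into a single feature and paired with the label Birthplace during training.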


Distant supervision

• Strong Assumption : If a fact r(e1,e2) is seen in KB, then
  • Every sentence having e1 and e2 specifies relation r.

• Relax this assumption :
  • At least one sentence having e1 and e2 specifies relation r [Riedel et al, 2010]


Relaxing the assumption [Riedel et al 2010]

• Relation Variable : $Y \in R$ (in the example, Y = Founded)

• Relation mention Variables : $Z_1, Z_2 \in \{0,1\}$, one per sentence

  X1 : Steve Jobs founded Apple              Z1 = 1
  X2 : Steve Jobs is the CEO of Apple        Z2 = 0

• Model the joint distribution $P(Y = y, Z = z \mid x)$

• Problem : Doesn’t allow overlapping relations.

• MultiR solves that problem.


MultiR [Hoffmann et al 2011]

• Relation Variables $Y \in \{0,1\}^{|R|}$ (Capture aggregate level prediction) : one binary variable per relation, e.g. Founded, CEO-of.

• Relation mention Variables $Z_i \in R$ (Capture sentence level prediction) : one variable per sentence, which may also take the value None.

  X1 : Steve Jobs founded Apple              Z1 = Founded
  X2 : Steve Jobs is the CEO of Apple        Z2 = CEO-of
  X3 : Steve Jobs left Apple                 Z3 = None


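MultiR [Hoffmann et al 2011]

• Probability Distribution

$$P(Y = y, Z = z \mid x) = \frac{1}{Z_x} \prod_r \phi^{join}(y_r, z) \prod_i \phi^{extract}(z_i, x_i)$$

  • $\phi^{join}(y_r, z)$ : 1 if at least one $z_i$ mentions relation $y_r$

  • $\phi^{extract}(z_i, x_i)$ : [Mintz et al] features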


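MultiR [Hoffmann et al 2011]

• Parameter Learning

$$P(Y = y, Z = z \mid x; \theta) = \frac{1}{Z_x} \prod_r \phi^{join}(y_r, z) \prod_i \phi^{extract}(z_i, x_i)$$

$$= \frac{1}{Z_x} \prod_r \phi^{join}(y_r, z) \prod_i \exp\Big(\sum_j \theta_j \phi_j(z_i, x_i)\Big)$$

  • $\phi^{join}(y_r, z)$ : 1 if at least one $z_i$ mentions relation $y_r$

  • $\phi_j(z_i, x_i)$ : [Mintz et al] features

• Treat Z variables as latent variables.

• Interested in maximizing (here i indexes training examples, i.e. entity pairs)

$$L(\theta) = \prod_i P(y_i \mid x_i; \theta) = \prod_i \sum_z P(y_i, z \mid x_i; \theta)$$

$$l(\theta) = \sum_i \log \sum_z P(y_i, z \mid x_i; \theta)$$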
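
For intuition, one term of the log-likelihood, $\log \sum_z P(y, z \mid x; \theta)$, can be computed by brute force on a tiny example. This is only a sketch under stated assumptions: the join factor is read as a deterministic OR (y_r = 1 exactly when some z_i equals r), the features are toy (word, relation) indicators rather than the Mintz et al features, and the label set `RELATIONS` and all function names are hypothetical. Enumeration is exponential in the number of sentences, so this is not how learning is done in practice.

```python
import itertools
import math

RELATIONS = ["None", "Founder", "CEO-of"]  # hypothetical label set

def extract_score(theta: dict, z_i: str, x_i: str) -> float:
    # log phi_extract(z_i, x_i) = sum_j theta_j * phi_j(z_i, x_i),
    # with toy (word, relation) indicator features (assumption).
    return sum(theta.get((w, z_i), 0.0) for w in x_i.split())

def implied_y(z) -> frozenset:
    # Deterministic-OR reading of the join factor (assumption):
    # relation r is on exactly when at least one z_i equals r.
    return frozenset(r for r in z if r != "None")

def log_marginal(theta: dict, x: list, y) -> float:
    """log sum_z P(y, z | x; theta), enumerating every assignment z.
    y must be reachable by some z, otherwise the log is undefined."""
    numerator = 0.0
    partition = 0.0  # Z_x
    for z in itertools.product(RELATIONS, repeat=len(x)):
        weight = math.exp(sum(extract_score(theta, zi, xi) for zi, xi in zip(z, x)))
        partition += weight
        if implied_y(z) == y:
            numerator += weight
    return math.log(numerator / partition)

x = ["Steve Jobs founded Apple", "Steve Jobs left Apple"]
y = frozenset({"Founder"})
uniform = log_marginal({}, x, y)
trained = log_marginal({("founded", "Founder"): 2.0}, x, y)
print(uniform, trained)
```

With all weights at zero, 3 of the 9 assignments are consistent with y, so the marginal is 1/3; putting weight on the feature (founded, Founder) raises it.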


MultiR [Hoffmann et al 2011]

• Parameter learning
  • Assumption of online training.
  • Exact computation is difficult : compute argmax instead.

• Learning Algorithm
  • Need to do two inferences : $\arg\max_{y,z} P(y, z \mid x; \theta)$ and $\arg\max_z P(z \mid x, y; \theta)$.
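
One common way to combine two such argmax inferences in an online learner is a perceptron-style update: predict (y, z) jointly, and if the predicted y disagrees with the KB, move the weights toward the features of the best z that is consistent with the KB. The sketch below assumes that scheme; it is not reconstructed from the slide, whose algorithm figure is not part of this transcript. The toy features, the label set, and every function name are hypothetical, and `infer_given_y` is a crude stand-in that ignores the coverage constraint.

```python
from collections import defaultdict

RELATIONS = ["None", "Founder", "CEO-of"]  # hypothetical label set

def features(sentence: str, rel: str) -> list:
    # Toy (word, relation) indicators standing in for richer features.
    return [(w, rel) for w in sentence.split()]

def score(theta, sentence: str, rel: str) -> float:
    return sum(theta[f] for f in features(sentence, rel))

def infer_joint(theta, x):
    # First inference: best z per sentence, then y = relations mentioned.
    z = [max(RELATIONS, key=lambda r: score(theta, s, r)) for s in x]
    return {r for r in z if r != "None"}, z

def infer_given_y(theta, x, y):
    # Second inference, crudely: best relation per sentence among those in y.
    # Ignores the "each y needs at least one sentence" constraint (assumption).
    allowed = sorted(y) if y else ["None"]
    return [max(allowed, key=lambda r: score(theta, s, r)) for s in x]

def perceptron_step(theta, x, y_gold):
    y_pred, z_pred = infer_joint(theta, x)
    if y_pred != y_gold:
        z_star = infer_given_y(theta, x, y_gold)
        for s, zs, zp in zip(x, z_star, z_pred):
            for f in features(s, zs):
                theta[f] += 1.0   # reward features of the KB-consistent labeling
            for f in features(s, zp):
                theta[f] -= 1.0   # penalize features of the wrong prediction
    return theta

theta = defaultdict(float)
x = ["Steve Jobs founded Apple"]
perceptron_step(theta, x, {"Founder"})
print(infer_joint(theta, x))
```

After a single update on this one-sentence example, the joint inference predicts Founder.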


MultiR Inference 1 : $\arg\max_{y,z} P(y, z \mid x; \theta)$

• Relation Variables $Y \in \{0,1\}^{|R|}$ (Capture aggregate level prediction) : founder, CEO-of, Capital.

• Relation mention Variables $Z_i \in R$ (Capture sentence level prediction) :

  X1 : Steve Jobs founded Apple
  X2 : Apple was founded by Steve Jobs
  X3 : Steve Jobs is the CEO of Apple

• Extraction scores for each relation (rows) and sentence (columns) :

             X1     X2     X3
  Founder   10.5   12.5    4.5
  CEO-of     8.9    8.7    8.5
  Capital    6.3    4.5    0.5

• Each Z takes its highest scoring relation : Z1 = Founder, Z2 = Founder, Z3 = CEO-of.

• Each Y is 1 if some Z mentions it : founder = 1, CEO-of = 1, Capital = 0.

• Complexity : $O(|R||S|)$
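
The joint argmax on the slide's score table can be sketched as follows. The assignment of table columns to sentences X1, X2, X3 and the function name `joint_argmax` are assumptions; the None label is omitted because the table has no row for it.

```python
# Score table from the slide: one row per relation, one column per sentence.
SCORES = {
    "Founder": [10.5, 12.5, 4.5],
    "CEO-of":  [8.9, 8.7, 8.5],
    "Capital": [6.3, 4.5, 0.5],
}

def joint_argmax(scores: dict):
    """Pick the best relation for every sentence independently, then switch
    on each relation that is mentioned at least once. Runs in O(|R||S|)."""
    n_sentences = len(next(iter(scores.values())))
    z = [max(scores, key=lambda r: scores[r][i]) for i in range(n_sentences)]
    y = {r: int(r in z) for r in scores}
    return y, z

y, z = joint_argmax(SCORES)
print(z)  # ['Founder', 'Founder', 'CEO-of']
print(y)  # {'Founder': 1, 'CEO-of': 1, 'Capital': 0}
```

Because y is a deterministic function of z here, maximizing over each sentence separately is enough for the joint argmax.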


MultiR Inference 2 : $\arg\max_z P(z \mid x, y; \theta)$

• Given y (founder = 1, CEO-of = 1, Capital = 0), find the best assignment to the Z variables of :

  X1 : Steve Jobs founded Apple
  X2 : Apple was founded by Steve Jobs
  X3 : Steve Jobs is the CEO of Apple

• Variant of weighted edge cover problem
  • Potentials as edge weights (Ignore edges with y = 0).
  • Each y at least one edge.
  • Each z exactly one edge.

• Edge weights that remain after dropping Capital (y = 0) :

             X1     X2     X3
  Founder   10.5   12.5    4.5
  CEO-of     8.9    8.7    8.5

• Solution : Z1 = Founder, Z2 = Founder, Z3 = CEO-Of.

• Exact Solution : $O(V(E + V \log V))$

• Approx Solution : $O(|R||S|)$
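
A simple greedy heuristic for this constrained assignment is sketched below. It is one possible approximation, written for illustration, and is not claimed to be the approximation used in the paper: every sentence first takes its best active relation, and any active relation left uncovered then takes over the sentence where the switch costs the least. The function name `constrained_argmax` is hypothetical, and the None label is again left out.

```python
from collections import Counter

SCORES = {
    "Founder": [10.5, 12.5, 4.5],
    "CEO-of":  [8.9, 8.7, 8.5],
    "Capital": [6.3, 4.5, 0.5],
}

def constrained_argmax(scores: dict, active: list) -> list:
    """Greedy sketch: each z gets exactly one relation, and each active
    relation (y = 1) must end up with at least one sentence."""
    n = len(next(iter(scores.values())))
    if not active or len(active) > n:
        raise ValueError("need between 1 and n active relations")
    # Step 1: ignore relations with y = 0; each sentence takes its best edge.
    z = [max(active, key=lambda r: scores[r][i]) for i in range(n)]
    # Step 2: repair coverage for active relations nobody picked.
    for r in active:
        if r in z:
            continue
        counts = Counter(z)
        # Only move a sentence whose current relation has another supporter,
        # so relations that are already covered stay covered.
        movable = [i for i in range(n) if counts[z[i]] > 1]
        cheapest = min(movable, key=lambda i: scores[z[i]][i] - scores[r][i])
        z[cheapest] = r
    return z

z_two = constrained_argmax(SCORES, ["Founder", "CEO-of"])
z_three = constrained_argmax(SCORES, ["Founder", "CEO-of", "Capital"])
print(z_two)    # ['Founder', 'Founder', 'CEO-of']
print(z_three)  # ['Capital', 'Founder', 'CEO-of']
```

With founder and CEO-of active, the greedy pass already covers both relations and reproduces the solution on the slide. The second call is a hypothetical variant with Capital also active, showing the repair step.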


Experiments

• Data
  • NY Times sentences : NER tagged.
  • Used Freebase as KB.

• Evaluation Metric
  • Challenging
    o Only 3% of sentences match facts in KB.
    o Number of matches across relations highly unbalanced.
  • Aggregate Extraction
    o Matched extracted relations with Freebase relations.
    o Underestimates accuracy because many true relations are not in Freebase.
  • Sentential Extraction
    o Sampled sentences from union of two sets of sentences :
      - Sentences from which some relation is extracted.
      - Sentences whose arguments match with entities in Freebase.
    o Manually labelled them correct or incorrect.
    o Overestimates the recall.
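
The aggregate (held-out) evaluation amounts to set comparison against the KB. The sketch below uses made-up facts and a hypothetical function name `precision_recall`, purely to show why a correct extraction that is missing from the KB lowers the measured precision.

```python
def precision_recall(extracted: set, kb_facts: set):
    """Score extracted facts against the KB: anything not in the KB counts
    as wrong, even when the fact is actually true."""
    correct = extracted & kb_facts
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(kb_facts) if kb_facts else 0.0
    return precision, recall

# Hypothetical toy data, not from the experiments.
extracted = {
    ("Founder", "Steve Jobs", "Apple"),
    ("CEO-of", "Steve Jobs", "Apple"),
    ("CEO-of", "Tim Cook", "Apple"),       # true, but absent from the toy KB
}
kb_facts = {
    ("Founder", "Steve Jobs", "Apple"),
    ("CEO-of", "Steve Jobs", "Apple"),
    ("Founder", "Steve Wozniak", "Apple"),
}
precision, recall = precision_recall(extracted, kb_facts)
print(precision, recall)
```

Here both values come out as 2/3, although all three extractions are true statements.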


Experiments

• Systems compared
  • Original implementation of Riedel et al [2010]
  • SoloR : Reimplementation of Riedel et al [2010]
  • MultiR

• Metrics
  • Aggregate and sentential extraction results (PR curve)
  • Relation specific results
  • Running time


Experiments

• Results
  • Aggregate extraction
    o MultiR : High precision over all recall levels.
    o MultiR : Recall improves from 20% to 25%.
    o Low precision in the 0-1% recall range.
    o To investigate, extracted the top 10 relations marked wrong : they were correct but not present in Freebase.


Experiments

• Results
  • Sentential extraction
    o Riedel et al didn’t report.
    o MultiR : High precision and recall
    o MultiR : F1 score : 60.5%


Experiments

• Results
  • Relation specific results
    o Take the 10 most frequent relations.
    o $S_r^M$ : Sentences from which MultiR extracted relation r.
    o $S_r^F$ : Sentences matching arguments in Freebase for relation r.
    o Sample 100 sentences from both.
    o Compute accuracy, precision and recall.


Experiments

Effect of modeling overlapping relations


Discussion

• Only relies on Freebase for experimental evaluation [Nupur et al]

• Assumes that if a fact is present in text, then it must be present in KB [Dinesh Raghu]

• Only one relation in a sentence [Barun]

• Assume entities occur as NP only. [Gagan]

• Should use sampling instead of argmax as done in Riedel et al. [Happy, Barun]

• Evaluation problem : Only 3% of sentences match facts in Freebase [Gagan]

• For sentential extraction evaluation, sampled only 1000 sentences.

• Separate graph for every entity pair : Scaling issue [Prachi]


Possible Extensions

• Evaluate on some other datasets as well, like Google knowledge graph [Anshul, Rishabh]

• Bootstrapping like NELL [Gagan et al]

• Iteratively correct the facts during learning for 0-1% recall range [Surag]

• Extract entity mentions spanning multiple sentences [Anshul]

• Relation to MLNs : Apply Lifting [Ankit]