
Montague Meets Markov: Deep Semantics with Probabilistic Logical Form

Islam Beltagy, Cuong Chau, Gemma Boleda, Dan Garrette, Katrin Erk, Raymond Mooney

The University of Texas at Austin

[Title-slide portraits: Richard Montague and Andrey Markov]

Semantic Representations

• Formal Semantics
  – Uses first-order logic
  – Deep
  – Brittle

• Distributional Semantics
  – Statistical method
  – Robust
  – Shallow


• Goal: combine advantages of both logical and distributional semantics in one framework

Semantic Representations

• Combining both logical and distributional semantics
  – Represent meaning using a probabilistic logic (in contrast with standard first-order logic)
    • Markov Logic Network (MLN)
  – Generate soft inference rules
    • From distributional semantics


∀x hamster(x) → gerbil(x) | f(w)

Agenda

• Introduction
• Background: MLN
• RTE
• STS
• Future work and Conclusion


Agenda

• Introduction
• Background: MLN
• RTE
• STS
• Future work and Conclusion


Markov Logic Networks [Richardson & Domingos, 2006]

• MLN: Soft FOL
  – Weighted rules (FOL rules paired with rule weights):

    1.5   ∀x Smokes(x) ⇒ Cancer(x)
    1.1   ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))


Markov Logic Networks [Richardson & Domingos, 2006]

• MLN: Template for constructing Markov networks
• Two constants: Anna (A) and Bob (B)

  1.5   ∀x Smokes(x) ⇒ Cancer(x)
  1.1   ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))

[Ground network figure: nodes Smokes(A), Smokes(B), Cancer(A), Cancer(B), Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B)]
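The template view can be made concrete with a small sketch. The Python below (a rough illustration, not Alchemy; the variable names are mine) grounds the two weighted rules over the constants A and B and counts the ground atoms and ground formulas that make up the network above.

```python
from itertools import product

# Ground the MLN template over the constants Anna (A) and Bob (B).
constants = ["A", "B"]

# Ground atoms: one network node per grounding of each predicate.
ground_atoms = (
    [f"Smokes({c})" for c in constants]
    + [f"Cancer({c})" for c in constants]
    + [f"Friends({c1},{c2})" for c1, c2 in product(constants, repeat=2)]
)

# Ground formulas: one weighted copy of each rule per substitution of constants.
ground_formulas = []
for x in constants:
    ground_formulas.append((1.5, f"Smokes({x}) => Cancer({x})"))
for x, y in product(constants, repeat=2):
    ground_formulas.append((1.1, f"Friends({x},{y}) => (Smokes({x}) <=> Smokes({y}))"))

print(len(ground_atoms), "ground atoms")        # 8 nodes, as in the figure above
print(len(ground_formulas), "ground formulas")  # 2 + 4 = 6 weighted ground formulas
```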


Markov Logic Networks [Richardson & Domingos, 2006]

• Probability Mass Function (PMF)

  P(X = x) = (1/Z) exp( Σᵢ wᵢ nᵢ(x) )

  – x: a possible truth assignment
  – Z: normalization constant
  – wᵢ: weight of formula i
  – nᵢ(x): number of true groundings of formula i in x

• Inference: calculate probability of atoms
  – P(Cancer(Anna) | Friends(Anna,Bob), Smokes(Bob))
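A toy illustration of the PMF (a sketch with a single constant A and the two rules above; the world representation and helper names are only illustrative): enumerate every truth assignment, weight each by exp(Σᵢ wᵢ nᵢ(x)), and normalize by Z. Marginal probabilities of atoms then come from summing over worlds.

```python
import math
from itertools import product

atoms = ["Smokes(A)", "Cancer(A)", "Friends(A,A)"]
weights = [1.5, 1.1]

def n_counts(world):
    """Number of true groundings of each rule in world x (one constant, so one grounding each)."""
    n1 = 1 if (not world["Smokes(A)"] or world["Cancer(A)"]) else 0  # Smokes(A) => Cancer(A)
    n2 = 1  # Friends(A,A) => (Smokes(A) <=> Smokes(A)) is trivially true
    return [n1, n2]

worlds = [dict(zip(atoms, vals)) for vals in product([False, True], repeat=len(atoms))]
unnorm = [math.exp(sum(w * n for w, n in zip(weights, n_counts(x)))) for x in worlds]
Z = sum(unnorm)                          # normalization constant
probs = [u / Z for u in unnorm]          # P(X = x) for every possible truth assignment

# Inference by summing over worlds where the query atom is true: P(Cancer(A))
p_cancer = sum(p for x, p in zip(worlds, probs) if x["Cancer(A)"])
print(round(p_cancer, 3))
```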


Agenda

• Introduction
• Background: MLN
• RTE
• STS
• Future work and Conclusion


Recognizing Textual Entailment (RTE)

• Given two sentences, a premise and a hypothesis, does the first entail the second?

• e.g.
  – Premise: “A male gorilla escaped from his cage in Berlin zoo and sent terrified visitors running for cover, the zoo said yesterday.”
  – Hypothesis: “A gorilla escaped from his cage in a zoo in Germany.”
  – Entails: true


System Architecture


[Architecture diagram: Sent1 and Sent2 → BOXER → LF1 and LF2 → Dist. Rule Constructor (over a Vector Space) → Rule Base → ALCHEMY (MLN Inference) → result]

• BOXER [Bos et al., 2004]: maps sentences to logical form

• Distributional Rule constructor: generates relevant soft inference rules based on distributional similarity

• ALCHEMY: probabilistic MLN inference
• Result: degree of entailment
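The data flow can be summarized as a short schematic. The helpers below are toy placeholders standing in for the external components (BOXER for parsing, the distributional rule constructor, and ALCHEMY for MLN inference); only the shape of the pipeline is meant literally.

```python
def boxer_parse(sentence):
    # Placeholder for BOXER: returns a "logical form" (here, just lowercased tokens).
    return [w.lower() for w in sentence.split()]

def build_soft_rules(lf1, lf2):
    # Placeholder for the Dist. Rule Constructor: one candidate soft rule per word pair.
    return [(a, b) for a in lf1 for b in lf2]

def mln_inference(evidence, rules, query):
    # Placeholder for ALCHEMY: a real run returns P(result()), the degree of entailment.
    return 0.5

def rte(sent1, sent2):
    lf1, lf2 = boxer_parse(sent1), boxer_parse(sent2)    # Sent1, Sent2 -> LF1, LF2
    rules = build_soft_rules(lf1, lf2)                   # rule base from distributional similarity
    return mln_inference(lf1, rules, query="result()")   # degree of entailment

print(rte("A man is cutting pickles", "A guy is slicing cucumber"))
```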

Sample Logical Forms

• Premise: “A man is cutting pickles”
  – ∃x,y,z ( man(x) ^ cut(y) ^ agent(y, x) ^ pickles(z) ^ patient(y, z) )

• Hypothesis: “A guy is slicing cucumber”
  – ∃x,y,z ( guy(x) ^ slice(y) ^ agent(y, x) ^ cucumber(z) ^ patient(y, z) )

• Hypothesis in the query form
  – analogous to the negated hypothesis in standard theorem proving
  – ∀x,y,z ( guy(x) ^ slice(y) ^ agent(y, x) ^ cucumber(z) ^ patient(y, z) → result() )

• Query: result() [Degree of Entailment]


Distributional Lexical Rules

• For every pair of words (a, b) where a is in S1 and b is in S2, add a soft rule relating the two
  – ∀x a(x) → b(x) | wt(a, b)
  – wt(a, b) = f( cos(a, b) )

• Premise: “A man is cutting pickles”

• Hypothesis: “A guy is slicing cucumber”
  – ∀x man(x) → guy(x) | wt(man, guy)
  – ∀x cut(x) → slice(x) | wt(cut, slice)
  – ∀x pickle(x) → cucumber(x) | wt(pickle, cucumber)

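A minimal sketch of generating these lexical rules. The word vectors are toy values and f is taken to be the identity; the slides only specify wt(a, b) = f(cos(a, b)).

```python
import numpy as np
from itertools import product

# Toy distributional vectors; real ones come from a large corpus.
vectors = {
    "man": np.array([0.9, 0.1, 0.2]), "guy": np.array([0.8, 0.2, 0.1]),
    "cut": np.array([0.1, 0.9, 0.3]), "slice": np.array([0.2, 0.8, 0.4]),
    "pickle": np.array([0.3, 0.2, 0.9]), "cucumber": np.array([0.4, 0.1, 0.8]),
}

def cos(a, b):
    va, vb = vectors[a], vectors[b]
    return float(np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb)))

def wt(a, b):
    return cos(a, b)  # assuming f is the identity; f is left unspecified on the slide

premise_words = ["man", "cut", "pickle"]          # content words of S1
hypothesis_words = ["guy", "slice", "cucumber"]   # content words of S2

for a, b in product(premise_words, hypothesis_words):  # every pair (a, b), a in S1, b in S2
    print(f"forall x {a}(x) -> {b}(x) | {wt(a, b):.2f}")
```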

Distributional Phrase Rules

• Premise: “A boy is playing”
• Hypothesis: “A little boy is playing”
• Need rules for phrases
  – ∀x boy(x) → little(x) ^ boy(x) | wt(boy, "little boy")

• Compute vectors for phrases using vector addition [Mitchell & Lapata, 2010]
  – "little boy" = little + boy
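A small sketch of the phrase case, again with toy vectors: the phrase vector is built by addition, and the rule weight is its cosine similarity with the single word.

```python
import numpy as np

little = np.array([0.2, 0.7, 0.1])   # toy vectors for illustration
boy = np.array([0.9, 0.3, 0.2])
little_boy = little + boy            # "little boy" = little + boy  [Mitchell & Lapata, 2010]

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# forall x boy(x) -> little(x) ^ boy(x) | wt(boy, "little boy")
print("wt(boy, 'little boy') =", round(cosine(boy, little_boy), 2))
```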


Preliminary Results: RTE-1 (2005)

  System                               Accuracy
  Logic only [Bos & Markert, 2005]     52%
  Our System                           57%

Agenda

• Introduction
• Background: MLN
• RTE
• STS
• Future work and Conclusion


Semantic Textual Similarity (STS)

• Rate the semantic similarity of two sentences on a 0 to 5 scale

• Gold standards are averaged over multiple human judgments

• Evaluate by measuring correlation to human ratings

  S1                            S2                               Score
  A man is slicing a cucumber   A guy is cutting a cucumber      5
  A man is slicing a cucumber   A guy is cutting a zucchini      4
  A man is slicing a cucumber   A woman is cooking a zucchini    3
  A man is slicing a cucumber   A monkey is riding a bicycle     1
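Scoring uses the Pearson correlation between system output and the averaged human ratings; a quick illustration with the four example pairs above and made-up system scores.

```python
import numpy as np

gold = np.array([5.0, 4.0, 3.0, 1.0])     # averaged human ratings for the pairs above
system = np.array([4.6, 4.1, 2.4, 0.9])   # hypothetical system scores

r = np.corrcoef(gold, system)[0, 1]       # Pearson correlation coefficient
print(round(r, 3))
```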


Softening Conjunction for STS


• Logical conjunction requires satisfying all conjuncts to satisfy the clause, which is too strict for STS

• Hypothesis:
  – ∀x,y,z ( guy(x) ^ cut(y) ^ agent(y, x) ^ cucumber(z) ^ patient(y, z) → result() )

• Break the sentence into “micro-clauses”, then combine them using an “averaging combiner” [Natarajan et al., 2010]

• Becomes:
  – ∀x,y,z guy(x) ^ agent(y, x) → result()
  – ∀x,y,z cut(y) ^ agent(y, x) → result()
  – ∀x,y,z cut(y) ^ patient(y, z) → result()
  – ∀x,y,z cucumber(z) ^ patient(y, z) → result()
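A rough sketch of the softening step: the pairing of content predicates with role atoms follows the slide, while the final averaging is a simplification of the combiner of Natarajan et al. (2010); the per-clause probabilities are made up for illustration.

```python
# Micro-clauses: each content predicate is paired with the role atom linking it in,
# and each micro-clause implies result() on its own.
micro_clauses = [
    ("guy(x)", "agent(y,x)"),
    ("cut(y)", "agent(y,x)"),
    ("cut(y)", "patient(y,z)"),
    ("cucumber(z)", "patient(y,z)"),
]
for pred, role in micro_clauses:
    print(f"forall x,y,z {pred} ^ {role} -> result()")

# Toy per-clause probabilities of being supported by the other sentence,
# combined by averaging rather than requiring every conjunct to hold.
clause_probs = [0.9, 0.7, 0.8, 0.4]
print("similarity score ~", sum(clause_probs) / len(clause_probs))
```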

Preliminary Results: STS 2012


• Microsoft video description corpus
  – Sentence pairs given human 0–5 ratings
  – 1,500 pairs equally split into training/test

  System                                                   Pearson r
  Our System with no distributional rules [Logic only]     0.52
  Our System with lexical rules                            0.60
  Our System with lexical and phrase rules                 0.73
  Vector Addition [Distributional only]                    0.78
  Ensemble of our best score with vector addition          0.85
  Best system in STS 2012 (large ensemble)                 0.87

Agenda

• Introduction
• Background: MLN
• RTE
• STS
• Future work and Conclusion


Future Work

• Scale MLN inference to longer and more complex sentences

• Use multiple parses to reduce impact of parse errors

• Better Rule base
  – Vector space methods for asymmetric weights
    • wt(cucumber → vegetable) > wt(vegetable → cucumber)
  – Inference rules from existing paraphrase collections
  – More sophisticated phrase vectors

Conclusion


• Using MLN to represent semantics
• Combining both logical and distributional approaches
  – Deep semantics: represent sentences using logic
  – Robust system:
    • Probabilistic logic and soft inference rules
    • Wide coverage of distributional semantics

Thank You
