Montague Meets Markov: Deep Semantics with Probabilistic Logical Form


Islam Beltagy, Cuong Chau, Gemma Boleda,

Dan Garrette, Katrin Erk, Raymond Mooney

The University of Texas at Austin

[Title slide photos: Richard Montague, Andrey Markov]

Montague Meets Markov: Deep Semantics with Probabilistic Logical Form

Semantic Representations

• Formal Semantics
  – Uses first-order logic
  – Deep
  – Brittle

• Distributional Semantics
  – Statistical method
  – Robust
  – Shallow


• Goal: combine advantages of both logical and distributional semantics in one framework

Semantic Representations

• Combining both logical and distributional semantics
  – Represent meaning using a probabilistic logic (in contrast with standard first-order logic)
    • Markov Logic Network (MLN)
  – Generate soft inference rules
    • From distributional semantics

∀x hamster(x) → gerbil(x) | f(w)

Agenda

• Introduction
• Background: MLN
• RTE
• STS
• Future work and Conclusion


Markov Logic Networks [Richardson & Domingos, 2006]

• MLN: Soft FOL
  – Weighted rules

FOL rules                                        Rule weights
∀x Smokes(x) ⇒ Cancer(x)                         1.1
∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))      5.1


Markov Logic Networks [Richardson & Domingos, 2006]

• MLN: Template for constructing Markov networks
• Two constants: Anna (A) and Bob (B)

∀x Smokes(x) ⇒ Cancer(x)                         1.1
∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))      5.1

[Ground Markov network over Anna (A) and Bob (B): nodes Smokes(A), Smokes(B), Cancer(A), Cancer(B), Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B)]


Markov Logic Networks [Richardson & Domingos, 2006]

• Probability Mass Function (PMF)

• Inference: calculate probability of atoms
  – P(Cancer(Anna) | Friends(Anna,Bob), Smokes(Bob))

P(X = x) = (1/Z) exp( Σᵢ wᵢ nᵢ(x) )

  – x: a possible truth assignment
  – wᵢ: weight of formula i
  – nᵢ(x): number of true groundings of formula i in x
  – Z: normalization constant
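To make the PMF concrete, the following is a minimal sketch, assuming a toy two-constant domain with the example weights above. It is not the Alchemy implementation; it simply computes P(X = x) by brute-force enumeration of all truth assignments and answers the conditional query from this slide.

import itertools
import math

# Brute-force evaluation of the MLN probability mass function
# P(X = x) = (1/Z) exp( sum_i w_i n_i(x) ) for the Smokes/Cancer/Friends
# example with constants Anna and Bob. Weights follow the slides; everything
# else is an illustrative assumption, not Alchemy's implementation.

CONSTANTS = ["Anna", "Bob"]

# Ground atoms of the toy domain.
ATOMS = ([f"Smokes({c})" for c in CONSTANTS]
         + [f"Cancer({c})" for c in CONSTANTS]
         + [f"Friends({a},{b})" for a in CONSTANTS for b in CONSTANTS])

def n_smokes_implies_cancer(x):
    # Number of true groundings of: forall c. Smokes(c) => Cancer(c)
    return sum(1 for c in CONSTANTS
               if (not x[f"Smokes({c})"]) or x[f"Cancer({c})"])

def n_friends_smoke_alike(x):
    # Number of true groundings of: forall a,b. Friends(a,b) => (Smokes(a) <=> Smokes(b))
    return sum(1 for a in CONSTANTS for b in CONSTANTS
               if (not x[f"Friends({a},{b})"])
               or (x[f"Smokes({a})"] == x[f"Smokes({b})"]))

FORMULAS = [(1.1, n_smokes_implies_cancer),
            (5.1, n_friends_smoke_alike)]

def score(x):
    # Unnormalized log-probability: sum_i w_i * n_i(x)
    return sum(w * n(x) for w, n in FORMULAS)

def all_worlds():
    # Every possible truth assignment x over the ground atoms.
    return [dict(zip(ATOMS, values))
            for values in itertools.product([False, True], repeat=len(ATOMS))]

def conditional(query, evidence):
    # P(query | evidence) by summing P(X = x) over worlds consistent with the evidence.
    worlds = all_worlds()
    Z = sum(math.exp(score(x)) for x in worlds)  # normalization constant
    numerator = denominator = 0.0
    for x in worlds:
        if all(x[atom] == value for atom, value in evidence.items()):
            p = math.exp(score(x)) / Z
            denominator += p
            if x[query]:
                numerator += p
    return numerator / denominator

# The inference query from the slide:
print(conditional("Cancer(Anna)",
                  {"Friends(Anna,Bob)": True, "Smokes(Bob)": True}))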


Agenda

• Introduction
• Background: MLN
• RTE
• STS
• Future work and Conclusion


Recognizing Textual Entailment (RTE)

• Given two sentences, a premise and a hypothesis, does the first entail the second?

• e.g.
  – Premise: “A male gorilla escaped from his cage in Berlin zoo and sent terrified visitors running for cover, the zoo said yesterday.”
  – Hypothesis: “A gorilla escaped from his cage in a zoo in Germany.”
  – Entails: true


System Architecture

[System architecture diagram: Sent1 and Sent2 → BOXER → LF1 and LF2; Vector Space → Dist. Rule Constructor → Rule Base; LF1, LF2, and the Rule Base → ALCHEMY (MLN Inference) → result]

• BOXER [Bos et al., 2004]: maps sentences to logical form

• Distributional Rule constructor: generates relevant soft inference rules based on distributional similarity

• ALCHEMY: probabilistic MLN inference
• Result: degree of entailment

Sample Logical Forms

• Premise: “A man is cutting pickles”
  – ∃x,y,z ( man(x) ^ cut(y) ^ agent(y, x) ^ pickles(z) ^ patient(y, z) )

• Hypothesis: “A guy is slicing cucumber”
  – ∃x,y,z ( guy(x) ^ slice(y) ^ agent(y, x) ^ cucumber(z) ^ patient(y, z) )

• Hypothesis in the query form
  – analogy to negated hypothesis in standard theorem proving
  – ∀x,y,z ( guy(x) ^ slice(y) ^ agent(y, x) ^ cucumber(z) ^ patient(y, z) → result() )

• Query
  – result() [Degree of Entailment]


Distributional Lexical Rules

• For every pair of words (a, b) where a is in S1 and b is in S2, add a soft rule relating the two
  – ∀x a(x) → b(x) | wt(a, b)
  – wt(a, b) = f( cos(a, b) )

• Premise: “A man is cutting pickles”

• Hypothesis: “A guy is slicing cucumber”
  – ∀x man(x) → guy(x) | wt(man, guy)
  – ∀x cut(x) → slice(x) | wt(cut, slice)
  – ∀x pickle(x) → cucumber(x) | wt(pickle, cucumber)
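As an illustration of the distributional rule constructor, here is a minimal sketch. The toy word vectors and the linear scaling used for the weight function f are assumptions (the deck leaves the actual vector space and mapping from cosine to rule weight unspecified).

import numpy as np

# For every pair (a, b) with a from the premise and b from the hypothesis,
# emit a soft rule "forall x: a(x) -> b(x)" weighted by f(cos(a, b)).
# The vectors and the scaling below are illustrative, not the system's own.

VECTORS = {
    "man":      np.array([0.9, 0.1, 0.3]),
    "guy":      np.array([0.8, 0.2, 0.3]),
    "cut":      np.array([0.1, 0.9, 0.2]),
    "slice":    np.array([0.2, 0.8, 0.3]),
    "pickle":   np.array([0.3, 0.2, 0.9]),
    "cucumber": np.array([0.4, 0.1, 0.8]),
}

def cosine(a, b):
    va, vb = VECTORS[a], VECTORS[b]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

def wt(a, b, scale=5.0):
    # Hypothetical weight function f(cos(a, b)): a simple linear scaling.
    return scale * cosine(a, b)

def lexical_rules(premise_words, hypothesis_words):
    # One soft inference rule per (premise word, hypothesis word) pair.
    return [f"forall x: {a}(x) -> {b}(x)  |  {wt(a, b):.2f}"
            for a in premise_words for b in hypothesis_words]

for rule in lexical_rules(["man", "cut", "pickle"], ["guy", "slice", "cucumber"]):
    print(rule)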

Distributional Phrase Rules

• Premise: “A boy is playing”
• Hypothesis: “A little boy is playing”
• Need rules for phrases
  – ∀x boy(x) → little(x) ^ boy(x) | wt(boy, "little boy")

• Compute vectors for phrases using vector addition [Mitchell & Lapata, 2010]
  – "little boy" = little + boy
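A small sketch of the additive phrase composition, with made-up toy vectors standing in for real distributional vectors:

import numpy as np

# Phrase vector by addition, as in Mitchell & Lapata (2010): the vector for
# "little boy" is the sum of the vectors for "little" and "boy".
little = np.array([0.2, 0.7, 0.1])
boy = np.array([0.9, 0.1, 0.4])

little_boy = little + boy  # composed phrase vector for "little boy"

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# wt(boy, "little boy") would then be some function of this similarity.
print(cosine(boy, little_boy))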


Preliminary Results: RTE-1 (2005)

System                               Accuracy
Logic only [Bos & Markert, 2005]     52%
Our System                           57%

Agenda

• Introduction
• Background: MLN
• RTE
• STS
• Future work and Conclusion


Semantic Textual Similarity (STS)

• Rate the semantic similarity of two sentences on a 0 to 5 scale

• Gold standards are averaged over multiple human judgments

• Evaluate by measuring correlation to human ratings

S1                            S2                              score
A man is slicing a cucumber   A guy is cutting a cucumber     5
A man is slicing a cucumber   A guy is cutting a zucchini     4
A man is slicing a cucumber   A woman is cooking a zucchini   3
A man is slicing a cucumber   A monkey is riding a bicycle    1
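The evaluation metric itself is simply the Pearson correlation between system scores and the averaged gold ratings; a minimal sketch, using made-up placeholder numbers rather than the system's outputs:

import numpy as np

# STS evaluation: Pearson correlation between system similarity scores and
# the averaged human gold ratings (placeholder values for illustration).
gold = np.array([5.0, 4.0, 3.0, 1.0])
system = np.array([4.6, 3.8, 2.5, 0.9])

pearson_r = np.corrcoef(gold, system)[0, 1]
print(f"Pearson r = {pearson_r:.2f}")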


Softening Conjunction for STS


• Logical conjunction requires satisfying all conjuncts to satisfy the clause, which is too strict for STS

• Hypothesis:
  – ∀x,y,z ( guy(x) ^ cut(y) ^ agent(y, x) ^ cucumber(z) ^ patient(y, z) → result() )

• Break the sentence into “micro-clauses”, then combine them using an “averaging combiner” [Natarajan et al., 2010], as sketched after the list below

• Becomes:
  – ∀x,y,z guy(x) ^ agent(y, x) → result()
  – ∀x,y,z cut(y) ^ agent(y, x) → result()
  – ∀x,y,z cut(y) ^ patient(y, z) → result()
  – ∀x,y,z cucumber(z) ^ patient(y, z) → result()
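A minimal sketch of the softening step: split the hypothesis conjunction into micro-clauses and average their individual degrees of satisfaction, in the spirit of the averaging combiner of Natarajan et al. (2010). The per-clause probabilities at the end are placeholders standing in for MLN inference results.

# Micro-clauses for the hypothesis "A guy is cutting a cucumber".
HYPOTHESIS_MICRO_CLAUSES = [
    ("guy(x)",      "agent(y, x)"),
    ("cut(y)",      "agent(y, x)"),
    ("cut(y)",      "patient(y, z)"),
    ("cucumber(z)", "patient(y, z)"),
]

def micro_clause_rules(clauses):
    # One soft implication to result() per micro-clause, instead of one
    # conjunction over the whole sentence.
    return [f"forall x, y, z: {pred} ^ {role} -> result()"
            for pred, role in clauses]

def averaging_combiner(clause_probs):
    # Average the per-micro-clause probabilities of result().
    return sum(clause_probs) / len(clause_probs)

for rule in micro_clause_rules(HYPOTHESIS_MICRO_CLAUSES):
    print(rule)

# Assumed per-clause inference results for the premise "A man is cutting pickles":
print(averaging_combiner([0.9, 0.8, 0.8, 0.4]))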

Preliminary Results: STS 2012


• Microsoft video description corpus
  – Sentence pairs given human 0-5 ratings
  – 1,500 pairs equally split into training/test

System                                                   Pearson r
Our System with no distributional rules [Logic only]     0.52
Our System with lexical rules                            0.60
Our System with lexical and phrase rules                 0.73
Vector Addition [Distributional only]                    0.78
Ensemble our best score with vector addition             0.85
Best system in STS 2012 (large ensemble)                 0.87

Agenda

• Introduction
• Background: MLN
• RTE
• STS
• Future work and Conclusion


Future Work

• Scale MLN inference to longer and more complex sentences

• Use multiple parses to reduce impact of parse errors

• Better rule base
  – Vector space methods for asymmetric weights
    • wt(cucumber → vegetable) > wt(vegetable → cucumber)
  – Inference rules from existing paraphrase collections
  – More sophisticated phrase vectors

Conclusion


• Using MLN to represent semantics
• Combining both logical and distributional approaches
  – Deep semantics: represent sentences using logic
  – Robust system:
    • Probabilistic logic and soft inference rules
    • Wide coverage of distributional semantics

Thank You
