Montague Meets Markov: Deep Semantics with Probabilistic Logical Form

Islam Beltagy, Cuong Chau, Gemma Boleda, Dan Garrette, Katrin Erk, Raymond Mooney
The University of Texas at Austin

[Photos: Richard Montague and Andrey Markov]
Semantic Representations
• Formal semantics
  – Uses first-order logic
  – Deep
  – Brittle
• Distributional semantics
  – Statistical method
  – Robust
  – Shallow
• Goal: combine advantages of both logical and distributional semantics in one framework
Semantic Representations
• Combining both logical and distributional semantics
  – Represent meaning using a probabilistic logic (in contrast with standard first-order logic)
    • Markov Logic Network (MLN)
  – Generate soft inference rules
    • From distributional semantics
∀x hamster(x) → gerbil(x) | f(w)
Agenda
• Introduction
• Background: MLN
• RTE
• STS
• Future work and Conclusion
Markov Logic Networks [Richardson & Domingos, 2006]

• MLN: soft first-order logic
  – Weighted rules
  1.1  ∀x Smokes(x) ⇒ Cancer(x)
  5.1  ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))

  (FOL rules, each paired with a rule weight)
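For concreteness, a hedged sketch of how such weighted rules are commonly written for ALCHEMY in a .mln file; treat the exact syntax here as illustrative rather than definitive:

    // Predicate declarations
    Smokes(person)
    Cancer(person)
    Friends(person, person)

    // Weighted formulas: the weight precedes each first-order rule
    1.1  Smokes(x) => Cancer(x)
    5.1  Friends(x, y) => (Smokes(x) <=> Smokes(y))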
Markov Logic Networks [Richardson & Domingos, 2006]

  1.1  ∀x Smokes(x) ⇒ Cancer(x)
  5.1  ∀x,y Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))

[Figure: ground Markov network for constants A and B, with nodes Smokes(A), Smokes(B), Cancer(A), Cancer(B), Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B)]
• MLN: Template for constructing Markov networks
• Two constants: Anna (A) and Bob (B)
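A minimal Python sketch of the grounding step for this example (illustrative only, not ALCHEMY's implementation): instantiate every predicate with all constants to obtain the nodes of the ground network.

    from itertools import product

    constants = ["A", "B"]  # Anna and Bob
    unary, binary = ["Smokes", "Cancer"], ["Friends"]

    # Every ground atom becomes a node in the ground Markov network.
    atoms = [f"{p}({c})" for p in unary for c in constants]
    atoms += [f"{p}({x},{y})" for p in binary for x, y in product(constants, repeat=2)]
    # -> Smokes(A), Smokes(B), Cancer(A), Cancer(B), Friends(A,A), ..., Friends(B,B)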
Markov Logic Networks [Richardson & Domingos, 2006]

• Probability mass function (PMF) over possible worlds (a brute-force sketch follows):

    P(X = x) = (1/Z) exp( ∑_i w_i n_i(x) )

  where x is a possible truth assignment, w_i is the weight of formula i, n_i(x) is the number of true groundings of formula i in x, and Z is the normalization constant

• Inference: calculate the probability of atoms
  – P(Cancer(Anna) | Friends(Anna,Bob), Smokes(Bob))
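A hedged sketch of this PMF, computed by brute force over all truth assignments; this is only feasible for tiny domains, and all names are illustrative:

    import math
    from itertools import product

    def pmf(formulas, atoms):
        # formulas: list of (w_i, n_i) pairs, where n_i(world) counts the true
        # groundings of formula i in the world (a dict from atom to bool).
        worlds = [dict(zip(atoms, vals)) for vals in product([False, True], repeat=len(atoms))]
        scores = [math.exp(sum(w * n(world) for w, n in formulas)) for world in worlds]
        Z = sum(scores)  # normalization constant
        return [(world, s / Z) for world, s in zip(worlds, scores)]

    # Tiny example: one rule Smokes(A) => Cancer(A) with weight 1.1.
    atoms = ["Smokes(A)", "Cancer(A)"]
    formulas = [(1.1, lambda x: int((not x["Smokes(A)"]) or x["Cancer(A)"]))]
    dist = pmf(formulas, atoms)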
Recognizing Textual Entailment (RTE)
• Given two sentences, a premise and a hypothesis, does the first entail the second?
• e.g.:
  – Premise: “A male gorilla escaped from his cage in Berlin zoo and sent terrified visitors running for cover, the zoo said yesterday.”
  – Hypothesis: “A gorilla escaped from his cage in a zoo in Germany.”
  – Entails: true
System Architecture
[Figure: system pipeline — Sent1 and Sent2 are parsed by BOXER into logical forms LF1 and LF2; a vector space feeds the distributional rule constructor, which produces a rule base; LF1, LF2, and the rule base go to ALCHEMY for MLN inference, which outputs the result]
• BOXER [Bos et al., 2004]: maps sentences to logical form
• Distributional rule constructor: generates relevant soft inference rules based on distributional similarity
• ALCHEMY: probabilistic MLN inference
• Result: degree of entailment (a pipeline skeleton follows this list)
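A hedged end-to-end skeleton of this pipeline; all three helper functions are hypothetical placeholders for the corresponding components, not real APIs:

    def parse_with_boxer(sentence: str) -> str:
        raise NotImplementedError  # hypothetical wrapper: BOXER sentence -> logical form

    def build_distributional_rules(lf1: str, lf2: str, vectors: dict) -> list:
        raise NotImplementedError  # soft rules from distributional similarity

    def alchemy_infer(evidence: str, query_rule: str, rules: list) -> float:
        raise NotImplementedError  # ALCHEMY MLN inference: returns P(result())

    def entailment_score(sent1: str, sent2: str, vectors: dict) -> float:
        lf1, lf2 = parse_with_boxer(sent1), parse_with_boxer(sent2)
        rules = build_distributional_rules(lf1, lf2, vectors)
        return alchemy_infer(evidence=lf1, query_rule=lf2, rules=rules)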
Sample Logical Forms
• Premise: “A man is cutting pickles”
  – ∃x,y,z ( man(x) ^ cut(y) ^ agent(y, x) ^ pickles(z) ^ patient(y, z) )
• Hypothesis: “A guy is slicing cucumber”
  – ∃x,y,z ( guy(x) ^ slice(y) ^ agent(y, x) ^ cucumber(z) ^ patient(y, z) )
• Hypothesis in query form (analogous to the negated hypothesis in standard theorem proving; a sketch of building this form follows):
  – ∀x,y,z ( guy(x) ^ slice(y) ^ agent(y, x) ^ cucumber(z) ^ patient(y, z) → result() )
• Query: result() [degree of entailment]
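As a concrete illustration, a minimal sketch (a hypothetical helper, not the authors' code) that turns a hypothesis, given as a list of atoms, into the query form above:

    def hypothesis_to_query_rule(atoms, variables=("x", "y", "z")):
        # Universally quantified conjunction of atoms implying result().
        return f"forall {','.join(variables)} ( {' ^ '.join(atoms)} -> result() )"

    rule = hypothesis_to_query_rule(
        ["guy(x)", "slice(y)", "agent(y,x)", "cucumber(z)", "patient(y,z)"])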
Distributional Lexical Rules
• For every pair of words (a, b), where a is in S1 and b is in S2, add a soft rule relating the two (the weight computation is sketched below):
  – ∀x a(x) → b(x) | wt(a, b)
  – wt(a, b) = f( cos(a, b) )
• Premise: “A man is cutting pickles”
• Hypothesis: “A guy is slicing cucumber”
  – ∀x man(x) → guy(x) | wt(man, guy)
  – ∀x cut(x) → slice(x) | wt(cut, slice)
  – ∀x pickle(x) → cucumber(x) | wt(pickle, cucumber)
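A minimal sketch of the rule weight, assuming word vectors stored in a dict; the linear mapping f and the toy vectors are assumptions for illustration:

    import math

    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0

    def wt(a, b, vectors, scale=5.0):
        # f maps cosine similarity to an MLN weight; linear scaling assumed here.
        return scale * cos(vectors[a], vectors[b])

    vectors = {"man": [0.9, 0.1, 0.3], "guy": [0.8, 0.2, 0.3]}
    rule = f"forall x man(x) -> guy(x) | {wt('man', 'guy', vectors):.2f}"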
Distributional Phrase Rules
• Premise: “A boy is playing”
• Hypothesis: “A little boy is playing”
• Need rules for phrases:
  – ∀x boy(x) → little(x) ^ boy(x) | wt(boy, "little boy")
• Compute vectors for phrases using vector addition [Mitchell & Lapata, 2010] (sketched below):
  – "little boy" = little + boy
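Continuing the sketch above, an additive phrase vector is just the component-wise sum of the word vectors (illustrative code, not the authors'):

    def phrase_vector(words, vectors):
        # Additive composition [Mitchell & Lapata, 2010]: sum component-wise.
        return [sum(dims) for dims in zip(*(vectors[w] for w in words))]

    vectors = {"little": [0.2, 0.7, 0.1], "boy": [0.9, 0.1, 0.3]}
    vectors["little boy"] = phrase_vector(["little", "boy"], vectors)
    # wt(boy, "little boy") is then computed from cosine similarity as for word pairs.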
Preliminary Results: RTE-1 (2005)

  System                               Accuracy
  Logic only [Bos & Markert, 2005]     52%
  Our system                           57%
Semantic Textual Similarity (STS)
• Rate the semantic similarity of two sentences on a 0 to 5 scale
• Gold standards are averaged over multiple human judgments
• Evaluate by measuring correlation with human ratings (sketched after the table)

  S1                            S2                              Score
  A man is slicing a cucumber   A guy is cutting a cucumber     5
  A man is slicing a cucumber   A guy is cutting a zucchini     4
  A man is slicing a cucumber   A woman is cooking a zucchini   3
  A man is slicing a cucumber   A monkey is riding a bicycle    1
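A minimal sketch of this evaluation: Pearson correlation between system scores and gold ratings (toy numbers; statistics.correlation requires Python 3.10+):

    from statistics import correlation  # Pearson r

    gold   = [5.0, 4.0, 3.0, 1.0]   # averaged human ratings
    system = [4.8, 3.5, 2.9, 0.7]   # hypothetical system scores
    print(correlation(gold, system))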
Softening Conjunction for STS
• Logical conjunction requires satisfying all conjuncts to satisfy the clause, which is too strict for STS
• Hypothesis:
  – ∀x,y,z ( guy(x) ^ cut(y) ^ agent(y, x) ^ cucumber(z) ^ patient(y, z) → result() )
• Break the sentence into “micro-clauses”, then combine them using an “averaging combiner” [Natarajan et al., 2010] (a sketch follows this list)
• Becomes:
  – ∀x,y,z guy(x) ^ agent(y, x) → result()
  – ∀x,y,z cut(y) ^ agent(y, x) → result()
  – ∀x,y,z cut(y) ^ patient(y, z) → result()
  – ∀x,y,z cucumber(z) ^ patient(y, z) → result()
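A hedged sketch of the micro-clause split: pair each content atom with each role atom that shares a variable with it (an illustrative heuristic, not necessarily the authors' exact rule):

    def micro_clauses(content_atoms, role_atoms):
        # content_atoms: (atom, variable) pairs; role_atoms: (atom, variable-set) pairs.
        return [f"forall x,y,z {atom} ^ {role} -> result()"
                for atom, var in content_atoms
                for role, role_vars in role_atoms
                if var in role_vars]

    clauses = micro_clauses(
        [("guy(x)", "x"), ("cut(y)", "y"), ("cucumber(z)", "z")],
        [("agent(y,x)", {"x", "y"}), ("patient(y,z)", {"y", "z"})])
    # Reproduces the four micro-clauses listed above.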
Preliminary Results: STS 2012
• Microsoft video description corpus
  – Sentence pairs given human 0-5 ratings
  – 1,500 pairs equally split into training/test
  System                                                 Pearson r
  Our system with no distributional rules [logic only]   0.52
  Our system with lexical rules                          0.60
  Our system with lexical and phrase rules               0.73
  Vector addition [distributional only]                  0.78
  Ensemble of our best system with vector addition       0.85
  Best system in STS 2012 (large ensemble)               0.87
Future Work
• Scale MLN inference to longer and more complex sentences
• Use multiple parses to reduce impact of parse errors
• Better rule base
  – Vector-space methods for asymmetric weights
    • wt(cucumber → vegetable) > wt(vegetable → cucumber)
  – Inference rules from existing paraphrase collections
  – More sophisticated phrase vectors
Conclusion
• Using MLNs to represent semantics
• Combining both logical and distributional approaches
  – Deep semantics: represent sentences using logic
  – Robust system:
    • Probabilistic logic and soft inference rules
    • Wide coverage of distributional semantics
Thank You