TRANSCRIPT
COMBINING REPRESENTATION LEARNING AND LOGICAL RULE REASONING FOR KNOWLEDGE GRAPH INFERENCE
Yizhou Sun
Department of Computer Science, University of California, Los Angeles
[email protected]
August 15, 2021
Outline
• Introduction
•Bringing First-Order Logic into Uncertain KG Embedding
•UniKER: Integrating Horn Rule Reasoning into KGE
•Summary
1
Knowledge Graph
• What are knowledge graphs?
  • Multi-relational graph data (a.k.a. heterogeneous information networks)
  • Provide a structured representation of semantic relationships between real-world entities
2
A triple (h, r, t) represents a fact, ex: (Eiffel Tower, is located in, Paris)
Examples of KG
3
General-purpose KGs
Common-sense KGs & NLP
Bio & Medical KGs
Product Graphs & E-commerce
Applications of KGs
• Foundational to knowledge-driven AI systems
• Enable many downstream applications (NLP tasks, QA systems, etc.)
4
QA & Dialogue systems
Computational Biology
Natural Language Processing
Recommendation Systems
Knowledge Graphs
Reasoning over Knowledge Graph
• Knowledge graph reasoning aims at inferring missing knowledge from the existing facts.
5
Infer missing facts
(Figure: the running-example KG. Observed: isMarriedTo(Thomas Alva Edison, Mary Stilwell), isMarriedTo(Thomas Alva Edison, Mina Miller), isMarriedTo(Mary Stilwell, Mina Miller), liveIn(Mina Miller, USA), officialLang(USA, English). Inference adds missing facts such as liveIn and speakLang edges.)
Knowledge Graph Embedding
• Entities: low-dimensional vectors
• Relations: parametric algebraic operators
• Triples: representation-based score function
6
Summary of Existing Approaches
• Define a score function for a triple: f_r(h, t)
  • Based on the entity and relation representations
• Define a loss function to guide the training
  • E.g., an observed triple should score higher than a negative one
7
Source: Sun et al., RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space (ICLR’19)
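This recipe can be sketched in a few lines. The DistMult trilinear score with a margin loss below is a minimal illustration; the dimension, the random embeddings, and the hyperparameters are my assumptions, not the slide's exact formulation.

```python
import numpy as np

def distmult_score(h, r, t):
    """DistMult: score(h, r, t) = <h, r, t> = sum_i h_i * r_i * t_i."""
    return float(np.sum(h * r * t))

def margin_loss(pos_score, neg_score, margin=1.0):
    """Train so an observed triple scores at least `margin` above a corrupted one."""
    return max(0.0, margin - pos_score + neg_score)

dim = 8
rng = np.random.default_rng(0)
h, r, t, t_neg = rng.normal(size=(4, dim))  # toy entity/relation embeddings
loss = margin_loss(distmult_score(h, r, t), distmult_score(h, r, t_neg))
```

In practice the negative triple (h, r, t_neg) is produced by corrupting the head or tail of an observed triple, and the loss is minimized over all observed triples.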
Pros and Cons of KGE
• Knowledge Graph Embedding
  • Shows good scalability as well as robustness.
  • Fails to capture high-order dependencies between entities and relations.
  • Cannot handle cold-start entities.
8
KGE-based Inference
(Figure: the running-example KG; embedding-based inference cannot derive the multi-hop speakLang fact, marked ×.)
Logical Rule-based KG reasoning
• Find the truth values of triples that maximize the satisfaction of the rules
9
Logical Rule Reasoning
speakLanguage(Person, Language) ⇐ liveIn(Person, Country) ∧
officialLanguage (Country, Language)
(Figure: applying the rule to the example KG derives speakLang(Mina Miller, English) from liveIn(Mina Miller, USA) and officialLang(USA, English).)
Pros and Cons of Logical Rule-based Reasoning
• Logical Rule-based Reasoning
  • Good at capturing high-order dependencies, with good interpretability.
  • Unable to handle noisy data, and suffers from high computational complexity.
  • Coverage is low.
10
Combine Both Worlds: 1 + 1 > 2!
11
(Figure: the running-example KG when both worlds are combined: rule reasoning derives the speakLang facts, while embeddings fill in remaining missing links and reject the incorrect triple marked ×.)
Outline
• Introduction
•Bringing First-Order Logic into Uncertain KG Embedding
•UniKER: Integrating Horn Rule Reasoning into KGE
•Summary
12
The First Attempt
•Chen et al., "Embedding Uncertain Knowledge Graphs," AAAI’19
13
Two Types of Errors in KG
• False positive: an observed triple is wrong
  • E.g., (Obama, is_born_in, Kenya)
• False negative: a true fact is missing
  • E.g., (Eiffel Tower, is located in, France)
14
Handling Uncertainty in Triples
• False positive errors can be alleviated by introducing uncertainty
  • E.g., (Obama, is_born_in, Kenya): 0.01
• Fit f_r(h, t) to the uncertainty scores
15
(Figure: a fragment of an uncertain KG over concepts such as Sing, Church, Wedding, Pray, Choir, Entertainment, Echo, People_Complaining, Song, and Rehearsing_Room, with weighted triples, e.g., usedFor: 0.6, causes: 0.3, uses: 1.0, uses: 0.3.)
From score function to uncertainty score
• Given a triple l = (h, r, t) with uncertainty score s_l
  • Transform f_r(h, t) into a score in the range [0, 1]
  • E.g., for the DistMult score function: φ(f_r(h, t)) → s_l (the ground-truth confidence)
  • Where φ(·) can be defined as:
    • Logistic function: UKGE(logi)
    • Bounded rectifier: UKGE(rect)
16
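A minimal sketch of the two φ(·) choices named above, mapping a raw score f_r(h, t) into a [0, 1] confidence. The scalars w and b stand in for learnable parameters; their names and defaults are illustrative assumptions, not values from the paper.

```python
import math

def phi_logistic(score, w=1.0, b=0.0):
    """UKGE(logi): logistic squashing of the raw score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(w * score + b)))

def phi_rect(score, w=1.0, b=0.0):
    """UKGE(rect): bounded rectifier, min(max(w * score + b, 0), 1)."""
    return min(max(w * score + b, 0.0), 1.0)
```

The logistic variant is smooth everywhere, while the bounded rectifier is piecewise linear and can saturate exactly at 0 or 1.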
Handling Missing Facts
• Are unseen triples still needed?
  • Yes, negative triples are still data points!
• Can we treat them as false, i.e., s_l = 0, if triple l is unseen?
  • No, we would make too many mistakes!
  • The true probability of an unseen triple could be higher than that of an observed triple with low confidence
17
Bringing Logic Rules
• What are logic rules?
  • Logic rule (template):
    (A, synonym, B) ∧ (B, synonym, C) → (A, synonym, C)
  • Ground rule (instance):
    (college, synonym, university) ∧ (university, synonym, institute) → (college, synonym, institute)
• Why are they helpful?
  • They help us infer the scores of unseen triples
18
Probabilistic Soft Logic
• Quantify a ground rule using PSL
  • Łukasiewicz t-norm: from Boolean logic to soft logic
  • Probability of a ground rule γ ≡ γ_body → γ_head:
    p_γ = I(¬γ_body ∨ γ_head) = min{1, 1 − I(γ_body) + I(γ_head)}
  • Distance to satisfaction: d_γ = 1 − p_γ = max{0, I(γ_body) − I(γ_head)}
19
More publications on PSL: https://psl.linqs.org/
The Goal: Minimize Distance to Satisfaction
•Example: Consider the following ground rule
20
• (college, synonym, university) ∧ (university, synonym, institute) → (college, synonym, institute)
  • l1 = (college, synonym, university), confidence 0.99
  • l2 = (university, synonym, institute), confidence 0.86
  • l3 = (college, synonym, institute), confidence ?
• Recall, p_γ = min{1, 1 − I(γ_body) + I(γ_head)}
• Say our embedding model predicts l3 as 0.65. How good is this prediction?
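The example can be checked numerically with the Łukasiewicz operators. The values 0.99, 0.86, and 0.65 come from this slide; the helper-function names are mine.

```python
def lukasiewicz_and(a, b):
    """Soft conjunction: I(a AND b) = max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)

def rule_satisfaction(body, head):
    """p = I(NOT body OR head) = min(1, 1 - body + head)."""
    return min(1.0, 1.0 - body + head)

body = lukasiewicz_and(0.99, 0.86)        # truth of the rule body, ~0.85
dist = 1.0 - rule_satisfaction(body, 0.65)
print(body, dist)  # body ~0.85, distance to satisfaction ~0.20
```

So predicting l3 as 0.65 leaves a distance to satisfaction of about 0.20; a prediction of 0.85 or higher would fully satisfy this ground rule.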
The New Embedding Model
• For observed triples, force their scores (given by an embedding-based confidence function) to be close to the ground-truth scores
• For unseen triples, minimize the distance to satisfaction of the ground rules γ that triple l is involved in
21
Experiments
•Datasets
•Logic Rules
22
(A, relatedto, B) ∧ (B, relatedto, C) → (A, relatedto, C)
(A, causes, B) ∧ (B, causes, C) → (A, causes, C)
(A, competeswith, B) ∧ (B, competeswith, C) → (A, competeswith, C)
(A, athletePlaysForTeam, B) ∧ (B, teamPlaysSports, C) → (A, athletePlaysSports, C)
(A, binding, B) ∧ (B, binding, C) → (A, binding, C)
Baselines
• Deterministic KG embedding models, which do not model confidence scores explicitly
  • TransE [Bordes et al. 2013]
  • DistMult [Yang et al. 2015]
  • ComplEx [Trouillon et al. 2016]
• Uncertain graph embedding, which only provides node embeddings
  • URGE [Hu et al. 2017]
• Two simplified versions of our model
  • Without negative sampling (UKGE_n-): can we just ignore the negative links during training?
  • Without PSL (UKGE_p-): is simply treating unseen triples as 0 a good strategy?
23
Relation Fact Confidence Score Prediction
• Given an unseen triple (h, r, t), predict its confidence
• Metrics: MSE and MAE (×10⁻²)
24
Relation Fact Ranking
•Given a query (h, r, ?t), rank all entities in our vocabulary as tail candidates
•Metrics: normalized Discounted Cumulative Gain (nDCG) (linear gain and exp gain)
25
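The ranking metric above can be sketched generically as follows, covering both variants (linear gain uses the confidence itself; exponential gain uses 2^confidence − 1). This is illustrative code, not the paper's evaluation script.

```python
import math

def dcg(gains):
    """Discounted cumulative gain: position i is discounted by log2(i + 2)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(scores_in_ranked_order, exp_gain=False):
    """nDCG of a ranking, given ground-truth confidences in ranked order."""
    gain = (lambda s: 2 ** s - 1) if exp_gain else (lambda s: s)
    gains = [gain(s) for s in scores_in_ranked_order]
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

A perfect ranking (tail candidates sorted by true confidence) yields nDCG = 1; the exponential gain weights high-confidence facts more heavily than the linear one.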
Relation Fact Ranking – Case Study
26
(Table: ground truth as (Entity, Score) vs. predictions as (Entity, Predicted Score, True Score).)
Outline
• Introduction
•Bringing First-Order Logic into Uncertain KG Embedding
•UniKER: Integrating Horn Rule Reasoning into KGE
•Summary
27
The Second Attempt
•Cheng et al., UniKER: A Unified Framework for Combining Embedding and Horn Rules for Knowledge Graph Inference, In Submission
28
Existing Literature on Combining Both Worlds
• Probabilistic logic is widely used to integrate both worlds
• PSL-based regularization in the embedding loss
  • Leverages Probabilistic Soft Logic (PSL) [7] for satisfaction-loss calculation
  • Treats logical rules as additional regularization to embedding models, integrating the satisfaction loss of ground rules into the original embedding loss
  • Limitation: only utilizes a sampled set of rule instances
• Embedding-based variational inference for MLN
  • Extends Markov Logic Networks (MLN) [8]
  • Leverages graph embeddings to define a variational distribution over all possible hidden triples for variational inference of the MLN
  • Limitation: efficiency issues; sampling is required
29
Combining Both Worlds
30
Categories | Methods | Interactive | Exact Logical Inference
PSL-based Regularization | KALE [1] | × | ×
PSL-based Regularization | RUGE [2] | √ | ×
PSL-based Regularization | Rocktäschel et al. [3] | × | ×
Embedding-based Variational Inference for MLN | pLogicNet [4] | √ | ×
Embedding-based Variational Inference for MLN | ExpressGNN [5] | √ | ×
Embedding-based Variational Inference for MLN | pGAT [6] | √ | ×
Our Proposed Work: UniKER for Horn Rules
• Idea 1: use forward chaining to conduct exact inference
• Idea 2: combine embedding and logical rules in an iterative manner
• Idea 3: remove potentially incorrect triples during learning to ensure robustness
31
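The three ideas can be read as one iterative step, repeated until convergence. The sketch below is a runnable toy: the function names, thresholds, and the stand-in `infer` and `confidence` callables are my illustrative assumptions, not the paper's actual API.

```python
def uniker_iteration(facts, infer, confidence,
                     drop_thresh=0.1, add_candidates=(), add_thresh=0.9):
    """One round of the combined loop over a set of (relation, head, tail) triples."""
    # Idea 1: exact logical inference (e.g., forward chaining) adds new facts.
    facts = set(facts) | set(infer(facts))
    # Idea 3: an embedding trained on the enlarged KG scores each triple;
    # drop triples the embedding considers likely noise.
    facts = {t for t in facts if confidence(t) >= drop_thresh}
    # Include high-confidence hidden triples so the next round of chaining
    # can fire more rules; Idea 2 is to repeat this whole step iteratively.
    facts |= {t for t in add_candidates if confidence(t) >= add_thresh}
    return facts
```

Each round thus lets logical inference enlarge the training data for the embedding, and the embedding in turn denoise and extend the KG that the rules reason over.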
Traditional Logical Inference: a MAX-SAT Problem (NP-complete)
32
Knowledge Graph
• Entities: Country (USA); Person (Thomas Alva Edison, Mary Stilwell, Mina Miller); Language (English)
• Predicates: isMarriedTo, liveIn, isParentOf, isSiblingOf, officialLanguage, speakLanguage
• Observed facts: isMarriedTo(Thomas Alva Edison, Mary Stilwell), isMarriedTo(Thomas Alva Edison, Mina Miller), isMarriedTo(Mary Stilwell, Mina Miller), liveIn(Mina Miller, USA), officialLanguage(USA, English)
Definite Horn rule: speakLanguage(Person, Language) ⇐ liveIn(Person, Country) ∧ officialLanguage(Country, Language)
All ground predicates: liveIn(Thomas Alva Edison, USA): T; ...; liveIn(Mary Stilwell, USA): ?
All ground rules: speakLanguage(Thomas Alva Edison, English) ⇐ liveIn(Thomas Alva Edison, USA) ∧ officialLanguage(USA, English); speakLanguage(Mary Stilwell, English) ⇐ liveIn(Mary Stilwell, USA) ∧ officialLanguage(USA, English); ...
A SAT solver over all ground rules yields the new fact speakLanguage(Mina Miller, English), but the problem is NP-complete.
Forward Chaining for Horn Rules: Exact and Fast
33
Definite Horn rule: speakLanguage(Person, Language) ⇐ liveIn(Person, Country) ∧ officialLanguage(Country, Language)
Observed facts: isMarriedTo(Thomas Alva Edison, Mary Stilwell), isMarriedTo(Thomas Alva Edison, Mina Miller), isMarriedTo(Mary Stilwell, Mina Miller), liveIn(Mina Miller, USA), officialLanguage(USA, English)
Forward chaining binds Person = Mina Miller, Country = USA, Language = English, yielding the new fact speakLanguage(Mina Miller, English)
Forward chaining involves only a small subset of active ground predicates/rules
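Forward chaining for two-atom rules of this shape can be sketched as a fixpoint loop over the fact set. The rule encoding (a pair of body predicates plus a head predicate) and all names below are illustrative assumptions, matching the slide's running example.

```python
def forward_chain(facts, rules):
    """Repeatedly fire rules of the form head(X, Z) <= p1(X, Y) AND p2(Y, Z)
    until no new fact is derived (a fixpoint)."""
    facts = set(facts)
    while True:
        new = set()
        for (p1, p2), head in rules:
            for (r1, x, y) in facts:
                if r1 != p1:
                    continue
                for (r2, y2, z) in facts:
                    if r2 == p2 and y2 == y and (head, x, z) not in facts:
                        new.add((head, x, z))
        if not new:
            return facts
        facts |= new

facts = {("isMarriedTo", "Thomas Alva Edison", "Mina Miller"),
         ("liveIn", "Mina Miller", "USA"),
         ("officialLanguage", "USA", "English")}
rules = [(("liveIn", "officialLanguage"), "speakLanguage")]
closure = forward_chain(facts, rules)
# derives ("speakLanguage", "Mina Miller", "English")
```

Because each pass only scans facts that match a rule's body predicates, the work stays proportional to the active ground predicates/rules rather than to all possible groundings.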
Iterative Mutual Enhancement
• Enhance KGE via logical inference
  • Update the KG via forward chaining-based logical reasoning
• Enhance logical inference via KGE
  • Exclude potentially incorrect triples
  • Include potentially useful hidden triples
34
Update KG via Forward Chaining-based Logical Reasoning
35
(Figure: the logical rule-based reasoning side applies the definite Horn rule speakLanguage(Person, Language) ⇐ liveIn(Person, Country) ∧ officialLanguage(Country, Language) to the observed facts liveIn(Mina Miller, USA) and officialLanguage(USA, English), inferring the new fact speakLanguage(Mina Miller, English), which is added to the KG used for embedding learning.)
Iterative Mutual Enhancement
• Enhance KGE via logical inference
  • Update the KG via forward chaining-based logical reasoning
• Enhance logical inference via KGE
  • Exclude potentially incorrect triples
  • Include potentially useful hidden triples
36
Excluding potentially incorrect triples
(Figure: the learned embedding scores the triples in the KG; a low-scoring, likely incorrect triple (marked ×) is removed, denoising the KG before the next round of logical inference.)
37
Including potentially useful hidden triples
(Figure: forward chaining with the Horn rule speakLanguage(Person, Language) ⇐ liveIn(Person, Country) ∧ officialLanguage(Country, Language) needs liveIn(Mary Stilwell, USA), which is not in the KG (marked ?), while officialLanguage(USA, English) is observed (marked √).)
38
(Figure: the learned embedding predicts the hidden triple liveIn(Mary Stilwell, USA) with high confidence, so it is treated as true (√); forward chaining can then fire the rule and derive the corresponding new speakLanguage fact.)
39
Experimental Results
• KG completion task
Experimental Results
• A few iterations are enough
41
Robust to Noise
42
• Constructed a noisy dataset in which noisy triples amount to 40% of the original data
Efficient
43
• Evaluated the scalability of forward chaining against a number of SOTA inference algorithms for MLNs
Outline
• Introduction
•Bringing First-Order Logic into Uncertain KG Embedding
•UniKER: Integrating Horn Rule Reasoning into KGE
•Summary
46
Take Away
• Two methodologies in KG inference
  • Embedding-based approaches
  • Logical rule-based reasoning
• Combining the two worlds is the promising direction
  • Embeddings can handle the noise and uncertainty of KGs
  • Logical rules provide higher-order dependency constraints among entities and relations
• Different ways of combination
  • UniKER is the best solution when the logical rules are confined to Horn rules
47
References
• [1] Guo S, Wang Q, Wang L, et al. Jointly embedding knowledge graphs and logical rules. EMNLP 2016: 192-202.
• [2] Guo S, Wang Q, Wang L, et al. Knowledge graph embedding with iterative guidance from soft rules. AAAI 2018, 32(1).
• [3] Rocktäschel T, Singh S, Riedel S. Injecting logical background knowledge into embeddings for relation extraction. NAACL-HLT 2015: 1119-1129.
• [4] Qu M, Tang J. Probabilistic logic neural networks for reasoning. arXiv preprint arXiv:1906.08495, 2019.
• [5] Zhang Y, Chen X, Yang Y, et al. Can graph neural networks help logic reasoning? arXiv preprint arXiv:1906.02111, 2019.
• [6] Harsha Vardhan L V, Jia G, Kok S. Probabilistic logic graph attention networks for reasoning. Companion Proceedings of the Web Conference 2020: 669-673.
• [7] Kimmig A, Bach S, Broecheler M, et al. A short introduction to probabilistic soft logic. NIPS Workshop on Probabilistic Programming: Foundations and Applications, 2012: 1-4.
• [8] Richardson M, Domingos P. Markov logic networks. Machine Learning, 2006, 62(1-2): 107-136.
• [9] Ren H, Hu W, Leskovec J. Query2box: Reasoning over knowledge graphs in vector space using box embeddings. ICLR 2020.
• [10] Ren H, Leskovec J. Beta embeddings for multi-hop logical reasoning in knowledge graphs. NeurIPS 2020.
• [11] Hamilton W L, Bajaj P, Zitnik M, Jurafsky D, Leskovec J. Embedding logical queries on knowledge graphs. NeurIPS 2018.
48
Q & A
• Thanks to my collaborators: Shirley Chen, Vivian Cheng, Junheng Hao, Muhao Chen, Wei Wang, Carlo Zaniolo
49