TRANSCRIPT
COMBINING REPRESENTATION LEARNING AND LOGICAL RULE REASONING FOR KNOWLEDGE GRAPH INFERENCE
Yizhou Sun
Department of Computer Science, University of California, Los Angeles
[email protected]
August 15, 2021
Outline
• Introduction
•Bringing First-Order Logic into Uncertain KG Embedding
•UniKER: Integrating Horn Rule Reasoning into KGE
•Summary
1
Knowledge Graph
• What are knowledge graphs?
  • Multi-relational graph data (a.k.a. heterogeneous information networks)
  • Provide a structured representation of semantic relationships between real-world entities
2
A triple (h, r, t) represents a fact, ex: (Eiffel Tower, is located in, Paris)
Examples of KG
3
General-purpose KGs
Common-sense KGs & NLP
Bio & Medical KGs
Product Graphs & E-commerce
Applications of KGs
• Foundational to knowledge-driven AI systems
• Enable many downstream applications (NLP tasks, QA systems, etc.)
4
QA & Dialogue systems
Computational Biology
Natural Language Processing
Recommendation Systems
Knowledge Graphs
Reasoning over Knowledge Graph
• Knowledge graph reasoning aims at inferring missing knowledge from the existing facts.
5
Infer missing facts
(Figure: the running-example KG. Observed: isMarriedTo(Thomas Alva Edison, Mary Stilwell), isMarriedTo(Thomas Alva Edison, Mina Miller), isMarriedTo(Mary Stilwell, Mina Miller), liveIn(Mina Miller, USA), officialLang(USA, English). Inference adds missing facts such as liveIn and speakLang edges.)
Knowledge Graph Embedding
• Entities: low-dimensional vectors
• Relations: parametric algebraic operators
• Triples: representation-based score function
6
Summary of Existing Approaches
• Define a score function for a triple: f_r(h, t)
  • Based on the entity and relation representations
• Define a loss function to guide the training
  • E.g., an observed triple should score higher than a negative one
7
Source: Sun et al., RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space (ICLR’19)
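This recipe can be sketched in a few lines. The DistMult trilinear score with a margin loss below is a minimal illustration; the dimension, the random embeddings, and the hyperparameters are my assumptions, not the slide's exact formulation.

```python
import numpy as np

def distmult_score(h, r, t):
    """DistMult: score(h, r, t) = <h, r, t> = sum_i h_i * r_i * t_i."""
    return float(np.sum(h * r * t))

def margin_loss(pos_score, neg_score, margin=1.0):
    """Train so an observed triple scores at least `margin` above a corrupted one."""
    return max(0.0, margin - pos_score + neg_score)

dim = 8
rng = np.random.default_rng(0)
h, r, t, t_neg = rng.normal(size=(4, dim))  # toy entity/relation embeddings
loss = margin_loss(distmult_score(h, r, t), distmult_score(h, r, t_neg))
```

In practice the negative triple (h, r, t_neg) is produced by corrupting the head or tail of an observed triple, and the loss is minimized over all observed triples.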
Pros and Cons of KGE
• Knowledge Graph Embedding
  • Shows good scalability as well as robustness.
  • Fails to capture high-order dependencies between entities and relations.
  • Cannot handle cold-start entities.
8
KGE-based Inference
(Figure: the running-example KG; embedding-based inference cannot derive the multi-hop speakLang fact, marked ×.)
Logical Rule-based KG reasoning
• Find the truth values of triples that maximize the satisfaction of the rules
9
Logical Rule Reasoning
speakLanguage(Person, Language) ⇐ liveIn(Person, Country) ∧
officialLanguage (Country, Language)
(Figure: applying the rule to the example KG derives speakLang(Mina Miller, English) from liveIn(Mina Miller, USA) and officialLang(USA, English).)
Pros and Cons of Logical Rule-based Reasoning
• Logical Rule-based Reasoning
  • Good at capturing high-order dependencies, with good interpretability.
  • Unable to handle noisy data, and suffers from high computational complexity.
  • Coverage is low.
10
Combine Both Worlds: 1 + 1 > 2!
11
(Figure: the running-example KG when both worlds are combined: rule reasoning derives the speakLang facts, while embeddings fill in remaining missing links and reject the incorrect triple marked ×.)
Outline
• Introduction
•Bringing First-Order Logic into Uncertain KG Embedding
•UniKER: Integrating Horn Rule Reasoning into KGE
•Summary
12
The First Attempt
•Chen et al., "Embedding Uncertain Knowledge Graphs," AAAI’19
13
Two Types of Errors in KG
• False positive: an observed triple is wrong
  • E.g., (Obama, is_born_in, Kenya)
• False negative: a true fact is missing
  • E.g., (Eiffel Tower, is located in, France)
14
Handling Uncertainty in Triples
• False positive errors can be alleviated by introducing uncertainty
  • E.g., (Obama, is_born_in, Kenya): 0.01
• Fit f_r(h, t) to the uncertainty scores
15
(Figure: a fragment of an uncertain KG over concepts such as Sing, Church, Wedding, Pray, Choir, Entertainment, Echo, People_Complaining, Song, and Rehearsing_Room, with weighted triples, e.g., usedFor: 0.6, causes: 0.3, uses: 1.0, uses: 0.3.)
From score function to uncertainty score
• Given a triple l = (h, r, t) with uncertainty score s_l
  • Transform f_r(h, t) into a score in the range [0, 1]
  • E.g., for the DistMult score function: φ(f_r(h, t)) → s_l (the ground-truth confidence)
  • Where φ(·) can be defined as:
    • Logistic function: UKGE(logi)
    • Bounded rectifier: UKGE(rect)
16
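A minimal sketch of the two φ(·) choices named above, mapping a raw score f_r(h, t) into a [0, 1] confidence. The scalars w and b stand in for learnable parameters; their names and defaults are illustrative assumptions, not values from the paper.

```python
import math

def phi_logistic(score, w=1.0, b=0.0):
    """UKGE(logi): logistic squashing of the raw score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(w * score + b)))

def phi_rect(score, w=1.0, b=0.0):
    """UKGE(rect): bounded rectifier, min(max(w * score + b, 0), 1)."""
    return min(max(w * score + b, 0.0), 1.0)
```

The logistic variant is smooth everywhere, while the bounded rectifier is piecewise linear and can saturate exactly at 0 or 1.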
Handling Missing Facts
• Are unseen triples still needed?
  • Yes, negative triples are still data points!
• Can we treat them as false, i.e., s_l = 0, if triple l is unseen?
  • No, we would make too many mistakes!
  • The true probability of an unseen triple could be higher than that of an observed triple with low confidence
17
Bringing Logic Rules
• What are logic rules?
  • Logic rule (template):
    (A, synonym, B) ∧ (B, synonym, C) → (A, synonym, C)
  • Ground rule (instance):
    (college, synonym, university) ∧ (university, synonym, institute) → (college, synonym, institute)
• Why are they helpful?
  • They help us infer the scores of unseen triples
18
Probabilistic Soft Logic
• Quantify a ground rule using PSL
  • Łukasiewicz t-norm: from Boolean logic to soft logic
  • Probability of a ground rule γ ≡ γ_body → γ_head:
    p_γ = I(¬γ_body ∨ γ_head) = min{1, 1 − I(γ_body) + I(γ_head)}
  • Distance to satisfaction: d_γ = 1 − p_γ = max{0, I(γ_body) − I(γ_head)}
19
More publications on PSL: https://psl.linqs.org/
The Goal: Minimize Distance to Satisfaction
•Example: Consider the following ground rule
20
• (college, synonym, university) ∧ (university, synonym, institute) → (college, synonym, institute)
  • l1 = (college, synonym, university), confidence 0.99
  • l2 = (university, synonym, institute), confidence 0.86
  • l3 = (college, synonym, institute), confidence ?
• Recall, p_γ = min{1, 1 − I(γ_body) + I(γ_head)}
• Say our embedding model predicts l3 as 0.65. How good is this prediction?
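The example can be checked numerically with the Łukasiewicz operators. The values 0.99, 0.86, and 0.65 come from this slide; the helper-function names are mine.

```python
def lukasiewicz_and(a, b):
    """Soft conjunction: I(a AND b) = max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)

def rule_satisfaction(body, head):
    """p = I(NOT body OR head) = min(1, 1 - body + head)."""
    return min(1.0, 1.0 - body + head)

body = lukasiewicz_and(0.99, 0.86)        # truth of the rule body, ~0.85
dist = 1.0 - rule_satisfaction(body, 0.65)
print(body, dist)  # body ~0.85, distance to satisfaction ~0.20
```

So predicting l3 as 0.65 leaves a distance to satisfaction of about 0.20; a prediction of 0.85 or higher would fully satisfy this ground rule.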
The New Embedding Model
• For observed triples, force their scores (given by an embedding-based confidence function) to be close to the ground-truth scores
• For unseen triples, minimize the distance to satisfaction of the ground rules γ that triple l is involved in
21
Experiments
•Datasets
•Logic Rules
22
(A, relatedto, B) ∧ (B, relatedto, C) → (A, relatedto, C)
(A, causes, B) ∧ (B, causes, C) → (A, causes, C)
(A, competeswith, B) ∧ (B, competeswith, C) → (A, competeswith, C)
(A, athletePlaysForTeam, B) ∧ (B, teamPlaysSports, C) → (A, athletePlaysSports, C)
(A, binding, B) ∧ (B, binding, C) → (A, binding, C)
Baselines
• Deterministic KG embedding models, which do not model confidence scores explicitly
  • TransE [Bordes et al. 2013]
  • DistMult [Yang et al. 2015]
  • ComplEx [Trouillon et al. 2016]
• Uncertain graph embedding, which only provides node embeddings
  • URGE [Hu et al. 2017]
• Two simplified versions of our model
  • Without negative sampling (UKGE_n-): can we just ignore the negative links during training?
  • Without PSL (UKGE_p-): is simply treating unseen triples as 0 a good strategy?
23
Relation Fact Confidence Score Prediction
• Given an unseen triple (h, r, t), predict its confidence
• Metrics: MSE and MAE (×10⁻²)
24
Relation Fact Ranking
•Given a query (h, r, ?t), rank all entities in our vocabulary as tail candidates
•Metrics: normalized Discounted Cumulative Gain (nDCG) (linear gain and exp gain)
25
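The ranking metric above can be sketched generically as follows, covering both variants (linear gain uses the confidence itself; exponential gain uses 2^confidence − 1). This is illustrative code, not the paper's evaluation script.

```python
import math

def dcg(gains):
    """Discounted cumulative gain: position i is discounted by log2(i + 2)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(scores_in_ranked_order, exp_gain=False):
    """nDCG of a ranking, given ground-truth confidences in ranked order."""
    gain = (lambda s: 2 ** s - 1) if exp_gain else (lambda s: s)
    gains = [gain(s) for s in scores_in_ranked_order]
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

A perfect ranking (tail candidates sorted by true confidence) yields nDCG = 1; the exponential gain weights high-confidence facts more heavily than the linear one.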
Relation Fact Ranking – Case Study
26
(Table: ground truth as (Entity, Score) vs. predictions as (Entity, Predicted Score, True Score).)
Outline
• Introduction
•Bringing First-Order Logic into Uncertain KG Embedding
•UniKER: Integrating Horn Rule Reasoning into KGE
•Summary
27
The Second Attempt
•Cheng et al., UniKER: A Unified Framework for Combining Embedding and Horn Rules for Knowledge Graph Inference, In Submission
28
Existing Literature on Combining Both Worlds
• Probabilistic logic is widely used to integrate both worlds
• PSL-based regularization in the embedding loss
  • Leverages Probabilistic Soft Logic (PSL) [7] for satisfaction-loss calculation
  • Treats logical rules as additional regularization to embedding models, integrating the satisfaction loss of ground rules into the original embedding loss
  • Limitation: only utilizes a sampled set of rule instances
• Embedding-based variational inference for MLN
  • Extends Markov Logic Networks (MLN) [8]
  • Leverages graph embeddings to define a variational distribution over all possible hidden triples for variational inference of the MLN
  • Limitation: efficiency issues; sampling is required
29
Combining Both Worlds
30
Categories | Methods | Interactive | Exact Logical Inference
PSL-based Regularization | KALE [1] | × | ×
PSL-based Regularization | RUGE [2] | √ | ×
PSL-based Regularization | Rocktäschel et al. [3] | × | ×
Embedding-based Variational Inference for MLN | pLogicNet [4] | √ | ×
Embedding-based Variational Inference for MLN | ExpressGNN [5] | √ | ×
Embedding-based Variational Inference for MLN | pGAT [6] | √ | ×
Our Proposed Work: UniKER for Horn Rules
• Idea 1: use forward chaining to conduct exact inference
• Idea 2: combine embedding and logical rules in an iterative manner
• Idea 3: remove potentially incorrect triples during learning to ensure robustness
31
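The three ideas can be read as one iterative step, repeated until convergence. The sketch below is a runnable toy: the function names, thresholds, and the stand-in `infer` and `confidence` callables are my illustrative assumptions, not the paper's actual API.

```python
def uniker_iteration(facts, infer, confidence,
                     drop_thresh=0.1, add_candidates=(), add_thresh=0.9):
    """One round of the combined loop over a set of (relation, head, tail) triples."""
    # Idea 1: exact logical inference (e.g., forward chaining) adds new facts.
    facts = set(facts) | set(infer(facts))
    # Idea 3: an embedding trained on the enlarged KG scores each triple;
    # drop triples the embedding considers likely noise.
    facts = {t for t in facts if confidence(t) >= drop_thresh}
    # Include high-confidence hidden triples so the next round of chaining
    # can fire more rules; Idea 2 is to repeat this whole step iteratively.
    facts |= {t for t in add_candidates if confidence(t) >= add_thresh}
    return facts
```

Each round thus lets logical inference enlarge the training data for the embedding, and the embedding in turn denoise and extend the KG that the rules reason over.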
Traditional Logical Inference: a MAX-SAT Problem (NP-complete)
32
Knowledge Graph
• Entities: Country (USA); Person (Thomas Alva Edison, Mary Stilwell, Mina Miller); Language (English)
• Predicates: isMarriedTo, liveIn, isParentOf, isSiblingOf, officialLanguage, speakLanguage
• Observed facts: isMarriedTo(Thomas Alva Edison, Mary Stilwell), isMarriedTo(Thomas Alva Edison, Mina Miller), isMarriedTo(Mary Stilwell, Mina Miller), liveIn(Mina Miller, USA), officialLanguage(USA, English)
Definite Horn rule: speakLanguage(Person, Language) ⇐ liveIn(Person, Country) ∧ officialLanguage(Country, Language)
All ground predicates: liveIn(Thomas Alva Edison, USA): T; ...; liveIn(Mary Stilwell, USA): ?
All ground rules: speakLanguage(Thomas Alva Edison, English) ⇐ liveIn(Thomas Alva Edison, USA) ∧ officialLanguage(USA, English); speakLanguage(Mary Stilwell, English) ⇐ liveIn(Mary Stilwell, USA) ∧ officialLanguage(USA, English); ...
A SAT solver over all ground rules yields the new fact speakLanguage(Mina Miller, English), but the problem is NP-complete.
Forward Chaining for Horn Rules: Exact and Fast
33
Definite Horn rule: speakLanguage(Person, Language) ⇐ liveIn(Person, Country) ∧ officialLanguage(Country, Language)
Observed facts: isMarriedTo(Thomas Alva Edison, Mary Stilwell), isMarriedTo(Thomas Alva Edison, Mina Miller), isMarriedTo(Mary Stilwell, Mina Miller), liveIn(Mina Miller, USA), officialLanguage(USA, English)
Forward chaining binds Person = Mina Miller, Country = USA, Language = English, yielding the new fact speakLanguage(Mina Miller, English)
Forward chaining involves only a small subset of active ground predicates/rules
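Forward chaining for two-atom rules of this shape can be sketched as a fixpoint loop over the fact set. The rule encoding (a pair of body predicates plus a head predicate) and all names below are illustrative assumptions, matching the slide's running example.

```python
def forward_chain(facts, rules):
    """Repeatedly fire rules of the form head(X, Z) <= p1(X, Y) AND p2(Y, Z)
    until no new fact is derived (a fixpoint)."""
    facts = set(facts)
    while True:
        new = set()
        for (p1, p2), head in rules:
            for (r1, x, y) in facts:
                if r1 != p1:
                    continue
                for (r2, y2, z) in facts:
                    if r2 == p2 and y2 == y and (head, x, z) not in facts:
                        new.add((head, x, z))
        if not new:
            return facts
        facts |= new

facts = {("isMarriedTo", "Thomas Alva Edison", "Mina Miller"),
         ("liveIn", "Mina Miller", "USA"),
         ("officialLanguage", "USA", "English")}
rules = [(("liveIn", "officialLanguage"), "speakLanguage")]
closure = forward_chain(facts, rules)
# derives ("speakLanguage", "Mina Miller", "English")
```

Because each pass only scans facts that match a rule's body predicates, the work stays proportional to the active ground predicates/rules rather than to all possible groundings.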
Iterative Mutual Enhancement
• Enhance KGE via logical inference
  • Update the KG via forward chaining-based logical reasoning
• Enhance logical inference via KGE
  • Exclude potentially incorrect triples
  • Include potentially useful hidden triples
34
Update KG via Forward Chaining-based Logical Reasoning
35
(Figure: the logical rule-based reasoning side applies the definite Horn rule speakLanguage(Person, Language) ⇐ liveIn(Person, Country) ∧ officialLanguage(Country, Language) to the observed facts liveIn(Mina Miller, USA) and officialLanguage(USA, English), inferring the new fact speakLanguage(Mina Miller, English), which is added to the KG used for embedding learning.)
Iterative Mutual Enhancement
• Enhance KGE via logical inference
  • Update the KG via forward chaining-based logical reasoning
• Enhance logical inference via KGE
  • Exclude potentially incorrect triples
  • Include potentially useful hidden triples
36
Excluding potentially incorrect triples
(Figure: the learned embedding scores the triples in the KG; a low-scoring, likely incorrect triple (marked ×) is removed, denoising the KG before the next round of logical inference.)
37
Including potentially useful hidden triples
(Figure: forward chaining with the Horn rule speakLanguage(Person, Language) ⇐ liveIn(Person, Country) ∧ officialLanguage(Country, Language) needs liveIn(Mary Stilwell, USA), which is not in the KG (marked ?), while officialLanguage(USA, English) is observed (marked √).)
38
(Figure: the learned embedding predicts the hidden triple liveIn(Mary Stilwell, USA) with high confidence, so it is treated as true (√); forward chaining can then fire the rule and derive the corresponding new speakLanguage fact.)
39
Experimental Results
• KG completion task
Experimental Results
• A few iterations are enough
41
Robust to Noise
42
• Constructed a noisy dataset in which noisy triples amount to 40% of the original data
Efficient
43
• Evaluated the scalability of forward chaining against a number of SOTA inference algorithms for MLNs
Outline
• Introduction
•Bringing First-Order Logic into Uncertain KG Embedding
•UniKER: Integrating Horn Rule Reasoning into KGE
•Summary
46
Take Away
• Two methodologies in KG inference
  • Embedding-based approaches
  • Logical rule-based reasoning
• Combining the two worlds is the promising direction
  • Embeddings can handle the noise and uncertainty of KGs
  • Logical rules provide higher-order dependency constraints among entities and relations
• Different ways of combination
  • UniKER is the best solution when the logical rules are confined to Horn rules
47
References
• [1] Guo S, Wang Q, Wang L, et al. Jointly embedding knowledge graphs and logical rules. EMNLP 2016: 192-202.
• [2] Guo S, Wang Q, Wang L, et al. Knowledge graph embedding with iterative guidance from soft rules. AAAI 2018, 32(1).
• [3] Rocktäschel T, Singh S, Riedel S. Injecting logical background knowledge into embeddings for relation extraction. NAACL-HLT 2015: 1119-1129.
• [4] Qu M, Tang J. Probabilistic logic neural networks for reasoning. arXiv preprint arXiv:1906.08495, 2019.
• [5] Zhang Y, Chen X, Yang Y, et al. Can graph neural networks help logic reasoning? arXiv preprint arXiv:1906.02111, 2019.
• [6] Harsha Vardhan L V, Jia G, Kok S. Probabilistic logic graph attention networks for reasoning. Companion Proceedings of the Web Conference 2020: 669-673.
• [7] Kimmig A, Bach S, Broecheler M, et al. A short introduction to probabilistic soft logic. NIPS Workshop on Probabilistic Programming: Foundations and Applications, 2012: 1-4.
• [8] Richardson M, Domingos P. Markov logic networks. Machine Learning, 2006, 62(1-2): 107-136.
• [9] Ren H, Hu W, Leskovec J. Query2box: Reasoning over knowledge graphs in vector space using box embeddings. ICLR 2020.
• [10] Ren H, Leskovec J. Beta embeddings for multi-hop logical reasoning in knowledge graphs. NeurIPS 2020.
• [11] Hamilton W L, Bajaj P, Zitnik M, Jurafsky D, Leskovec J. Embedding logical queries on knowledge graphs. NeurIPS 2018.
48
Q & A
• Thanks to my collaborators: Shirley Chen, Vivian Cheng, Junheng Hao, Muhao Chen, Wei Wang, Carlo Zaniolo
49