
Page 1: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

COMBINING REPRESENTATION LEARNING AND LOGICAL RULE REASONING FOR KNOWLEDGE GRAPH INFERENCE

Yizhou Sun
Department of Computer Science
University of California, Los Angeles
[email protected]

August 15, 2021

Page 2: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Outline

• Introduction
• Bringing First-Order Logic into Uncertain KG Embedding
• UniKER: Integrating Horn Rule Reasoning into KGE
• Summary


Page 3: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Knowledge Graph

• What are knowledge graphs?
  • Multi-relational graph data (a.k.a. heterogeneous information network)
  • Provide a structured representation of semantic relationships between real-world entities


A triple (h, r, t) represents a fact, e.g., (Eiffel Tower, is located in, Paris).
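For concreteness, a KG can be stored in code as a set of (h, r, t) triples (a toy sketch, independent of any particular KG system):

```python
# A knowledge graph as a set of (head, relation, tail) triples.
kg = {
    ("Eiffel Tower", "isLocatedIn", "Paris"),
    ("Paris", "isCapitalOf", "France"),
}
print(("Eiffel Tower", "isLocatedIn", "Paris") in kg)  # True: an observed fact
```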

Page 4: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Examples of KG


General-purpose KGs

Common-sense KGs & NLP

Bio & Medical KGs

Product Graphs & E-commerce

Page 5: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Applications of KGs
● Foundational to knowledge-driven AI systems
● Enable many downstream applications (NLP tasks, QA systems, etc.)


[Figure: knowledge graphs at the center, powering QA & dialogue systems, computational biology, natural language processing, and recommendation systems.]

Page 6: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Reasoning over Knowledge Graph

• Knowledge graph reasoning aims to infer missing knowledge from the existing facts.


[Figure: inferring missing facts. A KG contains facts such as isMarriedTo(Thomas Alva Edison, Mary Stilwell), isMarriedTo(Thomas Alva Edison, Mina Miller), liveIn(Mina Miller, USA), and officialLang(USA, English); reasoning infers missing facts such as liveIn(Mary Stilwell, USA) and speakLang(Mina Miller, English).]

Page 7: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Knowledge Graph Embedding

• Entities: low-dimensional vectors
• Relations: parametric algebraic operators
• Triples: representation-based score function


Page 8: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Summary of Existing Approaches

• Define a score function for a triple: f_r(h, t)
  • Based on the entity and relation representations
• Define a loss function to guide the training
  • E.g., an observed triple should score higher than a negative one (a toy sketch follows below)


Source: Sun et al., RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space (ICLR’19)
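For concreteness, here is a minimal sketch of this recipe using a DistMult-style score and a pairwise margin loss (toy random embeddings and illustrative names; this is not the RotatE model from the cited paper):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_entities, n_relations = 8, 100, 10
E = rng.normal(size=(n_entities, dim))   # one vector per entity
R = rng.normal(size=(n_relations, dim))  # one vector per relation

def score(h, r, t):
    """DistMult-style score f_r(h, t) = <e_h, w_r, e_t>; higher = more plausible."""
    return float(np.sum(E[h] * R[r] * E[t]))

def margin_loss(pos, neg, margin=1.0):
    """Pairwise loss: the observed triple should outscore the corrupted one."""
    return max(0.0, margin - score(*pos) + score(*neg))

# Corrupt the tail of an observed triple to form a negative sample.
print(margin_loss(pos=(0, 1, 2), neg=(0, 1, 57)))
```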

Page 9: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Pros and Cons of KGE
• Knowledge Graph Embedding
  • Shows good scalability and robustness.
  • Fails to capture high-order dependencies between entities and relations.
  • Cannot handle cold-start entities.


[Figure: KGE-based inference on the Edison example. The embedding model fills in some missing facts but, lacking high-order dependencies, fails on others, e.g., it cannot infer speakLang(Mina Miller, English) (marked ×).]

Page 10: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Logical Rule-based KG Reasoning

• Find the truth value of each triple so as to maximize the satisfaction of the rules


Logical Rule Reasoning

speakLanguage(Person, Language) ⇐ liveIn(Person, Country) ∧ officialLanguage(Country, Language)

[Figure: applying this rule to the Edison KG derives the new fact speakLang(Mina Miller, English) from liveIn(Mina Miller, USA) and officialLang(USA, English).]

Page 11: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Pros and Cons of Logical Rule-based Reasoning
• Logical Rule-based Reasoning
  • Good at capturing high-order dependencies; good interpretability.
  • Unable to handle noisy data and suffers from high computational complexity.
  • Coverage is low.


Logical Rule Reasoning

speakLanguage(Person, Language) ⇐ liveIn(Person, Country) ∧ officialLanguage(Country, Language)

[Figure: the same rule-based inference example as on the previous slide.]

Page 12: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Combine Both Worlds: 1 + 1 > 2!


[Figure: on the Edison example, combining the two paradigms infers more missing facts than either alone; rules contribute high-order inferences such as speakLang(Mina Miller, English), while embeddings fill in facts the rules cannot reach.]

Page 13: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Outline

• Introduction
• Bringing First-Order Logic into Uncertain KG Embedding
• UniKER: Integrating Horn Rule Reasoning into KGE
• Summary


Page 14: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

The First Attempt

• Chen et al., "Embedding Uncertain Knowledge Graphs," AAAI'19

Page 15: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Two Types of Errors in KG

• False positive: an observed triple is wrong, e.g., (Obama, is_born_in, Kenya)
• False negative: a true fact is missing, e.g., (Eiffel Tower, is located in, France)


Page 16: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Handling Uncertainty in Triples

• False-positive errors can be alleviated by introducing uncertainty, e.g., (Obama, is_born_in, Kenya): 0.01
• Fit f_r(h, t) to the uncertainty scores

[Figure: a fragment of an uncertain KG over concepts such as Sing, Church, Wedding, Pray, Choir, Echo, Song, and Rehearsing_Room, whose edges carry confidence scores, e.g., usedFor: 0.6, causes: 0.3, uses: 1.0, uses: 0.3.]

Page 17: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

From score function to uncertainty score

• Given a triple l = (h, r, t) with uncertainty score s_l
  • Transform f_r(h, t) into a score in the range [0, 1]: fit φ(f_r(h, t)) → s_l, the ground-truth confidence
  • E.g., for the DistMult score function f_r(h, t) = <h, r, t>
  • where φ(⋅) can be a logistic function (UKGE(logi)) or a bounded rectifier (UKGE(rect)); a sketch follows below
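A minimal sketch of the two choices of φ(⋅), assuming a raw score x = f_r(h, t) and illustrative weight w and bias b (the model's learned parameters are not shown here):

```python
import numpy as np

def logistic(x, w=1.0, b=0.0):
    """UKGE(logi): squash a raw score into a confidence in [0, 1]."""
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

def bounded_rectifier(x, w=1.0, b=0.0):
    """UKGE(rect): clip the affine-transformed score to [0, 1]."""
    return float(np.clip(w * x + b, 0.0, 1.0))

raw = 2.3  # e.g., f_r(h, t) from the embedding model
print(logistic(raw), bounded_rectifier(raw))
```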

Page 18: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Handling Missing Facts

• Are unseen triples still needed?
  • Yes, negative triples are still data points!
• Can we treat an unseen triple l as false, i.e., s_l = 0?
  • No, we would make too many mistakes!
  • The true probability of an unseen triple can be higher than that of an observed triple with low confidence


Page 19: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Bringing Logic Rules

• What are logic rules?
  • Logic rule (template): (A, synonym, B) ∧ (B, synonym, C) → (A, synonym, C)
  • Ground rule (instance): (college, synonym, university) ∧ (university, synonym, institute) → (college, synonym, institute)
• Why are they helpful?
  • They help us infer scores for unseen triples (see the toy grounding below)
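As a toy illustration of grounding (hypothetical three-word vocabulary; not the paper's actual grounding procedure), the template above can be instantiated over all entity combinations:

```python
from itertools import permutations

# Ground the template (A, synonym, B) ∧ (B, synonym, C) → (A, synonym, C)
# against a toy vocabulary.
entities = ["college", "university", "institute"]
ground_rules = [
    (((a, "synonym", b), (b, "synonym", c)), (a, "synonym", c))
    for a, b, c in permutations(entities, 3)
]
print(len(ground_rules))  # 6 ground-rule instances from 3 entities
```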

Page 20: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Probabilistic Soft Logic
• Quantify a ground rule using PSL
  • Lukasiewicz t-norm: from Boolean logic to soft logic
  • Probability of a ground rule γ ≡ γ_body → γ_head:
    p_γ = I(¬γ_body ∨ γ_head) = min{1, 1 − I(γ_body) + I(γ_head)}
  • Distance to satisfaction:
    d_γ = 1 − p_γ = max{0, I(γ_body) − I(γ_head)}

More publications on PSL: https://psl.linqs.org/
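These formulas translate directly into code (a minimal sketch in plain Python, with soft truth values in [0, 1]):

```python
def lukasiewicz_and(a, b):
    """Soft conjunction: I(A ∧ B) = max{0, I(A) + I(B) - 1}."""
    return max(0.0, a + b - 1.0)

def rule_probability(i_body, i_head):
    """p_γ = I(¬body ∨ head) = min{1, 1 - I(body) + I(head)}."""
    return min(1.0, 1.0 - i_body + i_head)

def distance_to_satisfaction(i_body, i_head):
    """d_γ = 1 - p_γ = max{0, I(body) - I(head)}."""
    return max(0.0, i_body - i_head)
```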

Page 21: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

The Goal: Minimize Distance to Satisfaction

• Example: consider the following ground rule:
  • (college, synonym, university) ∧ (university, synonym, institute) → (college, synonym, institute)
  • Recall the confidences: l1 = (college, synonym, university): 0.99; l2 = (university, synonym, institute): 0.86; l3 = (college, synonym, institute): ?
• Say our embedding model predicts l3 as 0.65. How good is this prediction?
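A quick worked check of these numbers under the Lukasiewicz t-norm: I(γ_body) = max{0, 0.99 + 0.86 − 1} = 0.85, so the distance to satisfaction of the prediction 0.65 is d_γ = max{0, 0.85 − 0.65} = 0.20 (equivalently, distance_to_satisfaction(0.85, 0.65) in the sketch above). A fully satisfying prediction would need I(l3) ≥ 0.85.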

Page 22: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

The New Embedding Model

• For observed triples, force the embedding-based confidence function φ(f_r(h, t)) to be close to the ground-truth score s_l
• For unseen triples, minimize the distance to satisfaction of each ground rule γ that the triple l is involved in (a loss sketch follows below)
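A sketch of the resulting joint objective in squared-error form (phi is the embedding-based confidence function; rules_of is a hypothetical grounding helper that yields the body truth value of each ground rule in which an unseen triple appears as the head; the paper's exact weighting may differ):

```python
def ukge_loss(observed, unseen, gold_conf, rules_of, phi):
    """Sketch of the joint objective:
    - observed triple l: fit phi(l) to its gold confidence s_l
    - unseen triple l: penalize the distance to satisfaction of each ground
      rule it heads, using phi(l) as the head's soft truth value
    """
    loss = sum((phi(l) - gold_conf[l]) ** 2 for l in observed)
    loss += sum(max(0.0, i_body - phi(l)) ** 2
                for l in unseen
                for i_body in rules_of(l))  # rules_of: hypothetical helper
    return loss
```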

Page 23: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Experiments

• Datasets

• Logic rules:

(A, relatedto, B) ∧ (B, relatedto, C) → (A, relatedto, C)
(A, causes, B) ∧ (B, causes, C) → (A, causes, C)
(A, competeswith, B) ∧ (B, competeswith, C) → (A, competeswith, C)
(A, athletePlaysForTeam, B) ∧ (B, teamPlaysSports, C) → (A, athletePlaysSports, C)
(A, binding, B) ∧ (B, binding, C) → (A, binding, C)

Page 24: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Baselines
• Deterministic KG embedding models, which do not model confidence scores explicitly
  • TransE [Bordes et al. 2013]
  • DistMult [Yang et al. 2015]
  • ComplEx [Trouillon et al. 2016]
• Uncertain graph embedding, which only provides node embeddings
  • URGE [Hu et al. 2017]
• Two simplified versions of our model
  • Without negative sampling (UKGE_n-): can we simply ignore the negative links during training?
  • Without PSL (UKGE_p-): is simply treating unseen triples as false (score 0) a good strategy?

Page 25: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Relation Fact Confidence Score Prediction

• Given an unseen triple (h, r, t), predict its confidence
• Metrics: MSE and MAE (×10⁻²)

Page 26: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Relation Fact Ranking

• Given a query (h, r, ?t), rank all entities in the vocabulary as tail candidates
• Metrics: normalized Discounted Cumulative Gain (nDCG), with linear and exponential gain (see the sketch below)
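For reference, one common way to compute nDCG over a ranked list of tail candidates (the paper's exact gain and discount choices may differ):

```python
import numpy as np

def ndcg(ranked_true_scores, gain="linear"):
    """nDCG of a ranking, given the gold confidences in ranked order."""
    s = np.asarray(ranked_true_scores, dtype=float)
    g = s if gain == "linear" else 2.0 ** s - 1.0   # linear vs. exponential gain
    disc = 1.0 / np.log2(np.arange(2, len(s) + 2))  # positions 1..n -> log2(2..n+1)
    dcg = float(np.sum(g * disc))
    ideal = np.sort(s)[::-1]                        # best possible ordering
    ig = ideal if gain == "linear" else 2.0 ** ideal - 1.0
    idcg = float(np.sum(ig * disc))
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg([0.3, 0.9, 0.1]), ndcg([0.9, 0.3, 0.1]))  # the second ranking is better
```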

Page 27: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Relation Fact Ranking – Case Study

[Table: ground truth vs. predictions for an example query; columns: Entity and Score (ground truth) alongside Entity, Predicted Score, and True Score (predictions).]

Page 28: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Outline

• Introduction
• Bringing First-Order Logic into Uncertain KG Embedding
• UniKER: Integrating Horn Rule Reasoning into KGE
• Summary


Page 29: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

The Second Attempt

• Cheng et al., UniKER: A Unified Framework for Combining Embedding and Horn Rules for Knowledge Graph Inference, in submission

Page 30: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Existing Literature on Combining Both Worlds
• Probabilistic logic is widely used to integrate the two worlds
  • PSL-based regularization in the embedding loss
    • Leverages Probabilistic Soft Logic (PSL) [7] to compute a satisfaction loss
    • Treats logical rules as additional regularization for embedding models: the satisfaction loss of ground rules is added to the original embedding loss
    • Limitation: only utilizes a sampled set of rule instances
  • Embedding-based variational inference for MLN
    • Extends the Markov Logic Network (MLN) [8]
    • Leverages graph embeddings to define a variational distribution over all possible hidden triples for variational inference in the MLN
    • Limitation: efficiency issues; sampling is required

Page 31: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Combining Both Worlds

Categories                                   | Methods                | Interactive | Exact Logical Inference
PSL-based Regularization                     | KALE [1]               | ×           | ×
PSL-based Regularization                     | RUGE [2]               | √           | ×
PSL-based Regularization                     | Rocktäschel et al. [3] | ×           | ×
Embedding-based Variational Inference to MLN | pLogicNet [4]          | √           | ×
Embedding-based Variational Inference to MLN | ExpressGNN [5]         | √           | ×
Embedding-based Variational Inference to MLN | pGAT [6]               | √           | ×

Page 32: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Our Proposed Work: UniKER for Horn Rules

• Idea 1: use forward chaining to conduct exact inference
• Idea 2: combine embeddings and logical rules in an iterative manner
• Idea 3: remove potentially incorrect triples during learning to ensure robustness

Page 33: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Traditional Logical Inference: MAX-SAT problem

Knowledge graph
• Entities: Country (USA); Person (Thomas Alva Edison, Mary Stilwell, Mina Miller); Language (English)
• Predicates: isMarriedTo, liveIn, isParentOf, isSiblingOf, officialLanguage, speakLanguage
• Observed facts: isMarriedTo(Thomas Alva Edison, Mary Stilwell), isMarriedTo(Thomas Alva Edison, Mina Miller), isMarriedTo(Mary Stilwell, Mina Miller), liveIn(Mina Miller, USA), officialLanguage(USA, English)

Definite Horn rule:
speakLanguage(Person, Language) ⇐ liveIn(Person, Country) ∧ officialLanguage(Country, Language)

Grounding yields all ground predicates, e.g., liveIn(Thomas Alva Edison, USA): T, ..., liveIn(Mary Stilwell, USA): ?, and all ground rules, e.g.:
speakLanguage(Thomas Alva Edison, English) ⇐ liveIn(Thomas Alva Edison, USA) ∧ officialLanguage(USA, English)
speakLanguage(Mary Stilwell, English) ⇐ liveIn(Mary Stilwell, USA) ∧ officialLanguage(USA, English)

Feeding all ground rules to a SAT solver derives new facts such as speakLanguage(Mina Miller, English), but MAX-SAT is NP-complete.

Page 34: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Forward Chaining for Horn Rules: Exact and Fast

Definite Horn rule:
speakLanguage(Person, Language) ⇐ liveIn(Person, Country) ∧ officialLanguage(Country, Language)

Observed facts: isMarriedTo(Thomas Alva Edison, Mary Stilwell), isMarriedTo(Thomas Alva Edison, Mina Miller), isMarriedTo(Mary Stilwell, Mina Miller), liveIn(Mina Miller, USA), officialLanguage(USA, English)

With the binding Person = Mina Miller, Country = USA, Language = English, forward chaining fires the rule and derives the new fact speakLanguage(Mina Miller, English). It involves only a small subset of active ground predicates/rules (a toy implementation follows below).
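As an illustration, a naive forward-chaining engine over (h, r, t) triples might look as follows (variables are strings starting with '?'; this is a minimal sketch, not UniKER's actual implementation):

```python
def forward_chaining(facts, rules):
    """Fire definite Horn rules until no new fact can be derived.

    facts: set of ground triples (h, r, t)
    rules: list of (body, head), where body is a list of triple patterns
    """
    def match(pattern, triple, binding):
        # Try to extend `binding` so that `pattern` matches `triple`.
        b = dict(binding)
        for p, v in zip(pattern, triple):
            if p.startswith('?'):
                if b.setdefault(p, v) != v:
                    return None
            elif p != v:
                return None
        return b

    changed = True
    while changed:
        changed = False
        for body, head in rules:
            bindings = [{}]
            for pattern in body:  # join the body patterns against the facts
                bindings = [b2 for b in bindings for f in facts
                            if (b2 := match(pattern, f, b)) is not None]
            for b in bindings:
                new = tuple(b.get(x, x) for x in head)
                if new not in facts:
                    facts.add(new)
                    changed = True
    return facts

kg = {("Mina Miller", "liveIn", "USA"), ("USA", "officialLang", "English")}
rule = ([("?p", "liveIn", "?c"), ("?c", "officialLang", "?l")],
        ("?p", "speakLang", "?l"))
print(forward_chaining(kg, [rule]))  # derives ('Mina Miller', 'speakLang', 'English')
```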

Page 35: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Iterative Mutual Enhancement
• Enhance KGE via logical inference
  • Update the KG via forward chaining-based logical reasoning
• Enhance logical inference via KGE
  • Exclude potentially incorrect triples
  • Include potentially useful hidden triples
(A sketch of the full loop follows this list.)
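Putting the three ideas together, here is a minimal sketch of the iterative loop (the thresholds tau_low/tau_high and the candidates premise generator are assumptions for illustration, not the paper's exact procedure; infer can be the forward_chaining sketch above):

```python
def uniker(kg, rules, infer, train_kge, score, candidates,
           n_iters=3, tau_low=0.1, tau_high=0.9):
    """Sketch of the UniKER loop: alternate exact Horn-rule inference
    with embedding learning. All callables are supplied by the caller."""
    model = None
    for _ in range(n_iters):
        kg = infer(set(kg), rules)          # 1. exact inference adds derived facts
        model = train_kge(kg)               # 2. (re)train the KGE model on the enriched KG
        kg = {t for t in kg
              if score(model, t) >= tau_low}    # 3. exclude likely-incorrect triples
        kg |= {t for t in candidates(kg, rules)
               if score(model, t) >= tau_high}  # 4. include plausible hidden premises
    return kg, model
```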

Page 36: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Update KG via Forward Chaining-based Logical Reasoning

[Figure: logical rule-based reasoning and knowledge graph embedding interact; logical inference with the Horn rule speakLanguage(Person, Language) ⇐ liveIn(Person, Country) ∧ officialLanguage(Country, Language) derives the new fact speakLanguage(Mina Miller, English) from the observed facts liveIn(Mina Miller, USA) and officialLanguage(USA, English), and the enriched KG feeds embedding learning.]

Page 37: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Iterative Mutual Enhancement
• Enhance KGE via logical inference
  • Update the KG via forward chaining-based logical reasoning
• Enhance logical inference via KGE
  • Exclude potentially incorrect triples
  • Include potentially useful hidden triples

Page 38: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Excluding potentially incorrect triples

[Figure: the learned embedding scores the KG's triples; a low-scoring, likely-incorrect triple (here an erroneous isMarriedTo edge, marked ×) is removed, denoising the KG before the next round of inferring missing facts.]

Page 39: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Including potentially useful hidden triples

[Figure: forward chaining checks the premises of the rule speakLanguage(Person, Language) ⇐ liveIn(Person, Country) ∧ officialLanguage(Country, Language) under the binding Person = Mary Stilwell, Country = USA, Language = English; officialLanguage(USA, English) is in the KG (√), but liveIn(Mary Stilwell, USA) is not (?), so the rule cannot fire from observed facts alone.]

Page 40: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Including potentially useful hidden triples (continued)

[Figure: the learned embedding judges the hidden premise liveIn(Mary Stilwell, USA) to be plausible (its status is promoted from ? to √), so forward chaining can now fire the rule and derive the new fact speakLanguage(Mary Stilwell, English).]

Page 41: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Experimental Results

• KG completion task

Page 42: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Experimental Results

• A few iterations are good enough

Page 43: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Robust to Noise

• We construct a noisy dataset in which noisy triples amount to 40% of the original data.

Page 44: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Efficient

• We evaluate the scalability of forward chaining against a number of state-of-the-art inference algorithms for MLNs.

Page 45: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Outline

• Introduction
• Bringing First-Order Logic into Uncertain KG Embedding
• UniKER: Integrating Horn Rule Reasoning into KGE
• Summary


Page 46: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Take Away
• Two methodologies for KG inference
  • Embedding-based approaches
  • Logical rule-based reasoning
• Combining the two worlds is a promising direction
  • Embeddings can handle the noise and uncertainty of KGs
  • Logical rules provide higher-order dependency constraints among entities and relations
• Different ways of combining them
  • UniKER is the best solution when the logical rules are confined to Horn rules

Page 47: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

References
• [1] Guo S, Wang Q, Wang L, et al. Jointly embedding knowledge graphs and logical rules. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. 2016: 192-202.
• [2] Guo S, Wang Q, Wang L, et al. Knowledge graph embedding with iterative guidance from soft rules. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2018, 32(1).
• [3] Rocktäschel T, Singh S, Riedel S. Injecting logical background knowledge into embeddings for relation extraction. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2015: 1119-1129.
• [4] Qu M, Tang J. Probabilistic logic neural networks for reasoning. arXiv preprint arXiv:1906.08495, 2019.
• [5] Zhang Y, Chen X, Yang Y, et al. Can graph neural networks help logic reasoning? arXiv preprint arXiv:1906.02111, 2019.
• [6] Harsha Vardhan L V, Jia G, Kok S. Probabilistic logic graph attention networks for reasoning. In: Companion Proceedings of the Web Conference 2020. 2020: 669-673.
• [7] Kimmig A, Bach S, Broecheler M, et al. A short introduction to probabilistic soft logic. In: Proceedings of the NIPS Workshop on Probabilistic Programming: Foundations and Applications. 2012: 1-4.
• [8] Richardson M, Domingos P. Markov logic networks. Machine Learning, 2006, 62(1-2): 107-136.
• [9] Ren H, Hu W, Leskovec J. Query2box: Reasoning over knowledge graphs in vector space using box embeddings. ICLR 2020.
• [10] Ren H, Leskovec J. Beta embeddings for multi-hop logical reasoning in knowledge graphs. NeurIPS 2020.
• [11] Hamilton W L, Bajaj P, Zitnik M, Jurafsky D, Leskovec J. Embedding logical queries on knowledge graphs. NeurIPS 2018.

Page 48: COMBINING REPRESENTATION LEARNING AND LOGICAL RULE

Q & A
• Thanks to my collaborators:
  • Shirley Chen, Vivian Cheng, Junheng Hao, Muhao Chen, Wei Wang, Carlo Zaniolo