Holographic Embeddings of Knowledge Graphs Maximilian Nickel, Lorenzo Rosasco, Tomaso Poggio


Posted on 03-Oct-2020


TRANSCRIPT

Page 1: Holographic Embeddings of Knowledge Graphs• HOLE combines state-of-the-art relational learning and high scalability in a single model • Enables complex models of knowledge graphs

Holographic Embeddings of Knowledge Graphs

Maximilian Nickel, Lorenzo Rosasco, Tomaso Poggio

Page 2

Knowledge Graphs → Search Engines

Page 3

Knowledge Graphs → Digital Assistants

Page 4

Relational Knowledge Representation

Knowledge graphs provide machine-interpretable data by modeling

knowledge ≈ entities + their relationships

Facts are represented as binary relations $R_p(e_s, e_o)$.

vicePresident(Obama, Biden)
memberOf(Obama, Democrats)
memberOf(Biden, Democrats)

⇒

[Figure: the same facts as a graph: a vicePresident edge from Barack Obama to Joe Biden, and party edges from each of them to the Democratic Party]

Modern knowledge graphs such as Freebase, YAGO, and DBpedia are

• Very large (Freebase: 40M entities, 35K relations, 637M facts)

• Very incomplete (Freebase: nationality is missing for 71% of persons)

Page 5


Multigraph structure:

• Entity = Node

• Fact = Edge

• Relation type = Edge type


Page 6

Machine Learning on Knowledge Graphs

Learn a statistical model of a knowledge graph

Predict the probability of any edge (link prediction)

[Figure: graph of Barack Obama, Joe Biden, Bill Clinton, Al Gore, and the Democratic Party with vicePresident and party edges; one missing edge, marked "?", is to be predicted]

Applications

• KG Completion

• “Structured” prior for machine reading

• Probabilistic QA

Challenges

• Relational nature of data

• Size of modern KGs

Page 7

Knowledge Graph Embeddings

Knowledge graph embeddings consist of

Entity Embeddings + Relation Embeddings + Score Function

Goal: Learn embeddings that best explain the data according to the score function

RESCAL (Nickel, Tresp, et al., 2011)

$\text{score}(R_p(e_s, e_o)) = e_s^\top R_p e_o$

• Interpretation as tensor completion

• State-of-the-art results on SRL benchmarks

• Runtime & memory complexity $O(d^2)$

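[Figure: RESCAL as tensor factorization, $X_k \approx E R_k E^\top$, where entry $(i, j)$ of the $k$-th slice $X_k$ relates the $i$-th entity to the $j$-th entity under the $k$-th relation]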
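The following makes the $O(d^2)$ cost concrete. It is a minimal pure-Python sketch of the RESCAL score, assuming embeddings are plain lists and the relation matrix $R_p$ is a list of rows. The function name `rescal_score` is hypothetical and is not taken from the slides or any library.

```python
def rescal_score(e_s, R_p, e_o):
    """Bilinear RESCAL score e_s^T R_p e_o.

    e_s, e_o: entity embeddings, lists of length d.
    R_p: full d x d relation matrix (list of rows); storing it takes O(d^2)
    memory per relation, and the double loop below takes O(d^2) time.
    """
    d = len(e_s)
    return sum(e_s[i] * R_p[i][j] * e_o[j] for i in range(d) for j in range(d))


# Toy example with d = 2: R_p only links feature 0 of the subject
# to feature 1 of the object, so the score is asymmetric.
R_p = [[0.0, 2.0],
       [0.0, 0.0]]
print(rescal_score([1.0, 0.0], R_p, [0.0, 1.0]))  # 2.0
print(rescal_score([0.0, 1.0], R_p, [1.0, 0.0]))  # 0.0 (roles swapped)
```

Because $R_p$ is a full matrix, swapping subject and object generally changes the score. This is how RESCAL can represent asymmetric relations.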

Page 8

Knowledge Graph Embeddings


TransE (Bordes et al., 2013):

$\text{score}(R_p(e_s, e_o)) = -\lVert e_s + r_p - e_o \rVert_1$

• Inspired by Word2Vec

• Runtime & memory complexity $O(d)$

• Less powerful than RESCAL

[Figure: TransE translates $e_s$ by $r_p$ so that $e_s + r_p$ lies close to $e_o$]
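For comparison, here is a minimal sketch of the TransE score using the $L_1$ norm shown above. Only $d$ numbers per relation are needed, which is where the $O(d)$ complexity comes from. The function name is hypothetical.

```python
def transe_score(e_s, r_p, e_o):
    """TransE score: negative L1 distance between e_s + r_p and e_o, in O(d)."""
    return -sum(abs(s + r - o) for s, r, o in zip(e_s, r_p, e_o))


# A perfect translation scores 0; every unit of L1 error lowers the score by 1.
perfect = transe_score([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
off_by_two = transe_score([0.0, 0.0], [1.0, 1.0], [0.0, 0.0])
print(perfect == 0.0, off_by_two)  # True -2.0
```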

Page 9

Holographic Embeddings

Page 10

Interlude: Relations ≈ Classification of Tuples

Let $E$ be the set of all entities in a domain.

A binary relation $R \subseteq E \times E$ is the subset of all pairs of entities for which the relationship is true.

[Figure: the relation partyOf as a subset of $E \times E$]

Characteristic Function of Relations

$$\phi_p(s, o) = \begin{cases} 1, & (s, o) \in R_p \\ 0, & \text{otherwise} \end{cases}$$

Observation: this is what we want to learn in link prediction

Relational Learning ≈ Classification of Tuples

(Nickel, Rosasco, et al., 2016)
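Viewed this way, a relation is just a set of (subject, object) pairs, and $\phi_p$ is its indicator function. The toy sketch below uses pairs from the running example and a hypothetical helper name. It shows the function that link prediction tries to approximate for pairs that are not in the observed data.

```python
def characteristic_function(relation):
    """Return phi(s, o): 1 if the pair (s, o) belongs to the relation, else 0."""
    pairs = frozenset(relation)

    def phi(s, o):
        return 1 if (s, o) in pairs else 0

    return phi


# Observed partyOf facts from the running example.
party_of = characteristic_function({("Obama", "Democrats"), ("Biden", "Democrats")})
print(party_of("Obama", "Democrats"))  # 1
print(party_of("Democrats", "Obama"))  # 0: order matters
```

A learned model replaces the exact set lookup with a score that can also be evaluated on unseen pairs.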

Page 11

Holographic Embeddings (HOLE)

Holographic Embeddings

• model entities as vectors, $e_i \in \mathbb{R}^d$

• model relation types as vectors, $R_k \mapsto r_k \in \mathbb{R}^d$

• represent pairs of entities as $(e_s, e_o) \mapsto e_s \star e_o \in \mathbb{R}^d$

where $\star : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d$ denotes circular correlation,

$$[a \star b]_k = \sum_{i=0}^{d-1} a_i \, b_{(k+i) \bmod d}.$$

Model relationships via the classification of pairs of entities

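$$\Pr(R_p(e_s, e_o) = 1 \mid \Theta) = \sigma\big(r_p^\top (e_s \star e_o)\big)$$

where $\Theta = \{e_i\}_{i=1}^{n_e} \cup \{r_k\}_{k=1}^{n_r}$ and $\sigma$ denotes the logistic sigmoid.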

(Nickel, Rosasco, et al., 2016)
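The two formulas above can be transcribed directly, at $O(d^2)$ cost. This is a minimal sketch assuming plain Python lists for the embeddings; the function names are hypothetical. Circular correlation is not commutative, so $e_s \star e_o \neq e_o \star e_s$ in general. As a result, the model can score a pair differently depending on which entity is the subject.

```python
import math


def circular_correlation(a, b):
    """[a ⋆ b]_k = sum_i a_i * b_{(k + i) mod d}, computed directly in O(d^2)."""
    d = len(a)
    return [sum(a[i] * b[(k + i) % d] for i in range(d)) for k in range(d)]


def hole_probability(r_p, e_s, e_o):
    """Pr(R_p(e_s, e_o) = 1 | Theta) = sigmoid(r_p^T (e_s ⋆ e_o)).

    Note: math.exp overflows for very negative logits; a production version
    would use a numerically stable sigmoid.
    """
    pair = circular_correlation(e_s, e_o)
    logit = sum(r * x for r, x in zip(r_p, pair))
    return 1.0 / (1.0 + math.exp(-logit))


print(circular_correlation([1, 2, 3], [4, 5, 6]))  # [32, 29, 29]
print(circular_correlation([1, 0, 0], [0, 1, 0]))  # [0, 1, 0]
print(circular_correlation([0, 1, 0], [1, 0, 0]))  # [0, 0, 1]  (order matters)
```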

Page 12

Holographic Embeddings (HOLE)

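Holographic embeddings use circular correlation $\star : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d$ to represent pairs of entities,

$$(e_s, e_o) \approx e_s \star e_o,$$

which is defined for $a, b \in \mathbb{R}^d$ as

$$[a \star b]_k = \sum_{i=0}^{d-1} a_i \, b_{(k+i) \bmod d}.$$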

Compressed Tensor Product

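[Figure: circular correlation as a compressed tensor product: the grid of products $a_i b_j$ for $a_0, a_1, a_2$ and $b_0, b_1, b_2$ is summed along wrapped diagonals to give $c_0, c_1, c_2$]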

(Plate, 1995)
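The "compressed tensor product" view can be checked numerically. Summing the outer product $a b^\top$ along its wrapped diagonals (the entries with $j - i \equiv k \pmod d$) yields exactly $[a \star b]_k$. The sketch below uses hypothetical helper names.

```python
def outer(a, b):
    """Full tensor (outer) product: d x d matrix with entries a_i * b_j."""
    return [[ai * bj for bj in b] for ai in a]


def compress(M):
    """Sum entries M[i][j] lying on the same wrapped diagonal j - i ≡ k (mod d)."""
    d = len(M)
    c = [0] * d
    for i in range(d):
        for j in range(d):
            c[(j - i) % d] += M[i][j]
    return c


def circular_correlation(a, b):
    d = len(a)
    return [sum(a[i] * b[(k + i) % d] for i in range(d)) for k in range(d)]


a, b = [1, 2, 3], [4, 5, 6]
print(compress(outer(a, b)))       # [32, 29, 29]
print(circular_correlation(a, b))  # [32, 29, 29]
```

All $d^2$ pairwise products contribute, but they are summed into only $d$ numbers. This is why the pair representation has the same size as each entity embedding.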


Page 15

Circular Correlation as a Compositional Operator

Components of entity embeddings ≈ latent features of entities

Model relation instances via interactions of latent features

e.g., the partyOf relation in the US presidents example:

• Liberal persons are members of liberal parties

• Conservative persons are members of conservative parties

HOLE as a Neural Network

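[Figure: HOLE as a neural network: subject units $e_{s1}, e_{s2}, e_{s3}$ and object units $e_{o1}, e_{o2}, e_{o3}$ interact through circular correlation and are weighted by $r_p$ to give $r_p^\top (e_s \star e_o)$; for partyOf, interaction units such as Liberal Person ∧ Liberal Party combine into (Liberal Person ∧ Liberal Party) ∨ (Conserv. Person ∧ Conserv. Party)]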
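Here is a toy sketch of the partyOf intuition, with $d = 2$ and hand-picked (hypothetical) latent features: index 0 means liberal, index 1 means conservative. For $d = 2$, $[a \star b]_0 = a_0 b_0 + a_1 b_1$. A relation vector that weights only component 0 therefore scores exactly the pattern (Liberal Person ∧ Liberal Party) ∨ (Conserv. Person ∧ Conserv. Party). In a trained model these features would be learned rather than set by hand.

```python
def circular_correlation(a, b):
    d = len(a)
    return [sum(a[i] * b[(k + i) % d] for i in range(d)) for k in range(d)]


def logit(r_p, e_s, e_o):
    """r_p^T (e_s ⋆ e_o), the argument of the sigmoid."""
    return sum(r * x for r, x in zip(r_p, circular_correlation(e_s, e_o)))


# Hypothetical hand-set features: [liberal, conservative].
persons = {"LiberalPerson": [1.0, 0.0], "ConservPerson": [0.0, 1.0]}
parties = {"LiberalParty": [1.0, 0.0], "ConservParty": [0.0, 1.0]}
r_party_of = [1.0, 0.0]  # weight only the "same feature" interaction

for p_name, e_s in persons.items():
    for q_name, e_o in parties.items():
        print(p_name, q_name, logit(r_party_of, e_s, e_o))
```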


Page 17

Computing Holographic Embeddings

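Runtime Complexity: We can compute circular correlation efficiently via fast Fourier transforms (FFT) in $O(d \log d)$:

$$a \star b = \mathcal{F}^{-1}\big(\overline{\mathcal{F}(a)} \odot \mathcal{F}(b)\big)$$

where $\mathcal{F}$ and $\mathcal{F}^{-1}$ denote the FFT and its inverse, $\overline{x}$ denotes the complex conjugate of $x$, and $\odot$ denotes the element-wise product.

Memory Complexity: Since circular correlation is a function $\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d$, the memory complexity is $O(d)$.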

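[Figure: side-by-side networks over subject units $e_{s1}, e_{s2}, e_{s3}$ and object units $e_{o1}, e_{o2}, e_{o3}$: the tensor product, $r_p (e_s \otimes e_o)$, with $d^2$ interaction terms, versus circular correlation, $r_p^\top (e_s \star e_o)$, with $d$]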
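The FFT identity can be sketched in pure Python. To stay self-contained, this sketch uses a textbook recursive radix-2 Cooley-Tukey FFT, which assumes $d$ is a power of two. In practice one would call a library FFT instead, for example `numpy.fft.fft` and `numpy.fft.ifft`. Helper names are hypothetical.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for m in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * m / n) * odd[m]
        out[m] = even[m] + twiddle
        out[m + n // 2] = even[m] - twiddle
    return out


def ifft(X):
    """Inverse FFT via conjugation: ifft(X) = conj(fft(conj(X))) / n."""
    n = len(X)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in X])]


def circular_correlation_fft(a, b):
    """a ⋆ b = F^{-1}(conj(F(a)) ⊙ F(b)), in O(d log d)."""
    d = len(a)
    if d == 0 or d & (d - 1):
        raise ValueError("this sketch needs len(a) to be a power of two")
    fa, fb = fft(a), fft(b)
    return [v.real for v in ifft([x.conjugate() * y for x, y in zip(fa, fb)])]


def circular_correlation(a, b):
    """Direct O(d^2) definition, used here only as a reference."""
    d = len(a)
    return [sum(a[i] * b[(k + i) % d] for i in range(d)) for k in range(d)]


a = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 0.25, 3.0]
b = [1.0, 0.0, -2.0, 0.5, 0.0, 1.0, -1.0, 2.0]
fast, slow = circular_correlation_fft(a, b), circular_correlation(a, b)
print(max(abs(x - y) for x, y in zip(fast, slow)))  # floating-point round-off only
```

Dropping the conjugate in `circular_correlation_fft` would compute circular convolution $a * b$ instead. That is the operator used for storage in holographic associative memory.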

Page 18

Holographic Associative Memory

Holographic Associative Memory

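Let $(a_i, b_i)$ be stimulus-response pairs, and let $*$ denote circular convolution.

Storage: $m \leftarrow \sum_i a_i * b_i$

Retrieval: $b' \leftarrow a \star m$

Clean-up: $b \leftarrow \arg\max_{b_i} b_i^\top (a \star m)$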

Holographic Embeddings

Let $S_o = \{(s, p) \mid R_p(e_s, e_o) = 1\}$.

Storage: $e_o \leftarrow \sum_{(s, p) \in S_o} r_p * e_s$

Retrieval: $r' \leftarrow e_s \star e_o$

Probability: $\sigma\big(r_p^\top (e_s \star e_o)\big)$

Generalization, not memorization


(Plate, 1995; Poggio, 1973; Gabor, 1969; Willshaw, 1985)
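Below is a small numerical sketch of the associative-memory column. It assumes random stimulus and response vectors with i.i.d. $\mathcal{N}(0, 1/d)$ entries; that is a common choice in Plate's holographic reduced representations and is assumed here, not taken from the slides. Retrieval returns only a noisy version of the stored response. The clean-up step resolves it by picking the most similar candidate. Function names are hypothetical.

```python
import math
import random


def circular_convolution(a, b):
    """[a * b]_k = sum_i a_i * b_{(k - i) mod d} (used for storage)."""
    d = len(a)
    return [sum(a[i] * b[(k - i) % d] for i in range(d)) for k in range(d)]


def circular_correlation(a, b):
    """[a ⋆ b]_k = sum_i a_i * b_{(k + i) mod d} (used for retrieval)."""
    d = len(a)
    return [sum(a[i] * b[(k + i) % d] for i in range(d)) for k in range(d)]


def random_vector(d, rng):
    """i.i.d. N(0, 1/d) entries, so each vector has roughly unit norm (assumed)."""
    return [rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)]


def store(pairs):
    """Storage: m <- sum_i a_i * b_i."""
    m = [0.0] * len(pairs[0][0])
    for a, b in pairs:
        m = [mi + ci for mi, ci in zip(m, circular_convolution(a, b))]
    return m


def recall(a, m, candidates):
    """Retrieval b' <- a ⋆ m, then clean-up: index of argmax_i b_i^T b'."""
    noisy = circular_correlation(a, m)
    scores = [sum(x * y for x, y in zip(b, noisy)) for b in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


rng = random.Random(0)
d = 512
stimuli = [random_vector(d, rng) for _ in range(3)]
responses = [random_vector(d, rng) for _ in range(3)]
memory = store(list(zip(stimuli, responses)))
recovered = [recall(a, memory, responses) for a in stimuli]
print(recovered)  # expected [0, 1, 2] with high probability for d this large
```

The retrieval noise grows with the number of stored pairs relative to $d$. In HOLE, by contrast, the embeddings are learned rather than fixed random vectors, which is the "generalization, not memorization" point above.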

Page 19

Experiments

Page 20

Link Prediction on WordNet

• WordNet consists of lexical relationships between words

• WN18 subset (Bordes et al., 2013)

Entities: 40,943; relation types: 18; facts: 151,442

[Figure: WordNet example linking Holography, Optics, and optical through hypernym and derivational form relations]

Link prediction results on WN18:

Model     MRR    Hits@1   Hits@3   Hits@10
HOLE      0.94   0.93     0.95     0.95
TRANSE    0.5    0.11     0.89     0.94
TRANSR    0.61   0.34     0.88     0.94
RESCAL    0.89   0.84     0.9      0.93
ER-MLP    0.71   0.63     0.78     0.86
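The results report mean reciprocal rank (MRR) and Hits@k. Below is a minimal sketch of these standard metrics, given the rank of the correct entity for each test query. The slides do not state how ranks are computed (for example, raw vs. filtered), and the ranks below are made up for illustration.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test queries (rank 1 is best)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at(ranks, k):
    """Fraction of test queries whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


ranks = [1, 2, 1, 10, 4]  # hypothetical ranks for five test triples
print(round(mean_reciprocal_rank(ranks), 2))  # 0.57
print(hits_at(ranks, 1), hits_at(ranks, 3), hits_at(ranks, 10))  # 0.4 0.6 1.0
```

MRR rewards getting the exact answer first, while Hits@10 is lenient. This explains why TRANSE is close to HOLE on Hits@10 but far behind on Hits@1 and MRR.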

Page 21

Link Prediction on Freebase

• Freebase consists of general facts about the world (e.g., harvested from Wikipedia, MusicBrainz, etc.)

• FB15k subset (Bordes et al., 2013)

Entities: 14,951; relation types: 1,345; facts: 592,213

[Figure: Freebase example in which BarackObama, DemocraticParty, and Joe Biden are connected by party and vicePresident relations]

Link prediction results on FB15k:

Model     MRR    Hits@1   Hits@3   Hits@10
HOLE      0.52   0.4      0.61     0.74
TRANSE    0.46   0.3      0.58     0.75
TRANSR    0.35   0.22     0.4      0.58
RESCAL    0.35   0.24     0.41     0.59
ER-MLP    0.29   0.17     0.32     0.5

Page 22

MRR vs Number of Parameters

[Figure: MRR on FB15k (y-axis, 0.3 to 0.6) versus number of parameters in millions (x-axis, 0 to 35) for TRANSE, TRANSR, RESCAL, ER-MLP, and HOLE]

Page 23

Summary

• HOLE combines state-of-the-art relational learning and high scalability in a single model

• Enables complex models of knowledge graphs

• Interpretation in terms of associative memory

Future Work

Since circular correlation is a function $\mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d$, a tuple is represented by a vector of the same size as an entity.

[Figure: nested fact in which Tom believes "John loves Mary"]

This is an essential property for creating recursive representations:

• Nested facts: believes(Tom, loves(John, Mary))

• Higher-arity relations: taughtAt(Tom, AI, MIT)
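The closure property can be illustrated directly. Correlating two $d$-dimensional vectors yields another $d$-dimensional vector, which can itself be correlated again. The sketch below uses random (hypothetical) embeddings for Tom, John, and Mary. It shows only that the dimensions compose; the slide leaves open how relation vectors such as believes or loves would enter a nested fact.

```python
import random


def circular_correlation(a, b):
    d = len(a)
    return [sum(a[i] * b[(k + i) % d] for i in range(d)) for k in range(d)]


rng = random.Random(1)
d = 8  # arbitrary embedding size for the illustration
tom, john, mary = ([rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(3))

inner = circular_correlation(john, mary)   # pair (John, Mary), still in R^d
nested = circular_correlation(tom, inner)  # pair (Tom, (John, Mary)), still in R^d
print(len(inner), len(nested))  # 8 8
```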

Page 24

Thank you

Software

• Open-source library for knowledge graph embeddings: http://github.com/mnick/scikit-kge

• Experiments for this paper: https://github.com/mnick/holographic-embeddings

Recent review article: Maximilian Nickel, Kevin Murphy, et al. (2016). “A Review of Relational Machine Learning for Knowledge Graphs”. In: Proc. of the IEEE

Page 26

Simple Reasoning

• Task: Predict the region of countries

• Setting: 10-fold cross validation over countries

[Figure: countries graph: each country is partOf a subregion, which is partOf a region; countries are also linked by neighbors edges; test and train countries are marked]

(Nickel et al., 2015)

Page 27

Simple Reasoning


AUC-PR for predicting the region of test countries:

Model    AUC-PR
RANDOM   0.32
RULE     1
MLN-S    0.34
TRANSE   1
ER-MLP   0.96
RESCAL   1
HOLE     1

$S_1: \text{partOf}(c, s) \wedge \text{partOf}(s, r) \Rightarrow \text{partOf}(c, r)$

(Nickel et al., 2015)

Page 28

Simple Reasoning


AUC-PR for predicting the region of test countries:

Model    AUC-PR
RANDOM   0.32
RULE     0.78
MLN-S    0.34
TRANSE   0.74
ER-MLP   0.73
RESCAL   0.75
HOLE     0.77

$S_2: \text{neighbors}(c_1, c_2) \wedge \text{partOf}(c_2, r) \Rightarrow \text{partOf}(c_1, r)$

(Nickel et al., 2015)

Page 29

Simple Reasoning


AUC-PR for predicting the region of test countries:

Model    AUC-PR
RANDOM   0.32
RULE     0.78
MLN-S    0.34
TRANSE   0.69
ER-MLP   0.65
RESCAL   0.65
HOLE     0.71

$S_3: \text{neighbors}(c_1, c_2) \wedge \text{partOf}(c_2, s) \wedge \text{partOf}(s, r) \Rightarrow \text{partOf}(c_1, r)$

(Nickel et al., 2015)

Page 30

Link Prediction on SRL Benchmarks

• Holographic embeddings retain excellent performance on SRL benchmark datasets

• Other knowledge graph embedding models perform worse

[Figure: AUC-PR of MLN, TRANSE, ER-MLP, RESCAL, and HOLE on the Kinships, Nations, and UMLS datasets]

(Nickel, Tresp, et al., 2011; Garcia-Duran et al., 2015)

Page 31

Relational Learning with HOLE

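MAP estimates of $\Theta = \{e_i\}_{i=1}^{n} \cup \{r_k\}_{k=1}^{m}$ for the joint distribution

$$\Pr(Y \mid \Theta) = \prod_{s=1}^{n} \prod_{p=1}^{m} \prod_{o=1}^{n} \Pr\big(y_{spo} \mid \sigma(r_p^\top (e_s \star e_o))\big)$$

where each factor is the Bernoulli likelihood of the observed $y_{spo} \in \{0, 1\}$ with success probability $\sigma(r_p^\top (e_s \star e_o))$.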

Shared representations enable relational learning

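[Figure: plate diagram in which each observation $y_{spo}$ depends on $e_s$ and $e_o$ (entity plates of size $N$, hyperparameter $\lambda_e$) and on $r_p$ (relation plate of size $M$, hyperparameter $\lambda_r$)]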

• Entities have the same embeddings as subjects and as objects, across all relations

• Embeddings are learned jointly, which allows information to propagate between triples

• Decoupling effect:

  • Known parameters: local computation

  • Parameter learning: global dependencies

• Holds for many compositional models
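The function below sketches what MAP estimation optimizes. It computes the negative log-posterior for a set of labeled triples: a logistic (Bernoulli) log-likelihood term per triple plus L2 penalties on the embeddings. Treating $\lambda_e$ and $\lambda_r$ as precisions of zero-mean Gaussian priors is an assumption here, as are all names. The slides do not spell out the prior or the optimizer.

```python
import math


def circular_correlation(a, b):
    d = len(a)
    return [sum(a[i] * b[(k + i) % d] for i in range(d)) for k in range(d)]


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def neg_log_posterior(triples, E, R, lam_e, lam_r):
    """-log Pr(Y | Theta) - log Pr(Theta), up to additive constants.

    triples: list of (s, p, o, y) with y in {0, 1}.
    E, R: lists of entity and relation embeddings (lists of floats).
    lam_e, lam_r: assumed Gaussian prior precisions (L2 weights).
    """
    nll = 0.0
    for s, p, o, y in triples:
        eta = sum(r * x for r, x in zip(R[p], circular_correlation(E[s], E[o])))
        # log Pr(y | eta) is log sigmoid(eta) for y = 1 and
        # log(1 - sigmoid(eta)) = log sigmoid(-eta) for y = 0.
        nll -= log_sigmoid(eta) if y == 1 else log_sigmoid(-eta)
    penalty = lam_e * sum(v * v for e in E for v in e) + lam_r * sum(v * v for r in R for v in r)
    return nll + 0.5 * penalty


# With all-zero embeddings every triple has probability 0.5, so the loss is n * log 2.
E = [[0.0] * 4 for _ in range(3)]
R = [[0.0] * 4 for _ in range(2)]
triples = [(0, 0, 1, 1), (1, 1, 2, 0)]
print(neg_log_posterior(triples, E, R, lam_e=0.1, lam_r=0.1))
```

Every triple reads the same shared entity vectors through `E[s]` and `E[o]`. Minimizing this single objective is therefore what lets information propagate between triples.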