TRANSCRIPT
Deep Prolog: End-to-end Differentiable
Proving in Knowledge Bases
Tim Rocktäschel
University College London, Computer Science
2nd Conference on Artificial Intelligence and Theorem Proving
26th of March 2017
Overview
Machine Learning / Deep Learning:
  Inputs X → [Trainable Function: Artificial Neural Network] → Outputs Y
  • Behavior learned automatically
  • Strong generalization
  • Needs a lot of training data
  • Behavior not interpretable

First-order Logic:
  “Every father of a parent is a grandfather.”
  grandfatherOf(X,Y) :–
      fatherOf(X,Z),
      parentOf(Z,Y).
  • Behavior defined manually
  • No generalization
  • Needs no training data
  • Behavior interpretable
Tim Rocktaschel Deep Prolog: End-to-end Differentiable Proving in Knowledge Bases 1/37
Outline
1. Reasoning with Symbols: Knowledge Bases; Prolog: Backward Chaining
2. Reasoning with Neural Representations: Symbolic vs. Neural Representations; Neural Link Prediction; Computation Graphs
3. Deep Prolog: Neural Backward Chaining
4. Optimizations: Batch Proving; Gradient Approximation; Regularization by Neural Link Predictor
5. Experiments
6. Summary
Notation
Constant: homer, bart, lisa, etc. (lowercase)
Variable: X, Y, etc. (uppercase, universally quantified)
Term: a constant or a variable
Predicate: fatherOf, parentOf, etc.; a function from terms to a Boolean
Atom: a predicate applied to terms, e.g., parentOf(X, bart)
Literal: a negated or non-negated atom, e.g., not parentOf(bart, lisa)
Rule: head :– body, where the head is a literal and the body is a (possibly empty) list of literals representing a conjunction
Fact: a ground rule (no free variables) with an empty body, e.g., parentOf(homer, bart).
Example Knowledge Base
1. fatherOf(abe, homer).
2. parentOf(homer, lisa).
3. parentOf(homer, bart).
4. grandpaOf(abe, lisa).
5. grandfatherOf(abe, maggie).
6. grandfatherOf(X1, Y1) :–
       fatherOf(X1, Z1),
       parentOf(Z1, Y1).
7. grandparentOf(X2, Y2) :–
       grandfatherOf(X2, Y2).
Backward Chaining
def or(KB, goal, Ψ):
    for rule head :– body in KB:
        Ψ′ ← unify(head, goal, Ψ)
        if Ψ′ ≠ failure:
            for Ψ′′ in and(KB, body, Ψ′):
                yield Ψ′′

def and(KB, subgoals, Ψ):
    if subgoals is empty:
        return Ψ
    else:
        subgoal ← substitute(head(subgoals), Ψ)
        for Ψ′ in or(KB, subgoal, Ψ):
            for Ψ′′ in and(KB, tail(subgoals), Ψ′):
                yield Ψ′′
Unification
def unify(A, B, Ψ):
    if Ψ = failure:
        return failure
    else if A is a variable:
        return unifyvar(A, B, Ψ)
    else if B is a variable:
        return unifyvar(B, A, Ψ)
    else if A = [a1, …, aN] and B = [b1, …, bN] are atoms:
        Ψ′ ← unify([a2, …, aN], [b2, …, bN], Ψ)
        return unify(a1, b1, Ψ′)
    else if A = B:
        return Ψ
    else:
        return failure
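The two pseudocode slides above can be turned into a small working interpreter. Below is a minimal plain-Python sketch, not the talk's implementation: unify uses a binding map with path-following instead of the slide's unifyvar helper, and rename standardizes rule variables apart, a step the slides leave implicit.

```python
# Minimal backward chainer: variables are uppercase strings, constants
# lowercase; atoms are tuples like ("fatherOf", "abe", "homer"); a rule
# is a (head, body) pair and a fact is a rule with an empty body.

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    # Follow variable bindings in the substitution.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst):
    if subst is None:
        return None
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None

_counter = [0]
def rename(rule):
    # Standardize apart: give the rule's variables fresh names.
    _counter[0] += 1
    def r(t):
        if is_var(t):
            return f"{t}_{_counter[0]}"
        if isinstance(t, tuple):
            return tuple(r(x) for x in t)
        return t
    head, body = rule
    return r(head), [r(g) for g in body]

def or_(kb, goal, subst, depth=10):
    if depth == 0:
        return
    for rule in kb:
        head, body = rename(rule)
        s = unify(head, goal, subst)
        if s is not None:
            yield from and_(kb, body, s, depth)

def and_(kb, subgoals, subst, depth):
    if not subgoals:
        yield subst
        return
    for s in or_(kb, subgoals[0], subst, depth - 1):
        yield from and_(kb, subgoals[1:], s, depth)

kb = [
    (("fatherOf", "abe", "homer"), []),
    (("parentOf", "homer", "lisa"), []),
    (("parentOf", "homer", "bart"), []),
    (("grandfatherOf", "X", "Y"),
     [("fatherOf", "X", "Z"), ("parentOf", "Z", "Y")]),
]

goal = ("grandfatherOf", "abe", "Q")
answers = [walk("Q", s) for s in or_(kb, goal, {})]
print(answers)  # ['lisa', 'bart']
```

Running the query grandfatherOf(abe, Q) against the example knowledge base yields both grandchildren, just as the slides' example does.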
Example
Example Knowledge Base:
1. fatherOf(abe, homer).
2. parentOf(homer, bart).
3. grandfatherOf(X, Y) :–
       fatherOf(X, Z),
       parentOf(Z, Y).

Query: grandfatherOf(abe, bart)?

[Proof tree: the query fails to unify with facts 1 and 2, but unifies with the head of rule 3, yielding {X/abe, Y/bart}. Subgoal 3.1, fatherOf(abe, Z), succeeds against fact 1 with {X/abe, Y/bart, Z/homer}; subgoal 3.2, parentOf(homer, bart), then succeeds against fact 2, completing the proof.]
Symbolic Representations
Symbols (constants and predicates) do not share any information:
    grandpaOf ≠ grandfatherOf

No notion of similarity:
    apple ∼ orange, professorAt ∼ lecturerAt

No generalization beyond what can be symbolically inferred:
    isFruit(apple), apple ∼ orange ⇒ isFruit(orange)?

But… symbolic representations lead to powerful inference mechanisms and proofs for predictions:
    fatherOf(abe, homer). parentOf(homer, lisa). parentOf(homer, bart).
    grandfatherOf(X, Y) :– fatherOf(X, Z), parentOf(Z, Y).
    grandfatherOf(abe, Q)? {Q/lisa}, {Q/bart}

Fairly easy to debug, and trivial to incorporate domain knowledge: just change or add rules

Hard to work with language, vision, and other modalities:
    “is a film based on the novel of the same name by”(X, Y)
Neural Representations
Lower-dimensional fixed-length vector representations of symbols (predicates and constants):
    v_apple, v_orange, v_fatherOf, … ∈ R^k

Can capture similarity and even semantic hierarchy of symbols:
    v_grandpaOf ≈ v_grandfatherOf, v_apple ∼ v_orange, v_apple ≺ v_fruit

Can be trained from raw task data (e.g., facts)

Can be compositional:
    v_“is the father of” = RNN_θ(v_is, v_the, v_father, v_of)

But… needs a large amount of training data

No direct way of incorporating prior knowledge:
    v_grandfatherOf(X, Y) :– v_fatherOf(X, Z), v_parentOf(Z, Y).
Related Work
Fuzzy Logic (Zadeh, 1965)

Probabilistic Logic Programming, e.g., IBAL (Pfeffer, 2001), BLOG (Milch et al., 2005), Markov Logic Networks (Richardson and Domingos, 2006), ProbLog (De Raedt et al., 2007), …

Inductive Logic Programming, e.g., Plotkin (1970), Shapiro (1991), Muggleton (1991), De Raedt (1999), …

Statistical Predicate Invention (Kok and Domingos, 2007)

Neural-symbolic Connectionism:
    Propositional rules: EBL-ANN (Shavlik and Towell, 1989), KBANN (Towell and Shavlik, 1994), C-IL2P (Garcez and Zaverucha, 1999)
    First-order inference (no training of symbol representations): Unification Neural Networks (Hölldobler, 1990; Komendantskaya, 2011), SHRUTI (Shastri, 1992), Neural Prolog (Ding, 1995), CILP++ (França et al., 2014), Lifted Relational Networks (Šourek et al., 2015)
Neural Link Prediction
Real-world knowledge bases (like Freebase) are incomplete!

The placeOfBirth attribute is missing for 71% of people!

Commonsense knowledge is often not stated explicitly

Weak logical relationships can be used for inferring facts

[Graph: spouseOf(melinda, bill), chairmanOf(bill, microsoft), headquarteredIn(microsoft, seattle); can we infer livesIn(melinda, seattle)?]

Predict livesIn(melinda, seattle) using a local scoring function:
    f(v_livesIn, v_melinda, v_seattle)

(Figure adapted from Das et al., 2016)
State-of-the-art Neural Link Prediction
f(v_livesIn, v_melinda, v_seattle)

DistMult (Yang et al., 2014): v_s, v_i, v_j ∈ R^k
    f(v_s, v_i, v_j) = v_s^T (v_i ⊙ v_j) = Σ_k v_sk · v_ik · v_jk

ComplEx (Trouillon et al., 2016): v_s, v_i, v_j ∈ C^k
    f(v_s, v_i, v_j) = real(v_s)^T (real(v_i) ⊙ real(v_j))
                     + real(v_s)^T (imag(v_i) ⊙ imag(v_j))
                     + imag(v_s)^T (real(v_i) ⊙ imag(v_j))
                     − imag(v_s)^T (imag(v_i) ⊙ real(v_j))

Training Loss
    L = Σ_{(r_s(e_i, e_j), y) ∈ T} −y log(σ(f(v_s, v_i, v_j))) − (1 − y) log(1 − σ(f(v_s, v_i, v_j)))

Gradient-based optimization for learning v_s, v_i, v_j from data

How do we calculate the gradients ∇_{v_s} L, ∇_{v_i} L, ∇_{v_j} L?
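A minimal NumPy sketch of the two scoring functions and the per-triple loss. The vectors here are random toy embeddings, not trained ones, and complex_score uses the equivalent compact form Re⟨v_s, v_i, conj(v_j)⟩, which expands to the four terms above.

```python
import numpy as np

def distmult(v_s, v_i, v_j):
    # f(s, i, j) = v_s^T (v_i * v_j) = sum_k v_s[k] v_i[k] v_j[k]
    return np.sum(v_s * v_i * v_j)

def complex_score(v_s, v_i, v_j):
    # ComplEx score as Re(<v_s, v_i, conj(v_j)>); expanding the real and
    # imaginary parts recovers the four terms on the slide.
    return np.real(np.sum(v_s * v_i * np.conj(v_j)))

def triple_loss(score, y):
    # Negative log-likelihood of the label y under sigma(score).
    p = 1.0 / (1.0 + np.exp(-score))
    return -y * np.log(p) - (1 - y) * np.log(1 - p)

rng = np.random.default_rng(0)
v_s, v_i, v_j = rng.normal(size=(3, 4))
print(distmult(v_s, v_i, v_j))

c_s, c_i, c_j = rng.normal(size=(3, 4)) + 1j * rng.normal(size=(3, 4))
print(complex_score(c_s, c_i, c_j))
```

Note that DistMult is symmetric in its two entity arguments, while ComplEx is not; this is exactly why ComplEx can model antisymmetric relations.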
Computation Graphs
[Computation graph: inputs x and y feed a dot node producing u1; u1 feeds a sigm node producing z]

Example: z = f(x, y) = σ(x^T y)

Nodes represent variables (inputs or parameters)

Directed edges into a node correspond to a differentiable operation
Backpropagation
[Computation graph as before, with the gradients ∇z, ∂z/∂u1, ∂u1/∂x, and ∂u1/∂y flowing backward along the edges]

Chain Rule of Calculus: given a function z = f(b) with b = g(a),
    ∇_a z = (∂b/∂a)^T ∇_b z

Backpropagation is an efficient recursive application of the Chain Rule

Gradient of z = σ(x^T y) w.r.t. x:
    ∇_x z = (∂z/∂u1)(∂u1/∂x) = σ(u1)(1 − σ(u1)) y

Given upstream supervision on z, we can learn x and y!

Deep Learning = “large” differentiable computation graphs
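The gradient on this slide can be checked numerically. A small NumPy sketch comparing the closed-form gradient of z = σ(xᵀy) with a central finite-difference estimate:

```python
import numpy as np

def sigma(u):
    return 1.0 / (1.0 + np.exp(-u))

def z(x, y):
    return sigma(x @ y)

def grad_x(x, y):
    # Chain rule as on the slide: (dz/du1)(du1/dx) = sigma'(u1) * y
    u1 = x @ y
    return sigma(u1) * (1.0 - sigma(u1)) * y

rng = np.random.default_rng(1)
x, y = rng.normal(size=3), rng.normal(size=3)

# Central finite differences along each coordinate of x.
eps = 1e-6
num = np.array([(z(x + eps * e, y) - z(x - eps * e, y)) / (2 * eps)
                for e in np.eye(3)])
print(np.allclose(grad_x(x, y), num, atol=1e-6))  # True
```

The same recipe (analytic gradient vs. finite differences) is a standard sanity check when building any differentiable computation graph by hand.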
Aims
“We are attempting to replace symbols by vectors so wecan replace logic by algebra.” — Yann LeCun
End-to-end differentiable proving
Calculate gradient of proof success w.r.t. symbolrepresentations
Train symbol representations from facts and rules in aknowledge base via gradient descent
Use similarity of symbol representations during proofs
Induce rules of predefined structure via gradient descent
Neural Knowledge Base
Symbolic Representation:
1. fatherOf(abe, homer).
2. parentOf(homer, lisa).
3. parentOf(homer, bart).
4. grandpaOf(abe, lisa).
5. grandfatherOf(abe, maggie).
6. grandfatherOf(X1, Y1) :–
       fatherOf(X1, Z1),
       parentOf(Z1, Y1).
7. grandparentOf(X2, Y2) :–
       grandfatherOf(X2, Y2).

Neural-Symbolic Representation:
1. v_fatherOf(v_abe, v_homer).
2. v_parentOf(v_homer, v_lisa).
3. v_parentOf(v_homer, v_bart).
4. v_grandpaOf(v_abe, v_lisa).
5. v_grandfatherOf(v_abe, v_maggie).
6. v_grandfatherOf(X1, Y1) :–
       v_fatherOf(X1, Z1),
       v_parentOf(Z1, Y1).
7. v_grandparentOf(X2, Y2) :–
       v_grandfatherOf(X2, Y2).
Neural Unification
Soft-matching: τ_{A,B} = e^{−‖v_A − v_B‖_2} ∈ [0, 1]

def unify(A, B, Ψ, τ):
    if Ψ = failure:
        return failure, 0
    else if A is a variable:
        return unifyvar(A, B, Ψ), τ
    else if B is a variable:
        return unifyvar(B, A, Ψ), τ
    else if A = [a1, …, aN] and B = [b1, …, bN] are atoms:
        Ψ′, τ′ ← unify([a2, …, aN], [b2, …, bN], Ψ, τ)
        return unify(a1, b1, Ψ′, τ′)
    else if A and B are symbol representations:
        return Ψ, min(τ, τ_{A,B})
    else:
        return failure, 0

Example: unifying v_grandfatherOf(X, v_bart) with v_grandpaOf(v_abe, v_bart) gives
    Ψ = {X/v_abe}, τ = min(e^{−‖v_grandfatherOf − v_grandpaOf‖_2}, e^{−‖v_bart − v_bart‖_2})
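A tiny NumPy sketch of the soft-matching score; the vectors below are hand-picked toy embeddings chosen only for illustration:

```python
import numpy as np

def tau(v_a, v_b):
    # tau_{A,B} = exp(-||v_A - v_B||_2), always in (0, 1].
    return np.exp(-np.linalg.norm(v_a - v_b))

# Toy embeddings: similar predicates get nearby vectors.
v_grandpaOf = np.array([0.9, 0.1, 0.3])
v_grandfatherOf = np.array([0.85, 0.15, 0.25])
v_bart = np.array([-0.5, 0.2, 0.7])

print(tau(v_bart, v_bart))                # 1.0: identical symbols match exactly
print(tau(v_grandpaOf, v_grandfatherOf))  # close to 1: similar symbols

# Unifying two atoms takes the minimum over their pairwise scores:
score = min(tau(v_grandpaOf, v_grandfatherOf), tau(v_bart, v_bart))
```

Because the score never reaches exactly 0, every pair of symbols soft-unifies to some degree; this is what lets gradients flow to symbol representations.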
Compiling a Computation Graph using Backward Chaining
def or(KB, goal, Ψ, τ, D):
    for rule head :– body in KB:
        Ψ′, τ′ ← unify(head, goal, Ψ, τ)
        if Ψ′ ≠ failure:
            for Ψ′′, τ′′ in and(KB, body, Ψ′, τ′, D):
                yield Ψ′′, τ′′

def and(KB, subgoals, Ψ, τ, D):
    if subgoals is empty:
        return Ψ, τ
    else if D = 0:
        return failure
    else:
        subgoal ← substitute(head(subgoals), Ψ)
        for Ψ′, τ′ in or(KB, subgoal, Ψ, τ, D − 1):
            for Ψ′′, τ′′ in and(KB, tail(subgoals), Ψ′, τ′, D):
                yield Ψ′′, τ′′
Example
Example Neural Knowledge Base:
1. v_fatherOf(v_abe, v_homer)
2. v_parentOf(v_homer, v_bart)
3. v_grandfatherOf(X, Y) :–
       v_fatherOf(X, Z),
       v_parentOf(Z, Y)

Query: v_s(v_i, v_j)?

[Proof tree: in neural backward chaining every KB entry soft-unifies with every goal. The query matches fact 1 with score τ1, fact 2 with score τ2, and the head of rule 3 with {X/v_i, Y/v_j} and score τ3. Subgoal 3.1, v_fatherOf(v_i, Z), matches fact 1 ({Z/v_homer}, τ31) and fact 2 ({Z/v_bart}, τ32). For {Z/v_homer}, subgoal 3.2, v_parentOf(v_homer, v_j), matches facts 1 and 2 with scores τ311 and τ312; for {Z/v_bart}, subgoal 3.2, v_parentOf(v_bart, v_j), matches facts 1 and 2 with scores τ321 and τ322. Expanding rule 3 again fails once the maximum depth is reached.]
Training
Proof Aggregation
    Ψ, τ = or(KB, q, { }, 1, D)
    τ_q = max τ

Supervision Signal
    y_q = 1.0 if q ∈ F, and 0.0 otherwise

Masking Unification for Training Facts
    τ_{q,B} = 0.0 if q ∈ F and q = B, and τ_{q,B} otherwise
    (a training fact must not prove itself by unifying with its own entry in the KB)

Loss
    L = Σ_{q ∈ T} −y_q log(τ_q) − (1 − y_q) log(1 − τ_q)
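A toy sketch of proof aggregation and the loss, using hand-picked proof scores rather than scores from an actual prover:

```python
import numpy as np

def query_score(proof_scores):
    # Proof aggregation: the query's success is its best proof's score.
    return max(proof_scores)

def nll(tau_q, y_q):
    # Per-query negative log-likelihood, as in the loss above.
    return -y_q * np.log(tau_q) - (1 - y_q) * np.log(1 - tau_q)

proofs = [0.2, 0.85, 0.6]    # success scores of three proofs for a query q
tau_q = query_score(proofs)  # 0.85

print(nll(tau_q, 1.0))  # small loss: a positive query with a good proof
print(nll(tau_q, 0.0))  # large loss: the same score on a negative query
```

Taking the max over proofs means gradients flow only through the best proof of each query, which is what makes the K-max approximation discussed later a natural fit.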
Neural Inductive Logic Programming
1. v_fatherOf(v_abe, v_homer).
2. v_parentOf(v_homer, v_lisa).
3. v_parentOf(v_homer, v_bart).
4. v_grandpaOf(v_abe, v_lisa).
5. v_grandfatherOf(v_abe, v_maggie).
6. θ1(X1, Y1) :–
       θ2(X1, Z1),
       θ3(Z1, Y1).
7. θ4(X2, Y2) :–
       θ5(X2, Y2).

The rule predicates are replaced by trainable parameter vectors θ1, …, θ5, so rules of this predefined structure can be induced by gradient descent.
Batch Proving: Utilizing GPUs
Let A ∈ R^{N×k} be a matrix of N symbol representations that are to be unified with M other symbol representations B ∈ R^{M×k}:

    τ_{A,B} = e^{−√(A_sq + B_sq − 2AB^T + ε)} ∈ R^{N×M}

where the squared norms are broadcast into N×M matrices:

    A_sq = [Σ_{i=1}^k A_{1i}^2, …, Σ_{i=1}^k A_{Ni}^2]^T · 1_M^T ∈ R^{N×M}
    B_sq = 1_N · [Σ_{i=1}^k B_{1i}^2, …, Σ_{i=1}^k B_{Mi}^2] ∈ R^{N×M}
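A NumPy sketch of this batched computation, checked against the per-pair definition. The small ε guards the square root (and its gradient) at zero distance:

```python
import numpy as np

def batch_tau(A, B, eps=1e-8):
    # tau_{A,B} = exp(-sqrt(A_sq + B_sq - 2 A B^T + eps)), all N x M pairs at once.
    A_sq = np.sum(A**2, axis=1)[:, None]   # (N, 1), broadcast over columns
    B_sq = np.sum(B**2, axis=1)[None, :]   # (1, M), broadcast over rows
    sq_dist = np.maximum(A_sq + B_sq - 2 * A @ B.T, 0.0)  # clamp rounding error
    return np.exp(-np.sqrt(sq_dist + eps))

rng = np.random.default_rng(2)
A, B = rng.normal(size=(4, 3)), rng.normal(size=(5, 3))
T = batch_tau(A, B)

# Reference: the per-pair score from the neural unification slide.
loop = np.array([[np.exp(-np.linalg.norm(a - b)) for b in B] for a in A])
print(np.allclose(T, loop, atol=1e-3))  # True up to the eps smoothing
```

On a GPU the single matrix product A Bᵀ replaces N·M separate norm computations, which is the entire point of batch proving.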
Batch Proving Example
Example Neural Knowledge Base:
1. v_fatherOf(v_abe, v_homer)
2. v_parentOf(v_homer, v_bart)
3. v_grandfatherOf(X, Y) :–
       v_fatherOf(X, Z),
       v_parentOf(Z, Y)

Query: v_s(v_i, v_j)?

[Proof tree, batched: facts 1 and 2 are unified with the query in one batch, giving the score vector [τ1, τ2]; rule 3 gives {X/v_i, Y/v_j} with score τ3. Subgoal 3.1 matches facts 1 and 2 in a single batch, binding Z to the stacked matrix [v_homer; v_bart] with scores [τ31, τ32]; subgoal 3.2 then unifies [v_parentOf; v_parentOf]([v_homer; v_bart], [v_j; v_j]) against facts 1 and 2 at once, producing the score matrix [[τ311, τ321], [τ312, τ322]]. Expanding rule 3 again fails at the maximum depth.]
Gradient Approximation with Kmax Proofs
Example Neural Knowledge Base:
1. v_fatherOf(v_abe, v_homer)
2. v_parentOf(v_homer, v_bart)
3. v_grandfatherOf(X, Y) :–
       v_fatherOf(X, Z),
       v_parentOf(Z, Y)

Query: v_s(v_i, v_j)?

[Proof tree with K-max approximation: instead of following every binding of Z, only the K highest-scoring substitutions [v_K] with scores [τ_3K] are kept after subgoal 3.1, and only these are expanded in subgoal 3.2, [v_parentOf]([v_K], [v_j]), giving the scores [τ_3K1; τ_3K2]. Gradients are thus approximated using only the K best proofs.]
Regularization by Neural Link Predictor
Train jointly with a neural link prediction method

Share symbol representations

The neural link prediction model quickly learns similarities between symbols

Let p_q be the score given by the neural link prediction model (DistMult or ComplEx), and τ_q the proof success

Multi-task training loss:
    L = Σ_{q ∈ T} −y_q (log(τ_q) + log(p_q)) − (1 − y_q)(log(1 − τ_q) + log(1 − p_q))
Experiments
Countries Knowledge Base (Bouchard et al., 2015)

[Diagram: countries are connected by neighborOf relations and by locatedIn edges to subregions, which are in turn locatedIn regions. For test countries, the direct locatedIn links to their region (and, depending on the task, to their subregion) are removed and must be inferred from the remaining edges.]
Models
NTP: the prover is trained alone

DistMult: neural link prediction model by Yang et al. (2014)

NTP DistMult: the prover and DistMult are trained jointly; the maximum of the two predictions is used at test time

NTP DistMult λ: only the prover is used at test time; DistMult acts as a regularizer

ComplEx: neural link prediction model by Trouillon et al. (2016)

NTP ComplEx: the prover and ComplEx are trained jointly; the maximum of the two predictions is used at test time

NTP ComplEx λ: only the prover is used at test time; ComplEx acts as a regularizer
Rule Templates
S1: θ1(X,Y) :– θ2(Y,X).
    θ1(X,Y) :– θ2(X,Z), θ2(Z,Y).
S2: θ1(X,Y) :– θ2(X,Z), θ3(Z,Y).
S3: θ1(X,Y) :– θ2(X,Z), θ3(Z,W), θ4(W,Y).
Results
Model                              S1      S2      S3
Random                             32.3    32.3    32.3
Frequency                          32.3    32.3    30.8
ER-MLP (Dong et al., 2014)         96.0    74.5    65.0
RESCAL (Nickel et al., 2012)       99.7    74.5    65.0
HolE (Nickel et al., 2015)         99.7    77.2    69.7
TARE (Wang et al., 2017)           99.4    90.6    89.0
NTP                                97.3    83.7    70.0
DistMult (Yang et al., 2014)       98.1    98.3    65.5
NTP DistMult                       99.2    96.7    87.0
NTP DistMult λ                     99.4    98.3    95.9
ComplEx (Trouillon et al., 2016)   99.9    97.1    78.6
NTP ComplEx                       100.0    98.9    89.1
NTP ComplEx λ                      99.3    98.2    95.1
Results
[Bar chart: AUC (0.3–1.0) per task S1, S2, and S3 for NTP, DistMult, ComplEx, NTP DistMult, NTP ComplEx, NTP DistMult λ, and NTP ComplEx λ]
Induced Logic Programs
Task   Confidence   Rule
S1     0.999        neighborOf(X,Y) :– neighborOf(Y,X).
       0.767        locatedIn(X,Y) :– locatedIn(X,Z), locatedIn(Z,Y).
S2     0.998        neighborOf(X,Y) :– neighborOf(Y,X).
       0.995        locatedIn(X,Y) :– locatedIn(X,Z), locatedIn(Z,Y).
       0.705        locatedIn(X,Y) :– neighborOf(X,Z), locatedIn(Z,Y).
S3     0.891        neighborOf(X,Y) :– neighborOf(Y,X).
       0.750        locatedIn(X,Y) :– neighborOf(X,Z), neighborOf(Z,W), locatedIn(W,Y).
Summary
Prolog’s backward chaining can be used as a recipe forrecursively constructing a neural network
Proof success is differentiable w.r.t. symbol representations
Can learn vector representations of symbols and rules ofpredefined structure
Various optimizations: batch proving, gradient approximation
Outperforms neural link prediction models on a medium-sizedknowledge base
Induces interpretable rules
Thank you!
http://rockt.github.com
tim [dot] rocktaeschel [at] gmail [dot] com
Twitter: @_rockt
References
Guillaume Bouchard, Sameer Singh, and Théo Trouillon. 2015. On approximate reasoning capabilities of low-rank vector spaces. AAAI Spring Symposium on Knowledge Representation and Reasoning (KRR): Integrating Symbolic and Neural Approaches.
Rajarshi Das, Arvind Neelakantan, David Belanger, and Andrew McCallum. 2016. Chains of reasoning over entities,relations, and text using recurrent neural networks. arXiv preprint arXiv:1607.01426.
Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, ShaohuaSun, and Wei Zhang. 2014. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. InProceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining,pages 601–610. ACM.
Maximilian Nickel, Lorenzo Rosasco, and Tomaso Poggio. 2015. Holographic embeddings of knowledge graphs.arXiv preprint arXiv:1510.04935.
Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2012. Factorizing YAGO: scalable machine learning for linked data. In Proc. of International Conference on World Wide Web (WWW), pages 271–280.
Tim Rocktäschel. 2017. Combining Representation Learning with Logic for Language Processing. Ph.D. thesis, University College London, Gower Street, London WC1E 6BT, United Kingdom.
Tim Rocktäschel and Sebastian Riedel. 2016. Learning knowledge base inference with neural theorem provers. In NAACL Workshop on Automated Knowledge Base Construction (AKBC).
Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. 2016. Complex embeddings for simple link prediction. In International Conference on Machine Learning (ICML).
Mengya Wang, Hankui Zhuo, and Huiling Zhu. 2017. Embedding knowledge graphs based on transitivity andantisymmetry of rules. arXiv preprint arXiv:1702.07543.
Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2014. Embedding entities and relations forlearning and inference in knowledge bases. arXiv preprint arXiv:1412.6575.