First-Order Probabilistic Inference
Rodrigo de Salvo Braz
2
A remark: feel free to ask clarification questions!
3
Slides online: just search for "Rodrigo de Salvo Braz" and check the presentations page (www.cs.berkeley.edu/~braz).
4
Outline
• A goal: First-order Probabilistic Inference
  – An ultimate goal
  – Propositionalization
• Lifted Inference
  – Inversion Elimination
  – Counting Elimination
• DBLOG
  – BLOG background
  – DBLOG
  – Retro-instantiation
  – Improving BLOG inference
• Future Directions and Conclusion
6
Abstracting Inference and Learning

[Diagram] AI problems have typically been attacked one at a time, each pairing its own knowledge with its own inference and learning:
• how to solve commonsense reasoning: domain knowledge + inference and learning;
• how to solve Natural Language Processing: language knowledge + inference and learning;
• how to solve planning: domain & planning knowledge + inference and learning;
• how to solve vision: objects & optics knowledge + inference and learning.

Abstracting the common part yields a single inference and learning module shared across AI problems (commonsense reasoning, Natural Language Processing, planning, vision, ...).
7
Inference and Learning: what capabilities should it have?

It should represent and use:
• predicates (relations) and quantification over objects:
  ∀X tank(X) ∨ plane(X)
  ∀X tank(X) ⇒ color(X) = green
  color(o121) = red
• varying degrees of uncertainty:
  ∀X { tank(X) ⇒ color(X) = green }
  since it may hold often but not absolutely
  (essential for language, vision, pattern recognition, commonsense reasoning, etc.);
• modal knowledge, utilities, and more.
8
Abstracting Inference and Learning

Domain knowledge:
tank(X) ∨ plane(X)
{ tank(X) ⇒ color(X) = green }
color(o121) = red
...
→ Commonsense reasoning

Language knowledge:
{ Sentence(S) ∧ Verb(S,"attacks") ∧ Subject(S,X) ∧ Object(S,Y) ⇒ attacking(X,Y) }
...
→ Natural Language Processing

Both applications share the same Inference and Learning module.
9
Abstracting Inference and Learning

Domain knowledge:
tank(X) ∨ plane(X)
{ tank(X) ⇒ color(X) = green }
color(o121) = red
...

Language knowledge:
{ Sentence(S) ∧ Verb(S,"attacks") ∧ Subject(S,X) ∧ Object(S,Y) ⇒ attacking(X,Y) }
...

→ Commonsense reasoning WITH Natural Language Processing over one shared Inference and Learning module: the joint solution is made simpler, e.g. Verb(s13,"attacks"), Subject(s13,o121), ...
10
Standard solutions fall short

Logic: has objects and properties, but statements are absolute.

Graphical models and machine learning: have varying degrees of uncertainty, but are propositional; treating objects and quantification is awkward.
11
Standard solutions fall short: graphical models and objects

tank(X) ∨ plane(X)
{ tank(X) ⇒ color(X) = green }
{ next(X,Y) ∧ tank(X) ⇒ tank(Y) }
color(o121) = red
color(o135) = green

Grounded network: color(o121), tank(o121), plane(o121), next(o121, o135), tank(o135), plane(o135), color(o135).

• The transformation (propositionalization) is not part of the graphical model framework;
• graphical models have only random variables, not objects;
• different data creates distinct, formally unrelated models.
12
Outline
• A goal: First-order Probabilistic Inference
  – An ultimate goal
  – Propositionalization
• Lifted Inference
  – Inversion Elimination
  – Counting Elimination
• DBLOG
  – BLOG background
  – DBLOG
  – Retro-instantiation
  – Improving BLOG inference
• Future Directions and Conclusion
13
First-Order Probabilistic Inference: many languages have been proposed
• Knowledge-Based Model Construction (Breese, 1992)
• Probabilistic Logic Programming (Ng & Subrahmanian, 1992)
• Probabilistic Logic Programming (Ngo & Haddawy, 1995)
• Probabilistic Relational Models (Friedman et al., 1999)
• Relational Dependency Networks (Neville & Jensen, 2004)
• Bayesian Logic (BLOG) (Milch & Russell, 2004)
• Bayesian Logic Programs (Kersting & De Raedt, 2001)
• DAPER (Heckerman, Meek & Koller, 2007)
• Markov Logic Networks (Richardson & Domingos, 2004)

"Smart" propositionalization is the main inference method.
14
Propositionalization: Bayesian Logic Programs

Prolog-like statements ('|' instead of ':-') such as

rescue(Battalion) | in(Soldier,Battalion), wounded(Soldier).
wounded(Soldier) | in(Soldier,Battalion), attacked(Battalion).

associated with CPTs and combination rules; a grounding sketch follows.
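To make the grounding step concrete, here is a minimal Python sketch — not the actual BLP implementation; the toy domain, the names, and the choice of noisy-OR as the combination rule are illustrative assumptions.

# Minimal sketch of propositionalization (not the BLP system):
# ground the rule  rescue(B) | in(S,B), wounded(S)  over a toy domain,
# collecting the ground parents of each ground head. A combination rule
# (here noisy-OR, one common choice) merges the per-parent CPTs.
battalions = ["bat13"]
soldiers = ["john", "peter"]

def ground_rescue_rule():
    """Ground rescue(B) | in(S,B), wounded(S) into {head: [parents]}."""
    parents = {}
    for b in battalions:
        head = f"rescue({b})"
        parents[head] = []
        for s in soldiers:
            parents[head] += [f"in({s},{b})", f"wounded({s})"]
    return parents

def noisy_or(active_parent_probs):
    """One possible combination rule: P(head) = 1 - prod(1 - p_i)."""
    result = 1.0
    for p in active_parent_probs:
        result *= 1.0 - p
    return 1.0 - result

print(ground_rescue_rule())
print(noisy_or([0.4, 0.4]))  # 0.64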
15
Propositionalization

rescue(Battalion) | in(Soldier,Battalion), wounded(Soldier). (plus CPT and combination rule)

BN grounded from the and-or tree: rescue(bat13) with parents in(john,bat13), wounded(john), in(peter,bat13), wounded(peter), ...
16
Propositionalization: Multi-Entity Bayesian Networks (Laskey, 2004)

First-order fragments:
in(Soldier,Battalion), wounded(Soldier) → rescue(Battalion)
in(Soldier,Battalion), attacked(Battalion) → wounded(Soldier)
17
Propositionalization: Multi-Entity Bayesian Networks (Laskey, 2004)

The first-order fragment
in(Soldier,Battalion), wounded(Soldier) → rescue(Battalion)
is instantiated into
in(john,bat13), wounded(john) → rescue(bat13)
in(peter,bat13), wounded(peter) → rescue(bat13)
...
18
Propositionalization: Multi-Entity Bayesian Networks (Laskey, 2004)

The instantiated fragments
in(john,bat13), wounded(john) → rescue(bat13)
in(peter,bat13), wounded(peter) → rescue(bat13)
...
are then merged into a single grounded network over rescue(bat13).
19
Outline
• A goal: First-order Probabilistic Inference
  – An ultimate goal
  – Propositionalization
• Lifted Inference
  – Inversion Elimination
  – Counting Elimination
• DBLOG
  – BLOG background
  – DBLOG
  – Retro-instantiation
  – Improving BLOG inference
• Future Directions and Conclusion
20
Lifted Inference

rescue(Battalion) ⇐ in(Soldier,Battalion) ∧ wounded(Soldier)
{ wounded(Soldier) ⇐ in(Soldier,Battalion) ∧ attacked(Battalion) }

Network over rescue(bat13), wounded(Soldier), attacked(bat13), in(Soldier, bat13), where Soldier is an unbound, general variable.

• Faster;
• more compact;
• more intuitive;
• higher level: more information/structure available for optimization.
21
Bayesian Networks (directed)

Node epidemic with children sick_john, sick_mary, sick_bob; CPTs P(epidemic), P(sick_john | epidemic), P(sick_mary | epidemic), P(sick_bob | epidemic).

P(sick_john, sick_mary, sick_bob, epidemic) = P(sick_john | epidemic) * P(sick_mary | epidemic) * P(sick_bob | epidemic) * P(epidemic)
22
Factor Networks (undirected)

Nodes epidemic_france, epidemic_belgium, epidemic_uk, epidemic_germany, connected by factors (or potential functions) φ:

P(epi_france, epi_belgium, epi_uk, epi_germany) ∝ φ(epi_france, epi_belgium) * φ(epi_france, epi_uk) * φ(epi_france, epi_germany) * φ(epi_belgium, epi_germany)
23
Bayesian Nets as Factor Networks

The same network, with each CPT read as a factor:

P(sick_john, sick_mary, sick_bob, epidemic) ∝ P(sick_john | epidemic) * P(sick_mary | epidemic) * P(sick_bob | epidemic) * P(epidemic)
24
Inference: Marginalization

P(sick_john) ∝ Σ_epidemic Σ_sick_mary Σ_sick_bob P(sick_john | epidemic) * P(sick_mary | epidemic) * P(sick_bob | epidemic) * P(epidemic)
25
Inference: Variable Elimination (VE)

P(sick_john) ∝ Σ_epidemic P(sick_john | epidemic) * P(epidemic)
  * Σ_sick_mary P(sick_mary | epidemic)
  * Σ_sick_bob P(sick_bob | epidemic)
26
Inference: Variable Elimination (VE)

P(sick_john) ∝ Σ_epidemic P(sick_john | epidemic) * P(epidemic)
  * Σ_sick_mary P(sick_mary | epidemic) * φ(epidemic)

where φ(epidemic) = Σ_sick_bob P(sick_bob | epidemic) is the new factor produced by eliminating sick_bob.
27
Inference: Variable Elimination (VE)

P(sick_john) ∝ Σ_epidemic P(sick_john | epidemic) * P(epidemic) * φ(epidemic)
  * Σ_sick_mary P(sick_mary | epidemic)
28
Inference: Variable Elimination (VE)

P(sick_john) ∝ Σ_epidemic P(sick_john | epidemic) * P(epidemic) * φ(epidemic) * φ(epidemic)

Eliminating sick_mary produced a second factor φ(epidemic); only epidemic remains to be summed out.
29
Inference: Variable Elimination (VE)

P(sick_john) ∝ φ(sick_john)

after eliminating epidemic.
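A minimal Python sketch of VE on this very example; the CPT numbers are made up for illustration.

# Minimal variable elimination sketch on the epidemic example.
# Variables are binary; a factor maps assignments of its scope to reals.
from itertools import product

def make_factor(scope, table):
    return {"scope": scope, "table": table}

def eliminate(var, factors, values=(False, True)):
    """Sum `var` out of the product of all factors that mention it."""
    touching = [f for f in factors if var in f["scope"]]
    rest = [f for f in factors if var not in f["scope"]]
    new_scope = sorted({v for f in touching for v in f["scope"]} - {var})
    new_table = {}
    for assignment in product(values, repeat=len(new_scope)):
        env = dict(zip(new_scope, assignment))
        total = 0.0
        for val in values:
            env[var] = val
            term = 1.0
            for f in touching:
                term *= f["table"][tuple(env[v] for v in f["scope"])]
            total += term
        new_table[assignment] = total
    return rest + [make_factor(new_scope, new_table)]

p_epi = make_factor(["epidemic"], {(False,): 0.9, (True,): 0.1})

def sick_cpt(name):
    """P(name | epidemic); the same made-up CPT for everyone."""
    return make_factor([name, "epidemic"],
                       {(False, False): 0.95, (True, False): 0.05,
                        (False, True): 0.3,  (True, True): 0.7})

factors = [p_epi, sick_cpt("sick_john"), sick_cpt("sick_mary"), sick_cpt("sick_bob")]
for var in ["sick_bob", "sick_mary", "epidemic"]:
    factors = eliminate(var, factors)
print(factors[0]["table"])  # proportional to P(sick_john): {(False,): 0.885, (True,): 0.115}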
30
A Factor Network

Random variables epidemic(measles), epidemic(flu), ..., sick(mary,measles), sick(mary,flu), hospital(mary), sick(bob,measles), sick(bob,flu), hospital(bob), ..., connected by factors; the same pattern repeats for every person and every disease.
31
First-Order Representation

sick(P,D), hospital(P), epidemic(D), with constraint P ≠ john.

• Atoms represent a set of random variables;
• logical variables parameterize random variables;
• parfactors stand for parameterized factors;
• constraints restrict logical variables (e.g., P ≠ john).
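As an illustration of what a parfactor bundles together, here is a minimal Python data-structure sketch; the names, the toy domain, and the potential are assumptions, not the actual implementation.

# Minimal sketch of a parfactor as a data structure: logical variables
# with domains, inequality constraints, parameterized atoms, and one
# potential function shared by all groundings.
from dataclasses import dataclass
from itertools import product
from typing import Callable

@dataclass
class Parfactor:
    logical_vars: dict      # e.g. {"P": people, "D": diseases}
    constraints: list       # e.g. [("P", "!=", "john")]
    atoms: list             # e.g. ["sick({P},{D})", "epidemic({D})"]
    potential: Callable     # shared by every grounding

    def groundings(self):
        """Yield the ground factors this parfactor stands for."""
        names = list(self.logical_vars)
        for values in product(*(self.logical_vars[n] for n in names)):
            env = dict(zip(names, values))
            if any(env[a] == b for a, op, b in self.constraints if op == "!="):
                continue  # violates an inequality constraint
            yield [atom.format(**env) for atom in self.atoms]

pf = Parfactor(
    {"P": ["john", "mary"], "D": ["flu", "measles"]},
    [("P", "!=", "john")],
    ["sick({P},{D})", "epidemic({D})"],
    potential=lambda sick, epi: 0.9 if sick == epi else 0.2,  # made up
)
for ground in pf.groundings():
    print(ground)  # only the P = mary instances, due to P != john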
32
Semantics

A parfactor stands for the set of all its groundings in the factor network: for example, φ(sick(mary,measles), epidemic(measles)) is the instance of the parfactor on sick(P,D) and epidemic(D) for P = mary, D = measles. The joint distribution is proportional to the product of all ground factors.
33
Outline
• A goal: First-order Probabilistic Inference
  – An ultimate goal
  – Propositionalization
• Lifted Inference
  – Inversion Elimination
  – Counting Elimination
• DBLOG
  – BLOG background
  – DBLOG
  – Retro-instantiation
  – Improving BLOG inference
• Future Directions and Conclusion
34
Inversion Elimination (IE)

Parfactor φ(epidemic(D), sick(P, D)) with constraint D ≠ measles. IE eliminates sick(P, D) directly at the first-order level:

Σ_sick(P,D) φ(epidemic(D), sick(P,D))

yielding a parfactor on epidemic(D), D ≠ measles. This commutes with the ground route: Grounding (to factors on epidemic(flu) and sick(john, flu), epidemic(rubella) and sick(mary, rubella), ...), then VE, then Abstraction back to the same first-order result.
35
Inversion Elimination (IE): proposed by Poole (2003); it lacked formalization and did not identify the cases in which it is incorrect. We formalized it and identified correctness conditions (with Dan Roth and Eyal Amir); the identity behind it is checked numerically below.
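The algebraic step behind IE can be verified on a toy case. In this Python sketch (potential values made up), each eliminated variable sick(p,d) occurs in exactly one ground instance of the parfactor, so the big sum of products factorizes into a product of small sums.

# Inversion elimination's core identity, checked numerically:
#   Σ_{all sick(p,d)} Π_{p,d} φ(epi(d), sick(p,d))
#     = Π_{p,d} Σ_{sick(p,d)} φ(epi(d), sick(p,d))
from itertools import product
from math import prod

people, diseases = ["john", "mary"], ["flu", "rubella"]

def phi(epi, sick):  # made-up potential
    return 0.9 if sick == epi else 0.2

epi = {"flu": True, "rubella": False}  # some fixed assignment to epidemic(D)
pairs = [(p, d) for p in people for d in diseases]

# Brute force: sum over all joint assignments of every sick(p,d).
brute = sum(
    prod(phi(epi[d], s) for (p, d), s in zip(pairs, assign))
    for assign in product([False, True], repeat=len(pairs))
)

# Inversion: eliminate each sick(p,d) independently, then multiply.
inverted = prod(
    sum(phi(epi[d], s) for s in [False, True])
    for (p, d) in pairs
)
print(brute, inverted)  # equal: 1.1**4 = 1.4641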
36
Inversion Elimination: limitations

IE requires the eliminated RVs to occur in separate instances of the parfactor. For the parfactor on sick(mary, D) and epidemic(D), D ≠ measles, each ground instance — φ(sick(mary, flu), epidemic(flu)), φ(sick(mary, rubella), epidemic(rubella)), ... — contains its own eliminated variable, so inversion elimination is correct.
37
Inversion Elimination: limitations

For the parfactor on epidemic(D1), epidemic(D2), month, with D1 ≠ D2, the groundings — epidemic(measles), epidemic(flu), epidemic(rubella), ..., all sharing month — pair every two epidemic variables, so each eliminated RV occurs in many instances and IE is not correct.
38
Outline
• A goal: First-order Probabilistic Inference
  – An ultimate goal
  – Propositionalization
• Lifted Inference
  – Inversion Elimination
  – Counting Elimination
• DBLOG
  – BLOG background
  – DBLOG
  – Retro-instantiation
  – Improving BLOG inference
• Future Directions and Conclusion
39
Counting Elimination
• We need to consider joint assignments;
• there is an exponential number of those;
• but the potential actually depends only on the histogram of values in the assignment: 00101 counts the same as 11000;
• so a polynomial number of assignments suffices (sketch below).

Example: the parfactor on epidemic(D1), epidemic(D2), month, D1 ≠ D2, with groundings epidemic(measles), epidemic(flu), epidemic(rubella), ..., all sharing month.
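A minimal Python sketch of the counting idea on a parfactor like φ(epidemic(D1), epidemic(D2)), D1 ≠ D2, with made-up potential values: the product over ground instances depends only on the histogram of the joint assignment, so one term per histogram, weighted by a binomial coefficient, replaces the 2^n-term sum.

# Counting elimination's core idea (potential values made up).
from itertools import combinations, product
from math import comb, prod

n = 10  # number of diseases
PHI = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}

def joint_value(assign):
    """Product of phi over all unordered pairs of distinct diseases."""
    return prod(PHI[(assign[i], assign[j])] for i, j in combinations(range(n), 2))

# Brute force: 2^n joint assignments.
brute = sum(joint_value(a) for a in product([0, 1], repeat=n))

# Counting: one term per histogram; the value is determined by counts alone.
counted = sum(
    comb(n, k)                       # assignments with k trues
    * PHI[(1, 1)] ** comb(k, 2)      # true-true pairs
    * PHI[(0, 0)] ** comb(n - k, 2)  # false-false pairs
    * PHI[(0, 1)] ** (k * (n - k))   # mixed pairs
    for k in range(n + 1)
)
print(brute, counted)  # identical (up to float rounding)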
40
A Simple Experiment
41
Outline
• A goal: First-order Probabilistic Inference
  – An ultimate goal
  – Propositionalization
• Lifted Inference
  – Inversion Elimination
  – Counting Elimination
• DBLOG
  – BLOG background
  – DBLOG
  – Retro-instantiation
  – Improving BLOG inference
• Future Directions and Conclusion
43
BLOG (Bayesian LOGic): Milch & Russell (2004); a probabilistic logic language; current inference is propositional sampling. Special characteristics: open universe; expressive language.
44
BLOG (Bayesian LOGic): open universe, expressive language

type Battalion;
#Battalion ~ Uniform(1,10);

random Boolean Large(Battalion bat) ~ Bernoulli(0.6);

random NaturalNum Region(Battalion bat) ~ TabularCPD[[0.3, 0.4, 0.2, 0.1]]();

type Soldier;
#Soldier(BattalionOf = Battalion bat)
  if Large(bat) then ~ Poisson(1500);
  else if Region(bat) = 2 then = 300;
  else ~ Poisson(500);

query Average( {#Soldier(b) for Battalion b} );
45
Inference in BLOG

random Boolean Attacked(Battalion bat) ~ Bernoulli(0.7);

random Boolean Wounded(Soldier sol)
  if Attacked(BattalionOf(sol)) then
    if Experienced(sol) then ~ Bernoulli(0.1);
    else ~ Bernoulli(0.4);
  else = False;

random Boolean Experienced(Soldier sol) ~ Bernoulli(0.1);

random Boolean Rescue(Battalion bat)
  = exists Soldier sol BattalionOf(sol) = bat & Wounded(sol);

guaranteed Battalion bat13;
query Rescue(bat13);
46
Inference in BLOG - Sampling

To sample Rescue(bat13), BLOG instantiates #Soldier(bat13) (open universe), Attacked(bat13), Wounded(soldier1), ..., Wounded(soldier73), ... In this sample the Wounded variables all come out false, so Rescue(bat13) is sampled false.
47
Inference in BLOG - Sampling

Another sample: #Soldier(bat13) (open universe) yields a different set of soldiers; here Attacked(bat13) comes out true, Experienced(soldier1) is instantiated along the way, Wounded(soldier1) and Wounded(soldier49) come out true, and Rescue(bat13) is sampled true.
48
Outline
• A goal: First-order Probabilistic Inference
  – An ultimate goal
  – Propositionalization
• Lifted Inference
  – Inversion Elimination
  – Counting Elimination
• DBLOG
  – BLOG background
  – DBLOG
  – Retro-instantiation
  – Improving BLOG inference
• Future Directions and Conclusion
49
Temporal models in BLOG

The language is expressive enough to directly write temporal models:

type Aircraft;
#Aircraft ~ Poisson[3];

random Real Position(Aircraft a, NaturalNum t)
  if t = 0 then ~ UniformReal[-10, 10]()
  else ~ Gaussian(Position(a, Pred(t)), 2);

type Blip;
// num of blips from aircraft a is 0 or 1
#Blip(Source = Aircraft a, Time = NaturalNum t) ~ TabularCPD[[0.2, 0.8]]();

// num false alarms has Poisson distribution.
#Blip(Time = NaturalNum t) ~ Poisson[2];

random Real ApparentPos(Blip b, NaturalNum t) ~ Gaussian(Position(Source(b), Pred(t)), 2);
50
Temporal models in BLOG

The generic inference algorithm does not exploit temporal structure:
• the Markov property: the state depends on the previous state only;
• evidence and queries come in successive time steps.
Hence Dynamic BLOG (DBLOG), analogous to Dynamic Bayesian Networks (DBNs).
51
DBLOG Syntax

The only change in syntax is a special Timestep type, with the same semantics as NaturalNum:

type Aircraft;
#Aircraft ~ Poisson[3];

random Real Position(Aircraft a, Timestep t)
  if t = @0 then ~ UniformReal[-10, 10]()
  else ~ Gaussian(Position(a, Prev(t)), 2);

type Blip;
// num of blips from aircraft a is 0 or 1
#Blip(Source = Aircraft a, Time = Timestep t) ~ TabularCPD[[0.2, 0.8]]();

// num false alarms has Poisson distribution.
#Blip(Time = Timestep t) ~ Poisson[2];

random Real ApparentPos(Blip b, Timestep t) ~ Gaussian(Position(Source(b), Prev(t)), 2);
52
DBLOG Particle Filtering

Similar idea to DBN particle filtering: state samples for t-1 are weighed by the evidence at t (likelihood weighting), then resampled to produce state samples for t (a sketch follows).
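A minimal bootstrap particle filter sketch in Python for a single, known aircraft, using the Gaussian transition and observation variances from the model above; the open-universe machinery (unknown #Aircraft, blip association) is deliberately omitted, and the observation is simplified to reflect the current position.

# Minimal bootstrap particle filter for one aircraft's 1D position:
# Position(t) ~ Gaussian(Position(t-1), 2), observation ~ Gaussian(Position(t), 2).
import math, random

N = 1000
particles = [random.uniform(-10, 10) for _ in range(N)]  # Position at t = 0

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def step(particles, observation, var=2.0):
    # 1. Propagate each particle through the transition model.
    moved = [random.gauss(p, math.sqrt(var)) for p in particles]
    # 2. Weigh by evidence likelihood (likelihood weighting).
    weights = [gaussian_pdf(observation, p, var) for p in moved]
    # 3. Resample in proportion to the weights.
    return random.choices(moved, weights=weights, k=len(moved))

for obs in [3.2, 3.5, 2.9]:        # made-up ApparentPos observations
    particles = step(particles, obs)
print(sum(particles) / N)           # posterior mean estimate, near 3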
53
Weighting by evidence in DBNs

[Diagram] The state at t-1 transitions to the state at t, which emits the evidence at t; the sample weight is the likelihood of that evidence.
54
Weighting evidence in DBLOG

obs {Blip b : Source(b) = a1} = {B1};
obs ApparentPos(B1, @2) = 3.2;

The state at t-1 contains Position(a1,@1), Position(a2,@1), and #Aircraft. The evidence triggers lazy instantiation of #Blip(a1,@2), B1, Position(a1,@2), and ApparentPos(B1,@2).
55
Outline
• A goal: First-order Probabilistic Inference
  – An ultimate goal
  – Propositionalization
• Lifted Inference
  – Inversion Elimination
  – Counting Elimination
• DBLOG
  – BLOG background
  – DBLOG
  – Retro-instantiation
  – Improving BLOG inference
• Future Directions and Conclusion
56
Retro-instantiation in DBLOG

obs {Blip b : Source(b) = a2} = {B8};
obs ApparentPos(B8, @9) = 2.1;

The state contains Position(a1,@1), Position(a2,@1), and #Aircraft, and a1 has been tracked (#Blip(a1,@2), B1, Position(a1,@2), ApparentPos(B1,@2)). Evidence about a2 arriving only at @9 instantiates #Blip(a2,@9), B8, ApparentPos(B8,@9), and Position(a2,@9), which requires retro-instantiating the whole chain Position(a2,@2), ..., Position(a2,@8) back into the past.
57
Coping with retro-instantiation
• Analyse possible queries and evidence (provided by the user) and pre-instantiate random variables that will be needed later;
• use mixing time to decide when to stop sampling back.
58
Outline
• A goal: First-order Probabilistic Inference
  – An ultimate goal
  – Propositionalization
• Lifted Inference
  – Inversion Elimination
  – Counting Elimination
• DBLOG
  – BLOG background
  – DBLOG
  – Retro-instantiation
  – Improving BLOG inference
• Future Directions and Conclusion
59
Improving BLOG inference: rejection sampling

Each sample's values for the evidence variables are compared to the observed evidence (= ?); we count only the samples that match it.
60
Improving BLOG inference: likelihood weighting

Evidence variables are not sampled; we count samples using the evidence likelihood as weight (a sketch contrasting the two follows).
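To make the contrast concrete, here is a minimal Python sketch — not BLOG itself — of a two-variable made-up model where both estimators answer the same query.

# Rejection sampling vs. likelihood weighting on a made-up model:
# Attacked ~ Bernoulli(0.7), Wounded | Attacked ~ Bernoulli(0.4 or 0.1).
# Query: P(Attacked | Wounded = True), which is 0.28/0.31 ≈ 0.90.
import random

P_ATTACKED = 0.7

def p_wounded(attacked):
    return 0.4 if attacked else 0.1

def rejection(n):
    kept = hits = 0
    for _ in range(n):
        attacked = random.random() < P_ATTACKED
        wounded = random.random() < p_wounded(attacked)
        if wounded:              # keep only samples matching the evidence
            kept += 1
            hits += attacked
    return hits / kept

def likelihood_weighting(n):
    num = den = 0.0
    for _ in range(n):
        attacked = random.random() < P_ATTACKED
        w = p_wounded(attacked)  # evidence likelihood as weight
        num += w * attacked
        den += w
    return num / den

print(rejection(100_000), likelihood_weighting(100_000))  # both near 0.90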
61
Improving BLOG inference

Evidence: {Happy(p) : Person p} = {true, true, false}
Sample: Happy(john) = true, Happy(mary) = false, Happy(peter) = false

Here the evidence is a deterministic function of the sampled variables, so its likelihood is always 0 or 1, and likelihood weighting is no better than rejection sampling.
62
Improving BLOG inference

Evidence: {Salary(p) : Person p} = {90,000, 75,000, 30,000}
Sample: Salary(john) = 100,000, Salary(mary) = 80,000, Salary(peter) = 500,000

The more unlikely the evidence, the worse it gets.
63
Improving BLOG inference

Evidence: {ApparentPos(b) : Blip b} = {1.2, 0.5, 5.1}
Sample: ApparentPos(b1) = 2.1, ApparentPos(b2) = 3.3, ApparentPos(b3) = 3.5

This doesn't work at all for continuous evidence: the probability of sampling the exact observed values is zero.
64
Structure-dependent automatic importance sampling

Evidence: {ApparentPos(b) : Blip b} = {1.2, 0.5, 5.1}

Given sampled positions Position(Source(b1)) = 1.5, Position(Source(b2)) = 3.9, Position(Source(b3)) = 4.0, the observed values are assigned to the blips (ApparentPos(b1) = 1.2, ApparentPos(b2) = 5.1, ApparentPos(b3) = 0.5) rather than sampled. The high-level language allows the algorithm to automatically apply better sampling schemes (sketch below).
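The slide does not spell out the scheme, so this Python sketch is only in its spirit: evidence variables are clamped to the observed values rather than sampled, and the sample is weighted by the evidence density given the sampled positions; all distributions and numbers are illustrative assumptions.

# Clamp-and-weight importance sampling for continuous evidence.
import math, random

def gaussian_pdf(x, mean, var=2.0):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

observations = [1.2, 5.1, 0.5]   # observed ApparentPos, one per blip

def weighted_sample():
    # Sample the latent source positions (made-up prior).
    positions = [random.gauss(0.0, 3.0) for _ in observations]
    # Clamp the evidence variables to the observed values and weight by
    # the product of densities, pairing each observation with its blip.
    weight = 1.0
    for obs, pos in zip(observations, positions):
        weight *= gaussian_pdf(obs, pos)
    return weight

weights = [weighted_sample() for _ in range(10_000)]
print(sum(weights) / len(weights))   # estimate of the evidence density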
65
Outline
• A goal: First-order Probabilistic Inference
  – An ultimate goal
  – Propositionalization
• Lifted Inference
  – Inversion Elimination
  – Counting Elimination
• DBLOG
  – BLOG background
  – DBLOG
  – Retro-instantiation
  – Improving BLOG inference
• Future Directions and Conclusion
66
For the future
• Lifted inference works on symmetric, indistinguishable objects only; adapt approximations for dealing with merely similar objects;
• parameterized queries: P(sick(X, measles)) = ?
• function symbols: φ(diabetes(X), diabetes(father(X)));
• equality (paramodulation);
• learning.
67
Conclusion
• Expressive, first-order probabilistic representations in inference too!
• Lifted inference is equivalent to grounded inference: much faster, yet yielding the same exact answer;
• DBLOG brings techniques for reasoning about temporal processes to first-order probabilistic reasoning.
68