CSE473: Artificial Intelligence
Bayes' Nets: Inference
Luke Zettlemoyer --- University of Washington
[These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]
Bayes' Net Representation
§ A directed, acyclic graph, one node per random variable
§ A conditional probability table (CPT) for each node
  § A collection of distributions over X, one for each combination of parents' values
§ Bayes' nets implicitly encode joint distributions
  § As a product of local conditional distributions
  § To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:

    P(x1, x2, ..., xn) = ∏i P(xi | parents(Xi))
Example: Alarm Network

Graph: Burglary → Alarm ← Earthquake;  Alarm → John calls;  Alarm → Mary calls

B    P(B)
+b   0.001
-b   0.999

E    P(E)
+e   0.002
-e   0.998

B    E    A    P(A|B,E)
+b   +e   +a   0.95
+b   +e   -a   0.05
+b   -e   +a   0.94
+b   -e   -a   0.06
-b   +e   +a   0.29
-b   +e   -a   0.71
-b   -e   +a   0.001
-b   -e   -a   0.999

A    J    P(J|A)
+a   +j   0.9
+a   -j   0.1
-a   +j   0.05
-a   -j   0.95

A    M    P(M|A)
+a   +m   0.7
+a   -m   0.3
-a   +m   0.01
-a   -m   0.99

[Demo: BN Applet]
Bayes' Nets
§ Representation
§ Conditional Independences
§ Probabilistic Inference
  § Enumeration (exact, exponential complexity)
  § Variable elimination (exact, worst-case exponential complexity, often better)
  § Inference is NP-complete
  § Sampling (approximate)
§ Learning Bayes' Nets from Data
§ Examples:
  § Posterior probability
  § Most likely explanation:
Inference
§ Inference: calculating some useful quantity from a joint probability distribution
Inference by Enumeration
§ General case:
  § Evidence variables:  E1 ... Ek = e1 ... ek
  § Query* variable:     Q
  § Hidden variables:    H1 ... Hr          (together, all variables)
  *Works fine with multiple query variables, too
§ We want: P(Q | e1 ... ek)
§ Step 1: Select the entries consistent with the evidence
§ Step 2: Sum out H to get joint of Query and evidence
§ Step 3: Normalize:

  P(Q | e1 ... ek) = (1/Z) Σ_{h1...hr} P(Q, h1 ... hr, e1 ... ek),   where Z = Σ_q P(q, e1 ... ek)
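The three steps above can be sketched directly as code over an explicit joint table. This is a minimal illustration, not part of the original slides; the joint P(T, W) values are the weather table used in the Factor Zoo slides below, and `enumerate_query` is a name chosen here for illustration.

```python
# The joint P(T, W) table, as a dict from assignment tuples (t, w) to probabilities.
joint = {
    ('hot', 'sun'): 0.4, ('hot', 'rain'): 0.1,
    ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3,
}

def enumerate_query(joint, q_index, evidence):
    """P(Q | evidence): select, sum out hidden variables, normalize."""
    # Step 1: select the entries consistent with the evidence
    selected = {a: p for a, p in joint.items()
                if all(a[i] == v for i, v in evidence.items())}
    # Step 2: sum out everything except the query variable
    marginal = {}
    for a, p in selected.items():
        marginal[a[q_index]] = marginal.get(a[q_index], 0.0) + p
    # Step 3: normalize by Z, the total remaining mass
    z = sum(marginal.values())
    return {v: p / z for v, p in marginal.items()}

print(enumerate_query(joint, 0, {1: 'sun'}))   # P(T | W = sun): hot 2/3, cold 1/3
```

The cost is the problem: the loop touches every entry of the joint, which is exponential in the number of variables.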
Inference by Enumeration in Bayes' Net
§ Given unlimited time, inference in BNs is easy
§ Reminder of inference by enumeration by example (alarm network):

  P(B | +j, +m) ∝ P(B, +j, +m)
    = Σ_{e,a} P(B, e, a, +j, +m)
    = Σ_{e,a} P(B) P(e) P(a|B,e) P(+j|a) P(+m|a)
    = P(B)P(+e)P(+a|B,+e)P(+j|+a)P(+m|+a) + P(B)P(+e)P(-a|B,+e)P(+j|-a)P(+m|-a)
    + P(B)P(-e)P(+a|B,-e)P(+j|+a)P(+m|+a) + P(B)P(-e)P(-a|B,-e)P(+j|-a)P(+m|-a)
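The sum above can be evaluated numerically with the CPTs from the alarm network slide. This is a sketch added for illustration (the variable names and dict layout are choices made here, not from the slides).

```python
# CPTs from the alarm network slide; only P(+a|b,e), P(+j|a), P(+m|a) are
# stored, with the complements computed as 1 - p.
P_B = {'+b': 0.001, '-b': 0.999}
P_E = {'+e': 0.002, '-e': 0.998}
P_A = {('+b', '+e'): 0.95, ('+b', '-e'): 0.94,
       ('-b', '+e'): 0.29, ('-b', '-e'): 0.001}
P_J = {'+a': 0.9, '-a': 0.05}
P_M = {'+a': 0.7, '-a': 0.01}

unnorm = {}
for b in P_B:
    total = 0.0
    for e in P_E:                       # sum over e
        for a in ('+a', '-a'):          # sum over a
            pa = P_A[(b, e)] if a == '+a' else 1 - P_A[(b, e)]
            total += P_B[b] * P_E[e] * pa * P_J[a] * P_M[a]
    unnorm[b] = total

z = sum(unnorm.values())
posterior = {b: p / z for b, p in unnorm.items()}
print({b: round(p, 3) for b, p in posterior.items()})   # P(+b | +j, +m) is about 0.284
```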
Inference by Enumeration?
P(Antilock | observed variables) = ?
Inference by Enumeration vs. Variable Elimination
§ Why is inference by enumeration so slow?
  § You join up the whole joint distribution before you sum out the hidden variables
§ Idea: interleave joining and marginalizing!
  § Called "Variable Elimination"
  § Still NP-hard, but usually much faster than inference by enumeration
§ First we'll need some new notation: factors
Factor Zoo

Factor Zoo I
§ Joint distribution: P(X,Y)
  § Entries P(x,y) for all x, y
  § Sums to 1
§ Selected joint: P(x,Y)
  § A slice of the joint distribution
  § Entries P(x,y) for fixed x, all y
  § Sums to P(x)
§ Number of capitals = dimensionality of the table

P(T,W)
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

P(cold,W)
T     W     P
cold  sun   0.2
cold  rain  0.3
Factor Zoo II
§ Single conditional: P(Y|x)
  § Entries P(y|x) for fixed x, all y
  § Sums to 1
§ Family of conditionals: P(X|Y)
  § Multiple conditionals
  § Entries P(x|y) for all x, y
  § Sums to |Y|

P(W|T)
T     W     P
hot   sun   0.8
hot   rain  0.2
cold  sun   0.4
cold  rain  0.6

P(W|cold)
T     W     P
cold  sun   0.4
cold  rain  0.6
Factor Zoo III
§ Specified family: P(y|X)
  § Entries P(y|x) for fixed y, but for all x
  § Sums to ... who knows!

P(rain|T)
T     W     P
hot   rain  0.2
cold  rain  0.6
Factor Zoo Summary
§ In general, when we write P(Y1 ... YN | X1 ... XM)
§ It is a "factor," a multi-dimensional array
§ Its values are P(y1 ... yN | x1 ... xM)
§ Any assigned (= lower-case) X or Y is a dimension missing (selected) from the array
Example: Traffic Domain
§ Random Variables
  § R: Raining
  § T: Traffic
  § L: Late for class!     (R → T → L)

R    P(R)         R    T    P(T|R)        T    L    P(L|T)
+r   0.1          +r   +t   0.8           +t   +l   0.3
-r   0.9          +r   -t   0.2           +t   -l   0.7
                  -r   +t   0.1           -t   +l   0.1
                  -r   -t   0.9           -t   -l   0.9

P(L) = ?
     = Σ_{r,t} P(r, t, L)
     = Σ_{r,t} P(r) P(t|r) P(L|t)
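Evaluating the sum above with the traffic CPTs gives P(L) directly; this short check is added here for illustration and matches the result derived later in the elimination slides.

```python
# Evaluate P(L) = sum_{r,t} P(r) P(t|r) P(L|t) with the traffic CPTs.
P_R = {'+r': 0.1, '-r': 0.9}
P_T = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
       ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}
P_L = {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
       ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}

P_of_L = {'+l': 0.0, '-l': 0.0}
for r in P_R:                      # sum over r
    for t in ('+t', '-t'):         # sum over t
        for l in P_of_L:
            P_of_L[l] += P_R[r] * P_T[(r, t)] * P_L[(t, l)]

print(P_of_L)   # +l -> 0.134, -l -> 0.866
```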
Variable Elimination (VE)

Inference by Enumeration: Procedural Outline
§ Track objects called factors
  § Initial factors are local CPTs (one per node):

    P(R)           P(T|R)             P(L|T)
    +r  0.1        +r  +t  0.8        +t  +l  0.3
    -r  0.9        +r  -t  0.2        +t  -l  0.7
                   -r  +t  0.1        -t  +l  0.1
                   -r  -t  0.9        -t  -l  0.9

§ Any known values are selected
  § E.g. if we know L = +l, the initial factors are:

    P(R)           P(T|R)             P(+l|T)
    +r  0.1        +r  +t  0.8        +t  +l  0.3
    -r  0.9        +r  -t  0.2        -t  +l  0.1
                   -r  +t  0.1
                   -r  -t  0.9

§ Procedure: Join all factors, then eliminate all hidden variables
Operation 1: Join Factors
§ First basic operation: joining factors
§ Combining factors:
  § Just like a database join
  § Get all factors over the joining variable
  § Build a new factor over the union of the variables involved
§ Example: Join on R
§ Computation for each entry: pointwise products

P(R)          ×   P(T|R)            →   P(R,T)
+r  0.1           +r  +t  0.8           +r  +t  0.08
-r  0.9           +r  -t  0.2           +r  -t  0.02
                  -r  +t  0.1           -r  +t  0.09
                  -r  -t  0.9           -r  -t  0.81
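The join can be sketched in a few lines if a factor is represented as a list of (assignment-dict, value) rows; that representation is a choice made here for illustration, not from the slides.

```python
def join(f1, f2):
    """Database-style join: pointwise products over the union of variables."""
    result = []
    for a1, p1 in f1:
        for a2, p2 in f2:
            # Rows combine only if they agree on the shared variables
            if all(a1[v] == a2[v] for v in a1.keys() & a2.keys()):
                result.append(({**a1, **a2}, p1 * p2))
    return result

P_R = [({'R': '+r'}, 0.1), ({'R': '-r'}, 0.9)]
P_T_given_R = [({'R': '+r', 'T': '+t'}, 0.8), ({'R': '+r', 'T': '-t'}, 0.2),
               ({'R': '-r', 'T': '+t'}, 0.1), ({'R': '-r', 'T': '-t'}, 0.9)]

P_RT = join(P_R, P_T_given_R)
for a, p in P_RT:
    print(a, round(p, 2))   # e.g. R=+r, T=+t -> 0.08
```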
Example: Multiple Joins
§ Join R, then join T:

P(R)            P(T|R)              P(L|T)
+r  0.1         +r  +t  0.8         +t  +l  0.3
-r  0.9         +r  -t  0.2         +t  -l  0.7
                -r  +t  0.1         -t  +l  0.1
                -r  -t  0.9         -t  -l  0.9

Join R →  P(R,T)              P(L|T)
          +r  +t  0.08        +t  +l  0.3
          +r  -t  0.02        +t  -l  0.7
          -r  +t  0.09        -t  +l  0.1
          -r  -t  0.81        -t  -l  0.9

Join T →  P(R,T,L)
          +r  +t  +l  0.024
          +r  +t  -l  0.056
          +r  -t  +l  0.002
          +r  -t  -l  0.018
          -r  +t  +l  0.027
          -r  +t  -l  0.063
          -r  -t  +l  0.081
          -r  -t  -l  0.729
Operation 2: Eliminate
§ Second basic operation: marginalization
§ Take a factor and sum out a variable
  § Shrinks a factor to a smaller one
  § A projection operation
§ Example: sum out R from P(R,T)

P(R,T)             →   P(T)
+r  +t  0.08           +t  0.17
+r  -t  0.02           -t  0.83
-r  +t  0.09
-r  -t  0.81
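Elimination is the companion operation to the join sketch above, using the same (assignment-dict, value) row representation chosen for illustration.

```python
def eliminate(factor, var):
    """Sum out `var`: project each row onto the remaining variables and add."""
    sums = {}
    for assignment, p in factor:
        rest = tuple(sorted((v, x) for v, x in assignment.items() if v != var))
        sums[rest] = sums.get(rest, 0.0) + p
    return [(dict(rest), p) for rest, p in sums.items()]

P_RT = [({'R': '+r', 'T': '+t'}, 0.08), ({'R': '+r', 'T': '-t'}, 0.02),
        ({'R': '-r', 'T': '+t'}, 0.09), ({'R': '-r', 'T': '-t'}, 0.81)]

P_T = eliminate(P_RT, 'R')
for a, p in P_T:
    print(a, round(p, 2))   # T=+t -> 0.17, T=-t -> 0.83
```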
Multiple Elimination
§ Sum out R, then sum out T:

P(R,T,L)                Sum out R →  P(T,L)            Sum out T →  P(L)
+r  +t  +l  0.024                    +t  +l  0.051                  +l  0.134
+r  +t  -l  0.056                    +t  -l  0.119                  -l  0.866
+r  -t  +l  0.002                    -t  +l  0.083
+r  -t  -l  0.018                    -t  -l  0.747
-r  +t  +l  0.027
-r  +t  -l  0.063
-r  -t  +l  0.081
-r  -t  -l  0.729
Thus Far: Multiple Join, Multiple Eliminate (= Inference by Enumeration)
Marginalizing Early (= Variable Elimination)

Traffic Domain  (R → T → L)
§ Inference by Enumeration:

  P(L) = Σ_t Σ_r P(L|t) P(r) P(t|r)
  (Join on r, join on t, eliminate r, eliminate t)

§ Variable Elimination:

  P(L) = Σ_t P(L|t) Σ_r P(r) P(t|r)
  (Join on r, eliminate r, join on t, eliminate t)
Marginalizing Early! (aka VE)

P(R)            P(T|R)              P(L|T)
+r  0.1         +r  +t  0.8         +t  +l  0.3
-r  0.9         +r  -t  0.2         +t  -l  0.7
                -r  +t  0.1         -t  +l  0.1
                -r  -t  0.9         -t  -l  0.9

Join R →   P(R,T)              P(L|T)
           +r  +t  0.08        +t  +l  0.3
           +r  -t  0.02        +t  -l  0.7
           -r  +t  0.09        -t  +l  0.1
           -r  -t  0.81        -t  -l  0.9

Sum out R →  P(T)          P(L|T)
             +t  0.17      +t  +l  0.3
             -t  0.83      +t  -l  0.7
                           -t  +l  0.1
                           -t  -l  0.9

Join T →  P(T,L)              Sum out T →  P(L)
          +t  +l  0.051                    +l  0.134
          +t  -l  0.119                    -l  0.866
          -t  +l  0.083
          -t  -l  0.747
Evidence
§ If evidence, start with factors that select that evidence
  § No evidence uses these initial factors:

    P(R)           P(T|R)             P(L|T)
    +r  0.1        +r  +t  0.8        +t  +l  0.3
    -r  0.9        +r  -t  0.2        +t  -l  0.7
                   -r  +t  0.1        -t  +l  0.1
                   -r  -t  0.9        -t  -l  0.9

  § Computing P(L | +r), the initial factors become:

    P(+r)          P(T|+r)            P(L|T)
    +r  0.1        +r  +t  0.8        +t  +l  0.3
                   +r  -t  0.2        +t  -l  0.7
                                      -t  +l  0.1
                                      -t  -l  0.9

§ We eliminate all vars other than query + evidence

Evidence II
§ Result will be a selected joint of query and evidence
  § E.g. for P(L | +r), we would end up with:

    P(+r, L)             Normalize →  P(L | +r)
    +r  +l  0.026                     +l  0.26
    +r  -l  0.074                     -l  0.74

§ To get our answer, just normalize this!
§ That's it!
General Variable Elimination
§ Query: P(Q | E1 = e1, ..., Ek = ek)
§ Start with initial factors:
  § Local CPTs (but instantiated by evidence)
§ While there are still hidden variables (not Q or evidence):
  § Pick a hidden variable H
  § Join all factors mentioning H
  § Eliminate (sum out) H
§ Join all remaining factors and normalize
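The loop above can be sketched end-to-end for the traffic query P(L | +r). The factor representation (lists of assignment-dict rows) and the function names are choices made here for illustration; the sketch assumes every hidden variable is mentioned by at least one factor.

```python
from functools import reduce

def join(f1, f2):
    """Pointwise product of two factors over the union of their variables."""
    out = []
    for a1, p1 in f1:
        for a2, p2 in f2:
            if all(a1[v] == a2[v] for v in a1.keys() & a2.keys()):
                out.append(({**a1, **a2}, p1 * p2))
    return out

def eliminate(f, var):
    """Sum out `var` from a factor."""
    sums = {}
    for a, p in f:
        rest = tuple(sorted((v, x) for v, x in a.items() if v != var))
        sums[rest] = sums.get(rest, 0.0) + p
    return [(dict(r), p) for r, p in sums.items()]

def variable_elimination(factors, query, evidence, hidden_order):
    # Instantiate evidence: keep only the rows consistent with it
    factors = [[(a, p) for a, p in f
                if all(a.get(v, x) == x for v, x in evidence.items())]
               for f in factors]
    # For each hidden variable: join all factors mentioning it, then sum it out
    for h in hidden_order:
        touching = [f for f in factors if any(h in a for a, _ in f)]
        others = [f for f in factors if not any(h in a for a, _ in f)]
        factors = others + [eliminate(reduce(join, touching), h)]
    # Join all remaining factors and normalize
    final = reduce(join, factors)
    z = sum(p for _, p in final)
    return {a[query]: p / z for a, p in final}

P_R = [({'R': '+r'}, 0.1), ({'R': '-r'}, 0.9)]
P_T_given_R = [({'R': '+r', 'T': '+t'}, 0.8), ({'R': '+r', 'T': '-t'}, 0.2),
               ({'R': '-r', 'T': '+t'}, 0.1), ({'R': '-r', 'T': '-t'}, 0.9)]
P_L_given_T = [({'T': '+t', 'L': '+l'}, 0.3), ({'T': '+t', 'L': '-l'}, 0.7),
               ({'T': '-t', 'L': '+l'}, 0.1), ({'T': '-t', 'L': '-l'}, 0.9)]

res = variable_elimination([P_R, P_T_given_R, P_L_given_T],
                           query='L', evidence={'R': '+r'}, hidden_order=['T'])
print(res)   # P(L | +r): +l -> 0.26, -l -> 0.74
```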
Example
§ Choose A

Example
§ Choose E
§ Finish with B
§ Normalize
Example 2: P(B|a)

Start / Select:
B    P(B)          B    A    P(A|B)
+b   0.1           +b   +a   0.8
¬b   0.9           +b   ¬a   0.2
                   ¬b   +a   0.1
                   ¬b   ¬a   0.9

Join on B:
A    B    P(a,B)
+a   +b   0.08
+a   ¬b   0.09

Normalize:
A    B    P(B|a)
+a   +b   8/17
+a   ¬b   9/17
Same Example in Equations
§ marginal can be obtained from joint by summing out
§ use Bayes' net joint distribution expression
§ use x*(y+z) = xy + xz
§ joining on a, and then summing out gives f1
§ use x*(y+z) = xy + xz
§ joining on e, and then summing out gives f2

All we are doing is exploiting uwy + uwz + uxy + uxz + vwy + vwz + vxy + vxz = (u+v)(w+x)(y+z) to improve computational efficiency!
Another Variable Elimination Example

Computational complexity critically depends on the largest factor being generated in this process. Size of factor = number of entries in table. In the example above (assuming binary variables) all factors generated are of size 2 --- as they all only have one variable (Z, Z, and X3 respectively).
Variable Elimination Ordering
§ For the query P(Xn | y1, ..., yn) work through the following two different orderings as done in the previous slide: Z, X1, ..., Xn-1 and X1, ..., Xn-1, Z. What is the size of the maximum factor generated for each of the orderings?
§ Answer: 2^(n+1) versus 2^2 (assuming binary variables)
§ In general: the ordering can greatly affect efficiency.
VE: Computational and Space Complexity
§ The computational and space complexity of variable elimination is determined by the largest factor
§ The elimination ordering can greatly affect the size of the largest factor.
  § E.g., previous slide's example: 2^(n+1) vs. 2^2
§ Does there always exist an ordering that only results in small factors?
  § No!
Worst Case Complexity?
§ CSP: a 3-SAT instance can be encoded as a Bayes' net
§ If we can answer whether P(z) is equal to zero or not, we have answered whether the 3-SAT problem has a solution.
§ Hence inference in Bayes' nets is NP-hard. No known efficient probabilistic inference in general.
Polytrees
§ A polytree is a directed graph with no undirected cycles
§ For poly-trees you can always find an ordering that is efficient
  § Try it!!
§ Cut-set conditioning for Bayes' net inference
  § Choose set of variables such that if removed only a polytree remains
  § Exercise: Think about how the specifics would work out!
Bayes' Nets
§ Representation
§ Conditional Independences
§ Probabilistic Inference
  § Enumeration (exact, exponential complexity)
  § Variable elimination (exact, worst-case exponential complexity, often better)
  § Inference is NP-complete
  § Sampling (approximate)
§ Learning Bayes' Nets from Data
Approximate Inference: Sampling

Sampling
§ Sampling is a lot like repeated simulation
  § Predicting the weather, basketball games, ...
§ Basic idea
  § Draw N samples from a sampling distribution S
  § Compute an approximate posterior probability
  § Show this converges to the true probability P
§ Why sample?
  § Learning: get samples from a distribution you don't know
  § Inference: getting a sample is faster than computing the right answer (e.g. with variable elimination)
Sampling
§ Sampling from given distribution
  § Step 1: Get sample u from uniform distribution over [0, 1)
    § E.g. random() in python
  § Step 2: Convert this sample u into an outcome for the given distribution by having each outcome associated with a sub-interval of [0, 1) with sub-interval size equal to probability of the outcome
§ Example

  C       P(C)
  red     0.6
  green   0.1
  blue    0.3

  § If random() returns u = 0.83, then our sample is C = blue
  § E.g, after sampling 8 times:
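The two steps above amount to walking the cumulative distribution until u falls inside an outcome's sub-interval; here is a minimal sketch (the function name `sample` is a choice made for illustration).

```python
import random

def sample(dist, u=None):
    """Map u in [0, 1) to an outcome via cumulative sub-intervals."""
    if u is None:
        u = random.random()        # Step 1: uniform sample from [0, 1)
    cumulative = 0.0
    for outcome, p in dist.items():
        cumulative += p            # Step 2: does u fall in this outcome's sub-interval?
        if u < cumulative:
            return outcome
    return outcome                 # guard for rounding when u is very close to 1.0

dist = {'red': 0.6, 'green': 0.1, 'blue': 0.3}
print(sample(dist, u=0.83))   # blue: 0.83 lies in the sub-interval [0.7, 1.0)
```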
Sampling in Bayes' Nets
§ Prior Sampling
§ Rejection Sampling
§ Likelihood Weighting
§ Gibbs Sampling
Prior Sampling

Network: Cloudy → Sprinkler;  Cloudy → Rain;  {Sprinkler, Rain} → WetGrass

C    P(C)          C    S    P(S|C)        C    R    P(R|C)
+c   0.5           +c   +s   0.1           +c   +r   0.8
-c   0.5           +c   -s   0.9           +c   -r   0.2
                   -c   +s   0.5           -c   +r   0.2
                   -c   -s   0.5           -c   -r   0.8

S    R    W    P(W|S,R)
+s   +r   +w   0.99
+s   +r   -w   0.01
+s   -r   +w   0.90
+s   -r   -w   0.10
-s   +r   +w   0.90
-s   +r   -w   0.10
-s   -r   +w   0.01
-s   -r   -w   0.99

Samples:
+c, -s, +r, +w
-c, +s, -r, +w
...
Prior Sampling
§ For i = 1, 2, ..., n
  § Sample xi from P(Xi | Parents(Xi))
§ Return (x1, x2, ..., xn)
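For the sprinkler network above, this loop just samples each variable in topological order. A minimal sketch, using booleans for +/- values and the CPT numbers from the slide; the seed and sample count are choices made here.

```python
import random

def prior_sample():
    """Sample each variable from P(Xi | Parents(Xi)) in topological order."""
    c = random.random() < 0.5                    # P(+c) = 0.5
    s = random.random() < (0.1 if c else 0.5)    # P(+s | c)
    r = random.random() < (0.8 if c else 0.2)    # P(+r | c)
    w = random.random() < {(True, True): 0.99, (True, False): 0.90,
                           (False, True): 0.90, (False, False): 0.01}[(s, r)]
    return c, s, r, w

random.seed(0)
samples = [prior_sample() for _ in range(10000)]
est = sum(w for _, _, _, w in samples) / len(samples)
print(round(est, 2))   # estimate of P(+w); the exact value is 0.65
```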
Prior Sampling
§ This process generates samples with probability:

  S_PS(x1 ... xn) = ∏i P(xi | Parents(Xi)) = P(x1 ... xn)

  ...i.e. the BN's joint probability
§ Let the number of samples of an event be N_PS(x1 ... xn)
§ Then lim_{N→∞} N_PS(x1 ... xn) / N = P(x1 ... xn)
§ I.e., the sampling procedure is consistent
Example
§ We'll get a bunch of samples from the BN:
  +c, -s, +r, +w
  +c, +s, +r, +w
  -c, +s, +r, -w
  +c, -s, +r, +w
  -c, -s, -r, +w
§ If we want to know P(W)
  § We have counts <+w: 4, -w: 1>
  § Normalize to get P(W) = <+w: 0.8, -w: 0.2>
  § This will get closer to the true distribution with more samples
  § Can estimate anything else, too
  § What about P(C | +w)?  P(C | +r, +w)?  P(C | -r, -w)?
  § Fast: can use fewer samples if less time (what's the drawback?)
Rejection Sampling
§ Let's say we want P(C)
  § No point keeping all samples around
  § Just tally counts of C as we go
§ Let's say we want P(C | +s)
  § Same thing: tally C outcomes, but ignore (reject) samples which don't have S = +s
  § This is called rejection sampling
  § It is also consistent for conditional probabilities (i.e., correct in the limit)

Rejection Sampling
§ IN: evidence instantiation
§ For i = 1, 2, ..., n
  § Sample xi from P(Xi | Parents(Xi))
  § If xi not consistent with evidence
    § Reject: Return, and no sample is generated in this cycle
§ Return (x1, x2, ..., xn)
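Applied to the sprinkler network, the procedure draws prior samples and keeps only those consistent with the evidence; this sketch (with a seed and count chosen here) estimates P(C | +s).

```python
import random

def prior_sample():
    """One sample from the sprinkler network in topological order."""
    c = random.random() < 0.5
    s = random.random() < (0.1 if c else 0.5)
    r = random.random() < (0.8 if c else 0.2)
    w = random.random() < {(True, True): 0.99, (True, False): 0.90,
                           (False, True): 0.90, (False, False): 0.01}[(s, r)]
    return c, s, r, w

random.seed(1)
# Keep only the samples consistent with the evidence S = +s; reject the rest
kept = [smp for smp in (prior_sample() for _ in range(20000)) if smp[1]]
est = sum(c for c, _, _, _ in kept) / len(kept)
print(round(est, 2))   # estimate of P(+c | +s); the exact value is 1/6
```

Note how many samples are thrown away: S = +s holds in only about 30% of prior samples, which previews the motivation for likelihood weighting.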
Likelihood Weighting
§ Idea: fix evidence variables and sample the rest
  § Problem: sample distribution not consistent!
  § Solution: weight by probability of evidence given parents

Likelihood Weighting
§ Problem with rejection sampling:
  § If evidence is unlikely, rejects lots of samples
  § Evidence not exploited as you sample
  § Consider P(Shape | blue)

  Rejection sampling draws:      Likelihood weighting draws:
  pyramid, green                 pyramid, blue
  pyramid, red                   pyramid, blue
  sphere, blue                   sphere, blue
  cube, red                      cube, blue
  sphere, green                  sphere, blue
Likelihood Weighting
(Same sprinkler network and CPTs as in the prior sampling slides, but evidence variables are fixed rather than sampled.)

Samples:
+c, +s, +r, +w
...
Likelihood Weighting
§ IN: evidence instantiation
§ w = 1.0
§ for i = 1, 2, ..., n
  § if Xi is an evidence variable
    § Xi = observation xi for Xi
    § Set w = w * P(xi | Parents(Xi))
  § else
    § Sample xi from P(Xi | Parents(Xi))
§ return (x1, x2, ..., xn), w
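The algorithm above can be sketched for the sprinkler network with evidence S = +s, W = +w: evidence variables are never sampled, and each contributes P(e | parents) to the weight. The query choice, seed, and sample count are illustrative.

```python
import random

def weighted_sample():
    """One likelihood-weighted sample with evidence S = +s, W = +w."""
    weight = 1.0
    c = random.random() < 0.5                    # sample C from P(C)
    weight *= 0.1 if c else 0.5                  # evidence S = +s: multiply in P(+s | c)
    r = random.random() < (0.8 if c else 0.2)    # sample R from P(R | c)
    weight *= 0.99 if r else 0.90                # evidence W = +w: multiply in P(+w | +s, r)
    return c, weight

random.seed(2)
samples = [weighted_sample() for _ in range(20000)]
num = sum(w for c, w in samples if c)            # weight on samples with C = +c
den = sum(w for _, w in samples)                 # total weight
print(round(num / den, 2))   # estimate of P(+c | +s, +w); the exact value is about 0.175
```

No sample is ever rejected; unlikely evidence shows up as small weights instead.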
Likelihood Weighting
§ Sampling distribution if z sampled and e fixed evidence:

  S_WS(z, e) = ∏i P(zi | Parents(Zi))

§ Now, samples have weights:

  w(z, e) = ∏i P(ei | Parents(Ei))

§ Together, weighted sampling distribution is consistent:

  S_WS(z, e) · w(z, e) = ∏i P(zi | Parents(Zi)) · ∏i P(ei | Parents(Ei)) = P(z, e)
Likelihood Weighting
§ Likelihood weighting is good
  § We have taken evidence into account as we generate the sample
  § E.g. here, W's value will get picked based on the evidence values of S, R
  § More of our samples will reflect the state of the world suggested by the evidence
§ Likelihood weighting doesn't solve all our problems
  § Evidence influences the choice of downstream variables, but not upstream ones (C isn't more likely to get a value matching the evidence)
§ We would like to consider evidence when we sample every variable → Gibbs sampling
Gibbs Sampling

Gibbs Sampling
§ Procedure: keep track of a full instantiation x1, x2, ..., xn. Start with an arbitrary instantiation consistent with the evidence. Sample one variable at a time, conditioned on all the rest, but keep evidence fixed. Keep repeating this for a long time.
§ Property: in the limit of repeating this infinitely many times the resulting sample is coming from the correct distribution
§ Rationale: both upstream and downstream variables condition on evidence.
§ In contrast: likelihood weighting only conditions on upstream evidence, and hence weights obtained in likelihood weighting can sometimes be very small. Sum of weights over all samples is indicative of how many "effective" samples were obtained, so want high weight.

Gibbs Sampling Example: P(S | +r)
§ Step 1: Fix evidence
  § R = +r
§ Step 2: Initialize other variables
  § Randomly
§ Step 3: Repeat
  § Choose a non-evidence variable X
  § Resample X from P(X | all other variables)
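The three steps above can be sketched for P(S | +r) on the sprinkler network. Each resampling distribution is just the product of the CPTs that mention the variable (see the slide on efficient resampling); since R is fixed to +r, the P(+r | c) terms use that value throughout. The seed, sweep count, and helper names are choices made here for illustration.

```python
import random

def p_s_given(c):  return 0.1 if c else 0.5          # P(+s | c)
def p_r_given(c):  return 0.8 if c else 0.2          # P(+r | c)
def p_w_given(s, r):                                  # P(+w | s, r)
    return {(True, True): 0.99, (True, False): 0.90,
            (False, True): 0.90, (False, False): 0.01}[(s, r)]

def bernoulli(weight_true, weight_false):
    """Sample True with probability proportional to weight_true."""
    return random.random() < weight_true / (weight_true + weight_false)

random.seed(3)
r = True                                   # Step 1: fix evidence R = +r
c = random.random() < 0.5                  # Step 2: initialize the rest randomly
s = random.random() < 0.5
w = random.random() < 0.5
N = 50000
count_s = 0
for _ in range(N):                         # Step 3: resample one variable at a time
    # P(C | s, +r) is proportional to P(c) P(s|c) P(+r|c); W's CPT does not mention C
    pc_t = 0.5 * (p_s_given(True) if s else 1 - p_s_given(True)) * p_r_given(True)
    pc_f = 0.5 * (p_s_given(False) if s else 1 - p_s_given(False)) * p_r_given(False)
    c = bernoulli(pc_t, pc_f)
    # P(S | c, +r, w) is proportional to P(s|c) P(w|s,+r)
    ps_t = p_s_given(c) * (p_w_given(True, r) if w else 1 - p_w_given(True, r))
    ps_f = (1 - p_s_given(c)) * (p_w_given(False, r) if w else 1 - p_w_given(False, r))
    s = bernoulli(ps_t, ps_f)
    # P(W | s, +r) comes straight from the CPT
    w = random.random() < p_w_given(s, r)
    count_s += s

print(round(count_s / N, 3))   # estimate of P(+s | +r); the exact value is 0.18
```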
Efficient Resampling of One Variable
§ Sample from P(S | +c, +r, -w):

  P(S | +c, +r, -w) = P(S, +c, +r, -w) / P(+c, +r, -w)
                    = P(+c) P(S|+c) P(+r|+c) P(-w|S,+r) / Σ_s P(+c) P(s|+c) P(+r|+c) P(-w|s,+r)
                    = P(S|+c) P(-w|S,+r) / Σ_s P(s|+c) P(-w|s,+r)

§ Many things cancel out --- only CPTs with S remain!
§ More generally: only CPTs that have the resampled variable need to be considered, and joined together
Bayes' Net Sampling Summary
§ Prior Sampling: P
§ Rejection Sampling: P(Q | e)
§ Likelihood Weighting: P(Q | e)
§ Gibbs Sampling: P(Q | e)
Further Reading on Gibbs Sampling*
§ Gibbs sampling produces a sample from the query distribution P(Q | e) in the limit of re-sampling infinitely often
§ Gibbs sampling is a special case of more general methods called Markov chain Monte Carlo (MCMC) methods
§ Metropolis-Hastings is one of the more famous MCMC methods (in fact, Gibbs sampling is a special case of Metropolis-Hastings)
§ You may read about Monte Carlo methods --- they're just sampling
How About Particle Filtering?

Elapse:
  Particles: (3,3) (2,3) (3,3) (3,2) (3,3) (3,2) (1,2) (3,3) (3,3) (2,3)
  →  Particles: (3,2) (2,3) (3,2) (3,1) (3,3) (3,2) (1,3) (2,3) (3,2) (2,2)

Weight (= likelihood weighting):
  Particles: (3,2) w=.9  (2,3) w=.2  (3,2) w=.9  (3,1) w=.4  (3,3) w=.4
             (3,2) w=.9  (1,3) w=.1  (2,3) w=.2  (3,2) w=.9  (2,2) w=.4

Resample:
  (New) Particles: (3,2) (2,2) (3,2) (2,3) (3,3) (3,2) (1,3) (2,3) (3,2) (3,2)

Particle Filtering
§ Particle filtering operates on an ensemble of samples
  § Performs likelihood weighting for each individual sample to elapse time and incorporate evidence
  § Resamples from the weighted ensemble of samples to focus computation for the next time step where most of the probability mass is estimated to be