CSE473: Artificial Intelligence
Bayes' Nets: Inference
Luke Zettlemoyer --- University of Washington
[These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]
Bayes' Net Representation
§ A directed, acyclic graph, one node per random variable
§ A conditional probability table (CPT) for each node
  § A collection of distributions over X, one for each combination of parents' values
§ Bayes' nets implicitly encode joint distributions
  § As a product of local conditional distributions
  § To see what probability a BN gives to a full assignment, multiply all the relevant conditionals together:

    P(x1, x2, ..., xn) = ∏i P(xi | parents(Xi))
Example: Alarm Network

Graph: Burglary → Alarm ← Earthquake;  Alarm → John calls;  Alarm → Mary calls

B    P(B)
+b   0.001
-b   0.999

E    P(E)
+e   0.002
-e   0.998

B    E    A    P(A|B,E)
+b   +e   +a   0.95
+b   +e   -a   0.05
+b   -e   +a   0.94
+b   -e   -a   0.06
-b   +e   +a   0.29
-b   +e   -a   0.71
-b   -e   +a   0.001
-b   -e   -a   0.999

A    J    P(J|A)
+a   +j   0.9
+a   -j   0.1
-a   +j   0.05
-a   -j   0.95

A    M    P(M|A)
+a   +m   0.7
+a   -m   0.3
-a   +m   0.01
-a   -m   0.99

[Demo: BN Applet]
Bayes' Nets
§ Representation
§ Conditional Independences
§ Probabilistic Inference
  § Enumeration (exact, exponential complexity)
  § Variable elimination (exact, worst-case exponential complexity, often better)
  § Inference is NP-complete
  § Sampling (approximate)
§ Learning Bayes' Nets from Data
§ Examples:
  § Posterior probability
  § Most likely explanation:
Inference
§ Inference: calculating some useful quantity from a joint probability distribution
Inference by Enumeration
§ General case:
  § Evidence variables:  E1 ... Ek = e1 ... ek
  § Query* variable:     Q
  § Hidden variables:    H1 ... Hr          (together, all variables)
  *Works fine with multiple query variables, too
§ We want: P(Q | e1 ... ek)
§ Step 1: Select the entries consistent with the evidence
§ Step 2: Sum out H to get joint of Query and evidence
§ Step 3: Normalize:

  P(Q | e1 ... ek) = (1/Z) Σ_{h1...hr} P(Q, h1 ... hr, e1 ... ek),   where Z = Σ_q P(q, e1 ... ek)
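The three steps above can be sketched directly as code over an explicit joint table. This is a minimal illustration, not part of the original slides; the joint P(T, W) values are the weather table used in the Factor Zoo slides below, and `enumerate_query` is a name chosen here for illustration.

```python
# The joint P(T, W) table, as a dict from assignment tuples (t, w) to probabilities.
joint = {
    ('hot', 'sun'): 0.4, ('hot', 'rain'): 0.1,
    ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3,
}

def enumerate_query(joint, q_index, evidence):
    """P(Q | evidence): select, sum out hidden variables, normalize."""
    # Step 1: select the entries consistent with the evidence
    selected = {a: p for a, p in joint.items()
                if all(a[i] == v for i, v in evidence.items())}
    # Step 2: sum out everything except the query variable
    marginal = {}
    for a, p in selected.items():
        marginal[a[q_index]] = marginal.get(a[q_index], 0.0) + p
    # Step 3: normalize by Z, the total remaining mass
    z = sum(marginal.values())
    return {v: p / z for v, p in marginal.items()}

print(enumerate_query(joint, 0, {1: 'sun'}))   # P(T | W = sun): hot 2/3, cold 1/3
```

The cost is the problem: the loop touches every entry of the joint, which is exponential in the number of variables.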
Inference by Enumeration in Bayes' Net
§ Given unlimited time, inference in BNs is easy
§ Reminder of inference by enumeration by example (alarm network):

  P(B | +j, +m) ∝ P(B, +j, +m)
    = Σ_{e,a} P(B, e, a, +j, +m)
    = Σ_{e,a} P(B) P(e) P(a|B,e) P(+j|a) P(+m|a)
    = P(B)P(+e)P(+a|B,+e)P(+j|+a)P(+m|+a) + P(B)P(+e)P(-a|B,+e)P(+j|-a)P(+m|-a)
    + P(B)P(-e)P(+a|B,-e)P(+j|+a)P(+m|+a) + P(B)P(-e)P(-a|B,-e)P(+j|-a)P(+m|-a)
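The sum above can be evaluated numerically with the CPTs from the alarm network slide. This is a sketch added for illustration (the variable names and dict layout are choices made here, not from the slides).

```python
# CPTs from the alarm network slide; only P(+a|b,e), P(+j|a), P(+m|a) are
# stored, with the complements computed as 1 - p.
P_B = {'+b': 0.001, '-b': 0.999}
P_E = {'+e': 0.002, '-e': 0.998}
P_A = {('+b', '+e'): 0.95, ('+b', '-e'): 0.94,
       ('-b', '+e'): 0.29, ('-b', '-e'): 0.001}
P_J = {'+a': 0.9, '-a': 0.05}
P_M = {'+a': 0.7, '-a': 0.01}

unnorm = {}
for b in P_B:
    total = 0.0
    for e in P_E:                       # sum over e
        for a in ('+a', '-a'):          # sum over a
            pa = P_A[(b, e)] if a == '+a' else 1 - P_A[(b, e)]
            total += P_B[b] * P_E[e] * pa * P_J[a] * P_M[a]
    unnorm[b] = total

z = sum(unnorm.values())
posterior = {b: p / z for b, p in unnorm.items()}
print({b: round(p, 3) for b, p in posterior.items()})   # P(+b | +j, +m) is about 0.284
```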
Inference by Enumeration?
P(Antilock | observed variables) = ?
Inference by Enumeration vs. Variable Elimination
§ Why is inference by enumeration so slow?
  § You join up the whole joint distribution before you sum out the hidden variables
§ Idea: interleave joining and marginalizing!
  § Called "Variable Elimination"
  § Still NP-hard, but usually much faster than inference by enumeration
§ First we'll need some new notation: factors
Factor Zoo

Factor Zoo I
§ Joint distribution: P(X,Y)
  § Entries P(x,y) for all x, y
  § Sums to 1
§ Selected joint: P(x,Y)
  § A slice of the joint distribution
  § Entries P(x,y) for fixed x, all y
  § Sums to P(x)
§ Number of capitals = dimensionality of the table

P(T,W)
T     W     P
hot   sun   0.4
hot   rain  0.1
cold  sun   0.2
cold  rain  0.3

P(cold,W)
T     W     P
cold  sun   0.2
cold  rain  0.3
Factor Zoo II
§ Single conditional: P(Y|x)
  § Entries P(y|x) for fixed x, all y
  § Sums to 1
§ Family of conditionals: P(X|Y)
  § Multiple conditionals
  § Entries P(x|y) for all x, y
  § Sums to |Y|

P(W|T)
T     W     P
hot   sun   0.8
hot   rain  0.2
cold  sun   0.4
cold  rain  0.6

P(W|cold)
T     W     P
cold  sun   0.4
cold  rain  0.6
Factor Zoo III
§ Specified family: P(y|X)
  § Entries P(y|x) for fixed y, but for all x
  § Sums to ... who knows!

P(rain|T)
T     W     P
hot   rain  0.2
cold  rain  0.6
Factor Zoo Summary
§ In general, when we write P(Y1 ... YN | X1 ... XM)
§ It is a "factor," a multi-dimensional array
§ Its values are P(y1 ... yN | x1 ... xM)
§ Any assigned (= lower-case) X or Y is a dimension missing (selected) from the array
Example: Traffic Domain
§ Random Variables
  § R: Raining
  § T: Traffic
  § L: Late for class!     (R → T → L)

R    P(R)         R    T    P(T|R)        T    L    P(L|T)
+r   0.1          +r   +t   0.8           +t   +l   0.3
-r   0.9          +r   -t   0.2           +t   -l   0.7
                  -r   +t   0.1           -t   +l   0.1
                  -r   -t   0.9           -t   -l   0.9

P(L) = ?
     = Σ_{r,t} P(r, t, L)
     = Σ_{r,t} P(r) P(t|r) P(L|t)
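Evaluating the sum above with the traffic CPTs gives P(L) directly; this short check is added here for illustration and matches the result derived later in the elimination slides.

```python
# Evaluate P(L) = sum_{r,t} P(r) P(t|r) P(L|t) with the traffic CPTs.
P_R = {'+r': 0.1, '-r': 0.9}
P_T = {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
       ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}
P_L = {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
       ('-t', '+l'): 0.1, ('-t', '-l'): 0.9}

P_of_L = {'+l': 0.0, '-l': 0.0}
for r in P_R:                      # sum over r
    for t in ('+t', '-t'):         # sum over t
        for l in P_of_L:
            P_of_L[l] += P_R[r] * P_T[(r, t)] * P_L[(t, l)]

print(P_of_L)   # +l -> 0.134, -l -> 0.866
```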
Variable Elimination (VE)

Inference by Enumeration: Procedural Outline
§ Track objects called factors
  § Initial factors are local CPTs (one per node):

    P(R)           P(T|R)             P(L|T)
    +r  0.1        +r  +t  0.8        +t  +l  0.3
    -r  0.9        +r  -t  0.2        +t  -l  0.7
                   -r  +t  0.1        -t  +l  0.1
                   -r  -t  0.9        -t  -l  0.9

§ Any known values are selected
  § E.g. if we know L = +l, the initial factors are:

    P(R)           P(T|R)             P(+l|T)
    +r  0.1        +r  +t  0.8        +t  +l  0.3
    -r  0.9        +r  -t  0.2        -t  +l  0.1
                   -r  +t  0.1
                   -r  -t  0.9

§ Procedure: Join all factors, then eliminate all hidden variables
Operation 1: Join Factors
§ First basic operation: joining factors
§ Combining factors:
  § Just like a database join
  § Get all factors over the joining variable
  § Build a new factor over the union of the variables involved
§ Example: Join on R
§ Computation for each entry: pointwise products

P(R)          ×   P(T|R)            →   P(R,T)
+r  0.1           +r  +t  0.8           +r  +t  0.08
-r  0.9           +r  -t  0.2           +r  -t  0.02
                  -r  +t  0.1           -r  +t  0.09
                  -r  -t  0.9           -r  -t  0.81
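The join can be sketched in a few lines if a factor is represented as a list of (assignment-dict, value) rows; that representation is a choice made here for illustration, not from the slides.

```python
def join(f1, f2):
    """Database-style join: pointwise products over the union of variables."""
    result = []
    for a1, p1 in f1:
        for a2, p2 in f2:
            # Rows combine only if they agree on the shared variables
            if all(a1[v] == a2[v] for v in a1.keys() & a2.keys()):
                result.append(({**a1, **a2}, p1 * p2))
    return result

P_R = [({'R': '+r'}, 0.1), ({'R': '-r'}, 0.9)]
P_T_given_R = [({'R': '+r', 'T': '+t'}, 0.8), ({'R': '+r', 'T': '-t'}, 0.2),
               ({'R': '-r', 'T': '+t'}, 0.1), ({'R': '-r', 'T': '-t'}, 0.9)]

P_RT = join(P_R, P_T_given_R)
for a, p in P_RT:
    print(a, round(p, 2))   # e.g. R=+r, T=+t -> 0.08
```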
Example: Multiple Joins
§ Join R, then join T:

P(R)            P(T|R)              P(L|T)
+r  0.1         +r  +t  0.8         +t  +l  0.3
-r  0.9         +r  -t  0.2         +t  -l  0.7
                -r  +t  0.1         -t  +l  0.1
                -r  -t  0.9         -t  -l  0.9

Join R →  P(R,T)              P(L|T)
          +r  +t  0.08        +t  +l  0.3
          +r  -t  0.02        +t  -l  0.7
          -r  +t  0.09        -t  +l  0.1
          -r  -t  0.81        -t  -l  0.9

Join T →  P(R,T,L)
          +r  +t  +l  0.024
          +r  +t  -l  0.056
          +r  -t  +l  0.002
          +r  -t  -l  0.018
          -r  +t  +l  0.027
          -r  +t  -l  0.063
          -r  -t  +l  0.081
          -r  -t  -l  0.729
Operation 2: Eliminate
§ Second basic operation: marginalization
§ Take a factor and sum out a variable
  § Shrinks a factor to a smaller one
  § A projection operation
§ Example: sum out R from P(R,T)

P(R,T)             →   P(T)
+r  +t  0.08           +t  0.17
+r  -t  0.02           -t  0.83
-r  +t  0.09
-r  -t  0.81
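Elimination is the companion operation to the join sketch above, using the same (assignment-dict, value) row representation chosen for illustration.

```python
def eliminate(factor, var):
    """Sum out `var`: project each row onto the remaining variables and add."""
    sums = {}
    for assignment, p in factor:
        rest = tuple(sorted((v, x) for v, x in assignment.items() if v != var))
        sums[rest] = sums.get(rest, 0.0) + p
    return [(dict(rest), p) for rest, p in sums.items()]

P_RT = [({'R': '+r', 'T': '+t'}, 0.08), ({'R': '+r', 'T': '-t'}, 0.02),
        ({'R': '-r', 'T': '+t'}, 0.09), ({'R': '-r', 'T': '-t'}, 0.81)]

P_T = eliminate(P_RT, 'R')
for a, p in P_T:
    print(a, round(p, 2))   # T=+t -> 0.17, T=-t -> 0.83
```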
Multiple Elimination
§ Sum out R, then sum out T:

P(R,T,L)                Sum out R →  P(T,L)            Sum out T →  P(L)
+r  +t  +l  0.024                    +t  +l  0.051                  +l  0.134
+r  +t  -l  0.056                    +t  -l  0.119                  -l  0.866
+r  -t  +l  0.002                    -t  +l  0.083
+r  -t  -l  0.018                    -t  -l  0.747
-r  +t  +l  0.027
-r  +t  -l  0.063
-r  -t  +l  0.081
-r  -t  -l  0.729
Thus Far: Multiple Join, Multiple Eliminate (= Inference by Enumeration)
Marginalizing Early (= Variable Elimination)

Traffic Domain  (R → T → L)
§ Inference by Enumeration:

  P(L) = Σ_t Σ_r P(L|t) P(r) P(t|r)
  (Join on r, join on t, eliminate r, eliminate t)

§ Variable Elimination:

  P(L) = Σ_t P(L|t) Σ_r P(r) P(t|r)
  (Join on r, eliminate r, join on t, eliminate t)
Marginalizing Early! (aka VE)

P(R)            P(T|R)              P(L|T)
+r  0.1         +r  +t  0.8         +t  +l  0.3
-r  0.9         +r  -t  0.2         +t  -l  0.7
                -r  +t  0.1         -t  +l  0.1
                -r  -t  0.9         -t  -l  0.9

Join R →   P(R,T)              P(L|T)
           +r  +t  0.08        +t  +l  0.3
           +r  -t  0.02        +t  -l  0.7
           -r  +t  0.09        -t  +l  0.1
           -r  -t  0.81        -t  -l  0.9

Sum out R →  P(T)          P(L|T)
             +t  0.17      +t  +l  0.3
             -t  0.83      +t  -l  0.7
                           -t  +l  0.1
                           -t  -l  0.9

Join T →  P(T,L)              Sum out T →  P(L)
          +t  +l  0.051                    +l  0.134
          +t  -l  0.119                    -l  0.866
          -t  +l  0.083
          -t  -l  0.747
Evidence
§ If evidence, start with factors that select that evidence
  § No evidence uses these initial factors:

    P(R)           P(T|R)             P(L|T)
    +r  0.1        +r  +t  0.8        +t  +l  0.3
    -r  0.9        +r  -t  0.2        +t  -l  0.7
                   -r  +t  0.1        -t  +l  0.1
                   -r  -t  0.9        -t  -l  0.9

  § Computing P(L | +r), the initial factors become:

    P(+r)          P(T|+r)            P(L|T)
    +r  0.1        +r  +t  0.8        +t  +l  0.3
                   +r  -t  0.2        +t  -l  0.7
                                      -t  +l  0.1
                                      -t  -l  0.9

§ We eliminate all vars other than query + evidence

Evidence II
§ Result will be a selected joint of query and evidence
  § E.g. for P(L | +r), we would end up with:

    P(+r, L)             Normalize →  P(L | +r)
    +r  +l  0.026                     +l  0.26
    +r  -l  0.074                     -l  0.74

§ To get our answer, just normalize this!
§ That's it!
General Variable Elimination
§ Query: P(Q | E1 = e1, ..., Ek = ek)
§ Start with initial factors:
  § Local CPTs (but instantiated by evidence)
§ While there are still hidden variables (not Q or evidence):
  § Pick a hidden variable H
  § Join all factors mentioning H
  § Eliminate (sum out) H
§ Join all remaining factors and normalize
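The loop above can be sketched end-to-end for the traffic query P(L | +r). The factor representation (lists of assignment-dict rows) and the function names are choices made here for illustration; the sketch assumes every hidden variable is mentioned by at least one factor.

```python
from functools import reduce

def join(f1, f2):
    """Pointwise product of two factors over the union of their variables."""
    out = []
    for a1, p1 in f1:
        for a2, p2 in f2:
            if all(a1[v] == a2[v] for v in a1.keys() & a2.keys()):
                out.append(({**a1, **a2}, p1 * p2))
    return out

def eliminate(f, var):
    """Sum out `var` from a factor."""
    sums = {}
    for a, p in f:
        rest = tuple(sorted((v, x) for v, x in a.items() if v != var))
        sums[rest] = sums.get(rest, 0.0) + p
    return [(dict(r), p) for r, p in sums.items()]

def variable_elimination(factors, query, evidence, hidden_order):
    # Instantiate evidence: keep only the rows consistent with it
    factors = [[(a, p) for a, p in f
                if all(a.get(v, x) == x for v, x in evidence.items())]
               for f in factors]
    # For each hidden variable: join all factors mentioning it, then sum it out
    for h in hidden_order:
        touching = [f for f in factors if any(h in a for a, _ in f)]
        others = [f for f in factors if not any(h in a for a, _ in f)]
        factors = others + [eliminate(reduce(join, touching), h)]
    # Join all remaining factors and normalize
    final = reduce(join, factors)
    z = sum(p for _, p in final)
    return {a[query]: p / z for a, p in final}

P_R = [({'R': '+r'}, 0.1), ({'R': '-r'}, 0.9)]
P_T_given_R = [({'R': '+r', 'T': '+t'}, 0.8), ({'R': '+r', 'T': '-t'}, 0.2),
               ({'R': '-r', 'T': '+t'}, 0.1), ({'R': '-r', 'T': '-t'}, 0.9)]
P_L_given_T = [({'T': '+t', 'L': '+l'}, 0.3), ({'T': '+t', 'L': '-l'}, 0.7),
               ({'T': '-t', 'L': '+l'}, 0.1), ({'T': '-t', 'L': '-l'}, 0.9)]

res = variable_elimination([P_R, P_T_given_R, P_L_given_T],
                           query='L', evidence={'R': '+r'}, hidden_order=['T'])
print(res)   # P(L | +r): +l -> 0.26, -l -> 0.74
```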
Example
§ Choose A

Example
§ Choose E
§ Finish with B
§ Normalize
Example 2: P(B|a)

Start / Select:
B    P(B)          B    A    P(A|B)
+b   0.1           +b   +a   0.8
¬b   0.9           +b   ¬a   0.2
                   ¬b   +a   0.1
                   ¬b   ¬a   0.9

Join on B:
A    B    P(a,B)
+a   +b   0.08
+a   ¬b   0.09

Normalize:
A    B    P(B|a)
+a   +b   8/17
+a   ¬b   9/17
Same Example in Equations
§ marginal can be obtained from joint by summing out
§ use Bayes' net joint distribution expression
§ use x*(y+z) = xy + xz
§ joining on a, and then summing out gives f1
§ use x*(y+z) = xy + xz
§ joining on e, and then summing out gives f2

All we are doing is exploiting uwy + uwz + uxy + uxz + vwy + vwz + vxy + vxz = (u+v)(w+x)(y+z) to improve computational efficiency!
Another Variable Elimination Example

Computational complexity critically depends on the largest factor being generated in this process. Size of factor = number of entries in table. In the example above (assuming binary variables) all factors generated are of size 2 --- as they all only have one variable (Z, Z, and X3 respectively).
Variable Elimination Ordering
§ For the query P(Xn | y1, ..., yn) work through the following two different orderings as done in the previous slide: Z, X1, ..., Xn-1 and X1, ..., Xn-1, Z. What is the size of the maximum factor generated for each of the orderings?
§ Answer: 2^(n+1) versus 2^2 (assuming binary variables)
§ In general: the ordering can greatly affect efficiency.
VE: Computational and Space Complexity
§ The computational and space complexity of variable elimination is determined by the largest factor
§ The elimination ordering can greatly affect the size of the largest factor.
  § E.g., previous slide's example: 2^(n+1) vs. 2^2
§ Does there always exist an ordering that only results in small factors?
  § No!
Worst Case Complexity?
§ CSP: a 3-SAT instance can be encoded as a Bayes' net
§ If we can answer whether P(z) is equal to zero or not, we have answered whether the 3-SAT problem has a solution.
§ Hence inference in Bayes' nets is NP-hard. No known efficient probabilistic inference in general.
Polytrees
§ A polytree is a directed graph with no undirected cycles
§ For poly-trees you can always find an ordering that is efficient
  § Try it!!
§ Cut-set conditioning for Bayes' net inference
  § Choose set of variables such that if removed only a polytree remains
  § Exercise: Think about how the specifics would work out!
Bayes' Nets
§ Representation
§ Conditional Independences
§ Probabilistic Inference
  § Enumeration (exact, exponential complexity)
  § Variable elimination (exact, worst-case exponential complexity, often better)
  § Inference is NP-complete
  § Sampling (approximate)
§ Learning Bayes' Nets from Data
Approximate Inference: Sampling

Sampling
§ Sampling is a lot like repeated simulation
  § Predicting the weather, basketball games, ...
§ Basic idea
  § Draw N samples from a sampling distribution S
  § Compute an approximate posterior probability
  § Show this converges to the true probability P
§ Why sample?
  § Learning: get samples from a distribution you don't know
  § Inference: getting a sample is faster than computing the right answer (e.g. with variable elimination)
Sampling
§ Sampling from given distribution
  § Step 1: Get sample u from uniform distribution over [0, 1)
    § E.g. random() in python
  § Step 2: Convert this sample u into an outcome for the given distribution by having each outcome associated with a sub-interval of [0, 1) with sub-interval size equal to probability of the outcome
§ Example

  C       P(C)
  red     0.6
  green   0.1
  blue    0.3

  § If random() returns u = 0.83, then our sample is C = blue
  § E.g, after sampling 8 times:
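The two steps above amount to walking the cumulative distribution until u falls inside an outcome's sub-interval; here is a minimal sketch (the function name `sample` is a choice made for illustration).

```python
import random

def sample(dist, u=None):
    """Map u in [0, 1) to an outcome via cumulative sub-intervals."""
    if u is None:
        u = random.random()        # Step 1: uniform sample from [0, 1)
    cumulative = 0.0
    for outcome, p in dist.items():
        cumulative += p            # Step 2: does u fall in this outcome's sub-interval?
        if u < cumulative:
            return outcome
    return outcome                 # guard for rounding when u is very close to 1.0

dist = {'red': 0.6, 'green': 0.1, 'blue': 0.3}
print(sample(dist, u=0.83))   # blue: 0.83 lies in the sub-interval [0.7, 1.0)
```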
Sampling in Bayes' Nets
§ Prior Sampling
§ Rejection Sampling
§ Likelihood Weighting
§ Gibbs Sampling
Prior Sampling

Network: Cloudy → Sprinkler;  Cloudy → Rain;  {Sprinkler, Rain} → WetGrass

C    P(C)          C    S    P(S|C)        C    R    P(R|C)
+c   0.5           +c   +s   0.1           +c   +r   0.8
-c   0.5           +c   -s   0.9           +c   -r   0.2
                   -c   +s   0.5           -c   +r   0.2
                   -c   -s   0.5           -c   -r   0.8

S    R    W    P(W|S,R)
+s   +r   +w   0.99
+s   +r   -w   0.01
+s   -r   +w   0.90
+s   -r   -w   0.10
-s   +r   +w   0.90
-s   +r   -w   0.10
-s   -r   +w   0.01
-s   -r   -w   0.99

Samples:
+c, -s, +r, +w
-c, +s, -r, +w
...
Prior Sampling
§ For i = 1, 2, ..., n
  § Sample xi from P(Xi | Parents(Xi))
§ Return (x1, x2, ..., xn)
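For the sprinkler network above, this loop just samples each variable in topological order. A minimal sketch, using booleans for +/- values and the CPT numbers from the slide; the seed and sample count are choices made here.

```python
import random

def prior_sample():
    """Sample each variable from P(Xi | Parents(Xi)) in topological order."""
    c = random.random() < 0.5                    # P(+c) = 0.5
    s = random.random() < (0.1 if c else 0.5)    # P(+s | c)
    r = random.random() < (0.8 if c else 0.2)    # P(+r | c)
    w = random.random() < {(True, True): 0.99, (True, False): 0.90,
                           (False, True): 0.90, (False, False): 0.01}[(s, r)]
    return c, s, r, w

random.seed(0)
samples = [prior_sample() for _ in range(10000)]
est = sum(w for _, _, _, w in samples) / len(samples)
print(round(est, 2))   # estimate of P(+w); the exact value is 0.65
```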
Prior Sampling
§ This process generates samples with probability:

  S_PS(x1 ... xn) = ∏i P(xi | Parents(Xi)) = P(x1 ... xn)

  ...i.e. the BN's joint probability
§ Let the number of samples of an event be N_PS(x1 ... xn)
§ Then lim_{N→∞} N_PS(x1 ... xn) / N = P(x1 ... xn)
§ I.e., the sampling procedure is consistent
Example
§ We'll get a bunch of samples from the BN:
  +c, -s, +r, +w
  +c, +s, +r, +w
  -c, +s, +r, -w
  +c, -s, +r, +w
  -c, -s, -r, +w
§ If we want to know P(W)
  § We have counts <+w: 4, -w: 1>
  § Normalize to get P(W) = <+w: 0.8, -w: 0.2>
  § This will get closer to the true distribution with more samples
  § Can estimate anything else, too
  § What about P(C | +w)?  P(C | +r, +w)?  P(C | -r, -w)?
  § Fast: can use fewer samples if less time (what's the drawback?)
Rejection Sampling
§ Let's say we want P(C)
  § No point keeping all samples around
  § Just tally counts of C as we go
§ Let's say we want P(C | +s)
  § Same thing: tally C outcomes, but ignore (reject) samples which don't have S = +s
  § This is called rejection sampling
  § It is also consistent for conditional probabilities (i.e., correct in the limit)

Rejection Sampling
§ IN: evidence instantiation
§ For i = 1, 2, ..., n
  § Sample xi from P(Xi | Parents(Xi))
  § If xi not consistent with evidence
    § Reject: Return, and no sample is generated in this cycle
§ Return (x1, x2, ..., xn)
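Applied to the sprinkler network, the procedure draws prior samples and keeps only those consistent with the evidence; this sketch (with a seed and count chosen here) estimates P(C | +s).

```python
import random

def prior_sample():
    """One sample from the sprinkler network in topological order."""
    c = random.random() < 0.5
    s = random.random() < (0.1 if c else 0.5)
    r = random.random() < (0.8 if c else 0.2)
    w = random.random() < {(True, True): 0.99, (True, False): 0.90,
                           (False, True): 0.90, (False, False): 0.01}[(s, r)]
    return c, s, r, w

random.seed(1)
# Keep only the samples consistent with the evidence S = +s; reject the rest
kept = [smp for smp in (prior_sample() for _ in range(20000)) if smp[1]]
est = sum(c for c, _, _, _ in kept) / len(kept)
print(round(est, 2))   # estimate of P(+c | +s); the exact value is 1/6
```

Note how many samples are thrown away: S = +s holds in only about 30% of prior samples, which previews the motivation for likelihood weighting.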
Likelihood Weighting
§ Idea: fix evidence variables and sample the rest
  § Problem: sample distribution not consistent!
  § Solution: weight by probability of evidence given parents

Likelihood Weighting
§ Problem with rejection sampling:
  § If evidence is unlikely, rejects lots of samples
  § Evidence not exploited as you sample
  § Consider P(Shape | blue)

  Rejection sampling draws:      Likelihood weighting draws:
  pyramid, green                 pyramid, blue
  pyramid, red                   pyramid, blue
  sphere, blue                   sphere, blue
  cube, red                      cube, blue
  sphere, green                  sphere, blue
Likelihood Weighting
(Same sprinkler network and CPTs as in the prior sampling slides, but evidence variables are fixed rather than sampled.)

Samples:
+c, +s, +r, +w
...
Likelihood Weighting
§ IN: evidence instantiation
§ w = 1.0
§ for i = 1, 2, ..., n
  § if Xi is an evidence variable
    § Xi = observation xi for Xi
    § Set w = w * P(xi | Parents(Xi))
  § else
    § Sample xi from P(Xi | Parents(Xi))
§ return (x1, x2, ..., xn), w
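The algorithm above can be sketched for the sprinkler network with evidence S = +s, W = +w: evidence variables are never sampled, and each contributes P(e | parents) to the weight. The query choice, seed, and sample count are illustrative.

```python
import random

def weighted_sample():
    """One likelihood-weighted sample with evidence S = +s, W = +w."""
    weight = 1.0
    c = random.random() < 0.5                    # sample C from P(C)
    weight *= 0.1 if c else 0.5                  # evidence S = +s: multiply in P(+s | c)
    r = random.random() < (0.8 if c else 0.2)    # sample R from P(R | c)
    weight *= 0.99 if r else 0.90                # evidence W = +w: multiply in P(+w | +s, r)
    return c, weight

random.seed(2)
samples = [weighted_sample() for _ in range(20000)]
num = sum(w for c, w in samples if c)            # weight on samples with C = +c
den = sum(w for _, w in samples)                 # total weight
print(round(num / den, 2))   # estimate of P(+c | +s, +w); the exact value is about 0.175
```

No sample is ever rejected; unlikely evidence shows up as small weights instead.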
Likelihood Weighting
§ Sampling distribution if z sampled and e fixed evidence:

  S_WS(z, e) = ∏i P(zi | Parents(Zi))

§ Now, samples have weights:

  w(z, e) = ∏i P(ei | Parents(Ei))

§ Together, weighted sampling distribution is consistent:

  S_WS(z, e) · w(z, e) = ∏i P(zi | Parents(Zi)) · ∏i P(ei | Parents(Ei)) = P(z, e)
Likelihood Weighting
§ Likelihood weighting is good
  § We have taken evidence into account as we generate the sample
  § E.g. here, W's value will get picked based on the evidence values of S, R
  § More of our samples will reflect the state of the world suggested by the evidence
§ Likelihood weighting doesn't solve all our problems
  § Evidence influences the choice of downstream variables, but not upstream ones (C isn't more likely to get a value matching the evidence)
§ We would like to consider evidence when we sample every variable → Gibbs sampling
Gibbs Sampling

Gibbs Sampling
§ Procedure: keep track of a full instantiation x1, x2, ..., xn. Start with an arbitrary instantiation consistent with the evidence. Sample one variable at a time, conditioned on all the rest, but keep evidence fixed. Keep repeating this for a long time.
§ Property: in the limit of repeating this infinitely many times the resulting sample is coming from the correct distribution
§ Rationale: both upstream and downstream variables condition on evidence.
§ In contrast: likelihood weighting only conditions on upstream evidence, and hence weights obtained in likelihood weighting can sometimes be very small. Sum of weights over all samples is indicative of how many "effective" samples were obtained, so want high weight.

Gibbs Sampling Example: P(S | +r)
§ Step 1: Fix evidence
  § R = +r
§ Step 2: Initialize other variables
  § Randomly
§ Step 3: Repeat
  § Choose a non-evidence variable X
  § Resample X from P(X | all other variables)
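The three steps above can be sketched for P(S | +r) on the sprinkler network. Each resampling distribution is just the product of the CPTs that mention the variable (see the slide on efficient resampling); since R is fixed to +r, the P(+r | c) terms use that value throughout. The seed, sweep count, and helper names are choices made here for illustration.

```python
import random

def p_s_given(c):  return 0.1 if c else 0.5          # P(+s | c)
def p_r_given(c):  return 0.8 if c else 0.2          # P(+r | c)
def p_w_given(s, r):                                  # P(+w | s, r)
    return {(True, True): 0.99, (True, False): 0.90,
            (False, True): 0.90, (False, False): 0.01}[(s, r)]

def bernoulli(weight_true, weight_false):
    """Sample True with probability proportional to weight_true."""
    return random.random() < weight_true / (weight_true + weight_false)

random.seed(3)
r = True                                   # Step 1: fix evidence R = +r
c = random.random() < 0.5                  # Step 2: initialize the rest randomly
s = random.random() < 0.5
w = random.random() < 0.5
N = 50000
count_s = 0
for _ in range(N):                         # Step 3: resample one variable at a time
    # P(C | s, +r) is proportional to P(c) P(s|c) P(+r|c); W's CPT does not mention C
    pc_t = 0.5 * (p_s_given(True) if s else 1 - p_s_given(True)) * p_r_given(True)
    pc_f = 0.5 * (p_s_given(False) if s else 1 - p_s_given(False)) * p_r_given(False)
    c = bernoulli(pc_t, pc_f)
    # P(S | c, +r, w) is proportional to P(s|c) P(w|s,+r)
    ps_t = p_s_given(c) * (p_w_given(True, r) if w else 1 - p_w_given(True, r))
    ps_f = (1 - p_s_given(c)) * (p_w_given(False, r) if w else 1 - p_w_given(False, r))
    s = bernoulli(ps_t, ps_f)
    # P(W | s, +r) comes straight from the CPT
    w = random.random() < p_w_given(s, r)
    count_s += s

print(round(count_s / N, 3))   # estimate of P(+s | +r); the exact value is 0.18
```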
Efficient Resampling of One Variable
§ Sample from P(S | +c, +r, -w):

  P(S | +c, +r, -w) = P(S, +c, +r, -w) / P(+c, +r, -w)
                    = P(+c) P(S|+c) P(+r|+c) P(-w|S,+r) / Σ_s P(+c) P(s|+c) P(+r|+c) P(-w|s,+r)
                    = P(S|+c) P(-w|S,+r) / Σ_s P(s|+c) P(-w|s,+r)

§ Many things cancel out --- only CPTs with S remain!
§ More generally: only CPTs that have the resampled variable need to be considered, and joined together
Bayes' Net Sampling Summary
§ Prior Sampling: P
§ Rejection Sampling: P(Q | e)
§ Likelihood Weighting: P(Q | e)
§ Gibbs Sampling: P(Q | e)
Further Reading on Gibbs Sampling*
§ Gibbs sampling produces a sample from the query distribution P(Q | e) in the limit of re-sampling infinitely often
§ Gibbs sampling is a special case of more general methods called Markov chain Monte Carlo (MCMC) methods
§ Metropolis-Hastings is one of the more famous MCMC methods (in fact, Gibbs sampling is a special case of Metropolis-Hastings)
§ You may read about Monte Carlo methods --- they're just sampling
How About Particle Filtering?

Elapse:
  Particles: (3,3) (2,3) (3,3) (3,2) (3,3) (3,2) (1,2) (3,3) (3,3) (2,3)
  →  Particles: (3,2) (2,3) (3,2) (3,1) (3,3) (3,2) (1,3) (2,3) (3,2) (2,2)

Weight (= likelihood weighting):
  Particles: (3,2) w=.9  (2,3) w=.2  (3,2) w=.9  (3,1) w=.4  (3,3) w=.4
             (3,2) w=.9  (1,3) w=.1  (2,3) w=.2  (3,2) w=.9  (2,2) w=.4

Resample:
  (New) Particles: (3,2) (2,2) (3,2) (2,3) (3,3) (3,2) (1,3) (2,3) (3,2) (3,2)

Particle Filtering
§ Particle filtering operates on an ensemble of samples
  § Performs likelihood weighting for each individual sample to elapse time and incorporate evidence
  § Resamples from the weighted ensemble of samples to focus computation for the next time step where most of the probability mass is estimated to be