Introduction to Advanced Probability for Graphical Models
Post on 01-Jun-2020
Introduction to Advanced Probability for Graphical Models

CSC412, by Elliot Creager
Thursday January 11, 2018
Presented by Jonathan Lorraine

*Many slides based on Kaustav Kundu's, Kevin Swersky's, Inmar Givoni's, Danny Tarlow's, and Jasper Snoek's slides, Sam Roweis's review of probability, Bishop's book, and some images from Wikipedia
Outline

• Basics
• Probability rules
• Exponential family models
• Maximum likelihood
• Conjugate Bayesian inference (time permitting)
Why Represent Uncertainty?

• The world is full of uncertainty
  – "What will the weather be like today?"
  – "Will I like this movie?"
  – "Is there a person in this image?"
• We're trying to build systems that understand and (possibly) interact with the real world
• We often can't prove something is true, but we can still ask how likely different outcomes are, or ask for the most likely explanation
• Sometimes probability gives a concise description of an otherwise complex phenomenon.
Why Use Probability to Represent Uncertainty?

• Write down simple, reasonable criteria that you'd want from a system of uncertainty (common-sense stuff), and you always get probability.
• Cox Axioms (Cox 1946); see Bishop, Section 1.2.3
• We will restrict ourselves to a relatively informal discussion of probability theory.
Notation

• A random variable X represents outcomes or states of the world.
• We will write p(x) to mean Probability(X = x)
• Sample space: the space of all possible outcomes (may be discrete, continuous, or mixed)
• p(x) is the probability mass (density) function
  – Assigns a number to each point in sample space
  – Non-negative, sums (integrates) to 1
  – Intuitively: how often does x occur, how much do we believe in x.
Joint Probability Distribution
• Prob(X = x, Y = y)
  – "Probability of X = x and Y = y"
  – p(x, y)

Conditional Probability Distribution
• Prob(X = x | Y = y)
  – "Probability of X = x given Y = y"
  – p(x | y) = p(x, y) / p(y)

Marginal Probability Distribution
• Prob(X = x), Prob(Y = y)
  – "Probability of X = x"
  – p(x) = \sum_{y} p(x, y) = \sum_{y} p(x | y) p(y)
The Rules of Probability

• Sum Rule (marginalization / summing out):

p(x) = \sum_{y} p(x, y)

p(x_1) = \sum_{x_2} \sum_{x_3} \cdots \sum_{x_N} p(x_1, x_2, \ldots, x_N)

• Product / Chain Rule:

p(x, y) = p(y | x) p(x)

p(x_1, \ldots, x_N) = p(x_1) p(x_2 | x_1) \cdots p(x_N | x_1, \ldots, x_{N-1})
Bayes' Rule

• One of the most important formulas in probability theory
• This gives us a way of "reversing" conditional probabilities
• Read as "posterior = likelihood × prior / evidence"

p(x | y) = \frac{p(y | x) p(x)}{p(y)} = \frac{p(y | x) p(x)}{\sum_{x'} p(y | x') p(x')}
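As a quick numeric sketch of Bayes' rule (all numbers below are hypothetical, not from the slides), consider a binary state x with a noisy binary observation y, e.g. a rare condition and an imperfect test:

```python
# Hypothetical numbers: prior p(x=1) = 0.01, likelihoods
# p(y=pos | x=1) = 0.95 and p(y=pos | x=0) = 0.05.
prior = {1: 0.01, 0: 0.99}
likelihood_pos = {1: 0.95, 0: 0.05}

# Evidence: p(y=pos) = sum_{x'} p(y=pos | x') p(x')
evidence = sum(likelihood_pos[x] * prior[x] for x in prior)

# Posterior: p(x=1 | y=pos) = p(y=pos | x=1) p(x=1) / p(y=pos)
posterior = likelihood_pos[1] * prior[1] / evidence
print(posterior)  # much larger than the prior, but still well below 0.5
```

The denominator is exactly the marginalization from the sum rule, which is why Bayes' rule "reverses" a conditional: everything on the right-hand side conditions on x, not y.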
Independence

• Two random variables are said to be independent iff their joint distribution factors:

p(x, y) = p(y | x) p(x) = p(x | y) p(y) = p(x) p(y)

• Two random variables are conditionally independent given a third if they are independent after conditioning on the third:

p(x, y | z) = p(x | y, z) p(y | z) = p(x | z) p(y | z) \quad \forall z
Continuous Random Variables

• Outcomes are real values. Probability density functions define distributions. E.g.,

p(x | \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2\sigma^2} (x - \mu)^2 \right\}

• Continuous joint distributions: replace sums with integrals, and everything holds. E.g., marginalization and conditional probability:

P(x, z) = \int_{y} P(x, y, z)\, dy = \int_{y} P(x, z | y) P(y)\, dy
Summarizing Probability Distributions

• It is often useful to give summaries of distributions without defining the whole distribution (e.g., mean and variance)

• Mean:

E[x] = \bar{x} = \int x \cdot p(x)\, dx

• Variance:

\mathrm{var}(x) = \int (x - E[x])^2 \cdot p(x)\, dx = E[x^2] - E[x]^2
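The identity var(x) = E[x²] − E[x]² can be checked numerically on a small discrete distribution (the pmf below is made up for illustration; sums replace the integrals):

```python
# Hypothetical discrete distribution over {0, 1, 2, 3}; pmf sums to 1.
xs = [0, 1, 2, 3]
ps = [0.1, 0.2, 0.3, 0.4]

mean = sum(x * p for x, p in zip(xs, ps))                     # E[x]
var_def = sum((x - mean) ** 2 * p for x, p in zip(xs, ps))    # E[(x - E[x])^2]
var_alt = sum(x * x * p for x, p in zip(xs, ps)) - mean ** 2  # E[x^2] - E[x]^2
print(mean, var_def, var_alt)  # the two variance computations agree
```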
Exponential Family

• Family of probability distributions
• Many of the standard distributions belong to this family
  – Bernoulli, binomial/multinomial, Poisson, Normal (Gaussian), beta/Dirichlet, …
• Share many important properties
  – e.g. they have a conjugate prior (we'll get to that later; important for Bayesian statistics)

Definition

• The exponential family of distributions over x, given parameter η (eta), is the set of distributions of the form

p(x | \eta) = h(x)\, g(\eta) \exp\{\eta^T u(x)\}

• x: scalar/vector, discrete/continuous
• η: 'natural parameters'
• u(x): some function of x (sufficient statistic)
• g(η): normalizer
• h(x): base measure (often constant)

g(\eta) \int h(x) \exp\{\eta^T u(x)\}\, dx = 1
Sufficient Statistics

• Vague definition: called so because they completely summarize a distribution.
• Less vague: they are the only part of the distribution that interacts with the parameters, and are therefore sufficient to estimate the parameters.
• Examples: the number of times a coin came up heads, or the sum of the values' magnitudes.
Example 1: Bernoulli

• Binary random variable: X ∈ {0, 1}
• p(heads) = μ, with μ ∈ [0, 1]
• Coin toss

p(x | \mu) = \mu^x (1 - \mu)^{1 - x}
Example 1: Bernoulli

p(x | \mu) = \mu^x (1 - \mu)^{1 - x}
           = \exp\{x \ln \mu + (1 - x) \ln(1 - \mu)\}
           = (1 - \mu) \exp\left\{ x \ln\left(\frac{\mu}{1 - \mu}\right) \right\}

Matching this to p(x | \eta) = h(x)\, g(\eta) \exp\{\eta^T u(x)\}:

h(x) = 1, \quad u(x) = x, \quad \eta = \ln\left(\frac{\mu}{1 - \mu}\right)

\Rightarrow \mu = \sigma(\eta) = \frac{1}{1 + e^{-\eta}}, \quad g(\eta) = \sigma(-\eta) = 1 - \sigma(\eta)

p(x | \eta) = \sigma(-\eta) \exp(\eta x)
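A small numeric check (the value μ = 0.3 is arbitrary) that the natural-parameter form σ(−η) exp(ηx) reproduces the standard Bernoulli pmf, with η the log-odds:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

mu = 0.3                              # hypothetical head probability
eta = math.log(mu / (1.0 - mu))       # natural parameter: log-odds

for x in (0, 1):
    standard = mu ** x * (1.0 - mu) ** (1 - x)   # mu^x (1 - mu)^(1 - x)
    exp_family = sigmoid(-eta) * math.exp(eta * x)
    print(x, standard, exp_family)    # the two forms agree
```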
Example 2: Multinomial

• p(value k) = μ_k, with μ_k ∈ [0, 1], \sum_{k=1}^{M} μ_k = 1
• For a single observation
  – die toss
  – Sometimes called Categorical
• For multiple observations: integer counts on N trials
  – Prob(1 came out 3 times, 2 came out once, …, 6 came out 7 times if I tossed a die 20 times)

P(x_1, \ldots, x_M | \mu) = \frac{N!}{\prod_k x_k!} \prod_{k=1}^{M} \mu_k^{x_k}, \qquad \sum_{k=1}^{M} x_k = N

Example 2: Multinomial (1 observation)

P(x_1, \ldots, x_M | \mu) = \prod_{k=1}^{M} \mu_k^{x_k} = \exp\left\{ \sum_{k=1}^{M} x_k \ln \mu_k \right\}

Matching this to p(x | \eta) = h(x)\, g(\eta) \exp\{\eta^T u(x)\}:

h(x) = 1, \quad u(x) = x, \quad \eta_k = \ln \mu_k

p(x | \eta) = \exp(\eta^T x)

The parameters are not independent due to the constraint of summing to 1; there's a slightly more involved notation to address that, see Bishop 2.4.
Example 3: Normal (Gaussian) Distribution

p(x | \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2\sigma^2} (x - \mu)^2 \right\}

• μ is the mean
• σ² is the variance
• Can verify these by computing integrals. E.g.,

\int_{-\infty}^{\infty} x \cdot \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2\sigma^2} (x - \mu)^2 \right\} dx = \mu
Example 3: Normal (Gaussian) Distribution

• Multivariate Gaussian

p(x | \mu, \Sigma) = |2\pi \Sigma|^{-1/2} \exp\left\{ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right\}

• x is now a vector
• μ is the mean vector
• Σ is the covariance matrix
Important Properties of Gaussians

• All marginals of a Gaussian are again Gaussian
• Any conditional of a Gaussian is Gaussian
• The product of two Gaussians is again Gaussian
• Even the sum of two independent Gaussian RVs is a Gaussian
• Beyond the scope of this tutorial, but very important: marginalization and conditioning rules for multivariate Gaussians
Exponential Family Representation

p(x | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2} (x - \mu)^2 \right\}
= \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2} x^2 + \frac{\mu}{\sigma^2} x - \frac{\mu^2}{2\sigma^2} \right\}
= (2\pi)^{-1/2} \cdot (-2\eta_2)^{1/2} \exp\left( \frac{\eta_1^2}{4\eta_2} \right) \cdot \exp\left\{ [\eta_1 \;\; \eta_2] \begin{bmatrix} x \\ x^2 \end{bmatrix} \right\}

matching p(x | \eta) = h(x)\, g(\eta) \exp\{\eta^T u(x)\} with

h(x) = (2\pi)^{-1/2}, \quad g(\eta) = (-2\eta_2)^{1/2} \exp\left( \frac{\eta_1^2}{4\eta_2} \right), \quad \eta = \begin{bmatrix} \mu / \sigma^2 \\ -1 / (2\sigma^2) \end{bmatrix}, \quad u(x) = \begin{bmatrix} x \\ x^2 \end{bmatrix}
Example: Maximum Likelihood For a 1D Gaussian

• Suppose we are given a dataset of samples of a Gaussian random variable X, D = {x_1, …, x_N}, and told that the variance of the data is σ²

What is our best guess of μ?
*Need to assume data is independent and identically distributed (i.i.d.)

x_1, x_2, …, x_N

Example: Maximum Likelihood For a 1D Gaussian

What is our best guess of μ?
• We can write down the likelihood function:

p(d | \mu) = \prod_{i=1}^{N} p(x_i | \mu, \sigma^2) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2\sigma^2} (x_i - \mu)^2 \right\}

• We want to choose the μ that maximizes this expression
  – Take the log, then basic calculus: differentiate w.r.t. μ, set the derivative to 0, solve for μ to get the sample mean

\mu_{ML} = \frac{1}{N} \sum_{i=1}^{N} x_i
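A quick numeric sanity check (data values and the known variance below are made up) that the sample mean does maximize the Gaussian log-likelihood:

```python
import math

data = [1.2, 0.7, 2.1, 1.5, 0.9]       # hypothetical i.i.d. samples
sigma2 = 1.0                            # assumed known variance

def log_likelihood(mu):
    # log of the product of Gaussian densities = sum of log densities
    return sum(-0.5 * math.log(2 * math.pi * sigma2)
               - (x - mu) ** 2 / (2 * sigma2) for x in data)

mu_ml = sum(data) / len(data)           # sample mean
for mu in (mu_ml - 0.1, mu_ml, mu_ml + 0.1):
    print(mu, log_likelihood(mu))       # the middle value scores highest
```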
ML Estimation of Model Parameters for the Exponential Family

p(D | \eta) = p(x_1, \ldots, x_N | \eta) = \left( \prod_{n} h(x_n) \right) g(\eta)^N \exp\left\{ \eta^T \sum_{n} u(x_n) \right\}

Take \partial \ln p(D | \eta) / \partial \eta, set it to 0, and solve for \nabla \ln g(\eta):

-\nabla \ln g(\eta_{ML}) = \frac{1}{N} \sum_{n=1}^{N} u(x_n)

• Can in principle be solved to get an estimate for η.
• The solution for the ML estimator depends on the data only through the sum over u, which is therefore called the sufficient statistic
  – What we need to store in order to estimate the parameters.
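To illustrate that the estimate depends on the data only through the sufficient statistic, here is a Bernoulli sketch (the coin flips are made up): two datasets with the same Σ u(x_n) = Σ x_n necessarily give the same ML estimate:

```python
# Two hypothetical datasets of coin flips with the same sufficient
# statistic (number of heads), but in a different order.
data_a = [1, 0, 1, 1, 0, 1, 0, 0]
data_b = [0, 0, 0, 0, 1, 1, 1, 1]

stat_a = sum(data_a)            # sufficient statistic: sum of u(x_n) = x_n
stat_b = sum(data_b)
mu_a = stat_a / len(data_a)     # ML estimate of the head probability
mu_b = stat_b / len(data_b)
print(stat_a, stat_b, mu_a, mu_b)  # identical statistics, identical estimates
```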
Bayesian Probabilities

p(\theta | d) = \frac{p(d | \theta) p(\theta)}{p(d)}, \qquad p(d) = \int p(d | \theta) p(\theta)\, d\theta

• p(d | θ) is the likelihood function
• p(θ) is the prior probability of (or our prior belief over) θ
  – our beliefs over what models are likely or not before seeing any data
• p(d) is the normalization constant or partition function
• p(θ | d) is the posterior distribution
  – Readjustment of our prior beliefs in the face of data
Example: Bayesian Inference For a 1D Gaussian

• Suppose we have a prior belief that the mean of some random variable X is μ_0, and the variance of our belief is σ_0²
• We are then given a dataset of samples of X, d = {x_1, …, x_N}, and somehow know that the variance of the data is σ²

What is the posterior distribution over (our belief about the value of) μ?

Example: Bayesian Inference For a 1D Gaussian

• Remember from earlier:

p(\mu | d) = \frac{p(d | \mu) p(\mu)}{p(d)}

• p(d | μ) is the likelihood function:

p(d | \mu) = \prod_{i=1}^{N} P(x_i | \mu, \sigma^2) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2\sigma^2} (x_i - \mu)^2 \right\}

• p(μ) is the prior probability of (or our prior belief over) μ:

p(\mu | \mu_0, \sigma_0) = \frac{1}{\sqrt{2\pi}\,\sigma_0} \exp\left\{ -\frac{1}{2\sigma_0^2} (\mu - \mu_0)^2 \right\}
Example: Bayesian Inference For a 1D Gaussian

p(\mu | D) \propto p(D | \mu)\, p(\mu), \qquad p(\mu | D) = \mathcal{N}(\mu | \mu_N, \sigma_N^2)

where

\mu_N = \frac{\sigma^2}{N \sigma_0^2 + \sigma^2}\, \mu_0 + \frac{N \sigma_0^2}{N \sigma_0^2 + \sigma^2}\, \mu_{ML}, \qquad \frac{1}{\sigma_N^2} = \frac{1}{\sigma_0^2} + \frac{N}{\sigma^2}
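A small numeric sketch of these update formulas (prior parameters, known variance, and data values are all made up):

```python
# Hypothetical prior: mean 0.0, variance 1.0; known data variance 1.0.
mu0, sigma0_2 = 0.0, 1.0
sigma2 = 1.0
data = [1.0, 2.0, 3.0]

N = len(data)
mu_ml = sum(data) / N                   # sample mean (ML estimate)

# Posterior mean: convex combination of prior mean and ML estimate.
mu_N = (sigma2 / (N * sigma0_2 + sigma2)) * mu0 \
     + (N * sigma0_2 / (N * sigma0_2 + sigma2)) * mu_ml
# Posterior precision: prior precision plus N data precisions.
sigma_N2 = 1.0 / (1.0 / sigma0_2 + N / sigma2)
print(mu_N, sigma_N2)  # pulled from the prior toward the data; variance shrinks
```

Note how more data (larger N) both pulls μ_N toward μ_ML and shrinks the posterior variance.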
Example: Bayesian Inference For a 1D Gaussian

[Figure: the posterior distribution over μ (mean μ_N, standard deviation σ_N) lies between the prior belief and the maximum likelihood estimate computed from the data x_1, x_2, …, x_N]
Conjugate Priors

• Notice in the Gaussian parameter estimation example that the functional form of the posterior was that of the prior (Gaussian)
• Priors that lead to that form are called 'conjugate priors'
• For any member of the exponential family there exists a conjugate prior that can be written like

p(\eta | \chi, \nu) = f(\chi, \nu)\, g(\eta)^{\nu} \exp\{\nu\, \eta^T \chi\}

• Multiply by the likelihood to obtain a posterior (up to normalization) of the form

p(\eta | D, \chi, \nu) \propto g(\eta)^{\nu + N} \exp\left\{ \eta^T \left( \sum_{n=1}^{N} u(x_n) + \nu \chi \right) \right\}

• Notice the addition to the sufficient statistic
• ν is the effective number of pseudo-observations.
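As a concrete instance of conjugacy, here is the classic Beta-Bernoulli pair (a standard example, not worked on these slides; all numbers below are made up): a Beta(a, b) prior on the Bernoulli parameter, after observing h heads and t tails, yields a Beta(a + h, b + t) posterior, i.e. the same functional form with the pseudo-counts updated by the sufficient statistics:

```python
# Hypothetical prior pseudo-counts: Beta(2, 2), a gentle prior centered at 0.5.
a, b = 2.0, 2.0
data = [1, 1, 0, 1, 0, 1]              # hypothetical coin flips

heads = sum(data)                      # sufficient statistic for the Bernoulli
tails = len(data) - heads

# Conjugate update: add observed counts to the prior pseudo-counts.
a_post, b_post = a + heads, b + tails
post_mean = a_post / (a_post + b_post)  # posterior mean of the head probability
print(a_post, b_post, post_mean)
```

The prior pseudo-counts play exactly the role of ν and χ above: they act like extra observations folded into the sufficient statistic.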