
Page 1: Introduction to Machine Learning (yifengt/courses/machine-learning/slides/lecture8-probabilistic...)

Probabilistic Graphical Models

Yifeng Tao

School of Computer Science, Carnegie Mellon University

Slides adapted from Eric Xing and Matt Gormley

Yifeng Tao Carnegie Mellon University 1

Introduction to Machine Learning

Page 2:

Recap of Basic Probability Concepts

o Representation: how do we represent the joint probability distribution over multiple binary variables?

o State configurations in total: 2^8

o Do all of them need to be represented? Do we gain any scientific/medical insight?

o Learning: where do we get all these probabilities? Maximum-likelihood estimation?

o Inference: if not all variables are observable, how do we compute the conditional distribution of latent variables given evidence? Computing p(H|A) would require summing over all 2^6 configurations of the unobserved variables.
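The cost of brute-force inference can be made concrete with a short sketch. The joint table here is randomly generated and the variable names (A through H) are illustrative assumptions, not the slide's actual example:

```python
import itertools
import random

# Hypothetical joint distribution over 8 binary variables (names are
# illustrative). A full table needs 2^8 = 256 entries -- the representation
# cost the slide warns about.
random.seed(0)
names = ["A", "B", "C", "D", "E", "F", "G", "H"]
weights = {cfg: random.random() for cfg in itertools.product([0, 1], repeat=8)}
Z = sum(weights.values())
joint = {cfg: w / Z for cfg, w in weights.items()}

def conditional(joint, names, target, value, evidence):
    """p(target=value | evidence) by brute-force summation over the table."""
    idx = {n: i for i, n in enumerate(names)}
    num = den = 0.0
    for cfg, p in joint.items():  # visits all 2^8 configurations
        if all(cfg[idx[n]] == v for n, v in evidence.items()):
            den += p              # p(evidence): 2^7 terms survive the filter
            if cfg[idx[target]] == value:
                num += p          # p(target, evidence): 2^6 terms
    return num / den

# p(H=1 | A=1) requires summing over the 2^6 unobserved configurations.
p = conditional(joint, names, "H", 1, {"A": 1})
```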

Yifeng Tao Carnegie Mellon University 2

[Slide from Eric Xing.]

Page 3:

Graphical Model: Structure Simplifies Representation

o Dependencies among variables

Yifeng Tao Carnegie Mellon University 3

[Slide from Eric Xing.]

Page 4:

Probabilistic Graphical Models

o If the Xi's are conditionally independent (as described by a PGM), the joint can be factored into a product of simpler terms, e.g.,

o Why might we favor a PGM?
o Incorporation of domain knowledge and causal (logical) structures
o 2+2+4+4+4+8+4+8 = 36, an 8-fold reduction from 2^8 in representation cost!
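The arithmetic behind the 36-entry count can be reproduced with an assumed parent structure; the exact DAG isn't visible in this transcript, so the parent counts below are chosen to match the slide's sum:

```python
# Assumed parent counts for the 8 binary nodes, chosen to reproduce the
# slide's sum 2+2+4+4+4+8+4+8: a node with k parents needs a CPT with
# 2^k rows of 2 entries each.
parent_counts = [0, 0, 1, 1, 1, 2, 1, 2]
cpt_entries = [2 ** (k + 1) for k in parent_counts]

factored = sum(cpt_entries)  # 36 entries for the factored representation
full = 2 ** 8                # 256 entries for the raw joint table
print(factored, full)        # -> 36 256 (roughly an 8-fold reduction)
```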

Yifeng Tao Carnegie Mellon University 4

[Slide from Eric Xing.]

Page 5:

Two types of GMs

o Directed edges encode causal relationships (Bayesian network, or directed graphical model):

o Undirected edges simply encode correlations between variables (Markov random field, or undirected graphical model):

Yifeng Tao Carnegie Mellon University 5

[Slide from Eric Xing.]

Page 6:

Bayesian Network

o Definition:

o It consists of a graph G and the conditional probabilities P
o These two parts fully specify the distribution:

o Qualitative specification: G
o Quantitative specification: P

Yifeng Tao Carnegie Mellon University 6

[Slide from Eric Xing.]

Page 7:

Where does the qualitative specification come from?

o Prior knowledge of causal relationships
o Learning from data (i.e., structure learning)
o We simply prefer a certain architecture (e.g., a layered graph)
o …

Yifeng Tao Carnegie Mellon University 7

[Slide from Matt Gormley.]

Page 8:

Quantitative Specification

o Example: Conditional probability tables (CPTs) for discrete random variables

Yifeng Tao Carnegie Mellon University 8

[Slide from Eric Xing.]

Page 9:

Quantitative Specification

o Example: Conditional probability density functions (CPDs) for continuous random variables

Yifeng Tao Carnegie Mellon University 9

[Slide from Eric Xing.]

Page 10:

Observed Variables

o In a graphical model, shaded nodes are “observed”, i.e., their values are given

Yifeng Tao Carnegie Mellon University 10

[Slide from Matt Gormley.]

Page 11:

GMs are your old friends

o Density estimation: parametric and nonparametric methods

o Regression: linear, conditional mixture, nonparametric

o Classification: generative and discriminative approaches

o Clustering

Yifeng Tao Carnegie Mellon University 11

[Slide from Eric Xing.]

Page 12:

What Independencies does a Bayes Net Model?

o Are X and Z independent given Y? P(X|Y)P(Z|Y) = P(X,Z|Y)

o Three cases of interest...
o Proof?
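The identity P(X|Y)P(Z|Y) = P(X,Z|Y) can be checked numerically for one of the three cases, a chain X → Y → Z; the CPTs below are made up for illustration:

```python
import itertools

# Minimal chain X -> Y -> Z with made-up CPTs (an assumption, not from the
# slide's own example).
pX = {0: 0.6, 1: 0.4}
pY_X = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # p(Y | X)
pZ_Y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}   # p(Z | Y)

joint = {(x, y, z): pX[x] * pY_X[x][y] * pZ_Y[y][z]
         for x, y, z in itertools.product([0, 1], repeat=3)}

def marg(fixed):
    """Marginal probability with the given positions fixed: {index: value}."""
    return sum(p for cfg, p in joint.items()
               if all(cfg[i] == v for i, v in fixed.items()))

# Verify p(X,Z | Y=y) == p(X | Y=y) p(Z | Y=y) for every configuration.
for x, y, z in itertools.product([0, 1], repeat=3):
    py = marg({1: y})
    lhs = marg({0: x, 1: y, 2: z}) / py
    rhs = (marg({0: x, 1: y}) / py) * (marg({1: y, 2: z}) / py)
    assert abs(lhs - rhs) < 1e-12
```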

Yifeng Tao Carnegie Mellon University 12

[Slide from Matt Gormley.]

Page 13:

The “Burglar Alarm” example

o Your house has a twitchy burglar alarm that is also sometimes triggered by earthquakes.

o Earth arguably doesn’t care whether your house is currently being burgled.

o While you are on vacation, one of your neighbors calls and tells you your home’s burglar alarm is ringing.

Yifeng Tao Carnegie Mellon University 13

[Slide from Matt Gormley.]

Page 14:

Markov Blanket

o Def: the co-parents of a node are the other parents of its children.

o Def: the Markov blanket of a node is the set containing the node’s parents, children, and co-parents.

o Thm: a node is conditionally independent of every other node in the graph given its Markov blanket.

o Example: the Markov blanket of X6 is {X3, X4, X5, X8, X9, X10}
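Since the example graph itself isn't reproduced in this transcript, the sketch below uses a hypothetical edge set chosen to be consistent with the stated answer; only the `markov_blanket` helper reflects the definitions above:

```python
def markov_blanket(node, parents):
    """Markov blanket = parents | children | co-parents (other parents of children)."""
    children = {c for c, ps in parents.items() if node in ps}
    coparents = {p for c in children for p in parents[c]} - {node}
    return parents.get(node, set()) | children | coparents

# Hypothetical edges consistent with the slide's answer for X6 (assumed):
parents = {
    "X6": {"X3", "X4"},    # X3, X4 are parents of X6
    "X8": {"X6", "X5"},    # X8 is a child of X6; X5 is a co-parent
    "X9": {"X6", "X10"},   # X9 is a child of X6; X10 is a co-parent
}
print(sorted(markov_blanket("X6", parents)))
# -> ['X10', 'X3', 'X4', 'X5', 'X8', 'X9']
```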

Yifeng Tao Carnegie Mellon University 14

[Slide from Matt Gormley.]

Page 15:

Markov Blanket

o Example: the Markov blanket of X6 is {X3, X4, X5, X8, X9, X10}

Yifeng Tao Carnegie Mellon University 15

[Slide from Matt Gormley.]

Page 16:

D-Separation

o Thm: if variables X and Z are d-separated given a set of variables E, then X and Z are conditionally independent given the set E.

o Def: variables X and Z are d-separated given a set of evidence variables E iff every path from X to Z is “blocked”.
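The “blocked path” test can be sketched for a single path; the helper name and toy graphs are assumptions, and the three middle-node cases are the canonical chain, fork, and collider:

```python
def path_blocked(path, parents, evidence):
    """True iff the given undirected path is blocked by evidence set `evidence`."""
    def descendants(n):
        kids = {c for c, ps in parents.items() if n in ps}
        out = set(kids)
        for k in kids:
            out |= descendants(k)
        return out

    for a, b, c in zip(path, path[1:], path[2:]):
        collider = a in parents.get(b, set()) and c in parents.get(b, set())
        if collider:
            # head-to-head: blocks unless b (or one of its descendants) is observed
            if b not in evidence and not (descendants(b) & evidence):
                return True
        elif b in evidence:
            # chain or fork: blocks when the middle node is observed
            return True
    return False

chain = {"Y": {"X"}, "Z": {"Y"}}   # X -> Y -> Z
collider = {"Y": {"X", "Z"}}       # X -> Y <- Z
assert path_blocked(["X", "Y", "Z"], chain, {"Y"})         # observing Y blocks
assert not path_blocked(["X", "Y", "Z"], chain, set())
assert path_blocked(["X", "Y", "Z"], collider, set())      # collider blocks...
assert not path_blocked(["X", "Y", "Z"], collider, {"Y"})  # ...until Y observed
```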

Yifeng Tao Carnegie Mellon University 16

[Slide from Matt Gormley.]

Page 17:

D-Separation

o Variables X and Z are d-separated given a set of evidence variables E iff every path from X to Z is “blocked”.

Yifeng Tao Carnegie Mellon University 17

[Slide from Eric Xing.]

Page 18:

Machine Learning

Yifeng Tao Carnegie Mellon University 18

[Slide from Matt Gormley.]

Page 19:

Recipe for Closed-form MLE

Yifeng Tao Carnegie Mellon University 19

[Slide from Matt Gormley.]

Page 20:

Learning Fully Observed BNs

o How do we learn these conditional and marginal distributions for a Bayes net?

Yifeng Tao Carnegie Mellon University 20

[Slide from Matt Gormley.]

Page 21:

Learning Fully Observed BNs

o Learning this fully observed Bayesian network is equivalent to learning five (small/simple) independent networks from the same data
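The decomposition means each CPT can be estimated by simple counting. A toy sketch for a two-node network A → B, with made-up binary data rather than the slide's five-node example:

```python
from collections import Counter

# Toy fully observed data for a two-node network A -> B (made-up, binary):
# each row is one joint observation (a, b).
data = [(0, 0), (0, 1), (1, 1), (1, 1), (0, 0), (1, 0)]

# MLE decomposes: estimate p(A) and p(B | A) independently by counting.
nA = Counter(a for a, _ in data)
pA = {a: c / len(data) for a, c in nA.items()}

nAB = Counter(data)
pB_A = {(a, b): nAB[(a, b)] / nA[a] for (a, b) in nAB}

# Sanity checks: p(A) normalizes, and p(B=1 | A=1) = 2/3 in this data.
assert abs(sum(pA.values()) - 1.0) < 1e-12
assert abs(pB_A[(1, 1)] - 2 / 3) < 1e-12
```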

Yifeng Tao Carnegie Mellon University 21

[Slide from Matt Gormley.]

Page 22:

Learning Fully Observed BNs

Yifeng Tao Carnegie Mellon University 22

[Slide from Matt Gormley.]

Page 23:

Learning Partially Observed BNs

o Partially observed Bayesian network:
o Maximum-likelihood estimation → incomplete log-likelihood
o The log-likelihood contains unobserved latent variables

o Solve with the EM algorithm
o Example: Gaussian mixture models (GMMs)
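A compact EM sketch for a two-component 1-D GMM; the synthetic data and the fixed unit variances are simplifying assumptions (a full GMM would also update the variances):

```python
import math
import random

# Synthetic 1-D data from two well-separated components (assumed).
random.seed(1)
data = [random.gauss(0, 1) for _ in range(200)] + \
       [random.gauss(5, 1) for _ in range(200)]

def normal(x, m):
    # Unit-variance Gaussian density (a simplification; real GMMs learn variances).
    return math.exp(-0.5 * (x - m) ** 2) / math.sqrt(2 * math.pi)

mu = [min(data), max(data)]  # crude initialization at the data extremes
pi = [0.5, 0.5]

for _ in range(50):
    # E-step: responsibilities = posterior over the latent component variable.
    r = [[pi[k] * normal(x, mu[k]) for k in (0, 1)] for x in data]
    r = [[a / (a + b), b / (a + b)] for a, b in r]
    # M-step: weighted maximum-likelihood updates for means and mixing weights.
    for k in (0, 1):
        nk = sum(row[k] for row in r)
        mu[k] = sum(row[k] * x for row, x in zip(r, data)) / nk
        pi[k] = nk / len(data)

print(sorted(round(m, 2) for m in mu))  # means should approach 0 and 5
```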

Yifeng Tao Carnegie Mellon University 23

[Slide from Eric Xing.]

Page 24:

Inference of BNs

o Suppose we already have the parameters of a Bayesian network...

Yifeng Tao Carnegie Mellon University 24

[Slide from Matt Gormley.]

Page 25:

Approaches to inference

o Exact inference algorithms
o The elimination algorithm → message passing
o Belief propagation
o The junction tree algorithm

o Approximate inference techniques
o Variational algorithms
o Stochastic simulation / sampling methods
o Markov chain Monte Carlo methods

Yifeng Tao Carnegie Mellon University 25

[Slide from Eric Xing.]

Page 26:

Marginalization and Elimination

Yifeng Tao Carnegie Mellon University 26

[Slide from Eric Xing.]

Page 27:

Marginalization and Elimination

Yifeng Tao Carnegie Mellon University 27

[Slide from Eric Xing.]

Page 28:

Yifeng Tao Carnegie Mellon University 28

[Slide from Eric Xing.]

Page 29:

o Step 8: Wrap-up

Yifeng Tao Carnegie Mellon University 29

[Slide from Eric Xing.]

Page 30:

Elimination algorithm

o Elimination on trees is equivalent to message passing along branches
o Message passing is consistent on trees

o Application: HMMs
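The HMM forward pass illustrates this equivalence: summing out the previous hidden state at each step is one elimination, and the running vector alpha is the message. All parameters below are made up for illustration:

```python
import itertools

# Made-up 3-state HMM with binary observations (an assumption).
T = [[0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1],
     [0.2, 0.2, 0.6]]          # T[i][j] = p(z_t = j | z_{t-1} = i)
E = [[0.9, 0.1],
     [0.5, 0.5],
     [0.1, 0.9]]               # E[j][x] = p(x_t = x | z_t = j)
prior = [1 / 3] * 3
obs = [0, 0, 1, 1]

# Message passing: alpha[j] = p(x_1..x_t, z_t = j); each update eliminates z_{t-1}.
alpha = [prior[j] * E[j][obs[0]] for j in range(3)]
for x in obs[1:]:
    alpha = [E[j][x] * sum(alpha[i] * T[i][j] for i in range(3))
             for j in range(3)]
likelihood = sum(alpha)        # p(x_1..x_T) in O(T * K^2) time

# Sanity check: brute-force elimination over all 3^4 state sequences agrees.
brute = 0.0
for zs in itertools.product(range(3), repeat=len(obs)):
    p = prior[zs[0]] * E[zs[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= T[zs[t - 1]][zs[t]] * E[zs[t]][obs[t]]
    brute += p
assert abs(likelihood - brute) < 1e-12
```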

Yifeng Tao Carnegie Mellon University 30

[Slide from Eric Xing.]

Page 31:

Gibbs Sampling

Yifeng Tao Carnegie Mellon University 31

[Slide from Matt Gormley.]

Page 32:

Gibbs Sampling

Yifeng Tao Carnegie Mellon University 32

[Slide from Matt Gormley.]

Page 33:

Gibbs Sampling

Yifeng Tao Carnegie Mellon University 33

[Slide from Matt Gormley.]

Page 34:

Gibbs Sampling

o Full conditionals only need to condition on the Markov blanket

o It must be “easy” to sample from the conditionals

o Many conditionals are log-concave and are amenable to adaptive rejection sampling
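A sketch of Gibbs sampling on the classic "sprinkler" network; the standard textbook CPTs used here are an assumption, since the slide's own example isn't reproduced in this transcript. Each unobserved variable is resampled from its full conditional, which by the Markov-blanket property depends only on the local factors:

```python
import random

random.seed(0)

# Classic sprinkler network (standard textbook CPTs, assumed):
# Cloudy -> {Sprinkler, Rain}; {Sprinkler, Rain} -> WetGrass.
def joint(s):
    c, sp, r, w = s["c"], s["s"], s["r"], s["w"]
    pc = 0.5
    ps = [0.5, 0.1][c]                     # p(Sprinkler=1 | Cloudy)
    pr = [0.2, 0.8][c]                     # p(Rain=1 | Cloudy)
    pw = [[0.0, 0.9], [0.9, 0.99]][sp][r]  # p(WetGrass=1 | Sprinkler, Rain)
    return ((pc if c else 1 - pc) * (ps if sp else 1 - ps)
            * (pr if r else 1 - pr) * (pw if w else 1 - pw))

state = {"c": 0, "s": 0, "r": 1, "w": 1}   # WetGrass = 1 is the evidence
hits, n = 0, 20000
for _ in range(n):
    for v in ("c", "s", "r"):              # never resample the evidence node
        # Full conditional p(v=1 | everything else) from the two joint values.
        p1 = joint({**state, v: 1})
        p0 = joint({**state, v: 0})
        state[v] = 1 if random.random() < p1 / (p0 + p1) else 0
    hits += state["r"]

print(hits / n)   # estimate of p(Rain=1 | WetGrass=1); exact value is ~0.708
```

Dividing by `p0 + p1` normalizes away every factor outside the variable's Markov blanket, which is why only the local CPTs matter.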

Yifeng Tao Carnegie Mellon University 34

[Slide from Matt Gormley.]

Page 35:

Take home message

o Graphical models portray the sparse dependencies among variables
o Two types of graphical models: Bayesian networks and Markov random fields
o Conditional independence, Markov blankets, and d-separation
o Learning fully observed and partially observed Bayesian networks
o Exact and approximate inference in Bayesian networks

Yifeng Tao Carnegie Mellon University 35

Page 36:

References

o Eric Xing, Ziv Bar-Joseph. 10701 Introduction to Machine Learning: http://www.cs.cmu.edu/~epxing/Class/10701/

o Matt Gormley. 10601 Introduction to Machine Learning: http://www.cs.cmu.edu/~mgormley/courses/10601/index.html

Yifeng Tao Carnegie Mellon University 36