8/3/2019 Andrew Rosenberg- Lecture 7: Graphical Models Machine Learning
http://slidepdf.com/reader/full/andrew-rosenberg-lecture-7-graphical-models-machine-learning 1/43
Lecture 7: Graphical Models
Machine Learning
Andrew Rosenberg
March 5, 2010
Today
Graphical Models
Recap
Models we’ve looked at so far.
Linear Regression
Logistic Regression
Both make use of probabilistic models.
Graphical models are a way to structure and visualize probability models.
Probability Models
(Joint) Probability Tables.
We represent multinomial joint probabilities between K variables as K-dimensional tables.
$p(x) = p(\text{flu?}, \text{achiness?}, \text{headache?}, \dots, \text{temperature?})$
Assume D binary (“true/false”) variables.
How big is this table? $2^D$ entries.
The size of the probability table grows exponentially, which is related to the curse of dimensionality.
What if, rather than Bernoulli (binary) variables, we had multinomials with M choices?
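The growth described above can be tabulated directly. This is a minimal sketch that assumes a full joint table with one entry per combination of values: $2^D$ entries for binary variables and $M^D$ entries for M-way multinomials. The function name `full_joint_size` is hypothetical.

```python
def full_joint_size(num_vars: int, num_values: int = 2) -> int:
    """Entries in a full joint probability table over num_vars variables,
    each taking num_values values (num_values = 2 for true/false variables)."""
    return num_values ** num_vars

# Binary (Bernoulli) variables vs. multinomials with M = 3 choices.
for d in (5, 10, 20, 30):
    print(f"D={d:2d}  binary: {full_joint_size(d):>20,}  M=3: {full_joint_size(d, 3):>20,}")
```

Even with only 30 binary variables, the table needs more than a billion entries.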
Probability Models
What if the variables are independent?
$p(x) = p(\text{flu?}, \text{achiness?}, \text{headache?}, \dots, \text{temperature?})$
Recall, if x and y are independent:
$p(x, y) = p(x)\,p(y)$
The original probability distribution then factorizes:
$p(x) = p(\text{flu?})\,p(\text{achiness?})\,p(\text{headache?}) \cdots p(\text{temperature?})$
How big is this table (if each variable is binary)?
$p(\text{flu?}) = [.2, .8]$, $p(\text{headache?}) = [.6, .4]$, etc. Total size $= 2D$.
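Under full independence, the joint table can be rebuilt on demand from the small per-variable tables. Only $2D$ numbers are stored instead of $2^D$. This is a sketch: the flu and headache tables reuse the numbers above, the achiness table is made up, and the function `independent_joint` is hypothetical.

```python
from itertools import product

# One 2-entry table per binary variable (values indexed 0 and 1).
# flu and headache use the slide's numbers; achiness is a made-up example.
marginals = {
    "flu": [0.2, 0.8],
    "headache": [0.6, 0.4],
    "achiness": [0.3, 0.7],
}

def independent_joint(marginals):
    """Build the full joint p(x) = prod_i p(x_i) from per-variable tables."""
    names = list(marginals)
    joint = {}
    for values in product(range(2), repeat=len(names)):
        prob = 1.0
        for name, v in zip(names, values):
            prob *= marginals[name][v]
        joint[values] = prob
    return joint

joint = independent_joint(marginals)
stored = sum(len(table) for table in marginals.values())   # 2 * D numbers
print(len(joint), stored, round(sum(joint.values()), 10))  # 8 entries, 6 stored, sums to 1
```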
Graphical Models
Independence assumptions are convenient (e.g., in Naive Bayes), but rarely true.
More often, some groups of variables are dependent while others are independent.
Moreover, others are conditionally independent.
Conditional Independence
If x and z are conditionally independent given y, then:
$p(x, z \mid y) = p(x \mid y)\,p(z \mid y)$
but, in general,
$p(x, z) \neq p(x)\,p(z)$
e.g., y = flu?, x = achiness?, z = headache?. Written as:
$x \perp\!\!\!\perp z \mid y$
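The flu example can be checked numerically. The joint is built so that achiness and headache depend only on flu. The code then tests both the conditional factorization and marginal independence. All probabilities below are made-up illustrative values, and the helper `marginal` is hypothetical.

```python
from itertools import product

# Hypothetical numbers for y = flu?, x = achiness?, z = headache?  (1 = yes, 0 = no).
p_y = {1: 0.1, 0: 0.9}
p_x_given_y = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}   # indexed [y][x]
p_z_given_y = {1: {1: 0.7, 0: 0.3}, 0: {1: 0.2, 0: 0.8}}   # indexed [y][z]

# Joint over (x, y, z) built as p(y) p(x|y) p(z|y), so x and z are linked only through y.
joint = {(x, y, z): p_y[y] * p_x_given_y[y][x] * p_z_given_y[y][z]
         for x, y, z in product((0, 1), repeat=3)}

def marginal(positions):
    """Sum the joint over every variable whose position (0=x, 1=y, 2=z) is not listed."""
    out = {}
    for values, p in joint.items():
        key = tuple(values[i] for i in positions)
        out[key] = out.get(key, 0.0) + p
    return out

m_x, m_z, m_y = marginal((0,)), marginal((2,)), marginal((1,))
m_xz, m_xy, m_zy = marginal((0, 2)), marginal((0, 1)), marginal((2, 1))

# x ⊥⊥ z | y:  p(x, z | y) == p(x | y) p(z | y) for every value of x, y, z.
cond_indep = all(
    abs(joint[(x, y, z)] / m_y[(y,)]
        - (m_xy[(x, y)] / m_y[(y,)]) * (m_zy[(z, y)] / m_y[(y,)])) < 1e-12
    for x, y, z in product((0, 1), repeat=3))

# Marginal independence would need p(x, z) == p(x) p(z) for every value.
marg_indep = all(abs(m_xz[(x, z)] - m_x[(x,)] * m_z[(z,)]) < 1e-12
                 for x, z in product((0, 1), repeat=2))

print(cond_indep, marg_indep)  # True False with these numbers
```

With these numbers, $p(x{=}1, z{=}1) = 0.074$ while $p(x{=}1)\,p(z{=}1) = 0.17 \times 0.25 = 0.0425$. So the variables are conditionally independent but not marginally independent.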
Factorization of the joint
Assume $x \perp\!\!\!\perp z \mid y$.
How do you factorize $p(x, y, z)$?
$p(x, y, z) = p(x, z \mid y)\,p(y)$
$= p(x \mid y)\,p(z \mid y)\,p(y)$
What if x and z are not conditionally independent?
$p(x, y, z) = p(x, z \mid y)\,p(y)$
$= p(x \mid y, z)\,p(z \mid y)\,p(y)$
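The two factorizations above can be contrasted on an arbitrary joint table. The general form $p(x \mid y, z)\,p(z \mid y)\,p(y)$ reproduces any joint exactly. The shortcut $p(x \mid y)\,p(z \mid y)\,p(y)$ only works when $x \perp\!\!\!\perp z \mid y$. This is a sketch on a randomly generated joint with no independence built in; the helper `prob` is hypothetical.

```python
import random
from itertools import product

random.seed(0)
# An arbitrary joint over binary (x, y, z): random weights, normalized. No independence assumed.
weights = {k: random.random() for k in product((0, 1), repeat=3)}
total = sum(weights.values())
joint = {k: w / total for k, w in weights.items()}

def prob(**fixed):
    """Probability of a partial assignment, e.g. prob(y=1, z=0), by summing the joint."""
    pos = {"x": 0, "y": 1, "z": 2}
    return sum(p for k, p in joint.items()
               if all(k[pos[n]] == v for n, v in fixed.items()))

chain_err = 0.0   # error of p(x | y, z) p(z | y) p(y): valid for any joint
ci_err = 0.0      # error of p(x | y) p(z | y) p(y): valid only if x ⊥⊥ z | y
for x, y, z in product((0, 1), repeat=3):
    p_z_given_y = prob(y=y, z=z) / prob(y=y)
    chain = prob(x=x, y=y, z=z) / prob(y=y, z=z) * p_z_given_y * prob(y=y)
    ci = prob(x=x, y=y) / prob(y=y) * p_z_given_y * prob(y=y)
    chain_err = max(chain_err, abs(chain - joint[(x, y, z)]))
    ci_err = max(ci_err, abs(ci - joint[(x, y, z)]))

print(chain_err, ci_err)  # first is ~0 (rounding only); second is generally not
```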
Structure of Graphical Models
Graphical models allow us to represent dependence relationships between variables visually.
Graphical models are graphs:
Nodes: random variables
Edges: dependence relationships
No edge: no direct dependence (the variables are independent, possibly only conditionally on other variables)
Direction of the edge: indicates a parent-child relationship (like causality, but not exactly)
Child: Destination of the edge – Response
Parent: Source of the edge – Trigger
The (directed) graphical models in this lecture are always Directed Acyclic Graphs (DAGs).
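One way to store such a graph is a map from each node to its parents. A depth-first search can then check the DAG requirement while also producing an order in which every parent precedes its children. This is a minimal sketch, and `topological_order` is a hypothetical helper rather than a library function.

```python
def topological_order(parents):
    """Return nodes ordered so that every parent precedes its children.

    `parents` maps each node to a list of its parents (edge parent -> child).
    Raises ValueError if the directed graph has a cycle, i.e. is not a DAG.
    """
    order, state = [], {}          # state: 1 = on the current DFS path, 2 = finished

    def visit(node):
        if state.get(node) == 2:
            return
        if state.get(node) == 1:
            raise ValueError(f"cycle through {node!r}: not a DAG")
        state[node] = 1
        for parent in parents.get(node, []):
            visit(parent)          # place all parents before this node
        state[node] = 2
        order.append(node)

    for node in parents:
        visit(node)
    return order

# cloudy -> raining -> wet ground
print(topological_order({"cloudy": [], "raining": ["cloudy"], "wet": ["raining"]}))

try:
    topological_order({"a": ["b"], "b": ["a"]})   # a <-> b is a cycle
except ValueError as err:
    print("rejected:", err)
```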
Some example models
Independence: $p(x, y) = p(x)\,p(y)$
[Figure: nodes x and y with no edge]
Dependence: $p(x, y) = p(x \mid y)\,p(y)$
[Figure: edge y → x]
The parents of a node i are denoted $\pi_i$ or $\mathrm{pa}_i$.
Factorization of the joint in a graphical model:
$p(x_0, \dots, x_{n-1}) = \prod_{i=0}^{n-1} p(x_i \mid \mathrm{pa}_i) = \prod_{i=0}^{n-1} p(x_i \mid \pi_i)$
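The factorization above can be evaluated node by node. For each node, look up $p(x_i \mid \pi_i)$ in that node's conditional probability table (CPT) and multiply. This sketch uses the two-node dependence example $p(x, y) = p(x \mid y)\,p(y)$ with made-up numbers. The function `joint_probability` and the CPT key layout (parent values first, then the node's own value) are assumptions for illustration.

```python
from itertools import product

def joint_probability(assignment, parents, cpts):
    """p(x_0, ..., x_{n-1}) = prod_i p(x_i | pi_i) for one full assignment.

    assignment: dict node -> value
    parents:    dict node -> tuple of parent nodes (pi_i)
    cpts:       dict node -> {(parent values..., own value): probability}
    """
    prob = 1.0
    for node, pa in parents.items():
        key = tuple(assignment[p] for p in pa) + (assignment[node],)
        prob *= cpts[node][key]
    return prob

# y -> x, so p(x, y) = p(x | y) p(y).  Numbers are made up.
parents = {"y": (), "x": ("y",)}
cpts = {
    "y": {(0,): 0.7, (1,): 0.3},
    "x": {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6},   # key = (y, x)
}
total = sum(joint_probability({"x": x, "y": y}, parents, cpts)
            for x, y in product((0, 1), repeat=2))
print(total)  # 1 up to rounding: the product of conditionals is a valid joint
```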
Basic Graphical Models
Independent variables.
[Figure: nodes x, y, z with no edges]
Observations.
[Figure: nodes x, y, z with an observed node shaded grey]
When we observe a variable (i.e., know its value from data), we color the corresponding node grey.
Observing a variable allows us to condition on it.
E.g., $p(x, z \mid y)$. Given an observation of any variable, we can compute the (conditional) pdfs of the other variables.
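Conditioning on an observed (grey) node can be done by brute-force enumeration on small discrete models. Keep only the joint entries consistent with the observation, sum out nothing else, and renormalize by $p(\text{evidence})$. This is a sketch: `condition` is a hypothetical helper, and the joint is a made-up table over binary x, y, z.

```python
from itertools import product

def condition(joint, names, evidence):
    """Return p(unobserved variables | evidence) from a full joint table.

    joint:    dict mapping value tuples (ordered as `names`) to probabilities
    evidence: dict of observed variables, e.g. {"y": 1}
    """
    keep = [i for i, n in enumerate(names) if n not in evidence]
    unnormalized = {}
    for values, p in joint.items():
        if all(values[names.index(n)] == v for n, v in evidence.items()):
            key = tuple(values[i] for i in keep)
            unnormalized[key] = unnormalized.get(key, 0.0) + p
    z = sum(unnormalized.values())          # = p(evidence)
    return {k: p / z for k, p in unnormalized.items()}

# Hypothetical joint over binary (x, y, z): uniform weights except one cell, then normalized.
names = ("x", "y", "z")
weights = {v: 1.0 for v in product((0, 1), repeat=3)}
weights[(1, 1, 1)] = 5.0
total = sum(weights.values())
joint = {v: w / total for v, w in weights.items()}

p_xz_given_y1 = condition(joint, names, {"y": 1})   # a table over (x, z)
print(p_xz_given_y1)
```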
Example Graphical Models
Basic Graphical models.
Markov Chain
[Figure: Markov chain x → y → z]
$p(x, y, z) = \prod_{n \in \{x, y, z\}} p(n \mid \pi_n) = p(x)\,p(y \mid x)\,p(z \mid y)$
x = cloudy?, y = raining?, z = wet ground?
Is $x \perp\!\!\!\perp z \mid y$? That is, does $p(x, z \mid y) = p(x \mid y)\,p(z \mid y)$?
$p(x, z \mid y) = p(x \mid z, y)\,p(z \mid y)$
$p(x \mid z, y) = \frac{p(x, y, z)}{p(y, z)} = \frac{p(x)\,p(y \mid x)\,p(z \mid y)}{p(y)\,p(z \mid y)} = \frac{p(x)\,p(y \mid x)}{p(y)} = \frac{p(x, y)}{p(y)} = p(x \mid y)$
$p(x, z \mid y) = p(x \mid y)\,p(z \mid y)$
$x \perp\!\!\!\perp z \mid y$
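The key step of the derivation, $p(x \mid z, y) = p(x \mid y)$, can be confirmed numerically for the cloudy → raining → wet ground chain. The code also shows that x and z remain dependent when y is not observed. The CPT numbers are made up, and the helper `p` is hypothetical.

```python
from itertools import product

# Hypothetical CPTs for x = cloudy?, y = raining?, z = wet ground?  (1 = yes, 0 = no)
p_x = {1: 0.5, 0: 0.5}
p_y_given_x = {1: {1: 0.6, 0: 0.4}, 0: {1: 0.1, 0: 0.9}}   # indexed [x][y]
p_z_given_y = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}}   # indexed [y][z]

# Markov chain factorization: p(x, y, z) = p(x) p(y | x) p(z | y)
joint = {(x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_y[y][z]
         for x, y, z in product((0, 1), repeat=3)}

def p(x=None, y=None, z=None):
    """Probability of the given values; a None argument is summed out."""
    return sum(v for (a, b, c), v in joint.items()
               if (x is None or a == x) and (y is None or b == y) and (z is None or c == z))

# p(x | z, y) == p(x | y) for every value combination, as in the derivation.
for x, y, z in product((0, 1), repeat=3):
    assert abs(p(x, y, z) / p(y=y, z=z) - p(x=x, y=y) / p(y=y)) < 1e-12

# Without observing y, cloudy and wet ground are NOT independent here.
print(p(x=1, z=1), p(x=1) * p(z=1))   # 0.31 vs 0.2225 (up to rounding)
```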
One cause, two effects
[Figure: x ← y → z]
$p(x, y, z) = \prod_{n \in \{x, y, z\}} p(n \mid \pi_n) = p(x \mid y)\,p(y)\,p(z \mid y)$
x = achiness?, y = flu?, z = fever?
Is $x \perp\!\!\!\perp z \mid y$? That is, does $p(x, z \mid y) = p(x \mid y)\,p(z \mid y)$?
$p(x, z \mid y) = p(x \mid z, y)\,p(z \mid y)$
$p(x \mid z, y) = \frac{p(x, y, z)}{p(y, z)} = \frac{p(x \mid y)\,p(y)\,p(z \mid y)}{p(y)\,p(z \mid y)} = p(x \mid y)$
$p(x, z \mid y) = p(x \mid y)\,p(z \mid y)$
$x \perp\!\!\!\perp z \mid y$
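For the achiness ← flu → fever structure, the same check passes. Once flu status is known, fever adds nothing about achiness. When flu is unobserved, however, seeing a fever does raise the belief in achiness. All numbers are hypothetical, and the helper `prob` is illustrative.

```python
from itertools import product

# Hypothetical CPTs: y = flu?, x = achiness?, z = fever?  (1 = yes, 0 = no)
p_y = {1: 0.1, 0: 0.9}
p_x_given_y = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}}    # indexed [y][x]
p_z_given_y = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.05, 0: 0.95}}  # indexed [y][z]

# One cause, two effects: p(x, y, z) = p(x | y) p(y) p(z | y)
joint = {(x, y, z): p_x_given_y[y][x] * p_y[y] * p_z_given_y[y][z]
         for x, y, z in product((0, 1), repeat=3)}

def prob(**fixed):
    """Probability that the named variables take the given values."""
    pos = {"x": 0, "y": 1, "z": 2}
    return sum(v for k, v in joint.items()
               if all(k[pos[n]] == val for n, val in fixed.items()))

# Given flu status y, fever z carries no extra information about achiness x.
for x, y, z in product((0, 1), repeat=3):
    assert abs(prob(x=x, y=y, z=z) / prob(y=y, z=z) - prob(x=x, y=y) / prob(y=y)) < 1e-12

# With y unobserved, fever does change our belief about achiness.
print(prob(x=1), prob(x=1, z=1) / prob(z=1))   # 0.27 vs 0.648 (up to rounding)
```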
Two causes, one effect
[Figure: x → y ← z]
$p(x, y, z) = \prod_{n \in \{x, y, z\}} p(n \mid \pi_n) = p(x)\,p(y \mid x, z)\,p(z)$
x = rain?, y = wet sidewalk?, z = spilled coffee?
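Is $x \perp\!\!\!\perp z \mid y$? That is, does $p(x, z \mid y) = p(x \mid y)\,p(z \mid y)$?
$p(x, z \mid y) = p(x \mid z, y)\,p(z \mid y)$
$p(x \mid z, y) = \frac{p(x, y, z)}{p(y, z)} = \frac{p(x)\,p(y \mid x, z)\,p(z)}{p(y \mid z)\,p(z)} = \frac{p(x)\,p(y \mid x, z)}{p(y \mid z)}$, which in general does not equal $p(x \mid y)$.
$x \not\perp\!\!\!\perp z \mid y$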
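With two causes and one effect, the pattern reverses. Rain and spilled coffee are independent a priori, but once the sidewalk is observed to be wet, learning about the coffee lowers the probability of rain ("explaining away"). The CPT numbers are hypothetical, and `prob` is an illustrative helper.

```python
from itertools import product

# Hypothetical CPTs: x = rain?, z = spilled coffee?, y = wet sidewalk?  (1 = yes, 0 = no)
p_x = {1: 0.2, 0: 0.8}
p_z = {1: 0.1, 0: 0.9}
p_y1_given_xz = {(0, 0): 0.01, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.99}  # p(y=1 | x, z)

# Two causes, one effect: p(x, y, z) = p(x) p(y | x, z) p(z)
joint = {}
for x, y, z in product((0, 1), repeat=3):
    p_y = p_y1_given_xz[(x, z)] if y == 1 else 1 - p_y1_given_xz[(x, z)]
    joint[(x, y, z)] = p_x[x] * p_y * p_z[z]

def prob(**fixed):
    """Probability that the named variables take the given values."""
    pos = {"x": 0, "y": 1, "z": 2}
    return sum(v for k, v in joint.items()
               if all(k[pos[n]] == val for n, val in fixed.items()))

# Marginally, rain and spilled coffee are independent: p(x, z) = p(x) p(z).
print(prob(x=1, z=1), prob(x=1) * prob(z=1))

# Given a wet sidewalk, also learning about the coffee changes the belief in rain.
rain_given_wet = prob(x=1, y=1) / prob(y=1)
rain_given_wet_and_coffee = prob(x=1, y=1, z=1) / prob(y=1, z=1)
print(rain_given_wet, rain_given_wet_and_coffee)   # roughly 0.697 vs 0.216
```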
A more complicated factorization
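[Figure: a DAG over x_0, ..., x_5 with edges x_0 → x_1, x_0 → x_2, x_1 → x_3, x_2 → x_4, x_1 → x_5, x_4 → x_5, as implied by the factorization]
$p(x_0, x_1, x_2, x_3, x_4, x_5) = ?$
$= p(x_0)\,p(x_1 \mid x_0)\,p(x_2 \mid x_0)\,p(x_3 \mid x_1)\,p(x_4 \mid x_2)\,p(x_5 \mid x_1, x_4)$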
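The six-node factorization can be evaluated by reading each node's parents off the product. Summing over all $2^6 = 64$ assignments confirms that the product of conditionals is a proper joint distribution. This is a sketch that assumes binary variables. The CPT rule in `p_node` (probability of 1 rises with the fraction of parents equal to 1) is invented purely for illustration.

```python
from itertools import product

# Parent sets read off the factorization p(x0) p(x1|x0) p(x2|x0) p(x3|x1) p(x4|x2) p(x5|x1,x4).
parents = {0: (), 1: (0,), 2: (0,), 3: (1,), 4: (2,), 5: (1, 4)}

def p_node(i, value, assignment):
    """Hypothetical CPT: p(x_i = 1 | pa_i) = 0.2 + 0.6 * (fraction of parents equal to 1)."""
    pa = parents[i]
    frac = sum(assignment[j] for j in pa) / len(pa) if pa else 0.5
    p_one = 0.2 + 0.6 * frac
    return p_one if value == 1 else 1.0 - p_one

def joint(assignment):
    """Product over nodes of p(x_i | pa_i) for one full assignment (a 6-tuple of 0/1)."""
    prob = 1.0
    for i in parents:
        prob *= p_node(i, assignment[i], assignment)
    return prob

total = sum(joint(a) for a in product((0, 1), repeat=6))
print(round(total, 12))   # 1.0
```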
How big are the probability tables?
$p(x_0, x_1, x_2, x_3, x_4, x_5) = p(x_0)\,p(x_1 \mid x_0)\,p(x_2 \mid x_0)\,p(x_3 \mid x_1)\,p(x_4 \mid x_2)\,p(x_5 \mid x_1, x_4)$
$p(x_0) =$
$p(x_1 \mid x_0) =$
$p(x_2 \mid x_0) =$
$p(x_3 \mid x_1) =$
$p(x_4 \mid x_2) =$
$p(x_5 \mid x_1, x_4) =$
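Under the assumption that every variable is binary, each table holds $2^{|\pi_i| + 1}$ entries. Because each conditional distribution sums to 1, it has $2^{|\pi_i|}$ free parameters. This sketch counts both for the network above and compares them with the full joint table. The helper names are hypothetical.

```python
# Parent sets of x_0, ..., x_5, read off the factorization above.
parents = {0: (), 1: (0,), 2: (0,), 3: (1,), 4: (2,), 5: (1, 4)}

def table_entries(node, num_values=2):
    """Entries in the table p(x_node | pa_node): one per (parent values, own value) combination."""
    return num_values ** (len(parents[node]) + 1)

def free_parameters(node, num_values=2):
    """Each conditional distribution (one per parent configuration) must sum to 1."""
    return (num_values - 1) * num_values ** len(parents[node])

entries = {i: table_entries(i) for i in parents}
print(entries)                                                  # {0: 2, 1: 4, 2: 4, 3: 4, 4: 4, 5: 8}
print(sum(entries.values()), 2 ** len(parents))                 # 26 vs 64
print(sum(free_parameters(i) for i in parents), 2 ** len(parents) - 1)  # 13 vs 63
```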
Model Parameters as nodes
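If we model the parameters, θ, as a random variable, we can include them in the graphical model.
[Figure: data nodes x_0 and x_1 with a parameter node θ]
Multivariate Bernoulli
[Figure: nodes x_0, x_1, x_2, each with its own parameter node µ_i]
Multinomial
[Figure: nodes x_0, x_1, x_2 with a single parameter node µ]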
Continuous models
Graphical models can incorporate both discrete and continuous nodes.
[Figure: nodes x_0, x_1, x_2 and a node α]
Naive Bayes
Naive Bayes Classification.
[Figure: class node y with children x_0, x_1, x_2]
The observation variables $x_i$ are each independent given the class $y$.
The distribution of each variable is fit separately, using maximum likelihood.
It is easy to combine multinomial, Bernoulli, and continuous (e.g., Gaussian) distributions for the different variables.
$p(y \mid x_0, x_1, x_2) \propto p(x_0, x_1, x_2 \mid y)\,p(y)$
$p(y \mid x_0, x_1, x_2) \propto p(x_0 \mid y)\,p(x_1 \mid y)\,p(x_2 \mid y)\,p(y)$
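The per-variable maximum-likelihood fit and the proportional posterior above can be sketched for binary (Bernoulli) features. A Gaussian or multinomial feature would simply swap in a different per-variable likelihood. The data, class labels, and function names below are made up for illustration, and no smoothing is applied, so a feature value never seen with a class gets probability 0.

```python
from collections import Counter, defaultdict

def fit_naive_bayes(X, y):
    """Maximum-likelihood estimates of p(y) and of p(x_i = 1 | y), each fit separately."""
    class_counts = Counter(y)
    prior = {c: n / len(y) for c, n in class_counts.items()}
    ones = defaultdict(lambda: [0] * len(X[0]))     # per class: count of x_i == 1
    for row, c in zip(X, y):
        for i, v in enumerate(row):
            ones[c][i] += v
    likelihood = {c: [k / class_counts[c] for k in ones[c]] for c in class_counts}
    return prior, likelihood

def posterior(x, prior, likelihood):
    """p(y | x_0, ..., x_{D-1}) ∝ p(y) prod_i p(x_i | y), normalized over the classes."""
    scores = {}
    for c, p_c in prior.items():
        s = p_c
        for v, theta in zip(x, likelihood[c]):
            s *= theta if v == 1 else 1 - theta
        scores[c] = s
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

# Toy, made-up data: three binary symptoms (x_0, x_1, x_2) and a class label y.
X = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, 0], [1, 0, 0], [0, 1, 0]]
y = ["flu", "flu", "flu", "healthy", "healthy", "healthy"]
prior, likelihood = fit_naive_bayes(X, y)
post = posterior([1, 1, 0], prior, likelihood)
print(post)   # flu: 4/7, healthy: 3/7 (up to rounding)
```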
Graphical Models
Graph representation of dependency relationships
Directed Acyclic Graph (DAG)
Nodes are random variables
Edges define dependence relationships.
What can we do with graphical models?
Learn parameters – to fit data
Understand the independence relationships between variables
Perform inference (marginals and conditionals)
Compute likelihoods for classification
Bye
Next
More fun with Graphical Models