TRANSCRIPT
School of Computer Science
Probabilistic Graphical Models
Gaussian graphical models and Ising models: modeling networks
Eric Xing, Lecture 10, February 18, 2013
Reading: see class website
Where do networks come from? The Jesus network
Evolving networks
Can I get his vote?
Cooperativity, antagonism, cliques, … over time?
[Senate networks at three time points: March 2005, January 2006, August 2006]
Evolving networks
[Figure: a sequence of network snapshots over time, t = 1, 2, 3, …, T]
Recall: Multivariate Gaussian
Density:
$$p(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}} \exp\left\{-\tfrac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right\}$$
WOLG: let $\mu = 0$ and $Q = \Sigma^{-1}$, so that
$$p(x_1,\ldots,x_n \mid \mu = 0, Q) = \frac{|Q|^{1/2}}{(2\pi)^{n/2}} \exp\left\{-\tfrac{1}{2}\sum_{i} q_{ii}\,x_i^2 - \sum_{i<j} q_{ij}\,x_i x_j\right\}$$
We can view this as a continuous Markov random field with potentials defined on every node and edge.
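To make the node-and-edge potential view concrete, here is a minimal numerical sketch (not from the original slides; it only assumes numpy): a zero entry $q_{ij}$ in a hand-built sparse precision matrix corresponds to conditional independence of nodes $i$ and $j$ given the rest.

```python
# Sketch: zeros in the precision matrix Q = missing edges = conditional independencies.
import numpy as np

# Hand-built sparse precision matrix for a 4-node chain 1-2-3-4:
# q_13 = q_14 = q_24 = 0, so those pairs share no edge.
Q = np.array([[ 2.0, -1.0,  0.0,  0.0],
              [-1.0,  2.0, -1.0,  0.0],
              [ 0.0, -1.0,  2.0, -1.0],
              [ 0.0,  0.0, -1.0,  2.0]])
Sigma = np.linalg.inv(Q)           # covariance matrix: generally dense

# The marginal covariance between nodes 1 and 3 is non-zero...
print(Sigma[0, 2])                 # dense: dependence flows along the chain
# ...but the partial correlation rho_{13|rest} = -q_13 / sqrt(q_11 q_33) is zero.
print(-Q[0, 2] / np.sqrt(Q[0, 0] * Q[2, 2]))   # 0.0: no edge (1,3) in the GGM
```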
Gaussian Graphical Model
[Figure: microarray samples grouped by cell type; the model encodes dependencies among genes]
Precision Matrix Encodes Non-Zero Edges in a Gaussian Graphical Model
An edge corresponds to a non-zero element of the precision matrix.
Markov versus Correlation Network
A correlation network is based on the covariance matrix.
A GGM is a Markov network based on the precision matrix; conditional independence / partial correlation coefficients are a more sophisticated dependence measure.
With small sample size, the empirical covariance matrix cannot be inverted.
Sparsity
One common assumption to make: sparsity.
Makes biological sense: genes are assumed to interact with only small groups of other genes.
Makes statistical sense: learning becomes feasible in high dimensions with a small sample size.
Network Learning with the LASSO
Assume the network is a Gaussian graphical model.
Perform a LASSO regression of each target node on all the remaining nodes.
Network Learning with the LASSO
The LASSO can select the neighborhood of each node.
L1 Regularization (LASSO)
A convex relaxation.
Constrained form: $\hat{\beta} = \arg\min_{\beta} \|y - X\beta\|_2^2 \ \text{ s.t. } \ \|\beta\|_1 \le t$
Lagrangian form: $\hat{\beta} = \arg\min_{\beta} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1$
Enforces sparsity!
[Figure: target node y regressed on nodes x_1, …, x_n with weights β_1, …, β_n]
Theoretical Guarantees
Assumptions:
Dependency condition: relevant covariates are not overly dependent.
Incoherence condition: a large number of irrelevant covariates cannot be too correlated with the relevant covariates.
Strong concentration bounds: sample quantities converge to their expected values quickly.
If these assumptions are met, the LASSO will asymptotically recover the correct subset of relevant covariates.
Network Learning with the LASSO
Repeat this for every node, then form the total edge set (see the sketch below).
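A minimal sketch of the per-node regressions just described, assuming scikit-learn's Lasso and a hypothetical penalty lam; the OR/AND rule for combining the p neighborhoods into one edge set follows Meinshausen and Bühlmann:

```python
# Sketch: Lasso neighborhood selection for every node, then the total edge set.
import numpy as np
from sklearn.linear_model import Lasso

def lasso_edge_set(X, lam=0.1, rule="or"):
    """X: (n samples x p variables); lam: hypothetical penalty; rule: 'or'/'and'."""
    n, p = X.shape
    nbr = [set() for _ in range(p)]
    for i in range(p):
        y = X[:, i]                            # target node
        Z = np.delete(X, i, axis=1)            # all remaining nodes
        w = Lasso(alpha=lam).fit(Z, y).coef_   # sparse regression weights
        others = [j for j in range(p) if j != i]
        nbr[i] = {others[k] for k in np.flatnonzero(w)}
    # Combine the p estimated neighborhoods into one undirected edge set.
    edges = set()
    for i in range(p):
        for j in nbr[i]:
            if rule == "or" or i in nbr[j]:    # OR keeps any hit; AND needs both
                edges.add((min(i, j), max(i, j)))
    return edges
```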
Consistent Structure Recovery [Meinshausen and Bühlmann 2006; Wainwright 2009]
If [the regularization parameter is chosen appropriately] and if [the sample size is large enough relative to the dimension and sparsity], then with high probability the estimated edge set equals the true edge set.
Why does this algorithm work? What is the intuition behind graphical regression?
Continuous nodal attributes
Discrete nodal attributes
Are there other algorithms?
More general scenarios: non-iid samples and evolving networks
Case study
Multivariate Gaussian
Density:
$$p(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}} \exp\left\{-\tfrac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right\}$$
A joint Gaussian:
$$p(x_1, x_2 \mid \mu, \Sigma) = \mathcal{N}\!\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \middle|\, \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}\right)$$
How do we write down $p(x_2)$, $p(x_1 \mid x_2)$ or $p(x_2 \mid x_1)$ using the block elements in $\mu$ and $\Sigma$? Formulas to remember:
$$p(x_2) = \mathcal{N}(x_2 \mid m_2, V_2), \qquad m_2 = \mu_2, \quad V_2 = \Sigma_{22}$$
$$p(x_1 \mid x_2) = \mathcal{N}(x_1 \mid m_{1|2}, V_{1|2}), \qquad m_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2), \quad V_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$$
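A quick numerical check of these conditioning formulas (a sketch added here, assuming numpy; the numbers are arbitrary), conditioning the first coordinate on the remaining two:

```python
# Sketch: m_{1|2} = mu1 + S12 S22^{-1}(x2 - mu2), V_{1|2} = S11 - S12 S22^{-1} S21.
import numpy as np

mu = np.array([1.0, 2.0, 0.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
# Block 1 = first coordinate; block 2 = the last two coordinates.
S11, S12 = Sigma[:1, :1], Sigma[:1, 1:]
S21, S22 = Sigma[1:, :1], Sigma[1:, 1:]
x2 = np.array([1.5, -0.5])                    # observed value of block 2

m_1g2 = mu[:1] + S12 @ np.linalg.solve(S22, x2 - mu[1:])   # conditional mean
V_1g2 = S11 - S12 @ np.linalg.solve(S22, S21)              # conditional variance
print(m_1g2, V_1g2)
```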
The matrix inverse lemma
Consider a block-partitioned matrix:
$$M = \begin{bmatrix} E & F \\ G & H \end{bmatrix}$$
First we diagonalize $M$:
$$\begin{bmatrix} I & -FH^{-1} \\ 0 & I \end{bmatrix} \begin{bmatrix} E & F \\ G & H \end{bmatrix} \begin{bmatrix} I & 0 \\ -H^{-1}G & I \end{bmatrix} = \begin{bmatrix} E - FH^{-1}G & 0 \\ 0 & H \end{bmatrix}$$
Schur complement: $M/H = E - FH^{-1}G$.
Then we invert, using the formula $XYZ = W \Rightarrow Y^{-1} = ZW^{-1}X$:
$$M^{-1} = \begin{bmatrix} I & 0 \\ -H^{-1}G & I \end{bmatrix} \begin{bmatrix} (M/H)^{-1} & 0 \\ 0 & H^{-1} \end{bmatrix} \begin{bmatrix} I & -FH^{-1} \\ 0 & I \end{bmatrix} = \begin{bmatrix} (M/H)^{-1} & -(M/H)^{-1}FH^{-1} \\ -H^{-1}G(M/H)^{-1} & H^{-1} + H^{-1}G(M/H)^{-1}FH^{-1} \end{bmatrix}$$
Equivalently, in terms of $M/E = H - GE^{-1}F$:
$$M^{-1} = \begin{bmatrix} E^{-1} + E^{-1}F(M/E)^{-1}GE^{-1} & -E^{-1}F(M/E)^{-1} \\ -(M/E)^{-1}GE^{-1} & (M/E)^{-1} \end{bmatrix}$$
Matrix inverse lemma:
$$\left(E - FH^{-1}G\right)^{-1} = E^{-1} + E^{-1}F\left(H - GE^{-1}F\right)^{-1}GE^{-1}$$
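A numerical sanity check of the block inverse and the lemma (a sketch added here, assuming numpy; the random positive definite matrix is arbitrary):

```python
# Sketch: verify the block-inverse identity and the matrix inverse lemma.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
M = A @ A.T + 5 * np.eye(5)            # positive definite, hence invertible
E, F = M[:2, :2], M[:2, 2:]
G, H = M[2:, :2], M[2:, 2:]

MH = E - F @ np.linalg.solve(H, G)     # Schur complement M/H
lhs = np.linalg.inv(M)[:2, :2]         # top-left block of M^{-1}
print(np.allclose(lhs, np.linalg.inv(MH)))   # True: (M^{-1})_11 = (M/H)^{-1}

# Lemma: (E - F H^{-1} G)^{-1} = E^{-1} + E^{-1} F (H - G E^{-1} F)^{-1} G E^{-1}
Ei = np.linalg.inv(E)
rhs = Ei + Ei @ F @ np.linalg.inv(H - G @ Ei @ F) @ G @ Ei
print(np.allclose(np.linalg.inv(MH), rhs))   # True
```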
The covariance and the precision matrices
Partition the covariance matrix with respect to a single node (say, node 1):
$$\Sigma = \begin{bmatrix} \sigma_{11} & \vec{\sigma}_1^{\,T} \\ \vec{\sigma}_1 & \Sigma_{-1} \end{bmatrix}$$
Applying the block-inverse formula
$$M^{-1} = \begin{bmatrix} (M/H)^{-1} & -(M/H)^{-1}FH^{-1} \\ -H^{-1}G(M/H)^{-1} & H^{-1} + H^{-1}G(M/H)^{-1}FH^{-1} \end{bmatrix}$$
gives the matching partition of the precision matrix:
$$Q = \Sigma^{-1} = \begin{bmatrix} q_{11} & \vec{q}_1^{\,T} \\ \vec{q}_1 & Q_{-1} \end{bmatrix} = \begin{bmatrix} \left(\sigma_{11} - \vec{\sigma}_1^{\,T}\Sigma_{-1}^{-1}\vec{\sigma}_1\right)^{-1} & -q_{11}\,\vec{\sigma}_1^{\,T}\Sigma_{-1}^{-1} \\ -q_{11}\,\Sigma_{-1}^{-1}\vec{\sigma}_1 & \Sigma_{-1}^{-1} + q_{11}\,\Sigma_{-1}^{-1}\vec{\sigma}_1\vec{\sigma}_1^{\,T}\Sigma_{-1}^{-1} \end{bmatrix}$$
Single-node Conditional
The conditional distribution of a single node $i$ given the rest of the nodes follows from the conditional-Gaussian formulas above:
$$p(x_1 \mid x_{-1}) = \mathcal{N}\!\left(x_1 \,\middle|\, \mu_1 + \vec{\sigma}_1^{\,T}\Sigma_{-1}^{-1}(x_{-1} - \mu_{-1}),\ \sigma_{11} - \vec{\sigma}_1^{\,T}\Sigma_{-1}^{-1}\vec{\sigma}_1\right)$$
WOLG: let $\mu = 0$. Reading off the blocks of $Q = \Sigma^{-1}$ from the previous slide, this becomes
$$p(x_1 \mid x_{-1}) = \mathcal{N}\!\left(x_1 \,\middle|\, -\tfrac{1}{q_{11}}\vec{q}_1^{\,T} x_{-1},\ \tfrac{1}{q_{11}}\right)$$
Conditional auto-regression
From the single-node conditional above, we can write the following conditional auto-regression function for each node:
$$x_i = \sum_{j \ne i} \beta_{ij}\, x_j + \epsilon_i, \qquad \beta_{ij} = -\frac{q_{ij}}{q_{ii}}, \quad \epsilon_i \sim \mathcal{N}\!\left(0, \tfrac{1}{q_{ii}}\right)$$
Neighborhood estimation is then based on the auto-regression coefficients: $x_j$ is a neighbor of $x_i$ exactly when $\beta_{ij} \ne 0$, i.e., when $q_{ij} \ne 0$.
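A small simulation (a sketch added here, assuming numpy; the chain-structured Q is arbitrary) confirming that the weights $\beta_{ij} = -q_{ij}/q_{ii}$ match the least-squares regression of $x_i$ on the remaining variables:

```python
# Sketch: auto-regression weights from Q vs. empirical regression coefficients.
import numpy as np

rng = np.random.default_rng(1)
Q = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])
Sigma = np.linalg.inv(Q)
X = rng.multivariate_normal(np.zeros(3), Sigma, size=100000)

i = 0
beta_theory = -Q[i, 1:] / Q[i, i]                        # from the precision matrix
beta_ols, *_ = np.linalg.lstsq(X[:, 1:], X[:, i], rcond=None)
print(beta_theory, beta_ols)                             # agree up to sampling noise
```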
Conditional independence
From the auto-regression view, given an estimate of the neighborhood $s_i$, we have
$$x_i \perp x_j \mid x_{s_i} \quad \text{for all } j \notin s_i$$
Thus the neighborhood $s_i$ defines the Markov blanket of node $i$.
Recent trends in GGM:
Covariance selection (classical method):
Dempster [1972]: sequentially pruning the smallest elements of the precision matrix
Drton and Perlman [2008]: improved statistical tests for pruning
Serious limitation in practice: breaks down when the covariance matrix is not invertible
L1-regularization based methods (hot!):
Meinshausen and Bühlmann [Ann. Stat. 06]: used LASSO regression for neighborhood selection
Banerjee [JMLR 08]: block sub-gradient algorithm for finding the precision matrix
Friedman et al. [Biostatistics 08]: efficient fixed-point equations based on a sub-gradient algorithm
…
Structure learning is possible even when # variables > # samples
The Meinshausen-Bühlmann (MB) algorithm:
Solve a separate Lasso for every single variable:
Step 1: Pick one variable (the k-th node).
Step 2: Think of it as "y", and the rest as "z".
Step 3: Solve the Lasso regression problem between y and z.
Step 4: Connect the k-th node to those nodes with non-zero weights in w.
Note: the resulting coefficients do not correspond to the entries of Q element-wise.
L1-regularized maximum likelihood learning
Input: sample covariance matrix S
Assumes standardized data (mean = 0, variance = 1). S is generally rank-deficient, so its inverse does not exist.
Output: sparse precision matrix Q
Originally Q is defined as the inverse of S, but S is not directly invertible; we need to find a sparse matrix that can be thought of as an inverse of S.
Approach: solve an L1-regularized maximum likelihood problem:
$$\hat{Q} = \arg\max_{Q \succ 0}\ \underbrace{\log\det Q - \mathrm{tr}(SQ)}_{\text{log likelihood}}\ -\ \underbrace{\lambda \|Q\|_1}_{\text{regularizer}}$$
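A minimal sketch of this approach using scikit-learn's GraphicalLasso solver (the penalty alpha, playing the role of λ above, and the random data are hypothetical):

```python
# Sketch: L1-penalized maximum likelihood estimation of the precision matrix.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 10))          # n=50 samples, p=10 standardized variables

model = GraphicalLasso(alpha=0.2).fit(X)
Q = model.precision_                   # sparse estimate of the precision matrix
edges = np.argwhere(np.triu(np.abs(Q) > 1e-8, k=1))
print(edges)                           # non-zero off-diagonals = estimated edges
```

With independent random data as above, few or no edges should survive the penalty; on real correlated data the non-zero pattern of Q is the estimated graph.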
From matrix optimization to vector optimization: a coupled Lasso for every single variable
Focus on one row (column) at a time, keeping the others constant.
The optimization problem for that row can be shown to be a Lasso problem (L1-regularized quadratic programming).
Difference from MB's algorithm: the resulting Lasso problems are coupled. The remaining part of the matrix is not actually constant; it changes after each Lasso problem is solved. This coupling is essential for stability under noise.
Learning an Ising Model (i.e., a pairwise MRF)
Assuming the nodes are discrete (e.g., the voting outcome of a person) and the edges are weighted, then for a sample $x \in \{-1,+1\}^p$ we have
$$p(x \mid \Theta) \propto \exp\left\{\sum_i \theta_i x_i + \sum_{(i,j)} \theta_{ij}\, x_i x_j\right\}$$
It can be shown that the pseudo-conditional likelihood of node k is logistic:
$$p(x_k \mid x_{-k}) = \frac{1}{1 + \exp\left(-2x_k\left(\theta_k + \sum_{j \ne k} \theta_{kj}\, x_j\right)\right)}$$
Learning an Ising Model (i.e., a pairwise MRF)
Assuming the nodes are discrete and the edges are weighted, as above, it can be shown, following the same logic, that we can use L1-regularized logistic regression to obtain a sparse estimate of the neighborhood of each variable in the discrete case.
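A minimal sketch of this discrete neighborhood selection (the binary data X and the regularization strength C are hypothetical; C plays the role of an inverse λ):

```python
# Sketch: L1-regularized logistic regression of node k on the remaining nodes.
import numpy as np
from sklearn.linear_model import LogisticRegression

def ising_neighborhood(X, k, C=0.5):
    """X: (n samples x p nodes) with entries in {-1,+1}; returns node k's neighbors."""
    y = X[:, k]                                  # node k's spins as labels
    Z = np.delete(X, k, axis=1)                  # the remaining nodes as features
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(Z, y)
    others = [j for j in range(X.shape[1]) if j != k]
    return {others[i] for i in np.flatnonzero(clf.coef_[0])}

# Purely illustrative call on random binary data:
X = np.where(np.random.default_rng(3).random((200, 6)) < 0.5, -1, 1)
print(ising_neighborhood(X, k=0))
```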
New Problem: Evolving Social Networks
Can I get his vote?
Cooperativity, antagonism, cliques, … over time?
[Senate networks at three time points: March 2005, January 2006, August 2006]
Reverse engineering time-specific "rewiring" networks
[Timeline from T0 to TN with a query time t*; n = 1 or some small number of samples per time point]
Example: Drosophila development
Inference I
KELLER: Kernel-Weighted L1-regularized Logistic Regression [Song, Kolar and Xing, Bioinformatics 09]
A Lasso-style constrained convex optimization.
Estimates time-specific networks one by one, based on "virtual iid" samples.
Can scale to ~10^4 genes, but under stronger smoothness assumptions.
Algorithm: nonparametric neighborhood selection
Neighborhood selection by a time-specific graph regression on the conditional likelihood: estimate, at time t,
$$\hat{\theta}_i^{\,t} = \arg\min_{\theta_i}\ \sum_{s} w_t(s)\, \ell\!\left(\theta_i;\, x^{(s)}\right) + \lambda \|\theta_i\|_1$$
where $\ell$ is the conditional (logistic) log-loss for node i, and
$$w_t(s) = \frac{K_h(s - t)}{\sum_{s'} K_h(s' - t)}$$
are kernel weights centered at the query time t.
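A sketch of the kernel reweighting idea, not the authors' implementation (the bandwidth h, strength C, and data layout are hypothetical): each sample is weighted by a Gaussian kernel in time, then a weighted L1 logistic regression selects the time-t neighborhood.

```python
# Sketch: kernel-weighted L1 logistic regression for one node at query time t.
import numpy as np
from sklearn.linear_model import LogisticRegression

def keller_neighborhood(X, times, i, t, h=0.1, C=1.0):
    """X: (n x p) in {-1,+1}; times: (n,) sample timestamps; returns neighbors of i at t."""
    w = np.exp(-((times - t) ** 2) / (2 * h ** 2))   # Gaussian kernel weights
    w /= w.sum()                                     # "virtual iid" reweighting
    y = X[:, i]
    Z = np.delete(X, i, axis=1)
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    clf.fit(Z, y, sample_weight=w)                   # time-specific regression
    others = [j for j in range(X.shape[1]) if j != i]
    return {others[k] for k in np.flatnonzero(clf.coef_[0])}
```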
Structural consistency of KELLER
Assumptions (with the relevant quantities defined as in the paper):
A1: Dependency condition
A2: Incoherence condition
A3: Smoothness condition
A4: Bounded kernel
Theorem [Kolar and Xing, 09]
Senate network: 109th Congress
Voting records from the 109th Congress (2005-2006). There are 100 senators whose votes were recorded on 542 bills; each vote is a binary outcome.
Senate network: 109th Congress
[Estimated networks at three time points: March 2005, January 2006, August 2006]
Senator Chafee
Senator Ben Nelson
[Estimated neighborhoods at t = 0.2 and t = 0.8]
Progression and Reversion of Breast Cancer Cells
Cell types: S1 (normal), T4 (malignant), T4 revert 1, T4 revert 2, T4 revert 3
Estimate Neighborhoods Jointly Across All Cell Types
[Tree over the cell types S1, T4, T4R1, T4R2, T4R3]
How do we share information across the cell types?
Penalize differences between networks of adjacent cell types
Sparsity of difference
[Tree over the cell types S1, T4, T4R1, T4R2, T4R3, with penalties on edges between adjacent types]
Tree-Guided Graphical Lasso (Treegl)
Objective: RSS for all cell types + sparsity + sparsity of difference (a schematic form is sketched below).
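Schematically, and as a paraphrase rather than the exact objective from the paper (here $B^{(t)}$ denotes the regression coefficients for cell type $t$, and $\pi(t)$ its adjacent cell type in the tree), the three terms named above combine as:

```latex
\min_{\{B^{(t)}\}}\;
\underbrace{\sum_{t} \big\|X^{(t)} - X^{(t)} B^{(t)}\big\|_F^2}_{\text{RSS for all cell types}}
\;+\; \lambda_1 \underbrace{\sum_{t} \big\|B^{(t)}\big\|_1}_{\text{sparsity}}
\;+\; \lambda_2 \underbrace{\sum_{t} \big\|B^{(t)} - B^{(\pi(t))}\big\|_1}_{\text{sparsity of difference}}
```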
Network Overview
[Figure: recovered networks, highlighting EGFR-ITGB1, PI3K-MAPKK, and MMP interactions in the S1 and T4 cell types]
Summary
Gaussian graphical model: the precision matrix encodes the structure; it is not estimable when p >> n.
Neighborhood selection: conditional distributions under a GGM/MRF; graphical lasso; sparsistency.
Time-varying Markov networks: kernel reweighting estimators; total variation estimators.