![Page 1: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/1.jpg)
Florian [email protected]
Probabilistic refinement of cellular pathway models
Cambridge Statistical Laboratory Networks seminar series
2009 Jan 21
![Page 2: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/2.jpg)
What is a signaling pathway?
DNA
mRNA
Protein
Environmental stimuli
Receptor in cell membrane
Protein cascade
Transcription factors regulating target genes
Pathway
![Page 3: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/3.jpg)
Pathway reconstruction
• Signaling pathways are important
  - Deregulation causes many diseases, including cancer
• Signaling pathways are poorly understood
  - Only parts lists are available; the interactions within and between pathways are missing
• Biological research
  - So far mostly focused on individual genes
• New genome-scale datasets
  - Opportunity for data integration and novel methods
![Page 4: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/4.jpg)
What data do we have?
DNA
mRNA
Protein
mRNA:
- expression under different stimuli
Sequence:
- binding motifs
- epigenetic marks
Proteins:
- interactions between proteins
- binding to DNA
Morphology
Bulk of data: microarray
![Page 5: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/5.jpg)
Pathways as graphs
• Nodes are (mostly) known
• Goal: infer edges from data
• Data are heterogeneous
Nodes:
• binding motifs at genes
• protein domains
• functional annotation
Edges:
• co-expression between genes
• interactions between proteins
• binding of proteins to DNA
Paths:
• cause-effect data: changing environments, experimental perturbations
![Page 6: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/6.jpg)
Pathway reconstruction
“Classical” statistical approaches: treat the genes/proteins as random variables and explore the correlation structure in the data:
– Correlation graphs
– Gaussian graphical models (partial correlation)
– Bayesian networks
Review: Markowetz and Spang (2007)
Challenges/Problems/Opportunities
1. Correlation may be uninformative
2. Integrate heterogeneous, noisy, and complementary data sources
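The difference between a correlation graph and a Gaussian graphical model can be illustrated with a small simulation. The sketch below is not from the talk: it assumes a hypothetical three-variable chain x → z → y with Gaussian noise, and the helper names `pearson` and `partial_corr` are made up for illustration. It shows that x and y are strongly correlated, while their partial correlation given z is close to zero, so only the GGM would drop the indirect x-y edge.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def partial_corr(a, b, c):
    """First-order partial correlation of a and b given c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

# Hypothetical chain x -> z -> y (illustrative data, not from the talk).
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(5000)]
z = [xi + 0.5 * rng.gauss(0, 1) for xi in x]
y = [zi + 0.5 * rng.gauss(0, 1) for zi in z]

r_xy = pearson(x, y)                 # large: x and y are marginally correlated
r_xy_given_z = partial_corr(x, y, z) # near zero: z explains the dependence
print(f"corr(x, y) = {r_xy:.2f}, partial corr given z = {r_xy_given_z:.2f}")
```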
![Page 7: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/7.jpg)
– Part 1 –
Nested Effects Models
![Page 8: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/8.jpg)
Experimental perturbations
DNA
mRNA
Protein
RNAi
Knockout
Drugs
Small molecules
Stress
Readout: global gene expression measurements
![Page 9: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/9.jpg)
Drosophila immune response
Columns: perturbed genes
Rows: effects on other genes
1. Silencing tak1 reduces expression of all LPS-inducible transcripts
2. Silencing rel (key) or mkk4/hep reduces expression of subsets of induced transcripts
(Boutros et al., Dev Cell 2002)
![Page 10: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/10.jpg)
(!)
Two types of entities
• Components of the signaling pathway, which are experimentally perturbed
• Downstream effect reporters
![Page 11: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/11.jpg)
(!!)
Only indirect information
No direct observation of perturbation effects on other pathway components!
Inference from observed perturbation effects on downstream reporters.
![Page 12: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/12.jpg)
The information gap
[Figure: two copies of a pathway with components A, B, C, D. Phenotypic readouts: cell survival or death, growth rate, downstream genes.]
Direct information: effects are visible at other pathway components
Indirect information: effects are only visible at downstream reporters
![Page 13: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/13.jpg)
Correlation won’t do
[Figure: pathway with components A, B, C, D and its downstream regulated genes]
“Classical” approach:
- Correlation
- Graphical models: Bayes nets, GGMs
- Mutual information
Nested Effects Models
![Page 14: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/14.jpg)
Nested Effects Models
[Figure: phenotypic profiles (rows: gene perturbations A to H; columns: effects) next to the pathway over A to H inferred from them]
INPUT
1. Set of candidate pathway genes
2. High-dimensional phenotypic profile, e.g. microarray
OUTPUT
Graph representation of information flow explaining the phenotypes
![Page 15: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/15.jpg)
NEM: model formulation
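[Figure: model M over pathway genes X, Y, Z with effect reporters E1, …, E6; the expected effect pattern is compared with the observed one, in which false negatives (FN) and false positives (FP) are marked]
Pathway genes: X, Y, Z
• core topology
• to be reconstructed = Model M
Effect reporters: E1, …, E6
• states are observed = Data D
• positions in pathway unknown = Parameters θ
Posterior:
$$P(M \mid D) = \frac{1}{Z}\, P(D \mid M)\, P(M)$$
where P(D | M) is the marginal likelihood (Z here is the normalizing constant, not the pathway gene Z).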
![Page 16: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/16.jpg)
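Likelihood P(D | M, θ)
[Figure: pathway X, Y, Z with effect reporters E1 and E2]
Error probabilities, e.g. false negative rate 20%, false positive rate 5%
Compare predictions with observations:
Prediction: E1 = 0, E2 = 1
Observation 1: E1 = 1, E2 = 1
Observation 2: E1 = 0, E2 = 1
$$\mathrm{Lik} = \Pr(E_1 = 1) \cdot \Pr(E_2 = 1) \cdot \Pr(E_1 = 0) \cdot \Pr(E_2 = 1) = 0.05 \cdot 0.80 \cdot 0.95 \cdot 0.80$$
The factors are, in order: a false positive (5%), a true positive (1 − 20%), a true negative (1 − 5%), and a true positive (1 − 20%).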
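The likelihood computation on this slide can be retraced in a few lines. The sketch below assumes the error model stated on the slide (false negative rate 20%, false positive rate 5%) and the prediction and two observations listed above; the function names `obs_prob` and `likelihood` are hypothetical, chosen only for this illustration.

```python
ALPHA = 0.05  # false positive rate (from the slide)
BETA = 0.20   # false negative rate (from the slide)

def obs_prob(predicted, observed, alpha=ALPHA, beta=BETA):
    """Probability of one observed reporter state given its predicted state."""
    if predicted == 1:
        return 1 - beta if observed == 1 else beta   # true positive / false negative
    return alpha if observed == 1 else 1 - alpha     # false positive / true negative

def likelihood(prediction, observations):
    """Product over replicate observations and over effect reporters."""
    lik = 1.0
    for obs in observations:
        for reporter, predicted in prediction.items():
            lik *= obs_prob(predicted, obs[reporter])
    return lik

prediction = {"E1": 0, "E2": 1}
observations = [{"E1": 1, "E2": 1},   # observation 1
                {"E1": 0, "E2": 1}]   # observation 2

lik = likelihood(prediction, observations)
print(lik)  # 0.05 * 0.80 * 0.95 * 0.80, approximately 0.0304
```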
![Page 17: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/17.jpg)
Marginal likelihood
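$$P(D \mid M) = \int P(D \mid M, \Theta)\, P(\Theta \mid M)\, d\Theta = \frac{1}{n^{m}} \prod_{i=1}^{m} \sum_{j=1}^{n} \prod_{k=1}^{l} P(e_{ik} \mid M, \theta_i = j)$$
Reading the terms:
• Product over i = 1, …, m: all effect reporters
• Sum over j = 1, …, n with factor 1/n: average over possible positions in the pathway (uniform prior over positions)
• Product over k = 1, …, l: replicate observations
• P(e_ik | M, θ_i = j): distribution of a single effect reporter with known position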
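As a rough illustration of the marginal likelihood, the sketch below averages each reporter's likelihood over all possible attachment positions and multiplies across reporters. It makes several assumptions that are not on the slide: the model is encoded as a reachability matrix `reach`, each replicate records the reporter's state under perturbation of every pathway gene, the error rates are the 5%/20% example values, and the toy data are invented.

```python
ALPHA, BETA = 0.05, 0.20  # example false positive / false negative rates

def obs_prob(predicted, observed):
    """Probability of an observed 0/1 state given the predicted state."""
    if predicted == 1:
        return 1 - BETA if observed == 1 else BETA
    return ALPHA if observed == 1 else 1 - ALPHA

def marginal_likelihood(reach, data):
    """P(D | M) with a uniform prior over reporter positions.

    reach[s][j] == 1 if perturbing gene s is predicted to affect
    reporters attached to gene j (assumed encoding of the model M).
    data[i][k][s] is the observed state of reporter i in replicate k
    when gene s is perturbed.
    """
    n = len(reach)
    total = 1.0
    for reporter in data:                      # product over reporters i
        avg = 0.0
        for j in range(n):                     # average over positions j
            p = 1.0
            for replicate in reporter:         # product over replicates k
                for s, observed in enumerate(replicate):
                    p *= obs_prob(reach[s][j], observed)
            avg += p / n
        total *= avg
    return total

# Two candidate models over genes (X, Y, Z): X -> Y -> Z and its reverse.
chain = [[1, 1, 1], [0, 1, 1], [0, 0, 1]]
reverse = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]

# Invented toy data: three reporters, one replicate each.
data = [[(1, 1, 1)], [(1, 1, 0)], [(1, 0, 0)]]

ml_chain = marginal_likelihood(chain, data)
ml_reverse = marginal_likelihood(reverse, data)
print(ml_chain, ml_reverse)
```

With these toy data the nested pattern of effects favors the chain X → Y → Z over its reverse, which is the kind of comparison the model score is meant to support.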
![Page 18: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/18.jpg)
NEM: inference
Model space: all transitively closed directed graphs
• Exhaustive enumeration: score all models to find the one that fits the data best (Markowetz et al., Bioinformatics 2005)
• MCMC, simulated annealing: take small probabilistic steps to explore model space (with A. Tresch; in preparation)
• Divide and conquer: break a big model into smaller, manageable pieces and then reassemble them (Markowetz et al., ISMB 2007)
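To get a feel for why exhaustive enumeration only works for small pathways, the model space can be enumerated by brute force. This is a minimal sketch, not the published implementation: it assumes graphs are encoded as adjacency matrices with self-loops on the diagonal, and the function names are hypothetical.

```python
from itertools import product

def is_transitive(adj):
    """True if every two-step path i -> j -> k has the direct edge i -> k."""
    n = len(adj)
    return all(
        adj[i][k]
        for i in range(n) for j in range(n) for k in range(n)
        if adj[i][j] and adj[j][k]
    )

def transitively_closed_graphs(n):
    """Yield all transitively closed directed graphs on n labeled nodes."""
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    for bits in product((0, 1), repeat=len(pairs)):
        adj = [[int(i == j) for j in range(n)] for i in range(n)]
        for (i, j), bit in zip(pairs, bits):
            adj[i][j] = bit
        if is_transitive(adj):
            yield adj

counts = {n: sum(1 for _ in transitively_closed_graphs(n)) for n in (1, 2, 3)}
print(counts)  # {1: 1, 2: 4, 3: 29}
```

The brute-force loop visits 2^(n(n−1)) candidate graphs, so the search strategies listed above become necessary as soon as more than a handful of genes are involved.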
![Page 19: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/19.jpg)
NEM: extensions
• Drop the transitivity requirement
• Likelihood based on log-ratios of effects
• Feature selection to concentrate on informative effect reporters
Tresch and Markowetz (2008)
![Page 20: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/20.jpg)
NEMs on Drosophila data
![Page 21: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/21.jpg)
Summary of part 1
1. Gene perturbation screens with gene-expression readouts
2. Perturbation screens suffer from the information gap between pathways and reporters
3. Nested Effects Models reconstruct pathway features from subset relations between observed effects
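The subset idea in point 3 can be sketched for the idealized noise-free case. The example below is an illustration only: the effect sets are invented, the function name `nested_edges` is hypothetical, and real NEM inference scores models probabilistically instead of testing exact set inclusion.

```python
def nested_edges(effects):
    """Return edges a -> b whenever the effects of b are a subset of those of a."""
    return {
        (a, b)
        for a in effects
        for b in effects
        if a != b and effects[b] <= effects[a]
    }

# Invented, noise-free effect sets for three perturbed genes.
effects = {
    "A": {"E1", "E2", "E3", "E4"},
    "B": {"E2", "E3"},
    "C": {"E3"},
}

edges = nested_edges(effects)
print(sorted(edges))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

Because set inclusion is transitive, the resulting graph is automatically transitively closed; two genes with identical effect sets would receive edges in both directions.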
![Page 22: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/22.jpg)
– Part 2 –
Data integration and probabilistic refinement of a signaling pathway hypothesis
![Page 23: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/23.jpg)
Pathway refinement
1. Start from a given pathway hypothesis. Even if our understanding of pathways is poor, that does not mean we have none at all!
2. Evaluate evidence for the hypothesis in data
3. Identify weakly supported areas and likely extensions
Not reconstruction from scratch.
Step 1: assemble a pathway hypothesis (KEGG, literature, …) for the pheromone response pathway in yeast
![Page 24: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/24.jpg)
Edge data I
Support for hypothesis in protein-protein interaction data
![Page 25: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/25.jpg)
Edge data II
Support for hypothesis in co-expression data
![Page 26: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/26.jpg)
Edge data III
Why is it so hard to reconstruct the nuclear regulatory network from correlations?
![Page 27: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/27.jpg)
Edge data IV
Support for hypothesis in TF-DNA binding data
![Page 28: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/28.jpg)
Paths: cause-effect data
Expression profiling of knock-out mutants (Hughes et al., 2000)
Result: the transcriptional response to perturbation is only visible on downstream genes (information gap!)
![Page 29: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/29.jpg)
Conclusion from data analysis
• Every data source is informative for a specific compartment of the pathway
• No data source is informative in all compartments
• We expect these observations to hold for other MAPK and signaling pathways as well
Need: a compartment-specific integrative model encompassing edge, node, and path data.
![Page 30: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/30.jpg)
Integrative model
• Pathway graph as hidden/latent variables
• Conditional distributions for each data type
• Different data types contribute to each compartment
• Graphical model defines the posterior P(G | data); inference by Gibbs sampler
[Figure labels: Parameters, Prior]
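How a latent edge variable combines evidence from several data types can be shown with a deliberately simplified calculation. The sketch below is not the model from the talk and is not a Gibbs sampler: it treats a single edge in isolation, assumes the data types are conditionally independent given the edge, and uses invented probabilities; `edge_posterior` and the data-type names are hypothetical.

```python
def edge_posterior(prior, likelihoods, evidence):
    """Posterior probability that an edge exists, given binary evidence.

    likelihoods[d] = (P(x_d = 1 | edge), P(x_d = 1 | no edge))
    evidence[d]    = observed value (0 or 1) for data type d
    Assumes data types are conditionally independent given the edge.
    """
    p_edge, p_none = prior, 1.0 - prior
    for data_type, x in evidence.items():
        p1, p0 = likelihoods[data_type]
        p_edge *= p1 if x else 1.0 - p1
        p_none *= p0 if x else 1.0 - p0
    return p_edge / (p_edge + p_none)

# Invented conditional distributions for two data types.
likelihoods = {
    "ppi": (0.6, 0.05),           # protein-protein interaction observed
    "coexpression": (0.5, 0.2),   # genes co-expressed
}

post_both = edge_posterior(0.1, likelihoods, {"ppi": 1, "coexpression": 1})
post_none = edge_posterior(0.1, likelihoods, {"ppi": 0, "coexpression": 0})
print(round(post_both, 3), round(post_none, 3))
```

In a full graph model the edges are not independent, which is where a sampler over the whole graph, such as the Gibbs sampler named on the slide, would come in.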
![Page 31: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/31.jpg)
Evaluation
1. Fit model parameters on the pheromone response pathway (training)
2. Use the fitted model on other MAPK pathways (generalization to closely related examples)
3. Use the fitted model on all other yeast signaling pathways (generalization to everything else)
… work in progress …
![Page 32: Probabilistic refinement of cellular pathway models](https://reader033.vdocuments.net/reader033/viewer/2022042700/55500376b4c905bc138b538a/html5/thumbnails/32.jpg)
Acknowledgements
Nested Effects Models
Rainer Spang (Univ. Regensburg) .:. Dennis Kostka (UCSF) .:. Achim Tresch (Gene Center Munich) .:. Holger Fröhlich (DKFZ Heidelberg) .:. Tim Beißbarth (Univ. Göttingen) .:. Josh Stuart, Charlie Vaske (UCSC)
Data integration
Olga G. Troyanskaya (Princeton) .:. Edoardo Airoldi (Harvard) .:. David Blei (Princeton)