probabilistic refinement of cellular pathway models

Florian Markowetz. Probabilistic refinement of cellular pathway models. Cambridge Statistical Laboratory, Networks seminar series, 2009 Jan 21


DESCRIPTION

Talk at the Networks seminar series of the Cambridge Statistical Laboratory

TRANSCRIPT

Page 1: Probabilistic refinement of cellular pathway models

Florian Markowetz

Probabilistic refinement of cellular pathway models

Cambridge Statistical Laboratory Networks seminar series

2009 Jan 21

Page 2: Probabilistic refinement of cellular pathway models

What is a signaling pathway?

[Figure labels: Environmental stimuli; Receptor in cell membrane; Protein cascade; Transcription factors regulating target genes; DNA; mRNA; Protein; Pathway]

Page 3: Probabilistic refinement of cellular pathway models

Pathway reconstruction

Signaling pathways are important
- Deregulation causes many diseases, incl. cancer

Signaling pathways are poorly understood
- Only parts lists; missing are interactions within and between pathways

Biological research
- So far mostly focused on individual genes

New genome-scale datasets
- Opportunity for data integration and novel methods

Page 4: Probabilistic refinement of cellular pathway models

What data do we have?

[Figure labels: DNA, mRNA, Protein]

mRNA:
- Expression under different stimuli
- binding to DNA

Sequence:
- binding motifs
- epigenetic marks

Proteins:
- interactions between proteins
- binding to DNA

Morphology

Bulk of data: Microarray

Page 5: Probabilistic refinement of cellular pathway models

Pathways as graphs

- Nodes are (mostly) known
- Goal: infer edges from data
- Data are heterogeneous

Nodes
- binding motifs at genes
- Protein domains
- Functional annotation

Edges
- co-expression between genes
- interactions between proteins
- binding of proteins to DNA

Paths
- Cause-effect data: changing environments, experimental perturbations

Page 6: Probabilistic refinement of cellular pathway models

Pathway reconstruction

“Classical” statistical approaches: treat the genes/proteins as random variables and explore the correlation structure in the data:
- Correlation graphs
- Gaussian graphical models (partial correlation)
- Bayesian networks

Review: Markowetz and Spang (2007)

Challenges/Problems/Opportunities
1. Correlation may be uninformative
2. Integrate heterogeneous, noisy, and complementary data sources
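To illustrate the difference between a correlation graph and a Gaussian graphical model, here is a minimal Python sketch on simulated data. The chain a -> b -> c, the noise levels, and all variable names are assumptions made for the illustration; nothing here comes from the talk's data. Marginal correlation links a and c, while the partial correlation (computed from the inverse covariance matrix) is close to zero once b is accounted for.

```python
import numpy as np

# Simulated chain a -> b -> c (hypothetical data, for illustration only)
rng = np.random.default_rng(0)
n = 5000
a = rng.normal(size=n)
b = a + rng.normal(size=n)
c = b + rng.normal(size=n)
X = np.column_stack([a, b, c])

# Correlation graph: marginal correlations between all pairs
corr = np.corrcoef(X, rowvar=False)

# Gaussian graphical model: partial correlations from the precision matrix,
# rho_ij = -P_ij / sqrt(P_ii * P_jj)
precision = np.linalg.inv(np.cov(X, rowvar=False))
d = np.sqrt(np.diag(precision))
partial = -precision / np.outer(d, d)
np.fill_diagonal(partial, 1.0)

print("corr(a, c)        =", round(corr[0, 2], 2))
print("partial(a, c | b) =", round(partial[0, 2], 2))
```

A correlation graph would draw an edge between a and c here; a partial-correlation graph would not, because the dependence is fully mediated by b.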

Page 7: Probabilistic refinement of cellular pathway models

– Part 1 –

Nested Effects Models

Page 8: Probabilistic refinement of cellular pathway models

Experimental perturbations

[Figure labels: DNA, mRNA, Protein]

- RNAi
- Knockout
- Drugs, small molecules
- Stress

Readout: Global gene expression measurements

Page 9: Probabilistic refinement of cellular pathway models

Drosophila immune response

Columns: perturbed genes
Rows: effects on other genes

1. Silencing tak1 reduces expression of all LPS-inducible transcripts
2. Silencing rel (key) or mkk4/hep reduces expression of subsets of induced transcripts

(Boutros et al., Dev Cell 2002)
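The subset pattern described above can be made concrete with a small Python sketch. The transcript names t1 to t4 and the effect sets are hypothetical toy values, not the Boutros et al. measurements; the helper `upstream_candidates` is likewise an illustrative name. The idea is only that a gene whose silencing effects strictly contain those of another gene is a candidate for lying upstream of it.

```python
# Hypothetical toy effect sets: which transcripts lose expression
# when each pathway gene is silenced (not real measurements).
effects = {
    "tak1": {"t1", "t2", "t3", "t4"},
    "rel": {"t1", "t2"},
    "mkk4/hep": {"t3", "t4"},
}

def upstream_candidates(effects):
    """Return pairs (a, b) where the effects of b are a proper subset of the effects of a."""
    pairs = []
    for a, effects_a in effects.items():
        for b, effects_b in effects.items():
            if a != b and effects_b < effects_a:  # proper subset test
                pairs.append((a, b))
    return pairs

pairs = upstream_candidates(effects)
print(pairs)
```

With noise-free toy data this plain set comparison is enough; real screens contain false positives and false negatives, so exact subset tests have to be replaced by a probabilistic score.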

Page 10: Probabilistic refinement of cellular pathway models

(!) Two types of entities

- Components of the signaling pathway, which are experimentally perturbed
- Downstream effect reporters

Page 11: Probabilistic refinement of cellular pathway models

(!!) Only indirect information

No direct observation of perturbation effects on other pathway components!

Inference from observed perturbation effects on downstream reporters.

Page 12: Probabilistic refinement of cellular pathway models

The information gap

[Figure: pathway diagrams with components A, B, C, D; downstream reporters: cell survival or death, growth rate, downstream genes]

Direct information: effects are visible at other pathway components

Indirect information: effects are only visible at downstream reporters

Page 13: Probabilistic refinement of cellular pathway models

Correlation won’t do

[Figure: pathway (A, B, C, D) and downstream regulated genes]

“Classical” approach: correlation, mutual information, graphical models (Bayes nets, GGMs)

Nested Effects Models

Page 14: Probabilistic refinement of cellular pathway models

Nested Effects Models

[Figure: phenotypic profiles (rows: gene perturbations A to H; columns: effects) next to the inferred pathway on genes A to H]

INPUT
1. Set of candidate pathway genes
2. High-dimensional phenotypic profile, e.g. microarray

OUTPUT
Graph representation of information flow explaining the phenotypes

Page 15: Probabilistic refinement of cellular pathway models

NEM: model formulation

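[Figure: pathway model M’xyz on genes X, Y, Z with effect reporters E1, …, E6; expected and observed effect patterns of X, Y, Z across E1, …, E6, with entries of the observed pattern marked as false negatives (FN) and false positives (FP)]

Pathway genes: X, Y, Z
- core topology
- to be reconstructed
= Model M

Effect reporters: E1, …, E6
- states are observed = Data D
- positions in pathway unknown = Parameters θ

Posterior:

$$P(M \mid D) = \frac{1}{Z} \cdot P(D \mid M) \cdot P(M)$$

P(D | M) is the marginal likelihood.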

Page 16: Probabilistic refinement of cellular pathway models

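Likelihood P( D | M, θ )

[Figure: pathway genes X, Y, Z with effect reporters E1 and E2]

Error probabilities, e.g. false negative rate 20%, false positive rate 5%

Compare predictions with observations:

Prediction: E1 = 0, E2 = 1
Observation 1: E1 = 1, E2 = 1
Observation 2: E1 = 0, E2 = 1

$$\mathrm{Lik} = \Pr(E_1 = 1) \cdot \Pr(E_2 = 1) \cdot \Pr(E_1 = 0) \cdot \Pr(E_2 = 1) = 0.05 \cdot 0.80 \cdot 0.95 \cdot 0.80$$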
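The comparison of predictions with observations can be written out as a short Python sketch. It assumes the error rates stated on the slide (false positive rate 5%, false negative rate 20%) and independent observations; the function name `likelihood` and the dictionary layout are illustrative choices, not part of any NEM software.

```python
def likelihood(predicted, observations, alpha=0.05, beta=0.20):
    """Likelihood of binary observations given predicted reporter states.

    alpha: false positive rate, P(observe 1 | predicted 0)
    beta:  false negative rate, P(observe 0 | predicted 1)
    """
    lik = 1.0
    for obs in observations:
        for reporter, pred in predicted.items():
            if pred == 1:
                p = (1 - beta) if obs[reporter] == 1 else beta
            else:
                p = alpha if obs[reporter] == 1 else (1 - alpha)
            lik *= p
    return lik

predicted = {"E1": 0, "E2": 1}
observations = [
    {"E1": 1, "E2": 1},  # E1 is a false positive, E2 a true positive
    {"E1": 0, "E2": 1},  # E1 is a true negative, E2 a true positive
]
lik = likelihood(predicted, observations)
print(round(lik, 4))  # 0.0304
```

Each factor is one of the four entries of the error model, so the result is 0.05 · 0.80 · 0.95 · 0.80.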

Page 17: Probabilistic refinement of cellular pathway models

Marginal likelihood

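$$P(D \mid M) = \int P(D \mid M, \Theta) \, P(\Theta \mid M) \, d\Theta$$

$$= \frac{1}{n^m} \prod_{i=1}^{m} \sum_{j=1}^{n} \prod_{k=1}^{l} P(e_{ik} \mid M, \theta_i = j)$$

- Product over all effect reporters (i = 1, …, m)
- Average over possible positions in the pathway (j = 1, …, n), with a uniform prior over positions (factor 1/n per reporter)
- Product over replicate observations (k = 1, …, l)
- P(e_ik | M, θ_i = j): distribution of a single effect reporter with known position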
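A minimal Python sketch of this marginal likelihood follows, under several assumptions that are mine rather than the slide's: the model M is a transitively closed adjacency matrix with self-loops (M[s, j] = 1 if perturbing gene s affects reporters attached to gene j), each observation comes from perturbing one known pathway gene, data are binary, and the error rates are the 5% and 20% used earlier. The function name `marginal_likelihood` and its arguments are hypothetical.

```python
import numpy as np

def marginal_likelihood(D, M, perturbed, alpha=0.05, beta=0.20):
    """(1/n^m) * prod_i sum_j prod_k P(e_ik | M, theta_i = j).

    D:         (m reporters) x (l experiments) binary effect matrix
    M:         (n x n) transitively closed adjacency matrix with self-loops
    perturbed: index of the perturbed pathway gene in each of the l experiments
    """
    D = np.asarray(D)
    M = np.asarray(M)
    n = M.shape[0]
    # predicted[j, k] = 1 if a reporter attached to gene j responds in experiment k
    predicted = M[perturbed, :].T
    # Probability of observing an effect, given the prediction
    p_one = np.where(predicted == 1, 1 - beta, alpha)
    total = 1.0
    for e_i in D:                                    # product over reporters
        probs = np.where(e_i == 1, p_one, 1 - p_one) # shape (n, l)
        total *= probs.prod(axis=1).sum() / n        # average over positions
    return total

# Chain X -> Y -> Z (transitively closed), one perturbation per gene
chain = np.array([[1, 1, 1],
                  [0, 1, 1],
                  [0, 0, 1]])
perturbed = [0, 1, 2]
# Noise-free reporters attached to X, Y and Z respectively
D = np.array([[1, 0, 0],
              [1, 1, 0],
              [1, 1, 1]])

score_true = marginal_likelihood(D, chain, perturbed)
score_reversed = marginal_likelihood(D, chain.T, perturbed)
print(score_true > score_reversed)
```

On this toy data the chain that generated the profiles scores higher than the reversed chain, which is the behaviour the model comparison relies on.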

Page 18: Probabilistic refinement of cellular pathway models

NEM: inference

Model space: all transitively closed directed graphs

Exhaustive enumeration: score all models to find the one fitting the data best

Markowetz et al. Bioinformatics, 2005

MCMC, Simulated Annealing: take small probabilistic steps to explore model space

. . . with A Tresch; in preparation

Divide and conquer: break a big model into smaller, manageable pieces and then re-assemble

Markowetz et al. ISMB 2007
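To give a sense of why exhaustive enumeration only works for small pathways, here is a Python sketch that lists all transitively closed directed graphs on n nodes by brute force. Representing each graph as an adjacency matrix with self-loops is an assumption of this sketch, and `transitively_closed_graphs` is an illustrative name.

```python
from itertools import product
import numpy as np

def transitively_closed_graphs(n):
    """Yield adjacency matrices (with self-loops) of all transitively closed graphs on n nodes."""
    off_diagonal = [(i, j) for i in range(n) for j in range(n) if i != j]
    for bits in product([0, 1], repeat=len(off_diagonal)):
        A = np.eye(n, dtype=int)
        for (i, j), bit in zip(off_diagonal, bits):
            A[i, j] = bit
        # With self-loops present, A is transitive exactly when the
        # two-step reachability matrix adds no new edges.
        if np.array_equal((A @ A > 0).astype(int), A):
            yield A

models = list(transitively_closed_graphs(3))
print(len(models))  # 29
```

Three pathway genes already give 29 candidate models and four give 355; the count grows so quickly that larger pathways need heuristic search or decomposition into smaller pieces.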

Page 19: Probabilistic refinement of cellular pathway models

NEM: extensions

- Drop transitivity requirement
- Likelihood based on log-ratios of effects
- Feature selection to concentrate on informative effect reporters

Tresch and Markowetz (2008)

Page 20: Probabilistic refinement of cellular pathway models

NEMs on Drosophila data

Page 21: Probabilistic refinement of cellular pathway models

Summary of part 1

1. Gene perturbation screens with gene-expression readouts
2. Perturbation screens suffer from the information gap between pathways and reporters
3. Nested Effects Models reconstruct pathway features from subset relations between observed effects

Page 22: Probabilistic refinement of cellular pathway models

– Part 2 –

Data integration and probabilistic refinement of a signaling pathway hypothesis

Page 23: Probabilistic refinement of cellular pathway models

Pathway refinement

1. Start from a given pathway hypothesis
   Even if our understanding of pathways is poor, that does not mean we have none at all!
2. Evaluate evidence for the hypothesis in data
3. Identify weakly supported areas and likely extensions

Not reconstruction from scratch.

Step 1: assemble pathway hypothesis (KEGG, literature, …) for the pheromone response pathway in yeast

Page 24: Probabilistic refinement of cellular pathway models

Edge data I

Support for hypothesis in protein-protein interaction data

Page 25: Probabilistic refinement of cellular pathway models

Edge data II

Support for hypothesis in co-expression data

Page 26: Probabilistic refinement of cellular pathway models

Edge data III

Why is it so hard to reconstruct the nuclear regulatory network from correlations?

Page 27: Probabilistic refinement of cellular pathway models

Edge data IV

Support for hypothesis in TF-DNA binding data

Page 28: Probabilistic refinement of cellular pathway models

Paths: cause-effect data

Expression profiling of knock-out mutants (Hughes et al., 2000)

Result: transcriptional response to perturbation is only visible on downstream genes (information gap!)

Page 29: Probabilistic refinement of cellular pathway models

Conclusion from data analysis

Every data source is informative for a specific compartment of the pathway

No data source is informative in all compartments

We expect these observations also to hold for other MAPK and signaling pathways.

Need compartment-specific integrative model encompassing edge, node, and path data.

Page 30: Probabilistic refinement of cellular pathway models

Integrative model

Pathway graph as hidden/latent variables

Conditional distributions for each data type

Different data types contribute to each compartment

Graphical model defines posterior P(G | data) -> inference by Gibbs sampler

[Figure labels: Parameters, Prior]
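As a much simplified illustration of treating an edge as a latent variable with one conditional distribution per data type, the Python sketch below computes the posterior probability of a single edge. It assumes the data types are conditionally independent given the edge and uses made-up probabilities. It is not the talk's model, which couples edges in a full graphical model and uses a Gibbs sampler; `edge_posterior` is a hypothetical helper.

```python
def edge_posterior(prior, likelihoods):
    """Posterior probability that an edge exists.

    prior:       P(edge)
    likelihoods: one (P(observation | edge), P(observation | no edge)) pair
                 per data type, assumed conditionally independent given the edge
    """
    p_edge = prior
    p_no_edge = 1.0 - prior
    for given_edge, given_no_edge in likelihoods:
        p_edge *= given_edge
        p_no_edge *= given_no_edge
    return p_edge / (p_edge + p_no_edge)

# Hypothetical numbers: a reported protein-protein interaction and
# observed co-expression for the same pair of pathway genes.
posterior = edge_posterior(
    prior=0.1,
    likelihoods=[(0.6, 0.05),   # protein-protein interaction data
                 (0.5, 0.2)],   # co-expression data
)
print(round(posterior, 3))
```

Two weak pieces of evidence lift a 10% prior to a posterior of roughly 77% in this toy setting, which is the basic appeal of integrating complementary data sources.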

Page 31: Probabilistic refinement of cellular pathway models

Evaluation

1. Fit model parameters on the pheromone response pathway (training)
2. Use fitted model on other MAPK pathways (generalization to closely related examples)
3. Use fitted model on all other yeast signaling pathways (generalization to everything else)

… work in progress …

Page 32: Probabilistic refinement of cellular pathway models

Acknowledgements

Nested Effects Models

Rainer Spang (Univ. Regensburg) .:. Dennis Kostka (UC SF) .:. Achim Tresch (Gene Center Munich) .:. Holger Fröhlich (DKFZ Heidelberg) .:. Tim Beißbarth (Univ. Göttingen) .:. Josh Stuart, Charlie Vaske (UC SC)

Data integration

Olga G. Troyanskaya (Princeton) .:. Edoardo Airoldi (Harvard) .:. David Blei (Princeton)

Page 33: Probabilistic refinement of cellular pathway models

Florian Markowetz

Probabilistic refinement of cellular pathway models

Thank you !