reverse engineering gene regulatory networks

Reverse engineering gene regulatory networks Dirk Husmeier Adriano Werhli Marco Grzegorczyk


Post on 04-Jan-2016


TRANSCRIPT

Page 1: Reverse engineering gene regulatory networks

Reverse engineering gene regulatory networks

Dirk Husmeier

Adriano Werhli

Marco Grzegorczyk

Page 2: Reverse engineering gene regulatory networks

Systems biology

Learning signalling pathways and regulatory networks from postgenomic data

Page 3: Reverse engineering gene regulatory networks
Page 4: Reverse engineering gene regulatory networks

unknown

Page 5: Reverse engineering gene regulatory networks

unknown

high-throughput experiments

postgenomic data

Page 6: Reverse engineering gene regulatory networks

unknown

data data

machine learning

statistical methods

Page 7: Reverse engineering gene regulatory networks

true network extracted network

Does the extracted network provide a good prediction of the true interactions?

Page 8: Reverse engineering gene regulatory networks

Reverse Engineering of Regulatory Networks

• Can we learn the network structure from postgenomic data themselves?

• Statistical methods to distinguish between
  – Direct interactions
  – Indirect interactions

• Challenge: Distinguish between
  – Correlations
  – Causal interactions

• Breaking symmetries with active interventions:
  – Gene knockouts (VIGS, RNAi)

Page 9: Reverse engineering gene regulatory networks

[Figure: four panels labelled "direct interaction", "common regulator", "indirect interaction" and "co-regulation".]

Page 10: Reverse engineering gene regulatory networks
Page 11: Reverse engineering gene regulatory networks

• Relevance networks

• Graphical Gaussian models

• Bayesian networks

Page 12: Reverse engineering gene regulatory networks

• Relevance networks

• Graphical Gaussian models

• Bayesian networks

Page 13: Reverse engineering gene regulatory networks
Page 14: Reverse engineering gene regulatory networks

Relevance networks (Butte and Kohane, 2000)

1. Choose a measure of association A(.,.)

2. Define a threshold value tA

3. For all pairs of domain variables (X,Y) compute their association A(X,Y)

4. Connect by an undirected edge those pairs of variables (X,Y) whose association A(X,Y) exceeds the predefined threshold value tA
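The four steps can be sketched in a few lines of Python. This is only an illustration: the absolute Pearson correlation is assumed as the association measure A(.,.), and the names `pearson`, `relevance_network` and the toy expression profiles are made up for the example rather than taken from the slides.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def relevance_network(data, threshold):
    """Return the undirected edges whose association exceeds the threshold.

    data maps a variable name to its list of measurements.
    Assumption: the association A(X,Y) is |Pearson correlation|.
    """
    edges = set()
    for x, y in combinations(sorted(data), 2):   # step 3: all pairs
        if abs(pearson(data[x], data[y])) > threshold:
            edges.add((x, y))                    # step 4: connect
    return edges


# Hypothetical toy profiles: g2 tracks g1 closely, g3 does not.
toy = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.1, 3.9, 6.2, 8.0, 9.8],
    "g3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
edges = relevance_network(toy, threshold=0.8)
```

Any other pairwise score, for example mutual information, could be substituted for `pearson` without changing the rest of the procedure.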

Page 15: Reverse engineering gene regulatory networks

Association scores

Page 16: Reverse engineering gene regulatory networks

[Figure: three configurations of nodes 1, 2 and X, labelled ‘direct interaction’, ‘common regulator’ and ‘indirect interaction’. Each produces a strong correlation σ12 between nodes 1 and 2.]

Page 17: Reverse engineering gene regulatory networks

Pairwise associations without taking the context of the system into consideration

Page 18: Reverse engineering gene regulatory networks

• Relevance networks

• Graphical Gaussian models

• Bayesian networks

Page 19: Reverse engineering gene regulatory networks

Graphical Gaussian Models

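$$\pi_{ij} = \frac{-(\Sigma^{-1})_{ij}}{\sqrt{(\Sigma^{-1})_{ii}\,(\Sigma^{-1})_{jj}}}$$

Partial correlation, i.e. correlation conditional on all other domain variables: Corr(X1,X2|X3,…,Xn)

[Figure: nodes 1 and 2 joined by a direct interaction, giving a strong partial correlation π12.]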
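As a small worked example, partial correlations can be read off the inverse covariance matrix with the standard formula π_ij = -ω_ij / sqrt(ω_ii ω_jj), where ω denotes the entries of Σ^{-1}. The sketch below assumes a hypothetical covariance matrix for a common regulator X3 driving X1 and X2; the helper names `invert` and `partial_correlations` are illustrative only.

```python
from math import sqrt


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov):
    """Matrix of partial correlations pi_ij = -w_ij / sqrt(w_ii * w_jj)."""
    w = invert(cov)
    n = len(cov)
    return [[1.0 if i == j else -w[i][j] / sqrt(w[i][i] * w[j][j])
             for j in range(n)] for i in range(n)]


# Hypothetical covariance: X3 has unit variance, X1 = X3 + noise, X2 = X3 + noise.
cov = [[2.0, 1.0, 1.0],
       [1.0, 2.0, 1.0],
       [1.0, 1.0, 1.0]]
pi = partial_correlations(cov)
marginal_12 = cov[0][1] / sqrt(cov[0][0] * cov[1][1])   # ordinary correlation
```

Here X1 and X2 have an ordinary correlation of 0.5, yet their partial correlation given X3 vanishes, which is the situation a relevance network cannot tell apart from a direct interaction.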

Page 20: Reverse engineering gene regulatory networks

[Figure: four panels labelled "direct interaction", "common regulator", "indirect interaction" and "co-regulation".]

Distinguish between direct and indirect interactions

A and B have a low partial correlation

Page 21: Reverse engineering gene regulatory networks

Graphical Gaussian Models

$$\pi_{ij} = \frac{-(\Sigma^{-1})_{ij}}{\sqrt{(\Sigma^{-1})_{ii}\,(\Sigma^{-1})_{jj}}}$$

Partial correlation, i.e. correlation conditional on all other domain variables: Corr(X1,X2|X3,…,Xn)

Problem: #observations < #variables

[Figure: nodes 1 and 2 joined by a direct interaction, giving a strong partial correlation π12.]

Page 22: Reverse engineering gene regulatory networks
Page 23: Reverse engineering gene regulatory networks

Shrinkage estimation and the lemma of Ledoit-Wolf

Page 24: Reverse engineering gene regulatory networks

Shrinkage estimation and the lemma of Ledoit-Wolf
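The idea behind shrinkage can be sketched as follows: when there are fewer observations than variables the sample covariance matrix is singular, and mixing it with a well-conditioned target makes it invertible. The example below is a hedged illustration only. It assumes a diagonal target and a hand-picked shrinkage intensity `lam`; it does not implement the Ledoit-Wolf result, which gives an analytic choice of that intensity. All function names and data are hypothetical.

```python
def sample_covariance(rows):
    """Unbiased sample covariance matrix of the observations (one row per observation)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def shrink(cov, lam):
    """Shrunk estimate lam * T + (1 - lam) * S with the assumed target T = diag(S).

    Variances stay as they are; covariances are pulled towards zero.
    """
    p = len(cov)
    return [[cov[i][j] if i == j else (1.0 - lam) * cov[i][j]
             for j in range(p)] for i in range(p)]


def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Two observations of three variables: #observations < #variables.
rows = [[1.0, 2.0, 3.0],
        [2.0, 1.0, 5.0]]
s = sample_covariance(rows)
s_shrunk = shrink(s, lam=0.2)      # lam = 0.2 is an arbitrary illustrative value

det_sample = det3(s)               # zero: S cannot be inverted
det_shrunk = det3(s_shrunk)        # positive: partial correlations can be computed
```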

Page 25: Reverse engineering gene regulatory networks

Graphical Gaussian Models

[Figure: three panels labelled "direct interaction", "common regulator" and "indirect interaction".]

P(A,B)=P(A)·P(B)

But: P(A,B|C)≠P(A|C)·P(B|C)

Page 26: Reverse engineering gene regulatory networks

Undirected versus directed edges

• Relevance networks and Graphical Gaussian models can only extract undirected edges.

• Bayesian networks can extract directed edges.

• But can we trust these edge directions? It may be better to learn undirected edges than to learn directed edges with false orientations.

Page 27: Reverse engineering gene regulatory networks

• Relevance networks

• Graphical Gaussian models

• Bayesian networks

Page 28: Reverse engineering gene regulatory networks

Bayesian networks

[Figure: example directed acyclic graph over the nodes A, B, C, D, E and F, with the labels NODES and EDGES.]

• Marriage between graph theory and probability theory.

• Directed acyclic graph (DAG) representing conditional independence relations.

• It is possible to score a network in light of the data: P(D|M), D: data, M: network structure.

• We can infer how well a particular network explains the observed data.

$$P(A,B,C,D,E,F) = P(A)\,P(B \mid A)\,P(C \mid A)\,P(D \mid B,C)\,P(E \mid D)\,P(F \mid C,D)$$
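To make the factorisation P(A) P(B|A) P(C|A) P(D|B,C) P(E|D) P(F|C,D) concrete, the sketch below evaluates the joint probability of binary variables as a product of one local term per node. The parent sets follow the factorisation; the conditional probabilities in `prob_one` are invented purely for illustration.

```python
from itertools import product

# Parent sets read off the factorisation of P(A,B,C,D,E,F).
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],
    "E": ["D"],
    "F": ["C", "D"],
}


def prob_one(parent_values):
    """Hypothetical conditional P(node = 1 | parents); not taken from the slides."""
    if not parent_values:
        return 0.3
    return 0.1 + 0.8 * sum(parent_values) / len(parent_values)


def joint_probability(assignment):
    """P(A,...,F) for one 0/1 assignment, as a product of local terms."""
    prob = 1.0
    for node, parents in PARENTS.items():
        p1 = prob_one([assignment[p] for p in parents])
        prob *= p1 if assignment[node] == 1 else 1.0 - p1
    return prob


nodes = list(PARENTS)
# Sanity check: the local terms define a proper joint distribution.
total = sum(joint_probability(dict(zip(nodes, values)))
            for values in product([0, 1], repeat=len(nodes)))
all_on = joint_probability({node: 1 for node in nodes})
```

Only six small conditional distributions are needed instead of a table with 2^6 entries, which is the practical benefit of the graph structure.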

Page 29: Reverse engineering gene regulatory networks
Page 30: Reverse engineering gene regulatory networks

Bayesian networks versus causal networks

Bayesian networks represent conditional (in)dependence relations - not necessarily causal interactions.

Page 31: Reverse engineering gene regulatory networks

Bayesian networks versus causal networks

[Figure: two graphs over the nodes A, B and C, labelled "True causal graph" and "Node A unknown".]

Page 32: Reverse engineering gene regulatory networks

Bayesian networks versus causal networks

• Equivalence classes: networks with the same scores: P(D|M).

• Equivalent networks cannot be distinguished in light of the data.

[Figure: four networks over the nodes A, B and C.]

Page 33: Reverse engineering gene regulatory networks

Equivalence classes of BNs

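[Figure: four DAGs over the nodes A, B and C, together with their completed partially directed graphs (CPDAGs).]

Three of the four structures factorise the joint distribution in equivalent ways:

• A → C → B: $P(A)\,P(C \mid A)\,P(B \mid C) = P(A)\,P(A,C)\,P(A)^{-1}\,P(B,C)\,P(C)^{-1}$

• A ← C ← B: $P(B)\,P(C \mid B)\,P(A \mid C) = P(C \mid B)\,P(B)\,P(C)^{-1}\,P(A \mid C)\,P(C)$

• A ← C → B: $P(C)\,P(A \mid C)\,P(B \mid C) = P(A \mid C)\,P(B,C)$

All three reduce to $P(A \mid C)\,P(B,C)$, so that in general

$P(A,B) \neq P(A)\,P(B)$

$P(A,B \mid C) = P(A \mid C)\,P(B \mid C)$

The fourth structure is different:

• A → C ← B (v-structure): $P(A)\,P(B)\,P(C \mid A,B)$

$P(A,B) = P(A)\,P(B)$

$P(A,B \mid C) \neq P(A \mid C)\,P(B \mid C)$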
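Whether two DAGs fall into the same equivalence class can be checked with a standard graphical criterion that the slides do not state explicitly (Verma and Pearl): two DAGs are equivalent exactly when they share the same skeleton and the same v-structures. The sketch below applies that criterion to the four three-node structures over A, B and C; the function names are illustrative.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of a DAG given as (parent, child) pairs."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """All triples (a, child, b) with a -> child <- b and a, b not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, pas in parents.items():
        for a, b in combinations(sorted(pas), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def equivalent(edges_1, edges_2):
    """Same skeleton and same v-structures (criterion assumed, see lead-in)."""
    return (skeleton(edges_1) == skeleton(edges_2)
            and v_structures(edges_1) == v_structures(edges_2))


chain_ab = [("A", "C"), ("C", "B")]    # A -> C -> B
chain_ba = [("B", "C"), ("C", "A")]    # A <- C <- B
fork = [("C", "A"), ("C", "B")]        # A <- C -> B
collider = [("A", "C"), ("B", "C")]    # A -> C <- B (v-structure)
```

The two chains and the fork are mutually equivalent, whereas the collider forms a class of its own, which is why only its edge directions can be identified from observational data.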

Page 34: Reverse engineering gene regulatory networks

Symmetry breaking

• Interventions

• Prior knowledge

[Figure: four networks over the nodes A, B and C.]

Page 35: Reverse engineering gene regulatory networks

Symmetry breaking

• Interventions

• Prior knowledge

[Figure: four networks over the nodes A, B and C.]

Page 36: Reverse engineering gene regulatory networks

Interventional data

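[Figure: A and B are correlated, which is compatible with both A → B and A ← B. Under inhibition of A, the structure A → B predicts down-regulation of B, whereas A ← B predicts no effect on B.]

Likelihood for observational data:

$$P(D \mid M) = \prod_{i=1}^{n} P\left(D^{X_i} \mid D^{pa[X_i]}\right)$$

Likelihood for interventional data:

$$P(D \mid M) = \prod_{i=1}^{n} P\left(D^{X_i}_{\{-i\}} \mid D^{pa[X_i]}_{\{-i\}}\right)$$

Here pa[X_i] denotes the parents of node X_i, and the subscript {-i} indicates that measurements in which X_i itself was set by an intervention are left out of the term for X_i.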
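A small simulation illustrates why interventions break the symmetry. The sketch assumes a hypothetical linear Gaussian model in which A activates B (B = 0.9·A + noise); the coefficients, sample size and clamped value are arbitrary choices for illustration and are not taken from the slides.

```python
import random
from math import sqrt


def simulate(n, rng, do_a=None, do_b=None):
    """Sample n cells from the assumed model A -> B, optionally clamping A or B."""
    samples = []
    for _ in range(n):
        a = do_a if do_a is not None else rng.gauss(0.0, 1.0)
        b = do_b if do_b is not None else 0.9 * a + rng.gauss(0.0, 0.3)
        samples.append((a, b))
    return samples


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(pairs):
    """Pearson correlation of the two coordinates of a list of pairs."""
    ma, mb = mean(a for a, _ in pairs), mean(b for _, b in pairs)
    sab = sum((a - ma) * (b - mb) for a, b in pairs)
    saa = sum((a - ma) ** 2 for a, _ in pairs)
    sbb = sum((b - mb) ** 2 for _, b in pairs)
    return sab / sqrt(saa * sbb)


rng = random.Random(0)                              # fixed seed for reproducibility
observed = simulate(2000, rng)                      # passive observation
inhibit_a = simulate(2000, rng, do_a=-2.0)          # inhibition of A
inhibit_b = simulate(2000, rng, do_b=-2.0)          # inhibition of B

corr_ab = pearson(observed)                         # symmetric in A and B
mean_b_when_a_inhibited = mean(b for _, b in inhibit_a)
mean_a_when_b_inhibited = mean(a for a, _ in inhibit_b)
```

The observational correlation says nothing about direction, but clamping A pulls B down while clamping B leaves A at its usual level, which is only consistent with A → B.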

Page 37: Reverse engineering gene regulatory networks

Learning Bayesian networks from data

P(M|D) = P(D|M) P(M) / Z

M: Network structure. D: Data
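For a handful of candidate structures the posterior P(M|D) = P(D|M) P(M) / Z can be computed directly by normalising over the candidates. The sketch below works with log scores for numerical stability; the three structures and their log-likelihood values are hypothetical. For realistic networks the number of DAGs grows super-exponentially with the number of nodes, so Z cannot be obtained by enumeration like this.

```python
from math import exp, log


def posterior(log_likelihoods, log_priors):
    """Normalise log P(D|M) + log P(M) over a finite set of structures M."""
    log_joint = {m: log_likelihoods[m] + log_priors[m] for m in log_likelihoods}
    peak = max(log_joint.values())            # subtract the maximum to avoid underflow
    z = sum(exp(v - peak) for v in log_joint.values())
    return {m: exp(v - peak) / z for m, v in log_joint.items()}


# Hypothetical log marginal likelihoods log P(D|M) for three candidate structures.
log_lik = {"M1": -1040.2, "M2": -1038.9, "M3": -1045.0}
log_prior = {m: log(1.0 / len(log_lik)) for m in log_lik}   # uniform prior P(M)

post = posterior(log_lik, log_prior)
best = max(post, key=post.get)
```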

Page 38: Reverse engineering gene regulatory networks
Page 39: Reverse engineering gene regulatory networks
Page 40: Reverse engineering gene regulatory networks

Learning Bayesian networks from data

P(M|D) = P(D|M) P(M) / Z

M: Network structure. D: Data

Page 41: Reverse engineering gene regulatory networks
Page 42: Reverse engineering gene regulatory networks
Page 43: Reverse engineering gene regulatory networks
Page 44: Reverse engineering gene regulatory networks
Page 45: Reverse engineering gene regulatory networks

Evaluation

• On real experimental data, using the gold standard network from the literature

• On synthetic data simulated from the gold-standard network

Page 46: Reverse engineering gene regulatory networks

Evaluation

• On real experimental data, using the gold standard network from the literature

• On synthetic data simulated from the gold-standard network

Page 47: Reverse engineering gene regulatory networks

From Sachs et al., Science 2005

Page 48: Reverse engineering gene regulatory networks

Evaluation: Raf signalling pathway

• Cellular signalling network of 11 phosphorylated proteins and phospholipids in human immune system cells

• Deregulation → carcinogenesis

• Extensively studied in the literature → gold-standard network

Page 49: Reverse engineering gene regulatory networks

Raf regulatory network

From Sachs et al., Science 2005

Page 50: Reverse engineering gene regulatory networks

Flow cytometry data

• Intracellular multicolour flow cytometry experiments: concentrations of 11 proteins

• 5400 cells have been measured under 9 different cellular conditions (cues)

• Downsampling to 100 instances (5 separate subsets): indicative of microarray experiments

Page 51: Reverse engineering gene regulatory networks
Page 52: Reverse engineering gene regulatory networks

Two types of experiments

Page 53: Reverse engineering gene regulatory networks
Page 54: Reverse engineering gene regulatory networks

Evaluation

• On real experimental data, using the gold standard network from the literature

• On synthetic data simulated from the gold-standard network

Page 55: Reverse engineering gene regulatory networks

Comparison with simulated data 1

Page 56: Reverse engineering gene regulatory networks

Raf pathway

Page 57: Reverse engineering gene regulatory networks

Comparison with simulated data 2

Page 58: Reverse engineering gene regulatory networks

Comparison with simulated data 2

Steady-state approximation

Page 59: Reverse engineering gene regulatory networks
Page 60: Reverse engineering gene regulatory networks

Real versus simulated data

• Real biological data: full complexity of biological systems.

• The “gold-standard” only represents our current state of knowledge; it is not guaranteed to represent the true network.

• Simulated data: Simplifications that might be biologically unrealistic.

• We know the true network.

Page 61: Reverse engineering gene regulatory networks

How can we evaluate the reconstruction accuracy?

Page 62: Reverse engineering gene regulatory networks

[Figure: the extracted network is compared with the true network, as represented by biological knowledge (the gold standard network), to evaluate the learning performance.]

Page 63: Reverse engineering gene regulatory networks
Page 64: Reverse engineering gene regulatory networks
Page 65: Reverse engineering gene regulatory networks
Page 66: Reverse engineering gene regulatory networks

Performance evaluation: ROC curves

Page 67: Reverse engineering gene regulatory networks

Performance evaluation: ROC curves

• We use the Area Under the Receiver Operating Characteristic Curve (AUC).

[Figure: three ROC curves, labelled AUC=0.5, 0.5<AUC<1 and AUC=1.]
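The AUC can be computed without drawing the curve, because it equals the probability that a randomly chosen true edge receives a higher score than a randomly chosen non-edge (ties counted as one half). The sketch below uses that rank-based identity; the edge scores and gold-standard labels are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of true and false edges."""
    pos = [s for s, is_true in zip(scores, labels) if is_true]
    neg = [s for s, is_true in zip(scores, labels) if not is_true]
    if not pos or not neg:
        raise ValueError("need at least one true edge and one non-edge")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical edge scores and gold-standard labels (1 = edge in the true network).
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0]
auc = roc_auc(scores, labels)
```

An AUC of 0.5 corresponds to random ranking of the edges and an AUC of 1 to a ranking that places every true edge above every non-edge.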

Page 68: Reverse engineering gene regulatory networks

Alternative performance evaluation: True positive (TP) scores

We set the threshold such that we obtain 5 spurious edges (5 FPs) and count the corresponding number of true edges (TP count).
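This count can be obtained by sweeping down the ranked edge list until a sixth spurious edge would be admitted. The sketch below is one possible reading of the procedure; the function name, the scores and the labels are hypothetical, and tied scores are simply taken in their sorted order.

```python
def tp_at_fp(scores, labels, max_fp=5):
    """Number of true edges recovered when the threshold admits max_fp false edges."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    for _, is_true in ranked:
        if is_true:
            tp += 1
        elif fp == max_fp:
            break          # the next spurious edge would exceed the allowed FP count
        else:
            fp += 1
    return tp


# Hypothetical ranking of 11 candidate edges, best score first (1 = true edge).
labels = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1]
scores = [1.0 - 0.05 * i for i in range(len(labels))]
tp_count = tp_at_fp(scores, labels)
```

Unlike the AUC, which averages over all thresholds, this score looks at a single operating point in the low-false-positive region.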

Page 69: Reverse engineering gene regulatory networks

Alternative performance evaluation: True positive (TP) scores

[Figure: TP counts at 5 FPs for BN, GGM and RN.]

Page 70: Reverse engineering gene regulatory networks

Directed graph evaluation - DGE

[Figure: edge scores learned from the data are thresholded to give concrete network predictions, which are compared with the true regulatory network. High threshold: TP 1/2, FP 0/4. Low threshold: TP 2/2, FP 1/4.]

Page 71: Reverse engineering gene regulatory networks

Undirected graph evaluation - UGE

[Figure: undirected edge scores learned from the data are thresholded to give concrete network (skeleton) predictions, which are compared with the skeleton of the true regulatory network. High threshold: TP 1/2, FP 0/1. Low threshold: TP 2/2, FP 1/1.]
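The difference between the two evaluation schemes can be sketched on a three-node example, where there are 6 possible directed edges but only 3 possible undirected ones. The true network and the prediction below are hypothetical, as are the function names; each function returns (TP, number of true edges, FP, number of possible false edges).

```python
from itertools import combinations, permutations


def dge_counts(nodes, true_edges, predicted_edges):
    """Directed evaluation: an edge only counts as a TP with the correct orientation."""
    true_set, pred_set = set(true_edges), set(predicted_edges)
    candidates = set(permutations(nodes, 2))
    tp = len(pred_set & true_set)
    fp = len(pred_set - true_set)
    return tp, len(true_set), fp, len(candidates - true_set)


def uge_counts(nodes, true_edges, predicted_edges):
    """Undirected evaluation: orientations are discarded and skeletons are compared."""
    true_skel = {frozenset(edge) for edge in true_edges}
    pred_skel = {frozenset(edge) for edge in predicted_edges}
    candidates = {frozenset(pair) for pair in combinations(nodes, 2)}
    tp = len(pred_skel & true_skel)
    fp = len(pred_skel - true_skel)
    return tp, len(true_skel), fp, len(candidates - true_skel)


nodes = ["A", "B", "C"]
true_edges = [("A", "B"), ("A", "C")]    # hypothetical true network
predicted = [("A", "B"), ("C", "A")]     # second edge found, but reversed

dge = dge_counts(nodes, true_edges, predicted)
uge = uge_counts(nodes, true_edges, predicted)
```

The reversed edge costs one TP and adds one FP under DGE, while under UGE the same prediction recovers the skeleton perfectly.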

Page 72: Reverse engineering gene regulatory networks
Page 73: Reverse engineering gene regulatory networks

Synthetic data, observations

Page 74: Reverse engineering gene regulatory networks

Synthetic data, interventions

Page 75: Reverse engineering gene regulatory networks

Cytometry data, interventions

Page 76: Reverse engineering gene regulatory networks

How can we explain the difference between synthetic and real data?

Page 77: Reverse engineering gene regulatory networks

Simulated data are “simpler”.

No mismatch between models used for data generation and inference.

Page 78: Reverse engineering gene regulatory networks

Complications with real data

Can we trust our gold-standard network?

Page 79: Reverse engineering gene regulatory networks

Raf regulatory network

From Sachs et al., Science 2005

Page 80: Reverse engineering gene regulatory networks

Disputed structure of the gold-standard network

Dougherty et al., Regulation of Raf-1 by Direct Feedback Phosphorylation. Molecular Cell, Vol. 17, 2005

Page 81: Reverse engineering gene regulatory networks

Complications with real data

Interventions might not be “ideal” owing to negative feedback loops.

[Figure: inhibition; stabilisation through negative feedback loops.]

Page 82: Reverse engineering gene regulatory networks

Conclusions 1

• BNs and GGMs outperform RNs, most notably on Gaussian data.

• No significant difference between BNs and GGMs on observational data.

• For interventional data, BNs clearly outperform GGMs and RNs, especially when taking the edge direction (DGE score) rather than just the skeleton (UGE score) into account.

Page 83: Reverse engineering gene regulatory networks

Conclusions 2

Performance on synthetic data better than on real data.

• Real data: more complex

• Real interventions are not ideal

• Errors in the gold-standard network

Page 84: Reverse engineering gene regulatory networks

How do we model feedback loops?

Page 85: Reverse engineering gene regulatory networks

Unfolding in time