Robust Reverse Engineering of Dynamic Gene Networks Under Sample Size Heterogeneity Ankur P. Parikh, Wei Wu, Eric P. Xing Carnegie Mellon University

Post on 15-Jul-2020



Outline

Motivation

Challenge of sample size heterogeneity

Our solution

Results

Conclusion


Multiple Network Estimation for Progression and Reversion of Breast Cancer Cells

[Figure: cell states S1 (normal), T4 (malignant), and T4 Revert Groups 1, 2, and 3]

Stem Cell Differentiation

Use Microarray Data to Infer Network Structure

A Data Integration Problem

Small numbers of samples from many related cell states.

This poses both opportunities and challenges.

Challenges

How to leverage similarities among cell states for more accurate network estimation? (some of our previous work)

How to integrate data in a "correct" way that does not compromise the validity of biological conclusions? (a focus of this work)

Challenge: Sample Size Heterogeneity for Network Estimation

Different cell states have different sample sizes.

Biases Resulting Networks

Cell states with more samples have more edges.

Sample Heterogeneity Makes Network Analysis Problematic

It is unclear which network differences are meaningful and which are an artifact of sample size heterogeneity.

Comparing Macro-Topology Features
  Network Density
  Centrality

Comparing Gene Level Interactions
  Gene neighborhoods
  Local modules

Our Contribution

Identify the novel problem of sample size heterogeneity in network reconstruction.

We focus on sparse regression methods for network reconstruction [Meinshausen and Bühlmann 2006, Friedman et al. 2008, Song et al. 2009, Parikh et al. 2011].

We add a regularization constraint to make these methods robust to sample size heterogeneity.

Outline

Motivation

Challenge of sample size heterogeneity

Our solution
  Background
  Increasing robustness to sample size heterogeneity
  Sharing information across states

Results

Conclusion

Background on Sparse Regression Network Reconstruction [Meinshausen and Bühlmann 2006]

Network Learning with the Graphical LASSO [Meinshausen and Bühlmann 2006]

Learn each neighborhood separately.

Use the LASSO to select a sparse set of only the relevant edges.

Repeat this for every node.

Combine all the neighborhoods to form a network.
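
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`lasso_cd`, `neighborhood_selection`), the plain coordinate-descent solver, the "OR" rule for combining neighborhoods, and the toy data are all hypothetical choices made for the example.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; values inside the band become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=100):
    """Minimize ||y - X b||_2^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta, since beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Inner product of column j with the residual that ignores beta[j].
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n))
            new_bj = soft_threshold(rho, lam / 2.0) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def neighborhood_selection(data, lam):
    """data[i][j] is the (centered) expression of gene j in sample i."""
    p = len(data[0])
    edges = set()
    for j in range(p):  # repeat for every node
        others = [k for k in range(p) if k != j]
        y = [row[j] for row in data]
        X = [[row[k] for k in others] for row in data]
        beta = lasso_cd(X, y, lam)  # learn this neighborhood separately
        for idx, b in enumerate(beta):
            if b != 0.0:
                # Combine neighborhoods with an OR rule (an assumption here).
                k = others[idx]
                edges.add((min(j, k), max(j, k)))
    return edges


# Toy data: gene 1 is twice gene 0; gene 2 is orthogonal to both.
data = [[1, 2, 1], [-1, -2, 1], [1, 2, -1], [-1, -2, -1]]
print(sorted(neighborhood_selection(data, lam=1.0)))  # [(0, 1)]
```

On the toy data only the edge between genes 0 and 1 survives, because gene 2 has zero correlation with the other two genes and its coefficients are thresholded to exactly zero.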

Learning Multiple Networks

(Naïve way) Run the graphical lasso separately for each network.

More sophisticated techniques to share information exist (Song et al. 2009, Ahmed and Xing 2009, Parikh et al. 2011).

[Figure: the graphical lasso applied separately to network 1 and to network 2]

Sample Size Heterogeneity

The graphical lasso will encourage networks with more samples to have more edges.

Simple tricks such as scaling parameters only work in theory but not in practice.

[Figure: the graphical lasso applied separately to network 1 and to network 2]

Intuition Of Our Solution

Make the "normalizing" assumption that the sum of absolute edge weights is equal for all networks:

(sum of absolute edge weights of network 1) = (sum of absolute edge weights of network 2) = C

Incorporate this assumption into the graphical lasso procedure.

[Figure: the graphical lasso with this constraint applied to network 1 and to network 2]

Sharing Information Among Related States

Often the number of samples per state is very small.

"Share" samples among related states:
  Song et al. 2009
  Ahmed and Xing 2009
  Parikh et al. 2011

These strategies can be integrated into our robust framework.

Summary of Method

Based on the graphical lasso [Meinshausen and Bühlmann 2006].

Add a regularization constraint so that networks are calibrated.

Share information among related cell states by taking a weighted average of related samples.
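
One way to picture the sample-sharing step is as a weighted squared-error loss in which samples from closely related states count more. The sketch below is only an illustration: the Gaussian kernel, the `bandwidth` parameter, the notion of a numeric distance between cell states, and the function names are assumptions for the example, and the cited papers each define their own weighting scheme.

```python
import math


def kernel_weights(distances, bandwidth):
    """Normalized Gaussian kernel weights, one per sample.

    distances[i] is a hypothetical dissimilarity between the target cell
    state and the state that sample i was drawn from (0 = same state).
    """
    raw = [math.exp(-0.5 * (d / bandwidth) ** 2) for d in distances]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_squared_loss(X, y, beta, weights):
    """Sum_i weights[i] * (y[i] - x_i . beta)^2, the reweighted regression loss."""
    loss = 0.0
    for x_row, y_i, w in zip(X, y, weights):
        prediction = sum(x * b for x, b in zip(x_row, beta))
        loss += w * (y_i - prediction) ** 2
    return loss


# Samples from the target state (distance 0) get the largest weight.
weights = kernel_weights([0.0, 1.0, 2.0], bandwidth=1.0)
print(weights[0] > weights[1] > weights[2])  # True
```

Replacing the unweighted loss in a per-node lasso with a loss of this form lets every state borrow samples from its relatives while still emphasizing its own.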


Results on Synthetic Data

The robust algorithm produces more calibrated networks than the non-robust version.

Both methods produce comparable F1 scores.

[Figure: Mean vs Variance. Axes: Mean Number of Edges and Variance. Series: Robust, Non-Robust (Scaled)]

Stem Cell Data

Hematopoietic stem cell data from Novershtern et al. 2011

38 cell states

4-7 samples per cell state

211 total samples

Stem Cell Data Results

[Figure: granulocyte network and common myeloid progenitor network, each estimated with the robust and the non-robust method]

Stem Cell Data: Biological Analysis

Novershtern et al. 2011 does not give insight into module topology.

[Figure: granulocyte subnetwork and monocyte subnetwork. Legend: experimentally validated regulators (Novershtern et al.), experimentally validated genes, non-experimentally validated genes]

Two modules with identical gene interaction patterns:
  large 10-gene module
  small 2-gene module

7 out of these 12 genes were experimentally validated by Novershtern et al.

Conclusion

We addressed the novel problem of sample size heterogeneity in the context of dynamic network reconstruction.

Our solution produces more calibrated networks, allowing for more meaningful biological comparison.

We hope this can help efforts in personalized medicine and data integration.

Acknowledgements

PSB Travel Fellowship 2014

NSF Graduate Fellowship

NIH

Thanks!


Regulation of cell response to stimuli is paramount, but we can usually only measure (or compute) steady-state interactions

Transcriptional interactions

Protein-protein interactions

Biochemical reactions

▲ Chromatin IP

▲ Microarrays

▲ Protein coIP

▲ Yeast two-hybrid

▲ Metabolic flux measurements

Sparsity: In a mathematical sense

Consider the least squares linear regression problem:

[Figure: response y regressed on predictors x1, ..., xn with coefficients b1, ..., bn]

Sparsity means most of the betas are zero:

\[
\hat{\beta} = \operatorname*{argmin}_{\beta} \; \| Y - X\beta \|_2^2
\quad \text{subject to} \quad
\sum_{j=1}^{p} \mathbb{I}\left[ |\beta_j| > 0 \right] \le C
\]

But this is not convex! There are many local optima, and the problem is computationally intractable.

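L1 Regularization (LASSO) [Tibshirani 1996]

A convex relaxation.

Still enforces sparsity!

Constrained Form

\[
\hat{\beta} = \operatorname*{argmin}_{\beta} \; \| Y - X\beta \|_2^2
\quad \text{subject to} \quad
\sum_{j=1}^{p} |\beta_j| \le C
\]

Lagrangian Form

\[
\hat{\beta} = \operatorname*{argmin}_{\beta} \; \| Y - X\beta \|_2^2 + \lambda \| \beta \|_1
\]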
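
To see why the L1 penalty still yields exact zeros, it helps to look at the special case of an orthonormal design (X^T X = I), where the Lagrangian form has a closed-form solution: each least squares coefficient is soft-thresholded at lambda / 2. The sketch below assumes that special case; the function names and the example coefficients are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; values inside the band become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_orthonormal(ols_coefs, lam):
    """Solve min ||Y - X b||_2^2 + lam * ||b||_1 when X^T X = I.

    Under that assumption the problem separates by coordinate, and each
    solution is the least squares coefficient soft-thresholded at lam / 2.
    """
    return [soft_threshold(b, lam / 2.0) for b in ols_coefs]


print(lasso_orthonormal([3.0, -0.4, 0.1, -2.5], lam=1.0))
# [2.5, 0.0, 0.0, -2.0]
```

The two small coefficients are set to exactly zero rather than merely shrunk, which is the sparsity the slide refers to.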

Our Solution: Network Calibration

Calibrate networks to remove this sample size "bias".

Analogous to microarray processing to remove systematic dye bias.

Except our calibration is not a pre/post processing step but rather a fundamental part of the method.

Sparsity!

One common assumption to make: sparsity.

Makes biological sense: each gene is assumed to interact with only a small group of other genes.

Makes statistical sense: learning is now feasible in high dimensions with a small sample size.

Stem Cell Data: Quantitative Results

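Intuition Of Our Solution

(something like this is true)

\[
\hat{\beta}_1^{(z)}, \ldots, \hat{\beta}_p^{(z)}
= \operatorname*{argmin}_{\beta_1^{(z)}, \ldots, \beta_p^{(z)}}
\sum_{j=1}^{p} \left\| X_j^{(z)} - X_{-j}^{(z)} \beta_j^{(z)} \right\|_2^2
\quad \text{subject to} \quad
\sum_{j=1}^{p} \left\| \beta_j^{(z)} \right\|_1 = C
\quad \text{for all cell states } z
\]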
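
A rough way to picture the equal-L1-norm constraint is to pick a separate penalty for each cell state so that every fitted network ends up with the same total absolute edge weight C. The sketch below does this by bisection in a toy setting and is not the authors' algorithm: it assumes orthogonal, standardized predictors (so the lasso solution is a soft-threshold of the per-sample correlations), and `l1_norm_of_fit`, `calibrate_lambda`, and the correlation values are hypothetical.

```python
def l1_norm_of_fit(per_sample_corr, n_samples, lam):
    """L1 norm of the lasso fit under the toy orthogonal-design assumption.

    With x_j^T x_j = n_samples and x_j^T y = n_samples * per_sample_corr[j],
    the solution is b_j = soft_threshold(corr_j, lam / (2 * n_samples)).
    """
    threshold = lam / (2.0 * n_samples)
    return sum(max(abs(c) - threshold, 0.0) for c in per_sample_corr)


def calibrate_lambda(per_sample_corr, n_samples, target_l1, n_iter=200):
    """Bisection on lam; the L1 norm is continuous and non-increasing in lam."""
    lo = 0.0  # no penalty: largest L1 norm
    hi = 2.0 * n_samples * max(abs(c) for c in per_sample_corr)  # L1 norm is 0
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if l1_norm_of_fit(per_sample_corr, n_samples, mid) > target_l1:
            lo = mid
        else:
            hi = mid
    return hi


state_a = ([0.9, 0.3, 0.15, 0.05], 4)  # (correlations, number of samples)
state_b = ([0.8, 0.4, 0.1, 0.05], 7)

# A shared lam gives the two states different total edge weight ...
shared = [l1_norm_of_fit(c, n, lam=2.0) for c, n in (state_a, state_b)]
# ... while per-state calibration brings both to the same C = 0.5.
lams = [calibrate_lambda(c, n, target_l1=0.5) for c, n in (state_a, state_b)]
calibrated = [l1_norm_of_fit(c, n, lam) for (c, n), lam in zip((state_a, state_b), lams)]
print(shared, calibrated)
```

The design choice here is to treat C as the quantity held fixed across states and let the penalty vary, which mirrors the constrained formulation on the slide rather than the usual shared-penalty Lagrangian form.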

Stem Cell Data Results

[Figure: granulocyte network (28 nodes with degree > 0) and common myeloid progenitor network (719 nodes with degree > 0)]

Our Robust Approach

[Figure: the two networks under the robust approach, with 643 and 677 nodes with degree > 0]

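Sample Size Heterogeneity for Network Estimation

\[
\hat{\beta}_1^{(1)}, \ldots, \hat{\beta}_p^{(1)}
= \operatorname*{argmin}_{\beta_1^{(1)}, \ldots, \beta_p^{(1)}}
\sum_{j=1}^{p} \left\| X_j^{(1)} - X_{-j}^{(1)} \beta_j^{(1)} \right\|_2^2
+ \lambda \sum_{j=1}^{p} \left\| \beta_j^{(1)} \right\|_1
\]

The first (squared-error) term increases as the number of samples increases, while the second (penalty) term stays constant as a function of the number of samples.

Thus applying the same λ will result in cell types with more samples having more edges.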
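
The effect can be checked in a toy setting. The sketch below assumes orthogonal, standardized predictors, so that the lasso solution has a closed form and the penalty acts as a threshold of lambda / (2n) on the per-sample correlations; the function name and the correlation values are hypothetical and chosen only to illustrate the trend.

```python
def selected_edges(per_sample_corr, n_samples, lam):
    """Indices of predictors with a nonzero lasso coefficient.

    Toy assumption: orthogonal predictors with x_j^T x_j = n_samples and
    x_j^T y = n_samples * per_sample_corr[j]. Minimizing
    ||y - X b||_2^2 + lam * ||b||_1 then keeps predictor j exactly when
    |per_sample_corr[j]| > lam / (2 * n_samples).
    """
    threshold = lam / (2.0 * n_samples)
    return [j for j, c in enumerate(per_sample_corr) if abs(c) > threshold]


corr = [0.9, 0.3, 0.15, 0.05]  # same underlying signal in every state
for n in (4, 8, 40):
    print(n, len(selected_edges(corr, n, lam=2.0)))
# 4 2
# 8 3
# 40 4
```

With the signal held fixed, the only thing that changes is the sample size, and the number of selected edges grows from 2 to 4 under the same penalty.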

What About Scaling?

\[
\hat{\beta}_1^{(1)}, \ldots, \hat{\beta}_p^{(1)}
= \operatorname*{argmin}_{\beta_1^{(1)}, \ldots, \beta_p^{(1)}}
\sum_{j=1}^{p} \frac{1}{S^{(1)}} \left\| X_j^{(1)} - X_{-j}^{(1)} \beta_j^{(1)} \right\|_2^2
+ \frac{\lambda}{S^{(1)}} \sum_{j=1}^{p} \left\| \beta_j^{(1)} \right\|_1
\]

Only works in theory, not in practice.

Requires restrictive modeling assumptions to hold.

Requires a larger sample size to kick in.