understanding gene regulation: from networks to mechanisms

40
Understanding Gene Regulation: From Networks to Mechanisms Daphne Koller Stanford University

Upload: josiah

Post on 22-Feb-2016

50 views

Category:

Documents


0 download

DESCRIPTION

Understanding Gene Regulation: From Networks to Mechanisms. Daphne Koller Stanford University. Gene Regulatory Networks. Controlled by diverse mechanisms. Modified by endogenous and exogenous perturbations. http://en.wikipedia.org/wiki/Gene_regulatory_network. Goals. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Understanding Gene Regulation:  From Networks to Mechanisms

Understanding Gene Regulation: From Networks to Mechanisms

Daphne KollerStanford University

Page 2: Understanding Gene Regulation:  From Networks to Mechanisms

Gene Regulatory Networks

http://en.wikipedia.org/wiki/Gene_regulatory_network

Controlled by diverse mechanismsModified by endogenous and

exogenous perturbations

Page 3: Understanding Gene Regulation:  From Networks to Mechanisms

Goals Infer regulatory network and mechanisms

that control gene expression

Identify effect of perturbations on network

Understand effect of gene regulation on phenotype

Page 4: Understanding Gene Regulation:  From Networks to Mechanisms

Outline Regulatory networks for gene expression Individual genetic variation and gene

regulation Cell differentiation and gene regulation Expression changes underlying phenotype

Page 5: Understanding Gene Regulation:  From Networks to Mechanisms

Regulatory Network I mRNA level of regulator can indicate its activity level Target expression is predicted by expression of its

regulators Use expression of regulatory genes as regulators

PHO5PHM6

SPL2

PHO3PHO84

VTC3GIT1

PHO2

TEC1

GPA1

ECM18

UTH1MEC3

MFA1

SAS5SEC59

SGS1

PHO4

ASG7

RIM15

HAP1

PHO2

GPA1MFA1

SAS5PHO4

RIM15

Segal et al., Nature Genetics 2003; Lee et al., PNAS 2006

Transcription factors, signal transduction proteins, mRNA

processing factors, …

Page 6: Understanding Gene Regulation:  From Networks to Mechanisms

Co-regulated genes have similar regulation program Exploit modularity and predict expression of entire module Allows uncovering complex regulatory programs

Regulatory Network II

module

PHO5PHM6

SPL2

PHO3PHO84

VTC3GIT1

PHO2

TEC1

GPA1

ECM18

UTH1MEC3

MFA1

SAS5SEC59

SGS1

PHO4

ASG7

RIM15

HAP1

PHO2

GPA1MFA1

SAS5PHO4

RIM15

Targets

Segal et al., Nature Genetics 2003; Lee et al., PNAS 2006

“Regulatory Program” ?

Page 7: Understanding Gene Regulation:  From Networks to Mechanisms

Module Networks* Learning quickly runs

out of statistical power Poor regulator selection

lower in the tree Many correct regulators

not selected Arbitrary choice among

correlated regulators

Combinatorial search Multiple local optima

repressor

true

repressor expressio

n false

contex

t B

mod

ule

gene

sre

gula

tion

pr

ogra

m

true

contex

t C

activatoractivator expression

contex

t A

false

target gene expression

induced

repressed

Activator Repressor

Genes in module

Gene1 Gene2 Gene3

* Segal et al., Nature Genetics 2003

Page 8: Understanding Gene Regulation:  From Networks to Mechanisms

Regulation as Linear Regression

minimizew (Σwixi - ETargets)2

But we often have hundreds or thousands of regulators … and linear regression gives them all nonzero weight!

xN…x1 x2

w1w2 wN

ETargets

Problem: This objective learns too many regulators

parametersw1

w2 wN

ETargets= w1 x1+…+wN xN+ε

=

MFA1

Module

GPA1-3 x+

0.5 x

Page 9: Understanding Gene Regulation:  From Networks to Mechanisms

Lasso* (L1) Regressionminimizew (w1x1 + … wNxN - ETargets)2+ C |wi|

Induces sparsity in the solution w (many wi‘s set to zero) Provably selects “right” features when many features are irrelevant

Convex optimization problem Unique global optimum Efficient optimization

But, arbitrary choice among correlated regulators

xN…x1 x2

w1w2 wN

ETargets

parametersw1

w2

x1 x2

* Tibshirani, 1996

L2 L1

Page 10: Understanding Gene Regulation:  From Networks to Mechanisms

Elastic Net* Regressionminimizew (w1x1 + … wNxN - ETargets)2+ C |wi| + D wi

2

Induces sparsity But avoids arbitrary choices among relevant features

Convex optimization problem Unique global optimum Efficient optimization algorithms

Lee et al., PLOS Genetics 2009 * Zhou & Hastie, 2005

xN…x1 x2

w1w2 wN

ETargets

w1w2

x1 x2 L2 L1

Page 11: Understanding Gene Regulation:  From Networks to Mechanisms

Cluster genes into modules Learn a regulatory program for each module

Learning Regulatory Network

PHO5PHM6

SPL2

PHO3PHO84

VTC3GIT1

PHO2

TEC1

GPA1

ECM18

UTH1MEC3

MFA1

SAS5SEC59

SGS1

PHO4

ASG7

RIM15

HAP1

PHO2

GPA1MFA1

SAS5PHO4

RIM15

Lee et al., PLoS Genet 2009

=

MFA1

Module

GPA1-3 x+

0.5 x

This is a Bayesian network• But multiple genes share same program• Dependency model is linear regression

Page 12: Understanding Gene Regulation:  From Networks to Mechanisms

Outline Regulatory networks for gene expression Individual genetic variation and gene

regulation Effect of genotype on expression Regulatory potential

Cell differentiation and gene regulation Expression changes underlying phenotype

Page 13: Understanding Gene Regulation:  From Networks to Mechanisms

Genotype phenotype

…ACTCGGTTGGCCTAAATTCGGCCCGG…

…ACCCGGTAGGCCTTAATTCGGCCCGG…

:

…ACTCGGTAGGCCTATATTCGGCCGGG…

Different sequences

Different phenotypes

??

Perturbations toregulatory network

Page 14: Understanding Gene Regulation:  From Networks to Mechanisms

Genotype Regulation

…ACTCGGTTGGCCTAAATTCGGCCCGG…

…ACCCGGTAGGCCTTAATTCGGCCCGG…

:

…ACTCGGTAGGCCTATATTCGGCCGGG…

Different sequences

??

Perturbations toregulatory network

Goals: Infer regulatory network that controls gene

expression Identify mechanisms by which genetic variation

affects gene expression

Page 15: Understanding Gene Regulation:  From Networks to Mechanisms

eQTL Data [Brem et al. (2002) Science]

BY

×

RM

:

112 progeny

Mar

kers

Expression data112 individuals

6000

gen

es

0101100100…0111011110100…0010010110000…010

:

0000010100…101

0010000000…100

Genotype data

3000

mar

kers

112 individuals

Page 16: Understanding Gene Regulation:  From Networks to Mechanisms

Expression quantitative trait loci (eQTL) mapping For each gene, find the marker that is most predictive of its

expression level [Yvert G et al. (2003) Nat Gen].

Traditional Approach: Single Marker

genes

Genotype data Expression data

markers

individualsindividuals0101100100100…0111011110111100…0010010110001000…010

:

0000010110100…101

1110000110000…100

Gene iMarker jinduced

repressed

1 2 3 4 5 …Marker

mRNA

Gene

Page 17: Understanding Gene Regulation:  From Networks to Mechanisms

LirNet Regulatory network E-regulators: Activity (expression) of regulatory genes G-regulators: Genotype of genes

Measured as values of chromosomal markers

Lee et al., PNAS 2006; Lee et al., PLoS Genetics 2009

PHO5PHM6

SPL2

PHO3PHO84

VTC3GIT1

PHO2

TEC1

GPA1

ECM18

UTH1MEC3

MFA1

SAS5SEC59

SGS1

PHO4

ASG7

RIM15

HAP1

PHO2

GPA1MFA1

SAS5PHO4

RIM15

M1

M120

M22

M1011

M321M321

marker

Page 18: Understanding Gene Regulation:  From Networks to Mechanisms

The Telomere Module

Includes Rif2 – • control telomere length • establishes telomeric silencing• 6 coding & 8 promoter SNPs• Binds to Rap1p C-terminus

Enrichment for Rap1p targets (29/42; p < 10-15)

Chr XII: 1056097

40/42 genes in telomeres Enriched for telomere maintenance (p < 10-11)

& helicase activity (p < 10-18)

Lee et al., PNAS 2006

Page 19: Understanding Gene Regulation:  From Networks to Mechanisms

Some Chromatin Modules• Locus containing Sir1• 4 coding SNPs

4/5 consecutive genes

• Locus containing uncharacterized Sir1 homologue• 87(!) coding SNPs

5/7 consecutive genes

Known Sir1 targets

Chr XI: 643655

Chr X: 22213

Lee et al., PNAS 2006

Page 20: Understanding Gene Regulation:  From Networks to Mechanisms

Chromatin as Mechanism

23 modules (out of 165) with “chromosomal features”

16 have “chromatin regulators” (p < 10-7)

Chromatin modification explains significant part of variation in gene expression between strains

Lee et al., PNAS 2006

module #genes Runs Dom DEG Enrich Chrom Reg-E Reg-G728 16732 7 Telo742 24743 6 Telo744 5 Telo755 14757 3 Telo770 35783 23790 122848 15855 42 Telo862 20 869 48876 51 Telo890 114 897 15904 56 TY,LTR917 31941 110968 218 972 7 Telo991 4993 5

1018 881025 217 1030 151054 6 Telo1061 46 Telo1078 901085 42 1120 8 Telo1126 26 Telo1133 191145 7 Telo1155 75

Mechanism I

“Evolutionary strategy” to make coordinated changes in gene

expression by modifying small number of hubs

Page 21: Understanding Gene Regulation:  From Networks to Mechanisms

The Puf3 Module

+1

0

- 1

112 segregantsBY RM

HAP4TOP2KEM1GCN1GCN20DHH1

weight

PUF3 expressiongenotype

147/153 genes (P ~ 10-130)are pulldown targets of

mRNA binding protein Puf3

P-body component

s

PUF family: Sequence specific mRNA binding proteins (3’ UTR) Regulate degradation of mRNA and/or repress translation

Lee et al., PLOS Genetics 2009

Page 22: Understanding Gene Regulation:  From Networks to Mechanisms

P-Bodies

Dhh1 regulates mRNA decapping in p-bodies

mRNAs stored in P-bodies are translationally repressed

TranslationmRNAs can exit P-bodies and resume translation

** Beliakova-Bethell , et al. (2006) RNA 12:94

GFP of Dhh1**

RNA degradationRNA stored in P-bodies can be degraded (via decapping)

* Sheth and Parker (2003) Science 300:805cell

P-bodyPuf3 protein

?But what regulates sequence-specific

localization ofmRNAs to P-bodies?

Page 23: Understanding Gene Regulation:  From Networks to Mechanisms

Microscopy experiment Fluorescent microscopy: Puf3 localizes to P-body Supports hypothesis of Puf3 involvement in

regulating mRNA degradation by P-bodies

Dhh1

Puf3

cell

P-body

Puf3 protein ?

Dhh1

Joint work with David Drubin and Pam Silver Lee et al., PLOS Genetics 2009

Page 24: Understanding Gene Regulation:  From Networks to Mechanisms

What Regulates the P-bodies?

A marker that covers a large region in Chr 14.

Region contains ~30 genes and ~318 SNPs. Experiments for all 30 genes not

feasible! ChrXIV:449639

DHH1

BYRM

GCN20

GCN1KEM1

BLM3

Lee et al., PLOS Genetics 2009

Page 25: Understanding Gene Regulation:  From Networks to Mechanisms

Outline Regulatory networks for gene expression Individual genetic variation and gene

regulation Effect of genotype on expression Regulatory potential

Cell differentiation and gene regulation Expression changes underlying phenotype

Page 26: Understanding Gene Regulation:  From Networks to Mechanisms

Motivation Not all SNPs are equally likely to be causal.

Gene

SNPs

TACGTAGGAACCTGTACCA … GGAAAATATCAAATCCAACGACGTTAGCCAATGCGATCGAATGGGAACGTA

ChrXIV: 449,639-502,316

SNP 1:Conserved residue in a gene involved in RNA degradation

SNP 2:In nonconserved intergenic region

“Regulatory features” F1. Gene region?2. Protein coding region?3. Nonsynonymous?4. Create a stop codon?5. Strong conservation? :

Idea: Prioritize SNPs that have “good” regulatory features

But how do we weight different features?Lee et al., PLOS Genetics 2009

Page 27: Understanding Gene Regulation:  From Networks to Mechanisms

Bayesian L1-Regularization

wmk

ym

Module m

xmk

Regulator k

~ P(ym|x;w) = N (k wmkxmk,ε2)

~ P(w) = Laplacian(0,C)

higher prior variance weight can more easily deviate from 0 regulator more likely to be selected

wiwi

Priorvariance

km

kmwC,

, ||

M

mmmmyP

1

),|(log wx

Lee et al., PLOS Genetics 2009

Page 28: Understanding Gene Regulation:  From Networks to Mechanisms

Metaprior Model (Hierarchical Bayes)

ym

xmk wmk ~ Laplacian (0,Cmk)

= g(ßTfmk)

:

~ P(ym|x;w) = N (k wmkxmk,ε2)

Cmk

fmk

Regulatoryprior ß

Module m

Regulator k

Module 1

Module M

xN…x1 x2x1 x2

Emodule 1

Potential regulators

xN…x1 x2x2

Emodule M

xN

Regulatory featuresInside a gene?Protein coding region? Strong conservation?TF binds to module genes :

“Regulatory potential” = ß1 x Inside a gene? + ß2 x Protein coding region? + ß3 x Conserved? …

YESNO:

YESNO:

NONO:

w11w12 w1N

wN1wN2 wMN

Lee et al., PLOS Genetics 2009

Page 29: Understanding Gene Regulation:  From Networks to Mechanisms

Metaprior Method

Regulatory potentials…x1 x2 xN

Ei

…x1 x2

Ei

xN

Module i+2

…x1 x2

Ei

xN

Module i+1

…x1 x2

Ei

xN

Module i

Regulatory programs

1. Learn regulatory programs

3. Compute regulatory

potential of SNPs

0 0.1 0.2 0.3

1

2

3

4

5

6

Regulatory weights ß

Non-synonymousConservationAA small large

Cell cycle0 0.1 0.2 0.3

BY(lab) MVLT ELVQ VSDASKQLWDI

RM(wild) MVLT ELVQ VSDASKQLWDI

1 1 1

0

L

D

1 0 0

1

: :

0.3 0.9

= =

Regulatory features F

A

G

2. Learn ß

Lee et al., PLOS Genetics 2009

Maximize P(E,ß,W|X)

Maximize P(E,ß,W|X)

Empirical hierarchical Bayes Use point estimate of model parameters Learn priors from data to maximize joint

posterior

Page 30: Understanding Gene Regulation:  From Networks to Mechanisms

Transfer Learning What do regulatory potentials do?

They do not change selection of “strong” regulators – those where prediction of targets is clear

They only help disambiguate between weak ones

Strong regulators help teach us what to look for in other regulators

Transfer of knowledge between different prediction tasks

Page 31: Understanding Gene Regulation:  From Networks to Mechanisms

Learned regulatory weights0 0.1 0.2 0.3 0.4 0.5

Non-synonymous codingStop codon

Synonymous coding3' UTR

500 bp upstream5' UTR

500 bp downstreamConservation score

Cis-regulationChange of average mass (Da)Change of isoelectric point (pI)

Change of pK1Change of pK2

Change of hydro-phobicityChange of pKa

Change of polarityChange of pH

Change of van der WaalsTranscription regulator activity

Telomere organization andProtein folding

Glucose metabolic processRNA modification

Hexose metabolic processCell cycle checkpoint

Proteolysissame GO processsame GO functionChIP-chip binding

0 0.1 0.2 0.3 0.4 0.5

Non-synonymous codingSynonymous coding

IntronLocus region

Splice siteUTR region

Conservation scoreCis-regulation

Change of average mass (Da)Change of isoelectric point

Change of pK1Change of pKa

Change of polarityChange of pH

Change of van der Waals volumecell communicationsignal transduction

transcription

Human regulatory weights

Location

AA property change

Genefunctio

n

Pairwise feature

Regulatory features

Lee et al., PLOS Genetics 2009

Yeast regulatory weights

Page 32: Understanding Gene Regulation:  From Networks to Mechanisms

Statistical Evaluation#

of g

enes

wit

h PG

V >

X

% PGV: Percent genetic variation explained by the predicted

regulatory program for each gene

100 90 80 70 60 50 40 30 20 10 0

500

100

0 1

500

200

0

2500

PGV (%)

1450 genes (Lirnet without regulatory prior)

250 genes (Brem & Kruglyak)

850 genes (Geronemo*)

1650 genes (Lirnet)

* Lee et al. PNAS 2006Lee et al., PLOS Genetics 2009

Page 33: Understanding Gene Regulation:  From Networks to Mechanisms

How many predicted interactions have support in other data?

Deletion/ over-expression microarrays [Hughes et al. 2000; Chua et al. 2006] ChIP-chip binding experiments [Harbison et al. 2004] Transcription factor binding sites [Maclsaac et al. 2006] mRNA binding pull-down experiments [Gerber et al. 2004] Literature-curated signaling interactions

Biological evaluation I

0

10

20

30

40

50

60

70

80

90

GeronemoFlat LirnetLirnet

%interactions %interactions%modules %modules

Reg

Module

Via cascade

Lirnet without regulatory features

% S

uppo

rted

inte

ract

ions

TF

Lee et al., PLOS Genetics 2009

Page 34: Understanding Gene Regulation:  From Networks to Mechanisms

Biological Evaluation II

0

10

20

30

40

50

60

70

80

90

100

0 15 30 45 60 75 90

Significance (%)

% s

uppo

rted

regu

lato

ry p

redi

ctio

ns

Lee et al., PLOS Genetics 2009

LirnetZhu et al (Nature Genet 2008)Random

significance of support

Page 35: Understanding Gene Regulation:  From Networks to Mechanisms

What Regulates the P-Bodies?

Regu

lato

ry p

oten

tial

0.7

0.6

0.5

0.4

ChrXIV:415,000-495,000

Saccharomyces Genome Database (SGD)

High-scoring regulatory featuresConservation 0.230 Mass (Da) 0.066Cis-regulation 0.208 pK1 0.060Same GO proc

0.204 Polar 0.057

Non-syn 0.158 pI 0.050Translation regulation 0.046

MKT1 The regulatory potential over all 318 SNPs in the

region

ChrXIV:449639

Lee et al., PLOS Genetics 2009

Page 36: Understanding Gene Regulation:  From Networks to Mechanisms

XPG-N Pbp1 InteractionXPG-I*

Mkt1 Mkt1 binds to mRNAs at 3’ UTR BY has SNP at conserved residue in nuclease domain

0

5

10

15

20

25

30

35

-1.2 -0.8 -0.4 0 0.4 0.8 1.2

All othersP-body ModulePuf3 Module

MKT1

Puf3 module

P-body module

BYRM

mkt1 in RM

Lee et al., PLOS Genetics 2009

Page 37: Understanding Gene Regulation:  From Networks to Mechanisms

Predicting Causal Regulators

Region Zhu et al [Nat Genet 08] Lirnet (top 3 are considered)

1 NoneSEC18 RDH54 SPT7

2TBS1, TOS1, ARA1, CSH1, SUP45, CNS1, AMN1

AMN1 CNS1 TOS1

3 None TRS20 ABD1 PRP5

4 LEU2, ILV6, NFS1, CIT2, MATALPHA1 LEU2 PGS1 ILV65 MATALPHA1 MATALPHA1 MATALPHA2 RBK1

6 URA3 URA3 NPP2 PAC2

7 GPA1 STP2 GPA1 NEM1

8 HAP1 HAP1 NEJ1 GSY2

9 YRF1-4, YRF1-5, YLR464W SIR3 HMG2 ECM7

10 None ARG81 TAF13 CAC2

11 SAL1, TOP2 MKT1 TOP2 MSK1

12 PHM7 PHM7 ATG19 BRX1

13 None ADE2 ORT1 CAT5

Finding causal regulators for 13 “chromosomal hotspots”

8 validated regulators in 7

regions

14 validated regulators in 11

regions

Lee et al., PLOS Genetics 2009

Page 38: Understanding Gene Regulation:  From Networks to Mechanisms

Learning Regulatory Priors Learns regulatory potentials that are specific

to organism and even data set

Can use any set of regulatory features: Sequence features Functional features for relevant gene Features of regulator/target(s) pairs

Applicable to any organism, including ones where functional data may not be readily available

Lee et al., PLOS Genetics 2009

Page 39: Understanding Gene Regulation:  From Networks to Mechanisms

Conclusion Framework for modeling gene regulation

Use machine learning to identify regulatory program

Hierarchical Bayesian techniques to capture regularities in effects of perturbations on network

Uncovers diverse regulatory mechanisms Chromatin remodeling mRNA degradation

Page 40: Understanding Gene Regulation:  From Networks to Mechanisms

Acknowledgements

National Science Foundation

Nevan Krogan, UCSF Pam Silver, David Drubin, Harvard Medical

School

Su-In LeeUniversity of Washington

Dana Pe’er

Columbia University

Aimee Dudley

Institute for System Biology