Genomics of Gene Regulation
1. Motifs, Conservation, and Epigenetic Features

CSHL Course in Computational and Comparative Genomics 2016
Ross Hardison
10/29/16


WHY STUDY GENOMICS OF GENE REGULATION?

The guiding principle of developmental biology:

Differential gene expression determines the distinctive properties of each cell type.

E.H. Davidson, 1976, Gene Activity in Early Development, 2nd ed.

We’ve found most of the genes, but what about the rest of the genome?

[Figure: bp/gene. From Lisa Stubbs, U. Illinois]

Variants affecting gene regulation play a prominent role in complex traits

• The majority of genomic variants associated with complex traits are not in protein-coding exons
  – Hindorff et al. (2009) PNAS 106:9362.
• Phenotype-associated, noncoding variants are highly enriched in DNA with epigenetic signatures of regulatory regions.

Maurano et al. (2012) Science 337:1190

ENCODE Consortium (2012) Integrated Encyclopedia… Nature

Schaub et al. (2012) Genome Research

Hardison (2012) JBC 287:30932.

CRM = cis-regulatory module, e.g. enhancer

DNA SEQUENCES INVOLVED IN REGULATION OF GENE TRANSCRIPTION

Cis-regulatory modules = CRMs

Distinct classes of regulatory regions

Maston G, Evans S and Green M (2006) Annu Rev Genomics Hum Genetics 7:29-59

Act in cis, affecting expression of a gene on the same chromosome.

Cis-regulatory modules (CRMs)

Operational definitions of cis-regulatory modules

• A promoter is the DNA sequence required for correct initiation of transcription
  – Almost always at the 5’ end of the gene.
• An enhancer is a DNA sequence that causes an increase in gene expression
  – Acts independently of position and orientation with respect to the gene
• A silencer is a DNA sequence that causes a decrease in gene expression
  – Similar to an enhancer but has an opposite effect on gene expression
  – Some CRMs are switches: enhancer or silencer depending on conditions (particular TF bound, etc.)
• An insulator is a DNA sequence that blocks activation of a promoter by an enhancer

[Diagrams: promoter and gene; enhancer and gene; silencer and gene; enhancer, insulator, and gene]

SEQUENCE DETERMINANTS OF CRMS

The importance of motifs

General features of promoters

• A promoter is the DNA sequence required for correct initiation of transcription
• Most promoters are at the 5’ end of the gene.

Maston, Evans & Green (2006) Ann Rev Genomics & Human Genetics, 7:29-59

TATA box + Initiator: Core or minimal promoter. Site of assembly of preinitiation complex

Upstream regulatory elements: Regulate efficiency of utilization of minimal promoter

[Figure label: RNA polymerase II]

Most promoters in mammals are CpG islands

TATA, no CpG island: 10-20% of promoters

CpG island, no TATA: 80-90% of promoters

Carninci … Hayashizaki (2006) Nature Genetics 38:626

Enhancers

Many regulatory DNA sequences in SV40 control region

• Cis-acting sequences that cause an increase in expression of a gene
• Act independently of position and orientation with respect to the gene

Features of cis-regulatory modules (CRMs)

Hardison & Taylor (2012) Nature Reviews Genetics 13:469-483

a. Bound and unbound motif instances

b. Transcription factors and histone modifications characteristic of different CRMs

Motifs

• Motif instance = match in a genomic DNA sequence to the preferred sequence for binding a transcription factor (TF)
• Promoters and enhancers are composed of motif instances with spacer DNA
• Are collections of motifs sufficient to define a promoter and/or enhancer?
• Do motifs comprise a grammar that explains gene regulation?

Motifs are short and thus there are MANY instances of each motif in a genome

From Lisa Stubbs, U. Illinois

8 million motif instances for binding GATA factors in mouse genome

[Figure: each mark = match to WGATAR]

Only 15,000 are bound in erythroid cells

About 1 in 500 instances is bound.
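To make the abundance of short motifs concrete, here is a minimal sketch that scans both strands of a sequence for matches to WGATAR (W = A or T, R = A or G) and estimates how many matches would be expected by chance. The function names, the equal-base-frequency model, and the approximate 2.7 Gb mouse genome size used in the note below are illustrative assumptions, not part of the slides.

```python
import re

# IUPAC codes used by the WGATAR consensus: W = A or T, R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "R": "[AG]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif_instances(seq, motif="WGATAR"):
    """Return sorted (start, strand) pairs for every match on either strand.

    Starts are 0-based positions on the forward strand.
    """
    # A lookahead lets overlapping matches be reported.
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in motif) + "))")
    seq = seq.upper()
    n = len(seq)
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    # Scan the reverse complement, then convert back to forward coordinates.
    rc = reverse_complement(seq)
    hits += [(n - m.start() - len(motif), "-") for m in pattern.finditer(rc)]
    return sorted(hits)


def expected_instances(genome_size, motif="WGATAR"):
    """Expected matches on both strands if all four bases were equally likely."""
    p = 1.0
    for b in motif:
        p *= (2 if b in "WR" else 1) / 4  # W and R each allow 2 of 4 bases
    return 2 * genome_size * p
```

Under these assumptions a WGATAR match is expected about once per 1,024 positions per strand, so `expected_instances(2.7e9)` gives roughly 5 million, the same order of magnitude as the 8 million instances quoted above. The bound fraction follows directly: 8,000,000 / 15,000 ≈ 533, i.e. about 1 in 500.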

How to identify the motif instances most likely to be bound by a TF?

• Evolutionary conservation
• Chromatin with hallmarks of gene activity
  – Lack of DNA methylation
  – DNase hypersensitive sites
  – Histone modifications associated with activity
  – Bound by a transcription factor (TF)

Predict CRMs by interspecies conservation of motifs or other regulatory information

Hardison and Taylor (2012) Nature Reviews Genetics 13:469-483

EPIGENETIC SIGNATURES OF REGULATORY REGIONS

Epigenetics

• Epigenetic features are biochemical molecules or modifications that are associated with DNA sequences, but not the DNA sequence itself
• The prefix “epi” means “on top of”, i.e. biochemical features on top of the DNA sequence
• DNA methylation, chromatin modifications, transcription factor (TF) binding, etc.
• Does not imply trans-generational inheritance
• Often passed from mother cell to daughter cells

Epigenetic features associated with transcriptional regulation

[Figure labels: Promoter; Enhancer; Repressed chromatin; Methylation of DNA]

Epigenetic features control access to and activity of CRMs

• Some features are associated with decreased activity of a CRM
  – Methylation of DNA at a promoter
  – Methylation of DNA at a TF binding site
  – Histone modifications H3K27me3 and H3K9me3
  – Wrapping into heterochromatin
• Some features are associated with increased activity of a CRM
  – Accessibility (monitored by nucleases)
  – Binding by some TFs
  – Histone modifications such as H3K27ac and H3K4me(1,2,3)

Predict CRMs by epigenetic signals

Hardison and Taylor (2012) Nature Reviews Genetics 13:469-483

DNA METHYLATION, CpG ISLANDS

Most vertebrate promoters are CpG islands

DNA methylation in animals and plants

• Associated with gene silencing
  – DNA of inactive X chromosome is methylated
  – DNA on the nonexpressed allele of imprinted genes (one parental allele not expressed) is methylated
  – DNA methylation protects against “parasitic DNA”: endogenous viruses, some interspersed repeats are methylated
  – Several non-expressed genes are methylated at particular CpGs
• Disruption of genes needed for de novo and maintenance methylation of DNA in mice leads to aberrant development
• Recent studies of DNA methylation across mammalian genomes reveal more complicated patterns suggesting multiple roles for DNA methylation

DNA methylation at CG, CA

Measure DNA methylation

§ Sequence bisulfite-converted DNA
§ Bisulfite converts C to U, but leaves methyl-C unconverted
§ Scale to full genome
  § Of the 583 million C’s in human haploid genome, at least 60% were covered by >=3 reads: study them
  § Lister et al. (2009) Nature 462; Laurent et al. (2010) Genome Res. 20:320-331
§ Most C’s are not methylated (92 to 95%)
§ C’s in CpG: 55% are methylated
§ Decrease in amount of methylation during differentiation
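The logic of bisulfite sequencing can be sketched in a few lines: unmethylated C reads out as T after conversion and amplification, methyl-C still reads as C, so the methylation level at a reference C is the fraction of covering reads that show C. The toy below assumes forward-strand reads that are already aligned; the function names and the `min_reads` parameter (mirroring the ">=3 reads" filter above) are illustrative choices, not a published pipeline.

```python
def bisulfite_convert(seq, methylated):
    """Simulate bisulfite treatment of one strand, as read after PCR.

    Unmethylated C is converted to U and is read as T; methyl-C stays C.
    `methylated` is a set of 0-based positions of methylated cytosines.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def methylation_levels(reference, reads, min_reads=3):
    """Fraction of reads showing C at each reference C with enough coverage.

    `reference` is an upper-case forward-strand sequence. `reads` is a list
    of (start, sequence) pairs aligned to it; the reverse strand is ignored
    in this toy version.
    """
    counts = {}  # position -> [reads showing C, reads showing T]
    for start, read in reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if pos < len(reference) and reference[pos] == "C" and base in "CT":
                tally = counts.setdefault(pos, [0, 0])
                tally[0 if base == "C" else 1] += 1
    return {
        pos: c / (c + t)
        for pos, (c, t) in counts.items()
        if c + t >= min_reads
    }
```

For example, if three of four reads covering a CpG retain the C and one shows T, the estimated methylation level at that position is 0.75.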


Meaningful patterns in DNA methylation

Wei Xie … J.A. Thomson, Joseph Ecker, Bing Ren (2013) Cell 153:1134-1148

- DNA throughout the genome is methylated
- DNA methylation valleys contain some (most?) regulatory regions

CpG islands are not methylated, frequently promoters

• CpG is the site of methylation in vertebrates and plants
  – Deamination changes methyl-C to T
  – Occurs about 10x more frequently than other transitions
  – Results in CpG changing to TpG or CpA (methyl-C > T on complementary strand)
  – Causes a substantial depletion of CpG in genome
    • to about 20% of expected frequency
• Localized segments of the genome retain a high CpG content
  – CpG islands
  – Are not methylated (or methylated in differentiated cells)
  – Often associated with promoters
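CpG depletion is usually quantified as an observed/expected ratio: the number of CpG dinucleotides divided by the number expected from the C and G content of the same sequence. The sketch below uses the commonly cited form (number of CpG × length) / (number of C × number of G); the function names are illustrative, and `deaminate_cpg` is a deliberately crude model in which every CpG on the scanned strand becomes TpG.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G)."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")
    if c == 0 or g == 0:
        return 0.0  # no CpG is possible without both C and G
    return cpg * n / (c * g)


def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def deaminate_cpg(seq):
    """Crude model of methyl-C deamination: every CpG becomes TpG."""
    return seq.upper().replace("CG", "TG")
```

A ratio near 1 means CpG occurs about as often as base composition predicts; the bulk genome value of about 0.2 quoted above reflects the depletion, while CpG islands retain much higher ratios.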


CHROMATIN SIGNATURES OF REGULATORY REGIONS

Histone modifications, nuclease accessibility

Nucleosome core structure, 3D

146 bp duplex DNA wrapped around 8 histone molecules: 2 each of the dimers H2A-H2B, H3-H4
N-terminal tails of histones emerge from the core

Luger et al. (1997) Nature 389:251, Richmond lab

Repression by PcG proteins: Chromatin modification

Polycomb Group (PcG) Repressor Complex 2: ESC, E(Z), NURF-55, and PcG repressor SU(Z)12

Methylates K27 of Histone H3 via the SET domain of E(Z)

[Diagram: H3 N-tail, K27me3, OFF]

H3K9 methylation associated with repression

[Diagram: H3 N-tail, K9me3, OFF]

• H3K9 methylation is catalyzed by SUV39H1 and G9a methyltransferases
  • G9a: mono- and di-methylation
  • SUV39H1: trimethylation
• di- and tri-MeH3K9: Binding site for heterochromatin protein 1 (HP1)

trx group (trxG) proteins activate via chromatin changes

• Some trithorax group proteins catalyze methylation of K4 in histone H3
  – Trx in Drosophila, MLL in humans
    • MLL = myeloid-lymphoid or mixed lineage leukemia
• Associated with activation of expression
• H3K4me1 maps to enhancers (and elsewhere)
• H3K4me3 maps to active promoters

[Diagram: H3 N-tail, K4me1,2,3, ON]

Acetylation of H3K27 at active enhancers

• Acetylation of K27 in H3 tail is associated with active enhancers
• Note that this is NOT methylation of H3K27; that is a mark of polycomb repression
• More specific mark for enhancers than H3K4me1

[Diagram: H3 N-tail, K27ac, ON]

Nucleosomes with modified histones flank regions of accessible DNA

[Genome browser view: Mouse Dec. 2011 (GRCm38/mm10) chr11:102,359,001-102,379,000 (20,000 bp); genes Slc4a1, Bloodlinc. Tracks: G1E H3K4me1; G1E H3K27ac; G1E-ER4 H3K4me1; ER4 H3K27ac; G1E ATAC; ER4 ATAC; G1E Gata2; G1E Tal1; G1E-ER4 Gata1; G1E-ER4 Tal1]


METHODS IN GENOMICS OF GENE REGULATION

ChIP-seq

Mapping Functional Elements

ENCODE consortium, 2011, PLoS Biology 9: e1001046

The many steps of ChIP-seq

Daniel Savic

[Workflow figure: Living cells; Crosslink and isolate chromatin; Shear; Immunoprecipitate; Reverse-crosslink; Ligate adapters; Amplify; Size-select; Sequence; Align; Find peaks. Sample labels: ChIP, Control]
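The last two steps, aligning reads and finding peaks, amount to asking where the ChIP sample has more reads than the control. A minimal sketch of that comparison follows, assuming fixed-width windows, a control scaled to the ChIP sequencing depth, and hypothetical thresholds (`window`, `min_fold`, `min_reads`). Real peak callers use statistical models of the background; this toy is only meant to show the shape of the computation and is not the method used for the data in these slides.

```python
from collections import Counter


def call_peaks(chip_starts, control_starts, window=200, min_fold=4.0, min_reads=10):
    """Return (window_start, chip_count, fold) for enriched windows.

    chip_starts / control_starts: aligned read start positions on one
    chromosome, for the ChIP sample and the control sample.
    """
    chip = Counter(pos // window for pos in chip_starts)
    ctrl = Counter(pos // window for pos in control_starts)
    # Scale the control so both samples have the same effective depth.
    scale = len(chip_starts) / max(len(control_starts), 1)
    peaks = []
    for w, n_chip in sorted(chip.items()):
        # Floor the expectation at 1 read to avoid dividing by zero.
        expected = max(ctrl[w] * scale, 1.0)
        fold = n_chip / expected
        if n_chip >= min_reads and fold >= min_fold:
            peaks.append((w * window, n_chip, fold))
    return peaks
```

Windows that pass both the read-count and fold-enrichment thresholds are reported as candidate peaks; adjacent windows would normally be merged, which is omitted here for brevity.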

Illumina short read sequencing

- Depending on the model, can get 100’s of millions to billions of reads per run
- Want about 20-50 million mapped reads per ChIP-seq assay (varies with assay)
- Can ascertain many TF binding, histone modification, or nuclease accessibility maps per run
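The "many maps per run" point is simple arithmetic, sketched below. The `mapped_fraction` parameter is a hypothetical allowance for reads that fail to align; the run sizes in the note are examples within the ranges quoted above, not instrument specifications.

```python
def assays_per_run(reads_per_run, reads_per_assay, mapped_fraction=1.0):
    """Number of assays one run can support at a target mapped-read depth."""
    usable = reads_per_run * mapped_fraction  # reads expected to align
    return int(usable // reads_per_assay)
```

For instance, a run yielding 400 million mapped reads supports 20 assays at 20 million reads each, or 8 assays at 50 million reads each.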

Example of ChIP-seq

ChIP vs NRSF = neuron-restrictive silencing factor
Jurkat human lymphoblast line

NPAS4 encodes neuronal PAS domain protein 4

Johnson DS, Mortazavi A, Myers RM, Wold B. (2007) Genome-Wide Mapping of in Vivo Protein-DNA Interactions. Science 316:1497-1502.

Patterns of epigenetic marks in CRMs and transcription units

• Promoters
  – H3K4me3, H3K4me2
  – RNA Pol II
• Enhancers
  – H3K4me1
  – P300 coactivator
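These associations, together with the repressive and activating marks described in the chromatin-signature slides, can be read as a rough rule table. The sketch below is one hypothetical way to encode them: the function name, the output labels, and especially the order in which the rules are checked are assumptions made for illustration. Real segmentation tools learn such states from data rather than applying fixed rules.

```python
def classify_region(marks):
    """Assign a coarse label to a region from the set of marks detected there.

    `marks` is any iterable of mark names such as "H3K4me3" or "P300".
    """
    marks = set(marks)
    if "H3K4me3" in marks:
        return "promoter-like"
    if "H3K4me1" in marks or "P300" in marks:
        # H3K27ac distinguishes active enhancers from other H3K4me1 regions.
        if "H3K27ac" in marks:
            return "active enhancer-like"
        return "enhancer-like"
    if "H3K27me3" in marks:
        return "Polycomb-repressed"
    if "H3K9me3" in marks:
        return "heterochromatin-like"
    return "no call"
```

Checking H3K4me3 first reflects the slide's pairing of that mark with promoters; regions carrying both activating and repressive marks would need more careful handling than this toy provides.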

K4me3 → K79me2 → K36me3

Bernstein lab, ENCODE

Heintzman … Ren (2007) Nature Genetics 39:311-318; Birney et al. (2007) Nature, 447:799-816

[Figure panels: CRMs; Transcription units]

ATAC-seq: Efficient assay for chromatin accessibility with small numbers of cells

Buenrostro, Giresi, Zaba, Chang, Greenleaf (2013) Nature Methods

Differences in regulatory landscape across hematopoiesis: Gata2

[Figure tracks by cell type: HSC, CMP, CLP, GMP, MEP, MEG, ERY, GRA, MONO, T, B, NK]

ATAC-seq profiles of regulatory landscape. Data from Hardison and collaborators and from Friedman and Amit labs: Lara-Astiaso et al. (2014) Science Aug

50,000 cells each, sorted from mouse BM. Cheryl Keller, Elisabeth Heuston, Dave Bodine, Belinda Giardine

The current state of mapping function-associated features

ENCODE Data Portal

https://www.encodeproject.org

Facets to filter

DNase sensitivity and TF occupancy during fly development

[Figure tracks: DNase sensitivity; TF occupancy; Genes. Berkeley Drosophila Transcription Network Project]

CONSERVATION OF SEQUENCE AND EPIGENETIC FEATURES OF CRMS

Methods for predicting CRMs

Hardison & Taylor (2012) Nature Reviews Genetics 13:469-483

Erythroid enhancer, HS2 of HBB locus control region

[Genome browser view: Human Feb. 2009 (GRCh37/hg19) chr11:5,301,795-5,302,089 (295 bp). Labeled tracks: TFs bound (NFE2, KLF1, TAL1, GATA); DNase footprints; mammalian constraint; ChIP-seq GATA1 (PBDE); DNase HS; match to WGATAR]

But not all CRMs are that obvious…

Evolutionary constraint on SOME enhancers

• Occupancy of transcription factors is conserved in mouse and humans
• Strong evidence for evolutionary constraint on the DNA sequence
• Preservation of the TF binding site motifs across mammals

Hardison & Taylor (2012) Nature Reviews Genetics 13:469-483

Motif turnover at SOME enhancers

• Occupancy of transcription factors is conserved in mouse and humans
• More localized evolutionary constraint on the DNA sequence
• Preservation of one TF binding site motif across mammals, but second motif is in different location in rodents compared to other mammals (lineage-specific)

Hardison & Taylor (2012) Nature Reviews Genetics 13:469-483

Lineage specific evolution of SOME enhancers

• Occupancy of transcription factors only in mouse, not human
• No evidence for evolutionary constraint on the DNA sequence
• Preservation of one TF binding site motif in rodents and laurasiatherians (dog, horse, cow), but not in humans (lineage-specific loss of binding?)

Hardison & Taylor (2012) Nature Reviews Genetics 13:469-483

Different approaches to finding function

ENCODE Project Consortium "Defining functional elements in the human genome” (2014) PNAS

How similar are patterns of gene expression between human and mouse?

Mouse ENCODE Project Consortium (2014) Integrated Encyclopedia of mouse DNA elements. Nature

[Figure panels: Genes with high variance between tissues; Genes with high variance between species]

Answer depends on the gene examined

Use conservation or divergence of expression patterns on a gene-by-gene basis

[Figure A: density vs. dynamic range of expression (log10), for constrained and unconstrained genes. Figure B: proportion of variation across organs vs. proportion of variation across species, for TVGs, SVGs, and others]

Constrained genes: Expressed at similar levels in all tissues and species. Housekeeping genes maintaining basic cellular functions.

Unconstrained genes: Variable expression. Classify by dominant contributor to variability in expression: species or tissue.
TVGs: Tissue specific functions conserved across species
SVGs: Basic cellular functions that diverged across species

Pervouchine et al. (2015) Nature Communications
Breschi et al. (2016) Genome Biology
Hardison (2016) Genome Biology (Research Highlight)
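The classification by "dominant contributor to variability" can be illustrated with a simple two-factor decomposition of variance for one gene measured across species and tissues. The sketch below is an assumption-laden toy, not the procedure of the cited papers: the function names, the `min_range` threshold for calling a gene constrained, and the `cutoff` on the variance proportions are all hypothetical.

```python
def variance_proportions(expr):
    """Split one gene's expression variance into species and tissue parts.

    expr[s][t] is the log expression in species s and tissue t.
    Returns (proportion across species, proportion across tissues).
    """
    n_s, n_t = len(expr), len(expr[0])
    grand = sum(sum(row) for row in expr) / (n_s * n_t)
    ss_total = sum((x - grand) ** 2 for row in expr for x in row)
    if ss_total == 0:
        return 0.0, 0.0
    # Sum of squares explained by species means and by tissue means.
    ss_species = n_t * sum((sum(row) / n_t - grand) ** 2 for row in expr)
    ss_tissue = n_s * sum(
        (sum(expr[s][t] for s in range(n_s)) / n_s - grand) ** 2
        for t in range(n_t)
    )
    return ss_species / ss_total, ss_tissue / ss_total


def classify_gene(expr, min_range=1.0, cutoff=0.6):
    """Label a gene as constrained, TVG, SVG, or other."""
    values = [x for row in expr for x in row]
    if max(values) - min(values) < min_range:
        return "constrained"  # similar level in all tissues and species
    p_species, p_tissue = variance_proportions(expr)
    if p_tissue >= cutoff:
        return "TVG"
    if p_species >= cutoff:
        return "SVG"
    return "other"
```

A gene whose level differs between tissues but is the same in both species comes out as a TVG, while one that differs between species but not between tissues comes out as an SVG.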

Power in interpretation of comparisons

Comparative genomics

Comparative epigenomics

GATA1 factor occupancy in erythroblasts preserved across mammals

[Multiple sequence alignment: Human Feb. 2009 (GRCh37/hg19) chr1:181,122,256-181,122,304 (49 bp). Species: Human, Orangutan, Rhesus, Marmoset, Mouse lemur, Mouse, Rat, Guinea pig, Cow, Horse, Dog, Elephant, Tenrec, Armadillo, Sloth, Opossum, Platypus, Chicken, Lizard. Human sequence: CAGAACGTTCCTTATCTCTCTGCAGCAGGACGCTGATAATCTGCCCAGC]

Signatures of
- purifying selection
- adaptive evolution
- lineage specificity

Motifs for GATA factor binding preserved across mammals

Conservation: Sequence-level and activity-level

About 40% of regulatory DNA (TFBS, DHS) in mouse maps to aligning DNA in human.
About 10% of TF-bound DNA in mouse is also bound by the same TF in human.

Olgert Denas, Richard Sandstrom, Yong Cheng, Kathryn Beal, Javier Herrero, Ross Hardison, James Taylor (2015) BMC Genomics. Genome-wide comparative analysis reveals human-mouse regulatory landscape and evolution.

4 categories of functional evolution revealed by comparative epigenomics

In 2nd species:
• Function conserved (FunctCons)
• Function active in different tissue (FunctActive)
• Sequence conserved (SeqCons)
• Not present, lineage specific (LineageSpec)

[Figure rows: Tissue 1; Tissue 2]

Denas, Sandstrom, Cheng, Beal, Herrero, Hardison, Taylor (2015) BMC Genomics; bioRxiv

The Mouse ENCODE Consortium (2014) Nature
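The four categories can be read as a small decision procedure applied to each regulatory element from one species. The sketch below is one plausible reading of the slide, not the published method: the three boolean inputs and the order of the checks are assumptions, while the returned labels are the abbreviations used above.

```python
def classify_element(aligns, active_same_tissue, active_other_tissue):
    """Assign one of the four categories to an element from species 1.

    aligns: the element maps to aligning DNA in the 2nd species.
    active_same_tissue: the orthologous DNA shows the same feature in the
        same tissue of the 2nd species.
    active_other_tissue: the orthologous DNA shows the feature in a
        different tissue of the 2nd species.
    """
    if not aligns:
        return "LineageSpec"  # no orthologous DNA in the 2nd species
    if active_same_tissue:
        return "FunctCons"
    if active_other_tissue:
        return "FunctActive"
    return "SeqCons"  # sequence aligns, but no activity was detected
```

Read against the percentages above, roughly 40% of mouse regulatory DNA passes the first check, and only a fraction of that reaches the FunctCons category.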

Conserved and divergent occupancy of orthologous DNA segments

Yong Cheng et al., Snyder, Hardison, Pennacchio labs (2014). Principles of Regulatory Information Conservation Revealed by Comparing Mouse and Human Transcription Factor Binding Profiles. Nature

Conservation of GATA1-occupancy between mouse and human

[Genome browser view, mouse: Mouse July 2007 (NCBI37/mm9) chr1:156,885,743-156,887,787 (2,045 bp). Tracks: hs1862_heart; G1E-ER GATA1 24hr; Erythrobl GATA1; MEL GATA-1; Mammal Cons]

[Genome browser view, human: Human Feb. 2009 (GRCh37/hg19) chr1:181,121,049-181,123,654 (2,606 bp). Tracks: hs1862; K562 GATA1 Sg; PBDE GATA1 Sg; Mammal Cons]

[Multiple sequence alignment: Human Feb. 2009 (GRCh37/hg19) chr1:181,122,256-181,122,304 (49 bp), the same 19-species alignment shown under "Power in interpretation of comparisons"]

Motifs for GATA factor binding preserved across mammals

Conservation of TF occupancy predicts enhancers active in multiple tissues

Model: Pleiotropic functions (multiple tissues, multiple TFs binding) are subject to stronger constraint, leading to preservation of occupancy despite tendency of regulatory regions to “turn over”

Yong Cheng et al., Snyder, Hardison, Pennacchio labs (2014). Principles of Regulatory Information Conservation Revealed by Comparing Mouse and Human Transcription Factor Binding Profiles. Nature

Non-erythroid function of GATA1-bound sites could result from binding of paralogs (e.g. GATA4) to same site in other tissues

[Genome browser view: Mouse Dec. 2011 (GRCm38/mm10) chr3:84,438,567-84,482,797 (44,231 bp); gene Fhdc1. Tracks: ERY GATA1 (GSM746581_2_Gata1.bw); ERY GATA1 (GSM1151146_Gata1.bw); Heart GATA4 (GSM558904_Gata4.bw); Heart EP300 (GSM558909_Ep300.bw)]

Gottgens, CODEX http://codex.stemcells.cam.ac.uk

Genomics of Gene Regulation: Motifs, Conservation, Epigenetics

• Almost all known cis-regulatory modules are clusters of motifs for TF binding sites: appropriate motifs at correct spacings
• Motifs are short and abundant, such that they provide limited discriminatory power for predictions, even in clusters
• Conservation and patterns of alignments in noncoding regions can be used to predict CRMs
  – Miss lineage specific functions, turnover
• Biochemical features associated with cis-regulatory modules can be used to predict CRMs
  – May over-call CRMs
  – TF occupancy alone does not necessarily mean that the DNA is actively involved in regulation.
• Start with epigenetic features, and use evolutionary patterns to discern history and predict functions
  – Some genes have conserved expression patterns, others differ between species
  – Conservation of TF occupancy: Pleiotropic functions, core functions
  – Lineage-specific TF occupancy: Adaptive functions
  – Sequence conserved but function co-opted (exapted) to different function in one species