Genomics of Gene Regulation 1. Motifs, Conservation, and Epigenetic Features
CSHL Course in Computational and Comparative Genomics 2016
Ross Hardison
10/29/16
WHY STUDY GENOMICS OF GENE REGULATION?
The guiding principle of developmental biology:
Differential gene expression determines the distinctive properties of each cell type.
E. H. Davidson, 1976, Gene Activity in Early Development, 2nd ed.
We’ve found most of the genes, but what about the rest of the genome?
[Figure: bp/gene. From Lisa Stubbs, U. Illinois]
Variants affecting gene regulation play a prominent role in complex traits
• The majority of genomic variants associated with complex traits are not in protein-coding exons
  – Hindorff et al. (2009) PNAS 106:9362.
• Phenotype-associated, noncoding variants are highly enriched in DNA with epigenetic signatures of regulatory regions.
Maurano et al. (2012) Science 337:1190
ENCODE Consortium (2012) Integrated Encyclopedia… Nature
Schaub et al. (2012) Genome Research
Hardison (2012) JBC 287:30932.
CRM = cis-regulatory module, e.g. enhancer
DNA SEQUENCES INVOLVED IN REGULATION OF GENE TRANSCRIPTION
Cis-regulatory modules = CRMs
Distinct classes of regulatory regions
Maston G, Evans S and Green M (2006) Annu Rev Genomics Hum Genetics 7:29-59
Act in cis, affecting expression of a gene on the same chromosome.
Cis-regulatory modules (CRMs)
Operational definitions of cis-regulatory modules
• A promoter is the DNA sequence required for correct initiation of transcription
  – Almost always at the 5’ end of the gene.
• An enhancer is a DNA sequence that causes an increase in gene expression
  – Acts independently of position and orientation with respect to the gene
• A silencer is a DNA sequence that causes a decrease in gene expression
  – Similar to an enhancer but has the opposite effect on gene expression
  – Some CRMs are switches: enhancer or silencer depending on conditions (particular TF bound, etc.)
• An insulator is a DNA sequence that blocks activation of a promoter by an enhancer
[Diagram: positions of a promoter, enhancer, silencer, and insulator relative to a gene]
SEQUENCE DETERMINANTS OF CRMs
The importance of motifs
General features of promoters
• A promoter is the DNA sequence required for correct initiation of transcription
• Most promoters are at the 5’ end of the gene.
Maston, Evans & Green (2006) Ann Rev Genomics & Human Genetics, 7:29-59
TATA box + Initiator: Core or minimal promoter. Site of assembly of preinitiation complex
Upstream regulatory elements: Regulate efficiency of utilization of minimal promoter
RNA polymerase II
Most promoters in mammals are CpG islands
TATA, no CpG island: 10-20% of promoters
CpG island, no TATA: 80-90% of promoters
Carninci … Hayashizaki (2006) Nature Genetics 38:626
Enhancers
Many regulatory DNA sequences in SV40 control region
• Cis-acting sequences that cause an increase in expression of a gene
• Act independently of position and orientation with respect to the gene
Features of cis-regulatory modules (CRMs)
Hardison & Taylor (2012) Nature Reviews Genetics 13:469-483
a. Bound and unbound motif instances
b. Transcription factors and histone modifications characteristic of different CRMs
Motifs
• Motif instance = match in a genomic DNA sequence to the preferred sequence for binding a transcription factor (TF)
• Promoters and enhancers are composed of motif instances with spacer DNA
• Are collections of motifs sufficient to define a promoter and/or enhancer?
• Do motifs comprise a grammar that explains gene regulation?
Motifs are short and thus there are MANY instances of each motif in a genome
From Lisa Stubbs, U. Illinois
8 million motif instances for binding GATA factors in mouse genome
= match to WGATAR
Only 15,000 are bound in erythroid cells
About 1 in 500 instances are bound.
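To make the WGATAR numbers concrete, here is a minimal Python sketch of how motif instances might be counted on both strands of a sequence. The function names and the short example sequence are illustrative assumptions, not the tool used to produce the slide; only the motif (WGATAR, with W = A/T and R = A/G) and the counts of 8 million instances and 15,000 bound sites come from the slides.

```python
import re

# IUPAC codes needed for this motif: W = A or T, R = A or G.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "R": "[AG]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif_instances(seq, motif="WGATAR"):
    """Return sorted (start, strand) pairs for every match to the motif.

    Start is the 0-based position of the match on the forward strand.
    A lookahead is used so that overlapping matches are all reported.
    """
    seq = seq.upper()
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in motif) + "))")
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        # Convert the reverse-strand coordinate back to the forward strand.
        hits.append((len(seq) - m.start() - len(motif), "-"))
    return sorted(hits)


# Hypothetical example sequence with one instance on each strand.
example = "CCTTATCTCTCTGCAGCAGGACGCTGATAATCTG"
instances = find_motif_instances(example)

# Fraction bound, using the counts quoted on the slides.
bound_fraction = 15_000 / 8_000_000
```

With the slide's counts, the bound fraction works out to roughly 1 in 533, consistent with the "about 1 in 500" quoted above.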
How to identify the motif instances most likely to be bound by a TF?
• Evolutionary conservation
• Chromatin with hallmarks of gene activity
  – Lack of DNA methylation
  – DNase hypersensitive sites
  – Histone modifications associated with activity
  – Bound by a transcription factor (TF)
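One way to picture this filtering step is to rank motif instances by how many lines of evidence overlap them. The sketch below is only an illustration under simple assumptions: intervals are half-open (start, end) pairs on one chromosome, the two evidence tracks (conserved and accessible regions) are hypothetical, and every track is given equal weight.

```python
def overlaps_any(start, end, regions):
    """Return True if the half-open interval [start, end) overlaps any region."""
    return any(start < r_end and r_start < end for r_start, r_end in regions)


def prioritize_motifs(motif_starts, conserved, accessible, motif_len=6):
    """Score each motif instance by the number of evidence tracks it overlaps.

    Returns (start, score) pairs sorted by descending score, then by position.
    """
    scored = []
    for start in motif_starts:
        end = start + motif_len
        score = int(overlaps_any(start, end, conserved)) + int(
            overlaps_any(start, end, accessible)
        )
        scored.append((start, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical coordinates, for illustration only.
motif_starts = [100, 480, 2000]
conserved = [(90, 130), (1990, 2010)]
accessible = [(50, 150), (400, 600)]
ranked = prioritize_motifs(motif_starts, conserved, accessible)
```

In this toy example the instance at position 100 ranks first because it is both conserved and accessible.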
Predict CRMs by interspecies conservation of motifs or other regulatory information
Hardison and Taylor (2012) Nature Reviews Genetics 13:469-483
EPIGENETIC SIGNATURES OF REGULATORY REGIONS
Epigenetics
• Epigenetic features are biochemical molecules or modifications that are associated with DNA sequences, but not the DNA sequence itself
• The prefix “epi” means “on top of”, i.e. biochemical features on top of the DNA sequence
• DNA methylation, chromatin modifications, transcription factor (TF) binding, etc.
• Does not imply trans-generational inheritance
• Often passed from mother cell to daughter cells
Epigenetic features associated with transcriptional regulation
[Figure: promoter and enhancer flanked by repressed chromatin; methylation of DNA]
Epigenetic features control access to and activity of CRMs
• Some features are associated with decreased activity of a CRM
  – Methylation of DNA at a promoter
  – Methylation of DNA at a TF binding site
  – Histone modifications H3K27me3 and H3K9me3
  – Wrapping into heterochromatin
• Some features are associated with increased activity of a CRM
  – Accessibility (monitored by nucleases)
  – Binding by some TFs
  – Histone modifications such as H3K27ac and H3K4me(1,2,3)
Predict CRMs by epigenetic signals
Hardison and Taylor (2012) Nature Reviews Genetics 13:469-483
DNA METHYLATION, CpG ISLANDS
Most vertebrate promoters are CpG islands:
DNA methylation in animals and plants
• Associated with gene silencing
  – DNA of inactive X chromosome is methylated
  – DNA on the nonexpressed allele of imprinted genes (one parental allele not expressed) is methylated
  – DNA methylation protects against “parasitic DNA”: endogenous viruses, some interspersed repeats are methylated
  – Several non-expressed genes are methylated at particular CpGs
• Disruption of genes needed for de novo and maintenance methylation of DNA in mice leads to aberrant development
• Recent studies of DNA methylation across mammalian genomes reveal more complicated patterns suggesting multiple roles for DNA methylation
DNA methylation at CG, CA
Measure DNA methylation
§ Sequence bisulfite-converted DNA
§ Bisulfite converts C to U, but leaves methyl-C unconverted
§ Scale to full genome
  § Of the 583 million C’s in human haploid genome, at least 60% were covered by >=3 reads: study them
  § Lister et al. (2009) Nature 462; Laurent et al. (2010) Genome Res. 20:320-331
§ Most C’s are not methylated (92 to 95%)
§ C’s in CpG: 55% are methylated
§ Decrease in amount of methylation during differentiation
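The logic of bisulfite sequencing can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a real pipeline: reads are full-length and already aligned to the reference, only the forward strand is considered, and the function names are made up for illustration. The minimum coverage of 3 reads mirrors the threshold mentioned above.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment followed by amplification (forward strand).

    Unmethylated C is read as T (C -> U -> T after amplification);
    methylated C is protected and still read as C.
    """
    methylated = set(methylated_positions)
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, reads, min_reads=3):
    """Estimate the methylated fraction at each reference C.

    Positions covered by fewer than `min_reads` informative reads are skipped.
    """
    calls = {}
    for i, base in enumerate(reference):
        if base != "C":
            continue
        observed = [r[i] for r in reads if r[i] in "CT"]
        if len(observed) >= min_reads:
            calls[i] = observed.count("C") / len(observed)
    return calls


# Hypothetical example: position 1 is methylated in two of three molecules.
reference = "ACGTCCGA"
reads = [
    bisulfite_convert(reference, {1}),
    bisulfite_convert(reference, {1}),
    bisulfite_convert(reference, set()),
]
calls = call_methylation(reference, reads)
```

A C that is still read as C after conversion is inferred to have been methylated, so the fraction of C reads at a position estimates its methylation level.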
Meaningful patterns in DNA methylation
Wei Xie … J. A. Thomson, Joseph Ecker, Bing Ren (2013) Cell 153:1134-1148
- DNA throughout the genome is methylated
- DNA methylation valleys contain some (most?) regulatory regions
CpG islands are not methylated, frequently promoters
• CpG is the site of methylation in vertebrates and plants
  – Deamination changes methyl-C to T
  – Occurs about 10x more frequently than other transitions
  – Results in CpG changing to TpG or CpA (methyl-C > T on complementary strand)
  – Causes a substantial depletion of CpG in genome
    • to about 20% of expected frequency
• Localized segments of the genome retain a high CpG content
  – CpG islands
  – Are not methylated (or methylated in differentiated cells)
  – Often associated with promoters
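The "expected frequency" above can be made concrete with the commonly used observed/expected CpG ratio, (number of CpG × N) / (number of C × number of G), where N is the sequence length. The sketch below assumes that formula; the function name and test sequences are illustrative. A genome-wide depletion to about 20% of expected corresponds to a ratio near 0.2, whereas CpG islands have ratios much closer to 1.

```python
def cpg_obs_exp(seq):
    """Return the observed/expected CpG ratio of a DNA sequence.

    Expected CpG count assumes C and G occur independently:
    expected = (#C * #G) / N, so ratio = (#CpG * N) / (#C * #G).
    """
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * n / (c_count * g_count)


# Hypothetical toy sequences.
cpg_rich = cpg_obs_exp("CGCGCGCG")
cpg_poor = cpg_obs_exp("CATGCATG")
```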
CHROMATIN SIGNATURES OF REGULATORY REGIONS
Histone modifications, nuclease accessibility
Nucleosome core structure, 3D
146 bp duplex DNA wrapped around 8 histone molecules: 2 each of the dimers H2A-H2B, H3-H4
N-terminal tails of histones emerge from the core
Luger et al. (1997) Nature 389:251, Richmond lab
Repression by PcG proteins: Chromatin modification
Polycomb Group (PcG) Repressor Complex 2: ESC, E(Z), NURF-55, and PcG repressor SU(Z)12
Methylates K27 of Histone H3 via the SET domain of E(Z)
[Diagram: H3 N-tail, K27me3, OFF]
H3K9 methylation associated with repression
[Diagram: H3 N-tail, K9me3, OFF]
• H3K9 methylation is catalyzed by SUV39H1 and G9a methyltransferases
  – G9a: mono- and di-methylation
  – SUV39H1: trimethylation
• di- and tri-Me H3K9: Binding site for heterochromatin protein 1 (HP1)
trx group (trxG) proteins activate via chromatin changes
• Some trithorax group proteins catalyze methylation of K4 in histone H3
  – Trx in Drosophila, MLL in humans
    • MLL = myeloid-lymphoid or mixed lineage leukemia
• Associated with activation of expression
• H3K4me1 maps to enhancers (and elsewhere)
• H3K4me3 maps to active promoters
[Diagram: H3 N-tail, K4me1,2,3, ON]
Acetylation of H3K27 at active enhancers
• Acetylation of K27 in H3 tail is associated with active enhancers
• Note that this is NOT methylation of H3K27, which is a mark of polycomb repression
• More specific mark for enhancers than H3K4me1
[Diagram: H3 N-tail, K27ac, ON]
Nucleosomes with modified histones flank regions of accessible DNA
[Genome browser view: Mouse Dec. 2011 (GRCm38/mm10) chr11:102,359,001-102,379,000 (20,000 bp), genes Slc4a1 and Bloodlinc. Tracks: H3K4me1, H3K27ac, and ATAC in G1E and G1E-ER4 cells; Gata2 and Tal1 in G1E; Gata1 and Tal1 in G1E-ER4]
METHODS IN GENOMICS OF GENE REGULATION
ChIP-seq
Mapping Functional Elements
ENCODE consortium, 2011, PLoS Biology 9: e1001046
The many steps of ChIP-seq (Daniel Savic)
Living cells → Crosslink and isolate chromatin → Shear → Immunoprecipitate → Reverse-crosslink → Ligate adapters, amplify, size-select → Sequence (ChIP and control samples) → Align → Find peaks
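The final "find peaks" step compares aligned ChIP reads with control reads. The following Python sketch shows the idea with fixed-width bins and a simple fold-enrichment cutoff. It is a deliberately simplified assumption-laden toy: real peak callers use statistical models of read counts, and the bin size, pseudocount, threshold, and read positions here are all hypothetical.

```python
from collections import Counter


def bin_counts(read_starts, bin_size):
    """Count reads per fixed-width bin, keyed by bin index."""
    return Counter(pos // bin_size for pos in read_starts)


def call_peaks(chip_starts, control_starts, bin_size=200, min_fold=4.0,
               pseudocount=1.0):
    """Return (start, end, fold) for bins enriched in ChIP over control.

    The control is scaled to the ChIP library size before comparison,
    and a pseudocount avoids division by zero in empty control bins.
    """
    chip = bin_counts(chip_starts, bin_size)
    control = bin_counts(control_starts, bin_size)
    scale = len(chip_starts) / max(len(control_starts), 1)
    peaks = []
    for b in sorted(chip):
        fold = (chip[b] + pseudocount) / (control[b] * scale + pseudocount)
        if fold >= min_fold:
            peaks.append((b * bin_size, (b + 1) * bin_size, round(fold, 2)))
    return peaks


# Hypothetical read start positions on one chromosome.
chip_starts = [1010, 1020, 1050, 1100, 1150, 1190, 5000, 9000]
control_starts = [1100, 5000, 9000, 12000]
peaks = call_peaks(chip_starts, control_starts, bin_size=200, min_fold=2.0)
```

Only the bin from 1000 to 1200 passes the cutoff here; the isolated reads at 5000 and 9000 are matched by the control and are treated as background.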
Illumina short read sequencing
- Depending on the model, can get 100’s of millions to billions of reads per run
- Want about 20-50 million mapped reads per ChIP-seq assay (varies with assay)
- Can ascertain many TF binding, histone modification, or nuclease accessibility maps per run
Example of ChIP-seq
ChIP vs NRSF = neuron-restrictive silencing factor
Jurkat human lymphoblast line
NPAS4 encodes neuronal PAS domain protein 4
Johnson DS, Mortazavi A, Myers RM, Wold B. (2007) Genome-Wide Mapping of in Vivo Protein-DNA Interactions. Science 316:1497-1502.
Patterns of epigenetic marks in CRMs and transcription units
CRMs
• Promoters
  – H3K4me3, H3K4me2
  – RNA Pol II
• Enhancers
  – H3K4me1
  – P300 coactivator
Transcription units
K4me3 → K79me2 → K36me3
Bernstein lab, ENCODE
Heintzman … Ren (2007) Nature Genetics 39:311-318; Birney et al. (2007) Nature 447:799-816
ATAC-seq: Efficient assay for chromatin accessibility with small numbers of cells
Buenrostro, Giresi, Zaba, Chang, Greenleaf (2013) Nature Methods
Differences in regulatory landscape across hematopoiesis: Gata2
[Figure: ATAC-seq tracks for HSC, CMP, CLP, GMP, MEP, MEG, ERY, GRA, MONO, T, B, NK]
ATAC-seq profiles of regulatory landscape. Data from Hardison and collaborators and from Friedman and Amit labs: Lara-Astiaso et al. (2014) Science Aug
50,000 cells each, sorted from mouse BM. Cheryl Keller, Elisabeth Heuston, Dave Bodine, Belinda Giardine
The current state of mapping function-associated features
ENCODE Data Portal
https://www.encodeproject.org
Facets to filter
DNase sensitivity and TF occupancy during fly development
Berkeley Drosophila Transcription Network Project
[Figure tracks: DNase sensitivity, TF occupancy, Genes]
CONSERVATION OF SEQUENCE AND EPIGENETIC FEATURES OF CRMs
Methods for predicting CRMs
Hardison & Taylor (2012) Nature Reviews Genetics 13:469-483
Erythroid enhancer, HS2 of HBB locus control region
[Genome browser view: Human Feb. 2009 (GRCh37/hg19) chr11:5,301,795-5,302,089 (295 bp). Tracks: HS2 position; matches to WGATAR; DNase HS and DNase footprints (K562); ChIP-seq for GATA1 in PBDE; TFs bound (NFE2, KLF1, TAL1, GATA); mammalian constraint]
But not all CRMs are that obvious…
Evolutionary constraint on SOME enhancers
• Occupancy of transcription factors is conserved in mouse and humans
• Strong evidence for evolutionary constraint on the DNA sequence
• Preservation of the TF binding site motifs across mammals
Hardison & Taylor (2012) Nature Reviews Genetics 13:469-483
Motif turnover at SOME enhancers
• Occupancy of transcription factors is conserved in mouse and humans
• More localized evolutionary constraint on the DNA sequence
• Preservation of one TF binding site motif across mammals, but second motif is in a different location in rodents compared to other mammals (lineage-specific)
Hardison & Taylor (2012) Nature Reviews Genetics 13:469-483
Lineage-specific evolution of SOME enhancers
• Occupancy of transcription factors only in mouse, not human
• No evidence for evolutionary constraint on the DNA sequence
• Preservation of one TF binding site motif in rodents and laurasiatherians (dog, horse, cow), but not in humans (lineage-specific loss of binding?)
Hardison & Taylor (2012) Nature Reviews Genetics 13:469-483
Different approaches to finding function
ENCODE Project Consortium “Defining functional elements in the human genome” (2014) PNAS
How similar are patterns of gene expression between human and mouse?
Mouse ENCODE Project Consortium (2014) Integrated Encyclopedia of mouse DNA elements. Nature
Genes with high variance between tissues
Genes with high variance between species
Answer depends on the gene examined
Use conservation or divergence of expression patterns on a gene-by-gene basis
[Figure, panel A: density of genes versus dynamic range of expression (log10), for constrained and unconstrained genes]
[Figure, panel B: proportion of variation across organs versus proportion of variation across species, for TVGs, SVGs, and others]
Constrained genes: Expressed at similar levels in all tissues and species. Housekeeping genes maintaining basic cellular functions
Unconstrained genes: Variable expression
  Classify by dominant contributor to variability in expression: species or tissue
  TVGs: Tissue specific functions conserved across species
  SVGs: Basic cellular functions that diverged across species
Pervouchine et al. (2015) Nature Communications
Breschi et al. (2016) Genome Biology
Hardison (2016) Genome Biology (Research Highlight)
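The idea of classifying a gene by the dominant contributor to its expression variability can be illustrated with a simple sum-of-squares decomposition. This sketch is not necessarily the method used in the cited papers: the expression matrix layout (rows are tissues, columns are species), the 0.75 threshold, and the function names are assumptions chosen for illustration.

```python
def variance_proportions(expr):
    """Split total variation into tissue and species components.

    expr[t][s] is the expression of one gene in tissue t of species s.
    Returns (proportion across tissues, proportion across species).
    """
    n_t, n_s = len(expr), len(expr[0])
    grand = sum(sum(row) for row in expr) / (n_t * n_s)
    ss_total = sum((x - grand) ** 2 for row in expr for x in row)
    if ss_total == 0:
        return 0.0, 0.0  # no variation at all
    row_means = [sum(row) / n_s for row in expr]
    col_means = [sum(expr[t][s] for t in range(n_t)) / n_t for s in range(n_s)]
    ss_tissue = n_s * sum((m - grand) ** 2 for m in row_means)
    ss_species = n_t * sum((m - grand) ** 2 for m in col_means)
    return ss_tissue / ss_total, ss_species / ss_total


def classify_gene(expr, threshold=0.75):
    """Label a gene TVG, SVG, or Other using a hypothetical threshold."""
    tissue_prop, species_prop = variance_proportions(expr)
    if tissue_prop >= threshold:
        return "TVG"
    if species_prop >= threshold:
        return "SVG"
    return "Other"


# Hypothetical expression values: 3 tissues (rows) x 2 species (columns).
tissue_variable = classify_gene([[10, 10], [1, 1], [5, 5]])
species_variable = classify_gene([[10, 1], [10, 1], [10, 1]])
uniform = classify_gene([[5, 5], [5, 5], [5, 5]])
```

A gene whose levels differ between tissues but agree between species is labeled a TVG, the reverse pattern gives an SVG, and a gene with no variation falls into neither class.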
Power in interpretation of comparisons
Comparative genomics
Comparative epigenomics
GATA1 factor occupancy in erythroblasts preserved across mammals
Motifs for GATA factor binding preserved across mammals
[Multiple alignment: Human Feb. 2009 (GRCh37/hg19) chr1:181,122,256-181,122,304 (49 bp), with human, orangutan, rhesus, marmoset, mouse lemur, mouse, rat, guinea pig, cow, horse, dog, elephant, tenrec, armadillo, sloth, opossum, platypus, chicken, and lizard. Human sequence: CAGAACGTTCCTTATCTCTCTGCAGCAGGACGCTGATAATCTGCCCAGC]
Signatures of
- purifying selection
- adaptive evolution
- lineage specificity
Conservation: Sequence-level and activity-level
About 40% of regulatory DNA (TFBS, DHS) in mouse maps to aligning DNA in human.
About 10% of TF-bound DNA in mouse is also bound by the same TF in human.
Olgert Denas, Richard Sandstrom, Yong Cheng, Kathryn Beal, Javier Herrero, Ross Hardison, James Taylor (2015) BMC Genomics. Genome-wide comparative analysis reveals human-mouse regulatory landscape and evolution.
4 categories of functional evolution revealed by comparative epigenomics
In 2nd species:
• FunctCons: Function conserved
• FunctActive: Function active in different tissue
• SeqCons: Sequence conserved
• LineageSpec: Not present (lineage specific)
[Diagram rows: Tissue 1, Tissue 2]
Denas, Sandstrom, Cheng, Beal, Herrero, Hardison, Taylor (2015) BMC Genomics; bioRxiv
The Mouse ENCODE Consortium (2014) Nature
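The four categories can be read as a small decision procedure applied to each regulatory element found in one species. The sketch below encodes one plausible reading of the slide's labels; the three boolean inputs and the order of the checks are assumptions, not the published classification procedure.

```python
def classify_element(aligns_to_other_species, active_same_tissue,
                     active_other_tissue):
    """Assign a regulatory element to one of four evolutionary categories.

    aligns_to_other_species: the DNA has an alignable ortholog in the 2nd species
    active_same_tissue: the ortholog shows the feature in the same tissue
    active_other_tissue: the ortholog shows the feature in a different tissue
    """
    if not aligns_to_other_species:
        return "LineageSpec"  # no orthologous sequence in the 2nd species
    if active_same_tissue:
        return "FunctCons"    # function conserved
    if active_other_tissue:
        return "FunctActive"  # function active in a different tissue
    return "SeqCons"          # sequence conserved, no activity detected


# Hypothetical elements: (aligns, active in same tissue, active in other tissue).
examples = [
    (True, True, False),
    (True, False, True),
    (True, False, False),
    (False, False, False),
]
labels = [classify_element(*e) for e in examples]
```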
Conserved and divergent occupancy of orthologous DNA segments
Yong Cheng et al., Snyder, Hardison, Pennacchio labs (2014). Principles of Regulatory Information Conservation Revealed by Comparing Mouse and Human Transcription Factor Binding Profiles. Nature
Conservation of GATA1 occupancy between mouse and human
[Genome browser view, mouse: Mouse July 2007 (NCBI37/mm9) chr1:156,885,743-156,887,787 (2,045 bp). Tracks: hs1862_heart; GATA1 in G1E-ER (24 hr), erythroblasts, and MEL cells; mammal conservation]
[Genome browser view, human: Human Feb. 2009 (GRCh37/hg19) chr1:181,121,049-181,123,654 (2,606 bp). Tracks: hs1862; GATA1 in K562 and PBDE; mammal conservation]
[Multiple alignment: Human Feb. 2009 (GRCh37/hg19) chr1:181,122,256-181,122,304 (49 bp), 19 species from human to lizard]
Motifs for GATA factor binding preserved across mammals
Conservation of TF occupancy predicts enhancers active in multiple tissues
Model: Pleiotropic functions (multiple tissues, multiple TFs binding) are subject to stronger constraint, leading to preservation of occupancy despite tendency of regulatory regions to “turn over”
Yong Cheng et al., Snyder, Hardison, Pennacchio labs (2014). Principles of Regulatory Information Conservation Revealed by Comparing Mouse and Human Transcription Factor Binding Profiles. Nature
Non-erythroid function of GATA1-bound sites could result from binding of paralogs (e.g. GATA4) to same site in other tissues
[Genome browser view: Mouse Dec. 2011 (GRCm38/mm10) chr3:84,438,567-84,482,797 (44,231 bp), gene Fhdc1. Tracks: ERY GATA1 (GSM746581_2_Gata1.bw, GSM1151146_Gata1.bw), Heart GATA4 (GSM558904_Gata4.bw), Heart EP300 (GSM558909_Ep300.bw)]
Gottgens, CODEX http://codex.stemcells.cam.ac.uk
Genomics of Gene Regulation: Motifs, Conservation, Epigenetics
• Almost all known cis-regulatory modules are clusters of motifs for TF binding sites: appropriate motifs at correct spacings
• Motifs are short and abundant, such that they provide limited discriminatory power for predictions, even in clusters
• Conservation and patterns of alignments in noncoding regions can be used to predict CRMs
  – Miss lineage-specific functions, turnover
• Biochemical features associated with cis-regulatory modules can be used to predict CRMs
  – May over-call CRMs
  – TF occupancy alone does not necessarily mean that the DNA is actively involved in regulation.
• Start with epigenetic features, and use evolutionary patterns to discern history and predict functions
  – Some genes have conserved expression patterns, others differ between species
  – Conservation of TF occupancy: Pleiotropic functions, core functions
  – Lineage-specific TF occupancy: Adaptive functions
  – Sequence conserved but function co-opted (exapted) to different function in one species