the accessible chromatin landscape of the human genome articulo del curso

8
ARTICLE doi:10.1038/nature11232 The accessible chromatin landscape of the human genome Robert E. Thurman 1 *, Eric Rynes 1 *, Richard Humbert 1 *, Jeff Vierstra 1 , Matthew T. Maurano 1 , Eric Haugen 1 , Nathan C. Sheffield 2 , Andrew B. Stergachis 1 , Hao Wang 1 , Benjamin Vernot 1 , Kavita Garg 3 , Sam John 1 , Richard Sandstrom 1 , Daniel Bates 1 , Lisa Boatman 4 , Theresa K. Canfield 1 , Morgan Diegel 1 , Douglas Dunn 1 , Abigail K. Ebersol 4 , Tristan Frum 4 , Erika Giste 1 , Audra K. Johnson 1 , Ericka M. Johnson 4 , Tanya Kutyavin 1 , Bryan Lajoie 5 , Bum-Kyu Lee 6 , Kristen Lee 1 , Darin London 2 , Dimitra Lotakis 4 , Shane Neph 1 , Fidencio Neri 1 , Eric D. Nguyen 4 , Hongzhu Qu 1,7 , Alex P. Reynolds 1 , Vaughn Roach 1 , Alexias Safi 2 , Minerva E. Sanchez 4 , Amartya Sanyal 5 , Anthony Shafer 1 , Jeremy M. Simon 8 , Lingyun Song 2 , Shinny Vong 1 , Molly Weaver 1 , Yongqi Yan 4 , Zhancheng Zhang 8 , Zhuzhu Zhang 8 , Boris Lenhard 9 {, Muneesh Tewari 3 , Michael O. Dorschner 10 , R. Scott Hansen 4 , Patrick A. Navas 4 , George Stamatoyannopoulos 4 , Vishwanath R. Iyer 6 , Jason D. Lieb 8 , Shamil R. Sunyaev 11 , Joshua M. Akey 1 , Peter J. Sabo 1 , Rajinder Kaul 4 , Terrence S. Furey 8 , Job Dekker 5 , Gregory E. Crawford 2 & John A. Stamatoyannopoulos 1,12 DNase I hypersensitive sites (DHSs) are markers of regulatory DNA and have underpinned the discovery of all classes of cis-regulatory elements including enhancers, promoters, insulators, silencers and locus control regions. Here we present the first extensive map of human DHSs identified through genome-wide profiling in 125 diverse cell and tissue types. We identify 2.9 million DHSs that encompass virtually all known experimentally validated cis-regulatory sequences and expose a vast trove of novel elements, most with highly cell-selective regulation. Annotating these elements using ENCODE data reveals novel relationships between chromatin accessibility, transcription, DNA methylation and regulatory factor occupancy patterns. We connect 580,000 distal DHSs with their target promoters, revealing systematic pairing of different classes of distal DHSs and specific promoter types. Patterning of chromatin accessibility at many regulatory regions is organized with dozens to hundreds of co-activated elements, and the transcellular DNase I sensitivity pattern at a given region can predict cell-type-specific functional behaviours. The DHS landscape shows signatures of recent functional evolutionary constraint. However, the DHS compartment in pluripotent and immortalized cells exhibits higher mutation rates than that in highly differentiated cells, exposing an unexpected link between chromatin accessibility, proliferative potential and patterns of human variation. Cell-selective activation of regulatory DNA drives the gene expression patterns that shape cell identity. Regulatory DNA is characterized by the cooperative binding of sequence-specific transcriptional regulatory factors in place of a canonical nucleosome, leading to a remodelled chromatin state char- acterized by markedly heightened accessibility to nucleases 1 . DNase I hypersensitive sites (DHSs) in chromatin were first identified over 30years ago, and have since been used extensively to map regulatory DNA regions in diverse organisms 2 . DNase I hypersensitivity is central to all defined classes of active cis-regulatory elements including enhan- cers, promoters, silencers, insulators and locus control regions 2–4 . Because DNase I hypersensitivity overlies cis-regulatory elements directly and is maximal over the core region of regulatory factor occu- pancy, it enables precise delineation of the genomic cis-regulatory compartment. DHSs are flanked by nucleosomes, which may acquire histone modification patterns that reflect the functional role of the adjoining regulatory DNA, such as the association of histone H3 lysine 4 trimethylation (H3K4me3) with promoter elements 5 . Recent advances have enabled genome-scale mapping of DHSs in mammalian cells 6–8 , laying the foundations for comprehensive cata- logues of human regulatory DNA. General features of the accessible chromatin landscape Two ENCODE production centres (University of Washington and Duke University) profiled DNase I sensitivity genome-wide using massively parallel sequencing 7–9 in a total of 125 human cell and tissue types including normal differentiated primary cells (n 5 71), immortalized primary cells (n 5 16), malignancy-derived cell lines (n 5 30) and multipotent and pluripotent progenitor cells (n 5 8) (Supplementary Table 1). The density of mapped DNase I cleavages as a function of genome position provides a continuous quantitative measure of chromatin accessibility, in which DHSs appear as prominent peaks within the signal data from each cell type (Fig. 1a and Supplementary Figs 1 and 2). Analysis using a common algorithm (see Methods) identified 2,890,742 distinct high-confidence DHSs (false discovery rate (FDR) of 1%; see Methods), each of which was active in one or more cell types. Of these DHSs, 970,100 were specific to a single cell type, 1,920,642 were active in 2 or more cell types, and a ENCODE Encyclopedia of DNA Elements nature.com/encode *These authors contributed equally to this work. 1 Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA. 2 Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina 27708, USA. 3 Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA. 4 Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, Washington 98195, USA. 5 Program in Systems Biology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA. 6 Institute for Cellular and Molecular Biology, University of Texas, Austin, Texas 78712, USA. 7 Laboratory of Disease Genomics and Individualized Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China. 8 Department of Biology, University of North Carolina, Chapel Hill, North Carolina 27599, USA. 9 Department of Biology and Bergen Center for Computational Science, University of Bergen, Bergen 5008, Norway. 10 Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, Washington 98195, USA. 11 Department of Medicine, Division of Genetics, Brigham & Women’s Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA. 12 Department of Medicine, Division of Oncology, University of Washington, Seattle, Washington 98195, USA. {Present address: Institute for Clinical Sciences, Faculty of Medicine, Imperial College London, and MRC Clinical Sciences Centre, London W12 0NN, UK. 6 SEPTEMBER 2012 | VOL 489 | NATURE | 75 Macmillan Publishers Limited. All rights reserved ©2012

Upload: guillermo-jose-ruiz-florez

Post on 11-Aug-2015

11 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: The Accessible Chromatin Landscape of the Human Genome Articulo Del Curso

ARTICLEdoi:10.1038/nature11232

The accessible chromatin landscape ofthe human genomeRobert E. Thurman1*, Eric Rynes1*, Richard Humbert1*, Jeff Vierstra1, Matthew T. Maurano1, Eric Haugen1, Nathan C. Sheffield2,Andrew B. Stergachis1, Hao Wang1, Benjamin Vernot1, Kavita Garg3, Sam John1, Richard Sandstrom1, Daniel Bates1, Lisa Boatman4,Theresa K. Canfield1, Morgan Diegel1, Douglas Dunn1, Abigail K. Ebersol4, Tristan Frum4, Erika Giste1, Audra K. Johnson1,Ericka M. Johnson4, Tanya Kutyavin1, Bryan Lajoie5, Bum-Kyu Lee6, Kristen Lee1, Darin London2, Dimitra Lotakis4, Shane Neph1,Fidencio Neri1, Eric D. Nguyen4, Hongzhu Qu1,7, Alex P. Reynolds1, Vaughn Roach1, Alexias Safi2, Minerva E. Sanchez4,Amartya Sanyal5, Anthony Shafer1, Jeremy M. Simon8, Lingyun Song2, Shinny Vong1, Molly Weaver1, Yongqi Yan4,Zhancheng Zhang8, Zhuzhu Zhang8, Boris Lenhard9{, Muneesh Tewari3, Michael O. Dorschner10, R. Scott Hansen4,Patrick A. Navas4, George Stamatoyannopoulos4, Vishwanath R. Iyer6, Jason D. Lieb8, Shamil R. Sunyaev11, Joshua M. Akey1,Peter J. Sabo1, Rajinder Kaul4, Terrence S. Furey8, Job Dekker5, Gregory E. Crawford2 & John A. Stamatoyannopoulos1,12

DNase I hypersensitive sites (DHSs) are markers of regulatory DNA and have underpinned the discovery of all classes ofcis-regulatory elements including enhancers, promoters, insulators, silencers and locus control regions. Here we presentthe first extensive map of human DHSs identified through genome-wide profiling in 125 diverse cell and tissue types. Weidentify 2.9 million DHSs that encompass virtually all known experimentally validated cis-regulatory sequences andexpose a vast trove of novel elements, most with highly cell-selective regulation. Annotating these elements usingENCODE data reveals novel relationships between chromatin accessibility, transcription, DNA methylation andregulatory factor occupancy patterns. We connect 580,000 distal DHSs with their target promoters, revealingsystematic pairing of different classes of distal DHSs and specific promoter types. Patterning of chromatin accessibilityat many regulatory regions is organized with dozens to hundreds of co-activated elements, and the transcellular DNase Isensitivity pattern at a given region can predict cell-type-specific functional behaviours. The DHS landscape showssignatures of recent functional evolutionary constraint. However, the DHS compartment in pluripotent andimmortalized cells exhibits higher mutation rates than that in highly differentiated cells, exposing an unexpected linkbetween chromatin accessibility, proliferative potential and patterns of human variation.

Cell-selective activation of regulatory DNAdrives the gene expression patterns that shapecell identity. Regulatory DNA is characterizedby the cooperative binding of sequence-specifictranscriptional regulatory factors in place of acanonical nucleosome, leading to a remodelled chromatin state char-acterized by markedly heightened accessibility to nucleases1. DNase Ihypersensitive sites (DHSs) in chromatin were first identified over30 years ago, and have since been used extensively to map regulatoryDNA regions in diverse organisms2. DNase I hypersensitivity is centralto all defined classes of active cis-regulatory elements including enhan-cers, promoters, silencers, insulators and locus control regions2–4.Because DNase I hypersensitivity overlies cis-regulatory elementsdirectly and is maximal over the core region of regulatory factor occu-pancy, it enables precise delineation of the genomic cis-regulatorycompartment. DHSs are flanked by nucleosomes, which may acquirehistone modification patterns that reflect the functional role of theadjoining regulatory DNA, such as the association of histone H3 lysine 4trimethylation (H3K4me3) with promoter elements5. Recent advanceshave enabled genome-scale mapping of DHSs in mammalian cells6–8,

laying the foundations for comprehensive cata-logues of human regulatory DNA.

General features of the accessiblechromatin landscape

Two ENCODE production centres (University of Washington andDuke University) profiled DNase I sensitivity genome-wide usingmassively parallel sequencing7–9 in a total of 125 human cell andtissue types including normal differentiated primary cells (n 5 71),immortalized primary cells (n 5 16), malignancy-derived cell lines(n 5 30) and multipotent and pluripotent progenitor cells (n 5 8)(Supplementary Table 1). The density of mapped DNase I cleavagesas a function of genome position provides a continuous quantitativemeasure of chromatin accessibility, in which DHSs appear asprominent peaks within the signal data from each cell type (Fig. 1aand Supplementary Figs 1 and 2). Analysis using a common algorithm(see Methods) identified 2,890,742 distinct high-confidence DHSs(false discovery rate (FDR) of 1%; see Methods), each of which wasactive in one or more cell types. Of these DHSs, 970,100 were specificto a single cell type, 1,920,642 were active in 2 or more cell types, and a

ENCODEEncyclopedia of DNA Elementsnature.com/encode

*These authors contributed equally to this work.

1Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA. 2Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina 27708, USA. 3Division ofHuman Biology, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA. 4Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, Washington 98195,USA. 5Program in Systems Biology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA. 6Institute for Cellular and Molecular Biology, University of Texas, Austin, Texas78712, USA. 7Laboratory of Disease Genomics and Individualized Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100029, China. 8Department of Biology, University of NorthCarolina, Chapel Hill, North Carolina 27599, USA. 9Department of Biology and Bergen Center for Computational Science, University of Bergen, Bergen 5008, Norway. 10Department of Psychiatry andBehavioral Sciences, University of Washington, Seattle, Washington 98195, USA. 11Department of Medicine, Division of Genetics, Brigham & Women’s Hospital and Harvard Medical School, Boston,Massachusetts 02115, USA. 12Department of Medicine, Division of Oncology, University of Washington, Seattle, Washington 98195, USA. {Present address: Institute for Clinical Sciences, Faculty ofMedicine, Imperial College London, and MRC Clinical Sciences Centre, London W12 0NN, UK.

6 S E P T E M B E R 2 0 1 2 | V O L 4 8 9 | N A T U R E | 7 5

Macmillan Publishers Limited. All rights reserved©2012

Page 2: The Accessible Chromatin Landscape of the Human Genome Articulo Del Curso

small minority (3,692) was detected in all cell types. The relativeaccessibility of DHSs along the genome varies by .100-fold and ishighly consistent across cell types (Supplementary Figs 1 and 2). Toestimate the sensitivity and accuracy of the sequencing-derived DHSmaps, one ENCODE production centre (University of Washington)performed 7,478 classical DNase I hypersensitivity experiments bythe Southern hybridization method2. Using Southern blots as thestandard, the average sensitivity, per cell type, of DNase I-seq (at asequencing depth of 30 M uniquely mapping reads) was 81.6%, withspecificity of 99.5–99.9%. Of DHSs classified as false negatives withina particular cell type, an average of 92.4% were detected as a DHS in

another cell type or upon deeper sequencing. As such, we estimatethat the overall sensitivity for DHSs of the combined cell type mapsis .98%.

Approximately 3% (n 5 75,575) of DHSs localize to transcriptionalstart sites (TSSs) defined by GENCODE10 and 5% (n 5 135,735,including the aforementioned) lie within 2.5 kilobases (kb) of a TSS.The remaining 95% of DHSs are positioned more distally, and areroughly evenly divided between intronic and intergenic regions(Fig. 1b). Promoters typically exhibit high accessibility across cell types,with the average promoter DHS detected in 29 cell types (Fig. 1c,second column). By contrast, distal DHSs are largely cell selective(Fig. 1c, third column).

MicroRNAs (miRNAs) comprise a major class of regulatorymolecules and have been extensively studied, resulting in consensusannotation of hundreds of conserved miRNA genes11, approximatelyone-third of which are organized in polycistronic clusters12. However,most predicted promoters driving microRNA expression lackexperimental evidence. Of 329 unique annotated miRNA TSSs(Supplementary Methods), 300 (91%) either coincided with or closelyapproximated (,500 base pairs (bp)) a DHS. Chromatin accessibilityat miRNA promoters was highly promiscuous compared withGENCODE TSSs (Fig. 1c, fourth column), and showed cell lineageorganization, paralleling the known regulatory roles of well-annotatedlineage-specific miRNAs (Supplementary Fig. 3).

The 20–50-bp read lengths from DNase I-seq experiments enabledunique mapping to 86.9% of the genomic sequence, allowing us tointerrogate a large fraction of transposon sequences. A surprisingnumber contain highly regulated DHSs (Fig. 1c, fifth column andSupplementary Figs 4 and 5), compatible with cell-specific transcrip-tion of repetitive elements detected using ENCODE RNA sequencingdata13. DHSs were most strongly enriched at long terminal repeat (LTR)elements, which encode retroviral enhancer structures (Supplemen-tary Table 2). Two such examples are shown in Supplementary Fig. 4,which also illustrates the strong cell-selectivity of chromatin accessibilityseen for each major repeat class. We also documented numerousexamples of transposon DHSs that displayed enhancer activity in tran-sient transfection assays (Supplementary Table 3).

Comparison with an extensive compilation of 1,046 experimentallyvalidated distal, non-promoter cis-regulatory elements (enhancers,insulators, locus control regions, and so on) revealed the overwhelm-ing majority (97.4%) to be encompassed within DNase I hypersensi-tive chromatin (Supplementary Table 4), typically with strong cellselectivity (Supplementary Fig. 2b).

Transcription factor drivers of chromatin accessibilityDNase I hypersensitive sites result from cooperative binding of tran-scriptional factors in place of a canonical nucleosome1,2. To quantifythe relationship between chromatin accessibility and the occupancy ofregulatory factors, we compared sequencing-depth-normalizedDNase I sensitivity in the ENCODE common cell line K562 to normal-ized chromatin immunoprecipitation and high-throughput sequencing(ChIP-seq) signals from all 42 transcription factors mapped byENCODE ChIP-seq14 in this cell type (Fig. 2). Simple summation ofthe ChIP-seq signals markedly parallels quantitative DNase I sensitivityat individual DHSs (Fig. 2a) and across the genome (r 5 0.79, Fig. 2b).For example, the b-globin locus control region contains a majorenhancer element at hypersensitive site 2 (HS2), which appears to beoccupied by dozens of transcription factors (Supplementary Fig. 6a).Such highly overlapping binding patterns have been interpreted tosignify weak interactions with lower-affinity recognition sequencespotentiated by an accessible DNA template15. However, HS2 is a com-pact element with a functional core spanning ,110 bp that contains5–8 sites of transcription factor–DNA interaction in vivo depending onthe cell type16–18. The fact that the cumulative ChIP-seq signal closelyparallels the degree of nuclease sensitivity at HS2 and elsewhere is thusmost readily explained by interactions between DNA-bound factors

HUVEC

CD20+

GM06990

WI-38+TAMHVMF

BJ

HMVEC-dAdHMVEC-LLy

AG04450

AG10803

WI-38

HAc

NHEKHMECSAECHRCE

HREWERI-Rb-1

BE2_CSK-N-MCSK-N-SH

HepG2Caco-2LNCaP

HeLa-S3PANC-1NT2-D1

H1 hESCHMVEC-dLy-Neo

HMVEC-LBI

HPAF

NHDF-Neo

HCMHBMEC

HFF

NHLF

HNPCEpiC

HRPEpiC

SKMCHAEpiC

HSMM

TH1

GM12878Jurkat

CD34+_MobilizedK562

Monocytes_CD14+

ap15.4 p13 p12 q14.1 21 q22.3 23.3 25

MICALCLPARVA

12400000 12600000N

um

ber

of

cell

typ

es

in w

hic

h D

HS

is d

ete

cte

d

120

0

40

80

Prom

oter

All

Dist

al

(>10 k

b)m

iRNA

prom

oters

Repea

t

DHS

Constitutive

Cell-type

specific

Classes of DHSs

c

Chr11 (p15.3)

10 kb

HSMM

HAEpiC

HCM

AG10803

1241000012390000

PARVA

SAEC

H1 hESC

K562

HMEC

H1 hESC

HMEC

SAEC

K562

HSMM

HAEpiC

HCM

AG10803

5 kb

12470000 12475000PARVA

b500,000

400,000

300,000

200,000

100,000

0

135,735

246,425

482,236

287,156

20,429 3,664 266

<2.5 kb2.5–

10 kb

10–

50 kb

50–

250 kb

250–

500 kb

500 kb–

1 Mb>1 Mb

Promoter

(75,575)

Exon

(69,083)

Intron

(1,484,669)

UTR

(85,504)

Distal

(1,175,911)

Dis

tal D

HS

s

Figure 1 | General features of the DHS landscape. a, Density of DNase Icleavage sites for selected cell types, shown for an example ,350-kb region. Tworegions are shown to the right in greater detail. b, Left: distribution of 2,890,742DHSs with respect to GENCODE gene annotations. Promoter DHSs are definedas the first DHS localizing within 1 kb upstream of a GENCODE TSS. Right:distribution of intergenic DHSs relative to Gencode TSSs. c, Distributions of thenumber of cell types, from 1 to 125 (y axis), in which DHSs in each of four classes(x axis) are observed. Width of each shape at a given y value shows the relativefrequency of DHSs present in that number of cell types.

RESEARCH ARTICLE

7 6 | N A T U R E | V O L 4 8 9 | 6 S E P T E M B E R 2 0 1 2

Macmillan Publishers Limited. All rights reserved©2012

Page 3: The Accessible Chromatin Landscape of the Human Genome Articulo Del Curso

and other interacting factors that collectively potentiate the accessiblechromatin state (Supplementary Fig. 6b). Given the relatively limitednumber of factors studied, it may seem surprising that such a closecorrelation should be evident. However, most of the factors selectedfor ENCODE ChIP-seq studies have well-described or even fun-damental roles in transcriptional regulation, and many were identifiedoriginally based on their high affinity for DNA. Alternatively, as ori-ginally proposed in ref. 19, a limited number of factors may be involvedin establishment and maintenance of chromatin remodelling, whereasothers may interact nonspecifically with the remodelled state. We alsofound that the recognition sequences for a small number of factors wereconsistently linked with elevated chromatin accessibility acrossall classes of sites and all cell types (Supplementary Fig. 6c), indicatingthat regulators acting through these sequences are key drivers of theaccessibility landscape.

Overall, 94.4% of a combined 1,108,081 ChIP-seq peaks from allENCODE transcription factors fall within accessible chromatin(Fig. 2c and Supplementary Fig. 7a), with the median factor having98.2% of its binding sites localized therein. Notably, a small numberof factors diverged from this paradigm, including known chromatinrepressors, such as the KRAB-associated factors KAP1 (also calledTRIM28), SETDB1 and ZNF274 (refs 20, 21) (Fig. 2c). We hypothesizedthat a proportion of the occupancy sites of these factors representedbinding within compacted heterochromatin. To test this, we developedtargeted mass spectrometry assays22 for KAP1 and three factors

localizing almost exclusively within accessible chromatin (GATA1,c-Jun, NRF1), and quantified their abundance in biochemicallydefined heterochromatin23 against a total chromatin fraction (Sup-plementary Fig. 7b). This analysis confirmed that factors such asKAP1 show a significant level of heterochromatin occupancy (Sup-plementary Fig. 7c).

An invariant directional promoter chromatin signatureThe annotation of sites of transcription origination continues to be anactive and fundamental endeavour13. In addition to direct evidence ofTSSs provided by RNA transcripts, H3K4me3 modifications areclosely linked with TSSs24. We therefore explored systematically therelationship between chromatin accessibility and H3K4me3 patternsat well-annotated promoters, its relationship to transcription origina-tion, and its variability across ENCODE cell types.

We performed ChIP-seq for H3K4me3 in 56 cell types using the samebiological samples used for DNase I data (Supplementary Table 1,column D). Plotting DNase I cleavage density against ChIP-seq tag den-sity around TSSs reveals highly stereotyped, asymmetrical patterning ofthese chromatin features with a precise relationship to the TSS (Fig.3a, b). This directional pattern is consistent with a rigidly positionednucleosome immediately downstream from the promoter DHS, and islargely invariant across cell types (Fig. 3b and Supplementary Fig. 8).

To map novel promoters (and their directionality) not en-compassed by the GENCODE consensus annotations, we applied apattern-matching approach to scan the genome across all 56 cell types(Supplementary Methods). Using this approach we identified a totalof 113,622 distinct putative promoters. Of these, 68,769 correspond topreviously annotated TSSs, and 44,853 represent novel predictions(versus GENCODE v7). Of the novel sites, 99.5% are supported byevidence from spliced expressed sequence tags (ESTs) and/or cap ana-lysis of gene expression (CAGE) tag clusters (Fig. 3c andSupplementary Fig. 9, P , 0.0001; see Supplementary Methods). Wefound novel sites in every configuration relative to existing annotations(Fig. 3d–f and Supplementary Fig. 10). For example, 29,203 putativepromoters are contained in the bodies of annotated genes, of which17,214 are oriented antisense to the annotated direction of transcrip-tion, and 2,794 lie immediately downstream of an annotated gene’s39 end, with 1,638 in antisense orientation. The results indicate thatchromatin data can systematically inform RNA transcription analyses,and suggest the existence of a large pool of cell-selective transcriptionalpromoters, many of which lie in antisense orientations.

Chromatin accessibility and DNA methylation patternsCpG methylation has been closely linked with gene regulation, basedchiefly on its association with transcriptional silencing25. However,the relationship between DNA methylation and chromatin structurehas not been clearly defined. We analysed ENCODE reduced-representation bisulphite sequencing (RRBS) data, which providequantitative methylation measurements for several million CpGs(K. E. Varley et al., manuscript submitted; see Gene ExpressionOmnibus accession GSE27584). We focused on 243,037 CpGs fallingwithin DHSs in 19 cell types for which both data types were availablefrom the same sample. We observed two broad classes of sites: thosewith a strong inverse correlation across cell types between DNAmethylation and chromatin accessibility (Fig. 4a and Supplemen-tary Fig. 11a), and those with variable chromatin accessibility butconstitutive hypomethylation (Fig. 4a, right). To quantify these trendsglobally, we performed a linear regression analysis between chromatinaccessibility and DNA methylation at the 34,376 CpG-containingDHSs (see Supplementary Methods). Of these sites, 6,987 (20%)showed a significant association (1% FDR) between methylationand accessibility (Supplementary Fig. 11b). Increased methyla-tion was almost uniformly negatively associated with chromatinaccessibility (.97% of cases). The magnitude of the associationbetween methylation and accessibility was strong, with the latter on

Chr19 (q13.12)19p13.3 19p13.2 13.1119p12 11 11 19q12 q13.2 13.33

a

b

Combined

DNase I (K562)

EN

CO

DE

tra

nscrip

tio

n f

acto

r C

hIP

-seq

(K

56

2)

36150000 36250000

HAUS5RBM42

ETV2

COX6B1

UPK1ABC007817

ZBTB32MLL4

TMEM149

U2AF1L4

U2AF1L4PSENENAL137752

LIN37

HSPB6C19orf55

SNX26

AK055260

50 kb

ATF3

BHLHE40

BRG1

CCNT2

CEBPB

C-Fos

C-Jun

C-Myc

CTCF

E2F4

E2F6

EGR1

ELF1

ETS1

FOSL1

GABP

GATA2

GATA1

HMGN3

JUND

MAFF

MAX

NFE2

NFYA

NFYB

NRF1

NRSF

RFX5

SIX5

SP1

SP2

SRF

TAL1

TFIIIC

THAP1

YY1

ZBTB7A

ZNF143

ZNF263

MAFK

P300

USF1

ChIP-seq density

2 4 6 8 10 12

2

3

4

5

6

7

8

log[DNase I tag density]

log

[to

tal C

hIP

tag

density]

r = 0.7943

ZFN274

All transcription

factors

c-Jun GATA1

KAP1 SETDB1

Inaccessible chromatinAccessible chromatin

Fraction of transcription factor peaks in

Factors predominantly bound in

accessible chromatin

Factors predominantly bound in

inaccessible chromatin

21.5%

94.4%

99.2% 99.5%

NRF1

99.6%

40.6% 51.6%

c

Figure 2 | Transcription factor drivers of chromatin accessibility. a, DNase Itag density is shown in red for a 175-kb region of chromosome 19. Below:normalized ChIP-seq tag density for 45 ENCODE ChIP-seq experiments fromK562 cells, with a cumulative sum of the individual tag density tracks shownimmediately below the K562 DNase I data. b, Genome-wide correlation(r 5 0.7943) between ChIP-seq and DNase I tag densities (log10) in K562 cells.c, Left: 94.4% of a combined 1,108,081 ChIP-seq peaks from all transcriptionfactors assayed in K562 cells fall within accessible chromatin (grey areas of piechart). Top: three examples of transcription factors localizing almostexclusively within accessible chromatin. Bottom: three transcription factorsfrom the KRAB-associated complex localizing partially or predominantlywithin inaccessible chromatin.

ARTICLE RESEARCH

6 S E P T E M B E R 2 0 1 2 | V O L 4 8 9 | N A T U R E | 7 7

Macmillan Publishers Limited. All rights reserved©2012

Page 4: The Accessible Chromatin Landscape of the Human Genome Articulo Del Curso

average 95% lower in cell types with coinciding methylation versuscell types lacking coinciding methylation (Supplementary Fig. 11c).Fully 40% of variable methylation was associated with a concomitanteffect on accessibility.

The role of DNA methylation in causation of gene silencing ispresently unclear. Does methylation reduce chromatin accessibilityby evicting transcription factors? Or does DNA methylation passively‘fill in’ the voids left by vacating transcription factors? Transcriptionfactor expression is closely linked with the occupancy of its bindingsites26. If the former of the two above hypotheses is correct, methyla-tion of individual binding site sequences should be independent oftranscription factor gene expression. If the latter, methylation at tran-scription factor recognition sequences should be negatively correlatedwith transcription factor abundance (Fig. 4b).

Comparing transcription factor transcript levels to averagemethylation at cognate recognition sites within DHSs revealed sig-nificant negative correlations between transcription factor expressionand binding site methylation for most (70%) transcription factorswith a significant association (P , 0.05). Representative examplesare shown in Fig. 4c and Supplementary Fig. 12a. These data arguestrongly that methylation patterning paralleling cell-selective chro-matin accessibility results from passive deposition after the vacationof transcription factors from regulatory DNA, confirming andextending other recent reports27.

Interestingly, a small number of factors showed positive correla-tions between expression and binding site methylation (Supplemen-tary Fig. 12b), including MYB and LUN-1 (also known as TOPORS).Both of these transcription factors showed increased transcription

f

–400 –200 0 200 400

Distance from annotated TSS (bp)

56 cell types

Promoter Transcript Avg

. H3K

4m

e3 s

ignalA

vg

. D

Nase I

sig

nal

a b c

EST only (14)

No CAGE or EST (257)

CAGE + EST (16,238)

CAGE only (28,344)

GENCODE coding

TSSs (65,266)

GENCODE non-coding

TSSs (3,496)

Novel TSSs

(44,853)

DNase I

H3K4me3

Annotated TSS

Chr7: 152455000

2 kb

ESTs BI457988

CAGE tag

clusters

GENCODE ACTR3B

152460000

ed

Annotated TSS

Novel TSS with CAGE and EST supportChr19:

1 kb

56145000 56147000

DNase I

H3K4me3

GD153972BX443966BG163590

ESTs

CAGE tag

clusters

Novel TSS

Novel TSS in intron (antisense)

1 kb

Chr1: 5977000 5979000

DNase I

H3K4me3

BG387737ESTs

NPHP4GENCODE

CAGE tag

clusters

Novel TSS

Experimental support for

novel TSSs

ESTs

Novel TSS with CAGE supportChr1:

1 kb

25740002576000

DNase I

H3K4me3

CAGE tag

clusters

Novel TSS

GENCODEGENCODE TTC34

Figure 3 | Identification and directional classification of novel promoters.a, DNase I (blue) and H3K4me3 (red) tag densities for K562 cells aroundannotated TSS of ACTR3B. b, Averaged H3K4me3 tag density (red, right y axis)and log DNase I tag density (blue, left y axis) across 10,000 randomly selectedGENCODE TSSs, oriented 59R39. Each blue and red curve is for a different celltype, showing invariance of the pattern. c, Relation of 113,615 promoter

predictions to GENCODE annotations, with supporting EST and CAGEevidence (bar at right). d–f, Examples of novel promoters identified in K562;red arrow marks predicted TSS and direction of transcription, with CAGE tagclusters, spliced ESTs and GENCODE annotations above. d, Novel TSSconfirmed by CAGE and ESTs. e, Novel TSS confirmed by CAGE, no ESTs.Note intronic location. f, Antisense prediction within annotated gene.

Accessib

ility

(D

Nase I d

ensity)

a

12855000 12857000

TRIB2

Chr2 (p24.3)

1 kb

HAEpiC

HCPEpiC

HepG2

HSMM

K562

NB4

NHDF-Neo

SAEC

SKMC

SK-N-SH_RA

100% meth.

Site-averagedmethylation

0%

bTF TF

TF

TFTF

M M M

MMM

TF TF TF

Cell type A Cell type B

Low expression, high binding-site

methylation

High expression, low binding-site

methylationCpG

5-Me-CpG CpG

c

0

20

40

60

80

100NF-κB

7.5 8.0 8.5 9.0

GM06990

HepG2

K562

NB4SK-N-SH_RA

Transcription factor

expression

Transcription factor

expression

Lymphoid regulators

Avera

ge m

eth

yla

tio

n

at

mo

tif

insta

nces (%

)

0

20

40

60

80

TAL1/TCF3

7.2 7.6 8.0

GM06990HepG2

K562

NB4

SK-N-SH_RA

100Erythroid regulators

Figure 4 | Chromatin accessibility and DNA methylation patterns.a, DNase I sensitivity in 10 cell types with ENCODE reduced representationbisulphite sequencing data. Inset box: accessibility (y axis) decreasesquantitatively as methylation increases. Other DHSs (right) show lowcorrelation between accessibility and methylation. CpG methylation scale:green, 0%; yellow, 50%; red, 100%. b, Model of transcription factor (TF)-drivenmethylation patterns in which methylation passively mirrors transcription

factor occupancy. c, Relationship between transcription factor transcript levelsand overall methylation at cognate recognition sequences of the sametranscription factors. Lymphoid regulators in B-lymphoblastoid line GM06990(left) and erythroid regulators in the erythroleukaemia line K562 (right).Negative correlation indicates that site-specific DNA methylation followstranscription factor vacation of differentially expressed transcription factors.

RESEARCH ARTICLE

7 8 | N A T U R E | V O L 4 8 9 | 6 S E P T E M B E R 2 0 1 2

Macmillan Publishers Limited. All rights reserved©2012

Page 5: The Accessible Chromatin Landscape of the Human Genome Articulo Del Curso

and binding site methylation specifically within acute promyelocyticleukaemia cells (NB4), and both interact with promyelocytic leukaemia(PML) bodies28,29, a sub-nuclear structure disrupted in PML cells. Theanomalous behaviour of these two transcription factors with respect tochromatin structure and DNA methylation may thus be related to aspecialized mechanism seen only in pathologically altered cells.

A map of distal DHS-to-promoter connectionsFrom examination of DNase I profiles across many cell types weobserved that many known cell-selective enhancers become DHSssynchronously with the appearance of hypersensitivity at the pro-moter of their target gene (Supplementary Fig. 13). To generalize this,we analysed the patterning of 1,454,901 distal DHSs (DHSs separatedfrom a TSS by at least one other DHS) across 79 diverse cell types(Supplementary Methods and Supplementary Table 6), and corre-lated the cross-cell-type DNase I signal at each DHS position withthat at all promoters within 6500 kb (Supplementary Fig. 14a). Weidentified a total of 578,905 DHSs that were highly correlated (r . 0.7)with at least one promoter (P , 102100), providing an extensive mapof candidate enhancers controlling specific genes (SupplementaryMethods and Supplementary Table 7). To validate the distal DHS/enhancer–promoter connections, we profiled chromatin interactionsusing the chromosome conformation capture carbon copy (5C) tech-nique30. For example, the phenylalanine hydroxylase (PAH) gene isexpressed in hepatic cells, and an enhancer has been defined upstreamof its TSS (Fig. 5a). The correlation values for three DHSs within thegene body closely parallel the frequency of long-range chromatininteractions measured by 5C. The three interacting intronic DHSscloned downstream of a reporter gene driven by the PAH promoterall showed increased expression ranging from three- to tenfold over apromoter-only control, confirming enhancer function.

We next examined comprehensive promoter-versus-all 5C experi-ments performed over 1% of the human genome31 in K562 cells.DHS–promoter pairings were markedly enriched in the specific cog-nate chromatin interaction (P , 10213, Supplementary Fig. 14b). Wealso examined K562 promoter–DHS interactions detected bypolymerase II chromatin interaction analysis with paired-end tagsequencing (ChIA-PET)24, which quantifies interactions between pro-moter-bound polymerase and distal sites. The ChIA-PET interactionswere also markedly enriched for DHS–promoter pairings (P , 10215,Supplementary Fig. 14c). Together, the large-scale interaction analysesaffirm the fidelity of DHS–promoter pairings based on correlatedDNase I sensitivity signals at distal and promoter DHSs.

Most promoters were assigned to more than one distal DHS,indicating the existence of combinatorial distal regulatory inputs formost genes (Fig. 5b and Supplementary Table 7). A similar result isforthcoming from large-scale 5C interaction data31. Surprisingly,roughly half of the promoter-paired distal DHSs were assigned tomore than one promoter (Fig. 5b and Supplementary Methods), indi-cating that human cis-regulatory circuitry is significantly more com-plicated than previously anticipated, and may serve to reinforce therobustness of cellular transcriptional programs.

The number of distal DHSs connected with a particular promoterprovides, for the first time, a quantitative measure of the overallregulatory complexity of that gene. We asked whether there are anysystematic functional features of genes with highly complex regulation.We ranked all human genes by the number of distal DHSs paired withthe promoter of each gene, then performed a Gene Ontology analysison the rank-ordered list. We found that the most complexly regulatedhuman genes were markedly enriched in immune system functions(Supplementary Fig. 14d), indicating that the complexity of cellularand environmental signals processed by the immune system is directlyencoded in the cis-regulatory architecture of its constituent genes.

Next, we asked whether DHS–promoter pairings reflectedsystematic relationships between specific combinations of regulatoryfactors (Supplementary Methods). For example, KLF4, SOX2, OCT4

(also called POU5F1) and NANOG are known to form a well-characterized transcriptional network controlling the pluripotentstate of embryonic stem cells32. We found significant enrichment(P , 0.05) of the KLF4, SOX2 and OCT4 motifs within distal DHSscorrelated with promoter DHSs containing the NANOG motif;enrichment of NANOG, SOX2 and OCT4 distal motifs co-occurringwith promoter motif OCT4; and enrichment of distal SOX2 andOCT4 motifs with promoter SOX2 motifs (Supplementary Fig.15a). By contrast, promoters containing KLF4 motifs were associatedwith KLF4-containing distal DHSs, but not with DHSs containingNANOG, SOX2 or OCT4 motifs (Supplementary Fig. 15a, bottom).

We also tested for significant co-associations between promotertypes (defined by the presence of cognate motif classes; seeSupplementary Methods) and motifs in paired distal DHSs (Fig. 5cand Supplementary Fig. 15b, c). For example, when a member of theETS domain family (motifs ETS1, ETS2, ELF1, ELK1, NERF (alsocalled ELF2), SPIB, and others) is present within a promoter DHS,motif PU.1 (also called SPI1) is significantly more likely to beobserved in a correlated distal DHS (P , 1025). These results suggestthat a limited set of general rules may govern the pairing of co-regulated distal DHSs with particular promoters.

0.9

0.7

0.5

0.3

0.1

Chr12:

HepG2 5C signal

Correlation

a

2,500

7,500

12,500

17,500

22,500

Co

rrela

tio

n

Tag

s

103275000 103295000 103315000

b

c

PAH

HepG2

EnhancersKnownNovel

HindIII fragments

Enrichment of co-occurring motifs in promoter/distal DHS pairs

Mo

tif

fam

ilies

in p

rom

ote

r D

HS

s

TATA/TBPETS familyE2A family

SP1-likeNRF

CREB/ATFC

TC

FK

LF

4

ELK

1

US

F

YY

1

MA

X

PU

.1

AP

-4

HN

F4

C-R

EL

ELF

1

ER

GATA

1

MIF

1

PP

AR

G

NF

1

NA

NO

G

SO

X2

OC

T4

HO

XB

5

HN

F3

SR

F

CE

BP

B

IRF

1IR

F3

NR

F2

Motifs in DHSs within ± 500 kb with correlation ≥0.8

P-value<10

–5

<0.005

<0.01

<0.05

<0.1≤1

<0.5

0 (9%)

1–10 (36%)>3 (20%)11–20 (18%)

>20 (37%)

1 (48%)

3 (11%)2 (21%)

Distal DHSs connected

per promoter DHS

(n = 69,965)

Promoter DHSs connected

per distal DHS

(n = 578,905 of 1,454,901 total)

Figure 5 | A genome-wide map of distal DHS-to-promoter connectivity.a, Cross-cell-type correlation (red arcs, left y axis) of distal DHSs and PAHpromoter closely parallels chromatin interactions measured by 5C-seq (bluearcs, right y axis); black bars indicate HindIII fragments used in 5C assays.Known (green) and novel (magenta) enhancers confirmed in transfectionassays are shown below. Enhancer at far right is not separable by 5C as it lieswithin the HindIII fragment containing the promoter. b, Left: proportions of69,965 promoters correlated (r . 0.7) with 0 to .20 DHSs within 500 kb. Right:proportions of 578,905 non-promoter DHSs (out of 1,454,901) correlated with1 to .3 promoters within 500 kb. c, Pairing of canonical promoter motiffamilies with specific motifs in distal DHSs.

ARTICLE RESEARCH

6 S E P T E M B E R 2 0 1 2 | V O L 4 8 9 | N A T U R E | 7 9

Macmillan Publishers Limited. All rights reserved©2012

Page 6: The Accessible Chromatin Landscape of the Human Genome Articulo Del Curso

Stereotyped chromatin accessibility parallels functionIn addition to the synchronized activation of distal DHSs and pro-moters described above, we observed a surprising degree of patternedco-activation among distal DHSs, with nearly identical cross-cell-typepatterns of chromatin accessibility at groups of DHSs widely separatedin trans (Supplementary Figs 16 and 17). For many patterns, weobserved tens or even hundreds of like elements around the genome.The simplest explanation is that such co-activated sites sharerecognition motifs for the same set of regulatory factors. We found,however, that the underlying sequence features for a given pattern weresurprisingly plastic. This suggests that the same pattern of cell-selectivechromatin accessibility shared between two DHSs can be achievedby distinct mechanisms, probably involving complex combinatorialtuning.

We next asked whether distal DHSs with specific functions suchas enhancers exhibited stereotypical patterning, and whether suchpatterning could highlight other elements with the same function.We examined one of the best-characterized human enhancers,DNase I HS2 of the b-globin locus control region16–18. HS2 is detectedin many cell types, but exhibits potent enhancer activity only inerythroid cells33. Using a pattern-matching algorithm (see Supplemen-tary Methods) we identified additional DHSs with nearly identicalcross-cell-type accessibility patterns (Fig. 6a). We selected 20 elementsacross the spectrum of the top 200 matches to the HS2 pattern, andtested these in transient transfection assays in K562 cells (Supplemen-tary Methods). Seventy per cent (14 of 20) of these displayed enhanceractivity (mean 8.4-fold over control) (Fig. 6a, f). Of note, one (E3)showed a greater magnitude of enhancement (18-fold versus control)than HS2, which is itself one of the most potent known enhancers4.Next we selected three elements from the 14 HS2-like enhancers,applied pattern matching (Methods) to each to identify stereotypedelements, and tested samples of each pattern for enhancer activity,revealing additional K562 enhancers (total 15 of 25 positive)(Fig. 6b–d, f). In each case, therefore, we were able to discoverenhancers by simply anchoring on the cross-cell-type DHS patternof an element with enhancer activity. Collectively, these results showthat co-activation of DHSs reflected in cross-cell-type patterning ofchromatin accessibility is predictive of functional activity within aspecific cell type, and suggest more generally that DHSs with stereo-typed cellular patterning are likely to fulfil similar functions.

To visualize the qualities and prevalence of different stereotypedcross-cellular DHS patterns, we constructed a self-organizing map ofa random 10% subsample of DHSs across all cell types and identified atotal of 1,225 distinct stereotyped DHS patterns (Supplementary Figs18 and 19). Many of the stereotyped patterns discovered by the self-organizing map encompass large numbers of DHSs, with some count-ing .1,000 elements (Supplementary Fig. 20).

Taken together, the above results show that chromatin accessibilityat regulatory DNA is highly choreographed across large sets of co-activated elements distributed throughout the genome, and thatDHSs with similar cross-cell-type activation profiles probably sharesimilar functions.

Variation in regulatory DNA linked to mutation rateThe DHS compartment as a whole is under evolutionary constraint,which varies between different classes and locations of elements14, andmay be heterogeneous within individual elements34. To understand theevolutionary forces shaping regulatory DNA sequences in humans, weestimated nucleotide diversity (p) in DHSs using publicly availablewhole-genome sequencing data from 53 unrelated individuals35 (seeSupplementary Methods). We restricted our analysis to nucleotidesoutside of exons and RepeatMasked regions. To provide a comparisonwith putatively neutral sites, we computed p in fourfold degeneratesynonymous positions (third positions) of coding exons. This analysisshowed that, taken together, DHSs exhibit lower p than fourfolddegenerate sites, compatible with the action of purifying selection.

Figure 7a shows p for the DHSs of all analysed cell types, with colourcoding to indicate the origin of each cell type. Particularly striking is thedistribution of diversity relative to proliferative potential. DHSs in cellswith limited proliferative potential have uniformly lower averagediversity than immortal cells, with the difference most pronouncedin malignant and pluripotent lines. This ordering is identical whenhighly mutable CpG nucleotides are removed from the analysis.

If differences in p are due to mutation rate differences in differentDHS compartments, the ratio of human polymorphism to human–chimpanzee divergence should remain constant across cell types. Bycontrast, differences in p due to selective constraint should result inpronounced differences. To distinguish between these alternatives, wefirst compared polymorphism and human–chimpanzee divergencefor DHSs from normal, malignant and pluripotent cells (Fig. 7b).Differences in polymorphism and divergence between these threegroups are nearly identical, compatible with a mutational cause.Second, raw mutation rate is expected to affect rare and commongenetic variation equally, whereas selection is likely to have a largerimpact on common variation. We consistently observe ,62% ofsingle nucleotide polymorphisms (SNPs) in DHSs of each group tohave derived-allele frequencies below 0.05. DHSs in different cell

f15

20

10

5

0

Fo

ld e

nrichm

ent

HS2 E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 E11 E12 E13HS3

HPFHConF

AG04450HGF

AG10803AG09309

BJAG04449

NHDF-AdHCMHCF

HPAF

AG09319HPdLF

NHDF-NeoHMF

HNPCEpiCHCFaaHVMF

HAEpiC

AoAFSKMC

HCPEpiCNHLF

TH2TH1

GM12865

NB4HL-60CMKK562

HMVEC-dLy-NeoHMVEC-LLy

HMVEC-dLy-AdHMVEC-dNeo

GM12864GM12878

HMVEC-dBl-NeoHMVEC-dBl-Ad

HEEpiCNHEKHMEC

H1 hESC

HMVEC-LBlHRGECHUVEC

SAEC

Chr1

2:6

,277,1

45

Chr1

:200,6

09,2

85

Chr1

:181,1

22,2

65

Chr1

1:5

,301,9

65

Chr1

8:3

,653,0

05

Chr1

0:7

3,6

20,3

65

Chr1

9:5

5,7

27,9

65

Chr1

0:1

7,2

58,3

85

Chr1

8:1

1,9

47,4

05

Chr2

:112,4

56,6

25

Chr1

1:3

4,8

31,3

65

Chr1

1:1

3,1

27,0

45

Chr1

5:1

01,7

08,5

05

Chr1

2:1

04,8

02,5

85

a b c d e

HS2 E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 E11 E12 E13

Figure 6 | Stereotyped regulation of chromatin accessibility. a–e, Enhancersgrouped by similar chromatin stereotypes. Related cell lines are colourmatched. HS2 from the b-globin locus control region is at left. E1–E11represent progressively weaker matches to the HS2 stereotype. E12–13 derivefrom matches to a different stereotype based on another K562 enhancer.f, Experimental validation of enhancers detected by pattern matching. Barsindicate fold enrichment observed in transient assays in K562 relative topromoter-only control; mean of testing in both orientations is shown. Red barsindicate data from two potent in vivo enhancers, b-globin LCR HS2 and HS3;the latter requires chromatinization to function and is not active in transientassays. Gold bars indicate data from E1–E13 from a–e above.

RESEARCH ARTICLE

8 0 | N A T U R E | V O L 4 8 9 | 6 S E P T E M B E R 2 0 1 2

Macmillan Publishers Limited. All rights reserved©2012

Page 7: The Accessible Chromatin Landscape of the Human Genome Articulo Del Curso

lines exhibit differences in SNP densities but not in allele frequencydistribution (Fig. 7c). Collectively, these observations are consistentwith increased relative mutation rates in the DHS compartment ofimmortal cells versus cell types with limited proliferative potential,exposing an unexpected link between chromatin accessibility, prolif-erative potential and patterns of human variation.

DiscussionSince their discovery over 30 years ago, DNase I hypersensitive siteshave guided the discovery of diverse cis-regulatory elements in thehuman and other genomes. Here we have presented by far the mostcomprehensive map of human regulatory DNA, revealing novelrelationships between chromatin accessibility, transcription, DNAmethylation and the occupancy of sequence-specific factors. The widespectrum of different cell and tissue types covered by our data greatlyexpands the horizons of cell-selective gene regulation analysis,enabling the recognition of systematic long-distance regulatorypatterns, and previously undescribed phenomena such as stereotypingof DHS activation and mutation rate variation in normal versusimmortal cells. The extensive resources we have provided shouldgreatly facilitate future analyses, and stimulate new areas of investiga-tion into the organization and control of the human genome. Co-published ENCODE-related papers can be explored online via theNature ENCODE explorer (http://www.nature.com/ENCODE), a spe-cially designed visualization tool that allows users to access the linkedpapers and investigate topics that are discussed in multiple papers viathematically organized threads.

METHODS SUMMARYDNase I hypersensitivity mapping was performed using protocols developed byDuke University7 or University of Washington8 on a total of 125 cell types(Supplementary Table 1). Data sets were sequenced to an average depth of30 million uniquely mapping sequence tags (27–36 bp for University ofWashington and 20 bp for Duke University) per replicate. For uniformity of

analysis, some cell-type data sets that exceeded 40M tag depth were randomlysubsampled to a depth of 30 million tags. Sequence reads were mapped using theBowtie aligner, allowing a maximum of two mismatches. Only reads mappinguniquely to the genome were used in our analyses. Mappings were to male orfemale versions of hg19/GRCh37, depending on cell type, with random regionsomitted. Data were analysed jointly using a single algorithm7 (SupplementaryMethods) to localize DNase I hypersensitive sites. H3K4me3 ChIP-seq was per-formed using antibody 9751 (Cell Signaling) on 1% formaldehyde crosslinkedsamples sheared by Diagenode Bioruptor. Gene expression measurements foreach cell type were performed on Affymetrix human exon microarrays. 5Cexperiments were performed as described30,31. Transcription factor recognitionmotif occurrences within DHSs were defined with FIMO36 at significanceP , 1025 using motif models from the TRANSFAC database.

Received 15 December 2011; accepted 15 May 2012.

1. Felsenfeld, G., Boyes, J., Chung, J., Clark, D. & Studitsky, V. Chromatin structure andgene expression. Proc. Natl Acad. Sci. USA 93, 9384–9388 (1996).

2. Gross, D. S. & Garrard, W. T. Nuclease hypersensitive sites in chromatin. Annu. Rev.Biochem. 57, 159–197 (1988).

3. Gaszner, M. & Felsenfeld, G. Insulators: exploiting transcriptional and epigeneticmechanisms. Nature Rev. Genet. 7, 703–713 (2006).

4. Li, Q., Harju, S. & Peterson, K. R. Locus control regions: coming of age at a decadeplus. Trends Genet. 15, 403–408 (1999).

5. Heintzman, N. D. et al. Distinct and predictive chromatin signatures oftranscriptionalpromoters andenhancers in thehumangenome. NatureGenet.39,311–318 (2007).

6. Hesselberth, J.R.et al.Globalmappingofprotein-DNA interactions in vivobydigitalgenomic footprinting. Nature Methods 6, 283–289 (2009).

7. Boyle, A. P. et al. High-resolution mapping and characterization of open chromatinacross the genome. Cell 132, 311–322 (2008).

8. John, S. et al. Chromatin accessibility pre-determines glucocorticoid receptorbinding patterns. Nature Genet. 43, 264–268 (2011).

9. Song, L. et al. Open chromatin defined by DNase I and FAIRE identifies regulatoryelements that shape cell-type identity. Genome Res. 21, 1757–1767 (2010).

10. Harrow, J. et al. GENCODE: The reference human genome annotation for theENCODE project. Genome Res. (in the press).

11. Griffiths-Jones, S., Saini, H. K., van Dongen, S. & Enright, A. J. miRBase: tools formicroRNA genomics. Nucleic Acids Res. 36, D154–D158 (2008).

12. Farazi, T. A., Spitzer, J. I., Morozov, P. & Tuschl, T. miRNAs in human cancer.J. Pathol. 223, 102–115 (2011).

π (×1

0–4 p

er

site)

π (×1

0–4 p

er

site)

π 3.4 3.6 3.8 4.0 4.2 4.4 4.6 4.8

2.2

2.4

2.6

2.8

SN

Ps k

b–1 (D

AF

> 5

%)

SNPs kb–1 (DAF < 5%)

r2 = 0.84

Pluripotent ES/iPS cell

Malignant

Multipotent CD34+

Immortalized (EBV, MYC)

Normal primary cells

7.5

8.0

8.5

9.0

9.5

10.0

10.5

Glio

bla

WI-

38+

TAM

HA

-sp

NH

A

NH

LF

HA

c

HA

-h

WI-

38

HM

F

SK

-N-M

C

HC

F

HB

ME

C

HC

PE

piC

HIP

Ep

iC

HN

PC

Ep

iC

HA

Ep

iC

HV

MF

HR

PE

piC

HR

E

HR

CE

RP

TE

CH

ME

C

NH

EK

PrE

C

HE

Ep

iC

SA

EC

HeLa-S

3

HC

T-1

16

A549

PA

NC

-1

SK

MC

LN

CaP

MC

F-7

H1 h

ES

C(1

)

H1 h

ES

C(2

)

H7-h

ES

C

NT

2-D

1

BE

2_C

WE

RI-

Rb

-1

HM

VE

C-d

Ad

HM

VE

C-d

Neo

HM

VE

C-d

Ly-A

dH

MV

EC

-LLy

HR

GE

C

HU

VE

C

HPA

EC

HM

VE

C-L

Bl

HM

VE

C-d

Ly-N

eo

HM

VE

C-d

Bl-

Ad

HM

VE

C-d

Bl-

Neo

HL-6

0

NB

4

CM

K

K562

Jurk

at

GM

12878

GM

06990

GM

12864

GM

12865

TH

1

Th2

CD

20+

CD

34+

_M

ob

ilized

Ishik

aw

a_E

Ishik

aw

a_T

Hela

-S3Ifna4h

MC

F-7

Hyp

oxla

c

T-4

7D

Hep

G2

Caco

-2

SK

-N-S

H_R

A

CLL

H9E

S

Ips

Med

ullo

LN

CaP

And

ro

8988T

HS

MM

tub

e

HS

MM

NH

DF

-Ad

NH

DF

-Neo

HG

F

AG

09319

HP

dLF

Ao

AF

HPA

F

HC

Faa

HC

MH

Co

nF

HP

F

AG

04449

BJ

AG

09309

AG

10803

HF

F-M

yc

HF

F

AG

04450

Fourfold degenerate sites (95% CI)

Pluripotent

ES/iPS

div. π div. π div.Malignant Normal

7.5

8.0

8.5

9.0

9.5

10.0

10.5

0.010

0.012

0.014

0.016 Hu

man

–ch

imp

an

zee

div

erg

en

ce p

er s

ite

a

b c

Figure 7 | Genetic variation in regulatory DNA linked to mutation rate.a, Mean nucleotide diversity (p, y axis) in DHSs of 97 diverse cell types (x axis)estimated using whole-genome sequencing data from 53 unrelated individuals.Cell types are ordered left-to-right by increasing mean p. Horizontal blue barshows 95% confidence intervals on mean p in a background model of fourfolddegenerate coding sites. Note the enrichment of immortal cells at right. ES,embryonic stem; iPS, induced pluripotent stem. b, Mean p (left y axis) for

pluripotent (yellow) versus malignancy-derived (red) versus normal cells (lightgreen), plotted side-by-side with human–chimpanzee divergence (right y axis)computed on the same groups. Boxes indicate 25–75 percentiles, with medianshighlighted. c, Both low- and high-frequency derived alleles show the sameeffect. Density of SNPs in DHSs with derived allele frequency (DAF) ,5% (xaxis) is tightly correlated (r2 5 0.84) with the same measure computed forhigher-frequency derived alleles (y axis). Colour-coding is the same as in panel a.

ARTICLE RESEARCH

6 S E P T E M B E R 2 0 1 2 | V O L 4 8 9 | N A T U R E | 8 1

Macmillan Publishers Limited. All rights reserved©2012

Page 8: The Accessible Chromatin Landscape of the Human Genome Articulo Del Curso

13. Djebali, S. et al. Landscape of transcription in human cells. Nature http://dx.doi.org/10.1038/nature11233 (this issue).

14. ENCODE Project Consortium.. An integrated encyclopedia of DNA elements in thehuman genome. Nature http://dx.doi.org/10.1038/nature11247 (this issue).

15. Biggin, M. D. Animal transcription networks as highly connected, quantitativecontinua. Dev. Cell 21, 611–626 (2011).

16. Reddy, P. M., Stamatoyannopoulos, G., Papayannopoulou, T. & Shen, C. K.Genomic footprinting and sequencing of human b-globin locus. Tissue specificityand cell line artifact. J. Biol. Chem. 269, 8287–8295 (1994).

17. Forsberg, E. C., Downs, K. M. & Bresnick, E. H. Direct interaction of NF-E2 withhypersensitive site 2 of the b-globin locus control region in living cells. Blood 96,334–339 (2000).

18. Talbot, D. & Grosveld, F. The 59HS2 of the globin locus control region enhancestranscription through the interaction of a multimeric complex binding at twofunctionally distinct NF-E2 binding sites. EMBO J. 10, 1391–1398 (1991).

19. Weisbrod,S.&Weintraub,H. Isolationofa subclassofnuclearproteins responsiblefor conferring a DNase I-sensitive structure on globin chromatin. Proc. Natl Acad.Sci. USA 76, 630–634 (1979).

20. Schultz, D. C., Ayyanathan, K., Negorev, D., Maul, G. G. & Rauscher, F. J. SETDB1: anovel KAP-1-associated histone H3, lysine 9-specific methyltransferase thatcontributes to HP1-mediated silencing of euchromatic genes by KRAB zinc-fingerproteins. Genes Dev. 16, 919–932 (2002).

21. Frietze, S., O’Geen, H., Blahnik, K. R., Jin, V. X. & Farnham, P. J. ZNF274 recruits thehistone methyltransferase SETDB1 to the 39 ends of ZNF genes. PLoS ONE 5,e15082 (2010).

22. Stergachis, A. B., Maclean, B., Lee, K., Stamatoyannopoulos, J. A. & MacCoss, M. J.Rapid empirical discovery of optimal peptides for targeted proteomics. NatureMethods 8, 1041–1043 (2011).

23. Henikoff, S., Henikoff, J. G., Sakai, A., Loeb, G.B.& Ahmad, K.Genome-wide profilingof salt fractions maps physical properties of chromatin. GenomeRes. 19, 460–469(2009).

24. Li, G. et al. Extensive promoter-centered chromatin interactions provide atopological basis for transcription regulation. Cell 148, 84–98 (2012).

25. Siegfried, Z. et al. DNA methylation represses transcription in vivo. Nature Genet.22, 203–206 (1999).

26. O’Geen,H.et al. Genome-wideanalysisofKAP1binding suggestsautoregulationofKRAB-ZNFs. PLoS Genet. 3, e89 (2007).

27. Stadler, M. B. et al. DNA-binding factors shape the mouse methylome at distalregulatory regions. Nature 480, 490–495 (2011).

28. Rasheed, Z. A., Saleem, A., Ravee, Y., Pandolfi, P. P. & Rubin, E. H. Thetopoisomerase I-binding RING protein, topors, is associated with promyelocyticleukemia nuclear bodies. Exp. Cell Res. 277, 152–160 (2002).

29. Dahle, Ø., Bakke, O. & Gabrielsen, O. S. c-Myb associates with PML in nuclearbodies in hematopoietic cells. Exp. Cell Res. 297, 118–126 (2004).

30. Dostie, J.et al.ChromosomeConformationCaptureCarbonCopy (5C): amassivelyparallel solution for mapping interactions between genomic elements. GenomeRes. 16, 1299–1309 (2006).

31. Sanyal, A., Lajoie, B., Jain, G. & Dekker, J. The long-range interaction landscape ofgene promoters. Nature http://dx.doi.org/10.1038/nature11279 (this issue).

32. Kim, J., Chu, J., Shen, X., Wang, J.& Orkin, S. H. An extended transcriptional networkfor pluripotency of embryonic stem cells. Cell 132, 1049–1061 (2008).

33. Tuan,D., Kong, S. & Hu, K. Transcription of the hypersensitive site HS2 enhancer inerythroid cells. Proc. Natl Acad. Sci. USA 89, 11219–11223 (1992).

34. Neph, S. et al. An expansive human regulatory lexicon encoded in transcriptionfactor footprints. Nature http://dx.doi.org/10.1038/nature11212 (this issue).

35. Vernot, B. et al. Personal and population genomics of human regulatory variation.Genome Res. (in the press).

36. Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: scanning for occurrences of a givenmotif. Bioinformatics 27, 1017–1018 (2011).

Supplementary Information is available in the online version of the paper.

Acknowledgements We thank our ENCODE colleagues for many insights into the datatypes generated by different centres and for help with coordinated analyses. We thankI. Stanaway for assistance with the variation analysis, and many colleagues, particularlyF. Urnov, for helpful critiques of the manuscript and figures. This work was funded byNational InstitutesofHealth grantsHG004592(J.A.S.),HG004563 (G.E.C.),GM076036(J.M.A.) and R01MH084676 (S.R.S), and J.V. is supported by the National ScienceFoundation Graduate Research Fellowship under grant no. DGE-0718124. N.C.S. issupported by a National Science Foundation Graduate Research Fellowship and theResearch Council of Norway. M.T. and K.G. acknowledge funding support from thecaBIG In Silico Center of Excellence, NCI/NIH contract no. HHSN261200800001E.

Author Contributions Generation of DNase I data was supervised by J.A.S. and G.E.C.,with data collection carried out by M.O.D., P.J.S., R.K., D.B., T.K.C., R.S.H., M.D., D.D., E.G.,T.K., K.L., F.N., V.R., A. Shafer, S.V., M.W., B.-K.L., D. London, L.S., Zhancheng Z. andZhuzhu Z. 5C experiments were supervised by J.D. and performed by A. Sanyal.Primary DNase I data processing was performed by R.S., T.S.F., A.K.J. and A.P.R.Hypersensitivity Southern blots and enhancer cloning and transfection experimentswere performed by E.M.J., A.K.E., T.F., E.D.N., L.B., D. Lotakis, M.E.S. and Y.Y. andsupervised by P.A.N. and G.S. H3K4me3 ChIP-seq experiments were performed byH.W. Primary analysis of DNase I data was performed by R.E.T., R.S. and R.H. Jointanalysis of DNase I and transcription factor ChIP-seq data was performed by J.V., S.N.,A.B.S. and H.Q. Promoter prediction analysis was performed by R.E.T. DNase I versusDNA methylation analysis was performed by M.T.M. DHS–promoter connectivityanalysis was performed by E.R. Integration of DNase I and 5C data was performed byR.H. with assistance from B. Lajoie. DHS stereotyping pattern analysis was performedby E.H. Self-organizing map analysis was performed by N.C.S. and B. Lenhard.MicroRNA analysis was performed by K.G., J.M.S. and M.T. Variation analysis wasperformedbyB.V. andE.R. underdirectionofS.R.S., J.M.A. andJ.A.S.Data interpretationand figure design were performed by J.A.S., R.E.T., J.D.L., V.R.I., G.E.C. and T.S.F. J.A.S.,R.E.T., E.R., R.H., J.V., M.T.M., A.B.S., S.J. and N.S. wrote the paper.

Author Information DNase I-seq data are available through the UCSC browser, andthrough the NCBI Gene Expression Omnibus (GEO) data repository under accessionsGSE29692 and GSE32970. H3K4me3 data are available through the UCSC browser,and through the NCBIGEO data repository under accessionGSE35583.Data for 5C areavailable through the UCSC browser under accession wgEncodeEH002102. Geneexpression data are available through the UCSC browser, and through the NCBI GEOdata repositoryunderaccessionsGSE19090,GSE15805andGSE17778.Reprintsandpermissions information is available at www.nature.com/reprints. This paper isdistributed under the terms of the Creative CommonsAttribution-Non-Commercial-Share Alike licence, and the online version of the paper isfreely available to all readers. The authors declare no competing financial interests.Readers are welcome to comment on the online version of the paper. Correspondenceand requests for materials should be addressed to J.A.S. ([email protected]).

RESEARCH ARTICLE

8 2 | N A T U R E | V O L 4 8 9 | 6 S E P T E M B E R 2 0 1 2

Macmillan Publishers Limited. All rights reserved©2012