genomics of gene regulation ansc 497b ross hardison nov. 10, 2009

59
Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Upload: marcus-williams

Post on 12-Jan-2016

223 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Genomics of Gene Regulation

ANSC 497B

Ross Hardison

Nov. 10, 2009

Page 2: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

DNA sequences involved in regulation of gene transcription

Protein-DNA interactions

Chromatin effects

Page 3: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Distinct classes of regulatory regions

Maston G, Evans S and Green M (2006) Annu Rev Genomics Hum Genetics 7:29-59

Act in cis, affecting expression of a gene on the same chromosome.

Cis-regulatory modules (CRMs)

Page 4: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

General features of promoters

• A promoter is the DNA sequence required for correct initiation of transcription

• It affects the amount of product from a gene, but does not affect the structure of the product.

• Most promoters are at the 5’ end of the gene.

Maston, Evans & Green (2006) Ann Rev Genomics & Human Genetics, 7:29-59

TATA box + Initiator:Core or minimal promoter. Site of assembly of preinitiation complex

Upstream regulatory elements:Regulate efficiency of utilization of minimal promoter

RNA polymerase II

Page 5: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Conventional view of eukaryotic gene promoters

Maston, Evans & Green (2006) Ann Rev Genomics & Human Genetics, 7:29-59

Page 6: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Most promoters in mammals are CpG islands

TATA, no CpG islandAbout 10% of promoters

CpG island, no TATAAbout 90% of promoters

Carninci … Hayashizaki (2006)Nature Genetics 38:626

Page 7: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Differences in specificity of start sites for transcription for TATA vs CpG island promoters

Carninci … Hayashizaki (2006)Nature Genetics 38:626

Fra

ctio

n of

mR

NA

s

Page 8: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Enhancers• Cis-acting sequences that cause an increase in expression of a gene• Act independently of position and orientation with respect to the

gene.

Pennacchio et al., http://enhancer.lbl.gov/

Tested UCE

Over half of ultraconserved noncoding sequences aredevelopmental enhancersPennacchio et al. (2006) Nature 444:499-502

lacZUCE prluciferaseprCRM

About half of the enhancers predicted by interspecies alignments are validated in erythroid cellsWang et al. (2006) Genome Research 16:1480- 1492

Page 9: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

CRMs are clusters of specific binding sites for transcription factors

Hardison (2002) on-line textbook Working with Molecular Genetics http://www.bx.psu.edu/~ross/

Page 10: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Enhancers can occur in a variety of positions with respect to genes

Transcription unitP

Ex1 Ex2

EnhancerEnhancer

Adjacent

Downstream

Internal

Distal

Upstream

Page 11: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Silencer

• Cis-acting sequences that cause a decrease in gene expression• Similar to enhancer but has an opposite effect on gene expression• Gene repression - inactive chromatin structure (heterochromatin)

• SIR proteins (Silent Information Regulators)• Nucleates assembly of multi-protein complex

– hypoacetylated N-terminal tails of histones H3 and H4– methylated N-terminal tail of H3 (Lys 9)

Page 12: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Insulators and boundaries

• A boundary in chromatin marks a transition from open to closed chromatin• An insulator blocks activation of promoter by an enhancer

– Requires CTCF• Example: HS4 from chick HBB complex has both functions

neoRPr Enhancer

Insu-lator

Neo-resistant colonies% of maximum

50 10010

Silencer

Page 13: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Repression by PcG proteins via chromatin modification

Polycomb Group (PcG) Repressor Complex 2: ESC, E(Z), NURF-55, and PcG repressor SU(Z)12Methylates K27 of Histone H3 via the SET domain of E(Z)

me3H3 N-tailK27 OFF

Page 14: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

trx group (trxG) proteins activate via chromatin changes

• SWI/SNF nucleosome remodeling• Histone H3 and H4 acetylation• Methylation of K4 in histone H3

– Trx in Drosophila, MLL in humans• http://www.igh.cnrs.fr/equip/cavalli/

link.PolycombTeaching.html#Part_3

Me1,2,3

H3 N-tail

K4 ON

Page 15: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Histone modifications modulate chromatin structure

http://www.imt.uni-marburg.de/bauer/images/fig2.jpg Uta-Maria Bauer

H3K27me3H3K4me2, 3

Page 16: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Repressed and active chromatin

Dustin Schones and Keiji Zhao (2008) Nature Reviews Genetics 9: 179

Page 17: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Biochemical features of DNA in CRMs

Pol IIaPol II

Coactivators

Accessible to cleavage: DNase hypersensitive site

Bound by specific transcription factors

Associated with RNA polymerase and general transcription factors

Nucleosomes with histone modifications:Acetylation of H3 and H4Methylation of H3K4Lack of methylation at H3K27 or H3K9 …

Clusters of binding site motifs

Page 18: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Methods in Genomics of Gene Regulation

Page 19: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Chromatin immunoprecipitation: Greatly enrich for DNA occupied by a protein

Elaine Mardis (2007) Nature Methods 4: 613-614

Page 20: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

ChIP-chip: High throughput mapping of DNA sequences occupied by protein

http://www.chiponchip.org Bing Ren’s lab

Page 21: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Enrichment of sequence tags reveals function

Barbara Wold & Richard M Myers (2008) “Sequence Census Methods” Nature Methods 5:19-21

Page 22: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Illumina (Solexa) short read sequencing

- 8 lanes per run- 10 M to 20 M reads of 36 nucleotides (or longer) per run. - 1 lane can produce enough reads to map locations of a transcription factor in a mammalian genome.

Page 23: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Example of ChIP-seq

ChIP vs NRSF = neuron-restrictive silencing factorJurkat human lymphoblast line

NPAS4 encodes neuronal PAS domain protein 4

Johnson DS, Mortazavi A, Myers RM, Wold B. (2007) Genome-Wide Mapping of in Vivo Protein-DNA Interactions. Science 316:1497-1502.

Page 24: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

ChIP-seq for chromatin modifications

Dustin Schones and Keiji Zhao (2008) Nature Reviews Genetics 9: 179

Page 25: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Histone modifications around HBB locus

Known CRMs

UCSC genes

DNase hyper-sensitive sites

Polycomb

trithorax

Transcriptionassociated mark

Page 26: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Distributions at all GenCode TSSs

Birney et al. (2007) Nature 477: 799-816

Symmetrical distribution of: - H3K4me3, H3K4me2 - H3Ac, H4Ac, DHS - E2F1, E2F4, Myc, Pol II

Page 27: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Distribution of histone

modifications and factor binding

around regulatory regions

• Promoters– H3K4me3, H3K4me2

– E2F1, E2F4, Myc, Pol II

• Distal HSs– H3K4me1: enhancers

– CTCF: insulators

Birney et al. (2007) Nature, 447:799-816

Page 28: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Enhancers predicted from chromatin signatures

(2009) Nature 459: 108-112

Page 29: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Enhancer predictions in human cells

Page 30: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Characteristics and validation of predicted enhancers

Page 31: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Data Resources for Genomics of Gene Regulation

Page 32: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

UCSC Genome Browser

• Visualize data described in publications, e.g.– Expression data

• Affymetrix gene arrays, GNF, Su et al. 2004

– Regulation • Kim et al. 2005, PICs (TAF1) • Kim et al., 2008, CTCF• Boyle et al., 2008, DNase hypersensitive sites • Heintzman et al., 2009, Enhancers predicted by H3K4me1• Mikkelsen et al., 2007, Chromatin modifications in pluripotent and lineage-committed cells

• ENCODE project, Production phase– Expression

• Affy high density tiling arrays• RNA-seq from several sources (CSHL, Helicos)

– Regulation• Broad histone modifications• HAIB DNA methylation• Open Chromatin• UW DNase HS• HAIB TFBS• Yale TFBS• SUNY RBP

Page 33: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Factor occupancy

and DNase

hypersen-sitivity

ENCODE Tracks: Broad histone modifications, Open chromatin, UW DHS, Yale TFBSs

Locus control regionHS5 4 3 2 1

Page 34: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Collated sets of published regulatory regions

• http://www.bx.psu.edu/~ross/dataset/Reguldata.html• Noncoding DNA segments with high regulatory potential• PRPs: Intersection of the High RP segments and the PReMods

(clusters of conserved transcription factor binding site motifs)• Most constrained DNA segments, phastCons• DNase hypersensitive sites in CD4+ T cells• DNA segments occupied by CTCF in primary fibroblasts• Preinitiation complexes (TAF1) in IMR90 cells• Predicted erythroid cis-regulatory modules

Page 35: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

GeneTrack

• Genomic data analysis and integration– Istvan Albert, Frank Pugh, et al., PSU– http://genetrack.bx.psu.edu/

• Install on your system• Gallery of data for visualization

– Yeast H2AZ nucleosome predictions, 454 sequencing– Drosophila H2AZ nucleosome predictions, 454

sequencing

Page 36: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Yeast nucleosome

map

Page 37: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

HIS3: nucleosome-free region

Page 38: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

modENCODE

http://www.modencode.org/

Worm and FlyGene annotationsExpressionChromatin modificationsTFBs in vivo, etc.

Page 39: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Experimental Tests in the Genomics of Gene Regulation

Page 40: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

G1E-ER4 cells

GATA-1 is required for erythroid maturation

Aria Rad, 2007 http://commons.wikimedia.org/wiki/Image:Hematopoiesis_(human)_diagram.png

MEP Hematopoietic stem cell

Commonmyeloidprogenitor

Myeloblast

Basophil

Commonlymphoidprogenitor

Neutrophil

Eosinophil

Monocyte, macrophage

GATA-1G1E cells

Page 41: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

GATA1-induced changes in gene expression and occupancy genome-wide

Genes induced or repressed after restoration of GATA1

Occupancy by TFs and histone modifications along a 60 Mb region

Page 42: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

High sensitivity and specificity of high throughput occupancy data

Page 43: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

High throughput occupancy matches known

CRMs at Hbb locus

Page 44: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Confirmed and novel regulatory regions for Gypa

Known CRMsGypa gene

Response

DHSs

GATA1

TAL1

Trx: H3K4me1

Trx: H3K4me3

PcG: H3K27me3

Input DNA

Page 45: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Induced genes have GATA1 occupied segments close to their TSS

Page 46: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

DNA segments occupied by GATA-1 were tested for enhancer activity on transfected plasmids

Occupiedsegments

Page 47: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Some of the DNA segments occupied by GATA-1 are active as enhancers

Cheng et al. (2008) Genome Research 18:1896-1905

Page 48: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Binding site motifs in occupied DNA segments can be deeply preserved during evolution

Consensus binding site motif for GATA-1: WGATAR or YTATCW

5997constrained

7308not constrained

2055no motif

Page 49: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

All GATA1-occupied segments active as enhancers are also occupied by SCL and LDB1

Page 50: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Genetic Determinants of Variation in Gene Expression

Page 51: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Variation of gene expression among individuals

• Levels of expression of many genes vary in humans (and other species)

• Variation in expression is heritable• Determinants of variability map to discrete genomic intervals• Often multiple determinants• This variation indicates an abundance of cis-regulatory variation in

the human genome• For example:

– Microarray expression analyses of 3554 genes in 14 families • Morley M … Cheung VG (2004) Nature 430:743-747

- Expression analysis of about 16 HapMap individuals• Storey et al. (2007) AJHG 80: 502-509

– Expression analysis of all 270 individuals genotypes in HapMap• Stranger BE … Dermitzakis E (2007) Nature Genetics 39:1217-1224

Page 52: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Variation in expression between populations

Figure 5.Allele-specific qPCR analysis of SH2B3. a, Log2-fold change of SH2B3 expression for all CEU and YRI individuals, relative to the average expression level in the YRI sample obtained from allele-specific qPCR. The distribution of SH2B3 expression is significantly different between samples (t-test, P= .0157), which confirms the microarray results. b, Allele-specific qPCR of a coding polymorphism (rs1107853), which demonstrates that the log2-fold change of the G allele relative to the A allele is significantly different between heterozygous DNA (Het DNA) and heterozygous cDNA (Het cDNA) samples (t-test, P= .00118).

Storey et al., 2007, AJHG 80:502-509

Page 53: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Mapping determinants of expression variation

• Stranger et al., 2007, Nature Genetics 39:1217-1224• Expression analysis of EBV-transformed lymphoblastoid cells from all 270

individuals genotypes in HapMap– 30 Caucasian trios (90) of European descent in Utah (CEU)– 30 Yoruba trios (90) from Ibadan, Nigeria (YRI)– 45 unrelated Chinese individuals from Beijing Univ (CHB)– 45 unrelated Japanese individuals from Tokyo (JPT)

• Measure levels of expression of 47,294 probes (about 24,000 genes) in each individual

– Focus on 13,643 genes “selected on criteria of variance and population differentiation”

• Already know genotypes at about 2.2 million SNPs for each individual (HapMap)

• Test for significant association of variation at each SNP with variation in expression of each gene

– Linear regression model– Spearman rank correlation test

• Evaluate significance of regression P values by 10,000 permutations of the data, focus on those associations above the 0.001 permutation threshold

Page 54: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Association of SNPs with expression

Stranger et al., 2007, Nature Genetics 39:1217-1224

• Significant association between expression and cis-SNPs (within 1 Mb)

• 831 genes in at least one population

• 310 genes in at least 2 populations

• 62 genes in all 4 populations

• Also find associated SNPs in trans: perhaps regulatory proteins

Page 55: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Location of expression-associated SNPs

• Most are “close” to transcription start site (TSS)

• Symmetrical arrangement (similar to biochemical features of promoters)

• Three of the SNPs have been shown to affect promoter activity in transfection assays (Hoogendoorn et al. (2004) Human Mutation 24: 35-42

Figure 4 Properties of significant cis associations as a function of SNP distance from the transcription start site.

Stranger et al., 2007, Nature Genetics 39:1217-1224

Page 56: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Relevance to human health

• "We predict that variants in regulatory regions make a greater contribution to complex disease than do variants that affect protein sequence”– Manolis Dermitzakis, ScienceDaily

Page 57: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Risk loci in noncoding regions

(2007) Science 316: 1336-1341

Page 58: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Biochemical features of DNA in CRMs

Pol IIaPol II

Coactivators

Accessible to cleavage: DNase hypersensitive site

Bound by specific transcription factors

Associated with RNA polymerase and general transcription factors

Nucleosomes with histone modifications:Acetylation of H3 and H4Methylation of H3K4

Clusters of binding site motifs

Page 59: Genomics of Gene Regulation ANSC 497B Ross Hardison Nov. 10, 2009

Candidate functions in T2D SNP intervals

Overlap of SNP rs564398 with DHS suggests a role in transcriptional regulation,but overlap with an exon of a noncoding RNA suggests a role in post-transcriptionalregulation. Different hypotheses to test in future work.