How Dead Are the Dead Zones? (16/Sep/2010)



Page 1: How Dead Are the Dead Zones?  (16/Sep/2010)


How Dead Are the Dead Zones? (16/Sep/2010)

Bob Harris

Penn State

Center for Comparative Genomics and Bioinformatics

[email protected]

Page 2: How Dead Are the Dead Zones?  (16/Sep/2010)

How Dead Are the Dead Zones?

• Looking at ChromHMM and Segway (short-range) segmentations vs. certain annotated “features”

• Nothing fancy; just a simple base counting process
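The "simple base counting process" could look roughly like the sketch below: count the bases that a set of segmentation intervals shares with a set of annotated feature intervals. This is an illustration only, not the code used for the talk. It assumes half-open (start, end) intervals on a single chromosome, and the function names `merge_intervals` and `count_overlap_bases` are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extends (or touches) the previous interval: grow it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def count_overlap_bases(segments, features):
    """Count bases covered by both interval sets (one chromosome)."""
    a = merge_intervals(segments)
    b = merge_intervals(features)
    i = j = total = 0
    # Sweep both sorted, non-overlapping lists in parallel.
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


# Toy data, not taken from the real segmentations.
dead_zone = [(0, 100), (200, 300)]
genes = [(50, 250)]
shared = count_overlap_bases(dead_zone, genes)
```

Dividing `shared` by the total length of the class gives the fraction of that class that falls inside the feature.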

Page 3: How Dead Are the Dead Zones?  (16/Sep/2010)

Portion of Genome

Most or much of the genome is assigned to dead zone classes.

ChromHMM assigns 76% of the genome to dead zones.

Segway assigns 42%.

[Figure. Legend: Dead Zone Classes, Promoter Classes, Enhancer Classes, Other Classes. (Full class names given on slide 11.)]
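A per-class portion of the genome can be tallied as in the sketch below. It is a minimal illustration under stated assumptions: segments are non-overlapping (start, end, class) tuples, the genome size is supplied by the caller, and which class names count as "dead zone" is passed in as a set. The function names and the toy data are hypothetical and do not reproduce the 76% or 42% figures.

```python
from collections import defaultdict


def class_fractions(segments, genome_size):
    """Return {class_name: fraction of genome} for non-overlapping segments."""
    bases = defaultdict(int)
    for start, end, name in segments:
        bases[name] += end - start
    return {name: count / genome_size for name, count in bases.items()}


def group_fraction(fractions, group):
    """Sum the fractions of every class in the given group of class names."""
    return sum(fractions.get(name, 0.0) for name in group)


# Toy segmentation of a 100-base "genome".
toy_segments = [(0, 50, "D0"), (50, 70, "D1"), (70, 100, "TSS0")]
fractions = class_fractions(toy_segments, genome_size=100)
dead_fraction = group_fraction(fractions, {"D0", "D1"})
```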

Page 4: How Dead Are the Dead Zones?  (16/Sep/2010)

Mappable Bases

• Mappability derived from signal tracks
– 152 signal tracks are the inputs to the segmentation
– What is considered mappable for a given signal track is dependent on the tag extension length for that track

• I’m using the union of mappable intervals over all the tracks
– A base is counted as mappable if it appears in an interval in any track

• Not to be confused with the “mapability track” (wgEncodeMapability)
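The union rule described above (a base is mappable if any track covers it) can be sketched as follows. This is an assumption-laden illustration: each track is taken to be a list of (chrom, start, end) tuples with half-open coordinates, and `union_mappable` and `covered_bases` are hypothetical names, not part of any ENCODE tool.

```python
def union_mappable(tracks):
    """Merge the intervals of all tracks into one sorted, non-overlapping list."""
    records = sorted(rec for track in tracks for rec in track)
    merged = []
    for chrom, start, end in records:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Same chromosome and overlapping/adjacent: extend the last interval.
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def covered_bases(intervals):
    """Total number of bases in a non-overlapping interval list."""
    return sum(end - start for _, start, end in intervals)


# Three toy tracks standing in for the 152 signal tracks.
toy_tracks = [
    [("chr1", 0, 10), ("chr2", 5, 15)],
    [("chr1", 5, 25)],
    [("chr1", 40, 50)],
]
mappable = union_mappable(toy_tracks)
```

Sorting all records once keeps the merge a single pass; with 152 real tracks a streaming merge per chromosome would likely use less memory.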

Page 5: How Dead Are the Dead Zones?  (16/Sep/2010)

Mappable Bases

[Figure. Legend: Dead Zone, Promoter, Enhancer, Other]

Not dead simply as an artifact of not mapping.

Page 6: How Dead Are the Dead Zones?  (16/Sep/2010)

In Repeats

[Figure. Legend: Dead Zone, Promoter, Enhancer, Other]

Repeats for ChromHMM dead zones are comparable to other classes.

Ditto for Segway’s DF and DFC.

Page 7: How Dead Are the Dead Zones?  (16/Sep/2010)

In Genes

[Figure. Legend: Dead Zone, Promoter, Enhancer, Other]

Dead zones contain interesting things like genes.

Page 8: How Dead Are the Dead Zones?  (16/Sep/2010)

In Exons

[Figure. Legend: Dead Zone, Promoter, Enhancer, Other]

Exon content is low for dead zones.

Page 9: How Dead Are the Dead Zones?  (16/Sep/2010)

GC Content, CpG Ratio

[Figure. Legend: Dead Zone, Promoter, Enhancer, Other]

Dead zones are on the low end for GC content.

CpG ratio is low, but comparable to other non-promoter classes.
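The slides do not say how GC content and CpG ratio were computed. The sketch below assumes the common definitions: GC content as (G + C) over all A/C/G/T bases, and CpG ratio as observed over expected CpG dinucleotides, i.e. (#CpG × N) / (#C × #G). Both function names are hypothetical.

```python
def gc_content(seq):
    """Fraction of A/C/G/T bases that are G or C (N and other symbols ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def cpg_ratio(seq):
    """Observed/expected CpG ratio: (#CG * N) / (#C * #G), N = count of A/C/G/T."""
    seq = seq.upper()
    c = seq.count("C")
    g = seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    n = sum(seq.count(base) for base in "ACGT")
    return seq.count("CG") * n / (c * g)


# Toy sequence, not real genomic data.
toy = "ACGTACGT"
toy_gc = gc_content(toy)
toy_cpg = cpg_ratio(toy)
```

For a per-class figure, the counts would be accumulated over every interval of the class before dividing, rather than averaging per-interval ratios.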

Page 10: How Dead Are the Dead Zones?  (16/Sep/2010)

Related Work

• Also looked/looking at
– SNPs
– Sequence composition

• More plots and spreadsheet at http://www.bx.psu.edu/~rsharris/encode/index.html#dead_zones

• Integration Vignette B02, in progress: http://encodewiki.ucsc.edu/EncodeDCC/index.php/Integration_Vignette_B02

Page 11: How Dead Are the Dead Zones?  (16/Sep/2010)

Data Sources

• ChromHMM K562 kitchensink
– http://www.broadinstitute.org/~jernst/K562_max_25state_49mark.bed.gz
– Lifted over to hg19

• Segway short-range K562 kitchensink
– http://noble.gs.washington.edu/~stasis/public/2010/segtools/round5b/kitchensink/k562/round5b.kitchensink.k562.1224-0218a.stws1.bed.gz
– Lifted over to hg19

• Signal tracks
– http://noble.gs.washington.edu/~stasis/public/2010/encode/round6/rawSignal/
– 152 *.bedGraph.gz files
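Segmentation files like the ones listed here are gzipped BED files. A reader along the following lines would be enough to feed a base counter. It is a sketch only: it assumes whitespace-separated BED lines with the class label in the fourth column (not confirmed by the slides), and `read_bed` is a hypothetical helper name.

```python
import gzip


def read_bed(path):
    """Yield (chrom, start, end, name) from a plain or gzipped BED file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip blank lines, comments, and UCSC track/browser header lines.
            if not line.strip() or line.startswith(("track", "browser", "#")):
                continue
            fields = line.split()
            name = fields[3] if len(fields) > 3 else ""
            yield fields[0], int(fields[1]), int(fields[2]), name
```

BED coordinates are zero-based and half-open, so `end - start` is the interval length in bases.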

Page 12: How Dead Are the Dead Zones?  (16/Sep/2010)

Class Names

ChromHMM

5P0      14   5' UTR
5P1      24   Promoter - 5' UTR
CNV0      6   Repetitive/CNV high
CNV1     13   Repetitive/CNV medium
CNV2     15   Repetitive/CNV low
D0        8   Dead zone (more dead)
D1       20   Dead zone
E0       16   Enhancer strong
E1       17   Enhancer - moderate
E2       11   Enhancer
GE        3   End of transcription
GS        7   Transcription initiation
I0        9   CTCF + open chromatin high
I1       12   CTCF + open chromatin medium
I2        0   CTCF + open chromatin low
IG        4   Intergenic
K27me3   23   H3K27me3
K36me3   10   H3K36me3 transcribed
K4me1     5   H3K4me1
R0        2   Specific Repression Strong
R1        1   Specific Repression Weaker
T0       21   Weak transcribed
T1       19   Transcribed 5'
TSS0     22   TSS promoter strong
TSS1     18   Promoter/TSS

Segway

BBT         23   0.11   BRF1+BDP1+TR4
D0           1   0.0    D dead zone
D+Alu       18   0.1    K9me1 H3K9me1+H4K20me1
DF          13   0.2    F0 FAIRE
DFC         19   0.3    FC FAIRE+CTCF
E0           0   2.1    GM0 enhancer
E1           2   2.4    GM1 enhancer
FI          16   0.10   FI FAIRE+input
GE0          3   2.7    gene end
GE1          7   0.12   gene end
GM0         21   2.5    gene end
GM1          4   2.6    gene end
GM+K36me3    5   0.7    gene end
GS0         10   2.2    gene body
GS1         15   2.8    GM2 gene middle
I           20   2.3    I insulator
K4me1       12   0.8    H3K4me1+H3K9me1+H4K20me1
R0           6   0.4    R0 repression
R1           9   0.5    R1 repression
R2          22   0.6    R2 repression
R3          11   0.9    R3 repression
RGM          8   0.13   H4K20me1+H3K9me1
RTSS        24   0.14   R4 repressed TSS
TSS0        17   1.0    transcription start site
TSS1        14   2.0    near transcription start site