RBP database: the ENCODE eCLIP resource for RNA binding protein targets

Eric Van Nostrand
Yeo Lab, UCSD
06/08/2016




Image adapted from Genome Research Limited


Each step of RNA processing is highly regulated

Stephanie Huelga

•  RNA binding proteins (RBPs) act as trans factors to regulate RNA processing steps

•  Estimated >1000 RBPs in human

•  RNA processing plays critical roles in development and human physiology

•  Mutation or alteration of RNA binding proteins plays critical roles in disease


ENCORE: ENCODE RNA regulation group

[Figure: ENCORE overview. 250 RNA Binding Proteins in K562 & HepG2 cells, assayed by CLIP-Seq (ChIP-Seq), Bind-N-Seq, RNAi & RNA-Seq, and RBP Localization. Labs: Yeo, Fu, Graveley, Burge, Lécuyer]


RBP Data Production Overview (Released data only as of 6/8/16)

1,303 Completed/Released Experiments

344 RNA Binding Proteins

[Figure: released experiments per assay in HepG2 and K562 cells (eCLIP-Seq, RNAi/RNA-Seq, ChIP-Seq, Imaging, RNA Bind-N-Seq)]


Outline

•  eCLIP overview

•  Method outline

•  ENCODE submitted data structure

•  ENCODE eCLIP pipeline walkthrough

•  What kinds of analyses can be done?

•  Tools coming soon


Identification of RNA binding protein targets by eCLIP-seq

High-throughput sequencing

Data processing & peak calling


eCLIP computational pipeline

PE fastq files (2x50bp)
→ Adapter trimming (Cutadapt x2) → Adapter trimmed fastq
→ Repetitive element removal (STAR map to modified RepBase) → Repeat element mapping; Repeat-removed fastq
→ Genome mapping (PE STAR map vs hg19 + SJdb) → PE mapping bam file (Uniquely mapped reads)
→ PCR duplicate removal (Custom script, now based off both PE reads + random-mer) → PE mapping, dup-removed bam file (Usable reads)
→ R2 only: mapped, rmDup bam file
→ Peak calling (CLIPper, uses R2 only) → Peaks
→ Input normalization (Custom script) → Input-normalized Peaks


eCLIP computational pipeline

Files available on DCC

[Figure: the eCLIP computational pipeline flowchart shown again, marking which files are available on the DCC]


Biosample 1

eCLIP Replicate 1

Size-matched input

Biosample 2

eCLIP Replicate 2


R1 + R2 fastq files

Paired-end mapping (STAR)

Input-normalized peaks


eCLIP computational pipeline

[Figure: the eCLIP computational pipeline flowchart, repeated]


•  Analysis SOP available at: https://www.encodeproject.org/documents/dde0b669-0909-4f8b-946d-3cb9f35a6c52/@@download/attachment/eCLIP_analysisSOP_v1.P.pdf

Linked at bottom of each eCLIP experiment:


Demultiplexing (already has been done for files on ENCODE DCC)


File details: fastq files

DATASET.R1.fastq.gz:
@CCAAC:SN1001:449:HGTN3ADXX:1:1101:1373:1964 1:N:0:1
CAAATGCCCCTGAGGACAAAGCTGCTGCCGGGCCTCTCTCTCTG
+
FFFFFFIIFIIIFIIFIFIFIIIIIIIIIIIIIIIIIIIIIIFI
@CAGAT:SN1001:449:HGTN3ADXX:1:1101:1669:1914 1:N:0:1
TTAGAGACAGGGTCTCGCTCCGTTGCTCAGGCTGGAGTGCAGTG
+
FFFFFFIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
...

DATASET.R2.fastq.gz:
@CCAAC:SN1001:449:HGTN3ADXX:1:1101:1373:1964 2:N:0:1
GAGAGAGGAGTGGGAAGTTGGGATAGTACCCAGAGAGAGAGGCCCG
+
FFFFFBFFBFBFFFFFIFFFIFFIFIIIIIIFIIIIFFIFIIFFIF
@CAGAT:SN1001:449:HGTN3ADXX:1:1101:1669:1914 2:N:0:1
TTGTACCACTGCACTCCAGCCTGAGCAACGGAGCGAGACCCTGTCT
+
FFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIFIIIIIIIIIIIIII
...

•  @CCAAC = random-mer (first 5 or 10 nt of sequenced read 2); it has been removed from the 5’ end of read 2 and appended to the read name

•  Any in-line barcode has been removed (as part of demultiplexing)
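The read-name convention described here can be unpacked with a few lines of Python. This is a minimal sketch that assumes the random-mer is the first colon-separated field of the read name, as in the example records; the function name `split_read_name` is hypothetical and is not part of the ENCODE pipeline.

```python
def split_read_name(header: str) -> tuple[str, str]:
    """Split a FASTQ header like '@CCAAC:SN1001:...' into (random_mer, rest)."""
    # Drop the leading '@' and the trailing comment field (e.g. '1:N:0:1').
    name = header.lstrip("@").split()[0]
    random_mer, _, rest = name.partition(":")
    # Assumption: a random-mer contains only nucleotide letters.
    if not random_mer or set(random_mer) - set("ACGTN"):
        raise ValueError(f"no random-mer prefix in read name: {header!r}")
    return random_mer, rest


header = "@CCAAC:SN1001:449:HGTN3ADXX:1:1101:1373:1964 1:N:0:1"
random_mer, original_name = split_read_name(header)
print(random_mer, original_name)
```

Because R1 and R2 of a pair share the same read name, the same random-mer is recovered from either file.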


Adaptor trimming:


•  Key consideration: we’ve observed that adaptor-concatamer fragments (even at extremely low frequency) yield high-scoring eCLIP peaks

•  Difficult to trim all with one pass

•  Cutadapt (by default) will miss adaptors with 5’ truncations

•  To avoid this, we err on the side of over-trimming
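To see why a single pass can leave adaptor sequence behind, here is a toy illustration in Python. It is not Cutadapt and does not reproduce the SOP's parameters: it uses exact string matching only, the adapter sequence `ADAPTER` is a hypothetical example, and the second pass simply adds 5'-truncated copies of the adapter to the search set.

```python
def trim_3prime(read: str, adapters: list[str], min_overlap: int) -> str:
    """Cut the read at the leftmost exact adapter match (full, or partial at the 3' end)."""
    cut = len(read)
    for adapter in adapters:
        pos = read.find(adapter)
        if pos != -1:
            cut = min(cut, pos)
            continue
        # Otherwise look for an adapter prefix hanging off the 3' end of the read.
        for k in range(min(len(adapter), len(read)) - 1, min_overlap - 1, -1):
            if read.endswith(adapter[:k]):
                cut = min(cut, len(read) - k)
                break
    return read[:cut]


ADAPTER = "AGATCGGAAGAGC"  # hypothetical adapter sequence, for illustration only
# Second-pass search set: adapter copies missing 1-5 nt from their 5' end.
TRUNCATED = [ADAPTER[i:] for i in range(1, 6)]

insert = "CAAATGCCCCTGAGGACAAAGC"
# Concatamer: a 5'-truncated adapter copy followed by a complete copy.
read = insert + ADAPTER[3:] + ADAPTER

pass1 = trim_3prime(read, [ADAPTER], min_overlap=5)
pass2 = trim_3prime(pass1, [ADAPTER] + TRUNCATED, min_overlap=5)
print(pass1)  # the truncated adapter copy is still attached
print(pass2)  # only the insert remains
```

The first pass removes the complete adapter copy but leaves the truncated one, which is exactly the kind of residue that can later pile up into an artifactual peak. The price of the more permissive second pass is that some genuine insert sequence resembling the adapter will occasionally be trimmed as well.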


Repetitive element removal

•  The majority of RNA in most cells is rRNA/tRNA/repeats

•  These can map and cause strange artifacts (particularly rRNA, as a 40 nt rRNA read with 1 or 2 sequencing errors can map uniquely to one of the various rRNA pseudogenes in the genome)

•  To avoid false positives, we FIRST map all reads against a RepBase database, and only take reads that remain unmapped for further processing


Mapping to human genome

•  We perform paired-end mapping with STAR to the human genome plus splice junction database, keeping only uniquely mapped reads
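As a rough sketch of what "uniquely mapped" means at the file level: STAR marks unique alignments with MAPQ 255 and the tag NH:i:1, both visible in the bam records shown later in this deck. The Python below checks the NH tag on a SAM text line; the helper name `is_uniquely_mapped` and the second, multi-mapping record are hypothetical, and the actual pipeline may apply the filter differently (for example inside STAR itself).

```python
def is_uniquely_mapped(sam_line: str) -> bool:
    """True if a SAM record is mapped and carries the NH:i:1 tag (one reported alignment)."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4:  # 0x4 = read unmapped
        return False
    # Optional tags start at the 12th column of a SAM record.
    return "NH:i:1" in fields[11:]


unique = "\t".join([
    "CCTTG:SN1001:449:HGTN3ADXX:1:1206:8464:69989", "147", "chr1", "14771", "255",
    "43M", "=", "14681", "-133",
    "CACGCGGGCAAAGGCTCCTCCGGGCCCCTCACCAGCCCCAGGT",
    "B<FFFFFB<0<<<IIFBF<07FFFBFIFFFFFBB<B<BBFFFB",
    "NH:i:1", "HI:i:1",
])
# Hypothetical multi-mapping record (two reported loci), for contrast.
multi = "\t".join([
    "AAAAA:hypothetical:read", "147", "chr1", "20000", "3",
    "10M", "=", "19950", "-60", "ACGTACGTAC", "FFFFFFFFFF",
    "NH:i:2", "HI:i:1",
])

print(is_uniquely_mapped(unique), is_uniquely_mapped(multi))
```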


PCR duplicate removal

•  Next, we compare reads that map to the same location (based on the mapped start of R1 and start of R2) based on their random-mer sequence

•  If two reads map to the same position and have the same random-mer, one is discarded

•  Input: bam file containing only uniquely mapped reads

•  Output: bam file containing only “Usable” (uniquely mapped, non-PCR duplicate) reads
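The duplicate-removal rule can be sketched as a set lookup keyed on position plus random-mer. This is an illustration under stated assumptions, not the custom script itself: the `Pair` record, the inclusion of chromosome and strand in the key, keeping the first pair seen, and all coordinates below are hypothetical.

```python
from typing import NamedTuple


class Pair(NamedTuple):
    name: str      # read name with the random-mer as its first field
    chrom: str
    strand: str
    r1_start: int  # mapped start of R1
    r2_start: int  # mapped start of R2


def remove_pcr_duplicates(pairs):
    """Keep one pair per (position, random-mer) combination."""
    seen = set()
    kept = []
    for p in pairs:
        random_mer = p.name.split(":", 1)[0]
        key = (p.chrom, p.strand, p.r1_start, p.r2_start, random_mer)
        if key not in seen:
            seen.add(key)
            kept.append(p)
    return kept


pairs = [
    Pair("CCTTG:read1", "chr1", "-", 14681, 14771),
    Pair("CCTTG:read2", "chr1", "-", 14681, 14771),  # same position, same random-mer
    Pair("GGACT:read3", "chr1", "-", 14681, 14771),  # same position, different random-mer
    Pair("CCTTG:read4", "chr1", "-", 14690, 14771),  # same random-mer, different R1 start
]
usable = remove_pcr_duplicates(pairs)
print([p.name for p in usable])
```

Only `read2` is discarded: pairs at the same position with different random-mers are treated as independent molecules, which is what lets the random-mer rescue reads that a position-only filter would throw away.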


eCLIP significantly decreases PCR duplication rate


File details: bam files

CCTTG:SN1001:449:HGTN3ADXX:1:1206:8464:69989 147 chr1 14771 255 43M = 14681 -133 CACGCGGGCAAAGGCTCCTCCGGGCCCCTCACCAGCCCCAGGT B<FFFFFB<0<<<IIFBF<07FFFBFIFFFFFBB<B<BBFFFB NH:i:1 HI:i:1 AS:i:80 nM:i:0 NM:i:0 MD:Z:43 jM:B:c,-1 jI:B:i,-1 RG:Z:foo

CCCCT:SN1001:449:HGTN3ADXX:2:2101:6568:79173 147 chr1 15206 255 44M = 15204 -46 GCGGCGGTTTGAGGAGCCACCTCCCAGCCACCTCGGGGCCAGGG FFFFIIIIIIIIIIIIIFFIIIIIIIIIFFIIIIIIFFFFFFFF NH:i:1 HI:i:1 AS:i:76 nM:i:2 NM:i:1 MD:Z:5T38 jM:B:c,-1 jI:B:i,-1 RG:Z:foo

CCTTG = random-mer (first 5 or 10 nt of sequenced read 2); it has been removed from the 5’ end of read 2 and appended to the read name
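Both example records carry the SAM flag 147. Decoding it with the standard SAM flag bits shows that these are read 2 alignments on the reverse strand, consistent with a bam file that contains R2 only. The helper below is a small sketch; the bit meanings come from the SAM specification, while the dictionary labels and function name are our own.

```python
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read on reverse strand",
    0x20: "mate on reverse strand",
    0x40: "first in pair (R1)",
    0x80: "second in pair (R2)",
    0x100: "secondary alignment",
    0x200: "fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag: int) -> list[str]:
    """List the SAM flag bits that are set in `flag`."""
    return [label for bit, label in SAM_FLAGS.items() if flag & bit]


# 147 = 1 + 2 + 16 + 128
print(decode_flag(147))
```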


Peak calling

Step 1) Initial cluster identification with CLIPper (spline-fitting with transcript-level background normalization)

Step 2) Compare clusters against size-matched input

Step 3) Compress clusters (as CLIPper is transcript-level, it can occasionally call overlapping peaks; this step iteratively removes overlapping peaks by keeping the one with greater enrichment above input)
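Step 3 can be sketched as a greedy pass over peaks sorted by enrichment. This is one plausible way to implement "keep the more enriched of two overlapping peaks", not necessarily what the pipeline's script does; the `Peak` record and all coordinates and values are hypothetical.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    stop: int
    strand: str
    log2_fold: float  # log2 fold-enrichment over size-matched input


def overlaps(a: Peak, b: Peak) -> bool:
    """Half-open interval overlap on the same chromosome and strand."""
    return (a.chrom == b.chrom and a.strand == b.strand
            and a.start < b.stop and b.start < a.stop)


def compress_peaks(peaks):
    """Visit peaks from most to least enriched; drop any that overlap a kept peak."""
    kept = []
    for peak in sorted(peaks, key=lambda p: p.log2_fold, reverse=True):
        if not any(overlaps(peak, k) for k in kept):
            kept.append(peak)
    return sorted(kept, key=lambda p: (p.chrom, p.start))


peaks = [
    Peak("chr7", 100, 200, "+", 5.2),
    Peak("chr7", 150, 260, "+", 6.5),  # overlaps the first peak, more enriched
    Peak("chr7", 150, 260, "-", 3.1),  # opposite strand, so no conflict
    Peak("chr7", 400, 480, "+", 4.0),
]
compressed = compress_peaks(peaks)
for p in compressed:
    print(p)
```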


Why input normalize?

•  We see mRNA background at nearly all abundant genes…

…but true signal is highly enriched above this background


Input normalization removes false-positives and identifies confident binding sites


File details: bed narrowPeak (input-normalized peaks)

track type=narrowPeak visibility=3 db=hg19 name="RBFOX2_HepG2_rep01" description="RBFOX2_HepG2_rep01 input-normalized peaks"

chr7 4757099 4757219 RBFOX2_HepG2_rep01 1000 + 6.539331235 400 -1 -1

chr7 99949578 99949652 RBFOX2_HepG2_rep01 1000 + 5.233511963 400 -1 -1

chr7 1027402 1027481 RBFOX2_HepG2_rep01 1000 + 5.243129966 69.5293984 -1 -1

chr \t start \t stop \t dataset_label \t 1000 \t strand \t log2(eCLIP fold-enrichment over size-matched input) \t -log10(eCLIP vs size-matched input p-value) \t -1 \t -1

•  Note: p-value is calculated by Fisher’s Exact test (minimum p-value 2.2x10^-16), with chi-square test (-log10(p-value) set to 400 if p-value reported == 0)

•  Our typical ‘stringent’ cutoffs: require -log10(p-value) ≥ 5 and log2(fold-enrichment) ≥ 3
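The stringent cutoffs can be applied directly to the columns described here: column 7 holds log2 fold-enrichment and column 8 holds -log10(p-value). The Python below is a minimal sketch with hypothetical helper names; the first two records are taken from the example, and the third is a made-up weak peak added only to show a rejection.

```python
def parse_narrowpeak(line: str) -> dict:
    """Parse one input-normalized peak line (whitespace- or tab-separated)."""
    f = line.split()
    return {
        "chrom": f[0], "start": int(f[1]), "stop": int(f[2]),
        "label": f[3], "strand": f[5],
        "log2_fold": float(f[6]),    # log2(eCLIP fold-enrichment over input)
        "neg_log10_p": float(f[7]),  # -log10(p-value)
    }


def passes_stringent(peak: dict, min_neg_log10_p: float = 5.0,
                     min_log2_fold: float = 3.0) -> bool:
    return (peak["neg_log10_p"] >= min_neg_log10_p
            and peak["log2_fold"] >= min_log2_fold)


lines = [
    'track type=narrowPeak visibility=3 db=hg19 name="RBFOX2_HepG2_rep01"',
    "chr7 4757099 4757219 RBFOX2_HepG2_rep01 1000 + 6.539331235 400 -1 -1",
    "chr7 1027402 1027481 RBFOX2_HepG2_rep01 1000 + 5.243129966 69.5293984 -1 -1",
    # Hypothetical weak peak, not from the real dataset:
    "chr7 2000000 2000080 RBFOX2_HepG2_rep01 1000 + 1.8 2.1 -1 -1",
]
# Skip the track header line before parsing.
peaks = [parse_narrowpeak(l) for l in lines if not l.startswith("track")]
confident = [p for p in peaks if passes_stringent(p)]
print(len(peaks), len(confident))
```

Both thresholds are on log scales, so -log10(p-value) ≥ 5 corresponds to p ≤ 10^-5 and log2(fold-enrichment) ≥ 3 corresponds to at least 8-fold enrichment over the size-matched input.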


What can we do with the eCLIP database?


Individual RBP analyses

[Figure panels: eCLIP analysis; RBP localization; Integration with knockdown RNA-seq. Labels: RBFOX2, Nucleoli]


An “RNA-centric” view of RBP-binding

‘in silico screen’ of a desired RNA against all CLIP datasets to identify the best-binding RBPs
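One way such a screen might work is to take a genomic interval for the RNA of interest, collect the overlapping peaks in every dataset, and rank datasets by their strongest overlapping peak. The sketch below is purely illustrative: the ranking criterion, the function name, the dataset labels, and all numbers are hypothetical and do not describe the actual browser.

```python
def best_binders(region, peaks_by_dataset, top_n=3):
    """Rank datasets by the highest log2 fold-enrichment among peaks overlapping `region`."""
    chrom, start, stop, strand = region
    scores = {}
    for dataset, peaks in peaks_by_dataset.items():
        hits = [log2_fold for (c, s, e, st, log2_fold) in peaks
                if c == chrom and st == strand and s < stop and start < e]
        if hits:
            scores[dataset] = max(hits)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Hypothetical datasets; each peak is (chrom, start, stop, strand, log2_fold).
peaks_by_dataset = {
    "RBP_A_HepG2": [("chr7", 1000, 1080, "+", 6.1), ("chr7", 5000, 5060, "+", 3.4)],
    "RBP_B_K562": [("chr7", 1040, 1100, "+", 4.2)],
    "RBP_C_HepG2": [("chr7", 1040, 1100, "-", 7.0)],  # wrong strand, ignored
}
region = ("chr7", 900, 1200, "+")
ranking = best_binders(region, peaks_by_dataset)
print(ranking)
```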


Integrated global views of RBP binding


Tools available soon (next few months):

•  eCLIP processing pipeline on DNAnexus (should be ready ~July)

•  Followed quickly by IDR & q/c metrics for validating your own eCLIP datasets

•  RNA-centric browser (website at alpha stage now)

•  Allow users to query RNAs or genomic regions of interest against our ENCODE eCLIP database

•  Integration with ENCODE encyclopedia

•  Factorbook-like summaries for each RBP


Acknowledgements

Funding:

Gene Yeo, Brent Graveley, Chris Burge, Eric Lécuyer, Xiang-Dong Fu

Computational: Gabriel Pratt, Eric Van Nostrand, Shashank Sathe, Brian Yee

Experimental: Eric Van Nostrand, Steven Blue, Thai Nguyen, Chelsea Gelboin-Burkhart, Ruth Wang, Ines Rabano

Alumni: Balaji Sundararaman, Keri Elkins, Rebecca Stanton