BIOINFORMATICS LAB Episode VIII – ChIP-Seq Analysis Andrew N. Holding, PhD Federico M. Giorgi, PhD Chiara Cabrelle, TA Department of Pharmacy and Biotechnology First Cycle Degree in Genomics

Post on 29-Jul-2020

BIOINFORMATICS LAB

Episode VIII – ChIP-Seq Analysis

Andrew N. Holding, PhD
Federico M. Giorgi, PhD
Chiara Cabrelle, TA

Department of Pharmacy and Biotechnology

First Cycle Degree in Genomics


Exam Sessions - Bioinformatics

• Ferrè and/or Giorgi
• 27 June, 10:00 AULA 1 SC FARMACEUTICHE Via Belmeloro 6
• 18 July, 10:00 AULA B Via Irnerio 42
• Giorgi Exam
– One tool picked from the list (1+ slides presentation)
– Oral Questions about the Lab topics
• Ferrè Exam
– Ask Ferrè


Andrew Holding
http://andrewholding.com/

Proteomics Genomics Cancer Game Theory


Science in Cambridge!


Science in Cambridge! Downing College


Science in Cambridge! Gryffindor House


Science in Cambridge!


Science in Cambridge! Holding's Office (Former Giorgi's Office)


UPDATE R

[Figure: panels A and B]


TRANSCRIPTION FACTORS

[Figure: panels A and B]


CHIP-SEQ


ESTROGEN RECEPTOR & BREAST CANCER


INSTALL DIFFBIND

First we must install DiffBind.

You may previously have come across edgeR and DESeq2 as methods of analysing RNA-seq data.

The methods we are going to apply to the analysis of ChIP-seq data are based on these two packages, so hopefully many of the ideas we use will be familiar.

Load RStudio and run the following commands.

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("DiffBind")

library(DiffBind)


LOAD THE SAMPLE SHEET

DiffBind requires a sample sheet.

- A sample sheet provides the metadata on each of the samples.

- Metadata includes the tissue type, whether the cells were resistant or not, and the replicate number.

- The sample sheet also lists where the read files and the peak files (generated by MACS2) are stored.
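As a rough illustration of what such a sample sheet contains, the sketch below builds a two-row table by hand. The column names follow the layout assumed for the tamoxifen.csv example used later; the sample IDs and file paths are made up and do not point at real files.

```r
# Hypothetical two-sample sheet; IDs and paths are placeholders, not real files
sampleSheetToy <- data.frame(
  SampleID  = c("MCF7_rep1", "MCF7_rep2"),
  Tissue    = c("MCF7", "MCF7"),
  Factor    = c("ER", "ER"),
  Condition = c("Responsive", "Responsive"),
  Replicate = c(1, 2),
  bamReads  = c("reads/sample1.bam", "reads/sample2.bam"),
  Peaks     = c("peaks/sample1.bed", "peaks/sample2.bed"),
  stringsAsFactors = FALSE
)

# One row per sample, one column per piece of metadata or file location
print(sampleSheetToy)
```

Each row describes one sample: the first columns hold the metadata and the last two say where its reads and peaks are stored.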


LOAD THE SAMPLE SHEET

DiffBind requires a sample sheet.

Helpfully, the DiffBind package includes example data from a real experiment.

The following code lets us see what files are included in the DiffBind package.

pathDiffBind<-system.file("extra", package="DiffBind")

list.files(pathDiffBind)

Output:

The file we want is tamoxifen.csv


LOAD THE SAMPLE SHEET

We can load the Sample Sheet using read.csv(). This function imports the CSV file into R.

pathDiffBind<-system.file("extra", package="DiffBind")
sampleSheet<-read.csv(paste0(pathDiffBind,"/","tamoxifen.csv"))

Then view with:

View(sampleSheet)


TABLES

We can adapt this code if we only want to look at the samples labelled MCF7 in the Tissue column.

pathDiffBind<-system.file("extra", package="DiffBind")
sampleSheet<-read.csv(paste0(pathDiffBind,"/","tamoxifen.csv"))
View(sampleSheet[sampleSheet$Tissue=='MCF7',])

Exercise: How would we view only the resistant samples?
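The same row-filtering pattern works for any column. The sketch below uses a small made-up table so that it runs without DiffBind; it assumes the condition column is called Condition and uses the labels Resistant and Responsive, which may differ in your own sample sheet.

```r
# Invented table standing in for the sample sheet
toySheet <- data.frame(
  SampleID  = c("s1", "s2", "s3", "s4"),
  Tissue    = c("MCF7", "MCF7", "T47D", "BT474"),
  Condition = c("Responsive", "Resistant", "Responsive", "Resistant"),
  stringsAsFactors = FALSE
)

# Keep only the rows where the test is TRUE; the empty slot after the comma keeps all columns
resistantOnly <- toySheet[toySheet$Condition == "Resistant", ]
print(resistantOnly)

# Tests can be combined with & (and) or | (or)
mcf7Resistant <- toySheet[toySheet$Tissue == "MCF7" & toySheet$Condition == "Resistant", ]
print(mcf7Resistant)
```

Wrapping either result in View() would open it in the RStudio table viewer, as in the slide.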


CREATING THE DIFFBIND OBJECT

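Now that we know we can load the sample sheet, we can create the DiffBind object.

# Work from the folder containing the sample sheet so that relative file paths in it can be resolved
setwd(system.file("extra", package="DiffBind"))
pathDiffBind<-system.file("extra", package="DiffBind")
dbaObject<-dba(sampleSheet=paste0(pathDiffBind,"/","tamoxifen.csv"))

# Print a summary of the loaded samples
dbaObject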


CLUSTERING THE SAMPLES

setwd(system.file("extra", package="DiffBind"))

pathDiffBind<-system.file("extra", package="DiffBind")

dbaObject<-dba(sampleSheet=paste0(pathDiffBind,"/","tamoxifen.csv"))

plot(dbaObject)
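Calling plot() on the DiffBind object draws a correlation heatmap of the samples. As a rough sketch of the idea, and not DiffBind's actual implementation, the following base R code correlates made-up peak scores between samples and then clusters the samples. All values and sample names are invented.

```r
set.seed(1)
# Invented scores for 50 peaks: the two "A" samples share one signal, the two "B" samples another
signalA <- rnorm(50)
signalB <- rnorm(50)
scores <- cbind(
  A1 = signalA + rnorm(50, sd = 0.2),
  A2 = signalA + rnorm(50, sd = 0.2),
  B1 = signalB + rnorm(50, sd = 0.2),
  B2 = signalB + rnorm(50, sd = 0.2)
)

# Pairwise correlation between samples (columns)
sampleCor <- cor(scores)
print(round(sampleCor, 2))

# Hierarchical clustering using 1 - correlation as the distance
clusters <- hclust(as.dist(1 - sampleCor))
groups <- cutree(clusters, k = 2)
print(groups)
```

Samples that share most of their signal correlate strongly and end up in the same cluster, which is the pattern to look for in the heatmap.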


CLUSTERING THE SAMPLES

dba.plotVenn(dbaObject,mask=dbaObject$masks$T47D)

## If you need to reload data

data(tamoxifen_peaks)

dbaObject<-tamoxifen

Exercise: How would we view the peak overlap of the resistant samples?


PEAK OVERLAP

overlap<-dba.overlap(dbaObject,mode=DBA_OLAP_RATE)

plot(overlap)

## If you need to reload data

data(tamoxifen_peaks)

dbaObject<-tamoxifen
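With mode=DBA_OLAP_RATE, dba.overlap reports how many peaks are found in at least one sample, at least two samples, and so on. The idea can be sketched in base R with a made-up presence/absence matrix (rows are peaks, columns are samples). This is an illustration only, not DiffBind's code.

```r
# Invented occupancy matrix: 1 means the peak was called in that sample
occupancy <- rbind(
  peak1 = c(1, 1, 1),
  peak2 = c(1, 1, 0),
  peak3 = c(1, 0, 0),
  peak4 = c(0, 0, 1),
  peak5 = c(1, 1, 1)
)
colnames(occupancy) <- c("s1", "s2", "s3")

# Number of samples in which each peak appears
samplesPerPeak <- rowSums(occupancy)

# Number of peaks present in at least k samples, for k = 1, 2, 3
overlapRate <- sapply(seq_len(ncol(occupancy)), function(k) sum(samplesPerPeak >= k))
print(overlapRate)  # 5 3 2
```

The curve always falls as k increases. How quickly it falls helps when choosing how many samples a peak must appear in before it is kept.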


READ COUNTS

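# Count the reads under each peak found in at least 3 samples.
# This step needs the aligned read files listed in the sample sheet.
dbaObject<-dba.count(dbaObject, minOverlap=3)

# If the read files are not available, load the precomputed counts instead
data(tamoxifen_counts)
dbaObject<-tamoxifen

plot(dbaObject)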


PRINCIPAL COMPONENT ANALYSIS

dba.plotPCA(dbaObject)

Exercise: How would we view the 2nd and 3rd principal components, and colour by condition?

data(tamoxifen_counts) #If you need to reload data

dbaObject<-tamoxifen


PRINCIPAL COMPONENT ANALYSIS

dba.plotPCA(dbaObject,components = 2:3,attributes=DBA_CONDITION)

data(tamoxifen_counts) #If you need to reload data

dbaObject<-tamoxifen
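dba.plotPCA is based on a principal component analysis of the samples. A minimal base R sketch with prcomp() on an invented score matrix shows how components other than the first two can be picked out. The names and values are made up, and this is not DiffBind's implementation.

```r
set.seed(42)
# Invented matrix: 6 samples (rows) x 100 peaks (columns)
scores <- matrix(rnorm(600), nrow = 6,
                 dimnames = list(paste0("sample", 1:6), NULL))
condition <- c("Resistant", "Resistant", "Resistant",
               "Responsive", "Responsive", "Responsive")

# PCA with samples as observations
pca <- prcomp(scores, center = TRUE, scale. = FALSE)

# Coordinates of each sample on the 2nd and 3rd principal components
pc23 <- pca$x[, 2:3]
print(round(pc23, 2))

# One colour per condition, as would be used to colour the points
pointColours <- ifelse(condition == "Resistant", "red", "blue")
# plot(pc23, col = pointColours, pch = 19)  # uncomment to draw the plot
```

Selecting columns 2:3 of the PCA coordinates plays the same role as components = 2:3 in the slide, and the colour vector plays the role of attributes=DBA_CONDITION.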


DIFFBIND ANALYSIS

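# Compare the groups defined by the Condition column
dbaObject<-dba.contrast(dbaObject, categories = DBA_CONDITION)

# Run the differential binding analysis
dbaObject<-dba.analyze(dbaObject)

# MA plot of the result
dba.plotMA(dbaObject)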

data(tamoxifen_counts) #If you need to reload data

dbaObject<-tamoxifen
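An MA plot puts the average signal of each site (A) on the x-axis and the log fold change between the two groups (M) on the y-axis. The sketch below computes both from invented normalised counts, assuming simple group values rather than DiffBind's own statistics.

```r
# Invented normalised read counts for 4 binding sites in two groups
resistant  <- c(site1 = 100, site2 = 20, site3 = 400, site4 = 50)
responsive <- c(site1 = 100, site2 = 80, site3 = 100, site4 = 50)

# M: log2 fold change between the groups
M <- log2(resistant) - log2(responsive)

# A: average log2 signal across the two groups
A <- (log2(resistant) + log2(responsive)) / 2

print(data.frame(A = round(A, 2), M = round(M, 2)))
```

Sites with M close to 0 bind equally in both groups. Sites far above or below 0 are the candidates for differential binding.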


Alternative Analysis

Exercise: Up to this point we have focused on the difference in ER binding between tamoxifen-resistant and tamoxifen-sensitive cells.

What if we want to look at the difference between two different types of breast cancer?

MCF7 cells have a heterozygous loss of the progesterone receptor, while T47D cells have a gain. How does this change ER binding in a patient?

Start from the count data.

data(tamoxifen_counts)
dbaObject<-tamoxifen

count -> contrast -> analyze -> plot


MCF7 vs T47D

Step               Suggested Parameters
Load count data    data(tamoxifen_counts); dbaObject<-tamoxifen
dba.contrast       dbaObject, categories=DBA_TISSUE, minMembers=2
dba.analyze        dbaObject
dba.plotMA         dbaObject, contrast=?


Alternative Analysis: Answer

#Load count data
data(tamoxifen_counts)
dbaObject<-tamoxifen

#Establish contrast
dbaObject_tissue<-dba.contrast(dbaObject, categories = DBA_TISSUE, minMembers = 2)

#Analyse
dbaObject_tissue<-dba.analyze(dbaObject_tissue)

#By default this plots the first contrast (BT474 vs MCF7), so we need to choose the correct one
dba.plotMA(dbaObject_tissue)

#List the contrasts and find the one of interest (the result is printed, not stored)
dba.contrast(dbaObject_tissue,categories = DBA_TISSUE,minMembers = 2)

#Plot the correct contrast (MCF7 vs T47D)
dba.plotMA(dbaObject_tissue,contrast = 4)
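To see why contrast = 4 is the MCF7 vs T47D comparison, it can help to list all pairwise comparisons by hand. The sketch below assumes that the example data contain the four cell lines BT474, MCF7, T47D and ZR75 and that contrasts are numbered in simple pairwise order. Both are assumptions to check against the printed contrast list.

```r
# Cell lines assumed to be present in the example data
tissues <- c("BT474", "MCF7", "T47D", "ZR75")

# All pairwise comparisons, one per column
tissuePairs <- combn(tissues, 2)
contrasts <- paste(tissuePairs[1, ], "vs", tissuePairs[2, ])

print(data.frame(contrast = seq_along(contrasts), comparison = contrasts))
```

Under these assumptions the first entry is BT474 vs MCF7, matching the default plot, and the fourth is MCF7 vs T47D.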


Alternative to MA plot

dbaObject<-dba.contrast(dbaObject, categories = DBA_CONDITION)

dbaObject<-dba.analyze(dbaObject)

dba.plotMA(dbaObject)

data(tamoxifen_analysis) #If you need to reload data

dbaObject<-tamoxifen


Alternative to MA plot

dbaObject<-dba.contrast(dbaObject, categories = DBA_CONDITION)

dbaObject<-dba.analyze(dbaObject)

dba.plotVolcano

data(tamoxifen_analysis) #If you need to reload data

dbaObject<-tamoxifen

There is something missing here! Can you guess what?
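A volcano plot shows the fold change of each site against its statistical significance. The sketch below computes the two coordinates from invented fold changes and FDR values, with an assumed FDR cut-off of 0.05. The column names Fold and FDR are chosen to resemble a differential binding report.

```r
# Invented results for 5 binding sites
results <- data.frame(
  site = paste0("site", 1:5),
  Fold = c(-2.5, -0.3, 0.1, 1.8, 3.0),
  FDR  = c(0.001, 0.6, 0.9, 0.04, 0.2)
)

# x-axis: log2 fold change; y-axis: -log10 of the FDR, so significant sites sit higher
results$negLog10FDR <- -log10(results$FDR)

# Flag sites passing the assumed threshold
results$significant <- results$FDR < 0.05
print(results)
```

Compared with an MA plot, the volcano plot drops the average signal and makes the significance of each change visible instead.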


REPORT

View(as.data.frame(dba.report(dbaObject,contrast = 1)))

data(tamoxifen_analysis) #If you need to reload data

dbaObject<-tamoxifen


REPORT

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
if (!requireNamespace("ChIPpeakAnno", quietly = TRUE))
    BiocManager::install("ChIPpeakAnno")
library(ChIPpeakAnno)

### Reload data
data(tamoxifen_analysis)
dbaObject<-tamoxifen

### Export report
dfReport<-dba.report(dbaObject,contrast = 1)

### Annotate
data(TSS.human.GRCh37)
dfReport <- annotatePeakInBatch(dfReport, AnnotationData=TSS.human.GRCh37)
View(as.data.frame(dfReport))

You should now have a feature column with an Ensembl ID for each peak.
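annotatePeakInBatch() assigns each peak to a nearby feature from the annotation data, here transcription start sites (TSS). The idea of a nearest-TSS lookup can be sketched in base R. The coordinates and gene IDs below are invented and all sites are assumed to lie on one chromosome; the real function also handles strand, multiple chromosomes and other options that this sketch ignores.

```r
# Invented peak positions and TSS positions on a single chromosome
peaks <- c(peakA = 1200, peakB = 5600, peakC = 9100)
tss   <- c(GENE_X = 1000, GENE_Y = 6000, GENE_Z = 20000)

# For each peak, pick the TSS with the smallest absolute distance
nearest  <- sapply(peaks, function(p) names(tss)[which.min(abs(tss - p))])
distance <- sapply(peaks, function(p) min(abs(tss - p)))

print(data.frame(peak = names(peaks),
                 feature = unname(nearest),
                 distance = unname(distance)))
```

Note that one gene can be the nearest feature for several peaks, as GENE_Y is here, and that a distant gene such as GENE_Z may receive none.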


REPORT

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

if (!requireNamespace("org.Hs.eg.db", quietly = TRUE))

BiocManager::install("org.Hs.eg.db")

if (!requireNamespace("ChIPpeakAnno", quietly = TRUE))

BiocManager::install("ChIPpeakAnno")

library(ChIPpeakAnno)

### Reload data

data(tamoxifen_analysis)

dbaObject<-tamoxifen

### Export report and annotate

dfReport<-dba.report(dbaObject,contrast = 1)

data(TSS.human.GRCh37)

dfReport <- annotatePeakInBatch(dfReport, AnnotationData=TSS.human.GRCh37)

### Add gene symbols

library(org.Hs.eg.db)

dfReport<- addGeneIDs(annotatedPeak=dfReport,
                      orgAnn="org.Hs.eg.db",
                      IDs2Add="symbol")

View(as.data.frame(dfReport))

You should now have a symbol column with the gene symbol for each peak.


Exercise! Visualize your peaks on the Human Genome

# Generate a BED file
output<-as.data.frame(dba.report(dbaObject,contrast = 1))
bed<-output[,1:3]
# USERPROFILE is set on Windows; on macOS or Linux use "~/Desktop/example.bed" instead
outfile<-file.path(Sys.getenv("USERPROFILE"),"Desktop","example.bed")
write.table(bed,quote=FALSE,row.names=FALSE,col.names=FALSE,file=outfile,sep="\t")

Open the BED file: it contains ChIP-Seq peak coordinates

-> Go to the UCSC Genome Browser

Click on Genomes, Human, hg19

My Data, Custom Tracks, Choose File, select your BED file, Submit
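A BED file is a tab-separated text file whose first three columns are chromosome, start and end. The sketch below writes an invented peak table to a temporary file, so it runs on any operating system. It also subtracts 1 from the start, because BED starts are 0-based whereas coordinates in R genomic-range objects are conventionally 1-based. For peak-sized regions the 1 bp shift is rarely visible in the browser, so treat this as an optional refinement.

```r
# Invented peaks with the same first three columns as the report data frame
report <- data.frame(
  seqnames = c("chr18", "chr18"),
  start    = c(1001L, 50001L),
  end      = c(1500L, 50400L),
  Fold     = c(2.1, -1.7)
)

# Keep chromosome, start, end; shift start to the 0-based BED convention
bed <- report[, c("seqnames", "start", "end")]
bed$start <- bed$start - 1L

# Write without quotes, row names or a header, separated by tabs
outfile <- tempfile(fileext = ".bed")
write.table(bed, file = outfile, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = FALSE)

# Read the file back to check what was written
bedLines <- readLines(outfile)
print(bedLines)
```

Keeping the coordinates as integers avoids large positions being written in scientific notation, which the genome browser would not accept.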


Exercise!

Click on GO!

See where the peaks are (ER binding sites)

You already know which genes it binds (on chromosome 18)

Check those genes on the Genome Browser!

View(as.data.frame(dfReport))

Last column is what you need (symbol)


Further Readings


www.giorgilab.org

Andrew N. Holding, PhD

Department of Pharmacy and Biotechnology

[email protected]