Bioinformatics and OMICs Group Meeting
REFERENCE GUIDED RNA SEQUENCING
Hi
Name: David Oliver
Advisor: Dr. Shtutman
Research: Understanding the role of COPZ2 silencing in cancer progression using RNA-seq to identify transcriptional changes caused by the loss of COPZ2 and its encoded microRNA.
Experience: Microarray analysis and multiple RNA-seq analyses, including long-read (PacBio) and short-read (Illumina) sequencing experiments.
Why RNA-seq
What's the question?
◦ Differential expression
◦ Differential splicing
Advantage over other technologies
◦ Increased sensitivity
◦ Increased reproducibility
RNA-Seq vs Dual- and Single-Channel Microarray Data: Sensitivity Analysis for Differential Expression and Clustering. Alina Sîrbu, Gráinne Kerr, Martin Crane, Heather J. Ruskin. Published: December 10, 2012. DOI: 10.1371/journal.pone.0050986
Before You Start
Consult a statistician
Consult your sequencing core
Actually Doing RNA-seq
Minimum Requirements
◦ Have consulted a statistician and your sequencing core
◦ Know that your question can be answered using sequencing technology and that the experimental design is appropriate.
◦ > 10,000,000 reads per sample
  ◦ Much more depth required for differential splicing
◦ ≥ 3 biological replicates
◦ Access to a decent amount of computing power
  ◦ Can be done on a laptop but it takes ~ 3 weeks (ask me how I know)
◦ Basic knowledge of Unix system and R
  ◦ Or, know someone who is willing to help you.
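A quick way to check the read-depth requirement before starting is to count the records in each FASTQ file. The sketch below assumes plain 4-line FASTQ records (no wrapped sequence lines); the function names `count_fastq_reads` and `check_depth` are made up for this illustration.

```shell
#!/usr/bin/env bash
# Count reads in FASTQ input, assuming exactly 4 lines per record.
count_fastq_reads() {
  awk 'END { print int(NR / 4) }' "$@"
}

# Report whether a sample exceeds a minimum depth (default: 10,000,000 reads).
check_depth() {
  local fastq="$1" min_reads="${2:-10000000}"
  local n
  n=$(count_fastq_reads "$fastq")
  if [ "$n" -gt "$min_reads" ]; then
    echo "$fastq: $n reads (OK)"
  else
    echo "$fastq: $n reads (below $min_reads)"
  fi
}
```

For compressed files, something like `gzip -dc sample1.fastq.gz | count_fastq_reads` should give the same count.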
Actually Doing RNA-seq
Suggested Pipeline
◦ Quality assessment:
  ◦ FastQC
  ◦ FastX toolkit
◦ Alignment:
  ◦ Bowtie2/Tophat2
  ◦ STAR
  ◦ NovoAlign
◦ Counting reads:
  ◦ FeatureCounts
  ◦ Gencode annotation
◦ Differential expression analysis
  ◦ edgeR
◦ Manipulating sequencing files
  ◦ Samtools, bamtools
[Figure: pipeline flowchart. Biological System → Total RNA or mRNA → RNA-Seq → Raw Reads → Quality Filtering (fastQC) → Align to genome (NovoAlign, BowTie2, STAR; input: Target Genome) → Read Counting (FeatureCount; input: Gencode) → Normalization/Quantification (edgeR) → RNA expression levels]
RNA-seq Walkthrough
Check some quality markers
Getting the target genome
Build aligner-specific indexed genome
Perform alignment
Do some file manipulation
Get the annotation file
Count reads
Perform normalization and quantification
Check some quality markers
FastQC
◦ Basic tool for generating reports
◦ Java based
◦ Does not provide tools for correcting errors (use the FastX toolkit for that)
◦ http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
Other tools
◦ FASTX toolkit: For fixing some problems with datasets (adapter trimming, readthrough error correction, etc.)
◦ SAMstat: A tool for alignment QC
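To make "quality marker" concrete, here is a minimal sketch of one such marker: the mean Phred score of each read. It is not how FastQC works internally. It assumes 4-line FASTQ records and Phred+33 quality encoding, and the function name `mean_read_quality` is invented for the example.

```shell
#!/usr/bin/env bash
# Print "read name <TAB> mean Phred quality" for every FASTQ record.
# Assumes 4-line records and Phred+33 encoding (ASCII code minus 33).
mean_read_quality() {
  awk '
    BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }  # ASCII lookup table
    NR % 4 == 1 { name = substr($0, 2) }                             # header line, drop "@"
    NR % 4 == 0 {                                                    # quality line
      total = 0
      n = length($0)
      for (j = 1; j <= n; j++) total += ord[substr($0, j, 1)] - 33
      printf "%s\t%.1f\n", name, (n > 0 ? total / n : 0)
    }
  ' "$@"
}
```

Reads with a low mean score are candidates for trimming or removal before alignment; FastQC reports the same kind of information per position and per sequence in a much richer form.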
Getting the target genome
http://genome.ucsc.edu/
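The genome can also be fetched from the command line instead of through the browser. The sketch below only builds the download URL; the `goldenPath/<assembly>/bigZips/<assembly>.fa.gz` layout on the UCSC download server is an assumption to check against the site before use, and `ucsc_genome_url` is a made-up helper name.

```shell
#!/usr/bin/env bash
# Build the (assumed) UCSC download URL for a whole-genome FASTA.
ucsc_genome_url() {
  local assembly="${1:-hg38}"
  printf 'https://hgdownload.soe.ucsc.edu/goldenPath/%s/bigZips/%s.fa.gz\n' \
    "$assembly" "$assembly"
}

# Example download (not run here):
#   curl -L -O "$(ucsc_genome_url hg38)" && gunzip hg38.fa.gz
```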
Build aligner-specific indexed genome
This step is performed by the aligner and takes a variable amount of time depending on the type of index used and the size of the genome to be indexed.
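For Bowtie2/Tophat2 the index is built with `bowtie2-build`, which takes the reference FASTA and an index base name. The sketch below wraps that call; the `run` and `build_bowtie2_index` helpers and the FASTA file name are assumptions for illustration. Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build a Bowtie2 index: bowtie2-build <reference FASTA> <index base name>.
build_bowtie2_index() {
  local fasta="$1" index_base="$2"
  run mkdir -p "$(dirname "$index_base")"
  run bowtie2-build "$fasta" "$index_base"
}

# Example (paths are assumptions):
#   build_bowtie2_index "$RNAwork/hg38.fa" "$RNAwork/Indexes/hg38_index"
```

Other aligners ship their own indexing step, so the index built here is only usable by Bowtie2 and Tophat2.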
Perform alignment
tophat2 -p 12 --no-coverage-search --b2-N 1 --b2-L 32 --b2-i S,1,0.5 --b2-D 250 --b2-R 25 -o $RNAwork/ $RNAwork/Indexes/hg38_index $RNAwork/sample1.fastq
Reads:
Input : 20889144
Mapped : 18935684 (90.6% of input)
of these: 2674218 (14.1%) have multiple alignments (436 have >20)
90.6% overall read mapping rate.
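The mapping rate in the summary is just mapped reads divided by input reads. A small sketch that recomputes it from summary text laid out like the lines above (the helper name `mapping_rate` is made up, and the parsing assumes the "Input :" and "Mapped :" labels shown here):

```shell
#!/usr/bin/env bash
# Recompute the overall mapping rate (percent) from an alignment summary.
mapping_rate() {
  awk -F':' '
    /Input/  { input  = $2 + 0 }   # numeric prefix of the text after the colon
    /Mapped/ { mapped = $2 + 0 }
    END { if (input > 0) printf "%.1f\n", 100 * mapped / input }
  ' "$@"
}
```

Fed the summary above, this gives 18935684 / 20889144, which rounds to the 90.6% reported by the aligner.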
Do some file manipulation
Depending on which aligner you choose to use, the aligned sequences may be output as a BAM or SAM file.
◦ Sequence alignment/map format (SAM)
  ◦ Contains all the alignment information plus room for user-defined information about the alignments
◦ Binary alignment/map format (BAM)
  ◦ A binary version of the SAM file
  ◦ Added benefit of being much smaller and quickly accessed by other software
  ◦ Not all software can manage the conversion from BAM back to SAM
To manipulate these formats (e.g. sort, remove duplicates, remove unaligned sequences), use either samtools or bamtools
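To show what "remove unaligned sequences" means at the file level, here is a sketch that filters a text SAM stream on the FLAG column: bit 0x4 marks a read as unmapped. In practice `samtools view -F 4` does this job (and handles BAM); the awk version below is only an illustration, and `drop_unaligned` is a made-up name.

```shell
#!/usr/bin/env bash
# Keep SAM header lines and aligned reads; drop reads with FLAG bit 0x4 set.
drop_unaligned() {
  awk -F'\t' '
    /^@/ { print; next }              # header lines start with "@"
    int($2 / 4) % 2 == 0 { print }    # bit 0x4 clear: the read is aligned
  ' "$@"
}
```

The `int($2 / 4) % 2` arithmetic extracts the third bit of the FLAG without relying on bitwise functions that not every awk provides.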
Get the annotation file
Annotation files are readily available from multiple sources
◦ Gencode ( http://www.gencodegenes.org/releases/ )
◦ Ensembl ( http://useast.ensembl.org/info/data/ftp/index.html?redirect=no )
◦ Vega ( http://vega.sanger.ac.uk/info/about/data_access.html )
◦ RefSeq ( http://www.ncbi.nlm.nih.gov/refseq/ )
These annotation sources mainly vary in the number of non-coding RNAs which have been annotated.
◦ RefSeq < Gencode < Ensembl < Vega
We use Gencode
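One way to see how many coding and non-coding genes an annotation contains is to tally the gene records by type. The sketch below assumes a GTF whose gene records carry a `gene_type "..."` attribute, as Gencode GTFs do (Ensembl files name the attribute differently); `count_gene_types` is a made-up helper.

```shell
#!/usr/bin/env bash
# Tally gene records in a GTF by their gene_type attribute (Gencode-style GTF assumed).
count_gene_types() {
  awk -F'\t' '
    /^#/ { next }                                      # skip comment lines
    $3 == "gene" {
      if (match($9, /gene_type "[^"]+"/)) {
        type = substr($9, RSTART + 11, RLENGTH - 12)   # text between the quotes
        n[type]++
      }
    }
    END { for (t in n) printf "%s\t%d\n", t, n[t] }
  ' "$@" | sort
}
```

Running this on annotation files from different sources gives a rough feel for how much they differ in non-coding content.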
Count Reads
FeatureCounts
◦ We used to use HTseq-Count which was quite nice but we’ve switched to FeatureCounts because it is much, much, much faster.
◦ Also comes as an R package (bioc::Rsubread)
HTSeq-count documentation: http://www-huber.embl.de/users/anders/HTSeq/doc/count.html
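A typical featureCounts call takes the annotation (`-a`), an output file (`-o`), and one or more BAM files. The sketch below is one possible invocation, not necessarily the settings used for the data in this talk: the thread count, the `-t exon -g gene_id` choice, the file names, and the `run` and `count_reads` helpers are all assumptions. Setting `DRY_RUN=1` prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Count reads per gene: exon features are summarized by gene_id.
count_reads() {
  local gtf="$1" out="$2"
  shift 2
  run featureCounts -T 4 -t exon -g gene_id -a "$gtf" -o "$out" "$@"
}

# Example (file names are assumptions):
#   count_reads gencode.annotation.gtf All_counts.txt sample1.bam sample2.bam
```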
Perform normalization and quantification
edgeR:
library(edgeR)
### read the count table; for a comma-separated file, add sep = ",", header = TRUE, row.names = 1 ###
counts <- read.table(file = "All_counts.csv")
counts <- na.omit(counts)
counts <- counts[rowSums(counts) != 0, ] ### drop genes with zero counts in every sample
### start edgeR ###
group <- factor(rep(c("DU145.miR1","DU145.miR148a","DU145.miR148b","DU145.miR152"), each = 3))
y <- DGEList(counts = counts, group = group) ### convert count matrix to a DGEList object
design <- model.matrix(~0+group) ### experimental design: one column per group
keep <- which(rowMeans(cpm(y)) > 10); y <- y[keep,] ### remove genes with really low counts per million
y$samples$lib.size <- colSums(y$counts) ### re-calculate the library sizes after removing genes with low CPM
y <- calcNormFactors(y) ### calculate between-sample normalization factors
y <- estimateGLMRobustDisp(y, design) ### robustly estimate the dispersions
fit <- glmFit(y, design) ### fit the "massaged data" to a generalized linear model
### perform Likelihood Ratio Test on each contrast ###
### each contrast needs one entry per design column (4 here); DU145.miR1 is the reference ###
lrt.du145.mir148a <- glmLRT(fit, contrast=c(-1,1,0,0))
lrt.du145.mir148b <- glmLRT(fit, contrast=c(-1,0,1,0))
lrt.du145.mir152 <- glmLRT(fit, contrast=c(-1,0,0,1))
### generate a user-friendly output table ###
tt.du145.mir148a <- topTags(lrt.du145.mir148a, n = Inf, sort.by = "none")
tt.du145.mir148b <- topTags(lrt.du145.mir148b, n = Inf, sort.by = "none")
tt.du145.mir152 <- topTags(lrt.du145.mir152, n = Inf, sort.by = "none")
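The low-count filter in the edgeR script keeps genes whose mean counts per million (CPM) across samples exceeds 10, where CPM is count / library size × 1,000,000. The arithmetic can be sketched outside R as below. This is an illustration of the filter only, not a replacement for edgeR; it assumes a tab-separated table with a header row and gene IDs in the first column, and `cpm_filter` is a made-up name.

```shell
#!/usr/bin/env bash
# Keep genes whose mean CPM across samples exceeds a threshold (default 10).
# Input: tab-separated counts, header row, gene IDs in column 1.
cpm_filter() {
  local counts_file="$1" min_mean_cpm="${2:-10}"
  awk -F'\t' -v min="$min_mean_cpm" '
    NR == FNR {                       # first pass: library size per sample
      if (FNR > 1) for (i = 2; i <= NF; i++) lib[i] += $i
      next
    }
    FNR == 1 { print; next }          # second pass: echo the header
    {
      total = 0
      for (i = 2; i <= NF; i++) if (lib[i] > 0) total += $i / lib[i] * 1e6
      if (total / (NF - 1) > min) print
    }
  ' "$counts_file" "$counts_file"
}
```

The file is read twice because the library sizes must be known before any gene's CPM can be computed.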
Expected Results
Gene     du145-148a  du145-148b  du145-152
MAL2     -3.559      -1.869      -4.668
CDH1     -2.634      -2.173      -4.030
ERRFI1   -1.209      -0.824      -1.595
PPP6R1   -1.015      -0.546      -1.082
NTSR1    -0.954      -2.126      -1.314
ITGA5    -0.865      -0.928      -1.077
PPAP2B   -0.616      -0.476      -1.413
MCAM      0.407       0.702       1.622
IGFBP5    1.897       1.398       2.848
GPC4      2.106       2.415       3.420
CCL2      2.114       2.758       2.956
Long Read (> 1kb) RNA-seq
Long read analysis is performed with essentially the same workflow.
For alignment, STAR or GMAP work equally well.
Questions?