Post on 07-Jun-2020

Introduction to RNA-Seq & Transcriptome Analysis

Jessica Holmes

PowerPoint by Shounak Bhogale

RNA-Seq Lab | Shounak Bhogale | 2019 | 6/11/19

Exercise

1. Use RNA-STAR to align the RNA-Seq reads.

2. Use htseq-count to count the reads.

3. Use edgeR to find differentially expressed genes.

4. Use IGV for visualization (if time permits).

Pipeline Overview

Input Data

RNA-Seq: 100 bp, single end data

sample    replicate #    fastq name                # reads
TP0       1, 2           a_0.fastq, b_0.fastq      100,000
TP8       1, 2           a_8.fastq, b_8.fastq      100,000

Genome & gene information

name               description
mouse_chr12.fna    FASTA file with the sequence of chromosome 12 from the mouse genome
mouse_chr12.gtf    GTF file with gene annotation, known genes
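
The input data above can also be written down as a small sample sheet. The following is a minimal Python sketch and not part of the original lab: the `Sample` class, the `SAMPLES` list and the `bam_name` helper are hypothetical names, and the pairing of replicate numbers with the a_/b_ file prefixes is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One RNA-Seq library from the lab's input data."""
    time_point: str  # "TP0" or "TP8"
    replicate: int   # 1 or 2
    fastq: str       # FASTQ file name as listed in the table


# Assumption: files starting with a_ are replicate 1 and b_ are replicate 2.
SAMPLES = [
    Sample("TP0", 1, "a_0.fastq"),
    Sample("TP0", 2, "b_0.fastq"),
    Sample("TP8", 1, "a_8.fastq"),
    Sample("TP8", 2, "b_8.fastq"),
]


def bam_name(sample: Sample) -> str:
    """Derive the BAM name used later in the lab, e.g. a_0.fastq -> a_0.bam."""
    stem = sample.fastq.rsplit(".", 1)[0]
    return stem + ".bam"


if __name__ == "__main__":
    for s in SAMPLES:
        print(s.time_point, s.replicate, s.fastq, "->", bam_name(s))
```

Keeping the sample-to-file mapping in one place makes it easier to check that each time point really has two replicates before any tool is run.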

Step 1A: Logging into Galaxy

Go to: http://compgen.knoweng.org/galaxy

Click Enter

Click Login

Input your login credentials.

Click Login.

Step 1B: Galaxy Start Screen

The resulting screen should look like the figure below:

Step 2A: Accessing Input Files

At the top of the page, click Shared Data.

Then click Histories.

Step 2B: Accessing Input Files

Click sb_transcriptomics

You should see this page.

Click Import History.

Step 2C: Accessing Input Files

Click Import

You should see an imported history like the following.

Step 1: Alignment using RNA-STAR

In this exercise, we will be aligning RNA-Seq reads to a reference genome in the absence of gene models. Splice junctions will be found de novo.

Remember, we are not going to provide any genic structure information.

Search for RNA STAR in the search box.

Click on RNA STAR under NGS: RNA Analysis

You should see this on the screen.

Set the FASTQ file to a_0.fastq.

Set the reference genome source to input.

Select mouse_chr12.fna as the FASTA file.

Set mouse_chr12.gtf as the input gene model.

Set the length to 99 (the 100 bp read length minus 1).

Keep the rest of the parameters as default.

Click Execute after making these changes.
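
For readers who want to see how these Galaxy settings might map onto the STAR command line, here is a hedged Python sketch that only builds and prints the argument lists. It does not run STAR. The flags are standard STAR options, but the exact command the Galaxy wrapper runs is not shown in the slides, so the mapping is an assumption. The function names, the `star_index` directory and the `a_0_` output prefix are hypothetical.

```python
import shlex


def star_index_cmd(fasta, gtf, genome_dir, read_length):
    """Arguments for building a STAR index that includes annotated junctions."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        # Overhang around annotated junctions: read length minus 1 (99 for 100 bp reads).
        "--sjdbOverhang", str(read_length - 1),
    ]


def star_align_cmd(fastq, genome_dir, prefix):
    """Arguments for aligning one single-end FASTQ file to the index."""
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", prefix,
    ]


if __name__ == "__main__":
    # "star_index" and "a_0_" are illustrative names, not taken from the lab.
    index = star_index_cmd("mouse_chr12.fna", "mouse_chr12.gtf", "star_index", 100)
    align = star_align_cmd("a_0.fastq", "star_index", "a_0_")
    print(shlex.join(index))
    print(shlex.join(align))
```

Building the command as a list rather than a single string avoids quoting problems if it is later passed to `subprocess.run`.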

Once the run is complete, the three generated files will turn green.

Click on the ‘pencil’ symbol next to dataset 9 to change the name of the dataset.

Change the name in the Name box to a_0.bam and then click Save.

Repeat the above steps for the remaining three FASTQ files: b_0.fastq, a_8.fastq and b_8.fastq.

Similarly, change the names of the remaining BAM files once they are generated.

Now we are ready for the next step.

Step 2: Read aligned counts

Use htseq-count to generate the aligned read counts for each BAM file generated in Step 1.

Search for htseq-count in the search box.

Click on htseq-count to open the tool.

Select a_0.bam as the input.

Change stranded to Reverse.

Keep the rest of the parameters as default and click Execute.
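
As a rough command-line counterpart to the settings above, the Python sketch below builds an htseq-count argument list with BAM input and reverse strandedness. It is an illustration under assumptions rather than the command Galaxy actually runs: the slides do not say which annotation file the tool uses, so passing mouse_chr12.gtf is assumed, and `htseq_count_cmd` is a hypothetical helper name.

```python
import shlex

VALID_STRANDED = {"yes", "no", "reverse"}


def htseq_count_cmd(bam, gtf, stranded="reverse"):
    """Build the argument list for counting reads per gene in one BAM file."""
    if stranded not in VALID_STRANDED:
        raise ValueError(f"stranded must be one of {sorted(VALID_STRANDED)}")
    return [
        "htseq-count",
        "-f", "bam",      # input format
        "-s", stranded,   # library strandedness, "reverse" as set in the lab
        bam,
        gtf,
    ]


if __name__ == "__main__":
    for bam in ["a_0.bam", "b_0.bam", "a_8.bam", "b_8.bam"]:
        # Assumption: the same GTF used for alignment is used for counting.
        print(shlex.join(htseq_count_cmd(bam, "mouse_chr12.gtf")))
```

Validating the strandedness value up front catches a common mistake, since a wrong setting can silently assign most reads to no feature.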

Change the names of the following datasets.

Here we generated the counts for each gene in the 4 datasets.

Now we are ready for the final step.
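
The per-sample count files can be combined into a single count matrix, which is the kind of input edgeR takes in the next step. The sketch below shows one way to do that in Python, assuming the usual htseq-count output of tab-separated gene and count lines plus summary lines that start with two underscores. The gene IDs and counts in the example are made up, and `read_htseq_counts` and `merge_counts` are hypothetical helpers, not part of the lab.

```python
import io


def read_htseq_counts(handle):
    """Parse htseq-count output into a dict of gene -> count."""
    counts = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        gene, value = line.split("\t")
        if gene.startswith("__"):
            # Summary rows such as __no_feature are not genes.
            continue
        counts[gene] = int(value)
    return counts


def merge_counts(per_sample):
    """Combine per-sample dicts into (header, rows); missing genes count as 0."""
    samples = list(per_sample)
    genes = sorted(set().union(*(c.keys() for c in per_sample.values())))
    header = ["gene"] + samples
    rows = [[g] + [per_sample[s].get(g, 0) for s in samples] for g in genes]
    return header, rows


if __name__ == "__main__":
    # Made-up example data standing in for two of the lab's count files.
    a_8 = io.StringIO("geneA\t12\ngeneB\t0\n__no_feature\t40\n")
    a_0 = io.StringIO("geneA\t9\ngeneC\t3\n__ambiguous\t2\n")
    header, rows = merge_counts({
        "a_8": read_htseq_counts(a_8),
        "a_0": read_htseq_counts(a_0),
    })
    print("\t".join(header))
    for row in rows:
        print("\t".join(str(x) for x in row))
```

Filling in zero for genes missing from a sample keeps every row the same length, which a count matrix requires.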

Step 3: Finding differentially expressed genes

Now we will use edgeR to analyze the count files generated in Step 2 to find differentially expressed genes between the two time points.

Search for edgeR in the search box on the left.

Click on edgeR to open the tool.

This should come up on the screen.

Select all.txt as the input dataset.

Select Single Count Matrix.

Enter Mouse as the factor name.

Enter the groups as: TP8,TP8,TP0,TP0

Set contrast as TP8-TP0.

Click on Filter low counts.

Set Yes to filter lowly expressed genes.

Set the filter type to Counts.

Set the minimum count to 5.

Set Yes for the normalized counts table and the R script.

And now click Execute!
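
To make the low-count filter and the TP8-TP0 contrast more concrete, here is a small Python sketch. It is not edgeR and performs no statistical test: it keeps genes with a count of at least 5 in a minimum number of samples, then computes a naive log2 ratio of group means. The `min_samples` threshold is not given in the slides and is an assumption, as are the pseudocount, the function names and the toy counts.

```python
import math

# Column order of the count matrix, as entered in the tool.
GROUPS = ["TP8", "TP8", "TP0", "TP0"]


def filter_low_counts(matrix, min_count=5, min_samples=2):
    """Keep genes with at least min_count reads in at least min_samples samples."""
    return {
        gene: counts
        for gene, counts in matrix.items()
        if sum(c >= min_count for c in counts) >= min_samples
    }


def naive_log2_fold_change(counts, groups, contrast="TP8-TP0", pseudocount=1.0):
    """Log2 ratio of group means for a 'numerator-denominator' contrast."""
    numerator, denominator = contrast.split("-")

    def group_mean(label):
        values = [c for c, g in zip(counts, groups) if g == label]
        return sum(values) / len(values)

    return math.log2(
        (group_mean(numerator) + pseudocount) / (group_mean(denominator) + pseudocount)
    )


if __name__ == "__main__":
    # Made-up counts in the order TP8, TP8, TP0, TP0.
    toy = {
        "geneA": [10, 12, 11, 9],
        "geneB": [0, 1, 0, 2],
        "geneC": [30, 34, 15, 15],
    }
    kept = filter_low_counts(toy)
    for gene, counts in kept.items():
        print(gene, round(naive_log2_fold_change(counts, GROUPS), 3))
```

A positive value means higher expression at TP8 than at TP0, which matches the direction of the TP8-TP0 contrast. edgeR itself additionally normalizes library sizes and models dispersion before testing.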

Three files will be generated after the run is complete.

Click on the eye symbol next to the report file.

Scroll down through the report.

For our dataset there are no DEGs.

Conclusion

We did the following today.

1. Aligned RNA-Seq data using RNA-STAR.

2. Used htseq-count to count the aligned reads for each gene.

3. Used edgeR to find DEGs in the data.

Useful links

Online resources for RNA-Seq analysis questions:

• http://www.biostars.org/ - Biostar (Bioinformatics explained)

• http://seqanswers.com/ - SEQanswers (the next generation sequencing community)

• Most tools have dedicated mailing lists

Information about the various parts of the Tuxedo suite is available here: http://ccb.jhu.edu/software.shtml

Genome browser tutorials:

• http://www.broadinstitute.org/igv/QuickStart/ - IGV tutorials

• http://www.openhelix.com/ucsc/ - UCSC browser tutorials

(OpenHelix is a great place for tutorials; UIUC has a campus-wide subscription.)

Contact us at:

hpcbiohelp@illinois.edu

hpcbiotraining@illinois.edu

Extra Material: IGV

Visualization Using IGV

The Integrative Genomics Viewer (IGV) is a tool that supports the visualization of reads mapped to a reference genome, among other functionalities. We will use it to observe where hits were called in the alignment for the two samples (TP0 and TP8), and the differentially (!) expressed genes.

Step 9: Start IGV

In this step, we will start IGV and load the mouse_chr12.fna genome file, the known genes file (mouse_chr12.gtf), the hits for both sample groups, and the merged transcriptome.

Graphical Instruction: Load Genome

1. Within IGV, click the ‘Genomes’ tab on the menu bar.

2. Click the ‘Load Genome from File’ option.

3. In the browser window, select mouse_chr12.fna (genome).

Graphical Instruction: Load Other Files

1. Within IGV, click the ‘File’ tab on the menu bar.

2. Click the ‘Load from File’ option.

3. Select the mouse_chr12.gtf file (known genes file).

4. Repeat Steps 1-3 for the files listed under Files to Load.

Files to Load

mouse_chr12.fna

a_0.bam

b_0.bam

a_8.bam

b_8.bam

Step 10: Visualization With IGV

Click in the locus box at the top of the IGV window and type the following location of a non-differentially expressed gene: NC_000078.6:17,788,398-17,793,435
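
The locus string has the form sequence name, colon, start, hyphen, end, with commas as thousands separators. As a small illustration, the Python sketch below parses such a string and reports the size of the region. The `parse_locus` helper and its regular expression are hypothetical and only cover this simple format.

```python
import re

# name:start-end, where start and end may contain thousands separators
LOCUS_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def parse_locus(locus):
    """Split a locus string into (sequence name, start, end) with integer coordinates."""
    match = LOCUS_RE.match(locus.strip())
    if match is None:
        raise ValueError(f"not a name:start-end locus: {locus!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    if start > end:
        raise ValueError("start must not be greater than end")
    return match["chrom"], start, end


if __name__ == "__main__":
    chrom, start, end = parse_locus("NC_000078.6:17,788,398-17,793,435")
    # Both coordinates are treated as inclusive, so the span is end - start + 1.
    print(chrom, start, end, end - start + 1)
```

The sequence name here is the accession used in mouse_chr12.fna, so it must match the name in the loaded genome file for IGV to find the region.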
