rna sequencing: workflow recommendationsrna : sensitive nucleic acid 5 rnases • found just about...
TRANSCRIPT
RNA Sequencing: Workflow Recommendations
Sample Preparation – back to the basics
Sridar [email protected](518) 591-7215
RNA isolation processes
2
Tissue Disruption & Homogenization
Collection
& StabilizationRNA Isolation
RNA
Characterization
Downstream
Applications
• Treatment and Handling- Collection method?- Disruption technique?
• Storage– How will you store your isolated RNA
Problems are not usually at the extraction step but before and after
• Hydrophilic: It dissolves in water. It does not dissolve in phenolor alcohol or lipids
• pH: RNA is unstable in a basic environment (pH ~9 or more). It isstable in mild acid (pH 4) but not strong acid (pH 1). Therefore, try to keep at neutral pH
• Heat: Fairly stable up to ~65°C if divalent cations are not present• Absorbance Spectra : Maximum UV absorbance at 260nm
Characteristics of RNA
3
RNA Degradation
4
The OH- group attacks the nearby phosphate, breaking the phosphate backbone
This is catalyzed by high pH, divalent cations plus heat, RNases, etc...
Mg++ + Heat
RNA : sensitive nucleic acid
5
RNases• Found just about everywhere: the benchtop, your hands, in the air and more.
• Incredibly stable enzymes (can withstand extreme conditions such as heatand high salt) it readily renatures and retains activity
• Some are sequence-specific, others are not. RNase A, for example, cuts 3’ end of unpaired C and U residues
• Are not destroyed by autoclaving. They must be destroyed with DEPC or RNaseZAP® or inactivated with RNase inhibitors like SUPERase-In™
RNase A molecule
10 ways to improve RNA Isolation
6
RNAses quantity in the mouse organs
7
General rules on RNA yields
181 000x
• Essential that any items that could contact the purified RNA isRNase-free
• Benchtop, Pipettors, Glassware should be decontaminatedwith RNAseZap™
• RNase free tips, tubes and solution should always be used and gloves should be changed frequently
1. Reduce exposure to environmental RNases
8
• There are 3 effective methods to accomplish this:
I. Homogenize samples immmediately after harvesting in a chaotropic-based cell lysis solution (e.g. Guanidinium)
II. Flash freeze in liquid nitrogen. Tissue pieces must be smallIII. Place sample in RNAlater®
2. Immediately inactivate endogenous, intracellular RNases.
9
• If flash frozen, must be stored at -80°C and never allowed to thaw prior to homogenization and isolation.
3. Use proper cell or Tissue storage conditions
10
• Essential step to prevent both loss and degradation• Needs to be fast and thorough• Method should be tailored to the cell type• Usually Mechanical:
– Dounce, polytron, vortex, sonication, French press, coffee grinder, bar shaker....• or Enzymatic
– Lysozyme, zymolase and lysostaphin etc....
4. Thoroughly homogenize samples
11
• Needed for some samples after homogenization before isolation to increase RNA yield
– High fat contain tissues, like Brain or Adipose tissue should be extracted with Chloroform to remove lipids
– Plant tissues high in polyphenolics and polysaccharides should be pretreated with insoluble polyvinylpyrrolidone (PVP)
5. Pretreat homogenate before RNA Isolation
12
• Sample type– Tissues, blood, cells, FFPE, LCM, Bacteria or yeast
• What is the downstream application– NGS, Microarray, qRT-PCR, Northern
• How many samples ?
• Other Isolation Needs– Viral RNA, microRNAs, mRNA isolation/enrichment
6. Choose the best RNA isolation method
13
Methods in RNA Isolation
Total RNA Isolation
Solid Phase/Glass Filter Methodology
Magnetic BeadMethods
Combination of All the above
Organic ExtractionPhenol-based
Method
14
• Recommended when RNA is to be used in Transcriptome or Gene ST arrays
• Recommended for WT RNAseq when ribodepletion is used
• Good practice when isolating RNA from DNA rich tissues likespleen
7. DNase treatment
15
• If purified RNA needs to be concentrated by precipitation for downstream applications:
– Ammonium Acetate (NH4OAc) precipitation– For low concentrations of RNA (ng/ml) use glycogen, pellet
paint
8. How to precipitate?
16
• Three ideal qualities of a resuspension solution–RNase-free water (for NGS studies)–Low pH (pH 6-7)–Incorporate a chelating agent to protect RNA against
degradation by introduced RNases• sodium citrate 1 mM, pH 6.4 ± 0.2 • or EDTA >0.1mM • or TE buffer 10mM Tris-HCl, 1 mM EDTA, pH 7.0
9. Resuspension
17
• Short term:– Stored at -20°C
• Long term:– Stored at -80°C in buffer– More stable in NH4OAc/EtOH precipitation mixture at -80°C
• Aliquot your RNA into several tubes– Avoid Freeze-thaw event– Prevent RNase contamination
10. Storage
18
Sample QC
19
Nanodrop
RNA QC is critical to the success of your genomics experiment
The Nanodrop can reveal problematic samples
Sample is NOT 184 ng/ulActually 48 ng/ul as per Qubit/Bioanalyzer272 nm peak is skewing data
260 272
How will this affect downstream processes such as RT-qPCR if one assumes equal RNA input to a cDNAreaction?
TRZ
RNA
Δ =1.2 OD=
48 ng/ul
Good RNA Degraded RNA Genomic DNA
The Nanodrop does not tell the whole story
Nucleic acid quantitation using fluorescence
Note: these do not give any information on RNA integrity or protein contamination
Measuring RNA integrity
RNA QC is critical to the success of your genomics experiment
Capillary electrophoresis
Biorad ExperionAgilent BioAnalyzer 2100
The Agilent Expert software assigns a RIN number to each trace. It assigns a number according to how much signal is found between the 5S and 18S band, between the 18S and 28S bands, and after the 28S band. A RIN number of 10 is perfect score. The software does not always call RIN numbers for prokaryotic RNA and the RIN can be misleading for samples containing additional RNA bands such as those from chloroplasts or a symbiotic RNA.
Small RNA chip versus existing RNA chip
28
miR
NA
Reg
ion
Low
er M
arke
r
Small RNA Region
tRN
A
Smal
l RN
A R
egio
n
18s
28s
RIN: 8.1
Low
er M
arke
r
RNA 6000Nano kit
Small RNA Kit
RNA 6000Nano
Size range: 25-6000nt
Results: Integrity, Total RNA amount, gDNA contamination
Small RNASize range: 6-150nt
Results: miRNA amount, Ratio and amount of other Small RNA
5s
5.8s
High Sensitivity DNA Assay
DNA high sensitivity assay: Bioanalyzer QC (qualitative)
Realistic quantitative estimation
SPRI bead selection
• Each bead is made of polystyrene surrounded by a layer of magnetite, which is coated with carboxyl molecules.
• Beads reversibly bind DNA in the presence of the “crowding agent” polyethylene glycol (PEG) and salt (20% PEG, 2.5M NaCl is the magic mix). PEG causes the negatively-charged DNA to bind with the carboxyl groups on the bead surface.
• As the immobilization is dependent on the concentration of PEG and salt in the reaction, the volumetric ratio of beads to DNA is critical
• The lower the ratio of SPRI:DNA the higher larger the final fragments will be at elution
Library prep kit choices
Illumina Truseq stranded library prep
mRNA kit: 0.1-4ug of total RNA or 10-400ng of mRNA
Total kit: 0.5 – 1ug total RNA
Illumina Truseq stranded library prep
Illumina Truseq stranded library prep
Not enough total RNA?
Other choices:Clontech smarter kit + Nextera
Clontech low input kit
Bioo Scientific
NEB
SMARTer Stranded Total RNA-Seq Kit - Pico Input Mammalian (Clontech)
Recommended input ranging from 250 pg to 10 ng of total mammalian RNA
Incorporates SMART® (Switching Mechanism At 5' end of RNA Template) technology and locked nucleic acid (LNA) technology.
PCR amplification generates Illumina-compatible libraries without the need for adapter ligation.
The directionality of the template-switching reaction preserves the strand orientation of the original RNA, making it possible to obtain strand-specific sequencing data from the synthesized cDNA.
The ribosomal cDNA (originating from rRNA) is then cleaved by ZapR in the presence of the mammalian-specific R-Probes
Successful cDNA synthesis and amplification should produce a distinct curve spanning 200–1,000 bp, with a local maximum at ~300–400 bp
Read 1 is derived from the sense strand of the input RNAFirst three bases are from TSO and must be trimmed
Read 2 will correspond to the antisense strand (PE reads)
SMARTer Stranded Total RNA-Seq Kit - Pico Input Mammalian (cont’d)
SMARTer Ultra Low Input RNA Kit + Nextera XT (Clontech + Illumina)
high-quality cDNA directly from 1–1,000 cells or 10 pg–10 ng of total RNA
Use the Nextera® XT DNA Sample Preparation Kit from Illumina to prepare your library. We recommend using an input amount of 100–150 pg amplified DNA.
Sequence only from Read Primer 1. (i.e.1x75bp)
NEBNext® Ultra™ Directional RNA Library Prep Kit (New England Biolabs)
Total RNA (100 ng–1 μg)
If a peak at ~ 80 bp (primers) or 128 bp (adaptor-dimer)
10 ng - 1 μg of total RNA
NEXTflex™ Rapid Directional RNA-Seq Kit (Bioo Scientific)
•2 hours faster than traditional Illumina stranded RNA-Seq library prep protocols•Provides precise measurement of strand orientation (>99%)•>10 ng – 1 µg total RNA for enrichment by NEXTflex™ Poly(A) Beads or ~ 1 ng - 100 ng isolated mRNA or rRNA-depleted RNA•Minimal hands-on time required•Streamlined protocol reduces sample loss•Utilizes dUTP-based methodology•Robust, reliable performance•Up to 96 barcodes available for multiplexing•Automation protocols are available for the Sciclone NGS and NGSx Workstation•Functionally validated with Illumina sequencing platforms•Allows for single read or paired end transcriptome sequencing
RNA library prep methods:Illumina: Truseq small RNA
1ug total RNA, 10-50ng purified miRNA
CFG process flow
PI contacts CFG for NGS by email/phone
Initial consultation to determine scope and deadline of work
CFG staff identifies protocol and researches kits
Pricing determined and quote sent
Quote accepted and CFG procures materials
Samples and materials received by CFG
Samples undergo quality check 1
Contact PIObtain new samples or redefine project
Contact PI, begin protocols
Library prep and QC 2
Contact PI with data
Contact PI, provide updates
Contact PI, provide updates
Identify issues and troubleshoot
Sequence libraries and QC 3
Identify issues and troubleshoot
1: RNA Bioanalyzer, nanodrop2: HS DNA chip bioanalyzer,
nanodrop, qubit3: Run metrics check, FastQC
RNA seq
3’ end counting(poly A, gene expression)
Whole transcriptome(gene expression, splicing, lncRNA)
Truseq Clontech/Nextera other TruseqNEB
Clontech pico
0.1-4ug total RNA or 10-400ng mRNA
0.1-1ug total RNA10pg-10ng total RNA0.1-1ug total RNA10ng-100ng reduced RNA
250pg-10ng total RNA
Clontech HI100ng-1ug total RNA
Sequencing Type Sequencing parameters
Differential gene expression 1x75 20-25M reads1x150 20-25M reads
Alternative splicing, WT, lncRNA 2x75 50-60M reads2x150 50-60M reads
Small RNA 1x75 2-5M reads
scRNA seq PE 26, 110 5-10M per cell
Miseq Nextseq 500 Hiseq 2500 Hiseq 3000 Hiseq 4000
Output Range 0.5 - 15 Gb 20 - 120 Gb 10 - 1000 Gb 125 - 750 Gb 125 - 1500 Gb
Maximum Read Length 2 x 300 bp 2 x 150 bp 2 x 150 bp 2 x 150 bp 2 x 150 bp
Reads per Run 15 million 130 - 400 million 300 million - 4 billion 2.5 billion 2.5 - 5 billion
Run Time 4hr - 55 hr 11 hr - 29 hr 7 hr - 6 days <1 - 3.5 days <1 - 3.5 days
Key MethodsSmall genome, amplicon, & targeted gene panel sequencing.
Exome, transcriptome, & whole-genome sequencing.
Exome, transcriptome, & whole-genome sequencing.
Exome, transcriptome, & whole-genome sequencing.
Exome, transcriptome, & whole-genome sequencing.
Illumina Current specifications
Hiseq X: 1.8Tb output, 6 B reads in <3 days
NextSeq Performance Parameters
• Total times include cluster generation, sequencing and base calling on a NextSeq 500 system
• Install specifications are based on Illumina PhiX control library at supported cluster densities (between 170 and 230 k/mm2 clusters passing filter)
• The percentage of bases >Q30 (1/1000 chance of an incorrect base identification or 99.9% accuracy) is averaged over the entire run
Run Quality checks
Intensity (P90): The 90% percentile extracted intensity for a given image (lane/tile/cycle/ channel combination).
FWHM The average full width of clusters at half maximum (representing their approximate size in pixels).
Illumina Basespace Dashboard/ SAV
Q = − 10 log10 P
Allows to see how the reads are spread between the pooled samples
CV <25% between samples
Surface view of the flow cell
Blue regions = low sequencing quality
Other ways to look at Q scores
Error rate goes up with cycles but is still very low
% Base: The percentage of called (non-N) clusters for which the selected base has been called.
Cycles 25 through run completion: Intensity, Cluster Density, % PF, Yield, Q scoresExpected: 170-230 K/mm2
Q30 >80% 1x75, 2X75; Q30 >75% 2x150RPF 400M SE, 800M PE (HO, 130M SE, 260M PE (MO)
Run metrics
Data Quality Checks
Q = − 10 log10 P Q30: 1 in 1000 error
A warning will be issued if the lower quartile for any base is less than 10, or if the median for any base is less than 25.
This module will raise a failure if the lower quartile for any base is less than 5 or if the median for any base is less than 20.
For each position a BoxWhisker type plot is drawn. The elements of the plot are as follows:• The central red line is the median value• The yellow box represents the inter-quartile range (25-75%)• The upper and lower whiskers represent the 10% and 90% points• The blue line represents the mean quality
The per sequence quality score report allows you to see if a subset of your sequences have universally low quality values. It is often the case that a subset of sequences will have universally poor quality, often because they are poorly imaged (on the edge of the field of view etc), however these should represent only a small percentage of the total sequences.
A warning is raised if the most frequently observed mean quality is below 27 - this equates to a 0.2% error rate.
An error is raised if the most frequently observed mean quality is below 20 - this equates to a 1% error rate.
This module issues a warning if the difference between A and T, or G and C is greater than 10% in any position.
This module will fail if the difference between A and T, or G and C is greater than 20% in any position.
This module issues a warning it the GC content of any base strays more than 5% from the mean GC content.
This module will fail if the GC content of any base strays more than 10% from the mean GC content.
A warning is raised if the sum of the deviations from the normal distribution represents more than 15% of the reads.
This module will indicate a failure if the sum of the deviations from the normal distribution represents more than 30% of the reads.
69