ken mcgrath - next gen sequencing - game of thrones edition
DESCRIPTION
Title: Next-generation sequencing: an overview of technologies and applications
Presenter: Dr Ken McGrath, Australian Genome Research Facility
Abstract: The “Next-Generation Sequencing” landscape is one of constant change, with new and emerging technologies constantly competing with established platforms. This competition is producing faster and cheaper methods for sequencing DNA and RNA samples, but it also brings a confusing array of options, each with its own strengths and weaknesses. Ken gives an overview of the available sequencing technologies, runs through some example projects that can be run on them, describes the typical bioinformatics approaches for these projects, and takes a look at what’s “next” in Next-Gen.
First presented at the 2014 Winter School in Mathematical and Computational Biology http://bioinformatics.org.au/ws14/program/
TRANSCRIPT
Next-Generation Sequencing: an overview of technologies and applications
July 2014
Ken McGrath, Australian Genome Research Facility
Next-Gen Sequencing Edition
• Current rulers of the “throne”
• Sequencing by synthesis
• Each cycle extends and reads a single base
• Reads of up to 2x300bp
[Figure: Illumina sequencing workflow. DNA (0.1-1.0 µg) → sample preparation → cluster growth → sequencing (cycles 1-9, one base per cycle) → image acquisition → base calling (T G C T A C G A T …)]
Illumina Sequencing Technology: Robust Reversible Terminator Chemistry Foundation
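The image acquisition and base calling steps can be illustrated with a toy example. This is a minimal sketch, assuming each cycle yields one fluorescence intensity per base for a cluster and that the brightest channel is called; the intensity values are made up for illustration, and real base callers also correct for effects such as channel cross-talk and phasing.

```python
BASES = "ACGT"

def call_bases(intensities):
    """Call one base per cycle by picking the brightest of four channels.

    `intensities` is a list of cycles; each cycle is a 4-tuple of
    hypothetical fluorescence values in A, C, G, T order.
    """
    read = []
    for cycle in intensities:
        brightest = max(range(4), key=lambda i: cycle[i])
        read.append(BASES[brightest])
    return "".join(read)

# Made-up intensities for a single cluster over nine cycles (A, C, G, T)
cluster = [
    (0.1, 0.0, 0.1, 0.9),
    (0.0, 0.1, 0.8, 0.1),
    (0.1, 0.9, 0.0, 0.1),
    (0.0, 0.1, 0.1, 0.7),
    (0.9, 0.1, 0.0, 0.0),
    (0.1, 0.8, 0.1, 0.0),
    (0.0, 0.1, 0.9, 0.1),
    (0.8, 0.0, 0.1, 0.1),
    (0.1, 0.1, 0.0, 0.9),
]

print(call_bases(cluster))  # TGCTACGAT
```

Because the reversible terminator allows only one base to be added per cycle, the read length equals the number of cycles.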
Illumina: GAIIx, MiSeq, NextSeq 500, HiSeq 2500, HiSeq X Ten
ILLUMINA SEQUENCING SYSTEMS
• NextSeq 500: 150 bp paired-end reads, ~120 Gbp/run (~1 day)
• Illumina HiSeq 2500 Rapid SBS: 150 bp paired-end reads, ~180 Gbp/run (2 days)
• Illumina HiSeq 2500 v4 SBS: 125 bp paired-end reads, ~1000 Gbp/run (6 days)
• MiSeq v3: 300 bp paired-end reads, ~15 Gb/run (2.3 days)
• HiSeq X Ten: 150 bp paired-end reads, ~1800 Gb/run (3 days)
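To put the per-run yields above in context, they can be converted to mean genome coverage. The sketch below is a rough calculation, assuming a human genome of about 3.1 Gbp and a 30x target depth (both are assumptions, not figures from the slides) and ignoring duplicates, unmapped reads and uneven coverage.

```python
HUMAN_GENOME_GBP = 3.1  # assumed genome size in Gbp (not from the slides)

# Approximate yield per run in Gb, as listed on the slide
YIELD_GB = {
    "NextSeq 500": 120,
    "HiSeq 2500 Rapid SBS": 180,
    "HiSeq 2500 v4 SBS": 1000,
    "MiSeq v3": 15,
    "HiSeq X Ten": 1800,
}

def mean_coverage(yield_gb, genome_gbp=HUMAN_GENOME_GBP):
    """Mean depth if the whole run were spent on one genome."""
    return yield_gb / genome_gbp

def genomes_per_run(yield_gb, target_coverage=30, genome_gbp=HUMAN_GENOME_GBP):
    """Whole genomes that fit in one run at the target mean depth."""
    return int(yield_gb // (target_coverage * genome_gbp))

for platform, gb in YIELD_GB.items():
    print(f"{platform}: {mean_coverage(gb):.0f}x on one genome, "
          f"{genomes_per_run(gb)} genome(s) at 30x")
```

Under these assumptions a MiSeq run falls well short of a single 30x human genome, which is one reason the smaller instruments tend to be used for small genomes and targeted work.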
ILLUMINA SEQUENCING SYSTEMS
MiSeq v2
• 10-15 million pass-filter clusters per run
• 50 cycles: 50 bp single reads (0.5 – 0.75 Gb/run), ~6 hrs, ≥ 90% of bases higher than Q30 at 50 bp
• 300 cycles: 2x150 bp paired-end reads (3.0 – 4.5 Gb/run), ~24 hrs, ≥ 80% of bases higher than Q30 at 2x150 bp
• 500 cycles: 2x250 bp paired-end reads (5.0 – 7.5 Gb/run), ~40 hrs, ≥ 75% of bases higher than Q30 at 2x250 bp
MiSeq v3
• 20-25 million pass-filter clusters per run
• 150 cycles: 2x75 bp paired-end reads (3.0 – 2.5 Gb/run), ~20 hrs, ≥ 85% of bases higher than Q30 at 2x75 bp
• 600 cycles: 2x300 bp paired-end reads (12.0 – 15.0 Gb/run), ~55 hrs, ≥ 70% of bases higher than Q30 at 2x300 bp
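Two pieces of arithmetic sit behind these MiSeq figures: the Phred scale (Q = -10 log10 of the error probability, so Q30 means roughly 1 miscalled base in 1000) and the relationship between cluster count, read length and yield. The sketch below works through both; the function names are illustrative only.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)

def expected_yield_gb(clusters_millions, read_length, paired=True):
    """Yield in Gb from pass-filter clusters and read length in bp."""
    reads_per_cluster = 2 if paired else 1
    bases = clusters_millions * 1e6 * read_length * reads_per_cluster
    return bases / 1e9

# Q30 corresponds to about one error per 1000 bases
print(phred_to_error_prob(30))

# MiSeq v3, 2x300 bp, 20-25 million clusters: matches the 12.0-15.0 Gb range
print(expected_yield_gb(20, 300), expected_yield_gb(25, 300))

# MiSeq v2, 50 bp single reads, 10-15 million clusters: 0.5-0.75 Gb
print(expected_yield_gb(10, 50, paired=False),
      expected_yield_gb(15, 50, paired=False))
```

The falling Q30 percentages at longer read lengths reflect that base quality tends to decline in later cycles.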
Illumina Summary
Strengths
• Lots of data
• Low error rates
• Great choice of platform sizes
• Paired-end reads
• Pretty awesome
Weaknesses
• Too much data
• Slower run times
• Shorter reads
• Slept with brother
• Competing with Illumina for market share
• Two technologies (sequencing by ligation, and semiconductor sequencing)
• Reads of up to 400bp
Ion Torrent
• Ion Semiconductor Sequencing
• Detection of hydrogen ions released during DNA polymerization
• Sequencing occurs in microwells with ion (pH) sensors
– No modified nucleotides
– No optics
Ion Torrent
• DNA → Ions → Sequence
– Nucleotides flow sequentially over Ion semiconductor chip
– One sensor per well per sequencing reaction
– Direct detection of natural DNA extension
– Millions of sequencing reactions per chip
– Fast cycle time, real time detection
[Figure: Ion Torrent sensor well. Labels: dNTP, H+, ∆pH, ∆Q, ∆V, sensing layer, sensor plate, silicon substrate (drain, source, bulk), to column receiver]
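The sequential nucleotide flows described above can be sketched as a small simulation. This is an idealised model, assuming a simple repeating T, A, C, G flow order (an assumption for illustration; the instrument's actual flow order may differ) and a noise-free signal proportional to the number of bases incorporated in each flow.

```python
def ideal_flow_signals(sequence, flow_order="TACG"):
    """Return the number of bases incorporated at each nucleotide flow.

    `sequence` is the strand being synthesised, 5' to 3'. A run of
    identical bases (a homopolymer) is incorporated in a single flow,
    so its signal is proportional to the run length.
    """
    unknown = set(sequence) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")
    signals = []
    pos = 0
    flow = 0
    while pos < len(sequence):
        nucleotide = flow_order[flow % len(flow_order)]
        count = 0
        while pos < len(sequence) and sequence[pos] == nucleotide:
            count += 1
            pos += 1
        signals.append(count)
        flow += 1
    return signals

print(ideal_flow_signals("TTAGC"))  # [2, 1, 0, 1, 0, 0, 1]
```

Because a homopolymer is read as one larger signal rather than as separate events, distinguishing, say, 7 from 8 identical bases is harder than distinguishing 1 from 2, which is consistent with the "read quality can vary" weakness noted in the summary.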
Life Technologies: SOLiD, Ion Torrent PGM, Ion Torrent Proton
Ion Torrent Chips
PGM
• 314 Chip: 200 bp and 400 bp reads, 30-100 Mb/run (1.5 hrs)
• 316 Chip: 200 bp and 400 bp reads, 300-1000 Mbp/run (2 hrs)
• 318 Chip: 200 bp and 400 bp reads, 600 Mb-2 Gbp/run (4.5 hrs)
Proton
• P1 Chip: 200 bp reads, 5-10 Gbp/run
• P2 Chip: 100 bp reads, ~20 Gbp/run (Coming soon!)
Life Technologies Summary
Strengths
• Fast run times
• Scalable data outputs
• Longer reads (400bp)
• Pretty
Weaknesses
• Lower maximum data output
• Read quality can vary
• Haven’t done much recently
• One of the first NGS platforms
• Pyrosequencing based
• Each cycle allows extension of a single base (A, C, G or T)
• Reads up to 800bp
454 Pyrosequencing
454: Data Processing
[Figure: 454 data processing pipeline. Raw image files → image processing → base-calling → quality filtering → SFF file; panels show the T, A, C and G base flows]
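The base-calling and quality filtering steps can be illustrated by decoding a flowgram, the per-flow signal values that an SFF file stores. This is a minimal sketch, assuming the T, A, C, G flow order shown on the slide, made-up signal values, and a hypothetical `max_residual` threshold for flagging ambiguous flows; it is not the vendor's algorithm.

```python
def call_flowgram(flow_values, flow_order="TACG", max_residual=0.25):
    """Decode flow signals into bases and count ambiguous flows.

    Each signal is rounded to the nearest whole number of incorporated
    bases. A flow whose signal lies further than `max_residual` from
    that whole number is counted as ambiguous.
    """
    bases = []
    ambiguous = 0
    for i, value in enumerate(flow_values):
        n = max(int(value + 0.5), 0)  # round to nearest, never negative
        if abs(value - n) > max_residual:
            ambiguous += 1
        bases.append(flow_order[i % len(flow_order)] * n)
    return "".join(bases), ambiguous

# Made-up noisy signals for seven flows: T, A, C, G, T, A, C
flows = [2.04, 0.97, 0.08, 1.10, 0.02, 0.05, 0.58]
sequence, n_ambiguous = call_flowgram(flows)
print(sequence, n_ambiguous)  # TTAGC 1
```

A simple quality filter could then discard reads in which the fraction of ambiguous flows exceeds some cutoff.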
Roche: GS-FLX, FLX Jr
Roche
• Not over yet…
Stratos Genomics, Genia, something else?
Roche Summary
Strengths
• Long reads (up to 800bp)
• Had wolves
Weaknesses
• High $ per base
• Older technology
• Platform soon unavailable
• Pretty much dead
• Single-molecule real-time sequencing (SMRT)
• Detection of individual bases as they extend (by light emission)
• Long Reads (up to 4x2.5kb)
PacBio
• Higher error rates (~90% raw read accuracy)
• Compensate by “looping” DNA to create multiple passes
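The idea of compensating for errors with multiple passes can be sketched as a per-position majority vote. This is a simplified illustration, assuming the passes are already aligned, have equal length and contain only substitution errors; in practice PacBio errors include many insertions and deletions, so a real consensus step has to align the passes first.

```python
from collections import Counter

def consensus(passes):
    """Majority-vote consensus over equal-length, pre-aligned passes."""
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("all passes must have the same length")
    result = []
    for column in zip(*passes):
        base, _count = Counter(column).most_common(1)[0]
        result.append(base)
    return "".join(result)

# Five hypothetical passes over the same molecule; four of them carry
# one error each, at different positions
passes = [
    "ACGTTGCA",
    "ACCTTGCA",
    "ACGTTGGA",
    "TCGTTGCA",
    "ACGTAGCA",
]
print(consensus(passes))  # ACGTTGCA
```

Because the errors in each pass fall at different positions, they are outvoted at each position, which is why consensus accuracy improves as the number of passes grows.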
PacBio
Zero-Mode Waveguides (ZMW)
PacBio Summary
Strengths
• Long reads (4x2.5kb)
• Single-molecule detection
• Capable of Epigenetics
• Freakin’ Dragons!
Weaknesses
• High $ per base
• Higher error rate
• Still to prove itself
• Keeps losing dragons
Oxford Nanopore
• Direct detection of individual bases as they pass through a “nanopore”
• MinION and GridION
• No synthesis/extension
• Capable of VERY Long Reads (>100kb)
Oxford Nanopore Summary
Strengths
• Extra-Long reads (>100kb)
• Single-molecule detection
• Capable of Epigenetics
• Very cost effective
• Exotic and powerful
Weaknesses
• Not yet available (alpha testing)
• Very high error rates
• Immature platform
• Steal babies
NGS Applications
• Whole genome sequencing (today)
» De novo assembly
» Structural variant detection
» Comparative genomics
• RNAseq (later today)
» Gene expression
» Splice variants
» Transcriptomics
» MicroRNA
• Epigenomics (tomorrow)
» Indirect (bisulphite)
» Direct
• Targeted sequencing (Wed)
» Hybrid capture
» Amplicon resequencing
Data Quality
Read Length
Yield/Coverage
Hodor! (Thank You)