The Past, Present, and Future of DNA Sequencing
Craig A. Praul
Co-Director
Genomics Core Facility Huck Institutes of the Life Sciences
Penn State University
A very short history of DNA sequencing
"I started from the conviction that, if different DNA species exhibited different biological activities, there should also exist chemically demonstrable differences between deoxyribonucleic acids." Erwin Chargaff
Milestones
• First isolation of DNA : 1869 (Friedrich Miescher)
• Composition of nucleic acids; tetranucleotide theory : 1909 - 1940 (Phoebus Levene)
• G=C and A=T; however, the G/C and A/T content of different organisms varies : 1950 (Erwin Chargaff)
• G/C content measured by annealing : 1968 (Mandel and Marmur)
• Maxam-Gilbert and Sanger sequencing : 1977
• Next-generation sequencing : 2005
Genomes Sequenced
• Virus – 3222 (Bacteriophage phi X 174, 5386 nt – 1977)
• Bacteria – 2289 (Haemophilus influenzae, 1.8 x 10^6 nt – 1995)
• Eukarya – 168 (S. cerevisiae, 1.2 x 10^7 nt – 1996; H. sapiens, 3 x 10^9 nt – 2001)
• Archaea – 152 (Methanococcus jannaschii, 1.7 x 10^6 nt – 1996)
Liu et al. Journal of Biomedicine and Biotechnology Volume 2012 (2012), Article ID 251364, 11 pages doi:10.1155/2012/251364
Next-Generation Sequencing
ER Mardis. Nature 470, 198-203 (2011) doi:10.1038/nature09796
Changes in instrument capacity*
Date      Cost per Mb    Cost per Genome
Sep-01    $5,292.39      $95,263,072
Sep-02    $3,413.80      $61,448,422
Oct-03    $2,230.98      $40,157,554
Oct-04    $1,028.85      $18,519,312
Oct-05    $766.73        $13,801,124
Oct-06    $581.92        $10,474,556
Oct-07    $397.09        $7,147,571
Oct-08    $3.81          $342,502
Oct-09    $0.78          $70,333
Oct-10    $0.32          $29,092
Oct-11    $0.09          $7,743
Oct-12    $0.07          $6,618
Jan-13    $0.06          $5,671
Source - NHGRI : http://www.genome.gov/sequencingcosts/
Sequencing Cost
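To make the size of the cost drop concrete, here is a minimal sketch that computes the fold reduction in cost per Mb between consecutive rows of the NHGRI table. The function name `fold_drops` is made up for this illustration; the numbers are copied from the table.

```python
# Cost per Mb in USD, copied from the NHGRI table in this deck.
cost_per_mb = [
    ("Sep-01", 5292.39),
    ("Sep-02", 3413.80),
    ("Oct-03", 2230.98),
    ("Oct-04", 1028.85),
    ("Oct-05", 766.73),
    ("Oct-06", 581.92),
    ("Oct-07", 397.09),
    ("Oct-08", 3.81),
    ("Oct-09", 0.78),
    ("Oct-10", 0.32),
    ("Oct-11", 0.09),
    ("Oct-12", 0.07),
    ("Jan-13", 0.06),
]


def fold_drops(rows):
    """Return (start_date, end_date, fold_reduction) for each consecutive pair of rows."""
    return [
        (start_date, end_date, start_cost / end_cost)
        for (start_date, start_cost), (end_date, end_cost) in zip(rows, rows[1:])
    ]


for start, end, fold in fold_drops(cost_per_mb):
    print(f"{start} -> {end}: {fold:.1f}-fold cheaper per Mb")
```

The largest single step in this table is Oct-07 to Oct-08, where the cost per Mb falls roughly 100-fold.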
Central Dogma of Molecular Biology
DNA → RNA → Protein
Really?
So once we have the genomic DNA sequence of a species we have all of the information there is?
James Watson version - 1965
• No, not really.
Illumina HiSeq and MiSeq
• Massively parallel
  – HiSeq : 150 or 180 million reads per lane
  – MiSeq : 15 million reads per run
• Intermediate read length
  – HiSeq : 100 nt or 150 nt
  – MiSeq : 250 nt
• High total output per run
  – HiSeq : 90 Gb or 288 Gb
  – MiSeq : 8 Gb
Sequencing Types
Single Read
Paired-end read
Mate-pair read
Library Types
• Many different library preps : DNA, mate-pair, mRNA, miRNA, ChIP
• Fragmentation
  – DNA : 300 – 500 nt
  – RNA : 150 – 200 nt
• Attachment of appropriate adapters
  – Complex : flow cell binding, forward and reverse (F & R) sequencing, barcode (BC)
  – Custom : avoid if possible
• Removal of dimers/small inserts
• Amplification (or not)
Applications
• de Novo sequencing (genomes, transcriptomes)
• Resequencing (genomes, exomes, custom sequence capture)
• RNA-seq (mRNA, miRNA, degradome)
• ChIP-seq
• Methyl-seq
• RIP-seq
• Amplicon
de Novo Experimental Design
• Estimate of genome size
• Coverage (30 x – 100 x)
• Sequencing Type (paired-end or mate-pair)
• Example : 100 Mb genome, 100 x 100 nt paired-end reads
  – (100 Mb) x (30 x coverage) = 3 Gb
  – 3 Gb / (200 nt for each pair of paired-end reads) = 15 million read pairs
• Replicates
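The de novo example above is simple arithmetic, and a short sketch makes it easy to rerun for other genome sizes or coverage targets. The function name and parameters below are invented for this illustration, and the calculation assumes every read pair contributes two full-length reads.

```python
def read_pairs_needed(genome_size_nt, coverage, read_length_nt):
    """Estimate the number of paired-end read pairs for a target coverage.

    Assumes each pair yields 2 * read_length_nt usable bases (an idealization:
    no trimming, no duplicates, no failed reads).
    """
    total_nt = genome_size_nt * coverage      # bases of sequence required
    nt_per_pair = 2 * read_length_nt          # bases delivered by one read pair
    return -(-total_nt // nt_per_pair)        # integer ceiling division


# 100 Mb genome, 30 x coverage, 100 nt paired-end reads
print(read_pairs_needed(100_000_000, 30, 100))   # 15000000

# Same genome at the upper end of the 30 x - 100 x range
print(read_pairs_needed(100_000_000, 100, 100))  # 50000000
```

In practice some reads are lost to quality filtering, so treating the result as a lower bound is a reasonable working assumption.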
Resequencing : Sequence Capture
RNA-seq Experimental Design
• Estimate of transcriptome size (1-5% of genome ?)
• Coverage (30 x ?)
  – mRNA or rRNA-depleted RNA
  – Relative abundance of transcripts you are interested in
• Sequencing type (single read or paired-end)
  – Simple transcriptome vs. complex transcriptome
  – Splice variants
• Example : 3 Gb genome, 100 nt single reads
  – (3 Gb genome) x (5% transcriptome) = 150 Mb transcriptome
  – (150 Mb transcriptome) x (30 x coverage) = 4.5 Gb total sequence
  – 4.5 Gb / (100 nt for each read) = 45 million reads
• Replicates : Yes!!!!
  – Biological not technical
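The RNA-seq estimate follows the same pattern, with one extra step for the transcriptome fraction. The sketch below uses hypothetical names and takes the 5% and 30 x figures from the example as inputs. Note that 5% of a 3 Gb genome is 150 Mb, which is the value consistent with the 4.5 Gb total.

```python
def rna_seq_reads_needed(genome_size_nt, transcriptome_percent, coverage, read_length_nt):
    """Estimate single reads needed for RNA-seq at a target coverage.

    Assumes the transcriptome is a fixed percentage of the genome and that
    transcripts are evenly represented, which real samples are not.
    """
    transcriptome_nt = round(genome_size_nt * transcriptome_percent / 100)
    total_nt = transcriptome_nt * coverage
    reads = -(-total_nt // read_length_nt)    # integer ceiling division
    return {
        "transcriptome_nt": transcriptome_nt,
        "total_nt": total_nt,
        "reads": reads,
    }


# 3 Gb genome, 5% transcriptome, 30 x coverage, 100 nt single reads
estimate = rna_seq_reads_needed(3_000_000_000, 5, 30, 100)
print(estimate)
```

Because transcript abundance varies widely, this number is per sample and is best read as a starting point; rare transcripts may need more depth.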
ChIP-Seq
Source : http://www.nature.com/nmeth/journal/v4/n8/images/nmeth0807-613-F1.gif
Source : http://openi.nlm.nih.gov/imgs/rescaled512/3269675_ijms-13-00097f6.png
RIP-seq
Methyl-seq
20 different types of base modifications in DNA are known and there are perhaps 200 modifications of RNA
Experimental Space: Next-Gen Platform
• PacBio : 0.075 x 10^6 reads/sample, 1000 – 3000 nt
  – Whole transcript
• Roche 454 FLX+ : 0.5 - 1 x 10^6 reads/sample, 800 - 1000 nt
  – Small – Medium Genome de novo sequencing
  – Long Amplicon
  – Transcriptome
• PGM : 1-2 x 10^6 reads per sample, 400 nt
  – Small genome de novo
  – Medium Amplicon
• MiSeq : 1-2 x 10^6 reads per sample, 50 – 250 nt
  – Small genome de novo
  – Small Amplicon
• HiSeq : 10-100 x 10^6 reads per sample, 50 – 150 nt
  – Counting Applications : RNA-seq, ChIP-seq, RIP-seq, Methyl-seq
  – Large genome de novo and resequencing
Experimental Space: The Relevancy of “Classic” Techniques
Differential Gene Expression
• Northern blotting (1977) : 1 probe – 20 samples
• Dot Blots (1987) : 100s of probes – 1 sample
• RT-PCR (1992) : 100s of probes – 10 -100 samples
• Microarrays (1995 ) : 100,000s of probes – 1 sample
• Next-gen sequencing (2005) : 10-100 x 10^6 reads – 1 sample
The Future
• More Reads
• Longer Reads
• Faster Sequencing
• Cheaper Sequencing
• New Applications