DNA Sequencing: What's Driving the Improvements
Posted on 19-Oct-2014
DESCRIPTION
These slides show that the improvements in DNA sequencers come mostly from "reductions in scale." As with integrated circuits, reducing the size of features on DNA sequencers has enabled improvements of many orders of magnitude. Unlike integrated circuits, however, the improvements also depend on changes in technology. For example, shifts from pyrosequencing to semiconductor and nanopore sequencing were needed to achieve the reductions in scale. In addition, pyrosequencing benefited from improvements in lasers and camera chips.
TRANSCRIPT
How and Why are Costs of DNA Sequencing Falling?
What are the Implications of these Falling Costs?
6th Session of MT5009
A/Prof Jeffrey Funk
Division of Engineering and Technology Management
National University of Singapore
For information on other technologies, see http://www.slideshare.net/Funk98/presentations
Objectives
What are the important dimensions of performance for DNA sequencers and higher-level systems?
What are the rates of improvement?
What drives these rapid rates of improvement?
Will these improvements continue?
What kinds of new higher-level systems will likely emerge from the improvements in DNA sequencers?
What does this tell us about the future?
Session Technology
1 Objectives and overview of course
2 Two types of improvements: 1) Creating materials that better exploit physical phenomena; 2) Geometrical scaling
4 Semiconductors, ICs, electronic systems
5 MEMS and Bio-electronic ICs
6 Nanotechnology and DNA sequencing
7 Superconductivity and solar cells
8 Lighting and Displays
9 Human-computer interfaces (also roll-to-roll printing)
10 Telecommunications and Internet
11 3D printing and energy storage
This is Part of the Sixth Session of MT5009
Creating materials (and their associated processes) that better exploit physical phenomena
Geometrical scaling
◦ Increases in scale
◦ Reductions in scale
Some technologies directly experience improvements while others indirectly experience them through improvements in “components”
As Noted in the Previous Session, There Are Two Main Mechanisms for Improvements
A summary of these ideas can be found in:
1) "What Drives Exponential Improvements?" California Management Review, Spring 2013
2) Technology Change and the Rise of New Industries, Stanford University Press, 2013
Creating materials (and their associated processes) that better exploit physical phenomena
◦ Created materials that enable new techniques of DNA sequencing
Geometrical scaling
◦ Reductions in scale: smaller feature sizes for each technique (but many new techniques)
◦ Increases in scale: larger wash plates and production equipment
Some technologies directly experience improvements while others indirectly experience them through improvements in “components”
◦ Better lasers and sensors were important for some of the techniques (e.g., pyrosequencing and single-molecule real-time sequencing)
Both are Relevant to DNA Sequencing
Identify the sequence of the roughly 3 billion nucleotide base pairs in human DNA
Nucleotides encode the genetic instructions for organisms
Four types of nucleotides in a DNA strand
◦ Adenine
◦ Thymine
◦ Cytosine
◦ Guanine
The Challenge
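Because a DNA strand uses only four nucleotide types, each base can be stored in 2 bits, which gives a rough sense of the raw data in one genome. The Python below is a minimal sketch; the A/C/G/T-to-bits mapping is an arbitrary assumption (not a standard file format), and real sequencer output, with quality scores and many redundant reads, is far larger.

```python
# Minimal sketch: four nucleotide types means each base fits in 2 bits.
# ASSUMPTION: this A/C/G/T -> bits table is arbitrary, not a file-format standard.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {v: k for k, v in BASE_TO_BITS.items()}


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte (last byte zero-padded)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= BASE_TO_BITS[base] << (6 - 2 * j)
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for j in range(4):
            bases.append(BITS_TO_BASE[(byte >> (6 - 2 * j)) & 0b11])
    return "".join(bases[:length])


GENOME_BASES = 3_000_000_000             # about 3 billion base pairs (one strand)
raw_megabytes = GENOME_BASES * 2 / 8 / 1e6
print(f"{raw_megabytes:.0f} MB")          # 750 MB before any compression
```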
http://www.genome.gov/sequencingcosts/
Read lengths
Accuracies
Speeds
Improvements in these variables also lead to reductions in the cost of sequencing
Capability to analyze and use gathered data
◦ need better computers
◦ need more storage
Improvements have also occurred in…
Improvements in DNA sequencers
Nature 2011, 470: 198-203, Elaine Mardis
Why do Costs Fall? New Methods Continue to Emerge
◦ Pyrosequencing (454 Life Sciences/Roche) and related sequencing-by-synthesis methods (Illumina)
◦ Single-molecule real-time sequencing (Pacific Bio)
◦ Semiconductor arrays (Ion Torrent)
◦ Nanopores (Oxford Nanopore Technologies)
◦ Methods of data compression
Synthesizing DNA
Who Cares? What are the Implications?
Conclusions
Outline
New methods of sequencing
◦ Maxam-Gilbert sequencing: relies on chemically cleaving DNA at specific nucleotides
◦ Chain-termination methods (sometimes called the Sanger method): radioactively labeled fragments are separated by gel electrophoresis and read from X-ray film (autoradiography)
◦ Dye-termination: sequences are read with fluorescent dyes, where each nucleotide emits light at a different wavelength (this technology accelerated improvements)
Improved lasers and cameras to read fluorescent dyes
More parallel processing
Smaller feature sizes, reductions in scale
Why do Costs Fall?
http://www.dnasequencing.org/history-of-dna
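In chain-termination sequencing, synthesis stops wherever a labeled terminating nucleotide is incorporated, so sorting the fragments from shortest to longest and reading each fragment's last base recovers the sequence. The Python below is a minimal sketch of that read-out step; the `(length, terminal_base)` input format is a hypothetical simplification of what a gel or capillary actually reports.

```python
# Minimal sketch of reading a chain-termination (Sanger) ladder.
# ASSUMPTION (hypothetical format): each fragment is (length, terminal_base),
# where terminal_base is the terminating nucleotide (or dye color) that stopped it.

def read_ladder(fragments):
    """Return the sequence by ordering fragments from shortest to longest."""
    ordered = sorted(fragments)                  # shortest fragment = first base
    lengths = [length for length, _ in ordered]
    # A complete ladder has exactly one fragment for every length 1..n.
    if lengths != list(range(1, len(ordered) + 1)):
        raise ValueError("ladder has missing or duplicate fragment lengths")
    return "".join(base for _, base in ordered)


# Fragments as they might come off a gel, in no particular order.
fragments = [(3, "T"), (1, "G"), (5, "C"), (2, "A"), (4, "T")]
print(read_ladder(fragments))  # GATTC
```

With dye-termination, all four terminating nucleotides can be read in one lane because each carries a different color, which is what the `terminal_base` field stands in for here.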
Source: Nature Biotechnology 30(11), 1023-1026, November 2012
But many different approaches are being investigated
This can be understood by reading highly cited papers such as:
◦ “Genome sequencing in micro-fabricated high-density pico-liter reactors” (Margulies et al., 2005) and
◦ “Toward nano-scale genome sequencing” (Ryan et al., 2007).
Quote from Ryan et al.: “The ability to construct nano-scale structures and perform measurements using novel nano-scale effects has provided new opportunities to identify nucleotides directly using physical, and not chemical, methods.”
In fact, the titles of these papers alone are fairly suggestive. In all of these reductions-in-scale examples, totally new forms of equipment, processes, and factories were required.
Newer Approaches have smaller scale
1) Separate DNA into smaller strands
2) Make copies of strands (i.e., amplification) with emulsion beads in plastic containers
◦ do this with small containers on a large wash plate so that many copies are made in parallel
◦ smaller containers and larger wash plates lead to more parallel and faster processing
3) Identify DNA nucleotides using lasers and cameras
◦ incorporating a nucleotide releases pyrophosphate; the enzyme ATP sulfurylase converts it into ATP (adenosine triphosphate), which the enzyme luciferase uses to emit light
◦ falling costs of lasers and cameras reduce costs
4) Analyze data with computers
Pyrosequencing by 454 Life Sciences
One source: http://www.454.com/downloads/news-events/how-genome-sequencing-is-done_FINAL.pdf
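Amplification means each strand is read from many copies, so an error in any single copy can be outvoted. The Python below is a minimal sketch of a per-position majority-vote consensus, plus the probability that majority voting fails; it assumes equal-length, already-aligned reads and independent errors with per-copy error rate `p`, which are simplifying assumptions rather than how 454's software actually works.

```python
from collections import Counter
from math import comb


def consensus(reads):
    """Majority-vote consensus of equal-length, already-aligned reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def majority_error(p, n):
    """Probability that the correct base fails to win a strict majority among
    n copies (n odd), ASSUMING each copy errs independently with probability p."""
    if n % 2 == 0:
        raise ValueError("use an odd number of copies")
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


reads = ["GATTACA", "GATTGCA", "GACTACA"]
print(consensus(reads))  # GATTACA
print(f"1 copy: {majority_error(0.01, 1):.1e}, 5 copies: {majority_error(0.01, 5):.1e}")
```

Under these assumptions, going from one copy to five cuts a 1% per-copy error rate to roughly 1 in 100,000.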
Make copies to improve accuracy through redundancy
454 PicoTiterPlate from 454 Life Sciences
◦ contains 1.6 million hexagonal wells
◦ each holds 75 picoliters (1 picoliter = 10^-12 liters) and is less than 100 microns in diameter
These wells can be made much smaller
◦ dimensions on integrated circuits (ICs) are on the order of 20 nanometers
◦ Is it possible to reduce feature sizes by 1000 times, or volumes by 10^9?
Make Copies of Strands (i.e., amplification in Step 2)
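The "1000 times smaller features, 10^9 smaller volumes" question follows from geometric scaling: shrinking every linear dimension by a factor s shrinks volume by s cubed. A quick Python check, assuming all three dimensions of a 75-picoliter well shrink by the same factor (an idealization):

```python
# Geometric scaling sketch: shrinking every linear dimension of a well by a
# factor s shrinks its volume by s**3.
# ASSUMPTION: all three dimensions shrink by the same factor.
PICOLITER = 1e-12   # liters
ZEPTOLITER = 1e-21  # liters

well_volume_l = 75 * PICOLITER   # 454 PicoTiterPlate well (from the slide)
linear_shrink = 1000             # e.g. tens of microns -> tens of nanometers

volume_shrink = linear_shrink ** 3
scaled_volume_l = well_volume_l / volume_shrink

print(f"volume shrinks {volume_shrink:.0e}x")             # 1e+09x
print(f"{scaled_volume_l / ZEPTOLITER:.0f} zeptoliters")   # 75
```

The result lands in the zeptoliter range, the same scale as the zero-mode waveguides used by Pacific Biosciences.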
Fluorescent Dyes, Lasers, and Cameras (Step 3)
As nucleotides flow across the wash plate during a sequencing run, each incorporated nucleotide (the molecules that make up DNA) generates a light signal, which is recorded by a camera
Signal strength is proportional to the number of nucleotides incorporated onto the DNA strands
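Because signal strength is proportional to the number of nucleotides incorporated, a run of identical bases shows up as one proportionally brighter flash. The Python below is a minimal sketch of turning per-flow intensities into bases; the repeating flow order "TACG" and the normalization (one base gives a signal of about 1.0) are assumptions for illustration, and real base callers also correct for noise and signal decay.

```python
# Minimal sketch of flowgram base calling.
# ASSUMPTIONS (hypothetical): flows cycle through "TACG", and each signal is
# normalized so that one incorporated nucleotide gives about 1.0.
FLOW_ORDER = "TACG"


def call_bases(signals, flow_order=FLOW_ORDER):
    """Turn per-flow light intensities into a base sequence.

    A signal near 2.0 means two identical bases in a row (a homopolymer);
    a signal near 0 means the flowed nucleotide was not incorporated.
    """
    bases = []
    for i, signal in enumerate(signals):
        count = max(0, round(signal))             # nearest whole number of bases
        bases.append(flow_order[i % len(flow_order)] * count)
    return "".join(bases)


signals = [1.05, 0.02, 2.10, 0.97, 0.03, 0.98, 0.01, 0.0]
print(call_bases(signals))  # TCCGA
```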
Eliminate amplification and wash steps with zero-mode waveguides (Pacific Biosciences)
http://www.youtube.com/watch?v=v8p4ph2MAvI from 1:50 to 3:50
Uses zero-mode waveguides, which are
◦ very small: zeptoliters (10^-21 liters, 50 nanometers in diameter)
◦ fabricated in a 100 nm metal film on a silicon dioxide substrate
◦ enough room for 600,000 molecules of liquid water at room temperature
◦ How much smaller can they be made?
Pacific Biosciences
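The "600,000 water molecules" figure can be checked against the zeptoliter scale. A back-of-envelope Python sketch, assuming liquid water at about 1 g/mL with a molar mass of 18.015 g/mol (standard values, not from the slides):

```python
# Back-of-envelope check of "600,000 water molecules" in a zeptoliter-scale volume.
# ASSUMPTIONS: water density ~1.0 g/mL near room temperature; molar mass 18.015 g/mol.
AVOGADRO = 6.02214076e23      # molecules per mole
WATER_G_PER_L = 1000.0        # ~1 g/mL
WATER_G_PER_MOL = 18.015
ZEPTOLITER = 1e-21            # liters


def water_molecules(volume_l):
    """Number of water molecules in a given volume of liquid water."""
    return volume_l * WATER_G_PER_L / WATER_G_PER_MOL * AVOGADRO


per_zl = water_molecules(ZEPTOLITER)
print(f"{per_zl:,.0f} molecules per zeptoliter")                          # about 33,400
print(f"600,000 molecules fill about {600_000 / per_zl:.0f} zeptoliters")  # about 18
```

So 600,000 molecules corresponds to roughly 18 zeptoliters, consistent with the "zeptoliters" scale on the slide.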
Uses semiconductor chips to sequence DNA by detecting the pH change that occurs when a nucleotide (A, G, C, or T) is incorporated
◦ Thus, no lasers or cameras are used (template DNA is still amplified on beads before sequencing)
A microwell containing a template DNA strand is flooded with a single species of deoxyribonucleotide triphosphate (dNTP)
◦ Beneath the layer of microwells is an ion-sensitive layer, below which is an ISFET ion sensor
◦ All layers are contained in a CMOS semiconductor chip
◦ If the introduced dNTP is complementary to the leading template nucleotide, it is incorporated into the growing strand
◦ This releases a hydrogen ion that triggers the ISFET ion sensor, indicating that a reaction has occurred
Ion Torrent
http://www.nature.com/news/2010/101214/full/news.2010.674.html
http://en.wikipedia.org/wiki/Ion_semiconductor_sequencing
http://www.lifescientist.com.au/article/394936/feature_sequencing_3_0/?pp=2
Done in a massively parallel way. For each well:
◦ a match causes an ion to be released
◦ multiple matches cause multiple ions to be released
◦ no match, no ions are released
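The per-well rules above can be simulated directly: each flow offers one dNTP species, every matching base in a row is incorporated, and each incorporation releases one hydrogen ion. A minimal Python sketch, assuming a hypothetical repeating flow order "TACG" and an ideal sensor that counts ions exactly:

```python
# Minimal simulation of ion semiconductor sequencing signals for one well.
# ASSUMPTIONS (hypothetical): repeating dNTP flow order "TACG"; an ideal sensor
# that reports exactly one unit per hydrogen ion released.
FLOW_ORDER = "TACG"


def simulate_flows(strand, n_flows, flow_order=FLOW_ORDER):
    """Ions released in each flow while synthesizing `strand`.

    `strand` is the sequence being built (the complement of the template).
    A match releases one ion, a run of matches releases one ion per base,
    and a mismatch releases none.
    """
    pos = 0
    signals = []
    for i in range(n_flows):
        nucleotide = flow_order[i % len(flow_order)]
        ions = 0
        while pos < len(strand) and strand[pos] == nucleotide:
            ions += 1
            pos += 1
        signals.append(ions)
    return signals


print(simulate_flows("TCCGA", 8))  # [1, 0, 2, 1, 0, 1, 0, 0]
```

A real chip runs this in millions of wells at once; the simulation covers a single well.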
While the first sequencers used older semiconductor technology (i.e., large feature sizes), newer ones use smaller feature sizes and thus are faster than older ones. http://www.youtube.com/watch?v=JHzkYDyMzOg&feature=relmfu (2:30-4:15)
For example, the first sequencer (314) had 1.2 million wells, while the most recent one (Proton II) has 660 million wells
◦ How much smaller can these wells be made?
◦ Since 256 GB memory chips (1 byte = 8 bits) exist, could Ion Torrent provide 256 x 8 billion wells, or about 2 trillion wells, in the next few years?
◦ After that, improvements may slow, as Ion Torrent's improvements depend on reductions in the feature sizes of semiconductor technology
Ion Torrent
http://www.nature.com/news/2010/101214/full/news.2010.674.html
http://en.wikipedia.org/wiki/Ion_semiconductor_sequencing
http://www.lifescientist.com.au/article/394936/feature_sequencing_3_0/?pp=2
Source: Ion Torrent Video
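The arithmetic behind the "about 2 trillion wells" question, as a short Python check. It assumes one well per memory bit and takes "GB" as 10^9 bytes; both are assumptions for the comparison, not claims about Ion Torrent's roadmap.

```python
# Arithmetic behind the "about 2 trillion wells" question.
# ASSUMPTIONS: one well per memory bit; "GB" taken as 10**9 bytes.
BITS_PER_BYTE = 8
memory_bits = 256 * 10**9 * BITS_PER_BYTE

wells_314 = 1.2e6          # first chip (314), from the slide
wells_proton_ii = 660e6    # Proton II, from the slide

print(f"{memory_bits:.3e} bits")                                           # 2.048e+12
print(f"314 -> Proton II: {wells_proton_ii / wells_314:.0f}x more wells")   # 550x
print(f"Proton II -> 2 trillion: {memory_bits / wells_proton_ii:,.0f}x more")
```

Under these assumptions, reaching memory-chip densities would mean roughly another 3,000-fold increase beyond the Proton II, after a 550-fold increase from the first chip.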
Squeeze DNA through a nanoscopic pore (about 1.4 nm) in a semiconductor and read the distinctive change that each letter in the sequence makes in the amount of current flowing through the pore
Nanopores
DNA moves through a nanopore at remarkably high velocities, and thus only a small number of ions (as few as ~100) are available in the nanopore to correctly identify each nucleotide
◦ so the small changes in the ionic current due to the presence of different nucleotides are overwhelmed by thermodynamic fluctuations
The challenge is to reduce the translocation velocity so that the nucleotides can be correctly identified
Reductions in Translocation Velocity
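Why slowing DNA down helps can be seen with simple counting: the number of ions passing while one nucleotide sits in the pore is the ionic current divided by the elementary charge, times the dwell time. The Python below is a sketch; the 16 pA current is a hypothetical value chosen so the fast case gives about the slide's ~100 ions, and treating ion passage as Poisson counting noise (relative fluctuation 1/sqrt(N)) is a simplification.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs carried by a monovalent ion


def ions_per_nucleotide(current_a, nucleotides_per_s):
    """Ions passing through the pore while one nucleotide occupies it."""
    ions_per_s = current_a / ELEMENTARY_CHARGE
    return ions_per_s / nucleotides_per_s


def relative_counting_noise(n_ions):
    """Poisson counting fluctuation relative to the mean: 1/sqrt(N) (a simplification)."""
    return 1 / math.sqrt(n_ions)


CURRENT_A = 16e-12  # HYPOTHETICAL ionic current of 16 pA
for speed in (1e6, 1e3):  # nucleotides per second: fast vs. slowed translocation
    n = ions_per_nucleotide(CURRENT_A, speed)
    print(f"{speed:.0e} nt/s: ~{n:,.0f} ions, noise ~{relative_counting_noise(n):.1%}")
```

In this sketch, slowing translocation 1000-fold raises the ion count 1000-fold and cuts the relative counting noise by about a factor of 30 (the square root of 1000).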
http://www.youtube.com/watch?v=wvclP3GySUY
http://www.nature.com/nnano/journal/v6/n10/fig_tab/nnano.2011.129_F1.html
(Figure legend: nt = nucleotides)
Reductions in Translocation Velocity over Time
A 2000-nanopore system (900 USD) that can read DNA at a rate of hundreds of kilobases per second
An 8000-nanopore system by next year (2013) that can read more than 1M bases per second
With about 3 billion bases per human genome and 20 sequencing machines, it takes about 15 minutes to sequence a human genome
Expected Sequencing Time of 15 minutes with Oxford Nanopores
http://www.nature.com/news/nanopore-genome-sequencer-makes-its-debut-1.10051
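The 15-minute estimate can be unpacked as throughput arithmetic: time equals bases to read divided by total bases per second. A Python sketch; the `coverage` parameter is an added assumption (the slide's figure corresponds to reading each base once, while practical sequencing usually reads each position several times), and setup and analysis time are ignored.

```python
GENOME_BASES = 3e9   # about 3 billion bases per human genome (from the slide)
MACHINES = 20        # from the slide


def minutes_to_sequence(bases_per_s_per_machine, machines=MACHINES, coverage=1):
    """Time to read `coverage` copies of the genome, ignoring setup and analysis.
    ASSUMPTION: `coverage` (reads per base) is not part of the original estimate."""
    total_bases = GENOME_BASES * coverage
    return total_bases / (bases_per_s_per_machine * machines) / 60


# Per-machine rate implied by "20 machines, 15 minutes":
implied_rate = GENOME_BASES / (MACHINES * 15 * 60)
print(f"implied rate: {implied_rate:,.0f} bases/s per machine")   # 166,667
print(f"at 1M bases/s: {minutes_to_sequence(1e6):.1f} min")       # 2.5
```

The implied rate of about 167,000 bases per second per machine matches "hundreds of kilobases per second"; at more than 1M bases per second, the same 20 machines would need only a few minutes for a single pass.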
The reduced velocities (and improved sensitivities) were achieved by
◦ for biological nanopores: a combination of site-specific mutagenesis and one of the following: the incorporation of DNA-processing enzymes into the nanopore, chemical labeling of the nucleotides, or the covalent attachment of an aminocyclodextrin adapter for α-haemolysin
◦ for solid-state nanopores: optimization of solution conditions (temperature, viscosity, pH), chemical functionalization, surface-charge engineering, varying the thickness and composition of the membranes, and the use of smaller-diameter nanopores (thereby enhancing polymer–pore interactions)
Further Reductions in Translocation Velocity
http://www.nature.com/nnano/journal/v6/n10/fig_tab/nnano.2011.129_F1.html
Personal Sequencing, Garage Biology
Sequencing can be done in your home, office, or in the field
Sequence your own DNA multiple times in your life
Sequence the DNA from a bucket of ocean water, sewage, or a handful of dirt
Find proteins to manufacture other things
Combined with 3D printers, PCs, and the Internet, there is no limit to what we can do as individuals
$900 from Oxford Nanopore
Many believe that analyzing this data will be the bottleneck in genome sequencing
Partial solution: because there are redundancies in the data, better algorithms (e.g., compression) can speed up analysis of the sequence data
Amount of Data is Exploding
Source: Nature 498 pp. 255-260, 13 June 2013
(a) File sizes of the uncompressed, compressed with links and edits, and unique sequence data sets with default parameters. (b) Run times of BLAST, compressive BLAST and the coarse search step of compressive BLAST on the unique data ('coarse only'). Error bars, s.d. of five runs. Reported runtimes were on a set of 10,000 simulated queries. For queries that generate very few hits, the coarse search time provides a lower bound on search time. (c) Run times of BLAT, compressive BLAT and the coarse search step on the unique data ('coarse only') for 10,000 queries.
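The "compressed with links and edits" idea rests on redundancy: most of a new genome matches sequence that is already stored, so only the differences need to be kept. The Python below is a minimal sketch of reference-based compression under a strong simplifying assumption (equal-length sequences that differ only by substitutions); it is not the algorithm used in the cited study, and real tools also handle insertions and deletions.

```python
# Minimal sketch of reference-based ("links and edits") compression.
# SIMPLIFYING ASSUMPTION: sequences match the reference length and differ only by
# substitutions; real tools also handle insertions and deletions.

def encode(reference, seq):
    """Store only the positions where `seq` differs from `reference`."""
    if len(seq) != len(reference):
        raise ValueError("this sketch handles equal-length sequences only")
    return [(i, b) for i, (a, b) in enumerate(zip(reference, seq)) if a != b]


def decode(reference, edits):
    """Rebuild a sequence from the reference plus its list of edits."""
    bases = list(reference)
    for i, base in edits:
        bases[i] = base
    return "".join(bases)


reference = "ACGTACGTACGTACGTACGT"
sample = "ACGTACGAACGTACGTACCT"
edits = encode(reference, sample)
print(edits)  # [(7, 'A'), (18, 'C')]
assert decode(reference, edits) == sample
```

Storing two edits instead of 20 bases is the whole gain here; for real genomes that differ from a reference at a small fraction of positions, the same principle gives large savings.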
Cloud Computing
For storage and processing
How to encourage sharing of data?
How to protect privacy?
Who will be the leading providers and users of these services?
How will this impact the overall health care industry?
◦ Might this globalize health care?
We can synthesize new forms of DNA
Make new drugs, crops, or materials
Test them
Then synthesize/design newer forms of DNA
Keep iterating and making better drugs, crops, and materials
Synthesizing DNA
The cost of synthesizing DNA is also dropping
http://singularityhub.com/2012/09/17/new-software-makes-synthesizing-dna-as-easy-as-using-an-ipad/
http://www.synthesis.cc/cgi-bin/mt/mt-search.cgi?blog_id=1&tag=Carlson%20Curves&limit=20
About 5 years behind sequencing
Most drugs are naturally occurring substances
But improvements in our knowledge of humans and other organisms, and reductions in the cost of sequencing and synthesizing DNA, increase the possibility of synthesizing drugs
◦ Begins with a “target”: a naturally existing cellular or molecular structure involved in the pathology of interest
◦ A common target is a protein whose function has become clear as a result of basic scientific research
◦ Sequence the DNA that encodes the protein and then synthesize a drug that acts on this protein (also based on scientific research)
Drug Discovery (1)
Gary Pisano, Science Business: The promise, the reality, and the future of biotech, Chapters 2 and 3
If we can reduce the cost of drug development, we can target smaller groups of people with drugs
How about synthesizing drugs for individuals?
How about understanding which diseases a human might be susceptible to by sequencing their DNA?
Even if we cannot synthesize drugs for individuals, maybe we can better assign drugs to individuals by better understanding which humans are susceptible to known side effects
◦ Most drugs have side effects
◦ DNA can tell us who might be susceptible to the side effects
Drug Discovery (2)
Gary Pisano, Science Business: The promise, the reality, and the future of biotech, Chapters 2 and 3
Gleevec treats chronic myeloid leukemia
◦ Blocks the activity of the protein BCR-ABL, which comes from an abnormal gene created by a fusion of parts of chromosomes 9 and 22
Crizotinib treats lung cancer
◦ Targets a mutated version of a gene called ALK, which encodes a protein that instructs lung cells to divide uncontrollably
Vemurafenib treats melanoma
◦ Attacks a protein that is generated by a mutated version of a gene called BRAF
Problems
◦ Many cancers are driven by more than one mutation, and genes involved in DNA repair are often among the mutated genes
◦ $100,000 for 4 doses of one drug
Nevertheless, DNA sequencing is helping scientists identify common genes for cancer
Examples
Source: Getting Close and Personal, Economist, January 4, 2014
Green Machines for Better Crops?
Better sensors (cameras, infrared, fluorescence, lasers) and mechanical controls enable complete control and measurement over crop growth
DNA sequencing and DNA synthesizing enable characterization and replication of high performing crops
Other biological materials?
◦ Cellulosic ethanol
◦ Algae
http://www.aber.ac.uk/en/media/departmental/ibers/facilities/phenomicscentre/BBC-FOCUS-NPPC-Feature.pdf
Source: https://www.soils.org/publications/cs/articles/46/2/528
Improvements in U.S. Corn Yields through New Seeds
Improvements in Yield for other Crops
U.S. Department of Agriculture and Michael Bomford, Crop Yield Projections
According to Science magazine, scale-up will not enable economic feasibility
It’s not just about
◦ making bio-fuels from the non-food part of the plant or
◦ scaling up the production in order to reduce cost
It’s also about developing better organisms
◦ Better cellulose that produces more ethanol per weight, while still enabling the plant to produce lots of food
◦ Better algae that consume more carbon dioxide and generate more energy per weight or area
Better Bio-fuels
http://www.theguardian.com/science/2012/jan/14/synthetic-biology-spider-goat-genetics
Spider silk is very strong
But it is difficult to harvest spider silk, partly because it is hard to raise spiders (they eat each other)
Scientists introduced the gene for spider silk into goats so that spider silk protein would be produced in their milk
Now spider silk protein is produced in the goats' milk, and scientists are trying to improve the results
It is expected that many other natural substances can be manufactured in this way
Synthesize Better Materials
http://www.theguardian.com/science/2012/jan/14/synthetic-biology-spider-goat-genetics
Enzymes, plastics, textiles, dyes
◦ Many of these are now made from fossil fuels but were once made from natural substances
Can we return to biological feedstocks?
◦ Modify yeast so that sugar can be turned into useful compounds such as malaria drugs and biofuels
◦ Bring a switch from fossil fuels to biological feedstocks such as sugar, starch, and cellulose
Other Types of Materials
Registry of Standard Biological Parts
◦ More than 10,000 parts
Can build complex systems from these parts
Genetically Engineered Machine Competition
◦ Students compete to build complex systems
◦ One group built a biological light detector with a resolution of 100 million pixels per square inch
Will biological systems ever compete with electronic systems?
Taking All of This One Step Further: Building Complex Systems
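To compare the biological light detector with electronic image sensors, "100 million pixels per square inch" can be converted into a pixel spacing. A quick Python check, assuming the pixels form a square grid (the slide does not describe the layout):

```python
import math

PIXELS_PER_SQ_INCH = 100e6   # from the slide
MICRONS_PER_INCH = 25_400

# ASSUMPTION: a square grid, so pixels per linear inch is the square root.
pixels_per_inch = math.sqrt(PIXELS_PER_SQ_INCH)       # 10,000 per linear inch
pixel_pitch_um = MICRONS_PER_INCH / pixels_per_inch    # spacing between pixels
print(f"{pixels_per_inch:,.0f} pixels per inch, pitch {pixel_pitch_um:.2f} um")
```

A pitch of about 2.5 microns is in the same micrometer range as the pixels of camera chips.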
DNA synthesizing equipment can be used to make (and replicate) DNA
One challenge is how to insert DNA into a cell so that the cell can then replicate itself
◦ Each cell contains the DNA needed for a specific organism
◦ Each cell may even contain the DNA for features that no longer exist, and these features can be turned back on
First done by Craig Venter’s team in May 2010
◦ His team synthesized an entire bacterial genome and “took over” a cell by inserting the DNA into the cell
Can this be done for more complex life forms?
How About “Creating” Life?
Source: Michio Kaku, Physics of the Future: How Science Will Shape Human Destiny and Our Daily Lives by the Year 2100 (2011)
More Complex Organisms Require More Base Pairs and Thus More Years Before They Can Be Synthesized
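If synthesis capacity grows exponentially, the wait for a larger genome grows with the logarithm of its size. The Python below is a hypothetical sketch: it starts from the roughly 1.08 million base pairs of the bacterial genome synthesized in 2010 and assumes, purely for illustration, that the largest synthesizable genome doubles every 2 years. The doubling time is not from the slides, and the outputs are not forecasts.

```python
import math

SYNTHESIZED_BP = 1.08e6   # approx. size of the bacterial genome synthesized in 2010
DOUBLING_YEARS = 2.0      # HYPOTHETICAL doubling time of synthesis capacity

targets_bp = {
    "yeast (~12 million bp)": 12e6,
    "human (~3 billion bp)": 3e9,
}


def years_until(target_bp, start_bp=SYNTHESIZED_BP, doubling=DOUBLING_YEARS):
    """Years for exponentially growing capacity to reach target_bp."""
    return doubling * math.log2(target_bp / start_bp)


for name, size in targets_bp.items():
    print(f"{name}: ~{years_until(size):.0f} years")
```

With these assumed numbers, a genome about 2,800 times larger needs only about 11 more doublings. That is the sense in which more complex organisms take more years but do not become impossible.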
The cost of sequencing and synthesizing DNA continues to fall
A major reason for the cost reductions is the benefits from reductions in scale
◦ Similar to those in ICs, bio-electronic ICs, and MEMS
◦ A powerful way to reduce costs
Further reductions in scale and thus further cost reductions appear possible
Conclusions (1)
Low-cost and small DNA sequencers and synthesizers will change drug discovery, health care, and science
◦ How will we do drug discovery in the future?
What kind of analyses can help us understand how these trends will change drug discovery and health care?
What kinds of opportunities will emerge for firms as vast amounts of data become available for analysis?
Conclusions (2)
Appendix
Single-cell genomics
◦ select the embryos created by IVF (in vitro fertilization) that have the best chance of developing into a healthy baby
Metagenomic medicine
◦ sequencing many different microbes en masse and then teasing out individual genomes to diagnose which ones are helping or harming human health
Specific Changes for in vitro fertilization
Nature 494, 21 February 2013, pp. 290-291