Faster, cheaper DNA sequencing

NOVEMBER 1, 2005 / ANALYTICAL CHEMISTRY 415 A © 2005 AMERICAN CHEMICAL SOCIETY

Seeing his newborn son gasp for breath in an intensive care unit pushed Jonathan Rothberg into the DNA sequencing business. Wishing he could analyze Noah’s genome for insight into the problem, he thought, “If only we could put sequencing on a chip, it would get twice as powerful and twice as fast every year, like computers.”

Fortunately, Noah did not have a genetic disease, but his father still pursued sequencing on a chip. The company Rothberg founded and now chairs, 454 Life Sciences Corp., makes machines that sequence DNA 100× faster and 10–100× cheaper than conventional technology, which is based on the Sanger method. Developed by Rothberg and colleagues at 454 Life Sciences; the University of California, Berkeley; the Rockefeller University; and the Rothberg Institute for Childhood Diseases, the Genome Sequencer 20 can sequence 20 million base pairs per hour (Nature 2005, 437, 376–380).

Cloning DNA in bacteria has always been a bottleneck in sequencing. Rothberg sidestepped the problem by drawing on the work of Andrew Griffiths, who conducted billions of separate experiments at the same time by using emulsion microdroplets as test tubes (Nat. Biotechnol. 1998, 16, 652–656). Rothberg and colleagues nebulize a genome into pieces, attach each fragment to a bead, and cocoon each bead in an emulsion of oil, water, and detergent. After PCR is performed, each bead carries 10 million copies of a unique piece of DNA. “So, you have a complete amplified genome in a few hours instead of a few months,” Rothberg says.
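The following is a rough check on the 10-million-copy figure. It assumes ideal PCR, in which every copy is duplicated in every cycle, starting from a single template molecule per droplet. Both are simplifying assumptions. Under them, the number of cycles n needed is:

```latex
2^{n} \ge 10^{7}
\quad\Longrightarrow\quad
n \ge \log_{2} 10^{7} = 7\log_{2} 10 \approx 23.3
\quad\Longrightarrow\quad
n = 24 \quad (2^{24} \approx 1.7\times10^{7})
```

Any loss of amplification efficiency raises the number of cycles required.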

To speed up sequencing, the researchers fashioned optical fibers into a chip the size of a credit card. The fibers create a honeycomb of wells so small that four can fit on the end of a human hair. Centrifuging the chip with the emulsion deposits the beads in the wells, at most one bead per well. To limit crosstalk, only about one-third of the 1.6 million wells on a chip are filled; this permits >400,000 pieces of DNA to be sequenced simultaneously. Smaller beads carrying the enzymes needed for pyrosequencing are also loaded into the wells.
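The loading figures can be checked with simple arithmetic. Filling about one-third of the 1.6 million wells gives:

```latex
\tfrac{1}{3} \times 1.6\times10^{6} \approx 5.3\times10^{5}\ \text{loaded wells}
```

This is consistent with the >400,000 (4×10⁵) fragments sequenced per run. The article does not say what fraction of loaded wells yields usable reads, so the gap between the two numbers should not be read as a measured yield.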

One of these enzymes is the firefly enzyme luciferase. When a base is incorporated to extend the DNA sequence, a pyrophosphate is liberated. Another enzyme converts the pyrophosphate to adenosine triphosphate, which enables luciferase to generate a visible light signal by oxidizing the pigment luciferin.
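The light-producing chain can be written as three coupled reactions. The identity of the second enzyme (ATP sulfurylase) and its co-substrate, adenosine 5′-phosphosulfate (APS), are taken from standard pyrosequencing chemistry. They are assumed here rather than stated in the text.

```latex
\begin{aligned}
\text{DNA}_{n} + \text{dNTP} &\xrightarrow{\text{polymerase}} \text{DNA}_{n+1} + \text{PP}_i \\
\text{PP}_i + \text{APS} &\xrightarrow{\text{ATP sulfurylase}} \text{ATP} + \text{SO}_4^{2-} \\
\text{ATP} + \text{luciferin} + \text{O}_2 &\xrightarrow{\text{luciferase}} \text{AMP} + \text{PP}_i + \text{oxyluciferin} + \text{CO}_2 + h\nu
\end{aligned}
```

Because one pyrophosphate is released per incorporated base, the light from a well scales with the number of bases added in that cycle.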

Users place the chip into the Genome Sequencer 20 as if they were inserting a CD into a drive. The top of the chip is exposed to computer-controlled cycles of fluid, each providing one of the four deoxyribonucleotides. The base of the chip sits on a camera that captures the photons emitted from the wells as pyrosequencing proceeds. For example, if guanine is provided in a cycle, the light signals reveal which wells have cytosine at that position on their DNA template.
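The cycle-by-cycle readout can be illustrated with an idealized simulation. This is a minimal sketch that assumes perfect chemistry and light exactly proportional to the number of bases added. It also assumes a repeating flow order of T, A, C, G, which is a hypothetical choice because the article does not give the order. For each flow, the sketch reports how many bases a single well would incorporate.

```python
# Idealized pyrosequencing readout (hypothetical flow order "TACG", perfect chemistry).

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def ideal_flowgram(template: str, flow_order: str = "TACG", n_flows: int = 8) -> list[tuple[str, int]]:
    """Return (nucleotide flowed, bases incorporated) for each flow.

    `template` is written in the order the polymerase copies it, so the
    nucleotide needed at each step is the complement of that template base.
    """
    needed = [COMPLEMENT[base] for base in template]
    pos = 0
    signals = []
    for i in range(n_flows):
        nucleotide = flow_order[i % len(flow_order)]
        count = 0
        # The polymerase runs through a whole homopolymer stretch in one flow,
        # so a run of identical bases gives a proportionally brighter signal.
        while pos < len(needed) and needed[pos] == nucleotide:
            count += 1
            pos += 1
        signals.append((nucleotide, count))
    return signals


def decode(flowgram: list[tuple[str, int]]) -> str:
    """Rebuild the newly synthesized strand from the per-flow base counts."""
    return "".join(nucleotide * count for nucleotide, count in flowgram)


flows = ideal_flowgram("CCATG")
print(flows)          # [('T', 0), ('A', 0), ('C', 0), ('G', 2), ('T', 1), ('A', 1), ('C', 1), ('G', 0)]
print(decode(flows))  # GGTAC
```

As in the text, the G flow lights up the well whose template reads C at that position. Because the template has two consecutive C's, that flow produces a double-intensity signal rather than two separate signals.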

One sequencing run involving 50–100 cycles (enough to sequence a large bacterium) takes ~4 h and costs $5000 for the reagents and a chip. The machine itself costs $500,000. When reading >100 bases in test fragments of DNA, the researchers obtained an accuracy of ~99.4% for individual bases. With 4-fold coverage, the consensus accuracy was 99.99%.
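Per-base accuracies like these are often expressed on the Phred scale, Q = −10 log10(P_error). On that scale, Q20 corresponds to a 1-in-100 error rate and Q40 to a 1-in-10,000 rate. The article does not say which quality scale 454's software reports. The conversion below is only a convenient way to compare the figures quoted above.

```python
import math


def phred_quality(accuracy: float) -> float:
    """Phred-scaled quality, Q = -10 * log10(probability of error)."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must lie strictly between 0 and 1")
    return -10.0 * math.log10(1.0 - accuracy)


# Accuracies quoted for 454 test fragments
for label, accuracy in [("single read", 0.994), ("4-fold consensus", 0.9999)]:
    print(f"{label}: Q{phred_quality(accuracy):.0f}")
# single read: Q22
# 4-fold consensus: Q40
```

By the same formula, the 99.97% and 99.996% consensus figures reported for Mycoplasma genitalium correspond to roughly Q35 and Q44.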

Two potential sources of inaccuracy are that nucleotides sometimes remain in wells for more than one cycle and that some new chains of DNA stop extending prematurely. However, the company developed algorithms to correct for these problems and devised a program to assess base-call quality.
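The effect of chains falling behind can be illustrated with a simplified model. This is a hypothetical sketch, not 454's correction algorithm. In each flow, a fraction `eps` of the strands that could extend fail to do so, and a strand that does extend copies the whole homopolymer run. Carry-forward of leftover nucleotides is not modeled. The flow order "TACG" is again an assumption.

```python
# Expected per-flow signal from a population of identical strands when
# a fraction `eps` of eligible strands fails to extend in each flow (hypothetical model).

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def expected_signals(template: str, eps: float, flow_order: str = "TACG", n_flows: int = 8) -> list[float]:
    needed = [COMPLEMENT[base] for base in template]
    population = {0: 1.0}  # template position -> fraction of strands waiting there
    signals = []
    for i in range(n_flows):
        nucleotide = flow_order[i % len(flow_order)]
        signal = 0.0
        next_population: dict[int, float] = {}
        for pos, frac in population.items():
            # Length of the homopolymer run this nucleotide can fill from `pos`
            run = 0
            while pos + run < len(needed) and needed[pos + run] == nucleotide:
                run += 1
            if run:
                extended = frac * (1.0 - eps)
                signal += extended * run
                next_population[pos + run] = next_population.get(pos + run, 0.0) + extended
                # Strands that failed to extend stay behind, out of phase
                next_population[pos] = next_population.get(pos, 0.0) + frac * eps
            else:
                next_population[pos] = next_population.get(pos, 0.0) + frac
        population = next_population
        signals.append(round(signal, 6))
    return signals


print(expected_signals("CCATG", eps=0.0))  # [0.0, 0.0, 0.0, 2.0, 1.0, 1.0, 1.0, 0.0]
print(expected_signals("CCATG", eps=0.1))  # [0.0, 0.0, 0.0, 1.8, 0.81, 0.729, 0.6561, 0.18]
```

With eps = 0.1, the two-base G run yields 1.8 instead of 2, and later signals decay. The lagging strands also produce a spurious 0.18 signal in the next G flow, where the ideal readout is 0. Correction algorithms have to account for out-of-phase signal of this kind.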

The 454 Life Sciences Corp. sequencing instrument consists of (a) a CCD imaging system, (b) a flow chamber with a fiber-optic slide, and (c) a fluidic system. (Adapted with permission. Copyright 2005 Nature Publishing Group.)

To test the technology on a small bacterium, the researchers sequenced the Mycoplasma genitalium genome. The average length of a sequence was 110 bases, and the DNA was oversampled 40-fold; that is, each position was sequenced redundantly, about 40 times on average, to reduce errors. For 10 contiguous stretches covering 99.94% of the genome, the consensus accuracy was 99.97%. With stricter quality criteria, 98.1% of the genome was covered with a consensus accuracy of 99.996%.
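Oversampling helps because an error in one read is usually outvoted by the other reads covering the same position. The sketch below shows a minimal column-by-column majority vote. It assumes the reads are already aligned to a common length, and it ignores insertions and deletions. It is not the assembly software the researchers used.

```python
from collections import Counter


def majority_consensus(reads: list[str]) -> str:
    """Column-wise majority vote over reads that are already aligned.

    A tie between the two most common bases is reported as 'N' (ambiguous).
    """
    if not reads:
        return ""
    if len({len(read) for read in reads}) != 1:
        raise ValueError("reads must be pre-aligned to a common length")
    calls = []
    for column in zip(*reads):
        ranked = Counter(column).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            calls.append("N")
        else:
            calls.append(ranked[0][0])
    return "".join(calls)


# Four reads of the same stretch; the third carries a single-base error.
reads = ["ACGTTGCA", "ACGTTGCA", "ACGATGCA", "ACGTTGCA"]
print(majority_consensus(reads))  # ACGTTGCA
```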

George Weinstock at Baylor College of Medicine says that the main advantage of the Genome Sequencer 20 is its tremendous throughput: >200,000 samples in a 4-h run versus 96 samples/h with conventional technology. Elaine Mardis at Washington University in St. Louis agrees; she adds that skipping the bacterial cloning step not only saves time but also removes cloning bias, which is the production of amplified DNA that lacks some pieces of the genome because bacteria can’t copy them. Clones can be useful, however, because they can be sequenced from both ends, so Weinstock is trying to find other ways to incorporate that function.
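Putting the two figures Weinstock quotes on the same 4-h time basis gives:

```latex
\frac{2\times10^{5}\ \text{samples per 4-h run}}{96\ \text{samples/h} \times 4\ \text{h}}
= \frac{200{,}000}{384} \approx 521
```

In other words, the Genome Sequencer 20 handles more than 500 times as many samples in the same 4 hours.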

The main disadvantages of the new approach, Weinstock says, are the lower accuracy rates for individual reads and the short read length, which makes it difficult to read across repeated sequences. Therefore, he says that scientists may want to hang onto their current DNA sequencers so that they can identify human mutations and polymorphisms.

Nevertheless, short read lengths will be perfectly adequate for many other applications, such as sequencing copies of exons, Weinstock adds. Mardis is already using the machine for digital karyotyping, which counts how many times a snippet of DNA occurs in a genomic region. With conventional methods, researchers must sequence large amounts of DNA to cull such snippets. “But that isn’t a problem if you can produce 200,000 to 400,000 in a single run on the 454 machine,” Mardis says.
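The counting step of digital karyotyping can be sketched as binning mapped snippets, or tags, into fixed genomic windows. This is a simplified illustration, not Mardis's pipeline. It assumes each tag has already been mapped to a single genome coordinate. The 2-fold threshold used to flag unusual windows is a hypothetical choice.

```python
from collections import Counter
from statistics import median


def tags_per_window(tag_positions: list[int], window_size: int) -> dict[int, int]:
    """Count mapped tags in non-overlapping windows, keyed by window start coordinate."""
    counts = Counter((pos // window_size) * window_size for pos in tag_positions)
    return dict(sorted(counts.items()))


def unusual_windows(counts: dict[int, int], fold: float = 2.0) -> list[int]:
    """Flag windows whose count is at least `fold` times above or below the median.

    The 2-fold default is a hypothetical threshold chosen for illustration.
    """
    typical = median(counts.values())
    return [start for start, n in counts.items() if n >= fold * typical or n <= typical / fold]


positions = [5, 12, 15, 25, 27, 28, 29, 31]
counts = tags_per_window(positions, window_size=10)
print(counts)                   # {0: 1, 10: 2, 20: 4, 30: 1}
print(unusual_windows(counts))  # [20]
```

Windows with no tags are omitted from the dictionary, so a real analysis would also need to account for empty regions.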

Meanwhile, 454 Life Sciences has sequenced >100 genomes of bacteria, fungi, and plants. The company now has perfect reads on fragments longer than 500 bases and has sequenced >100 million bases in a single run, according to Rothberg.

Do-it-yourself sequencing

Another new sequencing method, which deciphers very short pieces of DNA, was recently described by Jay Shendure, Gregory Porreca, and colleagues at Harvard Medical School and Washington University in St. Louis (Science 2005, 309, 1728–1732). The researchers developed multiplex polony sequencing, which also amplifies DNA fragments on beads in emulsion droplets. But an epifluorescence microscope is used to analyze the beads in a layer of acrylamide gel on a microscope slide. “This system can be built by anyone,” Porreca says. “All the components are off the shelf, and the reagents are standard.”

The setup includes a syringe pump, autosampler, flow cell, epifluorescence microscope, and modules for temperature control and automated washing. Assembling one costs ~$140,000. The upfront cost is ~$20,000 for fluorescently labeled oligonucleotides.

First, researchers shear a genome and select 1-kb fragments. Using an endonuclease, they insert universal sequences between and flanking two unique 17–18-bp pieces, or tags, from each fragment. The resulting constructs are amplified on beads. After the beads are immobilized in a flow cell on a microscope slide, an anchor primer for DNA ligase is hybridized to the sequence immediately before or after one of the two tags. To the beads, the researchers add the ligase and a mixture of nonamers, each color-coded with a fluorescent dye that reveals the identity of the base at a chosen position. For example, nonamers color-coded according to the fifth of the nine bases identify a base five positions away from the anchor primer.
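The position-specific color code can be mimicked in a few lines. This is an idealized sketch with several assumptions. Only the perfectly complementary nonamer ligates. The template is read outward from the anchor, ignoring strand orientation. The dye names are hypothetical placeholders, since the article does not say which dye marks which base.

```python
# Idealized model of ligation-based queries in multiplex polony sequencing.
# Dye names are hypothetical placeholders.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
DYE_FOR_BASE = {"A": "dye_1", "C": "dye_2", "G": "dye_3", "T": "dye_4"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}


def ligated_dye(template_after_anchor: str, query_pos: int) -> str:
    """Dye seen on a bead when the nonamer pool is color-coded at `query_pos` (1-9)."""
    if not 1 <= query_pos <= 9:
        raise ValueError("a nonamer has query positions 1 through 9")
    if len(template_after_anchor) < 9:
        raise ValueError("need at least 9 template bases next to the anchor")
    # Only the nonamer complementary to the 9 template bases beside the anchor ligates.
    nonamer = "".join(COMPLEMENT[base] for base in template_after_anchor[:9])
    # The dye reports just one of its nine bases: the one at the coded position.
    return DYE_FOR_BASE[nonamer[query_pos - 1]]


def read_positions(template_after_anchor: str, n_positions: int) -> str:
    """One query cycle per position: call the complementary base at positions 1..n."""
    return "".join(
        BASE_FOR_DYE[ligated_dye(template_after_anchor, k)] for k in range(1, n_positions + 1)
    )


template = "ACGTTGCAA"
print(ligated_dye(template, 5))     # dye_1
print(read_positions(template, 7))  # TGCAACG
```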

Rastering over the slide, the microscope collects bright-field and fluorescent images for each of the four fluorophores. Image analysis determines which color, and therefore which complementary base, hybridized to the first base of a tag on a particular bead. The next cycle identifies the second base in that tag, and so on. Because it is possible to sequence 7 bases from one end of a tag and 6 bases from the other, the 2 tags on each bead generate 26 bases of noncontiguous sequence. Furthermore, a microscope slide can hold millions of beads.
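The 26-base figure follows from reading 7 + 6 bases from each of two tags. In the sketch below, which end yields 7 bases and which yields 6 is an assumption, and the tag sequences are hypothetical. The sketch also shows why the read is noncontiguous: the middle 4 or 5 bases of a 17- or 18-bp tag go unread.

```python
def readable_bases(tag: str, from_one_end: int = 7, from_other_end: int = 6) -> str:
    """Bases reachable in one tag: 7 from one end plus 6 from the other."""
    if len(tag) < from_one_end + from_other_end:
        raise ValueError("tag too short for the requested read lengths")
    return tag[:from_one_end] + tag[-from_other_end:]


def bead_read(tag_1: str, tag_2: str) -> str:
    """Combine the readable bases of both tags on one bead."""
    return readable_bases(tag_1) + readable_bases(tag_2)


tag_1 = "ACGTACGTACGTACGTA"   # hypothetical 17-bp tag
tag_2 = "TTGCATTGCATTGCATTG"  # hypothetical 18-bp tag
print(len(bead_read(tag_1, tag_2)))  # 26
```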

Shendure and Porreca tested the technology by sequencing an E. coli mutant and comparing it with the known wild-type sequence. They mapped ~1.16 million reads, covering ~30.1 million bases, to the reference sequence with a median raw accuracy of 99.7%. A consensus sequence for 70.5% of the genome revealed 6 discrepancies between the mutant and reference genomes, which were confirmed as bona fide mutations. Preparing and arraying the beads took 2 days, and the 26 runs took 60 h. The cost per raw kilobase was 11¢, which included 3¢/kb for library construction.
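The read and base counts are consistent with the 26 bases that each bead yields:

```latex
1.16\times10^{6}\ \text{reads} \times 26\ \tfrac{\text{bases}}{\text{read}} \approx 3.0\times10^{7}\ \text{bases}
```

This agrees with the ~30.1 million bases reported.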

“Multiplex polony sequencing wasn’t designed to replace Sanger sequencing,” Porreca explains. “The method delivers short sequences at low cost and with high accuracy,” he says. “So the best applications right now are bacterial resequencing and applications such as SAGE [serial analysis of gene expression, in which] you need to sequence bar codes.” Although critics say that the method cannot detect mutations such as insertions and large deletions, George Church, the study’s senior author, asserts that it can. Porreca adds that the technology’s primary advantages are its low cost and high consensus accuracy.

Weinstock predicts an explosion of sequencing technologies in the next few years: “I suspect a fair number of those will succeed, and one will use different sequencing approaches for different applications.”

Linda Sage

False-color image of multiplex polony sequencing on beads. (Credit: Jay Shendure and Gregory Porreca.)