Bringing SUPERCOMPUTERS to LIFE (Sciences)

BY HANNAH HICKEY

Their very names sound like dinosaurs. Teracomputers. Petacomputers. These are, in fact, the dinosaurs of the digital world—monstrous, hungry and powerful. But unlike the extinct Tyrannosaurus Rex, these silicon beasts are state of the art. Housed in cavernous rooms that require their own electrical and ventilation systems, row upon row of humming boxes solve trillions of calculations every second.

In the late 20th century, such silicon giants revolutionized engineering and scientific research from aerospace to weather prediction. Now, supercomputing is extending its reach into the life sciences. Super-sized brains are necessary to interpret the new flood of data from high-throughput machines. Supercomputers have also made possible entirely new fields of study, such as whole-genome comparisons, protein folding, and protein-protein interactions inside the cell.

Their promise is undeniable. Vast computing power allows modelers to zoom in and simulate the behavior of individual proteins, and perhaps soon entire cells, at the atomic scale. Researchers can study sub-cellular interaction, watch it in slow motion, or blow it up to fill a dual-screen monitor. Soon, high-resolution flow models will help build medical implants and direct surgical operations. Big silicon machines might even design drugs to cure humanity's worst diseases.



Supercomputing in Science: A Timeline

1950s to 1960s: The roots of supercomputing

1955: Physicists devise computer code for a global circulation model, and by the mid-1960s are using the largest available computers to run global-scale climate simulations.

1960s: The term "supercomputer" enters the lexicon as IBM rolls out the 7030 (aka "Stretch") and Control Data Corporation releases its CDC 6600.

1970s to 1980s: Supercomputers integrated into climatology, astrophysics, and aeronautics

1976: The legendary Cray-1 supercomputer is installed at Los Alamos National Laboratory, where it is used to simulate nuclear explosions.

1977: The National Center for Atmospheric Research purchases a Cray-1 supercomputer, which operates for the next 12 years running climate simulations.

Early 1980s: Astrophysicists use supercomputers to simulate galaxy formation.

1980s: Large-scale computing provides an alternative to wind tunnels in aeronautics research. By the 1990s, computers have virtually replaced wind tunnels.



While ordinary computers have already changed the study of life, supercomputers open up new horizons, offering the possibility of discovering new ways to understand life's complexity.

FROM BIG IRON TO ARMIES OF ANTS

To solve mammoth calculations, scientists have traditionally booked time on "Big Iron" custom machines housed at national supercomputing centers or universities. Today the landscape is shifting. These mammoth machines, though not extinct, are facing tough competition.

"It used to be that the power of those machines [at supercomputing centers] was many orders of magnitude more than what anybody had access to," says Philip Bourne, PhD, professor of pharmacology at the University of California in San Diego and editor-in-chief of PLoS Computational Biology. "Now that's not true anymore—computing is really cheap." Alternatives exist in a thriving range of home-built, borrowed or networked systems. Many researchers choose to buy a cluster of off-the-shelf processors rather than wait for time on a "Big Iron" machine.

In 2003, students at Virginia Polytechnic Institute in Blacksburg, Virginia, helped build one of the world's fastest machines by assembling 1,100 PowerMac G5 processors. At the time it was the third-fastest computer in the world, and the $7 million price was a bargain compared to a retail price of more than $200 million for an equivalent big iron computer. Similar clusters continue to sprout up every year. The most recent Top500 list, a twice-yearly tally of the world's 500 fastest computers, shows that networked, off-the-shelf processors now claim 72 percent of the positions.

Driving this trend is the frustrating evolution of supercomputers. Since the 1990s, spurred by economics, supercomputers themselves became vast assemblages of small processors. "What we [scientists] wanted was one computer that was much faster. What we got was a lot of computers," comments Vijay Pande, PhD, associate professor of chemistry and of structural biology at Stanford University. The world's fastest machine, IBM's Blue Gene, now incorporates a whopping 131,072 individual processors. Each one is relatively slow, even compared to what's offered in new laptops, but energy-efficient, which allows the processors to be packed into a small space without overheating.

Massively parallel machines have many downsides. For one thing, the total speed of a single processor is sometimes less important than how quickly individual processors can communicate. This shuffling back and forth of information becomes a bottleneck for the speed of the system. It also means that the entire system runs only as quickly as the slowest processor on the machine—a weakest-link rule known as Amdahl's Law.
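As a rough illustration of why more processors stop helping, here is a minimal Python sketch of the textbook form of Amdahl's Law; the 95 percent parallel fraction is an invented example, not a figure from the article:

```python
# Textbook Amdahl's Law: the speedup from n processors when a fraction p
# of the work can be parallelized and the rest must run serially.
def amdahl_speedup(p: float, n: int) -> float:
    return 1.0 / ((1.0 - p) + p / n)

# Illustrative only: even if 95% of a code parallelizes perfectly,
# the serial 5% caps the speedup near 20x no matter how many
# processors (e.g., Blue Gene's 131,072) are thrown at it.
for n in (100, 1000, 131072):
    print(f"{n:>7} processors -> {amdahl_speedup(0.95, n):5.1f}x speedup")
```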

Supercomputers today are like "armies of ants," says Allan Snavely, PhD, director of the Performance Modeling and Characterization Laboratory at the San Diego Supercomputing Center. To enlist these ants, computer code will first have to be parallelized—split up into instructions that multiple processors can handle simultaneously. The difficulty of dividing up the problem means a supercomputer with 100 processors won't be able to solve a problem 100 times as fast. And today's "massively parallel" supercomputers don't just incorporate 100 processors, but thousands of processors. Running on these machines often means tweaking the code yet again, says Mark Miller, PhD, a biology researcher at the San Diego Supercomputing Center.


Supercomputing in Science: A Timeline (continued)

1985: The U.S. National Science Foundation establishes five national supercomputing centers: the Cornell Theory Center at Cornell University; the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign; the Pittsburgh Supercomputing Center at Carnegie Mellon University and the University of Pittsburgh; the San Diego Supercomputer Center at the University of California, San Diego; and the John von Neumann Center at Princeton University.

1996 to 2006 and beyond: Supercomputers extend their reach to biology

1996: Research Collaboratory for Structural Bioinformatics sets up shop in the UCSD Supercomputer Center.

2000: Folding@Home project launched to investigate protein folding mechanisms. It's now the world's most powerful grid computing endeavor, solving up to 200 trillion calculations per second.

2004: Blue Gene, IBM's flagship machine, outranks Japan's Earth Simulator computer as the world's fastest computer. It can sustain 36 trillion calculations per second.

2005: IBM and the Ecole Polytechnique Fédérale de Lausanne launch "Blue Brain," which aims to model the human neocortex on one of the BlueGene/L machines.

2010: National Science Foundation's targeted date to roll out a petaflop computer for science and engineering. The machine will solve a quadrillion (10^15) calculations per second.





Large-scale supercomputing centers' importance will shift from renting time on computers to offering technical expertise, Bourne predicts, helping scientists run code on a parallel machine. Also, as journals increasingly require placing data in a public database, supercomputing centers can fill that void. "The ability to store large amounts of data, that value has increased dramatically," Bourne says.

SPREADING THE LOAD TO VOLUNTEER COMPUTERS

Today, many of the most crushingly difficult scientific computing problems aren't being solved in supercomputing centers or on university clusters. They're as likely to be solved in your living room. Take, for example, the quest to unlock the mysteries of protein folding: Predicting how a string of amino acids will curl up into the same structure every time is one of biology's holy grails. If we could do this, we might design drugs to fit particular targets, understand diseases of protein misfolding, and be able to visualize unknown proteins from their amino acid sequence.

To run models of protein folding at an atomic scale requires making calculations every femtosecond—one billionth of a microsecond—in order to capture atomic vibrations. But the folding process, like many things in biology, happens much more slowly—on the order of microseconds or milliseconds. This means an atomic model of protein folding from start to finish requires a billion to a trillion steps. Also, the typical protein comprises hundreds of amino acids, each of which exerts a force on every other amino acid. Finding the lowest energy configuration for all of these amino acids is what's called an NP-hard problem. Such problems become exponentially more difficult with every extra piece of data and so approximate solutions are typically sought.
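The step count is simple arithmetic to reproduce. A back-of-the-envelope sketch in Python, using only the time scales quoted above:

```python
# Back-of-the-envelope count of simulation steps for protein folding,
# using the time scales quoted in the text.
FEMTOSECOND = 1e-15          # seconds per simulation timestep
MICROSECOND = 1e-6           # fast end of observed folding times
MILLISECOND = 1e-3           # slow end of observed folding times

steps_fast = MICROSECOND / FEMTOSECOND   # 1e9  -> a billion steps
steps_slow = MILLISECOND / FEMTOSECOND   # 1e12 -> a trillion steps
print(f"{steps_fast:.0e} to {steps_slow:.0e} timesteps per folding event")
```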

Some enterprising protein-folding projects recruit volunteers' unused PC processing time—an idea pioneered by the SETI@home project and now referred to as "grid computing."

"It's probably best thought of as a supercomputer but with radically different architecture," says Vijay Pande, who leads the Folding@Home project, now the largest grid computing venture in the world. With more than 180,000 member CPUs, Folding@Home commands more raw FLOPS (floating point operations per second, a measure of computer power) than all the supercomputing centers combined—up to 200 trillion calculations per second—and transfers 50 gigabytes of data every day. Pande wants to understand the nature of protein folding to better understand why proteins sometimes misfold, causing diseases like Alzheimer's and cystic fibrosis.
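Those aggregate figures imply only modest demands on each volunteer machine. A quick sanity check in Python, using the numbers quoted above (actual per-host contributions vary widely):

```python
# Rough per-host arithmetic for Folding@Home, from the figures in the text.
total_flops = 200e12         # up to 200 trillion calculations per second
member_cpus = 180_000        # participating CPUs
daily_transfer_gb = 50       # gigabytes moved per day

print(f"~{total_flops / member_cpus / 1e9:.1f} GFLOPS per volunteer CPU")
print(f"~{daily_transfer_gb * 1000 / member_cpus:.2f} MB per CPU per day")
```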

Other protein-prediction codes running in a home office near you include Rosetta@home, based at the University of Washington in Seattle, which predicts structures for proteins of unknown function; Predictor@home, based at the Scripps Research Institute in San Diego, which compares different structure-prediction algorithms; and the Human Proteome Project, out of the Institute for Systems Biology in Seattle, which predicts structures for human proteins. In summer 2006 CERN, in Geneva, announced a project to study malaria on the grid, and Israeli scientists hope to map genetic diseases.


With collaborators at Fujitsu, Folding@Home published results showing the initial modeled structure of a protein that is the target of immunosuppressive drugs (FKBP) in complex with a small molecule ligand (left); and the final structure after a 20 nanosecond simulation (right). In this and other work, Folding@Home has demonstrated that atomistic models of biologically relevant systems can be calculated with a useful level of precision and accuracy by bringing several orders of magnitude more computational power to the problem. This work is allowing important advances in rigorous physical drug-binding prediction. Courtesy of Hideaki Fujitani, Fujitsu.




What's a supercomputer?

The definition of "supercomputer" is fluid—it just means a machine that's among the world's fastest. Not only is the world's fastest machine always changing, but so is the architecture for creating a supersized computer.

"BIG IRON" supercomputers are the traditional supercomputers: custom-built machines housed in refrigerator-like boxes. They first emerged in the 1980s, produced by Cray, Inc. These custom supercomputers still lead the Top500 list of the world's fastest machines. Because they share information and data quickly between processors, they can tackle the most complex problems. IMAGE: The "Q" supercomputer, used by researchers at Los Alamos National Laboratory to simulate a ribosome manufacturing a protein. Courtesy of Los Alamos National Laboratory.

" CLUSTERS connect tens, hundreds, and in some casesthousands of off-the-shelf PCs. Software codes, typicallywritten in LINUX, provide communication. These aresometimes called “PC farms,” or “Beowulf clusters,” afterthe first systems of this type. Clusters are a much cheaperway to boost computing power. IMAGE: Photos of a teamassembling the 1,100-processor cluster at VirginiaPolytechnic Institute in 2003. Courtesy of Ken Wieringo, VPI.

AN IN-HOUSE NETWORK (not pictured) is created when an organization connects its computers together, letting users borrow each other's computing power. Such a system is a type of in-house "grid," in analogy with the electrical grid, which shares a resource between many intermittent users. Many businesses, including pharmaceutical companies, digital animation studios and financial-investment firms, have networked employees' desktop machines to create an in-house supercomputer, essentially for free.

GRID COMPUTING uses unrelated computers to solve pieces of a giant calculation. Volunteers sign up over the Internet to donate their unused processing cycles. SETI@home, the pioneer, is still scanning radio waves for signs of intelligent life. Other projects predict the effects of global warming (Climateprediction.net), look for prime numbers (Great Internet Mersenne Prime Search) or detect gravitational waves from spinning neutron stars (Einstein@Home), to name a few. Biology projects include Folding@Home, fightAIDS@Home, and the United Devices Cancer Research Project. CERN plans to use this architecture to store and analyze data from the Large Hadron Collider beginning in 2007. IMAGE: Computers all over the world are working on the protein-folding problem. This map shows the distribution of IP addresses as of November, 2004. Courtesy of Vijay Pande, Folding@Home.





The benefits of such a scheme are obvious. The Human Proteome Project recently finished rough predictions for all the proteins in the human genome in a single year—a job that would have taken a century on the available laboratory cluster. Buying equivalent computing time for Folding@Home from a company like Sun Microsystems would cost $1.5 billion a year, Pande says.

But it's an open question how many codes will work on a motley collection of home computers, accommodate unpredictable run times, and tolerate infrequent communication. Problems that work best on the grid are the ones that don't require a lot of back-and-forth communication. SETI@home is a classic example; each user runs the same pattern-recognition algorithm on a different chunk of radio-wave output. In geek speak, this is an "embarrassingly parallel problem"—one that can easily be split into independent tasks on many processors.
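To make "embarrassingly parallel" concrete, here is a minimal Python sketch of the pattern; the chunking and the analyze function are invented stand-ins, not code from SETI@home or any project described here:

```python
# Embarrassingly parallel pattern: run the same analysis independently
# on many chunks of data, with no communication between workers.
from multiprocessing import Pool

def analyze(chunk):                       # hypothetical per-chunk analysis
    return sum(x * x for x in chunk)      # stand-in for real signal processing

if __name__ == "__main__":
    chunks = [range(i, i + 1000) for i in range(0, 100_000, 1000)]
    with Pool() as pool:                  # one worker per local CPU core
        results = pool.map(analyze, chunks)   # each chunk handled independently
    print(len(results), "independent results combined at the end")
```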

Embarrassing or not, many biological computing problems may eventually become parallelized. "In biology you're looking at a very large number of small bits of data," Bourne says. And clever algorithms may succeed in running even complex problems on the grid. "Protein folding was not something that I think people would have thought could be broken up," Pande says. "My gut feeling is that there will be many things that could be suited to this type of technology."

It's a question of being on the "leading edge" of science versus the "bleeding edge," he admits. "A lot of people don't want to get cut by the bleeding edge." Many scientists are wary of investing time in a technology that's in its infancy. To ease the transition, the Berkeley Open Infrastructure for Network Computing (BOINC), which is funded by the NSF, offers free CPU-scavenging code to interested researchers. The Open Grid Forum, launched in June 2006, aims to establish standards and promote grid computing in the research community. And the World Community Grid provides free coordination for distributed computing projects that have a humanitarian bent. Since its launch in 2004 the World Community Grid has hosted fightAIDS@home and the Human Proteome Folding Project.

BIOLOGICAL SIMULATIONS

Enthusiasm for grid computing must be tempered by realism. Some problems will never run on the grid. In particular, some large-scale simulations and visualizations are just too convoluted to split up. Every component is constantly interacting with every other part. In a recent simulation of the human heart at the San Diego Supercomputing Center, the flagship machine spent 99 percent of its time twiddling its thumbs (at a billion cycles per second) waiting to receive its neighbor's results. Running this problem on a grid, where communication takes seconds rather than nanoseconds, would be an exercise in frustration.

In 1995, fewer than one in 20 researchers using the San Diego Supercomputing Center was a biologist. By 2005, that number had quadrupled to almost one in five, and government labs are seeing a similar trend. Last October, researchers at Los Alamos National Laboratory in New Mexico completed the first biological simulation to incorporate more than a million atoms: They used Newton's laws of motion to watch the 2.64 million atoms of the ribosome manufacturing a protein. Such atomic-scale simulations allow researchers to mimic experiments in silico, observing processes at slower speeds or at a magnified scale. Biologists at IBM Research now use their Blue Gene machine largely for molecular dynamics applications, says Robert Germain, PhD, a staff researcher at IBM TJ Watson Research Center near New York City.

Top of the FLOPS

The widely quoted Moore's Law predicts that processing power will double every 18 months. So far the trend, attributed to Intel cofounder Gordon Moore, has held true. Processors continually speed up and supercomputers combine them in ever larger numbers. Today's fastest computers, including the Blue Gene machines, are at the teraflop scale—one trillion calculations every second.

But engineers already have their sights set on the next benchmark: petascale computers, which would be a thousand times faster, performing one quadrillion calculations per second. The National Science Foundation announced it would enable petascale computing for science and engineering by the year 2010. Many scientists say they could occupy a machine of that size with existing calculations.
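A thousandfold jump amounts to about ten doublings of processing power. A quick Python sketch of that arithmetic, using the 18-month doubling period quoted above (illustrative only; it counts doublings from a single teraflop, not a forecast for any particular machine):

```python
# How long does Moore's Law-style doubling take to go from 1 teraflop
# to 1 petaflop? Uses the 18-month doubling period quoted in the text.
import math

speedup_needed = 1e15 / 1e12                 # petaflop / teraflop = 1000x
doublings = math.log2(speedup_needed)        # ~10 doublings
years = doublings * 1.5                      # 18 months per doubling
print(f"{doublings:.1f} doublings, roughly {years:.0f} years")
```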

Some question whether Moore's Law will eventually reach a limit. At some point, computers can't pack more processing power into a small space without overheating the components. On the other hand, machines can't be so widely dispersed that information, which is limited by the speed of light, takes too long to travel from one processor to another.

Quantum computers and DNA computers may someday introduce new technologies, even as today's machines reach their physical limits. "Most likely while we're sitting around debating how much further we can go with silicon computing, some genius is on the verge of a radical new invention," says Allan Snavely.

Graphs of the top 500 computers in the world showing that cluster architectures are becoming more common (left) and that they are made up of an increasing number of individual processors (right). Courtesy of Top500.org.





A recent detailed simulation of the membrane protein rhodopsin, which used about a third of their machine's mammoth computation power, suggested that water molecules may play an active role in its function.

"I think we will model larger and larger biological systems," Germain predicts. He also sees the models themselves improving. While simulating a living thing is not inherently different from recreating a physical event—exploding galaxies, say, or air flowing over an airplane wing—biology has more complex structure. Kevin Sanbonmatsu, PhD, the Los Alamos researcher who ran the ribosome simulation, began his career in physics, but appreciates biology's challenges. When writing the code to model a ribosome, Sanbonmatsu says, he had many more types of atoms that had to be placed in specific locations than if he were modeling a semiconductor.

The toughest demands for a combination of size and speed may come from clinical practice. "We have an insatiable appetite for high-performance computing," says Charles Taylor, PhD, associate professor of bioengineering and surgery at Stanford University. His group solves fluid-dynamics equations that model blood flow through arteries. Beginning with a 3D image from a patient, Taylor recreates the inner workings of large arteries at millimeter-scale resolution, problems which incorporate 5 million to 10 million variables, each depending on all the others. Someday he hopes a surgeon could compare different options in the computer to decide on the best procedure for a particular patient.

Unfortunately, today even Taylor's dedicated, 64-processor SGI supercomputer struggles when confronting a scenario with medical complications. An aortic arch with turbulent flow downstream requires calculating every 10 microseconds, meaning it takes 10 thousand or 100 thousand steps to complete a single cardiac cycle.

"You want to be able to turn these around really quickly," he says. Today's computers take days to run the model; doctors would like to compare multiple treatment options in just a few hours.


IBM researchers ran molecular dynamics simulations on Blue Gene that show the protein rhodopsin (silver ribbon) interacting with specific omega-3 fatty acids in the surrounding membrane. The work suggests that fatty acids play a role in rhodopsin's function as the protein receptor primarily responsible for sensing light. This simulation ran for two million timesteps of one femtosecond (one quadrillionth of a second) each. Membrane-protein research commands one third of the Blue Gene supercomputer's nodes. Courtesy of Michael Pitman, IBM Research.



The computing power necessary to do that is likely on the horizon, he says. Taylor serves on a government panel looking to integrate supercomputers in the medical device industry, the way aerospace and car manufacturers did in the past. He says, "I feel pretty confident that ten years from now, we'll look back on this time and we'll find it hard to imagine that these tools were not used in clinical practice."

GENETICS' INFORMATION OVERLOAD

Biology is seeing its databases explode. Nowhere is this more dramatic than in genetics. The vast amount of data provided by sequencing the human genome in 2003 was a turning point for biology's use of computers. Bioinformatics researchers can now comb through the sequences looking for patterns and similarities. One of the most promising techniques is whole-genome comparison, where researchers search for portions of the genome that are conserved across species, suggesting they may be important. Again, this turns out to be an NP-hard problem, demanding enormous computing power for genomes that may include billions of base pairs.
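As a toy illustration of why such comparisons get expensive, here is a deliberately naive Python sketch; the sequences and window size are invented, and real whole-genome aligners use far more sophisticated algorithms and data structures:

```python
# Naive search for short regions shared by two sequences. Checking every
# window of one genome against every position of the other scales with the
# product of their lengths, which is why genomes with billions of base
# pairs demand so much computing power.
def conserved_windows(seq_a: str, seq_b: str, window: int = 8):
    hits = []
    for i in range(len(seq_a) - window + 1):
        probe = seq_a[i:i + window]
        if probe in seq_b:                 # linear scan of seq_b per window
            hits.append((i, probe))
    return hits

print(conserved_windows("ACGTACGTTTGACCA", "GGTTTGACCAATC"))
```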

And this is only the beginning. Every year it gets cheaper to sequence more genomes.

"The amount of biological data available is increasing much faster than the increase of single processor speeds. It's going much faster than Moore's Law," says Serafim Batzoglou, PhD, assistant professor of computer science at Stanford University. Supercomputers will be needed to store, access and analyze this data. The first human genome took years to sequence, and cost millions of dollars. Today every few months a new genome appears. As sequencing technologies get cheaper, it's likely that within a few years we'll have hundreds of human genomes and thousands of different species, Batzoglou predicts.

"The situation has been like quicksand ever since I arrived," laments Robert Petryszak, a technician who for the past three years has managed incoming sequences for the InterPro database at the European Bioinformatics Institute in Cambridge, England. "The horizons have been changing almost monthly."


Above: In this fluid dynamics model of blood flow, the colors display variations in the peak systolic blood pressure from the aorta to the lower extremities. Abrupt pressure changes show regions of relative inefficiency in the circulation. This type of simulation means simultaneously solving millions of nonlinear equations and, for the finest resolution, requires days of computation time on a 64-processor SGI supercomputer. Courtesy of Charles Taylor, Stanford University.

Christoph Sensen, PhD, professor of bioinformatics and director of the Centre for Advanced Technologies at the University of Calgary, looks down on a larger-than-life image of muscle structures. He is standing inside the CAVE, a 4D virtual environment in the Sun Center of Excellence for Visual Genomics. CAVE computers running JAVA code project high-resolution images at 112 times per second, enveloping visitors in visions of DNA, cells, or, in this case, the human body. Courtesy of Christoph Sensen.




Petryszak adds incoming protein sequences to the database and then annotates the sequences periodically using both an in-house cluster and an external supercomputer. When biologist Craig Venter, PhD, publishes results from his shotgun sequencing project and the sequences go public, Petryszak says, it could triple the InterPro database from its current 600 gigabytes to 1.8 terabytes by the end of 2007. Storage is not a problem, but indexing the sequences and accessing the data quickly to send to users is difficult.

"The amount of data is just going to be enormous," Petryszak says. "That's going to cause a headache, even for the supposedly heavyweight databases."

BIOMEDICAL COMPUTING FOR THE 21ST CENTURY

In biology today, supercomputing is the exception. Even computational biologists tend to solve problems using the computers they have on hand. Few dream up questions that would require more resources.

"We have a need for high-performance computing in biology, but there's no demand," says Nathan Goodman, PhD, senior research scientist at the Institute for Systems Biology in Seattle, WA. "If you go to a field like physics, people are always thinking 'What could I do if I had more computing power.' They understand that their ability to analyze data is limited by their computational power." It's a Catch-22, he says. Biologists don't have access to large computers and so they don't propose problems that would require them. Because they don't propose the problems, they don't acquire the resources. Whether it's a question of training or simply the culture of the discipline, biologists are not yet making the most of large-scale computing.

"Why daydream about something you don't have?" Pande says. "But if you give [biologists] the resource, and especially give the students access to it, then they will come up with new algorithms and new uses."

A case in point is geneticist Batzoglou, a convert to large-scale computing. Although his own background is in computer science, he initially shrugged off news that his department had acquired a 600-processor supercomputer for the biosciences. But after the machine arrived, he and his graduate students became some of the biggest users. Last summer, Batzoglou invested $55,000 in grant money to buy his own 100-processor cluster.

"Before we started using it, we didn't realize how useful it is to have such huge computing capabilities," recalls Batzoglou, who writes algorithms to analyze genetic sequences. "If there's anything we've learned it's that the more computing power we have, the more we are going to find ways to use it."

Some fields angle to capitalize on the growth in computing power. The Petascale Collaboratory for the Geosciences, an ad hoc group of scientists established in 2004, draws up questions for the upcoming generation of supercomputers. "I would love to see an analogous effort with biologists," says Snavely, a member of the task force. "To my knowledge there hasn't been this meeting of the minds that says, 'OK, if this is where the technology is going, what important biology problems do we think we could solve?'"

"Biology is probably going to be the largest user of high-performance computing in the 21st century," Germain predicts. Sure, this might sound like old news to long-time observers of the biological sciences. But hype in the early 1990s was premature—biological models were still too rough and the computing power was insufficient, says Michael Pitman, PhD, who leads the membrane protein group at the IBM TJ Watson Research Center in Yorktown Heights, New York. Finally, he says, we're nearing the point where supercomputers can live up to the hype. "I've been very encouraged by the kinds of questions we can ask and the quality of answers we're getting," he says. "I do feel that we're in a new era for supercomputers in biology."

As part of the Blue Brain project, high-performance computers are being used to model the human brain. In preliminary wet-lab research shown here, researchers stained columns of neurons in the neocortex to design a detailed model of its circuitry. Each column contains 10,000 individual neurons; thousands of columns together make up the neocortex. Blue Brain researchers hope to simulate the entire neocortex. In January 2005, the team announced they had simulated 10,000 neurons on the Blue Gene/L machine, a model 10 million times more complex than any previous neural simulation. The project is a collaboration between IBM and the Ecole Polytechnique Federale de Lausanne in Switzerland. Courtesy of IBM Research.

