BioScience on the TeraGrid. Daniel S. Katz (d.katz@ieee.org): Director of Science, TeraGrid GIG; Senior Fellow, Computation Institute, University of Chicago & Argonne National Laboratory; Affiliate Faculty, Center for Computation & Technology, LSU; Adjunct Associate Professor, Electrical and Computer Engineering, LSU

Post on 19-Dec-2015


Page 1: D.katz@ieee.org BioScience on the TeraGrid Daniel S. Katz d.katz@ieee.org Director of Science, TeraGrid GIG Senior Fellow, Computation Institute, University


BioScience on the TeraGrid

Daniel S. Katz

d.katz@ieee.org

Director of Science, TeraGrid GIG

Senior Fellow, Computation Institute, University of Chicago & Argonne National Laboratory

Affiliate Faculty, Center for Computation & Technology, LSU

Adjunct Associate Professor, Electrical and Computer Engineering, LSU

Page 2

What is the TeraGrid

• World’s largest distributed cyberinfrastructure for open scientific research, supported by US NSF

• Integrated high performance computers (>2 PF HPC & >27000 HTC CPUs), data resources (>2 PB disk, >60 PB tape, data collections), visualization, experimental facilities (VMs, GPUs, FPGAs), network at 11 Resource Provider sites

• Allocated to US researchers and their collaborators through national peer-review process

• DEEP: provide powerful computational resources to enable research that can’t otherwise be accomplished

• WIDE: grow the community of computational science and make the resources easily accessible

• OPEN: connect with new resources and institutions

• Integration: Single {portal, sign-on, help desk, allocations process, advanced user support, EOT, campus champions}

http://www.teragrid.org/

Page 3

Governance

• 11 Resource Providers (RPs) funded under separate agreements with NSF
– Different start and end dates
– Different goals
– Different agreements
– Different funding models

• 1 Coordinating Body: Grid Infrastructure Group (GIG)
– University of Chicago/Argonne National Laboratory
– Subcontracts to all RPs and six other universities
– 7-8 Area Directors
– Working groups with members from many RPs

• TeraGrid Forum with Chair

Page 4

Who Uses TeraGrid (2009)

(2008)

Page 5

How TeraGrid Is Used

Use Modality: Community Size (rough est., number of users)

Batch Computing on Individual Resources: 850
Exploratory and Application Porting: 650
Workflow, Ensemble, and Parameter Sweep: 250
Science Gateway Access: 500
Remote Interactive Steering and Visualization: 35
Tightly-Coupled Distributed Computation: 10

(2006 data)

Page 6

How One Uses TeraGrid

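[Diagram: users reach TeraGrid through POPS (for now), Science Gateways, the User Portal, or the Command Line. These sit on the TeraGrid Infrastructure (Accounting, Network, Authorization, …), which spans RP 1, RP 2, and RP 3 and their Compute, Viz, and Data services (plus Network, Accounting, …).]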

Page 7

User Portal: portal.teragrid.org

http://portal.teragrid.org/

Page 8

Science Gateways

• A natural extension of Internet & Web 2.0
• Idea resonates with scientists
– Researchers can imagine scientific capabilities provided through a familiar interface
• Mostly a web portal, or a web or client-server program
• Designed by communities; provide interfaces understood by those communities
– Also provide access to greater capabilities (back end)
– Without users needing to understand the details of those capabilities
– Scientists know they can undertake more complex analyses, and that’s all they want to focus on
– TeraGrid provides tools to help developers
• Seamless access doesn’t come for free
– Hinges on a very capable developer

Nancy Wilkins-Diehr

Page 9

TeraGrid -> XD Future

• Current RP agreements end in March 2011
– Except track 2 centers (current and future)

• TeraGrid XD (eXtreme Digital) starts in April 2011
– Era of potential interoperation with OSG and others
– New types of science applications?

• Current TG GIG continues through July 2011
– Allows four months of overlap in coordination
– Probable overlap between GIG and XD members

• Blue Waters (track 1) production in 2011

Page 10

Grid Enabled Neurosurgical Imaging Using Simulation (GENIUS)

Peter Coveney, University College London

Model large-scale, patient-specific cerebral blood flow on a clinically relevant time scale

• Provide simulation support within the operating theatre for neuroradiologists

• Provide new information to surgeons for patient management and therapy:
1. Diagnosis and risk assessment
2. Predictive simulation in therapy

• Provide patient-specific information to help plan embolisation of arterio-venous malformations, coiling of aneurysms, etc.

Clinical workflow:

• Book computing resources in advance or use preemption

• Shift imaging data around quickly over high-bandwidth low-latency dedicated links

• Interactive simulations and real-time visualization for immediate feedback

Page 11

OLSGW Gadgets

• OLSGW integrates bio-informatics applications
• BLAST, InterProScan, CLUSTALW, MUSCLE, PSIPRED, ACCPRO, VSL2
• 454 Pyrosequencing service under development
• Four OLSGW gadgets have been published in the iGoogle gadget directory. Search for “TeraGrid Life Science”.

Wenjun Wu, Thomas Uram, Michael Papka, ANL

Page 12

Multiscale Simulation of Arterial Tree

Need to combine multi-scale models: 1D (arteries), 3D Navier Stokes (organs, arterial junctions, etc.), Dissipative Particle Dynamics (capillaries, venules, arterioles, blood cells, etc.), Molecular Dynamics (blood cells, platelets, molecular adhesion, etc.)

NIH/NSF-IMAG project: George Em Karniadakis, Brown

activated platelets

Arterioles/venules 50 microns

Platelet diameter is 2-4 µm

Normal platelet concentration in blood is 300,000/mm³

Functions: activation, adhesion to injured walls, and other platelets
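To give a feel for the particle counts involved at the arteriole scale, the two figures on this slide (a 50 micron vessel and 300,000 platelets per mm³) can be combined in a short calculation. This is only an illustrative sketch: the 1 mm segment length and the assumption that the whole-blood concentration applies uniformly inside the vessel are not from the slide.

```python
import math

# Figures taken from the slide.
PLATELETS_PER_MM3 = 300_000      # normal platelet concentration in blood
VESSEL_DIAMETER_UM = 50.0        # arteriole/venule diameter in microns

# Assumption for illustration only: length of vessel segment considered.
SEGMENT_LENGTH_MM = 1.0


def platelets_in_segment(diameter_um: float, length_mm: float, per_mm3: float) -> float:
    """Estimate the platelet count in a cylindrical vessel segment."""
    radius_mm = diameter_um / 1000.0 / 2.0           # microns -> mm, diameter -> radius
    volume_mm3 = math.pi * radius_mm ** 2 * length_mm
    return volume_mm3 * per_mm3


n_platelets = platelets_in_segment(VESSEL_DIAMETER_UM, SEGMENT_LENGTH_MM, PLATELETS_PER_MM3)
print(f"about {n_platelets:.0f} platelets in a {SEGMENT_LENGTH_MM} mm segment")
```

Under these assumptions the segment holds roughly 590 platelets, which suggests why particle-based methods become practical at this scale while larger vessels are handled with continuum models.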

Page 13

Expressed Sequence Tag (EST) Pipeline

• ESTs are a collection of random cDNA sequences, sequenced from a cDNA library or sequencing devices
– Typical inputs are O(million) sequences
– Newer 454 devices produce higher volumes and are relatively easy to obtain and operate
– Stored using FASTA format
• ESTs are clustered and assembled to form contigs
• Contigs are then used to identify potential unknown genes, by BLASTing against a known protein database
• Goal: use TeraGrid for backend computing, with existing software, and a gateway frontend

Initial results: a run that took 5 days on a local cluster was done in 2 days; more optimization is underway
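Since the pipeline's inputs are stored in FASTA format, a minimal reader helps show what the first stage consumes. The sketch below is not the pipeline's own code; the function name `parse_fasta` and the two example records are hypothetical and exist only to illustrate the format (a `>` header line followed by one or more sequence lines).

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}, joining wrapped lines."""
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:        # flush the previous record
                records[header] = "".join(chunks)
            # The record id is the first whitespace-delimited token after '>'.
            header = (line[1:].split() or [""])[0]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:                # flush the final record
        records[header] = "".join(chunks)
    return records


# Hypothetical example input, made up for illustration.
EXAMPLE = """>est_0001 hypothetical read
ACGTACGTAC
GGTTAA
>est_0002 hypothetical read
TTGACC
"""

records = parse_fasta(EXAMPLE.splitlines())
for name, seq in records.items():
    print(name, len(seq))
```

For inputs of O(million) sequences, a streaming variant that yields one record at a time would avoid holding everything in memory; the dictionary form is used here only for brevity.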

A. Kulshrestha, S. L. Pallickara, K. N. Muthuram, C. Kong, Q. Dong, M. Pierce, H. Tang, IU

Page 14

Experimental structures

Atomic-level simulation

Coarse-grained (CG) model development

CG simulation

An iterative modeling approach combining experimental imaging (cryo-electron tomography), coarse-grained (CG) simulation, and atomic-level molecular dynamics (MD)

New CG Interactions from MD

Wright, Schooler, Ding, Kieffer, Fillmore, Sundquist, Jensen, EMBO J., 26, 2218, 2007

CG model refinement

Key CG interactions

Multiscale Computer Simulation of the Immature HIV-1 Virion

G. A. Voth, U. of Chicago

Page 15

CIPRES Portal: A New Science Gateway for Systematics

• Systematics: study of diversification of life and relationships among living things through time

• CIPRES: a flexible web application that can be sustained by the community at minimal cost even beyond the funding period of the project

• Tools include parallel versions of MrBayes, RAxML, GARLI
• User requirements include:
– Access to most or all native command line options
– Add new tools quickly
– Provide personal user space for storing results
– Use TeraGrid resources to quickly provide results
• Cited in at least 35 publications, including Nature, PNAS, Cell
– Examples: New Family Tree for Arthropoda, Genome Sequence of a Transitional Eukaryote, Co-evolution of Beetles and Flowering Plants
• Used routinely in at least 5 undergraduate classes
• Usage: 77% US (incl. 17 EPSCoR states), 23% from 33 other countries

Mark Miller, SDSC

Page 16

Patient-Specific HIV Drug Therapy

Peter Coveney, University College London

HIV-1 protease is a common target for HIV drug therapy
• Enzyme of HIV responsible for protein maturation
• Target for anti-retroviral inhibitors
• Example of structure-assisted drug design
• 9 FDA-approved inhibitors of HIV-1 protease

So what’s the problem?
• Emergence of drug-resistant mutations in protease
• These render the drug ineffective
• Drug-resistant mutants have emerged for all FDA-approved inhibitors
• Too many mutations to be interpreted by a clinician

Solution: build a Binding Affinity Calculator (BAC)
• Provide tools that allow simulations to be used in a clinical context, including a lightweight client
– User only needs to specify enzyme, mutations relative to wildtype, and drug
• Other options can be specified but begin as defaults
• Requires a large number of simulations to be constructed and run automatically (across distributed HPC resources)
– To investigate generalisation
– Automation is critical for clinical use
• Turn-around time scale of around a week is required
• Trade-off between accuracy and time-to-solution

Initial results: ensemble MD calculations for lopinavir vs. wildtype and five mutants appear promising; excellent relative ranking in binding free energies
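The idea of building many simulations automatically from just an enzyme, a set of mutations, and a drug can be sketched as a simple job enumeration. This is not the BAC implementation: the `SimulationJob` type, the `build_jobs` helper, the placeholder mutation labels, and the replica count of 4 are all assumptions made for illustration. Only the drug (lopinavir) and the "wildtype plus five mutants" setup come from the slide.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class SimulationJob:
    """One MD run in the ensemble (hypothetical structure)."""
    enzyme: str
    mutations: Tuple[str, ...]   # empty tuple means wildtype
    drug: str
    replica: int


def build_jobs(enzyme: str,
               variants: Sequence[Tuple[str, ...]],
               drug: str,
               n_replicas: int) -> List[SimulationJob]:
    """Expand every (variant, replica) pair into a job description."""
    return [
        SimulationJob(enzyme, tuple(variant), drug, replica)
        for variant, replica in product(variants, range(n_replicas))
    ]


# Wildtype plus five mutants, as on the slide; mutation labels are placeholders.
variants = [()] + [(f"mutation_{i}",) for i in range(1, 6)]

# n_replicas=4 is an assumed ensemble size, not a figure from the talk.
jobs = build_jobs("HIV-1 protease", variants, "lopinavir", n_replicas=4)
print(len(jobs), "jobs;", "first:", jobs[0])
```

A list like this could then be handed to whatever scheduler distributes work across HPC resources; keeping the job description separate from submission is one way to make the automation the slide calls for easier to test.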

Page 17

Scripting Protein Structure Prediction

T. Sosnick, K. Freed, G. Hocky, J. DeBartolo, A. Adhikari, J. Xu, W. Wilde, U. Chicago

[Diagram: 1000 predict() calls feed one analyze() step]

int nSim = 1000;
int maxRounds = 3;
Protein pSet[ ] <ext; exec="Protein.map">;
float startTemp[ ] = [ 100.0, 200.0 ];
float delT[ ] = [ 1.0, 1.5, 2.0, 5.0, 10.0 ];
foreach p, pn in pSet {
  foreach t in startTemp {
    foreach d in delT {
      ItFix(p, nSim, maxRounds, t, d);
    }
  }
}

ItFix() {
  foreach sim in [1:nSim] {
    (structure[sim], log[sim]) = predict(p, t, d);
  }
  result = analyze(structure)
}

10 proteins x 1000 simulations x 3 rounds x 2 temps x 5 delta-T’s = 300K application runs
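As a rough cross-check of this count, the nested foreach loops can be mirrored in plain Python. This is an illustrative sketch, not a translation of the Swift script: the ten protein names are hypothetical placeholders (the slide only gives the count), and it assumes every ItFix call runs all `maxRounds` rounds of `nSim` predict() calls.

```python
from itertools import product

# Values taken from the script on the slide.
N_SIM = 1000
MAX_ROUNDS = 3
START_TEMPS = [100.0, 200.0]
DEL_T = [1.0, 1.5, 2.0, 5.0, 10.0]

# Hypothetical placeholders: the slide states 10 proteins but does not name them.
PROTEINS = [f"protein_{i}" for i in range(10)]


def count_runs(proteins, start_temps, del_ts, n_sim, max_rounds):
    """Return (number of ItFix calls, total predict() application runs)."""
    # One ItFix call per (protein, start temperature, delta-T) combination.
    itfix_calls = sum(1 for _ in product(proteins, start_temps, del_ts))
    # Assumption: each ItFix call performs max_rounds rounds of n_sim runs.
    return itfix_calls, itfix_calls * max_rounds * n_sim


itfix_calls, total_runs = count_runs(PROTEINS, START_TEMPS, DEL_T, N_SIM, MAX_ROUNDS)
print(itfix_calls, "ItFix calls,", total_runs, "application runs")
```

The sweep yields 100 ItFix calls and 300,000 application runs, matching the 300K figure on the slide.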