Genomes to Life: A Partnership between Biology and Computing


Upload: teige

Post on 12-Jan-2016



TRANSCRIPT

Page 1: Genomes to Life a partnership between Biology and Computing


OASCR Genomes to Life OBER

Genomes to Life: a partnership between Biology and Computing

http://www.doegenomestolife.org/

Gary Johnson, John Houghton

Office of Science

Page 2: Genomes to Life a partnership between Biology and Computing


Page 3: Genomes to Life a partnership between Biology and Computing


Office of Advanced Scientific Computing Research:
Mathematical, Information and Computational Sciences

a brief overview

http://www.sc.doe.gov/production/octr/mics/index.html

Page 4: Genomes to Life a partnership between Biology and Computing


MICS Mission

Discover, develop, and deploy the computational and networking advances that enable researchers in the scientific disciplines to analyze, model, simulate, and predict complex physical, chemical, and biological phenomena important to the Department of Energy (DOE).

• Support a broad research portfolio in advanced scientific computing – applied mathematics, computer science, networking and collaboratory software
• Operate supercomputers, a high performance network, and related facilities

Page 5: Genomes to Life a partnership between Biology and Computing


Program Strategy

Basic Research

Research to enable…
• …simulation of complex systems
• …distributed teams, remote access to facilities

• Applied Mathematics
• Computer Science
• Networking
• Grid enabling research
• Topical Computing
• Nanoscience
• Computational Biology
• Collaboratory Tools

• Scientific Application Pilots
• Collaboratory Pilots
• Integrated Software Infrastructure Centers

Teams: mathematicians, computer scientists, application scientists, and software engineers

BES, BER, FES, HEP, NP:
• Materials
• Chemical
• Combustion
• Accelerator
• HEP
• Nuclear
• Fusion
• Climate
• Astrophysics

High Performance Computing and Network Facilities for Science
• Energy Sciences Network (ESnet)
• Advanced Computing Research Facilities
• National Energy Research Scientific Computing Center (NERSC)

Page 6: Genomes to Life a partnership between Biology and Computing


Budget Request

FY2003: $166,625,000

• Base Research: 32%
• Comp. Bio.: 5%
• SciDAC: 25%
• Facilities: 35%
• SBIR/STTR: 3%

Enhancements over FY2002
• Computational Biology +$5.6M
• SciDAC +$5.3M
• Facilities +$1.3M
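The percentages on this slide can be turned into approximate dollar figures. The sketch below assumes that the five pie-chart shares pair with the five categories in the order they are listed (Base Research 32%, Comp. Bio. 5%, SciDAC 25%, Facilities 35%, SBIR/STTR 3%); the function and variable names are illustrative and not from the presentation.

```python
# FY2003 request from the slide, in dollars.
TOTAL_REQUEST = 166_625_000

# Assumption: pie-chart shares are paired with categories in listed order.
SHARES = {
    "Base Research": 0.32,
    "Comp. Bio.": 0.05,
    "SciDAC": 0.25,
    "Facilities": 0.35,
    "SBIR/STTR": 0.03,
}


def dollars_by_category(total, shares):
    """Return the approximate dollar amount implied by each share."""
    return {name: total * share for name, share in shares.items()}


amounts = dollars_by_category(TOTAL_REQUEST, SHARES)
for name, amount in amounts.items():
    print(f"{name:15s} ~${amount / 1e6:.1f}M")
```

Under that pairing, the shares sum to 100% and Computational Biology comes to roughly $8.3M.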

Page 7: Genomes to Life a partnership between Biology and Computing


[Figure: Time to Solution (axis 0 to 200) versus Problem Size, increasing with number of processors (1 to 1000), with one curve labeled "unscalable" and one labeled "scalable".]

From the "simple"… …to the complex!

Linear Solvers: Ax = b
Eigensolvers: Ax = λBx
Nonlinear Solvers: F(u, x, y, z) = 0
PDE Solvers: F(u, u', u'', …, x, y, z, t) = 0

Algorithms must be scalable. Ideally, as the problem size grows and the number of processors grows, the solution time does not!
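One common way to quantify this scalability goal is weak-scaling efficiency: the time on the smallest run divided by the time on P processors, with the problem size growing in proportion to P. The sketch below is illustrative only. The timings are invented to mimic a "scalable" and an "unscalable" curve and are not data from the presentation.

```python
def weak_scaling_efficiency(times):
    """Map processor count -> T(base) / T(p) for a weak-scaling run.

    `times` maps processor count to measured time to solution, where the
    problem size grows in proportion to the processor count.
    """
    base = times[min(times)]
    return {p: base / t for p, t in sorted(times.items())}


# Hypothetical timings in seconds, invented for illustration only.
scalable = {1: 50.0, 10: 52.0, 100: 55.0, 1000: 58.0}
unscalable = {1: 50.0, 10: 80.0, 100: 130.0, 1000: 200.0}

for label, run in (("scalable", scalable), ("unscalable", unscalable)):
    eff = weak_scaling_efficiency(run)
    print(label, {p: round(e, 2) for p, e in eff.items()})
```

An efficiency that stays near 1.0 as P grows corresponds to the flat "scalable" curve; an efficiency that falls toward 0 corresponds to the rising "unscalable" one.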

Combustion

~60 coupled, nonsymmetric, nonlinear, time-dependent PDEs on 10M mesh points. Time steps range from 10^-12 (for chemical reaction rates) to 10^-2 (for the speed of the flame front).
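The size of the combustion problem can be made concrete with a little arithmetic. The sketch below assumes both time scales are in the same unit (the slide does not state one; seconds is assumed here) and that a scheme resolving the fastest scale has to step across one interval of the slowest scale. The names are illustrative.

```python
N_EQUATIONS = 60             # "~60 coupled ... PDEs" from the slide
N_MESH_POINTS = 10_000_000   # "10M mesh points" from the slide

FASTEST_SCALE = 1e-12  # chemical reaction rates (unit assumed: seconds)
SLOWEST_SCALE = 1e-2   # flame-front speed (same assumed unit)


def steps_to_span(slow, fast):
    """Number of steps of size `fast` needed to cover one interval `slow`."""
    return round(slow / fast)


unknowns = N_EQUATIONS * N_MESH_POINTS
steps = steps_to_span(SLOWEST_SCALE, FASTEST_SCALE)

print(f"unknowns per time step: {unknowns:.1e}")
print(f"fine steps per slow time scale: {steps:.1e}")
```

Ten orders of magnitude between the two scales is what makes such systems stiff, and it is one reason the solver choice matters as much as raw processor count.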

Protein Folding

Current simulations use 44 amino acids. An actual protein has ~300 amino acids. Run times using current techniques? Greater than the life of the universe!
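A rough counting argument in the style of Levinthal's paradox shows why the jump from 44 to ~300 amino acids is so severe. This is not the simulation technique the slide refers to. It is a sketch under loudly stated assumptions: 3 conformations per residue and 10^15 conformations sampled per second, both chosen only for illustration.

```python
AGE_OF_UNIVERSE_S = 4.35e17    # about 13.8 billion years, in seconds
CONFORMATIONS_PER_RESIDUE = 3  # assumption, for illustration only
SAMPLES_PER_SECOND = 1e15      # assumption: a very generous sampling rate


def exhaustive_search_seconds(n_residues):
    """Seconds to enumerate every conformation under the assumptions above."""
    return CONFORMATIONS_PER_RESIDUE ** n_residues / SAMPLES_PER_SECOND


for n in (44, 300):
    seconds = exhaustive_search_seconds(n)
    ratio = seconds / AGE_OF_UNIVERSE_S
    print(f"{n} residues: {seconds:.1e} s ({ratio:.1e} x age of universe)")
```

Because the cost grows exponentially with chain length, faster hardware alone cannot close the gap, which is the slide's case for better algorithms.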

Applied Mathematical Sciences

Page 8: Genomes to Life a partnership between Biology and Computing


AMS Base Research Program

Objectives
Advance our understanding of science and technology by supporting research in basic applied mathematics and in computational research that facilitates the use of the latest high-performance computer systems.

Ongoing Projects

Applied Mathematics Research:
• Linear Algebra
• Fluid Dynamics
• Differential Eqs.
• Optimization
• Grid Generation
• Predictability Analysis & Uncertainty Quantification
• Automated Reasoning

Advanced Numerical Algorithms:
• PETSc
• Aztec
• TAO
• ADIFOR / ADIC
• Hypre
• CHOMBO
• SuperLU
• PICO

Accomplishments
• Robust High-Performance Numerical Libraries
• Adaptive Mesh Refinement (AMR)
• Sustained Teraflop/s simulations
• Level Set / Fast Marching Methods
• Investment in Education: Computational Sciences Graduate Fellowship

Growth Opportunities
• Ultrascalable Algorithms (up to millions of PEs)
• Mathematical Microscopy

These opportunities will be explored through
• Genomes to Life (with BER)
• Comp. Nanoscience (with BES)
• Fusion Energy (FESAC-ASCAC workshop)

Page 9: Genomes to Life a partnership between Biology and Computing


Computer Science Research

• Challenge – HPC for Science is (still, after fifteen years!)
– Hard to use
– Inefficient
– Fragile
– An unimportant vendor market
• Vision
– A comprehensive, integrated software environment which enables the effective application of high performance systems to critical DOE problems
• Goal – Radical Improvement in
– Application Performance
– Ease of Use
– Time to Solution

[Figure: HPC System Elements, with boxes labeled Node and System Hardware Arch, OS Kernel, OS Bypass, User Space Runtime Support, Scientific Applications, System Admin, Software Development, Chkpt/Rstrt, Math Libs, Debuggers, Viz/Data, Scheduler, PSEs, Res. Mgt, Frameworks, Compilers, Perf Tools, File Sys, Runtime Tools.]

Page 10: Genomes to Life a partnership between Biology and Computing


Computer Science Technical Elements

• Interoperability & Portability Tools: $6.5M (25%)
• System Software Environment: $4.7M (19%)
• Performance Evaluation & Optimization: $4.5M (18%)
• Programming Models & Runtime: $3.8M (15%)
• Visualization & Data Understanding: $5.8M (23%)

Page 11: Genomes to Life a partnership between Biology and Computing


Major Accomplishments

• PVM – the first widely successful model for parallel computing
• MPI – the lingua franca of today’s parallel computing
• MPICH – the open source version of MPI that is the basis for all vendor adaptations
• Global Arrays – the distributed shared memory programming model that is at the core of NWChem, the motivating application for SciDAC
• CTSS – the first interactive operating system for high performance computers
• SUNMOS/Puma/Cougar – the most successful high performance parallel operating system
• OSCAR – a partnership with industry, the most widely used open source toolkit for management of Linux clusters

Page 12: Genomes to Life a partnership between Biology and Computing


National Collaboratories: Why?

• The nature of how large scale science is done is changing
– Distributed data, computing, people, instruments
– Instruments integrated with large-scale computing
– Human resources are seldom collocated with the resources needed for their science
• Additional drivers
– Large and international collaborations
– Management of unique national user facilities
– Large multi-laboratory science and engineering projects

Page 13: Genomes to Life a partnership between Biology and Computing


An End-to-End Problem for Applications

Many different types of objects need to be connected to and coordinated by the networks.

[Figure: network diagram with nodes labeled NERSC (Supercomputing & Large-Scale Storage), PNNL, LBNL, ANL, ORNL, MDSCA, ESnet, Europe, Asia-Pacific, and Scientist.]

Page 14: Genomes to Life a partnership between Biology and Computing


Staff

– Ed Oliver, Associate Director for Advanced Scientific Computing Research
– Dan Hitchcock, Senior Scientific Advisor
– Linda Twenty, Senior Budget & Financial Specialist

– Walt Polansky, Acting Director, MICS

– Gary Johnson, ACRTs, Computational Biology
– Fred Johnson, Computer Science
– William (Buff) Miner, NERSC & Scientific Applications
– Thomas Ndousse-Fetter, Network Research
– Kimberly Rasar, Senior Info. Tech. (SciDAC)
– Chuck Romine, Applied Mathematics
– Mary Anne Scott, Collaboratories
– George Seweryniak, ESnet
– John van Rosendale, Computer Science - Visualization and Data Management

– Vacancies (2)

– Jane Hiegel
– Susan Kilroy

Phone: 301-903-5800
Fax: 301-903-7774
http://www.sc.doe.gov/production/octr/mics/index.html

Page 15: Genomes to Life a partnership between Biology and Computing


OASCR Advisory Committee

• Committee Chair: Margaret Wright, NYU
• Subcommittee Chairs:
– Biology: Juan Meza, LBNL
– Computing Infrastructure: Jill Dahlberg, General Atomics
• Members in common with BERAC: Warren Washington, NCAR
• Next Meeting: 2-3 May 2002, Crowne Plaza Hotel, 14th and K Streets, Washington, DC

Page 16: Genomes to Life a partnership between Biology and Computing


Genomes to Life Program History

• Phased program startup
– FY 2002: OBER
– FY 2003: OASCR
• Precursor activity
– FN 01-21: Advanced Modeling and Simulation of Biological Systems
– 9 Awards, $3M
• Current solicitations
– FN 02-13: Genomes to Life
• Program planning
– 5 workshops
– Goal 4 roadmap
– Update to GTL roadmap

Page 17: Genomes to Life a partnership between Biology and Computing


GTL Planning Activities

• 7-8 August GTL Computing Workshop

• 6-7 September Systems Biology & GTL Workshop

• 22-23 January Computing Infrastructure Workshop

• 6-7 March Computer Science for GTL Workshop

• 18-19 March Mathematics for GTL Workshop

• 19 April Draft Goal 4 Roadmap

• Future New Edition of the GTL Roadmap

Page 18: Genomes to Life a partnership between Biology and Computing


GTL Goal 4 Roadmap

Page 19: Genomes to Life a partnership between Biology and Computing


Genomes to Life Goals

Goal 1: Identify and Characterize the Molecular Machines of Life – the Multiprotein Complexes that Execute Cellular Functions and Govern Cell Form

Goal 2: Characterize Gene Regulatory Networks

Goal 3: Characterize the Functional Repertoire of Complex Microbial Communities in their Natural Environments at the Molecular Level

Goal 4: Develop the Computational Methods and Capabilities to Advance Understanding of Complex Biological Systems and Predict their Behavior

Page 20: Genomes to Life a partnership between Biology and Computing


Three Computing Domains

• Bioinformatics/Data-Intensive Applications

• Biophysics/Compute-Intensive Applications

• Biosystems/Complex Systems Modeling

Page 21: Genomes to Life a partnership between Biology and Computing


Biology & Computing Perspectives

Page 22: Genomes to Life a partnership between Biology and Computing


Domain Challenges

• Bioinformatics
– Heterogeneous, large and growing data sets
– Legacy systems that don’t interoperate and don’t scale
• Biophysics
– Already bumping up against computational resources
  • More computation, better algorithms, new theory
• Biosystems
– Too much data not to have models
– Data-poor and biology-poor
– Parts list short, but complex systems

Page 23: Genomes to Life a partnership between Biology and Computing


Initial Thoughts on Computational Infrastructure