cyberinfrastructure and the role of grid computing

23
Ian Foster Computation Institute Argonne National Lab & University of Chicago Cyberinfrastructure and the Role of Grid Computing Or, “Science 2.0”

Upload: jolie

Post on 20-Mar-2016

50 views

Category:

Documents


0 download

DESCRIPTION

Or, “Science 2.0”. Cyberinfrastructure and the Role of Grid Computing. Ian Foster Computation Institute Argonne National Lab & University of Chicago. “Web 2.0”. Software as services Data- & computation-rich network services Services as platforms - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Cyberinfrastructure and the Role of Grid Computing

Ian FosterComputation Institute

Argonne National Lab & University of Chicago

Cyberinfrastructure and the Role of Grid Computing

Or, “Science 2.0”

Page 2: Cyberinfrastructure and the Role of Grid Computing

2

“Web 2.0” Software as services

Data- & computation-richnetwork services

Services as platforms Easy composition of services to create new capabilities

(“mashups”)—that themselves may be made accessible as new services

Enabled by massive infrastructure buildout Google projected to spend $1.5B on computers, networks,

and real estate in 2006 Dozens of others are spending substantially

Paid for by advertising

Declan Butler,Nature

Page 3: Cyberinfrastructure and the Role of Grid Computing

3

Science 2.0:E.g., Virtual Observatories

Data Archives

User

Analysis tools

Gateway

Figure: S. G. Djorgovski

Discovery tools

Page 4: Cyberinfrastructure and the Role of Grid Computing

4

Science 2.0:E.g., Cancer Bioinformatics Grid

Data Service@ uchicago.edu<BPEL

WorkflowDoc> BPEL

Engine

Analytic service@ osu.edu

Analytic service@ duke.eduResearcher

Or Client App <WorkflowResults>

<WorkflowInputs> link

link

link

link

caBiG: https://cabig.nci.nih.gov/; BPEL work: Ravi Madduri et al.

Page 5: Cyberinfrastructure and the Role of Grid Computing

5

The Two Dimensions of Science 2.0

Decompose across network Clients integrate dynamically

Select & compose services Select “best of breed” providers Publish result as new services

Decouple resource & service providers

FunctionResource

Data Archives

Analysis tools

Discovery toolsUsers

Fig: S. G. Djorgovski

Page 6: Cyberinfrastructure and the Role of Grid Computing

6

Provisioning

Technology Requirements: Integration & Decomposition

Service-oriented Gridinfrastructure Provision physical

resources to support application workloads

ApplnService

ApplnService

Users

Workflows

Composition

Invocation

Service-oriented applications Wrap applications &

data as services Compose services

into workflows

“The Many Faces of IT as Service”, ACM Queue, Foster, Tuecke, 2005

Page 7: Cyberinfrastructure and the Role of Grid Computing

7

Globus SoftwareEnables Grid Infrastructure

Web service interfaces for behaviors relating to integration and decomposition Primitives: resources, state, security Services: execution, data movement, …

Open source software that implements those interfaces In particular, Globus Toolkit (GT4)

All standard Web services “Grid is a use case for Web services, focused on

resource management”

Page 8: Cyberinfrastructure and the Role of Grid Computing

8

Open Source Grid Software

Data MgmtSecurity Common

RuntimeExecution

MgmtInfo

Services

GridFTPAuthenticationAuthorization

ReliableFile

Transfer

Data Access& Integration

Grid ResourceAllocation &

ManagementIndex

CommunityAuthorization

DataReplication

CommunitySchedulingFramework

Delegation

ReplicaLocation

Trigger

Java Runtime

C Runtime

Python RuntimeWebMDS

WorkspaceManagement

Grid Telecontrol

Protocol

Globus Toolkit v4www.globus.org

CredentialMgmt

Globus Toolkit Version 4: Software for Service-Oriented Systems, LNCS 3779, 2-13, 2005

Page 9: Cyberinfrastructure and the Role of Grid Computing

9

http://dev.globus.org

Guidelines(Apache)

Infrastructure(CVS, email,

bugzilla, Wiki)

ProjectsInclude

dev.globus — Community Driven Improvement of Globus Software, NSF OCI

Page 10: Cyberinfrastructure and the Role of Grid Computing

10

Community

Services Provider

Content

Services

Capacity

Hosted Science Services1) Integrate services from external sources

Virtualize “services” from providers

2) Coordinate & compose Create new services from existing ones

Capacity Provider

“Service-Oriented Science”, Science, 2005

Page 11: Cyberinfrastructure and the Role of Grid Computing

11

Birmingham•

The Globus-BasedLIGO Data Grid

Replicating >1 Terabyte/day to 8 sites>40 million replicas so farMTBF = 1 month

LIGO Gravitational Wave Observatory

www.globus.org/solutions

Cardiff

AEI/Golm

Page 12: Cyberinfrastructure and the Role of Grid Computing

12

Pull “missing” files to a storage system

List of required

Files

GridFTPLocal

ReplicaCatalog

ReplicaLocation

Index

Data Replication

Service

Reliable File

Transfer Service Local

ReplicaCatalog

GridFTP

Data Replication Service

“Design and Implementation of a Data Replication Service Based on the Lightweight Data Replicator System,” Chervenak et al., 2005

ReplicaLocation

Index

Data MovementData Location

Data Replication

Page 13: Cyberinfrastructure and the Role of Grid Computing

14

Example: Biology Public PUMA

Knowledge BaseInformation about proteins analyzed against ~2 million gene sequences

Back OfficeAnalysis on GridMillions of BLAST, BLOCKS, etc., on

OSG and TeraGridNatalia Maltsev et al., http://compbio.mcs.anl.gov/puma2

Page 14: Cyberinfrastructure and the Role of Grid Computing

15

Example:Earth System Grid

Climate simulation data Per-collection control Different user classes Server-side processing

Implementation (GT) Portal-based User

Registration (PURSE) PKI, SAML assertions GridFTP, GRAM, SRM

>2000 users >100 TB downloaded

www.earthsystemgrid.org — DOE OASCR

Page 15: Cyberinfrastructure and the Role of Grid Computing

16

Under the Covers

Page 16: Cyberinfrastructure and the Role of Grid Computing

17

Example:Astro Portal Stacking Service

Purpose On-demand “stacks” of

random locations within ~10TB dataset

Challenge Rapid access to 10-10K

“random” files Time-varying load

Solution Dynamic acquisition of

compute, storage

++++++=

+

S4 SloanDataWeb page

or Web Service

Joint work with Ioan Raicu & Alex Szalay

Page 17: Cyberinfrastructure and the Role of Grid Computing

18

Preliminary Performance (TeraGrid, LAN GPFS)

Joint work with Ioan Raicu & Alex Szalay

Page 18: Cyberinfrastructure and the Role of Grid Computing

19

Example: Cybershake Calculate hazard curves by generating synthetic

seismograms from estimated rupture forecast

Rupture Forecast

Synthetic Seismogram

Strain GreenTensor

Hazard CurveSpectral Acceleration

Hazard Map

Tom Jordan et al., Southern California Earthquake Center

Page 19: Cyberinfrastructure and the Role of Grid Computing

20

20 TB,1.8 CPU-year

Enlisting TeraGridResources

Workflow Scheduler/Engine

VO Service Catalog

Provenance Catalog

Data Catalog

SCECStorage

TeraGridCompute

TeraGridStorage

VO Scheduler

Ewa Deelman, Carl Kesselman, et al., USC Information Sciences Institute

Number of jobs per day (23 days), 261,823 jobs total, Number of CPU hours per day, 15,706 hours total (1.8 years)

1

10

100

1000

10000

100000

10/19

10/21

10/23

10/25

10/27

10/29

10/31 11

/211

/411

/611

/811

/10

JOBS HRS

Page 20: Cyberinfrastructure and the Role of Grid Computing

21

Science 1.0 Science 2.0Gigabytes

TarballsJournals

IndividualsCommunity codes

Supercomputer centersMakefile

Computational sciencePhysical sciences

Computational scientistsNSF-funded

Terabytes Services Wikis Communities Science gateways TeraGrid, OSG, campus Workflow Science as computation All sciences (& humanities) All scientists NSF-funded

Page 21: Cyberinfrastructure and the Role of Grid Computing

22

Science 2.0 Challenges A need for new technologies, skills, & roles

Creating, publishing, hosting, discovering, composing, archiving, explaining … services

A need for substantial software development “30-80% of modern astronomy projects is software”—

S. G. Djorgovski A need for more & different infrastructure

Computers & networks to host services Can we leverage commercial spending?

To some extent, but not straightforward

Page 22: Cyberinfrastructure and the Role of Grid Computing

23

Acknowledgements Carl Kesselman for many discussions Many colleagues, including those named on

slides, for research collaborations and/or slides Colleagues involved in the TeraGrid, Open

Science Grid, Earth System Grid, caBIG, and other Grid infrastructures

Globus Alliance members for Globus software R&D

DOE OASCR, NSF OCI, & NIH for support

Page 23: Cyberinfrastructure and the Role of Grid Computing

24

For More Information Globus Alliance

www.globus.org Dev.Globus

dev.globus.org Open Science Grid

www.opensciencegrid.org TeraGrid

www.teragrid.org Background

www.mcs.anl.gov/~foster2nd Edition

www.mkp.com/grid2

Thanks for DOE, NSF, and NIH for research support!!