Cyberinfrastructure and the Role of Grid Computing
Or, “Science 2.0”

Ian Foster, Computation Institute, Argonne National Lab & University of Chicago


Post on 07-Jan-2016



TRANSCRIPT

Page 1: Cyberinfrastructure and the Role of Grid Computing

Ian Foster

Computation Institute

Argonne National Lab & University of Chicago

Cyberinfrastructure and the Role of Grid Computing

Or, “Science 2.0”

Page 2: Cyberinfrastructure and the Role of Grid Computing

2

“Web 2.0”

Software as services
  Data- & computation-rich network services
Services as platforms
  Easy composition of services to create new capabilities (“mashups”), which themselves may be made accessible as new services
Enabled by massive infrastructure buildout
  Google projected to spend $1.5B on computers, networks, and real estate in 2006
  Dozens of others are spending substantially
Paid for by advertising

Declan Butler

Page 3: Cyberinfrastructure and the Role of Grid Computing

3

Science 2.0: E.g., Virtual Observatories

[Figure labels: User; Gateway; Data Archives; Analysis tools; Discovery tools]

Figure: S. G. Djorgovski

Page 4: Cyberinfrastructure and the Role of Grid Computing

4

Science 2.0: E.g., Cancer Bioinformatics Grid

[Figure labels: Researcher or Client App; <Workflow Inputs>; <BPEL Workflow Doc>; BPEL Engine; Data Service @ uchicago.edu; Analytic service @ osu.edu; Analytic service @ duke.edu; <Workflow Results>; links between the engine and each service]

Ravi Madduri et al.
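The flow in this figure (workflow inputs go to an engine, which invokes a data service and two analytic services and returns workflow results) can be sketched in a few lines. This is only an illustration under stated assumptions: the three service functions are hypothetical local stand-ins for the remote services, the order of invocation is assumed, and the "engine" is a plain loop rather than a BPEL engine.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for the remote services in the figure. A real
# deployment would invoke Web services described in a BPEL workflow document.
def data_service(inputs: Dict) -> Dict:
    """Return the samples at or above the requested threshold."""
    return {"samples": [s for s in inputs["samples"] if s >= inputs["threshold"]]}

def analytic_service_a(state: Dict) -> Dict:
    """First analysis step: normalize samples to the largest value."""
    peak = max(state["samples"])
    return {"samples": [s / peak for s in state["samples"]]}

def analytic_service_b(state: Dict) -> Dict:
    """Second analysis step: summarize the normalized samples."""
    values = state["samples"]
    return {"mean": sum(values) / len(values), "count": len(values)}

def run_workflow(steps: List[Callable[[Dict], Dict]], inputs: Dict) -> Dict:
    """Minimal 'engine': pass each step's output to the next step."""
    state = inputs
    for step in steps:
        state = step(state)
    return state

# The workflow itself is just data (an ordered list of services), so it can
# be stored, shared, or wrapped and published as a new service.
workflow = [data_service, analytic_service_a, analytic_service_b]
results = run_workflow(workflow, {"samples": [2.0, 4.0, 8.0], "threshold": 3.0})
print(results)
```

Because the composite is itself a callable that maps inputs to results, it can be offered to other clients in the same way as the services it is built from.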

Page 5: Cyberinfrastructure and the Role of Grid Computing

5

The Two Dimensions of Science 2.0

Decompose across network
  Clients integrate dynamically
    Select & compose services
    Select “best of breed” providers
    Publish result as new services

Decouple resource & service providers

[Figure labels: axes Function and Resource; Users; Data Archives; Analysis tools; Discovery tools]

Fig: S. G. Djorgovski

Page 6: Cyberinfrastructure and the Role of Grid Computing

6

Technology Requirements: Integration & Decomposition

Service-oriented Grid infrastructure
  Provision physical resources to support application workloads

Service-oriented applications
  Wrap applications & data as services
  Compose services into workflows

[Figure labels: Users; Workflows; Composition; Invocation; Appln Service; Appln Service; Provisioning]

“The Many Faces of IT as Service”, ACM Queue, Foster, Tuecke, 2005

Page 7: Cyberinfrastructure and the Role of Grid Computing

7

Globus Software Enables Grid Infrastructure

Web service interfaces for behaviors relating to integration and decomposition
  Primitives: resources, state, security
  Services: program execution, data movement, data access, …
Open source software that implements those interfaces
  In particular, Globus Toolkit (GT4)
All standard Web services
  “Grid is a use case for Web services, focused on resource management”

Page 8: Cyberinfrastructure and the Role of Grid Computing

8

Open Source Grid Software

Globus Toolkit v4 (www.globus.org)

Security: Authentication Authorization; Community Authorization; Delegation; Credential Mgmt
Data Mgmt: GridFTP; Reliable File Transfer; Data Access & Integration; Data Replication; Replica Location
Execution Mgmt: Grid Resource Allocation & Management; Community Scheduling Framework; Workspace Management; Grid Telecontrol Protocol
Info Services: Index; Trigger; WebMDS
Common Runtime: Java Runtime; C Runtime; Python Runtime

Globus Toolkit Version 4: Software for Service-Oriented Systems, LNCS 3779, 2-13, 2005

Page 9: Cyberinfrastructure and the Role of Grid Computing

9

http://dev.globus.org

Guidelines (Apache)

Infrastructure (CVS, email, bugzilla, Wiki)

Projects Include

Page 10: Cyberinfrastructure and the Role of Grid Computing

10

Hosted Science Services

1) Integrate services from external sources
  Virtualize “services” from providers
2) Coordinate & compose
  Create new services from existing ones

[Figure labels: Community Services Provider; Content; Services; Capacity; Capacity Provider]

“Service-Oriented Science”, Science, 2005

Page 11: Cyberinfrastructure and the Role of Grid Computing

11

The Globus-Based LIGO Data Grid

LIGO Gravitational Wave Observatory

Replicating >1 Terabyte/day to 8 sites
>40 million replicas so far
MTBF = 1 month

[Map labels: Birmingham; Cardiff; AEI/Golm]

www.globus.org/solutions
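To put the replication rate in network terms, the short calculation below converts 1 terabyte per day into a sustained bit rate. It is a back-of-the-envelope sketch that assumes decimal terabytes and assumes each of the 8 sites receives the full daily volume; the slide does not say which of those readings is intended.

```python
# Assumptions (not stated on the slide): decimal units, and every one of the
# 8 sites receives the full 1 TB each day.
BYTES_PER_TERABYTE = 10**12
SECONDS_PER_DAY = 24 * 60 * 60
SITES = 8

def sustained_mbit_per_s(terabytes_per_day: float) -> float:
    """Average rate in megabits per second needed to move the daily volume."""
    bytes_per_second = terabytes_per_day * BYTES_PER_TERABYTE / SECONDS_PER_DAY
    return bytes_per_second * 8 / 10**6

per_site = sustained_mbit_per_s(1.0)
aggregate = per_site * SITES

print(f"per site:  {per_site:.1f} Mbit/s")
print(f"aggregate: {aggregate:.1f} Mbit/s")
```

Under these assumptions the figures come to roughly 93 Mbit/s per site, sustained around the clock, which is why unattended, restartable transfers matter more here than peak throughput.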

Page 12: Cyberinfrastructure and the Role of Grid Computing

12

Data Replication Service

Pull “missing” files to a storage system

[Figure labels: List of required Files; Data Replication Service; Reliable File Transfer Service; GridFTP; Local Replica Catalog; Replica Location Index. Layers: Data Replication; Data Movement; Data Location]

“Design and Implementation of a Data Replication Service Based on the Lightweight Data Replicator System,” Chervenak et al., 2005
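The "pull missing files" idea can be sketched as a set difference followed by a lookup and a transfer. The code below is a minimal illustration, not the Data Replication Service itself: the local catalog is a Python set, the location index is a dictionary, and `transfer` is a hypothetical callback standing in for a reliable file transfer over GridFTP.

```python
from typing import Callable, Dict, List, Set

def plan_replication(required: List[str],
                     local_catalog: Set[str],
                     location_index: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each missing file to a source URL taken from the location index."""
    plan: Dict[str, str] = {}
    for name in required:
        if name in local_catalog:
            continue  # already stored locally, nothing to pull
        sources = location_index.get(name, [])
        if sources:
            plan[name] = sources[0]  # naive choice: first known replica
    return plan

def replicate(plan: Dict[str, str],
              local_catalog: Set[str],
              transfer: Callable[[str, str], None]) -> None:
    """Run each transfer, then register the new local replica."""
    for name, source in plan.items():
        transfer(source, name)
        local_catalog.add(name)

# Hypothetical example data; the host names are placeholders.
required = ["frame-001", "frame-002", "frame-003"]
local = {"frame-001"}
index = {
    "frame-002": ["gsiftp://site-a.example.org/data/frame-002"],
    "frame-003": ["gsiftp://site-b.example.org/data/frame-003"],
}
log: List[str] = []

plan = plan_replication(required, local, index)
replicate(plan, local, lambda source, name: log.append(source))
print(sorted(local))
```

Registering the replica only after the transfer returns keeps the catalog from advertising files that never arrived; a production service would also handle retries and files with no known source.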

Page 13: Cyberinfrastructure and the Role of Grid Computing

14

Example: Biology

Public PUMA Knowledge Base

Information about proteins analyzed against ~2 million gene sequences

Back Office Analysis on Grid

Involves millions of BLAST, BLOCKS, and other processes

Natalia Maltsev et al.

http://compbio.mcs.anl.gov/puma2

Page 14: Cyberinfrastructure and the Role of Grid Computing

15

Example: Earth System Grid

Provide access to large climate simulation data
  Per-collection control
  Different user classes
  Server-side processing

Implementation (GT)
  Portal-based User Registration (PURSE)
  PKI, SAML assertions
  GridFTP, GRAM, SRM

>2000 users
>100 TB downloaded
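Per-collection control over different user classes amounts to a policy lookup before any download or server-side processing request is served. The sketch below shows one way such a check could look; the collection names, user classes, and users are all hypothetical, and it says nothing about how the Earth System Grid actually encodes its policy or how PURSE and SAML assertions convey a user's class.

```python
from typing import Dict, Set

# Hypothetical policy: which user classes may read each collection.
COLLECTION_POLICY: Dict[str, Set[str]] = {
    "public-summaries": {"guest", "registered", "project-member"},
    "model-output": {"registered", "project-member"},
    "pre-release-runs": {"project-member"},
}

# Hypothetical registry; in practice the class would come from the
# credential or assertion presented by the user.
USER_CLASSES: Dict[str, str] = {
    "alice": "project-member",
    "bob": "registered",
}

def may_access(user: str, collection: str) -> bool:
    """Allow access only if the user's class is listed for the collection."""
    user_class = USER_CLASSES.get(user, "guest")  # unknown users are guests
    allowed = COLLECTION_POLICY.get(collection, set())  # unknown: deny
    return user_class in allowed

print(may_access("bob", "model-output"))
print(may_access("bob", "pre-release-runs"))
```

Defaulting unknown users to the least privileged class and unknown collections to an empty allow-list makes the check fail closed.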

Page 15: Cyberinfrastructure and the Role of Grid Computing

16

Under the Covers

Page 16: Cyberinfrastructure and the Role of Grid Computing

17

Example: Astro Portal Stacking Service

Purpose
  On-demand “stacks” of random locations within ~10TB dataset
Challenge
  Rapid access to 10-10K “random” files
  Time-varying load
Solution
  Dynamic acquisition of compute, storage

[Figure: several image cutouts added together to produce one stacked image. Labels: S4; Sloan Data; Web page or Web Service]

With Ioan Raicu & Alex Szalay
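A "stack" in this sense is a pixel-wise sum of small cutouts taken at many sky positions, so that faint sources add up while noise averages out. The sketch below illustrates the operation on a toy image held in nested lists. It assumes every cutout lies fully inside one in-memory image, which sidesteps the real difficulty named on the slide: fetching up to thousands of files from a ~10TB dataset.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]

def cutout(image: Image, row: int, col: int, half: int) -> Image:
    """Square sub-image of side 2*half+1 centered on (row, col).

    Assumes the cutout lies fully inside the image (no edge handling).
    """
    return [r[col - half: col + half + 1]
            for r in image[row - half: row + half + 1]]

def stack(cutouts: Sequence[Image]) -> Image:
    """Pixel-wise sum of equally sized cutouts."""
    height, width = len(cutouts[0]), len(cutouts[0][0])
    total = [[0.0] * width for _ in range(height)]
    for c in cutouts:
        for i in range(height):
            for j in range(width):
                total[i][j] += c[i][j]
    return total

# Toy 5x5 "sky" with two faint sources of brightness 1.0.
sky: Image = [[0.0] * 5 for _ in range(5)]
sky[1][1] = 1.0
sky[3][3] = 1.0

positions: List[Tuple[int, int]] = [(1, 1), (3, 3)]
stacked = stack([cutout(sky, r, c, half=1) for r, c in positions])
print(stacked)
```

Each cutout is independent of the others, which is what makes the workload easy to spread across dynamically acquired compute nodes; only the final sum needs the partial results in one place.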

Page 17: Cyberinfrastructure and the Role of Grid Computing

18

Astro Portal Stacking Performance (LAN GPFS)

Page 18: Cyberinfrastructure and the Role of Grid Computing

19

Example: Cybershake

Calculate hazard curves by generating synthetic seismograms from estimated rupture forecast

[Figure labels: Rupture Forecast; Strain Green Tensor; Synthetic Seismogram; Spectral Acceleration; Hazard Curve; Hazard Map]

Tom Jordan et al., SCEC
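One common way to turn a rupture forecast and a set of synthetic seismograms into a hazard curve is to combine, for each shaking level, the probability that each rupture occurs with the fraction of its simulated seismograms that exceed that level. The sketch below follows that textbook formulation under an independence assumption; it is not taken from the slides, and the data layout (`probability`, `peak_sa`) and all numbers are hypothetical.

```python
from typing import Dict, List, Sequence

def hazard_curve(ruptures: Sequence[Dict], levels: Sequence[float]) -> List[float]:
    """Probability of exceeding each spectral-acceleration level.

    Each rupture carries an occurrence probability and the peak spectral
    accelerations from its synthetic seismograms. Ruptures are treated as
    independent (an assumption of this sketch).
    """
    curve = []
    for level in levels:
        p_no_exceedance = 1.0
        for rupture in ruptures:
            peaks = rupture["peak_sa"]
            fraction = sum(1 for p in peaks if p > level) / len(peaks)
            p_no_exceedance *= 1.0 - rupture["probability"] * fraction
        curve.append(1.0 - p_no_exceedance)
    return curve

# Hypothetical forecast: two ruptures, two simulated seismograms each.
ruptures = [
    {"probability": 0.1, "peak_sa": [0.2, 0.4]},
    {"probability": 0.5, "peak_sa": [0.1, 0.3]},
]
levels = [0.0, 0.25, 0.5]
curve = hazard_curve(ruptures, levels)
print(curve)
```

Repeating this calculation at many sites and contouring one level of the resulting curves is what yields a hazard map, which is why the number of seismograms to synthesize grows so quickly.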

Page 19: Cyberinfrastructure and the Role of Grid Computing

20

Cybershake on the SCEC VO

(20 TB, 1.8 CPU years)

[Figure labels: Workflow Scheduler/Engine; VO Service Catalog; Provenance Catalog; Data Catalog; VO Scheduler; SCEC Storage; TeraGrid Compute; TeraGrid Storage]

Deelman, Kesselman, et al., USC/ISI

[Chart: number of jobs per day and number of CPU hours per day over 23 days (10/19 to 11/10), log scale from 1 to 100,000; series JOBS and HRS. Totals: 261,823 jobs; 15,706 CPU hours (1.8 years)]
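The totals quoted for the 23-day run can be checked, and a few derived averages obtained, with simple arithmetic. The only inputs below are the slide's own totals; the per-day and per-job figures are averages derived here, not values read from the chart.

```python
TOTAL_JOBS = 261_823
TOTAL_CPU_HOURS = 15_706
DAYS = 23
HOURS_PER_YEAR = 24 * 365

cpu_years = TOTAL_CPU_HOURS / HOURS_PER_YEAR          # total work in CPU years
jobs_per_day = TOTAL_JOBS / DAYS                      # average daily job count
cpu_hours_per_day = TOTAL_CPU_HOURS / DAYS            # average daily CPU hours
minutes_per_job = TOTAL_CPU_HOURS * 60 / TOTAL_JOBS   # average job length
average_busy_cpus = cpu_hours_per_day / 24            # CPUs kept busy on average

print(f"{cpu_years:.2f} CPU years")
print(f"{jobs_per_day:.0f} jobs/day, {cpu_hours_per_day:.0f} CPU hours/day")
print(f"{minutes_per_job:.1f} CPU minutes per job")
print(f"{average_busy_cpus:.1f} CPUs busy on average")
```

The result is a workload of many short jobs (a few CPU minutes each on average), so per-job scheduling overhead matters as much as raw compute capacity.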

Page 20: Cyberinfrastructure and the Role of Grid Computing

21

Science 1.0               Science 2.0
Gigabytes                 Terabytes
Tarballs                  Services
Journals                  Wikis
Individuals               Communities
Community codes           Science gateways
Supercomputer centers     TeraGrid, OSG, campus
Makefile                  Workflow
Computational science     Science as computation
Physical sciences         All sciences (& humanities)
Computational scientists  All scientists
NSF-funded                NSF-funded

Page 21: Cyberinfrastructure and the Role of Grid Computing

22

Science 2.0 Challenges

A need for new technologies, skills, & roles
  Creating, publishing, hosting, discovering, composing, archiving, explaining … services
A need for substantial software development
  “30-80% of modern astronomy projects is software” (S. G. Djorgovski)
A need for more & different infrastructure
  Computers & networks to host services
Can we leverage commercial spending?
  To some extent, but not straightforward

Page 22: Cyberinfrastructure and the Role of Grid Computing

23

For More Information

Globus Alliance: www.globus.org
Dev.Globus: dev.globus.org
Open Science Grid: www.opensciencegrid.org
TeraGrid: www.teragrid.org
Background: www.mcs.anl.gov/~foster

2nd Edition: www.mkp.com/grid2

Thanks to DOE and NSF for research support!!