Cyberinfrastructure and the Role of Grid Computing
Ian Foster
Computation Institute
Argonne National Lab & University of Chicago
Cyberinfrastructure and the Role of Grid Computing
Or, “Science 2.0”
2
“Web 2.0”
Software as services
  Data- & computation-rich network services
Services as platforms
  Easy composition of services to create new capabilities (“mashups”), which may themselves be made accessible as new services
Enabled by massive infrastructure buildout
  Google projected to spend $1.5B on computers, networks, and real estate in 2006
  Dozens of others are spending substantially
Paid for by advertising
Declan Butler, Nature
3
Science 2.0: E.g., Virtual Observatories
[Figure labels: User; Gateway; Data Archives; Analysis tools; Discovery tools]
Figure: S. G. Djorgovski
4
Science 2.0: E.g., Cancer Bioinformatics Grid
[Figure labels: Researcher or Client App; <BPEL WorkflowDoc>; <WorkflowInputs>; <WorkflowResults>; BPEL Engine; Data Service @ uchicago.edu; Analytic service @ osu.edu; Analytic service @ duke.edu; four "link" connectors]
caBIG: https://cabig.nci.nih.gov/; BPEL work: Ravi Madduri et al.
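The pattern named on this slide, a workflow engine that takes inputs, calls a data service, and passes the data through analytic services, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three service functions and `run_workflow` are hypothetical in-process stand-ins, whereas the caBIG work expresses the workflow in BPEL and invokes remote Web services.

```python
from typing import Any, Callable, List

# Hypothetical stand-ins for remote services. Real caBIG services are
# Web services reached over the network, not local functions.
def data_service(query: str) -> List[float]:
    """Pretend data service: return measurements for a query key."""
    samples = {"gene-A": [2.0, 4.0, 6.0], "gene-B": [1.0, 1.0, 4.0]}
    return samples[query]

def normalize_service(values: List[float]) -> List[float]:
    """Pretend analytic service: scale values so the maximum is 1.0."""
    peak = max(values)
    return [v / peak for v in values]

def mean_service(values: List[float]) -> float:
    """Pretend analytic service: average the values."""
    return sum(values) / len(values)

def run_workflow(inputs: str,
                 source: Callable[[str], Any],
                 steps: List[Callable[[Any], Any]]) -> Any:
    """Minimal 'engine': fetch data, then apply each service in order."""
    data = source(inputs)
    for step in steps:
        data = step(data)
    return data

def composed_service(query: str) -> float:
    """The composition itself can be offered as a new service."""
    return run_workflow(query, data_service, [normalize_service, mean_service])

result = composed_service("gene-A")
```

Wrapping the composition in `composed_service` mirrors the idea that a workflow built from existing services can be published as a service in its own right.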
5
The Two Dimensions of Science 2.0
Decompose across network
Clients integrate dynamically
  Select & compose services
  Select “best of breed” providers
  Publish result as new services
Decouple resource & service providers
[Figure labels: Function; Resource; Data Archives; Analysis tools; Discovery tools; Users]
Fig: S. G. Djorgovski
6
Technology Requirements: Integration & Decomposition
Service-oriented Grid infrastructure
  Provision physical resources to support application workloads
Service-oriented applications
  Wrap applications & data as services
  Compose services into workflows
[Figure labels: Users; Workflows; Appln Service (twice); Composition; Invocation; Provisioning]
“The Many Faces of IT as Service”, ACM Queue, Foster, Tuecke, 2005
7
Globus Software Enables Grid Infrastructure
Web service interfaces for behaviors relating to integration and decomposition
  Primitives: resources, state, security
  Services: execution, data movement, …
Open source software that implements those interfaces
  In particular, Globus Toolkit (GT4)
All standard Web services
  “Grid is a use case for Web services, focused on resource management”
8
Open Source Grid Software
Globus Toolkit v4 (www.globus.org)
Security: Authentication Authorization; Community Authorization; Delegation; Credential Mgmt
Data Mgmt: GridFTP; Reliable File Transfer; Data Access & Integration; Data Replication; Replica Location
Execution Mgmt: Grid Resource Allocation & Management; Community Scheduling Framework; Workspace Management; Grid Telecontrol Protocol
Info Services: Index; Trigger; WebMDS
Common Runtime: Java Runtime; C Runtime; Python Runtime
Globus Toolkit Version 4: Software for Service-Oriented Systems, LNCS 3779, 2-13, 2005
9
http://dev.globus.org
Guidelines (Apache)
Infrastructure (CVS, email, bugzilla, Wiki)
Projects Include …
dev.globus: Community Driven Improvement of Globus Software, NSF OCI
10
Hosted Science Services
1) Integrate services from external sources
  Virtualize “services” from providers
2) Coordinate & compose
  Create new services from existing ones
[Figure labels: Community Services Provider; Content; Services; Capacity; Capacity Provider]
“Service-Oriented Science”, Science, 2005
11
The Globus-Based LIGO Data Grid
LIGO Gravitational Wave Observatory
Replicating >1 Terabyte/day to 8 sites
>40 million replicas so far
MTBF = 1 month
[Map labels: Birmingham; Cardiff; AEI/Golm]
www.globus.org/solutions
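To put the replication rate in network terms, the short calculation below converts terabytes per day into a sustained bit rate. It is a back-of-the-envelope sketch, assuming decimal terabytes (10^12 bytes) and a perfectly even transfer over 24 hours; the slide does not say whether the figure is per site or in total, so the function only converts the unit.

```python
BYTES_PER_TB = 10**12            # assumption: decimal terabytes
SECONDS_PER_DAY = 24 * 60 * 60

def sustained_mbit_per_s(tb_per_day: float) -> float:
    """Average rate in megabits per second needed to move tb_per_day."""
    bits_per_day = tb_per_day * BYTES_PER_TB * 8
    return bits_per_day / SECONDS_PER_DAY / 1e6

rate = sustained_mbit_per_s(1.0)
print(f"1 TB/day is about {rate:.1f} Mbit/s sustained")
```

Under these assumptions one terabyte per day works out to a little over 90 Mbit/s of continuous throughput.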
12
Data Replication Service
Pull “missing” files to a storage system
[Figure labels: List of required Files; Data Replication Service; Reliable File Transfer Service; GridFTP (twice); Local Replica Catalog (twice); Replica Location Index (twice)]
[Figure groups: Data Location; Data Movement; Data Replication]
“Design and Implementation of a Data Replication Service Based on the Lightweight Data Replicator System,” Chervenak et al., 2005
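The "pull missing files" idea can be sketched as a small loop: compare the list of required files with what the local catalog already holds, look up a source for each missing file, transfer it, and register the new copy. This is a minimal in-memory sketch, not the Globus implementation: `replicate_missing`, the set-based catalog, the dictionary index, and the `transfer` callback are all hypothetical stand-ins for the Local Replica Catalog, Replica Location Index, and Reliable File Transfer Service.

```python
from typing import Callable, Dict, List, Set

def replicate_missing(required: List[str],
                      local_catalog: Set[str],
                      location_index: Dict[str, List[str]],
                      transfer: Callable[[str, str], None]) -> List[str]:
    """Copy every required file that is not yet registered locally.

    Returns the names of the files that were copied.
    """
    copied = []
    for name in required:
        if name in local_catalog:
            continue                      # already present locally
        sources = location_index.get(name, [])
        if not sources:
            continue                      # no known replica; a real service would report this
        transfer(sources[0], name)        # hypothetical transfer call
        local_catalog.add(name)           # register the new local replica
        copied.append(name)
    return copied

# Usage with made-up file and site names.
transfer_log = []
local = {"f1.dat"}
index = {"f1.dat": ["siteA"], "f2.dat": ["siteB", "siteC"]}
copied = replicate_missing(
    ["f1.dat", "f2.dat", "f3.dat"], local, index,
    lambda site, name: transfer_log.append((site, name)),
)
```

In this run only `f2.dat` is copied: `f1.dat` is already local and `f3.dat` has no entry in the index.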
14
Example: Biology
Public PUMA Knowledge Base
  Information about proteins analyzed against ~2 million gene sequences
Back Office Analysis on Grid
  Millions of BLAST, BLOCKS, etc., on OSG and TeraGrid
Natalia Maltsev et al., http://compbio.mcs.anl.gov/puma2
15
Example: Earth System Grid
Climate simulation data
  Per-collection control
  Different user classes
  Server-side processing
Implementation (GT)
  Portal-based User Registration (PURSE)
  PKI, SAML assertions
  GridFTP, GRAM, SRM
>2000 users
>100 TB downloaded
www.earthsystemgrid.org (DOE OASCR)
16
Under the Covers
17
Example: Astro Portal Stacking Service
Purpose
  On-demand “stacks” of random locations within ~10TB dataset
Challenge
  Rapid access to 10-10K “random” files
  Time-varying load
Solution
  Dynamic acquisition of compute, storage
[Figure: image cutouts combined (+ + + + + + + =) into one stack; labels: S4; Sloan Data; Web page or Web Service]
Joint work with Ioan Raicu & Alex Szalay
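A "stack" here means combining small cutouts taken from many locations into one image, as the row of plus signs in the figure suggests. The sketch below shows that operation on plain nested lists. It assumes a stack is a pixel-wise sum of equally sized square cutouts; the function names and the tiny 4x4 test image are illustrative only and are not taken from the Astro Portal code.

```python
from typing import List

Image = List[List[float]]

def cutout(image: Image, row: int, col: int, size: int) -> Image:
    """Extract a size x size sub-image whose top-left corner is (row, col)."""
    return [r[col:col + size] for r in image[row:row + size]]

def stack(cutouts: List[Image]) -> Image:
    """Pixel-wise sum of equally sized cutouts."""
    size = len(cutouts[0])
    return [
        [sum(c[i][j] for c in cutouts) for j in range(size)]
        for i in range(size)
    ]

# A made-up 4x4 "sky image" whose pixel value is 4*row + col.
sky = [[float(4 * r + c) for c in range(4)] for r in range(4)]
stacked = stack([cutout(sky, 0, 0, 2), cutout(sky, 2, 2, 2)])
```

Each cutout touches a different file or region, which is why the slide lists rapid access to many "random" files as the main challenge.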
18
Preliminary Performance (TeraGrid, LAN GPFS)
Joint work with Ioan Raicu & Alex Szalay
19
Example: Cybershake
Calculate hazard curves by generating synthetic seismograms from estimated rupture forecast
[Figure labels: Rupture Forecast; Strain Green Tensor; Synthetic Seismogram; Spectral Acceleration; Hazard Curve; Hazard Map]
Tom Jordan et al., Southern California Earthquake Center
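As a rough illustration of how a hazard curve can be assembled from a rupture forecast and synthetic seismograms, the sketch below uses a simplified textbook-style formulation, not the Cybershake code. It assumes each rupture has an occurrence probability and a list of peak spectral accelerations from its simulated seismograms, that ruptures are independent, and that the chance of exceeding a level given a rupture is the fraction of its simulations above that level. All names and numbers are made up.

```python
from typing import List, Tuple

# (probability that the rupture occurs, peak spectral accelerations in g
#  from its synthetic seismograms). Values are invented for illustration.
Rupture = Tuple[float, List[float]]

def exceedance_probability(ruptures: List[Rupture], level: float) -> float:
    """P(at least one rupture produces spectral acceleration > level)."""
    p_none = 1.0
    for p_rupture, accelerations in ruptures:
        above = sum(1 for a in accelerations if a > level)
        p_exceed_given_rupture = above / len(accelerations)
        p_none *= 1.0 - p_rupture * p_exceed_given_rupture
    return 1.0 - p_none

def hazard_curve(ruptures: List[Rupture],
                 levels: List[float]) -> List[Tuple[float, float]]:
    """Hazard curve as (level, exceedance probability) pairs."""
    return [(x, exceedance_probability(ruptures, x)) for x in levels]

forecast = [(0.1, [0.2, 0.6]), (0.5, [0.1, 0.3])]
curve = hazard_curve(forecast, [0.25, 0.5, 1.0])
```

Repeating this calculation at many sites is what turns individual hazard curves into a hazard map.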
20
Enlisting TeraGrid Resources
20 TB, 1.8 CPU-year
[Figure labels: Workflow Scheduler/Engine; VO Service Catalog; Provenance Catalog; Data Catalog; VO Scheduler; SCEC Storage; TeraGrid Compute; TeraGrid Storage]
Ewa Deelman, Carl Kesselman, et al., USC Information Sciences Institute
[Plot: number of jobs per day and number of CPU hours per day over 23 days (10/19 to 11/10), log scale from 1 to 100,000; series: JOBS, HRS; 261,823 jobs total; 15,706 CPU hours total (1.8 years)]
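The totals quoted for this run can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers on the slide (261,823 jobs, 15,706 CPU hours, 23 days) and assumes a 365-day year; the derived averages are illustrative and say nothing about how the load was actually distributed across days.

```python
JOBS = 261_823
CPU_HOURS = 15_706
DAYS = 23
HOURS_PER_YEAR = 24 * 365          # assumption: 365-day year

cpu_years = CPU_HOURS / HOURS_PER_YEAR         # total work in CPU-years
jobs_per_day = JOBS / DAYS                     # average jobs submitted per day
mean_job_seconds = CPU_HOURS * 3600 / JOBS     # average CPU time per job

print(f"{cpu_years:.2f} CPU-years, {jobs_per_day:.0f} jobs/day, "
      f"{mean_job_seconds / 60:.1f} CPU-minutes per job")
```

The first figure agrees with the "1.8 years" on the slide, and the last shows that the average job is short, a few CPU-minutes, so scheduling overhead per job matters at this scale.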
21
Science 1.0                   Science 2.0
Gigabytes                     Terabytes
Tarballs                      Services
Journals                      Wikis
Individuals                   Communities
Community codes               Science gateways
Supercomputer centers         TeraGrid, OSG, campus
Makefile                      Workflow
Computational science         Science as computation
Physical sciences             All sciences (& humanities)
Computational scientists      All scientists
NSF-funded                    NSF-funded
22
Science 2.0 Challenges
A need for new technologies, skills, & roles
  Creating, publishing, hosting, discovering, composing, archiving, explaining … services
A need for substantial software development
  “30-80% of modern astronomy projects is software” (S. G. Djorgovski)
A need for more & different infrastructure
  Computers & networks to host services
Can we leverage commercial spending?
  To some extent, but not straightforward
23
Acknowledgements
Carl Kesselman for many discussions
Many colleagues, including those named on slides, for research collaborations and/or slides
Colleagues involved in the TeraGrid, Open Science Grid, Earth System Grid, caBIG, and other Grid infrastructures
Globus Alliance members for Globus software R&D
DOE OASCR, NSF OCI, & NIH for support
24
For More Information
Globus Alliance
  www.globus.org
Dev.Globus
  dev.globus.org
Open Science Grid
  www.opensciencegrid.org
TeraGrid
  www.teragrid.org
Background
  www.mcs.anl.gov/~foster
2nd Edition
  www.mkp.com/grid2
Thanks to DOE, NSF, and NIH for research support!