Grid Computing
Ian Foster
Mathematics and Computer Science Division, Argonne National Laboratory
and Department of Computer Science, The University of Chicago
http://www.mcs.anl.gov/~foster
Seminar at Fermilab, October 31st, 2001
Issues I Will Address
- Grids in a nutshell
  – Problem statement
  – Major Grid projects
- Grid architecture
  – Globus Project™ and Toolkit™
- HENP data grid projects
  – GriPhyN, PPDG, iVDGL
The Grid Problem
Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
Grid Communities & Applications: Data Grids for High Energy Physics
[Figure (tiered LHC computing model): the detector produces ~PBytes/sec; the online system delivers ~100 MBytes/sec to the CERN computer centre (offline processor farm, ~20 TIPS, Tier 0); Tier 1 regional centres (FermiLab ~4 TIPS; France, Italy, and Germany regional centres) are fed at ~622 Mbits/sec (or by air freight, deprecated); Tier 2 centres (~1 TIPS each, e.g., Caltech) connect at ~622 Mbits/sec; institute servers (~0.25 TIPS) hold physics data caches and serve physicist workstations (Tier 4) at ~1 MBytes/sec.]
- There is a "bunch crossing" every 25 nsecs.
- There are 100 "triggers" per second.
- Each triggered event is ~1 MByte in size.
- Physicists work on analysis "channels". Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.
- 1 TIPS is approximately 25,000 SpecInt95 equivalents.
Image courtesy Harvey Newman, Caltech
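A rough consistency check on these figures (simple arithmetic, not part of the original slide; the 10^7 s/year of data taking is a standard rule of thumb rather than a number from the talk):

\[
\frac{1}{25\,\mathrm{ns}} = 40\,\mathrm{MHz}, \qquad \frac{40\,\mathrm{MHz}}{100\,\mathrm{Hz}} = 4\times10^{5}\ \text{(trigger rejection factor)}
\]
\[
100\,\mathrm{events/s} \times 1\,\mathrm{MByte/event} \approx 100\,\mathrm{MBytes/s}, \qquad 100\,\mathrm{MBytes/s} \times 10^{7}\,\mathrm{s/yr} \approx 1\,\mathrm{PByte/yr}
\]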
Grid Communities and Applications: Network for Earthquake Eng. Simulation
- NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other
- On-demand access to experiments, data streams, computing, archives, collaboration
NEESgrid: Argonne, Michigan, NCSA, UIUC, USC (www.neesgrid.org)
Access Grid
- Collaborative work among large groups
- ~50 sites worldwide
- Uses Grid services for discovery, security
- www.scglobal.org
[Figure: Access Grid node layout, with an ambient (tabletop) microphone, presenter microphone, presenter camera, and audience camera.]
Access Grid: Argonne, others (www.accessgrid.org)
Why Grids?
- A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
- 1,000 physicists worldwide pool resources for petaop analyses of petabytes of data
- Civil engineers collaborate to design, execute, & analyze shake table experiments
- Climate scientists visualize, annotate, & analyze terabyte simulation datasets
- An emergency response team couples real-time data, weather model, population data
Why Grids? (contd)
- A multidisciplinary analysis in aerospace couples code and data in four companies
- A home user invokes architectural design functions at an application service provider
- An application service provider purchases cycles from compute cycle providers
- Scientists working for a multinational soap company design a new product
- A community group pools members' PCs to analyze alternative designs for a local road
Elements of the Problem
- Resource sharing
  – Computers, storage, sensors, networks, …
  – Sharing always conditional: issues of trust, policy, negotiation, payment, …
- Coordinated problem solving
  – Beyond client-server: distributed data analysis, computation, collaboration, …
- Dynamic, multi-institutional virtual organizations
  – Community overlays on classic org structures
  – Large or small, static or dynamic
A Little History
- Early 90s
  – Gigabit testbeds, metacomputing
- Mid to late 90s
  – Early experiments (e.g., I-WAY), software projects (e.g., Globus), application experiments
- 2001
  – Major application communities emerging
  – Major infrastructure deployments are underway
  – Rich technology base has been constructed
  – Global Grid Forum: >1000 people on mailing lists, 192 orgs at last meeting, 28 countries
The Grid World: Current Status
- Dozens of major Grid projects in scientific & technical computing/research & education
  – Deployment, application, technology
- Considerable consensus on key concepts and technologies
  – Membership, security, resource discovery, resource management, …
- Global Grid Forum has emerged as a significant force
  – And first "Grid" proposals at IETF
Selected Major Grid Projects (name; URL & sponsors; focus)
- Access Grid (www.mcs.anl.gov/FL/accessgrid; DOE, NSF): Create & deploy group collaboration systems using commodity technologies
- BlueGrid (IBM): Grid testbed linking IBM laboratories
- DISCOM (www.cs.sandia.gov/discom; DOE Defense Programs): Create operational Grid providing access to resources at three U.S. DOE weapons laboratories
- DOE Science Grid (sciencegrid.org; DOE Office of Science): Create operational Grid providing access to resources & applications at U.S. DOE science laboratories & partner universities
- Earth System Grid (ESG) (earthsystemgrid.org; DOE Office of Science): Delivery and analysis of large climate model datasets for the climate research community
- European Union (EU) DataGrid (eu-datagrid.org; European Union): Create & apply an operational grid for applications in high energy physics, environmental science, bioinformatics
Selected Major Grid Projects (continued)
- EuroGrid, Grid Interoperability (GRIP) (eurogrid.org; European Union): Create technologies for remote access to supercomputer resources & simulation codes; in GRIP, integrate with Globus
- Fusion Collaboratory (fusiongrid.org; DOE Office of Science): Create a national computational collaboratory for fusion research
- Globus Project (globus.org; DARPA, DOE, NSF, NASA, Microsoft): Research on Grid technologies; development and support of Globus Toolkit; application and deployment
- GridLab (gridlab.org; European Union): Grid technologies and applications
- GridPP (gridpp.ac.uk; U.K. eScience): Create & apply an operational grid within the U.K. for particle physics research
- Grid Research Integration Development & Support Center (grids-center.org; NSF): Integration, deployment, support of the NSF Middleware Infrastructure for research & education
Selected Major Grid Projects (continued)
- Grid Application Development Software (GrADS) (hipersoft.rice.edu/grads; NSF): Research into program development technologies for Grid applications
- Grid Physics Network (GriPhyN) (griphyn.org; NSF): Technology R&D for data analysis in physics experiments: ATLAS, CMS, LIGO, SDSS
- Information Power Grid (ipg.nasa.gov; NASA): Create and apply a production Grid for aerosciences and other NASA missions
- International Virtual Data Grid Laboratory (iVDGL) (ivdgl.org; NSF): Create international Data Grid to enable large-scale experimentation on Grid technologies & applications
- Network for Earthquake Engineering Simulation Grid (NEESgrid) (neesgrid.org; NSF): Create and apply a production Grid for earthquake engineering
- Particle Physics Data Grid (PPDG) (ppdg.net; DOE Science): Create and apply production Grids for data analysis in high energy and nuclear physics experiments
Selected Major Grid Projects (continued)
- TeraGrid (teragrid.org; NSF): U.S. science infrastructure linking four major resource sites at 40 Gb/s
- UK Grid Support Center (grid-support.ac.uk; U.K. eScience): Support center for Grid projects within the U.K.
- Unicore (BMBFT): Technologies for remote access to supercomputers
Also many technology R&D projects: e.g., Condor, NetSolve, Ninf, NWS
See also www.gridforum.org
Grid Architecture & Globus Toolkit™
- The question:
  – What is needed for resource sharing & coordinated problem solving in dynamic virtual organizations (VOs)?
- The answer:
  – Major issues identified: membership, resource discovery & access, …
  – Grid architecture captures core elements, emphasizing the pre-eminent role of protocols
  – Globus Toolkit™ has emerged as a de facto standard for major protocols & services
Layered Grid Architecture (By Analogy to Internet Architecture)
- Application
- Collective ("Coordinating multiple resources"): ubiquitous infrastructure services, app-specific distributed services
- Resource ("Sharing single resources"): negotiating access, controlling use
- Connectivity ("Talking to things"): communication (Internet protocols) & security
- Fabric ("Controlling things locally"): access to, & control of, resources
Internet Protocol Architecture analogue: Application / Transport / Internet / Link
For more info: www.globus.org/research/papers/anatomy.pdf
Grid Services Architecture (1): Fabric Layer
- Just what you would expect: the diverse mix of resources that may be shared
  – Individual computers, Condor pools, file systems, archives, metadata catalogs, networks, sensors, etc.
- Few constraints on low-level technology: connectivity and resource-level protocols form the "neck in the hourglass"
- Globus Toolkit provides a few selected components (e.g., bandwidth broker)
Grid Services Architecture (2): Connectivity Layer Protocols & Services
- Communication
  – Internet protocols: IP, DNS, routing, etc.
- Security: Grid Security Infrastructure (GSI)
  – Uniform authentication & authorization mechanisms in a multi-institutional setting
  – Single sign-on, delegation, identity mapping
  – Public key technology, SSL, X.509, GSS-API (several Internet drafts document extensions)
  – Supporting infrastructure: Certificate Authorities, key management, etc.
GSI in Action: "Create Processes at A and B that Communicate & Access Files at C"
[Figure: the user performs single sign-on via a "grid-id", generating a proxy credential (or retrieving one from an online repository). The user proxy issues remote process-creation requests, with mutual authentication, to GSI-enabled GRAM servers at Site A (Kerberos) and Site B (Unix). Each server authorizes the request, maps the grid-id to a local id (e.g., a Kerberos ticket at Site A), creates the process, and generates a restricted proxy for it. The two processes then communicate with mutual authentication and issue remote file-access requests, again mutually authenticated, to a GSI-enabled FTP server at Site C (Kerberos), which authorizes, maps to a local id, and accesses the file on the storage system.]
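The flow above can be summarized as a small, self-contained toy model in Python (illustration only, not the Globus Toolkit API; the distinguished name, site names, and mapfile entries are invented):

# Toy model of the GSI flow (not the Globus API): single sign-on yields a short-lived
# proxy credential; each site authorizes the request and maps the Grid identity to a
# local identity before creating processes on the user's behalf.
from dataclasses import dataclass
import time

@dataclass
class Proxy:
    grid_id: str      # e.g. an X.509 distinguished name
    expires: float    # short lifetime limits exposure of the delegated credential

def single_sign_on(grid_id: str, lifetime_s: int = 12 * 3600) -> Proxy:
    """One sign-on: sign a short-lived proxy with the user's long-lived credential."""
    return Proxy(grid_id, time.time() + lifetime_s)

# Per-site "grid-mapfile": Grid identity -> local account (Kerberos principal, Unix id, ...).
GRIDMAP = {
    "siteA": {"/O=Grid/CN=Jane Physicist": "jane@SITEA.REALM"},  # Kerberos site
    "siteB": {"/O=Grid/CN=Jane Physicist": "jphys"},             # Unix site
}

def authorize(site: str, proxy: Proxy) -> str:
    """Mutual authentication elided; authorize and map to a local id, or refuse."""
    if time.time() > proxy.expires:
        raise PermissionError("proxy expired")
    if proxy.grid_id not in GRIDMAP.get(site, {}):
        raise PermissionError(f"{proxy.grid_id} not authorized at {site}")
    return GRIDMAP[site][proxy.grid_id]

proxy = single_sign_on("/O=Grid/CN=Jane Physicist")
for site in ("siteA", "siteB"):
    local_id = authorize(site, proxy)
    print(f"{site}: create process as {local_id}; delegate a restricted proxy to it")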
Grid Services Architecture (3): Resource Layer Protocols & Services
- Resource management: GRAM
  – Remote allocation, reservation, monitoring, control of [compute] resources
- Data access: GridFTP
  – High-performance data access & transport
- Information: MDS (GRRP, GRIP)
  – Access to structure & state information
- Others emerging: catalog access, code repository access, accounting, …
- All integrated with GSI
GRAM Resource Management Protocol
- Grid Resource Allocation & Management
  – Allocation, monitoring, control of computations
  – Secure remote access to diverse schedulers
- Current evolution
  – Immediate and advance reservation
  – Multiple resource types: manage anything
  – Recoverable requests, timeout, etc.
  – Evolve to Web Services
  – Policy evaluation points for restricted proxies
Karl Czajkowski, Steve Tuecke, others
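For concreteness (an illustration, not from the talk): a GRAM request is typically expressed in the Globus Resource Specification Language (RSL). The sketch below builds a simple RSL string in Python; the gatekeeper contact in the comment and the exact attribute set are assumptions to be checked against the toolkit documentation.

# Illustrative only: compose a simple RSL job description for GRAM.
# Attribute names (executable, arguments, count, stdout) are common basic RSL
# attributes; host names and paths are invented.

def make_rsl(executable, arguments=(), count=1, stdout="job.out"):
    """Build a minimal RSL request string."""
    clauses = [f"(executable={executable})", f"(count={count})", f"(stdout={stdout})"]
    if arguments:
        clauses.insert(1, "(arguments=" + " ".join(str(a) for a in arguments) + ")")
    return "&" + "".join(clauses)

print(make_rsl("/bin/hostname", count=4))
# -> &(executable=/bin/hostname)(count=4)(stdout=job.out)
# In Globus Toolkit 2 this would be handed to a gatekeeper, roughly:
#   globusrun -r gatekeeper.example.org/jobmanager-pbs '<rsl string>'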
Data Access & Transfer
- GridFTP: extended version of the popular FTP protocol for Grid data access and transfer
- Secure, efficient, reliable, flexible, extensible, parallel, concurrent, e.g.:
  – Third-party data transfers, partial file transfers
  – Parallelism, striping (e.g., on PVFS)
  – Reliable, recoverable data transfers
- Reference implementations
  – Existing clients and servers: wuftpd, ncftp
  – Flexible, extensible libraries
Bill Allcock, Joe Bester, John Bresnahan, Steve Tuecke, others
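As a rough usage illustration (not from the talk): a third-party transfer can be driven with the globus-url-copy client, which moves data directly between two GridFTP servers under the user's proxy credential. Host names, paths, and the parallel-streams option below are assumptions; consult the toolkit documentation for exact flags.

# Illustrative sketch: drive a third-party GridFTP transfer via globus-url-copy.
# Host names, paths, and the "-p 4" (parallel streams) option are assumptions.
import subprocess

src = "gsiftp://storage-a.example.org/data/run42/events.dat"
dst = "gsiftp://storage-b.example.org/cache/run42/events.dat"

# Third-party transfer: the client orchestrates a direct server-to-server copy,
# authenticated with the user's GSI proxy credential.
subprocess.run(["globus-url-copy", "-p", "4", src, dst], check=True)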
Grid Services Architecture (4): Collective Layer Protocols & Services
- Community membership & policy
  – E.g., Community Authorization Service
- Index/metadirectory/brokering services
  – Custom views on community resource collections (e.g., GIIS, Condor Matchmaker)
- Replica management and replica selection
  – Optimize aggregate data access performance (see the sketch below)
- Co-reservation and co-allocation services
  – End-to-end performance
- Etc.
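A minimal sketch of the replica-selection idea referenced above (illustrative assumptions only, not an actual Globus service): given several physical copies of a logical file, pick the copy expected to deliver the best transfer performance.

# Illustrative replica selection: choose among physical replicas of a logical file
# using estimated bandwidth, e.g. as published by a forecaster such as NWS.
# All names and numbers below are invented.

replica_catalog = {
    "lfn://cms/run42/events.dat": [
        "gsiftp://cern.example.org/store/run42/events.dat",
        "gsiftp://fnal.example.org/store/run42/events.dat",
        "gsiftp://caltech.example.org/store/run42/events.dat",
    ],
}

estimated_mbps = {          # stand-in for live network measurements/forecasts
    "cern.example.org": 12.0,
    "fnal.example.org": 85.0,
    "caltech.example.org": 40.0,
}

def select_replica(logical_name: str) -> str:
    """Return the physical replica with the highest forecast bandwidth to us."""
    def host(url: str) -> str:
        return url.split("/")[2]
    return max(replica_catalog[logical_name], key=lambda u: estimated_mbps[host(u)])

print(select_replica("lfn://cms/run42/events.dat"))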
The Grid Information Problem
- Large numbers of distributed "sensors" with different properties
- Need for different "views" of this information, depending on community membership, security constraints, intended purpose, sensor type
Globus Toolkit Solution: MDS-2
- Registration & enquiry protocols, information models, query languages
  – Provides standard interfaces to sensors
  – Supports different "directory" structures supporting various discovery/access strategies
Karl Czajkowski, Steve Fitzgerald, others
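Because MDS-2 is LDAP-based, a resource query can be issued with an ordinary LDAP client. The sketch below is illustrative: the host name is invented, and port 2135 with the search base "Mds-Vo-name=local, o=grid" are the customary GT2 defaults, which may differ at a given installation.

# Illustrative MDS-2 query via LDAP (host name invented; port and base are assumptions).
from ldap3 import Server, Connection, ALL

server = Server("giis.example.org", port=2135, get_info=ALL)
conn = Connection(server, auto_bind=True)          # anonymous bind to a public index

# Ask the index for everything it publishes about registered resources;
# a real client would use a narrower filter and attribute list.
conn.search("Mds-Vo-name=local, o=grid", "(objectClass=*)", attributes=["*"])
for entry in conn.entries:
    print(entry.entry_dn)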
Community Authorization Service (Prototype shown August 2001)
[Figure: the CAS authorization flow.]
1. The user sends a CAS request naming the resources and operations desired. The CAS consults its user/group membership, resource/collective membership, and collective policy information: does the collective policy authorize this request for this user?
2. The CAS replies with a capability and resource CA info.
3. The user sends the resource request, authenticated with the capability.
4. The resource consults its local policy information: is this request authorized for the CAS, and is it authorized by the capability? It then returns the resource reply.
Laura Pearlman, Steve Tuecke, Von Welch, others
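The two-level decision in the figure (collective policy at the CAS, local policy at the resource) can be sketched as follows; the data structures and names are invented for illustration, and the actual prototype conveys capabilities inside credentials (cf. the restricted proxies mentioned earlier) rather than Python dictionaries.

# Illustrative sketch of the two-level CAS authorization decision (invented data).

community_policy = {            # held by the CAS
    "/O=Grid/CN=Jane Physicist": {("gsiftp://store.example.org", "read")},
}
local_policy = {                # held by the resource
    "trusted_cas": {"/O=Grid/CN=Example CAS"},
    "cas_rights": {("gsiftp://store.example.org", "read"),
                   ("gsiftp://store.example.org", "write")},
}

def cas_issue_capability(user, resource, operation):
    """Steps 1-2: the CAS grants a capability only if collective policy allows it."""
    if (resource, operation) in community_policy.get(user, set()):
        return {"issuer": "/O=Grid/CN=Example CAS", "grants": {(resource, operation)}}
    raise PermissionError("denied by collective policy")

def resource_authorize(capability, resource, operation):
    """Steps 3-4: the resource honours the request only if both the CAS and the
    capability authorize it under local policy."""
    return (capability["issuer"] in local_policy["trusted_cas"]
            and (resource, operation) in local_policy["cas_rights"]
            and (resource, operation) in capability["grants"])

cap = cas_issue_capability("/O=Grid/CN=Jane Physicist",
                           "gsiftp://store.example.org", "read")
print(resource_authorize(cap, "gsiftp://store.example.org", "read"))   # True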
HENP Related Data Grid Projects
- Funded (or about to be funded) projects
  – PPDG I (USA, DOE): $2M, 1999-2001
  – GriPhyN (USA, NSF): $11.9M + $1.6M, 2000-2005
  – EU DataGrid (EU, EC): €10M, 2001-2004
  – PPDG II (USA, DOE): $9.5M, 2001-2004
  – iVDGL (USA, NSF): $13.7M + $2M, 2001-2006
  – DataTAG (EU, EC): €4M, 2002-2004
- Proposed projects
  – GridPP (UK, PPARC): >$15M?, 2001-2004
- Many national projects of interest to HENP
  – Initiatives in US, UK, Italy, France, NL, Germany, Japan, …
  – EU networking initiatives (Géant, SURFNet)
  – US Distributed Terascale Facility ($53M, 12 TFLOPS, 40 Gb/s network)
Background on Data Grid Projects
- They support several disciplines
  – GriPhyN: CS, HEP (LHC), gravity waves, digital astronomy
  – PPDG: CS, HEP (LHC + current expts), nuclear physics, networking
  – DataGrid: CS, HEP (LHC), earth sensing, biology, networking
- They are already joint projects
  – Multiple scientific communities
  – High-performance scientific experiments
  – International components and connections
- They have common elements & foundations
  – Interconnected management structures
  – Sharing of people, projects (e.g., GDMP)
  – Globus infrastructure
  – Data Grid Reference Architecture
- HENP Intergrid Coordination Board
Grid Physics Network (GriPhyN)
- Enabling R&D for advanced data grid systems, focusing in particular on the Virtual Data concept
- Experiments: ATLAS, CMS, LIGO, SDSS
[Figure: production teams, individual investigators, and other users work through interactive user tools and through virtual data tools, request planning and scheduling tools, and request execution management tools. These are layered over resource management services, security and policy services, and other Grid services, which mediate access to distributed resources (code, storage, computers, and network), raw data sources, and transforms.]
Virtual Data in Action
- A data request may: access local data, compute locally, compute remotely, or access remote data
- Scheduling & execution are subject to local & global policies
[Figure: requests flow among major archive facilities, network caches & regional centers, and local sites.]
GriPhyN Status, October 2001
- Data Grid Reference Architecture defined
  – v1: core services (Feb 2001)
  – v2: request planning/mgmt, catalogs (RSN)
- Progress on ATLAS, CMS, LIGO, SDSS
  – Requirements statements developed
  – Testbeds and experiments proceeding
- Progress on technology
  – DAGMan request management
  – Catalogs, security, policy
- Virtual Data Toolkit v1.0 out soon
GriPhyN/PPDG Data Grid Architecture
[Figure: an Application submits a request to a Planner, which emits a DAG; an Executor (DAGMan, Kangaroo) runs the DAG on Compute Resources (via GRAM) and Storage Resources (via GridFTP, GRAM, SRM). Supporting services: Catalog Services (MCAT, GriPhyN catalogs), Replica Management (GDMP), Reliable Transfer Service, Information Services (MDS), Monitoring (MDS), and Policy/Security (GSI, CAS), all built on Globus. The original figure marks the components for which an initial solution is operational.]
Ewa Deelman, Mike Wilde
Catalog Architecture
[Figure: catalogs providing transparency with respect to materialization and location.
- Derived Data Catalog: entries of (id, transformation, parameters, name), e.g. i1 = F(X) -> F.X, i2 = F(Y) -> F.Y, i10 = G(Y,P) -> G(P).Y; updated upon materialization.
- Transformation Catalog: maps each transformation to program storage (URLs for program location) and an estimated cost, e.g. F at URL:f, cost 10; G at URL:g, cost 20.
- Derived Metadata Catalog: application-specific attributes keyed by object id (e.g., i2, i10).
- Metadata Catalog: maps names (X, Y, F.X, G(1).Y) to logical object names (logO1…logO4); object names (GCMS) link the metadata catalog to the replica catalog.
- Replica Catalog: maps logical container names (logC1…logC4) to one or more physical file names (URLs for physical file location) in physical file storage.]
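To show how these catalogs cooperate, here is a minimal sketch (invented names and data, not GriPhyN code) that serves a request for the derived product F.X: use an existing replica if one is registered, otherwise consult the derivation and transformation catalogs, materialize the data, and register the new replica.

# Illustrative virtual-data lookup (invented data, not the GriPhyN catalogs).

metadata_catalog = {"F.X": "logO3"}                    # name -> logical object
replica_catalog = {"logO3": []}                        # logical object -> physical URLs
derived_data_catalog = {"F.X": ("F", "X")}             # name -> (transformation, input)
transformation_catalog = {"F": {"program": "url:f", "cost": 10}}

def run_transformation(program: str, input_name: str) -> str:
    """Stand-in for executing the transformation on a compute resource."""
    return f"gsiftp://site.example.org/derived/{input_name}.out"

def resolve(name: str) -> str:
    logical = metadata_catalog[name]
    replicas = replica_catalog[logical]
    if replicas:                                        # transparency wrt location:
        return replicas[0]                              # any physical copy will do
    trans, input_name = derived_data_catalog[name]      # transparency wrt materialization:
    program = transformation_catalog[trans]["program"]  # derive the data on demand
    url = run_transformation(program, input_name)
    replica_catalog[logical].append(url)                # update upon materialization
    return url

print(resolve("F.X"))   # first call materializes; later calls reuse the replica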
[Figure: the SAM data handling system (used by the D0 experiment) mapped onto the Grid architecture layers; the layer assignments below follow the figure approximately.
- Fabric: tape storage elements, disk storage elements, compute elements, LANs and WANs, code repository, and the resource & services, replica, and meta-data catalogs.
- Connectivity and Resource: authentication and security (GSI; SAM-specific user, group, node, and station registration; bbftp "cookie"); CORBA; UDP; file transfer protocols (ftp, bbftp, rcp, GridFTP); mass storage system protocols (e.g., encp, HPSS).
- Collective Services: catalog protocols, significant event logger, naming service, database manager, catalog manager; SAM resource management, batch systems (LSF, FBS, PBS, Condor), data mover, job services; storage manager, job manager, cache manager, request manager; "Dataset Editor", "File Storage Server", "Project Master", "Station Master".
- Applications: request formulator and planner; client applications (web, Python and Java codes, command line, D0 Framework C++ codes); "Stager", "Optimiser".
Names in "quotes" are SAM-given software component names; the figure marks components that will be replaced, or added/enhanced, using PPDG and Grid tools.]
Early GriPhyN Challenge Problem: CMS Data Reconstruction
[Figure: a master Condor job runs at Caltech (submitted from a Caltech workstation); a secondary Condor job runs on the Wisconsin (WI) Condor pool; NCSA hosts a Linux cluster and UniTree, a GridFTP-enabled FTP server. Numbered steps from the figure:]
2. Launch secondary job on WI pool; input files staged via Globus GASS
3. 100 Monte Carlo jobs run on the Wisconsin Condor pool
4. 100 data files transferred via GridFTP, ~1 GB each
5. Secondary job reports complete to master
6. Master starts reconstruction jobs via the Globus jobmanager on the cluster
7. GridFTP fetches data from UniTree
8. Processed Objectivity database stored to UniTree
9. Reconstruction job reports complete to master
Scott Koranda, Miron Livny, others
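The numbered steps form a dependency graph of the kind DAGMan executes. The sketch below (illustrative only; invented job names, plain Python rather than a Condor DAGMan submit file) encodes those dependencies and prints a valid execution order.

# Illustrative only: the CMS reconstruction workflow as a dependency graph.
from graphlib import TopologicalSorter   # Python 3.9+

# job -> set of jobs it depends on
workflow = {
    "launch_secondary_on_WI":         set(),
    "run_100_monte_carlo_jobs":       {"launch_secondary_on_WI"},
    "gridftp_results_to_unitree":     {"run_100_monte_carlo_jobs"},
    "report_secondary_complete":      {"gridftp_results_to_unitree"},
    "start_reconstruction_at_NCSA":   {"report_secondary_complete"},
    "gridftp_fetch_from_unitree":     {"start_reconstruction_at_NCSA"},
    "store_objectivity_db":           {"gridftp_fetch_from_unitree"},
    "report_reconstruction_complete": {"store_objectivity_db"},
}

for job in TopologicalSorter(workflow).static_order():
    print("run:", job)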
Trace of a Condor-G Physics Run
[Plot: job counts (0-120) over time for the pre/simulation/post stages on the UW Condor pool and for the ooHits and ooDigis stages at NCSA; a gap in the run is annotated "Delay due to script error".]
The 13.6 TF TeraGrid: Computing at 40 Gb/s
[Figure: the four TeraGrid sites, their site resources, and their external networks, linked at 40 Gb/s. Site resources include NCSA/PACI (8 TF, 240 TB) and SDSC (4.1 TF, 225 TB), plus Caltech and Argonne, with HPSS and UniTree archival storage at the sites.]
TeraGrid/DTF: NCSA, SDSC, Caltech, Argonne (www.teragrid.org)
International Virtual Data Grid Lab
[Map: planned iVDGL sites worldwide: Tier0/1 and Tier2 facilities, plus Tier3 facilities, interconnected by 10+ Gbps, 2.5 Gbps, 622 Mbps, and other links.]
Grids and Industry
- Scientific/technical apps in industry
  – "IntraGrid": aerospace, pharmaceuticals, …
  – "InterGrid": multi-company teams
  – Globus Toolkit provides a starting point
    > Compaq, Cray, Entropia, Fujitsu, Hitachi, IBM, Microsoft, NEC, Platform, SGI, Sun all porting, etc.
- Enable resource sharing outside scientific & technical computing
  – "Grid Services": extend Web Services with capabilities provided by Grid protocols
    > Focus of IBM-Globus collaboration
  – Future computing infrastructure based on Grid-enabled xSPs & applications?
Acknowledgments
- Globus R&D is joint with numerous people
  – Carl Kesselman, Co-PI; Steve Tuecke, principal architect at ANL; others acknowledged below
- GriPhyN R&D is joint with numerous people
  – Paul Avery, Co-PI; Mike Wilde, project coordinator; Carl Kesselman, Miron Livny, CS leads; numerous others
- PPDG R&D is joint with numerous people
  – Richard Mount, Harvey Newman, Miron Livny, Co-PIs; Ruth Pordes, project coordinator
- Numerous other project partners
- Support: DOE, DARPA, NSF, NASA, Microsoft
Summary
- "Grids": resource sharing & problem solving in dynamic virtual organizations
  – Many projects now working to develop, deploy, apply relevant technologies
- Common protocols and services are critical
  – Globus Toolkit a source of protocol and API definitions, reference implementations
- Rapid progress on definition, implementation, and application of the Data Grid architecture
- First indications of industrial adoption