Grid Computing. Ian Foster, Mathematics and Computer Science Division, Argonne National Laboratory, and Department of Computer Science, The University of Chicago (http://www.mcs.anl.gov/~foster). Seminar at Fermilab, October 31st, 2001.


Page 1: Grid Computing

Grid Computing
Ian Foster
Mathematics and Computer Science Division, Argonne National Laboratory
and Department of Computer Science, The University of Chicago
http://www.mcs.anl.gov/~foster
Seminar at Fermilab, October 31st, 2001

Page 2: Grid Computing


Grid Computing

Page 3: Grid Computing


Issues I Will Address
Grids in a nutshell
– Problem statement
– Major Grid projects
Grid architecture
– Globus Project™ and Toolkit™
HENP data grid projects
– GriPhyN, PPDG, iVDGL

Page 4: Grid Computing


The Grid Problem
Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations

Page 5: Grid Computing


Grid Communities & Applications: Data Grids for High Energy Physics

[Diagram: the tiered LHC data grid. The Online System feeds the CERN Computer Centre (Tier 0, Offline Processor Farm, ~20 TIPS) at ~100 MBytes/sec, with ~PBytes/sec coming off the detector. Tier 1 regional centres (FermiLab ~4 TIPS; France, Italy, and Germany regional centres) connect at ~622 Mbits/sec (or Air Freight, deprecated). Tier 2 centres (~1 TIPS each, e.g. Caltech) serve institute servers (~0.25 TIPS, with a physics data cache), which feed physicist workstations at ~1 MBytes/sec (Tier 4).]

There is a “bunch crossing” every 25 nsecs.
There are 100 “triggers” per second.
Each triggered event is ~1 MByte in size.
Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.
1 TIPS is approximately 25,000 SpecInt95 equivalents.
Image courtesy Harvey Newman, Caltech
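
These numbers imply the aggregate rates labeled on the diagram; a quick back-of-the-envelope check in plain Python, using only the figures quoted above:

```python
# Back-of-the-envelope check of the rates quoted on this slide.
bunch_crossing_interval_s = 25e-9      # a "bunch crossing" every 25 nsecs
trigger_rate_hz = 100                  # 100 "triggers" per second
event_size_bytes = 1e6                 # each triggered event is ~1 MByte

crossings_per_second = 1 / bunch_crossing_interval_s        # ~4e7 (40 MHz)
triggered_rate = trigger_rate_hz * event_size_bytes         # bytes/sec after triggering

print(f"bunch crossings per second: {crossings_per_second:.0e}")
print(f"triggered data rate: {triggered_rate / 1e6:.0f} MBytes/sec")  # ~100 MBytes/sec,
                                                                      # matching the diagram label
```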

Page 6: Grid Computing


Grid Communities and Applications: Network for Earthquake Eng. Simulation

NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other

On-demand access to experiments, data streams, computing, archives, collaboration

NEESgrid: Argonne, Michigan, NCSA, UIUC, USC www.neesgrid.org

Page 7: Grid Computing


Access Grid
Collaborative work among large groups
~50 sites worldwide
Use Grid services for discovery, security
www.scglobal.org

[Photo: an Access Grid node, showing an ambient mic (tabletop), presenter mic, presenter camera, and audience camera.]

Access Grid: Argonne, others (www.accessgrid.org)

Page 8: Grid Computing


Why Grids?
A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
1,000 physicists worldwide pool resources for petaop analyses of petabytes of data
Civil engineers collaborate to design, execute, & analyze shake table experiments
Climate scientists visualize, annotate, & analyze terabyte simulation datasets
An emergency response team couples real-time data, weather model, population data

Page 9: Grid Computing


Why Grids? (contd)
A multidisciplinary analysis in aerospace couples code and data in four companies
A home user invokes architectural design functions at an application service provider
An application service provider purchases cycles from compute cycle providers
Scientists working for a multinational soap company design a new product
A community group pools members’ PCs to analyze alternative designs for a local road

Page 10: Grid Computing


Elements of the Problem
Resource sharing
– Computers, storage, sensors, networks, …
– Sharing always conditional: issues of trust, policy, negotiation, payment, …
Coordinated problem solving
– Beyond client-server: distributed data analysis, computation, collaboration, …
Dynamic, multi-institutional virtual orgs
– Community overlays on classic org structures
– Large or small, static or dynamic

Page 11: Grid Computing


A Little History
Early 90s
– Gigabit testbeds, metacomputing
Mid to late 90s
– Early experiments (e.g., I-WAY), software projects (e.g., Globus), application experiments
2001
– Major application communities emerging
– Major infrastructure deployments are underway
– Rich technology base has been constructed
– Global Grid Forum: >1000 people on mailing lists, 192 orgs at last meeting, 28 countries

Page 12: Grid Computing


The Grid World: Current Status
Dozens of major Grid projects in scientific & technical computing/research & education
– Deployment, application, technology
Considerable consensus on key concepts and technologies
– Membership, security, resource discovery, resource management, …
Global Grid Forum has emerged as a significant force
– And first “Grid” proposals at IETF

Page 13: Grid Computing


Selected Major Grid Projects (name; URL & sponsors; focus)

Access Grid (www.mcs.anl.gov/FL/accessgrid; DOE, NSF): Create & deploy group collaboration systems using commodity technologies
BlueGrid (IBM): Grid testbed linking IBM laboratories
DISCOM (www.cs.sandia.gov/discom; DOE Defense Programs): Create operational Grid providing access to resources at three U.S. DOE weapons laboratories
DOE Science Grid (sciencegrid.org; DOE Office of Science): Create operational Grid providing access to resources & applications at U.S. DOE science laboratories & partner universities
Earth System Grid (ESG) (earthsystemgrid.org; DOE Office of Science): Delivery and analysis of large climate model datasets for the climate research community
European Union (EU) DataGrid (eu-datagrid.org; European Union): Create & apply an operational grid for applications in high energy physics, environmental science, bioinformatics


Page 14: Grid Computing


Selected Major Grid Projects (continued)

EuroGrid, Grid Interoperability (GRIP) (eurogrid.org; European Union): Create technologies for remote access to supercomputer resources & simulation codes; in GRIP, integrate with Globus
Fusion Collaboratory (fusiongrid.org; DOE Off. Science): Create a national computational collaboratory for fusion research
Globus Project (globus.org; DARPA, DOE, NSF, NASA, Msoft): Research on Grid technologies; development and support of Globus Toolkit; application and deployment
GridLab (gridlab.org; European Union): Grid technologies and applications
GridPP (gridpp.ac.uk; U.K. eScience): Create & apply an operational grid within the U.K. for particle physics research
Grid Research Integration Dev. & Support Center (grids-center.org; NSF): Integration, deployment, support of the NSF Middleware Infrastructure for research & education


Page 15: Grid Computing


Selected Major Grid Projects (continued)

Grid Application Dev. Software (hipersoft.rice.edu/grads; NSF): Research into program development technologies for Grid applications
Grid Physics Network (griphyn.org; NSF): Technology R&D for data analysis in physics expts: ATLAS, CMS, LIGO, SDSS
Information Power Grid (ipg.nasa.gov; NASA): Create and apply a production Grid for aerosciences and other NASA missions
International Virtual Data Grid Laboratory (ivdgl.org; NSF): Create international Data Grid to enable large-scale experimentation on Grid technologies & applications
Network for Earthquake Eng. Simulation Grid (neesgrid.org; NSF): Create and apply a production Grid for earthquake engineering
Particle Physics Data Grid (ppdg.net; DOE Science): Create and apply production Grids for data analysis in high energy and nuclear physics experiments


Page 16: Grid Computing


Selected Major Grid Projects (continued)

TeraGrid (teragrid.org; NSF): U.S. science infrastructure linking four major resource sites at 40 Gb/s
UK Grid Support Center (grid-support.ac.uk; U.K. eScience): Support center for Grid projects within the U.K.
Unicore (BMBFT): Technologies for remote access to supercomputers


Also many technology R&D projects: e.g., Condor, NetSolve, Ninf, NWS

See also www.gridforum.org

Page 17: Grid Computing


Grid Architecture & Globus Toolkit™
The question:
– What is needed for resource sharing & coordinated problem solving in dynamic virtual organizations (VOs)?
The answer:
– Major issues identified: membership, resource discovery & access, …
– Grid architecture captures core elements, emphasizing the pre-eminent role of protocols
– Globus Toolkit™ has emerged as the de facto standard for major protocols & services

Page 18: Grid Computing


Layered Grid Architecture (By Analogy to Internet Architecture)

Fabric: “Controlling things locally”: access to, & control of, resources
Connectivity: “Talking to things”: communication (Internet protocols) & security
Resource: “Sharing single resources”: negotiating access, controlling use
Collective: “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services
Application

[Diagram: these Grid layers are drawn alongside the Internet Protocol Architecture (Link, Internet, Transport, Application) to emphasize the analogy.]

For more info: www.globus.org/research/papers/anatomy.pdf
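
As a quick reference for how the layers line up with the components named later in this talk, here is an illustrative mapping in Python (my own grouping of the protocols and services mentioned in these slides, not an official Globus taxonomy):

```python
# Illustrative mapping of grid layers to example protocols/services named in this talk.
GRID_LAYERS = {
    "Fabric":       ["computers", "storage systems", "networks", "sensors", "Condor pools"],
    "Connectivity": ["IP/DNS", "Grid Security Infrastructure (GSI)"],
    "Resource":     ["GRAM (resource management)", "GridFTP (data access)", "MDS/GRIP (information)"],
    "Collective":   ["Community Authorization Service", "index/brokering (GIIS)",
                     "replica management & selection", "co-allocation"],
    "Application":  ["data grids, Access Grid, NEESgrid, ..."],
}

for layer, examples in GRID_LAYERS.items():
    print(f"{layer:>12}: {', '.join(examples)}")
```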

Page 19: Grid Computing


Grid Services Architecture (1): Fabric Layer

Just what you would expect: the diverse mix of resources that may be shared
– Individual computers, Condor pools, file systems, archives, metadata catalogs, networks, sensors, etc., etc.
Few constraints on low-level technology: connectivity and resource level protocols form the “neck in the hourglass”
Globus Toolkit provides a few selected components (e.g., bandwidth broker)

Page 20: Grid Computing


Grid Services Architecture (2): Connectivity Layer Protocols & Services

Communication
– Internet protocols: IP, DNS, routing, etc.
Security: Grid Security Infrastructure (GSI)
– Uniform authentication & authorization mechanisms in a multi-institutional setting
– Single sign-on, delegation, identity mapping
– Public key technology, SSL, X.509, GSS-API (several Internet drafts document extensions)
– Supporting infrastructure: Certificate Authorities, key management, etc.

Page 21: Grid Computing


GSI in Action: “Create Processes at A and B that Communicate & Access Files at C”

[Diagram: the user performs single sign-on via a “grid-id” and generates a proxy credential (or retrieves a proxy credential from an online repository), yielding a user proxy. Remote process creation requests, with mutual authentication, go to GSI-enabled GRAM servers at Site A (Kerberos) and Site B (Unix); each server authorizes the request, maps it to a local id, creates the process, and generates credentials, so each process holds a restricted proxy (plus a Kerberos ticket at Site A). The two processes communicate with mutual authentication, and a remote file access request, again mutually authenticated, goes to a GSI-enabled FTP server at Site C (Kerberos), which authorizes, maps to a local id, and accesses the file on the storage system.]
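
To make the delegation chain concrete, here is a minimal, purely illustrative sketch of the flow above, using toy data structures rather than the GSI implementation or API:

```python
# Toy model of GSI-style single sign-on and delegation (illustrative only; not GSI itself).
from dataclasses import dataclass, field

@dataclass
class Credential:
    subject: str                                        # grid identity ("grid-id")
    issuer: str                                         # who signed/delegated this credential
    restrictions: list = field(default_factory=list)    # policy limits on a restricted proxy

def single_sign_on(user_identity: str) -> Credential:
    """User signs on once and generates a short-lived user proxy credential."""
    return Credential(subject=user_identity, issuer=user_identity)

def delegate(proxy: Credential, restrictions: list) -> Credential:
    """A site-side process receives a restricted proxy derived from the user proxy."""
    return Credential(subject=proxy.subject, issuer=f"proxy of {proxy.subject}",
                      restrictions=restrictions)

def map_to_local_id(cred: Credential, gridmap: dict) -> str:
    """Each site authorizes the grid identity and maps it to a local account."""
    return gridmap[cred.subject]

# The scenario from the slide: processes at A and B, file access at C (names made up).
user_proxy = single_sign_on("/O=Grid/CN=Some Physicist")
proc_a = delegate(user_proxy, ["create process at A", "read files at C"])
proc_b = delegate(user_proxy, ["create process at B", "read files at C"])
site_c_gridmap = {"/O=Grid/CN=Some Physicist": "physuser"}
print(map_to_local_id(proc_a, site_c_gridmap), map_to_local_id(proc_b, site_c_gridmap))
```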

Page 22: Grid Computing


Grid Services Architecture (3): Resource Layer Protocols & Services

Resource management: GRAM
– Remote allocation, reservation, monitoring, control of [compute] resources
Data access: GridFTP
– High-performance data access & transport
Information: MDS (GRRP, GRIP)
– Access to structure & state information
Others emerging: catalog access, code repository access, accounting, …
All integrated with GSI

Page 23: Grid Computing


GRAM Resource Management Protocol

Grid Resource Allocation & Management
– Allocation, monitoring, control of computations
– Secure remote access to diverse schedulers
Current evolution
– Immediate and advance reservation
– Multiple resource types: manage anything
– Recoverable requests, timeout, etc.
– Evolve to Web Services
– Policy evaluation points for restricted proxies

Karl Czajkowski, Steve Tuecke, others
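
As a rough illustration of what a GRAM-style request might look like, the sketch below assembles an RSL-like resource specification and hands it to a submit function; the attribute names are modeled loosely on Globus RSL, and submit_job is an invented placeholder, not the actual GRAM client API:

```python
# Illustrative GRAM-style job submission (submit_job and the host name are hypothetical).

def make_rsl(executable: str, arguments: str = "", count: int = 1,
             max_wall_time_min: int = 60) -> str:
    """Build an RSL-like resource specification string (format modeled on Globus RSL)."""
    return (f"&(executable={executable})"
            f"(arguments={arguments})"
            f"(count={count})"
            f"(maxWallTime={max_wall_time_min})")

def submit_job(gatekeeper: str, rsl: str) -> str:
    """Hypothetical stand-in for a GRAM client: in reality this would authenticate with
    GSI, send the request to the remote scheduler, and return a job contact for
    monitoring and cancellation. Here it just echoes the request."""
    return f"submitted to {gatekeeper}: {rsl}"

rsl = make_rsl("/bin/hostname", count=4, max_wall_time_min=10)
print(submit_job("gatekeeper.example.org:2119", rsl))
```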

Page 24: Grid Computing


Data Access & Transfer

GridFTP: extended version of the popular FTP protocol for Grid data access and transfer
Secure, efficient, reliable, flexible, extensible, parallel, concurrent, e.g.:
– Third-party data transfers, partial file transfers
– Parallelism, striping (e.g., on PVFS)
– Reliable, recoverable data transfers
Reference implementations
– Existing clients and servers: wuftpd, ncftp
– Flexible, extensible libraries

Bill Allcock, Joe Bester, John Bresnahan, Steve Tuecke, others
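
To illustrate the parallel/partial-transfer idea (not the GridFTP protocol itself), the sketch below splits a file into byte ranges and copies the ranges concurrently, then reassembles them; a real GridFTP transfer would add GSI security, third-party control, and parallel TCP streams:

```python
# Sketch of parallel, partial-file transfer over byte ranges (not the GridFTP protocol).
import os, tempfile
from concurrent.futures import ThreadPoolExecutor

def read_range(path, offset, length):
    """Fetch one byte range; in GridFTP this would be one parallel data stream."""
    with open(path, "rb") as f:
        f.seek(offset)
        return offset, f.read(length)

def parallel_copy(src, dst, streams=4):
    size = os.path.getsize(src)
    chunk = (size + streams - 1) // streams
    ranges = [(i * chunk, min(chunk, size - i * chunk))
              for i in range(streams) if i * chunk < size]
    with ThreadPoolExecutor(max_workers=streams) as pool, open(dst, "wb") as out:
        out.truncate(size)
        for offset, data in pool.map(lambda r: read_range(src, *r), ranges):
            out.seek(offset)          # reassemble ranges at their original offsets
            out.write(data)

# Demo on a temporary file.
with tempfile.TemporaryDirectory() as d:
    src, dst = os.path.join(d, "src"), os.path.join(d, "dst")
    with open(src, "wb") as f:
        f.write(os.urandom(1_000_000))
    parallel_copy(src, dst)
    print(open(src, "rb").read() == open(dst, "rb").read())   # True
```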

Page 25: Grid Computing


Grid Services Architecture (4): Collective Layer Protocols & Services

Community membership & policy
– E.g., Community Authorization Service
Index/metadirectory/brokering services
– Custom views on community resource collections (e.g., GIIS, Condor Matchmaker)
Replica management and replica selection
– Optimize aggregate data access performance
Co-reservation and co-allocation services
– End-to-end performance
Etc., etc.
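
Replica selection, for example, can be as simple as ranking the available copies of a logical file by estimated transfer time; a minimal sketch with made-up replica locations and bandwidth estimates:

```python
# Minimal replica-selection sketch: pick the copy with the lowest estimated transfer time.
replicas = [   # hypothetical physical copies of one logical file
    {"url": "gsiftp://cern.example/logC1",    "bandwidth_mbps": 155, "size_gb": 2.0},
    {"url": "gsiftp://fnal.example/logC1",    "bandwidth_mbps": 622, "size_gb": 2.0},
    {"url": "gsiftp://caltech.example/logC1", "bandwidth_mbps": 45,  "size_gb": 2.0},
]

def est_transfer_seconds(r):
    return (r["size_gb"] * 8_000) / r["bandwidth_mbps"]   # GB -> megabits, divided by Mb/s

best = min(replicas, key=est_transfer_seconds)
print(best["url"], f"{est_transfer_seconds(best):.0f}s")
```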

Page 26: Grid Computing


The Grid Information Problem

Large numbers of distributed “sensors” with different properties

Need for different “views” of this information, depending on community membership, security constraints, intended purpose, sensor type

Page 27: Grid Computing


Globus Toolkit Solution: MDS-2

Registration & enquiry protocols, information models, query languages
– Provides standard interfaces to sensors
– Supports different “directory” structures supporting various discovery/access strategies

Karl Czajkowski, Steve Fitzgerald, others
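
The registration-and-enquiry pattern is easy to sketch: resources push self-descriptions to an index, and clients query the index with constraints. The toy, in-memory version below is conceptual only (MDS-2 itself defines the GRRP registration and GRIP enquiry protocols); the resource names and attributes are invented:

```python
# Toy registration & enquiry index (conceptual only; not the MDS protocols or schema).
class Index:
    def __init__(self):
        self.entries = {}

    def register(self, name: str, attributes: dict) -> None:
        """A resource ("sensor") periodically registers its current description."""
        self.entries[name] = attributes

    def query(self, **constraints):
        """Enquiry: return entries whose attributes match every constraint."""
        return [name for name, attrs in self.entries.items()
                if all(attrs.get(k) == v for k, v in constraints.items())]

giis = Index()
giis.register("cluster.anl.example",  {"arch": "ia32", "free_cpus": 64, "vo": "griphyn"})
giis.register("cluster.ucsd.example", {"arch": "ia64", "free_cpus": 8,  "vo": "ppdg"})
print(giis.query(vo="griphyn"))   # ['cluster.anl.example']
```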

Page 28: Grid Computing


Community Authorization (Prototype shown August 2001)

[Diagram: 1. The user sends a CAS request naming resources and operations. The CAS asks “does the collective policy authorize this request for this user?”, consulting user/group membership, resource/collective membership, and collective policy information. 2. The CAS replies with a capability and resource CA info. 3. The user sends the resource request to the resource, authenticated with the capability; the resource asks “is this request authorized for the CAS?” and “is this request authorized by the capability?”, consulting local policy information. 4. The resource replies.]

Laura Pearlman, Steve Tuecke, Von Welch, others
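
A minimal sketch of the two-stage check (community policy at the CAS, then local policy at the resource); all names and policies below are invented for illustration and do not reflect the prototype's actual interfaces:

```python
# Toy community-authorization flow (illustrative; not the CAS prototype's API).
community_policy = {   # collective policy: (group, resource) -> allowed operations
    ("cms-production", "storage.fnal.example"): {"read", "write"},
    ("cms-users",      "storage.fnal.example"): {"read"},
}
group_membership = {"alice": "cms-production", "bob": "cms-users"}

def cas_issue_capability(user, resource, operations):
    """Steps 1-2: the CAS checks the collective policy and, if allowed, issues a capability."""
    allowed = community_policy.get((group_membership.get(user), resource), set())
    if set(operations) <= allowed:
        return {"issuer": "cas.example", "user": user,
                "resource": resource, "operations": set(operations)}
    return None

def resource_authorize(capability, trusted_cas, operation):
    """Steps 3-4: the resource checks its local policy (which CAS it trusts) and the capability."""
    return (capability is not None
            and capability["issuer"] in trusted_cas
            and operation in capability["operations"])

cap = cas_issue_capability("alice", "storage.fnal.example", ["write"])
print(resource_authorize(cap, trusted_cas={"cas.example"}, operation="write"))   # True
```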

Page 29: Grid Computing


HENP Related Data Grid Projects

Funded (or about to be funded) projects
– PPDG I: USA, DOE, $2M, 1999-2001
– GriPhyN: USA, NSF, $11.9M + $1.6M, 2000-2005
– EU DataGrid: EU, EC, €10M, 2001-2004
– PPDG II: USA, DOE, $9.5M, 2001-2004
– iVDGL: USA, NSF, $13.7M + $2M, 2001-2006
– DataTAG: EU, EC, €4M, 2002-2004
Proposed projects
– GridPP: UK, PPARC, >$15M?, 2001-2004
Many national projects of interest to HENP
– Initiatives in US, UK, Italy, France, NL, Germany, Japan, …
– EU networking initiatives (Géant, SURFNet)
– US Distributed Terascale Facility ($53M, 12 TFL, 40 Gb/s net)

Page 30: Grid Computing


Background on Data Grid Projects

They support several disciplines
– GriPhyN: CS, HEP (LHC), gravity waves, digital astronomy
– PPDG: CS, HEP (LHC + current expts), Nuc. Phys., networking
– DataGrid: CS, HEP (LHC), earth sensing, biology, networking
They are already joint projects
– Multiple scientific communities
– High-performance scientific experiments
– International components and connections
They have common elements & foundations
– Interconnected management structures
– Sharing of people, projects (e.g., GDMP)
– Globus infrastructure
– Data Grid Reference Architecture
HENP Intergrid Coordination Board

Page 31: Grid Computing


Grid Physics Network (GriPhyN)

Enabling R&D for advanced data grid systems, focusing in particular on the Virtual Data concept

[Diagram: users (a production team, individual investigators, other users) work through interactive user tools with virtual data tools, request planning and scheduling tools, and request execution management tools. These are built on resource management services, security and policy services, and other Grid services, over distributed resources (code, storage, computers, and network), transforms, and raw data sources for the ATLAS, CMS, LIGO, and SDSS experiments.]

Page 32: Grid Computing


Virtual Data in Action

[Diagram: major archive facilities, network caches & regional centers, and local sites, with a “?” marking where a data request is resolved.]

A data request may: access local data; compute locally; compute remotely; access remote data
Scheduling & execution subject to local & global policies

Page 33: Grid Computing


GriPhyN Status, October 2001

Data Grid Reference Architecture defined
– v1: core services (Feb 2001)
– v2: request planning/mgmt, catalogs (RSN)
Progress on ATLAS, CMS, LIGO, SDSS
– Requirements statements developed
– Testbeds and experiments proceeding
Progress on technology
– DAGMAN request management
– Catalogs, security, policy
Virtual Data Toolkit v1.0 out soon

Page 34: Grid Computing


GriPhyN/PPDG Data Grid Architecture

[Diagram: an Application hands a DAG to a Planner, which hands a DAG to an Executor (DAGMAN, Kangaroo). These use Catalog Services (MCAT; GriPhyN catalogs), Info Services (MDS), Policy/Security (GSI, CAS), Monitoring (MDS), Replica Management (GDMP), and a Reliable Transfer Service, over Compute Resources (GRAM) and Storage Resources (GridFTP; GRAM; SRM); Globus provides the underlying infrastructure. A legend marks components for which an initial solution is operational.]

Ewa Deelman, Mike Wilde
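
The Planner/Executor split amounts to building a DAG of jobs and running them in dependency order; a tiny topological-execution sketch (illustrative only, not DAGMan, with a made-up reconstruction-style workflow):

```python
# Tiny DAG executor sketch: run jobs in dependency order (illustrative; not DAGMan).
from graphlib import TopologicalSorter

# job -> set of jobs it depends on (workflow names invented for illustration)
dag = {
    "stage_input":  set(),
    "simulate":     {"stage_input"},
    "reconstruct":  {"simulate"},
    "store_output": {"reconstruct"},
}

def run(job: str) -> None:
    print(f"submitting {job} (e.g., via GRAM to a remote scheduler)")

for job in TopologicalSorter(dag).static_order():
    run(job)
```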

Page 35: Grid Computing


Catalog Architecture

Transparency with respect to materialization:
– Derived Metadata Catalog: application-specific attributes map to derivation ids (e.g., i2, i10); updated upon materialization.
– Derived Data Catalog (Id, Trans, Param, Name): i1, F, X, F.X; i2, F, Y, F.Y; i10, G, Y P, G(P).Y.
– Transformation Catalog (Trans, Prog, Cost): F, URL:f, 10; G, URL:g, 20. Transformation names resolve to URLs for program location in program storage.

Transparency with respect to location:
– Metadata Catalog (Name, LObjN): X, logO1; Y, logO2; F.X, logO3; G(1).Y, logO4. Object names are resolved via GCMS.
– Replica Catalog (LCN, PFNs): logC1, URL1; logC2, URL2 URL3; logC3, URL4; logC4, URL5 URL6. Logical container names resolve to URLs for physical file location in physical file storage.
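
The point of these catalogs is that a request for a logical data product can be satisfied either by locating an existing replica or by (re)deriving the product from its transformation. A toy resolution function over simplified catalogs (contents invented, reusing the F/G examples above):

```python
# Toy virtual-data resolution against simplified catalogs (contents invented for illustration).
replica_catalog = {            # logical name -> physical file URLs
    "F.X": ["gsiftp://archive.example/data/FX"],
}
derived_data_catalog = {       # logical name -> (transformation, parameter)
    "F.X": ("F", "X"),
    "F.Y": ("F", "Y"),
}
transformation_catalog = {     # transformation -> (program URL, cost)
    "F": ("URL:f", 10),
    "G": ("URL:g", 20),
}

def resolve(name: str):
    """Return either existing replicas or a plan to materialize the derived data."""
    if name in replica_catalog:
        return ("fetch", replica_catalog[name])
    trans, param = derived_data_catalog[name]
    prog, cost = transformation_catalog[trans]
    return ("derive", {"program": prog, "parameter": param, "estimated_cost": cost})

print(resolve("F.X"))   # ('fetch', [...]) -- already materialized somewhere
print(resolve("F.Y"))   # ('derive', {...}) -- plan a computation, then register the new replica
```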

Page 36: Grid Computing

[Diagram: the SAM data handling system (D0) mapped onto the layered grid architecture.
– Client applications and request formulation: Web, Python and Java codes, command line, D0 Framework C++ codes; Request Formulator and Planner.
– Collective services: catalog protocols; Significant Event Logger, Naming Service, Database Manager, Catalog Manager; SAM Resource Management with batch systems (LSF, FBS, PBS, Condor), Data Mover, Job Services; Storage Manager, Job Manager, Cache Manager, Request Manager; SAM components such as “Dataset Editor”, “File Storage Server”, “Project Master”, “Station Master”, “Stager”, “Optimiser”.
– Connectivity and resource: CORBA, UDP; file transfer protocols (ftp, bbftp, rcp, GridFTP); mass storage system protocols (e.g., encp, HPSS); authentication and security via GSI, SAM-specific user/group/node/station registration, and a bbftp “cookie”.
– Fabric: tape storage elements, disk storage elements, compute elements, LANs and WANs, code repository, resource and services catalog, replica catalog, metadata catalog.
Names in “quotes” are SAM-given software component names; the figure marks components to be replaced or added/enhanced using PPDG and Grid tools.]

Page 37: Grid Computing


Early GriPhyN Challenge Problem: CMS Data Reconstruction

[Diagram: a master Condor job running at Caltech, driven from a Caltech workstation, coordinates a secondary Condor job on the Wisconsin (WI) pool, an NCSA Linux cluster, and NCSA UniTree, a GridFTP-enabled FTP server.]
2) Launch secondary job on WI pool; input files via Globus GASS
3) 100 Monte Carlo jobs on Wisconsin Condor pool
4) 100 data files transferred via GridFTP, ~1 GB each
5) Secondary reports complete to master
6) Master starts reconstruction jobs via Globus jobmanager on cluster
7) GridFTP fetches data from UniTree
8) Processed Objectivity database stored to UniTree
9) Reconstruction job reports complete to master

Scott Koranda, Miron Livny, others

Page 38: Grid Computing


Trace of a Condor-G Physics Run

[Plot: job counts (0-120) over the run for pre-/simulation/post jobs on UW Condor and for ooHits and ooDigis jobs at NCSA; a delay due to a script error is annotated.]

Page 39: Grid Computing


The 13.6 TF TeraGrid: Computing at 40 Gb/s

[Diagram: four sites, NCSA/PACI (8 TF, 240 TB), SDSC (4.1 TF, 225 TB), Caltech, and Argonne, each with site resources including HPSS (UniTree at Caltech), interconnected and linked to external networks.]

TeraGrid/DTF: NCSA, SDSC, Caltech, Argonne (www.teragrid.org)

Page 40: Grid Computing


International Virtual Data Grid Lab

[Map: Tier0/1, Tier2, and Tier3 facilities worldwide, connected by 10+ Gbps, 2.5 Gbps, 622 Mbps, and other links.]

Page 41: Grid Computing


Grids and Industry

Scientific/technical apps in industry
– “IntraGrid”: aerospace, pharmaceuticals, …
– “InterGrid”: multi-company teams
– Globus Toolkit provides a starting point
> Compaq, Cray, Entropia, Fujitsu, Hitachi, IBM, Microsoft, NEC, Platform, SGI, Sun all porting, etc.
Enable resource sharing outside S&TC
– “Grid Services”: extend Web Services with capabilities provided by Grid protocols
> Focus of IBM-Globus collaboration
– Future computing infrastructure based on Grid-enabled xSPs & applications?

Page 42: Grid Computing


Acknowledgments

Globus R&D is joint with numerous people
– Carl Kesselman, Co-PI; Steve Tuecke, principal architect at ANL; others acknowledged below
GriPhyN R&D is joint with numerous people
– Paul Avery, Co-PI; Mike Wilde, project coordinator; Carl Kesselman, Miron Livny, CS leads; numerous others
PPDG R&D is joint with numerous people
– Richard Mount, Harvey Newman, Miron Livny, Co-PIs; Ruth Pordes, project coordinator
Numerous other project partners
Support: DOE, DARPA, NSF, NASA, Microsoft

Page 43: Grid Computing


Summary

“Grids”: resource sharing & problem solving in dynamic virtual organizations
– Many projects now working to develop, deploy, apply relevant technologies
Common protocols and services are critical
– Globus Toolkit a source of protocol and API definitions, reference implementations
Rapid progress on definition, implementation, and application of Data Grid architecture
First indications of industrial adoption