grid and e-science technologies

Grid and e-Science Technologies: Simon Cox, Technical Director, Southampton Regional e-Science Centre


Post on 12-Jan-2016




Page 1: Grid and e-Science Technologies

Grid and e-Science Technologies

Simon Cox, Technical Director

Southampton Regional e-Science Centre


Summary
- The Grid problem: resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
- Grid architecture: protocol and service definitions for interoperability & resource sharing
- Grid middleware
  - Globus Toolkit: a source of protocol and API definitions, and of reference implementations
  - Open Grid Services Architecture: the next step in the evolution of Globus
  - Condor: high-throughput computing
  - Web Services & W3C: leveraging e-business
- e-Science projects applying Grid concepts to applications


Grid Computing


The Grid Problem

“Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources”

- “The Anatomy of the Grid: Enabling Scalable Virtual Organizations” by Foster, Kesselman and Tuecke

Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals, assuming the absence of:
- central location
- central control
- omniscience
- existing trust


Why Grids? (1) e-Science
- A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour
- 1,000 physicists worldwide pool resources for peta-op analyses of petabytes of data
- Civil engineers collaborate to design, execute, & analyze shake table experiments
- Climate scientists visualize, annotate, & analyze terabyte simulation datasets
- An emergency response team couples real-time data, weather models, and population data


Grid Communities & Applications: Data Grids for High Energy Physics

[Figure: tiered data grid for high energy physics. The online system receives ~PBytes/sec and passes ~100 MBytes/sec to an offline processor farm (~20 TIPS), which feeds the CERN Computer Centre (Tier 0) at ~100 MBytes/sec. Tier 1: FermiLab (~4 TIPS) and the France, Italy and Germany Regional Centres, reached at ~622 Mbits/sec or by air freight (deprecated). Tier 2: Tier 2 centres and Caltech (~1 TIPS each), linked at ~622 Mbits/sec. Institutes (~0.25 TIPS each) keep a physics data cache. Tier 4: physicist workstations, served at ~1 MBytes/sec.]

- There is a “bunch crossing” every 25 nsecs.
- There are 100 “triggers” per second.
- Each triggered event is ~1 MByte in size.
- Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.
- 1 TIPS is approximately 25,000 SpecInt95 equivalents.

www.griphyn.org www.ppdg.net www.eu-datagrid.org
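The rates on this slide are consistent with one another, and a quick back-of-envelope check in Python shows this. The sketch uses only the slide's figures plus one labelled assumption: about 10^7 seconds of detector running per year, a common rule of thumb that the slide does not state.

```python
# Back-of-envelope check of the HEP data-grid figures quoted above.
CROSSING_INTERVAL_NS = 25        # one "bunch crossing" every 25 ns
TRIGGERS_PER_S = 100             # triggered (recorded) events per second
EVENT_BYTES = 1_000_000          # ~1 MByte per triggered event
RUN_SECONDS_PER_YEAR = 10**7     # ASSUMPTION: rule-of-thumb running time per year
BYTES_PER_MBYTE = 1_000_000

crossings_per_s = 1_000_000_000 // CROSSING_INTERVAL_NS       # 40,000,000 (40 MHz)
crossings_per_recorded_event = crossings_per_s // TRIGGERS_PER_S  # 400,000
recorded_bytes_per_s = TRIGGERS_PER_S * EVENT_BYTES           # ~100 MBytes/sec, as on the farm link
bytes_per_year = recorded_bytes_per_s * RUN_SECONDS_PER_YEAR  # 10**15 bytes, about 1 PByte

# A ~622 Mbits/sec wide-area link expressed in MBytes/sec (8 bits per byte).
link_mbytes_per_s = 622 / 8                                    # 77.75

# 1 TIPS ~ 25,000 SpecInt95, so the ~20 TIPS offline farm is ~500,000 SpecInt95.
farm_specint95 = 20 * 25_000

print(crossings_per_s, crossings_per_recorded_event,
      recorded_bytes_per_s // BYTES_PER_MBYTE, bytes_per_year,
      link_mbytes_per_s, farm_specint95)
```

Integer arithmetic is used wherever possible so that the checks are exact rather than subject to floating-point rounding.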


Network for Earthquake Engineering Simulation

NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other

On-demand access to experiments, data streams, computing, archives, collaboration

NEESgrid: Argonne, Michigan, NCSA, UIUC, USC


Online Access to Scientific Instruments

DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago

[Figure: Advanced Photon Source with real-time collection, tomographic reconstruction, archival storage, and wide-area dissemination to desktop & VR clients with shared controls]


Why Grids? (2) e-Business
- Engineers at a multinational company collaborate on the design of a new product
- A multidisciplinary analysis in aerospace couples code and data in four companies
- An insurance company mines data from partner hospitals for fraud detection
- An application service provider offloads excess load to a compute cycle provider
- An enterprise configures internal & external resources to support e-Business workload


Grids: Why Now?

- Moore’s law → highly functional end-systems
- Ubiquitous Internet → universal connectivity
- Network exponentials produce dramatic changes in geometry and geography
  - 9-month doubling: double Moore’s law!
  - 1986-2001: x340,000; 2001-2010: x4000?
- New modes of working and problem solving emphasize teamwork, computation
- New business models and technologies facilitate outsourcing
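The growth factors quoted for network capacity can be turned back into doubling times, which is a quick way to check them against the "9-month doubling" claim. The comparison with Moore's law assumes an 18-month doubling period. That is an assumption consistent with "double Moore's law" but is not stated on the slide.

```python
import math

def implied_doubling_months(growth_factor: float, months: float) -> float:
    """Doubling time implied by a total growth factor achieved over `months`."""
    return months / math.log2(growth_factor)

# Growth figures quoted on the slide.
past = implied_doubling_months(340_000, (2001 - 1986) * 12)   # about 9.8 months
future = implied_doubling_months(4_000, (2010 - 2001) * 12)   # about 9.0 months

# ASSUMPTION: Moore's law taken as an 18-month doubling, for comparison.
moore_factor_2001_2010 = 2 ** ((2010 - 2001) * 12 / 18)       # 2**6

print(f"1986-2001: doubling every {past:.1f} months")
print(f"2001-2010: doubling every {future:.1f} months")
print(f"Moore's law over 2001-2010: x{moore_factor_2001_2010:.0f}")
```

Both quoted factors imply a doubling time close to nine months. Over the same nine years, an 18-month doubling would give only a 64-fold increase.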


Elements of the Problem
- Resource sharing
  - Computers, storage, sensors, networks, …
  - Heterogeneity of device, mechanism, policy
  - Sharing conditional: negotiation, payment, …
- Coordinated problem solving
  - Integration of distributed resources
  - Compound quality of service requirements
- Dynamic, multi-institutional virtual organisations
  - Dynamic overlays on classic org structures
  - Map to underlying control mechanisms

http://www.globus.org/research/papers/anatomy.pdf


The Grid World: Current Status
- Dozens of major Grid projects in scientific & technical computing/research & education
  - Deployment, application, technology
- Some consensus on key concepts and technologies
  - The open source Globus Toolkit™ is a de facto standard for major protocols & services
  - Far from complete or perfect, but out there, evolving rapidly, and with a large tool/user base
- Global Grid Forum is a significant force
- Industrial interest is emerging rapidly

http://www.gridforum.org


Grid Middleware (coordinate and authenticate use of grid services)
- Globus (and GGF grid-computing protocols)
  - Security Infrastructure (GSI)
  - Resource Allocation Mechanism (GRAM)
  - Resource Information Service (GRIS)
  - Index Information Service (GIIS)
  - GridFTP
  - Metadirectory service (MDS 2.0+) coupled to an LDAP server
- Condor (distributed high-throughput computing system)
  - Condor-G allows us to handle dispatching jobs to our Globus system
  - Active collaboration with the Condor development team at University of Wisconsin (Miron Livny)


The Globus Project: Making Grid computing a reality
- Close collaboration with real Grid projects in science and industry
- Development and promotion of standard Grid protocols to enable interoperability and shared infrastructure
- Development and promotion of standard Grid software APIs and SDKs to enable portability and code sharing
- The Globus Toolkit: open source, reference software base for building grid infrastructure and applications
- Global Grid Forum: development of standard protocols and APIs for Grid computing

http://www.gridforum.org
http://www.globus.org


Four Key Protocols

The Globus Toolkit centers around four key protocols:
- Connectivity layer
  - Security: Grid Security Infrastructure (GSI)
- Resource layer
  - Resource Management: Grid Resource Allocation and Management (GRAM)
  - Information Services: Grid Resource Information Protocol (GRIP)
  - Data Transfer: Grid File Transfer Protocol (GridFTP)


The Globus Toolkit in One Slide

[Figure: a user authenticates and creates a proxy credential (GSI: Grid Security Infrastructure). User process #1, holding the proxy, makes a reliable remote invocation to a Gatekeeper (factory) (GRAM: Grid Resource Allocation & Management). The Gatekeeper creates user process #2 with proxy #2 and registers it with a Reporter (registry + discovery). The Reporter performs soft-state registration with, and answers enquiries via, the GIIS: Grid Information Index Server (discovery), part of MDS-2 (Meta Directory Service). Proxies also make other GSI-authenticated remote service requests, e.g. to GridFTP.]

- Grid protocols (GSI, GRAM, …) enable resource sharing within virtual orgs; the toolkit provides a reference implementation (= Globus Toolkit services)
- Protocols (and APIs) enable other tools and services for membership, discovery, data mgmt, workflow, …
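"Soft state registration" means that a registration is valid only for a limited time and must be refreshed. Entries for resources that crash or leave therefore disappear without any explicit deregistration. The following is a minimal Python sketch of that idea. The class and method names are hypothetical and are not the MDS-2/GIIS interface. A fake clock keeps the example deterministic.

```python
import time
from typing import Callable, Dict, List

class SoftStateRegistry:
    """Toy index service: each registration expires unless it is refreshed.

    HYPOTHETICAL sketch of the soft-state idea; not the MDS-2/GIIS interface.
    """

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._expiry: Dict[str, float] = {}   # registrant name -> expiry time

    def register(self, name: str) -> None:
        # A repeat registration is a refresh: it just pushes the expiry forward.
        self._expiry[name] = self.clock() + self.ttl

    def enquire(self) -> List[str]:
        # Forget registrants that stopped refreshing, then answer the enquiry.
        now = self.clock()
        self._expiry = {n: t for n, t in self._expiry.items() if t > now}
        return sorted(self._expiry)

# Deterministic example with a fake clock instead of real time.
now = [0.0]
index = SoftStateRegistry(ttl_seconds=30, clock=lambda: now[0])
index.register("reporter-a")
index.register("reporter-b")
now[0] = 20
index.register("reporter-a")      # reporter-a refreshes; reporter-b has gone silent
now[0] = 40
print(index.enquire())            # ['reporter-a']
```

Under this design, failure handling needs no extra protocol: silence is treated as departure once the time-to-live runs out.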


Globus Toolkit: Evaluation (+)
- Good technical solutions for key problems, e.g.
  - Authentication and authorization
  - Resource discovery and monitoring
  - Reliable remote service invocation
  - High-performance remote data access
- This + good engineering is enabling progress
  - Good quality reference implementation, multi-language support, interfaces to many systems, large user base, industrial support
- Growing community code base built on tools


Globus Toolkit: Evaluation (-)
- Protocol deficiencies, e.g.
  - Heterogeneous basis: HTTP, LDAP, FTP
  - No standard means of invocation, notification, error propagation, authorization, termination, …
- Significant missing functionality, e.g.
  - Databases, sensors, instruments, workflow, …
  - Virtualization of end systems (hosting envs.)
- Little work on total system properties, e.g.
  - Dependability, end-to-end QoS, …
  - Reasoning about system properties


http://www.cs.wisc.edu/condor

The Condor Project (established ’85)

Distributed High Throughput Computing research performed by a team of ~25 faculty, full-time staff and students who:
- face software engineering challenges in a distributed UNIX/Linux/NT environment,
- are involved in national and international collaborations,
- actively interact with academic and commercial users,
- maintain and support a large distributed production environment,
- and educate and train students.

Funding: US Govt. (DoD, DoE, NASA, NSF), AT&T, IBM, INTEL, Microsoft

UW-Madison


What is Condor?
- Condor converts collections of distributively owned workstations and dedicated clusters into a distributed high-throughput computing facility.
- Condor uses ClassAd Matchmaking to make sure that everyone is happy.

Features
- Unix and NT
- Operational since 1986
- Manages more than 1300 CPUs at UW-Madison
- Software available free on the web
- More than 150 Condor installations worldwide in academia and industry
- Non-dedicated resources
- Job checkpoint and migration


What is High-Throughput Computing?
- High-performance: CPU cycles/second under ideal circumstances.
  - “How fast can I run simulation X on this machine?”
- High-throughput: CPU cycles/day (week, month, year?) under non-ideal circumstances.
  - “How many times can I run simulation X in the next month using all available machines?”
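Throughput is measured as CPU time delivered over a long wall-clock period, not as peak speed. The NUG30 run described in this talk makes a good worked example: 3.46E8 CPU seconds in 7 days, with a peak of 1009 processors. The sketch below simply divides delivered CPU time by elapsed time. The 365-day year used for the CPU-years figure is a convention chosen here.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY   # convention: 365-day year

def average_busy_cpus(cpu_seconds: float, wall_days: float) -> float:
    """Throughput view: delivered CPU time divided by elapsed wall-clock time."""
    return cpu_seconds / (wall_days * SECONDS_PER_DAY)

# NUG30 figures quoted in this talk: 3.46E8 CPU seconds in 7 days, peak 1009 processors.
CPU_SECONDS = 3.46e8
WALL_DAYS = 7
PEAK_PROCESSORS = 1009

avg_cpus = average_busy_cpus(CPU_SECONDS, WALL_DAYS)   # about 572 CPUs on average
fraction_of_peak = avg_cpus / PEAK_PROCESSORS          # about 0.57
cpu_years = CPU_SECONDS / SECONDS_PER_YEAR             # about 11 CPU-years in one week

print(f"average {avg_cpus:.0f} CPUs, {fraction_of_peak:.0%} of peak, {cpu_years:.1f} CPU-years")
```

The average sits well below the peak processor count. This is consistent with the "non-ideal circumstances" that high-throughput computing is designed to tolerate.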


Some HTC Challenges

Condor does whatever it takes to run your jobs, even if some machines…
- Crash (or are disconnected)
- Run out of disk space
- Don’t have your software installed
- Are frequently needed by others
- Are far away & managed by someone else


What is ClassAd Matchmaking?

Condor uses ClassAd Matchmaking to make sure that work gets done within the constraints of both users and owners.
- Users (jobs) have constraints:
  - “I need an Alpha with 256 MB RAM”
- Owners (machines) have constraints:
  - “Only run jobs when I am away from my desk and never run jobs owned by Bob.”
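The two-sided nature of matchmaking can be shown in a few lines of Python. A match requires the job's constraints on the machine *and* the owner's constraints on the job to both hold. This is a hypothetical model only. In Condor, ads are written in the ClassAd expression language and matched by the central manager. Here plain Python predicates stand in for that. The attribute names, the "ALPHA" value and the 15-minute "away from my desk" threshold are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Attrs = Dict[str, Any]

@dataclass
class Ad:
    """A toy advertisement: own attributes plus a requirement on the other party."""
    attrs: Attrs
    requirements: Callable[[Attrs, Attrs], bool]   # (my_attrs, other_attrs) -> bool

def matches(job: Ad, machine: Ad) -> bool:
    # Two-sided: the job's constraints AND the owner's constraints must both hold.
    return (job.requirements(job.attrs, machine.attrs)
            and machine.requirements(machine.attrs, job.attrs))

def matchmake(jobs: List[Ad], machines: List[Ad]) -> List[Tuple[str, str]]:
    """Greedily pair each job with the first still-free machine that matches."""
    free = list(machines)
    pairs: List[Tuple[str, str]] = []
    for job in jobs:
        for machine in free:
            if matches(job, machine):
                pairs.append((job.attrs["Name"], machine.attrs["Name"]))
                free.remove(machine)
                break   # stop scanning once this job is placed
    return pairs

# User constraint: "I need an Alpha with 256 MB RAM".
def needs_alpha_256(me: Attrs, other: Attrs) -> bool:
    return other["Arch"] == "ALPHA" and other["Memory"] >= 256

# Owner constraint: run only when away (ASSUMED: 15 idle minutes), never for Bob.
def owner_policy(me: Attrs, other: Attrs) -> bool:
    return me["KeyboardIdle"] > 15 * 60 and other["Owner"] != "bob"

jobs = [Ad({"Name": "job-bob", "Owner": "bob"}, needs_alpha_256),
        Ad({"Name": "job-alice", "Owner": "alice"}, needs_alpha_256)]
machines = [Ad({"Name": "desk1", "Arch": "ALPHA", "Memory": 512, "KeyboardIdle": 1800}, owner_policy),
            Ad({"Name": "desk2", "Arch": "ALPHA", "Memory": 512, "KeyboardIdle": 60}, owner_policy),
            Ad({"Name": "desk3", "Arch": "INTEL", "Memory": 1024, "KeyboardIdle": 3600}, owner_policy)]

print(matchmake(jobs, machines))   # [('job-alice', 'desk1')]
```

Bob's job is refused by desk1 even though that machine satisfies Bob's own requirements. desk2 is skipped because its owner is at the desk, and desk3 because it is not an Alpha.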


Condor Pool Architecture


Mathematicians Solve NUG30
- Looking for the solution to the NUG30 quadratic assignment problem
- An informal collaboration of mathematicians and computer scientists
- Condor-G delivered 3.46E8 CPU seconds in 7 days (peak 1009 processors) in U.S. and Italy (8 sites)

14,5,28,24,1,3,16,15,10,9,21,2,4,29,25,22,13,26,17,30,6,20,19,8,18,7,27,12,11,23

MetaNEOS: Argonne, Iowa, Northwestern, Wisconsin
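NUG30 is an instance of the quadratic assignment problem (QAP). In the Koopmans-Beckmann form, n facilities are assigned to n locations so as to minimise the sum over facility pairs of flow times distance. The sketch below shows the objective and an exhaustive search on a *hypothetical* 3x3 instance. The actual NUG30 matrices (published in QAPLIB) are not reproduced here. With 30! (about 2.65e32) possible assignments, exact methods for NUG30 must prune the search rather than enumerate it. The sketch also checks that the solution shown on the slide is a permutation of 1..30.

```python
from itertools import permutations
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[int]]

def qap_cost(flow: Matrix, dist: Matrix, perm: Sequence[int]) -> int:
    """Koopmans-Beckmann QAP objective: facility i is placed at location perm[i]."""
    n = len(perm)
    return sum(flow[i][j] * dist[perm[i]][perm[j]] for i in range(n) for j in range(n))

def brute_force_qap(flow: Matrix, dist: Matrix) -> Tuple[int, Tuple[int, ...]]:
    """Exhaustive search over all n! assignments; only feasible for tiny n."""
    n = len(flow)
    return min((qap_cost(flow, dist, p), p) for p in permutations(range(n)))

# HYPOTHETICAL 3x3 instance (not NUG30 data).
flow = [[0, 5, 2],     # flow between facilities
        [5, 0, 3],
        [2, 3, 0]]
dist = [[0, 1, 4],     # distance between locations
        [1, 0, 2],
        [4, 2, 0]]
print(brute_force_qap(flow, dist))   # (38, (0, 1, 2))

# The NUG30 solution quoted above, as written on the slide (1-based).
nug30_solution = [14, 5, 28, 24, 1, 3, 16, 15, 10, 9, 21, 2, 4, 29, 25,
                  22, 13, 26, 17, 30, 6, 20, 19, 8, 18, 7, 27, 12, 11, 23]
assert sorted(nug30_solution) == list(range(1, 31))   # a valid permutation of 1..30
```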


What Is Condor-G?
- Enhanced version of Condor that provides robust job management for the Globus Toolkit
  - Robust replacement for globusrun
  - Provides extensive fault-tolerance
  - Brings Condor’s job management features to Globus jobs
- Two parts
  - Globus Universe
  - GlideIn
- Excellent example of applying the general purpose Globus Toolkit to solve a particular problem (i.e. high-throughput computing) on the Grid


Why Use Condor-G
- Condor: designed to run jobs within a single administrative domain
- Globus Toolkit: designed to run jobs across many administrative domains
- Condor-G: combines the strengths of both


Web Services
- Increasingly popular standards-based framework for accessing network applications
  - W3C standardization; Microsoft, IBM, Sun, others
- XML and XML Schema: representing data in a portable format
- WSDL (Web Services Description Language): interface definition language for Web services
- SOAP (Simple Object Access Protocol): XML-based RPC protocol; common WSDL target
- WS-Inspection: conventions for locating service descriptions
- UDDI (Universal Description, Discovery, & Integration): directory for Web services
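SOAP's role as an "XML-based RPC protocol" is easiest to see in the message itself. An operation name and its parameters are wrapped in an Envelope/Body pair. The Python sketch below builds and parses such a message with the standard library's `xml.etree.ElementTree`. The envelope namespace is the real SOAP 1.1 one. The service namespace `urn:example:jobservice` and the `getJobStatus` operation are hypothetical, and no WSDL or HTTP transport is shown.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:jobservice"                    # HYPOTHETICAL service namespace

ET.register_namespace("soap", SOAP_ENV)

def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.split("}", 1)[1] if tag.startswith("{") else tag

def build_request(operation: str, params: Dict[str, object]) -> bytes:
    """Wrap an RPC-style call in a SOAP Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")

def parse_request(message: bytes) -> Tuple[str, Dict[str, str]]:
    """Recover the operation name and its parameters from a request envelope."""
    envelope = ET.fromstring(message)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("missing or empty SOAP Body")
    call = body[0]
    return local_name(call.tag), {local_name(c.tag): c.text or "" for c in call}

msg = build_request("getJobStatus", {"jobId": 42})
print(msg.decode("utf-8"))
print(parse_request(msg))   # ('getJobStatus', {'jobId': '42'})
```

In practice a WSDL description would tell a client which operations and parameter types are valid. Here both sides simply agree on them by hand.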


XML Web Services Framework

[Figure: XML Web Services Framework, organised into Wire, Description and Discovery. Components shown: Syntax (XML); Structure (XML Schemas); Envelope & Extensibility (SOAP); Attachments; Service Description (WSDL); Process Orchestration (XLANG); Inspection (DISCO); Directory (UDDI); Security; Routing; Reliability. Legend: W3C Rec, W3C WG, Future.]


“New” Globus: Open Grid Services Architecture (OGSA)
- Service orientation to virtualize resources
- From Web services:
  - Standard interface definition mechanisms: multiple protocol bindings, multiple implementations, local/remote transparency
- Building on Globus Toolkit:
  - Grid service: semantics for service interactions
  - Management of transient instances (& state)
  - Factory, Registry, Discovery, other services
  - Reliable and secure transport
- Multiple hosting targets: J2EE, .NET, “C”, …

http://www.globus.org/research/papers/ogsa.pdf

http://www.globus.org/research/papers/gsspec.pdf


OGSA Service Model
- The system comprises (typically few) persistent services & (potentially many) transient services
- All services adhere to specified Grid service interfaces and behaviours
  - Reliable invocation, lifetime management, discovery, authorization, notification, upgradeability, concurrency, manageability
- Interfaces for managing Grid service instances
  - Factory, registry, discovery, lifetime, etc.

=> Reliable, secure management of distributed state
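The service model pairs a few persistent services (here a factory and a registry) with many transient instances. Each transient instance has a managed lifetime and is discoverable while it lives. The following is a minimal Python sketch of that pattern. All names, and the handle format `type/N`, are hypothetical, and this is not the OGSI/Globus Toolkit 3 interface. A fake clock makes the example deterministic.

```python
import itertools
from typing import Callable, Dict, List

class Registry:
    """Persistent service: maps a service type to handles of live instances."""

    def __init__(self) -> None:
        self._by_type: Dict[str, List[str]] = {}

    def register(self, service_type: str, handle: str) -> None:
        self._by_type.setdefault(service_type, []).append(handle)

    def unregister(self, service_type: str, handle: str) -> None:
        self._by_type[service_type].remove(handle)

    def discover(self, service_type: str) -> List[str]:
        return list(self._by_type.get(service_type, []))

class Factory:
    """Persistent service that creates transient instances with a termination time."""

    def __init__(self, service_type: str, registry: Registry,
                 clock: Callable[[], float]) -> None:
        self.service_type = service_type
        self.registry = registry
        self.clock = clock
        self._ids = itertools.count(1)
        self.termination: Dict[str, float] = {}   # handle -> termination time

    def create(self, lifetime: float) -> str:
        handle = f"{self.service_type}/{next(self._ids)}"
        self.termination[handle] = self.clock() + lifetime
        self.registry.register(self.service_type, handle)   # make it discoverable
        return handle

    def destroy(self, handle: str) -> None:
        del self.termination[handle]
        self.registry.unregister(self.service_type, handle)

    def reap(self) -> None:
        # Lifetime management: destroy instances whose termination time has passed.
        now = self.clock()
        for handle, t in list(self.termination.items()):
            if t <= now:
                self.destroy(handle)

now = [0.0]                                     # fake clock
registry = Registry()
factory = Factory("transfer", registry, clock=lambda: now[0])
short_lived = factory.create(lifetime=10)       # "transfer/1"
long_lived = factory.create(lifetime=100)       # "transfer/2"
now[0] = 50
factory.reap()
print(registry.discover("transfer"))           # ['transfer/2']
```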


Using OGSA to Construct Grid Environments

[Figure, three panels. (a) Simple Hosting Environment: factories, a Registry Service, an H2R (handle-to-reference) Mapper and a set of services. (b) Virtual Hosting Environment: the same front-end factories, Registry Service, H2R Mapper and services, backed by several hosting environments, each with its own factory (F), registry (R), mapper (M) and services (S). (c) Compound Services: an E2E Factory, E2E Registry and E2E H2R Mapper exposing E2E services built on underlying environments F1 and F2, each with its own R, M and S.]

In each case, the Registry handle is effectively the unique name for the virtual organization.
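A registry handle can serve as a stable name for a virtual organization because of the H2R (handle-to-reference) mapper. A handle stays fixed, while the reference it resolves to can change when a service moves between hosting environments. The Python sketch below is hypothetical. Handles and references are plain strings, and the `urn:example:…` handle and `example.org` endpoints are made up for illustration.

```python
from typing import Dict

class HandleToReferenceMapper:
    """Maps a stable, long-lived service handle to its current reference.

    HYPOTHETICAL sketch: handles and references are plain strings here.
    """

    def __init__(self) -> None:
        self._refs: Dict[str, str] = {}

    def bind(self, handle: str, reference: str) -> None:
        # (Re)bind when an instance is created or moves to another hosting environment.
        self._refs[handle] = reference

    def resolve(self, handle: str) -> str:
        if handle not in self._refs:
            raise LookupError(f"unknown handle: {handle}")
        return self._refs[handle]

mapper = HandleToReferenceMapper()
vo_registry = "urn:example:vo-astro:registry"     # HYPOTHETICAL handle naming the VO
mapper.bind(vo_registry, "https://host-a.example.org/registry")
mapper.bind(vo_registry, "https://host-b.example.org/registry")  # registry moved hosts
print(mapper.resolve(vo_registry))   # https://host-b.example.org/registry
```

Clients keep only the handle, so moving the registry requires updating one binding rather than every client.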


Evolution of Globus
- Initial exploration (1996-1999; Globus 1.0)
  - Extensive application experiments; core protocols
- Data Grids (1999-??; Globus 2.0+)
  - Large-scale data management and analysis
- Open Grid Services Architecture (2001-??, Globus 3.0)
  - Integration with Web services, hosting environments, resource virtualization
  - Databases, higher-level services
- Radically scalable systems (2003-??)
  - Sensors, wireless, ubiquitous computing
