
Page 1: An Introduction to Grid Computing BEAM Workshop December 2004 Mark Servilla servilla@LTERnet.edu LTER Network Office

An Introduction to Grid Computing

BEAM Workshop

December 2004

Mark Servilla

servilla@LTERnet.edu

LTER Network Office


SEEK-BEAM Workshop Dec 2004 2

Presentation Agenda

Definitions

Evolution of the Grid

Characteristics

Computing Model

Protocols

Examples

References


Definitions of a Grid

“… a network of conductors for distribution of electric power; also : a network of radio or television stations” – Merriam-Webster

“… the illusion of a simple yet large and powerful self-managing virtual computer out of a large collection of connected heterogeneous systems sharing various combinations of resources” – IBM Redbooks

“Grid Computing enables virtual organizations to share geographically distributed resources as they pursue common goals, assuming the absence of central location, central control, omniscience, and an existing trust relationship.” – Globus Alliance

“The Web provides us information — the grid allows us to process it.” - Ahmar Abbas


The Evolution of Grid Technology

High-Performance Computing

Cluster Computing

Peer-to-Peer Computing

Internet Computing


High-Performance Computing

Traditionally known as super-computing

Specialized for parallel processing algorithms

Shared equally among academia, research, and commercial sectors


Cluster Computing

Originated 1994 – Beowulf cluster, NASA

High-performance

Massively-parallel (2 to 1000 nodes)

Commodity hardware (Intel, AMD)

Low-cost software (Linux, FreeBSD)

Interconnected via high-speed private networks

Shared storage (SAN/NAS)

AMD Athlon cluster at University of Heidelberg, Germany – 825 Gflops, 35th fastest high-performance computer in the world


Peer-to-Peer Computing

Primarily used for distributed storage and file-sharing

Early models (rcp, scp, ftp)
  Restricted to LANs, or
  Limited to known peers

Internet-based models
  Centralized (Napster, Kazaa*)
  Decentralized (Gnutella)

*100,000,000 downloads by 2004; 2 million new downloads a week
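The difference between the centralized and decentralized models can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the peer names, the file name, and the functions `centralized_lookup` and `flooding_lookup` are hypothetical and do not reproduce the actual Napster or Gnutella protocols. The centralized model asks one index where a file lives; the decentralized model floods the query from neighbor to neighbor until a hop limit (TTL) is reached.

```python
from collections import deque


def centralized_lookup(index, filename):
    """Ask a single central index which peers hold the file."""
    return sorted(index.get(filename, set()))


def flooding_lookup(neighbors, files, start, filename, ttl=3):
    """Flood the query breadth-first; each hop uses up one unit of TTL."""
    hits = []
    seen = {start}
    queue = deque([(start, ttl)])
    while queue:
        peer, hops_left = queue.popleft()
        if filename in files.get(peer, set()):
            hits.append(peer)
        if hops_left == 0:
            continue  # query expires here and is not forwarded further
        for nxt in neighbors.get(peer, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops_left - 1))
    return sorted(hits)


# Hypothetical four-peer network connected in a line: a - b - c - d
NEIGHBORS = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
FILES = {"c": {"song.mp3"}, "d": {"song.mp3"}}
INDEX = {"song.mp3": {"c", "d"}}  # what a central server would store
```

With a small TTL the flooding search may miss copies that the central index would report, which is one trade-off of having no central authority.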


Centralized Peer-to-Peer



Decentralized Peer-to-Peer



Internet Computing

Volunteer or philanthropic computing; utilizes personal desktop computers connected to the Internet

Desktop computers idle approximately 95% of their lifespan

Divide-and-conquer approach
  Tasks broken into smaller subtasks
  Desktop executes subtasks during idle time
  Desktop sends data back to central server, which aggregates results
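The divide-and-conquer steps above can be sketched as follows. This is a minimal illustration, not how any particular volunteer-computing project works: the function names are hypothetical, the subtask (a sum of squares) is a stand-in for real work, and local threads stand in for idle desktops.

```python
from concurrent.futures import ThreadPoolExecutor


def split_task(numbers, chunk_size):
    """Break the task into smaller subtasks (contiguous chunks)."""
    return [numbers[i:i + chunk_size] for i in range(0, len(numbers), chunk_size)]


def run_subtask(chunk):
    """Work a desktop would do during idle time (placeholder: sum of squares)."""
    return sum(n * n for n in chunk)


def central_server(numbers, chunk_size=1000, desktops=4):
    """Hand out subtasks, collect the partial results, and aggregate them."""
    subtasks = split_task(numbers, chunk_size)
    with ThreadPoolExecutor(max_workers=desktops) as pool:
        partial_results = list(pool.map(run_subtask, subtasks))
    return sum(partial_results)
```

For example, `central_server(list(range(10)), chunk_size=3)` splits the ten numbers into four subtasks and returns the same total as a single sequential pass.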


Synthesis entrée Grid

High-performance computing
  pioneered the use of “parallel” algorithms

Cluster computing
  demonstrated the nature of shared computing and storage
  load balancing protocols

Peer-to-peer computing
  distributed storage resource with no central authority

Internet computing
  geographically distributed
  virtual organization
  fabric of the project vanishes with completion of the task


Grid Characteristics

Resources that
  are connected via a network
  are geographically distributed
  may consist of heterogeneous hardware and/or software
  are managed transparently for performance and fault tolerance

Creates the illusion of virtual organizations and projects without the presence of a central authority or central control

Explicit trust relationships between users and resources

A system that scales in space and time


Types of Resources

Computation
  utilization of computing cycles found on processors of the machines on the grid

Storage
  to increase capacity, performance, sharing, and reliability of data

Communication
  to increase capacity, performance, and reliability of data communication

Collaboration tools
  to facilitate collaboration through conferencing, visualization, and data sharing

Software and Licenses
  to share site-specific software and/or licenses

Special equipment, capacities, architectures, and policies
  printers, imaging, sensors, or other local specialty resources


Grid Ingredients


Grid Topologies

Departmental Grids
  localized to a specific group of people
  generally, same hardware and software
  designed for high throughput and high performance over a dedicated network

Enterprise Grids
  service to numerous groups within a single company or campus
  resource heterogeneity increases
  company-wide local area network

Extraprise Grids
  service to multiple companies, partners, and customers within a particular domain
  domain-based private network

Global Grids
  established over the public Internet


Resource-based Grids

Compute Grids
  desktop nodes
  server nodes
  high-performance computing clusters

Data Grids
  performance-based distributed storage
  replication for fault-tolerance

Collaboration Grids
  support for video-conferencing, visualization and data sharing

Utility Grids
  maintained and managed by a commercial service provider
  compute resources acquired on a per-need basis
  application resources that are purchased on a per-use or per-minute basis


Application Characteristics

Perfect Parallelism – computations run autonomously (Monte Carlo Simulations)

Data Parallelism – operations performed on data simultaneously (db searches)

Functional Parallelism – multiple operations are performed simultaneously

Optimized for parallel execution

Not capable of parallel computation
  Fibonacci Series (1, 1, 2, 3, 5, 8, 13, 21, …): F(k+2) = F(k+1) + F(k)
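A short sketch may help contrast the two extremes on this slide. It assumes a Monte Carlo estimate of pi as the "perfectly parallel" workload (the slide names Monte Carlo simulations but not this specific one), and the function names are hypothetical. Each Monte Carlo chunk depends only on its own seed, so chunks could run on separate nodes in any order; each Fibonacci term needs the two terms before it, so the loop cannot be split the same way.

```python
import random


def monte_carlo_hits(samples, seed):
    """Count random points in the unit square that fall inside the quarter circle."""
    rng = random.Random(seed)  # private generator: no state shared between chunks
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(chunks, samples_per_chunk):
    """Combine independent chunks; each one could run on a different node."""
    total_hits = sum(monte_carlo_hits(samples_per_chunk, seed) for seed in range(chunks))
    return 4.0 * total_hits / (chunks * samples_per_chunk)


def fibonacci(count):
    """Return the first `count` terms; every step depends on the previous two."""
    series = []
    a, b = 1, 1
    for _ in range(count):
        series.append(a)
        a, b = b, a + b  # F(k+2) = F(k+1) + F(k)
    return series
```

Here the chunks run one after another for simplicity; distributing them would not change the result, because the order of the chunks does not matter.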


Questions to Ask When Thinking Grid

Identity and Authentication: Is this user who he says he is? Is this program the right program?

Authorization and Policy: What can the user do on the grid? What can the application do on the grid? What resources are the user and/or application allowed to access?

Resource Discovery: Where are the resources?

Resource Characterization: What types of resources are available?

Resource Allocation: What policy is applied when assigning the resources? What is the actual process of assigning the resources? Who gets how much?

Resource Management: Which resource can be used at what time and for what purpose?

Accounting/Billing/Service Level Agreement (SLA): How much of the resources is being used? What is the rating schedule? What is the SLA?

Security: How do I make sure that this is done securely? How do we know if we have been compromised? What steps are taken once a security breach is detected?


A Grid Computing Model

(the Globus view)

Software stack consisting of
  Standards
  Protocols
  APIs and SDKs

Loosely based on the Internet model


A detailed view…

Fabric – protocols and interfaces to resources being shared

Connectivity – protocols for grid-specific network transactions (IP, DNS, WSDL); Security implementation (GSI)

Resource – protocols to initiate and control sharing of local resources (GRAM, GridFTP, GRIS)

Collective – protocols for system-wide deployment (versus local)

Application – protocols targeted at a specific application or class of applications


Grid Protocols

Grid Security Infrastructure (GSI)

Grid Resource Allocation and Management (GRAM)

Grid File Transfer Protocol (GridFTP)

Grid Information Services (GIS)


Grid Security Infrastructure

Extended from SSL/TLS and X.509 protocols

Utilizes PKI for Certificate Authority

Primary objective is “Authorization”
  Generates primary credential
  Generates temporary proxy credential

Certificate Authority
  Positively identify entities requesting certificates
  Issuing, removing, and archiving certificates
  Protecting the Certificate Authority server
  Maintaining a namespace of unique names for certificate owners
  Serve signed certificates to those needing to authenticate entities
  Logging activity


Public Key Infrastructure

User A (sender)

1. User A encrypts (signs) message with his private key
2. Obtains User B’s public key from CA
3. Encrypts message with B’s public key
4. Sends message

User B (receiver)

1. User B decrypts message with his private key
2. Obtains User A’s public key from CA
3. Decrypts A’s message with A’s public key
4. B knows message is from A

Diagram labels: “A” and “B” (each with a public and a private key); Certificate Authority (public keys; B’s public key; A’s public key); Authentication Credential
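The exchange between A and B can be traced with toy numbers. The sketch below uses textbook RSA with tiny primes purely for illustration; it is insecure, it is not what GSI or any real PKI library does internally, and all names (`make_keypair`, `send_from_a`, `receive_at_b`, the dictionary standing in for the Certificate Authority) are hypothetical. The message is a small integer that must be smaller than A's modulus (3233).

```python
def make_keypair(p, q, e):
    """Build a toy RSA key pair from two small primes (illustration only)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e; needs Python 3.8 or newer
    return (e, n), (d, n)  # (public key, private key)


def apply_key(key, value):
    """Raise value to the key's exponent modulo the key's modulus."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


A_PUBLIC, A_PRIVATE = make_keypair(61, 53, 17)  # modulus 3233
B_PUBLIC, B_PRIVATE = make_keypair(89, 97, 5)   # modulus 8633, larger than A's

# Stand-in for the Certificate Authority: it hands out public keys by name.
CERTIFICATE_AUTHORITY = {"A": A_PUBLIC, "B": B_PUBLIC}


def send_from_a(message):
    signed = apply_key(A_PRIVATE, message)   # 1. A applies his private key
    b_public = CERTIFICATE_AUTHORITY["B"]    # 2. obtain B's public key from CA
    return apply_key(b_public, signed)       # 3. encrypt with B's public key


def receive_at_b(ciphertext):
    signed = apply_key(B_PRIVATE, ciphertext)  # 1. B applies his private key
    a_public = CERTIFICATE_AUTHORITY["A"]      # 2. obtain A's public key from CA
    return apply_key(a_public, signed)         # 3. recover A's message
```

If `receive_at_b(send_from_a(m))` returns `m`, then only B could have removed the outer layer and only A's private key could have produced the inner one, which is why B can conclude the message came from A.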


Grid Resource Allocation and Management

Allows programs to be started on remote resources

Resource Specification Language (RSL)
  Resource requirements: machine type, number of nodes, memory, etc.
  Job configuration: directory, executable, arguments, environment

Communication protocols
  HTTP-based RPC (early protocol)
  Web services (WSDL, SOAP)

“create 5-10 instances of myprog, each on a machine with at least 64MB memory that is available to me for 4 hours, or 10 instances, on a machine with at least 32MB of memory”
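The quoted request can be read as two alternatives tried in order. The sketch below models that reading in Python; it is not RSL syntax and not the GRAM API, and the classes `Machine` and `Alternative` and the function `allocate` are hypothetical. It assumes the first alternative means one instance per machine and the second means all ten instances on a single machine, which is one plausible interpretation of the wording.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    memory_mb: int
    hours_available: float


@dataclass
class Alternative:
    min_count: int
    max_count: int
    min_memory_mb: int
    min_hours: float = 0.0
    one_per_machine: bool = True


def allocate(alternatives, machines):
    """Return {machine name: instance count} for the first satisfiable alternative."""
    for alt in alternatives:
        fits = [m.name for m in machines
                if m.memory_mb >= alt.min_memory_mb
                and m.hours_available >= alt.min_hours]
        if alt.one_per_machine and len(fits) >= alt.min_count:
            return {name: 1 for name in fits[:alt.max_count]}
        if not alt.one_per_machine and fits:
            return {fits[0]: alt.max_count}
    return {}  # nothing on the grid satisfies the request


# The quoted request, under the assumptions stated above.
REQUEST = [
    Alternative(min_count=5, max_count=10, min_memory_mb=64, min_hours=4),
    Alternative(min_count=10, max_count=10, min_memory_mb=32, one_per_machine=False),
]
```

With six machines of 128 MB available for six hours, the first alternative is chosen; with a single 32 MB machine, the request falls back to the second.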


Grid File Transfer Protocol

Providing high-speed and reliable transfer of large volume data (petabytes)

Extension of standard FTP to include
  striped/parallel data channels
  partial files
  automatic and manual TCP buffer size settings
  progress monitoring
  extended restart functionality
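The idea behind parallel data channels and partial-file transfers can be sketched without any network code. In this illustration an in-memory byte string stands in for the remote file and threads stand in for data channels; the function names are hypothetical and nothing here uses the GridFTP protocol itself.

```python
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(size, channels):
    """Split [0, size) into at most `channels` contiguous (offset, length) ranges."""
    if size == 0:
        return []
    step = -(-size // channels)  # ceiling division
    return [(offset, min(step, size - offset)) for offset in range(0, size, step)]


def read_partial(source, offset, length):
    """Partial-file read: fetch only the requested byte range."""
    return offset, source[offset:offset + length]


def parallel_transfer(source, channels=4):
    """Fetch all ranges concurrently and reassemble them by offset."""
    target = bytearray(len(source))
    with ThreadPoolExecutor(max_workers=channels) as pool:
        futures = [pool.submit(read_partial, source, offset, length)
                   for offset, length in byte_ranges(len(source), channels)]
        for future in futures:
            offset, block = future.result()
            target[offset:offset + len(block)] = block
    return bytes(target)
```

Because every block carries its own offset, a range that fails could be requested again on its own, which is the intuition behind restartable transfers.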


Grid Information Services

Grid Resource Information Service (GRIS)
  provides resource specific information

Grid Resource Registration (GRR)
  updates GRIS about resource status

Grid Index Information Service (GIIS)
  an aggregate directory service
  provides a collection of information that has been gathered from multiple GRIS servers

Grid Resource Inquiry (GRI)
  queries GRIS server for resource information
  queries GIIS server for information
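The four roles above can be mimicked with two small classes. This is a sketch that follows the slide's descriptions only; the class and method names are hypothetical, the attributes (`os`, `free_cpus`) are made up for the example, and the real services are directory servers rather than Python objects.

```python
class GRIS:
    """Holds information about a single resource."""

    def __init__(self, resource_name):
        self.resource_name = resource_name
        self.status = {}

    def register(self, **status):
        """Registration: update this GRIS about the resource's status."""
        self.status.update(status)

    def inquire(self):
        """Inquiry against one GRIS: return its resource information."""
        return {"resource": self.resource_name, **self.status}


class GIIS:
    """Aggregate directory built from several GRIS servers."""

    def __init__(self, gris_servers):
        self.gris_servers = list(gris_servers)

    def inquire(self, **criteria):
        """Inquiry against the index: return entries matching every criterion."""
        results = []
        for gris in self.gris_servers:
            info = gris.inquire()
            if all(info.get(key) == value for key, value in criteria.items()):
                results.append(info)
        return results
```

A client that knows a resource can query its GRIS directly, while a client looking for, say, any Linux node would ask the GIIS with `inquire(os="Linux")`.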


Open Grid Services Architecture

Marriage of grid protocols with web service protocols

Specifications for
  How Grid Services are created and discovered
  How Grid Service instances are named and referenced
  Interfaces that define any Grid Service

Initial release with GT 3.0 mid-2003; GT 4.0 Jan 2005


Grid Examples

Network for Earthquake Engineering and Simulation (NEESGrid)

Biomedical Informatics Research Network (BIRN)

EcoGrid


NEESGrid (Network for Earthquake Engineering and Simulation)

Linking scientists and facilities
  observation of an experiment in progress
  observation before and after an experiment
  remote operation of an experiment

Linking facilities and data
  hybrid operation of physical simulations with other simulations, both physical and numerical
  automatic archiving of raw data, calibration data, and processed data

Linking scientists and data
  collaborative views (static) of time synchronized data visualizations
  collaborative views of time synchronized data visualizations with video and audio recordings

Linking scientists and other scientists
  synchronous communication, such as with colleagues during an experiment
  asynchronous communication, such as with colleagues over the course of preparing a publication resulting from an experiment


NEESGrid (Network for Earthquake Engineering and Simulation)

Network Architecture Diagram


BIRN (Biomedical Informatics Research Network)

Testbed for a biomedical knowledge infrastructure

Federated database of neuro-imaging data

Fusion of diverse data sources (location; level of aggregation)

Grid access to computational resources

Datamining software

Scalable and extensible

Driven by research needs, not technology-pull or technology-push


EcoGrid

Metadata Standardization
  Ecological Metadata Language – “EML”

Integrate diverse data networks from ecology, biodiversity, and environmental sciences

Standardized interfaces to data resources
  Metacat
  SRB
  DiGIR
  Xanthoria

Metadata-mediated data access (application-based)
  Supports multiple metadata standards
  EML, Darwin Core as foci

Computational services
  Pre-defined analytical services
  On-the-fly analytical services


EcoGrid

*EML facilitates semi-automatic data binding


Grid Organizations

Globus Alliance
  Globus Toolkit™ – Reference implementation of the grid architecture and grid protocols
  http://www.globus.org

NSF Middleware Initiative (NMI)
  Supports the design, development, testing, and deployment of middleware for HPC
  http://www.nsf-middleware.org

GRIDS Center
  Grid Research Integration Deployment and Support Center – part of NMI
  http://www.grids-center.org

Global Grid Forum
  Main standards body governing the world-wide grid community
  http://www.globalgridforum.org


Recommended Texts

Grid Computing: A Practical Guide to Technology and Applications
  Ahmar Abbas
  Charles River Media © 2004

Introduction to Grid Computing with Globus
  Luis Ferreira et al.
  IBM Redbooks © 2004

Enabling Applications for Grid Computing with Globus
  Bart Jacob et al.
  IBM Redbooks © 2003

Grid Services Programming and Application Enablement
  Luis Ferreira et al.
  IBM Redbooks © 2004