Grids and Globus at BNL Presented by John Scott Leita

Upload: phillip-mcbride

Post on 15-Jan-2016


Page 1: Grids and Globus at BNL Presented by John Scott Leita

Grids and Globus at BNL

Presented by John Scott Leita

Page 2: Grids and Globus at BNL Presented by John Scott Leita

Grid Basics

• What is a grid?
• What use is the grid in terms of computing?
• What are some grid requirements?
• How are these requirements met?

Page 3: Grids and Globus at BNL Presented by John Scott Leita

What is a grid?

A grid is a set of nodes interconnected by a network, together with a set of standards that make its use possible. Think of the US electric power grid as an example. There are nodes that supply various amounts of power and nodes that use that power. For this to work, the grid must include standards such as the 120V 60Hz signal that we all use.

Page 4: Grids and Globus at BNL Presented by John Scott Leita

Grid Basics

• What is a grid?
• What use is the grid in terms of computing?
• What are some grid requirements?
• How are these requirements met?

Page 5: Grids and Globus at BNL Presented by John Scott Leita

What use is the grid in terms of computing?

The grid is used to enforce standards and to offer libraries (APIs) that simplify writing grid-compliant software. This levels the playing field so that all nodes can use and offer services over a heterogeneous network.

Page 6: Grids and Globus at BNL Presented by John Scott Leita

Grid Basics

• What is a grid?
• What use is the grid in terms of computing?
• What are some grid requirements?
• How are these requirements met?

Page 7: Grids and Globus at BNL Presented by John Scott Leita

What are some grid requirements?

• Resources – Large storage (HPSS), local drives, CPUs (farms, clusters, supercomputers, PCs), queues (LSF).
• Data – Files and databases, with storage elements abstracted.
• Jobs – Users should be able to submit various jobs to be executed. These jobs can include executables, parallel programs (MPI), batch jobs, SQL queries, and shell scripts. As with data, the resources should be abstracted as much as the user would like.
• Security – Anyone who uses the grid should be authenticated and checked for authorization for the requested resources or data access.

Page 8: Grids and Globus at BNL Presented by John Scott Leita

Grid Basics

• What is a grid?
• What use is the grid in terms of computing?
• What are some grid requirements?
• How are these requirements currently met?

Page 9: Grids and Globus at BNL Presented by John Scott Leita

How are these requirements currently met?

One popular way of implementing computational and data grids is through middleware. Middleware is software that lies between the operating system and user applications. This layer provides the standards and functionality needed to meet the requirements listed earlier: resources, data, jobs, and security. Globus is a software toolkit that realizes a middleware-based grid.

Page 10: Grids and Globus at BNL Presented by John Scott Leita

Globus Questions?

What specifically is Globus?
What does it include?
How does it work?

Page 11: Grids and Globus at BNL Presented by John Scott Leita

What specifically is Globus?

Globus is a software package that implements the grid requirements mentioned on previous slides. The package consists of three distinct parts:

1. Client middleware – used to access remote data and resources. Also used to submit, run, and manage jobs.

2. Server middleware – used to offer data and resources.

3. Libraries and APIs – used by developers to make it easy to produce grid-friendly software, and to convert existing software to be grid friendly.

Page 12: Grids and Globus at BNL Presented by John Scott Leita

Globus Questions?

What specifically is Globus?
What does it include?
How does it work?

Page 13: Grids and Globus at BNL Presented by John Scott Leita

What does it include?

Globus includes the following parts to implement the grid requirements:

• GRAM (Globus Resource Allocation Manager) – Used to submit and control jobs over the grid.
• NEXUS – A library that allows different jobs to communicate with each other.
• GSI (Grid Security Infrastructure) – Provides PKI authentication.
• GASS (Global Access to Secondary Storage) – Makes accessing data the same as accessing web pages.
• MDS (Metacomputing Directory Service) – Provides information about the availability of resources.

Page 14: Grids and Globus at BNL Presented by John Scott Leita

Globus Questions?

What specifically is Globus?
What does it include?
How does it work?

Page 15: Grids and Globus at BNL Presented by John Scott Leita

How does it work?

It is best if each component is explained separately:

• GRAM
• NEXUS
• GSI
• GASS
• MDS

Page 16: Grids and Globus at BNL Presented by John Scott Leita

GRAM Overview

The Globus Resource Allocation Manager (GRAM) is the lowest level of the Globus resource management architecture. GRAM allows you to run jobs remotely, providing an API for submitting, monitoring, and terminating your job.

When a job is submitted, the request is sent to the gatekeeper of the remote computer. The gatekeeper handles the request and creates a job manager for the job. The job manager starts and monitors the remote program, communicating state changes back to the user on the local machine. When the remote application terminates, normally or by failing, the job manager terminates as well.

NOTE: This text is from the Globus website: www.globus.org/gram/overview.html
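
The flow described above (a gatekeeper receives the request, creates a job manager, and the job manager reports state changes until the job ends) can be sketched as a toy model. This is not the GRAM API: the class names, the function `run_and_record`, and the simplified state set are all assumptions made for illustration, and everything runs in a single local process.

```python
from enum import Enum
from typing import Callable, List


class JobState(Enum):
    # Simplified, illustrative set of job states.
    PENDING = "PENDING"
    ACTIVE = "ACTIVE"
    DONE = "DONE"
    FAILED = "FAILED"


class JobManager:
    """Toy job manager: runs one job and reports each state change."""

    def __init__(self, on_state_change: Callable[[JobState], None]) -> None:
        self._notify = on_state_change

    def run(self, job: Callable[[], None]) -> None:
        self._notify(JobState.PENDING)
        self._notify(JobState.ACTIVE)
        try:
            job()
        except Exception:
            self._notify(JobState.FAILED)
        else:
            self._notify(JobState.DONE)
        # The job manager's work ends here, whether the job succeeded or failed.


class Gatekeeper:
    """Toy gatekeeper: accepts a request and creates one job manager per job."""

    def submit(self, job: Callable[[], None],
               on_state_change: Callable[[JobState], None]) -> None:
        JobManager(on_state_change).run(job)


def run_and_record(job: Callable[[], None]) -> List[str]:
    """Submit a job and collect the state changes reported back to the caller."""
    states: List[str] = []
    Gatekeeper().submit(job, lambda state: states.append(state.value))
    return states
```

Calling `run_and_record` with a function that raises an exception ends the recorded sequence with FAILED instead of DONE, mirroring the "normally or by failing" case in the text.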

Page 17: Grids and Globus at BNL Presented by John Scott Leita

GRAM Illustration

From www.globus.org

Page 18: Grids and Globus at BNL Presented by John Scott Leita

GRAM Example

GRAM can be included in your program by using the GRAM API, or you can use an existing client program to submit one or more jobs.

Example

% globus-job-run somehost.anl.gov /bin/echo hello

This causes the remote machine to execute the echo command. It is a very simple example with a very limited scope, but there is still a lot going on behind the scenes:

1. The client automatically writes an RSL (Resource Specification Language) description that tells the remote machine the argument (hello), the executable (echo), the host (somehost), and the environment variables needed (to redirect stdio).
2. Authentication takes place at this time using SSL.
3. The client gives the RSL to the remote gatekeeper.
4. The remote gatekeeper starts a job manager, which forks a child process that is the actual job.
5. When the job is finished, a GASS server is started and the local host is notified that the status of the job is complete.
6. The local host uses a GASS client to get the standard output.

GRAM can also be used for much more powerful submissions, for example submitting jobs to many different queues, such as LSF, that are detected using a resource broker.
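
To give a rough idea of what the generated RSL might look like for this example, here is a minimal sketch that builds an RSL-style request string. The function names `build_rsl` and `_quote` are hypothetical, the quoting rule is a simplifying assumption, and the string that globus-job-run actually produces contains more attributes (for example, the stdio redirection mentioned above).

```python
from typing import Sequence


def _quote(value: str) -> str:
    """Wrap a value in double quotes, doubling any embedded quote (assumed rule)."""
    return '"' + value.replace('"', '""') + '"'


def build_rsl(executable: str, arguments: Sequence[str] = ()) -> str:
    """Build a simplified RSL-style request: a conjunction of (attribute=value) pairs."""
    parts = [f"(executable={_quote(executable)})"]
    if arguments:
        args = " ".join(_quote(arg) for arg in arguments)
        parts.append(f"(arguments={args})")
    # The leading "&" marks the request as a conjunction of relations.
    return "&" + "".join(parts)
```

For the echo example, `build_rsl("/bin/echo", ["hello"])` yields `&(executable="/bin/echo")(arguments="hello")`.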

Page 19: Grids and Globus at BNL Presented by John Scott Leita

Nexus

As stated before, Nexus is a method of passing messages between processes. The previous version of MPI-G used Nexus as its protocol for passing messages; however, the new version of MPI does this more efficiently itself. Nexus is still an easy way to write programs that need to talk to one another. Here is the formal definition of Nexus from the www.globus.org website:

Nexus is a runtime library designed primarily as a compiler target for languages supporting task-parallel and mixed data- and task-parallel execution. The Nexus interface and Nexus design are described elsewhere; here, we provide the information needed to execute programs that use Nexus services.

Page 20: Grids and Globus at BNL Presented by John Scott Leita

GSI

The Grid Security Infrastructure (GSI) is a set of libraries and tools, provided with the Globus Metacomputing Toolkit, for doing secure authentication over an open network. The GSI allows you to use an X.509 certificate, normally used for authentication on a system running Globus, for other tasks such as system log on. It is possible to use just the authentication portion of Globus via two applications, GSI-enabled Secure Shell (SSH) and GSI-enabled FTP (gsiftp).

From www.globus.org/security/v1.1

Globus supports proxy certificates. This means that a user can authenticate once per session and use all authorized grid services without constantly being asked to authenticate themselves.
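
The single sign-on idea behind proxy certificates can be illustrated with a toy model: the user's long-lived credential issues a short-lived proxy, and a service accepts the proxy as long as every link in the chain back to a trusted root is still within its lifetime. This sketch performs no cryptography and is not the GSI API; the `Credential` type, the function names, and the 12-hour default lifetime are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Credential:
    subject: str
    issuer: Optional["Credential"]  # None marks the trusted root (a CA)
    not_after: datetime


def issue_proxy(user: Credential, now: datetime, hours: int = 12) -> Credential:
    """Create a short-lived proxy that never outlives the credential that issued it."""
    expiry = min(user.not_after, now + timedelta(hours=hours))
    return Credential(subject=user.subject + "/proxy", issuer=user, not_after=expiry)


def is_valid(cred: Credential, trusted_root: Credential, now: datetime) -> bool:
    """Walk the issuer chain: every link must be unexpired and end at the trusted root."""
    current: Optional[Credential] = cred
    while current is not None:
        if now > current.not_after:
            return False
        if current == trusted_root:
            return True
        current = current.issuer
    return False
```

Because services check only the proxy chain, the user authenticates once when the proxy is created and is not asked again until the proxy expires.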

Page 21: Grids and Globus at BNL Presented by John Scott Leita

GASS

GASS simplifies the porting and running of applications that use file I/O to the Globus environment. Libraries and utilities are provided to eliminate the need to manually log in to sites and ftp files, or to install a distributed file system. The APIs are designed to allow reuse of programs that use Unix or standard C I/O with little or no modification. Currently the ftp and x-gass (GASS server) protocols are supported.

From www.globus.org/gass

Now for my description: GASS allows data to be accessed over the grid with ease. Much like GRAM, you can create your own programs using the GASS API or you can use clients that have already been created.

Example:

% globus-rcp somehost.com:data.dat /here

This will retrieve the requested file via the HTTP protocol. It is interesting to note that GRAM is used to start the GASS server on the remote end.
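
The idea of making data access look like fetching a web page can be sketched with Python's standard library, which reads any supported URL through one call. This is an analogy only and does not use GASS; the helper names `fetch` and `demo` are hypothetical, and a local `file://` URL stands in for a remote server so the example runs without a network.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen


def fetch(url: str) -> bytes:
    """Read the full contents behind a URL, whatever scheme it uses."""
    with urlopen(url) as response:
        return response.read()


def demo() -> bytes:
    """Write a small file, then read it back through its file:// URL."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "data.dat"
        path.write_bytes(b"sample data\n")
        return fetch(path.as_uri())
```

The calling code does not change when the URL points somewhere else, which is the property the slide attributes to GASS: the program keeps its ordinary read logic while the location of the data is abstracted away.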

Page 22: Grids and Globus at BNL Presented by John Scott Leita

MDS

The MDS is a directory service based on the LDAP protocol. It is used to query both static and dynamic information on grid resources such as:

• Available CPUs
• Available storage
• Scientific instruments

Each node with Globus installed has a Grid Resource Information Service (GRIS) component in the middleware. The node reports all relevant status information to its GRIS. An organization or collaboration then maintains a Grid Index Information Service (GIIS) that can pull information from the GRIS of any node that it is in charge of.
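
The two-level arrangement described above, with one information service per node and an index that pulls from the nodes it is in charge of, can be sketched as a toy in-memory model. It does not speak LDAP and is not the MDS API; the class methods and the attribute names `free_cpus` and `free_disk_gb` are hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


class GRIS:
    """Toy per-node information service holding that node's latest status."""

    def __init__(self, host: str) -> None:
        self.host = host
        self._attrs: Record = {}

    def report(self, **attrs: object) -> None:
        # The node pushes its current status (static or dynamic) here.
        self._attrs.update(attrs)

    def snapshot(self) -> Record:
        return {"host": self.host, **self._attrs}


class GIIS:
    """Toy index that pulls fresh snapshots from its registered nodes on each query."""

    def __init__(self) -> None:
        self._members: List[GRIS] = []

    def register(self, gris: GRIS) -> None:
        self._members.append(gris)

    def search(self, predicate: Callable[[Record], bool]) -> List[str]:
        snapshots = (gris.snapshot() for gris in self._members)
        return [str(record["host"]) for record in snapshots if predicate(record)]
```

Because the index pulls a snapshot at query time, a node that reports new status shows up in the next search without any change to the index itself.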

Page 23: Grids and Globus at BNL Presented by John Scott Leita

Globus Hourglass

Redrawn from a picture found in The Grid: Blueprint for a New Computing Infrastructure, edited by Ian Foster and Carl Kesselman.

Page 24: Grids and Globus at BNL Presented by John Scott Leita

Database Replication

Page 25: Grids and Globus at BNL Presented by John Scott Leita

The Future

• Grid user prompt
• Grid Virtual File System
• Globus Replica Catalog

Page 26: Grids and Globus at BNL Presented by John Scott Leita

References

www.globus.org
www.gridforum.org
www.cern.ch/grid
The Grid: Blueprint for a New Computing Infrastructure, edited by Ian Foster and Carl Kesselman.