grids and globus at bnl presented by john scott leita

Post on 15-Jan-2016


Grids and Globus at BNL

Presented by John Scott Leita

Grid Basics

• What is a grid?
• What use is the grid in terms of computing?
• What are some grid requirements?
• How are these requirements met?

What is a grid?

A grid is a collection of nodes interconnected via a network, together with a set of standards that make its use possible. Think of the US electric power grid as an example. There are nodes that supply various amounts of power and nodes that use that power. For this to work, the grid must include standards such as the 120V 60Hz signal that we all use.


What use is the grid in terms of computing?

The grid is used to enforce standards and to offer libraries (APIs) that simplify writing grid-compliant software. This levels the playing field so that all nodes can use and offer services over a heterogeneous network.


What are some grid requirements?

• Resources – Large storage (HPSS), local drives, CPUs (farms, clusters, supercomputers, PCs), queues (LSF).
• Data – Files and databases, with storage elements abstracted.
• Jobs – Users should be able to submit various jobs to be executed. These jobs can include executables, parallel programs (MPI), batch jobs, SQL queries, and shell scripts. Much like data, the resources should be abstracted as much as the user would like.
• Security – Anyone who uses the grid should be authenticated and checked for authorization for the requested resources or data access.


How are these requirements currently met?

One popular way of implementing computational and data grids is to use middleware. Middleware is software that lies between the operating system and user applications. This layer provides the standards and functionality needed to meet requirements like those listed on the previous slide. Globus is a software toolkit that realizes a middleware-based grid.

Globus Questions?

• What specifically is Globus?
• What does it include?
• How does it work?

What specifically is Globus?

Globus is a software package that implements the grid requirements mentioned on the previous slides. The package consists of three distinct parts:

1. Client middleware – used to access remote data and resources, and to submit, run, and manage jobs.

2. Server middleware – used to offer data and resources.

3. Libraries and APIs – used by developers to produce grid-friendly software easily, and to convert existing software to be grid friendly.


What does it include?

Globus includes the following parts to implement the grid requirements:

• GRAM (Globus Resource Allocation Manager) – Used to submit and control jobs over the grid.
• NEXUS – A library that allows different jobs to communicate with each other.
• GSI (Grid Security Infrastructure) – Provides PKI authentication.
• GASS (Global Access to Secondary Storage) – Makes accessing data the same as accessing web pages.
• MDS (Metacomputing Directory Service) – Provides information about the availability of resources.


How does it work?

It is best if each component is explained separately:
• GRAM
• NEXUS
• GSI
• GASS
• MDS

GRAM Overview

The Globus Resource Allocation Manager (GRAM) is the lowest level of the Globus resource management architecture. GRAM allows you to run jobs remotely, providing an API for submitting, monitoring, and terminating your job.

When a job is submitted, the request is sent to the gatekeeper of the remote computer. The gatekeeper handles the request and creates a job manager for the job. The job manager starts and monitors the remote program, communicating state changes back to the user on the local machine. When the remote application terminates, normally or by failing, the job manager terminates as well.

NOTE: This text is from the Globus website: www.globus.org/gram/overview.html
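The gatekeeper and job manager flow described above can be sketched as a toy state machine. This is an illustrative model only, not the GRAM API: the class names (Gatekeeper, JobManager), the on_state_change callback, the state names, and the idea of running the "job" as a local Python callable are all assumptions made for the example.

```python
from enum import Enum


class JobState(Enum):
    # State names are assumed for illustration.
    PENDING = "pending"
    ACTIVE = "active"
    DONE = "done"
    FAILED = "failed"


class JobManager:
    """Hypothetical stand-in for a per-job manager created by the gatekeeper."""

    def __init__(self, on_state_change):
        self.on_state_change = on_state_change  # reports back to the submitter
        self.alive = True

    def _report(self, state):
        self.on_state_change(state)

    def run(self, job):
        self._report(JobState.PENDING)
        self._report(JobState.ACTIVE)
        try:
            job()  # the "remote program", here just a local callable
            self._report(JobState.DONE)
        except Exception:
            self._report(JobState.FAILED)
        finally:
            # The job manager terminates when the job ends, normally or not.
            self.alive = False


class Gatekeeper:
    """Hypothetical gatekeeper: accepts a request and creates a job manager."""

    def submit(self, job, on_state_change):
        manager = JobManager(on_state_change)
        manager.run(job)
        return manager


if __name__ == "__main__":
    states = []
    Gatekeeper().submit(lambda: None, states.append)
    print([s.name for s in states])
```

In this sketch the submitter only ever talks to the gatekeeper and receives state changes through the callback, which mirrors the division of roles in the description.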

GRAM Illustration (figure from www.globus.org)

GRAM Example

GRAM can be included in your program by using the GRAM API, or you can use an existing client program to submit one or more jobs.

Example

%globus-job-run somehost.anl.gov /bin/echo hello

This causes the remote machine to execute the echo command. It is a very simple example with a very limited scope, but there is still a lot going on behind the scenes. First, the program automatically writes an RSL (Resource Specification Language) description to inform the remote machine of the argument (hello), the executable (echo), the host (somehost), and the environment variables needed (to redirect stdio). Authentication takes place at this time using SSL. The program then gives the RSL to the remote gatekeeper. The remote gatekeeper starts a job manager, which forks a child process that is the actual job. When the job is finished, a GASS server is started and the local host is notified that the job is complete. The local host uses a GASS client to get the standard output.

GRAM can be used for much more powerful submissions, however, such as submitting jobs to many different queues (for example LSF) that are detected using a resource broker.
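To make the RSL step more concrete, here is a minimal sketch of how a client might assemble an RSL string for the echo example. The exact RSL that globus-job-run writes is not shown in these slides, so the helper name build_rsl, the attribute names (executable, arguments, environment), and the environment variable in the usage line are assumptions; the sketch also does no quoting of values that contain spaces or parentheses.

```python
def build_rsl(executable, arguments=(), environment=None):
    """Assemble a simple RSL-style request string (illustrative only)."""
    parts = [f"(executable={executable})"]
    if arguments:
        # Arguments are space-separated inside one relation.
        parts.append("(arguments=" + " ".join(arguments) + ")")
    if environment:
        # Each variable is written as a (NAME value) pair.
        pairs = "".join(f"({name} {value})" for name, value in environment.items())
        parts.append(f"(environment={pairs})")
    # A leading "&" joins the relations as a conjunction.
    return "&" + "".join(parts)


if __name__ == "__main__":
    print(build_rsl("/bin/echo", ["hello"]))
    # EXAMPLE_VAR is a made-up variable name for illustration.
    print(build_rsl("/bin/echo", ["hello"], {"EXAMPLE_VAR": "1"}))
```

The first call prints `&(executable=/bin/echo)(arguments=hello)`; a real client would add further relations for output redirection and then hand the string to the remote gatekeeper.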

Nexus

As stated before, Nexus is a method of passing messages between processes. The previous version of MPI-G used Nexus as its protocol to pass messages; the new version, however, does this more efficiently itself. Nexus is still an easy way to write programs that need to talk to one another. Here is the formal definition of Nexus from the www.globus.org website:

Nexus is a runtime library designed primarily as a compiler target for languages supporting task-parallel and mixed data- and task-parallel execution. The Nexus interface and Nexus design are described elsewhere; here, we provide the information needed to execute programs that use Nexus services.

GSI

The Grid Security Infrastructure (GSI) is a set of libraries and tools, provided with the Globus Metacomputing Toolkit, for doing secure authentication over an open network. The GSI allows you to use an X.509 certificate, normally used for authentication on a system running Globus, for other tasks such as system log on. It is possible to use just the authentication portion of Globus via two applications, GSI-enabled Secure Shell (SSH) and GSI-enabled FTP (gsiftp).

From www.globus.org/security/v1.1

Globus supports proxy certificates. This means that a user can authenticate once per session and use all authorized grid services without constantly being asked to authenticate themselves.

GASS

GASS simplifies the porting and running of applications that use file I/O to the Globus environment. Libraries and utilities are provided to eliminate the need to manually log in to sites and ftp files or to install a distributed file system. The APIs are designed to allow reuse of programs that use Unix or standard C I/O with little or no modification. Currently the ftp and x-gass (GASS server) protocols are supported.

From www.globus.org/gass

Now for my description: GASS allows data to be accessed over the grid with ease. Much like GRAM, you can create your own programs using the GASS API or you can use clients that already exist.

Example:

%globus-rcp somehost.com:data.dat /here

This will retrieve the requested file via the HTTP protocol. It is interesting to note that GRAM is used to start the GASS server on the remote end.

MDS

The MDS is a directory service that is based on the LDAP protocol. It is used to query both static and dynamic information on grid resources such as:
• Available CPUs
• Available storage
• Scientific instruments

Each node with Globus installed has a Grid Resource Information Service (GRIS) component in the middleware. That node reports all relevant statuses to the GRIS. An organization or collaboration then maintains a Grid Index Information Service (GIIS) that can pull information from the GRIS of any node that it is in charge of.
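The relationship between per-node GRIS components and an organization-wide GIIS can be sketched with a small in-memory model. This is a toy illustration only: the real MDS is queried over LDAP, whereas the classes below (GRIS, GIIS), their method names, the node names, and the attribute keys such as free_cpus are all hypothetical.

```python
class GRIS:
    """Hypothetical per-node information component."""

    def __init__(self, node_name):
        self.node_name = node_name
        self._status = {}

    def report(self, key, value):
        # The node reports a static or dynamic attribute about itself.
        self._status[key] = value

    def query(self):
        # Return a copy so callers cannot mutate the node's record.
        return dict(self._status)


class GIIS:
    """Hypothetical index that pulls from the GRIS of each node it manages."""

    def __init__(self):
        self._members = []

    def register(self, gris):
        self._members.append(gris)

    def pull(self):
        # Pull fresh information from every registered node.
        return {g.node_name: g.query() for g in self._members}

    def nodes_with(self, key, minimum):
        # Names of nodes whose attribute meets the minimum, sorted for stability.
        return sorted(
            name for name, info in self.pull().items()
            if info.get(key, 0) >= minimum
        )


if __name__ == "__main__":
    farm = GRIS("farm01")          # node names are made up
    farm.report("free_cpus", 16)
    desktop = GRIS("pc42")
    desktop.report("free_cpus", 1)

    index = GIIS()
    index.register(farm)
    index.register(desktop)
    print(index.nodes_with("free_cpus", 4))
```

Because the index pulls on demand rather than caching, a later report from a node shows up in the next query, which is one simple way to model dynamic information.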

Globus Hourglass

Redrawn from a picture found in The Grid: Blueprint for a New Computing Infrastructure, edited by Ian Foster and Carl Kesselman.

Database Replication

The Future

• Grid user prompt
• Grid Virtual File System
• Globus Replica Catalog

References

www.globus.org
www.gridforum.org
www.cern.ch/grid
The Grid: Blueprint for a New Computing Infrastructure, edited by Ian Foster and Carl Kesselman.
