The NorduGrid project: Using Globus toolkit for building Grid infrastructure
presented by Aleksandr Konstantinov
Mattias Ellert
Aleksandr Konstantinov
Balázs Kónya
Oxana Smirnova
Anders Wäänänen
ACAT'2002, Moscow
Introduction
Launched in spring 2001, with the aim of creating a Grid infrastructure in the Nordic countries.
Partners from Denmark, Norway, Sweden, and Finland.
Powered mainly by ATLAS groups (Lund, Copenhagen, Stockholm, Uppsala, Oslo).
Relatively short-term project: ends in October 2002.
Relies on very limited human resources (3 full-time researchers and a few part-time ones), with funding from NorduNet2.
More info: http://www.nordugrid.org/
Introduction (cont.)
The purpose of the project is to create and operate a functional testbed.
Use approved tools: the Globus Toolkit™ (developed at Argonne National Laboratory and the University of Southern California) and tools developed by the European DataGrid project.
Aim at High Energy Physics applications: their needs decide what is implemented first.
No temporary solutions (it is better not to implement something than to be forced to provide backward compatibility for a limited solution).
Globus Toolkit™ evaluation
A widely accepted de facto standard for Grid computing.
Provides a collection of (mostly) robust protocols, libraries and low-level services.
Security is built in.
Continuously evolving (??).
Missing a few important high-level services:
grid-level scheduler
job data stage-in/stage-out
user-friendly grid entry points (simple user-interface, web portals, etc.)
grid-level authorization system
grid-level accounting and quotas
NorduGrid requirements
No single point of failure
No central sandbox (unlike EDG)
Lightweight brokering integrated into the User Interface
Jobs should not be specific to a Computing Element (cluster)
Non-grid-aware jobs are allowed ("grid functionality" is provided by middleware on the Computing Element)
Jobs run in as restrictive an environment as possible (network access on the computing nodes should not be expected)
A minimal environment is provided on the Computing Element
Adequate and sufficiently complete information is provided by the Information System:
The natural computing unit is the cluster
Queue, job and user information
NorduGrid architecture
[Figure: NorduGrid architecture. Cluster: front-end (Globus gatekeeper + job manager, Globus GRIS, NorduGrid GridManager, information providers for queues, jobs and users, GridFTP), PBS with worker nodes, working directories (session1, session2, ...). Also shown: Storage Element (GridFTP), Globus Replica Catalog, a hierarchy of GIIS servers, User Interface and Broker, authorization via the Virtual Organization Catalog; control and data connections.]
Information System
NorduGrid operates an MDS-based, hierarchically distributed Information System:
a new information model for clusters, queues, jobs, users, Storage Elements (SE) and Replica Catalogs (RC)
efficient information providers
all the job monitoring, resource discovery, status monitoring and brokering are exclusively built on top of the MDS
MDS hierarchy with dynamic site registrations
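The soft-state registration behind such a hierarchy can be pictured with a small in-memory model. This is a sketch only, assuming a fixed time-to-live (TTL) for every registration; the class names IndexServer and ClusterInfo, the TTL value and the traversal are hypothetical and do not reproduce the MDS/GIIS code or its LDAP interface. A lower-level server re-registers with its parent periodically, and a registration that is not refreshed within the TTL silently disappears, so sites can join or leave without central bookkeeping.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

# Hypothetical model of MDS-style soft-state registration (not the MDS code).


@dataclass
class ClusterInfo:
    """Stand-in for a cluster's GRIS: a leaf of the hierarchy."""
    name: str

    def clusters(self, now: float) -> List[str]:
        return [self.name]


@dataclass
class IndexServer:
    """Stand-in for a GIIS that keeps registrations from its children."""
    name: str
    ttl: float = 60.0  # assumed lifetime of a registration, in seconds
    registrations: Dict[str, Tuple[Union["IndexServer", ClusterInfo], float]] = field(
        default_factory=dict)

    def register(self, child: Union["IndexServer", ClusterInfo], now: float) -> None:
        """A child (re)announces itself; a repeated call refreshes the timestamp."""
        self.registrations[child.name] = (child, now)

    def clusters(self, now: float) -> List[str]:
        """Forget expired registrations, then collect clusters recursively."""
        self.registrations = {
            name: (child, stamp)
            for name, (child, stamp) in self.registrations.items()
            if now - stamp <= self.ttl
        }
        found: List[str] = []
        for child, _ in self.registrations.values():
            found.extend(child.clusters(now))
        return found


if __name__ == "__main__":
    top, country = IndexServer("top"), IndexServer("country")
    top.register(country, now=0)
    country.register(ClusterInfo("cluster-a"), now=0)
    country.register(ClusterInfo("cluster-b"), now=0)
    print(top.clusters(now=30))  # ['cluster-a', 'cluster-b']
    # Only the country index and cluster-a refresh their registrations at t=50.
    top.register(country, now=50)
    country.register(ClusterInfo("cluster-a"), now=50)
    print(top.clusters(now=90))  # ['cluster-a']
```

Because every level applies the same expiry rule, a cluster that stops reporting vanishes from all queries without any explicit deregistration.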
Information System (example)
[Figure: example Information System entries for a cluster, a queue, a job and a user.]
Information System (hierarchy)
Information System (interfaces)
Grid Manager - cluster middleware
Provides job control and data handling functionality (the requirements of HEP applications have first priority).
The Grid Manager is based on Globus Toolkit™ libraries and services. The following parts of Globus are used:
GridFTP - fast and reliable data access for Grid
GASS Copy interface - support for different data access protocols
Replica Catalog - metadata storage
GRAM - resource request
RSL - expandable Resource Specification Language
Grid Manager (features)
Stage in input data and executables. Possible sources:
The job submission machine.
GridFTP (preferred), FTP, HTTP or HTTPS servers.
Files registered in the Globus Replica Catalog, with secure authentication; the location is chosen automatically or can be forced.
Stage out output data. Possible destinations:
Kept on the cluster until the user downloads them.
GridFTP, FTP, HTTP or HTTPS servers.
Files can be registered in the Globus Replica Catalog; the destination and protocol are obtained from the location information.
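The location rules for stage-in and stage-out can be summarised as a small classification function. This is a hypothetical sketch rather than Grid Manager code: classify_location and its return labels are invented for illustration. It assumes that an empty location means the submission machine for input and "keep on cluster" for output (as in the job examples later in this talk), that rc:// URLs name Replica Catalog entries, and that gsiftp:// is the GridFTP URL scheme.

```python
from urllib.parse import urlparse

# Hypothetical sketch of location handling for inputFiles/outputFiles.
# gsiftp is the URL scheme used for GridFTP.
DIRECT_SCHEMES = {"gsiftp", "ftp", "http", "https"}


def classify_location(location: str, direction: str) -> str:
    """Say how one file location would be handled.

    direction is "in" (stage-in) or "out" (stage-out).
    """
    if direction not in ("in", "out"):
        raise ValueError(f"unknown direction: {direction!r}")
    if location == "":
        # Empty location: the file comes from the submission machine (input)
        # or stays on the cluster until the user downloads it (output).
        return "upload-from-client" if direction == "in" else "keep-on-cluster"
    scheme = urlparse(location).scheme.lower()
    if scheme == "rc":
        # Logical file name in the Globus Replica Catalog: a replica is
        # looked up there (input) or the new file is registered there (output).
        return "replica-catalog"
    if scheme in DIRECT_SCHEMES:
        return "direct-" + scheme
    raise ValueError(f"unsupported location: {location!r}")


if __name__ == "__main__":
    for name, loc in [("atlas.kumac", ""), ("gen0017_1.root", "rc:///gen0017_1.root")]:
        print(name, classify_location(loc, "in"))
    # atlas.kumac upload-from-client
    # gen0017_1.root replica-catalog
```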
Grid Manager (features, cont.)
E-mail notification of job status changes.
Support for software runtime environment configuration.
Jobs are started with the environment set up properly for the requested application
Customizable GridFTP server
local access through plugins
certificate-oriented local file system access plugin
job submission/access plugin: start jobs, upload input files and download output files through the same interface
Limitation: data are handled only at the beginning and end of the job. The user must provide information about input and output data.
Extensions to RSL (evaluation)
RSL stands for Resource Specification Language. It was introduced to communicate job requirements to the Globus Resource Allocation Manager (GRAM).
Useful features:
Allows basic logical expressions
Set of attributes is expandable
Unknown attributes are passed through.
Allows different parts to be processed at different levels.
Can be used to assist in writing brokers or filters which refine an RSL specification
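Passing unknown attributes through and refining a specification in a filter can be illustrated with a toy parser. This is a sketch under strong assumptions: it understands only a top-level conjunction &(name=value ...)(...) with = relations, quoted or bare literals and nested parenthesised value lists, which is far less than full RSL (no | or +, no other relational operators, no variable substitution). The refine step, which pins the job to one cluster, is a hypothetical broker action, and attribute names are compared case-insensitively.

```python
from typing import List, Optional, Tuple, Union

# Toy subset of RSL: &(name=value ...)(name=value ...) ... (hypothetical sketch)
Value = Union[str, List["Value"]]   # a literal or a parenthesised list
Relation = Tuple[str, List[Value]]  # (attribute name, values)


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split text into ("op", char) and ("str", literal) tokens."""
    tokens: List[Tuple[str, str]] = []
    i = 0
    while i < len(text):
        c = text[i]
        if c.isspace():
            i += 1
        elif c in "&()=":
            tokens.append(("op", c))
            i += 1
        elif c == '"':  # quoted literal; "" inside it stands for one quote
            j, buf = i + 1, []
            while True:
                if j >= len(text):
                    raise ValueError("unterminated string")
                if text[j] == '"' and text[j + 1:j + 2] == '"':
                    buf.append('"')
                    j += 2
                elif text[j] == '"':
                    break
                else:
                    buf.append(text[j])
                    j += 1
            tokens.append(("str", "".join(buf)))
            i = j + 1
        else:  # bare literal up to whitespace or a special character
            j = i
            while j < len(text) and not text[j].isspace() and text[j] not in '&()="':
                j += 1
            tokens.append(("str", text[i:j]))
            i = j
    return tokens


def parse(text: str) -> List[Relation]:
    """Parse a top-level conjunction into (name, values) pairs."""
    toks, pos = tokenize(text), 0

    def take(kind: str, val: Optional[str] = None) -> str:
        nonlocal pos
        if (pos >= len(toks) or toks[pos][0] != kind
                or (val is not None and toks[pos][1] != val)):
            raise ValueError(f"expected {val or kind} at token {pos}")
        pos += 1
        return toks[pos - 1][1]

    def read_values() -> List[Value]:
        """Read values up to and including the closing parenthesis."""
        nonlocal pos
        out: List[Value] = []
        while pos < len(toks) and toks[pos] != ("op", ")"):
            if toks[pos] == ("op", "("):
                pos += 1
                out.append(read_values())
            else:
                out.append(take("str"))
        take("op", ")")
        return out

    take("op", "&")
    relations: List[Relation] = []
    while pos < len(toks):
        take("op", "(")
        name = take("str")
        take("op", "=")
        relations.append((name, read_values()))
    return relations


def to_text(value: Value) -> str:
    if isinstance(value, list):
        return "(" + " ".join(to_text(v) for v in value) + ")"
    return '"' + value.replace('"', '""') + '"'


def unparse(relations: List[Relation]) -> str:
    return "&" + "".join(
        "(" + name + "=" + " ".join(to_text(v) for v in vals) + ")"
        for name, vals in relations)


def refine(relations: List[Relation], cluster: str) -> List[Relation]:
    """Hypothetical broker filter: add cluster=... unless already present.

    Every other relation, known or unknown, is passed through unchanged.
    """
    if any(name.lower() == "cluster" for name, _ in relations):
        return relations
    return relations + [("cluster", [cluster])]
```

For example, unparse(refine(parse('&(executable=/bin/echo)(stdout=out.txt)'), "cluster.example.org")) returns &(executable="/bin/echo")(stdout="out.txt")(cluster="cluster.example.org"); the host name is made up.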
Extensions to RSL (new attributes)
To support additional features, new attributes were introduced. The most important are:
inputFiles=(<file> [<location>]) ... - list of files to be transferred to the computing node from a given location.
outputFiles=(<file> [<location>]) ... - list of files to be preserved after job completion and transferred to a given location.
executables=<file1> <file2> ... - list of files to be given executable permissions.
notify=<options> <email> ... - e-mail notification on job status changes.
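A job description using these attributes can also be assembled programmatically. The helper below is a hypothetical sketch (build_xrsl and q are not part of the NorduGrid toolkit): it quotes every value and writes each (file, location) pair as a parenthesised pair, following the syntax of the job examples in this talk, so the result has the same form as the string passed to ngsub there. File names in the usage example are made up.

```python
from typing import Sequence, Tuple


def q(value: str) -> str:
    """Quote one literal; a doubled quote stands for a literal quote."""
    return '"' + value.replace('"', '""') + '"'


def build_xrsl(executable: str,
               input_files: Sequence[Tuple[str, str]] = (),
               output_files: Sequence[Tuple[str, str]] = (),
               executables: Sequence[str] = (),
               **extra: str) -> str:
    """Assemble a job description from Python values (hypothetical helper).

    Each (file, location) pair is written as ("file" "location"); an empty
    location stays "" as in the job examples. Extra keyword arguments become
    single-valued relations such as (stdout="out.txt").
    """
    parts = [f"(executable={q(executable)})"]
    if input_files:
        pairs = " ".join(f"({q(fname)} {q(loc)})" for fname, loc in input_files)
        parts.append(f"(inputFiles={pairs})")
    if output_files:
        pairs = " ".join(f"({q(fname)} {q(loc)})" for fname, loc in output_files)
        parts.append(f"(outputFiles={pairs})")
    if executables:
        parts.append("(executables=" + " ".join(q(e) for e in executables) + ")")
    for name, value in extra.items():
        parts.append(f"({name}={q(value)})")
    return "&" + "".join(parts)


if __name__ == "__main__":
    print(build_xrsl("run.sh",
                     input_files=[("run.sh", ""), ("helper.sh", ""),
                                  ("data.root", "rc:///data.root")],
                     output_files=[("result.his", "")],
                     executables=["helper.sh"],
                     stdout="out.txt"))
```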
Extensions to RSL (new attributes, cont.)
runTimeEnvironment=<string> ... - application-specific runtime environment (e.g., ATLAS-3.2.1)
middleware=<string> - required middleware (e.g., NorduGrid-0.3.0)
cluster=<string> - request for a specific cluster
rerun=<number> - number of attempts to re-run the job
lifeTime=<number> - maximum time for the session directory to remain on the execution node (cannot override local policy)
ftpThreads=<number> - number of GridFTP threads to be used for file transfers
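The remark that lifeTime cannot override local policy, and the meaning of rerun, can be made concrete with two small functions. This is a hypothetical sketch, not Grid Manager code: SitePolicy, its fields, and the rule that the effective lifetime is the smaller of the requested value and the site maximum are assumptions chosen to illustrate the idea; times are plain numbers in one unspecified unit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SitePolicy:
    """Hypothetical local limits chosen by the cluster administrator.

    Times are plain numbers in one unit; the unit itself is left open here.
    """
    max_lifetime: int      # longest time a session directory may be kept
    default_lifetime: int  # used when the job gives no lifeTime


def effective_lifetime(requested: Optional[int], policy: SitePolicy) -> int:
    """A job's lifeTime may shorten, but never extend, the local limit."""
    if requested is None:
        return min(policy.default_lifetime, policy.max_lifetime)
    if requested < 0:
        raise ValueError("lifeTime must be non-negative")
    return min(requested, policy.max_lifetime)


def should_rerun(failures_so_far: int, rerun: int) -> bool:
    """With rerun=N a failed job is restarted at most N times.

    failures_so_far counts failed attempts, including the latest one.
    """
    return 1 <= failures_so_far <= rerun


if __name__ == "__main__":
    site = SitePolicy(max_lifetime=7, default_lifetime=3)
    print(effective_lifetime(30, site))  # 7: capped by local policy
    print(effective_lifetime(2, site))   # 2: a shorter request is honoured
    print([should_rerun(n, rerun=2) for n in (1, 2, 3)])  # [True, True, False]
```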
User Interface
The NorduGrid toolkit user interface consists of a set of commands that can be executed from the command line:
ngsub - for job submission
ngstat - to obtain the status of jobs and clusters
ngcat - to display the stdout or stderr of a running job
ngget - to retrieve the result from a finished job
ngkill - to kill a running job
ngclean - to delete a job from a remote cluster
ngsync - to recreate local information about jobs
User Interface (cont.)
Job requests are expressed in xRSL (extended RSL)
processes the user-level xRSL request and transforms it into one suitable for the GM (Grid Manager)
user-friendly values for some attributes
conditional submission and xRSL transformation
Performs brokering
analyzes information about the different clusters obtained from the MDS servers
from all suitable queues, one is chosen randomly, with a weight proportional to the amount of free computing resources
Passes the modified job request to the GM through the GRAM or GridFTP interface and uploads the input files.
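The brokering rule (a random choice among suitable queues, weighted by free resources) can be sketched as follows. Assumptions: the Queue record is hypothetical, "free computing resources" is approximated by a count of free CPUs, and requirement matching is reduced to a precomputed suitable flag; the real broker reads these values from the MDS.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Queue:
    """Hypothetical summary of one queue, as a broker might read it from MDS."""
    cluster: str
    name: str
    free_cpus: int         # assumed measure of "free computing resources"
    suitable: bool = True  # assumed result of matching the job's xRSL


def choose_queue(queues: List[Queue],
                 rng: Optional[random.Random] = None) -> Optional[Queue]:
    """Pick one suitable queue at random, weighted by its free CPUs."""
    if rng is None:
        rng = random.Random()
    candidates = [q for q in queues if q.suitable and q.free_cpus > 0]
    if not candidates:
        return None  # this sketch simply gives up when nothing is free
    weights = [q.free_cpus for q in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    queues = [Queue("cluster-a", "short", 30),
              Queue("cluster-b", "long", 10),
              Queue("cluster-c", "short", 100, suitable=False)]
    rng = random.Random(2002)
    picks = [choose_queue(queues, rng).cluster for _ in range(1000)]
    # cluster-a should win roughly three times as often as cluster-b,
    # and cluster-c is never chosen because it does not match the job.
    print({c: picks.count(c) for c in ("cluster-a", "cluster-b", "cluster-c")})
```

One plausible benefit of the randomness is that many independent user interfaces reading the same MDS snapshot do not all pick the same "best" queue, while the weighting still favours clusters with more free capacity.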
User Authentication Management
Using Globus certificates
NorduGrid Certification Authority established
Access control through gridmapfiles
User access control is delegated to Virtual Organization managers
Gridmapfiles are generated automatically from VO database
GSI-enabled secure LDAP server
contains the Subject Names of the users' certificates
VO managers
User Groups and Group Managers
Local site administrators have total control over their gridmapfiles
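Generating a gridmapfile from VO data while leaving the last word to the local administrator might look like the sketch below. The output uses the Globus gridmapfile line format, a quoted certificate Subject Name followed by a local account name; everything else is an assumption for illustration: the function build_gridmap, the idea that all VO members share one local account, the override and ban parameters standing in for the site administrator's control, and the example Subject Names.

```python
from typing import Dict, Iterable, Optional, Set


def build_gridmap(vo_subjects: Iterable[str],
                  vo_account: str,
                  local_map: Optional[Dict[str, str]] = None,
                  banned: Optional[Set[str]] = None) -> str:
    """Return gridmapfile text: one '"<subject name>" <local account>' per line.

    Hypothetical sketch: vo_subjects are certificate Subject Names from the VO
    database, all mapped to one shared vo_account (an assumption); local_map
    and banned stand for decisions of the local site administrator, which
    always take precedence over the VO data.
    """
    mapping: Dict[str, str] = {subject: vo_account for subject in vo_subjects}
    mapping.update(local_map or {})  # local entries override VO entries
    excluded = banned or set()
    lines = []
    for subject in sorted(mapping):
        if subject in excluded:
            continue  # the site refuses to map this user
        if '"' in subject:
            raise ValueError(f"cannot quote subject name: {subject!r}")
        lines.append(f'"{subject}" {mapping[subject]}')
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical Subject Names and account names.
    jane = "/O=Grid/O=NorduGrid/OU=example.org/CN=Jane Doe"
    john = "/O=Grid/O=NorduGrid/OU=example.org/CN=John Roe"
    print(build_gridmap([jane, john], "griduser", local_map={john: "jroe"}), end="")
    # "/O=Grid/O=NorduGrid/OU=example.org/CN=Jane Doe" griduser
    # "/O=Grid/O=NorduGrid/OU=example.org/CN=John Roe" jroe
```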
Applications
It is possible to run any application with a predefined set of input and output data
From as simple as "Hello World":
ngsub '&(executable=/bin/echo)(arguments="Hello World")(stdout=out.txt)'
Applications (cont.)
to as complex as the ATLAS Data Challenge:
ngsub '&(executable = prod)(arguments = "0002" "2" "100")
(stdout = atlas.0002.log)(join = yes)
(replicacollection = "ldap://grid.uio.no/lc=ATLAS,rc=NorduGrid,dc=nordugrid,dc=org")
(inputfiles = ("atlsim.makefile" "") ("atlas.kumac" "") ("gen0017_1.root" "rc:///gen0017_1.root"))
(outputfiles = ("atlas.0002.zebra" "rc:///results/atlas.0002.zebra") ("atlas.0002.his" ""))
(runtimeenvironment = "ATLAS-3.2.0")(middleware = "NorduGrid")'
Conclusions
A minimal environment for Grid computing has been established.
Globus tools alone are not enough for convenient usage, but they provide a solid base.
An additional layer of tools and services was developed to provide the required infrastructure.
Much remains to be done:
Runtime data handling.
Accounting.
Better support for different local resource management systems (LRMS).
Enhanced Information System: more stability, access control, better and richer information providers, etc.
...