
Page 1: gLite Status

INFSO-RI-508833

Enabling Grids for E-sciencE

www.eu-egee.org

gLite Status

Erwin Laure, Deputy EGEE Middleware Manager

On behalf of and with contributions from all JRA1

Page 2: gLite Status


Contents

• Integration and Testing

• Status of gLite components

• Interoperability with OSG/Grid3

• SA1 requirements follow-up

Page 3: gLite Status


Integration Overview

• Activities are split in four main areas:
  – The build servers and the integration infrastructure
  – Quality assurance
  – Packaging and installation
  – Configuration and service instrumentation

• A precise release process is followed, as described in the project SCM Plan
  – https://edms.cern.ch/document/446241

• The guidelines for development, quality and configuration are described in the Developer’s Guide and the Configuration Guidelines Proposal
  – https://edms.cern.ch/document/468700
  – https://edms.cern.ch/document/486630

Page 4: gLite Status


Build System

• One nightly build server on RH Linux 3.0
  – Clean builds out of HEAD every night of all components
  – Packages (tarballs and RPMs) are published to the gLite web site
  – Tagged every night and totally reproducible

• One continuous build server on RH Linux 3.0
  – Incremental builds out of HEAD every 60 minutes
  – Results published to the CruiseControl web site

• One nightly build server on Windows XP
  – Clean builds every night of all components (Java components build, C/C++ not yet)
  – No results published yet. The goal is to have the clients available on Windows XP for gLite 1.0

• Integration builds are produced every week
  – Based on developers’ tags or nightly build tags
  – Guaranteed to build
  – Contain additional documentation (release notes, installation and configuration instructions)
  – Official base for tests
  – All packages (tarballs and RPMs) plus installation and configuration scripts are published on the gLite web site
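As an illustration of the nightly cycle and the tagging scheme, a minimal sketch of such a driver follows; the IyyyyMMdd format is the integration-tag scheme named on the Release Process slide, while the module names and build commands here are assumptions, not the actual JRA1 tooling:

#!/usr/bin/env python
# Illustrative sketch only -- not the actual gLite build scripts.
# Assumes a CVS checkout of HEAD and a per-component "make" build.
import subprocess
from datetime import date

COMPONENTS = ["org.glite.wms", "org.glite.ce"]   # hypothetical module names

def nightly_build(components):
    tag = date.today().strftime("I%Y%m%d")       # integration-tag scheme IyyyyMMdd
    for module in components:
        # clean checkout of HEAD, so the build is reproducible from the tag
        subprocess.check_call(["cvs", "checkout", module])
        subprocess.check_call(["make", "-C", module])
        # tag exactly the sources that were built
        subprocess.check_call(["cvs", "tag", tag, module])
    return tag

if __name__ == "__main__":
    print("tagged integration build", nightly_build(COMPONENTS))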

Page 5: gLite Status


Quality Assurance

• Quality assurance tools are integrated in the build system and CVS
  – Coding guidelines: Jalopy (Java), CodeWizard (C/C++)
  – Unit tests: JUnit (Java), CppUnit (C++)
  – Coverage: Clover (Java), gCov (C++)

• Reports are not yet published, but will soon be added to the gLite web site. Some of them are currently available from the CruiseControl servers

• For the moment we generate only warnings, but we can raise the quality requirements at any time and prevent commits or builds if necessary (to be agreed within the project)
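A commit gate of the kind alluded to above might look like the following minimal sketch. The real checks are done by Jalopy and CodeWizard and the hook wiring into CVS is left out; a trivial line-length check stands in so the sketch is self-contained:

#!/usr/bin/env python
# Illustrative commit gate -- not the actual JRA1 CVS hook.
import sys

STRICT = False   # today: warnings only; set True to start rejecting commits

def violations(path):
    # stand-in check: flag lines longer than 120 characters
    with open(path) as f:
        return [n for n, line in enumerate(f, 1) if len(line.rstrip()) > 120]

if __name__ == "__main__":
    bad = {f: violations(f) for f in sys.argv[1:]}
    for path, lines in bad.items():
        for n in lines:
            print("coding-guideline warning: %s:%d line too long" % (path, n))
    # a non-zero exit blocks the commit only once the policy is tightened
    sys.exit(1 if (STRICT and any(bad.values())) else 0)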

Page 6: gLite Status


Packaging and Installation

• Source tarballs, binary tarballs and RPMs are automatically generated at every build

• MSI packages for Windows will be created soon for those components already building on Windows (all Java components)

• External dependencies are repackaged only if really necessary; otherwise we only use officially released RPMs

• Python RPM installation scripts are available with every build. They can be used to easily install all components required by a node, with all dependencies, out of the gLite web repository (see the sketch below)

• Quattor RPM templates are also automatically produced with the build. Currently used internally by the test team

• Documentation, installation and configuration scripts form a deployment module
  – Deployment modules exist in different granularities: service and node
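For illustration, a node-level installer in the spirit of the Python RPM installation scripts; the repository URL and the node-to-package mapping are assumptions (the WN client list echoes the components named on the “Main SA1 requirements” slide):

#!/usr/bin/env python
# Illustrative node installer -- not the shipped gLite scripts.
import subprocess

REPO = "http://example.org/glite/repo"   # placeholder for the gLite web repository
NODE_MODULES = {
    # hypothetical deployment-module contents at node granularity
    "wn": ["glite-io-client", "glite-lb-client", "glite-rgma-client"],
}

def install_node(node):
    # rpm can fetch packages directly from the web repository; -U installs/upgrades
    rpms = ["%s/%s.rpm" % (REPO, pkg) for pkg in NODE_MODULES[node]]
    subprocess.check_call(["rpm", "-Uvh"] + rpms)

if __name__ == "__main__":
    install_node("wn")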

Page 7: gLite Status


Configuration

• A common configuration model has been proposed
• Guidelines and prototypes are available
• The guiding principles are:

– Limit the number of configuration files as much as possible: at the moment typically only two XML configuration files and one script per node are necessary. Also limit the number of environment variables and modifications to PATH and LD_LIBRARY_PATH

– Group the parameters by function and scope: three levels are used, user parameters (to be supplied by the sysadmin/user), optimization parameters (default values are provided) and system parameters (better not to touch)

– Unify the interfaces and build instrumentation and monitoring into the services from the beginning: we have proposed a single service instrumentation interface with different implementations depending on language and platform. Migration has started, but the issue is still controversial
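As an illustration of the model (two XML files plus a script per node, parameters grouped in three levels), a sketch of how a configuration script might read and merge the parameters; the XML schema here is assumed, not the actual gLite format:

#!/usr/bin/env python
# Illustrative reader for the proposed configuration model.
import xml.etree.ElementTree as ET

SAMPLE = """<config>
  <parameter name="port"      level="user"         value="7772"/>
  <parameter name="pool_size" level="optimization" value="10"/>
  <parameter name="home"      level="system"       value="/opt/glite"/>
</config>"""

def load_parameters(xml_text):
    params = {"user": {}, "optimization": {}, "system": {}}
    for p in ET.fromstring(xml_text).iter("parameter"):   # hypothetical schema
        params[p.get("level")][p.get("name")] = p.get("value")
    return params

params = load_parameters(SAMPLE)
# user parameters are supplied by the sysadmin, optimization parameters ship
# with defaults, and system parameters are "better not to touch"
effective = dict(params["optimization"])
effective.update(params["user"])
print(effective)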

Page 8: gLite Status


Release Process

(Process flow diagram, reconstructed as steps:)

• Components (DM, IT/CZ, UK development clusters): developers announce every Friday which components are available for a release according to the development plan. Components are tagged and the list is sent to the Integration Team.

• Integration builds (ITeam): the Integration Team puts together all components, verifies consistency and dependencies, and adds/updates the service deployment modules (installation and configuration scripts). The build is tagged as IyyyyMMdd.

• Test (Test Teams): the integrated build is deployed on the testbeds and validated with functional and regression tests; the test suites are updated, packaged and added to the build.

• Pre-production (SA1): if the build is suitable for release, the release notes and installation guides are updated, and the build is retagged (RCx, v0.x.0) and published on the gLite web site for release to SA1.

Page 9: gLite Status


Next Steps

• Complete all deployment modules for RC1
• Complete the configuration files and scripts and thoroughly verify that all guidelines are respected
• Write full release notes for all services
• Write full installation and configuration instructions for all services
• Go through a number of verification and feedback iterations with the Test Team and SA1
• Release final gLite 1.0 RCx to testing in January 2005

Page 10: gLite Status


Testing process

• Distributed testing testbed across three sites
  – All run a binary-compatible version of Red Hat Enterprise Linux: CERN (SLC3), NIKHEF (CentOS 3.2), RAL (Scientific Linux)

• Deploy and test integration builds
  – Automatic installation at all sites using quattor or kickstart
  – gLite component installation via deployment modules
  – Configuration using post-installation configuration scripts

• gLite test suites
  – Build validation run nightly on the RPMs of the nightly builds
  – Functional tests run on the distributed testbed
  – http://cern.ch/egee-jra1-testing-reports/

• Bug reporting: Savannah
  – https://savannah.cern.ch/bugs/?group=jra1mdw

In addition to this structured testing, all components discussed in the following have been tested since May on the prototype installation by application users and developers

Page 11: gLite Status


Testing status (1/4)

• WMS
  – Successfully deployed a WMS on the testbed on Friday 19 Nov, using official gLite RPMs and following the available instructions
  – Basic job submission works
  – Testing uses a fake BDII; there is no information-system integration yet
  – The next integration build will be deployed across all sites
  – Post-installation configuration scripts are being updated and tested to produce a correct and reproducible deployment of the WMS
  – Major bugs:
    #5383: the glite-job-* commands do not work with a VOMS proxy. The AC signature is not verified correctly; there appears to be an incompatibility between the information returned by voms-proxy-info and what the WMS expects

Page 12: gLite Status


Testing status (2/4)

• CE
  – Successfully deployed with PBS on the testbed
  – Basic job submission via blah to PBS works; no further testing yet
  – Will deploy the CE from the next integration build at all sites

• L&B
  – Successfully deployed at CERN
  – Will test the L&B at a different site in the next integration build
  – No extensive testing yet (dependency on the WMS)

• R-GMA
  – Successfully deployed across the distributed testbed
  – Under test; no current showstopper bugs

Page 13: gLite Status


Testing status (3/4)

• gLite I/O
  – Successfully deployed and tested with a Castor SRM
  – Tests beginning with the dCache SRM at RAL
  – Performance and stress testing underway

• Catalogs, FPS, FTS
  – Initial test development beginning on the prototype testbed
  – No deployment modules available yet

• AliEn components
  – Extensive testing of job submission on the prototype testbed early on; many bugs reported and solved
  – Installation on the testbed was difficult due to a lack of comprehensive instructions
  – No deployment modules available yet for the AliEn components

Page 14: gLite Status


Testing status (4/4)

• VOMS
  – Still no successful server installation on RHEL
  – voms-proxy-* client tools installed on the testbed
  – Testing VOMS proxies with the WMS and a RH7.3 server
  – Major bugs:
    #5505, #5582: the voms-proxy-* commands are not backwards compatible with the grid-proxy-* commands
    #5489: VOMS proxies can be created from expired certificates

• Package Manager
  – No testing begun yet
  – No deployment modules available yet

• Accounting
  – No testing begun yet
  – No deployment modules available yet

Page 15: gLite Status


Updated Schedule for the pre-production service

• gLite I/O – Available
• Logging & Bookkeeping, WMS, CE, WN – In testing – end November
• R-GMA – In testing – mid December
• CE notification – In integration/development (WMS) – January
• Replica, File, Combined Catalogs – In development – January
• File Transfer Service – In integration – January
• File Placement Service – In integration – January
• VOMS – In integration/testing – January
• UI – In integration – January
• AliEn TQ & CE – In integration – see the following discussion
• Package Manager – Discussions w/ experiments and deployment – prototype exists
• Grid Access – Prototype exists – discussions on semantics ongoing
• Accounting (HLR) – In integration – prototype exists
• Job Provenance – Proof of concept exists
• Data Scheduler – In development

Page 16: gLite Status


Potential Services for RC1

(Service architecture diagram; the components shown are:)

• GAS prototype
• WSDL clients; APIs, CLIs | AliEn shell
• VOMS, R-GMA | AliEn LDAP
• PKI, GSI, myProxy
• Generic interface
• AliEn SE, glite-I/O, gridFTP, SRM
• AliEn FC
• Local RC, DGAS
• PM prototype
• FTS, FPS | AliEn DS, DS
• WMS | AliEn TQ, L&B
• GK, Condor-C, Blahp, CEMon | AliEn CE

Page 17: gLite Status


Components for RC1 & Open Issues

• Workload Management System (WMS)
  – Task queue and information supermarket (ISM)
    Works in push and pull mode
  – ISM adaptor for CEMon
  – Query of the File Catalog

• Computing Element (CE)
  – Globus gatekeeper, Condor-C, blahp to LSF and PBS
  – CEMon (pull component)
  – Security, user mapping: LCAS/LCMAPS, DAS

• Logging and Bookkeeping (L&B)

• Information Service: R-GMA
  – pre-WS version

Blue: deployed on the prototype and released to integration and testing
Orange: in development

Page 18: gLite Status


Components for RC1 & Open Issues

• Catalogs
  – AliEn file catalog, local replica catalogs (on Oracle and MySQL)
  – Fireman interface
  – Messaging system for updating the FC

• Data Management
  – File Placement/File Transfer Service
  – glite-I/O
  – Data Scheduler

• VOMS
  – Installation on SL3

• DGAS Accounting

Page 19: gLite Status


Components for RC1 & Open Issues

• Package Manager

• Grid Access Service (GAS)
  – Prototype

• AliEn
  – Task queue, CE, SE, shell
  – What is the impact of the Alice deployment?

Page 20: gLite Status


Impact of Alice deployment

• JRA1 has to support the deployment of the “prototype software stack” on 30+ Alice sites
  – Unclear whether this means *all* of the prototype software stack
    The prototype is an evolving system with components dumped there but not necessarily fully interworking
    We need a precise description of what Alice really wants to deploy
  – The software will not have gone through full JRA1 integration & testing
    It builds in the build system, but there are no configuration adaptations and only partial documentation
  – JRA1 is not a deployment project
    JRA1 will get additional manpower
    Most of the work is supposed to be done by Alice
    Any help from SA1, in particular for user support?
  – Deployment experience will be fed back into the integration and testing process
    Additional manpower is expected to improve documentation, packaging etc. while working on the deployment
    How much effort from the JRA1 integration and testing teams is needed, and when?

Page 21: gLite Status


Priorities for RC1

• With the current manpower it will not be possible to provide all components with the required level of integration and testing
  – Need prioritization from SA1 and NA4

• The Alice deployment will feed back into integration and testing (in particular for the AliEn components)
  – A dedicated effort from JRA1 and Alice exists
  – How much effort from the integration and testing teams is needed, and when?

• Possible scenarios:
  – Focus JRA1 integration and testing on the AliEn components to support the deployment
    Deployment cannot start “today”
    What to do with components that are not a priority for Alice but are for other applications?
    Evolution of the LCG-2 based pre-production service will be delayed
  – Continue delivery to the pre-production service as planned; start the Alice deployment without going through full integration and testing
    Can start “today”
    A dedicated deployment team works on integration and testing, minimizing the involvement of the integration and testing team
    It is unclear whether feedback will be in time for RC1, but RC1.1 can be tagged any time the deployment exercise produces results

Page 22: gLite Status


Interoperability with other Grids

• Interoperability is mainly needed at the resource level
  – The same physical resource should be exploitable in different Grids

• Approach
  – Reduce requirements on sites
    CE: Globus gatekeeper
    SE: SRM
  – Close connection with other projects, e.g. OSG
    OSG uses the EGEE architecture and design documents as the basis for their blueprint
    Common members in design teams (probably needs enforcement)

Page 23: gLite Status


SA1 requirements

• What has been done since Cork:
  – Platform support agreed: for this year, platform support will be mainly oriented to Linux-based, 32/64-bit platforms (RH Enterprise 3.x or another binary-compatible distribution based on the sources of RH 3.0, like Scientific Linux, CERN Linux, etc.). Windows remains a secondary platform
    Integration infrastructure: RHES 3.0 and Windows XP
    Testing testbed: SLC3 (CERN), CentOS 3.2 (NIKHEF), Scientific Linux (RAL)
  – Operational requirements defined
  – Update from SA1 on the handling of external dependency requirements (detailed note distributed)
  – Installation/configuration/release management requirements being implemented by JRA1

Page 24: gLite Status


Main SA1 requirements being implemented

• Source and binary RPMs and tarballs (different packaging formats) are being delivered, based on components, services and nodes (different granularities). Windows packages will come soon

• Mw configuration: presented in the Integration slides

• External dependencies common to all components: as much as possible, only standard, official RPMs. When we need to modify something, the RPMs are installed in the $GLITE_LOCATION/externals directory to avoid conflicts with other existing packages

• Relocatable packages: not all are fully compliant yet; an automatic test suite checks this (see the testing web page)

• Reduce the components needed on WNs: three high-level components are needed (gLite I/O client, LB client, R-GMA clients) plus the WMS checkpointing library and/or the AliEn clients

Page 25: gLite Status


In progress

• Common administrative interfaces for all grid services
  – Proposal exists

• Standardized error/log messages
  – Proposal exists

• One accounting interface for all services
  – Needs more discussion with SA1, in particular wrt the current accounting infrastructure

• Traceability: logging information and the operations API have to allow tracing activities of jobs back to the source
  – Ongoing

• Scalability: services deployable in a scalable way
  – Distributed services (e.g. catalogs), site autonomy

• Critical services must be redundant
  – Reduce the need for ‘single central’ services

• Service/site autonomy: avoid single points of failure (timeouts, retries, work locally and resynchronize later)
  – Site autonomy is one of the guiding principles

• Exception handling: services have to be prepared to handle non-standard situations in a graceful way
  – Ongoing

Page 26: gLite Status


Summary

• JRA1 intends to have the software for release candidate 1 by the end of December
  – An intense integration and testing period follows

• The first release of gLite is due at the end of March 2005

• Prioritization needed with SA1 and NA4

Page 27: gLite Status


Links

• JRA1 homepage
  – http://egee-jra1.web.cern.ch/egee-jra1/

• Architecture document
  – https://edms.cern.ch/document/476451/

• Release plan
  – https://edms.cern.ch/document/468699

• Prototype installation
  – http://egee-jra1.web.cern.ch/egee-jra1/Prototype/testbed.htm

• Test plan
  – https://edms.cern.ch/document/473264/

• Design document
  – https://edms.cern.ch/document/487871/

• gLite homepage
  – http://www.glite.org/

Page 28: gLite Status


Backup Slides

• The following slides show details on
  – SA1 requirements follow-up
  – the gLite components

Page 29: gLite Status


SA1 requirements follow-up

• What has been done since Cork: several SA1-JRA1 requirement meetings:
  – Platform support
  – Operational requirements (new)
  – Update on the handling of external dependency requirements

• Contents of this presentation:
  – Update on the status of the previously defined requirements (presented at Cork)
  – Presentation and status update of the operational requirements

Page 30: gLite Status


SA1 requirements

• Mw delivery process:
  – Tarball -> SA1 certification -> JRA1 packages exactly the certified version
  – Present status: RPMs delivered (tarballs exist, but are not tested)
  – Short term plan: tarball / RPMs / Windows packages

• Granularity of delivered components:
  – Keep the granularity of components and packages as fine as possible
  – Present status: based on services and nodes
    A service is a group of components providing some functionality (WMS, CE, I/O server, etc.)
    A node is a set of services to be deployed together

Page 31: gLite Status


SA1 requirements

• Release management:
  – Quick turnaround for bugs and security patches; bug fixes provided for all versions run by SA1
  – Present status: weekly integration builds which are release candidates; the only bug-fix experience so far is with the prototype, none with SA1 yet
  – Short term plan: we need some experience

• Deployment scenarios:
  – JRA1 will deliver deployment recommendations for services as part of a release, and define the minimum running requirements for the entire system
  – Present status: included in the installation guide

Page 32: gLite Status


SA1 requirements

• Mw configuration (1):
  – Keep mw installation and configuration separated
  – Present status: separate installation and configuration scripts

• Mw configuration (2):
  – Provide a simple and tool-independent configuration mechanism
  – Present status: Python configuration scripts
  – Short term plan: a common configuration method is being finalized for all services (depending on the language, platform and technology used). See the Integration slides

Page 33: gLite Status


SA1 requirements

• Mw configuration (3):
  – JRA1 will provide a standard set of configuration files and documentation with examples that SA1 can use to design tools. The format is to be agreed between SA1 and JRA1
  – Present status: Python configuration scripts
  – Short term plan: see the Integration slides

• Mw configuration (4):
  – Classify the configuration parameters and give sensible default values
  – Present status: default values given; no classification of parameters done yet
  – Short term plan: see the Integration slides

Page 34: gLite Status


SA1 requirements

• Deployment platforms:
  – For this year, platform support will be mainly oriented to Linux-based, 32/64-bit platforms (RH Enterprise 3.x or another binary-compatible distribution based on the sources of RH 3.0, like Scientific Linux, CERN Linux, etc.). Windows remains a secondary platform
  – Present status:
    Integration infrastructure: RHES 3.0 and Windows XP
    Testing testbed: SLC3 (CERN), CentOS 3.2 (NIKHEF), Scientific Linux (RAL)

Page 35: gLite Status


SA1 requirements

• Worker nodes:
  – Reduce to the minimum the components to be run on the Worker Nodes and make them easily portable
  – Present status: only clients need to be run on WNs, no services

• Third party software (1):
  – Avoid using multiple versions of the same libraries, tools and external programs
  – Present status: external dependencies are common to all components; as much as possible, only standard, official RPMs are used. When we need to modify something, the RPMs are installed in the $GLITE_LOCATION/externals directory to avoid conflicts with other existing packages

Page 36: gLite Status


SA1 requirements

• Packaging:
  – All EGEE software should be relocatable
  – Present status: an automatic test runs nightly to check this (see the sketch at the end of this slide). Not all RPMs are compliant yet:
    http://egee-jra1-testing-reports.web.cern.ch/egee-jra1-testing-reports/deployment_testing/installation/latest.html
  – Short term plan: all packages will be made relocatable

• Software distribution:
  – JRA1 will provide a release packaged in the native format of all supported platforms and as a tarball. Both source and binaries will be provided
  – Present status: first component release composed of source and binary RPMs and tarballs
  – Short term plan: distribute Windows packages as well
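The automatic relocatability check mentioned under “Packaging” could, in outline, look like this minimal sketch; the prefixes and the grep heuristic are assumptions, not the actual JRA1 test suite:

#!/usr/bin/env python
# Illustrative relocatability check -- not the real nightly test.
# Idea: install the RPM under a non-default prefix, then verify the relocated
# tree contains no hard-coded references to the default install location.
import subprocess
import sys

DEFAULT_PREFIX = "/opt/glite"        # assumed default install location
ALT_PREFIX = "/tmp/glite-reloc"      # relocation target used by the test

def is_relocatable(rpm):
    # --prefix only succeeds if the package was built relocatable
    if subprocess.call(["rpm", "-Uvh", "--prefix", ALT_PREFIX, rpm]) != 0:
        return False
    # grep exits 1 when nothing matches, i.e. no hard-coded default paths
    return subprocess.call(["grep", "-r", DEFAULT_PREFIX, ALT_PREFIX]) != 0

if __name__ == "__main__":
    print("relocatable" if is_relocatable(sys.argv[1]) else "NOT relocatable")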

Page 37: gLite Status


SA1 requirements

• External dependency handling:
  – The middleware should indicate all dependencies on external packages
  – Any external package not supplied with the OS should be made available separately
  – If a required external package conflicts with the OS, the conflict shall be noted and an effort made to remove it. The same applies to external packages that require patches
  – Avoid using multiple versions of the same libraries, tools and external programs
  – Present status: external dependencies are common to all components; as much as possible, only standard, official RPMs are used. When we need to modify something, the RPMs are installed in the $GLITE_LOCATION/externals directory to avoid conflicts with other existing packages
  – All of this is included in the build system

Page 38: gLite Status


SA1 operational requirements

• Common administrative interfaces:
  – Grid components should have a common administrative interface (monitoring, alarms; no central entity managing all services, but a minimum basic common set of APIs implemented by all the services)
  – Status: planned

• Service state:
  – Keep service state in an “external” repository, separated from the service itself. It can be a repository per service; it doesn’t need to be a central one (whether it is a DB or not is an implementation detail)
  – Status: plans to publish service status into R-GMA

Page 39: gLite Status


SA1 operational requirements

• Verification of Service Level Agreements (SLAs):
  – The accounting information provided has to be extensible, as SLAs can change over time. We want the services to keep enough information to verify the SLAs; how we do this is not yet clear, maybe via a combination of monitoring and accounting
  – Status: SLA handling unclear at the moment

• Standardized error/log messages and files:
  – The log level has to be adaptable for debugging. The minimal level has to guarantee audit trails. Log messages should be identifiable: originator, time stamp
  – Status: proposal worked out
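To make the identifiability requirement concrete, a minimal sketch using Python's standard logging module; the service name and message format here are assumptions, not the JRA1 proposal:

#!/usr/bin/env python
# Illustrative log format: adaptable level, every message carries an
# originator and a time stamp (the minimal level still gives an audit trail).
import logging

def make_logger(originator, level=logging.INFO):
    logger = logging.getLogger(originator)
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(name)s %(levelname)s %(message)s"))  # timestamp + originator
    logger.addHandler(handler)
    logger.setLevel(level)   # raise to DEBUG when debugging a service
    return logger

log = make_logger("org.glite.wms")   # hypothetical originator name
log.info("job submitted")            # audit-trail entry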

Page 40: gLite Status


SA1 operational requirements

• Accounting:
  – Services need ONE accounting interface
  – Status: need more interaction with SA1, in particular wrt the existing accounting infrastructure

• VO policies:
  – A mechanism to express VO policies, transfer them to the sites and implement them is needed
  – Status: being done in the context of the CE development

Page 41: gLite Status


SA1 operational requirements

• Complex resource usage policies:
  – The middleware has to be able to deal with complex resource usage policies
  – Status: being addressed in the new CE developments

• Traceability:
  – The logging information and the operations API have to allow tracing activities of jobs back to the source (for privacy and security reasons this implies that a subset of these APIs needs a strong authentication and authorization mechanism)
  – Status: ongoing

Page 42: gLite Status


SA1 operational requirements

• Scalability:
  – Services have to be designed so that they can be deployed in a scalable way. Services should be scalable in a way that is transparent to the user
  – Status: work towards distributed services (e.g. catalogs)

• Redundancy:
  – Critical services must be redundant. Critical services need to be quickly restartable and keep most of their state information. State information has to be kept in persistent, easy-to-localize storage
  – Status: being done in the design of the WMS, data scheduler, catalogs, etc.

Page 43: gLite Status


SA1 operational requirements

• Service/site autonomy:
  – Avoid single points of failure: if a central service fails, it should not affect any of the local services running on a site. When the central service is back, the site reconnects, resynchronizes and continues normal operation instead of crashing because the central service died. Need for timeouts and retries
  – Status: being done in the design of catalogs, etc.; site autonomy is one of the guiding principles

• Exception handling:
  – Services have to be prepared to handle non-standard situations in a graceful way
  – Status: needs to be done

Page 44: gLite Status


SA1 operational requirements

• VOs:
  – Adding and removing a VO has to become a lightweight operation
  – Status: needs a specific discussion with SA1

• Batch systems:
  – First one: LSF; second one: being investigated by SA1 (Torque, Maui)
  – Status: LSF and PBS currently supported

Page 45: gLite Status


WMS

Page 46: gLite Status


CE

• Works in push and pull mode

• Site policy enforcement

• Exploits the new Globus GK and Condor-C (close interaction with the Globus and Condor teams)

Diagram legend: CEA … Computing Element Acceptance; JC … Job Controller; MON … Monitoring; LRMS … Local Resource Management System

Page 47: gLite Status


Data Management

• Scheduled data transfers (like jobs)

• Reliable file transfer

• Site autonomy

• SRM based storage
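As an illustration of the first two bullets (transfers queued and scheduled like jobs, with retries for reliability), a minimal sketch follows; the states, field names and the copy step are assumptions, not the FTS/FPS design:

#!/usr/bin/env python
# Illustrative transfer queue -- "data transfers scheduled like jobs".
import time

QUEUE = [{"src": "srm://site-a/f1", "dst": "srm://site-b/f1",
          "state": "Submitted", "retries": 0}]
MAX_RETRIES = 3

def attempt_copy(job):
    # stand-in for the real gridFTP/SRM copy; pretend it succeeds
    return True

def run_queue():
    for job in QUEUE:
        job["state"] = "Active"
        while job["retries"] <= MAX_RETRIES:
            if attempt_copy(job):
                job["state"] = "Done"
                break
            job["retries"] += 1
            time.sleep(2 ** job["retries"])   # back off before retrying
        else:
            job["state"] = "Failed"          # only after exhausting retries

run_queue()
print(QUEUE)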

Page 48: gLite Status


Storage Element Interfaces

• SRM interface
  – Management and control
  – SRM (with possible evolution)

• POSIX-like File I/O
  – File access: open, read, write
  – Not real POSIX (like rfio)

(Diagram: the user sees a POSIX-like File I/O API plus an SRM control interface; underneath, the rfio, dcap, chirp and aio protocols map to Castor, dCache, NeST and plain disk back-ends.)
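A sketch of the “POSIX-like, but not real POSIX” access idea (as with rfio): the application opens a logical file name and reads/writes bytes while the client library talks to the storage element. All names here are hypothetical, not the glite-I/O API, and an in-memory dict stands in for the SE:

# Illustrative only -- interface shape, not a real I/O client.
_FAKE_SE = {"lfn:/grid/demo/hello.txt": b"hello grid\n"}

class GridFile:
    """open/read/write on a logical file name, forwarded to a (fake) SE."""
    def __init__(self, lfn, mode="r"):
        self.lfn, self.mode, self.pos = lfn, mode, 0
        if mode == "r" and lfn not in _FAKE_SE:
            raise IOError("no such logical file: %s" % lfn)
        _FAKE_SE.setdefault(lfn, b"")

    def read(self, size=-1):
        data = _FAKE_SE[self.lfn][self.pos:]
        data = data if size < 0 else data[:size]
        self.pos += len(data)
        return data

    def write(self, data):
        _FAKE_SE[self.lfn] = _FAKE_SE[self.lfn][:self.pos] + data
        self.pos += len(data)

f = GridFile("lfn:/grid/demo/hello.txt")
print(f.read())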

Page 49: gLite Status


Catalogs

• File Catalog
  – Filesystem-like view on logical file names
  – Keeps track of the sites where data is stored
  – Conflict resolution

• Replica Catalog
  – Keeps replica information at a site

• (Metadata Catalog)
  – Attributes of files on the logical level
  – Boundary between the generic middleware and the application layer

(Diagram: the File Catalog maps LFNs to GUIDs and site IDs, the Metadata Catalog attaches metadata to LFNs, and per-site Replica Catalogs at sites A and B map GUIDs to SURLs.)
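The catalog model implied by the diagram can be sketched as plain data structures; the field names are illustrative, not the Fireman or AliEn FC schema:

# Illustrative catalog model: the File Catalog resolves an LFN to a GUID and
# the sites holding it; each site's Replica Catalog maps the GUID to SURLs.
FILE_CATALOG = {
    "lfn:/grid/demo/run42.root": {
        "guid": "guid-0001",
        "sites": ["CERN", "RAL"],
    },
}
REPLICA_CATALOGS = {   # one catalog per site
    "CERN": {"guid-0001": ["srm://castor.cern.ch/data/run42.root"]},
    "RAL":  {"guid-0001": ["srm://dcache.rl.ac.uk/data/run42.root"]},
}

def resolve(lfn):
    """LFN -> all SURLs, via the File Catalog and per-site Replica Catalogs."""
    entry = FILE_CATALOG[lfn]
    return [surl
            for site in entry["sites"]
            for surl in REPLICA_CATALOGS[site][entry["guid"]]]

print(resolve("lfn:/grid/demo/run42.root"))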

Page 50: gLite Status


Information and Monitoring

• R-GMA for
  – the information system and system monitoring
  – application monitoring

• No major changes in architecture
  – But re-engineer and harden the system

• Co-existence and interoperability with other systems is a goal
  – E.g. MonALISA

(Diagram, e.g. D0 application monitoring: each job wrapper hosts a Memory Primary Producer (MPP); a Database Secondary Producer (DbSP) collects their tuples.)
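The producer pattern in the diagram, sketched with hypothetical classes; this shows the shape of the flow only, not the R-GMA API:

# Illustrative producer pattern -- not the R-GMA client library.
import time

class MemoryPrimaryProducer:        # "MPP" in the diagram, one per job wrapper
    def __init__(self):
        self.tuples = []
    def insert(self, table, row):
        self.tuples.append((table, dict(row, timestamp=time.time())))

class DatabaseSecondaryProducer:    # "DbSP" in the diagram
    def __init__(self):
        self.store = []
    def consume(self, producer):
        # in R-GMA a mediator streams the tuples; here we simply copy them
        self.store.extend(producer.tuples)

wrapper = MemoryPrimaryProducer()
wrapper.insert("JobStatus", {"job": "job-1", "state": "Running"})
archive = DatabaseSecondaryProducer()
archive.consume(wrapper)
print(archive.store)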

Page 51: gLite Status


Security

(Diagram: pseudonymity flow on “The Grid” — 1. Joe obtains Grid (X.509) credentials; 2. an optional Pseudonymity Service maps “Joe → Zyx”; 3. the Attribute Authority is told to “issue Joe’s privileges to Zyx”; 4. Joe then acts on the Grid as “User=Zyx, Issuer=Pseudo CA”. Credentials are held in Credential Storage.)

Components: Credential Storage – myProxy; Attribute Authority – VOMS; authentication – GSI; authorization – LCAS/LCMAPS; Pseudonymity Service – tbd

Page 52: gLite Status


GAS & Package Manager

• Grid Access Service (GAS)
  – Discovers and manages services on behalf of the user
  – File and metadata catalogs are already integrated

• Package Manager
  – Provides application software at the execution site
  – Based upon existing solutions
  – Details being worked out together with the experiments and operations

Page 53: gLite Status


Deployment considerations

• Interoperability and co-existence
  – Exploit different service implementations
    E.g. the Castor and dCache SRM implementations
  – Require minimal support from the deployment environment
    Sites are required to run Globus and SRM (might not be required for tactical storage)
  – Flexible service deployment
    Multiple services running on the same physical machine (where possible)

• Platform support
  – The goal is to have portable middleware
  – Building & integration on RHEL 3 and Windows
  – Initial testing (at least 3 sites) using different Linux flavors (including free distributions)

• Service autonomy
  – Users may talk to services directly or through other services (like the access service)

• Open source software license
  – Based on the EDG license