

Status of SRM 2.2 implementations and deployment

29th January 2007

Flavia Donno, Maarten Litmaath
IT/GD, CERN


Overview

- Study of SRM 2.2 specification
- Tests executed
- Plan
- Status of SRM clients
- GLUE schema
- Grid Storage System Deployment (GSSD)
- Deployment plan
- Conclusions


Study of SRM 2.2 specification

In September 2006 there were very different interpretations of the spec.

Three releases of the specification: July, September and December.

A study of the spec (state/activity diagrams) identified many behaviours not defined by the spec.

A list of about 50 points was compiled in September 2006.

Many issues have been solved. The last 30 points were discussed and agreed on during the WLCG Workshop; their implementation will be ready in June 2007.

The study of the specifications, and the discussion and testing of the open issues, have helped ensure consistency between SRM implementations.
https://twiki.cern.ch/twiki/bin/view/SRMDev/IssuesInTheSpecifications
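One way to picture the state/activity-diagram study is as a transition table for a request: any (state, event) pair missing from the table is a behaviour the spec leaves undefined. The sketch below is illustrative only. The status strings are SRM v2.2 request status codes, but the event names (`start`, `complete`, `fail`, `abort`), the transitions and the helper functions are hypothetical simplifications, not the diagrams used in the study.

```python
from enum import Enum


class Status(Enum):
    # SRM v2.2 status codes used as request-level states in this sketch.
    QUEUED = "SRM_REQUEST_QUEUED"
    IN_PROGRESS = "SRM_REQUEST_INPROGRESS"
    SUCCESS = "SRM_SUCCESS"
    FAILURE = "SRM_FAILURE"
    ABORTED = "SRM_ABORTED"


# Hypothetical events that can reach a request.
EVENTS = ("start", "complete", "fail", "abort")

# Hypothetical, simplified transition table for an asynchronous request.
TRANSITIONS = {
    (Status.QUEUED, "start"): Status.IN_PROGRESS,
    (Status.QUEUED, "abort"): Status.ABORTED,
    (Status.IN_PROGRESS, "complete"): Status.SUCCESS,
    (Status.IN_PROGRESS, "fail"): Status.FAILURE,
    (Status.IN_PROGRESS, "abort"): Status.ABORTED,
}


def step(state, event):
    """Apply an event, refusing to guess when the table says nothing."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"behaviour undefined: {event!r} in {state.value}") from None


def undefined_behaviours(transitions=TRANSITIONS, events=EVENTS):
    """List every (state, event) pair for which no outcome is specified."""
    return [
        (state.name, event)
        for state in Status
        for event in events
        if (state, event) not in transitions
    ]
```

For example, this simplified table says nothing about what `abort` does to a request that has already reached SRM_SUCCESS; listing such gaps systematically is what turns a reading of the spec into a concrete list of questions for the developers.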


Tests executed

S2 test suite testing availability of endpoints, basic functionality, use cases and boundary conditions, interoperability, exhaustive and stress tests.

- Availability: ping and full put cycle (putting and retrieving a file).
- Basic: basic functionality, checking only return codes and passing all basic input parameters.
- Use cases: testing boundary conditions, exceptions, real use cases extracted from the middleware clients and experiment applications.
- Interoperability: servers acting as clients, cross copy operations.
- Exhaustive: checking for long strings, strange characters in input arguments, missing mandatory or optional arguments. Output parsed.
- Stress: parallel tests for stressing the systems, multiple requests, concurrent colliding requests, space exhaustion, etc.
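The exhaustive category can be illustrated with a small generator of argument variants. This is a hypothetical sketch, not the S2 test suite: the function name, the `surl`/`lifetime` argument names and the particular "strange" characters are assumptions chosen for illustration.

```python
# Hypothetical set of awkward characters: space, non-ASCII, tab,
# and characters that are special in URLs, shells or XML.
STRANGE = " \u00e9\t%&<>'\""


def exhaustive_variants(mandatory, optional, long_len=1024):
    """Yield (label, arguments) pairs probing long strings, strange
    characters and missing mandatory or optional arguments."""
    base = {**mandatory, **optional}
    yield "baseline", dict(base)
    for name, value in base.items():
        if isinstance(value, str):
            # Over-long and odd-character versions of each string argument.
            yield f"{name}:long", {**base, name: value + "x" * long_len}
            yield f"{name}:strange", {**base, name: value + STRANGE}
    for name in mandatory:
        yield f"missing-mandatory:{name}", {k: v for k, v in base.items() if k != name}
    for name in optional:
        yield f"missing-optional:{name}", {k: v for k, v in base.items() if k != name}
```

Each variant would then be sent to the endpoint and the returned status parsed; a missing mandatory argument, for instance, should produce an error status rather than a crash or a silent success.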

The S2 tests run as a cron job 3-4 times per day.

In parallel, manual tests are run with GFAL/lcg-utils, FTS and the DPM test suite. LBNL runs tests similar to the S2 basic tests, but written in Java:

http://sdm.lbl.gov/srm-tester/v22-progress.html
http://sdm.lbl.gov/srm-tester/v22daily.html


Tests executed

For now, only the availability, basic, use case and interoperability tests are executed on a regular basis.

Results are published daily on a web page. Historical results are also available:

http://cern.ch/grid-deployment/flavia/avail
http://cern.ch/grid-deployment/flavia/basic
http://cern.ch/grid-deployment/flavia/usecase
http://cern.ch/grid-deployment/flavia/cross
s2_logs = daily results, history = test archive

Results of failed and successful tests are reported daily to the developers to signal issues.

Test results and issues are discussed on the srm-tester and srm-devel mailing lists:

https://hpcrdm.lbl.gov/mailman/listinfo/srmtester
http://listserv.fnal.gov/archives/srm-devel.html


Tests executed

[Figure: test results for the basic MoU SRM methods. Annotations: "MoU SRM methods needed by the end of 2007. Expected by the end of summer." and "Needed now, only for dCache!"]


Tests executed

The basic tests are rather simple: for an asynchronous function, receiving one of the foreseen return values is enough to pass. Other checks are needed to make sure the function is correctly implemented.

Therefore, the use case and boundary condition tests are much more meaningful, as they give a proper indication of the status of an implementation.
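The difference between the two kinds of check can be sketched as follows. A basic test stops at the initial return code of an asynchronous call such as srmPrepareToPut; a use-case style check also follows the request (for example via srmStatusOfPutRequest) until it reaches a final state. This is a minimal sketch under stated assumptions: `poll_status` is a hypothetical callable wrapping the status query and returning request-level status strings, and treating only SRM_REQUEST_QUEUED and SRM_REQUEST_INPROGRESS as "not finished" is a simplification.

```python
import time

# Request-level statuses treated as "not finished yet" (simplifying assumption).
PENDING = {"SRM_REQUEST_QUEUED", "SRM_REQUEST_INPROGRESS"}


def wait_for_final_status(poll_status, timeout=60.0, interval=1.0,
                          sleep=time.sleep, clock=time.monotonic):
    """Poll until the request leaves the pending states or the timeout expires."""
    deadline = clock() + timeout
    while True:
        status = poll_status()
        if status not in PENDING:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"request still {status} after {timeout} s")
        sleep(interval)


def check_async_request(initial_status, poll_status, expected="SRM_SUCCESS",
                        **poll_options):
    """A basic test would stop at initial_status; this also checks the outcome."""
    if initial_status not in PENDING:
        # Finished (or rejected) immediately: judge the initial status itself.
        return initial_status == expected
    return wait_for_final_status(poll_status, **poll_options) == expected
```

The `sleep` and `clock` parameters are injectable so that the polling logic itself can be tested without waiting in real time.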

Every day, the output of the test suite is checked for both red and green boxes, and a detailed report is sent to the developers. A web page listing the open problems is also kept up to date:

https://twiki.cern.ch/twiki/bin/view/SRMDev/ImplementationsProblems

The test page links to details about the failures and to the full output of the tests, including the exact sequence of actions that led to each failure.

http://cern.ch/grid-deployment/flavia/basic/s2_logs/22DPMCERN/index.html


Tests executed

[Figure: test results for Availability, Use Case and Interoperability/Cross Copy.]


Tests executed

We monitored the progress of the implementations over time in order to detect problematic implementations.

After an initial period of instability, almost all implementations started to converge around 11 December. Stability and response to the basic tests were rather good.

Around 15 December 2006 the WSDL was updated. Many implementations also introduced fixes, for instance to reflect the decisions taken on the open issues in the SRM spec.

Some problems were found in the test suite itself and fixed. The use case test suite still does not fully reflect the latest decisions on the SRM standard (discussed during the WLCG workshop).

The failures for the basic tests are now close to zero for all implementations. The remaining red boxes are understood and are being addressed by the developers.


Tests executed: status of the implementations

DPM has been rather stable and in good shape for the entire testing period. A few issues were found (a few test conditions caused server crashes) and fixed. All MoU methods are implemented except Copy (not needed at this stage).

DRM and StoRM: good interaction. All MoU methods are currently implemented (Copy in PULL mode is not available in StoRM). The implementations are rather stable. Some communication issues with DRM need investigation.

dCache: very good improvement in the basic tests over the last weeks. The implementation is rather stable. All MoU methods have been implemented (including Copy, which is absolutely needed for dCache) except ExtendFileLifeTime. Timur has promised to implement it as soon as he gets back to the US.

CASTOR: the implementation has been rather unstable. The problems have been traced to the transfer of requests to the back-end server, which has made it difficult to fully test the SRM interface implementation with the basic test suite. A major effort is under way to fix these problems.


Plan

Plan for Q1 2007:

Phase 1: from 16 December 2006 until the end of January 2007: availability and basic tests.
- Collect and analyze the results and update the page with the status of the endpoints: https://twiki.cern.ch/twiki/bin/view/SRMDev/ImplementationsProblems
- Plot the results per implementation: number of failures / number of tests executed, for all SRM MoU methods.
- Report the results to the WLCG MB.

Phase 2: from the beginning until the end of February 2007:
- Perform tests on use cases (GFAL/lcg-utils/FTS/experiment specific), boundary conditions and the open issues in the spec that have been agreed on.
- Plot the results as for phase 1 and report to the WLCG MB.

Phase 3: from 1 March until "satisfaction/end of March 2007":
- Add more SRM 2.2 endpoints (some T1s?).
- Stress testing.
- Plot the results as for phase 2 and report to the WLCG MB.

This plan has been discussed during the WLCG workshop. The developers have agreed to work on this as a matter of priority.
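The Phase 1 metric (number of failures / number of tests executed, per implementation and per SRM MoU method) is straightforward to compute from the daily results. A minimal sketch, assuming each result is available as a hypothetical (implementation, method, passed) record; the record format and the function name are illustrative, not the format of the published test pages.

```python
from collections import defaultdict


def failure_ratios(results, by_method=False):
    """results: iterable of (implementation, method, passed) records.
    Returns failures / tests executed, keyed by implementation,
    or by (implementation, method) when by_method is True."""
    executed = defaultdict(int)
    failed = defaultdict(int)
    for impl, method, passed in results:
        key = (impl, method) if by_method else impl
        executed[key] += 1
        if not passed:
            failed[key] += 1
    # Iterate over executed so keys with zero failures still appear (ratio 0.0).
    return {key: failed[key] / executed[key] for key in executed}
```

Computing one value per day and per implementation gives the time series needed to plot progress and spot problematic implementations.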


Status of SRM clients

FTS
- SRM client code has been unit-tested and integrated into FTS.
- Tested against DPM, dCache and StoRM. CASTOR and DRM tests started.
- Release to the development testbed expected in the first week of February.
- Experiments could run tests on the dedicated UI set up for this purpose.

GFAL/lcg-utils
- New RPMs expected on the test UI in the first week of February.
- New patch for gLite release certification at the same time.
- Still using the old schema (see next slide).


GLUE Schema

GLUE 1.3 is available:
http://glueschema.forge.cnaf.infn.it/Spec/V13

It does not include everything originally proposed, only the important changes.

LDAP implementation done by Sergio Andreozzi. Deployment preview for now.

Information providers started by Laurence Field. First deployment expected by the end of February.

Clients need to adapt to the new schema.


Grid Storage System Deployment (GSSD)

A working group launched by the GDB to coordinate SRM 2.2 deployment for Tier-1s and Tier-2s.

https://twiki.cern.ch/twiki/bin/view/LCG/GSSD
Mailing list: [email protected]

People involved: developers, site admins, experiments.

Pre-GDB meetings are used for discussions: Tier-1s (and some Tier-2s) present their setup and report problems.

Interesting outcome from the dCache workshop at DESY: sites reported their configurations and the problems with their installations.

Requirements have been collected from the experiments.


Deployment plan

(A.) Collecting requirements from the experiments: more details with respect to what is described in the TDR. The idea is to understand how to specifically configure a Tier-1 or a Tier-2: storage classes (quality of storage), disk caches (how many, how big and with which access), storage transition patterns, etc.
Started. Questionnaire sent to the experiments. Good input collected from LHCb and ATLAS.

(B.) Understanding the current Tier-1 setup: requirements? Started.

(C.) Getting hints from the developers: manual/guidelines? Started.

(D.) Selecting production sites as guinea pigs and starting to test with the experiments. Beginning of March 2007 to July 2007.

(E.) Assisting experiments and sites during the tests (monitoring tools, guidelines in case of failures, cleaning up, etc.). Define mini SC milestones. March to July 2007.

(F.) Accommodating new needs, not initially foreseen, if necessary.

(G.) Having production SRM 2.2 fully functional (MoU) by September 2007.
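Item (A.) asks how storage classes (quality of storage) should be configured at each site. As a point of reference, the sketch below encodes the WLCG "TxDy" naming convention (x copies on tape, y copies kept on disk) and its commonly used mapping to SRM v2.2 retention policy and access latency. The mapping is stated from general knowledge of the convention rather than taken from these slides, and `storage_class_for` is a hypothetical helper for turning a yes/no requirement from a questionnaire into a class name.

```python
from enum import Enum


class RetentionPolicy(Enum):
    # SRM v2.2 retention policies.
    REPLICA = "REPLICA"
    OUTPUT = "OUTPUT"
    CUSTODIAL = "CUSTODIAL"


class AccessLatency(Enum):
    # SRM v2.2 access latencies.
    ONLINE = "ONLINE"
    NEARLINE = "NEARLINE"


# TxDy: x = copies on tape, y = copies kept on disk.
STORAGE_CLASSES = {
    "T1D0": (RetentionPolicy.CUSTODIAL, AccessLatency.NEARLINE),
    "T1D1": (RetentionPolicy.CUSTODIAL, AccessLatency.ONLINE),
    "T0D1": (RetentionPolicy.REPLICA, AccessLatency.ONLINE),
}


def storage_class_for(needs_tape_copy: bool, needs_disk_copy: bool) -> str:
    """Map a (hypothetical) experiment requirement to a storage class name."""
    name = f"T{int(needs_tape_copy)}D{int(needs_disk_copy)}"
    if name not in STORAGE_CLASSES:
        raise ValueError(f"{name} is not a defined storage class")
    return name
```

A site would then configure one space (and space token) per storage class it has to offer to each experiment.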


Deployment plan: client perspective and backup plan

FTS, lcg-utils, GFAL:
- SRM v1.1 will be the default until SRM 2.2 is deployed in production and has proven stable.
- Site admins have to run both SRM v1.1 and SRM v2.2 until the SRM v2.2 installation is stable.
- The SRM type is retrieved from the information system.
- If two versions are found for the same endpoint, SRM v2.2 is chosen only if a space token (storage quality) is specified; otherwise SRM v1.1 is the default.
- FTS can be configured per channel on the version to use; policies can also be specified ("always use SRM 2.2", "use SRM 2.2 if space token specified", …).
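The client-side selection rule can be sketched as a small function. This is a hypothetical illustration of the logic described above, not the GFAL API or the FTS configuration syntax: `Policy`, `choose_srm_version`, `CHANNEL_POLICIES`, `policy_for` and the channel name are all invented for the example, and only the two policies quoted on the slide are modelled.

```python
from enum import Enum
from typing import Optional, Set


class Policy(Enum):
    # The two per-channel policies quoted on the slide (names are hypothetical).
    IF_SPACE_TOKEN = "use SRM 2.2 if space token specified"
    ALWAYS_V22 = "always use SRM 2.2"


# Hypothetical per-channel configuration; unlisted channels use the default rule.
CHANNEL_POLICIES = {"CERN-SITEA": Policy.ALWAYS_V22}


def policy_for(channel: str) -> Policy:
    return CHANNEL_POLICIES.get(channel, Policy.IF_SPACE_TOKEN)


def choose_srm_version(published: Set[str], space_token: Optional[str] = None,
                       policy: Policy = Policy.IF_SPACE_TOKEN) -> str:
    """published: SRM versions the information system lists for one endpoint."""
    if published == {"1.1", "2.2"}:
        # Mixed mode: v2.2 only when the policy or a space token asks for it.
        if policy is Policy.ALWAYS_V22 or space_token:
            return "2.2"
        return "1.1"
    if len(published) == 1:
        # Only one version published: use it, whatever the policy.
        return next(iter(published))
    raise ValueError(f"unexpected SRM versions for endpoint: {sorted(published)}")
```

Falling back to the single published version keeps transfers working at sites that still run SRM v1.1 only, which is in the spirit of the mixed-mode operation described below.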

===>>> It is possible and foreseen to run in mixed mode with SRM v1.1 and SRM v2.2, until SRM v2.2 is proven stable for all implementations.

===>>> Backup plan: continue to run SRM v1.1 as done up to now. Introduce SRM v2.2 endpoints if ready and proven stable.


Conclusions

Much clearer description of the SRM specifications. All ambiguous behaviours have been made explicit. A few issues have been left for SRM v3, since they do not affect the SRM MoU.

Well-established and agreed methodology to check the status of the implementations. Boundary conditions and use cases from the upper-layer middleware and experiment applications will be the focus of next month's work. Monthly reports and problem escalation to the WLCG MB.

A clear plan has been put in place in order to converge. We are still not where we planned to be. However, we have a firm commitment from the developers to work on the SRM implementations as a matter of priority.

Working with sites and experiments on the deployment of SRM 2.2 and Storage Classes. Specific guidelines for Tier-1 and Tier-2 sites are being compiled.

It is not unreasonable to expect SRM 2.2 in production by September 2007.

A backup plan foresees the use of a mixed SRM v1/v2 environment, where the upper-layer middleware hides the details from the users.