Grid middleware & tools session summary

Ian Bird, CERN

Rob Gardner, University of Chicago

Introduction

82 abstracts submitted: 36 oral presentations (7 sessions), 44 posters, 2 withdrawn

Categories cover a broad range:

- Experiment experiences
- Data Management
- Workload Management
- Monitoring, Information, Accounting
- Security & Authorization
- Fabric & Deployment

Grid reliability – Pablo Saiz

Grid efficiency during CMS data challenges – Oliver Gutsche

D0 reprocessing on OSG – Amber Boehnlein

Common theme: making sites reliable requires debugging sites/systems one by one

AliEn grid environment – Pablo Saiz

Job agents – pilot jobs

Monitoring

SRM v2.2 – Flavia Donno

18 month effort to agree, build, test, deploy new version

dCache – one of several MSS systems

- Patrick Fuhrmann – overview of dCache developments
- Gerd Behrmann – distributed instance for NDGF

LCG Data management tools

LFC, DPM, FTS – Markus Schulz

Examples of services that consider deployment & management issues

CORAL – distributed database access

Dirk Duellmann

Pilot jobs?

Pilot jobs – and variants:

Such a good idea – everyone wants one …

Stuart Paterson – optimizations in DIRAC

Marianne Bargiotti – integrity checking in DIRAC

Pilots can move intelligence into the job

Paul Nilsson – PanDA experience
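The pilot idea noted above, where a placeholder job lands on a worker node, checks it, and only then pulls real work, can be illustrated with a minimal Python sketch. It is not code from DIRAC, PanDA, or any other system named here; `node_is_healthy`, `run_pilot`, the node dictionary fields, and the in-memory queue are all hypothetical stand-ins for a real central task queue.

```python
from collections import deque


def node_is_healthy(node):
    # Hypothetical sanity checks a pilot might run before asking for work.
    return node["free_disk_gb"] >= 5 and node["software_ok"]


def run_pilot(node, task_queue):
    """Pull and run payloads only if the node passes its checks."""
    results = []
    if not node_is_healthy(node):
        # A broken node never receives a real job; the queue is untouched.
        return results
    while task_queue:
        payload = task_queue.popleft()  # late binding: job chosen at run time
        results.append(payload())
    return results


# Hypothetical central queue holding two trivial payloads.
queue = deque([lambda: "job-1 done", lambda: "job-2 done"])

bad_node = {"free_disk_gb": 1, "software_ok": True}
good_node = {"free_disk_gb": 50, "software_ok": True}

bad_results = run_pilot(bad_node, queue)    # fails the check, runs nothing
good_results = run_pilot(good_node, queue)  # passes, drains the queue
```

In this sketch the unhealthy node consumes no payloads, so real jobs are only ever bound to a node that has already passed its checks.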

gLite WMS developments

Marco Cecchi

CHEP'07, Victoria

Igor Sfiligoi – comparison of WMS

Experiment dashboards – Julia Andreeva

Monitoring from VO/user perspective

GridICE monitoring – Guido Cuscela

Permits different views of running jobs

James Casey

Advances in monitoring of grid services

Stephen Burke – 6 years experience with GLUE schema

Martin Flechl – details on integration of information systems

David Groep – glExec

Supporting pilot jobs

Greig Cowan – using DPM over the WAN

Addressing failover for core operations services – Alfredo Pagano

Various strategies

Platform LSF – Robert Stober

Integrating heterogeneous clusters

Observations

Solutions exist for most needs now

- Certainly not all perfect yet

Experiment layer relatively deep

- Plethora of workload management systems
- Not so many for data management …

Service management issues starting to be addressed by some services (DPM, LFC, FTS, GridSite, CORAL)

- But in general little thought on how site managers should manage services

Interoperability / interoperation

Observations

Workload management

Everyone wants pilot (aka glidein) jobs (and everyone has written a system to submit them)

Commonality – to reach a reliable service experiments need to systematically debug sites being used: D0, CMS, dashboards, …

Sophisticated systems to monitor, debug, and recover (DIRAC, dashboards, grid service monitoring, etc.) improve reliability and help debug the system
