4 March 2004 GridPP 9th Collaboration Meeting

SAMGrid: JIM and CDF Development

• CDF Accepts the Need for the Grid
– Requirements
• How to Meet the Need
– Status of SAMGrid for CDF

Rick St. Denis, University of Glasgow




Director’s review, International Finance Committee: 50% computing outside FNAL

Maximize physics output at low luminosity

– L3 output rate: 80 -> 360 Hz by 2006

Spokespersons’ Requirements for CDF

CDFGrid supported by FNAL PAC

CDF needs the Grid


Scale of CDF Requirements

Year   THz    % offsite   CPU speed   # duals
FY04    3.7   25%         3 GHz        150
FY05    9.0   50%         5 GHz       +360
FY06   16.5   50%         8 GHz       +220

6-7 sites, 100 duals each, by 2006 + 700 @ FNAL
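
As a rough cross-check of the last line, the offsite share implied by 6-7 sites of 100 duals each against 700 duals at FNAL can be worked out directly. This is only a sketch: the function name and argument names are illustrative and do not come from the talk.

```python
def offsite_fraction(n_sites: int, duals_per_site: int, fnal_duals: int) -> float:
    """Fraction of all dual-CPU nodes that sit outside FNAL."""
    offsite = n_sites * duals_per_site
    return offsite / (offsite + fnal_duals)


if __name__ == "__main__":
    # Figures from the slide: 6-7 sites, 100 duals each, 700 duals at FNAL.
    for n_sites in (6, 7):
        frac = offsite_fraction(n_sites, duals_per_site=100, fnal_duals=700)
        print(f"{n_sites} sites: {frac:.0%} offsite")
```

With these inputs the offsite share comes out between roughly 46% and 50%, in line with the 50% offsite target listed for FY05 and FY06.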


CDF Computing Model

• Develop Analysis on desktop
– Access to all CDF data from anywhere
• Large scale processing on batch clusters
– Submission from anywhere
– Interactive tools: ls, top, head/tail/cat
– Output to scratch space or desktop

Implemented Now with CAF


Use Cases for Summer 2004

• User Level MC Production
– All CDF Users have access
– No data on site -> SAM write
• User Level Data Access
– All users have access
– Selected samples on site: Full SAM Support

SAM Essential for Summer 2004


Medium Term Vision

• Many Sites

• Fully transparent submission to all of CDF resources: 75% FNAL, 25% outside

• Fully transparent input and output of data


Summer 04 Functionality

• User selects submission site, saying what dataset they will use

• System checks they can do this (privileges)

• User access with SAM/dCache

• User registers output with SAM
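
The four steps above can be pictured as a small toy model. Everything in it is hypothetical: the `Catalog` class, the `submit` function and the representation of privileges as a per-user set of sites are assumptions made for illustration, not the SAM or CAF interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Toy stand-in for the SAM file catalog (hypothetical, not the SAM API)."""
    datasets: dict = field(default_factory=dict)  # dataset name -> list of file names
    outputs: list = field(default_factory=list)   # registered output files


def submit(user, site, dataset, privileges, catalog):
    """Walk through the four steps on the slide for one job."""
    # 1. The user has chosen a submission site and named the dataset up front.
    # 2. The system checks that this user is allowed to run at that site.
    if site not in privileges.get(user, set()):
        raise PermissionError(f"{user} may not submit to {site}")
    # 3. Input files are resolved through the catalog (SAM/dCache in the talk).
    files = catalog.datasets[dataset]
    # 4. The job's output is registered back in the catalog.
    output = f"{user}-{dataset}-{site}.out"
    catalog.outputs.append(output)
    return files, output
```

A call such as `submit("alice", "glasgow", "ds1", {"alice": {"glasgow"}}, catalog)` returns the input file list and the registered output name, while a user without the site in their privilege set gets a `PermissionError` before any data is touched.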


October 04

• To extend beyond 25% outside computing JIM is essential: JIM Test for CDF June04, production October 04

• HOWEVER: It already seems that the 25% resources are not sufficient for the production passes: will want JIM earlier.


CDFGrid from a User Perspective

[Diagram. Labels: AC++; CAF GUI/CLI; Grid; sites: Toronto, Korea, Italy, Taiwan, FermiCAF, UK. Annotations: Only Fermilab (uses SAM); Outside Lab (uses SAM); Grid (uses SAM).]


CDF Grid Strategy

• 25% of CDF Computing from external resources. All CDF computing on CDF Grid by April 15: Utilize resources fully controlled by CDF: Kerberos/fbsng: dCAF + SAM

• October 15, 2004: JIM to capture shared resources

• June 2005: 50% of Computing resources external


Simple JIM

[Diagram. Labels: Desktop (anywhere); Condor Submitter (@ regional centers); SAM DB, Condor Matchmaker (@ FNAL); Globus GK, CAF Submitter, SAM Station (@ each site); WN; dCache; Private LAN (twice). Milestones: June 2004 testing; June 2005 required.]


Detailed JIM

[Architecture diagram. Labels, in the order they appear: Site (repeated); Resource Selector; Info Collector; Info Gatherer; Match Making; User Interface (repeated); Submission; Global Job Queue; Grid Client; Global DH Services; SAM Naming Server; SAM Log Server; Resource Optimizer; SAM DB Server; RC MetaData Catalog; Bookkeeping Service; SAM Stager(s); SAM Station (+ other servs); Data Handling; Worker Nodes; Grid Gateway; Local Job Handler (CAF, D0MC, BS, ...); JIM Advertise; Local Job Handling; Cluster; AAA; Dist. FS; Info Manager; XML DB server; Site Conf.; Glob/Loc JID map; ...; Info Providers; MDS; MSS; Cache; Site; Web Serv; Grid Monitoring; User Tools. Legend: flow of job, data, meta-data.]


Meeting the Needs

• Progress in SAM

• JIM Status

• RunJob

• CDFGridWorkshop: “Nerd’s Paradise”

• Strict Project Management and process to respond to operational issues


Progress in SAM

• Dbserver, the database server between applications and Oracle, was upgraded to use a common schema for CDF and D0.
• All CDF data files are in SAM
• SAM is in beta testing on the CDF CAF (1200 CPUs): passed 20 TB/day delivery
• Minos uses SAM for its Data Handling
• Steve Mrenna (Phenomenology) depositing ALPGEN files in SAM for common CDF/D0 use.


JIM Deployment Issues

Focus:
• 200 jobs each getting 200 files generated 120000 requests simultaneously to the DBServer!
– Sensible sam: reliability went to 60%. Now add retries.

Training Users
• D0 has D0Tools: Big script; determines where user is and copies files: harder to get into a sandbox;
• CAF conditions users!

Distribution and compatibility:
• This has made great strides with SAM, now time for JIM

Communication with the expert!
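
The "now add retries" remedy for the DBServer overload can be sketched as below. The talk does not say how retries were implemented; exponential backoff with random jitter is an assumption chosen here because it spreads out clients that would otherwise all retry at the same instant. The function name, its parameters and the use of `ConnectionError` as the failure signal are likewise illustrative.

```python
import random
import time


def call_with_retries(request, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call request() until it succeeds, backing off between attempts."""
    for attempt in range(max_attempts):
        try:
            return request()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Exponential backoff with jitter, so that many clients retrying
            # at once do not hit the server again at the same moment.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))
```

The `sleep` argument exists only so that the waiting can be replaced in tests; in normal use it defaults to `time.sleep`.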


RunJob

• Dedicated farms at FNAL will go away and RunJob will be used for production processing of data

• CDF will use RunJob for MC production
• Dave Evans worked for CDF for 2 months: has made CDFRunJob based on RunJob (Shakar), a tool common to CMS. Morag will work on this.


Florida workshop:

• 11 installations in about 2 hours. Integrated with dCAF in 2 cases in 2 days.
• 3 in Asia, 4 in Europe
• 6 sites committed to summer 2004 usage of their facilities for all of CDF (mostly MC)
• SAM installation now: initsam cdf <stationname>
• Follow-up on April 1.
• Each site has a local user support person to reduce load on core development team.
• Generally: Security ate 80% of the effort! Now 20%!


Installations progress

Participating institutes: installation and testing progress

Columns: INSTITUTE | krb5 | Caf Head | Caf Node | DCAF Works | CDF Sam Software | Sam Station | sam_par_ret | Sam AC++Dump | Sam File Store | Sam File Store Remote | Sam AC++Dump on CAF

MIT Yes ?
Korea Yes Yes Yes Yes Yes knu Yes Yes
Pisa Yes Yes Yes Yes Yes pisa Yes Yes Yes Yes Yes
Japan Yes Yes Yes Problems Yes japan Yes Yes Yes
Karlsruhe Yes Yes Yes Problems Yes fzzka Yes Yes Yes Yes Yes
Liverpool Yes Yes Yes Problems Yes liverpool Yes Yes Yes
Toronto Yes In progress Yes toronto Yes
Taiwan Yes Yes Yes Yes Yes taiwan Yes Yes
TTU Yes -ttu,-ttu-phys Yes
Glasgow Yes In Progress Yes glasgow Yes Yes
UCSD Yes Yes Yes Yes Yes ucsd Yes
CNAF Yes Yes Yes Yes Yes cnaf Yes Yes Yes Yes Yes

Florida Workshop: After 2 Days


2TB/Day: Karlsruhe


CDF Dcache on CAF

ALL CDF on CAF reads 20TB/Day
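
To put the 20 TB/day figure in more familiar terms, it can be converted to a sustained transfer rate. The helper below is an illustrative sketch and assumes decimal units (1 TB = 10^6 MB); the function name is not from the talk.

```python
def tb_per_day_to_mb_per_s(tb_per_day: float) -> float:
    """Convert TB/day to MB/s using decimal units (1 TB = 10**6 MB)."""
    seconds_per_day = 24 * 60 * 60
    return tb_per_day * 1_000_000 / seconds_per_day


if __name__ == "__main__":
    print(f"{tb_per_day_to_mb_per_s(20):.0f} MB/s")
```

Under that assumption, 20 TB/day corresponds to a sustained rate of about 231 MB/s.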


Dcache and SAM

• Dcache shapes traffic into disk: If a SAM cache is large, need to use Dcache instead of nfs mounts
• Dcache gives the user what is requested. 1TB gets same priority as 1GB: CDF users must send email requesting data to be staged.
• SAM examines consumption rate before staging next files – No EMAIL needed.
• SAM uses Dcache for its Caching at FNAL.
• This needs further work with SRM
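
The idea of looking at the consumption rate before staging more files can be illustrated with a simple policy. The talk does not describe SAM's actual algorithm, so the rule below (stage just enough files to cover what the job is expected to read while the next batch arrives) and all names in it are assumptions made for illustration.

```python
import math


def files_to_stage(staged_unread: int, consumption_rate: float,
                   stage_latency: float, max_batch: int) -> int:
    """How many more files to stage so the consumer does not run dry.

    consumption_rate: files consumed per second, as measured from the job.
    stage_latency: seconds needed to bring the next batch onto disk.
    """
    # Files the job is expected to read while the next batch is being staged.
    needed = consumption_rate * stage_latency
    shortfall = needed - staged_unread
    if shortfall <= 0:
        return 0  # enough files already on disk; do not stage ahead
    return min(max_batch, math.ceil(shortfall))
```

A slow consumer therefore triggers little or no staging regardless of how large its dataset is, which is the contrast the slide draws with requesting data by email.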


SAMGrid Management

Sam Management Team

Sam Operations And Projects

Sam Design

Sam Project Leaders

Sam Technical Leaders


SamGrid Development Process

[Flow diagram. Labels: SAMGrid Operations/Projects; Issue Raised; SAMGrid Design; SAMGrid Management Team; Grid Deliverables; Subproject; Chaired by Technical Managers; Chaired by Project Leaders.]


Subproject Organization

• Each Subproject has a subproject leader (SPL) responsible for making a plan and reporting progress.

• Each Subproject has one of the Technical leaders evaluating against an assessment template.

• No deliverable requires more than 3 months of work to deliver.


SubProject Assessment Template

1. Background Documents
2. Project Definition/Mission Statement
3. Deliverables and timetable
4. Inter-project deliverables
5. Project status
6. Challenges and Critical Path Items
7. Lessons Learned
8. Project specific comments, alternate views


SAMGrid Assigned SubProjects

[Diagram. Labels: JIM:D0Tools; Common API; Database Server Rewrite; Database Servers to Linux; Metadata Query with configurable Params; Work Flow Package MCRequest; H Stream for CDF; JIM:MCD0; Test Harness; Retire CDF Replica Catalog; Caching; Configuration Management. Categories: Housekeeping; MC / Reconstruction; Infrastructure; User analysis Apps.]


Status of Assessments

• Subprojects defined

• Interviews conducted on about ½

• Assessment reports being written


Conclusions

• CDF has embraced the need for the Grid to achieve its physics mission

• Progress in deployment and robustness testing has SAM in CDF

• JIM is rapidly solving its problems

• … with the help of a review and management process