Your university or experiment logo here LHCb Development Glenn Patrick Raja Nandakumar GridPP18, 20 March 2007


LHCb Development

Glenn Patrick, Raja Nandakumar

GridPP18, 20 March 2007


LHCb December 2006

[Figure: the LHCb detector. Labelled components: Muon, Calorimeters, RICH2, Trackers, Magnet, RICH1, VELO. Beam labels: p, p.]


LHCb Computing Model

Trigger chain: 40 MHz → Level-0 (hardware) → 1 MHz → Level-1 (software) → 40 kHz → HLT (software) → 2 kHz @ 30 kB/event = 60 MB/s
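The output figure on this slide is a simple product: 2 kHz of events at 30 kB per event gives 60 MB/s. A minimal check in Python, assuming decimal units (1 kB = 1000 bytes, 1 MB = 10^6 bytes); the function name is illustrative only and does not come from the talk.

```python
def data_rate_mb_per_s(event_rate_hz: float, event_size_kb: float) -> float:
    """Return the data rate in MB/s for an event rate (Hz) and event size (kB)."""
    # Assumption: decimal units, 1 kB = 1000 bytes and 1 MB = 1e6 bytes.
    bytes_per_second = event_rate_hz * event_size_kb * 1000.0
    return bytes_per_second / 1e6


# Figures quoted on the slide: 2 kHz output rate, 30 kB per event.
hlt_rate = data_rate_mb_per_s(event_rate_hz=2000, event_size_kb=30)
print(hlt_rate)
```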


DIRAC Production and Analysis

User interfaces: Production manager, GANGA UI, User CLI, Job monitor, BK query webpage, FileCatalog browser

DIRAC services: DIRAC Job Management Service, JobMonitorSvc, JobAccountingSvc (with AccountingDB), InformationSvc, FileCatalogSvc, MonitoringSvc, BookkeepingSvc

DIRAC resources: DIRAC CE, LCG Resource Broker (CE 1, CE 2, CE 3), DIRAC Sites (Agents), DIRAC Storage (DiskFile; gridftp, bbftp, rfio)

Next talk


DIRAC Workload Management

[Diagram. Components: Job Receiver, JobDB, Sandbox, Job JDL, Job Input, Data Optimizer, Task Queue, LFC, Agent Director, RB, Pilot Job, CE, WN, Pilot Agent, Job Wrapper, User Application, Matcher, CE JDL, WMS Admin, SE, VO-box, Agent Monitor, Job Monitor.]

[Arrow labels: checkData, checkJob, execute (glexec), fork, getReplicas, getProxy, uploadData, putRequest, checkPilot, getSandbox.]

[Legend: DIRAC services, LCG services, Workload on WN.]


DIRAC3 Revision and Roadmap

Dec 2006. Brainstorming meeting at CERN amongst developers.

Jan 2007. Barcelona workshop.

Feb – April 2007. Re-implementation of the code base according to new design.

May 2007. Integration of DIRAC3 system and thorough testing.

June 2007. Release of DIRAC3.

• Operation in multiplatform environment - various Linux flavours, 32 bit/64 bit, Windows(!)

• Need to separate generic and LHCb behaviour.

• Need for new functionality affecting multiple components (e.g. job state machinery).

• DIRAC3 will be the result of this major code revision and reorganisation.

Gennady Kuznetsov (RAL)


AMGA-Bookkeeping Architecture

[Diagram. Components: Oracle DB, JDBC Driver, BK Service, Bookkeeping Svc, Bookkeeping Query, Tomcat Servlet, Web Browser, Jython Server, XML-RPC, GANGA application, AMGA Client, volhcb01. Arrow labels: Read, Write, Read/Write (R/W).]

Carmine Cioffi (Oxford):
AMGA now used in Production – old system retired.
New production machine (volhcb01) for bookkeeping.


Software Modules and Data Flow

Applications: Simulation (Gauss), Digitisation (Boole), Reconstruction (Brunel), Stripping and Analysis (DaVinci)

Data: MC Truth, Raw Data, rDST, Event Tag Collection, DST+RAW

Monte-Carlo Production Job

Gauss Step: SoftwareInstallation module, GaussApplication module, BookkeepingUpdate module

Boole Step: SoftwareInstallation module, BooleApplication module, BookkeepingUpdate module
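The Monte-Carlo production job on this slide is a sequence of steps, each made of the same three kinds of module. The structure can be sketched roughly as below. This is only an illustration, not DIRAC code: the `Step` class and the helper functions are hypothetical, and the module order within a step is assumed from the order shown on the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One application step of a production job (hypothetical structure)."""
    application: str
    modules: List[str] = field(default_factory=list)


def make_step(application: str) -> Step:
    # Assumption: every step has the three modules named on the slide,
    # run in the order in which the slide lists them.
    return Step(
        application=application,
        modules=[
            "SoftwareInstallation",
            f"{application}Application",
            "BookkeepingUpdate",
        ],
    )


def build_mc_production_job() -> List[Step]:
    # Simulation (Gauss) followed by digitisation (Boole), as on the slide.
    return [make_step("Gauss"), make_step("Boole")]


for step in build_mc_production_job():
    print(step.application, "->", ", ".join(step.modules))
```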


Last Month Activity

[Plot: running jobs over the last month, by site: CNAF, GRIDKA, IN2P3, NIKHEF, PIC, RAL, CERN, ALL. Annotations: "Bugs found, stop and restart the production"; "Record of running jobs: 9715".]

• Average of 7.5K running jobs in the last month
• Temporary problems at PIC and RAL

Raja Nandakumar (RAL)


CPU Use since Dec. 2006

Country        CPU use (%)
UK             41.1
CERN           12.1
Italy           9.6
Germany         8.1
France          7.7
Spain           6.6
Greece          3.8
Netherlands     3.1
Poland          2.4
Russia          2.0
Hungary         0.9

[Pie chart of CPU use by country; labelled slices: UK, CERN, Germany, Spain, France, Italy.]


CPU Use since Dec. 2006

Main sites        CPU time (%)
Manchester        17
CERN              11
QMUL               8
CNAF               7
GRIDKA             7
IN2P3              4
Brunel (UK)        4
RAL                3
Glasgow            3
NIKHEF             3
USC                3
Lancashire         2
HG-06 (Greece)     2
Barcelona          2
PIC                2
Other sites       20

CPU Use - 40% @ T1s

[Pie chart of CPU time by site; labelled slices: Manchester, CERN, CNAF, QMUL, GRIDKA.]
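The Tier 1 share quoted on this slide can be compared with the table by summing the rows for the Tier 1 centres. A rough check, assuming the Tier 1 set is the seven centres listed under "Tier 1" on the reconstruction slide (CERN, CNAF, GRIDKA, IN2P3, RAL, NIKHEF, PIC). The table values are rounded to whole percent, so only approximate agreement should be expected.

```python
# CPU time per site in percent, copied from the table on this slide.
site_cpu_percent = {
    "Manchester": 17, "CERN": 11, "QMUL": 8, "CNAF": 7, "GRIDKA": 7,
    "IN2P3": 4, "Brunel (UK)": 4, "RAL": 3, "Glasgow": 3, "NIKHEF": 3,
    "USC": 3, "Lancashire": 2, "HG-06 (Greece)": 2, "Barcelona": 2,
    "PIC": 2, "Other sites": 20,
}

# Assumption: these seven centres make up the Tier 1 set.
tier1_sites = ["CERN", "CNAF", "GRIDKA", "IN2P3", "RAL", "NIKHEF", "PIC"]

tier1_share = sum(site_cpu_percent[site] for site in tier1_sites)
print(f"Tier 1 share from rounded table values: {tier1_share}%")
```

With the rounded values this sums to 37%, in line with the roughly 40% quoted on the slide.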


Reconstruction since Dec. 2006

[Pie chart labels: PIC 32%, CNAF 20%, CERN 19%, IN2P3 17%, RAL 10%, GRIDKA 2%, NIKHEF 0%]

Tier 1     Events reconstructed
PIC        32.4%
CNAF       20.1%
CERN       18.6%
IN2P3      16.5%
RAL        10.4%
GRIDKA      5.3%
NIKHEF      0.1%

Data access problems the main cause of delays to reconstruction.
RAL dCache unstable since December.
Problems with file staging through SRM.
Some GridFTP problems.

New staging component in DIRAC


Data Transfer 1

Problems with transfers:
When a job fails to transfer data to one or more T1s, the transfer request is queued through the VO box. Storage is not always available at T1s, and the number of pending transfer requests increases.

Failed


Data Transfer 2

Improvements:
Temporary replication to a fail-over SE (all Tier 1s).
Replication to final destination queued in VO box.
VO box retries until transfer succeeds.
Extremely reliable (multi-threaded transfer agent required).

Success
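The fail-over scheme described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not DIRAC code: every function name is hypothetical, the `transfer` callable stands in for the real storage operation, and the retry loop is single-threaded and bounded by `max_rounds` so that the demonstration terminates (the slide describes retrying until success, with a multi-threaded agent).

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

# A transfer function takes (file_name, storage_element) and returns True on success.
Transfer = Callable[[str, str], bool]
Request = Tuple[str, str]  # (file_name, final_destination)


def upload_with_failover(file_name: str, destination: str,
                         failover_ses: List[str], transfer: Transfer,
                         pending: Deque[Request]) -> str:
    """Try the final destination first; otherwise store the file on a
    fail-over SE and queue a replication request to the final destination."""
    if transfer(file_name, destination):
        return destination
    for se in failover_ses:
        if transfer(file_name, se):
            pending.append((file_name, destination))
            return se
    raise RuntimeError(f"no storage element accepted {file_name}")


def retry_pending(transfer: Transfer, pending: Deque[Request],
                  max_rounds: int) -> int:
    """Retry queued replications; return how many requests are still pending."""
    for _ in range(max_rounds):
        for _ in range(len(pending)):
            file_name, destination = pending.popleft()
            if not transfer(file_name, destination):
                pending.append((file_name, destination))  # try again next round
        if not pending:
            break
    return len(pending)


# Simulated storage: "RAL" rejects its first two attempts, then recovers.
attempts = {"RAL": 0}


def flaky_transfer(file_name: str, se: str) -> bool:
    if se == "RAL":
        attempts["RAL"] += 1
        return attempts["RAL"] > 2
    return True


queue: Deque[Request] = deque()
stored_at = upload_with_failover("file.dst", "RAL", ["CNAF", "PIC"],
                                 flaky_transfer, queue)
remaining = retry_pending(flaky_transfer, queue, max_rounds=5)
print(stored_at, remaining)
```

In this toy run the file first lands on the fail-over SE and the queued replication to the final destination succeeds on a later retry, which is the behaviour the slide describes.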


Next Steps

Castor-2

Castor Migration.
LHCb tests progressing at RAL (Raja).
Once jobs run and are stable, aim to switch and replicate existing data from dCache to Castor.
End June deadline for Castor approaching fast!

Data Stripping.
Delayed to ~June because of late availability of high performance pre-selection algorithms.
• Stripped DSTs to be shipped to all Tier 1 centres.
• Analysis using Ganga.
• Output used for LHCb “Physics Book”.


Alignment Challenge

First release of alignment framework – March.

First Alignment Challenge using tracking detectors – end April for production of datasets, ~June for alignment demonstration.

Second Alignment Challenge using all sub-detectors – September?

VELO is the most precise device in LHCb, but it moves!
Retracted by ~3 cm in between fills.
21 tracking stations. 4 sensors per station (r/φ).

Different Configurations:
Magnet OFF, VELO Open
Magnet OFF, VELO Closed
Magnet ON, VELO Open
Magnet ON, VELO Closed

Grid test of Conditions Database – streaming of data constants and running of LHCb applications.
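The four configurations listed on this slide are every combination of two magnet states and two VELO positions. A small sketch that enumerates them, and that also multiplies out the station and sensor counts quoted above. It assumes the "21 tracking stations, 4 sensors per station" line describes the VELO; the variable names are illustrative.

```python
from itertools import product

magnet_states = ["OFF", "ON"]
velo_states = ["Open", "Closed"]

# Every (magnet, VELO) combination, in the order the slide lists them.
configurations = [
    f"Magnet {magnet}, VELO {velo}"
    for magnet, velo in product(magnet_states, velo_states)
]
for configuration in configurations:
    print(configuration)

# Assumption: the station and sensor counts on the slide refer to the VELO.
n_stations = 21
sensors_per_station = 4
n_sensors = n_stations * sensors_per_station
print("Sensors to align:", n_sensors)
```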


2007 Timetable

[Timeline running from January 2007 to December 2007]

Jan - March: DC06 Production Phase

From March/April: DAQ - Tier 0 throughput tests

End April - June: First Alignment Challenge

June: Release of DIRAC3

June/July: Full chain DAQ - Tier 0 - Tier 1 tests

September: Re-reconstruction of b and Min Bias events

September: Second Alignment Challenge

November: First data!