
Page 1: Computing Plans in CMS

Computing Plans in CMS

Ian Willers

CERN

Page 2: Computing Plans in CMS

1. The Problem and Introduction

2. Data Challenge – DC04

3. Computing Fabric – Technologies evolution

4. Conclusions

Page 3: Computing Plans in CMS

The Problem

[Data-flow diagram: the detector feeds the event filter (selection & reconstruction), producing raw data and processed data; event reprocessing and event simulation produce event summary data; batch physics analysis extracts analysis objects (by physics topic), which feed interactive physics analysis. CERN]

Page 4: Computing Plans in CMS

Regional Centres – a Multi-Tier Model

[Diagram: CERN – Tier 0 at the centre, linked at 2.5 Gbps and 622 Mbps to Tier 1 centres (FNAL, RAL, IN2P3); Tier 2 centres (Lab a, Uni b, Lab c, … Uni n) connect to the Tier 1s at 622 Mbps; departments and desktops hang off the Tier 2s at 155 Mbps.]
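As a back-of-envelope aid to reading the diagram, a minimal Python sketch of the daily volume each class of link can carry; the assumption that the quoted line rates are fully usable end-to-end is mine, not the slide's:

```python
# Daily capacity of each link class in the multi-tier model, assuming
# the quoted line rate is fully usable end-to-end.
SECONDS_PER_DAY = 86_400

for name, bps in [("Tier 0/Tier 1, 2.5 Gbps", 2.5e9),
                  ("Tier 1, 622 Mbps", 622e6),
                  ("Department/desktop, 155 Mbps", 155e6)]:
    tb_per_day = bps / 8 * SECONDS_PER_DAY / 1e12   # bits -> bytes -> TB/day
    print(f"{name}: {tb_per_day:.1f} TB/day")
# ~27.0, ~6.7, and ~1.7 TB/day respectively
```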

Page 5: Computing Plans in CMS

Computing TDR Strategy – Iterations / scenarios

[Diagram: the Physics Model (data model, calibration, reconstruction, selection streams, simulation, analysis, policy/priorities, …) and the Computing Model (architecture – grid, OO, …; Tier 0, 1, 2 centres; networks and data handling; system/grid software; applications and tools; policy/priorities, …) iterate through scenarios and feed the C-TDR: the computing model (& scenarios), a specific plan for the initial systems, and (non-contractual) resource planning. Inputs include the DC04 data challenge (copes with 25 Hz at 2×10^33 for 1 month), technologies evaluation and evolution, estimated available resources (there is no cost book for computing), required resources, and simulations of model systems and usage patterns, leading to validation of the model.]

Page 6: Computing Plans in CMS

1. The Problem and Introduction

2. Data Challenge – DC04

3. Proposed Computing Fabric

4. Conclusions

Page 7: Computing Plans in CMS

Data Challenge DC04

[Diagram: the Pre-Challenge Production (starting now) writes 50M events, 75 TB, to the CERN tape archive; the "true" DC04 runs in Feb 2004. A fake DAQ (CERN) drives the DC04 T0 challenge: 1st-pass reconstruction at 25 Hz, 1.5 MB/evt, 40 MB/s, 3.2 TB/day, with raw data at 25 Hz × 1 MB/evt and reconstructed DSTs at 25 Hz × 0.5 MB, flowing through a ~40 TB CERN disk pool (~20 days of data), a disk cache, and archive storage on the CERN tape archive. The DC04 calibration challenge runs calibration jobs on a calibration sample against a MASTER conditions DB, replicated to conditions DBs at the Tier 1s. The DC04 analysis challenge distributes event streams (Higgs DST, SUSY background DST, HLT filter?) and TAG/AOD replicas (20 kB/evt) to T1 and T2 centres; a Higgs background study requests new events from an event server.]

Page 8: Computing Plans in CMS

[The same DC04 diagram with revised parameters: 1st-pass reconstruction at 25 Hz × 2 MB/evt = 50 MB/s, 4 TB/day; the ~40 TB CERN disk pool now holds ~10 days of data; TAG/AOD at 10-100 kB/evt; the Pre-Challenge Production (PCP) figures, 50M events and 75 TB, are unchanged.]
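The rates quoted in the two diagrams follow directly from the 25 Hz trigger rate and the assumed event sizes; a quick Python check of the arithmetic:

```python
# Reproduce the DC04 rate arithmetic from the diagrams above.

def daily_volume(rate_hz: float, event_mb: float) -> tuple[float, float]:
    """Return (MB/s, TB/day) for a given trigger rate and event size."""
    mb_per_s = rate_hz * event_mb
    tb_per_day = mb_per_s * 86_400 / 1e6   # seconds per day, MB -> TB
    return mb_per_s, tb_per_day

for event_mb in (1.5, 2.0):
    mb_s, tb_day = daily_volume(25, event_mb)
    print(f"25 Hz x {event_mb} MB/evt = {mb_s:.1f} MB/s = {tb_day:.1f} TB/day")
# 37.5 MB/s ~ 3.2 TB/day (first diagram); 50 MB/s ~ 4.3 TB/day (second)
```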

Page 9: Computing Plans in CMS

MCRunJob

Pre-Challenge Production with/without GRID

[Diagram: a Physics Group asks for an official dataset in RefDB; the Production Manager defines assignments; a Site Manager starts an assignment, or a user starts a private production from their site (or grid UI). MCRunJob then drives the resources in one of three ways: shell scripts to a local batch manager on a computer farm; JDL to the EDG scheduler on CMS/LCG-0; or a DAG of jobs, via Chimera VDL with the Virtual Data Catalogue and a planner, to DAGMan.]
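For the DAG route, DAGMan takes a plain-text description of jobs and their dependencies. Purely as an illustration of what a production tool in this style might emit; the step names and submit files below are hypothetical, not real MCRunJob output:

```python
# Illustrative only: emit a Condor DAGMan description for a chained
# production. Step names and submit files are invented for the sketch.

STEPS = ["gen", "simhits", "digis", "dst"]   # hypothetical production steps

def write_dag(path: str) -> None:
    with open(path, "w") as dag:
        for step in STEPS:
            dag.write(f"JOB {step} {step}.sub\n")          # one submit file per step
        for parent, child in zip(STEPS, STEPS[1:]):
            dag.write(f"PARENT {parent} CHILD {child}\n")  # run the steps in order

write_dag("production.dag")
# then submit with: condor_submit_dag production.dag
```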

Page 10: Computing Plans in CMS

1. The Problem and Introduction

2. Data Challenge – DC04

3. Proposed Computing Fabric

4. Conclusions

Page 11: Computing Plans in CMS

HEP Computing

• High Throughput Computing
– throughput rather than performance
– resilience rather than ultimate reliability
– long experience in exploiting inexpensive mass-market components
– management of very large scale clusters is a problem

Page 12: Computing Plans in CMS


CPU Servers

Page 13: Computing Plans in CMS

CPU capacity – Industry

• OpenLab study of 64-bit architecture
• Earth Simulator
– number 1 computer in the Top 500
– made in Japan by NEC
– peak speed of 40 Tflops
– leads the Top 500 list by almost a factor of 5
– performance of the Earth Simulator equals the sum of the next 12 computers
– the Earth Simulator runs at 90% efficiency (vs. 10-60% for PC farms)
– Gordon Bell warned "Off-the-shelf supercomputing is a dead end"

Page 14: Computing Plans in CMS


Earth Simulator

Page 15: Computing Plans in CMS


Earth Simulator

Page 16: Computing Plans in CMS

Cited problems with farms used as supercomputers

• Lack of memory bandwidth
• Interconnect latency
• Lack of interconnect bandwidth
• Lack of high-performance (parallel) I/O
• High cost of ownership for large-scale systems
• For CMS – does this matter?

Page 17: Computing Plans in CMS

LCG Testbed Structure

100 CPU servers on GE, 300 on FE, 100 disk servers on GE (~50 TB), 20 tape servers on GE.

[Diagram: backbone routers interconnect 64 + 36 disk servers, 20 tape servers, 100 GE CPU servers, and 200 + 100 FE CPU servers, over 1 Gb, 3 Gb, and 8 Gb line bundles.]

Page 18: Computing Plans in CMS

HEP Computing

• Mass Storage model
– data resides on tape, cached on disk (see the sketch below)
– light-weight private software for scalability, reliability, performance
– petabyte-scale object persistency database products
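A minimal sketch of that tape-plus-disk-cache pattern; this is not CMS software, and the stage_from_tape call and cache path are hypothetical stand-ins for a site's mass-storage system:

```python
import os

CACHE = "/var/cache/events"         # hypothetical disk cache directory

def stage_from_tape(name: str, dest: str) -> None:
    """Placeholder for a tape recall via the site's mass-storage system."""
    raise NotImplementedError

def open_event_file(name: str):
    cached = os.path.join(CACHE, name)
    if not os.path.exists(cached):  # cache miss: recall the file from tape
        stage_from_tape(name, cached)
    return open(cached, "rb")       # every read is then served from disk
```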

Page 19: Computing Plans in CMS

Mass Storage

Page 20: Computing Plans in CMS


Mass Storage - Industry

• OpenLab – StorageTek 9940B drives driven by CERN at 1.1 GB/s

• Tape only for backup

• Main data stored on disks

• Google example

Page 21: Computing Plans in CMS


Disk Storage

Page 22: Computing Plans in CMS

Disks – Commercial trends

• Jobs accessing files over the GRID
– GRID copied files to the sandbox
– new proposal for file access from the GRID
• OpenLab – IBM 28 TB TotalStorage using iSCSI disks
• iSCSI: SCSI over the Internet
• OSD: Object Storage Device = Object-Based SCSI
• Replication gives security and performance

Page 23: Computing Plans in CMS

File Access via Grid

• Access now takes place in steps (sketched in code below):
1) find the site where the file resides using the replica catalogue
2) check if the file is on tape or on disk; if only on tape, move it to disk
3) if you cannot open the remote file, copy it to the worker node and use local I/O
4) open the file
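A minimal sketch of that four-step pattern; every helper below is a hypothetical placeholder for real catalogue and storage middleware, not an actual grid API:

```python
def find_replica_site(lfn: str) -> str: ...        # 1) replica catalogue lookup
def is_on_disk(site: str, lfn: str) -> bool: ...   # 2) tape vs. disk check
def stage_to_disk(site: str, lfn: str) -> None: ...
def open_remote(site: str, lfn: str): ...
def copy_to_worker(site: str, lfn: str) -> str: ...

def open_grid_file(lfn: str):
    site = find_replica_site(lfn)           # 1) find the site holding the file
    if not is_on_disk(site, lfn):           # 2) only on tape? move it to disk
        stage_to_disk(site, lfn)
    try:
        return open_remote(site, lfn)       # 4) open the file remotely...
    except IOError:
        local_path = copy_to_worker(site, lfn)  # 3) ...or copy it to the worker
        return open(local_path, "rb")           #    node and use local I/O
```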

Page 24: Computing Plans in CMS


Object Storage Device

Page 25: Computing Plans in CMS

Big disk, slow I/O tricks

[Diagram: the disk is divided into a hot-data region and a cold-data region.]

• Sequential access is faster than random
• Always read from start to finish
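The "sequential beats random" claim is easy to check on any large file; a minimal micro-benchmark sketch (the file name is arbitrary, and the timings will of course depend on the hardware):

```python
import os, random, time

PATH = "big.dat"              # any large existing file; the name is arbitrary
BLOCK = 64 * 1024             # 64 kB read size

def sequential(path: str) -> None:
    with open(path, "rb") as f:
        while f.read(BLOCK):  # read from start to finish
            pass

def random_access(path: str, n_reads: int = 1000) -> None:
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        for _ in range(n_reads):
            f.seek(random.randrange(max(1, size - BLOCK)))  # jump somewhere
            f.read(BLOCK)

for fn in (sequential, random_access):
    start = time.perf_counter()
    fn(PATH)
    print(f"{fn.__name__}: {time.perf_counter() - start:.2f} s")
```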

Page 26: Computing Plans in CMS

Network trends

• OpenLab: 755 MB/s over 10 Gbps Ethernet
• CERN/Caltech land speed record holders (in the Guinness Book of Records)
– CERN to Chicago: IPv6 single stream, 983 Mbps
– Sunnyvale to Geneva: IPv4 multiple streams, 2.38 Gbps
• Network Address Translation, NAT
• IPv6: IP address depletion, efficient packet handling, authentication, security, etc.

Page 27: Computing Plans in CMS

Port Address Translation

• PAT – a form of dynamic NAT that maps multiple unregistered IP addresses to a single registered IP address by using different ports (sketched below)
• Avoids the IPv4 problem of limited addresses
• Mapping can be done dynamically, so adding nodes is easier
• Therefore easier management of the farm fabric?
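A toy illustration of that port-based mapping; the addresses are example/documentation values and the port-allocation policy is invented for the sketch:

```python
# Many private (addr, port) pairs map onto one public address,
# distinguished only by the translated port.

PUBLIC_IP = "192.0.2.1"          # the single registered address (example value)
_next_port = 30000
_table: dict[tuple[str, int], tuple[str, int]] = {}

def translate(private_ip: str, private_port: int) -> tuple[str, int]:
    """Return the public (ip, port) this private endpoint appears as."""
    global _next_port
    key = (private_ip, private_port)
    if key not in _table:        # new node/flow: allocate a fresh public port
        _table[key] = (PUBLIC_IP, _next_port)
        _next_port += 1
    return _table[key]

print(translate("10.0.0.5", 4321))  # ('192.0.2.1', 30000)
print(translate("10.0.0.6", 4321))  # ('192.0.2.1', 30001) - new node, no reconfiguration
```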

Page 28: Computing Plans in CMS

IPv6

• IPv4: 32-bit address space, assigned as:
– 67% for USA
– 6% for Japan
– 2% for China
– 0.14% for India
• IPv6: 128-bit address space (worked out below)
• No longer a need for Network Address Translation, NAT?
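The worked numbers behind those two address spaces:

```python
ipv4 = 2 ** 32    # ~4.3e9 addresses, 67% of which were assigned to the USA
ipv6 = 2 ** 128   # ~3.4e38 addresses
print(f"IPv4: {ipv4:.2e} addresses, USA share: {0.67 * ipv4:.2e}")
print(f"IPv6: {ipv6:.2e} addresses ({ipv6 // ipv4:.1e} times more)")
```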

Page 29: Computing Plans in CMS

1. The Problem and Introduction

2. Data Challenge – DC04

3. Proposed Computing Fabric

4. Conclusions

Page 30: Computing Plans in CMS

Conclusions

• CMS faces an enormous challenge in computing
– short-term data challenges
– long-term developments within the commercial and scientific world
• The year 2007 is still four years away
– enough for a completely new generation of computing technologies to appear
• New inventions may revolutionise computing
– CMS depends on this progress to make our computing possible and affordable