Nuclear Physics Data Management Needs Bruce G. Gibbard SLAC DMW2004 Workshop 16-18 March 2004

Page 1: Nuclear Physics Data Management Needs Bruce G. Gibbard

Nuclear Physics Data Management Needs

Bruce G. Gibbard

SLAC DMW2004 Workshop

16-18 March 2004

Page 2: Nuclear Physics Data Management Needs Bruce G. Gibbard

17 March 2004 B. Gibbard

Overview

Addressing a class of Nuclear Physics (NP) experiments utilizing large particle detector systems to study accelerator produced reactions
    o Examples at: BNL (RHIC), JLab, CERN (LHC)

Technologies & data management needs of this branch of NP are quite similar to HEP

Integrating across its four experiments, the Relativistic Heavy Ion Collider (RHIC) at BNL is currently the most prolific producer of data
    o Study of very high energy collisions of heavy ions (up to Au on Au)
    o High nucleon count, high energy => high multiplicity
    o High multiplicity, high luminosity and fine detector granularity => very high data rates
    o Raw data recording at up to ~250 MBytes/sec

Page 3: Nuclear Physics Data Management Needs Bruce G. Gibbard

Digitized Event In STAR at RHIC

Page 4: Nuclear Physics Data Management Needs Bruce G. Gibbard

IT Activities of Such NP Experiments

Support the basic computing infrastructure for experimental collaboration
    o Typically large, 100’s of physicists, and internationally distributed
    o Manage & distribute code, design, cost, & schedule databases
    o Facilitate communication, documentation and decision making

Store, process, support analysis of, and serve data (Data Intensive Activities)
    o Online recording of Raw data
    o Generation and recording of Simulated data
    o Construction of Summary data from Raw and Simulated data
    o Iterative generation of Distilled Data Subsets from Summary data
    o Serve Distilled Data Subsets and analysis capability to widely distributed individual physicists

Page 5: Nuclear Physics Data Management Needs Bruce G. Gibbard

Generic Computing Model

[Figure: data-flow diagram. Box labels: Detector System; Raw Data; Reconstruction; Summary Data; Data Mining; Skimmed / Streamed / Distilled Data; Individual Analysis; Derived Physics Data; Display; Final Results. Databases: Calibrations & Conditions DB; Bookkeeping DB; Physics Tag DB. Annotations: Meta-Data; Provenance; Physics Based Indices; Data Handling Limited]

Page 6: Nuclear Physics Data Management Needs Bruce G. Gibbard

Data Volumes in Current RHIC Run

Raw Data (PHENIX)
    o Peak rates to 120 MBytes/sec
    o First 2 months of ’04, Jan & Feb
        • 10^9 Events
        • 160 TBytes
    o Project ~225 TBytes of Raw data for Current Run

Derived Data (PHENIX)
    o Construction of Summary Data from Raw Data then production of distilled subsets from that Summary Data
    o Project ~270 TBytes of Derived data

Total (all of RHIC) = 1.2 PBytes for Current Run
    o STAR = PHENIX
    o BRAHMS + PHOBOS = ~40% of PHENIX
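The totals on this slide can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes decimal units (1 TByte = 10^12 bytes), reads the event count as 10^9, and treats "first 2 months" as the full 60 calendar days of January and February 2004. The variable names are ours, not the author's.

```python
# Back-of-envelope check of the slide's volume figures.
# Assumption: decimal units, 1 TByte = 1e12 bytes.
TB = 1e12

phenix_raw_tb = 225.0      # projected raw data for the run (from the slide)
phenix_derived_tb = 270.0  # projected derived data for the run (from the slide)
phenix_total_tb = phenix_raw_tb + phenix_derived_tb

star_total_tb = phenix_total_tb          # slide: STAR = PHENIX
small_expts_tb = 0.40 * phenix_total_tb  # slide: BRAHMS + PHOBOS = ~40% of PHENIX
rhic_total_tb = phenix_total_tb + star_total_tb + small_expts_tb

# First two months of 2004: 160 TBytes recorded.
events = 1e9                             # assumption: event count read as 10^9
bytes_per_event = 160 * TB / events
seconds = (31 + 29) * 86400              # January + February 2004 (leap year)
mean_rate_mb_s = 160 * TB / seconds / 1e6

print(f"PHENIX total:        {phenix_total_tb:.0f} TBytes")
print(f"RHIC total:          {rhic_total_tb / 1000:.2f} PBytes")
print(f"Mean event size:     {bytes_per_event / 1e3:.0f} kBytes")
print(f"Mean recording rate: {mean_rate_mb_s:.0f} MBytes/sec")
```

Under these assumptions the RHIC-wide sum comes to about 1.19 PBytes, consistent with the quoted 1.2 PBytes, and the mean recording rate over the two months is roughly a quarter of the 120 MBytes/sec peak.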

Page 7: Nuclear Physics Data Management Needs Bruce G. Gibbard

RHIC Raw Data Recording Rate

[Figure: raw data recording rate plots for STAR and PHENIX, each annotated 120 MBytes/sec]

Page 8: Nuclear Physics Data Management Needs Bruce G. Gibbard

Current RHIC Technology

Tertiary Storage
    o StorageTek / HPSS
    o 4 Silos – 4.5 PBytes (1.5 PBytes currently filled)
    o 1000 MB/sec theoretical native I/O bandwidth

Online Storage
    o Central NFS served disk
        • ~170 TBytes of FibreChannel Connected RAID 5
        • ~1200 MBytes/sec served by 32 SUN SMP’s
    o Distributed disk
        • ~300 TBytes of SCSI/IDE
        • Locally mounted on Intel/Linux farm nodes

Compute
    o ~1300 Dual Processor Red Hat Linux / Intel Nodes
    o ~2600 CPU’s => ~1,400 kSPECint2K (3-4 TFLOPS)
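To get a feel for what these capacities and bandwidths imply, one can estimate how long a single full pass over each storage tier would take. This is a rough sketch, not a statement about how the facility is actually operated: it assumes decimal units, uses the aggregate bandwidths quoted on the slide (the tape figure is the theoretical one), and the helper `scan_days` is a hypothetical name introduced here.

```python
# Rough scan times implied by the slide's capacity and bandwidth figures.
# Assumption: decimal units; bandwidths are the slide's aggregate values.
DAY = 86400.0

def scan_days(volume_bytes: float, rate_bytes_per_s: float) -> float:
    """Days needed to stream volume_bytes once at a sustained rate."""
    return volume_bytes / rate_bytes_per_s / DAY

# Tertiary storage: 1.5 PBytes currently filled, 1000 MB/sec theoretical.
tape_days = scan_days(1.5e15, 1000e6)

# Central NFS disk: ~170 TBytes served at ~1200 MBytes/sec.
nfs_hours = scan_days(170e12, 1200e6) * 24

# Compute: ~1,400 kSPECint2K spread over ~2600 CPUs.
specint_per_cpu = 1_400_000 / 2600

print(f"Full pass over filled tape: {tape_days:.1f} days")
print(f"Full pass over NFS disk:    {nfs_hours:.1f} hours")
print(f"SPECint2K per CPU:          {specint_per_cpu:.0f}")
```

Even at the theoretical tape bandwidth, one pass over the filled tape store takes more than two weeks, while the central disk can be streamed in under two days.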

Page 9: Nuclear Physics Data Management Needs Bruce G. Gibbard

Projected Growth in Capacity Scale

Moore’s Law effect of component replacement in experiment DAQ’s & in computing facilities => ~X6 increase in 5 years

Not yet fully specified requirements of RHIC II and eRHIC upgrades are likely to accelerate growth

[Figure: Disk Volume at RHIC. Vertical axis: Disk (TBytes), 0 to 3000; horizontal axis: Year, 2001 to 2008]
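The "~X6 increase in 5 years" figure can be converted into an equivalent annual growth factor and doubling time. The sketch below assumes smooth exponential growth, which is a simplification; the `projected` helper and the 470 TByte starting value (170 TBytes central plus 300 TBytes distributed disk, taken from the technology slide) are used purely for illustration.

```python
import math

growth_factor = 6.0  # ~X6 increase (from the slide)
period_years = 5.0   # over 5 years (from the slide)

# Equivalent per-year multiplier and doubling time under exponential growth.
annual_factor = growth_factor ** (1.0 / period_years)
doubling_years = period_years * math.log(2.0) / math.log(growth_factor)

def projected(capacity_now: float, years: float) -> float:
    """Capacity after `years`, assuming smooth exponential growth."""
    return capacity_now * annual_factor ** years

print(f"Annual growth factor: {annual_factor:.2f}")
print(f"Doubling time:        {doubling_years * 12:.0f} months")
print(f"470 TBytes after 5 y: {projected(470.0, 5.0):.0f} TBytes")
```

A factor of six in five years corresponds to a doubling time of a little under two years, in line with the usual statement of Moore's Law.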

Page 10: Nuclear Physics Data Management Needs Bruce G. Gibbard

NP Analysis Limitations (1)

Underlying the Data Management issue
    o Events (interactions) of interest are rare relative to minimum bias events
        • Threshold / phase space effect for each new energy domain
    o Combinatorics of large multiplicity events of all kinds confound selection of interesting events
    o Combinatorics also create backgrounds to signals of interest

Two analysis approaches
    o Topological: typically with
        • Many qualitative &/or quantitative constraints on data sample
        • Relatively low background to signal
        • Modest number of events in final analysis data sample
    o Statistical: frequently with
        • More poorly constrained sample
        • Large background (signal is small difference between large numbers)
        • Large number of events in final analysis data sample
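Two of the points above lend themselves to a quick numerical illustration: how fast two-particle combinatorics grow with multiplicity, and why a background-dominated "statistical" analysis needs a large final sample. The sketch uses a standard counting-statistics approximation that is not taken from the slides; `signal_fraction` and both function names are hypothetical.

```python
import math

def pair_count(multiplicity: int) -> int:
    """Number of distinct two-particle combinations in one event."""
    return multiplicity * (multiplicity - 1) // 2

def events_needed(signal_fraction: float, n_sigma: float) -> int:
    """Events required for a background-dominated counting measurement.

    Approximation: with N events the signal is ~ f*N and the Poisson
    fluctuation of the total is ~ sqrt(N), so the significance is
    ~ f*sqrt(N) and N ~ (n_sigma / f)**2.
    """
    return math.ceil((n_sigma / signal_fraction) ** 2)

for n in (10, 100, 1000):
    print(f"multiplicity {n:>4}: {pair_count(n):>7} pairs")

for f in (0.1, 0.01, 0.001):
    print(f"signal fraction {f}: ~{events_needed(f, 5.0):.1e} events for 5 sigma")
```

Because the required sample grows as the inverse square of the signal fraction, a tenfold smaller signal needs about a hundred times more events, which is one way to see why the final data sets of statistical analyses tend to be large.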

Page 11: Nuclear Physics Data Management Needs Bruce G. Gibbard

NP Analysis Limitations (2)

It seems that it is less frequently possible to do Topological Analyses in NP than in HEP, so Statistical Analyses are more often required
    o Evidence for this is rather anecdotal – not all would agree
    o To the extent that it is true, final analysis data sets tend to be large
    o These are the data sets accessed very frequently by large numbers of users … thus exacerbating the data management problem

In any case the extraction and the delivery of distilled data subsets to physicists for analysis currently most limits NP analyses

Page 12: Nuclear Physics Data Management Needs Bruce G. Gibbard

Grid / Data Management Issues

Major RHIC experiments are moving (have moved) complete copies of Summary Data to regional analysis centers
    o STAR: to LBNL via Grid Tools
    o PHENIX: to Riken via Tape/Airfreight
    o Evolution toward more sites and full dependence on Grid

RHIC, JLab, and NP at the LHC are all very interested and active in Grid development
    o Including high performance reliable Wide Area data movement / replication / access services
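The trade-off between network transfer and shipping tapes comes down to sustained rate versus volume. The sketch below is a generic calculation with made-up inputs: the 100 TByte volume, the 50 MBytes/sec network rate and the 7 day shipping time are hypothetical and do not describe the actual STAR or PHENIX transfers.

```python
def transfer_days(volume_tbytes: float, rate_mbytes_per_s: float) -> float:
    """Days to move a data set at a sustained wide-area rate (decimal units)."""
    seconds = volume_tbytes * 1e12 / (rate_mbytes_per_s * 1e6)
    return seconds / 86400.0

def effective_rate_mbytes_per_s(volume_tbytes: float, days: float) -> float:
    """Sustained rate equivalent to delivering a volume in a given number of days."""
    return volume_tbytes * 1e12 / (days * 86400.0) / 1e6

# Hypothetical example values, for illustration only.
volume = 100.0  # TBytes
print(f"Network at 50 MBytes/sec: {transfer_days(volume, 50.0):.1f} days")
print(f"Shipment arriving in 7 days is equivalent to "
      f"{effective_rate_mbytes_per_s(volume, 7.0):.0f} MBytes/sec sustained")
```

The same two functions can be used to ask what sustained wide-area rate a Grid service would need before it matches a given shipping turnaround.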

Page 13: Nuclear Physics Data Management Needs Bruce G. Gibbard

Conclusions

NP and HEP accelerator/detector experiments have very similar Data Management requirements

NP analyses of this type currently tend to be more Data than CPU limited

“Mining” of Summary Data and affording end users adequate access (both Local and Wide Area) to the resulting distillate currently most limits NP analysis

It is expected that this will remain the case for the next 4-6 years through
    o Upgrades of RHIC and JLab
    o Start-up of LHC
with Wide Area access growing in importance relative to Local access