
Page 1: Data Handling at Fermilab and Plans for Worldwide Analysis

Vicky White
Computing Division and D0 Experiment, Fermilab

Page 2: Outline

(I) -- Solutions already implemented and in use today - HEP experiments, Sloan Digital Sky Survey, theorists' Lattice Gauge computation; operational experience with the Mass Storage Component

(II) -- Solutions being implemented for Collider Run II with upgraded detectors (March 2001); building and testing data handling solutions for CDF and D0

(III) -- Moving onwards - to the future: SDSS and NSF KDI, SANs, Particle Physics Data Grid, MONARC and planning for CMS

(IV) -- Conclusions

Page 3: (I) Solutions already implemented

(the hierarchical mass storage component of them)

Page 4: The ‘Old’ central mass storage system

Page 5:

FMSS quota usage updated at Sat Feb 5 01:00:00 CST 2000

Group     Exp       FMSS Quota (KB)       Used (KB)
======    ======    ================      ================
g022      g022          40,000,000.0          11,057,801.7
ktev      ktev       6,000,000,000.0       5,615,922,181.0
sdss      sdss       2,000,000,000.0       1,985,392,386.0
canopy    canopy     5,200,000,000.0       5,033,291,488.0
mssg      mssg       1,000,000,000.0           4,800,833.0
e781      e781       3,000,000,000.0       2,911,541,573.0
e831      e831       2,000,000,000.0       1,712,487,988.0
minos     minos        250,000,000.0          23,601,888.0
cosmos    cosmos       100,000,000.0                   0.0
e740      e740       4,000,000,000.0       4,637,814,378.0
cms       cms          800,000,000.0         662,489,480.0
auger     auger        150,000,000.0          80,381,914.0
btev      btev         200,000,000.0         116,132,322.0
e791      e791         300,000,000.0         260,439,481.0
e866      e866         200,000,000.0           4,801,515.0
e815      e815         400,000,000.0         373,635,826.0
hppc      hppc         914,400,000.0         170,923,724.0
e811      e811          50,000,000.0          19,215,485.0
e872      e872          75,000,000.0          64,752,717.0
theory    theory       102,400,000.0          55,218,237.0
e665      e665          20,480,000.0           4,737,660.0

Page 6:

Page 7: (II) Building and testing data handling solutions for CDF and D0

the Problem
the Solutions - what and how

dealing with a worldwide collaboration

Page 8: Run II - Petabytes Storage and Data Access problem

Category        Parameter                        D0                      CDF

DAQ rates       Peak rate                        53 Hz                   75 Hz
                Avg. event size                  250 KB                  250 KB
                Level 2 output                   1000 Hz                 300 Hz
                Max can log                      Scalable                80 MB/s

Data storage    # of events                      600 M/year              900 M/year
                RAW data                         150 TB/year             250 TB/year
                Reconstructed data tier          75 TB/year              135 TB/year
                Physics analysis summary tier    50 TB/year              79 TB/year
                Micro summary                    3 TB/year               -

CPU             Recons/event                     1000-2500 MIPS.s/ev     1200 MIPS.s/ev
                Reconstruction                   34,000-83,000 MIPS      56,000 MIPS
                Analysis                         60,000-80,000 MIPS      90,000 MIPS
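As an illustrative back-of-the-envelope check (using only the D0 column of the table above), the RAW data volume and the peak logging bandwidth follow directly from the event size and the rates:

```python
# Back-of-the-envelope check of the Run II volume numbers above (D0 column).
# The figures are taken from the table; the arithmetic is only illustrative.

EVENT_SIZE_BYTES = 250e3      # average event size, 250 KB
EVENTS_PER_YEAR = 600e6       # D0: 600 M events/year
PEAK_RATE_HZ = 53             # D0 peak DAQ rate

raw_per_year_tb = EVENT_SIZE_BYTES * EVENTS_PER_YEAR / 1e12
peak_logging_mb_s = EVENT_SIZE_BYTES * PEAK_RATE_HZ / 1e6

print(f"RAW data per year : {raw_per_year_tb:.0f} TB")      # ~150 TB/year
print(f"Peak logging rate : {peak_logging_mb_s:.1f} MB/s")  # ~13 MB/s at 53 Hz
```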

Page 9: The CDF Detector

Page 10: Key Elements of Run II Data Handling

What is our overall strategy for the DH system?

How do we physically organize the data?

On what do we store it - where?

How do we migrate between parts of the storage hierarchy ?

How do we provide intelligent and controlled access for large numbers of scientists?

… and track all the processing steps

How do we make it scalable, robust, available?

How do we work with the data at remote sites?

What are we learning for the next generation expts?

Page 11: Past and Present Strategies for data processing/data handling

Use ‘commodity’ components where possible: inexpensive CPUs in ‘farms’ for reconstruction processing (e.g. PCs); inexpensive (if somewhat unreliable) tape drives and media

Multi-vendor: IBM, SGI, DEC, SUN, Intel PCs

Use much Open Source software (Linux, GNU, tcl/tk, python, apache, CORBA implementations…)

Hierarchy of active data stores: disk, tape in robot, tape on shelf

Careful placement and categorization of data on the physical medium to optimize for future access patterns

Page 12: Processing Farm

Page 13:

Page 14: Several Processing Farms

Page 15: D0 Data Access (Read and Write) Abstraction

[Diagram: Online Data Acquisition Computers, Reconstruction Processing Farms of Computers, Database Servers and Analysis Computers connected through Network Fabric(s) and Data Movers to Tape Robot(s) and Tape Shelves]

Key factors:
a) Organization of data on tape to match access
b) Understanding and controlling access patterns
c) Disk caches for most frequently accessed data
d) Management of pass-through data disk buffers
e) Rate-adapting disk buffers where necessary
f) Scalability and robustness
g) Bookkeeping and more bookkeeping…
h) Distributed client/servers ---> worldwide solutions

Page 16: Designing for 200 MB/s in/out Robot

Page 17: D0 - the Real System

Page 18: CDF Run II Data Flow

[CDF Run II data flow diagram. Labelled rates and sizes: 75 Hz, 20 Mbytes/sec from the DAQ over a FiberChannel connection; write ~50 datasets at 10 Mbytes/sec; read primary datasets / write secondary datasets; read RAW data at 20 Mbytes/sec; read data at 1.5 Gbytes/sec; 150 Mbytes/sec; 30 Terabytes]

Page 19:

Page 20: Key Elements of Run II Data Handling

What is our overall strategy for the DH system?

How do we physically organize the data?

On what do we store it - where?

How do we migrate between parts of the storage hierarchy ?

How do we provide intelligent and controlled access for large numbers of scientists?

… and track all the processing steps

How do we make it scalable, robust, available?

How do we work with the data at remote sites?

What are we learning for the next generation expts?

Page 21: Run II Data Access - strategies

data content for an event from different processing stages stored in different physical collections

‘tiers’ of data of different sizes and content - RAW, fully reconstructed, summary reconstructed, highly condensed summary, ntuples and meta-data

primarily file-oriented access mechanisms: fetch a whole collection of event data (i.e. 1 file ~ 1 GB), read through and process it sequentially

optimize traversal of data & control access based on physics & user - not on file system

use relational databases (Oracle centrally) for file and event catalogs and other ‘detector conditions’ and calibration data (0.5 - 1 TB)

import simulated data (files and tapes) from MC

Page 22: Key Elements of Run II Data Handling

What is our overall strategy for the DH system?

How do we physically organize the data?

On what do we store it - where?

How do we migrate between parts of the storage hierarchy ?

How do we provide intelligent and controlled access for large numbers of scientists?

… and track all the processing steps

How do we make it scalable, robust, available?

How do we work with the data at remote sites?

What are we learning for the next generation expts?

Page 23: Data Tiers for a single Event

RAW detector measurements                                   250 KB
Reconstructed Data - Hits, Tracks, Clusters, Particles     ~350 KB
Summary Physics Objects                                     50-100 KB
Condensed summary physics data                              5-15 KB
Data Catalog entry                                          ~200 B

Page 24: Data Streams and Data Tiers

Page 25: Streaming the Data - optimize for data access traversal

Up-front physical data organization and clustering

Multiple streams written and read in parallel

Streams are physics based, unlike disk striping

D0 approach to streaming the data

Page 26: CDF Data Streaming

also separate data into many physical streams

not ‘exclusive’ streams - data may be written to multiple physical streams
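Purely as an illustration of non-exclusive streaming (this is not CDF code; the stream names and trigger bits below are invented), each event can be appended to every physical stream whose trigger selection it satisfies, so streams overlap:

```python
# Illustrative sketch of non-exclusive streaming: an event is appended to
# every physical stream whose trigger selection it fires, so streams overlap.
# Stream names and trigger bits here are made up for the example.

STREAM_TRIGGERS = {
    "stream_A_highpt_electron": {"EM_HI", "EM_MED"},
    "stream_B_muon":            {"MU_HI"},
    "stream_C_jets":            {"JET_85", "JET_50"},
}

def route_event(fired_triggers, writers):
    """Append the event to every stream whose trigger set overlaps the fired
    triggers; return the list of streams it was written to."""
    written = []
    for stream, wanted in STREAM_TRIGGERS.items():
        if wanted & set(fired_triggers):
            writers[stream].append(fired_triggers)   # stand-in for a real file write
            written.append(stream)
    return written

writers = {name: [] for name in STREAM_TRIGGERS}
print(route_event({"EM_HI", "JET_85"}, writers))  # -> written to streams A and C
```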

Page 27: Access to Objects - OO design

C++ reconstruction and analysis programs: fully object-oriented design - STL, templates, reference-counted pointers (D0); OO data model like an OODBMS - persistent objects inherit from a persistent class

Objects and collections of objects are stored persistently to disk and tape, ‘flattened’ out to files in special HEP formats

The d0om persistency package for D0 supports various external ‘flattened’ formats, including a relational database, and allows for the possibility of storing some ‘tiers’ of the data in an OO database if proven useful

ROOT (HEP analysis package) file format for CDF

Schema evolution can be tailored to need

Page 28: Object Databases/Strategies and Choices

An OODB more or less adopts the following ideas:

Objects represent entities and concepts from the application domain. Their behavior is defined by their associated methods, which can also be stored in the DB, thus making them ‘universal’ and available

Hierarchies of classes inherit behavior; to avoid storage of redundant information and improve simplicity, similar objects are grouped together

GOAL--- have full database capability available to any object which can be created in any (supported) language

DREAM -- minimum of work to store an object + DB provides query, security, integrity, backup, concurrency control, redundancy + has the performance of a hand-tuned object manager for your particular application

The “Natural Laws” of data storage and retrieval:

• The more I know about the data, the more likely and the faster it can be found
• The sooner I know what you want, the faster you will get it
• The less variety in the data you have, the more opportunities for optimization
• The less often you restructure the data, the less overhead in keeping track of it
• The more people, from more places, who want access to the data, the tougher the problem of serving them
• The more often you want to ask the same questions, the easier it will be to optimize for those ‘queries’
• It will be much faster to “give you what you stored” than to find some new pattern contained in several “things” that you stored
• The more complicated the pattern you search for, the longer the search will take

Page 29: Object Lessons - Tom Love

characterization and “Natural Laws” from the book

Object Lessons - Lessons Learned in Object Oriented Development Projects by Tom Love

“You can never achieve maximum performance with a system designed for maximum flexibility”

CDF and D0 both chose performance over flexibility - at least for the bulk of the data

Page 30: Key Elements of Run II Data Handling

What is our overall strategy for the DH system?

How do we physically organize the data?

On what do we store it - where?

How do we migrate between parts of the storage hierarchy ?

How do we provide intelligent and controlled access for large numbers of scientists?

… and track all the processing steps

How do we make it scalable, robust, available?

How do we work with the data at remote sites?

What are we learning for the next generation expts?

Page 31: Serial Media Working Group Report

Technology   Vendor      $/drive   $/media   Size (GB)   $/GB    MB/s (str/ran)
Redwood      Stor.Tek    80k       78        46.6        1.67    10/4
DD-2         Ampex       72k       72        46.6        1.54    14/8
3590         IBM         30k       54        10          5.4     8/5
DTF          Sony        30k       80        42          1.9     12/6
Eliant       Exabyte     1.7k      5.4       6.5         0.83    1/0.9
DLT7000      Quantum     5.5k      80        32.6        2.45    5/2
EXB-8900     Exabyte     3.5k      72        20          3.6     3/2
AIT-1        Sony        3.0k      72        25          2.88    3/2.7

Conclusions for Run II: decide in 1999, maintain options and flexibility, purchase a multi-drive capable robot => Grau/EMASS (2000)
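The $/GB column is simply the media price divided by the cartridge capacity; a quick illustrative re-computation of a few rows:

```python
# Recompute the $/GB column of the Serial Media table from $/media and
# cartridge capacity; the input values are the ones quoted above.
media = {
    "Redwood": (78, 46.6),
    "3590":    (54, 10.0),
    "DLT7000": (80, 32.6),
    "AIT-1":   (72, 25.0),
}
for name, (price_usd, capacity_gb) in media.items():
    print(f"{name:8s} ${price_usd / capacity_gb:.2f}/GB")
# Redwood $1.67/GB, 3590 $5.40/GB, DLT7000 $2.45/GB, AIT-1 $2.88/GB
```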

Page 32: EMASS AML2 Robot - flexible media - up to 5000 cartridges per tower

One for each - CDF and D0

Page 33: Key Elements of Run II Data Handling

What is our overall strategy for the DH system?

How do we physically organize the data?

On what do we store it - where?

How do we migrate between parts of the storage hierarchy ?

How do we provide intelligent and controlled access for large numbers of scientists?

… and track all the processing steps

How do we make it scalable, robust, available?

How do we work with the data at remote sites?

What are we learning for the next generation expts?

Page 34: Storage Management Software Requirements

Robot Tape Library is not an archive but rather an active store. We therefore need to:

control placement of data on tapes
write RAW data from DAQ reliably and with absolute priority
exchange tapes frequently between robot and shelf
use open tape format and provide packages to read/write tapes
mark files and groups of files as read only
control robot arm and tape bandwidth according to access mode, project, user, etc.
keep the system up 24x7
access files from many different vendor machines, including PC/Linux, without software licensing issues

Unable to assure ourselves that necessary HPSS modifications and enhancements would be available for Fall ‘99, Fermilab decided to build a more agile and flexible system modeled on that of DESY.

Page 35: ENSTORE storage management for Run II

[Enstore architecture diagram: a client issues  encp [options] <source> <destination>;  control flows through the Enstore servers while the data path goes through a Mover, using the ftt tape I/O package, to the media; a PNFS server host (from DESY) provides the namespace (/pnfs, with admin and usr branches) - PNFS = ‘Perfectly Normal File System’; Enstore is a replacement for OSM]

Page 36: ENSTORE system in operation today

Data Catalog support for 1 million files and 16,000 volumes tested

Integrated Fermilab-written tape I/O package - ftt - for tape handling; supports error handling and statistics

Scalability looks good -- achieved 20 MB/sec into an Origin 2000 using just one Gbit Ethernet, also into the Farms. Would have produced graphs of 50 MB/sec with 3 Gbit Ethernets if a Cisco switch had not broken

Working on robustness -- mainly of the hardware

Because of their overall strategy for tape and disk - planning for Storage Area Networks for disk, and preferring directly connected, separate, tape drives for their Farms and Central Analysis Server - CDF do not use Enstore.

CDF has built its own tape staging package based on mt_tools and the same underlying ftt tape I/O package

Page 37: GB/day - 3494 Robot + HPSS

[Chart: Fermilab Central Mass Storage System utilization - gigabytes transferred per day, March 16th - August 24th, 1998. Shows daily GB written and read (0-1500 GB/day) and the I/O rate (0-16 MB/sec). Average is 3 MB/sec; maximum sustained 23 MB/sec.]

Page 38: Some recent Enstore statistics from the web

http://www-d0en.fnal.gov/enstore

Page 39: All the Enstore mover nodes kept busy by the D0 SAM data access system

Page 40: CDF disk to tape

We have invented a “poor man’s” SAN for read-only disk in a heterogeneous environment

Suitable for static datasets that change infrequently

Use the ISO-9660 file system used by CD-ROMs.

We have verified that the UNIX systems of interest (SGI,SUN) are able to format a disk using the ISO-9660 format, put data on it and read the data from multiple systems

Page 41: Key Elements of Run II Data Handling

What is our overall strategy for the DH system?

How do we physically organize the data?

On what do we store it - where?

How do we migrate between parts of the storage hierarchy ?

How do we provide intelligent and controlled access for large numbers of scientists?

… and track all the processing steps

How do we make it scalable, robust, available?

How do we work with the data at remote sites?

What are we learning for the next generation expts?

Page 42: Experiment Data Access Software

Define collection of data to be processed: specify by data tier, data stream, triggers, run ranges, specific files or events, etc.

Resolve to a list of files: use the Oracle relational database query engine

Intelligent movement of data: optimize traversal of data; regulate use of the robot for different purposes and access modes; implement disk cache retention policies

• SAM (Sequential Access Model) system for D0
• CDF’s smallest unit is a Fileset and they use only this to optimize tape access and minimize robot arm use
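As an illustration only of the "resolve to a list of files" step (the real D0/CDF catalog schemas are not shown here; the table and column names below are invented, and SQLite stands in for the production Oracle server):

```python
# Hypothetical sketch of resolving a dataset definition to a list of files
# against a relational file catalog. The table/column names (data_files,
# data_tier, stream, run_number) are invented for illustration only.
import sqlite3   # stand-in for the Oracle server used in production

def resolve_files(conn, tier, stream, run_min, run_max):
    cur = conn.execute(
        "SELECT file_name FROM data_files "
        "WHERE data_tier = ? AND stream = ? AND run_number BETWEEN ? AND ? "
        "ORDER BY run_number",
        (tier, stream, run_min, run_max),
    )
    return [row[0] for row in cur]

# Tiny in-memory demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_files (file_name TEXT, data_tier TEXT, stream TEXT, run_number INTEGER)")
conn.executemany("INSERT INTO data_files VALUES (?,?,?,?)",
                 [("raw_emu_0001.dat", "RAW", "emu", 1001),
                  ("raw_emu_0002.dat", "RAW", "emu", 1002),
                  ("dst_emu_0001.dat", "DST", "emu", 1001)])
print(resolve_files(conn, "RAW", "emu", 1000, 1010))   # -> two RAW files
```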

Page 43: D0 SAM - CORBA based framework

Networked Clients

Servers

ENSTORE - Robot, Tape Drives and Movers

Global Optimizer for Robot File Fetching and Regulator of Robot/Tape Access according to Access Pattern

“Stations” -- logical or physical grouping of resources

Page 44: SAM in use

simple command line interface (+ some GUIs and web browsers) e.g.

sam define project --defname=myproject -- …
sam store --filename=xxxx --descrip=metadata-file …

transparently integrated into D0 framework and d0om file name expanders

one consumer can have many processes all helping ‘consume’ delivered files -- supports Farm production processing without additional bookkeeping (see the sketch after this list)

distributed disk caches and various ‘physics group driven’ caching policies
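A generic sketch of the "one consumer, many helper processes" idea (this is not SAM code; the delivered file names and the processing function are placeholders):

```python
# Generic sketch of one logical consumer with several helper processes.
# Not SAM code: the delivered-file list and process_file() are placeholders.
from multiprocessing import Pool

DELIVERED_FILES = [f"/cache/station1/mcrun_{i:04d}.dat" for i in range(8)]  # hypothetical paths

def process_file(path):
    # Stand-in for the real analysis/reconstruction job run on one file.
    return (path, "ok")

if __name__ == "__main__":
    with Pool(processes=4) as pool:            # four workers share one consumer's files
        for path, status in pool.imap_unordered(process_file, DELIVERED_FILES):
            print(f"{path}: {status}")         # one central place to record the bookkeeping
```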

Page 45: 1000+ Monte Carlo Files stored using SAM - reading them back

Page 46: CDF Data Access - Stagers and Disk Inventory Manager

Resource Management using Batch system and static number of tape drives

File Caching

Page 47: Key Elements of Run II Data Handling

What is our overall strategy for the DH system?

How do we physically organize the data?

On what do we store it - where?

How do we migrate between parts of the storage hierarchy ?

How do we provide intelligent and controlled access for large numbers of scientists?

… and track all the processing steps

How do we make it scalable, robust, available?

How do we work with the data at remote sites?

What are we learning for the next generation expts?

Page 48: Example File and Event Catalog for Run II

Oracle 8 database ==> 0.5 - 1 TB for D0, including detector run conditions and calibration data

1.8 x 10^9 event metadata entries, bit indexes, own data types; several million file entries

Oracle Network sitewide licence - now on Linux too

SAM system using a CORBA interface between components, including to database servers

CDF user processes consult directly with database

Data files catalogued and related to: runs and run conditions; luminosity information about the accelerator; the processes which produced (and consumed) the data; detector geometry, alignment and calibration data
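A rough consistency check, using the ~200 B per-event catalog entry quoted on the data-tiers slide (illustrative arithmetic only):

```python
# Rough sizing of the event catalog: ~1.8e9 event entries at ~200 bytes each.
# Figures come from the slides above; this is only a sanity check.
ENTRIES = 1.8e9
BYTES_PER_ENTRY = 200
catalog_tb = ENTRIES * BYTES_PER_ENTRY / 1e12
print(f"Event catalog alone: ~{catalog_tb:.2f} TB")
# ~0.36 TB - consistent with a 0.5-1 TB database once indexes, file entries
# and detector conditions/calibration data are added.
```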

Page 49:

Page 50: Persistent Data for all the behaviors of the system and the data itself

Page 51: Key Elements of Run II Data Handling

What is our overall strategy for the DH system?

How do we physically organize the data?

On what do we store it - where?

How do we migrate between parts of the storage hierarchy ?

How do we provide intelligent and controlled access for large numbers of scientists?

… and track all the processing steps

How do we make it scalable, robust, available?

How do we work with the data at remote sites?

What are we learning for the next generation expts?

Page 52: Scalability, Robustness, Availability

SAM is now starting serious stress testing for high throughput, high availability and good error handling

Database used to store context for recovery in SAM

We are learning!

“Oracle 24X7 - Real World Approaches to Ensuring Database Availability”

-- need to start to think like this

Page 53: Key Elements of Run II Data Handling

What is our overall strategy for the DH system?

How do we physically organize the data?

On what do we store it - where?

How do we migrate between parts of the storage hierarchy ?

How do we provide intelligent and controlled access for large numbers of scientists?

… and track all the processing steps

How do we make it scalable, robust, available?

How do we work with the data at remote sites?

What are we learning for the next generation expts?

Page 54: Data From Remote sites -- IN2P3, Nikhef, Prague, Texas…

[Diagram: data flow from remote sites into SAM and Enstore - project request / file request / data transfer / volume info between SAM and Enstore (synchronous access); file import carries SAM metadata export / SAM metadata; tape import and tape export carry Enstore metadata]

Page 55: Access from Remote Sites

SAM designed as a distributed caching system - a file can have multiple locations in the database; can use the central Fermilab database -- or extracts in a local Linux Oracle server (see the sketch below)

CDF expects to have local versions of their DH system running at non-Fermilab institutions
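Purely as an illustration of the "multiple locations per file" idea (not the SAM schema; the file and site names are invented), a replica catalog can map each logical file to the sites holding a copy and prefer a local one:

```python
# Toy replica catalog: one logical file name -> several physical locations.
# Site and file names are invented; SAM's real catalog lives in Oracle.
REPLICAS = {
    "reco_run1001_part01.dat": ["fnal-enstore", "in2p3-disk", "nikhef-disk"],
    "reco_run1001_part02.dat": ["fnal-enstore"],
}

def pick_location(lfn, local_site):
    """Prefer a replica at the local site; otherwise fall back to any other copy."""
    sites = REPLICAS.get(lfn, [])
    if local_site in sites:
        return local_site
    return sites[0] if sites else None

print(pick_location("reco_run1001_part01.dat", "in2p3-disk"))   # -> in2p3-disk
print(pick_location("reco_run1001_part02.dat", "in2p3-disk"))   # -> fnal-enstore
```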

Page 56: Key Elements of Run II Data Handling

What is our overall strategy for the DH system?

How do we physically organize the data?

On what do we store it - where?

How do we migrate between parts of the storage hierarchy ?

How do we provide intelligent and controlled access for large numbers of scientists?

… and track all the processing steps

How do we make it scalable, robust, available?

How do we work with the data at remote sites?

What are we learning for the next generation expts?

Page 57: Lessons for future experiments?

Draw your own conclusions so far

We will tell you next year!

Page 58: (III) Moving onwards - to the future

SDSS and NSF/KDI proposal
Storage Area Networks?
Particle Physics Data Grid
CMS and Worldwide Collaboration
Next generation Storage Systems?

Page 59: The Sloan Digital Sky Survey

A project run by the Astrophysical Research Consortium (ARC)

Goal: To create a detailed multicolor map of the Northern Sky over 5 years, with a budget of approximately $80M

Data Size: 40 TB raw, 1 TB processed

The University of Chicago, Princeton University, The Johns Hopkins University, The University of Washington, Fermi National Accelerator Laboratory, US Naval Observatory, The Japanese Participation Group, The Institute for Advanced Study, Max Planck Inst. Heidelberg

SLOAN Foundation, NSF, DOE, NASA

Page 60: SDSS Data Flow

Page 61: Geometric Indexing

“Divide and Conquer” Partitioning (3 / N / M)

Hierarchical Triangular Mesh

Split as k-d tree, stored as r-tree of bounding boxes

Using regular indexing techniques

Attributes           Number
Sky Position         3
Multiband Fluxes     N = 5+
Other                M = 100+
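As a rough sketch of the Hierarchical Triangular Mesh idea (a simplified illustration, not the SDSS HTM library): the sphere is split into 8 spherical triangles, each triangle is recursively subdivided at its edge midpoints, and the index of a sky position is the path of triangles containing it.

```python
# Simplified Hierarchical-Triangular-Mesh-style sky index; an illustrative
# sketch of the idea only, not the SDSS HTM library.
import numpy as np

def _norm(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# The 8 faces of an octahedron, each vertex triple ordered counterclockwise
# as seen from outside the sphere.
_X, _Y, _Z = np.eye(3)
ROOTS = [(_Z, _X, _Y), (_Z, _Y, -_X), (_Z, -_X, -_Y), (_Z, -_Y, _X),
         (-_Z, _Y, _X), (-_Z, -_X, _Y), (-_Z, -_Y, -_X), (-_Z, _X, -_Y)]

def _contains(tri, p, eps=1e-12):
    # p lies inside a spherical triangle if it is on the inner side of the
    # three great-circle planes through consecutive vertex pairs.
    v0, v1, v2 = tri
    return (np.dot(np.cross(v0, v1), p) >= -eps and
            np.dot(np.cross(v1, v2), p) >= -eps and
            np.dot(np.cross(v2, v0), p) >= -eps)

def _children(tri):
    # Split a triangle into 4 by the edge midpoints, pushed back onto the sphere.
    v0, v1, v2 = tri
    w0, w1, w2 = _norm(v1 + v2), _norm(v0 + v2), _norm(v0 + v1)
    return [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]

def htm_index(direction, depth=5):
    """Path of triangle numbers (root face, then child 0-3 at each level)."""
    p = _norm(direction)
    path, tri = None, None
    for root_id, root in enumerate(ROOTS):
        if _contains(root, p):
            path, tri = [root_id], root
            break
    for _ in range(depth):
        for i, child in enumerate(_children(tri)):
            if _contains(child, p):
                path.append(i)
                tri = child
                break
    return path

print(htm_index([0.3, 0.4, 0.87]))   # nearby sky positions share a common index prefix
```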

Page 62: SDSS Data Products

All raw data saved in a tape vault at Fermilab

Object catalog             400 GB    parameters of >10^8 objects
Redshift Catalog           1 GB      parameters of 10^6 objects
Atlas Images               1.5 TB    5 color cutouts of >10^8 objects
Spectra                    60 GB     in a one-dimensional form
Derived Catalogs           20 GB     clusters, QSO absorption lines
4x4 Pixel All-Sky Map      60 GB     heavily compressed

Page 63: SDSS Distributed Collaboration

[Map: SDSS distributed collaboration sites - Japan, Fermilab, U. Washington, U. Chicago, USNO, JHU, NMSU, Apache Point Observatory, Institute for Advanced Study, Princeton U. - linked via VBNS and ESNET]

Page 64: NSF/KDI -- Analysis Data Grid

Analysis Data Grid: a collaboration proposal to the NSF KDI program by JHU, Fermilab and Caltech (H. Newman, J. Bunn) + Objectivity, Intel and Microsoft (Jim Gray)

Involves computer scientists, astronomers and particle physicists

Accessing Large Distributed Archives in Astronomy and Particle Physics

experiment with scalable server architectures,

create middleware of intelligent query agents,

apply to both particle physics and astrophysics data sets

Status: 3-year proposal just funded

Page 65:

http://grid.fnal.gov/ppdg

Page 66: Initial Testbed Applications

Bulk Transfer Service: 100 Mbytes/s, 100 Tbytes/year
[Diagram: Primary Site (Data Acquisition, CPU, Disk, Tape-Robot) linked to a (partial) Replica Site (CPU, Disk, Tape-Robot) by a High-Speed Site-to-Site File Replication Service]

Multi-Site Cached File Access
[Diagram: Primary Site (Data Acquisition, CPU, Disk, Tape-Robot), several Satellite Sites (CPU, Disk, Tape-Robot) and Universities (CPU, Disk, Users)]

Page 67: Bulk file transfer testbed -- Focuscopy

[Diagram: bulk file transfer testbed between Fermilab and Indiana U. over MREN - disk cache at each end, metadata catalog (SAM: file location, statistics, engine status), HPSS tape library, operator-mounted Exabyte tapes]

Page 68: Bulk file transfer testbed -- awaits ESNet research network and QOS

Page 69: Distributed Cache - combining SAM and Condor

Matchmaking: Distributed Resource Management for High Throughput Computing, Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing, July 28-31, 1998, Chicago, IL.

next project --- Objectivity database caching with Caltech and ANL?

Page 70: Storage Area Networks (SANs) - where are we?

Heterogeneous cluster of machines - not locked to one vendor, competitive bids for computing.

Requires high bandwidth access to shared disk storage to work effectively - NFS and AFS not sufficiently high performance.

Use Fiber Channel as the physical layer and run SCSI over it

Unfortunately read/write to Fiber Channel disks in a heterogeneous environment is not currently available at an affordable cost

Proposal from Quantum Research -- unfunded

Page 71: 1st phase of research was quite successful

Page 72: LHC Experiment and other future experiment Data Access Architectures

Scale and complexity: numbers of detector channels, number of participants and geographic dispersion, complexity of collisions

Network bandwidth hopes: distributed store of data, rather than data replication

Hierarchical Storage Systems evolution: HPSS collaboration - Fermilab continues involvement; CERN/DESY/Fermilab/Eurostore?

Disk availability/price: all data on disk? => random access to sub-parts of events, with less attention to clustering of data on the physical medium

Object oriented database technology: find the right places for it

Page 73:

Page 74: MONARC Analysis Model Baseline: ATLAS or CMS “Typical” Tier1 RC

CPU Power                      ~100 KSI95
Disk space                     ~100 TB
Tape capacity                  300 TB, 100 MB/sec
Link speed to Tier2            10 MB/sec (1/2 of 155 Mbps)
Raw data                       1%      10-15 TB/year
ESD data                       100%    100-150 TB/year
Selected ESD                   25%     5 TB/year [*]
Revised ESD                    25%     10 TB/year [*]
AOD data                       100%    2 TB/year [**]
Revised AOD                    100%    4 TB/year [**]
TAG/DPD                        100%    200 GB/year
Simulated data (repository)    25%     25 TB/year

[*] Covering five analysis groups, each selecting ~1% of total ESD or AOD data for a typical analysis
[**] Covering all analysis groups

Page 75: MONARC Testbed Systems

Page 76: Regional Center Architecture - Example by I. Gaines

[Regional Center architecture diagram: networks from CERN and from Tier 2 & simulation centers feed Tape Mass Storage & Disk Servers and Database Servers. Support services: Physics Software Development, R&D Systems and Testbeds, Info servers, Code servers, Web Servers, Telepresence Servers, Training, Consulting, Help Desk. Processing: Production Reconstruction (Raw/Sim -> ESD; scheduled, predictable; experiment/physics groups), Production Analysis (ESD -> AOD, AOD -> DPD; scheduled; physics groups), Individual Analysis (AOD -> DPD and plots; chaotic; physicists' desktops). Connections out to Tier 2 centers, local institutes, CERN and tapes.]

Page 77: UF Equipment Plan for FY00

R&D and User support, UF Hardware:

disk storage: 1 TByte - large disk pool for ODBMS testbeds and data analysis (Monte Carlo and test beam data)

tape storage: up to 10 TByte - provide several TB of storage for MC and test beam data; set up ODBMS testbed; start using Objectivity + mass storage system in analysis; provide data import and export facility

CPU resources: 30 node Linux cluster - increase main server; form production unit for MC production; PC analysis cluster and dedicated special purpose R&D systems

Network infrastructure - provide sufficient LAN capacity; provide WAN connectivity for production and testbed activities

Page 78: Conclusions

If you have a lot of data to manage and access today you must:

think carefully about how you store it, how you wish to access it, and how you will control access to it

be aware of media costs

design a system for robustness and uptime (especially if you use relatively inexpensive tape media)

design a system for active and managed access to all hierarchies of storage - disk, tape in robot & tape on shelf

For the next generation of experiments:

we hope for better network bandwidth and a truly distributed system

we investigate OO databases for their potential to provide random access to sub-parts of event data

Page 79: THE END