TRANSCRIPT
BNL Wide Area Data Transfer for RHIC & ATLAS:
Experience and Plans
Bruce G. Gibbard
CHEP 2006, Mumbai, India
Introduction
• The scale of computing required by modern High Energy and Nuclear Physics experiments can't be met by single institutions, funding agencies, or even countries
• Grid computing, integrating widely distributed resources into a seamless facility, is the solution of choice
• A critical aspect of such Grid computing is the ability to move massive data sets over great distances in near real time
  • High bandwidth wide area transfer rates
  • Long term sustained operations
Specific Needs at Brookhaven
• HENP Computing at BNL
  • Tier 0 center for the Relativistic Heavy Ion Collider – RHIC Computing Facility (RCF)
  • US Tier 1 center for the ATLAS experiment at the CERN LHC – ATLAS Computing Facility (ACF)
• RCF requires data transfers to collaborating facilities
  • Such as the RIKEN center in Japan
• ACF requires data transfers from CERN and on to ATLAS Tier 2 centers (universities)
  • Such as Boston, Chicago, Indiana, Texas/Arlington
BNL Staff Involved
• Those involved in this work at BNL were members of the RHIC and ATLAS Computing Facility and of the PHENIX and ATLAS experiments
• Though not named here, there were of course similar contributing teams at the far ends of these transfers: CERN, RIKEN, Chicago, Boston, Indiana, Texas/Arlington
• M. Chiu, W. Deng, B. Gibbard, Z. Liu, S. Misawa, D. Morrison, R. Popescu, M. Purschke, O. Rind, J. Smith, Y. Wu, D. Yu
PHENIX Transfer of Polarized Proton Data to RIKEN Computing Facility in Japan
• Near real time
  • In particular, not to tape storage, so no added tape retrieval is required
  • Transfer should end very shortly after the end of the RHIC run
• Part of RHIC run in 2005 (~270 TB)
• Planned again for RHIC run in 2006
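As an illustration of this near-real-time, disk-to-disk style of operation, here is a minimal sketch of a push loop: it watches a spool directory and ships each new file with globus-url-copy. The host, paths, polling interval, and stream count are assumptions for illustration, not the actual PHENIX configuration.

    # Hypothetical sketch of a near-real-time push loop; hosts and paths
    # are invented, and a production system would also verify that each
    # file is complete before shipping it.
    import os
    import subprocess
    import time

    SPOOL = "/data/phenix/spool"                       # local spool area (assumed)
    DEST = "gsiftp://ccj.example.riken.jp/data/run6/"  # remote GridFTP door (assumed)

    def push(path):
        # -p 8: eight parallel TCP streams, a common choice on long fat pipes
        cmd = ["globus-url-copy", "-p", "8",
               "file://" + path, DEST + os.path.basename(path)]
        subprocess.run(cmd, check=True)

    seen = set()
    while True:
        for name in sorted(os.listdir(SPOOL)):
            path = os.path.join(SPOOL, name)
            if path not in seen and os.path.isfile(path):
                push(path)      # ship immediately, straight from disk...
                seen.add(path)  # ...so no tape retrieval is ever needed
        time.sleep(60)          # poll once a minute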
Typical Network Activity During PHENIX Data Transfer
For ATLAS, (W)LCG Exercises
• Service Challenge 3
  • Throughput phase (WLCG and computing sites develop, tune, and demonstrate data transfer capacities)
  • July '05
  • Rerun in Jan '06
• Service Challenge 4
  • To begin in April 2006
BNL ATLAS dCache/HPSS Based SE
[architecture diagram: a dCache system in front of HPSS tape storage; DCap, SRM, and GridFTP doors handle the control channel for DCap, SRM, and GridFTP clients (including the Oak Ridge batch system), while the data channel goes directly to the read and write pools; the PnfsManager and PoolManager coordinate the namespace and pools, and the write pools migrate data to HPSS]
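The doors in the diagram correspond to distinct URL schemes through which a client reaches the same file. Below is a sketch of the three access paths, using common dCache default ports; the host name and PNFS path are placeholders, not the actual BNL endpoints.

    # Illustrative only: one logical file, three access paths through
    # different dCache doors (ports shown are common dCache defaults).
    PNFS_PATH = "/pnfs/usatlas.bnl.gov/data/sc3/file1"  # assumed namespace path

    doors = {
        "dcap":    f"dcap://door.example.bnl.gov:22125{PNFS_PATH}",   # POSIX-like reads
        "gridftp": f"gsiftp://door.example.bnl.gov:2811{PNFS_PATH}",  # bulk WAN transfers
        "srm":     f"srm://door.example.bnl.gov:8443{PNFS_PATH}",     # managed transfers
    }

    for proto, url in doors.items():
        print(f"{proto:8s} -> {url}")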
Disk to Disk Phase of SC3
• Transfer rates of up to 150 MB/sec achieved during early standalone operations
• Even though FTS (the transfer manager) failed to properly support dCache SRMCP, degrading performance of the BNL Tier 1 dCache based storage element
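For context, this is roughly the kind of third-party srmcp copy that FTS was expected to drive between the CERN and BNL storage elements; both endpoints below are placeholders.

    # Sketch of a third-party SRM copy, the transfer mode FTS failed to
    # drive properly against dCache at the time. Endpoints are invented.
    import subprocess

    SRC = "srm://srm.example.cern.ch:8443/castor/cern.ch/sc3/file1"        # assumed
    DST = "srm://srm.example.bnl.gov:8443/pnfs/usatlas.bnl.gov/sc3/file1"  # assumed

    subprocess.run(["srmcp", SRC, DST], check=True)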
Overall CERN Operations During Disk to Disk Phase
• Saturation of the network connection at CERN required throttling of individual site performances
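One simple way to picture such throttling: scale every site's rate proportionally so the aggregate fits the saturated link. A toy calculation follows; the link capacity and per-site demands are invented for illustration.

    # Toy proportional throttling: shrink per-site rates so their sum
    # fits the saturated CERN uplink. All numbers are invented.
    LINK_CAPACITY = 1000.0  # MB/sec available at CERN (assumed)

    demand = {"site_a": 250.0, "site_b": 200.0, "site_c": 200.0,
              "site_d": 150.0, "site_e": 150.0, "site_f": 150.0}

    scale = min(1.0, LINK_CAPACITY / sum(demand.values()))
    for site, rate in demand.items():
        print(f"{site}: {rate:6.1f} -> {rate * scale:6.1f} MB/sec")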
Disk to Tape Phase
dCache Activity During Disk to Tape Phase
• Tape writing phase
  • Green indicates incoming data
  • Blue indicates data being migrated out to HPSS, the tape storage system
• Rates of 60-80 MBytes/sec were sustained
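A quick back-of-envelope check of what those rates imply: whenever data arrives on the write pools faster than the 60-80 MB/sec migrated to tape, the difference accumulates as pool occupancy. The incoming rate below is an assumed figure, not a measured one.

    # Back-of-envelope: how fast the write pools fill when incoming data
    # outpaces HPSS migration. The incoming rate is assumed.
    incoming = 100.0  # MB/sec arriving on write pools (assumed)
    to_tape = 70.0    # MB/sec migrated to HPSS (midpoint of observed 60-80)

    growth = incoming - to_tape        # MB/sec of net buffer growth
    per_day_tb = growth * 86400 / 1e6  # TB of pool space consumed per day
    print(f"Pools fill at {growth:.0f} MB/sec, about {per_day_tb:.1f} TB/day")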
SC3 T1 – T2 Exercises
• Transfers to 4 Tier 2 sites (Boston, Chicago, Indiana, Texas/Arlington) resulted in aggregate rates of up to 40 MB/sec, but typically ~15 MB/sec and quite inconsistent
• Tier 2 sites only supported GridFTP on classic storage elements and were not prepared to support sustained operations
Potential Network Contention
• BNL has been operating with an OC 48 ESnet WAN connection, with 2 x 1 Gb/sec connectivity over to the ATLAS/RHIC network fabric
• Competing flows:
  • Sustained PHENIX transfers to RIKEN CCJ
  • ATLAS Service Challenge tests
Network Upgrade
• ESnet OC48 WAN connectivity is being upgraded to 2 x 10 Gb/sec
• BNL site connectivity from the border router to the RHIC/ATLAS facility is being upgraded to redundant 20 Gb/sec paths
• Internally, in place of the previous channel bonding:
  • ATLAS switches are being redundantly connected at 20 Gb/sec
  • RHIC switches are being redundantly connected at 10 Gb/sec
• All will be complete by the end of this month
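A rough sanity check that the upgraded capacity covers the combined loads discussed above. The 200 MB/sec figure is the SC4 goal stated on the following slides; the PHENIX rate is an assumption.

    # Rough capacity check for the upgraded WAN path. The SC4 target is
    # stated on the following slides; the PHENIX rate is an assumption.
    def mbytes_to_gbits(mb_per_sec):
        return mb_per_sec * 8 / 1000.0  # MB/sec -> Gb/sec

    sc4_target = 200.0  # MB/sec, CERN disk to BNL tape (stated SC4 goal)
    phenix = 100.0      # MB/sec to RIKEN CCJ (assumed figure)

    needed = mbytes_to_gbits(sc4_target + phenix)
    wan = 2 * 10.0      # Gb/sec of upgraded ESnet connectivity
    print(f"Need ~{needed:.1f} Gb/sec of the {wan:.0f} Gb/sec available")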
RHIC/PHENIX Plans '06
• RHIC will run again this year with polarized protons, so the data will again be transferred to the RIKEN center in Japan
• Data taking rates will be somewhat higher, with a somewhat better duty factor, so the transfer may have to support rates as much as a factor of two higher
• Such running is likely to begin in early March
• Expect to use SRM for transfers, rather than just GridFTP, for additional robustness
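The extra robustness comes largely from the queuing and retry behavior an SRM layer adds over bare GridFTP. A minimal sketch of the kind of retry wrapper involved; the command line and retry policy here are illustrative assumptions.

    # Sketch: transfer with retries and backoff. An SRM layer provides
    # much of this itself; bare GridFTP clients must script it by hand.
    import subprocess
    import time

    def transfer_with_retries(src, dst, attempts=3, backoff=60):
        for i in range(1, attempts + 1):
            if subprocess.run(["srmcp", src, dst]).returncode == 0:
                return True
            time.sleep(backoff * i)  # wait longer after each failure
        return False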
WLCG Service Challenge 4
• Service challenge transfer goals are the nominal real transfer rates required by ATLAS to the US Tier 1 in the first years of LHC operation
  • 200 MB/sec (disk at CERN to tape at BNL)
• Disk to Disk to begin in April, with Disk to Tape to follow as soon as possible
  • BNL Tier 1 expects to be ready with its new tape system in April to do Disk to Tape
• BNL is planning on being able to use dCache SRMCP in these transfers
• Tier 2 exercises at a much more serious level are anticipated, using dCache/SRM on storage elements
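For a sense of scale, the stated 200 MB/sec goal translates directly into daily and monthly volumes:

    # Volume implied by the stated SC4 goal of 200 MB/sec sustained.
    rate = 200.0                  # MB/sec, disk at CERN to tape at BNL
    per_day = rate * 86400 / 1e6  # TB per day
    print(f"{rate:.0f} MB/sec sustained is about {per_day:.1f} TB/day "
          f"({per_day * 30 / 1000:.1f} PB/month)")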
Conclusions
• Good success to date in both ATLAS exercises and RHIC real operations
• A new round with significantly higher demands comes within the next 1-2 months
• Upgrades of the network, storage elements, tape systems, and storage element interfacing should make it possible to satisfy these demands