TRANSCRIPT
BNL Wide Area Data Transfer for RHIC & ATLAS:
Experience and Plans
Bruce G. Gibbard
CHEP 2006, Mumbai, India
Introduction
• The scale of computing required by modern High Energy and Nuclear Physics experiments can't be met by single institutions, funding agencies, or even countries
• Grid computing, integrating widely distributed resources into a seamless facility, is the solution of choice
• A critical aspect of such Grid computing is the ability to move massive data sets over great distances in near real time
  • High bandwidth wide area transfer rates
  • Long term sustained operations
Specific Needs at Brookhaven
• HENP Computing at BNL
  • Tier 0 center for the Relativistic Heavy Ion Collider – RHIC Computing Facility (RCF)
  • US Tier 1 center for the ATLAS experiment at the CERN LHC – ATLAS Computing Facility (ACF)
• RCF requires data transfers to collaborating facilities
  • Such as the RIKEN center in Japan
• ACF requires data transfers from CERN and on to ATLAS Tier 2 centers (universities)
  • Such as Boston, Chicago, Indiana, Texas/Arlington
BNL Staff Involved
• Those involved in this work at BNL were members of the RHIC and ATLAS Computing Facility and of the PHENIX and ATLAS experiments
• Though not named here, there were of course similar contributing teams at the far ends of these transfers: CERN, RIKEN, Chicago, Boston, Indiana, Texas/Arlington
• M. Chiu, W. Deng, B. Gibbard, Z. Liu, S. Misawa, D. Morrison, R. Popescu, M. Purschke, O. Rind, J. Smith, Y. Wu, D. Yu
PHENIX Transfer of Polarized Proton Data to RIKEN Computing Facility in Japan
• Near real time
  • In particular, not to tape storage, so no added tape retrieval is required
  • Transfer should end very shortly after the end of the RHIC run
• Part of RHIC run in 2005 (~270 TB)
• Planned again for RHIC run in 2006
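As an illustration of this near-real-time, disk-to-disk style of operation, here is a minimal sketch of a push loop: it watches a spool directory and ships each new file with globus-url-copy. The host, paths, polling interval, and stream count are assumptions for illustration, not the actual PHENIX configuration.

    # Hypothetical sketch of a near-real-time push loop; hosts and paths
    # are invented, and a production system would also verify that each
    # file is complete before shipping it.
    import os
    import subprocess
    import time

    SPOOL = "/data/phenix/spool"                       # local spool area (assumed)
    DEST = "gsiftp://ccj.example.riken.jp/data/run6/"  # remote GridFTP door (assumed)

    def push(path):
        # -p 8: eight parallel TCP streams, a common choice on long fat pipes
        cmd = ["globus-url-copy", "-p", "8",
               "file://" + path, DEST + os.path.basename(path)]
        subprocess.run(cmd, check=True)

    seen = set()
    while True:
        for name in sorted(os.listdir(SPOOL)):
            path = os.path.join(SPOOL, name)
            if path not in seen and os.path.isfile(path):
                push(path)      # ship immediately, straight from disk...
                seen.add(path)  # ...so no tape retrieval is ever needed
        time.sleep(60)          # poll once a minute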
Typical Network Activity During PHENIX Data Transfer
For ATLAS, (W)LCG Exercises
• Service Challenge 3
  • Throughput phase (WLCG and computing sites develop, tune, and demonstrate data transfer capacities)
  • July '05
  • Rerun in Jan '06
• Service Challenge 4
  • To begin in April 2006
BNL ATLAS dCache/HPSS Based SE
[architecture diagram: a dCache system in front of HPSS tape storage; DCap, SRM, and GridFTP doors handle the control channel for DCap, SRM, and GridFTP clients (including the Oak Ridge batch system), while the data channel goes directly to the read and write pools; the PnfsManager and PoolManager coordinate the namespace and pools, and the write pools migrate data to HPSS]
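The doors in the diagram correspond to distinct URL schemes through which a client reaches the same file. Below is a sketch of the three access paths, using common dCache default ports; the host name and PNFS path are placeholders, not the actual BNL endpoints.

    # Illustrative only: one logical file, three access paths through
    # different dCache doors (ports shown are common dCache defaults).
    PNFS_PATH = "/pnfs/usatlas.bnl.gov/data/sc3/file1"  # assumed namespace path

    doors = {
        "dcap":    f"dcap://door.example.bnl.gov:22125{PNFS_PATH}",   # POSIX-like reads
        "gridftp": f"gsiftp://door.example.bnl.gov:2811{PNFS_PATH}",  # bulk WAN transfers
        "srm":     f"srm://door.example.bnl.gov:8443{PNFS_PATH}",     # managed transfers
    }

    for proto, url in doors.items():
        print(f"{proto:8s} -> {url}")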
Disk to Disk Phase of SC3
• Transfer rates of up to 150 MB/sec achieved during early standalone operations
• Even though FTS (the transfer manager) failed to properly support dCache SRMCP, degrading performance of the BNL Tier 1 dCache based storage element
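For context, this is roughly the kind of third-party srmcp copy that FTS was expected to drive between the CERN and BNL storage elements; both endpoints below are placeholders.

    # Sketch of a third-party SRM copy, the transfer mode FTS failed to
    # drive properly against dCache at the time. Endpoints are invented.
    import subprocess

    SRC = "srm://srm.example.cern.ch:8443/castor/cern.ch/sc3/file1"        # assumed
    DST = "srm://srm.example.bnl.gov:8443/pnfs/usatlas.bnl.gov/sc3/file1"  # assumed

    subprocess.run(["srmcp", SRC, DST], check=True)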
Overall CERN Operations During Disk to Disk Phase
• Saturation of the network connection at CERN required throttling of individual site performances
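One simple way to picture such throttling: scale every site's rate proportionally so the aggregate fits the saturated link. A toy calculation follows; the link capacity and per-site demands are invented for illustration.

    # Toy proportional throttling: shrink per-site rates so their sum
    # fits the saturated CERN uplink. All numbers are invented.
    LINK_CAPACITY = 1000.0  # MB/sec available at CERN (assumed)

    demand = {"site_a": 250.0, "site_b": 200.0, "site_c": 200.0,
              "site_d": 150.0, "site_e": 150.0, "site_f": 150.0}

    scale = min(1.0, LINK_CAPACITY / sum(demand.values()))
    for site, rate in demand.items():
        print(f"{site}: {rate:6.1f} -> {rate * scale:6.1f} MB/sec")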
Disk to Tape Phase
dCache Activity During Disk to Tape Phase
• Tape writing phase
  • Green indicates incoming data
  • Blue indicates data being migrated out to HPSS, the tape storage system
• Rates of 60-80 MBytes/sec were sustained
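A quick back-of-envelope check of what those rates imply: whenever data arrives on the write pools faster than the 60-80 MB/sec migrated to tape, the difference accumulates as pool occupancy. The incoming rate below is an assumed figure, not a measured one.

    # Back-of-envelope: how fast the write pools fill when incoming data
    # outpaces HPSS migration. The incoming rate is assumed.
    incoming = 100.0  # MB/sec arriving on write pools (assumed)
    to_tape = 70.0    # MB/sec migrated to HPSS (midpoint of observed 60-80)

    growth = incoming - to_tape        # MB/sec of net buffer growth
    per_day_tb = growth * 86400 / 1e6  # TB of pool space consumed per day
    print(f"Pools fill at {growth:.0f} MB/sec, about {per_day_tb:.1f} TB/day")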
SC3 T1 – T2 Exercises
• Transfers to 4 Tier 2 sites (Boston, Chicago, Indiana, Texas/Arlington) resulted in aggregate rates of up to 40 MB/sec, but typically ~15 MB/sec and quite inconsistent
• Tier 2 sites only supported GridFTP on classic storage elements and were not prepared to support sustained operations
Potential Network Contention
• BNL has been operating with an OC 48 ESnet WAN connection, with 2 x 1 Gb/sec connectivity over to the ATLAS/RHIC network fabric
• Competing flows:
  • Sustained PHENIX transfers to RIKEN CCJ
  • ATLAS Service Challenge tests
Network Upgrade
• ESnet OC48 WAN connectivity is being upgraded to 2 x 10 Gb/sec
• BNL site connectivity from the border router to the RHIC/ATLAS facility is being upgraded to redundant 20 Gb/sec paths
• Internally, in place of the previous channel bonding:
  • ATLAS switches are being redundantly connected at 20 Gb/sec
  • RHIC switches are being redundantly connected at 10 Gb/sec
• All will be complete by the end of this month
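A rough sanity check that the upgraded capacity covers the combined loads discussed above. The 200 MB/sec figure is the SC4 goal stated on the following slides; the PHENIX rate is an assumption.

    # Rough capacity check for the upgraded WAN path. The SC4 target is
    # stated on the following slides; the PHENIX rate is an assumption.
    def mbytes_to_gbits(mb_per_sec):
        return mb_per_sec * 8 / 1000.0  # MB/sec -> Gb/sec

    sc4_target = 200.0  # MB/sec, CERN disk to BNL tape (stated SC4 goal)
    phenix = 100.0      # MB/sec to RIKEN CCJ (assumed figure)

    needed = mbytes_to_gbits(sc4_target + phenix)
    wan = 2 * 10.0      # Gb/sec of upgraded ESnet connectivity
    print(f"Need ~{needed:.1f} Gb/sec of the {wan:.0f} Gb/sec available")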
RHIC/PHENIX Plans '06
• RHIC will run again this year with polarized protons, so the data will again be transferred to the RIKEN center in Japan
• Data taking rates will be somewhat higher, with a somewhat better duty factor, so the transfer may have to support rates as much as a factor of two higher
• Such running is likely to begin in early March
• Expect to use SRM for transfers, rather than just GridFTP, for additional robustness
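The extra robustness comes largely from the queuing and retry behavior an SRM layer adds over bare GridFTP. A minimal sketch of the kind of retry wrapper involved; the command line and retry policy here are illustrative assumptions.

    # Sketch: transfer with retries and backoff. An SRM layer provides
    # much of this itself; bare GridFTP clients must script it by hand.
    import subprocess
    import time

    def transfer_with_retries(src, dst, attempts=3, backoff=60):
        for i in range(1, attempts + 1):
            if subprocess.run(["srmcp", src, dst]).returncode == 0:
                return True
            time.sleep(backoff * i)  # wait longer after each failure
        return False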
WLCG Service Challenge 4
• Service challenge transfer goals are the nominal real transfer rates required by ATLAS to the US Tier 1 in the first years of LHC operation
  • 200 MB/sec (disk at CERN to tape at BNL)
• Disk to Disk to begin in April, with Disk to Tape to follow as soon as possible
  • BNL Tier 1 expects to be ready with its new tape system in April to do Disk to Tape
• BNL is planning on being able to use dCache SRMCP in these transfers
• Tier 2 exercises at a much more serious level are anticipated, using dCache/SRM on storage elements
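For a sense of scale, the stated 200 MB/sec goal translates directly into daily and monthly volumes:

    # Volume implied by the stated SC4 goal of 200 MB/sec sustained.
    rate = 200.0                  # MB/sec, disk at CERN to tape at BNL
    per_day = rate * 86400 / 1e6  # TB per day
    print(f"{rate:.0f} MB/sec sustained is about {per_day:.1f} TB/day "
          f"({per_day * 30 / 1000:.1f} PB/month)")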
Conclusions
• Good success to date in both ATLAS exercises and RHIC real operations
• A new round with significantly higher demands comes within the next 1-2 months
• Upgrades of the network, storage elements, tape systems, and storage element interfacing should make it possible to satisfy these demands