Future Plans at RAL Tier 1
Shaun de Witt
Introduction
• Current Set-Up
• Short term plans
• Final Configuration
• How we get there…
• How we plan/hope/pray to use CEPH
Current Infrastructure
• Common Services (nsd, Cupv, vmgr, vdqm)
• Instances: CMS, ATLAS, Gen, LHCb
• Disk Layer (Tape and Disk ‘Pools’)
• Tape Layer (at least 1 dedicated drive)
ATLAS Instance Exploded
[Diagram: name servers nsd01/nsd02; SRM01–SRM04; HeadNode01 (RH, Stager, TapeGateway); HeadNode02 (TransferMgr); HeadNode03 (TransferMgr); NSD and Xroot Mgr; pools atlasTape, atlasDataDisk, atlasScratchDisk; Xroot proxy; x12 and x1 multipliers]
Database Configuration
[Diagram: primary databases (Repack, Nameserver, Cupv, …, CMS SRM, CMS STGR, LHCb SRM, LHCb STGR, ATLAS SRM, ATLAS STGR, Gen SRM, Gen STGR) replicated to identical standbys via DataGuard]
Short Term Plans…
• Improve tape cache performance for ATLAS
  – Tape rates limited by disk
  – Currently heavy IO (read/write from grid/tape)
  – Currently configured with 10(7) ‘small’ servers in RAID6
  – Would RAID-1(0) help? (see the sketch below)
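As a concrete illustration of the alternative layout, below is a minimal sketch of assembling a RAID-10 array with mdadm, driven from Python; the device names and member count are hypothetical, and whether RAID-10 actually improves tape rates here would need measurement.

    import subprocess

    # Hypothetical member disks on one 'small' tape-cache server.
    DISKS = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"]

    def create_raid10(md_dev="/dev/md0", disks=DISKS):
        """Create a RAID-10 array in place of RAID-6.

        RAID-10 avoids the parity read-modify-write penalty that hurts
        RAID-6 under heavy mixed read/write load, at the cost of usable
        capacity (50% of raw, versus n-2 disks for RAID-6).
        """
        subprocess.run(
            ["mdadm", "--create", md_dev, "--level=10",
             f"--raid-devices={len(disks)}", *disks],
            check=True,
        )

    if __name__ == "__main__":
        create_raid10()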
The Future
[Diagram: Swift/S3 and XROOT/gridFTP access routes alongside CASTOR]
What is ECHO?
• Erasure-coded CEPH High-throughput Objectstore
• EC uses 16+3 (sketched below)
• ALL user data planned to use erasure coding (no replication)
• S3/SWIFT recommended interfaces
  – Xroot and gridFTP for legacy support
• … more later
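For illustration, a minimal sketch of what a 16+3 erasure-code profile and pool could look like in Ceph, scripted from Python via the standard ceph CLI; the profile name, pool name, and PG count are hypothetical, not ECHO's actual settings.

    import subprocess

    def ceph(*args):
        """Run a ceph CLI command, failing loudly on error."""
        subprocess.run(["ceph", *args], check=True)

    # k=16 data chunks + m=3 coding chunks matches the 16+3 scheme:
    # any 3 OSDs can be lost without data loss, for ~19% space
    # overhead (versus 200% for 3-way replication).
    ceph("osd", "erasure-code-profile", "set", "ecprofile_16_3",
         "k=16", "m=3")

    # The PG count (4096) is illustrative; it would be sized to the
    # actual OSD count.
    ceph("osd", "pool", "create", "echo_data", "4096", "4096",
         "erasure", "ecprofile_16_3")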
The Plan…
• Move
  – The data
• Modify
• Merge
The ‘Plan’ – Phase 1
• Current disk purchases are usable for CEPH and classic CASTOR
• Start moving atlasScratchDisk over to echo
  – Lifetime of files should be ~2 weeks
  – Allows us to monitor production use of echo
• Work with VOs to migrate relevant data using FTS (see the sketch after this list)
• Maintain classic CASTOR
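As an illustration of an FTS-driven migration, a minimal sketch using the FTS3 Python ‘easy’ bindings; the FTS endpoint and both file URLs are hypothetical placeholders rather than RAL's real names.

    import fts3.rest.client.easy as fts3

    # Hypothetical FTS3 server and source/destination URLs.
    ENDPOINT = "https://fts3.example.org:8446"
    SOURCE = "srm://castor.example.org/castor/atlas/scratch/file1"
    DEST = "gsiftp://gateway.example.org/atlasScratchDisk/file1"

    # Authentication uses the grid proxy certificate (VO and Role).
    context = fts3.Context(ENDPOINT)
    transfer = fts3.new_transfer(SOURCE, DEST)
    job = fts3.new_job([transfer], verify_checksum=True)
    print("Submitted FTS job:", fts3.submit(context, job))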
The Plan – Phase 2
• Once all disk-only (on echo?)
  – Consolidate to single CASTOR instance
    • With single shared diskpool
    • Tape drive dedication… (Tim knows)
    • Clear all diskcopies
    • Single 2/3 node RAC + stdby
  – Common headnodes supporting all services
  – Maintain 3-4 SRMs
• Will probably be phased in
Accessing ECHO
• VOs use gridFTP and xroot ATM
  – Write using 1 protocol, read using either
• But not S3/SWIFT (see the S3 sketch below)
  – Proposed gridFTP URL (writing?)
    • gsiftp://gateway.domain.name/<pool>/<file>
    • Steers transfer to pool
    • Certificate (VO and Role) for AA
  – Xroot URL
    • As suggested by Seb…
  – But what about access?
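Since S3/SWIFT are the recommended interfaces but not yet in use by the VOs, here is a minimal sketch of what S3-style access to an ECHO gateway could look like with boto3; the endpoint, credentials, bucket, and key are all hypothetical.

    import boto3

    # Hypothetical radosgw S3 endpoint and credentials.
    s3 = boto3.client(
        "s3",
        endpoint_url="https://s3.echo.example.org",
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    # Write with one interface...
    s3.upload_file("local/file1", "atlas-scratch", "file1")

    # ...and read it back; the same object could also be served via
    # the xroot or gridFTP gateways once those are in place.
    s3.download_file("atlas-scratch", "file1", "file1.copy")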
The Known Unknowns
• S3/SWIFT interoperability?
• Will CASTOR/CEPH support EC pools?
• Partial writes verboten?
• Do users need to supply …
• Support for CEPH plugins