RACF: an overview


Page 1: RACF: an overview

• Formed in the mid-1990s to provide centralized computing resources for the four RHIC experiments (BRAHMS, PHOBOS, STAR, PHENIX)
• Role was expanded in the late 1990s to act as the US Tier-1 computing center for the ATLAS experiment at the LHC
• Small but growing astrophysics presence (Daya Bay, LSST)
• Located in the Brookhaven Computing Facility
• 35 FTEs providing a full range of scientific computing services for more than 4000 users

Page 2: RACF: an overview

RACF: setting the scale

RHIC
• 1200 compute servers (130 kHS06, 16k job slots)
• 7 PB of distributed storage on compute nodes, with up to 16 GB/s between compute servers and distributed storage servers
• 4 robotic tape libraries w/ 40 tape drives and 38k cartridge slots, 20 PB of active data

ATLAS
• 1150 compute servers (115 kHS06, 12k job slots)
• 90 storage servers driving 8500 disk drives (10 PB), with up to 18 GB/s observed in production between the compute and storage farms
• 3 robotic tape libraries w/ 30 tape drives and 26k cartridge slots, 7 PB of active data

Magnetic Tape Archive
• Data inventory of currently 27 PB, managed by the High Performance Storage System (HPSS), the archive layer below dCache and Xrootd
• Up to 4 GB/s throughput between tape/HPSS and dCache/Xrootd

Network
• LAN – 13 enterprise switches w/ 5800 active ports (750 10GE ports), 160 Gbps inter-switch bandwidth
• WAN – 70 Gbps in production (20 Gbps to CERN and other ATLAS T1s, 10 Gbps dedicated to LHCONE) + 20 Gbps for US ATLAS T1/T2 traffic and up to 20 Gbps serving domestic and international data transfer needs
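As a back-of-the-envelope check of the ATLAS disk-farm numbers above, the implied per-server and per-drive averages can be computed in a few lines (illustrative arithmetic only; these are averages, not measured figures):

```python
# Rough averages implied by the ATLAS disk-farm figures quoted above.
servers = 90           # storage servers
drives = 8500          # disk drives behind them
capacity_pb = 10.0     # total capacity in PB
peak_gb_per_s = 18.0   # peak observed throughput in GB/s

print("drives per server:    %.0f" % (drives / float(servers)))              # ~94
print("capacity per drive:   %.1f TB" % (capacity_pb * 1000 / drives))       # ~1.2 TB
print("throughput per drive: %.1f MB/s" % (peak_gb_per_s * 1000 / drives))   # ~2.1 MB/s
```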

Page 3: RACF: an overview

ATLAS Cloud support and R&D beyond Tier-1 Core Services

Grid job submission (a Condor-G submission sketch follows this list)
• Deployment & operations of the grid job submission infrastructure and pandamover (to serve T2 input datasets)
• Deployment & operations of AutoPyFactory (APF) for pilot submission in the US and other regions
  - Includes work on the PanDA job wrapper (Jose C.) and local pilot submission for T3s
• Condor-G performance improvements
  - In close collaboration w/ Condor developers

gLexec to run analysis payloads using the user proxy
• Lead effort ATLAS-wide and part of the OSG Software Integration activities

CREAM Computing Element (CE) replacing the GT2-based CE
• As part of the OSG Software Integration activities

Primary GUMS (Grid Identity Mapping Service) developer (John Hover)
• Used OSG-wide (incl. US ATLAS (~600 accounts) and US CMS (~250 accounts))

Leading OSG Architecture (Blue Print), Native Packaging (RPMs replacing Pacman) / Configuration Management
• OSG Integration and Validation effort

Support for T2 & T3 (DQ2 site services, FTS, LFC, network optimization & monitoring)

Coordination of worldwide Frontier deployment (ended in Nov 2011)

Worldwide ATLAS S/W installation and validation service (Yuri Smirnov)

Participation in ATLAS computing R&D projects
• Cloud computing
• Federated Xrootd storage system
• NoSQL database evaluation (e.g. Cassandra, now moving to Hadoop)

Storage system performance optimization
• Linux & Solaris kernel, I/O driver, and file system tweaks

Support for Tier-3 at BNL
• User accounts, interactive services, fabric (all hardware components, OS, batch system), Xrootd and PROOF (in collaboration w/ Sergey and Shuwei)
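To illustrate the kind of grid-universe submission that sits underneath AutoPyFactory and Condor-G, here is a minimal, hypothetical sketch using the HTCondor Python bindings against a CREAM CE. The CE endpoint, queue, proxy path, and wrapper script are placeholders; the production pilot factory is considerably more elaborate.

```python
# Hypothetical Condor-G pilot submission to a CREAM CE.
# The CE endpoint, queue, proxy path and wrapper script are placeholders;
# AutoPyFactory wraps this kind of submission with queue-depth logic.
import htcondor

pilot = htcondor.Submit({
    "universe":      "grid",
    # grid_resource = cream <service URL> <batch system> <queue>
    "grid_resource": "cream https://ce.example.org:8443/ce-cream/services/CREAM2 pbs atlas",
    "executable":    "pilot_wrapper.sh",
    "x509userproxy": "/tmp/x509up_pilot",
    "output":        "pilot.$(Cluster).$(Process).out",
    "error":         "pilot.$(Cluster).$(Process).err",
    "log":           "pilot.$(Cluster).log",
})

schedd = htcondor.Schedd()
with schedd.transaction() as txn:   # classic (pre-9.x) bindings API
    pilot.queue(txn, count=10)      # submit ten pilots in one transaction
```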

Page 4: RACF: an overview

Reprocessing from 11/02 – 11/14

MWT2 Included 11/06

Page 5: RACF: an overview

Contribution to Simulation (Aug–Oct): chart of the average number of fully utilized cores per site (3843, 1867, 1762, 1067, 896)

~1000 opportunistic job slots from NP/RHIC, ~2M CPU hours since August

Page 6: RACF: an overview

Facility Operations During Hurricane Sandy

Page 7: RACF: an overview

Facility Operations During Hurricane Sandy

Page 8: RACF: an overview

Configuration Management (Jason Smith et al.)

Page 9: RACF: an overview

Configuration management - Components

Page 10: RACF: an overview

Benefits

Page 11: RACF: an overview

RACF and OSG

RACF staff are heavily engaged in the Open Science Grid
• Major contributor to the Technology Investigation area and the architectural development of OSG’s Fabric of Services
• Member of the Management Team; represents BNL on the OSG Council
  - Committed to developing OSG as a national computational infrastructure, jointly with other providers like XSEDE
• ATLAS Tier-1 center fully integrated with OSG
  - Provides opportunistic cycles to other OSG VOs

Page 12: RACF: an overview

OSG continuing for another 5 years

Besides the focus on physics and the momentum of the LHC, there is a broad spectrum of different science applications making use of OSG
• Very strong support from DOE and NSF to continue
• Extend support to more stakeholders – communities (e.g. nuclear physics and astrophysics) and scientists local to the campuses
  - Partnership/participation in NSF as an XSEDE Service Provider
  - XSEDE is a comprehensive, expertly managed and evolving set of advanced heterogeneous high-end digital services, integrated into a general-purpose infrastructure
  - XSEDE is about increased user productivity
  - Emergence of NSF Institutes – centers of excellence on particular cyberinfrastructure topics (e.g. Distributed High Throughput Computing, DHTC) across all science domains
  - Evolution of the DOE ASCR SciDAC program
  - Strong participant in the Extreme Scale Collaboratories initiative

Page 13: RACF: an overview

R&D at the RACF: Cloud Computing (ATLAS-wide activity) – Motivation

• New and emerging paradigm in the delivery of IT services
• Improved approach to managing and provisioning resources, allowing applications to easily adapt and scale to varied usage demands
• New, increasingly competitive market offers cost-effective computing resources; companies small and large already make extensive use of them
• By providing “Infrastructure as a Service” (IaaS), clouds aim to share hardware resources for storage and processing efficiently, without sacrificing flexibility in the services offered to applications

BNL_CLOUD: standard production PanDA site with ~500 virtual machines (VMs)
• Configured to use wide-area stage-in/out, so the same cluster can be extended transparently to a commercial cloud (e.g. Amazon) or other public academic clouds (see the sketch below)
• Steadily running production jobs on auto-built VMs

The key feature of the work has been to make all processes and configurations general and public, so they can be (re-)used outside BNL (e.g. at T2s, to dynamically establish analysis facilities using beyond-pledge resources).

Standard, widely used technology (Linux, Condor, OpenStack, etc.) is used.
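As a hedged illustration of the “extend transparently to a commercial cloud” point above, the sketch below boots a batch of worker-node VMs through an EC2-compatible API with the boto library. The region, image ID, instance type, key name, and contextualization script are placeholders, not the actual BNL_CLOUD configuration.

```python
# Hypothetical sketch: starting worker-node VMs via an EC2-compatible API.
# Region, image ID, instance type, key name and user-data script are
# placeholders; the VM image is assumed to join the Condor pool at boot.
import boto.ec2

conn = boto.ec2.connect_to_region("us-east-1")
reservation = conn.run_instances(
    image_id="ami-00000000",                 # pre-built worker-node image
    min_count=10, max_count=10,              # start ten identical VMs
    instance_type="m1.large",
    key_name="cloud-admin",
    user_data=open("contextualize_condor_worker.sh").read(),
)
for instance in reservation.instances:
    print(instance.id, instance.state)
```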


Page 14: RACF: an overview

Cloud Integration in U.S. ATLAS Facilities

• Flexible algorithms decide when to start and terminate running VMs
• Cloud hierarchies: programmatic scheduling of jobs on site-local -> other private -> commercial clouds, based on job priority and cloud cost (a toy policy sketch follows below)

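Reduced to its essentials, that policy is: size the VM pool to the queue depth, then walk the hierarchy from the local/free clouds toward the commercial one until demand or budget runs out. The sketch below is purely illustrative; the cloud names, capacities, costs, and thresholds are invented and do not reflect the actual scheduler.

```python
# Toy version of the start/terminate and cloud-hierarchy logic described
# above; cloud names, capacities, costs and thresholds are invented.
CLOUDS = [                                        # cheapest / most local first
    {"name": "site-local", "capacity": 500,  "cost": 0.00},
    {"name": "private",    "capacity": 300,  "cost": 0.00},
    {"name": "commercial", "capacity": 2000, "cost": 0.12},   # $ per VM-hour
]

def plan_vms(queued_jobs, running_vms, jobs_per_vm=1, hourly_budget=50.0):
    """Return the number of VMs to start on each cloud (or to terminate)."""
    wanted = max(0, queued_jobs // jobs_per_vm - running_vms)
    plan = {}
    for cloud in CLOUDS:
        if wanted <= 0:
            break
        n = min(wanted, cloud["capacity"])
        if cloud["cost"] > 0:                     # commercial cloud: cap by budget
            n = min(n, int(hourly_budget // cloud["cost"]))
            hourly_budget -= n * cloud["cost"]
        if n > 0:
            plan[cloud["name"]] = n
            wanted -= n
    if queued_jobs == 0 and running_vms > 0:      # nothing queued: drain the pool
        plan["terminate"] = running_vms
    return plan

# Example: 1200 queued jobs, 400 VMs already running
print(plan_vms(queued_jobs=1200, running_vms=400))
# -> {'site-local': 500, 'private': 300}
```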

Page 15: RACF: an overview

Looking into Hadoop-based Storage Management

Hiro, Shigeki Misawa, Tejas Rao, and Doug are helping w/ tests.

Reasons why we are interested:
• The Internet industry is developing scalable storage management solutions much faster than we will ever be able to
  - We just have to make them work for us
  - HTTP-based data access works well for them, so why shouldn't it work for us? (see the sketch after this list)
• With ever-increasing storage capacity per drive we expect performance bottlenecks
  - With disk-heavy WNs we could provide many more spindles, which helps scale up I/O performance and improves resilience against failures

Apache distribution (open source)
• Several significant limitations (performance & resilience)

Several commercial products
• Free/unsupported downloads besides value-added/supported licensed versions
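To make the HTTP-based-access point concrete, here is a minimal, hypothetical example of reading a file through Hadoop's WebHDFS REST interface; the namenode host, port, user, and path are placeholders, and no actual BNL endpoint is implied.

```python
# Minimal WebHDFS read: OPEN on the namenode returns a redirect to a
# datanode, which then streams the file over plain HTTP.
# Host, port, user and path are placeholders.
import requests

NAMENODE = "http://namenode.example.org:50070"
PATH = "/user/atlas/data/sample.root"

# ?op=OPEN asks WebHDFS to serve the file; requests follows the redirect
# to the datanode automatically.
resp = requests.get(NAMENODE + "/webhdfs/v1" + PATH,
                    params={"op": "OPEN", "user.name": "atlas"},
                    stream=True)
resp.raise_for_status()

with open("sample.root", "wb") as out:
    for chunk in resp.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
        out.write(chunk)
```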

Page 16: RACF: an overview

MapR (.com)

Page 17: RACF: an overview

Performance vs. CPU Consumption

Maxing out a 10 Gbps network interface, at the expense of ~10% of the server CPU for write operations and ~5% for read operations
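A quick unit conversion puts those percentages in context (illustrative arithmetic only, assuming the full 10 Gbps is payload):

```python
# Illustrative arithmetic for the figure above: what ~10% (write) and ~5%
# (read) of the server CPU cost per gigabyte moved at a full 10 Gbps.
line_rate_gb_per_s = 10.0 / 8              # 10 Gbps ≈ 1.25 GB/s of payload

for op, cpu_fraction in (("write", 0.10), ("read", 0.05)):
    cpu_s_per_gb = cpu_fraction / line_rate_gb_per_s
    print("%s: ~%.0f ms of whole-server CPU time per GB" % (op, cpu_s_per_gb * 1000))
# write: ~80 ms per GB, read: ~40 ms per GB
```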