INFN Tier1 Status report Spring HEPiX 2005 Andrea Chierici – INFN CNAF


Page 1: INFN Tier1 Status report Spring HEPiX 2005 Andrea Chierici – INFN CNAF

INFN Tier1 Status Report

Spring HEPiX 2005

Andrea Chierici – INFN CNAF

Page 2: Introduction

• Location: INFN-CNAF, Bologna (Italy)
  – One of the main nodes of the GARR network

• Computing facility for the INFN HENP community
  – Participating in the LCG, EGEE and INFNGRID projects

• Multi-experiment Tier1
  – LHC experiments
  – VIRGO
  – CDF
  – BABAR
  – AMS, MAGIC, ARGO, ...

• Resources are assigned to the experiments on a yearly plan.

Page 3: Services

• Computing servers (CPU farms)

• Access to on-line data (disks)

• Mass storage / tapes

• Broad-band network access

• System administration

• Database administration

• Experiment-specific library software

• Coordination with the Tier0 and with other Tier1s and Tier2s

Page 4: Infrastructure

• Hall in the basement (2nd underground floor): ~1000 m² of total space
  – Easily accessible by lorries from the road
  – Not suitable for office use (remote control only)

• Electric power
  – 220 V single-phase for the computers
    • 4 x 16 A PDUs needed for racks of 3.0 GHz Xeons (rough power check below)
  – 380 V three-phase for other devices (tape libraries, air conditioning, etc.)
  – UPS: 800 kVA (~640 kW); needs a separate, conditioned and ventilated room
  – Electric generator: 1250 kVA (~1000 kW)
    • enough for up to 160 racks (~100 if filled with 3.0 GHz Xeons)
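As a rough consistency check (an illustration, not from the slides): at 220 V single-phase, the four 16 A PDUs give each rack roughly 4 x 16 A x 220 V ≈ 14 kW, i.e. on the order of 350-400 W per node for the ~36 dual-Xeon WNs housed in a rack.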

Page 5: HW Resources

• CPU:
  – 320 old dual-processor boxes, 0.8-2.4 GHz
  – 350 new dual-processor boxes, 3 GHz (+70 servers, +55 BaBar, +48 CDF, +30 LHCb)
    • 1300 kSI2k total

• Disk: FC, IDE, SCSI, NAS

• Tapes:
  – STK L180: 18 TB
  – STK 5500:
    • 6 LTO-2 drives with 2000 tapes: 400 TB
    • 2 9940B drives with 800 tapes: 200 TB

• Networking:
  – 30 rack switches: 46 FE UTP + 2 GE FO each
  – 2 core switches: 96 GE FO + 120 GE FO + 4 x 10 GE
  – 2 x 1 Gbps links to the WAN

Page 6: Networking

• The GARR-G backbone (2.5 Gbps fibre optic) will be upgraded to dark fibre (Q3 2005)

• INFN Tier1 access is currently 1 Gbps (+1 Gbps for the Service Challenge) and will be upgraded to 10 Gbps in September 2005
  – The GigaPoP is co-located with INFN-CNAF

• International connectivity via GÉANT: 10 Gbps access in Milan already in place.

Page 7: Tier1 LAN

• Each CPU rack (36 WNs) is equipped with an FE switch with 2 x Gb uplinks to the core switch

• Disk servers are connected to the core switch via GE

• An upgrade to Gb rack switches is foreseen
  – A 10 Gb core switch is already installed

Page 8: LAN model layout

[Diagram: LAN model layout, showing the SAN, the STK tape library, the NAS systems and the farm racks (Rack_farm).]

Page 9: Networking resources

• 30 switches (14 of them 10 Gb ready)

• 3 core switches/routers (SSR8600, ER16, Black Diamond)
  – The SSR8600 is also the WAN access router (with firewalling functions)
  – A new Black Diamond 10808 is already installed, with 120 GE and 12 x 10 GE ports (it can scale up to 480 GE or 48 x 10 GE)
  – A new access router (with 4 x 10 GE and 4 x GE interfaces) will replace the SSR8600 for WAN access (the ER16 and the Black Diamond will aggregate all of the Tier1's resources)

• 3 L2/L3 switches (with 48 x GE and 2 x 10 GE) to be used during the Service Challenge

Page 10: Farming

• Team composition
  – 2 FTE for the general-purpose farm
  – 1 FTE for the hosted farms (CDF and BaBar)
  – ~3 FTE is clearly not enough to manage ~800 WNs; more people are needed

• Tasks
  – Installation and management of the Tier1 WNs and servers
    • Using Quattor (still some legacy LCFGng nodes around)
  – Deployment and configuration of the OS and the LCG middleware
  – HW maintenance management
  – Management of the batch scheduler (LSF, Torque); a job-submission example is sketched below
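A minimal sketch of what a user-level submission to the LSF batch system looks like; the queue name and script are illustrative (compare the queue list on page 15), not taken from the slides.

    import subprocess

    # Hypothetical LSF submission: send a job to the "cms" queue and
    # write its output to a file named after the LSF job ID.
    subprocess.run(
        ["bsub", "-q", "cms", "-o", "job_%J.out", "./run_analysis.sh"],
        check=True,
    )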

Page 11: Access to the batch system

[Diagram: access to the batch system, showing the "legacy" non-Grid access path and the Grid access path from the UIs (directly or via the Grid) to the CE, LSF and the WNs (WN1 ... WNn), with an SE for storage.]

Page 12: Farming: where we were last year

• Computing resources were not fully used
  – The batch system (Torque + Maui) proved not to be scalable
  – The present policy assignment does not allow resource-use optimization

• Interoperability issues
  – Full experiment integration still to be achieved
  – Difficult to deal with 3 different farms

Page 13: Farming: evolution (1)

• Migration of the whole farm to SLC 3.0.4 (the CERN version!) almost complete
  – Quattor deployment successful (more than 500 WNs)
  – Standard configuration of the WNs for all experiments
  – To be completed in a couple of weeks

• Migration from Torque + Maui to LSF (v6.0)
  – LSF farm running successfully
  – Process to be completed together with the OS migration
  – Fair-share model for resource access
  – Progressive inclusion of the BaBar and CDF WNs into the general farm

Page 14: Farming: evolution (2)

• LCG upgrade to the 2.4.0 release
  – Installation via Quattor
    • Dropped LCFGng
    • Different packaging from YAIM
    • Deployed the upgrade to 500 nodes in one day
  – Still some problems with VOMS integration to be investigated
  – 1 legacy LCG 2.2.0 CE still running

• Access to the resources is centrally managed with Kerberos (authentication) and LDAP (authorization); an illustrative LDAP lookup is sketched below
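Only as an illustration of the LDAP (authorization) side, and assuming a hypothetical directory server, base DN, group and user names (not the actual Tier1 setup), the kind of lookup involved could look like this, using the ldap3 Python library:

    from ldap3 import Server, Connection

    # Hypothetical authorization check: is user "achieric" a member of the
    # "cms" group? Host, base DN, group and user names are made up.
    server = Server("ldap.cnaf.infn.it")
    conn = Connection(server, auto_bind=True)   # anonymous bind for the lookup
    conn.search(
        search_base="ou=Group,dc=cnaf,dc=infn,dc=it",
        search_filter="(&(cn=cms)(memberUid=achieric))",
        attributes=["memberUid"],
    )
    authorized = bool(conn.entries)
    print("access granted" if authorized else "access denied")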

Page 15: Some numbers (LSF output)

QUEUE_NAME   PRIO STATUS       MAX  JL/U JL/P JL/H NJOBS  PEND  RUN  SUSP
alice        40   Open:Active   -    -    -    -     50     0    50     0
cdf          40   Open:Active   -    -    -    -    840   152   688     0
dteam        40   Open:Active   -    -    -    -      0     0     0     0
atlas        40   Open:Active   -    -    -    -     26    12    14     0
cms          40   Open:Active   -    -    -    -      0     0     0     0
lhcb         40   Open:Active   -    -    -    -      7     7     0     0
babar_test   40   Open:Active  300  300    -    -    302     2   300     0
babar        40   Open:Active   20   20    -    -   2073  2060    13     0
virgo        40   Open:Active   -    -    -    -      0     0     0     0
argo         40   Open:Active   -    -    -    -      0     0     0     0
magic        40   Open:Active   -    -    -    -      0     0     0     0
ams          40   Open:Active   -    -    -    -    136     0   136     0
infngrid     40   Open:Active   -    -    -    -      0     0     0     0
guest        40   Open:Active   -    -    -    -      0     0     0     0
test         40   Open:Active   -    -    -    -      0     0     0     0

• 1200 jobs running!
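The snapshot above has the format of LSF's bqueues output; as a small illustration (not part of the original slides), the "jobs running" total can be obtained by summing the RUN column:

    import subprocess

    # Sum the RUN column (10th field) of `bqueues` over all queues.
    out = subprocess.run(["bqueues"], capture_output=True, text=True, check=True).stdout
    queues = out.strip().splitlines()[1:]        # skip the header line
    running = sum(int(line.split()[9]) for line in queues)
    print(f"{running} jobs running")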

Page 16: Storage & Database

• Team composition
  – ~1.5 FTE for general storage
  – ~1 FTE for CASTOR
  – ~1 FTE for the DBs

• Tasks
  – Disk (SAN, NAS): HW/SW installation and maintenance, remote (Grid SE) and local (rfiod/NFS/GPFS) access services, clustered/parallel file system tests, participation in the SC
    • 2 SAN systems (~225 TB)
    • 4 NAS systems (~60 TB)
  – CASTOR HSM system: HW/SW installation and maintenance, GridFTP and SRM access services
    • STK library with 6 LTO-2 and 2 9940B drives (+4 to be installed)
      – 1200 LTO-2 (200 GB) tapes
      – 680 9940B (200 GB) tapes
  – DBs (Oracle for CASTOR and the RLS tests, Tier1 "global" hardware DB)

Page 17: Storage setup

• Physical access to the main storage (FAStT900) via the SAN
  – Level-1 disk servers are connected via FC
    • Usually also members of a GPFS cluster
      – Ease of administration
      – Load balancing and redundancy
      – Lustre under evaluation
  – Level-2 disk servers can be connected to the storage via GPFS only
    • This decouples the LCG and FC dependencies on the OS

• WNs are not members of the GPFS cluster (no scalability to a large number of WNs)
  – Storage is available to the WNs via rfio, xrootd (BaBar) or NFS (a few cases, see next slide)
  – NFS is used mainly to share the experiment software among the WNs
    • But it is not suitable for data access

Page 18: NFS stress test

[Diagram: NFS stress-test setup. A NAS box on Fibre Channel acts as the NFS server; the WNs, fed by the LSF job scheduler, act as the NFS clients.]

Tests:
1) Connectathon (NFS/RPC test)
2) iozone (NFS I/O test)

• Parameters: client kernel, server kernel, rsize, wsize, protocol, test execution time, disk-write I/O, I/O wait, server threads

• Results:
  – Problems with kernel 2.4.21-27.0.2
  – Kernel 2.6 performs better than 2.4 (even with the sdr patch)
  – UDP gives better performance than TCP
  – The best rsize is 32768
  – The best wsize is 8192
  – The NFS protocol may scale beyond 200 clients without aggregate performance degradation (an example of the corresponding mount options follows below)

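As an illustration only (not from the slides), the tuned values above would correspond to NFS mount options along these lines; the server, export path and mount point are made up:

    import subprocess

    # Hypothetical mount using the best-performing parameters found above:
    # UDP transport, rsize=32768, wsize=8192.
    opts = "proto=udp,rsize=32768,wsize=8192"
    subprocess.run(
        ["mount", "-t", "nfs", "-o", opts,
         "nas01.cnaf.infn.it:/export/sw",    # made-up NFS server and export
         "/opt/exp_sw"],                      # made-up mount point on the WN
        check=True,
    )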

Page 19: Disk storage tests

• Data processing for the LHC experiments at the Tier1 facilities requires access to petabytes of data from thousands of nodes simultaneously, at high rate

• None of the old-fashioned storage systems can handle these requirements

• We are in the process of defining the required hardware and software components of the system
  – It is emerging that a Storage Area Network approach with a parallel file system on top can do the job

V. Vagnoni – LHCb Bologna

Page 20: Hardware testbed

• Disk storage
  – 3 controllers of an IBM DS4500
  – Each controller serves 2 RAID5 arrays of 4 TB each (17 x 250 GB disks + 1 hot spare)
  – Each RAID5 array is further subdivided into two LUNs of 2 TB each
  – 12 LUNs and 24 TB of disk space in total (102 x 250 GB disks + 8 hot spares)

• File system servers
  – 6 IBM xSeries 346, dual Xeon, 2 GB RAM, Gigabit NIC
  – A QLogic Fibre Channel PCI card on each server, connected to the DS4500 via a Brocade switch
  – 6 Gb/s of available bandwidth to/from the clients

• Clients
  – 36 dual-Xeon nodes, 2 GB RAM, Gigabit NIC

V. Vagnoni – LHCb Bologna

Page 21: Parallel file systems

• We evaluated the two mainstream products on the market
  – GPFS (version 2.3) by IBM
  – Lustre (version 1.4.1) by CFS

• Both come with advanced management and configuration features, SAN-oriented failover mechanisms and data recovery
  – But they can also be used on top of standard disks and arrays

• Using GPFS and Lustre, the 12 DS4500 LUNs were aggregated into one 24 TB file system on the servers and mounted by the clients over the Gigabit network
  – Both file systems are 100% POSIX-compliant on the client side
  – The file systems appear to the clients as ordinary local mount points

V. Vagnoni – LHCb Bologna

Page 22: Performance (1)

• A home-made benchmarking tool oriented to HEP applications has been written
  – It allows simultaneous sequential reads/writes from an arbitrary number of clients and processes per client (a sketch of the idea follows below)
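The tool itself is not reproduced in the slides; the following is only a minimal Python sketch of the kind of measurement it performs (several simultaneous sequential 1 GB streams against a mounted file system), with a made-up mount point:

    import os, time
    from multiprocessing import Pool

    MOUNT = "/gpfs/testfs"        # hypothetical client mount point
    FILE_SIZE = 1 << 30           # 1 GB per file, as in the tests shown here
    BLOCK = 1 << 20               # 1 MB sequential blocks

    def write_file(i):
        """Sequentially write one 1 GB file and return the achieved MB/s."""
        path = os.path.join(MOUNT, f"bench_{i}.dat")
        buf = os.urandom(BLOCK)
        t0 = time.time()
        with open(path, "wb") as f:
            for _ in range(FILE_SIZE // BLOCK):
                f.write(buf)
            f.flush()
            os.fsync(f.fileno())
        return FILE_SIZE / (time.time() - t0) / 1e6

    def read_file(i):
        """Sequentially read one file back and return the achieved MB/s."""
        path = os.path.join(MOUNT, f"bench_{i}.dat")
        t0 = time.time()
        with open(path, "rb") as f:
            while f.read(BLOCK):
                pass
        return FILE_SIZE / (time.time() - t0) / 1e6

    if __name__ == "__main__":
        nproc = 20                             # simultaneous streams on this client
        with Pool(nproc) as pool:
            w = pool.map(write_file, range(nproc))
            r = pool.map(read_file, range(nproc))
        print(f"aggregate write {sum(w):.0f} MB/s, read {sum(r):.0f} MB/s")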

[Plot: raw Ethernet throughput vs. time for 20 simultaneous reads of 1 GB files.]

V. Vagnoni – LHCb Bologna

Page 23: Performance (2)

[Plot: results of reads/writes of different 1 GB files: net throughput in Gb/s (axis from 0 to 6) vs. number of simultaneous reads/writes, for Lustre write, GPFS write, Lustre read and GPFS read.]

V. Vagnoni – LHCb Bologna

Page 24: Service challenge (1)

• WAN
  – Dedicated 1 Gb/s link connected via GARR + GÉANT
  – A 10 Gb/s link will be available in September 2005

• LAN
  – An Extreme Summit 400 (48 x GE + 2 x 10 GE) dedicated to the Service Challenge

[Diagram: Service Challenge layout. 11 servers with internal HDs sit behind the Summit 400 (48 x Gb/s + 2 x 10 Gb/s), which connects through GARR (the Italian research network) to CERN at 1 Gb, rising to 10 Gb in September 2005.]

Page 25: Service challenge (2)

• 11 SUN Fire V20 dual Opteron (2.2 GHz) servers
  – 2 x 73 GB U320 SCSI HDs
  – 2 x Gbit Ethernet interfaces
  – 2 x PCI-X slots

• OS: SLC 3.0.4 (arch x86_64), kernel 2.4.21-27
  – bonnie++/IOzone tests on the local disks: ~60 MB/s read and write
  – Netperf/Iperf tests on the LAN: ~950 Mb/s (an iperf example follows below)

• Globus (GridFTP) v2.4.3 installed on all cluster nodes

• CASTOR SRM v1.2.12, CASTOR stager v1.7.1.5
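For completeness (not from the slides), a LAN throughput check of the Netperf/Iperf kind can be scripted as below; the server host name is made up and an `iperf -s` instance is assumed to be listening on it:

    import subprocess

    # Hypothetical client-side iperf run against one of the SC servers
    # (the server side would be started with `iperf -s`); 30-second TCP test.
    subprocess.run(
        ["iperf", "-c", "sc-server-01.cnaf.infn.it", "-t", "30"],
        check=True,
    )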

Page 26: SC2: Transfer cluster

[Diagram: SC2 transfer cluster, showing SRM/GridFTP servers 1, 2, 3, ..., 10 on 1 Gb/s links behind a 10 Gb/s uplink, the CASTOR stager, the SRM repository and the Radiant software.]

• 10 machines are used as GridFTP/SRM servers, 1 as the CASTOR stager / SRM repository
  – Internal disks used (70 x 10 = 700 GB)
  – For SC3, CASTOR tape servers with IBM LTO-2 or STK 9940B drives will also be used

• Load balancing is implemented by assigning to a single DNS alias (CNAME) the IP addresses of all 10 servers, served with a round-robin algorithm (see the sketch after this list)

• SC2 goal reached: 100 MB/s disk-to-disk sustained for 2 weeks
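A minimal sketch of the client side of this scheme, assuming a hypothetical alias name: each DNS lookup of the load-balanced name returns the pool of server addresses in rotated order, so successive GridFTP clients end up talking to different servers.

    import socket

    # Hypothetical load-balanced alias for the 10 GridFTP/SRM servers.
    alias = "sc-gridftp.cr.cnaf.infn.it"

    # gethostbyname_ex returns (canonical name, alias list, list of IP addresses);
    # with round-robin DNS the address list comes back rotated on each lookup.
    name, aliases, addresses = socket.gethostbyname_ex(alias)
    print(f"{alias} resolves to {addresses}")
    target = addresses[0]   # this client's transfer endpoint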

Page 27: Sample of production: CMS (1)

• CMS activity at the T1: grand summary
  – Data transfer
    • >1 TB per day, T0 → T1, in 2004 via PhEDEx
  – Local MC production
    • >10 Mevts for >40 physics datasets
  – Grid activities
    • Official production on LCG
    • Analysis on DSTs via Grid tools

Page 28: Sample of production: CMS (2)

Page 29: Sample of production: CMS (3)

Page 30: Summary & Conclusions

• During 2004 the INFN Tier1 was deeply involved in the LHC
  – Some experiments (e.g. BaBar, CDF) are already in their data-taking phase

• The main issue is the shortage of human resources
  – ~10 FTEs are required for the Tier1, nearly doubling the staff (for both HW and SW maintenance)