HEPSYSMAN, July 09 Steve Jones sjones@hep.ph.liv.ac.uk


TRANSCRIPT

Page 1:

HEPSYSMAN, July 09
Steve Jones
sjones@hep.ph.liv.ac.uk

Page 2:

Policy Experience

For deals under £75K per order: get three quotes and select the cheapest, with some leeway for factors such as quality of service.

We just buy hardware; we do the software work ourselves.

For bigger deals (e.g. a new cluster): sealed tender bids.

IT Supplies is often selected as the cheapest, with good support. But they have a longer turnaround time than a larger firm, so we need to keep spares on site.

Page 3:

Head node (glite-SE_DPM_mysql-3.1.13-0): dual quad-core, 2.7 GHz, 64-bit, SL4.4, 10 GB RAM.
Physical disks: 4 x Seagate 250 GB, sda..sdd.

S/W RAID 1:
  md0: sda1, sdd1, spare, 15 GB, /
  md1: sda2, sdd2, spare, 8 GB, swap
  md2: sda3, sdd3, spare, 30 GB, /var
  md3: sda5, sdd5, spare, 30 GB, /opt

S/W RAID 10 (access and reliability for MySQL and /gridstore, which holds storage for the ops VO and SAM tests):
  md4: sda6, sdb6, sdc6, sdd6, 200 GB, /var/lib/mysql
  md5: sda7, sdb7, sdc7, sdd7, 113 GB, /gridstore
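
A minimal monitoring sketch (not from the slides), assuming a Linux host where the software RAID arrays above appear in /proc/mdstat: it reports any md device whose member map (e.g. [UU]) shows a failed disk.

  import re

  def degraded_arrays(path="/proc/mdstat"):
      """Return md arrays whose member map (e.g. [UU_U]) shows a failed disk."""
      bad, current = [], None
      with open(path) as f:
          for line in f:
              m = re.match(r"(md\d+)\s*:", line)
              if m:
                  current = m.group(1)   # e.g. "md0"
                  continue
              status = re.search(r"\[([U_]+)\]$", line.strip())
              if current and status and "_" in status.group(1):
                  bad.append(current)
      return bad

  if __name__ == "__main__":
      bad = degraded_arrays()
      if bad:
          print("Degraded arrays: %s" % ", ".join(bad))
      else:
          print("All md arrays look healthy.")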

Page 4:

Pool nodes (glite-SE_DPM_disk-3.1.12-0): dual quad-core, 2.3 GHz, 64-bit, SL4.4, 16 GB RAM. Total of 8 systems.

S/W RAID 1 (local, for the OS):
Physical disks: 2 x Seagate 250 GB, hda, hdc.
  md0: hda1, hdc1, spare, 15 GB, /
  md1: hda5, hdc5, spare, 4 GB, swap
  md2: hda2, hdc2, spare, 30 GB, /opt
  md3: hda3, hdc3, spare, 30 GB, /var
  md4: hda6, hdc6, spare, 163 GB, /data

Page 5:

H/W RAID 6 (typical)
Physical disks: 24 x 1 TB SATA II drives, configured as 1 x 22 TB disk with 4 partitions:
  sda1, 5.3 TB
  sda2, 5.3 TB
  sda3, 5.3 TB
  sda4, 5.3 TB
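
A quick arithmetic check (not on the slide): RAID 6 stores two drives' worth of parity, so 24 x 1 TB drives leave about 22 TB usable, matching the total above; splitting that four ways gives roughly 5.5 TB (about 5 TiB) per partition, in the same ballpark as the 5.3 TB quoted.

  # Back-of-the-envelope check of the H/W RAID 6 capacity quoted above.
  # RAID 6 uses the equivalent of two drives for parity.
  N_DRIVES = 24
  DRIVE_TB = 1.0            # decimal terabytes (10**12 bytes)
  N_PARTITIONS = 4

  usable_tb = (N_DRIVES - 2) * DRIVE_TB       # 22 TB, as on the slide
  usable_tib = usable_tb * 1e12 / 2**40       # ~20 TiB in binary units

  print("Usable: %.1f TB (%.1f TiB)" % (usable_tb, usable_tib))
  print("Per partition: %.2f TB (%.2f TiB)"
        % (usable_tb / N_PARTITIONS, usable_tib / N_PARTITIONS))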

Page 6:

RAID controllers, ~150 TB:
  3ware 9650SE-16ML
  3ware 9650SE-24M8
  Areca 1280

Type    Advantages               Disadvantages
Areca   Faster, less tuning      Slow turnaround to Taiwan
3ware   Faster return warranty   Didn't broadcast known bug (loss of access, reboot). More tuning.

Page 7:

1 Gbps Intel NICs, 4 per system: two on the motherboard + 2 extra.
Each system has 2 network presences.
One to the HEP network, routable to the Internet (138.253…), using 1 NIC.
Another for internal traffic (192.168.178…); busy, so 3 NICs bonded (1 MAC) to a Force10 switch (LACP). Useful when there are 100s of connections (a single connection can't be split across two NICs). Isolated local grid; allows jumbo frames.
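
A minimal status check (not from the slides) for such a bonded link, assuming the Linux bonding driver and a bond named bond0 (the actual interface name is not given above): it prints the bonding mode and the link state of each slave NIC from /proc/net/bonding/bond0.

  # Sketch only: "bond0" is an assumed name for the 3-NIC LACP bond described above.
  def bond_status(name="bond0"):
      mode, slaves, current = None, {}, None
      with open("/proc/net/bonding/%s" % name) as f:
          for line in f:
              line = line.strip()
              if line.startswith("Bonding Mode:"):
                  mode = line.split(":", 1)[1].strip()
              elif line.startswith("Slave Interface:"):
                  current = line.split(":", 1)[1].strip()
              elif line.startswith("MII Status:") and current:
                  slaves[current] = line.split(":", 1)[1].strip()
                  current = None
      return mode, slaves

  if __name__ == "__main__":
      mode, slaves = bond_status()
      print("Mode: %s" % mode)
      for nic, state in sorted(slaves.items()):
          print("  %s: %s" % (nic, state))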

Page 8:

Each system has a software firewall, to stop unwanted traffic (no ssh!).

Allows incoming rfio, gridftp, srm (head node only).
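
Not part of the slides: a small reachability check for those services, with port numbers assumed from common DPM defaults (rfio 5001, GridFTP control 2811, SRM 8446) rather than taken from the talk; it simply tries a TCP connect to each port to confirm the firewall lets the traffic through. The hostname is a placeholder.

  import socket

  # Assumed typical DPM ports; the slides do not list the actual ports in use.
  SERVICES = {"rfio": 5001, "gridftp": 2811, "srm": 8446}

  def check_ports(host, services=SERVICES, timeout=3.0):
      """Try a plain TCP connect to each service port and report reachability."""
      results = {}
      for name, port in services.items():
          try:
              with socket.create_connection((host, port), timeout=timeout):
                  results[name] = "open"
          except OSError:
              results[name] = "blocked/closed"
      return results

  if __name__ == "__main__":
      # Placeholder hostname, not the real Liverpool head node.
      for name, state in sorted(check_ports("dpm-head.example.org").items()):
          print("%-8s %s" % (name, state))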

Page 9:

CPU: ~20%
RAM: only occasionally in swap
NW: close to saturation on pool nodes
S/W:
  Fixes to fair shares
  Update to Atlas Athena S/W (14.5.0)
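
A rough sketch (not from the slides) of how such figures can be sampled on a node: it reads swap usage from /proc/meminfo and NIC throughput from /proc/net/dev, the kind of numbers behind "only occasionally in swap" and "close to saturation" above. The interface name eth0 is an assumption.

  import time

  def swap_used_mb():
      """Swap in use, in MB, from /proc/meminfo (values there are in kB)."""
      info = {}
      with open("/proc/meminfo") as f:
          for line in f:
              key, value = line.split(":", 1)
              info[key] = int(value.split()[0])
      return (info["SwapTotal"] - info["SwapFree"]) / 1024.0

  def rx_tx_bytes(iface="eth0"):                 # assumed interface name
      """Cumulative rx/tx byte counters for one NIC from /proc/net/dev."""
      with open("/proc/net/dev") as f:
          for line in f:
              if line.strip().startswith(iface + ":"):
                  fields = line.split(":", 1)[1].split()
                  return int(fields[0]), int(fields[8])   # rx_bytes, tx_bytes
      raise ValueError("interface %s not found" % iface)

  if __name__ == "__main__":
      rx0, tx0 = rx_tx_bytes()
      time.sleep(5)
      rx1, tx1 = rx_tx_bytes()
      print("Swap used: %.1f MB" % swap_used_mb())
      print("eth0: %.1f Mbit/s in, %.1f Mbit/s out"
            % ((rx1 - rx0) * 8 / 5e6, (tx1 - tx0) * 8 / 5e6))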

Page 10:

RFIO buffer sizes: we started with 128 MB buffers to load plenty of data (rfio). Many connections to the pool nodes exhausted RAM, which slowed things down; with so little room left for cache, the disks got hammered. Fix: reduce the buffer size to 64 MB.

Note: big buffers help CPU efficiency, but the strategy fails if the node is forced into swap when RAM is exhausted, or when bandwidth is saturated. It depends on the number of jobs/connections.
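
A worked example of that trade-off (the 16 GB of pool-node RAM and the 128/64 MB buffer sizes come from the slides; the 2 GB reserved for the OS is an assumed figure): the aggregate buffer footprint is just connections x buffer size, so halving the buffer roughly doubles the number of connections a node can hold before it starts swapping.

  # Illustrative arithmetic only: concurrent RFIO connections that fit in RAM
  # for a given buffer size on a 16 GB pool node.
  RAM_GB = 16
  RESERVED_GB = 2            # assumed headroom for OS, daemons, page cache
  MB_PER_GB = 1024

  for buf_mb in (128, 64):
      available_mb = (RAM_GB - RESERVED_GB) * MB_PER_GB
      max_conns = available_mb // buf_mb
      print("%3d MB buffers: ~%d connections before swapping" % (buf_mb, max_conns))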

Page 11:

Big data files:

RFIO buffer size   Effect
Small              - low CPU efficiency; + low bandwidth reqs.
Big                + high CPU efficiency; - heavy bandwidth reqd.

Page 12:

Small data files:

RFIO buffer size   Effect
Small              - low CPU efficiency; + low bandwidth reqs.; - head node busy
Big                + higher CPU efficiency; + similar bandwidth reqs.; - head node busy

Page 13:

We are ready for real data, we hope, as long as things remain as they are…

Improvements will continue, e.g. four more pool nodes of similar spec.

Page 14:

atlasPool, CAPACITY: 109.98 TB (uses space tokens)
  Space tokens: ATLASDATADISK, ATLASGROUPDISK, ATLASLIVERPOOLDISK, ATLASLOCALGROUPDISK, ATLASMCDISK, ATLASPRODDISK, ATLASSCRATCHDISK, ATLASUSERDISK
  Authorized FQANs: atlas, atlas/Role=lcgadmin, atlas/Role=production, atlas/lcg1, atlas/uk, atlas/uk/Role=NULL

lhcbPool, CAPACITY: 21.99 TB, USED: 2.37 MB
  Authorized FQANs: lhcb, lhcb/Role=lcgadmin, lhcb/Role=production

opsPool, CAPACITY: 116.28 GB, USED: 11.27 GB
  Authorized FQANs: dteam, ops, ops/Role=lcgadmin

t2kPool, CAPACITY: 22.03 TB, USED: 12.00 TB
  Authorized FQANs: t2k, t2k/Role=lcgadmin, t2k/Role=production
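
For quick reference (not on the slide): turning the figures above into used fractions; atlasPool is omitted because no USED value is quoted for it.

  TB, GB, MB = 1e12, 1e9, 1e6

  # Capacities and usage exactly as quoted above.
  pools = {
      "lhcbPool": (21.99 * TB, 2.37 * MB),
      "opsPool":  (116.28 * GB, 11.27 * GB),
      "t2kPool":  (22.03 * TB, 12.00 * TB),
  }

  for name, (capacity, used) in sorted(pools.items()):
      print("%-9s %6.2f%% used" % (name, 100.0 * used / capacity))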

Page 15:

Atlas data access is (apparently) random, which enormously reduces CPU efficiency by hammering the cache. Must this be so? Could it be sorted into some sequence?

Page 16:

Thanks to Rob and John at L’pool.

I hope it helps.