HEP Computing Status
Sheffield University
Matt Robinson
Paul Hodgson
Andrew Beresford
Interactive Cluster
• 30 self-built Linux boxes
• AMD Athlon XP CPUs, 256/512 MB RAM
• OS: Scientific Linux 303
• 100 Mbit network
• NIS for authentication; /home etc. NFS-mounted
• System install using kickstart + post-install scripts
• Separate backup machine
• 15 laptops, mostly dual boot
• Some Macs and one Windows box
• 3 disk servers mounted as /data1, /data2, etc. (a few TB)
Batch Cluster
• 100-CPU farm, Athlon XP 2400/2800
• OS: Scientific Linux 303
• NFS-mounted /home and /data
• OpenPBS batch system for job submission
• Gigabit backbone with 100 Mbit to worker nodes
• Disk server provides 1.3 TB as /data (RAID 5)
• Entire cluster assembled in house from OEM components for less than 50k
• Hard part was finding an air-conditioned room with sufficient power
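To see why the 100 Mbit links to the worker nodes are not usually the limiting factor, here is a rough Python sketch. It assumes that every node reads from the single /data disk server at the same time, that links run at their nominal rate, and that protocol and disk overheads can be ignored. The function names are hypothetical.

```python
def per_node_share_mbit(n_nodes: int,
                        node_link_mbit: float = 100.0,
                        server_link_mbit: float = 1000.0) -> float:
    """Nominal bandwidth (Mbit/s) each node gets when n_nodes read at once.

    Hypothetical helper: assumes one gigabit link into the disk server,
    shared fairly between nodes, with each node capped by its own
    100 Mbit link.
    """
    if n_nodes <= 0:
        raise ValueError("n_nodes must be positive")
    fair_share = server_link_mbit / n_nodes
    # A node can never exceed the speed of its own link.
    return min(node_link_mbit, fair_share)


def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Convert Mbit/s to MB/s (8 bits per byte)."""
    return mbit_per_s / 8


for n in (5, 10, 100):
    share = per_node_share_mbit(n)
    print(f"{n:3d} nodes: {share:6.1f} Mbit/s each ({mbit_to_mbyte(share):.2f} MB/s)")
```

Under these assumptions, 100 nodes reading at once get about 10 Mbit/s (roughly 1.25 MB/s) each, so the shared gigabit uplink, not the node links, sets the ceiling. Up to ten busy nodes can still each use their full 100 Mbit.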
Cluster Usage
Software
• PAW, CERNLIB etc.
• Geant4
• ROOT
• ATLAS 10.0.1
• FLUKA
• ANSYS, LS-DYNA
Comments - Issues
• Have tightened up security in the last year
• Strict firewall policy, with exemptions for a limited number of machines
• Blocking scripts deny SSH access after 3 authentication failures within 1 hour
• Cheap disks allow construction of large disk arrays
• Very happy with SL3 for desktop machines
• Use FC3 for laptops (2.6 kernel)
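The SSH blocking rule is a sliding-window count of failures per source address. A minimal Python sketch of that logic follows. It assumes the caller supplies timestamps (for example, parsed from sshd log lines) and deliberately leaves out the actual blocking step, such as adding a firewall rule. The class and method names are hypothetical, not those of the scripts used at Sheffield.

```python
from collections import defaultdict, deque

MAX_FAILURES = 3       # failures that trigger a block (from the slide)
WINDOW_SECONDS = 3600  # 1-hour sliding window (from the slide)


class FailureTracker:
    """Hypothetical per-address tracker of SSH authentication failures."""

    def __init__(self, max_failures: int = MAX_FAILURES,
                 window: int = WINDOW_SECONDS) -> None:
        self.max_failures = max_failures
        self.window = window
        self.failures = defaultdict(deque)  # address -> failure timestamps
        self.blocked = set()

    def record_failure(self, address: str, timestamp: float) -> bool:
        """Record a failed login; return True if the address is now blocked."""
        times = self.failures[address]
        times.append(timestamp)
        # Discard failures that have fallen out of the window.
        while times and timestamp - times[0] >= self.window:
            times.popleft()
        if len(times) >= self.max_failures:
            self.blocked.add(address)
        return address in self.blocked


tracker = FailureTracker()
for t in (0, 600, 1200):
    print("192.0.2.10", t, tracker.record_failure("192.0.2.10", t))
for t in (0, 2000, 4000):
    print("192.0.2.20", t, tracker.record_failure("192.0.2.20", t))
```

Three failures from one address within the hour trigger a block. The same three failures spread over more than an hour do not, because older entries drop out of the window.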
The Sheffield LCG Cluster
Division of Hardware
• 162 x AMD Opteron 250 (2.4 GHz)
• 4 GB RAM/box (2 GB/CPU)
• 72 GB U320 10K RPM local SCSI disk
• Currently running 32-bit SL303 for maximum compatibility with the grid
• ~2.5 TB storage for experiments
• Middleware: LCG 2.4.0
• Probably the most purple cluster in the grid
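The per-box figures imply the overall size of the cluster. This Python arithmetic sketch assumes every box is configured identically and that the 72 GB disk figure is per box; the slide does not state either explicitly.

```python
CPUS = 162                 # Opteron 250 CPUs (from the slide)
RAM_PER_BOX_GB = 4         # from the slide
RAM_PER_CPU_GB = 2         # from the slide
LOCAL_DISK_PER_BOX_GB = 72 # assumption: one 72 GB SCSI disk per box

# 4 GB per box at 2 GB per CPU means two CPUs per box.
cpus_per_box = RAM_PER_BOX_GB // RAM_PER_CPU_GB
assert CPUS % cpus_per_box == 0, "CPU count should divide evenly into boxes"

boxes = CPUS // cpus_per_box
total_ram_gb = boxes * RAM_PER_BOX_GB
total_local_disk_gb = boxes * LOCAL_DISK_PER_BOX_GB

print(f"{cpus_per_box} CPUs/box, {boxes} boxes")
print(f"total RAM: {total_ram_gb} GB, total local disk: {total_local_disk_gb} GB")
```

Under those assumptions, the 162 CPUs sit in 81 dual-CPU boxes with 324 GB of RAM and about 5.8 TB of local disk in total.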
Looking Sinister
Status
Usage so far
• We can take quite a bit more.
Monitoring
• Ganglia, with a modified web front end that also presents queue information
Installation
• Service nodes connected to both the VPN and the Internet
• PXE installation via the VPN allows complete control of dhcpd and named
• Red Hat kickstart + post-install script
• SSH servers not exposed
• R-GMA always the hardest part
• Stumbled across routing rules.
• WN (worker node) install takes about 30 minutes; up to 40 can be installed simultaneously
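The install figures give a quick estimate of how long a full reinstall of the worker nodes would take. The Python sketch below assumes installs run in batches of at most 40 and that each takes about 30 minutes regardless of how many run at once. The helper name and the node counts used are illustrative.

```python
import math

INSTALL_MINUTES = 30  # approximate time per worker node install (from the slide)
MAX_PARALLEL = 40     # maximum simultaneous installs (from the slide)


def reinstall_minutes(n_nodes: int,
                      install_minutes: int = INSTALL_MINUTES,
                      max_parallel: int = MAX_PARALLEL) -> int:
    """Estimate wall-clock minutes to install n_nodes worker nodes.

    Hypothetical helper: assumes fixed-size batches and a constant
    per-install time, ignoring contention on the install server.
    """
    if n_nodes <= 0:
        return 0
    batches = math.ceil(n_nodes / max_parallel)
    return batches * install_minutes


for n in (40, 80, 81):
    print(f"{n} nodes: about {reinstall_minutes(n)} minutes")
```

For example, 40 nodes take about 30 minutes, while 81 nodes need three batches and about 90 minutes.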
Future plans
• Keep up with middleware updates
• Increase available storage as required in ~3-4 TB steps
• Fix things that break
• Try not to mess anything up by screwing around
• Look toward operating with a 64-bit OS