“Managing a farm without user jobs would be easier”
Clusters and Users at CERN
Tim Smith CERN/IT
2002/10/25 HEPiX fall 2002: [email protected] 2
Contents
The road to shared clusters
Batch cluster
  Configuration
  User challenges
  Addressing the challenges
Interactive cluster
  Load balancing
Conclusions
The Demise of Free Choice
[Figure: number of CPUs (0 to 600) by year/week, 2000 to 2003, for W/NT, AIX, IRIX, HP-UX, DUX and Solaris]
Cluster Aggregation
[Figure: number of CPUs (0 to 2500) per cluster, Jul-97 to Jul-02. Clusters: lxplus, lxshare, eff, lxbatch, testbed, tomog, tapes, pcsf, nomad, na49, na48, na45, mta, l3c, ion, cms, ccf, atlas, alice]
Organisational Compromises
Clusters per Groups
  Sized for the average users
  Sized for user peaks: users; financiers: wasted resources
  Invest effort in recuperating cycles for other groups
  Configuration differences / specialities
Bulk Production Clusters
  Production fluctuations dwarf those in user analysis
  Complex cross-submission links
Production Farm: Planning
[Figure: LXSHARE 2002: requested number of PCs (0 to 800) by week number (2 to 53). Series: ATLAS off, LHCb, CMS, ALICE off, NA48, COMPASS, Reserve, IT, EDG, ATLAS on, ALICE on, Available, LCG limit]
Shared Clusters
[Diagram: interactive nodes (lxplus001, ...) behind DNS load balancing; batch nodes (lxbatch001, ...) under LSF; disk servers (disk001, ...) and tape servers (tape001, ...) accessed via rfio]
750 Batch Servers
70 Interactive Servers
120 Disk Servers
Partitioning
Still have identified resources
Uniform configuration
Sharing
  Repartitioning or soak-up queues
  If owner experiment reclaims resources, must suspend soak-up jobs, which leaves stranded jobs
[Diagram: partitions labelled ALICE, ATLAS, CMS, LHCb, ALEPH, DELPHI, L3, OPAL, COMPASS, Ntof, OPERA, SLAP, PARC, PARC Int, CVSBUILD, DELPHI Int, CSF, Public]
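The reclaim step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how LSF implements it: the job dictionaries, the `reclaim` function and the state names `RUN` and `SUSP` are all hypothetical.

```python
def reclaim(partition_owner, jobs):
    """Suspend soak-up jobs running on the owner's partition.

    A soak-up job is one whose group differs from the partition owner.
    Returns the names of the jobs that are now suspended (stranded).
    """
    stranded = []
    for job in jobs:
        on_owner_nodes = job["partition"] == partition_owner
        is_guest = job["group"] != partition_owner
        if on_owner_nodes and is_guest and job["state"] == "RUN":
            job["state"] = "SUSP"  # hypothetical state name
            stranded.append(job["name"])
    return stranded


# Hypothetical example: one ATLAS job soaking up idle CMS nodes.
JOBS = [
    {"name": "j1", "group": "cms", "partition": "cms", "state": "RUN"},
    {"name": "j2", "group": "atlas", "partition": "cms", "state": "RUN"},
    {"name": "j3", "group": "atlas", "partition": "atlas", "state": "RUN"},
]
```

Calling `reclaim("cms", JOBS)` suspends only `j2`. That job stays stranded until CMS releases the nodes again, which is the cost of soak-up queues noted on the slide.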
LSF Fair-Share
Trade-in partition for a share
Multilevel
  ATLAS 10%, CMS 12%, …
  cmsprod 45%, HiggsWG 15%, …
  usera 10%, userb 80%, userc 10%
Extra shares for productions
Effort: Juggling resources to
  Accounting
  Demonstrating fairness
  Protecting
  Policing
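One way to read the multilevel shares is as a tree: an experiment's share of the farm, a group's share within the experiment, and a user's share within the group. The Python sketch below assumes that a user's nominal share is simply the product of the shares along that path. The nesting (cmsprod and HiggsWG under CMS, the three users under cmsprod) is an assumption made for illustration, and the recent-usage terms that a real fair-share scheduler also weighs are not modelled.

```python
# Hypothetical share tree built from the percentages on the slide.
# The nesting of groups and users is assumed, not stated in the talk.
SHARES = {
    "ATLAS": {"share": 0.10, "children": {}},
    "CMS": {
        "share": 0.12,
        "children": {
            "cmsprod": {
                "share": 0.45,
                "children": {
                    "usera": {"share": 0.10, "children": {}},
                    "userb": {"share": 0.80, "children": {}},
                    "userc": {"share": 0.10, "children": {}},
                },
            },
            "HiggsWG": {"share": 0.15, "children": {}},
        },
    },
}


def effective_share(tree, path):
    """Multiply the shares along path, e.g. ("CMS", "cmsprod", "userb")."""
    result = 1.0
    level = tree
    for name in path:
        node = level[name]
        result *= node["share"]
        level = node["children"]
    return result
```

Under these assumptions userb's nominal slice of the whole farm is 0.12 × 0.45 × 0.80, about 4.3%.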
Facts and Figures
Accounting
  LSF job records
  Process with C-program
  Load into Oracle DB
  Prepare plots/tables with Crystal Reports package
  LSFAnalyser ?
Monitoring
  Poll the user access tools
  SiteAssure ?
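The accounting chain above boils down to aggregating job records into per-group, per-week totals. The Python sketch below shows that aggregation step only. The record layout (group, finish date, CPU seconds) is a simplified, hypothetical stand-in for the real LSF job records, and the talk's actual chain uses a C program and an Oracle database rather than Python.

```python
from collections import defaultdict
from datetime import date

# Hypothetical, simplified job records: (group, finish date, CPU seconds).
JOBS = [
    ("atlas", date(2002, 10, 21), 3600.0),
    ("cms", date(2002, 10, 22), 7200.0),
    ("atlas", date(2002, 10, 25), 1800.0),
    ("cms", date(2002, 10, 28), 600.0),
]


def cpu_hours_per_week(jobs):
    """Sum CPU hours per (ISO year, ISO week, group)."""
    totals = defaultdict(float)
    for group, finished, cpu_seconds in jobs:
        iso = finished.isocalendar()  # (ISO year, ISO week, weekday)
        totals[(iso[0], iso[1], group)] += cpu_seconds / 3600.0
    return dict(totals)
```

Totals keyed this way map directly onto a CPU time per week plot, with one series per group.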
CPU Time / Week
Merged user analysis and production farms
Performance of Batch
Job Slot Analysis
[Figure: # pending, % free, # not placed and # users over time, Thu to Sat, 10 min / tick]
Challenging Batch (I)
Probing boundaries
  Flooding
  Concurrent starts
  Uncontrolled status polling
Hitting limits
  Disk space: /tmp, /pool, /var
  Memory, Swap Full
Guarantees for other user jobs?
System Issues
  Queue drainers
Challenging Batch (II)
Un-Fair-Share
  Logging onto batch machines
  Batch jobs which resubmit themselves
  Forking sessions back to remote hosts
Wasting resources
  Spawning processes which outlive the jobs
  Sleeping processes
  Copying large AFS trees
  Establishing connections to dead machines
Counter Measures
File system quotas
Virtual memory limits
Concurrent jobs limits per user/group
Restricted access through PAM
Instant response queues
Master node setup
  Dedicated, 1GB memory
  Failover cluster
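As an illustration of the concurrent-jobs limit per user/group, here is a minimal admission check in Python. The limit values and the `may_start` helper are hypothetical. The slide does not give the actual limits, and in practice the limits would be enforced by the batch system's own configuration rather than by code like this.

```python
from collections import Counter

# Hypothetical limits; the slide does not state the real values.
USER_LIMIT = 2
GROUP_LIMIT = 3


def may_start(job, running, user_limit=USER_LIMIT, group_limit=GROUP_LIMIT):
    """Return True if starting job keeps its user and group within limits."""
    per_user = Counter(j["user"] for j in running)
    per_group = Counter(j["group"] for j in running)
    return (per_user[job["user"]] < user_limit
            and per_group[job["group"]] < group_limit)
```

A check like this caps the damage a single user can do by flooding the queue, at the price of leaving slots idle when only one user has work.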
Shared Clusters
[Diagram: lxplus interactive nodes (DNS load balancing), lxbatch batch nodes (LSF), disk and tape servers (rfio); 750 batch, 70 interactive and 120 disk servers. Label: LSF MultiCluster]
Shared Clusters
[Diagram: lxplus interactive nodes (DNS load balancing), lxbatch batch nodes (LSF), disk and tape servers (rfio); 750 batch, 70 interactive and 120 disk servers. Label: Single Cluster]
Interactive Cluster
DNS load balancing (ISS)
Weighted load indexes
  load, memory swap rate, disk IO rate
  # processes, # sessions, # window mgr sessions
Exclusion thresholds
  file systems full, nologins
DNS publish 2 every 30 seconds
  Random from lowest 5
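The selection rule above (a weighted load index, exclusion thresholds, then two names picked at random from the five least loaded nodes) can be sketched in Python as follows. This is not the ISS implementation: the weights, metric names and node records are assumptions for illustration, and only the "2 from the lowest 5" rule comes from the slide.

```python
import random

# Hypothetical weights; the slide lists the metrics but not their weights.
WEIGHTS = {"load": 1.0, "swap_rate": 0.5, "disk_io_rate": 0.5, "sessions": 0.1}


def load_index(metrics):
    """Weighted sum of the metrics; missing metrics count as zero."""
    return sum(WEIGHTS[name] * metrics.get(name, 0.0) for name in WEIGHTS)


def is_excluded(node):
    """Exclusion thresholds: a full file system or logins disabled."""
    return node.get("fs_full", False) or node.get("nologin", False)


def choose_published(nodes, lowest=5, publish=2, rng=random):
    """Pick `publish` names at random from the `lowest` least loaded nodes."""
    candidates = [n for n in nodes if not is_excluded(n)]
    candidates.sort(key=lambda n: load_index(n["metrics"]))
    pool = candidates[:lowest]
    return [n["name"] for n in rng.sample(pool, min(publish, len(pool)))]


# Hypothetical farm of eight nodes; lxplus001 has logins disabled.
NODES = [
    {"name": f"lxplus{i:03d}", "metrics": {"load": float(i)}, "nologin": i == 1}
    for i in range(1, 9)
]
```

Running `choose_published(NODES)` every 30 seconds and publishing the result in DNS would reproduce the scheme on the slide. Picking at random among the lowest five, rather than always the two least loaded, avoids sending every new session to the same pair of machines between updates.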
Challenging Interactive
Sidestep load balancing
Parallel sessions across farm
Running daemons
Brutal logouts
  Open connections
  Defunct processes
  CPU sapping orphaned processes
Monitoring + beniced + monthly reboots
Interactive Reboots
Conclusions
Shared clusters present more user opportunities
  Both good and bad!
They don't represent a panacea for sysadmins!