“managing a farm without user jobs would be easier” clusters and users at cern tim smith cern/it

“Managing a farm without user jobs would be easier”

Clusters and Users at CERN

Tim Smith CERN/IT

2002/10/25 HEPiX fall 2002

Contents

The road to shared clusters
Batch cluster
  Configuration
  User challenges
  Addressing the challenges
Interactive cluster
  Load balancing
Conclusions


The Demise of Free Choice

[Figure: number of CPUs (0 to 600) by year/week, 2000 to 2003, broken down by platform: W/NT, AIX, IRIX, HP-UX, DUX, Solaris.]


Cluster Aggregation

[Figure: number of CPUs (0 to 2500) per cluster, Jul-97 to Jul-02. Clusters: lxplus, lxshare, eff, lxbatch, testbed, tomog, tapes, pcsf, nomad, na49, na48, na45, mta, l3c, ion, cms, ccf, atlas, alice.]


Organisational Compromises

Clusters per Groups
  Sized for the average: users lose out
  Sized for user peaks: users gain, financiers lose (wasted resources)
  Invest effort in recuperating cycles for other groups
  Configuration differences / specialities
Bulk Production Clusters
  Production fluctuations dwarf those in user analysis
  Complex cross-submission links


Production Farm: Planning

[Figure: LXSHARE 2002, requested number of PCs (0 to 800) by week number (2 to 53). Series: ATLAS off, LHCb, CMS, ALICE off, NA48, COMPASS, Reserve, IT, EDG, ATLAS on, ALICE on, Available, LCG limit.]


Shared Clusters

[Diagram: a stack of lxplus nodes (lxplus001) behind DNS load balancing, a stack of lxbatch nodes (lxbatch001) under LSF, and disk and tape servers (disk001, tape001) accessed via rfio. 750 Batch Servers, 70 Interactive Servers, 120 Disk Servers.]


Simple, Uniform Shared Cluster?


Partitioning

Still have identified resources
Uniform configuration
Sharing
  Repartitioning or soak-up queues
  If owner experiment reclaims resources, must suspend soak-up jobs: stranded jobs

[Figure: partitions labelled ALICE, ATLAS, CMS, LHCb, ALEPH, DELPHI, L3, OPAL, COMPASS, Ntof, OPERA, SLAP, PARC, PARC Int, CVS, BUILD, DELPHI Int, CSF, Public.]

LSF Fair-Share

Trade in partition for a share
Multilevel
  ATLAS 10%, CMS 12%, …
    cmsprod 45%, HiggsWG 15%, …
      usera 10%, userb 80%, userc 10%
Extra shares for productions
Effort: from juggling resources to
  Accounting
  Demonstrating fairness
  Protecting
  Policing
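To make the multilevel idea concrete, here is a minimal sketch of how nested shares combine by multiplication. It is not LSF's actual priority formula. The percentages are the ones quoted on the slide; placing usera, userb and userc under cmsprod is an assumption, and entries the slide elides with "…" are left out.

```python
# Share tree using the slide's example figures. Each entry is
# (share of the parent, children). Nesting the users under cmsprod
# is an assumption made for this illustration.
SHARES = {
    "ATLAS": (0.10, {}),
    "CMS": (0.12, {
        "cmsprod": (0.45, {
            "usera": (0.10, {}),
            "userb": (0.80, {}),
            "userc": (0.10, {}),
        }),
        "HiggsWG": (0.15, {}),
    }),
}


def effective_share(tree, path):
    """Multiply the shares met while walking `path` down the tree."""
    fraction = 1.0
    level = tree
    for name in path:
        share, children = level[name]
        fraction *= share
        level = children
    return fraction


if __name__ == "__main__":
    path = ["CMS", "cmsprod", "userb"]
    print("/".join(path), f"{effective_share(SHARES, path):.4f}")
```

Under these assumptions userb's slice of the whole farm is 12% of 45% of 80%, a little over 4%.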


Facts and Figures

Accounting
  LSF job records
  Process with C-program
  Load into Oracle DB
  Prepare plots/tables with Crystal Reports package
  LSFAnalyser?
Monitoring
  Poll the user access tools
  SiteAssure?
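The accounting step amounts to reducing job records to per-group totals per week. The sketch below illustrates that reduction only; it is not the C program or the Oracle schema used in production. The record layout (group, finish date, CPU seconds) and the sample values are invented for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical, simplified job records: (group, finish date, CPU seconds).
# Real LSF accounting records carry many more fields.
JOBS = [
    ("atlas", date(2002, 10, 7), 3600.0),
    ("atlas", date(2002, 10, 9), 1800.0),
    ("cms", date(2002, 10, 8), 7200.0),
    ("cms", date(2002, 10, 15), 900.0),
]


def cpu_hours_per_week(jobs):
    """Sum CPU hours per (ISO year, ISO week, group)."""
    totals = defaultdict(float)
    for group, finished, cpu_seconds in jobs:
        year, week, _weekday = finished.isocalendar()
        totals[(year, week, group)] += cpu_seconds / 3600.0
    return dict(totals)


if __name__ == "__main__":
    for key, hours in sorted(cpu_hours_per_week(JOBS).items()):
        print(key, hours)
```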


CPU Time / Week
Merged user analysis and production farms


Performance of Batch
Job Slot Analysis

[Figure: # pending, % free, # not placed and # users over time (Thu, Fri, Sa; 10 min / tick); axis scales 0 to 2500 and 0 to 180.]


Challenging Batch (I)

Probing boundaries
  Flooding
  Concurrent starts
  Uncontrolled status polling
Hitting limits
  Disk space: /tmp, /pool, /var
  Memory, swap full
  Guarantees for other user jobs?
System issues
  Queue drainers


Challenging Batch (II)

Un-Fair-Share
  Logging onto batch machines
  Batch jobs which resubmit themselves
  Forking sessions back to remote hosts
Wasting resources
  Spawning processes which outlive the jobs
  Sleeping processes
  Copying large AFS trees
  Establishing connections to dead machines


Counter Measures

File system quotas
Virtual memory limits
Concurrent jobs limits per user/group
Restricted access through PAM
Instant response queues
Master node setup
  Dedicated, 1GB memory
  Failover cluster
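As an illustration of the per-user/group concurrent job limits, the sketch below shows the admission check such a limit implies. In practice the batch system enforces this through its own configuration; the function name, the job tuples and the limit values here are hypothetical.

```python
from collections import Counter


def dispatchable(pending, running, user_limit, group_limit):
    """Return the pending jobs that may start without breaking either limit.

    Jobs are (job_id, user, group) tuples; `running` lists jobs already
    holding a slot. Pending jobs are considered in submission order.
    """
    per_user = Counter(user for _, user, _ in running)
    per_group = Counter(group for _, _, group in running)
    started = []
    for job in pending:
        _, user, group = job
        if per_user[user] >= user_limit or per_group[group] >= group_limit:
            continue  # stays pending; jobs from other users may still start
        per_user[user] += 1
        per_group[group] += 1
        started.append(job)
    return started


if __name__ == "__main__":
    running = [(1, "usera", "cms")]
    pending = [(2, "usera", "cms"), (3, "usera", "cms"),
               (4, "userb", "cms"), (5, "userc", "atlas")]
    print(dispatchable(pending, running, user_limit=2, group_limit=3))
```

A user who floods the queue is held at the limit while other users' jobs behind them continue to be placed.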


Shared Clusters

[Diagram: a stack of lxplus nodes (lxplus001) behind DNS load balancing, a stack of lxbatch nodes (lxbatch001) under LSF, and disk and tape servers (disk001, tape001) accessed via rfio. 750 Batch Servers, 120 Disk Servers. Labelled: LSF MultiCluster.]


Shared Clusters

[Diagram: a stack of lxplus nodes (lxplus001) behind DNS load balancing, a stack of lxbatch nodes (lxbatch001) under LSF, and disk and tape servers (disk001, tape001) accessed via rfio. 750 Batch Servers, 120 Disk Servers. Labelled: Single Cluster.]


Interactive Cluster

DNS load balancing (ISS)
  Weighted load indexes
    load, memory swap rate, disk IO rate
    # processes, # sessions, # window mgr sessions
  Exclusion thresholds
    file systems full, nologins
  DNS publish 2 every 30 seconds
    Random from lowest 5
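The selection rule above (weighted index, exclusion thresholds, two nodes drawn at random from the five least loaded) can be sketched as follows. This is not the ISS implementation: the weights, metric keys and function names are assumptions, since the slide names the metrics but gives no numbers.

```python
import random

# Hypothetical weights; the slide lists the metrics but not their weights.
WEIGHTS = {"load": 1.0, "swap_rate": 0.5, "disk_io_rate": 0.5, "sessions": 0.1}


def load_index(metrics):
    """Weighted sum of a node's metrics; lower means less loaded."""
    return sum(WEIGHTS[name] * metrics.get(name, 0.0) for name in WEIGHTS)


def is_excluded(metrics):
    """Exclusion thresholds: a full file system or logins disabled."""
    return bool(metrics.get("fs_full", False) or metrics.get("nologin", False))


def choose_published(nodes, pool_size=5, publish=2, rng=random):
    """Pick `publish` nodes at random from the `pool_size` least loaded."""
    eligible = {name: m for name, m in nodes.items() if not is_excluded(m)}
    ranked = sorted(eligible, key=lambda name: load_index(eligible[name]))
    pool = ranked[:pool_size]
    return rng.sample(pool, min(publish, len(pool)))


if __name__ == "__main__":
    nodes = {f"lxplus{i:03d}": {"load": float(i)} for i in range(1, 9)}
    nodes["lxplus001"]["nologin"] = True  # excluded despite the lowest load
    print(choose_published(nodes))
```

Re-running the choice every 30 seconds and drawing at random from the low end keeps new sessions from all landing on the single least-loaded node between updates.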


Daily Users

35 users/node


Challenging Interactive

Sidestep load balancing
  Parallel sessions across farm
Running daemons
Brutal logouts
  Open connections
  Defunct processes
  CPU sapping orphaned processes
Monitoring + beniced + monthly reboots


Interactive Reboots


Conclusions

Shared clusters present more user opportunities
  Both good and bad!
Don’t represent a panacea for sysadmins!
