CERN - European Laboratory for Particle Physics: HEP Computer Farms. Frédéric Hemmer, CERN Information Technology Division, Physics Data Processing Group


HEP Computer Farms

Frédéric Hemmer CERN

Information Technology Division

Physics Data processing Group

MS Research, 27 January 1999

Frédéric Hemmer CERN-IT/PDP


Outline

Offline analysis (current)

Quasi-online analysis (2000)

Online analysis (2005)


Physics Data Processing evolution

1990->1997
• Migration from mainframe computing to distributed RISC/Unix computing

Reduce acquisition/maintenance costs

Decrease price/performance ratio

but ... system management costs becoming a serious issue

1997->...
• Migration from distributed RISC/Unix to PC (NT & Linux) technology

Reduce acquisition/maintenance costs even further

Possible only starting with Ppro performance

but new issues: OS, management model, stability, performance, technology evolution, etc.


Physics Data Processing evolution (II)

Before 1992
• Manual tape transfer

1992-1994
• Central Data Recording (< 1 MB/s)

1998
• Computer Center part of the experiment
• CDR (20 MB/s) and online tagging

2000
• CDR (35 MB/s) and online filtering

2005
• CDR (100-1000 MB/s) and online filtering


Offline analysis

[Diagram: NT PCs connected through the network to Unix RFIO servers and a Unix tape server; access uses RFIO and the stagexxx commands]


NT Simulation Facility : Goals

Make PC+NT a standard option for Physics Data Processing, starting with simulation

Establish a minimum management model for NT farm management

Address scalability issues

Gain Windows NT experience


Physically ...

[Two images, labelled 1997 and 1998]


PCSF Usage

[Chart: PCSF usage in NCU hours per week (y-axis 0 to 8000; x-axis week # 43-52, then 1-33), split into Idle and Used]


Key Issues

AFS access

LSF support

Boot PROMs, equipment interoperability

Code reintegration (Physics & CERNLIB)

Think Windows

Scalability & management (home-grown solution vs. commercial apps)


Next Steps

Finish and understand remote boot issues

Complete remote boot - remote install

AFS integration

Build up resilience

Investigate how to use the new WfM, DMI, PXE, ACPI, etc. initiatives

Investigate whether WSH is an alternative

Investigate NT’s I/O capabilities


Conclusions

PC+NT has proven to work in a batch environment, and is now an option for Physics Data Processing

Farm management is less of a concern after having built a few tools (alternatives would be to use SMS or TNG), but some work is still needed

Scalability has started to be addressed, but the relatively small number of nodes does not help here

Considerable NT experience has been gained


NCF Initial conclusions (Feb. 1998)

PCs can be used for low I/O tasks

Confirmed with
• Simulation on PCSF
• NOMAD reconstruction on Linux
• NA45 reconstruction on NT

PCs are not adequate as disk servers at present

Mixed PC/RISC Unix clusters will be used for 1998


PC NT based Tape server

Goal:
• Attach SCSI/FC-AL tape drives to PCs running NT and provide access to 100s of TB through Gbit Ethernet/HiPPI in order to reduce server acquisition prices.
• Obtain good enough performance on this platform (this has already been demonstrated on Linux).

Is HPSS out of the game here?


Disk performance (Feb. 1998)

# Streams   Linux MB/s   Linux CPU %   Windows/NT MB/s   Windows/NT CPU %   Max MB/s
1           10.5         33            8.5               35                 11
2           21           63            9.2               35                 70
3           21           100           13.5              60                 70

• Linux striping has no effect
• 1 stream, 2 stripes: 21 MB/s (22 max)
• 1 stream, 3 stripes: 21 MB/s (33 max)
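The two striping measurements can be compared by dividing each measured rate by the quoted maximum. The sketch below does only that; reading "max" as the aggregate rate the stripes could deliver in the ideal case is an assumption, and the function name `stripe_efficiency` is introduced here for illustration.

```python
def stripe_efficiency(measured_mb_s: float, max_mb_s: float) -> float:
    """Measured throughput as a fraction of the quoted maximum."""
    return measured_mb_s / max_mb_s

# stripes -> (measured MB/s, quoted max MB/s), taken from the two bullets
results = {2: (21.0, 22.0), 3: (21.0, 33.0)}

for stripes, (measured, maximum) in results.items():
    print(f"{stripes} stripes: {stripe_efficiency(measured, maximum):.2f} of max")
```

Under that reading, adding a third stripe raises the ceiling but not the measured rate, so the fraction of the maximum drops from about 0.95 to about 0.64.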


Memory bandwidth (lmbench)

[Chart: memory read and memory write bandwidth in MB/s (y-axis 0 to 350) per equipment: Tahoe2, DK440LX, Thunder2, Tiger2, GA686DLX, GA686 (CPU1), GA686 (CPU2), DEC PWS 433, SUN Ultra 5, Thunder100, N440BX, Kayak XA's, Compaq Proliant 1600]


PC NT based Disk server

Goal:
• Attach (RAID) disk drives to PCs running NT and provide access to 10s of TB through Gbit Ethernet/HiPPI in order to reduce server acquisition prices.
• Obtain good enough performance on this platform (including using Objectivity databases).
• Issues
– Scalability, disk & network performance


Current & Future Data rates at CERN

Year        Experiments   Bandwidth (MB/s)   Raw data (TB/year)   Processing (SPECint95)
1990-2000   LEP           0.5                1                    100
1997-2000   SPS           15-20              30-70                500
2000-2008   SPS           35                 300                  2000
2004-       LHC           100-1000           3000                 50000
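As a rough cross-check of the table, the sketch below converts a bandwidth into the volume it would produce if sustained for a whole calendar year and compares that with the quoted raw data volume. The "duty fraction" is a derived quantity introduced here for illustration and is not on the slide; decimal units (1 TB = 10^6 MB) are assumed, and only the two rows with single-valued bandwidths are used.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # calendar year, no downtime
MB_PER_TB = 1_000_000               # assumption: decimal units

def continuous_tb_per_year(bandwidth_mb_s: float) -> float:
    """Volume written in one year if the bandwidth were sustained nonstop."""
    return bandwidth_mb_s * SECONDS_PER_YEAR / MB_PER_TB

def duty_fraction(bandwidth_mb_s: float, raw_tb_per_year: float) -> float:
    """Quoted yearly volume as a fraction of the nonstop yearly volume."""
    return raw_tb_per_year / continuous_tb_per_year(bandwidth_mb_s)

# (label, bandwidth MB/s, raw data TB/year): single-valued rows from the table
rows = [("LEP 1990-2000", 0.5, 1.0), ("SPS 2000-2008", 35.0, 300.0)]

for label, bw, raw in rows:
    print(f"{label}: {continuous_tb_per_year(bw):.1f} TB/year if continuous, "
          f"duty fraction {duty_fraction(bw, raw):.2f}")
```

With these assumptions the quoted volumes correspond to running at the full rate for roughly 6% (LEP) and 27% (SPS, 2000-2008) of the year, which suggests the bandwidth column is a peak recording rate rather than a yearly average.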


NA48 setup (1999)

[Diagram labels: Sub detector VME crates; Event Builder; Fast Ethernet; Gigabit Ethernet; HiPPI; 3Com 3900; 3Com 9300; Cisco 5505; 7 km; 4 * SUN E450; 4.5 TB disk space; On/Offline PC Farm]


Compass (1999-2008)

Parameters
• 300 TB/year
• 5-20 TB disks
• 20000 CU = 200 PII@450 MHz
• 35 MB/s
• Objectivity

NT is considered for both computing and data serving.
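Two figures follow directly from the Compass parameters: the CU rating per processor implied by "20000 CU = 200 PII@450 MHz", and how long the disk pool could absorb data at 35 MB/s. The sketch below works both out. Treating the disk purely as a buffer for incoming data is an assumption made here for illustration, as are the decimal units and the function names.

```python
MB_PER_TB = 1_000_000  # assumption: decimal units

def cu_per_processor(total_cu: float, processors: int) -> float:
    """CU rating implied for a single processor."""
    return total_cu / processors

def buffer_hours(disk_tb: float, rate_mb_s: float) -> float:
    """Hours of data taking a disk pool can absorb at a constant rate."""
    return disk_tb * MB_PER_TB / rate_mb_s / 3600

print("CU per PII@450 MHz:", cu_per_processor(20000, 200))
for disk_tb in (5, 20):
    hours = buffer_hours(disk_tb, 35)
    print(f"{disk_tb} TB at 35 MB/s: {hours:.1f} hours")
```

Under these assumptions each processor is rated at 100 CU, and 5 TB to 20 TB of disk would hold roughly 40 to 159 hours of data at the full rate.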


CMS Trigger DAQ


Main challenges

Management/Control/Monitoring of the filter applications
• how to distribute the work (Symera, Clustor ?)
• how to manage 1000s of tasks at every moment

Management/Control/Monitoring of the “computer system” itself
• could be 1000s of computer systems
• or one very large SMP
• or a combination
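The work-distribution question can be illustrated with a minimal sketch in which a pool of workers applies a filter to independent events and the results are tallied centrally. This is not Symera or Clustor and not the CMS design; the threads stand in for farm nodes, and `passes_filter`, `run_filter`, the event layout and the energy threshold are all hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def passes_filter(event: dict) -> bool:
    """Hypothetical filter: keep events whose energy exceeds a threshold."""
    return event["energy"] > 50.0

def run_filter(events, n_workers: int = 4):
    """Spread events over a worker pool; return kept events and tallies."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() hands events to idle workers and preserves input order
        decisions = list(pool.map(passes_filter, events))
    kept = [e for e, ok in zip(events, decisions) if ok]
    stats = Counter("kept" if ok else "rejected" for ok in decisions)
    return kept, stats

# Synthetic events for the demonstration only
events = [{"id": i, "energy": float(i)} for i in range(100)]
kept, stats = run_filter(events)
print(stats["kept"], "kept,", stats["rejected"], "rejected")
```

Because each event is filtered independently, the same pattern applies whether the workers are threads, processes on one large SMP, or tasks on thousands of separate machines; what changes is the monitoring and control needed around it.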


NT Prototype

Modest farm exists in CPPM (Marseille):
• 4 Dual Ppro’s @ 200 MHz
• Small SUN as injector
• Fast Ethernet switch

Regularly being scaled up at CERN using larger configurations:
• Gbit Ethernet
• 30 Processors