
Slide: 1

Update on Remote Real-Time Computing Farms for ATLAS Trigger DAQ

Richard Hughes-Jones, University of Manchester
T2UK, October 06, Manchester

In collaboration with:
    Bryan Caron, University of Alberta
    Krzysztof Korcyl, IFJ PAN Krakow
    Catalin Meirosu, TERENA
    Jakob Langgard Nielsen, Niels Bohr Institute

Slide: 2

Tests from Krakow

Configure/Test/Operate Krakow – CERN ANT GPS box
    Learned many details!
    Successful GPS and ANT operation using GUI at CERN
    Histograms still to be done

Synchronized Manchester – Krakow clocks with GPS

Configured ANT test Krakow – Manchester, BUT …
    Manc Alteon card: 66 MHz
    Manc Netgear card: 0.5 Mbyte
    Exchange at CERN next week

Online help from Catalin

udpmon tests
    Krakow – CERN
    CERN – Krakow
    Manc – CERN

Slide: 3

GÉANT2 Topology

Slide: 4

GÉANT2: The Convergence Solution

[Diagram labels: NREN Access (×2), Existing IP Router (×2), GÉANT2 POP A, GÉANT2 POP B, Managed Lambdas, 1626 LM (×2), 1678 MCC (×2), L2 Matrix, TDM Matrix, Dark Fiber, EXPReS PC (×2), 10 GE (×2)]

Slide: 5

Pionier Weather Map

Slide: 6

Test Traffic on Pionier

Slide: 7

CERN – Krakow: Throughput

CERN → Krakow (atb79-zeus15_28Sep06)
    800 Mbit/s
    20% loss in the network

Krakow → CERN (zeus15-atb79_29Sep06)
    600-800 Mbit/s
    40% loss in the network
    ~3 × more re-ordering

[Plots, one set per direction: received wire rate (Mbit/s, 0-1000), % packet loss (0-100) and number of packets re-ordered, each against spacing between frames (0-40 µs), for packet sizes of 50, 100, 200, 400, 600, 800, 1000, 1200, 1400 and 1472 bytes. The re-ordered axis runs to 100000 for atb79-zeus15 and to 250000 for zeus15-atb79.]
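The throughput plots relate the spacing between frames to the rate seen on the wire. A minimal sketch of that relationship follows; it is not udpmon itself. It assumes UDP over IPv4 on Gigabit Ethernet and counts 66 bytes of per-frame overhead (UDP, IP and Ethernet headers, FCS, preamble and inter-frame gap). The function name, the overhead constant and the 1 Gbit/s link rate are assumptions made for illustration.

```python
# Assumed per-frame overhead for a UDP/IPv4 datagram on Ethernet:
# 8 (UDP) + 20 (IPv4) + 14 (Ethernet header) + 4 (FCS) + 8 (preamble) + 12 (inter-frame gap)
OVERHEAD_BYTES = 8 + 20 + 14 + 4 + 8 + 12  # 66 bytes
LINK_RATE_MBIT = 1000.0  # assumed Gigabit Ethernet; 1 Mbit/s == 1 bit per microsecond


def offered_wire_rate_mbit(payload_bytes: int, spacing_us: float) -> float:
    """Wire rate in Mbit/s when one frame is sent every spacing_us microseconds."""
    bits_on_wire = (payload_bytes + OVERHEAD_BYTES) * 8
    # A frame occupies the link for this long, so frames cannot be closer together.
    min_spacing_us = bits_on_wire / LINK_RATE_MBIT
    effective_spacing_us = max(spacing_us, min_spacing_us)
    return bits_on_wire / effective_spacing_us  # bits per microsecond == Mbit/s


if __name__ == "__main__":
    for size in (50, 400, 1000, 1472):
        rates = [round(offered_wire_rate_mbit(size, s), 1) for s in (5, 10, 20, 40)]
        print(f"{size:5d} bytes: {rates} Mbit/s at 5, 10, 20, 40 us spacing")
```

Under these assumptions a 1472 byte packet sent every 20 µs offers 615.2 Mbit/s, and the same packet saturates the link once the spacing drops below about 12.3 µs. A measured curve that falls below this ideal points at loss or a bottleneck in the hosts or the network.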

Slide: 8

CERN – Krakow: Latency

Panels: CERN → Krakow and Krakow → CERN

Annotations
    FWHM ~10 µs, peak separation 90-120 µs
    FWHM 50 µs, peak separation ~135 µs

[Plots: latency histograms N(t) against latency (µs) for 256, 512, 1024 and 1400 byte packets. zeus15-atb79_29Sep06: latency axis 37100-38000 µs. atb79-zeus15_28Sep06: latency axis 37000-38800 µs.]
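The FWHM figures quoted for the latency histograms can be read off a binned distribution as sketched below. This is one plausible way to compute it, not necessarily how the slide's numbers were obtained. The function name `fwhm` and the linear interpolation between bins are assumptions.

```python
def fwhm(bin_centres, counts):
    """Full width at half maximum of the tallest peak, using linear interpolation."""
    peak = max(range(len(counts)), key=counts.__getitem__)
    half = counts[peak] / 2.0

    def crossing(step):
        # Walk away from the peak while the next bin is still above half maximum.
        i = peak
        while 0 <= i + step < len(counts) and counts[i + step] > half:
            i += step
        j = i + step
        if not 0 <= j < len(counts):
            return bin_centres[i]  # the peak runs off the edge of the histogram
        # Interpolate between the last bin above half and the first bin at or below it.
        frac = (counts[i] - half) / (counts[i] - counts[j])
        return bin_centres[i] + frac * (bin_centres[j] - bin_centres[i])

    return crossing(+1) - crossing(-1)


if __name__ == "__main__":
    # Synthetic triangular peak, for illustration only (not measured data).
    centres = [37400, 37410, 37420, 37430, 37440]
    counts = [0, 50, 100, 50, 0]
    print(fwhm(centres, counts), "us")
```

For a histogram with several peaks, the same function can be applied to a slice around each peak, and the peak separation is the difference between the bin centres of the two maxima.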

Slide: 9

CERN – Krakow: 1-way delay

Panels: CERN → Krakow and Krakow → CERN, both with 20 µs packet spacing

Annotation: Interrupt Coalescence

[Plots: 1-way delay (µs) against packet number, panel titles "W20 zeus15-atb79_29Sep06" and "W20 atb79-zeus15_28Sep06". Each data set is shown for packets 0-5000 (delay axis 18800-19800 µs) and for packets 0-500 (delay axis 18900-19600 µs).]
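To illustrate why interrupt coalescence shows up in a 1-way delay trace, the toy model below timestamps each packet only when the receiving NIC next raises an interrupt. It is a sketch under stated assumptions, not a model of the actual NICs used. The fixed 100 µs interrupt timer, the constant 19000 µs network delay and the function name are all hypothetical; only the 20 µs packet spacing comes from the slide.

```python
import math


def one_way_delays(n_packets, spacing_us=20.0, network_delay_us=19000.0, coalesce_us=100.0):
    """One-way delay per packet when the receiver only timestamps at interrupt time."""
    delays = []
    for k in range(n_packets):
        sent = k * spacing_us              # sender timestamp
        at_nic = sent + network_delay_us   # arrival at the receiving NIC
        # Assumed behaviour: the NIC holds packets until the next fixed timer tick.
        seen = math.ceil(at_nic / coalesce_us) * coalesce_us
        delays.append(seen - sent)         # only meaningful with synchronised clocks
    return delays


if __name__ == "__main__":
    for k, d in enumerate(one_way_delays(12)):
        print(f"packet {k:2d}: 1-way delay {d:.0f} us")
```

In this model the first packet after a tick waits longest and each following packet waits 20 µs less, which gives a sawtooth in delay against packet number. The subtraction of sender and receiver timestamps is only valid because the clocks at both ends are synchronised, as was done with GPS for these tests.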

Slide: 10

Thanks to all who helped, including:

National Research Networks
    Canarie, Dante, DARENET, Netera, PSNC and UKERNA

“ATLAS remote farms”
    J. Beck Hansen, R. Moore, R. Soluk, G. Fairey, T. Bold, A. Waananen, S. Wheeler, C. Bee

“ATLAS online and dataflow software”
    S. Kolos, S. Gadomski, A. Negri, A. Kazarov, M. Dobson, M. Caprini, P. Conde, C. Haeberli, M. Wiesmann, E. Pasqualucci, A. Radu

Slide: 11

Any Questions?

Slide: 12

Backup Slides

Slide: 13

Remote Computing Concepts

[Diagram labels: ATLAS Detectors – Level 1 Trigger; ROB (×4); Data Collection Network; L2PU (×4); Level 2 Trigger; SFI (×3); Event Builders; Back End Network; PF, Local Event Processing Farms; SFOs; Mass storage; Experimental Area; CERN B513; Switch; GÉANT; lightpaths; PF, Remote Event Processing Farms; Copenhagen, Edmonton, Krakow, Manchester]

Slide: 14

ATLAS Remote Farms – Network Connectivity

Slide: 15

ATLAS Application Protocol

Event Request
    EFD requests an event from SFI
    SFI replies with the event, ~2 Mbytes

Processing of event

Return of computation
    EF asks SFO for buffer space
    SFO sends OK
    EF transfers results of the computation

tcpmon, an instrumented TCP request-response program, emulates the Event Filter EFD to SFI communication.

[Diagram: message sequence against time between the Event Filter Daemon (EFD) and the SFI and SFO: Request event, Send event data, Process event, Request Buffer, Send OK, Send processed event. The request-response time is recorded as a histogram.]
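The request-response exchange that tcpmon emulates can be sketched as below. This is not tcpmon's code or wire format. The 4-byte length request, the function names and the use of the loopback interface are assumptions chosen so the example runs on a single machine; the ~2 Mbyte response size follows the slide.

```python
import socket
import struct
import threading
import time


def recv_exact(sock, n):
    """Read exactly n bytes from a TCP socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def responder(listener):
    """Stand-in for the SFI: answer each 4-byte size request with that many bytes."""
    conn, _ = listener.accept()
    with conn:
        try:
            while True:
                (size,) = struct.unpack("!I", recv_exact(conn, 4))
                conn.sendall(b"\0" * size)
        except ConnectionError:
            pass  # the requester has finished


def request_response_times(n_requests=5, response_bytes=2_000_000):
    """Stand-in for the EFD: time each request-response exchange, in seconds."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: let the OS choose
    threading.Thread(target=responder, args=(listener,), daemon=True).start()
    times = []
    with socket.create_connection(listener.getsockname()) as sock:
        for _ in range(n_requests):
            start = time.perf_counter()
            sock.sendall(struct.pack("!I", response_bytes))  # "Request event"
            recv_exact(sock, response_bytes)                 # "Send event data"
            times.append(time.perf_counter() - start)
    listener.close()
    return times


if __name__ == "__main__":
    for i, t in enumerate(request_response_times()):
        print(f"request {i}: {t * 1e3:.2f} ms")
```

Binning the returned times gives the request-response time histogram. Over a long-distance path the first few exchanges would be expected to take longer than later ones while TCP slow start opens the congestion window.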

Slide: 16

End Hosts & NICs CERN-Manc.

Use UDP packets to characterise Host & NIC
    Request-response latency
    Throughput
    Packet loss
    Re-order

SuperMicro P4DP8 motherboard
Dual Xeon 2.2 GHz CPU
400 MHz system bus
66 MHz 64 bit PCI bus

[Plots, pcatb89-gig6_18Jul04: received wire rate (Mbit/s, 0-1000), % packet loss (0-100) and number re-ordered (0-120), each against spacing between frames (0-40 µs), for packet sizes of 50, 100, 200, 400, 600, 800, 1000, 1200, 1400 and 1472 bytes.]

[Plots, pcatb89-gig6: latency histograms N(t) against latency (20900-21500 µs) for 64, 512 and 1400 byte packets.]
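Packet loss and re-ordering for a UDP stream can be derived from the sequence numbers seen at the receiver. The sketch below counts a packet as re-ordered when it arrives after a packet with a higher sequence number. That definition and the function name are assumptions, and udpmon's own accounting may differ.

```python
def loss_and_reorder(received_seq, n_sent):
    """Return (lost, reordered) for a stream whose packets are numbered 0..n_sent-1."""
    lost = n_sent - len(set(received_seq))  # sequence numbers that never arrived
    reordered = 0
    highest = -1
    for seq in received_seq:
        if seq < highest:
            reordered += 1   # arrived after a later-numbered packet
        else:
            highest = seq
    return lost, reordered


if __name__ == "__main__":
    # Illustrative arrival order only: packet 4 is lost and packet 2 arrives late.
    lost, reordered = loss_and_reorder([0, 1, 3, 2, 5], n_sent=6)
    print(f"lost {lost} of 6 ({100 * lost / 6:.1f}%), re-ordered {reordered}")
```

Running the same count for each packet size and frame spacing yields the points of the loss and re-order plots.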

Slide: 17

TCP (Reno) – Details

Time for TCP to recover its throughput from 1 lost packet is given by:

$$ T_{\mathrm{recover}} = \frac{C \cdot RTT^{2}}{2 \cdot MSS} $$

where C is the link capacity, RTT the round-trip time and MSS the maximum segment size.

for rtt of ~200 ms: 2 min

[Plot: time to recover (s, log scale from 0.0001 to 100000) against rtt (0-200 ms) for 10 Mbit, 100 Mbit, 1 Gbit, 2.5 Gbit and 10 Gbit links. Marked rtts: UK 6 ms, Europe 20 ms, USA 150 ms.]
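The recovery-time expression C * RTT^2 / (2 * MSS) can be evaluated for the link speeds and round-trip times named on the slide, as sketched below. The MSS of 1500 bytes is an assumption (the slide text does not give a value), and the function name is made up for illustration. C is taken in bit/s, so MSS is converted to bits.

```python
def reno_recovery_time_s(capacity_bit_s, rtt_s, mss_bytes=1500):
    """Time for TCP Reno to regain full rate after one loss: C * RTT^2 / (2 * MSS)."""
    mss_bits = mss_bytes * 8  # assumed 1500 byte segments unless overridden
    return capacity_bit_s * rtt_s ** 2 / (2 * mss_bits)


# Link speeds and rtts as labelled on the slide's plot.
LINKS = {"10Mbit": 10e6, "100Mbit": 100e6, "1Gbit": 1e9, "2.5Gbit": 2.5e9, "10Gbit": 10e9}
RTTS_MS = {"UK": 6, "Europe": 20, "USA": 150}

if __name__ == "__main__":
    for place, rtt_ms in RTTS_MS.items():
        for name, rate in LINKS.items():
            t = reno_recovery_time_s(rate, rtt_ms / 1000)
            print(f"{place:6s} rtt {rtt_ms:3d} ms, {name:>7s}: {t:12.3f} s")
```

The expression follows from Reno halving its congestion window on a loss and then growing it by one segment per round trip. The window at full rate is C * RTT / MSS segments, so regaining the lost half takes C * RTT / (2 * MSS) round trips of RTT each. Recovery time therefore grows linearly with link speed and quadratically with rtt.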