Slide: 1 Richard Hughes-Jones
T2UK, October 06, R. Hughes-Jones, Manchester
Update on Remote Real-Time Computing Farms for ATLAS Trigger DAQ
Richard Hughes-Jones, University of Manchester
In collaboration with:
Bryan Caron, University of Alberta
Krzysztof Korcyl, IFJ PAN Krakow
Catalin Meirosu, TERENA
Jakob Langgard Nielsen, Niels Bohr Institute
Slide: 2
Tests from Krakow
Configure/Test/Operate the Krakow–CERN ANT GPS box: learned many details! Successful GPS and ANT operation using the GUI at CERN; histograms still to be done.
Synchronized the Manchester–Krakow clocks with GPS.
Configured the ANT test Krakow–Manchester, BUT the Manchester Alteon card is 66 MHz and the Manchester Netgear card 0.5 Mbyte; exchange at CERN next week.
Online help from Catalin.
udpmon tests: Krakow–CERN, CERN–Krakow, Manc–CERN.
Slide: 3
GÉANT2 Topology
Slide: 4
GÉANT2: The Convergence Solution
[Diagram: NREN access connects existing IP routers at GÉANT2 PoP A and PoP B over managed lambdas (1626 LM), with an L2 matrix and a TDM matrix (1678 MCC) over dark fiber; EXPReS PCs attach at 10 GE.]
Slide: 5
Pionier Weather Map
Slide: 6
Test Traffic on Pionier
Slide: 7
CERN ↔ Krakow: Throughput (CERN → Krakow and Krakow → CERN)
[Figure: atb79-zeus15_28Sep06 — received wire rate (Mbit/s), % packet loss and packets re-ordered vs spacing between frames (µs), for frame sizes from 50 to 1472 bytes.]
~800 Mbit/s; 20% loss in the network.
[Figure: zeus15-atb79_29Sep06 — the same three plots for the reverse direction.]
600–800 Mbit/s; 40% loss in the network; ~3× more re-ordering.
Slide: 8
CERN ↔ Krakow: Latency (CERN → Krakow and Krakow → CERN)
One direction: FWHM ~10 µs, peak separation 90–120 µs. The other: FWHM ~50 µs, peak separation ~135 µs.
[Figure: latency histograms N(t) vs latency (µs, ~37000–38800 µs) for 256, 512, 1024 and 1400-byte packets, datasets zeus15-atb79_29Sep06 and atb79-zeus15_28Sep06.]
Slide: 9
CERN ↔ Krakow: 1-way delay (CERN → Krakow and Krakow → CERN)
20 µs packet spacing in both directions; interrupt coalescence.
[Figure: W20 plots of 1-way delay (µs, ~18800–19800) vs packet number, over 500-packet and 5000-packet views, for zeus15-atb79_29Sep06 and atb79-zeus15_28Sep06.]
Slide: 10
Thanks to all who helped, including:
National Research Networks: Canarie, Dante, DARENET, Netera, PSNC and UKERNA
“ATLAS remote farms”: J. Beck Hansen, R. Moore, R. Soluk, G. Fairey, T. Bold, A. Waananen, S. Wheeler, C. Bee
“ATLAS online and dataflow software”: S. Kolos, S. Gadomski, A. Negri, A. Kazarov, M. Dobson, M. Caprini, P. Conde, C. Haeberli, M. Wiesmann, E. Pasqualucci, A. Radu
Slide: 11
Any Questions?
Slide: 12
Backup Slides
Slide: 13
Remote Computing Concepts
[Diagram: the ATLAS detectors and Level 1 Trigger feed ROBs; L2PUs (Level 2 Trigger) and SFIs (Event Builders) sit on the Data Collection Network; a Back End Network links the SFIs to local event processing farms (PFs) and, via a switch and lightpaths over GÉANT, to remote event processing farms in Copenhagen, Edmonton, Krakow and Manchester; SFOs write to mass storage. Spans the experimental area and CERN B513.]
Slide: 14
ATLAS Remote Farms – Network Connectivity
Slide: 15
ATLAS Application Protocol
Event request: the EFD requests an event from the SFI; the SFI replies with the event (~2 Mbytes).
Processing of the event.
Return of computation: the EF asks the SFO for buffer space; the SFO sends OK; the EF transfers the results of the computation.
tcpmon: an instrumented TCP request-response program that emulates the Event Filter EFD-to-SFI communication.
[Diagram: message sequence over time between the Event Filter Daemon (EFD) and the SFI/SFO — Request event, Send event data, Process event, Request buffer, Send OK, Send processed event — with the request-response time histogrammed.]
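The request-response pattern above can be sketched in a few lines. This is only an illustrative emulation, not the real tcpmon (which is an instrumented C program); the function names, the localhost setup and the fixed 2 Mbyte event are assumptions for the sketch.

```python
# Sketch of a tcpmon-style request-response measurement (hypothetical code):
# the "EFD" client requests events over TCP, the "SFI" server replies with a
# ~2 Mbyte event, and the client histograms the request-response times.
import socket
import statistics
import threading
import time

EVENT_SIZE = 2 * 1024 * 1024  # ~2 Mbyte event, as on the slide


def sfi_server(listen_sock):
    """Emulate the SFI: for each 1-byte request, send one event back."""
    conn, _ = listen_sock.accept()
    event = b"\x00" * EVENT_SIZE
    while True:
        req = conn.recv(1)
        if not req:          # client closed the connection
            break
        conn.sendall(event)
    conn.close()


def efd_client(host, port, n_requests=10):
    """Emulate the EFD: request events, record each request-response time."""
    s = socket.create_connection((host, port))
    times = []
    for _ in range(n_requests):
        t0 = time.perf_counter()
        s.sendall(b"R")                  # request an event
        got = 0
        while got < EVENT_SIZE:          # read the full ~2 Mbyte reply
            got += len(s.recv(65536))
        times.append(time.perf_counter() - t0)
    s.close()
    return times


if __name__ == "__main__":
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    port = srv.getsockname()[1]
    threading.Thread(target=sfi_server, args=(srv,), daemon=True).start()
    rtts = efd_client("127.0.0.1", port)
    print(f"mean request-response time: {statistics.mean(rtts) * 1e3:.2f} ms")
```

Over a real WAN the interesting quantity is the shape of the request-response time histogram, not just the mean, since queueing and loss show up as secondary peaks.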
Slide: 16
End Hosts & NICs: CERN–Manc.
Use UDP packets to characterise the host & NIC: request-response latency, throughput, packet loss, re-ordering.
SuperMicro P4DP8 motherboard, dual Xeon 2.2 GHz CPUs, 400 MHz system bus, 66 MHz 64-bit PCI bus.
[Figure: pcatb89-gig6_18Jul04 — received wire rate (Mbit/s), % packet loss and number re-ordered vs spacing between frames (µs) for frames of 50 to 1472 bytes; latency histograms N(t) (~20900–21500 µs) for 64, 512 and 1400-byte packets.]
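The udpmon method above (send UDP frames at a fixed spacing, then count loss and re-ordering at the receiver from sequence numbers) can be sketched as follows. This is a minimal illustration, not the actual udpmon tool; the function names and frame format are assumptions.

```python
# Hypothetical udpmon-style probe: the sender transmits sequence-numbered UDP
# frames with a chosen inter-frame spacing; the receiver derives received,
# lost and re-ordered counts from the sequence numbers.
import socket
import struct
import time


def send_stream(sock, addr, n_frames, payload_len, spacing_us):
    """Send n_frames UDP packets of payload_len bytes, spacing_us apart."""
    pad = b"\x00" * (payload_len - 4)          # 4 bytes hold the sequence no.
    gap = spacing_us / 1e6
    for seq in range(n_frames):
        sock.sendto(struct.pack("!I", seq) + pad, addr)
        t_end = time.perf_counter() + gap      # busy-wait for precise spacing
        while time.perf_counter() < t_end:
            pass


def receive_stream(sock, n_frames):
    """Count received, lost and re-ordered frames from sequence numbers."""
    sock.settimeout(1.0)
    received, reordered, last = 0, 0, -1
    try:
        while received < n_frames:
            data, _ = sock.recvfrom(2048)
            seq = struct.unpack("!I", data[:4])[0]
            if seq < last:                     # arrived after a later frame
                reordered += 1
            last = max(last, seq)
            received += 1
    except socket.timeout:
        pass                                   # remaining frames were lost
    return received, n_frames - received, reordered
```

Sweeping the frame size and spacing, as in the plots, then maps out the achievable wire rate and the loss/re-ordering behaviour of the host, NIC and path.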
Slide: 17
TCP (Reno) – Details
The time for TCP to recover its throughput after 1 lost packet is given by:
    time to recover = C × RTT² / (2 × MSS)
For an rtt of ~200 ms: ~2 min.
[Figure: time to recover (s, log scale) vs rtt (0–200 ms) for 10 Mbit, 100 Mbit, 1 Gbit, 2.5 Gbit and 10 Gbit links; typical rtts marked: UK 6 ms, Europe 20 ms, USA 150 ms.]