ATLAS Networking & T2UK

T2UK RAL 15 Mar 2006, R. Hughes-Jones Manchester


Page 1: ATLAS Networking & T2UK


ATLAS Networking & T2UK

Richard Hughes-Jones The University of Manchester

www.hep.man.ac.uk/~rich/ then “Talks”

Page 2: ATLAS Networking & T2UK


Remote Computing Farms

- Discussion at CERN to establish a work-plan for 2006
- Valuable for Monitoring and Calibration
- MOU: Alberta, CERN, Krakow, Manchester
- New Network Topology with all links carried by GÉANT and NRNs

Planned Investigations
- Characterise the new network links and end host performance
  - Tools: iperf, udpmon, thrulay, yatm
- Measure the ATLAS request-response behaviour
  - Tools: tcpmon, web100, tcpdump
- Set up the WAN emulator with the measured conditions
  - Compare network and ATLAS traffic observations
- Install and test ATLAS application gateway (as used at the pit)
- Test deployment of Online TDAQ HLT releases
- Measure performance of Online TDAQ HLT releases
- Consider how to link Real-Time T/DAQ to remote Grid farms

First draft of Work Plan document circulated

Page 3: ATLAS Networking & T2UK


Network Operation & Performance

Analysis of Fault Tolerance in ATLAS T/DAQ Networks
- Document the action of the switches
- Fate of the packets
- Effect on T/DAQ applications
- Networks considered:
  - Front End (DataFlow) Network
  - BackEnd Network
  - Controls Network (Run control, services, some monitoring)
- Consider questions like: “Failure of a link between the ROS and the ROS Concentrator Switch”
- Draft document being discussed

Performance tests discussed
- The PCI-e 4 × 1GE PEG4 NIC (Silicom)
  - Simple and trunking throughput
- ROS SuperMicro motherboard
  - 6 PCI, one 4-lane PCI-e, one 3.4 GHz Xeon (dual socket)

Page 4: ATLAS Networking & T2UK


Network Monitoring in ATLAS T/DAQ

Levels of Monitoring
- SNMP statistics: MRTG, RRD, YATM (higher sample rate)
  - Traffic patterns, bytes, packets; NOT dropped packets
- Network test programs: udpmon, iperf
  - Throughput, loss, 1-way delay, rtt
- Standalone ATLAS test programs speaking the TDAQ application protocol (Richard)
- ATLAS test programs speaking the TDAQ application protocol using TDAQ APIs (Stefan)
- Monitoring by the TDAQ application itself

Integration of Message Passing Libraries
- DataFlow (Reiner) and EF (Mario): main difference in substantiation of buffers
- Integrate over common thin shim over the socket calls
- Idea to put monitoring into (common) message passing layer
  - What can be observed?
  - Question of keeping state: the application would be the best place!
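The idea of a thin shim over the socket calls that also carries monitoring can be illustrated with a minimal Python sketch. It is not the TDAQ message passing code: the class names `MonitoredSocket` and `SocketStats` and the choice of counters are assumptions made for illustration. The shim only counts calls and bytes; any state about requests and responses would still have to live in the application.

```python
import socket
import time
from dataclasses import dataclass, field


@dataclass
class SocketStats:
    """Counters kept by the shim (hypothetical names, for illustration)."""
    send_calls: int = 0
    recv_calls: int = 0
    bytes_sent: int = 0
    bytes_received: int = 0
    started: float = field(default_factory=time.monotonic)

    def send_rate_mbit(self, now=None):
        """Average send rate in Mbit/s since the shim was created."""
        elapsed = (time.monotonic() if now is None else now) - self.started
        return 0.0 if elapsed <= 0 else self.bytes_sent * 8 / elapsed / 1e6


class MonitoredSocket:
    """Thin wrapper that forwards to a real socket and updates counters."""

    def __init__(self, sock):
        self._sock = sock
        self.stats = SocketStats()

    def sendall(self, data):
        self._sock.sendall(data)
        self.stats.send_calls += 1
        self.stats.bytes_sent += len(data)

    def recv(self, bufsize):
        data = self._sock.recv(bufsize)
        self.stats.recv_calls += 1
        self.stats.bytes_received += len(data)
        return data

    def close(self):
        self._sock.close()


def demo():
    """Send 1000 bytes across a local socket pair and return both stats."""
    a, b = socket.socketpair()
    left, right = MonitoredSocket(a), MonitoredSocket(b)
    left.sendall(b"x" * 1000)
    received = b""
    while len(received) < 1000:
        received += right.recv(4096)
    left.close()
    right.close()
    return left.stats, right.stats
```

Because both message passing libraries would call the same wrapper, the counters come for free on every connection, which is the attraction of putting monitoring in the common layer.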

Page 5: ATLAS Networking & T2UK


Related Work: RAID, ATLAS Grid

RAID0 and RAID5 tests
- 4th Year MPhys project last semester
- Throughput and CPU load
- Different RAID parameters
  - Number of disks
  - Stripe size
  - User read / write size
- Different file systems
  - ext2, ext3, XFS
- Sequential File Write, Read
- Sequential File Write, Read with continuous background read or write

Status
- Need to check some results & document
- Independent RAID controller tests planned
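To see why the number of disks, the stripe size and the user read / write size interact, here is a minimal Python sketch of the textbook RAID0 layout. It assumes simple round-robin striping and does not model the controller used in the tests; the function names are hypothetical.

```python
def raid0_locate(offset, stripe_size, n_disks):
    """Map a logical byte offset to (disk index, byte offset on that disk).

    Assumes plain round-robin striping: stripe 0 on disk 0, stripe 1 on
    disk 1, and so on, wrapping after n_disks stripes.
    """
    if offset < 0 or stripe_size <= 0 or n_disks <= 0:
        raise ValueError("offset must be >= 0; sizes and counts must be > 0")
    stripe_index, within_stripe = divmod(offset, stripe_size)
    row, disk = divmod(stripe_index, n_disks)
    return disk, row * stripe_size + within_stripe


def disks_touched(offset, length, stripe_size, n_disks):
    """Return the sorted list of disks that one user request touches."""
    if length <= 0:
        raise ValueError("length must be > 0")
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    return sorted({stripe % n_disks for stripe in range(first, last + 1)})


STRIPE = 64 * 1024  # illustrative stripe size, not a value from the tests

# A 4 KiB request stays on one disk; a 256 KiB request spans all four.
small = disks_touched(0, 4 * 1024, STRIPE, n_disks=4)
large = disks_touched(0, 256 * 1024, STRIPE, n_disks=4)
```

Under this layout a request smaller than the stripe size is served by a single disk, while a request of at least `stripe_size * n_disks` bytes can keep every disk busy, which is one reason throughput depends on all three parameters together.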

Page 6: ATLAS Networking & T2UK


ESLEA: ATLAS Grid on UKLight

- Demonstration of benefits of dedicated links
- 1 Gbit Lightpath Lancaster-Manchester
- Disk 2 Disk Transfers
- Storage Element with SRM using distributed disk pools: dCache & xrootd

Page 7: ATLAS Networking & T2UK


Check out the end host: bbftp

- What is the end-host doing with your application protocol?
- Transatlantic bbftp over TCP/IP
- Look at the PCI-X buses
- 3Ware 9000 controller RAID0
- 1 Gbit Ethernet link
- 2.4 GHz dual Xeon
- ~660 Mbit/s

[Figure: PCI-X bus traces. Panel 1: PCI-X bus with RAID Controller, read from disk for 44 ms every 100 ms. Panel 2: PCI-X bus with Ethernet NIC, write to network for 72 ms.]
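The bus timings quoted above imply burst rates well above the average. The short Python sketch below works through that arithmetic under two assumptions that the slide does not state explicitly: that ~660 Mbit/s is the average end-to-end rate, and that the 72 ms network write occurs once per 100 ms cycle, like the disk read.

```python
def burst_rate_mbit(average_mbit_s, active_ms, period_ms):
    """Rate needed while the bus is active to sustain the given average."""
    if not 0 < active_ms <= period_ms:
        raise ValueError("active time must be positive and fit in the period")
    return average_mbit_s * period_ms / active_ms


AVERAGE_MBIT_S = 660.0   # "~660 Mbit/s" from the slide, taken as the average
PERIOD_MS = 100.0        # assumed cycle length for both buses

# Data moved per cycle: 660 Mbit/s * 0.1 s = 66 Mbit, about 8.25 MB.
mbyte_per_cycle = AVERAGE_MBIT_S * (PERIOD_MS / 1000.0) / 8

disk_burst = burst_rate_mbit(AVERAGE_MBIT_S, 44.0, PERIOD_MS)     # 1500 Mbit/s
network_burst = burst_rate_mbit(AVERAGE_MBIT_S, 72.0, PERIOD_MS)  # ~917 Mbit/s
```

On these assumptions the disk side bursts at about 1.5 Gbit/s for less than half of each cycle, while the network side runs near the 1 Gbit Ethernet line rate for most of it.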

Page 8: ATLAS Networking & T2UK


Any Questions?

Page 9: ATLAS Networking & T2UK


Backup Slides

Page 10: ATLAS Networking & T2UK


TCP Stacks & CPU Load

- Real user problem! End host TCP flow at 960 Mbit/s with rtt 1 ms falls to 770 Mbit/s when rtt 15 ms

1.2 GHz PIII, rtt 1 ms (plot label: mk5-606-g7_10Dec05)
- TCP iperf 980 Mbit/s
- Kernel mode 95%, idle 1.3%
- CPU load with nice priority: throughput falls as priority increases
- No loss, no timeouts
- Not enough CPU power

2.8 GHz Xeon, rtt 1 ms (plot label: mk5-606-g7_17Jan05)
- TCP iperf 916 Mbit/s
- Kernel mode 43%, idle 55%
- CPU load with nice priority: throughput constant as priority increases
- No loss, no timeouts
- Kernel mode includes TCP stack and Ethernet driver

[Figures, one pair per host: % CPU mode on send (kernel, user, nice, idle; 0 to 100%) and throughput (Mbit/s; 0 to 1000), each plotted against nice value 0 to 20 (large value = low priority), with a "no CPU load" reference point.]
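A change of rtt from 1 ms to 15 ms changes how much data TCP must keep in flight. The Python sketch below computes the bandwidth-delay product for the rates quoted on the slide. It is only standard arithmetic: the slide does not give the socket buffer sizes that were used, so the sketch makes no claim about why the measured rate fell.

```python
def bdp_bytes(rate_mbit_s, rtt_ms):
    """Bandwidth-delay product in bytes.

    1 Mbit/s sustained for 1 ms is 1000 bits, i.e. 125 bytes.
    """
    return rate_mbit_s * rtt_ms * 125


def window_limited_mbit(window_bytes, rtt_ms):
    """Upper bound on throughput when at most one window is sent per rtt."""
    return window_bytes * 8 / (rtt_ms * 1000)


# Data in flight needed to sustain 960 Mbit/s at each rtt from the slide.
in_flight_1ms = bdp_bytes(960, 1)     # 120000 bytes
in_flight_15ms = bdp_bytes(960, 15)   # 1800000 bytes

# A window sized for the 1 ms path would cap a 15 ms path at 64 Mbit/s.
capped = window_limited_mbit(in_flight_1ms, 15)
```

Sustaining the same rate at 15 ms therefore needs 15 times more buffered data per connection, which adds memory and processing work on the end host.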

Page 11: ATLAS Networking & T2UK


A Few Items for Discussion

- Achievable Throughput
- Sharing link Capacity (OK what is sharing?)
- Convergence time
- Responsiveness
- rtt fairness (OK what is fairness?)
- mtu fairness
- TCP friendliness
- Link utilisation (by this flow or all flows)
- Stability of Achievable Throughput
- Burst behaviour
- Packet loss behaviour
- Packet re-ordering behaviour
- Topology: maybe some “simple” setups
- Background or cross traffic: how realistic is needed? What protocol mix?
- Reverse traffic
- Impact on the end host: CPU load, bus utilisation, Offload
- Methodology: simulation, emulation and Real links ALL help
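The slide leaves "what is fairness?" open. One widely used measure, offered here only as a possible starting point and not as the definition intended in the talk, is Jain's fairness index: the square of the summed throughputs divided by n times the sum of their squares. A minimal Python sketch, with made-up flow rates:

```python
def jain_fairness(throughputs):
    """Jain's fairness index: 1.0 when all flows are equal, 1/n at worst."""
    flows = list(throughputs)
    if not flows or any(x < 0 for x in flows):
        raise ValueError("need at least one non-negative throughput")
    sum_sq = sum(x * x for x in flows)
    if sum_sq == 0:
        raise ValueError("at least one flow must be non-zero")
    total = sum(flows)
    return total * total / (len(flows) * sum_sq)


# Hypothetical per-flow rates in Mbit/s, not measurements from the talk.
equal_share = jain_fairness([500, 500])        # 1.0
one_flow_wins = jain_fairness([1, 0, 0, 0])    # 0.25
```

An index like this summarises a set of flows in one number, but it says nothing about convergence time or stability, which the list above treats as separate items.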

Page 12: ATLAS Networking & T2UK


More Information: Some URLs 1

- UKLight web site: http://www.uklight.ac.uk
- MB-NG project web site: http://www.mb-ng.net/
- DataTAG project web site: http://www.datatag.org/
- UDPmon / TCPmon kit + writeup: http://www.hep.man.ac.uk/~rich/net
- Motherboard and NIC Tests:
  http://www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt
  & http://datatag.web.cern.ch/datatag/pfldnet2003/
  “Performance of 1 and 10 Gigabit Ethernet Cards with Server Quality Motherboards”, FGCS Special issue 2004
  http://www.hep.man.ac.uk/~rich/
- TCP tuning information may be found at:
  http://www.ncne.nlanr.net/documentation/faq/performance.html
  & http://www.psc.edu/networking/perf_tune.html
- TCP stack comparisons: “Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks”, Journal of Grid Computing 2004
- PFLDnet: http://www.ens-lyon.fr/LIP/RESO/pfldnet2005/
- Dante PERT: http://www.geant2.net/server/show/nav.00d00h002