network performance lessons from the coal face - networkshop44

Network performance: lessons from the coal face Chris Walker 20/03/16 Networkshop March 2016


Page 1: Network performance lessons from the coal face - Networkshop44

Networkshop March 2016

Network performance: lessons from the coal face

Chris Walker

20/03/16

Page 2: Network performance lessons from the coal face - Networkshop44

Achieving high performance

● What can be achieved?
● How we achieved it
– Architecture
– Network tuning
– TCP tuning
– Parallel transfers
– Multiple streams
● Monitoring
● Bottlenecks
– Found and fixed
● Conclusions

Page 3: Network performance lessons from the coal face - Networkshop44

Motivation (LHC @CERN)

● Collisions every 25 ns
– 100 PB/year
● QMUL
– Small fraction

Page 4: Network performance lessons from the coal face - Networkshop44

Network

● 1 Terabyte can be transferred in:
– 100 Mbps network: 30 hrs
– 1 Gbps network: 3 hrs
– 10 Gbps network: 20 minutes
● Takes work to achieve this in practice
– TCP tuning
– Find and eliminate bottlenecks
– Reduce packet loss
● fasterdata.es.net
– Excellent source of information
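As a rough cross-check of the transfer times quoted for 1 Terabyte, the sketch below computes the time at full line rate. It assumes a decimal terabyte (10^12 bytes) and no protocol overhead, neither of which the slide states. Under those assumptions the results are about 22.2 hours, 2.2 hours and 13.3 minutes, so the slide's rounded figures leave some margin for real-world overhead. The `efficiency` parameter is a hypothetical knob for modelling that margin.

```python
TERABYTE = 10**12  # decimal terabyte; an assumption about the slide's units


def transfer_time_seconds(size_bytes, link_bits_per_second, efficiency=1.0):
    """Seconds needed to move size_bytes over a link.

    efficiency is the fraction of line rate actually achieved (hypothetical
    parameter, 1.0 means a perfectly filled link with no overhead).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_bits_per_second * efficiency)


if __name__ == "__main__":
    for label, rate in [("100 Mbps", 100e6), ("1 Gbps", 1e9), ("10 Gbps", 10e9)]:
        hours = transfer_time_seconds(TERABYTE, rate) / 3600
        print(f"{label}: {hours:.2f} hours at full line rate")
```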

Page 5: Network performance lessons from the coal face - Networkshop44

LAN topology

● 2 × data transfer nodes (SE) connected at 10 Gbit/s
– Optimised for WAN transfers
– Fast (Lustre) filesystem
● Network:
– 2 × 20 Gbit/s WAN links (HEP uses one)
– Previously: HEP had 1 Gbit/s dedicated + 80% of resilient link

Page 6: Network performance lessons from the coal face - Networkshop44

WAN performance

● April 2012: 1 Gbit/s dedicated link (saturated)
– Source-based routing (+ 80% of resilient link)
● Sept 2013: 2 × 10 Gbit/s links
– 1 × 10 Gbit/s used by High Energy Physics (HEP)

[Figure: WAN traffic plots, panels labelled April 2012 and Feb 2013]

Page 7: Network performance lessons from the coal face - Networkshop44

WLCG World sites

Page 8: Network performance lessons from the coal face - Networkshop44

Data Transferred

● QMUL (2012)
– 2.6 PB downloaded (3.9 million files)
– 1.4 PB uploaded
– 870 MB/s peak rate
– 380 MB/s average on busy days
● ATLAS
– 1 PB in 1 week (October 2012?) worldwide!!!

Page 9: Network performance lessons from the coal face - Networkshop44

How TCP works: A very short overview

● Congestion window (CWND) = the number of packets the sender is allowed to send before waiting for an acknowledgement
– The larger the window size, the higher the throughput
– Throughput = Window size / Round-trip time
● TCP slow start
– Exponentially increase the congestion window size until a packet is lost
– This gets a rough estimate of the optimal congestion window size

[Figure: CWND against time. Slow start: exponential increase. Congestion avoidance: linear increase. Packet loss, then timeout. Retransmit: slow start again.]
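The shape of the CWND curve can be reproduced with a toy model. This is a deliberately simplified sketch, not how any real TCP stack is implemented: it counts the window in packets, advances one round trip at a time, and treats every loss as a timeout that restarts slow start. The function name, the `loss_rounds` parameter and the initial threshold are all assumptions made for illustration.

```python
def simulate_cwnd(rounds, loss_rounds, initial_ssthresh=64):
    """Return the congestion window (in packets) at the start of each round trip.

    loss_rounds is the set of round-trip indices in which a loss is detected.
    Toy model: a loss halves the slow-start threshold and resets the window to 1.
    """
    cwnd = 1
    ssthresh = initial_ssthresh
    history = []
    for rtt in range(rounds):
        history.append(cwnd)
        if rtt in loss_rounds:
            # Loss: remember half the current window, then slow start again.
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif cwnd < ssthresh:
            # Slow start: exponential increase, one doubling per round trip.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: linear increase, one packet per round trip.
            cwnd += 1
    return history


if __name__ == "__main__":
    print(simulate_cwnd(8, {5}, initial_ssthresh=8))
    # → [1, 2, 4, 8, 9, 10, 1, 2]
```

Because the window grows once per round trip, the same loss costs far more wall-clock time to recover from on a long path than on a short one.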

Page 10: Network performance lessons from the coal face - Networkshop44

TCP Tuning

● Latency: time to send 1 packet from the source to the destination
● RTT: round-trip time
● Bandwidth × Delay = Bandwidth Delay Product (BDP)
– The number of bytes in flight needed to fill the entire path
● Example: 10 Gbps path; ping shows a 90 ms RTT (QMUL → BNL)
– BDP = 10 Gbps × 0.090 s = 0.9 Gbits (112 MBytes)
● QMUL → Taiwan: 273 ms RTT (on a 10 Gbps path)
– BDP = 10 Gbps × 0.273 s = 2.73 Gbits (about 340 MBytes)
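The BDP arithmetic is easy to script when sizing buffers for a new path. The sketch below reproduces the two examples and also inverts the relation Throughput = Window size / Round-trip time to show what a small window costs: a 64 KiB window over a 90 ms path is limited to under 6 Mbit/s however fast the link is. The function names are illustrative, and the 64 KiB figure is an assumed example value, not one from the slides.

```python
def bdp_bytes(bandwidth_bps, rtt_seconds):
    """Bandwidth delay product: bytes in flight needed to fill the path."""
    return bandwidth_bps * rtt_seconds / 8


def window_limited_throughput_bps(window_bytes, rtt_seconds):
    """Best-case throughput when the window, not the link, is the limit."""
    return window_bytes * 8 / rtt_seconds


if __name__ == "__main__":
    for name, rtt in [("QMUL -> BNL", 0.090), ("QMUL -> Taiwan", 0.273)]:
        print(f"{name}: BDP = {bdp_bytes(10e9, rtt) / 1e6:.1f} MBytes")
    small = window_limited_throughput_bps(64 * 1024, 0.090)
    print(f"64 KiB window at 90 ms: {small / 1e6:.1f} Mbit/s")
```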

Page 11: Network performance lessons from the coal face - Networkshop44

Effect of packet loss with distance

● From fasterdata.es.net
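One widely used way to reason about why loss hurts more with distance is the Mathis et al. approximation, which bounds steady-state TCP throughput by (MSS / RTT) × C / sqrt(p) for loss rate p. The slide does not name this model, so treat its use here as an assumption; the constant C = sqrt(3/2) is the value for the simplest case in that paper. The sketch shows that, for the same loss rate, throughput falls in proportion to RTT.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_seconds, loss_rate):
    """Approximate upper bound on single-stream TCP throughput under loss."""
    if not 0 < loss_rate < 1:
        raise ValueError("loss_rate must be between 0 and 1 (exclusive)")
    return (mss_bytes * 8 / rtt_seconds) * math.sqrt(1.5) / math.sqrt(loss_rate)


if __name__ == "__main__":
    # Same loss rate (0.01%), increasing round-trip time.
    for rtt_ms in (1, 10, 90, 273):
        rate = mathis_throughput_bps(1460, rtt_ms / 1000, 1e-4)
        print(f"RTT {rtt_ms:>3} ms: about {rate / 1e6:.1f} Mbit/s")
```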

Page 12: Network performance lessons from the coal face - Networkshop44

Multiple streams

● Parallel streams can help
– Potentially unfair on other users
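A back-of-envelope model shows why parallel streams help when each stream is limited by its window. It is an idealised sketch: each stream is assumed to get window / RTT, with no packet loss and no competing traffic, and the total is capped at the link rate. The function name and example values are assumptions. The same arithmetic also shows the fairness problem, since N streams claim roughly N shares of a congested link.

```python
def aggregate_throughput_bps(streams, window_bytes, rtt_seconds, link_bps):
    """Idealised total throughput of several window-limited TCP streams."""
    per_stream = window_bytes * 8 / rtt_seconds
    return min(streams * per_stream, link_bps)


if __name__ == "__main__":
    window = 4 * 2**20  # 4 MiB per stream, an assumed example value
    for n in (1, 4, 16, 64):
        total = aggregate_throughput_bps(n, window, 0.090, 10e9)
        print(f"{n:>2} streams: {total / 1e9:.2f} Gbit/s")
```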

Page 13: Network performance lessons from the coal face - Networkshop44

TCP lessons

● Increase TCP buffers for distant transfers
– fasterdata.es.net has excellent recommendations
● Packet loss needs eliminating
● Application
– Large buffers (not scp)
– Multiple streams
– GridFTP has these
● Aspera uses UDP (and GridFTP can)
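At the application level, "large buffers" usually means asking the kernel for bigger socket buffers. The sketch below uses the standard `SO_SNDBUF` and `SO_RCVBUF` socket options; the helper names are illustrative. Two caveats: the kernel may cap the request at its configured maximum (and Linux reports back a doubled value), so the granted size should be read back rather than assumed, and on Linux setting the receive buffer explicitly turns off receive-buffer autotuning for that socket.

```python
import socket


def make_tuned_socket(buffer_bytes):
    """Create a TCP socket and request send/receive buffers of buffer_bytes."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    return sock


def granted_buffers(sock):
    """Return the (send, receive) buffer sizes the kernel actually granted."""
    return (
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
        sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF),
    )


if __name__ == "__main__":
    s = make_tuned_socket(16 * 2**20)  # ask for 16 MiB, an assumed example value
    print("granted (send, receive):", granted_buffers(s))
    s.close()
```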

Page 14: Network performance lessons from the coal face - Networkshop44

Bottlenecks found

● Gbit connected at 100 Mbit
– GridFTP node
– Dept
– College
– Iperf tests with another Uni
● 1 min CPU limit
● 2 × 1 Gbit hashing
– Can also cause packet loss

[Figure: proportion of traffic (0 to 0.9) on each network card (1 to 4); series labelled Layer 3+4, Layer 2, HP switch]
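The hashing bottleneck can be illustrated with a toy model of link aggregation. The two functions below are simplified stand-ins for a Layer 2 policy (hash on MAC addresses) and a Layer 3+4 policy (hash on IP addresses and ports); they are not the exact formulas used by any particular switch or driver. All addresses and ports are made-up example values. When every flow runs between the same pair of MAC addresses, the Layer 2 hash puts all traffic on one link, while the Layer 3+4 hash spreads flows that differ in port number.

```python
import ipaddress
from collections import Counter


def layer2_link(src_mac, dst_mac, n_links):
    """Toy Layer 2 policy: pick a link from the last byte of each MAC."""
    return (src_mac[-1] ^ dst_mac[-1]) % n_links


def layer34_link(src_ip, dst_ip, src_port, dst_port, n_links):
    """Toy Layer 3+4 policy: pick a link from IP addresses and ports."""
    ip_bits = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    return (ip_bits ^ src_port ^ dst_port) % n_links


# Hypothetical transfer: 8 parallel flows between one pair of hosts.
SRC_MAC = bytes.fromhex("020000000001")
DST_MAC = bytes.fromhex("020000000002")
FLOWS = [("192.0.2.10", "198.51.100.20", port, 2811) for port in range(50000, 50008)]

layer2_usage = Counter(layer2_link(SRC_MAC, DST_MAC, 2) for _ in FLOWS)
layer34_usage = Counter(layer34_link(*flow, 2) for flow in FLOWS)

if __name__ == "__main__":
    print("Layer 2 flows per link:  ", dict(layer2_usage))
    print("Layer 3+4 flows per link:", dict(layer34_usage))
```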

Page 15: Network performance lessons from the coal face - Networkshop44

Routing

● Linux Layer 2
– 1 Gig (not 10 Gig)
● 12 × 1G → 1G
● Router flapping
– Route 1 Gbit or advertise routes
● To CERN via the US

Page 16: Network performance lessons from the coal face - Networkshop44

Routing problems with 10Gbit/s upgrade

● 4th September: 10 Gbit/s WAN upgrade
● UK sites: increased rates
● ASGC (Taiwan): decreased rates
– Route not advertised via GEANT

Page 17: Network performance lessons from the coal face - Networkshop44

Firewalls

● ICMP
– Often blocked
– Timeout rather than failure
● IPv6
– Tracepath6 blocked (ICMP blocking)
● Barclays bank blocked
– Deep packet inspection and rewriting of HTTP packets (but not HTTPS)
● scp failing halfway through transfer
● GridFTP slow performance
– 1 MB/s through firewall, 50 MB/s avoiding firewall
● GridFTP control connection forgotten
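If "control connection forgotten" means a stateful firewall dropping an idle control connection while the data channels are busy (an assumption; the slide gives no detail), one common mitigation is TCP keepalive on the long-lived connection. The sketch below uses the standard `SO_KEEPALIVE` option; the finer-grained timers are platform dependent, so each is applied only if the `socket` module exposes it. The helper name and the timer values are illustrative, not recommendations from the talk.

```python
import socket


def enable_keepalive(sock, idle_seconds=60, interval_seconds=30, probes=5):
    """Turn on TCP keepalive so an idle connection still generates traffic."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These per-socket timers are not available on every platform.
    optional = (
        ("TCP_KEEPIDLE", idle_seconds),       # idle time before the first probe
        ("TCP_KEEPINTVL", interval_seconds),  # gap between probes
        ("TCP_KEEPCNT", probes),              # failed probes before giving up
    )
    for name, value in optional:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    print("keepalive on:", bool(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)))
    s.close()
```

The idle time needs to be shorter than the firewall's state timeout for this to help.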

Page 18: Network performance lessons from the coal face - Networkshop44

IPv6

● Routes
– May be different to IPv4
– Geneva → QMUL via New York (fixed)
● Forwarding in software (IPv6) vs ASIC (IPv4)
– Older routers may give poor performance (see perfSONAR talk)
● Preferred over IPv4
– If there is an IPv6 address (AAAA record) in DNS, it will be used by machines that think they are IPv6 connected
● Blocked differently by firewalls
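To see which protocol a client is likely to try first, it helps to look at the order in which the resolver returns addresses. The sketch below wraps the standard `socket.getaddrinfo` call and reports the address family of each result in order; on a host that believes it has IPv6 connectivity, IPv6 entries typically come first for names with AAAA records. The helper names are illustrative, and the addresses in the usage note are documentation-range examples, not real hosts.

```python
import socket


def resolve(host, port=443):
    """Return the resolver's address list for host, in preference order."""
    return socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)


def families_in_order(addrinfo):
    """Map getaddrinfo results to 'IPv4' / 'IPv6' labels, keeping their order."""
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    return [names.get(entry[0], str(entry[0])) for entry in addrinfo]


if __name__ == "__main__":
    # Hand-built example in the shape getaddrinfo returns.
    sample = [
        (socket.AF_INET6, socket.SOCK_STREAM, 6, "", ("2001:db8::1", 443, 0, 0)),
        (socket.AF_INET, socket.SOCK_STREAM, 6, "", ("192.0.2.1", 443)),
    ]
    print(families_in_order(sample))
    # → ['IPv6', 'IPv4']
```

Calling `families_in_order(resolve("some.host.example"))` on the transfer node itself shows what that machine will attempt, which matters when IPv6 takes a slower route or is filtered differently.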

Page 19: Network performance lessons from the coal face - Networkshop44

Jumbo Frames

● Ethernet
– MTU = 1500: normal
– MTU = 9000: jumbo (convention)
● Janet network supposed to allow this
– Only for the brave at present
– Encapsulation
– If site uses MTU = 9000 jumbo frames, fragmented over Janet
● Path MTU discovery
– Sometimes blocked by firewalls
– More likely to be dropped (misconfigured switches etc)
– net.ipv4.tcp_mtu_probing=1
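Part of the appeal of jumbo frames is lower per-packet overhead. The sketch below compares the fraction of wire time carrying TCP payload at MTU 1500 and MTU 9000. It assumes IPv4 and TCP headers with no options (40 bytes) and the usual 38 bytes of Ethernet framing per packet; the slide gives none of these numbers. The result is roughly 94.9% against 99.1%, a modest gain on the wire. The larger practical benefit is usually fewer packets for hosts and routers to process.

```python
ETHERNET_OVERHEAD = 38  # preamble 8 + header 14 + FCS 4 + inter-frame gap 12
IP_TCP_HEADERS = 40     # IPv4 20 + TCP 20, no options (assumption)


def payload_efficiency(mtu):
    """Fraction of on-the-wire bytes that are TCP payload for a given MTU."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_OVERHEAD)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {payload_efficiency(mtu):.2%} payload")
```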

Page 20: Network performance lessons from the coal face - Networkshop44

Network monitoring (perfSONAR + RIPE Atlas)

● Cacti
– Monitor packet loss
– 64 bit counters
● perfSONAR
– Bandwidth
– Latency
– Reverse traceroute
● RIPE Atlas probe
– atlas.ripe.net
– Latency
– Traceroute
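The point about 64 bit counters is easy to quantify. The sketch below computes how long a byte counter takes to wrap on a fully loaded link; the function name is illustrative. A 32 bit byte counter on a saturated 10 Gbit/s interface wraps in about 3.4 seconds, far shorter than a typical polling interval, so graphs built from it will under-report traffic. A 64 bit counter takes hundreds of years to wrap at the same rate.

```python
def counter_wrap_seconds(counter_bits, link_bps):
    """Seconds until a byte counter of the given width wraps at full line rate."""
    bytes_per_second = link_bps / 8
    return 2**counter_bits / bytes_per_second


if __name__ == "__main__":
    for bits in (32, 64):
        seconds = counter_wrap_seconds(bits, 10e9)
        print(f"{bits} bit counter at 10 Gbit/s wraps after {seconds:.3g} s")
```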

Page 21: Network performance lessons from the coal face - Networkshop44

Bufferbloat (www.bufferbloat.net)

● Chaotic and laggy network performance
● Buffers too big for bandwidth
● Affects home users on low bandwidth links with big buffers
– Packet loss signalling bandwidth limit too late
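The delay a full buffer adds is simply its size divided by the link rate, which is why the same buffer is harmless on a fast link and painful on a slow one. The sketch below works through that arithmetic; the 256 KiB buffer and 1 Mbit/s uplink are assumed example values, not figures from the talk.

```python
def queue_delay_seconds(buffer_bytes, link_bps):
    """Worst-case queueing delay added by a full buffer draining at link rate."""
    return buffer_bytes * 8 / link_bps


if __name__ == "__main__":
    buffer_bytes = 256 * 1024  # assumed buffer size
    for label, rate in [("1 Mbit/s uplink", 1e6), ("10 Gbit/s link", 10e9)]:
        delay_ms = queue_delay_seconds(buffer_bytes, rate) * 1000
        print(f"{label}: {delay_ms:.3f} ms of added delay when the buffer is full")
```

On the slow link the sender sees no loss until the buffer is already holding about two seconds of traffic, which is the "too late" signal described above.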

Page 22: Network performance lessons from the coal face - Networkshop44

Conclusions

● Large transfers routine
– But take work (GridPP sites have this experience)
– Needs management layer
● Monitoring vital
– Transfers
– Network
● Network
– Low packet loss
– Good relationship with network team useful
● Information
– fasterdata.es.net

Page 23: Network performance lessons from the coal face - Networkshop44

Acknowledgements

● fasterdata.es.net (Brian Tierney)
– Many thanks for the TCP tuning slides
● Duncan Rand
● Brian Davies
● Dan Traynor
● Terry Froy

Page 24: Network performance lessons from the coal face - Networkshop44

jisc.ac.uk


Christopher Walker

[email protected]
