National Energy Research Scientific Computing Center (NERSC)
Network Performance Tuning
Eli Dart, Network Engineer, NERSC Center Division, LBNL

TRANSCRIPT

Page 1:

National Energy Research Scientific Computing Center (NERSC)

Network performance tuning

Eli Dart, Network Engineer
NERSC Center Division, LBNL

Supercomputing Conference, Nov. 2003

Page 2:

What is Network Tuning?

• Get the most out of the network
– Architecture
– Hosts
– Applications

• Required for HPC center to function
• Many HPC projects have high-performance networking requirements
• Projects span multiple sites, multiple networks

Page 3:

Performance Tuning is Complex

• Users and network operators each see a portion of the system

• Different parts of the system interact in complex ways

• Local optimization can often lead to global performance degradation

Page 4:

NERSC Center

• High bandwidth ESnet connection (2xGigE)

• Jumbo-clean (9000 byte packets) production services soon

• Peak traffic load doubles every year
• Major computational and storage systems
– Seaborg (IBM SP, 10 TFLOPs)
– PDSF (Linux cluster – HEP, Astrophysics, etc)
– HPSS (Multi-Petabyte mass storage system)

Page 5:

NERSC data traffic profile

Page 6:

Bulk data matters

• Primary driver for NERSC users
• Differing transfer profiles
– Large number of files
– Large files
• Increasing needs, increasing difficulty
– Requirements increasing for foreseeable future
– Tuning will only become more critical

Page 7:

Tuning makes a difference

Site            Improvement
ORNL            20x
Fermi           7x
Washington U    5x
BNL             2.5x
PPPL            2.5x

Page 8:

Tuning Goals

• Make it go fast!
– Oversimplification, underestimates the problem
– Potentially complex engineering tradeoffs

• The real list of goals:
– Overcome TCP’s timidity
– Clean up the network
– Understand the application

• Instrumentation and analysis are key
– Know your traffic
– Know your infrastructure

Page 9:

The big one – TCP

• The problem is congestion control/avoidance
– Packet loss is interpreted as congestion
– Response to early congestion collapse

• TCP is timid (quantified in the sketch below)
– Packet loss is not always congestion
– Slow recovery from loss events
• Exponential backoff in response to packet loss
• Linear recovery – potentially very slow
– Attempts to allow everyone to use the network
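
TCP's timidity can be made concrete with the well-known Mathis et al. approximation, which bounds steady-state TCP throughput at roughly (MSS/RTT) x (1.22/sqrt(p)) for loss rate p. A minimal sketch of the arithmetic, not from the slides; the 1500-byte MSS and 70 ms round-trip time are assumed example values:

    # Rough illustration of why treating loss as congestion hurts, using the
    # Mathis et al. approximation: rate <= (MSS / RTT) * (1.22 / sqrt(p)).
    from math import sqrt

    def mathis_throughput_bps(mss_bytes, rtt_sec, loss_rate, c=1.22):
        """Approximate upper bound on TCP throughput, in bits per second."""
        return (mss_bytes * 8 / rtt_sec) * (c / sqrt(loss_rate))

    # Assumed example values: 1500-byte MSS, 70 ms round-trip time.
    for p in (1e-6, 1e-4, 1e-2):
        mbps = mathis_throughput_bps(1500, 0.070, p) / 1e6
        print(f"loss rate {p:g}: at most {mbps:,.1f} Mbps")

Even a 0.01% loss rate caps a single flow at a few tens of megabits per second on a cross-country path, regardless of link capacity.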

Page 10:

Two TCP camps

• Abandon TCP
– Interoperability/coexistence is key
– Support issues (research-ware)
– Production quality concerns (vendor support, etc)

• Fix/help TCP
– Modifications/extensions to TCP
• FAST, Vegas, etc
• Web100/Net100
– Reduce packet loss in the network
– Host tuning
– Shares some problems with the Abandon camp

Page 11:

Today, we work with TCP

• Exists today

• Widely deployed/supported

• Significant performance gains with tuning

• Evaluating both camps for the future

Page 12:

Common problems – Hosts

• TCP buffer sizing (see the sketch after this list)

• Disk speed

• Interrupt coalescence

• Per-socket hashing for link aggregation
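
The buffer-sizing item above has a per-socket as well as a system-wide dimension. A minimal sketch using the standard sockets API, not from the slides; the 8 MB request is an assumed example value, and the kernel silently clamps it to its configured maximum:

    # Sketch: request larger per-socket TCP buffers via the sockets API.
    import socket

    BUF_BYTES = 8 * 1024 * 1024  # example value, sized for a long fat pipe

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set buffers before connecting, so the window is negotiated accordingly.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF_BYTES)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF_BYTES)

    # Read back what the kernel actually granted.
    print("send buffer:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
    print("recv buffer:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))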

Page 13:

Common Problems – Network

• It all comes down to packet loss
• Network architecture
– Micro-congestion affecting TCP slow start
– Misbehaving or misconfigured devices
• Old gear
– Performance dependent on CPU
– Internal bottlenecks
– Flaky interfaces, buggy software

Page 14:

Solutions – host tuning

• TCP buffer size
– Don’t just crank it up (see the sketch below)
– Different parameters for different purposes
– Use modern software (ncftp, grid tools, etc)
– Use per-destination parameters if at all possible

• Think about what you’re doing
– You can’t fill a GigE with an IDE disk
– Fast host interfaces can slow you down
– Netperf/Iperf often isn’t production data transfer
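
"Don't just crank it up" means sizing buffers to the path's bandwidth-delay product (BDP). A sketch of the calculation, not from the slides; the 1 Gbps link speed and 70 ms round-trip time are assumed example values:

    # Sketch: derive the TCP buffer size a path needs from its
    # bandwidth-delay product, instead of cranking it arbitrarily high.
    def bdp_bytes(link_bps, rtt_sec):
        """Bytes that must be in flight to keep this path full."""
        return int(link_bps * rtt_sec / 8)

    # Assumed example values: a 1 Gbps path with a 70 ms round-trip time.
    bdp = bdp_bytes(1_000_000_000, 0.070)
    print(f"BDP = {bdp / 1e6:.2f} MB")  # ~8.75 MB for this example path

Buffers much below the BDP cap throughput; buffers far above it waste memory and can aggravate bursts, which is why per-destination parameters are preferable.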

Page 15:

Solutions – packet loss

• Avoid finger pointing
– Network operator sees a link with headroom (everything’s fine)
– User sees poor performance (the network is broken)
– Only way to tell is to look at the traffic (see the sketch below)

• Identify and fix points of micro-congestion
– TCP and ATM don’t get along (different design goals)
– “Impedance mismatches” in the network
– Network device tuning to handle packet bursts

• Clean up error-prone links
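
One inexpensive way to "look at the traffic" from an end host is its TCP retransmission counters. A Linux-specific sketch, not from the slides, reading the Tcp counter line in /proc/net/snmp:

    # Sketch: estimate this host's TCP retransmission rate from the Linux
    # /proc/net/snmp counters -- a quick first-order proxy for packet loss.
    def tcp_counters():
        with open("/proc/net/snmp") as f:
            tcp_lines = [line.split() for line in f if line.startswith("Tcp:")]
        names, values = tcp_lines[0][1:], tcp_lines[1][1:]
        return dict(zip(names, (int(v) for v in values)))

    c = tcp_counters()
    out_segs, retrans = c["OutSegs"], c["RetransSegs"]
    print(f"{retrans} of {out_segs} segments retransmitted "
          f"({100.0 * retrans / max(out_segs, 1):.3f}%)")

A retransmission rate that looks negligible to an operator watching link utilization can still be ruinous for a single long-haul TCP flow.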

Page 16:

Switch fan-in micro-congestion

• Switch has small queues
• Unable to handle line-rate bursts in the presence of background traffic

[Diagram: Source A and Source B feed 1 Gbps links through a core switch to the destination site. A line-rate burst from TCP session startup arrives on top of a nominal 500 Kbps background traffic load; with only small queues, the switch drops the overflow, labeled “burst minus background traffic (dropped packets)”.]

Page 17:

Routers don’t have this problem

• Routers have much larger queues
• Able to absorb traffic bursts without dropping packets (within reason)

[Diagram: the same topology with a core router in place of the switch. The line-rate burst from TCP session startup arrives on top of the nominal 500 Kbps background traffic load, but the excess traffic queues and no packets are dropped.]
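
The contrast between the two diagrams comes down to queue depth versus burst size. A toy model, not from the slides and with illustrative numbers only: a 100-packet burst arriving at twice the drain rate, into a shallow queue versus a deep one:

    # Toy sketch: queue depth decides whether a line-rate TCP startup burst
    # survives. All numbers are illustrative assumptions.
    def dropped(burst_pkts, queue_pkts, arrival_bps, drain_bps):
        """Count packets lost while a burst arrives faster than it drains."""
        drained_per_arrival = drain_bps / arrival_bps
        depth, drops = 0.0, 0
        for _ in range(burst_pkts):
            depth = max(depth - drained_per_arrival, 0.0) + 1  # drain, enqueue
            if depth > queue_pkts:  # queue full: the new packet is lost
                depth -= 1
                drops += 1
        return drops

    # Burst arrives at ~2 Gbps (two sources) while the output drains at 1 Gbps.
    for q in (32, 1024):  # shallow switch queue vs. deep router queue
        print(f"queue depth {q:4d} pkts: {dropped(100, q, 2e9, 1e9)} dropped")

The shallow queue drops roughly a fifth of the burst; the deep queue absorbs it entirely, at the cost of some added latency while it drains.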

Page 18:

TCP slow start packet loss consequences

Transfer speed   Recovery time   Data sent during recovery
10 Mbps          8.6 sec         5.5 MB
100 Mbps         86 sec          537 MB
200 Mbps         3 min           4.8 GB
500 Mbps         7 min           13.4 GB
1 Gbps           14 min          53.5 GB
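
These figures follow from classic TCP recovery: a loss halves the congestion window, which then regrows by one segment per round trip. A sketch of the arithmetic, not from the slides; the 140 ms round-trip time and 1500-byte MSS are assumed values chosen because they roughly reproduce the table:

    # Sketch: time to return to full rate after a single loss event, under
    # classic AIMD (window halves, regrows one MSS per RTT). The 140 ms RTT
    # and 1500-byte MSS are assumptions that roughly match the table above.
    MSS_BYTES, RTT_SEC = 1500, 0.140

    for mbps in (10, 100, 200, 500, 1000):
        byte_rate = mbps * 1e6 / 8
        window_pkts = byte_rate * RTT_SEC / MSS_BYTES  # packets in flight
        recovery_sec = (window_pkts / 2) * RTT_SEC     # one RTT per MSS regained
        print(f"{mbps:5d} Mbps: recovery ~ {recovery_sec:7.1f} s")

Recovery time grows linearly with bandwidth and quadratically with round-trip time: the faster and longer the path, the more devastating a single lost packet.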

Page 19:

The most important thing

• Communication, communication, communication

• A 15-minute call can solve more problems than a month of email

• Users may not understand the problem or jump to conclusions

• System administrators are often caught in the middle

• Networking staff may not know that a problem exists

Page 20:

Case study – BNL and NERSC

• Goal – enable large, recurring data transfers from BNL to NERSC

• Periodic transfers of 1-4 Terabytes
• Per-stream throughput less than 1 MB/sec (requires large numbers of parallel streams)
• Aggregate performance boost is of primary concern (multi-stream transfers OK, within reason)

Page 21:

Case study – BNL and NERSC

• Initial analysis by users indicated TCP window scaling was not being used

• First step – turn on window scaling (see the check sketched below)
• Isolated troubleshooting by both sides indicates that window scaling has not been enabled at the remote site
• Finger pointing as each side tells the other to fix their stupid machines
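
On Linux, window scaling (RFC 1323) is a single kernel tunable, so the local check each site ran is simple. A minimal sketch, not from the slides:

    # Sketch: check whether the local Linux kernel has TCP window scaling
    # (RFC 1323) enabled. Enabling it is the sysctl
    # net.ipv4.tcp_window_scaling = 1.
    with open("/proc/sys/net/ipv4/tcp_window_scaling") as f:
        enabled = f.read().strip() == "1"
    print("window scaling enabled" if enabled else "window scaling DISABLED")

As the next slide shows, both local checks passing still does not prove the option survives the path between the sites.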

Page 22:

Case study – BNL and NERSC

• Coordinated troubleshooting (simultaneous tcpdumps at both sites plus conference call) reveals both sides have window scaling enabled

• Something in the network is turning off window scaling

• BNL has a Cisco PIX Firewall on their border.

• Test network verifies PIX is disabling window scaling at packet level
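
The comparison of simultaneous captures can be automated: extract the window-scale option from the SYN packets seen at each border and compare. A hedged sketch using the scapy packet library (a modern convenience, not the 2003 tooling); the pcap file names are hypothetical:

    # Sketch: report the TCP window-scale option on SYN packets in captures
    # taken simultaneously at each site's border. Uses scapy; the capture
    # file names below are hypothetical.
    from scapy.all import rdpcap, TCP

    def syn_wscale(pcap_path):
        """Yield (packet summary, wscale value or None) for each SYN."""
        for pkt in rdpcap(pcap_path):
            if TCP in pkt and pkt[TCP].flags & 0x02:  # SYN bit set
                opts = dict(o for o in pkt[TCP].options if isinstance(o, tuple))
                yield pkt.summary(), opts.get("WScale")

    for site, pcap in (("BNL", "bnl_border.pcap"), ("NERSC", "nersc_border.pcap")):
        for summary, ws in syn_wscale(pcap):
            print(f"{site}: {summary}: wscale={'absent' if ws is None else ws}")

A wscale option present on one side of a middlebox and absent on the other is exactly the behavior this case study uncovered.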

Page 23:

Case study – BNL and NERSC

• Bug filed with Cisco
• Cisco knew about this behavior, didn’t document it
• Fixed in version 6.1 and later of PIX software
• BNL was then able to saturate their OC-3 when transferring data to NERSC

Page 24:

Case study – Conclusions

• Network devices misbehave in strange and undocumented ways

• Multi-site, coordinated troubleshooting necessary
– Communication is key
– Finger pointing doesn’t help
– Impossible to see the whole problem from any one vantage point in the network
– Solving this problem required NERSC, BNL and ESnet staff working in concert
– Leadership/ownership required for coordination of these efforts – for NERSC users, that’s us

Page 25:

Main Points

• Network performance tuning is a complex, multi-dimensional problem

• Coordinated effort is required
• Performance gains have significant impact on science – it’s worth the time and effort
• NERSC users should involve us in their troubleshooting efforts ([email protected])

Page 26:

Resources

• NERSC
– NERSC consultants: [email protected]
• http://hpcf.nersc.gov/help/
– Networking group: [email protected]

• Other resources
– ESnet Performance Centers: https://performance.es.net/
– PSC tuning guide: http://www.psc.edu/networking/perf_tune.html
– Jumbo frames
• http://www.abilene.iu.edu/JumboMTU.html
• http://www.uoregon.edu/~joe/jumbo-clean-gear.html

Page 27:

Thanks for listening

• Questions?