Page 1: A weighted fat-tree routing algorithm for efficient load

Feroz Zahid, Ernst Gunnar Gran, Tor Skeie (Simula Research Laboratory, Norway); Bartosz Bogdanski, Bjørn Dag Johnsen (Oracle Corporation)

PDP 2015, Turku, Finland

March 5, 2015

A weighted fat-tree routing algorithm for efficient load-balancing in InfiniBand clusters

Page 2

InfiniBand (IB) is a popular interconnect for HPC systems

Source: Top500 Supercomputers List, http://top500.org/

44.8% share in the November 2014 Top500 supercomputers list

Page 3

Network performance in HPC systems depends on three important factors

Routing

Network Topology

Traffic Patterns

Page 4

Many different topologies are found in real-world clusters: ring, Kautz, torus, Clos, fat-trees

Fat-tree and its variants are very common in IB networks

• k-ary-n-tree
  • n levels, $k^n$ nodes, $n \cdot k^{n-1}$ switches
  • 2k ports on each switch
  • Each switch has an equal number of up and down connections
  • Only half of the ports of the root switches are used
• XGFTs (extended generalized fat-trees)
  • More generalized
  • Allow different numbers of up and down connections on switches
  • Also allow different numbers of connections at each level
• PGFTs (parallel-ports generalized fat-trees)
  • Allow multiple links between the same pair of switches
• RLFTs (real-life fat-trees)
  • Restricted PGFTs
  • Switches with the same number of ports at all levels
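The k-ary-n-tree counts in the first bullet can be tabulated with a short Python function. This is a minimal sketch; `k_ary_n_tree_size` and `TreeSize` are hypothetical names, and the formulas are exactly the ones listed above ($k^n$ nodes, $n \cdot k^{n-1}$ switches, 2k ports per switch).

```python
from typing import NamedTuple


class TreeSize(NamedTuple):
    nodes: int             # k^n end nodes
    switches: int          # n * k^(n-1) switches
    ports_per_switch: int  # 2k ports: k down, k up


def k_ary_n_tree_size(k: int, n: int) -> TreeSize:
    """Return end-node, switch and per-switch port counts of a k-ary-n-tree."""
    if k < 1 or n < 1:
        raise ValueError("k and n must be positive")
    return TreeSize(nodes=k ** n, switches=n * k ** (n - 1), ports_per_switch=2 * k)


if __name__ == "__main__":
    for k, n in [(4, 2), (8, 2), (16, 2), (8, 3), (12, 3), (16, 3)]:
        size = k_ary_n_tree_size(k, n)
        print(f"{k}-ary-{n}-tree: {size.nodes} nodes, {size.switches} switches, "
              f"{size.ports_per_switch}-port switches")
```

The node counts for these six topologies (16, 64, 256, 512, 1728, 4096) match the End Nodes column of the execution-time table in this deck.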

Page 5

Fat-trees have nice properties that make them popular:
• Maintenance of full-bisection bandwidth
• Easy deadlock-free routing
• Fault tolerance

[Figure: nodes A and B; labels "Up" and "Down"]
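The "Up" and "Down" labels refer to how fat-trees are routed without deadlock: a route climbs towards a common ancestor switch and then only descends, never turning from down back to up. A minimal Python sketch of that check, assuming a route is given as a list of hop directions; `is_up_down_route` is a hypothetical helper.

```python
from typing import List


def is_up_down_route(hops: List[str]) -> bool:
    """True if the route never goes up again after it has started going down."""
    descending = False
    for hop in hops:
        if hop == "down":
            descending = True
        elif hop == "up":
            if descending:
                return False  # a down->up turn violates the up-then-down rule
        else:
            raise ValueError(f"unknown hop direction: {hop!r}")
    return True


print(is_up_down_route(["up", "up", "down", "down"]))  # True
print(is_up_down_route(["up", "down", "up", "down"]))  # False
```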

Page 6

Routing in IB networks is generally deterministic

Based on linear forwarding tables (LFTs) stored in the switches

Deterministic routing is traffic oblivious!
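A linear forwarding table maps a destination LID to an output port, and the subnet manager fills it in ahead of time. The toy model below (modelled as a dict; the class and method names are hypothetical, not an OFED API) shows why this is both deterministic and traffic-oblivious: the lookup consults only the destination LID, never the current load.

```python
from typing import Dict


class LinearForwardingTable:
    """Toy model of a switch LFT: destination LID -> output port (hypothetical API)."""

    def __init__(self) -> None:
        self._entries: Dict[int, int] = {}

    def set_route(self, dlid: int, out_port: int) -> None:
        self._entries[dlid] = out_port

    def out_port(self, dlid: int) -> int:
        # Only the destination LID is consulted: no traffic or load information,
        # so every packet to the same destination leaves on the same port.
        return self._entries[dlid]


lft = LinearForwardingTable()
lft.set_route(dlid=4, out_port=2)
lft.set_route(dlid=5, out_port=2)
print(lft.out_port(4), lft.out_port(5))  # 2 2: both destinations share port 2
```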

Page 7

Routing in fat-tree networks can be source-based or destination-based, and closed-form or iterative

• Source-based
  • The out-port for a packet at a switch is chosen from the source node identifier
• Destination-based
  • The out-port for a packet at a switch is chosen from the destination node identifier
• Closed form
  • D-mod-K, S-mod-K
• Iterative, e.g.:

    for each leaf switch lf
        for each node connected to lf
            id <= node identifier
            route_downgoing_go_up(id)
            ...
        end for
    end for
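For the closed-form schemes, one common formulation of D-mod-K on a k-ary-n-tree picks the up-port at level $l$ (1 = leaf) as $\lfloor D / k^{l-1} \rfloor \bmod k$, where $D$ is the destination identifier; S-mod-K applies the same rule to the source identifier. A minimal sketch, assuming 0-based node identifiers and up-port indices; `d_mod_k_up_port` is a hypothetical name.

```python
def d_mod_k_up_port(dest_id: int, level: int, k: int) -> int:
    """Up-port index chosen at `level` (1 = leaf) for destination `dest_id`."""
    if level < 1 or k < 1 or dest_id < 0:
        raise ValueError("level and k must be >= 1, dest_id >= 0")
    return (dest_id // k ** (level - 1)) % k


# 4-ary-2-tree: destinations 0..15, each leaf switch has k = 4 up-ports.
for d in (1, 5, 9, 13):
    print(d, d_mod_k_up_port(d, level=1, k=4))
```

Destinations 1, 5, 9 and 13 sit at the same position on their respective leaf switches, so they all leave a leaf on up-port 1, whatever traffic they actually receive.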

Page 8

OFED’s fat-tree routing algorithm tends to spread the routes across the tree using counters

Ref: Zahavi, Eitan, et al. "Optimized InfiniBand fat-tree routing for shift all-to-all communication patterns." Concurrency and Computation: Practice and Experience 22.2 (2010): 217-231.

OFED is the de-facto standard software stack for building and deploying IB based applications

• Deterministic
  • High performance; avoids out-of-order packet delivery
• Destination-based
  • Direct realization in IB networks
• Iterative
  • Better route balancing
• Maintains counters on ports
  • +1 when a new route is added through the port
• Supports XGFTs, PGFTs, RLFTs
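The counter idea can be illustrated on a single leaf switch: destinations are processed one by one, each is assigned the up-port whose counter is currently lowest, and that counter is incremented by 1. This is a simplified sketch under that assumption, not OpenSM's actual fat-tree code; `assign_up_ports` is a hypothetical name and ties go to the lowest port number.

```python
from typing import Dict, List


def assign_up_ports(dest_ids: List[int], num_up_ports: int) -> Dict[int, int]:
    """Assign each destination an up-port, always choosing the least-used port."""
    counters = [0] * num_up_ports          # one route counter per up-port
    assignment: Dict[int, int] = {}
    for dest in dest_ids:                  # routing order = order of dest_ids
        port = min(range(num_up_ports), key=lambda p: (counters[p], p))
        assignment[dest] = port
        counters[port] += 1                # "+1" when a new route is added
    return assignment


print(assign_up_ports([0, 1, 2, 3, 4, 5], num_up_ports=4))
```

The counters keep the number of routes per port even, but every route counts as 1: a route to a busy node weighs the same as one to an idle node, and the outcome depends on the order in which destinations are visited.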

Page 9

“Multi-stage switches are not cross-bars!”

The effective bisection bandwidth depends on the traffic pattern

Ref: Hoefler, Torsten, Timo Schneider, and Andrew Lumsdaine. "Multistage switches are not crossbars: Effects of static routing in high-performance networks." Cluster Computing, 2008

Nodes 1 and 4 share the same index position in their leaf switches
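Index sharing can be illustrated with a toy model, assuming a two-level tree in which the node at position i on any leaf switch is reached through up-port i (as leaf-level D-mod-K would do). The helper `index_collisions` is hypothetical, and the numbering (3 nodes per leaf) is chosen only so that nodes 1 and 4 land on the same index; it is not taken from the slide's figure.

```python
from collections import defaultdict
from typing import Dict, List


def index_collisions(receivers: List[int], nodes_per_leaf: int) -> Dict[int, List[int]]:
    """Map leaf-port index -> receivers at that index, keeping only shared indices."""
    by_index: Dict[int, List[int]] = defaultdict(list)
    for node in receivers:
        by_index[node % nodes_per_leaf].append(node)  # position on its leaf switch
    return {idx: nodes for idx, nodes in by_index.items() if len(nodes) > 1}


# Hypothetical numbering: 3 nodes per leaf, so leaf 0 = {0, 1, 2}, leaf 1 = {3, 4, 5}.
print(index_collisions([1, 4, 5], nodes_per_leaf=3))  # {1: [1, 4]}
```

Traffic towards nodes 1 and 4 is then funnelled through the same up-links even when plenty of other links are idle.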

Page 12

We identify two important issues with the fat-tree routing algorithm as implemented by OFED’s subnet manager

• Node-traffic-oblivious routing
  • All nodes are treated equally
  • Node roles are ignored
• Non-predictable performance
  • Nodes are routed in an order that depends on the port numbers
  • Port numbering is hard to set:
    • Sysadmins do not care about it
    • Addition of new nodes
  • Which nodes share links?
    • Depends on the indexing sequence!

Page 13

Some nodes tend to receive more traffic than others, so routes towards those nodes are more likely to be congested.

Nodes 4 and 5 are more likely to receive traffic, e.g. storage nodes.

We call these nodes receiver nodes!

Page 16

The 648-port fat-tree is a common building block for HPC systems

Page 17

Result: The probability of index collision for receiver nodes is very high with node-oblivious routing

About 90% probability that two receiver nodes share the same index with 2 receiver nodes per switch!
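A back-of-the-envelope model shows why collisions are likely. Assume m leaf switches, each with p node ports and r receiver nodes placed uniformly at random and independently; the probability that some index is used by receivers on two different switches is then $1 - \prod_{i=0}^{m-1} \binom{p - r i}{r} / \binom{p}{r}$. The sketch below evaluates this with p = 18 (an assumption about the leaf switches of the 648-port configuration); it is an illustration under these assumptions, not a reproduction of the authors' 90% analysis, whose exact setup the slide does not state. `index_collision_probability` is a hypothetical name.

```python
from fractions import Fraction
from math import comb


def index_collision_probability(m: int, p: int, r: int) -> Fraction:
    """P(at least one index is used by receivers on two different leaf switches).

    Assumes m leaf switches, each with p node ports and r receivers placed
    uniformly at random and independently of the other switches.
    """
    if r * m > p:
        return Fraction(1)  # pigeonhole: not enough distinct indices
    no_collision = Fraction(1)
    for i in range(m):
        # switch i must avoid the r*i indices already taken by switches 0..i-1
        no_collision *= Fraction(comb(p - r * i, r), comb(p, r))
    return 1 - no_collision


for m in range(2, 6):
    print(m, float(index_collision_probability(m, p=18, r=2)))
```

Under these assumptions the probability already exceeds 0.9 with five switches hosting two receivers each, and sharing becomes certain once m·r exceeds p.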

Page 18

The weighted fat-tree routing algorithm (wFatTree) assigns weights to the nodes

The algorithm is still deterministic!

• All compute nodes are assigned a new parameter: receive weight
• Weights can be assigned based on:
  • Known node roles, e.g. storage nodes
  • Known traffic priorities, e.g. following QoS levels
  • Traffic profiling
• Nodes are routed in decreasing order of their weights
  • Not based on port numbering
  • Predictable
• Port selection is based on both:
  • Downward weight
  • Upward weight
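The routing order can be sketched as a sort on the receive weight. This minimal Python sketch assumes hypothetical node records carrying a GUID-like identifier that serves as a deterministic tie-breaker; the slide only states that nodes are routed in decreasing order of weight, so the tie-break rule is an assumption.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    guid: int              # stable identifier (assumed tie-breaker)
    receive_weight: float  # e.g. higher for storage nodes


def routing_order(nodes: List[Node]) -> List[Node]:
    """Heaviest receivers first; ties broken by guid, never by port number."""
    return sorted(nodes, key=lambda n: (-n.receive_weight, n.guid))


nodes = [Node(0x30, 1.0), Node(0x10, 5.0), Node(0x20, 1.0), Node(0x40, 5.0)]
print([hex(n.guid) for n in routing_order(nodes)])
```

Because the sort key depends only on node attributes, renumbering or recabling switch ports does not change the order, which is what makes the result predictable.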

Page 19

Port selection in wFatTree uses both downward and upward weights
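The slide does not spell out how the two weights are combined. One plausible reading, assumed here, is a lexicographic rule: prefer the port with the smallest accumulated downward weight, then the smallest upward weight, then the lowest port number, and charge each new route with the destination's receive weight instead of a flat +1. The data layout and names below are hypothetical; this is a sketch of weighted port selection, not the paper's exact rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Port:
    number: int
    down_weight: float = 0.0  # sum of receive weights of downward routes via this port
    up_weight: float = 0.0    # sum of receive weights of upward routes via this port


def select_port(candidates: List[Port]) -> Port:
    """Least-loaded port by (downward weight, upward weight, port number)."""
    return min(candidates, key=lambda p: (p.down_weight, p.up_weight, p.number))


def add_route(port: Port, weight: float, downward: bool) -> None:
    # Charge the destination's receive weight instead of a flat +1.
    if downward:
        port.down_weight += weight
    else:
        port.up_weight += weight


ports = [Port(1), Port(2), Port(3)]
for w in [5.0, 5.0, 1.0, 1.0, 1.0]:  # destinations already sorted by weight
    chosen = select_port(ports)
    add_route(chosen, w, downward=True)
print([(p.number, p.down_weight) for p in ports])
```

Here the largest accumulated weight on any port is 5.0, whereas a flat +1 counter visiting the same destinations in the same order would place 6.0 on ports 1 and 2.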

Page 20

Result: Evaluation on a 648-port fat-tree shows substantial improvements in total network bandwidth

[Figure panels: 18 switches with receiver nodes; 27 switches with receiver nodes]

Page 21

[Figure panel: all 36 switches with receiver nodes]

Page 22

Result: wFatTree minimizes the total contention on the links by balancing routes

Page 24

Result: The wFatTree execution time is competitive with the original fat-tree routing

Topology         End nodes   Fat-tree routing   wFatTree routing
4-ary-2-tree            16              0.167              0.255
8-ary-2-tree            64              0.318              0.365
16-ary-2-tree          256              1.686              2.268
8-ary-3-tree           512             16.386             19.657
12-ary-3-tree         1728            188.856            230.639
16-ary-3-tree         4096           1029.369           1434.287
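The relative cost of wFatTree can be read off the table as the ratio of the two execution-time columns. A short Python sketch over the values in the table (the tuple layout and the `overhead_ratios` name are just for illustration):

```python
# (topology, end nodes, fat-tree time, wFatTree time) copied from the table above
RESULTS = [
    ("4-ary-2-tree", 16, 0.167, 0.255),
    ("8-ary-2-tree", 64, 0.318, 0.365),
    ("16-ary-2-tree", 256, 1.686, 2.268),
    ("8-ary-3-tree", 512, 16.386, 19.657),
    ("12-ary-3-tree", 1728, 188.856, 230.639),
    ("16-ary-3-tree", 4096, 1029.369, 1434.287),
]


def overhead_ratios(rows):
    """wFatTree time divided by fat-tree time, per topology."""
    return {topo: wft / ft for topo, _nodes, ft, wft in rows}


for topo, ratio in overhead_ratios(RESULTS).items():
    print(f"{topo:>14}: {ratio:.2f}x")
```

For these six topologies the ratio stays between about 1.15x and 1.53x.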

Page 25

Future Work: Enable smart network provisioning – Four important components

[Figure: Monitor->Optimize->Execute loop; labels: nodes with weights, balanced traffic, better routes, optimized algorithms, smart routing reconfiguration, load balancing, congestion control, IB congestion control, performance, adjusting to load, optimization]

Page 26

Questions?

[Figure: state-of-the-art fat-tree routing with oblivious path assignment vs. weighted fat-tree routing with better load-balancing]

In summary, weighted fat-tree routing improves actual load-balancing in IB-based fat-tree networks.