Isaac Keslassy, Shang-Tse (Da) Chuang, Nick McKeown
Stanford University
The Load-Balanced Router
Typical Router Architecture
[Figure: inputs and outputs at line rate R, connected by a switch fabric under the control of a scheduler]
Definitions: Traffic Matrix
Traffic matrix: Λ = [λij], i, j = 1, …, N, where λij is the arrival rate from input i to output j
Uniform traffic matrix: λij = λ
Definitions: 100% Throughput
100% throughput: for any traffic matrix of row and column sum less than R, λij < μij, where μij is the service rate from input i to output j
Router Wish List
Scale to High Linecard Speeds
- No Centralized Scheduler
- Optical Switch Fabric
- Low Packet-Processing Complexity
Scale to High Number of Linecards
- High Number of Linecards
- Arbitrary Arrangement of Linecards
Provide Performance Guarantees
- 100% Throughput Guarantee
- Delay Guarantee
- No Packet Reordering
Stanford 100Tb/s Router
“Optics in Routers” project: http://yuba.stanford.edu/or/
Some challenging numbers:
- 100Tb/s
- 160Gb/s linecards
- 640 linecards
100% Throughput in a Mesh Fabric
[Figure: full mesh between N inputs and N outputs at line rate R, with every mesh link at rate R]
Router capacity = NR
Switch capacity = N²R
If Traffic Is Uniform
[Figure: the same mesh with every link at rate R/N; each input at rate R spreads its traffic evenly over the N outputs]
Real Traffic is Not Uniform
[Figure: the R/N mesh under non-uniform traffic; an input sending at rate R toward a single output exceeds its R/N link]
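The contrast between the uniform and non-uniform cases can be checked numerically. The following is a minimal sketch, not taken from the slides: the function name `overloaded_links`, the normalized rate R = 1, the 0.9 load factor and the diagonal traffic pattern are all assumptions chosen for illustration.

```python
def overloaded_links(traffic, link_rate):
    """Return the (input, output) pairs whose demand exceeds the mesh link rate."""
    return [(i, j)
            for i, row in enumerate(traffic)
            for j, rate in enumerate(row)
            if rate > link_rate]

N = 3
R = 1.0             # line rate, normalized (assumption)
link_rate = R / N   # rate of each link in the uniform mesh

# Uniform traffic at 90% load: every input-output pair carries 0.9 * R / N.
uniform = [[0.9 * R / N] * N for _ in range(N)]

# Hypothetical non-uniform pattern: input i sends all its traffic to output i.
diagonal = [[0.9 * R if i == j else 0.0 for j in range(N)] for i in range(N)]

print(overloaded_links(uniform, link_rate))   # no link is overloaded
print(overloaded_links(diagonal, link_rate))  # the diagonal links are overloaded
```

Both matrices have row and column sums below R, so both are admissible, yet only the uniform one fits in a mesh of R/N links.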
Load-Balanced Switch
[Figure: two meshes in series, each with links at rate R/N. The first mesh is the load-balancing stage, the second is the forwarding stage.]
100% throughput for weakly mixing traffic (Valiant, C.-S. Chang et al.)
[Figure (animation): packets 1, 2, 3 arriving at the inputs of the load-balanced switch are spread over the intermediate ports by the first mesh, then forwarded to their outputs by the second mesh]
Intuition: Proof of 100% Throughput
Let $a$ be the traffic matrix and $U$ the $N \times N$ all-ones matrix.
Arrivals to second mesh: $b = \frac{1}{N} U a$
Capacity of second mesh: $C = \frac{R}{N} U$
Second mesh: arrival rate < service rate: $C - b = \frac{1}{N}(R U - U a) > 0$, because every entry of $U a$ is a column sum of $a$ and is therefore less than $R$
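The intuition can be tried on a small example. This is a sketch under stated assumptions: `a` is the traffic matrix, the arrivals to the second mesh are taken as b = (1/N) U a with U the all-ones matrix, and each second-mesh link has capacity R/N. The function names and the two test matrices are hypothetical.

```python
def second_mesh_arrivals(a):
    """b = (1/N) U a: rate arriving at each middle port for each output."""
    n = len(a)
    column_sums = [sum(a[i][j] for i in range(n)) for j in range(n)]
    # Every row of U a equals the vector of column sums of a.
    return [[s / n for s in column_sums] for _ in range(n)]

def second_mesh_is_stable(a, r):
    """True if every second-mesh link sees arrival rate < service rate R/N."""
    n = len(a)
    b = second_mesh_arrivals(a)
    link_capacity = r / n
    return all(b[k][j] < link_capacity for k in range(n) for j in range(n))

R = 1.0
# Admissible but highly non-uniform: row and column sums are 0.9 < R.
admissible = [[0.9, 0.0, 0.0],
              [0.0, 0.9, 0.0],
              [0.0, 0.0, 0.9]]
# Not admissible: the first column sums to 1.2 > R.
oversubscribed = [[0.6, 0.0, 0.0],
                  [0.6, 0.0, 0.0],
                  [0.0, 0.0, 0.0]]

print(second_mesh_is_stable(admissible, R))
print(second_mesh_is_stable(oversubscribed, R))
```

The first matrix is the diagonal pattern that overloads a single R/N mesh; after load balancing it fits, while the oversubscribed output does not.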
Alternative: Crossbar Switch Fabric
[Figure: two crossbars in series, connecting external inputs 1…N to intermediate ports 1…N, and intermediate ports 1…N to external outputs 1…N]
Proposed by C.-S. Chang et al.
Essential result: same rate => same guarantees
Packet Reordering
[Figure: two-mesh load-balanced switch; packets 1 and 2 travel through different middle ports and can leave out of order]
Bounding Delay Difference Between Middle Ports
[Figure: two-mesh load-balanced switch with packets 1 and 2 at different middle ports; the delay difference is counted in cells]
UFS (Uniform Frame Spreading)
[Figure: two-mesh load-balanced switch illustrating UFS on packets 1, 2 and 3]
FOFF (Full Ordered Frames First)
[Figure: two-mesh load-balanced switch illustrating FOFF on packets 1 and 2]
Input Algorithm
- N FIFO queues corresponding to the N output flows
- Spread each flow uniformly: if the last packet was sent to middle port k, send the next to k+1.
- Every N time-slots, pick a flow:
  - If a full frame exists, pick it and spread it like UFS
  - Else if all frames are partial, pick one in round-robin order and send it
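The input algorithm can be sketched in a few lines. This is an illustrative reading of the slide, not the authors' implementation: the class name `FoffInput`, its methods, the 0-indexed ports and the single round-robin pointer shared by full and partial frames are all assumptions.

```python
from collections import deque

class FoffInput:
    """Sketch of one FOFF input with n per-output FIFO queues."""

    def __init__(self, n):
        self.n = n
        self.queues = [deque() for _ in range(n)]  # one FIFO per output flow
        self.next_port = [0] * n                   # next middle port for each flow
        self.rr = 0                                # round-robin pointer over flows

    def enqueue(self, output, packet):
        self.queues[output].append(packet)

    def _send(self, flow, count):
        """Dequeue `count` packets of `flow`, sending them to ports k, k+1, ..."""
        sent = []
        for _ in range(count):
            port = self.next_port[flow]
            sent.append((port, self.queues[flow].popleft()))
            self.next_port[flow] = (port + 1) % self.n
        return sent

    def pick_frame(self):
        """Called once every n time-slots; returns a list of (middle_port, packet)."""
        order = [(self.rr + d) % self.n for d in range(self.n)]
        # Full frames first: a flow with at least n queued packets.
        for flow in order:
            if len(self.queues[flow]) >= self.n:
                self.rr = (flow + 1) % self.n
                return self._send(flow, self.n)
        # All frames are partial: serve one non-empty flow in round-robin order.
        for flow in order:
            if self.queues[flow]:
                self.rr = (flow + 1) % self.n
                return self._send(flow, len(self.queues[flow]))
        return []

inp = FoffInput(3)
for p in ("a1", "a2", "a3", "a4"):
    inp.enqueue(0, p)
inp.enqueue(1, "b1")
first = inp.pick_frame()   # full frame of flow 0, one packet per middle port
second = inp.pick_frame()  # no full frame left: partial frame of flow 1
third = inp.pick_frame()   # partial frame of flow 0 (the leftover packet)
print(first, second, third)
```

Because each flow remembers its own next middle port, a partial frame followed by later packets of the same flow still visits the middle ports in cyclic order.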
Bounding Reordering
[Figure: two-mesh load-balanced switch with packets 1, 2 and 3 spread over the N middle ports]
FOFF
Output properties
- N FIFO queues corresponding to the N middle ports
- If there are N² packets, one of the head-of-line packets is in order and can depart
- Buffer size at most N² packets
[Figure: FIFO queues at an output holding packets of flows 1, 2 and 3]
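The output side can be sketched as a small resequencer that only ever inspects head-of-line packets. The class `FoffOutput` and the per-input sequence numbers carried by each packet are assumptions made for this illustration; the slide does not say how an output recognizes an in-order packet.

```python
from collections import deque

class FoffOutput:
    """Sketch of one FOFF output with n FIFO queues, one per middle port."""

    def __init__(self, n):
        self.n = n
        self.queues = [deque() for _ in range(n)]  # one FIFO per middle port
        self.expected = [0] * n                    # next sequence number per input

    def receive(self, middle_port, source, seq):
        """Store a packet from input `source` that came through `middle_port`."""
        self.queues[middle_port].append((source, seq))

    def depart(self):
        """Release one in-order head-of-line packet, or None if none is ready."""
        for queue in self.queues:
            if queue:
                source, seq = queue[0]
                if seq == self.expected[source]:
                    queue.popleft()
                    self.expected[source] += 1
                    return (source, seq)
        return None

out = FoffOutput(3)
out.receive(1, 0, 1)        # packet 1 of input 0 arrives first, via middle port 1
early = out.depart()        # nothing can leave: packet 0 has not arrived yet
out.receive(0, 0, 0)        # packet 0 arrives via middle port 0
departures = [out.depart(), out.depart(), out.depart()]
print(early, departures)
```

Only the N head-of-line packets are examined per departure, which matches the constant-work flavor of the properties listed above.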
FOFF Properties
Property 1: FOFF maintains packet order.
Property 2: FOFF has O(1) complexity.
Property 3: Congestion buffers operate independently.
Property 4: FOFF maintains an average packet delay within a constant of that of an ideal output-queued router.
Corollary: FOFF has 100% throughput for any adversarial traffic.
Output-Queued Router?
[Figure: full mesh between inputs and outputs at line rate R, with every mesh link at rate R]
From Two Meshes to One Mesh
[Figure: each linecard holds one input and one output. The first mesh and the second mesh, each carrying rate R per linecard, are folded into a single combined mesh carrying 2R per linecard.]
Many Fabric Options
Options:
- Space: Full uniform mesh
- Time: Round-robin crossbar
- Wavelength: Static WDM
[Figure: linecards (In/Out) connected through any spreading device over channels C1, C2, …, CN]
N channels, each at rate 2R/N
AWGR (Arrayed Waveguide Grating Router)
A Passive Optical Component
- Wavelength i on input port j goes to output port (i+j-1) mod N
- Can shuffle information from different inputs
[Figure: N x N AWGR; linecards 1, 2, …, N each send wavelengths 1, 2, …, N into the AWGR, which delivers them to linecards 1, 2, …, N]
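The wavelength routing rule can be tabulated for a small N. In this sketch the function name `awgr_output_port` is hypothetical, and one assumption is added to the slide's formula: with ports numbered 1 to N, a result of 0 from the modulo is read as port N.

```python
def awgr_output_port(wavelength, input_port, n):
    """Output port for 1-indexed wavelength i arriving on 1-indexed input port j."""
    port = (wavelength + input_port - 1) % n
    # Assumption: a result of 0 denotes port N, since ports are numbered 1..N.
    return port if port != 0 else n

N = 4
table = {}
for j in range(1, N + 1):
    table[j] = [awgr_output_port(i, j, N) for i in range(1, N + 1)]
    print(f"input {j}: wavelengths 1..{N} -> outputs {table[j]}")
```

Each input reaches every output on a distinct wavelength, and each output receives a given wavelength from exactly one input, which is why the passive device can stand in for a uniform mesh.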
Static WDM Switching: Packaging
[Figure: four linecards (In/Out) each send wavelengths A, B, C, D into an AWGR; the four outputs receive A, A, A, A; B, B, B, B; C, C, C, C; and D, D, D, D respectively]
AWGR: passive and almost zero power
N WDM channels, each at rate 2R/N
Scaling Problem
For N < 64, an AWGR is a good solution. We want N = 640. Need to decompose.
A Different Representation of the Mesh
[Figure: the combined mesh, with each linecard sending and receiving at 2R, redrawn with the linecards on the left and again on the right, joined by links at rate 2R/N]
Example: N=8
[Figure: linecards 1 to 8 on each side, fully connected by links at rate 2R/8]
When N is Too Large
Decompose into groups (or racks)
[Figure: the N = 8 example with linecards 1 to 8 at rate 2R arranged into groups, with links labeled 4R and 4R/4]
[Figure: general case with groups/racks 1 to G, each holding linecards 1 to L at rate 2R. Each group carries 2RL in total and 2RL/G to each group. Electronics inside the racks, optics between them.]
When Linecards are Missing
[Figure: groups/racks 1 to G, each with up to L linecards at rate 2R; each group carries up to 2RL, over links of rate 2RL/G between groups]
Solution: replace mesh with sum of permutations
[Figure: mesh = sum of permutations, with labels 2RL/G per link and G * 2RL/G ≤ 2RL per group]
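For the uniform case, the idea of writing a mesh as a sum of permutations can be checked directly. This sketch assumes cyclic-shift permutations, which is one convenient choice and not necessarily the one used in the talk; the function names are hypothetical, and the non-uniform decomposition needed when linecards are missing is not covered here.

```python
def cyclic_permutation(n, shift):
    """n x n 0/1 matrix connecting group g to group (g + shift) mod n."""
    return [[1 if (row + shift) % n == col else 0 for col in range(n)]
            for row in range(n)]

def matrix_sum(matrices):
    """Entry-wise sum of a list of equally sized square matrices."""
    n = len(matrices[0])
    return [[sum(m[r][c] for m in matrices) for c in range(n)] for r in range(n)]

G = 4
permutations = [cyclic_permutation(G, s) for s in range(G)]
mesh = matrix_sum(permutations)
print(mesh)  # every entry is 1: each pair of groups is connected exactly once
```

Scaling each permutation by 2RL/G gives a uniform mesh with 2RL/G between every pair of groups and G * 2RL/G = 2RL per group.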
MEMS-Based Architecture
[Figure: groups/racks 1 to G, each with linecards 1 to L, on both sides of a set of static MEMS switches. Electronics in the racks perform uniform multiplexing on the sending side and uniform demultiplexing on the receiving side; the switches between the racks are optical.]
When Linecards are Missing
[Figure: groups/racks 1 to G, each with up to L linecards, connected through MEMS switches]
Implementation of a 100Tb/s Load-Balanced Router
[Figure: linecard racks 1 to G = 40, each holding L = 16 linecards at 160Gb/s, connected through a switch rack (< 100W) of 40 x 40 static MEMS switches numbered 1, 2, …, 55, 56]
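The figures on this slide can be combined with the rates from the decomposition slides (2RL per rack, 2RL/G between racks in a uniform mesh). The following is a small arithmetic sketch; the variable names are illustrative, and the per-pair rate assumes all racks are fully populated.

```python
R_GBPS = 160   # linecard rate R
L = 16         # linecards per rack
G = 40         # number of linecard racks

num_linecards = L * G                          # N = L * G
capacity_tbps = num_linecards * R_GBPS / 1000  # router capacity N * R
rack_rate_gbps = 2 * R_GBPS * L                # 2RL carried by each rack
pair_rate_gbps = rack_rate_gbps / G            # 2RL/G between each pair of racks

print(f"N = {num_linecards} linecards")
print(f"capacity = {capacity_tbps} Tb/s")
print(f"2RL = {rack_rate_gbps} Gb/s per rack")
print(f"2RL/G = {pair_rate_gbps} Gb/s per pair of racks")
```

With these numbers N is 640 and the capacity is 102.4 Tb/s, consistent with the 640 linecards and 100Tb/s target quoted earlier in the talk.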
Summary
The load-balanced switch
- Does not need any centralized scheduling
- Can use a mesh
Using FOFF
- It keeps packets in order
- It guarantees 100% throughput
Using the MEMS-based architecture
- It scales to high port numbers
- It tolerates linecard failure
References
Initial Work
C.-S. Chang, D.-S. Lee and Y.-S. Jou, "Load Balanced Birkhoff-von Neumann Switches, part I: One-Stage Buffering," Computer Communications, Vol. 25, pp. 611-622, 2002.
Extensions
I. Keslassy, S.-T. Chuang, K. Yu, D. Miller, M. Horowitz, O. Solgaard and N. McKeown, "Scaling Internet Routers Using Optics," ACM SIGCOMM'03, Karlsruhe, Germany, August 2003.
I. Keslassy, S.-T. Chuang and N. McKeown, "A Load-Balanced Switch with an Arbitrary Number of Linecards," IEEE Infocom'04, Hong Kong, March 2004.
Thank you.