Devavrat Shah
Stanford University
Randomization and Heavy Traffic Theory: New approaches to the design and analysis of switch algorithms
Network algorithms
• Algorithms implemented in networks, e.g. in
– switches/routers: scheduling algorithms, routing lookup, packet classification
– memory/buffer managers: maintaining statistics, active queue management, bandwidth partitioning
– load balancers: randomized load balancing
– web caches: eviction schemes, placement of caches in a network
Network algorithms: challenges
• Time constraint: need to make complicated decisions very quickly
– e.g. line speeds in the Internet core are 10 Gbps (soon to be 40 Gbps), so packets arrive roughly every 50 ns (roughly every 10 ns at 40 Gbps)
• Limited computational resources, due to rigid space and heat dissipation constraints
• Algorithms need to be very simple so as to be implementable
– but simple algorithms may perform poorly if they are not well designed
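The 50 ns and 10 ns budgets follow from a quick calculation. This sketch assumes back-to-back minimum-size 64-byte packets and ignores framing overhead. The packet size is an assumption, because the slide does not state it.

```python
# Per-packet time budget at a given line rate.
# Assumption: minimum-size 64-byte packets, no preamble/inter-frame gap.

def interarrival_ns(line_rate_bps: float, packet_bytes: int = 64) -> float:
    """Time between back-to-back packets of the given size, in nanoseconds."""
    return packet_bytes * 8 / line_rate_bps * 1e9

for rate in (10e9, 40e9):
    print(f"{rate / 1e9:.0f} Gbps: {interarrival_ns(rate):.1f} ns per packet")
# 10 Gbps: 51.2 ns per packet
# 40 Gbps: 12.8 ns per packet
```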
An illustration
• Consider the following simple system: several queues sharing a single server of capacity 1
• Algorithms
– serve a randomly chosen queue: very easy
– longest queue first (LQF): complicated to implement
• Performance
– serve at random: does not give maximum throughput
– LQF: maximum throughput, and it minimizes the maximum backlog
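Here is a toy simulation of this system. It assumes independent Bernoulli arrivals per queue. It also assumes that "serve a randomly chosen queue" means picking any queue uniformly, even an empty one. Under that reading, a queue whose arrival rate exceeds 1/N is starved. A random choice among non-empty queues would already be work-conserving and would not lose throughput on a single server.

```python
import random

def simulate(policy, rates, slots=10_000, seed=1):
    """One server of capacity 1 fed by len(rates) queues with independent
    Bernoulli arrivals. Returns the largest total backlog observed."""
    rng = random.Random(seed)
    q = [0] * len(rates)
    worst = 0
    for _ in range(slots):
        for i, lam in enumerate(rates):
            if rng.random() < lam:
                q[i] += 1
        if policy == "random":
            # Assumption: choose any queue uniformly, even if it is empty.
            i = rng.randrange(len(q))
        else:  # "lqf": longest queue first
            i = max(range(len(q)), key=q.__getitem__)
        if q[i] > 0:
            q[i] -= 1
        worst = max(worst, sum(q))
    return worst

rates = [0.6, 0.3]  # total load 0.9 < 1, but queue 0 needs more than 1/2
print("random:", simulate("random", rates))  # grows roughly linearly in time
print("LQF:   ", simulate("lqf", rates))     # stays small
```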
Input queued switch
• Multiple queues and servers
Input queued switch
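• Crossbar constraints
– each input can connect to at most one output
– each output can connect to at most one input
(Figure: a 3×3 crossbar fabric with inputs 1, 2, 3 and outputs 1, 2, 3; a configuration that violates these constraints is marked "Not allowed")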
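The two crossbar constraints can be checked mechanically. This is a minimal sketch that represents a configuration as a list of (input, output) pairs. Ports are numbered from 0 here rather than from 1 as on the slide.

```python
def is_valid_schedule(pairs, n):
    """pairs: iterable of (input, output) connections in an n x n crossbar.
    Valid iff every index is in range and no input or output is used twice."""
    pairs = list(pairs)
    inputs = [i for i, _ in pairs]
    outputs = [j for _, j in pairs]
    in_range = all(0 <= i < n and 0 <= j < n for i, j in pairs)
    return (in_range
            and len(set(inputs)) == len(inputs)      # each input -> at most one output
            and len(set(outputs)) == len(outputs))   # each output <- at most one input

print(is_valid_schedule([(0, 1), (1, 0), (2, 2)], 3))  # True
print(is_valid_schedule([(0, 1), (2, 1)], 3))          # False: output 1 used twice
```

A configuration that passes this check is exactly a (partial) matching between inputs and outputs.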
Switch scheduling
(Figure, built up over three slides: example connection patterns of the 3×3 crossbar fabric; at every time slot the scheduler must pick such a pattern subject to the crossbar constraints)
Notation and definitions
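(Figure: a 3×3 switch)
• $Q_{ij}(t)$: number of packets queued at input $i$ for output $j$ at time $t$
• $\lambda_{ij}$: arrival rate of packets at input $i$ destined for output $j$
• Arrivals are admissible if $\sum_{j} \lambda_{ij} < 1$ for every input $i$ and $\sum_{i} \lambda_{ij} < 1$ for every output $j$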
Useful performance metrics
• Throughput
– an algorithm is stable (or delivers 100% throughput) if, for any admissible arrival process, the average backlog is bounded, i.e. $\limsup_{t \to \infty} \mathbb{E}\big[\sum_{i,j} Q_{ij}(t)\big] < \infty$
• Average delay or average backlog (queue-size)
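Stability is only required for admissible arrivals. This sketch assumes the usual definition: λ_ij is the arrival rate from input i to output j, and the arrivals are admissible when every input and every output is loaded below 1.

```python
def is_admissible(rates):
    """rates[i][j]: arrival rate (packets per slot) from input i to output j.
    Assumed definition: each input and each output is loaded below 1."""
    n = len(rates)
    inputs_ok = all(sum(row) < 1 for row in rates)
    outputs_ok = all(sum(rates[i][j] for i in range(n)) < 1 for j in range(n))
    return inputs_ok and outputs_ok

uniform = [[0.3] * 3 for _ in range(3)]   # every port loaded at 0.9
hot_output = [[0.5, 0.0], [0.6, 0.2]]     # output 0 loaded at 1.1
print(is_admissible(uniform), is_admissible(hot_output))  # True False
```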
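Schedule ≡ Bipartite graph matching
(Figure: the queue sizes form a weighted bipartite graph between inputs and outputs, with edge weights such as 19, 18, 7 and 1; a schedule corresponds to a matching in this graph)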
Scheduling algorithms
• Practical: maximal matchings (not stable)
– iSLIP: Cisco’s GSR 12000 Series (Nick McKeown)
– Wave Front Arbiter (Tamir and Chi)
– Parallel Iterative Matching (Anderson, Owicki, Saxe, Thacker)
(Figure: a maximal matching on the example weighted bipartite graph)
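Scheduling algorithms
• Practical maximal matchings: not stable
• Max Wt Matching: stable (Tassiulas-Ephremides 92, McKeown et al. 96, Dai-Prabhakar 00)
• Max Size Matching: not stable (McKeown-Anantharam-Walrand 96)
(Figure: on the example graph, the max weight matching uses the edges of weight 19 and 18, while the max size matching uses the edges of weight 19, 1 and 7)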
The Maximum Weight Matching Algorithm
• MWM: performance
– throughput: stable (Tassiulas-Ephremides 92; McKeown et al. 96; Dai-Prabhakar 00)
– backlogs: very low on average (Leonardi et al. 01; Shah-Kopikare 02)
• MWM: implementation
– has cubic worst-case complexity (approx. 27,000 iterations for a 30-port switch)
– MWM algorithms involve backtracking, i.e. edges laid down in one iteration may be removed in a subsequent iteration ⇒ the algorithm is not amenable to pipelining
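For a concrete picture of what MWM computes, here is a brute-force version that tries every perfect matching. Practical MWM implementations use Hungarian-type algorithms, which run in O(N^3) time, matching the cubic complexity above. The O(N!) enumeration below is only meant for tiny illustrative switches. The matrix Q is a hypothetical example, not taken from the slides.

```python
from itertools import permutations

def max_weight_matching(Q):
    """Return (weight, perm) maximizing sum(Q[i][perm[i]]) over all
    permutations, where perm[i] is the output matched to input i.
    Brute force, O(N!): only for tiny illustrative switches."""
    n = len(Q)
    return max((sum(Q[i][p[i]] for i in range(n)), p)
               for p in permutations(range(n)))

Q = [[3, 1, 0],
     [2, 0, 5],
     [0, 4, 1]]
print(max_weight_matching(Q))  # (12, (0, 2, 1))
```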
Switch algorithms
(Figure: algorithms placed along two axes, "better performance" and "easier to implement")
• Maximal matching: not stable
• Max Wt Matching: stable and low backlogs
• Max Size Matching: not stable
Summarizing…
• Need algorithms that are simple and perform well
– can process packets economically at line speeds
– deliver a high throughput
– and low latencies/backlogs
• Need to develop methods for analyzing the performance of these algorithms
– traditional methods aren’t good enough
Rest of the talk
• A simple, high performance scheduling algorithm
– Randomized approximation of Max Wt Matching (joint work with P. Giaccone and B. Prabhakar)
– A method for analyzing backlogs based on heavy traffic theory (joint work with D. Wischik)
• A new approach to switch scheduling
– Randomized edge coloring (joint work with G. Aggarwal, R. Motwani and A. Zhu)
Algorithm I
Randomized approximation to the Max Wt Matching algorithm
Randomized approximation to MWM
• Consider the following randomized approximation. At every time slot:
– sample d matchings independently and uniformly
– use the heaviest of these d matchings to schedule packets
• Ideally we would like to use a small value of d. However,…
Theorem. Even with d = N, this algorithm is not stable.
In fact, when d = N, the throughput is $\le 1 - e^{-1} \approx 0.63$. (Giaccone-Prabhakar-Shah 02)
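The memoryless scheme above can be sketched in a few lines. A matching is represented as a permutation `perm`, where `perm[i]` is the output for input i. The weight matrix is a hypothetical example. With N = 3 there are only 3! = 6 matchings, so even small d usually finds the best one. For realistic N, d = N samples out of N! matchings rarely do, which is the intuition behind the negative result.

```python
import random

def weight(Q, perm):
    """Total queue size served by the matching perm."""
    return sum(Q[i][perm[i]] for i in range(len(Q)))

def best_of_d_random(Q, d, rng):
    """Sample d matchings (permutations) uniformly and independently,
    and return the heaviest one."""
    n = len(Q)
    best = None
    for _ in range(d):
        p = list(range(n))
        rng.shuffle(p)              # uniform random permutation
        if best is None or weight(Q, p) > weight(Q, best):
            best = p
    return best

Q = [[3, 1, 0], [2, 0, 5], [0, 4, 1]]
rng = random.Random(0)
choice = best_of_d_random(Q, d=3, rng=rng)
print(choice, weight(Q, choice))
```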
Tassiulas’ algorithm
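(Figure: the previous schedule S(t-1) and a random matching R(t) are compared; the heavier of the two (MAX) becomes the current schedule S(t), which is remembered as the previous schedule at the next time slot)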
Tassiulas’ algorithm: example
(Figure: W(S(t-1)) = 160 and W(R(t)) = 150, so MAX keeps the previous schedule: S(t) = S(t-1))
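One step of Tassiulas' scheme, as described above, might look like this. For illustration the queue sizes Q are frozen, whereas in a real switch they change every slot. With frozen Q, the remembered schedule can only get heavier, so it eventually locks onto the max weight matching. That is the role memory plays here. The matrix and the tie rule (keep S(t-1) on a tie) are assumptions.

```python
import random

def weight(Q, perm):
    """Total queue size served by matching perm (perm[i] = output of input i)."""
    return sum(Q[i][perm[i]] for i in range(len(Q)))

def tassiulas_step(Q, prev, rng):
    """Draw a uniform random matching R(t) and return the heavier of
    S(t-1) = prev and R(t) under the current queue sizes Q."""
    r = list(range(len(Q)))
    rng.shuffle(r)
    return list(prev) if weight(Q, prev) >= weight(Q, r) else r

Q = [[3, 1, 0], [2, 0, 5], [0, 4, 1]]   # hypothetical, held fixed
rng = random.Random(0)
s = [0, 1, 2]                            # S(0): an arbitrary starting matching
for t in range(200):
    s = tassiulas_step(Q, s, rng)
print(s, weight(Q, s))
```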
Performance of Tassiulas’ algorithm
Theorem (Tassiulas 98): The above scheme is stable under any admissible Bernoulli IID inputs.
Backlogs under Tassiulas’ algorithm
(Figure: mean input-queue length, on a log scale from 0.01 to 10000, versus normalized load from 0 to 1, for Tassiulas and MWM)
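Reducing backlogs: the Merge operation
(Figure: S(t-1), with W(S(t-1)) = 160, and R(t), with W(R(t)) = 150, are superimposed; their union splits into cycles, and on each cycle the weight of the S(t-1) edges is compared with that of the R(t) edges: 30 vs. 120 on one cycle, 130 vs. 30 on the other)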
Reducing backlogs: the Merge operation (continued)
(Figure: Merge keeps the heavier side of each cycle, 120 from R(t) and 130 from S(t-1), giving W(S(t)) = 250)
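The Merge operation can be sketched as follows. It assumes both schedules are perfect matchings stored as permutations. Following an S-edge from an input to an output and then an R-edge back to an input traces the disjoint cycles of their union. On each cycle, both matchings cover the same inputs and outputs, so keeping either side yields a valid matching. The 4-port weight matrix is a hypothetical example chosen to reproduce the slide's totals (160, 150, cycles of 30 vs. 120 and 130 vs. 30, result 250). It is not the slide's actual graph.

```python
def weight(Q, perm):
    return sum(Q[i][perm[i]] for i in range(len(Q)))

def merge(Q, s, r):
    """Merge perfect matchings s and r (perm[i] = output of input i):
    on each cycle of their union keep the heavier side."""
    n = len(Q)
    r_inv = [0] * n
    for i, j in enumerate(r):
        r_inv[j] = i                    # input that r connects to output j
    out, seen = [None] * n, [False] * n
    for start in range(n):
        if seen[start]:
            continue
        cycle, i = [], start
        while not seen[i]:              # input i -s-> output s[i] -r-> input r_inv[s[i]]
            seen[i] = True
            cycle.append(i)
            i = r_inv[s[i]]
        w_s = sum(Q[k][s[k]] for k in cycle)
        w_r = sum(Q[k][r[k]] for k in cycle)
        src = s if w_s >= w_r else r
        for k in cycle:
            out[k] = src[k]
    return out

Q = [[10, 60, 0, 0],
     [60, 20, 0, 0],
     [0, 0, 70, 10],
     [0, 0, 20, 60]]
s = [0, 1, 2, 3]   # W(S(t-1)) = 160
r = [1, 0, 3, 2]   # W(R(t))   = 150
m = merge(Q, s, r)
print(m, weight(Q, m))  # [1, 0, 2, 3] 250
```

Unlike taking the heavier of S(t-1) and R(t) as a whole, Merge can combine the good parts of both.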
Performance of Merge algorithm
Theorem (GPS): The Merge scheme is stable under any admissible Bernoulli IID inputs.
Merge vs. Max
(Figure: mean input-queue length, on a log scale, versus normalized load for Tassiulas, Merge and MWM)
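Use arrival information: Serena
(Figure: the previous schedule S(t-1), with W(S(t-1)) = 209, next to the arrival graph, whose edges are the input-output pairs that received a packet in the current time slot)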
Use arrival information: Serena (continued)
(Figure: the arrival graph is turned into a matching, of weight W = 121 in the example, which is merged with S(t-1) to give the new schedule S(t) with W(S(t)) = 243)
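A sketch of one Serena step follows, under stated assumptions. First, each input receives at most one packet per slot (Bernoulli arrivals). Second, when several arrivals target the same output, only the heaviest edge is kept. Third, the resulting partial matching is completed by pairing leftover inputs and outputs in index order. The slides show these steps only as a figure, so the pruning and completion rules here are assumptions. The final step is the Merge operation with S(t-1).

```python
def weight(Q, perm):
    return sum(Q[i][perm[i]] for i in range(len(Q)))

def merge(Q, s, r):
    """On each cycle of the union of matchings s and r, keep the heavier side."""
    n = len(Q)
    r_inv = [0] * n
    for i, j in enumerate(r):
        r_inv[j] = i
    out, seen = [None] * n, [False] * n
    for start in range(n):
        if seen[start]:
            continue
        cycle, i = [], start
        while not seen[i]:
            seen[i] = True
            cycle.append(i)
            i = r_inv[s[i]]
        heavier_s = sum(Q[k][s[k]] for k in cycle) >= sum(Q[k][r[k]] for k in cycle)
        for k in cycle:
            out[k] = s[k] if heavier_s else r[k]
    return out

def arrival_matching(Q, arrivals):
    """arrivals[i]: output that input i received a packet for, or None.
    Keep the heaviest arrival edge at each output, then complete the matching
    by pairing leftover inputs and outputs in index order (assumption)."""
    n = len(Q)
    best = {}                                   # output -> input of heaviest edge
    for i, j in enumerate(arrivals):
        if j is not None and (j not in best or Q[i][j] > Q[best[j]][j]):
            best[j] = i
    perm = [None] * n
    for j, i in best.items():
        perm[i] = j
    free_in = [i for i in range(n) if perm[i] is None]
    free_out = [j for j in range(n) if j not in best]
    for i, j in zip(free_in, free_out):
        perm[i] = j
    return perm

def serena_step(Q, prev, arrivals):
    return merge(Q, prev, arrival_matching(Q, arrivals))

Q = [[3, 1, 0], [2, 0, 5], [0, 4, 1]]          # hypothetical queue sizes
new = serena_step(Q, [0, 1, 2], arrivals=[1, 2, 1])
print(new, weight(Q, new))
```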
Performance of Serena algorithm
Theorem (GPS): The Serena algorithm is stable under any admissible Bernoulli IID inputs.
Backlogs under Serena
(Figure: mean input-queue length, on a log scale, versus normalized load for Tassiulas, Merge, Serena and MWM)
Summary of main ideas and results
• Randomization alone is not enough
• Memory (using past sample) is very powerful
• Exploiting problem structure: the Merge operation
• Using arrival information
We obtained an algorithm
– whose performance is close to that of MWM
– and which is very simple to implement
A new method for analyzing backlogs via Heavy Traffic Theory
Some context…
• We have seen the design of a simple, high throughput switch scheduling algorithm
• The backlog/delay induced by an algorithm is another very important measure of its performance
(Figure: mean input-queue length, on a log scale, versus normalized load for Tassiulas, Merge, Serena and MWM)
• For example, recall Tassiulas’ scheme
Delay or backlog analysis
• Delay analysis is harder than throughput analysis because
– throughput measures the long-term efficiency of an algorithm
– whereas the delay (or backlog) induced by an algorithm is a measure of its instantaneous efficiency
It is also harder to design low-delay schedulers
Current delay analysis methods are not suitable
• Queuing theory
– requires knowledge of the service time distribution
– but this depends on the scheduling algorithm ⇒ the service distribution is too complicated to determine
• Large deviations: very successful for output-queued switches
– an N-port OQ switch can be “decomposed” into N independent 1-port switches
– but an N-port input-queued switch is not decomposable
• Stochastic coupling arguments
– effective for establishing Delay(Alg A) < Delay(Alg B) (without saying how much better)
– very hard to make for input-queued switches
Our approach
• Use the machinery of heavy traffic theory (developed by J. M. Harrison, M. Bramson, R. Williams and many others)
• Focus on the following intriguing observation
– consider the MWM-α algorithm: edge (i,j) has weight $Q_{ij}^{\alpha}$
– Keslassy-McKeown 01 have observed that the average delay increases with α
(Figure: average delay, on an axis from 30 to 46, versus α from 0 to 4, for four cases labeled A, B, C and D)
Why this is worth focusing on
• The MWM class of algorithms is the most analyzed
– huge body of literature on throughput analysis
– hardly anything known about their delay properties
– in particular, no explanation of the Keslassy-McKeown observation
• Note the singularity at α = 0 in the K-M observation
– “continuity” implies that delay should be the smallest when α = 0
– however, MWM-0 is the Maximum Size Matching algorithm
– MSM is known to be unstable, i.e. delays are infinite when α = 0! (McKeown-Anantharam-Walrand 96)
– Question: if MSM is not the optimum, which algorithm is?
Heavy traffic theory helps answer these questions
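The singularity at α = 0 is easy to see on a tiny example. This brute-force sketch gives edge (i,j) weight $Q_{ij}^{\alpha}$ and assumes empty queues get weight 0. Under that assumption, at α = 0 every non-empty queue weighs 1 and queue sizes stop mattering, so MWM-0 simply maximizes the number of matched non-empty queues. The 2×2 queue matrix is hypothetical.

```python
from itertools import permutations

def mwm_alpha(Q, alpha):
    """Brute-force MWM-alpha: edge (i, j) has weight Q[i][j] ** alpha.
    Empty queues get weight 0, so alpha = 0 counts non-empty edges."""
    n = len(Q)

    def w(i, j):
        return Q[i][j] ** alpha if Q[i][j] > 0 else 0

    return max(permutations(range(n)),
               key=lambda p: sum(w(i, p[i]) for i in range(n)))

Q = [[9, 1],
     [1, 0]]
for alpha in (2, 1, 0.5, 0):
    print(alpha, mwm_alpha(Q, alpha))
# For alpha > 0 here, (0, 1) serves the queue of size 9.
# At alpha = 0, (1, 0) wins because it serves two non-empty queues.
```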
Heavy traffic theory
• Consider a network with N queues
– as some parameter r → ∞
– let the load on the system approach its capacity (hence the name HT)
– speed up time by a factor $r^2$ and scale down space by a factor r, i.e. the quantity of interest is $Q_i(r^2 t)/r = Q_i^r(t)$, say
as r → ∞, we’re looking at the zoomed-out system, both in time and in space
this type of scaling occurs in the Central Limit Theorem
(Figure: the original system and the scaled system: time is sped up by a factor $r^2$ and space is scaled by $1/r$)
Heavy traffic theory
• Main result: as r → ∞,
$(Q_1^r(t), \dots, Q_N^r(t)) \xrightarrow{w} (Q_1^\infty(t), \dots, Q_N^\infty(t)) = Q^\infty(t)$,
which is a multidimensional Reflected Brownian Motion (RBM) (such limiting results are independent of the details of the distribution)
State space collapse: dimension reduction
• $Q^\infty(t)$ can roam in a lower-dimensional space, say of dimension K < N
State space collapse: dimension reduction
(Figure: N queues with arrival rates $\lambda_1^r, \dots, \lambda_N^r$ and queue sizes $Q_1^r(t), \dots, Q_N^r(t)$, sharing a server of capacity 1)
• The HT limit procedure for the LQF system
– load: $\rho^r = \lambda_1^r + \dots + \lambda_N^r = 1 - r^{-1} \to 1$
– state: $Q^r(t) = (Q_1^r(t), \dots, Q_N^r(t)) \to Q^\infty(t) = (Q_1^\infty(t), \dots, Q_N^\infty(t))$
• State space collapse
– because of the LQF policy, $Q_1^\infty(t) = \dots = Q_N^\infty(t)$
– or, the dimension of the state space is just 1, not N
State space collapse: performance
• More importantly, the state space induced by an IQ switch scheduling algorithm gives an indication of its performance, i.e.
State space(Alg A) ⊂ State space(Alg B) ⇒ Alg B is better than Alg A
• So, to resolve the Keslassy-McKeown observation, we need to
1. determine the state space of MWM-α, and
2. show State space(MWM-$\alpha_1$) ⊂ State space(MWM-$\alpha_2$) if $\alpha_1 > \alpha_2$
Switch: state space collapse
(Figure: a 3-port example in which some queue sizes, such as 19, 17, 3, 6 and 4, are shown together with the MWM weight 28, and the remaining queue sizes are marked "?")
• The dimension of the state space is at most $N^2$
– what is the dimension after state space collapse under MWM?
• Given that MWM tries to equalize the weights of all matchings,
– can we determine the minimum number of queue sizes we need to know in order to work out the sizes of all $N^2$ queues?
• Lemma (Shah-Wischik): Knowing the sizes of any 2N-2 queues is not sufficient. But there exist 2N-1 queues whose sizes suffice.
• Consider an example of a 3-port switch
(Figure, continued over six slides: the unknown queue sizes of the 3-port example are filled in one at a time from the known ones, using the fact that all matchings have weight 28)
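The counting argument can be sketched under one assumption: MWM with α = 1, where in the collapsed state all perfect matchings have exactly equal weight. Matrices with that property are exactly those of the form Q_ij = a_i + b_j, which has 2N-1 degrees of freedom. So the first row and first column, which share one entry and give 2N-1 numbers in total, determine every queue. This is one convenient choice of 2N-1 queues, not the talk's own choice. The numbers reuse those on the slides, but their placement in the matrix is a reconstruction consistent with MWM = 28, not a copy of the slide.

```python
from itertools import permutations

def complete_from_first_row_and_column(row, col):
    """Rebuild an N x N matrix of the form Q[i][j] = a[i] + b[j]
    (all perfect matchings of equal weight) from its first row and
    first column: 2N - 1 numbers, since row[0] == col[0]."""
    assert row[0] == col[0]
    return [[col[i] + row[j] - row[0] for j in range(len(row))]
            for i in range(len(col))]

Q = complete_from_first_row_and_column([19, 17, 18], [19, 7, 5])
for r in Q:
    print(r)
# [19, 17, 18]
# [7, 5, 6]
# [5, 3, 4]
weights = {sum(Q[i][p[i]] for i in range(3)) for p in permutations(range(3))}
print(weights)  # {28}: every matching has the same weight
```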
N-port switch
• For larger N: obtain entries inductively
(Figure, built up over four slides: a larger example whose entries are filled in step by step, starting from the 3-port example)
• We use the following 2N-1 entries: $W = (R_1, \dots, R_{N-1};\ C_1, \dots, C_{N-1};\ W_{NN})$
MWM-α: state space
Theorem (Shah-Wischik)
Under MWM-α, given $W = (R_1, \dots, R_{N-1}, C_1, \dots, C_{N-1}, W_{NN})$, the corresponding switch state $[Q_{ij}]$ is the unique solution of
minimize $\sum_{i,j} Q_{ij}^{\alpha+1}$
subject to:
$\sum_k Q_{ik} \ge R_i$ for all $i$ (row-sum)
$\sum_k Q_{kj} \ge C_j$ for all $j$ (column-sum)
$\sum_{k,l} Q_{kl} \ge W_{NN}$ (overall-sum)
$Q_{ij} = 0$ if $\lambda_{ij} = 0$
MWM-0.5: state space
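• Consider a 2-port switch: the dimension of the collapsed space is 3
(Figure: the collapsed state space drawn in the coordinates R1, C1 and W, together with a cross-section; the region Q11 = 0 is marked)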
MWM-1: state space
(Figure: the collapsed state space drawn in the coordinates R1, C1 and W, together with a cross-section; the region Q11 = 0 is marked)
MWM-2: state space
(Figure: the collapsed state space drawn in the coordinates R1, C1 and W, together with a cross-section; the region Q11 = 0 is marked)
Comparison
(Figure: the state spaces of MWM-0.5, MWM-1 and MWM-2 side by side)
• As α increases, the size of the state space decreases
• Therefore, the delay increases
⇒ this theoretically explains the Keslassy-McKeown observation
What happens at α = 0?
• As α → 0, MWM-α does not become the MSM
• Rather, it becomes the Maximum-weighted Maximum Size Matching algorithm
– find all possible Max Size Matchings
– if more than one MSM exists, pick the one with maximum weight
• Intuitively,
– Maximum Size Matching is for good delay
– Max Weight is for good throughput
• Mekkittikul and McKeown 98 showed how to obtain the M-W MSM using a Max Weight algorithm, and called their algorithm Longest Port First (LPF)
Conjecture (Shah-Wischik): The LPF algorithm has the largest possible state space. Hence it has the optimal delay performance.
Summary of main ideas and results
• There is a large body of literature on throughput analysis, but hardly anything about the delay of switch algorithms
• We developed a new method to analyze delay
– based on Heavy Traffic Theory
– it is simple and general
• We applied the method
– to shed light on an intriguing observation of Keslassy-McKeown
– and to characterize a delay-optimal algorithm
Algorithm II
Randomized edge coloring
Edge coloring
• Let G = (V, E) be a graph, and let Δ be the maximum vertex degree.
• Edge coloring: assign colors to the edges in E so that no two edges sharing a vertex have the same color.
Edge coloring and switch scheduling
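(Figure: a weighted bipartite graph for a 3×3 switch, with edge weights 1 and 2, and the corresponding multigraph, in which an edge of weight w becomes w parallel edges)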
Contd.
(Figure: a multigraph with Δ = 4 is edge-colored with 4 colors; each color class is a matching and gives the schedule for one time slot, T = 1, 2, 3, 4)
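Turning an edge coloring into schedules is a simple grouping step. This sketch assumes edges are given as (input, output, color) triples. Each color class of a proper coloring is a matching, so it is one valid crossbar configuration, and the number of colors is the number of time slots needed. The example edges are hypothetical.

```python
from collections import defaultdict

def schedules_from_coloring(colored_edges):
    """colored_edges: list of (input, output, color) for a bipartite multigraph.
    Groups edges by color; in a proper coloring each color class is a
    matching, i.e. a valid crossbar schedule for one time slot."""
    by_color = defaultdict(list)
    for i, j, c in colored_edges:
        by_color[c].append((i, j))
    slots = []
    for c in sorted(by_color):
        edges = by_color[c]
        ins = [i for i, _ in edges]
        outs = [j for _, j in edges]
        if len(set(ins)) < len(ins) or len(set(outs)) < len(outs):
            raise ValueError(f"color {c} is not a matching")
        slots.append(edges)
    return slots

# Input 0 has two packets for output 0 (a double edge) and one for output 1.
colored = [(0, 0, 1), (1, 1, 1), (0, 0, 2), (0, 1, 3), (1, 0, 3)]
for t, slot in enumerate(schedules_from_coloring(colored), start=1):
    print(f"T={t}: {slot}")
# T=1: [(0, 0), (1, 1)]
# T=2: [(0, 0)]
# T=3: [(0, 1), (1, 0)]
```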
Edge-coloring algorithms
• A bipartite multigraph can be colored using exactly Δ colors
– but the algorithm involves backtracking ⇒ difficult to implement
• Next, consider a simple Greedy algorithm:
– pick edges one at a time and in any order
– assign the smallest available color
• Greedy algorithm:
– simple and distributed
– pipelineable ⇒ easy to implement
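The Greedy rule fits in a few lines. This sketch takes a bipartite multigraph as an ordered list of (input, output) pairs and gives each edge the smallest color not yet used at either endpoint. The small example is hypothetical. Its maximum degree is 2, but the chosen edge order forces Greedy to use 3 = 2Δ-1 colors.

```python
from collections import defaultdict

def greedy_edge_coloring(edges):
    """edges: list of (input, output) pairs of a bipartite multigraph
    (parallel edges allowed). Each edge, in the given order, gets the
    smallest color 1, 2, ... not yet used at either of its endpoints."""
    used_at_input = defaultdict(set)
    used_at_output = defaultdict(set)
    colors = []
    for i, j in edges:
        c = 1
        while c in used_at_input[i] or c in used_at_output[j]:
            c += 1
        used_at_input[i].add(c)
        used_at_output[j].add(c)
        colors.append(c)
    return colors

# Max degree 2, yet this order needs a third color for the last edge.
edges = [(1, 2), (1, 1), (0, 0), (0, 1)]
print(greedy_edge_coloring(edges))  # [1, 2, 1, 3]
```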
Greedy algorithm
(Figure: a multigraph with Δ = 4, an optimal edge coloring with 4 colors, and a Greedy edge coloring of the same graph)
• Greedy uses 5 colors, so 5 time slots are required even though the graph is 4-colorable ⇒ 80% efficiency or throughput!
• In the worst case, this algorithm can take 2Δ-1 colors ⇒ the switch gives only about 50% throughput!
A stability theorem
Theorem. A switch scheduling algorithm gives 100% throughput iff the corresponding edge coloring algorithm uses Δ + o(Δ) colors.
(Aggarwal-Motwani-Shah-Zhu 03)
• We want to obtain an edge coloring algorithm which
– uses Δ + o(Δ) colors
– is simple to implement (like Greedy)
• Previous work on such algorithms (Dubhashi, Grable, Panconesi 96)
– uses (1+c)Δ colors, for c > 0 ⇒ throughput is 1/(1+c)
– is a lot more complicated than Greedy
Our algorithm: Randomized edge-coloring
• Let G be a bipartite multigraph on N input and N output vertices with maximum degree Δ
• Rand-Color: let 1, …, Δ be the initial pool of colors.
– Step 1 (randomized version of Greedy with Δ colors):
a) sample an uncolored edge uniformly at random
b) assign it a color chosen uniformly at random from the available colors
c) when no more edges can be colored, go to Step 2
– Step 2: color the remaining edges using the Greedy algorithm
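Rand-Color as listed above might be sketched as follows. Step 1 visits the edges in a uniformly random order, which is equivalent to repeatedly sampling an uncolored edge. An edge that finds no free color among 1..Δ when visited can never become colorable later, because colors are only ever added at endpoints, so a single pass suffices. This sketch does not check the theorem's two conditions, so the Δ + o(Δ) guarantee is not implied for arbitrary inputs. It only guarantees a proper coloring with at most 2Δ-1 colors.

```python
import random
from collections import defaultdict

def rand_color(edges, rng):
    """Sketch of Rand-Color on a bipartite multigraph given as (input, output)
    pairs. Returns (colors, delta), with colors[e] the color of edges[e]."""
    degree = defaultdict(int)
    for i, j in edges:
        degree["in", i] += 1
        degree["out", j] += 1
    delta = max(degree.values())
    used = defaultdict(set)              # endpoint -> colors already used there
    colors = [None] * len(edges)

    def assign(e, c):
        i, j = edges[e]
        colors[e] = c
        used["in", i].add(c)
        used["out", j].add(c)

    # Step 1: random edge order; random free color among 1..delta, else skip.
    order = list(range(len(edges)))
    rng.shuffle(order)
    for e in order:
        i, j = edges[e]
        free = [c for c in range(1, delta + 1)
                if c not in used["in", i] and c not in used["out", j]]
        if free:
            assign(e, rng.choice(free))

    # Step 2: Greedy (smallest free color) on the edges left uncolored.
    for e, (i, j) in enumerate(edges):
        if colors[e] is None:
            c = 1
            while c in used["in", i] or c in used["out", j]:
                c += 1
            assign(e, c)
    return colors, delta

rng = random.Random(42)
edges = [(rng.randrange(8), rng.randrange(8)) for _ in range(200)]
colors, delta = rand_color(edges, rng)
print("max degree:", delta, "colors used:", max(colors))
```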
Performance
Theorem (AMSZ). Under the following two conditions, the algorithm Rand-Color uses Δ + o(Δ) colors w.h.p. (hence it gives 100% throughput).
Condition I:
Condition II:
Summary of main ideas and results
• Edge coloring algorithms as a new approach to switch scheduling
– not based on MWM
– low complexity
– new methods to establish stability
• Could be an interesting alternative to MWM-type algorithms
– may be useful in general (non-switch) networks
Conclusions
• Network algorithms
– design involves a trade-off between simplicity and performance
– analysis requires new methods
• In the context of switch scheduling, I presented
– the design of a simple, high-performance switch algorithm
– a new analysis method based on heavy traffic theory
– finally, a new approach, based on edge coloring, for designing scheduling algorithms