Posted on 09-Jun-2015
Some Unsolved Problems in High-Speed Packet Switching
Shivendra S. Panwar
Joint work with: Yihan Li, Yanming Shen and H. Jonathan Chao
Polytechnic University, Brooklyn, NY
NY State Center for Advanced Technology in Telecommunications
http://catt.poly.edu/CATT/panwar.html
5
Buffering in a Packet Switch
Fixed-size packet switches operate in a time-slotted manner; the slot duration equals the cell transmission time
Contention occurs when multiple inputs have arrivals destined to the same output
Buffering is needed to avoid packet loss
Buffering schemes in a packet switch:
Output queueing (OQ)
Input queueing (IQ)
Virtual output queueing (VOQ) / combined input-output queueing (CIOQ)
6
Output Queuing (OQ)
100% throughput
Requires an internal speedup of N
Impractical for large N
[Figure: 4x4 output-queued switch, Inputs 1-4 and Outputs 1-4]
7
Input Queuing (IQ)
Easy to implement
Head-of-line (HOL) blocking limits throughput to 58.6% (uniform traffic, large N)
[Figure: head-of-line blocking in a 4x4 input-queued switch]
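The 58.6% figure is the large-N saturation throughput of a FIFO input-queued switch under uniform traffic (2 - √2 ≈ 0.586). A minimal Monte Carlo sketch, assuming every input is always backlogged and each output picks uniformly at random among the HOL cells contending for it; the function name `hol_throughput` is illustrative, not from the slides:

```python
import random


def hol_throughput(n_ports: int, n_slots: int, seed: int = 1) -> float:
    """Estimate the saturation throughput of an n_ports x n_ports FIFO
    input-queued switch with uniformly random cell destinations.

    Every input is assumed to be always backlogged, so a served HOL cell is
    immediately replaced by a new cell with a fresh uniform destination.
    """
    rng = random.Random(seed)
    # Destination output of the head-of-line (HOL) cell at each input.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        # Group the inputs by the output their HOL cell is waiting for.
        contenders: dict[int, list[int]] = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        # Each requested output serves one contender chosen at random; the
        # losers stay blocked and also block every cell queued behind them.
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            delivered += 1
            hol[winner] = rng.randrange(n_ports)
    return delivered / (n_ports * n_slots)


if __name__ == "__main__":
    for n in (2, 4, 8, 32):
        print(n, round(hol_throughput(n, 20_000), 3))
```

As N grows the estimate drifts down toward 0.586; for N = 2 it sits near 0.75, since two inputs collide on the same output half the time.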
8
Virtual Output Queuing (VOQ)
Overcomes HOL blocking
No speedup requirement
Needs scheduling algorithms to resolve contention
Issues: complexity, performance guarantee
[Figure: four inputs, each with VOQs 1-4]
9
Challenges in Switch Design
Stability: 100% throughput
Delay performance
Scalability
Scale to a large number of linecards and to high linecard speeds
A distributed scheduler is more desirable than a centralized scheduler
Scheduler complexity
Pin count
10
High-Speed Packet Switches
VOQ switches and scheduling algorithms
Buffered crossbar switch
Load-balanced switch
Multi-stage switch
11
VOQ Switch Architecture
[Figure: 4x4 VOQ switch with an ISM and VOQs 1..N at each input, the switch fabric, and an ORM at each output]
Input Segmentation Module (ISM): segments packets into fixed-length cells.
Output Reassembly Module (ORM): reassembles cells into packets.
12
Scheduling for VOQ Switch
Scheduling is needed to avoid output contention
A scheduling problem can be modeled as a matching problem in a bipartite graph
An input and an output are connected by an edge if the corresponding VOQ is not empty
Each edge may have a weight, which can be:
the length of the VOQ, or
the age of the HOL cell
13
Maximum Weight Matching (MWM)
MWM always finds a match with the maximum weight
Stable under any admissible traffic
Very high complexity, O(N³); impractical
[Figure: example bipartite request graph with edge weights]
Weight of the match: 25
References
N. McKeown, V. Anantharam, and J. Walrand, “Achieving 100% Throughput in an Input-Queued Switch,” IEEE Transactions on Communications, vol. 47, no. 8, pp. 1260-1267, Aug. 1999.
J. G. Dai and B. Prabhakar, “The throughput of data switches with and without speedup,” IEEE INFOCOM 2000.
L. Tassiulas and A. Ephremides, “Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks,” IEEE Transactions on Automatic Control, vol. 37, no. 12, pp. 1936-1949, Dec. 1992.
E. Leonardi, M. Mellia, F. Neri, and M. Ajmone Marsan, “On the stability of input-queued switches with speed-up,” IEEE/ACM Transactions on Networking, vol. 9, no. 1, pp. 104-118, Feb. 2001.
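To make "a match with the maximum weight" concrete, here is a brute-force sketch that scores every one of the N! permutation matches by the sum of matched VOQ lengths and keeps the best. It is only an illustration for tiny N (it costs O(N·N!)); the O(N³) figure above corresponds to Hungarian-style assignment algorithms, which SciPy exposes as `scipy.optimize.linear_sum_assignment(..., maximize=True)`. The queue matrix and function names are hypothetical:

```python
from itertools import permutations

Match = tuple[int, ...]  # match[i] = output assigned to input i


def match_weight(match: Match, q: list[list[int]]) -> int:
    """Weight <M, Q>: total length of the VOQs used by the match."""
    return sum(q[i][j] for i, j in enumerate(match))


def max_weight_matching(q: list[list[int]]) -> Match:
    """Exhaustive MWM: try all N! permutations (only viable for tiny N)."""
    n = len(q)
    return max(permutations(range(n)), key=lambda m: match_weight(m, q))


if __name__ == "__main__":
    # Hypothetical VOQ lengths q[i][j] for a 3x3 switch; 0 means no edge.
    q = [[5, 4, 0],
         [4, 0, 2],
         [0, 3, 1]]
    best = max_weight_matching(q)
    print(best, match_weight(best, q))  # (0, 2, 1) 10
```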
14
Maximum Weight Matching
The maximum weight matching algorithm is strongly stable under any admissible traffic pattern (proof via a Lyapunov function)
Admissible: $\sum_i \lambda_{ij} < 1$ for all $j$, and $\sum_j \lambda_{ij} < 1$ for all $i$
Strongly stable: $\limsup_{n\to\infty} E[\|Q_n\|] < \infty$
References
E. Leonardi, M. Mellia, F. Neri, and M. Ajmone Marsan, “On the stability of input-queued switches with speed-up,” IEEE/ACM Transactions on Networking, vol. 9, no. 1, pp. 104-118, Feb. 2001.
N. McKeown, V. Anantharam, and J. Walrand, “Achieving 100% Throughput in an Input-Queued Switch,” IEEE Transactions on Communications, vol. 47, no. 8, pp. 1260-1267, Aug. 1999.
15
Maximum Weight Matching
Fluid model: the maximum weight matching is rate stable if
the arrival processes satisfy a strong law of large numbers (SLLN) with probability one,
$\lim_{n\to\infty} \frac{A_{ij}(n)}{n} = \lambda_{ij}, \quad i, j = 1, 2, \ldots, N$, and
$\sum_i \lambda_{ij} \le 1, \quad \sum_j \lambda_{ij} \le 1$
References
J. G. Dai and B. Prabhakar, “The throughput of data switches with and without speedup,” IEEE INFOCOM 2000, pp. 556-564.
16
Approximate MWM: 1-APRX
A function f(·) is sub-linear if $\lim_{x\to\infty} f(x)/x = 0$
Let $W_B$ be the weight of the schedule obtained by a scheduling algorithm B, and let $W^*$ be the weight of the maximum weight match for the same switch state
If $W_B \ge W^* - f(W^*)$, B is a 1-APRX to MWM
B is stable if $W_B(t) \ge W^*(t) - f(W^*(t))$ for all $t$
This makes it possible to find stable matching algorithms with lower complexity than MWM
References
D. Shah and M. Kopikare, “Delay bounds for approximate maximum weight matching algorithms for input-queued switches,” IEEE INFOCOM, New York, USA, June 2002.
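The 1-APRX condition can be checked per slot with a one-line predicate. The sketch below assumes f(x) = √x purely as an example of a sub-linear function (the slide does not fix f), and shows why "always within 10% of the MWM weight" is not enough: 0.1·W* grows linearly, so the condition eventually fails once the queues, and hence W*, are large:

```python
import math
from typing import Callable


def satisfies_1aprx(w_b: float, w_star: float,
                    f: Callable[[float], float] = math.sqrt) -> bool:
    """True if W_B >= W* - f(W*), the per-slot 1-APRX condition.

    f is assumed sub-linear (f(x)/x -> 0); math.sqrt is an illustrative
    choice, not one prescribed by the slide.
    """
    return w_b >= w_star - f(w_star)


# A schedule that always gets 90% of the MWM weight passes when W* is small...
print(satisfies_1aprx(90, 100))        # 100 - 10 = 90 <= 90 -> True
# ...but fails once W* grows, because 0.1 * W* is linear in W*.
print(satisfies_1aprx(9000, 10000))    # 10000 - 100 = 9900 > 9000 -> False
```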
17
Average Delay Bound
Delay bound for MWM (Lyapunov function):
$E[\|Q(t)\|] \le \frac{N^2\rho - N\rho^2}{1-\rho}$
References
E. Leonardi, M. Mellia, F. Neri, and M. Ajmone Marsan, “Bounds on average delays and queue size averages and variances in input-queued cell-based switches,” Proceedings of IEEE INFOCOM, 2001.
18
Average Delay Bound (contd.)
Delay bound for approximate MWM (Lyapunov function):
$\lim_{t\to\infty} E[\|Q(t)\|] \le \frac{N\tilde{\Lambda}}{\delta} + \frac{2NC_b}{\delta}$,
where $\delta = 1 - \max_i \sum_j \lambda_{ij}$, $\tilde{\Lambda} = \sum_{ij} (\lambda_{ij} - \lambda_{ij}^2)$, and $C_b$ is the weight difference to the MWM matching
Under uniform traffic, both give the same result:
$E[\|Q(t)\|] \le \frac{N^2\rho - N\rho^2}{1-\rho}$
References
D. Shah and M. Kopikare, “Delay bounds for approximate maximum weight matching algorithms for input-queued switches,” IEEE INFOCOM, New York, USA, June 2002.
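The approximate-MWM bound can be evaluated numerically for a given rate matrix. This sketch assumes the bound reads lim E[||Q(t)||] ≤ (N·Λ̃ + 2N·C_b)/δ with δ = 1 - max_i Σ_j λ_ij and Λ̃ = Σ_ij (λ_ij - λ_ij²), as read from the slide's garbled formula; treat that reading as an assumption. Setting C_b = 0 gives the MWM case, and for uniform traffic (λ_ij = ρ/N) it reduces to (N²ρ - Nρ²)/(1 - ρ). Dividing by the total arrival rate Nρ (Little's law) turns the backlog bound into an average delay bound:

```python
def bound_terms(rates: list[list[float]]) -> tuple[float, float]:
    """delta = 1 - max_i sum_j lambda_ij, Lambda~ = sum_ij (lambda_ij - lambda_ij^2)."""
    delta = 1.0 - max(sum(row) for row in rates)
    lam_tilde = sum(lam - lam * lam for row in rates for lam in row)
    return delta, lam_tilde


def queue_bound(rates: list[list[float]], c_b: float = 0.0) -> float:
    """Assumed bound N*Lambda~/delta + 2*N*C_b/delta on lim E[||Q(t)||].

    c_b = 0 corresponds to exact MWM.
    """
    n = len(rates)
    delta, lam_tilde = bound_terms(rates)
    if delta <= 0:
        raise ValueError("the bound needs every input load below 1")
    return (n * lam_tilde + 2 * n * c_b) / delta


if __name__ == "__main__":
    n, rho = 4, 0.8
    uniform = [[rho / n] * n for _ in range(n)]
    q_bound = queue_bound(uniform)
    # Little's law: average delay = average backlog / total arrival rate (n * rho).
    print(q_bound, q_bound / (n * rho))  # approximately 51.2 cells and 16 slots
```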
19
Open Issues
In simulations, MWM has the best delay performance (cell delay)
Average delay: if the weight of a queue is chosen as $Q^a$, the delay increases with $a$ for $a > 0$
Is MWM the optimal scheduling scheme for achieving the minimum average cell delay?
What is the optimal scheduling scheme to achieve the minimum average packet delay (including reassembly delay)?
20
Maximal Matching
Maximal matching: add connections incrementally, without removing connections made earlier
By the end of the operation, no more matches can be made trivially
The solution may not be unique
Complexity O(N log N)
[Figure: example bipartite request graph with edge weights]
Weight of the match: 23
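The gap between a maximal and a maximum weight match shows up even in a 2x2 switch. The sketch below builds a maximal matching greedily, heaviest nonempty VOQ first; this is one common way to obtain a maximal matching, assumed here for illustration, and because it sorts all N² edges it costs O(N² log N), not the O(N log N) quoted above. The queue values are hypothetical:

```python
def greedy_maximal_matching(q: list[list[int]]) -> dict[int, int]:
    """Heaviest-edge-first maximal matching over the nonempty VOQs.

    Edges are added one at a time and never removed; the loop ends when no
    remaining edge has both endpoints free, which is exactly maximality.
    """
    n = len(q)
    edges = sorted(((q[i][j], i, j) for i in range(n) for j in range(n)
                    if q[i][j] > 0), reverse=True)
    match: dict[int, int] = {}
    used_outputs: set[int] = set()
    for _, i, j in edges:
        if i not in match and j not in used_outputs:
            match[i] = j
            used_outputs.add(j)
    return match


def is_maximal(match: dict[int, int], q: list[list[int]]) -> bool:
    """No nonempty VOQ (i, j) has both input i and output j unmatched."""
    n = len(q)
    used_outputs = set(match.values())
    return not any(q[i][j] > 0 and i not in match and j not in used_outputs
                   for i in range(n) for j in range(n))


if __name__ == "__main__":
    q = [[5, 4],
         [4, 0]]
    m = greedy_maximal_matching(q)
    # {0: 0} True: weight 5, while the maximum weight match {0: 1, 1: 0} has weight 8
    print(m, is_maximal(m, q))
```

Once input 0 takes output 0, input 1's only nonempty VOQ is blocked, so the match cannot be extended: it is maximal but not maximum.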
21
Maximal Matching
A maximal matching achieves 100% throughput with speedup S ≥ 2 under any admissible traffic pattern [Leonardi, ToN 2001]
100% throughput (rate stability) holds if, with probability 1,
$\lim_{n\to\infty} \frac{Q_n}{n} = \lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n} (A_i - D_i) = 0$
A maximal matching algorithm is rate stable with speedup S ≥ 2 [Dai, Infocom 2000]
References
E. Leonardi, M. Mellia, F. Neri, and M. Ajmone Marsan, “On the stability of input-queued switches with speed-up,” IEEE/ACM Transactions on Networking, vol. 9, no. 1, pp. 104-118, Feb. 2001.
J. G. Dai and B. Prabhakar, “The throughput of data switches with and without speedup,” IEEE INFOCOM 2000, pp. 556-564.
22
Multiple Iterative Matching
Use multiple iterations to converge on a maximal matching
Parallel Iterative Matching (PIM), iSLIP and DRRM
The complexity of each iteration is O(log N)
O(log N) iterations are needed to converge on a maximal matching
iSLIP: 100% throughput only under uniform traffic
23
iSLIP
Step 1: Request
Each input sends a request to every output for which it has a queued cell.
Step 2: Grant
If an output receives multiple requests, it chooses the one that appears next in a fixed round-robin schedule.
The output arbiter pointer is incremented by one location beyond the granted input if, and only if, the grant is accepted in Step 3.
Step 3: Accept
If an input receives multiple grants, it accepts the one that appears next in a fixed round-robin schedule.
The input arbiter pointer is incremented by one location beyond the accepted output.
[Figure: request, grant and accept phases between inputs and outputs]
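The three iSLIP steps map directly onto code. The sketch below runs a single request-grant-accept iteration with the pointer rules described above; it assumes backlogged VOQs (lengths are not decremented) and omits later iterations, which in iSLIP only match still-unmatched ports and do not move the pointers. Function and variable names are hypothetical:

```python
def islip_iteration(voq: list[list[int]], grant_ptr: list[int],
                    accept_ptr: list[int]) -> dict[int, int]:
    """One request-grant-accept iteration of iSLIP; updates pointers in place.

    voq[i][j] is the length of VOQ (i, j); returns {input: output}.
    """
    n = len(voq)

    def next_rr(candidates: list[int], ptr: int) -> int:
        # First candidate at or after ptr in round-robin order.
        return min(candidates, key=lambda k: (k - ptr) % n)

    # Step 1: Request -- every input requests each output it has cells for.
    requests = {j: [i for i in range(n) if voq[i][j] > 0] for j in range(n)}
    # Step 2: Grant -- each requested output grants one input round-robin.
    grants: dict[int, list[int]] = {}
    for j, reqs in requests.items():
        if reqs:
            grants.setdefault(next_rr(reqs, grant_ptr[j]), []).append(j)
    # Step 3: Accept -- each granted input accepts one output round-robin.
    match: dict[int, int] = {}
    for i, outs in grants.items():
        j = next_rr(outs, accept_ptr[i])
        match[i] = j
        accept_ptr[i] = (j + 1) % n  # one beyond the accepted output
        grant_ptr[j] = (i + 1) % n   # moved only because the grant was accepted
    return match


if __name__ == "__main__":
    n = 3
    full = [[1] * n for _ in range(n)]  # every VOQ stays nonempty
    g, a = [0] * n, [0] * n
    for slot in range(3):
        print(slot, islip_iteration(full, g, a))
```

With all pointers starting at 0, every output grants input 0 in the first slot, so only one cell crosses. Because a grant pointer moves only when its grant is accepted, the pointers desynchronize and the match size grows from 1 to 2 to 3 over the first three slots.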
24
Achieving 100% Throughput without Speedup
Matching algorithms using memory
Polling system based matching
25
Low-Complexity Algorithms with 100% Throughput
Algorithms with memory
Use the previous schedule as a candidate
References
L. Tassiulas, “Linear complexity algorithms for maximum throughput in radio networks and input queued switches,” IEEE INFOCOM 1998, vol. 2, New York, 1998, pp. 533-539.
P. Giaccone, B. Prabhakar, and D. Shah, “Toward simple, high-performance schedulers for high-aggregate bandwidth switches,” IEEE INFOCOM 2002, New York, 2002.
Polling system based matching algorithms
Improve efficiency by using exhaustive service
References
Y. Li, S. Panwar, and H. J. Chao, “Exhaustive service matching algorithms for input queued switches,” 2004 Workshop on High Performance Switching and Routing (HPSR 2004), April 2004.
Y. Li, S. Panwar, and H. J. Chao, “Performance analysis of a dual round robin matching switch with exhaustive service,” IEEE GLOBECOM 2002.
26
Matching Algorithms with Memory
The queue length of each VOQ does not change much between successive time slots; in each time slot:
at most one cell arrives at each input
at most one cell departs from each input
If the queue length is used as the weight of a connection, a busy connection is likely to remain busy over a few time slots
Use the match from the previous time slot as a candidate for the new match
Important results:
Randomized algorithm with memory [Tassiulas 98]
Derandomized algorithm with memory [Giaccone 02]
With higher complexity: APSARA, LAURA, SERENA [Giaccone 02]
27
Notations
For an N×N switch, there are N! possible matches
$Q(t) = [q_{ij}]_{N\times N}$, where $q_{ij}$ is the queue length of $VOQ_{ij}$
M(t): a match at time t
The weight of M(t): $W(t) = \langle M(t), Q(t)\rangle$, the sum of the lengths of all matched VOQs
28
Randomized algorithm with memory
Let S(t) be the schedule used at time t
At time t+1, select a match R(t+1) uniformly at random from the set of all N! possible matches
Let $S(t+1) = \arg\max_{S \in \{S(t),\, R(t+1)\}} \langle S, Q(t+1)\rangle$
Stable under any Bernoulli i.i.d. admissible arrival traffic
Very simple to implement, complexity O(log N)
Delay performance is very poor
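The randomized algorithm with memory needs only a random permutation and one weight comparison per slot. A minimal sketch, assuming matches are stored as tuples (match[i] = output of input i) and that ties keep the previous schedule; the function names are hypothetical, and this Python version is O(N) per slot rather than the O(log N) hardware figure above:

```python
import random

Match = tuple[int, ...]  # match[i] = output served by input i


def weight(match: Match, q: list[list[int]]) -> int:
    """<S, Q>: sum of the queue lengths of the matched VOQs."""
    return sum(q[i][j] for i, j in enumerate(match))


def randomized_with_memory(prev: Match, q: list[list[int]],
                           rng: random.Random) -> Match:
    """S(t+1) = argmax over {S(t), R(t+1)} of <S, Q(t+1)>."""
    r = list(range(len(q)))
    rng.shuffle(r)  # R(t+1): uniformly random among all N! matches
    # max() returns the first maximal item, so ties keep the previous match.
    return max(prev, tuple(r), key=lambda m: weight(m, q))


if __name__ == "__main__":
    rng = random.Random(7)
    q = [[3, 0, 1], [0, 2, 5], [4, 1, 0]]  # hypothetical, held fixed for illustration
    s = (0, 1, 2)
    for _ in range(5):
        s = randomized_with_memory(s, q, rng)
        print(s, weight(s, q))
```

Holding Q fixed makes the "memory" visible: the weight of S(t) never decreases. In a real switch Q(t+1) changes every slot, which is why remembering S(t) is useful rather than optimal.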
29
Derandomized Algorithm with Memory
Hamiltonian walk: a walk that visits every vertex of a graph exactly once
In an N×N switch there are N! vertices (possible schedules); a Hamiltonian walk visits each vertex once every N! time slots
H(t): the vertex visited at time t; generating H(t+1) from H(t) has complexity O(1)
Derandomized algorithm with memory: use the match generated by the Hamiltonian walk instead of the random match
Performance similar to the randomized algorithm
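The slide does not specify which Hamiltonian walk is used, so the sketch below substitutes the Steinhaus-Johnson-Trotter (SJT) ordering, in which consecutive matches differ by one swap of adjacent positions and all N! matches appear once per cycle. This simple version scans for the next swap in O(N) time, not the O(1) quoted above. The derandomized step then compares S(t) with H(t+1) exactly as the randomized algorithm compares S(t) with R(t+1). All names are hypothetical:

```python
from collections.abc import Iterator

Match = tuple[int, ...]


def sjt_permutations(n: int) -> Iterator[Match]:
    """Steinhaus-Johnson-Trotter order: all n! permutations, each differing
    from the previous one by a swap of two adjacent positions."""
    perm = list(range(n))
    direction = [-1] * n  # indexed by value: -1 looks left, +1 looks right
    yield tuple(perm)
    while True:
        # Find the largest "mobile" value: it points at a smaller neighbour.
        pos = -1
        for p, v in enumerate(perm):
            nb = p + direction[v]
            if 0 <= nb < n and perm[nb] < v and (pos < 0 or v > perm[pos]):
                pos = p
        if pos < 0:
            return  # no mobile value left: all n! permutations produced
        v = perm[pos]
        nb = pos + direction[v]
        perm[pos], perm[nb] = perm[nb], perm[pos]
        for w in range(v + 1, n):  # reverse the direction of all larger values
            direction[w] = -direction[w]
        yield tuple(perm)


def hamiltonian_walk(n: int) -> Iterator[Match]:
    """H(1), H(2), ...: repeat the SJT order, visiting every match once per n! slots."""
    while True:
        yield from sjt_permutations(n)


def weight(match: Match, q: list[list[int]]) -> int:
    """<S, Q>: sum of the queue lengths of the matched VOQs."""
    return sum(q[i][j] for i, j in enumerate(match))


def derandomized_with_memory(prev: Match, q: list[list[int]],
                             walk: Iterator[Match]) -> Match:
    """S(t+1) = argmax over {S(t), H(t+1)} of <S, Q(t+1)>; ties keep S(t)."""
    return max(prev, next(walk), key=lambda m: weight(m, q))


if __name__ == "__main__":
    print(list(sjt_permutations(3)))
```

Since the walk is deterministic, every match is guaranteed to be compared against S(t) within N! slots, which plays the role that the random draw plays in the randomized algorithm.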
30
Compared to MWM …
Simple matching algorithms can achieve stability, as MWM does
It is not necessary to find “the best match” in each time slot to achieve 100% throughput
MWM has much better delay performance than randomized and derandomized matching: “better” matches lead to better delay performance
31
With Higher Complexity and Lower Delay
Introduce higher complexity for much lower delay than the randomized and derandomized algorithms
APSARA: includes the neighbors of the latest match as candidates
LAURA: merges the latest match with a random match to remember the heavy edges
SERENA: merges the latest match with the arrival graph
Arrival graph: generated from the current arrival pattern
Complexity O(N)
32
Polling System Based Matching
Exhaustive Service Matching
Inspired by exhaustive service polling systems
All the cells in the corresponding VOQ are served after an input and an output are matched
Slot times wasted to achieve an input-output match are amortized over all the cells waiting in the VOQ instead of only one
Cells within the same packet are transferred continuously
A Hamiltonian walk is used to guarantee stability
33
Exhaustive Service Matching with Hamiltonian Walk (EMHW)
Let S(t) be the match at time t. At time t+1, generate match Z(t+1) by the Exhaustive Service Matching algorithm based on S(t), and H(t+1) by the Hamiltonian walk
Let $S(t+1) = \arg\max_{S \in \{Z(t+1),\, H(t+1)\}} \langle S, Q(t+1)\rangle$,
where $\langle S, Q(t+1)\rangle$ is the weight of S at time t+1
Stable under any admissible traffic
Analyzed as an exhaustive service polling system
Implementation complexity
HE-iSLIP: O(log N)
34
E-iSLIP Average Delay Analysis
Exhaustive random polling system model
Symmetric system -- only one input is considered
N VOQs per input with an exhaustive service policy -- an exhaustive service polling system with N stations
The service order of the VOQs is not fixed -- a random polling system, assuming all stations (VOQs) have the same probability of being selected for service after a VOQ is served
Switch-over time S:
$E(S) = \frac{Q}{1-Q}, \quad E(S^2) = \frac{Q}{1-Q}\left[1 + \frac{2Q}{1-Q}\right]$,
where Q is a sum over m with binomial coefficients and terms $\omega_m$ that depend on $\rho$ [expression not recoverable from the transcript]
Average delay E(T) [Levy and Kleinrock]: a closed form in N, r, $\delta^2$, $\mu$ and $\sigma^2$ [expression not recoverable from the transcript], where
$r = E(S), \quad \delta^2 = \mathrm{Var}(S) = E(S^2) - E(S)^2, \quad \mu = \rho/N, \quad \sigma^2 = \rho/N$
35
Delay Performance of HE-iSLIP
Packet delay: the sum of cell delay and reassembly delay
Cell delay: measured from the VOQ to the destination output
Reassembly delay: time spent in an ORM, often ignored in other work
[Figure: VOQ switch with ISMs and VOQs at the inputs, the switch fabric, and ORMs at the outputs]
36
Performance Summary
Scheme | Complexity | Stable | Packet delay performance
iSLIP | O(log N) | No | Always higher than HE-iSLIP
HE-iSLIP | O(log N) | Yes | Lowest when packet size is larger than 1 cell
Derandomized | O(log N) | Yes | Highest for all traffic patterns
SERENA | O(N) | Yes | Lower than HE-iSLIP only under nonuniform diagonal traffic
MWM | O(N³) | Yes | Lowest when packet size is 1 cell
38
Packet Delay under Uniform Traffic
Pattern 2: packet length is 10 cells
Pattern 3: variable packet length with an average of 10 cells (Internet packet size distribution)
[Figure: packet delay of MWM, HE-iSLIP, SERENA and iSLIP under Patterns 2 and 3]
39
When packet length is larger than 1 cell
Why does HE-iSLIP have a lower packet delay than MWM? For example, when the packet length is 10 cells:
[Figure: cell delay and reassembly delay of HE-iSLIP and MWM]
Low cell delay and low reassembly delay are both needed for low packet delay
Open Problem: Which scheduler minimizes packet delay?
40
Packet-Based Scheduling
Packet-based scheduling algorithm: once it starts transmitting the first cell of a packet to an output port, it continues the transmission until the whole packet is completely received at the corresponding output port
Packet-based MWM is stable for any admissible Bernoulli i.i.d. traffic
Lyapunov function: M. Ajmone Marsan, A. Bianco, P. Giaccone, E. Leonardi, and F. Neri, “Packet Scheduling in Input-Queued Cell-Based Switches,” IEEE INFOCOM 2001, pp. 1085-1094.
Packet-based MWM is stable under regenerative admissible input traffic
Fluid model: Y. Ganjali, A. Keshavarzian, and D. Shah, “Input Queued Switches: Cell Switching vs. Packet Switching,” Proceedings of IEEE INFOCOM, 2003.
Regenerative: let T be the time between two successive occurrences of the event that all ports are free, with E(T) finite
A modified waiting PB-MWM algorithm is stable under any admissible traffic
41
Buffered Crossbar Switch
Distributed arbitration for inputs and outputs
From each input, one cell can be sent to a crosspoint buffer if it has space
One cell can be sent to an output if at least one crosspoint buffer to that output is nonempty
References
Y. Doi and N. Yamanaka, “A High-Speed ATM Switch with Input and Cross-Point Buffers,” IEICE Trans. Commun., vol. E76, no. 3, pp. 310-314, March 1993.
R. Rojas-Cessa, E. Oki, Z. Jing, and H. J. Chao, “CIXB-1: Combined Input-One-Cell-Crosspoint Buffered Switch,” Proceedings of IEEE Workshop on High Performance Switching and Routing 2001.
One buffer for each crosspoint
42
Birkhoff-von Neumann Switch
When the traffic matrix is known
Birkhoff-von Neumann decomposition
Reference
Cheng-Shang Chang, Wen-Jyh Chen and Hsiang-Yi Huang, "On service guarantees for input buffered crossbar switches: a capacity decomposition approach by Birkhoff and von Neumann," IEEE IWQoS'99, pp. 79-86, London, U.K., 1999.
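The Birkhoff-von Neumann decomposition writes a rate matrix whose row and column sums are all equal as a weighted sum of permutation matrices; a switch can then use permutation P_k in a fraction φ_k of the slots. The sketch below assumes such a matrix is given (filling a sub-stochastic rate matrix up to a doubly stochastic one is a separate step, omitted here) and finds each permutation by brute force, which is only practical for tiny N; bipartite matching is the usual way to find each permutation at scale. Exact `Fraction` arithmetic avoids floating-point leftovers; all names are hypothetical:

```python
from fractions import Fraction
from itertools import permutations

Perm = tuple[int, ...]


def bvn_decompose(rates: list[list[Fraction]]) -> list[tuple[Fraction, Perm]]:
    """Write a matrix with equal row and column sums as sum_k phi_k * P_k,
    where P_k is the permutation matrix of perm_k (brute force, tiny N only)."""
    n = len(rates)
    m = [[Fraction(x) for x in row] for row in rates]
    sums = {sum(row) for row in m} | {sum(col) for col in zip(*m)}
    if len(sums) != 1:
        raise ValueError("all row and column sums must be equal")
    terms = []
    while any(x > 0 for row in m for x in row):
        # Birkhoff-von Neumann: some permutation uses only positive entries.
        perm = next(p for p in permutations(range(n))
                    if all(m[i][p[i]] > 0 for i in range(n)))
        phi = min(m[i][perm[i]] for i in range(n))
        for i in range(n):
            m[i][perm[i]] -= phi  # zeroes at least one entry per round
        terms.append((phi, perm))
    return terms


if __name__ == "__main__":
    F = Fraction
    rates = [[F(1, 2), F(1, 2), F(0)],
             [F(1, 4), F(1, 4), F(1, 2)],
             [F(1, 4), F(1, 4), F(1, 2)]]
    for phi, perm in bvn_decompose(rates):
        print(phi, perm)
```

For this hypothetical matrix the loop finds four permutations, (0, 1, 2), (0, 2, 1), (1, 0, 2) and (1, 2, 0), each with weight 1/4.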
44
Load-Balanced Switch
[Figure: load-balanced switch: a load-balancing stage followed by a switching stage, ports 1..N]
Load-balanced switch
Converts the traffic to uniform, then applies fixed switching
100% throughput for a broad class of traffic
No centralized scheduler needed; scalable
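"Fixed switching" means each stage steps through a predetermined sequence of connection patterns with no scheduler. The sketch below assumes a cyclic shift (port k connected to port (k + t) mod N in slot t), which is one common choice but not something the slide specifies, and shows how it spreads even a single-destination burst evenly across the intermediate ports, so the second stage sees uniform load. Names are hypothetical:

```python
def connection(t: int, n: int) -> list[int]:
    """Port k of a stage is connected to port (k + t) mod n in slot t
    (an assumed cyclic-shift pattern; one fixed permutation per slot)."""
    return [(k + t) % n for k in range(n)]


def spread_burst(inp: int, n: int, slots: int) -> list[int]:
    """How many cells a backlogged input hands to each intermediate port
    when it sends one cell per slot through the first stage."""
    counts = [0] * n
    for t in range(slots):
        counts[connection(t, n)[inp]] += 1
    return counts


if __name__ == "__main__":
    n = 4
    # A burst of 8 cells from input 0, all for the same output, is spread
    # evenly over the 4 intermediate ports.
    print(spread_burst(0, n, 2 * n))  # [2, 2, 2, 2]
```

The catch is visible in the same picture: cells of one flow now wait in different intermediate queues, which is the source of the out-of-sequence problem discussed next.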
45
Original Work on LB Switch
Stability: the load-balanced switch is stable
Delay: burst reduction
Problem: unbounded out-of-sequence delays
Reference
C.-S. Chang, D.-S. Lee and Y.-S. Jou, “Load balanced Birkhoff-von Neumann switches, Part I: one-stage buffering,” Computer Comm., Vol. 25, pp. 611-622, 2002.
46
LB Switch Variants
Solving the out-of-sequence problem:
FCFS (first come, first served): jitter control mechanism; increases the average delay
EDF (earliest deadline first): reduces the average delay; high complexity
Mailbox switch: prevents packets from being out of sequence; does not achieve 100% throughput
References
C.-S. Chang, D.-S. Lee, and C.-M. Lien, “Load balanced Birkhoff-von Neumann switches, Part II: multi-stage buffering,” Computer Comm., vol. 25, pp. 623-634, 2002.
C.-S. Chang, D. Lee, and Y. J. Shih, “Mailbox switch: A scalable two-stage switch architecture for conflict resolution of ordered packets,” in Proceedings of IEEE INFOCOM, Hong Kong, March 2004.
47
More LB Switch Variants
FFF (Full Frames First) (INFOCOM 2002, McKeown)
Frame-based
No need for resequencing
Requires multi-stage buffer communication: high complexity
FOFF (Full Ordered Frames First) (SIGCOMM 2003, McKeown)
Frame-based
Maximum resequencing delay N²
Bandwidth wastage
References
I. Keslassy and N. McKeown, “Maintaining packet order in two-stage switches,” Proc. of IEEE INFOCOM, June 2002.
I. Keslassy, S.-T. Chuang, K. Yu, D. Miller, M. Horowitz, O. Solgaard, and N. McKeown, “Scaling Internet routers using optics,” ACM SIGCOMM ’03, Karlsruhe, Germany, Aug. 2003.
48
Byte-Focal Switch Architecture
[Figure: Byte-Focal switch: input VOQs, 1st-stage switch fabric, second-stage VOQs, 2nd-stage switch fabric, and re-sequencing buffers at the outputs; ports 1..N]
49
Byte-Focal Switch
Packet-by-packet scheduling
Improves the average delay performance
The maximum resequencing delay is N²
The time complexity of the resequencing buffer is O(1)
Does not need communication between linecards
References
Y. Shen, S. Jiang, S. S. Panwar, and H. J. Chao, “Byte-Focal: a practical load-balanced switch,” HPSR 2005, Hong Kong.
50
Multi-Stage Switches
Single-stage switches (e.g., crosspoint switch)
Single path between each input-output pair
Cannot meet the increasing demands of Internet traffic
No packets out of sequence
Easy to design
Lack of scalability
Multi-stage switches (e.g., Clos-network switch)
Multiple paths between each input-output pair
Better tradeoff between switch performance and complexity
Highly scalable and fault tolerant
Memory-less multi-stage switches
No packets out of sequence; may encounter internal blocking
Buffered multi-stage switches
Packets may be out of sequence; easy scheduling
52
Trueway: A Multi-Plane Multi-Stage Switch
[Figure: Trueway switch: ingress modules TMI(0)..TMI(N-1) and egress modules TME(0)..TME(N-1) connected to p switching planes Plane(0)..Plane(p-1); each plane has k input modules IM (n x m), m center modules CM (k x k) and k output modules OM (m x n)]
53
Trueway Switch
The switch fabric consists of multiple switching planes, each being a three-stage Clos network with m center modules
Each input/output pair has multiple routing paths
Highly scalable
[Figure: crosspoint-buffered memory module with ports 1..n]
54
Challenges in Multi-Stage Switching
How to efficiently allocate and share the limited on-chip memory?
How to schedule packets on multiple paths to maximize memory utilization and system performance?
How to minimize link congestion and prevent buffer overflow (i.e., stage-to-stage flow control)?
How to maintain cell/packet order if they are delivered over multiple paths (i.e., port-to-port flow control)?
How to achieve 100% throughput?