The Buffer-Bandwidth Tradeoff in Lossless Networks
Alex Shpiner, Eitan Zahavi, Ori Rottenstreich
April 2014
© 2014 Mellanox Technologies
Background: Network Incast Congestion
Source: http://theithollow.com/2013/05/flow-control-explained/
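[Figure: three senders each transmit at 100% of link rate toward a single receiver link, which is offered 300% of its capacity.]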
Background: Pause Frame Flow Control
Source: http://theithollow.com/2013/05/flow-control-explained/
[Figure: pause frame flow control; labeled element: Buffer.]
Background: Incast with Pause Frame Flow Control
Source: http://theithollow.com/2013/05/flow-control-explained/
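[Figure: with pause frame flow control, each of the three senders is held to 33%, so the receiver link carries 100%.]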
Background: Congestion Spreading Problem
Small buffers → link pauses → congestion spreading → decrease in effective link bandwidth.
To deal with incast we can:
• Increase buffers
• Increase link bandwidth
Source: http://theithollow.com/2013/05/flow-control-explained/
[Figure annotation] This flow is also paused, since pause control does not distinguish between flows: it gets 33% instead of 66%.
Effective link bandwidth = Link bandwidth × %unpaused
Trade-off: buffer size versus link bandwidth.
Buffer-bandwidth Tradeoff
Higher bandwidth allows:
• Faster buffer draining
• Pausing the link for longer periods, while achieving the same effective bandwidth.
⇒ Reduced buffering demand.
Aim: evaluate the buffer-bandwidth tradeoff.
Assumptions:
• Lossless network
• Congestion spreading avoidance is desired
Effective bandwidth = Link bandwidth × %unpaused
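The relation above can be turned into a quick calculation. The sketch below is an illustration only and not part of the original deck: the function names and the 40 Gbps example value are assumptions. It computes the effective bandwidth of a partly paused link and, for a link accelerated by a factor α, the largest paused fraction p that still delivers the original effective bandwidth (α·C·(1 − p) = C gives p = 1 − 1/α).

```python
def effective_bandwidth(link_bandwidth, paused_fraction):
    """Effective bandwidth = link bandwidth * %unpaused."""
    if not 0.0 <= paused_fraction <= 1.0:
        raise ValueError("paused_fraction must be in [0, 1]")
    return link_bandwidth * (1.0 - paused_fraction)


def max_paused_fraction(alpha):
    """Largest paused fraction p such that alpha * C * (1 - p) == C."""
    if alpha < 1.0:
        raise ValueError("alpha must be at least 1")
    return 1.0 - 1.0 / alpha


if __name__ == "__main__":
    c = 40.0      # baseline link bandwidth in Gbps (hypothetical value)
    alpha = 1.4   # link acceleration factor
    p = max_paused_fraction(alpha)
    print(f"accelerated link may be paused up to {p:.1%} of the time")
    print(f"effective bandwidth: {effective_bandwidth(alpha * c, p):.1f} Gbps")
```

With α = 1.4 the accelerated link can sit paused for roughly 29% of the time and still match the unpaused baseline link.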
Evaluation Flow
1) Assume a network with links of bandwidth C and buffers of size B.
2) Define the most challenging workload the network can handle without congestion spreading.
3) Scale the link bandwidth by α (to αC) and the buffer size by β (to βB).
4) Evaluate the relation between α and β that is able to handle the workload from step 2: transmit the workload at the same rate, while keeping the same effective link bandwidth.
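[Figure: network model. A traffic injector is connected over N links (0, 1, ..., N-1) to a switch, which is connected to a traffic receiver. Labels: link bandwidth C, buffer size B per input, rate C/N per input. Inset plot: data rate versus time.]
β = f(α)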
Network Model
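[Figure: network model. A traffic injector is connected over N links (0, 1, ..., N-1) to a switch, which is connected to a traffic receiver. Labels: link bandwidth C, buffer size B per input, rate C/N per input. Two plots of injected traffic: data rate versus time.]
The most challenging workload: synchronized bursts.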
Determining the Workload Parameters
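[Figure: arrival rate λin versus time t for N senders (bursts of height C and width T/N in every period T), and queue size Q versus time t, peaking at B.]
Assumption: the output link is 100% utilized.
• Burst length: T/N
• It takes T/N to fill a buffer of size B at arrival rate C and departure rate C/N: T/N = B / (C − C/N)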
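The workload parameters can be worked out numerically. The sketch below is not from the deck; it assumes the reading suggested by the figure labels: each sender bursts at rate C for T/N out of every period T, each input queue drains at C/N, and the burst is exactly long enough to fill a buffer of size B. The function names and the units (any consistent pair, such as bits and bits per second) are assumptions.

```python
def fill_time(buffer_size, arrival_rate, departure_rate):
    """Time to fill an empty buffer when arrivals outpace departures."""
    if arrival_rate <= departure_rate:
        raise ValueError("buffer never fills: arrival rate must exceed departure rate")
    return buffer_size / (arrival_rate - departure_rate)


def workload_period(buffer_size, link_bandwidth, n_senders):
    """Period T of the synchronized bursts, assuming burst length T/N.

    Assumed model: arrival rate C per input, departure rate C/N per input,
    and a burst that exactly fills the buffer B.
    """
    if n_senders < 2:
        raise ValueError("incast needs at least two senders")
    burst = fill_time(buffer_size, link_bandwidth, link_bandwidth / n_senders)
    return n_senders * burst


if __name__ == "__main__":
    b, c, n = 30.0, 10.0, 4  # hypothetical buffer size, link bandwidth, senders
    period = workload_period(b, c, n)
    print(f"burst length T/N = {period / n}, period T = {period}")
```

Under these assumptions the N senders together deliver N·C·(T/N) = C·T per period, which matches a fully utilized output link.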
Effect of Buffer Size Reduction with Link Acceleration
[Figure: arriving traffic λin versus time t (bursts of height C and width T/N in every period T) and input queue size Q versus time t, once peaking at B and once for the reduced buffer βB with time marks t1, t2, t3.]
Q_max = B
Conclusions:
• We can push more traffic.
• When the buffer is full, the link is in paused mode (congestion spreading).
Effect of Buffer Size Reduction with Link Acceleration (cont.)
When the buffer is full, the link is in paused mode:
• Paused mode → congestion spreading
By how much can the buffer be reduced (β) while avoiding congestion spreading?
• For … (incast …): 40% buffer saving!
• For … (incast …): only 5% buffer saving
[Figure: input queue size Q over time t, peaking at βB; time marks t1, t2, t3, T.]
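One way to arrive at savings of this size is sketched below. It is an assumption-laden reconstruction, not the slide's own derivation: the burst is taken to arrive at rate C for a time T/N, the baseline input queue drains at C/N so that B = (C − C/N)·T/N, and the accelerated link drains the same input at αC/N without ever pausing, so that βB = (C − αC/N)·T/N. Dividing the two gives β = (N − α)/(N − 1). The function names are hypothetical.

```python
def beta_no_pause(n_senders, alpha):
    """Remaining buffer fraction beta = (N - alpha) / (N - 1), floored at 0.

    Assumed model: same arrival rate C and burst length T/N as the baseline,
    per-input drain rate raised from C/N to alpha*C/N, and no pausing.
    """
    if n_senders < 2:
        raise ValueError("incast needs at least two senders")
    if alpha < 1.0:
        raise ValueError("alpha must be at least 1")
    return max(0.0, (n_senders - alpha) / (n_senders - 1))


def buffer_saving_no_pause(n_senders, alpha):
    """Buffer saving 1 - beta under the same assumed model."""
    return 1.0 - beta_no_pause(n_senders, alpha)


if __name__ == "__main__":
    for n in (2, 9):  # hypothetical incast ratios
        print(f"N={n}, alpha=1.4: saving {buffer_saving_no_pause(n, 1.4):.0%}")
```

Under this model a 2-to-1 incast with α = 1.4 saves 40% of the buffer and a 9-to-1 incast saves 5%. That is in line with the percentages quoted, but the parameter values are missing from the bullets, so the match is suggestive rather than confirmed.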
Effect of Buffer Size Reduction with Link Acceleration (cont.)
Observation: we are allowed to pause the link, since we increased the link capacity.
- For …:
• Using (1) and (2): …
- For …: …!
We can save 47% of the buffer size with a 40% link rate increase and get the same performance!
We can also push more data in total (e.g. 56 Gbps vs. 40 Gbps)
• at the cost of congestion spreading.
[Figure: input queue size Q over time t, peaking at βB; time marks t1, t2, t3, T.]
Asymptotic Analysis
For α ≥ 2, no buffering is required in the switches!
[Figure: buffer size saving (1−β), 0% to 100%, versus N (incast ratio N to 1) from 0 to 50, one curve per α = 1.1, 1.2, ..., 1.9.]
[Figure: buffer size saving (1−β), 0% to 100%, versus α from 1 to 2, one curve per N = 2, 4, 8, 16, 32, 64.]
By increasing the link bandwidth by X%, the buffer saving is at least X% for any incast load!
Multiple Incast Cascade Analysis
Last rank defines the workload parameters:
The analysis is similar to the single-rank case, but now the traffic arrives at rate …
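[Figure: cascade of switches arranged in ranks (Rank 1, Rank 2, ..., Rank n) between the traffic injectors and the traffic receiver, with incast ratios k1, k2, ..., kn, link bandwidth C and buffers of size B. Plots of arrival rate λin and queue size Q versus time t: baseline with rate C, time marks T and Tn, and queue peak B; rank n with α > 1, β < 1, rate αC, time marks T and Tn/α, and queue peak βB.]
β = 1 − α × %paused
β(α = 1.4, %paused = 32%) = 0.55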
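The cascade relation β = 1 − α × %paused is easy to check numerically. The sketch below is only an illustration; the function name is hypothetical and the formula is taken as stated on the slide, without re-deriving it.

```python
def beta_cascade(alpha, paused_fraction):
    """Remaining buffer fraction: beta = 1 - alpha * %paused (as on the slide)."""
    if alpha < 1.0:
        raise ValueError("alpha must be at least 1")
    if not 0.0 <= paused_fraction <= 1.0:
        raise ValueError("paused_fraction must be in [0, 1]")
    return 1.0 - alpha * paused_fraction


if __name__ == "__main__":
    beta = beta_cascade(1.4, 0.32)
    print(f"beta = {beta:.3f}, buffer saving = {1.0 - beta:.1%}")
```

For α = 1.4 and a 32% paused fraction this evaluates to 0.552, which rounds to the 0.55 quoted on the slide.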
Conclusions
We can reduce the switch buffer size while still pushing the same traffic.
• But we pay with:
- Congestion spreading (pause frames)
- Buffers at the traffic sources (NICs)
By increasing the link bandwidth, we can reduce the congestion spreading.
• And push more traffic.
We can save at least X% of the buffer size with an X% link rate increase (for any incast).
When increasing the link bandwidth by a factor of at least 2 (α ≥ 2), no buffering is required at the switches.
The results also hold for the multiple incast cascade.
Thank You