experience with loss-based congestion controlled tcp stacks yee-ting li university college london
Post on 03-Jan-2016
Experience with Loss-Based Congestion Controlled TCP Stacks
Yee-Ting Li
University College London
Introduction
Transport of data for next-generation applications
Network hardware is capable of gigabits per second
Current ‘Vanilla’ TCP cannot sustain high throughput over long distances
New TCP stacks have been introduced to rectify the problem
Investigation into the performance, bottlenecks and deployability of the new algorithms
Transmission Control Protocol
Connection oriented
Reliable transport of data
Window based
Congestion and flow control to prevent network collapse
Provides ‘fairness’ between competing streams
20 years old
  Originally designed for kbit/sec pipes
TCP Algorithms
Based on two algorithms to determine the rate at which data is sent
  Slow start: probe for initial bandwidth
  Congestion avoidance: maintain a steady-state transfer rate
Focus on steady state: probe for increases in available bandwidth, whilst backing off if congestion is detected (through loss)
Maintained through a ‘congestion window’, cwnd, that regulates the number of unacknowledged packets allowed on the connection
Size of window approximately equals the bandwidth-delay product
  Determines the window size needed to obtain a given bandwidth at a given delay
  Window = Bandwidth x Delay
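As a worked example of Window = Bandwidth x Delay, the sketch below computes the window for a 1 Gb/sec bottleneck at round-trip times of 6 msec and 120 msec. The 1500-byte packet size and the function name are assumptions made for illustration.

```python
def bdp_packets(bandwidth_bps, rtt_s, packet_bytes=1500):
    """Return the bandwidth-delay product in bytes and in packets."""
    window_bytes = bandwidth_bps * rtt_s / 8  # bits in flight -> bytes
    return window_bytes, window_bytes / packet_bytes


for rtt_ms in (6, 120):
    window_bytes, packets = bdp_packets(1e9, rtt_ms / 1000)
    print(f"RTT {rtt_ms} msec: {window_bytes / 1e6:.2f} MB, about {packets:.0f} packets")
```

Under these assumptions the long path needs a window twenty times larger than the short one to fill the same link.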
Algorithms
Congestion avoidance
  For every packet (ack) received by the sender
    cwnd ← cwnd + 1/cwnd
  When loss is detected (through dupacks)
    cwnd ← cwnd / 2
Growth of cwnd determined by:
  The RTT of the connection
    When RTT is high, cwnd grows slowly (because of acking)
  The loss rate on the line
    High loss means that cwnd never reaches a large value
  Capacity of the link
    Allows for a large cwnd value (when loss is low)
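The two update rules, and the reason a high RTT hurts, can be sketched as follows. The window sizes used here (10,000 packets at 120 msec and 500 packets at 6 msec) are assumptions corresponding to a 1 Gb/sec link with 1500-byte packets, and the recovery estimate assumes growth of one packet per RTT with no further loss.

```python
def on_ack(cwnd):
    """Congestion avoidance: each ack adds 1/cwnd, about one packet per RTT."""
    return cwnd + 1.0 / cwnd


def on_loss(cwnd):
    """Loss detected through dupacks: halve the window."""
    return cwnd / 2.0


def recovery_time_s(cwnd_packets, rtt_s):
    """Approximate time to regrow from cwnd/2 to cwnd at one packet per RTT."""
    return (cwnd_packets - on_loss(cwnd_packets)) * rtt_s


cwnd = 10.0
for _ in range(10):  # roughly one window's worth of acks
    cwnd = on_ack(cwnd)
print(f"cwnd after 10 acks: {cwnd:.2f}")
print(f"recovery at 120 msec RTT: {recovery_time_s(10000, 0.120):.0f} s")
print(f"recovery at 6 msec RTT: {recovery_time_s(500, 0.006):.1f} s")
```

The recovery time grows with both the window and the RTT, which is why a single loss is so costly on long, fast paths.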
Current Methods of Achieving High Throughput
Advantages
  Achieves good throughput
  No changes to kernels required
Disadvantages
  Have to manually tune the number of flows
  May induce extra loss on lossy networks
  Need to reprogram/recompile software
New TCP Stacks
Modify the congestion control algorithm to improve response times
All based on modifying the cwnd growth and decrease values
Define:
  a = increase in data packets per window of acks
  b = decrease factor upon congestion
To maintain compatibility (and hence network stability and fairness) for small cwnd values:
  Mode switch from Vanilla to new TCP
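The a/b parameterisation with a mode switch can be sketched as below. This is an illustration only: the threshold value, the function names and the way the new stack's parameters are passed in are all hypothetical, and each real stack defines its own switch point.

```python
VANILLA_A = 1.0   # packets added per window of acks
VANILLA_B = 0.5   # fraction of cwnd removed on congestion
LOW_WINDOW = 38   # hypothetical mode-switch threshold, in packets


def params(cwnd, new_a, new_b):
    """Pick (a, b): Vanilla at or below the threshold, the new stack above it."""
    if cwnd <= LOW_WINDOW:
        return VANILLA_A, VANILLA_B
    return new_a(cwnd), new_b(cwnd)


def on_ack(cwnd, new_a, new_b):
    a, _ = params(cwnd, new_a, new_b)
    return cwnd + a / cwnd  # a packets per window of acks


def on_congestion(cwnd, new_a, new_b):
    _, b = params(cwnd, new_a, new_b)
    return cwnd - b * cwnd


# Example new stack: a grows with cwnd, b is a constant.
example_a = lambda w: 0.01 * w
example_b = lambda w: 0.125
print(on_ack(10.0, example_a, example_b), on_congestion(10.0, example_a, example_b))
print(on_ack(1000.0, example_a, example_b), on_congestion(1000.0, example_a, example_b))
```

With a = 1 and b = 0.5 the two functions reduce to the Vanilla rules, which is what keeps small-window flows compatible.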
HSTCP
Designed by Sally Floyd
Determine a and b as a function of cwnd
  a → a(cwnd)
  b → b(cwnd)
Gradual improvement in throughput as we approach larger bandwidth-delay products
Current implementation focused on performance up to 10 Gb/sec: sets a linear relation (on a log-log scale) between loss and throughput (response function)
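A minimal sketch of a(cwnd) and b(cwnd) follows. The slide does not give the constants; the values below are the defaults described in RFC 3649 (HighSpeed TCP) and should be read as assumptions here, not as the settings used in these tests.

```python
import math

LOW_WINDOW = 38        # packets; Vanilla behaviour at or below this
HIGH_WINDOW = 83000    # packets; roughly 10 Gb/sec at 100 msec RTT
HIGH_DECREASE = 0.1    # b at HIGH_WINDOW


def loss_rate(cwnd):
    """Loss rate p(cwnd) on the HSTCP response function."""
    return 0.078 / cwnd ** 1.2


def b(cwnd):
    """Decrease factor, interpolated on log(cwnd) from 0.5 down to 0.1."""
    if cwnd <= LOW_WINDOW:
        return 0.5
    frac = (math.log(cwnd) - math.log(LOW_WINDOW)) / (
        math.log(HIGH_WINDOW) - math.log(LOW_WINDOW))
    return 0.5 + (HIGH_DECREASE - 0.5) * frac


def a(cwnd):
    """Increase in packets per window of acks."""
    if cwnd <= LOW_WINDOW:
        return 1.0
    return cwnd ** 2 * loss_rate(cwnd) * 2 * b(cwnd) / (2 - b(cwnd))


for w in (38, 1000, 10000, 83000):
    print(f"cwnd={w}: a={a(w):.2f}, b={b(w):.2f}")
```

As cwnd grows, a rises and b falls, so the flow probes faster and backs off less.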
Scalable TCP
Designed by Tom Kelly
Define a and b to be constant:
  a: cwnd ← cwnd + a (per ack)
  b: cwnd ← cwnd - b x cwnd
Intrinsic scaling property that gives the same performance over any link (beyond the initial threshold)
Recommended settings
  a = 1/100
  b = 1/8
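The scaling property can be checked with a short sketch: with the recommended settings, the number of round trips needed to recover from one congestion event does not depend on the window size. The function name and the simplification of counting whole RTTs are assumptions made for illustration.

```python
A = 1 / 100  # added to cwnd per ack
B = 1 / 8    # fraction of cwnd removed on congestion


def rtts_to_recover(cwnd):
    """Count whole RTTs needed to regrow to cwnd after one congestion event."""
    target = cwnd
    cwnd = cwnd - B * cwnd
    rtts = 0
    while cwnd < target:
        cwnd += A * cwnd  # cwnd acks arrive per RTT, each adding A
        rtts += 1
    return rtts


for w in (100, 10_000, 1_000_000):
    print(w, rtts_to_recover(w))
```

Because both the increase per RTT and the decrease are proportional to cwnd, the recovery time depends only on a, b and the RTT.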
H-TCP
Designed by Doug Leith and Robert Shorten
Define a mode switch so that after congestion we do normal Vanilla
After a predefined period $\Delta_L$, switch to a high performance a
  $\Delta_i \le \Delta_L$: $a = 1$
  $\Delta_i > \Delta_L$: $a = 1 + (\Delta_i - \Delta_L) + [(\Delta_i - \Delta_L)/20]^2$
Upon loss drop by
  $|[B_i^{\max}(k+1) - B_i^{\max}(k)] / B_i^{\max}(k)| > 0.2$: $b = 0.5$
  Else: $b = RTT_{\min}/RTT_{\max}$
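The parameter selection on this slide can be sketched as two small functions. The value chosen for the switch period, the function names and the argument names are hypothetical; the sketch only computes a and b as written on the slide and does not model how they are applied to cwnd.

```python
DELTA_L = 1.0  # hypothetical length of the Vanilla period, in seconds


def a(delta):
    """Increase parameter as a function of time since the last congestion event."""
    if delta <= DELTA_L:
        return 1.0                     # Vanilla mode
    d = delta - DELTA_L
    return 1.0 + d + (d / 20.0) ** 2   # high performance mode, as on the slide


def b(b_max_prev, b_max_now, rtt_min, rtt_max):
    """Decrease parameter chosen at a loss event.

    b_max_prev and b_max_now stand for B_max(k) and B_max(k+1).
    """
    if abs((b_max_now - b_max_prev) / b_max_prev) > 0.2:
        return 0.5
    return rtt_min / rtt_max


print(a(0.5), a(21.0))
print(b(100.0, 130.0, 0.100, 0.125), b(100.0, 105.0, 0.100, 0.125))
```

The first rule makes the flow behave like Vanilla immediately after congestion; the second falls back to b = 0.5 whenever the measured maximum changes by more than 20%.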
Implementation
All new stacks have their own implementation
  Small differences between implementations mean that we are comparing the kernel differences rather than just the algorithmic differences
Led to the development of a ‘test platform’ kernel, altAIMD
  Implements all three stacks via a simple sysctl switch
  Also incorporates switches for certain undesirable kernel ‘features’
    moderate_cwnd()
    IFQ
Added extra features for testing/evaluation purposes
  Appropriate Byte Counting (RFC3465)
  Inducible packet loss (at recv)
  Web100 TCP logging (cwnd etc)
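Appropriate Byte Counting, listed above, grows cwnd according to the number of bytes acknowledged rather than the number of acks received. The sketch below follows the congestion avoidance rule as described in RFC 3465; it is not the altAIMD kernel code, and the function name and the 1460-byte segment size are assumptions.

```python
def abc_on_ack(cwnd_bytes, bytes_acked, newly_acked_bytes, smss=1460):
    """Congestion avoidance with Appropriate Byte Counting.

    Returns the updated (cwnd_bytes, bytes_acked) pair.
    """
    bytes_acked += newly_acked_bytes
    if bytes_acked >= cwnd_bytes:
        bytes_acked -= cwnd_bytes   # one full window has been acknowledged
        cwnd_bytes += smss          # grow by one full-sized segment
    return cwnd_bytes, bytes_acked


# Delayed acks: five acks, each covering two segments, acknowledge one window.
cwnd, acked = 10 * 1460, 0
for _ in range(5):
    cwnd, acked = abc_on_ack(cwnd, acked, 2 * 1460)
print(cwnd // 1460, acked)
```

Counting bytes means a receiver that acknowledges every second packet no longer halves the sender's growth rate.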
Networks Under Test
Networks
DataTAG (StarLight to CERN): Cisco 7600, Cisco 7600, Juniper; bottleneck capacity 1 Gb/sec, RTT 120 msec
MB-NG (Manchester to UCL): Cisco 7600, Cisco 7600, Cisco 7600; bottleneck capacity 1 Gb/sec, RTT 6 msec
Graph/Demo
Mode switch between stacks on constant packet drop
[Figure: trace with regions labelled Vanilla TCP, Scalable TCP and HS-TCP]
Comparison against theory
Response function
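The theoretical curves themselves are not reproduced in this transcript. For Vanilla TCP, a widely used approximation of the response function is an average window of sqrt(3/2)/sqrt(p) packets at loss rate p. The sketch below uses that approximation; the function names and the 10,000-packet example (1 Gb/sec at 120 msec with 1500-byte packets) are assumptions.

```python
import math


def vanilla_window(loss_rate):
    """Average cwnd in packets for Vanilla TCP at a given loss rate."""
    return math.sqrt(1.5 / loss_rate)


def loss_rate_for_window(cwnd_packets):
    """Loss rate at which Vanilla TCP averages the given window."""
    return 1.5 / cwnd_packets ** 2


print(f"window at p = 1e-4: {vanilla_window(1e-4):.0f} packets")
print(f"loss rate needed for 10,000 packets: {loss_rate_for_window(10_000):.1e}")
```

The required loss rate falls with the square of the window, which is why Vanilla TCP struggles on high bandwidth-delay product paths.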
Self-Similar Background Tests
Results skewed
  Not comparing differences in TCP algorithms!
  Not useful results!
SACK …
Look into what’s happening at the algorithmic level:
  Strange hiccups in cwnd: the only correlation is with SACK arrivals
[Figure: Scalable TCP on MB-NG with 200 Mbit/sec CBR background]
SACKs
Supply the sender with information about which segments the receiver has
  Sender infers the missing packets to resend
  Aids recovery during loss and prevents timeouts
Current implementation in Linux 2.4 and 2.6 walks through the entire SACK list for each SACK
  Very CPU intensive
  Can be interrupted by the arrival of the next SACK, which causes the SACK implementation to misbehave
Tests conducted with Tom Kelly’s SACK fast-path patch
  Improves SACK processing, but still not sufficient
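A rough cost model shows why a full walk per SACK becomes expensive at large windows. This is a simple counting sketch, not the kernel code or the fast-path patch; the scenario (10,000 packets in flight, a single loss, and every later packet in the window producing an ack with SACK blocks) is an assumption.

```python
def naive_sack_visits(in_flight_packets, sacks_received):
    """Queue entries visited when every SACK walks the whole list."""
    return in_flight_packets * sacks_received


# Assumed scenario: 10,000 packets in flight, one packet lost, and every
# following packet in that window triggering an ack that carries SACK blocks.
visits = naive_sack_visits(10_000, 9_999)
print(f"{visits:,} entries visited in one round trip")
```

The work grows with the square of the window, so doubling the bandwidth-delay product roughly quadruples the processing per loss event.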
SACK Processing Overhead
Periods of web100 silence due to high CPU utilization
  Logging done in userspace; kernel time taken up by TCP SACK processing
TCP resets cwnd
Congestion Window Moderation
Linux TCP implementation adds the ‘feature’ of moderate_cwnd()
Idea is to prevent large bursts of data packets under ‘dubious’ conditions
  When an ACK acknowledges more than 3 packets (typically 2)
Adjusts cwnd to the known number of packets ‘in-flight’ (plus an extra 3 packets)
Under large cwnd sizes (high bandwidth-delay products), throughput can be diminished as a result
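The adjustment described above amounts to a clamp on cwnd. The sketch below is a simplified Python illustration of that behaviour, not the kernel function itself; the constant name and the example values are assumptions.

```python
MAX_BURST = 3  # the extra 3 packets allowed beyond those in flight


def moderate_cwnd(cwnd, packets_in_flight):
    """Clamp cwnd to the packets in flight plus a small burst allowance."""
    return min(cwnd, packets_in_flight + MAX_BURST)


# Small window: nothing changes. Large window with few packets in flight:
# most of the window is discarded and has to be regrown.
print(moderate_cwnd(10, 9))
print(moderate_cwnd(10_000, 4_000))
```

At large windows a single moderation event can remove thousands of packets from cwnd, which is how throughput is diminished.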
CPU Load and Throughput
moderate_cwnd(): Vanilla TCP
[Figure: CWND and throughput with moderate_cwnd OFF and ON, 90% TCP AF]
moderate_cwnd(): HS-TCP
[Figure: moderate_cwnd OFF and ON, 70% TCP AF and 90% TCP AF]
moderate_cwnd(): Scalable-TCP
[Figure: moderate_cwnd OFF and ON, 70% TCP AF and 90% TCP AF]
Multiple Streams
[Figure: aggregate BW and CoV]
10 TCP Flows versus Self-Similar Background
[Figure: aggregate BW and CoV]
10 TCP Flows versus Self-Similar Background
[Figure: BG loss per TCP BW]
Impact
Fairness: ratio of throughput achieved by one stack against another
  Means that fairness against vanilla TCP is defined by how much more throughput a new stack gets than vanilla
  Doesn’t really consider the deployability of the stacks in real life: how do these stacks affect the existing traffic (mostly vanilla TCP)?
Redefine fairness in terms of the impact:
  Consider the effect on the background traffic only, under different stacks
  Vary the number of TCP flows to determine the impact (vanilla flows)
$\text{BW impact} = \dfrac{\text{throughput of } n \text{ Vanilla flows}}{\text{throughput of } (n-1) \text{ Vanilla flows} + 1 \text{ new TCP flow}}$
Impact of 1 TCP Flow
[Figure: throughput and throughput impact]
1 New TCP Impact
[Figure: CoV]
Impact of 10 TCP Flows
[Figure: throughput and throughput impact]
10 TCP Flows Impact
[Figure: CoV]
WAN Tests
Summary
Comparison of actual TCP differences through a test platform kernel
Problems with SACK implementations mean that it is difficult to maintain high throughput (>500 Mbit/sec) under loss
Other problems exist with the kernel implementation that hinder performance
Compare stacks under different artificial (and hence repeatable) conditions
  Single stream:
  Multiple stream:
Need to study over a wider range of networks
Move tests onto real production environments