FAST Protocols for High Speed Network. David Wei, netlab, Caltech. For HENP WG, Feb 1st 2003
DESCRIPTION
FAST project
Goal: Protocols (TCP/AQM) for ultrascale networks
Bandwidth: 10 Mbps ~ > 100 Gbps
Delay: 50-200 ms
Research: theory, algorithms, design, implementation, demo, deployment
Urgent need:
– Large amount of data to share (500 TB in SLAC)
– Typical file transfer at SLAC: ~1 TB (15 mins with 10 Gbps)
TRANSCRIPT
FAST Protocols for High Speed Network
David Wei @ netlab, Caltech
For HENP WG, Feb 1st 2003
FAST Protocols for Ultrascale Networks
netlab.caltech.edu/FAST
Theory
Internet: distributed feedback control system
• TCP: adapts sending rate to congestion
• AQM: feeds back congestion information
[Diagram: feedback loop between the TCP and AQM blocks, with signals x, y, p, q and routing blocks Rf(s) and Rb'(s); the equations on the slide did not survive extraction]
Experiment
[Map: Calren2/Abilene, Chicago, Amsterdam, CERN, Geneva, SURFNet, StarLight, WAN in Lab, Caltech research & production networks]
• Multi-Gbps
• 50-200ms delay
Implementation
[Diagram: slow start, equilibrium, FAST recovery, FAST retransmit, timeout; labeled 155Mb/s and 10Gb/s]
People
• Students: Choe (Postech/CIT), Hu (Williams), J. Wang (CDS), Z. Wang (UCLA), Wei (CS)
• Industry: Doraiswami (Cisco), Yip (Cisco)
• Faculty: Doyle (CDS, EE, BE), Low (CS, EE), Newman (Physics), Paganini (UCLA)
• Staff/Postdoc: Bunn (CACR), Jin (CS), Ravot (Physics), Singh (CACR)
• Partners: CERN, Internet2, CENIC, StarLight/UI, SLAC, AMPATH, Cisco
FAST project
• Goal: Protocols (TCP/AQM) for ultrascale networks
  – Bandwidth: 10 Mbps ~ > 100 Gbps
  – Delay: 50-200 ms
  – Research: theory, algorithms, design, implementation, demo, deployment
• Urgent need:
  – Large amount of data to share (500 TB in SLAC)
  – Typical file transfer at SLAC: ~1 TB (15 mins with 10 Gbps)
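The "15 mins with 10 Gbps" figure can be sanity-checked with a short calculation. This is a minimal sketch, assuming a decimal terabyte (10^12 bytes) and an ideal link with no protocol overhead or ramp-up; the function name is illustrative and not part of the FAST project.

```python
def transfer_time_seconds(size_bytes: float, rate_bps: float) -> float:
    """Ideal transfer time: payload bits divided by line rate."""
    return size_bytes * 8 / rate_bps

ONE_TB = 1e12        # assumption: 1 TB = 10^12 bytes
LINK_RATE = 10e9     # 10 Gbps, as on the slide

seconds = transfer_time_seconds(ONE_TB, LINK_RATE)
minutes = seconds / 60
print(f"{seconds:.0f} s = {minutes:.1f} min")
```

Under these assumptions the ideal time is a little over 13 minutes, which is consistent with the slide's rounded 15 minutes once real-world overhead is allowed for.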
HEP Network (DataTAG)
[Map: NL SURFnet, UK SuperJANET4, It GARR-B, Fr Renater, GEANT, GENEVA, New York, ABILENE, ESNET, CALREN, STAR-TAP, STARLIGHT; Wave Triangle]
• 2.5 Gbps Wavelength Triangle 2002
• 10 Gbps Triangle in 2003
Newman (Caltech)
Projected performance
Ns-2: capacity = 155Mbps, 622Mbps, 2.5Gbps, 5Gbps, 10Gbps
100 sources, 100 ms round trip propagation delay
[Plot: one panel per capacity, labeled by year: '01 155 Mbps, '02 622 Mbps, '03 2.5 Gbps, '04 5 Gbps, '05 10 Gbps]
J. Wang (Caltech)
Throughput as a function of time
Chicago -> CERN
Linux kernel 2.4.19
Traffic generated by iperf (I measure the throughput over the last 5 sec)
TCP single stream
RTT = 119ms MTU = 1500
Duration of the test : 2 hours
[Plot: throughput (Mb/s, 0-500) vs. time (s, 0-7000)]
By Sylvain Ravot (Caltech)
Current TCP (Linux Reno)
As MTU increases…
1.5K, 4K, 9K …
[Plot: throughput (Mb/s, 0-1000) vs. time (s, 0-3000) for MTU = 1498, MTU = 3998, MTU = 8988]
By Sylvain Ravot (Caltech)
Current TCP (Linux Reno)
Better?????
By Some Dreamers (Somewhere)
FAST
Network
• CERN (Geneva) <-> SLAC (Sunnyvale), GE, Standard MTU
Sunnyvale -> CERN
Linux kernel 2.4.18-FAST enabled
RTT = 180 ms
MTU = 1500
By C. Jin & D. Wei (Caltech)
Theoretical Background
Congestion control
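[Figure: sources with rates $x_i(t)$ sharing network links]
Routing matrix:
$$R_{li} = \begin{cases} 1 & \text{if source } i \text{ uses link } l \\ 0 & \text{if source } i \text{ does not use link } l \end{cases}$$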
Congestion control
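• Source rates: $x_i(t)$
• Example congestion measure $p_l(t)$
  – Loss (Reno)
  – Queueing delay (Vegas)
• Routing: $R_{li} = 1$ if source $i$ uses link $l$
• Aggregate rate at link $l$: $y_l(t) = \sum_i R_{li}\, x_i(t)$
• AQM: link $l$ sees $y_l(t)$ and sets $p_l(t)$
• TCP: source $i$ sets $x_i(t)$ in response to congestion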
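The link-rate relation above (each link carries the sum of the rates of the sources routed through it) can be illustrated with a tiny routing matrix. This is a sketch only: the two-link, three-source topology and the rate values are made up for illustration.

```python
def link_rates(R, x):
    """Aggregate rate on each link: y_l = sum_i R[l][i] * x[i]."""
    return [sum(r_li * x_i for r_li, x_i in zip(row, x)) for row in R]

# Hypothetical topology: rows are links, columns are sources.
# Source 0 uses link 0, source 1 uses both links, source 2 uses link 1.
R = [
    [1, 1, 0],
    [0, 1, 1],
]
x = [10.0, 20.0, 30.0]   # illustrative source rates (e.g. Mbps)

y = link_rates(R, x)
print(y)
```

Link 0 carries sources 0 and 1, and link 1 carries sources 1 and 2, so the two aggregate rates are the corresponding sums.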
TCP/AQM
• Congestion control is a distributed asynchronous algorithm to share bandwidth
• It has two components
  – TCP: adapts sending rate (window) to congestion
  – AQM: adjusts & feeds back congestion information
• They form a distributed feedback control system
  – Equilibrium & stability depend on both TCP and AQM
  – And on delay, capacity, routing, #connections
TCP (sets $x_i(t)$): Reno, Vegas
AQM (sets $p_l(t)$): DropTail, RED, REM/PI, AVQ
Methodology
Protocol (Reno, Vegas, RED, REM/PI…)
$$x(t+1) = F(p(t),\, x(t))$$
$$p(t+1) = G(p(t),\, x(t))$$
• Equilibrium
  – Performance: throughput, loss, delay
  – Fairness
  – Utility
• Dynamics
  – Local stability
  – Cost of stabilization
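To make the pair of update maps F and G on this slide concrete, here is a minimal sketch of one such system on a single link. The specific F and G are assumptions chosen for illustration (a textbook price-based scheme with utility alpha_i * log x_i), not the FAST algorithm or any of the protocols named on the slide; `alpha`, `gamma`, and `run` are hypothetical names.

```python
def F(p, x, alpha):
    """Source update (assumed): rate maximizing alpha_i*log(x_i) - x_i*p.
    This particular choice does not depend on the previous rates x."""
    return [a / p for a in alpha]

def G(p, x, capacity, gamma):
    """Link update (assumed): raise the price when the aggregate rate
    exceeds capacity, lower it otherwise; keep the price positive."""
    y = sum(x)
    return max(p + gamma * (y - capacity) / capacity, 1e-9)

def run(alpha, capacity, gamma=0.01, steps=2000, p0=1.0):
    """Iterate x(t+1) = F(p(t), x(t)), p(t+1) = G(p(t), x(t))."""
    p = p0
    x = F(p, None, alpha)
    for _ in range(steps):
        # Both right-hand sides use the old (p, x): a synchronous update.
        x, p = F(p, x, alpha), G(p, x, capacity, gamma)
    return x, p

rates, price = run(alpha=[1.0, 2.0, 3.0], capacity=100.0)
print([round(r, 2) for r in rates], round(price, 4))
```

In this toy model the equilibrium fills the link and splits capacity in proportion to alpha, which is the sense in which an equilibrium can be characterized by a utility function; the step size gamma governs whether the iteration is stable, which is the "dynamics" half of the methodology.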
Goal: Fast AQM Scalable TCP
• Equilibrium properties
  – Uses end-to-end delay (and loss) as congestion measure
  – Achieves any desired fairness, expressed by utility function
  – Very high bandwidth utilization (99% in theory)
• Stability properties
  – Stability for arbitrary delay, capacity, routing & load
  – Good performance
    • Negligible queueing delay & loss introduced by the protocol
    • Fast response
Implementation and Experiment
Implementation
First version (demonstrated at the SuperComputing Conf, Nov 2002):
• Sender-side kernel modification (good for file sharing service)
• Challenges:
  – Effects ignored in theory
  – Large window size and high speed
SCinet Caltech-SLAC experiments
netlab.caltech.edu/FAST
SC2002 Baltimore, Nov 2002
Network Topology
[Diagram: Sunnyvale, Chicago, Baltimore, Geneva; link lengths 3000 km, 1000 km, 7000 km]
C. Jin, D. Wei, S. Low
FAST Team and Partners
Highlights
FAST TCP
• Standard MTU
• Peak window = 14,255 pkts
• Throughput averaged over > 1hr
• 925 Mbps single flow/GE card
  – 9.28 petabit-meter/sec
  – 1.89 times LSR
• 8.6 Gbps with 10 flows
  – 34.0 petabit-meter/sec
  – 6.32 times LSR
• 21TB in 6 hours with 10 flows
[Chart: I2 LSR (1, 2 flows) vs. FAST (1, 2, 7, 9, 10 flows), by #flows; paths Geneva-Sunnyvale and Baltimore-Sunnyvale]
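The peak window on this slide can be compared with the bandwidth-delay product of the single-flow path. The following is a rough sketch, assuming the 180 ms RTT quoted for the Sunnyvale-CERN path and treating every packet as a full 1500-byte MTU (headers and queueing are ignored); the function name is illustrative.

```python
def bdp_packets(rate_bps: float, rtt_s: float, packet_bytes: int) -> float:
    """Packets in flight needed to keep a path full: rate * RTT / packet size."""
    return rate_bps * rtt_s / (packet_bytes * 8)

# Values taken from the slides: 925 Mbps single flow, RTT = 180 ms, MTU = 1500.
window = bdp_packets(925e6, 0.180, 1500)
print(f"{window:.0f} packets")
```

The result is of the same order as the reported peak window of 14,255 packets, which shows why a single flow on this path needs a window in the tens of thousands of packets.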
FAST BMPS
Record (date) | #flows | Bmps (Peta) | Thruput (Mbps) | Distance (km) | Delay (ms) | MTU (B) | Duration (s) | Transfer (GB) | Path
Alaska-Amsterdam (9.4.2002) | 1 | 4.92 | 401 | 12,272 | - | - | 13 | 0.625 | Fairbanks, AK – Amsterdam, NL
MS-ISI (29.3.2000) | 2 | 5.38 | 957 | 5,626 | - | 4,470 | 82 | 8.4 | MS, WA – ISI, Va
Caltech-SLAC (19.11.2002) | 1 | 9.28 | 925 | 10,037 | 180 | 1,500 | 3,600 | 387 | CERN - Sunnyvale
Caltech-SLAC (19.11.2002) | 2 | 18.03 | 1,797 | 10,037 | 180 | 1,500 | 3,600 | 753 | CERN - Sunnyvale
Caltech-SLAC (18.11.2002) | 7 | 24.17 | 6,123 | 3,948 | 85 | 1,500 | 21,600 | 15,396 | Baltimore - Sunnyvale
Caltech-SLAC (19.11.2002) | 9 | 31.35 | 7,940 | 3,948 | 85 | 1,500 | 4,030 | 3,725 | Baltimore - Sunnyvale
Caltech-SLAC (20.11.2002) | 10 | 33.99 | 8,609 | 3,948 | 85 | 1,500 | 21,600 | 21,647 | Baltimore - Sunnyvale
Mbps = 10^6 b/s; GB = 2^30 bytes
• C. Jin, D. Wei, S. Low
• FAST Team and Partners
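The Bmps column is the product of throughput and distance, expressed in petabit-meters per second. A short sketch that recomputes it from the throughput and distance columns of the table follows; the helper name and the list layout are illustrative, and the numbers are copied from the table.

```python
def petabit_meters_per_sec(throughput_mbps: float, distance_km: float) -> float:
    """Bandwidth-distance product: (bits/s) * (meters), scaled to 10^15."""
    return (throughput_mbps * 1e6) * (distance_km * 1e3) / 1e15

# (record, flows, throughput in Mbps, distance in km, Bmps as listed)
records = [
    ("Alaska-Amsterdam", 1, 401, 12272, 4.92),
    ("MS-ISI", 2, 957, 5626, 5.38),
    ("Caltech-SLAC", 1, 925, 10037, 9.28),
    ("Caltech-SLAC", 2, 1797, 10037, 18.03),
    ("Caltech-SLAC", 7, 6123, 3948, 24.17),
    ("Caltech-SLAC", 9, 7940, 3948, 31.35),
    ("Caltech-SLAC", 10, 8609, 3948, 33.99),
]

computed = [petabit_meters_per_sec(mbps, km) for _, _, mbps, km, _ in records]
for (name, flows, _, _, listed), value in zip(records, computed):
    print(f"{name:18s} {flows:2d} flows  computed {value:6.2f}  listed {listed:6.2f}")
```

The recomputed values agree with the listed column to within about 0.01; the small differences presumably come from rounding of the throughput figures.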
FAST Aggregate throughput
Flows | Average utilization | Averaging period
1 flow | 95% | 1hr
2 flows | 92% | 1hr
7 flows | 90% | 6hr
9 flows | 90% | 1.1hr
10 flows | 88% | 6hr
FAST
• Standard MTU
• Utilization averaged over > 1hr
C. Jin, D. Wei, S. Low
FAST vs Linux TCP (2.4.18-3)
Average utilization:
Capacity | Linux TCP (txq=100) | Linux TCP (txq=10000) | FAST
1G | 19% | 27% | 95%
2G | 16% | 48% | 92%
FAST
• Standard MTU
• Utilization averaged over 1hr
C. Jin (Caltech)
Trial Deployment
FAST Kernel Installed:
• SLAC: Les Cottrell, etc. www-iepm.slac.stanford.edu/monitoring/bulk/fast
• FermiLab: Michael Ernst, etc.
Coming soon:
• 10-Gbps NIC Testing (Sunnyvale - CERN)
• Internet2
• …
Detailed Information:
• Home Page: http://Netlab.caltech.edu/FAST
• Theory: http://netlab.caltech.edu/FAST/overview.html
• Implementation & Testing: http://netlab.caltech.edu/FAST/software.html
• Publications: http://netlab.caltech.edu/FAST/publications.html
Acknowledgments
FAST
netlab.caltech.edu/FAST
• Theory: D. Choe (Postech/Caltech), J. Doyle, S. Low, F. Paganini (UCLA), J. Wang, Z. Wang (UCLA)
• Prototype: C. Jin, D. Wei
• Experiment/facilities
  – Caltech: J. Bunn, C. Chapman, C. Hu (Williams/Caltech), H. Newman, J. Pool, S. Ravot (Caltech/CERN), S. Singh
  – CERN: O. Martin, P. Moroni
  – Cisco: B. Aiken, V. Doraiswami, R. Sepulveda, M. Turzanski, D. Walsten, S. Yip
  – DataTAG: E. Martelli, J. P. Martin-Flatin
  – Internet2: G. Almes, S. Corbato
  – Level(3): P. Fernes, R. Struble
  – SCinet: G. Goddard, J. Patton
  – SLAC: G. Buhrmaster, R. Les Cottrell, C. Logg, I. Mei, W. Matthews, R. Mount, J. Navratil, J. Williams
  – StarLight: T. deFanti, L. Winkler
  – TeraGrid: L. Winkler
• Major sponsors: ARO, CACR, Cisco, DataTAG, DoE, Lee Center, NSF
Thanks
Questions?