2006 © SWITCH

End-to-end Performance over Research Networks

Simon Leinen, SWITCH

Wizard gap, PERT, performance monitoring, Premium IP

End-to-end Performance Issues

• Performance seen by end users hasn't followed backbone upgrades
• “Wizard gap” (ordinary users vs. land speed record heroes)
• Issues solving multi-domain performance problems
• Issues solving multi-layer performance problems
• Lack of performance-oriented network monitoring

-> The “ends” must be included in network performance work!
• endpoints, i.e. hosts, operating systems, applications (users even)
• campus networks and their administrators

Various efforts to improve e2e performance

Internet2 “e2epi” (End-to-End Performance Initiative)
– Performance workshops
– Web100 kernel instrumentation and other TCP enhancements for Linux
    • enable end-user tools such as NDT (e.g. ndt.switch.ch)
    • auto-tuning for TCP buffers
    • experimental TCP variants (Vegas, Westwood, HS-TCP, BIC, S-TCP, H-TCP...)

GN2
– PERT (Performance Enhancement and Response Team)
    • “like a CERT but for performance”
    • chartered to “own” performance issues (no finger-pointing)
    • collect knowledge, produce documentation (to make itself obsolete)
– Premium IP and other backbone-specific enhancements

Bandwidth is not everything

Most transfers over the Internet (including the GTREN) are limited by RTT
– TCP window-size limitations for “LFNs” (Long Fat Networks)
– short flows
– delay-sensitive applications (conversational A/V, RPC, games...)

-> what works well in the LAN won't always do so over the WAN
– help users tune TCP (Web100/NDT very useful here)
– provide assistance with application design and engineering
    • alternatives to TCP etc.

RTT is harder to improve than bandwidth
– speed-of-light issue (btw. router hop count is quickly becoming irrelevant)
– some inter-continental connections are more useful than others
    • e.g. TEIN link through Siberia reduces EU-China RTT by half

Other important performance indicators: availability, predictability...
-> using capacity as the prime “connectivity” metric is no longer justified.
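
The RTT limits on this slide can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the original talk: a single TCP connection can have at most one window of data in flight per round trip, so its throughput is bounded by window / RTT, and the RTT itself cannot drop below the two-way propagation time in fibre. The function names and the 2/3 velocity factor for optical fibre are assumptions made for illustration.

```python
# Assumption: light in optical fibre travels at roughly 2/3 of c.
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBRE_VELOCITY_FACTOR = 2 / 3


def max_tcp_throughput_bps(window_bytes, rtt_s):
    """Upper bound on single-connection throughput: one window per RTT."""
    return window_bytes * 8 / rtt_s


def required_window_bytes(target_bps, rtt_s):
    """Bandwidth-delay product: window needed to fill a path at target_bps."""
    return target_bps * rtt_s / 8


def min_fibre_rtt_s(path_km):
    """Lower bound on RTT over a fibre path of the given one-way length."""
    return 2 * path_km / (SPEED_OF_LIGHT_KM_S * FIBRE_VELOCITY_FACTOR)


if __name__ == "__main__":
    # A 64 KiB window (the limit without TCP window scaling) at several RTTs.
    for rtt_ms in (1, 20, 150, 500):
        bps = max_tcp_throughput_bps(65535, rtt_ms / 1000)
        print(f"RTT {rtt_ms:4d} ms -> at most {bps / 1e6:8.2f} Mb/s")
    # Window needed to fill 1 Gb/s at 100 ms RTT, and the RTT floor for 10000 km.
    print(f"window for 1 Gb/s at 100 ms: {required_window_bytes(1e9, 0.1):.0f} bytes")
    print(f"minimum RTT over 10000 km:   {min_fibre_rtt_s(10_000) * 1000:.1f} ms")
```

The same 64 KiB window that saturates a LAN yields only a few Mb/s at intercontinental RTTs, which is why upgrading backbone capacity alone does not help such transfers.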

Example from right here (how NOT to do it)

                            My traceroute  [v0.71]
agathe (0.0.0.0)                                  Wed May 24 10:24:32 2006
Keys:  Help   Display mode   Restart statistics   Order of fields   quit
                                  Packets               Pings
    Host                        Loss%   Snt   Last    Avg   Best   Wrst  StDev
 1. 10.129.21.252                0.0%   377    5.1    8.3    2.5  181.5   15.5
 2. 10.64.1.8                    1.3%   377  531.6  507.7  125.1  992.6  152.5
 3. 172.28.95.109                2.1%   377  544.3  506.3   98.1  1003.  157.6
 4. 172.28.74.22                 1.6%   377  499.9  509.9  123.5  1204.  162.7
 5. 172.28.76.19                 1.6%   377  479.8  512.4  117.8  1155.  160.2
 6. 172.28.76.33                 2.7%   377  475.0  513.0  110.3  1134.  159.7
 7. 172.28.75.17                 2.7%   377  421.9  515.9  135.5  1102.  158.2
 8. 172.28.87.4                  2.9%   376  424.8  517.4  119.1  1067.  154.8
 9. 172.28.218.241               2.1%   376  583.6  522.1  113.3  1096.  159.4
10. 193.158.5.13                 2.9%   376  536.9  513.6  107.3  919.3  156.1
11. zrh-e4.ZRH.CH.net.DTAG.DE    3.7%   376  556.2  526.1  106.6  1027.  154.3
12. swiix1-g2-1.switch.ch        2.9%   376  511.2  534.6  120.0  1087.  158.8
13. 130.59.36.249                2.9%   376  533.0  529.7  139.7  1053.  152.1
14. swiCS3-10GE-1-1.switch.ch    2.7%   376  527.4  525.6  111.8  1052.  148.1
15. swiNM1-G1-0-25.switch.ch     1.6%   376  529.3  528.9  125.7  1090.  150.4
16. swiLM1-V610.switch.ch        2.4%   375  510.2  526.9  136.2  1037.  153.8
17. diotima.switch.ch            1.9%   375  575.2  526.9  149.9  959.0  152.4
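
One way to read a trace like this is to look for the hop where the average RTT jumps. The sketch below is an illustration, not something shown in the talk: it copies the hop number, host and Avg column from the output above and reports the largest increase between consecutive hops. The function name `largest_rtt_jump` is hypothetical, and per-hop RTTs from mtr are only indicative, since routers may answer probes with low priority.

```python
# (hop, host, average RTT in ms), copied from the Avg column of the trace.
HOPS = [
    (1, "10.129.21.252", 8.3),
    (2, "10.64.1.8", 507.7),
    (3, "172.28.95.109", 506.3),
    (4, "172.28.74.22", 509.9),
    (5, "172.28.76.19", 512.4),
    (6, "172.28.76.33", 513.0),
    (7, "172.28.75.17", 515.9),
    (8, "172.28.87.4", 517.4),
    (9, "172.28.218.241", 522.1),
    (10, "193.158.5.13", 513.6),
    (11, "zrh-e4.ZRH.CH.net.DTAG.DE", 526.1),
    (12, "swiix1-g2-1.switch.ch", 534.6),
    (13, "130.59.36.249", 529.7),
    (14, "swiCS3-10GE-1-1.switch.ch", 525.6),
    (15, "swiNM1-G1-0-25.switch.ch", 528.9),
    (16, "swiLM1-V610.switch.ch", 526.9),
    (17, "diotima.switch.ch", 526.9),
]


def largest_rtt_jump(hops):
    """Return (hop, host, delta_ms) for the largest rise in average RTT.

    The first hop is compared against 0 ms, i.e. the local host.
    """
    best = None
    previous_avg = 0.0
    for hop, host, avg in hops:
        delta = avg - previous_avg
        if best is None or delta > best[2]:
            best = (hop, host, delta)
        previous_avg = avg
    return best


if __name__ == "__main__":
    hop, host, delta = largest_rtt_jump(HOPS)
    print(f"largest jump: +{delta:.1f} ms at hop {hop} ({host})")
```

On this data nearly all of the roughly half-second delay appears between the first and second hop; the remaining fifteen hops, including the whole path into the SWITCH backbone, add comparatively little.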

GN2 PERT

Part of SA3 (Service Activity – End-to-end Quality of Service)
– also called PACE - “Performance and Allocated Capacity for End-Users”

PERT Case Managers (CMs), mostly from several NRENs
– duty CMs, rotating weekly (with videoconference briefings)
– dedicated CMs for some cases
– reachable through PTS (PERT Ticket System) or [email protected]

Subject Matter Expert (SME) participation
– issues of “recruiting” and involvement (on demand vs. interest-based)

PERT Knowledge Base (KB)
– currently Wiki-based - http://kb.pert.switch.ch/
– “Performance Guides” published as deliverables

GN2 PERT Ticket System (PTS)

PERT Knowledge Base (KB)

GN2 PERT Cases (closed)

DEISA TCP Throughput Reduction
– solved – due to GEANT packet reordering with heavy cross-traffic
    • will partly go away with GEANT2 (some of the routers are upgraded)

DEISA-Teragrid Performance (TCP throughput)
– closed, but not solved in time (the demo was over first)

DEISA TCP Throughput issues with some sites
– found RTT dependency, GEANT->GEANT2 changes explain variations

Loss of large packets on one of the e-VLBI (-> JIVE) paths
– resolved by configuration
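
The first case above attributes a TCP throughput reduction to packet reordering. The slides do not spell out the mechanism; a common one, assumed here, is that a standard TCP sender treats three duplicate ACKs as a loss signal, retransmits and reduces its congestion window even though nothing was lost. The sketch below models a receiver that sends one cumulative ACK per arriving segment (no delayed ACKs, no duplicates) and counts how often the duplicate-ACK threshold is reached. The function name and the segment numbering are hypothetical.

```python
def spurious_fast_retransmits(arrival_order, dupack_threshold=3):
    """Count fast-retransmit triggers caused purely by reordering.

    arrival_order lists segment numbers (0, 1, 2, ...) in the order they
    reach the receiver; every segment is assumed to arrive exactly once.
    """
    received = set()
    next_expected = 0   # lowest segment number not yet received
    dupacks = 0         # consecutive duplicate ACKs for next_expected
    triggers = 0
    for seq in arrival_order:
        received.add(seq)
        if seq == next_expected:
            # The gap is filled: the cumulative ACK advances past
            # everything already buffered.
            while next_expected in received:
                next_expected += 1
            dupacks = 0
        else:
            # Out-of-order arrival: the receiver repeats its last ACK.
            dupacks += 1
            if dupacks == dupack_threshold:
                triggers += 1
    return triggers


if __name__ == "__main__":
    print(spurious_fast_retransmits([0, 1, 2, 3, 4]))  # in-order delivery
    print(spurious_fast_retransmits([1, 0, 2, 3, 4]))  # swap of neighbours
    print(spurious_fast_retransmits([1, 2, 3, 0, 4]))  # segment 0 delayed by three
```

In this model a swap of adjacent segments is harmless, while a segment overtaken by three or more later segments looks like a loss to the sender.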

GN2 PERT cases (ongoing)

ITER VPN
– information-gathering phase – VPN makes traditional diagnostics hard

e-VLBI
– ongoing investigation – infrequent tests and network changes over time

EU->US routing through Japan
– ongoing, but maybe not really a case for PERT?
    • or, should we have all (GTREN) BGP geeks participate as SMEs?

GN2 PERT Experience

Weaknesses
– Few, and often difficult (but interesting!) cases
    • Mostly large groups: DEISA, e-VLBI (JIVE), DESY/FNAL, ITER...
    • Trying to open up to larger customer base
– It's hard to close cases!
    • lack of clear success indicators
– Friction can be further reduced
    • weekly Case Manager handover, PTS, SME involvement

Strengths
– Brings users (researchers) closer to NOCs
– Mutual learning experience
    • Bodes well for PERT Knowledge Base
    • Provides vital input on measurement infrastructure requirements
– Inspires PERT activities in NRENs

SWITCH PERT Example: Opera oberta

Opera oberta
– high-quality multicast transmissions of opera from Barcelona and Madrid
– mostly Spanish participants, but a few in FR, MX, and now CH
– currently 9 Mb/s DVB+D5.1, experimenting with HDTV (~15 Mb/s)

Customer (EPFL) contacted us
– early tests were unsatisfactory (due to problems at source, it turns out)
– set up NOC support (awareness, test participation, monitoring)
– one transmission still failed (due to misconfigured SWITCH router)
– fixed problem, improved NOC support (out-of-hours service)
– next transmission (last night) a success – it had to be...

-> include aspects of availability and support in “performance” notion

Conclusions

• Significant potential for service improvements on current infrastructure
– end-host tuning, delay-robust protocols, better NOC cooperation

• PERT concept really helps
– improves customers' “reach” into backbones
– “user interface” can still be improved

• Leverage new developments in the future
– backbone measurement instrumentation, e.g. GN2 JRA1 PerfSONAR
– Premium IP and other “on-demand” services

• Long-term benefits
– smart users + dumb networks -> unexpected performance and innovation
    • The end-to-end principles are honoured!