Resilient Overlay Networks By David Andersen, Hari Balakrishnan, Frans Kaashoek, and Robert Morris MIT RON Paper from ACM Oct. 2001 Advanced Operating Systems Overview by Jeff Spiegel University of Utah

Contents

• Features

• BGP-4 and AS Limitations

• Overlay Networks

• Application metrics

• Policies

• Routing

• Forwarding

RON Features

• Allows distributed applications to detect and recover from path outages or delays.

• Application layer overlay network.

• Detects and routes around outages in under 20 seconds, often bypassing BGP gateways.

• Forwarding packets through a single intermediate node is usually sufficient to route around a fault.

Interdomain Background

• The Internet is divided into Autonomous Systems (ASes), which provide the aggregation needed to scale.

• Each AS handles its own intradomain routing.

• BGP-4 provides inter-AS routing.

• BGP-4 does not detect congestion or base routing choices on it.

• BGP path failures can take ten or more minutes to correct.

Autonomous Systems

• Provide scalability

• Intradomain default routes lead to border routers

• BGP's goal is to find any loop-free path to the destination

• BGP does not use routing metrics

[Figure: Autonomous system 1 and Autonomous system 2, with routers R1 through R6]

BGP and limited pathways between clusters of Autonomous Systems

[Figure labels: backbone service provider, peering points, large corporations, small corporation, "consumer" ISPs]

BGP-4’s Full Path Routing Limits Alternative Routing Paths

• AS1 advertises path <AS1, AS2, AS4> to reach networks 128.96 and 192.4.153

• Loops are avoided because an AS rejects any advertisement whose path already contains its own AS number

[Figure: example network]

• Backbone network (AS 1)

• Regional provider A (AS 2)

• Regional provider B (AS 3)

• Customer P (AS 4): 128.96, 192.4.153

• Customer Q (AS 5): 192.4.32, 192.4.3

• Customer R (AS 6): 192.12.69

• Customer S (AS 7): 192.4.54, 192.4.23
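
The loop-avoidance rule on this slide can be sketched in a few lines of Python. This is only an illustration of the path-vector idea, not BGP-4 itself: the function names `accept_advertisement` and `readvertise` are hypothetical, and AS numbers are plain integers.

```python
def accept_advertisement(my_as, as_path):
    """Return True if the advertised AS path does not already contain my_as."""
    return my_as not in as_path


def readvertise(my_as, as_path):
    """Prepend our own AS number before passing the advertisement on."""
    return [my_as] + list(as_path)


# Customer P (AS 4) originates the route; AS 2 and then AS 1 prepend themselves.
path_from_as2 = readvertise(2, [4])
path_from_as1 = readvertise(1, path_from_as2)

# AS 3 is not in the path, so it may accept the advertisement.
as3_accepts = accept_advertisement(3, path_from_as1)
# AS 2 finds itself in the path and rejects it, which prevents a loop.
as2_accepts = accept_advertisement(2, path_from_as1)
```

Because the full path travels with every advertisement, an AS only ever sees the single path its neighbors chose to advertise, which is why alternative paths stay hidden.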

BGP Limitation Example and how RON helps

• Initially, an ArosNet client is communicating with MIT

• The link from Aros to UUNet is cut

• Aros can still reach Utah through the private network, but it cannot learn of Utah's connection to MIT

• BGP cannot route around this problem

• RON provides routing that bypasses BGP.

RON’s Sixteen Nodes

Overlay Networks

• The Internet itself was the first overlay network (ON), built over the telephone network

• Detour uses IP-in-IP encapsulation

• RON uses redundancy to react to path outages and performance failures within a few seconds.

• RON adds application-specific metrics

• Gnutella-style systems use a similar concept but focus more on storage (see the PAST paper), while RON focuses on routing.

Application Metrics

• RON nodes maintain forwarding tables

• Metrics include latency, packet loss rate, and throughput

• Metrics are obtained by active probing and by observing data transfers.

Application Specific Path Selection

• RON permits different applications to use different metrics for selecting paths.

– For example, a packet forwarding application may be more concerned with loss.

– A multi-party conferencing application may be more concerned with latency.

Latency and Loss Evaluation(Path Evaluation/Selection)

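• Low-frequency probes detect outages

• Latency is a weighted moving average of samples

– $\text{lat} \leftarrow 0.9 \cdot \text{lat} + 0.1 \cdot \text{new\_sample}$

• Total latency of a RON path

– $\text{lat}_{\text{path}} = \sum_{i \in \text{path}} \text{lat}_i$

• Loss rate of a RON path

– $\text{lossrate}_{\text{path}} = 1 - \prod_{i \in \text{path}} (1 - \text{lossrate}_i)$

– Example: hop 1 loses 20%, hop 2 loses 10%

– $\text{lossrate}_{\text{path}} = 1 - (0.8 \times 0.9) = 0.28$, or 28%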
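
The three formulas on this slide translate directly into code. The following Python sketch is an illustration only: the function names and sample values are made up here, and the 0.9 weight is the one quoted on the slide.

```python
from math import prod

ALPHA = 0.9  # weight on the previous average, as in lat = 0.9 * lat + 0.1 * new_sample


def update_latency(lat, new_sample, alpha=ALPHA):
    """Fold one new latency sample into the weighted moving average."""
    return alpha * lat + (1.0 - alpha) * new_sample


def path_latency(hop_latencies):
    """Total path latency is the sum of the per-hop latencies."""
    return sum(hop_latencies)


def path_loss_rate(hop_loss_rates):
    """A packet survives the path only if it survives every hop."""
    return 1.0 - prod(1.0 - p for p in hop_loss_rates)


smoothed = update_latency(100.0, 200.0)   # previous average 100 ms, new sample 200 ms
total = path_latency([30.0, 45.0])        # two hops, in milliseconds
loss = path_loss_rate([0.20, 0.10])       # the slide's example: about 0.28
```

The moving average reacts slowly by design: a single 200 ms sample moves a 100 ms average only to about 110 ms.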

Throughput Evaluation(Path Evaluation/Selection cont.)

• RON uses the following metric for throughput evaluation

– $\text{score} = \dfrac{\sqrt{1.5}}{\text{rtt} \cdot \sqrt{p}}$

– p is the packet loss probability, with a minimum value of 0.02. The minimum is needed because the score would go to infinity as p → 0

• Research in this area is ongoing (pairs of probes, non-invasive probes).
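
A short Python sketch of the throughput score, including the 0.02 floor on p. The function name `throughput_score` and the choice of seconds as the rtt unit are assumptions made for this example.

```python
from math import sqrt

MIN_LOSS = 0.02  # floor on p from the slide; keeps the score finite as p approaches 0


def throughput_score(rtt, p, min_loss=MIN_LOSS):
    """score = sqrt(1.5) / (rtt * sqrt(p)), with p clamped to at least min_loss."""
    if rtt <= 0:
        raise ValueError("rtt must be positive")
    p = max(p, min_loss)
    return sqrt(1.5) / (rtt * sqrt(p))


clean = throughput_score(0.100, 0.0)    # loss-free path, clamped to p = 0.02
floor = throughput_score(0.100, 0.02)   # same score as the clamped case
lossy = throughput_score(0.100, 0.08)   # four times the loss gives half the score
```

Since the score is only used to rank candidate paths against one another, its absolute value matters less than its ordering.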

RON Performance Databases

• Decentralized Databases

Routing

• The entry node tags the RON packet header with a Flow ID. Subsequent nodes attempt to keep the same forwarding path for the same Flow ID.

• Because a RON is small relative to the Internet, it can afford richer routing metrics.

• Link-state routing is feasible on a small RON
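
With full link-state information on a small RON, a node can compare the direct path against every path that uses one intermediate node. The Python sketch below illustrates that comparison for the latency metric. The function `best_path`, the dictionary layout, and the latency figures are all hypothetical; the node names are borrowed from the earlier Aros, Utah, and MIT example.

```python
INF = float("inf")


def best_path(src, dst, latency):
    """Pick the lowest-latency path using at most one intermediate node.

    latency maps (a, b) to the smoothed latency of the virtual link a -> b;
    a missing entry is treated as an unusable link.
    """
    nodes = {a for a, _ in latency} | {b for _, b in latency}
    best = ([src, dst], latency.get((src, dst), INF))
    for mid in sorted(nodes - {src, dst}):
        cost = latency.get((src, mid), INF) + latency.get((mid, dst), INF)
        if cost < best[1]:
            best = ([src, mid, dst], cost)
    return best


# Made-up latencies in milliseconds; the direct Aros -> MIT link is down.
links = {
    ("Aros", "Utah"): 10.0,
    ("Utah", "MIT"): 40.0,
}
path, cost = best_path("Aros", "MIT", links)
```

Limiting the search to one intermediate node keeps the comparison linear in the number of nodes per destination.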

Routers making router tables

Link-State Updates

• Each RON node in an N-node network periodically sends N-1 requests, one over each of its virtual links, for link data.

• Each request is a packet of the RON routing protocol type.

• These requests ensure that a RON node keeps up-to-date link information unless it is severed from the network.
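
Since each of the N nodes sends N-1 requests, one round of updates costs N(N-1) requests in total. The small Python sketch below shows how that count grows; the function name `requests_per_round` is made up for this example, and 12 and 16 are the RON1 and RON2 sizes quoted in the results.

```python
def requests_per_round(n):
    """Each of n nodes queries its n - 1 virtual links once per round."""
    return n * (n - 1)


# 12 and 16 nodes match RON1 and RON2; 50 is an arbitrary larger overlay.
counts = {n: requests_per_round(n) for n in (12, 16, 50)}
```

Here `counts` is `{12: 132, 16: 240, 50: 2450}`: roughly quadrupling the node count multiplies the traffic by about 18.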

RON Routing Policies

• The policy tag specifies clique privileges for clique links.

• The policy tag also specifies general policies, which use Berkeley Packet Filters (BPF) to refuse connections.

– BPF allows separate UDP and TCP filters. RON uses UDP

– New connection attempts that match the filters are refused

• No fragmentation; RON relies on end-to-end path MTU discovery instead

RON Forwarder

• Forwarding algorithm:

    if (FlowID is valid) { forward by FlowID }
    else { forward by table[Policy][Metric][Dest] }

• Flow ID – speeds up routing by keeping the same path.

• The client provides routing flags that specify the application's metric.

• Packet type – for example, DNS and HTTP versus FTP may steer metric selection toward minimal latency or maximal throughput.
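
The forwarding algorithm above can be sketched in Python as a flow-cache check followed by a three-level table lookup. Everything named here is an assumption for illustration: the packet is a plain dictionary, `flow_cache` maps flow IDs to next hops, and caching the looked-up hop for later packets is one plausible way to "keep the same path", not necessarily how RON implements it.

```python
def next_hop(packet, flow_cache, table):
    """Forward by Flow ID when it is known, else by table[policy][metric][dest]."""
    flow_id = packet.get("flow_id")
    if flow_id is not None and flow_id in flow_cache:
        return flow_cache[flow_id]
    hop = table[packet["policy"]][packet["metric"]][packet["dest"]]
    if flow_id is not None:
        # Assumption: remember the choice so the flow stays on one path.
        flow_cache[flow_id] = hop
    return hop


# Hypothetical table: one policy, two metrics, one destination.
table = {"default": {"latency": {"MIT": "Utah"}, "loss": {"MIT": "MIT"}}}
cache = {}
pkt = {"flow_id": 7, "policy": "default", "metric": "latency", "dest": "MIT"}

first = next_hop(pkt, cache, table)           # table lookup, then cached
table["default"]["latency"]["MIT"] = "MIT"    # the routing table changes later
second = next_hop(pkt, cache, table)          # cached flow keeps its old path
```

The example shows the trade-off: the cached flow avoids a lookup and packet reordering, but it does not pick up the table change until its cache entry is dropped.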

Results

• Loss rate – Improvement of 5% due to rerouting around path outages. RON currently uses bidirectional information to optimize unidirectional loss rates, which does not work well on asymmetric paths.

• Latency – ‘RON improves latency by tens to hundreds of milliseconds on slower paths, 11% samples saw 40 ms improvement in RON1 (12 nodes) vs 8% in RON2 (16 nodes).’

Results (cont)

• Throughput – RON strives for a 50% bandwidth utilization on paths and probes to maintain this utilization.

• Found ‘5% of samples doubled’ their throughput with this algorithm in place vs. standard BGP routing. ‘2% of samples increased throughput by factor of five’

Discussion

• Cliques – their advantages/disadvantages

• Scalability – Limits the types of applications. RON's link-state algorithms scale as O(N^2). Probes are sent out every 5 minutes to check on network status.

• Security – The PAST paper has more to say about overlay network security than the RON paper does. The authors consider a RON small enough for administrators to keep malicious entities out.

• VPNs could have greatly improved performance with RONs.