Post on 30-Dec-2015

Network Simulation and Testing

Polly Huang

EE NTU

http://cc.ee.ntu.edu.tw/~phuang

phuang@cc.ee.ntu.edu.tw

Polly Huang, NTU EE 2

Dynamics Papers

• Hongsuda Tangmunarunkit, Ramesh Govindan, and Scott Shenker. Internet path inflation due to policy routing. In Proceedings of the SPIE ITCom, pages 188-195, Denver, CO, USA, August 2001. SPIE

• Lixin Gao. On inferring autonomous system relationships in the Internet. IEEE/ACM Transactions on Networking, 9(6):733-745, December 2001

• Vern Paxson. End-to-end Internet packet dynamics. IEEE/ACM Transactions on Networking, 7(3):277-292, June 1999

• Craig Labovitz, G. Robert Malan, Farnam Jahanian. Internet Routing Instability. IEEE/ACM Transactions on Networking, 6(5):515-528, October 1998

Doing Your Own Analysis

• Having a problem

• Need to simulate or to test

• Define experiments
  – Base scenarios
  – Scaling factors
  – Metrics of investigation

Base Scenarios

• The source models
  – To generate traffic

• The topology models
  – To generate the network

• Then?

Internet Dynamics

• How traffic flows across the network
  – Routing
  – Shortest path?

• How failures occur
  – Packets dropped
  – Routes failed
  – i.i.d.?

Policy routing

Packet/Route dynamics

Identifying Internet Dynamics

Routing Policy

Packet Dynamics

Routing Dynamics

To the best of our knowledge, we could now generate:

AS-level topology

Hierarchical router-level topology

The Problem

• Does it matter what routing computation we use?

• Equivalently
  – Can I just do shortest-path computation?

Topology with Policy

• Internet Path Inflation Due to Policy Routing

• Hongsuda Tangmunarunkit, Ramesh Govindan, Scott Shenker

• In Proceedings of the SPIE ITCom, pages 188-195, Denver, CO, USA, August 2001. SPIE

Paper of Choice

• Methodological value
  – A simple 're-examine' type of study
  – To strengthen the technical value of prior work

• Technical value
  – Actual paths are not the shortest, due to routing policy.
  – The routing policy is business-driven and can be quite hard to obtain.
  – As shown in this paper, for simulation studies concerning large-scale route path characteristics, a simple shortest-AS routing policy may be sufficient.

Inter-AS Routing

[Figure: an AS-level graph (AS 1 to AS 5) between a source and a destination, with one path labeled "shortest"]

Hierarchical Routing

[Figure: a path from source to destination, labeled "Inter-AS shortest" and "Intra-AS shortest"]

Flat Routing

[Figure: the flat "shortest" path from source to destination, with the label "5:3"]

Hierarchical Routing is not optimal

Or

Routes are inflated

How sub-optimal?

Prior Work

• Based on
  – An actual router-level graph
  – An actual AS-level graph at the same time
  – Overlay the AS-level graph on the router-level graph

• Compute
  – For each source-destination pair
  – Shortest path using hierarchical routing
  – Shortest path using flat routing

• Compare route length
  – In number of router hops

Prior Conclusions

• 80% of the paths are inflated

• 20% of the paths are inflated by more than 50%

• There exists a better detour for 50% of the source-destination pairs
  – There exists an intermediate node i such that Length(s-i-d) < Length(s-d)

This Work

• To address 2 shortcomings
  – There's now a newer router-level graph
  – There's now a more sophisticated policy model

• Paper #4

• Inter-AS routing is not quite ‘shortest-AS routing’

Newer vs. Older Graph

• Inflation difference is not the same
  – Difference is larger in the newer graph
  – Due to the newer graph being larger

• Inflation ratio remains the same
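The two inflation metrics compared on this slide can be sketched in a few lines. This assumes, as definitions the slides do not spell out, that the inflation difference is the hierarchical path length minus the flat path length and that the inflation ratio is their quotient, both in router hops. The function name and the sample hop counts are made up for illustration.

```python
def inflation(hier_hops, flat_hops):
    """Return (difference, ratio) for one source-destination pair.

    Assumed definitions: difference = hierarchical - flat,
    ratio = hierarchical / flat, both measured in router hops.
    """
    if flat_hops <= 0:
        raise ValueError("flat path length must be positive")
    return hier_hops - flat_hops, hier_hops / flat_hops


# Hypothetical (hierarchical, flat) hop counts for three pairs.
pairs = [(5, 3), (4, 4), (9, 6)]
results = [inflation(h, f) for h, f in pairs]

# Share of pairs whose hierarchical route is longer than the flat one.
inflated_share = sum(1 for diff, _ in results if diff > 0) / len(results)
```

A larger graph tends to have longer paths, so the difference can grow while the ratio stays put, which is one way to read the two bullets above.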

Shortest-AS vs. Policy-AS Routing

• Shortest-AS
  – Simplified model
  – Every AS is equal

• Policy-AS
  – Realistic model
  – Not all ASs are the same
    • Some are provider ASs
    • Some are customer ASs
    • Customer ASs do not transit traffic

Consider TANET → CHT

[Figure: the ASs CHT, NTU, TANET, and UUNET, annotated with "Provider" and "Customer" labels]

Through NTU?

Through UUNET?

Routing with Constraints

• Routes could be
  – Going up
  – Going down
  – Going up and then down

• Routes can never be
  – Going down and then up

Inferring the Constraints

• On Inferring Autonomous System Relationships in the Internet

• Lixin Gao

• IEEE/ACM Transactions on Networking, 9(6):733-745, December 2001

Not All ASs the Same

• 2 types of ASs
  – Customer
  – Provider

• 3 types of relationships
  – Customer-provider
  – Provider-provider
    • Peer-peer
    • Sibling-sibling

Customer-Provider

• Formal definition
  – A provider transits for its customer
  – A customer does not transit for its provider

• Informal
  – Provider: I'll take any traffic
  – Customer: I'll take only the traffic to me (or my customers)

Peer-Peer

• Formal definition
  – A provider does not transit for another provider

• Informal
  – I'll take only the traffic to me (or my customers)
  – You'll take only the traffic to you (or your customers)

Sibling-Sibling

• Formal definition
  – A provider transits for another provider

• Informal
  – I'll take any traffic
  – You'll take any traffic

Never “Going Down and then Up”

• A provider-customer link can be followed only by
  – A provider-customer link
  – (Or a sibling-sibling link)

• A peer-peer link can be followed only by
  – A provider-customer link
  – (Or a sibling-sibling link)
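The rule on this slide can be checked mechanically for one AS path. The sketch below is an illustration only: it assumes each link in the path has already been labeled as "up" (customer to provider), "down" (provider to customer), "peer", or "sibling", and the function name and labels are made up for this example.

```python
def is_valley_free(link_types):
    """Return True if a path never goes 'down and then up'.

    link_types lists the labels of consecutive links along the path:
    "up" (customer to provider), "down" (provider to customer),
    "peer", or "sibling".
    """
    descended = False  # becomes True after a "down" or "peer" link
    for kind in link_types:
        if kind not in ("up", "down", "peer", "sibling"):
            raise ValueError(f"unknown link type: {kind}")
        if kind == "sibling":
            continue  # sibling links are allowed anywhere
        if descended and kind in ("up", "peer"):
            return False  # only "down" may follow "down" or "peer"
        if kind in ("down", "peer"):
            descended = True
    return True
```

For example, `["up", "peer", "down"]` passes, while `["down", "up"]` and `["peer", "peer"]` are rejected.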

Heuristics

• Compute out-degrees

• For each AS path in the routing tables
  – The first AS with the maximum degree is the root of the hierarchy
  – From the root, draw provider-customer relationships down toward both ends of the AS path

Determining Siblings

• After going through all AS paths

• Any two ASs that are both provider and customer to each other are siblings
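A minimal sketch of the two steps above (degree-based root selection, then sibling detection) might look like the following. It is a simplification, not the paper's full algorithm: it assumes the degree of an AS is the number of distinct neighbors seen in the given AS paths, and the function name and return format are made up for illustration.

```python
from collections import defaultdict


def infer_relationships(as_paths):
    """Return (provider_customer, siblings) inferred from AS paths.

    provider_customer is a set of (provider, customer) tuples;
    siblings is a set of frozensets holding two ASs each.
    """
    # Step 1: degree = number of distinct neighbors (an assumption).
    neighbors = defaultdict(set)
    for path in as_paths:
        for a, b in zip(path, path[1:]):
            neighbors[a].add(b)
            neighbors[b].add(a)
    degree = {asn: len(adj) for asn, adj in neighbors.items()}

    # Step 2: the first max-degree AS is the root of each path;
    # providers point downhill toward both ends of the path.
    transit = set()
    for path in as_paths:
        if len(path) < 2:
            continue
        top = max(range(len(path)), key=lambda i: degree[path[i]])
        for i, (a, b) in enumerate(zip(path, path[1:])):
            transit.add((b, a) if i < top else (a, b))

    # Step 3: pairs that provide for each other are siblings.
    siblings = {frozenset(p) for p in transit if p[::-1] in transit}
    provider_customer = {p for p in transit if frozenset(p) not in siblings}
    return provider_customer, siblings
```

With the paths `["A", "B", "C"]` and `["D", "B", "C"]`, AS B has the highest degree and comes out as the provider of A, C, and D.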

Determining Peers

• Do another pass over the AS paths in the routing tables

• For each AS path
  – The top AS, if it has no sibling relationship with its neighboring ASs,
  – Could have a peering relationship with its higher out-degree neighbor
  – Provided the top AS and that neighbor are comparable in out-degree

Back to Path Inflation

• Draw the customer-provider, peer-peer, and sibling-sibling relationships on the overlay AS graph

• Compute the best routes under the ‘never going down and then up’ constraint

• Compare the inflation difference and ratio again, with each of these running at the inter-AS level
  – Shortest
  – Policy
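One way to compute AS-hop counts under both models is a breadth-first search whose state records whether the route has already gone down or crossed a peer link. The sketch below is an illustration under assumptions: the helper names, the link labels, and the five-AS example topology (S, D, X, T1, T2) are all hypothetical and not taken from the paper.

```python
from collections import deque


def build_links(customer_provider, peers=(), siblings=()):
    """Build an adjacency map: AS -> list of (neighbor, link type)."""
    links = {}

    def add(a, b, kind):
        links.setdefault(a, []).append((b, kind))

    for customer, provider in customer_provider:
        add(customer, provider, "up")
        add(provider, customer, "down")
    for a, b in peers:
        add(a, b, "peer")
        add(b, a, "peer")
    for a, b in siblings:
        add(a, b, "sibling")
        add(b, a, "sibling")
    return links


def shortest_as_hops(links, src, dst, policy=True):
    """AS hops from src to dst; with policy=True, never down then up."""
    start = (src, False)  # (AS, has the route already descended?)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node, descended = queue.popleft()
        if node == dst:
            return dist[(node, descended)]
        for nxt, kind in links.get(node, []):
            if policy and descended and kind in ("up", "peer"):
                continue  # would be 'down and then up'
            state = (nxt, descended or (policy and kind in ("down", "peer")))
            if state not in dist:
                dist[state] = dist[(node, descended)] + 1
                queue.append(state)
    return None  # unreachable


# Hypothetical topology: X buys transit from S and D; T1 and T2 peer.
links = build_links(
    [("X", "S"), ("X", "D"), ("S", "T1"), ("D", "T2")],
    peers=[("T1", "T2")],
)
```

In this example the plain shortest path S-X-D takes 2 AS hops but uses the customer X for transit, so the policy route S-T1-T2-D takes 3.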

Shortest vs. Policy Routing

• Pretty much the same in terms of both
  – Inflation difference
  – Inflation ratio

Therefore

• The observations from the prior work hold
  – With a newer graph
  – With the more realistic inter-AS policy routing

Now forget path inflation

How far is shortest inter-AS routing from policy inter-AS routing?

Shortest vs. Policy

• In AS hops
  – 95% of paths have the same length
  – Policy routes are always longer

• In router hops
  – 84% of paths have the same length
  – Some policy routes are longer, some shorter

95% and 84% are pretty good numbers

Therefore shortest path at the inter-AS level might be OK…

To Answer the Question

• Can we simply do shortest path computation?
  – A likely yes for the AS-level graph
  – A firm no for the hierarchical graph

• Must separate inter-AS shortest and intra-AS shortest

Questions?

Identifying Internet Dynamics

Routing Policy

Packet Dynamics

Routing Dynamics

It’s never a perfect world…

The Problem

• But how perfect is the Internet?

• The Internet
  – A network of computers with stored information
  – Some valuable, some relevant
  – You participate by putting information up or getting information down
  – From time to time, you can't quite do some of the things you want to do

Why is that?

At the philosophical level…

Humans are so prone to failure. And the Internet is human-made.

But, Seriously…

Consider loading a Web page

Web Surfing Failures

• The ‘window’ waving forever?

• An error message saying the network is not reachable

• An error message saying the server is too busy

• An error message saying the server is down

• Anything else?

Network Specific Failures

• The ‘window’ waving forever?

• An error message saying the network is not reachable

• An error message saying the server is too busy

• An error message saying the server is down

• Anything else?

The Causes

• The 'window' waving forever
  – Congestion in the network
  – Buffer overflow
  – Packet drops

• An error message saying the network is not reachable
  – Network outage
  – Broken cables, frozen routers
  – Route re-computation
  – Route instability

Back to the Problem

• But how perfect is the Internet?

• Equivalently
  – Packets can be dropped
    • How frequently
    • How much
  – Routes may be unstable
    • How frequently
    • For how long

Significance

• Knowing the characteristics of packet drops and route instability helps
  – Design for fault tolerance
  – Test for fault tolerance

There are tons of formal and informal studies on the dynamics…

Let's take a look at a couple of the classics

Packet Dynamics

• End-to-End Internet Packet Dynamics

• Vern Paxson

• IEEE/ACM Transactions on Networking, 7(3):277-292, June 1999

Emphasis in Reverse Order

• Real subject of study
  – Packet loss
  – Packet delay

• Necessary assessment
  – The unexpected
  – Bandwidth estimation

Measurement

• Instrumentation
  – 35 sites, 9 countries
  – Education, research, provider, company

• 2 runs
  – N1: Dec 1994
  – N2: Nov-Dec 1995
  – 21 sites in common

Measurement Methodology

• Each site runs NPD
  – A daemon program
  – Sender side sends a 100 KB TCP transfer

• Sender and receiver sides both
  – tcpdump the packets

• Noteworthy
  – Measurements occurred at Poisson arrival times
    • Unbiased with respect to the time of measurement
  – N2 used a big max window size
    • Prevents the window size from limiting the TCP connection throughput
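Scheduling measurements at Poisson arrival times amounts to drawing independent exponential gaps between them. The sketch below shows the idea only; the function name, the mean interval, and the duration are hypothetical and not the values used in the study.

```python
import random


def poisson_schedule(mean_interval_s, duration_s, seed=None):
    """Return measurement start times (seconds) within [0, duration_s).

    Gaps between consecutive measurements are exponentially
    distributed with the given mean, so arrivals form a Poisson process.
    """
    rng = random.Random(seed)
    now = 0.0
    times = []
    while True:
        now += rng.expovariate(1.0 / mean_interval_s)
        if now >= duration_s:
            return times
        times.append(now)


# Hypothetical setup: about one measurement per minute for an hour.
schedule = poisson_schedule(mean_interval_s=60.0, duration_s=3600.0, seed=1)
```

Because the gaps are memoryless, no particular time of day or phase of a periodic network event is favored.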

Packet Loss

• Overall loss rate
  – N1 2.7%, N2 5.2%
  – N2 higher because of the big max window?
    • I.e., pumping more data into the network, therefore more loss?

• Big max window in N2 is not a factor
  – Shown by separating data and ack loss
  – Assumption: ack traffic is at a lower rate (about half)
    • Won't stress the network
  – Ack loss: N1 2.88%, N2 5.14%
  – Data loss: N1 2.65%, N2 5.28%

Quiescent vs. Busy

• Definition
  – Quiescent: connections without ack drops
  – Busy: otherwise

• About 50% of the connections are quiescent

• For busy connections
  – Loss rate: N1 5.7%, N2 9.2%

More Numbers

• Geographical effect

• Time of the day effect

Towards a Markov Chain Model

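• Over hours
  – A no-loss connection now indicates further no-loss connections in the future
  – A lossy connection now indicates further lossy connections in the future

• Over minutes
  – The rate remains similar

[Figure: two-state Markov chain with states "No loss" and "Loss". Self-transition probabilities: pn on "No loss", pl on "Loss". Cross transitions: 1-pn from "No loss" to "Loss", 1-pl from "Loss" to "No loss".]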

Another Classification

• Data
  – Loaded data: packets experiencing queueing delay due to their own connection
  – Unloaded data: packets not experiencing queueing delay due to their own connection
  – Bottleneck bandwidth measurement is needed here to determine whether a packet is loaded or not

• Ack
  – Simply acks

3 Major Observations

• Although the loss rate is very high (47%, 65%, 68%), all connections complete in 10 minutes

• Loss of data and loss of acks are not correlated

• Cumulative distribution of per-connection loss rate
  – Exponential for data
  – Not so exponential for acks
  – Adaptive sampling contributing to the exponential observation?

More on the Markov Chain Model

• The loss rate Pu
  – The rate of loss

• The conditional loss rate Pc
  – The rate of loss when the previous packet is lost

• Contrary to the earlier work
  – Losses are bursty
  – Duration shows a Pareto upper tail
  – (Polly: maybe more log-normal)
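To make the two rates concrete, they can be estimated from a per-packet loss trace as follows. This is a sketch with made-up names and a made-up trace: Pu is the fraction of packets lost, and Pc is the fraction of packets lost among those whose predecessor was lost.

```python
def loss_rates(lost):
    """Return (pu, pc) for a sequence of per-packet loss flags.

    pu: unconditional loss rate.
    pc: loss rate given that the previous packet was lost.
    Either value is None when it cannot be estimated.
    """
    flags = [bool(x) for x in lost]
    pu = sum(flags) / len(flags) if flags else None
    # Outcomes of packets that directly follow a lost packet.
    after_loss = [cur for prev, cur in zip(flags, flags[1:]) if prev]
    pc = sum(after_loss) / len(after_loss) if after_loss else None
    return pu, pc


# Hypothetical trace: 1 marks a lost packet.
trace = [0, 1, 1, 0, 0, 1, 0, 0, 0, 0]
pu, pc = loss_rates(trace)
```

If losses were independent, Pc would be close to Pu; a Pc well above Pu is what bursty loss looks like in these two numbers.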

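You might ask… pl, pn?

[Figure: the same two-state Markov chain, with states "No loss" and "Loss", self-transition probabilities pn and pl, and cross-transition probabilities 1-pn and 1-pl]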

Values for the pl’s

                N1    N2
Loaded data     49%   50%
Unloaded data   20%   25%
Ack             25%   31%
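A two-state chain like this is straightforward to use as a loss generator in a simulation. The sketch below assumes pn is the probability of staying in "No loss" and pl the probability of staying in "Loss"; the function names are made up, and any pn value plugged in is an assumption, since the table only gives pl.

```python
import random


def simulate_losses(n_packets, pn, pl, seed=None):
    """Return a list of booleans (True = packet lost).

    The chain starts in the no-loss state and takes one
    transition before each packet.
    """
    rng = random.Random(seed)
    lost = False
    trace = []
    for _ in range(n_packets):
        stay = pl if lost else pn
        if rng.random() >= stay:
            lost = not lost  # leave the current state
        trace.append(lost)
    return trace


def stationary_loss_rate(pn, pl):
    """Long-run fraction of lost packets for the two-state chain."""
    leave_ok = 1.0 - pn
    leave_loss = 1.0 - pl
    if leave_ok + leave_loss == 0.0:
        raise ValueError("chain never changes state; rate is undefined")
    return leave_ok / (leave_ok + leave_loss)
```

For instance, with pl = 0.5 (close to the loaded-data row) and a hypothetical pn = 0.95, `stationary_loss_rate(0.95, 0.5)` gives a long-run loss rate of about 9%.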

Possible Invariant

• Conditional loss rate

• The value remains relatively close over the 1-year period

• More up-to-date data to verify this?

• Is the loss burst size log-normal?

• Both are interesting research questions

Packet Delay

• Looking at one-way transit times (OTT)

• There is a model for the OTT distribution
  – Shifted gamma
  – Parameters change with time and path…

• Internet paths are asymmetric
  – OTT one way is often not equal to OTT the other way

Timing Compression

• Ack compression events are small

• So they do not really pose threats to
  – Ack clocking
  – Rate-estimation-based control

• Data compression is very rare
  – For outlier filtering

Queueing Delay

• Variance of OTT over different time scales
  – For each time scale τ
  – Divide the packet arrivals into intervals of τ
  – For all pairs of neighboring intervals l, r
    • ml: the median of OTT in interval l
    • mr: the median of OTT in interval r
    • Calculate (ml-mr)

• Variance of OTT over τ is the median of all (ml-mr)
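The procedure above can be sketched directly. Two assumptions are made here and are not stated on the slide: the difference between neighboring medians is taken in absolute value, and intervals with no packets are simply skipped. The function name and the sample data are made up.

```python
from statistics import median


def ott_variation(samples, tau):
    """Queueing variation at time scale tau.

    samples: list of (arrival_time, ott) pairs, times in the same
    unit as tau. Returns the median of |ml - mr| over neighboring
    intervals of length tau, or None if no neighboring pair exists.
    """
    buckets = {}
    for arrival, ott in samples:
        buckets.setdefault(int(arrival // tau), []).append(ott)
    medians = {k: median(v) for k, v in buckets.items()}
    # Assumption: absolute difference between adjacent interval medians.
    diffs = [abs(medians[k] - medians[k + 1])
             for k in medians if k + 1 in medians]
    return median(diffs) if diffs else None


# Hypothetical (arrival time in s, OTT in ms) samples.
samples = [(0.0, 10), (0.4, 12), (1.1, 20), (1.5, 22), (2.2, 11), (2.9, 13)]
```

Running this for a range of τ values and comparing the results is the kind of scan the following slide uses to find the dominant scale.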

Finding the Dominant Scale

• Looking for τ's whose queueing variance is large
  – Where control is most needed

• For example, if those τ's are smaller than the RTT
  – Then TCP doesn't need to bother adapting to queueing fluctuations

Oh Well

• Queueing delay variations occur
  – Dominantly on 0.1-1 sec scales
  – But non-negligibly on larger scales

Share of Bandwidth

• Pretty much uniformly distributed

Conclusions on Analysis

• Common assumptions violated
  – In-order packet delivery
  – FIFO queueing
  – Independent loss
  – Single congestion time scale
  – Path symmetry

• Behavior
  – Very wide range, no single typical behavior

Conclusions on Design

• Measurement methodology
  – TCP-based measurement shown viable
  – Sender-side only is inferior

• TCP implementation
  – Sufficiently conservative

The Pathologies

The strange stuff

Packet Re-Ordering

• Varying widely and too few samples

• Therefore, deriving only a rule of thumb
  – Internet paths sometimes experience bad reordering
  – Mainly due to route flapping
  – Occasionally this funny case of router implementation
    • Buffering packets while processing a route update
    • Sending these packets interleaved with the post-update arrivals

Orthogonal to TCP SACK

• Receiver end modification
  – 20 msec wait before sending a duplicate acknowledgement
  – Waiting for re-ordered packets, therefore fewer false duplicate acknowledgements
  – Dup acks should be an indication of losses

• Sender end modification
  – Fast retransmission after 2 duplicate acknowledgements
  – Reactive fast retransmission, higher throughput

Packet Replication

• Very strange, can't quite explain
  – A pair of acks duped 9 times, arriving 32 msec apart
  – A data packet duped 23 times, arriving in a burst
    • Misconfigured bridge?

• Observation
  – Most of these are site specific
  – But a small number of dups are spread among other sites
  – Senders dup packets too

Packet Corruption

• Checksum good?

• Problem
  – The traces contain only the header data
  – Pure ack OK: the header = the packet
  – Data not OK: the header ≠ the packet

• Use a corruption-inferring algorithm in tcpanaly

Corruption Rate

• 1 corruption out of 5000 data packets

• 1 corruption out of 300,000 pure acks

• Possible reasons for the difference
  – Header compression
  – Packet size
  – Inferring tool discrepancy
  – Other router/link level implementation artifacts

Implication

• 16-bit checksum no longer sufficient
  – A corrupted packet has a 1-in-2^16 chance of having the same checksum as the non-corrupted packet
  – I.e., one out of 2^16 corrupted packets can't be detected by the checksum

• Since 1 out of 5000 data packets is corrupted
  – 1 out of 5000 × 2^16 (about 300 M) packets can't be identified as corrupted by the TCP 16-bit checksum
  – Consider a 1 Gbps link and a packet size of 1 Kb: 1 M packets per second
  – About 300 seconds (roughly 5 minutes) per falsely received corrupted packet
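The arithmetic on this slide can be worked through explicitly. The sketch below uses the slide's own inputs (1 in 5000 data packets corrupted, a 16-bit checksum, a 1 Gbps link, 1 Kb packets) and assumes, as the slide does, that a corrupted packet matches the original checksum with probability 1 in 2^16. With these inputs the interval comes out at roughly 330 seconds. The variable names are made up.

```python
# Inputs taken from the slide.
packets_per_corruption = 5000          # 1 in 5000 data packets corrupted
checksum_values = 2 ** 16              # 16-bit checksum
link_bits_per_second = 1_000_000_000   # 1 Gbps
packet_bits = 1_000                    # 1 Kb per packet

# One undetected corruption per this many packets (about 300 M).
packets_per_undetected = packets_per_corruption * checksum_values

# Packets per second on a fully loaded link (1 M pps).
packets_per_second = link_bits_per_second / packet_bits

# Expected time between undetected corrupted packets.
seconds_per_undetected = packets_per_undetected / packets_per_second
```

The result scales linearly with link speed, so faster links shorten the interval proportionally.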

Estimating Bottleneck Bandwidth

• The packet pair technique
  – Send 2 packets back to back (or close enough)
    • Inter-packet time, T2-T1, very small
  – When they go across the bottleneck
    • Packet 2 is queued while packet 1 is being served
    • Packet 2 immediately follows packet 1
  – Packets will be stretched
    • Inter-packet time, T2-T1, is now the transmission time of packet 1
  – Estimated bandwidth = (Size of packet 1)/(T2-T1)
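The estimate itself is a one-line formula. The sketch below applies it to receiver-side arrival times; the function name and the example numbers are hypothetical.

```python
def packet_pair_bandwidth(size_bytes, t1, t2):
    """Estimate bottleneck bandwidth in bits per second.

    size_bytes: size of packet 1.
    t1, t2: arrival times (seconds) of packets 1 and 2 at the receiver.
    """
    gap = t2 - t1
    if gap <= 0:
        raise ValueError("packets must arrive in order, with a positive gap")
    return size_bytes * 8 / gap


# Hypothetical example: 1500-byte packets arriving 1.2 ms apart.
estimate_bps = packet_pair_bandwidth(1500, 0.0, 0.0012)
```

Here the 1.2 ms gap corresponds to about 10 Mbps. The guard against a non-positive gap hints at why reordering and coarse clock resolution break the technique.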

This Won’t Work

• Bottleneck bandwidth higher than sending rate

• Out-of-order delivery

• Clock resolution

• Changes in the bottleneck bandwidth

• Multiple bottlenecks

PBM

• Instead of sending a pair

• Send a bunch

• More robust against the multiple-bottleneck problem

Questions?

Identifying Internet Dynamics

Routing Policy

Packet Dynamics

Routing Dynamics

Route Instability

• Internet Routing Instability

• Craig Labovitz, G. Robert Malan, Farnam Jahanian

• IEEE/ACM Transactions on Networking, 6(5):515-528, October 1998

BGP Specific

• BGP is an important part of the Internet
  – Connecting the domains
  – Widespread
  – Known from prior work that route failure could result in
    • Packet loss
    • Longer network delay
    • Network outage (time to globally converge to a local change)

• A closer look at the BGP dynamics
  – How many route updates are sent
  – How frequently they are sent
  – How useful these updates are

BGP (In a Slide)

• The routing protocol running among the border routers
  – Path vector
  – Think DV
  – Exchange not just the next hop, but the entire path

• Dynamics
  – In case of link/router recovery
    • Route announcements are exchanged, starting from the recovering point
  – In case of link/router down
    • Route withdrawals are exchanged, starting from the failed point
  – Route updates
    • Include route announcements and withdrawals

Data Collection

• Monitoring exchange of route updates
  – Over a 9-month period
  – 5 public exchange points in the core

• Exchange point
  – Connecting points of ASs
  – Public exchange: of the US government
  – Private exchange: of the commercial providers

Terminology

• AS
  – You all know
  – In the path of the path vector exchanged by BGP
    • AS-PATH

• Prefix
  – Basically a network address
  – The source/destination of the route entries in BGP
    • 140.119.154/24
    • 140.119/16
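The relationship between the two example prefixes can be checked with Python's standard `ipaddress` module. Note that the module expects the full dotted form, so the shorthand 140.119.154/24 is written as 140.119.154.0/24 here; the host address used is made up for illustration.

```python
import ipaddress

# The slide's two prefixes, written in full dotted notation.
specific = ipaddress.ip_network("140.119.154.0/24")
covering = ipaddress.ip_network("140.119.0.0/16")

# The /24 is a more specific prefix inside the /16.
is_more_specific = specific.subnet_of(covering)

# A hypothetical host address that falls under both prefixes.
host = ipaddress.ip_address("140.119.154.7")
in_both = host in specific and host in covering

# Number of addresses each prefix covers.
sizes = (specific.num_addresses, covering.num_addresses)
```

A BGP table can carry both entries at once, and a router forwards on the longest matching prefix.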

Classification of Problems

• Forwarding instability
  – Legitimate topological changes affecting paths

• Routing policy fluctuation
  – Changes in routing policy that do not affect forwarding paths

• Pathological updates
  – Redundant information affecting neither routing nor forwarding

Forwarding Instability

• WADiff
  – A route is explicitly withdrawn
  – Replaced with an alternative route
  – As it becomes unreachable
  – The alternative route is different in AS-PATH or next-hop

• AADiff
  – A route is implicitly withdrawn
  – Replaced with an alternative route
  – As it becomes unreachable or a preferred alternative route becomes available

In the Middle

• WADup
  – A route is explicitly withdrawn
  – Then re-announced as reachable
  – Could be
    • Pathological
    • Forwarding instability: transient topological change

• AADup
  – A route is implicitly withdrawn
  – Replaced with a duplicate of the original route
    • Same AS-PATH and next-hop
  – Could be
    • Pathological
    • Policy fluctuation: differs in other policy attributes

Pathological

• WWDup
  – Repeated withdrawals for a prefix that is no longer reachable
  – Pathological
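The five categories can be told apart by looking at consecutive updates for the same prefix. The sketch below is a simplified illustration, not the paper's tooling: it assumes each update is a tuple of prefix, kind ("A" for announce, "W" for withdraw), and a route given as an (AS-PATH, next-hop) pair, and it compares only those two attributes.

```python
def classify_updates(updates):
    """Label each update as WADiff, AADiff, WADup, AADup, WWDup, or None.

    updates: list of (prefix, kind, route); kind is "A" or "W", and
    route is an (as_path, next_hop) tuple (ignored for withdrawals).
    """
    last_route = {}  # prefix -> most recently announced route
    withdrawn = {}   # prefix -> True if currently withdrawn
    labels = []
    for prefix, kind, route in updates:
        label = None
        if kind == "W":
            if withdrawn.get(prefix, False):
                label = "WWDup"  # repeated withdrawal
            withdrawn[prefix] = True
        elif kind == "A":
            previous = last_route.get(prefix)
            if previous is not None:
                same = route == previous
                if withdrawn.get(prefix, False):
                    label = "WADup" if same else "WADiff"
                else:
                    label = "AADup" if same else "AADiff"
            last_route[prefix] = route
            withdrawn[prefix] = False
        else:
            raise ValueError(f"unknown update kind: {kind}")
        labels.append(label)
    return labels
```

First announcements and first withdrawals get no label here, since they are ordinary updates rather than one of the five categories.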

Observations – The Majority

• Pathological updates (redundant)
  – Minimum effect on
    • Route quality
    • Router processing load
  – Some do not agree
  – Adding a significant amount of traffic
    • 300 updates/second could crash a high-end router

Observation - Instability

• Forwarding instability
  – 3-10% WADiff
  – 5-20% AADiff
  – 10-50% WADup

• Policy fluctuation
  – AADup quite high
  – But most probably pathological

• Need this
  – Internet routing works because of these necessary and frequent updates

Observation – Distribution

• No spatial correlation
  – Correlates to router implementation instead

• Temporal
  – Time-of-day effect, day-of-week effect
  – Therefore correlates to network congestion

• Periodicity
  – 30, 60 second period
  – Due to self-synchronization, misconfiguration, BGP being soft-state based, etc.

Basically, not saying much…

But for the background

And ease of reading

Questions?

What Should You Do?

• Routing policy
  – Intra-AS: shortest path
  – Inter-AS: shortest path (95%, 84% OK)
  – Better model in progress…

• Packet losses
  – 2-state Markov chain model
    • pl: some info
    • pn: no info…

• Routing instability: outage time
  – Paper #2 of the original paper set (OSPF vs. DV)
