Network Simulation and Testing
Polly Huang, EE NTU
http://cc.ee.ntu.edu.tw/~phuang
[email protected]
Polly Huang, NTU EE 2
Dynamics Papers
• Hongsuda Tangmunarunkit, Ramesh Govindan, and Scott Shenker. Internet path inflation due to policy routing. In Proceedings of SPIE ITCom, pages 188-195, Denver, CO, USA, August 2001. SPIE.
• Lixin Gao. On inferring autonomous system relationships in the Internet. IEEE/ACM Transactions on Networking, 9(6):733-745, December 2001.
• Vern Paxson. End-to-end Internet packet dynamics. IEEE/ACM Transactions on Networking, 7(3):277-292, June 1999.
• Craig Labovitz, G. Robert Malan, and Farnam Jahanian. Internet routing instability. IEEE/ACM Transactions on Networking, 6(5):515-528, October 1998.
Doing Your Own Analysis
• Having a problem
• Need to simulate or to test
• Define experiments
  – Base scenarios
  – Scaling factors
  – Metrics of investigation
Base Scenarios
• The source models
  – To generate traffic
• The topology models
  – To generate the network
• Then?
Internet Dynamics
• How traffic flows across the network
  – Routing
  – Shortest path?
• How failures occur
  – Packets dropped
  – Routes failed
  – i.i.d.?
Policy routing
Packet/Route dynamics
Identifying Internet Dynamics
Routing Policy
Packet Dynamics
Routing Dynamics
To the best of our knowledge, we can now generate:
AS-level topology
Hierarchical router-level topology
The Problem
• Does it matter what routing computation we use?
• Equivalently
  – Can I just do shortest path computation?
Topology with Policy
• Internet Path Inflation Due to Policy Routing
• Hongsuda Tangmunarunkit, Ramesh Govindan, Scott Shenker
• In Proceedings of the SPIE ITCom, pages 188-195, Denver, CO, USA, August 2001. SPIE
Paper of Choice
• Methodological value
  – A simple ‘re-examine’ type of study
  – Strengthens the technical value of prior work
• Technical value
  – Actual paths are not the shortest, due to routing policy.
  – The routing policy is business-driven and can be quite hard to obtain.
  – As shown in this paper, for simulation studies concerning large-scale route path characteristics, a simple shortest-AS policy routing may be sufficient.
Inter-AS Routing
[Figure: source and destination connected through AS 1–AS 5; the shortest AS-level path is highlighted]
Hierarchical Routing
[Figure: source to destination; shortest path at the inter-AS level, then shortest path within each AS]
Flat Routing
[Figure: source to destination; shortest path over the flat router-level graph. Hop counts 5:3]
Hierarchical Routing is not optimal
Or
Routes are inflated
How sub-optimal?
Prior Work
• Based on
  – An actual router-level graph
  – An actual AS-level graph at the same time
  – Overlaying the AS-level graph on the router-level graph
• Compute
  – For each source-destination pair
  – Shortest path using hierarchical routing
  – Shortest path using flat routing
• Compare route length
  – In number of router hops
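As a concrete illustration of this compute-and-compare procedure, here is a small Python sketch on a hypothetical six-router, four-AS topology (all names made up; "hierarchical routing" is simplified here to "shortest router path restricted to the ASes on the shortest AS-level path"):

```python
from collections import deque

def bfs_path(adj, src, dst, allowed=None):
    """Shortest path by hop count; optionally restricted to nodes in `allowed`."""
    if allowed is not None and (src not in allowed or dst not in allowed):
        return None
    parent = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj.get(u, ()):
            if v not in parent and (allowed is None or v in allowed):
                parent[v] = u
                q.append(v)
    return None

# Toy router-level topology (hypothetical): each router is tagged with its AS.
router_as = {"r1": 1, "r2": 2, "r3": 2, "r4": 3, "r5": 4, "r6": 3}
radj = {
    "r1": ["r2", "r5"], "r2": ["r1", "r3"], "r3": ["r2", "r4"],
    "r4": ["r3", "r6"], "r5": ["r1", "r6"], "r6": ["r4", "r5"],
}

# Derive the AS-level graph from inter-AS router links.
aadj = {}
for u, vs in radj.items():
    for v in vs:
        a, b = router_as[u], router_as[v]
        if a != b and b not in aadj.setdefault(a, []):
            aadj[a].append(b)

# Flat routing: shortest path over the whole router graph.
flat_path = bfs_path(radj, "r1", "r6")

# Hierarchical routing: shortest AS-level path first, then the shortest
# router path restricted to routers inside those ASes.
as_path = bfs_path(aadj, 1, 3)
allowed = {r for r, a in router_as.items() if a in set(as_path)}
hier_path = bfs_path(radj, "r1", "r6", allowed)

inflation = (len(hier_path) - 1) - (len(flat_path) - 1)  # extra router hops
```

In this toy graph the flat route takes 2 router hops while the hierarchical route takes 4, i.e. the path is inflated by 2 hops.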
Prior Conclusions
• 80% of the paths are inflated
• 20% of the paths are inflated > 50%
• There exists a better detour for 50% of the source-destination pairs
  – There exists an intermediate node i such that Length(s-i-d) < Length(s-d)
This Work
• To address 2 shortcomings
  – There’s now a newer router-level graph
  – There’s now a more sophisticated policy model
• Paper #4
• Inter-AS routing is not quite ‘shortest-AS routing’
Newer vs. Older Graph
• Inflation difference not the same
  – Difference is larger in the newer graph
  – Due to the newer graph being larger
• Inflation ratio remains the same
Shortest-AS vs. Policy-AS Routing
• Shortest-AS
  – Simplified model
  – Every AS is equal
• Policy-AS
  – Realistic model
  – Not all ASs are the same
    • Some are provider ASs
    • Some are customer ASs
    • Customer ASs do not transit traffic
Consider TANET → CHT
[Figure: NTU is a customer below TANET; UUNET is a provider above TANET and CHT. Should the traffic go through NTU or through UUNET?]
Routing with Constraints
• Routes could be
  – Going up
  – Going down
  – Going up and then down
• Routes can never be
  – Going down and then up
Inferring the Constraints
• On Inferring Autonomous System Relationships in the Internet
• Lixin Gao
• IEEE/ACM Transactions on Networking, 9(6):733-745, December 2001
Not All ASs the Same
• 2 types of ASs
  – Customer
  – Provider
• 3 types of relationships
  – Customer-provider
  – Provider-provider
    • Peer-peer
    • Sibling-sibling
Customer-Provider
• Formal definition
  – A provider transits for its customer
  – A customer does not transit for its provider
• Informal
  – Provider: I’ll take any traffic
  – Customer: I’ll take only the traffic to me (or my customers)
Peer-Peer
• Formal definition
  – A provider does not transit for another provider
• Informal
  – I’ll take only the traffic to me (or my customers)
  – You’ll take only the traffic to you (or your customers)
Sibling-Sibling
• Formal definition
  – A provider transits for another provider
• Informal
  – I’ll take any traffic
  – You’ll take any traffic
Never “Going Down and then Up”
• A provider-customer link can be followed only by
  – A provider-customer link
  – (Or a sibling-sibling link)
• A peer-peer link can be followed only by
  – A provider-customer link
  – (Or a sibling-sibling link)
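The "never going down and then up" rule can be checked mechanically. A minimal Python sketch, assuming each link along a path is labeled 'up' (customer to provider), 'down' (provider to customer), 'peer', or 'sib' (sibling):

```python
def valley_free(link_types):
    """Check the 'no valley' rule: any number of 'up' links, then at most one
    'peer' link, then any number of 'down' links. 'sib' links are transparent
    and allowed anywhere."""
    phase = "up"                      # phases only advance: up -> peer -> down
    for t in link_types:
        if t == "sib":
            continue                  # siblings transit freely
        if t == "up":
            if phase != "up":
                return False          # going down (or peering) and then up
        elif t == "peer":
            if phase != "up":
                return False          # a second peer link, or peer after down
            phase = "peer"
        elif t == "down":
            phase = "down"
        else:
            raise ValueError(f"unknown link type: {t}")
    return True
```

For example, up-up-peer-down is a valid route, while down-up (a "valley") is not.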
Heuristics
• Compute out-degrees
• For each AS path in the routing tables
  – The first AS with the max degree is the root of the hierarchy
  – From the root, draw provider-customer relationships down toward the 2 ends of the AS path
Determining Siblings
• After going through all AS paths
• Any AS pair being both provider and customer to each other is a sibling pair
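The degree-based heuristic and the sibling rule can be sketched roughly as follows (a simplification of Gao's actual algorithm; the AS names and paths below are made up, and peer determination is omitted):

```python
from collections import defaultdict

def infer_relationships(as_paths):
    """Degree-based sketch: in each AS path, the highest-degree AS is taken as
    the top of the hierarchy, and provider->customer edges are drawn walking
    from that AS toward both ends of the path. Pairs labeled in both
    directions are siblings."""
    neighbors = defaultdict(set)
    for path in as_paths:
        for a, b in zip(path, path[1:]):
            neighbors[a].add(b)
            neighbors[b].add(a)
    degree = {a: len(ns) for a, ns in neighbors.items()}

    prov_cust = set()                         # (provider, customer) pairs
    for path in as_paths:
        top = max(range(len(path)), key=lambda i: degree[path[i]])
        for i in range(top, 0, -1):           # walk from the top to the left end
            prov_cust.add((path[i], path[i - 1]))
        for i in range(top, len(path) - 1):   # walk from the top to the right end
            prov_cust.add((path[i], path[i + 1]))

    siblings = {frozenset((a, b)) for a, b in prov_cust if (b, a) in prov_cust}
    return prov_cust, siblings
```

On paths like A-B-E and C-B-D, the high-degree AS B is inferred as provider of A, C, D, and E; a pair labeled provider both ways (e.g. from paths X-Y and Y-X) comes out as siblings.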
Determining Peers
• Do another pass on the AS paths in the routing tables
• For each AS path
  – The top AS that does not have sibling relationships with the neighboring ASs
  – Could have a peering relationship with the higher out-degree neighbor
  – Given the top AS and the higher out-degree neighbor are comparable in out-degree
Back to Path Inflation
• Draw the customer-provider, peer-peer, and sibling-sibling relationships on the overlay AS graph
• Compute the best routes under the ‘never going down and then up’ constraint
• Compare the inflation difference and ratio again with these running at the inter-AS level
  – Shortest
  – Policy
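Computing best routes under the constraint can be done with a BFS over (AS, phase) states. A sketch on a hypothetical edge-labeled AS graph (sibling links omitted for brevity):

```python
from collections import deque

def policy_shortest(edges, src, dst):
    """Shortest AS-hop route under the 'never down then up' constraint.
    edges[(a, b)] labels the directed link a->b: 'up' (customer to provider),
    'down' (provider to customer), or 'peer'. Once a route goes 'down' or
    'peer', it may only continue 'down'."""
    adj = {}
    for (a, b), t in edges.items():
        adj.setdefault(a, []).append((b, t))
    start = (src, "up")
    parent = {start: None}
    q = deque([start])
    while q:
        state = q.popleft()
        node, phase = state
        if node == dst:
            path = []
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        for nxt, t in adj.get(node, ()):
            if phase != "up" and t != "down":
                continue              # no climbing back up after a descent
            nphase = "up" if (t == "up" and phase == "up") else \
                     ("peer" if t == "peer" else "down")
            ns = (nxt, nphase)
            if ns not in parent:
                parent[ns] = state
                q.append(ns)
    return None

# Hypothetical graph: the 2-hop route S-C-D is a forbidden valley (down then
# up through customer C), so policy routing takes the longer S-P1-P2-D.
edges = {
    ("S", "C"): "down", ("C", "D"): "up",
    ("S", "P1"): "up", ("P1", "P2"): "peer", ("P2", "D"): "down",
}
```

Plain shortest-AS routing would pick the invalid 2-hop route here; the policy route is one hop longer, which is exactly the inflation being measured.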
Shortest vs. Policy Routing
• Pretty much the same, both in terms of
  – Inflation difference
  – Inflation ratio
Therefore
• The observations from the prior work hold
  – With a newer graph
  – With the more realistic inter-AS policy routing
Now forget path inflation
How far is shortest from policy inter-AS routing?
Shortest vs. Policy
• In AS hops
  – 95% of paths have the same length
  – Policy routes always longer
• In router hops
  – 84% of paths have the same length
  – Some policy routes longer, some shorter
95% and 84% are pretty good numbers
Therefore shortest path at the inter-AS level might be OK…
To Answer the Question
• Can we simply do shortest path computation?
  – A likely yes for an AS-level graph
  – A firm no for a hierarchical graph
• Must separate inter-AS shortest and intra-AS shortest
Questions?
Identifying Internet Dynamics
Routing Policy
Packet Dynamics
Routing Dynamics
It’s never a perfect world…
The Problem
• But how perfect is the Internet?
• The Internet
  – A network of computers with stored information
  – Some valuable, some relevant
  – You participate by putting information up or getting information down
  – From time to time, you can’t quite do some of these things you want to do
Why is that?
At the philosophical level…
Humans are bound to fail. And the Internet is human-made.
But, Seriously…
Consider loading a Web page
Web Surfing Failures
• The ‘window’ waving forever?
• An error message saying network not reachable
• An error message saying the server too busy
• An error message saying the server is down
• Anything else?
Network Specific Failures
• The ‘window’ waving forever?
• An error message saying network not reachable
• An error message saying the server too busy
• An error message saying the server is down
• Anything else?
The Causes
• The ‘window’ waving forever
  – Congestion in the network
  – Buffer overflow
  – Packet drops
• An error message saying network not reachable
  – Network outage
  – Broken cables, frozen routers
  – Route re-computation
  – Route instability
Back to the Problem
• But how perfect is the Internet?
• Equivalently
  – Packets can be dropped
    • How frequent
    • How much
  – Routes may be unstable
    • How frequent
    • For how long
Significance
• Knowing the characteristics of packet drops and route instability helps
  – Design for fault-tolerance
  – Test for fault-tolerance
There are tons of formal/informal studies on the dynamics…
Let’s take a look at a couple that are classics
Packet Dynamics
• End-to-End Internet Packet Dynamics
• Vern Paxson
• IEEE/ACM Transactions on Networking, 7(3):277-292, June 1999
Emphasis in Reverse Order
• Real subject of study
  – Packet loss
  – Packet delay
• Necessary assessment
  – The unexpected
  – Bandwidth estimation
Measurement
• Instrumentation
  – 35 sites, 9 countries
  – Education, research, provider, company
• 2 runs
  – N1: Dec 1994
  – N2: Nov-Dec 1995
  – 21 sites in common
Measurement Methodology
• Each site running NPD
  – A daemon program
  – The sender side sends a 100KB TCP transfer
• Sender and receiver sides both
  – tcpdump the packets
• Noteworthy
  – Measurements occurred as Poisson arrivals
    • Unbiased with respect to the time of measurement
  – N2 used a big max window size
    • Prevents the window size from limiting the TCP connection throughput
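Scheduling measurements as a Poisson process is easy to sketch: draw exponentially distributed gaps between measurement start times, which makes the sample unbiased with respect to time of day (the function name and rate below are illustrative, not from the paper):

```python
import random

def poisson_schedule(rate_per_hour, duration_hours, seed=0):
    """Measurement start times (in hours) with exponentially distributed
    inter-measurement gaps, i.e. a Poisson arrival process."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_hour)   # exponential gap, mean 1/rate
        if t >= duration_hours:
            return times
        times.append(t)

# E.g. an average of 2 measurements per hour over a 24-hour day.
times = poisson_schedule(2.0, 24.0, seed=1)
```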
Packet Loss
• Overall loss rate
  – N1 2.7%, N2 5.2%
  – N2 higher, because of the big max window?
    • I.e., pumping more data into the network, therefore more loss?
• The big max window in N2 is not a factor
  – Shown by separating data and ack loss
  – Assumption: ack traffic flows at about half the rate
    • Won’t stress the network
  – Ack loss: N1 2.88%, N2 5.14%
  – Data loss: N1 2.65%, N2 5.28%
Quiescent vs. Busy
• Definition
  – Quiescent: connections without ack drops
  – Busy: otherwise
• About 50% of the connections are quiescent
• For connections that are busy
  – Loss rate: N1 5.7%, N2 9.2%
More Numbers
• Geographical effect
• Time of the day effect
Towards a Markov Chain Model
• For hours-long periods
  – A no-loss connection now indicates further no-loss connections in the future
  – A lossy connection now indicates further lossy connections in the future
• For minutes-long periods
  – The rate remains similar
[Figure: two-state Markov chain with states “No loss” and “Loss”; self-transition probabilities pn and pl, cross transitions 1-pn and 1-pl]
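The two-state chain is straightforward to simulate. A sketch, with pn and pl as the probabilities of staying in the no-loss and loss states respectively (the parameter values below are made up for illustration):

```python
import random

def simulate_losses(n, p_n, p_l, seed=0):
    """Simulate the two-state loss chain: p_n is the probability of staying
    in the no-loss state, p_l of staying in the loss state. Returns a list
    of booleans, True meaning the packet was lost."""
    rng = random.Random(seed)
    lost = False
    trace = []
    for _ in range(n):
        if lost:
            lost = rng.random() < p_l       # remain in the loss state?
        else:
            lost = rng.random() >= p_n      # leave the no-loss state?
        trace.append(lost)
    return trace

# Made-up parameters: p_n = 0.97, p_l = 0.5 give a long-run loss rate of
# (1 - 0.97) / ((1 - 0.97) + (1 - 0.5)) ~ 5.7%, with bursty losses.
trace = simulate_losses(100_000, 0.97, 0.5, seed=42)
```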
Another Classification
• Data
  – Loaded data: packets experiencing queueing delay due to their own connection
  – Unloaded data: packets not experiencing queueing delay due to their own connection
  – Bottleneck bandwidth measurement is needed here to determine whether a packet is loaded or not
• Ack
  – Simply acks
3 Major Observations
• Although some loss rates are very high (47%, 65%, 68%), all connections complete in 10 minutes
• Loss of data and acks not correlated
• Cumulative distribution of per-connection loss rate
  – Exponential for data
  – Not so exponential for acks
  – Adaptive sampling contributing to the exponential observation?
More on the Markov Chain Model
• The loss rate Pu
  – The rate of loss
• The conditional loss rate Pc
  – The rate of loss when the previous packet is lost
• Contrary to the earlier work
  – Losses are bursty
  – Duration shows a Pareto upper tail
  – (Polly: maybe more log-normal)
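Pu and Pc can be estimated directly from a boolean loss trace (a sketch; the traces below are synthetic):

```python
def loss_rates(trace):
    """Estimate Pu (unconditional loss rate) and Pc (loss rate conditioned on
    the previous packet being lost) from a boolean loss trace."""
    pu = sum(trace) / len(trace)
    cond = [b for a, b in zip(trace, trace[1:]) if a]
    pc = sum(cond) / len(cond) if cond else 0.0
    return pu, pc

# Synthetic traces: similar overall loss, scattered vs. bursty.
scattered = ([False] * 9 + [True]) * 10       # isolated losses
bursty = ([False] * 8 + [True] * 2) * 10      # losses come in pairs
```

Pc well above Pu is the signature of bursty, non-independent losses: the scattered trace gives Pc = 0, while the bursty one gives Pc far above its Pu.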
You might ask… pl, pn?
[Figure: the same two-state Markov chain, states “No loss” and “Loss”, with transition probabilities pn, 1-pn, pl, 1-pl]
Values for the pl’s
                N1    N2
Loaded data     49%   50%
Unloaded data   20%   25%
Ack             25%   31%
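Given pl from the table and a value for pn, the long-run (unconditional) loss fraction follows from the chain's balance equation. Note that pn is not reported (see the final slide), so the value below is purely an assumption for illustration:

```python
def steady_state_loss(p_l, p_n):
    """Long-run fraction of lost packets in the two-state chain.
    Balance equation: pi_loss * (1 - p_l) = pi_noloss * (1 - p_n),
    with pi_loss + pi_noloss = 1.
    p_l: probability of staying in the loss state.
    p_n: probability of staying in the no-loss state."""
    return (1 - p_n) / ((1 - p_n) + (1 - p_l))

# Loaded data in N1 has p_l = 0.49 (from the table). With an ASSUMED
# p_n = 0.985, the implied unconditional loss rate is ~2.9%, in the
# ballpark of N1's measured overall loss (~2.7%).
rate = steady_state_loss(0.49, 0.985)
```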
Possible Invariant
• Conditional loss rate
  – Its value remains relatively close over the 1-year period
• More up-to-date data to verify this?
• Is the loss burst size log-normal?
• Both are interesting research questions
Packet Delay
• Looking at one-way transit times (OTT)
• There’s a model for the OTT distribution
  – Shifted gamma
  – Parameters change with regard to time and path…
• Internet paths are asymmetric
  – OTT one way often does not equal OTT the other way
Timing Compression
• Ack compressions are small events
• So they do not really pose threats to
  – Ack clocking
  – Rate-estimation-based control
• Data compression very rare
  – Calls for outlier filtering
Queueing Delay
• Variance of OTT over different time scales
  – For each time scale
  – Divide the packet arrivals into intervals of that time scale
  – For all 2 neighboring intervals l, r
    • ml: the median OTT in interval l
    • mr: the median OTT in interval r
    • Calculate (ml - mr)
  – The variance of OTT at that time scale is the median of all (ml - mr)
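The median-of-neighboring-medians computation can be sketched as follows (I use the absolute difference |ml - mr|, which is one reading of "calculate (ml - mr)"; empty intervals are simply skipped, and the sample data is made up):

```python
import statistics

def ott_variation(arrivals, otts, tau):
    """OTT 'variance' at time scale tau: bucket packets into consecutive
    intervals of length tau, take the median OTT per interval, then return
    the median absolute difference between neighboring intervals' medians."""
    buckets = {}
    for t, d in zip(arrivals, otts):
        buckets.setdefault(int(t // tau), []).append(d)
    medians = [statistics.median(buckets[k]) for k in sorted(buckets)]
    diffs = [abs(a - b) for a, b in zip(medians, medians[1:])]
    return statistics.median(diffs) if diffs else 0.0

# Hypothetical trace: OTTs jump from ~10 ms to ~20 ms and back on a 1 s scale.
variation = ott_variation([0.1, 0.2, 1.1, 1.2, 2.1], [10, 10, 20, 20, 10], 1.0)
```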
Finding the Dominant Scale
• Looking for time scales whose queueing variance is large
  – Where control is most needed
• For example, if those time scales are smaller than the RTT
  – Then TCP doesn’t need to bother adapting to queueing fluctuations
Oh Well
• Queueing delay variations occur
  – Dominantly on 0.1-1 sec scales
  – But non-negligibly on larger scales
Share of Bandwidth
• Pretty much uniformly distributed
Conclusions on Analysis
• Common assumptions violated
  – In-order packet delivery
  – FIFO queueing
  – Independent loss
  – Single congestion time scale
  – Path symmetry
• Behavior
  – Very wide range, not one typical behavior
Conclusions on Design
• Measurement methodology
  – TCP-based measurement shown viable
  – Sender-side-only inferior
• TCP implementations
  – Sufficiently conservative
The Pathologies
The strange stuff
Packet Re-Ordering
• Varying widely, and too few samples
• Therefore, deriving only a rule of thumb
  – Internet paths sometimes experience bad reordering
  – Mainly due to route flapping
  – Occasionally, this funny case of router implementation
    • Buffering packets while processing a route update
    • Sending these packets interleaved with the post-update arrivals
Orthogonal to TCP SACK
• Receiver-end modification
  – 20 msec wait before sending a duplicate acknowledgement
  – Waiting for re-ordered packets, therefore fewer false duplicate acknowledgements
  – Dup acks should be an indication of losses
• Sender-end modification
  – Fast retransmission after 2 duplicate acknowledgements
  – More reactive fast retransmission, higher throughput
Packet Replication
• Very strange, can’t quite explain
  – A pair of acks duplicated 9 times, arriving 32 msec apart
  – A data packet duplicated 23 times, arriving in a burst
    • A misconfigured bridge?
• Observation
  – Most of these are site-specific
  – But a small number of dups spread between other sites
  – Senders dup packets too
Packet Corruption
• Checksum good?
• Problem
  – The traces contain only the header data
  – Pure acks OK: the header = the packet
  – Data not OK: the header <> the packet
• Use a corruption-inferring algorithm in tcpanaly
Corruption Rate
• 1 corruption out of 5,000 data packets
• 1 corruption out of 300,000 pure acks
• Possible reasons for the difference
  – Header compression
  – Packet size
  – Inferring-tool discrepancy
  – Other router/link-level implementation artifacts
Implication
• 16-bit checksum no longer sufficient
  – A corrupted packet has a 1-in-2^16 chance of having the same checksum as the non-corrupted packet
  – I.e., 1 out of 2^16 corrupted packets can’t be detected by the checksum
• Since 1 out of 5,000 data packets is corrupted
  – 1 out of 5000 × 2^16 (~300 M) packets can’t be identified as corrupted by the TCP 16-bit checksum
  – Consider a 1 Gbps link and 1 Kb packet size: 1 M packets/sec
  – Roughly one falsely accepted corrupted packet every ~300 seconds
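A quick arithmetic check of the slide's assumed numbers (1-in-5000 corruption, 16-bit checksum, 1 Gbps link, 1 kbit packets) gives one falsely accepted corrupted packet roughly every five minutes:

```python
# Assumed numbers from the slide.
corruption_rate = 1 / 5000       # fraction of data packets corrupted
undetected_frac = 1 / 2 ** 16    # corrupted packets that slip past the checksum
link_bps = 1e9                   # 1 Gbps link
packet_bits = 1000               # ~1 kbit packets

pps = link_bps / packet_bits                     # ~1M packets per second
undetected_per_sec = pps * corruption_rate * undetected_frac
seconds_per_bad_packet = 1 / undetected_per_sec  # ~328 seconds
```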
Estimating Bottleneck Bandwidth
• The packet-pair technique
  – Send 2 packets back to back (or close enough)
    • Inter-packet time, T2-T1, very small
  – When they go across the bottleneck
    • Packet 2 is queued while packet 1 is served
    • Packet 2 immediately follows packet 1
  – The packets will be stretched
    • The inter-packet time, T2-T1, is now the transmission time of packet 1
  – Estimated bandwidth = (Size of packet 1)/(T2-T1)
This Won’t Work
• Bottleneck bandwidth higher than the sending rate
• Out-of-order delivery
• Clock resolution
• Changes in the bottleneck bandwidth
• Multiple bottlenecks
PBM
• Instead of sending a pair
• Send a bunch
• More robust against the multi-bottleneck problem
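The packet-pair computation, and the bunch-of-packets idea behind it, can be sketched as follows (a drastic simplification: the actual PBM algorithm does much more than take a median, and all numbers below are illustrative):

```python
import statistics

def packet_pair_estimate(size_bits, t1, t2):
    """Packet-pair estimate: the bottleneck serializes two back-to-back
    packets, so their arrival spacing equals the transmission time of one
    packet, and bandwidth ~= packet size / inter-arrival gap."""
    gap = t2 - t1
    if gap <= 0:
        raise ValueError("need ordered, distinct arrival times")
    return size_bits / gap

def bunch_estimate(size_bits, gaps):
    """Bunch-of-packets flavor: a median over many pair gaps is robust to a
    few reordered or oddly timed samples."""
    return statistics.median(size_bits / g for g in gaps if g > 0)

# 12,000-bit (1500-byte) packets arriving 1.2 ms apart imply a ~10 Mbps
# bottleneck; one outlier gap in the bunch does not move the median.
bw = packet_pair_estimate(12_000, 0.0, 0.0012)
bw_robust = bunch_estimate(12_000, [0.0012, 0.0050, 0.0012, 0.0011, 0.0013])
```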
Questions?
Identifying Internet Dynamics
Routing Policy
Packet Dynamics
Routing Dynamics
Route Instability
• Internet Routing Instability
• Craig Labovitz, G. Robert Malan, Farnam Jahanian
• IEEE/ACM Transactions on Networking, 6(5):515-528, October 1998
BGP Specific
• BGP is an important part of the Internet
  – Connecting the domains
  – Widespread
  – Known from prior work that route failures could result in
    • Packet loss
    • Longer network delay
    • Network outage (time to globally converge after a local change)
• A closer look at the BGP dynamics
  – How many route updates are sent
  – How frequently are they sent
  – How useful are these updates
BGP (In a Slide)
• The routing protocol running among the border routers
  – Path vector
  – Think DV (distance vector)
  – Exchanges not just the next hop, but the entire path
• Dynamics
  – In case of link/router recovery
    • Route announcements propagate from the recovering point
  – In case of link/router failure
    • Route withdrawals propagate from the failed point
  – Route updates
    • Include route announcements/withdrawals
Data Collection
• Monitoring the exchange of route updates
  – Over a 9-month period
  – 5 public exchange points in the core
• Exchange point
  – Connecting points of ASs
  – Public exchange: of the US government
  – Private exchange: of the commercial providers
Terminology
• AS
  – You all know
  – In the path of the path vector exchanged by BGP
    • AS-PATH
• Prefix
  – Basically a network address
  – The source/destination of the route entries in BGP
    • 140.119.154/24
    • 140.119/16
Classification of Problems
• Forwarding instability
  – Legitimate topological changes affecting paths
• Routing policy fluctuation
  – Changes in routing policy not affecting forwarding paths
• Pathological updates
  – Redundant information affecting neither routing nor forwarding
Forwarding Instability
• WADiff
  – A route is explicitly withdrawn
  – Replaced with an alternative route
  – As it becomes unreachable
  – The alternative route differs in AS-PATH or next-hop
• AADiff
  – A route is implicitly withdrawn
  – Replaced with an alternative route
  – As it becomes unreachable, or a preferred alternative route becomes available
In the Middle
• WADup
  – A route is explicitly withdrawn
  – Then re-announced as reachable
  – Could be
    • Pathological
    • Forwarding instability: a transient topological change
• AADup
  – A route is implicitly withdrawn
  – Replaced with a duplicate of the original route
    • Same AS-PATH and next-hop
  – Could be
    • Pathological
    • Policy fluctuation: differing in other policy attributes
Pathological
• WWDup
  – Repeated withdrawals for a prefix no longer reachable
  – Pathological
Observations – The Majority
• Pathological updates (redundant)
  – Minimum effect on
    • Route quality
    • Router processing load
  – Some disagree
    • They add a significant amount of traffic
    • 300 updates/second could crash a high-end router
Observation - Instability
• Forwarding instability
  – 3-10% WADiff
  – 5-20% AADiff
  – 10-50% WADup
• Policy fluctuation
  – AADup quite high
  – But most probably pathological
• Needed nonetheless
  – Internet routing works because of these necessary and frequent updates
Observation – Distribution
• No spatial correlation
  – Correlates to router implementation instead
• Temporal
  – Time-of-day effect, day-of-week effect
  – Therefore correlates to network congestion
• Periodicity
  – 30-, 60-second periods
  – Possibly self-synchronization, misconfiguration, BGP being soft-state based, etc.
Basically, not saying much…
But for the background
And ease of reading
Questions?
What Should You Do?
• Routing policy
  – Intra-AS: shortest path
  – Inter-AS: shortest path (95%, 84% OK)
  – Better models in progress…
• Packet losses
  – 2-state Markov chain model
    • pl: some info
    • pn: no info…
• Routing instability: outage time
  – Paper #2 of the original paper set (OSPF vs. DV)