
T-110.5116 Computer Networks II Network diagnostics and traffic analysis 12/19.11.2012 Matti Siekkinen (Sources: R.Teixeira: Internet measurements: fault detection, identification, and topology discovery; S. Kandula: Detailed Diagnosis in Enterprise Networks)



• 23.9.2010



Concerning exam dates

•  Network security exam on the same date: December 17 –  Different time, though…

•  New additional exam date: January 3, 2013


Outline

•  What is QoS? –  Overview of QoS mechanisms

•  Network diagnostics and traffic analysis –  What, why, and how?

•  Measuring networks –  Topology discovery –  Bandwidth measurements –  Network Tomography

•  Traffic analysis –  Root cause analysis –  Application-level analysis –  Traffic anomaly detection

•  Conclusions


What is Quality of Service?

•  Many applications are sensitive to delay, jitter, and packet loss –  If these values are too high, the application's utility drops to zero

•  Some mission-critical applications cannot tolerate disruption –  VoIP –  high-availability computing

•  Related concept is service availability –  How likely is it that I can place a call and not get interrupted? –  requires meeting the QoS requirements for the given application


Example QoS Requirements

[Figure: example applications placed by delay sensitivity (insensitive to sensitive) and mission criticality (casual to critical): personal voice over IP, network monitoring with analysis, CEO video conference, financial transactions, interactive whiteboard, unicast radio, network management traffic, extranet web traffic, public web traffic, push news, personal e-mail, business e-mail, server backups]


How to guarantee QoS

•  Provisioning (before any data packets are sent)
–  Admission control: prohibit or allow new flows to enter the network; make sure the necessary bandwidth is available in the network
–  Resource reservation: reserve the necessary bandwidth in the network

•  Control (during data transfer)
–  Scheduling (FIFO, WFQ): which flow gets a share of the resources at a given time instant
–  Queue management (drop-tail, RED): if the buffer fills up, which flow do we punish?
–  Policing (leaky/token bucket): force flows to behave according to the agreed policy, e.g. send traffic at a constant rate R
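Policing can be illustrated with a token bucket. The sketch below is a minimal model, assuming a rate R in bytes per second and a bucket depth b (the largest allowed burst); the class and method names are illustrative and not taken from any router implementation.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    """Token bucket policer (illustrative sketch).

    Tokens accumulate at `rate` bytes per second up to `burst` bytes; a packet
    conforms if enough tokens are available when it arrives.
    """
    rate: float    # R: allowed long-term rate in bytes per second
    burst: float   # b: bucket depth, i.e. the largest burst allowed in bytes
    tokens: float = field(init=False)
    last: float = field(default=0.0, init=False)  # time of the previous arrival

    def __post_init__(self) -> None:
        self.tokens = self.burst  # start with a full bucket

    def conforms(self, now: float, size: int) -> bool:
        """Return True if a packet of `size` bytes arriving at time `now` conforms."""
        # Refill for the elapsed time, but never beyond the bucket depth.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size  # conforming packet spends its tokens
            return True
        return False  # non-conforming: a policer would drop or mark it
```

With `TokenBucket(rate=1000.0, burst=1500.0)`, a 1500-byte packet at t = 0 conforms, a 1000-byte packet at t = 0.5 s does not (only 500 tokens have accrued), and a 1000-byte packet at t = 1.5 s conforms again. The bucket depth is what lets short bursts through while still holding the long-term rate to R.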


How to guarantee QoS (cont.)

[Figure: QoS mechanisms in the network: packet scheduling (users get their share of bandwidth), admission control (accept or reject a flow based on flow specifications) and traffic shaping (policing to control the amount of traffic users can inject into the network); the core network is also shown]
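Admission control and resource reservation can be sketched together on a simple model, assuming each link has a known capacity and a running total of reserved bandwidth (both hypothetical dictionaries here). A new flow is accepted only if every link on its path has enough unreserved capacity; if it is accepted, its demand is reserved on all of them.

```python
from typing import Dict, List


def admit_flow(path: List[str], demand: float,
               capacity: Dict[str, float], reserved: Dict[str, float]) -> bool:
    """Admission control plus reservation along a path (hypothetical model).

    A flow requesting `demand` bit/s is accepted only if every link on its
    path still has that much unreserved capacity; if so, the bandwidth is
    reserved on all links, otherwise nothing is changed.
    """
    # Admission control: check the residual (unreserved) capacity of each link.
    if any(capacity[link] - reserved.get(link, 0.0) < demand for link in path):
        return False
    # Resource reservation: commit the demand on every link of the path.
    for link in path:
        reserved[link] = reserved.get(link, 0.0) + demand
    return True
```

For example, with capacities of 10 Mbit/s on `l1` and 5 Mbit/s on `l2`, a 4 Mbit/s flow over both links is admitted, after which a 2 Mbit/s flow over the same path is refused because `l2` has only 1 Mbit/s left.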


How to guarantee QoS (cont.)

•  These are network-layer techniques –  Each router needs to support this –  Together allow perfect control of QoS

•  The Internet does not implement these mechanisms –  Today this works only within some ISPs’ networks

•  Technically we know how to do it Internet wide but other reasons prevent deployment –  E.g. lack of business models


What if we had QoS guarantees?

•  Your Internet subscription states the SLA –  Describes what kind of service you will get –  E.g. guaranteed bandwidth of B with a maximum delay of D when no higher-priority customers are present

•  How would you perceive that?

–  A YouTube video would either stream perfectly or not load at all •  Your SLA may not get the flow admitted at the moment

–  A Skype call would never be of bad quality, but the call can be refused or interrupted

–  Downloading a file (size S) takes exactly S/B seconds –  Assuming, obviously, that the network is not broken and you have coverage (wireless)…


QoS in Today’s Internet

TCP/UDP/IP: “best-effort service” •  no guarantees on delay, loss

Today’s Internet applications use application-level techniques to mitigate (as far as possible) the effects of delay and loss

But some apps (multimedia) require QoS and a certain level of performance to be effective!


What can be done today to control QoS

•  Mainly application-level techniques –  Application adapts to network conditions

•  Buffer stream data, conceal errors, … –  Use overlay networks –  No need to change anything in routers

•  Make the best out of the best effort network –  Cannot guarantee anything

•  No guarantees means we cannot be sure what kind of QoS we get –  Monitoring is important –  Enter network diagnostics and traffic analysis…


Outline

•  What is QoS? –  Overview of QoS mechanisms

•  Network diagnostics and traffic analysis –  What, why, and how?

•  Measuring networks –  Topology discovery –  Bandwidth measurements –  Network Tomography

•  Traffic analysis –  Root cause analysis –  Application-level analysis –  Traffic anomaly detection

•  Conclusions


Network diagnostics and traffic analysis

•  Understand how the network is doing –  Detect and diagnose faults (links, routers, …) –  Identify performance bottlenecks

•  E.g. congested link

•  Detect and quarantine misbehaving devices or traffic –  Anomaly detection –  E.g. misconfigured router, attacks

•  Learn what kind of QoS users perceive –  Performance evaluation of applications –  Analyze resulting traffic to infer perceived QoS –  Goal is obviously to improve if possible


Why bother?

•  Keep things going –  Stuff breaks down –  Operators and admins are human beings and make mistakes –  Want to keep the networks operational

•  Maximum benefit out of the infrastructure –  Equipment costs money –  Maximize utilization

•  Happy customers –  Performance troubles make them unhappy –  Unhappy customers may decrease revenues


Why is it challenging?

•  Few built-in diagnosis mechanisms –  Today’s networks run on IP –  Network elements are “simple” –  Intelligence lies at the edges ⇒ May need to use complex end-to-end methods to measure simple things (e.g. link capacity)

•  Scale can be very large –  Traffic volumes –  Number of nodes –  Different services and protocols ⇒ Diagnosis techniques need to be scalable too


Diagnosing networks

•  Obtain some input data –  SNMP traps, syslog msgs, trouble tickets, traffic traces etc.

•  Inference / Analysis –  Analyze the input data –  E.g. learn that a router link is down

•  Do something about it –  E.g. start fixing the link

[Figure: collect raw measurements → analyze measurements → use learned information]


Ways to collect data for diagnosis

•  Management tools –  Ask the devices how they are doing –  Receive alarms, traps –  E.g. SNMP

•  Passive measurements –  Simply record what you observe –  E.g. Cisco’s NetFlow traffic data or raw traffic header traces

•  Active measurements –  Send probes and observe what happens to them –  E.g. tomography, bandwidth measurements


Where to collect measurements?

•  Network aggregation points –  Router, switch –  Access, gateway, backbone –  Depends on scale, available methods, and objectives

•  Client or server –  Usually limited possibilities –  Possible in data center networks

[Figure: measurement points in ISP 1, ISP 2 and ISP 3: backbone routers, access routers serving customers, and gateway routers]


Analyzing data

•  On-line –  Perform (at least a part of) the analysis on the observed data in real time ☺ Data reduction -> don’t store everything ☺ Can react quickly ☹ Scalability •  A 10 Gbit/s link produces >8 MB/s of uncompressed packet headers •  May need sampling, aggregation ☹ Do not necessarily have all the raw data for later analysis

•  Off-line –  Record data into persistent storage and analyze later ☺ Can run complex, time-consuming analysis ☹ Not for time-critical analysis ☹ Storage issues
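The scalability point can be checked with back-of-the-envelope arithmetic. The sketch below assumes a fully loaded link, a fixed average packet size, and that only the first `hdr_bytes` bytes (e.g. 40 bytes of TCP/IP headers) of each packet are kept; these parameters are illustrative. It also shows random 1-in-n packet sampling as one simple data-reduction option.

```python
import random
from typing import Iterable, List


def header_rate(link_bps: float, avg_pkt_bytes: float, hdr_bytes: float) -> float:
    """Bytes per second of captured headers on a fully loaded link.

    Assumes every packet has the same (average) size and that `hdr_bytes`
    bytes of each packet are kept.
    """
    packets_per_s = link_bps / 8 / avg_pkt_bytes
    return packets_per_s * hdr_bytes


def sample_one_in_n(packets: Iterable, n: int, seed: int = 1) -> List:
    """Random 1-in-n packet sampling: keep each packet with probability 1/n."""
    rng = random.Random(seed)
    return [p for p in packets if rng.random() < 1.0 / n]


# 10 Gbit/s, assumed 1500-byte packets and 40-byte TCP/IP headers
rate = header_rate(10e9, 1500, 40)
print(f"{rate / 1e6:.1f} MB/s")                                # 33.3 MB/s
print(f"{rate / 100 / 1e6:.2f} MB/s with 1-in-100 sampling")  # 0.33 MB/s
```

Even with large 1500-byte packets the header stream is well above the 8 MB/s lower bound; smaller average packets push it higher, which is why sampling or aggregation is often needed for on-line analysis.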


Analyzing data (cont.)

•  Human vs. machine –  Statistical analysis and data mining techniques –  Reveal non-trivial patterns (aggregate/similar behavior, anomalies) –  Still need an admin/operator somewhere in the loop

•  Combine many data sources –  Increase robustness •  Fewer false positives –  Detect issues that would normally “fly under the radar” •  Aggregated input feeds may reveal more


Analyzing data (cont.)

•  Wait a minute, we already have SNMP! –  E.g. routers can produce traps when something goes wrong

•  Alarms and traps from devices are not enough even for one network –  Network “black holes” •  Silent failures: network devices do not send alarms •  Causes: complex cross-layer interactions, router software bugs/misconfigurations, … –  Need detailed and application-specific diagnosis •  Want to know the causes of failures/problems that raise alarms

•  End-to-end diagnosis –  Diagnosis across administrative domains –  You cannot make an SNMP query to a router in an Australian ISP’s network


Outline

•  What is QoS? –  Overview of QoS mechanisms

•  Network diagnostics and traffic analysis –  What, why, and how?

•  Measuring networks –  Topology discovery –  Bandwidth measurements –  Network Tomography

•  Traffic analysis –  Root cause analysis –  Application-level analysis –  Traffic anomaly detection

•  Conclusions


Measuring networks

•  Measurements and diagnosis of network properties –  Bandwidth, delay, connectivity, reachability…

•  How? –  Active measurements

•  Probing messages analyzed at the other end •  Clever use of standard protocols

–  ping, traceroute –  Passively collected data (e.g. routing logs)

•  Three example cases –  Topology discovery –  Bandwidth measurements –  Network tomography


Topology

•  What’s topology? –  Topology describes how the network is laid out

•  Links between routers, switches, etc. •  Not trivial knowledge in large scale networks

•  What’s Internet topology like? –  Internet consists of Autonomous Systems (AS)

•  “a connected group of one or more IP prefixes run by one or more network operators which has a single and clearly defined routing policy” [RFC 1930]

•  E.g. Internet Service Provider (ISP) –  The Internet has a two-level topology

•  Intra-domain topology –  Within a single network (AS)

•  Inter-domain topology –  Across ASs


Internet topology: illustration


Internet topology (cont.)

•  Internet service providers (ISPs) are grouped into tiers –  Tier 1: global •  10-15 ISPs •  The Internet’s “backbone” •  Settlement-free peering: carry each other’s traffic without charges –  Tier 2: regional •  Both peering and transit services –  Tier 3: local •  Solely transit (buy connectivity from higher-tier ISPs)


Internet topology in 2008

•  A few tens of thousands of ASs –  Size varies


Topology discovery

•  Find out the topology of a given network by –  probing (active measurements) –  analyzing logs and/or traffic (passive measurements)

•  Why is it useful? –  Some diagnosis methods rely on accurate topology information •  E.g. network tomography needs topology –  Realistic simulation and modeling of the Internet •  Topology models needed for simulations •  E.g. the performance of routing protocols is critically dependent on topology


Topology discovery (cont.)

•  Granularity level –  Router-level topologies

•  Reflect physical connectivity between nodes •  Inferred using e.g. traceroute

–  AS graphs •  Peering relationships between providers/clients •  Inferred from inter-domain routers’ BGP tables •  Could also use traceroute with some additional information

•  Measurement location –  With access to routers (“from inside”)

•  Topology of one network •  Routing monitors (OSPF or IS-IS)

–  No access to routers (“from outside”) •  Multi-AS topology or from end-hosts •  Monitors issue active probes: traceroute


Topology from inside

•  Routing protocols flood state of each link –  Periodically refresh link state –  Report any changes: link down, up, cost change

•  Monitor listens to link-state messages –  Acts as a regular router

•  AT&T’s OSPFmon or Sprint’s PyRT for IS-IS

•  Combining link states gives the topology –  Easy to maintain, messages report any changes

•  Usually not possible across domains
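Combining link states into a topology can be sketched as a toy model of what a listening monitor keeps. This assumes each flooded message simply carries a router's current adjacency list (router → {neighbor: cost}); real OSPF LSAs and IS-IS LSPs carry much more, and the function names here are hypothetical.

```python
from typing import Dict, Set, Tuple

Topology = Dict[str, Dict[str, int]]  # router -> {neighbor: link cost}


def apply_link_state(topo: Topology, router: str, adjacencies: Dict[str, int]) -> None:
    """Replace `router`'s entry with the adjacencies from its latest message."""
    topo[router] = dict(adjacencies)


def current_links(topo: Topology) -> Set[Tuple[str, ...]]:
    """Undirected links that both endpoints currently advertise."""
    return {tuple(sorted((r, n)))
            for r, nbrs in topo.items()
            for n in nbrs
            if r in topo.get(n, {})}  # simple consistency check: seen from both ends


topo: Topology = {}
apply_link_state(topo, "A", {"B": 10, "C": 5})
apply_link_state(topo, "B", {"A": 10})
apply_link_state(topo, "C", {"A": 5})
print(sorted(current_links(topo)))  # [('A', 'B'), ('A', 'C')]
apply_link_state(topo, "B", {})     # B reports that its link to A went down
print(sorted(current_links(topo)))  # [('A', 'C')]
```

Because every change is flooded, the monitor only has to apply each new message; no active probing is needed to keep the view current.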


Inferring a path from outside: traceroute

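[Figure: monitor m traces toward target t through routers A (interfaces A.1, A.2) and B (interfaces B.1, B.2); the probe with TTL = 1 triggers “TTL exceeded” from A.1 and the probe with TTL = 2 triggers it from B.1, so the inferred path is m – A.1 – B.1 – t, compared with the actual path]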
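The TTL mechanism can be reproduced with a toy forwarding model. It assumes each router decrements the TTL and, when the TTL reaches zero, answers from its incoming interface, while the target answers once a probe reaches it; the path below is a hypothetical stand-in for the figure.

```python
from typing import List, Tuple

# Hypothetical path from the figure: (router, interface that answers) per hop,
# ending with the target itself.
PATH: List[Tuple[str, str]] = [("A", "A.1"), ("B", "B.1"), ("t", "t")]


def send_probe(path: List[Tuple[str, str]], ttl: int) -> str:
    """Forward one probe hop by hop and return the address of the reply."""
    for _router, in_iface in path[:-1]:
        ttl -= 1                 # every router decrements the TTL
        if ttl == 0:
            return in_iface      # "TTL exceeded", sent from the incoming interface
    return path[-1][1]           # probe reached the target, which replies itself


def traceroute(path: List[Tuple[str, str]], max_ttl: int = 30) -> List[str]:
    """Send probes with TTL = 1, 2, ... until the target answers."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        addr = send_probe(path, ttl)
        hops.append(addr)
        if addr == path[-1][1]:
            break
    return hops


print(traceroute(PATH))  # ['A.1', 'B.1', 't']
```

The result lists interface addresses (A.1, B.1), not routers, which is the root of the alias-resolution problem discussed later.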


A traceroute path can be incomplete

•  Load balancing is widely used –  Forward packets differently based on load in different parts of the network –  Can be per-flow or even per-packet –  Traceroute only probes one path

•  Sometimes traceroute gets no answer (stars) –  ICMP rate limiting for DoS protection –  Anonymous routers •  Do not send ICMP replies at all or reply with the probe’s destination IP •  Security and privacy concerns

•  Tunnelling (e.g., MPLS) may hide routers –  Routers inside the tunnel may not decrement TTL


Traceroute under load balancing

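[Figure: actual vs. inferred path from monitor m to target t through load balancer L, which splits traffic over branches through nodes A, B, C and D that rejoin at E; the probes with TTL = 2 and TTL = 3 follow different branches, so the inferred path has missing nodes and links and a false link]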


Traceroute under load balancing (cont.)

•  Even per-flow load balancing causes trouble •  Traceroute uses the destination port as identifier

–  Needs to match probe to response –  Response only has the header of the issued probe

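[Figure: the same topology; the probe with TTL = 2 uses destination port 2 and the probe with TTL = 3 uses port 3, so load balancer L may forward them along different branches]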
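The effect of per-flow load balancing on classic traceroute, and the Paris traceroute fix, can be simulated on a small hypothetical topology: L splits flows over branch A → C or branch B → D, which rejoin at E. The balancer here uses a toy hash (sum of ports modulo the number of branches) chosen only so the result is easy to follow; real balancers hash the full flow ID, but any function of the flow ID has the same property that one flow always takes one branch.

```python
from typing import Callable, List

# Hypothetical topology after the figure: L splits flows over two branches
# (A -> C or B -> D) that merge again at E.
BRANCHES = [["A", "C"], ["B", "D"]]
SRC_PORT = 40000  # assumed fixed source port


def pick_branch(src_port: int, dst_port: int) -> int:
    """Toy per-flow hash: same flow ID -> same branch."""
    return (src_port + dst_port) % len(BRANCHES)


def responder(ttl: int, dst_port: int) -> str:
    """Node whose "TTL exceeded" answers a probe with this TTL (L is hop 1)."""
    if ttl == 1:
        return "L"
    if ttl in (2, 3):
        return BRANCHES[pick_branch(SRC_PORT, dst_port)][ttl - 2]
    return "E"


def trace(port_for_ttl: Callable[[int], int], hops: int = 4) -> List[str]:
    return [responder(ttl, port_for_ttl(ttl)) for ttl in range(1, hops + 1)]


classic = trace(lambda ttl: 33434 + ttl)  # new destination port for every probe
paris = trace(lambda ttl: 33435)          # Paris traceroute: constant flow ID
print(classic)  # ['L', 'A', 'D', 'E'] -> false link A-D, B and C missed
print(paris)    # ['L', 'B', 'D', 'E'] -> a path that actually exists
```

Changing the destination port changes the flow ID, so consecutive probes can land on different branches; keeping the flow ID fixed makes all probes follow one real path.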


Paris traceroute

•  Solves the problem with per-flow load balancing –  Probes to a destination belong to same flow

•  Keep flow IDs constant for probes to a specific destination –  Flow ID = src/dest IP & port, transport protocol

•  How to match probes with ICMP responses? –  Need to know which ICMP response corresponds to which probe


Paris traceroute

•  Matching probes with ICMP responses –  Vary fields within the first eight octets of the transport-layer header (included in the ICMP response) –  Keep the flow-ID-related fields constant –  UDP probes: vary the checksum (need to manipulate the payload too) –  ICMP probes: vary #seq, but also the Identifier -> keep the checksum constant

[Figure: the same topology; both probes (TTL = 2 and TTL = 3) use port 1, so they follow the same branch, and they are told apart by their checksums (2 and 3)]
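For ICMP probes the trick works because the Internet checksum is a one's-complement sum of 16-bit words: if the sequence number goes up by one and the Identifier goes down by one, their sum, and therefore the checksum, is unchanged. The sketch below builds ICMP Echo Request headers this way; the constant `total = 0x8000` is an arbitrary illustrative choice that keeps Identifier + sequence free of 16-bit overflow.

```python
import struct
from typing import List


def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                      # fold carries into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def echo_request(ident: int, seq: int) -> bytes:
    """8-byte ICMP Echo Request header (type 8, code 0) with its checksum."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    return struct.pack("!BBHHH", 8, 0, internet_checksum(header), ident, seq)


def paris_icmp_probes(n: int, total: int = 0x8000) -> List[bytes]:
    """Probe i gets seq = i and Identifier = total - i, so ident + seq is
    constant and every probe carries the same checksum."""
    return [echo_request(total - seq, seq) for seq in range(1, n + 1)]


probes = paris_icmp_probes(30)
print({p[2:4].hex() for p in probes})  # {'77ff'}: one checksum for all probes
```

The sequence number still identifies each probe in the quoted header of the ICMP reply, while a balancer that hashes the first four octets (type, code, checksum) sees the same flow every time.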


More traceroute shortcomings

•  Inferred nodes = interfaces, not routers –  Different interfaces have different IP addresses

•  Coverage depends on monitors and targets –  Misses links and routers –  Some links and routers appear multiple times

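[Figure: actual topology (routers A–D with numbered interfaces, monitors m1 and m2, targets t1 and t2) vs. the inferred topology, built from interface addresses (A.1, B.3, C.1, C.2, D.1) rather than routers]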


Alias resolution: Map interfaces to routers

•  Direct probing –  The IP identifier (IPID) in the IP header is usually a counter incremented per packet (or per jiffy) –  Responses from the same router have close IPIDs and the same TTL

•  Record-route IP option –  Records only up to nine IP addresses of routers in the path •  Enough in many cases –  Some routers may drop packets with IP options •  Usually for security reasons –  Can also discover outgoing interfaces

[Figure: the inferred topology (A.1, B.3, C.1, C.2, D.1, m1, m2, t1, t2), with interfaces marked as belonging to the same router]
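Direct probing with IPIDs can be sketched as a simplified version of the Ally test from the Rocketfuel project: probe address x, then y, then x again, and call them aliases if the three IPIDs increase in small steps (as expected from one shared counter) and the replies arrive with the same TTL. The gap threshold of 200 and the simulated routers below are assumptions for illustration.

```python
import itertools
from typing import Callable, Tuple

Reply = Tuple[int, int]  # (IPID, TTL) observed in a probe response


def likely_aliases(probe: Callable[[str], Reply], x: str, y: str,
                   max_gap: int = 200) -> bool:
    """Simplified IPID-based alias test: probe x, then y, then x again.

    If both addresses share one router's IPID counter, the three IPIDs
    increase in small steps (modulo 2**16) and all replies carry the same TTL.
    """
    replies = [probe(x), probe(y), probe(x)]
    ids = [ipid for ipid, _ in replies]
    same_ttl = len({ttl for _, ttl in replies}) == 1
    gaps = [(b - a) % 65536 for a, b in zip(ids, ids[1:])]  # 16-bit counter wraps
    return same_ttl and all(0 < g <= max_gap for g in gaps)


# Simulated network: interfaces C.1 and C.2 belong to router r1, B.3 to r2.
counters = {"r1": itertools.count(1000, 3), "r2": itertools.count(40000, 5)}
owner = {"C.1": "r1", "C.2": "r1", "B.3": "r2"}
reply_ttl = {"r1": 61, "r2": 62}


def probe(addr: str) -> Reply:
    router = owner[addr]
    return next(counters[router]) % 65536, reply_ttl[router]


print(likely_aliases(probe, "C.1", "C.2"))  # True
print(likely_aliases(probe, "C.1", "B.3"))  # False
```

Real routers see other traffic between probes, so the threshold trades false merges against missed aliases; routers with random or per-interface IPIDs defeat this test entirely.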


Large-scale topology measurements

•  Probing a large topology takes time –  E.g., probing 1200 targets from PlanetLab nodes takes 5 minutes on average (using 30 threads) –  Probing more targets covers more links –  But getting a topology snapshot takes longer

•  A snapshot may be inaccurate –  Paths may change during the snapshot –  To know that a path changed, need to re-probe


Large-scale topology measurements

•  It is possible to reduce redundant probing –  Topologies have tree like structures with aggregation points –  Can skip redundant segments that are already discovered


B. Donnet et al.: “Efficient Algorithms for Large-Scale Topology Discovery”. SIGMETRICS 2005.
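The idea of skipping already-discovered segments can be sketched with a heavily simplified variant of the Doubletree algorithm from the paper above: one monitor probes each path backward from the destination and stops at the first interface an earlier trace already found, assuming the rest of the path toward the monitor is then known (tree-like structure near the monitor). The real algorithm also probes forward and shares a global stop set between monitors; the paths below are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def probe_all(paths: Dict[str, List[str]], use_stop_set: bool) -> Tuple[Set[str], int]:
    """Discover the interfaces on paths from one monitor; return (interfaces, probes).

    Each path is probed from the destination back toward the monitor. With
    `use_stop_set`, probing stops at the first interface already discovered.
    """
    seen: Set[str] = set()
    probes = 0
    for hops in paths.values():
        for iface in reversed(hops):
            probes += 1
            if use_stop_set and iface in seen:
                break  # the remaining hops toward the monitor are assumed known
            seen.add(iface)
    return seen, probes


paths = {
    "t1": ["a", "b", "c", "t1"],
    "t2": ["a", "b", "d", "t2"],
    "t3": ["a", "b", "c", "e", "t3"],
}
full, n_full = probe_all(paths, use_stop_set=False)
pruned, n_pruned = probe_all(paths, use_stop_set=True)
print(n_full, n_pruned, full == pruned)  # 13 10 True
```

Here the stop set saves 3 of 13 probes without losing any interface; the saving grows with the number of destinations that share the same tree near the monitor.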


Outline

•  What is QoS? –  Overview of QoS mechanisms

•  Network diagnostics and traffic analysis –  What, why, and how?

•  Measuring networks –  Topology discovery –  Bandwidth measurements –  Network Tomography

•  Traffic analysis –  Root cause analysis –  Application-level analysis –  Traffic anomaly detection

•  Conclusions


Bandwidth measurements

•  What? –  Infer the bandwidth of a specific hop or of a whole path –  Capacity = maximum possible throughput –  Available bandwidth = portion of capacity not currently used –  Bulk transfer capacity = throughput that a new single long-lived TCP connection could obtain

•  Why? –  Network-aware applications •  Server or peer selection •  Route selection in overlay networks –  QoS measurements


Challenges

•  Routers and switches do not provide direct feedback to end-hosts –  Except ICMP (traceroute) –  Mostly due to scalability, policy, and simplicity reasons

•  End-to-end bandwidth cannot be measured with SNMP –  No access because of administrative barriers –  Network administrators can query router/switch information only within their own network


The Internet as a “black box”

•  End-systems can infer network state through end-to-end (e2e) measurements –  Without any explicit feedback from routers –  Objectives: accuracy, speed, minimal intrusiveness

[Figure: end hosts send probing packets through the Internet, treated as a black box]


Metrics and definitions

•  Simple example of an end-to-end path

[Figure: source host – link1 (access link) – router1 – link2 – router2 – link3 (access link) – destination host, with cross traffic entering at router1 and router2]


Metrics and definitions (cont.)

•  Capacity of this path is 100 Mbps –  Determined by the narrow link

•  Available bandwidth of this path is 50 Mbps –  Determined by the tight link

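[Figure: capacity, available bandwidth and used bandwidth of the three links on the path (link1, link2, link3)]

Link         | Capacity  | Available bandwidth | Used bandwidth
Narrow link  | 100 Mbps  | 90 Mbps             | 10 Mbps
Other link   | 2500 Mbps | 1300 Mbps           | 1200 Mbps
Tight link   | 1000 Mbps | 50 Mbps             | 950 Mbps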
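Both path metrics are minima over the links: capacity is set by the link with the smallest capacity (narrow link), available bandwidth by the link with the smallest unused capacity (tight link). A minimal sketch, assuming per-link (capacity, used) values in Mbit/s as in the table; link order does not matter for either minimum.

```python
from typing import List, Tuple


def path_metrics(links: List[Tuple[float, float]]) -> Tuple[float, float]:
    """links: (capacity, used bandwidth) per link, in Mbit/s.

    Returns (path capacity, available bandwidth): the minimum capacity
    (narrow link) and the minimum of capacity - used (tight link).
    """
    capacity = min(c for c, _ in links)
    available = min(c - u for c, u in links)
    return capacity, available


links = [(100, 10), (2500, 1200), (1000, 950)]
print(path_metrics(links))  # (100, 50)
```

The narrow and tight links need not coincide, as here: the 100 Mbps link limits capacity, while the heavily used 1000 Mbps link limits available bandwidth.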


Measurement techniques

•  Generally use active probing –  Send packets with a specific inter-arrival pattern –  Observe the pattern at the other end

•  Example: Packet-pair technique for capacity estimation –  Send two equal-sized packets back-to-back •  Packet size: L •  Packet transmission time at link i: L/C_i

–  P-P dispersion: time interval between the first bits of the two packets –  A link of capacity C_i turns an incoming dispersion Δ_in into

$$\Delta_{out} = \max\left(\Delta_{in}, \frac{L}{C_i}\right)$$

–  Without any cross traffic, the dispersion at the receiver is determined by the narrow link:

$$\Delta_R = \max_{i=1,\dots,H}\left(\frac{L}{C_i}\right) = \frac{L}{C}$$

where C = path capacity and H = number of links (hops) on the path

[Figure: an incoming packet pair with dispersion Δin enters a link of capacity Ci and leaves as an outgoing packet pair with dispersion Δout]
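The two formulas translate directly into code: propagate the dispersion link by link with Δ_out = max(Δ_in, L/C_i), then estimate the capacity as L/Δ_R. This sketch assumes FIFO links and no cross traffic; link capacities and the 1500-byte packet size are illustrative values.

```python
from typing import List


def dispersion_after(links_bps: List[float], pkt_bits: float,
                     delta_in: float = 0.0) -> float:
    """Packet-pair dispersion after traversing the links (no cross traffic):
    each link outputs max(delta_in, L / C_i)."""
    delta = delta_in
    for c in links_bps:
        delta = max(delta, pkt_bits / c)
    return delta


def estimate_capacity(links_bps: List[float], pkt_bits: float) -> float:
    """Capacity estimate C = L / dispersion measured at the receiver."""
    return pkt_bits / dispersion_after(links_bps, pkt_bits)


L = 1500 * 8                      # packet size in bits
links = [100e6, 2500e6, 1000e6]   # link capacities in bit/s
print(estimate_capacity(links, L))  # about 1e8 bit/s: the narrow link
```

Without cross traffic, the slowest link stretches the pair the most and later, faster links cannot shrink it again, so the receiver recovers the narrow-link capacity.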


Bandwidth estimation with cross traffic

•  Cross traffic packets can affect P-P dispersion –  P-P expansion: capacity underestimation –  P-P compression: capacity overestimation

•  Noise in P-P distribution depends on cross traffic load


Ideal Packet Dispersion


•  No cross-traffic

Capacity = (Packet Size) / (Dispersion)


Expansion of Dispersion

•  Cross-traffic (CT) serviced between PP packets

•  Second packet queues due to CT → Expansion of dispersion → Under-estimation of capacity


Compression of Dispersion

•  First packet queueing → Compressed dispersion → Over-estimation of capacity


CapProbe

•  The CapProbe estimation tool takes cross-traffic into account •  Observations:

–  First packet queues more than the second •  Compression → Over-estimation

–  Second packet queues more than the first •  Expansion → Under-estimation

–  Both result from probe packets experiencing queuing •  The sum of the PP delays includes queuing delay

•  Filter out PP samples that do not have the minimum queuing time •  The dispersion of the PP sample with the minimum delay sum reflects the capacity

R. Kapoor et al.: “CapProbe: A Simple and Accurate Capacity Estimation Technique”. SIGCOMM 2004.


CapProbe approach

•  For each packet pair, CapProbe calculates delay sum: delay(packet_1) + delay(packet_2)

•  A PP with the minimum delay sum points out the capacity

[Figure: packet-pair samples; the sample with the minimum delay sum indicates the capacity]
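The CapProbe selection rule can be sketched in a few lines: among all packet-pair samples, take the one with the smallest delay sum (assumed to have seen no queuing) and estimate capacity = L / dispersion from it. The sample format (delay of packet 1, delay of packet 2, dispersion, all in seconds) and the synthetic values below are assumptions for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (delay packet 1, delay packet 2, dispersion)


def capprobe_estimate(samples: List[Sample], pkt_bits: float) -> float:
    """Pick the pair with the minimum delay sum and return L / its dispersion."""
    _d1, _d2, dispersion = min(samples, key=lambda s: s[0] + s[1])
    return pkt_bits / dispersion


L = 1500 * 8  # bits; true capacity 10 Mbit/s gives a clean dispersion of 1.2 ms
samples = [
    (0.0200, 0.0250, 0.0050),  # second packet queued: expansion
    (0.0260, 0.0265, 0.0005),  # first packet queued: compression
    (0.0200, 0.0212, 0.0012),  # no queuing: minimum delay sum
]
print(capprobe_estimate(samples, L))  # about 1e7 bit/s
```

The expanded sample alone would give 2.4 Mbit/s and the compressed one 24 Mbit/s; filtering on the delay sum discards both.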


Bandwidth estimation tools

•  Many estimation tools & techniques –  Abing, netest, pipechar, STAB, pathneck, IGI/PTR, abget, Spruce, pathchar, clink, pchar, PPrate, DSLprobe, ABwProbe, …

•  Some practical issues –  Traffic shapers –  Non-FIFO queues

•  More scalable methods –  Passive measurements instead of active measurements

•  E.g. PPrate (2006) for capacity estimation: adapts Pathrate’s algorithm –  One measurement host instead of two cooperating ones •  abget (2006) for available bandwidth estimation •  DSLprobe for capacity estimation of asymmetric (ADSL) links