The Impact of Router Outages on the AS-Level Internet · 2017-10-27


www.caida.org

The Impact of Router Outages on the AS-Level Internet


Matthew Luckie* - University of Waikato Robert Beverly - Naval Postgraduate School

*work started while at CAIDA, UC San Diego

SIGCOMM 2017, August 24th 2017

Internet Resilience

Where are the Single Points of Failure?

[Figure: two example topologies. Example #A shows routers labelled CE, PE, PE; Example #B shows routers labelled CE, PE, CE.]

CE: Customer Edge; PE: Provider Edge

Internet Resilience

Where are the Single Points of Failure?

Example #A: If the CE router fails, the network is disconnected, so the CE router is a Single Point of Failure (SPoF).

[Figure: Example #A topology with routers labelled CE, PE, PE.]

CE: Customer Edge; PE: Provider Edge

Internet Resilience

Where are the Single Points of Failure?

Example #B: If the CE router fails, the network has an alternate path available, so the CE router is NOT a Single Point of Failure (SPoF).

[Figure: Example #B topology with routers labelled CE, PE, CE.]

CE: Customer Edge; PE: Provider Edge

Internet Resilience

Where are the Single Points of Failure?

Example #B: If the PE router fails, the customer network is disconnected, so the PE router is a Single Point of Failure (SPoF).

[Figure: Example #B topology with routers labelled CE, PE, CE.]

CE: Customer Edge; PE: Provider Edge
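The single-point-of-failure reasoning in Examples #A and #B can be sketched as a reachability check on a small graph. This is a minimal illustration, not the method used in the study. The link layout of each example and the node names (`customer`, `internet`, `CE1`, `PE1`, and so on) are assumptions made here, since the slides only show router labels.

```python
from collections import deque


def reachable(links, src, dst, failed):
    """Breadth-first search from src to dst, skipping the failed router."""
    if failed in (src, dst):
        return False
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbor in adjacency.get(node, ()):
            if neighbor != failed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def is_spof(links, router, src="customer", dst="internet"):
    """A router is a SPoF if its failure cuts src off from dst."""
    return not reachable(links, src, dst, failed=router)


# Assumed layout for Example #A: one CE router attached to two PE routers.
example_a = [("customer", "CE"), ("CE", "PE1"), ("CE", "PE2"),
             ("PE1", "internet"), ("PE2", "internet")]

# Assumed layout for Example #B: two CE routers attached to one PE router.
example_b = [("customer", "CE1"), ("customer", "CE2"),
             ("CE1", "PE"), ("CE2", "PE"), ("PE", "internet")]

print(is_spof(example_a, "CE"))   # True
print(is_spof(example_a, "PE1"))  # False
print(is_spof(example_b, "CE1"))  # False
print(is_spof(example_b, "PE"))   # True
```

A static check like this only works when the full topology is known, which is the limitation the next slide raises: backup paths may be invisible to measurement.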

Challenges in topology analysis

• Prior approaches analyzed static AS-level and router-level topology graphs

- e.g.: Nature 2000

• Important AS-level and router-level topology might be invisible to measurement, such as backup paths

- e.g.: INFOCOM 2002

• A router that appears to be central to a network’s connectivity might not be

- e.g.: AMS 2009

What we did

Large-scale (Internet-wide), longitudinal (2.5 years) measurement study to characterize the prevalence of Single Points of Failure (SPoF):

1. Efficiently inferred IPv6 router outage time windows

2. Associated routers with IPv6 BGP prefixes

3. Correlated router outages with BGP control plane

4. Correlated router outages with data plane

5. Validated inferences of SPoF with network operators

What we did

Identified IPv6 router interfaces from traceroute

83K to 2.4M interfaces from CAIDA’s Archipelago traceroute measurements

What we did

Probed router interfaces to infer outage windows

We used a single vantage point located at CAIDA, UC San Diego for the duration of this study

What we did

[Animation: each probe solicits a value from the router's central counter. The router returns the current value and then increments the counter.]

Probe     Value returned   Central counter after reply
T1        9290             9291
T2        9291             9292
T3        9292             9293
T4        9293             9294
T5        9294             9295
Reboot!   Central counter resets to 1
T6        1                2
T7        2                3
T8        3                4

What we did

Probed router interfaces to infer outage windows using IPID

T1: 9290, T2: 9291, T3: 9292, T4: 9293, T5: 9294, T6: 1, T7: 2, T8: 3

Outage Window: between T5 and T6

Infer a reboot when the time series of values returned from a router is discontinuous, indicating the router was restarted
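The reboot inference on this slide can be sketched in a few lines. This is a simplified illustration rather than the released implementation. It treats "discontinuous" as "the value went backwards between two consecutive probes", and it ignores 32-bit counter wrap and routers that do not assign IDs from a counter. The function name `infer_outage_windows` is hypothetical.

```python
def infer_outage_windows(samples):
    """Return (last_probe_before, first_probe_after) pairs bracketing a reboot.

    samples: list of (probe_time, ipid) tuples ordered by probe_time.
    Simplifying assumption: a counter that goes backwards means the
    router restarted somewhere between the two probes.
    """
    windows = []
    for (t_prev, id_prev), (t_next, id_next) in zip(samples, samples[1:]):
        if id_next < id_prev:
            windows.append((t_prev, t_next))
    return windows


# The time series shown on the slide.
series = [("T1", 9290), ("T2", 9291), ("T3", 9292), ("T4", 9293),
          ("T5", 9294), ("T6", 1), ("T7", 2), ("T8", 3)]

print(infer_outage_windows(series))  # [('T5', 'T6')]
```

The outage window is only bounded by the two probes on either side of the discontinuity, so its width depends on the probing interval.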

Why IPv6 fragment IDs?

• IPv4 Fragment IDs:

- 16 bits, bursty velocity: every packet requires unique ID

- At 100Mbps and 1500 byte packets, Nyquist rate dictates 4 second probing interval

• IPv6 Fragment IDs:

- 32 bits, low velocity: IPv6 routers rarely send fragments

- We average 15 minute probing interval
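The 4 second figure for IPv4 can be reproduced with a short calculation. The sketch below assumes the worst case in which every packet on a saturated link advances the ID counter, and it reads the Nyquist condition as "probe at least twice per counter wrap period". The function name `max_probe_interval` is hypothetical.

```python
def max_probe_interval(link_rate_bps, packet_bytes, id_bits):
    """Longest probing interval (seconds) that still samples the counter
    twice per wrap, assuming every packet consumes one ID."""
    packets_per_second = link_rate_bps / (packet_bytes * 8)
    wrap_seconds = 2 ** id_bits / packets_per_second
    return wrap_seconds / 2


# 100 Mbps link, 1500 byte packets, 16-bit IPv4 ID field.
print(round(max_probe_interval(100e6, 1500, 16), 2))  # 3.93
```

At about 8,333 packets per second, a 16-bit counter wraps in roughly 7.9 seconds, so the probe interval has to stay under about 4 seconds. The IPv6 fragment counter avoids this both because it is 32 bits wide and because it advances only when the router sends a fragment.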

What we did

Correlated routers with prefixes using traceroute paths

50-60 Ark VPs traceroute every routed IPv6 prefix every day

[Figure: Ark VPs tracerouting toward example prefixes 2001:db8:1::/48 and 2001:db8:2::/48.]

What we did

Computed distance of router from AS announcing network

[Figure: traceroute path from an Ark VP toward example prefixes 2001:db8:1::/48 and 2001:db8:2::/48, with routers labelled by hop distance: 0 (CE), 1 (PE), 2.]

CE: Customer Edge; PE: Provider Edge

What we did

Correlated router outage windows with BGP control plane

[Figure: example prefixes 2001:db8:1::/48 and 2001:db8:2::/48 behind a router at distance 0 (CE).]

Router IPID time series: T1: 9290, T2: 9291, T3: 9292, T4: 9293, T5: 9294, T6: 1, T7: 2, T8: 3

Outage Window: between T5 and T6

RouteViews updates for 2001:db8:2::/48 (W: withdrawal, A: announcement):

Time   Peer     Update
T5.2   Peer-1   W
T5.2   Peer-2   W
T5.3   Peer-3   W
T5.3   Peer-4   W
T5.8   Peer-3   A
T5.8   Peer-2   A
T5.8   Peer-1   A
T5.8   Peer-4   A

What we did

Classified impact on BGP according to observed activity overlapping with inferred outage:

• Complete Withdrawal: all peers simultaneously withdrew route for at least 70 seconds

- Single Point of Failure (SPoF)

• Partial Withdrawal: at least one peer withdrew route for at least 70 seconds, but not all did

• Churn: BGP activity for the prefix

• No Impact: no observed BGP activity for the prefix
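The four classes above can be expressed as a small decision function. This is a sketch under stated assumptions, not the authors' code. It assumes at most one withdrawal per peer during the outage, timestamps in seconds, and that "churn" means update activity with no withdrawal of at least 70 seconds. The function name, argument layout, and example timestamps are all hypothetical.

```python
MIN_WITHDRAWAL_SECONDS = 70  # threshold stated on the slide


def classify_impact(peers, withdrawals, had_updates):
    """Classify BGP impact for one prefix during one inferred outage.

    peers: set of peer names that carried the prefix before the outage.
    withdrawals: dict mapping peer -> (withdraw_time, announce_time) in
        seconds; assumes at most one withdrawal per peer.
    had_updates: True if any BGP update for the prefix was observed.
    """
    long_enough = {
        peer: (start, end)
        for peer, (start, end) in withdrawals.items()
        if end - start >= MIN_WITHDRAWAL_SECONDS
    }
    if peers and set(long_enough) == set(peers):
        # Period during which every peer had the route withdrawn at once.
        common_start = max(start for start, _ in long_enough.values())
        common_end = min(end for _, end in long_enough.values())
        if common_end - common_start >= MIN_WITHDRAWAL_SECONDS:
            return "complete withdrawal"
    if long_enough:
        return "partial withdrawal"
    if had_updates or withdrawals:
        return "churn"
    return "no impact"


# Hypothetical timestamps (seconds) for four peers.
peers = {"Peer-1", "Peer-2", "Peer-3", "Peer-4"}
all_withdrew = {"Peer-1": (120, 480), "Peer-2": (120, 480),
                "Peer-3": (180, 480), "Peer-4": (180, 480)}

print(classify_impact(peers, all_withdrew, True))             # complete withdrawal
print(classify_impact(peers, {"Peer-1": (120, 480)}, True))   # partial withdrawal
print(classify_impact(peers, {"Peer-1": (120, 150)}, True))   # churn
print(classify_impact(peers, {}, False))                      # no impact
```

Only the complete-withdrawal class is treated as evidence of a single point of failure, because it is the only case in which no monitored peer kept a route to the prefix.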

What we did

Data Collection Summary

• Probed IPv6 routers at ~15 minute intervals from 18 Jan 2015 to 30 May 2017 (approx. 2.5 years)

• 149,560 routers allowed reboots to be detected

• We inferred 59,175 (40%) rebooted at least once, 750K reboots in total

[Figure: CDF of Number of Outages (log-scale x-axis, 1 to 100).]
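As a quick sanity check on the summary figures, the percentage on the slide can be recomputed from the counts. The mean number of reboots per rebooting router is a derived value that does not appear on the slide. It uses the rounded "750K" total, so it is approximate, and the CDF suggests the distribution is skewed, so the mean should not be read as typical.

```python
routers_monitored = 149_560
routers_rebooted = 59_175
total_reboots = 750_000  # "750K" on the slide, so approximate

share_rebooted = routers_rebooted / routers_monitored
mean_reboots = total_reboots / routers_rebooted

print(f"{share_rebooted:.1%}")  # 39.6%
print(f"{mean_reboots:.1f}")    # 12.7
```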

What we found

• Of the 59K routers that rebooted, we inferred 2,385 (4%) to be a SPoF for at least one IPv6 prefix in BGP

• Of SPoF routers, we inferred 59% to be customer edge routers; 8% provider edge; 29% within destination AS

• No covering prefix for 70% of withdrawn prefixes

- During one-week sample, covering prefix presence during withdrawal did not imply data plane reachability

• IPv6 router reboots correlated with IPv4 BGP control plane activity

Limitations

• Applicability to IPv4 depends on router being dual-stack

• Requires IPID assigned from a counter

- Cisco, Huawei, Vyatta, MikroTik, HP assign from counter

- Of those responsive for 14 days, 27.1% assigned from counter

• Router outage might end before all peers withdraw route

- Path exploration + Minimum Route Advertisement Interval (MRAI) + Route Flap Dampening (RFD)

• Complex events: multiple router outages but one detected

- We observed some complex events and filtered them out

Validation

                       Reboots          SPoF
Network                ✔    ✘    ?      ✔    ✘    ?
US University          7    0    8      7    0    8
US R&E backbone #1     2    0    3      3    2    0
US R&E backbone #2     3    0    1      0    0    4
NZ R&E backbone        11   0    22     4    2    27
Total:                 23   0    34     14   4    39

✔ = Validated Inference; ✘ = Incorrect Inference; ? = Not Validated

Challenging to get validation data: operators often could only tell us about the last reboot

No falsely inferred reboots: we correctly observed the last known reboot of each router

We did not detect some SPoFs

Data Collection Summary

[Figure: number of interfaces (log-scale y-axis, 10K to 3M) from Jan ’15 to Jan ’17, with curves All and Incrementing, across probing phases (a), (b), and (c).]

Phase   PPS   List              Unresponsive
(a)     100   Static, 83K       12-24 hours
(b)     225   Static, 1.1M      12-24 hours
(c)     200   Dynamic, ~2.4M    7-14 days

Correlating BGP/router outages

Control: six hours prior to inferred outages, Feb 2015

[Figure: fraction of reboot/prefix pairs (y-axis, 0 to 0.5) showing Churn, Partial Withdrawal, and Complete Withdrawal, by distance of router from destination AS in IP hops (x-axis, 12 to −5), with regions marked Outside Dest. AS and Inside Dest. AS.]

Correlating BGP/router outages

During the inferred outages, Feb 2015

[Figure: fraction of reboot/prefix pairs (y-axis, 0 to 0.5) showing Churn, Partial Withdrawal, and Complete Withdrawal, by distance of router from destination AS in IP hops (x-axis, 12 to −5), with regions marked Outside Dest. AS and Inside Dest. AS.]

BGP Prefix Withdrawals: SPoF

[Figure: CDF of Complete Withdrawal Duration (x-axis from 1min to 16hr), with curves min and max.]

44% less than 5 minutes, suggestive of router maintenance or router crash

SPoF prefixes mostly single homed

[Figure: for each router hop distance (3 to −3, with 1 = PE and 0 = CE), the fraction of population whose prefix is announced through a single upstream versus through multiple upstreams.]

Especially SPoFs outside destination AS, as expected

Impact on IPv4 prefixes in BGP

[Figure: cumulative fraction of router outages (y-axis, 0.4 to 1) against Withdrawn Peers/Advertising Peers (x-axis, 0 to 1), with curves Before Outage (control) and During Outage.]

We examined IPv4 prefixes for a 5% sample of reboots. 19% of correlated IPv4 prefixes were withdrawn by at least 90% of peers during the router outage window.

Summary

• Step towards root-cause analysis of inter-domain routing outages and events

- Explore applicability of method to measurement of other critical Internet infrastructure: DNS, Web, Email

• In our 2.5 year sample of 59K routers that rebooted

- 4% (2.3K) were SPoF

- SPoF were mostly confined to the edge: 59% customer edge

• We released our code as part of scamper

https://www.caida.org/tools/measurement/scamper/

[Figure: fraction of reboot/prefix pairs (y-axis, 0 to 0.5) showing Churn, Partial Withdrawal, and Complete Withdrawal, by distance of router from destination AS in IP hops (x-axis, 12 to −5).]

Backup Slides

Impact on IPv4 Services

We examined IPv4 prefixes for a 5% sample of reboots where at least 90% of peers withdrew the prefix during the router outage window.

Service         Hosts
Active Hosts    39,107
HTTP (Web)      25,592
HTTPS (Web)     16,321
SSH             11,277
DNS             7,922
SMTP (Email)    7,383
IMAP (Email)    5,127

Source: censys.io, April 2017

Partial Withdrawals

[Figure: CDF of Fraction of Peers Withdrawing Route (x-axis, 0 to 1).]

50% of pairs had 1-2 peers withdraw prefix

10% of pairs had nearly all peers withdraw prefix

Degrees of ASes monitored

[Figure: cumulative fraction of rebooting ASes (y-axis) against AS Degree (x-axis, log scale, 1 to 1000), with curves Single Points of Failure and Monitored Population.]

ASes that were inferred to have a SPoF were disproportionately low-degree ASes

Activity for IPv4 prefixes in BGP

[Figure: cumulative fraction of router outages (y-axis) against Peers Sending Updates/Total Peers (x-axis, 0 to 1), with curves Before Outage and During Outage.]

At least 70% of peers reported BGP activity on IPv4 prefixes for 50% of the inferred router outages

Reboot Window Durations

[Figure: CDF of Reboot Window Duration (x-axis from 1min to 16hr), with curves min and max.]

Half the maximum reboot lengths were less than 30 minutes (~two probing rounds)

Router + BGP outage correlation

Router IP-ID Sequence: 10, 11, 12 [Outage Window] 1, 2, 3

BGP Sequence (W: withdraw, A: announce), four cases relative to the outage window:

• Withdraw-Contained

• Outage-Contained

• Withdraw-Before

• Announce-After

[Figure: timeline placing each case's W and A events relative to the outage window.]

Data processing pipeline

[Figure: pipeline diagram. Components: CAIDA IPv6 Topology, Uptime Prober, Cassandra, RouteViews, Inferred Reboots, BGP Correlation, AS border distance, single points of failure. Data labels: rtr targets, <ip,time,ipid>, <peer,time,prefix>.]

Inferring router position

[Figure (a): interface addresses routed by Y appear in traceroute. Path through routers R1 to R5 across AS X and AS Y with interface addresses x1, x2, x3, y1, y2; hop distances 2, 1, 0, -1, -2.]

[Figure (b): no interface addresses routed by Y appear in traceroute. Path through routers R1 to R5 across AS X and AS Y with interface addresses x1, x2, x3 and two hops marked "?"; hop distances 2, 1, 0.]

Labels: Customer Edge (CE) Router; Provider Edge (PE) Router

Data Collection Summary

                (a)                         (b)                         (c)
Period          18 Jan ’15 to 18 Oct ’16    18 Oct ’16 to 24 Feb ’17    24 Feb ’17 to 30 May ’17
Probing rate    100 pps                     225 pps                     200 pps
Interfaces      83K seen Dec ’14            1.1M seen Jun to Oct ’16    Dynamic, 2.4M in May ’17
Responsive      every round (~15 mins)      every round (~15 mins)      every round (~15 mins)
Unresponsive    12-24 hours                 12-24 hours                 7-14 days

Why IPv6 fragment IDs?

IPv4 ID values are 16 bits with bursty velocity as every packet requires a unique value.

At 100Mbps and 1500 byte packets, Nyquist rate dictates a 4 second probing interval.

[Figure: IPv4 header fields: Ver, HL, DSCP, length, identification, offset, TTL, protocol, checksum, source address, destination address.]

Why IPv6 fragment IDs?

IPv6 ID values are 32 bits with low velocity as systems rarely send fragmented packets.

[Figure: IPv6 header fields (Ver, DSCP, flow id, payload length, protocol, TTL, source address, destination address) followed by fragment header fields (protocol, reserved, offset, identification).]

Soliciting IPv6 Fragment IDs

1. Prober sends echo request, 1300 bytes

2. Router sends echo reply, 1300 bytes

3. Prober sends packet too big, MTU 1280

4. Prober sends echo request, 1300 bytes

5. Router sends echo reply, 1280 bytes, carrying Fragment ID: 12345
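The exchange on this slide can be modelled with a toy router that only starts fragmenting its replies after it has been told the path MTU is smaller than the reply. This is a pure simulation for illustration: it sends no packets, and the class `ToyRouter` and its methods are hypothetical names, not part of scamper. It assumes the router draws fragment IDs from a single incrementing counter.

```python
IPV6_MIN_MTU = 1280


class ToyRouter:
    """Toy model of a router that draws fragment IDs from one counter."""

    def __init__(self, start_id=12345):
        self.next_fragment_id = start_id
        self.path_mtu = {}  # prober address -> MTU learned from Packet Too Big

    def packet_too_big(self, prober, mtu):
        """Record the MTU reported by the prober (never below 1280)."""
        self.path_mtu[prober] = max(mtu, IPV6_MIN_MTU)

    def echo_request(self, prober, size):
        """Return (first_packet_size, fragment_id_or_None) for the reply."""
        mtu = self.path_mtu.get(prober)
        if mtu is None or size <= mtu:
            return size, None          # reply fits, no fragment header
        fragment_id = self.next_fragment_id
        self.next_fragment_id += 1
        return mtu, fragment_id        # simplified: first fragment fills the MTU


router = ToyRouter()
print(router.echo_request("prober", 1300))  # (1300, None)
router.packet_too_big("prober", 1280)
print(router.echo_request("prober", 1300))  # (1280, 12345)
print(router.echo_request("prober", 1300))  # (1280, 12346)
```

Each later probe returns the next counter value, which yields the time series used for reboot inference.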