leveraging sdn layering to systematically troubleshoot networks

18
Leveraging SDN Layering to Systematically Troubleshoot Networks Brandon HellerColin Scott Nick McKeownScott Shenker Andreas Wundsam § Hongyi ZengSam Whitlock Vimalkumar JeyakumarNikhil HandigolJames McCauley Kyriakos Zari s∞ Peyman KazemianHotSDN 2013 Hong Kong Stanford Berkeley ∞USC ICSI SDN Academy §Big Switch Networks

Upload: acton

Post on 25-Feb-2016

74 views

Category:

Documents


1 download

DESCRIPTION

Leveraging SDN Layering to Systematically Troubleshoot Networks. Brandon Heller ★ Colin Scott  Nick McKeown ⌘ Scott Shenker   Andreas Wundsam § Hongyi Zeng ⌘ Sam Whitlock  Vimalkumar Jeyakumar ⌘ Nikhil Handigol ★ James McCauley   Kyriakos Zarifis ∞ Peyman Kazemian ★ HotSDN 2013 - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Leveraging SDN Layering to Systematically Troubleshoot Networks

Leveraging SDN Layering to Systematically Troubleshoot Networks

Brandon Heller★Colin ScottNick McKeown⌘Scott Shenker Andreas Wundsam §Hongyi Zeng⌘Sam WhitlockVimalkumar Jeyakumar⌘Nikhil Handigol★James McCauleyKyriakos Zarifis∞Peyman Kazemian★

HotSDN 2013Hong Kong

⌘StanfordBerkeley

∞USCICSI

★SDN Academy§Big Switch Networks

Page 2: Leveraging SDN Layering to Systematically Troubleshoot Networks

AdminNetwork

skills + tools + knowledge

Protocols

Configuration

Topology

Policy

• connect hosts A + B• quarantine virus-

infected hosts• route guest traffic to

an HTTP proxy• prioritize SSH

+

1: Configure

Ethane, overlays, consistency primitives, network programming languages, …

3: Fix Stuff!

2: Troubleshoot

This Talk

Page 3: Leveraging SDN Layering to Systematically Troubleshoot Networks

AdminNetwork

skills + tools + knowledge

Protocols

Configuration

Topology

Policy

• connect hosts A + B• quarantine virus-

infected hosts• route guest traffic to

an HTTP proxy• prioritize SSH

+

1: Configure

Ethane, overlays, consistency primitives, network programming languages, …

3: Fix Stuff!

2: Troubleshoot

#1 request from network admins:Automatic Troubleshooting

Source: “Automatic Test Packet Generation”, CoNEXT ‘12, Zeng et al.

This Talk

Page 4: Leveraging SDN Layering to Systematically Troubleshoot Networks

How to automate troubleshooting?

NetworkPolicy

• isolate groups A + B• route guest traffic to

an HTTP proxy• block a list of virus-

infected hosts

Challenging in traditional networks.

~?

(2) Check behavior against policy:• confusing: don’t know lowest-level forwarding behavior• distributed: hard to get a meaningful snapshot

Two requirements.(1) Know the intended policy:

• confusing: different config format for each protocol• distributed: configuration spread among all nodes• hard: must understand all protocols & their interactions

difficult to check

impractical to infer

Page 5: Leveraging SDN Layering to Systematically Troubleshoot Networks

Control-Plane Layering in SDN

Firmware FirmwareFirmware

Network Hypervisor

App App App

State Layers

Logical View

Physical View

Device State

Hardware

Policy

Code Layers

Network OS

HW HW

HW

Page 6: Leveraging SDN Layering to Systematically Troubleshoot Networks

Firmware FirmwareFirmware

HW HW

HW

Systematically Troubleshooting an SDN

Network OS

Network Hypervisor

App App App

State Layers

Logical View

Physical View

Device State

Hardware

Policy

Code Layers Observation: Each state layer fully specifies network behavior.

Insight:Bugs manifest as mistranslations between layers.

Systematic Approach:(1) Binary search to isolate

to a code layer.(2) Leverage state to isolate

within the code layer.

Page 7: Leveraging SDN Layering to Systematically Troubleshoot Networks

Phase 1: Localizing to a code layer[Operator Intent]

Logical View

Physical View

Device State

Hardware

Policy

?~

Apps

NetHyperV

NetOS

Firmware

[Actual Behavior]

Cause: Firmware Bug

Yes

No

?~YesNo

?~ YesNo

SOFT[CONEXT ‘12]

Anteater[SIGCOMM ‘11]

Symptom: Hosts unable to communicate

Page 8: Leveraging SDN Layering to Systematically Troubleshoot Networks

Phase 1: Localizing to a code layer[Operator Intent]

Logical View

Physical View

Device State

Hardware

Policy

?~

Apps

NetHyperV

NetOS

Firmware

[Actual Behavior]

Yes

No ?~Yes

No

Symptom: Tenant Isolation Breach

HSA[NSDI ’12]

OFRewind[ATC ‘11]

YesNo?~ ?~

Yes

No

Correspondence Checking

Cause: NetHypervisor Bug

Page 9: Leveraging SDN Layering to Systematically Troubleshoot Networks

How to automate troubleshooting?

NetworkPolicy

• isolate groups A + B• route guest traffic to

an HTTP proxy• block a list of virus-

infected hosts

Possible in Software-Defined Networks

~?

(2) Check behavior against policy:• confusing: don’t know lowest-level forwarding behavior• distributed: hard to get a meaningful snapshot

Two requirements.(1) Know the intended policy:

• confusing: different config format for each protocol• distributed: configuration spread among all nodes• hard: must understand all protocols & their interactions

directly accessible

directly providedapp

fewer nodes

Page 10: Leveraging SDN Layering to Systematically Troubleshoot Networks

Takeways

• Control plane layering enables systematic troubleshooting

• Thinking about troubleshooting in terms of layers shows us where tools fit in– Reveals missing tools– Highlights choices between tools, with tradeoffs

• Plenty of opportunities left.

Operationalize!

Page 11: Leveraging SDN Layering to Systematically Troubleshoot Networks

Leverage the layers in SDN.

Brandon Heller★Colin ScottNick McKeown⌘Scott Shenker Andreas Wundsam §Hongyi Zeng⌘Sam WhitlockVimalkumar Jeyakumar⌘Nikhil Handigol★James McCauleyKyriakos Zarifis∞Peyman Kazemian★

HotSDN 2013Hong Kong

⌘StanfordBerkeley

∞USCICSI

★SDN Academy§Big Switch Networks

Questions?

Page 12: Leveraging SDN Layering to Systematically Troubleshoot Networks

How is this different than general distributed systems debugging?

• Simple answer: it’s not! SDN is an excellent opportunity to draw upon ideas from other distributed systems

• Subtlety: networks are solving a much more constrained problem than general distributed systems

Page 13: Leveraging SDN Layering to Systematically Troubleshoot Networks

Limitations

• Correctness only, not performance• Side effects not reflected in state• No guarantee of finding single code layer• No guarantee of individual layer correctness• No guarantee of future correctness• Layer visibility may be imperfect

Page 14: Leveraging SDN Layering to Systematically Troubleshoot Networks

Plenty of Opportunities Remain• Automatic Troubleshooting

Actionable Bug Reports– Filtering the signal from the noise– Creating consistent views of state

• Improving Invariant Checkers– Scale– Flexible Policy Input

• Hybrid Traditional + SDN Debugging

Page 15: Leveraging SDN Layering to Systematically Troubleshoot Networks

Plenty of Opportunities Remain• Automatic Troubleshooting

Actionable Bug Reports– Filtering the signal from the noise– Creating consistent views of state

Packet History:Path + Headers

+ Forwarding State

Forwarding State Forwarding

State

Forwarding State

Forwarding State

[HotSDN 2012:Where is the Debugger for My Software-Defined Network?]

Page 16: Leveraging SDN Layering to Systematically Troubleshoot Networks

Plenty of Opportunities Remain• Automatic Troubleshooting

Actionable Bug Reports– Filtering the signal from the noise

Controller A

Controller B

Controller C

Switch 1

Switch 2

Switch3

Switch 4

Switch 5

Switch 6

Switch 7

Switch 8

Switch 9

[Berkeley Tech Report:How Did We Get Into This Mess? Isolating Fault-Inducing Inputs to SDN Control Software]

Minimal Causal

Sequence

Page 17: Leveraging SDN Layering to Systematically Troubleshoot Networks

Isn’t this unnecessary with consistency primitives/languages/etc?

• No• Catch/rule out bugs outside the framework• Catch instances where the framework pushes

config that breaks the policy

Page 18: Leveraging SDN Layering to Systematically Troubleshoot Networks

What’s novel about this work?

• Simple answer: nothing!