Network Verification: From Algorithms to Deployment. Brighten Godfrey, Associate Professor, UIUC; Co-founder and CTO, Veriflow. 2nd Hebrew University Networking Summer, June 21, 2017.

TRANSCRIPT

Page 1: Network Verification: From Algorithms to Deployment (pbg.cs.illinois.edu/outreach/2017.06.20-NetworkVerificationTutorial.p…)

Network Verification: From Algorithms to Deployment

Brighten Godfrey, Associate Professor, UIUC; Co-founder and CTO, Veriflow

2nd Hebrew University Networking Summer, June 21, 2017

Page 2:

Networks are so complex it’s hard to know they’re doing the right thing.

Let’s automate.

Page 3:

Page 4:

Outline for Today

Networking background

Data-plane verification
• One-shot
• Real-time incremental

Configuration verification

Research landscape & directions

Network verification in the real world

Page 5:

NETWORKING BACKGROUND

Page 6:

Inside a typical enterprise network

Source: http://www.cisco.com/c/en/us/td/docs/solutions/Enterprise/Medium_Enterprise_Design_Profile/MEDP/chap5.html

Page 7:

Inside a typical enterprise data center

Source: http://www.cisco.com/c/en/us/td/docs/solutions/Enterprise/Security/SAFE_RG/SAFE_rg/chap4.html

Page 8:

Configs use many protocols & features

List of protocols commonly encountered by CCNAs https://learningnetwork.cisco.com/docs/DOC-25649

Layer 1 protocols (physical layer): USB physical layer; Ethernet physical layer, including 10BASE-T, 100BASE-T, 100BASE-TX, 100BASE-FX, 1000BASE-T and other variants; varieties of 802.11 Wi-Fi physical layers; DSL; ISDN; T1 and other T-carrier links; E1 and other E-carrier links; Bluetooth physical layer. Layer 2 protocols (Data Link Layer): …

Page 9:

Configs use many protocols & features

version 12.4
service timestamps debug datetime msec
service timestamps log datetime msec
no service password-encryption
!
hostname PrimaryR1
!
boot-start-marker
boot-end-marker
!
no aaa new-model
!
ip cef
!
interface Loopback100
 no ip address
!
interface GigabitEthernet0/1
 description LAN port
 ip address 64.X.X.1 255.255.255.224
 ip nat inside
 ip virtual-reassembly
 duplex auto
 speed auto
 media-type rj45
 no negotiation auto
 standby 1 ip 64.X.X.5
 standby 1 priority 105
 standby 1 preempt delay minimum 60
 standby 1 track Serial3/0
!
interface GigabitEthernet0/2
 description conn to Backup Lightpath
 ip address 65.X.X.66 255.255.255.240
 ip nat outside
 ip virtual-reassembly
 duplex full
 speed 100
 media-type rj45
 no negotiation auto
!
interface GigabitEthernet0/3
 description LAN handoff from P2P to Denver
 ip address 10.30.0.1 255.254.0.0
 duplex auto
 speed auto
 media-type rj45
 no negotiation auto
!
interface Serial1/0
 description p-2-p to Denver DC
 ip address 10.10.10.1 255.255.255.252
 dsu bandwidth 44210
 framing c-bit
 cablelength 10
 clock source internal
 serial restart-delay 0
!
interface Serial3/0
 description DS3 XO WAN interface
 ip address 65.X.X.254 255.255.255.252
 ip access-group 150 in
 encapsulation ppp
 dsu bandwidth 44210
 framing c-bit
 cablelength 10
 serial restart-delay 0
!
router bgp 16XX
 no synchronization
 bgp log-neighbor-changes
 network 64.X.X.0 mask 255.255.255.224
 network 64.X.X.2
 aggregate-address 64.X.X.0 255.255.255.0 summary-only
 neighbor 64.X.X.2 remote-as 16XX
 neighbor 64.X.X.2 next-hop-self
 neighbor 65.X.1X.253 remote-as 2828
 neighbor 65.X.X.253 route-map setLocalpref in
 neighbor 65.X.X.253 route-map localonly out
 no auto-summary
!
no ip http server
!
ip as-path access-list 10 permit ^$
ip nat inside source list 101 interface GigabitEthernet0/2 overload
!
access-list 101 permit ip any any
access-list 150 permit ip any any
!
route-map setLocalpref permit 10
 set local-preference 200
!
route-map localonly permit 10
 match as-path 10
!
control-plane
!
gatekeeper
 shutdown
!
end

Example basic BGP+HSRP config from https://www.myriadsupply.com/blog/?p=259

Page 10:

Distributed route computation

[Diagram: many devices, each running its own device software and distributed routing protocols, jointly computing routes.]

Page 11:

Result: data plane state

[Diagram: the same devices; the protocols' output is the data plane state installed at each device.]

handle(packet p):
    if p.port != 80: drop
    if p.ipAddr is in 128.0.0.0/8 then forward out port 8
    else if p.ipAddr is 10.5.45.43 then
        prepend MPLS header with label 52
        forward out port 42
    …
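The forwarding behavior sketched above can be made runnable. This sketch assumes a packet is a dict with `port` and `ipAddr` fields; the trailing default drop is an assumption, since the slide's rule list is elided.

```python
import ipaddress

# Runnable sketch of the slide's forwarding pseudocode. Ports, prefix,
# and MPLS label are the slide's illustrative values; the final default
# drop is an assumption (the slide's rule list is elided).
def handle(packet):
    if packet["port"] != 80:
        return ("drop",)
    dst = ipaddress.ip_address(packet["ipAddr"])
    if dst in ipaddress.ip_network("128.0.0.0/8"):
        return ("forward", 8)
    if dst == ipaddress.ip_address("10.5.45.43"):
        return ("push_mpls", 52, "forward", 42)
    return ("drop",)

print(handle({"port": 80, "ipAddr": "128.1.2.3"}))   # ('forward', 8)
```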

Page 12:

Ensuring correct operations today

Manual spot-checking (pings, traceroutes)

Monitoring of events & flows

Screenshot from Scrutinizer NetFlow & sFlow analyzer, snmp.co.uk/scrutinizer/

Page 13:

Networks are complex

89% of operators are never sure that config changes are bug-free.

82% are concerned that changes would cause problems with existing functionality.

– Survey of network operators [Kim, Reich, Gupta, Shahbaz, Feamster, Clark, USENIX NSDI 2015]

Page 14:

Software-Defined Networks


[Diagram: apps running on a logically centralized SDN controller, with a thin, standard interface to the data plane (e.g. OpenFlow).]

Page 15:

Network Verification

The process of proving whether an abstraction of the network satisfies intended network-wide properties.

Page 16:

Network Verification

The process of proving whether an abstraction of the network satisfies intended network-wide properties.

“Host A should be connected to host B.”

“Host A should not be able to reach service B on any server.”

“No packet should fall into a loop.”

“All packets should follow shortest paths.”

Page 17:

Network Verification

The process of proving whether an abstraction of the network satisfies intended network-wide properties.

Each layer of the network has a corresponding verification area:

Configuration → Configuration verification
Control software → Controller verification & verifiable control languages
Data plane state → Data plane verification
Packet processing → Software switch verification

Page 18:

DATA PLANE VERIFICATION

Page 19:

Configuration verification

[Diagram: the configuration is the input; device software and protocols must be modeled to predict the resulting behavior.]

Page 20:

Data plane verification

[Diagram: the data plane state itself is the input from which behavior is predicted.]

Verify the network as close as possible to its actual behavior

Page 21:

Data plane verification

[Diagram: as before, the data plane state is the input.]

Verify the network as close as possible to its actual behavior:

• Insensitive to control protocols

• Accurate model

• Checks current snapshot

Page 22:

Need for accuracy

78 bugs sampled randomly from the Bugzilla repository of Quagga (open-source software router)

67 could cause a data plane effect
• Under heavy load, Quagga 0.96.5 fails to update the Linux kernel’s routing tables
• In Quagga 0.99.5, a BGP session could remain active after it has been shut down

11 would not affect the data plane
• Mgmt. terminal hangs in Quagga 0.96.4 on “show ip bgp”

Page 23:

Architecture

“Can any packet starting at A reach B?”

Verifier

Diagnosis

Page 24:

A little calculation…

# theoretical packets = 2^(# header bits) × # injection points
= 2^(8 × (18-byte Ethernet + 20-byte IPv4 headers)) × 10,000 ports
= 3.25 × 10^95 possible packets

For scale: the estimated number of atoms in the observable universe is ~10^80, and the Dead Sea holds ~3.7 × 10^15 grams of salt.
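The arithmetic on this slide is easy to reproduce:

```python
# Reproducing the slide's count of theoretically possible packets:
# every distinct header bit pattern at every injection point.
header_bits = 8 * (18 + 20)      # 18-byte Ethernet + 20-byte IPv4 headers
injection_points = 10_000        # ports
total = 2 ** header_bits * injection_points
print(f"{total:.2e}")            # ~3.26e+95, the slide's 3.25 x 10^95
```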

Page 25:

Digression into complexity theory

Given only IP longest-prefix-match forwarding rules, how hard is it to compute whether A can reach B?
• (a) Polynomial time
• (b) NP-complete
• (c) Undecidable

…if we also allow arbitrary bitmasks (“drop if bit 7 = 0”)?
• NP-complete

…if we also allow stateful devices (e.g. a firewall remembering connection establishment)?
• Undecidable in general
• EXPSPACE-complete with reasonable assumptions
• Easier with additional assumptions

Some Complexity Results for Stateful Network Verification. Velner, Alpernas, Panda, Rabinovich, Sagiv, Shenker, Shoham. TACAS 2016.

Page 26:

A-to-B query with bitmask

(x4 ∨ x7 ∨ x̄1) ∧ (…) ∧ (…) ∧ (…)

A → [x[4] = 1] → [x[7] = 1] → [x[1] = 0] → B   (bitmask filters along the path)

Packet: x[0] x[1] x[2] … x[n]

NP-complete!

Page 27:

Anteater’s solution

Express data plane and invariants as SAT
• …up to some max # of hops
• Dynamic programming to deal with the exponential number of paths
• Model packet transformations with a vector of packet “versions” & constraints across versions

Check with an off-the-shelf SAT solver (Boolector)

Debugging the Data Plane with Anteater. Mai, Khurshid, Agarwal, Caesar, Godfrey, King. SIGCOMM 2011.
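A toy version of the encoding idea (Anteater itself emits SAT formulas for an off-the-shelf solver such as Boolector; the bit names here come from the reduction slide above): A reaches B iff some assignment of header bits satisfies every filter along the path.

```python
from itertools import product

# Toy SAT-style check: each hop filters on one header bit; A reaches B
# iff some bit assignment satisfies all filters on the path. This
# brute-forces the assignments instead of calling a real solver.
path_filters = [("x4", 1), ("x7", 1), ("x1", 0)]
bits = sorted({name for name, _ in path_filters})

def satisfiable(filters):
    for assignment in product([0, 1], repeat=len(bits)):
        env = dict(zip(bits, assignment))
        if all(env[name] == val for name, val in filters):
            return True
    return False

print(satisfiable(path_filters))                 # True: e.g. x1=0, x4=1, x7=1
print(satisfiable(path_filters + [("x4", 0)]))   # False: contradicts x4 = 1
```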

Page 28:

Data plane as boolean functions

Define P(u, v) as the expression for packets traveling from u to v
• A packet can flow over (u, v) if and only if it satisfies P(u, v)

u → v, with FIB entry: Destination 10.1.1.0/24, Action: Fwd to v

P(u, v) = dst_ip ∈ 10.1.1.0/24
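As a concrete illustration of P(u, v) for the one-rule FIB above (the function name and signature are for illustration only):

```python
import ipaddress

# P(u, v) for the slide's one-rule FIB: a packet can flow over (u, v)
# iff its destination address falls in the forwarded prefix.
def P_uv(dst_ip, prefix="10.1.1.0/24"):
    return ipaddress.ip_address(dst_ip) in ipaddress.ip_network(prefix)

print(P_uv("10.1.1.7"))   # True
print(P_uv("10.2.0.1"))   # False
```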

Page 29:

Reachability as SAT solving

Goal: reachability from u to w
== C = (P(u, v) ∧ P(v, w)) is satisfiable

u → v → w

• SAT solver determines the satisfiability of C
• Problem: exponentially many paths
  - Solution: dynamic programming (a.k.a. loop unrolling)
  - Intermediate variables: “Can reach x in k hops?”
  - Similar to [Xie, Zhan, Maltz, Zhang, Greenberg, Hjalmtysson, Rexford, INFOCOM ’05]
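The “can reach x in k hops” dynamic program can be sketched as a plain fixed-point computation over a forwarding graph (the edge list and node names are illustrative):

```python
# Sketch of the "reach x in <= k hops" dynamic program: instead of
# enumerating exponentially many paths, grow the reachable set one hop
# at a time. Edges are assumed pre-filtered to one class of packets.
def reachable(edges, src, dst, max_hops):
    reach = {src}                                  # reachable in <= 0 hops
    for _ in range(max_hops):
        reach |= {v for (u, v) in edges if u in reach}
    return dst in reach

edges = [("u", "v"), ("v", "w")]
print(reachable(edges, "u", "w", 2))   # True
print(reachable(edges, "u", "w", 1))   # False: w is two hops away
```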

Page 30:

Packet transformation

Essential to model MPLS, QoS, NAT, etc.

• Model the history of packets: vector over time
• Packet transformation ⇒ boolean constraints over adjacent packet versions

u → v → w, where a hop matches dst_ip ∈ 0.1.1.0/24 and sets label = 5:

(p_i.dst_ip ∈ 0.1.1.0/24) ∧ (p_{i+1}.label = 5)

More generally: p_{i+1} = f(p_i)
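A minimal sketch of the versioning idea, treating a hop's transformation as a function f from packet version i to version i+1 (the function and field names are illustrative; the prefix and label value are the slide's):

```python
import ipaddress

# One hop from the slide's example: it carries packets with
# dst_ip in 0.1.1.0/24 and sets label = 5, i.e. the constraint
# (p_i.dst_ip in 0.1.1.0/24) and (p_{i+1}.label = 5).
def hop(p_i):
    """Return p_{i+1} = f(p_i), or None if the hop drops the packet."""
    if ipaddress.ip_address(p_i["dst_ip"]) not in ipaddress.ip_network("0.1.1.0/24"):
        return None
    p_next = dict(p_i)                # new packet "version"
    p_next["label"] = 5
    return p_next

print(hop({"dst_ip": "0.1.1.9"}))    # {'dst_ip': '0.1.1.9', 'label': 5}
print(hop({"dst_ip": "9.9.9.9"}))    # None
```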

Page 31:

Experiences with real network

Evaluated Anteater with an operational network
• ~178 routers supporting >70,000 machines
• Predominantly OSPF; also uses BGP and static routing
• 1,627 FIB entries per router (mean)
• State collected using operator’s SNMP scripts

Revealed 23 violations of 3 invariants in 2 hours:

                Loop   Packet loss   Consistency
Being fixed        9             0             0
Stale config.      0            13             1
Total alerts       9            17             2

Page 32:

Forwarding loops

• IDP was overloaded; the operator introduced a bypass
• The bypass routed campus traffic to the IDP through static routes
• Introduced 9 loops

[Diagram: backbone, building, IDP, bypass]

Page 33:

Multiple policy violations found

[Diagram: routers u and u′; admin interface at 12.34.56.0/24.]

Packet loss
• Blocking compromised machines at IP level
• Stale configuration from Sep 2008

Consistency
• One router exposed the web admin interface in its FIB
• Different policy on private IP address range

Page 34:

REAL-TIME DATA PLANE VERIFICATION

Page 35:

Not so simple

Challenge #1: Obtaining real time view

Challenge #2: Verify quickly

Page 36:

Architecture

“Service S reachable only through firewall?”

Verifier

Diagnosis

Page 37:

VeriFlow architecture


[Diagram: apps on a logically centralized SDN controller; thin, standard interface to the data plane (e.g. OpenFlow).]

VeriFlow: Verifying Network-Wide Invariants in Real Time. Khurshid, Zou, Zhou, Caesar, Godfrey. HotSDN ’12 best paper; NSDI ’13.

Page 38:

VeriFlow architecture


[Diagram: as before, but with VeriFlow interposed between the SDN controller and the data plane.]

VeriFlow: Verifying Network-Wide Invariants in Real Time. Khurshid, Zou, Zhou, Caesar, Godfrey. HotSDN ’12 best paper; NSDI ’13.

Page 39:

Verifying invariants quickly

Veriflow: Updates → Generate Equivalence Classes

[Diagram: forwarding rules (e.g. 0.0.0.0/1, 64.0.0.0/3) carve packet-header space into equivalence classes.]

Find only the equivalence classes affected by the update via a multidimensional trie data structure
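VeriFlow's actual structure is a multidimensional trie over all header fields; a one-dimensional sketch over destination prefixes shows the core idea: cut the header space at every rule boundary, so all packets in one slice match exactly the same set of rules.

```python
import ipaddress

# One-dimensional sketch of equivalence classes (VeriFlow itself uses a
# multidimensional trie over all header fields): split the IPv4
# destination space at every rule boundary.
def equivalence_classes(prefixes):
    cuts = {0, 2**32}
    for p in prefixes:
        net = ipaddress.ip_network(p)
        cuts.add(int(net.network_address))
        cuts.add(int(net.broadcast_address) + 1)
    pts = sorted(cuts)
    return list(zip(pts, pts[1:]))     # disjoint [lo, hi) slices

ecs = equivalence_classes(["0.0.0.0/1", "64.0.0.0/3"])  # the slide's rules
print(len(ecs))   # 4 slices: below 64.0.0.0, 64-95.x, 96-127.x, 128.0.0.0 and up
```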

Page 40:

Verifying invariants quickly

Veriflow: Updates → Generate Equivalence Classes → Generate Forwarding Graphs

All the info to answer queries!

Page 41:

Verifying invariants quickly

Veriflow: Updates → Generate Equivalence Classes → Generate Forwarding Graphs → Run Queries → Good rules / Bad rules

Diagnosis report:
• Type of invariant violation
• Affected set of packets

Page 42:

Invariant API

VeriFlow's API enables custom query algorithms • Gives access to the "diff": equivalence classes and their forwarding graphs • Verification becomes a standard graph traversal algorithm

What invariants can you check? • Anything within data plane state (forwarding rules)... • ...that can be verified incrementally
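As a sketch of how such a query plugs in, a loop / black-hole check over one equivalence class's forwarding graph reduces to a short traversal (hypothetical graph encoding, not VeriFlow's actual API):

```python
# Illustrative sketch: once an equivalence class's forwarding graph is built
# (node -> next hop, with None or a missing entry meaning "drop"), invariant
# checks are plain graph traversals.

def check_path(fwd, src, dst):
    """Walk the forwarding graph from src and classify the outcome."""
    seen = set()
    node = src
    while node is not None:
        if node == dst:
            return "reachable"
        if node in seen:
            return "loop"
        seen.add(node)
        node = fwd.get(node)   # missing entry or None => packet dropped
    return "black hole"

fwd = {"A": "B", "B": "C", "C": None}
result = check_path(fwd, "A", "C")   # "reachable"
```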


Evaluation

Simulated network • Real-world BGP routing tables (RIBs) from RouteViews totaling 5 million RIB entries • Injected into 172-router network (AS 1755 topology)

Measure time to process each forwarding change • 90,000 updates from Route Views • Check for loops and black holes


Microbenchmark latency

97.8% of updates verified within 1 ms


MODELING DYNAMIC NETWORKS


Timing uncertainty

Controller

Remove rule 1 (delayed)

Install rule 2

Rule 1

Rule 2

Switch A Switch B

Possible network states:

One solution: "consistent updates" [Reitblatt, Foster, Rexford, Schlesinger, Walker, "Abstractions for Network Update", SIGCOMM 2012]

Enforcing Customizable Consistency Properties in Software-Defined Networks
Zhou, Jin, Croft, Caesar, Godfrey
NSDI 2015


Uncertainty-aware verification


Update synthesis via verification

Controller → stream of updates → CCG (update queue + verifier: network model, verification engine) → network, with confirmations flowing back

Invariant: A should reach B. Safe? Yes: apply. No: hold in the update queue.

Example update sequence on a topology with nodes A–H:
1 mod A->C to A->F
2 add F->G
3 add G->H
4 add H->B

Enforcing dynamic correctness with heuristically maximized parallelism
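The scheduling loop sketched above can be approximated as follows (the `is_safe` predicate stands in for CCG's uncertainty-aware verification engine; names and state encoding are illustrative):

```python
# Sketch of CCG-style scheduling: apply an update immediately if the verifier
# says the network stays safe with it applied, otherwise hold it until an
# earlier update unblocks it.

def schedule(updates, state, is_safe):
    """Greedily apply safe updates; retry held ones after each success."""
    pending = list(updates)
    applied = []
    progress = True
    while pending and progress:
        progress = False
        for upd in list(pending):
            if is_safe(state, upd):
                state = state | {upd}   # model applying the update
                applied.append(upd)
                pending.remove(upd)
                progress = True
    return applied, pending             # leftovers need a fallback (e.g. CU)

# Toy dependency: 'add F->G' must precede 'mod A->F' to keep A->B reachable.
deps = {"mod A->F": {"add F->G"}}
is_safe = lambda state, upd: deps.get(upd, set()) <= state
applied, held = schedule(["mod A->F", "add F->G"], set(), is_safe)
```

When no pending update is safe (a circular dependency), the loop stalls with a non-empty `pending` list, which is exactly the case where CCG falls back to a heavyweight mechanism such as consistent updates.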


OK, but…

Can the system "deadlock"? • Proved classes of networks that never deadlock • Experimentally rare in practice! • Last resort: heavyweight "fallback" like consistent updates [Reitblatt et al, SIGCOMM 2012]

Is it fast?

[Figure 11: Network-trace-driven emulations: (1) immediate application of updates; (2) CCG (with CU as fallback); and (3) CU.]

[Figure 13: Physical testbed results: comparison of throughput changes during network transitions for CCG and CU. (a) An eight-switch topology. (b) A 78-switch network.]

of all updates in the first phase before proceeding to the second. In contrast, CCG's algorithm significantly shortened the delay, especially for networks experiencing a large number of state changes. In CCG, the throughput never dropped below 0.9 Gb/s, while CU experienced temporary yet significant drops during the transition, primarily due to the switches' lack of support for simultaneous application of updates and processing of packets.

8 Discussion

Limitations: CCG synthesizes network updates with only heuristically maximized parallelism, and in the cases where required properties are not segment independent, relies on heavier-weight fallback mechanisms to guarantee consistency. When two or more updates have circular dependencies with respect to the consistency properties, fallback will be triggered. One safe way of using CCG is to provide it with a strong fallback plug-in, e.g., CU [25]. Any weaker properties will be automatically ensured by CCG, with fallback triggered (rare in practice) only for a subset of updates and when necessary. In fact, one can use CCG even when fallback is always on. In this case, CCG will be faster most of the time, as discussed in §5.3.

Related work: Among the related approaches, four warrant further discussion. Most closely related to our work is Dionysus [15], a dependency-graph based approach that achieves a goal similar to ours. As discussed in §2, our approach has the ability to support 1) flexible properties with high efficiency without the need to implement new algorithms, and 2) applications with wildcarded rules. [22] also plans updates in advance, but using model checking. It, however, does not account for the unpredictable time switches take to perform updates. In our implementation, CU [25] and VeriFlow [18] are chosen as the fallback mechanism and verification engine. Nevertheless, they are replaceable components of the design. For instance, when congestion freedom is the property of interest, we can replace CU with SWAN [13].

Future work: We plan to study the generality of segment independent properties both theoretically and practically, test CCG with more data traces, and extend its model to handle changes initiated from the network. As comparison, we will test CCG against the original implementation of Dionysus with dependency graphs customized to properties of interest. We will also investigate utilizing possible primitives in network hardware to facilitate consistent updates.

9 Conclusion

We present CCG, a system that enforces customizable network consistency properties with high efficiency. We highlight the network uncertainty problem and its ramifications, and propose a network modeling technique that correctly derives consistent outputs even in the presence of uncertainty. The core algorithm of CCG leverages the uncertainty-aware network model, and synthesizes a feasible network update plan (ordering and timing of control messages). In addition to ensuring that there are no violations of consistency requirements, CCG also tries to maximize update parallelism, subject to the constraints imposed by the requirements. Through emulations and experiments on an SDN testbed, we show that CCG is capable of achieving a better consistency vs. efficiency trade-off than existing mechanisms.

We thank our shepherd, Katerina Argyraki, for helpful comments, and the support of the Maryland Procurement Office under Contract No. H98230-14-C-0141.



CONFIGURATION VERIFICATION


Challenges and Approach

Challenges in faithfully deriving the data plane • Accurately model low-level configuration directives • Provide high-level understanding of errors to operators

Approach: High-fidelity declarative model of control plane • Set of relations that expresses the network's control plane computation • Provides queryability and provenance for free

A general approach to network configuration analysis
Fogel, Fung, Pedrosa, Walraed-Sullivan, Govindan, Mahajan, Millstein
NSDI 2015

Slides in this section thanks to the Batfish team


Batfish

Available at http://www.batfish.org

Has found real bugs in real networks

4 stages: • Control plane generator • Data plane generator • Safety analyzer • Provenance tracker


(Topology diagram: nodes n1, n2, n3, n4 and network N)

Fact about topology
LanNeighbors(node1:n3, interface1:int3_1, node2:n1, interface2:int1_3).

Stage 1: Extract control plane model

Fact about OSPF interface costs
OspfCost(node:n3, interface:int3_1, cost:1).

//---------- Configuration of n3 ----------
1 ospf interface int3_1 metric 1
2 ospf interface int3_2 metric 1
3 ospf interface int3_4 metric 1
4 static route 10.0.0.0/24 drop
5 ospf redistribute static metric 10
6 bgp neighbor p1 AS P Accept ALL


OspfExport( node=n2, network=10.0.0.0/24, cost=10, type=ospfE2).

Fib( node=n1, network=10.0.0.0/24, egressInterface=int1_2).

Stage 2: Compute data plane

InstalledRoute(route={ node=n1, network=10.0.0.0/24, nextHop=n2, administrativeCost=110, protocolCost=10, protocol=ospfE2}).
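The route selection implied by the InstalledRoute fact can be sketched as follows (a simplified lowest-administrative-cost-first illustration, not Batfish's actual selection rules; field names mirror the slide):

```python
# Sketch of stage-2 route installation: among candidate routes for a prefix,
# prefer the lowest administrative cost, breaking ties by protocol cost.

def install_routes(candidates):
    """Pick one best route per (node, network) pair."""
    best = {}
    for r in candidates:
        key = (r["node"], r["network"])
        cur = best.get(key)
        if cur is None or (r["administrativeCost"], r["protocolCost"]) < \
                          (cur["administrativeCost"], cur["protocolCost"]):
            best[key] = r
    return best

candidates = [
    {"node": "n1", "network": "10.0.0.0/24", "nextHop": "n2",
     "administrativeCost": 110, "protocolCost": 10, "protocol": "ospfE2"},
    {"node": "n1", "network": "10.0.0.0/24", "nextHop": "n3",
     "administrativeCost": 200, "protocolCost": 1, "protocol": "bgp"},
]
installed = install_routes(candidates)   # the OSPF E2 route wins on admin cost
```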


Stage 3: Data plane analysis

Counterexample of multipath consistency { IngressNode=n1, SrcIp=0.0.0.0, DstIp=10.0.0.2, IpProtocol=0 }
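A multipath-consistency check of this kind can be sketched as a traversal that collects every fate a packet can meet (hypothetical graph encoding, mirroring the slide's counterexample; not Batfish's actual analysis):

```python
# Sketch of the stage-3 multipath-consistency check: enumerate every
# forwarding path a packet can take (ECMP gives several next hops) and
# flag the packet if the paths disagree on its disposition.
# Encoding: node -> list of next hops; "accept"/"drop" are terminals.

def dispositions(fwd, node, seen=()):
    """Collect the set of fates reachable from `node` over all paths."""
    if node in ("accept", "drop"):
        return {node}
    if node in seen:
        return {"loop"}
    fates = set()
    for nxt in fwd.get(node, ["drop"]):   # no rule => null-routed
        fates |= dispositions(fwd, nxt, seen + (node,))
    return fates

# n1 load-balances to n2 (which delivers via n10) and n3 (which null-routes),
# so the two paths disagree: a multipath-consistency violation.
fwd = {"n1": ["n2", "n3"], "n2": ["n10"], "n10": ["accept"], "n3": ["drop"]}
fates = dispositions(fwd, "n1")
consistent = len(fates) == 1
```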


Stage 4: Report Provenance

Counterexample packet traces
FlowPathHistory(
  flow={ node=n1, … , dstIp=10.0.0.2 },
  1st hop: [ n1:int1_2 → n2:int2_1 ]
  2nd hop: [ n2:int2_10 → n10:int10_2 ]
  fate=accepted).
--------------------------------------------------------
FlowPathHistory(
  flow={ node=n1, … , dstIp=10.0.0.2 },
  1st hop: [ n1:int1_3 → n3:int3_1 ]
  fate=nullRouted by n3).


New Consistency Properties

Multipath – disposition consistent on all paths

(Diagram: n1 reaches destination 10.0.0.0/24 over paths through n2, n3, and n10)


New Consistency Properties

Multipath – disposition consistent on all paths

Failure – reachability unaffected by failure

(Diagram: n1, n2, n3 and customer routers c1, c2; prefixes 2.2.2.0/24 and 3.3.3.0/24; customer network C and network N)


New Consistency Properties

Multipath – disposition consistent on all paths

Failure – reachability unaffected by failure

Destination – at most one customer per delegated address

(Diagram: customers CA (ca1), CB (cb1), B (b1), and n1 in network N; 10.0.1.0/24 announced with AS length 1 and with AS length 2)


Implementation

Support multiple configuration languages • IOS, NX-OS, Juniper, Arista, …

Broad feature support • Route redistribution, OSPF internal/external, BGP communities…

Unified, vendor-neutral intermediate representation


Evaluation

Two large university networks

Net1 – 21 core routers • Federated network • Each department is own AS • Heavy use of BGP

Net2 – 17 core routers • Centrally controlled • Heavy use of VLANs • Single AS • BGP communication only with ISPs

(Diagram: Net1 core peers via BGP with ISP1 … ISPm and with Dept1 … Deptn)


Two large university networks

Net1 – 21 core routers • Federated network • Each department is own AS • Heavy use of BGP

Net2 – 17 core routers • Centrally controlled • Heavy use of VLANs • Single AS • BGP communication only with ISPs

Evaluation

(Diagram: Net2 core connects via IGP/VLAN to ISP1 … ISPm and to Dept1 … Deptn)


Results

       Invariant     Total        Violations     Violations
                     Violations   Confirmed by   Fixed by
                                  Operators      Operators
Net1   Multipath     32(4)        32(4)          21(3)
       Failure       16(7)        3(2)           0(0)
       Destination   55(6)        55(6)          1(1)
Net2   Multipath     11(3)        11(3)          11(3)
       Failure       77(26)       18(7)          0(0)


Performance

       Data plane    Multipath          Failure
       generation    consistency        consistency
Net1   238 min       75 x (< 1.5 min)   199 x (~238 min)
Net2   37 min        17 x (< 1.5 min)   279 x (~37 min)


Comparing approaches

Pipeline: Configuration → Control software → Data plane state → Packet processing

• Enables "what-if" analysis • Trace root cause (in Batfish via LogiQL / LogicBlox)

• Provides basis for higher-level analysis • Accuracy based on actual observed data plane


THE RESEARCH LANDSCAPE


Data plane verification

Static • On static reachability in IP networks [Xie, Zhan, Maltz, Zhang, Greenberg, Hjalmtysson, Rexford, INFOCOM '05] - Essentially early form of data plane verification - Computed reachable sets with IP forwarding rules

• FlowChecker [Al-Shaer, Al-Haj, SafeConfig ’10]

• Anteater [Mai, Khurshid, Agarwal, Caesar, G., King, SIGCOMM’11] • Header Space Analysis [Kazemian, Varghese, and McKeown, NSDI ’12]

• Network-Optimized Datalog (NoD) [Lopes, Bjørner, Godefroid, Jayaraman, Varghese, NSDI 2015]

Real time (incremental) • VeriFlow [Khurshid, Zou, Zhou, Caesar, G., HotSDN’12, NSDI’13] • NetPlumber [Kazemian, Chang, Zeng, Varghese, McKeown, Whyte, NSDI ’13]

• CCG [Zhou, Jin, Croft, Caesar, G., NSDI’15]


Data plane verification (cont’d)

Optimizations • Libra: Divide and Conquer to Verify Forwarding Tables in Huge Networks [Zeng, Zhang, Ye (Google), Jeyakumar, Ju, Liu, McKeown, Vahdat, NSDI'14] • Atomic Predicates [Yang, Lam, ToN'16] • ddNF [Bjørner, Juniwal, Mahajan, Seshia, Varghese, HVC'16]


Configuration verification

Configuration verification • RCC (Detecting BGP config faults w/ static analysis) [Feamster & Balakrishnan, USENIX '05] • ConfigAssure [Narain et al, '08] • ConfigChecker [Al-Shaer, Marrero, El-Atawy, ICNP '09]

• Batfish [Fogel, Fung, Pedrosa, Walraed-Sullivan, Govindan, Mahajan, Millstein, NSDI’15] • Bagpipe [Weitz, Woos, Torlak, Ernst, Krishnamurthy, Tatlock, NetPL’16 & OOPSLA’16]


Richer verification

Richer data plane models • Software Dataplane Verification [Dobrescu, Argyraki, NSDI’14] • SymNet [Stoenescu, Popovici, Negreanu, Raiciu, SIGCOMM’16] • Mutable datapaths [Panda, Lahav, Argyraki, Sagiv, Shenker, NSDI’17]

Verifiable controllers & control languages • NICE [Canini, Venzano, Perešíni, Kostić, Rexford, NSDI'12] • NetKAT [Anderson, Foster, Guha, Jeannin, Kozen, Schlesinger, Walker, POPL'14] • Kinetic: Verifiable Dynamic Network Control [Kim, Gupta, Shahbaz, Reich, Feamster, Clark, NSDI'15]


NETWORK VERIFICATION IN THE REAL WORLD


Industry efforts

Three startups pursuing general-purpose network verification for enterprises

Special purpose efforts • Hyperscale clouds • Major network device manufacturer

Gartner grouping verification in “intent-based networking” category

What have we learned?


1. The Need is Real


1. The Need is Real

(Diagram of a typical enterprise: data center, disaster recovery data center, Dallas and New York offices, service provider backbone, WAN, Salesforce, AWS)


Source: haneke.net


1. The Need is Real

Network Complexity: 59% say growth in complexity has led to more frequent outages [Dimensional Research]

Change: 22,000 changes/mo. at DISA [S. Zabel, 2016]

Manual Processes: 69% use manual checks (most common technique) [Dimensional Research]


2. How is it actually useful?

Network Segmentation

Availability & Resilience

Continuous Compliance

Incident Response


3. Extracting the abstraction: not easy

Software verification Data plane verification

#include <stdio.h>

int main(int argc, char** argv) {
    if (argc >= 2) {
        printf("Hello world, %s!\n", argv[1]);
    }
    return 0;
}

• No universal API to extract state (LCD: SSH + CLI "show" commands)

• No formal spec of how that state relates to functionality

• Vendor-specific behaviors

• Given program as input • Assume formal specification of programming language


3. Extracting the abstraction: not easy

Data plane verification

• No universal API to extract state (LCD: SSH + CLI "show" commands)

• No formal spec of how that state relates to functionality

• Vendor-specific behaviors

OF-DPA 1.0 ABSTRACT SWITCH (© 2014 Broadcom Corporation)

• Full-Feature L2 Bridging and L3 Routing: L2 VLAN assignment and filtering, multicast, DLF, broadcast; L3 unicast, multicast, ECMP
• VXLAN Gateway: isolated tenant forwarding domain
• Wide-Match Policy ACL Actions: redirect, drop, classify, mark, etc.; L2 header rewrite
• Source Learning Vendor Extension

(Pipeline diagram: physical port → Ingress Port, VLAN, Termination MAC, Bridging / Unicast Routing / Multicast Routing, and ACL Policy flow tables; apply actions — push/pop, edits, output — build an action set; group table entries cover L2 interface/flood/multicast and L3 interface/unicast/ECMP/multicast; a MAC Learning flow table is kept synchronized)

Broadcom’s OF-DPA 1.0 Abstract Switch https://www.ietf.org/proceedings/90/slides/slides-90-sdnrg-3.pdf

• Some hope: Vendor-specific APIs, OpenConfig
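A fixed multi-table pipeline like OF-DPA's can be modeled abstractly as a chain of match-action tables (a hedged sketch: table names follow the slide, but the matching logic is illustrative and not the OF-DPA specification):

```python
# Minimal sketch of a fixed multi-table pipeline: a packet flows through an
# ordered list of tables, each of which may edit the packet's metadata and
# either pass it onward or emit a final verdict (drop or an output port).

PIPELINE = ["Ingress Port", "VLAN", "Termination MAC", "Bridging", "ACL Policy"]

def process(packet, tables):
    """Run the packet through each programmed table in pipeline order."""
    for name in PIPELINE:
        handler = tables.get(name)
        if handler is None:
            continue                      # table not programmed: pass through
        packet, verdict = handler(packet)
        if verdict is not None:           # e.g. "drop" or an output port
            return verdict
    return "drop"                         # table-miss default

tables = {
    "VLAN": lambda p: (dict(p, vlan=p.get("vlan", 10)), None),
    "Bridging": lambda p: (p, "port2" if p["dst_mac"] == "aa:bb" else None),
}
verdict = process({"dst_mac": "aa:bb"}, tables)   # forwarded out "port2"
```

A verifier targeting such a device must model the whole table chain, not a single flat rule set, which is part of why extracting a faithful abstraction is hard.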


4. Model / Verifier separation works

Looking ahead: An Opportunity

Real time "knowledge layer"

Formal model of network behavior

Network: routers, switches, firewalls, ...

Snapshot or real-time stream of: (1) topology, (2) data plane state (forwarding tables)

Applications!


4. Model / Verifier separation works


Data plane state → predictive model ("forwarding graph") → verify, search, visualize, API, ...
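The "forwarding graph" idea can be made concrete: treat each device's FIB as a longest-prefix-match function, chain the lookups into a graph, and check invariants such as reachability or loop freedom. A hedged sketch under invented topology and rules (devices, prefixes, and the `"deliver"` marker are all assumptions for illustration):

```python
# Sketch of data-plane verification over a "forwarding graph": per-device
# longest-prefix-match FIBs chained into paths, checked for delivery,
# drops, and loops. Topology and rules are invented for illustration.
import ipaddress

fibs = {  # device -> {prefix: next-hop device, or "deliver"}
    "A": {"10.0.0.0/8": "B"},
    "B": {"10.0.1.0/24": "C", "10.0.0.0/8": "D"},
    "C": {"10.0.1.0/24": "deliver"},
    "D": {},
}

def lookup(device, dst):
    """Longest-prefix match in one device's FIB."""
    addr = ipaddress.ip_address(dst)
    matches = [p for p in fibs[device] if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    best = max(matches, key=lambda p: ipaddress.ip_network(p).prefixlen)
    return fibs[device][best]

def reachable(src, dst):
    """Trace the forwarding graph; detect drops and forwarding loops."""
    device, seen = src, set()
    while device not in seen:
        seen.add(device)
        hop = lookup(device, dst)
        if hop == "deliver":
            return True
        if hop is None:
            return False  # dropped: no matching rule
        device = hop
    return False  # revisited a device: forwarding loop

print(reachable("A", "10.0.1.5"))  # A -> B -> C: delivered
print(reachable("A", "10.0.2.5"))  # A -> B -> D: dropped
```

Production verifiers check all packet headers symbolically (equivalence classes of headers) rather than tracing one concrete address at a time, but the invariant being checked is the same.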

Page 82: Network Verification: From Algorithms to Deployment

5. We need a shift in thought

Network as individual devices

Individual config knobs

Page 83: Network Verification: From Algorithms to Deployment

5. We need a shift in thought

Network as individual devices → Network as one system

Individual config knobs → End-to-end intent

Page 84: Network Verification: From Algorithms to Deployment

THANK YOU!