programming (software-defined) networks · control a single entity controls massive data centers....

80
Programming (Software-Defined) Networks David Walker Princeton University http://www.cs.princeton.edu/~dpw http://frenetic-lang.org

Upload: others

Post on 29-Dec-2019

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Programming (Software-Defined)

Networks

David Walker Princeton University

http://www.cs.princeton.edu/~dpw http://frenetic-lang.org

Page 2: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

The Frenetic Team

Cornell: Carolyn Anderson (alum) Shrutarshi Basu Rebecca Coombes Nate Foster (faculty) Dexter Kozen (faculty) Stephen Gutz (alum) Matthew Milano Mark Reitblatt Emin Gün Sirer (faculty) Robert Soulé (alum) Laure Thompson Alec Story (alum) Todd Warszawski (alum)

Princeton: Ryan Beckett Rob Harrison (alum) Xin Jin Nanxi Kang Naga Praveen Katta Jennifer Gossels Michael Greenberg (soon Pomona) Matthew Meola Chris Monsanto (alum) Srinivas Narayana Josh Reich (alum) Jen Rexford (faculty) Cole Schlesinger (alum) David Walker (faculty)

UMass: Arjun Guha (faculty)

Page 3: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Hypothesis:

The verification, synthesis and PL communities have their biggest

opportunity since Intel screwed up division

3

Page 4: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Where? Data Centers

4

Why? Complexity + Money + Control

Page 5: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Data Center Fun Facts

5

MS Chicago: •  700,000 sq ft •  40 ft shipping containers

– 2k servers/container •  capacity of ~224k servers

Page 6: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

At 12:47 AM PDT on April 21st, a network change was performed as part of our normal scaling activities... During the change, one of the steps is to shift traffic off of one of the redundant routers... The traffic shift was executed incorrectly and the traffic was routed onto the lower capacity redundant network. This led to a “re-mirroring storm”... During this re-mirroring storm, the volume of connection attempts was extremely high and nodes began to fail, resulting in more volumes left needing to re-mirror. This added more requests to the re-mirroring storm... The trigger for this event was a network configuration change. April 21, 2011

http://aws.amazon.com/message/65648/

BUZZfeed: Amazon.com APPEARS TO BE DOWN. EVERYBODY PANIC!

Page 7: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Those outages cost a lot

7

Aug 13, 2013, Amazon was down for roughly 40 minutes.

–  $1,100/second1

–  $4.8 million total2

1 http://www.buzzfeed.com/mattlynley/the-high-cost-of-an-amazon-outage#.lm8dL268x 2 http://www.geekwire.com/2013/amazon-lost-5m-40-minutes/

Page 8: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

These outages cost a lot

Where we you from 3:52 to 3:57pm on August 17, 2013? Google was down. One estimate1 placed the losses at $545,000 A 40% reduction in global website traffic was observed

8

1 http://www.neowin.net/news/googles-5-minute-outage-means-545000-revenue- loss-40-drop-in-global-website-traffic

Page 9: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Lots more problems ... Not all errors manifest themselves as user-facing problems

–  Some errors "merely" cause internal packet loss, retransmission, reduced latency, increased energy usage

MS scale-out checked by Network Optimized Datalog (NoD) –  1 alert per day –  3-4 buggy IP ranges –  16K faulty addresses/range

9

Page 10: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods.

And they are!! –  "Use of formal methods at Amazon web services"1

•  [Newcombe et al, 2014]

–  "Checking beliefs in dynamic networks"2 •  [Lopes et al, 2015]

Start-ups: Veriflow Systems, Forward Networks

10

1 http://research.microsoft.com/en-us/um/people/lamport/tla/formal-methods-amazon.pdf 2 NSDI 2015 http://research.microsoft.com/pubs/215431/nod.pdf

Page 11: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

NETWORK PROGRAMMING

11

Page 12: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

12

A D

H1 H2

B C

Traditional Networks

routers

end host

5 4

3 2

1

routers have ports

Page 13: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

13

A D

H1 H2

B C

Goal

H1 can generate a packet with an address (say H2); the packet finds its way along some path to that address

Page 14: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

14

A D

H1 H2

B C

Goal

H1 can generate a packet with an address (say H2); the packet finds its way along some path to that address

Page 15: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

15

A D

H1 H2

B C

Goal

H1 can generate a packet with an address (say H2); the packet finds its way along some path to that address

Page 16: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

16

A D

H1 H2

B C

Goal

H1 can generate a packet with an address (say H2); the packet finds its way along some path to that address

Page 17: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

17

A D

H1 H2

B C

Data Plane

Pattern Action Bytes Packets 01010 Drop 200 10 010* Forward(3) 100 3

H2 in this range

Router A's Flow Table

Page 18: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

18

A D

H1 H2

B C

Control Plane

Somehow, someone or something must decide WHAT RULES go in to the data plane.

The CONTROL PLANE makes those decisions

Page 19: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

19

A D

B C

Traditional Control Plane

BGP

Each router runs an instance of a distributed algorithm. BGP. OSPF. GRE. RIP ...

BGP BGP

BGP

The routers exchange messages according to pre-defined protocols. Based on information received, decisions are made about how to populate the data plane.

Page 20: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

20

A D

B C

Traditional Control Plane

"I can reach H1

along path A"

Each router runs an instance of a distributed algorithm. BGP. OSPF. GRE. RIP ...

5

7 H1

Page 21: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

21

A D

B C

Traditional Control Plane

"I can reach H1

along path A"

I can reach H1

along path CBA

Each router runs an instance of a distributed algorithm. BGP. OSPF. GRE. RIP ...

H1 7

5

Page 22: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

22

A D

B C

Traditional Control Plane

"I can reach H1

along path A"

I can reach H1

along path CBA

Each router runs an instance of a distributed algorithm. BGP. OSPF. GRE. RIP ...

My data plane should send H1

packets out port 7

7 H1

Page 23: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

23

A D

B C

Configuring Traditional Control Plane

configuration script

Configuration scripts set parameters of the algorithms

Page 24: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Hard to understand & change

Poor interfaces for programming •  Set parameters of algorithms OSPF, BGP •  Rather than program with a general, uniform language

Closed equipment •  Software bundled with hardware •  Slow protocol standardization

Few people can make changes •  "Hello? Cisco?" •  "Hello, this is Cisco! :-) How can I help you?" •  "This is Google. Please improve your load balancer." •  ... <dial tone> ... •  "We seem to have lost the connection. Damn you AT&T!"

Page 25: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Many Opportunities There are a lot of parameters to set on a lot of separate devices in separate configs

– problem 1: check that devices have "good" configurations (and then fix the bad ones)

•  Batfish1 [NSDI 2015], BagPipe2

– problem 2: synthesize a consistent set of parameters for devices from high-level specs

•  Narain3 [Lisa 05], Margrave4 [Lisa 10] – problem 3: develop better languages for

traditional configurations Nettle5 [DSL 09]

25

1 http://research.microsoft.com/en-us/um/people/ratul/papers/nsdi2015-batfish.pdf 2 http://www.konne.me/bagpipe/ 4 http://www.margrave-tool.org/ 3 https://www.usenix.org/legacy/event/lisa05/tech/full_papers/narain/narain.pdf 5 http://haskell.cs.yale.edu/?post_type=publication&p=350

Page 26: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

A

B C

A

B C

Traditional: Each router contains control + data plane Configure by setting alg. parameters

Routers communicate data needed by control-plane algorithm using a different standardized protocol for each algorithm

Software-Defined (SDN): Each router contains data plane only Controller runs algorithm to configure global network; sends (OpenFlow) requests to routers

Controller

install/ uninstall rule

request stats

notify failure/ handle packet

Page 27: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Network Events •  Forwarding table miss

Control Messages •  Install rules

An Example SDN Application

Problem 5: Design a programming system that makes it easy to construct reliable, efficient SDN applications.

Page 28: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

SDN was Great, PL people can make it Greater

The good: – simple, uniform abstraction

•  a switch = a table of rules

– never underestimate the power of simplicity! – also "complete for forwarding"

The bad: – provides the "assembly language" level of

abstraction 28

Page 29: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

MODULAR SDN POLICIES

29

[ ICFP 2011, POPL 2012, NSDI 2013, POPL 2014, NSDI 2015 ] Anderson, Baptist, Foster, Guha, Gossels, Harrison, Jin, Kozen, Monsanto, Reich, Rexford, Story, Schlesinger, Walker

Page 30: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Modularity

30

Controller Platform

LB Route Monitor FW

We still need all the functionality of old networks: Traffic monitoring, routing, firewalls, load balancing

The only way to engineer it is through modular design.

Page 31: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Forwarding

NOX

“install rule to forward packets X and Y to D”

A Typical Programming Problem

Page 32: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

A Typical Programming Problem

Forwarding

NOX

“forwarding packets X and Y to D”

“install rule to forward packets X and Y to D”

Page 33: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

A Typical Programming Problem

Forwarding Monitoring

NOX

“install rule to count number of packets X and Z that flow to D”

“forwarding packets X and Y to D”

“install rule to forward packets X and Y to D”

Page 34: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

A Typical Programming Problem

Forwarding Monitoring

NOX

“install rule to count number of packets X and Z that flow to D”

“counting Z flowing to D” “forwarding Y to D” “doing random stuff with X”

“install rule to forward packets X and Y to D”

Page 35: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

An Analogy

Concurrent Thread 1

Concurrent Thread 2

NOX

“write to X”

“race condition on X”

“write to X”

shared state

Page 36: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

OpenFlow/Nox “Solution”

Forwarding & Monitoring

NOX

manually rewrite and combine forwarding and monitoring applications in to one big monolithic jumble!

Just don't divide up complex problems in to smaller parts!

Page 37: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Anti-Modularity in NOX

def switch_join(switch): repeater(switch) def repeater(switch): pat1 = {in_port:1} pat2 = {in_port:2} install(switch,pat1,DEFAULT,None,[output(2)]) install(switch,pat2,DEFAULT,None,[output(1)])

def monitor(switch): pat = {in_port:2,tp_src:80} install(switch, pat, DEFAULT, None, []) query_stats(switch, pat) def stats_in(switch, xid, pattern, packets, bytes): print bytes sleep(30) query_stats(switch, pattern)

Repeater

Web Monitor

def switch_join(switch) repeater_monitor(switch) def repeater_monitor(switch): pat1 = {in_port:1} pat2 = {in_port:2} pat2web = {in_port:2, tp_src:80} Install(switch, pat1, DEFAULT, None, [output(2)]) install(switch, pat2web, HIGH, None, [output(1)]) install(switch, pat2, DEFAULT, None, [output(1)]) query_stats(switch, pat2web) def stats_in(switch, xid, pattern, packets, bytes): print bytes sleep(30) query_stats(switch, pattern)

Repeater/Monitor

blue = from repeater red = from web monitor green = from neither

10

Page 38: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Functional Solution 1st Class SDN Policies + Combinators

38

Repeater Monitoring

Firewall

event

Controller

event

Page 39: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Functional Solution 1st Class SDN Policies + Combinators

39

Repeater Monitoring

Firewall event

event

1st class policies

Page 40: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Functional Solution 1st Class SDN Policies + Combinators

40

Repeater Monitoring

Firewall +

;

combine policies from separate modules

event

event

Page 41: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Functional Solution 1st Class SDN Policies + Combinators

41

Repeater Monitoring

Firewall +

;

Compiler

event

event

Page 42: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

A Simple Network

42

{ sw = A; pt = 1; src = H1; dst = H2; ... }

Packets are records containing location information:

A B

H1 H2

1 2 1 2

Page 43: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

A Simple Network

43

A B

H1 H2

1 2 1 2

Generic programming task: to write a function that explains how all switches in the network should process their packets

Each such function is called a network policy:

policy : packet -> packet set

Page 44: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Basic Features A B

1 2 1 2

Predicates: sw = A & pt = 1 // packets at switch A, port 1 left untouched

// others dropped

Sequential composition: dst <- 10.0.0.1; pt <- 2 // modify the dst field then the port field

Actions: dst <- 10.0.0.1 // modify the dest field

pt <- 2 // modify the pt field (ie: move packet to port 2)

Parallel composition: (pt <- 1) + (pt <- 2) // copy the packet and do both operations

Page 45: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

A Simple Network

A B

H1 H2

1 2 1 2

Tasks: •  Forwarding: Transfer packets between hosts •  Access Control: Block SSH packets

forward = (dst = H1; pt <- 1) + (dst = H2; pt <- 2)

ac = ~(typ = SSH); forward

block on both A & B

Page 46: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

A Simple Network

A B

H1 H2

1 2 1 2

block on both A & B

acA = sw = A; ~(typ = SSH); forward + sw = B; forward

acB = sw = A; forward + sw = B; ~(typ = SSH); forward

ac = ~(typ = SSH); forward

block on A

block on B

Page 47: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

A Simple Network

A B

H1 H2

1 2 1 2

ac = ... acA = ... acB = ...

Are all SSH packets dropped?

Do all non-SSH packets sent from H1 arrive at H2?

Are the optimized policies equivalent to the unoptimized one?

ac ac ?

Page 48: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Encoding Topologies

A B

H1 H2

1 2 1 2

t = (sw = A & pt = 2; sw <- B; pt <- 1) + (sw = B & pt = 1; sw <- A; pt <- 2)

net = ac; t; ac

ac ac t

Page 49: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Encoding Topologies

H1 H2

t = ...

net = ac; t; ac; t; ac; t; ac; t; ac + ac; t; ac; t; ac; t; ac; t; ac; t; ac; ... + ...

A B

net = (ac; t)*; ac Kleene iteration: p* = id + p + p;p + ...

Page 50: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Encoding Networks

A B

H1 H2

1 2 1 2

ac = ... t = ... net = (ac; t)*; ac

net is a function that moves packets: A1 ==> B2 B2 ==> A1

and also moves packets: A1 ==> A2 A2 ==> A1 B1 ==> B2 B2 ==> B1

edge = sw=A & pt=1 || sw=B & pt=2 net = edge; (ac; t)*; ac; edge

Page 51: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Summary So Far

in; (policy; topology)*; policy; out

a, b, c ::= drop // drop all packets | id // accept all packets | f = v // field f matches v | ~a // negation | a & b // conjunction | a || b // disjunction

p, q, r ::= a // filter according to a | f <- v // update field f to v | p ; q // do p then q | p + q // do p and q in parallel | p* // do p zero or more times

Predicates

Network Encoding

Policies

Page 52: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Summary So Far

in; (policy; topology)*; policy; out

a, b, c ::= drop // drop all packets | id // accept all packets | f = v // field f matches v | ~a // negation | a & b // conjunction | a || b // disjunction

p, q, r ::= a // filter according to a | f <- v // update field f to v | p ; q // do p then q | p + q // do p and q in parallel | p* // do p zero or more times

Predicates

Network Encoding

Policies

Boolean Algebra

Kleene Algebra

Boolean Algebra + Kleene Algebra = Kleene Algebra with Tests

Page 53: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Equational Theory

53

net1 ≈ net2

For programmers (or synthesizers): –  a system for reasoning about programs as they

are written

For compiler writers: –  a means to prove their transformations correct

For verifiers: –  sound and complete with a PSPACE decision

procedure [Foster et al., POPL 2015]

Page 54: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

54

Boolean Algebra: a & b ≈ b & a a & ~a ≈ drop ...

Kleene Algebra: (a; b); c ≈ a; (b; c) a; (b + c) ≈ (a; b) + (a; c)

... p* ≈ id + p; p*

Packet Algebra: f <- n; f = n ≈ f <- n f = n; f <- n ≈ f = n

f <- n; f <- m ≈ f <- m

a || ~a ≈ id

if m ≠ n: f = n; f = m ≈ drop

f <- n; g <- m ≈ g <- m; f <- n if f ≠ g: f = n; g <- m ≈ g <- m; f = n

f = 0 + ... + f = n ≈ id (finite set of possible values in f)

Equational Theory

Page 55: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Using the Theory

A B

H2

1 2 1 2

55

forward = (dst = H1; pt <- 1) + (dst = H2; pt <- 2) ac = ~(typ = SSH); forward t = ... edge = ... net = edge; (ac; t)*; ac; edge

Are all SSH packets dropped?

~typ = SSH; sw=A; pt=1; net ≈ ~typ = SSH; sw=A; pt=1; sw <- B; pt <- 2

Do all non-SSH packets sent from H1 arrive at H2?

typ = SSH; net ≈ drop

H1

Page 56: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Using the Theory

56

A B

H1 H2

1 2 1 2

forward = (dst = H1; pt <- 1) + (dst = H2; pt <- 2) ac = ~(typ = SSH); forward t = ... edge = ... net = edge; (ac; t)*; ac; edge

Are all SSH packets dropped?

~typ = SSH; dst = H2; sw=A; pt=1; net ≈ ~typ = SSH; dst = H2; sw=A; pt=1; sw <- B; pt <- 2

Do all non-SSH packets destined for H2, sent from H1 arrive at H2?

typ = SSH; net ≈ drop

Page 57: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Traffic Isolation A B

H1 H2 1 2 1 2

H4

3

H3

3

polA1 = sw = A; ( pt = 1; pt <- 2 +

pt = 2; pt <- 1 ) polB1 = sw = B; ( ... ) pol1 = polA1 + polB1 net1 = (pol1; t)*

polA2 = sw = B; ( pt = 3; pt <- 2 +

pt = 1; pt <- 3 ) polB2 = sw = A; ( ... ) pol2 = polA2 + polB2 net2 = (pol2; t)*

Programmer 1 connects H1 and H2: Programmer 2 connects H3 and H4:

net3 = ((pol1 + pol2); t)* // traffic from H2 goes to H1 and H4!

Page 58: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Traffic Isolation A B

H1 H2 1 2 1 2

H4

3

H3

3

A network slice is a light-weight abstraction designed for traffic isolation:

{ in } policy { out }

traffic outside the slice satisfying in enters the slice

traffic inside the slice satisfying out exits the slice

traffic inside the slice obeys the policy

slices are just a little syntactic sugar on top of NetKAT

Page 59: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Traffic Isolation A B

H1 H2 1 2 1 2

H4

3

H3

3

A network slice is a light-weight abstraction designed for traffic isolation:

edge1 = sw = A & pt = 1 || sw = B & pt = 2 slice1 = {edge1} pol1 {edge1}

edge2 = sw = A & pt = 3 || sw = B & pt = 3 slice2 = {edge2} pol2 {edge2}

Theorem: (slice1; t)* + (slice2;t)* ≈ ((slice1 + slice2); t)*

packet copied and sent through slice1 and slice2 networks separately

packet runs through network that combines slice1 and slice2

Page 60: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Traffic Isolation A B

H1 H2 1 2 1 2

H4

3

H3

3

A network slice is a light-weight abstraction designed for traffic isolation:

Theorem: edge1; (slice1; t)* ≈ edge1; ((slice1 + slice2); t)*

consider those packets at the edge1 of the slice

can’t tell the difference between slice1 alone and slice1 + slice2

edge1 = sw = A & pt = 1 || sw = B & pt = 2 slice1 = {edge1} pol1 {edge1}

edge2 = sw = A & pt = 3 || sw = B & pt = 3 slice2 = {edge2} pol2 {edge2}

Page 61: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

NetKAT can be implemented with OpenFlow

61

forward = (dst = H1; pt <- 1) + (dst = H2; pt <- 2) ac = ~(typ = SSH); forward

Pattern Actions typ = SSH drop dst=H1 fwd 1 dst=H2 fwd 2

Pattern Actions typ = SSH drop dst=H1 fwd 1 dst=H2 fwd 2

Flow Table for Switch 1:

Flow Table for Switch 2:

compile

Theorem: Any NetKAT policy p that does not modify the switch field can be compiled in to an equivalent policy in “OpenFlow Normal Form.” [POPL '12, '14] See [Smolka et al ICFP 2015] for new, fast BDD-based compilation algorithms.

Page 62: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

NETWORK UPDATES

62

Page 63: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Updates

63

Even when old policy is "good" and new policy is "good" problems can happen when transitioning •  packets dropped, forwarded incorrectly, loops

Page 64: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Updates

64

Problem 7: How to update a network from one policy to another while maintaining key semantic invariants?

Page 65: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Src Traffic Action

Web Allow

Non-web Drop

Any Allow

Example: Distributed ACL

F2

F3

Security Policy

Configuration A

Process black-hat traffic on F1 Process white-hat traffic on {F2,F3}

Configuration B

Process black-hat traffic on {F1,F2} Process white-hat traffic on F3

?

F1

I

Tricky reasoning about ordering of rule installation, network latency & packet flow. A classic hard-to-reason-about concurrency problem.

Page 66: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

The ACL Example: Just one invariant we might want to preserve.

What general properties should the network update operation have?

66

Page 67: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

General Criterion [Reitblatt, Foster, Rexford, Schlesinger, Walker, SIGCOMM 2012 ]

Per-packet consistency: Definition: Every individual packet processed exclusively by either policy P1 or policy P2 (not a mix) Theorem: if a trace property T holds of both P1 and P2 independently, then it holds of all packets flowing through the network Example trace properties: Access control, reachability, waypointing, no loops

Per-flow consistency:

Definition: Well-defined set of related packets are all processed by policy P1 or policy P2 (not a mix)

P1

P2

Page 68: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

BenefitsSimplifies programming practice

– Update configurations in chunks – Programmer doesn’t think about concurrency – A class of programming errors goes away – Run-time system handles the details

Cheaper mechanical verification

– Check each policy P1, P2, P3 in isolation – No transitional states need to be verified! – Conceptually: A generic way to bootstrap a static

network policy analyzer in to a dynamic one •  We specified LTL properties and tried NuSMV •  You can try Header Space Analysis or Anteater!

Page 69: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Mechanism: Two-Phase UpdateSet-up: Pre-process policies

–  Edge: Stamp packet with version # –  Interior: Policy conditioned on version #

Phase 1: Unobservable updates –  Add rules for P2 in the interior –  … matching on version # of P2

Phase 2: One-touch updates –  Add rules to stamp packets

with version #P2 at the edge

Clean-up: Remove old rules –  Wait for some time, then

remove all version # P1 rules

Page 70: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Goal: Avoid naive two-phase update –  Naïve version touches every switch –  Doubles rule space requirements

Optimization: Incremental update [Katta, Rexford, Walker]

–  Shift policy one traffic slice at a time –  Trades update time vs. rule space

Optimization: Supply invariants [McClurg, Hojjat, Cerny, Foster]

–  Specify the LTL invariants you need to preserve –  Use a model checker to search for update sequences

Improved specs: Regular update lang. [Saha, Prabhu, Mahdu]

–  Specify the invariants you need to preserve –  Map to SAT problem; synthesize update

Alternatives

Page 71: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Consistent Updates

Main point: •  Programmers need only concern themselves with

the properties of each policy (A, B, ...) in isolation. Effects: •  Another kind of modularity for SDN programmers. •  Brain power shifted away from tricky transitions.

71

Page 72: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

MOVING FORWARD

72

Page 73: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

NetKAT: Still low level & limited

73

H1 H2

A B

•  Explicitly programmed paths •  Stateless •  Lacking QOS controls •  Lacking monitoring and debugging support •  Limited actions (get and set packet headers) •  Operates over fixed protocols (IPv4 vs. DESTP)

Dave's Extra-Special Transport Protocol (DESTP)

Page 74: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

The Big Switch Model

74

H1 H2

A Big Switch

Big switch model abstracts away from physical topology Operator specifies constraints: •  access control, reachability, QOS constraints, fault tolerance Problem 5: System synthesizes concrete policy •  SIMPLE [Qasi et al 2013], MERLIN [Soule et al, 2013], L [Padon et al 2015] •  Optimization [He 2008, Ghosh et al 2013]

Page 75: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Alternate Platforms: P4 & more

75

•  Openflow was designed as a simple first-cut interface •  It doesn't support state, contains a small set of primitive

actions and doesn't allow programmers to control packet formats.

•  Alternate architectures:

•  P4 •  CPU •  CPU + FPGA •  CPU + GPU?

•  Problem 6: Designing high-level programming models &

automatically splitting computation between devices

Page 76: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Middleboxes

76

H1 H2

Many networks contain middleboxes •  can be as many middleboxes as routers

Middleboxes contain stateful computations •  dynamic access control and intrusion detection •  content caching •  programming and/or verification and synthesis support:

•  Maple, FlowLog, NetEgg, Slick, L, Vericon

A B

Page 77: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

WRAP UP

77

Page 78: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

~1000 students Programming in Pyretic

Networking Researchers at HP & Google

How are we doing? Corybantic: Towards the Modular Composition of SDN

Control Programs

Jeffrey C. Mogul*§, Alvin AuYoung†, Sujata Banerjee†, Lucian Popa†,Jeongkeun Lee†, Jayaram Mudigonda*‡, Puneet Sharma†, Yoshio Turner†

†HP Labs, Palo Alto, *Google, Inc.†{FirstName.LastName}@hp.com, §[email protected], ‡[email protected]

ABSTRACT

Software-Defined Networking (SDN) promises to enablevigorous innovation, through separation of the control planefrom the data plane, and to enable novel forms of networkmanagement, through a controller that uses a global viewto make globally-valid decisions. The design of SDN con-trollers creates novel challenges; much previous work hasfocused on making them scalable, reliable, and efficient.

However, prior work has ignored the problem that mul-tiple controller functions may be competing for resources(e.g., link bandwidth or switch table slots). Our Corybantic

design supports modular composition of independent con-troller modules, which manage different aspects of the net-work while competing for resources. Each module tries tooptimize one or more objective functions; we address thechallenge of how to coordinate between these modules tomaximize the overall value delivered by the controllers’ de-cisions, while still achieving modularity.

Categories and Subject Descriptors

C.2.3 [Network Operations]: Network Management

General Terms

Design, Economics, Management

Keywords

Software-Defined Networking

1. INTRODUCTION

Software Defined Networking (SDN) promises a betterapproach, not just for network innovation, but also for man-aging a network. By separating the control plane from the

Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.Hotnets ’13, November 21–22, 2013, College Park, MD, USA.Copyright 2013 ACM 978-1-4503-2596-7 ...$10.00.

data plane, via a protocol such as OpenFlow, SDN allowsthe use of an arbitrarily complex, central controller that candynamically make decisions that can maintain global poli-cies and objectives.

Of course, the promise of SDN relies on the creation ofreliable, scalable, and efficient controller software. Many re-searchers and developers have worked on this problem (e.g.,[5, 14, 21, 20]).

For a production network, however, an SDN controllermust simultaneously respect many concerns, possibly withcompeting objectives. For example, a power manager’s ob-jective might be to re-route traffic onto a smaller set of links,so that it can turn off a switch, while a QoS manager mightwant to maintain as many live links as possible, to ensurethat its SLAs are met. Other examples of objectives in-clude routing, flow-level traffic engineering, support for fastfailover, middlebox insertion, VM migration, etc.

One might build a monolithic controller that can juggle allof these issues, but we believe that without modularity, sucha controller will be hard to build, maintain, and extend, andperhaps just as resistant to further innovation as traditionalhardware-defined networks.

In this paper, we propose a controller framework design,Corybantic, that focuses on both achieving modular com-position and maximizing the overall value delivered by thecontroller’s decisions. Our goal is for modules to expose suf-ficient information about their local objectives and policiesso that, operating collaboratively through Corybantic, theymaximize system-wide objectives while meeting all of theirpolicy constraints. At the same time, the inter-module inter-faces should be as narrow as possible, to support modularityand composition.

We contrast our work with Frenetic [8, 9] and Pyretic [18]:both systems support composition of modules by resolvingconflicts over specific OpenFlow rules. Corybantic supportscomposition of modules that might conflict over higher-levelpolicies and objectives. In other words, Frenetic, Pyretic,and all the prior work that we know of, aims at better waysto get the network to do what you want it to do; Corybantic,in contrast, is aimed at deciding what you want it to do. Asa consequence, Corybantic operates at a higher layer of thecontroller stack, and is mostly complementary to prior work.

1

On Consistent Updates in Software Defined Networks

Ratul Mahajan Roger WattenhoferMicrosoft Research ETH Zurich

Abstract— We argue for the development of e�-cient methods to update the data plane state of an SDN,while maintaining desired consistency properties (e.g.,no packet should be dropped). We highlight the inher-ent trade-o↵ between the strength of the consistencyproperty and dependencies it imposes among rules atdi↵erent switches; these dependencies fundamentallylimit how quickly data plane can be updated. For onebasic consistency property—no packet should loop—wedevelop an update algorithm that has provably minimaldependency structure. We also sketch a general archi-tecture for consistent updates that separates the twinconcerns of consistency and e�ciency.

Categories and Subject Descriptors: C.2.1 [Computer-

Communication Networks]: Network Architecture and Design

General Terms: Algorithms

1. INTRODUCTIONFrom early papers on the topic (e.g., [3, 1, 4, 2]), we

can learn that the primary promises of SDNs were thati) centralized control plane computation can eliminatethe ill-e↵ects of distributed computation (e.g., loopingpackets), and ii) separating control and data planes sim-plifies the task of configuring the data plane in a mannerthat satisfies diverse policy concerns. For example, toeliminate oscillations and loops that can occur in cer-tain iBGP architectures, the Routing Control Platform(RCP) [3, 1] proposed a centralized control plane archi-tecture that directly configured the data plane of routersin an autonomous system.

4D aimed to simplify network management [4]. It ob-served that the “data plane needs to implement, in ad-dition to next-hop forwarding, functions such as tunnel-ing, access control, address translation, and queuing.”Intoday’s networks, this “requires complex arrangementsof commands to tag routes, filter routes, and configuremultiple interacting routing processes, all the while en-suring that no router is asked to handle more routes andpacket filters than it has resources to cope with.”Basedon this observation, it argues for centrally computingdata plane state in a way that obeys all concerns.

Similarly, ETHANE reasoned that for simplified man-agement of enterprise networks “policy should deter-mine the path that packets follow” [2]. It then arguedfor SDNs because the requirements of network man-agement “are complex and require strong consistency,making it quite hard to compute in a distributed man-

ner.” These promises have led to SDNs garnering a lotof attention from both researchers and practitioners.

However, as we gain more experience with thisparadigm, a nuanced story is emerging. Researchershave shown that, even with SDNs, packets can takepaths that do not comply with policy [10] and that traf-fic greater than capacity can arrive at a link [5]. Whatexplains this gap between the promise and these incon-sistencies? The root cause is that promises apply tothe eventual behavior of the network, after data planestate has been changed, but inconsistencies emerge dur-ing data plane state changes.

Since changes to data plane state, in response to fail-ures, load changes, and policy changes, are an essentialpart of an operational network, so will be the inconsis-tencies. Thus, successful use of SDNs requires not onlymethods to compute policy-compliant data plane statebut also methods to change that state in a way thatmaintains desired consistency properties.

This paper takes a broad view of this aspect of SDNs.The key to consistently updating a network is care-fully coordinating rule updates across multiple switches.Atomically updating the rules of multiple switches isdi�cult and uncoordinated updates can lead to incon-sistencies such as packet loops and drops. Thus, whena rule can be safely updated at a switch depends onwhat rules are present at other switches, and in somecases, these dependencies can be circular. By analyz-ing a spectrum of possible consistency properties andthe dependency structures they induce, we find an in-herent trade-o↵. Dependency structures are simpler forweaker consistency properties and are more intricate forstronger properties. For instance, simply ensuring thatno packet is dropped does not induce any dependencyamongst switches, but ensuring that no packet sees amix of old and new rules [10] makes rules at a switchdepend on all other switches in the network.

We also take a detailed view of a basic consistencyproperty—no packet loops—and develop two new up-date algorithms that induce less intricate dependencystructures than currently known algorithms [10]. Oneof our algorithms induces provably minimal dependen-cies.

Motivated by our analysis, we sketch a general ar-chitecture for consistent network updates. It decouplesacross two modules the two concerns that are primaryduring network updates: consistency and speed. Given

1

Networking Researchers at Microsoft & ETH

Software & Tools Tools

Frenetic/OCaml [Foster, Guha, Reitblatt+] ==> FlowLog [Nelson, Sheer, Ferguson+]

Pyretic [Reich, Monsanto, Schlesinger+]

==> PyResonance [Kim, Reich, Feamster+] ==> SDX [ Gupta et al. ]

Open Networking Summit: Research Track Call for Papers

Topics: •  Programming languages, verification techniques and tools for SDN ONS Attendance: •  2011: 600 •  2012: 909 •  2013: 1600

P4

[Bosshart, Daly, Izzard, McKeown, Rexford, Schlesinger, Talayco, Vahdat, Varghese, Walker]

The next generation of the OpenFlow protocol:

Custom packet specs & configurable table graphs

Page 79: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Networks Need Abstractions

79

More Tools to Support Them

security fault tolerance quality of service distribution

Debugging

Verification

Testing

Declarative languages Relevant stats: •  Google lost $545,000 during a

5-minute outage •  Amazon lost $1,100/second

during a 25-minute outage •  Nicira bought for 1.2 billion

Program synthesis

Page 80: Programming (Software-Defined) Networks · Control A single entity controls massive data centers. They can re-engineer them for reliability using formal methods. And they are!! –

Cornell: Carolyn Anderson Shrutarshi Basu Rebecca Coombes Nate Foster Dexter Kozen Stephen Gutz Matthew Milano Mark Reitblatt Emin Gün Sirer Robert Soulé Laure Thompson Alec Story Todd Warszawski

Princeton: Michael Greenberg Rob Harrison Nanxi Kang Naga Praveen Katta Xin Jin Matthew Meola Chris Monsanto Nayden Nedev Josh Reich Jen Rexford Cole Schlesinger David Walker

UMass: Arjun Guha

http://frenetic-lang.org http://cs.princeton.edu/~dpw