Service-centric networking with SCAFFOLD
Michael J. Freedman
Princeton University
with Matvey Arye, Prem Gopalan, Steven Ko,
Erik Nordstrom, Jen Rexford, and David Shue
From a host-centric architecture
1960s 1970s 1990s
To a service-centric architecture
1960s 1970s 1990s 2000s
To a service-centric architecture
• Users want services, agnostic of actual host/location
• Service operators need: replica selection / load balancing, replica registration, liveness monitoring, failover, migration, …
Hacks to fake service-centrism today
Layer 4/7: DNS with small TTLs, HTTP redirects, Layer-7 switching
Layer 3: IP addresses and IP anycast, inter/intra routing updates
Layer 2: VIP/DIP load balancers, VRRP, ARP spoofing
+ Home-brewed registration, configuration, monitoring, …
To a service-centric architecture
• Users want services, agnostic of actual host/location
• Service operators need: replica selection / load balancing, replica registration, liveness monitoring, failover, migration, …
• Service-level anycast as basic network primitive
Two high-level questions
• Moderate vision: Can network support aid self-configuration for replicated services?
• Big vision: Should “service-centric networking” become the new thin waist of Internet?
Naming as a “thin waist”
• Host-centric design: Traditionally one IP per NIC
  – Load balancing, failover, and mobility complicate this
  – Now: virtual IPs, virtual MACs, …
• Content-centric architecture: Unique ID per data object
  – DONA (Berkeley), CCN (PARC), …
• SCAFFOLD: Unique ID per group of processes
  – Each member must individually provide full group functionality
  – Group can vary in size, distributed over LAN or WAN
Object granularity can vary by service
SCAFFOLD ObjectID = K-bit Admin Prefix | Machine-readable ObjectID (fixed bit-length)
  = Google | YouTube Service
  Memcache Partition = Facebook | Partition 243
  = Comcast | Mike’s Laptop
  IZ – “Somewhere over the rainbow” = Google | IZ – “Somewhere” video
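The fixed-length ObjectID can be pictured as an admin prefix concatenated with an object name. The sketch below is only an illustration: the talk gives no field widths, so the 16-bit prefix, the 48-bit name, and the numeric prefix for Facebook are all assumptions.

```python
ADMIN_BITS = 16   # assumed value of K; the slides do not give it
NAME_BITS = 48    # assumed width of the object-name part


def make_object_id(admin_prefix: int, object_name: int) -> int:
    """Pack an admin prefix and an object name into one fixed-length ID."""
    if not 0 <= admin_prefix < (1 << ADMIN_BITS):
        raise ValueError("admin prefix does not fit in ADMIN_BITS")
    if not 0 <= object_name < (1 << NAME_BITS):
        raise ValueError("object name does not fit in NAME_BITS")
    return (admin_prefix << NAME_BITS) | object_name


def split_object_id(object_id: int) -> tuple:
    """Recover (admin_prefix, object_name) from a packed ID."""
    return object_id >> NAME_BITS, object_id & ((1 << NAME_BITS) - 1)


FACEBOOK = 0x00FB  # hypothetical prefix value, for illustration only
memcache_243 = make_object_id(FACEBOOK, 243)  # "Facebook | Partition 243"
```

Because the prefix sits in the high-order bits, everything belonging to one administrative domain shares a common prefix, whatever granularity that domain picks for its objects.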
SCAFFOLD as…
– Clean slate design
– Multi-datacenter architecture for single administrative domain
  • Deployed over legacy networks
  • Few/no modifications to applications
Target: Single administrative domain
• Datacenter management more unified, simple, centralized
• Host OS net-imaged and can be fork-lift upgraded
• Already struggling to provide scalability and service-centrism
• Cloud computing lessens importance of fixed, physical hosts
[Figure: instances labeled X and Y placed in DC 1 and DC 2, with the datacenters connected by a Backbone and the Internet]
Goals for Service-Centrism
• Handling replicated services
  – Control over replica selection among groups
  – Control of network resources shared between groups
  – Handling dynamics among group membership and deployments
• Handling churn
  – Flexibility: From sessions, to hosts, to datacenters
  – Robustness: Largely hide from applications
  – Scalability: Local changes shouldn’t need to update global info
  – Scalability: Churn shouldn’t require per-client state in network
  – Efficiency: Wide-area migration shouldn’t require tunneling
Clean-Slate Design
Principles of SCAFFOLD
1. Service-level naming exposed to network
2. Anycast with flow affinity as basic primitive
3. Migration and failover through address remapping
  – Addresses bound to physical locations (aggregatable)
  – Flows identified by each endpoint, not pairwise
  – Control through in-band signalling; stateless forwarders
4. Minimize visibility of churn for scalability
  – Different addresses for different scopes (successive refinement)
5. Tighter host-network integration
  – Allowing hosts/service instances to dynamically update network
Principles of SCAFFOLD
1. Service-level naming exposed to network
2. Anycast with flow affinity as basic primitive
SCAFFOLD address: ObjectID | FlowID
Fields: Admin Prefix | Object Name | SS Label | Host Label | SocketID
(i) Resolve ObjectID to an instance FlowLabel
(ii) Route on instance FlowLabel to the destination
(iii) Subsequent flow packets use same FlowLabel
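Steps (i) to (iii) can be sketched as a tiny resolver. This is a hedged illustration, not the SCAFFOLD implementation: the `Address` and `ObjectRouter` classes are hypothetical, a flow label of 0 is taken to mean "not yet resolved" (as the `DST B 0 0` headers in later slides suggest), and round-robin instance selection is an assumption.

```python
from dataclasses import dataclass, replace
from itertools import cycle


@dataclass(frozen=True)
class Address:
    object_id: str
    flow_label: int = 0   # 0 is assumed to mean "unresolved"
    socket_id: int = 0


class ObjectRouter:
    """Hypothetical resolver from ObjectID to one instance's flow label."""

    def __init__(self):
        self._instances = {}   # object_id -> list of flow labels
        self._cursors = {}     # object_id -> round-robin iterator

    def register(self, object_id, flow_label):
        self._instances.setdefault(object_id, []).append(flow_label)
        self._cursors[object_id] = cycle(self._instances[object_id])

    def resolve(self, dst):
        # (iii) a packet that already carries a label keeps its instance.
        if dst.flow_label != 0:
            return dst
        # (i) the first packet of a flow is bound to one instance (anycast).
        return replace(dst, flow_label=next(self._cursors[dst.object_id]))


router = ObjectRouter()
router.register("B", 3)
router.register("B", 4)
first = router.resolve(Address("B"))    # (i) unresolved -> some instance of B
again = router.resolve(first)           # (iii) same label, hence same instance
other = router.resolve(Address("B"))    # a new flow may land elsewhere
```

Step (ii), routing on the label, needs no lookup in this resolver at all; flow affinity comes from the packet carrying the label, not from state kept per flow.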
SCAFFOLD address
Decoupled flow identifiers
Src FlowID: ObjectID | Flow Labels | SocketID
Dst FlowID: ObjectID | Flow Labels | SocketID
ObjectID = who, Flow Labels = where, SocketID = which conversation
3. Migration and failover through address remapping
4. Minimize visibility of churn for scalability
SCAFFOLD address
Manage migration/failover through in-band address remapping
Src FlowID: ObjectID | Flow Labels | SocketID
Dst FlowID: ObjectID | Flow Labels | SocketID
ObjectID = who, Flow Labels = where, SocketID = which conversation
Example flow labels: SS8 : 30 and SS10 : 40 : 20
(i) Local end-point changes location, assigned new address
(ii) Existing connections signal new address to remote end-points
(iii) Remote network stack updated, application unaware
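The remapping steps work because a flow is matched on "which conversation" (the SocketID) while "where" (the flow labels) is free to change. A minimal sketch of the remote stack's side, with hypothetical class and method names and label values borrowed from the slide purely as sample data:

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    object_id: str       # who
    flow_labels: tuple   # where
    socket_id: int       # which conversation


class Connection:
    """State one end keeps about a flow (hypothetical structure)."""

    def __init__(self, local, peer):
        self.local = local
        self.peer = peer

    def handle_remap(self, peer_socket_id, new_labels):
        """Apply an in-band 'my address changed' signal from the peer."""
        if peer_socket_id != self.peer.socket_id:
            return False                    # belongs to another conversation
        self.peer.flow_labels = new_labels  # only the location is rewritten
        return True


conn = Connection(
    local=Endpoint("B", (10, 40, 20), 234),
    peer=Endpoint("A", (4, 50), 765),
)
moved = conn.handle_remap(765, (8, 30))     # peer A reports a new location
ignored = conn.handle_remap(999, (1, 1))    # unknown SocketID, no change
```

The application holding this connection never sees the update: its handle still refers to the same ObjectID and SocketID.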
Minimize visibility of churn through successive refinement
Where (flow labels): SS4 : 50 and SS10 : 40 : 20
[Figure: wide-area topology with labels SS10, 40, 20, and 5]
Minimize visibility of churn through successive refinement
[Figure: SS4 and SS10 connected across the wide area; arbitrary subnet/address structure; multiple levels of refinement; labels 40, 20, and 5 inside SS10]
SRC: LocalHost | Safari Client | SS 4 : 50 : 3
DST: Google | YouTube Svc | SS 10 : 40 : 20 : 5
• Scalability:
  – Local churn only updates local state
  – Addresses remain hierarchical
• Info hiding: Topology not globally exposed
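Successive refinement can be illustrated with routers that each consume one label and know only their own next level. This is a sketch under assumptions: the `LabelRouter` class, the router names, and the host names are hypothetical, and the label sequence 10 : 40 : 20 : 5 is reused from the slide only as sample data.

```python
class LabelRouter:
    """Forwards on the first remaining label; knows only its own children."""

    def __init__(self, name):
        self.name = name
        self.children = {}   # label -> LabelRouter, or a host name (str)

    def add(self, label, child):
        self.children[label] = child
        return child

    def forward(self, labels):
        """Return the names visited while consuming one label per level."""
        head, rest = labels[0], labels[1:]
        nxt = self.children[head]
        if isinstance(nxt, LabelRouter):
            return [self.name] + nxt.forward(rest)
        return [self.name, nxt]


wide = LabelRouter("wide-area")
ss10 = wide.add(10, LabelRouter("SS10"))
r40 = ss10.add(40, LabelRouter("r40"))
r20 = r40.add(20, LabelRouter("r20"))
r20.add(5, "host-5")

path = wide.forward((10, 40, 20, 5))

# Local churn: a host is replaced below r20. Only r20's table changes.
before = dict(wide.children), dict(ss10.children), dict(r40.children)
del r20.children[5]
r20.add(6, "host-6")
after = dict(wide.children), dict(ss10.children), dict(r40.children)
```

The wide-area router only ever holds an entry for label 10, which is the information-hiding point on the slide: inner topology is never exposed above the level that owns it.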
Integrated service-host-network management
[Figure: Network Controller, three Label Routers, an Object Router, and two Hosts, with routing and resolution paths; entries B 3 and A 2]
Action        Network Control Msg
netlink up    join(2)
netlink down  leave(2)
bind(fd, A)   register(A, 2)
close(fd)     unregister(A, 2)
Self-configuration + adaptive to churn
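The action-to-message table can be read as the host stack translating local events into controller calls. The sketch below is an assumption-laden illustration: `HostStack` and `NetworkController` are hypothetical classes, and having `leave` also drop the host's registrations is a guess about the intended behavior, not something the slides state.

```python
class NetworkController:
    """Hypothetical controller tracking live hosts and object instances."""

    def __init__(self):
        self.hosts = set()    # labels of hosts that have joined
        self.objects = {}     # object_id -> set of labels serving it

    def join(self, label):
        self.hosts.add(label)

    def leave(self, label):
        self.hosts.discard(label)
        for labels in self.objects.values():   # assumed cleanup on leave
            labels.discard(label)

    def register(self, object_id, label):
        self.objects.setdefault(object_id, set()).add(label)

    def unregister(self, object_id, label):
        self.objects.get(object_id, set()).discard(label)


class HostStack:
    """Maps the local actions from the table to control messages."""

    def __init__(self, controller, label):
        self.controller = controller
        self.label = label

    def netlink_up(self):
        self.controller.join(self.label)

    def netlink_down(self):
        self.controller.leave(self.label)

    def bind(self, object_id):
        self.controller.register(object_id, self.label)

    def close(self, object_id):
        self.controller.unregister(object_id, self.label)


ctl = NetworkController()
host = HostStack(ctl, label=2)
host.netlink_up()    # join(2)
host.bind("A")       # register(A, 2)
```

No operator-maintained configuration appears anywhere in this flow, which is the self-configuration claim: the network learns about instances from the same calls the application already makes.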
Using SCAFFOLD:
Network-level protocols and network support
Application’s network API
Today (IP / BSD sockets)
  fd = open();
  Datagram: sendto(IP:port, data);
  Stream: connect(fd, IP:port); send(fd, data);
SCAFFOLD
  fd = open();
  Unbound datagram: sendto(objectID, data);
  Bound datagram: connect(fd, objectID); send(fd, data);
IP: Application sees network, network doesn’t see app
SCAFFOLD: Network sees app, app doesn’t see network
Unbound Flows
[Figure: Label Router 1 (LR 1), Object Router (OR), and Label Router 2 (LR 2); routing entries A 2, B 3, B 4; labels 3 and 4 map to processes p1 and p2. Header fields: ObjectID | Flow Label | SocketID]
bind(B), join
sendto(B): SRC A 2 0, DST B 0 0, then SRC A 2 0, DST B 3 0 (DATA)
sendto(A): SRC B 3 0, DST A 0 0 (DATA)
Half-Bound Flows
[Figure: Label Router 1 (LR 1), Object Router (OR), and Label Router 2 (LR 2); routing entries A 2, B 3, B 4; labels 3 and 4 map to processes p1 and p2. Header fields: ObjectID | Flow Label | SocketID]
bind(B), join
sendto(B): SRC A 2 0, DST B 0 0, then SRC A 2 0, DST B 3 0 (DATA)
sendto(A, flags): SRC B 3 0, DST A 2 0 (DATA)
Bound Flows
[Figure: Label Router 1 (LR 1), Object Router (OR), and Label Router 2 (LR 2); routing entries B 3 and A 2]
join, bind(B), listen()
connect(B)
SYN: SRC A 2 765, DST B 0 0, then SRC A 2 765, DST B 3 0
SYN/ACK: SRC B 3 234, DST A 2 765
ACK: SRC A 2 765, DST B 3 234
Connection Bound
Bound Flows
• Applications bind on object-level names
• Network forwards on resolved addresses
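The header values in the bound-flow figure can be reproduced with a small trace. This is a sketch only: the functions are hypothetical, label 0 and SocketID 0 are read as "not yet filled in", and always picking the first registered instance is an assumption.

```python
from collections import namedtuple

Addr = namedtuple("Addr", "obj label sock")    # ObjectID | Flow Label | SocketID
Packet = namedtuple("Packet", "kind src dst")


def object_router(pkt, instances):
    """Fill in an unresolved destination label from the instance table."""
    if pkt.dst.label == 0:
        label = instances[pkt.dst.obj][0]      # assumed selection policy
        return pkt._replace(dst=pkt.dst._replace(label=label))
    return pkt


def handshake(client, server_obj, instances, server_sock):
    """Return the packets of connect(): SYN, resolved SYN, SYN/ACK, ACK."""
    syn = Packet("SYN", client, Addr(server_obj, 0, 0))
    syn_resolved = object_router(syn, instances)
    # The server picks its own SocketID and reveals it in the SYN/ACK.
    server = syn_resolved.dst._replace(sock=server_sock)
    syn_ack = Packet("SYN/ACK", server, client)
    ack = Packet("ACK", client, server)
    return [syn, syn_resolved, syn_ack, ack]


trace = handshake(Addr("A", 2, 765), "B", {"B": [3]}, server_sock=234)
for pkt in trace:
    print(pkt.kind, "SRC", *pkt.src, "DST", *pkt.dst)
```

Only the first SYN needs resolution; from the SYN/ACK onward both ends address each other with fully resolved labels and SocketIDs.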
Supporting Mobility and Migration
[Figure: Label Routers 1, 2, and 3 (LR 1, LR 2, LR 3) and the Object Router (OR); routing entries B 3, A 2, A 4]
Before: SRC B 3 234, DST A 2 765
After A moves: SRC A ? 765, DST B 3 234, then SRC A 4 765, DST B 3 234
RSYN, RSYN/ACK, ACK
SRC B 3 234, DST A 4 765
Connection Migrated
Supporting Failover and Load Shedding
[Figure: Label Router 1 (LR 1), Object Router (OR), and Label Router 2 (LR 2); routing entries A 2, B 3, B 5]
SRC A 2 765, DST B 3 234: FAIL
RSYN: SRC A 2 765, DST B 0 0
RSYN/ACK, ACK
SRC A 2 765, DST B 5 529
Supporting Failover and Load Shedding
• Decoupled IDs enable in-band migration and recovery
• Flow affinity without per-flow state in the network
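The failover sequence in the figure (FAIL, then RSYN to an unresolved destination, then a new binding) can be traced as follows. The function and its arguments are hypothetical, and choosing the first surviving instance is an assumption; the header values are the ones shown on the slide.

```python
def rebind_after_fail(conn, live_labels, new_socket_id):
    """Return the rebound connection and the destination at each step.

    conn holds 'src' and 'dst' as (ObjectID, Flow Label, SocketID) tuples.
    """
    obj, failed_label, _ = conn["dst"]
    rsyn_dst = (obj, 0, 0)   # RSYN falls back to the unresolved form
    candidates = [l for l in live_labels[obj] if l != failed_label]
    if not candidates:
        raise LookupError("no remaining instance of " + obj)
    resolved = (obj, candidates[0], 0)                # re-resolution
    new_dst = (obj, candidates[0], new_socket_id)     # from the RSYN/ACK
    rebound = {"src": conn["src"], "dst": new_dst}
    return rebound, [rsyn_dst, resolved, new_dst]


old = {"src": ("A", 2, 765), "dst": ("B", 3, 234)}
new, steps = rebind_after_fail(old, {"B": [3, 5]}, new_socket_id=529)
```

The client's own identifier never changes, so the application keeps using the same socket; only the destination half of the flow is replaced.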
Extent of changes
Change in-network support
Change the packet format
Change socket layer + stack
Packet format: Hdr | ObjID | SS | … | Host | SockID
In-network elements: Label Router, Object Router, Network Controller
Yet:
Can run on top of legacy networks (IP and Ethernet)
Few/easy/no changes to applications
Backwards Compatibility
Hide physical location from app
Today (IP / BSD sockets)
  fd = open();
  Datagram: sendto(IP:port, data);
  Stream: connect(fd, IP:port); send(fd, data);
SCAFFOLD
  fd = open();
  Unbound datagram: sendto(objectID, data);
  Bound datagram: connect(fd, objectID); send(fd, data);
Current applications – iperf, TFTP, PowerDNS
SCAFFOLD network stack
[Figure: user/kernel diagrams of Application, Scafd, and Network interfaces. Labels: IPC, IP packets, Linux sockets interface, Packet socket]
Operating across legacy networks
[Figure: SS4 and SS10 connected across the wide area through Label Routers; arbitrary subnet/address structure; labels 40, 20, and 5 inside SS10]
SRC: LocalHost | Safari Client | SS 4 : 50 : 3
DST: Google | YouTube Svc | SS 10 : 40 : 20 : 5
(Anycasted) IP Address/Prefix: 1.1.1.1, 2.2.2.2, 3.3.3.3, 1.1/16, 1.1.1/24, 1.1.1.1
Routing over legacy networks
Current: Ethernet | IPv4 | Transport
  Port: 16b objID
  Addr: 8b SS | 8b Host | 16b sock
In Development: Ethernet | IPv4 | SCAFFOLD (ObjectID | Flow Label | SocketID)
[Figure: endpoints labeled 1.1.1.1 and 2.2.2.2]
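The "Current" mapping fits a SCAFFOLD address into ordinary IPv4 and transport fields. The sketch below shows one way the bit layout could work; placing SS in the most significant byte of the IPv4 address is an assumption about field order, and the function names and sample values are hypothetical.

```python
import ipaddress


def encode_legacy(object_id, ss, host, sock):
    """Return (ipv4_string, port) carrying objID, SS, Host, and SocketID."""
    for value, bits in ((object_id, 16), (ss, 8), (host, 8), (sock, 16)):
        if not 0 <= value < (1 << bits):
            raise ValueError("field does not fit in %d bits" % bits)
    addr = (ss << 24) | (host << 16) | sock   # 8b SS | 8b Host | 16b sock
    return str(ipaddress.IPv4Address(addr)), object_id   # port = 16b objID


def decode_legacy(ip, port):
    """Inverse of encode_legacy: return (object_id, ss, host, sock)."""
    addr = int(ipaddress.IPv4Address(ip))
    return port, addr >> 24, (addr >> 16) & 0xFF, addr & 0xFFFF


ip, port = encode_legacy(object_id=2827, ss=10, host=3, sock=234)
fields = decode_legacy(ip, port)
```

With this layout an IP prefix such as a /8 or /16 lines up with the SS and Host labels, so legacy prefix routing can stand in for label routing.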
In-Network support
Object Router, Label Router
  - Modified OpenFlow software switch for proportional split routing/resolution
Network Controller
  - NOX application: topology, host, object management
[Figure: Network Controller, three Label Routers, an Object Router, and two Hosts]
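"Proportional split" suggests that new flows are divided among instances according to configured weights. The slides do not say how the modified switch does this, so the following is only one possible sketch: hashing a flow key onto weighted buckets is an assumption, and the function name, weights, and key format are hypothetical.

```python
import hashlib


def pick_instance(weights, flow_key):
    """Map flow_key to a label so that flows split in proportion to weights.

    weights maps instance label -> positive integer weight.
    """
    total = sum(weights.values())
    digest = hashlib.sha256(flow_key.encode()).digest()
    point = int.from_bytes(digest[:8], "big") % total
    for label in sorted(weights):
        point -= weights[label]
        if point < 0:
            return label


weights = {3: 3, 5: 1}   # instance 3 should receive about 3/4 of new flows
choices = [pick_instance(weights, "A:%d" % sock) for sock in range(4000)]
share_of_3 = choices.count(3) / len(choices)
```

Hashing makes the choice repeatable for a given flow key without storing anything per flow, which matches the talk's emphasis on stateless forwarders.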
“Evaluation”
Demos
• Load shedding:
  – Call close() on connections
  – Subsequent packets get FAIL, then reconnect
• Client mobility
TFTP transfer with Client Mobility
[Figure annotations: Client Leaves, Client Reconnects (RSYN)]
TFTP transfer with Failover
[Figure annotations: Server 1 (FAIL), Server 2 (FAIL), <100 ms blip]
Current throughput
Current implementation is both user/kernel space. Ongoing development to either/or.
Service-centric networking
• Moderate vision: Can network support aid self-configuration for replicated services?
• Big vision: Should “service-centric networking” become the new thin waist of Internet?
SCAFFOLD rethinks:
1. Naming exposed to network and applications
2. Extent of host-network integration
3. Role of dumb/stateless network vs. end-hosts
Latency of API calls
Network vs. stack latency
Related Work
NewArch     i3      LNA     DONA    LISP  HIP   CCN      SCAFFOLD
Paradigm    Object  Object  Object  Host  Host  Content  Object
Layer       3O      4       3/4     3     4     3/4      3/4
Anycast     Hash    Res     Prox    No    No    Mcast    Res
Resolution  DHT     EB      Routed  EB    Rdz   DDiff    SRefine
Migration   Yes     Yes     Yes*    Yes   Yes   Yes*     Yes
Failover    Yes     Yes     Yes     No    No    Yes      Yes
Related Work
                     SCAFFOLD   SPAIN      PortLand   VL2
Topology             Arbitrary  Arbitrary  Fat-tree   Fat-tree
Multipath            Any        Many       ECMP       ECMP
Migration            Yes        Yes*       Yes*       Yes*
Failover             Yes        No         No         No
Traffic Engineering  Arbitrary  Oblivious  Oblivious  Oblivious
Server Selection     Yes        No*        No*        No*
Use CoTS?            No         Yes        No         Yes
End-host Mod         Yes        Yes        No         Yes