Building a service-centric network with SCAFFOLD*
Michael J. Freedman
Princeton University
with Prem Gopalan, Steven Ko, Jen Rexford, and David Shue
* Service-Centric Architecture For Flexible Object Localization and Distribution
From a host-centric architecture
[Figure: timeline of network architectures, 1960s, 1970s, 1990s]
To a service-centric architecture
[Figure: the same timeline extended to the 2000s]
• Users want services, agnostic of actual host
• Service operators need to support replica selection, failover, migration, …
• Service-centric anycast as first-class primitive
Challenges
• Handling replicated services
  – Control over replica selection among groups
  – Control of network resources shared between groups
  – Handling dynamics among group membership and deployments
• Handling churn
  – Flexibility: From sessions, to hosts, to datacenters
  – Robustness: Largely hide from applications
  – Scalability: Local changes shouldn't need to update global info
  – Scalability: Churn shouldn't require per-client state in network
  – Efficiency: Wide-area migration shouldn't require tunneling
SCAFFOLD as…
  – Clean slate design
  – Multi-datacenter architecture for single administrative domain
Target: Single administrative domain
• DC network management more unified, simple, centralized
• End-host OS net-imaged and can be fork-lift upgraded
• Already struggling to provide scalability and service-centrism
• Cloud computing trends lessen importance of fixed, physical hosts
[Figure: two data centers (DC 1, DC 2) hosting services X and Y, joined by a backbone and connected to the Internet]
Appearance of service-centrism today
Layer 4/7: DNS with small TTLs
           HTTP redirects
           Layer-7 switching
Layer 3:   IP addresses and IP anycast
           Inter/intra routing updates
Layer 2:   VIP/DIP load balancers
           VRRP, ARP spoofing
Outline of talk
• Principles for service-centric design
• Architecture and design of SCAFFOLD
  – Network support
    • New forwarding model
    • Support for migration and failover
    • Network and service management
  – End-host support: socket interface and network stack
• Implementation, especially OpenFlow/NOX details (and desiderata)
Principles of SCAFFOLD
• Service-centric naming
  – Service/object IDs as flexible naming, not hosts
    • Web servers providing front-tier web (calendar.google.com)
    • A particular region in a (distributed) Virtual World service
    • A particular file in a CDN
  – Network-level addresses hidden from application
• Flows and anycast as basic network primitives
  – Names correspond to anycast groups, unicast as special case
  – Connection affinity for flows within anycasted endpoints
• Migration and failover through address remapping
  – Flows identified by each endpoint, not pairwise
  – Control through in-band signalling, stateless forwarders
• Minimize visibility of churn for scalability
  – Different addressing for different scopes (successive refinement)
  – Unity of functionality and management
Extent of changes
Change in-network support
Change the packet format
Change socket layer + stack
[Figure: Object Switch and Flow Switch; packet format Obj ID | Flow ID (DC ID, Host ID, Sock ID) | Hdr]
Application's network API
Today (IP / BSD sockets)
fd = open();
Datagram: sendto (IP:port, data)
Stream: connect (fd, IP:port); send (fd, data);
SCAFFOLD
fd = open();
Unbound datagram: sendto (objectID, data)
Bound datagram: connect (fd, objectID); send (fd, data);
IP: Application sees network, network doesn't see app
SCAFFOLD: Network sees app, app doesn't see network
SCAFFOLD in the network:
Unbound Datagrams and Network Support
Forwarding unbound datagrams
[Figure: hosts A, B, D and E attached to an object switch whose object table reads C:A, X:B, X:D, X:E. A datagram with SRC C and DST X is forwarded to one instance of X (DST X, B); the reply with SRC X and DST C is forwarded to host A (DST C, A).]
Service-level naming and forwarding
Services should control instance selection
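The object-table lookup on this slide can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the `ObjectTable` class, its method names, and the round-robin choice are hypothetical, since the slides say only that the service should control instance selection.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

typedef std::uint16_t ObjectId;  // 16-bit, like sf_obj_t later in the talk
typedef char HostId;             // hosts are labelled A, B, D, E on the slide

// Hypothetical object table: one anycast group of hosts per object ID.
class ObjectTable {
    struct Group {
        std::vector<HostId> members;
        std::size_t next;  // index of the next instance to hand out
        Group() : next(0) {}
    };
    std::map<ObjectId, Group> groups_;

public:
    void add(ObjectId obj, HostId host) { groups_[obj].members.push_back(host); }

    // Choose an instance for an unbound datagram addressed to `obj`.
    // Returns false if no instance is registered. Round-robin is only a
    // placeholder policy; the service is meant to supply its own.
    bool resolve(ObjectId obj, HostId& chosen) {
        std::map<ObjectId, Group>::iterator it = groups_.find(obj);
        if (it == groups_.end() || it->second.members.empty()) return false;
        Group& g = it->second;
        assert(g.next < g.members.size());
        chosen = g.members[g.next];
        g.next = (g.next + 1) % g.members.size();
        return true;
    }
};
```

Loaded with the slide's entries (C:A, X:B, X:D, X:E), successive datagrams for X would go to B, D, E and then B again, while datagrams for C always reach A.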
Successive refinement of datagrams
[Figure: object C on host A in Data Center 1; instances of X on hosts B, D and E in Data Center 2. Object table in DC 1: C:A, X:2. Object table in DC 2: C:1, X:B, X:D, X:E. A datagram from C has its destination refined from X to X, 2 and then to X, B; the reply's destination is refined from C to C, 1 and then to C, A.]
Forwarding refers to successively-refined destinations
Churn hidden from wider-area as much as possible
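As a rough sketch of successive refinement, the function below walks a datagram's destination through the object tables along its path, appending one label per hop. The string labels, the `refine` name, and the single entry per object are simplifying assumptions; the real tables map X to several instances.

```cpp
#include <map>
#include <string>
#include <vector>

// One refinement step: an object ID maps to the next, more specific label
// ("2" stands for data center 2, "B" for host B).
typedef std::map<std::string, std::string> Table;

// Walk the tables along the path, appending one label per hop.
// Returns the destination as it looks after each hop.
std::vector<std::string> refine(const std::string& object,
                                const std::vector<Table>& path) {
    std::vector<std::string> seen;
    seen.push_back(object);
    std::string dst = object;
    for (std::size_t i = 0; i < path.size(); ++i) {
        Table::const_iterator it = path[i].find(object);
        if (it == path[i].end()) break;  // unknown object: stop refining
        dst += ":" + it->second;
        seen.push_back(dst);
    }
    return seen;
}
```

With DC 1's table holding X:2 and DC 2's holding X:B, the destination goes from `X` to `X:2` to `X:2:B`. Only DC 2's table changes when instances of X come and go, which is how churn stays hidden from the wider area.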
SCAFFOLD in the network:
Bound Flows and Network Devices
Forwarding bound flows
[Figure, built up over several slides: object C on host A opens a flow to object X. The first packet (SRC C, A:p; DST X) is resolved at the object switch (object table C:A, X:B, X:D, X:E) to host B. The reply carries SRC X, B:q and DST C, A:p, so each endpoint learns the other's flow ID. Later packets carry full flow IDs and are forwarded by the flow switch (flow table _:aOS, A:aA, B:aB, D:aD, E:aE); host B demultiplexes on socket ID q. Socket state: fd=5 oid=C,X on host A; fd=9 oid=X,C on host B.]
[Packet header: Obj ID | Flow ID (Host ID, Sock ID) | Header]
Services should control instance selection
Flow affinity, yet no per-flow network state
Flows identified by each endpoint
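A flow switch of this kind can be sketched as a longest-prefix match over flow IDs, with a default route to the object switch on a miss. This is an illustration, not the deck's implementation: the `FlowTable` class, the `flow_id` helper, and the 8/8/16-bit layout with the DC ID in the high bits are assumptions. The field widths come from the implementation slides later in the talk.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumed 32-bit layout: DC ID (8 bits) | host ID (8 bits) | socket ID (16 bits).
inline std::uint32_t flow_id(std::uint8_t dc, std::uint8_t host, std::uint16_t sock) {
    return (std::uint32_t(dc) << 24) | (std::uint32_t(host) << 16) | sock;
}

class FlowTable {
    struct Entry {
        std::uint32_t prefix;  // high-order bits of the flow ID
        int length;            // number of significant bits, 1..32
        std::string next_hop;  // network address or output port (a label here)
    };
    std::vector<Entry> entries_;
    std::string object_switch_;  // default route, the "_" entry on the slides

public:
    explicit FlowTable(const std::string& object_switch)
        : object_switch_(object_switch) {}

    void add(std::uint32_t prefix, int length, const std::string& next_hop) {
        Entry e;
        e.prefix = prefix;
        e.length = length;
        e.next_hop = next_hop;
        entries_.push_back(e);
    }

    // Longest-prefix match; a miss falls through to the object switch.
    // No entry is ever keyed on the socket ID, so there is no per-flow state.
    std::string lookup(std::uint32_t id) const {
        const Entry* best = 0;
        for (std::size_t i = 0; i < entries_.size(); ++i) {
            const Entry& e = entries_[i];
            if (e.length < 1 || e.length > 32) continue;  // ignore malformed entries
            std::uint32_t mask = ~std::uint32_t(0) << (32 - e.length);
            if ((id & mask) == (e.prefix & mask) && (!best || e.length > best->length))
                best = &e;
        }
        return best ? best->next_hop : object_switch_;
    }
};
```

Entries cover a host (16-bit prefix) or a whole data center (8-bit prefix), so the table grows with the number of hosts and data centers rather than with the number of flows.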
Forwarding bound flows
[Figure, built up over several slides: the same exchange across data centers. Object C is on host A in Data Center 1 and object X on host B in Data Center 2; flow IDs now include a DC ID (C, 1:A:p and X, 2:B:q). DC 1: object table C:A, X:2; flow table _:aOS1, 1A:aA, 2:a2. DC 2: flow table _:aOS2, 2B:aB, 2D:aD, 2E:aE, 1:a1. The destination is refined from X to X, 2 to X, 2:B along the path.]
[Packet header: Obj ID | Flow ID (DC ID, Host ID, Sock ID) | Header]
Applications name logical flows, not physical locations
Label management by end-host
[Figure: a packet with DST X, 2:B:q and SRC C, 1:A:p, shown against the two tables below]

User-Space Application
File Descriptor   Object IDs
5                 C, X
9                 C, X
47                C, Y

SCAFFOLD Socket State
Socket State   Local Object ID   Local Flow ID   Remote Object ID   Remote Flow ID   Accepted Socket
open           C                 1:A:p           X                  2:B:q            No
bound          C                 1:A:r           Y                  --               No
unbound        E                 --              Z                  --               No

[Second build overlays the flow IDs 4:A:p, 4:A:r, 2:D:t and 3:D:q on the socket state table]
Migration and Failover
• Planned migration or physical mobility
  – In-band signaling: Destination replaces old flow ID with new flow ID
• Unplanned failover
  – Failure of destination causes removal from flow switch
  – Flow switch lookup fails, flow re-resolved at object switch
  – Sender again learns new location (flow ID) via in-band signaling
• May require new 3-way handshake for renegotiation
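The planned-migration step can be sketched as a small update to per-socket state. The `FlowId`, `SocketState` and `Redirect` types and the `apply_redirect` function are hypothetical names for illustration; the slides do not specify the signalling message format.

```cpp
#include <cstdint>

// Flow ID as DC:host:socket, using the field widths from the implementation slides.
struct FlowId {
    std::uint8_t dc;
    std::uint8_t host;
    std::uint16_t sock;
};

inline bool operator==(const FlowId& a, const FlowId& b) {
    return a.dc == b.dc && a.host == b.host && a.sock == b.sock;
}

// Per-socket state, mirroring the socket state table on the earlier slide.
struct SocketState {
    std::uint16_t local_obj;
    std::uint16_t remote_obj;
    FlowId local_flow;
    FlowId remote_flow;
};

// Hypothetical in-band control message sent by the migrating endpoint.
struct Redirect {
    FlowId old_flow;
    FlowId new_flow;
};

// Replace the remote flow ID if the message refers to the flow we hold.
// Object IDs are untouched, so the application never sees the move.
bool apply_redirect(SocketState& s, const Redirect& r) {
    if (!(s.remote_flow == r.old_flow)) return false;  // stale or unrelated
    s.remote_flow = r.new_flow;
    return true;
}
```

Matching on the old flow ID only filters stale messages. It does not authenticate the sender, which the talk lists among the unresolved security issues.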
Network Management APIs
• Flow switch
  – Flow Table: Map Flow ID to network addr or outport
• Object switch
  – Object Table: Map Obj ID to Flow ID label
  – Typically colocates flow table
• End-host
  – Join/leave network
  – Register/unregister object IDs
  – Migrate/redirect flow IDs
• Network Controller
[Figure: Controller and switches; hosts issue join and reg messages]
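The end-host side of this API (join/leave, register/unregister) can be sketched as a controller that keeps the object table in step with host membership. The `Controller` class and its method names are assumptions for illustration, not the NOX application described later.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical controller state: joined hosts and the object table.
class Controller {
    std::set<std::string> hosts_;
    std::map<std::string, std::set<std::string> > object_table_;

public:
    void join(const std::string& host) { hosts_.insert(host); }

    // Leaving withdraws every object the host had registered.
    void leave(const std::string& host) {
        hosts_.erase(host);
        std::map<std::string, std::set<std::string> >::iterator it;
        for (it = object_table_.begin(); it != object_table_.end(); ++it)
            it->second.erase(host);
    }

    // Only hosts that have joined may register objects.
    bool register_object(const std::string& obj, const std::string& host) {
        if (hosts_.count(host) == 0) return false;
        object_table_[obj].insert(host);
        return true;
    }

    void unregister_object(const std::string& obj, const std::string& host) {
        std::map<std::string, std::set<std::string> >::iterator it = object_table_.find(obj);
        if (it != object_table_.end()) it->second.erase(host);
    }

    // Current instances of an object, as the object switch would see them.
    std::set<std::string> instances(const std::string& obj) const {
        std::map<std::string, std::set<std::string> >::const_iterator it = object_table_.find(obj);
        if (it == object_table_.end()) return std::set<std::string>();
        return it->second;
    }
};
```

In a real deployment the controller would also push these changes into the flow and object switches; that step is omitted here.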
Incremental Deployment Model
[Figure: DC 1 and DC 2 hosting services X and Y, joined by a backbone; legacy clients reach them over the Internet]
[Figure labels, second build: Anycasted IP Prefix (BGP); Anycast Subprefix 1; Anycast Subprefix 2; IP forwarding; MAC forwarding]
Current implementation
• Backwards compatible with legacy IPv4 networks
  – SCAFFOLD packet format:
    • Object ID in UDP port
    • Flow ID in IPv4 addr
• Flow switch
  – OpenFlow software switch
  – Hit: LPM on flow ID
  – Miss: EGRE tunnel to obj switch
• Object switch
  – OpenFlow software switch
  – Hit: Exact match on obj ID
  – Miss: Send packet to Controller
[Figure: Controller, switches and Ingress Proxy; hosts issue join and reg messages]
Current implementation
• End-host
  – New SCAFFOLD socket library
  – User-level Click process
    • Network and transport
    • Comm. with Controller
  – TUN/TAP driver and in-kernel Click for packet interception
• Network Controller
  – NOX
  – New host API (via packet_in)
  – Manage flow/object switches
OpenFlow Desiderata
• Match on one-of-N entries (e.g., hashing)
• Multicast (e.g., for planned redirect)
• Packet encapsulation (for forwarding to obj switch)
  – Unnecessary if SCAFFOLD-only network
• More flexibility/space for header encoding/rewriting
  – Currently 2^16 objects and fixed 2^8 DCs, 2^8 hosts, 2^16 sockets
  – IPv6 support would provide much greater scalability
  – Ultimately prefer to define own header format
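These field widths add up to the 32 bits of an IPv4 address and the 16 bits of a UDP port, matching the packet format of the current implementation. A minimal sketch of that encoding follows. The bit order (DC ID in the most significant byte) and all type and function names are assumptions, chosen so that longest-prefix match on the address selects a data center first and then a host.

```cpp
#include <cstdint>

struct FlowId {
    std::uint8_t dc;     // 2^8 data centers
    std::uint8_t host;   // 2^8 hosts per data center
    std::uint16_t sock;  // 2^16 sockets per host
};

// Assumed layout: DC ID in bits 31..24, host ID in 23..16, socket ID in 15..0.
inline std::uint32_t pack_flow_id(const FlowId& f) {
    return (std::uint32_t(f.dc) << 24) | (std::uint32_t(f.host) << 16) | f.sock;
}

inline FlowId unpack_flow_id(std::uint32_t addr) {
    FlowId f;
    f.dc = std::uint8_t(addr >> 24);
    f.host = std::uint8_t((addr >> 16) & 0xFF);
    f.sock = std::uint16_t(addr & 0xFFFF);
    return f;
}

// The pair of legacy header fields that carries a SCAFFOLD label.
struct WireAddress {
    std::uint32_t ipv4_addr;  // carries the flow ID
    std::uint16_t udp_port;   // carries the 16-bit object ID
};

inline WireAddress encode(std::uint16_t object_id, const FlowId& flow) {
    WireAddress w;
    w.ipv4_addr = pack_flow_id(flow);
    w.udp_port = object_id;
    return w;
}
```

Under this layout, flow ID 1:10:0x1234 becomes the address 0x010A1234 (1.10.18.52 in dotted-quad form), and every flow to data center 1 shares the 8-bit prefix 1.0.0.0/8.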
SCAFFOLD on end-hosts:
New socket API and network stack

                                 +------------------+
                                 |   Scafd Daemon   |
+----------+                     |                  |
|          |   AF_UNIX socket    |   Event-driven   |
|  SF app  | <-----------------> |   SFNet element  |
|          |   Scafd protocol    |                  |
+----------+                     +--------+---------+
                                 |Control |  Sock   |
                                 | table  |  table  |
                                 +--+-----+-----+---+

[Figure, lower half: in user space the daemon changes kernel state through ioctl() and sends/receives SF packets, with IP fragmentation/reassembly, through a TUN device. In the kernel, SFEthOut and SFEthIn sit above the eth device (more ethX devices may be attached); Eth+SF packets go to and from the flow switch, while SFEthIn passes non-SF and ARP traffic to the Linux stack. Non-SF and L2 broadcast traffic also use the eth device.]
Socket Architecture

typedef struct { uint16_t v; } sf_obj_t;
struct sockaddr_sf {
    uint16_t family;
    sf_obj_t local_obj_id;
    sf_obj_t remote_obj_id;
};

int socket_sf (int domain, int type, int protocol)
int bind_sf (int s, const sockaddr *, socklen_t)
  – Blocking call, returns after register call b/w scafd and Controller
int connect_sf (int s, const sockaddr *, socklen_t, sf_err_t &)
  – Both blocking and non-blocking versions (works with select)
  – Returns success after 3-way handshake with remote sockaddr

int listen_sf (int s, int backlog, sf_err_t &)
int listen_sf (int s, const sockaddr *, socklen_t, int backlog, sf_err_t &)
  – Latter version allows single socket to listen on multiple objects
  – Results in a register call b/w scafd and Controller
int accept_sf (int s, sockaddr *, socklen_t, sf_err_t &)
  – Returns bound socket (sender/receiver flow IDs established)

ssize_t send_sf (int s, const void *, size_t, int flags, sf_err_t &)
ssize_t sendto_sf (int s, const void *, size_t, int flags, const sockaddr *, socklen_t, sf_err_t &)
ssize_t recv_sf (int s, void *, size_t, int flags, sf_err_t &)
ssize_t recvfrom_sf (int s, void *, size_t, int flags, struct sockaddr *, socklen_t *, sf_err_t &)
int close_sf (int s, sf_err_t &);
  – Connected sockets execute 3-way handshake
  – Bound/listening sockets unregister all obj IDs with Controller
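To make the addressing concrete, here is a sketch of how an application might fill in a `sockaddr_sf` before calling the functions above. The struct follows the slide; the `make_sf_addr` helper and the `AF_SCAFFOLD_SKETCH` constant are hypothetical, since the slides do not give the address-family value. The call sequence appears only in comments because the library itself is not part of this sketch.

```cpp
#include <cstdint>
#include <cstring>

typedef struct { std::uint16_t v; } sf_obj_t;

struct sockaddr_sf {
    std::uint16_t family;
    sf_obj_t local_obj_id;
    sf_obj_t remote_obj_id;
};

// Placeholder value: the slides do not give the real address-family constant.
const std::uint16_t AF_SCAFFOLD_SKETCH = 0;

// Build an address that names only object IDs; no host or network address
// ever appears at the application layer.
sockaddr_sf make_sf_addr(std::uint16_t local_obj, std::uint16_t remote_obj) {
    sockaddr_sf addr;
    std::memset(&addr, 0, sizeof addr);
    addr.family = AF_SCAFFOLD_SKETCH;
    addr.local_obj_id.v = local_obj;
    addr.remote_obj_id.v = remote_obj;
    return addr;
}

// Intended client-side sequence, following the declarations on the slides
// (shown as comments only; argument values are illustrative):
//   sockaddr_sf a = make_sf_addr(client_obj, service_obj);
//   sf_err_t err;
//   int s = socket_sf(domain, type, protocol);
//   connect_sf(s, (const sockaddr *)&a, sizeof a, err);  // after 3-way handshake
//   send_sf(s, buf, len, 0, err);
//   close_sf(s, err);
```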
Applications
• Replicated web services
  – Fault-tolerant failover for unmodified services
• Key-value store w/o layer-7 switch (memcached, CRAQ)
• Layer-3 VM migration
• Wide-area content distribution network
• Substrate for Virtual Worlds (Meru)
• Current ports
  – Iperf
  – TFTP (file transfer over UDP)
  – NFSv3 (in progress)
Unresolved for clean-slate design
• Discovery and ecosystem of authoritative object switches
• Security
  – Wide-area routing announcements
  – In-band signaling of flow ID updates
• Flexibility and extensibility
  – Use for fine-grain, ephemeral obj IDs (CCN)
  – Revisit stream-oriented apps as self-descriptive datagrams
  – Supplant all IP and host-to-host communication? "Host" as service ID with single location?
Related Work
• Addressing: Separating location from identity
  – SFR, LNA, DOA, LISP; ROFL, SEATTLE
  – Triad, DONA, CCN
  – Portland, VL2, SPAIN
• Migration and Mobility
  – Mobile IP, i3, LISP, TCP Migrate, SCTP; RTP, Trickles
• Replication and IP anycast
  – SFR, DOA; 4D-like control; PIAS, GIA
• Routing on coarse grain identifiers
  – AIP, NIRA