Copyright©2015 NTT corp. All Rights Reserved.
GoPlane: Open Source BUM-less Networking for Large Scale Docker Deployment
August 19, 2015 Soramichi Akiyama Software Innovation Center, NTT, Japan [email protected]
• Researcher@Software Innovation Center, NTT
• Worked on virtual machine live migration for 5 years
• Ask me if you want to know about migration technologies other than “trace and replay”
• Current work on software defined networking for DCs using baremetal switches
• More on me: http://www.soramichi.jp/
A bit about Myself
• Japanese counterpart of AT&T
• Telephone, telegraph, ADSL, FTTH, ISP, DC, mobile
• Shifting toward being timely and open
• FLOSS we develop (avail@GitHub)
• Ryu OpenFlow controller
• Sheepdog distributed storage (merged in QEMU upstream)
• Lagopus software switch
A bit about NTT
Large Scale Docker Deployment
[Figure: DC1 and DC2 connected through the Internet]
i) Load balancing across multiple racks
ii) User affinity across multiple DCs in the same provider
iii) Disaster recovery across the Internet
• (Ideally) Deployable as developed
• This requires network tenant isolation
How Docker Apps are Deployed
Development: your MacBook (or Linux machine)
• No one else (visible) on the network
• Use any IP address you want
Deployment: VM
• No one else (visible) on the network
• Use any IP address you want
• Large scale Docker deployment hits L3 boundaries
• Across racks, DCs, Internet
• A large-scale single L2 domain is infeasible
Overcoming L3 boundary
Original Docker app in a single L2 domain
10.0.1.1 10.0.1.2 10.0.1.3
10.0.1.1 10.0.1.2 10.0.1.3
Load-balanced version across an L3 NW
L3 boundary
??
DC1 DC2
• VxLAN: Virtual eXtensible LAN
• Encapsulate L2 packets with VxLAN headers
• L2 networks can be extended over L3 networks
L2 over L3 using VxLAN
L3 boundary
VxLAN switch
1.2.3.4
VxLAN switch
123.45.67.89
10.0.1.1 10.0.1.3
L2 packet
VxLAN switch
1.2.3.4
VxLAN switch
123.45.67.89
10.0.1.1 10.0.1.3
vxlan header
L3 boundary
L2 packet
UDP packet
L2 packet
L3 boundary
VxLAN switch
1.2.3.4
VxLAN switch
123.45.67.89
10.0.1.1 10.0.1.3
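The encapsulation step on these slides can be sketched in a few lines. This is a minimal illustration of the VXLAN header layout defined in RFC 7348 (an 8-byte header carrying a 24-bit VXLAN Network Identifier, sent as the payload of a UDP packet), not code from GoPlane or Open vSwitch. The function names and the sample frame are assumptions made for this example.

```python
import struct
from typing import Tuple

VXLAN_UDP_PORT = 4789          # IANA-assigned UDP destination port for VXLAN (RFC 7348)
VXLAN_FLAG_VNI_VALID = 0x08    # "I" flag: the VNI field is valid


def vxlan_encapsulate(vni: int, l2_frame: bytes) -> bytes:
    """Prepend the 8-byte VXLAN header to an Ethernet frame.

    The result is the payload of the outer UDP packet; the outer UDP/IP
    headers are added by whatever sends it to the remote VXLAN switch.
    """
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) followed by 24 reserved bits.
    # Word 2: VNI (24 bits) followed by 8 reserved bits.
    header = struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)
    return header + l2_frame


def vxlan_decapsulate(udp_payload: bytes) -> Tuple[int, bytes]:
    """Strip the VXLAN header and return (vni, original Ethernet frame)."""
    if len(udp_payload) < 8:
        raise ValueError("payload is shorter than a VXLAN header")
    flags_word, vni_word = struct.unpack("!II", udp_payload[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag is not set")
    return vni_word >> 8, udp_payload[8:]


# Hypothetical Ethernet frame: broadcast destination, sample source MAC, ARP ethertype.
frame = b"\xff" * 6 + bytes.fromhex("112233445566") + b"\x08\x06" + b"payload"
packet = vxlan_encapsulate(42, frame)
vni, inner = vxlan_decapsulate(packet)
```

The receiving VXLAN switch recovers the original L2 frame unchanged, which is why the containers on both sides can behave as if they shared one L2 domain.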
Problem: How to find the next hop?
10.0.1.3
10.0.1.1
10.0.1.2
10.0.1.4
VxLAN tunnels over L3 NW
ping 10.0.1.3
Where to send BUM (Broadcast, Unknown unicast, Multicast) packets?
??
• Dataplane flood and learn
• Just like a normal L2 network
• Broadcasting over L3 networks!
Naïve method: what Socketplane does
ARP 10.0.1.3
b-cast to ff:ff:ff:ff:ff:ff
10.0.1.3
10.0.1.1
10.0.1.2
10.0.1.4
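To make the cost of flood and learn concrete, here is a small sketch of a learning switch sitting at one end of several VxLAN tunnels. It is a simplified model, not Socketplane's implementation: the class name, the tunnel endpoint addresses other than those on the slides, and the MAC addresses are assumptions for illustration.

```python
from typing import Dict, List

BROADCAST = "ff:ff:ff:ff:ff:ff"


class FloodAndLearnSwitch:
    """Toy model of data-plane flood and learn over VxLAN tunnels."""

    def __init__(self, remote_endpoints: List[str]) -> None:
        self.remote_endpoints = list(remote_endpoints)
        self.mac_table: Dict[str, str] = {}  # MAC -> tunnel endpoint, learned from traffic

    def learn(self, src_mac: str, from_endpoint: str) -> None:
        """Remember which tunnel a source MAC was seen on."""
        self.mac_table[src_mac] = from_endpoint

    def forward(self, dst_mac: str) -> List[str]:
        """Return the tunnel endpoints that receive a copy of the frame."""
        if dst_mac != BROADCAST and dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]
        # BUM traffic: one copy per remote tunnel, each crossing the L3 network.
        return list(self.remote_endpoints)


# 123.45.67.89 appears on the slides; the other two endpoints are made up.
switch = FloodAndLearnSwitch(["123.45.67.89", "198.51.100.9", "203.0.113.7"])
arp_copies = switch.forward(BROADCAST)                 # ARP request is flooded
unknown_copies = switch.forward("e4:1d:2d:aa:bb:cc")   # unknown unicast is flooded too
switch.learn("e4:1d:2d:aa:bb:cc", "123.45.67.89")      # learned from the reply
known_copies = switch.forward("e4:1d:2d:aa:bb:cc")     # now a single tunnel
```

In this model every broadcast or unknown-unicast frame costs one packet per remote endpoint, so the load on the L3 network grows with the number of hosts.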
• BUM packets are sent to *the* service node
• Reduces CPU load for packet replication
• Does not help with the congestion problem!
Existing approaches: Nicira service node
Figure cited from http://blog.ipspace.net/2013/08/nicira-nvp-control-plane.html
• Implemented in Cisco Nexus switches
• MAC addresses are reported to and distributed from the ToR switch
• Vendor lock-in; cannot go across an L3 network
Existing approaches: Cisco floodless mode
[Figure: two hosts, each with a special module and two VMs, attached to a Cisco Nexus switch]
(1) VM invoked
(2) MAC address detected
(3) MAC address reported
(4) MAC address distributed to the other hosts
• EVPN (Ethernet VPN): Simplified overview
• Extends Ethernet across L3 networks
• Standardized as RFC 7432 in February 2015
• Exchanges MAC addresses through BGP
Our solution: OVS + BGP (EVPN)
[Figure: two sites exchange their MAC address tables (11:22:33:44:55:66, 60:57:18:XX:YY:ZZ, E4:1D:2D:AA:BB:CC) over the Border Gateway Protocol]
• System Components
• GoBGP: Fully open and API-capable BGP daemon
• GoPlane: Data plane management
• Open vSwitch: VxLAN tunneling and flow control
Our solution: OVS + BGP (EVPN)
Linux
GoBGP
GoPlane
Legend: existing components / what we built
How it works: Mac address exchange
[Figure: two Linux hosts, each running GoBGP and GoPlane, connected over an L3 network (e.g. inter-rack, inter-DC, Internet)]
(1) Container invoked
(2) MAC address notified to GoBGP
(3) MAC address sent with BGP
(4) Insert flow (a flow is a packet forwarding or modifying rule, in OpenFlow terms)
(5) VxLAN tunnel established
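Steps (1) to (5) can be modeled as a tiny in-memory control plane. This is only a sketch of the idea: the `Node` and `MacAdvertisement` names are hypothetical, direct method calls stand in for the BGP session, and nothing here uses the actual GoBGP or GoPlane APIs. In EVPN (RFC 7432) the advertisement would be carried as a MAC/IP Advertisement route.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MacAdvertisement:
    """What one host tells the others about a local container (hypothetical shape)."""
    mac: str
    ip: str
    tunnel_endpoint: str  # address of the advertising host's VxLAN switch
    vni: int


class Node:
    """One host: stands in for the GoBGP + GoPlane + OVS trio on the slide."""

    def __init__(self, tunnel_endpoint: str) -> None:
        self.tunnel_endpoint = tunnel_endpoint
        self.peers: List["Node"] = []
        # (vni, mac) -> remote tunnel endpoint; plays the role of the inserted flows.
        self.flows: Dict[Tuple[int, str], str] = {}

    def container_started(self, mac: str, ip: str, vni: int) -> MacAdvertisement:
        # (1) container invoked, (2) MAC notified to the local BGP speaker
        adv = MacAdvertisement(mac, ip, self.tunnel_endpoint, vni)
        # (3) MAC sent to every peer (a BGP update in the real system)
        for peer in self.peers:
            peer.receive(adv)
        return adv

    def receive(self, adv: MacAdvertisement) -> None:
        # (4) insert a flow pointing this MAC at the advertiser's tunnel,
        # (5) which is the tunnel later traffic to that MAC will use
        self.flows[(adv.vni, adv.mac)] = adv.tunnel_endpoint


node_a = Node("1.2.3.4")
node_b = Node("123.45.67.89")
node_a.peers.append(node_b)
node_b.peers.append(node_a)
node_a.container_started("11:22:33:44:55:66", "10.0.1.1", vni=10)
```

After the call, `node_b` knows which tunnel leads to `11:22:33:44:55:66` without ever having flooded a packet across the L3 network.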
1. Remote tunnel selection flow
• MAC address → the VxLAN tunnel beyond which the container with that MAC address exists
2. ARP responder flow
• Mutates an ARP request into the corresponding ARP response (described in the coming slides)
Flows Inserted
How it works: ARP & Response
[Figure: two Linux hosts, each running GoBGP and GoPlane]
(1) The container sends an ARP request to FF:FF:FF:FF:FF:FF (broadcast)
(2) OVS receives the ARP request and mutates it into the response (as OVS knows the MAC addresses!)
(3) The container receives the response; ARP packets are never sent out to the actual network
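The mutation in step (2) can be illustrated at the byte level. The sketch below rewrites an ARP request into the matching reply using a locally known IP-to-MAC table. It shows the idea only: in the real system this is done by flows inside OVS, and the function names and table contents here are assumptions.

```python
import socket
import struct
from typing import Dict, Optional

# Ethernet header (dst, src, ethertype) followed by an IPv4-over-Ethernet ARP body.
ARP_FRAME = struct.Struct("!6s6sHHHBBH6s4s6s4s")
ETHERTYPE_ARP = 0x0806
ARP_REQUEST, ARP_REPLY = 1, 2


def mac_bytes(mac: str) -> bytes:
    return bytes.fromhex(mac.replace(":", ""))


def build_arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Build a broadcast ARP request asking who has target_ip."""
    return ARP_FRAME.pack(
        b"\xff" * 6, mac_bytes(sender_mac), ETHERTYPE_ARP,
        1, 0x0800, 6, 4, ARP_REQUEST,
        mac_bytes(sender_mac), socket.inet_aton(sender_ip),
        b"\x00" * 6, socket.inet_aton(target_ip),
    )


def arp_respond(request: bytes, ip_to_mac: Dict[str, str]) -> Optional[bytes]:
    """Turn an ARP request into its reply, or return None if the target is unknown."""
    (_dst, eth_src, ethertype, htype, ptype, hlen, plen, oper,
     sha, spa, _tha, tpa) = ARP_FRAME.unpack(request[:ARP_FRAME.size])
    if ethertype != ETHERTYPE_ARP or oper != ARP_REQUEST:
        return None
    known_mac = ip_to_mac.get(socket.inet_ntoa(tpa))
    if known_mac is None:
        return None
    answer = mac_bytes(known_mac)
    # Swap sender and target, fill in the answer, and flip the opcode to "reply".
    return ARP_FRAME.pack(
        eth_src, answer, ETHERTYPE_ARP,
        htype, ptype, hlen, plen, ARP_REPLY,
        answer, tpa, sha, spa,
    )


known = {"10.0.1.3": "e4:1d:2d:aa:bb:cc"}  # learned through BGP, not by flooding
request = build_arp_request("11:22:33:44:55:66", "10.0.1.1", "10.0.1.3")
reply = arp_respond(request, known)
```

Because the reply is produced on the same host that sent the request, the broadcast never has to leave it.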
• Every unicast is “known”
• If the target MAC address is unknown to OVS, that address does not exist
How it works: Unicast
[Figure: containers 10.0.1.1, 10.0.1.2, 10.0.1.3, and 10.0.1.4 connected by VxLAN tunnels]
ping 10.0.1.3
ARP is resolved locally (previous slides), so the ping is addressed to 11:a:b:c:d:66
OVS chooses the one and only VxLAN tunnel that leads to that MAC address
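The forwarding decision described here reduces to a table lookup with no flooding fallback. A minimal sketch, assuming a MAC-to-tunnel table that has already been filled from BGP; the function names, the table contents, and the second tunnel endpoint are hypothetical.

```python
from typing import Dict, List, Optional


def select_tunnel(dst_mac: str, mac_to_tunnel: Dict[str, str]) -> Optional[str]:
    """Return the single tunnel endpoint for dst_mac, or None if it does not exist."""
    return mac_to_tunnel.get(dst_mac.lower())


def forward_unicast(dst_mac: str, mac_to_tunnel: Dict[str, str]) -> List[str]:
    """Return the tunnels that get a copy of the frame: exactly one, or none."""
    tunnel = select_tunnel(dst_mac, mac_to_tunnel)
    # An unknown MAC was never advertised, so the frame is dropped, not flooded.
    return [] if tunnel is None else [tunnel]


mac_to_tunnel = {
    "e4:1d:2d:aa:bb:cc": "123.45.67.89",
    "11:22:33:44:55:66": "198.51.100.9",  # made-up endpoint for illustration
}
sent_to = forward_unicast("E4:1D:2D:AA:BB:CC", mac_to_tunnel)
dropped = forward_unicast("00:00:5e:00:53:01", mac_to_tunnel)
```

Compared with flood and learn, the number of packets crossing the L3 network for a unicast frame is at most one, regardless of how many hosts take part.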
• Pros of our solution
• Fully open, no proprietary technology
• Flexibility by API-capable BGP daemon
Even the containers themselves can manage the network if you want (looking for a good use case)
• Cons of our solution
• BGP doesn’t consider coherence
Special extension needed for IP management
Discussion
• Open source BGP daemon we develop
• Available at GitHub: https://github.com/osrg/gobgp
• Controlled with gRPC (over HTTP)
• Suitable for SDN
• Leverages multi-core CPUs
• The best-known existing software BGP implementation is single-threaded
GoBGP
EVPN Interoperation@Interop Tokyo 2015
Cisco Nexus 9xxx GoBGP + GoPlane
Juniper QFX
http://osrg.github.io/goplane/post/20150615/
• L2 over an L3 network is required for large scale Docker deployment
• Existing solutions do not fulfill the requirements
• Service node: no help for congestion
• Cisco floodless mode: vendor lock-in, ToR only
• We built a fully open solution by combining BGP (EVPN) + OVS
• GoPlane and GoBGP are available at GitHub!
Summary