
Copyright©2015 NTT corp. All Rights Reserved.

GoPlane: Open Source BUM-less Networking for Large Scale Docker Deployment

August 19, 2015 Soramichi Akiyama Software Innovation Center, NTT, Japan [email protected]

A bit about Myself

• Researcher@Software Innovation Center, NTT
• Worked on Virtual Machine Live Migration for 5 years
  • Ask me if you want to know about migration technologies other than “trace and replay”
• Current work: software defined networking for DCs using baremetal switches
• More on me: http://www.soramichi.jp/

A bit about NTT

• Japanese counterpart of AT&T
  • telephone, telegraph, ADSL, FTTH, ISP, DC, mobile
• Changing toward timely and open
• FLOSS we develop (available on GitHub)
  • Ryu OpenFlow controller
  • Sheepdog distributed storage (merged in QEMU upstream)
  • Lagopus software switch

Large Scale Docker Deployment

i) Load balancing across multiple racks
ii) User affinity across multiple DCs in the same provider
iii) Disaster recovery across the Internet

Figure labels: Internet, DC1, DC2

How Docker Apps are Deployed

• (Ideally) Deployable as developed
• This requires network tenant isolation

Development: your Macbook (or Linux machine)
• No one else (visible) on the NW
• Use any IP you want

Deployment: VM
• No one else (visible) on the NW
• Use any IP you want

Overcoming L3 boundary

• Large scale Docker deployment hits L3 boundaries
  • Across racks, DCs, Internet
• A large scale single L2 domain is infeasible

Figure: the original Docker app runs in a single L2 domain (10.0.1.1, 10.0.1.2, 10.0.1.3). In the load-balanced version across an L3 NW, the same containers are split between DC1 and DC2, and it is unclear (??) how they reach each other across the L3 boundary.

L2 over L3 using VxLAN

• VxLAN: Virtual eXtensible LAN
• Encapsulate L2 packets with VxLAN headers
• L2 networks can be extended over L3 networks

Figure: 10.0.1.1 sends an L2 packet to 10.0.1.3 through VxLAN switch 1.2.3.4, across the L3 boundary, to VxLAN switch 123.45.67.89. While it crosses the L3 network, the L2 packet is carried behind a vxlan header inside a UDP packet; the receiving switch delivers the original L2 packet.
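
The encapsulation step above can be sketched in a few lines. This is only an illustration of the 8-byte VXLAN header layout from RFC 7348 (one flags byte with the I bit set, 24 reserved bits, a 24-bit VNI, 8 reserved bits); the function names `vxlan_encap` and `vxlan_decap` are hypothetical and not taken from GoPlane or Open vSwitch. The result would be the payload of the outer UDP packet.

```python
import struct
from typing import Tuple

VXLAN_UDP_PORT = 4789   # IANA-assigned destination port for VXLAN
VXLAN_FLAG_VNI = 0x08   # "I" flag: the VNI field is valid


def vxlan_encap(vni: int, inner_frame: bytes) -> bytes:
    """Prepend a VXLAN header to an Ethernet (L2) frame."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 0: flags (8 bits) + reserved (24 bits)
    # Word 1: VNI (24 bits) + reserved (8 bits)
    header = struct.pack("!II", VXLAN_FLAG_VNI << 24, vni << 8)
    return header + inner_frame


def vxlan_decap(udp_payload: bytes) -> Tuple[int, bytes]:
    """Strip the VXLAN header and return (vni, original L2 frame)."""
    word0, word1 = struct.unpack("!II", udp_payload[:8])
    if not (word0 >> 24) & VXLAN_FLAG_VNI:
        raise ValueError("VNI flag not set")
    return word1 >> 8, udp_payload[8:]
```

The sending VxLAN switch would call something like `vxlan_encap` and send the result over UDP to the remote switch, which reverses the step with `vxlan_decap`.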

Problem: How to find the next hop?

• Where to send BUM packets? (BUM: Broadcast, Unknown unicast, Multicast)

Figure: containers 10.0.1.1, 10.0.1.2, 10.0.1.3, and 10.0.1.4 are connected by VxLAN tunnels over an L3 NW. One of them runs "ping 10.0.1.3", and it is unclear (??) which tunnel the packet should take.

Naïve method: what Socketplane does

• Dataplane flood and learn
• Just like a normal L2 network
• Broadcasting over L3 networks!

Figure: an ARP request for 10.0.1.3 is broadcast to ff:ff:ff:ff:ff:ff and reaches every tunnel endpoint (10.0.1.1, 10.0.1.2, 10.0.1.3, 10.0.1.4).
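
To make the cost of flood and learn concrete, here is a minimal sketch of a tunnel endpoint that behaves like an ordinary learning switch. The class `FloodAndLearnVtep` and its methods are hypothetical names for illustration; this is not Socketplane's code.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class FloodAndLearnVtep:
    """A tunnel endpoint that learns MAC locations from data-plane traffic."""

    def __init__(self, remote_vteps):
        self.remote_vteps = list(remote_vteps)  # underlay IPs of all peers
        self.mac_table = {}                     # learned MAC -> remote VTEP

    def learn(self, src_mac, from_vtep):
        # Called when a frame from src_mac arrives through a tunnel.
        self.mac_table[src_mac] = from_vtep

    def forward(self, dst_mac):
        """Return the remote VTEPs that receive a copy of the frame."""
        if dst_mac != BROADCAST and dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]
        # Broadcast or unknown unicast: replicate to every tunnel,
        # so one frame becomes N packets on the L3 network.
        return list(self.remote_vteps)
```

With three peers, a single ARP broadcast produces three packets on the L3 network; only after the reply has been learned does traffic to that MAC use a single tunnel.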

Existing approaches: Nicira service node

• BUM packets are sent to *the* service node
• Reduces the CPU load of packet replication
• Does not help with the congestion problem!

Figure (ARP example) cited from http://blog.ipspace.net/2013/08/nicira-nvp-control-plane.html

Existing approaches: Cisco floodless mode

• Implemented in Cisco Nexus switches
• MAC addresses are reported to/distributed from the ToR
• Vendor lock-in; cannot go across an L3 network

Figure: two hosts with a special module, each running VMs, are attached to a Cisco Nexus.
(1) VM invoked
(2) MAC addr detected
(3) MAC addr reported
(4) MAC addr distributed to the other hosts

Our solution: OVS + BGP (EVPN)

• EVPN (Ethernet VPN): Simplified overview
• Extends Ethernet across L3 networks
• Standardized in RFC 7432 in Feb 2015
• Exchanges MAC addresses through BGP

Figure: the MAC addresses 11:22:33:44:55:66, 60:57:18:XX:YY:ZZ, and E4:1D:2D:AA:BB:CC are exchanged via Border Gateway Protocol, so both sides hold the same list.

Our solution: OVS + BGP (EVPN)

• System Components
  • GoBGP: Fully open and API-capable BGP daemon
  • GoPlane: Data plane management
  • Open vSwitch: VxLAN tunneling and flow control

Figure: Linux, GoBGP, and GoPlane on one host. Legend: existing components / what we built.

How it works: MAC address exchange

(1) Container invoked
(2) MAC addr notified to GoBGP
(3) MAC addr sent with BGP across the L3 network (e.g. inter-rack, inter-DC, Internet)
(4) Insert flow (flow: a packet forwarding or modifying rule, in OpenFlow terms)
(5) VxLAN tunnel established

Figure: two hosts, each labeled Linux, GoBGP, and GoPlane, connected by the L3 network.
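
The five steps can be modeled as a toy control plane. This is a sketch under heavy assumptions: `Host`, `MacAdvertisement`, `container_started`, and `receive` are hypothetical names, the direct method call stands in for a real BGP session, and the dictionary stands in for flows that GoPlane would install in Open vSwitch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class MacAdvertisement:
    """Hypothetical stand-in for an EVPN MAC advertisement."""
    mac: str
    ip: str
    vtep: str  # underlay IP of the host where the container runs


@dataclass(eq=False)
class Host:
    vtep: str
    peers: List["Host"] = field(default_factory=list, repr=False)
    flows: Dict[str, str] = field(default_factory=dict)  # remote MAC -> VTEP

    def container_started(self, mac: str, ip: str) -> MacAdvertisement:
        # Steps (1)-(3): a local container appears and its MAC is
        # announced to every peer (here a plain method call, not BGP).
        adv = MacAdvertisement(mac, ip, self.vtep)
        for peer in self.peers:
            peer.receive(adv)
        return adv

    def receive(self, adv: MacAdvertisement) -> None:
        # Steps (4)-(5): remember which tunnel leads to this MAC.
        self.flows[adv.mac] = adv.vtep
```

After `container_started` runs on one host, every peer knows the tunnel for that MAC before any data packet is sent, which is what removes the need to flood.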

Flows Inserted

1. Remote tunnel selection flow
• Maps a MAC address to the VxLAN tunnel beyond which the container with that MAC exists

2. ARP responder flow
• Mutates an ARP request into the corresponding ARP response (described in the coming slides)

How it works: ARP & Response

(1) ARP packet sent to FF:FF:FF:FF:FF:FF (b-cast)
(2) OVS receives the ARP and mutates it into the response (as OVS knows the MAC addresses!)
(3) The container receives the ARP response; ARP packets are never sent to the actual NW

Figure: two hosts, each labeled Linux, GoBGP, and GoPlane.
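
The "mutate the request into the response" step can be sketched on raw bytes. In the system described here the rewrite is done by an OVS flow; the Python below only illustrates which fields change, using the standard Ethernet/ARP layout for IPv4. The function names and the `known` table (IP to MAC, assumed to be filled from BGP) are hypothetical.

```python
import socket
import struct

ARP_REQUEST, ARP_REPLY = 1, 2
ETHERTYPE_ARP = 0x0806


def mac_bytes(mac: str) -> bytes:
    return bytes.fromhex(mac.replace(":", ""))


def build_arp_request(src_mac: str, src_ip: str, target_ip: str) -> bytes:
    """Build a broadcast ARP request frame (Ethernet + ARP, 42 bytes)."""
    eth = b"\xff" * 6 + mac_bytes(src_mac) + struct.pack("!H", ETHERTYPE_ARP)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, ARP_REQUEST)
    arp += mac_bytes(src_mac) + socket.inet_aton(src_ip)
    arp += b"\x00" * 6 + socket.inet_aton(target_ip)
    return eth + arp


def mutate_to_reply(request: bytes, known: dict):
    """Turn an ARP request into its reply, or return None if unknown."""
    (oper,) = struct.unpack("!H", request[20:22])
    if oper != ARP_REQUEST:
        return None
    sha, spa = request[22:28], request[28:32]   # sender MAC / IP
    tpa = request[38:42]                        # IP being asked for
    target_mac = known.get(socket.inet_ntoa(tpa))
    if target_mac is None:
        return None
    tmac = mac_bytes(target_mac)
    eth = sha + tmac + request[12:14]           # reply goes back to the asker
    arp = request[14:20] + struct.pack("!H", ARP_REPLY)
    arp += tmac + tpa + sha + spa               # swap sender and target roles
    return eth + arp
```

Because the reply is produced locally, the broadcast frame never has to leave the host.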

How it works: Unicast

• Every unicast is “known”
• If the target MAC is unknown to the OVS, that address does not exist

Figure (containers 10.0.1.1, 10.0.1.2, 10.0.1.3, 10.0.1.4):
(1) ping 10.0.1.3
(2) ping 11:a:b:c:d:66 (ARP is resolvable locally, see the previous slides)
(3) OVS chooses exactly one VxLAN tunnel
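
The unicast rule reduces to a table lookup with no flooding fallback. A minimal sketch, assuming a hypothetical `flows` table (MAC to remote tunnel endpoint) that was populated from BGP advertisements:

```python
from typing import Dict, List


def select_tunnels(flows: Dict[str, str], dst_mac: str) -> List[str]:
    """Return the tunnels a unicast frame is sent to: one or none.

    Unlike flood and learn, an unknown destination is not flooded.
    If the MAC was never advertised, it is treated as nonexistent.
    """
    vtep = flows.get(dst_mac.lower())
    return [vtep] if vtep is not None else []
```

The return value never has more than one element, which is the property that keeps the solution BUM-less.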

Discussion

• Pros of our solution
  • Fully open, no proprietary technology
  • Flexibility by API-capable BGP daemon
    Even the containers themselves can manage the network if you want (looking for a good use case)
• Cons of our solution
  • BGP doesn’t consider coherence
    Special extension needed for IP management

GoBGP

• Open source BGP daemon we develop
  • Available on GitHub: https://github.com/osrg/gobgp
• Controlled with gRPC (via HTTP)
  • Suitable for SDN
• Leverages multi-core CPUs
  • The most famous existing software BGP is single-threaded

EVPN Interoperation@Interop Tokyo 2015

Figure labels: Cisco Nexus 9xxx, Juniper QFX, GoBGP + GoPlane

http://osrg.github.io/goplane/post/20150615/

Summary

• L2 over L3 NW is required for large scale Docker deployment
• Existing solutions do not fulfill the requirements
  • Service node: no help for congestion
  • Cisco floodless mode: vendor lock-in, ToR only
• We built a fully open solution by combining BGP (EVPN) + OVS
• GoPlane and GoBGP are available on GitHub!


Q&A