vxlan distributed service node
Post on 21-Jul-2015
David Lapsley (@devlaps), Chet Burgess (@cfbIV), Kahou Lei (@kahou82)
May 20, 2015
OpenStack Vancouver Summit
VXLAN Distributed Service Node
• L2 Access with L3 Aggregation
• Wasted capacity: STP blocks ports to prevent loops
• VLAN exhaustion: the 12-bit 802.1Q VLAN ID allows only about 4K VLANs
• ToR scalability: hardware tables need to scale with the number of endpoints
Traditional Data Centers
• L3 is Scalable
• Well known and supported
• Equal Cost Multi-Path (ECMP) Routing
  • Each link active at all times
L3
• MAC over UDP/IP overlay
• Re-uses existing IP core (L3 ECMP, No STP)
• Reduces pressure on ToR L2 tables
• Supports 16M+ virtual networks (24-bit VNI)
• Maintains L2 bridging semantics
VXLAN
• Virtual Network Identifier (VNI)
  • 24 bits: 16+ million segments
• VXLAN Tunnel End Point (VTEP)
  • Encapsulation, decapsulation
• Listens on UDP port 4789 (IANA) or 8472 (Linux default) for incoming VXLAN packets
• VNI to VTEP IP mapping
VXLAN Components
VXLAN Example Deployment
[Figure: Hypervisor 1 and Hypervisor 2 connected via eth0 over an L3 network. Each hypervisor hosts tenant bridges br100 and br101, each with two VMs attached (VM1, VM2 on Hypervisor 1; VM3, VM4 on Hypervisor 2) and a VTEP (vxlan100 on br100, vxlan101 on br101). VXLAN 100 and VXLAN 101 span both hypervisors.]
[Figure: VXLAN encapsulation.
Original frame: DMAC | SMAC | 802.1Q | EType | Payload | CRC
Encapsulated packet: Outer MAC | Outer IP | Outer UDP | VXLAN | Payload | CRC
VXLAN header: VXLAN Flags (8 bits) | Reserved (24 bits) | VXLAN Network Identifier (24 bits) | Reserved (8 bits)]
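The 8-byte VXLAN header from the packet-format figure can be sketched in a few lines of Python. This is only an illustration of the field layout (8-bit flags, 24 reserved bits, 24-bit VNI, 8 reserved bits); the function names are made up for this example, and the flag value 0x08 ("VNI valid") is taken from RFC 7348 rather than from the slides.

```python
import struct

# "I" flag from RFC 7348: set when the VNI field is valid.
VXLAN_FLAG_VNI_VALID = 0x08


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a given VNI (illustrative helper)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits).
    # Word 2: VNI (24 bits) + reserved (8 bits).
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(packet: bytes) -> int:
    """Extract the VNI from the start of a VXLAN payload (illustrative helper)."""
    if len(packet) < 8:
        raise ValueError("packet shorter than a VXLAN header")
    flags_word, vni_word = struct.unpack("!II", packet[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag not set")
    return vni_word >> 8
```

The inner Ethernet frame follows immediately after these 8 bytes, so a VTEP would prepend `pack_vxlan_header(vni)` on encapsulation and strip it on decapsulation.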
• Broadcast, unknown unicast, and multicast (BUM) packets (e.g. ARP, DHCP) are flooded to all VTEPs for the given VNI
• Two mechanisms used:
  • Multicast:
    • Multicast address and VNI configured for each VXLAN segment
    • VTEP sends IGMP join/leave as VMs spin up/down
    • Broadcast domain implemented using multicast
  • Service Node:
    • Use a “central” service node to maintain the mapping of VNIs to VTEP IPs
Broadcast, Unknown and Multicast Packets
Service Node
[Figure: Hypervisor 1 and Hypervisor 2 connected via eth0 over the L3 network to a remote Service Node. Hypervisor 1: br100 (VM1, VM2) with vxlan100 (1.1.1.1) and br101 (VM1, VM2) with vxlan101 (3.3.3.3). Hypervisor 2: br100 (VM3, VM4) with vxlan100 (2.2.2.2) and br101 (VM3, VM4) with vxlan101 (4.4.4.4).]
Remote Service Node mapping table:
VNI   VTEPs
100   1.1.1.1, 2.2.2.2
101   3.3.3.3, 4.4.4.4
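The service node's VNI-to-VTEP table can be modelled as a small in-memory structure. The sketch below is not the VDSN code; the class and method names are hypothetical, and it only shows the lookup a service node would need when flooding a packet: every VTEP registered for the VNI except the one that sent it.

```python
from collections import defaultdict


class ServiceNodeTable:
    """Hypothetical in-memory VNI -> VTEP IP table for a service node."""

    def __init__(self):
        self._vteps = defaultdict(set)

    def register(self, vni, vtep_ip):
        """Record that a VTEP participates in a virtual network."""
        self._vteps[vni].add(vtep_ip)

    def flood_targets(self, vni, source_vtep):
        """Return the VTEPs that should receive a flooded packet."""
        members = self._vteps.get(vni, set())
        return sorted(members - {source_vtep})


# Populate with the mapping from the Service Node slide.
table = ServiceNodeTable()
table.register(100, "1.1.1.1")
table.register(100, "2.2.2.2")
table.register(101, "3.3.3.3")
table.register(101, "4.4.4.4")
```

With this table, an ARP request arriving from 1.1.1.1 on VNI 100 would be forwarded only to 2.2.2.2; VTEPs on VNI 101 never see it.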
Design
[Figure: Controller 1, Controller 2, and Controller 3, each running mcrouter and memcache, connected via eth0 over the L3 network to Hypervisor 1 through Hypervisor 500. Each hypervisor runs a Distributed VXLAN Service Node and hosts tenant bridges br100 and br101, each with VMs (VM1, VM2) and a VTEP (vxlan100, vxlan101).]
• Multi-process Python program (multiprocessing module)
• Runs on every hypervisor
• Shares state using a distributed cache
  • FB Mcrouter: memcached protocol router (5B requests/second at peak!)
• Listens for new VTEP registrations
  • Forwards new mappings to the distributed cache
• Listens for broadcast, unknown unicast, and multicast packets
  • Floods to all VTEPs in the virtual network
VXLAN Distributed Service Node
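One way to keep a per-VNI set of VTEPs in a memcached-style cache is to append "+member" and "-member" tokens to a single key and replay them on read, so that many service nodes can update the set without read-modify-write races. The sketch below assumes that approach; it is not confirmed to be what VDSN does. `FakeCache` is a hypothetical in-memory stand-in that only mimics memcached's `add`/`append`/`get` semantics (append fails on a missing key, add fails on an existing key), and the key format `vni:<id>` is invented for the example.

```python
class FakeCache:
    """Hypothetical in-memory stand-in for a memcached client."""

    def __init__(self):
        self._data = {}

    def add(self, key, value):
        # Like memcached "add": only succeeds if the key is absent.
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def append(self, key, value):
        # Like memcached "append": only succeeds if the key exists.
        if key not in self._data:
            return False
        self._data[key] += value
        return True

    def get(self, key):
        return self._data.get(key)


def _key(vni):
    return "vni:%d" % vni


def register_vtep(cache, vni, vtep_ip):
    """Add a VTEP to the set for a VNI using append-only updates."""
    token = "+%s " % vtep_ip
    if not cache.append(_key(vni), token):
        # Key did not exist yet; if another node created it first, append.
        if not cache.add(_key(vni), token):
            cache.append(_key(vni), token)


def unregister_vtep(cache, vni, vtep_ip):
    """Remove a VTEP from the set; a missing key means nothing to remove."""
    cache.append(_key(vni), "-%s " % vtep_ip)


def vteps_for(cache, vni):
    """Replay the +/- tokens to recover the current set of VTEPs."""
    members = set()
    for token in (cache.get(_key(vni)) or "").split():
        if token.startswith("+"):
            members.add(token[1:])
        else:
            members.discard(token[1:])
    return members
```

In a real deployment the stored value grows with every update, so it would need periodic compaction; that step is omitted here.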
ip link add vxlan1 type vxlan id 1 remote 169.254.1.1 dev eth0
ip addr add 172.16.1.1 dev vxlan1
ip link set dev vxlan1 mtu 1450
ip link set dev vxlan1 up
Creating VXLAN interfaces
root@mhv2:~# ip addr show vxlan1
4: vxlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether f2:af:3f:62:cf:65 brd ff:ff:ff:ff:ff:ff
    inet 172.16.1.5/24 scope global vxlan1
       valid_lft forever preferred_lft forever
    inet6 fe80::f0af:3fff:fe62:cf65/64 scope link
       valid_lft forever preferred_lft forever
Configured VXLAN Interface
iptables -t nat -A OUTPUT -d 169.254.1.1/32 -p udp -m udp --dport 8472 -j DNAT --to-destination 127.0.0.1:8473
The @cfbIV rule
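The rule appears to redirect VXLAN packets that the kernel sends to the placeholder remote 169.254.1.1:8472 to a local process on 127.0.0.1:8473. A minimal sketch of what such a local listener could look like is below; the function names are hypothetical, this is not the VDSN source, and a real service node would go on to look up the VTEPs for the VNI and re-send the packet to each of them.

```python
import socket
import struct


def open_listener(host="127.0.0.1", port=8473):
    """Open a UDP socket on the DNAT target address (illustrative helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_vxlan(sock, bufsize=65535):
    """Receive one datagram and split it into (VNI, inner frame)."""
    packet, _addr = sock.recvfrom(bufsize)
    if len(packet) < 8:
        raise ValueError("datagram shorter than a VXLAN header")
    # Second 32-bit word holds the 24-bit VNI followed by 8 reserved bits.
    _flags_word, vni_word = struct.unpack("!II", packet[:8])
    return vni_word >> 8, packet[8:]
```

Port 8473 is taken from the rule above; passing `port=0` lets the operating system pick a free port, which is convenient for trying the helper without root privileges or a VXLAN interface.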
Demo Setup
[Figure: Controller 1, Controller 2, and Controller 3 (each running mcrouter and memcache) and Hypervisor 1 through Hypervisor 500 (each running a VXLAN Distributed Service Node) connected over the L3 network. Host addresses shown: 192.168.225.226, 192.168.225.227, 192.168.225.228, 192.168.225.231, 192.168.225.232. VTEP addresses shown: VTEP1 (172.16.1.4), VTEP1 (172.16.1.5), VTEP (172.16.3.4), VTEP (172.16.3.6).]
• Open source VDSN source code
• Integration with Neutron (if community interest)
• Performance and scalability testing
Future work
• Presentation slides: http://bit.ly/vdsn-presentation
• VDSN Source Code and Ansible playbooks:
  • Simple, accessible model, horizontal scaling
  • http://bit.ly/vdsn-ansible
• VDSN code coming soon (@devlaps, #devlaps)
• Production Code:
  • Multi-area VXLAN! Highly optimized, requires expertise to configure/troubleshoot
  • http://bit.ly/multi-area-vxlan
References
• C. Burgess, N. Leake, L3 + VXLAN Made Practical, OpenStack Summit Spring 2014.
• M. Mahalingam, et al., Virtual eXtensible Local Area Network (VXLAN): A Framework for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks, https://tools.ietf.org/html/rfc7348
• Sanjay K. Hooda, Shyam Kapadia, Padmanabhan Krishnan, Using TRILL, FabricPath, and VXLAN: Designing Massively Scalable Data Centers (MSDC) with Overlays, Cisco Press, 2014.
• Introducing McRouter, http://bit.ly/introducing-mcrouter
• McRouter on github, https://github.com/facebook/mcrouter
• Pyroute2, https://pypi.python.org/pypi/pyroute2
• Maintaining a set in Memcached, http://bit.ly/memcache-sets
• Ansible, http://docs.ansible.com