perspective of virtual switching fabric€¦ · the average hash bucket’s list length cbp-input...
TRANSCRIPT
© 2018 VMware Inc. All rights reserved.
Perspective of Virtual Switching [email protected]
Agenda
2
q Overview of Fabric Structure
q Emulation & Demonstration
q Performance Evaluation
q Use cases Imagination
What is the Fabric?A L2 virtual networking using smart routing
o node naming: spine vSwitch node vs leaf vSwitch nodeo pseudo wire service: Ethernet-Line(E-Line)o pseudo lan service: Ethernet-LAN(E-LAN)
o mac based forwardingo emulate LAN at the core(built-in multicast tree, replication on-demand)
o path optimizationo adjacency next-hop count as weighto link workload as weighto for E-LAN, minimum spanning treeo For E-LINE, shortest path
E-LAN
E-LINE
What is the Fabric? contdtransport layer consideration
o Ethernet over MPLS ? RFC448o path selection and tenant identificationo encapsulation overhead: 12+2+4=18 bytes
o mitigate TOR variables:o PF and VF mac addresses.
o hardware independencyo NICs supported by DPDKo high volume switch
B-dmac B-smac mpls(0x8847) L2 frame
Why Fabric?x86 platform networking capability
o data path essenceo hardware IO capability(PCIe Gen3 x8GT/s TL bandwidth utilization :66%)
from infrastructure network into host(hypervisor) memoryvpp benchmark: 480Gbps L3 forwarding, does not scale any more due to nic/PCI bus limitation
o memory bandwidth is bottleneck of virtual network from vm to vm(host/hypervisor)experiment: 60Gbps intra-numa-node ,two times memory copy, does not scale wellmemory movement consumes too much cpu time and memory bandwidth
o let dedicated servers as fabric do virtual networking!o more agility and hardware dependency than specific hardware solution.
o as the intrinsic requirement of NFV
leaf vswitch
customer service port
Vlan basedservice selection
E-LINE
E-LAN
Label pushing&next hop forwarding
customer backbone port
spine vswitch leaf vswitch
provider backbone port
unicast
multicast
E-LINE
E-LAN
NHLFE selection Label swapping&replication
customer backbone portcustomer service port
Label basedservice selection
Label stripping&Vlan insertion
Fabric constitutionvirtual switch internals
Agenda
7
q Overview of Fabric Structure
q Emulation & Demonstration
q Performance Evaluation
q Use cases Imagination
Emulated environment pre-setup
Linux Guest(VMs)
0 1 2 3 4 5 6 7
Linux Host(Hyperv…) br-ac1 br-lan1 br-lan2 br-ac2
Leaf Spine Leaf
0 2 4 6 8 10net-nsnet-ns
§ Qemu KVM emulated guest, both host and guest OS are CentOS 7§ 8 x e1000 devices with VLAN stripping and insertion offloading capability§ 4 x ovs bridges for 4 LANs emulation(2 attachment circuit lans and 2 common lans)§ Port 0 and port 7 are for customers and port 1-6 are for fabric switches§ For fabric ports, once taken over by e3datapath, re-index them [0,2,4,6,8,10]§ For customer ports, use Linux namespace and vlan sub-interfaces to segregate themselves
E-line servicecsp port 0 cbp port 2 pbp port 4 pbp port 6 cbp port 8 csp port 10
create two e-line services: 0 and 1
Vlan1000 ----> e-line 0
E-line 0 next hop(to port 4) via port2 with label:1
Port 4 with label 1,knows next hop(to port 8) via port 6 with label 100
Port 8 with label 100 goes to e-line1
e-line1--->vlan 2000
vlan 2000--->e-line1
E-lan1 next hop(to port 6) via port 8 with label:10000
Port 6 with 10000,it knows next hop(to port 2 ) with label 10.
E-lan service multicast forwardingcsp port 0 cbp port 2 pbp port 4 pbp port 6 cbp port 8 csp port 10
create two e-lan services: 0 and 1 , and multicast next hop list:0
Vlan3000 ----> e-lan 0
E-lan 0,find no fwd entry, multicast next hop(to port 4) via port2 with label:2
Port 4 with label 2, goes to multicast list0,perform RPF check and send replication (to port 8) via port 6 with label:101
Port 8 with label 101 goes to e-lan1
e-lan1--->vlan 4000
vlan 4000--->e-lan1
E-lan1 still finds no fwd entry, use multicast nexthop(to port 6)via port 8 with label:10001
Port 6 with 10001,does multicast forwarding, finally goes to port 2 via port 4 with label:11
E-lan service unicast forwarding§ At leaf virtual switch, a fwd entry is found with deterministic <label, nhlfe>§ At spine virtual switch, single next hop is bound to input label entry, no multicast list is searched
Agenda
12
q Overview of Fabric Structure
q Emulation & Demonstration
q Performance Evaluation
q Use cases Imagination
Phy-VS-Phy
VPP Performance at Scale
64B1518B0.0
200.0400.0600.0
[Gbps]]
480Gbps zero frame loss
64B1518B0.0
100.0200.0300.0[Mpps]
200Mpps zero frame loss
64B0
200400600[Gbps]]
IMIX => 342 Gbps,1518B => 462 Gbps
64B0
100200300
[Mpps]
64B => 238 Mpps
IPv6, 24 of 72 cores IPv4+ 2k Whitelist, 36 of 72 cores
Zero-packet-loss Throughput for 12 port 40GE
Hardware:
Cisco UCS C460 M4Intel® C610 series chipset4 x Intel® Xeon® Processor E7-8890v3(18 cores, 2.5GHz, 45MB Cache)2133 MHz, 512 GB Total9 x 2p 40GE Intel XL71018 x 40GE = 720GE !!
Latency18 x 7.7trillion packets soak testAverage latency: <23 usecMin Latency: 7…10 usecMax Latency: 3.5 ms
HeadroomAverage vector size ~24-27Max vector size 255Headroom for much more throughput/featuresNIC/PCI bus is the limit not vpp
https://fd.io/resources/
Max Quad Links rx/tx rate
0102030405060708090
100
bidirectional 2x bidirectional 3x bidirectional 4x singly directional 4x
bidirectional 4x
Max Quad Links Rx/Tx rate in Mpps
64-bytes 1514-bytes
46Gbps52Gbps
60Gbps
60Gbps 60Gbps
[Mpps]
Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz with 20M L3 cache
vSwitching processing complexity
items spatial complexity temporal complexity
csp-input 2^12 vlan entries per csp interface o(1) to find vlan distribution entry
e-line-forward 1 <vlan,interface> entry and 1 <label,nhlfe> entry
all o(1) to find the fwd entries
e-lan-forward 64 <vlan,interface> and 64 <label,nhlfe> and 2^16 fib base entry per e-lan and [n/48,n] fib
entry
O(m/48) to find the mac fwd entry, where m is the average hash bucket’s list length
cbp-input 2^20 label entries per cbp interface O(1) to find label distribution entry
pbp-input 2^20 label entries per pbp interface o(1) to find unicast fwd nhlfeo(n) to enumerate multicast entries where n by
default set to 64
Performance expectation
o simpler forwarding logico dpdk native context o burst-oriented and cache optimized and fast indexo expected to scale out with ports across the vSwitch datapath
Agenda
17
q Overview of Fabric Structure
q Emulation & Demonstration
q Performance Evaluation
q Use cases Imagination
vSwitching fabric
Fabric structure reviewbasic end-system accessing
hypervisor PF and VFs(sr-iov)apply QoS and VLAN
vm container bm
leaf vSwitch leaf vSwitch leaf vSwitch leaf vSwitch leaf vSwitch
spine vSwitch spine vSwitch spine vSwitch
hypervisor hypervisor hypervisor
Multi-homed
make it bipartite graph ifignoring inter-spine-vswitch connections
attachment circuit LAN
• Try to fully migrate L2 virtual network into infrastructure network domain
Fabric structure review contdbasic end-system accessing
hypervisor
vm container bm
leaf vSwitch leaf vSwitch leaf vSwitch leaf vSwitch leaf vSwitch
spine vSwitch spine vSwitch spine vSwitch
hypervisor hypervisor hypervisor
E-LANE-LINE
• local vlan scope• vlan designated forwarder for multi-homed leaf vSwitches• hairpin traffic optimized• BUM traffic optimized
vlan100
vlan1000 vlan2000
vlan101
vlan1001
vlan102vlan102
vSwitching fabric
Native HA viewService continuity
hypervisor
leaf vSwitch leaf vSwitch leaf vSwitch leaf vSwitch leaf vSwitch
spine vSwitch spine vSwitch spine vSwitch
hypervisor hypervisor hypervisor
• only active ether-service in forwarding state• ether-service failure can be detected by in-band OAM or(and) controller• controller assists ether-service failover• populate <ff:ff:ff:ff:ff:ff,smac,vlan> to fast update AC lan’s mac table
vlan100
vlan1001activestandby
vSwitching fabric
Native ECMP viewservice continuity and load balancing
hypervisor
leaf vSwitch leaf vSwitch leaf vSwitch leaf vSwitch leaf vSwitch
spine vSwitch spine vSwitch spine vSwitch
hypervisor hypervisor hypervisor
• each ether-service is active• ether-services are detected by iOAM or(and) controller• failover can be achieved by disabling inactive ether-service
vlan100
vlan1001activeactive
vSwitching fabric
More use case featuresas a link-level L2 network virtualization solution
• We may still borrow ideas from compute virtualization• Snapshot or backup and restore your virtual network• Live migration for your virtual network • Virtual network High Availability• Virtual network fault tolerance (as ECMP?)