22 - IDNOG03 - Christopher Lim (Mellanox) - Efficient Virtual Network for Service Providers
TRANSCRIPT
Christopher Lim, Sr. Engineer, July 2016
Mellanox Efficient Virtual Network for Service Providers
© 2016 Mellanox Technologies 2- Mellanox Confidential -
Leading Supplier of End-to-End Interconnect Solutions
Store, Analyze: Enabling the Use of Data
ICs, Adapter Cards, Switches/Gateways, Software, Cables/Modules, Metro / WAN, NPU & Multicore (NPS, TILE)
Comprehensive End-to-End InfiniBand and Ethernet Portfolio (VPI)
Cloud-Native NFV Architecture Dictates Efficient Virtual Network
Mellanox EVN: Foundation for Efficient Telco Cloud Infrastructure
Efficient Virtual Network: Enabling High-performance, Reliable and Scalable Infrastructure for Cloud Service Delivery
Pillars: Virtualization, Acceleration, Automation
• Compute: Higher Workload Density
• Network: Line Rate Packet Processing
• Storage: Higher IOPS, Lower Latency
SR-IOV – Overcome Compute Virtualization Penalty
[Diagram: software-switched VMs (virtual NIC, hypervisor virtual switch, PF driver, NIC physical function) compared with SR-IOV VMs (a VF driver in each VM attached to its own NIC virtual function behind the NIC embedded switch), all over the PCIe bus]
• Application direct access achieves bare-metal I/O performance
• VMs leveraging SR-IOV and the Mellanox eSwitch reach near-line-rate performance without CPU overhead
• Software-switched VMs suffer from the compute virtualization penalty
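On Linux, virtual functions like those in the diagram are typically created by writing a count to the physical function's `sriov_numvfs` sysfs attribute, bounded by `sriov_totalvfs`. The sketch below assumes that standard sysfs layout; the interface name `enp3s0f0` and the `make_fake_sysfs` helper are hypothetical, and the helper exists only so the logic can be dry-run without root access or real hardware.

```python
import tempfile
from pathlib import Path


def enable_vfs(pf: str, num_vfs: int, sysfs_net: Path = Path("/sys/class/net")) -> int:
    """Set the number of SR-IOV virtual functions on physical function `pf`.

    Uses the Linux sysfs attributes sriov_totalvfs and sriov_numvfs under
    <sysfs_net>/<pf>/device. Returns the VF count read back afterwards.
    """
    dev = sysfs_net / pf / "device"
    total = int((dev / "sriov_totalvfs").read_text().strip())
    if not 0 <= num_vfs <= total:
        raise ValueError(f"{pf} supports 0..{total} VFs, requested {num_vfs}")

    numvfs = dev / "sriov_numvfs"
    current = int(numvfs.read_text().strip())
    if current != num_vfs:
        # The kernel refuses to change one non-zero VF count to another
        # (EBUSY), so disable all VFs first, then enable the new count.
        if current != 0:
            numvfs.write_text("0")
        if num_vfs != 0:
            numvfs.write_text(str(num_vfs))
    return int(numvfs.read_text().strip())


def make_fake_sysfs(pf: str, total_vfs: int) -> Path:
    """Hypothetical helper: build a throwaway sysfs-like tree for dry runs."""
    root = Path(tempfile.mkdtemp())
    dev = root / pf / "device"
    dev.mkdir(parents=True)
    (dev / "sriov_totalvfs").write_text(f"{total_vfs}\n")
    (dev / "sriov_numvfs").write_text("0\n")
    return root


if __name__ == "__main__":
    root = make_fake_sysfs("enp3s0f0", total_vfs=8)
    print(enable_vfs("enp3s0f0", 4, sysfs_net=root))
```

On a real host the same call would be `enable_vfs("enp3s0f0", 4)` run as root; each resulting VF can then be passed through to a VM.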
SR-IOV + DPDK: Better Together with Mellanox PMD
[Diagram: VM 1 … VM N, each running the DPDK library with the Mellanox DPDK PMD on top of its own SR-IOV virtual function; the VFs sit behind the NIC embedded switch on the PCIe bus, with the hypervisor out of the data path]
• Further accelerate packet processing by eliminating interrupts and context switches
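A poll-mode driver removes interrupts by having a dedicated core repeatedly ask the NIC queue for a burst of packets instead of sleeping until the NIC signals. The Python sketch below only illustrates that control flow: `rx_burst` is a hypothetical stand-in for the PMD receive call (real DPDK applications use the C API on hardware queues), the in-memory `deque` plays the role of the RX ring, and the burst size of 32 is an assumed typical value.

```python
from collections import deque
from typing import Callable, Deque, List

BURST_SIZE = 32  # packets requested per poll; an assumed typical burst size


def rx_burst(ring: Deque[bytes], max_pkts: int) -> List[bytes]:
    """Hypothetical stand-in for a PMD receive call: return up to `max_pkts`
    packets already waiting in the ring, without blocking or sleeping."""
    count = min(max_pkts, len(ring))
    return [ring.popleft() for _ in range(count)]


def poll_loop(ring: Deque[bytes], handle: Callable[[bytes], None],
              max_empty_polls: int = 1000) -> int:
    """Busy-poll the ring and process packets in bursts.

    A real PMD core spins forever; this loop stops after `max_empty_polls`
    consecutive empty polls only so the sketch terminates.
    """
    processed = 0
    empty_polls = 0
    while empty_polls < max_empty_polls:
        pkts = rx_burst(ring, BURST_SIZE)
        if not pkts:
            empty_polls += 1  # nothing arrived: poll again at once, no interrupt
            continue
        empty_polls = 0
        for pkt in pkts:
            handle(pkt)
        processed += len(pkts)
    return processed


if __name__ == "__main__":
    ring = deque(bytes([i % 256]) for i in range(100))
    received: List[bytes] = []
    print(poll_loop(ring, received.append, max_empty_polls=3))
```

Fetching packets in bursts amortizes per-call overhead, and because the core never blocks there is no interrupt handling or context switch on the receive path; the trade-off is a CPU core that stays fully busy even when idle.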
Mellanox Sets New DPDK Performance Records
Superior DPDK Packet Performance at Various Frame Sizes (ConnectX-4 Lx 40G)
Frame size (bytes)               64     128    256    512    1024   1280   1518
Frames per second (millions)     42.11  30.58  17.96  9.36   4.78   3.83   3.24
Test setup:
• ConnectX-4 Lx 40GbE, single port
• 4 cores dedicated to DPDK
Product              Single-port TCP Throughput    DPDK 64B Packet Throughput
ConnectX-4 100G      93.4 Gb/s                     74.4 million p/s
ConnectX-4 Lx 40G    37.6 Gb/s                     42.1 million p/s
ConnectX-4 Lx 25G    23.5 Gb/s                     34 million p/s
ConnectX-4 40G       37.6 Gb/s                     56.4 million p/s
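The chart and table report measured frame rates; to see how close they are to the physical limit, the theoretical Ethernet line rate can be computed from the link speed and frame size. A minimal sketch, assuming the standard 20 bytes of per-frame wire overhead (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte minimum inter-frame gap); `line_rate_pps` is a name introduced here for illustration.

```python
# Per-frame Ethernet overhead on the wire, in bytes:
# 7 (preamble) + 1 (start-of-frame delimiter) + 12 (minimum inter-frame gap)
WIRE_OVERHEAD_BYTES = 20


def line_rate_pps(link_bps: float, frame_bytes: int) -> float:
    """Theoretical maximum frames per second for a link speed and frame size."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


# Measured ConnectX-4 Lx 40G results from the chart, in millions of frames/s.
measured_mpps = {64: 42.11, 128: 30.58, 256: 17.96, 512: 9.36,
                 1024: 4.78, 1280: 3.83, 1518: 3.24}

if __name__ == "__main__":
    for size, mpps in measured_mpps.items():
        limit = line_rate_pps(40e9, size) / 1e6
        print(f"{size:>5} B: measured {mpps:6.2f} Mpps, "
              f"line rate {limit:6.2f} Mpps ({mpps / limit:.0%})")
```

By this calculation the 64-byte result is roughly 71% of the ~59.5M frames/s theoretical maximum at 40G, while from 256 bytes upward the measured rates are within about 1% of line rate.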
Turbocharge Overlay Networks with ConnectX-3/4 NICs
Solution:
• Overlay network accelerators in the NIC
• Penalty-free overlays at bare-metal speed
• Integrated and validated by major SDN vendors
Benefits:
• 37.5 Gb/s on a 40G link, more than 2X the throughput without VXLAN offload
• On a 20-core system, 7 cores are freed to run additional VMs, saving 35% of total cores while doubling the throughput!
Cumulus Overlay Solution
Cumulus is integrated with every major overlay solution:
VMware NSX
PLUMgrid ONS
Nuage VSP
Midokura MidoNet
Juniper OpenContrail
Akanda Astara
Cumulus LNV
A switch VXLAN tunnel endpoint (VTEP) is used:
• To connect bare-metal servers to the VXLAN network
• To connect VXLAN and legacy networks
Available with Mellanox switches, April 2016
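Whether it sits in a switch or is offloaded to the NIC, a VTEP's core job is to wrap each Ethernet frame in a VXLAN header carrying a 24-bit VXLAN Network Identifier (VNI) and send it over UDP. A minimal sketch of the 8-byte header defined in RFC 7348 (flags byte with the "I" bit, 24-bit VNI, reserved fields zeroed); the function names are illustrative, and the outer Ethernet/IP/UDP headers (14 + 20 + 8 bytes for IPv4, so 50 bytes of total overhead with the VXLAN header) are left to the host stack here.

```python
import struct

VXLAN_UDP_PORT = 4789        # IANA-assigned VXLAN destination UDP port (RFC 7348)
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, then 24 reserved bits.
    # Word 2: 24-bit VNI, then 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking the I flag."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VXLAN I flag not set")
    return vni_word >> 8


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """VXLAN header + original Ethernet frame: the UDP payload a VTEP sends."""
    return vxlan_header(vni) + inner_frame


if __name__ == "__main__":
    payload = encapsulate(b"\x00" * 60, vni=5001)
    print(len(payload), parse_vni(payload))
```

Doing this wrap/unwrap, plus the inner checksum and segmentation work, in NIC hardware is what the overlay offload on the previous slide refers to.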
Accelerated Switching And Packet Processing (ASAP2)
• Best of both worlds: enable a hardware-accelerated data plane with an SDN/virtual switch control plane
• Multiple possibilities for the accelerated data plane, including DPDK on the CPU, the embedded switch, an FPGA, a network processor, a multi-core processor in the server adapter, the TOR switch, or a centralized acceleration pool
• A standard hardware API allows the control plane and data plane to operate and innovate independently
[Diagram (roadmap): Virtual Switch Control Plane, connected through a Standard Hardware Abstraction Interface (ASAP2) to a Hardware Accelerated Data Plane]
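The point of a standard hardware abstraction interface is that the control-plane logic is written once and any data plane that implements the interface can sit underneath it. A minimal sketch of that separation in Python; `DataPlane`, `SoftwareDataPlane`, `EswitchDataPlane` and `ControlPlane` are hypothetical names, not Mellanox or OVS APIs, and the eSwitch backend only models a bounded hardware table.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Match = Tuple[str, str]  # simplified match key: (in_port, dst_mac)


@dataclass(frozen=True)
class FlowRule:
    match: Match
    out_port: int


class DataPlane(ABC):
    """Hypothetical hardware abstraction interface: all the control plane
    knows about the forwarding element underneath it."""

    @abstractmethod
    def install(self, rule: FlowRule) -> None: ...

    @abstractmethod
    def lookup(self, match: Match) -> Optional[int]: ...


class SoftwareDataPlane(DataPlane):
    """Stand-in for a CPU data plane (e.g. a DPDK-based switch)."""

    def __init__(self) -> None:
        self.table: Dict[Match, int] = {}

    def install(self, rule: FlowRule) -> None:
        self.table[rule.match] = rule.out_port

    def lookup(self, match: Match) -> Optional[int]:
        return self.table.get(match)


class EswitchDataPlane(SoftwareDataPlane):
    """Stand-in for a NIC embedded switch with a bounded hardware table."""

    def __init__(self, capacity: int) -> None:
        super().__init__()
        self.capacity = capacity

    def install(self, rule: FlowRule) -> None:
        if rule.match not in self.table and len(self.table) >= self.capacity:
            raise RuntimeError("hardware flow table full")
        super().install(rule)


class ControlPlane:
    """Policy logic written once against DataPlane, whatever the backend."""

    def __init__(self, dataplane: DataPlane) -> None:
        self.dataplane = dataplane

    def connect(self, in_port: str, dst_mac: str, out_port: int) -> None:
        self.dataplane.install(FlowRule((in_port, dst_mac), out_port))


if __name__ == "__main__":
    for dp in (SoftwareDataPlane(), EswitchDataPlane(capacity=1024)):
        ControlPlane(dp).connect("vf1", "52:54:00:00:00:02", 2)
        print(type(dp).__name__, dp.lookup(("vf1", "52:54:00:00:00:02")))
```

Swapping the backend requires no change to `ControlPlane`, which is the independence the slide describes; a real interface would also need flow removal, statistics and capability discovery.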
ASAP2 Phase 1: ASAP2 Direct
• OVS control plane, optionally combined with an SDN controller
• Direct application I/O access through SR-IOV
• Accelerated forwarding and classification through the embedded switch (eSwitch) on the Mellanox NIC
[Diagram: VMs attached through tap devices and VMs attached through SR-IOV directly to the VM, all served by the NIC embedded switch]
OVS Architecture and Operations
[Diagram: the first packet of a flow goes from the OVS kernel module (kernel) up to ovs-vswitchd (user space); subsequent packets are handled in the kernel module]
Forwarding
• Flow-based forwarding
• The first packet of a new flow (match miss) is directed to user space (ovs-vswitchd)
• ovs-vswitchd determines how the flow is handled and programs the kernel (fast path)
• Following packets hit the kernel flow entries and are executed in the fast path
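The miss-then-install pattern above behaves like a cache in front of a slower decision engine. A toy Python model of it, assuming a dictionary for the kernel flow table and a plain function for ovs-vswitchd's policy decision; `TwoTierSwitch` is a hypothetical name, and real OVS matching (wildcards, megaflows, expiry) is far richer than an exact-match dict.

```python
from typing import Callable, Dict, Hashable


class TwoTierSwitch:
    """Toy model of OVS forwarding: kernel fast-path cache + user-space slow path."""

    def __init__(self, slow_path: Callable[[Hashable], str]) -> None:
        self.kernel_flows: Dict[Hashable, str] = {}  # fast path (kernel module)
        self.slow_path = slow_path                   # plays the role of ovs-vswitchd
        self.upcalls = 0                             # packets punted to user space

    def forward(self, flow_key: Hashable) -> str:
        action = self.kernel_flows.get(flow_key)
        if action is None:
            # Match miss: send the packet up to user space for a decision...
            self.upcalls += 1
            action = self.slow_path(flow_key)
            # ...then program the kernel so later packets stay in the fast path.
            self.kernel_flows[flow_key] = action
        return action


if __name__ == "__main__":
    switch = TwoTierSwitch(lambda k: "output:2" if k[2] == "10.0.0.2" else "drop")
    flow = ("tcp", "10.0.0.1", "10.0.0.2", 40000, 80)  # 5-tuple flow key
    for _ in range(5):
        switch.forward(flow)
    print(switch.upcalls)
```

Only the first of the five packets triggers an upcall, which is why per-flow setup cost matters less than per-packet fast-path cost for long-lived flows.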
ASAP2 – Let the Hardware Do the Heavy Lifting
New Flow
• A new flow results in a ‘miss’ action in the eSwitch and is directed to the OVS kernel module
• A miss in the kernel punts the packet to ovs-vswitchd in user space
Configuration
• ovs-vswitchd resolves the flow entry and, based on a policy decision to offload, propagates it to the corresponding eSwitch tables for offload-enabled flows
Fast Forwarding
• Subsequent frames of offload-enabled flows are processed and forwarded by the eSwitch
[Diagram: the first packet travels from the Mellanox eSwitch (hardware) to the OVS kernel module and ovs-vswitchd (software); subsequent packets are forwarded in hardware, with the kernel path kept as the fallback forwarding path]
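ASAP2 adds a hardware tier in front of the kernel fast path, gated by an offload policy. The toy model below extends the usual two-tier OVS picture with that third tier, under stated assumptions: dictionaries stand in for the eSwitch and kernel tables, `decide` stands in for ovs-vswitchd, and `offload_ok` is a hypothetical policy function; none of these are Mellanox or OVS interfaces.

```python
from typing import Callable, Dict, Hashable


class Asap2Model:
    """Toy three-tier lookup: eSwitch (hardware) -> OVS kernel -> ovs-vswitchd."""

    def __init__(self, decide: Callable[[Hashable], str],
                 offload_ok: Callable[[Hashable], bool]) -> None:
        self.eswitch: Dict[Hashable, str] = {}  # hardware flow table
        self.kernel: Dict[Hashable, str] = {}   # OVS kernel datapath
        self.decide = decide                    # ovs-vswitchd flow resolution
        self.offload_ok = offload_ok            # policy: offload this flow?
        self.stats = {"hw": 0, "kernel": 0, "user": 0}

    def forward(self, key: Hashable) -> str:
        if key in self.eswitch:                 # fast forwarding in hardware
            self.stats["hw"] += 1
            return self.eswitch[key]
        if key in self.kernel:                  # software fallback path
            self.stats["kernel"] += 1
            return self.kernel[key]
        # Miss in both tables: punt to user space and resolve the flow.
        self.stats["user"] += 1
        action = self.decide(key)
        self.kernel[key] = action
        if self.offload_ok(key):                # only offload-enabled flows
            self.eswitch[key] = action
        return action


if __name__ == "__main__":
    model = Asap2Model(decide=lambda k: "output:vf2",
                       offload_ok=lambda k: k[0] == "tcp")
    for _ in range(3):
        model.forward(("tcp", "10.0.0.1", "10.0.0.2", 40000, 80))
    for _ in range(3):
        model.forward(("udp", "10.0.0.1", "10.0.0.2", 5000, 53))
    print(model.stats)
```

Flows that the policy declines to offload still work; they simply stay on the software path, which mirrors the fallback forwarding path in the diagram.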
OVS and SR-IOV, Working Seamlessly Together
• Representor ports enable OVS to “know” and service the VMs that use SR-IOV
• Representor ports are used for eSwitch / OVS communication (miss flows and para-virtualized (PV) to SR-IOV communication)
[Diagram: VMs using OVS offload (SR-IOV, each with a netdev representor in OVS) alongside VMs using para-virtualization (netdev), with policy-based flow sync between OVS and the NIC eSwitch]
Software Defined Networking, at Full Speed
Highest performance
• High throughput, low and deterministic latency
• Offload is increasingly important as server I/O speed goes up
Low CPU overhead, higher infrastructure efficiency
Software-defined everything
In-box
• All changes will be upstreamed; no proprietary OVS or kernel patches
[Diagram: SDN or other network orchestration handles configuration and stats reporting for multiple eSwitches]
Benchmark Targets
Metrics
• Message rate (PPS)
• Network-related CPU load
Environments
• 25Gbps network
• Extreme performance
• Open source
• Free
Standard benchmark
• RFC 2544
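RFC 2544 defines throughput as the highest offered rate at which the device under test forwards every frame without loss. Test tools commonly find that rate by binary search between zero and line rate; the sketch below shows that search, assuming a hypothetical `trial_loses_frames(rate)` hook that runs one fixed-duration trial and reports whether any frame was dropped. The search strategy and resolution are implementation choices, not text from the RFC.

```python
from typing import Callable


def rfc2544_throughput(trial_loses_frames: Callable[[float], bool],
                       line_rate_pps: float,
                       resolution_pps: float = 1e4) -> float:
    """Binary-search the highest offered rate (packets/s) with zero frame loss.

    trial_loses_frames(rate) is a hypothetical hook: run one trial at `rate`
    and return True if any frame was lost.
    """
    if not trial_loses_frames(line_rate_pps):
        return line_rate_pps              # loss-free at full line rate
    lo, hi = 0.0, line_rate_pps           # invariant: lo is loss-free, hi is lossy
    while hi - lo > resolution_pps:
        mid = (lo + hi) / 2
        if trial_loses_frames(mid):
            hi = mid
        else:
            lo = mid
    return lo


if __name__ == "__main__":
    # Simulated device that starts dropping above 33M PPS on a 25GbE link
    # (64-byte line rate = 25e9 / (84 * 8) ≈ 37.2M PPS).
    print(rfc2544_throughput(lambda rate: rate > 33e6, 25e9 / 672))
```

Because every reported rate has passed a loss-free trial, the result is a conservative lower bound that is within `resolution_pps` of the true zero-loss rate.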
Benchmark Topology and Traffic Flow
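[Diagram: OVS over DPDK vs. OVS Offload. In both setups a VM runs DPDK testpmd and the hypervisor runs OVS on a Mellanox NIC with 25GE links. With OVS over DPDK, OVS forwards packets in hypervisor user space using DPDK; with OVS Offload, flows are offloaded to the NIC eSwitch.]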
Results and Conclusions
• 330% higher message rate compared to OVS over DPDK
  • 33M PPS vs. 7.6M PPS
  • OVS Offload reaches near line rate at 25G (64-byte line rate is 37.2M PPS)
• Zero CPU utilization on the hypervisor, compared to 4 cores with OVS over DPDK
  • This delta will grow further with packet rate and link speed
• Same CPU load on the VM
[Charts: Message Rate (million packets per second): OVS Offload 33M PPS vs. OVS over DPDK 7.6M PPS. Dedicated Hypervisor Cores (number of dedicated cores): OVS Offload 0 cores vs. OVS over DPDK 4 cores.]
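The headline percentages can be checked directly from the two measured rates. A small sketch, assuming 64-byte frames and the usual 20 bytes of per-frame wire overhead for the 25GbE line-rate figure; `compare` is a name introduced here for illustration.

```python
WIRE_OVERHEAD_BYTES = 20  # preamble + start-of-frame delimiter + minimum inter-frame gap


def compare(offload_mpps: float, baseline_mpps: float,
            link_bps: float = 25e9, frame_bytes: int = 64) -> dict:
    """Relate an offloaded message rate to a baseline and to line rate."""
    line_rate_mpps = link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8) / 1e6
    ratio = offload_mpps / baseline_mpps
    return {
        "ratio": ratio,
        "percent_higher": (ratio - 1) * 100,
        "line_rate_mpps": line_rate_mpps,
        "fraction_of_line_rate": offload_mpps / line_rate_mpps,
    }


if __name__ == "__main__":
    r = compare(33.0, 7.6)  # OVS Offload vs. OVS over DPDK, in Mpps
    print(f"{r['ratio']:.2f}x ({r['percent_higher']:.0f}% higher), "
          f"{r['fraction_of_line_rate']:.0%} of {r['line_rate_mpps']:.1f} Mpps")
```

Under these assumptions 33M vs. 7.6M PPS is a ratio of about 4.3x (roughly 334% higher, consistent with the rounded "330%"), and 33M PPS is about 89% of the 37.2M PPS 64-byte line rate.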
Accelerated Data Movement End to End: 25 is the New 10
• Flexibility, Opportunities, Speed
• Open Ethernet, Zero Packet Loss
• Most Cost-Effective Ethernet Adapter
• 2.5X the Network Performance
• Same Infrastructure, Same Connector
One Switch. A World of Options. 25G and 50G at Your Fingertips
Spectrum: The Ultimate 25/100GbE Switch
• The only predictable 25/50/100Gb/s Ethernet switch
• Full wire speed, non-blocking switch
  • Doesn’t drop packets per RFC 2544
• ZPL: Zero Packet Loss for all packet sizes
25GbE to 25GbE Latency Test Results
Not All Ethernet Switches Were Born Equal
[Chart: Microburst Absorption Capability, Spectrum vs. Tomahawk; max burst size (MB) by packet size (64B, 512B, 1.5KB, 9KB); recovered series values 5.2, 8.4, 9.6, 9.7 and 0.3, 0.9, 1.0, 1.1]
[Charts: Fairness and Avoidable Packet Loss by packet size (bytes), Broadcom vs. Spectrum]
Consistently Low Latency
www.Mellanox.com/tolly, www.zeropacketloss.com
Open Composable Networks
• Open APIs
• Automation
• End-to-End Interconnect
• Network OS Choice (SONiC)
RDMA Acceleration – Overcome Transport Protocol Inefficiencies
• Zero-copy remote data transfer
• Low-latency, high-performance data transfers
• InfiniBand: 100Gb/s; RoCE*: 100Gb/s
• Kernel bypass and protocol offload
* RDMA over Converged Ethernet
[Diagram: data moves directly between application buffers on two hosts, bypassing the kernel and handled by the hardware]
RDMA Increases Memcached Performance
• Memcached: high-performance, in-memory, distributed memory object caching system
  • Simple key-value store
  • Speeds up applications by eliminating database access
  • Used by YouTube, Facebook, Zynga, Twitter, etc.
• RDMA-improved Memcached performance:
  • 1/3 the query latency
  • >3X throughput
D. Shankar, X. Lu, J. Jose, M.W. Rahman, N. Islam, and D.K. Panda, Can RDMA Benefit On‐Line Data Processing Workloads with Memcached and MySQL, ISPASS’15
OLDP workload
[Charts: Latency (sec) and Throughput (Kq/s) vs. number of clients (64, 96, 128, 160, 320, 400), Memcached-TCP vs. Memcached-RDMA; latency reduced by 66%, throughput increased by >200%]
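The slide states the same results in two ways ("1/3 query latency" and "reduced by 66%"; ">3X throughput" and "increased by >200%"). A short check of that conversion, normalizing the TCP baseline to 1.0; `percent_change` is a helper introduced here for illustration.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# Normalize the Memcached-TCP baseline to 1.0 for both metrics.
latency_change = percent_change(1.0, 1 / 3)   # "1/3 query latency"
throughput_change = percent_change(1.0, 3.0)  # "3X throughput"

if __name__ == "__main__":
    print(f"latency {latency_change:+.1f}%, throughput {throughput_change:+.1f}%")
```

A 3X multiplier is a 200% increase, not 300%, and one third of the latency is a reduction of about 66.7%, so the two phrasings on the slide agree.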
Case Studies
Server I/O Decides Affirmed Networks Virtual EPC Efficiency
When server I/O is limited, Affirmed MCC deployment efficiency is constrained, resulting in underutilized resources and a larger server footprint.
The Mellanox 40G NIC enables MCC to fully utilize CPU resources, reduce the server footprint, and improve efficiency.
[Diagram: Affirmed Mobile Content Cloud™ modules (MCM, CCM, DCM, ASM, WSM, IOM) running on a hypervisor over an x86 hardware platform, shown with Affirmed high availability]
MCM – Management Control Module
CCM – Centralized Control Module
DCM – Distributed Control Module
IOM – Input Output Module
WSM – Workflow Services Module (data plane)
ASM – Advanced Services Module (data plane)
[Diagram: A typical datapath traffic pattern in an MCC cluster of IOMs and WSMs. North-south traffic flows to and from the MCC cluster via the SP router; east-west traffic flows within the MCC cluster.]
A single “composite” virtualized network function with distributed microservices that can scale in and out independently
An Example to Show Server Efficiency Improvement
To support 20Gbps of cluster I/O    With 10G NIC    With 40G NIC
Number of servers needed            4               1
SR-IOV & Data Plane Acceleration Essential for Affirmed MCC
• Native Open vSwitch (OVS): ~20-30% of line rate
• DPDK Accelerated vSwitch (AVS): ~80% of line rate
• SR-IOV: near line rate
[Diagram: three server stacks (VMs, hypervisor, server NIC, PHY): native OVS in the hypervisor, OVS with the DPDK library, and SR-IOV giving VMs direct access to the NIC]
Conclusion - The Mellanox EVN Differentiation
• Higher Workload Density
• Faster Data Movement
• Cloud-native Scalability and Reliability
• Operation and Cost Efficiency
Thank You