high performance cloud-native networking · high performance cloud-native networking k8s unleashing...
TRANSCRIPT
High Performance Cloud-native NetworkingK8s Unleashing FD.io
Giles Heron
[email protected]@cisco.com
Maciek KonstantynowiczPrincipal Engineer, Cisco FD.io CSIT Project Lead
Distinguished Engineer, Cisco
Jerome Tollet
[email protected] Engineer, Cisco
DISCLAIMERs• 'Mileage May Vary'
• Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your opinion and investment of any resources. For more complete information about open source performance and benchmark results referred in this material, visit https://wiki.fd.io/view/CSIT and/or https://docs.fd.io/csit/rls1807/report/.
• Trademarks and Branding• This is an open-source material. Commercial names and brands may be claimed as the property of others.
Cloud
NFV
Internet Mega Trends
Scalability and Self-Healing
Portability and Efficiency
Software Defined Networking
Open Source Platforms
Cloud Native Designs
SDN
5 Pillars of Next Generation Software Data Planes
Blazingly Fast• Process the massive explosion of East-West traffic• Process increasing North-South traffic
Truly Extensible• Foster pace of innovation in cloud-native networking• No compromise on performance (zero-tolerance)
Software First• Cloud means running everywhere• Cloud means hardware and physical infra agnostic
Predictable performance• Dataplane performance must be deterministic• Predictable for a number of VMs, Containers,
virtual topology and (E-W, N-S) traffic matrix
FD.io VPP meets these challengesHow can one use it in large scale Cloud-native networks ?
Measureable• Counters everywhere to count everything for detailed
cross-layer operation and efficiency monitoring• Enables feedback loop to drive optimizations
Production-Grade Container Orchestration• De facto standard for container orchestration• Superior extensibility and self-healing
Performance Container Networking• Seamless cloud-native integration• Extends K8s with userspace networking
Cloud-native Network Micro-services• Scalability, performance and agility• Marries K8s with flexible network topologies
Fast Data Input/Output• Most efficient on the planet• Top performance, flexibility and extensibility
Cloud-native Networking Platform
LIGATO
Network-as-a-ServiceNetwork-as-a-Service
NF Micro-Services
Micro-ServicesMicro-Services
App Micro-Services
Contiv
High Performance Cloud-native NetworkingK8s Unleashing FD.io
The Way Applications Are Developed and Deployed.. Has Changed..
Time
The Way Networks are Deployed and Used… has Changed…
Internet Data-CenterCorporate LAN/WAN”80:20 rule”
LAN WAN
Intranets & Internet
LAN WAN
Internet
WWW
SD-WAN & “BeyondCorp”
WiFi Internet
CloudProvider
VLAN 1 Core
Spine/Leaves & L3 Core/L2 AccessTiered Transit & Private Peering
Tier1 –A Tier1 – B Tier1 – C
Tier2 – D Tier2 – E Tier2 – F
Internet exchanges & public peering
Network 1 Network 2 Network 3
Telco/Cable Access &, OTT/CDN Content
Telco 1 Telco 2 MSO 1
Cloud Provider 1 Global Backbone
Core/Distr/Access, VLAN based
VLAN 1VLAN 2VLAN n
VLAN 2 VLAN 2’Tim
e
Cloud Provider 2 Global Backbone
L3 Fabric/SW Overlay & Virt Networking
Layer 3 Fabric
Microservices & Containers have changed many things…
• Microservices split applications into modular pieces with the network stitching the pieces together
• The interconnection of the pieces increases ”East / West” traffic
• Network performance is therefore critical to the composite application performance
• Applications are being developed and deployed very differently today.
Pod
Pod
PodPod
Pod
Pod
Pod
Microservices & Containers have changed many things…
• How do application trends impact SDN / NFV?
Pod
Pod
PodPod
Pod
Pod
Pod
Traffic
Traffic
Traffic
Pod
Agent
VPP
Pod
Agent
VPP
Pod
Agent
VPP
Pod
Agent
VPP
Pod
Agent
VPP
Pod
Agent
VPP
Pod
Agent
VPP
Data Plane Microservices
• Network Functions can be implemented using containers
• The microservice model enables decomposition of Network Functions into finer-grained Cloud-native Network Functions (CNFs)
• Since network functions are I/O bound network performance is even more critical than for applications
Remember 1965 ”Moore’s Law” – ..MK MK
Electromechanical Solid-stateRelay
Vacuum Tube Transistor Integrated Circuit1010
106
104
102
1
10-2
10-4
10-6
Moore’sLaw 1965
108
Calcu
latio
ns p
er Se
cond
per
$1,
000
Resources to Get Performance1) Processor and CPU cores
for performing packet processing operations_ __
2) Memory bandwidthfor moving data (packets, lookup) and instructions (packet processing)
3) I/O bandwidthfor moving packets to/from NIC interfaces _
4) Inter-socket bandwidthfor handling inter-socket operations _
Processing Packets: How to Use Compute ..
Socket 0
Skylake Server CPU
Socket 1
Skylake Server CPU
UPI
UPI
PCIe
DDR4 DDR4
DDR4
SATA
B IOS
1
4
3
2
3
1
2
DDR4
DMI
PCH
DD
R4
DD
R4
DD
R4
DD
R4
DD
R4
DD
R4
DD
R4
DD
R4
PCIe
x16
PCIe
x16
PCIe
x16
PCIe
x16PC
Iex16
PCIe
x16
UPI
280 Gbps@ 1,000Bytes ETH frames
100GE 100GE100GE100GE 100GE100GE
280 Gbps@ 1,000Bytes ETH frames
Tℎ#$%&ℎ'%( [*'+] = Tℎ#$%&ℎ'%( ''+ ∗ /0123(_5673[''+]
CyclesPerPacket[ClockCycles]= #FGHIJKLIMNGHOPLQRI
∗ #STLURHFGHIJKLIMNG
Tℎ#$%&ℎ'%( [''+] = V]OPLQRI_OJNLRHHMGW_XMYR[HRL=
]SOZ_[JR\[]^STLURH__RJ_OPLQRI
1
2
3
4
MK MK
Socket 0
Skylake Server CPU
Socket 1
Skylake Server CPU
UPI
UPI
PCIe
DDR4 DDR4
DDR4
SATA
B IOS
1
4
3
2
3
1
2
DDR4
DMI
PCH
DD
R4
DD
R4
DD
R4
DD
R4
DD
R4
DD
R4
DD
R4
DD
R4
PCIe
x16
PCIe
x16
PCIe
x16
PCIe
x16PC
Iex16
PCIe
x16
UPI
280 Gbps@ 1,000Bytes ETH frames
100GE 100GE100GE100GE 100GE100GE
280 Gbps@ 1,000Bytes ETH frames
Tℎ#$%&ℎ'%( [*'+] = Tℎ#$%&ℎ'%( ''+ ∗ /0123(_5673[''+]
CyclesPerPacket[ClockCycles]= #FGHIJKLIMNGHOPLQRI
∗ #STLURHFGHIJKLIMNG
Tℎ#$%&ℎ'%( [''+] = V]OPLQRI_OJNLRHHMGW_XMYR[HRL=
]SOZ_[JR\[]^STLURH__RJ_OPLQRIMoore’s Law in Action
Resources to Get Performance1) Processor and CPU cores
FrontEnd: faster instr. decoder (4- to 5-wide)BackEnd: faster L1 cache, bigger L2 cache, deeper OOO* executionUncore: move from ring to X-Y fabric mesh
2) Memory bandwidth~50% increase: channels (4 to 6), speed (DDR-2666)
3) I/O bandwidth>50% increase: PCIe lanes (40 to 48), re-designed IO blocks
4) Inter-socket bandwidth~60% increase: QPI to UPI (2x to 3x), interface speed (9.6 to 10.4 GigTrans/sec)
1
2
3
4
Processing Packets: What Improves in Compute ..
MK MK
FD.io VPP – The “Magic” of VectorsCompute Optimized SW Network Platform
1Packet processing is decomposedinto a directed graph of nodes …
Packet 0
Packet 1
Packet 2
Packet 3
Packet 4
Packet 5
Packet 6
Packet 7
Packet 8
Packet 9
Packet 10
… packets move through graph nodes in vector …2
Microprocessor
… graph nodes are optimized to fit inside the instruction cache …
… packets are pre-fetched into the data cache.
Instruction Cache3
Data Cache4
3
4
Makes use of modern Intel® Xeon® Processor micro-architectures. Instruction cache & data cache always hot è Minimized memory latency and usage.
vhost-user-input
af-packet-input dpdk-input
ip4-lookup-mulitcast ip4-lookup*
ethernet-input
mpls-inputlldp-input
arp-inputcdp-input...-no-
checksum
ip6-inputl2-input ip4-input
ip4-load-balance
mpls-policy-encap
ip4-rewrite-transit
ip4-midchain
interface-output
* Each graph node implements a “micro-NF”, a “micro-NetworkFunction” processing packets.
MK MK
K8s perspective: Orchestation/K8s incl. placement for scale-out• We have all of this performance, how do we use it
• How do we make it work for us at scale
• How do we intelligently make an efficient use of available resource to deliver intended application and networking services
MK MK
LIGATOContiv
Cloud-native Network Micro-ServicesFor Native Cloud Network Services
Production-GradeContainer Orchestration
Cloud-native Network FunctionOrchestration
Containerized FastData Input/ Output
Performance-CentricContainer Networking
Enabling Production-Grade Native Cloud Network Services at Scale
MK GH
• ”Kubernetes is a portable, extensible open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation.”
• Terminology:• Node: a host or VM on which pods can be scheduled• Pod: one or more co-resident linux containers with a single IP address
(typically a /24 of address space is provided per node)• Deployment: a set of pods implementing the same application• Service: an abstraction providing a single persistent IP address for a deployment
(Kubernetes provides mechanisms to load balance across multiple pods)
• Kubernetes assumes seamless connectivity between pods, wherever it places them• A networking plugin is needed to abstract the network
Kubernetes Overview
GH GH
• Contiv-VPP is a networking plugin for Kubernetes that:• Allocates IP addresses to Pods (IPAM)• Programs the underlying infrastructure it uses (VPP and the Linux TCP/IP stack) to
connect the pods to other pods in the cluster and/or to the external world• Implements K8s network policies that define which pods can talk to each other• Implements K8s services using a stateful load-balanced NAT function
• Contiv-VPP is a user-space, high-performance, high-density networking plugin for Kubernetes - leveraging FD.io/VPP as the industry’s highest performance software data plane
Contiv-VPP Overview
GH GH
Kubelet
CNI
tapv2/veth
Contiv-VPP vswitch
Agent
PodPod
Pod
VPP
…
K8s Master
Data Centre Fabric
App
Kernel Host stack
Legacy Apps K8s State
Reflector
Contiv-VPP
Etcd
Kubelet
CNI
tapv2/veth
Contiv-VPP vswitch
Agent
PodPod
Pod
VPP
App
Kernel Host stack
High PerformanceApps
PodPod
Pod
Envoy Sidecar App
VPP
TCP
Stack
PodPod
Pod
High PerformanceApps
Envoy SidecarApp
VPP
TCP
Stackmemif
Legacy Apps
PodPod
Pod
CNF
memif
Cloud-Native NFs
PodPod
Pod
CNF
Cloud-Native NFs
K8s policy & state
distribution
Contiv-VPP Architecture
• Can deliver complete container networking solution entirely from user-space
• Replace all eth/kernel interfaces with memif/user-space
interfaces.
• Apps can add VCL library for Higher Performance (bypass
Kernel host stack and use VPP TCP stack)
• Legacy apps can still use the kernel host stack in the same architecture
GH GH
• Networking• HTTP or NAT-based load balancing isn‘t suitable for NFV use-cases• No support for multiple interfaces or IP addresses per pod• No support for high-speed wiring of NFs
• Policy• No support for QoS, network-aware placement etc.
• Isolation• Applications run in user-space – kernel networking is unsuited to NFV
• Performance• Polling mode drivers (e.g. DPDK) required for maximum throughput
What Container Networking Lacks for NFV Use-Cases
GH GH
TopologyPlacement
(K8s)Rendering
NF1 NF2 NF3
Logical RepresentationIngress Network
Ingress Classifier
Egress Network
Egress Classifier
IngressRouter
EgressRouter
Host
VPP Vswitch
CNF
VPP
10.1.0.127
…
CNF1
VPP
…
CNF2
VPP
…
…Server
Vswitch VPP
CNF
VPP
…
CNF
VPP
…
CNF3
VPP
…
…
Overlay Tunnel
Physical Representation
Overlay Tunnel Overlay Tunnel
Ingress Classifier Egress Classifier
Service Function Chaining with Ligato
GH GH
Kubelet
CNICRI
tapv2/veth
Contiv-VPP vswitch
Agent
PodPodPodPodPod
Pod
VPP
Data Centre Fabric
High PerformanceApps
Sidecar Proxy App
VPP TCPStack
App
Kernel Host stack
Legacy Apps
• Kubernetes does not provide a way to stitch micro-services together today• Ligato enables you to wire the data plane together into a service topology• Network functions can now become part of the service topology• Dedicated Telemetry Engine in VPP to enable closed-loop control• Offload functions to NIC but via vSwitch in host memory
Contiv-VPP Etcd
K8s Master
Contiv-VPPNetmaster
DefineTopology
LigatoController
DefineServices
DefineTopology
Ligato – Cloud-native NFs (CNFs)
Smarts in NICs
TelemetryEngine
Back Propagation LoopFor Reactive Placement/Rsrc Mgmt
PodPodPod
memif
Cloud-Native VNFs
Agent
VPP
GH GH
Unleashing Innovation in Networking• Container networking requires fast innovation cycles to deploy new models• Kernel upgrades and ad hoc modules may cause problems in production environments• Multiple options are being explored in the industry
Technology Programmability Model Execution ContextOVS OpenFlow lets users configure very granular
data path behaviourPrimarily kernel basedOpenflow model
eBPF+XDP Packets handled by user coded eBPFprograms running in a sandbox
Bypass kernel networking stack but still in kernel-modeBytecode + JIT + kernel helpers
FD.io / VPP Regular user-mode C program with user loadable plugins
User-mode, no kernel dependenciesNative NIC drivers, Linux APIs, DPDK
JT GH
Baremetal Data Plane Performance Limit FD.io benefits from increased Processor I/O
YESTERDAY
Intel® Xeon® E5-2699v422 Cores, 2.2 GHz, 55MB Cache
Network I/O: 160 GbpsCore ALU: 4-wide parallel µopsMemory: 4-channels 2400 MHzMax power: 145W (TDP)
1
2
3
4
Socket 0
BroadwellServer CPU
Socket 1
BroadwellServer CPU
2
DD
R4
QPI
QPI
4
2
DD
R4
DD
R4
PC
Ie PC
Ie
PC
Ie
x8 50GE
x16 100GE
x16 100GE
3
1
4
PC
Ie PC
Ie
x8 50GE
x16 100GE
Ethernet
1
3DD
R4
DD
R4
DD
R4
DD
R4
DD
R4
SATA
B IOS
PCH
Intel® Xeon® Platinum 816824 Cores, 2.7 GHz, 33MB Cache
TODAY
Network I/O: 280 GbpsCore ALU: 5-wide parallel µopsMemory: 6-channels 2666 MHzMax power: 205W (TDP)
1
2
3
4
Socket 0
Skylake Server CPU
Socket 1
Skylake Server CPU
UPI
UPI
DDR4 DDR4
DDR4
PC
Ie
PC
Ie
PC
Ie PC
Ie
PC
Ie
PC
Ie
x8 50GE
x16 100GE
x8 50GE
x16 100GE
x16 100GE
SATA
B IOS
2
4
2
1
3
1
4
3
x8 50GE
DDR4
PC
Ie
x8 40GE
LewisburgPCH
DD
R4
DD
R4
DD
R4
DD
R4
DD
R4
DD
R4
DD
R4
DD
R4
0 200 400 600 800 1000 1200
160
280
320
560
640
Server[1 Socket]
Server[2 Sockets]
Server2x [2 Sockets]
+75%
+75%
PCle Packet Forwarding Rate [Gbps]
Intel® Xeon® v3, v4 Processors Intel® Xeon® Platinum 8180 Processors
1,120*Gbps+75%
* On compute platforms with all PCIe lanes from the Processors routed to PCIe slots.
Breaking the Barrier of Software Defined Network Services1 Terabit Services on a Single Intel® Xeon® Server !
FD.io Takes Full Advantage of Faster Intel® Xeon® Scalable Processors
No Code Change Required
https://goo.gl/UtbaHy
MK MK
24
Scalability and Self-healing
Most Portable and Most Efficient on the Planet
Rich NFV Functionality
Open Source
Cloud Native
SOFTWARE DEFINED NETWORKING
CLOUD NETWORK SERVICES
LINUX FOUNDATION
PORTABILITY AND EFFICIENCY
SCALABILITY and SELF-HEALING
High Performance Cloud-Native NetworkingK8s Unleashing FD.io
MK MK
High Performance Cloud-Native NetworkingK8s Unleashing FD.io
THANK YOU !
MK MK GH