Simplify Networking for Containers
叶磊 曹水
华为 中央软件院 云网络实验室
2
The Nature of Container Network
3
cloud native and containerised micro-services
high density/dynamic
complex deployment scenarios
online monitoring and control
E2E Monitoring
VM ContainersPublicCloud
PrivateCloud
L2/L3 Overlay Tunnel
SLA (Application to Application)
more applications and micro services are deployed in containers
4
deployment complexity
public clouds: AWS/Azure/HEC
NFV: SR-IOV/L2/L3
private clouds: openstack/vmware/
baremetal
sim
ple
fla
t co
nta
iner
netw
ork
mod
el: C
NI
com
ple
x deplo
ym
ent
scenarios
5
deployment complexity
public clouds: AWS/Azure/HEC
NFV: SR-IOV/L2/L3
private clouds: openstack/vmware/
baremetal
sim
ple
fla
t co
nta
iner
netw
ork
mod
el: C
NI
com
ple
x deplo
ym
ent
scenarios
existing solutions are suitable for limited cases with hard-coded “plugins”
require a flexible solution that always adapts the best technology
based on specific situation
6
Neutron
KuryrBare
mental
Traditional OS
Socket
OpenStackBackend
TCP/IP Stack
Container OS
Socket
TCP/IP Stack
vSwitchOVERLAY
NICs Driver
Underlay vRouter
Underlay Kuryr
OVERLAY
Cloud Provider
XENKVM
IRONIC
Container OS
Socket
vNICDRIVER
TCP/IP Stack
Container OS
Socket
vNICDRIVER
TCP/IP Stack
Container OS
NICDRIVER
Network Stack (Iaas)
vSwitchOVERLAY
VPC vRouter
vSwitchOVERLAY
vSwitchOVERLAY
Traditional OS
Socket
TCP/IP Stack
BridgeOVS L2
OVERLAY OVERLAY OVERLAY
Socket
TCP/IP Stack
SNDbackend
HostGW
SuSE12
Socket
Native Driver
TCP/IP StackBridgeOVS L2
OVERLAY
Container OS
L2
Bare mental
Bare mental
Hetero OS
Pass through
ContainerOS
DPDK API
VF DPDK PMD
ContainerOS
DPDK API
vNICDPDK PMD
VF DPDK PMD
VF PassThroughOVERLAY L2
ContainerOS
DPDK API
vNICDPDK PMD
VF DPDK PMD
Cloud Provider
Bare mental
OVERLAY
Container OS
Socket
TCP/IP Stack
vSwitchOVERLAY
NICs Driver
L2
Container OS
vSwitchOVERLAY
NICs Driver
Bare Mental Host
Socket
TCP/IP Stack
vSwitchL2
NICs Driver
Socket
TCP/IP Stack
vSwitchOVERLAY
vSwitchL2
Public Cloud
Private Cloud
How we deal with so many scenarios for containers?
7
Kernel Network Stack
vNIC@Container
vNIC@Container
vNIC@Container
Functionfeature
Rich, identical to Kernel
Performance Normal
Compatibility Very good
Customized Network Stack
Customized Socket Lib
Customized Socket Lib
Customized Socket Lib
Functionfeature
Normal, according to Customized Stack
Performance Good, about 3 times than Kenel
Compatibility Normal, maybe misssome socket function
DPDK PMD
DPDK
DPDK Client@ Container
Functionfeature
Poor, according to DPDK application
Performance Very good, identical to wire speed
Compatibility Poor, only DPDK ENV
DPDK
DPDK Client@ Container
DPDK
DPDK Client@ Container
Why we need so many models
8
Our solution: iCAN (intelligent Container Network)
an extensible framework to
•program various container network data path and policies
•adapt to different orchestrators
•support end-to-end SLA between containerisedapplications
9
iCAN architecture
kubernetes master
iCAN SLA schd ext.
iCAN master
SLA-annotated policy
etcd
Standard Netwok Component (SNC)
models
node
iCAN agent
kubernets agent
iCAN monitor master
node
node
iCAN agent
kubernets agent
node
iCAN agent
kubernets agent
SNC configurations
monitoring report
aggregated report
aggregated report
aggregated report
10
CNI Interface Extension
br-intX
Node
PodX
Eth
0
br-intY
PodY
Eth
1
PhyNet1
PhyNet2
PaaS①CNI ADD
{
"cniVersion": "0.2.0",
"name": "IDM-M",
"type": "bridge-veth",
// type (plugin) specific
"vlanID": 42,
"ipam": {
"type": "dhcp",
"routes": [ { "dst": "10.3.0.0/16" }, { "dst": "10.4.0.0/16" } ]
}
// args may be ignored by plugins
"args": {
"labels" : {
" phynet " : " Phy_Net1"
}
}
}
{
"cniVersion": "0.2.0",
"name": "IDM-M",
"type": "bridge-veth",
// type (plugin) specific
"vlanID": 42,
"ipam": {
"type": "dhcp",
"routes": [ { "dst": "10.3.0.0/16" }, { "dst": "10.4.0.0/16" } ]
}
// args may be ignored by plugins
"args": {
"labels" : {
" phynet " : " Phy_Net1"
}
}
}
{
"cniVersion": "0.2.0",
"name": "IDM-C",
"type": "bridge-veth",
// type (plugin) specific
"vlanID": 43,
"ipam": {
"type": "dhcp",
"routes": [ { "dst": "10.3.0.0/16" }, { "dst": "10.4.0.0/16" } ]
}
// args may be ignored by plugins
"args": {
"labels" : {
" phynet " : " Phy_Net2"
}
}
}
①CNI ADD
②CNI Network Configuration②CNI Network Configuration
Once with one ticket Once with multi ticket
1) Parameters on CNI Network Configuration ,support Once or Multi entry;2) Reuse the CNI’s common agreement, all customized fields within ”args” segment;
11
Standard Network Component (SNC) model
abstract for network components in data-path• interfaces, devices and templates
L2 device
l2 interface
L2 devices: bridge/macvlan/ovs/…
L3 device
l3 interface
L3 devices: router/ipvlan/…
L2 dev:linux bridge
L3 dev: IPS
a template for Flannel data path
l3 interface
l2 paired interface
l2 interface
l2 paired interface
12
Unified Framework For Multi Models
Container MNG
Flannel Plugin Calico Plugin ……
Linux BR
Kernel VxLAN
Kernel Route
Kernel Route
PGP RouteSync
Linux BR
KernelRoute
Container MNG
Canal Plugin
Linux BR
Kernel VxLAN
Kernel Route
Kernel Route
PGP RouteSync
GRE Tunnel
IPIPTunnel
Flannel Type Calico Type SR-IOV type
UserStack
SR-IVOThrough
● Existing every Plugins only support its own model
● Though they employ common data module, the function is isolated
● After deconstruct different data path , we setup a DSL language to describe them ,using abstracted standard component
● Unified Framework with Pluggable drivers for additional vSwitches, Linux BR, SR-IOV, ...
13
Big Pic of Multi-modes && Multi-planes
PHY-NET
PHY-OM PHY-MNG PHY-DATA
NIC NICNIC NIC NIC NIC NIC NIC NIC NIC
NICNIC
DPDKPMD
UservSwitch
UserStack
@Host
Container
UserSocket Lib
ContainerUser
Socket Lib
bonding
DPDKPMD
UservSwitch
UserStack
@HostProcess
APP
Container
UserSocket Lib
ContainerUser
Socket Lib
Container
DPDKAPP
UIO UIO
DPDKPMD
Container
DPDKAPP
UIO UIO
DPDKPMDvSwitch
Container
KernelStack
bonding
Process APPvSwitch
Container
KernelStack
14
Open stack Neutron Ml2 SolutionNeutron Server
ML2 Plugin
Host A
Linuxbridge Agent
Host B
Hyper-V Agent
Host C
Open vSwitch Agent
Host D
Open vSwitch Agent
API Network
Neutron Server
ML2 Plugin
Host A
Modular Agent
Host B
Modular Agent
Host C
Modular Agent
Host D
Modular Agent
API Network
● Existing ML2 Plugin works with existing agents
● Separate agents for Linuxbridge, Open vSwitch, and Hyper-V
● Combine Open Source Agents, a single agent which can support Linuxbridge and Open vSwitch
● Pluggable drivers for additional vSwitches, Infiniband, SR-IOV, ...
15
iCAN Control Plane Integrated with Openstack
LocalNode
KuberletCANAL Agent
C C C C C C
CANAL Master
Distributed KV store (etcd)
KubernetesMaster
Monitoring controller
SLA Manager
IPAM
Neutron controller
Openstack
Neutron Server
Kuryr AgentControlNode
Neutron controller
16
Monitoring based SNC Modeling
pDevpPortpIF
pDevpPortpIF
vDevvPort
vIF
pIF
vPort
vIF
vPort
C1
vIF
C2
vIF
vDev
vPortvIF
pIF
vPort
C3
vIF
•E2E Monitoring
Point Monitor Item
Source Dest
T1
T4
T2
T3
Latency = ((T4 - T1) - (T3 - T2)) / 2
Monitoring on local SNC components :
Latency:
Generate E2E monitoring data in master node:
Monitoring Agent
Monitoring Agent
… …
Monitoring Master
•E2Ethrought:minimal throughput•E2E Drop rate: deviations between RX and TX•Throughput Analysis:data from local node
Bandwidth Throughput Status QoS CPU utilization
17
Simplify Network SLA modeling iCAN provides north bound interfaces for orchestration and applications to define their requirements through PG(Pod Group: a group of pods with
the same functions), Linking (network requirement between PG) , SLA Service types and Service LB Type.
Given topology and link bandwidth, evaluate the offers when deploying pods. Essentially a evaluation for pod placement, and validate the
deployment.
2-Tiers Network topology management Underlay Network(Stable and Predictable) and Overlay Network (Customizable and Dynamic)
Support: bandwidth, latency and drop rate
Bandwidth <5%
Latency <10%, more non-deterministic, affected by many factors such as queuing in software switch and hardware, application response, server IO, etc
Web
Web
DB
DB
Web
Internet
10Mbps (x3)5Mbps (x6)
Web
Web
DB
DBInternet
10Mbps (x2)
Latency: Low
User 1
User 2
Polices DeploymentSchedulervalidation
Convert link requirement to node requirement
18
iCAN Container networking
Multi-dimension SLA& Security
Performance Isolation with bandwidth, latency, drop rate(Proactive Network SLA and Reactive Network SLA )Security Isolation: VLAN/VXLAN, ACL
Rich Network SupportPowerful network component modeling : SNC and Modeling via YangRich network schemes, support L2, Overlay, NAT, VLAN, L3, BGP, VPCAccelerated Network Stack
Powerful Monitoring Implement “monitoring on-demand ”and “E-to-E monitoring” based on the topologyFacilitate on-demand DSL based troubleshootingCooperate with the SLA subsystem to assess the SLA quality
Copyright©2016 Huawei Technologies Co., Ltd. All Rights Reserved.The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.
Thank You.
20
SNC Template Execution Workflow
❶ Network-Agent Local initialized base on Node Network Pool Configuration and Network Capability Strategy, generate Node Network Capability Configuration(NNCC) .
❷ Node received template deployment request, check NNCC. If node can’t meet requirement, return failure, otherwise will return Network Configuration Deployment Template (NCDT) with information ❸;
❹After, send network deployment request to Network-Element as NCDTdefined, Finally executed by related network driver;
Linux-Bridge
Node Network PoolConfiguration
Name Location
Linux-Bridge
Host
Linux-
Bridge
Container
IP-Filter Host
OVS Host
Network Capability Strategy
Cap-Name
Request
Priority-Index
GRE Linux-Bridge;GRE-Ko; IP-Filter
0
GRE OVS;IP-Filter
1
VxLAN OVS; IP-Filter
0
VxLan EVS;IP-Filter
1
L2-DHCP
Dnsmasq;Linux-Bridge;
0
Cap-Name
Location
Priority-Index
GRE Container
1
GRE Host 1
L2-DHCP
Host 0
Node Network Capability Configuration
Network Configuration Deploy Template
Resource Parameter
Bridge @Host,VxLan-cap,GRE-cap…….
GRE-Tunnel @Container…….
Network Configuration Template
Resource Deploy-with Parameter
Bridge OVS @Host,VxLan-
cap,GRE-cap…….
GRE-Tunnel GRE-Ko @Container…….
Internal-Link IP-Filter veth
Bridge Linux-Bridge @Container…
Resource Driver-func
Bridge Create();Delete();Configure()…..
OVS
Resource Driver-func
Bridge Create();Delete();Configure()…..
GRE-Tunneel
Create();Delete();Configure()…
IP-Filter
Resource Driver-func
Internal-Link
Create();Delete();Configure()…..
❷
❸
❹ ❹
❶
❹
21
Modeling for Standard Network Component Standard Network Component can help to :
Decouple network control with implementation
Replace and upgrade network components seperately
Provide on-demanding network solution and SLA for application
L3: IPVLAN
L3_IF L3_IF L3_IF
L2_IF
IPVLAN SNC: L3_IFL3_DEVL2_IF
L2: MACVLAN
L3_IF L3_IF L3_IF
L2_IF
MACVLAN SNC:L3_IFL2_DEVL2_IF
L2-dev
Paired-IF Paired-IF Paired-IF
IFD/IPA
L3: IPS
L3_IFCALICO SNC:
Paired-IFL2_DEVL3_DEVL3_IF
22
Example: Support with Flannel(VxLAN backend mode)via SNC Modeling (kernel based Overlay)
== Operating abstraction:- CreateSubnet() -- get subnet information via etcd API- L2:SW.CreateDevice() => "l2_sw_dev"- L2:SW.CreatePort(port_L) - L2:SW.CreatePort(port_R)- Overlay:Flannel.CreateDevice() => "flannel_dev"- Overlay:Flannel.Connect(flannel_dev.inf_L, l2_sw_dev.port_R)- Overlay:Flannel.Connect(flannel_dev.inf_R, eth0)- Link:vNIC-pair.CreateDevice() => "link_dev"- Link:vNIC-pair.Connect(link_dev.inf_R, l2_sw_dev.port_L)
L2-Device:vSwitch
Overlay: flannel
Link-Device:vNIC-pair
Flannel Template:Port_L Port_R
SNC interfaces: /* L2:SW device definition */{
/* members */string port[];
/* methods */CreateDevice(); // creat L2:SW deviceCreatePort(string port_name);
}/* Overlay:Flannel device definition */{
/* members */string inf_L;string inf_R;/* methods */CreateDevice(); Connect(string inf, string port);
}/* Link:VNIC-pair device definition */{
/* members */string inf_L;string inf_R;/* methods */CreateDevice();Connect(string inf, string port)
}
Page 12 模板化实例