Hyper-V.nu event 16-04-2013 – Hyper-V Networking Evolved – Didier Van Hoye (transcript)
Windows Server 2012 Hyper-V Networking Evolved
Didier Van Hoye
Technical Architect – FGIA
Microsoft MVP & MEET Member
http://workinghardinit.wordpress.com
@workinghardinit
What We’ll Discuss
• Windows Server 2012 Networking
– Changed & improved features
– New features
– Relationship to Hyper-V
Why We’ll Discuss This
• We face many network challenges
– Keep systems & services running
• High to continuous availability
• High reliability & reduced complexity
• Security, multitenancy, extensibility
– Cannot keep throwing money at it (CAPEX)
• Network virtualization, QoS, bandwidth management in box
• Performance (latency, throughput, scalability)
• Leverage existing hardware
– Control operational cost (OPEX): reduce complexity
Eternal Challenge = Balanced Design
[Diagram: balanced design – memory, CPU, storage & network weighed against availability, capacity, cost & performance]
Network Bottlenecks
In the host networking stack
In the NICs
In the switches
[Diagram: Dell PowerEdge M1000e blade chassis with streams of network traffic]
Socket, NUMA, Core, K-Group
– Processor: one physical processor, which can consist of one or more NUMA nodes. Today a physical processor ≈ a socket, with multiple cores.
– Non-uniform memory architecture (NUMA) node: a set of logical processors and cache that are close to one another.
– Core: one processing unit, which can consist of one or more logical processors.
– Logical processor (LP): one logical computing engine from the perspective of the operating system, application or driver. In effect, a logical processor is a thread (think hyper-threading).
– Kernel Group (K-Group): a set of up to 64 logical processors.
Receive Side Scaling (RSS)
Receive Segment Coalescing (RSC)
Dynamic Virtual Machine Queuing (DVMQ)
Single Root I/O Virtualization (SR-IOV)
NIC TEAMING
RDMA/Multichannel support for virtual machines on SMB3.0
Advanced Network Features (1)
Receive Side Scaling (RSS)
– Windows Server 2012 scales RSS to the next generation of servers & workloads
– Spreads interrupts across all available CPUs
– Even for those very large scale hosts
– RSS now works across K-Groups
– RSS is even NUMA-aware to optimize performance
– Now load balances UDP traffic across CPUs
– 40% to 100% more throughput (backups, file copies, web)
RSS NIC with 8 queues
[Diagram: incoming packets spread across queues to NUMA nodes 0–3]
RSS improves scalability on multiple processors / NUMA nodes by distributing TCP/UDP receive traffic across the cores in different nodes / K-Groups
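Not on the slides, but as a sketch of how this maps to the inbox Windows Server 2012 NetAdapter cmdlets (the adapter name "SLOT 2" is an example):

```powershell
# Show RSS state, queue count and the processor range in use
Get-NetAdapterRss -Name "SLOT 2"

# RSS is usually on by default; enable it explicitly if needed
Enable-NetAdapterRss -Name "SLOT 2"

# Constrain RSS to a processor range, e.g. to keep receive
# interrupts on one NUMA node and away from CPU 0
Set-NetAdapterRss -Name "SLOT 2" -BaseProcessorNumber 2 -MaxProcessors 8
```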
Receive Segment Coalescing (RSC)
– Coalesces packets in the NIC so the stack processes fewer headers
– Multiple packets belonging to a connection are coalesced by the NIC into a larger packet (max of 64 KB) and processed within a single interrupt
– 10–20% improvement in throughput & CPU workload (offloaded to the NIC)
– Enabled by default on all 10 Gbps adapters
[Diagram: NIC with RSC coalescing incoming packets into a larger buffer]
RSC helps by coalescing multiple inbound packets into a larger buffer or “packet”, which reduces per-packet CPU costs as fewer headers need to be processed.
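As a sketch (not from the slides), RSC is inspected and toggled per adapter with the NetAdapter cmdlets; the adapter name is an example:

```powershell
# Per-adapter RSC state for IPv4 and IPv6, including whether the
# operational state differs from what is configured
Get-NetAdapterRsc

# Toggle RSC on a specific adapter
Enable-NetAdapterRsc -Name "SLOT 2"
Disable-NetAdapterRsc -Name "SLOT 2"
```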
Dynamic Virtual Machine Queue (DVMQ)
VMQ is to virtualization what RSS is to native workloads.
It makes sure that routing, filtering etc. are done by the NIC in queues, and that the interrupts for those queues don't all land on one processor (0).
Most inbox 10 Gbps Ethernet adapters support this.
Enabled by default.
[Diagram: network I/O path without VMQ vs. with VMQ – root partition, CPU0–CPU3]
Dynamic Virtual Machine Queue (DVMQ)
Adaptive optimal performance across changing workloads
[Diagram: No VMQ vs. Static VMQ vs. Dynamic VMQ – queue interrupts moving across CPU0–CPU3 in the root partition as the workload changes]
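Not on the slides, but as a sketch with the inbox cmdlets (the adapter name is an example):

```powershell
# Which adapters support VMQ, and is it enabled?
Get-NetAdapterVmq

# Enable VMQ on the adapter bound to the Hyper-V switch and start
# its processor assignment beyond CPU 0
Enable-NetAdapterVmq -Name "SLOT 3"
Set-NetAdapterVmq -Name "SLOT 3" -BaseProcessorNumber 2

# Show the queues currently allocated to virtual NICs
Get-NetAdapterVmqQueue
```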
Single-Root I/O Virtualization (SR-IOV)
– Reduces CPU utilization for processing network traffic
– Reduces latency path
– Increases throughput
– Requires:
• Chipset: interrupt & DMA remapping
• BIOS support
• CPU: hardware virtualization, EPT or NPT
[Diagram: network I/O path without SR-IOV (Hyper-V switch doing routing, VLAN, filtering and data copy over VMBus) vs. with SR-IOV (a virtual function in the physical NIC mapped directly into the VM's virtual NIC)]
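As a sketch (not from the slides; switch, adapter and VM names are examples):

```powershell
# IOV must be enabled when the switch is created; it cannot be
# switched on afterwards
New-VMSwitch -Name "IOV-Switch" -NetAdapterName "SLOT 4" -EnableIov $true

# An IovWeight above 0 requests a virtual function for the VM's NIC
Set-VMNetworkAdapter -VMName "VM01" -IovWeight 100

# Check platform support, and the reasons when IOV is unavailable
(Get-VMHost).IovSupport
(Get-VMHost).IovSupportReasons
```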
SR-IOV Enabling & Live Migration
– Turn on IOV: enable IOV (VM NIC property); a virtual function is “assigned”; a software “NIC” is automatically created; traffic flows through the VF
– Live migration: switch back to the software path; remove the VF from the VM; migrate as normal
– Post migration: reassign a virtual function, assuming resources are available
– The VM has connectivity even if:
• the switch is not in IOV mode
• no IOV physical NIC is present
• the NIC vendor is different
• the NIC firmware is different
[Diagram: software switch in IOV mode with virtual function and software NIC on the source and destination hosts; the software path is not used while the VF is assigned]
NIC TEAMING
– Customers are dealing with way too many issues.
– NIC vendors would like to get rid of supporting this.
– Microsoft needs this to be competitive & complete the solution stack + reduce support issues.
NIC Teaming
– Teaming modes:
• Switch dependent
• Switch independent
– Load balancing:
• Address hash
• Hyper-V Port
– Hashing modes:
• 4-tuple
• 2-tuple
• MAC address
– Active/Active & Active/Standby
– Vendor agnostic
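The modes above combine in PowerShell; a sketch, not from the slides (team and member names are examples):

```powershell
# Switch-independent team with Hyper-V Port load balancing
New-NetLbfoTeam -Name "VMTeam" -TeamMembers "NIC1","NIC2" `
    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort

# Active/Standby: demote one member to hot standby
Set-NetLbfoTeamMember -Name "NIC2" -AdministrativeMode Standby
```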
[Diagram: NIC Teaming (LBFO) architecture – in user mode the LBFO admin GUI and LBFO configuration DLL talk over WMI and IOCTL to the kernel-mode LBFO provider; the IM MUX exposes virtual miniport 1 above the protocol edge and binds ports 1–3 to NIC 1–3; the provider handles frame distribution/aggregation, failure detection and control protocol implementation, with the Hyper-V Extensible Switch above and the network switch below]
NIC TEAMING (LBFO)
[Diagram: parent NIC teaming – an LBFO teamed NIC below the Hyper-V virtual switch, SR-IOV not exposed to the VM – vs. guest NIC teaming – two SR-IOV NICs, each with its own Hyper-V virtual switch, teamed inside a VM running Windows Server 2012]
NIC Teaming & QOS
• NIC Teaming, Hyper-V switch, QoS and actual performance | part 1 – Theory
• NIC Teaming, Hyper-V switch, QoS and actual performance | part 2 – Preparing the lab
• NIC Teaming, Hyper-V switch, QoS and actual performance | part 3 – Performance
• NIC Teaming, Hyper-V switch, QoS and actual performance | part 4 – Traffic classes
SMB Direct (SMB over RDMA)
What
• Addresses congestion in the network stack by offloading the stack to the network adapter
Advantages
• Scalable, fast and efficient storage access
• High throughput, low latency & minimal CPU utilization
• Load balancing, automatic failover & bandwidth aggregation via SMB Multichannel
Scenarios
• High-performance remote file access for application servers like Hyper-V, SQL Server, IIS and HPC
• Used by File Server and Cluster Shared Volumes (CSV) for storage communications within a cluster
Required hardware
• RDMA-capable network interface (R-NIC)
• Three types: iWARP, RoCE & InfiniBand
[Diagram: application on the SMB client reaching a disk on the SMB server – user/kernel stack with NTFS/SCSI – via R-NICs over a network with RDMA support]
SMB Multichannel
Multiple connections per SMB session
Full Throughput
• Bandwidth aggregation with multiple NICs
• Multiple CPU cores engaged when using Receive Side Scaling (RSS)
Automatic Failover
• SMB Multichannel implements end-to-end failure detection
• Leverages NIC teaming if present, but does not require it
Automatic Configuration
• SMB detects and uses multiple network paths
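Not from the slides, but as a sketch of verifying what Multichannel is doing with the inbox SMB cmdlets:

```powershell
# Multichannel is on by default on the SMB client
Get-SmbClientConfiguration | Select-Object EnableMultiChannel

# Interfaces the client considers, with their RSS/RDMA capability
Get-SmbClientNetworkInterface

# Connections actually established per SMB session
Get-SmbMultichannelConnection
```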
SMB Multichannel Single NIC Port
1 session, without Multichannel:
• No failover
• Can’t use the full 10 Gbps – only one TCP/IP connection
• Only one CPU core engaged
1 session, with Multichannel:
• No failover
• Full 10 Gbps available – multiple TCP/IP connections
• Receive Side Scaling (RSS) helps distribute the load across CPU cores
[Diagram: SMB client and server with one 10GbE RSS NIC each through a 10GbE switch; CPU utilization per core shown for cores 1–4]
SMB Multichannel Multiple NIC Ports
1 session, without Multichannel:
• No automatic failover
• Can’t use the full bandwidth – only one NIC engaged
• Only one CPU core engaged
1 session, with Multichannel:
• Automatic NIC failover
• Combined NIC bandwidth available – multiple NICs engaged
• Multiple CPU cores engaged
[Diagram: SMB clients and servers with two 10GbE RSS NICs each, connected through 10GbE switches]
SMB Multichannel & NIC Teaming
1 session, NIC Teaming without Multichannel:
• Automatic NIC failover
• Can’t use the full bandwidth – only one NIC engaged
• Only one CPU core engaged
1 session, NIC Teaming with Multichannel:
• Automatic NIC failover (faster with NIC Teaming)
• Combined NIC bandwidth available – multiple NICs engaged
• Multiple CPU cores engaged
[Diagram: SMB clients and servers with teamed 1GbE and 10GbE RSS NICs through their switches]

SMB Direct & Multichannel
1 session, without Multichannel:
• No automatic failover
• Can’t use the full bandwidth – only one NIC engaged
• RDMA capability not used
1 session, with Multichannel:
• Automatic NIC failover
• Combined NIC bandwidth available – multiple NICs engaged
• Multiple RDMA connections
[Diagram: SMB clients and servers with dual 10GbE R-NICs through 10GbE switches, and with dual 54Gb InfiniBand R-NICs through IB switches]
SMB Multichannel Auto Configuration
– Auto configuration looks at NIC type/speed => matching NICs are used for RDMA/Multichannel (it doesn't mix 10 Gbps with 1 Gbps, or RDMA with non-RDMA)
– Let the algorithms work before you decide to intervene
– Choose adapters wisely for their function
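If you do need to intervene, Multichannel can be pinned to specific adapters; a sketch, not from the slides (server name and interface aliases are examples):

```powershell
# Only when the defaults pick the wrong adapters: constrain SMB
# traffic toward a given server to specific interfaces
New-SmbMultichannelConstraint -ServerName "FS01" -InterfaceAlias "SLOT 2","SLOT 3"

# Review the constraints in place
Get-SmbMultichannelConstraint
```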
[Diagram: auto-configuration examples – dual 1GbE NICs; a 1GbE NIC + wireless NIC; a 1GbE NIC vs. a 10GbE RSS NIC; a 10GbE NIC vs. a 10GbE R-NIC; a 10GbE R-NIC vs. a 32Gb InfiniBand R-NIC]
Networking Features Cheat Sheet
[Table: which features – Large Send Offload (LSO), Receive Segment Coalescing (RSC), Receive Side Scaling (RSS), Virtual Machine Queues (VMQ), Remote DMA (RDMA), Single Root I/O Virtualization (SR-IOV) – deliver lower latency, higher scalability, higher throughput and/or lower path length]
Advanced Network Features (2)
Consistent Device Naming
DCTCP/DCB/QoS
DHCP Guard/Router Guard/Port Mirroring
Port ACLs
IPSEC Task Offload for Virtual Machines (IPsecTOv2)
Network virtualization & Extensible Switch
Consistent Device Naming
Datacenter TCP (DCTCP)
http://www.flickr.com/photos/srgblog/414839326
DCTCP requires less buffer memory:
• 1 Gbps flow controlled by TCP needs 400 to 600 KB of memory; the TCP sawtooth is visible
• 1 Gbps flow controlled by DCTCP requires 30 KB of memory; smooth
Datacenter TCP (DCTCP)
– W2K12 deals with network congestion by reacting to the degree & not merely the presence of congestion.
– DCTCP aims to achieve low latency, high burst tolerance and high throughput, with small-buffer switches.
– Requires Explicit Congestion Notification (ECN, RFC 3168) capable switches.
– Algorithm enabled when it makes sense (low round-trip times, i.e. in the data center).
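As a sketch (not from the slides), assuming Windows Server 2012, where the Datacenter TCP setting templates use the DCTCP congestion provider; the prefix is an example:

```powershell
# Which congestion provider does each TCP setting template use?
Get-NetTCPSetting | Select-Object SettingName, CongestionProvider

# Apply the Datacenter template (and thus DCTCP) to traffic
# toward a given subnet
New-NetTransportFilter -SettingName Datacenter -DestinationPrefix 10.0.0.0/8
```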
Datacenter TCP (DCTCP)
Running out of buffer in a switch gets you into stop/go hell: a boatload of green, orange & red lights along your way. Big buffers mitigate this but are very expensive.
http://www.flickr.com/photos/bexross/2636921208/
http://www.flickr.com/photos/mwichary/3321222807/
Datacenter TCP (DCTCP)
You want to be in a green wave. Windows Server 2012 & ECN provide network traffic control by default.
http://www.flickr.com/photos/highwaysagency/6281302040/
http://www.telegraph.co.uk/motoring/news/5149151/Motorists-to-be-given-green-traffic-lights-if-they-stick-to-speed-limit.html
Data Center Bridging (DCB)
– Prevents congestion in NIC & network by reserving bandwidth for particular traffic types
– Windows Server 2012 provides support & control for DCB, tags packets by traffic type
– Provides lossless transport for mission-critical workloads
DCB is like a car pool lane …
http://www.flickr.com/photos/philopp/7332438786/
DCB Requirements
1. Enhanced Transmission Selection (IEEE 802.1Qaz)
2. Priority Flow Control (IEEE 802.1Qbb)
3. (Optional) Data Center Bridging Exchange protocol
4. (Not required) Congestion Notification (IEEE 802.1Qau)
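These pieces come together in PowerShell; a sketch, not from the slides (priority, percentage and adapter name are examples):

```powershell
# Install DCB support on the host
Install-WindowsFeature Data-Center-Bridging

# Tag SMB Direct (port 445) traffic with 802.1p priority 3
New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3

# Lossless transport for that priority (Priority Flow Control, 802.1Qbb)
Enable-NetQosFlowControl -Priority 3

# Reserve bandwidth via Enhanced Transmission Selection (802.1Qaz)
New-NetQosTrafficClass "SMB" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS

# Apply the DCB settings on the converged NIC
Enable-NetAdapterQos -Name "SLOT 2"
```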
Hyper-V QoS beyond the VM
Manage the network bandwidth with a maximum (value) and/or a minimum (value or weight)
[Diagram: converged fabric – two 10GbE physical NICs in an LBFO teamed NIC under the Hyper-V virtual switch, carrying VM 1…VM n traffic alongside management-OS live migration, storage and management traffic]
Hyper-V QoS beyond the VM: http://www.hyper-v.nu/archives/hvredevoort/2012/06/building-a-converged-fabric-with-windows-server-2012-powershell/
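As a sketch of the minimum/maximum controls (not from the slides; switch, team and VM names are examples):

```powershell
# The minimum bandwidth mode is fixed at switch creation
New-VMSwitch "ConvergedSwitch" -NetAdapterName "VMTeam" -MinimumBandwidthMode Weight

# Guarantee a relative share for one VM's NIC...
Set-VMNetworkAdapter -VMName "SQL01" -MinimumBandwidthWeight 40

# ...and cap another with an absolute maximum (bits per second)
Set-VMNetworkAdapter -VMName "Web01" -MaximumBandwidth 500000000
```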
Default Flow per Virtual Switch
Customers may group a number of VMs that each don’t have a minimum bandwidth. They will be bucketized into a default flow, which has a minimum weight allocation. This is to prevent starvation.
[Diagram: Hyper-V Extensible Switch with VM1 (gold tenant), VM2 and the default flow sharing a 1 Gbps link under assigned weights]
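The default-flow reservation is itself configurable; a sketch, not from the slides (switch name and weight are examples):

```powershell
# Reserve a weight for the default flow so VMs without their own
# minimum aren't starved
Set-VMSwitch "ConvergedSwitch" -DefaultFlowMinimumBandwidthWeight 10

# Read the setting back
Get-VMSwitch "ConvergedSwitch" |
    Select-Object Name, DefaultFlowMinimumBandwidthWeight
```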
Maximum Bandwidth for Tenants
One common customer pain point is that WAN links are expensive.
Cap VM throughput to the Internet to avoid bill shock.
[Diagram: Hyper-V Extensible Switch behind a Unified Remote Access Gateway – <100 Mb toward the Internet, ∞ toward the intranet]
Bandwidth Network Management
• Manage the network bandwidth with a maximum and a minimum value
• SLAs for hosted virtual machines
• Control per VM and not per host
DHCP & Router Guard, Port Mirroring
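These are per-VM-NIC switches; a sketch, not from the slides (VM names are examples):

```powershell
# Drop DHCP offers and router advertisements originating in a VM,
# and mirror its traffic to a monitoring VM
Set-VMNetworkAdapter -VMName "Tenant01" -DhcpGuard On -RouterGuard On
Set-VMNetworkAdapter -VMName "Tenant01" -PortMirroring Source
Set-VMNetworkAdapter -VMName "Monitor01" -PortMirroring Destination
```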
IPsec Task Offload
– IPsec is CPU intensive => offload to the NIC
– In demand due to compliance (SOX, HIPAA, etc.)
– IPsec is required & needed for secured operations
– Only available to host/parent workloads in W2K8R2
– Now extended to virtual machines, managed by the Hyper-V switch
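As a sketch, assuming the Windows Server 2012 per-VM-NIC setting (VM name and value are examples):

```powershell
# Allow up to 512 security associations to be offloaded to the NIC
# for this VM; 0 disables IPsec task offload
Set-VMNetworkAdapter -VMName "Tenant01" -IPsecOffloadMaximumSecurityAssociation 512
```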
Port ACLs
– Allow/Deny/Counter
– MAC, IPv4 or IPv6 addresses
– Wildcards allowed in IP addresses
– Note: counters are implemented as ACLs
• Counts packets to an address/range
• Read via WMI/PowerShell
• Counters tie into the resource metering you can do for charge/show back, planning etc.
ACLs are the basic building blocks of virtual switch security functions
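Allow, Deny and Meter map directly onto one cmdlet; a sketch, not from the slides (VM name and addresses are examples):

```powershell
# Deny a VM all traffic to a management subnet, allow one host,
# and meter outbound traffic to a range
Add-VMNetworkAdapterAcl -VMName "Tenant01" -RemoteIPAddress 10.10.0.0/16 -Direction Both -Action Deny
Add-VMNetworkAdapterAcl -VMName "Tenant01" -RemoteIPAddress 10.10.0.5 -Direction Both -Action Allow
Add-VMNetworkAdapterAcl -VMName "Tenant01" -RemoteIPAddress 192.168.1.0/24 -Direction Outbound -Action Meter

# Read back the ACLs and the metered counters
Get-VMNetworkAdapterAcl -VMName "Tenant01"
```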
http://workinghardinit.wordpress.com
@workinghardinit
Questions & Answers
Many, many thanks to: