Micro VMMs and Nested Virtualization
For the TCE 4th summer school on computer security, big data and innovation
Baruch Chaikin, Intel
9 September 2015
Virtual Machines
A virtual machine (VM) allows SW to run on virtual HW – processors, memory, chipset, I/O devices, etc.
– Encapsulates OS and application state
Virtualization usages – Backward and cross ISA compatibility
– Resource sharing and protection
– Isolation, fault tolerance and recovery
– Snapshotting and migration
– Testing and development
VMs are supported by VM Monitor (VMM) – Launches VMs and controls VM execution and VM access to system resources
Virtualization Approaches
Three main approaches:
• Full virtualization – Everything is virtualized
– The VMM emulates, simulates, or translates VM code
– Induces performance penalty
• Para-virtualization – The VMM provides services to the VMM-aware VM
– Fast, but requires SW enabling
• HW-assisted virtualization – The processor supports special ISA (instructions, modes, data structures, MSRs) that VMM can use
– Fast, no SW enabling needed
Intel provides HW-assists to VMMs – VT technology
Defines Virtual Machine Extensions (VMX) to the x86 ISA
Intel VT Architecture
Three VMX execution modes – VMX Off – No VMM is running (default state established at reset)
– VT Root Mode – SW is running as VMM (“host”)
– VT Non-Root Mode – SW is running as VM (“guest”)
Transitions between VMX execution modes – VMXON, VMXOFF – VMX instructions to enter/exit VMX root mode
– VMLAUNCH/VMRESUME – VMX instructions to enter VMX non-root mode (VM Entry flow)
– The VM Control Structure (VMCS) defines events and instructions that cause transitions from VMX non-root mode back to the VMM (VM Exit flow)
[Diagram: VMX mode transitions – VMXON moves the CPU from VMX off to root mode; VM entry moves from root to non-root mode; VM exit moves back to root mode; VMXOFF returns to VMX off]
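The transitions above can be sketched as a toy state machine. This is an illustrative Python model only – the mode and event names are made up for readability, not architectural encodings:

```python
# Toy model of the three VMX execution modes and the instructions/events
# that move the CPU between them.
TRANSITIONS = {
    ("vmx_off", "VMXON"): "root",        # enter VMX root mode (VMM runs)
    ("root", "VMXOFF"): "vmx_off",       # leave VMX operation
    ("root", "VMLAUNCH"): "non_root",    # VM entry: start running the guest
    ("root", "VMRESUME"): "non_root",    # VM entry: resume the guest
    ("non_root", "VM_EXIT"): "root",     # VM exit: back to the VMM
}

def step(mode, event):
    """Return the next VMX mode, or raise on an illegal transition."""
    try:
        return TRANSITIONS[(mode, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event} in {mode}")
```

For example, a full round trip is vmx_off → VMXON → root → VMLAUNCH → non_root → VM exit → root → VMXOFF → vmx_off.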
CPU Virtualization
VMX allows the VMM to control VM use of CPU resources • Execution time
• Instruction set
• Registers
• Etc.
Events that happen during VM execution: • Fixed VM exit: always monitored, e.g. INIT event, CPUID instruction
• Conditional VM exit: can be controlled by the VMM, e.g. #PF exception, PAUSE instr.
• No VM exit: no VMM control, e.g. ADD instruction, SMI event
Basic VMX functionality • Employed using the VMCS
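The three monitoring classes above can be sketched as a small dispatch. The event sets below are tiny illustrative samples, not the full architectural lists:

```python
# Sketch of the three VM-exit classes: fixed exits always trap to the VMM,
# conditional exits trap only if the VMM enabled them in the VMCS controls,
# and everything else runs without VMM involvement.
FIXED_EXITS = {"INIT", "CPUID"}         # always cause a VM exit
CONDITIONAL_EXITS = {"PF", "PAUSE"}     # exit only if enabled by the VMM
# everything else (e.g. ADD, SMI) never exits

def causes_vm_exit(event, enabled_conditional):
    if event in FIXED_EXITS:
        return True
    if event in CONDITIONAL_EXITS:
        return event in enabled_conditional
    return False
```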
VMCS and VT Transitions
The VMCS holds the execution controls, entry controls, exit controls, the “guest” state area and the “host” state area. The host configures it with VMWRITE (host/guest state and controls) and inspects it with VMREAD (guest state and exit information).
[Diagram: VT transitions driven by the VMCS –
• VM exit (on an exiting event): save exit information, save “guest” CPU state according to the exit controls, load “host” CPU state, move to root mode
• VM entry (VMLAUNCH/VMRESUME): check “host” CPU state in the VMCS, check and load “guest” CPU state, perform additional actions according to the entry controls, move to non-root mode, run under the execution controls]
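A minimal sketch of this VMCS-driven world switch, assuming a made-up CPU state of a single register – real VMCS fields, checks and controls are far richer:

```python
# Toy VMCS: guest/host state areas plus exit information.
class VMCS:
    def __init__(self, guest_state, host_state):
        self.guest_state = dict(guest_state)   # e.g. {"RIP": ...}
        self.host_state = dict(host_state)
        self.exit_reason = None

def vm_entry(cpu, vmcs):
    """VM entry: check and load 'guest' CPU state from the VMCS."""
    cpu.update(vmcs.guest_state)

def vm_exit(cpu, vmcs, reason):
    """VM exit: save exit info, save 'guest' state, load 'host' state."""
    vmcs.exit_reason = reason
    vmcs.guest_state = dict(cpu)   # per the exit controls
    cpu.update(vmcs.host_state)
```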
Memory Virtualization
VMX allows the VMM to control VM use of memory
Many usages, including: • Preventing VM access to the memory ranges allocated for the VMM or other VMs
• Monitoring VM use of memory
• Supporting VM migration to other machines
Virtual Memory Virtualization • Establish “shadow” page tables (SPT)
• Monitoring guest MOV to CR3, INVLPG and #PF exceptions
Physical Memory Virtualization • Establish extended page tables (EPT)
• Monitoring EPT violations
Shadow and Extended Page Tables
[Diagram: virtual memory virtualization using Shadow Page Tables (SPT) – the VM accesses VA x; the guest OS page table maps VA x → guest PA y; the VMM’s shadow PT maps VA x → host PA z directly, kept in sync by handling page faults]
[Diagram: physical memory virtualization using Extended Page Tables (EPT) – the VM accesses VA x; the guest OS page table maps VA x → guest PA y; the extended PT maps guest PA y → host PA z; missing mappings raise EPT violations to the VMM]
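The two schemes can be contrasted with a page-granular sketch. The tables are plain dicts from page number to page number, and all the numbers are invented:

```python
# Guest OS page table: VA page -> guest PA page
os_pt = {0x10: 0x20}
# VMM's EPT: guest PA page -> host PA page
ept = {0x20: 0x30}

def translate_ept(va_page):
    """EPT: hardware walks both tables on every access."""
    return ept[os_pt[va_page]]

def build_shadow_pt(os_pt, ept):
    """SPT: the VMM pre-composes VA -> host PA into one table, and keeps
    it in sync by intercepting MOV to CR3, INVLPG and #PF."""
    return {va: ept[gpa] for va, gpa in os_pt.items()}
```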
I/O Virtualization
VMX allows the VMM to virtualize I/O devices – VT-x – contains VMX features to support I/O virtualization in the cores
– VT-d – contains features to support I/O virtualization in the uncore
VT-x features – I/O port accesses
– Memory-mapped I/O accesses
– Interrupt and NMI virtualization
– Local APIC Virtualization
– And more…
VT-d features – DMA remapping
– Interrupt remapping
– I/O device assignment
– And more…
Concepts
The micro-VMM (uVMM) is a minimal VMM: – Supports a single VM
– Exposes the underlying CPU to the VM
– Allows the VM to access all physical memory (except uVMM region)
– Fully assigns I/O handling to the VM
A thin layer with a small footprint – Minimal impact on boot time, memory consumption and power/performance
– Can be used as hypervisor or as hosted VMM
A foundation for various usages – Can serve (and has served) as a basis for extensions
– Not a stand-alone product!
uVMM Architecture
[Diagram: stack – the CPU and BIOS/SMM at the bottom, the uVMM above them, the guest OS on top]
The uVMM runs as a thin layer below the guest OS. It can be launched in 2 different ways:
• Early launch – from the BIOS, before the OS is booted
• Late launch – from the OS, using a kernel driver
Possible uVMM Design
[Diagram: uVMM components – a HW Abstraction Layer (HAL) at the bottom; initialization, power state support, runtime services and IPC support; physical memory virtualization (EPT), context support (VMCS) and CPU virtualization (CPUID, etc.); an add-ons API hosting add-ons such as debug support]
Usages
A uVMM does nothing in particular, but can be used as a basis for various usages:
– Anti-virus hardening
– Monitoring execution of untrusted VMs
– Prototyping new ISA extensions
– Etc.
NESTED VIRTUALIZATION
Motivation and concepts
Architectures challenges and HW Support
Main Techniques
Motivation
VMMs are becoming ubiquitous in commodity platforms – Windows 7 runs Windows XP in a virtual machine
– Windows 10 uses HyperV as an integral part of the OS
– Linux has an optional built-in VMM (KVM)
To virtualize such commodity platforms, bare-metal VMMs need to support guest VMMs
– For clouds, development, security, and more
This is called Nested Virtualization (NV)
Some VMMs already support NV – KVM Turtles, Xen 4.4, Blue Pill, VMware ESX
Architecture Challenges
The root VMM shall support an unmodified guest VMM that runs unmodified guest VMs 1. Allow VMM establishment in non-root mode
2. Allow the guest VMM to launch and monitor guest VMs
3. Allow the guest VMM to support nested virtualization…
The root VMM shall run with small footprint • Low power/performance overhead
• Low memory overhead
• Minimal consumption of system resources
Nesting Levels
[Diagram: VMM0 runs in root mode at level L0; VM1a, VM1b and VMM1 run in virtual non-root mode at level L1; VM2a, VM2b and VMM2 at level L2; VM3 at level L3]
Complexity Reduction
There might be n levels of nesting, but it is enough for the root VMM (level 0) to support only 2 guest levels – guest VMM (level 1) and guest VM (level 2)
– The guest VMM at level 1 (which thinks it’s the root VMM!) will support up to level 3
– The guest-guest VMM at level 2 (which also thinks it’s the root VMM!) will support up to level 4
– And so on...
So we can restrict our discussion to levels 0-2 only
Terminology: – Root VMM = VMM that runs in VT root mode (level 0)
– Guest VMM = VMM that runs in VT non-root mode (level 1)
– Guest VM = VM supported by the guest VMM (level 2)
Nesting Level “Flattening”
[Diagram: virtual levels – the root VMM runs VM1 and the guest VMM, and the guest VMM runs its guest VM; actual levels – the root VMM runs all of them side by side as VM1 (VM1), VM2 (the guest VMM) and VM3 (the guest VM)]
Nested Virtualization – VT Transition Emulation
[Diagram: the guest VMM (level 1) performs VM entry 1→2 to its guest VM (level 2) and receives VM exit 2→1 back; in reality every exiting event traps to the root VMM (level 0) as VM exit 2→0, and the root VMM emulates the level 1↔2 entry and exit]
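The routing decision the root VMM makes on each level-2 exit can be sketched as follows. The reason names are invented, and the real decision also depends on the exit qualification and many control fields:

```python
# Every exit from L2 lands in the root VMM (L0). L0 consults the exit
# controls the guest VMM wrote into its VMCS to decide whether to reflect
# the exit to L1 (emulated VM exit 2->1) or handle it itself.
def route_l2_exit(reason, vmcs12_exit_controls):
    if reason in vmcs12_exit_controls:
        return "reflect_to_L1"   # update the guest VMM's view, resume L1
    return "handle_in_L0"        # an exit L0 enabled for its own purposes
```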
VMX Support for Nested Virtualization
The CPU knows about 2 levels only – root and non-root mode – The root VMM should virtualize VT resources (VMCS, EPT, VTD, etc.) to the guest VMM
– The root VMM should emulate level 1-2 transitions between the guest VMM and the guest VM
Intel VMX adds some support for nesting – Haswell added “VMCS Shadowing” to avoid successive VM exits on VMCS accesses
(VMREAD/VMWRITE) made by a guest VMM
– More opportunities considered
Root VMM can do clever SW tricks to support guest VMM – KVM Turtles reports 6-8% overhead
– Use of VMCS shadowing and virtual interrupts can possibly reduce this number even further
Nested CPU Virtualization
The root VMM maintains 3 kinds of VMCS structures – VMCS 0-1 for regular VMs and for a VM that works as guest VMM
– VMCS 1-2 as a shadow for guest VMCS 0-1 (i.e. the VMCS that the guest VMM created for its VM)
– VMCS 0-2 as the actual VMCS under which the guest VM will run
Guest VMM execution (in level 1) – The root VMM intercepts the VMPTRLD instruction, and creates/reloads VMCS 1-2 for each VMCS created/reloaded by the guest VMM
– The root VMM intercepts the VMREAD/VMWRITE instructions, which the guest VMM uses to read/write its VMCS, and serves them from VMCS 1-2
Guest VM entry (from level 1 to level 2) – The root VMM intercepts the VMLAUNCH/VMRESUME instructions, merges VMCS 1-2 with VMCS 0-1 into VMCS 0-2, then launches/resumes the guest VM
Guest VM exit (from level 2 to level 1) – The root VMM intercepts the VM exiting event/instruction and checks VMCS 0-2
– If the VM exit should be delivered to the guest VMM, the root VMM updates VMCS 1-2 and resumes the guest VMM
Nested VMCSs
[Diagram: the three VMCS structures –
• VMCS 0-1: L0-1 controls, L1 guest state, L0 host state
• VMCS 1-2: L1-2 controls, L2 guest state, L1 host state
• VMCS 0-2: L0-2 controls, L2 guest state, L0 host state]
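The shape of the VMCS 0-2 merge can be sketched as below: guest state comes from VMCS 1-2 (what the guest VMM wants L2 to look like), host state from VMCS 0-1 (exits must land in the root VMM), and exit controls are the union of what L0 and L1 each want to intercept. The field names are invented, and real merging involves many more fields and control interactions:

```python
# Merge VMCS 1-2 (built by the guest VMM) with VMCS 0-1 (the root VMM's)
# into the actual VMCS 0-2 under which the guest VM runs.
def merge_vmcs(vmcs01, vmcs12):
    return {
        "guest_state": dict(vmcs12["guest_state"]),       # L2 guest state
        "host_state": dict(vmcs01["host_state"]),         # exits go to L0
        "exit_controls": vmcs01["exit_controls"] | vmcs12["exit_controls"],
    }
```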
Nested Memory Virtualization
Both root and guest VMMs need to virtualize memory – Recall: VMMs use either Shadow Page Tables (SPT) or Extended Page Tables (EPT)
– VMMs resort to SPT if EPT is not supported
The root VMM virtualizes memory using “Shadow EPT” – Run the guest VMM under EPT 0-1 (PA1→PA0) as usual
– Let the guest VMM set up EPT 1-2 (PA2→PA1) for the guest VM
– At guest VM entry emulation, merge EPT 0-1 and EPT 1-2 into EPT 0-2
– Run the guest VM under EPT 0-2 (PA2→PA0)
– When an EPT violation happens on EPT 0-2, check whether the violation would have happened in EPT 1-2 – if yes, emulate an EPT violation VM exit to the guest VMM
– Otherwise, handle the EPT violation and resume the guest VM
The root VMM does not need to virtualize virtual memory
EPT Shadowing
[Diagram: VA2 → PA2 via the guest VM page tables; PA2 → PA1 via the guest VMM extended page tables (EPT 1-2); PA1 → PA0 via the root VMM extended page tables (EPT 0-1); the shadow extended page tables (EPT 0-2) map PA2 → PA0 directly]
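The merge and the violation-routing rule can be sketched with page-granular tables (dicts of page numbers; all values invented):

```python
# Compose EPT 1-2 (PA2 -> PA1) with EPT 0-1 (PA1 -> PA0) into the
# shadow EPT 0-2 (PA2 -> PA0); pages L0 has not yet mapped are omitted.
def merge_ept(ept01, ept12):
    return {pa2: ept01[pa1] for pa2, pa1 in ept12.items() if pa1 in ept01}

def handle_violation(pa2, ept12):
    """Route an EPT 0-2 violation: if L1's own EPT would also have
    faulted, reflect it to the guest VMM; otherwise L0 fixes it up."""
    return "reflect_to_L1" if pa2 not in ept12 else "fix_in_L0"
```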
Nested I/O Virtualization
Nested I/O virtualization requires virtualizing MMIO accesses, I/O port accesses, the local APIC, external interrupts, NMIs, VT-d tables, etc.
This is the most complex part…
… But the basic uVMM working as root VMM does not need to bother, because the uVMM assigns the I/O to its guest
Nested Virtualization Challenges
Performance
• The root VMM should emulate Guest VMCS accesses and guest VMM-VM transitions – Intel supports “VMCS Shadowing” to accelerate guest VMREAD/VMWRITE
– No Intel CPU support to accelerate other things, yet
• The root VMM should merge control structures (VMCS, EPT, MSR lists, etc.)
• Root VMM execution pollutes caches
Complexity
• NV support requires complex code and thus increases VMM footprint and enlarges risk for bugs and security issues
Summary
• Virtualization is a powerful technology with increasingly many usages
• Intel VMX provides VMMs with HW assists for CPU, memory and I/O virtualization
• The uVMM is a lightweight VMM that performs the bare minimum required to support a single unmodified OS
• A root VMM that supports Nested Virtualization allows its guest to run as a VMM and launch and control its own VMs
• NV support is becoming an inevitable requirement for VMMs, but it is still slow and complex
• In the uVMM model (single VM, no I/O virtualization) NV becomes less painful