Micro VMMs and Nested Virtualization

For the TCE 4th summer school on computer security, big data and innovation

Baruch Chaikin, Intel

9 September 2015


Agenda

Virtualization Basics

The Micro VMM

Nested Virtualization

Summary

VIRTUALIZATION BASICS

Virtual Machines

Virtual Machine Managers

Intel VT Architecture

Virtual Machines

A virtual machine (VM) allows SW to run on virtual HW

– Processors, memory, chipset, I/O devices, etc.

– Encapsulates OS and application state

Virtualization usages

– Backward and cross ISA compatibility

– Resource sharing and protection

– Isolation, fault tolerance and recovery

– Snapshotting and migration

– Testing and development

VMs are supported by a VM Monitor (VMM)

– Launches VMs and controls VM execution and VM access to system resources

Virtualization Approaches

Three main approaches:

• Full virtualization

– Everything is virtualized

– The VMM emulates, simulates, or translates VM code

– Induces a performance penalty

• Para-virtualization

– The VMM provides services to the VMM-aware VM

– Fast, but requires SW enabling

• HW-assisted virtualization

– The processor supports a special ISA (instructions, modes, data structures, MSRs) that the VMM can use

– Fast, no SW enabling needed

Intel provides HW assists to VMMs – Intel VT

Defines Virtual Machine Extensions (VMX) to the x86 ISA

Intel VT Architecture

Three VMX execution modes

– VMX off – No VMM is running (default state established at reset)

– VMX root mode – SW is running as VMM (“host”)

– VMX non-root mode – SW is running as VM (“guest”)

Transitions between VMX execution modes

– VMXON, VMXOFF – VMX instructions to enter/exit VMX root mode

– VMLAUNCH/VMRESUME – VMX instructions to enter VMX non-root mode (VM Entry flow)

– The VM Control Structure (VMCS) defines events and instructions that cause transitions from VMX non-root mode back to the VMM (VM Exit flow)

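[Figure: VMX mode transitions. VMXON moves the CPU from VMX off to VMX root mode and VMXOFF moves it back. While VMX is on, VM Entry moves from root to non-root mode and VM Exit moves from non-root back to root mode.]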
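The three modes and their transitions can be read as a small state machine. The sketch below is only a toy model of this slide, not of real CPU behavior: the names `Mode`, `TRANSITIONS`, `step` and `run` are made up for illustration, and any transition not listed on the slide is simply rejected.

```python
from enum import Enum


class Mode(Enum):
    VMX_OFF = "VMX off"
    ROOT = "VMX root"
    NON_ROOT = "VMX non-root"


# (current mode, trigger) -> next mode, as listed on the slide.
# "VM_EXIT" stands for any event or instruction configured to exit.
TRANSITIONS = {
    (Mode.VMX_OFF, "VMXON"): Mode.ROOT,
    (Mode.ROOT, "VMXOFF"): Mode.VMX_OFF,
    (Mode.ROOT, "VMLAUNCH"): Mode.NON_ROOT,
    (Mode.ROOT, "VMRESUME"): Mode.NON_ROOT,
    (Mode.NON_ROOT, "VM_EXIT"): Mode.ROOT,
}


def step(mode, trigger):
    """Apply one trigger; triggers not in the table are rejected by this model."""
    try:
        return TRANSITIONS[(mode, trigger)]
    except KeyError:
        raise ValueError(f"{trigger} is not modelled in {mode.value}") from None


def run(triggers, mode=Mode.VMX_OFF):
    """Apply a sequence of triggers starting from the reset state."""
    for trigger in triggers:
        mode = step(mode, trigger)
    return mode


final = run(["VMXON", "VMLAUNCH", "VM_EXIT", "VMRESUME", "VM_EXIT", "VMXOFF"])
print(final.value)
```

The sequence in the last lines is a typical VMM life cycle: turn VMX on, launch the guest, handle an exit, resume, handle another exit, turn VMX off.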

CPU Virtualization

VMX allows the VMM to control VM use of CPU resources

• Execution time

• Instruction set

• Registers

• Etc.

Events that happen during VM execution:

• Fixed VM exit: always monitored, e.g. INIT event, CPUID instruction

• Conditional VM exit: can be controlled by the VMM, e.g. #PF exception, PAUSE instruction

• No VM exit: no VMM control, e.g. ADD instruction, SMI event

Basic VMX functionality

• Employed using the VMCS
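The three event classes above can be sketched as a lookup. This is an illustrative model only: the event names are the examples from the slide, and `causes_vm_exit` and the `enabled_exits` set are hypothetical stand-ins for the VMCS execution controls.

```python
# Example events taken from the slide; real CPUs define many more.
FIXED_EXIT = {"INIT", "CPUID"}        # always cause a VM exit
CONDITIONAL_EXIT = {"#PF", "PAUSE"}   # exit only if the VMM asks for it
NO_EXIT = {"ADD", "SMI"}              # never cause a VM exit in this model


def causes_vm_exit(event, enabled_exits):
    """Return True if `event` in the guest transfers control to the VMM.

    `enabled_exits` models the conditional exits the VMM switched on
    through the VMCS execution controls.
    """
    if event in FIXED_EXIT:
        return True
    if event in CONDITIONAL_EXIT:
        return event in enabled_exits
    if event in NO_EXIT:
        return False
    raise KeyError(f"event {event!r} is not part of this sketch")


# A VMM that only wants to see guest page faults:
controls = {"#PF"}
for event in ["CPUID", "#PF", "PAUSE", "ADD"]:
    print(event, causes_vm_exit(event, controls))
```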

VMCS and VT Transitions

VMCS areas: Execution Controls, Entry Controls, Exit Controls, Guest State, Host State

VM exit flow (guest to host), triggered by an event:

• Save exit information

• Save “guest” CPU state according to exit controls

• Load “host” CPU state

• Move to root mode

VM entry flow (host to guest), triggered by VMLAUNCH/VMRESUME:

• Check “host” CPU state in VMCS

• Check and load “guest” CPU state

• Perform additional actions according to entry controls

• Move to non-root mode

In the guest: run under execution controls

In the host: VMWRITE host/guest state and controls, VMREAD guest state and exit information
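The entry and exit flows can be walked through with a toy model. This is a sketch under heavy simplification: `Vmcs`, `Cpu`, `vm_entry` and `vm_exit` are illustrative names, CPU state is a plain dictionary, and the only "check" performed is that both state areas carry an instruction pointer.

```python
from dataclasses import dataclass, field


@dataclass
class Vmcs:
    guest_state: dict = field(default_factory=dict)
    host_state: dict = field(default_factory=dict)
    exit_info: dict = field(default_factory=dict)


@dataclass
class Cpu:
    regs: dict = field(default_factory=dict)
    root_mode: bool = True


def vm_entry(cpu, vmcs):
    """VMLAUNCH/VMRESUME: check state areas, load guest state, leave root mode."""
    if not cpu.root_mode:
        raise RuntimeError("VM entry is only modelled from root mode")
    for area in (vmcs.host_state, vmcs.guest_state):
        if "rip" not in area:
            raise ValueError("VM entry check failed: state area has no rip")
    cpu.regs = dict(vmcs.guest_state)
    cpu.root_mode = False


def vm_exit(cpu, vmcs, reason):
    """Save exit information and guest state, load host state, enter root mode."""
    vmcs.exit_info = {"reason": reason}
    vmcs.guest_state = dict(cpu.regs)
    cpu.regs = dict(vmcs.host_state)
    cpu.root_mode = True


cpu = Cpu()
# The VMM fills the VMCS (the VMWRITE step on the slide).
vmcs = Vmcs(guest_state={"rip": 0x1000}, host_state={"rip": 0xFFFF0000})
vm_entry(cpu, vmcs)
cpu.regs["rip"] = 0x1004          # the guest runs a little
vm_exit(cpu, vmcs, "CPUID")
# The VMM inspects the result (the VMREAD step on the slide).
print(vmcs.exit_info["reason"], hex(vmcs.guest_state["rip"]))
```

Note that the host state is never saved by the exit flow; the VMM wrote it into the VMCS once, and the CPU reloads it on every exit.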

Memory Virtualization

VMX allows the VMM to control VM use of memory

Many usages, including:

• Preventing VM access to the memory ranges allocated for the VMM or other VMs

• Monitoring VM use of memory

• Supporting VM migration to other machines

Virtual Memory Virtualization

• Establish “shadow” page tables (SPT)

• Monitor guest MOV to CR3, INVLPG and #PF exceptions

Physical Memory Virtualization

• Establish extended page tables (EPT)

• Monitor EPT violations

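Shadow and Extended Page Tables

Virtual Memory Virtualization using Shadow Page Tables (SPT)

• The VM accesses VA x; the OS page tables map VA x to guest PA y, which is backed by host PA z

• The shadow page tables map VA x directly to host PA z

• Monitored through page faults

Physical Memory Virtualization using Extended Page Tables (EPT)

• The VM accesses VA x; the OS page tables map VA x to guest PA y

• The extended page tables map guest PA y to host PA z

• Monitored through EPT violations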
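The difference between the two schemes is where the two mappings get combined. A minimal sketch, assuming flat page-granular tables stored as dictionaries (real page tables are multi-level and carry permissions); `translate`, `walk_with_ept` and `build_shadow_pt` are illustrative names.

```python
PAGE = 4096


def translate(table, addr):
    """Translate an address through a flat table {page number: page number}."""
    page, offset = divmod(addr, PAGE)
    if page not in table:
        raise LookupError(f"no mapping for page {page:#x}")
    return table[page] * PAGE + offset


def walk_with_ept(guest_pt, ept, va):
    """EPT: two mappings applied at access time, VA x -> guest PA y -> host PA z."""
    return translate(ept, translate(guest_pt, va))


def build_shadow_pt(guest_pt, ept):
    """SPT: the VMM precomputes VA x -> host PA z for the pages it can back."""
    return {vpage: ept[gpage] for vpage, gpage in guest_pt.items() if gpage in ept}


guest_pt = {0x10: 0x20}   # OS page table: VA page 0x10 -> guest PA page 0x20
ept = {0x20: 0x99}        # VMM mapping:   guest PA page 0x20 -> host PA page 0x99
va = 0x10 * PAGE + 0x123

shadow_pt = build_shadow_pt(guest_pt, ept)
print(hex(walk_with_ept(guest_pt, ept, va)), hex(translate(shadow_pt, va)))
```

Both paths reach the same host address. With SPT the VMM has to rebuild `shadow_pt` whenever the guest changes `guest_pt`, which is why it monitors MOV to CR3, INVLPG and #PF; with EPT the guest page tables can change freely.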

I/O Virtualization

Intel VT allows the VMM to virtualize I/O devices

– VT-x – contains features to support I/O virtualization in the cores

– VT-d – contains features to support I/O virtualization in the uncore

VT-x features

– I/O port accesses

– Memory-mapped I/O accesses

– Interrupt and NMI virtualization

– Local APIC virtualization

– And more…

VT-d features

– DMA remapping

– Interrupt remapping

– I/O device assignment

– And more…

THE MICRO VMM

Concept and Usages

Architecture and Design

Implementations

Concepts

The micro-VMM (uVMM) is a minimal VMM:

– Supports a single VM

– Exposes the underlying CPU to the VM

– Allows the VM to access all physical memory (except the uVMM region)

– Fully assigns I/O handling to the VM

A thin layer with a small footprint

– Minimal impact on boot time, memory consumption and power/performance

– Can be used as a hypervisor or as a hosted VMM

A foundation for various usages

– Can serve (and has served) as a basis for extensions

– Not a stand-alone product!
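The memory rule above ("all physical memory except the uVMM region") amounts to an identity mapping with a hole. A minimal sketch, assuming page-granular addresses and a contiguous uVMM region; `build_identity_ept` and `guest_access` are illustrative names, not part of any real uVMM.

```python
def build_identity_ept(total_pages, uvmm_pages):
    """Map every guest physical page to itself, except pages owned by the uVMM."""
    return {page: page for page in range(total_pages) if page not in uvmm_pages}


def guest_access(ept, page):
    """Return the host page for a guest access, or report an EPT violation."""
    if page not in ept:
        return "EPT violation"
    return ept[page]


# Hypothetical layout: 1024 pages of RAM, the uVMM lives in pages 512..519.
ept = build_identity_ept(1024, range(512, 520))
print(guest_access(ept, 100), guest_access(ept, 515))
```

Because the mapping is the identity everywhere else, the guest sees the real physical address space, which is what lets the uVMM leave I/O handling to its single VM.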

Architecture

[Figure: uVMM architecture, showing the guest OS, the uVMM, the CPU and BIOS/SMM.]

The uVMM runs as a thin layer below the guest OS

Can be launched in 2 different ways:

• Early launch – from the BIOS, before the OS is booted

• Late launch – from the OS, using a kernel driver

Possible uVMM Design

[Figure: block diagram of uVMM components.]

• Runtime services

• IPC Support

• HW Abstraction Layer (HAL)

• Initialization

• Power State Support

• Physical Memory Virtualization (EPT)

• Context Support (VMCS)

• CPU Virtualization (CPUID, etc.)

• Debug Support

• Add-Ons API, with several Add-Ons plugged into it

Usages

A uVMM does nothing in particular, but can be used as a basis for various usages:

– Anti-virus hardening

– Monitoring execution of untrusted VMs

– Prototyping new ISA extensions

– Etc.

NESTED VIRTUALIZATION

Motivation and concepts

Architecture challenges and HW Support

Main Techniques

Motivation

VMMs are becoming ubiquitous in commodity platforms

– Windows 7 runs Windows XP in a virtual machine

– Windows 10 uses Hyper-V as an integral part of the OS

– Linux has an optional built-in VMM (KVM)

To virtualize such commodity platforms, bare-metal VMMs need to support guest VMMs

– For clouds, development, security, and more

This is called Nested Virtualization (NV)

Some VMMs already support NV

– KVM Turtles, Xen 4.4, Blue Pill, VMware ESX

Architecture Challenges

The root VMM shall support an unmodified guest VMM that runs unmodified guest VMs

1. Allow VMM establishment in non-root mode

2. Allow the guest VMM to launch and monitor guest VMs

3. Allow the guest VMM to support nested virtualization…

The root VMM shall run with a small footprint

• Low power/performance overhead

• Low memory overhead

• Minimal consumption of system resources

Nesting Levels

[Figure: nesting levels L0 to L3. VMM0 runs at L0; VM1a, VM1b and VMM1 run at L1; VM2a, VM2b and VMM2 run at L2; VM3 runs at L3. Labels: root mode, virtual non-root mode levels.]

Complexity Reduction

There might be n levels of nesting, but it’s enough that the root VMM (level 0) will support only 2 guest levels – guest VMM (level 1) and guest VM (level 2)

– The guest VMM at level 1 (which thinks it’s the root VMM!) will support up to level 3

– The guest-guest VMM at level 2 (which also thinks it’s the root VMM!) will support up to level 4

– And so on...

So we can restrict our discussion to levels 0-2 only

Terminology:

– Root VMM = VMM that runs in VT root mode (level 0)

– Guest VMM = VMM that runs in VT non-root mode (level 1)

– Guest VM = VM supported by the guest VMM (level 2)

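Nesting Level “Flattening”

[Figure: virtual levels versus actual levels. Virtual levels: the root VMM runs VM1 and the guest VMM, and the guest VMM runs the guest VM. Actual levels: the root VMM runs VM1, VM2 (the guest VMM) and VM3 (the guest VM) side by side.]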

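Nested Virtualization – VT Transition Emulation

[Figure: guest VM (level 2), guest VMM (level 1) and root VMM (level 0). A VM exit from the guest VM lands in the root VMM (VM exit 2→0). The root VMM emulates the VM exit to the guest VMM (VM exit 2→1) and emulates the guest VMM’s VM entry into the guest VM (VM entry 1→2).]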

VMX Support for Nested Virtualization

The CPU knows about 2 levels only – root and non-root mode

– The root VMM should virtualize VT resources (VMCS, EPT, VT-d, etc.) to the guest VMM

– The root VMM should emulate level 1-2 transitions between the guest VMM and the guest VM

Intel VMX adds some support for nesting

– Haswell added “VMCS Shadowing” to avoid successive VM exits on VMCS accesses (VMREAD/VMWRITE) made by a guest VMM

– More opportunities are being considered

The root VMM can do clever SW tricks to support a guest VMM

– KVM Turtles reports 6-8% overhead

– Use of VMCS shadowing and virtual interrupts can possibly reduce this number even further
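To see why VMCS shadowing matters, count the VM exits a guest VMM causes while preparing one guest entry. The model below is a deliberate simplification: the field names are hypothetical labels, and a single `intercepted_fields` set stands in for the separate VMREAD and VMWRITE bitmaps that real hardware uses.

```python
def count_vm_exits(accesses, shadowing, intercepted_fields=frozenset()):
    """Count VM exits to the root VMM for a list of (instruction, field) accesses.

    Without shadowing every VMREAD/VMWRITE in the guest VMM exits.
    With shadowing only fields the root VMM chose to intercept exit;
    the rest are served from the shadow VMCS without leaving the guest.
    """
    if not shadowing:
        return len(accesses)
    return sum(1 for _instruction, fld in accesses if fld in intercepted_fields)


# Hypothetical access pattern of a guest VMM handling one exit of its VM.
accesses = [
    ("VMREAD", "EXIT_REASON"),
    ("VMREAD", "GUEST_RIP"),
    ("VMWRITE", "GUEST_RIP"),
    ("VMWRITE", "EPT_POINTER"),
]

print(count_vm_exits(accesses, shadowing=False))
print(count_vm_exits(accesses, shadowing=True, intercepted_fields={"EPT_POINTER"}))
```

In this made-up pattern the root VMM keeps intercepting only the field it must translate itself, so one access exits instead of four.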

Nested CPU Virtualization

The root VMM maintains 3 kinds of VMCS structures

– VMCS 0-1 for regular VMs and for a VM that works as guest VMM

– VMCS 1-2 as a shadow of the VMCS that the guest VMM created for its VM (the guest VMM, which thinks it is the root VMM, sees it as its own VMCS 0-1)

– VMCS 0-2 as the actual VMCS under which the guest VM will run

Guest VMM execution (in level 1)

– The root VMM intercepts the VMPTRLD instruction, and creates/reloads VMCS 1-2 for each VMCS created/reloaded by the guest VMM

– The root VMM intercepts the VMREAD/VMWRITE instructions, which the guest VMM uses to read/write its VMCS, and emulates them using VMCS 1-2

Guest VM entry (from level 1 to level 2)

– The root VMM intercepts the VMLAUNCH/VMRESUME instructions, and merges VMCS 1-2 with VMCS 0-1 into VMCS 0-2, then launches/resumes the guest VM

Guest VM exit (from level 2 to level 1)

– The root VMM intercepts the VM exiting event/instruction and checks VMCS 0-2

– If the VM exit should be delivered to the guest VMM, the root VMM updates VMCS 1-2 and resumes the guest VMM
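The merge and the exit routing can be sketched with dictionaries. This is an assumption-laden toy: each VMCS is a dict, "controls" is reduced to the set of events that should exit, and the merge takes the union of both sets. Real merging is done field by field and is far more involved. The names `merge_vmcs`, `exit_target` and `reflect_exit` are illustrative.

```python
def merge_vmcs(vmcs01, vmcs12):
    """Build VMCS 0-2: L2 guest state, L0 host state, combined controls."""
    return {
        # Exit if either the root VMM or the guest VMM wants the event.
        "controls": vmcs01["controls"] | vmcs12["controls"],
        "guest_state": dict(vmcs12["guest_state"]),
        "host_state": dict(vmcs01["host_state"]),
    }


def exit_target(reason, vmcs12):
    """Decide who handles an exit taken while the guest VM runs on VMCS 0-2."""
    return "guest VMM" if reason in vmcs12["controls"] else "root VMM"


def reflect_exit(vmcs02, vmcs12, reason):
    """Emulate a VM exit 2->1: update VMCS 1-2, return the state L1 resumes with."""
    vmcs12["guest_state"] = dict(vmcs02["guest_state"])
    vmcs12["exit_info"] = {"reason": reason}
    # The guest VMM continues at the host state it programmed for itself.
    return dict(vmcs12["host_state"])


vmcs01 = {"controls": {"CPUID"}, "guest_state": {"rip": 0x1000}, "host_state": {"rip": 0xF000}}
vmcs12 = {"controls": {"CPUID", "#PF"}, "guest_state": {"rip": 0x2000}, "host_state": {"rip": 0x1800}}

vmcs02 = merge_vmcs(vmcs01, vmcs12)
vmcs02["guest_state"]["rip"] = 0x2004      # the guest VM runs, then takes a #PF
if exit_target("#PF", vmcs12) == "guest VMM":
    l1_state = reflect_exit(vmcs02, vmcs12, "#PF")
    print(hex(l1_state["rip"]), vmcs12["exit_info"])
```

An exit that the guest VMM did not ask for (for example one caused only by the root VMM's own controls) stays in the root VMM, which handles it and resumes the guest VM directly.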

Nested VMCSs

• VMCS 1-2: L1-2 Controls, L2 Guest State, L1 Host State

• VMCS 0-1: L0-1 Controls, L1 Guest State, L0 Host State

• VMCS 0-2: L0-2 Controls, L2 Guest State, L0 Host State

Nested Memory Virtualization

Both root and guest VMMs need to virtualize memory

– Recall: VMMs use either Shadow Page Tables (SPT) or Extended Page Tables (EPT)

– VMMs resort to SPT if EPT is not supported

The root VMM virtualizes memory using “Shadow EPT”

– Run the guest VMM under EPT 0-1 (PA1→PA0) as usual

– Let the guest VMM set up EPT 1-2 (PA2→PA1) for the guest VM

– At guest VM entry emulation, merge EPT 0-1 and EPT 1-2 into EPT 0-2

– Run the guest VM under EPT 0-2 (PA2→PA0)

– When an EPT violation happens on EPT 0-2, check whether the violation would have happened in EPT 1-2 – if yes, emulate an EPT violation VM exit to the guest VMM

– Otherwise, handle the EPT violation and resume the guest VM

The root VMM does not need to virtualize virtual memory
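The Shadow EPT steps can be sketched as a composition of two mappings plus a routing decision on violations. A minimal sketch, assuming flat page-granular tables without permissions; `merge_ept` and `on_ept02_violation` are illustrative names.

```python
def merge_ept(ept01, ept12):
    """Build EPT 0-2 (PA2 -> PA0) from EPT 1-2 (PA2 -> PA1) and EPT 0-1 (PA1 -> PA0)."""
    return {pa2: ept01[pa1] for pa2, pa1 in ept12.items() if pa1 in ept01}


def on_ept02_violation(pa2_page, ept12):
    """Route an EPT violation taken while the guest VM runs under EPT 0-2."""
    if pa2_page not in ept12:
        # The guest VMM never mapped this page, so it would have seen
        # the violation itself: emulate the VM exit to it.
        return "emulate EPT violation VM exit to guest VMM"
    # The guest VMM's mapping exists; the hole is in the root VMM's tables.
    return "handle in root VMM and resume guest VM"


ept01 = {1: 100, 2: 200}      # root VMM:  PA1 page -> PA0 page
ept12 = {10: 1, 11: 3}        # guest VMM: PA2 page -> PA1 page
ept02 = merge_ept(ept01, ept12)

print(ept02)
print(on_ept02_violation(11, ept12))
print(on_ept02_violation(12, ept12))
```

Page 11 is mapped by the guest VMM but not backed by the root VMM, so it is missing from EPT 0-2 and the resulting violation stays in the root VMM. Page 12 is unknown to the guest VMM, so that violation is reflected. The guest VM's own page tables (VA2→PA2) are never touched, which is why the root VMM does not need to virtualize virtual memory.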

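EPT Shadowing

[Figure: translation chain VA2 → PA2 → PA1 → PA0. The guest VM page tables map VA2 to PA2, the guest VMM extended page tables (EPT 1-2) map PA2 to PA1, and the root VMM extended page tables (EPT 0-1) map PA1 to PA0. The shadow extended page tables (EPT 0-2) map PA2 directly to PA0.]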

Nested I/O Virtualization

Nested I/O virtualization requires virtualizing MMIO accesses, I/O port accesses, the local APIC, external interrupts, NMIs, VT-d tables, etc.

This is the most complex part…

… But the basic uVMM working as root VMM does not need to bother, because the uVMM assigns the I/O to its guest

Nested Virtualization Challenges

Performance

• The root VMM should emulate guest VMCS accesses and guest VMM-VM transitions

– Intel supports “VMCS Shadowing” to accelerate guest VMREAD/VMWRITE

– No Intel CPU support to accelerate other things, yet

• The root VMM should merge control structures (VMCS, EPT, MSR lists, etc.)

• Root VMM execution pollutes caches

Complexity

• NV support requires complex code, which increases the VMM footprint and raises the risk of bugs and security issues

SUMMARY

• Virtualization is a powerful technology with increasingly many usages

• Intel VMX provides VMMs with HW assists for CPU, memory and I/O virtualization

• uVMM is a lightweight VMM that performs the bare minimum required to support a single unmodified OS

• A root VMM that supports Nested Virtualization allows its guest to run as VMM and launch and control its own VMs

• NV support is becoming an inevitable requirement for VMMs, but it is still slow and complex

• In the uVMM model (single VM, no I/O virtualization) NV becomes less painful