the nova kernel api - fosdem · 02 nova architecture reduce complexity of hypervisor: hypervisor...

44
Fakult ¨ at Informatik Institut f ¨ ur Systemarchitektur, Betriebssysteme THE NOVA KERNEL API Julian Stecklina ([email protected]) Dresden, 5.2.2012

Upload: others

Post on 25-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

Fakultat Informatik Institut fur Systemarchitektur, Betriebssysteme

THE NOVA KERNEL API

Julian Stecklina ([email protected])

Dresden, 5.2.2012

Page 2: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

00 Disclaimer

This is not about OpenStack Compute.

NOVA is mainly the work of Udo Steinberg (kernel) and Bernhard Kauer (userland).

http://hypervisor.org/

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 2 von 26

Page 3: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

00 Goals

• not talking about virtualization propaganda,

• giving a very short overview of NOVA as a whole

• introducing basic concepts of the kernel API

In the end you should be able to pick up the NOVA API manual and make heads ortails of it.

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 3 von 26

Page 4: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

01 NOVA OS Virtualization Architecture

http:

//os.inf.tu-dresden.de/papers_ps/steinberg_eurosys2010.pdf

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 4 von 26

Page 5: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

01 What works, what doesn’t

Works• x86 32-bit

• SMP

• VT-x, AMD-V

• VT-d (Intel IOMMU)

• SR-IOV

• grub, syslinux, . . .

• Linux, L4, . . .

• emulates AHCI, igb, . . .

• drivers for AHCI, someIntel NICs, . . .

• experimental libvirt support

Doesn’t work yet• Windows

• Migration

• Recursive Virtualization

• 64-bit

• being user-friendly ;-)

• . . .

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 5 von 26

Page 6: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM
Page 7: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

02 NOVA Architecture

Reduce complexity of hypervisor:

• hypervisor provides low-level protection domains– address spaces– virtual machines

• one VMM per guest in (root mode) userspace,– possibly specialized VMMs to reduce attack surface– only one generic VMM implement so far

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 7 von 26

Page 8: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

Demo

Page 9: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

03 The L4 Influence

NOVA cannot deny its roots in the L4 family:

• task, threads, synchronous IPC

• recursive mapping of memory

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 9 von 26

Page 10: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

03 Capability-Based

Syscalls operate on capabilities to kernel objects:

• Protection Domain (PD) (“task”) — create pd

• Execution Context (EC) (“thread”) — create ec, ec ctrl

• Scheduling Context (SC) — create sc, sc ctrl

• Portals (PT) — create pt, call, reply

• Semaphore (SM) — create sm, sm ctrl

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 10 von 26

Page 11: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

03 Capabilities

Userspace can

• create capabilities to objects (by creating kernel objects),

• delegate capabilities (recursively, just as memory),

Capabilities are stored per-PD in capability space in the kernel. A PD

• uses index into capability space to name capabilities,

• unforgeable.

(Think file descriptors.)

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 11 von 26

Page 12: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

03 Communication

EC (thread)• bound to one PD (address

space)

• either thread or vCPU

• has a special memoryregion (UTCB) for IPC

Portals• entry point (instruction pointer)

• bound to one EC

• per client/function/. . .

• pass data, delegate capabilitiesfrom UTCB to UTCB

• can be called or implicitly used byexceptions (if a thread has the cap)

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 12 von 26

Page 13: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

03 ECs and SCs

There are two kinds of threads:

with time• “global thread” or vCPU

• stick SC to newly created EC

• causes startup exception whenfirst scheduled

without time• “local thread”

• bind portals to ECs

• when portal invoked, startsexecuting at portal EIP

• caller hands in time to handlethe request (no schedulingdecision)

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 13 von 26

Page 14: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

03 Basic Server Scenario

ECglobal

EClocal

(1) call

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 14 von 26

Page 15: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

03 Basic Server Scenario

ECglobal

EClocal

(1) call

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 14 von 26

Page 16: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

03 Basic Server Scenario

ECglobal

EClocal

(2) reply

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 14 von 26

Page 17: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

03 Resource Contention

ECs are not reentrant.What happens when a second client wants to call a service?

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 15 von 26

Page 18: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

03 Resource Contention

ECglobal

EClocal

(1) call

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 15 von 26

Page 19: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

03 Resource Contention

ECglobal

EClocal

(1) call (2) call

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 15 von 26

Page 20: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

03 Resource Contention

ECglobal

EClocal

(1) call (2) call

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 15 von 26

Page 21: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

03 Resource Contention

ECglobal

EClocal

(2) call

(3) reply

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 15 von 26

Page 22: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

03 Resource Contention

ECglobal

EClocal

(4) reply

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 15 von 26

Page 23: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

03 NOVA’s time management

With SCs only bound to some threads, it is possible to build (simple) serverswithout time reservation.

• How much time should service foo need anyway?

• fewer things to schedule,

• contended resources get “boosted” by clients as needed.

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 16 von 26

Page 24: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

04 Hardware Support for Virtualization

Late Pentium 4 (2004) introduced hardware support for virtualization: Intel VT.(AMD-V is conceptually very similar)

• root mode vs. non-root mode– root mode runs hypervisor– non-root mode runs guest

• situations that Intel VT cannot handle trap to root mode (VM Exit)

• special memory region (VMCS) holds guest state

• reduced software complexity

Supported by all major virtualization solutions today.

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 17 von 26

Page 25: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

04 VT-x Problems

VMCS (memory region holding guest state) needs to manipulated by VMM, yet

• cannot be mapped into userspace,

• have to use privileged VMREAD/VMWRITE instructions to access,

• reading all content for every VM Exit is expensive.

Kernel has to manage VMCS access.

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 18 von 26

Page 26: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

04 Virtualization on NOVA

VM Exits (and normal exceptions) vector through special portals.

• Portals created with bit field denoting interesting information (MessageTransfer Descriptor, MTD)

– for WRMSR or CPUID we need only general purpose registers– for page fault we need complete vCPU state

• kernel puts this data in handler’s UTCB

• handler produces new MTD on reply

Reduce number of expensive VMREAD/VMWRITE in the kernel.

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 19 von 26

Page 27: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

04 Writing to disk

Page 28: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

04 Writing to disk

VMM

Drv

vCPUExc

Handler IRQ

SharedMemory

Page 29: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

04 Writing to disk

VMM

Drv

vCPUExc

Handler IRQ

MMIO

SharedMemory

Page 30: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

04 Writing to disk

VMM

Drv

vCPUExc

Handler IRQ

MMIO

SharedMemory

"write data"

Page 31: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

04 Writing to disk

VMM

Drv

vCPUExc

Handler IRQ

MMIO

SharedMemory

"working on it"

Page 32: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

04 Writing to disk

VMM

Drv

vCPUExc

Handler IRQ

SharedMemory

Page 33: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

04 Writing to disk

VMM

Drv

vCPUExc

Handler IRQ

SharedMemory

Page 34: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

04 Writing to disk

VMM

Drv

vCPUExc

Handler IRQ

SharedMemory

recall

Page 35: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

04 Writing to disk

VMM

Drv

vCPUExc

Handler IRQ

SharedMemory

recallexception

Page 36: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

04 Writing to disk

VMM

Drv

vCPUExc

Handler IRQ

SharedMemory

injectirq

Page 37: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

04 Writing to disk

VMM

Drv

vCPUExc

Handler IRQ

SharedMemory

Page 38: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

http://os.inf.tu-dresden.de/papers_ps/steinberg_eurosys2010.pdf

Page 39: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

05 There is also . . .

• Userspace Timer Service

• Admission Server

• Device Drivers (IOMMU!)

• . . .

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 22 von 26

Page 40: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

05 Summary

The NOVA microhypervisor is a

• fast capability-basedmicrokernel

• with virtualization

in mind.

Supported by:

http://ict-passive.eu/

Code at http://hypervisor.org/

Discuss at http://os.inf.tu-dresden.de/mailman/listinfo/l4-hackers

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 23 von 26

Page 41: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

06 Multiple CPUs

Thread-related kernel objects are bound to one CPU:

• Portals,

• Execution Contexts,

• Scheduling Contexts.

Semphores work cross-CPU. Communication via Semaphores/recall.“Non-donating” cross-CPU IPC never really needed.Servers can be CPU-topology aware!

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 24 von 26

Page 42: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

http://os.inf.tu-dresden.de/papers_ps/steinberg_eurosys2010.pdf

Page 43: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

06 Livelock

It’s possible to construct helpingloops... Ouch!

• Kernel detects loop

• Random IPC is aborted

PagerSrv A

Srv BClient

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 26 von 26

Page 44: THE NOVA KERNEL API - FOSDEM · 02 NOVA Architecture Reduce complexity of hypervisor: hypervisor provides low-level protection domains –address spaces –virtual machines one VMM

06 Livelock

It’s possible to construct helpingloops... Ouch!

• Kernel detects loop

• Random IPC is aborted

PagerSrv A

Srv BClient

TU Dresden, 5.2.2012 The NOVA Kernel API Folie 26 von 26