
Page 1: PPT

Virtual Machines Background

Adapted from Silberschatz

Page 2: PPT

Virtual Machines

• A virtual machine takes the layered approach to its logical conclusion. It treats hardware and the operating system kernel as though they were all hardware.

• A virtual machine provides an interface identical to the underlying bare hardware.

• For example, the operating system creates the illusion of multiple processes, each executing on its own processor with its own (virtual) memory.

Page 3: PPT

Virtual Machines (Cont.)

• The resources of the physical computer are shared to create the virtual machines.
– CPU scheduling can create the appearance that users have their own processor.
– Spooling and a file system can provide virtual card readers and virtual line printers.
– A normal user time-sharing terminal serves as the virtual machine operator’s console.

Page 4: PPT

System Models

[Diagram: system models — a non-virtual machine beside a virtual machine.]

Page 5: PPT

Advantages/Disadvantages of Virtual Machines

• The virtual-machine concept provides complete protection of system resources since each virtual machine is isolated from all other virtual machines. What might be bad about this?
– This isolation, however, permits no direct sharing of resources.

• A virtual-machine system is a perfect vehicle for operating-systems research and development. System development is done on the virtual machine, instead of on a physical machine, and so does not disrupt normal system operation.

• The virtual-machine concept is difficult to implement due to the effort required to provide an exact duplicate of the underlying machine.

Page 6: PPT

Java Virtual Machine

• Compiled Java programs are platform-neutral bytecodes executed by a Java Virtual Machine (JVM).

• A JVM consists of:
– class loader
– class verifier
– runtime interpreter (sketched below)

• Just-In-Time (JIT) compilers increase performance.
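To make “runtime interpreter” concrete, here is a minimal fetch-decode-dispatch loop in C over a tiny invented bytecode (not real Java bytecode). A JIT removes the per-instruction dispatch cost seen below by compiling hot code to native instructions, which is where the performance gain comes from.

```c
#include <stdint.h>
#include <stdio.h>

enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

void interpret(const uint8_t *code)
{
    int64_t stack[64];
    int sp = 0, pc = 0;

    for (;;) {
        switch (code[pc++]) {               /* fetch and decode */
        case OP_PUSH:  stack[sp++] = code[pc++]; break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
        case OP_PRINT: printf("%lld\n", (long long)stack[--sp]); break;
        case OP_HALT:  return;
        }
    }
}

int main(void)
{
    const uint8_t prog[] = {OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT};
    interpret(prog);    /* prints 5 */
    return 0;
}
```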

Page 7: PPT

Java Virtual Machine

Page 8: PPT

An Overview of Virtual Machine Architectures

Smith and Nair

Page 9: PPT

Definitions

• Instruction Set Architecture (ISA)
– Precise specification of the interface between hardware and software.

• Application Binary Interface (ABI)
– Defines how an application can work with a platform at the binary level. (Contrast with API; see the sketch below.)
– Includes the user ISA, the system call interface, etc.
– Suppose an ABI is changed.
• Recompile?
• Source changes?
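A minimal sketch of the API/ABI distinction, assuming Linux on x86-64: both calls below write the same bytes to stdout, but the first goes through the libc API while the second invokes the kernel’s binary interface directly (syscall number in rax, arguments in rdi/rsi/rdx). If the ABI changed — say the syscall numbering — the inline-assembly version would break with no source change, while the API call would keep working after relinking against an updated library.

```c
#include <unistd.h>

int main(void)
{
    const char msg[] = "hi\n";

    /* API level: the C library hides the calling convention. */
    write(1, msg, 3);

    /* ABI level: __NR_write = 1 on x86-64; fd in rdi, buffer in
     * rsi, length in rdx; syscall clobbers rcx and r11. */
    long ret;
    asm volatile("syscall"
                 : "=a"(ret)
                 : "a"(1L), "D"(1L), "S"(msg), "d"(3L)
                 : "rcx", "r11", "memory");
    return 0;
}
```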

Page 10: PPT

Virtualization

• VMM also known as hypervisor.

[Diagram: on the left, a conventional stack — Application on OS on Hardware — meeting at the ISA. On the right, a virtual machine: the VMM runs on the host hardware’s real ISA and exports a virtual ISA to the guest’s Application and OS.]

Page 11: PPT

Virtual Machine Uses

• Emulation
– One ISA can be used to emulate another.
– Provides cross-platform portability.

• Optimization
– Emulators can optimize as they emulate.
– Can also optimize same ISA to same ISA.

• Replication
– A single physical machine can be replicated, providing isolation between the VMs.

• Composition
– Two virtual machines can be composed, combining the functionality of each.

Page 12: PPT

Process vs. System

• Meaning of “machine” depends on perspective.
– To a process, the machine is the system calls, libraries, etc.
• Already abstract.
– The entire system also runs on a machine.
• Includes the ISA, actual devices, etc.
– Other kinds of machines?

• As there are two perspectives, there are two kinds of virtual machines: process and system.
– A process virtual machine can support an individual process.
– A system virtual machine can run a complete OS plus environment.

Page 13: PPT

Process vs. System

[Diagram: Process VM vs. System VM. Left: on Linux/x86, native apps run beside Java programs, each Java program running on its own Java VM (process VMs). Right: a VMM on x86 hosts both Linux with native apps and Windows with Win32 apps (system VM).]

Examples?

Page 14: PPT

Process VMs

• Multiprogramming– A process has the illusion of having the whole machine to itself.

• Emulation– Interpreted. (Define.)– Translated. (Define.)– What are relative merits?

• Dynamic optimizers– Especially useful with some kind of profile-directed translation.

• High Level Language VMs– High-level language is compiled to an intermediate language.– VM then runs the intermediate language.– Example is Java: Interpreted or translated?
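A minimal sketch of the “translated” flavor of emulation, with invented names: guest basic blocks are translated to host code once, cached, and re-executed from the cache thereafter. This is the heart of the relative-merits question — interpretation pays a dispatch cost on every executed instruction, while translation pays once per block and then runs at near-native speed.

```c
#include <stdint.h>
#include <stddef.h>

typedef void (*host_block_fn)(void);    /* translated host code */

struct tb_entry {                       /* translation-cache slot */
    uint64_t      guest_pc;
    host_block_fn code;
};

#define TB_SLOTS 4096
static struct tb_entry tb_cache[TB_SLOTS];

/* Translate one guest basic block starting at guest_pc into host
 * machine code (translation itself elided) and return it. */
extern host_block_fn translate_block(uint64_t guest_pc);

host_block_fn lookup_or_translate(uint64_t guest_pc)
{
    struct tb_entry *e = &tb_cache[guest_pc % TB_SLOTS];
    if (e->code == NULL || e->guest_pc != guest_pc) {
        e->guest_pc = guest_pc;          /* translate once...     */
        e->code     = translate_block(guest_pc);
    }
    return e->code;                      /* ...execute many times */
}
```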

Page 15: PPT

System VMs

• Same ISA
– “Classic” (Define. Pros/cons?)
• VMM built directly on top of the hardware.
• Most efficient, but requires wiping the slate clean.
• Requires device drivers in the VMM.
– Hosted (Define. Pros/cons?)
• VMM built on top of an existing OS.
• Most convenient.
• Device drivers are supplied by the host OS; the VMM uses facilities provided by the host OS.

• Different ISA
– Whole-System VMs: Emulation
• The ISA is not the same; must emulate everything.
– Co-Designed VMs: Optimization
• Hardware designed to support VMs.
• Provides a clean design for virtualization.
• Can be significantly more efficient.

Page 16: PPT

Virtualization

• The state of a machine must be maintained.
– Physical machine: latches, flip-flops, etc.
– Virtual machine: a combination of physical machine state and state emulated in software using RAM, etc.

• At certain points in execution, such as a trap, the state of the machine must be “materialized”.
– Not trivial, due to the complex hardware techniques used to provide high performance.
– This ability to materialize the state is termed “preciseness”.

• Three aspects of virtualization (see the sketch below):
– State: registers and memory
– Instructions: may involve emulation
– State materialization: when exceptions occur
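A minimal sketch, with invented names, of the per-virtual-CPU state a VMM might keep. At a trap, whatever guest state currently lives in host registers must be spilled back into this structure so handlers observe a precise architected state.

```c
#include <stdint.h>

struct vcpu_state {
    uint64_t gpr[16];      /* guest general-purpose registers      */
    uint64_t pc;           /* guest program counter                */
    uint64_t flags;        /* guest condition codes                */
    uint64_t pt_base;      /* guest page-table base (privileged)   */
    uint8_t *guest_ram;    /* guest memory, backed by host RAM     */
};

/* Called on a trap: spill any host registers that shadow guest
 * registers back into the struct, so exception handlers see a
 * precise ("materialized") guest state. */
void materialize_state(struct vcpu_state *vs);
```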

Page 17: PPT

Process VMs Virtualization

• Multiprogramming
– State
• Mapped 1:1.
– Instructions
• Native.
– State materialization
• Provided by hardware.

• Dynamic translation
– State
• Registers mapped to host registers as available (overflow to memory). Memory mapped to host memory.
– Instructions
• Emulated.
– State materialization
• Provided by VM software.

• HLL VMs
– State
• Mapped to host resources as available.
– Instructions
• Emulated, JIT compiled.
– State materialization
• Provided by VM software.

Page 18: PPT

System VMs Virtualization

• “Classic” VMs
– State
• Mapped 1:1, except for privileged registers.
– Instructions
• Native, except trapping for privileged instructions.
– State materialization
• Provided by hardware.

• Whole-System VMs
– State
• Mapped to available memory, not 1:1.
– Instructions
• Emulated.
– State materialization
• Provided by VM software.

• Co-Designed VMs
– State
• Mapped 1:1.
– Instructions
• Block-level translated.
– State materialization
• Provided by a hardware/VM software combination.

Page 19: PPT

Taxonomy

• Process
– Same ISA
• Multiprogramming
• Dynamic optimization
– Different ISA
• Dynamic translators
• HLL VM

• System
– Same ISA
• “Classic” OS VMs (IBM)
• Hosted VMs
– Different ISA
• Whole system
• Co-designed VMs

Page 20: PPT

Key Ideas

• VMs can support an individual process only, or can support a whole OS.

• Can construct a useful taxonomy based on:
– process or system
– same ISA or different ISA

Page 21: PPT

Virtualizing I/O Devices on VMware Workstation’s Host VMM

Page 22: PPT

Virtualizing the PC Platform

• Several hurdles
– Non-virtualizable processor
• Some privileged instructions fail silently. (Why is this a problem?) (What’s the solution? See the sketch after this slide.)
– PC hardware diversity
• Why is this problematic for a “classic” VM?
– Pre-existing PC software
• Must stay compatible.

• To address these, VMware uses a hosted VM. (Not a “classic” VM.)
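A minimal trap-and-emulate sketch with invented names. The scheme relies on every sensitive instruction trapping when the guest kernel is deprivileged; on pre-hardware-virtualization x86, some instructions (popf manipulating the interrupt-enable flag, for example) fail silently instead of trapping, so the virtual state below would silently diverge. That is why VMware resorted to binary translation of guest kernel code instead.

```c
#include <stdint.h>

struct vcpu { uint64_t pc, vflags; /* ... */ };

enum op { OP_CLI, OP_STI, OP_OTHER };
extern enum op decode(uint64_t pc);   /* decode guest instruction */

/* VMM handler invoked when a deprivileged guest instruction traps.
 * Emulate its effect on *virtual* state, never on real state. */
void on_privilege_trap(struct vcpu *v)
{
    switch (decode(v->pc)) {
    case OP_CLI: v->vflags &= ~0x200; break;  /* clear virtual IF */
    case OP_STI: v->vflags |=  0x200; break;  /* set virtual IF   */
    default:     /* emulate other privileged ops */ break;
    }
    v->pc += 1;   /* skip the emulated instruction (length elided) */
}
```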

Page 23: PPT

Two Worlds

• VMApp runs in the host, using the VMDriver host kernel component to establish the VMM.

• CPU is thus executing in either the host world or the virtual world, using VMDriver to switch worlds.

• World switches are expensive, since user and system state must be switched.

Page 24: PPT

Architecture

[Diagram: hosted architecture — the VMApp runs as a process on the host kernel, which contains the VMDriver and VMNet components.]

Page 25: PPT

Virtualizing the NIC

• I/O port operations by guest OS must be intercepted by VMM.– Must then be processed in the VMM (to maintain the virtual

state).– Or executed in the host world. (When must it do what?)

• Send operations start as a sequence of ops to virtual I/O ports.– Upon finalization of the send, the VMApp issues a host OS

syscall to the VMNet driver, which passes it on the real NIC.– Finally requires raising a virtual IRQ to signal completetion.

• Receive operations operate in reverse.– VMApps executes select() syscall on possible sources.– Reads packet, forwards it to VMM which raises a virtual IRQ.
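A minimal sketch, with invented port numbers and names, of the send path just described: most port writes only update virtual NIC state inside the VMM, and only the final “doorbell” write crosses to the host world, where the VMApp hands the frame to the VMNet driver via a syscall.

```c
#include <stdint.h>

#define VNIC_PORT_ADDR  0x300   /* hypothetical port numbers */
#define VNIC_PORT_LEN   0x304
#define VNIC_PORT_SEND  0x308

struct vnic { uint32_t dma_addr, len; int irq_pending; };

/* World switch: VMApp forwards the frame to VMNet via a syscall. */
extern void vmapp_send_to_vmnet(uint32_t guest_addr, uint32_t len);

/* Called by the VMM when a guest OUT instruction traps. */
void vnic_port_write(struct vnic *n, uint16_t port, uint32_t val)
{
    switch (port) {
    case VNIC_PORT_ADDR: n->dma_addr = val; break;  /* stays in VMM */
    case VNIC_PORT_LEN:  n->len      = val; break;  /* stays in VMM */
    case VNIC_PORT_SEND:                            /* world switch */
        vmapp_send_to_vmnet(n->dma_addr, n->len);
        n->irq_pending = 1;     /* raise virtual IRQ on completion */
        break;
    }
}
```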

Page 26: PPT

Details

• Send
– Guest OS out to I/O port.
– Trap to VMDriver.
– Pass to VMApp.
– Syscall to VMNet.
– Pass to actual NIC driver.

• Receive
– Hardware IRQ.
– Actual NIC delivers to VMNet driver.
– VMNet driver causes VMApp to return from select().
– VMApp copies packet to VM memory.
– VMApp asks VMM to raise virtual IRQ.
– Guest OS performs port operations to read data.
– Trap to VMDriver.
– VMApp returns from ioctl() to raise IRQ.

Page 27: PPT

Reducing Network Virtualization Overheads

• Handling I/O ports in the VMM
– Many accesses don’t involve actual I/O.
– Let the VMM maintain the state, avoiding a world switch.

• Send combining (sketched below)
– If the data rate is high, queue up packets and send them in a group.

• IRQ notification
– Use a shared memory bitmap rather than requiring VMApp to call select() when an IRQ is received on the host system.
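A minimal sketch of send combining, with invented names: at low rates each packet is sent immediately, while at high rates packets are queued and flushed as a batch, so one world switch is amortized over many guest sends.

```c
#include <stdint.h>

#define BATCH_MAX 16

struct tx_queue {
    struct { uint32_t addr, len; } pkt[BATCH_MAX];
    int n;
    int high_rate;              /* set by a rate estimator (elided) */
};

extern void vmapp_send_to_vmnet(uint32_t addr, uint32_t len);

/* Flush the batch: one world switch covers up to BATCH_MAX packets. */
void vnic_flush(struct tx_queue *q)
{
    for (int i = 0; i < q->n; i++)
        vmapp_send_to_vmnet(q->pkt[i].addr, q->pkt[i].len);
    q->n = 0;
}

void vnic_send(struct tx_queue *q, uint32_t addr, uint32_t len)
{
    if (!q->high_rate) {            /* low rate: send immediately */
        vmapp_send_to_vmnet(addr, len);
        return;
    }
    q->pkt[q->n].addr = addr;       /* high rate: accumulate */
    q->pkt[q->n].len  = len;
    if (++q->n == BATCH_MAX)
        vnic_flush(q);              /* also flushed from a timer */
}
```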

Page 28: PPT

Performance Enhancements

• Reducing CPU virtualization overhead
– Find operations to the interrupt controller that have memory semantics and replace them with a MOV operation, which does not require intervention by the VMM.
– Apparently requires dynamic binary translation.

• Modifying the guest OS
– Eliminate idle-task page table switching, which is not necessary, since the idle task’s pages are mapped in every process page table.
– Run the idle task with the page table of the last process.
– What would happen if the idle task had a bug and wrote to some random addresses?

Page 29: PPT

Performance Enhancements

• Creating a custom virtual device
– Virtualizing a real device is somewhat inefficient, since the interface to these devices is optimized for real devices, not virtual devices.
– Designing a custom virtual device can reduce expensive operations.
– The disadvantage is that a new device driver must be written in the guest OS for this virtual device.

• Modifying the host OS
– The VMNet driver allocates a kernel memory sk_buff, then copies from the VMApp to the sk_buff.
– Can eliminate the copy by using memory from VM physical memory.

• Bypassing the host OS
– The VMM uses its own drivers, rather than going through the host OS. (Note that going through the host OS is using a kind of process VM provided by the host OS.)
– The disadvantage is that you have to write your own VMM driver for every supported real device.

Page 30: PPT

Summary

• Main goal is to develop some understanding of the issues of hosted system VM performance.

Page 31: PPT

Question

• If we overwrite privileged instructions with a brk instruction, how does the VMM know what instruction used to be there?

Page 32: PPT

Xen and the Art of Virtualization

A (bad) play on “Zen and the Art of Motorcycle Maintenance”

Page 33: PPT

Motivation

• Server farm scenario
– Multiple applications installed on machines.
– Different customers.
– (What’s “admission control”?)

• Current approaches
– Allow users to install and run apps.
• Configuration interaction between apps (like versions of Java jars, shared libraries, etc.) can lead to compatibility problems requiring time-consuming system administration to solve.
• Behavior of one app can impact performance of another. Need performance isolation.
– One approach is QoS.
• Extend the OS to provide QoS to apps.
• (What’s the difference between QoS and real-time? QoS and performance isolation?)

Page 34: PPT

Use VMs

• Instead, use multiple VMs, one VM per app.
– Each app can configure the entire OS exactly how it requires.
– Relatively easier to implement algorithms at the VM level to isolate the performance behavior of different apps.

• Requirements for successful partitioning
– Isolation (Does VMware provide this?)
– Accommodate heterogeneity
– Good performance

• To avoid the performance penalties of VMs like VMware, use paravirtualization.

Page 35: PPT

Design Principles

• Support for unmodified binaries is essential.
– Must virtualize all features required by existing ABIs.

• Support for full multi-app OSs is important. (Not just process VMs.)
– Complex configurations may have multiple processes and should be configured within a single VM.

• Paravirtualization is necessary to obtain high performance and strong resource isolation.
– For example, virtualizing page tables can result in many expensive traps.

• Even on ISAs designed for virtualization, completely hiding the virtualization from the guest OS risks correctness and performance.
– For example, the VM should know real time (and not just virtual time) to handle things like timeouts.

• Contrast with the Denali security model.
– Separate namespaces.
– Xen uses a hypervisor.

Page 36: PPT

The VM Interface Overview

• Memory management
– Paging
• Xen sits in the top 64 MB of every address space, avoiding a TLB flush on hypervisor transitions.
• Guest OSs update the actual hardware page tables through Xen, which improves performance. (But makes them aware of virtualization.)
– Segmentation
• Cannot install fully privileged segment descriptors.

Page 37: PPT

The VM Interface Overview (contd.)

• CPU
– Protection
• The guest OS must run at lower privilege. Since rings 1-2 are seldom used, run the guest OS in ring 1.
– Exceptions
• Guest OSs must register handlers with Xen. Generally identical to the originals.
• Safety is ensured by making sure the handler doesn’t execute in ring 0.
– System calls
• “Fast” handlers may be registered to avoid going through ring 0; instead go directly from ring 3 to ring 1.
• Does this change the ABI?
– Page faults
• The page fault handler must be modified, since the faulting address is in a privileged register (CR2 on x86).
• The technique is for Xen to write it to a location in the stack frame.

• Device I/O
– Network, disk, etc.
• All replaced with a special, buffer-based event mechanism.

Page 38: PPT

Porting

• XP directly accessed PTEs; Linux used macros. (Why is this significant?)

Page 39: PPT

Control and Management

• Separation of policy from mechanism

• Microkernel-like design
– Basic control mechanisms provided by the hypervisor through a control interface.
– Policies implemented by a special, distinguished guest OS instance (a domain).
• Scheduling parameters, physical memory allocations, domain creation/destruction, and creating/deleting virtual network interfaces and block devices.

Page 40: PPT

Architecture

Page 41: PPT

Details

Page 42: PPT

Hypercalls and Events

• Hypercalls (sketched below)
– From domain to Xen.
– Explicit calls into the hypervisor by the guest OS. Used by the guest OS for things like updating hardware page tables.

• Events
– From Xen to domain.
– A bitmask, plus a handler.
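A minimal sketch of a hypercall in the style of early 32-bit Xen, which entered the hypervisor through a software interrupt (int 0x82) with the hypercall number in eax and arguments in registers; treat the specific numbers here as illustrative.

```c
#include <stdint.h>

/* Hypercall number; 1 matches mmu_update in Xen's public headers,
 * but treat it as illustrative here. */
#define __HYPERVISOR_mmu_update 1

/* Two-argument hypercall: number in eax, args in ebx and ecx. */
static inline long hypercall2(long op, long a1, long a2)
{
    long ret;
    asm volatile("int $0x82"
                 : "=a"(ret)
                 : "a"(op), "b"(a1), "c"(a2)
                 : "memory");
    return ret;
}
```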

Page 43: PPT

Data Transfer

• The presence of the hypervisor is another layer, so it is imperative to minimize overhead.

• For resource accountability:
– Minimize the work to demultiplex data.
• Or, figure out as quickly as possible which domain it goes to.
– Memory committed to I/O comes from the relevant domains.

• Minimize cross-talk.

Page 44: PPT

I/O Rings

• Buffers are separate. How is the pointer shared? How does reordering work? NBS.
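A minimal sketch, with invented names, of a Xen-style descriptor ring: the data buffers live elsewhere (out of band), each side advances only its own producer index, and descriptors carry IDs so responses can be matched to requests even when they complete out of order.

```c
#include <stdint.h>

#define RING_SIZE 32                  /* power of two */

struct ring_desc {
    uint64_t id;                      /* echoed back in the response */
    uint64_t buf_addr;                /* guest buffer (kept separate) */
    uint32_t len;
    int32_t  status;                  /* filled in on response */
};

struct io_ring {
    volatile uint32_t req_prod;       /* advanced only by the guest */
    volatile uint32_t rsp_prod;       /* advanced only by Xen       */
    uint32_t req_cons, rsp_cons;      /* each side's consumer index */
    struct ring_desc desc[RING_SIZE];
};

/* Guest side: queue one request if there is room. */
int ring_put_request(struct io_ring *r, const struct ring_desc *d)
{
    if (r->req_prod - r->rsp_cons >= RING_SIZE)
        return -1;                        /* ring full */
    r->desc[r->req_prod % RING_SIZE] = *d;
    __sync_synchronize();                 /* publish desc before index */
    r->req_prod++;
    return 0;
}
```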

Page 45: PPT

CPU Scheduling

• CPU Scheduling
– BVT (Borrowed Virtual Time)
• Work-conserving
– Latency vs. throughput.
– When would you want non-work-conserving?
• Fast dispatch (borrowing)

Page 46: PPT

Time and Timers

• Time and timers
– Guest OSs are made aware of real time, virtual time, and wall-clock time.
• Real time: nanoseconds since boot; can be frequency-locked to an external source.
• Virtual time advances only when the guest OS is executing. Used for scheduling by the guest OS.
• Wall-clock time? An offset from real time. (When would it ever be adjusted?)
– Xen-provided timers are used by the guest OS.

• Solves one efficiency problem with VMware Workstation.
– A guest XP causes the host to perform poorly, because the host must constantly deliver timer interrupts to XP to do things like smooth transition animations (like minimizing a window). Forcing the guest to use a Xen-provided timer would eliminate the need to virtualize these timer interrupts.

Page 47: PPT

Virtual Address Translation

• Virtual address translation
– Handled by Xen, with batched updates (sketched below).
– Must be validated by Xen.

• A type and reference count are associated with each frame.
– The type is used to aid validation.
• For example, a page table frame needs to be validated once, but not afterwards.
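A minimal sketch of batching page-table updates, modeled loosely on Xen’s mmu_update hypercall (struct layout and hypercall plumbing are illustrative). Queuing updates and issuing them in one hypercall amortizes the domain-to-hypervisor crossing; Xen validates each entry before applying it.

```c
#include <stdint.h>
#include <stddef.h>

struct mmu_update {
    uint64_t ptr;   /* machine address of the PTE to update */
    uint64_t val;   /* new PTE contents, validated by Xen   */
};

#define MMU_BATCH 64
#define __HYPERVISOR_mmu_update 1     /* illustrative number */

extern long hypercall2(long op, long a1, long a2);  /* see sketch above */

static struct mmu_update queue[MMU_BATCH];
static size_t queued;

/* Issue all queued updates in a single domain-to-Xen crossing. */
void flush_pte_updates(void)
{
    if (queued)
        hypercall2(__HYPERVISOR_mmu_update, (long)queue, (long)queued);
    queued = 0;
}

/* Guest memory-management paths call this instead of writing PTEs. */
void queue_pte_update(uint64_t pte_maddr, uint64_t new_val)
{
    queue[queued].ptr = pte_maddr;
    queue[queued].val = new_val;
    if (++queued == MMU_BATCH)
        flush_pte_updates();
}
```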

Page 48: PPT

Physical Memory

• Physical memory
– Reserved for each guest OS instance at the time of creation.
• Provides strong isolation.
• But no sharing. What would be the advantage of sharing?
– The OS may use an additional table to give the illusion of contiguous physical memory (sketched below).
– Might need to know the hardware for optimizing placement.
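A minimal sketch, with invented names, of that “additional table”: a paravirtualized guest keeps a physical-to-machine (P2M) array, filled in at domain creation, so the OS above it sees contiguous pseudo-physical frames while Xen hands out arbitrary machine frames.

```c
#include <stdint.h>

#define GUEST_FRAMES (64 * 1024 * 1024 / 4096)   /* e.g., a 64 MB guest */

/* Pseudo-physical frame -> machine frame; filled in by the
 * domain builder when the guest's reservation is created. */
static uint64_t p2m[GUEST_FRAMES];

/* Translate a guest "physical" frame number to the real machine
 * frame number before placing it in a PTE handed to Xen. */
static inline uint64_t pfn_to_mfn(uint64_t pfn)
{
    return p2m[pfn];
}
```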

Page 49: PPT

Network

• VIFs (virtual interfaces)
– Two I/O rings.
– Zero-copy.

Page 50: PPT

Disks

• VBDs (Domain0 has direct access.)

• Disk scheduling
– The guest doesn’t know the real layout.
– Xen does some reordering.
• (A bit of a violation of policy/mechanism.)

• Scheduling is round-robin over batched requests, then elevator.
– May also have reorder barriers.
– (How well does this provide isolation?)

Page 51: PPT

Performance

Page 52: PPT

Relative Performance

• Compared Linux, XenoLinux, VMware Workstation 3.2, and UML (User-Mode Linux).
– Tests with others could not be published.

• Tests
– SPEC INT2000
– Linux build
• Native Linux: 7% of CPU is system time.
– Open Source Database Benchmark (OSDB) Information Retrieval (IR)
– OSDB On-Line Transaction Processing (OLTP)
– dbench
• A file system benchmark.
– SPEC WEB99
• An application-level benchmark for Web servers (Apache).

Page 53: PPT

Performance

Page 54: PPT

Performance

Page 55: PPT

Operating System BMs

• What does SMP stand for?
• Why might SMP be slower?
• Why are the highlighted ones slower?
• Why is signal handling faster for Xen?

Page 56: PPT

Operating System BMs

• Needs a hypercall.
• Why do more processes need more time?
• Why is there less of a significant difference with a bigger working set?

Page 57: PPT

Operating System BMs

• mmap and page faults require two transitions. (Why?)

Page 58: PPT

Operating System BMs

• Zero-copy

Page 59: PPT

Concurrent VMs

• Run on a 2-CPU SMP.
• Apache improves only 28% over UP (uniprocessor).
• Xen improves 9% over UP.
• Why is it slightly better sometimes?

Page 60: PPT

PostgreSQL

• Scores running multiple PostgreSQL instances on native Linux are 25-35% lower, possibly due to SMP scalability issues plus poor use of the block cache.

• Weights seem to have an effect in the Information Retrieval case, but no effect in the OLTP case due to lots of synchronous writes. Why synchronous writes?

Page 61: PPT

Performance Isolation

• Only 4% and 2% below earlier results.
– Does this make sense?

The four domains ran:
– Domain 1: OSDB-IR
– Domain 2: SPEC WEB99
– Domain 3: dd and creating many files in large directories
– Domain 4: fork bomb and touching 3 GB of virtual memory

Page 62: PPT

Scalability of VMs

• SPEC INT2000
• Native Linux identifies it as compute-bound and uses a 50 ms time slice. (Why does this matter?)

Page 63: PPT

Future Work

• Universal buffer cache with COW
– How might this be used?

• Last-chance page cache (LPC)
– “of non-zero length only when machine memory is undersubscribed.”
– Clean, evicted pages are added to the LPC.
– If a page faults, check the LPC.
• (Why only clean pages?)

Page 64: PPT

Key Ideas

• A virtual ISA (paravirtualization) is better.
– Better performance.
– Allows VMs to be isolated from one another. One VM can’t cause another to thrash, for instance.
– Allows up to 100 OS instances.
– Making the guest OS aware of virtualization improves correctness and performance.

• Control and management of Xen itself is done from a guest OS, via a special interface.

• Cherry picking?
– Generally speaking, people always choose tests that show their work in the best light.
– Maybe hard to tell in a complex situation.

Page 65: PPT

Microkernels Meet Recursive Virtual Machines

Ford et al

Page 66: PPT

Decomposition

• Microkernels decompose functionality horizontally (mainly).
– Monolithic services are separated horizontally.
– Moved up one layer.

• Stackable VMMs decompose functionality vertically.
– Each layer supplies some functionality.

Page 67: PPT

Fluke

• Uses a nested process architecture.
– Each process provides a VM to its children, possibly with additional functionality.
– Different from the usual parent-child relationship in that children are completely contained within, and visible to, the parent.
– This is necessary for the parent to be a VM to its children.

• Two APIs
– A low-level kernel API to the microkernel for basic manipulation.
– High-level protocols to handle:
• Parent Interface
• Process
• MemPool
• FileSystem
– Nested VMs interact directly with the microkernel for the low-level API, but interact with the parent VM for high-level protocols.
– The parent VM uses interposition to add additional functionality. This is how the stacking works.

Page 68: PPT
Page 69: PPT

Key Ideas

• Implement a microkernel that allows process virtual machines to be stacked.

• Each virtual machine is a user-level server.
• Stacking occurs through process nesting.
• Use pass-through to avoid exponential behavior.
• Mainly interesting for the ideas; performance is relatively poor, but may be improvable.