A Survey on Virtualization Technologies
Susanta K Nanda
Virtualization is “HOT”
  Microsoft acquires Connectix Corp.
  EMC acquires VMware
  Veritas acquires Ejascent
  IBM, already a pioneer
  Sun working hard on it
  HP picking up
  Virtualization is HOT!!!
Virtualization: What is it, really?
  Real vs. Virtual
    Similar essence, effect
    “Formally” different
  A framework that combines or divides [computing] resources to present a transparent view of one or more environments
    Hardware/software partitioning (or aggregation)
    Partial or complete machine simulation
    Emulation (again, can be partial or complete)
    Time-sharing (in fact, sharing in general)
  In general, can be an M-to-N mapping (M “real” resources, N “virtual” resources)
    Examples: VM (M-N), Grid Computing (M-1), Multitasking (1-N)
Virtualization: Why?
  Server consolidation
  Application consolidation
  Sandboxing
  Multiple execution environments
  Virtual hardware
  Debugging
  Software migration (Mobility)
  Appliance (software)
  Testing/Quality Assurance
Virtual Machine Implementation: Issues
  Only one “bare” machine interface
  Virtualizable Architecture
    “A virtualizable architecture allows any instruction inspecting/modifying machine state to be trapped when executed in any but the most privileged mode”
      - Popek & Goldberg (1974)
    x86 is not virtualizable (Vanderpool??)
  Hard to optimize [from below]
    Unused memory pages
    Idle CPU
  Difficult to know what NOT to do
    Example: Page faults (VMM), System Calls (OS level)
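The Popek & Goldberg condition quoted above can be illustrated with a toy trap-and-emulate loop. This is only a sketch under stated assumptions: the instruction names `set_if` and `clear_if`, the `Trap` exception, and the `ToyVMM` class are hypothetical and stand in for real hardware behaviour.

```python
class Trap(Exception):
    """Raised by the toy CPU when a privileged instruction runs deprivileged."""
    def __init__(self, instr):
        super().__init__(instr)
        self.instr = instr

# Hypothetical instructions that inspect or modify machine state.
PRIVILEGED = {"set_if", "clear_if"}

def cpu_execute(instr, privileged_mode):
    """Toy hardware: privileged instructions trap outside the most privileged mode."""
    if instr in PRIVILEGED and not privileged_mode:
        raise Trap(instr)
    return instr  # ordinary instructions simply run

class ToyVMM:
    def __init__(self):
        self.virtual_if = True  # the guest's virtual interrupt flag
        self.trapped = []       # instructions the VMM had to emulate

    def run_guest(self, program):
        # The guest always runs deprivileged, so every privileged
        # instruction reaches the VMM as a trap.
        for instr in program:
            try:
                cpu_execute(instr, privileged_mode=False)
            except Trap as trap:
                self.emulate(trap.instr)

    def emulate(self, instr):
        # Apply the effect to the guest's virtual state, not the real machine.
        self.trapped.append(instr)
        if instr == "clear_if":
            self.virtual_if = False
        elif instr == "set_if":
            self.virtual_if = True
```

On an architecture that is not virtualizable, some state-touching instructions would return from `cpu_execute` without raising, so the VMM would never get the chance to emulate them.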
Machines: Stacked Architecture
  [Figure: layered stack of APPLICATIONS, USER LEVEL LIBRARIES, KERNEL, and HARDWARE. Interfaces between layers: API Calls, System Calls, Instructions. The stack is divided into User Space and Kernel Space.]
Possible Abstraction Levels
  Instruction Set Architecture
    Emulate the ISA in software
      Interprets, translates to host ISA (if required)
      Device abstractions implemented in software
      Inefficient
    Optimizations: Caching? Code reorganization?
    Applications: Debugging, Teaching, multiple OS
  Hardware Abstraction Layer (HAL)
    Between “real machine” and “emulator” (maps to real hardware)
    Handling non-virtualizable architectures (scan, insert code?)
    Applications: Fast and usable, virtual hardware (in above too), consolidation, migration
Possible Abstraction Levels cont’d
  Operating System Level
    Virtualized SysCall interface (may be same)
    May or may not provide all the device abstractions
    Easy to manipulate (create, configure, destroy)
  Library (user-level API) Level
    Presents a different subsystem API to the application
    Complex implementation, if kernel API is limited
    User-level device drivers
  Application (Programming Language) Level
    Virtual architecture (ISA, registers, memory, …)
    Platform independence (highly portable)
    Less control over the system (extremely high-level)
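Overall Picture
  [Table: the ISA, HAL, OS, Library, and PL levels rated on Performance, Flexibility, Ease of Impl, and Degree of Isolation (more stars are better). Stars per row, summed over the five levels: Performance 14, Flexibility 13, Ease of Impl 10, Degree of Isolation 14.]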
Instruction Set Architecture Level Virtualization
  Technologies
    Emulation: Translates guest ISA to native ISA
    Emulates h/w specific IN/OUT instructions to mimic a device
    Translation Cache: Optimizes emulation by making use of similar recent instructions
    Code rearrangement
    Speculative scheduling (alias hardware)
  Issues
    Efficient exception handling
    Self-modifying code
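A minimal sketch of emulation with a translation cache, and of why self-modifying code is listed as an issue. The guest ISA here (`mov`, `add` on named registers) and the `Emulator` class are invented for illustration; a real emulator translates to host machine code rather than Python closures.

```python
def translate(block):
    """Translate a block of guest instructions into one host-callable function."""
    ops = []
    for op, reg, val in block:
        if op == "mov":
            ops.append(lambda regs, r=reg, v=val: regs.__setitem__(r, v))
        elif op == "add":
            ops.append(lambda regs, r=reg, v=val: regs.__setitem__(r, regs[r] + v))
        else:
            raise ValueError(f"unknown guest opcode: {op}")

    def host_code(regs):
        for step in ops:
            step(regs)
    return host_code

class Emulator:
    def __init__(self, guest_memory):
        self.guest_memory = guest_memory  # block address -> guest instructions
        self.cache = {}                   # block address -> translated host code
        self.translations = 0             # how often translate() actually ran

    def run_block(self, addr, regs):
        code = self.cache.get(addr)
        if code is None:                  # cache miss: translate once, then reuse
            code = translate(self.guest_memory[addr])
            self.cache[addr] = code
            self.translations += 1
        code(regs)

    def write_guest_code(self, addr, block):
        """Self-modifying code: a write to guest code invalidates the stale translation."""
        self.guest_memory[addr] = block
        self.cache.pop(addr, None)
```

Running the same block repeatedly pays the translation cost once; overwriting the block forces a fresh translation on the next run.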
ISA Level Virtualization: Examples
  Bochs: Open source x86 emulator
    Emulates whole PC environment
      x86 processor and most of the hardware (VGA, disk, keyboard, mouse, …)
      Custom BIOS, emulation of power-up, reboot
    Host ISAs: x86, PowerPC, Alpha, Sun, and MIPS
  Crusoe (Transmeta)
    “Code morphing engine”: dynamic x86 emulator on VLIW processor
    16 MB “translation cache”
    Shadow registers: Enable easy exception handling
  QEMU
    Full implementation
      Multiple target ISAs: x86, ARM, PowerPC, Sparc
      Supports self-modifying code
      Full-software and simulated (using mmap()) MMU
    User-space only: Useful for cross-compilation and cross-debugging
HAL Virtualization Techniques
  Standalone vs. Hosted
    Drivers
    Host and VMM worlds
    I/O
  Protection Rings
    Multilevel privilege domains
  Handling “silent” fails
    Scan code and insert/replace artificial traps
    Cache results to optimize
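The scan-and-patch idea for “silent” fails can be sketched as follows. The sketch assumes guest code is a list of mnemonics and uses `popf` and `sgdt` as examples of x86 instructions that touch machine state without trapping in user mode; the `trap:` prefix and the function names are hypothetical.

```python
# Instructions that fail silently (do not trap) when run deprivileged.
SILENT = {"popf", "sgdt"}

def patch_block(block):
    """Scan a code block and replace each silent instruction with an artificial trap."""
    return tuple(f"trap:{ins}" if ins in SILENT else ins for ins in block)

_patch_cache = {}  # original block -> patched block

def patched(block):
    """Return the patched form of a block, scanning it only the first time it is seen."""
    key = tuple(block)
    if key not in _patch_cache:
        _patch_cache[key] = patch_block(key)
    return _patch_cache[key]
```

Caching the patched block means a hot loop is scanned once rather than on every execution.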
VMware Architecture
VMware: I/O Virtualization
  VMM does not have access to I/O
  I/O in “host world”
    Low-level I/O instructions (issued by guest OS) are merged to high-level I/O system calls
    VM Application executes I/O SysCalls
  VM Driver works as the communication link between VMM and VM Application
  World switch needs to “save” and “restore” machine state
  Additional techniques to increase efficiency
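One way to picture the merging of low-level I/O into high-level system calls is a buffer that collects guest port writes and hands them to the host in a single call. This is a simplified model, not VMware's implementation: the `HostedIO` class, its method names, and the flush policy are assumptions made for illustration.

```python
import io

class HostedIO:
    def __init__(self, host_file):
        self.host_file = host_file  # file-like object owned by the VM application
        self.pending = bytearray()  # guest bytes not yet handed to the host
        self.world_switches = 0     # count of VMM -> host world transitions

    def guest_out(self, byte):
        """Guest OS issues a low-level port write; the VMM only buffers it."""
        self.pending.append(byte)

    def flush(self):
        """One world switch: the VM application performs a single high-level write."""
        if not self.pending:
            return
        self.world_switches += 1
        self.host_file.write(bytes(self.pending))
        self.pending.clear()
```

Because each world switch has to save and restore machine state, batching four port writes into one host call costs one switch instead of four.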
Paravirtualization
  Traditional architectures do not scale
    Interrupt handling
    Memory management
    World switching
  Virtualized architecture interface
    Much simpler architectural interface
    Virtual I/O and CPU instructions, registers, …
    Portability is lost
Examples
  Denali
    Simpler customized OS with no VM for network applications
  Xen
    Simpler port to commercial OS
    Exposes some “real” hardware, e.g. clock, physical memory address
OS Level Virtualization
  Containers (operating environments) on top of OS
    Processes, File System, Network resource (IP address), Environment variables, System call interface
  Technologies
    chroot(): File system virtualization on Unix
    Name spaces: Each container is tagged, and new entities (fork()) generated from a container remain inside it
    System call interposition: The only interface with user space, can modify parameters, return values (to expose a different environment)
    Copy-on-write: Enables sharing of files
  Applications: Sandboxing, Fine-grain access control (root in the container)
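The name-space tagging and system call interposition described above can be modelled in a few lines. This is a toy model, not a real kernel interface: `ToyKernel`, `Container`, and `resolve_path` are hypothetical names, and the path rewrite only imitates what a chroot()-style confinement achieves.

```python
import posixpath

class Container:
    def __init__(self, cid, root):
        self.cid = cid    # the tag carried by everything inside the container
        self.root = root  # file-system sub-tree the container is confined to

class ToyKernel:
    def __init__(self):
        self.next_pid = 1
        self.owner = {}   # pid -> Container

    def spawn(self, container):
        pid = self.next_pid
        self.next_pid += 1
        self.owner[pid] = container
        return pid

    def fork(self, parent_pid):
        # A new process inherits its parent's tag, so it stays inside the container.
        return self.spawn(self.owner[parent_pid])

    def resolve_path(self, pid, path):
        """Interposed path argument: rewritten so it cannot leave the container root."""
        root = self.owner[pid].root
        inside = posixpath.normpath("/" + path).lstrip("/")
        return posixpath.join(root, inside)
```

Normalizing against a leading `/` before joining is what stops `..` components from climbing out of the container's sub-tree.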
OS Level Virtualization: Examples
  Jail
    FreeBSD based virtualization using “chroot()”
    Scope is limited to the jail
    Curtailed access to resources and operations
      Signals, debugger, IP spoofing, system calls
    A file-system sub-tree, one IP address, one “root”
  Ensim’s “Virtual Private Server”
    Supports virtual “boot”, per-VM resource limits
    Virtual /proc, IP address-space
  Linux “Virtual Environment” (VE)
    Tagged VE (VE-id), policy support for the rights of “root”
Library Level Virtualization
  Technologies
    API interception through DLL hooking
    Partial/complete implementation of APIs
    Emulate low-level kernel implementations in user space
      Useful when the host OS does not provide required support (e.g. Win32 threads vs. pthreads)
    Mandatory drivers
  Examples
    WINE: Win32 API implementation on Unix/X
    POSIX, OS/2 subsystems on Windows
      Supports Unix and OS/2 like API
    LxRun: Linux API implementation on SCO UnixWare, Solaris
    WABI: Sun’s implementation similar to WINE (not extensive)
Windows Architecture
  [Figure: two layered stacks.
   Win9x layer labels: Low-Level Drivers; Win9x Kernel; Kernel32.DLL; Gdi32.DLL, User32.DLL, …; Windows DLLs; Applications.
   NT layer labels: Low-Level Drivers; NT Kernel & Executive; NTDLL.DLL; Kernel32.DLL; User32.DLL, …; Windows DLLs; Executables; POSIX, OS/2 Subsystem.]
Wine Architecture
  Closely follows NT
  Implements all the “core” DLLs (ntdll, user32, kernel32)
  Wine server provides the NT backbone
    Message passing
    Synchronization
    Object handles
  Native DLL support for non-core libraries
  Hardware access through Unix device drivers
WINE Implementation
  Wine server
    IPC through Unix sockets and shared message queues
    Process/Thread management
    Simulates synchronization primitives
  Native vs. Built-in DLLs
    DLLs are implemented as Unix shared libraries (built-in DLLs)
    Supports non-core Windows DLLs (native DLLs)
    A fully implemented built-in DLL takes precedence over native DLLs
  Executable Load
    DLL descriptor table maintains the list of loaded DLLs
    Imports are resolved using the DLL descriptor table or on-disk DLLs
  Processes/Threads
    Windows processes are mapped to WINE/UNIX processes
    Thread-related APIs implemented in user space and using pthreads
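The built-in versus native precedence rule and the descriptor-table lookup can be sketched as a small resolver. This models only the rule stated above and is not Wine's actual loader: the function `resolve_dll`, the `complete` flag, and the fallback to a partial built-in DLL when no native copy exists are all assumptions.

```python
def resolve_dll(name, loaded, builtin, native_on_disk):
    """Pick the DLL implementation that satisfies an import.

    loaded         -- descriptor table: name -> (kind, name) for already loaded DLLs
    builtin        -- name -> {"complete": bool} for DLLs built as Unix shared libraries
    native_on_disk -- set of names of native Windows DLLs available on disk
    """
    name = name.lower()
    if name in loaded:                       # already in the descriptor table
        return loaded[name]
    if name in builtin and builtin[name]["complete"]:
        choice = ("builtin", name)           # fully implemented built-in wins
    elif name in native_on_disk:
        choice = ("native", name)
    elif name in builtin:
        choice = ("builtin", name)           # assumed fallback: partial built-in
    else:
        raise FileNotFoundError(name)
    loaded[name] = choice                    # record it for later imports
    return choice
```

A call such as `resolve_dll("KERNEL32.DLL", loaded, builtin, native)` would pick the built-in copy when it is marked complete, even if a native `kernel32.dll` is also on disk.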
Application Level Virtualization
  Java Virtual Machine (JVM)
    Executes Java byte code (virtual instructions)
    Provides the implementation for the instruction set interpreter (or JIT compiler)
    Provides code verification, SEH, garbage collection
    Hardware access through underlying OS
  JVM Architecture
    Stack-based architecture
    No MMU
    Virtual hardware: PC, register-set, heap, method (code) areas
    Rich instruction set
      Direct object manipulation, type conversion, exception throws
    Provides a runtime environment through JRE
  Other Examples: .NET CLI, Parrot (PERL 6)
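To make “stack-based architecture” concrete, here is a toy interpreter in which every instruction takes its operands from, and pushes its result to, an operand stack. The mnemonics `bipush`, `iadd`, `imul`, and `ireturn` are borrowed from JVM bytecode, but the tuple encoding and the `run` function are illustrative assumptions, not the JVM's class-file format.

```python
def run(bytecode):
    """Interpret a list of (mnemonic, *operands) tuples on an operand stack."""
    stack = []
    pc = 0  # virtual program counter
    while pc < len(bytecode):
        op, *args = bytecode[pc]
        pc += 1
        if op == "bipush":          # push an integer constant
            stack.append(args[0])
        elif op == "iadd":          # pop two ints, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "imul":          # pop two ints, push their product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "ireturn":       # return the value on top of the stack
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode: {op}")
    raise RuntimeError("program ended without ireturn")
```

The expression (2 + 3) * 4 becomes `bipush 2, bipush 3, iadd, bipush 4, imul, ireturn`; no instruction names a register.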
Featherweight Virtual Machine (FVM)
  Motivation
    “Trying out” untrusted programs in a realistic setting
    System inconsistencies due to
      New application installations
      Accidental deletion of critical system files through application uninstall or human error
      System damage due to viruses
    Hard to undo changes made to the system
  Requirements
    Isolation
    Easy manipulation: Create, Destroy, Suspend, Resume
    Persistence across reboots
      Processes are killed
      Other states need to be saved
    Flexibility: Interface to configure a machine’s visibility
FVM: ArchitectureFVM: Architecture
Virtualization at the OS LevelVirtualization at the OS Level Name-spaceName-space Renaming at the System call interfaceRenaming at the System call interface
Each VM starts with a similar environment as the Each VM starts with a similar environment as the host machinehost machine
VM statesVM states VM-ID, IP address, Processes VM-ID, IP address, Processes Logs for deleted registry-entries and filesLogs for deleted registry-entries and files Visibility optionsVisibility options
OperationsOperations Create/Delete, Suspend/Resume, Copy, Commit, Create/Delete, Suspend/Resume, Copy, Commit,
ConfigureConfigure
Implementation
  Registry
    Registry access prefixed with the FVM’s repository key and the VM-ID, along with COW
    Example: \HKCU\X -> \FVMRep\VM1\HKCU\X
  File System: Similar to registry
  Processes
    First process created through CreateVM()
    Child belongs to the same VM as its parent (services?)
  Services and SCM
  Objects
  Network
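The registry renaming with copy-on-write can be sketched with a flat key/value store. This is a minimal model under stated assumptions: the `FvmRegistry` class and its methods are hypothetical, only the `\FVMRep\<VM-ID>` prefix comes from the example above, and the deletion logs are left out.

```python
class FvmRegistry:
    REPO = "\\FVMRep"  # repository key under which per-VM copies live

    def __init__(self, host_entries):
        self.store = dict(host_entries)  # full key path -> value

    def _private(self, vm_id, key):
        # \HKCU\X seen from VM1 becomes \FVMRep\VM1\HKCU\X
        return f"{self.REPO}\\{vm_id}{key}"

    def read(self, vm_id, key):
        """Prefer the VM's private copy; otherwise fall through to the host entry."""
        private = self._private(vm_id, key)
        if private in self.store:
            return self.store[private]
        return self.store[key]

    def write(self, vm_id, key, value):
        """Copy-on-write: the first write creates a private copy; the host key is untouched."""
        self.store[self._private(vm_id, key)] = value
```

Until a VM writes to a key it shares the host's value, which is how each VM can start with an environment similar to the host machine without copying the whole registry.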
FVM: Applications
  Secure mobile code execution
  Automatic clean uninstall
  Memory Stick based mobile computing
Virtualizations Uncovered
  Display virtualization (Terminal Service)
  Network stack virtualization
  Grid computing
  And many more