november 2004 j. e. smith virtual machines: an architecture perspective

53
November 2004 J. E. Smith Virtual Machines: An Virtual Machines: An Architecture Perspective Architecture Perspective

Upload: randy-nuth

Post on 15-Dec-2015

214 views

Category:

Documents


1 download

TRANSCRIPT

November 2004

J. E. Smith

Virtual Machines: An Architecture Virtual Machines: An Architecture PerspectivePerspective

VMs (c) 2004, J. E. Smith 2

IntroductionIntroduction

Why are virtual machines interesting?

They involve computer architecture in a pure sense

They allow transcending of interfaces

(which often seem to be an obstacle to innovation)

They enable innovation in flexible, adaptive hardware, security, fault-tolerance, support for network computing (and others)

VMs (c) 2004, J. E. Smith 3

Performance Isn’t EverythingPerformance Isn’t Everything

The BIG ideas are all at least 20 years old and they have been very thoroughly explored

Focus research on other important areas• Power efficiency• Performance efficiency• Security• Ease of design• Software compatibility / interoperability

Virtual Machines can be important enablers for all the above

VMs (c) 2004, J. E. Smith 4

OutlineOutline

Virtualization The Family of Virtual Machines Process VMs and Code Caching High Level Language VMs Co-Designed VMs Research in Co-Designed VMs

VMs (c) 2004, J. E. Smith 5

AbstractionAbstraction

Computer systems are built on levels of abstraction

Instruction Set Architecture• Major division between hardware and software

I/O devicesand

Networking

Controllers

System Interconnect(bus)

Controllers

MemoryTranslation

Execution Hardware

DriversMemoryManager

Scheduler

Operating System

Libraries

ApplicationPrograms

MainMemory

1

2

33

4 5 6

7 78888

9

10 10

1111 12

13 14

Software

Hardware

Application Binary Interface• Observed by user processes• User ISA + OS calls

Higher level of abstraction hide details at lower levels

Example: files are an abstraction of a disk

filefile

abstraction

VMs (c) 2004, J. E. Smith 6

VirtualizationVirtualization

An isomorphism from guest to host• Map guest state to host state• Implement “equivalent” functions

Si S

Si' Sj'

Guest

Host

V(Si) V( S j )

e(Si)

e'(Si')

j

VMs (c) 2004, J. E. Smith 7

VirtualizationVirtualization

Similar to abstractionExcept•Details not necessarily hidden

Construct Virtual Disks•As files on a larger disk•Map state•Implement functions

Now do the same thing with the whole “machine”

file file

virtualization

VMs (c) 2004, J. E. Smith 8

The Family of Virtual MachinesThe Family of Virtual Machines Lots of things are called “virtual machines”

IBM VM/370

Java

VMware

Some things not called “virtual machines”, are virtual machines

IA-32 EL

Dynamo

Transmeta Crusoe

VMs (c) 2004, J. E. Smith 9

System Virtual MachinesSystem Virtual Machines

Provide a system environment

Constructed at ISA level

Persistent Examples: IBM

VM/360, VMware, Transmeta Crusoe

guestprocess

HOST PLATFORM

virtualnetwork communication

Guest OS

VMM

guestprocess

guestprocess

guestprocess

Guest OS2

VMM

guestprocess

guestprocess

VMs (c) 2004, J. E. Smith 10

System Virtual MachinesSystem Virtual Machines

Native VM System•VMM privileged mode•Guest OS user mode•Example: classic IBM VMs

User-mode Hosted VM•VMM runs as user application

Dual-mode Hosted VM•Parts of VMM privileged, parts non-privileged•Example VMware

Non-privilegedmodes

PrivilegedMode

Virtual

Machine

VMM

Hardware

Virtual

Machine

Host OS

Hardware

VMM

VirtualMachine

Host OS

Hardware

VMM

VMs (c) 2004, J. E. Smith 11

Process Virtual MachinesProcess Virtual Machines

Constructed at ABI level Runtime manages guest

process Guest processes may

intermingle with host processes

Not persistent As a practical matter,

guest and host OSes are often the same

Dynamic optimizers are a special case

Examples: IA-32 EL, FX!32, Dynamo

HOST OS

Disk

file sharing

network communication

guestprocess

create

hostprocess

guestprocess

runtimeruntime

guestprocess

runtime

hostprocess

VMs (c) 2004, J. E. Smith 12

The Virtual Machine SpaceThe Virtual Machine Space

Multiprogrammed

Systems

HLL VMsCo-Designed

VMs

same ISAdifferent

ISA

Process VMs System VMs

WholeSystem VMs

differentISA

same ISA

ClassicOS VMs

DynamicBinary

Optimizers

DynamicTranslators

HostedVMs

VMs (c) 2004, J. E. Smith 13

Architecture Issues: System VMsArchitecture Issues: System VMs

Why System VMs are of interest today• Security & Fault Tolerance (isolation)• Platform Consolidation• Application/Environment portability

“Efficiently Virtualizable” Instruction Sets• Goldberg and Popek (1974) should still be required reading

(An architecture paper with theorems and proofs!) Virtual Machine Assists

• Compensate for inefficiencies due to privilege level “compression”

• Fast emulation of system functions• Many developed for IBM mainframe VMs

VMs (c) 2004, J. E. Smith 14

System VirtualizationSystem Virtualization

Traps and interrupts (& sys calls)• Transfer to VMM• VMM determines appropriate Guest OS• VMM transfers to Guest OS

Guest performs privileged operation

• Trap to VMM• VMM reads/modifies guest state• May modify shadow state• Returns to Guest

Guest OS “return” to user app.• Transfer to VMM• VMM bounces return back to Guest app.

privileged operation

next instruction

check privileges

perform operation

return

system call/trap

vector location:

virtual vector location:

Application

Guest OS

VMM

VMs (c) 2004, J. E. Smith 15

Popek and Goldberg (in brief)Popek and Goldberg (in brief)

Control Sensitive instructions• All instructions that change hardware resource

allocation (or mapping)• Example: write TLB

Behavior Sensitive instructions• All instructions whose outcome depends on

hardware resource allocation• Example: read processor mode

Theorem (paraphrase)• Efficiently virtualizable if all sensitive instructions

trap in user mode

VMs (c) 2004, J. E. Smith 16

System VM ResearchSystem VM Research

Architecture Challenge: • Make IA-32 efficiently virtualizable

Virtual Machine Assists• Compensate for inefficiencies due to privilege level

“compression”• Fast emulation of system functions• Many developed for IBM mainframe VMs

Applications to Chip Multiprocessors• Technology changes often require innovation and

“re-invention”

VMs (c) 2004, J. E. Smith 17

The Virtual Machine SpaceThe Virtual Machine Space

Multiprogrammed

Systems

HLL VMsCo-Designed

VMs

same ISAdifferent

ISA

Process VMs System VMs

WholeSystem VMs

differentISA

same ISA

ClassicOS VMs

DynamicBinary

Optimizers

DynamicTranslators

HostedVMs

VMs (c) 2004, J. E. Smith 18

Architecture Issues: Process VMsArchitecture Issues: Process VMs

Generally to allow application migration• Or to run popular software on a less popular platform• Goal is generally to minimize performance loss

Same-ISA dynamic optimizers are special case• HP Dynamo

Architecture problems• Efficient code-caching• Indirect jump problem• Protecting runtime from guest process

VMs (c) 2004, J. E. Smith 19

Staged Emulation with Code CachingStaged Emulation with Code Caching

An important part of many VM implementations

Translate, optimize & cache frequent code sequences

Binary MemoryImage

Code CacheProfile Data

Interpreter

Translator/Optimizer

runtime

Start interpreting Profile to find “hot” code regions

VMs (c) 2004, J. E. Smith 20

SuperblocksSuperblocks

Based on “hot” paths One entry multiple exits May contain redundant blocks (tail duplication)

15

B D

C

G

A

EF

15

B D

C

G

A

EF

GG

VMs (c) 2004, J. E. Smith 21

Binary Translation ExampleBinary Translation Example

4FD0: addl %edx,(%eax) ;load and accumulate summovl (%eax),%edx ;store to memorysub %ebx,1 ;decrement loop countjz 51C8 ;branch if at loop end

4FDC: add %eax,4 ;increment %eaxjmp 4FD0 ;jump to loop top

51C8: movl (%ecx),%edx ;store last value of %edxxorl %edx,%edx ;clear %edxjmp 6200 ;jump elsewhere

x86 Binary9AC0: lwz r16,0(r4) ;load value from memory

add r7,r7,r16 ;accumulate sumstw 0(r5),r7 ;store to memorysubi. r5,r5,1 ;decrement loop count, set cr0bez cr0,pc+12 ;branch if loop exitbl F000 ;branch & link to EM4FDC ;save source PC in link register

9AE4: bl F000 ;branch & link to EM51C8 ;save source PC in link register

9C08: stw 0(r6),r7 ;store last value of %edxsubi r7,r7,r7 ;clear %edxbl F000 ;branch & link to EM6200 ;save source PC in link register

PowerPC Translation

VMs (c) 2004, J. E. Smith 22

Code CachesCode Caches

Contain• Basic blocks• Superblocks (one entrance, multiple exits)• Optimized Superblocks

A base technology for many VMs• Dynamic binary translators: Intel IA-32 EL, Compaq FX!32• Dynamic binary optimizers: Dynamo family• Co-designed virtual machines: Transmeta, IBM DAISY• High performance Java virtual machines• System VMs with “inefficiently virtualizable” ISAs• “Sandboxing” secure VMs (x86 DynamoRIO)

VMs (c) 2004, J. E. Smith 23

Indirect JumpsIndirect Jumps

Translated code cache PC (TPC) differs from Source binary PC (SPC)

• Need branch/jump target address translation• (Direct) branches are easier; target address is fixed

Chaining can be used

Superblock

Dispatch table

lookup code

Superblock

Superblock

Without chaining

Superblock

Dispatch table

lookup code

Superblock

Superblock

With chaining

Superblock

VMs (c) 2004, J. E. Smith 24

The Indirect Jump ProblemThe Indirect Jump Problem Target addresses (SPCs) can change

• SPC needs to be translated at run-time, not translation time Conventional solution: superblock construction-time

software prediction (aka inline caching)

If Rx == #addr_1 goto #target_1

Else if Rx == #addr_2 goto #target_2

Else dispatch_table_lookup(Rx); do it the slow way

• The biggest overhead in code caches– Compare-and-branch: 6 instructions

– Hash table lookup: 15 instructions in Dynamo x86

VMs (c) 2004, J. E. Smith 25

Protecting the RuntimeProtecting the Runtime

The runtime shares process memory space with application

• Must protect runtime from application

• Expensive memory protection changes on switches between runtime and code cache

• If guest registers are mapped to host memory

How are memory mapped registers protected?

Guest Code

Guest Data

RuntimeData

RuntimeCode

N

R/W

Code Cache

Ex

R/W

N

R/W

R/W Guest Code

Guest Data

RuntimeData

RuntimeCode

N

N

Code Cache

N

Ex

N

R/W

R

Runtime mode Emulation mode

VMs (c) 2004, J. E. Smith 26

Process VM ResearchProcess VM Research

Same-ISA dynamic binary optimizers are probably not a winning proposition

• Indirect jumps lead to performance losses on modern processors

(optimizers with patching are better)• Complete (intrinsic) compatibility is extremely difficult

May have to rely on extrinsic assurances

Topic of architecture research similar to Goldberg and Popek

For general process VMs some primitive support in ISA will be useful / necessary

• Indirect jumps (more later)• Code caching• Protection

VMs (c) 2004, J. E. Smith 27

Computer Architecture InnovationComputer Architecture Innovation

HLL VMs – software people invent ISA to solve SW problems

Co-Designed VMs – hardware people invent ISA to solve HW problems

These two are the most interesting VMs from an architecture perspective and provide the biggest opportunities.

VMs (c) 2004, J. E. Smith 28

The Virtual Machine SpaceThe Virtual Machine Space

Multiprogrammed

Systems

HLL VMsCo-Designed

VMs

same ISAdifferent

ISA

Process VMs System VMs

WholeSystem VMs

differentISA

same ISA

ClassicOS VMs

DynamicBinary

Optimizers

DynamicTranslators

HostedVMs

VMs (c) 2004, J. E. Smith 29

High Level Language Virtual MachinesHigh Level Language Virtual Machines

Raise the “ABI” level of abstraction• User higher level virtual ISA• OS abstracted as standard libraries

A form of process VM

HLL Program

Intermediate Code

Memory Image

Object Code(ISA)

Compiler front-end

Compiler back-end

Loader

HLL Program

Portable Code(Virtual ISA )

Host Instructions

Virt. Mem. Image

Compiler

VM loader

VM Interpreter/Translator

Traditional HLL VM

VMs (c) 2004, J. E. Smith 30

Architecture Issues: High Level VMsArchitecture Issues: High Level VMs

Examples:• Sun Java• Microsoft .NET Framework and MSIL

Why are HLL VMs important?

• Microsoft says so.• It’s a good idea.

Combines object oriented programming and network computing

VMs (c) 2004, J. E. Smith 31

HLL VMs: Architecture PerspectiveHLL VMs: Architecture Perspective

Here, architects were deprived (or let themselves be deprived) of some interesting architecture work

Don’t look at it bottom-up, i.e.• Take existing software for supporting HLL VMs, • Generate traces for standard ISAs, • Analyze traces• Conclude its “just like C”… problem solved!

Look top-down – start with features of MSIL and look for computer architecture opportunities

• Will require a mix of hardware and software innovation

(else just continue to ignore real architecture in favor of implementation)

VMs (c) 2004, J. E. Smith 32

HLL VM ResearchHLL VM Research

Metadata – an interesting concept• Data Set Architecture• Don’t have to discover data structures

– compare with C programs.

Metadata

Code

Machine IndependentProgram File

Loader

Virtual MachineImplementation

Interpreter

Internal DataStructures

Translator Native Code

VMs (c) 2004, J. E. Smith 33

HLL VM ResearchHLL VM Research

Precise trap model• Problems in conventional processors:

All state preciseMany instructions can trap

Enable/disable “remote” and at any time • HLL VMs

Not all state must be precisePC not needed

operand stack neverlocal variables only if trap is handled locally

Trap enable explicit and locally specified

VMs (c) 2004, J. E. Smith 34

HLL VM ResearchHLL VM Research

Stack tracking• At any given point, operand stack must have

same number of elements and types regardless of control flow path

• This property could simplify exploitation of control independence

VMs (c) 2004, J. E. Smith 35

HLL VMs SummaryHLL VMs Summary

Claim: Slow-downs due to OO programming, probably not dynamic compilation

– and not stack-based ISA Research opportunities abound

• For VM implementation• For speeding up OO programs (look beyond C/C++)• Use co-designed HW/SW

Base design on MSIL/Java and implement conventional ISA as the uncommon case

VMs (c) 2004, J. E. Smith 36

The Virtual Machine SpaceThe Virtual Machine Space

Multiprogrammed

Systems

HLL VMsCo-Designed

VMs

same ISAdifferent

ISA

Process VMs System VMs

WholeSystem VMs

differentISA

same ISA

ClassicOS VMs

DynamicBinary

Optimizers

DynamicTranslators

HostedVMs

VMs (c) 2004, J. E. Smith 37

Co-Designed Virtual MachinesCo-Designed Virtual Machines

Separate the hardware/software interface from the ISA level of abstraction Restore the ISA to its “natural” place

as an Implementation ISA that reflects actual hardware Support existing ISAs

as a Virtual ISA Let processor designers use both hardware and software A form of system VM

OS

libs.

User Applications

V-ISA

I-ISA

Hardware

Software

Hardware

OS

libs.

User Applications

ISA

VMs (c) 2004, J. E. Smith 38

Co-Designed VMsCo-Designed VMs

Should be of interest to both architects and micro-architects

• Offers opportunities for performance, power saving, fault tolerance and other implementation-dependent features

• Allows transcending conventional ISAs• Don’t confuse them with VLIW!

VMs (c) 2004, J. E. Smith 39

Architecture Issues: Concealed MemoryArchitecture Issues: Concealed Memory

VM software resides in memory concealed from all conventional software

Source ISA Data

CodeCache

VM Code

ICacheHierarchy

DCacheHierarchy

ProcessorCore

Source ISA Code

VM Data

concealed memory

conventionalmemory

VMs (c) 2004, J. E. Smith 40

Another Way of Doing ThingsAnother Way of Doing Things

conventional

dynamic translation

Code CacheProcessor

Pipeline

Software

Translator

Main Memory

Func.Unit

Func.Unit

. ..

Main MemoryCache

HierarchyProcessor

Pipeline

TranslationUnit

(form uops)

Func.Unit

Func.Unit

Func.Unit

. ..

TranslationUnit

(form uops)

CacheHierarchy

VMs (c) 2004, J. E. Smith 41

Jump Target-address Lookup TableJump Target-address Lookup Table A hardware cache of dispatch table entries Similar to software-managed TLB in virtual memory

Jump insn TPC

BTB

Predicted next fetch TPC

Tag TPC

Jump insn

Register identifier

SPC

Register file

Jump Target SPC

SPC TPC

JTLT

Jump Target TPC

Hit?Match?Yes

BTB prediction correct

Yes

NoBTB misprediction:

Redirect fetch to jump target TPC from JTLT

NoJTLT miss:

Redirect fetch to the dispatch code

VMs (c) 2004, J. E. Smith 42

SPC TPC

Push-dual-address-RAS insn

Dual-address RASDual-address RAS

Problem: function call instruction saves return SPC not TPC • Conventional software-based chaining cannot utilize a RAS

Solution: save both SPC and TPC

Dual-address RAS

SPC TPC

SPC TPC

JTLT

VMs (c) 2004, J. E. Smith 43

IPC performanceIPC performance

“Translate” Alpha to Alpha; start with highly optimized code Conventional method (ala Dynamo) results in 14% IPC loss Dual-address RAS provides the most benefit Using both JTLT & RAS, 7.7% IPC improvement

• Due to superblock re-layout

0.8

1

1.2

1.4

1.6

1.8

2

2.2

2.4

164.gzip 175.vpr 176.gcc 181.mcf 186.crafty 197.parser 252.eon 253.perlbmk 254.gap 255.vortex 256.bzip2 300.twolf H.mean

IPC

original sw_pred.sw_pred sw_pred.sw_pred (private dispatch) sw_pred.ras jtlt.ras

VMs (c) 2004, J. E. Smith 44

Wide pipelines are at odds with fast pipelines•Fast pipeline => low complexity per stage•More instructions per stage => high complexity per stage

Process larger atomic units in pipeline stages Narrower “effective” width

Reduce decoding stages•Do more in software

Pipeline the issue stage

Research: Efficient MicroarchitecturesResearch: Efficient Microarchitectures

VMs (c) 2004, J. E. Smith 45

Fused Instruction SetFused Instruction Set

Co-designed VM x86 implementation• Shorten and simplify pipeline front-end

Combine pairs of dependent instructions• For single “unit” for pipeline processing

Use VM software to• “Crack” x86 instructions into RISC-ops• Re-order RISC-ops• Reassemble into (new) fused pairs

Related: Pentium-M fuses in front-end• Using original x86 instructions

VMs (c) 2004, J. E. Smith 46

Conventional Issue LogicConventional Issue Logic Select and issue instructions free of data

dependences Based on the selection, clear dependences

•And “wake-up” newly independent instructions Single cycle select-wakeup important for good

performance

OP R1 Imm.R2

OP R6 R7R1

Issue Buffer

selectfanout/

wakeup

VMs (c) 2004, J. E. Smith 47

Fuse dependent instructions into single slot Fused instructions traverse entire pipeline Make single issue decision for the pair

Pipelined Issue LogicPipelined Issue Logic

O P 2 R 1 Im m .R 2O P 1 R 6

Is s u e B u ffe r

s e le c tfa n o u t/w a k e u p

VMs (c) 2004, J. E. Smith 48

Instruction SetInstruction Set

call 0x080af30e (21bit disp)jcc 0x080115a0jmp 0x080C0988

LIMM.lo Redx, LO(0x0810a7de)LIMM.hi Redx, HI(0x0810a7de)CMP.cc Reax, 0x4000

LD Reax, mem[Resp + F8]ST Reax, mem[Rebp + 4C]ADD Reax, Rebx, 4c

ADD Reax, Redx, RebxFmac Facc, Fmp1, Fmp2LD Reax, mem[Rebx + Rebp]

mov esp, ebp MOV Resp, Rebpmov eax,[esp] LD Reax, mem[Resp]add eax, edx ADD Reax, Redx

sub ecx, 4 SUB Recx, 4shr esi, 2 SHR Resi, 2inc ecx INC Recx, 1

jcc 3e e.g. jnz 3e

21-bit Immediate/Displacement10b opcode

11b Immd/Disp10b opcode 5b Rds5b Rsr

16-bit opcode 5b Rds5b Rsr5b Rsr

4b Rd4b Rs 7b op

4b Rd4b I 7b op

8b Immd/Disp 7b op

F

16-bit immediate / Disp10b opcode 5b Rds

F

F

F

F

F

F

VMs (c) 2004, J. E. Smith 49

Translation AlgorithmTranslation Algorithm

Two Pass Algorithm:

1. Form superblocks using Dynamo MRET method

2. Crack x86 instructions into RISC-like micro-ops

3. Attempt to fuse ALU ops only

4. Fuse LD/ST instructions as tails and ALU ops as heads

VMs (c) 2004, J. E. Smith 50

Fusing ProfileFusing Profile

About 50% of operations are fused Only 5-10% of non-fused are single-cycle ALU ops

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

164.

gzip

175.

vpr

176.

gcc

181.

mcf

186.

craf

ty

197.

parse

r

252.

eon

253.

perlb

mk

254.

gap

255.

vorte

x

256.

bzip2

300.

twolf

Avera

ge

Per

cen

tag

e o

f D

ynam

ic I

nst

ruct

ion

s

ALU

FP or NOPs

BR

ST

LD

Fused

VMs (c) 2004, J. E. Smith 51

Distance Between Fused OperationsDistance Between Fused Operations Most fused operations close together

• 70% of fused ops from different x86 instructions• 60% contain two ALU operations

0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

164.gzip

175.vpr

176.gcc

181.mcf

186.crafty

197.parser

252.eon

253.perlbmk

254.gap

255.vortex

256.bzip2

300.twolf

Percentage of fused macro-ops

1 2 3 4 5 6 7

VMs (c) 2004, J. E. Smith 52

Performance (Normalized IPC)Performance (Normalized IPC)

Baseline: generic superscalar Macro-op: Fused macro-ops with pipelined issue logic Baseline Pipelined: superscalar with pipelined issue logic

0.6

0.7

0.8

0.9

1

1.1

1.2

1.3

16 24 32 40 48 56 64Issue Window Size

Rel

ativ

e IP

C p

erfo

rman

ce

4-wide Macro-op 4-wide Baseline 4-wide Baseline Pipelined

2-wide Macro-op

VMs (c) 2004, J. E. Smith 53

VM ResearchVM Research

Architecture Support for VMs• Enable spectrum of VMs (process, system, HLL, co-designed)• Support for dynamic translation and optimization• Primitives: code caches & indirect jumps; concealed memory• Pays for itself – helps get rid of obsolete ISA baggage

VM applications• Security• Fault Tolerance

Co-Designed VMs• Efficient microarchitecture• Adaptive microarchitecture

For power efficiencyFor performance

New ISAs• Application-area specific ISAs• Support for Java/MSIL• “Convergence” architectures

Computer Architects can do Computer Architecture!