The Virtual Block Interface: A Flexible Alternative to the Conventional Virtual Memory Framework Nastaran Hajinazar Pratyush Patel Minesh Patel Konstantinos Kanellopoulos Saugata Ghose Rachata Ausavarungnirun Geraldo F. Oliveira Jonathan Appavoo Vivek Seshadri Onur Mutlu

Post on 27-Dec-2021

The Virtual Block Interface: A Flexible Alternative to the Conventional Virtual Memory Framework

Nastaran Hajinazar, Pratyush Patel, Minesh Patel, Konstantinos Kanellopoulos, Saugata Ghose, Rachata Ausavarungnirun, Geraldo F. Oliveira, Jonathan Appavoo, Vivek Seshadri, Onur Mutlu

Executive Summary

• Motivation: Modern computing systems continue to diversify with respect to system architecture, memory technologies, and applications’ memory needs

• Problem: Continually adapting the conventional virtual memory framework to each possible system configuration is challenging
  - Results in performance loss or requires non-trivial workarounds

• Goal: Design an alternative virtual memory framework that
  (1) Efficiently supports a wide variety of new system configurations
  (2) Provides the key features and eliminates the key inefficiencies of the conventional virtual memory framework

• Virtual Block Interface (VBI): Delegates memory management to dedicated hardware in the memory controller
  - Efficiently adapts to diverse system configurations
  - Reduces overheads and complexities associated with conventional virtual memory
  - Enables many optimizations (e.g., low-overhead page walks in virtual machines, virtual caches)

• Evaluation: Two example use cases
  1. VBI significantly improves performance for both native execution (2.4x) and virtual machine environments (4.3x)
  2. VBI significantly improves heterogeneous memory architecture effectiveness

Outline

• Motivation
• VBI: Virtual Block Interface
  - Key Idea & Guiding Principles
  - Design Overview
  - Optimizations Enabled by VBI
• Methodology
• Results
• Summary

Virtual Memory

[Figure: virtual memory, managed by the operating system, sits between the application and the hardware]

Computing Systems Are Diversifying

[Figure: virtual memory, managed by the operating system, sits between diversifying applications and diversifying hardware; annotation: cannot adapt efficiently]

Continually adapting the conventional virtual memory framework is challenging

Conventional Virtual Memory Framework

[Figure: processes P1, P2, ..., Pn each have their own virtual address space (VAS 1, VAS 2, ..., VAS n), mapped to physical memory through page tables managed by the OS]

• Each process is mapped to a fixed-size virtual address space
  - e.g., 256 TB in Intel x86-64
• One-to-one mapping between processes and virtual address spaces, managed by the OS
• Per-process page tables map each VAS to physical memory
  - Managed by the OS
  - Read by hardware
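As a rough illustration of the fixed-size address space and the rigid page table layout mentioned above, the sketch below splits a 48-bit x86-64 virtual address into four 9-bit table indices and a 12-bit page offset. The constants reflect standard 4-level x86-64 paging with 4 KB pages; the function and variable names are illustrative only and do not come from the slides.

```python
# Sketch: splitting a 48-bit x86-64 virtual address for a 4-level page walk.
# Assumes 4 KB pages and 512-entry tables (standard 4-level paging).
PAGE_OFFSET_BITS = 12   # 4 KB pages
INDEX_BITS = 9          # 512 entries per page table
LEVELS = 4              # number of page table levels

# Size of the virtual address space implied by this layout: 2**48 bytes.
VAS_BYTES = 1 << (PAGE_OFFSET_BITS + LEVELS * INDEX_BITS)


def split_virtual_address(vaddr: int) -> tuple[list[int], int]:
    """Return ([top-level index, ..., last-level index], page offset)."""
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    indices = []
    # Walk from the top-level table down to the last-level table.
    for level in range(LEVELS - 1, -1, -1):
        shift = PAGE_OFFSET_BITS + level * INDEX_BITS
        indices.append((vaddr >> shift) & ((1 << INDEX_BITS) - 1))
    return indices, offset


if __name__ == "__main__":
    print(VAS_BYTES // 2**40, "TB")
    print(split_virtual_address(0x7FFF_FFFF_F123))
```

Every process gets the same four-level split regardless of how little memory it actually uses, which is the rigidity the following slides discuss.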

Challenges

• Three examples of the challenges in adapting conventional virtual memory frameworks for increasingly diverse systems:
  - Requiring a rigid page table structure
  - High address translation overhead in virtual machines
  - Inefficient heterogeneous memory management

Challenge 1: Rigid Page Table Structures

[Figure: the page tables between the per-process virtual address spaces and physical memory are accessed by both the OS and the hardware]

• Flexibly customized page tables can reduce the address translation overhead
  - Customized to the application’s memory behavior
    • e.g., larger granularities for more densely allocated memory regions

• Con of sharing the page tables between the OS and the hardware:
  - Requires a rigid page table structure
    • e.g., fixed-granularity 4-level page table in Intel x86

Challenge 2: Overheads in Virtual Machines

[Figure: a process running directly on the host OS has its virtual address space mapped to physical memory by the host page tables. A process running on a guest OS, above the virtualization layer, lives in a guest virtual address space (g VAS): guest page tables map guest virtual to host virtual, and host page tables then map host virtual to host physical]

• In virtual machines, processes go through an extra level of indirection

• Con:
  - 2D page table walks
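To give a feel for why the two-stage translation above is expensive, the sketch below counts worst-case memory references for a two-dimensional page walk under hardware nested paging with no TLB or walk-cache hits. The formula is a general property of nested paging rather than a number taken from these slides, and the function name is illustrative.

```python
# Sketch: worst-case memory references for a 2D (nested) page walk.
# Assumption: every guest page table pointer, and the final guest address,
# must itself be translated by a full walk of the host page tables.

def nested_walk_references(guest_levels: int, host_levels: int) -> int:
    """Memory references needed when nothing hits in any TLB or walk cache."""
    # Per guest level: one host walk to locate the guest table, plus one
    # read of the guest table entry. Finally, one more host walk to
    # translate the resulting data address.
    per_guest_level = host_levels + 1
    return guest_levels * per_guest_level + host_levels


if __name__ == "__main__":
    print("native 4-level walk:", 4)
    print("nested 4x4 walk:", nested_walk_references(4, 4))
```

With four levels on each side, the count grows from 4 references natively to 24 in the virtualized case, which is the overhead that 2D page walk caches and VBI both try to reduce.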

Challenge 3: Managing Heterogeneous Memory

[Figure: process P1's virtual address space (VAS 1) is mapped by page tables managed by the OS onto a fast memory and a slow memory]

• Enhancing performance with heterogeneous memories requires:
  - Data mapping
  - Data migration

• Con:
  - OS has low visibility into runtime memory behavior
    • Timely reaction to the changes is challenging

Prior Works

• Optimizations that alleviate the overheads of the conventional virtual memory framework

Shortcomings:
• Based on specific system or workload characteristics
  - Are applicable to only limited problems or applications
• Require specialized and not necessarily compatible changes to both the OS and hardware
  - Implementing all in a system is a daunting prospect

We need a holistic solution that efficiently supports increasingly diverse system configurations

Our Goal

Design an alternative virtual memory framework that
• Efficiently and flexibly supports increasingly diverse system configurations
• Provides the key features of the conventional virtual memory framework while eliminating its key inefficiencies

Virtual Block Interface (VBI)

VBI is an alternative virtual memory framework

Key idea: Delegate physical memory management to dedicated hardware in the memory controller

VBI: Guiding Principles

• Size virtual address spaces appropriately for processes
  - Mitigates translation overheads of unnecessarily large address spaces
• Decouple address translation from access protection
  - Defers address translation until necessary to access memory
  - Enables the flexibility of managing translation and protection using separate structures
• Communicate data semantics to the hardware
  - Enables intelligent resource management

Addresses the rigidness and lack of information in current frameworks, to reduce large overheads

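VBI: Overview

[Figure, left: Conventional Virtual Memory. Processes P1, P2, ..., Pn each have their own virtual address space (VAS 1, VAS 2, ..., VAS n), mapped to physical memory through page tables managed by the OS]

[Figure, right: VBI. Processes P1, P2, ..., Pn attach to virtual blocks (VB 1, VB 2, VB 3, VB 4, ...) in a single VBI address space, mapped to physical memory by the Memory Translation Layer in the memory controller]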

Virtual Blocks

[Figure: processes P1, P2, ..., Pn attach to virtual blocks (VB 1, VB 2, VB 3, VB 4, ...) in the VBI address space, which the Memory Translation Layer in the memory controller maps to physical memory]

• Globally-visible VBI address space
  - Consists of a set of virtual blocks (VBs) of different sizes
    • Example size classes: 4 KB, 128 KB, 4 MB, 128 MB, 4 GB, 128 GB, 4 TB, 128 TB
• All VBs are visible to all processes
• Processes map each semantically meaningful unit of information to a separate VB
  - e.g., a data structure, a shared library
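The example size classes listed above grow by a factor of 32 from one class to the next. The sketch below generates them and picks the smallest class that fits a requested size. The selection policy and the names SIZE_CLASSES and smallest_size_class are assumptions made for illustration; the slides do not say how a size class is chosen.

```python
# Sketch: the eight example VB size classes and a smallest-fit lookup.
# The lookup policy is a hypothetical illustration, not VBI's documented rule.
KB = 1 << 10

# 4 KB, 128 KB, 4 MB, 128 MB, 4 GB, 128 GB, 4 TB, 128 TB (each 32x the last).
SIZE_CLASSES = [4 * KB * 32**i for i in range(8)]


def smallest_size_class(nbytes: int) -> int:
    """Return the index of the smallest size class that can hold nbytes."""
    if nbytes <= 0:
        raise ValueError("requested size must be positive")
    for index, size in enumerate(SIZE_CLASSES):
        if nbytes <= size:
            return index
    raise ValueError("request exceeds the largest size class")


if __name__ == "__main__":
    for request in (100, 4096, 4097, 5 * 2**20):
        cls = smallest_size_class(request)
        print(request, "bytes -> class", cls, "of", SIZE_CLASSES[cls], "bytes")
```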

Inherently Virtual Caches

• VBI address space provides system-wide unique VBI addresses
• VBI addresses are directly used to access on-chip caches
  - No longer require address translation
• Pros:
  - Enables inherently virtual caches
    • No synonyms and homonyms

Hardware-Managed Memory

• Memory management is delegated to the Memory Translation Layer (MTL) in the memory controller
  - Address translation
  - Physical memory allocation
• Pros: Many benefits, including
  - Physical memory is allocated only when the location needs to be written to memory
  - No need for 2D page walks in virtual machines
  - Enabling flexible translation structures

OS-Managed Access Permissions

• OS controls which processes access which VBs
• Each process has its own permissions (read/write/execute) when attaching to a VB
• OS maintains a list of VBs attached to each process
  - Stored in a per-process table
  - Used during permission checks
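A minimal sketch of the per-process table described above, assuming it maps each attached VB to the permissions granted at attach time. The class and method names (Perm, ProcessTable, attach, check) are hypothetical and only illustrate that the permission check needs no address translation.

```python
# Sketch: a per-process table of attached VBs and their permissions.
# All names here are illustrative assumptions, not the paper's interface.
from dataclasses import dataclass, field
from enum import Flag, auto


class Perm(Flag):
    READ = auto()
    WRITE = auto()
    EXECUTE = auto()


@dataclass
class ProcessTable:
    # VB ID -> permissions this process was granted when it attached.
    attached: dict[int, Perm] = field(default_factory=dict)

    def attach(self, vb_id: int, perms: Perm) -> None:
        self.attached[vb_id] = perms

    def detach(self, vb_id: int) -> None:
        self.attached.pop(vb_id, None)

    def check(self, vb_id: int, needed: Perm) -> bool:
        """True only if the VB is attached with all the needed permissions."""
        granted = self.attached.get(vb_id)
        return granted is not None and (needed & granted) == needed


if __name__ == "__main__":
    p1, p2 = ProcessTable(), ProcessTable()
    p1.attach(7, Perm.READ | Perm.WRITE)   # P1 may read and write VB 7
    p2.attach(7, Perm.READ)                # P2 shares VB 7 read-only
    print(p1.check(7, Perm.WRITE), p2.check(7, Perm.WRITE))
```

Two processes can attach to the same VB with different permissions, since the permissions live in each process's own table rather than in a shared translation structure.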

Process Address Space in VBI

[Figure: in the VBI address space, the VBs to which process P1 is attached are marked as the address space of process P1]

• Any process can attach to any VB
• A process' VBs define its address space
  - Address space size is determined by the actual needs of the process

First guiding principle: Appropriately-sized virtual address spaces

Decoupled Protection and Translation

• Conventional virtual memory
  - Access permissions managed by OS
  - Address mapping managed by OS
• VBI
  - Access permissions managed by OS
  - Address mapping managed by the MTL

Second guiding principle: Decoupling address translation from access protection

Address Translation Structures in VBI

• Translation structures are not shared with the OS
  - Separate structures for translation and permission information
  - Allows flexible translation structures
  - Per-VB translation structure tuned to the VB’s characteristics, e.g., single-level tables for small VBs
• Pros:
  - Lowers overheads and allows for customization

VB Information

[Figure: each VB in the VBI address space carries the fields Enable, Reference Counter, Properties, and Size]

• Each VB is associated with
  - System-wide unique ID
  - Size: i.e., which size class
  - Enable bit
  - Reference counter: number of processes attached to the VB
  - Properties bit vector: semantic information about VB contents, e.g., access pattern, latency sensitive vs. bandwidth sensitive

Third guiding principle: Communicating data semantics to the hardware
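The per-VB fields listed above can be pictured as a small record. The sketch below is one hypothetical encoding: the field names, the property bit assignments, and the enable/attach/detach helpers are assumptions for illustration and are not taken from the reference implementation.

```python
# Sketch: a record holding the per-VB information listed on the slide.
# Bit assignments and helper semantics are hypothetical.
from dataclasses import dataclass

# Hypothetical property bits communicating data semantics to the hardware.
PROP_LATENCY_SENSITIVE = 1 << 0
PROP_BANDWIDTH_SENSITIVE = 1 << 1
PROP_SEQUENTIAL_ACCESS = 1 << 2


@dataclass
class VBInfo:
    vb_id: int             # system-wide unique ID
    size_class: int        # index of the VB's size class
    enabled: bool = False  # enable bit
    ref_count: int = 0     # number of processes attached to the VB
    properties: int = 0    # properties bit vector


def enable(vb: VBInfo, properties: int) -> None:
    """Mark a VB as in use and record its semantic properties."""
    if vb.enabled:
        raise RuntimeError("VB is already enabled")
    vb.enabled = True
    vb.properties = properties


def attach(vb: VBInfo) -> None:
    """Record that one more process is attached to the VB."""
    if not vb.enabled:
        raise RuntimeError("cannot attach to a disabled VB")
    vb.ref_count += 1


def detach(vb: VBInfo) -> bool:
    """Drop one attachment; return True when no process remains attached."""
    if vb.ref_count == 0:
        raise RuntimeError("VB has no attached processes")
    vb.ref_count -= 1
    return vb.ref_count == 0
```

A memory controller could consult the properties field when deciding, for example, which kind of memory should back a VB; that policy is outside this sketch.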

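Implementing VBI

• Please refer to our paper
  - Detailed reference implementation and microarchitecture

[Figure: reference implementation overview. Application: the code fragment "index = request_vb(...); x = malloc(index, size); ... y = (*x);" issues a virtual address made of {index, offset}. CPU: the CVT (Client–VB Table) and a CVT cache, a VBI address made of {VBUID, offset}, and the L1, L2, and last-level cache (LLC), with misses passed down the hierarchy. Memory controller: the Memory Translation Layer (MTL), with the VIT (VB Info Table) cache, a TLB (hit/miss), and a translation walker producing the physical address. Physical memory: VITs, CVTs, translation structures, and data. Operations shown: enable_vb, attach]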

Optimizations Naturally Enabled by VBI

• Many optimizations not easily attainable before
• Examples:
  - Appropriately sized process address space (achieved through guiding principles)
  - Flexible address translation structures (achieved through guiding principles)
  - Communicating data semantics to the hardware (achieved through guiding principles)
  - Inherently virtual caches (covered next)
  - Delayed physical memory allocation (covered next)
  - Eliminating 2D page walks in virtual machines (covered next)
  - Early memory reservation mechanism (covered in our paper)

Example Optimizations

• Inherently virtual caches
• Delayed physical memory allocation
• Eliminating 2D page walks in virtual machines

Inherently Virtual Caches

[Figure: three cache organizations, each between a core and physical memory.
In conventional virtual memory, virtually-indexed physically-tagged (VIPT) cache: the core sends the virtual address to the cache and the TLB; a TLB hit yields the physical address and a TLB miss triggers a page walk.
In conventional virtual memory, virtually-indexed virtually-tagged (VIVT) cache: the core accesses the cache with the virtual address after a permission check; on a cache miss the TLB yields the physical address, with a page walk on a TLB miss. Suffers from synonyms and homonyms.
In VBI, VBI cache: the core accesses the cache with a system-wide unique VBI address after a permission check; on a cache miss the MTL yields the physical address. No synonyms and homonyms.]

VBI reduces address translation overhead by enabling benefits akin to VIVT caches

Delayed Physical Memory Allocation

[Figure, left: in conventional virtual memory with a virtually-indexed physically-tagged (VIPT) cache, the core's virtual address goes through the TLB (hit: physical address; miss: page walk), and a cache miss fetches the actual cache line from physical memory]

[Figure, right: in VBI, a cache miss to a VBI address with no memory allocated is answered by the MTL with a zeroed cache line; the MTL allocates memory on a writeback from the cache]

In VBI:
• No address translation for accesses to regions with no allocation
• No memory accesses to regions with no allocation yet
• No memory allocation for VBs that never leave the cache during their lifetime

VBI reduces address translation overhead, improves overall performance, and reduces memory consumption
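The behavior described above can be mimicked with a toy model: a read miss to a location that has never been written back returns a zeroed cache line without allocating anything, and backing storage is created only on a writeback. This is a sketch under simplifying assumptions (allocation tracked per 64-byte line, a hypothetical ToyMTL class); the real MTL's allocation granularity and structures are described in the paper, not here.

```python
# Sketch: delayed physical memory allocation in a toy Memory Translation Layer.
# Assumptions: 64-byte cache lines; allocation tracked per line for simplicity.
LINE_BYTES = 64


class ToyMTL:
    def __init__(self) -> None:
        # VBI line address -> line contents; stands in for allocated memory.
        self.backing: dict[int, bytes] = {}

    def read_line(self, vbi_line_addr: int) -> bytes:
        """Serve a cache miss. Unallocated locations read as zeros."""
        return self.backing.get(vbi_line_addr, bytes(LINE_BYTES))

    def writeback(self, vbi_line_addr: int, data: bytes) -> None:
        """Handle a dirty eviction; this is the only point that allocates."""
        if len(data) != LINE_BYTES:
            raise ValueError("a writeback must carry one full cache line")
        self.backing[vbi_line_addr] = data

    @property
    def allocated_lines(self) -> int:
        return len(self.backing)


if __name__ == "__main__":
    mtl = ToyMTL()
    mtl.read_line(0x40)                       # miss to untouched memory
    print("after read:", mtl.allocated_lines)
    mtl.writeback(0x40, b"\x01" * LINE_BYTES)  # dirty line leaves the cache
    print("after writeback:", mtl.allocated_lines)
```

Data that is created, used, and discarded entirely within the caches never triggers a writeback and therefore never consumes backing storage in this model.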

Eliminating 2D Page Walks in Virtual Machines

[Figure: conventional virtual memory. A process running on a virtual machine (VM) is translated twice: guest page tables, managed by the guest OS, map guest virtual to host virtual, and host page tables, managed by the host OS below the virtualization layer, map host virtual to host physical]

[Figure: VBI. Processes on the guest OS and the host OS attach to VBs (VB 1, VB 2, VB 3) in the same VBI address space, which the Memory Translation Layer in the memory controller maps to physical memory]

• Guest OS and host OS interact once to attach Process 1 to its VBs
• MTL is the only component in the system that manages address mapping
• MTL performs address translation and memory allocation

By eliminating 2D page walks, VBI reduces address translation overhead in virtualized environments

Methodology

• Simulator: heavily-modified version of Ramulator
  - Models virtual memory components (e.g., TLBs, page tables)
  - Available at https://github.com/CMU-SAFARI/Ramulator-VBI

• Workloads: SPECspeed 2017, SPEC CPU 2006, TailBench, Graph500

• System parameters:
  - Core: 4-wide issue, OOO, 128-entry ROB
  - L1 Cache: 32 KB, 8-way associative, 4 cycles
  - L2 Cache: 256 KB, 8-way associative, 8 cycles
  - L3 Cache: 8 MB (2 MB per-core), 16-way associative, 31 cycles
  - L1 DTLB:
    • 4 KB pages: 64-entry, fully associative
    • 2 MB pages: 32-entry, fully associative
  - L2 DTLB: 4 KB and 2 MB pages: 512-entry, 4-way associative
  - Page Walk Cache: 32-entry, fully associative
  - DRAM: DDR3-1600, 1 channel, 1 rank/channel, 8 banks/rank
  - PCM: PCM-800, 1 channel, 1 rank/channel, 8 banks/rank

Use Case 1: Address Translation

• The impact of VBI on reducing the address translation overhead in both native execution and virtual machines

• Evaluated systems:
  - Three baselines:
    • Native: applications run natively on an x86-64 system
    • Virtual: applications run inside a virtual machine (accelerated using 2D page walk cache [Bhargava+, ASPLOS’08])
    • Perfect TLB: an unrealistic version of Native with no translation overhead
  - One VBI configuration:
    • VBI-Full: VBI with all the optimizations that it enables

• See our paper for results on more system configurations

Use Case 1: Address Translation

[Figure: bar chart of speedup normalized to Native (y-axis from 0.0 to 2.5; two off-scale bars are labeled 13.3 and 8.9) for three systems: Virtual, Perfect TLB, and VBI-Full. Benchmarks: astar, bzip2, GemsFDTD, mcf, milc, namd, sjeng, bwaves-17, deepsjeng-17, lbm-17, omnetpp-17, img-dnn, moses, Graph500, and AVG. Callouts shown across the slide builds: 0.7x, 1.9x, 2.4x, 4.3x, 49%]

VBI significantly improves performance in both native execution and virtual machines

Use Case 2: Memory Heterogeneity

• The benefits of VBI in harnessing the full potential of heterogeneous memory architectures
  - Hybrid PCM–DRAM memory architecture

• Evaluated systems:
  - Two baselines:
    • Hotness-Unaware PCM–DRAM: unaware of the data hotness
    • IDEAL: always maps frequently-accessed data to DRAM
  - One VBI configuration:
    • VBI PCM–DRAM: VBI maps and migrates frequently-accessed VBs to the DRAM

More in our paper:
• Similar performance improvement for Tiered-Latency-DRAM [Lee+, HPCA’13]
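One way to picture hotness-aware mapping at VB granularity is a greedy placement: rank VBs by how often they are accessed and keep the hottest ones in DRAM until its capacity is used up. The policy below is purely an illustrative assumption, with a hypothetical function name and inputs; the slides state only that VBI maps and migrates frequently-accessed VBs to DRAM, not how.

```python
# Sketch: greedy, hotness-aware placement of VBs into a limited DRAM.
# The ranking and tie-breaking rules are assumptions for illustration.

def place_hot_vbs(
    access_counts: dict[int, int],
    vb_sizes: dict[int, int],
    dram_capacity: int,
) -> set[int]:
    """Return the IDs of VBs kept in DRAM; all other VBs stay in PCM."""
    in_dram: set[int] = set()
    used = 0
    # Visit VBs from most to least frequently accessed.
    ranked = sorted(access_counts, key=lambda vb: access_counts[vb], reverse=True)
    for vb_id in ranked:
        size = vb_sizes[vb_id]
        if used + size <= dram_capacity:
            in_dram.add(vb_id)
            used += size
    return in_dram


if __name__ == "__main__":
    counts = {1: 100, 2: 5, 3: 60}
    sizes = {1: 4096, 2: 4096, 3: 128 * 1024}
    print(place_hot_vbs(counts, sizes, dram_capacity=8192))
```

Re-running the placement periodically with fresh access counts would approximate migration as hotness changes.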

Use Case 2: Memory Heterogeneity

[Figure: bar chart of speedup normalized to Hotness-Unaware PCM–DRAM (y-axis from 0.0 to 2.5) for two systems: VBI PCM–DRAM and IDEAL. Benchmarks: astar, bzip2, GemsFDTD, hmmer, mcf, milc, soplex, sphinx3, bwaves-17, lbm-17, omnetpp-17, xalancbmk-17, img-dnn, moses, Graph500, and AVG. Callout: 33%]

VBI enables efficient data mapping and data migration for heterogeneous memory systems

Summary

• Virtual Block Interface (VBI): A new virtual memory framework
  - Addresses the challenges in adapting conventional virtual memory to increasingly diverse system configurations and workloads
• Key Idea: Delegate physical memory management to dedicated hardware in the memory controller
• Benefits: Not easily attainable in conventional virtual memory (e.g., inherently virtual caches, delaying physical memory allocation, and avoiding 2D page walks in virtual machines)
• Evaluation:
  - VBI significantly improves performance in both native execution and virtual machines
  - Increases the effectiveness of managing heterogeneous memory architectures
• Conclusion: VBI is a promising new virtual memory framework
  - Can enable several important optimizations
  - Increases design flexibility for virtual memory
  - A new direction for future work in novel virtual memory frameworks
