ThyNVM slide deck (transcript). Source: soft.cs.tsinghua.edu.cn/os2atc2015/ppt/rjl.pdf


ThyNVM: Enabling Software-Transparent Crash Consistency in Persistent Memory Systems

Jinglei Ren∗†, Jishen Zhao‡, Samira Khan†ʹ, Jongmoo Choi+†, Yongwei Wu∗, Onur Mutlu†

∗Tsinghua University  †Carnegie Mellon University  ‡University of California, Santa Cruz  ʹUniversity of Virginia  +Dankook University

Emerging byte-addressable non-volatile memory (NVM)

Persistent memory: a new tier in the memory and storage stack

NVM is coming…

Add a data item to a persistent linked list:

Step 1, then Step 2 (shown in the diagram: writing the new item, then linking it into the list).

A crash between the two steps leaves the list broken and the data lost.

Current solution: wrap these steps in one transaction, or use another specific software-based interface.

New requirement for persistent memory data: crash consistency

Executive Summary

• Motivation: limitations of software-based crash consistency support.
  • Significant burden on programmers: e.g., adopting new interfaces.
  • Limited use cases: e.g., legacy applications, non-transactional programs.
• Idea: software-transparent crash consistency support through a new dual-scheme checkpointing mechanism for persistent memory.
• Observation: a tradeoff between application stall time (checkpointing latency) and metadata storage overhead.
  • Small-granularity scheme: ✔ short checkpointing latency, ✘ large metadata.
  • Large-granularity scheme: ✘ long checkpointing latency, ✔ small metadata.
• Mechanism: a combination of two checkpointing schemes at two granularities.
  • Realizing ✔ short checkpointing latency: cooperation of the two schemes.
  • Realizing ✔ small metadata: sparse updates → small-granularity scheme; dense updates → large-granularity scheme.
• Evaluation: within 4.9% slowdown of an idealized DRAM-only system that supports crash consistency at no cost.

Outline
• Motivation
• Observation: A New Tradeoff
• Dual-Scheme Checkpointing
• Evaluation

Motivation: Inefficiency of software-based crash consistency support

void TMhashtable_update(TM_ARGDECL          /* manually declared transactional component */
                        hashtable_t* ht,
                        void* key, void* data) {
    list_t* chain = get_chain(ht, key);
    pair_t* pair;
    pair_t updatePair;
    updatePair.first = key;
    /* transactional interface required for third-party libraries */
    pair = (pair_t*)TMLIST_FIND(chain, &updatePair);
    /* direct store outside the transactional interface: a prohibited
       operation that causes a runtime error, or a (potential) program
       bug under certain implementations */
    pair->second = data;
}


ThyNVM - Feature I: Software-transparent crash consistency support

void hashtable_update(hashtable_t* ht,
                      void* key, void* data) {
    list_t* chain = get_chain(ht, key);
    pair_t* pair;
    pair_t updatePair;
    updatePair.first = key;
    pair = (pair_t*)list_find(chain, &updatePair);
    pair->second = data;   /* valid operation: persistent memory
                              ensures crash consistency */
}

Unmodified syntax and semantics.

Motivation: Inefficiency of logging and copy-on-write (CoW)

• Logging
  • Large space for recording every update
  • Slow recovery for replaying the log
• Copy-on-Write
  • Large space for redundant unmodified data
  • Slow operation for copying unmodified data

ThyNVM - Feature II: An efficient dual-scheme checkpointing mechanism

Outline
• Motivation
• Observation: A New Tradeoff
• Dual-Scheme Checkpointing
• Evaluation

Observation: Two concerns in checkpointing, and their tradeoff

• Latency of checkpointing the working copy of data
• Metadata overhead to track the working copy/checkpoint of data

Checkpointing granularity:
• Small granularity leads to large metadata size.
• Large granularity leads to small metadata size.

Location of the working copy of data:
• Caching the working copy in DRAM: write back both dirty data and metadata during checkpointing (long latency).
• Storing the working copy in NVM: persist only metadata during checkpointing (short latency); need to remap data locations.

Observation: the tradeoff space

Checkpointing granularity × location of the working copy:

❶ DRAM (based on writeback) + small granularity (cache block): Inefficient. ✘ Large metadata overhead. ✘ Long checkpointing latency.
❷ DRAM (based on writeback) + large granularity (page): Partially efficient. ✔ Small metadata overhead. ✘ Long checkpointing latency. → Checkpointing Scheme II
❸ NVM (based on remap) + small granularity (cache block): Partially efficient. ✘ Large metadata overhead. ✔ Short checkpointing latency. ✔ Fast remapping. → Checkpointing Scheme I
❹ NVM (based on remap) + large granularity (page): Inefficient. ✔ Small metadata overhead. ✔ Short checkpointing latency. ✘ Slow remapping (on the critical path).
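To make the granularity/metadata tension concrete, here is a back-of-the-envelope sketch (not from the slides): counting translation entries for a hypothetical 16 GB NVM, assuming typical 64 B cache blocks and 4 KB pages.

#include <stdio.h>

int main(void) {
    /* hypothetical capacity and typical unit sizes; the slides do not
       give concrete numbers here */
    const unsigned long long capacity   = 16ULL << 30;  /* 16 GB */
    const unsigned long long block_size = 64;           /* bytes */
    const unsigned long long page_size  = 4096;         /* bytes */

    /* one translation entry per tracked unit: block granularity needs
       page_size/block_size = 64x more entries than page granularity */
    printf("block-granularity entries: %llu M\n",
           capacity / block_size / 1000000ULL);   /* ~268 M */
    printf("page-granularity entries:  %llu M\n",
           capacity / page_size / 1000000ULL);    /* ~4 M */
    return 0;
}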

Outline
• Motivation
• Observation: A New Tradeoff
• Dual-Scheme Checkpointing
• Evaluation

Dual-Scheme Checkpointing: Definitions

• Execution model: epochs. Time is divided into epochs, each an execution phase followed by a checkpointing phase:
  execution | checkpointing | execution | checkpointing   → time
  Epoch 0 (last epoch)        Epoch 1 (active epoch)
• System model: the hybrid architecture. [Diagram: CPU cores share an LLC; the memory controller holds the address translation tables (BTT and PTT) plus DRAM/NVM read and write queues, in front of DRAM and NVM.]
• Data versions: the last checkpoint C_last and the active working copy W_active.
• Block Translation Table (BTT): metadata for the small-granularity scheme.
• Page Translation Table (PTT): metadata for the large-granularity scheme.
• Hardware-based design: software uses regular load/store instructions.

Checkpointing Scheme I: Block Remapping
(location in the tradeoff: small granularity + NVM in-place)

• During execution: remap the working copy to a new address in NVM, to protect the last checkpoint. [Diagram: a cache-block-sized write to block P is redirected via the BTT in the memory controller to block Q in NVM, leaving C_last at P intact.]
• During checkpointing: only need to persist the BTT; W_active becomes C_last without any data movement.
• Recovery: restore C_last using the BTT (backed up in NVM).

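The following minimal sketch shows the idea behind block remapping; it is a toy model, not ThyNVM's hardware design. Assumptions: two physical slots per logical block, full-block writes only, and illustrative names (btt_entry, write_block, and the persist_btt() placeholder are all mine).

#include <string.h>

#define NBLOCKS 1024
#define BLKSZ   64

typedef struct {
    int active_slot;  /* which physical slot holds W_active (0 or 1) */
    int dirty;        /* written during the active epoch? */
} btt_entry;

static btt_entry btt[NBLOCKS];
static char nvm[NBLOCKS][2][BLKSZ];  /* two physical slots per logical block */

/* during execution: redirect the (full-block) write away from the slot
   holding C_last, so the checkpoint is never overwritten in place */
void write_block(int blk, const char data[BLKSZ]) {
    btt_entry *e = &btt[blk];
    if (!e->dirty) {          /* first write this epoch: remap */
        e->active_slot ^= 1;  /* the other slot keeps C_last intact */
        e->dirty = 1;
    }
    memcpy(nvm[blk][e->active_slot], data, BLKSZ);
}

/* during checkpointing: no data movement; once the BTT is persisted,
   every W_active slot is the new C_last */
void checkpoint_blocks(void) {
    for (int i = 0; i < NBLOCKS; i++)
        btt[i].dirty = 0;
    /* persist_btt(); <- hypothetical: write the BTT backup to NVM */
}

int main(void) {
    char buf[BLKSZ] = {0};
    write_block(7, buf);      /* goes to the remapped slot */
    checkpoint_blocks();      /* W_active becomes C_last */
    return 0;
}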

Dual-Scheme Checkpointing

Checkpointing Scheme II: Page Writeback
(location in the tradeoff: large granularity + DRAM cache)


• During execution: update the cached hot pages in DRAM (W_active). [Diagram: NVM page P is cached as DRAM page P*; the PTT in the memory controller maps P to P*, so a write to a block in P goes to P*, leaving C_last at P intact.]
• During checkpointing: write back W_active and the PTT (backed up in NVM) to NVM.
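A matching minimal sketch of page writeback, again a toy model rather than ThyNVM's hardware: one direct-mapped DRAM slot per page and illustrative names (ptt_entry, write_to_page, persist_ptt()) are my assumptions.

#include <string.h>

#define NPAGES 256
#define PAGESZ 4096

typedef struct {
    int cached;  /* is the working copy P* present in DRAM? */
    int dirty;   /* written during the active epoch? */
} ptt_entry;

static ptt_entry ptt[NPAGES];
static char nvm_pages[NPAGES][PAGESZ];   /* holds C_last */
static char dram_pages[NPAGES][PAGESZ];  /* toy direct-mapped cache for P* */

/* during execution: stores go to the DRAM copy P*, never touching
   C_last in NVM */
void write_to_page(int page, int off, const char *buf, int len) {
    ptt_entry *e = &ptt[page];
    if (!e->cached) {  /* first touch: cache the page as P* */
        memcpy(dram_pages[page], nvm_pages[page], PAGESZ);
        e->cached = 1;
    }
    memcpy(dram_pages[page] + off, buf, len);
    e->dirty = 1;
}

/* during checkpointing: write back dirty W_active pages and the PTT;
   this is the long-latency step that scheme coordination must hide */
void checkpoint_pages(void) {
    for (int p = 0; p < NPAGES; p++)
        if (ptt[p].dirty) {
            memcpy(nvm_pages[p], dram_pages[p], PAGESZ);
            ptt[p].dirty = 0;
        }
    /* persist_ptt(); <- hypothetical: write the PTT backup to NVM */
}

int main(void) {
    write_to_page(3, 128, "hello", 5);
    checkpoint_pages();
    return 0;
}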

Dual-Scheme Checkpointing: Coordinating the Two Schemes

• Key Mechanism I: realizing short application stall time by cooperation of the dual schemes.
• ThyNVM overlaps program execution and checkpointing time: while epoch 1 executes, checkpointing of epoch 0 continues in the background.
• The overlap is needed mainly for the page writeback scheme; the block remapping scheme finishes checkpointing fast.

• The two schemes operate separately on different memory regions. While page writeback does its checkpointing in the background, block remapping temporarily takes charge of all memory regions, so execution of the next epoch proceeds while the previous epoch is still being checkpointed.

• Because checkpointing of one epoch overlaps execution of the next, the system keeps three versions of data: the active working copy W_active, the last checkpoint C_last, and the penultimate checkpoint C_penult. If a crash interrupts checkpointing of C_last, recovery falls back to C_penult.
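A compact sketch of the recovery decision this implies, assuming a commit flag records whether the C_last checkpoint completed (the slides do not specify the actual mechanism):

#include <stdio.h>

typedef enum { CKPT_COMPLETE, CKPT_IN_PROGRESS } ckpt_state;

/* choose the recovery point: C_last if its checkpoint finished,
   otherwise fall back to C_penult */
const char *recovery_point(ckpt_state last) {
    return (last == CKPT_COMPLETE) ? "C_last" : "C_penult";
}

int main(void) {
    printf("clean checkpoint -> recover %s\n", recovery_point(CKPT_COMPLETE));
    printf("torn checkpoint  -> recover %s\n", recovery_point(CKPT_IN_PROGRESS));
    return 0;
}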

Dual-Scheme Checkpointing

Blockremapping (Cooperation) Pagewriteback

Storereceived

No

Still ckpt. 𝐶"#$%?No

Write 𝑾𝒂𝒄𝒕𝒊𝒗𝒆𝒃𝒍𝒐𝒄𝒌 to NVM

(protecting 𝑪𝒍𝒂𝒔𝒕);Update BTT

YesHit in PTT?

Write 𝑾𝒂𝒄𝒕𝒊𝒗𝒆𝒑𝒂𝒈𝒆 to DRAM

(protecting 𝑪𝒍𝒂𝒔𝒕);Update PTT

Still ckpt. 𝐶"#$%?

NoYes

Buffer 𝑾𝒂𝒄𝒕𝒊𝒗𝒆𝒃𝒍𝒐𝒄𝒌 in DRAM

(protecting 𝑪𝒑𝒆𝒏𝒖𝒍𝒕);Update BTT

Yes

CoordinatingtheTwoSchemes• Key Mechanism I:Summary of flow

Acknowledge
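The sketch below makes the flow above compilable; the helpers are trivial stand-ins (two flags and printf), not a memory controller model, and all names (handle_store, in_ptt, ckpt_in_progress) are illustrative.

#include <stdio.h>

static int in_ptt;           /* stand-in for a PTT lookup result */
static int ckpt_in_progress; /* stand-in for "still checkpointing C_last" */

static void handle_store(unsigned long addr) {
    if (in_ptt) {
        /* page writeback path: write W_active^page to DRAM, protecting
           C_last in NVM; update the PTT */
        printf("0x%lx -> DRAM page copy (PTT updated)\n", addr);
    } else if (!ckpt_in_progress) {
        /* block remapping path: write W_active^block to a remapped NVM
           block, protecting C_last; update the BTT */
        printf("0x%lx -> remapped NVM block (BTT updated)\n", addr);
    } else {
        /* C_last is still being persisted: buffer the block in DRAM,
           protecting C_penult; update the BTT */
        printf("0x%lx -> DRAM buffer (BTT updated)\n", addr);
    }
    /* acknowledge the store to the CPU */
}

int main(void) {
    in_ptt = 1;           handle_store(0x1000);
    in_ptt = 0;           handle_store(0x2000);
    ckpt_in_progress = 1; handle_store(0x3000);
    return 0;
}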

Dual-Scheme Checkpointing: Coordinating the Two Schemes

• Key Mechanism II: realizing small metadata overhead by matching write patterns with the dual schemes (a sketch of the policy follows below).
• Estimate spatial locality by the number of stores in the last epoch on individual blocks/pages (recorded in the BTT/PTT).
• Switch schemes by updating the PTT and migrating the necessary data.

Spatial locality | Write pattern                     | Page-level characteristics  | Granularity for min. metadata | Matching scheme
Low              | Random, sparse, of small sizes    | Small portion of dirty data | Small (cache block size)      | Block remapping
High             | Sequential, dense, of large sizes | Large portion of dirty data | Large (page size)             | Page writeback
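A sketch of the locality-matching policy, not ThyNVM's exact heuristic: count dirty blocks per page during the last epoch (as the BTT/PTT counters on this slide do) and pick the scheme for the next epoch. The half-page threshold is an assumed tuning parameter.

#include <stdio.h>

#define BLOCKS_PER_PAGE 64   /* 4 KB page / 64 B block, typical values */

typedef enum { BLOCK_REMAPPING, PAGE_WRITEBACK } scheme;

/* dense, sequential writes dirty a large portion of the page, so page
   writeback minimizes metadata; sparse writes favor block remapping */
scheme choose_scheme(int dirty_blocks_last_epoch) {
    const int threshold = BLOCKS_PER_PAGE / 2;   /* assumed cut-off */
    return dirty_blocks_last_epoch >= threshold ? PAGE_WRITEBACK
                                                : BLOCK_REMAPPING;
}

int main(void) {
    printf("3 dirty blocks  -> %s\n",
           choose_scheme(3)  == PAGE_WRITEBACK ? "page writeback" : "block remapping");
    printf("60 dirty blocks -> %s\n",
           choose_scheme(60) == PAGE_WRITEBACK ? "page writeback" : "block remapping");
    return 0;
}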

Outline
• Motivation
• Observation: A New Tradeoff
• Dual-Scheme Checkpointing
• Evaluation

Evaluation: Experiment Setup

• Simulator based on gem5.
• DRAM and NVM modeled with a DDR3 interface.
• NVM timing: 40 ns row hit; 128/368 ns clean/dirty row miss.
• Systems in comparison:
  • Ideal DRAM: all DRAM; no cost for supporting crash consistency.
  • Ideal NVM: all NVM; no cost for supporting crash consistency.
  • Journaling (one form of logging).
  • Shadow paging (one form of copy-on-write).

Evaluation: Workload I: Micro-benchmarks with different write patterns

[Figure: for (a) random and (b) sequential write patterns, total NVM write traffic in MB (broken down into CPU, migration, and checkpointing writes) and the percentage of execution time spent on checkpointing, for Journal, Shadow, and ThyNVM.]

• ThyNVM reduces NVM write traffic by 10.8%/14.4% compared to Journaling and Shadow paging, respectively.
• Journaling/Shadow paging spend 18.9%/15.2% of execution time on checkpointing, while ThyNVM reduces this overhead to 2.5% on average.

Evaluation: Workload II: In-memory storage (hashtable-based key-value store)

• ThyNVM provides 8.8% higher throughput than Journaling.
• ThyNVM provides 29.9% higher throughput than Shadow paging.

[Figure: transaction throughput in KTPS vs. request size (16 B to 4096 B) for Ideal DRAM, Ideal NVM, Journal, Shadow, and ThyNVM.]

Evaluation: Workload III: Compute-intensive tasks (from SPEC CPU2006)

• ThyNVM slows down by only 3.4% compared to Ideal DRAM, and speeds up by 2.7% compared to Ideal NVM.

[Figure: normalized IPC for gcc, bwaves, milc, leslie., soplex, Gems., lbm, and omnet., comparing Ideal DRAM, Ideal NVM, and ThyNVM.]

Conclusion

Contributions
• We propose a new hybrid persistent memory design with software-transparent crash consistency support.
• We identify a new tradeoff between application stall time and metadata storage overhead.
• We devise a new, efficient dual-scheme checkpointing mechanism.

Potentials
• ThyNVM can enable (1) easier and more widespread adoption of persistent memory, and (2) a more efficient software stack for exploiting persistent memory.
• ThyNVM can encourage more research on programmer-friendly mechanisms for managing persistent and hybrid memories.

Open Source
• Web site: http://persper.com/thynvm (source code, documents, etc.)

Thank you!
Jinglei Ren <jinglei.ren@persper.com>
http://persper.com/thynvm
