sc17 poster draftoptout.csc.ncsu.edu/~mueller/ftp/pub/mueller/papers/sc17poster2.pdf · title:...

1
Exploring Use-cases for Non-Volatile Memories in support of HPC Resilience Onkar Patil, Saurabh Hukerikar, Frank Mueller, Christian Engelmann Dept. of Computer Science, North Carolina State University, Computer Science and Mathematics Division, Oak Ridge National Laboratory MOTIVATION Exaflop Computers à large number of compute + memory devices + different forms of interconnects + cooling and power equipment à Close Proximity Manufacturing processes used to make these devices are not foolproof Lower durability and reliability of the devices. Frequency of device failures and data corruptions ↑ à effectiveness and utility ↓ Future Applications need to be more resilient while they, Maintain a balance between performance and power consumption Minimize trade-offs Non-volatile memory (NVM) technologies à enable memory devices that can maintain state of computation in the primary memory architecture More potential in using these memory devices as specialized hardware Data Retention à critical in improving resilience of an application against crashes Persistent memory regions to improve HPC resiliency à key aspect of this project PROBLEM STATEMENT Design strategy Enable checkpointing at the data structure level Some data structures are more critical than others at different stages of the application in terms of failure recovery Reduce the space and time overhead considerably in comparison to traditional checkpointing methods Easy to use API with minimal code changes NVM-based Main Memory APPROACH APPLICATION STATIC DATA STRUCTURES DYNAMIC DATA STRUCTURES DRAM NVM APPLICATION STATIC DATA STRUCTURES DYNAMIC DATA STRUCTURES DRAM NVM APPLICATION STATIC DATA STRUCTURES DYNAMIC DATA STRUCTURES DRAM NVM Application-directed Checkpointing Data Versioning void main(){ int * a, size; pmem_cpy_init(); a=(int *)pmem_only_alloc(size); pmem_cpy_free(a); } void main(){ int * a, size; pmem_cpy_init(); a=(int *)pmem_cpy_alloc(size); pmem_cpy_update(a); pmem_cpy_free(a); } void main(){ int * a, size; pmem_cpy_init(); a=(int *)pmem_ver_alloc(size); pmem_cpy_update(a); pmem_cpy_free(a); } Experimentation Setup 16-node cluster with Dual socket, Quad-Core AMD Opteron, 128 GB DRAM memory, Intel SSD from 100GB to 256GB DGEMM benchmark of the HPCC benchmark suite Tested for 4, 8 and 16-node configurations for a matrix sizes of 1000, 2000 and 3000 RESULTS 0.001 0.01 0.1 1 10 4 8 16 GFLOPS Nodes GFLOPS in node scaling for StarDGEMM DRAM PMEM_ONLY PMEM_CPY PMEM_VER 0.001 0.01 0.1 1 10 4 8 16 GFLOPS Nodes GFLOPS in node scaling for SingleDGEMM DRAM PMEM_ONLY PMEM_CPY PMEM_VER DRAM only allocation and NVM-based main memory perform better than Application- directed Checkpointing and Data Versioning partly due to an inefficient lookup algorithm The performance is consistent for both Single matrix multiplication and multiple matrix multiplication operations All modes perform similar and consistently when scaled by node size or problem size 0.01 0.1 1 10 100 4 8 16 Time(sec) Nodes Execution times in node scaling for StarDGEMM DRAM PMEM_ONLY PMEM_CPY PMEM_VER 0.01 0.1 1 10 100 4 8 16 Time(sec) Nodes Execution times in node scaling for SingleDGEMM DRAM PMEM_ONLY PMEM_CPY PMEM_VER 0.0001 0.001 0.01 0.1 1 10 1000 2000 3000 GFLOPS No. of elements GFLOPS for problem size scaling in StarDGEMM DRAM PMEM_ONLY PMEM_CPY PMEM_VER 0.0001 0.001 0.01 0.1 1 10 1000 2000 3000 GFLOPS No. of elements GFLOPS for problem size scaling in SingleDGEMM DRAM PMEM_ONLY PMEM_CPY PMEM_VER The execution time increases exponentially when using two types of memory instead of one 0.01 0.1 1 10 100 1000 10000 1000 2000 3000 Time(sec) No. of elements Execution time for problem size scaling in StarDGEMM DRAM PMEM_ONLY PMEM_CPY PMEM_VER 0.01 0.1 1 10 100 1000 10000 1000 2000 3000 Time(sec) No. of elements Execution time for problem size scaling in SingleDGEMM DRAM PMEM_ONLY PMEM_CPY PMEM_VER Develop the memory usage modes to make them more efficient and maintain complete system state Minimal overhead Support more complex applications Develop lightweight recovery mechanisms to work with the checkpointing schemes Reduce downtime and rollback time Combine them with intelligent policies that can help build resilient static and dynamic runtime system FUTURE WORK Non-volatile memory devices can be used as specialized hardware for improving the resilience of the system and we demonstrated three potential memory usage models that show consistent performance for compute intensive workloads REFERENCES Hukerikar, Saurabh, and Christian Engelmann. "Resilience Design Patterns-A Structured Approach to Resilience at Extreme Scale." arXiv preprint arXiv:1611.02717 (2016). Mittal, Sparsh, and Jeffrey S. Vetter. "A survey of software techniques for using non-volatile memories for storage and main memory systems." IEEE Transactions on Parallel and Distributed Systems 27.5 (2016): 1537-1550. Hsu, Terry Ching-Hsiang, et al. "NVthreads: Practical Persistence for Multi-threaded Applications." Proceedings of the Twelfth European Conference on Computer Systems. ACM, 2017. Liu, Qingrui, et al. "Compiler-directed lightweight checkpointing for fine-grained guaranteed soft error recovery." High Performance Computing, Networking, Storage and Analysis, SC16: International Conference for. IEEE, 2016. Yang, Shuo, et al. "Algorithm-Directed Crash Consistence in Non-Volatile Memory for HPC." arXiv preprint arXiv:1705.05541 (2017). Wong, Daniel, G. S. Lloyd, and M. B. Gokhale. A memory-mapped approach to checkpointing. No. LLNL-TR-635611. Lawrence Livermore National Laboratory (LLNL), Livermore, CA, 2013. Rezaei, Arash. Fault Resilience for Next Generation HPC Systems. North Carolina State University, 2016. This work was sponsored by the U.S. Department of Energy's Office of Advanced Scientific Computing Research. This manuscript has been co-authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). CONCLUSION

Upload: others

Post on 28-Sep-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: sc17 poster draftoptout.csc.ncsu.edu/~mueller/ftp/pub/mueller/papers/sc17poster2.pdf · Title: sc17_poster_draft Created Date: 8/7/2017 3:11:02 PM

Exploring Use-cases for Non-Volatile Memories in support of HPC ResilienceOnkarPatil,SaurabhHukerikar,FrankMueller,ChristianEngelmann

Dept.ofComputerScience,NorthCarolinaStateUniversity,ComputerScienceandMathematicsDivision,OakRidgeNationalLaboratory

MOTIVATION• Exaflop Computersà largenumberofcompute+memorydevices+differentformsofinterconnects+coolingandpowerequipmentà CloseProximity• Manufacturingprocessesusedtomakethesedevicesarenotfoolproof

• Lowerdurabilityandreliabilityofthedevices.• Frequencyofdevicefailuresanddatacorruptions↑à effectivenessandutility↓

• FutureApplicationsneedtobemoreresilientwhilethey,• Maintainabalancebetweenperformanceandpowerconsumption• Minimizetrade-offs

• Non-volatilememory(NVM)technologiesà enablememorydevicesthatcanmaintainstateofcomputationintheprimarymemoryarchitecture• Morepotentialinusingthesememorydevicesasspecializedhardware• DataRetentionà criticalinimprovingresilienceofanapplicationagainstcrashes• PersistentmemoryregionstoimproveHPCresiliencyà keyaspectofthisproject

PROBLEM STATEMENT

• Designstrategy• Enablecheckpointing atthedatastructurelevel• Somedatastructuresaremorecriticalthanothersatdifferentstagesoftheapplicationintermsoffailurerecovery• Reducethespaceandtimeoverheadconsiderablyincomparisontotraditionalcheckpointing methods• EasytouseAPIwithminimalcodechanges

NVM-basedMainMemory

APPROACHAPPLICATION

STATICDATASTRUCTURES

DYNAMICDATASTRUCTURES

DRAM NVM

APPLICATIONSTATICDATASTRUCTURES

DYNAMICDATASTRUCTURES

DRAM NVM

APPLICATIONSTATICDATASTRUCTURES

DYNAMICDATASTRUCTURES

DRAM NVM

Application-directedCheckpointing DataVersioningvoidmain(){int *a,size;pmem_cpy_init();…a=(int *)pmem_only_alloc(size);…pmem_cpy_free(a);…}

voidmain(){int *a,size;pmem_cpy_init();…a=(int *)pmem_cpy_alloc(size);…pmem_cpy_update(a);…pmem_cpy_free(a);…}

voidmain(){int *a,size;pmem_cpy_init();…a=(int *)pmem_ver_alloc(size);…pmem_cpy_update(a);…pmem_cpy_free(a);…}

• ExperimentationSetup• 16-nodeclusterwithDualsocket,Quad-CoreAMDOpteron,128GBDRAMmemory,IntelSSDfrom100GBto256GB• DGEMMbenchmarkoftheHPCCbenchmarksuite• Testedfor4,8and16-nodeconfigurationsforamatrixsizesof1000,2000and3000elements

RESULTS

0.001

0.01

0.1

1

10

4 8 16

GFLO

PS

Nodes

GFLOPSinnodescalingforStarDGEMM DRAMPMEM_ONLYPMEM_CPYPMEM_VER

0.001

0.01

0.1

1

10

4 8 16

GFLO

PS

Nodes

GFLOPSinnodescalingforSingleDGEMM DRAMPMEM_ONLYPMEM_CPYPMEM_VER

• DRAMonlyallocationandNVM-basedmainmemoryperformbetterthanApplication-directedCheckpointing andDataVersioningpartlyduetoaninefficientlookupalgorithm

• TheperformanceisconsistentforbothSinglematrixmultiplicationandmultiplematrixmultiplicationoperations

• Allmodesperformsimilarandconsistentlywhenscaledbynodesizeorproblemsize

0.01

0.1

1

10

100

4 8 16

Time(sec)

Nodes

ExecutiontimesinnodescalingforStarDGEMM DRAMPMEM_ONLYPMEM_CPYPMEM_VER

0.01

0.1

1

10

100

4 8 16

Time(sec)

Nodes

ExecutiontimesinnodescalingforSingleDGEMM DRAMPMEM_ONLYPMEM_CPYPMEM_VER

0.0001

0.001

0.01

0.1

1

10

1000 2000 3000

GFLO

PS

No.ofelements

GFLOPSforproblemsizescalinginStarDGEMM DRAMPMEM_ONLYPMEM_CPYPMEM_VER

0.0001

0.001

0.01

0.1

1

10

1000 2000 3000

GFLO

PS

No.ofelements

GFLOPSforproblemsizescalinginSingleDGEMM DRAMPMEM_ONLYPMEM_CPYPMEM_VER

• Theexecutiontimeincreasesexponentiallywhenusingtwotypesofmemoryinsteadofone

0.01

0.1

1

10

100

1000

10000

1000 2000 3000

Time(sec)

No.ofelements

ExecutiontimeforproblemsizescalinginStarDGEMMDRAMPMEM_ONLYPMEM_CPYPMEM_VER

0.01

0.1

1

10

100

1000

10000

1000 2000 3000

Time(sec)

No.ofelements

ExecutiontimeforproblemsizescalinginSingleDGEMM DRAMPMEM_ONLYPMEM_CPYPMEM_VER

• Developthememoryusagemodestomakethemmoreefficientandmaintaincompletesystemstate• Minimaloverhead• Supportmorecomplexapplications

• Developlightweightrecoverymechanismstoworkwiththecheckpointingschemes• Reducedowntimeandrollbacktime

• Combinethemwithintelligentpoliciesthatcanhelpbuildresilientstaticanddynamicruntimesystem

FUTURE WORK

• Non-volatilememorydevicescanbeusedasspecializedhardwareforimprovingtheresilienceofthesystemandwedemonstratedthreepotentialmemoryusagemodelsthatshowconsistentperformanceforcomputeintensiveworkloads

• REFERENCES• Hukerikar,Saurabh,andChristianEngelmann."ResilienceDesignPatterns-AStructuredApproachtoResilienceatExtreme

Scale." arXiv preprintarXiv:1611.02717 (2016).• Mittal,Sparsh,andJeffreyS.Vetter."Asurveyofsoftwaretechniquesforusingnon-volatilememoriesforstorageandmain

memorysystems." IEEETransactionsonParallelandDistributedSystems 27.5(2016):1537-1550.• Hsu,TerryChing-Hsiang,etal."NVthreads:PracticalPersistenceforMulti-threadedApplications." ProceedingsoftheTwelfth

EuropeanConferenceonComputerSystems.ACM,2017.• Liu,Qingrui,etal."Compiler-directedlightweightcheckpointing forfine-grainedguaranteedsofterrorrecovery." High

PerformanceComputing,Networking,StorageandAnalysis,SC16:InternationalConferencefor.IEEE,2016.• Yang,Shuo,etal."Algorithm-DirectedCrashConsistenceinNon-VolatileMemoryforHPC." arXiv preprint

arXiv:1705.05541 (2017).• Wong,Daniel,G.S.Lloyd,andM.B.Gokhale. Amemory-mappedapproachtocheckpointing.No.LLNL-TR-635611.Lawrence

LivermoreNationalLaboratory(LLNL),Livermore,CA,2013.• Rezaei,Arash. FaultResilienceforNextGenerationHPCSystems.NorthCarolinaStateUniversity,2016.• ThisworkwassponsoredbytheU.S.DepartmentofEnergy'sOfficeofAdvancedScientificComputingResearch.This

manuscripthasbeenco-authoredbyUT-Battelle,LLCunderContractNo.DE-AC05-00OR22725withtheU.S.DepartmentofEnergy.TheUnitedStatesGovernmentretainsandthepublisher,byacceptingthearticleforpublication,acknowledgesthattheUnitedStatesGovernmentretainsanon-exclusive,paid-up,irrevocable,world-widelicensetopublishorreproducethepublishedformofthismanuscript,orallowotherstodoso,forUnitedStatesGovernmentpurposes.TheDepartmentofEnergywillprovidepublicaccesstotheseresultsoffederallysponsoredresearchinaccordancewiththeDOEPublicAccessPlan(http://energy.gov/downloads/doe-public-access-plan).

CONCLUSION