Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND NO. SAND2017-4325 C
The Impact of Increasing Memory System Diversity on Applications
Gwen Voskuilen, Arun Rodrigues, Mike Frank, Si Hammond
4/25/2017

Diversity in memory hierarchy
§ New memory technologies: stacked DRAM, non-volatile
  § Advantages: higher bandwidth, persistent, etc.
  § Disadvantages: expensive, higher latencies
§ Solution: multi-level memory (MLM)
  § Mix memories to expose advantages, hide disadvantages
  § Really hard in practice
§ Must carefully locate data based on data characteristics
  § Which data goes into which memory? Who decides?

| | Automatic | Manual |
|---|---|---|
| Hardware | Hardware decides | |
| Software | OS/runtime decides | Application decides |

Managing MLM
§ Trinity KNL: small stacked DRAM + large DDR DRAM
  § Application data footprint >> stacked DRAM capacity
§ Management requires identifying and selectively allocating data needing bandwidth into stacked DRAM
§ Study 1: Software management policies
  § Ranging from simple to complex
  § Developed analysis tool, MemSieve, to evaluate memory behavior
§ Study 2: Hardware caching
  § Insertion/eviction policies

Outline
§ Methodology
§ Software/manually managed MLM (malloc()-based)
§ Hardware/automatically managed MLM
§ Conclusions

Simulation Methodology
§ MiniApps: HPCG, PENNANT, SNAP
  § Memory footprints of 1-8 GB → sampling required
§ Simulated on two architectures using SST
  § Lightweight: 72 small cores, mesh, private L1, semi-private L2
  § Heavyweight: 8 big cores, ring, private L1 + L2, shared L3
  § Both: DDR and HMC memories

[Figure: block diagrams of the two simulated architectures. Lightweight: tiles of cores with per-core L1 and a per-tile L2, attached to HMC and DDR. Heavyweight: cores with L1 and L2, L3 slices, and memory controllers.]

Software approaches
Software management
§ OS / Runtime
  § Static
    § Greedily insert pages into HMC
    § Greedily insert mallocs into HMC
  § Dynamic (future work)
§ Programmer
  § Static: direct "best" mallocs to HMC
  § Dynamic: migrate to put current "best" into HMC
Increased performance?
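
The two static OS/runtime policies on this slide can be pictured as first-come, first-served placement. The following is a minimal Python sketch, not the study's implementation. It assumes a hypothetical list of `(name, size)` allocation requests in program order, a fixed HMC capacity, and a 4 KiB page size. It also assumes that pages are claimed in allocation order rather than on first touch. The function names and numbers are illustrative only.

```python
def greedy_malloc_placement(requests, hmc_capacity):
    """Place each allocation wholly in HMC if it still fits, otherwise in DDR.

    requests: list of (name, size_in_bytes) in program order (hypothetical data).
    Returns a dict mapping name -> "HMC" or "DDR".
    """
    placement = {}
    free = hmc_capacity
    for name, size in requests:
        if size <= free:
            placement[name] = "HMC"
            free -= size
        else:
            placement[name] = "DDR"
    return placement


def greedy_page_placement(requests, hmc_capacity, page_size=4096):
    """Same idea at page granularity (assumed 4 KiB pages).

    Pages are handed out in order until HMC is full, so a large
    allocation can end up partly in HMC and partly in DDR.
    Returns a dict mapping name -> number of its pages placed in HMC.
    """
    free_pages = hmc_capacity // page_size
    pages_in_hmc = {}
    for name, size in requests:
        pages = -(-size // page_size)  # ceiling division
        take = min(pages, free_pages)
        pages_in_hmc[name] = take
        free_pages -= take
    return pages_in_hmc


requests = [("a", 8192), ("b", 16384), ("c", 4096)]
by_malloc = greedy_malloc_placement(requests, hmc_capacity=16384)
by_page = greedy_page_placement(requests, hmc_capacity=16384)
```

In this toy input the malloc-granularity policy skips `b` because it no longer fits whole, while the page-granularity policy fills the remaining HMC space with part of `b`.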

Tradeoff
§ Automatic (OS)
  § Easier for programmer
  § Able to capture allocations not under programmer control
    § Library, pre-program start, etc.
  § Page-table complexity; potentially expensive re-mapping
  § No program knowledge → worse performance?
    § Could use programmer hints or runtime profiling, but more work
§ Manual (programmer)
  § More work for programmer, pervasive (?) changes
  § Not able to handle all allocations
  § Possible conflicts between application and library allocations
    § What if libraries decide to manage allocation for internal structures too?
  § Knowledge of program behavior → better performance?

Analysis tool: MemSieve
§ Captures an application's memory accesses and correlates to the application's memory allocations
  § Filters out cache hits
  § Without simulating full memory hierarchy → 2.5X+ faster
§ Key measurement: malloc density
  § # accesses / size
§ Hypothesis: dense mallocs should be put in HMC
  § Assuming similar latencies between HMC and DDR
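
The density metric and the "dense mallocs go to HMC" hypothesis can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MemSieve's code. It assumes a hypothetical list of `(call_site, start_address, size)` allocations and a list of addresses that missed in the caches. The function names and data are made up for the example.

```python
from collections import defaultdict


def malloc_density(allocations, accesses):
    """Accesses per byte for each malloc call site.

    allocations: list of (call_site, start_address, size_in_bytes)
    accesses: iterable of addresses that missed in cache (hypothetical trace)
    """
    size = defaultdict(int)
    count = defaultdict(int)
    for site, start, nbytes in allocations:
        size[site] += nbytes
    for addr in accesses:
        # Linear search is enough for a sketch; a real tool would index ranges.
        for site, start, nbytes in allocations:
            if start <= addr < start + nbytes:
                count[site] += 1
                break
    return {site: count[site] / size[site] for site in size if size[site] > 0}


def densest_first(density, sizes, hmc_capacity):
    """Choose call sites for HMC from most to least dense while they fit."""
    chosen, free = [], hmc_capacity
    for site in sorted(density, key=density.get, reverse=True):
        if sizes[site] <= free:
            chosen.append(site)
            free -= sizes[site]
    return chosen


allocations = [("A", 0, 100), ("B", 100, 400)]
accesses = [0, 10, 50, 99, 100, 450]
density = malloc_density(allocations, accesses)
in_hmc = densest_first(density, {"A": 100, "B": 400}, hmc_capacity=300)
```

Here call site `A` receives 4 accesses over 100 bytes and `B` receives 2 over 400 bytes, so `A` is denser and is the only one selected for a 300-byte HMC.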

Malloc analysis

| | Pennant | HPCG | Snap* | MiniPIC |
|---|---|---|---|---|
| Malloc count | 8B | 23M | 1B | 438K |
| Malloc size | 32.1 TB | 7.43 GB | 30 GB | 7.9 GB |
| Distinct traces | 248 | 612 | 323 | 39043 |
| Accessed traces | 140 | 146 | 90 | 10794 |
| Size of accessed traces as % of total | 89.7% | 99.987% | 89.6% | 84% |

*Iterations from beginning & middle only

§ Many mallocs but few distinct malloc call traces (locations)
§ Reasons mallocs are not accessed
  § Same address malloc'd repeatedly → cache-resident
  § Malloc was not accessed in profiled section of application

Ideal malloc behavior
§ Good: a few, small, very dense mallocs
§ Bad: many, equally dense mallocs; densest are big

[Figure: two sketch plots with mallocs ordered from more dense to less dense. One shows density and cumulative % of accesses per malloc, annotated "Big density variation: less work to manage". The other shows size and cumulative % of accesses, annotated "Lots of accesses in a very small region".]

Malloc density
[Figure: four plots for HPCG and PENNANT, each with malloc call sites ordered from most to least dense on the x-axis. Two plots show density (accesses/byte) together with % of total accesses. Two plots show cumulative size (one with a 0-32 TB axis, one with a 0-8 GB axis) together with % of total accesses.]

HMC Potential Performance
§ Max: 8X
§ Trends do not change with dataset size (1-8 GB)

[Figure: two bar charts, "Heavyweight Architecture: Performance with all HMC" and "Lightweight Architecture: Performance with all HMC". Y-axis: speedup over all DDR. Benchmarks: MiniPIC charge, MiniPIC field, MiniPIC move, SNAP p0, SNAP p1 (labelled p1/2 in the heavyweight chart), HPCG, PENNANT.]

Manual allocation: PENNANT
§ Large performance jump from 25% to 50% HMC
§ Dynamic migration necessary

[Figure: PENNANT speedup over DDR only for the Greedy-page, Greedy-malloc, Static, and Dynamic policies, with series for 12.5%, 25%, 50%, and 100% HMC.]

Manual allocation: HPCG
§ Again, large jump from 25% to 50% HMC
§ Page-based & static perform similarly
§ Dynamic not better
  § But granularity of migration is large

[Figure: HPCG speedup over DDR only for the Greedy-page, Greedy-malloc, Static, and Dynamic policies, with series for 12.5%, 25%, 50%, and 100% HMC.]

Manual allocation: SNAP
§ Greedy-page performs the best
§ Two large mallocs in SNAP, each 42% of total
  § Medium/low density
  § Once they don't fit, HMC size/malloc strategy doesn't matter
§ Suggested code change
  § Break up large mallocs to improve HMC utilization

[Figure: SNAP speedup over DDR only for the Greedy-page, Greedy-malloc, Static, and Dynamic policies, with series for 12.5%, 25%, 50%, and 100% HMC.]
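
The suggested code change for SNAP can be illustrated with a small arithmetic sketch in Python. It assumes a footprint of 100 arbitrary units made of two allocations of 42 units each plus 16 units of everything else, mirroring the "each 42% of total" observation. The chunk size of 7 units, the first-fit rule, and the function names are assumptions made for the example. The sketch only measures how much of HMC gets filled. Whether fuller HMC translates into speedup depends on how dense the data placed there is.

```python
def hmc_units_used(sizes, hmc_capacity):
    """First-fit in the given order: an allocation goes to HMC only if it fits whole."""
    free = hmc_capacity
    for s in sizes:
        if s <= free:
            free -= s
    return hmc_capacity - free


def split(size, chunk):
    """Break one allocation into chunk-sized pieces plus a remainder."""
    pieces = [chunk] * (size // chunk)
    if size % chunk:
        pieces.append(size % chunk)
    return pieces


whole = [42, 42, 16]                        # two large mallocs + the rest
chunked = split(42, 7) + split(42, 7) + [16]

used_whole = hmc_units_used(whole, hmc_capacity=25)      # HMC = 25% of footprint
used_chunked = hmc_units_used(chunked, hmc_capacity=25)
```

With HMC at 25% of the footprint, neither large allocation fits whole, so only the 16 remaining units land in HMC. After splitting, three 7-unit chunks fit and 21 of the 25 units are used.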

Topology comparison
§ Similar trend

[Figure: two bar charts of PENNANT speedup over DDR only, one for the heavyweight and one for the lightweight architecture, for the Greedy-page, Greedy-malloc, Static, and Dynamic policies, with series for 12.5%, 25%, 50%, and 100% HMC.]

Hardware management
§ Hardware management of MLM at the page level
  § Cache pages in HMC, page still resides in DDR
  § Compared to block level: lower tracking overhead but higher add/remove overhead
§ Focus was hardware caching
  § But also possible to do caching via OS
  § Usually, less information (hits, misses, etc.)
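
The page-versus-block tradeoff can be made concrete with a quick calculation. The sizes below are assumptions chosen for illustration (1 GiB of fast memory, 4 KiB pages, 64-byte blocks) and are not taken from the study.

```python
def tracked_entries(fast_memory_bytes, granule_bytes):
    """Number of granules the hardware must track to cover the fast memory."""
    return fast_memory_bytes // granule_bytes


GIB = 1 << 30
PAGE = 4096    # assumed page size in bytes
BLOCK = 64     # assumed cache-block size in bytes

page_entries = tracked_entries(GIB, PAGE)
block_entries = tracked_entries(GIB, BLOCK)

# Tracking state shrinks by this factor at page granularity...
tracking_ratio = block_entries // page_entries
# ...while each add/remove moves this many times more data.
transfer_ratio = PAGE // BLOCK
```

Under these assumptions, page-level management tracks 262,144 entries instead of 16,777,216, a 64-fold reduction, but every insertion or eviction moves 64 times as many bytes.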

Automatic Page-Level Swapping
[Figure: block diagram with a directory controller, DDR, fast memory, and an MLM unit containing a mapping table, DMA, policy, and dispatcher.]

Addition Policies
• addT: Simple Threshold
• addMFU: Most Frequently Used
• addRAND: 1:8192 chance
• addMRPU: More Recent Previous Use
• addMFRPU: More Frequent + More Recent Previous Use
• addSC: Deprioritize streams
• addSCF: as addSC + More Frequent

Replacement Policies
• FIFO: First-in, First-out
• LRU: Least Recently Used
• LFU: Least Frequently Used
• LFU8: LFU w/ 8-bit counter
• BiLRU: BiModal LRU
• SCLRU: Deprioritize streams
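
One addition policy and one replacement policy can be combined in a short Python sketch. It pairs a threshold rule in the spirit of addT with LRU replacement. The slides do not spell out what addT counts, so the sketch assumes that a page is added once it has been accessed `threshold` times while not resident. The class name, parameters, and access trace are hypothetical.

```python
from collections import OrderedDict, defaultdict


class PageCache:
    """Fast-memory page cache: threshold-based addition, LRU replacement."""

    def __init__(self, capacity_pages, threshold):
        self.capacity = capacity_pages
        self.threshold = threshold
        self.resident = OrderedDict()    # page -> None, least recently used first
        self.touches = defaultdict(int)  # accesses seen while not resident

    def access(self, page):
        """Return True if the access is served from fast memory."""
        if page in self.resident:
            self.resident.move_to_end(page)  # mark as most recently used
            return True
        self.touches[page] += 1
        if self.touches[page] >= self.threshold:
            if len(self.resident) >= self.capacity:
                self.resident.popitem(last=False)  # evict least recently used
            self.resident[page] = None
            del self.touches[page]
        return False


cache = PageCache(capacity_pages=2, threshold=2)
trace = [1, 1, 1, 2, 2, 1, 3, 3, 2, 1]
hits = [cache.access(p) for p in trace]
```

In this trace, page 2 is evicted when page 3 crosses the threshold, because page 1 was used more recently. Swapping in a different addition rule only changes the condition that guards insertion, which is one way to see why the two policy families can be varied independently.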

Performance vs. Policy
[Figure: two charts, "Lulesh: MLM Performance vs Policy" and "MiniFE: MLM Performance vs Policy". X-axis: add policy (addMFRPU, addMFU, addMRPU, addRAND, addSC, addT, addSCF). Y-axis: performance. One series per replacement policy (BiLRU, FIFO, LRU, SCLRU, LFU8, LFU).]
Addition policy: big variation
Replacement policy: little variation
“What you put in matters more than what you take out”

Larger datasets
§ Looked at highest performing addition policies
  § Variants of most-frequently used
  § Baseline: random
  § LRU replacement

[Figure: three charts, "Pennant-b Performance: Addition", "Snap-p0 Performance: Addition", and "HPCG Performance: Addition". X-axis: pages (1024, 8192, 65536). Y-axis: performance (1 = no fast memory). Series: addMFRPU, addMFU, addRAND, addSCF, and AllFast (the HPCG legend shows "Series2" where the other charts show AllFast).]

Fine Tuning
1. Thresholds
2. Page size
3. Throttling

[Figure: five charts. "Pennant Threshold": performance vs. threshold (0-80). "MLM Performance vs. Threshold (addT/LRU)": performance vs. threshold (0-1000) for CoMD, lammps, lulesh, and miniFE. "Pennant Page size Effects" and "snap-p0 Page size Effects": relative performance vs. page size (2^x B, x from 9 to 15), with series 128M and 256M. "Swap Throttling": performance with and without throttling for CoMD, lammps, lulesh, and miniFE.]

Conclusions
§ Application behavior varies significantly
§ Software management is feasible
  § Small to moderate number of dense call sites
  § Static allocation sufficient in some cases, dynamic necessary in others
§ For hardware management, addition policy matters most
§ For automatic & manual, profiling is instrumental
  § Application managed: helps identify high-bandwidth data
  § Automatically managed: helps identify places where application changes will improve performance