An Exascale Workload Study
Prasanna Balaprakash (ANL), Darius Buntinas (ANL), Anthony Chan (ANL), Apala Guha (UChicago, ANL), Rinku Gupta (ANL), Sri Hari Krishna Narayanan (ANL), Andrew A. Chien (UChicago, ANL), Paul Hovland (ANL), and Boyana Norris (ANL)
PROBLEM
10x10 is a new approach towards understanding “energy efficient + performance optimized” supercomputing
Traditional 90/10 Architecture
• Amdahl’s approach: 10% of code is where 90% of time is spent
• Derive common cases across applications and optimize those
• Worked very well in the past
Limitations of 90/10 approach
• Focuses on broad architectural changes which impact big portions of a broad range of application code
• No support for upcoming customized heterogeneous architectures
• Power efficiency and performance optimization are increasingly important
THE 10x10 APPROACH
Step 1
• Understand and measure diverse characteristics of a broad range of workloads which are of interest to DOE in the exascale era
Step 2
• Identify which top ten characteristics form dominant modes (characteristics) across the various workloads
Step 3
• Design or identify architectures/accelerators best suited for each characteristic for the exascale era. These customized accelerators address specific characteristics and can be designed to be highly energy efficient and high performance.
APPLICATION CHARACTERIZATION
EXASCALE EXTRAPOLATION MODELS
EVALUATING MODELS ON POTENTIAL EXASCALE ARCHITECTURES
ACKNOWLEDGEMENTS
This work was supported by the U.S. Department of Energy Office of Science DE-AC02-06CH11357 and NSF OCI-1-57921.
Operations types in applications
Operations types in application hotspots
Understanding memory bandwidth variations of applications with increasing input size
Apps: PETSc, Mantevo, NEK5K
Tools: PIN, HPCT, PBound
• We focus on studying diverse applications in an effort to understand “dominant modes and characteristics”
• We understand how to measure these characteristics using current technologies
• Determine whether and how these characteristics will change for increasing application sizes during the exascale era
EXASCALE SCALING LIMITS
Exascale Machine: Projected instruction mix
Memory requirements projection models for applications
• We build numerous extrapolative models for various applications to understand their key characteristics on exascale machines
• Models are statistically validated for accuracy
• Key modeled characteristics: compute intensity, memory intensity, instruction mix
• Characteristics provide empirical basis for designing future exascale architectures
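As a rough illustration of what building and validating such an extrapolation model could look like, the sketch below fits a linear model of the form c1 + c2*N by ordinary least squares and reports R^2 as one possible accuracy check. The poster does not say which fitting or validation procedure was used, so the method, the function names (`fit_linear`, `r_squared`), and the synthetic data are all assumptions.

```python
# Hypothetical illustration only: the fitting method, names, and data are
# assumptions and are not taken from the poster.

def fit_linear(sizes, values):
    """Return (c1, c2) minimizing the squared error of c1 + c2*N."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, values))
    c2 = sxy / sxx
    c1 = mean_y - c2 * mean_x
    return c1, c2


def r_squared(sizes, values, c1, c2):
    """Coefficient of determination of the fitted line on the given data."""
    mean_y = sum(values) / len(values)
    ss_res = sum((y - (c1 + c2 * x)) ** 2 for x, y in zip(sizes, values))
    ss_tot = sum((y - mean_y) ** 2 for y in values)
    return 1.0 - ss_res / ss_tot


# Synthetic "measurements": memory footprint in bytes at small problem sizes.
sizes = [1.0e6, 2.0e6, 4.0e6, 8.0e6]
memory_bytes = [5.0e7 + 80.0 * n for n in sizes]

c1, c2 = fit_linear(sizes, memory_bytes)
r2 = r_squared(sizes, memory_bytes, c1, c2)

# Extrapolate the fitted model to a much larger (hypothetical) problem size.
extrapolated = c1 + c2 * 1.0e9
```

A real validation would hold out some measurements and check the prediction error on them, since R^2 on the training points alone says little about extrapolation quality.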
Exascale Machine: Projected app runtime
[Diagram labels: Workloads; Dominant characteristics; Customized architectures; Overall solution]
Application Exascale Projection Models, where N = n1*n2*n3, and c1, c2, c3, c4 are constants
Ex19, Ex30: f(n1,n2,n3) = c1 + c2*N + c3*(n1*n2) + c4*n1
Ex20: f(n1,n2,n3) = c1 + c2*(n1*n2) + c3*(n1*n2)^2
miniFE, miniMD, HPCCG: f(n1,n2,n3) = c1 + c2*N
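The three model forms above can be written as small functions, which makes it easier to see how each grows with the input dimensions. This is a minimal sketch: the function names are hypothetical, the constants in the usage lines are made-up placeholders rather than fitted values, and the Ex20 form assumes the trailing "2" in the source is an exponent.

```python
# Hypothetical helper names; constants passed in are placeholders, not the
# fitted values from the study.

def model_ex19_ex30(n1, n2, n3, c1, c2, c3, c4):
    """f = c1 + c2*N + c3*(n1*n2) + c4*n1, with N = n1*n2*n3."""
    N = n1 * n2 * n3
    return c1 + c2 * N + c3 * (n1 * n2) + c4 * n1


def model_ex20(n1, n2, n3, c1, c2, c3):
    """f = c1 + c2*(n1*n2) + c3*(n1*n2)^2; n3 does not appear in this form."""
    plane = n1 * n2
    return c1 + c2 * plane + c3 * plane ** 2


def model_linear(n1, n2, n3, c1, c2):
    """f = c1 + c2*N, the form listed for miniFE, miniMD, and HPCCG."""
    return c1 + c2 * (n1 * n2 * n3)


# Example evaluations with placeholder constants.
a = model_ex19_ex30(2, 3, 4, 1.0, 1.0, 1.0, 1.0)   # 1 + 24 + 6 + 2
b = model_ex20(10, 10, 1, 1.0, 2.0, 3.0)           # 1 + 200 + 30000
c = model_linear(10, 10, 10, 1.0, 2.0)             # 1 + 2000
```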
[Figure: Fraction of Total Operations (0% to 100%) by type (Loads, Stores, Floating Point, Integer, Branches, Other) for Ex19, Ex20, Ex30, miniFE, miniMD, HPCCG]
[Figure: Fraction of Ops in Hotspot (0% to 100%) by type (Loads, Stores, Floating Point, Integer, Branches, Other) for Ex10 1, Ex10 2, Ex19, Ex20, Ex30 1, Ex30 2, turbChan 1, turbChan 2, miniFE 1, miniFE 2, miniFE 3, miniMD, HPCCG]
[Figure: Bandwidth (MB/s), axis from 1 to 1000, for Ex10, Ex19, Ex20, Ex30, vortex, turbChan, miniFE, miniMD, HPCCG]
Traditional Exascale architecture
Processor-under-memory (PUM)
We evaluate our models on:
• Traditional memory model: CPU 10 TFlops; bandwidth 1 TB/s
• PUM: bandwidth scales to 10 TB/s due to the stacked memory die architecture
Not all applications are bandwidth limited
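One simple way to see why only some applications benefit from the extra PUM bandwidth is a bound-based estimate: runtime is taken as the larger of compute time and memory-transfer time. The sketch below uses the 10 TFlops, 1 TB/s, and 10 TB/s figures stated above. The bound itself, the assumption that the PUM node keeps the same 10 TFlops peak, and the example workloads are illustrative assumptions, not the evaluation method used in the study.

```python
# Illustrative bound model; not the authors' evaluation method.
TRADITIONAL = {"peak_flops": 10e12, "bandwidth": 1e12}   # 10 TFlops, 1 TB/s
PUM = {"peak_flops": 10e12, "bandwidth": 10e12}          # assumed same CPU, 10 TB/s


def bound_time(flops, bytes_moved, machine):
    """Lower bound on runtime: the slower of compute and memory transfer."""
    compute_time = flops / machine["peak_flops"]
    memory_time = bytes_moved / machine["bandwidth"]
    return max(compute_time, memory_time)


def pum_speedup(flops, bytes_moved):
    """Ratio of the traditional bound to the PUM bound for one workload."""
    return (bound_time(flops, bytes_moved, TRADITIONAL)
            / bound_time(flops, bytes_moved, PUM))


# Made-up workloads: (floating-point operations, bytes moved).
compute_heavy = pum_speedup(1e15, 1e12)     # compute time dominates on both
bandwidth_heavy = pum_speedup(1e12, 1e13)   # memory time dominates on both
mixed = pum_speedup(1e13, 4e12)             # memory-bound before, compute-bound after
```

Under this bound the speedup ranges from 1x for compute-bound workloads up to the full 10x bandwidth ratio for workloads that stay bandwidth-bound.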
App      Scaling Limit   PUM Improvement   PUM Improvement   Key Limit   Programming
         (Exascale)      (App-level)       (Node-level)      (PUM)       Change
Ex19     MemCap          2.35              2.97              MemCap      Local
Ex20     Compute         4.02              1.00              Compute     Local
Ex30     Compute         4.08              1.00              Compute     High
miniMD   Compute         6.75              1.00              Compute     Local
miniFE   MemCap          6.57              6.56              MemCap      Moderate
HPCCG    MemCap          10.00             10.00             MemCap      Local
[Figure: Fraction of Total Operations (0% to 100%) by type (Loads, Stores, Floating Point, Integer, Branches, Other) for Ex19, Ex20, Ex30, miniFE, miniMD, HPCCG]
[Figure: Number of Days (log scale, 1e-10 to 1e+15) vs. Input Size in G (1e+01 to 1e+04) for Ex19, Ex20, Ex30, miniFE, miniMD, HPCCG]
[Figure: Total Memory in PB (log scale, 1e-05 to 1e+05) vs. Input Size in G (1e+01 to 1e+03) for Ex19, Ex20, Ex30, miniFE, miniMD, HPCCG]
Exascale Machine: Projected memory requirement
Apps     Exascale Limit   Exascale Limit   Exascale Limit   Exascale Limit   Critical   Feasible
         (24hrs)          (100PB)          (Time)           (Mem Cap)        Limit      Size
miniMD   41G              600G             27 hrs           108 PB           Compute    45GB
Ex20     92G              500G             25 hrs           98 PB            Compute    92GB
Ex30     130G             1500G            27 hrs           110 PB           Compute    130GB
Ex19     5000G            1000G            23 hrs           125 PB           Memory     1TB
miniFE   5000G            250G             23 hrs           110 PB           Memory     250GB
HPCCG    5000G            250G             23 hrs           110 PB           Memory     250GB
• Extrapolation models allow us to classify apps as compute- or memory-limited
• Feasible dataset sizes are used to estimate exascale memory requirements and to assess the potential benefit of exascale technologies such as PUM
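The classification described above can be sketched as follows: given a runtime model and a memory model, find the largest input size that fits within each limit (24 hours and 100 PB, as in the table) and report whichever limit binds first. The bisection search, the function names, and the two example apps with their constants are assumptions for illustration only. The sketch also assumes each model increases monotonically with input size.

```python
# Illustrative sketch; search method, names, and example models are assumptions.
DAY_SECONDS = 24 * 3600          # 24-hour runtime limit
MEMORY_LIMIT_BYTES = 100e15      # 100 PB memory capacity limit


def largest_size(model, limit, lo=1.0, hi=1e15, iters=200):
    """Largest size with model(size) <= limit, found by bisection.

    Assumes model is increasing, model(lo) <= limit, and model(hi) > limit.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if model(mid) <= limit:
            lo = mid
        else:
            hi = mid
    return lo


def classify(runtime_model, memory_model):
    """Return (critical limit, feasible size) for one application."""
    time_bound = largest_size(runtime_model, DAY_SECONDS)
    mem_bound = largest_size(memory_model, MEMORY_LIMIT_BYTES)
    if time_bound <= mem_bound:
        return "Compute", time_bound
    return "Memory", mem_bound


# Two made-up applications: runtime in seconds, memory in bytes.
app_a = classify(lambda n: 2e-6 * n, lambda n: 1e5 * n)
app_b = classify(lambda n: 1e-9 * n, lambda n: 1e4 * n)
```

Here the hypothetical `app_a` runs out of time before it runs out of memory, so it is compute-limited, while `app_b` hits the memory capacity first.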
[Diagram labels: Compute Engine #1; Compute Engine #2; Memory]