
An Exascale Workload Study

Prasanna Balaprakash (ANL), Darius Buntinas (ANL), Anthony Chan (ANL), Apala Guha (UChicago, ANL), Rinku Gupta (ANL), Sri Hari Krishna Narayanan (ANL), Andrew A. Chien (UChicago, ANL), Paul Hovland (ANL), and Boyana Norris (ANL)

PROBLEM

10x10 is a new approach to understanding “energy-efficient + performance-optimized” supercomputing.

Traditional 90/10 Architecture

• Amdahl's approach: 10% of the code is where 90% of the time is spent

• Derive common cases across applications and optimize those

• Worked very well in the past

Limitations of the 90/10 Approach

• Focuses on broad architectural changes that affect large portions of a broad range of application code

• No support for upcoming customized heterogeneous architectures

• Power efficiency and performance optimization are increasingly important
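The 90/10 heuristic rests on Amdahl's law: if a fraction p of the runtime is accelerated by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The sketch below is a minimal illustration of that formula (the function name is hypothetical, not from the study); with p = 0.9 the overall gain can approach but never exceed 10x, however fast the hotspot becomes.

```python
def amdahl_speedup(fraction: float, accel: float) -> float:
    """Overall speedup when `fraction` of the runtime is accelerated by `accel`.

    Amdahl's law: S = 1 / ((1 - p) + p / s)
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if accel <= 0:
        raise ValueError("acceleration factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / accel)


if __name__ == "__main__":
    # 90/10 case: 90% of the time sits in the hot 10% of the code.
    for s in (2, 10, 100, 1e9):
        print(f"hotspot {s:g}x faster -> overall {amdahl_speedup(0.9, s):.2f}x")
```

For example, making the hotspot 10x faster yields only about 5.26x overall, because the remaining 10% of the runtime is untouched.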

THE 10x10 APPROACH

Step 1

• Understand and measure diverse characteristics of a broad range of workloads of interest to DOE in the exascale era

Step 2

• Identify the top ten characteristics that form dominant modes across the various workloads

Step 3

• Design or identify the architectures/accelerators best suited to each characteristic for the exascale era. These customized accelerators address specific characteristics and can be designed to be highly energy efficient and high performance.
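Step 2 can be pictured as a ranking problem. The sketch below is a deliberately simple stand-in, not the study's method: it assumes each workload has been profiled into per-characteristic runtime shares, weights every workload equally, and keeps the k characteristics with the largest summed share. The characteristic names and numbers in the usage example are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def dominant_modes(profiles: Dict[str, Dict[str, float]], k: int = 10) -> List[str]:
    """Return the k characteristics with the largest summed share across workloads.

    `profiles` maps workload name -> {characteristic name: share of that
    workload's runtime}. Every workload is weighted equally (an assumption).
    """
    totals: Counter = Counter()
    for shares in profiles.values():
        totals.update(shares)  # adds each characteristic's share to its running total
    return [name for name, _ in totals.most_common(k)]


if __name__ == "__main__":
    # Hypothetical toy profiles, not measurements from this study.
    profiles = {
        "app_a": {"dense_fp": 0.6, "irregular_mem": 0.3, "branchy": 0.1},
        "app_b": {"irregular_mem": 0.7, "dense_fp": 0.2, "branchy": 0.1},
    }
    print(dominant_modes(profiles, k=2))
```

A real selection would likely weight workloads by importance and cluster related characteristics; equal weighting simply keeps the idea visible.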

APPLICATION CHARACTERIZATION

EXASCALE EXTRAPOLATION MODELS

EVALUATING MODELS ON POTENTIAL EXASCALE ARCHITECTURES

ACKNOWLEDGEMENTS

This work was supported by the U.S. Department of Energy Office of Science (DE-AC02-06CH11357) and NSF (OCI-1-57921).


Operation types in applications

Operation types in application hotspots

Understanding memory bandwidth variations of applications with increasing input size

Apps: PETSc, Mantevo, NEK5K

Tools: PIN, HPCT, PBound

•  We study diverse applications in an effort to understand their “dominant modes and characteristics”

•  We determine how to measure these characteristics using current technologies

•  We determine whether and how these characteristics will change as application sizes grow in the exascale era
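The instruction-mix and bandwidth charts reduce raw measurements to fractions and rates. The sketch below assumes per-category operation counts have already been collected (for example by a binary-instrumentation tool such as PIN; how they are gathered is outside this sketch) and that 1 MB = 10^6 bytes. The function and category names are hypothetical.

```python
from typing import Dict

# Operation categories used in the instruction-mix charts.
CATEGORIES = ("loads", "stores", "floating_point", "integer", "branches", "other")


def op_fractions(counts: Dict[str, int]) -> Dict[str, float]:
    """Normalize raw per-category operation counts into fractions of the total."""
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise KeyError(f"unexpected categories: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no operations were counted")
    return {c: counts.get(c, 0) / total for c in CATEGORIES}


def bandwidth_mb_per_s(bytes_moved: int, seconds: float) -> float:
    """Average memory bandwidth in MB/s, taking 1 MB = 10**6 bytes."""
    return bytes_moved / seconds / 1e6


if __name__ == "__main__":
    # Hypothetical counts for one run.
    print(op_fractions({"loads": 30, "stores": 10, "floating_point": 40, "integer": 20}))
    print(bandwidth_mb_per_s(5_000_000, 2.0))
```

Missing categories are reported as 0.0 so every application plots against the same six bars.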

EXASCALE SCALING LIMITS

Exascale Machine: Projected instruction mix

Memory requirements projection models for applications

•  We build extrapolative models for various applications to understand their key characteristics on exascale machines

•  The models are statistically validated for accuracy

•  Key modeled characteristics: compute intensity, memory intensity, instruction mix

•  These characteristics provide an empirical basis for designing future exascale architectures

Exascale Machine: Projected app runtime

[Diagram: Workloads → Dominant characteristics → Customized architectures → Overall solution]

Application exascale projection models, where $N = n_1 n_2 n_3$ and $c_1, c_2, c_3, c_4$ are constants:

• Ex19, Ex30: $f(n_1, n_2, n_3) = c_1 + c_2 N + c_3 n_1 n_2 + c_4 n_1$

• Ex20: $f(n_1, n_2, n_3) = c_1 + c_2 n_1 n_2 + c_3 (n_1 n_2)^2$

• miniFE, miniMD, HPCCG: $f(n_1, n_2, n_3) = c_1 + c_2 N$
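All three model families are linear in the constants c1, c2, ..., so one natural way to estimate them from small-scale runs is ordinary least squares. The poster does not say how the constants were fitted, so this is a minimal sketch under that assumption; the dictionary keys and function names are hypothetical, and the usage example fits synthetic data rather than measurements.

```python
import numpy as np

# Basis terms for each projection model; N = n1 * n2 * n3.
# The dictionary keys are hypothetical labels for the three model families above.
MODELS = {
    "ex19_ex30": lambda n1, n2, n3: [np.ones_like(n1), n1 * n2 * n3, n1 * n2, n1],
    "ex20": lambda n1, n2, n3: [np.ones_like(n1), n1 * n2, (n1 * n2) ** 2],
    "mini": lambda n1, n2, n3: [np.ones_like(n1), n1 * n2 * n3],
}


def design_matrix(model, n1, n2, n3):
    """Stack the model's basis terms as columns (one row per measured run)."""
    n1, n2, n3 = (np.asarray(a, dtype=float) for a in (n1, n2, n3))
    return np.column_stack(MODELS[model](n1, n2, n3))


def fit(model, n1, n2, n3, y):
    """Ordinary least-squares estimate of the constants c1, c2, ..."""
    A = design_matrix(model, n1, n2, n3)
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coeffs


def predict(model, coeffs, n1, n2, n3):
    """Evaluate the fitted model at new (possibly much larger) input sizes."""
    return design_matrix(model, n1, n2, n3) @ coeffs


if __name__ == "__main__":
    # Synthetic check: generate data from known constants on small inputs,
    # refit, then extrapolate to a much larger input.
    g = np.array([8.0, 16.0, 32.0])
    n1, n2, n3 = (a.ravel() for a in np.meshgrid(g, g, g, indexing="ij"))
    true_c = np.array([2.0, 3e-3, 0.5, 1.0])
    y = design_matrix("ex19_ex30", n1, n2, n3) @ true_c
    c = fit("ex19_ex30", n1, n2, n3, y)
    print("recovered constants:", c)
    print("prediction at n1 = n2 = n3 = 4096:", predict("ex19_ex30", c, [4096], [4096], [4096]))
```

Validating such a fit would typically mean holding out the largest measured sizes and checking the prediction error on them before trusting the extrapolation.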

[Figure: Fraction of total operations (loads, stores, floating point, integer, branches, other) for Ex19, Ex20, Ex30, miniFE, miniMD, HPCCG]

[Figure: Fraction of operations in hotspots (loads, stores, floating point, integer, branches, other) for Ex10 (1, 2), Ex19, Ex20, Ex30 (1, 2), turbChan (1, 2), miniFE (1, 2, 3), miniMD, HPCCG]

[Figure: Memory bandwidth (MB/s, log scale) for Ex10, Ex19, Ex20, Ex30, vortex, turbChan, miniFE, miniMD, HPCCG]

[Diagram: Traditional exascale architecture vs. processor-under-memory (PUM)]

We evaluate our models on:

• Traditional memory model: CPU 10 TFlops; bandwidth 1 TB/s

• PUM: bandwidth scales to 10 TB/s due to the stacked memory die architecture

Not all applications are bandwidth limited.
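A quick way to see why PUM helps some applications and not others is a roofline-style bound in which a kernel's runtime is set by the slower of its compute time and its memory-transfer time. This is a sketch under two assumptions not stated on the poster: that PUM keeps the same 10 TFlops compute as the traditional node, and that compute and memory traffic overlap perfectly. The kernels in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    name: str
    peak_flops: float  # floating-point operations per second
    bandwidth: float   # memory bandwidth in bytes per second


# Parameters from the poster; PUM is assumed to keep the same 10 TFlops CPU.
TRADITIONAL = Machine("traditional", peak_flops=10e12, bandwidth=1e12)
PUM = Machine("PUM", peak_flops=10e12, bandwidth=10e12)


def runtime_s(m: Machine, flops: float, bytes_moved: float) -> float:
    """Roofline-style bound: whichever of compute or memory traffic is slower dominates."""
    return max(flops / m.peak_flops, bytes_moved / m.bandwidth)


def pum_improvement(flops: float, bytes_moved: float) -> float:
    """Speedup of PUM over the traditional node for one kernel (between 1 and 10 here)."""
    return runtime_s(TRADITIONAL, flops, bytes_moved) / runtime_s(PUM, flops, bytes_moved)


if __name__ == "__main__":
    # Hypothetical kernels with different arithmetic intensity (flops per byte).
    for label, flops, nbytes in [
        ("compute-bound", 1e15, 1e12),
        ("mixed", 1e13, 4e12),
        ("bandwidth-bound", 1e12, 1e15),
    ]:
        print(f"{label:>15}: PUM improvement {pum_improvement(flops, nbytes):.2f}x")
```

Under this bound a compute-bound kernel gains nothing (a factor of 1.00), a purely bandwidth-bound one gains the full 10x, and kernels in between land in the middle, which is consistent with the observation that not all applications are bandwidth limited.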

App | Scaling limit (exascale) | PUM improvement (app-level) | PUM improvement (node-level) | Key limit (PUM) | Programming change
Ex19 | MemCap | 2.35 | 2.97 | MemCap | Local
Ex20 | Compute | 4.02 | 1.00 | Compute | Local
Ex30 | Compute | 4.08 | 1.00 | Compute | High
miniMD | Compute | 6.75 | 1.00 | Compute | Local
miniFE | MemCap | 6.57 | 6.56 | MemCap | Moderate
HPCCG | MemCap | 10.00 | 10.00 | MemCap | Local

(MemCap = memory capacity)

[Figure: Projected instruction mix on an exascale machine; fraction of total operations (loads, stores, floating point, integer, branches, other) for Ex19, Ex20, Ex30, miniFE, miniMD, HPCCG]

[Figure: Projected runtime (number of days, log scale) vs. input size (G, log scale) for Ex19, Ex20, Ex30, miniFE, miniMD, HPCCG]

[Figure: Projected total memory (PB, log scale) vs. input size (G, log scale) for Ex19, Ex20, Ex30, miniFE, miniMD, HPCCG]

Exascale Machine: Projected memory requirement

Apps | Exascale limit (24 hrs) | Exascale limit (100 PB) | Exascale limit (time) | Exascale limit (mem cap) | Critical limit | Feasible size
miniMD | 41G | 600G | 27 hrs | 108 PB | Compute | 45GB
Ex20 | 92G | 500G | 25 hrs | 98 PB | Compute | 92GB
Ex30 | 130G | 1500G | 27 hrs | 110 PB | Compute | 130GB
Ex19 | 5000G | 1000G | 23 hrs | 125 PB | Memory | 1TB
miniFE | 5000G | 250G | 23 hrs | 110 PB | Memory | 250GB
HPCCG | 5000G | 250G | 23 hrs | 110 PB | Memory | 250GB

•  Extrapolation models allow us to classify apps as compute- or memory-limited

•  Feasible dataset sizes are used to:

  ◦ estimate exascale memory requirements

  ◦ assess the potential benefit of exascale technologies such as PUM
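Given monotone projections of runtime and memory versus input size, the feasible size is the largest input that satisfies both the 24-hour and the 100 PB limits, and the constraint that is hit first is the critical limit. The sketch below finds each bound by bisection; it assumes both projections are nondecreasing in input size (in G), that runtime is in hours and memory in PB, and the function names and linear models in the usage example are hypothetical.

```python
from typing import Callable, Tuple


def max_size(f: Callable[[float], float], limit: float, lo: float, hi: float,
             rel_tol: float = 1e-9) -> float:
    """Largest x in [lo, hi] with f(x) <= limit, assuming f is nondecreasing (bisection)."""
    if f(lo) > limit:
        raise ValueError("limit is already exceeded at the smallest size")
    if f(hi) <= limit:
        return hi
    while hi - lo > rel_tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if f(mid) <= limit:
            lo = mid
        else:
            hi = mid
    return lo


def feasible_size(runtime_hours: Callable[[float], float],
                  memory_pb: Callable[[float], float],
                  hi: float, time_limit: float = 24.0,
                  mem_limit: float = 100.0) -> Tuple[float, str]:
    """Largest input size meeting both exascale limits, plus the constraint that binds first."""
    t_max = max_size(runtime_hours, time_limit, 0.0, hi)
    m_max = max_size(memory_pb, mem_limit, 0.0, hi)
    return (t_max, "Compute") if t_max <= m_max else (m_max, "Memory")


if __name__ == "__main__":
    # Hypothetical linear projections: 0.5 h per G of input, 0.2 PB per G of input.
    print(feasible_size(lambda g: 0.5 * g, lambda g: 0.2 * g, hi=1e4))
```

With these toy projections the time limit binds first (near 48 G), so the app would be classed as compute-limited; fitted models from the extrapolation step would replace the lambdas in practice.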

[Diagram: Compute Engine #1 and Compute Engine #2 with memory]
