quicksilver:a proxy app for the monte carlo transport code

19
LLNL-PRES-743737 This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC Quicksilver: A Proxy App for the Monte Carlo Transport Code Mercury Workshop on Representative Applications David Richards, Ryan Bleile, Patrick Brantley, Shawn Dawson, Scott McKinley, Matthew O’Brien September 5, 2017

Upload: others

Post on 14-Nov-2021

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Quicksilver:A Proxy App for the Monte Carlo Transport Code

LLNL-PRES-743737This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC

Quicksilver: AProxyAppfortheMonteCarloTransportCodeMercuryWorkshoponRepresentativeApplications

DavidRichards,RyanBleile,PatrickBrantley,ShawnDawson,ScottMcKinley,MatthewO’Brien

September5,2017

Page 2: Quicksilver:A Proxy App for the Monte Carlo Transport Code

LLNL-PRES-7437372

SierrawillincludeIBMPower9CPUsandNvidia VoltaGPUs

IBMPower9CPU<10%SierraFlops

Nvidia VoltaGPU>90%SierraFlops16GBHBM/GPU

~4500nodesEachnodehas:2Power94VoltaGPU

Page 3: Quicksilver:A Proxy App for the Monte Carlo Transport Code

LLNL-PRES-7437373

§ Particlesinteractwithmatterbyavarietyof“reactions”

§ Theprobabilityofeachreactionanditsoutcomesarecapturedinexperimentallymeasured“crosssections”(Latencyboundtablelookups)

§ Followsmanyparticles(millionsormore)andusesrandomnumberstosampletheprobabilitydistributions(Verybranchy,divergentcode)

§ Particlescontributetodiagnostic“tallies”(Potentialdataraces)

§ Theresultisastatisticallycorrectrepresentationofthephysicalsystem

MercurysolvesparticletransportproblemsusingtheMonteCarloMethod

Absorption Scattering Fission

Page 4: Quicksilver:A Proxy App for the Monte Carlo Transport Code

LLNL-PRES-7437374

Nuclearcrosssectionsarenot“nice”functions

Scattering

Absorption

Fission

NuclearCrossSectionsfor235U

Page 5: Quicksilver:A Proxy App for the Monte Carlo Transport Code

LLNL-PRES-7437375

§ Threespecificuses:— AnimbleprototypecodefortestingdesignorrefactoringoptionsforMercury— Anopensourcevehicleforco-designwithoutsidepartners— AbenchmarkcodetoreplaceourpreviousMonteCarlobenchmarkcode

§ OverallgoalwastoapproximatetheoverallapplicationperformanceofMercury— Controlflowisdominatedbybranchingduetotherandomsamplingofreactions.— Memoryaccesspatternsassociatedwithreadingcrosssectiontablestendtobelatency-bound,

smallmemoryloadsthataredifficultorimpossibletocacheorcoalesce.— Domaindecompositionandinternodecommunicationtohandlelargeproblems.

§ MajordatastructuresintentionallysimilartoMercury

§ Flexibleinputstorepresentmultiplecommonusemodes

CreatingQuicksilverrequiredmodelingchoicesProxyappsaremodelsforoneormoreaspectsoftheirparents

Itisessentialtotoidentifythekeyfeaturesoftheparentapptheproxyisintendedtorepresentandincludefaithfulmodelsofthosefeatures

Page 6: Quicksilver:A Proxy App for the Monte Carlo Transport Code

LLNL-PRES-7437376

QuicksilveromitsmanyMercuryfeatures,butkeepsenoughtorepresentcriticalcomputationalpatterns

Mercury Quicksilver

CrossSections Continuous energy&multi-group Multi-group(syntheticdata)

Mesh/Geometry Multipletypes &solidgeometry 3Dpolyhedralonly

Reactions 10-40 3(usesreplication)

Reaction Physics Physicallybased Simplified(isotropic, etc.)

Tallies Many built-in&userdefined Balance&scalarflux

Sources&populationctrl Realistic withvariancereduction Simplified

Loadbalancing Sophisticated applicationspecific Trivial particle-countbased

Inputspecification Pythonscriptinginterface Flexible problemsetup(YAML)

MPI/OpenMP Yes/Yes Yes/Yes

Page 7: Quicksilver:A Proxy App for the Monte Carlo Transport Code

LLNL-PRES-7437377

IsQuicksilveragoodrepresentationofMercury?

�����

������

������

������

������

������

������

� �� �� �� �� �� ��

��������������

���� ����

��������������������

��������

Balancetalliescounteachkindofevent.Dataisfor100,000particles.

Numberofreactionsdropssharply

Facetcrossingsnearlyvanish

Particlesreachingcensusincreases

Bigchangesduringfirst10timesteps.Notthebehaviorweexpected

Page 8: Quicksilver:A Proxy App for the Monte Carlo Transport Code

LLNL-PRES-7437378

§ Sourcedparticleshaverandomenergydrawnfromuniformdistribution

§ Simplifiedsourcingrulescreate10%oftargetsimulationparticleseachstep

§ Populationcontrolsplitsorkillsparticlestoachievetargetnumber— Adjustsparticlestatisticalweight

§ Injustafewtimesteps,populationcontrolmagnifiesanyparticlesthatsurvivetocensus— Rare,lowenergyparticleslesslikelyto

beabsorbed

Theparticleenergyspectrumhasanunexplainedshift.Causedbyunintendedconsequencesofsimplifiedphysics&sourcing.

����

���

����

���

����

���

� �� ��� ��� ��� ���

������������������

������ �����

���� ����� � 20MeV

1keV

Energygroupsarelogarithmic.Particlesingroup135areabout

100xlowervelocitythangroup230

Page 9: Quicksilver:A Proxy App for the Monte Carlo Transport Code

LLNL-PRES-7437379

Deletinglow-weightparticlessolvestheproblem

����

���

����

���

����

���

� �� ��� ��� ��� ���

������������������

������ �����

���� ����� �

Thereisnosuchcapabilityintheparentapplication,butitmakesQuicksilverabettermodelforMercury

Page 10: Quicksilver:A Proxy App for the Monte Carlo Transport Code

LLNL-PRES-74373710

Thisis1000s(or10,000s)oflinesofcode

§ loopovercycles(timesteps)— cycle_init

• sourceinnewparticles• populationcontrol

— cycle_tracking• loopoverparticles

– untilcensus• finddistancetocensus(endoftimestep)• finddistancetomaterialboundary(meshfacet)• finddistancetocollision(reaction)• selectreactionandupdateparticle

— cycle_finalize

QuicksilverandMercuryarehostiletothetypicalGPUfine-grainedthreadingapproach

Majorityofcrosssectionlookupsareinhere

Page 11: Quicksilver:A Proxy App for the Monte Carlo Transport Code

LLNL-PRES-74373711

Thisis1000s(or10,000s)oflinesofcode

§ loopovercycles(timesteps)— cycle_init

• sourceinnewparticles• populationcontrol

— cycle_tracking• loopoverparticles

– untilcensus• finddistancetocensus(endoftimestep)• finddistancetomaterialboundary(meshfacet)• finddistancetocollision(reaction)• selectreactionandupdateparticle

— cycle_finalize

QuicksilverandMercuryarehostiletothetypicalGPUfine-grainedthreadingapproach

Majorityofcrosssectionlookupsareinhere

HowdoyouwritethiscodeforGPUs?

“Fat”threadingstrategy:• Eachthreadgetsitsown“vault”ofparticles

• Tallyandbufferdatastructuresarereplicatedtoavoidraces

• WorksgreatonCPUplatforms!

Page 12: Quicksilver:A Proxy App for the Monte Carlo Transport Code

LLNL-PRES-74373712

Makethisakernel!

Thisis1000s(or10,000s)oflinesofcode

§ loopovercycles(timesteps)— cycle_init

• sourceinnewparticles• populationcontrol

— cycle_tracking• loopoverparticles

– untilcensus• finddistancetocensus(endoftimestep)• finddistancetomaterialboundary(meshfacet)• finddistancetocollision(reaction)• selectreactionandupdateparticle

— cycle_finalize

QuicksilverandMercuryarehostiletothetypicalGPUfine-grainedthreadingapproach

Majorityofcrosssectionlookupsareinhere

Canthis“Big-Kernel”approachpossiblyperformwell?

“Fat”threadingstrategy:• Eachthreadgetsitsown“vault”ofparticles

• Tallyandbufferdatastructuresarereplicatedtoavoidraces

• WorksgreatonCPUplatforms!

Page 13: Quicksilver:A Proxy App for the Monte Carlo Transport Code

LLNL-PRES-74373713

Tobuildabigkernel,Mercury’sthreadingmodelmustchange

Fatthread

§ Dozensofactivethreads

§ Separatecollectionofparticlesforeachthread

§ Dataracesmanagedwithreplication

§ MPItightlyintegratedintrackingloop

§ WorkswellonCPUs

Thinthread

§ Thousandsofactivethreads

§ Allthreadsshareacommoncollectionofparticles

§ Dataracesmanagedwithatomics

§ NoMPIintrackingloop

§ WorksonGPUsandCPUs

Page 14: Quicksilver:A Proxy App for the Monte Carlo Transport Code

LLNL-PRES-74373714

§ QuicksilverwasmorecomplicatedthanwewantedtoportGPU— MPI,variableparticlecount,etc.

§ Quicksilver_lite isevenmoreapproximatethanQuicksilver— Zero-Dmesh,verysimplifiedphysics

§ Quicksilver_lite maintainsfeaturesmostlikelytoimpairGPUperformance— Randomtablelook-ups— Callstackdepthinnucleardatalook-ups— Branchycontrolflowanddivergence

Totestbig-kernelwewroteaproxyappforourproxyapp

Initialize ComputeP8CPU(10threads) 0.27sec 1.25sec

P8CPU(40threads) 0.45sec 0.72sec

P-100GPU 0.26sec 0.45sec

QS_lite runtimes(lowerisbetter)

QS_lite providedourfirstevidencethatthebig-kernelapproach

mightactuallywork

Page 15: Quicksilver:A Proxy App for the Monte Carlo Transport Code

LLNL-PRES-74373715

§ Bychangingprobleminputswecanadjust:— Fractionofparticlesthatreachcensus— Ratiooffacetcrossingstoreactions— Relativeprobabilitiesofdifferentreactiontypes

MercuryisemployedforaverywidevarietyofproblemsNosinglesampleproblemwillrepresentallusecases

Fat (CPU)seg/sec Thin (GPU)seg/secReactionDominated 8.52e+06 1.15e+07

Balanced 1.35e+07 7.50+e06

Facet Dominated 2.24e+07 2.90+e07

Higheris

better

GPUsandCPUsaresimilar,butwithdifferenceperformancesensitivities.GPUsareslowestinbalancedcase.Perhapsduetohighestdivergence?

Page 16: Quicksilver:A Proxy App for the Monte Carlo Transport Code

LLNL-PRES-74373716

§ Differentcodecapabilitiesmakeapples-to-applescomparisonimpossible

§ Mercurytestproblemisacriticalsphereinwater— TunedmeshsizeandtimesteptoobtainsametallyratiosasasimilarQuicksilverproblem

PerformancecomparisontoMercury

CPUPerformance Mercury (seg/sec) Quicksilver(seg/sec)ReactionDominated 1.62e+06 8.52e+06

Balanced 2.66e+06 1.37e+07

FacetDominated 2.97e+06 2.24e+07

Quicksilverisroughly10xfasterthanMercuryonCPUs,butcapturessametrends.PerformancedifferencelikelysomewhatduetoMercury’sbetterphysics.

Higheris

better

Page 17: Quicksilver:A Proxy App for the Monte Carlo Transport Code

LLNL-PRES-74373717

Willbigkernelwork?

Inspiteofadversealgorithmiccharacteristics,wearehopefulthatMercurywillperformequallywellonGPUsasCPUs.Apotential3-5xspeedupcomparedtoCPUsonly

0

10

20

30

40

50

Fat Thread Power 8

Thin Thread Power 8

Fat Thread KNL

Thin Thread KNL

Thin Thread P100

Cyc

le T

rack

ing

Tim

e [s

econ

ds] 1 Node 2 Nodes 4 Nodes

Cycletrackingtimeforareaction

dominatedproblem(lowerisbetter)

Page 18: Quicksilver:A Proxy App for the Monte Carlo Transport Code

LLNL-PRES-74373718

§ WeusedQuicksilvertotestdesignstrategiesforMonteCarloTransportonGPUs— Sofar,abigkernelapproachappearstobeviable

§ EffortisshiftingfromprototypingwithQuicksilvertorefactoringMercury.— Designroleismostlydone

§ ModificationsareplannedtomakeQuicksilvermorerepresentativeforphotons— ThiswillmakeQuicksilveramoreflexibleprocurementbenchmark— MoreonproxiesvsbenchmarksinThursdaykeynotetalk

Conclusionandfuturework

Page 19: Quicksilver:A Proxy App for the Monte Carlo Transport Code