
Dissecting On-node Memory Performance with MemAxes

Petascale Tools Workshop 2014
Madison, WI, August 4-7, 2014

Alfredo Gimenez*, Todd Gamblin†, Martin Schulz†, Peer-Timo Bremer†, Barry Rountree†, Abhinav Bhatele†, Ilir Jusufi*, and Bernd Hamann*
† LLNL, * UC Davis

LLNL-PRES-657922
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC



Page 2

Memory Access Sampling

• Recent hardware additions allow us to precisely sample events, including memory accesses
  – Intel PEBS, AMD IBS
• Memory access samples contain:
  – The instruction pointer
  – The address accessed
  – How many core clock cycles elapsed during the access
  – Where in the memory hierarchy the address was resolved (e.g. L1 cache, local RAM, remote RAM)
• We need a way to meaningfully interpret these samples
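As a rough illustration of the sample contents listed above, the sketch below models one sample as a record and sums access cycles per memory-hierarchy level. The `MemSample` type, its field names, and the sample values are assumptions made for this example; they are not the PEBS or IBS record layout.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MemSample:
    """One sampled memory access (hypothetical field names)."""
    ip: int        # instruction pointer
    address: int   # address accessed
    cycles: int    # core clock cycles elapsed during the access
    source: str    # where the access was resolved, e.g. "L1", "Local RAM"


def latency_by_source(samples):
    """Sum access cycles per memory-hierarchy level."""
    totals = defaultdict(int)
    for s in samples:
        totals[s.source] += s.cycles
    return dict(totals)


# Made-up samples, for illustration only.
samples = [
    MemSample(ip=0x400A10, address=0x7F0010, cycles=4, source="L1"),
    MemSample(ip=0x400A10, address=0x7F2040, cycles=180, source="Local RAM"),
    MemSample(ip=0x400B24, address=0x7F9000, cycles=310, source="Remote RAM"),
    MemSample(ip=0x400B24, address=0x7F0018, cycles=4, source="L1"),
]
totals = latency_by_source(samples)
```

Grouping by `ip` or by address range instead of `source` follows the same pattern.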

Page 3

Adding Context

• Can better understand memory references with appropriate context
• Contexts include:
  – The code
  – The node hardware topology
  – Calling context (call path)
  – The application (e.g. fluid dynamics)
• Other work by Liu & Mellor-Crummey has looked at mapping latency & access patterns to particular variables, call paths, and access patterns.

Slide annotations: "Can get these from tools"; "Need help from app"

Page 4

We can already get coarse-grained application context for some codes

• Physics data is available in data structures
• Time steps are easy to mark in the code
• Per-process performance
  – Easy to get
  – Just turn on counters at the beginning of the run
  – Read them periodically
• What if we want finer-grained attribution?
  – How to tie measurements to data structures?
  – How to slice and dice the data?

[Figure labels: Aluminum; FLOP/s per MPI process]

Page 5

Node topology is easy to get, but not shown clearly.

• PEBS provides metadata for node topology
• Want to highlight connections clearly to show:
  – Load distribution
  – Bandwidth
  – Resource contention
• Existing visualization from hwloc (right)
  – Does not scale
  – Clutters connections between components

Page 6

We have developed a measurement tool for collecting detailed context

• Use PEBS sampling for hardware information
• Supplement with application instrumentation for mapping addresses to physical coordinates

* SMT (Semantic Memory Tree): data structure used to map callbacks sampled instruction operands

Page 7

Currently the developer has to instrument the application manually

• Add calls to get metadata for allocated objects:
  1. Label string
  2. Start and end addresses
  3. Size of each element
  4. Number of elements
  5. Callback to map address to physical coordinates
• Metadata must be provided by the programmer
  – Could easily be implemented in libraries
  – Lots of common mesh libraries would be interesting for this.
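The five metadata items above can be pictured as a single record per allocated object. The sketch below is only an illustration: `DataObject`, `make_grid_object`, and the row-major 3D layout are hypothetical names and assumptions, not the instrumentation API of the tool described in these slides.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class DataObject:
    """Metadata for one allocated object (hypothetical record)."""
    label: str       # 1. label string
    start: int       # 2. start address (inclusive)
    end: int         # 2. end address (exclusive, an assumption)
    elem_size: int   # 3. size of each element, in bytes
    num_elems: int   # 4. number of elements
    to_coords: Callable[[int], Tuple[int, int, int]]  # 5. address -> coordinates


def make_grid_object(label, start, elem_size, nx, ny, nz):
    """Describe an nx*ny*nz array stored with x varying fastest (assumed layout)."""
    def to_coords(address):
        index = (address - start) // elem_size
        return (index % nx, (index // nx) % ny, index // (nx * ny))

    n = nx * ny * nz
    return DataObject(label, start, start + n * elem_size, elem_size, n, to_coords)


# Made-up base address and grid size, for illustration only.
pressure = make_grid_object("Pressure", start=0x1000, elem_size=8, nx=4, ny=3, nz=2)
coords = pressure.to_coords(0x1000 + 5 * 8)  # element index 5
```

A mesh library could build such records at allocation time, which is the kind of library-level support the slide suggests.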

Page 8

Instrumentation

Specify DataObjects

Add additional semantic attributes and define attribution function (optional)

Page 9

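Semantic Memory Tree

[Figure: binary search tree; panel labels: Address Ranges, Semantic Memory Ranges]

• Address ranges, root: 0x0F to 0xF6
• Address ranges, second level: 0x0F to 0x80; 0xA2 to 0xF6
• Address ranges, third level: 0x0F to 0x20; 0x40 to 0x80; 0xA2 to 0xC2; 0xE0 to 0xF6
• Semantic memory ranges: Velocity, Pressure, Temp, Density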
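To make the lookup idea concrete, here is a minimal sketch that resolves an address to a label. It uses binary search over a sorted list of ranges rather than an explicit tree, so it stands in for the Semantic Memory Tree and is not its implementation. Pairing the four labels with the four narrowest address ranges in left-to-right order, and treating both range bounds as inclusive, are assumptions.

```python
from bisect import bisect_right

# (start, end, label), sorted by start; pairing and inclusive bounds are assumed.
RANGES = [
    (0x0F, 0x20, "Velocity"),
    (0x40, 0x80, "Pressure"),
    (0xA2, 0xC2, "Temp"),
    (0xE0, 0xF6, "Density"),
]
STARTS = [start for start, _, _ in RANGES]


def lookup(address):
    """Return the label whose range contains address, or None if there is none."""
    # Index of the last range starting at or before the address.
    i = bisect_right(STARTS, address) - 1
    if i < 0:
        return None
    _, end, label = RANGES[i]
    return label if address <= end else None
```

For example, `lookup(0x50)` falls in the second range, while `lookup(0x30)` lies in the gap between the first two ranges and returns `None`.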

Page 10

Lagrangian Hydrodynamics: LULESH

[Figures: 2D; 3D; 3D with mapped performance data]

Page 11

We have developed MemAxes, a tool for analyzing on-node memory performance

• Measurement component samples memory instructions
• We map latency information onto A) source code, B) node topology
• C) Pie chart shows percent of total latency selected
• D) Parallel coordinates view allows exploration of correlations
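The "percent of total latency selected" quantity can be sketched as a simple ratio. The dictionary keys (`line`, `core`, `cycles`), the sample values, and the `latency_share` helper below are hypothetical; this is not how MemAxes stores or computes its data.

```python
def latency_share(samples, selected):
    """Percent of total latency contributed by samples matching `selected`."""
    total = sum(s["cycles"] for s in samples)
    if total == 0:
        return 0.0
    chosen = sum(s["cycles"] for s in samples if selected(s))
    return 100.0 * chosen / total


# Made-up samples: source line, core id, access cycles.
samples = [
    {"line": 120, "core": 0, "cycles": 50},
    {"line": 120, "core": 3, "cycles": 250},
    {"line": 245, "core": 1, "cycles": 100},
]
share = latency_share(samples, lambda s: s["line"] == 120)
```

The same predicate could select by core, address range, or data object, which is how a selection in one view could drive the others.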

Page 12

Linked views clearly show on-node locality problems

PIPER

• Parallel coordinates view shows correlation between array index and core id in LULESH

• Linked node topology view shows data motion for highlighted memory operations

• A contiguous chunk of an array is initially split between threads on four cores

• Using an optimized affinity scheme, we improve locality

• Performance improved by 10%

Default thread affinity with poor locality

Optimized thread affinity with good locality
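One way to check the index-to-core correlation numerically is to count how many distinct cores touch each contiguous chunk of an array. This is an illustrative sketch, not the analysis MemAxes performs; the `cores_per_chunk` helper, the chunk size, and the (array index, core id) pairs are made up.

```python
from collections import defaultdict


def cores_per_chunk(samples, chunk_size):
    """Map each contiguous index chunk to the set of cores that touched it."""
    touched = defaultdict(set)
    for index, core in samples:
        touched[index // chunk_size].add(core)
    return dict(touched)


# (array index, core id) pairs; values are invented for illustration.
scattered = [(0, 0), (1, 4), (2, 8), (3, 12), (4, 0), (5, 4), (6, 8), (7, 12)]
packed = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 4), (5, 4), (6, 4), (7, 4)]

poor = cores_per_chunk(scattered, chunk_size=4)
good = cores_per_chunk(packed, chunk_size=4)
```

In the scattered case every chunk is shared by four cores, while in the packed case each chunk is touched by a single core.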

Page 13

Hyperion Thread/Core Binding

• Improved cache usage
• 44% fewer access cycles
• 10% total speedup

Page 14

Future work

• Back-port perf_events API to production TOSS 2 kernel
  – Currently unable to do fine-grained memory sampling on production machines due to PMU access limits
  – Affects some Intel thread tools as well
• More detailed architecture mapping
  – Sandy Bridge LLC ring interconnect information?
  – Other node architecture features?
• Instrument AMR libraries for proper context attribution
  – Study per-patch memory behavior
  – Study blocking behavior of solvers
• How to query large instruction traces effectively?
