LLNL-PRES-747560
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
Spack: A Package Manager for HPC
2019 HPC-AI Advisory Council Stanford Conference
Todd Gamblin
Computer Scientist
February 14, 2019
Stanford University
@spackpm
github.com/spack
Scientific software is becoming extremely complex
R Miner: R Data Mining Library
dealii: C++ Finite Element Library
Nalu: Generalized Unstructured Massively Parallel Low Mach Flow
Even proprietary codes are based on many open source libraries
▪ Half of this DAG is external (blue); more than half of it is open source
▪ Nearly all of it needs to be built specially for HPC to get the best performance
The Exascale Computing Project is building an entire ecosystem
▪ Every application has its own stack of dependencies.
▪ Developers, users, and facilities dedicate (many) FTEs to building & porting.
▪ Often trade reuse and usability for performance.
15+ applications
x 80+ software packages
x 5+ target architectures/platforms (Xeon, Power, KNL, NVIDIA, ARM, Laptops?)
x Up to 7 compilers (Intel, GCC, Clang, XL, PGI, Cray, NAG)
x 10+ Programming Models (OpenMPI, MPICH, MVAPICH, OpenMP, CUDA, OpenACC, Dharma, Legion, RAJA, Kokkos)
x 2-3 versions of each package + external dependencies
= up to 1,260,000 combinations!
We must make it easier to rely on others’ software!
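The combination count above can be checked by multiplying the listed factors. This is a minimal sketch that assumes the lower bound of each "N+" entry and 3 versions per package (the upper end of "2-3"); the dictionary keys are labels chosen here for illustration.

```python
from math import prod

# Factors as listed on the slide. "N+" entries use their lower bound, and
# "2-3 versions" uses 3 (an assumption that reproduces the slide's total).
factors = {
    "applications": 15,
    "software packages": 80,
    "target architectures/platforms": 5,
    "compilers": 7,
    "programming models": 10,
    "versions per package": 3,
}

combinations = prod(factors.values())
print(f"{combinations:,}")  # 1,260,000
```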
How to install software on a Mac laptop, circa 2013
How to install software on a supercomputer
1. Download all 16 tarballs you need
2. Start building!
   configure, make, Fight with compiler..., make, Tweak configure args..., make install, make, configure, configure, make, make install, cmake, make, make install
3. Run code
4. Segfault!?
5. Start over…
What about modules?
▪ Most supercomputers deploy some form of environment modules
  — TCL modules (dates back to 1995) and Lmod (from TACC) are the most popular
▪ Modules don’t handle installation!
  — They only modify your environment (things like PATH, LD_LIBRARY_PATH, etc.)
▪ Someone (likely a team of people) has already installed gcc for you!
  — Also, you can only `module load` the things they’ve installed
$ gcc
-bash: gcc: command not found
$ module load gcc/7.0.1
$ gcc -dumpversion
7.0.1
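To make the "modules only modify your environment" point concrete, here is a minimal sketch of what a `module load` amounts to: prepending directories to search-path variables. The function name `module_load` and the install prefix are hypothetical, and real module systems do considerably more (unloading, conflicts, hierarchies).

```python
def module_load(env, prefix):
    """Return a copy of env with prefix/bin and prefix/lib prepended.

    Nothing is installed here: the software must already exist under prefix.
    """
    new_env = dict(env)
    for var, subdir in (("PATH", "bin"), ("LD_LIBRARY_PATH", "lib")):
        entry = f"{prefix}/{subdir}"
        old = new_env.get(var, "")
        new_env[var] = f"{entry}:{old}" if old else entry
    return new_env


# Hypothetical prefix; a real site chooses its own layout.
before = {"PATH": "/usr/bin"}
after = module_load(before, "/opt/site/gcc-7.0.1")
print(after["PATH"])
print(after["LD_LIBRARY_PATH"])
```

The original environment is left untouched, which mirrors how unloading a module can restore the previous state.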
What about containers?
▪ Containers provide a great way to reproduce and distribute an already-built software stack
▪ Someone needs to build the container!
  — This isn’t trivial
  — Containerized applications still have hundreds of dependencies
▪ Using the OS package manager inside a container is insufficient
  — Most binaries are built unoptimized
  — Generic binaries, not optimized for specific architectures
▪ Developing with an OS software stack can be painful
  — Little freedom to choose versions
  — Little freedom to choose compiler options, build options, etc. for packages
We need something more flexible to build the containers
Spack is a flexible package manager for HPC
▪ How to install Spack (works out of the box):
    $ git clone https://github.com/spack/spack
    $ . spack/share/spack/setup-env.sh
▪ How to install a package:
    $ spack install hdf5
▪ HDF5 and its dependencies are installed within the Spack directory.
▪ Unlike typical package managers, Spack can also install many variants of the same build.
  — Different compilers
  — Different MPI implementations
  — Different build options
@spackpm
github.com/spack/spack
Visit spack.io
Spack provides the spec syntax to describe custom configurations
▪ Each expression is a spec for a particular configuration
  — Each clause adds a constraint to the spec
  — Constraints are optional – specify only what you need.
  — Customize install on the command line!
▪ Spec syntax is recursive
  — Full control over the combinatorial build space
$ spack install mpileaks    # unconstrained
$ spack install [email protected]    # @ custom version
$ spack install [email protected] %[email protected]    # % custom compiler
$ spack install [email protected] %[email protected] +threads    # +/- build option
$ spack install [email protected] cxxflags="-O3 -g3"    # setting compiler flags
$ spack install [email protected] os=cnl10 target=haswell    # setting target for X-compile
$ spack install [email protected] ^[email protected] %[email protected]    # ^ dependency information
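The clause structure of a spec can be illustrated with a small parser. This is only a sketch of the sigils listed above (`@`, `%`, `+`/`-`, `key=value`, `^`), not Spack's actual parser: it assumes clauses are separated by whitespace, and the version numbers in the example string are placeholders chosen for illustration.

```python
import shlex


def parse_node(tokens):
    """Parse the clauses that describe one package (root or dependency)."""
    node = {"name": None, "version": None, "compiler": None,
            "variants": {}, "settings": {}}
    for tok in tokens:
        if tok.startswith("%"):                 # % custom compiler
            node["compiler"] = tok[1:]
        elif tok[0] in "+-~":                   # +/- build option
            node["variants"][tok[1:]] = tok[0] == "+"
        elif "=" in tok:                        # key=value (flags, os, target)
            key, value = tok.split("=", 1)
            node["settings"][key] = value
        else:                                   # name, optionally name@version
            name, _, version = tok.partition("@")
            node["name"], node["version"] = name, version or None
    return node


def parse_spec(text):
    """Split a spec on ^ clauses; each ^ starts a dependency's own sub-spec."""
    groups = [[]]
    for tok in shlex.split(text):               # shlex keeps quoted flags together
        if tok.startswith("^"):
            groups.append([tok[1:]])
        else:
            groups[-1].append(tok)
    root = parse_node(groups[0])
    root["dependencies"] = [parse_node(group) for group in groups[1:]]
    return root


# Placeholder versions, for illustration only.
spec = parse_spec('mpileaks@1.0 %gcc@9.0 +threads cxxflags="-O3 -g3" ^mpich@3.0 %gcc@9.0')
print(spec["name"], spec["version"], spec["compiler"], spec["variants"])
print(spec["dependencies"][0]["name"], spec["dependencies"][0]["compiler"])
```

Because each `^` clause is parsed with the same routine as the root, the dependency can carry its own version and compiler constraints, which is the recursive property the slide refers to.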
`spack list` shows what packages are available
▪ Spack has over 3,000 builtin package recipes.
$ spack list
==> 3041 packages.
abinit  glew  nalu  py-fastaindex  r-cairo  r-viridislite
abyss  glfmultiples  nalu-wind  py-fasteners  r-callr  r-visnetwork
accfft  glib  namd  py-faststructure  r-car  r-vsn
ack  glibmm  nano  py-filelock  r-caret  r-webshot
activeharmony  glimmer  nanoflann  py-fiona  r-category  r-whisker
adept-utils  glm  nanopb  py-fiscalyear  r-catools  r-withr
adios  global  nasm  py-flake8  r-cdcfluview  r-xde
adios2  globalarrays  nauty  py-flake8-polyfill  r-cellranger  r-xgboost
adlbx  globus-toolkit  ncbi-magicblast  py-flask  r-checkmate  r-xlconnect
adol-c  glog  ncbi-rmblastn  py-flask-compress  r-checkpoint  r-xlconnectjars
aegean  gloo  ncbi-toolkit  py-flask-socketio  r-chemometrics  r-xlsx
aida  glpk  nccl  py-flexx  r-chron  r-xlsxjars
albany  glproto  nccmp  py-fn  r-circlize  r-xmapbridge
albert  glvis  ncdu  py-fparser  r-class  r-xml
alglib  gmake  ncftp  py-funcsigs  r-classint  r-xml2
allinea-forge  gmap-gsnap  ncl  py-functools32  r-cli  r-xnomial
allinea-reports  gmime  nco  py-future  r-clipr  r-xtable
allpaths-lg  gmodel  ncurses  py-futures  r-cluster  r-xts
alquimia  gmp  ncview  py-fypp  r-clustergeneration  r-xvector
alsa-lib  gmsh  ndiff  py-gdbgui  r-clusterprofiler  r-yaml
aluminum  gmt  nek5000  py-genders  r-cner  r-yapsa
amg  gnat  nekbone  py-genshi  r-coda  r-yaqcaffy
amg2013  gnu-prolog  nekcem  py-geopandas  r-codetools  r-yarn
amp  gnupg  nektar  py-gevent  r-coin  r-zlibbioc
ampliconnoise  gnuplot  neovim  py-git-review  r-colorspace  r-zoo
amrex  gnutls  nest  py-git2  r-combinat  r3d
amrvis  go  netcdf  py-gnuplot  r-complexheatmap  racon
andi  go-bootstrap  netcdf-cxx  py-goatools  r-compositions  raft
angsd  gobject-introspection  netcdf-cxx4  py-gpaw  r-convevol  ragel
ant  googletest  netcdf-fortran  py-greenlet  r-corhmm  raja
antlr  gotcha  netgauge  py-griddataformats  r-corpcor  randfold
ants  gource  netgen  py-guidata  r-corrplot  random123
ape  gperf  netlib-lapack  py-guiqwt  r-covr  randrproto
. . .
`spack find` shows what is installed
▪ All the versions coexist!
  — Multiple versions of the same package are ok.
▪ Packages are installed to automatically find correct dependencies.
▪ Binaries work regardless of user’s environment.
▪ Spack also generates module files.
  — Don’t have to use them.
$ spack find
==> 103 installed packages.
-- linux-rhel7-x86_64 / [email protected] [email protected] [email protected] [email protected] [email protected] [email protected]@3.9.1 [email protected] [email protected] [email protected] [email protected]@1.0 [email protected] [email protected] [email protected] [email protected]@2.14.0 [email protected] [email protected] [email protected] [email protected]@1.55.0 [email protected] [email protected] [email protected] [email protected]@1.14.0 [email protected] [email protected] [email protected] [email protected]@1.0.2 jpeg@9a [email protected] [email protected] [email protected]@8.1.2 libdwarf@20130729 [email protected] [email protected] [email protected]@8.1.2 [email protected] ocr@2015-02-16 [email protected] [email protected]@2.11.1 [email protected] [email protected] [email protected]@2.5.3 [email protected] [email protected] [email protected]@2.31.2 [email protected] [email protected] [email protected]
-- linux-rhel7-x86_64 / [email protected] [email protected] [email protected] [email protected] libdwarf@20130729 [email protected]@1.0.1 [email protected] [email protected] [email protected] [email protected]
-- linux-rhel7-x86_64 / [email protected] [email protected] [email protected] [email protected]
-- linux-rhel7-x86_64 / [email protected] [email protected] [email protected] libdwarf@20130729 [email protected] [email protected]
-- linux-rhel7-x86_64 / [email protected] [email protected] [email protected] libdwarf@20130729 [email protected]@1.55.0 [email protected] [email protected] [email protected]
Users can query the full dependency configuration of installed packages.
▪ Architecture, compiler, versions, and variants may differ between builds.
$ spack find callpath
==> 2 installed packages.
-- linux-rhel7-x86_64 / [email protected] -------- -- linux-rhel7-x86_64 / [email protected] [email protected] [email protected]
Expand dependencies with spack find -d
$ spack find -dl callpath
==> 2 installed packages.
-- linux-rhel7-x86_64 / [email protected] ----------- -- linux-rhel7-x86_64 / [email protected] -----------
xv2clz2 [email protected] udltshs [email protected] ^[email protected] rfsu7fb ^[email protected] ^[email protected] ybet64y ^[email protected] ^[email protected] aa4ar6i ^[email protected] ^[email protected] tmnnge5 ^[email protected] ^[email protected] ybet64y ^[email protected] ^libdwarf@20130729 g2mxrl2 ^libdwarf@20130729cj5p5fk ^[email protected] ynpai3j ^[email protected] ^[email protected] ynpai3j ^[email protected] ^libdwarf@20130729 g2mxrl2 ^libdwarf@20130729cj5p5fk ^[email protected] ynpai3j ^[email protected] ^[email protected] ynpai3j ^[email protected] ^[email protected] aa4ar6i ^[email protected]
Spack packages are templates
They use a simple Python DSL to define how to build a spec

from spack import *

class Dyninst(Package):
    """API for dynamic binary instrumentation."""

    # Metadata at the class level
    homepage = "https://paradyn.org"
    url = "http://www.paradyn.org/release8.1.2/DyninstAPI-8.1.2.tgz"

    # Versions
    version('8.2.1', 'abf60b7faabe7a2e')
    version('8.1.2', 'bf03b33375afa66f')
    version('8.1.1', 'd1a04e995b7aa709')

    # Dependencies (note: they use the same spec syntax)
    depends_on("cmake", type="build")
    depends_on("libelf", type="link")
    depends_on("libdwarf", type="link")
    depends_on("boost @1.42: +multithreaded")

    # Patches, variants, resources, conflicts, etc. (not shown)

    # Install logic in instance methods
    def install(self, spec, prefix):
        with working_dir('spack-build', create=True):
            cmake('-DBoost_INCLUDE_DIR=' + spec['boost'].prefix.include,
                  '-DBoost_LIBRARY_DIR=' + spec['boost'].prefix.lib,
                  '-DBoost_NO_SYSTEM_PATHS=TRUE',
                  '..')
            make()
            make("install")
Spack handles combinatorial software complexity.
▪ Each unique dependency graph is a unique configuration.
▪ Each configuration is installed in a unique directory.
  — Configurations of the same package can coexist.
▪ Hash of entire directed acyclic graph (DAG) is appended to each prefix.
▪ Installed packages automatically find dependencies
  — Spack embeds RPATHs in binaries.
  — No need to use modules or set LD_LIBRARY_PATH
  — Things work the way you built them
spack/opt/
    linux-x86_64/
        gcc-4.7.2/
            mpileaks-1.1-0f54bf34cadk/
        intel-14.1/
            hdf5-1.8.15-lkf14aq3nqiz/
    bgq/
        xl-12.1/
            hdf5-1-8.16-fqb3a15abrwx/
    ...
Installation Layout
Dependency DAG
Hash
No limit on the number of versions you can have installed.
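The idea of appending a hash of the whole dependency DAG to each install prefix can be sketched as follows. This is an illustration only: Spack's real hashing and encoding differ, the spec dictionaries and version numbers are hypothetical, and `dag_hash` and `install_prefix` are names chosen here.

```python
import hashlib
import json


def dag_hash(spec, length=7):
    """Hash a spec dict together with all of its (nested) dependencies."""
    canonical = json.dumps(spec, sort_keys=True)  # stable serialization
    return hashlib.sha256(canonical.encode()).hexdigest()[:length]


def install_prefix(root, arch, spec):
    """Build <root>/<arch>/<compiler>/<name>-<version>-<hash>."""
    return (f"{root}/{arch}/{spec['compiler']}/"
            f"{spec['name']}-{spec['version']}-{dag_hash(spec)}")


# Two hypothetical configurations that differ only in one dependency.
spec_a = {"name": "mpileaks", "version": "1.1", "compiler": "gcc-4.7.2",
          "dependencies": [{"name": "mpich", "version": "3.0"}]}
spec_b = {"name": "mpileaks", "version": "1.1", "compiler": "gcc-4.7.2",
          "dependencies": [{"name": "openmpi", "version": "2.0"}]}

prefix_a = install_prefix("spack/opt", "linux-x86_64", spec_a)
prefix_b = install_prefix("spack/opt", "linux-x86_64", spec_b)
print(prefix_a)
print(prefix_b)
```

Since the dependency list feeds into the hash, the two builds land in different directories even though package name, version, and compiler are identical, so both can be installed side by side.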
Depend on interfaces (not implementations) with virtual dependencies
▪ mpi is a virtual dependency
▪ Install the same package built with two different MPI implementations:
    $ spack install mpileaks ^mvapich
    $ spack install mpileaks ^[email protected]:
▪ Virtual deps are replaced with a valid implementation at resolution time.
  — If the user didn’t pick something and there are multiple options, Spack picks.
Virtual dependencies can be versioned:

    class Mpileaks(Package):    # dependent
        depends_on("mpi@2:")

    class Mvapich(Package):     # provider
        provides("mpi@1", when="@:1.8")
        provides("mpi@2", when="@1.9:")

    class Openmpi(Package):     # provider
        provides("mpi@:2.2", when="@1.6.5:")
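How a versioned virtual dependency might be matched against providers can be sketched with a small lookup table mirroring the `provides(...)` declarations above. This is a simplified illustration, not Spack's resolver: version ranges are treated as inclusive with prefix matching on the upper bound, and `PROVIDERS`, `in_range`, and `provides_at_least` are hypothetical names.

```python
def parse_version(text):
    """Turn '1.6.5' into (1, 6, 5) for ordering."""
    return tuple(int(part) for part in text.split("."))


def in_range(version, low=None, high=None):
    """Inclusive range check; None means unbounded on that side."""
    v = parse_version(version)
    if low is not None and v < parse_version(low):
        return False
    if high is not None:
        hi = parse_version(high)
        if v[:len(hi)] > hi:  # prefix match: 1.8.3 still counts as within :1.8
            return False
    return True


# (provider, package version range, mpi version range provided)
PROVIDERS = [
    ("mvapich", (None, "1.8"), ("1", "1")),
    ("mvapich", ("1.9", None), ("2", "2")),
    ("openmpi", ("1.6.5", None), (None, "2.2")),
]


def provides_at_least(provider, pkg_version, min_mpi):
    """True if provider@pkg_version provides some mpi version >= min_mpi."""
    need = parse_version(min_mpi)
    for name, (low, high), (_, mpi_high) in PROVIDERS:
        if name != provider or not in_range(pkg_version, low, high):
            continue
        if mpi_high is None:
            return True
        top = parse_version(mpi_high)
        n = min(len(top), len(need))
        if top[:n] >= need[:n]:
            return True
    return False


# Which candidates can satisfy depends_on("mpi@2:")?
for candidate in [("mvapich", "1.8"), ("mvapich", "1.9"), ("openmpi", "1.6.5")]:
    print(candidate, provides_at_least(*candidate, "2"))
```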
Concretization fills in missing parts of requested specs.
▪ Workflow:
  1. Users input only an abstract spec with some constraints
  2. Spack makes choices according to policies (site/user/etc.)
  3. Spack installs concrete configurations of package + dependencies
▪ Dependency resolution is an NP-complete problem!
  — Different versions/configurations of packages require different versions/configurations of dependencies
  — Concretizer searches for a configuration that satisfies all the requirements
  — This is basically a SAT/SMT solve
mpileaks ^[email protected]+debug ^[email protected]
Concretize
Concrete spec is fully constrained and can be passed to install.
Dependency Resolution is an NP-hard problem!
▪ Different versions of packages require different versions of dependencies
  — Concretizer searches for a configuration that satisfies all the requirements
  — Can show that SAT/SMT solve is an equivalent problem
▪ Resolution is NP-complete for *just* package and version metadata
  — Concretization also includes compilers, variants, architecture, optional dependencies, virtual dependencies
  — We have some leeway because multiple stacks can coexist within Spack (unlike system PMs)
  — Even within one DAG there can be issues!
https://research.swtch.com/version-sat
Unsatisfiable!
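The search the concretizer performs can be pictured with a brute-force sketch over a tiny, hypothetical package universe. This is not Spack's algorithm: it simply tries every combination of versions, which is exponential in the number of packages and is exactly why real resolvers resort to SAT/SMT-style techniques. The package names, versions, and the `concretize` helper are made up for illustration.

```python
from itertools import product

# Hypothetical universe: versions available for each package.
VERSIONS = {"app": ["1.0", "2.0"], "libA": ["1.0", "2.0"], "libB": ["1.0"]}

# (package, version) -> {dependency: set of versions it accepts}
REQUIRES = {
    ("app", "1.0"): {"libA": {"1.0"}, "libB": {"1.0"}},
    ("app", "2.0"): {"libA": {"2.0"}, "libB": {"1.0"}},
    ("libB", "1.0"): {"libA": {"1.0"}},
}


def concretize(constraints):
    """Return the first version assignment satisfying every requirement, or None.

    constraints maps a package name to the set of versions the user allows.
    Only one version of each package may appear in the result (one DAG).
    """
    names = sorted(VERSIONS)
    for choice in product(*(VERSIONS[name] for name in names)):
        assignment = dict(zip(names, choice))
        if any(assignment[pkg] not in allowed
               for pkg, allowed in constraints.items()):
            continue
        if all(assignment[dep] in allowed
               for (pkg, ver), deps in REQUIRES.items()
               if assignment[pkg] == ver
               for dep, allowed in deps.items()):
            return assignment
    return None


print(concretize({}))                # a consistent assignment exists
print(concretize({"app": {"2.0"}}))  # None: app 2.0 and libB disagree on libA
```

The second call fails because `app` 2.0 needs `libA` 2.0 while `libB` 1.0 needs `libA` 1.0, and a single DAG can hold only one `libA`.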
Spack is used worldwide!
Over 350 contributors from labs, academia, industry
Over 3,000 software packages
Over 150,000 downloads in the past year
Over 1,100 monthly active users (on docs site)
Plot shows sessions on spack.readthedocs.io for one month
Spack has a very active open source community
▪ Started Spack development in 2013
  — Paper at SC15
  — Tutorials at SC16, SC17, SC18
  — GitHub community has grown steadily!
▪ 232 pull requests merged in lead-up to SC18!
  — By 74 contributors
  — We’ve been gradually increasing core contributors
Spack has benefitted tremendously from external contributions
▪ We try to make it easy to modify a package
  — spack edit <package>
  — Pull request
▪ Contributors are HPC software developers as well as user support teams and admins
▪ We get contributions in the core as well as in packages
▪ LLNL still has a majority of the core contributions, with significant help from others.
Spack is being used on many of the top HPC systems
▪ At HPC sites for software stack + modules
  — Reduced Summit deploy time from 2 weeks to 12 hrs.
  — EPFL deploys its software stack with Jenkins + Spack
  — NERSC, LLNL, ANL, other US DOE sites
  — SJTU in China
▪ Within ECP as part of their software release process
  — ECP-wide software distribution
  — SDK workflows
▪ Within High Energy Physics (HEP) community
  — HEP (Fermi, CERN) have contributed many features to support their workflow
▪ Many others
Summit (ORNL)
Sierra (LLNL)
Cori (NERSC)
SuperMUC-NG (LRZ)
Spack v0.12.1 was just released
▪ New stuff:
  1. Spack environments (covered today)
  2. spack.yaml and spack.lock files for tracking dependencies (covered today)
  3. Custom configurations via command line (covered today)
  4. Better support for linking Python packages into view directories (pip in views)
  5. Support for uploading build logs to CDash
  6. Packages have more control over compiler flags via flag handlers
  7. Better support for module file generation
  8. Better support for Intel compilers, Intel MPI, etc.
  9. Many performance improvements, improved startup time
▪ Spack is now permissively licensed under Apache-2.0 or MIT
  — previously LGPL
▪ Over 2,900 packages (800 added since last year)
  — This is from November; over 3,000 in latest develop branch
Spack has added environments and spack.yaml / spack.lock
▪ Allows developers to bundle Spack configuration with their repository
▪ Can also be used to maintain configuration together with Spack packages.
  — E.g., versioning your own local software stack with consistent compilers/MPI implementations
▪ Manifest / Lockfile model pioneered by Bundler is becoming standard
  — spack.yaml describes project requirements
  — spack.lock describes exactly what versions/configurations were installed, allows them to be reproduced.
Simple spack.yaml file
install build project
spack.yaml file with names of required dependencies
Lockfile describes exact versions installed
Dependency packages
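The manifest / lockfile split described on this slide can be sketched in a few lines. This is a toy model, not the actual spack.yaml or spack.lock schema: the manifest lists abstract requirements, a resolver (here a hard-coded, hypothetical table with placeholder versions) pins each one, and the pinned result is serialized so the same configuration can be restored later.

```python
import json

# Hypothetical manifest: abstract requirements only, no exact versions.
manifest = {"specs": ["hdf5", "mpich"]}

# Hypothetical resolver output; a real tool would compute this by
# concretizing each requirement. Versions are placeholders.
RESOLVED = {
    "hdf5": {"version": "1.0", "compiler": "gcc", "dependencies": ["mpich"]},
    "mpich": {"version": "1.0", "compiler": "gcc", "dependencies": []},
}


def make_lock(manifest, resolved):
    """Pin every requirement in the manifest to its resolved configuration."""
    return {"concrete": {name: resolved[name] for name in manifest["specs"]}}


# Write the lock as text, then read it back as a later install would.
lock_text = json.dumps(make_lock(manifest, RESOLVED), indent=2, sort_keys=True)
restored = json.loads(lock_text)
print(lock_text)
```

The manifest stays short and human-edited, while the lock records every choice the resolver made, which is what makes a later rebuild reproducible.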
Spack environments also help with building containers
▪ We recently started providing base images on DockerHub with Spack preinstalled.
▪ Very easy to build a container with some Spack packages in it:
spack-docker-demo/
    Dockerfile
    spack.yaml
Base image with Spack in PATH
Copy in spack.yaml
Then run spack install
List of packages to install, with constraints
Build with docker build .
Run with Singularity (or another tool)
What’s on the road map?
▪ Supporting the U.S. Exascale project with binary builds
  — Spack will be used to manage ECP software releases
  — In conjunction with ECP CI, start to generate prebuilt binaries for HPC facilities
  — Use the same relocatable binary packages for container deployment
▪ Spack stacks: Build on environments to enable more automated deployment at HPC centers.
  — Single YAML-file configuration for entire site stack
  — Install massive combinatorial package installations, modules, etc. with one command.
▪ Spack chains:
  — Allow user Spack instances to leverage facility and team installations
  — Hierarchical development flow
▪ Architecture-specific binaries
  — Better provenance for builds
  — Better support for matching optimized binary packages to machines
▪ Better dependency resolution
  — Handle newer C++ libraries better
  — More aggressive concretizer support
  — Support for depending on language levels/compiler features (e.g., C++14, lambdas, OpenMP@version)
Spack is the delivery platform for the ECP software stack
▪ U.S. Exascale Computing Project (ECP) will release software through Spack
▪ Software in ECP stack needs to run on ECP platforms, testbeds, clusters, laptops
  — Each new environment requires effort.
▪ ECP asks us to build a robust, reliable, and easy-to-use software stack
▪ We will provide the infrastructure necessary to make this tractable:
  1. A dependency model that can handle HPC software
  2. A hub for coordinated software releases (like xSDK)
  3. Build and test automation for large packages across facility
  4. Hosted binary and source software distributions for all ECP HPC platforms
Through ECP, we are working with Onyx Point to deliver continuous integration for HPC centers
▪ CI at HPC centers is notoriously difficult
  — Security concerns prevent most CI tools from being run by staff or by users
  — HPC centers really need to deploy trusted CI services for this to work
▪ We are developing a secure CI system for HPC centers:
  — Setuid runners (run CI jobs as users); Batch integration (similar, but parallel jobs); multi-center runner support
▪ Onyx Point will upstream this support into GitLab CI
  — Initial rollout in FY19 at ECP labs: ANL, ORNL, NERSC, LLNL, LANL, SNL
  — Upstream GitLab features can be used by anyone!
User checks out / commits code
Two-factor authentication
Fast mirroring
Setuid runner
Batch runner
Trusted runners at HPC facility
We are building CI infrastructure for source and binary distribution
User
Amazon S3 source mirror
Source archives
Binary packages
Amazon S3 binary mirror
HPC Centers
Pull requests
Spack stacks: entire facility deployments in a single YAML file
▪ Allow users to easily express a huge cross-product of specs
  — All the packages needed for a facility
  — Generate modules tailored to the site
  — Generate a directory layout to browse the packages
▪ Build on the environments workflow
  — Manifest + lockfile
  — Lockfile enables reproducibility
▪ Relocatable binaries allow the same binary to be used in a stack, regular install, or container build.
  — Difference is how the user interacts with the stack
  — Single-PATH stack vs. modules.
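The "huge cross-product of specs" can be pictured as a Cartesian product over a few short lists. The sketch below is not the Spack stacks YAML format; it only shows how a handful of entries expands into many specs, using the `%` (compiler) and `^` (dependency) sigils from the spec syntax. The package, compiler, and MPI lists are hypothetical.

```python
from itertools import product

# Hypothetical site configuration: a few short lists...
packages = ["hdf5", "petsc"]
compilers = ["gcc", "intel"]
mpis = ["mvapich", "openmpi"]

# ...expand into every package/compiler/MPI combination.
specs = [f"{pkg} %{comp} ^{mpi}"
         for pkg, comp, mpi in product(packages, compilers, mpis)]

for spec in specs:
    print(spec)
print(len(specs), "specs from", len(packages) + len(compilers) + len(mpis), "list entries")
```

Adding one more compiler to the list adds four more builds here, which is why a compact matrix description is easier to maintain than an explicit list of specs.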
Specific target information in specs – In progress
▪ As an HPC package manager, we want to provide optimized builds
  — Code level choices (O2, O3)
  — Architecture specific choices (-mcpu=cortex-a7, -march=haswell)
▪ Architectures vary as to how much they expose features to users
  — x86 exposes feature sets in /proc/cpuinfo
  — Arm hides many features behind revision number
▪ Methods for accessing architecture optimizations
  — Vary by both compiler and architecture
    • GCC -mcpu vs. -march, for example
    • Relies on architectures providing a programmatic way to get information
▪ We want to expose the names users understand
  — Thunderx2, cortex-a7 for arm
  — Power8, power9 for IBM
  — Haswell, skylake for Intel
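One way to go from the feature flags that x86 exposes in /proc/cpuinfo to a name users understand is a simple subset test. This is a rough sketch, not the detection Spack implements: the sample text stands in for a real /proc/cpuinfo, and the feature set required for `haswell` is a deliberately small, simplified assumption.

```python
# Sample text standing in for /proc/cpuinfo (made up for illustration).
SAMPLE_CPUINFO = """\
model name\t: Example CPU
flags\t\t: fpu sse sse2 avx avx2 fma bmi1 bmi2
"""

# Most specific target first; the required sets are simplified assumptions.
TARGETS = [
    ("haswell", {"avx2", "fma", "bmi2"}),
    ("x86_64", set()),  # generic fallback, no extra features required
]


def parse_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def detect_target(flags):
    """Pick the first (most specific) target whose features are all present."""
    for name, required in TARGETS:
        if required <= flags:
            return name
    return None


target = detect_target(parse_flags(SAMPLE_CPUINFO))
print(target)
```

On architectures that do not publish a flag list, such as the Arm case mentioned above, this approach has nothing to match against, which is the difficulty the slide points out.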
The Spack community is growing rapidly
▪ Spack simplifies HPC software for:
  — Users
  — Developers
  — Cluster installations
  — The largest HPC facilities
▪ Spack is central to ECP’s software strategy
  — Enable software reuse for developers and users
  — Allow the facilities to consume the entire ECP stack
▪ The roadmap is packed with new features:
  — Building the ECP software distribution
  — Better workflows for building containers
  — Stacks for facilities
  — Chains for rapid dev workflow
  — Optimized binaries
  — Better dependency resolution
@spackpm
github.com/spack/spack
Visit spack.io
Disclaimer
This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.