dr. robert w. wisniewski chief software architect extreme ... · (elk/splunk) underlying components...

Dr. Robert W. Wisniewski Chief Software Architect Extreme Scale Computing, Intel November 15, 2017



Dr. Robert W. Wisniewski

Chief Software Architect Extreme Scale Computing, Intel

November 15, 2017


Session Agenda and Objective

• OpenHPC

– Value

– Goals

– Background

– Update

• Intel’s component work planned for submission to OpenHPC


Value from Community

• Stable HPC system software that:

– Fuels a vibrant and efficient HPC software ecosystem

– Removes duplication of effort throughout the community

– Simplifies the installation, configuration, and ongoing maintenance of a custom software stack

– Takes advantage of hardware innovation and drives revolutionary technologies

– Eases traditional HPC application development and testing at scale

– Provides a development environment for new workloads (ML, analytics, big data, cloud)

A Shared Repository


OpenHPC - Mission and Vision

Mission: to provide a reference collection of open-source HPC software components and best practices, lowering barriers to deployment, advancement, and use of modern HPC methods and tools.

Vision: OpenHPC components and best practices will enable and accelerate innovation and discoveries by broadening access to state-of-the-art, open-source HPC methods and tools in a consistent environment, supported by a collaborative, worldwide community of HPC users, developers, researchers, administrators, and vendors.

Recent article by Adrian Reber, OpenHPC TSC and Red Hat: https://opensource.com/article/17/11/openhpc


OpenHPC: A Brief History

• June 2015 (ISC’15): BoF on the merits of, and interest in, a community effort

• Nov 2015 (SC’15): Initial v1.0 release; gather interested parties to work with the Linux Foundation

• June 2016 (ISC’16): v1.1.1 release; Linux Foundation announces technical leadership, founding members, and governance

• Nov 2016 (SC’16): v1.2 release, BoF

• June 2017 (ISC’17): v1.3.1 release, BoF

• Nov 2017 (SC’17): v1.3.3 release, BoF


Current Project Members

Member participation interest? Please contact Jeff ErnstFriedman

Mixture of academics, labs, and industry

WWW.OpenHPC.Community


OpenHPC Stack Overview


OpenHPC v1.3.3 - Current S/W Components

Functional Area: Components

Base OS: CentOS 7.4, SLES12 SP3
Architecture: aarch64, x86_64
Administrative Tools: Conman, Ganglia, Lmod, LosF, Nagios, pdsh, pdsh-mod-slurm, prun, EasyBuild, ClusterShell, mrsh, Genders, Shine, Spack, test-suite
Provisioning: Warewulf, xCAT
Resource Mgmt.: SLURM, Munge, PBS Professional, PMIx
Runtimes: OpenMP, OCR, Singularity
I/O Services: Lustre client, BeeGFS client
Numerical/Scientific Libraries: Boost, GSL, FFTW, Hypre, Metis, Mumps, OpenBLAS, PETSc, PLASMA, Scalapack, Scotch, SLEPc, SuperLU, SuperLU_Dist, Trilinos
I/O Libraries: HDF5 (pHDF5), NetCDF/pNetCDF (including C++ and Fortran interfaces), Adios
Compiler Families: GNU (gcc, g++, gfortran), Clang/LLVM
MPI Families: MVAPICH2, OpenMPI, MPICH
Development Tools: Autotools, cmake, hwloc, mpi4py, R, SciPy/NumPy, Valgrind
Performance Tools: PAPI, IMB, mpiP, pdtoolkit, TAU, Scalasca, ScoreP, SIONLib


Basic Cluster Install Example

• Starting install guide/recipe targeted for flat hierarchy

• Leverages image-based provisioner: Warewulf or xCAT

• PXE boot (stateful or stateless)

• Optionally connect external Lustre* or BeeGFS parallel file system

• Need hardware-specific information to support (remote) bare-metal provisioning
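The hardware-specific information in the last bullet usually boils down to one record per compute node. A minimal sketch of such a record and a sanity check is shown here; the field names (`hostname`, `mac`, `ip`, `bmc_ip`) and the `validate` helper are hypothetical illustrations, not part of the OpenHPC recipe or of Warewulf/xCAT.

```python
import ipaddress
import re
from dataclasses import dataclass

# Colon-separated MAC address, compared in lowercase.
MAC_RE = re.compile(r"^(?:[0-9a-f]{2}:){5}[0-9a-f]{2}$")


@dataclass(frozen=True)
class NodeRecord:
    """Hypothetical per-node data a provisioner needs for PXE boot."""
    hostname: str
    mac: str      # MAC of the provisioning interface
    ip: str       # address assigned to the node
    bmc_ip: str   # address of the node's BMC for remote power control


def validate(nodes):
    """Return a list of problems; an empty list means the records look sane."""
    problems = []
    seen = set()  # MACs and IPs already used by earlier records
    for node in nodes:
        mac = node.mac.lower()
        if not MAC_RE.match(mac):
            problems.append(f"{node.hostname}: malformed MAC {node.mac}")
        elif mac in seen:
            problems.append(f"{node.hostname}: duplicate MAC {node.mac}")
        seen.add(mac)
        for label, addr in (("IP", node.ip), ("BMC IP", node.bmc_ip)):
            try:
                ipaddress.ip_address(addr)
            except ValueError:
                problems.append(f"{node.hostname}: malformed {label} {addr}")
                continue
            if addr in seen:
                problems.append(f"{node.hostname}: duplicate {label} {addr}")
            seen.add(addr)
    return problems
```

Checking the records before provisioning starts catches the most common bare-metal mistakes (a mistyped MAC or a reused address) before any node is power-cycled.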


Target System Design


Large systems have a considerable number of Service Nodes (SNs)

– SMS – System Management Server

– Row/Rack Controllers

– I/O Nodes (IONs)

– Specialized servers for Fabric Manager (FM), Workload Manager/Resource Manager (RM), Database (DB)

For flexibility, the control system will target a “pool” of service nodes

– i.e., the control system has a mechanism to prefer service nodes for efficiency, affinity, or for necessary characteristics when assigning a particular function

– But, the control system has flexibility so it can spread work over many nodes for performance and resiliency

For max application performance, Compute Nodes (CNs) are avoided for execution of control system software

– Reduce noise; reduce footprint overhead in CNs

– Can leverage CNs when noise is ok or expected

– job control operations (start, kill, etc)

– activity between jobs

This approach encourages Scalable Unit packaging concepts
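The "pool of service nodes" idea can be sketched as a small placement function: a function is only placed on nodes that have the characteristics it needs, nodes with preferred characteristics win, and ties are broken by current load so that work spreads out. The `ServiceNode` type, the trait names, and the `assign` policy are assumptions made for illustration; the slides do not describe the actual mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceNode:
    name: str
    traits: frozenset                      # e.g. {"db", "ssd"}; hypothetical labels
    functions: list = field(default_factory=list)  # functions already placed here


def assign(function, pool, required=frozenset(), preferred=frozenset()):
    """Place `function` on one node from `pool` and return that node.

    required  -- traits the node must have (necessary characteristics)
    preferred -- traits that make a node a better fit (efficiency, affinity)
    """
    candidates = [n for n in pool if required <= n.traits]
    if not candidates:
        raise LookupError(f"no service node can host {function}")
    # Most preferred traits first, then least loaded, then name for determinism.
    best = min(
        candidates,
        key=lambda n: (-len(preferred & n.traits), len(n.functions), n.name),
    )
    best.functions.append(function)
    return best
```

Because load is only a tie-breaker here, equally suitable nodes share the work, while a node lacking a required trait is never chosen regardless of how idle it is.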


Focus on Core System Software Components

• mOS – Scalable operating systems

• Unified Control System – Unified, Productive (single pane of glass), Reliable

• DAOS – Distributed Asynchronous Object Storage

• MPI – Scalable, high performance, topology optimized

• GEOPM – Global Extensible Open Power Manager

• PMIx – Process management with “Instant On”



mOS High-Level Architecture

• LWK performance for HPC applications

• Nimble to adapt to new technology

• Linux compatibility

• Better contained containers


Unified Control System

• Provide a comprehensive system view

• Advance the state of the art

• Build upon existing work

• Support the system lifecycle

[Figure: Unified Control System architecture. Labels recoverable from the diagram:]

Operator Interface (Web & REST)

Data Access Interface; DAI Framework (SCON, data tier access, daemon mgmt, resilient workitems, etc.)

Providers: Security, Provisioner, Fabric Manager, Low-Level Control, Monitor, Workload Manager, Inventory, Service, Alerts, RAS, Operator Interface, SCON, Master

Underlying Components (guided by DAI providers, but generally operate autonomously): SLURM, Warewulf, OPA FM, Actsys, Sensys, SCON; BMCs, CDUs, PDUs, rectifiers

System manifest

Unified Control System system data:

– Inventory: racks, drawers, chassis, blades, boards, switches, CPUs, cables, cores, memory, devices, nodes

– Configuration: OS images, software components

– Environmental: temperature, voltage, current, coolant flow

– RAS: events, alerts

– Operations: jobs, provisioning, service ops

Data categories shown with the tiers: Service, Inventory, Env Data, Alerts, RAS

Data tiers: online tier (VoltDB), nearline tier (PostgreSQL), offline tier (ELK/Splunk)
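The diagram names three data tiers: online (VoltDB), nearline (PostgreSQL), and offline (ELK/Splunk). One plausible reading is that data migrates between tiers as it ages; the sketch below routes a query by record age under that assumption. The retention windows and the `tier_for` function are hypothetical, since the slides give neither thresholds nor a routing rule.

```python
from datetime import timedelta

# Hypothetical retention windows; the slides do not state any.
ONLINE_WINDOW = timedelta(hours=1)
NEARLINE_WINDOW = timedelta(days=30)


def tier_for(age):
    """Return the tier assumed to hold data of the given age."""
    if age < timedelta(0):
        raise ValueError("age must not be negative")
    if age <= ONLINE_WINDOW:
        return "online"      # in-memory store for current system state
    if age <= NEARLINE_WINDOW:
        return "nearline"    # relational store for recent history
    return "offline"         # log/analytics store for long-term data
```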


DAOS: Distributed Asynchronous Object Storage

Open Source, Apache 2.0 License

• Scale-out object store designed from the ground up for next-gen storage & fabric technologies

– High throughput/IOPS at arbitrary alignment/size

– Byte addressable for better application scalability

– No read-modify-write or false block sharing

– OS bypass with lightweight client/server

– Small memory footprint & low CPU usage

• Advanced storage API

– New scalable storage model suitable for both structured & unstructured data

– Non-blocking data & metadata operations

– Metadata & data query/indexing

• Software-defined storage platform

– Flexible storage provisioning

– Predictable performance/capacity

– Ease of management

• Seamless integration with Lustre

– Single unified namespace

[Figure: DAOS software stack. Labels recoverable from the diagram:]

Applications: Traditional HPC Apps, Big Data & AI Apps

Middleware: NetCDF, HDF5, Ext, POSIX I/O, MPI-IO, SCR, FTI, VeloC, Spark RDD, (No)SQL, Apache Arrow, Dataspaces

Infrastructure: Argobots (thread model), Mercury (function shipping)

Storage media: NVRAM, NVMe, HDD
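The "no read-modify-write" point for byte-addressable storage can be made concrete with some arithmetic. The sketch below counts the bytes a block-based store has to read and write for a small update, assuming a hypothetical 4 KiB block size and the textbook read-modify-write scheme for partially covered blocks. It is an illustration of the cost being avoided, not DAOS code.

```python
def block_store_traffic(offset, length, block_size=4096):
    """Return (bytes_read, bytes_written) for a block store updating
    `length` bytes at `offset` with read-modify-write on partial blocks."""
    if length <= 0:
        return (0, 0)
    first = offset // block_size
    last = (offset + length - 1) // block_size
    nblocks = last - first + 1

    head_partial = offset % block_size != 0
    tail_partial = (offset + length) % block_size != 0
    if nblocks == 1:
        partial = 1 if (head_partial or tail_partial) else 0
    else:
        partial = int(head_partial) + int(tail_partial)

    # Each partially covered block is read first, then every touched
    # block is written back in full.
    return (partial * block_size, nblocks * block_size)


def byte_addressable_traffic(offset, length):
    """A byte-addressable store only moves the bytes being updated."""
    return (0, max(length, 0))
```

For an 8-byte update at offset 100, the block model reads one full block and writes one full block, while the byte-addressable model writes only the 8 bytes.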


MPICH-OFI

• Open-source implementation based on MPICH

• Uses the new CH4 infrastructure

– Co-designed with MPICH community

• Targets existing and new fabrics via next-gen Open Fabrics Interface (OFI)

– Ethernet/sockets, Intel® Omni-Path, Cray Aries*, IBM BG/Q*, InfiniBand*

• Improving performance

– Topology-aware collectives and process mapping

• Optimizations for newer networks

• Thread performance enhancements

• Support for automated configuration

• Added support for latest libfabric 1.5 features


• Runtime for in-band power management and optimization

– On-the-fly monitoring of hardware counters and application profiling

– Feedback-guided optimization of hardware control knob settings

– Open source software (flexible BSD three-clause license)

• Extensible and portable through plugin architecture

– Enables portability beyond x86 architectures (truly open)

– Enables rapid prototyping of new power management strategies

– Accommodates the reality that different sites have different constraints and preferences for performance vs. energy savings

• Designed for holistic optimization across a whole job

– Job-wide global optimization of HW control knob settings

– Application-awareness for max speedup or energy savings

– Scalable via distributed tree-hierarchical design, algorithms

[Figure: GEOPM hierarchy. Labels recoverable from the diagram: Power-Aware RM / Scheduler; GEOPM Root; GEOPM Aggregators; GEOPM Leaves; GEOPM Controller; MPI Comms Overlay; Shared Mem Region (SHM); MPI ranks 0 to i-1, i to j-1, j to k-1, k to n-1; Processor; MSR; msr-safe (or other drivers for non-x86 platforms)]

Project url: http://geopm.github.io/geopm

Contact: [email protected]
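The tree-hierarchical design (root, aggregators, leaves) can be illustrated with a toy example: power samples are summed up the tree, and a job-wide budget is divided back down it. The proportional split policy, the dictionary-based tree, and the function names below are assumptions made for illustration; they are not GEOPM's API or its actual algorithm.

```python
def aggregate(tree, node, samples):
    """Sum the measured power of all leaves below `node`."""
    children = tree.get(node, [])
    if not children:
        return samples[node]
    return sum(aggregate(tree, child, samples) for child in children)


def split_budget(tree, node, budget, samples, out=None):
    """Divide `budget` among the leaves below `node`, in proportion to
    each subtree's measured power (a hypothetical policy)."""
    if out is None:
        out = {}
    children = tree.get(node, [])
    if not children:
        out[node] = budget
        return out
    totals = [aggregate(tree, child, samples) for child in children]
    grand_total = sum(totals)
    for child, total in zip(children, totals):
        if grand_total:
            share = budget * total / grand_total
        else:
            share = budget / len(children)  # no measurements: split evenly
        split_budget(tree, child, share, samples, out)
    return out


# Example hierarchy mirroring the figure's shape: one root, two
# aggregators, four leaves. Node names are illustrative.
TREE = {
    "root": ["agg0", "agg1"],
    "agg0": ["leaf0", "leaf1"],
    "agg1": ["leaf2", "leaf3"],
}
```

Each level only handles its own children, which is what keeps the per-node work bounded as the job grows.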


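What is PMIx?

[Figure: timeline from PMI-1/PMI-2 to PMIx. Labels recoverable from the diagram:]

• PMI-1, PMI-2: wireup support, dynamic spawn, keyval publish/lookup

– Clients: MPICH, PGAS, others

– RMs: SLURM, ALPS, others

• years go by…

• 2015: Exascale systems on horizon; launch times long; new paradigms

• 2016: PMIx v1.2

– Clients: OMPI, Spectrum, OSHMEM, SOS, PGAS, others

– RMs: SLURM, JSM, others

• 2017: PMIx v2.x

– Clients: OMPI, Spectrum, OSHMEM

– RMs: SLURM, JSM, others

• Goals called out on the timeline: exascale launch in < 30s; exascale launch in < 10s; orchestration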
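The "wireup support" and "keyval publish/lookup" roles of PMI can be pictured with a toy in-memory model: each process puts its connection information under a key, all processes synchronize, and afterwards any process can look up any peer's value. The class and method names below are hypothetical and only imitate the general put / fence / get pattern; this is not the PMIx API.

```python
class ToyKeyValueSpace:
    """In-memory stand-in for the key-value exchange a PMI server offers."""

    def __init__(self, nprocs):
        self.nprocs = nprocs
        self._pending = {rank: {} for rank in range(nprocs)}
        self._published = {}
        self._arrived = set()

    def put(self, rank, key, value):
        """Stage a value locally; peers cannot see it until the fence completes."""
        self._pending[rank][key] = value

    def fence(self, rank):
        """Record that `rank` reached the barrier.

        Returns True once every rank has arrived and all staged values
        have been made visible, False otherwise.
        """
        self._arrived.add(rank)
        if len(self._arrived) < self.nprocs:
            return False
        for owner, staged in self._pending.items():
            for key, value in staged.items():
                self._published[(owner, key)] = value
        return True

    def get(self, owner, key):
        """Look up a value published by `owner`; raises KeyError if not visible."""
        return self._published[(owner, key)]
```

In this model the fence is the expensive, collective step, which gives a rough intuition for why launch time at scale is a concern on the timeline.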


Three Distinct Entities

• PMIx Standard

Defined set of APIs, attribute strings

Nothing about implementation

• PMIx Reference Library

A full-featured implementation of the Standard

Intended to ease adoption

• PMIx Reference Server

Full-featured “shim” to a non-PMIx RM


Backup



Hierarchical Overlay for OpenHPC software


OpenHPC Development Infrastructure

• The ‘usual’ software engineering stuff:

– GitHub (SCM and issue tracking/planning)

– Continuous Integration (CI) Testing (Jenkins)

– Documentation (LaTeX)

• Capable build/packaging system

– At present: we target a common delivery/access mechanism that adopts Linux sysadmin familiarity

– Require flexible system to manage builds

– A system using Open Build Service (OBS) supported by back-end git


OpenHPC Build System: OBS

• Manage Build Process

• Drive Builds for multiple repositories

• Repeatable builds

• Generate binary and src rpms

• Publish corresponding package repositories

• Client/server architecture supports distributed build slaves and multiple architectures


OpenHPC Integration/Testing/Validation

• Install recipes

• Cross-package interaction

• Development environment

• Mimic use cases common in HPC deployments

• Upgrade mechanism


OpenHPC Integration/Test/Validation

• Standalone integration test infrastructure

• Families of tests that could be used during:

– Initial install process (can we build a system?)

– Post-install process (does it work?)

– Developing tests that touch all of the major components (can we compile against 3rd party libraries, will they execute under resource manager, etc.?)

• Expectation is that each new component included will need corresponding integration test collateral

• Integration tests are included in the GitHub repo

• Global testing harness includes a number of embedded subcomponents
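The idea of test families grouped by phase can be sketched as a tiny harness that runs named checks and records pass/fail. Everything here is a hypothetical illustration: the check names, the tools probed (`gcc`, `srun`), and the `run_family` helper are assumptions and do not reflect the layout of the actual OpenHPC test suite.

```python
import shutil


def tool_available(tool):
    """True if `tool` resolves on PATH (e.g. a compiler or a job launcher)."""
    return shutil.which(tool) is not None


def run_family(checks):
    """Run every (name, callable) pair and return {name: passed}.

    A check that raises is recorded as a failure rather than aborting
    the rest of the family.
    """
    results = {}
    for name, check in checks:
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


# Hypothetical families keyed by phase.
FAMILIES = {
    "install": [("compiler on PATH", lambda: tool_available("gcc"))],
    "post-install": [("job launcher on PATH", lambda: tool_available("srun"))],
}
```

A new component would add its own entries to the relevant family, which matches the expectation that each included component brings corresponding integration test collateral.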