ucx an open source framework for hpc network ap is and beyond

24
ARM Research – Software & Large Scale Systems UCX: An Open Source Framework for HPC Network APIs and Beyond Pavel Shamis (Pasha) Principal Research Engineer

Upload: insidehpc

Post on 27-Jan-2017

358 views

Category:

Technology


2 download

TRANSCRIPT

Page 1: Ucx  an open source framework for hpc network ap is and beyond

ARM Research – Software & Large Scale Systems

UCX: An Open Source Framework for HPC Network APIs and Beyond

Pavel Shamis (Pasha) Principal Research Engineer

Page 2: Ucx  an open source framework for hpc network ap is and beyond

Co-Design Collaboration

Collaborative Effort Industry, National Laboratories and Academia

The Next Generation

HPC Communication Framework

Page 3: Ucx  an open source framework for hpc network ap is and beyond

Challenges

§  Performance Portability (across various interconnects) §  Collaboration between industry and research institutions

§  …but mostly industry (because they built the hardware)

§  Maintenance §  Maintaining a network stack is time consuming and expensive §  Industry have resources and strategic interest for this

§  Extendibility §  MPI+X+Y ? §  Exascale programming environment is an ongoing debate

Page 4: Ucx  an open source framework for hpc network ap is and beyond

UCX – Unified Communication X Framework

§  Unified §  Network API for multiple network architectures that target HPC

programing models and libraries

§  Communication §  How to move data from location in memory A to location in memory B

considering multiple types of memories

§  Framework §  A collection of libraries and utilities for HPC network programmers

Page 5: Ucx  an open source framework for hpc network ap is and beyond

History

MXM ●  Developed by Mellanox Technologies ●  HPC communication library for InfiniBand

devices and shared memory ●  Primary focus: MPI, PGAS

PAMI ●  Developed by IBM on BG/Q, PERCS, IB

VERBS ●  Network devices and shared memory ●  MPI, OpenSHMEM, PGAS, CHARM++, X10 ●  C++ components ●  Aggressive multi-threading with contexts ●  Active Messages ●  Non-blocking collectives with hw accleration

support

Decades of community and industry experience in

development of HPC software

UCCS ●  Developed by ORNL, UH, UTK ●  Originally based on Open MPI BTL and OPAL

layers ●  HPC communication library for InfiniBand,

Cray Gemini/Aries, and shared memory ●  Primary focus: OpenSHMEM, PGAS ●  Also supports: MPI

Page 6: Ucx  an open source framework for hpc network ap is and beyond

What we are doing differently…

§  UCX consolidates multiple industry and academic efforts §  Mellanox MXM, IBM PAMI, ORNL/UTK/UH UCCS, etc.

§  Supported and maintained by industry §  IBM, Mellanox, NVIDIA, Pathscale, ARM

Page 7: Ucx  an open source framework for hpc network ap is and beyond

What we are doing differently…

§  Co-design effort between national laboratories, academia, and industry

Applications: LAMMPS, NWCHEM, etc.

Programming models: MPI, PGAS/Gasnet, etc.

Middleware:

Driver and Hardware Co-

desi

gn

Page 8: Ucx  an open source framework for hpc network ap is and beyond

UCX

InfiniBand uGNI Shared Memory GPU Memory Emerging

Interconnects

MPI GasNet PGAS Task BasedRuntimes I/O

Transports

Protocols Services

Applications

Page 9: Ucx  an open source framework for hpc network ap is and beyond

A Collaboration Efforts §  Mellanox co-designs network API and contributes MXM technology

§  Infrastructure, transport, shared memory, protocols, integration with OpenMPI/SHMEM, MPICH

§  ORNL & LANL co-designs network API and contributes UCCS project §  InfiniBand optimizations, Cray devices, shared memory

§  ARM co-designs the network API and contributes optimizations for ARM eco-system

§  NVIDIA co-designs high-quality support for GPU devices §  GPUDirect, GDR copy, etc.

§  IBM co-designs network API and contributes ideas and concepts from PAMI

§  UH/UTK focus on integration with their research platforms

Page 10: Ucx  an open source framework for hpc network ap is and beyond

Licensing

§  Open Source §  BSD 3 Clause license §  Contributor License Agreement – BSD 3 based

Page 11: Ucx  an open source framework for hpc network ap is and beyond

UCX Framework Mission §  Collaboration between industry, laboratories, and academia §  Create open-source production grade communication framework for HPC applications §  Enable the highest performance through co-design of software-hardware interfaces §  Unify industry - national laboratories - academia efforts

Performance oriented

Optimization for low-software overheads in communication path allows

near native-level performance

Community driven

Collaboration between industry, laboratories, and academia

Production quality

Developed, maintained, tested, and used by industry and researcher community

API

Exposes broad semantics that target data centric and HPC programming

models and applications

Research

The framework concepts and ideas are driven by research in academia,

laboratories, and industry

Cross platform

Support for Infiniband, Cray, various shared memory (x86-64 and Power),

GPUs

Co-design of Exascale Network APIs

Page 12: Ucx  an open source framework for hpc network ap is and beyond

Architecture

Page 13: Ucx  an open source framework for hpc network ap is and beyond

UCX Framework

UC-S for Services This framework provides basic infrastructure for component based programming, memory management, and useful system utilities Functionality: Platform abstractions, data structures, debug facilities.

UC-T for Transport Low-level API that expose basic network operations supported by underlying hardware. Reliable, out-of-order delivery. Functionality: Setup and instantiation of communication operations.

UC-P for Protocols High-level API uses UCT framework to construct protocols commonly found in applications Functionality: Multi-rail, device selection, pending queue, rendezvous, tag-matching, software-atomics, etc.

Page 14: Ucx  an open source framework for hpc network ap is and beyond

A High-level Overview

UC-T (Hardware Transports) - Low Level API RMA, Atomic, Tag-matching, Send/Recv, Active Message

Transport for InfiniBand VERBs driver

RC UD XRC DCT

Transport for intra-node host memory communication

SYSV POSIX KNEM CMA XPMEM

Transport for Accelerator Memory

communucation

GPU

Transport for Gemini/Aries

drivers

GNI

UC-S (Services)

Common utilities

UC-P (Protocols) - High Level APITransport selection, cross-transrport multi-rail, fragmentation, operations not supported by hardware

Message Passing API Domain:tag matching, randevouze

PGAS API Domain: RMAs, Atomics

Task Based API Domain: Active Messages

I/O API Domain:Stream

Utilities Data stractures

Hardware

MPICH, Open-MPI, etc. OpenSHMEM, UPC, CAF, X10, Chapel, etc. Parsec, OCR, Legions, etc. Burst buffer, ADIOS, etc.

ApplicationsUC

X

MemoryManagement

OFA Verbs Driver Cray Driver OS Kernel Cuda

Page 15: Ucx  an open source framework for hpc network ap is and beyond

UCP API (DRAFT) Snippet (https://github.com/openucx/ucx/blob/master/src/ucp/api/ucp.h) §  ucs_status_t ucp_put(ucp_ep_h ep, const void ∗buffer, size_t length, uint64_t remote_addr, ucp_rkey_h rkey) Blocking remote memory put operation.

§  ucs_status_t ucp_put_nbi (ucp_ep_h ep, const void ∗buffer, size_t length, uint64_t remote_addr, ucp_rkey_h rkey)

Non-blocking implicit remote memory put operation.

§  ucs_status_t ucp_get (ucp_ep_h ep, void ∗buffer, size_t length, uint64_t remote_addr, ucp_rkey_h rkey)

Blocking remote memory get operation.

§  ucs_status_t ucp_get_nbi (ucp_ep_h ep, void ∗buffer, size_t length, uint64_t remote_addr, ucp_rkey_h rkey)

Non-blocking implicit remote memory get operation.

§  ucs_status_t ucp_atomic_add32 (ucp_ep_h ep, uint32_t add, uint64_t remote_addr, ucp_rkey_h rkey)

Blocking atomic add operation for 32 bit integers.

§  ucs_status_t ucp_atomic_add64 (ucp_ep_h ep, uint64_t add, uint64_t remote_addr, ucp_rkey_h rkey) Blocking atomic add operation for 64 bit integers.

§  ucs_status_t ucp_atomic_fadd32 (ucp_ep_h ep, uint32_t add, uint64_t remote_addr, ucp_rkey_h rkey, uint32_t ∗result)

Blocking atomic fetch and add operation for 32 bit integers.

§  ucs_status_t ucp_atomic_fadd64 (ucp_ep_h ep, uint64_t add, uint64_t remote_addr, ucp_rkey_h rkey, uint64_t ∗result)

Blocking atomic fetch and add operation for 64 bit integers.

§  ucs_status_ptr_t ucp_tag_send_nb (ucp_ep_h ep, const void ∗buffer, size_t count, ucp_datatype_t datatype, ucp_tag_t tag, ucp_send_callback_t cb)

Non-blocking tagged-send operations.

§  ucs_status_ptr_t ucp_tag_recv_nb (ucp_worker_h worker, void ∗buffer, size_t count, ucp_datatype_t datatype, ucp_tag_t tag, ucp_tag_t tag_mask, ucp_tag_recv_callback_t cb)

Non-blocking tagged-receive operation.

Page 16: Ucx  an open source framework for hpc network ap is and beyond

Preliminary Evaluation ( UCT )

§  Pavel Shamis, et al. “UCX: An Open Source Framework for HPC Network APIs and Beyond,” HOT Interconnects 2015 - Santa Clara, California, US, August 2015

§  Two HP ProLiant DL380p Gen8 servers §  Mellanox SX6036 switch, Single-port Mellanox Connect-IB FDR (10.10.5056) §  Mellanox OFED 2.4-1.0.4. (VERBS) §  Prototype implementation of Accelerated VERBS (AVERBS)

����������������������������

�� �� �� �� ��� ��� ���

��������������������

��������������������

������������������������������������������������������������������

������������������������������������������������

�� �� �� �� ��� ��� ���������������

��������������������

���������������������������������������������������

������������������������������������������������

��

��

��

��

��

��

��

��

�� ��� �� ��� ��

����������������

��������������������

������������������������������������������������������������������

Page 17: Ucx  an open source framework for hpc network ap is and beyond

OpenSHMEM and OSHMEM (OpenMPI) Put Latency (shared memory)

0.1

1

10

100

1000

8 16 32 64 128 256 512 1KB 2KB 4KB 8KB 16KB 32KB 64KB 128KB256KB512KB 1MB 2MB 4MB

Late

ncy

(use

c, lo

gsca

le)

Message Size

OpenSHMEM−UCX (intranode)OpenSHMEM−UCCS (intranode)

OSHMEM (intranode)

Lower is better

Slide courtesy of ORNL UCX Team

Page 18: Ucx  an open source framework for hpc network ap is and beyond

OpenSHMEM and OSHMEM (OpenMPI) Put Injection Rate

Higher is better

Connect-IB

0

2e+06

4e+06

6e+06

8e+06

1e+07

1.2e+07

1.4e+07

8 16 32 64 128 256 512 1KB 2KB 4KB

Mes

sage

Rat

e (p

ut o

pera

tions

/ se

cond

)

Message Size

OpenSHMEM−UCX (mlx5)OpenSHMEM−UCCS (mlx5)

OSHMEM (mlx5)OSHMEM−UCX (mlx5)

Slide courtesy of ORNL UCX Team

Page 19: Ucx  an open source framework for hpc network ap is and beyond

OpenSHMEM and OSHMEM (OpenMPI) GUPs Benchmark

Higher is better

Connect-IB

0

0.0002

0.0004

0.0006

0.0008

0.001

0.0012

0.0014

0.0016

0.0018

2 4 6 8 10 12 14 16

GU

PS (b

illion

upd

ates

per

sec

ond)

Number of PEs (two nodes)

UCX (mlx5)OSHMEM (mlx5)

Slide courtesy of ORNL UCX Team

Page 20: Ucx  an open source framework for hpc network ap is and beyond

MPICH - Message rate Preliminary Results

0

1

2

3

4

5

6 1 2 4 8 16

32

64

128

256

512 1k

2k

4k

8k

16k

32k

64k

128k

25

6k

512k

1M

2M

4M

MM

PS

MPICH/UCX MPICH/MXM

Slide courtesy of Pavan Balaji, ANL - sent to the ucx mailing list

Connect-IB

“non-blocking tag-send”

Page 21: Ucx  an open source framework for hpc network ap is and beyond

Where is UCX being used?

§  Upcoming release of Open MPI 2.0 (MPI and OpenSHMEM APIs) §  Upcoming release of MPICH §  OpenSHMEM reference implementation by UH and ORNL §  PARSEC – runtime used on Scientific Linear Libraries

Page 22: Ucx  an open source framework for hpc network ap is and beyond

What Next ?

§  UCX Consortium ! §  http://www.csm.ornl.gov/newsite/

§  UCX Specification §  Early draft is available online:

http://www.openucx.org/early-draft-of-ucx-specification-is-here/

§  Production releases §  MPICH, Open MPI, Open SHMEM(s), Gasnet, and more…

§  Support for more networks and applications and libraries §  UCX Hackathon 2016 !

§  Will be announced on the mailing list and website

Page 23: Ucx  an open source framework for hpc network ap is and beyond

https://github.com/orgs/openucx WEB: www.openucx.org Contact: [email protected] Mailing List:https://elist.ornl.gov/mailman/listinfo/ucx-group [email protected]

Page 24: Ucx  an open source framework for hpc network ap is and beyond

Questions ? Unified Communication - X Framework WEB: www.openucx.org Contact: [email protected] WE B: https://github.com/orgs/openucx Mailing List:https://elist.ornl.gov/mailman/listinfo/ucx-group [email protected]