libfabric: your new BFF (salishan.ahsc-nm.org/uploads/4/9/7/0/49704495/choi-2.pdf)
TRANSCRIPT
![Page 1: libfabric: your new BFF](https://reader034.vdocuments.net/reader034/viewer/2022042414/5f2f6bebf4c6a86cbc00c95a/html5/thumbnails/1.jpg)
libfabric: your new BFF
Sung-Eun Choi, Cray Inc. Salishan Conference, Random Access Session April 29, 2015
![Page 2](https://reader034.vdocuments.net/reader034/viewer/2022042414/5f2f6bebf4c6a86cbc00c95a/html5/thumbnails/2.jpg)
Background
The Open Fabrics Interface Working Group (OFI WG) formed in August 2013
• Co-chairs:
  – Sean Hefty, Intel
  – Paul Grun, Cray Inc.
Charter: Develop an extensible, open source framework and interface aligned with upper-layer protocol and application needs for high-performance fabric services.
![Page 3](https://reader034.vdocuments.net/reader034/viewer/2022042414/5f2f6bebf4c6a86cbc00c95a/html5/thumbnails/3.jpg)
Translation
The only network API you’ll ever need (we hope)
![Page 4](https://reader034.vdocuments.net/reader034/viewer/2022042414/5f2f6bebf4c6a86cbc00c95a/html5/thumbnails/4.jpg)
Why?
• Today middleware needs to be ported to a new (and sometimes more complicated) low-level network API every 3-5 years
• These hardware-specific APIs have to be supported for upwards of 10 years
• A common API gives you portability on day one*
![Page 5](https://reader034.vdocuments.net/reader034/viewer/2022042414/5f2f6bebf4c6a86cbc00c95a/html5/thumbnails/5.jpg)
How?
OFI WG is a community effort
• In no particular order: DOE, DOD, NASA, Intel, Cray, Cisco, Mellanox, IBM, UNH (plus storage vendors)…
• Expertise spans hardware and software
![Page 6](https://reader034.vdocuments.net/reader034/viewer/2022042414/5f2f6bebf4c6a86cbc00c95a/html5/thumbnails/6.jpg)
Charter: open source
Development on github
• https://github.com/ofiwg
• Dual licensing: GPL and BSD
![Page 7](https://reader034.vdocuments.net/reader034/viewer/2022042414/5f2f6bebf4c6a86cbc00c95a/html5/thumbnails/7.jpg)
Charter: ULP and apps
Engage the user community to define requirements
• MPI requirements
  – ETH Zurich, SNL, ORNL, ANL, Cisco, IBM, Intel, AMD, Cray, Microsoft, Mellanox, SGI, U Edinburgh/EPCC, U Alabama Birmingham
• PGAS and SHMEM requirements
  – LANL, ORNL, SNL, Intel, Mellanox, Cray
![Page 8](https://reader034.vdocuments.net/reader034/viewer/2022042414/5f2f6bebf4c6a86cbc00c95a/html5/thumbnails/8.jpg)
Charter: high performance fabric
Software leading hardware
• Vendor involvement
  – Can we influence HPC network vendors?
• Extensible interface
  – Can be vendor-specific
  – Good way to propose acceptance into main API
![Page 9](https://reader034.vdocuments.net/reader034/viewer/2022042414/5f2f6bebf4c6a86cbc00c95a/html5/thumbnails/9.jpg)
libfabric architecture
[Figure: layered architecture diagram. Top: middleware. Middle: libfabric APIs (Control Interface, Message Queue, RDMA, Atomics, Event Queues, Tag Matching, Triggered Operations, CM Services). Bottom: provider implementation, sitting on multiple I/O services.]
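The layering in this figure can be sketched in a few lines. This is a toy model in Python, not the libfabric C API: the names `Provider`, `LoopbackProvider`, and `Middleware` are hypothetical and exist only to show that middleware written against one common interface does not change when the provider underneath it does.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Optional


class Provider(ABC):
    """Hypothetical common interface; one implementation per fabric."""

    @abstractmethod
    def send(self, dest: str, payload: bytes) -> None:
        """Deliver payload to the endpoint named dest."""

    @abstractmethod
    def recv(self, me: str) -> Optional[bytes]:
        """Return the next payload queued for me, or None if there is none."""


class LoopbackProvider(Provider):
    """Toy in-process 'fabric' standing in for a real provider."""

    def __init__(self):
        self._queues = {}  # endpoint name -> deque of pending payloads

    def send(self, dest, payload):
        self._queues.setdefault(dest, deque()).append(payload)

    def recv(self, me):
        pending = self._queues.get(me)
        return pending.popleft() if pending else None


class Middleware:
    """Stands in for an MPI- or SHMEM-like layer; it sees only Provider."""

    def __init__(self, provider: Provider, rank: str):
        self.provider = provider
        self.rank = rank

    def put_message(self, dest: str, text: str) -> None:
        self.provider.send(dest, text.encode("utf-8"))

    def get_message(self) -> Optional[str]:
        raw = self.provider.recv(self.rank)
        return None if raw is None else raw.decode("utf-8")
```

Swapping `LoopbackProvider` for another subclass of `Provider` would leave `Middleware` untouched, which is the portability argument the slide makes.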
![Page 10](https://reader034.vdocuments.net/reader034/viewer/2022042414/5f2f6bebf4c6a86cbc00c95a/html5/thumbnails/10.jpg)
libfabric architecture: realized
[Figure: the same layered diagram with real components. Middleware: MPICH, OpenMPI, OpenSHMEM, GASNet, UPC, … libfabric APIs: Control Interface, Message Queue, RDMA, Atomics, Event Queues, Tag Matching, Triggered Operations, CM Services. Providers: sockets, verbs, Cisco usNIC, Intel PSM, Cray uGNI, … Figure label: "available on github".]
![Page 11](https://reader034.vdocuments.net/reader034/viewer/2022042414/5f2f6bebf4c6a86cbc00c95a/html5/thumbnails/11.jpg)
libfabric in a visual nutshell
• Communication Services
  – Connection Management
  – Address Vectors
• Completion Services
  – Event Queues
  – Counters
• Data Transfer Services
  – Message Queues
  – Tag Matching
  – RMA
  – Atomics
• Control Services
  – Discovery (fi_info)
• Triggered Operations
![Page 12](https://reader034.vdocuments.net/reader034/viewer/2022042414/5f2f6bebf4c6a86cbc00c95a/html5/thumbnails/12.jpg)
Not shown: usage models
• Capabilities
  – Features and permissions the application wants
  – e.g., RMA, Atomics, tag matching
• Attributes
  – Define the limits and behavior of the selected interfaces
  – e.g., thread safety, message ordering constraints
• Mode
  – Requirements the provider places on the application
  – e.g., local memory registration, user-allocated context
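One way to read the three usage models together is as a two-way negotiation: the application asks for capabilities, the provider asks for mode bits in return, and attributes are reported back for whatever gets selected. The sketch below models that in Python under stated assumptions. It is not the libfabric API: `Cap`, `Mode`, `ProviderInfo`, `usable`, and `select` are hypothetical names, and the two providers are made up for illustration.

```python
from dataclasses import dataclass
from enum import Flag


class Cap(Flag):
    """Capabilities an application may request (hypothetical bit values)."""
    MSG = 1
    RMA = 2
    ATOMICS = 4
    TAGGED = 8


class Mode(Flag):
    """Obligations a provider may place on the application (hypothetical)."""
    LOCAL_MR = 1      # application registers local memory itself
    USER_CONTEXT = 2  # application allocates per-operation context


@dataclass(frozen=True)
class ProviderInfo:
    name: str
    caps: Cap                # what the provider can do
    required_mode: Mode      # what it demands from the application
    thread_safe: bool        # attribute: reported, not negotiated here
    ordered_messages: bool   # attribute: reported, not negotiated here


def usable(provider, wanted_caps, supported_mode):
    """True if the provider offers every wanted capability and the
    application supports every mode bit the provider requires."""
    has_caps = (provider.caps & wanted_caps) == wanted_caps
    mode_ok = (provider.required_mode & supported_mode) == provider.required_mode
    return has_caps and mode_ok


def select(providers, wanted_caps, supported_mode):
    """Return the providers that satisfy the request, in their given order."""
    return [p for p in providers if usable(p, wanted_caps, supported_mode)]


# Two made-up providers, for illustration only.
PROVIDERS = [
    ProviderInfo("prov_a", Cap.MSG | Cap.TAGGED, Mode(0), True, True),
    ProviderInfo("prov_b", Cap.MSG | Cap.RMA | Cap.ATOMICS | Cap.TAGGED,
                 Mode.LOCAL_MR, False, True),
]
```

With these definitions, an application that wants `Cap.RMA | Cap.TAGGED` but supports no mode bits matches nothing, because `prov_a` lacks RMA and `prov_b` requires `LOCAL_MR`. Declaring support for `Mode.LOCAL_MR` makes `prov_b` eligible, and its attributes (`thread_safe`, `ordered_messages`) then tell the application what behavior to expect.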
![Page 13](https://reader034.vdocuments.net/reader034/viewer/2022042414/5f2f6bebf4c6a86cbc00c95a/html5/thumbnails/13.jpg)
“The Fine Print”
• It’s a large API with lots of options
• It’s a work-in-progress
• Have an opinion? Join us:
  – http://lists.openfabrics.org/mailman/listinfo/ofiwg
  – Weekly meetings: Tuesdays 9am PT
  – https://github.com/ofiwg/libfabric
![Page 14](https://reader034.vdocuments.net/reader034/viewer/2022042414/5f2f6bebf4c6a86cbc00c95a/html5/thumbnails/14.jpg)
Status
• Release 1.0 coming soon!
  – providers: sockets, verbs, PSM, usNIC
![Page 15](https://reader034.vdocuments.net/reader034/viewer/2022042414/5f2f6bebf4c6a86cbc00c95a/html5/thumbnails/15.jpg)
Thanks!
And thanks to Sean and Paul for slide content