hpx-5 runtime system overhead...
TRANSCRIPT
![Page 1: HPX-5 Runtime System Overhead Timessalishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2016-sterling.pdf · • The “Runtime Hypothesis” is that the time to solution of a fixed workload](https://reader034.vdocuments.net/reader034/viewer/2022042221/5ec7efdb353fa148a72fc995/html5/thumbnails/1.jpg)
HPX-5 Runtime System Overhead Times
Thomas Sterling
Chief Scientist, CREST Professor, School of Informatics and Computing
Indiana University April 27, 2016
35th Salishan Meeting: Random Access Session
![Page 2: HPX-5 Runtime System Overhead Timessalishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2016-sterling.pdf · • The “Runtime Hypothesis” is that the time to solution of a fixed workload](https://reader034.vdocuments.net/reader034/viewer/2022042221/5ec7efdb353fa148a72fc995/html5/thumbnails/2.jpg)
Introduction
• The “Runtime Hypothesis” is that the time to solution of a fixed workload can be reduced by adding more work – a paradox.
• Driven by promise of exploitation of compute time information to guide application execution to achieve efficiency and scalability improvement.
• An example: HPX-5 runtime based on the experimental ParalleX execution model to address SLOWER performance model parameters (e.g., overhead).
• What are the costs of runtime mechanisms? They impose bounds on parallelism granularity and in other ways get in the way.
• Presented here: quantitative 2
![Page 3: HPX-5 Runtime System Overhead Timessalishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2016-sterling.pdf · • The “Runtime Hypothesis” is that the time to solution of a fixed workload](https://reader034.vdocuments.net/reader034/viewer/2022042221/5ec7efdb353fa148a72fc995/html5/thumbnails/3.jpg)
![Page 4: HPX-5 Runtime System Overhead Timessalishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2016-sterling.pdf · • The “Runtime Hypothesis” is that the time to solution of a fixed workload](https://reader034.vdocuments.net/reader034/viewer/2022042221/5ec7efdb353fa148a72fc995/html5/thumbnails/4.jpg)
HPX-5 Runtime Software Architecture
4
LCOs
APPLICATION LAYER
Lulesh LibPXGL
N-Body
FMM
GLOBAL ADDRESS SPACE
Source
Node Address
Dest
Source directly converts a virtual address into a node/address pair and sends a message to the Dest
PGAS AGAS PARCELS
PROCESSES
SCHEDULER
Worker threads
ISIR PWC
OPERATING SYSTEM
HARDWARE
Cores
NETWORK
NETWORK LAYER
Courtesy of Jayashree Candadai, IU
![Page 5: HPX-5 Runtime System Overhead Timessalishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2016-sterling.pdf · • The “Runtime Hypothesis” is that the time to solution of a fixed workload](https://reader034.vdocuments.net/reader034/viewer/2022042221/5ec7efdb353fa148a72fc995/html5/thumbnails/5.jpg)
ParalleX Computation Complex
-- Runtime Aware -- Logically Active -- Physically Active
![Page 6: HPX-5 Runtime System Overhead Timessalishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2016-sterling.pdf · • The “Runtime Hypothesis” is that the time to solution of a fixed workload](https://reader034.vdocuments.net/reader034/viewer/2022042221/5ec7efdb353fa148a72fc995/html5/thumbnails/6.jpg)
Time Required to Check if Memory Address is Local or Remote in HPX5
Chart courtesy of Daniel Kogler, IU
![Page 7: HPX-5 Runtime System Overhead Timessalishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2016-sterling.pdf · • The “Runtime Hypothesis” is that the time to solution of a fixed workload](https://reader034.vdocuments.net/reader034/viewer/2022042221/5ec7efdb353fa148a72fc995/html5/thumbnails/7.jpg)
Time Required to Create a New Lightweight Thread in HPX5
Chart courtesy of Daniel Kogler, IU
![Page 8: HPX-5 Runtime System Overhead Timessalishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2016-sterling.pdf · • The “Runtime Hypothesis” is that the time to solution of a fixed workload](https://reader034.vdocuments.net/reader034/viewer/2022042221/5ec7efdb353fa148a72fc995/html5/thumbnails/8.jpg)
Time Required to Perform a Context Switch Between Lightweight Threads in HPX5
Chart courtesy of Daniel Kogler, IU
![Page 9: HPX-5 Runtime System Overhead Timessalishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2016-sterling.pdf · • The “Runtime Hypothesis” is that the time to solution of a fixed workload](https://reader034.vdocuments.net/reader034/viewer/2022042221/5ec7efdb353fa148a72fc995/html5/thumbnails/9.jpg)
Time Required to Acquire, Prepare, and Put a Parcel into the Send Queue (8 Byte payload, no contention, jemalloc allocator)
Chart courtesy of Daniel Kogler, IU
![Page 10: HPX-5 Runtime System Overhead Timessalishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2016-sterling.pdf · • The “Runtime Hypothesis” is that the time to solution of a fixed workload](https://reader034.vdocuments.net/reader034/viewer/2022042221/5ec7efdb353fa148a72fc995/html5/thumbnails/10.jpg)
Concluding Observations
• All mechanisms shown implemented in software using conventional hardware support (e.g., RDMA)
• Key mechanisms achieved in less than 1 microsecond with some less than 100 nanoseconds.
• Measured costs distributed by as much as a factor of 2. • Sensitivities observed to cache miss rates. • Sensitivities less dependent on TLB misses. • Actual impact is dependent on incident rate determined
by real-world application workload. • Lessons learned may suggest architecture advances for
improved runtime overhead, efficiency, and scalability. 10