hpx-5 runtime system overhead...

HPX-5 Runtime System Overhead Times Thomas Sterling Chief Scientist, CREST Professor, School of Informatics and Computing Indiana University April 27, 2016 35 th Salishan Meeting: Random Access Session

Upload: others

Post on 21-May-2020

2 views

Category:

Documents

0 download

Report

Download

Embed Size (px):

TRANSCRIPT

HPX-5 Runtime System Overhead Times

Thomas Sterling

Chief Scientist, CREST Professor, School of Informatics and Computing

Indiana University April 27, 2016

35th Salishan Meeting: Random Access Session

Page 2: HPX-5 Runtime System Overhead Timessalishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2016-sterling.pdf · • The “Runtime Hypothesis” is that the time to solution of a fixed workload

Introduction

•  The “Runtime Hypothesis” is that the time to solution of a fixed workload can be reduced by adding more work – a paradox.

•  Driven by promise of exploitation of compute time information to guide application execution to achieve efficiency and scalability improvement.

•  An example: HPX-5 runtime based on the experimental ParalleX execution model to address SLOWER performance model parameters (e.g., overhead).

•  What are the costs of runtime mechanisms? They impose bounds on parallelism granularity and in other ways get in the way.

•  Presented here: quantitative 2

Page 3: HPX-5 Runtime System Overhead Timessalishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2016-sterling.pdf · • The “Runtime Hypothesis” is that the time to solution of a fixed workload

Page 4: HPX-5 Runtime System Overhead Timessalishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2016-sterling.pdf · • The “Runtime Hypothesis” is that the time to solution of a fixed workload

HPX-5 Runtime Software Architecture

LCOs

APPLICATION LAYER

Lulesh LibPXGL

N-Body

FMM

GLOBAL ADDRESS SPACE

Source

Node Address

Dest

Source directly converts a virtual address into a node/address pair and sends a message to the Dest

PGAS AGAS PARCELS

PROCESSES

SCHEDULER

Worker threads

ISIR PWC

OPERATING SYSTEM

HARDWARE

Cores

NETWORK

NETWORK LAYER

Courtesy of Jayashree Candadai, IU

Page 5: HPX-5 Runtime System Overhead Timessalishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2016-sterling.pdf · • The “Runtime Hypothesis” is that the time to solution of a fixed workload

ParalleX Computation Complex

-- Runtime Aware -- Logically Active -- Physically Active

Page 6: HPX-5 Runtime System Overhead Timessalishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2016-sterling.pdf · • The “Runtime Hypothesis” is that the time to solution of a fixed workload

Time Required to Check if Memory Address is Local or Remote in HPX5

Chart courtesy of Daniel Kogler, IU

Page 7: HPX-5 Runtime System Overhead Timessalishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2016-sterling.pdf · • The “Runtime Hypothesis” is that the time to solution of a fixed workload

Time Required to Create a New Lightweight Thread in HPX5

Chart courtesy of Daniel Kogler, IU

Page 8: HPX-5 Runtime System Overhead Timessalishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2016-sterling.pdf · • The “Runtime Hypothesis” is that the time to solution of a fixed workload

Time Required to Perform a Context Switch Between Lightweight Threads in HPX5

Chart courtesy of Daniel Kogler, IU

Page 9: HPX-5 Runtime System Overhead Timessalishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2016-sterling.pdf · • The “Runtime Hypothesis” is that the time to solution of a fixed workload

Time Required to Acquire, Prepare, and Put a Parcel into the Send Queue (8 Byte payload, no contention, jemalloc allocator)

Chart courtesy of Daniel Kogler, IU

Page 10: HPX-5 Runtime System Overhead Timessalishan.ahsc-nm.org/uploads/4/9/7/0/49704495/2016-sterling.pdf · • The “Runtime Hypothesis” is that the time to solution of a fixed workload

Concluding Observations

•  All mechanisms shown implemented in software using conventional hardware support (e.g., RDMA)

•  Key mechanisms achieved in less than 1 microsecond with some less than 100 nanoseconds.

•  Measured costs distributed by as much as a factor of 2. •  Sensitivities observed to cache miss rates. •  Sensitivities less dependent on TLB misses. •  Actual impact is dependent on incident rate determined

by real-world application workload. •  Lessons learned may suggest architecture advances for

improved runtime overhead, efficiency, and scalability. 10

ArcGIS Runtime SDKs - Recent Proceedingsproceedings.esri.com/library/userconf/devsummit15/papers/dev_int... · ArcGIS Runtime SDKs: ... build high performance ArcGIS Runtime Apps

Monitoring: Runtime Behavior & Software Development Process · Monitoring runtime behavior Nástroje pro vývoj software Monitoring: Runtime Behavior & Development Process 3 Goals

Android + Runtime Environment

ArcGIS Runtime SDK for .NET: Building Windows Apps · 2018-05-09 · ArcGIS Runtime SDK for .NET Architecture Diagram Win32 Windows Runtime iOS Android ArcGIS Runtime Core (C++) ArcGIS

Runtime Revolution

Objective-C Runtime Manipulation - GitHub Pageswbyoung.github.io/objective_c_runtime.pdf · Objective-C Runtime Manipulation Whitney Young FadingRed @wbyoung wbyoung.github.com. Runtime

Objective c runtime

Manual RunTime

The “MIND” Scalable PIM Architectureauthors.library.caltech.edu/28210/1/Cetraro-Sterling.pdf · MIND reflects a global shared memory model without cache coherence. Any element

Runtime Environments What is in the memory?. 2301373 Runtime Environment2 Outline Memory organization during program execution Static runtime environments