A RTRM Proposal for Multi/Many-Core Platforms and Reconfigurable Applications
DESCRIPTION
Emerging multi/many-core architectures, targeting both High Performance Computing (HPC) and mobile devices, increase the interest in self-adaptive systems, where both applications and computational resources can smoothly adapt to changing working conditions. In these scenarios, an efficient Run-Time Resource Manager (RTRM) framework can provide valuable support in identifying the optimal tradeoff between the Quality-of-Service (QoS) requirements of the applications and the time-varying availability of resources. This paper introduces a new approach to the development of a system-wide RTRM featuring: a) hierarchical and distributed control, b) the exploitation of design-time information, c) a rich multi-objective optimization strategy and d) a portable and modular design based on a set of tunable policies. The framework is already available as an Open Source project, targeting a NUMA architecture and a new-generation multi/many-core research platform. First tests show benefits for the execution of parallel applications, the scalability of the proposed multi-objective resource partitioning strategy, and the sustainability of the overheads introduced by the framework.
TRANSCRIPT
A RTRM Proposal for Multi/Many-Core Platforms and Reconfigurable Applications
P. Bellasi, G. Massari and W. Fornaciari
{bellasi, massari, fornacia}@elet.polimi.it
ReCoSoC 2012
Dipartimento di Elettronica e Informazione
Politecnico di Milano
Last revision: Jul 2, 2012
The BarbequeRTRM Framework 2
Introduction: Why Run-Time Resource Management?
- Run-Time Resource Management (RTRM) is about finding the optimal tradeoff between QoS requirements and resource availability
- Target scenario
  - Shared HW resources: upcoming many-core devices are complex systems, with process variations and run-time issues
  - Mixed SW workloads: resource sharing and competition among applications with different and time-varying requirements
- Simple solutions are required
  - support for frequently changing use-cases
  - suitable for both critical and best-effort applications
Introduction: Paper Contribution
- Methodology to support system-wide run-time resource management
  - exploiting design-time information
  - hierarchical and distributed control
- BarbequeRTRM Framework
  - multi-objective optimization strategy
  - easily portable and modular design
  - run-time tunable and scalable policies
  - open source project
http://www.2parma.eu
http://bosp.dei.polimi.it
Introduction: How do we compare?
Table: resource manager proposals compared against desirable properties.
- Resource manager proposals (rows): StarPU, Binotto et al., Fu et al., ACTORS, SEEC, DistRM, BarbequeRTRM
- Desirable properties (columns): Multi-Objective, Heter. Platforms, Homog. Platforms, Reconf./Adapt., Mult. Resources, Clustered Resources, Control-Theory Model, Design-Time Exploitation, Portability
The BarbequeRTRM: System-Wide RTRM: Overall Framework
System-Wide RTRM: coarse-grained control on platform available resources
- resource accounting, partitioning and abstraction
- high-level HW events handling, e.g., critical conditions, faults...
- manage applications priorities
- power/thermal “coarse tuning”
Application-Specific RTM: fine-grained control on application allocated resources
- task ordering
- virtual processor assignment
- DVFS
- application parameters monitoring
Figure: BarbequeRTRM architecture. Components shown: Dynamic Code Generation, Task Mapping, DDM, Critical Apps, Best-Effort Apps, RTLib, BarbequeRTRM (Res Accounting, Res Partitioning, Res Abstraction), MRAPI, Platform Proxy, Platform Driver (one per supported platform), Platform Firmware; user-space and kernel layers. Legend: SW Interface (API), SW/HW Meta-data.
The BarbequeRTRM: System-Wide RTRM: Presentation Outline
Figure: BarbequeRTRM architecture, with markers for the three parts of the presentation:
1. Distributed hierarchical control
2. Resource partitioning strategy
3. Platform integration
The BarbequeRTRM: System-Wide RTRM: Distributed Hierarchical Control
Figure: BarbequeRTRM architecture, marked with outline item 1.
The Proposed Control Solution: Distributed Hierarchical Control
- Different subsystems have their own control loop (CL)
  - System-wide level (resources partitioning, system-wide optimization, ...)
  - Application-specific level (application tuning, dynamic memory management, ...)
  - Firmware/OS level (F/V control, thermal alarms, resource availability, ...)
- FF closed CL using OP and AWM
- Optimal: user-defined goal functions, including overheads
- Robust
- Adaptive
The BarbequeRTRM: System-Wide RTRM: Resource Partitioning Strategy
Figure: BarbequeRTRM architecture, marked with outline item 2.
Scheduling Policy: YaMS - A modular multi-objective scheduler
- Introduction of a new modular policy (YaMS)
  - partition available resources (R) among applications (A)
  - considering A priorities and R “residual” availabilities
- Multi-objective optimization
  - support a set of tunable goals
  - DONE: performance, overheads, congestion, fairness
  - WIP: stability, robustness, thermal and power
- Increase overall system value
  - considering discrete and tunable improvements
  - LP theory, MMKP heuristic
  - promote scheduling of some AWMs which improve optimization goals
  - demote scheduling of other AWMs which degrade solution metrics, e.g. stability and robustness
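The partitioning idea above can be illustrated with a small greedy heuristic for a multi-choice knapsack problem. This is only a sketch under stated assumptions, not the YaMS algorithm itself: the `AWM` class, the `value` metric, the single `cpu_quota` resource and the "priority first, then value density" ordering are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AWM:
    """One application working mode (hypothetical model, not the BarbequeRTRM type)."""
    name: str
    cpu_quota: int  # requested CPU bandwidth in percent, must be > 0
    value: float    # assumed benefit of running the application in this mode


def greedy_partition(apps, capacity):
    """Choose at most one AWM per application without exceeding capacity.

    apps: dict of app name -> (priority, [AWM, ...]); a lower number means
          a higher priority.
    capacity: total CPU bandwidth available, in percent.
    Returns a dict of app name -> chosen AWM; applications that do not fit
    are left out.
    """
    candidates = []
    for app, (priority, awms) in apps.items():
        for awm in awms:
            # Value per resource unit: promotes AWMs that improve the goal
            # cheaply and demotes those that are expensive for their value.
            density = awm.value / awm.cpu_quota
            candidates.append((priority, -density, app, awm))
    # Higher-priority applications first, then the densest AWMs.
    candidates.sort(key=lambda c: c[:3])

    chosen = {}
    residual = capacity
    for _, _, app, awm in candidates:
        if app in chosen or awm.cpu_quota > residual:
            continue  # app already scheduled, or AWM exceeds residual capacity
        chosen[app] = awm
        residual -= awm.cpu_quota
    return chosen


if __name__ == "__main__":
    workload = {
        "video": (0, [AWM("hi", 200, 10.0), AWM("lo", 100, 6.0)]),
        "bg": (1, [AWM("hi", 200, 8.0), AWM("lo", 100, 3.0)]),
    }
    for app, awm in greedy_partition(workload, capacity=300).items():
        print(app, awm.name, awm.cpu_quota)
```

A real multi-objective policy would combine several goal contributions (overheads, congestion, fairness, ...) into the value of each candidate; here a single number stands in for that combination.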
Scheduling Policy: YaMS - Scalability
Figure: speedup plot, annotated with +36% and +54%.
Scheduling Policy: System-Wide Controller – Overall View
BBQ Validation Policy
- enforce certain control properties: energy budget, stability and robustness
- authorize resources synchronization
Scheduling Policy: System-Wide Controller – Inner-Loop “Scheduling”
Scheduling Policy: System-Wide Controller – Inner-Loop Overheads
- Apps with 3 AWMs, 3 clusters => 9 configurations per application
- BBQ running on NSJ, 4 CPUs @ 2.5GHz (max)
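The "9 configurations per application" figure is the number of (AWM, cluster) pairs. A minimal sketch of that count follows; the AWM and cluster names are placeholders, not identifiers from the framework.

```python
from itertools import product


def enumerate_configurations(awms, clusters):
    """Return every (AWM, cluster) pair an application could be scheduled on."""
    return list(product(awms, clusters))


if __name__ == "__main__":
    configs = enumerate_configurations(
        ["AWM0", "AWM1", "AWM2"],              # placeholder AWM names
        ["cluster0", "cluster1", "cluster2"],  # placeholder cluster names
    )
    print(len(configs))  # 9
```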
The BarbequeRTRM: System-Wide RTRM: Platform Integration
Figure: BarbequeRTRM architecture, marked with outline item 3.
The BarbequeRTRM: Platform Integration Layer
- Support for generic Linux SMP/NUMA machines
  - portable solution, based on Linux Control Groups (CGroups)
  - CPUs and memory nodes assignment
  - memory amount assignment
  - CPU bandwidth quota assignment (requires kernel 3.2)
- Support for both resource monitoring and control
  - pre-configured CGroup to define BBQ controlled resources: at start, Barbeque parses the pre-configured CGroups, then takes control over these resources
  - run-time generation of new CGroups to control applications; requires RTLib integration (of course)
- Working on P2012 integration
  - new-generation many-core platform from STMicroelectronics
  - first version: 4 clusters with 16 processing elements each
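The CGroup-based assignments listed above (CPUs, memory nodes, memory amount, CPU bandwidth quota) map naturally onto the control files of the Linux cgroup v1 `cpuset`, `memory` and `cpu` controllers. The sketch below only computes the values that would be written; the function name, its parameters and the way BarbequeRTRM actually drives these files are assumptions made for illustration.

```python
def cgroup_settings(cpus, mem_nodes, mem_bytes, cpu_percent, period_us=100_000):
    """Build the cgroup v1 control-file values for one application.

    cpus, mem_nodes: iterables of CPU and NUMA memory node ids.
    mem_bytes: memory limit in bytes.
    cpu_percent: CPU bandwidth quota, where 100 means one full CPU.
    period_us: CFS enforcement period in microseconds (assumed default).
    Nothing is written to the filesystem; the caller decides where the
    cgroup hierarchy is mounted.
    """
    cpus = sorted(set(cpus))
    mem_nodes = sorted(set(mem_nodes))
    if not cpus or not mem_nodes:
        raise ValueError("at least one CPU and one memory node are required")
    if cpu_percent <= 0 or mem_bytes <= 0:
        raise ValueError("quota and memory limit must be positive")

    # CFS bandwidth control: the group may run quota_us out of every period_us.
    quota_us = period_us * cpu_percent // 100
    return {
        "cpuset.cpus": ",".join(str(c) for c in cpus),
        "cpuset.mems": ",".join(str(n) for n in mem_nodes),
        "memory.limit_in_bytes": str(mem_bytes),
        "cpu.cfs_period_us": str(period_us),
        "cpu.cfs_quota_us": str(quota_us),
    }


if __name__ == "__main__":
    settings = cgroup_settings(cpus=[2, 0], mem_nodes=[0],
                               mem_bytes=512 * 1024 * 1024, cpu_percent=150)
    for control_file, value in settings.items():
        print(f"{control_file} = {value}")
```

Keeping the computation separate from the file writes makes the mapping easy to check without root privileges or a mounted cgroup hierarchy.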
Synchronization Policy: System-Wide Controller – Outer-Loop “Synchronization”
Synchronization Policy: System-Wide Controller – Outer-Loop Overheads
- min AWM 25% CPU time, 3 clusters x 4 CPUs => max 48 syncs
- BBQ running on NSJ, 4 CPUs @ 2.5GHz (max)
- CGroups PIL, Linux kernel 3.2
  - creation overheads: ~500ms
  - update overheads: ~100ms (1/3 on quadcore i7)
- Application dependent
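The "max 48 syncs" bound follows from the platform size and the smallest AWM: 12 CPUs at 100% each, divided into 25% slices. A short sketch of that arithmetic, with a hypothetical helper name:

```python
def max_concurrent_apps(clusters, cpus_per_cluster, min_awm_cpu_percent):
    """Upper bound on applications that fit if each takes the smallest AWM."""
    total_cpu_percent = clusters * cpus_per_cluster * 100
    return total_cpu_percent // min_awm_cpu_percent


if __name__ == "__main__":
    # 3 clusters x 4 CPUs, smallest AWM uses 25% of one CPU.
    print(max_concurrent_apps(3, 4, 25))  # 48
```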
The BarbequeRTRM Framework: Power Optimizations
- x86_64 NUMA machine: 3 clusters x 4 CPUs
- BBQ running on NSJ, 4 CPUs @ 800MHz
- Initial experiments on congested workloads
  - increasing running instances of Bodytrack (PARSEC)
  - 3 AWMs: [1, 2, 4] threads
  - system-wide power measurements via the standard IPMI interface
- Power gains: 2.3-3.7%
- Time gains: 338-625%
The BarbequeRTRM Framework: Conclusions
- Framework for System-Wide RTRM
  - flexibility and scalability of the RTRM strategy, thanks to its hierarchical and distributed control structure
  - acceptable overheads for real usage scenarios, including those with variable workload
  - tunable multi-objective optimization policies to cope with several design constraints and goals, e.g., performance, power, thermal and reliability, ...
  - promising results in terms of performance improvement and power consumption reduction, for a highly parallel workload on a NUMA multi-core architecture
  - availability of a simple API, making it straightforward for programmers to take full advantage of the framework services
The BarbequeRTRM Framework: Future Work
- Optimization policies
  - robustness and stability assessment
  - improving power/energy optimizations
  - thermal and reliability management
- Platform integration
  - extended support for Android targets
  - complete the integration with the P2012 many-core platform
Thanks for your attention!
If you are interested, please check the project website for further information and keep up to date with the developments:
http://bosp.dei.polimi.it