ALgorithm parallelization for Multicore Architectures (ALMA): Overview Poster



Floating point to fixed point
  No hardware support for FP in embedded multi-core systems
  Provide an automated floating- to fixed-point conversion tool
SIMD/SWP parallelization
  Loop parallelization and layout optimization for SIMD instructions
Explore performance vs. accuracy trade-off in fixed-point encodings

[Figure: ALMA design flow. Labels: Embedded application design; Multi-core hardware design; Translation to Scilab & pragmas; Abstract hardware description (ADL); ALMA algorithm parallelization tools; Parameters for algorithm optimization; C-based code with parallel descriptions; KIT C-compiler; Recore C-compiler; Executable binary (for simulator and HW); Multi-core simulator; Structural hardware description; Feedback for optimization; Optimized application code on multi-core platform]

ALgorithm parallelization for Multicore Architectures

Faster time-to-market for embedded multicore systems with less application development effort

HIDE THE COMPLEXITY BY THE ALMA TOOL FLOW

WWW.ALMA-PROJECT.EU

KEEP IT SIMPLE FOR THE PROGRAMMER

The ALMA ToolFlow aims to:
  Hide the complexity of the underlying hardware from the programmer
  Provide a new approach for compiling annotated Scilab code to MPSoC architectures
  Develop a unified SystemC simulation framework for MPSoCs
  Develop algorithms and tools for:
    High-level, platform-independent application code performance estimation and optimization
    Identification of possible partitions and placing & routing on different underlying architectures
    Data type binding and data-level parallelization

ALMA Front-End Tools
  Scilab Front-End (SAFE)
    Parses Scilab source code and produces a high-level intermediate representation (HLIR) expressed in C
  ALMA profiler (aprof)
    Early performance estimation at the HLIR level
  High-Level Optimizer (HLO)
    Applies platform-independent optimizations to the HLIR

Application Test Cases

Coordinator: Jürgen Becker (KIT)
Contact: [email protected]
Budget: 3,200,000 €
Start Date: 01/09/2011
Duration: 36 months

Fine-Grain Parallelism Extraction

Coarse-Grain Parallelism Extraction
  Responsible for global optimization
  Transformation of ALMA IR CFDG to Hierarchical Task Graph (HTG)
  High-level parallelization transformations to increase schedulable parallelism
  HTG partitioning to cores
  Optimal mapping and scheduling of tasks to architecture resources
  Iterative optimization by using task and communication profiling
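The mapping and scheduling step listed above can be sketched as follows. This is a greedy list-scheduling heuristic, not the optimal method the poster refers to and not ALMA's algorithm. The task names, costs, and the fixed `comm_cost` between cores are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def list_schedule(cost, deps, n_cores, comm_cost=1):
    """Greedy list scheduling: place each task on the core where it finishes earliest.

    cost: task -> execution time; deps: task -> set of predecessor tasks.
    Returns task -> (core, start, finish).
    """
    core_free = [0] * n_cores  # time at which each core becomes idle
    placed = {}
    for task in TopologicalSorter(deps).static_order():
        best = None
        for core in range(n_cores):
            # A task may start once all inputs have arrived; inputs produced
            # on another core pay a fixed communication delay.
            ready = max(
                (
                    placed[p][2] + (comm_cost if placed[p][0] != core else 0)
                    for p in deps.get(task, ())
                ),
                default=0,
            )
            start = max(core_free[core], ready)
            finish = start + cost[task]
            if best is None or finish < best[2]:
                best = (core, start, finish)
        placed[task] = best
        core_free[best[0]] = best[2]
    return placed


def makespan(schedule):
    """Finish time of the last task."""
    return max(finish for _, _, finish in schedule.values())


# Hypothetical task graph: one producer, two independent workers, one consumer.
cost = {"load": 2, "fft_a": 4, "fft_b": 4, "merge": 1}
deps = {"fft_a": {"load"}, "fft_b": {"load"}, "merge": {"fft_a", "fft_b"}}

one_core = list_schedule(cost, deps, n_cores=1)
two_cores = list_schedule(cost, deps, n_cores=2)
print(makespan(one_core), makespan(two_cores))
```

In an iterative flow such as the one described above, the static `cost` and `comm_cost` values would be replaced by figures obtained from task and communication profiling.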

Parallel Code Generation
  Generates target-specific C code
    Maps Scilab variables to memory locations
    Expresses communication and SIMD instructions
    Instrumentation for profiling
  Uses Recore/Kahrisma C compiler
    Utilizing native MPI libraries
    Generates executable for the hardware and simulator

Application Input Language (Scilab)
  ALMA dialect of the Scilab language
    Subset of Scilab language
    Extended by a preprocessing language
      Variables declaration
      Static types specification
      Maximum size of vector/matrix data type definition
    Extended by an annotation language for supporting parallelism extraction

Architecture Description Language (ADL)
  Enables target independence of the toolchain
  Used as architecture description for the simulator
  Enables design-space exploration
  Compact specification of regular MPSoC structure
  Structural specification annotated with behavioural information
  Hierarchical module description for mixed-accuracy simulation support

ADL Compiler
  Compiles and analyses the architecture description
  Extracts high-level information from the ADL description (e.g. number of cores, communication bandwidth, available memories)
  Flattens the hierarchical description
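The flattening and information-extraction steps above can be illustrated with a small sketch. The nested-dictionary layout below is a hypothetical stand-in, not the real ADL syntax, and the field names (`kind`, `count`, `size_kib`) are assumptions. A `count` field models the compact specification of a regular structure by replicating a child module.

```python
def flatten(module, prefix=""):
    """Yield (path, leaf_module) pairs for every leaf of a nested description."""
    path = f"{prefix}/{module['name']}" if prefix else module["name"]
    children = module.get("children", [])
    if not children:
        yield path, module
        return
    for child in children:
        count = child.get("count", 1)
        for i in range(count):
            # Replicated children get an index suffix so each instance has a unique path.
            name = f"{child['name']}{i}" if count > 1 else child["name"]
            yield from flatten(dict(child, name=name), path)


def summarize(root):
    """Extract high-level facts from the flattened description."""
    leaves = list(flatten(root))
    return {
        "cores": sum(1 for _, m in leaves if m["kind"] == "core"),
        "memory_kib": sum(m.get("size_kib", 0) for _, m in leaves if m["kind"] == "memory"),
        "paths": [p for p, _ in leaves],
    }


# Hypothetical platform: two identical tiles plus one shared memory.
soc = {
    "name": "soc",
    "kind": "group",
    "children": [
        {
            "name": "tile",
            "kind": "group",
            "count": 2,
            "children": [
                {"name": "cpu", "kind": "core", "count": 2},
                {"name": "l1", "kind": "memory", "size_kib": 32},
            ],
        },
        {"name": "shared", "kind": "memory", "size_kib": 256},
    ],
}

info = summarize(soc)
print(info["cores"], info["memory_kib"])
print(info["paths"])
```

A summary of this kind is what later stages such as partitioning would consume, so they never need to walk the hierarchy themselves.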

Multicore Architecture Test Cases
  Recore Systems’ Multi-core DSP Platforms
  KIT’s KAHRISMA Architecture

Multicore Architecture Simulation
  Simulation of ALMA target architectures
  Retargetable
    Structure defined by ADL
    Implementation by library of SystemC modules
  Mixed-accuracy simulation
    Behavioural or cycle-accurate
    For individual modules
  Enables task and communication profiling
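The idea of choosing behavioural or cycle-accurate models per module can be sketched as below. The poster states that the simulator is built from SystemC modules. This sketch uses Python instead, and the `Memory` class, its two banks, and the bank busy time are hypothetical. Both models return the same data and differ only in the cycle count they report.

```python
class Memory:
    """Hypothetical banked memory with two interchangeable timing models."""

    BANKS = 2
    BUSY = 2  # assumed number of cycles a bank stays busy after an access

    def __init__(self, cycle_accurate: bool):
        self.cycle_accurate = cycle_accurate  # chosen per module instance
        self.data = {}

    def read_all(self, addresses):
        """Return (values, cycles) for a sequence of reads."""
        values = [self.data.get(a, 0) for a in addresses]
        if not self.cycle_accurate:
            # Behavioural model: one cycle per access, bank conflicts ignored.
            return values, len(addresses)
        # Cycle-accurate model: one access issues per cycle, and an access
        # stalls until its bank is free again.
        now = 0
        bank_free = [0] * self.BANKS
        for a in addresses:
            bank = a % self.BANKS
            now = max(now, bank_free[bank])
            bank_free[bank] = now + self.BUSY
            now += 1
        return values, now


def simulate(addresses, cycle_accurate: bool):
    """Fill a memory with known values and read them back under one timing model."""
    mem = Memory(cycle_accurate)
    for a in addresses:
        mem.data[a] = a * 10
    return mem.read_all(addresses)


conflicting = [0, 2, 4, 6]  # every access hits bank 0
print(simulate(conflicting, cycle_accurate=False))
print(simulate(conflicting, cycle_accurate=True))
```

The behavioural model is cheaper to evaluate but misses the stalls, which is the kind of difference that matters when the simulator is used for task and communication profiling.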

This work is co‐funded by the European Union under the 7th Framework Programme under grant agreement ICT‐287733.

Image Processing
  Object recognition and multi-object tracking
  Use of Scale Invariant Feature Transform (SIFT)
Telecommunication
  IEEE 802.16e PHY Layer in NT x NR MIMO Configuration
  State-of-the-art WiMAX wireless communication

The ALMA Tool Chain

[Figure: ALMA tool chain. Labels: Annotated Scilab Code; ADL; HLIR; ALMA IR; Annotated C Code; C Code + Back-Annotation; Binary; Profile Information; JSON; Iterative Optimization]