algorithm parallelization for multicore architectures alma overview poster.pdf · floating point to...

Floating point to fixed point
- No hardware support for FP in embedded multi-core systems
- Provide an automated floating- to fixed-point conversion tool
- Explore performance vs. accuracy trade-off in fixed-point encodings

SIMD/SWP parallelization
- Loop parallelization and layout optimization for SIMD instructions
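The performance vs. accuracy trade-off in fixed-point encodings can be illustrated with a minimal sketch, assuming a signed 16-bit word with a configurable number of fractional bits. The helper names (`to_fixed`, `to_float`, `max_quantization_error`) and the sample values are illustrative assumptions, not part of the ALMA conversion tool.

```python
def to_fixed(x, frac_bits, word_bits=16):
    """Quantize a float to a signed fixed-point integer, saturating on overflow."""
    scaled = round(x * (1 << frac_bits))
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def to_float(q, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def max_quantization_error(samples, frac_bits, word_bits=16):
    """Largest absolute round-trip error over the given samples."""
    return max(
        abs(x - to_float(to_fixed(x, frac_bits, word_bits), frac_bits))
        for x in samples
    )


# Hypothetical sample data; more fractional bits give a smaller error
# but a narrower representable range for the same word width.
samples = [0.1, -0.75, 0.333, 1.5, -1.999]
for frac_bits in (4, 8, 12):
    err = max_quantization_error(samples, frac_bits)
    print(f"{frac_bits} fractional bits: max error = {err:.6f}")
```

Without saturation, the round-trip error is bounded by half of the least significant bit, so each additional fractional bit halves the worst-case error while halving the representable range.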

[Figure: tool flow diagram. Labels: Embedded application design; Multi-core hardware design; Translation to Scilab & pragmas; Abstract hardware description (ADL); Structural hardware description; ALMA algorithm parallelization tools; Parameters for algorithm optimization; C-based code with parallel descriptions; KIT C-compiler; Recore C-compiler; Executable binary (for simulator and HW); Multi-core simulator; Feedback for optimization; Optimized application code on multi-core platform]

ALgorithm parallelization for Multicore Architectures

Faster time-to-market for embedded multicore systems with less application development effort

HIDE THE COMPLEXITY BY THE ALMA TOOL FLOW

WWW.ALMA-PROJECT.EU

KEEP IT SIMPLE FOR THE PROGRAMMER

The ALMA tool flow aims to:
- Hide the complexity of the underlying hardware from the programmer
- Provide a new approach for compiling annotated Scilab code to MPSoC architectures
- Develop a unified SystemC simulation framework for MPSoCs
- Develop algorithms and tools for:
  - High-level, platform-independent application code performance estimation and optimization
  - Identification of possible partitions, and placement and routing on different underlying architectures
  - Data type binding and data-level parallelization

ALMA Front-End Tools

Scilab Front-End (SAFE)
- Parses Scilab source code and produces a high-level intermediate representation (HLIR) expressed in C

ALMA profiler (aprof)
- Early performance estimation at the HLIR level

High-Level Optimizer (HLO)
- Applies platform-independent optimizations to the HLIR
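The poster does not say which platform-independent optimizations are applied to the HLIR. As one generic example of such an optimization, the sketch below performs constant folding on a toy expression tree. The tuple-based representation and the name `fold_constants` are assumptions made for illustration and do not reflect the actual HLIR.

```python
import operator

# Supported binary operators in this toy representation (an assumption).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(expr):
    """Recursively evaluate operator nodes whose operands are both numbers."""
    if not isinstance(expr, tuple):
        return expr  # a numeric literal or a variable name
    op, left, right = expr
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)
    return (op, left, right)


# Represents (2 * 3) + (x * (4 - 1)); "x" is an unknown variable.
expr = ("+", ("*", 2, 3), ("*", "x", ("-", 4, 1)))
folded = fold_constants(expr)
print(folded)
```

The transformation depends only on the expression itself, not on any target core or instruction set, which is what makes it platform-independent.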

Application Test Cases

Coordinator: Jürgen Becker (KIT)
Contact: [email protected]
Budget: […],200,000 €
Start Date: 01/09/2011
Duration: 36 Months

Fine-Grain Parallelism Extraction

Coarse-Grain Parallelism Extraction
- Responsible for global optimization
- Transformation of ALMA IR CFDG to Hierarchical Task Graph (HTG)
- High-level parallelization transformations to increase schedulable parallelism
- HTG partitioning to cores
- Optimal mapping and scheduling of tasks to architecture resources
- Iterative optimization by using task and communication profiling
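Mapping and scheduling tasks onto cores can be sketched as follows. The poster refers to optimal mapping; this example swaps in a simple greedy list-scheduling heuristic instead, works on a flat (non-hierarchical) task graph, and ignores communication cost. The function `list_schedule`, the task names, and the cost values are all hypothetical.

```python
def list_schedule(costs, deps, num_cores):
    """Greedy list scheduling. Returns {task: (core, start, finish)}."""
    finish = {}
    core_free = [0] * num_cores  # time at which each core becomes idle
    schedule = {}
    remaining = set(costs)
    while remaining:
        # Tasks whose predecessors have all been scheduled already.
        ready = sorted(
            t for t in remaining if all(p in finish for p in deps.get(t, ()))
        )
        if not ready:
            raise ValueError("dependency cycle in task graph")
        task = ready[0]
        # A task cannot start before all of its predecessors have finished.
        earliest = max((finish[p] for p in deps.get(task, ())), default=0)
        # Pick the core on which the task can start soonest.
        core = min(range(num_cores), key=lambda c: max(core_free[c], earliest))
        start = max(core_free[core], earliest)
        finish[task] = start + costs[task]
        core_free[core] = finish[task]
        schedule[task] = (core, start, finish[task])
        remaining.remove(task)
    return schedule


# Hypothetical task graph: execution cost per task and its predecessors.
costs = {"load": 2, "filter_a": 4, "filter_b": 3, "merge": 1}
deps = {
    "filter_a": ["load"],
    "filter_b": ["load"],
    "merge": ["filter_a", "filter_b"],
}
schedule = list_schedule(costs, deps, num_cores=2)
makespan = max(f for _, _, f in schedule.values())
print(schedule, makespan)
```

In an iterative flow, the static `costs` table would be replaced by measured task and communication profiles from a previous run, and the schedule recomputed.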

Parallel Code Generation
- Generates target-specific C code
- Maps Scilab variables to memory locations
- Expresses communication and SIMD instructions
- Instrumentation for profiling
- Uses Recore/Kahrisma C compiler
- Utilizing native MPI libraries
- Generates executable for the hardware and simulator

Application Input Language (Scilab)
- ALMA dialect of the Scilab language
- Subset of Scilab language
- Extended by a preprocessing language
  - Variables declaration
  - Static types specification
  - Maximum size of vector/matrix data type definition
- Extended by an annotation language for supporting parallelism extraction

Architecture Description Language (ADL)
- Enables target independence of the toolchain
- Used as architecture description for the simulator
- Enables design-space exploration
- Compact specification of regular MPSoC structure
- Structural specification annotated with behavioural information
- Hierarchical module description for mixed-accuracy simulation support

ADL Compiler
- Compiles and analyses the architecture description
- Extracts high-level information from ADL description (e.g. number of cores, communication bandwidth, available memories)
- Flattens hierarchical description
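Flattening a hierarchical description and extracting summary information can be sketched as below. The nested-dictionary format, the field names (`name`, `kind`, `children`, `size_kib`), and the example platform are assumptions made for illustration; they are not the ADL syntax.

```python
def flatten(module, prefix=""):
    """Return a flat list of (dotted_path, leaf_module) pairs."""
    path = prefix + module["name"]
    children = module.get("children", [])
    if not children:
        return [(path, module)]
    flat = []
    for child in children:
        flat.extend(flatten(child, path + "."))
    return flat


def summarize(module):
    """Extract high-level facts from the flattened description."""
    leaves = flatten(module)
    return {
        "cores": sum(1 for _, m in leaves if m["kind"] == "core"),
        "memories": {p: m["size_kib"] for p, m in leaves if m["kind"] == "memory"},
    }


def tile(index):
    """One tile of a regular structure: a core plus its local memory."""
    return {
        "name": f"tile{index}",
        "kind": "group",
        "children": [
            {"name": "cpu", "kind": "core"},
            {"name": "local_mem", "kind": "memory", "size_kib": 64},
        ],
    }


# Hypothetical platform: four identical tiles and one shared memory.
platform = {
    "name": "soc",
    "kind": "group",
    "children": [tile(i) for i in range(4)]
    + [{"name": "shared_mem", "kind": "memory", "size_kib": 1024}],
}

info = summarize(platform)
print(info["cores"], sorted(info["memories"]))
```

Generating the tiles from a single template function mirrors the idea of a compact specification for a regular MPSoC structure, while the flat list is the form that analysis steps can query directly.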

Multicore Architecture Test Cases
- Recore Systems’ Multi-core DSP Platforms
- KIT’s KAHRISMA Architecture

Multicore Architecture Simulation
- Simulation of ALMA target architectures
- Retargetable
  - Structure defined by ADL
  - Implementation by library of SystemC modules
- Mixed-accuracy simulation
  - Behavioural or cycle-accurate
  - For individual modules
- Enables task and communication profiling

This work is co‐funded by the European Union under the 7th Framework Programme under grant agreement ICT‐287733.

Image Processing
- Object recognition and multi-object tracking
- Use of Scale Invariant Feature Transform (SIFT)

Telecommunication
- IEEE 802.16e PHY Layer in NT x NR MIMO Configuration
- State-of-the-art WiMAX wireless communication

The ALMA Tool Chain

[Figure: tool chain diagram. Labels: Annotated Scilab Code; ADL; HLIR; ALMA IR; Annotated C Code; C Code + Back-Annotation; Binary; Profile Information; JSON; Iterative Optimization]
