SIMD/SWP parallelization
  Loop parallelization and layout optimization for SIMD instructions
Floating point to fixed point
  No hardware support for floating point (FP) in embedded multi-core systems
  Provides an automated floating-point to fixed-point conversion tool
  Explores the performance vs. accuracy trade-off in fixed-point encodings
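The performance vs. accuracy trade-off in fixed-point encodings can be illustrated with a minimal sketch. This is not the ALMA conversion tool. The helper names (`to_fixed`, `to_float`, `fixed_mul`, `max_abs_error`), the 16-bit word length and the sample values are all assumptions chosen for illustration.

```python
def to_fixed(x, frac_bits, word_bits=16):
    """Quantize x to a signed fixed-point integer, saturating on overflow."""
    scale = 1 << frac_bits
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, int(round(x * scale))))


def to_float(q, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_mul(a, b, frac_bits):
    """Multiply two fixed-point values; the raw product carries
    2 * frac_bits fractional bits, so shift back by frac_bits."""
    return (a * b) >> frac_bits


def max_abs_error(values, frac_bits, word_bits=16):
    """Worst-case quantization error over a set of sample values."""
    return max(
        abs(v - to_float(to_fixed(v, frac_bits, word_bits), frac_bits))
        for v in values
    )


# Hypothetical sample data: more fractional bits lower the error but
# leave fewer integer bits before saturation.
samples = [0.1, -0.75, 0.333, 1.5, -2.25]
for frac_bits in (4, 8, 12):
    print(frac_bits, max_abs_error(samples, frac_bits))
```

With no saturation, the rounding error stays within half of one least significant bit, i.e. 2^-(frac_bits + 1). The choice of `frac_bits` is therefore a direct accuracy knob, while the word length bounds the representable range.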
[Figure: ALMA tool flow overview. Labels: Embedded application design; Multi-core hardware design; Translation to Scilab & pragmas; Abstract hardware description (ADL); Structural hardware description; ALMA algorithm parallelization tools; Parameters for algorithm optimization; C-based code with parallel descriptions; KIT C-compiler; Recore C-compiler; Executable binary (for simulator and HW); Multi-core simulator; Feedback for optimization; Optimized application code on multi-core platform]
ALgorithm parallelization for Multicore Architectures
Faster time-to-market for embedded multicore systems with less application development effort
HIDE THE COMPLEXITY BY THE ALMA TOOL FLOW
WWW.ALMA-PROJECT.EU
KEEP IT SIMPLE FOR THE PROGRAMMER
The ALMA tool flow aims to:
  Hide the complexity of the underlying hardware from the programmer
  Provide a new approach for compiling annotated Scilab code to MPSoC architectures
  Develop a unified SystemC simulation framework for MPSoCs
  Develop algorithms and tools for:
    High-level, platform-independent application code performance estimation and optimization
    Identification of possible partitions, and placement and routing on different underlying architectures
    Data type binding and data-level parallelization
ALMA Front-End Tools
  Scilab Front-End (SAFE)
    Parses Scilab source code and produces a high-level intermediate representation (HLIR) expressed in C
  ALMA profiler (aprof)
    Early performance estimation at the HLIR level
  High-Level Optimizer (HLO)
    Applies platform-independent optimizations to the HLIR
Application Test Cases
Coordinator: Jürgen Becker (KIT)
Contact: [email protected]
Budget: …,200,000 €
Start Date: 01/09/2011
Duration: 36 Months
Fine-Grain Parallelism Extraction
Coarse-Grain Parallelism Extraction
  Responsible for global optimization
  Transformation of ALMA IR CFDG to Hierarchical Task Graph (HTG)
  High-level parallelization transformations to increase schedulable parallelism
  HTG partitioning to cores
  Optimal mapping and scheduling of tasks to architecture resources
  Iterative optimization by using task and communication profiling
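Mapping and scheduling tasks of a task graph onto cores can be sketched with a simple greedy list scheduler. This is an illustrative heuristic, not the ALMA optimizer: it ignores communication cost and does not guarantee an optimal schedule. The task names, costs and the `list_schedule` function are assumptions.

```python
def list_schedule(costs, deps, num_cores):
    """Greedy list scheduling.

    costs: {task: execution time}, deps: {task: [predecessor tasks]}.
    Returns {task: (core, start, finish)}.
    """
    pending = {t: set(deps.get(t, ())) for t in costs}
    core_free = [0] * num_cores  # time at which each core becomes idle
    schedule = {}
    while pending:
        # A task is ready once all of its predecessors are scheduled.
        ready = sorted(
            t for t, preds in pending.items()
            if all(p in schedule for p in preds)
        )
        if not ready:
            raise ValueError("dependency cycle in task graph")
        task = max(ready, key=lambda t: costs[t])  # longest task first
        data_ready = max((schedule[p][2] for p in pending[task]), default=0)
        # Choose the core on which the task can start earliest.
        core = min(range(num_cores),
                   key=lambda c: max(core_free[c], data_ready))
        start = max(core_free[core], data_ready)
        schedule[task] = (core, start, start + costs[task])
        core_free[core] = start + costs[task]
        del pending[task]
    return schedule


# Hypothetical task graph: one load, two independent transforms, one merge.
costs = {"load": 2, "fft_a": 4, "fft_b": 4, "merge": 1}
deps = {"fft_a": ["load"], "fft_b": ["load"], "merge": ["fft_a", "fft_b"]}

two_cores = list_schedule(costs, deps, 2)
one_core = list_schedule(costs, deps, 1)
makespan_two = max(finish for _, _, finish in two_cores.values())
makespan_one = max(finish for _, _, finish in one_core.values())
print(makespan_one, makespan_two)
```

In an iterative flow, the static `costs` table would be replaced by measured task and communication profiles from the simulator, and the schedule recomputed.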
Parallel Code Generation
  Generates target-specific C code
  Maps Scilab variables to memory locations
  Expresses communication and SIMD instructions
  Instrumentation for profiling
  Uses Recore/Kahrisma C compiler
  Utilizing native MPI libraries
  Generates executable for the hardware and simulator
Application Input Language (Scilab)
  ALMA dialect of the Scilab language
  Subset of Scilab language
  Extended by a preprocessing language
    Variables declaration
    Static types specification
    Maximum size of vector/matrix data type definition
  Extended by an annotation language for supporting parallelism extraction
Architecture Description Language (ADL)
  Enables target independence of the toolchain
  Used as architecture description for the simulator
  Enables design-space exploration
  Compact specification of regular MPSoC structure
  Structural specification annotated with behavioural information
  Hierarchical module description for mixed-accuracy simulation support
ADL Compiler
  Compiles and analyses the architecture description
  Extracts high-level information from ADL description (e.g. number of cores, communication bandwidth, available memories)
  Flattens hierarchical description
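Flattening a hierarchical description and extracting high-level information from it can be sketched as follows. The nested-dictionary format here is purely hypothetical and is not the ALMA ADL syntax. The field names (`name`, `kind`, `count`, `bytes`, `children`) and the `flatten` function are assumptions made for illustration.

```python
def flatten(module, prefix=""):
    """Yield (hierarchical path, leaf module) pairs, expanding
    replicated children into individually numbered instances."""
    path = prefix + module["name"]
    children = module.get("children", [])
    if not children:
        yield path, module
        return
    for child in children:
        count = child.get("count", 1)
        for i in range(count):
            name = f"{child['name']}{i}" if count > 1 else child["name"]
            yield from flatten(dict(child, name=name), path + ".")


# Hypothetical regular MPSoC: two tiles, each with two cores and a
# local memory, plus one shared memory.
platform = {
    "name": "mpsoc",
    "children": [
        {"name": "tile", "count": 2, "children": [
            {"name": "core", "kind": "core", "count": 2},
            {"name": "l1", "kind": "memory", "bytes": 32 * 1024},
        ]},
        {"name": "shared", "kind": "memory", "bytes": 1024 * 1024},
    ],
}

flat = dict(flatten(platform))
num_cores = sum(1 for m in flat.values() if m.get("kind") == "core")
memory_bytes = sum(m.get("bytes", 0) for m in flat.values())
print(sorted(flat))
print(num_cores, memory_bytes)
```

The `count` field stands in for the compact specification of a regular structure: one tile description expands into several instances during flattening.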
Multicore Architecture Test Cases
  Recore Systems’ Multi-core DSP Platforms
  KIT’s KAHRISMA Architecture
Multicore Architecture Simulation
  Simulation of ALMA target architectures
  Retargetable
    Structure defined by ADL
    Implementation by library of SystemC modules
  Mixed-accuracy simulation
    Behavioural or cycle-accurate
    For individual modules
  Enables task and communication profiling
This work is co‐funded by the European Union under the 7th Framework Programme under grant agreement ICT‐287733.
Image Processing
  Object recognition and multi-object tracking
  Use of Scale Invariant Feature Transform (SIFT)
Telecommunication
  IEEE 802.16e PHY Layer in NT x NR MIMO configuration
  State-of-the-art WiMAX wireless communication
The ALMA Tool Chain
[Figure: the ALMA tool chain. Labels: Annotated Scilab Code; ADL; HLIR; ALMA IR; Annotated C Code; C Code + Back-Annotation; Binary; Profile Information; JSON; Iterative Optimization]