alma project overvie alma public... · 2016-01-11 · fp7-ict-2011-7-287733 – alma project...


FP7-ICT-2011-7-287733 – ALMA Project Overview 1

FP7-ICT-2011-7-287733

ALMA Project Overview

ALMA Consortium


Outline

ALMA EU Project Motivation

Project Overview

Target Architectures

ALMA Toolchain & Development Flow

Application Test Cases

Current Status

Summary


ALMA Project ID Card

Three year project: 01/09/2011 – 30/08/2014

Funded by FP7: 3.2 Million Euros

Official web site: http://www.alma-project.eu/

Coordinator: Juergen Becker (KIT)

Technical Coordinator: Nikos Voros (TMES)


Why do we need multi-core processors?

Until ~2005, processor performance increases were driven by

Clock speed

Execution optimization

Cache

The power wall and the ILP wall led to multicore processors

Parallelism must be exposed by the programmer

(source http://www.gotw.ca/publications/concurrency-ddj.htm)


Processor Breakthroughs

A major architecture disruption: multiprocessing and specialization will have a strong impact on software

[Figure: processor power over time (1980, 1990, 2000, 2015-2020 (?)). Labels: Intel 8086, PowerQUICC II, CISC era, RISC era, architectural break point, multi-processing, processor specialization.]

Technology limitations: performance now comes from parallelism, no longer from frequency => disruption in the programming model, long-term research challenges

Domain-oriented architectures: e.g. with predictable performance to control timeliness in RT-critical applications, dynamic reconfiguration for adaptive, distributed critical architectures (multilevel RT composability)

Source: G. Edelin (Thales), 2009


Motivation

End user perspective

• Explore/Develop algorithms

• Use a simple, comfortable language
  • E.g. Matlab, Scilab, …

• Don’t want to care about
  • data types
  • parallelism

• End result
  • Performance
  • Energy efficient
  • Cost efficient
  • Fast development time

Target architecture perspective

• Multi-Processor System-on-Chip

• Parallel processor cores

• Parallel programming model
  • E.g. pthreads, MPI, OpenMP

• Parallelism within the processor cores
  • Single Instruction Multiple Data
  • Very Long Instruction Word

• Native data types
  • E.g. 32-bit integer
  • Other data types perform inefficiently


ALMA in a Nutshell

Hide the complexity of the underlying hardware from the end user

ALMA will develop an approach for compiling annotated Scilab code to MPSoC architectures

Algorithms and tools for high-level, platform-independent application code performance estimation and optimization

Identification of possible partitions and their placement on different resources of the underlying architectures

Data type binding and data parallelization to exploit data-level parallelism

Develop a unified SystemC simulation framework to provide an environment for simulating MPSoCs

Two state-of-the-art architectures provided by RECORE and KIT

Net result: smaller application development time/effort and faster time-to-market


Objectives

Extend Scilab for optimization on high-level system models

Develop a parallelization and optimization environment

Employ and extend two different architectures

Parallel code generation

Parallel code simulation


Challenges for Compiling Scilab to MPSoCs

Scilab (Matlab-like) programming language

Dynamic typing (scalars, vectors, matrices)

Pointer-free, i.e. no memory aliasing problems

End users typically use floating-point data types

Natural parallelism within vector operations

MPSoC target architectures

Exploit coarse-grain parallelism (task-level)

Distributed memory

Exploit fine-grain parallelism (instruction-level)
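
As a rough illustration of how the natural parallelism of a vector operation could be spread over cores with distributed memory, the following Python sketch splits an element-wise addition into contiguous chunks. It is not ALMA output: the function names and the chunking rule are assumptions made for this example, and the "cores" are simulated by a sequential loop.

```python
def split_ranges(n, cores):
    """Split indices 0..n-1 into `cores` contiguous, nearly equal ranges."""
    base, extra = divmod(n, cores)
    ranges, start = [], 0
    for c in range(cores):
        size = base + (1 if c < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def parallel_add(a, b, cores):
    """Element-wise a + b, computed chunk by chunk as separate cores would.

    The loop runs sequentially here; each iteration stands for one core.
    """
    assert len(a) == len(b)
    result = [0] * len(a)
    for lo, hi in split_ranges(len(a), cores):
        # Each core works on private copies of its slice (distributed memory).
        local_a, local_b = a[lo:hi], b[lo:hi]
        local_c = [x + y for x, y in zip(local_a, local_b)]
        result[lo:hi] = local_c  # gather step back to the full vector
    return result
```

Because the chunks do not overlap and the source language is pointer-free, no chunk can alias another, which is what makes this kind of split safe to automate.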


ALMA Architectures (1/2): Recore X2014

Scalability by virtue of Packet-switched Network-on-Chip

Distributed memories & I/O

Distributed control

Distributed processing cores

Xentium® processing tile

Fixed-point DSP processing

10-issue VLIW processor

SIMD capability

Streaming communication services

Reconfigurability

Smart memory tile (RAM/FIFO)

Separate applications from each other

Guarantee QoS

Fault tolerant application mapping


ALMA Architectures (2/2): KIT Kahrisma

[Figure: KAHRISMA architecture block diagram. Labels: Processor Control Unit, Main Memory, Context Memory, Cache Subsystem, Control Flow Tiles, Rename Tiles, Instruction Cache Tiles, Instruction pre-processing, EDPE Array. Example instances: DSP Instance I, DSP Instance II, and processor instances with 2-issue, 4-issue and 6-issue VLIW ISAs.]

Dynamically reconfigurable MPSoC

Modules can be reconfigured to processors or DSPs

Dynamic clustered VLIW processor instances

Local scratchpad memory

Non-coherent access to main memory

Communication between cores through a network


ALMA Development Flow (overview)

[Figure: ALMA development flow. Labels: embedded application design; multi-core hardware design; translation to Scilab & annotations; abstract hardware description (ADL); structural hardware description; parameters for algorithm optimization; ALMA algorithm parallelization tools; C-based code with parallel descriptions; KIT C-compiler; Recore C-compiler; executable binary (for simulator and HW); multi-core simulator; feedback for optimization; optimized application code on multi-core platform.]


Input for the ALMA tools

ALMA dialect of the Scilab language

Subset of Scilab language

Extended by a preprocessing language

Variables declaration

Static types specification

Maximum size of vector and matrix data types definition

Extended by an annotation language for supporting parallelism extraction

[Figure: ALMA toolchain overview with an iterative optimization loop. Inputs: applications (telecommunication, image processing) as annotated Scilab code; architecture description (ADL) with ADL compiler. ALMA front-end tools: Scilab Front-End (SAFE), High-Level Optimizer, Source-Level Profiler. GeCoS framework: fine-grain parallelism extraction, coarse-grain parallelism extraction, parallel code generation. Target-specific compilation: Kahrisma compiler, Recore compiler. ALMA multi-core simulator. ALMA architectures: Kahrisma, Recore. Artifacts exchanged: HLIR, ALMA IR, annotated C code, C code + back-annotation, binary, profile information (JSON). Legend: ADL, application code, information.]


ALMA Front-end tools

Scilab Front-End (SAFE)

Parses Scilab source code and produces high level intermediate representation (HLIR) expressed in C

ALMA profiler (aprof)

Early performance estimation at the HLIR level

High-Level Optimizer (HLO)

Applies platform-independent optimizations to the HLIR


Parallelization Tools (Fine grain extraction)


Floating point to fixed point

No hardware support for FP in embedded multi-core systems

Provide an automated floating-point to fixed-point conversion tool.

Explore performance/accuracy trade-off in fixed-point encodings

SIMD/SWP parallelization

Loop parallelization and layout optimization for SIMD ISA.
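
The accuracy side of the fixed-point trade-off can be sketched in a few lines of Python. This is a generic signed fixed-point quantizer with saturation, not the ALMA conversion tool; the 16-bit word size, the function names and the sample values are assumptions chosen for illustration.

```python
def to_fixed(x, frac_bits, word_bits=16):
    """Quantize x to a signed fixed-point integer, saturating on overflow."""
    scaled = round(x * (1 << frac_bits))
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def to_float(q, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def max_abs_error(values, frac_bits, word_bits=16):
    """Largest round-trip error over a set of sample values."""
    return max(
        abs(v - to_float(to_fixed(v, frac_bits, word_bits), frac_bits))
        for v in values
    )
```

More fractional bits reduce the quantization error but shrink the representable range within a fixed word size: with 14 fractional bits in a 16-bit word, a value such as 3.0 already saturates. A conversion tool has to balance these two effects for each variable.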


Parallelization Tools (Coarse-grain extraction)

Coarse-grain parallelism extraction and optimization

Responsible for the global optimization

Transformation of ALMA IR CFDG to Hierarchical Task Graph (HTG)

Resource availability from ADL

HTG partitioning to cores

Optimal mapping and scheduling of tasks to architecture resources

Exploits profiling information from the simulator for better resource usage estimation
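
To make the mapping and scheduling step concrete, here is a minimal Python sketch of greedy list scheduling for a flat task graph on identical cores. It is a simple heuristic rather than the optimal method named on the slide, it ignores communication costs and task hierarchy, and the task names and cost values are hypothetical.

```python
def list_schedule(tasks, deps, cores):
    """Greedy list scheduling of a task graph on identical cores.

    tasks: dict mapping task name -> execution cost (hypothetical estimates)
    deps:  dict mapping task name -> list of predecessor task names
    cores: number of identical cores
    Returns a dict mapping task name -> (core, start, finish).
    """
    core_free = [0] * cores  # time at which each core becomes idle
    schedule = {}
    remaining = dict(tasks)
    while remaining:
        ready = sorted(
            t for t in remaining
            if all(p in schedule for p in deps.get(t, []))
        )
        if not ready:
            raise ValueError("dependency cycle in task graph")
        # Pick the most expensive ready task first (ties: alphabetical).
        task = max(ready, key=lambda t: remaining[t])
        # A task cannot start before all of its predecessors have finished.
        earliest = max((schedule[p][2] for p in deps.get(task, [])), default=0)
        # Choose the core on which the task can start soonest.
        core = min(range(cores), key=lambda c: max(core_free[c], earliest))
        start = max(core_free[core], earliest)
        finish = start + remaining.pop(task)
        schedule[task] = (core, start, finish)
        core_free[core] = finish
    return schedule
```

In this sketch the cost values play the role of the profiling information fed back from the simulator: better estimates lead to better placement decisions in the next iteration.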


Parallel platform code generation


Generates target-specific C code

Maps Scilab variables to memory locations

Expresses communication

Expresses SIMD instructions as intrinsics

Uses Recore/Kahrisma C compiler

Exploits ILP by VLIW compilation

Generates executable for the hardware and simulator


Multicore architecture simulation


Simulation of ALMA target architectures

Retargetable

Structure defined by ADL

Implementation by library of SystemC modules

Mixed-accuracy simulation

Behavioural or cycle-accurate

For individual modules (processor core, memory subsystem, network)

Collect profiling and tracing information


ALMA Architecture Description Language (ADL)


Architecture Description Language (ADL)

Tailored to the requirements of ALMA

1. Enables target independence of the compilation toolchain

2. Used as architecture description for the simulator

3. Enables design-space exploration

Compact specification of regular MPSoC structures by for and if constructs

Structural specification annotated with behavioural information
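
The idea of describing a regular MPSoC structure compactly with for and if constructs can be mimicked in Python. The sketch below is not ALMA ADL syntax: the mesh topology, the tile naming scheme and the rule that column 0 holds memory tiles are assumptions made for this example.

```python
def build_mesh(rows, cols):
    """Describe a rows x cols mesh of tiles with links between neighbours."""
    tiles = []
    links = []
    for r in range(rows):
        for c in range(cols):
            # Hypothetical rule: column 0 holds memory tiles, the rest are cores.
            kind = "memory" if c == 0 else "core"
            tiles.append({"name": f"tile_{r}_{c}", "kind": kind})
            if c + 1 < cols:  # no link past the right border
                links.append((f"tile_{r}_{c}", f"tile_{r}_{c + 1}"))
            if r + 1 < rows:  # no link past the bottom border
                links.append((f"tile_{r}_{c}", f"tile_{r + 1}_{c}"))
    return {"tiles": tiles, "links": links}
```

Changing only the two parameters yields a different platform instance, which is the property that makes such a description useful for design-space exploration.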


Test Case (1/2): Telecommunications

IEEE 802.16e PHY Layer in NT x NR MIMO Configuration

Typical example of a state-of-the-art wireless communication system

Application requirements impose hard real-time constraints.

Design time must follow shrinking time-to-market.

[Figure: block diagram of the IEEE 802.16e base station (BS) transmitter and receiver across MAC and PHY, with blocks marked as ALMA 1st Increment or ALMA 2nd Increment. BS Tx (Tx 1 … Tx NT) labels: Data SDUs, UL/DL Scheduler, UL/DL Frame Mapper, PDU Generation, MAC-PHY I/F, Downlink Frame Construction, MAC/PHY Control, Randomizer, FEC Encoder, Interleaver, Constellation Mapping, Symbol Construction, S-T Coding, IFFT + Cyclic Prefix, IFFT + Cyclic Prefix + Preamble. BS Rx (Rx 1 … Rx NR) labels: - Cyclic Prefix, FFT, Channel Estimator, Equalizer, Diversity Combiner, Symbol Deconstruction, Constellation Demapping, Deinterleaver, FEC Decoder, Derandomizer, Uplink Frame Deconstruction, MAC-PHY I/F, SDU Generation, Data SDUs.]


Test Case (2/2): Image Processing

Feature-based algorithm for object recognition and multi-object tracking

Use of Scale Invariant Feature Transform (SIFT)

The final goal is to run such applications in smart cameras


Summary

ALMA Goal: Hide the complexity of the underlying hardware from the end user

ALMA will develop an approach for compiling annotated Scilab code to MPSoCs

ALMA toolchain components

Front-end tools (Scilab parser, high-level optimizer, profiler)

Fine-grain parallelism extraction

Coarse-grain parallelism extraction

SystemC multi-core simulator

ALMA toolchain is kept platform independent by a novel Architecture Description Language

Two state-of-the-art architectures provided by RECORE and KIT

Evaluated by two test cases from the Telecommunications and Image Processing domains


Thank you!