ASAP 2005, Samos, Greece, July 23-25, 2005

Page 1: Exploring Design Space of VLIW Architectures

Giuseppe Ascia, Vincenzo Catania, Maurizio Palesi and Davide Patti

Università di Catania

Dipartimento di Ingegneria Informatica e delle Telecomunicazioni

DIIT - University of Catania, Italy

Page 2: Outline

Introduction
VLIW in past & future
Design Exploration Framework
ILP oriented compilation
Genetic Design Space Exploration
Conclusions

Page 3: Instruction Level Parallelism

High performance processors in the 1980s: maximize ILP
  Issue more than one instruction in a given clock cycle
  Who decides which instructions can be executed in parallel?

Two different philosophies:
  Superscalar
  Very Long Instruction Word (VLIW)

Page 4: ILP philosophy: Superscalar

Hide the process of finding ILP
ILP is discovered dynamically at run-time by the control hardware of the processor

[Figure: Foo.c is compiled into a sequential instruction stream (Op1 Op2 Op3 Op4 Op5 ...); at run-time the HW groups the operations for parallel issue (Op1,Op2 / Op3 / Op4,Op5 ...)]

Page 5: ILP philosophy: VLIW

Hardware resources are architecturally visible to the compiler
The compiler can create a sequence of Very Long Instructions that defines the plan of execution
The HW simply executes the plan

[Figure: Foo.c and the hardware resources configuration are given to the compiler, which produces the plan of execution (Op1,Op2 / Op3 / Op4,Op5); the HW follows it at run-time]
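
As a rough illustration of a compile-time plan of execution, the sketch below packs operations into long instructions with a greedy scheduler. The operation names follow the figure, but the dependencies, the issue width and the function name `pack_bundles` are assumptions made for this example. Unit latencies and a single resource class are assumed as well; a production VLIW compiler models far more.

```python
def pack_bundles(ops, deps, issue_width):
    """Greedily pack operations into VLIW bundles (one bundle per cycle).

    ops: operation names in program order.
    deps: dict mapping an operation to the set of operations it depends on.
    issue_width: maximum operations per bundle (hypothetical, one resource class).
    """
    scheduled = {}            # operation -> index of the bundle that holds it
    bundles = []
    remaining = list(ops)
    while remaining:
        bundle = []
        for op in remaining:
            # An operation is ready once all its producers sit in earlier bundles.
            ready = all(d in scheduled for d in deps.get(op, ()))
            if ready and len(bundle) < issue_width:
                bundle.append(op)
        if not bundle:
            raise ValueError("cyclic or unsatisfiable dependencies")
        for op in bundle:
            scheduled[op] = len(bundles)
        remaining = [op for op in remaining if op not in scheduled]
        bundles.append(bundle)
    return bundles


# Hypothetical dependencies: Op3 needs Op1 and Op2; Op4 and Op5 need Op3.
example_deps = {"Op3": {"Op1", "Op2"}, "Op4": {"Op3"}, "Op5": {"Op3"}}
plan = pack_bundles(["Op1", "Op2", "Op3", "Op4", "Op5"], example_deps, issue_width=2)
print(plan)  # [['Op1', 'Op2'], ['Op3'], ['Op4', 'Op5']]
```

With these assumed dependencies the plan matches the grouping drawn in the figure; the hardware would only have to issue one bundle per cycle.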

Page 6: VLIW past & future

Decline of VLIWs for general purpose systems:
  Couldn't be integrated in a single chip
  Lack of binary compatibility between implementations
Rediscovery of VLIW in embedded systems:
  No more integrability issues
  Binary incompatibility not relevant
Advantages of VLIW:
  Simplified hardware
  The architecture can be optimized ad hoc to achieve ILP

Page 7: Reference architecture (HPL-PD)

[Figure: block diagram of the reference architecture. Blocks: L2 Unified Cache; L1 Instruction Cache; L1 Data Cache; Prefetch Cache; Prefetch Unit; Fetch Unit; Instruction Queue; Decode and Control Logic; Predicate, Branch, General Purpose, Floating Point and Control Registers; Load/Store, Branch, Integer and Floating Point Units]

Page 8: Configuration Space

Three main parameter categories:

VLIW core:
  Number of registers in each register file (from 16 to 256)
  Number of instances of functional units of each type (from 1 to 6)
Mem hierarchy:
  Size, block size and associativity for each of the caches (L1 Instruction, L1 Data, L2)
Compiler:
  Conservative compilation strategy (basic blocks)
  Aggressive ILP oriented compilation strategy (hyperblocks)

Total space size: 1.47 x 10^13 configurations!
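
To show how quickly such a space grows, the sketch below counts only the VLIW-core and compiler combinations. Several inputs are assumptions and not stated on the slide: register file sizes restricted to powers of two, five register files and four functional unit types (taken from the blocks of the reference architecture). The cache parameters are left out because their value ranges are not given, so the result is not meant to reproduce the total quoted above.

```python
from math import prod

# Assumption: register file sizes are the powers of two between 16 and 256.
register_sizes = [16, 32, 64, 128, 256]
# Assumption: five register files (predicate, branch, general purpose,
# floating point, control).
register_files = 5
# From the slide: 1 to 6 instances per functional unit type.
unit_counts = range(1, 7)
# Assumption: four unit types (load/store, branch, integer, floating point).
unit_types = 4
# From the slide: basic blocks or hyperblocks.
compiler_strategies = 2


def core_space_size():
    """Count VLIW-core and compiler combinations under the assumptions above."""
    factors = [len(register_sizes)] * register_files
    factors += [len(unit_counts)] * unit_types
    factors.append(compiler_strategies)
    return prod(factors)


print(core_space_size())  # 8100000
```

Every cache adds three more multiplicative factors (size, block size, associativity), which is why exhaustive evaluation is out of reach.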

Page 9: Required Tools

High level estimation models
Design Space Exploration strategy

[Figure: exploration loop. Labels: Application.c; exploration algorithm; configuration; compiler, simulator, estimator; performances, power, …; Pareto configurations]

Page 10: An Open Platform: EPIC Explorer

Interface to the Trimaran framework, which provides a VLIW compiler and a simulator for dynamic statistics
Estimator component implementing high level models
Explorer component implementing multi-objective design space exploration algorithms

Page 11: The Exploration Data Flow

[Figure: exploration data flow. Labels: Foo.c; IMPACT; ELCOR; system configuration (processor, memory); foo.exe; Emulib; execution statistics; Estimator (energy, power, cycles); Explorer]

Page 12: Energy estimation

Subdivide the architecture into Functional Block Units (FBUs)
  Instruction decode logic, integer units, floating point units, register files
For each FBU (from ST Microelectronics LX):
  Active power: average power dissipated when the FBU is used
  Inactive power: average power dissipated when the FBU is not used
From the execution statistics, we know how many cycles each FBU has been active/inactive
  $E_{FBU} = (P_{active} \cdot cycles_{active} + P_{inactive} \cdot cycles_{inactive}) \cdot T_{clock}$
Fair degree of accuracy (about 25%)
  Adequate to investigate relative power savings between designs
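
The per-FBU energy model can be written down directly. The sketch below is a minimal version of it; the class and function names are hypothetical, and the power and cycle values in the usage line are made up for illustration, not taken from the ST Microelectronics LX data.

```python
from dataclasses import dataclass


@dataclass
class FbuStats:
    p_active: float        # W, average power when the FBU is used
    p_inactive: float      # W, average power when the FBU is not used
    cycles_active: int     # cycles in which the FBU was used
    cycles_inactive: int   # cycles in which the FBU was idle


def fbu_energy(stats, t_clock):
    """Energy in joules: (P_active*cycles_active + P_inactive*cycles_inactive) * T_clock."""
    return (stats.p_active * stats.cycles_active
            + stats.p_inactive * stats.cycles_inactive) * t_clock


def total_energy(fbus, t_clock):
    """Sum the energy of all FBUs; fbus maps a block name to its FbuStats."""
    return sum(fbu_energy(s, t_clock) for s in fbus.values())


# Made-up example: one integer unit, 10 ns clock period.
example = {"integer_unit": FbuStats(0.5, 0.1, 1000, 3000)}
print(total_energy(example, t_clock=1e-8))
```

The cycle counts are the only inputs that change from one simulation to the next, so a single simulation run per configuration is enough to evaluate the model.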

Page 13: Reference Application Set

Chosen from the MediaBench suite

Application    Category
G721 encode    Voice compression
Gsm encode     Speech transcoding
Gsm decode     Speech transcoding
Ieee 810       IEEE 1180 inverse DCT
JPEG           Image compression
MPEG2 decode   Video decoding
ADPCM encode   Speech encoding
ADPCM decode   Speech decoding
Fir            FIR filter

Page 14: Exploration Methodology

Preliminary analysis of compilation
  Impact of ILP oriented code transformations
  Predict the right compilation strategy:
    Basic Blocks (conservative)
    Hyper Blocks (aggressive, ILP-oriented)
Multi-objective Design Space Exploration
  Extract Pareto Set

Page 15: Preliminary Analysis (1/3)

For each objective, an unpaired two-sample t-test makes it possible to estimate the average effect of hyperblock formation

[Figure: random subsets of n configurations (CN, CH) are drawn from the configuration space; their objective values (ON, OH) are compared with a t-test]

Compilation with (H) and without (N) hyperblock formation

Is the mean effect on the objective significant with respect to the chosen critical difference?

Page 16: Preliminary Analysis (2/3)

Example of a metric for critical difference in means: d > 50% M
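
The statistic behind an unpaired two-sample comparison can be sketched with the standard library alone. The slides do not say which variant of the test was used; the version below is Welch's unequal-variance form, chosen here as an assumption. The sample values are made up for illustration.

```python
import math
from statistics import mean, variance


def welch_t(sample_n, sample_h):
    """Compare objective values without (N) and with (H) hyperblock formation.

    Returns (difference of means, standard error, t statistic, degrees of freedom).
    """
    n1, n2 = len(sample_n), len(sample_h)
    v1, v2 = variance(sample_n), variance(sample_h)   # sample variances
    diff = mean(sample_n) - mean(sample_h)
    se = math.sqrt(v1 / n1 + v2 / n2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return diff, se, diff / se, dof


# Made-up execution times (ms) for two random subsets of configurations.
diff, se, t_stat, dof = welch_t([10.0, 12.0, 14.0], [7.0, 8.0, 9.0])
print(diff, round(t_stat, 3), round(dof, 2))
```

A positive difference that is both statistically significant and larger than the chosen critical difference would suggest that hyperblock formation helps for that objective; the t statistic is compared against a t distribution with `dof` degrees of freedom.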

Page 17: Preliminary Analysis (3/3)

Application  Time (ms)             Power (W)            Energy (mJ)
             Δ      μN-μH          Δ     μN-μH          Δ      μN-μH
ieee810      16.64  6.76 ± 1.84    1.64  0.38 ± 0.16    49.01  30.82 ± 4.55
gsm-enc      36.62  33.25 ± 4.79   0.88  -0.48 ± 0.14   79.28  55.84 ± 9.82
jpeg         4.07   -0.97 ± 0.51   0.89  -0.07 ± 0.09   9.72   -2.31 ± 1.01
adpcm-enc    15.8   8.17 ± 2.2     1.25  -0.89 ± 0.14   46.12  -8.56 ± 3.73
MPEG dec     33.39  -5.28 ± 4.85   0.88  0.25 ± 0.16    62.50  -3.48 ± 9.88
G721-enc     22.76  -7.23 ± 2.95   0.76  -0.39 ± 0.08   65.53  -32.4 ± 5.9
adpcm-dec    24.2   -6.19 ± 3.31   1.02  -0.5 ± 0.12    58.54  -27.74 ± 7.3
Fir          0.68   -0.26 ± 0.08   0.79  -0.27 ± 0.09   1.40   -0.97 ± 0.12
gsm-dec      21.55  -23.83 ± 2.58  0.54  -0.24 ± 0.09   59.60  -56.6 ± 6.43

ILP-oriented compilation impact (positive, negative)

Page 18: DSE: Genetic Mapping

[Figure: the system (VLIW core, cache, bus ctrl, mem) is mapped onto a chromosome with genes for cache size, block size (BSize) and associativity (Assoc), functional units and register files]
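
One possible way to realise such a mapping is to store, for every parameter, an index into its list of legal values. The sketch below assumes this encoding; the gene names and the cache value lists are hypothetical, and only the register range (16 to 256) and the functional unit range (1 to 6) come from the slides.

```python
import random

# Hypothetical gene tables: one list of legal values per parameter.
GENES = {
    "l1d_size": [1024, 2048, 4096, 8192],      # bytes, assumed values
    "l1d_block": [16, 32, 64],                 # bytes, assumed values
    "l1d_assoc": [1, 2, 4],                    # assumed values
    "integer_units": [1, 2, 3, 4, 5, 6],       # range from the slides
    "gpr_size": [16, 32, 64, 128, 256],        # range from the slides
}


def random_chromosome(rng):
    """A chromosome is a list of indices, one per gene."""
    return [rng.randrange(len(values)) for values in GENES.values()]


def decode(chromosome):
    """Map gene indices back to a concrete architecture configuration."""
    return {name: values[i]
            for (name, values), i in zip(GENES.items(), chromosome)}


print(decode([0, 1, 2, 5, 4]))
```

Working on indices keeps every chromosome legal by construction, so crossover and mutation never produce a configuration outside the space.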

Page 19: DSE: Genetic Iteration

[Figure: genetic iteration. Each individual of the current population is an architecture configuration; fitness evaluation relies on simulation and estimation (performance, power); selected individuals undergo crossover and mutation, and each descendant is a new architecture configuration]

Page 20: DSE: Experimental Results

Parameters:
  Initial population: 30 individuals
  Crossover probability: 0.8
  Mutation probability: 0.1
  Generations: 50
Example of two different scenarios:
  G721 encode: exploration should include the exploration of compilation strategy
  Gsm-encode: hyperblock formation is predicted to be a better choice
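
The two probabilities listed among the parameters govern the genetic operators. The sketch below shows one common way to apply them; single-point crossover and per-gene mutation are assumptions, since the slides do not specify the operators, and chromosomes are assumed to be lists of value indices.

```python
import random

CROSSOVER_PROBABILITY = 0.8   # from the slide
MUTATION_PROBABILITY = 0.1    # from the slide; applied per gene here (assumption)


def crossover(parent_a, parent_b, rng):
    """Single-point crossover, performed with probability 0.8."""
    if len(parent_a) < 2 or rng.random() >= CROSSOVER_PROBABILITY:
        return parent_a[:], parent_b[:]          # parents pass through unchanged
    point = rng.randrange(1, len(parent_a))      # cut strictly inside the chromosome
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])


def mutate(chromosome, gene_sizes, rng):
    """Replace each gene by a random legal index with probability 0.1."""
    return [rng.randrange(size) if rng.random() < MUTATION_PROBABILITY else gene
            for gene, size in zip(chromosome, gene_sizes)]


rng = random.Random(0)
child_a, child_b = crossover([0, 0, 0, 0], [1, 1, 1, 1], rng)
mutant = mutate([0, 1, 2], [4, 3, 3], rng)
```

With 30 individuals and 50 generations, an upper bound of 1500 fitness evaluations follows, each of which requires a compilation and a simulation.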

Page 21: Pareto Set (G721 encode)

Page 22: Pareto Set (GSM-encode)

Page 23: Conclusions

Open platform for VLIW space exploration
  Estimate Power, Energy and Performance
  Preliminary Analysis of ILP-oriented compilation
  Genetic multi-objective design space exploration
Future developments
  Clustered VLIW
  Network-on-chip multiprocessors
Open source:
  http://epic-explorer.sourceforge.net

Page 24: Thanks for your attention!

Page 25: Appendix

Bus Power Estimation
Implemented Algorithms
Multiobjective Fitness assignment
How Many Generations?

Page 26: Summarizing Table

Benchmark  Visited configurations  Elapsed time  Pareto set  Power trade-off  Exec time trade-off
Mpeg2dec   1137                    47h           73          7x               6.8x
Jpeg       1012                    17h           83          6x               8.2x
Adpcm-enc  1543                    56h           64          4x               3x
Adpcm-dec  1433                    44h           76          3.5x             4x
G721-enc   1256                    83h           94          2.5x             2x

Page 27: Power Estimation (buses)

Bus line transitions are computed from the list of data/address memory accesses

$P_{bus} = 0.5 \cdot V_{dd}^2 \cdot \alpha \cdot f \cdot C_l$

$V_{dd}$: supply voltage
$\alpha$: switching activity
$f$: clock frequency
$C_l$: capacitance of a bus line
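
The bus model can be sketched as follows: the switching activity is derived from the bit flips between consecutive words on the bus and then plugged into the standard dynamic power expression, 0.5 times the squared supply voltage times switching activity, clock frequency and line capacitance. The function names are hypothetical, one transfer per clock cycle is assumed, and the access trace and electrical values in the usage lines are made up.

```python
def bus_transitions(words):
    """Count bit flips between consecutive words driven on the bus."""
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


def switching_activity(words, bus_width):
    """Average fraction of bus lines that toggle per transfer."""
    if len(words) < 2:
        return 0.0
    return bus_transitions(words) / (bus_width * (len(words) - 1))


def bus_line_power(words, bus_width, vdd, freq, c_line):
    """Average power of one bus line: 0.5 * Vdd^2 * alpha * f * C_l."""
    alpha = switching_activity(words, bus_width)
    return 0.5 * vdd ** 2 * alpha * freq * c_line


# Made-up 4-bit address trace, 1 V supply, 1 MHz clock, 1 pF per line.
trace = [0b0000, 0b1111, 0b1111, 0b0101]
print(switching_activity(trace, bus_width=4))
print(bus_line_power(trace, 4, vdd=1.0, freq=1e6, c_line=1e-12))
```

Multiplying the per-line figure by the bus width gives an estimate for the whole bus under the same assumptions.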

Page 28: Design Space Exploration

Implemented algorithms:
  Exhaustive: intuitive, simple and … unfeasible
  Dependency analysis (dep), Givargis et al., [TVLSI’02]
  GA-based DSE (ga), Palesi et al., [CODES’01]
  Sensitivity Analysis, Fornaciari et al., [DAES’02]
  Pareto-based Sensitivity Analysis (pbsa), Palesi et al., [VLSI-SOC’01]

Page 29: Multiobjective Fitness assignment

Strength Pareto Approach [Zitzler, Thiele]
From the current population P, an external set P* is extracted, containing the nondominated configurations of P.
Fitness of P* element j: $f_j = n/(N+1)$
  N = total size of P
  n = number of P configurations dominated by j
Fitness of P element i: 1/S
  S is the sum of the fitness values of the P* elements that dominate i
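
A minimal sketch of this fitness assignment, for objectives that are all minimized, is given below. One point is an assumption rather than a transcription: the slide text reads 1/S for dominated elements, whereas the published Strength Pareto formulation by Zitzler and Thiele uses 1 + S with lower fitness being better, and the code follows the published form. The objective values in the usage lines are made up.

```python
def dominates(a, b):
    """a dominates b: no worse in every objective, better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def spea_fitness(population):
    """population: list of objective tuples. Returns one fitness per element."""
    size = len(population)
    # External set P*: indices of the nondominated elements of P.
    external = [j for j, p in enumerate(population)
                if not any(dominates(q, p) for q in population)]
    # Strength of a nondominated element j: n / (N + 1).
    strength = {
        j: sum(dominates(population[j], q) for q in population) / (size + 1)
        for j in external
    }
    fitness = []
    for i, p in enumerate(population):
        if i in strength:
            fitness.append(strength[i])
        else:
            s = sum(f for j, f in strength.items()
                    if dominates(population[j], p))
            fitness.append(1 + s)   # assumption: 1 + S as in published SPEA
    return fitness


# Made-up (power, delay) pairs.
points = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)]
print(spea_fitness(points))
```

Because every strength is below 1 and every dominated element scores above 1, nondominated configurations are always preferred, and those that dominate fewer neighbours are favoured among them.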

Page 30: How Many Generations?

Fixed number of generations
Autostop criteria
  Based on convergence

[Figure: plot with power and delay axes]