BY Massimo Morandi [email protected] Thesis committee: John Lillis (Chair), Donatella Sciuto, Mitchell Theys UIC Thesis Defense: May 9 2008 Runtime Core Allocation Management for 2D Self Partially and Dynamically Reconfigurable Systems

Page 1: UIC Thesis Morandi

Runtime Core Allocation Management for 2D Self Partially and Dynamically Reconfigurable Systems

BY Massimo Morandi

[email protected]

Thesis committee: John Lillis (Chair), Donatella Sciuto, Mitchell Theys

UIC Thesis Defense: May 9 2008

Page 2: UIC Thesis Morandi

Rationale and Innovation

Problem statement: providing runtime management support for 2D self partial and dynamic reconfiguration, in particular for Core placement decisions

Innovative contributions: a fast and flexible solution
- Low complexity, to avoid introducing too much overhead at runtime
- Support for different scenarios and placement policies, according to user needs
- The possibility to exploit multiple shapes per Core, through integration with area constraints definition

Page 3: UIC Thesis Morandi

Aims

Our proposed solution must support different scenarios, placement policies and intervention from the designer

It must be fast when compared to related solutions in the literature

The quality of the placement choices must be high, in terms of percentage of placement success, global application completion time or other metrics, as defined by the user

Page 4: UIC Thesis Morandi

Outline

Context Definition

Motivations and Goals
- The Complete Polaris Workflow
- Specific Contributions

Area Constraints Definition
- Proposed solution

Runtime Core Allocation Management
- Features and Structure of an Allocation Manager
- Relevant Works
- Proposed Solution

Results

Conclusions and Future Work

Page 5: UIC Thesis Morandi

Context Definition

Reconfigurable hardware: has the capability of changing its configuration (functionality) according to user needs

Self reconfiguration: the system must be completely autonomous at runtime

Partial reconfiguration: the changes can also involve fractions of the device

Dynamic reconfiguration: if a part of the hardware is reconfigured, the rest can continue its computation

2D reconfiguration: arbitrary rectangular slots can be dynamically reconfigured, as opposed to arbitrary columns in 1D

Page 6: UIC Thesis Morandi

Field Programmable Gate Array

Minimum granularity:
- Physical: there is a minimum unit that can be configured independently, depending on the device (Tile)
- Practical: since reconfiguration has a cost, it is reasonable to define a multiple of a Tile as the minimum reconfigurable unit (Slot)

Page 7: UIC Thesis Morandi

A bit of Terminology

Bitstream: binary file defining the configuration of part or all of the reconfigurable device (FPGA)

Core: representation of a functionality, independent of shape and position (example: JPEG)

RFU (Reconfigurable Functional Unit): a Core to which area constraints have been applied (example: JPEG constrained in a 2x3 rectangle)

A partial bitstream defines an RFU implemented in a specific position, identified by its bottom-left corner

The same bitstream can be reused for all positions if we exploit bitstream relocation

Page 8: UIC Thesis Morandi

A bit of Terminology

Page 9: UIC Thesis Morandi

Virtual homogeneity


Page 11: UIC Thesis Morandi

Motivations and goals

The creation and management of a self partially and dynamically reconfigurable system is a complex problem
- This is even more critical when exploiting the 2D reconfiguration paradigm
- More issues arise in the definition of area constraints and in the core allocation decisions
- Since the system must be autonomous, it also needs runtime management functionalities

Need for automation in those processes
- to reduce the workload on the designer
- to improve the efficiency of the final reconfigurable system

Page 12: UIC Thesis Morandi

Motivations and goals

Creation of an automated workflow to generate a self dynamically reconfigurable architecture that:
- Has “good” area constraints assigned to cores
- Is autonomous in performing 2D runtime core allocation decisions
- Exploits relocation to ensure that the system can obtain the configuration bitstreams it needs at runtime
- Supports intervention from the designer, to guide or constrain the decisions
- Keeps high flexibility and generality

Page 13: UIC Thesis Morandi

The Complete Workflow

Workflow to automate the creation and management of self dynamically reconfigurable architectures

Input: user specifications

Final output: complete architecture generation

Page 14: UIC Thesis Morandi

Specific Contributions

In particular, this thesis deals with the solution identification phase of the flow. This involves:
- The definition of area constraints for Cores, when the user does not specify them
- The creation of Core Allocation Management solutions, able to efficiently manage runtime Core placement

This last task includes:
- Offering high versatility, supporting different placement policies and different scenarios
- Keeping low complexity, to avoid too much overhead in the running time of the system
- Experimenting with techniques to improve efficiency, for example allowing multiple shapes per Core


Page 16: UIC Thesis Morandi

Area Constraints Definition

The designer can choose whether or not to specify the area constraints (AC) for each Core in the application

If not specified, they are automatically computed

The designer can also choose whether to allow multiple shapes per Core (and how many)

Finally, the last parameter represents the tightness of the constraints that will be defined:
- Impacts on feasibility of implementation
- Impacts on performance of the RFU

CORE RFU (or set of RFUs)

Page 17: UIC Thesis Morandi

Area Constraints Definition

The constraints are defined with a simple heuristic

First a square-like constraint is defined, using these formulae:

Where H is the height (in slices) and W is the width, S is the number of slices of the Core and m is the tightness

Page 18: UIC Thesis Morandi

Area Constraints Definition

Then, the constraints are converted from slices to slots

Where Vg is a granularity parameter, Vslices is the number of vertical slices in the device and avgH is the average height of all the RFUs defined with the square-like formula

Finally, the constraints (in slots) are iteratively altered to horizontally or vertically stretch the Core and obtain multiple RFUs
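
The stretching step can be pictured with a small sketch. The transcript does not include the exact rule, so everything below is an assumption made for illustration: the function name `stretched_shapes`, the choice to alternate one horizontal and one vertical stretch per step, and the requirement that every shape covers at least as many slots as the starting one. A real implementation would also have to cap shapes at the device size.

```python
from math import ceil
from typing import List, Tuple


def stretched_shapes(h: int, w: int, count: int) -> List[Tuple[int, int]]:
    """Return up to `count` (H, W) shapes in slots, starting from the base shape.

    Hypothetical rule: at each step, widen by `step` slots (and shrink the
    height as far as the area allows), then heighten by `step` slots.
    """
    area = h * w
    shapes = [(h, w)]
    step = 1
    while len(shapes) < count:
        new_w = w + step
        wide = (ceil(area / new_w), new_w)      # horizontal stretch
        new_h = h + step
        tall = (new_h, ceil(area / new_h))      # vertical stretch
        for shape in (wide, tall):
            if shape not in shapes and len(shapes) < count:
                shapes.append(shape)
        step += 1
    return shapes


# Three RFU shapes for a Core whose base constraint is 2x3 slots.
shapes = stretched_shapes(2, 3, 3)
```

Each returned shape would correspond to one RFU of the same Core.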


Page 20: UIC Thesis Morandi

Runtime Core Allocation Management

The Problem:
- Choose where to place new cores on the reconfigurable area
- In an online scenario: self partial and dynamic reconfiguration

The Goal:
- Allow efficient usage of the FPGA area
- Critical in the 2D reconfiguration case

This requires the creation of a solution for allocation management and suitable policies

Page 21: UIC Thesis Morandi

Allocation Manager Desired Features

Low Core Rejection Rate (CRR): percentage of cores that are not successfully placed in time

Fast application completion time: time from arrival of the first Core to completion of the last

Low fragmentation grade: fraction of the area that is unusable because it is too sparse

Small management overhead: we want a lightweight solution to run inside the system

High routing efficiency: if interacting cores are clustered, the system is more efficient

Need to find a good compromise between them
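
As a rough illustration of the first two metrics, the sketch below computes CRR and application completion time from a list of placement outcomes. The `Outcome` record and its fields are hypothetical, chosen only for this example, and a rejected Core is modelled simply as one without a finish time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Outcome:
    arrival: int             # time at which the Core arrived
    finish: Optional[int]    # completion time, or None if the Core was rejected


def core_rejection_rate(outcomes: List[Outcome]) -> float:
    """Percentage of Cores that were not successfully placed."""
    if not outcomes:
        return 0.0
    rejected = sum(1 for o in outcomes if o.finish is None)
    return 100.0 * rejected / len(outcomes)


def completion_time(outcomes: List[Outcome]) -> int:
    """Time from the arrival of the first Core to the completion of the last placed one."""
    placed = [o.finish for o in outcomes if o.finish is not None]
    if not placed:
        return 0
    return max(placed) - min(o.arrival for o in outcomes)


outcomes = [Outcome(0, 5), Outcome(1, 9), Outcome(2, None), Outcome(3, 12)]
crr = core_rejection_rate(outcomes)
total = completion_time(outcomes)
```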

Page 22: UIC Thesis Morandi

Example: 2D fragmentation

The 2D-fragmentation problem:
- Area generally more fragmented
- Can nullify the area optimizations obtained

Page 23: UIC Thesis Morandi

Example: Core Rejection

Bad choices can lead to performance loss and rejection
- A: Core C is successfully placed at step 2
- B: Core C is delayed (possibly rejected, if deadline=2)

Page 24: UIC Thesis Morandi

Considered Scenarios

Dynamic Schedule
- Cores can arrive at any time
- Cores have an ASAP and an ALAP time (dependencies)
- Rejection: failure to respect ALAP for a Core
- Goal: respect the schedule; CRR is the most important metric and should tend to zero

Blind Schedule
- Cores can be either available from the start or arrive at different times, no dependencies assumed
- No ASAP, Cores can optionally have a deadline
- If a Core is not placed, retry later
- Goal: the application must complete as fast as possible; rejection is not the main issue, total time is
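
The difference between the two scenarios can be sketched as a per-Core decision taken at each time step. This is only an illustration: the function names and the string results are hypothetical, and treating ALAP as the latest time at which placement may still be attempted is an assumption.

```python
from typing import Optional


def dynamic_schedule_status(now: int, asap: int, alap: int) -> str:
    """Decide what to do with a Core at time `now` in the dynamic schedule scenario."""
    if now < asap:
        return "wait"      # dependencies not satisfied yet
    if now > alap:
        return "reject"    # ALAP was not respected
    return "try"           # placement may be attempted


def blind_schedule_status(now: int, deadline: Optional[int]) -> str:
    """Decide what to do with a Core at time `now` in the blind schedule scenario."""
    if deadline is not None and now > deadline:
        return "reject"    # optional deadline missed
    return "try"           # on failure the Core is queued again and retried later
```

In the blind schedule case a Core without a deadline is never rejected, which matches the goal of minimizing total time rather than CRR.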

Page 25: UIC Thesis Morandi

Allocation Manager Creation

Choose how to maintain information on empty space:
- Keep all information (expensive but more accurate)
- Heuristically prune information (cheaper)

Choose which placement policy to use:
- General (First Fit, Best Fit, Worst Fit…)
- Focused (Fragmentation Aware, Routing Aware…)

Define in which scenario(s) the manager will work

It can also be useful to consider and exploit different shapes of a Core (multiple RFUs per Core scenario)
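
The general policies named above differ only in which fitting empty rectangle they select. The sketch below shows the textbook definitions of First Fit, Best Fit and Worst Fit over a list of empty rectangles; the `(x, y, h, w)` tuple layout and the use of area as the fit criterion are assumptions made for this example.

```python
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]   # (x, y, h, w) of an empty rectangle


def fits(rect: Rect, h: int, w: int) -> bool:
    """True if an h x w RFU fits inside the empty rectangle."""
    return rect[2] >= h and rect[3] >= w


def area(rect: Rect) -> int:
    return rect[2] * rect[3]


def first_fit(empty: List[Rect], h: int, w: int) -> Optional[Rect]:
    """First rectangle in list order that can host the RFU."""
    return next((r for r in empty if fits(r, h, w)), None)


def best_fit(empty: List[Rect], h: int, w: int) -> Optional[Rect]:
    """Smallest rectangle that can host the RFU."""
    return min((r for r in empty if fits(r, h, w)), key=area, default=None)


def worst_fit(empty: List[Rect], h: int, w: int) -> Optional[Rect]:
    """Largest rectangle that can host the RFU."""
    return max((r for r in empty if fits(r, h, w)), key=area, default=None)


empty = [(0, 0, 4, 4), (4, 0, 2, 3), (0, 4, 6, 6)]
```

For a 2x3 request on this list, First Fit picks the first entry, Best Fit the 2x3 rectangle and Worst Fit the 6x6 one.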


Page 27: UIC Thesis Morandi

Relevant Works

Maintain complete information on empty space:

KAMER: K. Bazargan, R. Kastner and M. Sarrafzadeh, “Fast template placement for reconfigurable computing systems”, IEEE Design and Test of Computers, Vol.17, 2000.
- Keep All Maximally Empty Rectangles
- Apply a general placement policy

CUR: A. Ahmadinia, C. Bobda, S. P. Fekete, J. Teich and J. v.d. Veen, “Optimal Routing-Conscious Dynamic Placement for Reconfigurable Devices”, Field-Programmable Logic and Applications (FPL'04), 2004.
- Maintain the Contour of a Union of Rectangles
- Apply a focused placement policy

Page 28: UIC Thesis Morandi

Relevant Works

Heuristically prune part of the information:

KNER: K. Bazargan, R. Kastner and M. Sarrafzadeh, “Fast template placement for reconfigurable computing systems”, IEEE Design and Test of Computers, Vol.17, 2000.
- Keep Non-overlapping Empty Rectangles
- Apply a general placement policy

2D-HASHING: H. Walder, C. Steiger and M. Platzner, “Fast Online Task Placement on FPGAs: Free Space Partitioning and 2D-Hashing”, International Parallel and Distributed Processing Symposium (IPDPS'03), 2003.
- Keep Non-overlapping Empty Rectangles in an optimized data structure
- Apply (exclusively) a general placement policy

Page 29: UIC Thesis Morandi

Example: Empty Space Information

Page 30: UIC Thesis Morandi

Evaluation

- The solutions with higher placement quality also have higher complexity
- The fastest solution cannot exploit focused policies, for example routing aware, and adds the overhead of maintaining the 2D hashing structure
- CUR does not support all general policies, for example Best Fit is not allowed


Page 32: UIC Thesis Morandi

Proposed Approach

Choice driven by:
- Need for a low complexity solution, to introduce low overhead at runtime in the self reconfigurable system
- Desire to keep high flexibility, to suit user needs also in terms of placement policies

For these reasons we propose a heuristic (KNER-like) empty space manager:
- Supporting general and focused placement policies (in particular, First Fit, Best Fit and Routing Aware)
- Suitable for both dynamic schedule and blind schedule scenarios
- Exploiting multiple RFUs per Core, to improve results

Page 33: UIC Thesis Morandi

Data Representation

Core, defined by:
- Arrival time
- Set of RFUs, each one with H, W, Latency
- Optional set of communicating Cores (if using RA)
- ASAP and ALAP (if in dynamic schedule scenario)

Two queues:
- one for new Cores
- one for Cores that were not successfully placed and need reexamination
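
A minimal sketch of this representation, assuming Python dataclasses and `collections.deque` for the two queues. The field names, the `name` identifier and the latency value are hypothetical; only the list of attributes comes from the slide.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional, Set


@dataclass
class RFU:
    h: int          # height in slots
    w: int          # width in slots
    latency: int    # latency of this shape (unit is an assumption)


@dataclass
class Core:
    name: str                                   # hypothetical identifier
    arrival: int                                # arrival time
    rfus: List[RFU]                             # one entry per allowed shape
    communicating: Set[str] = field(default_factory=set)   # only for Routing Aware
    asap: Optional[int] = None                  # only in the dynamic schedule scenario
    alap: Optional[int] = None                  # only in the dynamic schedule scenario


new_cores: Deque[Core] = deque()      # Cores not examined yet
retry_cores: Deque[Core] = deque()    # Cores whose placement failed

jpeg = Core("JPEG", arrival=0, rfus=[RFU(2, 3, 10), RFU(3, 2, 10)])
new_cores.append(jpeg)

# A failed placement moves the Core from the first queue to the second.
retry_cores.append(new_cores.popleft())
```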

Page 34: UIC Thesis Morandi

Data Representation

Reconfigurable Device, represented as:
- Binary Tree structure, each node is a Rectangle, each leaf is an empty Rectangle
- Navigation through:
  - pointers to left child, right child, next leaf
  - a function to find the previous leaf (used for bookkeeping after rectangle split and merge operations)

Rectangle, defined by:
- Coordinates on device: X, Y
- Size: H, W
- Initially one, the root, with (X,Y)=(0,0), H=FPGA Rows, W=FPGA Cols
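
The tree of rectangles can be sketched as follows. This is an illustration under stated assumptions, not the thesis implementation: the `used` flag, the direction of the cut (full-width strip above the RFU, short strip to its right) and the traversal-based `leaves` method are choices made for this example. The next-leaf pointers and the merge operation mentioned on the slide are left out.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Tuple


@dataclass
class Rect:
    x: int                              # column of the bottom-left corner
    y: int                              # row of the bottom-left corner
    h: int                              # height in rows
    w: int                              # width in columns
    left: Optional["Rect"] = None
    right: Optional["Rect"] = None
    used: bool = False                  # hypothetical flag: an RFU was placed here

    def leaves(self) -> Iterator["Rect"]:
        """Yield the empty rectangles, i.e. the unused leaves of the tree."""
        if self.left is None and self.right is None:
            if not self.used:
                yield self
            return
        for child in (self.left, self.right):
            if child is not None:
                yield from child.leaves()

    def place(self, h: int, w: int) -> Tuple[int, int]:
        """Put an h x w RFU in the bottom-left corner and split the remainder."""
        if self.used or h > self.h or w > self.w:
            raise ValueError("RFU does not fit in this empty rectangle")
        self.used = True
        if self.h > h:                  # strip above the RFU, full width
            self.left = Rect(self.x, self.y + h, self.h - h, self.w)
        if self.w > w:                  # strip to the right, as tall as the RFU
            self.right = Rect(self.x + w, self.y, h, self.w - w)
        return (self.x, self.y)


root = Rect(0, 0, h=6, w=8)             # (X,Y)=(0,0), H=FPGA rows, W=FPGA cols
corner = root.place(2, 3)               # a 2x3 RFU goes to the bottom-left corner
empty = [(r.x, r.y, r.h, r.w) for r in root.leaves()]
```

After the placement the two leaves are non-overlapping and, together with the RFU, cover the whole device.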

Page 35: UIC Thesis Morandi

The Online Placement Algorithm

The whole processing of a Core is completed in linear time
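
The algorithm itself is shown on slides that are not part of this transcript, so the sketch below is not the author's procedure. It only illustrates why a single scan of the empty rectangles gives linear cost: with a fixed, small number of shapes per Core, the work is proportional to the number of leaves. The function name and tuple layouts are hypothetical, and First Fit is used as the policy.

```python
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]    # (x, y, h, w) of an empty rectangle
Shape = Tuple[int, int]             # (h, w) of one RFU of the Core


def place_core(leaves: List[Rect], shapes: List[Shape]) -> Optional[Tuple[Rect, Shape]]:
    """Scan the empty rectangles once and return the first (rectangle, shape) that fits."""
    for rect in leaves:
        _, _, rect_h, rect_w = rect
        for shape in shapes:
            h, w = shape
            if h <= rect_h and w <= rect_w:
                return rect, shape
    return None                      # the caller re-queues or rejects the Core


leaves = [(0, 2, 1, 8), (3, 0, 2, 5)]
single = place_core(leaves, [(3, 2)])             # one shape only: no fit
multi = place_core(leaves, [(3, 2), (1, 6)])      # a second shape makes it fit
```

The last two calls also show how an additional shape can turn a failed placement into a successful one.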

Page 36: UIC Thesis Morandi

The Online Placement Algorithm

Page 37: UIC Thesis Morandi

The Online Placement Algorithm


Page 39: UIC Thesis Morandi

Evaluation of the proposed solution

To evaluate the quality of the proposed approach in various scenarios and with different metrics, 3 kinds of experiment were performed:

1) A comparison against the presented literature solutions
- In a dynamic schedule scenario
- With a Routing Aware placement policy
- Measuring CRR (and indirectly fragmentation), routing costs and computational overhead
- Results published in:

M. MORANDI, M. Novati, M. D. Santambrogio, D. Sciuto, “Core allocation and relocation management for a self dynamically reconfigurable architecture”, IEEE Computer Society Annual Symposium on VLSI, 2008

Page 40: UIC Thesis Morandi

Evaluation of the proposed solution

2) A measure of application completion time
- Composed of real Cores used as benchmarks
- In a blind schedule scenario
- Directly measuring application completion time, gaining some insight on CRR and fragmentation

3) Evaluation of the multiple shapes per Core approach
- Comparison between our solution with multiple shapes and KNER (adapted to blind schedule scenario)
- In a mixed scenario (blind schedule with deadlines and variable arrival times)
- Using both First Fit and Best Fit
- Measure of CRR and running time

Page 41: UIC Thesis Morandi

Experiment 1: Routing Aware

Version of our general solution:
- Tailored to minimize routing paths
- Compared with close solutions from literature
- Named in the table RALP (Routing Aware Linear Placer)

Benchmark of 100 randomly generated tasks:
- Size (5% to 20% of FPGA), randomly interconnected

Page 42: UIC Thesis Morandi

Experiment 2: Appl. Completion Time

- Benchmark applications composed of cores taken from opencores.org, like JPEG, AES, 3DES
- Measure the time instants needed to complete the applications with different amounts of resources

The infinite-resources case is shown, to compare against the lower bound

Page 43: UIC Thesis Morandi

Experiment 3: Multiple Shapes

- Similar benchmark, but Cores have deadlines (for CRR)
- Shapes defined using the heuristic described previously

- Runtime is on average 30% higher with 3 shapes and 40% higher with 5 shapes, w.r.t. 1 shape
- CRR is more than halved, often reduced to one third

Page 44: UIC Thesis Morandi

Numerical Example

To give an idea of the quality of the obtained results, it is useful to give some numerical values for reconfiguration

Let us consider a JPEG Core, described by a 690 Kb configuration bitstream for a V4 device and using about 10% of the total area

- Reconfiguration time: 150 ms
- Relocation time: 90 ms
- Placement time: 0.4 ms

The obtained time is low and is suitable for actual usage in a real system
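
To put the three figures in proportion, the short calculation below relates the placement time to the other two. Summing the three times assumes that relocation, reconfiguration and placement happen one after another, which the slide does not state; the variable names are hypothetical.

```python
reconfiguration_ms = 150.0
relocation_ms = 90.0
placement_ms = 0.4

# Placement decision relative to the reconfiguration alone.
share_of_reconfiguration = placement_ms / reconfiguration_ms

# Placement decision relative to the three steps taken in sequence (assumption).
share_of_total = placement_ms / (reconfiguration_ms + relocation_ms + placement_ms)

print(f"{share_of_reconfiguration:.2%}")   # 0.27%
print(f"{share_of_total:.2%}")             # 0.17%
```

Under these assumptions the placement decision accounts for well under 1% of the time needed to bring the JPEG Core onto the device.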

Page 45: UIC Thesis Morandi

Concluding Remarks

The proposed solution offers:
- High versatility, supporting different placement policies and scenarios, designer intervention, multiple shapes
- Low overhead, always processing a Core in linear time and obtaining good results compared with literature
- Good CRR, especially when exploiting multiple shapes
- Fast application completion time, as shown by exp. 2
- Effective routing cost reduction, when used in conjunction with a Routing Aware policy (exp. 1)

The original goals were met

Under Review:

S. Corbetta, M. MORANDI, M. Novati, M. D. Santambrogio, D. Sciuto, P. Spoletini, “Internal and External Bitstream Relocation for Partial Dynamic Reconfiguration”, IEEE Transactions on VLSI (2nd review)

Page 46: UIC Thesis Morandi

Future Work

Future work will be in the direction of integration with the rest of the workflow that was briefly introduced

The parts that were described achieved good results as stand-alone components in the runtime management of the reconfigurable system; it is important to evaluate them also inside the complete workflow

The final goal is to achieve complete automation in the creation process of a self dynamically reconfigurable architecture, from user specification up to bitstreams and processor code generation

Page 47: UIC Thesis Morandi

General Information

Webpage: www.dresd.org/polaris

Mailing [email protected]

Contact: to have more information regarding Polaris:

[email protected]

For a complete list of information on how to contact us: www.dresd.org/contact_polaris

Page 48: UIC Thesis Morandi

Questions