Formal Engineering Research with Models Abstractions and Transformations FERMAT Low Power Hardware Synthesis from Concurrent Action Oriented Specifications (CAOS) Sandeep K. Shukla Gaurav Singh FERMAT Lab, Virginia Tech.


Formal Engineering Research with Models Abstractions and Transformations

FERMAT

Low Power Hardware Synthesis from Concurrent

Action Oriented Specifications (CAOS)

Sandeep K. Shukla

Gaurav Singh

FERMAT Lab, Virginia Tech.


Outline

• CAOS Scheduling Problem
  – Complexity Analysis

• Peak Power Problem
  – Complexity Analysis
  – Technique: Rescheduling (suppressing actions)

• Dynamic Power Problem
  – Complexity Analysis
  – Techniques: Rescheduling, Operand Isolation, Clock Gating, Gated Guards


CAOS Scheduling Problem

( Complexity Analysis )


SCHEDULING PROBLEMS WITHOUT A PEAK POWER CONSTRAINT

• Maximum Non-conflicting Subset of actions (MNS)

– Choosing actions which can execute in a clock cycle.

• Minimum Length Schedule Construction (MLS)

– Distributing actions over multiple clock cycles.


MAXIMUM NON-CONFLICTING SUBSET OF ACTIONS (MNS)

Instance - Set A = {a1, a2, …, an} of enabled actions; a collection C of pairs of actions, where {ai, aj} ∈ C means that actions ai and aj conflict; an integer K ≤ n.

Question - Is there a subset A’ ⊆ A such that |A’| ≥ K and no pair of actions in A’ conflict?

• MNS problem is NP-Complete.

• Corresponds to Maximum Independent Set (MIS) Problem.


MAXIMUM NON-CONFLICTING SUBSET OF ACTIONS (MNS)

NOTE - For any ρ ≥ 1, a ρ-approximation algorithm for a combinatorial optimization problem is a heuristic that produces a solution whose value is within a factor ρ of the optimal solution value.

• It is known that for any ε > 0, there is no O(n^(1-ε))-approximation algorithm for the MIS problem, unless P = NP.

• Same holds for MNS Problem.


MAXIMUM NON-CONFLICTING SUBSET OF ACTIONS (MNS)

SOLUTION - Heuristics with good performance guarantees can be devised by exploiting the relationship between the MNS and MIS problems.

• SPECIAL CASES –
  – Each action conflicts with at most Δ other actions, for some constant Δ:
    • An approximation algorithm exists that provides a performance guarantee of Δ + 1.
  – Planar graphs, near-planar graphs and unit disk graphs:
    • Efficient approximation algorithms are known for such classes of graphs.
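For the bounded-conflict special case, a minimal sketch of a greedy heuristic is given below. The function name `greedy_non_conflicting` and the pair-list input format are assumptions made for illustration and do not come from the slides. The idea is to pick an action, discard everything that conflicts with it, and repeat; if every action conflicts with at most Δ others, each pick eliminates at most Δ + 1 candidates, which is the intuition behind a Δ + 1 guarantee.

```python
def greedy_non_conflicting(actions, conflicts):
    """Greedy heuristic for MNS: return a pairwise non-conflicting subset.

    actions   -- list of hashable action names (hypothetical input format)
    conflicts -- iterable of pairs (ai, aj) meaning ai and aj conflict
    """
    # Build the conflict graph as an adjacency map.
    neighbours = {a: set() for a in actions}
    for ai, aj in conflicts:
        neighbours[ai].add(aj)
        neighbours[aj].add(ai)

    remaining = set(neighbours)
    chosen = []
    while remaining:
        # Prefer the action with the fewest remaining conflicts
        # (ties broken by name so the result is deterministic).
        pick = min(remaining,
                   key=lambda a: (len(neighbours[a] & remaining), str(a)))
        chosen.append(pick)
        # Drop the picked action and everything that conflicts with it.
        remaining -= neighbours[pick] | {pick}
    return chosen
```

For a chain of conflicts a1-a2-a3-a4, this selects two actions that can execute in the same clock cycle.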


MINIMUM LENGTH SCHEDULE CONSTRUCTION (MLS)

Instance - Set A = {a1, a2, …, an} of actions; a collection C of pairs of actions, where {ai, aj} ∈ C means that actions ai and aj conflict; an integer t ≤ n.

Question - Is there a partition of A into r subsets A1, A2, ..., Ar for some r ≤ t such that for each i, 1 ≤ i ≤ r, the actions in Ai are pair-wise non-conflicting?

• MLS problem is NP-Complete.

• Corresponds to Minimum K-coloring (MINCOLOR) Problem.


MINIMUM LENGTH SCHEDULE CONSTRUCTION (MLS)

• It is known that for any ε > 0, there is no O(n^(1-ε))-approximation algorithm for the MINCOLOR problem, unless P = NP.

• Same holds for MLS Problem.


MINIMUM LENGTH SCHEDULE CONSTRUCTION (MLS)

SOLUTION – Heuristics for graph coloring can be used to construct schedules of near-minimum length.

• SPECIAL CASES –
  – Upper bound on the length of the schedule is two:
    • Corresponds to the problem of determining whether a graph is 2-colorable.
    • Efficient algorithms are known.
  – Each action conflicts with at most Δ other actions:
    • For such instances, a schedule of length at most Δ + 1 can be constructed in polynomial time.
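The Δ + 1 bound in the second special case can be illustrated with greedy graph coloring. The sketch below is an assumption-laden illustration, not the authors' implementation: the name `greedy_schedule` and the input format are hypothetical. Each action is placed in the earliest time slot that contains none of its conflicting actions; since an action has at most Δ conflicting neighbours, one of the first Δ + 1 slots is always free.

```python
def greedy_schedule(actions, conflicts):
    """Greedy coloring: put each action in the earliest conflict-free slot.

    actions   -- list of hashable action names (hypothetical input format)
    conflicts -- iterable of pairs (ai, aj) meaning ai and aj conflict
    Returns a list of time slots, each a list of actions.
    """
    neighbours = {a: set() for a in actions}
    for ai, aj in conflicts:
        neighbours[ai].add(aj)
        neighbours[aj].add(ai)

    slot_of = {}
    for a in actions:
        # Slots already taken by actions that conflict with a.
        used = {slot_of[b] for b in neighbours[a] if b in slot_of}
        slot = 0
        while slot in used:
            slot += 1
        slot_of[a] = slot

    length = max(slot_of.values(), default=-1) + 1
    schedule = [[] for _ in range(length)]
    for a, s in slot_of.items():
        schedule[s].append(a)
    return schedule
```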


PEAK POWER PROBLEM

( Complexity Analysis )


SCHEDULING PROBLEMS INVOLVING A POWER CONSTRAINT

Single Clock Cycle –

– Maximum Number of Actions in a Time Slot Subject to Peak Power Constraint (MNA-PP).

– Maximizing Utility Subject to Peak Power Constraint (MU-PP).


Maximum Number of Actions in a Time Slot Subject to Peak Power Constraint (MNA-PP).

Instance –
– set A = {a1, a2, …, an} of non-conflicting actions,
– for each action ai, the power pi needed to execute that action,
– a positive number P representing the peak power constraint.

Requirement - Find a subset A’ ⊆ A such that
– the total power needed to execute the actions in A’ is at most P, and
– |A’| is a maximum over all subsets of A that satisfy the peak power constraint.

Optimal Solution -
– Sort the actions in A into non-decreasing order of power.
– Keep adding actions in that order as long as the peak power constraint is satisfied.
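The optimal greedy procedure above can be sketched in a few lines. The function name `max_actions_under_peak` and the dictionary input are assumptions made for illustration only.

```python
def max_actions_under_peak(powers, peak):
    """MNA-PP greedy: pick as many actions as possible within the peak power.

    powers -- dict mapping action name to its power (hypothetical format)
    peak   -- peak power constraint P
    """
    selected, total = [], 0
    # Cheapest actions first: non-decreasing order of power.
    for action, p in sorted(powers.items(), key=lambda kv: kv[1]):
        if total + p > peak:
            break  # every later action needs at least as much power
        selected.append(action)
        total += p
    return selected
```

Stopping at the first action that does not fit is safe because all later actions in the sorted order need at least as much power.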


Maximizing Utility Subject to Peak Power Constraint (MU-PP)

Instance –
– set A = {a1, a2, …, an} of non-conflicting actions,
– for each action ai, the power pi it consumes and its utility ui,
– a positive number P representing the peak power,
– a positive number Γ representing the required utility.

Question - Is there a subset A’ ⊆ A such that the total power needed to execute all the actions in A’ is at most P and the utility of A’ is at least Γ?

• MU-PP problem is NP-Complete.

• Corresponds to KNAPSACK Problem.


Maximizing Utility Subject to Peak Power Constraint (MU-PP)

• Any approximation algorithm for the KNAPSACK problem can be used as an approximation algorithm, with the same performance guarantee, for the optimization version of MU-PP.

• When the weights and profits are integers, there is a polynomial time approximation scheme (PTAS) for the KNAPSACK problem.
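To make the KNAPSACK correspondence concrete, the sketch below solves the optimization version of MU-PP exactly with the textbook 0/1 knapsack dynamic program. This is a swapped-in standard technique, not the approximation scheme mentioned above: it assumes integer powers and runs in pseudo-polynomial time O(nP). The function name and tuple format are hypothetical.

```python
def max_utility_under_peak(actions, peak):
    """0/1 knapsack DP for MU-PP (optimization version).

    actions -- list of (name, power, utility); powers are assumed to be
               non-negative integers (an assumption of this sketch)
    peak    -- integer peak power P
    Returns (best_utility, names_of_chosen_actions).
    """
    # best[p] = (utility, chosen names) achievable with total power <= p
    best = [(0, [])] * (peak + 1)
    for name, power, utility in actions:
        new_best = list(best)
        for p in range(power, peak + 1):
            candidate = best[p - power][0] + utility
            if candidate > new_best[p][0]:
                new_best[p] = (candidate, best[p - power][1] + [name])
        best = new_best
    return best[peak]
```

The decision question is then answered by comparing the returned utility with Γ.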


SCHEDULING PROBLEMS INVOLVING A POWER CONSTRAINT

Multiple Clock Cycles –

– Minimizing Makespan Subject to Peak Power Constraint (MM-PP).

– Minimizing Peak Power Subject to Makespan Constraint (MPP-M).

– Minimizing Makespan and Peak Power – Decision Version (MPP-DECISION).


Minimizing Makespan Subject to Peak Power Constraint (MM-PP)

Instance –
– set A = {a1, a2, …, an} of non-conflicting actions,
– for each action ai, the power pi needed to execute that action,
– a positive number P representing the peak power.

Requirement –

Find a schedule of minimum length for the actions in A such that the total power needed to execute the actions in each time slot is at most P.


Minimizing Peak Power Subject to a Makespan Constraint (MPP-M)

Instance –
– set A = {a1, a2, …, an} of non-conflicting actions,
– for each action ai, the power pi needed to execute that action,
– a positive number L representing the makespan (number of slots used by a schedule).

Requirement –

Find a schedule of length at most L for the actions in A such that the maximum total power used in any time slot is a minimum over all schedules of length at most L.

NOTE - MPP-M is the dual of MM-PP.


Minimizing Makespan and Peak Power (MPP-DECISION)

Decision version of MM-PP and MPP-M.

Instance –
– set A = {a1, a2, …, an} of non-conflicting actions,
– for each action ai, the power pi needed to execute that action,
– a positive number P representing the peak power,
– a positive number L representing the makespan.

Question – Is there a schedule of length at most L for the actions in A such that the total power used in any time slot is at most P?

• MPP-DECISION problem is strongly NP-Complete.

• Corresponds to the 3-PARTITION problem.

• No pseudo-polynomial-time algorithm exists for the MPP-DECISION problem, unless P = NP.


Approximation Algorithms for MM-PP

• Efficient approximation algorithms are possible by reducing the problem to the well-known BIN PACKING problem.

• Example - The simple First Fit Decreasing (FFD) algorithm provides an asymptotic performance guarantee of 11/9.
  – Sort the items in non-increasing order of size, then assign each item to the first bin in which it fits.
  – A sophisticated implementation reduces the running time to O(n log n).
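A minimal sketch of FFD applied to MM-PP follows, treating each time slot as a bin of capacity P and each action's power as the item size. The function name and input format are assumptions for illustration. This is the straightforward quadratic-time version, not the O(n log n) implementation mentioned above.

```python
def first_fit_decreasing(powers, peak):
    """FFD for MM-PP: time slots are bins of capacity `peak`.

    powers -- dict mapping action name to its power (hypothetical format)
    peak   -- peak power constraint P per time slot
    Returns a list of time slots, each a list of action names.
    """
    slots, loads = [], []
    # Largest power first: non-increasing order.
    for action, p in sorted(powers.items(), key=lambda kv: kv[1], reverse=True):
        if p > peak:
            raise ValueError(f"{action} alone exceeds the peak power")
        for i, load in enumerate(loads):
            if load + p <= peak:
                # First slot with enough headroom.
                slots[i].append(action)
                loads[i] += p
                break
        else:
            # No existing slot fits: open a new time slot.
            slots.append([action])
            loads.append(p)
    return slots
```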


Approximation Algorithms for MPP-M

• Efficient approximation algorithms are possible by reducing the problem to the classical multiprocessor scheduling problem.

• Example –
  – 4/3 approximation algorithm:
    • Sort the actions in non-increasing order of their power requirements.
    • Assign each action, in that order, to the time slot whose total power is currently the smallest.
  – Can be implemented to run in O(n log n) time.
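The two steps above can be sketched with a min-heap keyed on the current power of each time slot, which gives the O(n log n) running time. The function name `lpt_schedule` and the input format are assumptions for illustration; `num_slots` plays the role of the makespan L and is assumed to be at least 1.

```python
import heapq


def lpt_schedule(powers, num_slots):
    """Greedy slot assignment for MPP-M (largest power first).

    powers    -- dict mapping action name to its power (hypothetical format)
    num_slots -- makespan L, assumed >= 1
    Returns (slots, peak): the actions per slot and the resulting peak power.
    """
    # Heap of (current total power, slot index); all slots start empty.
    heap = [(0, i) for i in range(num_slots)]
    heapq.heapify(heap)
    slots = [[] for _ in range(num_slots)]
    for action, p in sorted(powers.items(), key=lambda kv: kv[1], reverse=True):
        load, i = heapq.heappop(heap)   # least-loaded slot so far
        slots[i].append(action)
        heapq.heappush(heap, (load + p, i))
    peak = max(load for load, _ in heap)
    return slots, peak
```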


LOW PEAK POWER TECHNIQUE

Re-scheduling – Suppress some actions in each cycle to reduce the peak power of the design.

Possible Ways –
– Conflict-based
  • Add extra conflicts to limit peak power.
– Memory-based
  • Use a memory to select how many actions to execute in each cycle.


MEMORY-BASED LOW PEAK POWER TECHNIQUE

ALGORITHM –
– Arrange the actions according to their TRS ordering.
– Find the possible combinations of non-conflicting actions that can violate the peak power constraint when executed concurrently.
– For each violating combination:
  • find a satisfying combination by suppressing some actions,
  • give priority to actions that come earlier in the TRS ordering,
  • store the satisfying combination in a memory.
– In hardware, the memory is used to execute the appropriate actions in each clock cycle so that the peak power constraint is satisfied.
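The table-building step can be sketched as follows. This is only an illustration under stated assumptions, not the Bluespec Compiler implementation: the actions are assumed to be mutually non-conflicting, the list order stands in for the TRS ordering, the "satisfying combination" is chosen by a simple priority-order greedy pass, and the name `build_suppression_table` is hypothetical. The resulting dictionary models the contents of the memory.

```python
from itertools import combinations


def build_suppression_table(ordered_powers, peak):
    """Map each power-violating combination to a reduced, satisfying one.

    ordered_powers -- list of (action, power), earliest priority first
                      (stand-in for the TRS ordering; an assumption)
    peak           -- peak power constraint P
    """
    table = {}
    for size in range(1, len(ordered_powers) + 1):
        for combo in combinations(ordered_powers, size):
            if sum(p for _, p in combo) <= peak:
                continue  # already satisfies the constraint: no entry needed
            kept, total = [], 0
            # combinations() preserves list order, so earlier (higher
            # priority) actions get the first claim on the power budget.
            for action, p in combo:
                if total + p <= peak:
                    kept.append(action)
                    total += p
            table[tuple(a for a, _ in combo)] = tuple(kept)
    return table
```

Because every subset of actions is enumerated, the table can grow exponentially with the number of actions, which is consistent with the memory-size limitation of this technique.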


MEMORY-BASED LOW PEAK POWER TECHNIQUE

Implemented in Bluespec Compiler –

– Around 10% peak-power savings achieved for small designs like Vending Machine.

– Larger power savings may be possible for larger designs.
  • Experiments ongoing.


MEMORY-BASED LOW PEAK POWER TECHNIQUE

LIMITATIONS -

– Designs written under the assumption that the maximum number of actions executes in each clock cycle might not be able to use this technique.

– Increases latency, so it is applicable mostly to latency-insensitive designs.

– Designs with a large number of actions may require a large memory.


DYNAMIC POWER PROBLEM

( Complexity Analysis )


DYNAMIC POWER PROBLEM (DPP)

Instance –

- set A = {a1, a2,…,an} of actions.

- a positive integer P representing dynamic power consumed.

Requirement -

Select the ordering of execution of actions in A such that P is minimized.

• DPP is NP-Complete.

• Corresponds to the Traveling Salesman Problem, which is a sub-problem of DPP.


LOW DYNAMIC POWER TECHNIQUES

• Re-scheduling.

• Operand Isolation.

• Clock Gating.

• Gated Guards.


RE-SCHEDULING

• Actions can be re-scheduled such that switching at the inputs of the functional units is minimized.

• Resource sharing - Conflicts can be introduced so that actions performing the same operations on the same operands share the same functional units.


OPERAND ISOLATION

• Operand Isolation –
  – Computation corresponding to the body of an action is allowed only when its output is used in the present clock cycle.
  – Involves:
    • Insertion of gates at the appropriate points without affecting guards.
    • Selection of the activation signal.
  – Guards of actions are used as gating signals.
  – The algorithm implemented in the Bluespec Compiler saved up to 25% dynamic power.


OPERAND ISOLATION – SINGLE ACTION

[Figure: block diagram for action foo. Labels: current state (x, y, z), cond logic, body logic, enable signals, next-state values (x’, y’, z’), register (D, Q, EN), Φ2.]

Computations stay quiescent except when the action executes, i.e. when its guard is True.

action foo (… cond … (x < y) …);

x <= x + z …

endrule


OPERAND ISOLATION – MULTIPLE ACTIONS

Isolating multiple actions of a design.

[Figure: block diagram. Labels: Scheduler, Rule Control, Rule1 (Cond1, Action1) … RuleN (CondN, ActionN), Data Select, Enable, State register (D, Q), Φ2 … ΦN.]


REGISTER CLOCK GATING

• Register Clock-gating –
  – Registers having a common ENABLE signal can be provided the same gated clock.
  – CAOS - Registers updated by the same set of actions can be given the same gated clock.

• The algorithm implemented in the Bluespec Compiler saved up to 45% dynamic power.
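The grouping rule for CAOS can be sketched as below: registers are bucketed by the exact set of actions that update them, and each bucket can share one gated clock. The function name and the `writers` mapping are hypothetical and only illustrate the rule; they are not taken from the Bluespec Compiler.

```python
from collections import defaultdict


def group_registers_for_gating(writers):
    """Group registers that are updated by the same set of actions.

    writers -- dict mapping register name to the actions that update it
               (hypothetical input format)
    Returns a dict: frozenset of actions -> sorted list of registers that
    could share one gated clock.
    """
    groups = defaultdict(list)
    for register, actions in writers.items():
        # The set of writing actions identifies the shared enable condition.
        groups[frozenset(actions)].append(register)
    return {key: sorted(regs) for key, regs in groups.items()}
```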


REGISTER CLOCK GATING

In CAOS, guards of the actions provide the control for gating the clocks of the registers.

[Figure: clock-gated register. Labels: Register (DIN, QOUT), EN, CLK, GATED_CLK.]


GATED GUARDS

• In hardware, only the guards that are actually required should be computed in each clock cycle, to save power.

• Static analysis can determine which guards need to be computed.


Gated Guards

• Rule 1: (x > y) && (y != 0) --> (x = y; y = x;)

• Rule 2: (x <= y) && (y != 0) --> (y = y - x;)

• Rule 3: (y == 0) --> (result = x;)

Let P = (x > y); Q = (y == 0);

Then g1: P && !Q

g2: !P && !Q

g3: Q

------------------------------------------

g1 && g2 = false;

g1 && g3 = false;

g2 && g3 = false;
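The pairwise exclusion of the guards can be checked by enumerating all truth assignments of P and Q. The helper names below are hypothetical; the sketch simply encodes g1, g2 and g3 as defined above.

```python
from itertools import product


def guards(p, q):
    """Return (g1, g2, g3) for P = (x > y) and Q = (y == 0)."""
    return (p and not q, (not p) and not q, q)


def guards_mutually_exclusive():
    """True if no two guards hold together for any values of P and Q."""
    for p, q in product([False, True], repeat=2):
        g1, g2, g3 = guards(p, q)
        if (g1 and g2) or (g1 and g3) or (g2 and g3):
            return False
    return True
```

The same enumeration also shows that at least one guard holds for every assignment, so exactly one rule is enabled in each state.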


Gated Guards

What else can we infer?

(x > y), (y != 0), (x’ == y), (y’ == x)

------------------------------------------------------

(x’ <= y’) && (y’ != 0) OR (y’ == 0)

So after Rule 1 executes, G1 certainly cannot be true, but G2 or G3 may be true, and hence G1 need not be evaluated. Also prioritize G3.


Gated Guard

• Gcd (70, 42)

• x = 70, y = 42 --> Rule 1

• x = 42, y = 70 --> Rule 2

• x = 42, y = 28 --> Rule 1

• x = 28, y = 42 --> Rule 2

• x = 28, y = 14 --> Rule 1

• x = 14, y = 28 --> Rule 2

• x = 14, y = 14 --> Rule 2

• x = 14, y = 0 --> Rule 3

• result = 14
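The trace above can be reproduced by simulating the three rules directly. This is a sketch for illustration: the function name `gcd_trace` is hypothetical, Rule 1 is modelled as a simultaneous swap, and the inputs are assumed to satisfy x > 0 and y >= 0 so that the loop terminates.

```python
def gcd_trace(x, y):
    """Fire Rules 1-3 until Rule 3 produces the result.

    Returns (result, trace), where trace lists (x, y, rule) per step.
    Assumes x > 0 and y >= 0 (an assumption of this sketch).
    """
    if x <= 0 or y < 0:
        raise ValueError("expects x > 0 and y >= 0")
    trace = []
    while True:
        if x > y and y != 0:        # Rule 1: swap x and y
            trace.append((x, y, "Rule 1"))
            x, y = y, x
        elif x <= y and y != 0:     # Rule 2: subtract x from y
            trace.append((x, y, "Rule 2"))
            y = y - x
        else:                       # Rule 3: y == 0, result is x
            trace.append((x, y, "Rule 3"))
            return x, trace
```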


Gated Guard

• Use a F/F that gets the value 1 when Rule 1 is fired, and becomes 0 when other rules are fired.

• If this F/F holds a value 1, evaluate only G3 and then G2.

• Unless Rule 1 is fired, this F/F stays at 0, and hence can be clock gated most of the time.

• This example may not be very useful, as the guards are simple to evaluate, but guard calculus on complex guards can lead to savings.
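The effect of the F/F can be estimated with a small cycle-level model. The counting scheme is an assumption of this sketch, not a measurement from the slides: an ungated cycle is charged three guard evaluations, and a cycle in which the F/F holds 1 is charged two (G3 and G2 only). The function name is hypothetical, and inputs are assumed to satisfy x > 0 and y >= 0.

```python
def count_guard_evaluations(x, y, gated):
    """Run the GCD rules and count guard evaluations.

    gated -- if True, skip G1 in the cycle right after Rule 1 fires
    Returns (result, evaluations). Assumes x > 0 and y >= 0.
    """
    if x <= 0 or y < 0:
        raise ValueError("expects x > 0 and y >= 0")
    flag = 0          # models the F/F: 1 right after Rule 1 fires
    evaluations = 0
    while True:
        skip_g1 = gated and flag == 1
        evaluations += 2 if skip_g1 else 3
        # After Rule 1, G1 is known to be false, so skipping it is safe.
        g1 = (not skip_g1) and x > y and y != 0
        g3 = y == 0
        if g1:                      # Rule 1
            x, y = y, x
            flag = 1
        elif g3:                    # Rule 3
            return x, evaluations
        else:                       # Rule 2
            y = y - x
            flag = 0
```

Under this model, Gcd (70, 42) takes 24 guard evaluations without gating and 21 with it, since Rule 1 fires three times.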


GATED GUARDS

• Theorem proving techniques can be used for deductions.

• Such analysis can be done for more complicated designs.

• A memory in hardware can be used to store the information about which guards need not be computed in the present clock cycle.


Thank You !!

?