FERMAT: Formal Engineering Research with Models, Abstractions and Transformations
Low Power Hardware Synthesis from Concurrent Action Oriented Specifications (CAOS)
Sandeep K. Shukla
Gaurav Singh
FERMAT Lab, Virginia Tech.
FERMAT / Virginia Tech 2
Outline
• CAOS Scheduling Problem
– Complexity Analysis
• Peak Power Problem
– Complexity Analysis
– Technique – Rescheduling (suppressing actions)
• Dynamic Power Problem
– Complexity Analysis
– Techniques – Rescheduling, Operand Isolation, Clock Gating, Gated Guards
CAOS Scheduling Problem
( Complexity Analysis )
SCHEDULING PROBLEMS WITHOUT A PEAK POWER CONSTRAINT
• Maximum Non-conflicting Subset of actions (MNS)
– Choosing actions which can execute in a clock cycle.
• Minimum Length Schedule Construction (MLS)
– Distributing actions over multiple clock cycles.
MAXIMUM NON-CONFLICTING SUBSET OF ACTIONS (MNS)
Instance - Set A = {a1, a2, …, an} of enabled actions; a collection
C of pairs of actions, where {ai, aj} ∈ C means that actions ai and
aj conflict; an integer K ≤ n.
Question - Is there a subset A’ ⊆ A such that |A’| ≥ K and no pair of
actions in A’ conflict?
• MNS problem is NP-Complete.
• Corresponds to Maximum Independent Set (MIS) Problem.
MAXIMUM NON-CONFLICTING SUBSET OF ACTIONS (MNS)
NOTE - For any ρ ≥ 1, a ρ-approximation algorithm for a
combinatorial optimization problem is a heuristic that produces a
solution which is within a factor ρ of the optimal solution value.
• It is known that for any ε > 0, there is no O(n^(1-ε))-approximation
algorithm for the MIS problem, unless P = NP.
• Same holds for MNS Problem.
MAXIMUM NON-CONFLICTING SUBSET OF ACTIONS (MNS)
SOLUTION - Heuristics with good performance guarantees can be
devised by exploiting the relationship between MNS and MIS
problems.
• SPECIAL CASES –
– Each action conflicts with at most Δ other actions for some constant Δ -
• An approximation algorithm exists that provides a performance guarantee of Δ+1.
– Planar graphs, near-planar graphs and unit disk graphs -
• Efficient approximation algorithms are known for such classes of graphs.
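As an illustration of the bounded-conflict case, the sketch below uses a minimum-degree greedy heuristic, a standard technique for independent sets and not necessarily the algorithm the authors have in mind. The function name and the example actions are hypothetical. Any maximal non-conflicting set contains at least n/(Δ+1) actions, which is where a guarantee of Δ+1 comes from.

```python
def greedy_non_conflicting(actions, conflicts):
    """Greedy heuristic: repeatedly pick the action with the fewest remaining conflicts."""
    # Build an adjacency map from the conflict pairs.
    neighbors = {a: set() for a in actions}
    for a, b in conflicts:
        neighbors[a].add(b)
        neighbors[b].add(a)

    remaining = set(actions)
    chosen = []
    while remaining:
        # Ties are broken by name so the result is deterministic.
        best = min(sorted(remaining), key=lambda a: len(neighbors[a] & remaining))
        chosen.append(best)
        # Drop the chosen action and every action that conflicts with it.
        remaining -= neighbors[best] | {best}
    return chosen


# Hypothetical example: five actions whose conflicts form a chain.
actions = ["a1", "a2", "a3", "a4", "a5"]
conflicts = [("a1", "a2"), ("a2", "a3"), ("a3", "a4"), ("a4", "a5")]
selected = greedy_non_conflicting(actions, conflicts)
```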
MINIMUM LENGTH SCHEDULE CONSTRUCTION (MLS)
Instance - Set A = {a1, a2,…,an} of actions; a collection C of
pairs of actions, where {ai, aj} ∈ C means that actions ai and aj
conflict; an integer t ≤ n.
Question - Is there a partition of A into r subsets A1, A2,...,Ar for
some r ≤ t such that for each i, 1 ≤ i ≤ r, the actions in Ai are
pair-wise non-conflicting?
• MLS problem is NP-Complete.
• Corresponds to Minimum K-coloring (MINCOLOR) Problem.
MINIMUM LENGTH SCHEDULE CONSTRUCTION (MLS)
• It is known that for any ε > 0, there is no O(n^(1-ε))-approximation
algorithm for the MINCOLOR problem, unless P = NP.
• Same holds for MLS Problem.
MINIMUM LENGTH SCHEDULE CONSTRUCTION (MLS)
SOLUTION – Heuristics for graph coloring can be used in
constructing schedules of near-minimum length.
• SPECIAL CASES –
– Upper bound on the length of the schedule is two -
• Corresponds to the problem of determining whether a graph is 2-colorable.
• Efficient algorithms are known.
– Each action conflicts with at most Δ other actions -
• For such instances, a schedule of length at most Δ + 1 can be constructed in polynomial time.
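The Δ + 1 bound can be illustrated with first-fit (greedy) graph coloring, where each color is a clock cycle. This is a minimal sketch under that assumption; the function name and the example conflict set are hypothetical. Each action sees at most Δ occupied slots among its conflicting neighbors, so one of the first Δ + 1 slots is always free.

```python
def greedy_schedule(actions, conflicts):
    """Assign each action to the first time slot free of conflicting actions."""
    neighbors = {a: set() for a in actions}
    for a, b in conflicts:
        neighbors[a].add(b)
        neighbors[b].add(a)

    slot_of = {}
    for a in actions:
        # Slots already taken by actions that conflict with a.
        used = {slot_of[b] for b in neighbors[a] if b in slot_of}
        slot = 0
        while slot in used:
            slot += 1
        slot_of[a] = slot

    n_slots = max(slot_of.values(), default=-1) + 1
    schedule = [[] for _ in range(n_slots)]
    for a in actions:
        schedule[slot_of[a]].append(a)
    return schedule


# Hypothetical example: a1, a2, a3 conflict pairwise, and a4 conflicts with a1.
actions = ["a1", "a2", "a3", "a4"]
conflicts = [("a1", "a2"), ("a2", "a3"), ("a3", "a1"), ("a1", "a4")]
schedule = greedy_schedule(actions, conflicts)
```

Here Δ = 3 (a1 has three conflicts) and the schedule uses three slots, within the Δ + 1 bound.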
PEAK POWER PROBLEM
( Complexity Analysis )
SCHEDULING PROBLEMS INVOLVING A POWER CONSTRAINT
Single Clock Cycle –
– Maximum Number of Actions in a Time Slot Subject to Peak Power Constraint (MNA-PP).
– Maximizing Utility Subject to Peak Power Constraint (MU-PP).
Maximum Number of Actions in a Time Slot Subject to Peak Power Constraint (MNA-PP)
Instance –
– set A = {a1, a2,…, an} of non-conflicting actions,
– for each action ai, the power pi needed to execute that action,
– a positive number P representing the peak power constraint.
Requirement - Find a subset A’ ⊆ A such that
– the total power needed to execute the actions in A’ is at most P, and
– |A’| is a maximum over all subsets of A that satisfy the peak power constraint.
Optimal Solution -
– Sort the actions in A into non-decreasing order by the amount of power.
– Keep adding actions in that order as long as the peak power constraint is satisfied.
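The sort-and-add procedure above can be sketched as follows. The function name and the power values are hypothetical and chosen only for illustration.

```python
def max_actions_under_peak(power, peak):
    """Pick as many actions as possible without exceeding the peak power."""
    chosen, total = [], 0
    # Consider the cheapest actions first (non-decreasing power).
    for name, p in sorted(power.items(), key=lambda kv: kv[1]):
        if total + p > peak:
            break  # every remaining action needs at least as much power
        chosen.append(name)
        total += p
    return chosen, total


# Hypothetical per-action power values and peak power constraint.
power = {"a1": 5, "a2": 2, "a3": 4, "a4": 1}
chosen, total = max_actions_under_peak(power, peak=8)
```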
Maximizing Utility Subject to Peak Power Constraint (MU-PP)
Instance –
– set A = {a1, a2,…,an} of non-conflicting actions,
– for each action ai, its power pi consumed and its utility ui,
– a positive number P representing the peak power,
– a positive number Γ representing the required utility.
Question - Is there a subset A’ ⊆ A such that the total power needed to execute all the actions in A’ is at most P and the utility of A’ is at least Γ?
• MU-PP problem is NP-Complete.
• Corresponds to KNAPSACK Problem.
Maximizing Utility Subject to Peak Power Constraint (MU-PP)
• Any approximation algorithm for the KNAPSACK problem can be used as
an approximation algorithm with the same performance guarantee for the
optimization version of MU-PP.
• When the weights and profits are integers, there is a polynomial time
approximation scheme (PTAS), and in fact a fully polynomial one (FPTAS), for the KNAPSACK problem.
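To make the KNAPSACK correspondence concrete, here is a minimal sketch of the textbook 0/1 knapsack dynamic program applied to MU-PP. It is an exact pseudo-polynomial algorithm, not an approximation scheme, and it assumes integer power values. The function name and the example data are hypothetical.

```python
def max_utility(actions, peak):
    """actions: list of (name, power, utility) with integer power values.

    Returns (best utility, chosen action names) for total power <= peak.
    """
    # best[p] holds the best (utility, names) achievable with power budget p.
    best = [(0, [])] * (peak + 1)
    for name, power, utility in actions:
        # Walk budgets downward so each action is used at most once.
        for p in range(peak, power - 1, -1):
            candidate = best[p - power][0] + utility
            if candidate > best[p][0]:
                best[p] = (candidate, best[p - power][1] + [name])
    return best[peak]


# Hypothetical instance: (name, power, utility).
actions = [("a1", 3, 4), ("a2", 4, 5), ("a3", 2, 3)]
utility, chosen = max_utility(actions, peak=5)
```

The decision version then reduces to checking whether the returned utility is at least Γ.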
SCHEDULING PROBLEMS INVOLVING A POWER CONSTRAINT
Multiple Clock Cycles –
– Minimizing Makespan Subject to Peak Power Constraint (MM-PP).
– Minimizing Peak Power Subject to Makespan Constraint (MPP-M).
– Minimizing Makespan and Peak Power – Decision Version (MPP-DECISION).
Minimizing Makespan Subject to Peak Power Constraint (MM-PP)
Instance –
– set A = {a1, a2,…,an} of non-conflicting actions,
– for each action ai, the power pi needed to execute that action,
– a positive number P representing the peak power.
Requirement –
Find a schedule of minimum length for the actions in A such that the total power needed to execute the actions in each time slot is at most P.
Minimizing Peak Power Subject to a Makespan Constraint (MPP-M)
Instance –
– set A = {a1, a2,…,an} of non-conflicting actions,
– for each action ai, the power pi needed to execute that action,
– a positive number L representing the makespan (number of slots used by a schedule).
Requirement –
Find a schedule of length at most L for the actions in A such that the maximum total power used in any time slot is a minimum over all schedules of length at most L.
NOTE - MPP-M is the dual of MM-PP.
Minimizing Makespan and Peak Power (MPP-DECISION)
– Decision Version of MM-PP and MPP-M.
Instance –
– set A = {a1, a2,…,an} of non-conflicting actions,
– for each action ai, the power pi needed to execute that action,
– a positive number P representing the peak power,
– a positive number L representing the makespan.
Question – Is there a schedule of length at most L for the actions in A such that the
total power used in any time slot is at most P?
• MPP-DECISION problem is strongly NP-Complete.
• Corresponds to the 3-PARTITION problem.
• There is no pseudo-polynomial algorithm for the MPP-DECISION problem, unless P = NP.
Approximation Algorithms for MM-PP
• Efficient approximation algorithms are possible by reducing the
problem to the well-known BIN PACKING problem.
• Example - A simple algorithm called First Fit Decreasing (FFD)
provides an asymptotic performance guarantee of 11/9.
– Sort the items in non-increasing order of their sizes and then assign each item to the first bin in which it fits.
– A sophisticated implementation reduces the running time to O(n log n).
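A minimal sketch of First Fit Decreasing applied to MM-PP, treating each time slot as a bin of capacity P and each action's power as its size. This is the straightforward quadratic version, not the O(n log n) implementation mentioned above. The function name and the power values are hypothetical.

```python
def first_fit_decreasing(power, peak):
    """Pack actions into time slots so each slot's total power is at most peak."""
    slots = []  # each entry is [remaining power budget, list of action names]
    # Largest power first (non-increasing order).
    for name, p in sorted(power.items(), key=lambda kv: -kv[1]):
        if p > peak:
            raise ValueError(f"{name} alone exceeds the peak power")
        for slot in slots:
            if slot[0] >= p:  # first slot in which the action fits
                slot[0] -= p
                slot[1].append(name)
                break
        else:
            slots.append([peak - p, [name]])  # open a new time slot
    return [names for _, names in slots]


# Hypothetical per-action power values with peak power P = 10.
power = {"a1": 7, "a2": 5, "a3": 4, "a4": 3, "a5": 1}
schedule = first_fit_decreasing(power, peak=10)
```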
Approximation Algorithms for MPP-M
• Efficient approximation algorithms are possible by reducing the
problem to the classical multiprocessor scheduling problem.
• Example –
– 4/3 approximation algorithm -
• Sort the actions in non-increasing order of their power requirements.
• Assign each action to the time slot whose total power so far is the smallest.
– Can be implemented to run in O(n log n) time.
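The two steps above can be sketched with a min-heap keyed on the power already assigned to each slot, which gives the O(n log n) behavior. This is an illustrative sketch; the function name and the power values are hypothetical.

```python
import heapq


def balance_peak_power(power, length):
    """Spread actions over `length` slots, keeping the largest slot total small."""
    heap = [(0, i) for i in range(length)]  # (power used so far, slot index)
    slots = [[] for _ in range(length)]
    # Largest power first (non-increasing order).
    for name, p in sorted(power.items(), key=lambda kv: -kv[1]):
        load, i = heapq.heappop(heap)  # slot with the least power so far
        slots[i].append(name)
        heapq.heappush(heap, (load + p, i))
    peak = max(load for load, _ in heap)
    return slots, peak


# Hypothetical per-action power values and a makespan of L = 2 slots.
power = {"a1": 7, "a2": 5, "a3": 4, "a4": 3, "a5": 1}
slots, peak = balance_peak_power(power, length=2)
```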
LOW PEAK POWER TECHNIQUE
Re-scheduling – Suppress some actions in each cycle to reduce the peak power of the design.
Possible Ways –
– Conflict-based
• Add extra conflicts for the sake of peak power.
– Memory-based
• Use memory to select how many actions to execute in each cycle.
MEMORY-BASED LOW PEAK POWER TECHNIQUE
ALGORITHM -
– Arrange actions based on their TRS ordering.
– Find possible combinations of non-conflicting actions which can violate the peak power constraint when executed concurrently.
– For each violating combination -
• find a satisfying combination by suppressing some actions.
• give priority to actions which come earlier in TRS-ordering.
• store the satisfying combinations in a memory.
– In hardware, the memory is used to execute appropriate actions in each clock cycle in order to satisfy the peak power constraint.
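The table-building steps above can be sketched in software as follows. This is not the Bluespec compiler implementation. It assumes that "priority to earlier actions" means scanning a violating combination in TRS order and keeping each action that still fits under the peak. The function name, rule names and power values are hypothetical, and the enumeration is exponential, so it only suits small designs.

```python
from itertools import combinations


def build_suppression_table(order, power, peak, conflicts=()):
    """Map each violating combination of actions to a combination that fits."""
    conflict_set = {frozenset(c) for c in conflicts}
    table = {}
    for k in range(1, len(order) + 1):
        # combinations() preserves the TRS ordering given in `order`.
        for combo in combinations(order, k):
            if any(frozenset(pair) in conflict_set
                   for pair in combinations(combo, 2)):
                continue  # conflicting actions never execute together
            if sum(power[a] for a in combo) <= peak:
                continue  # no violation, nothing to store
            kept, total = [], 0
            for a in combo:  # earlier actions get priority
                if total + power[a] <= peak:
                    kept.append(a)
                    total += power[a]
            table[combo] = tuple(kept)
    return table


# Hypothetical design: three rules in TRS order with a peak power of 6.
order = ["r1", "r2", "r3"]
power = {"r1": 4, "r2": 3, "r3": 2}
table = build_suppression_table(order, power, peak=6)
```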
MEMORY-BASED LOW PEAK POWER TECHNIQUE
Implemented in the Bluespec Compiler –
– Around 10% peak-power savings achieved for small designs like a Vending Machine.
– Larger power savings may be possible for larger designs.
• Experiments are ongoing.
MEMORY-BASED LOW PEAK POWER TECHNIQUE
LIMITATIONS -
– Some designs written under the assumption that the maximum number of actions will execute in each clock cycle might not be able to use this technique.
– Increases latency, so it is applicable mostly to latency-insensitive designs.
– Designs with a large number of actions may require a large memory.
DYNAMIC POWER PROBLEM
( Complexity Analysis )
DYNAMIC POWER PROBLEM (DPP)
Instance –
- set A = {a1, a2,…,an} of actions.
- a positive integer P representing the dynamic power consumed.
Requirement -
Select the ordering of execution of the actions in A such that P is minimized.
• DPP is NP-Complete.
• Corresponds to the Traveling Salesman Problem (TSP), which is a sub-problem of DPP.
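To illustrate the TSP connection, the sketch below orders actions with a nearest-neighbor heuristic. It assumes a simple cost model that is not stated on the slide: the switching cost between consecutive actions is the Hamming distance between the bit patterns they present to a shared functional unit. The function names and bit patterns are hypothetical.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def switching_cost(order, inputs):
    """Total bit toggles when actions execute in the given order."""
    return sum(hamming(inputs[a], inputs[b]) for a, b in zip(order, order[1:]))


def nearest_neighbor_order(inputs, start):
    """Greedy TSP-style heuristic: always go to the closest unvisited action."""
    order = [start]
    remaining = set(inputs) - {start}
    while remaining:
        last = inputs[order[-1]]
        nxt = min(sorted(remaining), key=lambda a: hamming(last, inputs[a]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical operand bit patterns driven by each action.
inputs = {"a1": 0b0000, "a2": 0b1111, "a3": 0b0001, "a4": 0b0111}
order = nearest_neighbor_order(inputs, start="a1")
```

For this data the heuristic order costs 4 bit toggles, against 9 for the order a1, a2, a3, a4. Nearest neighbor carries no optimality guarantee in general.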
LOW DYNAMIC POWER TECHNIQUES
• Re-scheduling.
• Operand Isolation.
• Clock Gating.
• Gated Guards.
RE-SCHEDULING
• Actions can be re-scheduled such that switching at the inputs of the functional units is minimized.
• Resource sharing - Conflicts can be created so that actions performing the same operations on the same operands can share the same functional units.
OPERAND ISOLATION
• Operand Isolation –
– Computation corresponding to the body of an action is allowed only when its output is used in the present clock cycle.
– Involves -
• Insertion of gates at the appropriate points without affecting guards.
• Selection of the activation signal.
– Guards of actions are used as gating signals.
– The algorithm implemented in the Bluespec Compiler saved up to 25% dynamic power.
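The effect of operand isolation can be illustrated with a small software model that counts bit toggles at the input of a functional unit. It assumes hold-style isolation, where the input keeps its previous value whenever the guard is false. AND-based isolation, which forces the input to zero, would give different counts. The function name and the data are hypothetical.

```python
def input_toggles(operands, guards, isolate):
    """Count bit toggles at a functional-unit input over a sequence of cycles."""
    toggles, current = 0, 0
    for value, guard in zip(operands, guards):
        if isolate and not guard:
            continue  # isolation holds the previous input value
        toggles += bin(current ^ value).count("1")
        current = value
    return toggles


# Hypothetical operand values per cycle and the action's guard per cycle.
operands = [0b1010, 0b0101, 0b1111, 0b0000]
guards = [True, False, False, True]
without_isolation = input_toggles(operands, guards, isolate=False)
with_isolation = input_toggles(operands, guards, isolate=True)
```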
OPERAND ISOLATION – SINGLE ACTION
[Figure: circuit for action foo. Labels: current state (x, y, z), cond logic, body logic, enable signals, next-state values (x’, y’, z’), state register (D, Q, EN), Φ2.]
Computations stay quiescent except when the action executes, i.e. when the guard is True.
action foo (… cond … (x < y) …);
x <= x + z …
endrule
OPERAND ISOLATION – MULTIPLE ACTIONS
Isolating multiple actions of a design.
[Figure: block diagram. Labels: Rule1 … RuleN (each with Cond and Action), Rule Control, Scheduler, Data Select, Enable, State register (D, Q), Φ2 … ΦN.]
REGISTER CLOCK GATING
• Register Clock-gating -
– Registers having a common ENABLE signal can be provided the same gated clock.
– CAOS - Registers being updated by the same set of actions can be passed the same gated clock.
• The algorithm implemented in the Bluespec Compiler saved up to 45% dynamic power.
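The CAOS grouping rule can be sketched as follows: registers are grouped by the exact set of actions that update them, and each group can share one gated clock. This is an illustrative sketch, not the compiler's algorithm. The function name, action names and register names are hypothetical.

```python
from collections import defaultdict


def group_registers(writes):
    """writes maps each action to the registers it updates.

    Returns a dict from a set of writer actions to the registers sharing it.
    """
    writers = defaultdict(set)
    for action, registers in writes.items():
        for reg in registers:
            writers[reg].add(action)

    groups = defaultdict(list)
    for reg in sorted(writers):
        # Registers with identical writer sets can share one gated clock.
        groups[frozenset(writers[reg])].append(reg)
    return dict(groups)


# Hypothetical design: which registers each action updates.
writes = {"a1": ["r1", "r2"], "a2": ["r1", "r2", "r3"], "a3": ["r4"]}
groups = group_registers(writes)
```

Under this grouping, one natural choice for each group's clock enable is the OR of the guards of the actions in its writer set.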
REGISTER CLOCK GATING
In CAOS, guards of the actions provide the control for gating the clocks of the registers.
[Figure: clock-gated register. Labels: Register (DIN, QOUT), EN, CLK, GATED_CLK.]
GATED GUARDS
• In hardware, only the required guards should be computed in each clock cycle to save power.
• Static analysis can be done to figure out which guards should be
computed.
Gated Guards
• Rule 1: (x > y) && (y != 0) --> (x = y; y = x;)
• Rule 2: (x <= y) && (y != 0) --> (y = y - x;)
• Rule 3: (y == 0) --> (result = x;)
Let P = (x > y); Q = (y == 0);
Then g1: P && !Q
g2: !P && !Q
g3: Q
------------------------------------------
g1 && g2 = false;
g1 && g3 = false;
g2 && g3 = false
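The pairwise exclusivity of the three guards can be checked by brute force over a small range of values. This is only a sanity check, not a proof, and it assumes x and y are non-negative integers. The function names are hypothetical.

```python
from itertools import product


def g1(x, y):
    return x > y and y != 0


def g2(x, y):
    return x <= y and y != 0


def g3(x, y):
    return y == 0


def exactly_one_guard(limit=32):
    """Check that exactly one guard holds for every (x, y) below limit."""
    return all(
        sum(bool(g(x, y)) for g in (g1, g2, g3)) == 1
        for x, y in product(range(limit), repeat=2)
    )


exclusive = exactly_one_guard()
```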
Gated Guards
What else can we infer?
(x > y), (y != 0), (x’ == y), (y’ == x)
------------------------------------------------------
(x’ <= y’) && (y’ != 0) OR (y’ == 0)
So after Rule 1 executes, we know for sure that G1 cannot be true, but G2 or G3 may be true; hence G1 need not be evaluated. Also prioritize G3.
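The inference about the state after Rule 1 can be sanity-checked exhaustively over a small range. The sketch assumes x and y are non-negative integers and that Rule 1 swaps x and y atomically. The function names are hypothetical, and a bounded check like this is no substitute for a proof.

```python
from itertools import product


def g1(x, y):
    return x > y and y != 0


def rule1(x, y):
    """Rule 1 body: swap x and y (both assignments read the old values)."""
    return y, x


def g1_false_after_rule1(limit=32):
    """Check that G1 never holds in the state that Rule 1 produces."""
    for x, y in product(range(limit), repeat=2):
        if g1(x, y):
            next_x, next_y = rule1(x, y)
            if g1(next_x, next_y):
                return False
    return True


holds = g1_false_after_rule1()
```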
Gated Guard
• Gcd (70, 42)
• x = 70, y = 42 --> Rule 1
• x = 42, y = 70 --> Rule 2
• x = 42, y = 28 --> Rule 1
• x = 28, y = 42 --> Rule 2
• x = 28, y = 14 --> Rule 1
• x = 14, y = 28 --> Rule 2
• x = 14, y = 14 --> Rule 2
• x = 14, y = 0 --> Rule 3
• result = 14
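The trace above can be reproduced with a direct simulation of the three rules. The function name is hypothetical, and one rule fires per loop iteration, standing in for one clock cycle.

```python
def gcd_trace(x, y):
    """Fire one rule per cycle and record (x, y, rule) before each firing."""
    trace = []
    result = None
    while result is None:
        if x > y and y != 0:        # Rule 1: swap
            trace.append((x, y, "Rule 1"))
            x, y = y, x
        elif x <= y and y != 0:     # Rule 2: subtract
            trace.append((x, y, "Rule 2"))
            y = y - x
        else:                       # Rule 3: y == 0, publish the result
            trace.append((x, y, "Rule 3"))
            result = x
    return trace, result


trace, result = gcd_trace(70, 42)
```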
Gated Guard
• Use a F/F that gets the value 1 when Rule 1 is fired, and becomes 0 when other rules are fired.
• If this F/F holds the value 1, evaluate only G3 and then G2.
• Unless Rule 1 is fired, this F/F stays at 0, and hence can be clock gated most of the time.
• This example may not be very useful, as the guards are simple to evaluate, but guard calculus on complex guards can lead to savings.
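The flip-flop scheme can be modeled in software by counting guard evaluations. The cost model is an assumption: a normal cycle evaluates all three guards, while a cycle with the flag set tries G3, then G2, and stops at the first true guard. The names are hypothetical, and the inputs are assumed to be non-negative integers.

```python
GUARDS = {
    1: lambda x, y: x > y and y != 0,
    2: lambda x, y: x <= y and y != 0,
    3: lambda x, y: y == 0,
}


def run_gcd(x, y, gated):
    """Return (result, number of guard evaluations)."""
    evaluations, flag = 0, False
    while True:
        fired = None
        if gated and flag:
            # Rule 1 fired last cycle, so G1 is known false: try G3, then G2.
            for rule in (3, 2):
                evaluations += 1
                if GUARDS[rule](x, y):
                    fired = rule
                    break
        else:
            evaluations += 3  # all three guards are evaluated
            fired = next(r for r in (1, 2, 3) if GUARDS[r](x, y))
        if fired == 1:
            x, y = y, x
        elif fired == 2:
            y = y - x
        elif fired == 3:
            return x, evaluations
        else:
            raise ValueError("no guard is true; inputs must be non-negative")
        flag = (fired == 1)  # the F/F is 1 only right after Rule 1


plain = run_gcd(70, 42, gated=False)
gated = run_gcd(70, 42, gated=True)
```

For Gcd (70, 42) this model counts 24 guard evaluations without gating and 21 with it, consistent with the remark that the savings are modest when guards are simple.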
GATED GUARDS
• Theorem proving techniques can be used for deductions.
• Such analysis can be done for more complicated designs.
• A memory in hardware can be used to store the information about which guards need not be computed in the present clock cycle.
Thank You !!
?