A CAD Framework for Leakage Power Aware Synthesis of Asynchronous Circuits


Upload: conan-dickerson

Post on 31-Dec-2015


DESCRIPTION

A CAD Framework for Leakage Power Aware Synthesis of Asynchronous Circuits. Behnam Ghavami and Hossein Pedram. Presented by Wei-Lun Hung. Outline: Introduction; AsyncTool: Synthesis of QDI Asynchronous Circuits; Statistical Performance Analysis; Transistor Parameter Assignment.

TRANSCRIPT

A CAD Framework for Leakage Power Aware Synthesis of Asynchronous Circuits

Behnam Ghavami and Hossein Pedram
Presented by Wei-Lun Hung

Outline
- Introduction
- AsyncTool: Synthesis of QDI Asynchronous Circuits
- Statistical Performance Analysis
- Transistor Parameter Assignment
- Experimental Results
- Conclusion

Introduction
- VLSI design challenges
  - High power consumption
  - Synchronization problems
  - Robustness issues
- One possible solution: asynchronous circuits
  - Low power consumption
  - No clock skew
  - Low electromagnetic interference (EMI)

Asynchronous Circuits
- Not controlled by a global clock
  - Eliminate clock skew
  - Potentially faster
  - Low power consumption
  - Low EMI
- Rely on handshake exchanges
- Limitations
  - Lack of automatic synthesis tools
  - Hard to evaluate the performance of asynchronous circuits

Transistor Parameters
- Vth, Vdd, and gate size are the parameters that affect circuit performance
- Search heuristically for a good tradeoff according to the optimization goal
- Optimization of synchronous circuits
  - Multiple-Vth and multiple-Vdd assignment
  - Ex: the gates on critical paths operate at the higher Vdd or lower Vth
- Optimization of asynchronous circuits
  - A critical path cannot be computed as in synchronous circuits
  - Performance depends on dynamic factors, e.g., the number of tokens

The Framework of Asynchronous Circuit

AsyncTool: Synthesis of QDI Asynchronous Circuits

Asynchronous Circuit Model
- Delay-insensitive (DI)
  - Most robust of all asynchronous circuit delay models
  - Makes no assumptions about the delay of wires or gates
  - Any transition on an input to a gate must be seen on the output
  - Not practical due to the heavy restrictions
- Quasi-delay-insensitive (QDI)
  - Like DI, but assumes that the delays of the branches of a fork are equal (isochronic forks)
- Verilog-CSP code is used in this framework

AsyncTool: Synthesis of QDI Asynchronous Circuits
- Uses Pre-Charge Full-Buffer (PCFB) templates as its predefined templates
  - Encapsulate all isochronic forks inside the templates
  - Eliminate the isochronic fork constraint
- Three parts
  - Arithmetic function extractor (AFE)
    - Ex: addition, subtraction, comparison, ...
    - Implements them with pre-synthesized standard templates
  - Decomposition
  - Template Synthesizer (TSYN)
    - One-bit operators, e.g., AND, OR, XOR
    - An expander is used to convert multiple-bit expressions

Decomposition (1/2)
- Decompose the original description into an equivalent collection of smaller interacting processes
  - Convert to dynamic single assignment form
  - Projection

Dynamic Single Assignment form
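The conversion to single assignment form can be illustrated with a small renaming pass. This is only a sketch under simplifying assumptions: it handles straight-line code without loops or guards, and the statement encoding, the function name `to_single_assignment`, and the numbered naming scheme are hypothetical, not taken from AsyncTool.

```python
def to_single_assignment(stmts):
    """Rename variables so that each name is assigned at most once.

    stmts: list of (target, [operand names]) pairs in program order.
    Returns the same statements with renamed targets and operands.
    Assumption: numbered names such as "x1" do not clash with inputs.
    """
    version = {}   # variable -> number of assignments seen so far
    current = {}   # variable -> name that currently holds its value
    out = []
    for target, operands in stmts:
        # Operands read the latest version; primary inputs keep their name.
        new_ops = [current.get(v, v) for v in operands]
        version[target] = version.get(target, 0) + 1
        new_target = f"{target}{version[target]}"
        current[target] = new_target
        out.append((new_target, new_ops))
    return out


# x := a op b; x := x op c; y := f(x)
program = [("x", ["a", "b"]), ("x", ["x", "c"]), ("y", ["x"])]
renamed = to_single_assignment(program)
# renamed == [("x1", ["a", "b"]), ("x2", ["x1", "c"]), ("y1", ["x2"])]
```

After renaming, every value has exactly one producer, which is what makes it possible to split the program into small processes that communicate by passing values.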


Decomposition (2/2)
- Projection
  - Break the program up into a concurrent system of smaller modules

Statistical Performance Analysis

Petri-Nets
- Used to model concurrency and synchronization
- Represented as a bipartite graph
- Defined as a four-tuple $N = (P, T, F, m_0)$
  - $P$: set of places
  - $T$: set of transitions
  - $F \subseteq (P \times T) \cup (T \times P)$: flow relation
  - $m_0$: initial marking
- A marking is a mapping $M: P \rightarrow \mathbb{N}$
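A minimal sketch of the four-tuple as a data structure, together with the standard firing rule (a transition is enabled when every input place holds a token). The class and method names are illustrative assumptions, and arcs are taken to have weight 1.

```python
from dataclasses import dataclass


@dataclass
class PetriNet:
    places: set
    transitions: set
    flow: set       # arcs, each (place, transition) or (transition, place)
    marking: dict   # place -> token count (starts as the initial marking m0)

    def preset(self, t):
        """Input places of transition t."""
        return [p for (p, x) in self.flow if x == t and p in self.places]

    def postset(self, t):
        """Output places of transition t."""
        return [p for (x, p) in self.flow if x == t and p in self.places]

    def enabled(self, t):
        return all(self.marking.get(p, 0) >= 1 for p in self.preset(t))

    def fire(self, t):
        if not self.enabled(t):
            raise ValueError(f"transition {t} is not enabled")
        for p in self.preset(t):
            self.marking[p] -= 1
        for p in self.postset(t):
            self.marking[p] = self.marking.get(p, 0) + 1


# Two places and two transitions forming a ring with one token.
net = PetriNet(
    places={"p1", "p2"},
    transitions={"t1", "t2"},
    flow={("p1", "t1"), ("t1", "p2"), ("p2", "t2"), ("t2", "p1")},
    marking={"p1": 1, "p2": 0},
)
net.fire("t1")   # moves the token from p1 to p2
```

After the call, `t2` is enabled and `t1` is not, which is the token-passing behavior used to model a handshake between two stages.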

Petri-Nets Examples

Timed Petri-Net
- A Petri-Net in which transitions or places are annotated with delays
- For a cycle $C_k$, the cycle metric is $CM(C_k) = D(C_k) / M(C_k)$
  - $D(C_k) = \sum_{i \in C_k} d_i$: total delay of the cycle
  - $M(C_k)$: number of tokens in the cycle
- The performance of a Timed Petri-Net is dictated by its cycle time, the largest cycle metric
  - $CTime = \max_{C_k \in TPN} CM(C_k)$
- Can be solved by maximum mean-cycle algorithms

Average Case VS Worst Case
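The cycle-time definition given for the Timed Petri-Net can be checked on a small example by enumerating simple cycles directly. This is a brute-force sketch, not one of the maximum mean-cycle algorithms mentioned in the slides. It assumes the net has been flattened to a directed graph whose edges carry an integer delay and a token count; the function name and the edge encoding are hypothetical.

```python
from fractions import Fraction


def cycle_time(edges):
    """edges: list of (src, dst, delay, tokens) with integer delay and tokens.

    Returns the largest cycle metric D(C)/M(C) over all simple cycles,
    or None if the graph has no cycle.
    """
    graph = {}
    for u, v, d, m in edges:
        graph.setdefault(u, []).append((v, d, m))
    nodes = sorted(set(graph) | {v for _, v, _, _ in edges})
    best = None

    def dfs(start, node, delay, tokens, visited):
        nonlocal best
        for nxt, d, m in graph.get(node, []):
            if nxt == start:
                total_tokens = tokens + m
                if total_tokens == 0:
                    raise ValueError("cycle without tokens: the net deadlocks")
                cm = Fraction(delay + d, total_tokens)
                if best is None or cm > best:
                    best = cm
            elif nxt > start and nxt not in visited:
                # Only visit nodes larger than start so that each cycle
                # is enumerated once, from its smallest node.
                dfs(start, nxt, delay + d, tokens + m, visited | {nxt})

    for s in nodes:
        dfs(s, s, 0, 0, {s})
    return best


edges = [
    ("a", "b", 3, 1),
    ("b", "a", 2, 0),
    ("b", "c", 4, 1),
    ("c", "a", 5, 1),
]
ctime = cycle_time(edges)
# Cycle a-b-a: 5/1 = 5; cycle a-b-c-a: 12/3 = 4; so ctime == 5
```

Enumeration is exponential in the worst case, so it is only suitable for illustrating the definition on small nets.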

Probabilistic Timed Petri-Net

The Average-Case Performance Metric
- For a P-TPN that has only one choice with n outcomes
  - Convert it to n TPN models
- For a P-TPN that has more than one choice
  - Recursively apply the following formula
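For the single-choice case, one plausible reading is that each of the n outcomes yields an ordinary TPN with its own cycle time, and the average-case metric is the probability-weighted mean of those cycle times. The sketch below encodes that reading only; the weighting is an assumption, since the slide's formula is not reproduced in this transcript, and the function name is hypothetical.

```python
def average_cycle_time(outcomes):
    """outcomes: list of (probability, cycle_time) pairs, one per TPN model.

    Assumption: the average-case metric is the expected cycle time.
    """
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * ct for p, ct in outcomes)


# A choice that takes the fast branch 90% of the time.
outcomes = [(0.9, 4.0), (0.1, 10.0)]
avg = average_cycle_time(outcomes)          # about 4.6
worst = max(ct for _, ct in outcomes)       # 10.0
```

The gap between `avg` and `worst` is the slack that an average-case optimization can exploit and a worst-case one cannot.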

Probability Model
- Use the static range of the primary inputs of the circuit to determine the static range of internal signals
- Independent VS dependent
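Propagating ranges from primary inputs to internal signals can be sketched with interval arithmetic. The sketch assumes a static range is a closed integer interval `(lo, hi)` and that operands are independent; the helper names are illustrative and do not come from the slides.

```python
def add_range(a, b):
    """Range of x + y for x in a and y in b."""
    return (a[0] + b[0], a[1] + b[1])


def sub_range(a, b):
    """Range of x - y for x in a and y in b."""
    return (a[0] - b[1], a[1] - b[0])


def mul_range(a, b):
    """Range of x * y: the extremes occur at the interval endpoints."""
    products = [x * y for x in a for y in b]
    return (min(products), max(products))


# Two 4-bit primary inputs.
x = (0, 15)
y = (0, 15)
s = add_range(x, y)   # (0, 30)
d = sub_range(x, y)   # (-15, 15)
m = mul_range(x, y)   # (0, 225)
```

When operands are dependent (for example `x - x`), this independent treatment over-approximates the true range, which is one reason the slide distinguishes the two cases.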

Computing the Static Range (1/3)
- The tagged static ranges of a variable v are denoted by TSR(v), where each $r \in TSR(v)$ is expressed as
  - r.ct: the conditional tag
  - r.vt: the variable expression tag
  - r.sr: the static range

Computing the Static Range (2/3)
- Given the static ranges of the right-hand-side variables, the static range of the left-hand-side variable can be computed by

where one operator symbol denotes a standard operator on data values and the other denotes the corresponding operation on static ranges

Computing the Static Range (3/3)
- For a loop

Computing Choice Probabilities (1/3)
- For a condition variable CV(X > Y)

Computing Choice Probabilities (2/3)

Computing Choice Probabilities (3/3)
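As an illustration of how a choice probability for a condition such as CV(X > Y) might follow from static ranges, the sketch below counts favorable value pairs. It assumes X and Y are independent and uniformly distributed over integer ranges, which is an assumption made here for the example and not a statement of the slides' probability model; `prob_greater` is a hypothetical name.

```python
from fractions import Fraction


def prob_greater(x_range, y_range):
    """P(X > Y) for independent uniform integers X in x_range, Y in y_range."""
    x_lo, x_hi = x_range
    y_lo, y_hi = y_range
    favorable = 0
    for x in range(x_lo, x_hi + 1):
        # Count the y values in [y_lo, y_hi] that are strictly below x.
        favorable += max(0, min(x - 1, y_hi) - y_lo + 1)
    total = (x_hi - x_lo + 1) * (y_hi - y_lo + 1)
    return Fraction(favorable, total)


p_same = prob_greater((0, 3), (0, 3))      # 6 of 16 pairs, i.e. 3/8
p_always = prob_greater((4, 7), (0, 3))    # ranges do not overlap, so 1
```

The resulting probabilities are the branch weights that a Probabilistic Timed Petri-Net needs at each choice place.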

Template Parameter Assignment
- Vth, Vdd, and gate size are the parameters that affect circuit performance
- Dual-Vdd, dual-Vth, and eight sizes for each type of template
- Adopt a quantum genetic algorithm

The Genetic Algorithm
- A search technique used in computing to find exact or approximate solutions to optimization problems
- Uses techniques inspired by evolutionary biology, such as inheritance, mutation, selection, and crossover
  - Population: abstract representations of candidate solutions
  - Repopulation: generate a second-generation population of solutions from those selected through genetic operators
  - Fitness function: decides the survival chance of individuals
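The ingredients listed above (population, selection, crossover, mutation, fitness) fit together as in the following minimal sketch of a classical genetic algorithm on bit strings. It is a generic illustration and not the authors' optimizer; the tournament selection, single-point crossover, elitism, and all parameter values are choices made here for the example.

```python
import random


def genetic_algorithm(fitness, n_bits, pop_size=12, generations=50,
                      mutation_rate=0.05, seed=0):
    """Maximize fitness over bit strings of length n_bits (n_bits >= 2)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        def select():
            # Tournament selection: the fitter of two random individuals.
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b

        children = [best[:]]  # elitism: the best individual always survives
        while len(children) < pop_size:
            p1, p2 = select(), select()
            cut = rng.randrange(1, n_bits)          # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]              # bit-flip mutation
            children.append(child)
        pop = children                              # repopulation
        best = max(pop, key=fitness)
    return best


# Toy fitness: number of ones in the string.
result = genetic_algorithm(sum, 16)
```

In the parameter-assignment setting, each bit string would instead encode a Vdd, Vth, and size choice per template, and the fitness would come from the power, area, and performance estimates.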

The Quantum Genetic Algorithm
- The circuit configuration information is encoded into qubits
- A qubit may be in the 1 or 0 state, or in any superposition of the two, represented as $|\psi\rangle = \alpha|1\rangle + \beta|0\rangle$, where $|\alpha|^2 + |\beta|^2 = 1$
  - $|\alpha|^2$ and $|\beta|^2$ give the probabilities that the qubit will be found in the 1 or 0 state, respectively

The Quantum Genetic Algorithm
- The population of m-qubit individuals at generation g is denoted as $Q(g) = \{q_1^g, q_2^g, \ldots, q_n^g\}$, where $q_j^g$ is defined as

The Update Procedure

The Quantum Genetic Algorithm
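The overall loop of a quantum-inspired genetic algorithm can be sketched as observe, evaluate, update. The update rule on the slides is not captured in this transcript, so the sketch below assumes a simple rotation toward the best solution found so far, a common choice in quantum-inspired evolutionary algorithms. Each qubit is stored as an angle `theta` with P(bit = 1) = sin²(theta); that representation, the step size, and all function names are assumptions made for illustration.

```python
import math
import random


def observe(thetas, rng):
    """Collapse each qubit to a classical bit; P(bit = 1) = sin(theta)^2."""
    return [1 if rng.random() < math.sin(t) ** 2 else 0 for t in thetas]


def rotate_toward(thetas, best_bits, delta=0.05 * math.pi):
    """Move each angle toward pi/2 (bit 1) or 0 (bit 0), following best_bits."""
    out = []
    for t, b in zip(thetas, best_bits):
        t = t + delta if b == 1 else t - delta
        out.append(min(math.pi / 2, max(0.0, t)))   # keep angle in [0, pi/2]
    return out


def quantum_ga(fitness, n_bits, pop_size=10, generations=60, seed=0):
    rng = random.Random(seed)
    # theta = pi/4 is an equal superposition: P(0) = P(1) = 1/2.
    pop = [[math.pi / 4] * n_bits for _ in range(pop_size)]
    best = observe(pop[0], rng)
    for _ in range(generations):
        for i, thetas in enumerate(pop):
            bits = observe(thetas, rng)
            if fitness(bits) > fitness(best):
                best = bits
            pop[i] = rotate_toward(thetas, best)
    return best


solution = quantum_ga(sum, 8)   # toy fitness: number of ones
```

Because each individual is a probability distribution over bit strings and not a single string, a small population can still cover many candidate configurations early in the search.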

Fitness Function
- Power
  - The leakage of a template depends on the number of transistors that are turned off under the given inputs
  - Calculate the gate leakage under each input pattern
- Area
  - A qubit individual has little chance of survival if its area is larger than the area constraint
- Performance
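One way the power and area terms could combine is sketched below: leakage is averaged over input patterns, and an area violation scales the score down sharply. The weighting by pattern probability, the penalty factor, the function names, and every number are hypothetical; the performance term is omitted because the slides do not say how it enters.

```python
def expected_leakage(leakage_by_pattern, pattern_prob):
    """Average the per-pattern leakage, weighted by pattern probability."""
    return sum(pattern_prob[p] * leakage_by_pattern[p]
               for p in leakage_by_pattern)


def fitness(leakage, area, area_limit, penalty=1e-3):
    """Higher is better. Assumed form: inverse leakage with an area penalty."""
    score = 1.0 / leakage
    if area > area_limit:
        score *= penalty   # little chance of survival when over the limit
    return score


# Hypothetical leakage values (nA) for a two-input template.
leak = {"00": 10.0, "01": 6.0, "10": 6.0, "11": 2.0}
prob = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}
avg_leak = expected_leakage(leak, prob)                   # 6.0
fit_ok = fitness(avg_leak, area=90, area_limit=100)
fit_bad = fitness(avg_leak, area=120, area_limit=100)     # heavily penalized
```

A soft penalty keeps over-area individuals in the population with a small survival chance, matching the slide's wording, whereas a hard rejection would remove them outright.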

Control Parameters
- Population size
  - With a small population, the genetic diversity may not increase for many generations
  - A large population may increase the computing time but take fewer generations to find the best solutions
  - Small populations of size 10 to 15 perform very well
- Termination condition
  - The power reduction is less than 0.0005% during the last 200 generations

Performance Estimation Results

Power Optimization Results

Chart annotations: Max-Perf, leakage power, 79%, 40%

Power Optimization Results

Chart annotations: S641, 25%, 5%

Different Technique Comparisons

- By Timed Petri-Net simulation
- 2.5% performance penalty

Comparison to worst-case optimized circuits

- Reed-Solomon decoder
- 2 power-driven syntheses (worst-case opt., ave-case opt.)
- 1000 stimuli
- 7X

Conclusion
- An efficient design framework for reducing total power consumption while maintaining the high performance of circuits
- Uses a Probabilistic Timed Petri-Net model to capture the dynamic behavior of the system
- The proposed method for assigning threshold voltage, supply voltage, and template size is based on a quantum genetic algorithm
- 5X ~ 7X savings in power consumption with a 2.5% performance penalty

Comments
- Not scalable?
- Have to specify the static range of the inputs of the circuits
- The connection between synthesis and parameter assignment is not strong
- Experimental results are questionable
- Many typos