A Decomposition Algorithm to Structure Arithmetic Circuits Ajay K. Verma, Philip Brisk, Paolo Ienne Ecole Polytechnique Fédérale de Lausanne (EPFL) International Workshop on Logic and Synthesis August 1, 2009


Page 1: A Decomposition Algorithm to Structure Arithmetic Circuits

A Decomposition Algorithm to Structure Arithmetic Circuits

Ajay K. Verma, Philip Brisk, Paolo Ienne

Ecole Polytechnique Fédérale de Lausanne (EPFL)

International Workshop on Logic and Synthesis

August 1, 2009

Page 2: A Decomposition Algorithm to Structure Arithmetic Circuits

Logic Optimization Strategies

[Figure: Ripple-Carry Adder vs. Carry-Lookahead Adder]

• Logic synthesis tools
  – Local optimization via Boolean minimization

• Architectural transformation
  – Not with “traditional” logic synthesis
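To make the contrast concrete, here is a minimal behavioral sketch of the two adder architectures named on this slide. The function names are illustrative, not taken from the talk. Both compute the same sum; the ripple-carry version chains each carry through the previous one, while the lookahead version writes every carry directly in terms of generate and propagate signals.

```python
def ripple_carry_add(a, b, n):
    """Add two n-bit integers; each carry waits for the previous carry."""
    carry, total = 0, 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total | (carry << n)


def carry_lookahead_add(a, b, n):
    """Same function; every carry is a flat expression of generate/propagate bits."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]  # generate
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]  # propagate
    carries = [0]  # carry-in is assumed to be 0
    for i in range(n):
        # c[i+1] = g[i] | p[i]g[i-1] | p[i]p[i-1]g[i-2] | ...
        c = 0
        for j in range(i, -1, -1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            c |= term
        carries.append(c)
    total = 0
    for i in range(n):
        total |= (p[i] ^ carries[i]) << i
    return total | (carries[n] << n)
```

The two functions are logically equivalent but structurally different, which is the kind of change the slide attributes to architectural transformation rather than local minimization.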

Page 3: A Decomposition Algorithm to Structure Arithmetic Circuits

Naïve Leading Zero Detector

xi is TRUE if the (i+1)th most-significant bit is the leading non-zero bit:

$x_i = \overline{a_{15}}\;\overline{a_{14}} \cdots \overline{a_{15-(i-2)}}\;\overline{a_{15-(i-1)}}\;a_{15-i}$

Convert xi to a binary number
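The definition of xi can be sketched behaviorally as follows. This is a software model, not the gate-level circuit from the slide; the function names and the choice to return n for an all-zero input are assumptions made for illustration.

```python
def leading_one_flags(a, n=16):
    """x[i] is 1 iff the (i+1)th most-significant bit is the leading non-zero bit."""
    bits = [(a >> (n - 1 - k)) & 1 for k in range(n)]  # bits[0] is the MSB
    flags = []
    all_zero_so_far = 1
    for i in range(n):
        # x_i = NOT(a_{n-1}) ... NOT(a_{n-i}) AND a_{n-1-i}
        flags.append(all_zero_so_far & bits[i])
        all_zero_so_far &= 1 - bits[i]
    return flags


def leading_zero_count(a, n=16):
    """Encode the one-hot flags as a binary number (n if the input is all zeros)."""
    flags = leading_one_flags(a, n)
    count = 0
    for i, x in enumerate(flags):
        if x:
            count |= i  # at most one flag is set, so OR-ing indices is an encoder
    return count if any(flags) else n
```

For example, `leading_zero_count(0x0100)` returns 7, since bits 15 down to 9 are zero and bit 8 is the leading one.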

Page 4: A Decomposition Algorithm to Structure Arithmetic Circuits

Optimized LZD [Oklobdzija 1994]


Page 5: A Decomposition Algorithm to Structure Arithmetic Circuits

Comparison

Naïve LZD: 0.36 ns (427 μm²)

Optimized LZD: 0.30 ns (392 μm²)

16% faster, 8% smaller

Page 6: A Decomposition Algorithm to Structure Arithmetic Circuits

Outline

• Algorithmic Overview

• Progressive Decomposition Algorithm…
  – … and its shortcomings
  – [Verma et al., DAC 2007]

• New Algorithm

• Experimental Results

• Conclusion

Page 7: A Decomposition Algorithm to Structure Arithmetic Circuits

Input Condensation

• Leader Expressions
  – Sufficient to evaluate expression
  – Once evaluated, you can discard input bits
  – Works for circuits with “effective online algorithms”

[Figure: one big circuit (IN to OUT) is rewritten as a block that computes leader expressions L from IN, with |L| < |IN|, feeding a smaller circuit that produces OUT. Leader expressions are then computed recursively again on the smaller circuit.]

Page 8: A Decomposition Algorithm to Structure Arithmetic Circuits

8:4 Parallel Counter

[Figure: 8:4 parallel counter; the signals labelled s and c are leader expressions]
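As a rough illustration of input condensation on the parallel counter, the sketch below takes three of the eight inputs as the chosen subset and uses their full-adder sum and carry as leader expressions. The names `leaders` and `smaller_circuit`, and the choice of the first three inputs, are assumptions for illustration. The check confirms that the count can be recovered from the two leader bits and the five remaining inputs, so two signals replace three.

```python
from itertools import product


def popcount8(bits):
    """Reference 8:4 parallel counter: number of ones among 8 input bits."""
    return sum(bits)


def leaders(b0, b1, b2):
    """Hypothetical leader expressions for the first three inputs: sum and carry."""
    s = b0 ^ b1 ^ b2
    c = (b0 & b1) | (b0 & b2) | (b1 & b2)
    return s, c


def smaller_circuit(s, c, rest):
    """Computes the same count from 2 leader bits and the 5 remaining inputs."""
    return s + 2 * c + sum(rest)


def condensation_holds():
    """True if the rewritten circuit matches the original on all 256 inputs."""
    return all(
        popcount8(bits) == smaller_circuit(*leaders(*bits[:3]), bits[3:])
        for bits in product((0, 1), repeat=8)
    )
```

Once s and c are available, the three original input bits are no longer needed, which is the "discard input bits" property described for leader expressions.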

Page 9: A Decomposition Algorithm to Structure Arithmetic Circuits

Hierarchical Circuit Construction

Use leader expressions as building blocks to impose hierarchy


Page 10: A Decomposition Algorithm to Structure Arithmetic Circuits

Progressive Decomposition

• Choose a subset of input bits
  – How many bits?
  – Many different combinations?

• Find leader expressions
  – Optimize via Boolean ring properties
  – Find identities

• Discard dependent expressions
  – e.g., among x, y, z, discard z if z = f(x, y)

• Rewrite circuit in terms of leader expressions

• Recursively process the remaining circuit
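The "discard dependent expressions" step can be sketched with an exhaustive truth-table check. This is not the Reed-Muller machinery of the original algorithm; it is a brute-force stand-in that only works for small input counts, and the helper names are hypothetical.

```python
from itertools import product


def is_dependent(target, bases, num_inputs):
    """True if target's value is fully determined by the values of the base expressions."""
    seen = {}
    for bits in product((0, 1), repeat=num_inputs):
        key = tuple(f(*bits) for f in bases)
        value = target(*bits)
        # Two inputs with the same base values but different targets break dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


def discard_dependent(exprs, num_inputs):
    """Greedily keep only expressions that are not functions of those already kept."""
    kept = []
    for e in exprs:
        if not is_dependent(e, kept, num_inputs):
            kept.append(e)
    return kept


# Example: z = a OR b equals x OR y when x = a XOR b and y = a AND b.
x = lambda a, b: a ^ b
y = lambda a, b: a & b
z = lambda a, b: a | b
```

With these definitions, `discard_dependent([x, y, z], 2)` keeps x and y and drops z. The greedy pass depends on the order of the list, which is a simplification.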

Page 11: A Decomposition Algorithm to Structure Arithmetic Circuits

Progressive Decomposition: Shortcomings and Concerns

• [Verma et al., DAC 2007]

• Entire algorithm based on Reed-Muller Form
  – Rewrite ‘your’ optimizer, e.g., if you use AIGs or BDDs.
  – Exponential blowup for leading one detector

• Cannot optimize multipliers

• Cannot optimize “structurable circuits” surrounded by peripheral logic
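The exponential blowup can be reproduced on a small scale. The sketch below counts monomials in the Reed-Muller (XOR-of-ANDs) form of each leading-one flag using the standard binary Möbius transform over a truth table. The function names are illustrative. Because each complemented input contributes a factor (1 ⊕ a), a flag with i complemented bits expands to 2^i monomials.

```python
def anf_term_count(truth_table):
    """Number of monomials in the XOR-of-ANDs form, via the binary Moebius transform."""
    coeffs = list(truth_table)
    n = len(coeffs).bit_length() - 1  # table length is 2**n
    for i in range(n):
        step = 1 << i
        for m in range(len(coeffs)):
            if m & step:
                coeffs[m] ^= coeffs[m ^ step]
    return sum(coeffs)


def leading_one_flag_table(i, n):
    """Truth table of x_i: the top i bits are 0 and the next bit is 1 (n-bit input)."""
    table = []
    for a in range(1 << n):
        top = a >> (n - i)            # the i most-significant bits
        nxt = (a >> (n - 1 - i)) & 1  # the bit right below them
        table.append(1 if top == 0 and nxt == 1 else 0)
    return table


def term_counts(n):
    """Monomial counts of x_0 .. x_{n-1} for an n-bit input."""
    return [anf_term_count(leading_one_flag_table(i, n)) for i in range(n)]
```

For an 8-bit input the counts double at every position, so the last flag alone needs 128 monomials, even though it is a single AND of literals in a representation that allows complemented inputs.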

Page 12: A Decomposition Algorithm to Structure Arithmetic Circuits

Compound Circuits

[Figure: two versions of a compound circuit from g72x. Labels in the diagrams: M1, M2, E1, E2, sign, neg, not, and, xor, s1, s2, out; numeric annotations 48, 19, 19, 4, 1.]

g72x

0.82 ns (7998 μm²): 12% faster, 55% larger

0.94 ns (5142 μm²)

Page 13: A Decomposition Algorithm to Structure Arithmetic Circuits

Support Sets

• Progressive Decomposition
  – Support sets are subsets of one another or disjoint
  – Blocks must always reduce the number of inputs

Page 14: A Decomposition Algorithm to Structure Arithmetic Circuits

Support Sets

• New Approach
  – Support sets may overlap
  – Relaxes input condensation constraint
  – Both conditions are necessary to support multipliers
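The difference between the two support-set regimes can be stated as a small predicate. This is only a sketch of the "subsets of one another or disjoint" condition; the function name and the example sets (including the multiplier-like one) are hypothetical and not taken from the talk.

```python
from itertools import combinations


def is_nested_or_disjoint(support_sets):
    """True if every pair of support sets is nested or disjoint."""
    return all(
        a <= b or b <= a or not (a & b)
        for a, b in combinations(map(frozenset, support_sets), 2)
    )


# Adder-like blocks: each block's support is disjoint from, or contained in, another's.
adder_like = [{"a0", "b0"}, {"a1", "b1"}, {"a0", "b0", "a1", "b1"}]

# Multiplier-like partial products: a0 and b0 each feed several blocks.
multiplier_like = [{"a0", "b1"}, {"a1", "b0"}, {"a0", "b0"}]
```

The first family satisfies the Progressive Decomposition condition, while the second only fits the relaxed condition in which support sets may overlap.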

Page 15: A Decomposition Algorithm to Structure Arithmetic Circuits

New Algorithm: Overview

• Supports any representation with minimization algorithms
  – We use BDDs

• Use SAT to check functional dependency
  – [Lee et al., ICCAD 2007]

• Restrict Operator computes generalized cofactors
  – [Coudert and Madre, ICCAD 1990]

• Entropy-based delay estimator
  – [Macii et al., GLS-VLSI 1999]
  – Imprecise, but effectively computes relative delays

Page 16: A Decomposition Algorithm to Structure Arithmetic Circuits

Input Bit Selection via Random Sampling

Example: fixing input bits of a 6-bit adder (a5 … a0 + b5 … b0)

  a5 a4 a3 a2 0 1 + b5 b4 b3 b2 1 1: complexity of a 4-bit adder
  a5 1 a3 a2 a1 0 + b5 b4 1 1 b1 b0: complexity of a 6-bit adder

• Select every combination of k input bits for k < 6
• Randomly assign values to the bits
• Estimate the complexity of the resulting circuit
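The three steps above can be sketched as follows. The complexity metric here is a stand-in: it counts monomials in the XOR-of-ANDs form of the residual function, which is not the entropy-based estimator used in the talk. The function names, the sample count, and the decision to sort from lowest to highest mean score are all assumptions for illustration.

```python
import random
from itertools import combinations


def anf_terms(f, free_vars, fixed):
    """Stand-in complexity metric: monomial count of the residual XOR-of-ANDs form."""
    n = len(free_vars)
    coeffs = []
    for m in range(1 << n):
        assignment = dict(fixed)
        for j, v in enumerate(free_vars):
            assignment[v] = (m >> j) & 1
        coeffs.append(f(assignment))
    for j in range(n):  # binary Moebius transform
        for m in range(1 << n):
            if m & (1 << j):
                coeffs[m] ^= coeffs[m ^ (1 << j)]
    return sum(coeffs)


def rank_bit_groups(f, variables, k, samples=8, seed=0):
    """Score every k-subset of inputs by the mean residual complexity over random assignments."""
    rng = random.Random(seed)
    ranking = []
    for group in combinations(variables, k):
        free = [v for v in variables if v not in group]
        scores = []
        for _ in range(samples):
            fixed = {v: rng.randint(0, 1) for v in group}
            scores.append(anf_terms(f, free, fixed))
        ranking.append((sum(scores) / samples, group))
    ranking.sort()
    return ranking


def carry_out_3bit(x):
    """Carry-out of a 3-bit adder; x maps names such as 'a0' or 'b2' to 0/1."""
    c = 0
    for i in range(3):
        a, b = x[f"a{i}"], x[f"b{i}"]
        c = (a & b) | (c & (a ^ b))
    return c
```

Calling `rank_bit_groups(carry_out_3bit, ["a0", "b0", "a1", "b1", "a2", "b2"], 2)` returns all 15 pairs of input bits with their mean scores, so groups can be compared by how much fixing them simplifies the remaining function.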

Page 17: A Decomposition Algorithm to Structure Arithmetic Circuits

Computing Leader Expressions

E – input expression
B – chosen input bits
S – leader expressions found thus far
R – remaining bits

Is E functionally dependent on S ∪ R?

1. Randomly sample R’s assignment space

2. Find missing leader expressions using SAT [Lee et al., ICCAD 2007]
   - Satisfying assignments provide missing leader expressions
   - S contains all leader expressions if no satisfying assignments exist

[Figure: E with inputs B and R; S is computed from B]
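The completeness check can be sketched with exhaustive enumeration in place of a SAT solver. The query looks for two assignments to B that agree on every leader expression found so far, share the same assignment to R, and still give different values of E. Treating the cofactor of E at the witness's R assignment as the missing leader expression is an assumption about how "satisfying assignments provide missing leader expressions" works; the function names are hypothetical.

```python
from itertools import product


def find_witness(E, leaders, m, n):
    """Return (b, c, r) with equal leader values but E(b, r) != E(c, r), or None."""
    for r in product((0, 1), repeat=n):
        seen = {}
        for b in product((0, 1), repeat=m):
            key = tuple(e(b) for e in leaders)
            value = E(b, r)
            if key in seen and seen[key][1] != value:
                return seen[key][0], b, r
            seen.setdefault(key, (b, value))
    return None  # E is functionally dependent on the leaders and R


def complete_leaders(E, m, n):
    """Grow the leader set until no witness exists."""
    leaders = []
    while True:
        witness = find_witness(E, leaders, m, n)
        if witness is None:
            return leaders
        _, _, r = witness
        # Assumption: the cofactor of E at this R assignment is the missing expression.
        leaders.append(lambda b, r=r: E(b, r))


def threshold3(b, r):
    """Example E: 1 if at least three of the six input bits are set."""
    return 1 if sum(b) + sum(r) >= 3 else 0
```

With four chosen bits and two remaining bits, `complete_leaders(threshold3, 4, 2)` stops after three leader expressions, one fewer than the number of chosen bits. The loop terminates because each added cofactor rules out its own R assignment as a future witness.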

Page 18: A Decomposition Algorithm to Structure Arithmetic Circuits

Redundant Leader Expressions

• Leader expression is an input bit
  – Non-disjunctive decomposition is required
  – Remove from the set

• Leader expression contains no useful information
  – Remove from the set

Example: E = a0b1 + a1b0 with B = {a0, b0}. Setting a1 = b1 = 1 gives a0 + b0, which cannot help to compute the original expression.

Page 19: A Decomposition Algorithm to Structure Arithmetic Circuits

Redundant Leader Expressions

• Generalized cofactors
  – $E = (a + b)x + (ab + c)y$
  – $g = a + b + c$
  – $E|_{g=0} = 0$
  – $E|_{g=1} = (a + b)x + (ab + c)y = E$

• The general case is a reduction to SAT
  – Problem instances tend to be small
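The cofactor claims for this example can be checked exhaustively. The sketch reads "+" as OR, which is the reading under which g = 0 forces a = b = c = 0 and hence E = 0. The helper name `is_valid_cofactor_pair` is hypothetical.

```python
from itertools import product


def E(a, b, c, x, y):
    return ((a | b) & x) | (((a & b) | c) & y)


def g(a, b, c, x, y):
    return a | b | c


def is_valid_cofactor_pair(F, g, F1, F0, n):
    """Check F == g*F1 + g'*F0 on every input.

    F1 only has to match F where g = 1, and F0 only where g = 0,
    which is why generalized cofactors are not unique.
    """
    return all(
        F(*v) == (F1(*v) if g(*v) else F0(*v))
        for v in product((0, 1), repeat=n)
    )


zero = lambda *v: 0
one = lambda *v: 1
```

Here `is_valid_cofactor_pair(E, g, E, zero, 5)` holds, matching the slide: the cofactor for g = 0 is constant 0 and the cofactor for g = 1 is E itself, so this g does not simplify E.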

Page 20: A Decomposition Algorithm to Structure Arithmetic Circuits

Rewrite Original Expression

• Rewrite as Shannon expansion using the Restrict operator
  – Generalized cofactors are not unique
  – The order in which the cofactors of each leader expression are computed may affect the result

• For each cofactor:
  – Estimate the delay
  – Estimate delay of F based on Shannon expansion

• Select the cofactor that leads to the minimal estimated delay of F

$F = g \cdot F|_{g=1} + g' \cdot F|_{g=0}$

$D(F) \approx \max\{D(F|_{g=1}),\, D(F|_{g=0}),\, D(g)\} + D(\mathrm{mux})$

[Figure: F built as a multiplexer that selects between F|g=0 and F|g=1 under control of g]
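The selection rule can be sketched directly from the delay formula. The delay values below are made-up integers in picoseconds, and the function names and the candidate tuple layout are assumptions; in the talk the individual delays would come from the entropy-based estimator.

```python
def shannon_delay(d_g, d_f1, d_f0, d_mux):
    """Estimated delay of F = g*F1 + g'*F0 built as a multiplexer selected by g."""
    return max(d_f1, d_f0, d_g) + d_mux


def pick_cofactors(candidates, d_g, d_mux):
    """candidates: list of (name, d_f1, d_f0); return the one minimizing the estimate for F."""
    return min(candidates, key=lambda c: shannon_delay(d_g, c[1], c[2], d_mux))


# Hypothetical candidates produced by different cofactor computation orders.
candidates = [("order_A", 500, 300), ("order_B", 250, 260)]
```

With `d_g = 200` and `d_mux = 100`, the estimates are 600 for order_A and 360 for order_B, so `pick_cofactors` returns the order_B entry.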

Page 21: A Decomposition Algorithm to Structure Arithmetic Circuits

Rinse, Repeat

We picked this set of input bits to optimize!

We generated a set of leader expressions.

Local optimization using your favorite LS tool can’t hurt.

The leader expressions are now frozen, and the block that computes them is optimized.

Optimize the remaining circuit.

Page 22: A Decomposition Algorithm to Structure Arithmetic Circuits

Experimental Setup

Four design sources (numbered 1 to 4 on the slide):

• Circuit written by hand
• Known Arithmetic Circuits
• Prog. Decomp.
• New Algorithm

Synthesis:

• Synopsys Design Compiler
  – compile_ultra
  – minimize delay
• Artisan Standard Cells, UMC (90 nm)

Page 23: A Decomposition Algorithm to Structure Arithmetic Circuits

Critical Path Delay

[Figure: bar chart of critical path delay in ns (axis from 0 to 1). Benchmarks as labelled on the x-axis: 16-bit ADD 12-bit 3ADD 8x8-bit MUL 16-bit SHIFT ADPCM g72x SAD. Series: Original, Progressive Decomposition, Our Algorithm, Library/Manual Implementation. Annotations: “Optimized for Area, Not Delay” and “Progressive Decomposition Fails”.]

Page 24: A Decomposition Algorithm to Structure Arithmetic Circuits

Area

[Figure: bar chart of area in μm² (axis from 0 to 14000). Benchmarks as labelled on the x-axis: 16-bit ADD 12-bit 3ADD 8x8-bit MUL 16-bit SHIFT ADPCM g72x SAD. Series: Original, Progressive Decomposition, Our Algorithm, Library/Manual Implementation. Annotations: “Optimized for Area, Not Delay” and “Progressive Decomposition Fails”.]

Page 25: A Decomposition Algorithm to Structure Arithmetic Circuits

Conclusion

• Technique to structure arithmetic circuits
  – Fixes shortcomings of Progressive Decomposition

• Our approach is orthogonal to classical Boolean minimization techniques

• Discovered new implementation of a k-input MAX function
  – Similar structure to LZD Circuit
  – Will appear at ICCAD 2009

Page 26: A Decomposition Algorithm to Structure Arithmetic Circuits

Computing Leader Expressions

[Figure: diagram of E with B, R, and S]

Original Variables

B = {b1, …, bm}

R = {r1, …, rn}

Dummy Variables

C = {c1, …, cm}

S = {s1, …, sn}

Extra Constraints [Lee et al., ICCAD 2007]

ri = si, 1 ≤ i ≤ n

For each leader expression ej: ej(b1, …, bm) = ej(c1, …, cm)

Two different input assignments: E(b1, …, bm, r1, …, rn) and E(c1, …, cm, s1, …, sn)

Page 27: A Decomposition Algorithm to Structure Arithmetic Circuits

Input Bit Selection

[Figure: for B = {x, y, z}, E is replaced by a residual expression E’ for each assignment of the chosen bits: E’ |xyz=000, E’ |xyz=001, …, E’ |xyz=111]

Use delay estimator for each E’.

The complexity of E’ is the metric by which we evaluate each group of input bits.

Compute every combination of k input bits for k < 6.

Assign values to x, y, z using random sampling.