common subexpression elimination involving multiple variables for linear dsp synthesis 15 th ieee...

Common Subexpression Elimination Involving Multiple Variables for Linear DSP Synthesis

15th IEEE International Conference on Application Specific Architectures and Processors (ASAP)

Farzan Fallah
Advanced CAD Research, Fujitsu Labs. of America

Anup Hosangadi, Ryan Kastner
ECE Department, UCSB

Outline

– Introduction
– Arithmetic expressions and polynomial formulation
– Eliminating multiple variable common subexpressions
– Results
– Limitations of proposed technique
– Conclusions


Multiplications by constants are encountered in many application areas
– DSP transforms in audio, video, and image processing (DFT, DCT, IDCT, etc.)
– Filtering operations in communication (FIR, IIR filters)
– Multiple Input Multiple Output (MIMO) systems
– Polynomials in computer graphics


Multiplication is expensive in hardware

Decompose constant multiplications into shifts and additions
– 13*X = (1101)_2*X = X + X<<2 + X<<3

Signed digits can reduce the number of additions/subtractions
– Canonical Signed Digits (CSD) (Knuth '74)
– (57)_10 = (0111001)_2 = (100-1001)_CSD

Further reduction possible by common subexpression elimination
– Up to 50% reduction (R. Hartley, TCS '96)
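The decomposition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `binary_terms`, `csd_digits` and `shift_add` are hypothetical, constants are assumed to be non-negative integers, and the CSD routine is the standard non-adjacent-form recurrence rather than anything taken from the slides.

```python
def binary_terms(c):
    """Shift amounts of the 1 bits of a non-negative constant c."""
    return [i for i in range(c.bit_length()) if (c >> i) & 1]


def csd_digits(c):
    """CSD digits of a non-negative integer, least significant first.

    Every digit is -1, 0 or +1 and no two adjacent digits are non-zero.
    """
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)  # +1 when c % 4 == 1, -1 when c % 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def shift_add(x, digits):
    """Evaluate the constant multiplication using only shifts and adds."""
    return sum(d * (x << i) for i, d in enumerate(digits))


print(binary_terms(13))           # shifts used by 13*X
print(csd_digits(57)[::-1])       # most significant digit first
print(shift_add(3, csd_digits(57)))
```

For 57 the binary form has four non-zero digits while the CSD form has three, which is where the saving in additions/subtractions comes from.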


Common subexpressions = common digit patterns
– F1 = 7*X = (0111)*X = X + X<<1 + X<<2
  F2 = 13*X = (1101)*X = X + X<<2 + X<<3
  (4 +, 4 <<)
– Common pattern “0101” => D1 = X + X<<2
  F1 = D1 + X<<1
  F2 = D1 + X<<3
  (3 +, 3 <<)
– Good for single variable: FIR filters (transposed form)
– Multiple variable? (DFT, DCT, etc.?)


Matrix form of linear systems

[Y1]   [a11 a12 a13]   [X1]
[Y2] = [a21 a22 a23] x [X2]
[Y3]   [a31 a32 a33]   [X3]

Y_i = \sum_j S_{ij} X_j + \sum_k C_{ik} D_k

Y1:  1 0 1 1 0 0
Y2:  0 1 1 1 0 1
Y3:  1 0 0 1 0 1

All distinct S_{ij} X_j and C_{ik} D_k (Potkonjak, TCAD '95)

Arithmetic expressions & polynomial formulation

View linear systems as a set of arithmetic expressions
– Expressions consisting of +, -, << operators
– Develop methodology for extracting common subexpressions

Polynomial formulation
– C×X = \sum_i (±X×L^i)
– (14)_10 × X = (1110)_2 × X
  = X<<3 + X<<2 + X<<1
  = XL^3 + XL^2 + XL
  = (100-10)_CSD × X = XL^4 - XL

Arithmetic expressions and polynomial formulation

[Y1]   [5  7]   [X1]
[Y2] = [4 12] x [X2]

Polynomial formulation (term numbers in parentheses)
– 5 = 0101, 7 = 0111, 4 = 0100, 12 = 1100
– Y1 = (1)X1 + (2)X1L^2 + (3)X2 + (4)X2L + (5)X2L^2
  Y2 = (6)X1L^2 + (7)X2L^2 + (8)X2L^3
– 6 <<, 6 +
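As a rough sketch of the polynomial formulation for the 2x2 example, the snippet below turns each constant into numbered terms and counts the operations. The helper names `to_polynomial` and `count_ops` are hypothetical, and the sketch assumes non-negative constants in plain binary (no CSD digits).

```python
def to_polynomial(matrix):
    """For each row, list (term_id, variable_index, exponent_of_L).

    One term is emitted per 1 bit of each constant; terms are numbered
    in the order they are produced, starting from 1.
    """
    exprs, term_id = [], 0
    for row in matrix:
        terms = []
        for j, c in enumerate(row):
            for shift in range(c.bit_length()):
                if (c >> shift) & 1:
                    term_id += 1
                    terms.append((term_id, j + 1, shift))
        exprs.append(terms)
    return exprs


def count_ops(exprs):
    """Additions and shifts needed when nothing is shared."""
    adds = sum(len(terms) - 1 for terms in exprs)
    shifts = sum(1 for terms in exprs for _, _, s in terms if s > 0)
    return adds, shifts


Y1, Y2 = to_polynomial([[5, 7], [4, 12]])
print(Y1)  # terms (1)..(5)
print(Y2)  # terms (6)..(8)
print(count_ops([Y1, Y2]))
```

Each tuple `(t, v, e)` stands for term `(t)` equal to `X_v L^e`, so `(4, 2, 1)` is `(4)X2L`.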

Digit pattern matching techniques

       X1         X2
Y1   0 1 0 1    0 1 1 1
Y2   0 1 0 0    1 1 0 0

D1 = X2 + X2<<1
Y1 = X1 + X1<<2 + D1 + X2<<2
Y2 = X1<<2 + D1<<2

5 <<, 5 +

Algebraic techniques for factoring and eliminating common subexpressions

Algebraic methods in multi-level logic synthesis (MLLS)
– Reducing literal count in a set of Boolean expressions
– Factoring, decomposition: established algebraic techniques

Can be applied to linear arithmetic expressions as well
– D1 = X1 + X2<<2
  Y1 = D1 + D1<<3 + X1<<3
  Y2 = D1 + X2<<2

Finding candidate common subexpressions (kernels)

Terminology
– Divisor: An expression having at least one term with a zero exponent of L
  eg. X1 + X2L + X3L^2 is a divisor
  X1L + X2L^2 + X3L^2 is not a divisor
– Kernel: Divisor obtained from original expression by division by an exponent of L
– Co-kernel: Exponent of L that is used to obtain the kernel

Example
– P = X1L^3 + X2L^3 + X2L^2 + X3
– Division by L^2: kernel = X1L + X2L + X2; co-kernel = L^2

Kernel generation algorithm

Y1 = (1)X1 + (2)X1L^2 + (3)X2 + (4)X2L + (5)X2L^2
Y2 = (6)X1L^2 + (7)X2L^2 + (8)X2L^3

Recursively divide by the smallest non-zero exponent of L
» Divide Y1 by L: Y1/L = (2)X1L + (4)X2 + (5)X2L
» Divide again by L: Y1/L^2 = (2)X1 + (5)X2
» Divide Y2 by L^2: Y2/L^2 = (6)X1 + (7)X2 + (8)X2L

Kernel generation

All kernels and co-kernels for example linear system (co-kernel in brackets)

((1)X1 + (2)X1L^2 + (3)X2 + (4)X2L + (5)X2L^2)[1]
((2)X1L + (4)X2 + (5)X2L)[L]
((2)X1 + (5)X2)[L^2]
((6)X1L^2 + (7)X2L^2 + (8)X2L^3)[1]
((6)X1 + (7)X2 + (8)X2L)[L^2]
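The recursive division can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `kernels` and the tuple encoding `(term_id, variable, exponent_of_L)` are assumptions made here. The sketch reports only quotients that have at least two terms and at least one term with a zero exponent of L (as in the X1 + X2L + X3L^2 divisor example), so Y2 itself is not reported and the output corresponds to the four rows of the kernel cube matrix.

```python
def kernels(expr):
    """Return [(co_kernel_exponent, kernel_terms)] for one expression.

    expr is a list of (term_id, variable, exponent_of_L) tuples.
    """
    found = []

    def divide(terms, cokernel):
        # Keep the quotient if it has a zero-exponent term and >= 2 terms.
        if len(terms) >= 2 and any(e == 0 for _, _, e in terms):
            found.append((cokernel, terms))
        nonzero = [e for _, _, e in terms if e > 0]
        if not nonzero:
            return
        m = min(nonzero)  # smallest non-zero exponent of L
        # Terms with exponent below m are not divisible and drop out.
        quotient = [(t, v, e - m) for t, v, e in terms if e >= m]
        divide(quotient, cokernel + m)

    divide(list(expr), 0)
    return found


Y1 = [(1, 1, 0), (2, 1, 2), (3, 2, 0), (4, 2, 1), (5, 2, 2)]
Y2 = [(6, 1, 2), (7, 2, 2), (8, 2, 3)]
for cokernel, kernel in kernels(Y1) + kernels(Y2):
    print("L^%d" % cokernel, kernel)
```

A co-kernel exponent of 0 means the co-kernel is 1, that is, the expression itself.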

Importance of Kernels

Theorem: There exists a k-term common subexpression iff there is a k-term “non-overlapping” intersection between at least two kernels

Proof
– If: Non-overlapping k-term intersection => k-term common subexpression
– Only if: If there are 2 instances of a k-term subexpression
  Case 1: “divisor” => Each instance will be a part of some kernel expression
  Case 2: “non-divisor” => Dividing by the smallest non-zero exponent of L will convert it into a “divisor”

Kernel generation

eg. 10*X = (1010)*X = (1)XL + (2)XL^3
    14*X = (1110)*X = (3)XL + (4)XL^2 + (5)XL^3
– common subexpression = XL + XL^3 = (X + XL^2)L
– kernels involved in intersection:
  ((1)X + (2)XL^2)
  ((3)X + (4)XL + (5)XL^2)

Overlapping kernels

Consider (1001001)*X

(1001001)*X = (1)XL^6 + (2)XL^3 + (3)X
– Kernels
  [1] ((1)XL^6 + (2)XL^3 + (3)X)
  [L^3] ((1)XL^3 + (2)X)
– The intersection XL^3 + X uses term (2) in both kernels, so the two instances overlap

Finding kernel intersections

Form Kernel Cube Matrix (KCM)
– One row for each kernel generated
– One column for each distinct kernel cube
– Each non-zero element represents a term

Y1 = X1 + X1L^2 + X2 + X2L + X2L^2
Y2 = X1L^2 + X2L^2 + X2L^3

                  1     2      3     4     5      6
Row  Co-kernel    X1    X1L^2  X2    X2L   X2L^2  X1L
1    1            1(1)  1(2)   1(3)  1(4)  1(5)   0
2    L            0     0      1(4)  1(5)  0      1(2)
3    L^2          1(2)  0      1(5)  0     0      0
4    L^2          1(6)  0      1(7)  1(8)  0      0

Finding kernel intersections

Each rectangle with non-overlapping terms = a common subexpression
– Rectangle: Set of rows and columns such that all elements are ‘1’

Search only for prime rectangles
– Prime rectangle: Rectangle that is not covered by any other rectangle

Prime rectangle may have overlapping terms
– Find a non-overlapping rectangle within the prime rectangle (MIR = Maximum Irredundant Rectangle)

Value of a rectangle (R = # of rows, C = # of columns)
– Value = # of additions/subtractions saved by selecting rectangle
– V(R,C) = (R-1)*(C-1)

Finding kernel intersections

Selecting common subexpressions
– Greedy selection of most valued non-overlapping rectangle in each iteration
– This is very expensive
  Worst case O(2^{MN}) prime rectangles to be considered
  M = # of expressions; N = bit-width
– Heuristic required (ping-pong)
  Start with a seed row/column
  Build rectangle by intersections with other rows/cols
  Complexity = linear in # of rows/columns

Finding kernel intersections

                  1     2      3     4     5      6
Row  Co-kernel    X1    X1L^2  X2    X2L   X2L^2  X1L
1    1            1(1)  1(2)   1(3)  1(4)  1(5)   0
2    L            0     0      1(4)  1(5)  0      1(2)
3    L^2          1(2)  0      1(5)  0     0      0
4    L^2          1(6)  0      1(7)  1(8)  0      0

MIR = terms {(3), (4); (7), (8)} (rows 1 and 4) OR terms {(4), (5); (7), (8)} (rows 2 and 4)

Extracting kernel intersections (1st iteration)

                  1     2      3     4     5      6
Row  Co-kernel    X1    X1L^2  X2    X2L   X2L^2  X1L
1    1            1(1)  1(2)   1(3)  1(4)  1(5)   0
2    L            0     0      1(4)  1(5)  0      1(2)
3    L^2          1(2)  0      1(5)  0     0      0
4    L^2          1(6)  0      1(7)  1(8)  0      0

Select D1 = X1 + X2 + X2L (rows 1 and 4, columns 1, 3, 4), saves 2 additions!
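For a matrix this small, the selection step can be checked by brute force. The sketch below is an illustration only: it enumerates every rectangle instead of using the ping-pong heuristic, the name `best_rectangles` and the dictionary encoding of the KCM are assumptions made here, and rows are indexed from 0. A rectangle is rejected when the same term number appears in it twice (overlapping terms), and its value is V(R,C) = (R-1)*(C-1).

```python
from itertools import combinations

# Kernel cube matrix copied from the slide: row -> {cube: term number}.
KCM = [
    {"X1": 1, "X1L2": 2, "X2": 3, "X2L": 4, "X2L2": 5},  # co-kernel 1
    {"X2": 4, "X2L": 5, "X1L": 2},                       # co-kernel L
    {"X1": 2, "X2": 5},                                  # co-kernel L^2
    {"X1": 6, "X2": 7, "X2L": 8},                        # co-kernel L^2
]


def best_rectangles(kcm):
    """Return (best_value, [(rows, cols), ...]) over all valid rectangles."""
    best, winners = 0, []
    for r in range(2, len(kcm) + 1):
        for rows in combinations(range(len(kcm)), r):
            shared = sorted(set.intersection(*(set(kcm[i]) for i in rows)))
            for c in range(2, len(shared) + 1):
                for cols in combinations(shared, c):
                    terms = [kcm[i][col] for i in rows for col in cols]
                    if len(set(terms)) != len(terms):
                        continue  # a term is used twice: overlapping
                    value = (r - 1) * (c - 1)
                    if value > best:
                        best, winners = value, [(rows, cols)]
                    elif value == best:
                        winners.append((rows, cols))
    return best, winners


value, winners = best_rectangles(KCM)
print(value)
for rows, cols in winners:
    print(rows, cols)
```

Two rectangles tie at value 2 in this sketch: rows 1 and 4 with X1, X2, X2L (the D1 chosen on the slide), and rows 1, 3 and 4 with X1, X2. How the original tool breaks such ties is not stated in the slides.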

Extracting kernel intersections (2nd iteration)

                  1     2      3      4     5     6
Row  Co-kernel    D1    X1L^2  X2L^2  X1    X2    X2L
1    1            1(1)  1(2)   1(3)   0     0     0
2    L^2          0     0      0      1(2)  1(3)  0
3    1            0     0      0      1(5)  1(6)  1(7)

Select D2 = X1 + X2

Final implementation
D2 = X1 + X2
D1 = D2 + X2<<1
Y1 = D1 + D2<<2
Y2 = D1<<2

3 <<, 3 +
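A quick way to sanity-check the final implementation is to compare it with the original matrix product. The sketch below is an assumption-labelled illustration (the function names are hypothetical); it uses Y2 = D1<<2, which is what the co-kernel L^2 of the Y2 row implies and what 4*X1 + 12*X2 requires.

```python
def original(x1, x2):
    """Direct evaluation of the 2x2 constant matrix [[5, 7], [4, 12]]."""
    return 5 * x1 + 7 * x2, 4 * x1 + 12 * x2


def optimized(x1, x2):
    """Shift-and-add form after extracting D1 and D2."""
    d2 = x1 + x2          # 1 addition
    d1 = d2 + (x2 << 1)   # 1 addition, 1 shift
    y1 = d1 + (d2 << 2)   # 1 addition, 1 shift
    y2 = d1 << 2          # 1 shift
    return y1, y2


# Exhaustive check over a small range of inputs.
ok = all(
    original(a, b) == optimized(a, b)
    for a in range(-8, 9)
    for b in range(-8, 9)
)
print(ok)
```

The operation count in the comments adds up to 3 additions and 3 shifts, compared with 6 and 6 for the unoptimized polynomial form.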

Experimental Setup

Goal
– Reduction in # of additions/subtractions
– Effect on area/latency on synthesis

Transforms: DCT, IDCT, DFT, DST, DHT
8x8 constant matrices
16 digits precision (CSD representation)

Compare with
– Potkonjak (TCAD '95)
– RESANDS (Nguyen et al., TVLSI 2000)

Experimental results

          # of additions/subtractions                     % improvement over
Example   Original  RESANDS  Potkonjak  Our technique
          (I)       (II)     (III)      (IV)              (I)    (II)   (III)
DCT       274       202      227        174               36.5   13.1   23.3
IDCT      242       183      222        162               33.0   11.5   27.0
R-DFT     253       193      208        165               34.8   14.5   20.7
I-DFT     207       178      198        134               35.3   24.7   32.3
DST       320       238      252        200               37.5   16.0   20.6
DHT       284       209      211        175               38.4   16.3   17.0
Average   263.3     200.5    219.7      168.3             35.9   16.0   23.5

Experimental results

Synthesis results (minimum latency constraints)

          Area (library units)        Latency (clock cycles)
Example   (II)     (III)    (IV)      (II)   (III)  (IV)
DCT       90667    96375    73311     10     11     11
IDCT      81868    99771    66864     10     11     11
R-DFT     90496    84770    69827     10     12     11
I-DFT     75140    84864    55940     10     11     10
DST       108101   106498   84715     11     12     11
DHT       93939    79409    71272     11     11     11
Average   90110    91948    70322     10.3   11.3   10.8

Limitations of this technique

Results dependent on initial representation of constants
– Mixed representation
– Too many: O(3^N) per constant

Factoring of constants
– eg. 105*X = 15*7*X = (16-1)*(8-1)*X
  = (T<<3) - T, where T = (X<<4) - X
– Factoring in general is very hard

Common subexpressions with reversed signs
– eg. (X1 - X2) = -(X2 - X1) cannot be detected
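The factoring limitation can be made concrete with the 105*X example. The sketch below is illustrative only (function names are hypothetical): it compares the CSD form 105 = 128 - 32 + 8 + 1, which needs three additions/subtractions, with the factored form (16-1)*(8-1), which needs two.

```python
def times_105_csd(x):
    """105 = 128 - 32 + 8 + 1 in CSD: three additions/subtractions."""
    return (x << 7) - (x << 5) + (x << 3) + x


def times_105_factored(x):
    """105 = 15 * 7 = (16 - 1) * (8 - 1): two subtractions."""
    t = (x << 4) - x     # 15 * x
    return (t << 3) - t  # 7 * (15 * x)


print(times_105_csd(3), times_105_factored(3), 105 * 3)
```

The factored form is not a sum of shifted copies of X alone, which is why digit-pattern or kernel-based extraction on the initial representation does not find it.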


Contributions
– Novel polynomial transformation
– Adapting rectangle covering methods
– Single-variable and multi-variable subexpressions eliminated together => better results

Future work
– Addressing shortcomings of current method
– Optimization for timing, power


Thank you!!

Questions??
