Common Subexpression Elimination Involving Multiple Variables for Linear DSP Synthesis
15th IEEE International Conference on Application Specific Architectures and Processors (ASAP)
Farzan Fallah
Advanced CAD Research
Fujitsu Labs. of America
Anup Hosangadi
Ryan Kastner
ECE Department, UCSB
Outline
Introduction
Arithmetic expressions and polynomial formulation
Eliminating multiple-variable common subexpressions
Results
Limitations of the proposed technique
Conclusions
Introduction
Multiplications by constants are encountered in many application areas
– DSP transforms in audio, video and image processing (DFT, DCT, IDCT, etc.)
– Filtering operations in communications (FIR, IIR filters)
– Multiple Input Multiple Output (MIMO) systems
– Polynomials in computer graphics
Introduction
Multiplication is expensive in hardware
Decompose constant multiplications into shifts and additions
– 13*X = (1101)_2*X = X + X<<2 + X<<3
Signed digits can reduce the number of additions/subtractions
– Canonical Signed Digits (CSD) (Knuth’74)
– (57)_10 = (0111001)_2 = (100-1001)_CSD: four non-zero digits become three
Further reduction is possible by common subexpression elimination
– Up to 50% reduction (R. Hartley, TCS’96)
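A minimal sketch of CSD recoding, assuming the standard non-adjacent-form algorithm (the slides cite Knuth but do not give a procedure); the function names are illustrative. It reproduces the 57 example and counts additions/subtractions as non-zero digits minus one.

```python
def csd_digits(n: int) -> list:
    """Canonical signed-digit (non-adjacent form) digits of n, most significant first.

    Each digit is -1, 0 or 1, and no two adjacent digits are both non-zero.
    """
    digits = []  # collected least significant digit first
    while n != 0:
        if n % 2:            # odd: choose +1 or -1 so the remainder is divisible by 4
            d = 2 - (n % 4)  # n % 4 == 1 -> +1, n % 4 == 3 -> -1
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits[::-1] or [0]


def csd_string(n: int) -> str:
    """Format digits the way the slides do, e.g. 57 -> '100-1001'."""
    return "".join("-1" if d < 0 else str(d) for d in csd_digits(n))


def adders_needed(digits) -> int:
    """Additions/subtractions of a shift-and-add multiplier: non-zero digits - 1."""
    return max(sum(1 for d in digits if d) - 1, 0)


binary_57 = [int(b) for b in format(57, "b")]  # [1, 1, 1, 0, 0, 1]
print(csd_string(57))                          # 100-1001
print(adders_needed(binary_57), adders_needed(csd_digits(57)))  # 3 2
```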
Introduction
Common subexpressions = common digit patterns
– F1 = 7*X = (0111)*X = X + X<<1 + X<<2
  F2 = 13*X = (1101)*X = X + X<<2 + X<<3
  (4 additions, 4 shifts)
– Common pattern "0101" => D1 = X + X<<2
  F1 = D1 + X<<1
  F2 = D1 + X<<3
  (3 additions, 3 shifts)
– Good for a single variable: FIR filters (transposed form)
– Multiple variables? (DFT, DCT, etc.)
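As an illustration of common digit patterns for a single variable, the hypothetical helper below lists every two-digit pattern X + X<<d in the unsigned binary form of each constant. It is a simplified sketch: it ignores signed digits and does not resolve overlapping occurrences (7 contains X + X<<1 twice, sharing bit 1).

```python
from collections import defaultdict


def two_digit_patterns(constants):
    """Map each distance d to the (constant, shift) pairs where X + X<<d occurs.

    A pattern with distance d at shift s means bits s and s+d are both 1,
    i.e. the constant contains (X + X<<d) << s.
    """
    patterns = defaultdict(list)
    for c in constants:
        ones = [i for i in range(c.bit_length()) if (c >> i) & 1]
        for a, i in enumerate(ones):
            for j in ones[a + 1:]:
                patterns[j - i].append((c, i))
    return dict(patterns)


found = two_digit_patterns([7, 13])
print(found[2])  # [(7, 0), (13, 0)]: D1 = X + X<<2 is shared by F1 and F2
```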
Introduction
Matrix form of linear systems
$$\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \times \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix}$$
$$Y_i = \sum_j S_{ij} X_j + \sum_k C_{ik} D_k$$
Columns: all distinct S_ij X_j and C_ik D_k
Y1: 1 0 1 1 0 0
Y2: 0 1 1 1 0 1
Y3: 1 0 0 1 0 1
Potkonjak TCAD’95
Arithmetic expressions & polynomial formulation
View linear systems as a set of arithmetic expressions
– Expressions consisting of +, -, << operators
– Develop a methodology for extracting common subexpressions
Polynomial formulation
$C \times X = \sum_i (\pm X \times L^i)$, where L^i denotes a left shift by i bits (written XLi below, e.g. XL3 = X<<3)
(14)_10×X = (1110)_2×X
= X<<3 + X<<2 + X<<1
= XL3 + XL2 + XL1
= (100-10)_CSD × X = XL4 - XL
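The polynomial formulation can be sketched as a list of signed terms, one per non-zero digit. The digit-string parser and the (sign, variable, exponent) tuple layout are assumptions made for illustration.

```python
from typing import List, Tuple

Term = Tuple[int, str, int]  # (sign, variable, exponent of L): (-1, "X", 1) is -XL


def parse_digits(s: str) -> List[int]:
    """Parse an MSB-first digit string such as '1110' or '100-10' into digits in {-1, 0, 1}."""
    digits, i = [], 0
    while i < len(s):
        if s[i] == "-":  # '-1' is a single negative digit
            digits.append(-int(s[i + 1]))
            i += 2
        else:
            digits.append(int(s[i]))
            i += 1
    return digits


def to_polynomial(var: str, digit_string: str) -> List[Term]:
    """C*X = sum(+/- X * L^i): one term per non-zero digit, lowest exponent first."""
    digits = parse_digits(digit_string)
    n = len(digits)
    return [(d, var, n - 1 - k) for k, d in reversed(list(enumerate(digits))) if d != 0]


def evaluate(terms: List[Term], x: int) -> int:
    """Evaluate the polynomial with L^i interpreted as a left shift by i bits."""
    return sum(sign * (x << exp) for sign, _, exp in terms)


print(to_polynomial("X", "1110"))    # [(1, 'X', 1), (1, 'X', 2), (1, 'X', 3)]
print(to_polynomial("X", "100-10"))  # [(-1, 'X', 1), (1, 'X', 4)]
```

Both representations evaluate to 14*X; the CSD form needs one subtraction instead of two additions.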
Arithmetic expressions and polynomial formulation
$$\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} 5 & 7 \\ 4 & 12 \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$$
Polynomial formulation
5 = 0101
7 = 0111
4 = 0100
12 = 1100
Y1 = (1)X1 + (2)X1L2 + (3)X2 + (4)X2L + (5)X2L2
Y2 = (6)X1L2 + (7)X2L2 + (8)X2L3
(the numbers in parentheses label the terms)
6 <<, 6 +
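Counting operations directly on the term lists reproduces the 6 shifts and 6 additions, under the convention (assumed here) that every term with a non-zero exponent costs one shift and an expression with t terms costs t - 1 additions.

```python
from typing import Dict, List, Tuple

# Terms of the example as (variable, exponent of L), taken from the polynomial form.
system: Dict[str, List[Tuple[str, int]]] = {
    "Y1": [("X1", 0), ("X1", 2), ("X2", 0), ("X2", 1), ("X2", 2)],
    "Y2": [("X1", 2), ("X2", 2), ("X2", 3)],
}


def count_ops(exprs):
    """Additions = terms - 1 per expression; shifts = terms with a non-zero exponent."""
    adds = sum(len(terms) - 1 for terms in exprs.values())
    shifts = sum(1 for terms in exprs.values() for _, exp in terms if exp > 0)
    return adds, shifts


def evaluate(terms, values):
    """Sum of values[var] << exp over all terms."""
    return sum(values[var] << exp for var, exp in terms)


print(count_ops(system))  # (6, 6)
```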
Digit pattern matching techniques
        X1        X2
Y1:  0 1 0 1   0 1 1 1
Y2:  0 1 0 0   1 1 0 0
D1 = X2 + X2<<1
Y1 = X1 + X1<<2 + D1 + X2<<2
Y2 = X1<<2 + D1<<2
5 <<, 5 +
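A quick exhaustive check (an added sketch) that the digit-pattern network computes Y1 = 5X1 + 7X2 and Y2 = 4X1 + 12X2 with 5 additions and 5 shifts:

```python
def digit_pattern_version(x1: int, x2: int):
    """Shift-and-add network from the digit-pattern matching slide."""
    d1 = x2 + (x2 << 1)                   # shared pattern "11" of X2: 3*X2 (1 +, 1 <<)
    y1 = x1 + (x1 << 2) + d1 + (x2 << 2)  # 5*X1 + 7*X2                  (3 +, 2 <<)
    y2 = (x1 << 2) + (d1 << 2)            # 4*X1 + 12*X2                 (1 +, 2 <<)
    return y1, y2
```

The parentheses around each shift matter in Python, since `+` binds more tightly than `<<`.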
Algebraic techniques for factoring and eliminating common subexpressions
Algebraic methods in multi-level logic synthesis (MLLS)
– Reducing literal count in a set of Boolean expressions
– Factoring, decomposition: established algebraic techniques
Can be applied to linear arithmetic expressions as well
D1 = X1 + X2<<2
Y1 = D1 + D1<<3 + X1<<3
Y2 = D1 + X2<<2
Finding candidate common subexpressions (kernels)
Terminology
– Divisor: an expression having at least one term with a zero exponent of L
   e.g. X1 + X2L + X3L2 is a divisor
   X1L + X2L2 + X3L2 is not a divisor
– Kernel: a divisor obtained from the original expression by division by a power of L
– Co-kernel: the power of L used to obtain the kernel
Example
– P = X1L3 + X2L3 + X2L2 + X3
– Division by L2: kernel = X1L + X2L + X2; co-kernel = L2
Kernel generation algorithm
Y1 = (1)X1 + (2)X1L2 + (3)X2 + (4)X2L + (5)X2L2
Y2 = (6)X1L2 + (7)X2L2 + (8)X2L3
» Divide Y1 by L: (2)X1L + (4)X2 + (5)X2L
» Divide again by L: (2)X1 + (5)X2
» Divide Y2 by L2: (6)X1 + (7)X2 + (8)X2L
Recursively divide by the smallest non-zero exponent of L
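A minimal sketch of the recursive division, assuming terms are stored as (term number, variable, exponent of L) tuples, that a divisor is an expression with at least one L^0 term (as in the divisor examples), and that a kernel must have at least two terms. Under those assumptions Y2, which has no L^0 term, is not itself a divisor, so the sketch yields only its kernel with co-kernel L2.

```python
from typing import List, Tuple

Term = Tuple[int, str, int]  # (term number, variable, exponent of L): (2, "X1", 2) is (2)X1L2


def is_divisor(expr: List[Term]) -> bool:
    """A divisor has at least one term with a zero exponent of L."""
    return any(exp == 0 for _, _, exp in expr)


def divide(expr: List[Term], k: int) -> List[Term]:
    """Divide by L^k, keeping only the terms that L^k divides."""
    return [(tid, var, exp - k) for tid, var, exp in expr if exp >= k]


def kernels(expr: List[Term]) -> List[Tuple[List[Term], int]]:
    """(kernel, co-kernel exponent) pairs obtained by repeatedly dividing by the
    smallest non-zero exponent of L; a kernel must have at least two terms."""
    found = []
    if is_divisor(expr) and len(expr) >= 2:
        found.append((expr, 0))  # the expression itself, co-kernel 1 = L^0
    current, cokernel = expr, 0
    while True:
        nonzero = [exp for _, _, exp in current if exp > 0]
        if not nonzero:
            break
        k = min(nonzero)
        current, cokernel = divide(current, k), cokernel + k
        if len(current) < 2:
            break  # further division can only remove terms
        found.append((current, cokernel))
    return found


Y1 = [(1, "X1", 0), (2, "X1", 2), (3, "X2", 0), (4, "X2", 1), (5, "X2", 2)]
Y2 = [(6, "X1", 2), (7, "X2", 2), (8, "X2", 3)]
for name, expr in (("Y1", Y1), ("Y2", Y2)):
    for kernel, ck in kernels(expr):
        print(name, f"[L^{ck}]", kernel)
```

Dividing by the smallest remaining exponent visits every distinct exponent in turn, so for a single expression no division that yields a divisor is skipped.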
Kernel generation
All kernels and co-kernels for the example linear system (co-kernel in brackets)
((1)X1 + (2)X1L2 + (3)X2 + (4)X2L + (5)X2L2)[1]
((2)X1L + (4)X2 + (5)X2L)[L]
((2)X1 + (5)X2)[L2]
((6)X1L2 + (7)X2L2 + (8)X2L3)[1]
((6)X1 + (7)X2 + (8)X2L)[L2]
Importance of kernels
Theorem: There exists a k-term common subexpression iff there is a k-term "non-overlapping" intersection between at least two kernels
Proof
– If: a non-overlapping k-term intersection => a k-term common subexpression
– Only if: suppose there are 2 instances of a k-term subexpression
   Case 1: "divisor" => each instance will be part of some kernel expression
   Case 2: "non-divisor" => dividing by the smallest non-zero exponent of L converts it into a "divisor"
Kernel generation
e.g. 10*X = (1010)*X = (1)XL + (2)XL3
     14*X = (1110)*X = (3)XL + (4)XL2 + (5)XL3
– Common subexpression = XL + XL3 = (X + XL2)L
– Kernels involved in the intersection (both with co-kernel L):
   ((1)X + (2)XL2)
   ((3)X + (4)XL + (5)XL2)
Overlapping kernels
Consider (1001001)*X
(1001001)*X = (1)XL6 + (2)XL3 + (3)X
– Kernels: [1] ((1)XL6 + (2)XL3 + (3)X)
           [L3] ((1)XL3 + (2)X)
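The two intersections above can be checked with a small sketch that represents a kernel as a map from cube (variable, exponent of L) to term number; this representation is an assumption. An intersection is non-overlapping when no term number is used by both instances.

```python
from typing import Dict, Tuple

Cube = Tuple[str, int]    # (variable, exponent of L) of a kernel term
Kernel = Dict[Cube, int]  # cube -> term number in the original expression


def intersect(k1: Kernel, k2: Kernel):
    """Common cubes of two kernels and whether the intersection is non-overlapping,
    i.e. no original term is used by both instances of the common subexpression."""
    common = sorted(set(k1) & set(k2))
    terms1 = {k1[c] for c in common}
    terms2 = {k2[c] for c in common}
    return common, terms1.isdisjoint(terms2)


# 10*X and 14*X, both kernels with co-kernel L.
k10 = {("X", 0): 1, ("X", 2): 2}               # (1)X + (2)XL2
k14 = {("X", 0): 3, ("X", 1): 4, ("X", 2): 5}  # (3)X + (4)XL + (5)XL2
print(intersect(k10, k14))   # ([('X', 0), ('X', 2)], True)

# (1001001)*X: kernel [1] and kernel [L3] of the same expression.
k_1 = {("X", 6): 1, ("X", 3): 2, ("X", 0): 3}  # (1)XL6 + (2)XL3 + (3)X
k_L3 = {("X", 3): 1, ("X", 0): 2}              # (1)XL3 + (2)X
print(intersect(k_1, k_L3))  # ([('X', 0), ('X', 3)], False): term (2) would be used twice
```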
Finding kernel intersections
Form the Kernel Cube Matrix (KCM)
– One row for each kernel generated
– One column for each distinct kernel cube
– Each non-zero element represents a term (term number in parentheses)
Y1 = X1 + X1L2 + X2 + X2L + X2L2
Y2 = X1L2 + X2L2 + X2L3
Row  Co-kernel  1:X1  2:X1L2  3:X2  4:X2L  5:X2L2  6:X1L
1    1          1(1)  1(2)    1(3)  1(4)   1(5)    0
2    L          0     0       1(4)  1(5)   0       1(2)
3    L2         1(2)  0       1(5)  0      0       0
4    L2         1(6)  0       1(7)  1(8)   0       0
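A sketch of KCM construction from the four kernels of the example, with co-kernels stored as exponents of L; the names and tuple layout are illustrative. Columns are numbered in order of first appearance, which matches the column order of the KCM table.

```python
from typing import List, Tuple

Term = Tuple[int, str, int]  # (term number, variable, exponent of L)

# (co-kernel exponent, kernel) for the four kernels of the Y1/Y2 example.
kernels: List[Tuple[int, List[Term]]] = [
    (0, [(1, "X1", 0), (2, "X1", 2), (3, "X2", 0), (4, "X2", 1), (5, "X2", 2)]),
    (1, [(2, "X1", 1), (4, "X2", 0), (5, "X2", 1)]),
    (2, [(2, "X1", 0), (5, "X2", 0)]),
    (2, [(6, "X1", 0), (7, "X2", 0), (8, "X2", 1)]),
]


def build_kcm(kernels):
    """One row per kernel, one column per distinct kernel cube (variable, exponent).

    Each cell holds the number of the original term it represents, or 0 if empty.
    """
    columns: List[Tuple[str, int]] = []
    for _, kernel in kernels:
        for _, var, exp in kernel:
            if (var, exp) not in columns:
                columns.append((var, exp))
    matrix = []
    for _, kernel in kernels:
        row = [0] * len(columns)
        for tid, var, exp in kernel:
            row[columns.index((var, exp))] = tid
        matrix.append(row)
    return columns, matrix


columns, matrix = build_kcm(kernels)
print(columns)  # [('X1', 0), ('X1', 2), ('X2', 0), ('X2', 1), ('X2', 2), ('X1', 1)]
for (ck, _), row in zip(kernels, matrix):
    print(f"L^{ck}", row)
```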
Finding kernel intersections
Each rectangle with non-overlapping terms = a common subexpression
– Rectangle: a set of rows and columns such that all elements are ‘1’ (non-zero)
Search only for prime rectangles
– Prime rectangle: a rectangle that is not covered by any other rectangle
A prime rectangle may have overlapping terms
– Find a non-overlapping rectangle within the prime rectangle (MIR = Maximum Irredundant Rectangle)
Value of a rectangle (R = #rows, C = #columns)
– Value = # of additions/subtractions saved by selecting the rectangle
– V(R,C) = (R-1)*(C-1): the C-term subexpression costs C-1 additions once instead of once in each of the R rows
Finding kernel intersections
Selecting common subexpressions
– Greedy selection of the most valuable non-overlapping rectangle in each iteration
– This is very expensive: in the worst case O(2^(MN)) prime rectangles must be considered (M = # of expressions; N = bit-width)
– Heuristic required (ping-pong)
   Start with a seed row/column
   Build the rectangle by intersections with other rows/columns
   Complexity = linear in #rows/columns
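The sketch below evaluates rectangles on the KCM of the example and grows rectangles from seed rows. It is a simplified stand-in for the ping-pong heuristic, not the authors' implementation: it only adds rows (never alternating to columns), accepts a row only if the result stays non-overlapping and its value V(R,C) increases, and, as an assumption, breaks ties in value by preferring more columns.

```python
from typing import Dict, List, Set, Tuple

Cube = Tuple[str, int]  # column label: (variable, exponent of L)

# KCM of the Y1/Y2 example: row number -> {cube: term number}; absent cubes are 0.
KCM: Dict[int, Dict[Cube, int]] = {
    1: {("X1", 0): 1, ("X1", 2): 2, ("X2", 0): 3, ("X2", 1): 4, ("X2", 2): 5},
    2: {("X2", 0): 4, ("X2", 1): 5, ("X1", 1): 2},
    3: {("X1", 0): 2, ("X2", 0): 5},
    4: {("X1", 0): 6, ("X2", 0): 7, ("X2", 1): 8},
}


def value(rows: List[int], cols: Set[Cube]) -> int:
    """Additions saved by extracting a rectangle: (R - 1) * (C - 1)."""
    return (len(rows) - 1) * (len(cols) - 1)


def non_overlapping(rows: List[int], cols: Set[Cube]) -> bool:
    """True if no original term is covered twice by the rectangle."""
    terms = [KCM[r][c] for r in rows for c in cols]
    return len(terms) == len(set(terms))


def grow(seed: int) -> Tuple[List[int], Set[Cube]]:
    """Greedily add rows to a seed row while the non-overlapping value improves."""
    rows, cols = [seed], set(KCM[seed])
    for r in KCM:
        if r == seed:
            continue
        new_rows, new_cols = rows + [r], cols & set(KCM[r])
        if (len(new_cols) >= 2 and non_overlapping(new_rows, new_cols)
                and value(new_rows, new_cols) > value(rows, cols)):
            rows, cols = new_rows, new_cols
    return rows, cols


# Try every row as a seed; on ties in value, prefer more columns.
best = max((grow(s) for s in KCM), key=lambda rc: (value(*rc), len(rc[1])))
print(sorted(best[0]), sorted(best[1]), value(*best))
# [1, 4] [('X1', 0), ('X2', 0), ('X2', 1)] 2  -> D1 = X1 + X2 + X2L
```

The prime rectangle rows {1, 2, 4} × {X2, X2L} is rejected because term (4) appears in both rows 1 and 2; its sub-rectangles with rows {1, 4} or {2, 4} are non-overlapping.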
Finding kernel intersections
Prime rectangle: rows {1, 2, 4} × columns {3: X2, 4: X2L} of the KCM; term (4) is covered twice, so it overlaps
MIR = terms {3, 4, 7, 8} OR {4, 5, 7, 8}
Extracting kernel intersections (1st iteration)
Select D1 = X1 + X2 + X2L (rows 1 and 4, columns X1, X2, X2L; terms 1, 3, 4 and 6, 7, 8): saves 2 additions!
Extracting kernel intersections (2nd iteration)
Row  Co-kernel  1:D1  2:X1L2  3:X2L2  4:X1  5:X2  6:X2L
1    1          1(1)  1(2)    1(3)    0     0     0
2    L2         0     0       0       1(2)  1(3)  0
3    1          0     0       0       1(5)  1(6)  1(7)
Select D2 = X1 + X2
Final implementation
D2 = X1 + X2
D1 = D2 + X2<<1
Y1 = D1 + D2<<2
Y2 = D1<<2
3 <<, 3 +
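A short exhaustive check of the final implementation (an added sketch). Since Y2 = D1·L2 in the polynomial form, Y2 is D1 shifted left by two bits.

```python
def final_implementation(x1: int, x2: int):
    """Shift-and-add network after two rounds of kernel-intersection extraction."""
    d2 = x1 + x2         # D2 = X1 + X2                (1 +)
    d1 = d2 + (x2 << 1)  # D1 = D2 + X2<<1 = X1 + 3*X2  (1 +, 1 <<)
    y1 = d1 + (d2 << 2)  # Y1 = 5*X1 + 7*X2             (1 +, 1 <<)
    y2 = d1 << 2         # Y2 = 4*X1 + 12*X2            (1 <<)
    return y1, y2
```

Compared with the 6 additions of the original polynomial form and the 5 of digit-pattern matching, this network uses 3.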
Experimental setup
Goal
– Reduction in # of additions/subtractions
– Effect on area/latency after synthesis
Transforms: DCT, IDCT, DFT, DST, DHT
8x8 constant matrices
16 digits of precision (CSD representation)
Compare with
– Potkonjak (TCAD’95)
– RESANDS (Nguyen et al., TVLSI’2000)
Experimental results
# of additions/subtractions: Original (I), RESANDS (II), Potkonjak (III), our technique (IV); last three columns: % improvement of (IV) over (I), (II), (III)
Example  (I)    (II)   (III)  (IV)   %(I)  %(II)  %(III)
DCT      274    202    227    174    36.5  13.1   23.3
IDCT     242    183    222    162    33.0  11.5   27.0
R-DFT    253    193    208    165    34.8  14.5   20.7
I-DFT    207    178    198    134    35.3  24.7   32.3
DST      320    238    252    200    37.5  16.0   20.6
DHT      284    209    211    175    38.4  16.3   17.0
Average  263.3  200.5  219.7  168.3  35.9  16.0   23.5
Experimental results
Synthesis results (minimum latency constraints)
Area in library units, latency in clock cycles; (II) RESANDS, (III) Potkonjak, (IV) our technique
Example  Area (II)  Area (III)  Area (IV)  Latency (II)  Latency (III)  Latency (IV)
DCT      90667      96375       73311      10            11             11
IDCT     81868      99771       66864      10            11             11
R-DFT    90496      84770       69827      10            12             11
I-DFT    75140      84864       55940      10            11             10
DST      108101     106498      84715      11            12             11
DHT      93939      79409       71272      11            11             11
Average  90110      91948       70322      10.3          11.3           10.8
Limitations of this technique
Results dependent on the initial representation of constants
– Mixed representations: too many, O(3^N) per constant
Factoring of constants
– e.g. 105*X = 15*7*X = (16-1)*(8-1)*X: with T = (X<<4) - X, 105*X = (T<<3) - T
– Factoring in general is very hard
Common subexpressions with reversed signs
– e.g. (X1 - X2) = -(X2 - X1) cannot be detected
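A numeric check of the factoring example. The comparison with CSD is an added illustration, not from the slides: 105 = (10-101001)_CSD = 128 - 32 + 8 + 1 has four non-zero digits and so needs three additions/subtractions, whereas the factored form needs two subtractions.

```python
def factored_105(x: int) -> int:
    """105*X = 15*7*X with two subtractions: T = (X<<4) - X, then (T<<3) - T."""
    t = (x << 4) - x     # 15*X
    return (t << 3) - t  # 7*(15*X) = 105*X


def csd_105(x: int) -> int:
    """105*X from its CSD digits 10-101001: three additions/subtractions."""
    return (x << 7) - (x << 5) + (x << 3) + x
```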
Conclusions
Contributions
– Novel polynomial transformation
– Adapting rectangle covering methods
– Single-variable and multiple-variable subexpressions eliminated together => better results
Future work
– Addressing the shortcomings of the current method
– Optimization for timing and power
Thank you!
Questions?