modulo-(2 +3) parallel prefix addition via diminished-3 ... · implemented the required parallel...
TRANSCRIPT
Modulo-(2𝑛 + 3) Parallel Prefix Addition via
Diminished-3 Representation of Residues
Authors: Ghassem Jaberipur, Sahar Moradi Cherati
Arith 26
Kindly presented by: Paulo Sérgio Alves Martins
Contents2/22
INTRODUCTION
Residue number systems (RNS)
ℛ = {𝑚1, 𝑚2…𝑚𝑘}, 𝑚𝑖(1 ≤ 𝑖 ≤ 𝑘)
𝑀 = 𝑖=1𝑘 𝑚𝑖
Applications
RNS Features
3/22
𝑋, 𝑌 ∈ ℛ𝑋 = 𝑥1, 𝑥2… , 𝑥𝑘 , 𝑌 = (𝑦1, 𝑦2… , 𝑦𝑘),
where 𝑥𝑖 = 𝑋 𝑚𝑖, 𝑦𝑖 = 𝑌 𝑚𝑖𝑍 = 𝑋 ⊛ 𝑌, 𝑧𝑖 = 𝑥𝑖⊛𝑦𝑖, where ⊛∈ {+,−,×}
Popular τ = 2𝑛 − 1,2𝑛, 2𝑛 + 1
General form:2𝑛 ± δ 1 ≤ δ < 2𝑛−1
Parallel prefix
modulo-(2𝑛 − δ) adders: δ = 1 δ = 3 [Jaberipur,2015]
δ = 2𝑞 + 1 [Langroudi,2015]
2𝑛 + δ = 2𝑛+1 − δ′
, where 1 < δ′ = 2𝑛 − δ < 2𝑛
No direct fast solution
Delay3 + 2 log𝑛 Δ4 + 2 log𝑛 Δ5 + 2 log𝑛 Δ
Cryptography
digital signal/image processing
High Speed
Low Power
DIMINISHED-1 ADDERS4/22
𝑋: A modulo-(2𝑛 + 1) residue ∈ 0,2𝑛
𝑋 = 𝑋′ + 𝑧𝑋, where 𝑋′ = 𝑋 − 1 ∈ 0,2𝑛 − 1 for 𝑋 > 0𝑧𝑋 = 0(1), if and only if 𝑋 = 0(> 0)
𝑧𝑆 = (𝑧𝐴∨ 𝑧𝐵) ∧ 𝑧𝐴𝑧𝐵 ∧ 𝜉𝑆′ = 𝐴′ + 𝐵′ + 𝑧𝐴 + 𝑧𝐵 − 𝑧𝑆 2𝑛+1
= 𝑊′ + 𝑧𝐴𝑧𝐵𝑤𝑛′
2𝑛
Diminished-1 encoding
𝑤𝑛′ = 𝐺𝑛−1:0𝜉 = 𝑃𝑛−1:0𝐺𝑛−1:0 = 1, Iff 𝐴
′ + 𝐵′ = 2𝑛 − 1
𝑆 = 𝐴 + 𝐵 2𝑛+1 = 𝑆′ + 𝑧𝑆, 𝐴 = 𝐴
′ + 𝑧𝐴, 𝐵 = 𝐵′ + 𝑧𝐵𝑆′ = 𝑆 − 1, 𝐴′ = 𝐴 − 1, 𝐵′ = 𝐵 − 1 for 𝑆, 𝐴, 𝐵 > 0𝑧𝑆, 𝑧𝐴, 𝑧𝐵: zero-indicator bits
𝐴′ + 𝐵′ = 2𝑛𝑤𝑛′ + 𝑊′, where 𝑊′ = 𝑤𝑛−1
′ …𝑤0′
𝑆′ = 𝑆 − 1 = 𝐴 + 𝐵 2𝑛+1 − 1
= 𝐴′ + 1 + 𝐵′ + 1 − 1 2𝑛+1 = 2𝑛𝑤𝑛′ + 𝑊′ + 1
2𝑛+1
= 𝑊′ + 1 − 𝑤𝑛′2𝑛+1
= 𝑊′ +𝑤′𝑛
Diminished-1 addition
Related Work: Modulo-(𝟐𝒏 + 𝟏) D1 adder5/22
RPP TPP
DIMINISHED-3 REPRESENTATION
AND ADDITION
DIMINISHED-3 REPRESENTATION7/22
𝑋:Modulo− 2𝑛 + 3 residue ∈ [0, 2𝑛 + 2]
𝑋′ ∈ [0, 2𝑛 − 1]
D3
Representation 𝑋 = 𝑋′ + 𝑇𝑋
𝑇𝑋 = 𝑡1𝑡0
{0, 1, 2}
[3, 2𝑛 + 2]
𝑇𝑋 ∈ {0, 1, 2}-indicator
𝑋′ ∶ 𝑛 bit
𝑇𝑋 = 0 ⟺ 𝑋 = 0,𝑋′ = 0
𝑇𝑋 = 1 ⟺ 𝑋 = 1,𝑋′ = 0
𝑇𝑋 = 2 ⟺ 𝑋 = 2,𝑋′ = 0
𝑇𝑋 = 3⟺ 3 ≤ 𝑋 ≤ 2𝑛 + 2,𝑋′ ∈ [0,2𝑛 − 1]
Modulo-(𝟐𝒏 + 𝟑) D3 ADDITION8/22
𝐴 ∈ [0, 2𝑛 + 2] 𝐴 = 𝐴′ + 𝑇𝐴 𝐴′= 𝒂𝒏−𝟏 … 𝒂𝟐 𝒂𝟏 𝒂𝟎
𝐵′ = 𝒃𝒏−𝟏 … 𝒃𝟐 𝒃𝟏 𝒃𝟎
𝑤𝒏′ 𝑤𝒏−𝟏
′ … 𝑤2′ 𝑤1
′ 𝑤0′
𝐴, 𝐵, 𝑆 ≥ 3
𝑆′ = 𝑆 − 3 = 𝐴 + 𝐵 2𝑛+3 − 3
= 𝐴′ + 3 + 𝐵′ + 3 − 3 2𝑛+3 = 2𝑛𝑤𝑛′ + 𝑊′ + 3
2𝑛+3
= 𝑊′ + 3 1 − 𝑤𝑛′2𝑛+3
= 𝑊′ + 3𝑤′𝑛
𝐵 ∈ [0, 2𝑛 + 2] 𝐵 = 𝐵′ + 𝑇𝐵
Comparison D1 and D39/22
𝟑 + 𝟐 𝒍𝒐𝒈𝒏 𝜟 𝟓 + 𝟐 𝒍𝒐𝒈𝒏 𝜟
Detect Special Cases
𝑺 = 𝑨 + 𝑩 𝟐𝒏+𝟑 ∈ {𝟎,1,2}10/22
IF 𝐴′ + 𝐵′ = 2𝑛 − 1 THEN ξ1 = 1 ELSE ξ1 = 0
IF 𝐴′ + 𝐵′ = 2𝑛 − 2 THEN ξ2 = 1 ELSE ξ2 = 0
IF 𝐴′ + 𝐵′ = 2𝑛 − 3 THEN ξ3 = 1 ELSE ξ3 = 0
ξ1 = 𝑃𝑛−1:2𝐺𝑛−1:2ℎ1𝑢0 ξ2 = 𝑃𝑛−1:2𝐺𝑛−1:2ℎ1𝑢0 ξ3 = 𝑃𝑛−1:2𝐺𝑛−1:2𝑢1𝑢0
𝑺 = 𝑨 + 𝑩 𝟐𝒏+𝟏 = 𝟎
IF 𝐴′ + 𝐵′ = 2𝑛 − 1 THEN ξ = 1 ELSE ξ = 0
𝑫𝟏
11/22
σ1 = α1β1 ∨ α0β0 𝒦 ∨ 𝑓1(α1, β1, α0, β0, 𝑢1, 𝑣1, 𝑢0)
σ0 = α1β0 ∨ α0β1 𝒦 ∨ 𝑓0 α1, β1, α0, β0, 𝑢1, 𝑣1, 𝑢0
𝒦 = 𝑃𝑛−1:2𝐺𝑛−1:2,
DERIVATION OF 𝑇𝑆
ξ1 = 𝒦ℎ1𝑢0, ξ2 = 𝒦ℎ1𝑢0, ξ3 = 𝒦𝑢1𝑢0
Impact of ξ-dependent noise terms on 𝑻𝑺12/22
𝑻𝑨 𝑻𝑩 𝛏𝟏 𝛏𝟐 𝛏𝟑 𝑻𝑺 𝑺 Justification
3 1 0 X X 3 ≥ 4 𝐴′ + 𝐵′ = 𝐴′ < 2𝑛 − 1⟹ 𝑆 = 𝐴 + 𝐵 = 𝐴′ + 3 + 1 < 2𝑛 + 3
3 1 1 X X 0 0 𝐴′ + 𝐵′ = 2𝑛 − 1⟹ 𝐴 + 𝐵 = 2𝑛 + 3, 𝑆 = 𝐴 + 𝐵 2𝑛+3 = 0
𝑇𝑆 = 3𝜉1
𝑻𝑨 𝑻𝑩 𝛏𝟏 𝛏𝟐 𝛏𝟑 𝑻𝑺 𝑺 Justification
3 2 0 0 X 3 ≥ 5 𝐴′ + 𝐵′ = 𝐴′ < 2𝑛 − 2⟹ 𝑆 = 𝐴 + 𝐵 = 𝐴′ + 3 + 2 < 2𝑛 + 3
3 2 0 1 X 0 0 𝐴′ + 𝐵′ = 2𝑛 − 2⟹ 𝐴 + 𝐵 = 2𝑛 + 3, 𝑆 = 𝐴 + 𝐵 2𝑛+3 = 0
3 2 1 0 X 1 1 𝐴′ + 𝐵′ = 2𝑛 − 1⟹ 𝐴 + 𝐵 = 2𝑛 + 4, 𝑆 = 𝐴 + 𝐵 2𝑛+3 = 1
𝑇𝑆 = 𝜉1 + 3 𝜉1 𝜉2
𝑻𝑨 𝑻𝑩 𝛏𝟏 𝛏𝟐 𝛏𝟑 𝑻𝑺 𝑺 Justification
3 3 0 0 0 3 ≥ 6 𝐴′ + 𝐵′ < 2𝑛 − 3⟹ 𝑆 = 𝐴 + 𝐵 = 𝐴′ + 𝐵′ + 6 < 2𝑛 + 3
3 3 0 0 0 3 ≥ 3 2𝑛 ≤ 𝐴′ + 𝐵′ ≤ 2𝑛 + 2𝑛 − 2⟹ 3 ≤ 𝑆 = 𝐴 + 𝐵 2𝑛+3 ≤ 2𝑛 + 1
3 3 0 0 1 0 0 𝐴′ + 𝐵′ = 2𝑛 − 3⟹ 𝐴 + 𝐵 = 2𝑛 + 3, 𝑆 = 𝐴 + 𝐵 2𝑛+3 = 0
3 3 0 1 0 1 1 𝐴′ + 𝐵′ = 2𝑛 − 2⟹ 𝐴 + 𝐵 = 2𝑛 + 4, 𝑆 = 𝐴 + 𝐵 2𝑛+3 = 1
3 3 1 0 0 2 2 𝐴′ + 𝐵′ = 2𝑛 − 1⟹ 𝐴 + 𝐵 = 2𝑛 + 5, 𝑆 = 𝐴 + 𝐵 2𝑛+3 = 2
𝑇𝑆 = 2𝜉1 + 𝜉2 + 3 𝜉1 𝜉2 𝜉3
𝑻𝑨 𝑻𝑩 𝛏𝟏 𝛏𝟐 𝛏𝟑 𝑻𝑺 𝑺 Justification
3 0 X X X 3 ≥ 3 𝑆 = 𝐴 + 𝐵 = 𝐴 ≤ 2𝑛 + 2
𝑇𝑆 = 3
13/22
THE NOISE TERM 𝑇 = 𝑇𝐴 + 𝑇𝐵 − 𝑇𝑆 IN TERMS OF 𝑇𝐴, 𝑇𝐵, AND ξ BITS
𝑆′ = 𝐴′ + 𝑇𝐴 + 𝐵′ + 𝑇𝐵 − 𝑇𝑆 2𝑛+3 = 𝐴
′ + 𝐵′ + 𝑇𝐴 + 𝑇𝐵 − 𝑇𝑆 2𝑛+3
= 2𝑛𝑤𝑛′ + 𝑊′ + 𝑇
2𝑛+3= 𝑊′ + 𝑇 − 3𝑤𝑛
′2𝑛+3
= 𝑊′ + 𝛅′2𝑛+3
, 𝛅′ = 𝑇 − 3𝑤𝑛′
DERIVATION OF 𝑆′
𝑻𝑩
𝑻𝑨
0 1 2 3
0 0 0 0 0 1 0 0 0 3𝜉1 + 1 2 0 0 1 2𝜉1 + 3𝜉2 + 2 3 0 3𝜉1 + 1 2𝜉1 + 3𝜉2 + 2 𝜉1 + 2𝜉2 + 3𝜉3 + 3
COMPOUND RPP REALIZATION OF 𝑺′14/22
𝑺′ = 𝑾′ + 𝒛𝒘𝒏′ + 𝛅′
𝟐𝒏
𝑧 = α1β1α0β0
δ′ = δ1′ δ0′ ∈ {0,1,2}
δ1′ = ξ1α1β1 α0⨁β0 ∨ ξ1 ξ2𝑧𝑤𝑛
′ ,
δ0′ = ξ1𝑥 ∨ ξ2𝑧 ∨ α0β0 α1⨁β1 ∨ α1β1α0 ∨ β0
15/22 The required RPP circuitry
pgh pgh pgh pgh pgh pghpgh
𝑢7 𝑣7
𝑔 1 , 𝑝 1
HA HA HA HA HAHA HA HA
𝑎0 𝑏0 𝑎1 𝑏1 𝑎2 𝑏2 𝑎3 𝑏3 𝑎4 𝑏4 𝑎5 𝑏5 𝑎6 𝑏6 𝑎7 𝑏7
𝑢6 𝑣6 𝑢5 𝑣5 𝑢4 𝑣4 𝑢3 𝑣3 𝑢2 𝑣2 𝑢1 𝑣1 𝑢0 𝑣8
𝑠7′ 𝑠6
′ 𝑠5′ 𝑠4
′ 𝑠3′ 𝑠2
′ 𝑠1′ 𝑠0
′
𝑐7 𝑐6 𝑐5 𝑐4 𝑐3 𝑐2
𝑐0 𝑐1
(𝒈𝟕:𝟐,𝒑𝟕:𝟐)
𝑷𝟕:𝟐
𝑮𝟕:𝟐 𝑮𝟕:𝟐
𝑢0
(6Δ)𝑞03 𝑞02(3Δ) 𝑷𝟕:𝟐
𝑞01(4Δ)
𝑦(4Δ)
(4Δ)𝑞12 𝑞13(4Δ) 𝑷𝟕:𝟐 𝑷𝟕:𝟐
𝑞11(5Δ)
(3Δ)𝑥
(4Δ)𝑦
σ1 σ0
α0β1 α1β1 α1β0
𝑓0(7Δ)
α0β0
𝑓1(7Δ)
𝑮𝟕:𝟐 𝑷𝟕:𝟐
pgh
( , )i ig pih ( , )l l r l rg p g p p
( , )r rg p( , )l lg p
( )l l rg p g
( , )r rg p( , )l lg p
( , )i ig p
( , )i ig p 𝑢𝑖 𝑣𝑖
𝟕 + 𝟐 𝒍𝒐𝒈𝒏 𝜟
16/22
𝑐𝑖 = 𝐺𝑖−1:2 ∨ 𝑃𝑖−1:2𝑐2=𝐺𝑖−1:2 ∨ 𝑃𝑖−1:2 𝑔1′ ∨ 𝑝1
′𝐺𝑛−1:2 = 𝐺𝑖−1:2 ∨ 𝑃𝑖−1:2 𝑔1′ ∨ 𝑝1
′𝐺𝑛−1:𝑖
𝑐2 = 𝑔1′ ∨ 𝑝1
′𝐺7:2, 𝑔1′ , 𝑝1′ ∘ 𝐺7:2, 1
𝑐3 = 𝑔2 ∨ 𝑝2 𝑔1′ ∨ 𝑝1
′𝐺7:3 , 𝑔2, 𝑝2 ∘ 𝑔1′ , 𝑝1′ ∘ 𝐺7:3, 1
𝑐4 = 𝐺3:2 ∨ 𝑃3:2 𝑔1′ ∨ 𝑝1
′𝐺7:4 , (𝐺, 𝑃)3:2∘ (𝑔1′ , 𝑝1′ ) ∘ 𝐺7:4, 1
𝑐5 = 𝐺4:2 ∨ 𝑃4:2 𝑔1′ ∨ 𝑝1
′𝐺7:5 , 𝐺, 𝑃 4:2 ∘ (𝑔1′ , 𝑝1′ ) ∘ 𝐺7:5, 1
𝑐6 = 𝐺5:2 ∨ 𝑃5:2 𝑔1′ ∨ 𝑝1
′𝐺7:6 , 𝐺, 𝑃 5:2 ∘ (𝑔1′ , 𝑝1′ ) ∘ 𝐺7:6, 1
𝑐7 = 𝐺6:2 ∨ 𝑃6:2 𝑔1′ ∨ 𝑝1
′𝑔7 , 𝐺, 𝑃 6:2 ∘ (𝑔1′ , 𝑝1′ ) ∘ 𝑔7, 1
Carry Bits for RPP Architecture
17/22 The required TPP circuitry
α0β1
σ1
α1β0
𝑓0(7Δ)
α0β0
σ0
𝑮𝟕:𝟐 𝑷𝟕:𝟐
α1β1
𝑓1(7Δ)
𝑷𝟕:𝟐 𝑮𝟕:𝟐 𝑷𝟕:𝟐
𝑷𝟕:𝟐
𝑮𝟕:𝟐 𝑮𝟕:𝟐 𝑮𝟕:𝟐
𝑮𝟕:𝟐
𝑞12′ (7Δ)
(6Δ)𝑞04′
(7Δ)𝑞02′
𝑞11′ (7Δ)
𝑞03′ (5Δ)
𝑞01′ (7Δ)
(6Δ)𝑞13′
𝑷𝟕:𝟐
𝑠0′ 𝑠1
′
pgh pgh pgh pgh pgh pghpgh
𝑢1 𝑣1 𝑢2 𝑣2 𝑢3 𝑣3 𝑢4 𝑣4 𝑢5 𝑣5 𝑢6 𝑣6 𝑢7 𝑣7
𝑔 12 , 𝑝 12 𝑔 11 , 𝑝 11
HA HA HA HA HAHA HA HA
𝑎0 𝑏0 𝑎1 𝑏1 𝑎2 𝑏2 𝑎3 𝑏3 𝑎4 𝑏4 𝑎5 𝑏5 𝑎6 𝑏6 𝑎7 𝑏7
𝑢0
𝑠7′ 𝑠6
′ 𝑠5′ 𝑠4
′ 𝑠3′ 𝑠2
′
𝑐7 𝑐6 𝑐5 𝑐4 𝑐3 𝑐2 (𝒈𝟕:𝟐,𝒑𝟕:𝟐)
𝟓 + 𝟐 𝒍𝒐𝒈𝒏 𝜟
Carry Bits for TPP Architecture
𝑐2,
(( 𝑝′, 𝑔′11∘ 𝑝′, 𝑔′
12) ∘ ( 𝑔, 𝑝 7 ∘ 𝑔, 𝑝 6)) ∘ (( 𝑔, 𝑝 5 ∘ (𝑔, 𝑝)4) ∘ ((𝑔, 𝑝)3∘ (𝑔, 𝑝)2))
𝑐3,
(((𝑝, 𝑔)2∘ 𝑝′, 𝑔′
11) ∘ ( 𝑝′, 𝑔′
12∘ (𝑔, 𝑝)7)) ∘ (((𝑔, 𝑝)6∘ (𝑔, 𝑝)5) ∘ ((𝑔, 𝑝)4∘ (𝑔, 𝑝)3))
𝑐4,
𝑔, 𝑝 3 ∘ 𝑔, 𝑝 2 ∘ 𝑔′, 𝑝′ 11 ∘ 𝑔′, 𝑝′ 12 ∘ (((𝑔, 𝑝)7∘ (𝑔, 𝑝)6) ∘ ((𝑔, 𝑝)5∘ (𝑔, 𝑝)4))
𝑐5,
(( 𝑝, 𝑔 4 ∘ 𝑝, 𝑔 3) ∘ ( 𝑝, 𝑔 2 ∘ 𝑝′, 𝑔′
11)) ∘ (( 𝑝′, 𝑔′
12∘ (𝑔, 𝑝)7) ∘ ((𝑔, 𝑝)6∘ (𝑔, 𝑝)5))
𝑐6,
(( 𝑔, 𝑝 5 ∘ (𝑔, 𝑝)4) ∘ ((𝑔, 𝑝)3∘ (𝑔, 𝑝)2)) ∘ (( 𝑔′, 𝑝′ 11 ∘ 𝑔
′, 𝑝′ 12) ∘ ( 𝑔, 𝑝 7 ∘ (𝑔, 𝑝)6))
𝑐7, 𝑔, 𝑝 6 ∘ 𝑔, 𝑝 5 ∘ 𝑔, 𝑝 4 ∘ 𝑔, 𝑝 3 ∘ (((𝑝, 𝑔)2∘ 𝑝′, 𝑔′
11) ∘ ( 𝑝′, 𝑔′
12∘ (𝑔7, 1)))
18/22
EVALUATION AND COMPARISON19/22
RPP SYNTHESIS RESULTS
𝒏 = 𝟏𝟔
𝒏 = 𝟖
Design(RPP) Delay Area Power
𝒏𝒔 Ratio 𝝁𝒎𝟐 Ratio 𝒎𝒘 Ratio
D3 0.76 1.00 25318 1.00 0.682 1.00
D1 [1] 0.59 0.77 10128 0.40 0.328 0.48
2𝑛 − 3 [2] 0.72 0.94 13227 0.52 0.467 0.68
Design(RPP) Delay Area Power
𝒏𝒔 Ratio 𝝁𝒎𝟐 Ratio 𝒎𝒘 Ratio
D3 0.81 1.00 42043 1.00 1.19 1.00
D1 [1] 0.72 0.88 23776 0.56 0.716 0.60
2𝑛 − 3 [2] 0.78 0.96 30637 0.73 1.01 0.85
[1] Jaberipur,2011
[2] Jaberipur,2015
TPP Results20/22
SYNTHESIS RESULTS FOR 𝒏 = 𝟏𝟔
SYNTHESIS RESULTS FOR 𝒏 = 𝟖
DELAY AND AREA MEASURES
Design(TPP) Delay(𝜟) Area (# of gates)
D3 (5 + 2 log𝑛) 3𝑛 𝑙𝑜𝑔 𝑛 + 15𝑛+ 39
D1 [1] (3 + 2 log𝑛) 3𝑛 𝑙𝑜𝑔𝑛 + 12𝑛 − 1
2𝑛 − 3 [2] (4 + 2 log𝑛) 3𝑛 𝑙𝑜𝑔𝑛 + 12𝑛+ 4
Design(TPP) Delay Area Power
𝒏𝒔 Ratio 𝝁𝒎𝟐 Ratio 𝒎𝒘 Ratio
D3 0.65 1.00 30686 1.00 0.956 1.00
D1 [1] 0.57 0.88 14227 0.46 0.375 0.39
2𝑛 − 3 [2] 0.64 0.98 14152 0.46 0.500 0.52
Design(TPP) Delay Area Power
𝒏𝒔 Ratio 𝝁𝒎𝟐 Ratio 𝒎𝒘 Ratio
D3 0.73 1.00 51121 1.00 1.60 1.00
D1 [1] 0.64 0.88 34441 0.67 0.939 0.59
2𝑛 − 3 [2] 0.73 1.00 32712 0.64 1.081 0.67
[1] Jaberipur,2011
[2] Jaberipur,2015
CONCLUSIONS
Implemented the required parallel prefix (RPP and TPP architectures) adders based on the
novel diminished-3 representation of residues in {3,2𝑛 + 2} and 2-bit {0,1,2} indicator
The adder delay is only 2Δ more than the modulo-(2𝑛 + 1) diminished-1 adder, and 1Δ more
than that of the companion modulo-(2𝑛 − 3) adder
Same speed (synthesis result) for the proposed designs and those of the modulo-(2𝑛 − 3)
adders
Area and Power overhead reduces as 𝑛 grows larger
21/22