Routing Track Duplication with Fine-Grained Power-Gating for FPGA Interconnect Power Reduction
Yan Lin, Fei Li and Lei He
EE Department, UCLA
Partially supported by NSF grant CCR-0306682.
Address comments to [email protected].
Outline
  Review and Motivation
  Interconnect Leakage Power Reduction using Power-gating
  Interconnect Dynamic Power Reduction using Dual-Vdd
  Conclusions and Ongoing Work
Power Limitation of FPGAs
  Existing FPGAs are HIGHLY power inefficient (> 100X more than ASIC), e.g. [Kusse, ISLPED’98]
  Power is likely the largest limitation for FPGAs

  Design example      Vdd    Energy
  Xilinx XC4003A      5v     4.2mW/MHz
  Static CMOS ASIC    3.3v   5.5uW/MHz
FPGA Power Reduction
  Power aware FPGA CAD algorithms for existing FPGA architectures
    CAD algorithms to minimize power-delay product [Lamoureux et al, ICCAD’03]
    Configuration inversion for leakage reduction [Anderson et al, FPGA’04]
  Power efficient FPGA circuits and architectures
    Dual-Vdd and Vdd-programmable FPGA logic blocks [Li et al, FPGA’04][Li et al, DAC’04]
    Vdd-programmable FPGA interconnects [Li et al, ICCAD’04] [Anderson et al, ICCAD’04]
Overall FPGA Structure
  Cluster-based island style FPGA structure
    Logic blocks are embedded into routing resources
    Wire segment connectivity is programmable
FPGA Routing Structure
  Subset programmable switch block
    An incoming track can be connected to different outgoing tracks with the same track number
  Programmable connection block
Vdd-programmable Interconnects [Li et al, ICCAD’04]
  Vdd-programmable switch
    Vdd selection for used switch
    Power-gating unused switch
  Configurable Vdd-level conversion
    Avoid excessive leakage when low Vdd switch drives high Vdd switches
  (Figure labels: conventional routing switch, power transistor)
Limitation of Vdd-programmable Interconnects [Li et al, ICCAD’04]
  Fine-grained Vdd-level converter insertion
    Area overhead: 54% area overhead for circuit s38584
    Leakage overhead: 36% leakage overhead for circuit s38584
  SRAM cell overhead: 300% SRAM cell overhead for each switch
  Area/SRAM efficient low-power interconnects are needed
Outline
  Review and Motivation
  Interconnect Leakage Power Reduction using Power-gating
  Interconnect Dynamic Power Reduction using Dual-Vdd
  Conclusions and Ongoing Work
Low Utilization Rate of Interconnects

  Circuit    # of total interconnect switches   # of unused interconnect switches   Utilization rate (%)
  alu4       36478                              31224                               14.40%
  apex4      43741                              37703                               13.80%
  bigkey     63259                              54017                               9.87%
  clma       653181                             593343                              9.16%
  des        87877                              79932                               9.04%
  diffeq     42746                              36974                               13.50%
  dsip       75547                              70138                               7.16%
  elliptic   140296                             125800                              10.33%
  ex5p       45404                              39288                               13.47%
  frisc      238852                             216993                              9.15%
  Average                                                                           11.90%

  78.15% of total power is consumed by global interconnects [Li et al, DAC’04]
  47% of global interconnect power is leakage
    Why? Extremely low utilization rate (~12% w/ minimum array)
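The utilization rate in the table is the share of switches that are actually used, i.e. (total - unused) / total. A minimal Python sketch, using four rows read from the table above (the helper name `utilization_percent` is only illustrative):

```python
# (total switches, unused switches) for a few circuits, read from the table.
switch_counts = {
    "alu4": (36478, 31224),
    "apex4": (43741, 37703),
    "des": (87877, 79932),
    "dsip": (75547, 70138),
}


def utilization_percent(total, unused):
    """Share of interconnect switches that are used, in percent."""
    return 100.0 * (total - unused) / total


utilization = {
    name: round(utilization_percent(total, unused), 2)
    for name, (total, unused) in switch_counts.items()
}

for name, rate in utilization.items():
    print(f"{name}: {rate:.2f}%")
```

Every unused switch still leaks, which is why a utilization this low makes leakage such a large share of interconnect power.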
Interconnect Utilization Rate is Intrinsically Low
  Programmable switch block: utilization is no more than 25%
  Programmable connection block: only one switch is used (for 64 tracks)
  Power-gating unused interconnects is necessary
Vdd-gateable Routing Switch
  Vdd-gateable routing switch
    Only two states for a routing switch: high Vdd and power-gating
    Enable power-gating capability w/o extra SRAM cells
  (Figure labels: conventional routing switch, power transistor)
Vdd-Gateable Connection Block
  Enable power-gating capability w/ only one extra SRAM cell for a connection block
    Only n+1 SRAM cells for 2^n connection switches
    A low leakage decoder is needed
  (Figure labels: conventional connection block, Vdd-gateable connection block)
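The SRAM count can be checked with a few lines of Python. This sketch assumes the slide's "2n" is a flattened 2^n, i.e. a decoder driven by n SRAM cells selects one of 2^n connection switches and one more cell enables power-gating. The function name and the `power_gating` flag are illustrative, not from the slides.

```python
def decoded_sram_cells(num_switches, power_gating=True):
    """SRAM cells for a decoder-based connection block (assumed model).

    n address cells select one of up to 2**n switches; power-gating
    adds a single extra cell for the whole block.
    """
    n = (num_switches - 1).bit_length()  # smallest n with 2**n >= num_switches
    return n + (1 if power_gating else 0)


# 64 tracks, as in the utilization example: n = 6 address cells.
cells_conventional = decoded_sram_cells(64, power_gating=False)
cells_gateable = decoded_sram_cells(64, power_gating=True)
print(cells_conventional, cells_gateable)
```

Under this reading, a 64-switch connection block needs one cell more than the decoder alone, rather than one cell per switch.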
Power and Delay of Vdd-gateable Switch
  Vdd-gateable switch compared to conventional switch
    Dynamic power is almost the same
    >300X leakage power reduction
    ~6% delay increase

  Vdd     Routing switch delay (ns)               Energy per switch (Joule)
          w/o power-gating   w/ power-gating      w/o power-gating   w/ power-gating
  1.3v    5.90E-11           6.26E-11 (6%)        3.3E-14            3.25E-14
  1.0v    6.99E-11           7.42E-11 (6.1%)      1.63E-14           1.65E-14
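The roughly 6% delay increase can be recomputed from the rounded delay entries in the table. A short Python check (the results differ slightly from the slide's percentages because the table values are already rounded):

```python
# (delay w/o power-gating, delay w/ power-gating) per Vdd, copied from the table.
delays = {
    "1.3v": (5.90e-11, 6.26e-11),
    "1.0v": (6.99e-11, 7.42e-11),
}


def percent_increase(base, new):
    """Relative increase of new over base, in percent."""
    return 100.0 * (new - base) / base


delay_overhead = {
    vdd: percent_increase(base, gated) for vdd, (base, gated) in delays.items()
}

for vdd, overhead in delay_overhead.items():
    print(f"{vdd}: {overhead:.2f}% delay increase")
```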
Power Reduction by Power-gating Unused Interconnects

            Single-Vdd (baseline)                      Total power saving
  Circuit   Interconnect power (W)   Total power (W)   Vdd-programmable interconnects [Li et al, ICCAD’04]   Vdd-gateable interconnects
  alu4      0.0657                   0.0769            25.13%                                                29.09%
  apex4     0.0437                   0.0500            21.83%                                                30.70%
  bigkey    0.1044                   0.1375            33.38%                                                24.89%
  clma      0.4918                   0.5450            23.42%                                                45.69%
  des       0.1688                   0.2136            36.71%                                                31.79%
  diffeq    0.0292                   0.0360            17.50%                                                45.20%
  dsip      0.1003                   0.1280            34.34%                                                43.66%
  Avg.      --                       --                25.19%                                                38.18%
Outline
  Review and motivation
  Interconnect Leakage Power Reduction using Power-gating
  Interconnect Dynamic Power Reduction using Dual-Vdd
    FPGA fabrics and algorithms
    Design flow and quantitative evaluation
  Conclusions and Ongoing Work
Pre-Defined Dual-Vdd Routing Architecture
  Partition routing channel into VddH and VddL regions
    Vdd-gateable interconnect switch is used
    Ratio of VddH/VddL tracks is an architectural parameter
  Reduce dynamic power with dual-Vdd by making use of timing slack
Ratio of VddH to VddL Track
  Determine ratio using dual-Vdd assignment profile without considering layout constraint
  Sensitivity-based dual-Vdd assignment
    Assignment unit: a routing tree
    Power sensitivity: ΔP/ΔVdd
      Power difference for a routing tree between VddH and VddL
    Greedy algorithm: sensitivity based
      Initial: uniform VddH assignment
      Procedure: assign VddL to routing tree with largest power sensitivity (but without increasing critical delay)
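The greedy procedure above can be sketched in Python under a deliberately simple timing model. Everything below is an illustrative assumption rather than the authors' implementation: each routing tree has fixed power and delay values at VddH and VddL, a path delay is the sum of the delays of the trees on it, and the `RoutingTree` class, the toy numbers, and the path lists are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class RoutingTree:
    power_h: float  # power when supplied by VddH
    power_l: float  # power when supplied by VddL
    delay_h: float  # delay when supplied by VddH
    delay_l: float  # delay when supplied by VddL (slower)


def critical_delay(trees, paths, at_low):
    """Longest path delay for a given set of VddL-assigned trees."""
    return max(
        sum(trees[t].delay_l if t in at_low else trees[t].delay_h for t in path)
        for path in paths
    )


def greedy_dual_vdd(trees, paths, vdd_h=1.5, vdd_l=1.0):
    """Start from uniform VddH, then move trees to VddL by power sensitivity."""
    at_low = set()
    limit = critical_delay(trees, paths, at_low)  # critical delay must not grow

    def sensitivity(name):
        tree = trees[name]
        return (tree.power_h - tree.power_l) / (vdd_h - vdd_l)  # ΔP/ΔVdd

    # Lowering a tree never speeds anything up in this model, so one pass in
    # decreasing sensitivity order visits every candidate exactly once.
    for name in sorted(trees, key=sensitivity, reverse=True):
        if critical_delay(trees, paths, at_low | {name}) <= limit:
            at_low.add(name)
    return at_low


trees = {
    "a": RoutingTree(1.0, 0.5, 3.0, 4.0),
    "b": RoutingTree(0.6, 0.4, 2.0, 3.0),
    "c": RoutingTree(0.8, 0.4, 1.0, 2.0),
    "d": RoutingTree(0.5, 0.2, 2.0, 2.5),
}
paths = [["a", "b"], ["c", "d"], ["a", "c"]]
low_trees = greedy_dual_vdd(trees, paths)
print(sorted(low_trees))
```

In this toy case trees "a" and "b" sit on the critical path and stay at VddH, while "c" and "d" have enough slack to move to VddL.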
Profile of Dual-Vdd Assignment
  Assignment with no critical path delay increase (VddH:VddL=1.5v:1.0v)

  Circuit   # of routing trees   # of logic blocks   # of I/O blocks   VddL routing trees (%)   VddL logic blocks (%)
  alu4      782                  162                 22                49.74                    82.10
  apex4     849                  134                 28                35.45                    78.36
  bigkey    1542                 294                 426               67.77                    85.03
  clma      7995                 1358                144               69.74                    89.84
  s38417    5426                 982                 135               64.17                    80.05
  seq       1138                 274                 76                20.74                    61.62
  spla      2091                 461                 122               54.52                    88.47
  Avg.                                                                 54.54                    80.28

  Set the ratio of VddH/VddL track to 1:1
Level Converter is NOT Needed
Wire segment can only be connected to another wire segment with the same track number via a subset switch block
No level converter is needed in switch block
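The argument can be illustrated with a tiny Python model. The layout is an assumption for illustration only: with a 1:1 ratio the first half of the track numbers is taken to be VddH and the second half VddL, and a subset switch block connects incoming track i only to track i on the other three sides.

```python
def track_vdd(track, channel_width):
    """Assumed 1:1 partition: first half of the tracks is VddH, the rest VddL."""
    return "VddH" if track < channel_width // 2 else "VddL"


def subset_switch_outputs(track):
    """Subset switch block: track i reaches only track i on the other sides."""
    return {side: track for side in ("left", "top", "right")}


channel_width = 8
# A level converter would be needed only if some connection crossed regions.
level_converter_needed = any(
    track_vdd(out_track, channel_width) != track_vdd(track, channel_width)
    for track in range(channel_width)
    for out_track in subset_switch_outputs(track).values()
)
print(level_converter_needed)
```

Because the track number never changes across a subset switch block, a routing tree that starts in one Vdd region stays in it.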
Layout Constraint Due to Dual-Vdd
  Dual-Vdd introduces performance degradation due to layout constraint
    Insufficient routing resources for Vdd-matched routing trees
    May introduce detours
  Solutions
    Vdd-programmable interconnects [Li et al, ICCAD’04]
    Provide sufficient routing tracks for Vdd-matched routing trees
      Control leakage by power-gating unused interconnects
Design Flow for Dual-Vdd Interconnects
  [Figure: design flow. Box labels: Arch Spec; Tech Mapped Netlist (Single-Vdd); Timing Driven Layout (Single-Vdd); Delay/Power Model (dual-Vdd); Delay/Power Estimation; Dual-Vdd Assignment for Routing Trees; Double Channel width; Timing Driven Layout (Dual-Vdd); Power-gating Unused Switches. Outputs: Delay, Power]
Dual-Vdd Routing Algorithm
  Based on the maze routing algorithm in VPR
  Modify the cost function

    $\mathrm{TotalCost}(n) = \mathrm{PathCostDv}(n) + \mathrm{ExpectedCostDv}(n, j) + \mathrm{Matched}(T, n)$

    TotalCost(n): the cost of routing tree T through wire segment n to the target sink j
    PathCostDv(n): the cost of the path from the current partial routing tree to wire segment n
    ExpectedCostDv(n, j): the estimated cost from wire segment n to the target sink j
    Matched(T, n): boolean function describing Vdd-matching status
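A minimal Python sketch of how such a cost could steer a maze-style wave expansion. This is one possible reading, not VPR's router or the authors' code: the three terms are summed, the boolean Matched(T, n) is mapped to a hypothetical `mismatch_penalty`, and the graph, costs, and estimates are toy values.

```python
import heapq


def route(graph, base_cost, expected, seg_vdd, tree_vdd, source, sink,
          mismatch_penalty=10.0):
    """Expand wire segments in order of increasing total cost until the sink."""
    heap = [(0.0, 0.0, source, [source])]  # (total cost, path cost, segment, path)
    visited = set()
    while heap:
        _, path_cost, n, path = heapq.heappop(heap)
        if n == sink:
            return path, path_cost
        if n in visited:
            continue
        visited.add(n)
        for m in graph.get(n, []):
            if m in visited:
                continue
            new_path_cost = path_cost + base_cost[m]            # path cost term
            # Assumed mapping of Matched(T, n): no cost if the Vdd levels agree.
            matched_term = 0.0 if seg_vdd[m] == tree_vdd else mismatch_penalty
            total = new_path_cost + expected[m] + matched_term  # total cost
            heapq.heappush(heap, (total, new_path_cost, m, path + [m]))
    return None, float("inf")


graph = {"src": ["h1", "l1"], "h1": ["sink"], "l1": ["sink"], "sink": []}
base_cost = {"src": 0.0, "h1": 1.0, "l1": 1.5, "sink": 1.0}
expected = {"src": 2.0, "h1": 1.0, "l1": 1.0, "sink": 0.0}
seg_vdd = {"src": "VddL", "h1": "VddH", "l1": "VddL", "sink": "VddL"}

dual_path, dual_cost = route(graph, base_cost, expected, seg_vdd, "VddL",
                             "src", "sink")
single_path, single_cost = route(graph, base_cost, expected, seg_vdd, "VddL",
                                 "src", "sink", mismatch_penalty=0.0)
print(dual_path, single_path)
```

With the penalty active, the VddL tree takes the slightly more expensive VddL segment. With the penalty set to zero, the router falls back to the cheapest segment regardless of its Vdd level.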
Outline
  Review and motivation
  Interconnect Leakage Power Reduction using Power-gating
  Interconnect Dynamic Power Reduction using Dual-Vdd
    FPGA fabrics and algorithms
    Quantitative evaluation
  Conclusions and Ongoing Work
Comparison of Low Power Architectures
  [Figure: power (watt, 0.07 to 0.27) versus clock frequency (MHz, 60 to 130). Curves and point labels: arch-SV (1.5v, 1.3v, 1.0v, 0.9v); arch-PV (1.5v/0.8v, 1.3v/1.0v, 1.0v/0.8v, 0.9v/0.8v); arch-PV+PG (1.5v/0.8v, 1.3v/1.0v, 1.0v/0.8v, 0.9v/0.8v); arch-DV+PG(1.5W) (1.5v/0.8v, 1.3v/0.9v, 1.0v/0.8v, 0.9v/0.8v)]
  Dual-Vdd interconnects with fine-grained power gating
    May have performance degradation due to layout constraint
    Can reduce more power than purely power-gating unused switches
    Achieve 9.78% interconnect dynamic power reduction, 38.68% total power saving with 1.5W channel width
      W is the nominal routing channel width in single-Vdd FPGA
  Circuit: S38584
Impact of Routing Channel Width
  [Figure: power saving (30% to 50%) and normalized clock frequency (0.5 to 1) versus channel width (1.0 to 2.0). Labeled points: power saving 34.86%, 38.68%, 45.00%; normalized clock frequency 0.743, 0.838, 0.955]
  We get the power reduction percentage at the maximum clock frequency achieved by dual-Vdd interconnects
  Channel width increases from 1.0W to 2.0W
    Power saving increases from 34.86% to 45%
    Normalized clock frequency increases from 0.743 to 0.955
Area Overhead of Vdd-gateable Interconnects
  Device area is dominant

  Architecture                       Total FPGA area   Area overhead (%)
  Single-Vdd (baseline)              7077044           -
  Dual-Vdd w/ Power-gating (1.0W)    11092744          57%
  Dual-Vdd w/ Power-gating (1.5W)    15420197          118%
  Dual-Vdd w/ Power-gating (2.0W)    20249865          186%
  [Li et al, ICCAD’04]               22678225          220%

  Area overhead is mainly due to power transistors for power-gating capability
  Track duplication with power-gating vs Vdd-programmable interconnects [Li et al, ICCAD’04]
    More power reduction (45% vs 25%) & less area overhead
    Mainly due to Vdd-level converter removal
  High Vdd interconnects with power gating are BEST considering area
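The overhead percentages follow directly from the total-area figures, measured against the single-Vdd baseline. A short Python check using the numbers on this slide (the dictionary keys are just shortened labels):

```python
# Total FPGA area per architecture, copied from the slide.
total_area = {
    "Single-Vdd (baseline)": 7077044,
    "Dual-Vdd w/ PG (1.0W)": 11092744,
    "Dual-Vdd w/ PG (1.5W)": 15420197,
    "Dual-Vdd w/ PG (2.0W)": 20249865,
    "[Li et al, ICCAD'04]": 22678225,
}

baseline = total_area["Single-Vdd (baseline)"]

# Overhead relative to the baseline, rounded to whole percent.
area_overhead = {
    name: round(100.0 * (area - baseline) / baseline)
    for name, area in total_area.items()
    if name != "Single-Vdd (baseline)"
}

for name, overhead in area_overhead.items():
    print(f"{name}: {overhead}%")
```

Even at 2.0W channel width, the duplicated-track architecture stays below the area of the Vdd-programmable design.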
Outline
  Review and motivation
  Interconnect Leakage Power Reduction using Power-gating
  Interconnect Dynamic Power Reduction using Dual-Vdd
  Conclusions and Ongoing Work
Conclusions and Ongoing Work
  Conclusions
    Developed power-gateable interconnects w/ virtually no extra SRAM cell
    Achieved 38.18% total power reduction using Vdd-gateable interconnects
    Achieved 24.78% interconnect dynamic power reduction, 45.00% total power reduction with duplicated (2W) channel width
  Ongoing work
    Power-ground design to support dual-Vdd
    Optimal mix of Vdd-programmable and Vdd-gateable interconnects
    Architecture evaluation considering Vdd programmability [Lin et al, to appear in FPGA’05]