Post on 26-Mar-2015

Low Power Functional Unit for use in Coarse Grained Reconfigurable Array

Nathaniel McVicar, Corey Olson, Jimmy Xu

Outline

- Functional Unit
  - Shifter
  - ALU
  - MADD
- Design Flow (all modules)
  - VCS
  - Design Compiler
  - PrimeTime
  - Encounter & Cadence
  - v2lvs
- UPF Tutorial
- Results
  - Dynamic power consumption of modules
  - Power down/up timing
  - VDD scaling

FU Top Level

- Main Units
  - ALU
  - MADD
  - Barrel Shifter
- Supporting Modules
  - Output muxes
  - Clock gating registers
  - Crossbar

IBM 65nm PDK

- Process: cmos10lpe
  - Low power process
  - Very low leakage in power analysis
- Standard cells: cp65npksdst_tt1p2v25c

Shifter Specs

- 32-bit shifter with 5 shift bits
- Bi-directional shifting
- Logical and arithmetic shifting
- Purely combinational design
- 1 GHz target frequency
  - Want it as fast as possible
  - Need to be power aware during synthesis

Shifter Design

[Figure: shifter block diagram. Labels: 31'b0, X[31:0], X[30:0], 31{X[31]}, LEFT / LOGICAL controls, shift stages selected by S[4] down to S[0], output Z]
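The diagram labels suggest a wide input word that is shifted by five mux stages, one per shift bit. The exact wiring is not recoverable from the transcript, so the following is only a minimal Python sketch of one common structure (a funnel shifter) that meets the stated specs. The function name `funnel_shift32` and the way left shifts are mapped onto the right-shifting stages are assumptions, not the authors' design.

```python
MASK32 = 0xFFFFFFFF

def funnel_shift32(x, s, left, arithmetic):
    """Sketch of a 32-bit bi-directional shifter built from 5 mux stages.

    Assumption: a 63-bit word is formed first, then shifted right only.
    """
    x &= MASK32
    s &= 31                                  # 5 shift bits
    if left:
        wide = x << 31                       # {X[31:0], 31'b0}
        amt = 31 - s                         # same as (~s) & 31
    else:
        sign = arithmetic and (x >> 31)
        fill = 0x7FFFFFFF if sign else 0     # 31{X[31]} or 31'b0
        wide = (fill << 32) | x              # {fill, X[31:0]}
        amt = s
    # One stage per shift bit, most significant first: shift by 16, 8, 4, 2, 1.
    for i in (4, 3, 2, 1, 0):
        if (amt >> i) & 1:
            wide >>= (1 << i)
    return wide & MASK32                     # Z is the low 32 bits
```

For example, `funnel_shift32(0x80000000, 4, left=False, arithmetic=True)` fills the vacated upper bits with copies of the sign bit, while the same call with `arithmetic=False` fills them with zeros.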

ALU Specs

- 32-bit ALU
  - Supports 15 instructions
  - Combinational design
- 1 GHz target frequency
  - On critical path
  - Want it as fast as possible
  - Need to be power aware during synthesis

ALU Design Methodologies

- Muxed Output
  - Simple functions with muxed output
  - Gate off functions not in use
  - More gates
  - Higher leakage, lower switching
- Hardware Reuse
  - Do everything with the adder
  - Cannot gate the adder
  - Fewer gates
  - Lower leakage, higher switching

ALU Design 1

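[Figure: ALU Design 1 block diagram. Labels: inputs A and B; input controls flipA, flipB, clearA, clearB, setA; adder (+); signals AB, P and G; Control block; output select sel[1:0]; output Z]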

Power Results

                              (Syn. Model)
    Switching:     630 uW     (3.55 mW)
    Interconnect:  1.14 mW    (3.94 mW)
    Leakage:       135 nW     (530 nW)
    Total:         1.77 mW    (7.5 mW)
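The flip/clear/set labels on the Design 1 diagram are consistent with the "do everything with the adder" approach, where input conditioning turns one adder into several instructions. The sketch below illustrates that idea only; the order in which the controls are applied, the `carry_in` input, and the function names are assumptions, and the P/G outputs and the sel[1:0] mux are not modeled.

```python
MASK32 = 0xFFFFFFFF

def condition(x, clear=0, set_=0, flip=0):
    """Condition one adder operand (assumed order: clear, set, flip)."""
    if clear:
        x = 0
    if set_:
        x = MASK32
    if flip:
        x ^= MASK32
    return x & MASK32

def adder_alu(a, b, clear_a=0, set_a=0, flip_a=0,
              clear_b=0, flip_b=0, carry_in=0):
    """Single shared adder; the instruction is chosen by the controls."""
    a = condition(a, clear_a, set_a, flip_a)
    b = condition(b, clear_b, 0, flip_b)
    return (a + b + carry_in) & MASK32

# A few instructions expressed as control settings (hypothetical encodings):
add = adder_alu(7, 5)                               # A + B
sub = adder_alu(7, 5, flip_b=1, carry_in=1)         # A + ~B + 1 = A - B
neg = adder_alu(0, 5, clear_a=1, flip_b=1, carry_in=1)  # 0 - B
dec = adder_alu(0, 10, set_a=1)                     # (-1) + B = B - 1
```

Because every instruction exercises the adder, none of it can be gated off, which matches the "lower leakage, higher switching" trade-off listed for hardware reuse.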

ALU Design 2

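[Figure: ALU Design 2 block diagram. Labels: inputs A and B; adder (+); BO; enable (en) signals and a latch on the functional blocks; Control block; output select sel[1:0]; output Z]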

Power Results

                              (Syn. Model)
    Switching:     655 uW     (3.55 mW)
    Interconnect:  1.21 mW    (3.94 mW)
    Leakage:       160 nW     (530 nW)
    Total:         1.87 mW    (7.5 mW)

MADD Specs

- 32-bit multiply-add unit
  - 2-cycle pipelined module
  - Add input arrives on second cycle
- 1 GHz target frequency
  - Most power hungry module in design
  - Need to be power aware during synthesis
  - Ideally would run as fast as possible
  - May need to trade speed for power (~700 MHz)
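The timing in this spec (multiply in the first cycle, addend supplied in the second) can be sketched with a small cycle-level model. This is an illustration under assumptions: the class name `MaddModel` is hypothetical, operands are treated as unsigned, and the result is truncated to 32 bits because the slides do not state the output width.

```python
MASK32 = 0xFFFFFFFF

class MaddModel:
    """Cycle-level sketch of a 2-stage multiply-add pipeline."""

    def __init__(self):
        self.product_reg = 0   # pipeline register between the two stages

    def clock(self, a, b, c):
        """Advance one clock cycle.

        a, b: operands entering the multiply stage this cycle.
        c:    addend for the product that entered on the previous cycle.
        Returns Z for that previous product.
        """
        z = (self.product_reg + c) & MASK32
        self.product_reg = (a * b) & MASK32
        return z

m = MaddModel()
first = m.clock(3, 4, 0)     # pipeline still empty
second = m.clock(5, 6, 10)   # 3*4 + 10
third = m.clock(0, 0, 1)     # 5*6 + 1
```

A new multiply can start every cycle, and each addend is paired with the product issued one cycle earlier.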

MADD Design

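[Figure: MADD pipeline block diagram. Inputs A and B feed a heterogeneous Booth encoder and partial product (PP) generation, followed by CSA tree stage 1; D/Q pipeline registers clocked by CLK; input C enters at CSA tree stage 2; a final adder produces Z]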
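The blocks named in the diagram (Booth encoding, partial product generation, carry-save reduction, final adder) can be illustrated functionally. The sketch below uses plain radix-4 Booth recoding as a stand-in; the "heterogeneous" encoder on the slide is not reproduced, the pipeline register is omitted, and all function names are hypothetical.

```python
def booth_radix4_digits(b, width=32):
    """Recode a two's-complement multiplier into radix-4 digits in -2..2."""
    b &= (1 << width) - 1
    digits = []
    prev = 0                                   # implicit bit b[-1] = 0
    for i in range(0, width, 2):
        b0 = (b >> i) & 1
        b1 = (b >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)      # digit has weight 4**(i // 2)
        prev = b1
    return digits

def carry_save_add(x, y, z):
    """3:2 compression of whole words: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c

def madd(a, b, c, width=32):
    """(a * b + c) mod 2**(2*width), with b taken as a signed value."""
    mask = (1 << (2 * width)) - 1
    # One partial product per Booth digit, already shifted to its weight.
    rows = [((d * a) << (2 * i)) & mask
            for i, d in enumerate(booth_radix4_digits(b, width))]
    rows.append(c & mask)                      # addend joins the reduction
    while len(rows) > 2:                       # carry-save reduction
        s, cy = carry_save_add(rows[0], rows[1], rows[2])
        rows = rows[3:] + [s & mask, cy & mask]
    return (rows[0] + rows[1]) & mask          # final carry-propagate add
```

Radix-4 recoding halves the number of partial products (16 rows for a 32-bit multiplier), and the carry-save stages defer carry propagation to the single final adder.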

VCS

- Testbenches written to verify functionality using VCS
  - Random input vectors used for data
  - Instructions/shift encodings tested sequentially
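The stimulus strategy described above (sweep the encodings in order, randomize the data) can be sketched as a small generator. This is a Python illustration of the idea rather than the authors' VCS testbench; the name `stimulus` and its parameters are hypothetical.

```python
import random

def stimulus(num_encodings, vectors_per_encoding, seed=0):
    """Yield (encoding, a, b): encodings in order, 32-bit data at random."""
    rng = random.Random(seed)              # fixed seed keeps runs repeatable
    for enc in range(num_encodings):       # instructions / shift encodings
        for _ in range(vectors_per_encoding):
            yield enc, rng.getrandbits(32), rng.getrandbits(32)

# Example: 15 ALU instructions, 4 random operand pairs each.
vectors = list(stimulus(15, 4))
```

Each tuple would be applied to the design under test and compared against a reference model of the same instruction.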

Design Compiler

- Compile to standard cell library
  - cp65npksdst_tt1p2v25c from IBM's cmos10lpe
  - Compile to others for corner analysis (ff, 1p0v, …)
  - Control target frequency and synthesize for power
- Reports created
  - Power: inaccurate, but use as a baseline
  - Area: reports number of gates in design
  - Timing: design can't always meet timing

DC Example

    # standard cells that you synthesize to
    set target_library <libname>.db
    set link_library <libname>.db

    # prepare and synthesize
    analyze -f verilog <my_verilog_file>.v
    elaborate <my_toplevel>
    current_design <my_toplevel>
    link
    uniquify
    compile_ultra -gate_clock
    compile_ultra -incremental

    # check for errors in the synthesized design (timing violations, cell warnings, …)
    check_design
    report_constraint -all_violators

    # write the output file in verilog netlist format
    write -f verilog -output <filename>.vh

    # output the timing, power and cell reports, one file each
    redirect timing.rep { report_timing }
    redirect power.rep { report_power }
    redirect cell.rep { report_cell }

DC Example Output

    Operating Conditions: TT1P2V25C   Library: cp65npksdst_tt1p2v25c
    Wire Load Model Mode: enclosed

    Design        Wire Load Model            Library
    ------------------------------------------------
    Alu           B0.1X0.1                   cp65npksdst_tt1p2v25c

    Global Operating Voltage = 1.2
    Power-specific unit information :
        Voltage Units = 1V
        Capacitance Units = 1.000000pf
        Time Units = 1ns
        Dynamic Power Units = 1mW    (derived from V,C,T units)
        Leakage Power Units = 1nW

      Cell Internal Power  = 433.2152 uW   (51%)
      Net Switching Power  = 409.2202 uW   (49%)
                             ---------
    Total Dynamic Power    = 842.4354 uW  (100%)

    Cell Leakage Power     = 129.3405 nW
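The percentages in this report follow directly from the two dynamic components, and the leakage figure can be put on the same scale once the units are aligned. A small sketch using only the numbers printed above (the function name `dc_breakdown` is hypothetical):

```python
def dc_breakdown(internal_uw, switching_uw, leakage_nw):
    """Recompute totals and shares from a Design Compiler power summary."""
    dynamic_uw = internal_uw + switching_uw
    leakage_uw = leakage_nw / 1000.0            # nW -> uW
    return {
        "dynamic_uw": dynamic_uw,
        "internal_pct": 100.0 * internal_uw / dynamic_uw,
        "switching_pct": 100.0 * switching_uw / dynamic_uw,
        "leakage_pct_of_total": 100.0 * leakage_uw / (dynamic_uw + leakage_uw),
    }

alu = dc_breakdown(433.2152, 409.2202, 129.3405)
```

With these inputs the internal and switching shares round to 51% and 49%, and leakage comes out well under 0.1% of total power, in line with the low-leakage process noted earlier.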

PrimeTime

- Power analysis
  - Reports breakdown of power consumption
    - Internal switching
    - Intermediate nodes switching
    - Leakage
  - More detailed breakdown available
    - Memory, clock network, register, combinational
- Timing check: redundant at this stage
- No functional verification
  - Use simulator for functionality (vcs, ncsim)

PT Example

    # setup
    set link_library <libname>.db
    read_verilog <netlist>.vh
    current_design <my_toplevel>
    link

    # for a design without an existing clock input
    create_clock -name clock -period <period>

    # toggle_count is prob of switching, static is prob of being a 1
    set_switching_activity -toggle_count 0.25 -static_probability 0.5 <INPUT>

    # get the power analysis and write details to Alu.rpt
    check_power
    update_power
    report_power > Alu.rpt

PT Example Output

    Attributes
    ----------
        i  -  Including register clock pin internal power
        u  -  User defined power group

                        Internal   Switching  Leakage    Total
    Power Group         Power      Power      Power      Power     (     %)  Attrs
    --------------------------------------------------------------------------------
    io_pad              0.0000     0.0000     0.0000     0.0000    (  0.00%)
    memory              0.0000     0.0000     0.0000     0.0000    (  0.00%)
    black_box           0.0000     0.0000     0.0000     0.0000    (  0.00%)
    clock_network       0.0000     0.0000     0.0000     0.0000    (  0.00%)  i
    register            0.0000     0.0000     0.0000     0.0000    (  0.00%)
    combinational       9.606e-04  1.053e-03  1.295e-07  2.014e-03 (100.00%)
    sequential          0.0000     0.0000     0.0000     0.0000    (  0.00%)

      Net Switching Power  = 1.053e-03   (52.30%)
      Cell Internal Power  = 9.606e-04   (47.70%)
      Cell Leakage Power   = 1.295e-07   ( 0.01%)
                             ---------
    Total Power            = 2.014e-03  (100.00%)

Encounter

- Features
  - Place and route
  - Control the power and ground to all cells
  - Extract parasitic capacitances
  - Stream out GDS for use with Cadence

ALU Encounter Example

Encounter

- Failures
  - Difficult to use
  - Impossible to save netlist views
    - Still need to use Cadence tools to generate SPICE netlist
  - Unable to extract parasitics
    - Could still do this with Cadence

Cadence

- Features
  - Read in a Verilog netlist
  - Stream in standard cell layouts and schematics
  - Stream in GDS from Encounter
  - Create SPICE netlist

ShiftLR Cadence Example

Cadence

- Failures
  - Unable to properly stream in standard cell schematics
  - Unable to create netlist from schematic
  - Unable to run LVS or extract parasitics
- Solution: v2lvs

v2lvs

- Generates a SPICE netlist from a synthesized Verilog netlist
- Include SPICE definitions of standard cells
- Run HSPICE simulations for power down/up sequence and VDD scaling

v2lvs Example

Verilog:

    SEN_EO2_S_0P5 U2120 ( .A1(pprow4[11]), .A2(pprow5[9]), .X(n566) );
    SEN_EO2_S_0P5 U2121 ( .A1(pprow4[13]), .A2(pprow5[11]), .X(n567) );
    SEN_EO2_S_0P5 U2122 ( .A1(pprow2[13]), .A2(pprow7[3]), .X(n568) );
    SEN_EO2_S_0P5 U2123 ( .A1(pprow2[15]), .A2(pprow7[5]), .X(n569) );

v2lvs:

    v2lvs -i -v ../synthesis/ShiftLR.vh -s0 VSS -s1 VDD -s design_model.inc -o ShiftLR.sp -lsr cp65npksdst.lvs

HSPICE:

    XU2120 n566 pprow4[11] pprow5[9] SEN_EO2_S_0P5
    XU2121 n567 pprow4[13] pprow5[11] SEN_EO2_S_0P5
    XU2122 n568 pprow2[13] pprow7[3] SEN_EO2_S_0P5
    XU2123 n569 pprow2[15] pprow7[5] SEN_EO2_S_0P5
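The mapping in this example, from named Verilog pin connections to a positional SPICE subcircuit call, can be illustrated with a few lines of Python. This is not how v2lvs is implemented; it is a sketch of the transformation only. The `pin_order` table is a hypothetical input inferred from the example output (X first, then A1, A2), whereas the real tool reads pin order from the library files given on its command line.

```python
import re

# "CELL INST ( .PIN(net), ... );" with the port list captured as one group.
INSTANCE = re.compile(r"(\w+)\s+(\w+)\s*\((.*?)\)\s*;", re.S)
# ".PIN(net)" where the net may carry a bus index such as pprow4[11].
PIN = re.compile(r"\.(\w+)\s*\(\s*([^()\s]+)\s*\)")

def to_spice(verilog, pin_order):
    """Convert gate-level Verilog instances to SPICE X-element lines."""
    lines = []
    for cell, inst, ports in INSTANCE.findall(verilog):
        nets = dict(PIN.findall(ports))
        ordered = [nets[pin] for pin in pin_order[cell]]
        lines.append("X%s %s %s" % (inst, " ".join(ordered), cell))
    return lines

netlist = """
SEN_EO2_S_0P5 U2120 ( .A1(pprow4[11]), .A2(pprow5[9]), .X(n566) );
SEN_EO2_S_0P5 U2121 ( .A1(pprow4[13]), .A2(pprow5[11]), .X(n567) );
"""
spice = to_spice(netlist, {"SEN_EO2_S_0P5": ["X", "A1", "A2"]})
```

The instance name gains an `X` prefix, the nets are emitted in subcircuit pin order, and the cell name moves to the end of the line.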

HSPICE

- Created simulation test bench for power measurement using vector input
- Adds the potential for VDD scaling and gating

Final Power Results

Synthesis Matters

- At 1 GHz, MADD power is very dependent on synthesis options

                   Internal    Switching   Leakage    Total
    Naïve          11.2 mW     7.16 mW     1.07 uW    18.3 mW
    Constrained    7.77 mW     4.56 mW     0.59 uW    12.3 mW
    Ultra          4.08 mW     1.88 mW     0.30 uW    5.96 mW

Synthesis Matters contd.

- The lower power synthesis options have trouble reducing clock and register power

                   Clock     Register    Comb
    Naïve          9.95%     13.0%       77.05%
    Constrained    12.7%     14.8%       72.5%
    Ultra          27.4%     12.9%       58.5%
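The shares above are easier to compare as absolute power. The sketch below multiplies each share by the total MADD power reported for the same synthesis run on the previous slide; pairing the two tables this way is an assumption, and the names are hypothetical.

```python
# Totals (mW) from the previous slide and shares (%) from this one.
total_mw = {"Naive": 18.3, "Constrained": 12.3, "Ultra": 5.96}
share_pct = {
    "Naive":       {"clock": 9.95, "register": 13.0, "comb": 77.05},
    "Constrained": {"clock": 12.7, "register": 14.8, "comb": 72.5},
    "Ultra":       {"clock": 27.4, "register": 12.9, "comb": 58.5},
}

def absolute_mw(run):
    """Convert one run's percentage shares into milliwatts."""
    return {group: total_mw[run] * pct / 100.0
            for group, pct in share_pct[run].items()}

for run in total_mw:
    print(run, {g: round(p, 2) for g, p in absolute_mw(run).items()})
```

Under that assumption, clock power stays between roughly 1.6 and 1.8 mW in all three runs while combinational power falls from about 14.1 mW to about 3.5 mW, which is why the clock share grows as the total shrinks.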

Power-up time results

W=0.6um M=1

Power-up time results contd.

W=0.6um M=12

Power-up time results contd.

W=6um M=12

Power-up time results contd.

W=6um M=120

Power-up time results contd.

- Iavg during power-down = 10.66 uA
- Pavg = 12.792 uW
- Power-up delay = 9.4 ps
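The reported average power is consistent with the average current multiplied by the nominal supply. A one-line check, assuming the 1.2 V supply implied by the tt1p2v library name (the function name is hypothetical):

```python
def average_power_uw(i_avg_ua, vdd_v):
    """P = I * V; microamps times volts gives microwatts."""
    return i_avg_ua * vdd_v

p_avg = average_power_uw(10.66, 1.2)   # compare with the reported 12.792 uW
```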

Voltage Scaling - ALU

[Plot: ALU power (mW) versus delay (ps); power axis 0 to 9, delay axis 500 to 10500 ps; 1 GHz point marked]

Voltage Scaling – ShiftLR

[Plot: ShiftLR power (mW) versus delay (ps); power axis 0 to 0.6, delay axis ticks at 100, 1000, 10000 and 100000 ps; 1 GHz and 500 MHz marked; supply voltages 1.2V, 1.0V, 0.8V, 0.6V]
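For orientation, the supply voltages on this plot can be related through the textbook first-order model of CMOS dynamic power, P proportional to C * V^2 * f. The sketch below is that model only, not the HSPICE data behind the plots; it ignores leakage and the increase in delay at lower voltage.

```python
def relative_dynamic_power(vdd, vdd_nominal=1.2, freq_ratio=1.0):
    """Dynamic power relative to nominal under P ~ C * V^2 * f."""
    return (vdd / vdd_nominal) ** 2 * freq_ratio

for vdd in (1.2, 1.0, 0.8, 0.6):
    print(vdd, round(relative_dynamic_power(vdd), 3))
```

At a fixed clock frequency this model gives about 69%, 44% and 25% of nominal dynamic power at 1.0 V, 0.8 V and 0.6 V. Halving the frequency as well, for example `relative_dynamic_power(0.6, freq_ratio=0.5)`, halves the figure again.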

Results

- Significantly reduced power for all modules
- Explored voltage scaling
- Implemented power-up / power-down sleep logic

Intangibles

Gained significant insight into the current state-of-the-art for low power FPGA and CGRA design, through reading

Gained practical knowledge working with the design tool chain of a commercial PDK

Questions?
