PowerArtist: RTL Design for Power Platform

© 2014 ANSYS, Inc. 6/23/2014 1 PowerArtist™: RTL Design-for-Power Design Automation Conference 2014

Upload: ansys-inc

Post on 14-Jun-2015

DESCRIPTION

PowerArtist™ includes production-proven RTL power analysis with interactive visual debug, analysis-driven automatic RTL power reduction, and a Tcl interface to the database that enables custom reports and tracking of power through regressions. PowerArtist-generated models bridge the RTL-to-layout gap, delivering physical-aware RTL power accuracy and RTL-power-driven early power grid integrity. This presentation provides an overview of PowerArtist and covers RTL design-for-power best practices using real-life examples. Learn more on our website: https://bit.ly/10Rpcxu

TRANSCRIPT

Page 1: PowerArtist: RTL Design for Power Platform

PowerArtist™: RTL Design-for-Power

Design Automation Conference 2014

Page 2

Early Power Decisions Have High Impact

[Figure: achievable power reduction (0% to 100%) by design stage, from large impact at RTL Design through Logic Synthesis and Physical Design to small impact at Timing Closure]

RTL Design-for-Power
• Power-Performance-Area Trade-offs
• Voltage / Power Domain Planning
• Block-level Clock and Data Gating
• Eliminate Redundant Activity

Low Power Implementation
• Power Switch Sizing / Placement
• Clock Gater Cloning / Decloning
• Multi-Vt Optimization
• Power Integrity Verification

Page 3

RTL Power ↔ Gate-level Power

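Design flow: Design Specification → RTL Design → Gate-Level Design → Layout

Power analysis turnaround: ~20 hours at gate level vs. ~22 mins at RTL

• Quicker Design Iterations
• Effective Design-for-Power

[Figure: the same RTL design (adder, register, mux) analyzed as RTL power (power-per-function) and as gate-level power (power-per-gate)]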

Page 4

PowerArtist: RTL Design-for-Power Platform

RTL Power Analysis
• Average, time-based
• Power-critical vector selection
• Regressions via Tcl interface

RTL Power Reduction
• Clock, memory, logic
• Analysis-driven automation
• Interactive power debug

RTL Links with Physical
• PACE™: RTL power accuracy
• RPM™: RTL-driven physical power integrity

[Figure: PACE and RPM link RTL power with physical power]

Page 5

RTL Power: Ins and Outs

Inputs to RTL power analysis:
• RTL (VHDL, Verilog, SystemVerilog)
• Power domains (UPF / CPF), e.g. Vdd1, Vdd2
• Activity (FSDB / VCD / SAIF)
• Capacitance model (WLM / PACE)
• Clock tree, gating (SDC, PACE, user input)
• Power models (Liberty .lib)

Example RTL:
module PA (...
  always @ (posedge clk) begin
    dout <= din1;
  end
  assign out = sel ? dout : din2;
  ...
endmodule

[Figure: the RTL is inferred into registers, muxes, and AND gates for power estimation]
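The inputs above (activity, capacitance, supply voltage) combine into switching power. A minimal Python sketch, assuming the textbook model in which each toggle of a net dissipates ½·C·V², so P = ½·C·V²·(toggles per second); the `Net` record and its fields are hypothetical, and PowerArtist's real models (Liberty internal power, leakage, clock tree) are more detailed. Summing per inferred function gives the "power-per-function" view described on page 3.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Net:
    """One net of an RTL-inferred structure (hypothetical record layout)."""
    name: str
    category: str           # inferred function, e.g. "register", "mux", "and"
    capacitance_f: float    # switched capacitance in farads (WLM- or PACE-style model)
    toggle_rate_hz: float   # toggles per second (from FSDB / VCD / SAIF activity)
    vdd_v: float            # supply voltage of the net's power domain


def switching_power_w(net: Net) -> float:
    # Each toggle charges or discharges C through V, dissipating 0.5 * C * V^2.
    return 0.5 * net.capacitance_f * net.vdd_v ** 2 * net.toggle_rate_hz


def power_per_function(nets: list[Net]) -> dict[str, float]:
    """Roll switching power up by inferred function (power-per-function)."""
    totals: dict[str, float] = defaultdict(float)
    for net in nets:
        totals[net.category] += switching_power_w(net)
    return dict(totals)
```

For example, a 1 fF net at 1.0 V toggling 10⁹ times per second dissipates 0.5 µW under this model.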

Page 6

Low Power RTL Design Methodology

Methodology steps:
• Profile power vectors
• Check power vs. budget
• Debug power hotspots
• Reduce power automatically
• Perform design trade-offs
• RTL power regression flow: monitor power vs. budget

Example annotations:
• Peak power = 391mW; average power = 239mW
• Residual receive activity in transmit mode (TRANSMIT MODE vs. RECEIVE MODE)
• Enabled clock, inactive data

[Figure: power (W, 0 to 0.06) of Version 1 and Version 2 in typical (Typ) and idle modes]

Page 7

RTL vs. Gates: Accuracy and Performance (Nvidia Case Study)

• RTL power: ~30X faster
• RTL power accuracy: within ~15% of gate-level

Page 8

RTL Capacity: Large Designs / FSDBs (Samsung Case Study)

FSDB captures only power-critical signals identified by PowerArtist

• FSDB size: 1/4
• TAT (turnaround time): 4X faster
• Loss of accuracy: 2%
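The slide does not say how PowerArtist decides which signals are power-critical. As a rough illustration only, the Python sketch below assumes each signal already has an estimated power contribution (for example from an earlier, coarser run) and greedily keeps the highest-power signals until a chosen fraction of the total is covered. The function name and the 98% default are hypothetical.

```python
def select_power_critical(signal_power: dict[str, float], coverage: float = 0.98) -> list[str]:
    """Greedily pick the highest-power signals until `coverage` of total power is reached."""
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    total = sum(signal_power.values())
    selected: list[str] = []
    covered = 0.0
    # Visit signals from largest to smallest estimated contribution.
    for name, power in sorted(signal_power.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= coverage * total:
            break
        selected.append(name)
        covered += power
    return selected
```

Only the selected signals would then be dumped to the FSDB, trading a small accuracy loss for a smaller file and faster turnaround.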

Page 9

RTL Power Analysis

Page 10

PowerArtist RTL Power Analysis

Activity Analysis
• Total Logic / Clock Activity per Hierarchical Instance
• Qualify Coverage per Power Mode
• Identify Power Bugs

Average Power Analysis
• Understand Power: Where? Why?
• Per Hierarchy, Category, Mode, Clock / Voltage Domains
• Qualify Power Efficiency with Multiple Metrics

Time-based Power Analysis
• Power Waveforms per Hierarchical Instance
• Waveforms per Category: Clock, Memory, Logic
• Identify Peak Power and Time
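Reporting power "per hierarchical instance" means every parent instance is charged with the power of everything below it. A minimal Python sketch of that roll-up, assuming leaf power values keyed by dot-separated instance paths (the path style follows names such as `top.core1.t1.dpmem.m1` used later in this deck; the function itself is hypothetical):

```python
from collections import defaultdict


def rollup_by_hierarchy(leaf_power: dict[str, float]) -> dict[str, float]:
    """Sum leaf-instance power into every ancestor in a dot-separated hierarchy."""
    totals: dict[str, float] = defaultdict(float)
    for path, power in leaf_power.items():
        parts = path.split(".")
        # Credit the leaf and each ancestor: "top", "top.core1", "top.core1.t1", ...
        for depth in range(1, len(parts) + 1):
            totals[".".join(parts[:depth])] += power
    return dict(totals)
```

The same loop can be keyed by (instance, category) to split each level into clock, memory, and logic power.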

Page 11

Clock Gating Efficiency: Temporal and Structural Metrics

Example
• 16 of 20 bits are gated
• 5 of 10 cycles are gated
• 2 of 5 enabled cycles had data toggles

[Figure: waveforms of clk, en, data, and the gated clock gclk]

Metric | Definition | Type of Metric | Value
SCGE | % Gated Bits | Structural | 80%
DCGE | % Gated Clock Cycles | Temporal (en, clk) | 50%
CGEE | % Ideally Gated Cycles | Temporal (data, en, clk) | 40%
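The three metrics can be computed directly from the example's counts and per-cycle waveforms. A minimal Python sketch, assuming CGEE is the fraction of enabled cycles in which the data actually toggled (this reading reproduces the example's 40%, but it is a reconstruction, not PowerArtist's published definition). Returning 1.0 when the gate is never enabled is an arbitrary convention chosen here.

```python
def clock_gating_metrics(gated_bits: int, total_bits: int,
                         enable: list[bool], data_toggled: list[bool]) -> dict[str, float]:
    """Compute SCGE, DCGE, and CGEE for one register bank."""
    if len(enable) != len(data_toggled):
        raise ValueError("enable and data_toggled must cover the same cycles")
    cycles = len(enable)
    enabled = [t for t in range(cycles) if enable[t]]
    useful = [t for t in enabled if data_toggled[t]]
    return {
        # Structural: fraction of register bits behind a clock gate.
        "SCGE": gated_bits / total_bits,
        # Temporal (en, clk): fraction of clock cycles in which the gate was closed.
        "DCGE": (cycles - len(enabled)) / cycles,
        # Temporal (data, en, clk): fraction of enabled cycles that carried new data.
        "CGEE": len(useful) / len(enabled) if enabled else 1.0,
    }
```

With 16 of 20 bits gated, the enable high for 5 of 10 cycles, and data toggling in 2 of those 5, this yields 0.8, 0.5, and 0.4.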

Page 12

Clock Gating Efficiency: Temporal and Structural Metrics

[Figure: example with 100% Static CGE but 0% Dynamic CGE; report columns CGE (Static, Dynamic), CGEE, Power Impact, and Flop Power, Activity]

Page 13

RTL Power Reduction

Page 14

PowerArtist RTL Power Reduction

[Figure: original RTL vs. low-power RTL]

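openPDB powerartist.pdb
# $output_file is assumed to be set before this point
set RPT [open $output_file "w"]
# Registers without clock gating
set ungated_registers [getRegisters -cg none]
foreach i $ungated_registers {
    set dyn_power [getPropVal $i Dynamic_Power "inst"]
    set bit_width [getInstWidth $i]
    set file      [getPropVal $i File_Name "inst"]
    set line_num  [getPropVal $i Line_Number "inst"]
    # One line per register: instance, dynamic power, width, RTL source location
    puts $RPT "$i $dyn_power $bit_width ${file}:${line_num}"
}
close $RPT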

1. Interactive Power Debug
• Block-level Power “Bugs”
• Large Power Savings

2. Automated Power Reduction
• Instance-level Power Reduction
• 15 Analysis-driven Techniques

3. Customizable Power Reports
• Tcl Queries to OADB
• Automation Beyond PowerArtist Reports

Page 15

Debug Power: Visualize-Analyze-Reduce

Inactive Data, Active Clock

Identify Block-level Clock Gating Enable

Page 16

Block-Level Power Reduction

• Clock Active, Data Inactive → Block-level Clock Gating
• Clock Inactive, Data Active → Block-level Data Gating

Block-level Activity Analysis: Clock and Data Ports

1.1 Clock Pins
Redundant Cycles | Total Cycles | Pin Name | Mode Name | Instance Name
200 | 201 | CLKA | read | top.core1.t1.dpmem.m1

1.2 Input and Redundant Pins
Redundant Toggles | Total Toggles | Pin Name | Mode Name | Instance Name
1 | 1 | AB[8] | read | top.core1.t1.dpmem.m1

[Figure: clock activity per hierarchy (constant high activity: missed clock gating?) and wasted activity per mode (redundant activity in read mode)]
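A report like "200 of 201 clock cycles redundant in read mode" can be produced by labeling each cycle with its mode and asking whether the clock edge changed anything. A minimal Python sketch, assuming a cycle is redundant when the block's clock fires but none of its state changes; the per-cycle inputs and the function name are hypothetical.

```python
from collections import Counter


def redundant_clock_cycles(modes: list[str],
                           state_changed: list[bool]) -> dict[str, tuple[int, int]]:
    """Per mode, return (redundant cycles, total cycles) for one clock pin.

    A cycle counts as redundant when the clock fires but the block's state
    does not change, i.e. a block-level clock gate could have blocked it.
    """
    if len(modes) != len(state_changed):
        raise ValueError("need one mode label and one change flag per cycle")
    total = Counter(modes)
    redundant = Counter(m for m, changed in zip(modes, state_changed) if not changed)
    return {m: (redundant[m], total[m]) for m in total}
```

A mode whose redundant count is close to its total, as in the read-mode row above, is a candidate for a block-level clock gating enable.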

Page 17

Instance-Level Power Reduction

Clock / Clock Gating
• Clock gating coverage
• Clock gating efficiency

Control Logic and Datapath
• Sequential and combinational
• Redundant activity
• Don’t care conditions
• Datapath operand isolation

Memory Subsystem
• Redundant read/write
• Splitting memories
• Exercising sleep modes

Page 18

Analysis-Driven RTL Power Reduction

Wasted activity/power when sel is 0

Page 19

Analysis-Driven RTL Power Reduction

Pre-compute based new clock gate enables

Multi-cycle ODC sequential analysis
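The wasted-activity and pre-computed-enable ideas can be illustrated with a cycle-level Python model of the `PA` example (`dout <= din1; assign out = sel ? dout : din2;`). Assuming `dout` loads every cycle in the original design, a value captured at a clock edge is observed only if `sel` is 1 in the following cycle, so next-cycle `sel` is a safe load enable, provided it can be computed one cycle early. This is a hypothetical sketch of the sequential observability-don't-care (ODC) idea, not PowerArtist's algorithm; it counts register value changes rather than bit toggles and assumes a reset value of 0.

```python
def simulate_pa(sel: list[int], din1: list[int], din2: list[int],
                gated: bool) -> tuple[list[int], int]:
    """Cycle-level model of PA; returns (out per cycle, number of dout value changes)."""
    dout = 0  # assumed reset value
    outs: list[int] = []
    changes = 0
    for t in range(len(sel)):
        # During cycle t the mux observes dout only when sel[t] is 1.
        outs.append(dout if sel[t] else din2[t])
        # Clock edge at the end of cycle t. The original design always loads;
        # the gated design loads only if the new value is observed in cycle t + 1.
        # Past the last simulated cycle nothing is observed, so it skips the load.
        load = (not gated) or (t + 1 < len(sel) and sel[t + 1] == 1)
        if load:
            if din1[t] != dout:
                changes += 1
            dout = din1[t]
    return outs, changes
```

Running both versions on the same stimulus and comparing `out` is a quick check that the new enable is functionally safe; the difference in `dout` changes is the activity removed.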

Page 20

Analysis-Driven RTL Power Reduction

Pre-compute based new clock gate enables

Multi-cycle ODC sequential analysis

[Figure: predicted power savings (normalized, 0.00 to 1.00) vs. # RTL changes (design effort, 1 to 291)]

• Top 5 RTL changes: 50% of identified power savings
• Maximize Power Savings
• Minimize Design Impact

15 Power Reduction Techniques
• Clock, Memory, Logic
• Sequential, Combinational
• Vector-based, Vectorless
• Hierarchical, SoC capacity
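The savings-vs-effort curve comes from applying candidate RTL changes best-first and accumulating their predicted savings. A minimal Python sketch, assuming each candidate change has an independent, additive predicted saving (real changes can interact); the function names are hypothetical.

```python
from itertools import accumulate


def savings_curve(savings_per_change: list[float]) -> list[float]:
    """Normalized cumulative savings when changes are applied best-first."""
    ranked = sorted(savings_per_change, reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("no identified savings")
    return [s / total for s in accumulate(ranked)]


def changes_needed(savings_per_change: list[float], target: float) -> int:
    """Smallest number of best-first changes reaching `target` of identified savings."""
    for count, fraction in enumerate(savings_curve(savings_per_change), start=1):
        if fraction >= target:
            return count
    raise ValueError("target exceeds the identified savings")
```

A steep early curve, as on this slide where the top 5 changes give half the identified savings, lets a team stop after a handful of edits.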

Page 21

Power Reduction Case Studies

[Figure: mux (inputs A and B, scan_enable = 0) with scan_clock, and data_in / M_OUT over Write, Read, Write cycles showing redundant data toggles]

MUX Reduction Technique:
• Scan clocks toggling in functional mode
• Redundant data activity in registers wasting power

GMC Technique:
• Redundant data toggles in read mode
• Cycle-based analysis reports % redundant cycles

Page 22

Power Database Access with TCL API

Power Database (OpenAccess)

Design Queries
• getMemories/Flops/Combs
• getFanout
• getModulePorts
• reportDesignStats

Report Creation
• reportCGEfficiency
• diffPdbPower
• reportPower
• reportReductions

Power Queries
• getPropVal instance/net
• getClockPower
• getNetPower
• getClockEnableExpr

Design Navigation
• dls
• dpwd, dcd
• dpushd, dpopd
• show

Customize and Automate Power Reduction, Reports, Regressions
• Quick access to power and design properties
• Accomplish custom tasks with a few lines of Tcl

Page 23

Custom Power Reports: 50% Idle Power Reduction in Mobile SoC

Instance Name | Enable Efficiency | Clock Power (W) | Clock | En Net
or1200_cpu.ckg12 | 0 | 5.17E-03 | clk | or1200_cpu.en_blk
or1200_cpu.or1200_ctrl.ckg5 | 0.1 | 1.36E-03 | gclk_blk | or1200_cpu.or1200_ctrl.n1

[Figure: waveforms of en_blk, clk, data, and gclk_blk: inefficient enables waste power]

[Figure: a block clock gate (en_blk, clk → gclk_blk) drives a register clock gate (en_reg → gclk_reg): block-level clock gates control significant power]

• Single clock gate controls >5mW; Power Efficiency = 0
• PowerArtist clock gating report identifies inefficient clock gates
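The table above is the kind of report that a short script can filter. A minimal Python sketch that flags clock gates controlling significant power through a poor enable, worst first; the two rows are copied from the table, while the `ClockGate` record, the reading of efficiency (0 = enable never useful), and both thresholds are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClockGate:
    instance: str
    enable_efficiency: float  # assumed: 0.0 = enable never useful, 1.0 = fully efficient
    clock_power_w: float
    clock: str
    enable_net: str


# Rows from the report table above.
REPORT_ROWS = [
    ClockGate("or1200_cpu.ckg12", 0.0, 5.17e-3, "clk", "or1200_cpu.en_blk"),
    ClockGate("or1200_cpu.or1200_ctrl.ckg5", 0.1, 1.36e-3, "gclk_blk",
              "or1200_cpu.or1200_ctrl.n1"),
]


def inefficient_gates(gates: list[ClockGate], max_efficiency: float = 0.2,
                      min_power_w: float = 1e-3) -> list[ClockGate]:
    """Gates with a poor enable that control significant clock power, worst first."""
    hits = [g for g in gates
            if g.enable_efficiency <= max_efficiency and g.clock_power_w >= min_power_w]
    return sorted(hits, key=lambda g: g.clock_power_w, reverse=True)
```

Raising `min_power_w` to 5 mW leaves only `or1200_cpu.ckg12`, the single block-level gate highlighted on this page.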

Page 24

RTL Power Regressions

• 30+ blocks per typical SoC
• 2+ vectors per block
• Vectors written for power: idle, active
• Daily block-level, weekly chip-level regressions monitor power changes
• Power metrics track power efficiency
• PowerArtist identifies where power changed

Flow: RTL (Verilog, SV, VHDL) + Testbench → Simulator → FSDB → RTL Power Analysis, Reduction, Regression
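Finding "where power changed" between two regression runs comes down to a per-instance diff. PowerArtist provides its own `diffPdbPower` command for this; the Python sketch below is an independent illustration, assuming per-instance power values from two runs and a hypothetical 5% relative threshold. Instances present in only one run are always reported.

```python
def power_changes(baseline: dict[str, float], current: dict[str, float],
                  rel_threshold: float = 0.05) -> list[tuple[str, float, float]]:
    """Instances whose power moved by more than `rel_threshold` between two runs."""
    changes = []
    for inst in sorted(baseline.keys() | current.keys()):
        old = baseline.get(inst, 0.0)
        new = current.get(inst, 0.0)
        if old == 0.0:
            changed = new != 0.0  # new instance, or one that newly consumes power
        else:
            changed = abs(new - old) / old > rel_threshold
        if changed:
            changes.append((inst, old, new))
    return changes
```

Run daily on block-level results, the returned list points reviewers straight at the instances behind a regression.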

Page 25

RTL Links with Physical Design

Page 26

PACE™: Physical-Aware RTL Power Budgeting

module PA (
...
always @ (posedge clk)
begin
  dout <= din1;
end
assign out = sel ? dout : din2;
...
endmodule

Post-layout effects:
• Clock Distribution
• Parasitics
• Multiple Vt
• Low-power Structures
• Optimization

[Figure: PACE models (Cap, Clock) link post-layout gate-level power to RTL power]

PACE Bridges the RTL vs. Layout Gap: Predictable RTL Power Accuracy

Page 27

RTL PACE vs. Gate-Power: Mobile SoC @14nm

• Total power correlation: RTL-PACE power within 20% of gate-level
• Clock power correlation: RTL-PACE clock power within 20% of gate-level

[Figure: correlation charts comparing Gate-SPEF, RTL-PACE, and RTL-WLM power]
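A "within 20%" correlation claim is a per-block relative-error check against the gate-level (SPEF-annotated) reference. A minimal Python sketch, assuming matching block names in both result sets; the function and the example numbers in the usage note are hypothetical.

```python
def correlation_report(gate_power: dict[str, float], rtl_power: dict[str, float],
                       tolerance: float = 0.20) -> dict[str, tuple[float, bool]]:
    """Per block: (relative error of RTL vs. gate-level, within tolerance?)."""
    report = {}
    for block, ref in gate_power.items():
        if ref <= 0.0:
            raise ValueError(f"reference power for {block} must be positive")
        # Signed error: positive means the RTL estimate is pessimistic.
        err = (rtl_power[block] - ref) / ref
        report[block] = (err, abs(err) <= tolerance)
    return report
```

For instance, an RTL estimate of 115 against a gate-level 100 gives +15% (within tolerance), while 38 against 50 gives -24% (outside).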

Page 28

RTL Power-Driven Power Integrity

module PA (
...
always @ (posedge clk)
begin
  dout <= din1;
end
assign out = sel ? dout : din2;
...
endmodule

• Shrinking geometries, increasing di/dt
• Gate vectors too late
• Layout late for changes
• Error-prone guesstimates

RTL Power → RPM (RTL Power Model) → Physical Power Integrity
RPM Enables PDN Planning: Early, Optimal, Robust

Page 29

RPM Case Studies

[Figure: RPM case studies comparing CPM(Layout)+Pkg, CPM(RPM)+Pkg, and Pkg only, with RPM, Gate, FSDB, and Vectorless results]

Peak and di/dt Cycle Selection on a GPU Core
• Peak = 6X Average Power
• Di/dt event not at the same time as the peak

Frame: DIDT
Start time: 0.0817704
Finish time: 0.0817706
Average leakage for supply VDD: 0.00257393
Average power for supply VDD: 0.185336
Peak power for supply VDD: 0.219776

Frame: CYCLE_POWER
Start time: 0.0806005
Finish time: 0.0806007
Average leakage for supply VDD: 0.002569
Average power for supply VDD: 0.250168
Peak power for supply VDD: 0.266678

Early Voltage Drop Analysis | Early Package Resonance Analysis
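The DIDT and CYCLE_POWER frames above are different windows because the largest current step need not coincide with the highest-power cycle. A minimal Python sketch of that selection from a per-cycle power trace, assuming a fixed supply voltage so that the cycle-to-cycle power step is proportional to di/dt; the function name and trace are hypothetical.

```python
def select_cycles(cycle_power_w: list[float], cycle_time_s: float) -> dict:
    """Pick the peak-power cycle and the largest cycle-to-cycle power step."""
    if len(cycle_power_w) < 2:
        raise ValueError("need at least two cycles")
    peak = max(range(len(cycle_power_w)), key=cycle_power_w.__getitem__)
    # Largest |P[t] - P[t-1]|; at a fixed Vdd this tracks the current slope di/dt.
    didt = max(range(1, len(cycle_power_w)),
               key=lambda t: abs(cycle_power_w[t] - cycle_power_w[t - 1]))
    average = sum(cycle_power_w) / len(cycle_power_w)
    return {
        "peak_cycle": peak,
        "didt_cycle": didt,
        "peak_to_average": cycle_power_w[peak] / average,
        # Power slope in W/s; dividing by Vdd gives di/dt in A/s.
        "max_power_slope_w_per_s": abs(cycle_power_w[didt] - cycle_power_w[didt - 1]) / cycle_time_s,
    }
```

On a trace such as `[0.1, 0.1, 0.5, 0.55, 0.6, 0.1]` the peak is cycle 4 but the sharpest step is the drop into cycle 5, so two separate windows would be handed to voltage drop and package resonance analysis.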

Page 30

Related Presentations @ DAC 2014

• Power Analysis Using PowerArtist for WaveLogic3 ASIC – 100Gbs Coherent Metro Optical Modem
• Achieving RTL Power Efficiency and Automated Power Reduction
• Methods for Achieving RTL to Gate Power Consistency