PowerArtist: RTL Design-for-Power Platform
DESCRIPTION
PowerArtist™ includes production-proven RTL power analysis with interactive visual debug, analysis-driven automatic RTL power reduction, and a Tcl interface to the database that enables custom reports and tracking of power through regressions. PowerArtist-generated models bridge the gap between RTL and layout, delivering physical-aware RTL power accuracy and RTL-power-driven early power grid integrity. This presentation provides an overview of PowerArtist and covers RTL design-for-power best practices using real-life examples. Learn more on our website: https://bit.ly/10Rpcxu
TRANSCRIPT
© 2014 ANSYS, Inc.6/23/2014 1
PowerArtist™: RTL Design-for-Power
Design Automation Conference 2014
Early Power Decisions High Impact
[Chart: power reduction potential (100% to 0%) across the flow, from large impact to small impact: RTL Design, Logic Synthesis, Physical Design, Timing Closure]
RTL Design-for-Power
• Power-Performance-Area Trade-offs
• Voltage / Power Domain Planning
• Block-level Clock and Data Gating
• Eliminate Redundant Activity
Low Power Implementation
• Power Switch Sizing / Placement
• Clock Gater Cloning / Decloning
• Multi-Vt Optimization
• Power Integrity Verification
RTL Power ↔ Gate-level Power
Design Specification
RTL Design
Gate-Level Design
Layout
~20 hours
~22 mins
Quicker Design Iterations Effective Design-for-Power
RTL Power: Power-per-Function (Adder, Register, Mux)
Gate-level Power: Power-per-Gate
PowerArtist: RTL Design-for-Power Platform
RTL Power Analysis
• Average, time-based
• Power-critical vector selection
• Regressions via TCL interface
RTL Power Reduction
• Clock, memory, logic
• Analysis-driven automation
• Interactive power debug
RTL Links with Physical
• PACE™: RTL power accuracy
• RPM™: RTL-driven physical power integrity
[Diagram: PACE and RPM link RTL Power and Physical Power]
RTL Power: Ins and Outs
Inputs to RTL Power Analysis:
• RTL (VHDL, Verilog, SystemVerilog)
• Power domains (UPF / CPF): Vdd1, Vdd2
• Capacitance model (WLM / PACE)
• Activity (FSDB / VCD / SAIF)
• Clock tree, gating (SDC, PACE, user input)
• Power models (Liberty .lib)
module PA (
...
  always @ (posedge clk)
  begin
    dout <= din1;
  end
  assign out = sel ? dout : din2;
...
endmodule
[Diagram labels: clk, register, mux, and]
Low Power RTL Design Methodology
Peak Power = 391mW
Check power vs. budget
TRANSMIT MODE RECEIVE MODE
Residual receive activity in transmit mode
Profile power vectors
RTL Power Regression Flow
Monitor power vs. budget
Reduce power automatically
Enabled Clock
Inactive Data
Debug power hotspots
Average power = 239mW
Perform design trade-offs
[Chart: Power (W), axis 0.00E+00 to 6.00E-02, comparing Version 1 and Version 2 for the Typ and Idle vectors]
RTL vs. Gates: Accuracy and Performance
Nvidia Case Study
• RTL Power: ~30X faster
• RTL Power Accuracy: ~15%
RTL Capacity: Large Designs / FSDBs
Samsung Case Study
FSDB captures only power-critical signals identified by PowerArtist
• FSDB size: 1/4
• TAT: 4X faster
• Loss of accuracy: 2%
RTL Power Analysis
PowerArtist RTL Power Analysis
Activity Analysis
• Total Logic / Clock Activity per Hierarchical Instance
• Qualify Coverage per Power Mode
• Identify Power Bugs
Average Power Analysis
• Understand Power: Where? Why?
• Per Hierarchy, Category, Mode, Clock / Voltage Domains
• Qualify Power Efficiency with Multiple Metrics
Time-based Power Analysis
• Power Waveforms per Hierarchical Instance
• Waveforms per Category: Clock, Memory, Logic
• Identify Peak Power and Time
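As a rough illustration of what time-based analysis reports, the Python sketch below derives average power, peak power, and the time of the peak from a sampled power waveform. The waveform values and the helper name `summarize_waveform` are hypothetical; this is not PowerArtist's API, only a sketch of the arithmetic under the assumption of uniformly spaced samples.

```python
from typing import List, Tuple


def summarize_waveform(samples: List[Tuple[float, float]]) -> dict:
    """Return average power, peak power, and the time of the peak.

    `samples` holds (time_ns, power_mw) pairs taken at a uniform
    interval, so the plain mean of the samples is the average power.
    """
    if not samples:
        raise ValueError("waveform has no samples")
    peak_time, peak_power = max(samples, key=lambda s: s[1])
    average = sum(power for _, power in samples) / len(samples)
    return {"average_mw": average, "peak_mw": peak_power, "peak_time_ns": peak_time}


# Hypothetical waveform: (time in ns, power in mW).
waveform = [(0, 200.0), (10, 250.0), (20, 400.0), (30, 150.0), (40, 200.0)]
summary = summarize_waveform(waveform)
print(summary)
```

Running the same summary per hierarchical instance or per category (clock, memory, logic) would mirror the waveform views listed on the slide.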
Clock Gating Efficiency: Temporal and Structural Metrics
Example
• 16 of 20 bits are gated
• 5 of 10 cycles are gated
• 2 of 5 enabled cycles had data toggles
[Waveform: gclk, clk, en, data]
                 SCGE           DCGE                    CGEE
Definition       % Gated Bits   % Gated Clock Cycles    % Ideally Gated Cycles
Type of Metric   Structural     Temporal (en, clk)      Temporal (data, en, clk)
Value            80%            50%                     40%
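The three values in the table follow directly from the counts in the example. The Python sketch below reproduces them; the variable names are hypothetical, and the CGEE formula (enabled cycles with data toggles divided by enabled cycles) is an assumption inferred from the slide's numbers, not a definition taken from the tool.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rejecting an empty total."""
    if whole <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / whole


# Counts taken from the slide's example.
total_bits, gated_bits = 20, 16
total_cycles, gated_cycles = 10, 5
enabled_cycles = total_cycles - gated_cycles  # 5 cycles with the clock enabled
enabled_cycles_with_data_toggle = 2

scge = percent(gated_bits, total_bits)       # structural: % gated bits
dcge = percent(gated_cycles, total_cycles)   # temporal: % gated clock cycles
# Assumption: CGEE is read as useful enabled cycles / enabled cycles,
# which reproduces the 40% shown on the slide.
cgee = percent(enabled_cycles_with_data_toggle, enabled_cycles)

print(f"SCGE={scge:.0f}% DCGE={dcge:.0f}% CGEE={cgee:.0f}%")
```

The gap between DCGE and CGEE is the point of the slide: the clock is enabled for 5 cycles, but only 2 of them carry data toggles.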
Clock Gating Efficiency: Temporal and Structural Metrics
[Screenshot callouts: 100% Static CGE, 0% Dynamic CGE; CGEE, Power Impact; CGE: Static, Dynamic; Flop: Power, Activity]
RTL Power Reduction
PowerArtist RTL Power Reduction
Original RTL Low-Power RTL
1. Interactive Power Debug
• Block-level Power “Bugs”
• Large Power Savings
2. Automated Power Reduction
• Instance-level Power Reduction
• 15 Analysis-driven Techniques
3. Customizable Power Reports
• TCL Queries to OADB
• Automation Beyond PowerArtist Reports
# Report every ungated register with its power, width, and source location.
# $output_file is assumed to be set by the caller.
openPDB powerartist.pdb
set RPT [open $output_file "w"]
set ungated_registers [getRegisters -cg none]
foreach i $ungated_registers {
    set dyn_power [getPropVal $i Dynamic_Power "inst"]
    set bit_width [getInstWidth $i]
    set file [getPropVal $i File_Name "inst"]
    set line_num [getPropVal $i Line_Number "inst"]
    puts $RPT "$i $dyn_power $bit_width ${file}:${line_num}"
}
close $RPT
Debug Power: Visualize-Analyze-Reduce
Inactive Data, Active Clock
Identify Block-level Clock Gating Enable
Block-Level Power Reduction
• Clock Active, Data Inactive: Block-level Clock Gating
• Clock Inactive, Data Active: Block-level Data Gating
Block-level Activity Analysis: Clock and Data Ports
1.1 Clock Pins
-------------------------------------------------------
Redundant  Total    Pin    Mode  Instance
Cycles     Cycles   Name   Name  Name
-------------------------------------------------------
200        201      CLKA   read  top.core1.t1.dpmem.m1
-------------------------------------------------------
1.2 Input and Redundant Pins
-------------------------------------------------------
Redundant  Total    Pin    Mode  Instance
Toggles    Toggles  Name   Name  Name
-------------------------------------------------------
1          1        AB[8]  read  top.core1.t1.dpmem.m1
-------------------------------------------------------
Wasted Activity per Mode
Clock Activity per Hierarchy
Constant high activity: Missed clock gating?
Redundant activity in read mode
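To put the report rows above in proportion, the short Python sketch below turns a row into a percentage of redundant activity. The row format (redundant count, total count, pin, mode, instance) follows the report excerpt; the function name `redundant_percent` is hypothetical and the parsing assumes whitespace-separated fields.

```python
def redundant_percent(row: str) -> tuple:
    """Parse 'redundant total pin mode instance' into (instance, pin, mode, percent)."""
    redundant, total, pin, mode, instance = row.split()
    pct = 100.0 * int(redundant) / int(total)
    return instance, pin, mode, pct


# Rows copied from the report excerpt.
clock_row = "200 201 CLKA read top.core1.t1.dpmem.m1"
input_row = "1 1 AB[8] read top.core1.t1.dpmem.m1"

for row in (clock_row, input_row):
    instance, pin, mode, pct = redundant_percent(row)
    print(f"{instance} {pin} ({mode}): {pct:.1f}% redundant")
```

For the clock pin, 200 of 201 cycles are redundant in read mode, which is the kind of figure that points to a missing block-level clock gate.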
Instance-Level Power Reduction
Clock / Clock Gating
• Clock gating coverage
• Clock gating efficiency
• Sequential and combinational
Control Logic and Datapath
• Redundant activity
• Don’t care conditions
• Datapath operand isolation
Memory Subsystem
• Redundant read/write
• Splitting memories
• Exercising sleep modes
Analysis-Driven RTL Power Reduction
Wasted activity/power when sel is 0
Analysis-Driven RTL Power Reduction
Pre-compute based new clock gate enables
Multi-cycle ODC sequential analysis
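The idea behind an observability don't-care (ODC) based enable can be sketched with a toy cycle-level model of the register-plus-mux fragment used elsewhere in this deck (`dout <= din1; out = sel ? dout : din2`). The Python below is only an illustration under stated assumptions: a one-cycle look-ahead on `sel` stands in for the pre-computed enable, the reset value is assumed to be 0, and the function name `simulate` is hypothetical. It is not PowerArtist's algorithm.

```python
def simulate(din1, din2, sel, gated):
    """Cycle-level model of `dout <= din1; out = sel ? dout : din2`.

    Returns (outputs, register_loads). With `gated` set, the register
    loads only when sel is 1 in the next cycle, i.e. only when the
    loaded value can reach `out`.
    """
    dout = 0  # assumed reset value
    outputs, loads = [], 0
    for t in range(len(sel)):
        # The mux sees the register value captured at the previous edge.
        outputs.append(dout if sel[t] else din2[t])
        # Pre-computed enable: sel one cycle ahead (treated as 1 past the end).
        next_sel = sel[t + 1] if t + 1 < len(sel) else 1
        if not gated or next_sel:
            dout = din1[t]
            loads += 1
    return outputs, loads


sel = [1, 0, 0, 1, 0, 0, 0, 1]
din1 = list(range(10, 18))
din2 = list(range(100, 108))

out_ref, loads_ref = simulate(din1, din2, sel, gated=False)
out_gated, loads_gated = simulate(din1, din2, sel, gated=True)
print(out_ref == out_gated, loads_ref, loads_gated)
```

In this model the gated version produces the same `out` sequence while clocking the register in fewer cycles; loads whose value would never be selected by the mux are the wasted activity.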
Analysis-Driven RTL Power Reduction
[Chart: Predicted Power Savings (normalized, 0.00 to 1.00) vs. # RTL Changes (Design Effort), 1 to 291]
Top 5 RTL changes: 50% identified power savings
Maximize Power Savings, Minimize Design Impact
15 Power Reduction Techniques
• Clock, Memory, Logic
• Sequential, Combinational
• Vector-based, Vectorless
• Hierarchical, SoC capacity
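The "maximize savings, minimize design impact" trade-off amounts to ranking candidate RTL changes by predicted savings and stopping once a target share of the total is reached. The Python sketch below shows that ranking; the savings values and the function name `changes_needed` are hypothetical and chosen only so that a handful of changes covers half of the total.

```python
def changes_needed(savings, target_fraction):
    """Return how many of the largest savings reach target_fraction of the total."""
    total = sum(savings)
    if total <= 0:
        raise ValueError("no savings to rank")
    running = 0.0
    for count, value in enumerate(sorted(savings, reverse=True), start=1):
        running += value
        if running >= target_fraction * total:
            return count
    return len(savings)


# Hypothetical predicted savings (mW) per candidate RTL change.
predicted_mw = [12.0, 9.0, 7.0, 6.0, 5.0] + [1.0] * 39
print(changes_needed(predicted_mw, 0.5))
```

With a long tail of small savings, most of the benefit comes from the first few changes, which is what the normalized curve on the slide conveys.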
Power Reduction Case Studies
MUX Reduction Technique:
• Scan clocks toggling in functional mode
• Redundant data activity in registers wasting power
[Diagram labels: mux inputs 1 / 0 (A, B), scan_enable = 0, scan_clock]
GMC Technique:
• Redundant data toggles in read mode
• Cycle-based analysis reports % Redundant Cycles
[Waveform labels: data_in, M_OUT; Write, Read, Write; Redundant Data Toggles]
Power Database Access with TCL API
Power Database (OpenAccess)
Design Queries
• getMemories/Flops/Combs
• getFanout
• getModulePorts
• reportDesignStats
Report Creation
• reportCGEfficiency
• diffPdbPower
• reportPower
• reportReductions
Power Queries
• getPropVal instance/net
• getClockPower
• getNetPower
• getClockEnableExpr
Design Navigation
• dls
• dpwd, dcd
• dpushd, dpopd
• show
Customize and Automate Power Reduction, Reports, Regressions
• Quick access to power and design properties
• Accomplish custom tasks with few lines of TCL
Custom Power Reports
50% Idle Power Reduction in Mobile SoC
Instance Name                Enable Efficiency  Clock Power  Clock     En Net
or1200_cpu.ckg12             0                  5.17E-03     clk       or1200_cpu.en_blk
or1200_cpu.or1200_ctrl.ckg5  0.1                1.36E-03     gclk_blk  or1200_cpu.or1200_ctrl.n1
[Waveform: en_blk, clk, data, gclk_blk] Inefficient enables waste power
[Diagram labels: en_blk, clk, gclk_blk, Block Clock Gate, en_reg, Register Clock Gate, gclk_reg]
Block-level clock gates control significant power
Single clock gate controls >5mW power, Efficiency = 0
PowerArtist clock gating report identifies inefficient clock gates
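A report like the one above lends itself to simple filtering. The Python sketch below flags clock gates whose enable efficiency is low while the clock power they control is high. The rows are copied from the slide; the thresholds, the function name `inefficient_gates`, and the reading of the clock power column as watts (consistent with the ">5mW" callout) are assumptions.

```python
# Rows copied from the slide's report:
# (instance, enable efficiency, clock power in W, clock, enable net)
rows = [
    ("or1200_cpu.ckg12", 0.0, 5.17e-03, "clk", "or1200_cpu.en_blk"),
    ("or1200_cpu.or1200_ctrl.ckg5", 0.1, 1.36e-03, "gclk_blk",
     "or1200_cpu.or1200_ctrl.n1"),
]


def inefficient_gates(rows, max_efficiency, min_power_w):
    """Return rows with efficiency <= max_efficiency and power >= min_power_w,
    sorted by clock power, largest first."""
    hits = [r for r in rows if r[1] <= max_efficiency and r[2] >= min_power_w]
    return sorted(hits, key=lambda r: r[2], reverse=True)


# Hypothetical thresholds: efficiency at most 0.2, power at least 1 mW.
for inst, eff, power_w, clock, en_net in inefficient_gates(rows, 0.2, 1e-3):
    print(f"{inst}: efficiency={eff}, clock power={power_w * 1e3:.2f} mW, enable={en_net}")
```

Sorting by clock power puts the block-level gate first, since fixing its enable has the largest payoff.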
RTL Power Regressions
• 30+ blocks per typical SoC
• 2+ vectors per block
• Vectors written for power: idle, active
• Daily block-level, weekly chip-level regressions monitor power changes
• Power metrics track power efficiency
• PowerArtist identifies where power changed
[Flow: RTL (Verilog, SV, VHDL) and Testbench → Simulator → FSDB → RTL Power Analysis, Reduction, Regression]
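The monitoring step of such a regression can be pictured as a comparison of per-block, per-vector power between a baseline run and the current run. The Python sketch below flags entries whose relative change exceeds a tolerance. The block names, power values, the 5% tolerance, and the function name `power_regressions` are all hypothetical; in a real flow the numbers would come from the power database.

```python
def power_regressions(baseline, current, tolerance=0.05):
    """Return {(block, vector): relative change} for entries beyond tolerance."""
    flagged = {}
    for key, base in baseline.items():
        if key not in current or base <= 0:
            continue  # skip entries that cannot be compared
        change = (current[key] - base) / base
        if abs(change) > tolerance:
            flagged[key] = change
    return flagged


# Hypothetical power numbers in mW, keyed by (block, vector).
baseline = {("cpu", "idle"): 10.0, ("cpu", "active"): 50.0, ("dsp", "idle"): 4.0}
current = {("cpu", "idle"): 12.0, ("cpu", "active"): 50.5, ("dsp", "idle"): 4.0}

for (block, vector), change in power_regressions(baseline, current).items():
    print(f"{block}/{vector}: {change:+.0%}")
```

Keeping separate idle and active vectors per block, as the slide suggests, makes it easier to see which mode a power change comes from.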
RTL Links with Physical Design
PACE™: Physical-Aware RTL Power Budgeting
module PA (
...
  always @ (posedge clk)
  begin
    dout <= din1;
  end
  assign out = sel ? dout : din2;
...
endmodule
• Clock Distribution
• Parasitics
• Multiple Vt
• Low-power Structures
• Optimization
PACE Models (Cap, Clock)
Post-Layout Gate-level Power
PACE, RTL Power
PACE Bridges the RTL vs. Layout Gap
Predictable RTL Power Accuracy
RTL PACE vs. Gate-Power: Mobile SoC @14nm
Total Power Correlation: Gate-SPEF vs. RTL-PACE vs. RTL-WLM
• RTL-PACE Power within 20%
Clock Power Correlation: Gate-SPEF vs. RTL-PACE
• RTL-PACE Clock Power within 20%
RTL Power-Driven Power Integrity
module PA (
...
  always @ (posedge clk)
  begin
    dout <= din1;
  end
  assign out = sel ? dout : din2;
...
endmodule
• Shrinking geometries, increasing di/dt
• Gate vectors too late
• Layout late for changes
• Error-prone guesstimates
RPM Enables PDN Planning: Early, Optimal, Robust
[Diagram: RTL Power, RTL Power Model (RPM), Physical Power Integrity]
RPM Case Studies
[Plot legends: RPM, Gate, FSDB, Vectorless; CPM(Layout)+Pkg, CPM(RPM)+Pkg, Pkg only]
Peak = 6X Average Power
Di/dt event not at the same time as the peak
Peak and di/dt Cycle Selection on a GPU Core
Frame: DIDT
Start time: 0.0817704
Finish time: 0.0817706
Average leakage for supply VDD: 0.00257393
Average power for supply VDD: 0.185336
Peak power for supply VDD: 0.219776
Frame: CYCLE_POWER
Start time: 0.0806005
Finish time: 0.0806007
Average leakage for supply VDD: 0.002569
Average power for supply VDD: 0.250168
Peak power for supply VDD: 0.266678
Early Voltage Drop Analysis
Early Package Resonance Analysis
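The observation that the di/dt event and the power peak fall in different cycles can be shown with a small Python sketch that picks both cycles from a per-cycle power trace. The trace values and the function name `select_cycles` are hypothetical, and the cycle-to-cycle power rise is used as a simple stand-in for di/dt at a fixed supply voltage; this is not how RPM selects frames.

```python
def select_cycles(power_per_cycle):
    """Return (index of the peak-power cycle, index of the cycle with the largest rise)."""
    if len(power_per_cycle) < 2:
        raise ValueError("need at least two cycles")
    indices = range(len(power_per_cycle))
    peak_cycle = max(indices, key=lambda i: power_per_cycle[i])
    # Power step into each cycle, starting with cycle 1.
    steps = [power_per_cycle[i] - power_per_cycle[i - 1]
             for i in range(1, len(power_per_cycle))]
    didt_cycle = 1 + max(range(len(steps)), key=lambda i: steps[i])
    return peak_cycle, didt_cycle


# Hypothetical per-cycle power in W.
power_w = [0.05, 0.06, 0.19, 0.21, 0.25, 0.27, 0.10]
peak_cycle, didt_cycle = select_cycles(power_w)
print(f"peak power in cycle {peak_cycle}, largest rise into cycle {didt_cycle}")
```

In this trace the sharpest rise happens well before the peak, so a single "worst" cycle would not cover both the voltage drop and the resonance analyses.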
Related Presentations @ DAC2014
• Power Analysis Using PowerArtist for WaveLogic3 ASIC – 100Gbs Coherent Metro Optical Modem
• Achieving RTL Power Efficiency and Automated Power Reduction
• Methods for Achieving RTL to Gate Power Consistency