

Implementing a Voltage Scaling Reference Flow Based on ARM’s IEM
Giorgio Parapini
Cadence ICD Product Engineer

Abstract
This session describes a reference flow, based on ARM's IEM technology, that implements voltage scaling together with a dormant mode that shuts down the standard-cell circuitry of an ARM1176JZF-S processor core. We focus on the low-power features of the Cadence Encounter Platform: multiple supply voltages, DVFS implementation with multi-corner libraries, and variable-VDD, ECSM-based sign-off. The entire flow is driven by a single specification of the design's power intent, which covers power domains, power modes, level shifting, and isolation rules. The session also explains the RTL and library support required for an IEM implementation.

Agenda
• ARM and Cadence Collaboration
• iRM
• ARM1176JZF-S™ IEM™
• ARM® Artisan® Library Support
• ARM1176JZF-S™ IEM™ implementation flow using Cadence Encounter Platform
• Conclusion

ARM and Cadence Collaboration

ARM and Cadence Alliance History
Multi-year partnership to:
• Address design challenges
• Optimize ARM offerings for design and implementation
• Increase productivity and IP use predictability

[Figure: alliance timeline with date markers June 00, June 03, Sept 03, Mar 05/May 06, Sept 06, June 06, May 07, and Nov 07. Milestones and their stated benefits:]
• Verification: AHB, eVC, and AMBA compliance; SystemC
• Encounter and RTL Compiler Reference Methodology: improved QoS
• SI & X Architecture support: 27% wire reduction
• Low Power: 40% PWR & PSO savings
• Functional Verification Kit: reduced risks
• Cortex-A8 Express: synthesizable; QoS and TAT
• Low Power Methodology Kit: faster deployment of low power flows
• PFI Silicon Proof Point: silicon-validated low power flow

iRM

ARM-Cadence Reference Methodology
Scripted RTL-to-GDSII flow delivers:
• Optimised PPA
• Reduced turnaround time
• Risk/cost reduction
Key elements:
• Core hardening
• Generation of abstract models
Available from ARM:
• Jointly developed flows
• Qualified/released with cores
• Design kit gives a complete out-of-box experience
[Figure: performance plotted against time for the ARM-Cadence solution and for proprietary solutions, marking the maximum attainable performance and the faster time to market.]

Cadence iRMs Available For All ARM Processors
Processor          Tier  ARM Library (TSMC)     Encounter
ARM966E-S          3     SageX 130G             4.2
ARM7EJ-S           3     SageX 130G             4.2
ARM7TDMI-S         3     SageX 130G             4.2
ARM946E-S          3     SageX 130G             4.2
ARM1026EJ-S        3     SageX 130G             4.2
ARM968E-S          2     SageX 130G             4.2
ARM1136J[F]-S      2     SageX 130G             4.2
ARM1156J[F]-S      2     SageX 130G             4.2
ARM926EJ-S         1     Advantage 90G          6.1
ARM1176JZ[F]-S     1     Advantage 90G          6.1
CORTEX-M3          1     Advantage 90G          6.1
CORTEX-R4          1     Advantage 90G          6.1
CORTEX-A8          1     Advantage 90G, 65LP    6.2
ARM11 MPCORE       1     Advantage 90G          6.2
CORTEX-R4[F]       1     Advantage 90G          6.2
ARM1176JZF-S IEM   1     Metro 130LP            6.2-CPF

ARM1176JZF-S IEM

What Consumers Care About
• Users want more features in their mobile devices: MP3, camera, video, GPS...
• But they also need long battery life, a convenient form factor, and an affordable price
• Battery technology is not evolving fast enough, so power consumption must be managed
• The microprocessor is at the heart of all these devices

Power Dissipation
Total power dissipation is the sum of dynamic and static power dissipation. The energy consumed over a time t is

$$E = \int_0^t \left( C V_{DD}^2 f_c + V_{DD} I_{leak} \right) dt$$

Dynamic power dissipation (I_switch): $\int_0^t C V_{DD}^2 f_c \, dt$
Static power dissipation (I_leak): $\int_0^t V_{DD} I_{leak} \, dt$

Minimise I_leak by:
• Reducing operating voltage
• Fewer leaking transistors
Minimise I_switch by:
• Reducing operating voltage
• Less switching capacitance
• Less switching activity

Improving Dynamic Energy Efficiency
Dynamic Frequency Scaling (DFS)
• Reduce operating frequency if possible
• Reduces average power (but not task energy)
• Eliminates idle cycles
Dynamic Voltage & Frequency Scaling (DVFS)
• Requires DFS
• Reduces voltage if frequency is reduced
• Reduces task energy
• Based on characterized frequency-voltage pairs (lookup table)
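
The difference between DFS and DVFS can be illustrated with a minimal Python sketch. It assumes that dynamic energy per cycle scales as C·V² and ignores leakage. The frequency-voltage pairs are the VCORE operating points that appear in the power-mode table of this deck; the switched capacitance, the cycle count, and the helper names are hypothetical values chosen only for illustration.

```python
# Characterized frequency-voltage pairs (MHz -> V), as in a DVFS lookup table.
FV_TABLE = {250: 1.08, 166: 0.90, 90: 0.72}

C_EFF = 1.0e-9        # hypothetical effective switched capacitance per cycle (F)
CYCLES = 50_000_000   # hypothetical task length in clock cycles


def lowest_sufficient_frequency(required_mhz):
    """Pick the slowest characterized frequency that still meets the deadline."""
    candidates = [f for f in FV_TABLE if f >= required_mhz]
    if not candidates:
        raise ValueError("no operating point is fast enough")
    return min(candidates)


def task_energy(cycles, vdd, c_eff=C_EFF):
    """Dynamic energy of a task: E = cycles * C * Vdd^2 (leakage ignored)."""
    return cycles * c_eff * vdd ** 2


def average_power(cycles, freq_mhz, vdd, c_eff=C_EFF):
    """Average dynamic power while the task runs: energy / run time."""
    run_time = cycles / (freq_mhz * 1e6)
    return task_energy(cycles, vdd, c_eff) / run_time


# Baseline: full speed at 250 MHz and 1.08 V.
e_full = task_energy(CYCLES, 1.08)
# DFS only: frequency drops to 90 MHz but the supply stays at 1.08 V.
e_dfs = task_energy(CYCLES, 1.08)
# DVFS: the voltage follows the frequency down to its characterized value.
f_sel = lowest_sufficient_frequency(80)
e_dvfs = task_energy(CYCLES, FV_TABLE[f_sel])

p_ratio = average_power(CYCLES, 90, 1.08) / average_power(CYCLES, 250, 1.08)
print(f"DFS average power ratio: {p_ratio:.2f}")
print(f"DFS task energy ratio:   {e_dfs / e_full:.2f}")
print(f"DVFS task energy ratio:  {e_dvfs / e_full:.2f}")
```

Under these assumptions DFS lowers average power in proportion to frequency but leaves task energy unchanged, while DVFS scales task energy by (0.72/1.08)², roughly 0.44.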

How IEM Works
• Batteries have finite amounts of energy stored in them
• Running fast and then idling wastes energy
• Only need to run just fast enough to meet the application deadlines
[Figure: voltage and energy versus time for Task 1, Task 2, Task 3, and an idle period. Running each task in its available time, as slowly as possible, allows the voltage to be reduced; the difference is the energy saved.]

IEM System Implementation

ARM1176JZ-S Power Management
Two complementary techniques:
• ARM1176 Dormant Mode allows:
  - Complete power-off of the core (no leakage in core)
  - Retention of system state in cache/TCM at low voltage
  Minimizes energy loss due to leakage in standby modes.
  Note: requires isolation cells to clamp the RAM inputs.
• IEM-compatible core and design flow:
  - Enables dynamic voltage and frequency scaling
  - Tunes performance dynamically to current demand
  Substantial reduction of energy consumption and extended battery life.

ARM Artisan Library Support

ARM Power-Management Kit
• Power gates (MT-CMOS): power control of voltage islands via switchable voltage rails using header (shown) or footer cells
• Level shifter and isolation cells: up and down shifting with optional enable signal
• Retention flip-flops: maintain FF state after power down for leakage reduction
• Back-bias support: reduce leakage current via well-biasing with special fill_tie cells
• Always-on buffer: buffering of signals in powered-down areas
[Figure: VDD1 and VDD2 voltage islands between the global VDD and global VSS rails. Note: the picture shows a conceptual implementation.]

ARM1176JZF-S™ IEM™ Implementation Flow Using Cadence Encounter Platform

ARM and Cadence iRM
ARM-Cadence Reference Methodology tools:
• RTL Compiler
• SOC Encounter
• Fire & Ice QX
• Celtic NDC
• VoltageStorm
• Conformal Low Power
Library support:
• ARM Artisan Power Management Kit
• ARM Artisan Metro™ Standard Cell
• Compiled views for Fire & Ice, VoltageStorm, and Celtic NDC
• Timing ECSM extensions

ARM1176-IEM iRM Features
• CPF-based flow: a CPF file describes the low power intent and drives the implementation and verification flow
• Automated RTL-to-GDS Multi Supply Voltage implementation flow
• Multi Mode Multi Corner (MMMC) analysis and optimization: ensures the design is optimized across the complete voltage and frequency range
• Tri-lib based flow: provides accurate interpolation for DVFS and IR drop analysis
• ECSM extensions to .lib

Cadence Low-Power Solution
• Define the power architecture early in the design flow
• Capture once using CPF; no re-entry or translation later on
• The entire design flow understands CPF and helps preserve the power intent:
  - Verification: comprehensive low-power simulation and formal verification
  - Design: power-aware synthesis, equivalence checking, and DFT
  - Implementation: automated power-aware RTL-to-GDS layout
  - Management: power plan and metrics

What is the Common Power Format?
Single specification of power intent used throughout design, verification, and implementation. ASCII file that captures:
• Design intent
  - Power domains
    Logical: hierarchical modules as domain members
    Physical: power/ground nets and connectivity
    Analysis view: timing library sets for power domains
  - Power logic: level shifter logic, isolation logic, state-retention logic, switch logic and control signals
  - Power modes: definitions, transition expressions
• Technology information
  - Level shifter cells
  - Isolation cells
  - State-retention cells
  - Switch cells
  - Always-on cells

ARM1176-IEM CPF: MSV setup
# Power domains: VCORE is switchable, VSOC is the default domain
create_power_domain -name VCORE \
    -instances $VCORE_moduleInst_list \
    -boundary_ports "$VCORE_pins" \
    -shutoff_condition {SWITCH_VCORE}
update_power_domain -name VCORE \
    -internal_power_net VDDCORE
create_global_connection -net VDDCORE -domain VCORE -pins VDD
create_global_connection -net VSS -domain VCORE -pins VSS
create_power_domain -name VRAM …
create_power_domain -name VSOC -default …

# Isolation (with combined level shifting) from VCORE to VRAM
create_isolation_rule -name rule_VCORE2VRAM \
    -from VCORE -to VRAM \
    -isolation_output low \
    -isolation_condition {!RAMCLAMP}
update_isolation_rules -names rule_VCORE2VRAM \
    -location to -combine_level_shifting \
    -cells {LVLLHEHX8M}

# Level shifting from VRAM back to VCORE
create_level_shifter_rule -name rule_VRAM2VCORE \
    -from VRAM -to VCORE
update_level_shifter_rules -names rule_VRAM2VCORE \
    -cells {LVLHLX8M} -location to
create_level_shifter_rule -name rule_VSOC2VCORE …
create_isolation_rule -name rule_VCORE2VSOC_low …
create_isolation_rule -name rule_VCORE2VSOC_high …

# Power nets
create_power_nets -nets VDDRAM
create_power_nets -nets VDDCORE \
    -external_shutoff_condition {SWITCH_VCORE}
create_power_nets -nets VDD

Multi Mode Multi Corner (MMMC)
[Figure: multiple PVT corners shown on process, voltage, and temperature axes (bbb, bbw, www, wwb).]
An analysis view combines a corner definition with a power mode:
• Corner definition
  - Library set: .lib (with .ecsm* extensions), .cdb (SI)
  - Operating conditions (PVT)
  - RC corner: SPEF, QX tech, cap table
• Power mode
  - Constraints (.sdc)
Dynamic Voltage Frequency Scaling: multiple modes with multiple constraints (.sdc)
Mode           VSOC   VCORE  VRAM   Frequency
PM_highV       1.08V  1.08V  1.08V  250MHz
PM_medV        1.08V  0.90V  0.90V  166MHz
PM_lowV        1.08V  0.72V  0.72V  90MHz
VCORE_Dormant  1.08V  Off    0.72V  250MHz

ARM1176-IEM CPF: Power modes
create_nominal_condition -name highV -voltage 1.08
update_nominal_condition -name highV -library_set libs-worst-1.08v
…
create_nominal_condition -name OFF -voltage 0

create_power_mode -name PM_highV -default \
    -domain_conditions {[email protected] [email protected] [email protected]}
update_power_mode -name PM_highV -sdc_files ARM1176JZFS.constraints_PM_highV.sdc
create_power_mode -name VCORE_dormant \
    -domain_conditions {[email protected] [email protected] [email protected]}
…
The same power modes are used by both logic synthesis (RTL Compiler) and place and route (SOC Encounter).

ARM1176-IEM CPF: Corners & Analysis views
• Operating corners and analysis views are used only for physical implementation
• Analysis and physical optimization work concurrently on the active analysis views

create_operating_corner -name WCORNER_1.08 \
    -voltage 1.08 -temperature 125 -process 1 -library_set libs-worst-1.08v
create_operating_corner -name WCORNER_0.72 \
    -voltage 0.72 -temperature 125 -process 1 -library_set libs-worst-0.72v
…
create_operating_corner -name BCORNER \
    -voltage 1.32 -temperature "-40" -process 1 -library_set libs-best-1.32

create_analysis_view -name WCVIEW_1.08 \
    -mode PM_highV \
    -domain_corners {[email protected]_1.08 [email protected]_1.08 [email protected]_1.08}
create_analysis_view -name WCVIEW_0.72 \
    -mode PM_lowV \
    -domain_corners {[email protected]_0.72 [email protected]_0.72 [email protected]_1.08}
…
create_analysis_view -name BCVIEW_1.32 \
    -mode PM_highV \
    -domain_corners {[email protected] [email protected] [email protected]}
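
The relationship between power modes, operating corners, and analysis views can be sketched as plain data plus a consistency check. The Python sketch below is illustrative only: the mode voltages and the corner and view names come from the slides, while the per-domain corner assignments are an assumption inferred from the power-mode table, and `check_view` is a hypothetical helper, not a CPF or Encounter command.

```python
# Nominal supply per domain in each power mode (values from the power-mode table).
POWER_MODES = {
    "PM_highV": {"VCORE": 1.08, "VRAM": 1.08, "VSOC": 1.08},
    "PM_lowV": {"VCORE": 0.72, "VRAM": 0.72, "VSOC": 1.08},
}

# Worst-case operating corners and their voltages (from create_operating_corner).
CORNER_VOLTAGE = {"WCORNER_1.08": 1.08, "WCORNER_0.72": 0.72}

# Analysis view -> (power mode, corner per domain).
# ASSUMPTION: the per-domain corner assignment is inferred, not copied from the slides.
ANALYSIS_VIEWS = {
    "WCVIEW_1.08": ("PM_highV", {"VCORE": "WCORNER_1.08",
                                 "VRAM": "WCORNER_1.08",
                                 "VSOC": "WCORNER_1.08"}),
    "WCVIEW_0.72": ("PM_lowV", {"VCORE": "WCORNER_0.72",
                                "VRAM": "WCORNER_0.72",
                                "VSOC": "WCORNER_1.08"}),
}


def check_view(name, views=ANALYSIS_VIEWS):
    """List domains whose corner voltage disagrees with the view's power mode."""
    mode_name, domain_corners = views[name]
    problems = []
    for domain, nominal in POWER_MODES[mode_name].items():
        corner = domain_corners.get(domain)
        if corner is None:
            problems.append(f"{domain}: no corner assigned")
        elif abs(CORNER_VOLTAGE[corner] - nominal) > 1e-9:
            problems.append(f"{domain}: {corner} does not match {nominal:.2f}V")
    return problems


for view in ANALYSIS_VIEWS:
    print(view, check_view(view) or "consistent")
```

A check of this kind is only a convenience for reasoning about the setup; in the flow itself the tools consume the CPF directly.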

Automated CPF-driven MSV flow
Use the CPF to drive synthesis and physical implementation:
• MSV/power domain partition (power domains with assigned instances, top-level IO pins, and power/ground connections)
• Set up the MMMC environment (power modes, delay corners, and analysis views)
• Isolation rules and level shifter rules automate the use of shifter and isolation cells: definition, identification from RTL, insertion, placement, and optimization
• Synthesize, place, optimize, and route based on power domains
• Domain-aware clock tree synthesis
[Figure: flow from constraints, CPF, and netlist to GDSII, built on the MSV/MMMC infrastructure. Steps shown: top-down MSV/multi-mode single-pass synthesis; isolation/level shifter cell check and insertion; floorplanning / silicon virtual prototype; power grid synthesis; placement including SRPG, level shifter, and isolation cells; power routing; low power clock tree synthesis; domain-aware post-CTS optimization; domain-aware NanoRoute; IR-drop-aware timing/SI optimization; sign-off. Shown alongside the steps: low power functional verification, SI and static timing analysis, static/dynamic power analysis, and timing/power optimization.]

Logic synthesis
• Top-down single-pass synthesis with power domain definition
• Identification of level shifters and isolation cells already instantiated in RTL
• Multi-mode synthesis to consider frequency and voltage scaling
• Implementation of separate scan chains for the VSOC and VCORE domains
• Leakage power optimization using high-Vt cells
[Figure: synthesis flow. Inputs: RTL, chip SDC, chip CPF (power domains, shifters, isolation, power modes, leakage, dynamic, clock gating), and libraries Lib1 (0.72V), Lib2 (0.90V), and Lib3 (1.08V). Steps: read/constrain top, apply power constraints, synthesize "Top", analyze timing/power, adjust / incremental update. Block diagram: top-level VSOC containing the VCORE domain (PSO, isolation) and the VRAM domain (isolation).]

Floorplan: Power Domains
• Each power domain has its own dedicated row structure, created automatically according to the associated cell libraries
• Hard block placement is scripted and easily customizable through relative floorplan commands
• The power structure is built automatically via a Tcl script

create_power_domain -name VCORE \
    -instances $VCORE_moduleInst_list -boundary_ports "$VCORE_pins" \
    -shutoff_condition {SWITCH_VCORE}
create_power_domain -name VRAM \
    -instances "$env(VRAM_moduleInst_list)" -boundary_ports "$VRAM_pins"
create_power_domain -name VSOC -default \
    -boundary_ports "$VSOC_pins"

Floorplan: Power Structure
• VRAM: rings and stripes for VDDRAM, VSS
• VCORE: rings and stripes for VDDCORE, VSS
• VSOC: rings and stripes for VDD, VSS

create_ground_nets -nets VSS
create_power_nets -nets VDDRAM
create_power_nets -nets VDDCORE -external_shutoff_condition {SWITCH_VCORE}
create_power_nets -nets VDD

Placement
Standard cells, level shifters (single and multi-height), and isolation cells are placed automatically in a single pass.
The verifyPowerDomain command checks:
• Nets crossing the domain boundaries (both level shifter cells and isolation cells)
• Library binding
• Placement of instances in a power domain
• IO pins assigned to the correct power domain

Level Shifters / Clamps: VCORE-VRAM
create_isolation_rule -name rule_VCORE2VRAM \
    -from VCORE -to VRAM -isolation_output low \
    -isolation_condition {!RAMCLAMP}
update_isolation_rules -names rule_VCORE2VRAM \
    -location to -cells {LVLLHEHX8M} \
    -combine_level_shifting
create_level_shifter_rule -name rule_VRAM2VCORE \
    -from VRAM -to VCORE
update_level_shifter_rules -names {rule_VRAM2VCORE} \
    -cells {LVLHLX8M} -location to
[Figure: layout of the VCORE/VRAM boundary showing single-row-height LVLHL shifters and triple-row-height LVLLHEH shifters on the VDDCORE, VDDRAM, and VSS rails. Vertical connections of the VDDI pins to the VDDCORE power net are routed automatically.]

Clock Tree Synthesis & Routing
• CTS in SOC Encounter is power domain aware
• For all clocks crossing power domains, CTS assumes that level shifters are present (according to the CPF rules)
• CTS creates a single-pass, top-down clock tree
• CTS does the following:
  - Establishes a single entry/exit point for each domain
  - Selects buffers from the appropriate libraries and places them within domain boundaries
  - Balances skew through domains and for all active corners/views
• Both trialRoute and nanoRoute honour power domains in SOC Encounter

Cross-Domain Timing Optimization
• Optimization transparently considers level shifter / clamp placement and signal direction:
  - Parts of a net should be "don't touch"
  - Buffers must be inserted from the correct library and into the correct module
  - Buffer location is timing driven
• Optimizes timing and design rules concurrently for all active corners/views
[Figure: nets crossing between Power Domain A (0.8V, Libraries A) and Power Domain B (1.0V, Libraries B), with 0.8V I/O and the don't-touch net segments marked.]

Variable VDD Flow
• Assign a non-pre-characterized VDD value as the operating voltage and run through timing optimization and closure
• Uses ECSM and tri-lib technology for better accuracy
• Requires at least two libraries characterized at different voltages for good accuracy across the full range
• Supported by analysis and optimization
[Figure: delay calculator fed by libraries characterized at 0.72V, 0.90V, and 1.08V for the same process and temperature (SS, 125°C).]
Dynamic Voltage and Frequency Scaling
Mode           VSOC   VCORE  VRAM   Target frequency
PM_highV       1.08V  1.08V  1.08V  250MHz
PM_medV        1.08V  0.90V  0.90V  166MHz
PM_triV        1.08V  1.00V  1.00V  200MHz
VCORE_Dormant  1.08V  Off    0.72V  250MHz
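
As a rough illustration of why several characterized libraries help at a non-characterized VDD, the Python sketch below interpolates a cell delay between characterization voltages. It is not the Encounter delay calculator: the interpolation scheme (a Lagrange polynomial through the available points) and the delay values are assumptions chosen for illustration. Only the characterization voltages 0.72V, 0.90V, and 1.08V and the 1.00V target come from the slides.

```python
def interpolate_delay(points, vdd):
    """Lagrange interpolation through the characterized (voltage, delay) points.

    With two points this is linear interpolation; with three it is quadratic.
    """
    total = 0.0
    for i, (v_i, d_i) in enumerate(points):
        weight = 1.0
        for j, (v_j, _) in enumerate(points):
            if j != i:
                weight *= (vdd - v_j) / (v_i - v_j)
        total += d_i * weight
    return total


# Hypothetical worst-case delays (ps) of one cell at the characterized voltages.
CHARACTERIZED = [(0.72, 180.0), (0.90, 120.0), (1.08, 95.0)]
TARGET_VDD = 1.00  # non-characterized operating voltage (PM_triV)

two_lib = interpolate_delay(CHARACTERIZED[1:], TARGET_VDD)  # 0.90V and 1.08V only
tri_lib = interpolate_delay(CHARACTERIZED, TARGET_VDD)      # all three libraries

print(f"two-library estimate at {TARGET_VDD:.2f}V: {two_lib:.1f} ps")
print(f"tri-lib estimate at {TARGET_VDD:.2f}V:     {tri_lib:.1f} ps")
```

With the made-up delays above, the two estimates differ by a few picoseconds because the third point lets the fit follow the curvature of delay versus voltage.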

Library support for MMMC
• Library / process: ARM Metro libraries for the TSMC CL013G process
• Available voltage corners:
  - WC: 0.72V, 0.9V, 1.08V
  - BC: 0.88V, 1.1V, 1.32V
• Required views:
  - Timing models (.lib) characterized at the available voltage corners, optionally with ECSM extensions for better accuracy
  - Noise models (.cdb) characterized at the available voltage corners

ECSM advanced timing models
• Delay sensitivity to Vdd increases nonlinearly at smaller geometries
• .lib and K-factors are not accurate at low Vdd or between characterization points
• More accurate cell models are needed for IR-drop and MSV
• ECSM accurately models delay variations with Vdd:
  - Captures the waveform-dependent, non-linear behavior of the receiver-pin capacitance
  - Models the driver as a current source: an I/V curve for each slope and load combination
• The ARM-Cadence RM deploys ECSM models for sign-off STA within Encounter, leveraging the detailed instance voltages output by power analysis
[Figure: delay (ps, 0 to 300) versus voltage (Vdd, 0.6V to 1.4V) for a 90nm buffer with 1V nominal supply. Curves: SPICE, ECSM-based, and K-factor (linear approximation). The plot highlights the non-linear increase in delay due to Vdd scaling; an annotation marks a 30% difference.]

Formal Verification
• Functional equivalence check
• Structural checks (Conformal Low Power solution):
  - Missing level shifters and power connectivity
  - Missing isolation and power connectivity
  - Bad isolation cell and level shifter
  - Correct cell placement in physical domain
  - Isolation cell enable
[Figure: isolation/level shifter cell between VCORE and VRAM with input A (Vi), output Y (Vo), and Enable.]
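
The idea behind the "missing level shifter" and "missing isolation" checks can be sketched in a few lines of Python. This is a toy stand-in, not how Conformal Low Power works: the required-cell table is an assumption derived from the rule names on the CPF slides, and the net names and the `check_crossings` helper are hypothetical.

```python
# Protection each domain crossing needs, keyed by (driver domain, receiver domain).
# ASSUMPTION: derived from the rule names on the CPF slides, for illustration only.
REQUIRED = {
    ("VCORE", "VRAM"): {"isolation", "level_shifter"},  # rule_VCORE2VRAM
    ("VRAM", "VCORE"): {"level_shifter"},               # rule_VRAM2VCORE
    ("VSOC", "VCORE"): {"level_shifter"},               # rule_VSOC2VCORE
    ("VCORE", "VSOC"): {"isolation"},                   # rule_VCORE2VSOC_low/high
}


def check_crossings(crossings):
    """Return (net, problem) tuples for crossings that lack a required cell."""
    violations = []
    for net, driver, receiver, cells in crossings:
        missing = REQUIRED.get((driver, receiver), set()) - set(cells)
        for kind in sorted(missing):
            violations.append((net, f"missing {kind}"))
    return violations


# Hypothetical crossings: (net name, driver domain, receiver domain, cells on the net).
crossings = [
    ("ram_addr[0]", "VCORE", "VRAM", ["isolation", "level_shifter"]),
    ("ram_we", "VCORE", "VRAM", ["level_shifter"]),
    ("ram_dout[0]", "VRAM", "VCORE", []),
]

for net, problem in check_crossings(crossings):
    print(f"{net}: {problem}")
```

A real structural check also verifies power connectivity, cell type, placement domain, and the isolation enable, which this sketch does not attempt.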

Power and Rail analysis
[Figure: power and rail analysis flow, illustrated with VDDCORE IR drop analysis. Inputs: placed and routed design database, power libraries, and CPF mode specification. Steps: derive power-pin locations, report power (Common Power Engine), and static and dynamic IR-drop analysis. Outputs: plots, waveforms and IR drop files, timing and critical path analysis, power-switch ECO, and decap ECO.]

Conclusion
• Comprehensive low power support across RTL2GDS
• Leverages state-of-the-art features, including DVFS and the variable VDD flow
• Optimized and tested for use with the latest Cadence tool releases
• Complete offering:
  - Processor IP
  - Reference libraries
  - Portable Reference Methodology scripts for Cadence tools
• ARM and Cadence collaboration:
  - Streamlines rapid deployment of ARM processor products
  - Accelerates time to market for the ARM Partner
  - Available to ARM partners and Cadence customers

Thank You…