implementing a voltage scaling reference flow based …rtcgroup.com/arm/2007/presentations/181 -...

Implementing a Voltage Scaling Reference Flow Based on ARM’s IEM Giorgio Parapini Cadence ICD Product Engineer

Author: dinhliem

Post on 02-Feb-2018


  • Implementing a Voltage Scaling Reference Flow Based on ARM's IEM

    Giorgio Parapini, Cadence ICD Product Engineer

  • Abstract

    This session describes a reference flow, based on ARM's IEM (Intelligent Energy Manager) technology, that implements voltage scaling together with a dormant mode that shuts down the standard-cell circuitry of an ARM1176JZF-S processor core. We focus on the low-power features of the Cadence Encounter Platform, including multiple supply voltages, DVFS implementation with multi-corner libraries, and variable-VDD and ECSM-based sign-off. The entire flow is based on a single specification of the design's power intent, including power domains, power modes, level shifting and isolation rules. The session also helps attendees understand the RTL and library support required for IEM implementation.

  • Agenda

    ARM and Cadence Collaboration

    iRM

    ARM1176JZF-S IEM

    ARM Artisan Library Support

    ARM1176JZF-S IEM implementation flow using Cadence Encounter Platform

    Conclusion

  • ARM and Cadence Collaboration

  • ARM and Cadence Alliance History

    Multi-year partnership to:

      Address design challenges
      Optimize ARM offerings for design and implementation
      Increase productivity and IP use predictability

    [Figure: alliance timeline. Dates on the time axis, as extracted: June 00, June 03, Sept 03, Sept 06, Mar 05/May 06, June 06, May 07, Nov 07.]

    Milestones shown, each with its stated benefit:

      Verification: AHB, eVC, and AMBA compliance; SystemC
      Encounter and RTL Compiler Reference Methodology: improved QoS
      SI & X Architecture support: 27% wire reduction
      Low Power: 40% PWR & PSO savings
      Functional Verification Kit: reduced risks
      Cortex-A8 Express: synthesizable QoS and TAT
      Low Power Methodology Kit: faster deployment of low-power flows
      PFI Silicon Proof Point: silicon-validated low-power flow

  • iRM

  • ARM-Cadence Reference Methodology

    Scripted RTL-to-GDSII flow delivers:

      Optimised PPA
      Reduced turnaround time
      Risk/cost reduction

    Key elements:

      Core hardening
      Generation of abstract models

    Available from ARM:

      Jointly developed flows
      Qualified/released with cores
      Design kit gives complete out-of-box experience

    [Figure: performance versus time, with curves labelled "ARM-Cadence Solution" and "Proprietary Solutions", a "Max Attainable Performance" level, and a "Faster Time To Market" annotation.]

  • Cadence iRMs Available For All ARM Processors

    Processor            Tier   ARM Library (TSMC)     Encounter
    ARM966E-S            3      SageX 130G             4.2
    ARM7EJ-S             3      SageX 130G             4.2
    ARM7TDMI-S           3      SageX 130G             4.2
    ARM946E-S            3      SageX 130G             4.2
    ARM1026EJ-S          3      SageX 130G             4.2
    ARM968E-S            2      SageX 130G             4.2
    ARM1136J[F]-S        2      SageX 130G             4.2
    ARM1156J[F]-S        2      SageX 130G             4.2
    ARM926EJ-S           1      Advantage 90G          6.1
    ARM1176JZ[F]-S       1      Advantage 90G          6.1
    CORTEX-M3            1      Advantage 90G          6.1
    CORTEX-R4            1      Advantage 90G          6.1
    CORTEX-A8            1      Advantage 90G, 65LP    6.2
    ARM11 MPCORE         1      Advantage 90G          6.2
    CORTEX-R4[F]         1      Advantage 90G          6.2
    ARM1176JZF-S IEM     1      Metro 130LP            6.2-CPF

  • ARM1176JZF-S IEM

  • What Consumers Care About

    Users want more features in their mobile devices
      MP3, camera, video, GPS...

    But they also need long battery life
      Convenient form factor, affordable price

    Battery technology is not evolving fast enough!
      Need to manage power consumption

    Heart of all these devices: the microprocessor

  • Power Dissipation

    Minimise Ileak by:
      Reducing operating voltage
      Fewer leaking transistors

    Minimise Iswitch by:
      Reducing operating voltage
      Less switching capacitance
      Less switching activity

    Total power dissipation, integrated over the interval 0 to t:

      $E = \int_0^t \left( C V_{DD}^2 f_c + V_{DD} I_{leak} \right) dt$

    Dynamic power dissipation (Iswitch):

      $\int_0^t C V_{DD}^2 f_c \, dt$

    Static power dissipation (Ileak):

      $\int_0^t V_{DD} I_{leak} \, dt$
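    As a minimal numerical sketch of the two energy terms on this slide, the snippet below assumes that C, VDD, f and Ileak stay constant over the interval, so each integral reduces to a product. The function name and all numeric values are hypothetical and are not taken from the presentation.

```python
def energy_breakdown(c_eff, v_dd, f_clk, i_leak, duration):
    """Return (dynamic, static, total) energy in joules.

    Assumes C, VDD, f and Ileak are constant over the interval, so each
    integral reduces to a product with the duration.
    """
    dynamic = c_eff * v_dd ** 2 * f_clk * duration  # integral of C*VDD^2*f dt
    static = v_dd * i_leak * duration               # integral of VDD*Ileak dt
    return dynamic, static, dynamic + static


# Hypothetical operating conditions: 1 nF switched capacitance per cycle,
# 100 MHz clock, 5 mA leakage current, 1 ms interval.
at_1v08 = energy_breakdown(1e-9, 1.08, 100e6, 5e-3, 1e-3)
at_0v72 = energy_breakdown(1e-9, 0.72, 100e6, 5e-3, 1e-3)

print("dynamic energy ratio:", at_0v72[0] / at_1v08[0])
print("static energy ratio:", at_0v72[1] / at_1v08[1])
```

    With the frequency held fixed, the dynamic term scales with the square of the voltage ratio and the static term scales linearly. Real leakage current also falls with voltage, which this constant-Ileak sketch ignores.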

  • Improving Dynamic Energy Efficiency

    Dynamic Frequency Scaling (DFS)
      Reduce operating frequency if possible
      Reduces average power (but not task energy)
      Eliminates idle cycles

    Dynamic Voltage & Frequency Scaling (DVFS)
      Requires DFS
      Reduces voltage if frequency is reduced
      Reduces task energy
      Based on characterized frequency/voltage pairs (lookup table)
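    The difference between DFS and DVFS can be sketched with a small lookup table. The frequency/voltage pairs below follow the deck's power modes; the switched capacitance, the task length and the function names are hypothetical, and only dynamic energy is modelled.

```python
# Characterized frequency/voltage pairs (Hz -> V), following the deck's power modes.
OPERATING_POINTS = {250e6: 1.08, 166e6: 0.90, 90e6: 0.72}

C_EFF = 1e-9        # hypothetical switched capacitance per cycle, in farads
TASK_CYCLES = 1e6   # hypothetical task length, in clock cycles


def dynamic_power(f_clk, v_dd):
    """Average dynamic power in watts while the task is running."""
    return C_EFF * v_dd ** 2 * f_clk


def dynamic_task_energy(v_dd):
    """Dynamic energy for the whole task, in joules.

    Each cycle costs C*VDD^2, so the clock frequency cancels out.
    """
    return TASK_CYCLES * C_EFF * v_dd ** 2


# (power, task energy) at full speed, with DFS only, and with DVFS.
full_speed = (dynamic_power(250e6, 1.08), dynamic_task_energy(1.08))
dfs_only = (dynamic_power(90e6, 1.08), dynamic_task_energy(1.08))
dvfs = (dynamic_power(90e6, OPERATING_POINTS[90e6]),
        dynamic_task_energy(OPERATING_POINTS[90e6]))

print("DFS  power ratio:", dfs_only[0] / full_speed[0],
      "energy ratio:", dfs_only[1] / full_speed[1])
print("DVFS power ratio:", dvfs[0] / full_speed[0],
      "energy ratio:", dvfs[1] / full_speed[1])
```

    Lowering only the frequency cuts average power but leaves the task energy unchanged. Lowering the voltage as well reduces the energy per cycle, which is why DVFS reduces task energy.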

  • How IEM Works?

    Batteries have finite amounts of energy stored in them
    Running fast and then idling wastes energy
    Only need to run just fast enough to meet the application deadlines

    [Figure: voltage versus time for Task 1, Task 2, Task 3 and an idle period. Running each task in the available time, as slowly as possible, allows the voltage to be reduced; the difference in area is labelled "Energy Saved".]
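    The idea of running just fast enough can be sketched as a search for the slowest characterized operating point that still meets a deadline. The operating points follow the deck's power modes; the function name, cycle count and deadlines are hypothetical, and this is not a description of ARM's actual IEM policy.

```python
# (frequency in Hz, voltage in V), sorted from slowest to fastest.
OPERATING_POINTS = [(90e6, 0.72), (166e6, 0.90), (250e6, 1.08)]


def slowest_point_meeting_deadline(cycles, deadline_s):
    """Return the lowest (frequency, voltage) pair that finishes in time.

    Returns None when even the fastest point misses the deadline.
    """
    for f_clk, v_dd in OPERATING_POINTS:
        if cycles / f_clk <= deadline_s:
            return f_clk, v_dd
    return None


# Hypothetical task of one million cycles with different deadlines.
relaxed = slowest_point_meeting_deadline(1e6, 20e-3)
moderate = slowest_point_meeting_deadline(1e6, 10e-3)
impossible = slowest_point_meeting_deadline(1e6, 1e-3)
print(relaxed, moderate, impossible)
```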

  • IEM System Implementation

  • ARM1176JZ-S Power Management

    Two complementary techniques:

    ARM1176 Dormant Mode allows:
      Complete power-off of the core (no leakage in core)
      Retention of system state in Cache/TCM at low voltage
      Minimized energy loss due to leakage in standby modes
      Note: requires isolation cells to clamp the RAM inputs

    IEM-compatible core and design flow:
      Enables dynamic voltage and frequency scaling
      Tunes performance dynamically to current demand
      Substantial reduction of energy consumption, extended battery life

  • ARM Artisan Library Support

  • ARM Power-Management Kit

    Power gates (MT-CMOS): power control of voltage islands via switchable voltage rails using header (shown) or footer cells

    Level shifter and isolation cells: up and down shifting with optional enable signal

    Retention flip-flops: maintain FF state after power down for leakage reduction

    Back-bias support: reduce leakage current via well-biasing with special fill_tie cells

    Always-on buffer: buffering of signals in powered-down areas

    [Figure: conceptual implementation with VDD1 and VDD2 voltage islands between global VDD and global VSS rails.]

    Note: Picture shows a conceptual implementation

  • Agenda

    ARM and Cadence Collaboration

    iRM

    ARM1176JZF-S IEM

    ARM Artisan Library Support

    ARM1176JZF-S IEM implementation flow using Cadence Encounter Platform

    Conclusion

  • ARM and Cadence iRM

    ARM-Cadence Reference Methodology

    Cadence tools:
      RTL Compiler
      SOC Encounter
      Fire & Ice QX
      Celtic NDC
      VoltageStorm
      Conformal Low Power

    ARM library deliverables:
      ARM Artisan Power Management Kit
      ARM Artisan Metro Standard Cell
      Compiled views for Fire & Ice, VoltageStorm and Celtic NDC
      Timing ECSM extensions

  • ARM1176-IEM iRM Features

    CPF-based flow
      CPF file is used to describe the low-power intent and to drive the implementation and verification flow

    Automated RTL-to-GDS Multi Supply Voltage implementation flow

    Multi Mode Multi Corner (MMMC) analysis and optimization
      Ensures the design is optimized across the complete voltage and frequency range

    Tri-lib based flow
      Provides accurate interpolation for DVFS and IR drop analysis
      ECSM extensions to .lib

  • Cadence Low-Power Solution

    Define the power architecture early on in the design flow
      Capture once using CPF; no re-entry or translation later on

    The entire design flow understands CPF and helps preserve the power intent
      Verification: comprehensive low-power simulation and formal verification
      Design: power-aware synthesis, equivalence checking, and DFT
      Implementation: automated power-aware RTL-to-GDS layout
      Management: power plan and metrics

  • What is the Common Power Format?

    Single specification of power intent used throughout design, verification, and implementation

    ASCII file that captures:

    Design intent
      Power domains
        Logical: hierarchical modules as domain members
        Physical: power/ground nets and connectivity
        Analysis view: timing library sets for power domains
      Power logic
        Level shifter logic
        Isolation logic
        State-retention logic
        Switch logic and control signals
      Power modes
        Definitions
        Transition expressions

    Technology information
      Level shifter cells
      Isolation cells
      State-retention cells
      Switch cells
      Always-on cells

  • ARM1176-IEM CPF: MSV setup

    # Switchable core domain; SWITCH_VCORE is its shut-off condition
    create_power_domain -name VCORE \
        -instances $VCORE_moduleInst_list \
        -boundary_ports "$VCORE_pins" \
        -shutoff_condition {SWITCH_VCORE}

    update_power_domain -name VCORE \
        -internal_power_net VDDCORE

    create_global_connection -net VDDCORE -domain VCORE -pins VDD
    create_global_connection -net VSS -domain VCORE -pins VSS

    create_power_domain -name VRAM
    create_power_domain -name VSOC -default

    # VCORE-to-VRAM signals are clamped low while RAMCLAMP is low,
    # using combined isolation / level-shifter cells placed in the destination domain
    create_isolation_rule -name rule_VCORE2VRAM \
        -from VCORE -to VRAM \
        -isolation_output low \
        -isolation_condition {!RAMCLAMP}

    update_isolation_rules -names rule_VCORE2VRAM \
        -location to -combine_level_shifting \
        -cells {LVLLHEHX8M}

    # VRAM-to-VCORE signals only need level shifting
    create_level_shifter_rule -name rule_VRAM2VCORE \
        -from VRAM -to VCORE

    update_level_shifter_rules -names rule_VRAM2VCORE \
        -cells {LVLHLX8M} -location to

    create_level_shifter_rule -name rule_VSOC2VCORE
    create_isolation_rule -name rule_VCORE2VSOC_low
    create_isolation_rule -name rule_VCORE2VSOC_high

    create_power_nets -nets VDDRAM
    create_power_nets -nets VDDCORE \
        -external_shutoff_condition {SWITCH_VCORE}
    create_power_nets -nets VDD

  • Multi Mode Multi Corner (MMMC)

    [Figure: multiple PVT corners on process, voltage and temperature axes, labelled bbb, bbw, www and wwb.]

    Analysis view = corner definition + power mode
      Corner definition
        Library set: .lib (.ecsm* ext), .cdb (SI)
        Operating conditions (PVT)
        RC corner: SPEF, QX tech, cap table
      Power mode
        .sdc

    Dynamic Voltage Frequency Scaling

    Mode             VSOC     VRAM     VCORE    Frequency
    PM_highV         1.08V    1.08V    1.08V    250MHz
    PM_medV          1.08V    0.90V    0.90V    166MHz
    PM_lowV          1.08V    0.72V    0.72V    90MHz
    VCORE_Dormant    1.08V    0.72V    Off      250MHz

    Multiple modes with multiple constraints (.sdc)

  • ARM1176-IEM CPF: Power modes

    Dynamic Voltage and Frequency Scaling

    Mode             VSOC     VRAM     VCORE    Target frequency
    PM_highV         1.08V    1.08V    1.08V    250MHz
    PM_medV          1.08V    0.90V    0.90V    166MHz
    PM_lowV          1.08V    0.72V    0.72V    90MHz
    VCORE_Dormant    1.08V    0.72V    Off      250MHz

    create_nominal_condition -name highV -voltage 1.08
    update_nominal_condition -name highV -library_set libs-worst-1.08v
    create_nominal_condition -name OFF -voltage 0

    # domain@condition pairs follow the power-mode table above
    create_power_mode -name PM_highV -default \
        -domain_conditions {VCORE@highV VRAM@highV VSOC@highV}

    update_power_mode -name PM_highV -sdc_files ARM1176JZFS.constraints_PM_highV.sdc

    # lowV is an assumed name for the 0.72V nominal condition; its definition is not shown on the slide
    create_power_mode -name VCORE_dormant \
        -domain_conditions {VCORE@OFF VRAM@lowV VSOC@highV}

    The same power modes are used by both logic synthesis (RTL Compiler) and place & route (SOC Encounter)

  • ARM1176-IEM CPF: Corners & Analysis views

    Operating corners and analysis views are only used for physical implementation
    Analysis and physical optimization work concurrently on the active analysis views

    create_operating_corner -name WCORNER_1.08 \
        -voltage 1.08 -temperature 125 -process 1 -library_set libs-worst-1.08v

    create_operating_corner -name WCORNER_0.72 \
        -voltage 0.72 -temperature 125 -process 1 -library_set libs-worst-0.72v

    create_operating_corner -name BCORNER \
        -voltage 1.32 -temperature "-40" -process 1 -library_set libs-best-1.32

    # Domain names in the domain@corner pairs are restored from the power-mode table:
    # VCORE and VRAM scale together, VSOC stays at 1.08V
    create_analysis_view -name WCVIEW_1.08 \
        -mode PM_highV \
        -domain_corners {VCORE@WCORNER_1.08 VRAM@WCORNER_1.08 VSOC@WCORNER_1.08}

    create_analysis_view -name WCVIEW_0.72 \
        -mode PM_lowV \
        -domain_corners {VCORE@WCORNER_0.72 VRAM@WCORNER_0.72 VSOC@WCORNER_1.08}

    create_analysis_view -name BCVIEW_1.32 \
        -mode PM_highV \
        -domain_corners {VCORE@BCORNER VRAM@BCORNER VSOC@BCORNER}
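    The relationship between operating corners and analysis views can be sketched as a plain data lookup. The corner and view names, voltages, temperatures and library sets come from the slide; the per-domain assignment (VCORE and VRAM at the scaled corner, VSOC at 1.08V) is an assumption based on the power-mode table, and `resolve_view` is a hypothetical helper, not a CPF or Encounter command.

```python
# Operating corners as defined on the slide.
CORNERS = {
    "WCORNER_1.08": {"voltage": 1.08, "temperature": 125, "library_set": "libs-worst-1.08v"},
    "WCORNER_0.72": {"voltage": 0.72, "temperature": 125, "library_set": "libs-worst-0.72v"},
    "BCORNER": {"voltage": 1.32, "temperature": -40, "library_set": "libs-best-1.32"},
}

# Analysis views: (power mode, corner per power domain).
# The domain-to-corner assignment is assumed, see the lead-in.
VIEWS = {
    "WCVIEW_1.08": ("PM_highV", {"VCORE": "WCORNER_1.08", "VRAM": "WCORNER_1.08", "VSOC": "WCORNER_1.08"}),
    "WCVIEW_0.72": ("PM_lowV", {"VCORE": "WCORNER_0.72", "VRAM": "WCORNER_0.72", "VSOC": "WCORNER_1.08"}),
    "BCVIEW_1.32": ("PM_highV", {"VCORE": "BCORNER", "VRAM": "BCORNER", "VSOC": "BCORNER"}),
}


def resolve_view(view_name):
    """Return the library set each power domain is timed with in a view."""
    _mode, domain_corners = VIEWS[view_name]
    return {domain: CORNERS[corner]["library_set"]
            for domain, corner in domain_corners.items()}


low_voltage_libs = resolve_view("WCVIEW_0.72")
print(low_voltage_libs)
```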

  • Automated CPF-driven MSV flow

    Use the CPF to drive synthesis and physical implementation:

      MSV/power domain partition (power domains with assigned instances, top-level IO pins and power/ground connections)
      Set up the MMMC environment (power modes, delay corners and analysis views)
      Isolation rules and level shifter rules automate the use of shifter/isolation cells: definition, identification from RTL, insertion, placement and optimization
      Synthesize, place, optimize, and route based on power domains
      Domain-aware Clock Tree Synthesis

    [Figure: flow diagram from constraints, CPF and netlist to GDSII, built on an MSV/MMMC infrastructure. Boxes shown:
      Top-down MSV/multi-mode single-pass synthesis
      Isolation/level shifter cells check and insertion
      Floorplanning / Silicon Virtual Prototype
      Power grid synthesis
      Power routing
      Placement including SRPG/level shifters/iso cells
      Low power clock tree synthesis
      Domain-aware post-CTS optimization
      Domain-aware NanoRoute
      IR drop-aware timing/SI optimization
      Sign-off
    Side bars: low power functional verification, SI & static timing analysis, static/dynamic power analysis, timing/power optimization.]

  • Logic synthesis

    Top-down single-pass synthesis with power domain definition
    Identification of level shifters and isolation cells already instantiated in RTL
    Multi-mode synthesis to consider frequency and voltage scaling
    Implementation of separate scan chains for the VSOC and VCORE domains
    Leakage power optimization using high-Vt cells

    Synthesis steps:
      Read/constrain top
      Apply power constraints
      Synthesize top
      Analyze timing/power
      Adjust / incremental update

    Inputs: RTL, chip SDC, and chip CPF (power domains, shifters, isolation, power modes, leakage, dynamic, clock gating)

    [Figure: top-level VSOC domain with the VCORE (PSO) and VRAM domains and isolation at the domain boundaries; libraries Lib1 (0.72V), Lib2 (0.90V) and Lib3 (1.08V).]

  • Floorplan: Power Domains

    Each power domain has its own dedicated row structure, created automatically according to the associated cell libraries

    Hard block placement is scripted and easily customizable through relative floorplan commands

    The power structure is built automatically by a Tcl script

    create_power_domain -name VCORE \
        -instances $VCORE_moduleInst_list -boundary_ports "$VCORE_pins" \
        -shutoff_condition {SWITCH_VCORE}

    create_power_domain -name VRAM \
        -instances "$env(VRAM_moduleInst_list)" -boundary_ports "$VRAM_pins"

    create_power_domain -name VSOC -default \
        -boundary_ports "$VSOC_pins"

  • Floorplan: Power Structure

    VRAM: rings and stripes for VDDRAM, VSS

    VCORE: rings and stripes for VDDCORE, VSS

    VSOC: rings and stripes for VDD, VSS

    create_ground_nets -nets VSS
    create_power_nets -nets VDDRAM
    create_power_nets -nets VDDCORE -external_shutoff_condition {SWITCH_VCORE}
    create_power_nets -nets VDD

  • Placement

    Standard cells, level shifters (single and multi-height) and isolation cells are automatically placed in a single pass

    The verifyPowerDomain command checks:

      Nets crossing the boundaries (both level shifter cells and isolation cells)
      Library binding
      Placement of instances in a power domain
      IO pins assigned to the correct power domain

  • Level Shifters / Clamps: VCORE-VRAM

    create_isolation_rule -name rule_VCORE2VRAM \
        -from VCORE -to VRAM -isolation_output low \
        -isolation_condition {!RAMCLAMP}

    update_isolation_rules -names rule_VCORE2VRAM \
        -location to -cells {LVLLHEHX8M} \
        -combine_level_shifting

    create_level_shifter_rule -name rule_VRAM2VCORE \
        -from VRAM -to VCORE

    update_level_shifter_rules -names {rule_VRAM2VCORE} \
        -cells {LVLHLX8M} -location to

    Single-row height LVLHL shifters

    Triple-row height LVLLHEH shifters

    Vertical connections of VDDI pins to the VDDCORE power net are routed automatically

    [Figure: layout at the VCORE/VRAM boundary showing the VDDCORE, VDDRAM and VSS rails.]

  • Clock Tree Synthesis & Routing

    CTS in SOC Encounter is power domain aware
    For all clocks crossing power domains, CTS assumes that level shifters are present (according to CPF rules)
    A single-pass top-down clock tree is created by CTS
    CTS does the following:

      Establishes a single entry/exit point for each domain
      Selects buffers from appropriate libraries and places them within domain boundaries
      Balances skew through domains and for all active corners/views

    Both trialRoute and nanoRoute honour power domains in SOC Encounter

  • Cross-Domain Timing Optimization

    Optimization transparently considers level shifter / clamp placement and signal direction

      Parts of a net should be "don't touch"
      Buffers need to be inserted from the correct library and into the correct module
      Buffer location is timing driven

    Optimizes timing and design rules concurrently for all active corners/views

    [Figure: nets crossing between Power Domain A (0.8V, Libraries A) and Power Domain B (1.0V, Libraries B), with 0.8V I/O; the don't-touch net segments are marked.]

  • Variable VDD Flow

    Assign a non-pre-characterized VDD value as the operating voltage and run through timing optimization closure
    Uses ECSM and tri-lib technology for better accuracy
    Requires a minimum of two libraries characterized at different voltages for good accuracy over the full range
    Supported by analysis and optimization

    [Figure: the delay calculator takes libraries characterized at 0.72V, 0.90V and 1.08V, with process SS and temperature 125°C.]

    Dynamic Voltage and Frequency Scaling

    Mode             VSOC     VRAM     VCORE    Target frequency
    PM_highV         1.08V    1.08V    1.08V    250MHz
    PM_medV          1.08V    0.90V    0.90V    166MHz
    PM_triV          1.08V    1.00V    1.00V    200MHz
    VCORE_Dormant    1.08V    0.72V    Off      250MHz
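    To illustrate why at least two characterized libraries are needed for a non-characterized VDD such as 1.00V, the sketch below interpolates a cell delay between characterization voltages. It uses plain piecewise-linear interpolation as a stand-in; it is not the ECSM/tri-lib delay calculation used by the Encounter tools, and the delay values and function name are hypothetical.

```python
def interpolate_delay(v_target, characterized):
    """Piecewise-linear interpolation of a cell delay (ps) over voltage.

    `characterized` maps characterization voltage (V) to delay (ps) and
    needs at least two entries; extrapolation is rejected.
    """
    points = sorted(characterized.items())
    if len(points) < 2:
        raise ValueError("need at least two characterized voltages")
    if not points[0][0] <= v_target <= points[-1][0]:
        raise ValueError("target voltage outside characterized range")
    for (v0, d0), (v1, d1) in zip(points, points[1:]):
        if v0 <= v_target <= v1:
            fraction = (v_target - v0) / (v1 - v0)
            return d0 + fraction * (d1 - d0)


# Hypothetical worst-case delays of one cell at the three library voltages.
cell_delay_ps = {0.72: 210.0, 0.90: 130.0, 1.08: 100.0}

delay_at_1v00 = interpolate_delay(1.00, cell_delay_ps)
print("interpolated delay at 1.00V (ps):", delay_at_1v00)
```

    A third characterization point lets the interpolation follow the curvature of delay versus voltage instead of a single straight line across the whole range.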

  • Library support for MMMC

    Library / process
      ARM Metro libraries for TSMC CL013G process

    Available voltage corners
      WC: 0.72V, 0.9V, 1.08V
      BC: 0.88V, 1.1V, 1.32V

    Required views
      Timing models (.lib) characterized at available voltage corners, optionally with ECSM extensions for better accuracy
      Noise models (.cdb) characterized at available voltage corners

  • ECSM advanced timing models

    Delay sensitivity to Vdd increases nonlinearly at smaller geometries
    .lib and K-factors are not accurate at low Vdd or between characterization points, hence the need for more accurate cell models for IR-drop and MSV
    ECSM accurately models delay variations with Vdd:

      Captures the waveform-dependent non-linear behavior of the receiver-pin capacitance
      Driver modeled as a current source: I/V curve for each slope, load combination

    ARM-Cadence RM deploys ECSM models for signoff STA within Encounter, leveraging detailed instance voltages output by power analysis.

    [Figure: delay (ps, 0 to 300) versus voltage (Vdd, 0.6 to 1.4) for a 90nm buffer with 1V nominal supply. Curves: SPICE, ECSM-based, and K factor (linear approximation). Annotations: "Non-linear increase in delay due to Vdd scaling" and "30%".]
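    The gap between a linear K-factor derating and the real non-linear delay curve can be sketched with a simple analytical model. The snippet assumes an alpha-power-law delay model (delay proportional to VDD / (VDD - Vt)^alpha) as a stand-in for SPICE, and takes the K-factor as the slope of that model at the nominal voltage. The threshold voltage, exponent and nominal delay are hypothetical, so the numbers do not reproduce the plot on this slide.

```python
V_NOM = 1.0    # nominal supply, volts
D_NOM = 100.0  # hypothetical delay at nominal supply, picoseconds
V_T = 0.35     # hypothetical threshold voltage, volts
ALPHA = 1.3    # hypothetical velocity-saturation exponent


def _shape(v_dd):
    # Alpha-power-law voltage dependence of gate delay (assumed model).
    return v_dd / (v_dd - V_T) ** ALPHA


def alpha_power_delay(v_dd):
    """Non-linear delay model, normalized to D_NOM at V_NOM."""
    return D_NOM * _shape(v_dd) / _shape(V_NOM)


# Relative slope of the non-linear model at the nominal point (per volt).
K_FACTOR = 1.0 / V_NOM - ALPHA / (V_NOM - V_T)


def k_factor_delay(v_dd):
    """Linear derating around the nominal characterization point."""
    return D_NOM * (1.0 + K_FACTOR * (v_dd - V_NOM))


for v in (1.0, 0.9, 0.8, 0.7):
    nonlinear = alpha_power_delay(v)
    linear = k_factor_delay(v)
    print(v, round(nonlinear, 1), round(linear, 1),
          "underestimate:", round((nonlinear - linear) / nonlinear, 3))
```

    Both models agree at the nominal point, and the linear derating increasingly underestimates delay as the supply drops, which is the behaviour the slide attributes to K-factors at low Vdd.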

  • Formal Verification

    Functional equivalence check
    Structural checks (Conformal Low Power Solution):

      Missing level shifters and power connectivity
      Missing isolation and power connectivity
      Bad isolation cell and level shifter
      Correct cell placement in physical domain
      Isolation cell enable

    [Figure: boundary cell between VCORE and VRAM with input A, output Y, an Enable pin, and supplies Vi and Vo.]
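    The kind of structural check listed here can be sketched as a lookup of the required boundary cell for each domain crossing. The cell names and domain pairs come from the deck's CPF rules; the net names, the tuple layout and `check_crossings` are hypothetical, and this is not how Conformal Low Power is implemented.

```python
# Required boundary cell per (source, destination) domain pair,
# taken from the deck's CPF rules.
REQUIRED_CELLS = {
    ("VCORE", "VRAM"): "LVLLHEHX8M",  # combined isolation + level shifter
    ("VRAM", "VCORE"): "LVLHLX8M",    # level shifter
}


def check_crossings(crossings):
    """Return a list of (net, problem) tuples for domain-crossing nets.

    Each crossing is (net, source_domain, destination_domain, cell_or_None).
    """
    problems = []
    for net, src, dst, cell in crossings:
        required = REQUIRED_CELLS.get((src, dst))
        if required is None:
            continue  # same domain, or no rule modelled for this pair
        if cell is None:
            problems.append((net, "missing " + required))
        elif cell != required:
            problems.append((net, "bad cell " + cell + ", expected " + required))
    return problems


# Hypothetical netlist extract.
example = [
    ("ram_addr[0]", "VCORE", "VRAM", "LVLLHEHX8M"),
    ("ram_addr[1]", "VCORE", "VRAM", None),
    ("ram_rdata[0]", "VRAM", "VCORE", "LVLLHEHX8M"),
    ("int_net", "VCORE", "VCORE", None),
]
report = check_crossings(example)
for net, problem in report:
    print(net, "->", problem)
```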

  • Power and Rail analysis

    VDDCORE IR drop analysis

    [Figure: power and rail analysis flow.
      Inputs: placed & routed design database, power libraries, CPF mode specification
      Steps: derive power-pin location, report power (Common Power Engine), static & dynamic IR-drop analysis
      Results: plots, waveforms & IR drop files, timing & critical path analysis, power-switch ECO, decap ECO]

  • Conclusion

    Comprehensive low-power support across RTL2GDS
    Leverages state-of-the-art features, including DVFS and variable VDD flow
    Optimized and tested for use with latest Cadence tool releases
    Complete offering:

      Processor IP
      Reference libraries
      Portable Reference Methodology scripts for Cadence tools

    ARM and Cadence collaboration

      Streamlining rapid deployment of ARM processor products
      Accelerate time to market for the ARM Partner

    Available to ARM partners and Cadence customers

  • Thank You