Clock Gating Methodology for Power and CTS QoR

Upload: samtp

Post on 08-Apr-2015


Agenda

• Objective
• Introduction to clock gating
• Clock gating methodology
  – Overview
  – RTL synthesis
  – Physical synthesis
  – Clock tree synthesis
  – Summary of recommendations
• Sample results
• Planned enhancements
• Summary

Objective

• Describe the clock gating methodology to meet target
  – Skew
  – Insertion delay
  – Power
• Discuss recommendations during
  – RTL synthesis using Design Compiler
  – Physical synthesis using IC Compiler or Physical Compiler
  – Clock tree synthesis using IC Compiler or Astro


What is Clock Gating?

• Register banks disabled during some clock cycles
  – Typical implementation uses multiplexers
  – Clock gating cell replaces multiplexers

[Figure: a register with an enable multiplexer (EN) on its D input, whose CLK pin sees high activity, next to the same register driven by a gated clock (gclk), whose clock pin sees low activity]
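The equivalence between the two implementations on this slide can be sketched with a small behavioral model. This is an illustration only, not tool output: the function name `simulate` and the stimulus vectors are made up for the example. It shows that both styles produce the same Q values while the gated register sees far fewer clock edges.

```python
def simulate(d_values, en_values):
    """Compare a mux-feedback register with a clock-gated register.

    Returns (q_trace_mux, q_trace_gated, clock_edges_mux, clock_edges_gated).
    """
    q_mux = q_gated = 0
    edges_mux = edges_gated = 0
    trace_mux, trace_gated = [], []
    for d, en in zip(d_values, en_values):
        # Multiplexer style: the register is clocked every cycle and
        # recirculates its own output when EN is low.
        edges_mux += 1
        q_mux = d if en else q_mux
        # Clock-gated style: the register only sees a clock edge when EN is high.
        if en:
            edges_gated += 1
            q_gated = d
        trace_mux.append(q_mux)
        trace_gated.append(q_gated)
    return trace_mux, trace_gated, edges_mux, edges_gated


# Hypothetical stimulus: eight cycles, enable high in three of them.
d = [1, 0, 1, 1, 0, 1, 0, 0]
en = [1, 0, 0, 1, 0, 0, 0, 1]
mux_q, gated_q, mux_edges, gated_edges = simulate(d, en)
print(mux_q == gated_q, mux_edges, gated_edges)
```

The fewer clock edges the register sees, the lower the toggle rate on its clock pin, which is where the dynamic power saving comes from.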

Benefits of Clock Gating

• Dynamic power savings
  – With low toggle rate on clock pin, internal power of registers is reduced
  – Gated by the enable signal, the clock network has less switching activity and consumes less switching power
• Area savings
  – Eliminating multiplexers saves area
• Easy to implement
  – No RTL code change is required
  – Clock gating is automatically inserted by the tool
  – Technology independent


Clock Gating Methodology Overview

Tool versions: Design Compiler X-2005.09, IC Compiler v1.1, Physical Compiler X-2005.09, Astro X-2005.09

• Design Compiler: Input RTL → Insert clock gating → Compile
• Physical Compiler: Merge clock gates → Placement and placement optimization
• Astro: Replicate clock gates → Clock tree synthesis → Detail routing
• Unified flow in IC Compiler: Merge clock gates → Placement and placement optimization → Replicate clock gates [BETA] → Clock tree synthesis → Detail routing


Clock Gating Methodology During RTL Synthesis

Input RTL
1. Read in Verilog: read_verilog
2. Define the clocks: create_clock
3. Set the clock gating style: set_clock_gating_style
4. Insert clock gating: insert_clock_gating
5. Compile: compile

RTL Synthesis

Specify Clock Gating Options

• Use the set_clock_gating_style command
• Maximum fanout
  – This value is the maximum fanout of each clock gating element
  – By default, the fanout is unlimited
• Minimum bitwidth
  – This is the minimum bitwidth of register banks that will be gated
  – By default, the minimum bitwidth is 3
  – No area or power benefit with register banks with bitwidth less than 3
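A rough way to see how the two options interact is the model below. It is a simplified sketch, not the tool's algorithm: it assumes a bank narrower than the minimum bitwidth is left ungated and a bank wider than the maximum fanout is split across several clock gates. The function name `plan_clock_gates` and the bank widths are hypothetical.

```python
import math


def plan_clock_gates(bank_widths, minimum_bitwidth=3, max_fanout=None):
    """Return (clock_gates, gated_registers, ungated_registers).

    max_fanout=None models the default of unlimited fanout.
    """
    gates = gated = ungated = 0
    for width in bank_widths:
        if width < minimum_bitwidth:
            # Too narrow to be worth gating: registers stay on the ungated clock.
            ungated += width
            continue
        gated += width
        gates += 1 if max_fanout is None else math.ceil(width / max_fanout)
    return gates, gated, ungated


banks = [2, 8, 60, 108, 300]  # hypothetical register bank widths
print(plan_clock_gates(banks))                 # default settings
print(plan_clock_gates(banks, max_fanout=64))  # limited fanout
```

With the default settings each qualifying bank gets one clock gate; limiting the fanout raises the gate count without changing how many registers are gated.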

Insert Clock Gating During RTL Synthesis

• Use the insert_clock_gating command
  – The -global option looks across hierarchical boundaries for the common enable

[Figure: two versions of a design Top containing Module A and Module B. With regular clock gating, each module has its own clock gate (CG) driven by EN. With hierarchical clock gating, a single CG at Top serves both modules, and extra ports are added]

Measure the Quality of Inserted Clock Gating: Report Power and Clock Gating

• Use the report_power command

      Cell Internal Power = 160.6544 mW (61%)
      Net Switching Power = 102.5581 mW (39%)
                            ---------
      Total Dynamic Power = 263.2125 mW (100%)
      Cell Leakage Power  =   3.0961 mW

• Use the report_clock_gating command

      Clock Gating Summary
      ------------------------------------------------------------
      | Number of Clock gating elements |      222               |
      |                                 |                        |
      | Number of Gated registers       |   167512 (99.92%)      |
      |                                 |                        |
      | Number of Ungated registers     |      137 (0.08%)       |
      |                                 |                        |
      | Total number of registers       |   167649               |
      ------------------------------------------------------------
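The percentages in the two reports follow directly from the raw numbers. The short check below recomputes them from the figures on this slide; the helper `pct` is introduced only for this example.

```python
# Numbers copied from the report_clock_gating and report_power output above.
gated, ungated = 167512, 137
total = gated + ungated
internal_mw, switching_mw = 160.6544, 102.5581
dynamic_mw = internal_mw + switching_mw


def pct(part, whole, digits):
    """Percentage of `part` in `whole`, rounded to `digits` decimals."""
    return round(100.0 * part / whole, digits)


print(total)
print(pct(gated, total, 2), pct(ungated, total, 2))
print(round(dynamic_mw, 4))
print(pct(internal_mw, dynamic_mw, 0), pct(switching_mw, dynamic_mw, 0))
```

A gated-register percentage this close to 100% indicates that almost every register bank had a usable enable condition.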


Clock Gating Considerations

• Clock gate styles
• Enable signal timing
  – Ensure that you meet the setup and hold time on the enable pin of clock gate
• Impact of clock gate fanout on
  – Power and enable pin timing
  – Clock tree structure

Clock Gate Styles

• Integrated, latch-based clock gate (ICG) is recommended
• Discrete, latch-based or latch-free (simple AND or OR-AND gate) clock gates are also supported
  – Discrete clock gates are not recommended (details on next slide)
• Latch-based clock gates prevent a glitch on the enable from being propagated to the gated clock

[Figure: latch-based clock gate in which EN passes through a latch clocked by CLK before being combined with CLK to produce GCLK, with CLK, EN, and GCLK waveforms showing no glitches on the gated clock]
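The glitch-filtering behavior of the latch can be illustrated with a small discrete-time model. It is a sketch under simplifying assumptions: zero gate delays, a latch that is transparent only while CLK is low, and hypothetical waveforms in which EN pulses briefly while CLK is high.

```python
def gate_clock(clk, en, use_latch):
    """Return the gated clock waveform for a latch-based or latch-free AND gate."""
    latched = 0
    gclk = []
    for c, e in zip(clk, en):
        if use_latch:
            if c == 0:
                # The latch is transparent only while CLK is low.
                latched = e
            gclk.append(c & latched)
        else:
            # Latch-free: EN reaches the AND gate directly.
            gclk.append(c & e)
    return gclk


def rising_edges(wave):
    """Count 0 -> 1 transitions in a waveform."""
    return sum(1 for a, b in zip(wave, wave[1:]) if (a, b) == (0, 1))


# Hypothetical waveforms: EN glitches high at step 3 while CLK is high,
# then is asserted properly at step 7 while CLK is low.
clk = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
en = [0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

latch_free = gate_clock(clk, en, use_latch=False)
latch_based = gate_clock(clk, en, use_latch=True)
print(rising_edges(latch_free), rising_edges(latch_based))
```

The latch-free gate passes the enable glitch through as an extra narrow clock pulse, while the latch-based gate produces only the one intended pulse.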

Integrated Versus Discrete Clock Gating

Integrated clock gate
  – No clock skew between latch and AND gate
  – Timing analysis and CTS handle the clock gate automatically
  – Setup and hold check modeled in library
  – Easy to use in the flow

Discrete clock gate
  – Ensure minimum skew between latch and AND gate
  – Specify latch clock pin as a non-stop pin for CTS
  – Specify the setup and hold time
  – This adds complexity to the flow

Integrated clock gating is recommended

Enable Signal Timing

• Setup time on the enable pin of clock gate
• Synthesis assumes that the clock signal arrives at all registers and clock gates at the same time (within skew)
• In the actual clock tree, the clock signal reaches the clock gating cell earlier than it reaches the registers
• Timing constraints on the enable signals need to be adjusted

Note: The closer the clock gating cell is to the registers, the less constrained the enable signal

[Figure: CLK drives a clock gate (CG) with enable EN; the clock arrives at the CG earlier than at the registers downstream of it]

Impact of Clock Gate Fanout

• Clock gate fanout is determined by
  – The -max_fanout option of the set_clock_gating_style command in Design Compiler
  – By default, the fanout is unlimited
• Impact of clock gate fanout on
  – Power and enable pin timing
  – Clock tree structure

Impact of Clock Gate Fanout on Power and Timing

Large max fanout
  – Fewer clock gating cells
  – Better power reduction
  – More constrained enable

Small max fanout
  – Easier to meet enable pin timing
  – Power might be affected

Impact of Clock Gate Fanout on Clock Tree Structure

Large max fanout
  – Unbalanced clock structure
  – Depending on design skew requirement, may need processing for CTS QoR

Small max fanout
  – More balanced clock structure
  – Easier to meet CTS QoR

[Figure: example clock trees. With a large max fanout the ICG fanouts are 108, 8, 300, and 60; with a small max fanout the fanouts shown include 8, 60, 30, 30, 27, and 27]

Impact of Clock Gate Fanout Summary

• By default, max fanout is unlimited
  – Results in best power savings and reasonable CTS QoR
• If CTS QoR is a higher priority,
  – Make your clock structure as balanced as possible
      set_clock_gating_style -minimum_bitwidth value \
          -max_fanout value
    - Use similar values for minimum_bitwidth and max_fanout
    - Balance fanout of each clock gate
    - Eliminate small fanout
    - Select the value based on your design

Experiments have shown that using a balanced fanout of 128 or 256 results in improved CTS QoR
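The effect of choosing similar values for the two options can be sketched as follows. This is a simplified model rather than the tool's algorithm: it assumes a bank is split as evenly as possible across the fewest clock gates allowed by the maximum fanout. The bank widths 108, 8, 300, and 60 are the fanouts from the earlier clock tree figure; the value 30 and the function names are hypothetical.

```python
import math


def split_bank(width, max_fanout):
    """Split one register bank across the fewest clock gates, as evenly as possible."""
    n_gates = math.ceil(width / max_fanout)
    base, extra = divmod(width, n_gates)
    return [base + 1] * extra + [base] * (n_gates - extra)


def gate_fanouts(bank_widths, minimum_bitwidth, max_fanout):
    """Return the fanout of every clock gate; banks below minimum_bitwidth stay ungated."""
    fanouts = []
    for width in bank_widths:
        if width >= minimum_bitwidth:
            fanouts.extend(split_bank(width, max_fanout))
    return fanouts


banks = [108, 8, 300, 60]
# A very large max_fanout stands in for the unlimited default.
unlimited = gate_fanouts(banks, minimum_bitwidth=3, max_fanout=10**9)
balanced = gate_fanouts(banks, minimum_bitwidth=30, max_fanout=30)

print(unlimited, max(unlimited) / min(unlimited))
print(sorted(set(balanced)), len(balanced))
```

In this model the unlimited setting leaves a 37.5x spread between the largest and smallest clock gate fanout, while the matched setting yields sixteen gates with fanouts of only 27 or 30 and leaves the 8-bit bank ungated.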


Clock Gating Usage During Placement Optimization

• Large or unlimited fanout
  – By default, no group bounds are created for the clock gate and its fanout during placement
    - Avoids congestion around the clock gate
    - You will get better overall timing QoR: placement of the registers is based on timing and is not constrained by the location of the clock gate
• Small fanout
  – To keep the clock gate and its register fanout together during placement, use
      set physopt_disable_auto_bound_for_gated_clock false
    - Helps meet timing of the enable pin

Physical Synthesis

Optimizing the Clock Structure in a Gate-Level Design

• Consider the following scenarios:
  – Clock gate insertion done during RTL synthesis with small fanout
  – Gate-level netlist with clock gates from a third party and with small clock gate fanout
• To improve power, you can
  – Optimize or minimize the clock gates in your design
    - Run merge_clock_gates on your design

Merging Clock Gates

Gate-level design
1. Identify clock gates: identify_clock_gates (only required in a Verilog-based flow)
2. Merge clock gates: merge_clock_gates (merges clock gates that share a common enable)
3. Placement optimization
4. Clock tree synthesis
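Conceptually, merging groups clock gates by the enable they share. The sketch below illustrates that idea on a toy netlist; it is not how merge_clock_gates is implemented, and the tuple layout, instance names, and the choice to also key on the clock are assumptions made for the example.

```python
from collections import defaultdict


def merge_clock_gates(clock_gates):
    """Group clock gates that share the same clock and enable.

    clock_gates: list of (instance_name, clock, enable, [registers]) tuples.
    Returns a dict mapping (clock, enable) to the combined register list.
    """
    merged = defaultdict(list)
    for _name, clock, enable, registers in clock_gates:
        merged[(clock, enable)].extend(registers)
    return dict(merged)


# Hypothetical netlist: cg_a and cg_b share enable en1 and can be merged.
gates = [
    ("cg_a", "clk", "en1", ["r0", "r1"]),
    ("cg_b", "clk", "en1", ["r2"]),
    ("cg_c", "clk", "en2", ["r3", "r4"]),
]
result = merge_clock_gates(gates)
print(len(gates), "clock gates before,", len(result), "after")
```

Fewer clock gates means fewer cells on the clock network, which is the power benefit this step targets.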


Prepare the Clock Structure for CTS

Complex clock gating presents a challenge for CTS. You can
  – Insert "always enabled" clock gates to create a more balanced tree
  – Replicate clock gates

[Figure: a clock tree whose ICG fanouts are 108, 25, 8, 300, and 60, shown before and after adding "always enabled" clock gates and replicating clock gates; fanouts shown after replication include 31, 28, 34, and 28]

Clock Tree Synthesis

Creating More Balanced Clock Structures During RTL Synthesis

  – To insert "always enabled" clock gates during RTL synthesis, use
      set power_cg_all_registers true
  – Also set the following variable
      set power_remove_redundant_clock_gates false

[Figure: clock tree with ICGs driven by enables EN1 and EN2, before and after adding an ICG whose enable is tied active high]

What is Replicate Clock Gates?

• Balances fanout by fixing DRC at the output of the ICG
• Same engine used for clustering in clock tree synthesis and clock gate replication
• Adds buffers to drive registers that are not gated

[Figure: ICGs with fanouts of 108 and 25 before replication; fanouts shown after replication include 31, 20, 25, 25, 32, and 25]

What Does Replicate Clock Gates in Astro and IC Compiler Do?

• Replicates clock gate with new instances using the same reference cell
• Balances the fanout of clock gates based on design rule constraints
• Considers the location of registers
• In Astro, marks the output net of the clock gate as "synthesized"
  – Astro CTS does not modify the net
  – IC Compiler CTS checks the net for a DRC violation, but does not modify the net if it is DRC clean
• Inserts buffers to drive registers that are not gated
• The number of clock gates increases
  – Clock gates are larger than clock buffers and consume more power
  – Impact on power and area

When to Replicate Clock Gates?

1. Start from the placed design and run clock tree synthesis
2. Meet target skew?
   – Yes: continue to detail routing
   – No: go to step 3
3. Unbalanced clock structure?
   – Yes: replicate clock gates (only when needed), then rerun clock tree synthesis
   – No: check other factors

Prerequisites for Replicating Clock Gates in Astro

1. Ensure that you have logically equivalent cells (LEQs) in the reference library
   – This allows the sizing of ICGs
2. Set the DRC constraints
   – Use the astClockOptions command
3. To enable the insertion of buffers to drive registers that are not gated, use the following command:
      axSetIntParam "acts" "push down clock ports" 1
4. If you want to prevent the tool from using certain ICG cells
   – Define the design LEQs (see the appendix for details)

Prerequisites for Replicating Clock Gates in IC Compiler

1. Ensure that you have logically equivalent cells (LEQs) in the reference library
   – This allows the sizing of ICGs
2. Set the DRC constraints
   – Use the set_clock_tree_options command
3. To enable insertion of buffers to drive registers that are not gated, set the following variable:
      set cts_push_down_buffer true
4. If you want to prevent the tool from using certain ICG cells, set dont_use on the cells

Using astSplitClockNet in Astro

      astSplitClockNet
      setFormField "Split Clock Net" "Clock Gated Cells File Name" "split.txt"
      formOK "Split Clock Net"

  – File contains either
    - Instance names of the cells to be replicated
    - Net names (all fanout on specified nets are processed)

Using split_clock_net in IC Compiler

      split_clock_net -objects object_list -gate_sizing -gate_relocation

  – The object_list is a list of instances or nets whose fanout is to be replicated
  – The -gate_sizing and -gate_relocation options enable sizing or relocation of ICGs

Creating Balanced Clock Fanout at RTL Versus Replicate Clock Gates Before CTS

Balanced Clock Fanout at RTL
  – When? Insert clock gating at RTL synthesis.
  – Why? CTS QoR is a priority. Enable pin timing is a priority.
  – Based on: Clock gate fanout

Replicate Clock Gates
  – When? Replicate clock gates before CTS.
  – Why? Selected maximum fanout at RTL synthesis for maximum power savings. Need to preprocess clock structure to meet target skew.
  – Based on: DRC at output of clock gate (includes input capacitance of registers and net capacitance). Clustering based on placement location.


Recommendations for RTL Synthesis

  – Select the maximum fanout based on your design priority
    - Large fanout gives you more power savings
    - Balanced fanout gives good CTS QoR
  – Use integrated, latch-based clock gating cells

Recommendations for Physical Synthesis/CTS

• Physical synthesis
  – Use group bounds only when the maximum fanout is small
• Clock tree synthesis
  – Replicate clock gates only if necessary
  – Use DRC constraints to control the number of replicated clock gates


Sample Results: Design 1

Design details
  – 90nm, 160MHz clock, 181K instances, 37 macros
  – Target skew: 150ps
  – Total power without clock gating: 48mW

Flow highlights
  – RTL synthesis: Insert clock gating. No max fanout constraint (default: unlimited). Insert always active clock gating cells.
  – Physical synthesis: No group bounds
  – Clock tree synthesis: With replication of clock gates

Results
  – Final skew: 141ps
  – Final power: 27mW

Achieved target skew with replication of clock gates

*See sample scripts in the appendix

Sample Results: Design 2

Design details
  – 90nm, 85MHz clock, 39K instances, 1 macro
  – Target skew: 100ps
  – Total power without clock gating: 21mW

Flow highlights
  – RTL synthesis: Insert clock gating. No max fanout constraint (default: unlimited). Insert always active clock gating cells.
  – Physical synthesis: No group bounds
  – Clock tree synthesis: No replication of clock gates

Results
  – Final skew: 91ps
  – Final power: 16mW

Achieved target skew without replication of clock gates

*See sample scripts in the appendix


Planned Enhancements for Clock Gating Methodology

• Astro and IC Compiler
  – Improved QoR with clock gating
    - Create a more balanced clock structure before doing CTS
    - Create a clock tree with equal levels of logic to each sink
• IC Compiler only
  – Use clock gate optimization to optimize the timing of the enable pin after CTS


Summary

• Understand the power and CTS requirements of your design
• Choose the clock gating methodology based on your design requirements
  – Use integrated clock gating
  – Process the clock structure based on your CTS and power requirements
    - Select the right fanout of clock gates during RTL synthesis
    - Use merge and replication of clock gates only if necessary

Appendix

• Sample scripts
• Summary of clock gating methodologies
• Overview of clock gating methodology using ASCII interchange format
• How to handle enable signal timing
• Equivalence checking in Formality
• Clock gating and design-for-test
• Details on replicate clock gates
• Additional considerations with discrete clock gating

Sample DC Script

      #Set clock gating options, max_fanout default is unlimited
      set_clock_gating_style -sequential_cell latch \
          -positive_edge_logic {integrated} \
          -control_point before \
          -control_signal scan_enable

      #Create a more balanced clock tree by inserting "always enabled" ICGs
      #(power_remove_redundant_clock_gates is false so these ICGs are kept)
      set power_cg_all_registers true
      set power_remove_redundant_clock_gates false

      read_db design.gtech.db
      current_design top
      link
      source design.cstr.tcl

      #Insert clock gating
      insert_clock_gating
      compile

      #Generate a report on clock gating inserted
      report_clock_gating

Sample IC Compiler Script

      #Open the Milkyway design
      open_mw_lib design_lib.mw
      open_mw_cel top

      current_design top
      link

      #Placement & placement optimization
      place_opt

      #Set clock tree options
      set_clock_tree_options -clock_tree Clk \
          -max_capacitance 0.3 \
          -max_transition 0.3

      #Replicate clock gates
      split_clock_net -object_list "*latch*" -gate_sizing -gate_relocation

      #Clock tree synthesis and optimization
      clock_opt

Sample Astro Script

      #Open the Milkyway design
      geOpenLib
      setFormField "Open Library" "Library Name" "design.mw"
      formOK "Open Library"
      geOpenCell
      setFormField "Open Cell" "Cell Name" "top"
      formOK "Open Cell"

      #Set clock tree options
      astClockOptions
      setFormField "Clock Common Options" "Maximum Transition Delay" "0.3"
      setFormField "Clock Common Options" "Maximum Load Capacitance" "0.3"
      formOK "Clock Common Options"

      #Replicate clock gates
      astSplitClockNet
      setFormField "Duplicate Clock Gated Cells" "Clock Gated Cells File Name" "split.lst"
      formOK "Duplicate Clock Gated Cells"

      #Clock tree synthesis
      astCTS
      formOK "Clock Tree Synthesis"

Format of File for astSplitClockNet

• Line-separated list of instance or net names
• Allows wildcard ".*"
• Example:
      cg_latch_inst_1
      cg_latch_inst_2
      cg_latch_inst_3

Design LEQs in Astro

• Define design LEQs
      astLoadDesignLEQ file_name
  – Example:
      cell1 cell2
      cell2 cell3
      cell4 cell5
    - cell1, cell2, and cell3 are in the same class
    - cell4 and cell5 are in the same class
• Clear/dump design LEQs
  – astClearDesignLEQ
  – astDumpDesignLEQ
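The way the pairs in the example file combine into classes can be reproduced with a small union-find sketch. It only illustrates the grouping described on this slide and is not Astro's implementation; the function name `leq_classes` is hypothetical.

```python
def leq_classes(pairs):
    """Group cells into equivalence classes from a list of (cell_a, cell_b) pairs."""
    parent = {}

    def find(cell):
        # Each cell starts as the root of its own class.
        parent.setdefault(cell, cell)
        while parent[cell] != cell:
            parent[cell] = parent[parent[cell]]  # path halving
            cell = parent[cell]
        return cell

    for a, b in pairs:
        # Join the two classes by pointing one root at the other.
        parent[find(a)] = find(b)

    classes = {}
    for cell in parent:
        classes.setdefault(find(cell), set()).add(cell)
    return sorted(sorted(members) for members in classes.values())


# The three lines of the example file above.
pairs = [("cell1", "cell2"), ("cell2", "cell3"), ("cell4", "cell5")]
classes = leq_classes(pairs)
print(classes)
```

Because cell2 appears in the first two pairs, cell1 and cell3 end up in the same class even though no line lists them together.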

Summary of Clock Gating Methodologies

Unlimited Clock Fanout at RTL
  – When? Insert clock gating at RTL synthesis.
  – Why? Power is a priority. CTS QoR, enable pin constraints more flexible.
  – Based on: Clock gate fanout

Balanced Clock Fanout at RTL
  – When? Insert clock gating at RTL synthesis.
  – Why? CTS QoR is a priority. Enable pin timing is a priority.
  – Based on: Clock gate fanout

Replicate Clock Gates
  – When? Replicate clock gates before CTS.
  – Why? Selected maximum fanout at RTL synthesis for maximum power savings. Need to preprocess clock structure to meet target skew.
  – Based on: DRC at output of clock gate (includes input capacitance of registers and net capacitance). Clustering based on placement location.

Clock Gating Methodology Overview Using ASCII Interchange Format (Verilog)

• Design Compiler: Input RTL → Insert clock gating → Compile
• Physical Compiler: Identify clock gating cells → Merge clock gates → Placement and placement optimization
• Astro: Replicate clock gates (astSplitClockNet) → Clock tree synthesis → Detail routing → Skew analysis
• IC Compiler: Identify clock gating cells → Merge clock gates → Placement and placement optimization → Replicate clock gates [BETA] (split_clock_net) → Clock tree synthesis → Detail routing → Skew analysis

How to Handle Enable Signal Timing

• Estimate delay of clock tree after clock gating cell before synthesis to avoid timing problems later
  – It can be modeled through the clock gate setup check
      set_clock_gating_style -setup (ideal_setup + Δ)
      propagate_constraints -gate_clock
  – It can also be modeled by specifying a clock latency for the clock and then a modified clock latency for all the clock gate clock pins
      set_clock_latency 1.7 CLK
    - This is the delay seen at the input of any ungated register
      set_clock_latency 1.1 $ICGClkInputPins
    - This is the delay seen at the input of the clock gates
      set_clock_latency 1.7 $ICGClkOutputPins
    - This is the delay seen at the input of the gated registers

[Figure: CLK drives a clock gate (CG), which in turn drives the registers; the clock reaches the CG earlier than it reaches the registers]
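The effect of the earlier clock arrival at the clock gate can be worked through numerically. The sketch below uses the latencies 1.7 and 1.1 from this slide; the clock period, the enable path delay, and the ICG setup time are hypothetical values chosen only for illustration, and the enable is assumed to be launched by a register that sees the full 1.7 latency.

```python
def enable_setup_slack(period, launch_latency, icg_latency,
                       enable_path_delay, icg_setup):
    """Setup slack on the clock gate enable pin (single-cycle path)."""
    required = period + icg_latency - icg_setup   # latest allowed arrival
    arrival = launch_latency + enable_path_delay  # actual arrival
    return required - arrival


PERIOD = 5.0             # hypothetical clock period
ENABLE_PATH_DELAY = 3.0  # hypothetical delay of the enable logic
ICG_SETUP = 0.2          # hypothetical library setup time

# Ideal-clock view: the clock gate is assumed to see the same latency as the registers.
ideal = enable_setup_slack(PERIOD, 1.7, 1.7, ENABLE_PATH_DELAY, ICG_SETUP)
# Modeled view: the clock gate sees latency 1.1, the launching register 1.7.
modeled = enable_setup_slack(PERIOD, 1.7, 1.1, ENABLE_PATH_DELAY, ICG_SETUP)

print(round(ideal, 3), round(modeled, 3), round(ideal - modeled, 3))
```

The slack lost, 0.6 in this example, equals the latency difference between the registers and the clock gate, which is the Δ that the setup-based alternative adds to the ideal setup time.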

Formal Verification

• The Synopsys formal verification tool, Formality, can perform equivalence checking when the design has inserted clock gating cells
• The following command instructs Formality to account for clock gating logic

      … …
      fm_shell > set verification_clock_gate_hold_mode any
      … …

Clock Gating and Test

• Controllability
• Observability
• Test signal connections

Potential Loss of Coverage

  – Clock is not controllable
  – Logic not observable

[Figure: enable logic, spanning levels of design hierarchy, drives the EN pin of a latch-based clock gate whose gated clock (ENCLK) clocks flip-flops between Data in and Data out; a legend marks logic as fully tested, partially tested, or not tested]

Test Coverage With Scan Enable

[Figure: scan_enable is combined with the output of the control logic at a control point ahead of the clock gate latch, whose gated clock (ENCLK) clocks the register bank between Data in and Data out; scan_enable is "0" during capture; a legend marks logic as fully tested, partially tested, or not tested]

Test Coverage With Test Mode

[Figure: test_mode, held at "1", is combined with the output of the enable logic at a control point ahead of the clock gate latch, whose gated clock (ENCLK) clocks the register bank between Data in and Data out; a legend marks logic as fully tested, partially tested, or not tested]

Complete Observability

[Figure: an observe flop clocked by CLK and controlled by testmode captures the otherwise unobservable point at the enable of the clock gate latch, together with enables EN1, EN2, EN3 and other observability nodes]

Test Signal Connections

      hookup_testports [-verbose] [-se_port port] [-tm_port port] [-se_pin pin] [-tm_pin pin]

Example:
      hookup_testports -se_port SE3

[Figure: design with test ports SE1, SE2, and SE3 and clock gates (CG1) driving flip-flops (FF)]

Details on Replicate Clock Gates: Pictorial Description

  – Before replication, load on ICG: 2pF
  – Replication of ICG into 8 ICGs; load on each ICG: 0.25pF (< max cap of 0.3pF)
  – Insertion of buffer to drive ungated registers
  – DRC fixed on the output of each instance
  – In Astro, net is marked as "synthesized"
  – In IC Compiler, net is not marked as "synthesized"

Details on Replicate Clock Gates: Inputs, Constraints and Behavior

• Inputs
  – Requires a list of nets or instances
    - If a net is specified, all instances on the fanout of the net are processed
• Constraints
  – The replication of the specified instances is based on fixing DRC at the output of each instance
  – The DRC constraints considered are maximum fanout, maximum capacitance and maximum transition
    - The tool converts maximum fanout and maximum transition into equivalent capacitance values, and uses the tightest of the three capacitance values as the maximum capacitance constraint
• Behavior
  – The tool splits the specified instance as many times as is necessary to fix the DRC on the output of each clock gate
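The constraint selection described above can be sketched as picking the smallest of three capacitance limits. The conversion formulas here are assumptions made for the example, not the tool's actual models: fanout is converted with a fixed capacitance per sink, and transition with a linear slope. The two conversion factors and the function name are hypothetical.

```python
def effective_max_capacitance(max_cap, max_fanout, max_transition,
                              cap_per_sink, transition_per_cap):
    """Return (name_of_tightest_constraint, capacitance_limit)."""
    candidates = {
        "max_capacitance": max_cap,
        # Assumed conversion: every sink adds the same capacitance.
        "max_fanout": max_fanout * cap_per_sink,
        # Assumed conversion: transition grows linearly with load.
        "max_transition": max_transition / transition_per_cap,
    }
    name = min(candidates, key=candidates.get)
    return name, candidates[name]


CAP_PER_SINK = 0.002      # hypothetical pF per register clock pin
TRANSITION_PER_CAP = 0.5  # hypothetical ns of transition per pF of load

tight_cap = effective_max_capacitance(0.35, 1000, 0.3,
                                      CAP_PER_SINK, TRANSITION_PER_CAP)
tight_fanout = effective_max_capacitance(10000, 1000, 10000,
                                         CAP_PER_SINK, TRANSITION_PER_CAP)
print(tight_cap)
print(tight_fanout)
```

Setting the capacitance and transition limits very high, as in the second call, is how the fanout limit can be made the deciding constraint.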

Details on Replicate Clock Gates: Example 1

• Consider the following scenario:
  – Root clock net clk drives
    - 1000 ungated registers
    - Clock gate cg1, which drives 2000 registers
    - Clock gate cg2, which drives 3000 registers
  – You would like the clock gates driven by net clk to be balanced based on a maximum capacitance constraint of 0.35
• Solution
  – Set the following DRC constraints:
      set_clock_tree_options -max_capacitance 0.35
      split_clock_net -object clk

[Figure: after replication, the 2000 registers are driven by ~80 ICGs and the 3000 registers by ~120 ICGs, each ICG with a fanout of ~25 and a load below 0.35pF; the 1000 ungated registers remain on clk]

Details on Replicate Clock Gates: Example 2

• Consider the following scenario:
  – Root clock net clk drives
    - 1000 ungated registers
    - Clock gate cg1, which drives 2000 registers
    - Clock gate cg2, which drives 3000 registers
  – You would like the clock gates driven by net clk to be balanced based on a maximum capacitance constraint of 0.35
  – You would like to make the clock structure more balanced by inserting a buffer to drive the ungated registers
• Solution
  – Set the following DRC constraints:
      set_clock_tree_options -max_capacitance 0.35
      set cts_push_down_buffer true
      split_clock_net -object clk

[Figure: after replication, the 2000 registers are driven by ~80 ICGs and the 3000 registers by ~120 ICGs, each ICG with a fanout of ~25 and a load below 0.35pF; the 1000 ungated registers are driven through inserted buffers]

Details on Replicate Clock Gates: Example 3

• Consider the following scenario:
  – Root clock net clk drives
    - 1000 ungated registers
    - Clock gate cg1, which drives 2000 registers
    - Clock gate cg2, which drives 3000 registers
  – You would like the clock gates driven by net clk to be balanced based on a maximum fanout constraint of ~1000
• Solution
  – Set the following DRC constraints (specify a large maximum capacitance and maximum transition constraint, so that the tool chooses the maximum fanout constraint as the tightest constraint)
      set_clock_tree_options \
          -max_capacitance 10000 \
          -max_transition 10000 \
          -max_fanout 1000
      split_clock_net -object clk

[Figure: after replication, the 2000 registers are driven by 2 ICGs and the 3000 registers by 3 ICGs, each ICG with a fanout of ~1000; the 1000 ungated registers remain on clk]

Details on Replicate Clock Gates: Example 4

• Consider the following scenario:
  – Root clock net clk drives
    - 1000 ungated registers
    - Clock gate cg1, which drives 200 registers
    - Clock gate cg2, which drives 3000 registers
    - Clock gate cg3, which drives 195 registers
  – You would like the clock gates driven by net clk to be balanced based on a maximum fanout constraint of ~200
• Solution
  – Replicate the clock gate cg2 such that the fanout of each replicated instance is ~200
      set_clock_tree_options \
          -max_capacitance 10000 \
          -max_transition 10000 \
          -max_fanout 200
      split_clock_net -object cg2

[Figure: after replication, the 3000 registers are driven by ~15 ICGs with a fanout of ~200 each; cg1 (200 registers), cg3 (195 registers), and the 1000 ungated registers are unchanged]
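The ICG counts quoted in the replication examples follow from dividing each register count by the resulting fanout per ICG. The check below reproduces that arithmetic; the function name `replicas_needed` is introduced only for this example, and the per-ICG fanout of about 25 under the 0.35 capacitance limit is taken from the example figures rather than derived.

```python
import math


def replicas_needed(register_count, fanout_per_icg):
    """Number of ICG instances needed so that none exceeds fanout_per_icg."""
    return math.ceil(register_count / fanout_per_icg)


# Examples 1 and 2: the capacitance limit works out to a fanout of about 25.
cap_limited = (replicas_needed(2000, 25), replicas_needed(3000, 25))
# Example 3: maximum fanout of 1000.
fanout_1000 = (replicas_needed(2000, 1000), replicas_needed(3000, 1000))
# Example 4: only cg2 (3000 registers) is split, with a maximum fanout of 200.
fanout_200 = replicas_needed(3000, 200)

print(cap_limited, fanout_1000, fanout_200)
```

The actual tool also clusters registers by placement location, so real counts are approximate, as the "~" in the examples indicates.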

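Additional Consideration With Discrete Clock Gating Cells

• Clock skew between latch and AND gate
  – Clock at B later than A
  – Skew > latch delay

[Figure: discrete clock gate in which CLK reaches the latch at point A and the AND gate at point B, with EN latched to EN1; waveforms of CLK @ A, EN, EN1, CLK @ B, and GCLK show the skew and latch delay and the resulting glitch on GCLK]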

Using Discrete Clock Gating Cells

• In Design Compiler and Physical Compiler,
  – Do not ungroup the clock gating hierarchy
  – Enable group bounds to place the elements of the clock gate (latch and AND gate) close together
      set physopt_disable_auto_bound_for_gated_clock false
• In Astro,
  – Place the latch and AND gates close together
    - Specify a large netweight on the net
  – Get the clock to go through the latch, that is, ignore the CLK pin of the latch as a sync pin
    - Use the astSetClockNonStop command

Refer to SolvNet article 003097