Designing a chip Challenges, Trends, and Latin America Opportunity Victor Grimblatt R&D Group Director SASE 2012


© Synopsys 2012 1

Designing a chip

Challenges, Trends, and Latin America Opportunity

Victor Grimblatt

R&D Group Director

SASE 2012


Agenda

Introduction

The Evolution of Synthesis

SoC

IC Design Methodology

New Techniques and Challenges

IP Market, an opportunity for Latin America


Introduction

Interesting Facts from Cisco

• Last year’s mobile data traffic was eight times the size of the entire global Internet in 2000
• Global mobile data traffic grew 2.3-fold in 2011, more than doubling for the 4th year in a row
• Mobile video traffic exceeded 50% for the first time in 2011
• Average smartphone usage nearly tripled in 2011
• In 2011, a 4th generation (4G) connection generated 28x more traffic on average than a non-4G connection

Source: Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2011–2016, Feb 14, 2012

A Decade of Digital Universe Growth

Bandwidth Increase Drives Exploding Need for Bandwidth and Storage

[Chart: size of the digital universe in exabytes (axis 0 to 8000) for 2005, 2010, and 2015; value labels: 130 Exabytes, 1.2 Zettabytes, 7.910 Zettabytes]
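The growth implied by the three value labels on this chart can be worked out with a few lines of Python. This is only a sketch: it assumes the labels map to the years in ascending order (130 exabytes in 2005, 1.2 zettabytes in 2010, 7.910 zettabytes in 2015) and that 1 zettabyte = 1000 exabytes; the function names are made up for illustration.

```python
# Assumption: 1 zettabyte = 1000 exabytes (decimal units).
EXABYTES_PER_ZETTABYTE = 1000

# Assumed mapping of the chart's value labels to its three years.
digital_universe_eb = {
    2005: 130.0,
    2010: 1.2 * EXABYTES_PER_ZETTABYTE,
    2015: 7.910 * EXABYTES_PER_ZETTABYTE,
}


def growth_factor(start_year: int, end_year: int) -> float:
    """How many times larger the digital universe is at end_year."""
    return digital_universe_eb[end_year] / digital_universe_eb[start_year]


def compound_annual_growth(start_year: int, end_year: int) -> float:
    """Constant yearly growth rate that would produce the same factor."""
    years = end_year - start_year
    return growth_factor(start_year, end_year) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    print(f"2005 -> 2015 growth: {growth_factor(2005, 2015):.1f}x")
    print(f"Implied annual growth: {compound_annual_growth(2005, 2015):.1%}")
```

Under these assumptions the decade works out to roughly a 60-fold increase, or about 50% growth per year.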

• One zettabyte = stacks of books from Earth to Pluto 20 times (72 billion miles)
• If an 11 oz. cup of coffee equals 1 gigabyte, then 1 zettabyte would have the same volume as the Great Wall of China

Source: IBS and Cisco


Tomorrow’s World

Reality Augmented Reality Blended Reality

Search Agents Info That Finds You

(and networks that know you)

2D 3D Immersive Video Holographics

Medical Mobile Medical Personal Medical

Person to Person Machine to Machine

Human Machines

What the Future Has in Store

How Does This Affect Design?

Megatrends Change Design Requirements

Used to Be…          Today It’s…
Computing            Connectivity
Creating Info        Consuming Info
Compute Power        Battery Power
Business             Consumer
At your desk         Anywhere, anytime
Work                 Entertainment

Trends Drive Process Migration

[Chart: share of designs by process node (≥250nm, 180nm, 130nm, 90nm, 65/55nm, 45/40nm, 32/28nm, 22/20nm, <20nm) for last, current, and next designs; y-axis 0% to 35%. Value labels recovered from the chart: 3%, 5%, 6%, 5%, 13%, 20%, 31%, 13%, 4%]

Synopsys Global User Survey, Feb 2012 (N = 1290)

and Increasing Gate Count

Gate count    2010    2011
2-5M          6%      7%
5-10M         4%      9%
10-20M        5%      5%
20-50M        3%      7%
50-100M       3%      6%
>100M         3%      13%

[Chart y-axis: 0% to 50%]

Synopsys Global User Survey, Feb 2012

and Faster Designs

[Chart: share of designs by clock frequency (≤50MHz, 51-100MHz, 101-200MHz, 201-300MHz, 301-400MHz, 401-500MHz, 501-750MHz, 751MHz-1GHz, 1-2GHz, >2GHz) for each year from 2004 to 2011; y-axis 0% to 100%; one segment is labeled 42%]

Synopsys Global User Survey, Feb 2012 (N = 962)

… while requiring aggressive Power Management

[Chart: power management techniques in use, 2010 vs. 2011; stacked y-axis 0% to 400%]

Techniques in the chart legend:
• Other
• Back-biasing/Well-biasing
• Library Variables (e.g., multi-channel-length libraries)
• Low Vdd Standby
• State retention
• MTCMOS/Power gating
• Lower Vdd operation
• Dynamic Voltage/Frequency Scaling (DVFS)
• Multi-Corner, Multi-Mode (MCMM) optimization
• Multi-voltage domains
• Multi-Vt leakage optimization
• Clock gating

Synopsys Global User Survey, Feb 2012 (N = 282)

Design Challenges are Multiplying: Example of 28-nm challenges

• Unidirectional Poly (and other RDRs)
– Requires separate layouts, verification & test effort. GF and TSMC have different preferred orientations (N/S v. E/W)
– No poly for local routing
• Device segmentation
– Limited device sizes, large analog devices broken up into smaller pieces; increases analog area
• Complexity
– Approximately 1700 design rule checks at 28nm vs. 700 at 65nm
– 8x the # of corners at 28nm v. 65nm
– Lower Vddmin resulting in less design headroom
– Metal resistance doubles from 40 nm to 28 nm
• Global versus local Vth variations due to random doping effects
• Device Aging
– Must take into account device degradation over time due to threshold voltage instability (NBTI/PBTI) and mobility degradation (HCI)

[Figure: 40 nm layout vs. 28 nm analog layout; the 28 nm layout is 9% larger than 40 nm due to limitations on poly area]

28 nm is 2X harder than 40 nm
28 nm IP – area increases without circuit innovation

SoC = System (Software) on a Chip

Software is Half the Time to Market For a Typical SoC!

[Chart: HW & SW Development Costs in $M ($0 to $2.50) by month (1 to 27), stacked by activity: IP Qualification, Spec Development, RTL Development, RTL Verification, Physical Design, Masks, Post-silicon Validation, Design Management, OS Support, Low-Level SW, App-Specific SW]

Source: IBS, Synopsys

… And Half the Cost

[Chart: hardware and software cost ($M, $0 to $175) by feature dimension (transistor count): 90nm (60M), 65nm (90M), 45/40nm (130M), 32/28nm (180M), 22/20nm (240M)]

Source: IBS and Synopsys, 2011

Unlike Moore… Software Guys are Pessimists

Page’s Law (2009): “Software gets twice as slow every 18 months.”

Wirth’s Law (1995): “Software is getting slower more rapidly than hardware becomes faster.”

What Can We Do About It?

The Evolution of Synthesis

Placement & Routing
Ronald L. Rivest, Charles M. Fiduccia, Robert M. Mattheyses, GE & MIT, 1982
Source: GE, 1986

Logic Synthesis
David Gregory, Karen Bartlett, Aart J. de Geus, Gary D. Hachtel, GE & University of Colorado at Boulder, 1986

Until Late 80’s: The Implementation Flow Was Quite Straightforward
There Was Already a “Wall”…

Front-End
• Schematic Capture
• Timing Simulation

Back-End
• Place & Route
• DRC/LVS

Early 90’s: The Relationship Badly Needs Improvement
“Walls” Now Lead to Iterations, Often Out of Control

Sign-Off
• Delay Calculation
• Timing Simulation

Front-End
• RTL Simulation
• Logic Synthesis

Back-End
• Place & Route
• DRC/LVS

Early 00’s, 130nm, 7+ Metals: PC and Astro+Blast+SilEnsemble – The Relationship Matures
Still, Too Many “Walls”, and # of Iterations Too High

Front-End
• RTL Simulation
• Logic, Power & Test Synthesis
• Floorplan

Back-End
• Physical Synthesis
• Floorplan
• P&R

Sign-Off
• Extraction & STA
• DRC/LVS

The Evolution Of The Relationship: Convergence!

• 2003, 90 Nanometers: “Interoperability”
• 2005, 65 Nanometers: “Correlation”
• 2007, 45/40 Nanometers: “Look Ahead”
• 2009…, 32/28 Nanometers: “In-Design”

The Evolution Of The Relationship: Quick Summary

• Late 80’s - Early 90’s. Attempt #1:
– Predict the future based on the past
– Wire load models, broken by nanometer wires
• Mid 90’s. Attempt #2:
– Predict the future based on the present
– Front-end floorplanning, broken by “Frankenstein flows”
• Late 90’s – Today. Attempt #3:
– Partner to create the future, rather than attempt to predict it
– Convergence of synthesis and place & route
– But underlying mathematics is different


Logic Synthesis And Place & Route A Revolutionary… Evolution : Convergence !

Logic Compiler, ca. 1986 Design Compiler, 2010.03

From Equations to Gates, to… Placed and Routable Gates


SoC

What is High-Level Synthesis?

Designer Intent → Automation using High-Level Synthesis → HLS Results

User inputs:
• High-level algorithm
• Constraints

HLS outputs:
• Synthesizable RTL
• C-model
• RTL testbench
• Scripts for synthesis, verification and downstream tools

Design technology and methodology
• Develop and verify hardware at a higher level of abstraction
– Much smaller code with fewer bugs introduced
– Rapid architecture exploration
• Automate implementation and verification
– Automatic optimizations that equal hand-coded QoR
– Eliminate manual RTL coding & verification

Example benefits
• 2-5X productivity for initial designs
• 5-10X productivity for design re-use
• Increased exploration leading to better results
• Multi-million gate designs in weeks vs. months

High-Level Synthesis Advantage

Traditional Block Design
• Stages: Algorithm Design, RTL Coding, Architecture Exploration, RTL Verification, Implementation
• Annotations: Spreadsheets; For single architecture only; Cycle by cycle functional debug

HLS-based Block Design
• Stages: Algorithm Design, High-Level Design, RTL Verification, Implementation
• Annotations: Faster design at higher abstraction; RTL automatically generated; Quickly evaluate multiple architectures; Faster, more automatic model-to-RTL validation, reduced RTL-level debug

Better Designs, Faster

Changing FPGA Design Methodology

Classic FPGA Methodology: Top Down Implementation
• Best Quality of Results
• May not be suitable for largest FPGA designs (long runtimes and large memory requirements)

“Divide and Conquer”: Top Down Incremental Implementation
• Reduced Quality of Results
• Shorter runtime - preserve unchanged parts
• “Design Preservation”, block based flows, and Incremental P&R with “SmartGuide”

Emerging: “Mix and Match” Bottom Up and Top Down Flow
• Distributed development
• Better design preservation and isolation
• Design style adjustments needed to achieve optimal timing Quality of Results (e.g. registering module boundaries)

Unified RTL Flow for FPGA and SOC

Common RTL from prototype to production: a combination of IP and tools

All DW Building blocks, minPower and Macrocell Blocks are supported in Synplify Premier and Certify for FPGA-based prototyping

[Diagram: Your IP, DesignWare IP, and DesignWare Building Blocks feed two paths: FPGA Synthesis (DW Implementation, Synplify Premier/Certify) and ASIC Implementation (DW Implementation, Galaxy)]

Today’s SOC Designs

• Designs are getting larger and larger.
• Schedule stays the same or shorter despite the increases in design complexity.
• Engineering resources are not increasing to handle this complexity.

How can EDA help manage this complexity?

Many Methods of Designing “SOC Design”… Similar Approach But End Results Vary …

Instructions

1. Preheat the oven to 450.
2. Melt butter and chocolate together in the top of a double boiler or in the microwave. Add sea salt.
3. Meanwhile, beat together the egg, egg yolks, and sugar with a whisk or an electric beater until light and slightly foamy.
4. Add the egg mixture to the warm chocolate; whisk quickly to combine. Add flour and stir just to combine. The batter will be quite thick.
5. Butter small ramekins, or use Reynolds foil cupcake liners.
6. Divide the batter evenly among the ramekins. (You can make the cakes in advance to this point and chill them until you're ready to bake. Be sure to bring the batter back to room temperature before baking.)
7. Baking time will depend on your oven; start with 7 minutes for a thin outer shell with a completely molten interior.
8. Melt a little more chocolate to drizzle on top. Sprinkle a little more salt, and serve with berries or ice cream.

[Figure labels: Building Blocks; Instructions; Final Product Varies]

Ever Increasing Chip Size Leads to Hierarchical Design

[Figure: typical threshold between flat and hierarchical design along an instance-count axis (3M, 5M, 15M, 100M+ …)]

Ten Best Practices for Hierarchical Design: Understanding These Practices Can Help

#1 Floorplan: Affects design closure
#2 Top-Level Style: Requires different discipline
#3 Block Size: Tradeoff size versus TAT
#4 Modeling: Modeling for top-level closure
#5 Top-Level Closure: Meeting the inter-block signals
#6 Block-Level I/O Paths: Affects block design closure
#7 Block-Level Drivers/Loads: Affects block boundary closure
#8 Inter-Block Critical Paths: Absence helps chip closure
#9 Constraints Management: Affects design closure & TAT
#10 Signoff STA: Correlates to close timing

#1 Floorplan: Affects Design Closure

• Partitioning Guidelines
– Logical connectivity
– Clock
– Voltage areas
– Physical size
– Multiple Instantiated Modules (MIM)
• Macro Placement
• Power Planning
• IO Planning

[Figures: Example 1 and Example 2, each contrasting a challenging floorplan with a better approach]

#2 Top-Level Style: Requires Different Design Discipline

[Figure: three top-level styles (Abutted, Narrow Channel, Channel) showing clock and data routing, with an implementation complexity axis]

#3 Block Size: Tradeoff Size versus TAT (turnaround time)

• Smaller blocks: faster TAT per block but more blocks to integrate
• Larger blocks: longer TAT per block but fewer blocks to integrate

[Figure: example partitions using blocks of 1.5M, 2M, 3M, and 5M. Note: block size in instances]

What size is reasonable depends a lot on design team preference.

#4 Modeling: ETM vs. Abstract Model

Extracted Timing Model (ETM)
• Blocks modeled by timing arcs only
• Used for customized IP

Abstract Model
• Interface cells of each block retained
• Recommended for P&R blocks

#5 Top-Level Closure: Meeting Timing on Inter-Block Signals

• Closing top-level inter-block signals can be challenging
• Can be minimized with
– Proper estimation of interface constraints
– Proper floorplanning for signal connectivity between blocks
• Simultaneous optimization of top-level and inter-block signals needed

#6 Block Level I/O Paths: I/O Paths Are Typically Not Finalized Early

Typical Hierarchical Structure
• I/O paths are not finalized during early stage block design
• Overconstraining these paths directs the tool to focus on I/O paths instead of the intra-block paths
• Accuracy of proportional time budgets is affected if interfaces are still changing

[Figure: the block under design between two adjacent blocks, with logic on both sides of each block boundary between registers]

#6 Block Level I/O Paths: Registering Block Outputs Makes Budgeting Easier

A Better Approach
• Registering block outputs makes budgeting easier and less dependent on completeness of the netlist
• Re-partitioning logic hierarchy helps manage constraints complexity
• Partitioning according to power domains / logic hierarchy makes flow easier

[Figure: the block under design between two adjacent blocks, with registers placed at the block outputs]
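Proportional time budgeting, as mentioned for block-level I/O paths, can be sketched as follows. This is a minimal illustration in Python, not how any Synopsys budgeting command works internally: the segment names and delay estimates are hypothetical, and the clock period is split across the segments of an inter-block path in proportion to each segment's estimated delay.

```python
def proportional_budget(clock_period, estimated_delays):
    """Split clock_period across path segments in proportion to estimated delay.

    estimated_delays maps a segment name to its estimated delay (same unit
    as clock_period). Returns a dict of segment name -> budgeted time.
    """
    total = sum(estimated_delays.values())
    if total <= 0:
        raise ValueError("estimated delays must sum to a positive value")
    return {
        name: clock_period * delay / total
        for name, delay in estimated_delays.items()
    }


# Hypothetical inter-block path: Block A output logic, top-level route,
# Block B input logic (delays in ns, made up for illustration).
estimates = {"block_A_output": 0.6, "top_level_route": 0.3, "block_B_input": 0.9}
budgets = proportional_budget(2.0, estimates)

if __name__ == "__main__":
    for segment, budget in budgets.items():
        print(f"{segment}: {budget:.3f} ns")
```

If an interface is still changing, the estimates change and every budget on the path moves with them, which is why registering block outputs (so that one segment shrinks to nearly zero) makes the budgets more stable.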

#7: Block Level Drivers and Loads: Modeling I/O with Realistic Values Drives Convergence

• When designing Block A, need to consider load at output port A
– set_load
• When designing Block B, need to consider driving cell at input port B
– set_driving_cell

[Figure: Block A output port A connected to Block B input port B]

• Block interface timing is one of the toughest issues in hierarchical flow
• Realistic model of your input and output ports helps design convergence

#7: Block Level Drivers and Loads: Inter-block Paths Are One Of The Toughest SOC Challenges

• Without good estimation of loads and driving cell
– Integrating these blocks forces unnecessary iterations to meet timing
• Budgeting can automatically generate driver and load information
– Generate a quick netlist to run through budgeting for more accurate results

[Figure: if no load is specified, the cell cannot be sized correctly]
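Why a missing load leads to a wrongly sized cell can be shown with a toy model. The sketch below assumes a simple linear delay model (delay = intrinsic delay + drive resistance × load) and a three-cell buffer family with made-up names and numbers; real libraries use table-based models, so treat this only as an illustration of the effect.

```python
# Hypothetical buffer family, smallest first:
# (name, area, intrinsic delay in ns, drive resistance in ns per pF)
CELLS = [
    ("BUF_X1", 1.0, 0.05, 2.0),
    ("BUF_X2", 1.8, 0.06, 1.0),
    ("BUF_X4", 3.2, 0.07, 0.5),
]


def cell_delay(cell, load_pf):
    """Linear delay model: intrinsic delay plus resistance times load."""
    _, _, intrinsic, resistance = cell
    return intrinsic + resistance * load_pf


def pick_smallest_cell(load_pf, max_delay_ns):
    """Return the name of the smallest cell meeting max_delay_ns, or None."""
    for cell in CELLS:
        if cell_delay(cell, load_pf) <= max_delay_ns:
            return cell[0]
    return None


if __name__ == "__main__":
    # No load specified on the output port: the smallest cell looks fine.
    print(pick_smallest_cell(load_pf=0.0, max_delay_ns=0.2))
    # Realistic load: the same choice would violate timing at integration.
    print(pick_smallest_cell(load_pf=0.2, max_delay_ns=0.2))
```

With no load the optimizer happily keeps the smallest buffer; once the real 0.2 pF load appears at integration, that buffer misses the target and the block has to be re-optimized.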

#8: Inter-Block Critical Paths: Absence Helps Chip Closure

If the tool cannot see the complete path, it may be a challenge to stitch the pieces together at top-level

• Avoid critical paths crossing multiple blocks
– Makes timing closure difficult
• Contain them within the same block or, if you must cross multiple blocks, minimize the number of blocks
• Budgeting, sizing, and load estimations are needed to solve inter-block critical path violations

[Figure: block-to-block path crossing top; top-to-block path]

#8: Inter-Block Critical Paths: Shielding Helps Chip Closure

• Use shielding to reduce crosstalk effects between the block- and top-level to significantly improve timing closure in inter-block critical paths
• Use new Transparent Interface Optimization (TIO) in IC Compiler

[Figure: without shielding vs. with shielding]

#9: Constraints Management: Pay Attention to Constraints

• Infeasible paths are paths on which timing is impossible to meet
– Missing false path/multi-cycle path constraints
– Unreasonable input/output delay constraints
• Other things to watch out for
– size_only attributes
– dont_touch attributes
– Multi-cycle paths
– False paths
– Etc.

[Figures: e.g., infeasible path, insufficient for 1 clock cycle; e.g., infeasible path, input delay too large]

#10 Signoff Correlation: Tighter Correlation Helps Close Timing

• Use IC Compiler signoff correlation checker system
– Performs both consistency and correlation check with user controllable accuracy level
– Supports both pre-route and post-route checks

#10 Signoff Correlation Flows: Flows for Pre-route and Post-route Correlation Checks

• Focus on environment and library setup for pre-route correlation
• Certain variables for correlation may have runtime and/or QoR impact on optimization
• Correlation setup may change and re-check may be needed for post-route

[Figure: Pre-Route Flow]

Today’s Designs Are Big & Hierarchical

Timing Signoff Challenges
• More effects, more variation
– Impacts accuracy vs. runtime
• Hierarchical P&R vs. flat signoff
– Large machines and runtime
– Interactions between top & block
• 30-40% blocks are tough to close
– 10 to 20 ECO iterations
• Lots of scenarios to analyze
– more machines, more reports

Source: L. Besson, STMicroelectronics


The Nanometer Challenges Top Issues to Look at

Source: ITRS 2009; C.A. Malachowsky, NVIDIA, EDPS 2009; P. Saxena, Intel, ISPD 2003

(1) SION Dielectric/Polysilicon Gate; (2) High-k Dielectric/Metal Gate


IC Design Methodology

But, Synthesis has Evolved

• Synthesis has evolved beyond logic mapping
• It’s now predicting and resolving congestion for physical design
• Evolution of synthesis prediction of physical effects is key to progress

And, Physical Design Under Heavy Load

• Increasingly, Physical Design is the driver for implementation schedule
• It’s where the rubber meets the road – speed, die-size, power, yield ..
• P&R evolution key to progress

What’s on Designer’s Mind? Design & Project Management!

• Is everyone using the same tool version and the standard scripts?
• How close are we to our design goals?
• What’s the status of the blocks right now?
• How can I use the experience from this project to plan the next one better?
• How much compute and license resources are we using?
• What’s taking up the most time? Which step? Which block?


Many Flavors Of “Methodology”… Imagination Is the Only Limit…

Source: www.bk.com 2010

Past “Guidance” doesn’t Always Apply to the Present

• create_clock -period [0.7 * target] for high performance
• set_max_area to “0” for small area
• Use small blocks for fast turnaround time

Things have changed but users are still using the above techniques!

[Diagram: how Design Planning, Synthesis, Place & Route, DRC/LVS, and Signoff relate in each era: 2000-2005 “Correlation”, 2005-2008 “Look-ahead”, 2009-2010 “In-Design”, 2011- “Exploration” (with separate Exploration and Implementation)]

The Past vs. The Present

Wireload Model (WLM) results in higher frequency during Synthesis than using Design Compiler Topographical (DCT) technology …

• With WLM, these two circuits (Figure 1 and Figure 2) have the same delay
• With DCT, the delay is a reflection of the x-y location of the cells

Which is more realistic?
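The contrast between a wireload model and a placement-aware estimate can be sketched in Python. This is an assumption-laden toy, not Design Compiler's actual algorithm: the wireload estimate depends only on fanout, while the placement-aware stand-in uses the half-perimeter of the pins' bounding box; the capacitance coefficients and pin coordinates are invented.

```python
def wlm_wire_cap(fanout, cap_per_fanout=0.01):
    """Wireload-model style estimate: wire capacitance from fanout alone."""
    return fanout * cap_per_fanout


def placed_wire_cap(pins, cap_per_unit_length=0.002):
    """Placement-aware stand-in: half-perimeter wirelength times cap per unit.

    pins is a list of (x, y) cell locations connected by the net.
    """
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    half_perimeter = (max(xs) - min(xs)) + (max(ys) - min(ys))
    return half_perimeter * cap_per_unit_length


# Two nets with the same fanout (one driver, one load) but different placement.
near_net = [(0, 0), (2, 1)]
far_net = [(0, 0), (40, 30)]

if __name__ == "__main__":
    print("WLM:", wlm_wire_cap(1), wlm_wire_cap(1))
    print("Placed:", placed_wire_cap(near_net), placed_wire_cap(far_net))
```

The fanout-only estimate cannot tell the two nets apart, so it is optimistic for the long one; the location-based estimate separates them, which is the point of the slide's question.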

Ten Best Practices for Design Methodology

#1 Libraries: Know Your Attributes
#2 Setup: Correlation and Runtime
#3 Scripts: Impacts Your Design
#4 Constraints: Watch Your Constraints
#5 Analyze: Analyze-Fix-Proceed
#6 Methodology: One or Two Flows
#7 Optimization: Adjust Accordingly
#8 Signoff: Review Your Environment
#9 Performance: Leverage Your EDA Partner
#10 Low Power: Architecture Drives Power

#1 Libraries: Know Your Attributes

Why is my design larger in area?
Why is it taking so long to run?

Watch for dont_use, dont_touch, and size_only usage in your libraries and scripts
• Attributes are user-controlled to guide optimization
• Restricting optimization may lead to problems

[Figure: original area vs. new area after optimization]

Technology and IP: Make Sure to Have a Good Quality Library

• A properly designed set of library cells gives optimization engines more choice
– Avoid cells sensitive to minor changes in load, which impede convergence
– Footprint-equivalent cells are useful for final-stage optimization w/ minimal perturbation to other design metrics
– Std. cell pins should be on grid (especially complex cells with small drive strength: higher pin density)
– Multiple variants for each flop (drive strengths, delays, setup times, ..)
• Library quality is an enabler for targeted performance

[Figure: example of cell sensitivity to load uncertainty; delay vs. Cload curves for Cell A and Cell B around the point (C*, D*)]

#2 Setup: Correlation and Runtime

• Netlist v1.0, SDC v1.0: Compile gives 3.2M instances
• Netlist v1.1, SDC v1.1: Compile gives 6.8M instances?? What happened???

• Found issues after days of engineering work
• size_only on 3.7M cells
• SDC with all cells set with set_disable_clock_gating on

What do designers do when they run into these?

Review Your Settings and Input: Understand the Different Objectives

• DC Utility Checker: detects design issues and dirty constraint styles that can lead to bad runtime/memory and QoR
• ICC Utility Checker: detects readiness of physical design before going into various implementation stages
• PT Utility Checker: detects application variables, settings and design issues causing runtime or memory increase

#3 Scripts: Impacts Your Design

When someone tells you “Tool A” is X times faster than “Tool B”, you need to put things in perspective …

• First Step: review your script
– How was the script migrated to “Tool A”?
– Did you also update the script to leverage the latest technologies?
• Early stage of your design, think fast mode
• Final stage of your design, think QoR

[Figure: incomplete vs. complete]

Tool Input can Impact Results: Understand How the Tool Can Help Meet Design Goals

• Today’s design requires completeness
• Synopsys tools are tailored for performance, but they also have a mode to run fast
• Recommendations
– The typical complaint is long runtime, choose your goal setting accordingly
– Make sure your script is up to date for your end goal and to take advantage of the latest features

#4 Constraints: Watch Your Constraints

Symptoms of over-constraining: long runtime, excessive buffering and huge violations

[Figure: the original clock period divided into input delay, time available for logic, and output delay]

• Over-constraining could guide the tool to focus on artificial critical paths
• Over-constraining happens with
– Unrealistic input and/or output delays
– Tightening the clock period
– Specifying large clock uncertainty

Synopsys tools are designed to work towards meeting design goals… but don’t expect miracles!
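The clock-period picture on this slide reduces to simple arithmetic, sketched below in Python. The function name and the numbers are illustrative assumptions; the formula follows the figure (clock period minus input delay minus output delay), with clock uncertainty subtracted as well since the slide lists it as another source of over-constraining.

```python
def time_available_for_logic(clock_period, input_delay, output_delay,
                             clock_uncertainty=0.0):
    """Time left for the block's own logic on an input-to-output path."""
    return clock_period - clock_uncertainty - input_delay - output_delay


# Illustrative numbers in ns (not from the slide).
realistic = time_available_for_logic(2.0, input_delay=0.5, output_delay=0.5)
over_constrained = time_available_for_logic(
    2.0, input_delay=0.9, output_delay=0.9, clock_uncertainty=0.3
)

if __name__ == "__main__":
    print(f"Realistic constraints leave {realistic:.2f} ns for logic")
    print(f"Over-constrained setup leaves {over_constrained:.2f} ns for logic")
```

When the result is zero or negative, no amount of optimization can meet timing, and the tool spends its runtime buffering an artificial critical path.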

Understanding EDA Tool will help: Simple Illustration

Will DC do this transformation (Circuit A to Circuit B)?

              Circuit A    Circuit B
CLKA wns      -0.300       -0.280
CLKB wns      -0.100       -0.150

Cost = ∑ p_i * w_i

Default Weights       Delay Cost Before    Delay Cost After
CLKA weight = 1       0.30                 0.28
CLKB weight = 1       0.10                 0.15
Total WNS Cost        0.40                 0.43

Total cost increased (0.40 < 0.43): transformation rejected, worst WNS = -0.300

Adjusted Weights      Delay Cost Before    Delay Cost After
CLKA weight = 10      3.00                 2.80
CLKB weight = 1       0.10                 0.15
Total WNS Cost        3.10                 2.95

Total cost reduced (3.10 > 2.95): transformation accepted, worst WNS = -0.280
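The accept/reject decision in this illustration can be reproduced with a short Python sketch. It assumes, consistent with the numbers on the slide, that p_i is the magnitude of the worst negative slack of clock group i and w_i is that group's weight; the function names are made up, and the real cost function inside Design Compiler may include other terms.

```python
def total_wns_cost(wns_by_clock, weights):
    """Cost = sum over clock groups of weight * worst negative slack magnitude."""
    return sum(
        weights[clock] * max(0.0, -wns) for clock, wns in wns_by_clock.items()
    )


def transformation_accepted(before, after, weights):
    """A transformation is kept only if it lowers the total cost."""
    return total_wns_cost(after, weights) < total_wns_cost(before, weights)


# Slack values from the slide (Circuit A before, Circuit B after).
circuit_a = {"CLKA": -0.300, "CLKB": -0.100}
circuit_b = {"CLKA": -0.280, "CLKB": -0.150}

default_weights = {"CLKA": 1, "CLKB": 1}
adjusted_weights = {"CLKA": 10, "CLKB": 1}

if __name__ == "__main__":
    for label, weights in (("default", default_weights),
                           ("adjusted", adjusted_weights)):
        before = total_wns_cost(circuit_a, weights)
        after = total_wns_cost(circuit_b, weights)
        verdict = "accepted" if after < before else "rejected"
        print(f"{label}: {before:.2f} -> {after:.2f}, {verdict}")
```

Raising the weight on CLKA tells the optimizer that improving the worst clock matters more than a small degradation on CLKB, which is why the same transformation flips from rejected to accepted.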

#5 Analyze: Analyze-Fix-Proceed

• Push Button Flow does not exist
• Know your circuit to guide the tool

Synopsys Galaxy Implementation Flow

DC Graphical
• compile_ultra -spg
• insert_dft
• compile_ultra -spg -incr

IC Compiler
• place_opt -spg
• clock_opt
• route_opt
• signoff_opt

StarRC
• Signoff extraction

PrimeTime SI
• Signoff STA

Analyze results between design stages

#6 Methodology: One or Two Flows

Design specifications and constraints change constantly during the design cycle

• 180 nanometers (2000): 225K gates, 11 RAMs, 150 MHz
• 45 nanometers (2010): 96mm2, ~300M transistors, 7-9W

• One flow for both exploration & implementation
• Two flows: an exploration flow targeted at early specs & constraints, and an implementation flow for final design realization

Exploration Throughout Galaxy

DC Explorer
• Early RTL Exploration
– Accelerates Design Schedules

Design Compiler
• Look-ahead & Physical Guidance
– Creates a better starting point

IC Compiler
• Design Exploration
– Creates initial floorplan
• Block Feasibility
– Determines physical feasibility

Galaxy Constraint Analyzer
• Continuous improvement

[Diagram: exploration and implementation steps. RTL: RTL Exploration, RTL Synthesis. Physical: Design Exploration, Design Planning, Block Feasibility, Block Implementation]

#7 Optimization: Adjust Accordingly

Adjust your constraints to model effects of downstream design steps

An illustration, in Design Compiler:
• Account for clock trees
• No hold-timing fixing
• Be careful with critical range
• Do not over-constrain

Manage Design Constraints Throughout: Guidelines For Convergent Timing Closure

• Synthesis and placement
– Do not over-constrain during synthesis
– Use DC SPG flow
– Account for max_transition and clock uncertainty
– Specify pre-CTS estimated constraints
• CTS
– Remove pre-CTS estimated constraints
• Route
– Remove/adjust pre-route constraints
– Adjust crosstalk thresholds

Do not overcomplicate your flow

[Chart: timing closure profile, frequency (MHz, 800 to 1,100) across Synthesis, Place, Clock, and Route for three flows: RM (Baseline), Tuned For Hi-Performance/Low Power, and Addnl. Customization For High-Performance; value labels 1029, 971, 913]

#8 Signoff: Review your Environment

[Charts: runtime (CPU hrs, axis 0 to 128) and memory usage (GB, axis 0 to 60, with one bar labeled 172 GB) versus design size (1.1, 1.2, 5.5, 37.0, and 50+ million instances)]

Designs run at customer site using revised PrimeTime scripts and latest release version

Unlike wine, scripts grow stale with age


PrimeTime Scripts: Key Areas to Review

• Environment and setup

– Use latest release and ensure adequate hardware resources

• Reading parasitics

– Use binary parasitics when possible

• Multiple timing updates

– Eliminate redundant/legacy update_timing steps

• Inefficient TCL scripting and reporting

PrimeTime Design Utility Checker can help with some of these tasks

#9 Performance: Leverage Your EDA Partner

• Starting Point
– Built on Synopsys RM
– Understand the new technologies and features
– Easy to use
• Reduce time-to-results
– Automated methodology to achieve 90% of target quickly
– Additional advanced techniques to reach final goal
– Minimize number of iterations or “trial and error”
– Reduce ECO efforts

[Chart: design schedule (Synthesis, P&R, Signoff + ECO Iterations) for Typical Flow vs. HSLP Flow]

HSLP Implementation Best Practices: Reduces Time-to-Results

High Performance, Low Power (HSLP) Flow Requires Customization

[Chart: share of targets reached (75%, 90%, 100%) over time for Typical Flow on regular designs, Typical Flow on high performance designs, and HSLP Flow; with HSLP implementation best practices, design-specific customization reduces time-to-results]

#10 Low Power: Architecture Drives Power

[Figure: four power architectures built from 0.9V, 0.7V, and OFF domains: Multiple Voltage (MV) Domains; Multi-Supply with shutdown, No State Retention; Multi-Voltage with shutdown; Multi-voltage with shutdown & State Retention (SR)]

DESIGN TECHNIQUES
• Retention Registers
• Power Switches (MTCMOS)
• Level Shifters
• Isolation Cells
• Always-on Logic

[Figure: cell symbols for the techniques above, with pins such as VDD, VDDB, VDDI, VDDO, VSS, IN, OUT, EN]


New Techniques and Challenges


Leading The Way In 20nm Design

The Race to 20nm Is On!


The 20 nm Challenge: Single Exposure “Last Pitch With Single Exposure ~ 80 Nanometers…”

We Can Print This,… But We Cannot Print This

Source M. van den Brink, ASML, ITF 2009; P. Magarshack, STMicroelectronics, 2010

The Solution: Double Patterning: A Significant Change

We Can Print This, and This,… And Then This!


Synopsys Solution DPT Ready IC Compiler P&R, and IC Validator DRC

Source: Synopsys Research 2011

Wide Spacing Enforced Two-Color Decomposed Design
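Two-color decomposition for double patterning is commonly modeled as graph two-coloring: features that are too close to print on the same mask are joined by a conflict edge, and each connected group is colored with two masks. The Python sketch below shows that textbook formulation with a breadth-first search; it is not a description of how IC Compiler or IC Validator implement it, and the feature indices and conflict lists are hypothetical.

```python
from collections import deque


def two_color(num_features, conflicts):
    """Assign each feature to mask 0 or 1 so conflicting features differ.

    conflicts is a list of (a, b) pairs of feature indices that are too
    close to share a mask. Returns a list of mask assignments, or None if
    the layout has an odd conflict cycle and cannot be decomposed.
    """
    adjacency = [[] for _ in range(num_features)]
    for a, b in conflicts:
        adjacency[a].append(b)
        adjacency[b].append(a)

    mask = [None] * num_features
    for start in range(num_features):
        if mask[start] is not None:
            continue
        mask[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if mask[neighbor] is None:
                    mask[neighbor] = 1 - mask[node]
                    queue.append(neighbor)
                elif mask[neighbor] == mask[node]:
                    return None  # odd cycle: needs wider spacing or a redesign
    return mask


if __name__ == "__main__":
    # Four parallel wires, each too close to its neighbor: decomposable.
    print(two_color(4, [(0, 1), (1, 2), (2, 3)]))
    # Three mutually conflicting features: not decomposable into two masks.
    print(two_color(3, [(0, 1), (1, 2), (2, 0)]))
```

When the coloring fails, the usual remedies are the ones the slide names: enforce wider spacing so the conflict edge disappears, or change the layout.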


Synopsys Solution DPT Ready IC Compiler P&R, and IC Validator DRC

Source: Synopsys Research 2011


The Challenge: Planar CMOS Insufficient Performance, Excessive Power

32 Nanometer Planar Performance Power

Source: K. Kuhn, Intel, IDF 2011


The Solution: Non-Planar CMOS FinFET or Tri-Gate CMOS

22 Nanometer Tri-Gate Performance Power

Source: K. Kuhn, Intel, IDF 2011


The Solution: Non-Planar CMOS The First “Revolution”

Source: M. Bohr, Intel, YouTube 2011


There Are Many Flavors, But… Reality and Fantasy Are not the Same Thing !

FinFET Advantages: FinFET vs Planar Transistor

• Superior drive current
– Active region spans the fin height and thickness (3 sides)
– Ids ∝ (2*Hfin + Tfin) as opposed to just thickness for planar
• Reduced leakage
– Depleted substrate
• Enhanced electron mobility
– High-K gate oxide
– Metal gates in place of PolySilicon
– Strained silicon
– Multiple fins possible to increase total drive strength for higher performance

[Figure: FinFET (fin) vs. planar transistor (inversion layer)]

Source: Intel
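The proportionality Ids ∝ (2*Hfin + Tfin) amounts to an effective channel width per fin, and adding fins multiplies it. The Python sketch below works through that arithmetic; the fin dimensions are illustrative assumptions, not values from the slide or from any specific process.

```python
def effective_width_nm(fin_height_nm, fin_thickness_nm, num_fins=1):
    """Effective channel width: two sidewalls plus the top, per fin."""
    return num_fins * (2 * fin_height_nm + fin_thickness_nm)


# Hypothetical fin geometry, for illustration only.
FIN_HEIGHT_NM = 30
FIN_THICKNESS_NM = 10

single_fin = effective_width_nm(FIN_HEIGHT_NM, FIN_THICKNESS_NM)
three_fins = effective_width_nm(FIN_HEIGHT_NM, FIN_THICKNESS_NM, num_fins=3)

if __name__ == "__main__":
    print(f"One fin:    {single_fin} nm effective width")
    print(f"Three fins: {three_fins} nm effective width")
```

Because drive strength comes in whole-fin steps, device sizing in a FinFET process is quantized rather than continuous.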


This Is Not The End of Moore’s Law! But the Gap Between Intel and the Crowd Is Widening

Source: M. Bohr, Intel, IDF 2011

3D ICs: Technology Trends: Four Main Categories of “> 2D-IC” Ahead

1. Memory “Cube”
2. (Wide I/O) Memory “Cube” on Logic
3. Silicon Interposer
4. 3D Stack

[Figure labels: C4, TSV, Bump]


3D-IC Two Basic Configurations Emerging Addressing Gigascale Design Challenges

Silicon Interposer (2.5D)

• Horizontally connected dies

• Drivers: Consumer, Storage, Networking

• Benefits: Yield, Cost, TTM & Flexibility

3D-IC

• Vertically stacked dies with TSVs

• Drivers: Wireless handset, Processors

• Benefits: Performance, form factor

The “Memory Cube” Now

[Figure: 8 die stack, with dimensions labeled 50 microns and 560 microns]

Source: C.-G. Hwang, Samsung, IEDM 2006

IP Market, an opportunity for Latin America


IP

Intellectual property core, IP core, or IP block is a reusable unit of logic, cell, or chip layout design that is the intellectual property of one party

IP cores may be licensed to another party or can be owned and used by a single party alone

IP cores can be used as building blocks within ASIC chip designs or FPGA logic designs

IP

IP cores in the electronic design industry have had a profound impact on the design of systems on a chip

IP core licensors spread the cost of development among multiple chip makers

IP cores for standard processors, interfaces, and internal functions have enabled chip makers to put more of their resources into developing the differentiating features of their chips and to develop new innovations faster

Licensing and use of IP cores in chip design came into common practice in the 1990s

Semiconductor IP Market Segments

2011 Design IP Revenue: $1.9B

Segment                            Share
Microprocessors                    39%
DSP                                5%
Fixed Function (GPUs, Security)    15%
Wired Interfaces                   19%
Memory Cells/Blocks                10%
GP Analog/MS                       4%
Block Libraries                    1%
Physical libraries                 3%
Other IP                           4%

[Chart grouping label: Processors (CPUs, GPUs, DSPs)]

Source: Gartner, March 2012

Semiconductor IP Market Size and Synopsys Share

Year    Semiconductor IP Market Size ($M)    Synopsys Share
CY04    964.0                                7.9%
CY05    1,068.3                              7.6%
CY06    1,267.3                              7.3%
CY07    1,378.2                              7.2%
CY08    1,464.1                              7.2%
CY09    1,351.0                              9.1%
CY10    1,695.0                              11.3%
CY11    1,910.9                              12.4%

Source: Gartner, March 2012

Top Semiconductor IP Vendors

Rank  Company                     2010    2011    Growth    2011 Share
1     ARM Holdings                575.8   732.5   27.2%     38.3%
2     Synopsys                    191.8   236.2   23.2%     12.4%
3     Imagination Technologies    91.5    126.4   38.1%     6.6%
4     MIPS Technologies           85.3    72.1    -15.5%    3.8%
5     Ceva                        44.9    60.2    34.1%     3.2%
6     Silicon Image               38.5    42.8    11.2%     2.2%
7     Rambus                      41.4    38.9    -6.0%     2.0%
8     Tensilica                   31.5    36.3    15.2%     1.9%
9     Mentor Graphics             27.3    23.6    -13.8%    1.2%
10    AuthenTec                   19.6    22.8    16.3%     1.2%

Source: Gartner, March 2012
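The Growth and 2011 Share columns follow from the revenue columns and the CY11 market size of 1,910.9 ($M) on the previous slide. The Python sketch below recomputes them for the top two vendors; the helper names are made up, and because the published percentages were presumably derived from unrounded revenue, a recomputed value can differ from the table in the last digit.

```python
MARKET_SIZE_2011 = 1910.9  # CY11 semiconductor IP market size, $M

# (2010 revenue, 2011 revenue) as printed in the vendor table.
revenue = {
    "ARM Holdings": (575.8, 732.5),
    "Synopsys": (191.8, 236.2),
}


def growth_percent(previous, current):
    """Year-over-year growth in percent."""
    return (current - previous) / previous * 100.0


def share_percent(current, market_size):
    """Share of the total market in percent."""
    return current / market_size * 100.0


if __name__ == "__main__":
    for company, (rev_2010, rev_2011) in revenue.items():
        growth = growth_percent(rev_2010, rev_2011)
        share = share_percent(rev_2011, MARKET_SIZE_2011)
        print(f"{company}: growth {growth:.1f}%, 2011 share {share:.1f}%")
```

The same two formulas apply to every row, so a vendor's share can be sanity-checked against the market total whenever the table is updated.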

IP Vendors Also Need to Provide More Functions and Functionality

[Chart: average number of IP blocks per SoC (axis 0 to 120) and % design reuse (axis 10 to 70), 2005 to 2014; annotations mark IP Blocks and IP Subsystems]

Source: Semico, October 2010

Subsystems: The Next Evolution in The IP Market

What is a Subsystem?
• Complete Solution: HW, SW, Prototype
• Pre-integrated and Verified
• SoC Ready: Seamlessly Drop-in and Go


Thank You