© Synopsys 2012 1
Designing a chip
Challenges, Trends, and Latin America
Opportunity
Victor Grimblatt
R&D Group Director
SASE 2012
Agenda
Introduction
The Evolution of Synthesis
SoC
IC Design Methodology
New Techniques and Challenges
IP Market, an opportunity for Latin America
Interesting Facts from Cisco
Source: Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2011–2016, Feb 14, 2012
• Last year’s (2011) mobile data traffic was eight times the size of the entire global Internet in 2000
• Global mobile data traffic grew 2.3-fold in 2011, more than doubling for the fourth year in a row
• Mobile video traffic exceeded 50% for the first time in 2011
• Average smartphone usage nearly tripled in 2011
• In 2011, a fourth-generation (4G) connection generated 28x more traffic on average than a non-4G connection
A Decade of Digital Universe Growth
[Chart: size of the digital universe, 2005–2015 (y-axis in exabytes): 130 exabytes (2005), 1.2 zettabytes (2010), 7.91 zettabytes (2015); arrow label: Bandwidth Increase]
Drives Exploding Need for Bandwidth and Storage
• One zettabyte = stacks of books from Earth to Pluto 20 times (72 billion miles)
• If an 11 oz. cup of coffee equals 1 gigabyte, then 1 zettabyte would have the same volume as the Great Wall of China
Source: IBS and Cisco
Tomorrow’s World
Reality → Augmented Reality → Blended Reality
Search → Agents → Info That Finds You (and networks that know you)
2D → 3D → Immersive Video → Holographics
Medical → Mobile Medical → Personal Medical
Person to Person → Machine to Machine → Human Machines
Megatrends Change Design Requirements
Used to Be… → Today It’s…
Computing → Connectivity
Creating Info → Consuming Info
Compute Power → Battery Power
Business → Consumer
At your desk → Anywhere, anytime
Work → Entertainment
Trends Drive Process Migration
[Chart: process nodes of last, current, and next designs, ≥250nm to <20nm. Labeled values as extracted, by node: ≥250nm 3%, 180nm 5%, 130nm 6%, 90nm 5%, 65/55nm 13%, 45/40nm 20%, 32/28nm 31%, 22/20nm 13%, <20nm 4%]
Synopsys Global User Survey, Feb 2012, N = 1290
… and Increasing Gate Count
[Chart: share of designs by gate count, 2010 vs. 2011. Labels as extracted (2010 / 2011): 2-5M 6% / 7%, 5-10M 4% / 9%, 10-20M 5% / 5%, 20-50M 3% / 7%, 50-100M 3% / 6%, >100M 3% / 13%]
Synopsys Global User Survey, Feb 2012
… and Faster Designs
[Chart: distribution of design clock frequencies, ≤50MHz to >2GHz, 2004–2011; one labeled value: 42%]
Synopsys Global User Survey, Feb 2012, N = 962
… while requiring aggressive Power Management
[Chart: adoption of power management techniques, 2010 vs. 2011 (stacked, 0–400%)]
Techniques surveyed: Clock gating; Multi-Vt leakage optimization; Multi-voltage domains; Multi-Corner, Multi-Mode (MCMM) optimization; Dynamic Voltage/Frequency Scaling (DVFS); Lower Vdd operation; MTCMOS/Power gating; State retention; Low Vdd Standby; Library Variables (e.g., multi-channel-length libraries); Back-biasing/Well-biasing; Other
Synopsys Global User Survey, Feb 2012, N = 282
Design Challenges Are Multiplying: Example of 28-nm Challenges
• Unidirectional poly (and other restricted design rules, RDRs)
– Requires separate layouts, verification & test effort; GF and TSMC have different preferred orientations (N/S vs. E/W)
– No poly for local routing
• Device segmentation
– Limited device sizes: large analog devices are broken up into smaller pieces, which increases analog area
• Complexity
– Approximately 1700 design rule checks at 28nm vs. 700 at 65nm
– 8x the number of corners at 28nm vs. 65nm
– Lower Vddmin, resulting in less design headroom
– Metal resistance doubles from 40nm to 28nm
• Global versus local Vth variations due to random doping effects
• Device aging
– Must take into account device degradation over time due to threshold voltage instability (NBTI/PBTI) and mobility degradation (HCI)
[Figure: 40 nm layout vs. 28 nm analog layout; the 28 nm analog layout is 9% larger than the 40 nm one due to limitations on poly area]
28 nm is 2X harder than 40 nm
28 nm IP: area increases without circuit innovation
Software SoC = on a chip System
[Chart: HW & SW development costs ($M) by month, months 1–27, broken down into App-Specific SW, Low-Level SW, OS Support, Design Management, Post-silicon Validation, Masks, Physical Design, RTL Verification, RTL Development, Spec Development, and IP Qualification]
Source: IBS, Synopsys
Software Is Half the Time to Market for a Typical SoC!
[Chart: development cost ($M, 0–175), split into hardware and software, by feature dimension (transistor count): 90nm (60M), 65nm (90M), 45/40nm (130M), 32/28nm (180M), 22/20nm (240M)]
Source: IBS and Synopsys, 2011
… And Half the Cost
Unlike Moore…
Software Guys Are Pessimists
Page’s Law (2009): “Software gets twice as slow every 18 months.”
Wirth’s Law (1995): “Software is getting slower more rapidly than hardware becomes faster.”
Source: GE, 1986
Placement & Routing: Ronald L. Rivest, Charles M. Fiduccia, Robert M. Mattheyses, GE & MIT, 1982
Logic Synthesis: David Gregory, Karen Bartlett, Aart J. de Geus, Gary D. Hachtel, GE & University of Colorado at Boulder, 1986
Until the Late ’80s, the Implementation Flow Was Quite Straightforward
There Was Already a “Wall”…
Front-End:
• Schematic Capture
• Timing Simulation
Back-End:
• Place & Route
• DRC/LVS
Early ’90s: The Relationship Badly Needs Improvement
“Walls” Now Lead to Iterations, Often Out of Control
Sign-Off:
• Delay Calculation
• Timing Simulation
Front-End:
• RTL Simulation
• Logic Synthesis
Back-End:
• Place & Route
• DRC/LVS
Early ’00s, 130nm, 7+ Metals: PC and Astro+Blast+SilEnsemble, the Relationship Matures
Still Too Many “Walls”, and the Number of Iterations Is Too High
• RTL Simulation
• Logic, Power & Test Synthesis
• Floorplan
• Physical Synthesis
• Floorplan
• P&R Back-End
• Extraction & STA
• DRC/LVS Sign-Off
Front-End
The Evolution of the Relationship: Convergence!
2003, 90 Nanometers: “Interoperability”
2005, 65 Nanometers: “Correlation”
2007, 45/40 Nanometers: “Look Ahead”
2009…, 32/28 Nanometers: “In-Design”
The Evolution of the Relationship: Quick Summary
• Late ’80s to early ’90s, Attempt #1:
– Predict the future based on the past
– Wire load models, broken by nanometer wires
• Mid ’90s, Attempt #2:
– Predict the future based on the present
– Front-end floorplanning, broken by “Frankenstein flows”
• Late ’90s to today, Attempt #3:
– Partner to create the future, rather than attempt to predict it
– Convergence of synthesis and place & route
– But the underlying mathematics is different
Logic Synthesis and Place & Route: A Revolutionary… Evolution: Convergence!
Logic Compiler, ca. 1986 → Design Compiler, 2010.03
From Equations to Gates, to… Placed and Routable Gates
What Is High-Level Synthesis?
Designer intent (user inputs): high-level algorithm; constraints
→ Automation using High-Level Synthesis (HLS)
→ HLS results (outputs): synthesizable RTL; C-model; RTL testbench; scripts for synthesis, verification, and downstream tools
Design technology and methodology
• Develop and verify hardware at a higher level of abstraction
– Much smaller code with fewer bugs introduced
– Rapid architecture exploration
• Automate implementation and verification
– Automatic optimizations that equal hand-coded QoR
– Eliminate manual RTL coding & verification
Example benefits
• 2-5X productivity for initial designs
• 5-10X productivity for design re-use
• Increased exploration leading to better results
• Multi-million gate designs in weeks vs. months
High-Level Synthesis Advantage: Better Designs, Faster
Traditional Block Design: Algorithm Design → RTL Coding → RTL Verification → Implementation
• Architecture exploration in spreadsheets, for a single architecture only
• Cycle-by-cycle functional debug
HLS-based Block Design: Algorithm Design → High-Level Design → RTL Verification → Implementation
• Faster design at higher abstraction
• Quickly evaluate multiple architectures
• RTL automatically generated
• Faster, more automatic model-to-RTL validation; reduced RTL-level debug
Changing FPGA Design Methodology
Classic FPGA Methodology: Top-Down Implementation
• Best quality of results
• May not be suitable for the largest FPGA designs (long runtimes and large memory requirements)
“Divide and Conquer”: Top-Down Incremental Implementation
• Reduced quality of results
• Shorter runtime: preserve unchanged parts
• “Design Preservation”, block-based flows, and incremental P&R with “SmartGuide”
Emerging: “Mix and Match” Bottom-Up and Top-Down Flow
• Distributed development
• Better design preservation and isolation
• Design style adjustments needed to achieve optimal timing quality of results (e.g., registering module boundaries)
Unified RTL Flow for FPGA and SOC
• FPGA synthesis (DW implementation): Synplify Premier/Certify
• ASIC implementation (DW implementation): Galaxy
• Shared inputs: DesignWare Building Blocks, DesignWare IP, and your IP
Common RTL from prototype to production: a combination of IP and tools
All DW Building Blocks, minPower, and Macrocell blocks are supported in Synplify Premier and Certify for FPGA-based prototyping
Today’s SOC Designs
• Designs are getting larger and larger.
• Schedules stay the same or get shorter despite the increases in design complexity.
• Engineering resources are not increasing to handle this complexity.
How can EDA help manage this complexity?
Many Methods of Designing “SOC Design”… Similar Approach But End Results Vary …
Instructions
1. Preheat the oven to 450.
2. Melt butter and chocolate together in the top of a double boiler
or in the microwave. Add sea salt.
3. Meanwhile, beat together the egg, egg yolks, and sugar with a
whisk or an electric beater until light and slightly
foamy.
4. Add the egg mixture to the warm chocolate; whisk quickly to
combine. Add flour and stir just to combine. The batter will be quite
thick.
5. Butter small ramekins, or use Reynolds foil cupcake liners.
6. Divide the batter evenly among the ramekins. (You can make
the cakes in advance to this point and chill them until you're ready
to bake. Be sure to bring the batter back to room temperature
before baking.)
7. Baking time will depend on your oven; start with 7 minutes for a
thin outer shell with a completely molten interior.
8. Melt a little more chocolate to drizzle on top. Sprinkle a little
more salt, and serve with berries or ice cream.
Building Blocks
Instructions
Final Product Varies
Ever-Increasing Chip Size Leads to Hierarchical Design
[Figure: instance counts of 3M, 5M, 15M, 100M+ …, with the typical threshold between flat and hierarchical design marked]
Ten Best Practices for Hierarchical Design: Understanding These Practices Can Help
#1 Floorplan: Affects design closure
#2 Top-Level Style: Requires different discipline
#3 Block Size: Tradeoff size versus TAT
#4 Modeling: Modeling for top-level closure
#5 Top-Level Closure: Meeting the inter-block signals
#6 Block-Level I/O Paths: Affects block design closure
#7 Block-Level Drivers/Loads: Affects block boundary closure
#8 Inter-Block Critical Paths: Absence helps chip closure
#9 Constraints Management: Affects design closure & TAT
#10 Signoff STA: Correlates to close timing
#1 Floorplan Affects Design Closure
• Partitioning guidelines
– Logical connectivity
– Clock
– Voltage areas
– Physical size
– Multiply Instantiated Modules (MIM)
• Macro placement
• Power planning
• IO planning
[Figure: Examples 1 and 2, each comparing a challenging floorplan vs. a better approach]
#2 Top-Level Style Requires Different Design Discipline
[Figure: abutted, narrow-channel, and channel top-level styles, showing clock and data routing; axis: implementation complexity]
#3 Block Size: Tradeoff Size versus TAT (Turnaround Time)
[Figure: a partition into many small blocks (1.5M and 2M instances) vs. one with fewer large blocks (3M and 5M instances); block size in instances]
• Smaller blocks: faster TAT per block, but more blocks to integrate
• Larger blocks: longer TAT per block, but fewer blocks to integrate
What is a reasonable size? It depends a lot on design team preference.
#4 Modeling: ETM vs. Abstract Model
• Extracted Timing Model (ETM)
– Blocks modeled by timing arcs only
– Used for customized IP
• Abstract Model
– Interface cells of each block retained
– Recommended for P&R blocks
#5 Top-Level Closure: Meeting Timing on Inter-Block Signals
• Closing top-level inter-block signals can be challenging
• Can be minimized with
– Proper estimation of interface constraints
– Proper floorplanning for signal connectivity between blocks
• Simultaneous optimization of top-level and inter-block signals needed
#6 Block-Level I/O Paths: I/O Paths Are Typically Not Finalized Early
Typical Hierarchical Structure
• I/O paths are not finalized during early-stage block design
• Overconstraining these paths directs the tool to focus on I/O paths instead of the intra-block paths
• The accuracy of proportional time budgets suffers if interfaces are still changing
[Figure: block under design between two adjacent blocks; logic between register stages crosses the block boundaries]
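To see why changing interfaces hurt proportional budgets, the sketch below splits one clock period across the segments of an inter-block path in proportion to their estimated delays. It is a minimal illustration under stated assumptions: the function name, segment names, and delay numbers are hypothetical, and this is not the budgeting algorithm of any Synopsys tool.

```python
def proportional_budget(clock_period_ns, delay_estimates_ns):
    """Split one clock period across the segments of an inter-block path,
    in proportion to each segment's estimated delay (hypothetical model)."""
    total = sum(delay_estimates_ns.values())
    if total <= 0:
        raise ValueError("delay estimates must sum to a positive value")
    return {seg: clock_period_ns * d / total for seg, d in delay_estimates_ns.items()}


# Hypothetical 2.5 ns path: Block A output logic -> top-level route -> Block B input logic.
early = proportional_budget(2.5, {"block_a": 0.6, "top": 0.2, "block_b": 1.2})
# Block A's interface logic is not finalized; its estimate later grows to 1.0 ns.
later = proportional_budget(2.5, {"block_a": 1.0, "top": 0.2, "block_b": 1.2})

for seg in early:
    print(f"{seg}: {early[seg]:.3f} ns -> {later[seg]:.3f} ns")
```

Block B's budget shrinks even though Block B itself did not change, which is why registering block outputs (so each block owns a full cycle) makes budgeting less dependent on netlist completeness.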
#6 Block-Level I/O Paths: Registering Block Outputs Makes Budgeting Easier
A Better Approach
• Registering block outputs makes budgeting easier and less dependent on the completeness of the netlist
• Re-partitioning the logic hierarchy helps manage constraints complexity
• Partitioning according to power domains / logic hierarchy makes the flow easier
[Figure: block under design between two adjacent blocks, with registers at the block outputs]
#7 Block-Level Drivers and Loads: Modeling I/O with Realistic Values Drives Convergence
• Block interface timing is one of the toughest issues in a hierarchical flow
• A realistic model of your input and output ports helps design convergence
• When designing Block A, consider the load at output port A
– set_load
• When designing Block B, consider the driving cell at input port B
– set_driving_cell
[Figure: output port A of Block A drives input port B of Block B]
#7 Block-Level Drivers and Loads: Inter-Block Paths Are One of the Toughest SOC Challenges
• Without good estimation of loads and driving cells
– Integrating these blocks forces unnecessary iterations to meet timing
• Budgeting can automatically generate driver and load information
– Generate a quick netlist to run through budgeting for more accurate results
If no load is specified, the cell cannot be sized correctly.
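The last point can be made concrete with a simple linear delay model (intrinsic delay plus drive resistance times load). The sketch below picks the smallest cell that meets a delay target; with no load specified the port looks unloaded, so the smallest buffer wins and then fails once the real top-level load appears. The cell names, values, and the selection rule are all hypothetical, chosen for illustration; real sizing uses library timing tables, and only the output-load side (what set_load models) is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str
    intrinsic_ns: float  # delay at zero load
    r_drive_kohm: float  # output drive resistance (kOhm x pF = ns)
    area: float


# Hypothetical buffer family; the values are illustrative, not from a real library.
LIBRARY = [
    Cell("BUF_X1", 0.05, 4.0, 1.0),
    Cell("BUF_X2", 0.06, 2.0, 1.6),
    Cell("BUF_X4", 0.08, 1.0, 2.8),
]


def delay_ns(cell, c_load_pf):
    """Linear delay model: intrinsic delay plus R_drive * C_load."""
    return cell.intrinsic_ns + cell.r_drive_kohm * c_load_pf


def size_for_load(c_load_pf, max_delay_ns):
    """Return the smallest-area cell meeting max_delay_ns at this load, or None."""
    for cell in sorted(LIBRARY, key=lambda c: c.area):
        if delay_ns(cell, c_load_pf) <= max_delay_ns:
            return cell
    return None


unspecified = size_for_load(0.0, 0.20)  # no load given: the port looks unloaded
realistic = size_for_load(0.10, 0.20)   # a 0.10 pF load on the real top-level net
print(unspecified.name, realistic.name)
print(f"{unspecified.name} at the real load: {delay_ns(unspecified, 0.10):.2f} ns")
```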
#8 Inter-Block Critical Paths: Absence Helps Chip Closure
• Avoid critical paths crossing multiple blocks
– They make timing closure difficult
• Contain them within the same block or, if you must cross multiple blocks, minimize the number of blocks
• Budgeting, sizing, and load estimation are needed to solve inter-block critical path violations
If the tool cannot see the complete path, stitching it together at the top level may be a challenge.
[Figure: a block-to-block path crossing the top level, and a top-to-block path]
#8 Inter-Block Critical Paths: Shielding Helps Chip Closure
• Use shielding to reduce crosstalk effects between the block and top levels to significantly improve timing closure on inter-block critical paths
• Use the new Transparent Interface Optimization (TIO) in IC Compiler
[Figure: without shielding vs. with shielding]
#9 Constraints Management: Pay Attention to Constraints
• Infeasible paths are paths that cannot possibly meet timing, typically because of
– Missing false path/multicycle path constraints
– Unreasonable input/output delay constraints
• Other things to watch out for
– size_only attributes
– dont_touch attributes
– Multicycle paths
– False paths
– Etc.
[Figure examples: an infeasible path whose logic does not fit in one clock cycle; an infeasible path whose input delay is too large]
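The two infeasible-path examples above reduce to simple arithmetic on a port-to-port path: the time left for logic is the clock period minus the input and output delays. The sketch below is a rough screen under simplifying assumptions (no setup time, skew, or uncertainty); the function names and numbers are hypothetical and do not reproduce any tool's checks.

```python
def time_available(period_ns, input_delay_ns=0.0, output_delay_ns=0.0):
    """Time left for logic on a port-to-port path within one clock cycle."""
    return period_ns - input_delay_ns - output_delay_ns


def classify_path(min_logic_delay_ns, period_ns, input_delay_ns=0.0, output_delay_ns=0.0):
    """Rough screen for infeasible paths (simplified: ignores setup time,
    clock skew, and uncertainty)."""
    available = time_available(period_ns, input_delay_ns, output_delay_ns)
    if available <= 0:
        return "infeasible: input/output delays consume the whole cycle"
    if min_logic_delay_ns > available:
        return "infeasible: logic does not fit in one cycle (missing multicycle/false path?)"
    return "feasible"


print(classify_path(0.3, 1.0, input_delay_ns=0.2, output_delay_ns=0.2))
print(classify_path(0.5, 1.0, input_delay_ns=1.1))
print(classify_path(1.5, 1.0))
```

Flagging such paths early lets you fix the constraint (add the multicycle or false path, relax the I/O delay) instead of letting optimization chase a violation it can never close.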
#10 Signoff Correlation: Tighter Correlation Helps Close Timing
• Use the IC Compiler signoff correlation checker system
– Performs both consistency and correlation checks with a user-controllable accuracy level
– Supports both pre-route and post-route checks
#10 Signoff Correlation Flows: Flows for Pre-Route and Post-Route Correlation Checks
• Focus on environment and library setup for pre-route correlation
• Certain variables for correlation may have runtime and/or QoR impact on optimization
• Correlation setup may change, and a re-check may be needed for post-route
[Figure: pre-route flow]
Today’s Designs Are Big & Hierarchical
Source: L. Besson, STMicroelectronics
Timing Signoff Challenges
• More effects, more variation
– Impacts accuracy vs. runtime
• Hierarchical P&R vs. flat signoff
– Large machines and runtime
– Interactions between top & block
• 30-40% of blocks are tough to close
– 10 to 20 ECO iterations
• Lots of scenarios to analyze
– More machines, more reports
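The scenario count grows as a cross product: every operating mode must be checked at every corner, so the 8x growth in corners cited for 28nm multiplies the signoff workload directly. A minimal sketch; the mode and corner names are hypothetical placeholders, not a real corner set.

```python
from itertools import product


def scenarios(modes, corners):
    """Every (mode, corner) pair is a separate scenario to analyze."""
    return [f"{mode}@{corner}" for mode, corner in product(modes, corners)]


modes = ["func", "test", "sleep"]                           # hypothetical modes
corners_65 = ["ss_125c", "ff_m40c"]                         # hypothetical corners
corners_28 = [f"c{i}" for i in range(8 * len(corners_65))]  # 8x the corners

print(len(scenarios(modes, corners_65)), len(scenarios(modes, corners_28)))
```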
The Nanometer Challenges: Top Issues to Look At
Source: ITRS 2009; C.A. Malachowsky, NVIDIA, EDPS 2009; P. Saxena, Intel, ISPD 2003
(1) SiON Dielectric/Polysilicon Gate; (2) High-k Dielectric/Metal Gate
But, Synthesis Has Evolved
• Synthesis has evolved beyond logic mapping
• It now predicts and resolves congestion for physical design
• Continued evolution of synthesis’ prediction of physical effects is key to progress
And, Physical Design Is Under Heavy Load
• Increasingly, physical design is the driver for the implementation schedule
• It’s where the rubber meets the road: speed, die size, power, yield …
• P&R evolution is key to progress
What’s on the Designer’s Mind? Design & Project Management!
• Is everyone using the same tool version and the standard scripts?
• How close are we to our design goals?
• What’s the status of the blocks right now?
• How can I use the experience from this project to plan the next one better?
• How much compute and license resources are we using?
• What’s taking up the most time? Which step? Which block?
Many Flavors Of “Methodology”… Imagination Is the Only Limit…
Source: www.bk.com 2010
Past “Guidance” Doesn’t Always Apply to the Present
• create_clock -period [expr {0.7 * $target}] for high performance
• set_max_area 0 for small area
• Use small blocks for fast turnaround time
Things have changed, but users are still using the above techniques!
[Figure: evolution of the implementation flow (Design Planning, Synthesis, Place & Route, Signoff, DRC/LVS): 2000–2005 “Correlation”, 2005–2008 “Look-ahead”, 2009–2010 “In-Design”, 2011– “Exploration”, with the flow split into exploration and implementation]
The Past vs. The Present
A wireload model (WLM) results in a higher frequency during synthesis than Design Compiler Topographical (DCT) technology does …
[Figure 1 and Figure 2: two circuits] With WLM, these two circuits have the same delay. With DCT, the delay reflects the x-y location of the cells.
Which is more realistic?
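A small model shows why the two answers differ. A wireload model estimates a net's capacitance from its fanout alone, so a compact net and a widely spread net with the same fanout look identical; a placement-aware estimate grows with the distance between pins. In the sketch below the wireload table and capacitance-per-micron value are hypothetical, and half-perimeter wirelength (HPWL) stands in for placement-based estimation generally; it is not how DCT actually computes parasitics.

```python
# Hypothetical wireload table: fanout -> estimated wire capacitance (pF)
WLM_TABLE = {1: 0.002, 2: 0.004, 3: 0.006, 4: 0.008}
CAP_PER_UM = 0.0002  # hypothetical wire capacitance per micron (pF/um)


def wlm_cap(fanout):
    """Wireload-model estimate: depends on fanout only (clamped to the table)."""
    return WLM_TABLE[max(1, min(fanout, max(WLM_TABLE)))]


def hpwl_um(pins):
    """Half-perimeter wirelength of a net's pin bounding box (microns)."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def placement_cap(pins):
    """Placement-aware estimate: grows with the distance between the pins."""
    return hpwl_um(pins) * CAP_PER_UM


# Two hypothetical nets, each with one driver and two loads (fanout 2).
compact_net = [(0, 0), (5, 0), (0, 5)]
spread_net = [(0, 0), (400, 0), (0, 300)]

for name, net in [("compact", compact_net), ("spread", spread_net)]:
    fanout = len(net) - 1
    print(f"{name}: WLM {wlm_cap(fanout):.3f} pF, placement {placement_cap(net):.3f} pF")
```

The WLM sees both nets as 0.004 pF, while the placement-aware estimate differs by a factor of 70, which is how a WLM-based synthesis run can report a frequency that physical design cannot reproduce.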
Ten Best Practices for Design Methodology
#1 Libraries: Know Your Attributes
#2 Setup: Correlation and Runtime
#3 Scripts: Impacts Your Design
#4 Constraints: Watch Your Constraints
#5 Analyze: Analyze-Fix-Proceed
#6 Methodology: One or Two Flows
#7 Optimization: Adjust Accordingly
#8 Signoff: Review Your Environment
#9 Performance: Leverage Your EDA Partner
#10 Low Power: Architecture Drives Power
#1 Libraries: Know Your Attributes
Why is my design larger in area? Why is it taking so long to run?
Watch for dont_use, dont_touch, and size_only usage in your libraries and scripts
• Attributes are user-controlled to guide optimization
• Restricting optimization may lead to problems
[Figure: original area vs. new area after optimization]
Technology and IP: Make Sure to Have a Good-Quality Library
• A properly designed set of library cells gives optimization engines more choice
– Avoid cells that are sensitive to minor changes in load; they impede convergence
– Footprint-equivalent cells are useful for final-stage optimization with minimal perturbation to other design metrics
– Standard-cell pins should be on grid (especially complex cells with small drive strength, which have higher pin density)
– Multiple variants for each flop (drive strengths, delays, setup times, …)
• Library quality is an enabler for targeted performance
Example: Cell Sensitivity to Load Uncertainty
[Figure: delay vs. load capacitance (Cload) for Cell A and Cell B around a nominal load C*, illustrating how sensitive each cell’s delay is to load uncertainty]
#2 Setup: Correlation and Runtime
• Netlist v1.0 + SDC v1.0: compile → 3.2M instances
• Netlist v1.1 + SDC v1.1: compile → 6.8M instances?? What happened???
• Found the issues after days of engineering work:
– size_only on 3.7M cells
– SDC with all cells set with set_disable_clock_gating on
What do designers do when they run into these?
Review Your Settings and Input: Understand the Different Objectives
• DC Utility Checker: detects design issues and dirty constraint styles that can lead to bad runtime/memory and QoR
• ICC Utility Checker: detects the readiness of the physical design before going into the various implementation stages
• PT Utility Checker: detects application variables, settings, and design issues causing runtime or memory increases
#3 Scripts: Impacts Your Design
When someone tells you “Tool A” is X times faster than “Tool B”, you need to put things in perspective …
• First step: review your script
– How was the script migrated to “Tool A”?
– Did you also update the script to leverage the latest technologies?
• Early stage of your design: think fast mode
• Final stage of your design: think QoR
[Figure: incomplete vs. complete]
Tool Input Can Impact Results: Understand How the Tool Can Help Meet Design Goals
• Today’s designs require completeness
• Synopsys tools are tailored for performance, but they also have a mode to run fast
• Recommendations
– The typical complaint is long runtime; choose your goal setting accordingly
– Make sure your script is up to date for your end goal and takes advantage of the latest features
#4 Constraints: Watch Your Constraints
[Figure: original clock period = input delay + time available for logic + output delay]
• Over-constraining can guide the tool to focus on artificial critical paths
• Over-constraining happens with
– Unrealistic input and/or output delays
– Tightening the clock period
– Specifying large clock uncertainty
Symptoms of over-constraining: long runtime, excessive buffering, and huge violations
Synopsys tools are designed to work towards meeting design goals… but don’t expect miracles!
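The arithmetic behind "artificial critical paths" is short: slack is the clock period minus I/O delays, uncertainty, and path delay. The sketch below compares realistic constraints with the past practice of tightening the clock to 0.7x the target. All path delays and constraint values are hypothetical, and the slack formula is simplified (no setup time or skew).

```python
def slack_ns(path_delay, period, input_delay=0.0, output_delay=0.0, uncertainty=0.0):
    """Setup slack of a port-to-port path (simplified: no setup time or skew)."""
    return period - input_delay - output_delay - uncertainty - path_delay


target_period = 1.0                           # hypothetical 1 GHz target (ns)
paths = {"p1": 0.55, "p2": 0.62, "p3": 0.70}  # hypothetical path delays (ns)

realistic = {p: slack_ns(d, target_period, 0.1, 0.1, 0.05) for p, d in paths.items()}
# Past "guidance": tighten the clock to 0.7x the target period.
tightened = {p: slack_ns(d, 0.7 * target_period, 0.1, 0.1, 0.05) for p, d in paths.items()}

for p in paths:
    print(f"{p}: realistic {realistic[p]:+.2f} ns, tightened {tightened[p]:+.2f} ns")
```

Every path that meets the real target shows a violation under the tightened clock, so the tool spends runtime and buffers on paths that were never critical.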
Understanding the EDA Tool Will Help: Simple Illustration
Will DC transform Circuit A into Circuit B?
• Circuit A: CLKA wns = -0.300, CLKB wns = -0.100
• Circuit B: CLKA wns = -0.280, CLKB wns = -0.150
Cost = ∑ pi * wi (pi: delay cost of clock i, i.e., the magnitude of its WNS; wi: its weight)

Weights                      Cost before (A)       Cost after (B)        Result
Default (CLKA 1, CLKB 1)     0.30 + 0.10 = 0.40    0.28 + 0.15 = 0.43    Total cost increased: transformation rejected, worst WNS = -0.300
Adjusted (CLKA 10, CLKB 1)   3.00 + 0.10 = 3.10    2.80 + 0.15 = 2.95    Total cost reduced: transformation accepted, worst WNS = -0.280
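The accept/reject decision above can be reproduced in a few lines. This is a minimal model of the weighted cost shown on the slide, assuming a transformation is kept only if it lowers the total cost; it is not Design Compiler’s actual internal cost function, which weighs many more terms.

```python
def total_cost(wns_by_clock, weights):
    """Cost = sum_i p_i * w_i, with p_i the magnitude of clock i's negative WNS."""
    return sum(max(0.0, -wns) * weights[clk] for clk, wns in wns_by_clock.items())


def accepts(before, after, weights):
    """A transformation is kept only if it lowers the total weighted cost."""
    return total_cost(after, weights) < total_cost(before, weights)


circuit_a = {"CLKA": -0.300, "CLKB": -0.100}
circuit_b = {"CLKA": -0.280, "CLKB": -0.150}
default_weights = {"CLKA": 1, "CLKB": 1}
adjusted_weights = {"CLKA": 10, "CLKB": 1}

print(accepts(circuit_a, circuit_b, default_weights))   # False: 0.43 > 0.40
print(accepts(circuit_a, circuit_b, adjusted_weights))  # True: 2.95 < 3.10
```

Raising CLKA’s weight tells the optimizer that CLKA’s worst path matters more than a small degradation on CLKB, which is what flips the decision.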
#5 Analyze: Analyze-Fix-Proceed
A push-button flow does not exist: know your circuit to guide the tool
Synopsys Galaxy Implementation Flow
• DC Graphical: compile_ultra -spg → insert_dft → compile_ultra -spg -incr
• IC Compiler: place_opt -spg → clock_opt → route_opt → signoff_opt
• StarRC: signoff extraction
• PrimeTime SI: signoff STA
Analyze results between design stages
#6 Methodology: One or Two Flows
Design specifications and constraints change constantly during the design cycle
• 180 nanometers (2000): 225K gates, 11 RAMs, 150 MHz
– One flow for both exploration & implementation
• 45 nanometers (2010): 96mm², ~300M transistors, 7-9W
– Exploration flow targeting early specs & constraints
– Implementation flow for final design realization
Exploration Throughout Galaxy
• DC Explorer
– Early RTL exploration: accelerates design schedules
• Design Compiler
– Look-ahead & physical guidance: creates a better starting point
• IC Compiler
– Design exploration: creates initial floorplan
– Block feasibility: determines physical feasibility
• Galaxy Constraint Analyzer
– Continuous improvement
[Figure: RTL exploration → RTL synthesis → design exploration → design planning → block feasibility → block implementation, spanning RTL and physical stages, exploration and implementation]
#7 Optimization: Adjust Accordingly
Adjust your constraints to model the effects of downstream design steps
An Illustration: Design Compiler
• Account for clock trees
• No hold-timing fixing
• Be careful with critical range
• Do not over-constrain
Manage Design Constraints Throughout: Guidelines for Convergent Timing Closure
• Synthesis and placement
– Do not over-constrain during synthesis
– Use DC SPG flow
– Account for max_transition and clock uncertainty
– Specify pre-CTS estimated constraints
• CTS
– Remove pre-CTS estimated constraints
• Route
– Remove/adjust pre-route constraints
– Adjust crosstalk thresholds
[Chart: timing closure profile (MHz, 800–1,100) across Synthesis, Place, Clock, and Route for RM (Baseline), Tuned for Hi-Performance/Low Power, and Additional Customization for High-Performance; labeled values 913, 971, and 1029 MHz]
Do Not Over-Complicate Your Flow
#8 Signoff: Review Your Environment
[Charts: runtime (CPU hours, 0–128) and memory usage (GB, 0–60, with one bar labeled 172 GB) vs. instance count (1.1, 1.2, 5.5, 37.0, and 50+ million instances)]
Designs run at customer sites using revised PrimeTime scripts and the latest release version
Unlike wine, scripts grow stale with age
PrimeTime Scripts: Key Areas to Review
• Environment and setup
– Use latest release and ensure adequate hardware resources
• Reading parasitics
– Use binary parasitics when possible
• Multiple timing updates
– Eliminate redundant/legacy update_timing steps
• Inefficient TCL scripting and reporting
PrimeTime Design Utility Checker
can help with some of these tasks
#9 Performance: Leverage Your EDA Partner
• Starting point
– Built on Synopsys RM
– Understand the new technologies and features
– Easy to use
• Reduce time-to-results
– Automated methodology to achieve 90% of target quickly
– Additional advanced techniques to reach final goal
– Minimize number of iterations or “trial and errors”
– Reduce ECO efforts
[Figure: design schedule (synthesis, P&R, signoff + ECO iterations) for the typical flow vs. the HSLP flow]
High Performance, Low Power (HSLP) Flow Requires Customization
HSLP Implementation Best Practices Reduce Time-to-Results
[Chart: progress toward targets (75%, 90%, 100%) over time for the typical flow on regular designs, the typical flow on high-performance designs, and the HSLP flow; HSLP implementation best practices plus design-specific customization reduce time-to-results]
#10 Low Power: Architecture Drives Power
[Figure: multi-voltage design styles: multiple voltage (MV) domains (0.9V and 0.7V); multi-supply with shutdown, no state retention (0.9V domains, one OFF); multi-voltage with shutdown (0.9V and 0.7V domains, one OFF); multi-voltage with shutdown & state retention (SR)]
Design techniques: retention registers, power switches (MTCMOS), level shifters, isolation cells, always-on logic
[Figure: cell schematics for each technique, showing supply pins (VDD, VDDB, VDDI, VDDO, VSS) and control pins (EN, ISO, gate on/off)]
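Why a 0.7V domain saves power follows from the standard first-order CMOS switching-power relation P = α·C·Vdd²·f: power scales with the square of the supply. The sketch below evaluates it for a hypothetical domain (activity, capacitance, and frequency are made-up values). It ignores leakage, and in practice a lower-voltage domain usually also runs at a lower frequency and needs level shifters at its boundary.

```python
def dynamic_power_w(activity, cap_f, vdd_v, freq_hz):
    """First-order CMOS switching power: P = alpha * C * Vdd^2 * f."""
    return activity * cap_f * vdd_v ** 2 * freq_hz


# Hypothetical domain: 10% activity, 1 nF switched capacitance, 500 MHz.
p_high = dynamic_power_w(0.1, 1e-9, 0.9, 500e6)
p_low = dynamic_power_w(0.1, 1e-9, 0.7, 500e6)
print(f"0.9V: {p_high * 1e3:.1f} mW, 0.7V: {p_low * 1e3:.1f} mW, ratio {p_low / p_high:.3f}")
```

At the same frequency, dropping from 0.9V to 0.7V keeps (0.7/0.9)² ≈ 0.605 of the switching power, roughly a 40% saving; shutting a domain OFF removes its switching and leakage power entirely.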
The 20 nm Challenge: Single Exposure “Last Pitch With Single Exposure ~ 80 Nanometers…”
We Can Print This,… But We Cannot Print This
Source M. van den Brink, ASML, ITF 2009; P. Magarshack, STMicroelectronics, 2010
The Solution: Double Patterning, a Significant Change
We Can Print This, and This,…
And Then This!
Synopsys Solution: DPT-Ready IC Compiler P&R and IC Validator DRC
Source: Synopsys Research 2011
[Figure: wide spacing enforced vs. two-color decomposed design]
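Two-color decomposition is commonly modeled as coloring a conflict graph: each shape is a node, and two shapes closer than the single-exposure pitch are joined by an edge, so they must go on different masks. A layout is decomposable exactly when that graph is bipartite; an odd cycle of conflicts means some spacing must be widened. The sketch below is this textbook formulation using breadth-first search, not the algorithm used by IC Compiler or IC Validator, and the shape indices are hypothetical.

```python
from collections import deque


def two_color(num_shapes, conflicts):
    """Assign mask 0 or 1 to each shape so that no conflicting pair shares a mask.

    conflicts: pairs of shape indices spaced closer than the single-exposure pitch.
    Returns the mask list, or None if an odd conflict cycle makes the layout
    non-decomposable (those spacings must be widened)."""
    adjacency = [[] for _ in range(num_shapes)]
    for a, b in conflicts:
        adjacency[a].append(b)
        adjacency[b].append(a)

    mask = [-1] * num_shapes
    for start in range(num_shapes):
        if mask[start] != -1:
            continue
        mask[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if mask[v] == -1:
                    mask[v] = 1 - mask[u]  # neighbor goes on the other mask
                    queue.append(v)
                elif mask[v] == mask[u]:
                    return None  # odd cycle: no legal two-mask assignment
    return mask


print(two_color(3, [(0, 1), (1, 2)]))          # a chain of three tight shapes
print(two_color(3, [(0, 1), (1, 2), (2, 0)]))  # three mutually tight shapes
```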
The Challenge: Planar CMOS, Insufficient Performance, Excessive Power
[Figure: 32 nanometer planar transistor; performance and power]
Source: K. Kuhn, Intel, IDF 2011
The Solution: Non-Planar CMOS, FinFET or Tri-Gate CMOS
[Figure: 22 nanometer tri-gate transistor; performance and power]
Source: K. Kuhn, Intel, IDF 2011
The Solution: Non-Planar CMOS, the First “Revolution”
Source: M. Bohr, Intel, YouTube 2011
FinFET Advantages: FinFET vs. Planar Transistor
• Superior drive current
– Active region spans the fin height and thickness (3 sides)
– Ids ∝ (2·Hfin + Tfin), as opposed to only the top surface (the device width) for planar
• Reduced leakage
– Depleted substrate
• Enhanced electron mobility
– High-K gate oxide
– Metal gates in place of polysilicon
– Strained silicon
– Multiple fins possible to increase total drive strength for higher performance
Source: Intel
[Figure: FinFET (fin) vs. planar transistor (inversion layer)]
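The Ids ∝ (2·Hfin + Tfin) relation amounts to an effective channel width: two sidewalls plus the top of each fin, multiplied by the number of fins. The sketch below evaluates it for a hypothetical fin geometry (not any specific process); to first order, drive current scales with this width, and it comes only in whole-fin increments.

```python
def finfet_effective_width_nm(h_fin_nm, t_fin_nm, n_fins=1):
    """Effective channel width: two sidewalls plus the top of each fin."""
    return n_fins * (2 * h_fin_nm + t_fin_nm)


# Hypothetical fin geometry: 30 nm tall, 10 nm thick.
for n in (1, 2, 3):
    print(n, "fin(s):", finfet_effective_width_nm(30, 10, n), "nm")
```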
This Is Not The End of Moore’s Law! But the Gap Between Intel and the Crowd Is Widening
Source: M. Bohr, Intel, IDF 2011
3D ICs: Technology Trends, Four Main Categories of “> 2D-IC” Ahead
1. Memory “Cube”
2. (Wide I/O) Memory “Cube” on Logic
3. Silicon Interposer
4. 3D Stack
[Figure labels: C4, TSV, Bump]
3D-IC: Two Basic Configurations Emerging, Addressing Gigascale Design Challenges
Silicon Interposer (2.5D)
• Horizontally connected dies
• Drivers: Consumer, Storage, Networking
• Benefits: Yield, Cost, TTM & Flexibility
3D-IC
• Vertically stacked dies with TSVs
• Drivers: Wireless handset, Processors
• Benefits: Performance, form factor
The “Memory Cube” Now
Source: C.-G. Hwang, Samsung, IEDM 2006
[Figure: 8-die stack; dimensions labeled 50 microns and 560 microns]
IP
Intellectual property core, IP core, or IP block is a reusable unit of logic, cell, or chip layout design that is the intellectual property of one party
IP cores may be licensed to another party or can be owned and used by a single party alone
IP cores can be used as building blocks within ASIC chip designs or FPGA logic designs
IP
IP cores in the electronic design industry have had a profound impact on the design of systems on a chip
IP core licensors spread the cost of development among multiple chip makers
IP cores for standard processors, interfaces, and internal functions have enabled chip makers to put more of their resources into developing the differentiating features of their chips and to deliver new innovations faster
Licensing and use of IP cores in chip design came into common practice in the 1990s
2011 Design IP Revenue: $1.9B
Semiconductor IP Market Segments
• Processors (CPUs, GPUs, DSPs):
– Microprocessors 39%
– DSP 5%
– Fixed Function (GPUs, Security) 15%
• Wired Interfaces 19%
• Memory Cells/Blocks 10%
• GP Analog/MS 4%
• Block Libraries 1%
• Physical Libraries 3%
• Other IP 4%
Source: Gartner, March 2012
Semiconductor IP Market Size and Synopsys Share
Year                                 CY04    CY05     CY06     CY07     CY08     CY09     CY10     CY11
Semiconductor IP market size ($M)    964.0   1,068.3  1,267.3  1,378.2  1,464.1  1,351.0  1,695.0  1,910.9
Synopsys share                       7.9%    7.6%     7.3%     7.2%     7.2%     9.1%     11.3%    12.4%
Source: Gartner, March 2012
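One way to summarize the market-size row is a compound annual growth rate (CAGR) between CY04 and CY11. The sketch below computes it from the two endpoint values in the Gartner table; note that a CAGR smooths over the CY09 dip (1,464.1 to 1,351.0).

```python
market_musd = {2004: 964.0, 2011: 1910.9}  # CY04 and CY11 market size ($M), Gartner table
years = 2011 - 2004
cagr = (market_musd[2011] / market_musd[2004]) ** (1 / years) - 1
print(f"CAGR 2004-2011: {cagr:.1%}")
```

The result is roughly 10% per year, meaning the market approximately doubled over the seven years.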
Top Semiconductor IP Vendors (revenue, $M)
Rank  Company                    2010    2011    Growth   2011 Share
1     ARM Holdings               575.8   732.5   27.2%    38.3%
2     Synopsys                   191.8   236.2   23.2%    12.4%
3     Imagination Technologies   91.5    126.4   38.1%    6.6%
4     MIPS Technologies          85.3    72.1    -15.5%   3.8%
5     Ceva                       44.9    60.2    34.1%    3.2%
6     Silicon Image              38.5    42.8    11.2%    2.2%
7     Rambus                     41.4    38.9    -6.0%    2.0%
8     Tensilica                  31.5    36.3    15.2%    1.9%
9     Mentor Graphics            27.3    23.6    -13.8%   1.2%
10    AuthenTec                  19.6    22.8    16.3%    1.2%
Source: Gartner, March 2012
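The Growth and 2011 Share columns follow from the revenue columns and the CY11 market size of $1,910.9M: growth is the year-over-year change relative to 2010, and share is 2011 revenue divided by the total market. The sketch below recomputes both for the top three vendors as a check.

```python
MARKET_2011_MUSD = 1910.9  # CY11 semiconductor IP market size ($M), Gartner

vendors = {  # company: (2010 revenue, 2011 revenue), $M
    "ARM Holdings": (575.8, 732.5),
    "Synopsys": (191.8, 236.2),
    "Imagination Technologies": (91.5, 126.4),
}


def growth_pct(r2010, r2011):
    """Year-over-year revenue growth, in percent."""
    return 100 * (r2011 - r2010) / r2010


def share_pct(r2011, market=MARKET_2011_MUSD):
    """Share of the total 2011 market, in percent."""
    return 100 * r2011 / market


for name, (r10, r11) in vendors.items():
    print(f"{name}: growth {growth_pct(r10, r11):.1f}%, share {share_pct(r11):.1f}%")
```

Recomputing from the rounded revenues reproduces the share column. A few growth figures differ by a tenth or two (Synopsys comes out at 23.1% vs. the published 23.2%), presumably because the published values were computed from unrounded revenue.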
IP Vendors Also Need to Provide More Functions and Functionality
[Chart: average number of IP blocks per SoC (0–120) and % design reuse (10–70%), 2005–2014, showing the shift from IP blocks to IP subsystems]
Source: Semico, October 2010
Subsystems: The Next Evolution in the IP Market
What Is a Subsystem?
• Complete Solution: HW, SW, Prototype
• Pre-integrated and Verified
• SoC Ready: Seamlessly Drop-in and Go