Analog Circuit Synthesis Optimization
Prepared for the PhD Preliminary Exam by
Yishai Statter, ECE, Colorado State University
Advisor: Prof. Tom Chen
Outline
● Analog Design Automation Recap
● Circuit Optimization Methods
● Our Approach to Solving Optimization Bottlenecks
● Development Status and Preliminary Results
● Possible Improvements
● Goals and Plans
Analog Circuit CAD Recap
There are two main categories of circuit synthesizers:
● Generators
● Optimizers, which are either:
o Simulation-based
o Equation-based
Simulation-Based Optimizers
● DELIGHT [1] (1988) combined a SPICE simulator with a search program.
● MAELSTROM [2] (1999) and ANACONDA [3] (2000), already aware of the evaluation bottleneck, attempted to offset it with advanced search algorithms and parallelization (run times of 2-10 h).
Equation-Based Optimizers
● OPASYN [4] (1990) and OPTIMAN [6] (1989) pioneered analytic, equation-based optimization.
● The Prado Synthesis Platform (Barcelona Design [7], 2002) used TSMC 180nm models and geometric programming (GP).
● Their first op-amp engine [9] was used by Mitsubishi.
● However, their tracks disappear around the 130nm milestone.
Problems with Optimizers
● Promise everything, deliver little
o Provide a starting point (90/10).
● Long run time
o Great opportunity: improve the evaluation cycle.
● Lack commercial “finish”
o SaaS rewrote the vendor-user relationship.
Optimization Cycle
Optimization runs have two main phases:
● Search - the algorithm selects a set of candidate solutions.
● Evaluate - the solutions are graded for feasibility and optimality.
The performance of the optimizer depends on:
● the optimization strategy
● the evaluation run time
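The two phases can be sketched as a minimal random-search loop. This is an illustration under stated assumptions, not the optimizer used in this work: `random_search`, `evaluate_fn` and `toy_cost` are hypothetical names, and the sum-of-squares `toy_cost` only stands in for a real circuit evaluator.

```c
#include <math.h>
#include <stdlib.h>

#define MAX_PARAMS 16   /* sketch limit on the number of design variables */

/* Evaluate phase: grades one candidate; lower cost is better. */
typedef double (*evaluate_fn)(const double *x, int n);

/* Hypothetical stand-in for a real evaluator: sum of squares. */
double toy_cost(const double *x, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) s += x[i] * x[i];
    return s;
}

/* Random search over the box lo[i] <= x[i] <= hi[i].
   Each iteration runs one search step (propose a candidate) and one
   evaluate step (grade it). Returns the best cost and copies the best
   candidate into best[]; returns INFINITY on invalid arguments. */
double random_search(evaluate_fn evaluate, const double *lo, const double *hi,
                     int n, int iterations, unsigned seed, double *best)
{
    double cand[MAX_PARAMS];
    double best_cost = INFINITY;
    if (n <= 0 || n > MAX_PARAMS) return INFINITY;
    srand(seed);
    for (int it = 0; it < iterations; it++) {
        /* Search: sample a candidate uniformly inside the box. */
        for (int i = 0; i < n; i++) {
            double u = (double)rand() / RAND_MAX;
            cand[i] = lo[i] + u * (hi[i] - lo[i]);
        }
        /* Evaluate: in a real optimizer this step dominates run time. */
        double cost = evaluate(cand, n);
        if (cost < best_cost) {
            best_cost = cost;
            for (int i = 0; i < n; i++) best[i] = cand[i];
        }
    }
    return best_cost;
}
```

Each iteration pays for exactly one evaluation, so total run time is roughly iterations × evaluation time; a smarter search strategy cannot hide a slow evaluator.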
Evaluation is a Bottleneck
● Simulation-based optimizers:
o External optimizers require costly inter-process communication (IPC).
o Scripted optimizers are slow and limited.
o Simulations spend time calculating irrelevant data.
● Equation-based optimizers:
o Equations become outdated quickly.
o Equations are either naïve or slow.
Table-Based Models
● Lookup tables were proposed as a substitute for equations in simulation models (Yoon & Allen [10], 1991).
● Their point is still valid today: equation models are inefficient and technology-specific.
● Today, tables are used as an upper-level partitioning of transistor geometric classes (“binning”).
● A recent project, the BAG [11] generator-generator, uses lookup tables as calculation shortcuts, such as the gm/IDS ratio.
High Throughput Model - Look-Up Tables
Two developments since the early 1990s justify reconsidering LUT models:
● FET behavior is much more complicated.
● CAD machines are a little bigger.
An array of samples covering the entire operating space can replace equations, provided that:
● lookup is quick - fast interpolation methods;
● accuracy is sufficient - adequate resolution.
Table lookups are used today
“Binning” is in fact a LUT on top of equation models.
[Figure: model-bin length and width limits for TSMC 180nm (BSIM3v3.2) and TSMC 40nm (BSIM4v5)]
The main goal is reducing run time
[Figure: tool stages Development, Construction, Bootstrap and Show Time, split between the developer side and the user side]
● Runtime costs rise with each stage.
● Runtime is not “cheap” even at Development.
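Interpolated Lookup Table (LUT)
Every table consists of:
● an N-dimensional array
● a per-dimension legend of sample levels (e.g. Mi = 8 intervals, i.e. 9 levels)
● structure information
● Size = ∏(Mi+1) entries
[Figure: corner offsets 0, 1, 9, 10, 81, 82, 90, 91 of one cell in a table with Mi = 8]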
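A minimal sketch of how such a flat table could be addressed, assuming entries are stored with dimension 0 varying fastest; this assumption reproduces the weights 1, 9, 81 and the corner offsets 0, 1, 9, 10, 81, 82, 90, 91 for Mi = 8. The struct and function names are hypothetical.

```c
#include <stddef.h>

#define MAX_DIM 8

/* Hypothetical LUT header: M[i] intervals per dimension, so M[i]+1 sample
   levels; all entries live in one flat array, dimension 0 fastest. */
typedef struct {
    int dim;
    int M[MAX_DIM];
    size_t stride[MAX_DIM];   /* weight of each dimension's index */
} lut_shape;

/* Fills the strides and returns the total entry count, prod(M[i]+1). */
size_t lut_init_shape(lut_shape *s)
{
    size_t size = 1;
    for (int i = 0; i < s->dim; i++) {
        s->stride[i] = size;
        size *= (size_t)(s->M[i] + 1);
    }
    return size;
}

/* Flat offset of the entry at multi-index idx[0..dim-1]. */
size_t lut_offset(const lut_shape *s, const int *idx)
{
    size_t off = 0;
    for (int i = 0; i < s->dim; i++)
        off += (size_t)idx[i] * s->stride[i];
    return off;
}

/* Offset of corner c of the cell whose lowest corner is idx: bit i of c
   selects level idx[i] (0) or idx[i]+1 (1). A cell has 2^dim corners. */
size_t lut_corner(const lut_shape *s, const int *idx, unsigned c)
{
    size_t off = lut_offset(s, idx);
    for (int i = 0; i < s->dim; i++)
        if (c & (1u << i)) off += s->stride[i];
    return off;
}
```

For a 3-D table with Mi = 8, `lut_init_shape` returns 729 = 9³ entries, and the eight corners of the first cell sit at offsets 0, 1, 9, 10, 81, 82, 90 and 91.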
Cell Location
[Figure: VDS legend with levels 0 to 1.6 in steps of 0.2, per-dimension weights 1, 9, 81, and the resulting cell offset]
● Two possible algorithms convert a physical value to an index:
o Linear scaling: requires uniform partitioning, O(N) per lookup
o Binary search (tree): works with any sorted partitioning, O(∑ log Mi)
● Since uniform and non-uniform partitioning are mixed, only binary search is implemented.
● Overshoot and undershoot are tolerated (with an extrapolation warning).
[Figure: binary search tree over the VDS levels, cells 0 to 7, with an undershoot branch]
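The binary-search variant of cell location can be sketched as follows. `locate_cell` is a hypothetical name, and mapping out-of-range inputs onto the first or last cell while extrapolating linearly (and raising a flag) is an assumption about how “tolerated with a warning” might be handled.

```c
/* Locates the cell of a sorted legend levels[0..M] (M intervals) that
   contains x, by binary search. Writes the fractional position r inside
   that cell (r < 0 or r > 1 when extrapolating) and sets *extrapolated
   when x lies below levels[0] or above levels[M]. Requires M >= 1. */
int locate_cell(const double *levels, int M, double x, double *r,
                int *extrapolated)
{
    int lo = 0, hi = M;   /* the answer cell is always in [lo, hi-1] */
    *extrapolated = (x < levels[0]) || (x > levels[M]);
    while (hi - lo > 1) {
        int mid = lo + (hi - lo) / 2;
        if (x < levels[mid]) hi = mid;
        else lo = mid;
    }
    /* Out-of-range x ends up in cell 0 or cell M-1. */
    *r = (x - levels[lo]) / (levels[lo + 1] - levels[lo]);
    return lo;
}
```

On a VDS legend of 0 to 1.6 in 0.2 steps (M = 8), VDS = 0.5 lands in cell 2 with r = 0.5, while VDS = -0.1 lands in cell 0 with r = -0.5 and raises the extrapolation flag. Each query costs about log2(M) comparisons per dimension.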
Full Interpolation
For an N-dimensional hypercube, “Full Interpolation”:
● is equivalent to Lagrange’s polynomial (first order in each dimension)
● requires O(2^N) time and memory complexity
The 2^N cell corners are collapsed one dimension at a time (i=0: L, i=1: W, i=2: VGS, i=3: VDS) by pairing adjacent buffer entries:
$\mathrm{Buffer}_{j/2} = \mathrm{Buffer}_j + r_i\,(\mathrm{Buffer}_{j+1} - \mathrm{Buffer}_j)$, for even j,
where r_i is the fractional position inside the cell along dimension i.
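The buffer-halving recurrence can be written as a short C function. It is a sketch, assuming the 2^N corner values of the cell are ordered so that bit i of a corner's index selects the lower or upper level of dimension i; `full_interp` is a hypothetical name.

```c
#include <math.h>

#define MAX_DIM 8

/* Full (multilinear) interpolation inside one N-dimensional cell.
   corners[c]: table value at corner c, where bit i of c selects the lower (0)
   or upper (1) sample level of dimension i.
   r[i]: fractional position inside the cell along dimension i.
   Each pass collapses one dimension by pairing entries j and j+1:
       buf[j/2] = buf[j] + r[i] * (buf[j+1] - buf[j])   (j even)
   so the buffer halves until a single value remains. */
double full_interp(const double *corners, const double *r, int n)
{
    double buf[1 << MAX_DIM];
    if (n < 0 || n > MAX_DIM) return NAN;   /* outside this sketch's range */
    int count = 1 << n;
    for (int j = 0; j < count; j++) buf[j] = corners[j];
    for (int i = 0; i < n; i++) {
        /* Writing buf[j/2] never overwrites an entry not yet read. */
        for (int j = 0; j < count; j += 2)
            buf[j / 2] = buf[j] + r[i] * (buf[j + 1] - buf[j]);
        count /= 2;
    }
    return buf[0];
}
```

Collapsing one dimension per pass costs 2^(N-1) + 2^(N-2) + ... + 1 = 2^N - 1 linear interpolations, which is where the O(2^N) cost comes from. For N = 4 (L, W, VGS, VDS) that is 15 interpolations over 16 corners.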
Pre-fitted Linear Interpolation (LIT)
Some cells can be fitted by linear regression:
● Conversion is off-line.
● It requires N+1 coefficients per cell (!).
● Cells that don’t fit are marked with a NaN intercept.
● The percentage of linear cells within the LUT (LCP) serves as an initial accuracy indication.
● Interpolation takes N multiply-add operations.
● LUT+LIT size: ∏(Mi+1) + (N+1)∏Mi
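A sketch of the LIT query path, assuming each cell stores an intercept and N slopes fitted off-line, with a NaN intercept signalling a non-linear cell. `lit_cell`, `lit_eval` and `lut_lit_size` are hypothetical names; the fallback to full interpolation is left to the caller.

```c
#include <math.h>
#include <stddef.h>

#define LIT_MAX_DIM 8   /* sketch limit: assumes n <= 8 */

/* Hypothetical pre-fitted linear cell. */
typedef struct {
    double intercept;               /* NaN marks a cell that did not fit */
    double slopes[LIT_MAX_DIM];     /* one slope per dimension */
} lit_cell;

/* Returns 1 and writes *out when the cell is linear; returns 0 when the
   caller must fall back to full interpolation. */
int lit_eval(const lit_cell *cell, const double *r, int n, double *out)
{
    if (isnan(cell->intercept)) return 0;
    double v = cell->intercept;
    for (int i = 0; i < n; i++)
        v += cell->slopes[i] * r[i];   /* N multiply-adds */
    *out = v;
    return 1;
}

/* Entry count of LUT plus LIT: prod(M_i+1) samples plus (N+1)
   coefficients for each of the prod(M_i) cells. */
size_t lut_lit_size(const int *M, int n)
{
    size_t lut = 1, cells = 1;
    for (int i = 0; i < n; i++) {
        lut *= (size_t)(M[i] + 1);
        cells *= (size_t)M[i];
    }
    return lut + (size_t)(n + 1) * cells;
}
```

For a 3-D table with Mi = 8, the combined size is 9³ + 4·8³ = 729 + 2048 = 2777 entries, so the slopes cost almost three times the original table.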
LUT/LIT performance - LCP
As expected, the proportion of cells that fit linear regression grows with LUT resolution.
Other Interpolation Algorithms
Two other directions were considered:
● Higher-order interpolation
o Smooth results
o Better accuracy at lower resolution
o Online method - huge run-time costs
● Hybrid between LIT and Full
o Improves LCP by isolating problem dimensions
o Offline method - huge memory costs
LUT/LIT interpolation time
[Figure: number of multiplications vs. slopes’ memory size (× LUT size)]
Equations are still useful
● LUTs save time in exchange for a lot of memory and a little accuracy.
● The trade-off between accuracy, speed and memory should be balanced by common-sense table assignments.
Example: we know that IDS ∝ W/L. Separating the W/L factor from the LUT frees it to model higher-order phenomena:
[Figure: simulated IDS × L/W → IDS_Size LUT → × W/L → interpolated IDS]
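A minimal sketch of the W/L separation, assuming the table stores IDS_Size = IDS·L/W at each bias point; the function names are hypothetical. When the current scales exactly with W/L, devices that differ only in W map to the same table entry, so the table resolution is left for the higher-order effects.

```c
/* Build time: the stored entry is the simulated current scaled by L/W,
   removing the first-order size dependence. */
double ids_size_entry(double ids_sim, double w, double l)
{
    return ids_sim * l / w;
}

/* Query time: the interpolated IDS_Size value is scaled back by W/L. */
double ids_from_entry(double ids_size, double w, double l)
{
    return ids_size * w / l;
}
```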
Implementation
● LUT models should be compatible with a commercial sign-off tool, Spectre.
● Given the amount of required data, a “black-boxed” commercial simulator is not an option.
● Solution: clone Spectre behavior in an open-source simulator, NGSPICE.
Single Executable: RAMSpice
● Modified NGSPICE (based on the BSD release of SPICE3)
● Local hacks (“Mengo”):
o new compilation script that includes a code generator
o enhanced simulation data access and processing
o enhanced simulation modes
o LUT/LIT implementation
● Compiled with a Tcl interface, it can serve as:
o a simulator for characterizing FETs
o a statistical analysis engine
o a web server
● The challenge: converting Spectre (SCS) files to SPICE syntax.
Interpolation Code - Loop Unrolling
● Currently, LUTs go up to 5 dimensions.
● A maximum dimensionality of 8 is reasonable.
● Result: we can code a separate interpolation function per dimensionality.
● The code-generator abstraction enables compilation-time loops. Example:
o Run-time loop: retval=intercept; for (int i=0; i<a->dim; i++) { retval+=slopes[i]*coord[i]; }
o Generator template: retval=intercept; #For: {set i 0} {$i<$DIM} {incr i} { retval+=slopes[$i]*coord[$i]; }
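To make the difference concrete, here is the generic run-time loop next to what a generator could emit for DIM = 3. This is a sketch: `lit_generic` and `lit_dim3` are hypothetical names, and the unrolled body is written by hand to show the shape of the output, not produced by the actual RAMSpice generator.

```c
/* Generic version: the loop bound is read from the table at run time. */
double lit_generic(double intercept, const double *slopes,
                   const double *coord, int dim)
{
    double retval = intercept;
    for (int i = 0; i < dim; i++)
        retval += slopes[i] * coord[i];
    return retval;
}

/* Shape of a generated DIM = 3 variant: the template loop is expanded at
   generation time, so there is no counter, no bound check and all
   indices are constants. */
double lit_dim3(double intercept, const double *slopes, const double *coord)
{
    double retval = intercept;
    retval += slopes[0] * coord[0];
    retval += slopes[1] * coord[1];
    retval += slopes[2] * coord[2];
    return retval;
}
```

Both functions perform the same additions in the same order, so they return the same result; one such function could be emitted per dimensionality from 1 to 8.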
Quality Analysis
● LCP has no connection to simulation data.
● LUT vs. simulation was therefore compared in two stages:
o Spectre vs. RAMSpice
o RAMSpice vs. LUT
Spectre vs. RAMSpice - Procedure
● Generate a random sample of (VGS, VDS, VBS, L, W).
● Convert the sample to a SKILL script.
● Run the SKILL script as Spectre simulations.
● Convert the simulated values, together with the sample, to a Tcl script.
● Run the Tcl script on RAMSpice.
● Post-process the results.
[Figure: flow Random Sample → SKILL Gen. → Tcl Gen. → RAMSpice → Post Process]
Spectre vs. RAMSpice - 40nm Results
Where’s the problem?
[Figure: model-bin length and width limits for TSMC 180nm (BSIM3v3.2) and TSMC 40nm (BSIM4v5)]
Spectre vs. RAMSpice - 40nm Results without Wide FETs
Spectre vs. RAMSpice - 180nm Results
Spectre vs. RAMSpice - 180nm Results without Wide FETs
RAMSpice vs. LUT/LIT - Procedure
● For each leading parameter (IDS, gm, ro), generate a random sample of (VGS, VDS, VBS, L, W).
● Calculate each point via both the LUT and simulation.
● Collect relative and nominal errors.
● The analysis focused on two indicators:
o the minimal error range that includes 99% of the population
o the average time per interpolation
● Other quality indicators: sigma, average error (relative and nominal)
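The first indicator can be computed as an order statistic of the error magnitudes. The sketch below is an assumption about how such a bound could be computed (`rel_error_pct` and `coverage_bound` are hypothetical names): it takes the ceil(0.99·n)-th smallest absolute error as the minimal range that covers 99% of the samples.

```c
#include <stdlib.h>

static double abs_d(double x) { return x < 0.0 ? -x : x; }

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Relative error of a LUT value against simulation, in percent
   (assumes sim != 0). */
double rel_error_pct(double lut, double sim)
{
    return 100.0 * abs_d(lut - sim) / abs_d(sim);
}

/* Minimal bound covering `coverage` (e.g. 0.99) of the n errors: sorts the
   magnitudes and returns the k-th smallest, k = ceil(coverage * n).
   Returns -1.0 on invalid input. */
double coverage_bound(const double *err, size_t n, double coverage)
{
    if (n == 0 || coverage <= 0.0 || coverage > 1.0) return -1.0;
    double *mag = malloc(n * sizeof *mag);
    if (mag == NULL) return -1.0;
    for (size_t i = 0; i < n; i++) mag[i] = abs_d(err[i]);
    qsort(mag, n, sizeof *mag, cmp_double);
    double want = coverage * (double)n;
    size_t k = (size_t)want;          /* floor */
    if ((double)k < want) k++;        /* round up without libm */
    double bound = mag[k - 1];
    free(mag);
    return bound;
}
```

With 100 samples whose magnitudes are 1 to 100, the 99% bound is 99.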
RAMSpice vs. LUT/LIT - Processing
● We plotted each resolution on error-vs.-memory and error-vs.-access-time graphs.
● We marked dominated points in red. A dominated point has a dominating one, which:
o performs better (lower min-error)
o for fewer resources (less memory or query time)
● We used a perimeter algorithm to identify points that aren’t strictly dominated but have poor ROI, and colored them orange.
● The rest are in green, forming the Pareto front.
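A straightforward way to find the red (dominated) points is a pairwise check. In this sketch a point counts as dominated when another point is no worse in both cost and error and strictly better in at least one; `res_point` and `mark_dominated` are hypothetical, and the orange “poor ROI” classification is not reproduced.

```c
#include <stddef.h>

/* One table resolution plotted as (cost, error): cost is memory or query
   time, error is the 99%-coverage bound. */
typedef struct {
    double cost;
    double error;
    int dominated;   /* output of mark_dominated() */
} res_point;

/* Marks every point for which some other point is no worse in both
   coordinates and strictly better in at least one. Unmarked points form
   the Pareto front. O(n^2), fine for a handful of resolutions. */
void mark_dominated(res_point *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        p[i].dominated = 0;
        for (size_t j = 0; j < n && !p[i].dominated; j++) {
            if (j == i) continue;
            int no_worse = p[j].cost <= p[i].cost && p[j].error <= p[i].error;
            int better = p[j].cost < p[i].cost || p[j].error < p[i].error;
            if (no_worse && better) p[i].dominated = 1;
        }
    }
}
```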
RAMSpice vs. LUT/LIT - 40nm Results
RAMSpice vs. LUT/LIT - 180nm Results
RAMSpice vs. LUT/LIT - Relative Error
FET Sizer.tcl
● The sizer is a re-implementation of sizer.php, a JS/PHP transistor size calculator.
● Unlike the original, this one uses:
o the LUT/LIT infrastructure
o a Tcl prototype of a general-purpose random search
o a SPICE sign-off simulator
o an HTTP server
all running on a single RAMSpice executable.
Sizer random samples
Ids  W (m)     L (m)     Ro (Ω)    Gm (S)    Vgs (V)   Sim Ro (Ω)  Sim Gm (S)  Err Ids (%)  Err Gm (%)  Err Ro (%)
10   2.20E-07  1.80E-07  1.17E+05  2.52E-05  5.92E-01  1.17E+05    2.51E-05    0.535        0.410       0.077
20   2.20E-07  1.80E-07  1.07E+05  4.75E-05  6.87E-01  1.07E+05    4.76E-05    0.132        0.168       0.024
30   2.20E-07  1.80E-07  1.02E+05  6.48E-05  7.68E-01  1.02E+05    6.48E-05    0.022        0.097       0.002
40   2.20E-07  1.80E-07  9.76E+04  7.62E-05  8.45E-01  9.76E+04    7.64E-05    0.023        0.248       0.004
50   2.20E-07  1.80E-07  9.44E+04  8.37E-05  9.21E-01  9.45E+04    8.38E-05    0.003        0.092       0.010
60   2.20E-07  1.80E-07  9.17E+04  8.83E-05  9.96E-01  9.17E+04    8.84E-05    0.001        0.067       0.017
70   2.20E-07  1.80E-07  8.91E+04  9.13E-05  1.07E+00  8.91E+04    9.13E-05    0.002        0.056       0.023
70   3.37E-06  1.03E-05  3.13E+04  3.43E-05  1.79E+00  3.11E+04    3.43E-05    0.836        0.052       0.830
70   4.99E-06  1.57E-05  3.00E+04  3.32E-05  1.80E+00  2.99E+04    3.32E-05    0.114        0.009       0.113
70   6.34E-06  1.99E-05  3.04E+04  3.35E-05  1.79E+00  3.04E+04    3.35E-05    0.093        0.007       0.061
70   3.55E-06  1.08E-05  3.19E+04  3.46E-05  1.78E+00  3.17E+04    3.46E-05    0.677        0.025       0.660
70   2.29E-06  6.84E-06  3.26E+04  3.52E-05  1.78E+00  3.21E+04    3.51E-05    1.496        0.061       1.510
70   2.38E-06  7.07E-06  3.31E+04  3.56E-05  1.77E+00  3.27E+04    3.55E-05    1.460        0.063       1.331
70   4.63E-06  1.43E-05  3.14E+04  3.41E-05  1.78E+00  3.13E+04    3.41E-05    0.105        0.001       0.082
70   3.01E-06  9.19E-06  3.12E+04  3.43E-05  1.79E+00  3.08E+04    3.42E-05    1.147        0.068       1.213
70   1.05E-04  1.80E-07  1.66E+04  1.52E-03  3.91E-01  1.83E+04    1.5054E-03  5.102        0.875       9.362
70   3.43E-04  1.80E-07  2.12E+04  2.19E-03  3.39E-01  2.42E+04    2.1019E-03  9.900        4.397       12.29
100  2.20E-07  1.80E-07  8.20E+04  9.49E-05  1.29E+00  8.20E+04    9.50E-05    0.003        0.015       0.006
Sizer Behavior I/IV
Sizer Behavior II/IV
Sizer Behavior III/IV
Sizer Behavior IV/IV
Plan for the next phase of research
LUT Improvements
Accuracy improvements:
● Offset entries to eliminate the average error.
● Over-sample and fit optimal entries (response surface methodology, RSM).
● Over-sample and fit optimal sampling levels.
Query-time improvements:
● Go back to uniform sampling for cell location.
● Use super-grid lookup arrays.
● Fixed-point interpolation.
Memory improvements:
● Single precision.
● Compression techniques.
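Going back to uniform sampling would let the binary search be replaced by linear scaling. A minimal sketch, assuming a uniform legend x0, x0 + h, up to x0 + M·h with 1/h precomputed; `locate_uniform` is a hypothetical name, and out-of-range inputs are mapped to the edge cells and flagged, mirroring the extrapolation warning.

```c
/* Cell location on a uniform legend: one subtraction, one multiplication
   and one truncation per dimension replace the binary search.
   inv_h = 1/h is precomputed. Writes the fractional position r inside
   the cell and sets *extrapolated for out-of-range x. Requires M >= 1. */
int locate_uniform(double x0, double inv_h, int M, double x, double *r,
                   int *extrapolated)
{
    double t = (x - x0) * inv_h;   /* position measured in cells */
    int cell;
    *extrapolated = (t < 0.0) || (t > (double)M);
    if (t < 0.0) cell = 0;
    else if (t >= (double)M) cell = M - 1;
    else cell = (int)t;
    *r = t - (double)cell;
    return cell;
}
```

The cost per dimension is constant, independent of M, which is the O(N)-per-lookup figure quoted for linear scaling.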
Goal 1: Circuit Evaluator
● A circuit evaluator is needed to prove the models’ usefulness in accelerating optimization.
● The next stage in our bottom-up programming is nodal analysis.
● This program converts the topology of the circuit into a list of math operations.
● Combined with the constraints and objectives, the output of the program is an evaluation routine.
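As an illustration of what such an evaluation routine might look like once the topology is reduced to arithmetic, here is a hypothetical routine for the DC gain of a classic two-stage op-amp, A = gm1·(ro2 ∥ ro4) · gm6·(ro6 ∥ ro7). The device numbering (M1/M2 input pair, M4 mirror load, M6 second-stage driver, M7 its current-source load) follows a common textbook convention and is an assumption, as are all names; in practice the gm and ro values would come from LUT queries.

```c
/* Small-signal parameters consumed by the routine (hypothetical layout). */
typedef struct {
    double gm1, ro2, ro4;   /* first stage  */
    double gm6, ro6, ro7;   /* second stage */
} opamp_ss;

static double parallel(double a, double b) { return a * b / (a + b); }

/* Objective: low-frequency voltage gain (V/V), a fixed list of
   operations produced once from the topology. */
double opamp_dc_gain(const opamp_ss *p)
{
    double a1 = p->gm1 * parallel(p->ro2, p->ro4);
    double a2 = p->gm6 * parallel(p->ro6, p->ro7);
    return a1 * a2;
}

/* Constraint paired with the objective: 1 if the gain meets the spec. */
int opamp_meets_gain(const opamp_ss *p, double min_gain)
{
    return opamp_dc_gain(p) >= min_gain;
}
```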
Goal 2: Circuit Synthesis Web Site
The final goal of the research is a web tool that synthesizes a two-stage op-amp. Remember the earlier lessons:
● 90/10 - users intervene, help and take over.
● Speed is still the main goal.
● Capitalize on the advantages of web tools:
o crowd-source technology and topologies (algorithms?)
o preserve the database, update behavior
o reuse successful solutions
Two-Stage Op-Amp
Publications from existing research
In this paper we reviewed:
1. the LUT/LIT-based models
2. accuracy analysis of our characterizing simulator vs. Spectre
3. accuracy analysis of our model vs. the characterizing simulator
4. the sizer tool
References
[1] DELIGHT.SPICE: An Optimization-Based System for the Design of Integrated Circuits - William Nye, David C. Riley, Alberto Sangiovanni-Vincentelli, Andre L. Tits
[2] MAELSTROM: Efficient Simulation-Based Synthesis for Custom Analog Cells - Rodney Phelps, Michael Krasnicki, Rob A. Rutenbar, Richard Carley, James R. Hellums
[3] Anaconda: Simulation-Based Synthesis of Analog Circuits via Stochastic Pattern Search - Rodney Phelps, Michael Krasnicki, Rob A. Rutenbar, Richard Carley, James R. Hellums
[4] OPASYN: A Compiler for CMOS Operational Amplifiers - Han Young Koh, Carlo H. Sequin, Paul R. Gray
[5] ISAAC: A Symbolic Simulator for Analog Integrated Circuits - Georges G. E. Gielen, Herman C. C. Walscharts, Willy M. C. Sansen
[6] AMGIE: A Synthesis Environment for CMOS Analog Integrated Circuits - Geert Van der Plas, Geert Debyser, Koen Lampaert, Jan Vandenbussche, Georges G. E. Gielen, Willy Sansen, Petar Veselinovic, Domine Leenaerts
[7] “Barcelona Design Unveils Revolutionary Analog Circuit Solution”, http://www.design-reuse.com/news/2866
[8] “Startup Wants To Automate Chip Design” - Linda Dailey Paulson, IEEE Xplore, Aug. 2002, p. 25
[9] “‘Circuit Engine’ Soups Up Op-Amp Synthesis” - David Maliniak, Electronic Design, Feb. 2003
[10] An Adjustable Accuracy Model for VLSI Analog Circuits Using Lookup Tables - Kwang S. Yoon, Phillip E. Allen, 1991
[11] BAG: A Designer-Oriented Integrated Framework for the Development of AMS Circuit Generators - J. Crossley, A. Puggelli, H.-P. Le, B. Yang, R. Nancollas, K. Jung, L. Kong, N. Narevsky, Y. Lu, N. Sutardja, E. J. An, A. L. Sangiovanni-Vincentelli, E. Alon, Department of Electrical Engineering and Computer Science, University of California, Berkeley
[12] A Novel High-Throughput Method for Table Look-Up Based Analog Design Automation - Yishai Statter, Tom Chen (draft submitted to Elsevier’s Integration journal)
Backup
Sizer simple search algorithm
Plan
● LUT improvements trial - 2 weeks, EOY
● Circuit analysis - 1 month, Jan 2015
● Circuit synthesis site - 3 months, April 2015
● Analysis, dissertation - 3 months, July 2015
Planning to defend in Summer 2015.