Design and optimization techniques of high-speed VLSI circuits



    Design and optimization techniques of

high-speed VLSI circuits

    Marco Delaurenti

    Politecnico di Torino


    Design and optimization techniques of

high-speed VLSI circuits

    Marco Delaurenti

    PhD Dissertation

    December 1999

    Politecnico di Torino

    Advisor

    Prof. Maurizio Zamboni

    Coordinator

    Prof. Ivo Montrosset


Copyright © 1999 Marco Delaurenti


Writing comes more easily if you have something to say.

(Sholem Asch)

When I use a word, Humpty Dumpty said in rather a scornful tone, it means just what I choose it to mean, neither more nor less.

(Lewis Carroll)


Acknowledgments

First of all I would like to thank my advisor, Prof. M. Zamboni, as well as Prof. G. Piccinini and Prof. G. Masera for their invaluable help, and Prof. P. Civera for being a bridge toward the real world. Many thanks also to the VLSI LAB members at the Politecnico of Turin, Italy: Mario for his input about the critical paths (no, I do not thank you for the jazz songs that you play all day long), Luca for the long discussions about books and movies (no, I haven't seen the last Kubrick movie), Andrea for his very good cocktails (especially the Negroni) and Danilo, because I forgot him every time we went to lunch. Thanks also to Max (for giving me the root password), and to Yuan & Svensson for the invention of the TSPC.

Special thanks, finally, to Mg, for her support and for having tolerated me till now.


    CONTENTS

    Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix

    Part I CMOS Logic 1

    1. Introduction to CMOS logic . . . . . . . . . . . . . . . . . . . . . 3

    1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

    1.2 CMOS logic families . . . . . . . . . . . . . . . . . . . . . . . . 4

    1.2.1 Static logic families . . . . . . . . . . . . . . . . . . . . 5

    1.2.2 Dynamic logic families . . . . . . . . . . . . . . . . . . 6

    1.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

    Part II Circuit Modeling 13

    2. A simple model . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.1 The Elmore model . . . . . . . . . . . . . . . . . . . . . . . . 16

    2.2 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

    3. A complex model . . . . . . . . . . . . . . . . . . . . . . . . . . 21

    3.1 The FAST model . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.1.1 MOS equations . . . . . . . . . . . . . . . . . . . . . . 23

    3.1.2 Internal nodes approximation . . . . . . . . . . . . . . 24


    3.1.3 Body effect . . . . . . . . . . . . . . . . . . . . . . . . . 26

    3.2 Delay estimation . . . . . . . . . . . . . . . . . . . . . . . . . 31

    3.2.1 Equation solving . . . . . . . . . . . . . . . . . . . . . 32

    3.3 Power estimation . . . . . . . . . . . . . . . . . . . . . . . . . 36

    3.3.1 Switching energy . . . . . . . . . . . . . . . . . . . . . 36

3.3.2 Short-circuit energy . . . . . . . . . . . . . . . . . . . 39

    3.3.3 Subthreshold energy . . . . . . . . . . . . . . . . . . 39

    3.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

    3.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

    Part III Optimization 45

4. Mathematical Optimization . . . . . . . . . . . . . . . . . . . . 47

    4.1 Optimization theory . . . . . . . . . . . . . . . . . . . . . . . 48

    4.1.1 Mono-objective optimization . . . . . . . . . . . . . . 49

    4.1.1.1 Unconstrained problem . . . . . . . . . . . . 51

    4.1.1.2 Constrained problem . . . . . . . . . . . . . 52

    Lagrange multiplier and Penalty functions . . 52

    4.1.2 Multi-objective optimization . . . . . . . . . . . . . . 54

    4.1.2.1 Unconstrained . . . . . . . . . . . . . . . . . 56

    4.1.2.2 Constrained . . . . . . . . . . . . . . . . . . 57

    Compromise solution . . . . . . . . . . . . . . 57

    4.2 Optimization Algorithms . . . . . . . . . . . . . . . . . . . . 58

    4.2.1 One-dimensional search techniques . . . . . . . . . . 59

    4.2.1.1 The section search . . . . . . . . . . . . . . . 59

Dichotomic search . . . . . . . . . . . . . . . . 59

    Fibonacci Search . . . . . . . . . . . . . . . . . 60


    The golden section search . . . . . . . . . . . . 60

    Convergence considerations . . . . . . . . . . . 61

    4.2.1.2 Parabolic interpolation . . . . . . . . . . . . 62

Brent's rule . . . . . . . . . . . . . . . . . . . . 62

    4.2.2 Multi-dimensional search . . . . . . . . . . . . . . . . 63

    4.2.2.1 The gradient direction: steepest (maximum)

    descent . . . . . . . . . . . . . . . . . . . . . 63

    4.2.2.2 The optimal gradient . . . . . . . . . . . . . 65

    Convergence considerations . . . . . . . . . . . 66

    4.2.3 The conjugate direction method . . . . . . . . . . . . 67

4.2.3.1 The Fletcher-Reeves conjugate gradient algorithm . . . 68

    4.2.3.2 The Powell conjugate gradient algorithm . . 69

    4.2.4 The SLOP algorithm . . . . . . . . . . . . . . . . . . 70

    4.2.5 The simulated-annealing algorithm . . . . . . . . . . 72

    4.3 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

    5. Circuit Optimization . . . . . . . . . . . . . . . . . . . . . . . . 77

    5.1 Optimization targets . . . . . . . . . . . . . . . . . . . . . . . 78

    5.1.1 Circuit delay . . . . . . . . . . . . . . . . . . . . . . . . 79

    Critical Paths . . . . . . . . . . . . . . . . . . . 80

    5.1.1.1 Delay formula obtained by the Elmore model 84

    5.1.1.2 Delay measurement obtained by the FAST

    model and by HSPICE . . . . . . . . . . . . . 86

    5.1.2 Power consumption . . . . . . . . . . . . . . . . . . . 87

    5.1.3 Area . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

    5.2 Optimization examples . . . . . . . . . . . . . . . . . . . . . . 91

    5.2.1 Algorithm choice . . . . . . . . . . . . . . . . . . . . . 94


    5.2.2 Mono-objective optimizations . . . . . . . . . . . . . . 95

    5.2.2.1 Area . . . . . . . . . . . . . . . . . . . . . . . 95

    5.2.2.2 Power . . . . . . . . . . . . . . . . . . . . . . 96

    5.2.2.3 Delay . . . . . . . . . . . . . . . . . . . . . . 97

    5.2.3 Multi-objective optimizations . . . . . . . . . . . . . . 102

    5.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

6. A CAD tool for optimization . . . . . . . . . . . . . . . . . . . . 107

    6.1 Logical description . . . . . . . . . . . . . . . . . . . . . . . . 107

6.1.1 The optimization algorithm module (OAM) . . . . . . 107

    6.1.2 The function evaluation module (FEM) . . . . . . . . . 109

    6.1.3 Core engine . . . . . . . . . . . . . . . . . . . . . . . . 109

    6.2 Code implementation . . . . . . . . . . . . . . . . . . . . . . . 110

    6.2.1 The classes CircuitNetlist and Circuit . . . . . . . . . 110

    6.2.2 The class EvaluationAlgorithm . . . . . . . . . . . . . 112

    6.2.3 The class OptimizationAlgorithm . . . . . . . . . . . 113

    6.2.4 The critical path retrieving . . . . . . . . . . . . . . . 115

    6.2.5 The derived classes . . . . . . . . . . . . . . . . . . . . 116

    6.3 Program flows . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

    6.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

    7. Results and conclusions . . . . . . . . . . . . . . . . . . . . . . 121

    7.1 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

    7.1.1 Mono-objective vs. Multiobjective . . . . . . . . . . . 122

    7.2 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

    7.3 Future works . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141


    Appendix 143

    A. Class graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

    B. Source code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

    B.1 Main functions . . . . . . . . . . . . . . . . . . . . . . . . . . 149

    B.2 Optimization algorithms . . . . . . . . . . . . . . . . . . . . . 208

    B.3 Simulators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216


    LIST OF FIGURES

1.1 Static and gate . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

    1.2 Pass-transistor logic xor . . . . . . . . . . . . . . . . . . . . . 6

    1.3 Domino typical gate . . . . . . . . . . . . . . . . . . . . . . . 7

    1.4 CVSL typical gate . . . . . . . . . . . . . . . . . . . . . . . . . 8

    1.5 C2MOS typical gate . . . . . . . . . . . . . . . . . . . . . . . . 9

    1.6 TSPC Latches . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.1 RC MOS equivalence . . . . . . . . . . . . . . . . . . . . . . . 15

    2.2 RC chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

    2.3 RC single cell . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

    2.4 Elmore impulse response . . . . . . . . . . . . . . . . . . . . . 18

    3.1 Inverter voltages waveform . . . . . . . . . . . . . . . . . . . 23

3.2 MOS chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

    3.3 Node voltages . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

    3.4 Voltage waveforms in the nMOS chain . . . . . . . . . . . . . 27

    3.5 Voltage waveforms in the pMOS chain . . . . . . . . . . . . . 28

    3.6 VDS and VGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

    3.7 MOSFET chain with static voltages . . . . . . . . . . . . . . . 30

    3.8 Threshold variation . . . . . . . . . . . . . . . . . . . . . . . . 31

    3.9 Delay comparison . . . . . . . . . . . . . . . . . . . . . . . . . 42

    3.10 Energy comparison . . . . . . . . . . . . . . . . . . . . . . . . 43


    4.1 Section search . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

    4.2 Minimization by Powell algorithm . . . . . . . . . . . . . . . 70

    4.3 Minimization by Powell algorithm . . . . . . . . . . . . . . . 71

    4.4 Minimization by SLOP algorithm . . . . . . . . . . . . . . . . 72

    4.5 Minimization by Simulated-annealing algorithm . . . . . . . 73

    4.6 Minimization by Simulated-annealing algorithm . . . . . . . 74

    5.1 Design flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

    5.2 Delay definition . . . . . . . . . . . . . . . . . . . . . . . . . . 79

    5.3 Critical paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

    5.4 Critical path tree . . . . . . . . . . . . . . . . . . . . . . . . . . 83

    5.5 Elmore delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

    5.6 Elmore delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

    5.7 HSPICE delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

    5.8 FAST delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

    5.9 HSPICE Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

    5.10 CMOS Inverter . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

    5.11 TSPC Latches . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

    5.12 TSPC And gates . . . . . . . . . . . . . . . . . . . . . . . . . . 96

    5.13 TSPC Or gates . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

    5.14 Static and-or gate . . . . . . . . . . . . . . . . . . . . . . . . . 98

    5.15 Static parity gate . . . . . . . . . . . . . . . . . . . . . . . . . . 99

    5.16 Static full-adder . . . . . . . . . . . . . . . . . . . . . . . . . . 100

5.17 TSPC full-adder (one stage) . . . . . . . . . . . . . . . . . . . 101

    6.1 Tool block diagram . . . . . . . . . . . . . . . . . . . . . . . . 108


7.1 Comparison of 0.7 µm and 0.25 µm gates @ minimum technology width . . . 124

    7.2 Delay optimization of 0.7 µm gates . . . . . . . . . . . . . . . 125

    7.3 Delay optimization of 0.25 µm gates . . . . . . . . . . . . . . 126

    7.4 Technology comparison of delay optimization . . . . . . . . . 127

    7.5 Several delay-power optimization policies of 0.7 µm gates . . 132

    7.6 Energy-dissipation variation (zoom of figure 7.5(b)) . . . . . 133

    7.7 Several delay-power optimization policies of 0.25 µm gates . 134

    7.8 Energy-dissipation variation (zoom of figure 7.7(b)) . . . . . 135

    7.9 Delay-power optimization (50%-50%) comparison of 0.7 µm and 0.25 µm gates . . . 136

    7.10 Delay and power trajectory during 4 different multi-objective optimizations for the and-or gate . . . 137

    7.11 Delay and power trajectory during 4 different multi-objective optimizations for the parity gate . . . 138

    7.12 Delay and power trajectory during 4 different multi-objective optimizations for the static full-adder . . . 139

    7.13 Delay and power trajectory during 4 different multi-objective optimizations for the dynamic full-adder . . . 140


    LIST OF TABLES

    3.1 Mean Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

    3.2 Execution time . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

    4.1 Optimization algorithms . . . . . . . . . . . . . . . . . . . . . 75

    5.1 Basic gates: complexity . . . . . . . . . . . . . . . . . . . . . . 92

    5.2 Basic gates: pre-optimization delay, power consumption and

    area . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

    5.3 Full-adder: delay optimization . . . . . . . . . . . . . . . . . 99

    5.4 Agreements of targets . . . . . . . . . . . . . . . . . . . . . . 103

    5.5 Full-adder: delay and power optimization . . . . . . . . . . 105

    5.6 Full-adder: optimizations comparison . . . . . . . . . . . . . 105

    7.1 Library gates list . . . . . . . . . . . . . . . . . . . . . . . . . . 122

    7.2 Delay and energy dissipation @ minimum width (HSPICE) . 123

    7.3 Delay decreasing and energy increasing (both relative) in a

    delay optimization. . . . . . . . . . . . . . . . . . . . . . . . . 128

    7.4 Elapsed time and total number of function evaluations for a

    full-delay optimization with HSPICE on a ULTRA-sparc 5 129

7.5 Constrained delay optimization of a few 0.25 µm gates . . . . 130

    7.6 Delay worsening and energy improvement between a full

    delay optimization and delay-power optimization . . . . . . 133


    Preface

The design of a high-speed integrated circuit is a long and complex operation; nonetheless, the total time-to-market allowed from the idea to the silicon masks keeps shrinking.

To help the designer along this long and winding road, several CAD tools are available. In the first step the only thing that exists is the description of the circuit behaviour (the idea); in the central steps of the design flow the designer knows only the logic function of each block composing the circuit, but ignores the technological realization of these blocks; in the last steps, finally, the designer knows exactly the technology implementation of every single gate of the circuit, and can compose the final layout with every gate. It goes without saying that CAD tools are nowadays of vital importance in the design flow, and that the quality of such tools strongly influences the quality of the final design.

Among all the possible instruments, the optimization tools have a primary role in all the phases of a project, starting from the optimization at the higher levels and descending to the optimization made at the electrical level.

This thesis focuses its efforts on developing new strategies and new techniques for the optimization made at the transistor-sizing level, that is, the one done by the cell library engineer, and also on developing a CAD instrument to make this work as painless as possible.


    Part I

    CMOS LOGIC


    Chapter 1

    INTRODUCTION TO CMOS LOGIC

THE optimization of VLSI circuits involves the optimization of single CMOS cells. This chapter briefly reports the basic CMOS logic families, with their pros and cons. The simple goal is to pick, among the static and dynamic logic families, the ones most appealing for use in VLSI circuits and, in some measure, the most widely used, and then apply to them the optimization techniques shown in the next chapters.

    1.1 Introduction

We might ask: why optimize a single cell in a VLSI circuit, when design nowadays is shifting toward higher and higher levels of abstraction? Some answers could be:

    Need for re-usable library cells. This makes it easier to reuse the same library for different projects. It is a must nowadays, in order to reduce the total time to target/market.

    An optimized library makes the design at higher levels easier: floorplanning and routing can have relaxed constraints, since the gates have a better behaviour. It is possible to reduce the number of times critical steps like floorplanning or routing must be repeated until all the specifications are met: these specifications are met earlier, since the cells globally have a better behaviour.

    Need for equivalent libraries with different kinds of optimization. It is possible to have different libraries that have different specifications but are functionally equivalent, so that different versions of a project can be created simply by substituting the basic library. It would be possible, for example, to have, of the same project, a version that runs at full speed and a version optimized for low-power dissipation.

    This swapping of libraries does not involve the higher levels of design, for it is totally transparent to the designer during floorplanning or routing. Just before layout production, during the cell mapping, it is possible to choose the library onto which the project will be mapped.

These answers have led us to consider the appropriateness of producing a tool able to perform the optimization of a cell library in a way convenient for the designer. The goal is to produce some results showing that this optimization is worthwhile during a design cycle, and also to make the insertion of the tool into a design cycle as smooth as possible.

In order to obtain results that are related to a real production cycle, we have to choose cells that are almost certainly present in a real library.

For this purpose we introduce a very brief description of the most used CMOS logic families, and among them we choose the cells used to develop and test the optimization framework.

    1.2 CMOS logic families

The first basic distinction inside the CMOS logic families is between the static logics and the dynamic logics ([1]).

Static logic: A static logic is one in which the functioning of the circuit is not synchronized by a global signal, namely the clock of the circuit. The output is solely a function of the inputs of the circuit, and it is asynchronous with respect to them. The timing of the circuit is defined exclusively by its internal delay.

Dynamic logic: A dynamic logic is one in which the output is synchronized by a global signal, viz. the clock. The output is then a function both of the inputs of the circuit and of the clock signal, and the timing of the circuit is defined both by its internal delay and by the timing of the clock.

Both static and dynamic logics comprise several logic families.

    1.2.1 Static logic families

    The principal static families are:

Conventional static logic: It is the logic normally meant when speaking of static logic. A static gate has the same number of NMOS and PMOS transistors, with the n and p branches each being the dual of the other. As an example see figure 1.1, which represents a static and gate: it has two NMOS transistors connected in series and two PMOS transistors connected in parallel.

    Fig. 1.1: Static and gate (inputs A, B; OUT = A and B)

    The static logic is quite fast, does not dissipate power in steady state and has very good noise margins.

Pseudo-NMOS: It is an evolution of the now superseded NMOS logic. It is obtained by substituting the whole PMOS branch of a static gate with a single PMOS transistor whose gate is connected to ground. This PMOS is therefore always conducting and pulls the output node to the high state. When the NMOS branch also conducts, the output discharges, provided the ratio between the NMOS and PMOS transistors is well designed.

    This logic is cited here only for historical reasons, since it is not so fast, it dissipates static power in steady state (when the output is in the low state) and it is sensitive to noise.

Pass logic: The pass logic is a relatively new logic and, for many digital designs, implementation in pass-transistor logic (PTL) has been shown to be superior to static CMOS in terms of area, timing and power characteristics. As an example see figure 1.2.

    Fig. 1.2: Pass-transistor logic xor gate (OUT = A xor B)

    1.2.2 Dynamic logic families

The principal dynamic families have one characteristic in common: every dynamic logic needs a pre-charge (or pre-discharge) transistor to bring some pre-charged nodes to a known state. This is done during the working phase known as the pre-charge phase or memory phase; during the other working phase, the evaluation phase, the output takes a stable value1.

    1 This brief introduction is limited to systems that have a single global clock, or one phase, where the word phase is intended as a synonym of clock, and not, as above, as a synonym of working period. There are systems that have two, or even four, phases, but they are not introduced here. The basic functioning, however, remains the same.


The principal dynamic logics are further divided into two sub-families, pipelined and non-pipelined. The first two of the families below are non-pipelined, while the others are pipelined:

Domino logic and N-P Domino logic: The typical domino gate is depicted in figure 1.3.

    Fig. 1.3: Domino typical gate

    During the pre-charge phase the clock is in its low state, so that the pre-charged node before the static inverter is high and the output is low. During the evaluation phase the clock is high, so that the inputs of the n-block (which can perform any logical function) can discharge the pre-charged node and bring the output to the high state.

    Several of these gates can be cascaded, given that each gate has its own output inverter, and every gate can be driven with the same clock signal, given that the evaluation phase lasts the time necessary for all the gates to finish evaluating their inputs. This last fact explains why this is a non-pipelined logic: the output of every cell is available when the cell has finished its evaluation phase.

    Moreover, this logic has a limited area occupancy, since it has a low number of PMOS transistors. On the other hand it is not possible to implement inverting structures and, as with all the other dynamic logics, this logic is subject to the charge-sharing problem2.

    2 The charge-sharing problem, or charge redistribution, is a problem that affects the dynamic logics. Basically, the charge stored in a pre-charged node during the memory phase does not remain fully stored in it. Consider a domino gate during the pre-charge phase, when the clock is low. If one input of the n-block is high, its corresponding transistor is conducting. The n-branch as a whole is still not conducting, since the clocked NMOS transistor is off, but some charge from the pre-charged node can flow to other nodes via the conducting transistors in the n-block. This redistribution of charge is simply a capacitive charge partition and leads to a state of the pre-charged node lower than the high state. This problem can produce logic errors, and certainly diminishes the noise margins of the gate.


    A natural evolution of the domino logic is the N-P domino logic, or zipper logic. It consists of two typical cells: the one depicted in figure 1.3, and its dual, obtained by simply swapping the n-block with a p-block and the PMOS pre-charge transistor with an NMOS pre-discharge transistor driven by the negated clock.

    This logic has a lower area occupancy, since there is no need for a static inverter, but it also has a lower speed, due to the presence of PMOS transistors.

Cascode voltage switch logic (CVSL): The CVSL is part of the large family of differential logics. It needs both the inputs and the negated inputs, and two complementary n-blocks that perform the logic function, as can be seen in figure 1.4.

    Fig. 1.4: CVSL typical gate

    It has the advantage of being quite fast, since the positive feedback of the two PMOS transistors accelerates the switching of the gate, and it also has very good noise margins. Moreover it produces both the outputs and the negated outputs without needing an inverter. As a drawback, it has a large area occupancy.

C2MOS logic: The typical C2MOS gate is shown in figure 1.5. It is basically a three-state gate, since when the clock is in the low state the output floats in the high-impedance state.

    Fig. 1.5: C2MOS typical gate

    It is principally used as a dynamic latch, as an interface between static logics and dynamic pipelined logics.

NO RAce logic (NORA): The NORA logic, an acronym for no race, is an evolution of the N-P domino logic. The static inverter of the domino logic is substituted with a C2MOS inverter. This is the first of the pipelined logics, since the output of every gate is available only when the clock switches its state, and not before.

    Since the output stage of every cell is also dynamic (a C2MOS inverter), this logic is more subject to the charge-sharing problem than the domino logic is.


True Single Phase Clock logic (TSPC): The final evolution of NORA is the TSPC logic, or true single phase clock logic ([2]). The TSPC logic is an n-p logic, since each gate exists in an n-version and a p-version. For example, the n-latch and the p-latch are shown in figure 1.6.

    Fig. 1.6: TSPC latches: (a) type n, (b) type p

    The ultimate advantage of the TSPC logic is the presence of a single clock, since its internal structure does not require the negated clock.

    The TSPC logic is among the fastest dynamic families, and it certainly has great appeal because of the very low number of transistors employed.


    1.3 Conclusion

After this very brief introduction to several CMOS families, we chose two different logics to which to apply the optimization techniques that are the object of this thesis. The criteria that drove us in choosing these families were both their diffusion in VLSI circuits and the presence of very good qualities, perhaps not yet fully exploited in the real production of circuits.

For these reasons we have chosen to include in our library a few static gates (an and gate, an or gate, and a few more) and a few dynamic gates, in particular gates from the TSPC family. This family has shown good characteristics in terms of speed, area occupancy and power dissipation; it also has the very important feature of needing only a single clock.

The complete list of the gates comprising the library can be found in table 7.1 (page 122), with their schematic diagrams of CMOS implementation.


    Part II

    CIRCUIT MODELING


    Chapter 2

    A SIMPLE MODEL

THE first model applied to the calculation of the delay in MOS circuits is the Elmore model ([3]). It is a simple RC delay model, and it is the basis of a switch-level MOS model (figure 2.1): the generic MOS is represented, during the ON state, by its dynamic resistance between the drain pin and the source pin, and by the parasitic capacitances and resistances at the drain and source pins.

    Fig. 2.1: RC MOS equivalence

If this simple MOS model is valid, then the Elmore delay formula can be used in every structure containing MOSFETs. The Elmore formula is appealing for its simplicity and its ease of use; however the accuracy of the formula worsens in the deep submicron domain, since modeling a MOS through its resistance is no longer valid.

Since the use of the Elmore model is almost entirely limited to comparisons with other models, or to an introduction to delay modelling, section 2.1 presents only the very basics of the Elmore model and section 2.2 draws the conclusions about the use of this model for VLSI circuits.

2.1 The Elmore model

The Elmore model, or the Elmore delay formula, can predict the delay of an RC chain such as the one shown in figure 2.2.

    Fig. 2.2: RC chain

In order to obtain the formula, let us start with a single RC cell, as shown in figure 2.3. We can express the voltage $V_1(t)$ by means of a differential equation such as:

    $$C_0\,\frac{dV_1}{dt} = \frac{V_0(t) - V_1(t)}{R_0} \qquad (2.1)$$

    Integrating equation (2.1), we can write

    $$V_1 = V_0(t)\left(1 - e^{-\frac{t}{R_0 C_0}}\right).$$

    The time constant is $\tau = R_0 C_0$, and with $t = \tau$ we obtain:


    Fig. 2.3: RC single cell

    $$V_1 = 0.63\,V_0(t).$$

    So the time $t_D = \tau$ represents the 63% delay from $V_0(t)$ to $V_1(t)$. Extending the formula of the time constant to the chain of figure 2.2, we obtain:

    $$t_D = \sum_{i=0}^{N}\left(\sum_{j=0}^{i} R_j\right) C_i.$$
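    As a concrete illustration of this ladder formula, the following is a minimal C++ sketch (names are illustrative; this is not part of the thesis' tool): it accumulates the resistance seen from the input up to each node and weights it by that node's capacitance.

        #include <vector>
        #include <cstddef>

        // Elmore delay of an RC ladder driven at node 0:
        //   t_D = sum_i ( sum_{j<=i} R_j ) * C_i
        // R[k] and C[k] are the resistance and capacitance of the k-th cell.
        double elmoreLadderDelay(const std::vector<double>& R,
                                 const std::vector<double>& C)
        {
            double delay = 0.0;
            double upstreamR = 0.0;            // R_0 + R_1 + ... + R_i so far
            for (std::size_t i = 0; i < R.size() && i < C.size(); ++i) {
                upstreamR += R[i];
                delay += upstreamR * C[i];     // contribution of node i
            }
            return delay;                      // 63% delay estimate of the chain
        }

    For a single cell (N = 0 in the sum, one R and one C) this reduces to $R_0 C_0 = \tau$, as expected.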

    This delay is the input-output delay. When there is the need to know the delay between the input and one of the inner nodes, a more complex (semi-empirical) formula can be used; for example, with N = 2:

    $$t_1 = R_0 C_0 + q\,R_1 C_1 \qquad \text{(delay from the input node to the first node)}$$

    $$t_2 = R_0 C_0 + (R_0 + R_1)\,C_1 \qquad \text{(delay from the input node to the output node)}$$

    where q is:

    $$q = \begin{cases} \dfrac{R_0}{R_0 + R_1} & \text{if } R_1 \le 2R_0, \\[2mm] \dfrac{R_0 C_0}{R_0 C_0 + R_1 C_1} & \text{if } R_1 > 2R_0. \end{cases}$$


    The first case (with $R_1 \le 2R_0$) is named strong coupling, while the second one is named weak coupling.

    Given the unit impulse response h(t) (figure 2.4) of the output node of the RC tree, Elmore proposed to approximate the delay by the mean of h(t), considering h(t) as a distribution. The 50% delay is given by:

    $$\int_0^{t_D} h(t)\,dt = 0.5,$$

    while the original work of Elmore proposed:

    $$t_D = m = \int_0^{\infty} t\,h(t)\,dt \qquad \text{with} \qquad \int_0^{\infty} h(t)\,dt = 1.$$

    Fig. 2.4: Elmore impulse response


    This approximation is valid only when h(t) is a symmetrical distribution, as in figure 2.4, while in real cases the h(t) distribution is asymmetrical; however in [4] it is proved that the Elmore approximation is an upper bound for the 50% delay even when the impulse response is not symmetrical, and, furthermore, that the real delay asymptotically approaches the Elmore bound as the input signal rise (or fall) time increases.
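    The bound can be verified directly on the single RC cell, whose impulse response is $h(t) = \frac{1}{\tau} e^{-t/\tau}$: the Elmore mean is $m = \tau$, while the exact 50% delay is $\tau \ln 2 \approx 0.69\,\tau$. A tiny numerical check in C++ (illustrative only, not from the thesis sources):

        #include <cmath>
        #include <cstdio>

        int main()
        {
            const double tau = 1.0;                      // R0*C0, normalized
            // Elmore delay: mean of h(t) = integral of t*h(t) dt = tau.
            // Exact 50% delay: solve integral_0^tD h(t) dt = 0.5 -> tD = tau*ln(2).
            const double elmoreBound = tau;
            const double halfDelay   = tau * std::log(2.0);
            std::printf("Elmore bound = %.3f, 50%% delay = %.3f\n",
                        elmoreBound, halfDelay);         // bound >= 50% delay
            return 0;
        }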

    2.2 Conclusions

The model shown in this chapter is quite appealing for the calculation of the delay in CMOS structures, but it becomes inaccurate as we move into the submicron domain, so its use should be limited to a first validation of an optimization algorithm, not to real production.

In this respect, it is important to note that the delay functions obtained with the Elmore formula satisfy some properties that are useful in the optimization realm (for example equation (4.1), page 50): the Elmore model is therefore very useful for testing optimization algorithms.


    Chapter 3

    A COMPLEX MODEL

THE target of the model developed here is to offer limited estimation errors with respect to physical SPICE simulations and to improve the computation speed by more than one order of magnitude. This can be useful in optimization algorithms. Thus the aim of the model is to evaluate the delay and the power dissipation of CMOS structures.

Several approaches have been used to evaluate the delays of CMOS structures: some models are derived from SPICE simulations by means of look-up tables [5]; some are analytical [6], while others approximate the evaluation of the delay with step or ramp inputs [7, 8, 9, 10, 11].

Regarding the power consumption, the main contributions are switching power, short-circuit current and subthreshold conduction. The first one occurs during the charge and discharge of internal capacitances; the short-circuit current originates from the simultaneous conduction of the p and n networks and is dominated by the slope of the node voltages; subthreshold currents are due to the weak-inversion conduction of MOSFETs and become relevant when the power supply is scaled in sub-micron technologies.

Most of the proposed power models use estimation algorithms that are not compatible with the delay analysis. The purpose of the FAST model is to combine delay and power evaluations in the same estimation procedure, allowing the simultaneous optimization of delay and power.


Section 3.1 reports the theory behind the FAST model: in particular, 3.1.1 shows the MOS equations used in the model, 3.1.2 shows the internal node voltage approximation made by the model and 3.1.3 explains how the threshold voltage variations are taken into account. Section 3.2 shows how the FAST model estimates the delay, and in particular 3.2.1 shows how the equations are solved; section 3.3 reports the method used for the calculation of the power consumption: 3.3.1 accounts for the switching power, 3.3.2 for the short-circuit power, and 3.3.3 for the subthreshold power. Finally, section 3.4 presents some results from the comparison of the model with HSPICE and section 3.5 draws some conclusions.

    3.1 The FAST model

The low complexity and the accuracy that can be obtained by taking care of the phenomenon of carrier velocity saturation, which is dominant in submicron technologies, suggested the use of the classical charge-control analysis and the gradual-channel approximation (Hodges model), described in 3.1.1.

Estimation accuracy and low computational effort can be achieved by operating both on the waveforms of the internal signals and on topology considerations: in particular, all the waveforms in the circuit are approximated with linear ramps.

By approximating the input waveform with a ramp, a strong simplification of the I(V) equations is obtained. Figure 3.1 shows the output voltage of an inverter driven by a ramp input. It can be noticed that a ramp can properly approximate the output voltage variation, especially in the central phases of the commutation. The increasing error on the tail of the switching does not significantly affect the delay and power estimation.

The voltage ramp approximations are described in 3.1.2.


    Fig. 3.1: Inverter voltage waveforms: input Vin, output Vout and the model approximation (voltage versus time in ns)

3.1.1 MOS equations

    The well known equations for the MOS transistors are (for the n-type and p-type transistors) [1]:

    below saturation

    $$I_{DS_{n,p}} = \beta_{n,p}\left[\left(V_{GS} - V_{T_{n,p}}\right)V_{DS} - \frac{V_{DS}^2}{2}\right] \qquad (3.1)$$

    above saturation

    $$I_{DS_{n,p}} = \frac{\beta_{n,p}}{2}\,V_{DS_{sat_{n,p}}}^2 \qquad (3.2)$$

    where $\beta_{n,p} = \mu_{n,p} C_{ox} \frac{W}{L}$, with $\mu_{n,p}$ modified by the carrier velocity saturation effect:

    $$\mu_n = \frac{\mu_{n0}}{1 + \frac{V_{DS}}{L E_c}} \qquad\qquad \mu_p = \frac{\mu_{p0}}{1 - \frac{V_{DS}}{L E_c}}$$

    The saturation voltage (drain-source), not including the carrier velocity saturation effect, is given by the well known formula:

    $$V_{DS_{sat_{n,p}}} = V_{GS_{n,p}} - V_{T_{n,p}}$$

    while considering the above-mentioned effect:

    $$V_{DS_{sat_{n,p}}} = V_c\left[\sqrt{1 \pm \frac{2\left(V_{GS_{n,p}} - V_{T_{n,p}}\right)}{V_c}} - 1\right] \qquad (3.3)$$

    where the plus signs are for the nMOSFETs and the minus signs are for the pMOSFETs, and $V_c = |E_c L|$.
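    The following C++ sketch illustrates how equations (3.1)-(3.3) combine for an nMOSFET; parameter names are illustrative and not taken from the FAST implementation, and the choice of where to evaluate the degraded mobility at the edge of saturation is one reasonable assumption.

        #include <cmath>

        struct NmosParams {
            double beta0;   // mu_n0 * Cox * W / L
            double vt;      // threshold voltage V_Tn
            double vc;      // |Ec * L|, velocity-saturation voltage
        };

        // Drain current of an nMOSFET following eqs. (3.1)-(3.3).
        double nmosIds(const NmosParams& p, double vgs, double vds)
        {
            if (vgs <= p.vt) return 0.0;   // subthreshold conduction neglected in this sketch
            // Saturation voltage including velocity saturation, eq. (3.3):
            const double vdsat = p.vc * (std::sqrt(1.0 + 2.0 * (vgs - p.vt) / p.vc) - 1.0);
            if (vds < vdsat) {
                const double beta = p.beta0 / (1.0 + vds / p.vc);    // degraded mobility
                return beta * ((vgs - p.vt) * vds - 0.5 * vds * vds); // eq. (3.1)
            }
            const double beta = p.beta0 / (1.0 + vdsat / p.vc);       // assumed evaluation point
            return 0.5 * beta * vdsat * vdsat;                         // eq. (3.2)
        }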

    3.1.2 Internal nodes approximation

    Fig. 3.2: MOS chain with proper numbering

    Let N be the number of nMOSFETs in the n-chain and P the number of pMOSFETs in the p-chain, and let us label the transistors in each chain from 1 to N or from 1 to P (figure 3.2). Let us assume that label 1 belongs to the driving transistor (i.e. the nMOSFET with source connected to VSS or the pMOSFET with source connected to VDD), as in figure 3.2. This hypothesis is only for the development of the discussion; in our model any (but only one) transistor can be the driving transistor, that is, the transistor with a changing gate voltage.

    Notation 3.1. In the following equations the superscript index refers to the node number (with the variable i always for the nMOSFETs and j always for the pMOSFETs), and the small-letter subscript indexes n and p refer, respectively, to nMOSFETs and pMOSFETs, both for the voltage variables and for the time variables; for the voltage variables the capital subscript indexes G and D refer to the gate node and the drain node, while the small-letter index d refers to the initial conditions of the drain nodes.

    So, for example, $V^i_{G_n}(t)$ is the gate voltage at node i for the nMOSFETs (a function of time), and $V^j_{d_p}$ is the initial condition of the drain voltage at node j for the pMOSFETs.

    The waveforms of the voltages are shown in figure 3.4 and figure 3.5, with the hypothesis $t^1_{0_n} = t^2_{0_n} = \dots = t^N_{0_n}$ and $t^1_{0_p} = t^2_{0_p} = \dots = t^P_{0_p}$; that is because we suppose that all the MOSFETs in a chain start conducting at the same time1.

    We can write, referring to figures 3.4 and 3.5:

    $$V^1_{G_n}(t) = \begin{cases} 0 & t < 0 \\ \dfrac{V_{DD}}{\tau^1_{i_n}}\,t & 0 \le t < \tau^1_{i_n} \\ V_{DD} & \tau^1_{i_n} \le t \end{cases} \qquad (3.4a)$$

    $$V^1_{G_p}(t) = \begin{cases} V_{DD} & t < 0 \\ V_{DD} - \dfrac{V_{DD}}{\tau^1_{i_p}}\,t & 0 \le t < \tau^1_{i_p} \\ 0 & \tau^1_{i_p} \le t \end{cases} \qquad (3.4b)$$

    $$V^i_{G_n}(t)\Big|_{i=2,3,\dots,N} = V_{DD} \quad \forall t \qquad (3.4c)$$

    1 This hypothesis is well supported by simulations


    $$V^j_{G_p}(t)\Big|_{j=2,3,\dots,P} = V_{SS} \quad \forall t \qquad (3.4d)$$

    $$V^i_{D_n}(t)\Big|_{i=1,2,\dots,N} = \begin{cases} V^i_{d_n} & t < t^i_{0_n} \\ V^i_{d_n} - V^i_{d_n}\,\dfrac{t - t^i_{0_n}}{\tau^i_{o_n} - t^i_{0_n}} & t^i_{0_n} \le t < \tau^i_{o_n} \\ V_{SS} & \tau^i_{o_n} \le t \end{cases} \qquad (3.4e)$$

    $$V^j_{D_p}(t)\Big|_{j=1,2,\dots,P} = \begin{cases} V^j_{d_p} & t < t^j_{0_p} \\ \dfrac{V_{DD} - V^j_{d_p}}{\tau^j_{o_p} - t^j_{0_p}}\,t + \dfrac{\tau^j_{o_p} V^j_{d_p} - t^j_{0_p} V_{DD}}{\tau^j_{o_p} - t^j_{0_p}} & t^j_{0_p} \le t < \tau^j_{o_p} \\ V_{DD} & \tau^j_{o_p} \le t \end{cases} \qquad (3.4f)$$

    Fig. 3.3: The ith and i+1th MOSFETs with node voltages

    It is also possible to define $\tau^i_{i_{n,p}} = \tau^{i-1}_{o_{n,p}}$ and to identify the source voltage of one MOSFET with the drain voltage of the adjacent one ($V^{i+1}_s = V^i_d$), as shown in figure 3.3 for the i-th nMOS. The same holds for the pMOSFETs.

    The starting levels $V_{d_{n,p}}$ are determined with a static analysis, described in 3.1.3.
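    Since the internal waveforms are simple piecewise-linear ramps, they are straightforward to evaluate in code. The following C++ sketch (illustrative names, not the FAST sources) evaluates the driving-gate ramp of eq. (3.4a) and the drain discharge ramp of eq. (3.4e), with VSS taken as 0.

        // Gate voltage of the driving nMOSFET, eq. (3.4a): rising ramp of duration tauI.
        double gateVoltageN1(double t, double vdd, double tauI)
        {
            if (t < 0.0)  return 0.0;
            if (t < tauI) return vdd * t / tauI;
            return vdd;
        }

        // Drain voltage of the i-th nMOSFET, eq. (3.4e): it stays at the initial
        // level vd until t0, then ramps linearly down to 0, reached at time tauO.
        double drainVoltageN(double t, double vd, double t0, double tauO)
        {
            if (t < t0)   return vd;                                  // static level
            if (t < tauO) return vd * (1.0 - (t - t0) / (tauO - t0)); // discharge ramp
            return 0.0;                                               // fully discharged
        }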

    3.1.3 Body effect: threshold variation and its approximation

    It is known that a MOS transistor with a source-body voltage different from zero has its threshold voltage modified by the body effect; for an nMOSFET,

    $$\Delta V_{T_n} = \gamma\left(\sqrt{2|\phi_p| + V_{sb}} - \sqrt{2|\phi_p|}\right),$$

    where $\gamma$ is the body-effect coefficient and $\phi_p$ the Fermi potential.

    Fig. 3.4: Voltage waveforms in the nMOS chain

    Fig. 3.5: Voltage waveforms in the pMOS chain

    For the nMOS chain with the static voltages depicted in figure 3.7(a), the source potential of the top transistor is

    $$V_s = V_{DD} - V_{T_n},$$

    and, if $V_{T_{n0}}$ is the threshold voltage with $V_{sb} = 0$, then $V_{T_n} = V_{T_{n0}} + \Delta V_{T_n}$ and we can solve for $V_{sb}$:

    $$V_{sb} = -\frac{\gamma}{2}\sqrt{4\gamma\sqrt{2|\phi_p|} + 8|\phi_p| + 4V_{DD} - 4V_{T_{n0}} + \gamma^2} + \gamma\sqrt{2|\phi_p|} + V_{DD} - V_{T_{n0}} + \frac{\gamma^2}{2} \qquad (> 0)$$

    We can find an analogous equation for the pMOSFETs: for the pMOS chain depicted in figure 3.7(b), the drain potential of transistor P is $V^P_{d_p} = 0$, while $V^P_{s_p} = -V_{T_p}$; for the middle transistors $V^j_{d_p} = V^j_{s_p} = -V_{T_p}$; and for the first (top) transistor $V^1_{d_p} = -V_{T_p}$ and $V^1_{s_p} = V_{DD}$.

    The threshold voltage variation as a function of $V_{sb}$ again is:


    Fig. 3.6: Drain-source ($V_{DS}$) and gate-source ($V_{GS}$) voltages of the i-th nMOS

    $$\Delta V_{T_p} = -\gamma\left(\sqrt{2|\phi_p| + |V_{sb}|} - \sqrt{2|\phi_p|}\right)$$

    (for pMOS transistors the threshold voltage is negative).

    Again, solving

    $$V_{sb} = -V_{DD} - V_{T_p} = -V_{DD} - V_{T_{p0}} + \gamma\left(\sqrt{2|\phi_p| + |V_{sb}|} - \sqrt{2|\phi_p|}\right),$$

    where $V_{T_{p0}}$ is the threshold voltage of a pMOSFET whose source is at $V_{DD}$, we find:

    $$V_{sb} = \frac{\gamma}{2}\sqrt{4\gamma\sqrt{2|\phi_p|} + 8|\phi_p| + 4V_{DD} + 4V_{T_{p0}} + \gamma^2} - \gamma\sqrt{2|\phi_p|} - V_{DD} - V_{T_{p0}} - \frac{\gamma^2}{2} \qquad (< 0)$$
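    As a small check of the nMOS closed form above, a C++ sketch is given below (symbol names are illustrative; gamma is the body-effect coefficient and phi stands for $|\phi_p|$): it computes the self-consistent $V_{sb}$ of a transistor whose source settles at $V_{DD} - V_{T_n}$.

        #include <cmath>

        // Solves Vsb = VDD - VTn0 - gamma*(sqrt(2*phi + Vsb) - sqrt(2*phi)) in closed form.
        double nmosSourceBodyVoltage(double vdd, double vtn0, double gamma, double phi)
        {
            const double root = std::sqrt(4.0 * gamma * std::sqrt(2.0 * phi)
                                          + 8.0 * phi + 4.0 * vdd - 4.0 * vtn0
                                          + gamma * gamma);
            return -0.5 * gamma * root
                   + gamma * std::sqrt(2.0 * phi)
                   + vdd - vtn0
                   + 0.5 * gamma * gamma;      // result is positive
        }

    For example, with VDD = 5 V, VTn0 = 0.8 V, gamma = 0.5 V^(1/2) and 2*phi = 0.7 V, the function returns about 3.58 V, which indeed satisfies $V_{sb} = V_{DD} - V_{T_n}(V_{sb})$.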

    The threshold variation is approximated in the model by a linear function of $V_{sb}$, as shown in figure 3.8.


    Fig. 3.8: Threshold variation with $V_{sb}$ (solid line) and its linear approximation (dashed line): (a) nMOSFET, (b) pMOSFET

    In figures 3.8(a) and 3.8(b) the actual threshold variation (of an nMOS transistor and of a pMOS transistor) when a $V_{sb}$ voltage is applied is compared with the linear approximation used in our model, for a 0.7 µm technology.

    The maximum error due to the linear approximation is limited to 7%.

    3.2 Delay estimation

    The delay estimation of the structures reported in figure 3.2 implies the evaluation of $\tau^i_{o_{n,p}}$ and $t^i_{0_{n,p}}$ for each transistor in the chains.

    The current in each transistor can be obtained from equations (3.1), (3.2) (page 23), with the voltages as functions of time defined in equations (3.4a)-(3.4f) (page 25). So we can calculate the quantity of charge at each node and thus apply the charge conservation law, i.e. at each node the total charge variation must be equal to zero:

    $$\Delta Q^i_n = 0 \qquad \Delta Q^j_p = 0 \qquad i = 1, 2, \dots, N \ \text{ and } \ j = 1, 2, \dots, P \qquad (3.5)$$

    The generic term $\Delta Q^i_n$ is the sum of three elements, $\Delta Q^i_n = Q^{i+1}_I - Q^i_I + Q^i_C$, defined below:


    $Q^{i+1}_I$ is the charge due to the (i+1)-th MOSFET placed above the i-th node:

    $$Q^{i+1}_I = \int_{t^{i+1}_{0_n}}^{t^{i+1}_{s_n}} I^{i+1}_{sat}(t)\,dt + \int_{t^{i+1}_{s_n}}^{\tau^{i+1}_{o_n}} I^{i+1}_{lin}(t)\,dt \qquad (3.6a)$$

    which includes the contributions due to the currents above and below saturation; $t_s$ is the time at which the MOSFET switches from the saturation to the linear region;

    $Q^i_I$ is the charge due to the i-th MOSFET below the i-th node:

    $$Q^i_I = \int_{t^i_{0_n}}^{t^i_{s_n}} I^i_{sat}(t)\,dt + \int_{t^i_{s_n}}^{\tau^i_{o_n}} I^i_{lin}(t)\,dt \qquad (3.6b)$$

    $Q^i_C$ is the charge due to the discharging of the capacitor $C^i$ at the i-th node:

    $$Q^i_C = C^i\,V^i_{d_n}. \qquad (3.6c)$$

    Similar equations apply for the pMOSFETs.

    For each circuit node, a charge conservation equation can be written.

    3.2.1 Equation solving

    Referring to the nMOS chain in figure 3.3, we can write at the output node N:

    $$Q^N_I = Q^N_C = C^N V^N_{d_n} \qquad (3.7)$$

    because, neglecting the contribution of the pMOS chain above (if it exists), $Q^{N+1}_I = 0$.

    At node $N-1$ we can write:

    $$\Delta Q^{N-1}_n = Q^N_I - Q^{N-1}_I + Q^{N-1}_C,$$

    and, combining with eq. (3.7) (page 32),

    $$\Delta Q^{N-1}_n = C^N V^N_{d_n} - Q^{N-1}_I + Q^{N-1}_C,$$

    and so on:

    $$\Delta Q^{N-2}_n = C^N V^N_{d_n} + C^{N-1} V^{N-1}_{d_n} - Q^{N-2}_I + Q^{N-2}_C.$$

    More generally:

    $$\Delta Q^i_n = \sum_{k=i+1}^{N} C^k V^k_{d_n} - Q^i_I + Q^i_C = \sum_{k=i}^{N} C^k V^k_{d_n} - Q^i_I = 0$$

    Proceeding down to the first transistor, we obtain:

    $$\Delta Q^1_n = \sum_{k=1}^{N} C^k V^k_{d_n} - Q^1_I = 0; \qquad (3.8)$$

    the same applies for the pMOSFETs.

    In order to solve the nonlinear equation (3.8) one must substitute the definition of the current into the charge Q, as in equations (3.6a), (3.6b) (page 32); moreover, one must substitute both the current calculated in the saturation region and the one calculated in the linear region, extending the integrals of the aforementioned equations to the proper extremes.

    Finally, we must distinguish among several different cases, depending on the instant of time at which the transistor switches from the saturation region to the linear region. For example, the first transistor can switch between the two regions when the rise of the input has already finished, or, on the contrary, while the input is still rising. All the possible cases are:

    $$t^1_0 \le t^1_s \le \tau^1_i \le \tau^1_o \qquad\quad t^1_0 \le \tau^1_i \le t^1_s \le \tau^1_o \qquad\quad t^1_s \le t^1_0 \le \tau^1_i \le \tau^1_o \qquad\quad \tau^1_i \le t^1_0 \le t^1_s \le \tau^1_o$$

    $$t^1_s \le \tau^1_i \le t^1_0 \le \tau^1_o \qquad\quad t^1_0 \le t^1_s \le \tau^1_o \le \tau^1_i \qquad\quad t^1_s \le t^1_0 \le \tau^1_o \le \tau^1_i \qquad\qquad (3.9)$$

    Evaluating all the possible cases, equation (3.8) becomes a nonlinear equation in the variables $t^1_s$, $t^1_0$, $\tau^1_o$, $\tau^1_i$, with $t^1_s$, $t^1_0$, $\tau^1_o$ as unknowns. A further step must be taken, with the purpose of eliminating all the variables but one. The real unknown is the time $\tau^1_o$, while all the other unknowns can be expressed as functions of $\tau^1_o$: in particular, the times $t^1_s$ and $t^1_0$ can be calculated together, using the saturation condition $V_{DS} = V_{GS} - V_T$ (or, taking into account the carrier velocity saturation effect, equation (3.3), page 24) and the equation that states the charge conservation at node 1 between time 0 and time $t^1_0$, similar to equation (3.5) (page 31), including the bootstrap effect due to the capacitive coupling between the gate and the drain of the first transistor.

    Both these equations are functions of $t^1_s$, $t^1_0$, $\tau^1_o$, $\tau^1_i$. In this way one has three equations with three unknowns, and by means of some approximate methods (the problem is always strictly nonlinear) it is possible to evaluate the three unknowns.

    This solution scheme ought to be repeated for all the seven cases shown in equation (3.9). Each case gives as a solution a triple $t^1_s$, $t^1_0$, $\tau^1_o$ that is compatible with one and only one of the conditions expressed by these cases. Thus, only one working condition is really selected, as can be expected.
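    The overall structure of this case-by-case solution can be sketched in C++ as follows. This is only an illustration of the scheme under simplifying assumptions (a scalar residual, a bracketing interval assumed to exist, illustrative names); the actual model solves the coupled equations analytically or with its own approximate methods.

        #include <functional>
        #include <cmath>

        // Simple bisection for the scalar charge-balance residual f(tauO) = 0,
        // assuming f changes sign on [lo, hi].
        double bisect(const std::function<double(double)>& f, double lo, double hi, int iters = 60)
        {
            double flo = f(lo);
            for (int k = 0; k < iters; ++k) {
                const double mid  = 0.5 * (lo + hi);
                const double fmid = f(mid);
                if ((flo < 0.0) == (fmid < 0.0)) { lo = mid; flo = fmid; }
                else                             { hi = mid; }
            }
            return 0.5 * (lo + hi);
        }

        // residual(tauO): eq. (3.8) with t1s and t10 expressed as functions of tauO
        // under the case's hypotheses; consistent(tauO): checks the assumed ordering.
        struct SwitchingCase {
            std::function<double(double)> residual;
            std::function<bool(double)>   consistent;
        };

        double solveOutputRamp(const SwitchingCase cases[], int nCases, double tMin, double tMax)
        {
            for (int c = 0; c < nCases; ++c) {
                const double tauO = bisect(cases[c].residual, tMin, tMax);
                if (cases[c].consistent(tauO))
                    return tauO;          // only one working condition is selected
            }
            return NAN;                   // no consistent case found
        }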

    Indeed, the above solving scheme holds only if equation (3.6c) (page 32) applies, i.e. only if the capacitance at node i is not a function of the voltage at that same node. But the capacitance actually is a function of the voltage in this manner:


    $$C^i = \frac{C^i_j}{\left(1 + \frac{V^i}{\phi_b}\right)^{m_j}} + \frac{C^i_p}{\left(1 + \frac{V^i}{\phi_b}\right)^{m_p}} \qquad (3.10)$$

    where $C_j$ and $C_p$ are, respectively, the area term and the perimeter term of a junction, because the capacitance at node i is due to the parasitic capacitances of the transistors connected to this node.
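    Equation (3.10) translates into a one-line C++ helper; the parameter names below (built-in potential phiB, grading exponents mj and mp) are illustrative assumptions.

        #include <cmath>

        // Voltage-dependent node capacitance, eq. (3.10): area (cj, mj) and
        // perimeter (cp, mp) junction terms, with built-in potential phiB.
        double nodeCapacitance(double v, double cj, double mj, double cp, double mp, double phiB)
        {
            return cj / std::pow(1.0 + v / phiB, mj)
                 + cp / std::pow(1.0 + v / phiB, mp);
        }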

    If the capacitances at the nodes are functions of the voltages at the nodes themselves, then one equation is no longer sufficient: one must write equations like equation (3.8) (page 33), one for each node, and then solve them with standard algorithms for nonlinear equations. The only difference between the equations applied at the nodes above the first one and the first-node equation is that not all of the cases of equation (3.9) are possible: in particular, those conditions apply only when the transistor can pass from the saturation region to the linear region and, moreover, only when the input rise time $\tau^1_i$ can assume any value. The passage from saturation to linearity can occur only for the first and the last transistors of the chain, as they are the only ones that can saturate3. But for the last transistor the time $\tau^N_i$ is governed by $\tau^N_i = \tau^{N-1}_o$, giving thus only two possible cases:

    $$t^N_0 \le t^N_s \le \tau^N_i \le \tau^N_o \qquad\qquad t^N_0 \le \tau^N_i \le t^N_s \le \tau^N_o$$

    In order to make the algorithm convergent, two other fictitious cases

    must be included:

    $$t^N_0 \le t^N_s, \quad \tau^N_o \le \tau^N_i \qquad\qquad t^N_s \le t^N_0, \quad \tau^N_o \le \tau^N_i$$

    These conditions can never occur in a real circuit, since they imply that the voltages at the source node and at the drain node of the last transistor cross, making the transistor current flow in the reverse direction (see figure 3.6 for a visual explanation of the terms $\tau_i$ and $\tau_o$ and of why their relative voltage waveforms cannot cross). Their inclusion helps in finding the real circuit conditions when solving equation (3.8) for each of these four cases: the solution of one of the fictitious cases gives only unknowns compatible with one of the real cases.

    3 This is because they are the only ones that have a full voltage swing at some node, i.e. the gate node for the first and the drain node for the last. All the transistors in the middle of the chain are prevented from saturating by the body effect, which makes the saturation condition $V_{DS} = V_{GS} - V_T$ (or, better, equation (3.3), page 24) impossible.

    All the other transistors, which cannot saturate during the switching from off to on, have only one possible working condition, again that the voltages at the source and drain nodes do not cross:

    $$\tau^j_i \le \tau^j_o \qquad j = 2, \dots, N-1$$

    Solving all the equations, one for each node, the unknowns $\tau^j_o$ can be evaluated, giving an estimate of the voltage waveform at each node of the chain. The rising/falling time of the last node of the chain also gives the delay of the chain itself.

    3.3 Power consumption estimation

    3.3.1 Switching energy

    The contribution to the power dissipation due to the charge and dis-

    charge of internal nodes for each MOSFET can be defined as the integral of

the voltage across the MOSFET times the current flowing through it.

    Theorem 3.2. The switching energy in generic n-networks and p-networks can be written as:

    $$E_{sw_n} = \frac{1}{2}\sum_{i=1}^{N} C^i\left({V'_i}^2 - {V''_i}^2\right) \qquad (3.11)$$

    $$E_{sw_p} = \frac{1}{2}\sum_{j=1}^{P} C^j\left[\left(V_{DD} - V'_j\right)^2 - \left(V_{DD} - V''_j\right)^2\right] \qquad (3.12)$$

    where $C^i$ is the total capacitance of the i-th node and $V'_i$, $V''_i$ are, respectively, the initial and final values of the voltage at that node.


Corollary 3.2.1. If the voltage swing of each node of the network is the full swing ΔV = V_DD − 0, then equations (3.11), (3.12) can be written as:

E_{sw_n} = \frac{1}{2} \sum_{i=1}^{N} C_i\, \Delta V^2    (3.13)

E_{sw_p} = \frac{1}{2} \sum_{i=1}^{P} C_i\, \Delta V^2    (3.14)

Proof of theorem 3.2. Since the internal voltages and currents are known from the delay analysis, the energy for the nMOS network can be written by summing all the contributions of the internal nodes (see figure 3.3):

E_{sw_n} = \sum_{i=1}^{N} \int \left[ V_{D_n}^{i}(t) - V_{D_n}^{i-1}(t) \right] I_{D_n}^{i}(t)\, dt

where the notation of figure 3.3 is adopted (with V_{D_n}^{0} = 0 at the ground node).

This equation can be rewritten as:

E_{sw_n} = \int \left[ V_{D_n}^{N}(t)\, I_{D_n}^{N}(t) + \sum_{i=1}^{N-1} V_{D_n}^{i}(t) \left( I_{D_n}^{i}(t) - I_{D_n}^{i+1}(t) \right) \right] dt    (3.15)

It is possible to rewrite the previous equations by noting that, in general,

I_{D_n}^{i+1} - I_{D_n}^{i} = C^{i}\, \frac{dV_{D_n}^{i}}{dt}

and, in particular, if we neglect the current of the pMOS chain above node N,

I_{D_n}^{N} = -\, C^{N}\, \frac{dV_{D_n}^{N}}{dt}.

    Thus, for the n network it is possible to define the Eswn energy in the

    following way:


E_{sw_n} = -\sum_{i=1}^{N} C^{i} \int_{t'_0}^{t''_0} V_{D_n}^{i}\, \frac{dV_{D_n}^{i}}{dt}\, dt = -\sum_{i=1}^{N} C^{i} \int_{V'_i}^{V''_i} V_{D_n}^{i}\, dV_{D_n}^{i} = \frac{1}{2} \sum_{i=1}^{N} C^{i} \left( {V'_i}^2 - {V''_i}^2 \right)

If we integrate equation (3.11) (page 36) only when the arguments of the integrals are non-zero, then the first integral in this equation goes from t'_0 = t_{0_n}^{i} to t''_0 = τ_{o_n}^{i}, so that the second integral goes from V'_i = V_{D_n}^{i}(t_{0_n}^{i}) to V''_i = V_{D_n}^{i}(τ_{o_n}^{i}). Since V_{D_n}^{i}(τ_{o_n}^{i}) = 0, we have E_{sw_n} = \frac{1}{2} \sum_{i=1}^{N} C^{i} {V'_i}^2, where V'_i is the actual voltage swing at node i.

The energy dissipated in the p–network (E_{sw_p}) can be calculated with similar considerations, leading to

E_{sw_p} = \sum_{j=1}^{P} C^{j} \int_{t'_0}^{t''_0} \left( V_{DD} - V_{D_p}^{j} \right) \frac{dV_{D_p}^{j}}{dt}\, dt = \sum_{j=1}^{P} C^{j} \int_{V'_j}^{V''_j} \left( V_{DD} - V_{D_p}^{j} \right) dV_{D_p}^{j} = \frac{1}{2} \sum_{j=1}^{P} C^{j} \left[ (V_{DD} - V'_j)^2 - (V_{DD} - V''_j)^2 \right]

Again, V'_j = V_{D_p}^{j}(t_{0_p}^{j}) and V''_j = V_{D_p}^{j}(τ_{o_p}^{j}), and in the same way V''_j = V_{DD}, so that E_{sw_p} = \frac{1}{2} \sum_{j=1}^{P} C^{j} (V_{DD} - V'_j)^2, where (V_{DD} − V'_j) is the actual voltage swing at node j.

In equations (3.11) and (3.12) (page 36) the voltage dependence of the capacitances must be included, obtaining expressions for E_{sw_{n,p}} that are slightly more complicated, but still in closed form.
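As a minimal numerical illustration of equations (3.11)–(3.14), the following sketch evaluates the switching energy of a small chain of nodes; the supply voltage, capacitances and voltage swings are illustrative assumptions, not values from this work.

    # Switching energy of an nMOS network, eq. (3.11)/(3.13); made-up values.
    VDD = 3.3  # supply voltage [V], assumed

    # (total node capacitance [F], initial voltage V', final voltage V'') per node
    nodes = [
        (20e-15, VDD, 0.0),   # output node: full swing
        (5e-15,  2.5, 0.0),   # internal node: reduced swing (body effect)
        (5e-15,  2.3, 0.0),
    ]

    E_sw_n = 0.5 * sum(C * (V1**2 - V2**2) for C, V1, V2 in nodes)
    print(f"switching energy of the n-network: {E_sw_n*1e15:.1f} fJ")

    # With full swing at every node this reduces to eq. (3.13): 1/2 * sum(C) * VDD^2
    E_full = 0.5 * sum(C for C, _, _ in nodes) * VDD**2
    print(f"full-swing upper bound (eq. 3.13): {E_full*1e15:.1f} fJ")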


3.3.2 Short–circuit energy

The short–circuit contribution (for an output falling transition) is given by:

E_{sc} = \int_{t_0}^{\tau_o} V_D\, I_D\, dt

where I_D is the current flowing through the pMOSFET whose gate voltage is changing during the output fall; of course, all the pMOSFETs between this one and the output node must be on for this contribution to the power dissipation to exist. So, if we neglect the small discharge of the source voltage of this MOSFET, we can easily calculate the short–circuit energy by calculating the current flowing through it.

A similar equation can be written for the nMOS network.

    Since voltage swings, internal currents and capacitances are known from

    the delay analysis, the power supply dissipation does not require addi-

    tional computations.
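For instance, once the voltage and current waveforms of the switching pMOSFET are available from the delay analysis, the short-circuit energy reduces to a simple numerical integration; the waveforms in the sketch below are synthetic placeholders, not the model's output.

    # Short-circuit energy as the time integral of V_D * I_D over the transition.
    import math

    t_end, steps = 200e-12, 200            # 200 ps transition, assumed
    dt = t_end / steps
    ts = [k * dt for k in range(steps + 1)]

    VDD = 3.3
    v_d = [VDD * (1 - t / t_end) for t in ts]                  # falling drain voltage (placeholder)
    i_d = [1e-4 * math.sin(math.pi * t / t_end) for t in ts]   # bell-shaped SC current (placeholder)

    # Trapezoidal rule on the product v_d(t) * i_d(t)
    E_sc = sum(0.5 * (v_d[k] * i_d[k] + v_d[k + 1] * i_d[k + 1]) * dt
               for k in range(steps))
    print(f"short-circuit energy: {E_sc*1e15:.2f} fJ")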

    3.3.3 Subthreshold energy

The subthreshold current in a MOSFET is given by ([12]):

I_{DS_{subth}} = \mu_0\, \frac{W}{L}\, \frac{kT}{q}\, Q(V_S) \left( 1 - e^{-\frac{q V_{DS}}{kT}} \right)

where

Q(V_S) \simeq \frac{kT}{q} \sqrt{\frac{q\, \varepsilon_s N_a}{|\phi_p|}}\; e^{\frac{q (V_G - V_T)}{\eta kT}}

and

\eta = 1 + \frac{1}{2 C_{ox}} \sqrt{\frac{q\, \varepsilon_s N_a}{|\phi_p|}}.

This current is proportional to the MOSFET width W, but it is usually negligible. However, with the scaling down of the dimensions, and hence of the threshold voltage, this current may no longer be negligible; furthermore, with low V_G and higher V_D the current becomes independent of V_G.

Moreover, while the short-circuit current is limited by the switching times of the circuit, the subthreshold current is not limited in time, so its dissipation can become comparable to the short-circuit dissipation.
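To give an idea of the orders of magnitude involved, the sketch below evaluates the generic textbook weak-inversion model I ∝ (W/L)·e^{(V_GS−V_T)/(η kT/q)}·(1 − e^{−V_DS/(kT/q)}) — a simplified stand-in for the expression above, with every parameter value assumed for illustration only.

    # Order-of-magnitude subthreshold leakage with a generic exponential model.
    import math

    vt = 0.0259          # thermal voltage kT/q at 300 K [V]
    I0 = 1e-7            # process-dependent prefactor [A] for W/L = 1, assumed
    n  = 1.5             # subthreshold slope factor (the eta above), assumed
    VT = 0.45            # threshold voltage [V], assumed
    W, L = 10e-6, 0.7e-6 # device geometry [m], assumed

    def i_subth(vgs: float, vds: float) -> float:
        return I0 * (W / L) * math.exp((vgs - VT) / (n * vt)) * (1.0 - math.exp(-vds / vt))

    # With the gate off (VGS = 0) and the drain high, the current no longer grows
    # with VDS once VDS >> kT/q, and it flows for the whole clock period, which is
    # why its energy can rival the short-circuit contribution.
    print(f"I_subth(VGS=0, VDS=3.3 V) = {i_subth(0.0, 3.3):.3e} A")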

    3.4 Results

The circuit in figure 3.2, with 2 nMOS and 2 pMOS transistors (in a 0.7 µm technology), has been simulated using HSPICE (level 6) and the proposed model, for each combination of MOSFET widths from 1 µm to 100 µm. Figure 3.9 shows the comparison between the delay (defined as the 50% delay between an input rising ramp of 200 ps and the output falling ramp) calculated by the model and the delay simulated by HSPICE for each combination of widths between 5 µm and 30 µm; similarly, figure 3.10 shows the comparison between the energy dissipated by the circuit (during the output discharge) as calculated by the model and by HSPICE.

Tab. 3.1: Mean Error

                          Mean error    Max error    Min error
    Delay                   6.115%       12.985%       0.905%
    Energy dissipated       2.1%          6.3%          0.11%

Tab. 3.2: Execution time

    HSPICE execution time    FAST execution time
    6384.3 sec.              188.91 sec.

The errors between the proposed model and the HSPICE simulation are reported in table 3.1, while table 3.2 shows the corresponding execution times. These results are taken from the analysis of the circuit, varying the dimensions of the MOSFETs continuously from 1 µm to 100 µm.


    3.5 Conclusions

The model presented in this chapter is suitable for the optimization application of chapter 5. It is able to compute the delay and the power consumption of CMOS structures with good accuracy and a considerable speedup with respect to the HSPICE simulation taken as a reference.

In a real production design cycle, this model might be used for a first pre-optimization of some basic cells; then, in the last steps of the design flow, an optimization based on a more accurate model for the delay (or power) evaluation must be used.


[Two 3-D surface plots: dissipated energy (200–1000 fJ) versus W1 [micron] and W2 [micron], both ranging from 5 to 30 µm.]

(a) FAST model          (b) HSPICE

Fig. 3.10: Energy dissipated by the circuit of figure 3.2 with several combinations of W1 and W2


    Part III

    OPTIMIZATION


    Chapter 4

    MATHEMATIC OPTIMIZATION

    THE very basic theory of optimization is introduced here, in order to

    develop some optimization schemes, useful later for the optimization

    of real circuits.

The theory of mono-objective optimization involves some properties and theorems regarding the search for the minima of functions, hence the vanishing of the function's first derivatives. These results can be extended (with some restrictions) to the case of multivariable functions, but when more than one function must be optimized simultaneously a new theory must be introduced.

The whole goal of this introduction to mathematical optimization is both the development of reliable algorithms and the justification of some assumptions made in chapter 5 (page 77), especially for the multi-objective case.

In section 4.1 some foundations of mathematical optimization are reported: in particular, §4.1.1 presents the theory of mono-objective optimization (unconstrained, §4.1.1.1, and constrained, §4.1.1.2), while §4.1.2 presents the theory of multi-objective optimization (unconstrained, §4.1.2.1, and constrained, §4.1.2.2).
Section 4.2 reports the basic and most useful numerical algorithms for optimization purposes: §4.2.1 covers some one-dimensional search techniques, §4.2.2 some multi-dimensional search techniques, and §4.2.4 and §4.2.5 some special algorithms.
Some conclusions and a summary of their characteristics are reported in section 4.3.


    4.1 Optimization theory

Notation 4.1. In the following section, the function f is defined as f: X ⊆ ℝᵖ → Y ⊆ ℝ. X is called the decision space, and Y is called the criteria space.

Problem 4.2 (Unconstrained optimization). Given the function f, which depends on one or more variables x ∈ X, the problem of optimizing f is, in this context, equal to finding

\min_{x \in X} f(x)

This is also known as unconstrained optimization, since there are no constraints on the values that the function f may assume.

The unconstrained optimization is seldom applied in the field of digital circuits, so the constrained optimization is defined as:

Problem 4.3 (Constrained optimization). Find

\min_{x \in X} f(x) \quad \text{subject to} \quad g_j(x) \le h_j, \qquad j = 1, 2, \dots, m

where the m inequalities g_j(x) ≤ h_j constitute the set of constraints of the optimization.

The function f is also called the objective of the optimization, or the cost function of the problem.

The above problems are classical optimization problems, or mono-objective problems. The multi-objective unconstrained optimization is defined as the problem of optimizing a vectorial function, so that the objective function is a vector of objective functions.

Notation 4.4. In the following (multi-objective optimization), the function f is defined as f: X ⊆ ℝᵖ → Y ⊆ ℝⁿ, or f = (f_1, f_2, …, f_n) with f_i: X ⊆ ℝᵖ → ℝ.

Problem 4.5 (Unconstrained multi-objective optimization). Find

\min_{x \in X} f_i(x), \qquad i = 1, 2, \dots, n


    where there are n objective functions.

    Finally, the multi-objective constrained optimization is defined as:

Problem 4.6 (Constrained multi-objective optimization). Find

\min_{x \in X} f_i(x), \quad i = 1, 2, \dots, n \qquad \text{subject to} \quad g_j(x) \le h_j, \quad j = 1, 2, \dots, m

where there are n objective functions and m constraints.

The multi-objective optimization is a very complex problem, since the problem of finding the minimum of two or more functions is only apparently trivial: the set of independent variables x_min that minimizes, let us say, the function f_1 is not supposed to minimize (and generally does not minimize) the other functions. So there must be a way to combine the information about the minima of all the functions. The intuitive way, a linear combination

f_{tot}(x) = \sum_{i=1}^{n} \alpha_i f_i(x), \qquad \alpha_i \in \mathbb{R},

is somewhat problematic, because the functions f_i may not be commensurable among themselves. For example, if there is one function f_j such that f_j ≫ f_i, ∀i ≠ j, then this function dominates the total objective, giving false results for the optimization problem. This problem is illustrated in §4.1.2.

    4.1.1 Mono-objective optimization

The mono-objective optimization is the standard optimization problem, and it is widely treated in the literature (see [13] for an introduction). With this preliminary statement, some results are reported here that are useful to find a solution to problems 4.2 and 4.3.

The existence of the minimum (at least one) is granted by the Weierstrass theorem¹, but these minima can be local or global:

Definition 4.7 (Local Minimum). The point x* ∈ X is a local (or relative) minimum of the function f iff

∃ε > 0 : f(x*) ≤ f(x) ∀x ∈ X such that |x − x*| < ε.

¹ If X is a compact set, as it is in this context.


Definition 4.8 (Global Minimum). The point x* ∈ X is a global (or absolute) minimum of the function f iff f(x*) ≤ f(x) ∀x ∈ X.

Definition 4.9 (Feasible direction). d ∈ ℝⁿ is a feasible direction if ∃ᾱ > 0 : x + αd ∈ X ∀α : 0 ≤ α ≤ ᾱ.

In an intuitive manner, the concept of feasible direction is useful to solve the minimization problem: we search for all the directions in which the function f is decreasing.

Lemma 4.10 (First order necessary condition). If x* ∈ X is a minimum of f ∈ C¹ then, for every feasible direction d ∈ ℝⁿ, dᵀ∇f(x*) ≥ 0, where (·)ᵀ(·) has the usual definition of scalar product in the space ℝⁿ.

Corollary 4.10.1. If x* ∈ X is an internal point of X, then dᵀ∇f(x*) = 0.

Lemma 4.11 (Second order necessary condition). If x* ∈ X is a minimum of f ∈ C² then, for every feasible direction d ∈ ℝⁿ,

i) dᵀ∇f(x*) ≥ 0;

ii) if dᵀ∇f(x*) = 0 then dᵀ∇²f(x*)d ≥ 0.

Corollary 4.11.1. If x* ∈ X is an internal point of X, then

i) dᵀ∇f(x*) = 0;

ii) dᵀ∇²f(x*)d ≥ 0.

The conditions of corollary 4.11.1 are necessary conditions for a local minimum, and they become sufficient when the inequality in ii) holds strictly. In order to have some information about the existence of a global minimum, the theory of convex functions must be very briefly reported.

Definition 4.12 (Convex function). The function f: X → Y, where X is a convex set², is convex if ∀x₁, x₂ ∈ X and ∀λ : 0 ≤ λ ≤ 1,

f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2)    (4.1)

² A set X ⊆ ℝⁿ is convex if ∀x, y ∈ X the segment [x, y] is totally contained in X.


If in equation (4.1) the sign < applies, then the function is said to be strictly convex.

Another way to write equation (4.1) is:

Lemma 4.13. The function f ∈ C¹: X → Y is convex over a convex set X if

f(y) \ge f(x) + \nabla f(x)^T (y - x), \qquad \forall y, x \in X

or, if f is twice differentiable,

Lemma 4.14. The function f ∈ C²: X → Y is convex over a convex set X if

\nabla^2 f(x) \ge 0 \ \text{(positive semidefinite)}, \qquad \forall x \in X.

Convex functions are a very useful mathematical tool in this class of optimization problems, mainly because of the next two results:

Theorem 4.15. If f: X → Y is convex over a convex set X, the set A of the minima of the function is convex, and every local minimum is also a global minimum.

Theorem 4.16. If f ∈ C¹: X → Y is convex over a convex set X, and if ∃x* ∈ X such that ∀x ∈ X, ∇f(x*)ᵀ(x − x*) ≥ 0, then x* is a global minimum of f over X.

Theorem 4.16 also implies that, for convex functions, the conditions of lemma 4.10 and corollary 4.10.1 (first-order conditions) are both necessary and sufficient for the existence of a global minimum.

    4.1.1.1 Unconstrained problem

All the previous results are, at least in theory, sufficient to solve problem 4.2. The theory of convex functions ensures the existence of a global minimum, while lemma 4.10, corollary 4.10.1, and theorem 4.16 suggest a method to find this minimum. We will see in §5.1 how these methods apply to real circuits, in which, for example, the derivatives of the functions are not available.
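When the objective is available only as a black box (as for a delay or power figure computed by a simulator or by the model of chapter 3), the first-order conditions cannot be checked directly and a derivative-free search is used instead. The following sketch of a simple coordinate (compass) search is purely illustrative and is not the procedure adopted in chapter 5; the cost function is a made-up placeholder.

    # Derivative-free (compass) search: shrink the step when no axis move improves.
    def compass_search(f, x0, step=1.0, tol=1e-6, max_iter=10_000):
        x, fx = list(x0), f(x0)
        for _ in range(max_iter):
            improved = False
            for i in range(len(x)):
                for delta in (+step, -step):
                    trial = x.copy()
                    trial[i] += delta
                    ft = f(trial)
                    if ft < fx:
                        x, fx, improved = trial, ft, True
            if not improved:
                step *= 0.5            # no improving direction: refine the mesh
                if step < tol:
                    break
        return x, fx

    # Example black-box cost (placeholder for a delay/power estimate)
    cost = lambda w: (w[0] - 3.0) ** 2 + 2.0 * (w[1] - 1.5) ** 2 + 4.0
    print(compass_search(cost, [10.0, 10.0]))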


    4.1.1.2 Constrained problem

The solution of problem 4.3 is slightly more complicated. The presence of constraints reduces the feasible set of independent variables that are solutions of the problem. So the solutions (i.e. the values of the independent variables that minimize the objective function) must be searched in the set C ⊆ X of points that satisfy all the constraints.
The most important method for solving the minimization problem while taking into account the satisfaction of the constraints (and, incidentally, the method most useful for our real problem) is the method of the Lagrange multipliers (and its derivative, the method of the penalty functions).

Lagrange multipliers and penalty functions   The first method defines a Lagrangian function:

L(x, \lambda) = f(x) + \sum_{i=1}^{m} \lambda_i g_i(x)    (4.2)

If we define x* as the solution of

\min_{x \in X} f(x) \quad \text{subject to} \quad g_i(x) \le 0, \quad i = 1, 2, \dots, m

then we can write the necessary Kuhn–Tucker conditions for the existence of the minimum:

\nabla_x L(x^*, \lambda^*) = 0    (4.3)

\nabla_\lambda L(x^*, \lambda^*) \le 0    (4.4)

(\lambda^*)^T g(x^*) = 0    (4.5)

\lambda^* \ge 0    (4.6)

In order to find sufficient conditions, we define the saddle-point conditions:

Theorem 4.17. A point (x*, λ*) with λ* ≥ 0 is a saddle-point of the Lagrangian L(x, λ) iff


i) x* minimizes L(x, λ*) over the whole X;

ii) g_i(x*) ≤ 0, i = 1, 2, …, m;

iii) λ*_i g_i(x*) = 0, i = 1, 2, …, m.

It can be proved that if the functions f, g are convex (even if not differentiable), then the saddle-point conditions are necessary and sufficient. Although these conditions must hold at the minimum, they are not very useful in determining the optimum point: the determination of the optimum by direct solution of these equations is rarely practicable.

A more feasible way is to convert the constrained problem into an unconstrained one, by defining the new objective function:

P(x, K) = f(x) + \sum_{i=1}^{m} K_i \left[ g_i(x) \right]^2    (4.7)

The sum added to the objective function is called penalty function, since it penalizes the objective function by adding positive quantities (recall that we want to minimize the cost function). The constants K = [K₁, K₂, …, K_m]ᵀ are positive weighting factors that define how strongly the i-th constraint must be satisfied, and that can also make the terms commensurable.

Wherever x is inside the feasible region we can ignore the constraints, so a new objective function can be defined as:

P(x, K) = f(x) + \sum_{i=1}^{m} K_i \left[ g_i(x) \right]^2 u_i(g_i)    (4.8)

where u_i(g_i) is the usual step function:

u_i(g_i) = \begin{cases} 0 & \text{if } g_i(x) \le 0 \\ 1 & \text{if } g_i(x) > 0 \end{cases}

The introduction of the step function makes it possible to relate the penalty function defined in (4.8) to the Lagrangian function of (4.2) (page 52):

P(x, K) = L(x, \lambda)

if we let λ_i = K_i g_i(x) u_i(g_i), so that all the previous results valid for the Lagrangian function are also valid for the penalty function.

Note that the solution x* found by optimizing the penalty function P(x, K) converges to (x*, λ*), defined by the Kuhn–Tucker conditions, only in the limit K → ∞.
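A minimal sketch of the penalty approach of equations (4.7)–(4.8) is shown below: the constrained problem is replaced by a sequence of unconstrained minimizations with increasing K, solved here, for instance, with SciPy's Nelder–Mead; the objective and the constraint are made-up placeholders.

    # Quadratic penalty method: minimize f(x) + K * max(0, g(x))**2 for growing K.
    import numpy as np
    from scipy.optimize import minimize

    f = lambda x: (x[0] - 2.0) ** 2 + (x[1] + 1.0) ** 2      # objective (assumed)
    g = lambda x: x[0] + x[1] - 0.5                           # constraint g(x) <= 0 (assumed)

    def penalized(x, K):
        viol = max(0.0, g(x))          # u(g) * g(x): active only when violated
        return f(x) + K * viol ** 2

    x = np.array([0.0, 0.0])
    for K in (1.0, 10.0, 100.0, 1000.0):                      # K -> infinity in theory
        res = minimize(lambda x, K=K: penalized(x, K), x, method="Nelder-Mead")
        x = res.x
        print(f"K={K:7.1f}  x={x}  f={f(x):.4f}  g={g(x):+.4f}")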

    4.1.2 Multi-objective optimization

The multi-objective optimization is not a standard problem in engineering, but it is quite common in economics ([14]). While in the mono-dimensional problem the concept of optimum as a minimum is quite clear and well defined (the idea of greater or lesser is intuitive with real numbers), with multiple objectives (also called multiple criteria) the concept of minimum is less intuitive. So we must define some order relation among the points of a multi-dimensional space.

Notation 4.18. Given x, y ∈ ℝⁿ, define

x = y iff x_k = y_k, k = 1, 2, …, n
x ≦ y iff x_k ≤ y_k, k = 1, 2, …, n
x ≤ y iff x ≦ y and x ≠ y (so ∃k : x_k < y_k)
x < y iff x_k < y_k, k = 1, 2, …, n

Notation 4.19. In the following section, the function f is defined as f: X → Y, X ⊆ ℝᵖ, Y ⊆ ℝⁿ. X is called the decision space, while Y is called the criteria space.

Given two outcomes y1, y2 of the cost functions, y1 = f(x1) and y2 = f(x2), we must define which one is better: we indicate that y1 is better than y2 with y1 ≻ y2, that y1 is worse than y2 with y1 ≺ y2, and, finally, that y1 is indifferent with respect to y2 with y1 ∼ y2.

In optimization theory, great importance is given to the definition of Pareto point or Pareto preference:

Definition 4.20 (Pareto preference). Given y1, y2 ∈ Y, the Pareto preference is defined by

y1 ≻ y2 iff y1 ≤ y2.

    A Pareto preference is intuitively guided by the relation lesser is better.

Definition 4.21 (Non-dominated and dominated set). If {≻} is a binary preference defined on Y, the dominated and the non-dominated set with respect to {≻} are defined as:

N({≻}, Y) = {y0 ∈ Y | ∄ y ∈ Y : y ≻ y0}
D({≻}, Y) = {y0 ∈ Y | ∃ y ∈ Y : y ≻ y0}

If y0 ∈ N({≻}, Y), y0 is an N–point. Similarly, if y0 ∈ D({≻}, Y), y0 is a D–point.

Definition 4.22 (Pareto optimum). y* ∈ Y is a Pareto optimum iff it is an N–point with respect to the Pareto preference.
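For a finite set of outcomes, the non-dominated set N({≻}, Y) of definition 4.21 can be computed directly from the definition; the (delay, energy) pairs below are invented for illustration.

    # Extract the Pareto (non-dominated) set under "lesser is better" on all criteria.
    def dominates(a, b):
        """True if outcome a is preferred to b: a <= b componentwise and a != b."""
        return all(ai <= bi for ai, bi in zip(a, b)) and a != b

    def non_dominated(Y):
        return [y0 for y0 in Y if not any(dominates(y, y0) for y in Y)]

    # Example criteria space: (delay, energy) pairs for a few candidate sizings
    Y = [(1.0, 9.0), (2.0, 5.0), (3.0, 4.0), (2.5, 6.0), (5.0, 3.5), (6.0, 3.6)]
    print(non_dominated(Y))   # Pareto front: [(1.0, 9.0), (2.0, 5.0), (3.0, 4.0), (5.0, 3.5)]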

We will now give two theorems that are fundamental for the solution of the multi-objective optimization problem; first we introduce the definition of convex cone in ℝⁿ:

Notation 4.23 (Convex cone).

Λ_> = {d ∈ ℝⁿ | d > 0}
Λ_≥ = {d ∈ ℝⁿ | d ≥ 0}
Λ_≧ = {d ∈ ℝⁿ | d ≧ 0}

Theorem 4.24. i) If y0 ∈ Y minimizes λᵀy over Y for some λ ∈ Λ_>, then y0 is an N–point;

ii) if y0 ∈ Y uniquely minimizes λᵀy over Y for some λ ∈ Λ_≧, then y0 is an N–point.


    4.1.2.2 Constrained

Again, the solution is to reduce the problem from a multi-objective to a mono-objective one. It is possible to combine the two previous methods, that is, to minimize a linear weighted function plus a sum of penalty functions; the only critical point is to ensure the same order of magnitude for each term of the sum, so that no single term becomes a dictatorship. A third way to solve an unconstrained problem (or a constrained one, with some care) is the method of the compromise solution:

Compromise solution   Given problem 4.5, it is possible to define y* as the ideal outcome of the cost function f(x) without any constraints, so that y* = inf_{x∈X} f(x); the compromise solution is defined as the minimum of the regret:

r(y) = y − y*;

typically, the L_p–norm (the distance between the actual solution and the ideal point) is used:

r(y) = r(y; p) = \left( \sum_{i=1}^{n} \left| y_i - y_i^* \right|^p \right)^{\frac{1}{p}}.

Again, a weight can be associated with each term of the sum:

r(y; p, w) = \left( \sum_{i=1}^{n} w_i^p \left| y_i - y_i^* \right|^p \right)^{\frac{1}{p}}.

Definition 4.26 (Compromise solution). The compromise solution with respect to the L_p–norm is the point y_p^* ∈ Y that minimizes r(y; p, w) over Y.

The compromise solution enjoys several properties; the most important is:

Property 4.27 (Pareto optimality). The compromise solution y_p^* ∈ Y is an N–point for 1 ≤ p < ∞, with respect to the Pareto preference (definition 4.20). If y_∞^* is unique, then it is also an N–point.


This implies that |b − a| = |x − c|, and that at each iteration the interval is scaled by the same ratio γ. Then we repeat the process with the new triplet. So the interval (a, c) is divided into two parts, a smaller and a larger one, and the ratio between the whole interval and the larger part is the same as the ratio between the larger and the smaller part; in other words,

\frac{1}{\gamma} = \frac{\gamma}{1 - \gamma},

giving for the positive solution

\gamma = \frac{\sqrt{5} - 1}{2}.

This fraction is known as the golden mean or golden section, whose aesthetic properties come from the ancient Pythagoreans.
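A minimal implementation of the golden-section search sketched above (the test function is arbitrary):

    # Golden-section search for the minimum of a unimodal 1-D function on [a, c].
    import math

    GAMMA = (math.sqrt(5.0) - 1.0) / 2.0      # the golden ratio ~0.618 derived above

    def golden_section(f, a, c, tol=1e-8):
        # Interior points keeping the golden proportion at every iteration
        b = c - GAMMA * (c - a)
        x = a + GAMMA * (c - a)
        fb, fx = f(b), f(x)
        while (c - a) > tol:
            if fb < fx:                        # minimum bracketed in (a, x)
                c, x, fx = x, b, fb
                b = c - GAMMA * (c - a)
                fb = f(b)
            else:                              # minimum bracketed in (b, c)
                a, b, fb = b, x, fx
                x = a + GAMMA * (c - a)
                fx = f(x)
        return 0.5 * (a + c)

    print(golden_section(lambda t: (t - 1.234) ** 2 + 0.5, 0.0, 10.0))  # ~1.234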

Convergence considerations   All three previous methods have a linear convergence, since at each iteration the ratio between the new interval containing x* and the previous one is

0 \le \frac{I_{k+1}}{I_k} \le 1.

The asymptotic convergence rate is defined as

\lim_{k \to \infty} \frac{I_{k+1}}{I_k}.

For the dichotomic search, since 2 I_{k+1} = I_k + \varepsilon, taking \varepsilon = 0 we have

\lim_{k \to \infty} \frac{I_{k+1}}{I_k} = \frac{1}{2}.

    For the Fibonacci search, first we must write the generic number of the

    Fibonacci sequence in a closed form:


f_k = \frac{1}{\sqrt{5}} \left[ \left( \frac{1 + \sqrt{5}}{2} \right)^{k+1} - \left( \frac{1 - \sqrt{5}}{2} \right)^{k+1} \right];

then it can be proved that, taking \varepsilon = 0,

\lim_{k \to \infty} \frac{I_{k+1}}{I_k} = \lim_{k \to \infty} \frac{f_k}{f_{k+1}} = \frac{\sqrt{5} - 1}{2}.

For the golden-section search, as previously said, \frac{I_{k+1}}{I_k} = \gamma, so

\lim_{k \to \infty} \frac{I_{k+1}}{I_k} = \gamma = \frac{\sqrt{5} - 1}{2}.

Thus the asymptotic convergence rates of the Fibonacci and the golden-section search are identical.

    4.2.1.2 Parabolic interpolation

    Given a triplet (a, b, c) that brackets a minimum, we approximate the

    objective function in the interval (a, c) with the parabola fitting the triplet.

    Then we find the minimum of this parabola with the formula (since we

    want the abscissa, the method is indeed an inverse parabolic interpolation):

x = b - \frac{1}{2}\, \frac{(b - a)^2 \left[ f(b) - f(c) \right] - (b - c)^2 \left[ f(b) - f(a) \right]}{(b - a) \left[ f(b) - f(c) \right] - (b - c) \left[ f(b) - f(a) \right]}

This method is useful only when the function is quite smooth in the interval, but it has the advantage that the convergence is nearly quadratic; in particular, when the function to be optimized is a quadratic form, a single step lands exactly on its minimum.
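A single inverse parabolic interpolation step, applied to an arbitrary test function; on a quadratic it lands exactly on the minimum, as stated above:

    # One inverse parabolic interpolation step on a bracketing triplet (a, b, c).
    def parabolic_step(f, a, b, c):
        fa, fb, fc = f(a), f(b), f(c)
        num = (b - a) ** 2 * (fb - fc) - (b - c) ** 2 * (fb - fa)
        den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
        return b - 0.5 * num / den     # abscissa of the fitted parabola's minimum

    f = lambda t: (t - 2.0) ** 2 + 1.0          # quadratic: one step gives the minimum
    print(parabolic_step(f, 0.0, 1.0, 5.0))      # -> 2.0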

The Brent's rule   Brent's rule is a mix of the last two techniques: it uses the golden section when the function is not regular and switches to parabolic interpolation when the function is sufficiently regular. In particular, it always tries a parabolic step first; when the parabolic step is useless, the method uses the golden-section search.
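In practice such bookkeeping is rarely re-implemented by hand; as an illustration, SciPy exposes a Brent-style scalar minimizer:

    # Library implementation of a Brent-style scalar minimizer (illustrative usage).
    from scipy.optimize import minimize_scalar

    res = minimize_scalar(lambda t: (t - 1.234) ** 2 + 0.5,
                          bracket=(0.0, 10.0), method="brent")
    print(res.x)   # ~1.234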

    4.2.2 Multi-dimensional search

    Thi