A Spread-Spectrum Clock Generator using Phase

Interpolation for EMI reduction

by

Ky-Anh Tran

Submitted to the Department of Electrical Engineering and Computer Science

in partial fulfillment of the requirements for the degree of

Master of Engineering in Electrical Engineering and Computer Science

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2014

© Massachusetts Institute of Technology 2014. All rights reserved.

Author: Department of Electrical Engineering and Computer Science

May 3, 2014

Certified by: Prof. Charles G. Sodini

LeBel Professor of Electrical Engineering, Thesis Supervisor

Certified by: Matthew L. Courcy

Senior Design Engineer, Analog Devices, Thesis Supervisor

Accepted by: Prof. Albert R. Meyer

Chairman, Masters of Engineering Thesis Committee

A Spread-Spectrum Clock Generator using Phase

Interpolation for EMI reduction

by

Ky-Anh Tran

Submitted to the Department of Electrical Engineering and Computer Science on May 3, 2014, in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science

Abstract

The spurious-free dynamic range of RF DACs is limited by heavy digital-domain switching, which interferes with the analog output signal. The design, layout and simulation of a spread-spectrum clock generator (SSCG) are presented. The SSCG modulates the clock frequency used to switch the digital blocks of the DAC in order to reduce electromagnetic interference (EMI) spurs at the analog output of the DAC. By leveraging a phase control architecture rather than a traditional PLL, the SSCG system is shown to reduce the spectral height of a divided-down clock spur by up to 19.6dB.

The SSCG is designed in TSMC’s 65nm CMOS process. It takes in quadrature, differential clocks at either 2.5GHz or 5GHz, and provides quadrature output clocks at 625MHz or 1.25GHz. The output spectrum of the clock can be attenuated by up to 19.6dB relative to the spectrum of an unspread clock. The core of the SSCG is a phase interpolator, which takes in quadrature input clocks and interpolates between them to move the frequency around. To help process the signals before and after interpolation, the SSCG incorporates input variable-gain filters, output restoration buffers and divide-by-4 circuits. Extensive transistor-level and behavioral simulations are used to verify the design.

Thesis Supervisor: Prof. Charles G. Sodini
Title: LeBel Professor of Electrical Engineering

Thesis Supervisor: Matthew L. Courcy
Title: Senior Design Engineer, Analog Devices

Acknowledgments

The completion of this thesis would not have been possible without the help of many. I would like to thank Shawn Kuo and Matthew Courcy for their patient guidance through the technical difficulties I encountered. Their IC design know-how and technical competence are inspirational. Engineers from the DAC group at ADI made me feel welcome, and were always available for help. I have relied numerous times on Andy Fan, Zhou Bing, Qiurong He, Nathan Egan, Steve Rose and Martin Clara, both for CAD help and circuit advice. Having not done layout before, I relied extensively on Rick Sullivan and Ramson Gambiza, who taught me CAD techniques for IC layout. Digital designers Paul Wilkins, Grace Jin and Jim Rioux were kind enough to lend me help on the digital design and place-and-route. I’d like to give special thanks to Jeremy Walker, who generously shared his work on interpolator design and guided me via phone at the start of the project. Finally, I’d like to thank Haiyang Zhu for the numerous discussions, not only on my project, but also on a host of other topics such as skin effect, latchup, IC layout, ADC design, DAC architecture and much more. Your enthusiasm to share knowledge made learning a very enjoyable process.

Besides working, I also had a great deal of fun at ADI interacting with fellow interns. Kevin, I learned a lot from your systematic approach to IC design. Alex, your choice of music in the car made carpooling much more interesting. Ujwal, debating with you on solid state physics made me review a lot of concepts I thought I had forgotten.

At MIT, I would like to thank Professor Sodini. Professor Sodini provided detailed and constructive feedback on the project proposal and thesis, and reminded me of the importance of clear technical writing. He also devoted time from his hectic schedule to advise me on my IC design career path. Finally, I would like to thank my parents and sister, who kept me sane throughout the whole process, and whose support for my education has led me to where I am now.

Contents

1 Introduction
  1.1 Basics of Digital-to-Analog conversion
    1.1.1 DAC Operation
    1.1.2 Harmonic Purity Requirement
    1.1.3 Spread-Spectrum Clocking
  1.2 Specifications for the Spread-Spectrum Clock Generator
  1.3 Past Work: Frequency Control Systems
  1.4 Past Work: Phase Control System

2 Behavioral Study and System Proposal
  2.1 Introduction
  2.2 The Mathematics of Spread-Spectrum
    2.2.1 A Toy Model: Single-tone FM Modulation
    2.2.2 Generalization to Arbitrary Modulation Waveform
    2.2.3 Modulation Waveform Selection
  2.3 Mathematics of a Phase-Control System
    2.3.1 Discrete-time and Discrete-Phase System
    2.3.2 Numerics of Phase and Time Quantization
  2.4 Modelling Phase-Control System Non-Idealities
    2.4.1 General INL and DNL Modelling
  2.5 Architecture Proposal
    2.5.1 System Requirements
    2.5.2 SSCG via Phase Modulation
    2.5.3 Additional Non-Idealities of Phase Modulator
    2.5.4 System Level Specifications
  2.6 Summary

3 Analog Circuit Design
  3.1 Introduction
  3.2 Interpolator Core
    3.2.1 Code Selection
    3.2.2 Schematic Design
    3.2.3 Phase-Interpolator Non-idealities
    3.2.4 Layout and Sizing
  3.3 Signal Conditioning
    3.3.1 Filter
    3.3.2 Restoration
    3.3.3 Clock Divider
  3.4 Top-Level Floorplan
  3.5 Evaluating the High Frequency Signal Path
    3.5.1 Evaluation Methodology
    3.5.2 Results
  3.6 Summary

4 Digital and Auxiliary Circuits
  4.1 Introduction
  4.2 Regulator
  4.3 Peak Detector
  4.4 Waveform Generator
    4.4.1 Basic Operation
    4.4.2 Modes of Operation
  4.5 Decoders
  4.6 Calibration Finite State Machine
    4.6.1 Operation
  4.7 Top-Level Floor-Plan and Clock Distribution
    4.7.1 FloorPlan
    4.7.2 Clock Distribution
  4.8 Power Summary
  4.9 Summary

5 Top-Level Simulation
  5.1 Calibration
  5.2 Spread-Spectrum Operation
  5.3 Full Circuit Simulation
  5.4 Simulation Results Summary
  5.5 Summary

6 Conclusion
  6.1 Summary
  6.2 System Usage
  6.3 Further Work
    6.3.1 Optimizing the Interpolator
    6.3.2 Calibration for Arbitrary Input Frequencies
    6.3.3 Top-level clock Distribution

A Effect of Spectrum Analyzer
  A.1 Model of peak-hold mode Spectrum Analyzer
  A.2 Calculation of Measured Spectrum
  A.3 Example 1: Gaussian Filter
  A.4 Example 2: Sinc Filter

B Terminology

C PVT Corner Nomenclature

List of Figures

1-1 Cartoon picture of the effect of a clock spur on the power spectral density of the signal at the output of a DAC.
1-2 Basic DAC operation.
1-3 Cartoon showing the DAC output and its spectrum.
1-4 Typical transmitter architecture for RF DAC. Note that the (purple) spur location is independent of carrier frequency, and is not easily filtered out.
1-5 Cartoon of Spread-Spectrum Clocking.
1-6 The AD9129 in (a) is a typical RF-DAC. We show the traditional clocking scheme (b) and the new clocking scheme with larger retiming buffer for SS (c).
1-7 Typical PLL and its linearized model.
1-8 Example of Dual-Path Loop Filter, allowing the zero to be B times slower.
1-9 SS is generally achieved by modulating the feedback path or the LF of a PLL.
1-10 Triangular Frequency Modulation Example.
1-11 Example of compensated-phase rotation technique for SS clocking.
1-12 2 Implementations of Spread-Spectrum using digital phase control. In the first case, the phase outputs of a DLL are muxed. In the second case, the delay from a Voltage-Controlled Delay Line (VCDL) is modulated.

2-1 Illustration of single-tone FM modulated signal (f0 = 1/π Hz, fm = 1/(20π) Hz, φm = 1.4π radians, ∆f = .07Hz).
2-2 Illustration of the first 3 Bessel Functions.
2-3 Example of SS on a square clock.
2-4 Illustration of the terminology using a sawtooth modulation waveform.
2-5 Picture showing an intuitive but incorrect derivation of spectral attenuation. This picture seems to suggest that spectral attenuation should scale linearly with δ, which is false in general.
2-6 Illustration of the indexing terminology of tlα.
2-7 Plot of the Fresnel Functions, C(x) and S(x), as defined above.
2-8 Spectra using triangular modulation for several values of fm (φm = 64 × (2π)). The attenuation level is largely independent of fm at fixed φm.
2-9 Spectral Oscillation Effect observed for Sawtooth and Triangular Modulation.
2-10 Possible modulation waveforms V(t): triangle, sawtooth (better), Hershey-Kiss (optimal).
2-11 Comparison between the attenuation levels for triangular and sawtooth modulation, vs φm.
2-12 Power Spectra for input and output 5GHz clock (Tdig = 3.2ns). Note the Images appearing at 5GHz ± 312.5MHz.
2-13 Illustration for the definitions used for discrete-time and phase system.
2-14 Plot for the attenuation level of the divide-by-4 clock as a function of quantization level and clk period, for fm = 38kHz, 76kHz, 152kHz respectively. Note that the quantization level is largely irrelevant, and therefore is set to meet cycle-to-cycle jitter specifications, not attenuation specifications.
2-15 Numerical Simulation of the attenuation of the divide 4 clock output as a function of DNLrms. Data includes INLmax ∈ (0 mperiods, 200 mperiods), k ∈ (1, 10).
2-16 Architecture Proposal

3-1 IQ plane example, illustrating the terminology for quadrant numbering and IQ coefficient quantization.
3-2 Binary scaled interpolator using linear approximation. Because of the lack of calibration in our final circuit, we opted for a thermometer DAC current array instead. Figure from
3-3 Constellation Diagram for interpolation using the linear approximation and using correct trigonometric values.
3-4 Finding a reasonable grid quantization level.
3-5 Simple Type-I Phase interpolator Example (this is our final design choice). There are 16 copies of each differential pair, and 8 common mode differential pairs, so there is a total of 72 differential pair cells.
3-6 We can get rid of the common mode control cells by doubling the number of cosine and sine cells for the same resolution (the select signals now are 32-bit-wide thermometer codes instead of 16-bit-wide in this circuit).
3-7 Type II phase interpolator. Inductors allow a larger headroom, so that we have both enough swing and are able to operate the switches as common-gate amplifiers.
3-8 Effect of subthreshold conduction and feedforward in the IQ plane.
3-9 Feedforward Effect in type I interpolator.
3-10 Plot of output differential current vs input differential voltage.
3-11 Layout of the Interpolator Array
3-12 Cartoon picture of 2nd order filter effect on Fourier coefficients.
3-13 Filter Schematic. This schematic includes one of the 2 differential paths taken. The feedforward cancelling capacitor is included only for the 2nd stage, because the gate to drain capacitance of the first stage is insignificant.
3-14 Layout of the SineShaper. This has 4 inverter chains, for the 2 differential signal paths, one for cosine and one for sine.
3-15 Plot showing the margin above saturation for the tail current source and differential pair inputs. For PVT corner numbering reference, see Appendix C.
3-16 AC response for the filter, configured for typical corner at 5GHz, and a typical corner at 2.5GHz.
3-17 Restoration circuit and layout
3-18 Restoration Phase Margin. Although the phase margin is negative, the stability of the circuit is not compromised because the inverters operate non-linearly and effective loop gain is lower than what is expected from small-signal analysis.
3-19 Restoration output phase transient under phase code sweep.
3-20 Clock Divider Circuit.
3-21 Top level Floor Plan and High Frequency Signal Path Layout. Note we added clock buffers at the output to drive the long wires out of the block.
3-22 Methodology for evaluating phase linearity.
3-23 Simulation methodology for regulator code selection cross corners.
3-24 Plot of filter output amplitude as we sweep the regulator input code value (and therefore change the filter supply), for all 45 PVT corners.
3-25 DNLrms value across corners for high and low frequency modes, with 5 quadrature error.
3-26 We extract the largest DNL step across input code transitions, for each corner.
3-27 DNLrms histogram for 50 monte carlo simulations at 5GHz (worst case corner, corner 28).
3-28 Integrated Jitter plot at the output of the restoration circuit.

4-1 Regulator Schematic.
4-2 Peak Detector Schematic.
4-3 Phase Deviation over time for 3 attenuation modes (16.6dB, 13.6dB, 10.6dB) at fm = 38kHz.
4-4 Waveform Generator block diagram.
4-5 Acceptable Modes of Operation.
4-6 Waveform Generator layout.
4-7 Sine and Cosine Decoders Layout.
4-8 Calibration algorithm.
4-9 Calibration layout.
4-10 Top-Level Floor Plan, highlighting the new blocks. Synthesized digital circuits are highlighted in solid lines.
4-11 Phase value bits and clock routing paths.
4-12 Effect of Clock Skew between the 2 decoders and Spectral Attenuation for a 5GHz input clock. The parameters for this sim are φm = 64 × 2π, fm = 38kHz, Tdig = 3.2ns.

5-1 Demonstration of Calibration Algorithm at work with behavioral phase interpolator and peak detector.
5-2 Output Clock attenuation.
5-3 Divide-by-4 clock attenuation.
5-4 Differential signals in signal path for 5GHz operation.
5-5 312MS/s phase code sweep for 5GHz operation.
5-6 Simulation results at 5GHz for linearity performance.

6-1 Replica-Feedback biasing example
6-2 Simple Clock Distribution Method

A-1 Peak-Hold mode operation of spectrum analyzer, Figure from
A-2 Attenuation as a function of Tmλ for a Gaussian Filter, where we fix | 1√jkf0δV ′ I0k2| = 1
A-3 Attenuation as a function of Tmλ for a Sinc filter, where we fix | 1√jkf0δV ′ I0k2| = 1

List of Tables

1.1 Specifications summary.
2.1 System level specifications
3.1 Power consumption summary (clock path).
4.1 Digital and auxiliary circuit power consumption.
5.1 Specifications summary.

Chapter 1

Introduction

Traditionally analog chips have become increasingly mixed-signal to leverage the digital processing power available in finer process nodes. A difficult issue to tackle is the Electromagnetic Interference (EMI) that pollutes high-quality analog outputs. The higher the clocking frequency of the digital circuits, the larger the EMI, as high-frequency signals couple easily through capacitive isolation barriers. In high-speed Digital-to-Analog Converters (DACs) used in communication systems, EMI leads to clock “spurs”, which appear as spikes in the frequency spectrum of the output (Figure 1-1). This thesis presents a Spread-Spectrum Clock Generator (SSCG) used to reduce those spurs while providing a usable digital clock for a high-speed DAC.

• Chapter 1 presents background information on DACs and previous work on spread-spectrum clock generators.

• Chapter 2 shows the analytical and numerical study of the design tradeoffs of an SSCG.

• Chapter 3 contains the design method and block verification of the main analog blocks.

• Chapter 4 covers auxiliary analog circuits and digital circuits.

• Chapter 5 presents the top-level verification of the system.

• Chapter 6 concludes with possible extensions to this work.

Figure 1-1: Cartoon picture of the effect of a clock spur on the power spectral density of the signal at the output of a DAC.

1.1 Basics of Digital-to-Analog conversion

1.1.1 DAC Operation

A DAC is a circuit that takes an input stream of bits and synthesizes from them an analog voltage value (see Figure 1-2). A DAC is characterized by its bit resolution, which sets the granularity of the synthesized analog output values, and by its update rate in samples per second. For example, a 2-bit, 10GS-per-second DAC can output 4 different analog values, and updates its output every 100ps. We define fs as the update rate of the output analog signal, in this case 10GHz. In a communication system, a DAC is used to translate pre-processed bits into a waveform that contains the baseband information (for example, voice input) [1]. RF-DACs are DACs that have an fs high enough to directly generate both the carrier waveform at an RF frequency and the baseband modulation.

Figure 1-2: Basic DAC operation.

Let us consider a digital data bit stream fbn into a DAC, where fbn takes on the value 0 or 1 for a given bit index “b” and time sample index “n”. A cartoon showing the derivation of the DAC spectrum below is shown in Figure 1-2. We can then define the corresponding analog value fn for each sample n to be

$$f_n = \sum_{b=0}^{N-1} \frac{FS}{2^N}\, 2^b f_{bn} \tag{1.1}$$

where FS denotes the full scale voltage value and N is the DAC bit resolution.
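As a quick numerical check of Equation 1.1, the following sketch evaluates the output level for every code of the 2-bit, 10GS-per-second example used earlier. The function names and the 1V full-scale value are assumptions made for illustration only; they do not come from the design.

```python
def dac_output(bits, full_scale):
    """Evaluate f_n = sum_b (FS / 2^N) * 2^b * f_bn for one sample.

    bits[b] is the bit of weight 2^b, so bits[0] is the LSB.
    """
    n_bits = len(bits)
    lsb = full_scale / 2 ** n_bits  # FS / 2^N, the size of one step
    return sum(lsb * (2 ** b) * bit for b, bit in enumerate(bits))


def update_period(sample_rate_hz):
    """Time between output updates, 1 / f_s, in seconds."""
    return 1.0 / sample_rate_hz


if __name__ == "__main__":
    full_scale_volts = 1.0  # assumed value, for illustration only
    for code in range(4):
        bits = [(code >> b) & 1 for b in range(2)]
        print(code, bits, dac_output(bits, full_scale_volts))
    print("update period (s):", update_period(10e9))
```

With N = 2 the four codes map to four distinct levels separated by FS/4, and the update period at 10GS per second is 100ps, matching the example in the text.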

Figure 1-3 shows fp(t), the typical output waveform of a DAC having an output sample rate fs. This type of waveform is called a “zeroth-order” sample-and-hold waveform, and is a good approximation of the actual DAC output waveform. The output of the DAC, fp(t), can be expressed in terms of a pulse function p(t):

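$$f_p(t) = \sum_{n=0}^{\infty} p\left(t - \frac{n}{f_s}\right) f_n \tag{1.2}$$

$$p(t) = 1 \quad \text{for } t \in \left(0, \frac{1}{f_s}\right) \tag{1.3}$$

$$p(t) = 0 \quad \text{else} \tag{1.4}$$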

It turns out we can also express fp(t) as the convolution of p(t) with a train of Dirac delta functions called fd(t), as shown in Figure 1-3:

$$f_p(t) = f_d(t) * p(t) \tag{1.5}$$

$$f_d(t) \equiv \sum_{n=0}^{\infty} f_n\, \delta(t - nT_s) \tag{1.6}$$

where Ts = 1/fs is the sample period.

Figure 1-3: Cartoon showing the DAC output and its spectrum.

If we define V(jω) and F(jω) to be the Fourier transforms of fp(t) and fd(t), we can write V(jω) as:

$$V(j\omega) = \frac{1}{2\pi} F(j\omega)\, \mathrm{Sinc}\left(\frac{j\omega}{2\pi f_s}\right) \tag{1.7}$$

$$\mathrm{Sinc}(x) \equiv \frac{\sin(x)}{x} \tag{1.8}$$

Figure 1-3 illustrates this calculation, and how it affects the power spectral densities Fd, P and Fp of fd, p and fp respectively.
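To give a feel for the envelope that the hold pulse imposes on the spectrum, the sketch below evaluates the magnitude of the Fourier transform of a unit-height pulse of width 1/fs. It assumes the common closed form (1/fs)·|sin(x)/x| with x = πf/fs, which may differ from Equation 1.7 by a scaling convention; the function names are illustrative.

```python
import math


def zoh_magnitude(f_hz, fs_hz):
    """|P(f)| for a unit-height pulse of width 1/fs.

    Assumes the standard form (1/fs) * |sin(x)/x| with x = pi * f / fs.
    """
    x = math.pi * f_hz / fs_hz
    if x == 0.0:
        return 1.0 / fs_hz  # limit of sin(x)/x as x -> 0 is 1
    return abs(math.sin(x) / x) / fs_hz


def zoh_droop_db(f_hz, fs_hz):
    """Envelope level relative to DC, in dB (negative means attenuation)."""
    return 20.0 * math.log10(zoh_magnitude(f_hz, fs_hz) / zoh_magnitude(0.0, fs_hz))


if __name__ == "__main__":
    fs = 10e9  # the 10GS-per-second example rate from Section 1.1.1
    for f in (1e9, 2.5e9, 5e9, 9e9):
        print(f"{f / 1e9:.1f} GHz: {zoh_droop_db(f, fs):.2f} dB")
```

The envelope has nulls at multiples of fs and rolls off by roughly 3.9dB at fs/2, which is why the images of the signal around fs appear attenuated relative to the baseband copy.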

Figure 1-4: Typical transmitter architecture for RF DAC. Note that the (purple) spur location is independent of carrier frequency, and is not easily filtered out.

1.1.2 Harmonic Purity Requirement

Because of stringent spectral mask requirements, it is important that a communication DAC be able to synthesize analog waveforms that have no significant output signal outside of the target channel [2]. This is difficult for two reasons: the DAC might have inherent data-sequence-dependent distortion, which introduces errors at the output, and its output may contain spurs. Spurs are signals at specific frequencies that appear in the DAC output spectrum, but are not harmonics of the signal (Figure 1-4). The large EMI generated by digital clocks is concentrated at the fundamental frequency of the digital clock. This interfering signal can couple directly to the output of the DAC, or can couple indirectly to the bias lines, where it mixes with the DAC output. In the former case, the spurious tone frequency does not depend on the input signal frequency, and is therefore very difficult to filter out when it falls within the frequency range of interest.

1.1.3 Spread-Spectrum Clocking

Spread-Spectrum Clocking refers to frequency modulating a clock. In the frequency domain, the original clock spectrum consists of its fundamental and its harmonics. Under modulation, each of those is “fattened”, and the frequency peaks spread out, hence the name Spread-Spectrum (SS). The height of the peaks is reduced, and so is the height of the interfering signals on the analog output (see Figure 1-5). One way to see this effect is to realize that the power of the signal is unchanged by SS. This means that if we increase the frequency spread, the height of the spectral density must decrease to keep the total power in the frequency domain constant. This results in increased Spurious-Free Dynamic Range (SFDR). Not all systems can use SS clocks, and certain systems tolerate only very small frequency deviations, making SS difficult to implement.
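The power-conservation argument can be checked numerically on a toy signal. The sketch below is not the modulation scheme of this design: it assumes a complex tone with a sinusoidal phase term (equivalent to sinusoidal frequency modulation), and the bin numbers and modulation index are arbitrary illustrative choices. It compares the total power and the tallest spectral line with and without modulation.

```python
import cmath
import math


def power_spectrum(x):
    """Direct DFT, normalized so that a pure tone has a peak power of 1."""
    n = len(x)
    spectrum = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2 / n ** 2)
    return spectrum


def fm_tone(n, carrier_bin, mod_bin, beta):
    """Complex tone whose phase is modulated by beta * sin(2*pi*mod_bin*t/n)."""
    return [
        cmath.exp(1j * (2 * math.pi * carrier_bin * t / n
                        + beta * math.sin(2 * math.pi * mod_bin * t / n)))
        for t in range(n)
    ]


def peak_attenuation_db(n=256, carrier_bin=64, mod_bin=2, beta=5.0):
    """Return (peak reduction in dB, unspread total power, spread total power)."""
    plain = power_spectrum(fm_tone(n, carrier_bin, mod_bin, 0.0))
    spread = power_spectrum(fm_tone(n, carrier_bin, mod_bin, beta))
    reduction = 10.0 * math.log10(max(plain) / max(spread))
    return reduction, sum(plain), sum(spread)


if __name__ == "__main__":
    reduction_db, p_plain, p_spread = peak_attenuation_db()
    print(f"total power: {p_plain:.6f} (unspread), {p_spread:.6f} (spread)")
    print(f"tallest line reduced by {reduction_db:.2f} dB")
```

Both spectra carry the same total power, but the modulated tone distributes it over many sidebands, so its tallest line sits several dB below the unmodulated carrier.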

In our case, the DAC clock cannot be spread: modulating the timing of the output data can cause significant distortion of the output values. The digital clock can be spread, given 2 conditions. First, the cycle-to-cycle jitter must be low, so that clock periods do not become so small that they compromise the minimum timing margins set by digital designers. Second, spreading the digital clock causes the digital data output to be phase shifted relative to the analog clock. This data stream needs to be retimed, which is traditionally accomplished by a FIFO buffer. The depth of the FIFO buffer, in turn, has to be at least as large as the maximum time deviation between the spread and unspread clocks. Details regarding the placement of the FIFO buffer in a typical DAC architecture are shown in Figure 1-6.

1.2 Specifications for the Spread-Spectrum Clock

Generator

Given the use of spread-spectrum in an RF-DAC, we set up the following specifications

for the spread spectrum clock generator.

Figure 1-5: Cartoon of Spread-Spectrum Clocking.

Parameter                        Specification
Power Budget                     50mW
Input                            quadrature clocks
Output                           divide-by-4 quadrature clocks
Jitter                           Less than 5ps rms
Maximum cycle-to-cycle jitter    20ps
Maximum Time Deviation           64 periods of input clock
Area                             .4mm × .4mm
Modulation Rate                  38kHz or faster
Input clock rates                fast mode (5GHz), slow mode (2.5GHz)
EMI reduction                    20dB reduction
Process node                     TSMC’s 65nm CMOS low-power

Table 1.1: Specifications summary.

(a) AD9129 (b) Old Clocking (c) Proposed Clocking

Figure 1-6: The AD9129 in (a) is a typical RF-DAC. We show the traditional clocking scheme (b) and the new clocking scheme with larger retiming buffer for SS (c).

The power consumption and area restrictions are required to integrate the block into a much larger DAC chip. The jitter requirement is relatively lax because the output clock will be used to clock the digital data, which only needs enough margin to satisfy the setup and hold time requirements. The more jitter, the smaller the valid timing window used to sample a data bit. Since a digital clock at 1.25GHz has an 800ps clock period, a 20ps cycle-to-cycle jitter is a modulation of only 2.5% of the total period.
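The arithmetic behind these numbers is easy to reproduce. The sketch below recomputes the jitter fraction and, as an illustrative reading of Table 1.1 rather than a FIFO sizing rule from this work, expresses the 64-input-period maximum time deviation in divide-by-4 output clock cycles. The function names are assumptions.

```python
def clock_period_s(freq_hz):
    """Clock period in seconds."""
    return 1.0 / freq_hz


def jitter_fraction(jitter_s, clock_hz):
    """Cycle-to-cycle jitter expressed as a fraction of the clock period."""
    return jitter_s * clock_hz


def deviation_in_output_cycles(input_periods, divide_ratio):
    """Maximum time deviation converted from input periods to output cycles."""
    return input_periods / divide_ratio


if __name__ == "__main__":
    f_in = 5e9        # fast-mode input clock
    f_out = f_in / 4  # divide-by-4 output clock, 1.25GHz
    print("output period (ps):", clock_period_s(f_out) * 1e12)
    print("jitter fraction:", jitter_fraction(20e-12, f_out))
    print("max deviation (ns):", 64 * clock_period_s(f_in) * 1e9)
    print("max deviation (output cycles):", deviation_in_output_cycles(64, 4))
```

In fast mode, 64 input periods correspond to 12.8ns, or 16 periods of the 1.25GHz output clock, which gives an order of magnitude for the retiming depth discussed in Section 1.1.3.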

Figure 1-7: Typical PLL and its linearized model.

1.3 Past Work: Frequency Control Systems

In general, a frequency control system involves a Phase-Locked Loop (PLL), sketched in Figure 1-7. We can use linear system analysis to determine its stability, with the phase φ(s) as the control variable, provided the input signal is within the PLL locking range (see footnote 1). Let KPD be the sensitivity of the phase detector, Gm the transconductance of the charge pump, HLF(s) the transfer function of the loop filter and KVCO/s the transfer function of the Voltage-Controlled Oscillator (VCO). For the typical loop filter implementation shown in the figure, the loop transfer function is

$$L(s) = K_{PD}\, G_m\, \frac{K_{VCO}}{s}\, \frac{1}{s(C_1 + C_2)} \left( \frac{1 + sR_1C_1}{1 + sR_1\frac{C_1C_2}{C_1+C_2}} \right) \tag{1.9}$$

For the loop to be stable, we want the zero at 1/(R1C1) to fall well below the unity-gain frequency, and the pole at 1/(R1(C1‖C2)) to lie above the unity-gain frequency, where C1‖C2 denotes C1C2/(C1 + C2).

1 For more details on PLL modeling, see [3] and [4].

For a spread-spectrum application, the unity-gain frequency has to be below the modulation frequency; otherwise, the loop will correct out the frequency modulation pattern. This means that the zero at 1/(R1C1) must sit at a very low frequency, so the R1C1 time constant must typically be very large.
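These placement rules can be explored numerically. The sketch below assumes the loop gain is the product of the blocks defined above (phase detector KPD, charge pump Gm, the impedance of the R1, C1, C2 loop filter, and the VCO KVCO/s), and locates the unity-gain frequency and phase margin. All component and gain values are hypothetical, chosen only so that the crossover lands below a 38kHz modulation rate.

```python
import cmath
import math


def loop_gain(freq_hz, k_pd, g_m, k_vco, r1, c1, c2):
    """L(j*2*pi*f) as the product of the PLL blocks (assumed model)."""
    s = 2j * math.pi * freq_hz
    c_series = c1 * c2 / (c1 + c2)
    loop_filter = (1 + s * r1 * c1) / (s * (c1 + c2) * (1 + s * r1 * c_series))
    return k_pd * g_m * loop_filter * k_vco / s


def unity_gain_frequency(params, f_lo=1.0, f_hi=1e9, steps=200):
    """Bisection on a log scale; relies on |L| falling monotonically."""
    for _ in range(steps):
        f_mid = math.sqrt(f_lo * f_hi)
        if abs(loop_gain(f_mid, **params)) > 1.0:
            f_lo = f_mid
        else:
            f_hi = f_mid
    return math.sqrt(f_lo * f_hi)


def phase_margin_deg(params):
    """Phase margin: 180 degrees plus the loop phase at unity gain."""
    f_u = unity_gain_frequency(params)
    return 180.0 + math.degrees(cmath.phase(loop_gain(f_u, **params)))


# Hypothetical values for illustration only.
EXAMPLE = dict(k_pd=1.0, g_m=10e-6, k_vco=4.8e5, r1=10e3, c1=10e-9, c2=500e-12)

if __name__ == "__main__":
    f_zero = 1 / (2 * math.pi * EXAMPLE["r1"] * EXAMPLE["c1"])
    print(f"zero at {f_zero:.0f} Hz")
    print(f"unity gain at {unity_gain_frequency(EXAMPLE):.0f} Hz")
    print(f"phase margin {phase_margin_deg(EXAMPLE):.1f} degrees")
```

With these assumed values the zero sits near 1.6kHz and the crossover lands between the zero and the pole, which is the configuration the stability rule asks for. Note that the filter already needs 10nF to get there, which hints at the area cost discussed next.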

Looking at the PLL, one can see two main points that can be modulated to create spread spectrum: the input to the VCO or the feedback path. The VCO is directly modulated in [5], where a programmable current source is integrated onto a capacitor that stores the control voltage of the VCO, to generate frequency sweeps and spread spectrum. The concept is simple to implement; however, one disadvantage is that the bandwidth of the PLL must be significantly reduced, so that the loop filter can filter out the modulation component. Indeed, the modulated output clock is fed back and appears at the input of the phase-frequency detector. The small PLL loop bandwidth can be implemented with a large integrated capacitor, which consumes a large area.

Several works used the idea of capacitive multiplication to address this problem [6] [7]. For example, in [7], two charge pumps provide two parallel paths in order to create the zero in the loop transfer function (Figure 1-8). If we size the charge pumps with a ratio of B, the transfer function of the loop filter becomes:

HLF (s) =1

sC1

(1 + sBR1C1

1 + sR1C2

)(1.10)

Essentially, the zero has be slowed down by a factor B, allowing us to increase the

value of C1.

The most popular way to implement a spread-spectrum clock is to modulate the divide-by-M path in a PLL (Figure 1-9). Because the divide-by circuit is inherently an integer divide operation, fractional division to make small frequency modulations uses dithering of the clock and delta-sigma modulation to average out the quantization error [8]. Both the feedback path and the VCO can be modulated simultaneously, as is done in [9], which allows more accurate production of a triangular frequency waveform (example waveform in Figure 1-10), even though the loop filter is not necessarily slow enough.

Figure 1-8: Example of a Dual-Path Loop Filter, allowing the zero to be B times slower.

For protocols where the instantaneous frequency deviation must be very small,

such as the Serial ATA protocol, the increase in hardware complexity to filter out

the quantization noise from fractional division might cost too much power and area.

Indeed, a look at the layout of the fractional-N PLL implementation of an SSCG in [10] quickly shows that most of the area is taken by the loop filter alone, just to filter out quantization errors. One technique to have true fractional division in

the feedback path is to have both frequency division and phase selection occurring

simultaneously (Figure 1-11). This technique, called “compensated phase rotation,”

has the benefit of producing true fractional division and reduced jitter [11].

1.4 Past Work: Phase Control System

In phase control systems, a digital finite state machine controls a digitally controlled phase synthesis system to shift the phase, and therefore the frequency, of the output (2 architectures are shown in Figure 1-12). The typical architecture is presented in [12], where a DLL generates multiple output phases, which are selected by a digital algorithm to shift the phase and frequency. A similar design in [13] has both a dummy and an actual delay-locked loop, the dummy one monitoring the number of delay taps in a full period. This design has highly reconfigurable waveform frequency selection, with an SRAM to allow reprogrammable waveform selection. It demonstrates the flexibility of a phase-control system as opposed to a frequency control system. In [14], the phase control system is a delay line with switchable delay cells. The total delay is just the sum of the delays that are in the signal path, and is digitally controlled to produce triangular frequency modulation (see Figure 1-10 for an example waveform). The argument for using a delay cell array is that, in contrast to a PLL, the random edge jitter can be suppressed because random period jitter is not accumulated by a VCO. For our purpose, however, the maximum phase deviation has to be tightly controlled because it determines the size of the FIFO buffer needed, so generating the phase modulation with uncalibrated delays is not an option.

Figure 1-9: SS is generally achieved by modulating the feedback path or the LF of a PLL.

Figure 1-10: Triangular Frequency Modulation Example.

Figure 1-11: Example of compensated-phase rotation technique for SS clocking.

The new phase controlled architectures all rely on digital circuits to generate the

output phase waveform. Using a fine process of 65nm CMOS, we therefore envision

that a phase-controlled architecture will make better use of the lower power and

area requirements from a digital processing of the phase waveform. Furthermore, the

previous work using a digitally controlled phase output shows that quantized phases

can reliably create spread-spectrum clocks, an observation we will verify numerically

in the next chapter.

Figure 1-12: 2 Implementations of Spread-Spectrum using digital phase control. In the first case, the phase outputs of a DLL are muxed. In the second case, the delay from a Voltage-Controlled Delay Line (VCDL) is modulated.

Chapter 2

Behavioral Study and System Proposal

2.1 Introduction

This chapter introduces much of the notation used later in the thesis. It includes a derivation of the attenuation for an arbitrary spread-spectrum waveform, explains the choice of the frequency modulation waveform, and examines the effect of various non-idealities on the performance of the spread-spectrum block. After a careful numerical investigation, we construct a new architecture for spread-spectrum that is both simple and effective in meeting our design specifications.

2.2 The Mathematics of Spread-Spectrum

2.2.1 A Toy Model: Single-tone FM Modulation

A simple example of a spread-spectrum clock is a clock with a sinusoidally varying frequency. For example, a toy unspread clock can be a sine wave of the form $V_{unspread}(t) = A\cos(2\pi f_0 t)$, where $f_0$ is the frequency of the clock. The spread clock is then defined to be:

$$V_{spread}(t) \equiv A\cos(\phi(t)) \tag{2.1}$$

$$\text{where } \phi(t) \equiv 2\pi f_0 t + \phi_m \sin(2\pi f_m t) \tag{2.2}$$

$$\phi_m \equiv \frac{\Delta f}{f_m} \tag{2.3}$$

Figure 2-1: Illustration of single-tone FM modulated signal ($f_0 = \frac{1}{\pi}$ Hz, $f_m = \frac{1}{20\pi}$ Hz, $\phi_m = 1.4\pi$ radians, $\Delta f = 0.07$ Hz).

We will use this example to illustrate various terms.

• The maximum phase deviation is defined as the maximum phase difference between Vspread and Vunspread, and will generically be called φm.

• The instantaneous frequency of the clock, called f(t), is computed as follows:

$$f(t) \equiv \frac{1}{2\pi}\frac{d\phi(t)}{dt} = f_0 + \Delta f \cos(2\pi f_m t)$$

• The carrier frequency, f0, is the center frequency around which the instantaneous frequency varies. It is typically much higher than the modulation frequency, fm, which is the frequency at which f(t) varies.

• The modulation depth of the signal is max(|f(t) − f0|) = ∆f.

• The frequency spread is defined as the frequency range covered by the instantaneous frequency. In this case, it is just 2∆f.

Figure 2-2: Illustration of the first 3 Bessel Functions.

For this toy model, we can decompose Vspread(t) in terms of the Bessel functions of nth order, called $J_n$, and easily obtain its Fourier transform. Writing $\beta \equiv \phi_m$,

$$V_{spread}(t) = A\sum_{n=-\infty}^{\infty} J_n(\beta)\cos\left(2\pi(f_0 + nf_m)t\right) \tag{2.4}$$

The output spectrum of Vspread is nonzero only at $f_0 + nf_m$ for $n \in \mathbb{Z}$. Furthermore, the attenuation, which we define as the ratio between the highest PSD peak of the unspread clock and that of the spread clock, is proportional to $\left|\frac{1}{J_0(\beta)}\right|^2$, which is a function only of φm, the maximum phase deviation. For large phase shift values φm, the attenuation scales as follows:

$$\left|\frac{1}{J_0(\phi_m)}\right|^2 \approx \frac{1}{2}\pi\phi_m \tag{2.5}$$

We see that the height of the spectral peak scales inversely with φm, so the attenuation grows linearly with φm. This implies a 3dB per octave scaling between attenuation and maximum phase shift. This observation can be generalized to more complicated modulation schemes, and is a principal tradeoff in the design of the spread-spectrum clock generator.
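The scaling above can be checked numerically. The following minimal sketch (standard library only; the function names are illustrative) evaluates $J_n$ from its integral representation, compares $J_0$ against the standard large-argument form $\sqrt{2/(\pi x)}\cos(x - \pi/4)$, whose envelope gives $|1/J_0|^2 \approx \pi\phi_m/2$, and confirms that the total sideband power is conserved.

```python
import math

def bessel_j(n, x, steps=2000):
    """J_n(x) via (1/pi) * integral_0^pi cos(n*tau - x*sin(tau)) dtau, midpoint rule."""
    h = math.pi / steps
    total = sum(math.cos(n * (i + 0.5) * h - x * math.sin((i + 0.5) * h))
                for i in range(steps))
    return total * h / math.pi

def carrier_envelope_db(phi_m):
    """Large-phi_m envelope of |1/J_0(phi_m)|^2 in dB: 10*log10(pi*phi_m/2)."""
    return 10 * math.log10(math.pi * phi_m / 2)

for phi_m in (8.0, 16.0, 32.0, 64.0):
    exact = bessel_j(0, phi_m)
    asymptotic = math.sqrt(2 / (math.pi * phi_m)) * math.cos(phi_m - math.pi / 4)
    print(f"phi_m = {phi_m:4.0f}: J0 = {exact:+.4f}, asymptotic = {asymptotic:+.4f}, "
          f"envelope = {carrier_envelope_db(phi_m):.1f} dB")

# Power conservation: J0^2 + 2 * sum_{n>=1} Jn^2 = 1, so spreading only redistributes power.
total_power = bessel_j(0, 5.0) ** 2 + 2 * sum(bessel_j(n, 5.0) ** 2 for n in range(1, 30))
print(f"total sideband power at phi_m = 5: {total_power:.6f}")
```

Each doubling of φm raises the envelope by about 3 dB, which is the per-octave scaling quoted in the text. Note that $J_0$ itself oscillates and has zeros, so the envelope describes the trend rather than the value at any single φm.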

2.2.2 Generalization to Arbitrary Modulation Waveform

In this section, we consider a generic modulation waveform and a generic clock wave-

form instead.

A generic clock waveform u(t) can be written in terms of a Fourier series as:

$$u(t) = \sum_{k=-\infty}^{\infty} \frac{I_{0k}}{2}\exp(2\pi j k f_0 t) \tag{2.6}$$

For example, if u(t) is a square wave of amplitude A, the harmonic values $I_{0k}$ would be

$$I_{0k} = 2A\,\frac{\sin\left(\frac{k\pi}{2}\right)}{k\pi} \tag{2.7}$$

Consider a modulation shape V(t), which is a function in the range [−1, 1] that describes the shape of the frequency modulation. In general, a frequency-spread clock signal $u_s(t)$ with instantaneous frequency $f_0(1 + \delta V(t))$ is then written as:

$$u_s(t) = u\left(t + \frac{\delta}{f_m}\int_0^{f_m t} V(\tau)\,d\tau\right) = \sum_{k=-\infty}^{\infty} I_{mk}(t) \tag{2.8}$$

$$I_{mk}(t) \equiv \frac{I_{0k}}{2}\exp\left(2\pi j k f_0\left(t + \frac{\delta}{f_m}\int_0^{f_m t} V(\tau)\,d\tau\right)\right) \tag{2.9}$$

Figure 2-3: Example of SS on a square clock.

We will use this example once again to practice our terminology:

• The modulation frequency is fm.

• The frequency spread is 2δf0.

• The carrier frequency is f0.

• The maximum phase deviation is given below, where the factor $\frac{1}{f_m}$ follows from the time shift in Equation 2.8:

$$\phi_m = \left|\frac{2\pi f_0\delta}{f_m}\int_0^{\frac{1}{2}} V(t)\,dt\right|$$
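To make the terminology concrete, the sketch below evaluates the maximum phase deviation for a sawtooth modulation shape, taking the time shift in Equation 2.8 to be $\frac{\delta}{f_m}\int V$ so that $\phi_m = \left|2\pi f_0\frac{\delta}{f_m}\int_0^{1/2}V\right|$. The carrier and modulation frequencies are values that appear later in this chapter; the relative spread `delta` and the choice of a sawtooth that starts at a zero crossing are assumptions made for illustration.

```python
import math

def sawtooth(x):
    """Sawtooth modulation shape in [-1, 1), period 1, starting at a zero crossing."""
    x = x % 1.0
    return 2 * x if x < 0.5 else 2 * x - 2

def max_phase_deviation(shape, f0, fm, delta, steps=10000):
    """phi_m = |2*pi*f0*(delta/fm) * integral_0^(1/2) V(x) dx| in radians (midpoint rule)."""
    h = 0.5 / steps
    area = sum(shape((i + 0.5) * h) for i in range(steps)) * h
    return abs(2 * math.pi * f0 * delta / fm * area)

f0, fm = 5e9, 38e3      # carrier and modulation frequency used later in this chapter
delta = 1.9456e-3       # hypothetical relative frequency deviation
phi_m = max_phase_deviation(sawtooth, f0, fm, delta)
print(f"phi_m = {phi_m / (2 * math.pi):.2f} periods")
print(f"frequency spread = {2 * delta * f0 / 1e6:.2f} MHz")
```

For this shape the half-period integral is 1/4, so φm in carrier periods reduces to $f_0\delta/(4f_m)$: at fixed φm, doubling fm requires doubling δ.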

2.2.3 Modulation Waveform Selection

Our goal is to select a modulation shape, V(t), such that we create maximum attenuation for a given fm and δ. A first simple guess is to use a triangle wave for V(t). This guess comes from the intuition that the frequency-spread spectrum will be approximately flat, because the instantaneous frequency of the output waveform visits all frequencies in the frequency spread for equal amounts of time. This intuition is almost true, but inaccurate, as we can see in the spectrum of Figure 2-9. The output spectrum has oscillations instead of being flat. Another observation is that even for the triangle waveform, the attenuation is largely not a function of the frequency spread, but rather a function of the maximum phase deviation (Figure 2-8). This is rather non-intuitive. The total power of the signal is conserved before and after spread spectrum. If the frequency spread increases, the power spectral density must go down proportionately to conserve the integrated area under the box (Figure 2-5). By this argument, we should have a 3dB per octave relationship between attenuation and δ, not between attenuation and φm. A more careful analysis is needed.

Figure 2-4: Illustration of the terminology using a sawtooth modulation waveform.

Figure 2-5: Picture showing an intuitive but incorrect derivation of spectral attenuation. This picture seems to suggest that spectral attenuation should scale linearly with δ, which is false in general.

Stationary-Phase Approximation:

For typical clock signals, the harmonic content decays much faster than that of the square wave because higher frequencies are attenuated. To calculate the attenuation due to spread-spectrum, we therefore only need to consider the fundamental (k = 1). The Fourier transform of the fundamental is then

$$\mathcal{F}(I_{m1}(t)) \equiv I_{m1}(\omega) = \int_{-\infty}^{\infty} dt\, \exp(-j\omega t)\, I_{m1}(t) \tag{2.11}$$

We will now consider a V(t) that is roughly triangular in shape, although the analysis, notation and conclusions remain the same for a generic V(t). When the phase oscillates rapidly enough, we can evaluate the integral using the stationary-phase method. For a given $\omega \in (2\pi f_0(1-\delta), 2\pi f_0(1+\delta))$, the instantaneous angular frequency will match that value twice per modulation period. We index the modulation periods by l, and the times within a period at which the waveform frequency matches ω by α, so that those times are called $t_{l\alpha}$. An illustration of this terminology is shown in Figure 2-6. The stationary-phase method asserts that most of the contribution to $I_{m1}(\omega)$ comes from the times when the integrand phase is stationary, which is around those 2 times in each modulation period. Elsewhere the integrand phase oscillates and, because the modulation waveform is slow, the integrand roughly cancels out. If this assertion is true, we can take a Taylor expansion of the phase of the integrand around those points.

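We therefore have:

$$I_{m1}(\omega) = \int_{-\infty}^{\infty} dt\, \exp(-j\omega t)\,\frac{I_{01}}{2}\exp\left(2\pi j f_0\left(t + \frac{\delta}{f_m}\int_0^{f_m t} V(\tau)\,d\tau\right)\right) \tag{2.12}$$

$$\equiv \int_{-\infty}^{\infty} dt\, \frac{I_{01}}{2}\exp(j\phi(t)) \tag{2.13}$$

$$\phi(t) = \left.-\omega t + 2\pi f_0\left(t + \frac{\delta}{f_m}\int_0^{f_m t} V(\tau)\,d\tau\right)\right|_{t \approx t_{l\alpha}} \tag{2.14}$$

$$\approx \phi(t_{l\alpha}) + (\pi f_0 \delta)\,V'(t_{l\alpha})\,(t - t_{l\alpha})^2 \tag{2.15}$$

$$I_{m1}(\omega) \approx \sum_{\alpha=1}^{2}\sum_{l=-\infty}^{\infty}\int_{lT_m}^{(l+1)T_m} dt\, \frac{I_{01}}{2}\exp\left(j\phi(t_{l\alpha}) + j(\pi f_0\delta)\,V'(t_{l\alpha})\,(t - t_{l\alpha})^2\right) \tag{2.16}$$

$$I_{m1} \approx \sum_{\alpha=1}^{2}\sum_{l=-\infty}^{\infty} \Gamma(t_{l\alpha})\sqrt{\frac{1}{j2\pi\delta f_0 V'(t_{l\alpha})}}\;\frac{I_{01}}{2}\exp(j\phi(t_{l\alpha})) \tag{2.17}$$

$$I_{m1}(\omega) \propto \frac{1}{T_m}\sum_{\alpha}\Gamma(t_{0\alpha})\sqrt{\frac{1}{j\delta f_0 V'(t_{0\alpha})}} \tag{2.18}$$

$$\Gamma(t_{l\alpha}) = S((l+1)T_m - t_{l\alpha}) + S(t_{l\alpha} - lT_m) + C(t_{l\alpha} - lT_m) + C((l+1)T_m - t_{l\alpha}) \tag{2.19}$$

$$C(x) \equiv \int_0^x dt\, \cos(t^2) \tag{2.20}$$

$$S(x) \equiv \int_0^x dt\, \sin(t^2) \tag{2.21}$$

C(x) and S(x) are the Fresnel integrals (graphed in Figure 2-7). In the limit where $T_m \gg \sqrt{\frac{1}{\delta f_0 V'(t_{l\alpha})}}$, we can extend the limits of integration to ±∞, and approximate $\Gamma \approx \sqrt{2\pi}$.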
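The limiting value of Γ can be checked numerically. The sketch below integrates the Fresnel integrals of Equations 2.20 and 2.21 with Simpson's rule and evaluates Γ for the symmetric case where all four arguments in Equation 2.19 are equal. It assumes the arguments have already been scaled to the dimensionless variable used in those definitions; the function name and step density are illustrative choices.

```python
import math

def fresnel(x, steps_per_unit=2000):
    """Return (C(x), S(x)) with C = int_0^x cos(t^2) dt and S = int_0^x sin(t^2) dt."""
    n = max(2, int(x * steps_per_unit))
    n += n % 2                         # Simpson's rule needs an even number of intervals
    h = x / n
    c = math.cos(0.0) + math.cos(x * x)
    s = math.sin(0.0) + math.sin(x * x)
    for i in range(1, n):
        weight = 4 if i % 2 else 2
        t = i * h
        c += weight * math.cos(t * t)
        s += weight * math.sin(t * t)
    return c * h / 3, s * h / 3

for x in (2.0, 10.0, 40.0):
    c, s = fresnel(x)
    gamma = 2 * (c + s)                # Gamma with all four arguments equal to x
    print(f"x = {x:5.1f}: Gamma = {gamma:.4f}, sqrt(2*pi) = {math.sqrt(2 * math.pi):.4f}")
```

Both integrals converge to $\sqrt{\pi/8}$ with a residual oscillation that decays like 1/(2x), which is why Γ approaches $\sqrt{2\pi}$ only when the stationary points are far from the edges of the integration window.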

Formula 2.16 confirms the 2 crucial observations made previously about spread-spectrum:

• $|I_{m1}|^2 \propto \frac{1}{\delta f_0 V'(t)(T_m)^2}$. For the case of triangular modulation, this value is just a constant, and is in fact the inverse of the maximum phase deviation. We recover the 3 dB/octave scaling between maximum phase deviation and attenuation. We note that for a triangle or sawtooth waveform, the waveform is uniquely specified by only 2 parameters out of the following: δ, fm and maximum phase. At fixed maximum phase, changing fm, and correspondingly δ, has little effect on the attenuation.

• We expect a ripple in the spectrum because of constructive or destructive interference as we vary ω. This comes from the sum of $\exp(j\phi(t_{l\alpha}))$ and the oscillation of the Fresnel functions. In fact, as the $t_{l\alpha}$ approach the edges of the modulation waveform, at its peak or its trough, we should expect constructive interference because $t_{l1}$ and $t_{l2}$ become identical. This is confirmed when we look at numerical computations of the power spectrum (Figure 2-9).

Figure 2-6: Illustration of the indexing terminology of $t_{l\alpha}$.

Figure 2-7: Plot of the Fresnel Functions, C(x) and S(x), as defined above.

Figure 2-8: Spectra using triangular modulation for several values of fm (φm = 64 × (2π)). The attenuation level is largely independent of fm at fixed φm.

These 2 observations allow us to conclude that the optimal modulation waveform will de-emphasize (spend less time at) the edges of the spectral band, unlike the triangular modulation waveform, which spends equal amounts of time at all frequencies. This is traditionally done with a “Hershey-Kiss” profile (see Figure 2-10) for V(t), but such a profile is difficult to reproduce accurately using a digital FSM. Furthermore, a sawtooth-like waveform should perform better than a triangular waveform because it prevents constructive interference from occurring at the edge of the spectrum (this claim is verified numerically in Figure 2-11). Further work has been done on waveforms that even out the oscillations due to the Fresnel functions, but using such a so-called “optimal waveform” necessitates a very complex modulation scheme to accurately reproduce the modulation waveform function [15] [16]. We therefore choose a sawtooth waveform for our final design.

Figure 2-9: Spectral Oscillation Effect observed for Sawtooth and Triangular Modulation.

Effect of Spectrum Analyzer:

It seems that neither the value of fm nor the value of δ came into play in the waveform selection. This is partially true. What we have considered so far is an ideal spectrum analyzer, which computes the Fourier coefficients perfectly. In the real world, the spectrum analyzer itself has an impulse response we have to account for. An analysis with the spectrum analyzer response tells us that fm needs to be faster than the Resolution Bandwidth (RBW) of the spectrum analyzer in order for the analysis in the previous section to hold (see Appendix A).

Figure 2-10: Possible modulation waveforms V(t): triangle, sawtooth (better), Hershey-Kiss (optimal).

2.3 Mathematics of a Phase-Control System

In this section, we explore the non-idealities of using a phase-control system as opposed to a frequency control system to do spread-spectrum. While a frequency-control system like a PLL is more faithful to the mathematical analysis in the previous section, a phase-control system is able to perform discontinuous frequency modulation. We therefore have to explore how discrete time and discrete phases affect the SS performance.

Figure 2-11: Comparison between the attenuation levels for triangular and sawtooth modulation, vs φm.

2.3.1 Discrete-time and Discrete-Phase System

For a realistic digital phase control system that creates SS to work, we have to ensure that both the phase quantization and the discrete phase update at the digital clock will not significantly affect the attenuation of the spectrum. We call the time period between phase updates Tdig. This period is typically the clock period for the digital FSM that creates the modulation waveform. For a given V(t), we will then have a phase φn at time t = nTdig, since the phase is updated discretely. For the sake of simplicity, consider Tm = NTdig, where N is a positive integer. φn is therefore equal to φn+N. Let us calculate the Fourier transform:

$$I_{m1}(\omega) = \int_{-\infty}^{\infty} dt\, \exp(-j\omega t)\, I_{m1}(t) \tag{2.22}$$

$$= \sum_{l=-\infty}^{\infty}\sum_{n=0}^{N-1}\int_{lT_m + nT_{dig}}^{lT_m + (n+1)T_{dig}} dt\, \exp(-j\omega t)\exp(j\phi_n) \tag{2.23}$$

$$= \left[\sum_{l=-\infty}^{\infty}\exp(-jl\omega T_m)\right]\frac{\sin\left(\omega\frac{T_{dig}}{2}\right)}{\omega}\sum_{n=0}^{N-1}\exp\left(-j\omega\left(n + \frac{1}{2}\right)T_{dig}\right)\exp(j\phi_n) \tag{2.24}$$

The term in brackets is the same as we had before, from the modulation periodicity. However, the discrete phase update adds “images” to the spectrum, which are weighted by the sinc function (Figure 2-12). Indeed, we can see that the term in the second summation is invariant (up to a sign) under $\omega \to \omega + \frac{2\pi}{T_{dig}}$, which implies that the spectral contribution at f0 will also appear at $f_0 + \frac{n}{T_{dig}}$ for integer n, except that it will be modulated by the term before the sum, the sinc function. Furthermore, the continuous Fourier transform has been transformed into a discrete Fourier transform. We assume that in the limit where Tdig is small, we recover the continuous Fourier transform. Because the images are suppressed, we do not expect them to affect the overall attenuation level.
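The location and suppression of the images can be estimated with a short sketch. It assumes the sinc envelope is the usual zero-order-hold weighting $|\sin(\pi f T_{dig})/(\pi f T_{dig})|$ with f measured as an offset from the carrier; the 5 MHz component offset `delta_f` is a hypothetical value chosen only for illustration.

```python
import math

def sinc_weight(offset_hz, t_dig):
    """|sin(pi*f*T)/(pi*f*T)|: zero-order-hold envelope at offset f from the carrier."""
    x = math.pi * offset_hz * t_dig
    return 1.0 if x == 0 else abs(math.sin(x) / x)

T_DIG = 3.2e-9                 # phase update period considered in this chapter
F0 = 5e9                       # carrier frequency
image_spacing = 1.0 / T_DIG    # images repeat every 1/T_dig
delta_f = 5e6                  # hypothetical offset of a spectral component from the carrier

for n in (1, 2, 3):
    rel = sinc_weight(n * image_spacing + delta_f, T_DIG) / sinc_weight(delta_f, T_DIG)
    print(f"image {n}: centered at {(F0 + n * image_spacing) / 1e9:.4f} GHz, "
          f"component at +{delta_f / 1e6:.0f} MHz suppressed by {-20 * math.log10(rel):.1f} dB")
```

The image centers fall exactly on the nulls of the envelope, so only the spread portion of the spectrum around each image survives, and it does so at a much reduced level.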

Figure 2-12: Power spectra for the input and output 5GHz clock (Tdig = 3.2ns). Note the images appearing at 5GHz ± 312.5MHz.

Dividing the spread clock down:

The final clock used for digital clocking runs at 1.25GHz. Our block is supposed to produce 4 phases of the clock at 1.25GHz using a spread clock at 5GHz. There are several direct effects of this divide-down operation:

• We immediately lose 6dB in attenuation. This is because the maximum phase deviation decreases by a factor of 4, which is two octaves at 3dB per octave.

• For a phase update rate faster than 1.25GHz, there is a time truncation error. This is because the phase only updates at clock transitions, which happen every 800ps. This is not likely to happen because the digital FSM is most likely running at a subrate of the output 1.25GHz clock.

• The width of the spectrum will tighten by a factor of 4. This is because the width of the spectrum is related to max|f(t) − f0|, and the instantaneous frequency deviation shrinks by a factor of 4 when put through the divider.

Figure 2-13: Illustration of the definitions used for the discrete-time and discrete-phase system.

2.3.2 Numerics of Phase and Time Quantization

There are two main parameters we are interested in, in particular the quantization

level Q, which we define as the number of output phases our phase-control system

can produce in a 5GHz period, and the digital update period, Tdig, which is the time

period between phase updates (Figure 2-13). The 2 parameters are interlinked. If

Tdig is slow, but Q is large, there will be phase skips much larger than the minimum

phase step. If Tdig is fast, but Q is small, the phase jumps will be limited by the

minimum phase step of the phase control system. Finally, if fm increases at fixed

φm, Tdig has to decrease, or phase skips will occur. All those effects are investigated

below and graphed (Figure 2-14).

What we observe is that Q is a relatively benign parameter, but if we want to meet attenuation requirements at fm = 76kHz or faster, we need to set Tdig < 3.2ns. This is a reasonable clocking rate for a digital system. Q is then set to 32 and fm to 38kHz, not because of concern for the attenuation specification, but because we want to bound the minimum phase step to satisfy cycle-to-cycle jitter requirements (although fm = 76kHz would still meet jitter requirements, fm = 38kHz leaves a more comfortable margin).
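The interplay between Q, Tdig and fm can be illustrated by estimating the largest phase change between two consecutive updates. This is a back-of-the-envelope sketch, not the simulation behind Figure 2-14: it assumes a linear-ramp frequency profile whose peak frequency offset is 4·φm·fm (with φm in carrier periods), which follows from taking the half-period integral of the modulation shape to be 1/4.

```python
def max_step_lsb(phi_m_periods, fm, t_dig, q):
    """Largest phase change between updates, in units of the minimum step (1/q period).

    Assumes a linear-ramp (sawtooth or triangle) frequency profile with a peak
    frequency offset of 4 * phi_m * fm, where phi_m is given in carrier periods.
    """
    peak_offset_hz = 4 * phi_m_periods * fm
    return peak_offset_hz * t_dig * q

Q, T_DIG, PHI_M = 32, 3.2e-9, 64
for fm in (38e3, 76e3, 152e3, 304e3):
    print(f"fm = {fm / 1e3:5.0f} kHz: max step = {max_step_lsb(PHI_M, fm, T_DIG, Q):.2f} LSB")
```

Under these assumptions, fm = 38kHz keeps the largest update at roughly one minimum phase step, while each doubling of fm doubles the step, which is the phase-skip behavior described above.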

2.4 Modelling Phase-Control System Non-Idealities

Previously, our numerical simulations assumed that the phases would be exactly $\frac{n}{Q} \times 2\pi$, where n is the truncated value of the analog phase. In general, we expect the output phase to differ because of non-systematic errors (random mismatches in the silicon), but also systematic errors (quadrature phase errors, anharmonicity of input signals, etc.). We explore the effect of those errors in this section.

2.4.1 General INL and DNL Modelling

Figure 2-14: Plot of the attenuation level of the divide-by-4 clock as a function of quantization level and clock period, for fm = 38kHz, 76kHz, 152kHz and 304kHz (panels (a) to (d) respectively). Note that the quantization level is largely irrelevant, and therefore is set to meet cycle-to-cycle jitter specifications, not attenuation specifications.

To model those errors, we introduce here the term INL, which is typically used for data converters. Consider a phase control system with Q output phases in a revolution. A phase control system takes an input phase φi, and tries to output a signal with an output phase $\phi_n = \phi_i - n\frac{2\pi}{Q} - \phi_{offset}$, where $\phi_{offset}$ is some offset phase which has no performance impact in our application. If we call the actual output phase $\phi_{on}$, we define:

$$INL_n = Q\,\frac{\phi_n - \phi_{on}}{2\pi} \tag{2.25}$$

$$DNL_n = INL_{n+1} - INL_n \tag{2.26}$$

In our case, we will measure INL and DNL in “milliperiods”, which is a unit of phase (1 milliperiod = $\frac{2\pi}{1000}$ radians). We then create a sinusoidal INL profile, characterized by $INL_{max}$ and k:

$$INL_n = \left|INL_{max}\sin\left(\frac{n\pi k}{Q}\right)\right| \tag{2.27}$$

At fixed $INL_{max}$, the DNL values scale linearly with k. At fixed k, the DNL values also scale linearly with $INL_{max}$. Therefore, DNL scales roughly as $INL_{max} \times k$. Because we are interested in spectral attenuation, pure phase INL is not the spec of interest, but DNL is. Therefore, we plot in Figure 2-15 the attenuation as a function of $DNL_{rms}$, which we define to be:

$$DNL_{rms} \equiv \frac{1}{Q}\sqrt{\sum_{n=0}^{Q-1} DNL_n^2} \tag{2.28}$$

What we observe is that the attenuation spec is very insensitive to the value of $DNL_{rms}$, and in fact, values of $DNL_{rms}$ up to 24 milliperiods have almost no effect (Figure 2-15).
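The INL and DNL definitions above can be exercised with a few lines of code. The sketch below builds the sinusoidal INL profile of Equation 2.27 and computes the DNL figure with the normalization written in Equation 2.28. Two details are assumptions made for illustration: the DNL is wrapped around after one revolution, and the (INLmax, k) pairs are arbitrary sweep points.

```python
import math

def inl_profile(q, inl_max, k):
    """Sinusoidal INL profile of Equation 2.27, one value per output phase."""
    return [abs(inl_max * math.sin(n * math.pi * k / q)) for n in range(q)]

def dnl_profile(inl):
    """DNL_n = INL_(n+1) - INL_n (Equation 2.26), wrapping around after one revolution."""
    q = len(inl)
    return [inl[(n + 1) % q] - inl[n] for n in range(q)]

def dnl_rms(dnl):
    """DNL_rms with the normalization of Equation 2.28: (1/Q) * sqrt(sum of DNL_n^2)."""
    return math.sqrt(sum(d * d for d in dnl)) / len(dnl)

Q = 32
for inl_max, k in ((50.0, 1), (50.0, 4), (200.0, 4)):   # milliperiods; hypothetical points
    dnl = dnl_profile(inl_profile(Q, inl_max, k))
    print(f"INL_max = {inl_max:5.1f}, k = {k}: DNL_rms = {dnl_rms(dnl):.2f} milliperiods")
```

Scaling INLmax at fixed k scales every DNL value by the same factor, which is the linear dependence noted in the text.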

Figure 2-15: Numerical simulation of the attenuation of the divide-by-4 clock output as a function of DNLrms. Data includes INLmax ∈ (0 milliperiods, 200 milliperiods), k ∈ (1, 10).

2.5 Architecture Proposal

2.5.1 System Requirements

We found that a phase control system used to realize spread-spectrum is very robust to non-idealities such as phase quantization, discrete time updates, and phase DNL. In fact, we will saturate cycle-to-cycle jitter specifications before causing significant attenuation degradation. We therefore set Q to 32, so that phase steps of only about 6ps are taken (200ps/32 = 6.25ps), and Tdig = 3.2ns, for a reasonable digital clocking speed. We leave ourselves another 6ps of maximum DNL, so that the maximum cycle-to-cycle jitter will be about 12ps. Finally, to achieve about 20dB of attenuation at the divide-4 clock output, we will need to set φm = 64 periods at 5GHz.
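The numbers in this paragraph follow from simple arithmetic on the 5GHz period. The short sketch below reproduces them; the constant names are illustrative, and the values are the ones stated in the text.

```python
F0 = 5e9                     # interpolator clock frequency (Hz)
Q = 32                       # output phases per period
T_DIG = 3.2e-9               # phase update period (s)
PHI_M_PERIODS = 64           # maximum phase deviation, in 5 GHz periods
MAX_DNL_PS = 6.0             # DNL allowance quoted in the text (ps)

period_ps = 1e12 / F0                                   # one 5 GHz period in ps
lsb_ps = period_ps / Q                                  # minimum phase step in ps
lsb_milliperiods = 1000.0 / Q                           # same step in milliperiods
update_rate_msps = 1.0 / T_DIG / 1e6                    # phase update rate
max_time_deviation_ns = PHI_M_PERIODS * period_ps / 1e3
worst_case_jitter_ps = lsb_ps + MAX_DNL_PS              # one step plus the DNL allowance

print(f"LSB = {lsb_ps:.2f} ps ({lsb_milliperiods:.2f} milliperiods)")
print(f"update rate = {update_rate_msps:.1f} MSample/s")
print(f"maximum time deviation = {max_time_deviation_ns:.1f} ns")
print(f"worst-case cycle-to-cycle jitter = {worst_case_jitter_ps:.2f} ps")
```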

2.5.2 SSCG via Phase Modulation

We now propose an architecture that creates the phase modulation pattern for spread spectrum with those specifications in mind. The key equation of this architecture is:

$$\sqrt{a(t)^2 + b(t)^2}\,\cos\left(\omega t - \tan^{-1}\left(\frac{b}{a}\right)\right) = a(t)\cos(\omega t) + b(t)\sin(\omega t) \tag{2.29}$$

This equation tells us that by adding 2 quadrature signals with time-varying coefficients, we can reproduce any point on the IQ plane. To implement this equation, we take in anharmonic differential IQ clocks, which we filter using variable gain filters to obtain the sine and cosine signals (Figure 2-16). The phase interpolator then adds them up with the coefficients a(t) and b(t), which are digitally controlled by an FSM. The output signal is then gained up and restored to full rail swing, to be divided down. The divider is included in the block because routing 5GHz signals out of the block is very power inefficient. The filters are calibrated because of the wide frequency range of operation (2.5GHz to 5GHz). The input amplitude to the phase interpolator is monitored to ensure it does not saturate, so the bandwidth of the filters must be tuned based on the input frequency and the process variations.
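The interpolation identity can be checked numerically. The sketch below verifies that a weighted sum a·cos(ωt) + b·sin(ωt) equals a single sinusoid of amplitude √(a² + b²) whose phase is shifted by the angle of the point (a, b), written in cosine form. It uses `atan2` rather than a plain arctangent so that all four quadrants are handled; the weights are hypothetical.

```python
import math

def interpolate(a, b, wt):
    """Weighted sum of the quadrature inputs: a*cos(wt) + b*sin(wt)."""
    return a * math.cos(wt) + b * math.sin(wt)

def amplitude_phase(a, b):
    """Amplitude and phase shift (radians) of the equivalent single sinusoid."""
    return math.hypot(a, b), math.atan2(b, a)

a, b = 0.6, 0.8                      # hypothetical I/Q weights
amp, phase = amplitude_phase(a, b)
max_err = max(abs(interpolate(a, b, wt) - amp * math.cos(wt - phase))
              for wt in (0.0, 1.0, 2.5, 4.0))
print(f"amplitude = {amp:.3f}, phase shift = {math.degrees(phase):.2f} degrees, "
      f"max mismatch = {max_err:.1e}")
```

Sweeping (a, b) around the unit circle therefore rotates the output phase through a full period at constant amplitude, which is the behavior the FSM exploits.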

Figure 2-16: Architecture Proposal


2.5.3 Additional Non-Idealities of Phase Modulator

We have previously investigated the effect of phase errors on a phase control system

on the spread-spectrum performance. For the case of a phase modulator, the input

quadrature phase error will be a clear contributor to the non-linearity of the phase

to code relationship. However, input quadrature error leads to a smooth modulation

of the phase over a period, and cycle-to-cycle jitter will not be harmed significantly.

We therefore leave a 5 degree budget for input quadrature phase error.

2.5.4 System Level Specifications

We have now determined the type of phase linearity and quantization resolution for

our phase control system. In particular, we have a clear idea on the phase update rate,

therefore the clock speed requirements. We are ready to choose circuit topologies to

satisfy the specifications below:

Specification                    Value
Phase Resolution                 32 phases in a period
DNLrms                           < 12 milliperiods
max|DNL|                         < 31.2 milliperiods (1 LSB phase)
Phase Update Rate                312 MSample/s
Input Quadrature Error           up to 5 degrees
Maximum time deviation           12.8ns (φm = 64 periods at 5GHz)
Modulation Frequency             38kHz
Frequency Modulation Waveform    Sawtooth

Table 2.1: System level specifications

2.6 Summary

In this chapter, a mathematical model of spread-spectrum waveforms was presented both for continuous phase and discrete phase systems. In the continuous phase case, we derived explicit analytical calculations of the attenuation specification, and numerically verified those analytical expressions. This allowed us to develop an intuition for the design tradeoffs of the spread-spectrum system, and to choose an optimal modulation waveform (sawtooth waveform). We then analytically and numerically analyzed non-idealities introduced by discrete-phase and discrete-time systems, and obtained bounds within which we can meet our target specifications. We finally proposed a phase-control architecture based on phase interpolation.

Chapter 3

Analog Circuit Design

3.1 Introduction

This chapter reviews schematic and layout implementation details for the analog

circuits of the SSCG system. We divide the analog circuits into 2 main categories:

the interpolator core and signal conditioning.

• Interpolator core: The interpolator core is a phase interpolator that takes in

quadrature sine waves and is digitally controlled to produce an output sine wave

with some intermediate phase value. The interpolator core’s crucial specification

is its DNL performance.

• Signal Conditioning: Signal conditioning refers to anything in the high frequency clock signal path that is not the interpolator core. It consists of input filters, which make the rail-to-rail inputs more sine-shaped by filtering out the higher harmonics of the input clock, and a restoration circuit, which restores the weak output signal of the interpolator into a rail-to-rail signal. It also includes a divide-by-4 circuit, which divides down the 5GHz clock to quadrature phases of a 1.25GHz clock.

We defer discussion of the peak-detector circuits, voltage regulator circuit, and clock distribution scheme to Chapter 4.

Figure 3-1: IQ plane example, illustrating the terminology for quadrant numbering and IQ coefficient quantization.

3.2 Interpolator Core

3.2.1 Code Selection

The phase interpolator relies on the equation:

$$a(t)\cos(\omega t) + b(t)\sin(\omega t) = \sqrt{a^2 + b^2}\,\cos\left(\omega t - \tan^{-1}\left(\frac{b}{a}\right)\right) \tag{3.1}$$

Each signal characterized by a(t) and b(t) can be mapped to a point (a(t), b(t)) on a so-called IQ plane, shown in Figure 3-1. The IQ plane is characterized by a grid resolution N, which we define to be the number of non-zero values an or bn can take in a particular quadrant. We must choose N, and then find a mapping between a given phase φn and its corresponding coefficients, or IQ plane lattice point, which we call (an, bn). As a quick reminder, we have decided on using 32 phases per period, and we are aiming for an absolute maximum DNLrms of 12 milliperiods, which must cover quadrature phase error, coefficient quantization error, input and phase interpolator non-linearity, and Monte Carlo mismatch.

Intuitively, as we increase N, we are able to hit lattice points closer and closer, both in amplitude and phase, to the ideal amplitude 1 and ideal phase value φn. However, increasing the granularity N also implies a larger number of DAC unit current sources, and therefore an increase in area and power of the interpolator stage.

Figure 3-2: Binary scaled interpolator using linear approximation. Because of the lack of calibration in our final circuit, we opted for a thermometer DAC current array instead. Figure from [19].

A simple IQ point selection, used for example in [17] [18], is a linear coefficient selection. The circuitry for such a design is shown in Figure 3-2. This means that instead of having $b_n \approx \pm\sqrt{1 - a_n^2}$, we set $b_n = 1 - a_n$. This strategy produces a diamond shaped point selection on the IQ plane, as shown in Figure 3-3. The advantage of this linear point selection is the relative simplicity of generating bn from an. Using a binary array, no decoders are even needed, as seen in Figure 3-2. The disadvantage is that the coefficient selection inherently introduces additional phase DNL. In the previous work in [17], a digital predistortion (DPD) method was used to compensate for the nonlinearity. In [18], the interpolator was used in a CDR circuit where the monotonicity, not the linearity, of the interpolator was crucial.
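The phase error inherent to the linear selection can be quantified with a small sketch. It assumes 8 equally spaced codes per quadrant (32 phases per period) and ideal quadrature inputs, and reports the step error in LSB along with the amplitude variation along the diamond.

```python
import math

def linear_codes(steps_per_quadrant):
    """First-quadrant (a, b) pairs for the linear selection b = 1 - a."""
    return [(1 - n / steps_per_quadrant, n / steps_per_quadrant)
            for n in range(steps_per_quadrant + 1)]

STEPS = 8                                     # 32 phases per period, 8 per quadrant
ideal_step = 90.0 / STEPS                     # degrees per LSB
codes = linear_codes(STEPS)
phases = [math.degrees(math.atan2(b, a)) for a, b in codes]
amps = [math.hypot(a, b) for a, b in codes]
dnl_lsb = [(phases[n + 1] - phases[n]) / ideal_step - 1 for n in range(STEPS)]

print("DNL (LSB):", [round(d, 3) for d in dnl_lsb])
print(f"amplitude range: {min(amps):.3f} to {max(amps):.3f}")
```

The steps are compressed near the axes and stretched near 45 degrees, and the amplitude dips to 1/√2 at mid-quadrant, which illustrates both the systematic DNL and the amplitude modulation of the diamond constellation.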

Another factor that led us to choose an IQ point selection other than linear interpolation is the concern for amplitude modulation. The amplitude modulation using the linear approximation is independent of the step size chosen in the IQ plane. Since the interpolator output drives a highly saturated amplifier, the final amplitude should be rail-to-rail and independent of the code. However, small amplitude differences can have a significant effect on the overall delay. Because we target a phase step of only 6.25ps at 5GHz, even small delay differences will lead to amplitude-to-phase conversion on the order of the step size. Furthermore, this amplitude modulation is systematic: it occurs every period. The spread-spectrum system will therefore exhibit spurious modulation that is inside the clock frequency band, which can lead to unexpected spectral oscillation and degradation of the attenuation specification.

Figure 3-3: Constellation diagram for interpolation using the linear approximation and using correct trigonometric values.

In Figure 3-4, we graph the effect of IQ grid resolution on DNLrms. Setting an amplitude modulation criterion of less than 10%, we then select an IQ grid granularity so that DNLrms is a small fraction of the LSB phase step. Let us call N the resolution of the IQ grid in any given quadrant (so a quadrant will have $(N + 1)^2$ points). We look for an N such that DNLrms < $\frac{1}{5}$ LSB, a slightly conservative choice. N = 16 handily satisfies our restriction, and is our final choice.
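The tradeoff between grid resolution and phase accuracy can be explored with the sketch below. It is not the procedure used to produce Figure 3-4: it assumes the simplest possible mapping, rounding the ideal cosine and sine weights to the nearest lattice point, and it reports a conventional root-mean-square of the step error in LSB (Equation 2.28 normalizes differently).

```python
import math

def grid_codes(q, n_grid):
    """Round ideal (cos, sin) weights to an n_grid-level lattice (an assumed mapping)."""
    codes = []
    for n in range(q):
        phi = 2 * math.pi * n / q
        codes.append((round(n_grid * math.cos(phi)), round(n_grid * math.sin(phi))))
    return codes

def dnl_rms_lsb(q, n_grid):
    """Conventional RMS of the phase step error, in LSB (1 LSB = 1/q period)."""
    codes = grid_codes(q, n_grid)
    phases = [math.atan2(b, a) % (2 * math.pi) for a, b in codes]
    lsb = 2 * math.pi / q
    steps = [((phases[(n + 1) % q] - phases[n]) % (2 * math.pi)) / lsb for n in range(q)]
    return math.sqrt(sum((s - 1) ** 2 for s in steps) / q)

for n_grid in (4, 8, 16, 32):
    amps = [math.hypot(a, b) / n_grid for a, b in grid_codes(32, n_grid)]
    print(f"N = {n_grid:2d}: DNL_rms = {dnl_rms_lsb(32, n_grid):.3f} LSB, "
          f"amplitude {min(amps):.3f} to {max(amps):.3f}")
```

Even with this naive rounding, a 16-level grid keeps the step error below one fifth of an LSB and the amplitude within a few percent of unity, consistent with the choice made in the text.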


Figure 3-4: Finding a reasonable grid quantization level.

3.2.2 Schematic Design

Interpolators with separate I and Q coefficient control can be implemented in several ways, but they mainly revolve around a single differential pair, with switches either in the tail current or in the current path to steer a digitally controlled amount of current into the resistors. Interpolators called type-I have the current steering switch below the differential pair inputs, while interpolators called type-II have the current steering switch above the differential pair inputs.

A simple implementation of a type-I interpolator has 64 cells, 16 cells for each quadrant. By putting the switch below the tail bias transistor, we isolate the switch from the output. An example of this circuit is found in [20] or [21], although we reproduce it for our specific implementation in Figure 3-5. Each cell is biased with the same tail current, which is $I_{SS} = 150\mu A$ in our application. The cells are therefore called “thermometer weighted” (as opposed to binary-weighted). Let us call the thermometer values of select_sinp, select_sinn, select_cosp, select_cosn, select_cm $x_1, x_2, x_3, x_4, x_5$ respectively (the thermometer code labels are shown in the schematic of Figure 3-5). Using this terminology, the thermometer values $x_1 \ldots x_4$ take on integer values from 0 to $N$, while $x_5$ can take integer values up to the number of common-mode cells we choose to provide.

Figure 3-5: Simple Type-I Phase interpolator Example (this is our final design choice). There are 16 copies of each differential pair, and 8 common mode differential pairs, so there is a total of 72 differential pair cells.

Then, the output differential voltage is proportional to the differential currents (called $I_{1d}, I_{2d}, I_{3d}, I_{4d}$) going through the positive and negative resistors:

$$V_{od} = R \times ((I_{1d} - I_{3d}) + (I_{2d} - I_{4d})) \tag{3.2}$$

$$V_{od} = I_{SS} R \times ((x_1 - x_3)\sin(\omega t) + (x_2 - x_4)\cos(\omega t)) \tag{3.3}$$

The expression above allows us to identify the values $\frac{x_1 - x_3}{N} = b_n$ and $\frac{x_2 - x_4}{N} = a_n$, where the $(a_n, b_n)$ pair represents a point in the IQ plane (Figure 3-1). By selecting appropriate codes $x_1, \ldots, x_4$, we can target all the points in the 4 quadrants of the IQ plane, since $a_n$ and $b_n$ can sweep from $-1$ to $1$. The output common mode $V_{ocm}$ can also be expressed similarly:

$$V_{ocm} = V_{dd} - \frac{I_0}{2} \times (x_1 + x_2 + x_3 + x_4 + x_5) \tag{3.4}$$

For $0 \le x_1, x_2, x_3, x_4 \le N$, we can reach at most $(2N+1)^2$ grid points in the IQ plane. Looking at Equation 3.4, we see there is no need for common mode cells ($x_5$ can be set to 0 for all codes) if we set $x_1 + x_3 = N$ and $x_2 + x_4 = N$ and use the difference $x_1 - x_3$ or $x_2 - x_4$ to obtain arbitrary points in the IQ plane. Doing so, however, increases the lattice spacing between reachable points in the IQ plane by a factor of 2, which is why we did not choose this scheme (although it is implemented in [22]). A circuit that would implement this switching pattern is shown in Figure 3-6.

Setting $x_1 = 0$ in quadrants II and III, $x_2 = 0$ in quadrants III and IV, $x_3 = 0$ in quadrants I and IV, and $x_4 = 0$ in quadrants I and II allows us to hit the native resolution of $(2N+1)^2$ points. The common mode will move roughly by a factor of $\sqrt{2}$, which means the number of common mode cells must be at least $N(\sqrt{2}-1)$; that is, $x_5$ must take at least $(\sqrt{2}-1) \times N$ values. This is our final switching pattern (the circuit implementing this switching pattern is shown in Figure 3-5). From this native interpolator topology, many other interpolators can be built, each with different tradeoffs and gains.
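The two switching patterns can be compared by brute-force enumeration. The sketch below is a hedged illustration in Python: it counts the distinct $(x_2 - x_4, x_1 - x_3)$ pairs reachable when all codes are free, and when the constant-sum constraint $x_1 + x_3 = N$, $x_2 + x_4 = N$ is imposed. The function names are hypothetical, and the common-mode cell count simply evaluates the $N(\sqrt{2}-1)$ bound quoted in the text.

```python
import math


def reachable_points(n, constant_sum):
    """Return the set of (x2 - x4, x1 - x3) pairs reachable on the IQ grid."""
    points = set()
    for x1 in range(n + 1):
        # With the constant-sum pattern, x3 is fully determined by x1.
        x3_values = [n - x1] if constant_sum else range(n + 1)
        for x3 in x3_values:
            for x2 in range(n + 1):
                x4_values = [n - x2] if constant_sum else range(n + 1)
                for x4 in x4_values:
                    points.add((x2 - x4, x1 - x3))
    return points


def min_common_mode_cells(n):
    """Smallest integer number of cells satisfying the N(sqrt(2) - 1) bound."""
    return math.ceil(n * (math.sqrt(2) - 1))


if __name__ == "__main__":
    n = 16
    native = reachable_points(n, constant_sum=False)
    constrained = reachable_points(n, constant_sum=True)
    print(len(native), len(constrained), min_common_mode_cells(n))
```

For $N = 16$ the unconstrained pattern reaches $(2N+1)^2 = 1089$ points, while the constant-sum pattern reaches only $(N+1)^2 = 289$ because every coordinate moves in steps of 2. The bound on common mode cells evaluates to 7, which is consistent with the 8 cells provided in Figure 3-5.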

Implementing switches above the inputs allows us to steer the current either positive or negative, and to reduce the number of cells by a factor of 2 by getting rid of all negative cells (Figure 3-7). This type of circuit is called a type II phase interpolator. The area requirement is halved; however, the capacitive loading on the output resistors is still the same. Furthermore, the switches here do not have enough overdrive to operate consistently in triode, especially in a small headroom of 1.2V. If they operate as common-gate amplifiers, the swing has to be reduced below a $V_T$ of the devices. Using those cascode switches increases the linearity of the differential pair, since there is very little channel length modulation on the drain of the input pairs. We can use peaking inductive loads to counter both bandwidth reduction and swing limitation because we are allowed to swing the output above the rail [22]. The inductive loads also make the common mode adjustment cells unnecessary.

Figure 3-6: We can get rid of the common mode control cells by doubling the number of cosine and sine cells for the same resolution (the select signals are now 32-bit wide thermometer codes instead of 16-bit wide in this circuit).

Figure 3-7: Type II phase interpolator. Inductors allow a larger headroom, so that we have both enough swing and are able to operate the switches as common-gate amplifiers.

Despite the attractiveness of the previous circuit topology, some tradeoffs led us not to use it. First, it is a narrowband solution. Unless we have a good way to calibrate the resonant frequency (for instance with a switchable capacitor bank), it is hard to implement the interpolator for both 2.5GHz and 5GHz operation. Second, the inductors impose an unacceptable area hit on the block.

3.2.3 Phase-Interpolator Non-idealities

Feedforward effect and Subthreshold Conduction :

In phase interpolators such as the one used by [17], the phase output at the quadrant boundaries is rarely the correct value. This problem extends to all interpolators that remove the negative cells and use a sign-select switch instead. The first reference to this phenomenon is Sidiropoulos’s work on phase interpolation for CDR circuits [23] [24]. The reason for this phenomenon is the feedforward current and subthreshold conduction of the differential pair tail current source. When, for example, all the sine cells are turned off, we expect an output in phase with cosine. However, the input from the sine cells still couples to the output via the $C_{gd}$ capacitors (feedforward), and the cosine cells still have an exponentially suppressed current of the form $I_0 \exp\left(-\frac{q V_T}{n k T}\right)$ (subthreshold conduction), where $n$ is typically around 1.5 to 1.6. Subthreshold conduction could be countered by using HVT devices for the tail current source, but this would complicate the cascode layout. We can easily see that feedforward and subthreshold conduction cause opposite shifts in the I-Q plane, but when they do not cancel perfectly, we have a distorted I-Q plane quadrant (Figure 3-8). Techniques to counter this effect involve either carrier pre-rotation or additional dummy switches to cancel feedforward currents, which adds to the complexity of the design [19].

This problem is not encountered in the simple type-I phase interpolator implemented in this thesis. In the type-I case, for every cosine cell that contributes a certain feedforward current, a corresponding negative cell contributes the negative of that feedforward current, and the net feedforward current is zero (Figure 3-9). The same argument applies to the subthreshold conduction of the tail current source. Of course, the currents only cancel when careful layout matches the interconnect parasitics of the negative and positive cells. A picture of the layout of the interpolator array is shown in Figure 3-11. Especially in a 65nm CMOS process, a large part of the $C_{gd}$ feedforward capacitor is the input to output metal rail capacitance, not the overlap capacitance of the transistor itself.

Figure 3-8: Effect of subthreshold conduction and feedforward in the IQ plane.

Figure 3-9: Feedforward Effect in type I interpolator.

Non-linearity :

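A differential pair is not a linear amplifier. If the differential pair is kept saturated, the output differential current $I_d$ as a function of the input differential voltage $V_d$ and the tail current source $I_{SS}$ is [25]:

$$I_d = \frac{k}{2} V_d \sqrt{\frac{4 I_{SS}}{k} - V_d^2} \tag{3.5}$$

$$|V_d| \le \sqrt{\frac{2 I_{SS}}{k}} \tag{3.6}$$

$$k \equiv \mu_n C_{ox} \frac{W}{L} \tag{3.7}$$

$$\text{else, } I_d = I_{SS} \times \mathrm{sign}(V_d) \tag{3.8}$$

In a more illuminating fashion, we rewrite the differential current (which we plot in Figure 3-10):

$$\frac{I_d}{I_{SS}} = \frac{1}{2} x \sqrt{4 - x^2} \quad \text{if } |x| \le \sqrt{2} \tag{3.9}$$

$$x \equiv \frac{V_d}{V_{ov}} \tag{3.10}$$

$$V_{ov} = \sqrt{\frac{I_{SS}}{k}} \tag{3.11}$$

$$\text{else, } \frac{I_d}{I_{SS}} = \mathrm{sign}(x) \tag{3.12}$$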
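The normalized transfer curve of Equations 3.9 to 3.12 is easy to evaluate directly. The short Python sketch below is an illustration only (the function names are hypothetical); it returns $I_d / I_{SS}$ for a given $x = V_d / V_{ov}$ and reports how far the square-law curve departs from the ideal small-signal line $I_d / I_{SS} = x$.

```python
import math


def diffpair_current(x):
    """Normalized differential current I_d / I_SS for x = V_d / V_ov (square-law model)."""
    if abs(x) <= math.sqrt(2):
        return 0.5 * x * math.sqrt(4 - x * x)
    # Beyond |x| = sqrt(2) the tail current is fully steered to one side.
    return math.copysign(1.0, x)


def linearity_error(x):
    """Relative deviation from the small-signal line I_d / I_SS = x (x must be nonzero)."""
    return (x - diffpair_current(x)) / x


if __name__ == "__main__":
    for x in (0.1, 0.25, 0.5, 1.0, 1.5, 2.0):
        print(x, round(diffpair_current(x), 4))
```

The deviation from the straight line grows with $|x|$, which is why a larger overdrive (a smaller $x$ for the same input swing) gives a more linear curve in this model.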

Note here that $V_{ov}$ is just defined as the overdrive voltage of the input pairs when the differential voltage value is 0. We can quickly see that the differential pair is very linear over the region $x \in (-1, 1)$. This means increasing the overdrive of the differential pair is one way to obtain a more linear transfer function curve. The understanding that driving the transistors by too many overdrives in amplitude will cause significant harmonic distortion is, however, an over-estimate of the nonlinearity of the differential pair. This is because the input pairs are short-channel devices, and therefore suffer noticeable effects of velocity saturation. We can obtain some intuition behind those effects by considering a model of velocity saturation in [26]:

$$I_d = k_n \frac{W}{L} \left(V_{GS} - V_T - \frac{V_{DSATn}}{2}\right) V_{DSATn} \tag{3.13}$$

This approximation shows a linear relation between $V_{GS} - V_T$ and $I_d$, and therefore the square law analysis of the MOS device overestimates the non-linear contribution under velocity saturation conditions. In the actual design, the calibration sets the input amplitude to about 250mV, which is several overdrives, yet the interpolator has very good phase linearity.

Figure 3-10: Plot of output differential current vs input differential voltage.

3.2.4 Layout and Sizing

The tail bias current source is chosen to be a long channel transistor in order to decrease channel-length modulation and mismatch effects. The layout of the interpolator cell is done so as to minimize DNL. Essentially, each time a new current source switches on, we want its current contribution to be as similar as possible to that of the previous current source switching on. This explains why all the cosine cells and the sine cells have been placed together. Furthermore, great care is taken to provide identical $V_{gs}$ to all the cells, since the current mismatch is $g_m \Delta V_{gs}$. This explains the larger metal buses, which ensure little IR drop at the negative supply between the cells. In addition, because the interpolator array is a high frequency signal path, we want to be able to satisfy metal fill without adding dummy metals inside the array itself; the metal buses serve that purpose as well. The unit cell diffpair is slim (about 2.1µm with dummies) so that the overall array is not too long; otherwise the difference in resistive path to a given cell would cause systematic mismatch. 3µm worth of dummies are placed at the edges of the interpolator array to prevent Shallow Trench Isolation (STI) effects on carrier mobility [27].

Figure 3-11: Layout of the Interpolator Array

3.3 Signal Conditioning

We call signal conditioning anything in the high-frequency signal path that is not the

interpolator core.


3.3.1 Filter

Harmonic Purity Requirement :

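The filter’s goal is to present a small amplitude sinusoid at the input of the interpolator. Let us consider 2 quadrature harmonically impure signals $f_1$, $f_2$ with odd frequency components:

$$f_1 = \sum_n a_n \cos(n\omega t) \tag{3.14}$$

$$f_2 = \sum_n a_n \cos\left(n\left(\omega t - \frac{\pi}{2}\right)\right) \tag{3.15}$$

$$f_{out} = \cos(\phi) f_1 + \sin(\phi) f_2 \tag{3.16}$$

Without the higher harmonics, the crossing point would be around $t' = \frac{\phi}{\omega} + \frac{\pi}{2\omega}$. With the higher harmonics, this crossing point shifts by $\Delta t$, causing some phase error:

$$f_{out} = 0 \rightarrow a_1 \cos(\omega t' - \phi) = -\sum_{n \ge 3} a_n \cos(\omega t' - \phi) \tag{3.17}$$

$$a_1 \cos\left(\frac{\pi}{2} + \omega \Delta t\right) = a_1 \sin(\omega \Delta t) = -\sum_{n \ge 3} \pm a_n \sin((n-1)\phi + n\omega\Delta t) \tag{3.18}$$

We are interested in the case where $\phi \ne 0$ and $\phi \ne \frac{\pi}{2}$, because for $\phi = 0$ or $\phi = \frac{\pi}{2}$ we know $\Delta t = 0$. If we carry out a Taylor expansion of the left and the right hand side for small $\Delta t$, we have:

$$\Delta t = \frac{\sum_{n \ge 3} \pm a_n \sin((n-1)\phi)}{\omega\left(a_1 - \sum_{n \ge 3} \pm a_n \cos((n-1)\phi)\right)} \tag{3.19}$$

We can approximate the phase error as

$$\Delta\phi \approx 2\pi \frac{a_3}{a_1} \tag{3.20}$$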

A reasonable requirement is $\frac{a_3}{a_1} < \frac{1}{100}$. Now the clocks coming into our SSCG are already slew limited square waves, which have $\frac{a_3}{a_1} < \frac{1}{3}$. Suppose we put such a clock through a 40dB per decade attenuation filter. Then, we have $\frac{a_3}{a_1} < \frac{1}{27}$. The phase interpolator is another approximately first order filter at frequencies of interest, giving another factor of three (see Figure 3-12). At the output of the phase interpolator, we have $\frac{a_3}{a_1} < \frac{1}{81}$. Further attenuation of the higher harmonics will occur because of slew-rate limiting effects of the input square wave. A second order filter should therefore suffice.
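The harmonic bookkeeping in this paragraph can be reproduced with exact fractions. The Python sketch below is a simplified illustration, assuming an ideal square wave (harmonic $n$ has relative amplitude $1/n$) and an asymptotic roll-off in which each filter order attenuates harmonic $n$ by a further factor of $n$ relative to the fundamental. The function name is hypothetical.

```python
from fractions import Fraction


def harmonic_ratio(n, total_order):
    """Relative amplitude a_n / a_1 of odd harmonic n after an all-pole roll-off.

    Assumes an ideal square wave (a_n / a_1 = 1/n) and that the fundamental
    already sits on the asymptotic slope of 20 * total_order dB per decade.
    """
    if n % 2 == 0:
        raise ValueError("a square wave only has odd harmonics")
    return Fraction(1, n) * Fraction(1, n ** total_order)


if __name__ == "__main__":
    print(harmonic_ratio(3, 0))  # unfiltered square wave
    print(harmonic_ratio(3, 2))  # after the 2nd order filter
    print(harmonic_ratio(3, 3))  # after the filter and the 1st order interpolator stage
```

With these assumptions the third harmonic goes from 1/3 to 1/27 after the second order filter and to 1/81 after the interpolator stage, matching the figures in the text.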

[Figure 3-12 panels: "Square Wave Fourier Coefficients" and "Filtered Square Wave Fourier Coefficients" for the 1st, 3rd and 5th harmonics, before and after the 2nd order filter and 1st order output stage (-60dB/dec total).]

Figure 3-12: Cartoon picture of 2nd order filter effect on Fourier coefficients.

Circuit and Layout :

The sineshaper circuit is a modified version of Jeremy Walker’s design at Analog Devices. Because of the high frequency requirement, it is not feasible to use a feedback topology for the filter (using an op-amp to make an integrator, for example). Instead, we use CMOS inverters, which are digitally switchable (for coarse grain bandwidth control), and whose supply is calibrated (for fine grain bandwidth control). The schematic is shown in Figure 3-13. The first stage is overloaded by 4 identical inverters. 2 of the output stage inverters are switched on at 2.5GHz, and all 4 are switched on at 5GHz. Each stage provides roughly a 1st order roll-off. However, because the Miller capacitance is significant for the second stage, a feedforward zero decreases the roll-off. We therefore use zero-cancellation capacitors to improve the roll-off. AC coupling is done using MOM capacitors, because the non-linear response of MOS capacitors causes significant harmonic distortion. The AC coupling capacitors are oversized for 5GHz operation, but this size is needed for the lower frequency 2.5GHz operation. The MOM capacitors also allow us to satisfy metal fill around the sensitive high frequency signal path without needing to introduce dummy metals (Figure 3-14).

Figure 3-13: Filter Schematic. This schematic includes one of the 2 differential paths taken. The feedforward cancelling capacitor is included only for the 2nd stage, because the gate to drain capacitance of the first stage is insignificant.

Figure 3-14: Layout of the SineShaper. This has 4 inverter chains, for the 2 differential signal paths, one for cosine and one for sine.

Biasing of the Interpolator Inputs :

The tail bias point is set to be $V_{Tn} + V_{ov}$, while the input pair common mode bias is set to be $V_{Tn} + \sqrt{2} V_{ov}$. The input common mode can be set this low because the tail current source is velocity-saturated, and has a lower $V_{DSATn}$ than the overdrive value. A simulation shows that the tail current source and input pairs stay in saturation across corners (Figure 3-15).

Figure 3-15: Plot showing the margin above saturation for the tail current source and differential pair inputs. For PVT corner numbering reference, see Appendix C.

Evaluation :

To evaluate the performance of the filter, we input a slew-limited square wave (20ps rail-to-rail rise and fall time) and AC couple this signal into the filter. The simulations are done with typical layout parasitics at the nominal corner (27 degrees C, nominal devices). We see that a second order roll-off between the fundamental and the first harmonic is achieved for both 2.5GHz and 5GHz typical operation (Figure 3-16).

Figure 3-16: AC response for the filter, configured for a typical corner at 5GHz and a typical corner at 2.5GHz.

3.3.2 Restoration

The restoration circuit takes in the attenuated interpolated output, and gains it to

a rail-to-rail signal. To level shift the output, we AC couple the signal and self-bias

inverters similar to the sineshaper. To increase bandwidth, power down is done in

the feedback path instead of the forward signal path (Figure 3-17).

The main tradeoff in using this topology is between having a fast feedback system and stability. The feedback path has to restore the bias point quickly, because code switches introduce common mode transients at the output of the differential pair. Those glitches happen because of timing mismatch between common mode and interpolating cells, current bias mismatch between those cells, and charge injection that disturbs the common mode of the interpolator every time a cell switches on. These sudden voltage steps are high-pass filtered through the AC coupling capacitor $C_{AC}$ and disturb the bias point of the inverter, which has to be restored quickly by the feedback circuit. The fastest code sweep of the restoration is 312MHz, which means the loop bandwidth should be faster than 312MHz. On the other hand, the slower the $R_F C_{AC}$ time constant is, the slower the dominant pole of this loop is and the more stable the loop is.

Figure 3-17: Restoration circuit and layout

We intentionally sacrificed stability in the small signal sense to allow fast common mode settling. This is because the circuit is not meant to operate in small signal, where the gain of the circuit is very large, but is meant to operate with the output going rail-to-rail. At the worst corner, the input will swing 250mV peak-to-peak after calibration, and the output is guaranteed to rail. This implies that requiring small-signal stability is a conservative demand, because the nonlinearity of the inverters will clamp the gain down. Instead, we check stability of the loop using transient simulations of a sinusoidal input of 250mV peak to peak, with sharp transients modelling the discrete phase shift. We note that layout parasitics will further attenuate phase oscillations, because both the digital switching and the finite bandwidth of the interpolator stage will low pass filter those sharp transients. A plot comparing the phase oscillations with a behavioral interpolator and with a real phase interpolator circuit (with layout parasitics) is shown in Figure 3-19. We can still calculate the phase margin across corners, as is done in Figure 3-18, with a 15fF capacitive loading (5fF for layout self loading, and 10fF for output load capacitance).

Figure 3-18: Restoration Phase Margin. Although the phase margin is negative, the stability of the circuit is not compromised because the inverters operate non-linearly and the effective loop gain is lower than what is expected from small-signal analysis.

3.3.3 Clock Divider

The clock divider is a re-used circuit, shown in Figure 3-20. It takes a differential input “inn” and “inp” at 5GHz and provides quadrature outputs out1, out2, out3, out4 at 1.25GHz. It is important not to add too much load to the output, or the setup time of the latches will not be satisfied. In simulation, driving the clock divider with the restoration circuit at the worst corner, we are able to load up to 10fF of interconnect parasitic in addition to a minimum sized inverter.

Figure 3-19: Restoration output phase transient under phase code sweep.

Figure 3-20: Clock Divider Circuit.

3.4 Top-Level Floorplan

We present here the top-level floorplan (Figure 3-21). Because this chapter is mainly concerned with the performance of the analog high-frequency signal path, we will simulate only the 5GHz signal path with layout. The actual interpolator array is very compact, and in fact, the main area hit comes from the voltage regulator and peak detectors, which are reused blocks. A discussion of the digital blocks and routing is deferred to Chapter 4.

Figure 3-21: Top level Floor Plan and High Frequency Signal Path Layout. Note we added clock buffers at the output to drive the long wires out of the block.

3.5 Evaluating the High Frequency Signal Path

3.5.1 Evaluation Methodology

To evaluate the interpolator, we sweep phase codes, which are decoded using an ideal behavioral (Verilog) decoder (Figure 3-22). The phase codes are updated at 312MHz, the fastest phase update rate the design must support, so that we capture both dynamic and static phase errors. After letting the circuit settle for 100ns, we start measuring the time difference between the crosspoints of the 5GHz differential signals at the output of the restoration circuit.

Figure 3-22: Methodology for evaluating phase linearity.

In order to accurately simulate post-calibration regulator code values, we extracted the layout parasitics at the input of the interpolator, and simulated the filter with the additional capacitor at its output (simulation methodology illustrated in Figure 3-23). We then ran a 200ns transient simulation for each PVT corner (45 of them) and for every regulator output value (32 of them). The results of those $32 \times 45 = 1440$ simulations are displayed in Figure 3-24, where we graph the amplitude at the output of the filter swept across input codes for every skew corner. The output of the filter was fed into the peak detector circuit with a switching threshold code value of 5 (the threshold code can be from 0 to 16, but for good operation of the interpolator, a code value of 4-6 works best), and we picked the lowest code where the peak detector’s comparator would switch. These codes are stored and their corresponding values are used for any given PVT corner simulation (the methodology is schematized in Figure 3-23).

Figure 3-23: Simulation methodology for regulator code selection across corners.

3.5.2 Results

Phase Linearity :

We test the interpolator using the previous method with an input quadrature error of 5°, and record the corresponding values of $DNL_{rms}$ across corners. The circuit was simulated with layout parasitics and includes the whole high frequency signal path (excluding the divide-by-4 circuit). Both the $DNL_{rms}$ values and maximum DNL values are well within specifications (Figures 3-25 and 3-26). To estimate the effect of random mismatches, we also run 50 Monte Carlo trial phase sweeps of the high-frequency path and evaluate $DNL_{rms}$ for each case. A histogram of the resulting $DNL_{rms}$ values in Figure 3-27 shows that the additional non-linearity introduced by Monte Carlo mismatches is essentially insignificant (4.15 milliperiods of $DNL_{rms}$ is well within specification). The Monte Carlo simulation shares the 5° quadrature error handicap.

Power Consumption :

The power values were simulated with layout at 5GHz, at the worst power consumption corner (corner 37, with $V_{dd} = 1.26V$, fast transistors and temperature $= -40$C), and output loads of 50fF for each quadrature output at 1.25GHz.

Block                   Power (mW)
Interpolator Array      4.01
Filter                  3.38
Restoration             4.15
Divide-by-4             0.45
Output Clock Buffers    1.46
Total                   13.45

Table 3.1: Power consumption summary (clock path).

Figure 3-24: Plot of filter output amplitude as we sweep the regulator input code value (and therefore change the filter supply), for all 45 PVT corners.

Figure 3-25: DNLrms value across corners for high and low frequency modes, with 5° quadrature error.

Figure 3-26: We extract the largest DNL step across input code transitions, for each corner.

Figure 3-27: DNLrms histogram for 50 Monte Carlo simulations at 5GHz (worst case corner, corner 28).

Jitter :

Figure 3-28: Integrated Jitter plot at the output of the restoration circuit.

There are 2 main contributors to phase noise at the output: systematic frequency modulation and random jitter.

A Periodic Steady-State noise simulation was done in SpectreRF with a fixed input code value to obtain random jitter (shown in Figure 3-28). It is simulated with layout at the slowest corner with 5GHz inputs (corner 28, where $V_{dd} = 1.14V$, temperature $= 125$C and all devices are skewed slow). The total random jitter at the output of the restoration circuit is 528fs rms. This jitter figure is dominated by white thermal noise. It is uncorrelated with the jitter from the spread-spectrum modulation. The contribution of systematic jitter due to spread-spectrum depends on the mode of operation. For the case where $\phi_m = 64$ periods at 5GHz and $f_m = 38$kHz, there are a total of $2^{12}$ phase steps per modulation period $T_m$, each 6.25ps large. There are $2^{15}$ zero crossings of the 1.25GHz clock. This means that the systematic jitter contribution at the 1.25GHz output clock is (see footnote 1):

$$\Delta t_1^2 = (6.25\,\text{ps})^2 \times \frac{1}{2^{15}} \times 2^{12} = 4.88\,\text{ps}^2 \tag{3.21}$$

Given that the random jitter is uncorrelated, we therefore have a total output rms jitter of $\Delta t_{jitter} = \sqrt{\Delta t_1^2 + (528\,\text{fs})^2} = 2.32$ps.

Footnote 1: Note that this jitter figure is for a phase update rate of 312MS/s. As we increase the phase update rate, the jitter contribution from our phase modulation scheme increases as well.
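The jitter budget can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: it evaluates Equation 3.21 from the quoted step size, step count and zero-crossing count, then combines the result with the 528fs random jitter as a root-sum-square, which assumes the two contributions are uncorrelated as stated in the text. The constant and function names are hypothetical.

```python
import math

PHASE_STEP_PS = 6.25           # size of one phase step, from the text
STEPS_PER_PERIOD = 2 ** 12     # phase steps per modulation period
ZERO_CROSSINGS = 2 ** 15       # 1.25GHz zero crossings per modulation period
RANDOM_JITTER_PS = 0.528       # simulated random jitter (528fs rms)


def systematic_jitter_variance_ps2():
    """Mean-square systematic jitter per zero crossing, in ps^2 (Equation 3.21)."""
    return PHASE_STEP_PS ** 2 * STEPS_PER_PERIOD / ZERO_CROSSINGS


def total_rms_jitter_ps():
    """Root-sum-square of the systematic and random contributions, in ps."""
    return math.sqrt(systematic_jitter_variance_ps2() + RANDOM_JITTER_PS ** 2)


if __name__ == "__main__":
    print(systematic_jitter_variance_ps2())
    print(total_rms_jitter_ps())
```

With these inputs the systematic term is about 4.88 ps² and the root-sum-square comes out near 2.27ps, slightly below the quoted 2.32ps. The difference may come from rounding or from inputs not listed here.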

3.6 Summary

Following the architecture proposed at the end of chapter 2, we extract specifica-

tions for each block of the system in the analog signal path. We propose a simple

interpolator topology. The input filters and output restoration circuits are designed

and laid out around this core interpolator. Because of the architecture choice, both

the filtering requirements and interpolator complexity are greatly reduced, making

the circuit choice simple. We evaluate each block separately, and evaluate the full

high-frequency signal path across 45 skew corners, and with monte-carlo mismatches.

The high-frequency signal path satisfies power, jitter, speed and phase linearity spec-

ifications set in chapter 2 by a wide margin.


Chapter 4

Digital and Auxiliary Circuits

4.1 Introduction

This chapter is mainly concerned with the implementation details of the auxiliary

circuits and digital circuits.

The auxiliary circuits include:

• The Voltage Regulator : This is a reused LDO (Low-Dropout Regulator) block. The LDO takes a 5-bit input bus which digitally controls the output voltage level it regulates. This voltage level is supplied to the filter and is used to adjust the filter’s output swing.

• The Peak Detector : This is a reused block that has a peak detector and a

comparator. The peak detector converts its RF input to a DC level proportional

to the amplitude, and the comparator compares the amplitude value with a

digitally set reference voltage.

The digital circuits are:

• Waveform Generator: The Waveform Generator is a digital block whose inputs are the divide-by-16 clock and various input mode bits. It outputs a divide-by-64 clock to the calibration circuit and delivers the bit bus which encodes the phase value the interpolator should output.

• Decoders: The decoders take the bit outputs of the waveform generator and supply the thermometer codes to control the interpolator stages.

• Calibration FSM: The Calibration FSM uses the peak detector’s output to

set the regulator code value and the number of inverters switched on in the

input filters.

4.2 Regulator

The LDO uses feedback to set the output node at a voltage level fixed by the voltage

Vref and the value of IDAC (Figure 4-1). If the feedback system is stable, negative

feedback forces both terminals of the op-amp at the same level and the output is just:

Vout = Vref + IDAC ×R2 (4.1)

IDAC in turn is a binary weighted current array controlled by digital inputs. For an

input code of value n ∈ [0, 31], IDAC outputs 87.5µA+ n(6.25µA). Vref is nominally

set at 0.3V bias. R2 is set to 2.698kΩ. This means the regulator output for code n is

Vout = 536mV + 16.9mV × n (4.2)

There are two effects that are important from our perspective. The voltage reg-

ulator is rated at 8mA, and under that current load, it will need 80mV drop-out

between Vdd and Vout to keep transistor M1 in saturation. This implies that the max-

imum output voltage at low supply of 1.14V is 1.06V. This is the maximum supply

voltage the filter should rely on using. Note this is a conservative restriction because

the actual current load by the filter is usually around 2mA.
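The regulator code-to-voltage mapping and the drop-out limit can be cross-checked numerically. The Python sketch below is only an illustration built from the values quoted in this section ($V_{ref} = 0.3$V, $R_2 = 2.698$kΩ, a DAC current of 87.5µA plus 6.25µA per code, and an 80mV drop-out at the 1.14V low supply). The function and constant names are hypothetical.

```python
V_REF = 0.3              # volts, nominal reference
R2_OHMS = 2698.0         # ohms
I_DAC_OFFSET = 87.5e-6   # amps, DAC current at code 0
I_DAC_LSB = 6.25e-6      # amps per code


def regulator_vout(code):
    """Regulator output voltage (Equation 4.1) for a 5-bit input code."""
    if not 0 <= code <= 31:
        raise ValueError("regulator code must be in 0..31")
    return V_REF + (I_DAC_OFFSET + code * I_DAC_LSB) * R2_OHMS


def max_usable_code(vdd=1.14, dropout=0.080):
    """Largest code whose output still leaves the stated drop-out below vdd."""
    limit = vdd - dropout
    return max(code for code in range(32) if regulator_vout(code) <= limit)


if __name__ == "__main__":
    print(regulator_vout(0), regulator_vout(31), max_usable_code())
```

Code 0 gives about 536mV and each code adds about 16.9mV, in agreement with Equation 4.2. Under these numbers even code 31 (about 1.059V) stays just below the 1.06V limit at the 1.14V supply.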

Figure 4-1: Regulator Schematic.

4.3 Peak Detector

The peak detector is digitally controlled by a 4 bit input code to set the offset between

VREFCM and VREF (Figure 4-2). The offset is set by a resistive DAC. Transistor M1

and M2 are essentially followers. They force the capacitor voltage to a VGS higher

than the lowest voltage signal at the input of the input transistors. This signal is then

compared with VREF and the clocked comparator switches from high to low when the

amplitude is large enough. We calibrate only the Q input, not the I input, and load

the I input with a dummy replica peak-detector.

From our perspective, there are a few important parameters to extract from the

peak detector.

• Settling Time: The peak detector takes 100ns to settle within 0.1dB of its final value. Therefore the calibration FSM must wait at least 100ns after a regulator code is switched before it clocks the comparator to obtain the comparator output. In our case, we used a conservative 384ns between a regulator code switch and the clocking of the comparator.

• RF amplitude to DC voltage gain : The gain value is −0.25dB at the worst corner and −0.11dB typically.

• Digital Threshold offset selection: For input code value n, the offset set by the value of IREF, R1, R2, R3 is:

VCM − VCMREF = 88.9mV + n × 7.6mV

• Load : The peak detector represents a 10fF load at the worst corner.

Figure 4-2: Peak Detector Schematic.

From these numbers, we determine the mapping between the code value “n” and

the input differential amplitude A at which the comparator switches:

A = 181.0mV + 15.5mV × n (nominal corner) (4.3)

These will be the numbers used for top-level behavioral simulation.
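For behavioral simulation, the nominal mapping of Equation 4.3 can be wrapped in a small helper. The Python sketch below is a hypothetical illustration: it assumes the 4-bit threshold code spans 0 to 15 and models the comparator as tripping whenever the input differential amplitude is at or above the threshold for the selected code. The function names are not from the thesis.

```python
def threshold_amplitude_mv(code):
    """Nominal differential amplitude (mV) at which the comparator switches (Equation 4.3)."""
    if not 0 <= code <= 15:
        raise ValueError("a 4-bit threshold code is assumed, so code must be in 0..15")
    return 181.0 + 15.5 * code


def comparator_trips(amplitude_mv, code):
    """Behavioral model: True when the input amplitude reaches the selected threshold."""
    return amplitude_mv >= threshold_amplitude_mv(code)


if __name__ == "__main__":
    for code in (4, 5, 6):
        print(code, threshold_amplitude_mv(code))
```

A threshold code of 5 corresponds to 258.5mV in this model, which is in line with the roughly 250mV calibrated input amplitude mentioned in Chapter 3.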

Figure 4-3: Phase Deviation over time for 3 attenuation modes (16.6dB, 13.6dB, 10.6dB) at fm = 38kHz.

4.4 Waveform Generator

4.4.1 Basic Operation

The goal of the waveform generator is to create the bit-values representing a given target phase over a modulation period. Some examples of phase waveforms targeted at different values of attenuation are shown in Figure 4-3.

A simplified architecture implements an inner counter (called the frequency-tuning word) to keep track of the magnitude of the frequency deviation. Because the phase is the integral of the frequency, we increment an outer register (usually called the phase accumulator) with the instantaneous value stored in the frequency-tuning word (Figure 4-4). The phase accumulator therefore stores the phase value to be used at every clock cycle. A single combinational logic element can detect whether the frequency-tuning word is full or empty, and switch from increment to decrement or vice-versa. By switching to decrement for both the phase accumulator and the frequency-tuning word, one achieves a sawtooth frequency modulation. How fast the frequency-tuning word fills up then determines $f_m$. The value of the phase accumulator when the frequency-tuning word register is full determines the maximum phase deviation (and attenuation specification).

Figure 4-4: Waveform Generator block diagram.

Register Sizing Math:

The slowest spread-spectrum fm of 38kHz is roughly $2^{13}$ times slower than the digital clock update rate of 312MHz, which implies that the frequency tuning word must be at least 11 bits long (it takes $2^{12} - 1$ cycles to increment all the way up and $2^{12} - 1$ cycles to decrement all the way down to 0). The phase accumulator must therefore be at least 16 bits long (an additional 5 bits for the phase value). We want a total of 64 periods of phase deviation, and we must therefore pick an appropriate bit range on the phase accumulator going into the decoder (divide its value) to obtain the correct number of period deviations.

If we call the value of the frequency tuning word after the nth clock cycle $a_n$, and the value of the phase accumulator $b$, the maximum value of $b$ is:

$$\max(b) = \sum_{n=1}^{2^{12}-1} a_n = \frac{(2^{12}-1)}{2} \times 2^{12} \approx 2^{23} \qquad (4.4)$$


To obtain 64 periods of phase deviation, or $64 \times 2^5 = 2^{11}$ incremental phase steps, the value of a phase step must be $2^{23} / 2^{11} = 2^{12}$. Therefore, bit[11] on the phase accumulator must represent an LSB of phase step. It also implies the phase accumulator must be at least 17 bits wide, since a phase value is encoded using 5 bits.
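The sizing arithmetic can be checked numerically. The short script below only re-evaluates the quantities quoted in the text (312MHz, 38kHz, 5 phase bits, 64 periods); the variable names are illustrative.

```python
import math

F_DIG = 312e6      # digital clock update rate (Hz), from the text
F_M = 38e3         # slowest modulation frequency (Hz), from the text
PHASE_BITS = 5     # bits per phase value, i.e. 32 phase steps per clock period
PERIODS = 64       # target maximum deviation, in clock periods

cycles_log2 = math.log2(F_DIG / F_M)            # clock cycles per modulation period, as a power of 2
half_sweep = 2**12 - 1                          # cycles spent counting up
max_b = half_sweep * (half_sweep + 1) // 2      # sum of 1 .. (2^12 - 1)
phase_steps = PERIODS * 2**PHASE_BITS           # incremental phase steps needed
step_weight = 2**23 // phase_steps              # accumulator counts per phase step

print(f"log2(cycles per period) = {cycles_log2:.3f}")
print(f"max(b) = {max_b}, log2 = {math.log2(max_b):.4f}")
print(f"phase steps = {phase_steps}, counts per step = {step_weight}")
```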

4.4.2 Modes of Operation

The main operational mode of the spread-spectrum block is fm = 38kHz and 20dB

attenuation (19.6dB to be exact). Additional built-in modes were also implemented

for debugging purposes. In particular, we want independent control of the maxi-

mum phase shift, which mainly controls the value of attenuation, and the modulation

frequency, which could be used for different RBW standards of the spectrum analyzer.

Figure 4-5: Acceptable Modes of Operation.

Suppose we want to increase fm from 38kHz to 76kHz. All we need to do is

increment 2 LSB’s of the frequency tuning word at a time, and the frequency tuning

word register will fill up twice as fast. Because fm is determined by how fast the inner

register fills, doing this effectively increases fm by a factor of 2. However, a side effect of this trick is that it will decrease the maximum phase deviation by a factor of 2. The calculation below


shows that the phase accumulator reaches a maximum value that is half as large ($a_n$ represents the frequency tuning word, and $b$ the phase accumulator):

$$a_n = 2 \times n, \qquad \max(b) = \sum_{n=1}^{2^{11}-1} a_n = \sum_{n=1}^{2^{11}-1} 2 \times n \approx 2^{22} \qquad (4.5)$$

Recall that for $a_n = n$, the largest value $b$ took was $2^{23}$. The maximum phase deviation was therefore halved. This implies that at fixed maximum phase, if we increase fm by 2 (by increasing the steps taken by the frequency tuning word by 2), we also need to multiply by 2 the value of the output phase (right-shift the phase bit values by 1). From this observation, we generate all the modes of operation for 16, 8, 4, 2 × 1.25GHz period maximum deviation, and 38kHz, 76kHz, 152kHz, 304kHz, 608kHz, 1.216MHz values for fm. Many of these modes are anomalous: they have unacceptable phase skips. We can determine which modes violate the required maximum systematic cycle-to-cycle jitter of 1/32 of the input clock period and plot them below (Figure 4-5).

Indeed, we see in the plot that modes with both large attenuation and large fm violate the maximum phase skip requirement of 2 LSB's of phase. This is because the phase skip value scales linearly with both attenuation and fm. We therefore see that the boundary between acceptable and unacceptable modes forms a -3dB per octave slope, implying that there is an inverse relation between attenuation and fm at fixed maximum phase skip.
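As a rough illustration of why the acceptable modes trade deviation against fm, the sketch below estimates the largest phase step per digital clock cycle for each mode. It assumes an ideal parabolic phase ramp that reaches the maximum deviation after half a modulation period, takes Tdig = 3.2ns as quoted for Figure 4-12, and reads the 16, 8, 4, 2 × 1.25GHz-period deviations as 64, 32, 16, 8 periods of the 5GHz input clock. The limit is left as a parameter, since the text quotes both 1/32 of a period and 2 LSB's; the function names are hypothetical.

```python
T_DIG = 3.2e-9          # digital clock period (s)
LSB_PER_PERIOD = 32     # 5-bit phase value: 32 phase steps per input clock period


def max_phase_skip_lsb(max_dev_periods, fm_hz, t_dig=T_DIG):
    """Largest per-cycle phase step, in phase LSBs, for an ideal sawtooth FM.

    With phase = 4 * max_dev * (t / Tm)^2 on the rising half, the peak slope
    is 4 * max_dev * fm periods per second.
    """
    return 4.0 * max_dev_periods * fm_hz * t_dig * LSB_PER_PERIOD


def mode_table(limit_lsb=1.0):
    """Map (max deviation, fm) to True when the mode respects the phase-skip limit."""
    deviations = [64, 32, 16, 8]                  # input-clock periods
    rates = [38e3 * 2**i for i in range(6)]       # 38kHz ... 1.216MHz
    return {(d, f): max_phase_skip_lsb(d, f) <= limit_lsb
            for d in deviations for f in rates}


if __name__ == "__main__":
    for (dev, fm), ok in sorted(mode_table().items()):
        print(f"{dev:3d} periods, fm = {fm / 1e3:7.0f} kHz: {'ok' if ok else 'violates limit'}")
```

Under these assumptions the phase skip depends only on the product of maximum deviation and fm, which is the inverse relation described above.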

A layout of the digital block after digital synthesis is shown in Figure 4-6.

4.5 Decoders

The Decoders take in the phase value bits from the waveform generation block and

decode those bits into a thermometer bit pattern that controls the I and Q values of

the interpolator as well as the number of common mode correction cells turned on.

The decoders are clocked both at the input and the output, because there might be

significant timing skew between the 2 decoders in the way the bits are routed. As


Figure 4-6: Waveform Generator layout.

long as the clock is routed symmetrically, however, that timing skew is irrelevant.

Figure 4-7 shows the decoder layout.

Figure 4-7: Sine and Cosine Decoders Layout.


4.6 Calibration Finite State Machine

4.6.1 Operation

Figure 4-8: Calibration algorithm.

Because the SSCG is meant to operate at only 2 different frequency modes, the calibration FSM is particularly simple. For the 5GHz frequency, all the inverters have to be turned on, while for the 2.5GHz frequency, only half of the inverters in the 2nd stage have to be on, as determined in simulation. Figure 4-8 summarizes the calibration process, and Figure 4-9 shows the layout of the calibration block. Once the speed of operation is selected, the calibration FSM steps the codes of the regulator from 0 upward until the amplitude of the input signal to the interpolator is large enough to switch the peak detector output, and then sends out a signal flag that calibration is done. Note that once the flag is raised, the spread-spectrum waveform can start. However, we left the “start” signal for spread-spectrum independent from the “end” signal from calibration because the spread-spectrum waveform start has to synchronize with the FIFO buffer to make sure that the FIFO buffer will not


overflow (FIFO buffer shown in Figure 1-5). Figure 4-9 shows the calibration block

layout.

Figure 4-9: Calibration layout.
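The calibration procedure can be summarized as a short behavioral sketch. It is not the synthesized FSM: the function name, the number of inverters, the maximum regulator code and the `amplitude_mv` model are all hypothetical stand-ins for the analog signal path and peak detector.

```python
def calibrate(fast_mode, amplitude_mv, threshold_mv, max_code=31, n_inverters=8):
    """Step the regulator code upward until the peak detector trips.

    fast_mode     -- True for the 5GHz input, False for 2.5GHz
    amplitude_mv  -- hypothetical model: amplitude_mv(code, inverters_on) -> mV
    threshold_mv  -- amplitude at which the peak detector output switches
    """
    # All inverters on for 5GHz, half of them for 2.5GHz.
    inverters_on = n_inverters if fast_mode else n_inverters // 2
    for code in range(max_code + 1):
        if amplitude_mv(code, inverters_on) >= threshold_mv:
            # Peak detector output switched: raise the "calibration done" flag.
            return {"code": code, "inverters_on": inverters_on, "done": True}
    # Ran out of codes without reaching the threshold.
    return {"code": max_code, "inverters_on": inverters_on, "done": False}


if __name__ == "__main__":
    # Illustrative linear amplitude model (made-up coefficients).
    model = lambda code, inverters_on: 100 + 10 * code
    print(calibrate(True, model, threshold_mv=250))
```

In this sketch the "done" flag is returned to the caller rather than starting the spread-spectrum waveform directly, mirroring the choice to keep the two signals independent.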

4.7 Top-Level Floor-Plan and Clock Distribution

4.7.1 FloorPlan

A top level floor-plan layout is shown in Figure 4-10. The high frequency signal path

is sandwiched between the low frequency circuits and the clock routing.

4.7.2 Clock Distribution

It is of particular importance to route the clocks symmetrically (clock paths are shown in Figure 4-11). This is because the decoder outputs are clocked, which means that the bits that feed into the decoder can be skewed as long as the clocks which control when the decoder switches its output bits are not skewed. A numerical study


Figure 4-10: Top-Level Floor Plan, highlighting the new blocks. Synthesized digital circuits are highlighted in solid lines.


Figure 4-11: Phase value bits and clock routing paths.

of the effect of clock skew on the spectrum shows that clock skew has little effect on the spectral attenuation itself (Figure 4-12).

Figure 4-12: Effect of Clock Skew between the 2 decoders on Spectral Attenuation for a 5GHz input clock. The parameters for this simulation are φm = 64 × 2π, fm = 38kHz, Tdig = 3.2ns.


4.8 Power Summary

Block                 Power (mW)
Calibration FSM       1.167
Decoders              0.758
Waveform Generator    1.414
Total                 3.338

Table 4.1: Digital and auxiliary circuit power consumption.

4.9 Summary

We presented the blocks that perform filter bandwidth calibration (the LDO, peak detector and calibration FSM) and waveform generation (waveform generator and decoders). We characterized the LDO for future behavioral use, and calculated the register sizing for the waveform generator. Because the phase is digitally controlled to produce a sawtooth waveform, the waveform generation block is particularly simple, and allows us to implement many modes of operation with relatively little circuitry. We analyzed decoder timing skew effects, showing that a small skew has little effect on the attenuation specification. The power requirement of the digital blocks is a negligible 3.338mW, well within power budget specifications.


Chapter 5

Top-Level Simulation

5.1 Calibration

We plot here the simulation result for a typical calibration run (Figure 5-1). A

behavioral model of the phase interpolator and the peak detector was used. For the

phase-interpolator filter, we used a linearized model for the RF amplitude to regulator

code value relationship. The peak detector was set to trigger at a regulator output value of 0.9V, when the amplitude reached about 250mV single-ended peak-to-peak. This output voltage is a typical regulator value at slow corners. The system has been configured so that the calibration block sends out a flag signal to start spread-spectrum right away, although realistically, the spread-spectrum start signal and the calibration end flag can (and should) be set independently.

5.2 Spread-Spectrum Operation

A cross-PVT-corner simulation of spread-spectrum with layout parasitics is unrealistic. Instead, we are interested here in capturing phase errors across corners and characterizing their effect on the spectral attenuation specification.

In the previous chapter, we collected the phase output vs. code input relationship

of the phase interpolator, across corners, using a 312MHz code sweep in transient

simulation. This relationship has been recorded and encoded into a behavioral model


Figure 5-1: Demonstration of Calibration Algorithm at work with behavioral phase interpolator and peak detector.


Figure 5-2: Output Clock attenuation.

of the phase interpolator. We then use this behavioral model for cross-corner Spread-

Spectrum simulation. This simulation does not capture layout parasitics of the full

block, but captures all layout parasitics of the high frequency signal path (this includes

input filters and output restoration), as well as loading effects of the peak detector and

divide-by-4 circuit. We plot the attenuation obtained as a function of the corner value

(Figure 5-2). We observe that the nonlinearity in phase has some negligible impact on the undivided clock, but almost no impact beyond numerical truncation accuracy on the divided-down clock (Figure 5-3). The attenuation specification is measured using the standard metric presented in Chapter 2: by comparing the spectral height of the fundamental of an unspread and a spread output clock. The final attenuation figure is about 19.6dB for the divide-by-4 output clock (Figure 5-3).
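The attenuation metric can be illustrated with an idealized calculation. The sketch below assumes a perfectly linear, unquantized phase interpolator and an ideal sawtooth frequency modulation, and compares the tallest spectral line of the spread clock with the single line of the unspread clock. It is not the behavioral model used for the simulations above; the function names and the sample count are illustrative. For the divide-by-4 clock, 64 input periods of deviation correspond to 16 output periods.

```python
import cmath
import math


def phase_deviation(u, max_dev_periods):
    """Phase deviation (in clock periods) at normalized time u = t / Tm, 0 <= u < 1."""
    v = u if u < 0.5 else 1.0 - u
    return 4.0 * max_dev_periods * v * v      # parabola peaking at u = 0.5


def peak_attenuation_db(max_dev_periods, n_samples=1024):
    """Height of the tallest spread line relative to the unspread carrier, in dB."""
    envelope = [cmath.exp(2j * math.pi * phase_deviation(n / n_samples, max_dev_periods))
                for n in range(n_samples)]
    twiddle = [cmath.exp(-2j * math.pi * m / n_samples) for m in range(n_samples)]
    k_max = int(6 * max_dev_periods)          # occupied band is roughly +/- 4 * max_dev lines
    peak = 0.0
    for k in range(-k_max, k_max + 1):        # Fourier-series coefficient at f0 + k * fm
        coeff = sum(e * twiddle[(k * n) % n_samples]
                    for n, e in enumerate(envelope)) / n_samples
        peak = max(peak, abs(coeff))
    return -20.0 * math.log10(peak)


if __name__ == "__main__":
    print(f"16 periods: {peak_attenuation_db(16):.1f} dB")
    print(f" 8 periods: {peak_attenuation_db(8):.1f} dB")
```

For larger deviations, n_samples should be increased so that it stays several times larger than the number of occupied spectral lines.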

5.3 Full Circuit Simulation

A (schematic) transient simulation of all the circuits at transistor level is done to verify

functionality. The simulation is carried out at corner 28, which is the slowest corner. We


Figure 5-3: Divide-by-4 clock attenuation.

record transient signals in the high frequency signal path (Figure 5-4), calculate the

phase deviation during the code sweep (Figure 5-5), and graph the phase INL and

DNL across code values (Figure 5-6). We obtain a DNLrms of 6.6 milliperiods (a

phase LSB is about 32 milliperiods).
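For reference, phase DNL and INL can be extracted from a code sweep along the following lines. This is a generic sketch using a common definition (step error against an ideal LSB, with INL referenced to the first code); the exact definition used for Figure 5-6 is not stated here, and the sample data in the usage lines is made up.

```python
import math


def phase_linearity(phases, lsb):
    """Compute DNL, INL and RMS DNL from a phase-versus-code sweep.

    phases -- measured output phase per code, in clock periods
    lsb    -- ideal phase step, in clock periods (e.g. 1/32)
    """
    dnl = [(b - a) - lsb for a, b in zip(phases, phases[1:])]
    inl = [(p - phases[0]) - i * lsb for i, p in enumerate(phases)]
    dnl_rms = math.sqrt(sum(d * d for d in dnl) / len(dnl))
    return dnl, inl, dnl_rms


if __name__ == "__main__":
    measured = [0.0, 0.030, 0.0625, 0.095, 0.125]      # hypothetical sweep
    _, _, rms = phase_linearity(measured, lsb=1 / 32)
    print(f"DNL rms = {rms * 1e3:.2f} milliperiods")
```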

5.4 Simulation Results Summary

All the simulation results listed in Table 5.1 are the worst-case specifications obtained. The power figure was obtained at 5GHz clock input, with a code sweep rate of 312MS/s and with the peak detector powered down. The buffered output clocks were loaded with 50fF loads, which is most likely a conservative figure. The peak detector consumes 6.43mW if clocked at 78MHz. However, during calibration, the peak detector’s comparator is triggered only once every 32 clock cycles and the comparator consumes 2.77mW. Because the comparator is dynamic (consumes power only during clock switching), the expected power consumption of the peak detector is $6.43\,\mathrm{mW} - \frac{31}{32} \times 2.77\,\mathrm{mW} \approx 3.74\,\mathrm{mW}$. This is a slight underestimate because some bias


Figure 5-4: Differential signals in signal path for 5GHz operation.


Figure 5-5: 312MS/s phase code sweep for 5GHz operation.

Figure 5-6: Simulation results at 5GHz for linearity performance.


Specification                  Value                                 Actual Simulation Results
Power Budget                   50mW                                  15.42mW
Input                          quadrature clocks (5° error)          quadrature clocks (5° error)
Output                         divide-by-4 quadrature clocks         divide-by-4 quadrature clocks
Jitter                         Less than 5ps rms                     2.32ps (1)
Maximum cycle-to-cycle jitter  20ps                                  9.35ps (2)
Maximum Time Deviation         64 periods of input clock             64 periods of input clock
Area                           .4mm × .4mm                           .25mm × .35mm
Modulation Rate                38kHz or faster                       38kHz to 306kHz (3)
Modulation Waveform            Sawtooth                              Sawtooth
Input clock rates              fast mode (5GHz), slow mode (2.5GHz)  5GHz and 2.5GHz
EMI reduction                  20dB reduction                        19.6dB
Process node                   TSMC’s 65nm CMOS low-power            TSMC’s 65nm CMOS low-power

Table 5.1: Specifications summary.

lines do pull small currents, but they are much, much less than the currents pulled

by the comparator.
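The duty-cycle estimate for the peak detector can be reproduced directly from the quoted figures. The variable names below are illustrative; the evaluated result is close to the 3.74mW quoted in the text.

```python
P_CLOCKED_MW = 6.43      # peak detector power when clocked at 78MHz
P_COMPARATOR_MW = 2.77   # dynamic comparator share of that power
TRIGGER_INTERVAL = 32    # comparator fires once every 32 clock cycles during calibration

# Fraction of cycles in which the dynamic comparator does not switch.
idle_fraction = (TRIGGER_INTERVAL - 1) / TRIGGER_INTERVAL
p_expected_mw = P_CLOCKED_MW - idle_fraction * P_COMPARATOR_MW

print(f"expected peak detector power: {p_expected_mw:.3f} mW")
```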

5.5 Summary

In this chapter, we combine the behavioral knowledge from the extensive block level

simulation to verify the operation of the SSCG system both during start-up cali-

bration mode and during normal spread-spectrum clocking mode. We evaluate the

attenuation across corners, being careful to capture all the phase non-linearity values

across different corners in our behavioral models. We then simulate at transistor

level the whole system for a code sweep at nominal conditions, confirming that the

system is functional overall and that we are not being deceived by our behavioral

models. The specification summary shows that we meet all specifications within rea-

son. The attenuation specification of 19.6dB is a robust result dependent mostly on

the maximum phase deviation value selected. The value predicted during behavioral

simulations was 19.72dB, showing that the architecture allowed significant circuit

non-idealities without fatally degrading the attenuation specification.

(1) Assumes 19.6dB attenuation and fm = 38kHz.

(2) Assumes 19.6dB attenuation and fm = 38kHz.

(3) Assumes acceptable attenuation and fm combination. At fm = 306kHz, only the 10.6dB attenuation mode can be used.


Chapter 6

Conclusion

6.1 Summary

The goal of the work presented is to create an SSCG system for high speed digital

applications such as an RF-DAC. We showed analytic and numerical evidence for the

feasibility of using a phase-control system, and proposed a simple phase-modulator

as a solution. The circuit is designed and laid out in TSMC’s 65nm CMOS process,

and the final active area is .32mm × .25mm. Simulations demonstrate its operation

for both slow (2.5GHz) and fast mode (5GHz). The key to the success of the circuit

relies on a separation of frequency scales. The high speed clock can be modulated by

a relatively slower digital FSM, and still achieve high spread-spectrum attenuation

performance. Furthermore, the implementation of the phase modulator allows easily

reconfigured multiple modes of operation, and a tight control on the maximum phase

deviation. The latter is critical in applications where spread-clocked data needs to

be retimed with an unspread-clocked circuit (e.g. when interfacing the digital and

analog part of a DAC).

6.2 System Usage

The spread-spectrum clock generator provided here generates 4 divide-by-4 quadra-

ture clock outputs because this was the original specification. However, if we remove


the divide-by-4 circuit, we now have high-speed Spread-Spectrum differential clock

output (2.5GHz or 5GHz) that can be used to clock any type of digital system with a

tight frequency modulation specification. Indeed, the main mode of operation, with fm = 38kHz and maximum phase deviation, would achieve above 25.0dB of attenuation on the undivided clock spectrum. Furthermore, the frequency spread is only 20MHz, allowing this system to be compliant with SATA I-III narrow frequency spread standards.

6.3 Further Work

6.3.1 Optimizing the Interpolator

As a first iteration of this system, the design was very conservative, leaving some

aspects to be optimized. First, the circuit is designed to have a higher phase-linearity

than necessary. We could instead have relied on calibrating the decode values to

calibrate out the phase nonlinearity. This technique would allow using a much smaller phase interpolator based on the linear interpolation scheme briefly considered in Chapter 2.

It would also allow larger input quadrature errors, which could be calibrated later.

Another aspect of the optimization will be the regulator. The regulator is a major

area hit, because it was a pre-built general purpose block with a current rating of

8mA. For our application, we only need about 2mA of current drive.

One possible idea is to eliminate the regulator and the filter altogether, and instead

use replica-feedback biasing as an input stage to ensure roughly constant signal swing

at the input of the phase interpolator [28]. Essentially, the differential pair in this

circuit has a replica, whose bias point is set so that its output swings to VREF when

the input rails. This bias point is enforced using an op amp, and routed to the actual

cell. VREF is a reference voltage that depends on the frequency of the clock, and is

usually supplied by the output of a loop-filter of a PLL. The voltage Vbiasp on the

other hand, is set to make the PMOS operate in deep triode, to have roughly linear

loads. Obviously, this scheme requires that the system has an on-chip PLL that locks


Figure 6-1: Replica-Feedback biasing example. [Schematic labels: Vdd, Vref, Vb, Vbiasp, +sin, -sin, outp, outn, Op-Amp, Replica Cell, M1 to M4.]

onto the input clock. The presence of a PLL is quite likely, because it is a simple

way to generate quadrature input clocks required by the SSCG system.

6.3.2 Calibration for Arbitrary Input Frequencies

The calibration algorithm as it is written is designed for only 2 modes of operation,

2.5GHz or 5GHz input clock. The circuitry can operate at any frequency in between. Therefore, it is a simple matter of rewriting the algorithm so that the system can calibrate the input swing for frequencies in between. This involves sweeping both

the voltage supplied to the filters (fine-grain calibration) and the number of inverters

switched on (coarse-grain calibration), while comparing the signal amplitude with a

set reference.
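One possible shape for such an algorithm is a nested coarse/fine search, sketched below. The sweep order (fewest inverters first, then lowest regulator code), the function name and the `amplitude_mv` model are assumptions made for illustration; they are not part of the implemented calibration block.

```python
def calibrate_swing(amplitude_mv, reference_mv, inverter_settings, max_code):
    """Coarse/fine search for the first setting whose swing reaches the reference.

    amplitude_mv      -- hypothetical model: amplitude_mv(inverters_on, code) -> mV
    reference_mv      -- reference amplitude used by the comparison
    inverter_settings -- candidate inverter counts, in the order to try them
    max_code          -- largest regulator code
    """
    for inverters_on in inverter_settings:        # coarse-grain sweep
        for code in range(max_code + 1):          # fine-grain sweep
            if amplitude_mv(inverters_on, code) >= reference_mv:
                return inverters_on, code
    return None                                   # no setting reaches the reference


if __name__ == "__main__":
    # Made-up amplitude model: more inverters and higher codes give more swing.
    model = lambda inverters_on, code: 20 * inverters_on + 5 * code
    print(calibrate_swing(model, reference_mv=250, inverter_settings=[4, 8], max_code=31))
```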

6.3.3 Top-level clock Distribution

A simple clock distribution scheme can be implemented. In this scheme, the input

quadrature clocks are buffered into the interpolator, but also have divided down

versions used to clock the digital blocks. We provide both divide-by-16 clocks and

divide-by-8 clocks because the digital block can run up to 312MHz. If the inputs are

at 2.5GHz, the divide-by-8 clock can be used instead and fully exploit the speed of


Figure 6-2: Simple Clock Distribution Method

the digital block.


Appendix A

Effect of Spectrum Analyzer

A.1 Model of peak-hold mode Spectrum Analyzer

In the numerical treatments, we assumed the spectrum measured would be similar

to a numerical FFT. In general, this is not an accurate statement. Instead, what is most likely is that a spectrum analyzer operated in peak-hold mode will be used to measure the output spectrum of a DAC (Figure A-1). We will adhere to the notation

consistent with [15] and Chapter 2 of the thesis. The filter of the spectrum analyzer

has an LTI response of the form:

h(t, fc(t)) = h0(t) exp(2πjfc(t)t), (A.1)

where h0(t) is a bandpass filter impulse response function centered at DC with a

bandwidth set to be the RBW of the spectrum analyzer. The frequency-shifted

bandpass filter h(t, fc) is centered at fc. The signal Imk(t), the kth clock harmonic, is fed into this tunable filter, with fc being discretely stepped. Note that for a given RBW of the filter, the filter center can step only every $1/\mathrm{RBW}$ of time, by a frequency step of the order of RBW. A peak detector measures the amplitude of the filter’s output, called Ib(t, fc), which is then directly proportional to the value registered by the spectrum analyzer as the power-spectral amplitude at fc (called S(fc)).


Figure A-1: Peak-Hold mode operation of spectrum analyzer, Figure from [16].

A.2 Calculation of Measured Spectrum

Intuitively, when the filter’s center frequency matches with the signal frequency,

Ib(t, fc) will “resonate” and the amplitude will be larger. To obtain an expression

for the signal at the output of the filter from the spectrum analyzer, we convolve the

frequency modulated input with the filter response:

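$$I_{mk}(t') \equiv \frac{I_{0k}}{2} \exp\left(2\pi j f_0 \left(t' + \delta \int_0^{f_m t'} d\tau\, V(\tau)\right)\right) \qquad (A.2)$$

$$I_b(t, f_c) = \int dt'\, I_{mk}(t')\, h_0(t - t') \exp\left(2\pi j f_c (t - t')\right) \qquad (A.3)$$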

The expression for Ib(t, fc), the filter’s output, can be approximated by using the method of stationary phase. The method of stationary phase tells us that the main contributions to the integral come from the instants when the input instantaneous frequency matches fc, the filter center frequency. Around those times, we carry out a Taylor expansion of the phase to simplify Ib to the expression below:

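$$I_b(t, f_c) = \sum_{\alpha} \sum_{l} \frac{I_{0k}}{2} \sqrt{\frac{1}{jk\delta V'(t_{l\alpha})}}\, \Gamma(t_{l\alpha})\, h_0(t - t_{l\alpha}) \exp\left(2\pi j f_c (t - t_{l\alpha})\right) \qquad (A.4)$$

$$S(f_c) \propto \max |I_b(t, f_c)| \qquad (A.5)$$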

For a sensible analysis, we will explore two limits for a sawtooth-like waveform (where there is only α = 1): one where $\mathrm{RBW} \gg f_m$ and another where $\mathrm{RBW} \ll f_m$ (which corresponds to our application). For $\mathrm{RBW} \gg f_m$, the terms in the sum do not overlap, and we can approximate:

$$S(f_c) \approx \frac{\Gamma(t_l)\, \max|h_0|}{\sqrt{jk\delta V'(t_l)}} \quad \text{for any } l \text{ values} \qquad (A.6)$$

We observe that in this case, the attenuation scales 3dB per octave with the first

derivative of the instantaneous frequency, δV ′(tl). For a sawtooth modulation, this

value is clearly dependent on δ and fm. Furthermore, the attenuation is independent of the details of the filter shape; the only filter-related information is captured by max(|h0|).

In the second case, to facilitate the analysis, consider a rectangular shaped filter

in time domain (we’ve made the filter non-causal, but it doesn’t change significantly

the analysis), with:

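$$h_0(t) = \mathrm{RBW} \quad \text{for } |t| < \frac{1}{2\,\mathrm{RBW}} \qquad (A.7)$$

$$h_0(t) = 0 \quad \text{for } |t| > \frac{1}{2\,\mathrm{RBW}} \qquad (A.8)$$

$$S(f_c) \approx \frac{1}{T_m \times \mathrm{RBW}} \times \frac{I_{0k}}{2} \times \mathrm{RBW} \sqrt{\frac{1}{k f_0 \delta V'(t_l)}}\, |\Gamma(t_l)| \quad \text{for any } l \text{ values} \qquad (A.9)$$

$$= \frac{1}{T_m} \frac{I_{0k}}{2} \sqrt{\frac{1}{k f_0 \delta V'(t_l)}}\, |\Gamma(t_l)| \qquad (A.10)$$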

The reason why S(fc) scales with $1/T_m$ is that the number of stationary points that contribute to the integral is proportional to $1/T_m$. This relationship is true up to $T_m = 1/\mathrm{RBW}$. We now recover the 3dB scaling between attenuation and maximum phase $\delta f_0 V'(t_l) T_m^2$.
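The 3dB scaling can be made explicit with one extra step. The grouping symbol $\Phi$ below is introduced here only for illustration, and the step assumes that $|\Gamma(t_l)|$ and $I_{0k}$ do not change with the modulation settings. Squaring the expression in (A.10) gives:

```latex
S(f_c)^2 \;\propto\; \frac{1}{T_m^2}\,\frac{1}{k f_0 \delta V'(t_l)}
        \;=\; \frac{1}{k\,\Phi},
\qquad \Phi \equiv \delta f_0 V'(t_l)\, T_m^2 ,
\qquad
10\log_{10}\frac{S^2(\Phi)}{S^2(2\Phi)} = 10\log_{10} 2 \approx 3.01\ \mathrm{dB}
```

Under that assumption, each doubling of the maximum phase lowers the measured spectral height by about 3dB.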


A.3 Example 1: Gaussian Filter

In the previous example, we considered a time-domain rectangular shaped filter, which simplified the calculations but was not particularly realistic. Here, we consider a so-called Gaussian filter:

$$h_0(t) = \exp\left(-\frac{t^2}{\lambda^2}\right) \qquad (A.11)$$

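This filter’s impulse response has a characteristic time scale λ, and therefore the RBW of the filter is $1/\lambda$. For $f_m \gg \mathrm{RBW}$, we then have:

$$I_b(t, f_c) = \frac{I_{0k}}{2} \sum_{l} \sqrt{\frac{1}{jk\delta f_0 V'(t_l)}} \exp\left(-\frac{(t - t_l)^2}{\lambda^2} + j 2\pi f_c (t - t_l)\right) \qquad (A.12)$$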

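As we let the peak detector settle, the largest value Ib takes will be the sum of all the stationary points’ contributions, weighted by h0(t − tl). Therefore

$$|S(f_c)| \approx \left|\frac{1}{\sqrt{jkf_0\delta V'}}\, \frac{I_{0k}}{2}\right| \sum_{l=-\infty}^{\infty} \exp\left(-l^2 \left(\frac{T_m}{\lambda}\right)^2\right) \qquad (A.13)$$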

Once again, we see the 2 regimes of operation. If $T_m \gg \lambda$, then only the l = 0 term contributes in any significant way. This implies that the attenuation is $|S(f_c)| \approx \left|\frac{1}{\sqrt{jkf_0\delta V'}}\, \frac{I_{0k}}{2}\right|$, and the dependence on Tm drops out. In the case of $T_m \ll \lambda$, we have to account for the contributions of other values of l. If we bound the sum to $l \in (-10^4, 10^4)$, we will have a reasonable estimate for values of $T_m/\lambda \gg 10^{-4}$, and the sum is easily estimated numerically. The numerical estimation matches our intuition that in that regime, there should be a first-order roll-off relation between attenuation and Tm.
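The truncated sum in (A.13) is straightforward to evaluate. The sketch below uses the same bound on l as the text; the function name is illustrative. For small values of Tm/λ the sum approaches the standard Gaussian-sum limit of sqrt(π)·λ/Tm, which is the first-order roll-off mentioned above.

```python
import math


def stationary_point_sum(tm_over_lambda, l_max=10_000):
    """Evaluate sum over l in [-l_max, l_max] of exp(-l^2 * (Tm / lambda)^2)."""
    a = tm_over_lambda ** 2
    return sum(math.exp(-a * l * l) for l in range(-l_max, l_max + 1))


if __name__ == "__main__":
    for ratio in (10.0, 1.0, 0.1, 0.01):
        s = stationary_point_sum(ratio)
        print(f"Tm/lambda = {ratio:5.2f}: sum = {s:10.4f}, "
              f"sqrt(pi)/ratio = {math.sqrt(math.pi) / ratio:10.4f}")
```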


Figure A-3: Attenuation as a function of $T_m/\lambda$ for a Sinc filter, where we fix $\left|\frac{1}{\sqrt{jkf_0\delta V'}}\, \frac{I_{0k}}{2}\right| = 1$.


Appendix B

Terminology

BPF Band Pass Filter

CDR Clock and Data Recovery (circuit)

CP Charge Pump

DAC Digital-to-Analog Converter

DC Direct Current

DFT Discrete Fourier Transform

DNL Differential Non-Linearity

DRFPM Digital-to-RF Phase Modulator

DLL Delay-Locked Loop

EMI Electro-Magnetic Interference

FFT Fast Fourier Transform

FIFO First In First Out

FSM Finite State Machine

HVT High-Voltage Threshold

INL Integral Non-Linearity

LF Loop-Filter

LPF Low Pass Filter

LSB Least significant bit

LTI Linear Time-Invariant

LVT Low-Voltage Threshold


MSB Most Significant Bit

PA Power Amplifier

PD Phase Detector

PLL Phase-Locked Loop

PSD Power Spectral Density

PVT Process, (supply) Voltage, Temperature

RBW Resolution Bandwidth

RF Radio-Frequency

SFDR Spurious-Free Dynamic Range

SS Spread-Spectrum

SSCG Spread-Spectrum Clock Generator (Generation)


STI Shallow Trench Isolation

VCDL Voltage-Controlled Delay Line

VCO Voltage Controlled Oscillator


Appendix C

PVT Corner Nomenclature

The PVT corners in this thesis are numbered. The number allows us to determine the

skew characteristics of the devices as well as the simulated temperature and supply

voltage values. The devices can be skewed nominal, fast or slow, while the supplies are varied between 1.14V, 1.20V (nominal) and 1.26V. The temperatures are set to either −40°C, 27°C or 125°C. n is shorthand for NMOS, p is shorthand for PMOS, r is shorthand for resistor, and c is shorthand for capacitor. For a corner number n, we list here the options set for the simulations:

• If n mod (5) = 0, slow n, fast p, nominal r, nominal c

• If n mod (5) = 1, nominal n, nominal p, nominal r, nominal c

• If n mod (5) = 2, fast n, fast p, fast r, fast c

• If n mod (5) = 3, slow n, slow p, slow r, slow c

• If n mod (5) = 4, fast n, slow p, nominal r, nominal c

• If n mod (15) < 5, temperature is 27°C

• If 5 ≤ n mod (15) ≤ 9, temperature is −40°C

• If n mod (15) > 9, temperature is 125°C

• If n ≤ 15, supply is 1.2V


• If 15 < n ≤ 30, supply is 1.14V

• If n > 30, supply is 1.26V.
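The numbering rules can be collected into a small lookup helper. The function name is hypothetical, and the sketch assumes that the 125°C rule applies to n mod 15 (like the other two temperature rules). Under that reading, corner 28 decodes to all-slow devices at 125°C and 1.14V, consistent with its description as the slowest corner.

```python
def decode_corner(n):
    """Return (device skew, temperature in degrees C, supply in V) for corner number n."""
    devices = {
        0: "slow n, fast p, nominal r, nominal c",
        1: "nominal n, nominal p, nominal r, nominal c",
        2: "fast n, fast p, fast r, fast c",
        3: "slow n, slow p, slow r, slow c",
        4: "fast n, slow p, nominal r, nominal c",
    }[n % 5]
    m = n % 15
    # Assumption: the hot-temperature rule is evaluated on n mod 15.
    temperature_c = 27 if m < 5 else (-40 if m <= 9 else 125)
    supply_v = 1.2 if n <= 15 else (1.14 if n <= 30 else 1.26)
    return devices, temperature_c, supply_v


if __name__ == "__main__":
    print(decode_corner(28))
```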


Bibliography

[1] Domine Leenaerts Gabriele Manganaro. Advances in Analog and RF IC Design for Wireless Communication Systems. Elsevier, Oxford, UK, 2013.

[2] Gabriele Manganaro. Advanced Data Converters. Cambridge University Press, Cambridge, UK, 2012.

[3] Behzad Razavi. Design of Analog CMOS Integrated Circuits. McGraw Hill, New York, NY, 2001.

[4] David A. Tony C. Carusone. Design of Analog CMOS Integrated Circuit. Wiley & Sons, Danvers, MA, 2001.

[5] Hsiang-Hui Chang, I-Hui Hua, and Shen-Iuan Liu. A Spread-Spectrum Clock Generator with Triangular Modulation. Solid-State Circuits, IEEE Journal of, 38(4):673–676, Apr 2003.

[6] Chao-Chyun Chen, Sheng-Chou Lee, and Shen-Iuan Liu. A Spread-Spectrum Clock Generator Using a Capacitor Multiplication Technique. In Emerging Information Technology Conference, 2005., pages 4 pp.–, Aug 2005.

[7] Yi-Bin Hsieh and Yao-Huang Kao. A Fully Integrated Spread-Spectrum Clock Generator by Using Direct VCO Modulation. Circuits and Systems I: Regular Papers, IEEE Transactions on, 55(7):1845–1853, 2008.

[8] M. Kokubo, T. Kawamoto, T. Oshima, T. Noto, M. Suzuki, S. Suzuki, T. Hayasaka, T. Takahashi, and J. Kasai. Spread-spectrum Clock Generator for Serial ATA Using Fractional PLL Controlled by Delta-Sigma Modulator with Level Shifter. In Solid-State Circuits Conference, 2005. Digest of Technical Papers. ISSCC. 2005 IEEE International, pages 160–590 Vol. 1, 2005.

[9] Yi-Bin Hsieh and Yao-Huang Kao. A Fully Integrated Spread Spectrum Clock Generator Using Two-Point Delta-Sigma Modulation. In Circuits and Systems, 2007. ISCAS 2007. IEEE International Symposium on, pages 2156–2159, 2007.

[10] R.H. Mekky and M. Dessouky. A 0.8 ps rms Jitter, 6.3 GHz Spread Spectrum Clock Generator for SerDes Transmitter Clocking. In Microelectronics (ICM), 2010 International Conference on, pages 80–83, Dec 2010.


[11] Kuo-Hsing Cheng, Cheng-Liang Hung, and Chih-Hsien Chang. A 0.77 ps RMS Jitter 6-GHz Spread-Spectrum Clock Generator Using a Compensated Phase-Rotating Technique. Solid-State Circuits, IEEE Journal of, 46(5):1198–1213, May 2011.

[12] S. Damphousse, K. Ouici, A. Rizki, and M. Mallinson. All Digital Spread Spectrum Clock Generator for EMI Reduction. In Solid-State Circuits Conference, 2006. ISSCC 2006. Digest of Technical Papers. IEEE International, pages 962–971, Feb 2006.

[13] D. De Caro, C.A. Romani, N. Petra, A.G.M. Strollo, and C. Parrella. A 1.27 GHz, All-Digital Spread Spectrum Clock Generator/Synthesizer in 65 nm CMOS. Solid-State Circuits, IEEE Journal of, 45(5):1048–1060, May 2010.

[14] Jonghoon Kim, Dong Gun Kam, Pil Jung Jun, and Joungho Kim. Spread Spectrum Clock Generator with Delay Cell Array to Reduce Electromagnetic Interference. Electromagnetic Compatibility, IEEE Transactions on, 47(4):908–920, 2005.

[15] Y. Matsumoto, K. Fujii, and A. Sugiura. An Analytical Method for Determining the Optimal Modulating Waveform for Dithered Clock Generation. Electromagnetic Compatibility, IEEE Transactions on, 47(3):577–584, Aug 2005.

[16] D. De Caro. Optimal Discontinuous Frequency Modulation for Spread-Spectrum Clocking. Electromagnetic Compatibility, IEEE Transactions on, 55(5):891–900, Oct 2013.

[17] T.W. Barton, SungWon Chung, P.A. Godoy, and J.L. Dawson. A 12-bit Resolution, 200-MSample/second Phase Modulator for a 2.5GHz Carrier with Discrete Carrier Pre-rotation in 65nm CMOS. In Radio Frequency Integrated Circuits Symposium (RFIC), 2011 IEEE, pages 1–4, 2011.

[18] R. Kreienkamp, Ulrich Langmann, C. Zimmermann, and T. Aoyama. A 10-Gb/s CMOS Clock and Data Recovery Circuit with an Analog Phase Interpolator. In Custom Integrated Circuits Conference, 2003. Proceedings of the IEEE 2003, pages 73–76, 2003.

[19] Taylor W. Barton. Phase Manipulation for Efficient Radio Frequency Transmission. PhD thesis, MIT, Cambridge, MA, August 2012.

[20] G. von Bueren, L. Rodoni, H. Jaeckel, A. Huber, R. Brun, D. Holzer, and M. Schmatz. 5.75 to 44Gb/s Quarter Rate CDR with Data Rate Selection in 90nm Bulk CMOS. In Solid-State Circuits Conference, 2008. ESSCIRC 2008. 34th European, pages 166–169, 2008.

[21] Xiuge Yang, Changhua Cao, K.K. O, J. Brewer, and Jenshan Lin. A 2.5 GHz Constant Envelope Phase Shift Modulator for Low-Power Wireless Applications. In Radio Frequency integrated Circuits (RFIC) Symposium, 2005. Digest of Papers. 2005 IEEE, pages 667–670, 2005.


[22] Hua Wang and A. Hajimiri. A Wideband CMOS Linear Digital Phase Rotator. In Custom Integrated Circuits Conference, 2007. CICC ’07. IEEE, pages 671–674, 2007.

[23] S. Sidiropoulos and M.A. Horowitz. A Semidigital Dual Delay-Locked Loop. Solid-State Circuits, IEEE Journal of, 32(11):1683–1692, 1997.

[24] Stefanos Sidiropoulos. High Performance Inter-Chip Signaling. PhD thesis, Stanford, Stanford, CA, April 1998.

[25] Paul R. Gray et al. Analysis and Design of Analog Integrated Circuits. Wiley & Sons, Danvers, MA, 2010.

[26] Borivoje Nikolic Jan M. Rabaey, Anantha Chandrakasan. Digital Integrated Circuits, A Design Perspective. Prentice Hall, Danvers, MA, 2003.

[27] Wei Wu, Gang Du, Xiaoyan Liu, Lei Sun, Jinfeng Kang, and Ruqi Han. Physical-Based Threshold Voltage and Mobility Models Including Shallow Trench Isolation Stress Effect on nMOSFETs. Nanotechnology, IEEE Transactions on, 10(4):875–880, July 2011.

[28] J.G. Maneatis. Low-Jitter Process-Independent DLL and PLL Based on Self-Biased Techniques. Solid-State Circuits, IEEE Journal of, 31(11):1723–1732, Nov 1996.

