A Spread-Spectrum Clock Generator using Phase
Interpolation for EMI reduction
by
Ky-Anh Tran
Submitted to the Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of
Master of Engineering in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2014
© Massachusetts Institute of Technology 2014. All rights reserved.
Author: Department of Electrical Engineering and Computer Science
May 3, 2014
Certified by: Prof. Charles G. Sodini
LeBel Professor of Electrical Engineering
Thesis Supervisor
Certified by: Matthew L. Courcy
Senior Design Engineer, Analog Devices
Thesis Supervisor
Accepted by: Prof. Albert R. Meyer
Chairman, Masters of Engineering Thesis Committee
A Spread-Spectrum Clock Generator using Phase
Interpolation for EMI reduction
by
Ky-Anh Tran
Submitted to the Department of Electrical Engineering and Computer Science on May 3, 2014, in partial fulfillment of the
requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science
Abstract
The spurious-free dynamic range of RF DACs is limited by heavy digital-domain switching, which interferes with the analog output signal. The design, layout and simulation of a spread-spectrum clock generator (SSCG) are presented. The SSCG modulates the clock frequency used to switch the digital blocks of the DAC in order to reduce electromagnetic interference (EMI) spurs in the analog output signal of the DAC. Leveraging a phase-control architecture rather than a traditional PLL, the SSCG system is shown to reduce the spectral height of a divided-down clock spur by up to 19.6dB.
The SSCG is designed in TSMC’s 65nm CMOS process. It takes in quadrature, differential clocks at either 2.5GHz or 5GHz, and provides quadrature output clocks at 625MHz or 1.25GHz. The output spectrum of the clock can be attenuated by up to 19.6dB relative to the spectrum of an unspread clock. The core of the SSCG is a phase interpolator, which takes in quadrature input clocks and interpolates between them to shift the output phase and thereby modulate its frequency. To condition the signals before and after interpolation, the SSCG incorporates input variable-gain filters, output restoration buffers and divide-by-4 circuits. Extensive transistor-level and behavioral simulations are used to verify the design.
Thesis Supervisor: Prof. Charles G. Sodini
Title: LeBel Professor of Electrical Engineering
Thesis Supervisor: Matthew L. Courcy
Title: Senior Design Engineer, Analog Devices
Acknowledgments
The completion of this thesis would not have been possible without the help of many.
I would like to thank Shawn Kuo and Matthew Courcy for their patient guidance through the technical difficulties I encountered. Their IC design know-how and technical competence are inspirational. Engineers from the DAC group at ADI made me feel welcome, and were always available for help. I have relied numerous times on Andy Fan, Zhou Bing, Qiurong He, Nathan Egan, Steve Rose and Martin Clara, both for CAD help and circuit advice. Having not done layout before, I relied extensively on Rick Sullivan and Ramson Gambiza, who taught me CAD techniques for IC layout. Digital designers Paul Wilkins, Grace Jin and Jim Rioux were kind enough to lend me help on the digital design and place-and-route. I’d like to give special thanks to Jeremy Walker, who generously shared his work on interpolator design and guided me via phone at the start of the project. Finally, I’d like to thank Haiyang Zhu for the numerous discussions, not only on my project, but also on a host of other topics such as skin effect, latchup, IC layout, ADC design, DAC architecture and much more. Your enthusiasm to share knowledge made learning a very enjoyable process.
Besides work, I also had a great deal of fun at ADI interacting with fellow
interns. Kevin, I learned a lot from your systematic approach to IC design. Alex,
your choice of music on the car made carpooling much more interesting. Ujwal,
debating with you on solid state physics made me review a lot of concepts I thought
I had forgotten.
At MIT, I would like to thank Professor Sodini. Professor Sodini provided detailed
and constructive feedback on the project proposal and thesis, and reminded me of the
importance of clear technical writing. He also devoted time from his hectic schedule
to advise me on my IC design career path. Finally, I would like to thank my parents
and sister, who kept me sane throughout the whole process, and whose support for
my education has led me to where I am now.
Contents
1 Introduction
1.1 Basics of Digital-to-Analog conversion
1.1.1 DAC Operation
1.1.2 Harmonic Purity Requirement
1.1.3 Spread-Spectrum Clocking
1.2 Specifications for the Spread-Spectrum Clock Generator
1.3 Past Work: Frequency Control Systems
1.4 Past Work: Phase Control System
2 Behavioral Study and System Proposal
2.1 Introduction
2.2 The Mathematics of Spread-Spectrum
2.2.1 A Toy Model: Single-tone FM Modulation
2.2.2 Generalization to Arbitrary Modulation Waveform
2.2.3 Modulation Waveform Selection
2.3 Mathematics of a Phase-Control System
2.3.1 Discrete-time and Discrete-Phase System
2.3.2 Numerics of Phase and Time Quantization
2.4 Modelling Phase-Control System Non-Idealities
2.4.1 General INL and DNL Modelling
2.5 Architecture Proposal
2.5.1 System Requirements
2.5.2 SSCG via Phase Modulation
2.5.3 Additional Non-Idealities of Phase Modulator
2.5.4 System Level Specifications
2.6 Summary
3 Analog Circuit Design
3.1 Introduction
3.2 Interpolator Core
3.2.1 Code Selection
3.2.2 Schematic Design
3.2.3 Phase-Interpolator Non-idealities
3.2.4 Layout and Sizing
3.3 Signal Conditioning
3.3.1 Filter
3.3.2 Restoration
3.3.3 Clock Divider
3.4 Top-Level Floorplan
3.5 Evaluating the High Frequency Signal Path
3.5.1 Evaluation Methodology
3.5.2 Results
3.6 Summary
4 Digital and Auxiliary Circuits
4.1 Introduction
4.2 Regulator
4.3 Peak Detector
4.4 Waveform Generator
4.4.1 Basic Operation
4.4.2 Modes of Operation
4.5 Decoders
4.6 Calibration Finite State Machine
4.6.1 Operation
4.7 Top-Level Floor-Plan and Clock Distribution
4.7.1 FloorPlan
4.7.2 Clock Distribution
4.8 Power Summary
4.9 Summary
5 Top-Level Simulation
5.1 Calibration
5.2 Spread-Spectrum Operation
5.3 Full Circuit Simulation
5.4 Simulation Results Summary
5.5 Summary
6 Conclusion
6.1 Summary
6.2 System Usage
6.3 Further Work
6.3.1 Optimizing the Interpolator
6.3.2 Calibration for Arbitrary Input Frequencies
6.3.3 Top-level clock Distribution
A Effect of Spectrum Analyzer
A.1 Model of peak-hold mode Spectrum Analyzer
A.2 Calculation of Measured Spectrum
A.3 Example 1: Gaussian Filter
A.4 Example 2: Sinc Filter
B Terminology
C PVT Corner Nomenclature
List of Figures
1-1 Cartoon picture of the effect of a clock spur on the power spectral density of the signal at the output of a DAC.
1-2 Basic DAC operation.
1-3 Cartoon showing the DAC output and its spectrum.
1-4 Typical transmitter architecture for RF DAC. Note that the (purple) spur location is independent of carrier frequency, and is not easily filtered out.
1-5 Cartoon of Spread-Spectrum Clocking.
1-6 The AD9129 in (a) is a typical RF-DAC. We show the traditional clocking scheme (b) and the new clocking scheme with larger retiming buffer for SS (c).
1-7 Typical PLL and its linearized model.
1-8 Example of Dual-Path Loop Filter, allowing the zero to be B times slower.
1-9 SS is generally achieved by modulating the feedback path or the LF of a PLL.
1-10 Triangular Frequency Modulation Example.
1-11 Example of compensated-phase rotation technique for SS clocking.
1-12 Two implementations of Spread-Spectrum using digital phase control. In the first case, the phase outputs of a DLL are muxed. In the second case, the delay from a Voltage-Controlled Delay Line (VCDL) is modulated.
2-1 Illustration of single-tone FM modulated signal (f0 = 1/π Hz, fm = 1/(20π) Hz, φm = 1.4π radians, ∆f = 0.07Hz).
2-2 Illustration of the first 3 Bessel Functions.
2-3 Example of SS on a square clock.
2-4 Illustration of the terminology using a sawtooth modulation waveform.
2-5 Picture showing an intuitive but incorrect derivation of spectral attenuation. This picture seems to suggest that spectral attenuation should scale linearly with δ, which is false in general.
2-6 Illustration of the indexing terminology of t_lα.
2-7 Plot of the Fresnel Functions, C(x) and S(x), as defined above.
2-8 Spectra using triangular modulation for several values of fm (φm = 64 × 2π). The attenuation level is largely independent of fm at fixed φm.
2-9 Spectral Oscillation Effect observed for Sawtooth and Triangular Modulation.
2-10 Possible modulation waveforms V(t): triangle, sawtooth (better), Hershey-Kiss (optimal).
2-11 Comparison between the attenuation levels for triangular and sawtooth modulation, vs φm.
2-12 Power Spectra for input and output 5GHz clock (Tdig = 3.2ns). Note the Images appearing at 5GHz ± 312.5MHz.
2-13 Illustration for the definitions used for discrete-time and phase system.
2-14 Plot for the attenuation level of the divide-by-4 clock as a function of quantization level and clk period, for fm = 38kHz, 76kHz, 152kHz respectively. Note that the quantization level is largely irrelevant, and therefore is set to meet cycle-to-cycle jitter specifications, not attenuation specifications.
2-15 Numerical Simulation of the attenuation of the divide-by-4 clock output as a function of DNL_rms. Data includes INL_max ∈ (0 mperiods, 200 mperiods), k ∈ (1, 10).
2-16 Architecture Proposal.
3-1 IQ plane example, illustrating the terminology for quadrant numbering and IQ coefficient quantization.
3-2 Binary scaled interpolator using linear approximation. Because of the lack of calibration in our final circuit, we opted for a thermometer DAC current array instead. Figure from .
3-3 Constellation Diagram for interpolation using the linear approximation and using correct trigonometric values.
3-4 Finding a reasonable grid quantization level.
3-5 Simple Type-I Phase interpolator Example (this is our final design choice). There are 16 copies of each differential pair, and 8 common mode differential pairs, so there is a total of 72 differential pair cells.
3-6 We can get rid of the common mode control cells by doubling the number of cosine and sine cells for the same resolution (the select signals are now 32-bit-wide thermometer codes instead of 16 bits wide in this circuit).
3-7 Type II phase interpolator. Inductors allow a larger headroom, so that we have both enough swing and are able to operate the switches as common-gate amplifiers.
3-8 Effect of subthreshold conduction and feedforward in the IQ plane.
3-9 Feedforward Effect in type I interpolator.
3-10 Plot of output differential current vs input differential voltage.
3-11 Layout of the Interpolator Array.
3-12 Cartoon picture of 2nd order filter effect on Fourier coefficients.
3-13 Filter Schematic. This schematic includes one of the 2 differential paths. The feedforward cancelling capacitor is included only for the 2nd stage, because the gate to drain capacitance of the first stage is insignificant.
3-14 Layout of the SineShaper. This has 4 inverter chains, for the 2 differential signal paths, one for cosine and one for sine.
3-15 Plot showing the margin above saturation for the tail current source and differential pair inputs. For PVT corner numbering reference, see Appendix C.
3-16 AC response for the filter, configured for typical corner at 5GHz, and a typical corner at 2.5GHz.
3-17 Restoration circuit and layout.
3-18 Restoration Phase Margin. Although the phase margin is negative, the stability of the circuit is not compromised because the inverters operate non-linearly and effective loop gain is lower than what is expected from small-signal analysis.
3-19 Restoration output phase transient under phase code sweep.
3-20 Clock Divider Circuit.
3-21 Top level Floor Plan and High Frequency Signal Path Layout. Note we added clock buffers at the output to drive the long wires out of the block.
3-22 Methodology for evaluating phase linearity.
3-23 Simulation methodology for regulator code selection across corners.
3-24 Plot of filter output amplitude as we sweep the regulator input code value (and therefore change the filter supply), for all 45 PVT corners.
3-25 DNL_rms value across corners for high and low frequency modes, with 5° quadrature error.
3-26 We extract the largest DNL step across input code transitions, for each corner.
3-27 DNL_rms histogram for 50 Monte Carlo simulations at 5GHz (worst case corner, corner 28).
3-28 Integrated Jitter plot at the output of the restoration circuit.
4-1 Regulator Schematic.
4-2 Peak Detector Schematic.
4-3 Phase Deviation over time for 3 attenuation modes (16.6dB, 13.6dB, 10.6dB) at fm = 38kHz.
4-4 Waveform Generator block diagram.
4-5 Acceptable Modes of Operation.
4-6 Waveform Generator layout.
4-7 Sine and Cosine Decoders Layout.
4-8 Calibration algorithm.
4-9 Calibration layout.
4-10 Top-Level Floor Plan, highlighting the new blocks. Synthesized digital circuits are highlighted in solid lines.
4-11 Phase value bits and clock routing paths.
4-12 Effect of Clock Skew between the 2 decoders and Spectral Attenuation for a 5GHz input clock. The parameters for this sim are φm = 64×2π, fm = 38kHz, Tdig = 3.2ns.
5-1 Demonstration of Calibration Algorithm at work with behavioral phase interpolator and peak detector.
5-2 Output Clock attenuation.
5-3 Divide-by-4 clock attenuation.
5-4 Differential signals in signal path for 5GHz operation.
5-5 312MS/s phase code sweep for 5GHz operation.
5-6 Simulation results at 5GHz for linearity performance.
6-1 Replica-Feedback biasing example.
6-2 Simple Clock Distribution Method.
A-1 Peak-Hold mode operation of spectrum analyzer, Figure from .
A-2 Attenuation as a function of $T_m/\lambda$ for a Gaussian Filter, where we fix $\left|\frac{1}{\sqrt{jkf_0\delta V'_{I0k2}}}\right| = 1$.
A-3 Attenuation as a function of $T_m/\lambda$ for a Sinc filter, where we fix $\left|\frac{1}{\sqrt{jkf_0\delta V'_{I0k2}}}\right| = 1$.
List of Tables
1.1 Specifications summary.
2.1 System level specifications.
3.1 Power consumption summary (clock path).
4.1 Digital and auxiliary circuit power consumption.
5.1 Specifications summary.
Chapter 1
Introduction
Chips that were traditionally purely analog have become increasingly mixed-signal in order to leverage the digital processing power available in finer process nodes. A difficult issue to tackle is the Electromagnetic Interference (EMI) that pollutes high-quality analog outputs. The higher the clock frequency of the digital circuits, the larger the EMI, because high-frequency signals couple easily through capacitive isolation barriers. In high-speed Digital-to-Analog Converters (DACs) used in communication systems, EMI leads to clock “spurs”, which appear as spikes in the frequency spectrum of the output (Figure 1-1). This thesis presents a Spread-Spectrum Clock Generator (SSCG) used to reduce those spurs while providing a usable digital clock for a high-speed DAC.
• Chapter 1 presents background information on DACs and previous work on spread-spectrum clock generators.
• Chapter 2 presents the analytical and numerical study of the design tradeoffs of an SSCG.
• Chapter 3 contains the design method and block-level verification of the main analog blocks.
• Chapter 4 covers auxiliary analog circuits and digital circuits.
• Chapter 5 presents the top-level verification of the system.
• Chapter 6 concludes with possible extensions to this work.
Figure 1-1: Cartoon picture of the effect of a clock spur on the power spectral density of the signal at the output of a DAC.
1.1 Basics of Digital-to-Analog conversion
1.1.1 DAC Operation
A DAC is a circuit that takes an input stream of bits and synthesizes from them an analog voltage value (see Figure 1-2). A DAC is characterized by its bit resolution, which sets the granularity of the synthesized output values, and by its update rate in samples per second. For example, a 2-bit, 10GS/s DAC can output 4 different analog values and update its output every 100ps. We define fs as the update rate of the output analog signal, in this case 10GHz. In a communication system, a DAC is used to translate pre-processed bits into a waveform that contains the baseband information (for example, a voice input) [1]. RF-DACs are DACs whose fs is high enough to directly generate both the carrier waveform at an RF frequency and the baseband modulation.
Figure 1-2: Basic DAC operation.

Let us consider a digital data bit stream fbn into a DAC, where fbn takes the value 0 or 1 for bit index b and time sample n. A cartoon showing the derivation of the DAC spectrum below is shown in Figure 1-2. We can then define the corresponding analog value fn for each sample n to be
$$f_n = \sum_{b=0}^{N-1} \frac{FS}{2^N}\, 2^b f_{bn} \qquad (1.1)$$
where FS denotes the full scale voltage value and N is the DAC bit resolution.
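Equations (1.1) through (1.4) can be exercised directly. The sketch below is our own illustration, not code from this work: the helper names dac_level and zoh_output are hypothetical, bits are ordered LSB first, and the hold boundaries are assigned to the later sample (a measure-zero choice that Eq. (1.3) leaves open). It reproduces the 2-bit, 10GS/s example from Section 1.1.1.

```python
def dac_level(bits, full_scale):
    """Eq. (1.1): analog value of one sample; bits[b] is f_bn (0 or 1), LSB first."""
    n_bits = len(bits)
    return sum(full_scale / 2 ** n_bits * 2 ** b * bit for b, bit in enumerate(bits))


def zoh_output(samples, fs, t):
    """Eqs. (1.2)-(1.4): f_p(t) for a zeroth-order hold updating at rate fs.

    Each sample is held for 1/fs; pulse edges are assigned to the later sample.
    """
    n = int(t * fs)  # index of the pulse p(t - n/fs) that covers time t
    if t < 0 or n >= len(samples):
        return 0.0
    return samples[n]


# The 2-bit, 10 GS/s example: 4 distinct levels, each held for 100 ps.
fs = 10e9
levels = [dac_level([b0, b1], full_scale=1.0) for b1 in (0, 1) for b0 in (0, 1)]
print(levels)
print(zoh_output(levels, fs, 150e-12))  # value during the second 100 ps slot
```

Feeding the four levels back-to-back into zoh_output yields the staircase waveform fp(t) sketched in Figure 1-3.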
Figure 1-3 shows fp(t), the typical output waveform of a DAC with output sample rate fs. This type of waveform is called a “zeroth-order” sample-and-hold output waveform, and is a good approximation of the DAC output waveform. The output of the DAC fp(t) can be expressed in terms of the pulse function p(t):
$$f_p(t) = \sum_{n=0}^{\infty} p\!\left(t - \frac{n}{f_s}\right) f_n \qquad (1.2)$$
$$p(t) = 1 \quad \text{for } t \in \left(0, \tfrac{1}{f_s}\right) \qquad (1.3)$$
$$p(t) = 0 \quad \text{otherwise} \qquad (1.4)$$
Figure 1-3: Cartoon showing the DAC output and its spectrum.

We can also express fp(t) as the convolution of a train of weighted Dirac delta functions, fd(t), with p(t), as shown in Figure 1-3:
$$f_p(t) = f_d(t) * p(t) \qquad (1.5)$$
$$f_d(t) \equiv \sum_{n=0}^{\infty} f_n\, \delta(t - nT_s) \qquad (1.6)$$
where Ts = 1/fs is the sample period.
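If we define V(jω) and F(jω) to be the Fourier transforms of fp(t) and fd(t), we can write V(jω) as:
$$V(j\omega) = \frac{1}{2\pi} F(j\omega)\, \mathrm{Sinc}\!\left(\frac{j\omega}{2\pi f_s}\right) \qquad (1.7)$$
$$\mathrm{Sinc}(x) \equiv \frac{\sin(x)}{x} \qquad (1.8)$$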
Figure 1-3 illustrates this calculation, and how it affects the power spectral den-
sities Fd, P and Fp of fd, p and fp respectively.
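To put a number on the Sinc envelope, the sketch below evaluates the zero-order-hold droop. It is an assumption on our part that the reader wants the standard textbook normalization, |sin(πf/fs)/(πf/fs)|, rather than the exact argument scaling written in Eq. (1.7); the helper name zoh_envelope_db is hypothetical. At the Nyquist frequency fs/2 the envelope is 2/π, about 3.9dB below its DC value.

```python
import math


def zoh_envelope_db(f_hz, fs_hz):
    """Zero-order-hold envelope |sin(pi f/fs) / (pi f/fs)| in dB, relative to DC."""
    x = math.pi * f_hz / fs_hz
    if x == 0:
        return 0.0
    return 20 * math.log10(abs(math.sin(x) / x))


fs = 10e9  # the 10 GS/s example DAC from Section 1.1.1
for f in (1e9, 2.5e9, 5e9):
    print(f"{f / 1e9:.1f} GHz: {zoh_envelope_db(f, fs):.2f} dB")
```

The envelope nulls at multiples of fs, which is why the images of the synthesized signal shrink as they approach fs.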
Figure 1-4: Typical transmitter architecture for RF DAC. Note that the (purple) spur location is independent of carrier frequency, and is not easily filtered out.
1.1.2 Harmonic Purity Requirement
Because of stringent spectral mask requirements, a communication DAC must be able to synthesize analog waveforms with no significant output signal outside of the target channel [2]. This is difficult not only because the DAC may have inherent data-sequence-dependent distortion that introduces errors at the output, but also because of spurs. Spurs are signals at specific frequencies that appear in the DAC output spectrum but are not harmonics of the signal (Figure 1-4). The large EMI generated by digital clocks is concentrated at the fundamental frequency of the digital clock. This interfering signal can couple directly to the output of the DAC, or couple indirectly onto the bias lines, where it mixes with the DAC output. In the former case, the spurious tone frequency does not depend on the input signal frequency, and is therefore very difficult to filter out when it falls within the frequency range of interest.
1.1.3 Spread-Spectrum Clocking
Spread-Spectrum Clocking refers to frequency modulating a clock. In the frequency domain, the original clock spectrum consists of its fundamental and its harmonics. Frequency modulation “fattens” each of these, spreading out the frequency peaks, hence the name Spread-Spectrum (SS). The peak heights are reduced, and so are the heights of the interfering signals on the analog output (see Figure 1-5). One way to see this effect is to realize that SS leaves the power of the signal unchanged: if we increase the frequency spread, the spectral density must decrease to keep the total power in the frequency domain constant. This results in increased Spurious-Free Dynamic Range (SFDR). Not all systems can use SS clocks, and certain systems have very small frequency deviation requirements, making SS difficult to implement.
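The power-conservation argument can be checked numerically. The sketch below is our own illustration under stated assumptions, not the simulation flow of this thesis: a sinusoidal clock at an arbitrary normalized frequency F0 (cycles per sample) is frequency modulated by a triangular waveform with peak deviation DELTA, and its largest FFT bin is compared with that of the unmodulated clock. The names and values F0, DELTA and PERIODS are hypothetical, and NumPy is assumed to be available. The sketch only shows that the peak drops while the mean power stays put; it says nothing about how the attenuation scales with the spread.

```python
import numpy as np

N = 2 ** 14      # samples in the simulated record
F0 = 0.25        # clock frequency in cycles/sample (arbitrary choice)
DELTA = 0.01     # peak frequency deviation in cycles/sample (arbitrary choice)
PERIODS = 4      # triangular modulation periods within the record

n = np.arange(N)
# Triangular waveform in [-1, 1] with PERIODS cycles over the record.
tri = 2 * np.abs(2 * ((n * PERIODS / N) % 1) - 1) - 1

clean = np.cos(2 * np.pi * F0 * n)
# Integrate the instantaneous frequency to obtain a continuous phase.
spread = np.cos(2 * np.pi * np.cumsum(F0 + DELTA * tri))


def peak_bin_power(x):
    """Largest |X[k]|^2 of the one-sided FFT."""
    return np.max(np.abs(np.fft.rfft(x)) ** 2)


def mean_power(x):
    """Average signal power, i.e. the mean square value of the samples."""
    return np.mean(x ** 2)


attenuation_db = 10 * np.log10(peak_bin_power(clean) / peak_bin_power(spread))
print(f"peak reduction: {attenuation_db:.1f} dB")
print(f"mean power clean/spread: {mean_power(clean):.4f} / {mean_power(spread):.4f}")
```

Because the triangular sweep averages to zero over each modulation period here, the spread clock is periodic, so its power lands in discrete lines spaced by the modulation rate rather than in a continuous smear.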
In our case, the DAC sampling clock cannot be spread, because modulating the timing of the output data can cause significant distortion of the output values. The digital clock, however, can be spread, provided two conditions are met. First, the cycle-to-cycle jitter must be low, so that clock periods do not become so short that they violate the minimum timing margins set by the digital designers. Second, spreading the digital clock causes the digital data output to be phase shifted relative to the analog clock. This data stream needs to be retimed, which is traditionally accomplished by a FIFO buffer. The depth of the FIFO buffer, in turn, has to be at least as large as the maximum time deviation between the spread and unspread clocks. The placement of the FIFO buffer in a typical DAC architecture is shown in Figure 1-6.
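The FIFO sizing rule can be made concrete with the numbers in Table 1.1: a maximum time deviation of 64 input-clock periods and a divide-by-4 output clock. The sketch below assumes one FIFO entry per digital (divided) clock cycle and ignores any extra synchronization margin a real design would add; both helper names are hypothetical.

```python
import math


def max_time_deviation_s(max_dev_input_periods, f_in_hz):
    """Worst-case time offset between spread and unspread clocks, in seconds."""
    return max_dev_input_periods / f_in_hz


def min_fifo_depth(max_dev_input_periods, divide_ratio):
    """Minimum FIFO depth in digital-clock cycles (one entry per cycle assumed)."""
    return math.ceil(max_dev_input_periods / divide_ratio)


for f_in in (5e9, 2.5e9):  # fast and slow input modes from Table 1.1
    dev = max_time_deviation_s(64, f_in)
    print(f"{f_in / 1e9:.1f} GHz input: max deviation {dev * 1e9:.1f} ns, "
          f"FIFO depth >= {min_fifo_depth(64, 4)} entries")
```

Because the deviation is specified in input-clock periods, the required depth in divided-clock cycles is the same in both modes, even though the absolute time deviation doubles in slow mode.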
1.2 Specifications for the Spread-Spectrum Clock Generator
Given the use of spread-spectrum clocking in an RF-DAC, we set the following specifications for the spread-spectrum clock generator.
Figure 1-5: Cartoon of Spread-Spectrum Clocking.
Parameter                        Specification
Power budget                     50mW
Input                            Quadrature clocks
Output                           Divide-by-4 quadrature clocks
Jitter                           Less than 5ps rms
Maximum cycle-to-cycle jitter    20ps
Maximum time deviation           64 periods of the input clock
Area                             0.4mm × 0.4mm
Modulation rate                  38kHz or faster
Input clock rates                Fast mode (5GHz), slow mode (2.5GHz)
EMI reduction                    20dB
Process node                     TSMC 65nm CMOS low-power
Table 1.1: Specifications summary.
Figure 1-6: (a) AD9129, (b) Old Clocking, (c) Proposed Clocking. The AD9129 in (a) is a typical RF-DAC. We show the traditional clocking scheme (b) and the new clocking scheme with a larger retiming buffer for SS (c).

The power consumption and area restrictions are required to integrate the block into a much larger DAC chip. The jitter requirement is relatively lax because the output clock is used to clock the digital data, which only needs enough timing margin to satisfy the setup and hold time requirements. The more jitter, the smaller the valid timing window for sampling a data bit. Since a 1.25GHz digital clock has an 800ps period, a 20ps cycle-to-cycle jitter is only 2.5% of the period.
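The same arithmetic extends to both output rates. The sketch below is a simplification we assume for illustration: it treats the full 20ps cycle-to-cycle jitter budget from Table 1.1 as shortening a single period, and the helper names are hypothetical.

```python
def jitter_fraction(cc_jitter_s, f_clk_hz):
    """Cycle-to-cycle jitter as a fraction of the nominal clock period."""
    return cc_jitter_s * f_clk_hz


def shortest_period_s(f_clk_hz, cc_jitter_s):
    """Worst-case period if one cycle loses the full jitter budget (simplified model)."""
    return 1 / f_clk_hz - cc_jitter_s


for f_clk in (1.25e9, 625e6):  # divide-by-4 outputs in fast and slow modes
    print(f"{f_clk / 1e6:.0f} MHz: {100 * jitter_fraction(20e-12, f_clk):.2f}% of the period, "
          f"shortest period {shortest_period_s(f_clk, 20e-12) * 1e12:.0f} ps")
```

In slow mode the same 20ps budget is only half as large a fraction of the period, so the 1.25GHz case sets the tighter timing constraint.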
Figure 1-7: Typical PLL and its linearized model.
1.3 Past Work: Frequency Control Systems
In general, a frequency control system is built around a Phase-Locked Loop (PLL), sketched in Figure 1-7. Provided the input signal is within the PLL locking range¹, we can use linear system analysis, with the phase φ(s) as the control variable, to determine its stability. Let KPD be the sensitivity of the phase detector, Gm the transconductance of the charge pump, HLF(s) the transfer function of the loop filter and KVCO/s the transfer function of the Voltage-Controlled Oscillator (VCO). For the typical loop filter implementation shown in the figure, the loop transfer function is
$$L(s) = \frac{K_{PD} K_{VCO} G_m}{s^2 (C_1 + C_2)} \left( \frac{1 + sR_1C_1}{1 + sR_1 \frac{C_1 C_2}{C_1 + C_2}} \right) \qquad (1.9)$$
For the loop to be stable, we want the zero at 1/(R1C1) to fall far below the unity-gain frequency, and the pole at 1/(R1(C1‖C2)) to lie above the unity-gain frequency.

¹ For more details on PLL modeling, see [3] and [4].
For a spread-spectrum application, the unity-gain frequency has to be below the modulation frequency; otherwise, the loop will correct out the frequency modulation pattern. This means that the zero at 1/(R1C1) must typically sit at a very low frequency, i.e., the R1C1 time constant must be very large.
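To make the stability and spread-spectrum constraints concrete, the sketch below evaluates Eq. (1.9) numerically, finds the unity-gain frequency by bisection, and reports the phase margin. All component values (Icp = 10µA, KVCO = 2π × 100MHz/V, R1 = 10kΩ, C1 = 100pF, C2 = 5pF) are hypothetical, chosen only for illustration, and we assume the usual charge-pump idealization KPD·Gm = Icp/(2π). The 38kHz modulation rate is taken from Table 1.1.

```python
import cmath
import math

# Hypothetical component values, chosen only for illustration.
I_CP = 10e-6                 # charge-pump current (A); K_PD*G_m assumed = I_CP/(2*pi)
K_VCO = 2 * math.pi * 100e6  # VCO gain (rad/s per V)
R1, C1, C2 = 10e3, 100e-12, 5e-12
F_MOD = 38e3                 # SS modulation rate from Table 1.1 (Hz)


def loop_gain(w):
    """L(jw) from Eq. (1.9), including the 1/s contributed by the VCO."""
    k = (I_CP / (2 * math.pi)) * K_VCO / (C1 + C2)
    tau_z = R1 * C1                    # zero time constant
    tau_p = R1 * C1 * C2 / (C1 + C2)   # pole time constant, R1*(C1 || C2)
    s = 1j * w
    return k * (1 + s * tau_z) / (s ** 2 * (1 + s * tau_p))


def unity_gain_w(lo=1.0, hi=1e12, iters=200):
    """Bisect in log space for |L(jw)| = 1; |L| falls monotonically with w."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if abs(loop_gain(mid)) > 1:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


w_c = unity_gain_w()
phase_margin = 180 + math.degrees(cmath.phase(loop_gain(w_c)))
w_zero = 1 / (R1 * C1)
w_pole = (C1 + C2) / (R1 * C1 * C2)

print(f"unity gain: {w_c / (2 * math.pi) / 1e6:.2f} MHz, phase margin: {phase_margin:.1f} deg")
print("zero below crossover:", w_zero < w_c, "| pole above crossover:", w_c < w_pole)
# With these values the crossover is far above 38 kHz, so a loop like this
# one would track out (cancel) a spread-spectrum modulation applied at the VCO.
print("loop would track out the SS modulation:", w_c / (2 * math.pi) > F_MOD)
```

Pushing the crossover below 38kHz while keeping the zero well under it is what forces the very large R1C1 product discussed above.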
Looking at the PLL, there are two main points that can be modulated to create spread spectrum: the input to the VCO or the feedback path. The VCO is directly modulated in [5], where the current of a programmable current source is integrated onto the capacitor that stores the VCO control voltage, generating frequency sweeps and spread spectrum. The concept is simple to implement; however, one disadvantage is that the PLL bandwidth must be significantly reduced so that the loop does not remove the modulation component. Indeed, the modulated output clock is fed back as an input to the phase-frequency detector. Such a small loop bandwidth can be implemented with a large integrated capacitor, which consumes a large area.
Several works use capacitive multiplication to address this problem [6] [7]. For example, in [7], two charge pumps provide two parallel paths that create the zero in the loop transfer function (Figure 1-8). If the charge pumps are sized with a ratio of B, the transfer function of the loop filter becomes:
$$H_{LF}(s) = \frac{1}{sC_1} \left( \frac{1 + sBR_1C_1}{1 + sR_1C_2} \right) \qquad (1.10)$$
Essentially, the zero has been slowed down by a factor of B, which effectively multiplies the value of C1 by B without increasing its physical size.
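A quick numerical reading of Eq. (1.10): for a fixed zero frequency, the required C1 shrinks by the charge-pump ratio B. The design point below (R1 = 20kΩ, zero at 10kHz, B = 10) and the helper names are hypothetical, chosen only to illustrate the scaling.

```python
import math


def zero_hz(r1, c1, b=1.0):
    """Zero of Eq. (1.10), 1/(2*pi*B*R1*C1); b=1 recovers a single-path filter."""
    return 1 / (2 * math.pi * b * r1 * c1)


def c1_for_zero(r1, f_zero, b=1.0):
    """C1 needed to place the loop-filter zero at f_zero (Hz) for ratio b."""
    return 1 / (2 * math.pi * b * r1 * f_zero)


# Hypothetical design point: R1 = 20 kOhm, zero at 10 kHz.
R1, F_ZERO = 20e3, 10e3
c1_single = c1_for_zero(R1, F_ZERO)        # ordinary single charge pump
c1_dual = c1_for_zero(R1, F_ZERO, b=10)    # dual path with B = 10
print(f"single path C1 = {c1_single * 1e12:.0f} pF, dual path C1 = {c1_dual * 1e12:.0f} pF")
```

Since on-chip capacitor area scales roughly with capacitance, a ratio of B translates into a comparable reduction in loop-filter area, at the cost of a second charge pump.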
The most popular way to implement a spread-spectrum clock is to modulate the divide-by-M path in a PLL (Figure 1-9). Because the divider inherently performs an integer divide operation, fractional division for small frequency modulations relies on dithering of the divide ratio and delta-sigma modulation to average out the quantization error [8]. Both the feedback path and the VCO can be modulated simultaneously, as is done in [9], which allows more accurate production of a triangular frequency waveform (example waveform in Figure 1-10), even though the loop filter is not necessarily slow enough.
Figure 1-8: Example of Dual-Path Loop Filter, allowing the zero to be B times slower.
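As a minimal sketch of the dithering idea, the Python code below implements a first-order (single-accumulator) delta-sigma modulator that chooses between the integer divide ratios n_int and n_int + 1 so that the average ratio equals n_int + num/den. This is an illustration under simplifying assumptions rather than the design of [8]: practical fractional-N SSCGs typically use higher-order modulators, and a spread-spectrum clock would additionally sweep num over time. The function name `first_order_dsm` is hypothetical.

```python
def first_order_dsm(n_int, num, den, length):
    """First-order delta-sigma (accumulator) dither of an integer divider.

    Returns `length` integer divide ratios whose long-term average is
    n_int + num/den; the accumulator overflow decides when to divide by n_int + 1.
    """
    acc = 0
    ratios = []
    for _ in range(length):
        acc += num
        carry = 1 if acc >= den else 0  # accumulator overflow
        acc -= carry * den
        ratios.append(n_int + carry)
    return ratios


if __name__ == "__main__":
    seq = first_order_dsm(4, 3, 8, 16)
    print(seq)
    print(sum(seq) / len(seq))  # 4.375 = 4 + 3/8
```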
For protocols where the instantaneous frequency deviation must be very small, such as Serial ATA, the additional hardware needed to filter out the quantization noise from fractional division may cost too much power and area. Indeed, a quick look at the layout of a fractional-N PLL implementation of an SSCG in [10] shows that most of the area is taken by the loop filter alone, just to filter out quantization errors. One technique to achieve true fractional division in the feedback path is to perform frequency division and phase selection simultaneously (Figure 1-11). This technique, called "compensated phase rotation," has the benefit of producing true fractional division and reduced jitter [11].
1.4 Past Work: Phase Control System
In phase control systems, a digital finite state machine drives a digitally controlled phase synthesis system to shift the phase, and therefore the frequency, of the output (two architectures are shown in Figure 1-12). The typical architecture is presented in [12], where a DLL generates multiple output phases, which are selected by a digital algorithm to shift the phase and frequency. A similar design in [13] has both a dummy and an actual delay-locked loop, the dummy one monitoring the number of delay taps in a full period. This design has a highly reconfigurable waveform frequency selection, with an SRAM that allows reprogrammable waveform selection. It demonstrates the flexibility of a phase-control system compared with a frequency-control system. In [14], the phase control system is a delay line with switchable delay cells. The total delay is the sum of the delays in the signal path, and is digitally controlled to produce triangular frequency modulation (see Figure 1-10 for an example waveform). The argument for using a delay cell array is that, in contrast to a PLL, random edge jitter can be suppressed because random period jitter is not accumulated by a VCO. For our purpose, however, the maximum phase deviation has to be tightly controlled because it determines the size of the FIFO buffer needed, so uncalibrated delays cannot be used to generate the phase modulation.
Figure 1-9: SS is generally achieved by modulating the feedback path or the LF of a PLL.
Figure 1-10: Triangular Frequency Modulation Example.
Figure 1-11: Example of compensated-phase rotation technique for SS clocking.
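The phase-selection principle behind [12] can be illustrated with the sketch below, which assumes an ideal DLL with Q equally spaced taps and a multiplexer that advances by one tap every K output cycles. Each step moves the selected edge earlier by T0/Q, so the average output frequency becomes f0/(1 - 1/(QK)). The tap schedule and function names are hypothetical; varying K over time would turn this constant frequency offset into a frequency modulation.

```python
def selected_edge_times(f0, q, cycles_per_step, n_cycles):
    """Edge times when a mux advances one DLL tap every `cycles_per_step` cycles.

    Assumes an ideal DLL whose Q taps are spaced by T0/Q. Each tap step moves
    the selected edge earlier by T0/Q (a phase advance of 2*pi/Q). The tap
    index is left unwrapped; a real multiplexer would wrap it modulo Q.
    """
    t0 = 1.0 / f0
    edges = []
    for k in range(n_cycles):
        tap = k // cycles_per_step
        edges.append(k * t0 - tap * t0 / q)
    return edges


def average_frequency(edges):
    """Average frequency over the whole edge record."""
    return (len(edges) - 1) / (edges[-1] - edges[0])


if __name__ == "__main__":
    f0, q, k = 5e9, 32, 4
    ratio = average_frequency(selected_edge_times(f0, q, k, 4001)) / f0
    print(ratio, 1 / (1 - 1 / (q * k)))  # both about 1.00787
```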
The newer phase-controlled architectures all rely on digital circuits to generate the output phase waveform. Since we use a fine 65nm CMOS process, we expect a phase-controlled architecture to benefit from the low power and area of digital processing of the phase waveform. Furthermore, previous work using a digitally controlled phase output shows that quantized phases can reliably create spread-spectrum clocks, an observation we will verify numerically in the next chapter.
Figure 1-12: 2 Implementations of Spread-Spectrum using digital phase control. In the first case, the phase outputs of a DLL are muxed. In the second case, the delay from a Voltage-Controlled Delay Line (VCDL) is modulated.
Chapter 2
Behavioral Study and System Proposal
2.1 Introduction
This chapter introduces much of the notation used later in the thesis. It includes a derivation of the attenuation for an arbitrary spread-spectrum waveform, explains the choice of frequency modulation waveform, and analyzes the effect of various non-idealities on the performance of the spread-spectrum block. After a careful numerical investigation, we construct a new architecture for spread spectrum that is both simple and effective in meeting our design specifications.
2.2 The Mathematics of Spread-Spectrum
2.2.1 A Toy Model: Single-tone FM Modulation
A simple example of a spread-spectrum clock is a clock with a sinusoidally varying frequency. For example, a toy unspread clock can be a sine wave of the form $V_{unspread}(t) = A\cos(2\pi f_0t)$, where f0 is the clock frequency. The spread clock is then defined to be:
$$V_{spread}(t) \equiv A\cos(\phi(t)) \tag{2.1}$$
where
$$\phi(t) \equiv 2\pi f_0t + \phi_m\sin(2\pi f_mt) \tag{2.2}$$
$$\phi_m \equiv \frac{\Delta f}{f_m} \tag{2.3}$$
Figure 2-1: Illustration of single-tone FM modulated signal ($f_0 = 1/\pi$ Hz, $f_m = 1/(20\pi)$ Hz, $\phi_m = 1.4\pi$ radians, $\Delta f = 0.07$ Hz).
We will use this example to illustrate various terms.
• The maximum phase deviation is defined as the maximum phase difference between Vspread and Vunspread, and will generically be called φm.
• The instantaneous frequency of the clock, called f(t), is computed as follows:
$$f(t) \equiv \frac{1}{2\pi}\frac{d\phi(t)}{dt} = f_0 + \Delta f\cos(2\pi f_mt)$$
• The carrier frequency, f0, is the center frequency around which the instantaneous frequency varies. It is typically much higher than the modulation frequency, fm, which is the frequency at which f(t) varies.
• The modulation depth of the signal is $\max|f(t) - f_0| = \Delta f$.
• The frequency spread is defined as the frequency range covered by the instantaneous frequency. In this case, it is just 2∆f.
Figure 2-2: Illustration of the first 3 Bessel Functions.
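The definitions above can be checked numerically. The Python sketch below uses the parameters from the caption of Figure 2-1 and compares a finite-difference estimate of f(t) = (1/2π)dφ/dt with the closed form f0 + Δf cos(2πfm t); it also confirms that φm = Δf/fm gives φm = 1.4π for these values. The helper names are illustrative only.

```python
import math

# Parameters taken from the caption of Figure 2-1.
F0 = 1 / math.pi         # carrier frequency (Hz)
FM = 1 / (20 * math.pi)  # modulation frequency (Hz)
DF = 0.07                # modulation depth Delta f (Hz)
PHI_M = DF / FM          # eq. (2.3): maximum phase deviation (rad)


def phase(t):
    """Eq. (2.2): phi(t) = 2*pi*f0*t + phi_m*sin(2*pi*fm*t)."""
    return 2 * math.pi * F0 * t + PHI_M * math.sin(2 * math.pi * FM * t)


def inst_freq_numeric(t, h=1e-4):
    """f(t) = (1/2pi) dphi/dt, estimated with a central finite difference."""
    return (phase(t + h) - phase(t - h)) / (2 * h) / (2 * math.pi)


def inst_freq_closed_form(t):
    """f0 + Delta f * cos(2*pi*fm*t)."""
    return F0 + DF * math.cos(2 * math.pi * FM * t)


if __name__ == "__main__":
    print(PHI_M / math.pi)  # about 1.4
    for t in (0.0, 5.0, 17.3, 40.0):
        print(t, inst_freq_numeric(t), inst_freq_closed_form(t))
```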
For this toy model, we can decompose Vspread(t) in terms of the Bessel functions of the first kind of order n, called Jn, and easily obtain its Fourier transform:
$$V_{spread}(t) = A\sum_{n=-\infty}^{\infty}J_n(\phi_m)\cos\left(2\pi(f_0+nf_m)t\right) \tag{2.4}$$
The output spectrum of Vspread is only nonzero at f0 + n·fm for n ∈ Z. Furthermore, the attenuation, which we define as the ratio between the highest PSD peak of the unspread clock and that of the spread clock, is proportional to $|1/J_0(\phi_m)|^2$, which is only a function of φm, the maximum phase deviation. For large phase deviations φm, using the envelope of the large-argument asymptote $J_0(x) \approx \sqrt{2/(\pi x)}\cos(x - \pi/4)$, the attenuation scales as follows:
$$\left|\frac{1}{J_0(\phi_m)}\right|^2 \approx \frac{\pi\phi_m}{2} \tag{2.5}$$
We see that the attenuation grows linearly with φm; equivalently, the height of the spread peak scales inversely with φm. This implies a 3dB per octave scaling between attenuation and maximum phase deviation. This observation can be generalized to more complicated modulation schemes, and is a principal tradeoff in the design of the spread-spectrum clock generator.
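Eq. (2.4) and the carrier attenuation can be checked without special-function libraries. The sketch below evaluates integer-order Bessel functions from the integral representation $J_n(x) = \frac{1}{2\pi}\int_0^{2\pi}\cos(n\tau - x\sin\tau)\,d\tau$ with the trapezoidal rule (an implementation choice for this illustration, not the method used in this work), reconstructs the spread clock from a truncated Bessel sum, and compares $|1/J_0(\phi_m)|^2$ with $\pi\phi_m/2$ at a point where the $J_0$ envelope is reached.

```python
import math


def bessel_j(n, x, m=4096):
    """Integer-order Bessel function of the first kind, J_n(x).

    Uses J_n(x) = (1/2pi) * integral_0^{2pi} cos(n*tau - x*sin(tau)) dtau.
    The integrand is 2pi-periodic, so the plain trapezoidal rule converges
    very quickly once m is well above |n| + |x|.
    """
    total = 0.0
    for i in range(m):
        tau = 2 * math.pi * i / m
        total += math.cos(n * tau - x * math.sin(tau))
    return total / m


def spread_direct(t, a, f0, fm, phi_m):
    """Eqs. (2.1)-(2.2): A*cos(2*pi*f0*t + phi_m*sin(2*pi*fm*t))."""
    return a * math.cos(2 * math.pi * f0 * t + phi_m * math.sin(2 * math.pi * fm * t))


def spread_bessel(t, a, f0, fm, phi_m, n_max):
    """Eq. (2.4) truncated to |n| <= n_max."""
    return a * sum(bessel_j(n, phi_m) * math.cos(2 * math.pi * (f0 + n * fm) * t)
                   for n in range(-n_max, n_max + 1))


def carrier_attenuation_db(phi_m):
    """10*log10(|1/J0(phi_m)|^2): unspread peak over the spread carrier line."""
    return -20 * math.log10(abs(bessel_j(0, phi_m)))


if __name__ == "__main__":
    f0, fm, phi_m = 1 / math.pi, 1 / (20 * math.pi), 1.4 * math.pi  # Figure 2-1
    for t in (0.3, 2.0, 7.7):
        print(t, spread_direct(t, 1.0, f0, fm, phi_m),
              spread_bessel(t, 1.0, f0, fm, phi_m, n_max=25))
    # Compare with the envelope pi*phi_m/2 where cos(phi_m - pi/4) = 1.
    x = math.pi / 4 + 20 * math.pi
    print(carrier_attenuation_db(x), 10 * math.log10(math.pi * x / 2))
```

Near the zeros of J0 the carrier ratio grows without bound, so the πφm/2 expression is best read as the envelope of the carrier attenuation.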
2.2.2 Generalization to Arbitrary Modulation Waveform
In this section, we consider a generic modulation waveform and a generic clock wave-
form instead.
A generic clock waveform u(t) can be written as a Fourier series:
$$u(t) = \sum_{k=-\infty}^{\infty}\frac{I_{0k}}{2}\exp(2\pi jkf_0t) \tag{2.6}$$
For example, if u(t) is a square wave of amplitude A, the harmonic values I0k would be
$$I_{0k} = 2A\,\frac{\sin(k\pi/2)}{k\pi} \tag{2.7}$$
Consider a modulation shape V(t), a function with range [−1, 1] that describes the shape of the frequency modulation. In general, a frequency-spread clock signal us(t) with instantaneous frequency f0(1 + δV(t)) is then written as:
$$u_s(t) = u\left(t + \frac{\delta}{f_m}\int_0^{f_mt}V(\tau)\,d\tau\right) = \sum_{k=-\infty}^{\infty}I_{mk}(t) \tag{2.8}$$
$$I_{mk}(t) \equiv \frac{I_{0k}}{2}\exp\left(2\pi jkf_0\left(t + \frac{\delta}{f_m}\int_0^{f_mt}V(\tau)\,d\tau\right)\right) \tag{2.9}$$
Figure 2-3: Example of SS on a square clock.
We will use this example once again to practice our terminology:
• The modulation frequency is fm.
• The frequency spread is 2δf0.
• The carrier frequency is f0.
• The maximum phase deviation is
$$\phi_m = \left|\frac{2\pi f_0\delta}{f_m}\int_0^{1/2}V(u)\,du\right|$$
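Using the time shift in eq. (2.8), the maximum phase deviation expressed in carrier periods is $(f_0\delta/f_m)\max_x\left|\int_0^x V(u)\,du\right|$. The sketch below computes this for a triangle and a sawtooth, both phase-aligned (an assumption of this illustration) so that the running integral peaks at u = 1/2 with value 1/4, and inverts the relation to find the δ required for φm = 64 periods at f0 = 5 GHz and fm = 38 kHz. The waveform and function names are hypothetical.

```python
def triangle(u):
    """Triangle V(u) in [-1, 1] with period 1, V(0) = 0 and V(1/4) = 1."""
    return 1 - 2 * abs(2 * ((u + 0.25) % 1.0) - 1)


def sawtooth(u):
    """Sawtooth V(u) = 1 - 2u on [0, 1), i.e. a linear sweep from +1 down to -1."""
    return 1 - 2 * (u % 1.0)


def max_running_integral(v, m=4096):
    """max over x of |integral_0^x V(u) du| within one period (trapezoidal rule)."""
    acc, best = 0.0, 0.0
    prev = v(0.0)
    for i in range(1, m):
        cur = v(i / m)
        acc += 0.5 * (prev + cur) / m
        best = max(best, abs(acc))
        prev = cur
    return best


def phi_m_periods(f0, fm, delta, v):
    """Maximum phase deviation in carrier periods: (f0*delta/fm) * max|int V|."""
    return f0 * delta / fm * max_running_integral(v)


def delta_for_phi_m(f0, fm, phi_m_target, v):
    """Fractional frequency deviation delta needed for a target phi_m (periods)."""
    return phi_m_target * fm / (f0 * max_running_integral(v))


if __name__ == "__main__":
    f0, fm = 5e9, 38e3
    d = delta_for_phi_m(f0, fm, 64, sawtooth)
    print(f"delta = {d:.6f}, spread 2*delta*f0 = {2 * d * f0 / 1e6:.3f} MHz")
    print(f"max time deviation = {64 / f0 * 1e9:.1f} ns")
```

For these numbers δ ≈ 0.195%, i.e. a frequency spread 2δf0 ≈ 19.5 MHz at 5 GHz and a maximum time deviation φm/f0 = 12.8 ns.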
2.2.3 Modulation Waveform Selection
Our goal is to select a modulation shape V(t) that creates maximum attenuation for a given fm and δ. A simple first guess is a triangle wave for V(t). This guess comes from the intuition that the frequency-spread spectrum will be approximately flat, because the instantaneous frequency of the output waveform spends equal amounts of time at every frequency in the frequency spread. This intuition is almost true, but inaccurate, as we can see in the spectrum of Figure 2-9: the output spectrum has oscillations instead of being flat. Another observation is that, even for the triangle waveform, the attenuation is largely not a function of the frequency spread, but rather of the maximum phase deviation (Figure 2-8). This is rather non-intuitive. A naive argument goes as follows: the total power of the signal is conserved before and after spread spectrum, so if the frequency spread increases, the power spectral density must go down proportionately to conserve the integrated area under the box (Figure 2-5). We would then expect a 3dB per octave relationship between attenuation and δ, not between attenuation and φm. A more careful analysis is needed.
Figure 2-4: Illustration of the terminology using a sawtooth modulation waveform.
Stationary-Phase Approximation:
Figure 2-5: Picture showing an intuitive but incorrect derivation of spectral attenuation. This picture seems to suggest that spectral attenuation should scale linearly with δ, which is false in general.
For typical clock signals, the harmonic content decays much faster than for a square wave because higher frequencies are attenuated. To calculate the attenuation due to spread spectrum, we therefore only need to consider the fundamental (k = 1). The Fourier transform of the fundamental is then
$$\mathcal{F}(I_{m1}(t)) \equiv I_{m1}(\omega) = \int_{-\infty}^{\infty}dt\,\exp(-j\omega t)\,I_{m1}(t) \tag{2.11}$$
We will now consider a V(t) that is roughly triangular, although the analysis, notation, and conclusions remain the same for a generic V(t). When the phase oscillates rapidly enough, we can evaluate the integral using the stationary-phase method. For a given ω ∈ (2πf0(1 − δ), 2πf0(1 + δ)), the instantaneous angular frequency matches that value twice per modulation period. We index the modulation periods by l, and the αth time within a period that the waveform frequency matches ω by α, calling that time $t_{l\alpha}$. An illustration of this terminology is shown in Figure 2-6. The stationary-phase method asserts that most of the contribution to $I_{m1}(\omega)$ comes from the times when the integrand phase is stationary, which are around those 2 times in each modulation period. Elsewhere, the integrand phase rotates quickly compared with the slowly varying modulation, so the integrand roughly cancels out. If this assertion is true, we can take a Taylor expansion of the phase of the integrand around those points.
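We therefore have:
$$I_{m1}(\omega) = \int_{-\infty}^{\infty}dt\,\exp(-j\omega t)\,\frac{I_{01}}{2}\exp\left(2\pi jf_0\left(t + \frac{\delta}{f_m}\int_0^{f_mt}V(\tau)\,d\tau\right)\right) \tag{2.12}$$
$$\equiv \int_{-\infty}^{\infty}dt\,\frac{I_{01}}{2}\exp(j\phi(t)) \tag{2.13}$$
$$\phi(t) = -\omega t + 2\pi f_0\left(t + \frac{\delta}{f_m}\int_0^{f_mt}V(\tau)\,d\tau\right)\Big|_{t\approx t_{l\alpha}} \tag{2.14}$$
$$\approx \phi(t_{l\alpha}) + \pi f_0\delta V'(t_{l\alpha})(t - t_{l\alpha})^2 \tag{2.15}$$
$$I_{m1}(\omega) \approx \sum_{\alpha=1}^{2}\sum_{l=-\infty}^{\infty}\int_{lT_m}^{(l+1)T_m}dt\,\frac{I_{01}}{2}\exp\left(j\phi(t_{l\alpha}) + j\pi f_0\delta V'(t_{l\alpha})(t - t_{l\alpha})^2\right) \tag{2.16}$$
$$I_{m1}(\omega) \approx \sum_{\alpha=1}^{2}\sum_{l=-\infty}^{\infty}\Gamma(t_{l\alpha})\sqrt{\frac{1}{j2\pi\delta f_0V'(t_{l\alpha})}}\,\frac{I_{01}}{2}\exp(j\phi(t_{l\alpha})) \tag{2.17}$$
$$I_{m1}(\omega) \propto \frac{1}{T_m}\sum_{\alpha}\Gamma(t_{0\alpha})\sqrt{\frac{1}{j\delta f_0V'(t_{0\alpha})}} \tag{2.18}$$
$$\Gamma(t_{l\alpha}) = S((l+1)T_m - t_{l\alpha}) + S(t_{l\alpha} - lT_m) + C(t_{l\alpha} - lT_m) + C((l+1)T_m - t_{l\alpha}) \tag{2.19}$$
$$C(x) \equiv \int_0^x dt\,\cos(t^2) \tag{2.20}$$
$$S(x) \equiv \int_0^x dt\,\sin(t^2) \tag{2.21}$$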
C(x) and S(x) are the Fresnel integrals (graphed in Figure 2-7). In the limit where $T_m \gg \sqrt{1/(\delta f_0V'(t_{l\alpha}))}$, we can extend the limits of integration to ±∞, and approximate $\Gamma \approx \sqrt{2\pi}$.
Equations (2.17) and (2.18) confirm the 2 crucial observations made previously about spread spectrum:
• $|I_{m1}|^2 \propto \frac{1}{\delta f_0V'(t)T_m^2}$. For triangular modulation, this value is a constant, and is in fact proportional to the inverse of the maximum phase deviation. We recover the 3dB per octave scaling between maximum phase deviation and attenuation. We note that a triangle or sawtooth waveform is uniquely specified by only 2 of the following 3 parameters: δ, fm, and maximum phase deviation. At fixed maximum phase deviation, changing fm (and correspondingly δ) has little effect on the attenuation.
• We expect a ripple in the spectrum because of constructive or destructive interference as we vary ω. This comes from the sum of the $\exp(j\phi(t_{l\alpha}))$ terms and the oscillation of the Fresnel functions. In fact, as the $t_{l\alpha}$ approach the edges of the modulation waveform, at its peak or its trough, we should expect constructive interference because $t_{l1}$ and $t_{l2}$ become identical. This is confirmed by numerical computations of the power spectrum (Figure 2-9).
Figure 2-6: Illustration of the indexing terminology of $t_{l\alpha}$.
Figure 2-7: Plot of the Fresnel Functions, C(x) and S(x), as defined above.
Figure 2-8: Spectra using triangular modulation for several values of fm ($\phi_m = 64\times 2\pi$). The attenuation level is largely independent of fm at fixed φm.
These 2 observations allow us to conclude that the optimal modulation waveform will de-emphasize (spend less time at) the edges of the spectral band, unlike the triangular modulation waveform, which spends equal amounts of time at all frequencies. This is traditionally done with a "Hershey-Kiss" profile for V(t) (see Figure 2-10), but such a profile is difficult to reproduce accurately with a digital FSM. Furthermore, a sawtooth-like waveform should perform better than a triangular waveform because it prevents constructive interference from occurring at the edges of the spectrum (this claim is verified numerically in Figure 2-11). Further work has been done on waveforms that even out the oscillations due to the Fresnel functions, but such a so-called "optimal waveform" requires a very complex modulation scheme to reproduce accurately [15] [16]. We therefore choose a sawtooth waveform for our final design.
Figure 2-9: Spectral Oscillation Effect observed for Sawtooth and Triangular Modulation.
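The spectral ripple and the triangle/sawtooth comparison can be explored with a brute-force computation. The sketch below evaluates the line powers of the spread fundamental over one modulation period with a plain DFT (chosen for self-containment, not speed), using hypothetical phase-aligned waveform definitions. It reports attenuation in dB relative to the unspread carrier; it is not the simulation used to produce Figures 2-8 to 2-11, and no particular numbers from those figures are implied.

```python
import cmath
import math


def triangle(u):
    """Triangle V(u) in [-1, 1], period 1, with V(0) = 0 and V(1/4) = 1."""
    return 1 - 2 * abs(2 * ((u + 0.25) % 1.0) - 1)


def sawtooth(u):
    """Sawtooth V(u) = 1 - 2u on [0, 1): a linear sweep from +1 down to -1."""
    return 1 - 2 * (u % 1.0)


def line_powers(v, phi_m_periods, m=512):
    """Power of each spectral line f0 + n*fm of the spread fundamental (n mod m).

    Over one modulation period (normalized time u in [0, 1)) the fundamental is
    exp(j*2*pi*f0*t) * p(u), with p(u) = exp(j*2*pi*Phi(u)) and
    Phi(u) = 4 * phi_m_periods * integral_0^u V, the excess phase in carrier
    periods (the factor 4 makes max Phi = phi_m_periods for both waveforms).
    The line powers are |c_n|^2, where c_n are the Fourier-series coefficients
    of p(u), approximated by a plain O(m^2) DFT. m should comfortably exceed
    8 * phi_m_periods so that the spread band is not aliased.
    """
    scale = 4 * phi_m_periods
    phase = [0.0] * m
    for k in range(1, m):
        # Trapezoidal running integral (exact for piecewise-linear V on this grid).
        phase[k] = phase[k - 1] + scale * 0.5 * (v((k - 1) / m) + v(k / m)) / m
    p = [cmath.exp(2j * math.pi * ph) for ph in phase]
    twiddle = [cmath.exp(-2j * math.pi * k / m) for k in range(m)]
    powers = []
    for n in range(m):
        c = sum(p[k] * twiddle[(n * k) % m] for k in range(m)) / m
        powers.append(abs(c) ** 2)
    return powers


def attenuation_db(v, phi_m_periods, m=512):
    """The unspread carrier is a single line of power 1, so the attenuation
    is -10*log10 of the strongest spread line."""
    return -10 * math.log10(max(line_powers(v, phi_m_periods, m)))


if __name__ == "__main__":
    for phi_m in (8, 16, 32):
        print(phi_m, round(attenuation_db(triangle, phi_m), 1),
              round(attenuation_db(sawtooth, phi_m), 1))
```

Because the line powers sum to one (Parseval), any flattening of the spectrum shows up directly as a lower strongest line, which is the quantity the waveform choice aims to minimize.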
Effect of Spectrum Analyzer:
It may seem that neither the value of fm nor the value of δ came into play in the waveform selection. This is only partially true. What we have considered so far is an ideal spectrum analyzer, which computes the Fourier coefficients perfectly. In the real world, the spectrum analyzer itself has an impulse response that we have to account for. An analysis including the spectrum analyzer response shows that fm needs to be faster than the Resolution Bandwidth (RBW) of the spectrum analyzer for the analysis in the previous section to hold (see Appendix A).
Figure 2-10: Possible modulation waveforms V(t): triangle, sawtooth (better), Hershey-Kiss (optimal).
2.3 Mathematics of a Phase-Control System
In this section, we explore the non-idealities of using a phase-control system, as opposed to a frequency-control system, to perform spread spectrum. While a frequency-control system like a PLL is more faithful to the mathematical analysis in the previous section, a phase-control system performs discontinuous frequency modulation. We therefore have to explore how discrete time and discrete phases affect the SS performance.
Figure 2-11: Comparison between the attenuation levels for triangular and sawtooth modulation, vs φm.
2.3.1 Discrete-time and Discrete-Phase System
For a realistic digital phase-control system to create SS, we have to ensure that neither the phase quantization nor the discrete phase update at the digital clock significantly affects the attenuation of the spectrum. We call the time period between phase updates Tdig. This period is typically the clock period of the digital FSM that creates the modulation waveform. For a given V(t), we then have a phase φn at time t = nTdig, since the phase is updated discretely. For simplicity, consider Tm = NTdig, where N is a positive integer; φn is therefore equal to φn+N. Let us calculate the Fourier transform:
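$$I_{m0}(\omega) = \int_{-\infty}^{\infty}dt\,\exp(-j\omega t)\,I_{m0}(t) \tag{2.22}$$
$$= \sum_{l=-\infty}^{\infty}\sum_{n=0}^{N-1}\int_{lT_m+nT_{dig}}^{lT_m+(n+1)T_{dig}}dt\,\exp(-j\omega t)\exp(j\phi_n) \tag{2.23}$$
$$= \left[\sum_{l=-\infty}^{\infty}\exp(-jl\omega T_m)\right]\frac{2\sin(\omega T_{dig}/2)}{\omega}\sum_{n=0}^{N-1}\exp\left(-j\omega\left(n+\tfrac{1}{2}\right)T_{dig}\right)\exp(j\phi_n) \tag{2.24}$$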
The term in brackets is the same as before, and comes from the periodicity of the modulation. However, the discrete phase update adds "images" to the spectrum, which are weighted by the sinc function (Figure 2-12). Indeed, the second summation is invariant (up to a sign) under ω → ω + 2π/Tdig, which implies that the spectral contribution at f0 will also appear at f0 + n/Tdig for integer n, except that it will be weighted by the sinc term in front of the sum. Furthermore, the continuous Fourier transform has been turned into a discrete Fourier transform. We assume that in the limit of small Tdig, we recover the continuous Fourier transform. Because the images are suppressed by the sinc weighting, we do not expect them to affect the overall attenuation level.
Figure 2-12: Power Spectra for input and output 5GHz clock (Tdig = 3.2ns). Note the images appearing at 5GHz ± 312.5MHz.
Dividing the spread clock down:
The final clock used for digital clocking runs at 1.25GHz. Our block must produce 4 phases of the 1.25GHz clock from a spread clock at 5GHz. This divide-down operation has several direct effects:
• We immediately lose 6dB of attenuation. This is because the maximum phase deviation decreases by a factor of 4, i.e., two octaves at 3dB per octave.
• For a phase update rate faster than 1.25GHz, there is a time truncation error, because the phase only updates at clock transitions, which happen every 800.0ps. This is unlikely to matter, because the digital FSM will most likely run at a subrate of the output 1.25GHz clock.
• The width of the spectrum will shrink by a factor of 4. This is because the width of the spectrum is related to max|f(t) − f0|, and the instantaneous frequency deviation shrinks by 4 through the divider.
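The numbers quoted in Figure 2-12 and in the list above follow from simple arithmetic, collected in the sketch below. The function name and dictionary keys are hypothetical; the 6dB figure assumes the 3dB per octave scaling derived earlier.

```python
import math

F_SPREAD = 5e9  # spread clock frequency (Hz)
DIV = 4         # divide ratio down to the 1.25 GHz digital clock
T_DIG = 3.2e-9  # phase update period (s), as in Figure 2-12


def divide_down_effects(f_spread=F_SPREAD, div=DIV, t_dig=T_DIG):
    """Back-of-the-envelope numbers for the divide-by-`div` operation."""
    return {
        # Images of the discrete-time phase update sit at f0 +/- n/Tdig.
        "image_offset_hz": 1 / t_dig,
        # Phase deviation shrinks by div; at 3 dB/octave that costs 10*log10(div).
        "attenuation_loss_db": 10 * math.log10(div),
        # Output edges (possible phase-update instants) are div/f_spread apart.
        "output_period_s": div / f_spread,
        # Instantaneous frequency deviation, hence spectral width, shrinks by div.
        "spread_width_scale": 1 / div,
    }


if __name__ == "__main__":
    for name, value in divide_down_effects().items():
        print(f"{name}: {value:.6g}")
```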
Figure 2-13: Illustration for the definitions used for discrete-time and phase system.
2.3.2 Numerics of Phase and Time Quantization
There are two main parameters of interest: the quantization level Q, which we define as the number of output phases our phase-control system can produce in a 5GHz period, and the digital update period Tdig, which is the time between phase updates (Figure 2-13). The 2 parameters are interlinked. If Tdig is slow but Q is large, there will be phase skips much larger than the minimum phase step. If Tdig is fast but Q is small, the phase jumps will be limited by the minimum phase step of the phase control system. Finally, if fm increases at fixed φm, Tdig has to decrease, or phase skips will occur. All these effects are investigated below and graphed in Figure 2-14.
We observe that Q is a relatively benign parameter, but if we want to meet attenuation requirements at 76kHz or faster, we need Tdig < 3.2ns. This is a reasonable clocking rate for a digital system. Q is then set to 32 and fm to 38kHz, not because of the attenuation specification, but because we want to bound the minimum phase step to satisfy cycle-to-cycle jitter requirements (although fm = 76kHz would still meet jitter requirements, fm = 38kHz leaves a more comfortable, conservative margin).
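A back-of-the-envelope check of the phase-skip condition: with |V| ≤ 1, the phase slope is at most f0·δ carrier periods per second, so each update of period Tdig can move the phase by up to f0·δ·Tdig periods, i.e. Q·f0·δ·Tdig LSBs. The sketch below assumes the sawtooth relation φm = f0δ/(4fm) (φm in periods, running integral peaking at 1/4) and φm = 64 periods; it is a simplified estimate, not the simulation behind Figure 2-14, and the names are hypothetical.

```python
F0 = 5e9       # carrier (Hz)
Q = 32         # output phases per 5 GHz period
T_DIG = 3.2e-9 # phase update period (s)
PHI_M = 64     # maximum phase deviation, in 5 GHz periods


def delta_for(fm, phi_m=PHI_M, f0=F0):
    """Fractional deviation of a +/-1 sawtooth whose running integral peaks
    at 1/4 of a period: phi_m = f0*delta/(4*fm)."""
    return 4 * phi_m * fm / f0


def max_step_lsb(fm, t_dig=T_DIG, q=Q):
    """Largest phase change between two updates, in LSBs (1 LSB = 1/Q period)."""
    return F0 * delta_for(fm) * t_dig * q


if __name__ == "__main__":
    print(f"phase LSB = {1 / (F0 * Q) * 1e12:.2f} ps")
    print(f"update rate = {1 / T_DIG / 1e6:.1f} MSample/s")
    for fm in (38e3, 76e3, 152e3):
        print(f"fm = {fm / 1e3:.0f} kHz: {max_step_lsb(fm):.3f} LSB per update")
```

With these assumptions the step is just under 1 LSB per update at 38 kHz (about 0.996 LSB) and roughly 2 LSBs at 76 kHz, so at higher fm the quantized phase must skip codes unless Tdig is reduced.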
2.4 Modelling Phase-Control System Non-Idealities
Previously, our numerical simulations assumed that the phases would be exactly (n/Q) × 2π, where n is the truncated value of the analog phase. In general, we expect the output phase to differ because of non-systematic errors (random mismatch in the silicon), but also because of systematic errors (quadrature phase errors, anharmonicity of the input signals, etc.). We explore the effect of those errors in this section.
2.4.1 General INL and DNL Modelling
To model those errors, we introduce the term INL, which is typically used for data converters. Consider a phase control system with Q output phases per revolution.
Figure 2-14 panels: (a) fm = 38kHz, (b) fm = 76kHz, (c) fm = 152kHz, (d) fm = 304kHz.
Figure 2-14: Plot of the attenuation level of the divide-by-4 clock as a function of quantization level and clock period, for fm = 38kHz, 76kHz, 152kHz, and 304kHz respectively. Note that the quantization level is largely irrelevant, and is therefore set to meet cycle-to-cycle jitter specifications, not attenuation specifications.
A phase control system takes an input phase φi and tries to output a signal with output phase $\phi_n = \phi_i - n\frac{2\pi}{Q} - \phi_{offset}$, where φoffset is some offset phase that has no performance impact in our application. If we call the actual output phase φon, we define:
$$INL_n = Q\,\frac{\phi_n - \phi_{on}}{2\pi} \tag{2.25}$$
$$DNL_n = INL_{n+1} - INL_n \tag{2.26}$$
In our case, we measure INL and DNL in "milliperiods", a unit of phase (1 milliperiod = 2π/1000 radians). We then create a sinusoidal INL profile, characterized by INLmax and k:
$$INL_n = \left|INL_{max}\sin\left(\frac{n\pi k}{Q}\right)\right| \tag{2.27}$$
At fixed INLmax, the DNL values scale linearly with k; at fixed k, they scale linearly with INLmax. Therefore, DNL scales roughly as INLmax × k. Because we are interested in spectral attenuation, the phase INL itself is not the specification of interest; the DNL is. We therefore plot in Figure 2-15 the attenuation as a function of DNLrms, which we define to be:
$$DNL_{rms} \equiv \frac{1}{Q}\sqrt{\sum_{n=0}^{Q-1}DNL_n^2} \tag{2.28}$$
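The INL/DNL model of eqs. (2.26) to (2.28) translates directly into code. The sketch below generates the sinusoidal INL profile, takes the DNL cyclically (an assumption of this illustration: the last code wraps back to code 0, which is consistent because INL_Q = INL_0 for integer k), and evaluates DNLrms exactly as eq. (2.28) is written. Function names are hypothetical.

```python
import math


def sinusoidal_inl(inl_max, k, q):
    """Eq. (2.27): INL_n = |INL_max * sin(n*pi*k/Q)| for n = 0..Q-1 (milliperiods)."""
    return [abs(inl_max * math.sin(n * math.pi * k / q)) for n in range(q)]


def dnl_from_inl(inl):
    """Eq. (2.26), taken cyclically so the last step wraps back to n = 0."""
    q = len(inl)
    return [inl[(n + 1) % q] - inl[n] for n in range(q)]


def dnl_rms(dnl):
    """Eq. (2.28) as written: (1/Q) * sqrt(sum of DNL_n^2)."""
    return math.sqrt(sum(d * d for d in dnl)) / len(dnl)


if __name__ == "__main__":
    q = 32
    for inl_max in (50, 100, 200):
        row = [dnl_rms(dnl_from_inl(sinusoidal_inl(inl_max, k, q))) for k in (1, 3, 10)]
        print(inl_max, [f"{r:.2f}" for r in row])
```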
We observe that the attenuation is very insensitive to DNLrms; in fact, values of DNLrms up to 24 milliperiods have almost no effect (Figure 2-15).
Figure 2-15: Numerical simulation of the attenuation of the divide-by-4 clock output as a function of DNLrms. Data includes INLmax ∈ (0, 200) milliperiods, k ∈ (1, 10).
2.5 Architecture Proposal
2.5.1 System Requirements
We found that a phase control system realizing spread spectrum is very robust to non-idealities such as phase quantization, discrete-time updates, and phase DNL. In fact, we will reach the cycle-to-cycle jitter limit before causing significant attenuation degradation. We therefore set Q to 32, so that only 6.25ps phase steps are taken, and Tdig = 3.2ns, for a reasonable digital clocking speed. We leave ourselves another 6ps of maximum DNL, so that the maximum cycle-to-cycle jitter will be about 12ps. Finally, to achieve about 20dB of attenuation at the divide-by-4 clock output, we need to set φm = 64 periods at 5GHz.
2.5.2 SSCG via Phase Modulation
We now propose an architecture that creates the phase modulation pattern for spread spectrum with these specifications in mind. The key equation of this architecture is:
$$\sqrt{a(t)^2+b(t)^2}\,\sin\left(\omega t + \tan^{-1}\frac{a}{b}\right) = a(t)\cos(\omega t) + b(t)\sin(\omega t) \tag{2.29}$$
where $\tan^{-1}$ denotes the four-quadrant arctangent.
This equation tells us that by adding 2 quadrature signals with time-varying coefficients, we can reproduce any point on the IQ plane. To implement this equation, we take in anharmonic differential IQ clocks, which we filter using variable-gain filters to obtain the sine and cosine signals (Figure 2-16). The phase interpolator then sums them with the coefficients a(t) and b(t), which are digitally controlled by an FSM. The output signal is then amplified and restored to full rail swing, to be divided down. The divider is included in the block because routing 5GHz signals out of the block is very power inefficient. The filters require calibration because of the wide frequency range of operation (2.5GHz to 5GHz): the input amplitude to the phase interpolator is monitored to ensure it does not saturate, so the bandwidth of the filters must be tuned based on the input frequency and the process variations.
Figure 2-16: Architecture Proposal
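The phase-interpolation identity can be verified numerically. The sketch below uses the cosine form $a\cos(\omega t) + b\sin(\omega t) = \sqrt{a^2+b^2}\cos(\omega t - \theta)$, the form used in eq. (3.1), with θ computed by the four-quadrant `math.atan2` so that it holds in every quadrant of the IQ plane. The helper names are illustrative.

```python
import math


def interpolate(a, b, omega_t):
    """Weighted sum of the quadrature inputs: a*cos(wt) + b*sin(wt)."""
    return a * math.cos(omega_t) + b * math.sin(omega_t)


def amplitude_and_phase(a, b):
    """Equivalent single sinusoid sqrt(a^2+b^2) * cos(wt - theta).

    math.atan2 returns the four-quadrant angle, so theta is correct for all
    signs of a and b (a plain atan(b/a) only covers a > 0).
    """
    return math.hypot(a, b), math.atan2(b, a)


def coefficients_for_phase(theta, amplitude=1.0):
    """IQ coefficients that place the interpolator output at phase theta."""
    return amplitude * math.cos(theta), amplitude * math.sin(theta)


if __name__ == "__main__":
    for theta in (0.3, 2.0, -2.5):
        a, b = coefficients_for_phase(theta)
        r, th = amplitude_and_phase(a, b)
        print(f"theta={theta:+.2f}: a={a:+.3f}, b={b:+.3f}, r={r:.3f}, recovered={th:+.3f}")
```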
2.5.3 Additional Non-Idealities of Phase Modulator
We have previously investigated the effect of phase errors of a phase control system on the spread-spectrum performance. For a phase modulator, the input quadrature phase error will clearly contribute to the non-linearity of the phase-to-code relationship. However, input quadrature error leads to a smooth modulation of the phase over a period, so cycle-to-cycle jitter will not be significantly degraded. We therefore leave a 5 degree budget for input quadrature phase error.
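The claim that quadrature error produces a smooth, slowly varying phase error can be checked with a simple model. The sketch below assumes the sine input is skewed by ε (here 5 degrees) while the FSM still programs ideal coefficients a = cos θ, b = sin θ, and computes INL and DNL in milliperiods over the 32 codes. It isolates the quadrature error and ignores coefficient quantization and mismatch; the function names are hypothetical.

```python
import math

Q = 32


def interpolated_phase(theta, eps):
    """Output phase when the 'sin' input carries a quadrature error eps.

    a*cos(wt) + b*sin(wt + eps) = (a + b*sin(eps))*cos(wt) + b*cos(eps)*sin(wt),
    so the output phase is atan2(b*cos(eps), a + b*sin(eps)), with the ideal
    coefficients a = cos(theta), b = sin(theta).
    """
    a, b = math.cos(theta), math.sin(theta)
    return math.atan2(b * math.cos(eps), a + b * math.sin(eps))


def inl_dnl_milliperiods(eps, q=Q):
    """INL (ideal minus actual phase, in milliperiods) and cyclic DNL over q codes."""
    inl = []
    for n in range(q):
        ideal = 2 * math.pi * n / q
        err = ideal - interpolated_phase(ideal, eps)
        err = (err + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
        inl.append(err / (2 * math.pi) * 1000)
    dnl = [inl[(n + 1) % q] - inl[n] for n in range(q)]
    return inl, dnl


if __name__ == "__main__":
    inl, dnl = inl_dnl_milliperiods(math.radians(5))
    print(f"max |INL| = {max(abs(x) for x in inl):.2f} milliperiods")
    print(f"max |DNL| = {max(abs(x) for x in dnl):.2f} milliperiods")
```

In this model the INL reaches ε (about 13.9 milliperiods) at 90 degrees, but the DNL stays below roughly 3 milliperiods, far below the 31.25 milliperiod LSB.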
2.5.4 System Level Specifications
We have now determined the phase linearity and quantization resolution required of our phase control system. In particular, we have a clear idea of the phase update rate, and therefore of the clock speed requirements. We are ready to choose circuit topologies to satisfy the specifications below:

Specification                     Value
Phase Resolution                  32 phases per period
DNLrms                            < 12 milliperiods
max|DNL|                          < 31.2 milliperiods (1 LSB phase)
Phase Update Rate                 312 MSample/s
Input Quadrature Error            up to 5 degrees
Maximum time deviation            12.8ns (φm = 64 periods at 5GHz)
Modulation Frequency              38kHz
Frequency Modulation Waveform     Sawtooth

Table 2.1: System level specifications
2.6 Summary
In this chapter, a mathematical model of spread-spectrum waveforms was presented for both continuous-phase and discrete-phase systems. In the continuous-phase case, we derived explicit analytical expressions for the attenuation and verified them numerically. This allowed us to develop an intuition for the design tradeoffs of the spread-spectrum system, and to choose a suitable modulation waveform (sawtooth). We then analytically and numerically analyzed the non-idealities introduced by discrete-phase and discrete-time systems, and obtained bounds within which we can meet our target specifications. Finally, we proposed a phase-control architecture based on phase interpolation.
Chapter 3
Analog Circuit Design
3.1 Introduction
This chapter reviews schematic and layout implementation details for the analog circuits of the SSCG system. We divide the analog circuits into 2 main categories: the interpolator core and signal conditioning.
• Interpolator core: The interpolator core is a phase interpolator that takes in quadrature sine waves and is digitally controlled to produce an output sine wave with some intermediate phase value. The interpolator core's crucial specification is its DNL performance.
• Signal Conditioning: Signal conditioning refers to anything in the high-frequency clock signal path that is not the interpolator core. It consists of input filters, which make the rail-to-rail inputs more sinusoidal by filtering out the higher harmonics of the input clock, and a restoration circuit, which restores the weak output signal of the interpolator to a rail-to-rail signal. It also includes a divide-by-4 circuit, which divides the 5GHz clock down to quadrature phases of a 1.25GHz clock.
We defer discussion of the peak-detector circuits, voltage regulator circuit, and clock distribution scheme to Chapter 4.
Figure 3-1: IQ plane example, illustrating the terminology for quadrant numbering and IQ coefficient quantization.
3.2 Interpolator Core
3.2.1 Code Selection
The phase interpolator relies on the equation:
$$a(t)\cos(\omega t) + b(t)\sin(\omega t) = \sqrt{a^2+b^2}\,\cos\left(\omega t - \tan^{-1}\frac{b}{a}\right) \tag{3.1}$$
Each signal characterized by a(t) and b(t) can be mapped to a point (a(t), b(t)) on a so-called IQ plane, shown in Figure 3-1. The IQ plane is characterized by a grid resolution N, which we define as the number of non-zero values an or bn can take in a particular quadrant. We must choose N, and then find a mapping between a given phase φn and its corresponding coefficients, or IQ plane lattice point, which we call (an, bn). As a reminder, we have decided on 32 phases per period, and we are aiming for an absolute maximum DNLrms of 12 milliperiods, including quadrature phase error, coefficient quantization error, input and phase interpolator non-linearity, and Monte-Carlo mismatch. Intuitively, as we increase N, we can hit lattice points closer and closer, both in amplitude and in phase, to the ideal amplitude 1 and the ideal phase φn. However, increasing the granularity N also implies a larger number of DAC unit current sources, which increases the area and power of the interpolator stage.
Figure 3-2: Binary scaled interpolator using linear approximation. Because of the lack of calibration in our final circuit, we opted for a thermometer DAC current array instead. Figure from [19].
A simple IQ point selection, used for example in [17] [18], is linear coefficient selection. The circuitry for such a design is shown in Figure 3-2. This means that instead of having $b_n \approx \pm\sqrt{1-a_n^2}$, we set $b_n = 1 - a_n$. This strategy produces a diamond-shaped point selection on the IQ plane, as shown in Figure 3-3. The advantage of this linear point selection is the relative simplicity of generating bn from an: with a binary array, no decoders are even needed, as seen in Figure 3-2. The disadvantage is that the coefficient selection inherently introduces additional phase DNL. In [17], a digital predistortion (DPD) method was used to compensate for the nonlinearity. In [18], the interpolator was used in a CDR circuit, where the monotonicity, not the linearity, of the interpolator was crucial.
Another factor that led us to choose a different IQ point selection from linear interpolation is the concern for amplitude modulation. The amplitude modulation produced by the linear approximation is inherent to the diamond shape and does not shrink as the IQ grid step is refined (the amplitude dips to 1/√2 ≈ 0.71 at 45°). Since the interpolator drives a highly saturated amplifier, the final amplitude should be rail-to-rail and independent of the code. However, small amplitudes can have a significant effect on the overall delay. Because we target a phase step of only 6.25ps at 5GHz, even small delay differences lead to amplitude-to-phase conversion on the order of the step size. Furthermore, this amplitude modulation is systematic: it recurs every period. The spread-spectrum system will therefore exhibit spurious modulation inside the clock frequency band, which can lead to unexpected spectral oscillation and degradation of the attenuation specification.
Figure 3-3: Constellation Diagram for interpolation using the linear approximation and using correct trigonometric values.
In Figure 3-4, we graph the effect of IQ grid resolution on DNLrms. Setting an amplitude modulation criterion of less than 10%, we then select an IQ grid granularity such that DNLrms is a small fraction of the LSB phase step. Let us call N the resolution of the IQ grid in any given quadrant (so a quadrant has (N + 1)² points). We look for an N such that DNLrms < LSB/5, a slightly conservative choice. N = 16 handily satisfies this restriction, and is our final choice.
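To illustrate how the choice N = 16 can be evaluated, the sketch below maps each of the 32 phases to an IQ lattice point by rounding N·cos φn and N·sin φn independently (an assumed selection rule; the actual code table may differ), and reports DNLrms per eq. (2.28) as written, the maximum |DNL|, and the amplitude spread. For comparison it also evaluates the amplitude of the diamond-shaped linear selection, which dips to 1/√2 at 45° regardless of grid resolution. Function names are hypothetical.

```python
import math


def lattice_point(theta, n_grid):
    """Nearest IQ lattice point by rounding N*cos and N*sin independently (assumed rule)."""
    return round(n_grid * math.cos(theta)), round(n_grid * math.sin(theta))


def code_table_metrics(n_grid, q=32):
    """(DNL_rms per eq. (2.28), max |DNL|, min amplitude, max amplitude).

    DNL values are in milliperiods; amplitudes are normalized to the grid radius N.
    """
    inl, amps = [], []
    for n in range(q):
        ideal = 2 * math.pi * n / q
        x, y = lattice_point(ideal, n_grid)
        err = (ideal - math.atan2(y, x) + math.pi) % (2 * math.pi) - math.pi
        inl.append(err / (2 * math.pi) * 1000)
        amps.append(math.hypot(x, y) / n_grid)
    dnl = [inl[(k + 1) % q] - inl[k] for k in range(q)]
    rms = math.sqrt(sum(d * d for d in dnl)) / q
    return rms, max(abs(d) for d in dnl), min(amps), max(amps)


def diamond_amplitude(theta):
    """Amplitude of the linear (|a| + |b| = 1) selection at phase theta."""
    return 1 / (abs(math.cos(theta)) + abs(math.sin(theta)))


if __name__ == "__main__":
    for n_grid in (8, 16, 32):
        rms, dmax, amin, amax = code_table_metrics(n_grid)
        print(f"N={n_grid}: DNLrms={rms:.2f}, max|DNL|={dmax:.2f} mperiods, "
              f"amplitude {amin:.3f}..{amax:.3f}")
    print(min(diamond_amplitude(2 * math.pi * n / 32) for n in range(32)))
```

With this rounding rule and N = 16, the largest DNL step is about 4.6 milliperiods, under the LSB/5 ≈ 6.25 milliperiod target, and the amplitude varies by under 5%, whereas the diamond selection varies by about 29%.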
Figure 3-4: Finding a reasonable grid quantization level.
3.2.2 Schematic Design
Interpolators with separate I and Q coefficient control can be implemented in several ways, but they mainly revolve around a single differential pair, with switches either in the tail current or in the current path, to steer a digitally controlled amount of current into the resistors. Interpolators called type-I have the current-steering switch below the differential pair inputs, while interpolators called type-II have the current-steering switch above the differential pair inputs.
A simple implementation of a type-I interpolator has 64 cells, 16 cells for each quadrant. By putting the switch below the tail bias transistor, we isolate the switch from the output. An example of this circuit is found in [20] or [21]; we reproduce it for our specific implementation in Figure 3-5. Each cell is biased with the same tail current, ISS = 150µA in our application, so the cells are called "thermometer weighted" (as opposed to binary weighted). Let us call the thermometer values of select_sinp, select_sinn, select_cosp, select_cosn, and select_cm x1, x2, x3, x4, and x5 respectively (the thermometer code labels are shown in the schematic of Figure 3-5). Using this terminology, the thermometer values x1...x4 take integer values from 0 to N, while x5 can take integer values up to the number of common-mode cells we choose to provide.
Figure 3-5: Simple Type-I Phase interpolator Example (this is our final design choice). There are 16 copies of each differential pair, and 8 common-mode differential pairs, so there is a total of 72 differential pair cells.
The output differential voltage is then proportional to the differential currents (called I1d, I2d, I3d, I4d) flowing through the positive and negative resistors:
Vod = R× ((I1d − I3d) + (I2d − I4d)) (3.2)
= ISSR× ((x1 − x3) sin(ωt) + (x2 − x4) cos(ωt)) (3.3)
The expression above allows us to identify the value x1−x3N
= bn and x2−x4N
= an,
where the (an, bn) pair represent a point in the IQ plane (Figure 3-1). By selecting
appropriate codes x1, ..., x4, we can target all the points in the 4 quadrants of the IQ
plane, since an and bn can sweep from −1 to 1. The output common mode Vocm can
also be expressed similarly:
62
Vocm = Vdd −I02× (x1 + x2 + x3 + x4 + x5) (3.4)
For 0 ≤ x1, x2, x3, x4 ≤ N, we can reach at most (2N + 1)^2 grid points in the IQ plane. Looking at Equation 3.4, we see that there is no need for common-mode cells (x5 can be set to 0 for all codes) if we set x1 + x3 = N and x2 + x4 = N and use the differences x1 − x3 and x2 − x4 to obtain arbitrary points in the IQ plane. Doing so, however, increases the lattice spacing between reachable points in the IQ plane by a factor of 2, which is why we did not choose this scheme (although it is implemented in [22]). A circuit implementing this switching pattern is shown in Figure 3-6.
If instead we set x1 = 0 in quadrants II and III, x2 = 0 in quadrants III and IV, x3 = 0 in quadrants I and IV, and x4 = 0 in quadrants I and II, we can reach the native resolution of (2N + 1)^2 points. The common mode then moves by up to a factor of √2, which means the number of common-mode cells must be at least N(√2 − 1); that is, x5 must take at least (√2 − 1) × N values. This is our final switching pattern (the circuit implementing it is shown in Figure 3-5).
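The switching pattern above can be illustrated with a short sketch that maps a target IQ point to the four thermometer values and computes the minimum number of common-mode cells. It assumes rounding to the nearest grid point and uses the identifications bn = (x1 − x3)/N and an = (x2 − x4)/N from Equation 3.3; the function names are hypothetical and this is not the decoder of Section 4.5. For N = 16 it gives 7 common-mode cells, which the 8 cells provided (select_cm[7:0]) cover.

```python
import math

def thermometer_codes(a_n: float, b_n: float, n: int = 16) -> tuple:
    """Map a target IQ point (a_n, b_n) to thermometer values (x1, x2, x3, x4).

    Uses b_n = (x1 - x3) / N and a_n = (x2 - x4) / N (Equation 3.3).
    Only one group of each complementary pair is on; the other is held at 0,
    which is the quadrant-dependent zeroing rule giving the (2N + 1)^2 grid.
    """
    b_code = round(b_n * n)
    a_code = round(a_n * n)
    x1, x3 = max(b_code, 0), max(-b_code, 0)
    x2, x4 = max(a_code, 0), max(-a_code, 0)
    return x1, x2, x3, x4

def min_common_mode_cells(n: int = 16) -> int:
    """Fewest common-mode cells that absorb the sqrt(2) swing of x1 + ... + x4."""
    return math.ceil((math.sqrt(2) - 1) * n)

if __name__ == "__main__":
    for deg in (0, 30, 45, 135, 225):
        theta = math.radians(deg)
        x = thermometer_codes(math.cos(theta), math.sin(theta))
        print(f"{deg:3d} deg -> x1..x4 = {x}, sum = {sum(x)}")
    print("native grid points:", (2 * 16 + 1) ** 2)
    print("minimum common-mode cells for N = 16:", min_common_mode_cells(16))
```

On the unit circle the sum x1 + ... + x4 ranges from 16 on the axes to 22 at 45 degrees, so x5 has to make up the difference to hold Vocm constant.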
From this native interpolator topology, there are many other interpolators which can
be built, each having different tradeoffs and gains.
Implementing the switches above the inputs allows us to steer the current either positive or negative, and reduces the number of cells by a factor of 2 by getting rid of all negative cells (Figure 3-7). This type of circuit is called a type-II phase interpolator. The area requirement is halved; however, the capacitive loading on the output resistors is unchanged. Furthermore, the switches here do not have enough overdrive to operate consistently in triode, especially within a small headroom of 1.2V. If they operate as common-gate amplifiers, the swing has to be reduced below a VT of the devices. Using these cascode switches increases the linearity of the differential pair, since there is very little channel-length modulation at the drains of the input pairs. We can use peaking inductive loads to counter both the bandwidth reduction and the swing limitation, because the output is then allowed to swing above the rail [22]. The inductive loads also make the common-mode adjustment cells unnecessary.

Figure 3-6: We can get rid of the common-mode control cells by doubling the number of cosine and sine cells for the same resolution (the select signals are now 32-bit-wide thermometer codes instead of 16 bits wide in this circuit).

Figure 3-7: Type II phase interpolator. Inductors allow a larger headroom, so that we have both enough swing and are able to operate the switches as common-gate amplifiers.
Despite the attractiveness of this topology, some tradeoffs led us not to use it. First, it is a narrowband solution: unless we have a good way to calibrate the resonant frequency (perhaps with a switchable capacitor bank), it is hard to implement the interpolator for both 2.5GHz and 5GHz operation. Second, the inductors impose an unacceptable area penalty on the block.
3.2.3 Phase-Interpolator Non-idealities
Feedforward effect and Subthreshold Conduction :
In phase interpolators such as the one used in [17], the output phase at the quadrant boundaries is rarely correct. This problem extends to all interpolators that remove the negative cells and use a sign-select switch instead. The first reference to this phenomenon is Sidiropoulos's work on phase interpolation for CDR circuits [23][24]. The cause is the feedforward current and the subthreshold conduction of the differential pair tail current source. When, for example, all the sine cells are turned off, we expect an output in phase with the cosine. However, the sine input still couples to the output through the Cgd capacitors (feedforward), and the switched-off cells still conduct an exponentially suppressed current of the form I0 exp(−qVTn/(nkT)) (subthreshold conduction), where n is typically around 1.5 to 1.6. Subthreshold conduction could be countered by using HVT devices for the tail current source, but this would complicate the cascode layout. Feedforward and subthreshold conduction produce opposite shifts in the I-Q plane, but when they do not cancel perfectly, the I-Q plane quadrant is distorted (Figure 3-8). Techniques to counter this effect involve either carrier pre-rotation or additional dummy switches that cancel the feedforward currents, both of which add to the complexity of the design [19].
This problem is not encountered in the simple type-I phase interpolator implemented in this thesis. In the type-I case, for every cosine cell that contributes a certain feedforward current, a corresponding negative cell contributes the negative of that feedforward current, and the net feedforward current is zero (Figure 3-9). The same argument applies to the subthreshold conduction of the tail current source. Of course, the currents only cancel when careful layout matches the interconnect parasitics of the negative and positive cells. The layout of the interpolator array is shown in Figure 3-11. Especially in a 65nm CMOS process, a large part of the Cgd feedforward capacitance is the input-to-output metal rail capacitance, not the overlap capacitance of the transistor itself.
Figure 3-8: Effect of subthreshold conduction and feedforward in the IQ plane.
Figure 3-9: Feedforward Effect in type I interpolator.
Non-linearity :
A differential pair is not a linear amplifier. If the differential pair is kept in saturation, the output differential current Id as a function of the input differential voltage Vd and the tail current ISS is [25]:

$$ I_d = \frac{k}{2}\, V_d \sqrt{\frac{4 I_{SS}}{k} - V_d^2} \tag{3.5} $$
$$ |V_d| \le \sqrt{\frac{2 I_{SS}}{k}} \tag{3.6} $$
$$ k \equiv \mu_n C_{ox} \frac{W}{L} \tag{3.7} $$
$$ \text{otherwise,}\quad I_d = I_{SS}\,\operatorname{sign}(V_d) \tag{3.8} $$
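In a more illuminating form, we rewrite the differential current (plotted in Figure 3-10):

$$ \frac{I_d}{I_{SS}} = \frac{1}{2}\, x \sqrt{4 - x^2} \quad \text{if } |x| \le \sqrt{2} \tag{3.9} $$
$$ x \equiv \frac{V_d}{V_{ov}} \tag{3.10} $$
$$ V_{ov} = \sqrt{\frac{I_{SS}}{k}} \tag{3.11} $$
$$ \text{otherwise,}\quad \frac{I_d}{I_{SS}} = \operatorname{sign}(x) \tag{3.12} $$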
Note that Vov is simply the overdrive voltage of the input pairs when the differential input voltage is 0. We can quickly see that the differential pair is very linear over the region x ∈ (−1, 1), so increasing the overdrive of the differential pair is one way to obtain a more linear transfer function. However, the conclusion that driving the pair with an input amplitude of many overdrives causes significant harmonic distortion necessarily overestimates the nonlinearity of the differential pair. This is because the input pairs are short-channel devices and therefore suffer noticeable velocity saturation. We can obtain some intuition behind these effects by considering a model of velocity saturation from [26]:
$$ I_d = k_n \frac{W}{L}\left(V_{GS} - V_T - \frac{V_{DSATn}}{2}\right) V_{DSATn} \tag{3.13} $$
This approximation shows a linear relation between VGS − VT and Id; therefore, the square-law analysis of the MOS device overestimates the nonlinear contribution under velocity-saturation conditions. In the actual design, the calibration sets the input amplitude to about 250mV, which is several overdrives, yet the interpolator has very good phase linearity.
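To put numbers on the square-law picture of Equations 3.9 to 3.12, the sketch below evaluates Id/ISS and its compression relative to the small-signal line Id/ISS = x (the slope at the origin). It implements only the square-law model, which, as argued above, overestimates the nonlinearity of the velocity-saturated input pairs; the function names are hypothetical.

```python
import math

def diffpair_normalized_current(x: float) -> float:
    """I_d / I_SS of a square-law differential pair (Equations 3.9 to 3.12).

    x = V_d / V_ov, where V_ov = sqrt(I_SS / k) is the overdrive at V_d = 0.
    Beyond |x| = sqrt(2) one side carries all of I_SS.
    """
    if abs(x) <= math.sqrt(2):
        return 0.5 * x * math.sqrt(4 - x * x)
    return math.copysign(1.0, x)

def compression(x: float) -> float:
    """Fractional shortfall of I_d / I_SS versus the small-signal line (x != 0)."""
    return 1 - diffpair_normalized_current(x) / x

if __name__ == "__main__":
    for x in (0.25, 0.5, 1.0, math.sqrt(2)):
        print(f"x = {x:.3f}: Id/ISS = {diffpair_normalized_current(x):.4f}, "
              f"compression = {100 * compression(x):.1f} %")
```

At x = 1 the square-law model already predicts about 13% compression, which is why the text treats it only as an upper bound for short-channel devices.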
Figure 3-10: Plot of output differential current vs input differential voltage.
3.2.4 Layout and Sizing
The tail bias transistor is chosen to be a long-channel device to decrease channel-length modulation and mismatch effects. The interpolator cell layout is done so as to minimize DNL: each time a new current source switches on, we want its current contribution to be as similar as possible to that of the previous current source. This explains why all the cosine cells and all the sine cells have been placed together. Furthermore, great care is taken to provide identical Vgs to all the cells, since the current mismatch is gm∆Vgs; this explains the wide metal buses, which ensure little IR drop on the negative supply between the cells. Because the interpolator array is in the high-frequency signal path, we also want to satisfy metal fill without adding dummy metal inside the array itself, and the metal buses serve that purpose. The unit differential-pair cell is slim (about 2.1µm with dummies) so that the overall array is not too long; otherwise, the difference in resistive path to a given cell would cause systematic mismatch. 3µm worth of dummies are placed at the edges of the interpolator array to prevent Shallow Trench Isolation (STI) effects on carrier mobility [27].
Figure 3-11: Layout of the Interpolator Array
3.3 Signal Conditioning
We use the term signal conditioning for anything in the high-frequency signal path that is not the interpolator core.
3.3.1 Filter
Harmonic Purity Requirement :
The filter's goal is to present a small-amplitude sinusoid at the input of the interpolator. Let us consider 2 quadrature, harmonically impure signals f1, f2 with odd frequency components:

$$ f_1 = \sum_n a_n \cos(n\omega t) \tag{3.14} $$
$$ f_2 = \sum_n a_n \cos\!\left(n\left(\omega t - \frac{\pi}{2}\right)\right) \tag{3.15} $$
$$ f_{out} = \cos(\phi)\, f_1 + \sin(\phi)\, f_2 \tag{3.16} $$

Without the higher harmonics, the crossing point would be at t′ = φ/ω + π/(2ω). With the higher harmonics, this crossing point shifts by ∆t, causing a phase error:

$$ f_{out} = 0 \;\Rightarrow\; a_1 \cos(\omega t' - \phi) = -\sum_{n \ge 3} a_n \cos(n\omega t' - \phi) \tag{3.17} $$
$$ a_1 \cos\!\left(\frac{\pi}{2} + \omega\Delta t\right) = -a_1 \sin(\omega\Delta t) = -\sum_{n \ge 3} \pm a_n \sin\big((n-1)\phi + n\omega\Delta t\big) \tag{3.18} $$

We are interested in the cases φ ≠ 0 and φ ≠ π/2, because at φ = 0 and φ = π/2 we know that ∆t = 0. Carrying out a Taylor expansion of both sides for small ∆t, we obtain:

$$ \Delta t \approx \frac{\sum_{n \ge 3} \pm a_n \sin\big((n-1)\phi\big)}{\omega\left(a_1 - \sum_{n \ge 3} \pm\, n\, a_n \cos\big((n-1)\phi\big)\right)} \tag{3.19} $$

We can approximate the phase error as

$$ \Delta\phi \approx 2\pi \frac{a_3}{a_1} \tag{3.20} $$
A reasonable requirement is a3/a1 < 1/100. The clocks coming into our SSCG are already slew-limited square waves, which have a3/a1 < 1/3. Suppose we put them through a filter with 40dB per decade attenuation; then a3/a1 < 1/27. The phase interpolator is another approximately first-order filter at the frequencies of interest, giving another factor of three (see Figure 3-12), so at the output of the phase interpolator a3/a1 < 1/81. Further attenuation of the higher harmonics occurs because of the slew-rate limiting of the input square wave. A second-order filter should therefore suffice.
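The harmonic bookkeeping above can be reproduced exactly with rational arithmetic. This is a minimal sketch assuming an ideal square wave (an/a1 = 1/n for odd n) and that both the fundamental and the harmonic sit on the roll-off slope, so each 20dB/decade of roll-off divides harmonic n by an extra factor of n; the function name is hypothetical.

```python
from fractions import Fraction

def harmonic_ratio(n: int, rolloff_order: int) -> Fraction:
    """a_n / a_1 of an ideal square wave after rolloff_order x 20 dB/decade.

    Assumes the fundamental and harmonic n both lie on the roll-off slope,
    so harmonic n is attenuated by an extra factor of n ** rolloff_order.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("a square wave only has odd harmonics")
    return Fraction(1, n) / n ** rolloff_order

if __name__ == "__main__":
    requirement = Fraction(1, 100)
    for order, label in ((0, "input square wave"),
                         (2, "after 2nd-order filter"),
                         (3, "after filter + interpolator")):
        r = harmonic_ratio(3, order)
        print(f"{label:28s} a3/a1 = {r}  meets 1/100: {r < requirement}")
```

The last line shows 1/81 falling just short of 1/100 on its own, which is consistent with the text relying on the extra slew-rate limiting of the input square wave to close the gap.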
Figure 3-12: Cartoon picture of 2nd order filter effect on Fourier coefficients. (Left panel, "Square Wave Fourier Coefficients": 1st, 3rd and 5th harmonics at 1, 1/3 and 1/5. Middle: "Put through 2nd order filter and 1st order output stage (-60dB/dec total)". Right panel, "Filtered Square Wave Fourier Coefficients": 1st and 3rd harmonics at 1 and 1/81.)
Circuit and Layout :
The sineshaper circuit is a modified version of Jeremy Walker's design at Analog Devices. Because of the high-frequency requirement, it is not feasible to use a feedback topology for the filter (an op-amp integrator, for example). Instead, we use CMOS inverters, which are digitally switchable (for coarse-grain bandwidth control) and whose supply is calibrated (for fine-grain bandwidth control). The schematic is shown in Figure 3-13. The first stage is loaded by 4 identical inverters: 2 of the output-stage inverters are switched on at 2.5GHz, and all 4 are switched on at 5GHz. Each stage provides roughly a first-order roll-off. However, because the Miller capacitance of the second stage is significant, a feedforward zero reduces the roll-off, so we use zero-cancellation capacitors to improve it. AC coupling is done using MOM capacitors, because the nonlinear response of MOS capacitors causes significant harmonic distortion. The AC coupling capacitors are oversized for 5GHz operation, but they need to be this large for the lower-frequency 2.5GHz operation. The MOM capacitors also allow us to satisfy metal fill around the sensitive high-frequency signal path without introducing dummy metal (Figure 3-14).
Figure 3-13: Filter Schematic. This schematic includes one of the 2 differential paths. The feedforward-cancelling capacitor is included only for the 2nd stage, because the gate-to-drain capacitance of the first stage is insignificant.
Figure 3-14: Layout of the SineShaper. This has 4 inverter chains for the 2 differential signal paths, one for cosine and one for sine.
Biasing of the Interpolator Inputs :
The tail bias point is set to VTn + Vov, while the input pair common-mode bias is set to VTn + √2·Vov. The input common mode can be set this low because the tail current source is velocity-saturated and has a VDSATn lower than its overdrive. A simulation shows that the tail current source and the input pairs stay in saturation across corners (Figure 3-15).
Figure 3-15: Plot showing the margin above saturation for the tail current source and differential pair inputs. For PVT corner numbering reference, see Appendix C.
Evaluation :
To evaluate the performance of the filter, we input a slew-limited square wave (20ps rail-to-rail rise and fall times) and AC-couple this signal into the filter. The simulations are done with typical layout parasitics at the nominal corner (27°C, nominal devices). A second-order roll-off between the fundamental and the third harmonic is achieved for both 2.5GHz and 5GHz typical operation (Figure 3-16).
Figure 3-16: AC response of the filter, configured for the typical corner at 5GHz and for the typical corner at 2.5GHz.
3.3.2 Restoration
The restoration circuit takes in the attenuated interpolated output and amplifies it to a rail-to-rail signal. To level-shift the output, we AC-couple the signal and self-bias the inverters, similar to the sineshaper. To increase bandwidth, power-down is done in the feedback path instead of the forward signal path (Figure 3-17).
The main tradeoff in this topology is between a fast feedback loop and stability. The feedback path has to restore the bias point quickly, because code switches introduce common-mode transients at the output of the differential pair. These glitches happen because of timing mismatch between the common-mode and interpolating cells, current bias mismatch between those cells, and charge injection that disturbs the common mode of the interpolator every time a cell switches on. These sudden voltage steps are high-pass filtered through CAC, the AC coupling capacitor, and disturb the bias point of the inverter, which must be restored quickly by the feedback circuit. The fastest code update rate seen by the restoration circuit is 312MHz, which means the loop bandwidth should be higher than 312MHz. On the other hand, the longer the R_F C_AC time constant, the lower the frequency of the dominant pole of this loop and the more stable the loop.
Figure 3-17: Restoration circuit and layout
We intentionally sacrificed small-signal stability to allow fast common-mode settling. The circuit is not meant to operate in small signal, where its gain is very large, but with the output swinging rail-to-rail. At the worst corner, the input swings 250mV peak-to-peak after calibration, and the output is guaranteed to rail. Requiring small-signal stability is therefore a conservative demand, because the nonlinearity of the inverters clamps the gain. Instead, we check the stability of the loop using transient simulations of a 250mV peak-to-peak sinusoidal input, with sharp transients modelling the discrete phase shifts. Layout parasitics further attenuate phase oscillations, because both the digital switching and the finite bandwidth of the interpolator stage low-pass filter those sharp transients. A plot comparing the phase oscillations with a behavioral interpolator and with a real phase interpolator circuit (with layout parasitics) is shown in Figure 3-19. We can still calculate the phase margin across corners, as done in Figure 3-18, with a 15fF capacitive load (5fF of layout self-loading and 10fF of output load capacitance).
Figure 3-18: Restoration Phase Margin. Although the phase margin is negative, the stability of the circuit is not compromised, because the inverters operate non-linearly and the effective loop gain is lower than expected from small-signal analysis.
3.3.3 Clock Divider
The clock divider is a reused circuit, shown in Figure 3-20. It takes a differential input ("inn" and "inp") at 5GHz and provides quadrature outputs out1, out2, out3 and out4 at 1.25GHz. It is important not to add too much load to the outputs, or the setup time of the latches will not be satisfied. In simulation, driving the clock divider with the restoration circuit at the worst corner, we are able to load up to 10fF of interconnect parasitics in addition to a minimum-sized inverter.
Figure 3-19: Restoration output phase transient under phase code sweep.
Figure 3-20: Clock Divider Circuit.
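The exact latch wiring of Figure 3-20 is not reproduced in the text, so the following is only a behavioral sketch of how a divide-by-4 can produce quadrature outputs, assuming the divider acts like a two-flip-flop Johnson (twisted-ring) counter clocked by the 5GHz input. Such a counter repeats every 4 input cycles, and its four outputs are spaced by one input period, i.e. 90° of the 1.25GHz output. The function name is hypothetical.

```python
def divide_by_4_quadrature(n_input_cycles: int) -> list:
    """Behavioral 2-bit Johnson counter, updated on each input rising edge.

    Returns (out1, out2, out3, out4) = (q0, q1, not q0, not q1) after each edge.
    The state sequence 10 -> 11 -> 01 -> 00 repeats every 4 input cycles.
    """
    q0 = q1 = 0
    states = []
    for _ in range(n_input_cycles):
        q0, q1 = 1 - q1, q0          # twisted-ring shift: new q0 = not q1
        states.append((q0, q1, 1 - q0, 1 - q1))
    return states

if __name__ == "__main__":
    for edge, outs in enumerate(divide_by_4_quadrature(8), start=1):
        print(f"edge {edge}: out1..out4 = {outs}")
```

Each output is high for 2 of every 4 input cycles (50% duty cycle), and out2 lags out1 by exactly one input cycle.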
3.4 Top-Level Floorplan
We present here the top-level floorplan (Figure 3-21). Because this chapter is mainly concerned with the performance of the analog high-frequency signal path, we simulate only the 5GHz signal path with layout. The interpolator array itself is very compact; in fact, the main area cost comes from the voltage regulator and peak detectors, which are reused blocks. A discussion of the digital blocks and routing is deferred to Chapter 4.
Figure 3-21: Top level Floor Plan and High Frequency Signal Path Layout. Note that we added clock buffers at the output to drive the long wires out of the block.
3.5 Evaluating the High-Frequency Signal Path
3.5.1 Evaluation Methodology
To evaluate the interpolator, we sweep phase codes, which are decoded using an ideal behavioral (Verilog) decoder (Figure 3-22). The phase codes are updated at 312MHz, the fastest phase update rate the design must support, so we capture both dynamic and static phase errors. After letting the circuit settle for 100ns, we measure the time difference between the crossing points of the 5GHz differential signals at the output of the restoration circuit.
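The crossing-time measurement above can be turned into DNL figures along the following lines. This is a minimal sketch assuming 32 phase steps per 5GHz period (the 6.25ps step used in the jitter calculation), one representative crossing phase per code measured after settling, and DNL expressed in milliperiods as in Section 3.5.2. The helper names are hypothetical; they are not the author's post-processing scripts.

```python
import math

def phase_in_periods(crossing_time: float, period: float) -> float:
    """Phase of a crossing relative to an ideal reference grid, in [0, 1) periods."""
    return (crossing_time % period) / period

def dnl_milliperiods(code_phases: list, steps_per_period: int = 32):
    """Per-transition DNL (milliperiods) and its rms, from consecutive code phases."""
    ideal_step = 1.0 / steps_per_period
    dnl = []
    for prev, cur in zip(code_phases, code_phases[1:]):
        step = (cur - prev + 0.5) % 1.0 - 0.5   # wrap the step into [-0.5, 0.5) periods
        dnl.append((step - ideal_step) * 1000.0)
    rms = math.sqrt(sum(d * d for d in dnl) / len(dnl))
    return dnl, rms

if __name__ == "__main__":
    phases = [(k / 32) % 1.0 for k in range(64)]   # ideal sweep over two periods
    phases[5] += 0.002                              # one code 2 milliperiods late
    dnl, rms = dnl_milliperiods(phases)
    print(f"DNL[4] = {dnl[4]:.2f}, DNL[5] = {dnl[5]:.2f}, DNLrms = {rms:.3f} milliperiods")
```

Wrapping each step into half a period lets the sweep cross the period boundary (code 31 back to code 0) without producing a spurious full-period step.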
Figure 3-22: Methodology for evaluating phase linearity.
In order to accurately simulate post-calibration regulator code values, we extracted the layout parasitics at the input of the interpolator and simulated the filter with this additional capacitance at its output (simulation methodology illustrated in Figure 3-23). We then ran a 200ns transient simulation for each PVT corner (45 of them) and every regulator output value (32 of them). The results of these 32 × 45 = 1440 simulations are displayed in Figure 3-24, where we graph the amplitude at the output of the filter swept across input codes for every skew corner. The output of the filter was fed to the peak detector circuit with a switching threshold code value of 5 (the threshold code can be from 0 to 16, but for good operation of the interpolator, a code value of 4-6 works best), and we picked the lowest code at which the peak detector's comparator switches. These codes are stored, and their corresponding values are used for any given PVT corner simulation (the methodology is schematized in Figure 3-25).
Figure 3-23: Simulation methodology for regulator code selection across corners.
3.5.2 Results
Phase Linearity :
We test the interpolator using the previous method with an input quadrature error of 5°, and record the corresponding DNLrms values across corners. The circuit was simulated with layout parasitics and includes the whole high-frequency signal path (except the divide-by-4 circuit). Both the DNLrms and maximum DNL values are well within specifications (Figures 3-25 and 3-26). To estimate the effect of random mismatch, we also run 50 Monte Carlo phase sweeps of the high-frequency path and evaluate DNLrms for each case. The histogram of DNLrms values in Figure 3-27 shows that the additional nonlinearity introduced by Monte Carlo mismatch is essentially insignificant (4.15 milliperiods of DNLrms is well within specification). The Monte Carlo simulation shares the 5° quadrature error handicap.
Power Consumption :
The power values were simulated with layout at 5GHz, at the worst power consumption corner (corner 37, with Vdd = 1.26V, fast transistors and temperature = −40°C), and with output loads of 50fF for each quadrature output at 1.25GHz.
Figure 3-24: Plot of filter output amplitude as we sweep the regulator input code value (and therefore change the filter supply), for all 45 PVT corners.
Block                    Power (mW)
Interpolator Array       4.01
Filter                   3.38
Restoration              4.15
Divide-by-4              0.45
Output Clock Buffers     1.46
Total                    13.45
Table 3.1: Power consumption summary (clock path).
Figure 3-25: DNLrms value across corners for high and low frequency modes, with 5° quadrature error.
Figure 3-26: We extract the largest DNL step across input code transitions, for each corner.
Figure 3-27: DNLrms histogram for 50 Monte Carlo simulations at 5GHz (worst-case corner, corner 28).
Jitter :
Figure 3-28: Integrated Jitter plot at the output of the restoration circuit.
There are 2 main contributors to phase noise at the output: systematic frequency modulation and random jitter.
A Periodic Steady-State noise simulation was done in SpectreRF with a fixed input code value to obtain the random jitter (shown in Figure 3-28). It is simulated with layout at the slowest corner with 5GHz inputs (corner 28, where Vdd = 1.14V, temperature = 125°C and all devices are skewed slow). The total random jitter at the output of the restoration circuit is 528fs rms. This jitter is dominated by white thermal noise and is uncorrelated with the jitter from the spread-spectrum modulation. The systematic jitter contribution due to spread spectrum depends on the mode of operation. For the case where φm = 64 periods at 5GHz and fm = 38kHz, there are a total of 2^12 phase steps per modulation period Tm, each 6.25ps large, and there are 2^15 zero crossings of the 1.25GHz clock. The systematic jitter contribution at the 1.25GHz output clock is therefore¹:
¹ Note that this jitter figure is for a phase update rate of 312MS/s. As we increase the phase update rate, the jitter contribution from our phase modulation scheme increases as well.
$$ \Delta t_1^2 = (6.25\,\text{ps})^2 \times \frac{1}{2^{15}} \times 2^{12} = 4.88\,\text{ps}^2 \tag{3.21} $$
Given that the random jitter is uncorrelated, the total output rms jitter is

$$ \Delta t_{jitter} = \sqrt{\Delta t_1^2 + (528\,\text{fs})^2} \approx 2.27\,\text{ps} $$
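The arithmetic of Equation 3.21 and the rms sum can be checked with a few lines of Python. The helper name is hypothetical; the model is the one stated above, in which 2^12 of the 2^15 output edges per modulation period are each displaced by one 6.25ps step and the random jitter adds in quadrature.

```python
import math

def systematic_jitter_var_ps2(step_ps: float, n_phase_steps: int, n_crossings: int) -> float:
    """Mean-square jitter when n_phase_steps of n_crossings edges move by one step."""
    return step_ps ** 2 * n_phase_steps / n_crossings

sys_var = systematic_jitter_var_ps2(6.25, 2 ** 12, 2 ** 15)   # ps^2 (Equation 3.21)
random_rms = 0.528                                              # ps rms, from the PSS noise run
total_rms = math.sqrt(sys_var + random_rms ** 2)                # uncorrelated: add variances

print(f"systematic: {sys_var:.2f} ps^2, total: {total_rms:.2f} ps rms")
```

Because the systematic term (about 2.21ps rms on its own) dominates, the total is set almost entirely by the phase-update scheme rather than by thermal noise.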
3.6 Summary
Following the architecture proposed at the end of Chapter 2, we extract specifications for each block in the analog signal path. We propose a simple interpolator topology, and the input filters and output restoration circuits are designed and laid out around this core interpolator. Because of the architecture choice, both the filtering requirements and the interpolator complexity are greatly reduced, making the circuit choices simple. We evaluate each block separately, and then evaluate the full high-frequency signal path across 45 skew corners and with Monte Carlo mismatch. The high-frequency signal path satisfies the power, jitter, speed and phase linearity specifications set in Chapter 2 by a wide margin.
Chapter 4
Digital and Auxiliary Circuits
4.1 Introduction
This chapter is mainly concerned with the implementation details of the auxiliary
circuits and digital circuits.
The auxiliary circuits include:
• The Voltage Regulator: This is a reused LDO (Low-Dropout Regulator) block. The LDO takes a 5-bit input bus that digitally controls the output voltage level it regulates. This voltage is supplied to the filter and is used to adjust the filter's output swing.
• The Peak Detector : This is a reused block that has a peak detector and a
comparator. The peak detector converts its RF input to a DC level proportional
to the amplitude, and the comparator compares the amplitude value with a
digitally set reference voltage.
The digital circuits are:
• Waveform Generator: The Waveform Generator is a digital block whose inputs are the divide-by-16 clock and various input mode bits. It outputs a divide-by-64 clock to the calibration circuit and delivers the bit bus that encodes the phase value the interpolator should output.
• Decoders: the decoders take the bit outputs of the waveform generator and
supply the thermometer codes to control the interpolator stages.
• Calibration FSM: The Calibration FSM uses the peak detector’s output to
set the regulator code value and the number of inverters switched on in the
input filters.
4.2 Regulator
The LDO uses feedback to set the output node at a voltage level fixed by Vref and the value of IDAC (Figure 4-1). If the feedback system is stable, negative feedback forces both op-amp input terminals to the same level, and the output is simply:

$$ V_{out} = V_{ref} + I_{DAC} \times R_2 \tag{4.1} $$

IDAC in turn is a binary-weighted current array controlled by digital inputs. For an input code n ∈ [0, 31], IDAC outputs 87.5µA + n(6.25µA). Vref is nominally set at 0.3V, and R2 is 2.698kΩ. The regulator output for code n is therefore:

$$ V_{out} = 536\,\text{mV} + 16.9\,\text{mV} \times n \tag{4.2} $$
There are two effects that are important from our perspective. The voltage regulator is rated at 8mA, and under that current load it needs an 80mV dropout between Vdd and Vout to keep transistor M1 in saturation. This implies that the maximum output voltage at the low supply of 1.14V is 1.06V, which is the maximum supply voltage the filter should rely on. Note that this is a conservative restriction, because the actual current load of the filter is usually around 2mA.
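Equations 4.1 and 4.2 and the dropout limit can be tabulated with a short sketch. The function names are hypothetical, and the component values are the ones quoted above (Vref = 0.3V, R2 = 2.698kΩ, IDAC = 87.5µA + n·6.25µA, 80mV dropout at a 1.14V supply). With these values even code 31 (about 1.059V) stays just under the 1.06V limit.

```python
def regulator_vout(code: int, vref: float = 0.3, r2: float = 2698.0) -> float:
    """LDO output (V) for a 5-bit code: Vout = Vref + IDAC * R2 (Equation 4.1)."""
    if not 0 <= code <= 31:
        raise ValueError("regulator code must be in [0, 31]")
    idac = 87.5e-6 + code * 6.25e-6          # binary-weighted current DAC output (A)
    return vref + idac * r2

def max_usable_code(vdd_min: float = 1.14, dropout: float = 0.080) -> int:
    """Largest code whose output stays at or below Vdd_min - dropout."""
    limit = vdd_min - dropout
    return max(c for c in range(32) if regulator_vout(c) <= limit)

if __name__ == "__main__":
    print(f"code 0:  {1e3 * regulator_vout(0):.1f} mV")
    print(f"LSB:     {1e3 * (regulator_vout(1) - regulator_vout(0)):.2f} mV")
    print(f"code 31: {1e3 * regulator_vout(31):.1f} mV")
    print("largest usable code at 1.14 V:", max_usable_code())
```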
Figure 4-1: Regulator Schematic.
4.3 Peak Detector
The peak detector is digitally controlled by a 4-bit input code that sets the offset between VREFCM and VREF (Figure 4-2). The offset is set by a resistive DAC. Transistors M1 and M2 are essentially followers: they force the capacitor voltage to a VGS higher than the lowest signal voltage at the input transistors. This signal is then compared with VREF, and the clocked comparator switches from high to low when the amplitude is large enough. We calibrate only the Q input, not the I input, and load the I input with a dummy replica peak detector.
From our perspective, there are a few important parameters to extract from the
peak detector.
• Settling Time: The peak detector takes 100ns to settle within 0.1dB of its final value. Therefore the calibration FSM must wait at least 100ns after a regulator code is switched before it clocks the comparator to obtain the comparator output. In our case, we used a conservative 384ns wait before clocking the comparator after a regulator code switch.
• RF amplitude to DC voltage gain: The gain value is −0.25dB at the worst corner and −0.11dB typically.
• Digital threshold offset selection: For input code value n, the offset set by the values of IREF, R1, R2 and R3 is:
VCM − VCMREF = 88.9mV + n × 7.6mV
• Load: The peak detector represents a 10fF load at the worst corner.
Figure 4-2: Peak Detector Schematic.
From these numbers, we determine the mapping between the code value “n” and
the input differential amplitude A at which the comparator switches:
A = 181.0mV + 15.5mV × n (nominal corner) (4.3)
These will be the numbers used for top-level behavioral simulation.
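The calibration rule used in Section 3.5.1 (pick the lowest regulator code at which the peak detector's comparator trips), combined with Equation 4.3, can be sketched as a linear search. The amplitude-versus-code list is a hypothetical input standing in for simulated filter output amplitudes. The real Calibration FSM works from the comparator output in hardware and waits 384ns after each code change, neither of which this sketch models. With threshold code 5, Equation 4.3 gives a 258.5mV trip point at the nominal corner.

```python
from typing import Optional, Sequence

def comparator_threshold_mv(threshold_code: int) -> float:
    """Differential input amplitude (mV) at which the comparator trips (Eq. 4.3, nominal)."""
    return 181.0 + 15.5 * threshold_code

def calibrate_regulator_code(amplitude_mv_by_code: Sequence[float],
                             threshold_code: int = 5) -> Optional[int]:
    """Lowest regulator code whose filter output amplitude trips the comparator."""
    threshold = comparator_threshold_mv(threshold_code)
    for code, amplitude in enumerate(amplitude_mv_by_code):
        if amplitude >= threshold:
            return code
    return None   # no code reaches the threshold at this corner

if __name__ == "__main__":
    # Hypothetical, monotonic amplitude-vs-code curve for illustration only.
    amplitudes = [100.0 + 10.0 * c for c in range(32)]
    print("threshold (mV):", comparator_threshold_mv(5))
    print("selected regulator code:", calibrate_regulator_code(amplitudes))
```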
Figure 4-3: Phase Deviation over time for 3 attenuation modes (16.6dB, 13.6dB,10.6dB) at fm = 38kHz.
4.4 Waveform Generator
4.4.1 Basic Operation
The goal of the waveform generator is to create the bit values representing a given target phase over a modulation period. Some examples of phase waveforms targeting different attenuation values are shown in Figure 4-3.
A simplified architecture implements an inner counter (called the frequency-tuning word) to keep track of the magnitude of the frequency deviation. Because phase is the integral of frequency, we increment an outer register (usually called the phase accumulator) by the instantaneous value stored in the frequency-tuning word (Figure 4-4). The phase accumulator therefore stores the phase value to be used at every clock cycle. A single combinational logic element can detect whether the frequency-tuning word is full or empty and switch from increment to decrement or vice versa. By switching to decrement for both the phase accumulator and the frequency-tuning word, one achieves a sawtooth frequency modulation. How fast the frequency-tuning word fills up then determines fm, and the value of the phase accumulator when the frequency-tuning word register is full determines the maximum phase deviation (and hence the attenuation specification).
Figure 4-4: Waveform Generator block diagram.
Register Sizing Math :
The fact that the slowest spread-spectrum fm, 38kHz, is roughly 2^13 times slower than the digital clock update rate of 312MHz implies that the frequency tuning word must be at least 12 bits long (it takes 2^12 − 1 cycles to increment all the way up and 2^12 − 1 cycles to decrement all the way down to 0). The phase accumulator must therefore be at least 17 bits long (an additional 5 bits for the phase value). We want a total of 64 periods of phase deviation, so we must pick an appropriate bit range of the phase accumulator going into the decoder (that is, divide its value) to obtain the correct number of period deviations.
If we call an the value of the frequency tuning word after the nth clock cycle (an = n, since it increments by 1 each cycle) and b the value of the phase accumulator, the maximum value of b is:

$$ \max(b) = \sum_{n=1}^{2^{12}-1} a_n = \frac{(2^{12}-1)}{2} \times 2^{12} \approx 2^{23} \tag{4.4} $$

To obtain 64 periods of phase deviation, or 64 × 2^5 = 2^11 incremental phase steps, the value of a phase step must be 2^23/2^11 = 2^12. Therefore, bit[11] of the phase accumulator must represent an LSB of phase step. It also implies that the phase accumulator must be at least 17 bits wide, since a phase value is encoded using 5 bits.
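The register-sizing argument can be checked with a cycle-level behavioral sketch of the Figure 4-4 architecture. It assumes that the frequency-tuning word is declared full when its next increment would overflow, and that the down ramp subtracts the current tuning-word value from the phase accumulator before decrementing it, mirroring the up ramp; the real RTL may order these operations differently, and the function name is hypothetical. With a 12-bit tuning word it reproduces a peak accumulator value of (2^12 − 1)·2^12/2 ≈ 2^23 and a modulation period of 8190 clocks, about 38kHz at 312MHz.

```python
def simulate_phase_accumulator(ftw_bits: int = 12, ftw_step: int = 1) -> list:
    """One modulation period of the tuning-word / phase-accumulator pair.

    Returns the phase-accumulator value after every clock cycle.
    """
    ftw_max = (1 << ftw_bits) - 1
    a = 0                 # frequency-tuning word (inner register)
    b = 0                 # phase accumulator (outer register)
    counting_up = True
    history = []
    while True:
        if counting_up:
            a += ftw_step
            b += a                         # phase integrates frequency
            if a + ftw_step > ftw_max:     # "full": next step would overflow
                counting_up = False
        else:
            b -= a                         # mirror image of the up ramp
            a -= ftw_step
        history.append(b)
        if not counting_up and a == 0:     # "empty": modulation period complete
            break
    return history

if __name__ == "__main__":
    clock_hz = 312e6                       # digital update rate quoted in the text
    h = simulate_phase_accumulator()
    print("cycles per modulation period:", len(h))
    print(f"modulation frequency: {clock_hz / len(h) / 1e3:.1f} kHz")
    print("peak accumulator value:", max(h), "vs 2**23 =", 2 ** 23)
```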
4.4.2 Modes of Operation
The main operational mode of the spread-spectrum block is fm = 38kHz with 20dB attenuation (19.6dB to be exact). Additional built-in modes were also implemented for debugging purposes. In particular, we want independent control of the maximum phase shift, which mainly sets the attenuation, and of the modulation frequency, which could be matched to different RBW settings of the spectrum analyzer.
Figure 4-5: Acceptable Modes of Operation.
Suppose we want to increase fm from 38kHz to 76kHz. All we need to do is increment the frequency tuning word by 2 LSBs at a time, so that the frequency tuning word register fills up twice as fast. Because fm is determined by how fast this inner register fills, doing so effectively doubles fm. However, a side effect of this trick is that it halves the maximum phase deviation. The calculation below shows that the phase accumulator reaches a maximum value half as large ($a_n$ represents the frequency tuning word, and $b$ the phase accumulator):
$$a_n = 2n, \qquad \max(b) = \sum_{n=1}^{2^{11}-1} a_n = \sum_{n=1}^{2^{11}-1} 2n \approx 2^{22} \qquad (4.5)$$
Recall that for $a_n = n$, the largest value $b$ took was $2^{23}$; the maximum phase deviation has therefore been halved. This implies that, to keep the maximum phase fixed while doubling fm (by doubling the step taken by the frequency tuning word), we must also double the value of the output phase (i.e., take the phase bits from the accumulator one bit position lower). From this observation, we generate all the modes of operation: maximum deviations of 16, 8, 4 and 2 periods of the 1.25GHz clock, and fm values of 38kHz, 76kHz, 152kHz, 304kHz, 608kHz and 1.216MHz. Many of these modes are anomalous: they have unacceptable phase skips. We determine which modes violate the required maximum systematic cycle-to-cycle jitter of 1/32 of the input clock period and plot them in Figure 4-5.
Indeed, the plot shows that modes with both large attenuation and large fm violate the maximum phase-skip requirement of 2 LSBs of phase. This is because the phase skip scales linearly with both the maximum phase deviation (which sets the attenuation) and fm. The boundary between acceptable and unacceptable modes therefore forms a −3dB-per-octave slope, implying an inverse relation between attenuation and fm at a fixed maximum phase skip.
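The mode arithmetic above can be reproduced with a small behavioral model. This Python sketch rests on our own assumptions: the tuning word ramps up by `step` LSBs per 312MHz update, fm counts one up-ramp plus one equal down-ramp, deviations are given in 5GHz input periods (64, 32, 16, 8, i.e. 16, 8, 4, 2 periods of 1.25GHz), and the phase skip is taken as the largest jump of the unwrapped phase code between consecutive updates. It deliberately applies no accept/reject threshold; Figure 4-5 remains the reference for that.

```python
F_DIG = 312e6      # digital update rate (Hz), as quoted in the text
RAMP_BITS = 12     # base mode: the tuning word ramps over 2**12 updates


def log2_exact(x: int) -> int:
    """log2 of a power of two."""
    assert x > 0 and x & (x - 1) == 0, "expected a power of two"
    return x.bit_length() - 1


def simulate_mode(step: int, max_periods: int):
    """Model one up-ramp of a mode.

    step        -- tuning-word increment per update (1 = 38kHz base mode)
    max_periods -- target maximum deviation in input-clock periods (64, 32, 16, 8)
    Returns (fm in Hz, phase-LSB bit, largest phase jump per update in LSBs,
             phase steps reached at the end of the ramp).
    """
    ramp_updates = 2**RAMP_BITS // step - 1
    # Doubling `step` halves max(b), so the phase bits are taken one position lower;
    # halving the target deviation takes them one position higher.
    lsb_bit = RAMP_BITS - log2_exact(step) + log2_exact(64 // max_periods)
    word = acc = prev_code = max_jump = 0
    for _ in range(ramp_updates):
        word += step
        acc += word
        code = acc >> lsb_bit                 # unwrapped phase, in phase LSBs
        max_jump = max(max_jump, code - prev_code)
        prev_code = code
    fm = F_DIG / (2 * 2**RAMP_BITS // step)   # up-ramp plus down-ramp
    return fm, lsb_bit, max_jump, prev_code


for step in (1, 2, 4, 8, 16, 32):
    for periods in (64, 32, 16, 8):
        fm, lsb, jump, total = simulate_mode(step, periods)
        print(f"fm = {fm / 1e3:7.1f} kHz, deviation = {periods:2d} periods, "
              f"LSB = bit[{lsb:2d}], max skip = {jump} LSB, steps reached = {total}")
```

In this model the skip grows with both `step` and the deviation, so doubling fm while halving the deviation leaves it unchanged: 76kHz with 32 periods skips by the same 1 LSB as 38kHz with 64 periods. This is the inverse attenuation/fm trade-off described above.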
The layout of the digital block after digital synthesis is shown in Figure 4-6.
4.5 Decoders
The decoders take in the phase value bits from the waveform generation block and decode them into a thermometer bit pattern that controls the I and Q values of the interpolator, as well as the number of common-mode correction cells turned on. The decoders are clocked at both the input and the output, because the way the bits are routed might introduce significant timing skew between the 2 decoders. As long as the clock is routed symmetrically, however, that timing skew is irrelevant. Figure 4-7 shows the decoder layout.
Figure 4-6: Waveform Generator layout.
Figure 4-7: Sine and Cosine Decoders Layout.
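To make the decoders' job concrete, here is a hypothetical Python sketch of a sine/cosine-to-thermometer decoder. The unit-cell count (`N_UNITS = 8`), the round-to-nearest sinusoidal weighting and the sign convention are assumptions made for illustration only; the text does not specify them, and the common-mode correction cells are left out.

```python
import math

N_UNITS = 8        # hypothetical number of unit cells on each of the I and Q sides
PHASE_CODES = 32   # 5-bit phase value: 32 phase steps per period


def thermometer(value: int, width: int) -> list:
    """Binary magnitude -> thermometer code with the lowest `value` cells enabled."""
    if not 0 <= value <= width:
        raise ValueError("magnitude out of range")
    return [1] * value + [0] * (width - value)


def decode_phase(code: int):
    """Map a 5-bit phase code to (sign, thermometer) pairs for the I and Q cell arrays.

    Sign +1/-1 stands for which polarity of the differential input is steered;
    common-mode correction cells are not modeled here.
    """
    theta = 2 * math.pi * (code % PHASE_CODES) / PHASE_CODES
    i_weight = round(N_UNITS * math.cos(theta))   # cosine decoder
    q_weight = round(N_UNITS * math.sin(theta))   # sine decoder
    i_side = (1 if i_weight >= 0 else -1, thermometer(abs(i_weight), N_UNITS))
    q_side = (1 if q_weight >= 0 else -1, thermometer(abs(q_weight), N_UNITS))
    return i_side, q_side


for code in (0, 4, 8, 16):
    print(code, decode_phase(code))
```

A thermometer code changes by one cell per LSB step, which keeps the switching activity at each 312MHz update small and monotonic.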
4.6 Calibration Finite State Machine
4.6.1 Operation
Figure 4-8: Calibration algorithm.
Because the SSCG is meant to operate in only 2 different frequency modes, the calibration FSM is particularly simple. For the 5GHz frequency, all the inverters have to be turned on, while for the 2.5GHz frequency, only half of the inverters in the 2nd stage have to be on, as determined in simulation. Figure 4-8 summarizes the calibration process. Once the speed of operation is selected, the calibration FSM steps the regulator code from 0 upward until the amplitude of the input signal to the interpolator is large enough to switch the peak detector output, and then sends out a flag signaling that calibration is done. Once this flag is raised, the spread-spectrum waveform can start. However, we kept the spread-spectrum “start” signal independent of the calibration “end” signal, because the start of the spread-spectrum waveform has to be synchronized with the FIFO buffer to make sure that the FIFO buffer does not overflow (FIFO buffer shown in Figure 1-5). Figure 4-9 shows the calibration block layout.
Figure 4-9: Calibration layout.
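The calibration flow described above can be sketched behaviorally in Python. Everything here is an illustrative assumption rather than the synthesized FSM: the function names, the 6-bit regulator code range (`MAX_CODE = 63`), and the toy linear peak-detector model, which trips at 250mV in the spirit of the linearized model used in Chapter 5.

```python
from typing import Callable, Tuple

MAX_CODE = 63   # hypothetical: 6-bit regulator code


def run_calibration(fast_mode: bool,
                    peak_detector_trips: Callable[[int, bool], bool],
                    max_code: int = MAX_CODE) -> Tuple[bool, int, bool]:
    """Behavioral sketch of the calibration FSM.

    fast_mode -- True for 5GHz (all inverters on), False for 2.5GHz
                 (half of the 2nd-stage inverters on).
    peak_detector_trips(code, half_inverters) -- models the peak detector output.
    Returns (half_inverters, final regulator code, calibration-done flag).
    """
    half_inverters = not fast_mode
    for code in range(max_code + 1):        # step the regulator code upward from 0
        if peak_detector_trips(code, half_inverters):
            return half_inverters, code, True
    return half_inverters, max_code, False  # amplitude never reached: flag stays low


def toy_detector(code: int, half_inverters: bool) -> bool:
    """Toy linear model (ignores half_inverters): swing rises 5mV per code, trips at 250mV."""
    swing_mv = 100 + 5 * code
    return swing_mv >= 250


print(run_calibration(True, toy_detector))
```

With the toy model, the 5GHz run stops at code 30 with the done flag raised. The function only returns the flag; starting the spread-spectrum waveform is left to a separate, FIFO-synchronized decision, as in the text.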
4.7 Top-Level Floor-Plan and Clock Distribution
4.7.1 Floor-Plan
A top-level floor-plan layout is shown in Figure 4-10. The high-frequency signal path is sandwiched between the low-frequency circuits and the clock routing.
4.7.2 Clock Distribution
It is of particular importance to route the clocks symmetrically (clock paths are shown in Figure 4-11). Because the decoder outputs are clocked, the bits that feed into the decoders can be skewed, as long as the clocks that control when the decoders switch their output bits are not. A numerical study of the effect of clock skew on the spectrum shows that clock skew has little effect on the spectral attenuation itself (Figure 4-12).
Figure 4-10: Top-Level Floor Plan, highlighting the new blocks. Synthesized digital circuits are highlighted in solid lines.
Figure 4-11: Phase value bits and clock routing paths.
Figure 4-12: Effect of clock skew between the 2 decoders on spectral attenuation for a 5GHz input clock. The parameters for this simulation are φm = 64 × 2π, fm = 38kHz, Tdig = 3.2ns.
4.8 Power Summary
Block | Power (mW)
Calibration FSM | 1.167
Decoders | 0.758
Waveform Generator | 1.414
Total | 3.338
Table 4.1: Digital and auxiliary circuit power consumption.
4.9 Summary
We presented the blocks that perform filter bandwidth calibration (the LDO, peak detector and calibration FSM) and waveform generation (waveform generator and decoders). We characterized the LDO for future behavioral use, and calculated the register sizing for the waveform generator. Because the phase is digitally controlled to produce a sawtooth waveform, the waveform generation block is particularly simple, and allows us to implement many modes of operation with relatively little circuitry. We analyzed decoder timing skew effects, showing that small skew has little effect on the attenuation specification. The digital blocks consume a negligible 3.338mW, well within the power budget specification.
Chapter 5
Top-Level Simulation
5.1 Calibration
We plot here the simulation result for a typical calibration run (Figure 5-1). A behavioral model of the phase interpolator and the peak detector was used. For the phase-interpolator filter, we used a linearized model of the relationship between RF amplitude and regulator code. The peak detector was set to trigger at a regulator output of 0.9V, at which point the amplitude reached about 250mV single-ended peak-to-peak. This output voltage is a typical regulator value at slow corners. The system was configured so that the calibration flag starts spread-spectrum right away, although realistically, the spread-spectrum start signal and the calibration end flag can (and should) be set independently.
5.2 Spread-Spectrum Operation
A cross-PVT-corner simulation of spread-spectrum operation with full layout parasitics is impractical. Instead, we are interested here in capturing phase errors across corners and characterizing their effect on the spectral attenuation specification.
In the previous chapter, we collected the phase output vs. code input relationship of the phase interpolator across corners, using a 312MHz code sweep in transient simulation. This relationship has been recorded and encoded into a behavioral model of the phase interpolator. We then use this behavioral model for cross-corner spread-spectrum simulation. This simulation does not capture layout parasitics of the full block, but captures all layout parasitics of the high-frequency signal path (this includes input filters and output restoration), as well as loading effects of the peak detector and divide-by-4 circuit. We plot the attenuation obtained as a function of the corner value (Figure 5-2). We observe that the phase nonlinearity has a small impact on the undivided clock, but almost no impact beyond numerical truncation accuracy on the divided-down clock (Figure 5-3). The attenuation specification is measured using the standard metric presented in Chapter 2: by comparing the spectral height of the fundamental of an unspread and a spread output clock. The final attenuation figure is about 19.6dB for the divide-by-4 output clock (Figure 5-3).
Figure 5-1: Demonstration of Calibration Algorithm at work with behavioral phase interpolator and peak detector.
Figure 5-2: Output Clock attenuation.
5.3 Full Circuit Simulation
A (schematic) transient simulation of all the circuits at transistor level is done to verify functionality. The simulation is carried out at corner 28, which is the slowest corner. We record transient signals in the high-frequency signal path (Figure 5-4), calculate the phase deviation during the code sweep (Figure 5-5), and graph the phase INL and DNL across code values (Figure 5-6). We obtain a DNL_rms of 6.6 milliperiods (a phase LSB is 1/32 of a period, about 31 milliperiods).
Figure 5-3: Divide-by-4 clock attenuation.
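For reference, the phase DNL and INL of an interpolator can be computed from one full turn of measured phases as sketched below in Python. This is a generic sketch, not necessarily the exact procedure behind Figure 5-6: it assumes DNL is each code step minus the ideal 1/32-period step, INL is the running sum of DNL, and all results are expressed in milliperiods.

```python
import math


def phase_dnl_inl(phases, codes_per_period=32):
    """DNL/INL of a phase interpolator from one full turn of measured phases.

    phases[k] -- output phase (in periods) measured for code k, k = 0..N-1.
    Returns (dnl, inl, dnl_rms), all in milliperiods.
    """
    n = len(phases)
    lsb = 1.0 / codes_per_period
    dnl = []
    for k in range(n):
        step = (phases[(k + 1) % n] - phases[k]) % 1.0   # wrap the step into [0, 1)
        dnl.append((step - lsb) * 1e3)
    inl, total = [], 0.0
    for d in dnl:
        inl.append(total)        # INL[k] = sum of DNL[0..k-1]
        total += d
    dnl_rms = math.sqrt(sum(d * d for d in dnl) / n)
    return dnl, inl, dnl_rms


ideal = [k / 32 for k in range(32)]
skewed = list(ideal)
skewed[5] += 0.005               # one code placed 5 milliperiods late
print(phase_dnl_inl(skewed)[2])
```

A single code displaced by 5 milliperiods produces a +5/−5 milliperiod DNL pair and an RMS DNL of 1.25 milliperiods over the 32 codes.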
5.4 Simulation Results Summary
All the simulation results listed in Table 5.1 are the worst-case values obtained. The power figure was obtained at 5GHz clock input, with a code sweep rate of 312MS/s and with the peak detector powered down. The buffered output clocks were loaded with 50fF loads, which is most likely a conservative figure. The peak detector consumes 6.43mW if clocked at 78MHz. However, during calibration, the peak detector's comparator is triggered only once every 32 clock cycles, and the comparator consumes 2.77mW. Because the comparator is dynamic (it consumes power only during clock switching), the expected power consumption of the peak detector is $6.43\,\text{mW} - \frac{31}{32}\times 2.77\,\text{mW} \approx 3.74\,\text{mW}$. This is a slight underestimate because some bias lines do pull small currents, but these are much smaller than the currents pulled by the comparator.
Figure 5-4: Differential signals in signal path for 5GHz operation.
Figure 5-5: 312MS/s phase code sweep for 5GHz operation.
Figure 5-6: Simulation results at 5GHz for linearity performance.
Specification | Value | Actual Simulation Results
Power Budget | 50mW | 15.42mW
Input | quadrature clocks (5° error) | quadrature clocks (5° error)
Output | divide-by-4 quadrature clocks | divide-by-4 quadrature clocks
Jitter | Less than 5ps rms | 2.32ps (1)
Maximum cycle-to-cycle jitter | 20ps | 9.35ps (2)
Maximum Time Deviation | 64 periods of input clock | 64 periods of input clock
Area | 0.4mm × 0.4mm | 0.25mm × 0.35mm
Modulation Rate | 38kHz or faster | 38kHz to 306kHz (3)
Modulation Waveform | Sawtooth | Sawtooth
Input clock rates | fast mode (5GHz), slow mode (2.5GHz) | 5GHz and 2.5GHz
EMI reduction | 20dB reduction | 19.6dB
Process node | TSMC’s 65nm CMOS low-power | TSMC’s 65nm CMOS low-power
Table 5.1: Specifications summary.
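The peak-detector power estimate in Section 5.4 is a one-line duty-cycle calculation. The helper below (our own name, not part of the design) restates it in Python, assuming as the text does that the comparator's 2.77mW is drawn only on the cycles when it is clocked and that the rest of the 6.43mW is unaffected.

```python
def duty_cycled_power(p_total_mw: float, p_comparator_mw: float, active_fraction: float) -> float:
    """Average power when a dynamic comparator is clocked only a fraction of the time."""
    # The comparator's share of p_total is saved during the (1 - active_fraction) idle cycles.
    return p_total_mw - (1.0 - active_fraction) * p_comparator_mw


p_peak = duty_cycled_power(6.43, 2.77, 1 / 32)
print(f"{p_peak:.4f} mW")
```

It evaluates to about 3.747mW, which the text truncates to 3.74mW.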
5.5 Summary
In this chapter, we combine the behavioral knowledge from the extensive block-level simulations to verify the operation of the SSCG system both during start-up calibration mode and during normal spread-spectrum clocking mode. We evaluate the attenuation across corners, taking care to capture the phase nonlinearity of each corner in our behavioral models. We then simulate the whole system at transistor level for a code sweep at nominal conditions, confirming that the system is functional overall and that we are not being deceived by our behavioral models. The specification summary shows that we meet all specifications within reason. The attenuation of 19.6dB is a robust result that depends mostly on the selected maximum phase deviation. The value predicted during behavioral simulations was 19.72dB, showing that the architecture tolerates significant circuit non-idealities without fatally degrading the attenuation specification.
(1) Assumes 19.6dB attenuation and fm = 38kHz.
(2) Assumes 19.6dB attenuation and fm = 38kHz.
(3) Assumes an acceptable attenuation and fm combination. At fm = 306kHz, only the 10.6dB attenuation mode can be used.
Chapter 6
Conclusion
6.1 Summary
The goal of the work presented is to create an SSCG system for high-speed digital applications such as an RF-DAC. We showed analytic and numerical evidence for the feasibility of using a phase-control system, and proposed a simple phase modulator as a solution. The circuit is designed and laid out in TSMC’s 65nm CMOS process, and the final active area is 0.32mm × 0.25mm. Simulations demonstrate its operation in both slow (2.5GHz) and fast (5GHz) modes. The success of the circuit relies on a separation of frequency scales: the high-speed clock can be modulated by a relatively slow digital FSM and still achieve high spread-spectrum attenuation. Furthermore, the implementation of the phase modulator allows multiple, easily reconfigured modes of operation, and tight control of the maximum phase deviation. The latter is critical in applications where spread-clocked data needs to be retimed by an unspread-clocked circuit (e.g. when interfacing the digital and analog parts of a DAC).
6.2 System Usage
The spread-spectrum clock generator provided here generates 4 divide-by-4 quadrature clock outputs because this was the original specification. However, if we remove the divide-by-4 circuit, we obtain a high-speed spread-spectrum differential clock output (2.5GHz or 5GHz) that can be used to clock any type of digital system with a tight frequency modulation specification. Indeed, the main mode of operation, with fm = 38kHz and maximum phase deviation, would achieve more than 25.0dB of attenuation on the undivided clock spectrum. Furthermore, the frequency spread is only 20MHz, allowing this system to comply with the narrow frequency-spread requirements of the SATA I-III standards.
6.3 Further Work
6.3.1 Optimizing the Interpolator
As a first iteration of this system, the design was very conservative, leaving some aspects to be optimized. First, the circuit is designed to have higher phase linearity than necessary. We could instead have relied on calibrating the decode values to remove the phase nonlinearity. This technique would allow a much smaller phase interpolator using the linear interpolation scheme briefly considered in Chapter 2. It would also tolerate larger input quadrature errors, which could be calibrated out later.
Another target for optimization is the regulator. The regulator is a major area cost, because it was a pre-built general-purpose block with a current rating of 8mA. For our application, we only need about 2mA of current drive.
One possible idea is to eliminate the regulator and the filter altogether, and instead use replica-feedback biasing as an input stage to ensure a roughly constant signal swing at the input of the phase interpolator [28]. Essentially, the differential pair in this circuit has a replica, whose bias point is set so that its output swings to VREF when the input rails. This bias point is enforced using an op amp and routed to the actual cell. VREF is a reference voltage that depends on the frequency of the clock, and is usually supplied by the output of the loop filter of a PLL. The voltage Vbiasp, on the other hand, is set to make the PMOS loads operate in deep triode, so that they are roughly linear. This scheme therefore requires an on-chip PLL that locks onto the input clock. The presence of a PLL is quite likely, because it is a simple way to generate the quadrature input clocks required by the SSCG system.
Figure 6-1: Replica-Feedback biasing example (replica cell and op-amp setting the bias Vb from Vref; input pair driven by +sin/−sin with outputs outp/outn; PMOS loads biased by Vbiasp).
6.3.2 Calibration for Arbitrary Input Frequencies
The calibration algorithm as written is designed for only 2 modes of operation: a 2.5GHz or a 5GHz input clock. The circuitry, however, can operate at any frequency in between. It is therefore a simple matter of rewriting the algorithm so that the system can calibrate the input swing for these intermediate frequencies. This involves sweeping both the voltage supplied to the filters (fine-grain calibration) and the number of inverters switched on (coarse-grain calibration), while comparing the signal amplitude with a set reference.
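One possible shape for such a rewritten algorithm is the nested coarse/fine sweep sketched below in Python. The loop ordering (fewest inverters first, then the regulator code swept upward as in the existing FSM), the function names and the toy amplitude model are all hypothetical choices for illustration, not a verified design.

```python
from typing import Callable, Optional, Sequence, Tuple


def coarse_fine_calibrate(amplitude: Callable[[int, int], float],
                          reference: float,
                          coarse_settings: Sequence[int],
                          fine_codes: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Return the first (inverters_on, regulator_code) whose swing meets `reference`.

    Coarse settings are tried from fewest to most inverters; for each one the
    fine regulator code is swept upward, as in the existing 2-mode FSM.
    """
    for inverters_on in coarse_settings:
        for code in fine_codes:
            if amplitude(inverters_on, code) >= reference:
                return inverters_on, code
    return None   # no setting reaches the reference swing


def toy_amplitude(inverters_on: int, code: int) -> float:
    """Toy model (mV): swing grows with both the inverter count and the regulator code."""
    return 10 * inverters_on + 3 * code


print(coarse_fine_calibrate(toy_amplitude, 250, coarse_settings=(8, 16), fine_codes=range(32)))
```

With the toy model, 8 inverters never reach 250mV, so the sweep settles on 16 inverters at regulator code 30. Trying the smallest coarse setting first favors the lowest-power configuration that still meets the swing target.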
6.3.3 Top-Level Clock Distribution
A simple clock distribution scheme can be implemented (Figure 6-2). In this scheme, the input quadrature clocks are buffered into the interpolator, and divided-down versions of them are also used to clock the digital blocks. We provide both divide-by-16 and divide-by-8 clocks because the digital block can run at up to 312MHz: with 5GHz inputs the divide-by-16 clock must be used, while with 2.5GHz inputs the divide-by-8 clock can be used instead to fully exploit the speed of the digital block.
Figure 6-2: Simple Clock Distribution Method
Appendix A
Effect of Spectrum Analyzer
A.1 Model of peak-hold mode Spectrum Analyzer
In the numerical treatments, we assumed the measured spectrum would be similar to a numerical FFT. In general, this is not an accurate assumption. Instead, a spectrum analyzer operated in peak-hold mode will most likely be used to measure the output spectrum of a DAC (Figure A-1). We adhere to the notation of [15] and Chapter 2 of the thesis. The filter of the spectrum analyzer has an LTI response of the form:
$$h(t, f_c(t)) = h_0(t)\exp\left(2\pi j f_c(t)\, t\right), \qquad (A.1)$$
where $h_0(t)$ is a bandpass filter impulse response centered at DC with a bandwidth set to the RBW of the spectrum analyzer. The frequency-shifted bandpass filter $h(t, f_c)$ is centered at $f_c$. The signal $I_{mk}(t)$, the $k$th clock harmonic, is fed into this tunable filter, with $f_c$ being stepped discretely. Note that for a given RBW, the filter center can step only once every $1/\text{RBW}$, by a frequency step of the order of RBW. A peak detector measures the amplitude of the filter's output, called $I_b(t, f_c)$, which is then directly proportional to the value registered by the spectrum analyzer as the power-spectral amplitude at $f_c$ (called $S(f_c)$).
Figure A-1: Peak-Hold mode operation of spectrum analyzer, Figure from [16].
A.2 Calculation of Measured Spectrum
Intuitively, when the filter’s center frequency matches the signal frequency, $I_b(t, f_c)$ will “resonate” and its amplitude will be larger. To obtain an expression for the signal at the output of the spectrum analyzer's filter, we convolve the frequency-modulated input with the filter response:
$$I_{mk}(t') \equiv \frac{I_{0k}}{2}\exp\left(2\pi j f_0\left(t' + \delta\int_0^{f_m t'} V(\tau)\,d\tau\right)\right) \qquad (A.2)$$
$$I_b(t, f_c) = \int dt'\, I_{mk}(t')\, h_0(t-t')\exp\left(2\pi j f_c (t-t')\right) \qquad (A.3)$$
The expression for $I_b(t, f_c)$, the filter’s output, can be approximated using the method of stationary phase. The method of stationary phase tells us that the main contributions to the integral come from the instants when the input instantaneous frequency matches $f_c$, the filter center frequency. Around those times, we carry out a Taylor expansion of the phase to simplify $I_b$ to the expression below:
$$I_b(t, f_c) = \sum_\alpha \sum_l \frac{I_{0k}}{2}\sqrt{\frac{1}{jk\delta V'(t_{l\alpha})}}\,\Gamma(t_{l\alpha})\, h_0(t - t_{l\alpha})\exp\left(2\pi j f_c (t - t_{l\alpha})\right) \qquad (A.4)$$
$$S(f_c) \propto \max|I_b(t, f_c)| \qquad (A.5)$$
For a sensible analysis, we explore two limits for a sawtooth-like waveform (where there is only $\alpha = 1$): one where RBW ≫ fm and another where RBW ≪ fm (which corresponds to our application). For RBW ≫ fm, the terms in the sum do not overlap, and we can approximate:
$$S(f_c) \approx \frac{\Gamma(t_l)\max|h_0|}{\sqrt{jk\delta V'(t_l)}} \quad \text{for any } l \qquad (A.6)$$
We observe that in this case, the attenuation scales 3dB per octave with the first derivative of the instantaneous frequency, $\delta V'(t_l)$. For a sawtooth modulation, this value clearly depends on $\delta$ and fm. Furthermore, the attenuation is independent of the details of the filter shape; the only filter-related information is captured by $\max(|h_0|)$.
In the second case, to facilitate the analysis, consider a filter that is rectangular in the time domain (we’ve made the filter non-causal, but this does not significantly change the analysis), with:
$$h_0(t) = \text{RBW} \quad \text{for } |t| < \frac{1}{2\,\text{RBW}} \qquad (A.7)$$
$$h_0(t) = 0 \quad \text{for } |t| > \frac{1}{2\,\text{RBW}} \qquad (A.8)$$
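$$S(f_c) \approx \frac{1}{T_m \times \text{RBW}} \times \frac{I_{0k}}{2} \times \text{RBW}\,\sqrt{\frac{1}{k f_0 \delta V'(t_l)}}\,|\Gamma(t_l)| \quad \text{for any } l \qquad (A.9)$$
$$= \frac{1}{T_m}\,\frac{I_{0k}}{2}\,\sqrt{\frac{1}{k f_0 \delta V'(t_l)}}\,|\Gamma(t_l)| \qquad (A.10)$$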
$S(f_c)$ scales with $1/T_m$ because the number of stationary points that contribute to the integral is proportional to $1/T_m$. This relationship holds up to $T_m = 1/\text{RBW}$. We now recover the 3dB scaling between attenuation and the maximum phase $\delta f_0 V'(t_l) T_m^2$.
A.3 Example 1: Gaussian Filter
In the previous example, we considered a time-domain rectangular filter, which simplified the calculations but was not particularly realistic. Here, we consider a so-called Gaussian filter:
$$h_0(t) = \exp\left(-\frac{t^2}{\lambda^2}\right) \qquad (A.11)$$
This filter’s impulse response has a characteristic time scale $\lambda$, and therefore the RBW of the filter is $1/\lambda$. For fm ≫ RBW, we then have:
$$I_b(t, f_c) = \frac{I_{0k}}{2}\sum_l \sqrt{\frac{1}{jk\delta f_0 V'(t_l)}}\exp\left(-\frac{(t-t_l)^2}{\lambda^2} + j2\pi f_c (t-t_l)\right) \qquad (A.12)$$
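As we let the peak detector settle, the largest value $I_b$ takes will be the sum of all the stationary points’ contributions, weighted by $h_0(t - t_l)$. Therefore
$$|S(f_c)| \approx \left|\frac{1}{\sqrt{jkf_0\delta V'}}\,\frac{I_{0k}}{2}\right| \sum_{l=-\infty}^{\infty}\exp\left(-l^2\left(\frac{T_m}{\lambda}\right)^2\right) \qquad (A.13)$$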
Once again, we see the 2 regimes of operation. If $T_m \gg \lambda$, then only the $l = 0$ term contributes in any significant way. This implies that $|S(f_c)| \approx \left|\frac{1}{\sqrt{jkf_0\delta V'}}\frac{I_{0k}}{2}\right|$, and the dependence on $T_m$ drops out. In the case of $T_m \ll \lambda$, we have to account for the contributions of other values of $l$. If we bound the sum to $l \in (-10^4, 10^4)$, we obtain a reasonable estimate for values of $T_m/\lambda \gg 10^{-4}$, and the sum is easily estimated numerically. The numerical estimate matches our intuition that in this regime there should be a first-order roll-off relation between attenuation and $T_m$.
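The truncated sum in Eq. (A.13) is easy to reproduce. The Python sketch below evaluates it with the same bound on $l$, with the prefactor set to 1 as in Figure A-2. For $T_m/\lambda \ll 1$, the product of the sum and $T_m/\lambda$ approaches $\sqrt{\pi}$. This is a standard Gaussian-sum (Poisson summation) result that we bring in ourselves; it is not stated in the text, and it is the first-order $1/T_m$ roll-off described above. For $T_m/\lambda \gtrsim 3$, the sum collapses to the $l = 0$ term.

```python
import math

L_MAX = 10**4   # truncation of the sum over l, as in the text


def gaussian_sum(x: float, l_max: int = L_MAX) -> float:
    """Truncated sum of exp(-l**2 * x**2) over l = -l_max..l_max, with x = Tm/lambda."""
    return 1.0 + 2.0 * sum(math.exp(-(l * x) ** 2) for l in range(1, l_max + 1))


for x in (1e-3, 1e-2, 1e-1, 1.0, 3.0):
    s = gaussian_sum(x)
    print(f"Tm/lambda = {x:g}: sum = {s:.4f}, sum * Tm/lambda = {s * x:.4f}")
```

The printed `sum * Tm/lambda` column stays near 1.7725 for small ratios, which is the constant-times-$1/T_m$ regime, and the sum itself tends to 1 once $T_m$ exceeds a few $\lambda$.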
Figure A-2: Attenuation as a function of $T_m/\lambda$ for a Gaussian filter, where we fix $\left|\frac{1}{\sqrt{jkf_0\delta V'}}\frac{I_{0k}}{2}\right| = 1$.
A.4 Example 2: Sinc Filter
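Another filter response could be the sinc function, which has a flat frequency response over a frequency range $1/\lambda$:
$$h_0(t) = \frac{\sin(\pi t/\lambda)}{\pi t}$$
$$|S(f_c)| \approx \left|\frac{1}{\sqrt{jkf_0\delta V'}}\,\frac{I_{0k}}{2}\right| \sum_{l=-\infty}^{\infty}\frac{\sin(l\pi T_m/\lambda)}{\pi l T_m} \qquad (A.14)$$
where the $l = 0$ term is understood as its limit, $1/\lambda$. We compute $|S(f_c)|$ similarly and graph its behavior (Figure A-3).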
Figure A-3: Attenuation as a function of $T_m/\lambda$ for a sinc filter, where we fix $\left|\frac{1}{\sqrt{jkf_0\delta V'}}\frac{I_{0k}}{2}\right| = 1$.
Appendix B
Terminology
BPF Band Pass Filter
CDR Clock and Data Recovery (circuit)
CP Charge Pump
DAC Digital-to-Analog Converter
DC Direct Current
DFT Discrete Fourier Transform
DLL Delay-Locked Loop
DNL Differential Non-Linearity
DRFPM Digital-to-RF Phase Modulator
EMI Electro-Magnetic Interference
FFT Fast Fourier Transform
FIFO First In First Out
FSM Finite State Machine
HVT High Threshold Voltage
INL Integral Non-Linearity
LF Loop-Filter
LPF Low Pass Filter
LSB Least significant bit
LTI Linear Time-Invariant
LVT Low Threshold Voltage
MSB Most Significant Bit
PA Power Amplifier
PD Phase Detector
PLL Phase-Locked Loop
PSD Power Spectral Density
PVT Process, (supply) Voltage, Temperature
RBW Resolution Bandwidth
RF Radio-Frequency
SFDR Spurious-Free Dynamic Range
SS Spread-Spectrum
SSCG Spread-Spectrum Clock Generator (Generation)
STI Shallow Trench Isolation
VCDL Voltage-Controlled Delay Line
VCO Voltage-Controlled Oscillator
Appendix C
PVT Corner Nomenclature
The PVT corners in this thesis are numbered. The number determines the skew characteristics of the devices as well as the simulated temperature and supply voltage. The devices can be skewed nominal, fast or slow, while the supply is varied among 1.14V, 1.20V (nominal) and 1.26V. The temperature is set to −40°C, 27°C or 125°C. In the list below, n is shorthand for NMOS, p for PMOS, r for resistor and c for capacitor. For a corner number n, the simulation options are:
• If n mod (5) = 0: slow n, fast p, nominal r, nominal c
• If n mod (5) = 1: nominal n, nominal p, nominal r, nominal c
• If n mod (5) = 2: fast n, fast p, fast r, fast c
• If n mod (5) = 3: slow n, slow p, slow r, slow c
• If n mod (5) = 4: fast n, slow p, nominal r, nominal c
• If n mod (15) < 5, temperature is 27°C
• If 5 ≤ n mod (15) ≤ 9, temperature is −40°C
• If n mod (15) > 9, temperature is 125°C
• If n ≤ 15, supply is 1.2V
• If 15 < n ≤ 30, supply is 1.14V
• If n > 30, supply is 1.26V.
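Because the numbering is purely arithmetic, it can be captured in a few lines. The Python helper below is our own sketch, not a tool from the thesis flow. It reads the last temperature rule as n mod 15 > 9, the only reading consistent with the other two rules. Corner 28, used for the full-circuit simulation in Chapter 5, decodes to all-slow devices at 125°C and 1.14V, consistent with it being the slowest corner.

```python
DEVICE_SKEWS = {
    # n mod 5 -> (NMOS, PMOS, resistor, capacitor)
    0: ("slow", "fast", "nominal", "nominal"),
    1: ("nominal", "nominal", "nominal", "nominal"),
    2: ("fast", "fast", "fast", "fast"),
    3: ("slow", "slow", "slow", "slow"),
    4: ("fast", "slow", "nominal", "nominal"),
}


def decode_corner(n: int) -> dict:
    """Translate a PVT corner number into its device skews, temperature and supply."""
    nmos, pmos, res, cap = DEVICE_SKEWS[n % 5]
    m = n % 15
    if m < 5:
        temp_c = 27
    elif m <= 9:
        temp_c = -40
    else:                       # read as n mod 15 > 9
        temp_c = 125
    if n <= 15:
        vdd = 1.20
    elif n <= 30:
        vdd = 1.14
    else:
        vdd = 1.26
    return {"nmos": nmos, "pmos": pmos, "r": res, "c": cap, "temp_c": temp_c, "vdd": vdd}


print(decode_corner(28))
```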
Bibliography
[1] Domine Leenaerts and Gabriele Manganaro. Advances in Analog and RF IC Design for Wireless Communication Systems. Elsevier, Oxford, UK, 2013.
[2] Gabriele Manganaro. Advanced Data Converters. Cambridge University Press, Cambridge, UK, 2012.
[3] Behzad Razavi. Design of Analog CMOS Integrated Circuits. McGraw Hill, New York, NY, 2001.
[4] David A. Tony C. Carusone. Design of Analog CMOS Integrated Circuit. Wiley & Sons, Danvers, MA, 2001.
[5] Hsiang-Hui Chang, I-Hui Hua, and Shen-Iuan Liu. A Spread-Spectrum Clock Generator with Triangular Modulation. Solid-State Circuits, IEEE Journal of, 38(4):673–676, Apr 2003.
[6] Chao-Chyun Chen, Sheng-Chou Lee, and Shen-Iuan Liu. A Spread-Spectrum Clock Generator Using a Capacitor Multiplication Technique. In Emerging Information Technology Conference, 2005., pages 4 pp.–, Aug 2005.
[7] Yi-Bin Hsieh and Yao-Huang Kao. A Fully Integrated Spread-Spectrum Clock Generator by Using Direct VCO Modulation. Circuits and Systems I: Regular Papers, IEEE Transactions on, 55(7):1845–1853, 2008.
[8] M. Kokubo, T. Kawamoto, T. Oshima, T. Noto, M. Suzuki, S. Suzuki, T. Hayasaka, T. Takahashi, and J. Kasai. Spread-spectrum Clock Generator for Serial ATA Using Fractional PLL Controlled by Delta-Sigma Modulator with Level Shifter. In Solid-State Circuits Conference, 2005. Digest of Technical Papers. ISSCC. 2005 IEEE International, pages 160–590 Vol. 1, 2005.
[9] Yi-Bin Hsieh and Yao-Huang Kao. A Fully Integrated Spread Spectrum Clock Generator Using Two-Point Delta-Sigma Modulation. In Circuits and Systems, 2007. ISCAS 2007. IEEE International Symposium on, pages 2156–2159, 2007.
[10] R.H. Mekky and M. Dessouky. A 0.8 ps rms Jitter, 6.3 GHz Spread Spectrum Clock Generator for SerDes Transmitter Clocking. In Microelectronics (ICM), 2010 International Conference on, pages 80–83, Dec 2010.
[11] Kuo-Hsing Cheng, Cheng-Liang Hung, and Chih-Hsien Chang. A 0.77 ps RMS Jitter 6-GHz Spread-Spectrum Clock Generator Using a Compensated Phase-Rotating Technique. Solid-State Circuits, IEEE Journal of, 46(5):1198–1213, May 2011.
[12] S. Damphousse, K. Ouici, A. Rizki, and M. Mallinson. All Digital Spread Spectrum Clock Generator for EMI Reduction. In Solid-State Circuits Conference, 2006. ISSCC 2006. Digest of Technical Papers. IEEE International, pages 962–971, Feb 2006.
[13] D. De Caro, C.A. Romani, N. Petra, A.G.M. Strollo, and C. Parrella. A 1.27 GHz, All-Digital Spread Spectrum Clock Generator/Synthesizer in 65 nm CMOS. Solid-State Circuits, IEEE Journal of, 45(5):1048–1060, May 2010.
[14] Jonghoon Kim, Dong Gun Kam, Pil Jung Jun, and Joungho Kim. Spread Spectrum Clock Generator with Delay Cell Array to Reduce Electromagnetic Interference. Electromagnetic Compatibility, IEEE Transactions on, 47(4):908–920, 2005.
[15] Y. Matsumoto, K. Fujii, and A. Sugiura. An Analytical Method for Determining the Optimal Modulating Waveform for Dithered Clock Generation. Electromagnetic Compatibility, IEEE Transactions on, 47(3):577–584, Aug 2005.
[16] D. De Caro. Optimal Discontinuous Frequency Modulation for Spread-Spectrum Clocking. Electromagnetic Compatibility, IEEE Transactions on, 55(5):891–900, Oct 2013.
[17] T.W. Barton, SungWon Chung, P.A. Godoy, and J.L. Dawson. A 12-bit Resolution, 200-MSample/second Phase Modulator for a 2.5GHz Carrier with Discrete Carrier Pre-rotation in 65nm CMOS. In Radio Frequency Integrated Circuits Symposium (RFIC), 2011 IEEE, pages 1–4, 2011.
[18] R. Kreienkamp, Ulrich Langmann, C. Zimmermann, and T. Aoyama. A 10-Gb/s CMOS Clock and Data Recovery Circuit with an Analog Phase Interpolator. In Custom Integrated Circuits Conference, 2003. Proceedings of the IEEE 2003, pages 73–76, 2003.
[19] Taylor W. Barton. Phase Manipulation for Efficient Radio Frequency Transmission. PhD thesis, MIT, Cambridge, MA, August 2012.
[20] G. von Bueren, L. Rodoni, H. Jaeckel, A. Huber, R. Brun, D. Holzer, and M. Schmatz. 5.75 to 44Gb/s Quarter Rate CDR with Data Rate Selection in 90nm Bulk CMOS. In Solid-State Circuits Conference, 2008. ESSCIRC 2008. 34th European, pages 166–169, 2008.
[21] Xiuge Yang, Changhua Cao, K.K. O, J. Brewer, and Jenshan Lin. A 2.5 GHz Constant Envelope Phase Shift Modulator for Low-Power Wireless Applications. In Radio Frequency Integrated Circuits (RFIC) Symposium, 2005. Digest of Papers. 2005 IEEE, pages 667–670, 2005.
[22] Hua Wang and A. Hajimiri. A Wideband CMOS Linear Digital Phase Rotator. In Custom Integrated Circuits Conference, 2007. CICC ’07. IEEE, pages 671–674, 2007.
[23] S. Sidiropoulos and M.A. Horowitz. A Semidigital Dual Delay-Locked Loop. Solid-State Circuits, IEEE Journal of, 32(11):1683–1692, 1997.
[24] Stefanos Sidiropoulos. High Performance Inter-Chip Signaling. PhD thesis, Stanford, Stanford, CA, April 1998.
[25] Paul R. Gray et al. Analysis and Design of Analog Integrated Circuits. Wiley & Sons, Danvers, MA, 2010.
[26] Jan M. Rabaey, Anantha Chandrakasan, and Borivoje Nikolic. Digital Integrated Circuits, A Design Perspective. Prentice Hall, Danvers, MA, 2003.
[27] Wei Wu, Gang Du, Xiaoyan Liu, Lei Sun, Jinfeng Kang, and Ruqi Han. Physical-Based Threshold Voltage and Mobility Models Including Shallow Trench Isolation Stress Effect on nMOSFETs. Nanotechnology, IEEE Transactions on, 10(4):875–880, July 2011.
[28] J.G. Maneatis. Low-Jitter Process-Independent DLL and PLL Based on Self-Biased Techniques. Solid-State Circuits, IEEE Journal of, 31(11):1723–1732, Nov 1996.