Improved phase detection for Digital Phase-Locked Loops
by
Amer Samarah
A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy
Graduate Department of Edward S. Rogers Sr. Department of Electrical
and Computer Engineering
University of Toronto
© Copyright 2016 by Amer Samarah
Improved phase detection for Digital Phase-Locked Loops
Amer Samarah
Doctor of Philosophy
Graduate Department of Edward S. Rogers Sr. Department of Electrical and Computer
Engineering
University of Toronto
2016
Abstract
Digital PLLs (DPLLs) have emerged as reliable alternatives to analog PLLs since they
are more robust in the presence of process variations and mismatch and do not need
a large on-chip capacitor to realize a loop filter. However, a DPLL employs a
time-to-digital converter (TDC) that resolves the phase error in quantized steps, and
this quantization shows up as deterministic jitter in the output clock. Similarly, a DPLL
tunes a digitally controlled oscillator (DCO) in discrete frequency steps, which adds an
extra source of jitter. Furthermore, the quantized response of the DPLL may cause chaotic
limit cycles in some configurations.
This thesis presents innovations to make DPLLs suitable for a wide range of applications.
First, a novel low-power TDC with 4 ps resolution, approximately an order of
magnitude better than an inverter delay in the 0.13 µm CMOS technology, is enabled by
employing a highly digital coarse-fine TDC with a calibrated coarse stage followed by a
fine stochastic stage. On power-up, an on-chip calibration algorithm based on a balanced
mean code density test is used to minimize nonlinearities in the coarse TDC, reducing
worst-case spurs from -54.4 dBc to -70.55 dBc at 1.995 GHz operation.
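As a rough illustration of the code density idea behind the coarse-TDC calibration, the sketch below estimates per-cell delays from a histogram of output codes when the input time errors are uniformly distributed. It is a minimal Python sketch under stated assumptions, not the balanced-mean algorithm or the on-chip implementation presented in this thesis: the function name, the example delays, and the assumption that the total delay-line span is known are all illustrative.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def code_density_delays(true_delays_ps, n_samples=200_000, seed=0):
    """Estimate each delay-cell width from a histogram of TDC codes.

    Assumptions (hypothetical, for illustration only): input time errors are
    uniform over the full delay-line span, and that span is known.
    """
    rng = random.Random(seed)
    edges = list(accumulate(true_delays_ps))  # cumulative cell boundaries in ps
    span = edges[-1]
    counts = [0] * len(true_delays_ps)
    for _ in range(n_samples):
        t = rng.uniform(0.0, span)  # uniformly distributed time error
        code = min(bisect_right(edges, t), len(counts) - 1)  # cell containing t
        counts[code] += 1
    # A wider cell is hit proportionally more often: width ~ hit fraction * span.
    return [span * c / n_samples for c in counts]


if __name__ == "__main__":
    delays = [30.0, 36.0, 33.0, 35.0]  # illustrative mismatched cell delays in ps
    for true, est in zip(delays, code_density_delays(delays)):
        print(f"true {true:5.1f} ps  estimated {est:6.2f} ps")
```

The estimated widths could then drive a per-cell correction, for example by trimming each cell toward the mean width; that correction step is left out here because its details depend on the calibration hardware.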
The thesis also investigates dead-zone behavior in DPLLs caused by the quantization
effect of the TDC. The dead zone results in chaotic limit-cycle behavior, producing
higher-than-expected in-band phase noise and strong spurious tones. To alleviate this
problem, a noise-shaped offset is added to the phase error to keep the TDC active and
away from the dead zone. The proposed solution is demonstrated in a 0.13 µm CMOS
prototype achieving consistently low in-band noise.
A binary bang-bang phase detector (BBPD) is a commonly used alternative to the
power-hungry TDC. However, BBPD-based DPLLs have a limited frequency pull-in and
capture range that is traded off against steady-state jitter performance. The thesis proposes
an alternative to the BBPD: a multi-phase bang-bang detector (MPBBD). The thesis also
presents a rigorous mathematical analysis of the cycle-slipping behavior to quantify
the pull-in and capture range as well as the locking time for a DPLL with either a BBPD or
an MPBBD. The final formula gives useful insight into the effect of various loop parameters
on the cycle-slipping behavior. A DPLL architecture with an MPBBD is presented and
implemented in 28 nm CMOS technology to improve the pull-in and capture range while
not affecting the steady-state jitter performance.
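To make the difference between the two detectors concrete, the sketch below contrasts a sign-only BBPD with a multi-phase detector that also reports a coarse magnitude. It is a minimal Python sketch, assuming the DCO period is split into eight equal 45-degree regions; the function names and the integer output levels of ±1 to ±4 are hypothetical and are not taken from the implemented detector.

```python
def wrap_deg(phase_deg):
    """Wrap a phase error in degrees into the interval [-180, 180)."""
    return ((phase_deg + 180.0) % 360.0) - 180.0


def bbpd(phase_deg):
    """Binary bang-bang detector: reports only the sign of the phase error."""
    return 1 if wrap_deg(phase_deg) >= 0 else -1


def mpbbd(phase_deg, regions=8):
    """Multi-phase detector sketch: sign plus a coarse magnitude.

    Assumption: `regions` equal slices of the period (45 degrees each for 8),
    mapped to hypothetical integer levels 1..regions/2 on each side of zero.
    """
    p = wrap_deg(phase_deg)
    width = 360.0 / regions
    level = min(int(abs(p) // width) + 1, regions // 2)
    return level if p >= 0 else -level


if __name__ == "__main__":
    for phase in (10, 100, -170):
        print(phase, bbpd(phase), mpbbd(phase))
```

With a large phase error, the multi-phase output grows in magnitude while the binary output stays at ±1, which is the intuition for a larger effective gain when the loop is far from lock.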
Acknowledgements
First and foremost, I would like to express my sincere thanks and appreciation to my
supervisor, Prof. Anthony Chan Carusone, for his guidance, support, and constant
encouragement throughout the course of this thesis. Many thanks to my thesis committee
members, Prof. Glen Gulak, Prof. Antonio Liscidini, and Prof. Wai Tung Ng, for
reviewing my thesis and for their valuable feedback. I would also like to thank my external
examiner, Prof. Michael Peter Kennedy, for his detailed and valuable feedback.
To my colleagues and friends in BA5158 and BA5000 at the University of Toronto,
thank you for your encouragement and the unforgettable moments. Special thanks to Karim
Abdelhalim, Alireza Nilchi, Hamed Jafari, Kentaro Yamamoto, Yannis Sarkar, Derek Ho,
Dustin Dunwell, Saber Amini, Javid Musaev, Hossein Kassiri, and Arshya Feyzi. Special
thanks also to Marcel Lugthart and Greg Unruh for their help and fruitful discussions
during my internship at Broadcom.
I would like to acknowledge CMC Microsystems for the provision of products and fabrication
services that facilitated this research. I would also like to thank Semtech Inc. for
facilitating testing in their laboratory and NSERC for funding support.
Finally, I wish to express my gratitude to my parents, brothers, and sister for their
invaluable love and encouragement throughout my life. Last but not least, my thanks go to
my lovely wife, Rana Qasass, for being by my side during the ups and downs. Thank
you for the constant and unconditional support, and thank you for being a great wife and
mother.
Contents
List of Tables vii
List of Figures viii
List of Abbreviations xviii
1 Introduction 1
1.1 Motivation 1
1.2 Overview of PLLs 2
1.3 Overview of DPLLs 4
1.4 Thesis Contribution 6
1.5 Thesis Outline 8
2 System Level Overview and Analysis of DPLL 9
2.1 Introduction 9
2.2 Time-domain model of DPLL 10
2.2.1 DPLL model 11
2.2.2 TDC quantization noise 15
2.2.3 DCO quantization noise 17
2.2.4 TDC Fractional Spurs 17
2.3 System level design of DPLL 21
2.4 Basic TDC structure 23
2.4.1 TDC Normalization Circuit 24
3 A DPLL with Calibrated Coarse and Stochastic Fine TDC 26
3.1 Overview of the DPLL 26
3.2 State-of-Art Implementations of TDC 27
3.2.1 Buffer delay and Inverter delay line TDC 28
3.2.2 Vernier delay line TDC 28
3.2.3 Gated ring oscillator (GRO) TDC 28
3.2.4 Interpolation-Based TDC 29
3.2.5 Two-step TDC 30
3.3 Coarse-Fine Stochastic TDC 30
3.3.1 Coarse TDC 31
3.3.2 Fine Stochastic TDC 32
3.4 TDC Output Normalization 40
3.5 TDC Calibration 41
3.6 Clock domain synchronization 49
3.7 Implementation Details of the DPLL 50
3.7.1 Digital Loop Filter (DLF) 50
3.7.2 DCO 51
3.8 Measurement Results 54
3.8.1 PCB and Test Setup 55
3.8.2 Results 58
3.9 Conclusion 71
4 Linearization of Digital PLL 72
4.1 Introduction 72
4.2 TDC Dead-Zone Behavior 75
4.2.1 Zero-Phase Restart (ZPR) Mechanism 80
4.3 Noise-Shaped Dithering 81
4.3.1 Implemented Noise-Shaped Dithering 82
4.3.2 Improved Noise-Shaped Dithering 85
4.4 Measurement Results 86
4.5 Conclusion 88
5 Cycle-Slipping and Pull-In Range of Bang-Bang PLLs 89
5.1 Introduction 89
5.2 Transient Analysis of MPBBD PLL When Far From Lock 92
5.3 Cycle Slipping Phenomena 95
5.3.1 Analysis of Pull-In Frequency Range 95
5.3.2 Locking Time 102
5.4 Fast Simulation Model of a DPLL with Quantized Phase Detector 108
5.4.1 Model Development 108
5.4.2 Analogy between DPLL and ∆Σ modulator 111
5.4.3 Improved MPBBD (IMPBBD) without cycle slipping to accelerate Frequency Acquisition 116
5.4.4 Verilog-A Simulation of DPLL 117
5.5 Implemented Architecture 118
5.5.1 DCO 123
5.5.2 MPBBD 123
5.5.3 Loop Filter 124
5.5.4 Simulation Results 126
5.5.5 Measurement Results 127
5.6 Conclusion 130
6 Conclusion 131
6.1 Contributions 132
6.2 Future Work 134
A Schematics 135
B Noise Contribution to Timing Jitter 139
B.1 Noise Floor Contribution to Timing Jitter 139
B.2 Phase Noise and Spurs Contribution to Timing Jitter 140
B.3 Approximation of RMS Timing Jitter from L(f) 143
C Modeling and Simulation of DCO 145
C.1 Noise Modeling of White Gaussian Noise 145
C.1.1 Modeling Flicker Noise 146
C.2 Simulation of the PLL 148
Bibliography 156
List of Tables
3.1 State-of-the-art fine-resolution TDC 69
3.2 Comparison Among Published Digital Synthesizers. 70
4.1 Summary of TIE rms and peak-to-peak jitter for the 60 different simulations. 84
5.1 Comparison of the pull-in range of BBPD vs. MPBBD based DPLL using simulation and theoretical findings when Kp = 3 and Ki = 1/32. 100
5.2 The pull-in range (normalized to reference frequency) of BBPD-DPLL based on simulations and presented theory as well as based on other references. 101
List of Figures
1.1 Typical analog PLL Architecture with a multi-modulus divider and ∆Σ modulator to synthesize fractional channels. 3
1.2 Typical analog PLL Architecture with DAC to cancel the deterministic error caused by ∆Σ modulator [7]. 3
1.3 Digital PLL Architecture where the loop filter is all digital, and the phase and frequency error signals are fixed-point numbers. 4
1.4 Phase noise contributions for low- and high-bandwidth DPLLs. 5
2.1 Digital PLL Architecture where the loop filter is all digital, and the phase and frequency error signals are fixed-point numbers. 10
2.2 DPLL model in discrete-time. The DCO gain, Kdco, is expressed in Hz/LSB. The phase detector gain, Ktdc, is unity for fractional mode and is inversely proportional to the input phase error during integer mode. 11
2.3 DPLL response using different sampling rates when Kp = 1, Ki = 1/64, and the DCO gain Kdco = 726 kHz. The blue circles represent DPLL responses when Fref = 20 MHz while the green triangles describe DPLL responses when Fref = 40 MHz. 13
2.4 DPLL behavior for different damping settings. The DCO gain Kdco = 726 kHz/LSB and the proportional gain Kp = 1 for all settings. The blue circles represent the DPLL step response with Ki = 1/64. The green triangles describe the DPLL step response when Ki = 4/64. The final setting is marked using red squares where Ki = 16/64. 14
2.5 DPLL behavior for different bandwidth settings while the damping ratio is kept the same. The DCO gain Kdco = 726 kHz/LSB. The blue circles represent the DPLL step response with Kp = 1 and Ki = 1/64. The green triangles describe the DPLL step response when Kp = 2 and Ki = 4/64. The final setting is marked using red squares where Kp = 4 and Ki = 16/64. 14
2.6 TDC output during frequency acquisition (below 30 µs) and phase locking with different TDC resolutions (FCW = 120.01709). 16
2.7 Spectrum of TDC quantization noise, tQ, for different TDC resolutions (FCW = 120.01709). Simulation results are marked in blue while theoretical expectations from Eq. 2.13 are marked in red. 16
2.8 Phase noise spectrum of TDC Output for different fractional channels (∆ttdc = 32 ps). 18
2.9 Spectrum of the phase error computed by TDC when FCW = 120.01709: (a) solid red line when ∆ttdc = 32 ps (b) dashed blue when ∆ttdc = 2 ps. There is a 20 dB difference in the spectral density at low frequency. 19
2.10 Histogram of absolute output jitter when FCW = 120.01709: (a) solid red line when ∆ttdc = 32 ps where peak-to-peak jitter is 21 ps (b) blue plus symbols when ∆ttdc = 2 ps where peak-to-peak jitter is 9 ps. 19
2.11 Timing diagram of phase error computation for DPLL with FCW = 21/4. 20
2.12 Phase noise of the output clock (2400.3418 MHz), based on MATLAB/Simulink simulation, when ∆ttdc = 4 ps. 22
2.13 Buffer delay line implementation of TDC: simplified schematic view (left); timing diagram (right). The raw Q[i] is pseudo-thermal code to be converted into a normalized binary word representing the fractional phase error. 23
2.14 Estimating phase error based on the TDC output. 23
2.15 A typical circuit to normalize the phase error of a TDC. 24
2.16 Estimate of TDC resolution as computed by the TDC normalization circuit. The raw data (blue) is plotted along with the 128-point moving-average filter output (red). 25
3.1 A digital PLL architecture for fractional frequency synthesis [16]. The shaded blocks are custom-designed but can be automatically generated using a scripting language like TCL due to their regular structure. 27
3.2 Block and timing diagram of Gated Ring Oscillator (GRO) based TDC. 29
3.3 Two-stage TDC: a coarse TDC followed by a timing amplifier for the residue, which feeds another coarse stage. 30
3.4 The coarse TDC architecture of a two-step TDC. The delayed version of Fref with phase closest to Fout is muxed to the second TDC stage. Path delays for the selected reference phase Fref to D Fref and DCO clock Fout to D Fout are matched. 32
3.5 The fine stochastic TDC (STDC) architecture of the two-step TDC. The STDC outputs are sampled on the rising edge of the delayed reference clock. 33
3.6 (a) The stochastic TDC arbiter input-output relationship without and with random mismatch. Input-referred voltage offset due to mismatch translates into time offset. (b) SR-Latch used in the stochastic TDC as arbiter. 33
3.7 A Monte Carlo simulation of the stochastic TDC for a given negative phase error. The sum of all stochastic TDC arbiter outputs translates into a phase error within the linear region of the time-offset’s statistical CDF. 35
3.8 Spectre Monte-Carlo simulation of threshold voltage, Vth, for a minimum size transistor. Accordingly, mean(Vth) = 345 mV and stdv(Vth) = 22.78 mV. 36
3.9 Transfer function of stochastic TDC output using Spectre Monte-Carlo simulation. Note that Vth has a standard deviation of 22.78 mV. 37
3.10 (a) Transfer function of one random example of a non-ideal stochastic TDC when the number of arbiters M = 64. The associated DNL and INL are shown in (b) and (c), respectively. 39
3.11 Normalized PDF of Gaussian distributed random offset for a stochastic TDC with a different number of arbiters. Each plot is obtained from a 100-run Monte-Carlo simulation. 41
3.12 Normalized CDF of Gaussian distributed random offset for a stochastic TDC with a different number of arbiters. Each plot is obtained from a 100-run Monte-Carlo simulation. 42
3.13 The associated DNL and INL of an ideal stochastic TDC with random offset when the number of arbiters M = 64 & 512. 43
3.14 Phase error computation and normalization with respect to one DCO output period, performed digitally. The phase error computed by the coarse TDC is refined by the stochastic TDC. 44
3.15 On-chip low-area calibration algorithm of the coarse TDC based on a code density test. The dedicated calibration clock, fcalb, is sampled by the coarse TDC during the calibration phase; once calibration is done, the coarse TDC samples the DCO clock. 45
3.16 Delay cell with 4-bit calibration capacitor bank. 46
3.17 Spectre mismatch Monte-Carlo simulation of the inverter unit used in the calibrated coarse TDC. 46
3.18 Generating random time errors that are uniformly distributed to calibrate the coarse TDC. Sampling a 333.333 MHz calibration clock using a 20 MHz reference will produce a uniform time error within ±1500 ps. 47
3.19 The delay of TDC inverters before (blue triangle points) and after (red square points) calibration using a balanced code density test with fine delay correction (floating point precision). After calibration, the delay mean = 33.345 ps, std = 0.335 ps, and peak-peak error = 1.203 ps. 48
3.20 The delay of TDC inverters before (blue triangle points) and after (red square points) calibration using balanced mean with 1 ps correction step (fixed point precision). After calibration, the delay mean = 32.603 ps, std = 0.559 ps, and peak-peak error = 1.854 ps. 48
3.21 Clock synchronization of the reference clock, fref, using the DCO clock, fout, and a divided down DCO clock, clk8. The DCO clock is divided by two using a CML divider, which is custom designed. The subsequent synchronization and the feedback phase counter are fully synthesized. 49
3.22 Implementation of the digital loop filter. The coarse filter uses only proportional gain, Kc, to accomplish rough frequency lock. Then, it gets disabled such that a first-order filter takes over to achieve phase lock. Gear shifting is used to accelerate the phase locking time. Finally, the IIR can be enabled to filter out high-frequency noise. 51
3.23 LC-DCO with two banks of tuning. The coarse tuning is implemented using MiM capacitors while the fine tuning is achieved by using MOS varactors. 52
3.24 Coarse frequency tuning using Metal-Insulator-Metal (MiM) capacitors. 52
3.25 Fine frequency tuning using MOSFET varactors. The frequency tuning is defined by the difference of MOS capacitance between the ON and OFF states. 53
3.26 Illustration of the DCO control bits and gains. There are six coarse bits with an average gain of 8125 kHz/code. Also, there are 24 fine bits that are further divided into 6 MSBs with an average gain of 726 kHz/code and 18 LSBs representing the fractional part of the frequency control word. The 18 fractional LSBs are decoded into a 7-bit thermometric matrix and 11 bits provided to the ∆Σ modulator to achieve a frequency resolution finer than what a minimum-size MOS varactor can achieve in a particular process. 53
3.27 Implementation of the third-order reduced complexity ∆Σ modulator [33]. The first stage has higher computational resolution compared with the following stages to reduce power and complexity and to meet timing requirements during synthesis. 54
3.28 Die photo of the DPLL chip in an IBM (now GF) 130 nm bulk CMOS process (active area is 0.43 mm²). 55
3.29 PCBs used for powering, biasing, and programming the DPLL chip. 56
3.30 Block diagram of the test setup. The FPGA on the DE0 nano board, which controls the DPLL chip, is programmed via a PC using Altera Quartus. KE5FX software runs on the PC and can capture the spectrum of a particular clock using an HP 8565C spectrum analyzer. 57
3.31 DCO gain measurements. 58
3.32 The differential output clock captured using Tektronix RSA6114A. The differential peak-to-peak voltage is 370 mV for a 2 GHz clock. 58
3.33 Spectrum of the output clock, captured by HP8565C spectrum analyzer and KE5FX tool, when the reference clock is frequency modulated. 59
3.34 Verilog-A Simulation vs. Measurement captured by HP8565C spectrum analyzer and KE5FX tool. 60
3.35 Spurs spectrum at 2012.5 MHz measured using Agilent E4448A spectrum analyzer before calibration. 60
3.36 Spurs spectrum at 2003.125 MHz measured using Tektronix RSA 6114A real-time spectrum analyzer before calibration. 61
3.37 Spurs spectrum at 2006.250 MHz measured using Tektronix RSA 6114A real-time spectrum analyzer before calibration. 61
3.38 Spurs spectrum at 1995.0 MHz measured using Tektronix RSA 6114A real-time spectrum analyzer. 62
3.39 Spurs spectrum at 2185.0 MHz measured using Tektronix RSA 6114A real-time spectrum analyzer. 63
3.40 Phase noise measurement of a 2 GHz clock using an HP8565C analyzer with (red) and without (blue) the fine TDC. The reference clock is a 20 MHz temperature-controlled oscillator. 64
3.41 The random jitter measurement of the output clock when the fine stochastic TDC is activated. 65
3.42 DPLL output phase noise spectrum at 2.4 GHz captured by an Agilent E4448A spectrum analyzer. The in-band noise is -107 dBc/Hz while the integrated jitter is 500 fs rms (0.43 degree) from 1 kHz to 100 MHz for a loop bandwidth of 1.42 MHz. 66
3.43 DPLL output phase noise spectrum at 1.995 GHz captured by an Agilent E4448A spectrum analyzer. The in-band noise is -104 dBc/Hz while the integrated jitter is 233 fs rms from 1 kHz to 100 MHz for a loop bandwidth of 700 kHz. An IIR filter was used to attenuate high frequency spurs. 66
3.44 Fractional synthesis measurements using HP8565C analyzer with (a) a 21 MHz input reference at channel 95 + 67/256 and (b) a 20 MHz input reference at channel 109 + 64/256 exhibiting less than 1 ppm frequency error. 67
4.1 A digital PLL architecture for integer and fractional mode synthesis. 73
4.2 Buffer delay line implementation of TDC: simplified schematic view (left); timing diagram (right). The raw Q[i] is pseudo-thermal code to be converted into a normalized binary word representing the fractional phase error. 73
4.3 TDC Output during frequency (below 30 µs) and phase acquisition for different DPLL operation modes. 74
4.4 The DPLL nonlinearity due to TDC quantized response. 75
4.5 Dead-zone behavior of Integer-N DPLL with a TDC resolution of 32 ps. 76
4.6 Bang-Bang behavior of Integer-N DPLL. 76
4.7 Spectrum of TDC normalized output. 78
4.8 Phase noise of the same output clock for 60 different initial conditions for uncompensated Integer-mode DPLL. 79
4.9 Zero-phase-restart (ZPR) triggered during the transition from coarse to fine locking mode. ZPR ensures a smooth transition from coarse to fine locking without disrupting DCO [15]. 80
4.10 Dithering the reference clock by using a ∆Σ modulator to control the programmable delay of an input clock buffer. 81
4.11 A typical circuit to estimate the phase error of a coarse TDC. 82
4.12 Digital dithering algorithm at the falling edge of the output clock (0.5 UI). 83
4.13 Phase noise of the same output clock for 60 different initial conditions after applying noise-shaped random offset and disabling the fractional part of ZPR. 83
4.14 Time Interval Error (TIE) for the 60 simulations with different initial conditions: No dithering, Random dithering, ∗ Noise-shaped dithering. 84
4.15 A generic proposed circuit to generate a dithered phase error, tr/Tout, which can be applied to an integer and simple fractional channel synthesis. 85
4.16 Die photo of the DPLL chip in IBM 130 nm bulk process [8]. It is the same chip used to demonstrate the DPLL with a coarse-fine TDC in Chapter 3. 86
4.17 Phase noise measurement using HP8565C analyzer showing different behaviors of integer-mode DPLL. 87
4.18 The measured jitter histogram during dead-zone operation. The extracted random jitter is 896 fs RMS while the deterministic jitter due to dead-zone operation is 28.3 ps peak-to-peak. 87
5.1 A DPLL with a quantized phase detector and without a feedback divider. 90
5.2 Transfer function of the MPBBD (thick solid blue) vs. BBPD (thin dashed red) when a DCO period is divided into eight regions with each region spanning 45 degrees. 91
5.3 Phase domain model of DPLL with quantized phase detector. 92
5.4 Illustration of cycle slipping and speed of frequency acquisition for BBPD vs. MPBBD based DPLL. 96
5.5 The pull-in range (normalized to reference frequency) of BBPD-DPLL for different values of Ki and Kp. Dashed lines are obtained from Eq. 5.31, solid lines are obtained from Eq. 5.33, and symbols are based on simulations. 101
5.6 A plot of the number of cycle slips (Kp = 3 and Ki = 1/32) for BBPD versus MPBBD based DPLL. (a) BBPD: blue circles from simulation results and dashed red line from Eq. 5.44 (b) MPBBD: blue squares from simulation results and solid red line from Eq. 5.44. 104
5.7 Frequency locking time until cycle slips disappear (Kp = 3 and Ki = 1/32) for a BBPD (blue circles from simulation) and MPBBD (blue squares from simulation) based DPLL. Eq. 5.51 is shown as a solid red line, Eq. 5.55 as a small-dashed blue line, and Eq. 5.58 as a large-dashed red line. 107
5.8 Discrete-time model of phase error development for fast evaluation of DPLL. 111
5.9 Integral path output and cycle-slip trajectory for BBPD (blue triangles) and MPBBD (red squares) based DPLL, when frequency offset is 3 MHz (3% frequency error while Kp = 3 and Ki = 1/32). 112
5.10 Equivalent ∆Σ representation of DPLL with a quantized phase detector. 113
5.11 Transient simulation comparison between BBPD and MPBBD based DPLL, when frequency offset is 2.5 MHz (2.5% frequency error while Kp = 3 and Ki = 1/32). 115
5.12 Modification of the MPBBD transfer function to extend pull-in range and reduce acquisition time. The improved MPBBD (IMPBBD) identifies the sign of the initial frequency error and changes its transfer function accordingly. 117
5.13 Pull-in range and locking time of BBPD (blue ∗), MPBBD (red ), and IMPBBD (green •) based DPLL. The lock-in range of the IMPBBD is extended to ±fref (fref is 100 MHz and fout is 1 GHz while Kp = 3 and Ki = 1/32). 118
5.14 Integral path output and cycle-slip trajectory for DPLL with three different phase detectors (frequency offset is 7.5 MHz and fref is 100 MHz while Kp = 3 and Ki = 1/32). 119
5.15 Transient simulation comparison between MPBBD and IMPBBD based DPLL (frequency offset is 7.5 MHz and fref is 100 MHz while Kp = 3 and Ki = 1/32). 120
5.16 Integral path output of BBPD (dark blue) vs. MPBBD (light red) based DPLL (frequency offset is 6.0 MHz). The simulation employs uniform time-step sampling (1/100 of DCO period) using a Verilog-A implementation of the DPLL. 121
5.17 The architecture of the implemented DPLL. The FLL and the high-speed counter, as well as the synchronization structure, are disabled by a lock detector once frequency lock occurs. The MPBBD locks the phase of the output clock (phase A) to the reference clock. 122
5.18 The programmable delay unit used to form a four-stage DCO. Each unit has a 7-bit coarse cap configuration and an 8-bit fine cap implemented as a combination of 4-bit binary along with 15 thermal caps. 124
5.19 Timing diagram of the multi-phase DCO sampled by a reference clock at some point. Based on the sequence of MPBBD outputs (01111000), the LUT provides an indication of the phase error magnitude between phase A and the reference clock, as shown in the circle on the bottom right side. 125
5.20 The transfer function of the MPBBD and its LUT (thick solid blue) vs. BBPD (thin dashed red). 125
5.21 Absolute phase error of the output clock of DPLL with respect to an ideal clock with (a) binary BBPD (thin dashed blue) and (b) MPBBD (solid thick red). The initial frequency error is 300 MHz. Data is clipped below 6.5 µs as reset was applied at that moment after loading the right loop configurations. The FLL takes 15.5 µs (310 reference cycles) while the PLL takes 30 µs (600 cycles) when the BBPD (a) is used and 5 µs (100 cycles) when the MPBBD (b) is employed. 126
5.22 The mapped output of the bang-bang detector (from LUT) during frequency and phase lock. The binary BBPD slews when the phase error is high and takes a long time to recover. On the other hand, the MPBBD automatically gears its gain according to the phase error magnitude until lock is achieved. 127
5.23 DPLL output phase-noise spectrum at 1.20 GHz: Simulation (blue) vs. measurement (black) captured by an Agilent E4448A spectrum analyzer. The in-band noise is -98.32 dBc/Hz while the loop bandwidth is around 1.7 MHz. 128
xvii
5.24 DPLL output phase-noise spectrum at 1.40 GHz captured by an Agilent
E4448A spectrum analyzer. The in-band noise is -96.48 dBc/Hz while the
loop bandwidth is around 1.7 MHz. . . . . . . . . . . . . . . . . . . . . . 129
5.25 Die photograph of the DPLL in 28nm CMOS LP ST Microelectronics
Technology (active area is less than 0.008 mm2). . . . . . . . . . . . . . . 129
A.1 Schematic of the four-stage, 50 Ω output driver used to send the DCO
output off-chip. The last differential pair M4 is sized W = 176 µm/L = 120 nm
while the load resistor R4 = 62.5 Ω. The previous stages are sized
according to the following: transistor sizes of M4 = 2·M3 = 4·M2 = 8·M1
and resistor values of R4 = R3/2 = R2/4 = R1/8.
A.2 (a) Schematic of the CML latch used in the divide-by-2 circuit. The value
of R = 2 kΩ while M1 = M2 = M3 has W = 6 µm and L = 120 nm. (b)
Schematic of the divide-by-2 using the two CML latches.
A.3 Schematic showing the CML to CMOS conversion employed after the CML
divide-by-2. The CML signal is AC coupled through Cc = 150 fF and
then passed to a CMOS inverter with feedback resistor Rf = 35 kΩ to
define the input common mode. The small cross-coupled CMOS inverters
(W = 160 nm & L = 120 nm) are used to ensure differential operation.
Another stage of CMOS inversion follows with a size similar to the first
stage (W = 13.02 µm & L = 120 nm).
A.4 (a) Illustrative diagram of the fabricated chip mounted on a QFN36 package
and soldered on the PCB. (b) Lumped model of the output PADs, bond wires,
lead and PCB trace capacitance. The chip dimensions are 1 mm x 1 mm
while the QFN36 dimensions are 5 mm x 5 mm. Accordingly, the bond
wire could be 2-2.5 mm long and so Lbw = 2.5 nH. The extracted PAD
capacitance was 90 fF.
A.5 Sense-amplifier flip flop with a narrow metastability window [15].
C.1 Verilog-A model of DCO phase noise.
List of Abbreviations
ADPLL All Digital Phase Lock Loop
ADC Analog to Digital Converter
BBPD Bang-Bang Phase Detector
CDF Cumulative Distribution Function
CDR Clock and Data Recovery
CML Current Mode Logic
CMOS Complementary Metal Oxide Semiconductor
DAC Digital to Analog Converter
DCO Digitally Controlled Oscillator
DCW DCO Control Word
DLF Digital Loop Filter
DPLL Digital Phase Lock Loop
FCW Frequency Control Word
FLL Frequency Lock Loop
FSM Finite State Machine
GSM Global System for Mobile Communications
IC Integrated Circuit
IP Intellectual Property
MOS Metal Oxide Semiconductor
MPBBD Multi-Phase Bang-Bang Detector
PDF Probability Distribution Function
PHE Fixed-point Phase Error Signal
PHF Fixed-point Feedback Phase Signal
PHR Fixed-point Reference Phase Signal
PLL Phase Lock Loop
PVT Process, Voltage, and Temperature
RF Radio Frequency
RTL Register-Transfer Level
SoC System on a Chip
STDC Stochastic Time to Digital Converter
TDC Time to Digital Converter
VCO Voltage Controlled Oscillator
Chapter 1
Introduction
1.1 Motivation
Electronics innovations in the last two decades have been fueled by the expansion of
wireless communications and standards, high demand for Internet speed and capacity,
and the emergence of the Internet of Things (IoT). This trend still has excellent potential for
growth. For example, Cisco Inc. estimated there will be nearly 50 billion IoT devices by
2020 which is 50 times larger than the installed base in 2009 [1].
The electronics industry has adopted a system on a chip (SoC) design methodology
to meet the endless demand for higher performance and complex computation, storage,
and communication capabilities under the pressure to shorten time-to-market. A SoC
design integrates a broad range of reusable components called intellectual properties (IPs)
most of which are digital IPs such as ARM processors, memory blocks, communication
protocols, etc. However, SoC design still needs mixed signal and analog RF IPs to
interface with the physical world such as analog-to-digital converters (ADCs), digital-to-
analog converters (DACs), and phase lock loops (PLLs).
Digitally assisted analog circuits have been emerging to enhance analog and mixed-
signal circuit performance by exploiting the growing computational power of Digital
Signal Processing (DSP) on a SoC [2]. This technique has been used to correct the
output of a pipelined ADC and for various related foreground and background calibration
algorithms [3][2]. A PLL design is no exception from other mixed signal designs. Many
researchers propose to use DSP to correct for deterministic errors caused by dithering
of a PLL and to calibrate the loop gain and bandwidth of PLL across process, voltage,
and temperature (PVT) variations [4]. The pioneering work of [5] proposes to use an all-
digital PLL (ADPLL) to take full advantage of the constant scaling of CMOS processes.
However, this goal is not entirely feasible since ADPLLs still have analog blocks to resolve
the phase error and to control the required output clock. This thesis, in particular,
focuses on the design of digitally intensive PLLs (DPLLs), exploring their limitations and
ways to improve them.
1.2 Overview of PLLs
PLLs are feedback circuits that lock an output clock’s phase to an input reference clock’s
phase. PLLs are important and often performance limiting IPs in modern SoCs. PLLs
are used in a wide array of electronics, including microprocessors and communications
devices, digital television, DC motor control, etc.
In a wireline transceiver, a PLL can be used to recover a clock signal embedded
in a serial data stream (like USB, PCI Express, SATA, etc.) transmitted over a noisy
channel. In wireless communications, PLLs are used to provide the local oscillator for
up-conversion (modulation) during transmission and down-conversion (demodulation)
during signal reception. Also, a PLL can be used as a frequency synthesizer to generate
a stable frequency at multiples of an input frequency to provide a programmable clock
for various standards in a software-defined radio (SDR). In large digital circuits like
microprocessors, PLLs are used to distribute precisely timed clocks. Moreover, PLLs
can be used for jitter filtering and reduction, and many more applications. Different
applications have different requirements on the PLL specifications. For example, HDTV
requires a PLL with a very small bandwidth. On the other hand, clock and data recovery
(CDR) circuits target relatively large bandwidth to track jitter on the incoming data
stream.
A PLL consists of a phase detector, a low-pass filter, a controllable oscillator, and a
divider. The earliest implementations of PLLs were purely analog, with the phase detector
implemented as a mixer or multiplier. A modern analog PLL employs a mixed-signal
design that combines a digital phase detector with an analog charge pump whose output
is converted to an analog voltage by an analog loop filter to control a voltage controlled
oscillator (VCO). A divider and a noise-shaping ∆Σ modulator are used to synthesize
integer and fractional channels, as shown in Fig. 1.1.
Figure 1.1: Typical analog PLL architecture with a multi-modulus divider and ∆Σ modulator to synthesize fractional channels.
Modern wireless and wireline communication standards place challenging demands on
the phase noise, spurious tones, jitter accumulation, and modulation bandwidth of PLLs
[6]. Accordingly, state-of-the-art wide-bandwidth analog PLLs employ analog phase-
noise-cancellation techniques using a DAC to suppress the quantization noise caused by
the ∆Σ dithering, as shown in Fig. 1.2. It enables a PLL to operate with small fractional
spurs and low phase noise at loop bandwidths of 700 kHz to 1 MHz [4]. However,
matching a DAC cancellation signal to the phase error is a complicated and challenging
analog circuit problem.
Figure 1.2: Typical analog PLL architecture with a DAC to cancel the deterministic error caused by the ∆Σ modulator [7].
1.3 Overview of DPLLs
Research on DPLLs has actively sought to replace or complement traditional analog
PLLs by taking advantage of aggressive CMOS scaling and operation under lower supply
voltages [8]. DPLLs offer several advantages over their analog counterparts. Firstly,
DPLLs are less sensitive to external noise, substrate noise, mismatch, and PVT variations
since many DPLL building blocks are realized with purely digital logic circuits. Secondly,
DPLLs consume much less area than analog PLLs, and programmability and testability
are available at very low area penalty, reducing die sizes and production costs [15].
In DPLLs, there is no charge pump and no analog loop filter. Instead, a digital
filter is implemented using digital logic standard cells. The phase detector is replaced
by or augmented with a time-to-digital converter (TDC). Fig. 1.3 shows a high level
diagram of the DPLL architecture considered in this thesis. An integer counter provides
an estimate of the number of output periods in one reference period. A correction to
that estimate is provided by the TDC, which resolves the phase difference between
the reference and output clocks and produces a phase error normalized with respect to
the output clock period, ε. Together, the counter and TDC generate a feedback phase
count, PHF, that is digitally subtracted from the accumulated reference phase, PHR, to
produce a phase error count, PHE. Then, PHE is digitally filtered and applied to tune a
digitally-controlled oscillator (DCO).
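The phase arithmetic described above can be illustrated with a minimal behavioral sketch. It assumes an ideal, noiseless DCO at a fixed frequency, a TDC that truncates the fractional output phase to a fixed number of steps per output period, and the sign convention PHF = count + ε; the function and parameter names are hypothetical and not taken from the thesis.

```python
import math

def phase_error_counts(fcw, f_ref, f_out, tdc_steps_per_period, n_edges):
    """Return PHE at each reference edge for an ideal DCO running at f_out.

    Assumptions (illustrative only): PHF = integer count + TDC fraction,
    and the TDC truncates to 1/tdc_steps_per_period of an output period.
    """
    phe = []
    phr = 0.0
    for k in range(1, n_edges + 1):
        phr += fcw                                # reference phase accumulator (PHR)
        true_phase = k * f_out / f_ref            # DCO phase in output periods
        count = math.floor(true_phase)            # integer counter estimate
        frac = true_phase - count                 # fractional part seen by the TDC
        eps = math.floor(frac * tdc_steps_per_period) / tdc_steps_per_period
        phf = count + eps                         # feedback phase (PHF)
        phe.append(phr - phf)                     # phase error (PHE)
    return phe

if __name__ == "__main__":
    locked = phase_error_counts(120.25, 20e6, 120.25 * 20e6, 10, 8)
    offset = phase_error_counts(120.25, 20e6, 2.4e9, 10, 8)
    print("locked :", [round(e, 3) for e in locked])
    print("offset :", [round(e, 3) for e in offset])
```

When the DCO sits exactly at FCW times the reference frequency, PHE stays within one TDC step; with a frequency offset, PHE grows every reference cycle, which is the signal the loop filter acts on.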
Despite their advantages, DPLLs impose new design challenges: the quantization of
frequency in the DCO and of phase in the TDC introduces quantization error and,
hence, jitter. The resolution of phase error detection is typically limited by the
inverter delay in a particular fabrication process. For instance, one inverter delay in
130 nm CMOS technology is about 32 ps while it reduces to 16 ps in 65 nm CMOS
technology.
Figure 1.3: Digital PLL architecture where the loop filter is all digital, and the phase and frequency error signals are fixed-point numbers.
TDC quantization noise and reference clock jitter are low-pass filtered by the DPLL’s
dynamics and are therefore dominant at low frequencies within the DPLL loop band-
width. On the other hand, DCO noise is high-pass filtered and dominant at high fre-
quencies as shown in Fig. 1.4.
Combining wide loop bandwidth and excellent in-band phase noise performance re-
mains particularly challenging for DPLLs. The work in [5] demonstrates that a DPLL
can meet even the tough GSM specification. However, its loop bandwidth of 40 kHz
remains an order of magnitude lower than that achieved by the analog techniques de-
scribed above. In applications where only high-frequency phase noise is of interest, a
wide loop bandwidth can be accommodated in a DPLL with a simple bang-bang phase
detector (no TDC); such is the case in [9]. However, more generally in DPLLs with wide
loop bandwidth, it is desirable to have very fine TDC resolution. At the same time, the
TDC’s input dynamic range should be sufficient to cover at least one output DCO period
in order for the DPLL to estimate the phase error across an entire DCO period. An even
larger dynamic range of at least two DCO periods is needed if on-chip jitter measurement
is to be performed. Although two recent DPLLs extended loop bandwidths to 142 kHz
[10] and 3 MHz [11], the former cannot achieve low in-band phase noise while the
latter sacrifices its out-of-band noise performance.
Figure 1.4: Phase noise contributions for low- and high-bandwidth DPLLs.
Designing a TDC with fine resolution also prevents detrimental nonlinear dynamics
from arising in DPLLs. If a DPLL is operating as a fractional-N synthesizer, the phase
relationship between DCO output and reference input is scrambled over time, and the
quantization error introduced by the TDC may be approximated as white noise [12].
However, if the DPLL is locked in an integer-N mode, the phase relationship between
TDC inputs is fixed and the TDC may exhibit either bang-bang behavior (associated
with unpredictable loop bandwidth), or it may exhibit a dead-zone behavior resulting in
dynamics that are very dependent upon the initial conditions of the loop. This thesis
focuses on improving the phase and frequency detection in general and on improving
TDC resolution and linearity as doing so improves the noise performance of DPLLs in
both integer and fractional synthesis modes.
1.4 Thesis Contribution
The thesis first presents a fractional-N DPLL that can operate from 1.99 to 2.5 GHz.
Though the DPLL design is meant to be generic, the unlicensed 2.4-GHz ISM (Industrial,
Scientific and Medical) band was in mind during the design stage. Many applications in
the ISM band could take advantage of a wide bandwidth DPLL by using direct digital
modulation. For example, a Bluetooth transmitter employs Gaussian frequency shift
keying (GFSK) with a 1 Mbps basic data rate, i.e. 500 kHz bandwidth [13]. Typically,
a Bluetooth transmitter uses either an open-loop VCO modulation or an up-conversion
transmitter architecture. Alternatively, a 2.4 GHz DPLL with 500 kHz bandwidth can
be used to modulate the DCO frequency directly in the digital domain without the need
for a DAC or up-conversion mixer. However, using direct digital modulation necessitates
having very small in-band noise. To achieve -114 dBc/Hz in-band phase noise at 2.4 GHz
locked to a 20 MHz reference clock, a TDC with 2 ps resolution is needed, more than
an order of magnitude finer than an inverter delay in 0.13 µm technology.
A 9-bit TDC is needed to cover the maximum period of 503 ps with 2 ps resolution.
The minimum inverter delay in 0.13 µm technology varies over PVT from 32 to 48
ps. Accordingly, a minimum of 16-stage coarse TDC is needed to cover the maximum
period of 503 ps. An obvious way to achieve 2 ps resolution is by arraying 24 instances
of the 16-stage coarse TDC in parallel where each coarse TDC has an extra 2 ps delay with
respect to the previous one. This arrangement will increase the area and power by 24 times.
Alternatively, using only one instance of the 16-stage coarse TDC followed by a fine TDC
with 2 ps would save power and area. The fine TDC must cover a range larger than the
maximum possible delay of an inverter in the coarse TDC. In this thesis, a fine TDC with
2 ps resolution that can cover 64 ps is sought. The thesis presents a fractional DPLL that
incorporates a novel low-power two-step coarse-fine TDC to achieve low in-band phase
noise operation. The DPLL employs a 6-bit stochastic TDC for the fine TDC stage while
still achieving wide locking range using a 4-bit coarse delay line TDC. On power-up, a
calibration algorithm to minimize nonlinearities in the coarse TDC is enabled. By using
a balanced mean code density test, the number of registers required for the calibration
algorithm is reduced by 30%. Based upon the coarse TDC output, the appropriate clock
signals are multiplexed into the stochastic fine TDC. The DPLL consumes a total of
15.2 mW of which 4.4 mW are consumed in the TDC in 0.13 µm CMOS. The integrated
random jitter is 213 fs rms for a 2 GHz output carrier frequency with 700 kHz loop
bandwidth. The calibration and IIR filtering reduce worst-case spurs from -54.4 dBc to
-70.55 dBc at 1.995 GHz operation.
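The coarse-stage count and the cost of the brute-force parallel arrangement follow from simple arithmetic on the numbers quoted above. The sketch below only restates that arithmetic; the constant names are illustrative, and the worst-case reasoning (fastest inverter sets the stage count, slowest inverter sets the interpolation depth) is an assumption about how the figures were derived.

```python
import math

T_MAX_PS = 503.0          # longest DCO period to cover (from the text)
INV_DELAY_MIN_PS = 32.0   # fastest inverter delay over PVT (from the text)
INV_DELAY_MAX_PS = 48.0   # slowest inverter delay over PVT (from the text)
FINE_RES_PS = 2.0         # target resolution (from the text)
FINE_RANGE_PS = 64.0      # fine TDC range sought in the thesis

# Assumed worst case: with the fastest inverters, more stages are needed
# to span the longest period.
coarse_stages = math.ceil(T_MAX_PS / INV_DELAY_MIN_PS)

# Brute-force alternative: enough parallel coarse TDCs, each offset by the
# fine step, to interpolate across the slowest inverter delay.
parallel_instances = math.ceil(INV_DELAY_MAX_PS / FINE_RES_PS)

# The fine stage must span more than one (slowest) coarse delay.
fine_range_ok = FINE_RANGE_PS > INV_DELAY_MAX_PS

if __name__ == "__main__":
    print(coarse_stages, parallel_instances, fine_range_ok)
```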
The second part of the thesis presents a novel digital solution to avoid the problem
of dead-zone behavior in a digital phase locked loop (DPLL) caused by the quantization
effect of the TDC. The dead-zone behavior results in limit cycle behavior causing higher
than expected in-band phase noise and strong in-band spurious tones. This behavior is
dependent on the initial phase difference between the output and reference clock which
makes the DPLL performance inconsistent and unpredictable. To alleviate this problem,
a noise shaped offset is added to the phase error in the digital domain to keep the TDC
active and away from the dead-zone. The proposed solution is verified by extensive
simulation and using a DPLL prototype in a 0.13 µm CMOS process.
The third part of this thesis presents a rigorous mathematical analysis of a DPLL
employing a quantized phase detector during frequency acquisition, where DPLLs usually
exhibit cycle slipping. The analysis finds that the pull-in range is proportional to the
square root of the phase detector large-signal gain, $\sqrt{K_{PD}}$, while the locking time is
inversely proportional to its square, $K_{PD}^2$. Based on the findings of this analysis, a
multi-phase bang-bang detector (MPBBD) based DPLL is proposed to accelerate frequency
and phase locking and to increase the pull-in range while maintaining the same
steady-state performance as a bang-bang phase detector (BBPD) based DPLL. The proposed
DPLL reduces power consumption by disabling the high-speed counter and re-timing
circuit in the feedback loop after achieving frequency lock. Also, an improved version of
the MPBBD is suggested to extend the pull-in range up to the reference frequency range,
which could eliminate the need for a frequency lock loop and feedback counter in DPLLs
and digital CDRs.
1.5 Thesis Outline
This thesis is structured as follows. Chapter 2 introduces the DPLL structure, and then
presents a discrete-time mathematical model for analysis, and finally discusses the effect
of TDC quantization noise on phase noise and jitter performance. Chapter 3 starts with
a review of state-of-the-art TDCs and then describes the proposed coarse-fine TDC as
well as the TDC calibration loop. Also, an overview of the DCO and the digital loop is
given. The chapter concludes with test setups and measurement results. Then, Chapter
4 presents the nonlinear problem of dead-zone behavior that manifests during integer
mode operation and affects the loop dynamics and phase noise performance. The
chapter concludes with a presentation of the implemented dithering algorithm to alleviate
the dead-zone problems along with simulation and measurement results. Chapter 5
analyzes a DPLL employing quantized phase detection during frequency locking and
finds closed-form formulas for locking time and frequency pull-in range of a DPLL. An
analogy between DPLL and ∆Σ modulator is drawn to estimate the frequency capture
range. The chapter finishes with a presentation of a prototype of MPBBD-DPLL chip
in 28 nm CMOS technology along with simulation and measurement results. Finally,
Chapter 6 concludes the thesis.
Chapter 2
System Level Overview and Analysis of DPLL
This chapter presents a system-level overview of DPLLs. Then, a discrete-time model of
a DPLL is shown along with an approximate continuous-time model to derive the loop
response and necessary loop performance metrics. Afterwards, the chapter addresses the
effect of TDC and DCO quantization noise on DPLL performance along with discussion
of the TDC related problems during fractional DPLL frequency synthesis. Finally, an
overview of the basic structure and operation of TDCs and the following normalization
circuit is presented.
2.1 Introduction
Fig. 2.1 shows a general block diagram of a DPLL. The DCO changes its frequency
in discrete steps, and has two digital controlling inputs, each with a separate gain and
range. One input has a coarse frequency step¹ but allows a wide frequency tuning range.
This coarse input is used during frequency locking at power-on or reset. The other input
has a fine frequency step with a small tuning range that must be larger than one coarse
frequency step such that it covers the PVT variations of the coarse resolution. The DCO
is implemented as a combination of a DAC followed by a VCO. The DAC can be a voltage,
current, or capacitance DAC. The latter is composed of switchable capacitor banks to
increase and decrease the capacitive load in the DCO.
¹ The thesis refers to the frequency step as the frequency resolution ∆fres that represents the least-significant-bit (LSB) of the DCO.
Figure 2.1: Digital PLL architecture where the loop filter is all digital, and the phase and frequency error signals are fixed-point numbers.
The TDC acts as a phase detector that finds a quantized phase error between the
reference clock (fref) and output clock (fout). The phase error is quantized with a resolution
limited by the fabrication technology and TDC implementation. The accumulated
reference phase (PHR), feedback phase (PHF), and phase error (PHE) are all fixed-point
digital signals. Similarly, the frequency control word (FCW or N), coarse and fine DCO
control word (DCW) and the TDC error signal (ε) are all fixed-point digital signals.
At startup, a frequency lock loop (FLL) sets the coarse DCW close to the value required
for the target output frequency by employing a first-order DPLL with high proportional
gain, Kp. Then, the finite-state-machine (FSM) controls the DPLL and shifts down Kp to
a low value and enables a third-order loop operation to correct for the frequency residue
and to lock the output clock phase to the input clock reference.
2.2 Time-domain model of DPLL
DPLLs are implemented with discrete-time DSP and so time-domain models are required
to capture the DPLL performance and limitations accurately. Mainly, a DPLL designer
is interested in stability and transient performance like locking time and ringing, as well
as steady-state phase noise and jitter performance.
Figure 2.2: DPLL model in discrete-time. The DCO gain, Kdco, is expressed in Hz/LSB. The phase detector gain, Ktdc, is unity for fractional mode and is inversely proportional to the input phase error during integer mode.
2.2.1 DPLL model
The DPLL can be represented by a discrete-time (z-domain) model as shown in Fig. 2.2.
The DPLL employs a digital loop filter (DLF) with a proportional path and a delayed
integral path that defines the loop dynamics. An additional high-frequency pole is needed
to provide extra filtering of high-frequency spurs and noise, similar to an analog PLL. It is
implemented using an infinite impulse response (IIR) filter controlled by the value of k.
The multiplication coefficients (Kp, Ki, and k) in the DLF are implemented by using
shifting and addition operations to reduce the complexity and cost.
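A minimal fixed-point sketch of such a loop filter is given below. It assumes power-of-two coefficients (Kp = 2^kp_shift, Ki = 2^-ki_shift, k = 2^-k_shift) so that every multiplication becomes a shift, and a 16-bit fractional format; the class name, word format, and parameter names are hypothetical and are not the thesis RTL.

```python
FRAC_BITS = 16  # assumed fixed-point fractional bits (illustrative)

class DigitalLoopFilter:
    """Proportional + delayed-integral filter followed by a one-pole IIR.

    PI part:  Kp + Ki * z^-1 / (1 - z^-1)
    IIR part: k / (1 - (1 - k) * z^-1)
    """

    def __init__(self, kp_shift=0, ki_shift=6, k_shift=1):
        self.kp_shift = kp_shift
        self.ki_shift = ki_shift
        self.k_shift = k_shift
        self.integ = 0   # integral accumulator
        self.iir = 0     # IIR state y[n-1]

    def step(self, phe):
        prop = phe << self.kp_shift               # Kp * e[n]
        pi_out = prop + self.integ                # integral term lags by one sample
        self.integ += phe >> self.ki_shift        # accumulate Ki * e[n] for next sample
        # y[n] = y[n-1] + k * (x[n] - y[n-1]) = (1 - k) * y[n-1] + k * x[n]
        self.iir += (pi_out - self.iir) >> self.k_shift
        return self.iir

if __name__ == "__main__":
    dlf = DigitalLoopFilter(kp_shift=0, ki_shift=6, k_shift=0)  # k = 1 bypasses the IIR
    one = 1 << FRAC_BITS
    print([dlf.step(one) / one for _ in range(4)])
```

Setting k_shift to 0 makes k = 1, so the IIR stage passes its input through unchanged, which matches the k ≈ 1 simplification used later in the analysis.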
In contrast to a conventional PLL, the divider ratio N does not appear in the feedback
path of the DPLL model shown in Fig. 2.2. The TDC finds the normalized phase error
φe, in terms of number of DCO periods, between reference clock fref and the output clock
fout. It does so by using fref to sample delayed versions of fout directly without being
divided down. Accordingly, the reference phase, φref , must be multiplied by the frequency
ratio between output and input clock i.e. N or FCW. The gain of the TDC, Ktdc, is
equal to one during fractional mode but can be very large or small during integer-mode
operation when the DPLL exhibits “bang-bang” or “dead-zone” behavior, respectively,
as will be explained in Chapter 4. The term $z^{-m}$ represents additional delay within the
DPLL and depends upon the details of the implementation of the particular TDC.
The DCO reacts to the filtered normalized phase error (i.e. fine DCW) and changes
its frequency according to its gain (i.e. resolution, Kdco). Since oscillator phase is merely
an integration of its frequency over time, a discrete-time integrator, $z^{-1}/(1 - z^{-1})$, is
needed to model the frequency-to-phase conversion in the DCO. For a discrete-time model,
the quantities are updated once every reference period, and so there is an embedded
zero-order hold at the output of the DCO (since it holds the frequency for an entire reference
period) which can be approximated by Tref for low frequencies of interest [14]. Based on
the DPLL model shown in Fig. 2.2, the open loop transfer function is given by
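$$H_{ol}(z^{-1}) = \left(K_{tdc}\, z^{-m}\right)\left(K_p + K_i\,\frac{z^{-1}}{1 - z^{-1}}\right)\left(\frac{k}{1 - (1-k)\,z^{-1}}\right)\left(K_{dco}\,\frac{T_{ref}\, z^{-1}}{1 - z^{-1}}\right) \tag{2.1}$$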
where Tref = 1/fref is the sampling reference period. To find the equivalent
continuous-time DPLL model, one can use a forward-rectangular discrete-to-continuous-time
conversion by approximating z with sTref + 1 while preserving the stability of the system
[14]. Also, recall that $z \equiv e^{sT_{ref}}$; then, using the power series, $z^{-m} \equiv e^{-mT_{ref}s}$ can be
approximated as $1 - mT_{ref}s$. Finally, the equivalent continuous-time DPLL model has
the following approximate open-loop response:
$$H_{ol}(s) = K_{tdc}\left(1 - mT_{ref}s\right)\left(K_p + \frac{K_i}{sT_{ref}}\right)\left(\frac{1 + sT_{ref}}{1 + sT_{ref}/k}\right)\left(\frac{K_{dco}}{s}\right) \tag{2.2}$$
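As a sanity check on the discrete-to-continuous conversion, the sketch below evaluates Eq. 2.1 on the unit circle and Eq. 2.2 on the imaginary axis and reports their relative mismatch. The function names are hypothetical, and the parameter values in the usage lines (Kp = 1, Ki = 1/64, k = 0.5, Kdco = 726 kHz/LSB, fref = 20 MHz, m = 0) are illustrative choices rather than a configuration stated for this comparison in the thesis.

```python
import cmath
import math

def h_ol_z(w, t_ref, kp, ki, k, kdco, ktdc=1.0, m=0):
    """Eq. 2.1 evaluated at z = exp(j*w*t_ref)."""
    z1 = cmath.exp(-1j * w * t_ref)            # z^-1 on the unit circle
    integ = z1 / (1 - z1)                      # discrete-time integrator
    return (ktdc * z1 ** m) * (kp + ki * integ) \
        * (k / (1 - (1 - k) * z1)) * (kdco * t_ref * integ)

def h_ol_s(w, t_ref, kp, ki, k, kdco, ktdc=1.0, m=0):
    """Eq. 2.2 evaluated at s = j*w."""
    s = 1j * w
    return (ktdc * (1 - m * t_ref * s)) * (kp + ki / (s * t_ref)) \
        * ((1 + s * t_ref) / (1 + s * t_ref / k)) * (kdco / s)

def relative_mismatch(w, **params):
    """|H_z - H_s| / |H_s| at angular frequency w (rad/s)."""
    hz, hs = h_ol_z(w, **params), h_ol_s(w, **params)
    return abs(hz - hs) / abs(hs)

if __name__ == "__main__":
    p = dict(t_ref=1 / 20e6, kp=1.0, ki=1 / 64, k=0.5, kdco=726e3)
    for f_hz in (10e3, 100e3, 1e6, 5e6):
        print(f_hz, relative_mismatch(2 * math.pi * f_hz, **p))
```

The mismatch is small well below the reference frequency and grows toward the Nyquist frequency, which is where the continuous-time approximation stops being useful.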
If k ≈ 1, then the IIR terms are approximately unity, and can be omitted for the open
loop response in Eq. 2.2. Also, assuming the TDC does not introduce significant extra
delay within the loop i.e. m ≈ 0,
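$$H_{ol}(s) \approx K_{tdc}\left(K_p + \frac{K_i}{sT_{ref}}\right)\left(\frac{K_{dco}}{s}\right) = \frac{K_{tdc}K_{dco}K_i}{T_{ref}}\left(\frac{1 + s\Big/\dfrac{K_i}{K_pT_{ref}}}{s^2}\right) \tag{2.3}$$
$$\Rightarrow\; H_{ol}(s) = \frac{\omega_n^2}{s^2}\left(1 + \frac{s}{\omega_z}\right) \tag{2.4}$$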
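Eq. 2.3 represents a second order system with natural frequency ωn, damping factor ζ,
and phase margin PM, as follows:
$$\text{Natural frequency:}\quad \omega_n = \sqrt{\frac{K_{tdc}K_{dco}K_i}{T_{ref}}} \tag{2.5}$$
$$\text{Damping factor:}\quad \zeta = \frac{\omega_n}{2\omega_z} = \frac{K_p}{2}\sqrt{\frac{K_{tdc}K_{dco}T_{ref}}{K_i}} \tag{2.6}$$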
Figure 2.3: DPLL response using different sampling rates when Kp = 1, Ki = 1/64, and the DCO gain Kdco = 726 kHz/LSB: (a) step response (normalized filter output vs. time in microseconds); (b) closed loop magnitude response (dB vs. frequency in rad/s) when N = 1. The blue circles represent DPLL responses when Fref = 20 MHz while the green triangles describe DPLL responses when Fref = 40 MHz.
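$$\text{Unity gain bandwidth:}\quad \omega_{UGB} = \omega_n\sqrt{2\zeta^2 + \sqrt{4\zeta^4 + 1}} \tag{2.7}$$
$$\text{Zero frequency:}\quad \omega_z = \frac{K_i}{K_pT_{ref}} \tag{2.8}$$
$$\text{Phase margin:}\quad PM = \tan^{-1}\left(\frac{\omega_{UGB}}{\omega_z}\right) = \tan^{-1}\left(2\zeta\sqrt{2\zeta^2 + \sqrt{4\zeta^4 + 1}}\right) \tag{2.9}$$
$$\text{3dB bandwidth:}\quad \omega_{3dB} = 2\zeta\omega_n = K_{tdc}K_{dco}K_p \tag{2.10}$$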
From the above equations, the DPLL behavior is mainly defined by the DLF coefficients
(Kp and Ki), DCO gain (Kdco), and reference frequency (fref = 1/Tref). It is interesting
to note that the loop bandwidth, ω3dB, is set only by the DCO gain and the proportional
path gain, Kp (the TDC gain, Ktdc, is unity during the fractional mode of operation); it
is not affected by the frequency division ratio, N. Fig. 2.3 shows the DPLL response for
two different sampling rates (i.e. reference frequencies) while other loop parameters are
kept the same. Based on Eq. 2.6, the damping factor is proportional to √Tref, so higher
sampling rates make the DPLL response less damped, which causes ringing and peaking.
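The loop metrics of Eqs. 2.5 to 2.10 can be evaluated directly for the two sampling rates of Fig. 2.3. The sketch below is a straightforward transcription of those equations; the function name and the dictionary keys are hypothetical, and Kdco is entered in Hz/LSB exactly as the equations are written.

```python
import math

def loop_metrics(kp, ki, kdco, f_ref, ktdc=1.0):
    """Evaluate Eqs. 2.5-2.10 for the simplified second-order loop (k ~ 1, m ~ 0)."""
    t_ref = 1.0 / f_ref
    wn = math.sqrt(ktdc * kdco * ki / t_ref)                  # Eq. 2.5
    wz = ki / (kp * t_ref)                                    # Eq. 2.8
    zeta = wn / (2.0 * wz)                                    # Eq. 2.6
    wugb = wn * math.sqrt(2 * zeta**2 + math.sqrt(4 * zeta**4 + 1))  # Eq. 2.7
    pm_deg = math.degrees(math.atan(wugb / wz))               # Eq. 2.9
    w3db = 2.0 * zeta * wn                                    # Eq. 2.10
    return {"wn": wn, "wz": wz, "zeta": zeta,
            "wugb": wugb, "pm_deg": pm_deg, "w3db": w3db}

if __name__ == "__main__":
    for f_ref in (20e6, 40e6):
        m = loop_metrics(kp=1.0, ki=1 / 64, kdco=726e3, f_ref=f_ref)
        print(f"fref = {f_ref / 1e6:.0f} MHz: zeta = {m['zeta']:.3f}, "
              f"PM = {m['pm_deg']:.1f} deg, w3dB = {m['w3db']:.3e} rad/s")
```

With these settings the damping factor drops when the reference frequency is doubled while ω3dB stays at Kdco·Kp, consistent with the observation above.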
Scaling up Ki while keeping all other parameters fixed will increase the natural fre-
quency (ωn) and reduce damping factor (ζ) and phase margin, as well, while settling time
and loop bandwidth are barely affected, as shown in Fig. 2.4.
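The effect of Ki on the transient can be reproduced with a short time-domain simulation of the linear discrete-time model. This is a simplified sketch, assuming Ktdc = 1, no extra TDC delay (m = 0), the IIR stage bypassed (k = 1), and a unit step applied to the scaled reference phase; it is not the simulation setup used to generate the thesis figures, and the function name is hypothetical.

```python
def step_response(kp, ki, kdco, f_ref, n_samples=600):
    """Output phase of the linear DPLL model for a unit reference-phase step."""
    g = kdco / f_ref          # DCO phase gain per reference cycle (Kdco * Tref)
    phi_out = 0.0
    integ = 0.0
    out = []
    for _ in range(n_samples):
        err = 1.0 - phi_out               # phase error with Ktdc = 1, m = 0
        ctrl = kp * err + integ           # PI output; integral path lags one sample
        integ += ki * err                 # update integrator for the next sample
        phi_out += g * ctrl               # DCO integrates frequency into phase
        out.append(phi_out)
    return out

if __name__ == "__main__":
    for ki in (1 / 64, 4 / 64, 16 / 64):
        resp = step_response(kp=1.0, ki=ki, kdco=726e3, f_ref=20e6)
        print(f"Ki = {ki:.4f}: peak = {max(resp):.3f}, final = {resp[-1]:.4f}")
```

Larger Ki produces a larger overshoot, while the decay envelope, set by Kdco·Kp, is essentially unchanged.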
Figure 2.4: DPLL behavior for different damping settings: (a) step response (normalized filter output vs. time in microseconds); (b) closed loop magnitude response (dB vs. frequency in rad/s) when N = 1. The DCO gain Kdco = 726 kHz/LSB and the proportional gain Kp = 1 for all settings. The blue circles represent DPLL step response with Ki = 1/64. The green triangles describe DPLL step response when Ki = 4/64. The final setting is marked using red squares where Ki = 16/64.
Figure 2.5: DPLL behavior for different bandwidth settings while damping ratio is kept the same: (a) step response (normalized filter output vs. time in microseconds); (b) closed loop magnitude response (dB vs. frequency in rad/s) when N = 1. The DCO gain Kdco = 726 kHz/LSB. The blue circles represent DPLL step response with Kp = 1 and Ki = 1/64. The green triangles describe DPLL step response when Kp = 2 and Ki = 4/64. The final setting is marked using red squares where Kp = 4 and Ki = 16/64.
To achieve programmable bandwidth operation, Kp must be scaled by α while Ki
must be scaled by α² to preserve the loop phase margin and peaking in the closed-loop
frequency response, as shown in Fig. 2.5.
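The scaling rule can be checked numerically from Eqs. 2.6 and 2.10. In the sketch below the helper name is hypothetical, and the three settings are the ones listed in the Fig. 2.5 caption (α = 1, 2, 4 with Kdco = 726 kHz/LSB); the 20 MHz reference is an assumed value for illustration.

```python
import math

def zeta_and_bw(kp, ki, kdco, f_ref, ktdc=1.0):
    """Damping factor (Eq. 2.6) and 3 dB bandwidth (Eq. 2.10)."""
    zeta = 0.5 * kp * math.sqrt(ktdc * kdco / (f_ref * ki))  # Tref = 1 / f_ref
    w3db = ktdc * kdco * kp
    return zeta, w3db

# Kp scaled by alpha, Ki scaled by alpha**2
results = [zeta_and_bw(alpha * 1.0, alpha**2 / 64.0, 726e3, 20e6)
           for alpha in (1, 2, 4)]

if __name__ == "__main__":
    for alpha, (zeta, w3db) in zip((1, 2, 4), results):
        print(f"alpha = {alpha}: zeta = {zeta:.4f}, w3dB = {w3db:.3e} rad/s")
```

The damping factor is identical for all three settings while the bandwidth scales linearly with α.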
Finally, the closed loop response is given by
$$G(s) = N\,\frac{H_{ol}}{1 + H_{ol}} = N\,\frac{1 + s/\omega_z}{s^2/\omega_n^2 + s/\omega_z + 1} \tag{2.11}$$
The TDC quantization as well as TDC intrinsic phase noise will be low-pass filtered by
G(s). However, the DCO intrinsic phase noise will be high-pass filtered by [1 − G(s)]
while any frequency noise due to the quantization and dithering process will be band-pass
filtered by [2π/s][1 − G(s)]. In the following, the effect of DCO and TDC quantization
noise on the in-band phase noise is investigated.
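The low-pass and high-pass roles of G(s) and 1 − G(s) can be seen by evaluating Eq. 2.11 at frequencies far below and far above the loop bandwidth. The sketch assumes N = 1 (as in the magnitude plots of Figs. 2.3 to 2.5) and uses ωn and ωz computed from Kp = 1, Ki = 1/64, Kdco = 726 kHz/LSB and fref = 20 MHz; the function name is hypothetical.

```python
import math

def closed_loop(w, wn, wz, n=1.0):
    """Eq. 2.11 evaluated at s = j*w."""
    s = 1j * w
    return n * (1 + s / wz) / (s**2 / wn**2 + s / wz + 1)

if __name__ == "__main__":
    wn = math.sqrt(726e3 * (1 / 64) * 20e6)   # Eq. 2.5 with Ktdc = 1
    wz = (1 / 64) * 20e6                      # Eq. 2.8 with Kp = 1
    for w in (1e3, 1e5, 1e6, 1e8):
        g = closed_loop(w, wn, wz)
        print(f"w = {w:.0e} rad/s: |G| = {abs(g):.4f}, |1 - G| = {abs(1 - g):.4f}")
```

Well inside the loop bandwidth |G| is close to 1 and |1 − G| is close to 0, and the roles reverse far outside it, which is why TDC noise dominates in-band and DCO noise dominates out-of-band.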
2.2.2 TDC quantization noise
Assume the TDC uniformly quantizes the phase difference with a given TDC resolution,
$\Delta t_{tdc}$ (expressed in seconds). Accordingly, the variance of the timing uncertainty is
$\sigma^2_{t_Q} = \Delta t_{tdc}^2/12$. The phase noise (rad) is obtained by normalizing the standard
deviation of the timing error, $\sigma_{t_Q}$, to the unit interval and multiplying by 2π radians:
$\sigma_{\phi_Q} = 2\pi \cdot \sigma_{t_Q}/T_{out}$, where $T_{out} = 1/f_{out}$ is the DCO output period. The total noise
power is spread uniformly over the span from DC to the Nyquist frequency (i.e., half of
the reference frequency fref). Hence, the single-sided spectral density is $\sigma^2_{\phi_Q}/f_{ref}$,
where $f_{ref} = 1/T_{ref}$ is the reference frequency. In conclusion:
$$\ell_{Q_{TDC}} = \frac{\sigma^2_{\phi_Q}}{f_{ref}} = \left(2\pi\,\frac{\sigma_{t_Q}}{T_{out}}\right)^2 \cdot \frac{1}{f_{ref}} \tag{2.12}$$
$$= \frac{\pi^2}{3}\cdot\frac{f_{out}^2}{f_{ref}}\cdot\Delta t_{tdc}^2 \tag{2.13}$$
From Eq. 2.13, it is obvious that the TDC noise contribution can be minimized by
improving the TDC timing resolution and/or by increasing the sampling rate of the TDC,
i.e. by increasing the reference frequency. Reducing the TDC step ∆ttdc by a factor of 10
reduces the in-band phase noise by 20 dB. For example, for a 2.5 GHz DPLL with a 48 ps
TDC resolution running from a 20 MHz reference clock, the in-band phase noise
contribution is approximately -85.4 dBc/Hz. The phase noise will drop to -113 dBc/Hz if
the TDC step is reduced to 2 ps.
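Eq. 2.13 is easy to evaluate numerically. The sketch below is a direct transcription with a hypothetical function name; evaluated this way, the formula lands within roughly 1 dB of the figures quoted above, and the improvement from 48 ps to 2 ps is 20·log10(24), about 27.6 dB.

```python
import math

def tdc_inband_noise_dbc_hz(f_out, f_ref, dt_tdc):
    """In-band phase noise due to TDC quantization, Eq. 2.13, in dBc/Hz."""
    linear = (math.pi**2 / 3.0) * (f_out**2 / f_ref) * dt_tdc**2
    return 10.0 * math.log10(linear)

if __name__ == "__main__":
    coarse = tdc_inband_noise_dbc_hz(2.5e9, 20e6, 48e-12)
    fine = tdc_inband_noise_dbc_hz(2.5e9, 20e6, 2e-12)
    print(f"48 ps: {coarse:.2f} dBc/Hz")
    print(f" 2 ps: {fine:.2f} dBc/Hz")
    print(f"improvement: {coarse - fine:.2f} dB")
```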
The TDC quantization noise is low-pass filtered by the DPLL dynamics, i.e. G(s),
and so TDC quantization noise is dominant within the loop bandwidth.
Figure 2.6: TDC output during frequency acquisition (below 30 µs) and phase locking with different TDC resolutions (FCW = 120.01709). (a) Coarse TDC: ∆ttdc = 48 ps. (b) Fine TDC: ∆ttdc = 2 ps. [Plots omitted; axes: time (µs) vs. TDC output scaled to the DCO period.]
Fig. 2.6 shows the normalized error, ε, estimated by TDCs with vastly different TDC resolutions of
∆ttdc = 48 ps & 2 ps. When ∆ttdc = 48 ps, the quantization error is large compared with
the output period (approximately 12% when fref = 20 MHz and FCW = 120.01709).
As expected from Eq. 2.13 and shown in Fig. 2.7, bringing down ∆ttdc to 2 ps will reduce
the noise floor by approximately 20 · log(48/2) = 28 dB. The white noise assumption of
TDC quantization noise used to derive Eq. 2.13 is less valid for large values of ∆ttdc, where
unwanted spurs may appear, as shown in Fig. 2.7(a). Accordingly,
effort is devoted to achieving a fine-resolution TDC.
Figure 2.7: Spectrum of TDC quantization noise, tQ, for different TDC resolutions (FCW = 120.01709). Simulation results are marked in blue while theoretical expectations from Eq. 2.13 are marked in red. (a) Coarse TDC: ∆ttdc = 48 ps. (b) Fine TDC: ∆ttdc = 2 ps. [Plots omitted; axes: frequency (MHz) vs. spectrum density (dB).]
2.2.3 DCO quantization noise
During fractional operation, the DCO tuning word spans multiple quantization levels.
Hence, the DCO frequency quantization error can be assumed to have white noise
spectral characteristics. Mathematically, the frequency quantization error variance is
σfQ² = (2π∆fres)²/12, where ∆fres [Hz] is the DCO frequency resolution, i.e. the smallest
frequency step. Due to the white noise assumption, the frequency noise power is
spread uniformly from DC to the Nyquist frequency. The single-sided spectral density of
the frequency quantization is σfQ²/fref. Finally, since phase is the integral of frequency, the
single-sided power spectral density due to DCO quantization noise at a frequency offset of
ω is given by
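$$ \ell_{Q,DCO}(\omega) = \frac{\sigma_{fQ}^2}{f_{ref}} \cdot \frac{1}{\omega^2} = \frac{(2\pi \Delta f_{res})^2}{12} \cdot \frac{1}{f_{ref}} \cdot \frac{1}{\omega^2} \tag{2.14} $$
$$ \Rightarrow \ell_{Q,DCO}(f) = \frac{1}{12 \cdot f_{ref}} \cdot \frac{\Delta f_{res}^2}{f^2} \tag{2.15} $$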
For a 2.4 GHz DCO with 2.4 MHz frequency resolution and 20 MHz reference, the noise
contribution due to the DCO quantization at 1 MHz offset from the 2.4 GHz carrier is
-76 dBc/Hz, which is very high. Bringing the DCO resolution down to 240 kHz will
reduce the phase noise contribution to -96 dBc/Hz. This is still high and usually greater
than the thermal phase noise contribution of an LC DCO at 1 MHz offset frequency.
Dithering or adding further fractional bits will allow the noise contribution to be within
a reasonable range. Targeting -110 dBc/Hz @ 1 MHz will require a DCO resolution of
better than 48 kHz. Note that the DCO noise and the DCO quantization noise are both
high-pass filtered by the DPLL loop dynamics [15], and are therefore more pronounced
outside the loop bandwidth.
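The numbers in this example follow directly from Eq. 2.15. The sketch below evaluates the formula and inverts it to find the resolution needed for a target noise level. The function names are hypothetical and the code is only an illustration of the arithmetic.

```python
import math

def dco_quant_noise_dbc_hz(df_res_hz, f_ref_hz, f_offset_hz):
    """Evaluate Eq. 2.15 at a given offset frequency, in dBc/Hz."""
    noise = (1.0 / (12.0 * f_ref_hz)) * (df_res_hz / f_offset_hz) ** 2
    return 10.0 * math.log10(noise)

def required_dco_resolution_hz(target_dbc_hz, f_ref_hz, f_offset_hz):
    """Invert Eq. 2.15: largest frequency step that still meets the target."""
    return f_offset_hz * math.sqrt(12.0 * f_ref_hz * 10.0 ** (target_dbc_hz / 10.0))

F_REF = 20e6      # reference frequency from the example
F_OFFSET = 1e6    # offset from the carrier

for df_res in (2.4e6, 240e3):
    level = dco_quant_noise_dbc_hz(df_res, F_REF, F_OFFSET)
    print(f"step {df_res / 1e3:.0f} kHz -> {level:.1f} dBc/Hz")

needed = required_dco_resolution_hz(-110.0, F_REF, F_OFFSET)
print(f"step needed for -110 dBc/Hz: {needed / 1e3:.1f} kHz")
```

Since the noise scales with ∆fres², every tenfold reduction of the frequency step lowers the contribution by 20 dB.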
2.2.4 TDC Fractional Spurs
The phase noise spectrum of a DPLL output may contain reference spurs as well as
fractional spurs that appear at frequencies defined by the fractional part of the frequency
control word, i.e. Nfrac. These spurs show up at multiples of Nfrac · fref, as shown in
Fig. 2.8. The TDC fractional spurs are low-pass filtered by the DPLL. Therefore, they
are particularly problematic for smaller values of Nfrac, since these spurs then show
up at low frequency within the loop bandwidth, as shown in Fig. 2.8(d), such that the DPLL
dynamics cannot filter them out.
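As a quick illustration of where the fundamental fractional spur falls, the sketch below computes Nfrac · fref for the four frequency control words used in Fig. 2.8, assuming the 20 MHz reference used elsewhere in this chapter. The helper name is hypothetical.

```python
def fractional_spur_hz(fcw, f_ref_hz):
    """Fundamental fractional spur frequency, Nfrac * fref."""
    n_frac = fcw - int(fcw)  # fractional part of the frequency control word
    return n_frac * f_ref_hz

F_REF = 20e6  # assumed reference frequency

for fcw in (120.5, 120.25, 120.125, 120.01709):
    spur = fractional_spur_hz(fcw, F_REF)
    print(f"FCW = {fcw}: spur at {spur / 1e3:.1f} kHz")
```

The smaller Nfrac becomes, the closer the spur moves to the carrier, and the more likely it is to fall inside the loop bandwidth.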
Figure 2.8: Phase noise spectrum of TDC output for different fractional channels (∆ttdc = 32 ps). (a) FCW = Nint + Nfrac = 120.500. (b) FCW = Nint + Nfrac = 120.250. (c) FCW = Nint + Nfrac = 120.125. (d) FCW = Nint + Nfrac = 120.01709. [Plots omitted; axes: frequency (kHz) vs. spectrum density (dB).]
Fig. 2.9 displays the spectrum of the phase error computed by the TDC when FCW =
120.01709 for two TDCs with ∆ttdc = 32 ps & 2 ps. Both TDCs produce fractional spurs
at frequencies that are multiples of 0.01709 · 20 MHz = 341.8 kHz. However, the 2 ps TDC
has roughly 20 dB lower phase noise at low frequencies than the 32 ps TDC (Eq. 2.13
predicts a 20 · log(32/2) = 24 dB reduction). The probability density function
(PDF) of the absolute output jitter, shown in Fig. 2.10, underlines the importance of
designing a fine-resolution TDC to reduce jitter. When ∆ttdc = 32 ps, the peak-to-peak
jitter (see footnote 2) is 21 ps while the estimated random jitter is 3 ps RMS. If ∆ttdc is brought down
to 2 ps, the peak-to-peak jitter becomes only 9 ps while the estimated random jitter is
less than 1 ps RMS.
Footnote 2: The peak-to-peak jitter is observed over a large number of reference periods, Tref.
Figure 2.9: Spectrum of the phase error computed by the TDC when FCW = 120.01709: (a) solid red line when ∆ttdc = 32 ps; (b) dashed blue line when ∆ttdc = 2 ps. There is a 20 dB difference in the spectrum density at low frequency. [Plot omitted; axes: frequency (kHz) vs. spectrum density (dB).]
Figure 2.10: Histogram of absolute output jitter when FCW = 120.01709: (a) solid red line when ∆ttdc = 32 ps, where the peak-to-peak jitter is 21 ps; (b) blue plus symbols when ∆ttdc = 2 ps, where the peak-to-peak jitter is 9 ps. [Plot omitted; axes: instantaneous jitter (ps) vs. repetition.]
Figure 2.11: Timing diagram of phase error computation for a DPLL with FCW = 2 1/4. (a) No TDC. (b) Infinite TDC resolution and accuracy. (c) Infinite TDC resolution and accuracy with timing offset between integer and fractional part. (d) Finite TDC resolution and accuracy. [Timing diagrams omitted.]
Fig. 2.11 shows the timing diagram for a DPLL when FCW ≡ Nint + Nfrac = 2 1/4 for
different scenarios. In Fig. 2.11(a), the DPLL uses only an integer counter, without a TDC,
to estimate the number of DCO cycles within one reference cycle. Accordingly, the phase
error, φe, has periodic behavior at a frequency equal to 1/4 · 20 MHz = 5 MHz. If a TDC
with infinite resolution and accuracy is employed to correct the integer counter estimate,
then the phase error will converge to zero without any noticeable periodicity as shown
in Fig. 2.11(b). Spurs could show up even if an infinite resolution TDC is employed
when there is a time difference between the integer counter and TDC output as shown
in Figure 2.11(c). Re-timing the integer counter and TDC outputs to be synchronous
will solve that problem. Finally, Fig. 2.11(d) shows a typical case of a DPLL employing
a TDC with finite resolution and accuracy. The limited resolution and accuracy of the
TDC translates to the DPLL locking with continuously varying phase error that causes
higher than wanted in-band phase noise and the appearance of some fractional spurs.
Other sources of unwanted phase error are TDC nonlinearity and estimation error in
the TDC output after the normalization circuit (i.e. tr/Tout), as will be explained in
section 2.4.1.
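The periodic error of the counter-only case in Fig. 2.11(a) can be reproduced with a toy model. The sketch below is an illustration only, not the thesis model: it assumes the DCO runs exactly at FCW times the reference and that the counter sees only whole DCO cycles. Exact fractions are used so the period of the error sequence is visible without rounding noise.

```python
from fractions import Fraction
import math

def counter_only_phase_error(fcw, n_cycles):
    """Phase error in DCO cycles when only whole DCO cycles are counted."""
    errors = []
    for k in range(n_cycles):
        ideal = k * fcw              # accumulated reference phase in DCO cycles
        counted = math.floor(ideal)  # an integer counter misses the fraction
        errors.append(counted - ideal)
    return errors

fcw = Fraction(9, 4)  # FCW = 2 1/4, as in Fig. 2.11
errors = counter_only_phase_error(fcw, 8)
print([str(e) for e in errors])

# The sequence repeats every 4 reference cycles, so with a 20 MHz
# reference the error is periodic at 20 MHz / 4 = 5 MHz.
period = fcw.denominator
print("period in reference cycles:", period)
```

With an ideal TDC supplying the missing fraction, every entry of the sequence would be zero, which corresponds to Fig. 2.11(b).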
2.3 System level design of DPLL
In a DPLL, the digital phase signals and control signals use fixed-point representation
with enough digits to span the full range of those quantities before saturation or wrap
around. MATLAB/Simulink uses floating-point representation by default. To correctly
capture the actual implementation of a digital PLL, a fixed-point representation is used in
Simulink. The designer must run simulations to determine the number of
bits required to avoid degrading the digital PLL performance. Then, a register-transfer level (RTL)
description of the DPLL is written in Verilog and simulated along with a
VerilogA or transistor-level implementation of the DCO to capture the performance accurately.
The Simulink model as well as the VerilogA model capture the reference and
the DCO noises, including thermal noise, flicker noise, and their up-conversion to phase
noise. Furthermore, the models capture PVT variation of the DCO which affects its cen-
ter frequency and minimum capacitance step i.e. DCO gain. Similarly, the TDC delay
and range variation are also modeled. On the other hand, the digital algorithm uses a
high-level description that is independent of PVT variation, since it will be synthesized to
meet hold and setup time over PVT at a later stage. Appendix C has more details about
jitter modeling, simulation, and the verification flow. Fig. 2.12 shows phase noise simulations
based on the variation of the rising edge of the output clock that was captured in
Simulink.
Behavioral simulations were done in MATLAB to show the jitter contributed by various
sources in the DPLL for a 2.4 GHz output frequency with a 20 MHz reference. In all cases,
dithering of the DCO LSB inputs contributed negligibly to the RMS jitter. With a loop
bandwidth of 700 kHz, the DCO intrinsic phase noise is approximately -110 dBc/Hz
at 1 MHz offset and contributes 179 fs RMS jitter (see footnote 3); improving the TDC resolution from
40 ps to 4 ps reduces the TDC's jitter contribution from 3324 fs RMS down to 232 fs,
reducing the total output RMS jitter from 3329 fs down to 295 fs. With a
loop bandwidth of 1400 kHz, the DCO intrinsic noise contributes 155 fs RMS jitter, while
the jitter contributed by TDC quantization can be reduced from 5645 fs down to 394 fs
RMS by improving the TDC resolution from 40 ps down to 4 ps, resulting in a reduction in
overall RMS jitter from 5647 fs down to 424 fs RMS [8].
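The totals quoted above are close to what one gets by combining the individual contributions as a root sum of squares. The sketch below makes that check, under the assumption (not stated explicitly in the text) that the TDC and DCO contributions are uncorrelated and that the dithering contribution is negligible. The function name is hypothetical.

```python
import math

def total_rms_jitter_fs(*sources_fs):
    """Root-sum-square of uncorrelated RMS jitter contributions (in fs)."""
    return math.sqrt(sum(s * s for s in sources_fs))

# (TDC contribution, DCO contribution, total quoted in the text), all in fs.
cases = {
    "700 kHz bandwidth, 40 ps TDC": (3324, 179, 3329),
    "700 kHz bandwidth, 4 ps TDC": (232, 179, 295),
    "1400 kHz bandwidth, 40 ps TDC": (5645, 155, 5647),
    "1400 kHz bandwidth, 4 ps TDC": (394, 155, 424),
}

for name, (tdc_fs, dco_fs, quoted_fs) in cases.items():
    combined = total_rms_jitter_fs(tdc_fs, dco_fs)
    print(f"{name}: combined {combined:.0f} fs, quoted {quoted_fs} fs")
```

The comparison shows why a coarse TDC dominates the jitter budget: with the 40 ps TDC the total is essentially equal to the TDC contribution alone.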
Figure 2.12: Phase noise of the output clock (2400.3418 MHz), based on MATLAB/Simulink simulation, when ∆ttdc = 4 ps. Plot annotation: Jpp = 9.07 ps, Jrms = 0.98 ps, Freq = 2400.340 MHz; the integrated phase jitter is 954 fs. Curves: Eq. 2.13 and MATLAB simulation. [Plot omitted; axes: offset frequency (Hz) vs. phase noise (dBc/Hz).]
Footnote 3: Details of estimating RMS jitter from a phase noise plot are given in Appendix B.3.
2.4 Basic TDC structure
Fig. 2.13 illustrates the principle of a basic TDC based on a digital delay line. The start
signal (fout) is delayed by buffers or differential inverters to generate multiple
delayed versions of fout. These delayed signals are sampled on the arrival of the rising
edge of the stop signal (fref). The output of a sampling flip-flop is high if the delayed
start signal, D[i], has passed by the time the stop sampling clock, fref, arrives. Otherwise, the sampling process
generates a low value. Consequently, the TDC generates a pseudo-thermometer code
such that the position of the high-to-low transition estimates the time difference between the
previous rising edge of the start signal, fout, and the rising edge of the stop signal, fref, in terms
of the number of TDC delay stages, i.e. tr/∆ttdc, as shown in Fig. 2.14.
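The sampling process described above can be mimicked with a small behavioral sketch. It is an idealized illustration, not the implemented circuit: it assumes a 50% duty-cycle start clock, identical stage delays, and integer picosecond timing, and all names are hypothetical.

```python
def sample_delay_line(t_r_ps, t_out_ps, dt_ps, n_stages):
    """Sample n delayed copies of the start clock at the stop edge.

    t_r_ps is the time from the last rising edge of the start clock to
    the stop edge. Tap i is the start clock delayed by i * dt_ps.
    """
    q = []
    for i in range(n_stages):
        # Time since the last rising edge of tap i, wrapped into one period.
        phase = (t_r_ps - i * dt_ps) % t_out_ps
        q.append(1 if 2 * phase < t_out_ps else 0)  # high during first half
    return q

def high_to_low_index(q):
    """Index of the last '1' before the first 1-to-0 transition, or None."""
    for i in range(len(q) - 1):
        if q[i] == 1 and q[i + 1] == 0:
            return i
    return None

q = sample_delay_line(t_r_ps=100, t_out_ps=400, dt_ps=32, n_stages=16)
code = high_to_low_index(q)
print("".join(str(bit) for bit in q), "-> tr/dt estimate:", code)
```

In this sketch the run of ones at the end of the word comes from the falling edge of the start clock, which is why the output is only a pseudo-thermometer code and a transition detector is needed rather than a simple count of ones.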
Figure 2.13: Buffer delay line implementation of TDC: simplified schematic view (left); timing diagram (right). The raw Q[i] is a pseudo-thermometer code to be converted into a normalized binary word representing the fractional phase error. [Schematic and timing diagram omitted.]
Figure 2.14: Estimating phase error based on the TDC output. [Diagram omitted; labels: fref, fout, tf, tr, and the annotation 1 − tr/Tout.]
2.4.1 TDC Normalization Circuit
Figure 2.15: A typical circuit to normalize the phase error of a TDC. [Block diagram omitted; recoverable labels: TDC, 0-1 & 1-0 detector, multiplication by 2, moving average (1/n) Σ xk, 1/x, multiplier, Q[0:M], fref, fout.]
The TDC pseudo-thermometer outputs must be normalized to Tout, similar to other DPLL
signals. An approximation of Tout/∆ttdc is calculated by doubling the absolute difference
between tr/∆ttdc and tf/∆ttdc, as shown in Fig. 2.14. Variations are averaged over time
using a moving average filter, which generates Tout/∆ttdc. Inverting the output of the
moving average filter and multiplying it by the raw TDC output, tr/∆ttdc, generates a
normalized phase error, i.e. tr/Tout, as shown in Fig. 2.15.
The moving average filter must have enough points to filter out errors in the period
estimate, especially if a coarse TDC is used. However, a long moving average filter slows
down the TDC response and may cause serious phase estimation error during frequency
switching or modulation. Note that some error also arises in the estimate of ∆ttdc
for a particular TDC. When ∆ttdc = 56 ps, the DPLL estimates the TDC resolution
to be 56.24 ps with a 7.41 ps standard deviation, as shown in Fig. 2.16(a). However, if
∆ttdc is 16 ps, the DPLL estimates the TDC resolution to be 16.04 ps with a 0.17 ps
standard deviation, as shown in Fig. 2.16(b). This shows the importance of designing
a fine TDC, which is addressed in the next chapter.
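The normalization steps described above can be sketched in a few lines. This is a behavioral illustration under idealized assumptions (50% duty cycle, identical stage delays, integer picosecond timing), with hypothetical names. The 400-sample window is chosen only so that the averaged value can be checked exactly; Fig. 2.16 uses a 128-point filter.

```python
from collections import deque

def tdc_codes(t_r_ps, t_out_ps, dt_ps):
    """Ideal raw codes: whole stages since the last rising and falling edge."""
    t_f_ps = (t_r_ps + t_out_ps // 2) % t_out_ps
    return t_r_ps // dt_ps, t_f_ps // dt_ps

def normalize(samples, window):
    """Return (normalized phases tr/Tout, last averaged Tout/dt estimate)."""
    history = deque(maxlen=window)
    phases = []
    period_codes = None
    for tr_code, tf_code in samples:
        history.append(2 * abs(tr_code - tf_code))  # raw estimate of Tout/dt
        period_codes = sum(history) / len(history)  # moving average
        phases.append(tr_code / period_codes)       # multiply by 1/(Tout/dt)
    return phases, period_codes

T_OUT_PS, DT_PS = 400, 32  # true ratio Tout/dt = 12.5
# Sweep the stop edge through every integer offset within one DCO period.
samples = [tdc_codes((37 * k) % T_OUT_PS, T_OUT_PS, DT_PS) for k in range(400)]
phases, period_codes = normalize(samples, window=400)
print("averaged Tout/dt estimate:", period_codes)
```

Each raw estimate is either 12 or 14 in this setup, so a single sample can be off by more than 10%, while the average settles close to the true ratio of 12.5. This is the trade-off between filter length and response time noted in the text.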
Figure 2.16: Estimate of TDC resolution as computed by the TDC normalization circuit. The raw data (blue) is plotted along with the 128-point moving average filter output (red). (a) Coarse TDC: ∆ttdc = 56 ps (error of ∆ttdc = 0.24 ps, standard deviation of ∆ttdc = 7.41 ps). (b) Fine TDC: ∆ttdc = 16 ps (error of ∆ttdc = 0.04 ps, standard deviation of ∆ttdc = 0.17 ps). [Plots omitted; axes: time (µs) vs. estimate of TDC resolution (ps).]
Chapter 3
A DPLL with Calibrated Coarse and
Stochastic Fine TDC
This chapter presents an overview of the implemented DPLL while focusing on the proposed
calibrated coarse-fine TDC. Then, a summary of state-of-the-art TDC implementations,
along with their pros and cons, is given. The next section provides an overview
of the proposed low-power calibrated coarse and stochastic fine TDC. A detailed discussion
of the stochastic behavior of the fine TDC is given. Later, an on-chip balanced-mean
code-density test is presented to calibrate the coarse TDC. The chapter concludes with
a summary of the testing setup and measurement results.
3.1 Overview of the DPLL
Fig. 3.1 shows the implemented DPLL architecture, which operates from 1.99 to 2.5 GHz.
The shaded blocks in Fig. 3.1, i.e. the DCO, TDC, and divide-by-two, are custom-designed,
while the other blocks are implemented using RTL Verilog. A retiming circuit
synchronizes the reference clock, fref, with the DCO output clock, fout. The retimed reference
clock, fref−D, is used to synchronize the operation of the DPLL, including phase detection,
normalization, and filtering. The DCO is an LC oscillator with digitally switchable
capacitors. The output clock of the DCO is divided by two using a CML static divider (see footnote 1).
The CML output is AC coupled before passing through a pseudo-differential CMOS
buffer. After the CML-to-CMOS stage, the half-rate clock is fed to a synthesized divide-by-four,
output phase accumulator, and ∆Σ modulator. The ∆Σ modulator is used
to reduce the effect of DCO quantization noise and to achieve fine frequency control.
Detailed explanations of each block are given throughout this chapter. Also, Appendix A
has details about the CML divide-by-two, the CML-to-CMOS conversion, the CML
output buffer, and the flip-flop used in the coarse TDC.
Footnote 1: The standard cell library in the 0.13 µm IBM bulk CMOS process does not support synthesizable logic beyond 1 GHz.
Figure 3.1: A digital PLL architecture for fractional frequency synthesis [16]. The shaded blocks are custom-designed but can be automatically generated using a scripting language like TCL due to their regular structure. [Block diagram omitted.]
3.2 State-of-Art Implementations of TDC
A TDC is widely used in many applications such as nuclear experiments for timing
single-shot events, laser range finders, and space science instruments [17]. In DPLLs,
it has been employed for the measurement of the phase difference between a reference
and an output clock. The phase noise contributed by TDC quantization in, for example,
[5, 10, 18] is unacceptable for many applications that require wide loop bandwidth like
LAN, WCDMA, HSPCA, and LTE [6]. But, designing a fine-resolution and low-power
TDC is a challenging task. The following presents state-of-art implementations of TDC.
3.2.1 Buffer delay and Inverter delay line TDC
The simplest implementation of a TDC uses a buffer delay line [15]. Due to its simple
structure, it can be implemented at Verilog gate-level using a predefined standard cell
library. However, its time resolution is limited by the buffer delay that is technology
dependent. Replacing a buffer delay line with an inverter delay line can improve the
TDC resolution by a factor of two. However, the rise and fall time of the inverter must
be matched such that two adjacent inverters will have the same effective resolution. In
addition, the resolution is still limited by technology. In 0.13 µm CMOS, the inverter
delay varies from 32 to 48 ps over PVT while in 28 nm CMOS technology the inverter
delay is around 10-12 ps.
3.2.2 Vernier delay line TDC
Vernier delay lines are a straightforward method to improve the TDC resolution, using
two delay lines with slightly different stage delays, Ta and Tb, so that the TDC resolution
is determined by the delay difference between the two inverters, [Ta − Tb] [11][19]. Al-
though the Vernier delay-line improves TDC resolution, there is a dramatic area penalty
and increased power consumption especially if a large dynamic range is required. Fur-
thermore, employing a Vernier TDC within DPLL systems would increase the loop delay
and hurt stability since the signals may propagate through a lengthy delay line before the
phase error is resolved. For example, the Vernier delay line in [20] uses two delay lines
consisting of 80 buffers providing 5 ps resolution but resulting in a relatively high DPLL
power consumption of 50 mW in a 90 nm CMOS process. A 2-dimensional Vernier TDC
[21] was proposed to reduce the number of delay stages and the power consumption. A
2-dimensional Vernier TDC resolves 4.8 ps in 65 nm CMOS technology [21]. A DPLL
employing a 2-dimensional Vernier TDC [22] shows a very good noise performance in a
55 nm process.
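The area and latency penalty of a Vernier line grows with the ratio of dynamic range to resolution. The following rough sizing sketch illustrates this. It assumes a plain one-dimensional Vernier line with one flip-flop per stage and one delay element per line per stage; the stage delays of 42 ps and 40 ps are hypothetical values chosen to give a 2 ps difference, and the latency figure is only a first-order estimate.

```python
import math

def vernier_requirements(range_ps, t_a_ps, t_b_ps):
    """First-order sizing of a 1-D Vernier delay line (illustrative only)."""
    resolution_ps = abs(t_a_ps - t_b_ps)
    stages = math.ceil(range_ps / resolution_ps)
    return {
        "resolution_ps": resolution_ps,
        "stages": stages,
        "flip_flops": stages,            # one sampler per stage
        "delay_elements": 2 * stages,    # two parallel delay lines
        # The edges must travel through the slower line before the
        # last stage can resolve the phase error.
        "worst_case_latency_ps": stages * max(t_a_ps, t_b_ps),
    }

# Covering a 64 ps range with 2 ps steps.
req = vernier_requirements(range_ps=64, t_a_ps=42, t_b_ps=40)
for key, value in req.items():
    print(f"{key}: {value}")
```

Under these assumptions the latency is set by the absolute stage delay rather than by the delay difference, which is why a fine Vernier line can be slow even though its resolution is high.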
3.2.3 Gated ring oscillator (GRO) TDC
The gated ring-oscillator (GRO) TDC reported in [12] achieves an effective resolution
of 6 ps in a 0.13 µm technology.
Figure 3.2: Block and timing diagram of the gated ring oscillator (GRO) based TDC. (a) Block diagram. (b) Timing diagram [12]. [Diagrams omitted; block labels: Enable, GRO, Counters, Register, Out.]
It measures the phase error between two signals by
enabling a ring oscillator only during a measurement window, as shown in Fig. 3.2. The
gating action of the GRO preserves the oscillator state, i.e. the quantization error, at the
end of the measurement interval Tin[k−1]. In the following measurement interval Tin[k],
the previous quantization error is carried over, as shown in Fig. 3.2(b), which results in
first-order noise shaping of the quantization error. Furthermore, the GRO-based TDC
could employ multi-phase coupled oscillators to average its delay without the need for
code-density calibration [12]. However, a GRO TDC consumes up to 21 mW for large
phase errors.
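The noise-shaping effect of carrying the quantization error from one measurement to the next can be seen in a simple behavioral sketch. This is not a model of the circuit in [12]; it only assumes that the residue left at the end of one interval is added to the next interval before quantization. Times are in integer femtoseconds and all names are hypothetical.

```python
def gro_tdc(intervals_fs, stage_delay_fs):
    """Quantize each interval while carrying the residue to the next one."""
    residue_fs = 0
    codes = []
    for t_in in intervals_fs:
        total = t_in + residue_fs               # previous error is carried over
        codes.append(total // stage_delay_fs)   # counted stage transitions
        residue_fs = total % stage_delay_fs     # state held while gated off
    return codes, residue_fs

STAGE_DELAY_FS = 6_000          # 6 ps raw step, for illustration
intervals = [10_300] * 10       # ten identical 10.3 ps inputs
codes, residue = gro_tdc(intervals, STAGE_DELAY_FS)

measured = STAGE_DELAY_FS * sum(codes)
print("codes:", codes)
print("accumulated error (fs):", sum(intervals) - measured)
```

The individual codes toggle between 1 and 2, but the accumulated error never exceeds one stage delay, so the error of the average shrinks as more samples are taken. This bounded running error is the time-domain signature of first-order noise shaping.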
3.2.4 Interpolation-Based TDC
An interpolation-based TDC is reported in [23]. It employs a differential delay line to
obtain coarse delay steps. It then interpolates between neighboring phases with a resistor
voltage divider to achieve a small delay step of 4.7 ps in a 90 nm technology. However,
that TDC uses two auxiliary TDCs and an extra digital loop filter for correction and
calibration, making it power hungry.
Figure 3.3: Two-stage TDC: a coarse TDC followed by a time amplifier for the residue, which feeds another coarse stage. [Block diagram omitted; labels: Start, Stop, TA, Mux, Encoder, Coarse TDC.]
3.2.5 Two-step TDC
Two-step TDCs combine a coarse and fine stage to provide fine resolution while still
covering a wide dynamic range of input phase error. For example, the two-step TDC
in [24] uses a delay-line TDC as the coarse TDC followed by a Vernier delay-line fine
TDC. In [25], the residual phase error after a coarse TDC is time-amplified and applied
to another TDC with relatively coarse resolution. Unfortunately, the time amplifier has
high power consumption and a complex analog design which conflicts with the goal of
digitizing the PLL circuits.
3.3 Coarse-Fine Stochastic TDC
This section reports on a low-power two-step coarse-fine TDC achieving 4 ps TDC
resolution (see footnote 2) in a 0.13 µm technology [16]. The proposed TDC architecture uses a
coarse-resolution TDC, as shown in Fig. 3.4, to select a delayed version of the reference clock for
further comparison with the output clock in a fine-resolution TDC. The fine-resolution
TDC needs to resolve a 64 ps range down to 2 ps, i.e. a 6-bit fine TDC. The fine TDC could
be realized as a Vernier structure, which requires 32 sampling flip-flops and 64 delay
elements (inverters) in addition to the multiplexer. There are two disadvantages of using
a Vernier structure as the fine TDC. One is the large dynamic power consumption due to
the 64 extra inverters. The other is that a Vernier structure delays the phase error
calculation by 32 DCO cycles, which affects the loop stability. Similarly, a time-amplifier
fine TDC consumes large power. Alternatively, a stochastic fine TDC, shown in Fig. 3.5,
employs only 64 latches, without the need for inverters and without experiencing a long
delay. The proposed fine stochastic TDC uses the stochastic variation of latch offsets
[17] to provide a resolution much better than the technology's inverter delay.
Footnote 2: The targeted TDC resolution was 2 ps, but the measured in-band phase noise could not be brought below -108 dBc/Hz, which is equivalent to a 4 ps TDC resolution.
3.3.1 Coarse TDC
The coarse TDC shown in Fig. 3.4 generates 32 delayed versions of the low-frequency ref-
erence clock by passing it through a chain of pseudo-differential inverters with adjustable
delay. Then, the delayed reference clocks are used for sampling the high-frequency output
clock using sense-amplifier flip-flops, shown in Appendix A, that have a narrow symmetric
metastability window [15]. The coarse TDC must cover at least one DCO period at
the slowest operating frequency of the DPLL (see footnote 3). Passing the low frequency reference clock
rather than the high frequency output clock through the inverter chain provides two ad-
vantages: lower power consumption and lower jitter induced by the power supply during
the sharp transitions on both the rising and falling edges of the clock signal through the
inverters.
An encoder and 32-to-1 multiplexer are used to select one of the delayed versions of
the reference clock for further comparison with the output clock using the fine TDC. The
encoder introduces a delay which makes it impossible to tap the output of the delay buffer
where the 1-0 transition occurs, since by then the reference clock edge has propagated
further [8]. To solve this problem, the mux selects the output of the second buffer after
the 1-0 edge transition, passing it into the fine TDC. Moreover, the DCO clock is also
delayed to mimic the extra delay experienced by the selected reference clock phase, before
comparison by the fine TDC, as shown on the left of Fig. 3.4.
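The requirement that the coarse TDC span at least one DCO period fixes the minimum number of stages. The sketch below works this out for the slowest operating frequency of 1.99 GHz. It assumes that the adjustable stage delay falls within the 32 to 48 ps inverter-delay range quoted in Section 3.2.1, which the text does not state explicitly; the helper name is hypothetical.

```python
import math

def coarse_stages_needed(f_out_min_hz, stage_delay_ps):
    """Stages required to span one DCO period at the slowest frequency."""
    period_ps = 1e12 / f_out_min_hz
    return math.ceil(period_ps / stage_delay_ps)

F_OUT_MIN = 1.99e9  # slowest DPLL output frequency

for delay_ps in (32, 40, 48):
    stages = coarse_stages_needed(F_OUT_MIN, delay_ps)
    print(f"stage delay {delay_ps} ps -> at least {stages} stages")
```

The shortest stage delay sets the worst case, since more stages are then needed to cover the same period.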
Footnote 3: Note that a 16-stage coarse TDC is sufficient to cover the maximum period of the DCO clock. However, a 32-stage coarse TDC was used to enable on-chip measurement of the period jitter.
Figure 3.4: The coarse TDC architecture of a two-step TDC. The delayed version of Fref with phase closest to Fout is muxed to the second TDC stage. Path delays for the selected reference phase, Fref to D_Fref, and for the DCO clock, Fout to D_Fout, are matched. [Schematic omitted; labels: FREF, Fout, Q1, Q2, Qn, encoding (find 1-0 and 0-1 transition), 32x1 CMOS MUX, slope/delay control, D_Fref, D_Fout.]
3.3.2 Fine Stochastic TDC
The stochastic TDC is composed of M identical arbiters evaluating in parallel the phase
relationship between two incoming signals [17][26]. Ideally, each arbiter circuit instantly
generates a logical ‘0’ or ‘1’ depending upon which one of the two input signals transitions
first.
In reality, the arbiters exhibit several nonidealities. The output settling time increases
when the time offset between the incoming signals, ∆t, is small. If the time offset is in the
vicinity of zero, the arbiter exhibits metastability and can take a very long time to settle.
Moreover, due to device mismatch, each arbiter exhibits a random input offset voltage,
VOS, that creates different voltage thresholds for each arbiter, as shown in Fig. 3.6(a).
Over a large number of arbiters, these voltage offsets will be Gaussian-distributed with
a standard deviation σV .
Figure 3.5: The fine stochastic TDC (STDC) architecture of the two-step TDC. The STDC outputs are sampled on the rising edge of the delayed reference clock. [Schematic omitted.]
Figure 3.6: (a) The stochastic TDC arbiter input-output relationship without and with random mismatch. Input-referred voltage offset due to mismatch translates into time offset. (b) SR latch used in the stochastic TDC as arbiter. [Diagrams omitted.]
The voltage offsets translate into input-referred time offsets, TOS, which will also be
Gaussian distributed with standard deviation σT . If the input clock signals have a long
rise time, even a small voltage offset, VOS, will translate into a significant time offset,
TOS. Accordingly, the time offset of an arbiter can be related to its voltage offset by the
slope of the input signals, Sin, so TOS = VOS/Sin and σT = σV /Sin.
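The conversion from voltage offset to time offset is a single division, but the units are easy to get wrong. The sketch below uses millivolts and V/ns so that the result comes out directly in picoseconds. The 32.22 mV offset and the 0.2 to 2.0 V/ns slope range are the values reported later in this section; the function name is hypothetical.

```python
def time_offset_sigma_ps(sigma_v_mv, slope_v_per_ns):
    """sigma_T = sigma_V / S_in; mV divided by V/ns gives picoseconds."""
    return sigma_v_mv / slope_v_per_ns

SIGMA_V_MV = 32.22  # input-referred offset standard deviation

for slope in (0.2, 1.0, 2.0):
    sigma_t = time_offset_sigma_ps(SIGMA_V_MV, slope)
    print(f"slope {slope} V/ns -> sigma_T = {sigma_t:.1f} ps")
```

A slower edge spreads the same voltage offsets over a wider range of time offsets, which is how the slope acts as a tuning knob for the stochastic TDC.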
The average stochastic TDC output follows the error function that can be estimated
using a Taylor series expansion as follows,
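$$ \mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^2} \, dt \approx \frac{2}{\sqrt{\pi}} \cdot \left( x - \frac{x^3}{3} + \frac{x^5}{10} - \cdots \right) \tag{3.1} $$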
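The cumulative distribution function (CDF) of a Gaussian-distributed variable is related
to the error function as follows,
$$ \mathrm{cdf}(t_d, \mu = 0, \sigma = 1) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t_d} e^{-\frac{t^2}{2}} \, dt = \frac{1}{2} \left[ 1 + \mathrm{erf}\left( \frac{t_d}{\sqrt{2}} \right) \right] \tag{3.2} $$
$$ \Rightarrow \mathrm{cdf}(t_d, \mu = 0, \sigma = \sigma_T) = \frac{1}{\sqrt{2\pi}\,\sigma_T} \int_{-\infty}^{t_d} e^{-\frac{t^2}{2\sigma_T^2}} \, dt = \frac{1}{2} \left[ 1 + \mathrm{erf}\left( \frac{t_d}{\sqrt{2}\,\sigma_T} \right) \right] \tag{3.3} $$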
The summed output of a population of M arbiters, with zero mean time offset while
the standard deviation of time offset is σT , has the following approximate CDF (using
the Taylor expansion of the error function as given in Eq. 3.1):
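$$ \mathrm{cdf}(t_d; 0, \sigma_T) \approx \frac{M}{2} + \frac{M}{\sqrt{2\pi}\,\sigma_T} \, t_d - \frac{M}{6\sqrt{2\pi}\,\sigma_T^3} \, t_d^3 \tag{3.4} $$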
The CDF function is approximately linear around td ∈ [−σT, σT], as shown in Fig. 3.7.
The stochastic TDC resolution, ∆tstoch, can be estimated as the inverse of the slope of
the CDF function around the midpoint while ignoring the cubic term in Eq. 3.4:
$$ \Delta t_{stoch} = \frac{\sqrt{2\pi}\,\sigma_T}{M} = \frac{\sqrt{2\pi}\,\sigma_V}{M \cdot S_{in}} \tag{3.5} $$
Considering the cubic term in Eq. 3.4 increases the estimated ∆tstoch in Eq. 3.5 by
20%, as given here:
$$ \Delta t_{stoch} = \frac{2\sigma_T}{\mathrm{cdf}(t_d = \sigma_T) - \mathrm{cdf}(t_d = -\sigma_T)} = 1.2 \, \frac{\sqrt{2\pi}\,\sigma_T}{M} = 1.2 \, \frac{\sqrt{2\pi}\,\sigma_V}{M \cdot S_{in}} \tag{3.6} $$
Figure 3.7: A Monte Carlo simulation of the stochastic TDC for a given negative phase error. The sum of all stochastic TDC arbiter outputs translates into a phase error within the linear region of the time offset's statistical CDF. [Plot omitted.]
Hence, the stochastic TDC resolution, ∆tstoch, is determined by the
number of arbiters used, the statistical properties of the transistors used to design those
arbiters, and the slope of the input signals. A latch with inherently
large mismatch can be designed by using minimum-size transistors. However, the slope
of the incoming signal has an even greater effect on the stochastic TDC resolution and
dynamic range and is therefore controlled using a programmable slope control circuit,
implemented by modifying the PMOS load of a CMOS buffer. Although this may increase
the short-term jitter, its impact upon performance was deemed relatively insignificant
for the targeted resolution.
The arbiters have been implemented as set-reset latches based on cross-coupled NAND
gates, as shown in Fig. 3.6(b). The outputs of these arbiters are sampled on the rising
edge of the delayed reference clock, as shown in Fig. 3.5. This is important to ensure
that the stochastic TDC captures the correct value of each arbiter before it may change
its state.
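The resolution estimates of Eq. 3.5 and Eq. 3.6 and the summed arbiter output can be explored with a short Monte Carlo sketch. It is an idealized illustration, not the Spectre simulation reported in this chapter: each arbiter is assumed to decide instantly and output '1' whenever the input time difference exceeds its own Gaussian time offset. The values σT = 32 ps and M = 64 follow the design described in this section; the names and the random seed are arbitrary.

```python
import math
import random

def stochastic_resolution_ps(sigma_t_ps, m, cubic_correction=False):
    """Eq. 3.5, or Eq. 3.6 when the cubic term is taken into account."""
    resolution = math.sqrt(2.0 * math.pi) * sigma_t_ps / m
    return 1.2 * resolution if cubic_correction else resolution

def expected_sum(t_d_ps, sigma_t_ps, m):
    """M times the Gaussian CDF of Eq. 3.3."""
    return 0.5 * m * (1.0 + math.erf(t_d_ps / (math.sqrt(2.0) * sigma_t_ps)))

def simulate_sum(t_d_ps, offsets_ps):
    """Number of ideal arbiters whose time offset lies below t_d."""
    return sum(1 for t_os in offsets_ps if t_d_ps > t_os)

SIGMA_T_PS, M = 32.0, 64
rng = random.Random(1)  # fixed seed so the run is repeatable
offsets = [rng.gauss(0.0, SIGMA_T_PS) for _ in range(M)]

print(f"Eq. 3.5: {stochastic_resolution_ps(SIGMA_T_PS, M):.2f} ps")
print(f"Eq. 3.6: {stochastic_resolution_ps(SIGMA_T_PS, M, True):.2f} ps")
for t_d in (-32, -16, 0, 16, 32):
    print(t_d, simulate_sum(t_d, offsets),
          round(expected_sum(t_d, SIGMA_T_PS, M), 1))
```

With only 64 arbiters, the simulated sums scatter around the ideal CDF, which motivates the statistical treatment of the population size that follows.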
The arbiter (shown in Fig. 3.6(b)) has an input-referred voltage offset, VOS, due
to the random mismatch between its transistors. The mismatch is characterized by
the variations of the threshold voltage, Vth, and by β = µCoxW/L [27]. For a small
overdrive voltage, VOS is affected mainly by Vth variations. However, the effect of Vth
variations reduces for larger overdrive voltage and it becomes comparable to the impact
of β variations. A latch with positive feedback, as shown in Fig. 3.6(b), changes its
output around small overdrive voltage and so Vth variation is the main contributor to
VOS variation. Furthermore, in contrast to the mismatch of the differential input pair,
the mismatch of the transistors forming the positive feedback in the latch has a marginal
effect on the VOS.
Fig. 3.8 shows 512 Monte-Carlo simulations of Vth variations for a minimum size
transistor (i.e. L = 0.12 µm and W = 0.20 µm). According to that simulation, Vth
has a standard deviation of 22.78 mV. Assuming uncorrelated mismatch between the
transistors in the input differential pair (M1 and M2 in Fig. 3.6(b)), the input referred
voltage offset VOS due to Vth mismatch has a standard deviation of √2 · σVth = 32.22 mV.
Spectre simulations of the stochastic TDC transfer function, as shown in Fig. 3.9(a),
demonstrate a comparable value and confirm that Vth random variation of the differential
input pair is the main contributor to VOS variations.
Recall also that the stochastic TDC resolution can be adjusted by changing the reference signal slope, Sin, as shown in Fig. 3.9(b). Accordingly, the reference signal is buffered such that its rising edge has a slope that ranges from 0.2 to 2.0 V/ns, programmed through a serial bus. To achieve a time offset with a standard deviation of 32 ps, the slope of the incoming signal must be around 1.0 V/ns. This enables the fine stochastic TDC to have a 64-ps approximately linear region that is around 60% larger than the coarse TDC resolution. A wide linear range is desirable since any systematic mismatch (for example, as caused by layout mismatch) will shift the CDF to the left or right and reduce the useful linear range and the ability of the stochastic TDC to resolve time differences accurately [16].
Figure 3.8: Spectre Monte-Carlo simulation of threshold voltage, Vth, for a minimum size transistor. (a) 512 Monte-Carlo runs (threshold voltage in mV versus sample number). (b) Estimated PDF versus ideal PDF distribution (probability distribution versus threshold voltage in mV). Accordingly, mean(Vth) = 345 mV and stdv(Vth) = 22.78 mV.
Figure 3.9: Transfer function of stochastic TDC output using Spectre Monte-Carlo simulation (summation of STDC outputs versus time difference in ps). (a) CDF for two random stochastic TDCs with the same input slope of 1 V/ns. (b) CDF for the same stochastic TDC but with different input slopes (red: 1 V/ns, blue: 1.5 V/ns). Note that Vth has a standard deviation of 22.78 mV.
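The link between offset voltage, input slope, and time offset used above can be checked with a short Python sketch. The function name and the list of slopes are illustrative; the sqrt(2) factor is an assumption (two independent input transistors), chosen because it reproduces the 32.22 mV quoted earlier from the 22.78 mV spread of Vth.

```python
import math

SIGMA_VTH_MV = 22.78  # Monte-Carlo standard deviation of Vth (Fig. 3.8)

def time_offset_sigma_ps(slope_v_per_ns, sigma_vos_mv):
    """Convert an offset-voltage spread (mV) into a time-offset spread (ps).

    A slope of 1 V/ns equals 1 mV/ps, so dividing mV by V/ns gives ps.
    """
    return sigma_vos_mv / slope_v_per_ns

# Assumption: the two input transistors contribute independent Vth errors,
# so the differential offset spread is sqrt(2) times the single-device spread.
sigma_vos_mv = math.sqrt(2) * SIGMA_VTH_MV

for slope in (0.2, 0.5, 1.0, 1.5, 2.0):  # V/ns, within the programmable range
    sigma_t = time_offset_sigma_ps(slope, sigma_vos_mv)
    print(f"slope = {slope:.1f} V/ns -> sigma_T = {sigma_t:.1f} ps")
```

A steeper edge shrinks the time offset spread, which is why the slope setting acts as a resolution control for the stochastic TDC.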
Employing a large number of arbiters improves the stochastic TDC’s resolution as
well as its differential nonlinearity (DNL). However, there is an area and power con-
sumption penalty traded-off for accuracy. In the following, a statistical treatment of the
stochastic TDC is discussed such that a proper number of arbiters is chosen to achieve
small resolution while consuming reasonable power.
Assume there is a random population of M arbiters with time offsets that follow a
Gaussian distribution. For a very large population size, one can assume that the mean
of arbiter time offset, µT , is zero while the standard deviation is well defined by a single
value, σT . In practice, a limited number of arbiters is used to resolve the time error.
For such a finite population, the mean and standard deviation will be off from the ideal
values depending on the population size, M , and depending on the desired confidence
interval. Define the time offset mean, µT, for M arbiters as a random variable. Then µT lies within the following range:
\[
-\frac{t_{\alpha/2,\,M-1}\,\sigma_M}{\sqrt{M}} < \mu_T < +\frac{t_{\alpha/2,\,M-1}\,\sigma_M}{\sqrt{M}}, \quad \text{at } (1-\alpha) \text{ confidence interval}
\]
where σM is the sample standard deviation and tα/2,M−1 denotes the 100 · (1 − α/2) percentile of the Student t-distribution with M − 1 degrees of freedom, where M is the number of arbiters. Similarly, the population standard deviation is bounded by lower and upper limits:
\[
\sigma_{T,\min} = \sigma_M \sqrt{\frac{M-1}{\chi^2_{1-\alpha/2,\,M-1}}} \tag{3.7}
\]
\[
\sigma_{T,\max} = \sigma_M \sqrt{\frac{M-1}{\chi^2_{\alpha/2,\,M-1}}} \tag{3.8}
\]
where χ²α/2,M−1 is the inverse chi-squared distribution with M − 1 degrees of freedom.
For example, using a stochastic TDC with only 64 arbiters will guarantee that µT
lies within ±1.669 · σM/√64 with 90% confidence (footnote 4). The possible range of µT would
be ±2.656 · σM/√64 for 99% confidence. To first order, the possible range of µT is
proportional to 1/√M for the same confidence level. Doubling the size of the sample
will decrease the potential range of µT by 29%. The uncertainty of µT leads to shifting
the transfer characteristics of the stochastic TDC and so it reduces the available linear
region used to resolve the time error.
Similarly, the uncertainty in the standard deviation of the time offset, σT affects the
stochastic TDC resolution. Using Eq. 3.8 and for 99% confidence interval, the standard
deviation for a stochastic TDC with 64 arbiters lies within [0.81 − 1.29] · σM . If only
32 arbiters are used, the standard deviation will be within [0.75 − 1.46] · σM for 99%
confidence interval. This means that a stochastic TDC’s range could be 29% larger than
expected when 64 arbiters are used or it could be 46% larger than anticipated if 32
arbiters are used.
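The confidence bounds quoted above can be reproduced approximately with the Python standard library alone. This is a sketch, not the method used in the thesis (which cites MATLAB's tinv): the Student t quantile uses a Cornish-Fisher expansion and the chi-squared quantile uses the Wilson-Hilferty approximation, so the last digit may differ slightly from exact tables. All function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def t_quantile(p, dof):
    """Approximate Student-t quantile via a Cornish-Fisher expansion in 1/dof."""
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    return z + g1 / dof + g2 / dof**2

def chi2_quantile(p, dof):
    """Approximate chi-squared quantile via the Wilson-Hilferty transformation."""
    z = NormalDist().inv_cdf(p)
    c = 2 / (9 * dof)
    return dof * (1 - c + z * sqrt(c)) ** 3

def mean_half_range(m, alpha):
    """Half-width of the interval for the mean, in units of sigma_M."""
    return t_quantile(1 - alpha / 2, m - 1) / sqrt(m)

def sigma_bounds(m, alpha):
    """(lower, upper) bounds of sigma_T in units of sigma_M (Eqs. 3.7 and 3.8)."""
    lo = sqrt((m - 1) / chi2_quantile(1 - alpha / 2, m - 1))
    hi = sqrt((m - 1) / chi2_quantile(alpha / 2, m - 1))
    return lo, hi

for m in (32, 64):
    lo, hi = sigma_bounds(m, alpha=0.01)
    print(f"M = {m}: sigma_T within [{lo:.2f}, {hi:.2f}] * sigma_M at 99% confidence")
print(f"t(0.95, 63) ~ {t_quantile(0.95, 63):.3f}, t(0.995, 63) ~ {t_quantile(0.995, 63):.3f}")
```

Calling mean_half_range(128, 0.1) and comparing it with mean_half_range(64, 0.1) shows the roughly 1/√M shrinkage of the mean interval when the number of arbiters is doubled.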
Recall Eq. 3.6 to estimate the stochastic TDC resolution, ∆tstoch, with 64 arbiters when σT = 32 ps. Ideally, one expects to achieve 1.50 ps resolution. However, due to the small number of arbiters (64), σT,max as well as ∆tstoch could be 29% larger, i.e. ∆tstoch could be as large as 1.94 ps. Furthermore, the CDF of a stochastic TDC could suffer from large DNL and INL that limit its ability to resolve small time errors. An ideal stochastic TDC still has a maximum DNL of around 0.6 ps, as will be discussed later. However, a realistic non-ideal case of a stochastic TDC with 64 arbiters is shown in Fig. 3.10, where the DNL ranges from -1.5 ps to 2.5 ps while the INL varies between -4.2 ps and 2.4 ps.
Footnote 4: In MATLAB, tinv(1 − 0.1/2, 63).
Figure 3.10: (a) Transfer function (CDF, summation of STDC outputs versus time difference in ps) of one random example of a non-ideal stochastic TDC when the number of arbiters M = 64. The associated DNL and INL (in ps versus STDC output code) are shown in (b) and (c), respectively.
Fig. 3.11 and Fig. 3.12 show normalized PDFs and CDFs, respectively, of Gaussian distributed random offset for a stochastic TDC with a different number of arbiters. Each plot overlays the results of 100 different stochastic TDCs (i.e. Monte-Carlo simulations). The PDFs as well as the CDFs of the various stochastic TDCs become more consistent and closer to the ideal case as the number of arbiters M increases, but at the cost of extra power consumption and area. When M is 16 or 32, the variations are very significant and not acceptable. Increasing M will reduce the DNL for a particular stochastic TDC, as shown in Fig. 3.13. However, the integral nonlinearity (INL) may not improve, and so choosing M to be 64 arbiters is reasonable. Fig. 3.13 shows the DNL and INL of an ideal stochastic TDC when M = 64 and M = 512. This nonlinearity is caused by the cubic term in the CDF of Eq. 3.4. Based on Fig. 3.13, the worst-case DNL is 0.60 ps when M = 64 compared to only 0.075 ps when M = 512. However, the INL is around 2 ps regardless of the number of arbiters.
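The trend seen across the Monte-Carlo plots, namely that small arbiter populations give inconsistent statistics, can be illustrated with a small experiment. The sketch below is not the simulation used for Fig. 3.11 and Fig. 3.12; it only draws 100 random populations per value of M from a normalized Gaussian and reports how widely the sample standard deviation varies. The function name, seed, and normalization are assumptions.

```python
import random
import statistics

def sample_sigma_spread(num_arbiters, runs=100, sigma_t=1.0, seed=0):
    """Return (min, max) of the sample standard deviation over `runs` random TDCs."""
    rng = random.Random(seed)
    sigmas = []
    for _ in range(runs):
        offsets = [rng.gauss(0.0, sigma_t) for _ in range(num_arbiters)]
        sigmas.append(statistics.stdev(offsets))
    return min(sigmas), max(sigmas)

spread = {}
for m in (16, 32, 64, 128, 256, 512):
    lo, hi = sample_sigma_spread(m)
    spread[m] = hi - lo
    print(f"M = {m:3d}: sample sigma ranges from {lo:.2f} to {hi:.2f} (normalized)")
```

The spread of the sample standard deviation narrows as M grows, which mirrors the tightening of the PDF and CDF families in the figures.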
Finally, it is possible to extend the linear range of the stochastic TDC by using meth-
ods similar to those used for stochastic ADCs. For example, the work in [28] demonstrates
a stochastic ADC with two groups of arbiters where their PDFs are shifted left and right
by applying a symmetric offset. This would create a virtually uniform distribution of the
arbiters’ offsets and improve the CDF linearity with fewer arbiters [8].
3.4 TDC Output Normalization
The proposed TDC is a two-step TDC with a coarse TDC followed by a fine stochastic
TDC to refine the phase error estimation. Outputs of a coarse inverter-line TDC are a
pseudo-thermometer representation of the time error between the input signals that is
normalized to the coarse TDC resolution i.e. tr/∆ttdc. A normalization circuit to change
the coarse TDC outputs into phase error normalized to the output period (Tout) was
shown in section 2.4.1 and it is identical to the left half of the circuit in Fig. 3.14.
The fine stochastic TDC outputs are added together and must be normalized to Tout
as well. An accurate normalization is hardware expensive since it involves dividing the
combined outputs of the stochastic TDC by the reference period Tref and then multiplying
it with FCW. Alternatively, the proposed normalization circuit uses shifting and addition
operations as well as a digital normalizing “Scale” factor (provided on the right side of
Fig. 3.14) to correct the fine TDC output against uncertainty in the clock signal slope, Sin,
and time offset statistics, σT . In this implementation, the Scale factor can be adjusted
with 2 bits of resolution. In a commercial product, the Scale factor can be calibrated
using a technique similar to the one described for the coarse TDC. In this work, only
calibration of the coarse TDC was implemented on-chip since any inaccuracies there will
be dominant.
3.5 TDC Calibration
To reap the full performance benefits of a fine resolution TDC, it must have good linearity. In [29], the reference clock signal is recycled through a single delay cell to avoid the nonlinearity that arises from mismatch along a row of delay cells, and an auxiliary loop fixes the delay against PVT variations. More typically, however, calibration is used to avoid nonlinearity in a TDC.
In a two-step TDC, linearity of the coarse TDC is of prime importance since nonlinearities there will introduce more jitter than in the fine-resolution TDC. In this work, the delay of each stage in the coarse TDC varied from 28 to 38 ps (before extraction) over 200 Monte-Carlo simulations of process and mismatch variations, with a Gaussian-like distribution at an average of 33 ps and 1.89 ps standard deviation, as shown in Fig. 3.17. Hence, calibration is needed to prevent the coarse TDC mismatch from limiting the overall performance. Furthermore, calibration of the coarse TDC is crucial to ensure that the residual quantization error applied to the fine stochastic TDC is within its acceptable range.
Figure 3.11: Normalized PDF of Gaussian distributed random offset for a stochastic TDC with a different number of arbiters: (a) 16, (b) 32, (c) 64, (d) 128, (e) 256, (f) 512. Axes: normalized PDF versus normalized input. Each plot is obtained from a 100-run Monte-Carlo simulation.
Figure 3.12: Normalized CDF of Gaussian distributed random offset for a stochastic TDC with a different number of arbiters: (a) 16, (b) 32, (c) 64, (d) 128, (e) 256, (f) 512. Axes: normalized CDF versus normalized input. Each plot is obtained from a 100-run Monte-Carlo simulation.
Figure 3.13: The associated DNL and INL of an ideal stochastic TDC with random offset when the number of arbiters M = 64 and 512. (a) CDF when M = 64. (b) CDF when M = 512. (c) DNL (M = 64, LSB = 1.49 ps). (d) DNL (M = 512, LSB = 0.19 ps). (e) INL (M = 64, LSB = 1.49 ps). (f) INL (M = 512, LSB = 0.19 ps). DNL and INL are plotted in LSB versus STDC output code.
Figure 3.14: Phase error computation and normalization with respect to one DCO output period, performed digitally. The phase error computed by the coarse TDC is refined by the stochastic TDC.
To permit calibration, the coarse TDC comprises independently-programmable delay stages. Each differential delay stage consists of CMOS inverters whose outputs are cross-coupled by two more inverters and loaded by a 4-bit binary-weighted capacitor bank, as shown in Fig. 3.16. The capacitor bank is implemented with differential MOS capacitors and provides a programmable delay that can be varied over 15 ps, sufficient to cover delay variations.
In this work, a statistical calibration method is used to measure the coarse TDC
nonlinearity. The time-varying difference between an external calibration clock and the
reference clock is relied upon to generate a uniformly distributed time error to be used as
the input to the coarse TDC under calibration. Fig. 3.18 shows the time error generated
by sampling a 333.333 MHz external calibration clock, fcalb, using a 20 MHz reference
clock. This produces a time error in the range of -1500 to +1500 ps with a fine resolution of
1 ps, small enough to allow calibration in that neighborhood. The density function of that
timing error is uniformly distributed. A simplified block diagram of the calibration circuit
to perform a code-density test [30] is shown in Fig. 3.15. A similar statistical method
for measuring DNL is applied to a Vernier TDC in [20]. Unlike that work, however, here
each cell delay is continuously adjusted according to the test results until uniform code
density is observed.
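The way two unrelated clocks produce a uniformly spread time error can be sketched in Python. The example below assumes ideal, jitter-free clocks at exactly 333.333 MHz and 20 MHz and wraps the error into plus or minus half a calibration period; the function and constant names are illustrative. Exact rational arithmetic is used so that the sweep is not blurred by rounding.

```python
from fractions import Fraction

F_CAL_HZ = 333_333_000   # external calibration clock (333.333 MHz, taken as exact)
F_REF_HZ = 20_000_000    # reference clock (20 MHz)

def time_errors_ps(num_cycles):
    """Time error (ps) between each reference edge and the calibration clock.

    The error is wrapped into [-T_cal/2, +T_cal/2), assuming ideal clocks.
    """
    t_cal_ps = Fraction(10**12, F_CAL_HZ)
    errors = []
    for n in range(num_cycles):
        # Position of the n-th reference edge within a calibration period (0 <= phase < 1).
        phase = Fraction(n * F_CAL_HZ, F_REF_HZ) % 1
        wrapped = phase - 1 if phase >= Fraction(1, 2) else phase
        errors.append(float(wrapped * t_cal_ps))
    return errors

errors = time_errors_ps(20_000)
ordered = sorted(errors)
step_ps = ordered[1] - ordered[0]
print(f"range: {min(errors):.1f} ps to {max(errors):.1f} ps, step: {step_ps:.3f} ps")
```

Under these idealized assumptions every reference edge lands on a different phase of the calibration clock over 20000 cycles, so the time errors tile the ±1500 ps range evenly with a step finer than the 1 ps quoted above.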
Code-density testing generally needs a large number of clock cycles to achieve accu-
racy. Accordingly, a wide register would be needed to store the number of hits observed in
each delay bin during calibration. In this work, a balanced mean rather than an absolute
mean is used to store the accumulated number of hits in each delay bin during calibration
[31]. Using a balanced mean, the size of the storage registers can be significantly reduced.
Assume a TDC consists of R delay elements and a register is used to store the number of hits for each delay element (bin). Using a balanced mean, whenever a hit occurs for the ith bin, the controller increments the ith register by (R − 1) and decrements the other (R − 1) registers by one. Note that the mean value stored in all registers remains zero because [no. of hits × (R − 1)] + [no. of missing hits × (−1)] = 0. Accordingly, there is no need to find the average of the stored values, as is the case with an absolute mean method. A regular absolute mean code density test requires the code density data to be post-processed to determine the proper calibration code for each delay, including averaging, subtraction, and multiplication. On the other hand, the implemented balanced mean calibration employs a simple finite-state machine (FSM) that continuously calibrates the coarse TDC and only monitors a threshold level to determine whether a new calibration code is needed.
Figure 3.15: On-chip low-area calibration algorithm of the coarse TDC based on a code density test. The dedicated calibration clock, fcalb, is sampled by the coarse TDC during the calibration phase and, once done, the coarse TDC samples the DCO clock.
Figure 3.16: Delay cell with 4-bit calibration capacitor bank (diagram labels: fref inputs, output to next stage, control bits a3, a2, a1, a0, capacitor multipliers m = 1, 2, 4, 8).
Figure 3.17: Spectre mismatch Monte-Carlo simulation of the inverter unit used in the calibrated coarse TDC.
Assume that the balanced code density test uses H hits per iteration. Also, assume the ith register, corresponding to the ith TDC bin, stores a final value of Fi at the end of the calibration phase. Moreover, assume that pi is the probability that an event is collected for the ith bin. Then the number of hits for bin i is piH and so one can write:
\[ F_i = p_i H \cdot (R-1) + (1-p_i) H \cdot (-1) \tag{3.9} \]
\[ \Rightarrow \frac{F_i}{H} = p_i R - 1 \tag{3.10} \]
\[ \Rightarrow p_i = \frac{1 + F_i/H}{R} \tag{3.11} \]
Ideally, pi = 1/R, and so the Fi/H term in Eq. 3.11 represents DNLi expressed as a fraction of the ideal bin width, i.e. DNLi = Fi/H. Hence, each register stores the DNL of its corresponding TDC bin at the end of balanced mean calibration. To achieve a DNL of 2% with 99% confidence, a 16-bit register is used for each coarse TDC bin rather than a 23-bit register, which would be required without the use of the balanced mean method, saving 224 registers in total.
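The balanced mean update and the relation DNLi = Fi/H can be exercised with a small simulation. The sketch below is not the on-chip FSM; it is a behavioral model with a hypothetical four-bin TDC, uniformly distributed input times, and illustrative names and bin widths.

```python
import random

def balanced_mean_test(bin_widths_ps, num_hits, seed=1):
    """Run a balanced-mean code density test on a TDC with the given bin widths."""
    rng = random.Random(seed)
    num_bins = len(bin_widths_ps)
    total_ps = sum(bin_widths_ps)
    registers = [0] * num_bins
    for _ in range(num_hits):
        # Uniformly distributed time error across the whole TDC range.
        t = rng.uniform(0.0, total_ps)
        hit, edge = 0, bin_widths_ps[0]
        while t >= edge and hit < num_bins - 1:
            hit += 1
            edge += bin_widths_ps[hit]
        # Balanced update: +(R - 1) for the hit bin, -1 for every other bin.
        for i in range(num_bins):
            registers[i] += (num_bins - 1) if i == hit else -1
    return registers

widths = [33.0, 30.0, 36.0, 33.0]     # hypothetical bin delays in ps
hits = 200_000
regs = balanced_mean_test(widths, hits)
ideal = sum(widths) / len(widths)
for w, f in zip(widths, regs):
    print(f"true DNL = {w / ideal - 1:+.3f}, estimated F_i/H = {f / hits:+.3f}")
```

Because every hit adds (R − 1) to one register and removes one from each of the others, the registers always sum to zero, and each register divided by the number of hits converges to the DNL of its bin.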
To ensure proper operation, a TDC with nonlinearity and the calibration algorithm are modeled in MATLAB. Later, a Verilog-AMS model of the TDC is used while the calibration algorithm is coded in Verilog HDL. The simulation shows the effectiveness of the algorithm whenever the DNL is in the range -7 ps to +8 ps. Any nonlinearities outside this range will saturate the correction at the appropriate limits. Fig. 3.19 shows the delay of the coarse TDC inverters before and after calibration using the balanced mean code density test. After calibration with infinite precision (i.e. using floating point computation), the peak-to-peak error is 1.20 ps with 0.34 ps standard deviation. Fig. 3.20 shows a realistic scenario where the correction step is assumed to be ±1 ps, in accordance with the transistor implementation in Fig. 3.16. In this case, the peak-to-peak error becomes 1.85 ps with 0.56 ps standard deviation. Choosing a finer correction step would reduce the peak-to-peak error.
Figure 3.18: Generating random time errors that are uniformly distributed to calibrate the coarse TDC (time error in ps versus calibration cycle). Sampling a 333.333 MHz calibration clock using a 20 MHz reference will produce uniform time error within ±1500 ps.
Figure 3.19: The delay of TDC inverters (buffer delay in ps versus TDC bin) before (blue triangle points) and after (red square points) calibration using a balanced code density test with fine delay correction (floating point precision). After calibration, the delay mean = 33.345 ps, std = 0.335 ps, and peak-peak error = 1.203 ps.
Figure 3.20: The delay of TDC inverters (buffer delay in ps versus TDC bin) before (blue triangle points) and after (red square points) calibration using balanced mean with 1 ps correction step (fixed point precision). After calibration, the delay mean = 32.603 ps, std = 0.559 ps, and peak-peak error = 1.854 ps.
3.6 Clock domain synchronization
There are two asynchronous clock domains in the DPLL, fref and fout. During frequency acquisition, their edge relationship is not known, and during phase lock, the edges will exhibit rotation if the fractional part of FCW is nonzero [8]. This makes DPLL operation vulnerable to latch metastability, which can cause the DPLL to fail from time to time. This reliability issue is characterized by the mean time between failures (MTBF). Digital designers usually solve the problem of asynchronous clock domains by re-timing the lower frequency clock, fref, with the higher frequency clock, fout.
Usually, one flip-flop (FF) is not enough to achieve this, and a series of FFs is used for the synchronizing process to increase the MTBF, which improves exponentially with the number of added FFs [15]. For example, a 2 GHz DPLL with a 20 MHz reference implemented in 0.25 µm technology has an MTBF of 4.3 seconds if only one retiming FF is used. Cascading two FFs will increase the MTBF to approximately 12 years. Recent measurements of synchronizer metastability show a degradation of MTBF with technology scaling, especially for extreme PVT conditions. Designing the same DPLL in 65 nm technology will reduce the MTBF to 50 ms when one retiming FF is used. The MTBF becomes 11 hours using two cascaded FFs and 110 thousand years using three cascaded FFs [32]. Accordingly, a minimum of two or three FFs must be used for reliable clock domain synchronization. Fig. 3.21 shows the implemented synchronization circuit where the reference clock, fref, is initially synchronized using the output clock, fout, and then by the divided down output clock, clk8, which is used to dither the DCO using a ∆Σ modulator. Finally, note that the retiming process generates a fractional phase error which is estimated and corrected by the TDC.
Figure 3.21: Clock synchronization of the reference clock, fref, using the DCO clock, fout, and a divided down DCO clock, clk8. The DCO clock is divided by two using a custom-designed CML divider. The subsequent synchronization and the feedback phase counter are fully synthesized.
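The exponential dependence of MTBF on the number of retiming flip-flops can be made concrete with the commonly used synchronizer model in which MTBF is proportional to exp(t_r/τ), where t_r is the available resolution time and τ is the regeneration time constant of the latch. The sketch below is an illustration only: it assumes that each added flip-flop contributes one full 2 GHz clock period of resolution time, and it back-computes the τ implied by the two 0.25 µm figures quoted above. The function name is hypothetical and the model is not taken from the cited references.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def implied_tau_ps(mtbf_short_s, mtbf_long_s, extra_resolution_time_ps):
    """Regeneration time constant implied by two MTBF values.

    Uses MTBF proportional to exp(t_r / tau): the ratio of the two MTBFs
    equals exp(extra_resolution_time / tau).
    """
    return extra_resolution_time_ps / math.log(mtbf_long_s / mtbf_short_s)

# Assumption: each added flip-flop gives one full 2 GHz clock period (500 ps)
# of extra resolution time; setup and clock-to-Q delays are ignored.
tau = implied_tau_ps(4.3, 12 * SECONDS_PER_YEAR, 500.0)
print(f"implied tau ~ {tau:.1f} ps")

# Under the same model, a third flip-flop multiplies the MTBF by the same ratio.
ratio = (12 * SECONDS_PER_YEAR) / 4.3
print(f"three flip-flops -> about {12 * ratio:.2e} years")
```

The large multiplication per added stage is what makes a second or third flip-flop such an inexpensive reliability improvement.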
3.7 Implementation Details of the DPLL
The DPLL has been realized using synthesized Verilog code for the loop filter, normaliza-
tion algorithm, TDC calibration algorithm, a ∆Σ modulator (DSM), high-speed counter
and synchronization logic between the reference clock, output clock, and DSM. Other
blocks such as a CML divide by two, the DCO, and TDC were custom designed.
3.7.1 Digital Loop Filter (DLF)
After calibration and digital normalization of the TDC output, the digital phase and frequency error are passed to the digital loop filter (DLF). The DLF is controlled by an FSM that initially enables a proportional filter with a high gain of Kc that controls the coarse capacitor bank of the DCO to allow fast frequency locking. Then, the FSM freezes the coarse DCO inputs and switches to phase locking, where the DLF consists of a proportional path with gain kp and a delaying integral path with gain ki. Both kp and ki as well as Kc are programmable via a serial bus. The DLF is followed by an optional IIR filter with a programmable gain k. Fig. 3.22 shows a block diagram of the DLF. Finally, the digital output of the DLF is applied directly to an array of varactors in the DCO to control the output phase.
Figure 3.22: Implementation of the digital loop filter (diagram labels: Kp, Ki, Kc, gear shift, enFine, enIIR, IIR, 6 coarse bits, 24 fine bits). The coarse filter uses only proportional gain, Kc, to accomplish rough frequency lock. Then, it gets disabled such that a first order filter takes over to achieve phase lock. Gear shifting is used to accelerate the phase locking time. Finally, the IIR can be enabled to filter out high-frequency noise.
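A behavioral view of the phase-locking part of the loop filter can be sketched as a proportional path plus an integral path whose contribution is delayed by one sample. The class below is a hypothetical model, not the synthesized Verilog: the gain values, the gear_shift method, and the choice to accumulate ki times the error (so that a gain change does not disturb the stored frequency estimate) are all assumptions.

```python
class PILoopFilter:
    """Proportional-integral loop filter with a one-sample-delayed integral path."""

    def __init__(self, kp, ki):
        self.kp = kp
        self.ki = ki
        self.integral = 0.0  # accumulated ki * error from previous samples

    def gear_shift(self, kp, ki):
        """Switch to new gains while keeping the integrator state."""
        self.kp, self.ki = kp, ki

    def update(self, phase_error):
        # The integral path contributes the value accumulated before this sample.
        out = self.kp * phase_error + self.integral
        self.integral += self.ki * phase_error
        return out

dlf = PILoopFilter(kp=2**-2, ki=2**-6)       # hypothetical wide-bandwidth gains
wide = [dlf.update(1.0) for _ in range(4)]
dlf.gear_shift(kp=2**-4, ki=2**-10)          # hypothetical narrow-bandwidth gains
narrow = dlf.update(1.0)
print(wide, narrow)
```

Power-of-two gains are used here because they map onto shifts in hardware; starting with large gains and then reducing them is one common way to combine fast acquisition with low steady-state noise.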
3.7.2 DCO
The DCO is an LC-oscillator with digitally-controlled capacitors, as shown in Fig. 3.23. The LC tank includes two capacitor banks: coarse and fine. The coarse bank, as shown in Fig. 3.24, uses binary weighted Metal-Insulator-Metal (MiM) capacitors in a common centroid layout. It has 6 bits of resolution to cover the frequency range 1.99 to 2.50 GHz, resulting in a resolution of approximately 8.125 MHz. The fine capacitor banks are realized with MOS accumulation-mode varactors that digitally switch between low and high capacitance values, as shown in Fig. 3.25. The 24-bit fine capacitor banks are divided into a 6-bit integer part with 726 kHz resolution and an 18-bit fractional part. The 6 MSBs are binary weighted while the next 7 bits are thermometer encoded to reduce switching activities and to ensure monotonic behavior of the DCO. The unit-sized accumulation-mode varactors provide a frequency resolution of 11 kHz. The 11 LSBs are fed to a ∆Σ modulator running at a speed of fout/8 (hence in the range of 250 to 312 MHz) to achieve very fine frequency resolution. Fig. 3.26 shows a summary of the DCO high-level configuration. Note that the fine capacitor bank is designed to have enough range to provide at least 50% overlap between adjacent coarse bank settings and ensure all frequencies are covered.
Figure 3.23: LC-DCO with two banks of tuning (schematic labels: transistors M1 to M4, inductor L, capacitor C, Cfine and Ccoarse banks, decoder inputs dC[5:0] and dF[23:0]). The coarse tuning is implemented using MiM capacitors while the fine tuning is achieved by using MOS varactors.
Figure 3.24: Coarse frequency tuning using Metal-Insulator-Metal (MiM) capacitors (labels: OutP, OutN, 61 fF capacitors, control bits C[0], C[1], ..., C[5] with weights 1x, 2x, ..., 32x).
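The partitioning of the 24-bit fine word described above (6 integer bits, 7 thermometer-coded bits, 11 bits for the ∆Σ modulator) can be sketched as a simple bit-slicing function. The function name is hypothetical, and the thermometer code is modeled as a flat list of 127 lines for simplicity, whereas the chip uses a row and column matrix.

```python
def split_fine_word(word):
    """Split a 24-bit fine tuning word into its three fields.

    Returns (integer_bits, thermometer_lines, dsm_input):
      integer_bits      - 6 MSBs, binary weighted
      thermometer_lines - list of 127 on/off values decoded from the next 7 bits
      dsm_input         - 11 LSBs sent to the delta-sigma modulator
    """
    if not 0 <= word < (1 << 24):
        raise ValueError("fine word must fit in 24 bits")
    integer_bits = word >> 18
    therm_code = (word >> 11) & 0x7F
    dsm_input = word & 0x7FF
    # Thermometer code: the first `therm_code` lines are on, the rest are off.
    thermometer_lines = [1 if i < therm_code else 0 for i in range(127)]
    return integer_bits, thermometer_lines, dsm_input

example = (3 << 18) | (5 << 11) | 7
integer_bits, lines, dsm_input = split_fine_word(example)
print(integer_bits, sum(lines), dsm_input)
```

With a thermometer code, incrementing the 7-bit field by one only ever switches a single additional unit varactor on, which is what keeps the tuning curve monotonic.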
Figure 3.25: Fine frequency tuning using MOSFET varactors (labels: OutP, OutN; fractional fine bits with a 7-bit Σ∆ bank (7x1) and 7-bit fractional column and 15-bit fractional row banks (15x16); integer fine bits f[0], f[1], ..., f[5] with sizes 1x64, 2x64, ..., 32x64). The frequency tuning is defined by the difference of MOS capacitance between the ON and OFF state.
Figure 3.26: Illustration of the DCO control bits and gains. There are six coarse bits with an average gain of 8125 kHz/code. Also, there are 24 fine bits that are further divided into 6 MSBs with an average gain of 726 kHz/code and 18 LSBs representing the fractional part of the frequency control word. The 18 fractional LSBs are decoded into a 7-bit thermometer-coded matrix (176 kHz/code and 11 kHz/code) and 11 bits provided to the ∆Σ modulator (dithered 11 kHz/code) to achieve a frequency resolution finer than what a minimum size MOS varactor can achieve in a particular process.
The DCO introduces quantization error because it changes its output frequency in discrete steps that introduce spurious tones at offset frequencies beyond the loop bandwidth. The ∆Σ modulator shapes the quantization noise of the DCO to high offset frequencies and achieves fine frequency control. The ∆Σ modulator is implemented with a reduced complexity MASH-1-1-1 architecture [33], with each succeeding stage having shrinking accuracy and area, as shown in Fig. 3.27. The first stage of the ∆Σ modulator is the most important one, so 11-bit registers are used there. The second stage uses only 8-bit registers while the last stage uses only 5-bit registers.
Figure 3.27: Implementation of the third-order reduced complexity ∆Σ modulator [33] (labels: input Frac[10:0], carry outputs C[0], C[1], C[2], stage widths of 11, 8, and 5 bits, combiner output T[7:1]). The first stage has higher computational resolution compared with the following stages to reduce power and complexity and to meet the timing requirement during synthesis.
The output clock of the DCO is divided by two using a CML static divider. The CML output is AC coupled before passing through a pseudo-differential CMOS buffer. After that CML-to-CMOS stage, the half-rate clock is fed to a synthesizable CMOS divider and a counter.
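The behavior of the MASH-1-1-1 modulator mentioned above can be illustrated with a textbook accumulator-based model. This sketch uses full 11-bit accumulators in all three stages, which is an assumption made for simplicity; it does not model the reduced 8-bit and 5-bit stages of the implemented design, and the function name is illustrative.

```python
def mash111(frac, n_samples, bits=11):
    """Textbook MASH-1-1-1 modulator built from three overflowing accumulators.

    `frac` is the fractional input (0 <= frac < 2**bits). Each output sample is
    an integer between -3 and +4 whose long-term average is frac / 2**bits.
    """
    mask = (1 << bits) - 1
    acc1 = acc2 = acc3 = 0
    c2_prev = c3_prev = c3_prev2 = 0
    out = []
    for _ in range(n_samples):
        s1 = acc1 + frac
        c1, acc1 = s1 >> bits, s1 & mask      # stage 1 carry and residue
        s2 = acc2 + acc1
        c2, acc2 = s2 >> bits, s2 & mask      # stage 2 processes the stage 1 residue
        s3 = acc3 + acc2
        c3, acc3 = s3 >> bits, s3 & mask      # stage 3 processes the stage 2 residue
        # Combiner: c1 + (1 - z^-1) c2 + (1 - z^-1)^2 c3
        out.append(c1 + (c2 - c2_prev) + (c3 - 2 * c3_prev + c3_prev2))
        c2_prev, c3_prev2, c3_prev = c2, c3_prev, c3
    return out

samples = mash111(frac=700, n_samples=2048)
print(f"mean output = {sum(samples) / len(samples):.4f}, target = {700 / 2048:.4f}")
```

The eight possible output levels (−3 to +4) correspond to switching between zero and seven unit elements, which is consistent with a seven-line combiner output such as T[7:1].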
3.8 Measurement Results
This section presents the test setup and measurement results of the fabricated test chip. A prototype was fabricated in 0.13 µm CMOS technology from IBM (acquired by Global Foundries Inc. in 2015). The fabrication process includes 8 metal layers and high-density MiM capacitors. Analog power and ground lines are separated from digital power and ground to minimize noise coupling from digital circuits to analog blocks. The active area of the proposed DPLL is 0.43 mm², of which 0.273 mm² is digital circuitry including the DPLL core, the calibration algorithm, and the fine stochastic TDC. A die photo of the fabricated prototype is shown in Fig. 3.28. Note that the active area of the digital circuitry will be drastically reduced in newer processes. The active area scales down by a factor of 0.5 when moving from one technology node to the next [34]. There are four and a half nodes from 130 nm to 28 nm (28 nm is a half node from 32 nm), which means that the digital active area can be reduced by a factor of 2^4.5 ≈ 22.6 to become 0.012 mm² in 28 nm rather than 0.273 mm² in 130 nm technology.
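The area projection above is a simple power-of-two scaling, reproduced in the sketch below. The node list is an assumption about which full nodes lie between 130 nm and 32 nm, and the 0.5 factor per node is the rule of thumb quoted in the text rather than a measured value.

```python
NODES_NM = [130, 90, 65, 45, 32]   # full nodes, each assumed to halve digital area
HALF_NODE_NM = 28                   # counted as half a node beyond 32 nm

def scaled_area_mm2(area_mm2, full_steps, half_steps=0):
    """Scale an area assuming a factor of 0.5 per full node step."""
    return area_mm2 * 0.5 ** (full_steps + 0.5 * half_steps)

full_steps = len(NODES_NM) - 1
area_28nm = scaled_area_mm2(0.273, full_steps, half_steps=1)
print(f"reduction factor = {0.273 / area_28nm:.1f}, "
      f"area at {HALF_NODE_NM} nm = {area_28nm:.4f} mm^2")
```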
Figure 3.28: Die photo of the DPLL chip in an IBM (now GF) 130 nm bulk CMOS process (active area is 0.43 mm²). Labeled block areas: DCO 157500 µm², TDC 27500 µm², digital logic 145000 µm², MASH 30000 µm², calibration logic 70000 µm².
3.8.1 PCB and Test Setup
The DPLL runs from a temperature-controlled 20 MHz reference clock with a phase noise of approximately -143 dBm/Hz at 100 Hz and -158 dBm/Hz at 10 MHz (equivalent to 143 fs RMS or 1.03e-3 degree RMS), and it has a very small aging and temperature stability factor on the order of 10^-8. Also, an external signal generator was sometimes used when a reference frequency other than 20 MHz was required, including to calibrate the coarse TDC. The DPLL is mounted on a custom 4-layer printed circuit board (PCB), as shown in Fig. 3.29(c). The DPLL can be configured for different bandwidth and operation settings via serial shift registers. A small form factor FPGA board (DE0 nano from Terasic, shown in Fig. 3.29(b)) is employed to shift the configuration data into the DPLL. Furthermore, the DPLL and FPGA boards are connected to another PCB that is shown in Fig. 3.29(a). This main PCB provides biasing and regulated power to the DPLL board, and translates the voltage levels of the FPGA control signals to the DPLL board. This modular PCB structure simplifies testing and shortens design time since only the daughter board must be redesigned to test future designs like the one presented in Chapter 5.
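The equivalence between the reference jitter in femtoseconds and in degrees quoted above follows from expressing the jitter as a fraction of one clock period. A minimal sketch, with an illustrative function name:

```python
def jitter_to_degrees(jitter_s, clock_hz):
    """Express an RMS time jitter as RMS phase in degrees of one clock period."""
    return jitter_s * clock_hz * 360.0

ref_deg = jitter_to_degrees(143e-15, 20e6)
print(f"{ref_deg:.2e} degrees RMS")
```

The same function can be applied to the 2 GHz output clock, where a given time jitter corresponds to a phase error one hundred times larger than at 20 MHz.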
Many devices and tools were used during testing. A PC ran Altera Quartus to program the FPGA and DPLL, MATLAB to characterize the DCO, and KE5FX software to capture the output of an HP 8565C spectrum analyzer. A Tektronix RSA6114A spectrum analyzer and an Agilent E4448A high-performance PSA spectrum analyzer were also used. An illustrative diagram of the test setup is shown in Fig. 3.30. The HP 8565C spectrum analyzer has the highest noise contribution among the instruments used, around 356 fs RMS jitter for a 2 GHz output clock. It has a noise sideband of -117 dBc/Hz at 100 kHz, -133 dBc/Hz at 1 MHz offset, and -142 dBc/Hz at 1 GHz. The Tektronix RSA6114A spectrum analyzer can detect an average noise level of -151 dBm at 1 GHz offset and contributes around 236 fs RMS for a 2 GHz clock, which is better than the HP analyzer. Finally, the Agilent E4448A high-performance spectrum analyzer contributes only 103 fs RMS for a 2 GHz output clock and was used for fine jitter measurements. It can detect an average noise level of -137 dBm at 100 kHz, -145 dBm at 1 MHz, and -152 dBm at 1 GHz.
Figure 3.29: PCBs used for powering, biasing, and programming the DPLL chip. (a) Main board for powering, biasing, and interfacing with the daughter board where the DUT is mounted. (b) FPGA DE0 nano board for programming and controlling the DUT. (c) Daughter board where the DPLL chip is mounted.
Figure 3.30: Block diagram of the test setup (blocks: PC running Altera Quartus, KE5FX tool, and MATLAB; USB Blaster circuit, Altera Cyclone IV FPGA, switch, LED, and 40-pin header on the DE0 nano board; 3.3 V to 1.0 V level shift, low noise regulators, and biasing currents on the main board; DPLL DUT on the daughter board; 20 MHz reference; Agilent E4448A, HP 8565C, and Tektronix RSA6114A spectrum analyzers). The FPGA on the DE0 nano board, which controls the DPLL chip, is programmed via a PC using Altera Quartus. KE5FX software runs on the PC and can capture the spectrum of a particular clock using an HP 8565C spectrum analyzer.
3.8.2 Results
Fig. 3.31 shows the open loop test measurements using serially shifted DCO control
words from the on-board FPGA. The coarse DCO bank gain, Kcdco, is 8.125 MHz/code
on average while the fine DCO bank gain, Kfdco, is around 726 kHz/code on average.
The fine DCO bank has a fractional part with 11 kHz/code resolution on average. The
DPLL can lock to any frequency between 1.99 and 2.5 GHz from a nominal reference of
20 MHz.
Figure 3.31: DCO gain measurements. (a) Coarse DCO gain = 8125 kHz/code (sweeping only the coarse DCO control word); DCO frequency in GHz versus coarse digital code, showing the measured DCO characteristics and a DCO model with Kvco = 8.125 MHz/code. (b) Fine DCO gain = 726 kHz/code (sweeping both the coarse and fine DCO control word).
Figure 3.32: The differential output clock captured using Tektronix RSA6114A. The differential peak-to-peak voltage is 370 mV for a 2 GHz clock.
Figure 3.33: Spectrum of the output clock, captured by HP8565C spectrum analyzer and KE5FX tool, when the reference clock is frequency modulated.
The DCO output clock is buffered through a four-stage differential CML buffer (see
Appendix A for schematics) that consumes 34 mW and captured by a Tektronix RSA
6114A real-time spectrum analyzer revealing a 370 mV peak-to-peak amplitude., as
shown in Fig. 3.32. Fig. 3.33 shows the spectrum of the output clock, captured by a
HP8565C spectrum analyzer, when the reference clock is frequency modulated. Fig. 3.34
compares the spectrum of the DPLL output based on Verilog-A simulation vs. mea-
surement captured by HP8565C spectrum analyzer. Note that the estimated spectrum
from the Verilog-A simulation is comparable to the measured spectrum for the frequency
range from 100 kHz to 20 MHz. The noise floor of the measured DPLL is higher than the
expected noise level based on Spectre’s simulation which was modeled in the Verilog-A.
Fig. 3.35, Fig. 3.36, and Fig. 3.37 show the fractional spurs for various fractional
values without TDC calibration. For the 2012.50 MHz clock, shown in Fig. 3.35, there is an
in-band spur of -53.42 dBc at 867 kHz offset. The worst spur, -44.86 dBc, appears
at 3.133 MHz offset. For the 2003.125 MHz clock, shown in Fig. 3.36, there is a -34.05 dBc
spur at 775 kHz and a -49.30 dBc spur at 1.575 MHz. For a larger fractional value, when the
output clock is 2006.250 MHz, shown in Fig. 3.37, the worst spur of -39.29 dBc shows up
at 1.575 MHz.
Figure 3.34: Verilog-A simulation vs. measurement captured by an HP8565C spectrum analyzer and the KE5FX tool. [Plot: phase noise (dBc/Hz) versus offset frequency (Hz); traces: Simulation, Measurement.]
Figure 3.35: Spurs spectrum at 2012.5 MHz measured using an Agilent E4448A spectrum analyzer before calibration.
Figure 3.36: Spurs spectrum at 2003.125 MHz measured using a Tektronix RSA 6114A real-time spectrum analyzer before calibration.
Figure 3.37: Spurs spectrum at 2006.250 MHz measured using a Tektronix RSA 6114A real-time spectrum analyzer before calibration.
(a) Before calibration and high frequency filtering
(b) After calibration and high frequency filtering
Figure 3.38: Spurs spectrum at 1995.0 MHz measured using Tektronix RSA 6114A real-time spectrum analyzer.
(a) Before calibration and high frequency filtering
(b) After calibration and high frequency filtering
Figure 3.39: Spurs spectrum at 2185.0 MHz measured using Tektronix RSA 6114A real-time spectrum analyzer.
Fig. 3.38 shows spectrum measurements in which calibration reduces the spur at 2.65 MHz
offset from the 1995 MHz carrier from -54.27 dBc to -70.55 dBc. Spurs at a larger offset
of 5.25 MHz are reduced by 34.2 dB thanks to the IIR filter in the DLF, which is enabled
to filter high-frequency noise. Fig. 3.39 shows spectrum measurements in which
calibration reduces the spur at 2.65 MHz offset from the 2185 MHz carrier from -60 dBc
to -70 dBc.
Fig. 3.40 shows phase noise measurements at a 2 GHz DPLL output frequency and a 20
MHz reference clock, using an HP8565C spectrum analyzer and the KE5FX tool, with and
without the fine TDC activated. With the fine stochastic TDC disabled, the in-band
phase noise is no better than -83.7 dBc/Hz. Once the fine TDC is enabled, the in-band
phase noise drops to -104.3 dBc/Hz, an improvement of 20.6 dB for only 3 mW of
additional power consumption in the fine stochastic TDC. The loop bandwidth is
approximately 1.42 MHz while the integrated random jitter is 697 fs rms (0.502 degree).
When the loop bandwidth is reduced to 700 kHz, the integrated random jitter becomes
213 fs rms (0.153 degree), integrated from 1 kHz to 100 MHz.
Figure 3.40: Phase noise measurement of the 2 GHz clock using an HP8565C analyzer with (red) and without (blue) the fine TDC. The reference clock is a 20 MHz temperature-controlled oscillator. [Plot: phase noise (dBc/Hz) versus offset frequency; traces: Without fine TDC, With fine TDC, Eq. 2.13.]
Figure 3.41: The random jitter measurement of the output clock when the fine stochastic TDC is activated. [Histogram: number of occurrences versus TIE jitter, from -4 ps to 4 ps.]
A phase noise measurement, using an Agilent E4448A spectrum analyzer, with the
coarse-TDC calibrated and fine stochastic TDC activated is shown in Fig. 3.42 for a 2.4-
GHz output frequency. The in-band phase noise is -107 dBc/Hz while the jitter is 500 fs
RMS (0.432 degree) integrated from 1 kHz to 100 MHz for a loop bandwidth of 1.42
MHz. Furthermore, the phase noise is -116 dBc/Hz at 2 MHz offset and -137 dBc/Hz
at 19 MHz offset. The random jitter reported by a 25-GS/s real-time oscilloscope was
approximately 50% higher than from the phase noise analyzer, perhaps because some
small fractional spurs are interpreted by the oscilloscope as random jitter.
A phase noise measurement with the coarse TDC calibrated, fine stochastic TDC
activated and IIR filter invoked is shown in Fig. 3.43 for a 1.995 GHz output frequency.
The in-band phase noise is -104 dBc/Hz while the jitter is 233 fs rms (0.167 degree)
integrated from 1 kHz to 100 MHz.
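The jitter figures above are quoted both in femtoseconds and in degrees of the carrier period. The conversion is a one-line calculation; the sketch below (function name hypothetical) reproduces the four pairs quoted in this section.

```python
def jitter_fs_to_degrees(jitter_fs, carrier_hz):
    """Convert rms jitter in femtoseconds to degrees of one carrier period."""
    return jitter_fs * 1e-15 * carrier_hz * 360.0


# (jitter in fs rms, carrier frequency in Hz, value in degrees quoted in the text)
quoted = [
    (697.0, 2.0e9, 0.502),
    (213.0, 2.0e9, 0.153),
    (500.0, 2.4e9, 0.432),
    (233.0, 1.995e9, 0.167),
]

for jitter_fs, carrier_hz, quoted_deg in quoted:
    deg = jitter_fs_to_degrees(jitter_fs, carrier_hz)
    print(f"{jitter_fs:.0f} fs at {carrier_hz / 1e9:.3f} GHz = {deg:.3f} degree "
          f"(quoted {quoted_deg})")
```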
Fractional operation was also confirmed at several other frequencies. For example,
Fig. 3.44(a) shows the output for a reference frequency fref = 21 MHz at a synthesized
channel of (95 + 67/256)fref = 95.26171875fref = 2.000496094 GHz. As another example,
Fig. 3.44(b) shows the output for a reference frequency fref = 20 MHz at a synthesized
channel of (109 + 64/256)fref = 109.25fref = 2.185 GHz. The results reveal less than
1 ppm frequency error. Moreover, with a reference frequency of 20 MHz and a loop
bandwidth of 1.4 MHz, jitter was measured at four fractional channels between 120 and
121, all exhibiting random jitter within 20% of that observed at an integer channel of 120.
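The two fractional channels can be checked exactly with rational arithmetic. This is a minimal sketch; the 8-bit fractional word is an assumption inferred from the /256 denominators in the text, and the function names are hypothetical.

```python
from fractions import Fraction


def synthesized_frequency_hz(f_ref_hz, integer_part, frac_numerator, frac_bits=8):
    """Return (FCW, f_out) for FCW = integer_part + frac_numerator / 2**frac_bits."""
    fcw = integer_part + Fraction(frac_numerator, 2 ** frac_bits)
    return fcw, fcw * f_ref_hz


def error_budget_hz(f_out_hz, ppm=1):
    """Absolute frequency error corresponding to a given ppm bound."""
    return f_out_hz * ppm / 1_000_000


fcw_a, f_a = synthesized_frequency_hz(21_000_000, 95, 67)
fcw_b, f_b = synthesized_frequency_hz(20_000_000, 109, 64)

print(float(fcw_a), float(f_a))   # channel of Fig. 3.44(a)
print(float(fcw_b), float(f_b))   # channel of Fig. 3.44(b)
print(float(error_budget_hz(f_a)), "Hz corresponds to 1 ppm at channel (a)")
```

At these carrier frequencies, the reported error of less than 1 ppm corresponds to an absolute error below roughly 2 kHz.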
Figure 3.42: DPLL output phase noise spectrum at 2.4 GHz captured by an Agilent E4448A spectrum analyzer. The in-band noise is -107 dBc/Hz while the integrated jitter is 500 fs rms (0.43 degree) from 1 kHz to 100 MHz for a loop bandwidth of 1.42 MHz.
Figure 3.43: DPLL output phase noise spectrum at 1.995 GHz captured by an Agilent E4448A spectrum analyzer. The in-band noise is -104 dBc/Hz while the integrated jitter is 233 fs rms from 1 kHz to 100 MHz for a loop bandwidth of 700 kHz. An IIR filter was used to attenuate high-frequency spurs.
Figure 3.44: Fractional synthesis measurements using an HP8565C analyzer with (a) a 21 MHz input reference at channel 95 + 67/256 and (b) a 20 MHz input reference at channel 109 + 64/256, exhibiting less than 1 ppm frequency error.
Table 3.1 summarizes state-of-the-art TDC architectures and performances, while
Table 3.2 shows a comparison among state-of-the-art published digital frequency
synthesizers. For fair comparison of DPLLs with different reference and carrier
frequencies, all in-band phase noises are normalized using Banerjee’s figure of merit
(BFM) [35]. It is defined as
$\mathrm{BFM} = \mathrm{PN} - 20\log_{10}(f_{out}) + 10\log_{10}(f_{ref})$,
where BFM is the normalized phase noise and PN is the measured in-band phase noise at
low offset frequencies. Note that the BFM does not take into account the dissipated
power or the loop bandwidth. The DPLLs presented in [36][37][22] have large power
consumption mainly due to their DCOs, which have very low out-of-band phase noise. To
account for this, Gao’s figure of merit (GFM) is used to compare the DPLL performance
based on the total integrated random jitter and the power dissipation. It is defined as
$\mathrm{GFM} = 10\log_{10}\left[(\sigma_{t,PLL}/1\,\mathrm{s})^2 \cdot P_{PLL}/1\,\mathrm{mW}\right]$
[38]. The presented coarse-fine DPLL consumes 15.2 mW at 2.4 GHz. The DCO and
CML divide-by-two consume 7.8 mW, the coarse TDC consumes 1.4 mW, the fine TDC
consumes 3 mW, and the remaining standard-cell digital logic dissipates 3 mW.
Accordingly, the total power consumption of the proposed coarse-fine TDC is 4.4 mW from
a 1.2 V supply voltage. This is quite low compared to other published fine-resolution
TDC architectures. For example, the coarse-fine TDC based on a time amplifier in [25]
consumes 70 mW and the GRO in [12] consumes 2.2 to 21 mW depending upon the phase
error. Also, note that the power consumption reported here is for 130 nm CMOS
technology, and it can be lowered significantly in newer processes. The power supply
reduces to 0.9 V in 28 nm compared to 1.2 V in the 130 nm process, while the parasitic
capacitance reduces by almost a factor of 5 due to scaling over four and a half nodes
[34]. Recall that the dynamic power of a digital circuit is given by
$P = C V_{DD}^2 f$; based on these estimates, the dynamic power consumption in 28 nm is
$(0.9/1.2)^2/5 = 0.1125$ times that in the 130 nm process. The LC-DCO and CML logic
will only be affected by the power supply reduction. Accordingly, if the presented DPLL
is ported to 28 nm technology, the estimated power consumption is
$(0.9/1.2) \cdot 7.8 + 0.1125 \cdot (1.4 + 3 + 3) = 6.68$ mW.
Further reduction of power consumption is expected, as 14 nm processes have been
available from Intel and Samsung since 2014 while TSMC will release a 10 nm technology
for volume production by the end of 2016 [39].
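The two figures of merit and the 28 nm power estimate can be reproduced directly from the numbers quoted for this work. The sketch below follows the definitions in the paragraph above; the function names are hypothetical, and the 574 fs jitter value is taken from the "This work" column of Table 3.2.

```python
import math


def banerjee_fom(pn_dbc_hz, f_out_hz, f_ref_hz):
    """Normalized in-band phase noise: PN - 20 log10(f_out) + 10 log10(f_ref)."""
    return pn_dbc_hz - 20.0 * math.log10(f_out_hz) + 10.0 * math.log10(f_ref_hz)


def gao_fom(sigma_t_s, power_mw):
    """Jitter-power figure of merit: 10 log10[(sigma_t / 1 s)^2 * P / 1 mW]."""
    return 10.0 * math.log10(sigma_t_s ** 2 * power_mw)


def scaled_power_mw(analog_mw, digital_mw, vdd_old=1.2, vdd_new=0.9, cap_ratio=5.0):
    """Power estimate after porting, using the scaling argument of the text.

    The LC-DCO and CML power is scaled by the supply ratio only; the digital
    power is scaled by (vdd_new / vdd_old)^2 / cap_ratio.
    """
    supply_ratio = vdd_new / vdd_old
    return supply_ratio * analog_mw + (supply_ratio ** 2 / cap_ratio) * digital_mw


bfm_1995 = banerjee_fom(-104.0, 1.995e9, 20e6)
bfm_2400 = banerjee_fom(-107.0, 2.4e9, 20e6)
gfm = gao_fom(574e-15, 15.2)
p_28nm = scaled_power_mw(analog_mw=7.8, digital_mw=1.4 + 3.0 + 3.0)

print(f"BFM at 1.995 GHz: {bfm_1995:.1f} dBc/Hz")
print(f"BFM at 2.4 GHz:   {bfm_2400:.1f} dBc/Hz")
print(f"GFM:              {gfm:.1f} dB")
print(f"28 nm estimate:   {p_28nm:.2f} mW")
```

The results agree with the "-217 to -222" normalized phase noise range and, to within about 0.1 dB, with the -232.9 dB entry listed for this work in Table 3.2.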
Table 3.1: State-of-the-art fine-resolution TDC

| Reference | Interpolative Line [23] | Cyclic Vernier [19] | Periodic Vernier [11] | Time Amp [25] | GRO [12] | 2D Vernier [21] | This work |
|---|---|---|---|---|---|---|---|
| Number of bits | 7 | 12 | 6 | 9* | 11 | 7 | 7⋆ |
| Effective resolution [ps] | 4.7 | 8 | 12 | 1.25 | 6 | 4.8 | 4 |
| INL [LSB] | 2.4 | N/A | 1.15 | 2 | N/A | 3.3 | N/A |
| DNL [LSB] | 1.2 | N/A | 1 | 0.8 | N/A | < 1 | N/A |
| Bandwidth [MHz] | 180 | 15 | 40 | 10 | 1 | 50 | 20 |
| Area [mm²] | 0.02 | 0.26 | 0.04 | 0.06 | 0.04 | 0.08 | 0.028 |
| Supply voltage [V] | 1.2 | 1.5 | 1.2 | 1 | 1.5 | 1.2 | 1.2 |
| Power [mW] | 3.6 | 7.5 | 2.5 | 70 | 2.2 to 21 | 1.7 | 4.4 |
| Technology [nm] | 90 | 130 | 120 | 90 | 130 | 65 | 130 |

* Coarse-fine TDC with 5-bit coarse TDC and 6-bit fine TDC. The effective number of bits is 9-bit.
⋆ Coarse-fine TDC with 5-bit coarse TDC and 5-bit fine TDC. The effective number of bits is 7-bit.
Table 3.2: Comparison Among Published Digital Synthesizers.

| Reference | Ravi VLSIC’10 [20] | Tonietto ESSCIRC’06 [11] | Weltin-Wu ISSCC’08 [40] | Hsu JSSC’08 [36] | Wang JSSC’09 [37] | Tokairin JSSC’10 [41] | Lee CICC’09 [42] | Temporiti JSSC’10 [43] | Vercesi JSSC’12 [22] | This work |
|---|---|---|---|---|---|---|---|---|---|---|
| Reference frequency [MHz] | 40 (2x) | 40 | 25 | 50 | 26 | 40 | 60 | 35 | 26 | 20 |
| Output frequency [GHz] | 5-6 | 2 | 3 | 3.67 | 3.6 | 2.5 | 3.96 | 3.5 | 1.8 | 1.99-2.5 |
| Bandwidth [MHz] | 0.5 | 3 | 1.2 | 0.5 | 0.10 | 0.5 | 0.3 | 3 | 1 | 0.7-1.42 |
| In-band phase noise [dBc/Hz] | -94 | -102 | -100 | -106 | -95 | -105 | -96 | -101 | -108 | -104 to -107 |
| Normalized phase noise [dBc/Hz] | -211 | -212 | -216 | -220 | -212 | -217 | -210 | -216 | -219 | -217 to -222 |
| In-band spurs [dBc] | -60 | -46 to -42 | -45 | -42 | -75 | N/A | -38 | -58 | -50 | -34 to -53.3* |
| Out-of-band phase noise [dBc/Hz] | -140 | -143 | N/A | -155 | -155 | -135 | -140 | -123 | -160 | -136 |
| at offset | 30 MHz | 20 MHz | N/A | 20 MHz | 40 MHz | 10 MHz | 20 MHz | 3 MHz | 20 MHz | 30 MHz |
| RMS jitter [fs] | 597 | 1672 | 2173⋆ | 204 | 364 | 591 | 682 | 1166⋆ | 138 | 574⋆ |
| Power [mW] | 50 | 15 | 9.5 | 46.7 | 60 | 9.7 | 9.6 | 9 | 41.6 | 15.2 |
| Gao’s FoM [dB] | -227.5 | -223.8 | -223.5 | -237.1 | -231.0 | -234.7 | -233.5 | -229.1 | -241 | -232.9 |
| Active area [mm²] | 1.2 | 0.8 | 0.4 | 0.95 | 0.85 | 0.37 | 0.34 | 0.44 | 0.7 | 0.43 |
| Technology [nm] | 90 | 130 | 65 | 130 | 130 | 90 | 90 | 65 | 55 | 130 |

⋆ Estimated based on the given phase noise.
* Out of bandwidth spurs reduced from -54 dBc to -70.55 dBc after calibration and high frequency filtering.
3.9 Conclusion
In summary, the performance of DPLLs is still in need of improvement, especially with
respect to spurs and phase noise performance in wide-bandwidth applications. Specifi-
cally, TDC quantization noise and non-linearity are major contributors to in-band phase
noise and spurs, respectively. Improving TDC resolution (quantization step) from 40 ps
to 4 ps can, ideally, improve in-band phase noise by 20 dB. However, achieving 4 ps
resolution in 130 nm CMOS is not an easy task. Also, enhancing the linearity of the
TDC reduces the folding of high-frequency phase noise to low-offset frequencies and re-
duces the spurious tone levels. Accordingly, efficient on-chip calibration algorithms are
essential.
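The 20 dB figure follows from the quadratic dependence of TDC quantization noise power on the step size. The sketch below uses the expression commonly quoted for TDC-limited in-band phase noise, (2π)²/12 · (Δt/Tout)² / fref; treating it as equivalent to the expression used elsewhere in this thesis is an assumption, and the function name is hypothetical.

```python
import math


def tdc_inband_phase_noise_dbc_hz(dt_tdc_s, f_out_hz, f_ref_hz):
    """In-band phase noise set by uniform TDC quantization noise (assumed model)."""
    step_in_periods = dt_tdc_s * f_out_hz          # dt_tdc / T_out
    noise = (2.0 * math.pi) ** 2 / 12.0 * step_in_periods ** 2 / f_ref_hz
    return 10.0 * math.log10(noise)


pn_40ps = tdc_inband_phase_noise_dbc_hz(40e-12, 2.0e9, 20e6)
pn_4ps = tdc_inband_phase_noise_dbc_hz(4e-12, 2.0e9, 20e6)

print(f"40 ps step: {pn_40ps:.1f} dBc/Hz")
print(f" 4 ps step: {pn_4ps:.1f} dBc/Hz")
print(f"improvement: {pn_40ps - pn_4ps:.1f} dB")
```

Whatever the constant in front, a tenfold reduction of the step gives 20·log10(10) = 20 dB, which is the ideal improvement stated above.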
A DPLL with a novel calibrated coarse-fine TDC was presented that is suitable for
modern wireless and wireline standards. The proposed DPLL achieves -104 to -107
dBc/Hz in-band phase noise, which is equivalent to 4 ps TDC resolution. The DPLL can
lock to any frequency from 1.99 to 2.5 GHz using a 20 MHz reference, while the loop
bandwidth is around 700 kHz to 1.42 MHz. The entire DPLL consumes 15.2 mW from a
1.2 V supply in IBM’s 0.13 µm bulk CMOS technology. The integrated random jitter from
1 kHz to 100 MHz is 0.167 degree for a 1.995 GHz carrier with 700 kHz bandwidth, 0.153
degree for a 2 GHz carrier with 700 kHz bandwidth, 0.502 degree for a 2 GHz carrier with
1.42 MHz bandwidth, and 0.432 degree for a 2.4 GHz carrier with 1.42 MHz bandwidth.
Chapter 4
Linearization of Digital PLL
4.1 Introduction
The Digital PLL (DPLL) analysis in the previous chapters depends on a linear model
which fails to explain and predict nonlinear behavior like frequency acquisition and limit
cycles. Unlike a linear system, the steady-state response of a nonlinear system is
dependent on the initial conditions. Hence, it is very important to examine the validity
of using a linear model to analyze a DPLL with many sources of nonlinearity, including
quantization and saturation.
Recall that a DPLL employs a time-to-digital converter (TDC) and a digitally-
controlled oscillator (DCO) as shown in Fig. 4.1. There are many sources of quantization
noise in DPLLs which have different effects on the purity and settling behavior of the
output clock. Mainly, the DCO quantization manifests as spurious tones outside the loop
bandwidth which get attenuated by the loop dynamics. However, the TDC quantization
noise limits the in-band phase noise and could generate in-band spurious tones either
related to the fractional value of the frequency control word (FCW) or due to the TDC
nonlinearity [8]. Not only does the TDC quantization error cause in-band spurs, but it
can also lead to a DPLL with unpredictable bandwidth and settling behavior which are
dependent on initial conditions like the initial phase error.
The TDC measures the phase difference between the output clock, fout, and the
reference clock, fref, where the phase difference is quantized with a limited resolution of
∆ttdc, resulting in a quantization error of tQ, as shown in Fig. 4.2. The estimated phase
difference is averaged and normalized to the instantaneous fout period and expressed as
a fixed-point number.
If the DPLL is operating as a fractional-N synthesizer, the quantization error
introduced by the TDC, tQ, may be approximated as white noise [12]. In other words, the
TDC quantization noise is scrambled over time due to the continuously changing phase
relationship between fout and fref, as shown in Fig. 4.3(a). The scrambling of the
quantization noise lowers the chance of limit cycle behavior due to TDC nonlinearities
and makes linear analysis of the DPLL more valid [44].
Figure 4.1: A digital PLL architecture for integer and fractional mode synthesis. [Block diagram labels: FCW, PHR, TDC, PHF, PHE, Digital Loop Filter (coarse and fine paths), DCO, fref(t), fout(t).]
Figure 4.2: Buffer delay line implementation of TDC: simplified schematic view (left); timing diagram (right). The raw Q[i] is a pseudo-thermal code to be converted into a normalized binary word representing the fractional phase error.
Figure 4.3: TDC output during frequency (below 30 µs) and phase acquisition for different DPLL operation modes. (a) Fractional mode: FCW = 120.01709. (b) Integer-N mode: FCW = 120. [Plots: TDC output (scaled to DCO period), from -0.5 to 0.5, versus time (µs), from 20 to 50.]
However, the impact of TDC quantization error is pronounced during integer-mode
operation [45]. During integer-mode operation, fout aligns itself with fref such
that their phase difference is small and will not be able to span most of the TDC range
once in lock, as shown in Fig. 4.3(b). Accordingly, there is no (or at least not enough)
scrambling of the TDC quantization noise and it becomes concentrated at low frequencies.
Hence, it cannot be adequately filtered by the DPLL’s dynamics, which causes strong
low-frequency spurs [44][46]. Similar behavior can also appear if the phase difference
has a periodic pattern due to a simple fractional part of the FCW like 1/2, 1/4, etc. In
this case, the limited resolution of the TDC has an effect similar to the classic
dead-zone behavior observed in analog phase detectors. The dead-zone has the effect of
periodically opening the loop and letting the phase drift, which shows up as
deterministic jitter in the output clock [14].
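The difference between the two modes can be illustrated with an idealized calculation of which TDC codes are visited once the loop is locked. The sketch below assumes a noiseless, perfectly locked loop (fout = FCW · fref exactly), a 32 ps TDC step, and an arbitrary initial phase of 0.3 DCO periods; the function name and these values are illustrative assumptions, not the behavioral model used for Fig. 4.3.

```python
from fractions import Fraction
from math import floor

F_REF_HZ = 20_000_000
DT_TDC_S = Fraction(32, 10 ** 12)      # assumed 32 ps TDC step


def tdc_codes(fcw, n_cycles=2000, phase0=Fraction(3, 10)):
    """TDC codes seen at successive reference edges in an ideal locked loop.

    At reference edge k the DCO has advanced fcw * k + phase0 periods; the
    fractional part of that phase is t_r / T_out, which the TDC quantizes
    in steps of dt_tdc / T_out.
    """
    bins_per_period = 1 / (fcw * F_REF_HZ * DT_TDC_S)   # T_out / dt_tdc
    codes = []
    for k in range(n_cycles):
        phase = fcw * k + phase0
        frac = phase - floor(phase)
        codes.append(floor(frac * bins_per_period))
    return codes


integer_codes = set(tdc_codes(Fraction(120)))
fractional_codes = set(tdc_codes(Fraction(12001709, 100000)))

print("integer-N mode visits", len(integer_codes), "TDC code(s)")
print("fractional mode visits", len(fractional_codes), "TDC codes")
```

In this idealized picture the integer-N loop sits on a single code, so the quantization error is never exercised, whereas the fractional channel sweeps through the whole TDC range.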
This chapter elaborates on the dead-zone behavior of a DPLL caused by the finite TDC
resolution, focusing on integer-N operation. Also, a simple, programmable, purely
digital solution to the dead-zone problem is presented that achieves consistently low
in-band phase noise regardless of the initial condition while maintaining high loop
bandwidth. This solution is scalable and is not affected by process, voltage, and
temperature (PVT) variations, while it ensures phase locking with minimal phase offset.
Figure 4.4: The DPLL nonlinearity due to TDC quantized response. [Plot: TDC output versus phase error (ps), a staircase with step ∆ttdc; an output phase on a flat part of the staircase leads to dead-zone behavior, an output phase at a step transition leads to bang-bang behavior.]
4.2 TDC Dead-Zone Behavior
Recall that a typical TDC is comprised of a chain of buffers with a resolution that ranges
from approximately 32-to-48 ps over PVT in a 0.13-µm CMOS process to 10-to-12 ps
in a 28-nm process [44]. The output clock, fout, propagates through this chain such
that many delayed versions of fout are sampled at the rising edge of the reference clock,
fref , as shown in Fig. 4.2. The TDC reads out the normalized time difference, tr/∆ttdc,
between the rising edge of fref and the previous rising edge of fout. The DPLL reacts to
the time-varying values of the TDC readout to keep the DPLL locked [47].
Due to the TDC’s staircase nonlinearity, different types of nonlinear behavior are
observable depending upon the relationship between the reference phase and the DCO
output phase in lock, as illustrated in Fig. 4.4. The DPLL will try to force the TDC
output to track the reference phase provided by the digital phase accumulator on the
left side of Fig. 4.1. In integer-N mode, the fractional part of the FCW is zero while
the accumulated reference phase (denoted as PHR or φref in Fig. 4.1) might have an
arbitrary fractional value depending upon the accumulator’s initial condition. The
phase difference estimated by the TDC, ε, tries to track the fractional part of PHR.
Accordingly, and since the fractional part of PHR is dependent on the initial
conditions, the TDC output will also be dependent on the initial conditions.
Furthermore, this dependency will lead to phase lock with an arbitrary offset
dependent on the initial condition.
Figure 4.5: Dead-zone behavior of Integer-N DPLL with a TDC resolution of 32 ps. (a) Explanation diagram. (b) PDF of the phase jitter from a behavioral simulation; the deterministic jitter is equal to the TDC resolution (32 ps). [Plot (b): density distribution versus phase jitter (ps), from -20 to 20.]
Figure 4.6: Bang-Bang behavior of Integer-N DPLL. (a) Explanation diagram. (b) PDF of the phase jitter from a behavioral simulation. [Plot (b): density distribution versus phase jitter (ps), from -8 to 8.]
When the phase error coincides with a flat part of the TDC staircase, the DPLL tries
to lock to a phase where the TDC has low effective gain and, hence, the DPLL has low
loop bandwidth. In this case, the phase error initially lies in the middle of a TDC
step, anywhere within the gray dead-zone region, as shown in Fig. 4.5(a). As long as it
lies within the gray dead-zone region, the TDC will produce the same output for a long
time (depending on the length of the dead-zone region, which is directly related to the
TDC resolution), and so no correction of the DCO phase is applied within the dead-zone
region. It takes a long time for fout edges to drift toward fref edges, such that the
DPLL appears as an open loop during the phase drift within the dead-zone.
Fig. 4.5 shows the worst possible dead-zone behavior when fout locks to fref with
a small frequency offset. Due to the unquantified nature of the TDC output during
integer-N mode, fout could drift in the wrong direction for a while (in case fout lies
at the far edge of the dead-zone region) before an error is detected and corrected by
forcing fout to drift in the opposite direction, as shown in Fig. 4.5(a); i.e., fout
needs to span one whole TDC step back and forth before the phase and frequency error is
detected and a correction is applied. Therefore, the DPLL locks fout to fref with high
deterministic jitter equal to the TDC resolution (equivalently, fout is frequency
modulated due to the dead-zone region and the high bandwidth of the DPLL). The
probability density function (PDF) in this case is the convolution of the deterministic
jitter and random jitter, as shown in Fig. 4.5(b). Consequently, a DPLL with dead-zone
behavior exhibits low-frequency spurs, as shown in Fig. 4.7(a), similar to analog PLLs
with a dead-zone.
Note that the TDC resolution as well as the speed of fout drifting toward fref
determine the frequency and amplitude of those spurs; the drift speed is directly
related to the DCO jitter and quantization noise, i.e., the DCO frequency resolution. A
jittery coarse DCO will have higher-frequency spurs with lower amplitude compared to a
low-jitter fine DCO when they encounter dead-zone operation. Furthermore, a low-jitter
fine DCO has a higher chance of experiencing dead-zone operation compared with a
high-jitter coarse DCO. A high-jitter DCO helps fout escape the dead-zone region faster
during phase locking.
Figure 4.7: Spectrum of TDC normalized output. (a) Appearance of low frequency spurs during dead-zone operation. (b) Noise shaped spectrum of TDC output during bang-bang operation. [Plots: spectrum (dB) versus offset frequency (Hz).]
On the other hand, if the rising edge of the output clock, fout, coincides with a
transition in the TDC staircase, the TDC will operate similarly to a bang-bang phase
detector, as illustrated in Fig. 4.6(a), where a TDC bin keeps toggling between 0 and 1
and produces a late or early phase indication without being able to quantify the value
of that phase difference. This happens when the initial phase difference between fout
and fref is small compared to the DCO time resolution and jitter such that fout edges
drift over time and
can be quickly detected and corrected as shown in Fig. 4.6(a). The TDC will stay active
bouncing back and forth at high frequency and so the TDC output will be filtered by the
loop dynamics. In this case, the PDF of the phase jitter follows a Gaussian distribution
as shown in Fig. 4.6(b).
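The contrast between the two jitter distributions can be reproduced numerically. The sketch below models the dead-zone case as a uniform deterministic jitter one TDC step wide (32 ps) convolved with Gaussian random jitter; the uniform shape and the 1.5 ps sigma are illustrative assumptions, not values taken from the behavioral simulation.

```python
import math


def gaussian_pdf(x, sigma):
    """Zero-mean Gaussian density."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def dead_zone_pdf(width_ps, sigma_ps, step_ps=0.125, span_ps=40.0):
    """Convolve a uniform density of the given width with a Gaussian.

    The uniform part is sampled at bin midpoints (midpoint rule), so the
    convolution integral becomes a plain sum over those sample points.
    """
    n_u = round(width_ps / step_ps)
    us = [-width_ps / 2 + (j + 0.5) * step_ps for j in range(n_u)]
    n_x = round(span_ps / step_ps)
    xs = [i * step_ps for i in range(-n_x, n_x + 1)]
    pdf = [
        sum(gaussian_pdf(x - u, sigma_ps) for u in us) * step_ps / width_ps
        for x in xs
    ]
    return xs, pdf


def area(pdf, step_ps=0.125):
    return sum(pdf) * step_ps


def rms(xs, pdf, step_ps=0.125):
    return math.sqrt(sum(x * x * p for x, p in zip(xs, pdf)) * step_ps)


TDC_STEP_PS = 32.0      # TDC resolution of Fig. 4.5
SIGMA_RJ_PS = 1.5       # hypothetical random jitter

xs, pdf = dead_zone_pdf(TDC_STEP_PS, SIGMA_RJ_PS)
total_rms = rms(xs, pdf)
expected_rms = math.sqrt(TDC_STEP_PS ** 2 / 12.0 + SIGMA_RJ_PS ** 2)

print(f"area under PDF: {area(pdf):.4f}")
print(f"rms jitter, dead-zone case: {total_rms:.2f} ps")
print(f"rms jitter, Gaussian only:  {SIGMA_RJ_PS:.2f} ps")
```

Because the variances add, the dead-zone case is dominated by the ∆ttdc²/12 term, whereas in bang-bang operation only the Gaussian part remains.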
During the bang-bang mode of operation, the DPLL will exhibit a loop bandwidth
that depends upon the instantaneous phase error as well as other noise sources in the loop
[48]. It also has the potential for limit-cycle behavior, again resulting in spurs. Based
on a non-linear analysis of bang-bang PLLs, the smaller the jitter on fref and fout, the
higher the loop gain and bandwidth [47].
The more serious of these problems observed is the dead-zone behavior. The dead-
zone behavior increases the spread of phase jitter and degrades the loop bandwidth due
to the degradation of the loop gain. The spread of phase jitter is determined by the DCO
jitter performance as well as its frequency resolution and more importantly by the TDC
resolution.
Figure 4.8: Phase noise of the same output clock for 60 different initial conditions for uncompensated Integer-mode DPLL. [Plot: phase noise (dBc/Hz) versus offset frequency (Hz).]
Fig. 4.8 shows the phase noise, based on simulation results, for the same integer-mode
DPLL for 60 different initial conditions illustrating how very different loop bandwidths
can result. The simulation environment employs a high level model of DCO phase noise
and reference noise as presented in [49]. The in-band phase noise varies from -60 dBc/Hz
to -100 dBc/Hz and the loop bandwidth changes by an order of magnitude.
4.2.1 Zero-Phase Restart (ZPR) Mechanism
The DPLL in Fig. 4.1 has a coarse control loop that works as a frequency lock loop (FLL)
to set the DCO frequency close to the desired frequency range. It also has a fine control
loop to achieve accurate frequency and phase locking. To avoid any discontinuities in
the DCO control word during gear shifting from coarse to fine operation, a zero-phase
restart (ZPR) mechanism [15] is used to zero-out the phase detector output and to
avoid disturbing the DCO after frequency locking. The ZPR resets the reference phase
accumulator, PHR, to an arbitrary value depending on the initial frequency error, as
shown in Fig. 4.9.
Figure 4.9: Zero-phase-restart (ZPR) triggered during the transition from coarse to fine locking mode. ZPR ensures a smooth transition from coarse to fine locking without disrupting the DCO [15]. [Diagram labels: PHR, N, PHF_int, PHF_frac, enFine.]
To avoid the worst-case TDC dead-zone operation (when the deterministic jitter equals
the TDC resolution) and to reduce the dependency of the DPLL response on the initial
frequency error during integer-N mode, the fractional part of the reference phase,
PHRfrac, can be set to zero. However, ZPR still modifies the integer part of PHR after
the gear shifting from
coarse to fine control mode.
Even with the TDC worst dead-zone operation avoided by zeroing of PHRfrac, bang-
bang-like operation can still result in inconsistent loop bandwidth and potentially spurs
due to limit cycle behavior. To alleviate the inconsistent behavior of an integer-mode
DPLL, fref edges can be randomized with respect to fout edges to ensure the TDC is
kept busy enough such that the quantization noise is scrambled over time.
4.3 Noise-Shaped Dithering
Recently published work [47] demonstrated an analog approach to avoid the dead-zone
behavior for low bandwidth DPLLs by randomizing the phase of the reference clock, fref .
In [47], the reference buffer is modified by adding 16 bias elements controlled by a dither-
ing sequence provided by a noise shaped ∆Σ modulator, as shown in Fig. 4.10. This
requires custom modification of the reference buffer and accurate sizing of the delay cir-
cuits. Furthermore, due to its analog nature, the effectiveness of this approach is affected
by the PVT variations and so calibration might be needed. Furthermore, mismatches
between the delay elements could reduce the usefulness of this approach. Moreover, this
approach allows fout to lock to fref with an arbitrary phase offset. Another DPLL imple-
mentation based on GRO-TDC [12] intrinsically scrambles the quantization noise with
first-order noise shaping. However, the GRO-TDC design is complex and it consumes
high power and a small dead-zone was still measured for some special cases.
Figure 4.10: Dithering the reference clock by using a ∆Σ modulator to control the programmable delay of an input clock buffer.
Figure 4.11: A typical circuit to estimate the phase error of a coarse TDC.
Alternatively, this work proposes to dither the normalized phase error, ε = 1 − tr/Tout
(which is estimated by a typical coarse TDC as shown in Fig. 4.11), by using purely
digital techniques. By observing the transient behavior of the normalized phase error,
ε, as well as its spectrum, I found that ε changes slowly and continually from 0 to 1
during dead-zone operation. The spectrum of ε exhibiting such behavior is shown in
Fig. 4.7(a), where a large spur of -33 dBc at 30 kHz offset is evident. Once a random
digital offset, generated by a 20-bit LFSR, is added to ε, as shown in Fig. 4.12(a),
the spurs disappear and the in-band spectrum drops to -60 dBc. This random offset
solution ensures consistent but sub-optimal DPLL performance.
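A random-offset dither of this kind can be sketched in a few lines. The text specifies only a 20-bit LFSR, a right shift, and an offset of 0.5; the feedback taps (bits 20 and 17, a commonly tabulated choice), the interpretation of the LFSR word as a fraction of a DCO period, and the modulo-1 wrap below are all assumptions made for illustration.

```python
LFSR_BITS = 20
LFSR_MASK = (1 << LFSR_BITS) - 1


def lfsr20_step(state):
    """Advance a 20-bit Fibonacci LFSR by one step (assumed taps: bits 20 and 17)."""
    feedback = ((state >> 19) ^ (state >> 16)) & 1
    return ((state << 1) | feedback) & LFSR_MASK


def dither_phase_error(eps, state, shift=4):
    """Add 0.5 plus a small pseudo-random offset to the normalized phase error.

    The LFSR word is right-shifted by `shift` and read as a fraction of a DCO
    period, so the random part lies in [0, 2**-shift). Returns the dithered
    phase error (wrapped to [0, 1)) and the new LFSR state.
    """
    state = lfsr20_step(state)
    offset = (state >> shift) / (1 << LFSR_BITS)
    return (eps + 0.5 + offset) % 1.0, state


# Usage: a phase error stuck at a constant value, as in dead-zone operation.
state = 1                      # any nonzero seed
stuck_eps = 0.2
dithered = []
for _ in range(10000):
    value, state = dither_phase_error(stuck_eps, state)
    dithered.append(value)

print(min(dithered), max(dithered))
```

With the input held constant, the dithered value spreads over a window of about 1/16 of a DCO period just above 0.7, so the loop no longer sees a frozen TDC reading.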
4.3.1 Implemented Noise-Shaped Dithering
Rather than dithering the phase error, ε, using a uniform random offset, it is better to
use a 10-bit third-order noise shaped ∆Σ modulator which would provide the required
dithering and linearize TDC response while adding minimal in-band noise to the DPLL.
A block diagram of the dithering circuit is shown in Fig. 4.12(b). It is sampled by fref
and only requires 230 digital gates for implementation in a 0.13 µm IBM CMOS process.
The offset is chosen to be 0.5 DCO period to ensure that the falling edge of fout is always
locked to the rising edge of fref at a phase difference around a step in the TDC response.
A small random offset is added to the ∆Σ modulator input, generated by LFSR, to ensure
acceptable noise shaping as well as to get rid of unwanted reference spurs. Finally, recall
that Fig. 4.8 shows various DPLL responses for the 60 simulations each with a different
initial condition. After applying the noise shaped dithering, the phase noise spectrum as
well as the loop bandwidth becomes consistent, as shown in Fig. 4.13.
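The text does not state which third-order ∆Σ architecture is implemented. As an illustration only, the sketch below uses a MASH 1-1-1 structure with 10-bit accumulators, a common all-digital choice; the class name and interface are hypothetical.

```python
class Mash111:
    """Third-order MASH 1-1-1 delta-sigma modulator with n-bit accumulators.

    Each stage is an accumulator whose carry-out is its 1-bit output; the
    residue of one stage feeds the next. The carries are combined as
    y = c1 + (1 - z^-1) c2 + (1 - z^-1)^2 c3, which shapes the quantization
    noise with a third-order high-pass characteristic.
    """

    def __init__(self, bits=10):
        self.modulus = 1 << bits
        self.acc = [0, 0, 0]
        self.c2_prev = 0
        self.c3_prev = 0
        self.c3_prev2 = 0

    def step(self, x):
        """Process one input sample x in [0, 2**bits) and return the output."""
        carries = []
        stage_in = x
        for i in range(3):
            total = self.acc[i] + stage_in
            carries.append(total // self.modulus)
            self.acc[i] = total % self.modulus
            stage_in = self.acc[i]
        c1, c2, c3 = carries
        y = c1 + (c2 - self.c2_prev) + (c3 - 2 * self.c3_prev + self.c3_prev2)
        self.c2_prev = c2
        self.c3_prev2 = self.c3_prev
        self.c3_prev = c3
        return y


# Usage: a constant input of 256/1024 should average to 0.25 at the output.
modulator = Mash111(bits=10)
outputs = [modulator.step(256) for _ in range(10240)]
mean_output = sum(outputs) / len(outputs)
print(mean_output, min(outputs), max(outputs))
```

With a constant input and zero initial state, a MASH modulator produces a short periodic pattern; this is the kind of tonal behavior that the small LFSR offset at the modulator input, mentioned above, is meant to break up.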
Figure 4.12: Digital dithering algorithm at the falling edge of the output clock (0.5 UI). (a) Random offset; ">>" indicates a right shift operation. (b) The implemented noise-shaped random offset in 0.13 µm IBM (now GF) CMOS process. [Diagram labels: (a) tr/Tout, 0.5, LFSR >> 4, dithered tr/Tout; (b) tr/Tout, 0.5, LFSR >> 5, ∆Σ modulator, dithered tr/Tout.]
Figure 4.13: Phase noise of the same output clock for 60 different initial conditions after applying noise shaped random offset and disabling the fractional part of ZPR. [Plot: phase noise (dBc/Hz) versus offset frequency (Hz).]
Figure 4.14: Time Interval Error (TIE) for the 60 simulations with different initial conditions: No dithering, Random dithering, ∗ Noise-shaped dithering. [Plot: RMS TIE jitter (degree) versus simulation number.]

| Mode of operation | RMS jitter (deg), Avg. | RMS jitter (deg), Dev. | PP jitter (deg), Avg. | PP jitter (deg), Dev. |
|---|---|---|---|---|
| (1) Fig. 4.11: Normal operation | 2.86 | 3.39 | 13.54 | 7.79 |
| (2) Fig. 4.11: ZPR off | 1.59 | 0.67 | 12.22 | 5.51 |
| (3) Fig. 4.12(a): ZPR off + LFSR | 1.38 | 0.06 | 10.77 | 0.78 |
| (4) Fig. 4.12(b): ZPR off + DSM | 0.92 | 0.04 | 7.68 | 0.37 |

Table 4.1: Summary of TIE rms and peak-to-peak jitter for the 60 different simulations.
Fig. 4.14 shows a plot of the time-interval error (TIE) for those simulations before and
after applying the dithering. The average RMS TIE of the 60 simulations after applying
the proposed noise-shaping offset is 0.92 degree with only 0.04 degree standard deviation.
Without dithering, the average RMS TIE is 1.59 degree with 0.67 degree deviation. Table
4.1 presents a summary of the average RMS TIE and peak-to-peak jitter along with their
standard deviations for those 60 simulations: (1) when the ZPR is enabled, (2) when
the fractional part of ZPR is disabled without dithering, (3) when the fractional part
of ZPR is disabled with random dithering, and (4) when the fractional part of ZPR is
disabled with noise shaped dithering. Note that disabling the fractional part of ZPR is
not enough to get a consistently low jitter clock, and so dithering is crucial to
guarantee a consistent response.
4.3.2 Improved Noise-Shaped Dithering
The implemented noise shaped random dithering around the falling edge of the output
clock, as shown in Fig. 4.12(b), is effective in linearizing the DPLL during integer mode
such that its response is consistent and independent of the initial conditions. However,
it can not be used if simple fractional channels like 1/2 or 1/4 is needed. An alternative
solution to linearize a DPLL that can work during integer-N mode and during simple
fractional-mode is shown in Fig. 4.15. The phase error is dithered by randomly adding
and subtracting various fractions of TDC resolution to the normalized phase error es-
timated by the coarse TDC over time to make sure DPLL is not stuck in a dead-zone.
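Mathematically,
$$ \frac{\tilde{t}_r}{T_{out}} = \frac{t_r}{T_{out}} \pm \left\{ \frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16} \right\} \cdot \frac{\Delta t_{tdc}}{T_{out}} $$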
Figure 4.15: A generic proposed circuit to generate a dithered phase error, tr/Tout, which can be applied to integer and simple fractional channel synthesis. [Block diagram labels: Fref, Fout, TDC, Q[0:M], 0-1 & 1-0 detector, 1/x, LFSR >> 5, ∆Σ modulator, +/- >>1, 2, 3, or 4.]
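The dithering rule in the expression above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `dither_phase_error`, the use of `random.Random` as a stand-in for the LFSR, and the uniform choice of sign and shift are hypothetical and are not taken from the implemented circuit.

```python
import random

# Dither steps as right shifts of the TDC resolution: 1/2, 1/4, 1/8, 1/16.
SHIFTS = (1, 2, 3, 4)


def dither_phase_error(tr_norm, dt_tdc_norm, rng):
    """Return a dithered normalized phase error.

    tr_norm     -- coarse-TDC phase error estimate, tr/Tout
    dt_tdc_norm -- normalized TDC resolution, dt_tdc/Tout
    rng         -- random.Random instance (assumed stand-in for the LFSR)
    """
    shift = rng.choice(SHIFTS)        # select 1/2, 1/4, 1/8, or 1/16
    sign = rng.choice((-1.0, 1.0))    # randomly add or subtract
    return tr_norm + sign * dt_tdc_norm / (1 << shift)


rng = random.Random(1)
samples = [dither_phase_error(0.25, 0.01, rng) for _ in range(8)]
print(samples)
```

Each call moves the estimate by at most half a TDC step, so the dither stays inside one coarse bin while still exercising the bin edges.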
4.4 Measurement Results
The same chip presented in the previous chapter, shown again in Fig. 4.16, is used
to demonstrate the dead-zone behavior and the ability of dithering to linearize the loop
during integer mode. However, the fine stochastic TDC in the DPLL prototype presented
in chapter 3 is disabled and instead the implemented third-order noise shaped dithering
algorithm shown in Fig. 4.12(b) is enabled. Fig. 4.17 shows phase noise measurement
results when the carrier is 2 GHz and the reference is 20 MHz, taken with an HP8565C spectrum
analyzer. For the same frequency and same loop settings, I captured different loop
responses by simply resetting the DPLL many times. Dead-zone operation is drawn in
blue while the medium activity TDC response is shown in green. Large in-band spurs
at 40 kHz and 80 kHz offset frequency are readily seen. The optimal performance of the
integer-mode DPLL after applying the noise shaped dithering algorithm shown in Fig.
4.12(b) is drawn in red. The average integrated RMS jitter is 1.25 ps for 10 different initial
conditions after applying the proposed dithering algorithm with a consistent DPLL loop
bandwidth of 700 kHz. Fig. 4.18 shows the measured jitter histogram during dead-zone
operation. The measurement was done using a Tektronix RSA 6114A real-time spectrum
analyzer. The extracted random jitter is only 896 fs RMS while the deterministic jitter
is 28.3 ps peak-to-peak which is comparable to the coarse TDC resolution.
Figure 4.16: Die photo of the DPLL chip in IBM 130 nm bulk process [8]. It is the same chip used to demonstrate the DPLL with a coarse-fine TDC in chapter 3. [Die photo labels: DCO 157500 µm², TDC 27500 µm², Digital Logic 145000 µm², MASH 30000 µm², Calibration Logic, 700000 µm².]
Figure 4.17: Phase noise measurement using HP8565C analyzer showing different behaviors of integer-mode DPLL. [Plot: phase noise (dBc/Hz, −140 to −50) versus offset frequency (10^3 to 10^8). Annotations: "Big spurs due to slow TDC response caused by the dead-zone"; "DCO edges drift across full TDC bin"; "DCO edges move slowly around TDC bins"; "Consistent low in-band phase noise after applying dithering".]
Figure 4.18: The measured jitter histogram during dead-zone operation. The extracted random jitter is 896 fs RMS while the deterministic jitter due to dead-zone operation is 28.3 ps peak-to-peak.
4.5 Conclusion
This chapter presents a detailed explanation of dead-zone behavior in DPLLs operated
in integer mode. It elaborates on the effect of dead-zone behavior on the phase noise
response and on the PDF of the output clock jitter. Based on that understanding, a
simple purely-digital dithering solution is also demonstrated to ensure the DPLL avoids
its dead-zones. The solution employs a third-order noise-shaping phase offset to lin-
earize the bang-bang behavior. The proposed solution ensures phase lock with minimum
offset. Extensive simulation results as well as a DPLL prototype achieve a consistent
low in-band noise operation regardless of the initial condition while maintaining high
loop bandwidth. Contrary to the reference clock dithering presented in [47], the pro-
posed dithering algorithm is scalable, purely digital, and it does not require modification
of the input reference buffer. Furthermore, the proposed algorithm is not affected by
impairments such as PVT variations, mismatch, and noise coupling on the power supply.
Chapter 5
Cycle-Slipping and Pull-In Range of
Bang-Bang PLLs
5.1 Introduction
PLLs employing a binary “bang-bang” phase detector (BBPD) have recently become
more commonly used due to their simplicity compared to PLLs with a linear phase
detector, allowing them to operate at the highest possible speed. Also, BBPDs are
preferred over TDCs for integer-mode digital PLLs due to their low power consumption
[25].
However, bang-bang PLLs in general suffer from three major drawbacks. First, the loop
gain and the associated loop characteristics are hard to define due to the highly non-linear
phase detector, which makes the effective gain dependent on the input jitter [50].
Second, they have a limited pull-in frequency range that usually does not exceed 10%
of the reference frequency. Third, there is a trade-off between pull-in frequency range
and jitter performance of bang-bang PLLs, as will be explained later [50].
There is a lot of literature dealing with the small-signal steady-state behavior of bang-
bang PLLs [51][52]. Those works focus on the jitter performance and stability of such
PLLs. The pull-in process during frequency acquisition of a bang-bang PLL is very non-
linear and a large signal analysis is needed. An early attempt to quantify pull-in range
of binary bang-bang PLLs was done by [53] where an asymptotic formula was derived for
the pull-in range under certain constraints. That analysis is only valid for lag-lead loop
filters and it does not provide an intuitive understanding of design trade-offs. In [54],
Figure 5.1: A DPLL with a quantized phase detector and without a feedback divider. (a) Using Binary Bang-Bang Phase Detector (BBPD), with output ±1. (b) Using Multi-Phase Bang-Bang Detector (MPBBD), with 4 phases, a LUT, and output in [-4, +4]. [Block diagram labels: clkref, ε, Kp, Ki, z⁻¹, ψ, ν, DCO, clkout.]
a step-by-step description of oscillator phase and frequency during locking is presented.
However, [54] provides a formula for the frequency lock range before the occurrence of
the first cycle slip, which is much smaller than the pull-in range. More recently, [55]
provides a closed-form formula for the pull-in frequency range, but it does not provide an
intuitive understanding of the design trade-offs and it also underestimates the pull-in
range.
Fig. 5.1(a) shows a block diagram of a bang-bang digital PLL. Due to the quantized
nature of binary bang-bang PLLs, the instantaneous frequency error is also quantized
and is equal to ±KpKdco, where Kp is the proportional path gain¹ and Kdco is the
DCO gain. Accordingly, the bang-bang frequency step, fbb, as defined in [56], is equal
to fbb = 2KpKdco. The corresponding bang-bang jitter is equal to
$$ J_{bb} = \frac{f_{bb}}{f_{out}} = \frac{2 K_p K_{dco}}{f_{out}} \ \text{[UI]}. $$
Hence, a small proportional gain, Kp, is often required to minimize bang-bang
jitter. On the other hand, to increase the pull-in frequency range and to speed up the
locking time, a bang-bang PLL requires a larger proportional gain [55], which deteriorates
¹ Some references refer to the proportional gain as the bang-bang gain.
jitter performance [57]. Accordingly, there are contradicting requirements on the PLL
proportional gain and so a designer must ensure that a specific PLL loop dynamic is
sufficient to lock in the PLL under some initial frequency error and at the same time be
able to achieve acceptable jitter performance.
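The trade-off can be made concrete by evaluating the two expressions fbb = 2KpKdco and Jbb = fbb/fout. The sketch below is illustrative only: the function names are hypothetical, and the numeric values (Kp = 3, a 130 kHz/step DCO, a 2 GHz output) are assumed for the example rather than taken from a specific design point of this chapter.

```python
def bang_bang_step_hz(kp, kdco_hz_per_step):
    """Bang-bang frequency step f_bb = 2*Kp*Kdco, in Hz."""
    return 2.0 * kp * kdco_hz_per_step


def bang_bang_jitter_ui(kp, kdco_hz_per_step, f_out_hz):
    """Bang-bang jitter J_bb = f_bb / f_out, in unit intervals (UI)."""
    return bang_bang_step_hz(kp, kdco_hz_per_step) / f_out_hz


# Assumed example values (hypothetical operating point).
KP = 3
KDCO_HZ = 130e3
F_OUT_HZ = 2e9

print("f_bb [Hz]:", bang_bang_step_hz(KP, KDCO_HZ))
print("J_bb [UI]:", bang_bang_jitter_ui(KP, KDCO_HZ, F_OUT_HZ))
```

Doubling Kp doubles both the frequency step and the jitter, which is the tension with the pull-in range described above.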
φe              MPBBD   BBPD
(-X, -¾X]       -4      -1
(-¾X, -½X]      -3      -1
(-½X, -¼X]      -2      -1
(-¼X, 0)        -1      -1
[0, ¼X)         +1      +1
[¼X, ½X)        +2      +1
[½X, ¾X)        +3      +1
[¾X, X)         +4      +1
Figure 5.2: Transfer function of the MPBBD (thick solid blue) vs. BBPD (thin dashed red) when a DCO period is divided into eight regions, each spanning 45 degrees. [Plot: detector output ε (−4 to +4) versus phase error φe (−X to X).]
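The staircase in Fig. 5.2 can be written as a small lookup function. The sketch below assumes X = 180 degrees (eight regions of 45 degrees each) and follows the interval boundaries listed with the figure; the function names `bbpd` and `mpbbd` are chosen here for illustration only.

```python
def bbpd(phi_deg):
    """Binary bang-bang detector: +1 for phi >= 0, otherwise -1."""
    return 1 if phi_deg >= 0 else -1


def mpbbd(phi_deg, x_deg=180.0, levels=4):
    """Multi-phase bang-bang detector with the staircase of Fig. 5.2.

    The magnitude grows by one for every x_deg/levels of phase error
    and saturates at `levels`; the sign follows the phase error.
    """
    step = x_deg / levels
    magnitude = min(levels, int(abs(phi_deg) // step) + 1)
    return magnitude if phi_deg >= 0 else -magnitude


for phi in (-170.0, -100.0, -45.0, -10.0, 0.0, 50.0, 100.0, 170.0):
    print(phi, mpbbd(phi), bbpd(phi))
```

Near lock the phase error stays within the innermost region, so the MPBBD output reduces to ±1 and matches the BBPD.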
To break the trade-off between pull-in frequency range and jitter performance,
a PLL can dynamically scale its gain according to the initial frequency and phase error
in order to increase the pull-in frequency range while improving jitter performance in
lock, as presented in [57] and in [58]. Another technique is to design a fast frequency
acquisition aid, while freeing loop gain to control jitter performance [50].
In [58], I propose to use a multi-phase bang-bang detector (MPBBD) based PLL, as
shown in Fig. 5.1(b). One possible MPBBD transfer function, among many, is shown in
Fig. 5.2. The MPBBD acts as a phase detector with an automatic
gear shifting mechanism. Hence, the phase detector absolute large signal gain² adjusts
automatically based on the magnitude of the phase and frequency error.
This chapter is organized as follows. First, section 5.2 presents a mathematical analysis
of the transient behavior of MPBBD-based PLLs³ when far from their lock point. Then,
section 5.3 analyzes the cycle slipping behavior and frequency pull-in process as well as
the locking time⁴. A closed form expression is derived that predicts the PLL pull-in
frequency range more accurately than the prior literature. Later, in section 5.4,
a simplified mathematical model of the period and absolute jitter (updated once each
reference cycle rather than each output cycle) is developed for fast simulation of lock-in
frequency range and locking time. Based on the developed understanding, section 5.4.3
proposes an improved MPBBD that has an extended pull-in range of ±fref without
deteriorating the jitter performance in lock. Finally, section 5.5 provides an overview of
an implemented DPLL architecture that makes use of a MPBBD as well as a frequency
lock loop. The chapter is concluded with measurement results.
² The absolute large signal gain of the BBPD is KPD = 1, while the absolute large signal gain, KPD, of the MPBBD ranges from 1 to 4. Note that the steady state small signal gain of both detectors depends on the input jitter and can be estimated as $k_{pd} \approx \frac{1}{\sqrt{2\pi}\,\sigma_{rms}}$ [51], where σrms is the rms jitter on the reference clock.
³ The analysis for a BBPD-based PLL is a special case of the MPBBD-based PLL when KPD = 1.
5.2 Transient Analysis of MPBBD PLL When Far
From Lock
It is well known that type-II PLLs can track phase error as well as frequency errors [50].
The integral path of a type-II PLL tracks frequency variation when it is above what
the proportional path [56] can handle. As long as the phase error is small enough, the
proportional path corrects for that error without the need to engage the integral path.
Figure 5.3: Phase domain model of DPLL with quantized phase detector. [Block diagram labels: φr, N, φe, PD (KPD), ε, Kp, Ki, z⁻¹, ψ, ν, Kdco, ω0, DCO phase accumulator Tr/(1 − z⁻¹), φout.]
Fig. 5.3 shows a discrete time (z-domain) phase model of a second order type-II
DPLL with quantized phase detector. The phase detector could be a binary BBPD or a
MPBBD with a stair-case characteristic. The transfer function can be modeled by
ε[k] = KPD · sgn(φe[k]) (5.1)
⁴ Locking time and acquisition time are used interchangeably to mean the time needed for frequency acquisition until there is no cycle slipping, as will be defined later.
where KPD is the absolute average large signal gain⁵ of the phase detector. For a binary
BBPD with ±1 output, the average gain KPD = 1. Also, the average MPBBD gain can
be expressed as $K_{PD} \approx \sum_{r=1}^{n} |K_r| / n$, where Kr is the output value of the rth step. For a
MPBBD with transfer function as shown in Fig. 5.2, the average gain is estimated as
$K_{PD} = \frac{4+3+2+1}{4} = 2.5$.
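As a quick check of the average-gain estimate, the sum over the staircase steps can be evaluated directly. The helper name `average_pd_gain` is hypothetical, and the step values are the four output magnitudes read from Fig. 5.2.

```python
def average_pd_gain(step_outputs):
    """Average large signal gain: mean of |K_r| over the n staircase steps."""
    return sum(abs(k) for k in step_outputs) / len(step_outputs)


print("BBPD :", average_pd_gain([1]))            # single step of magnitude 1
print("MPBBD:", average_pd_gain([4, 3, 2, 1]))   # staircase of Fig. 5.2
```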
Note that the output clock is sampled directly by the reference clock without being
divided down. Accordingly, the reference phase must be multiplied by the frequency
ratio between output and input clock i.e. N, as shown in Fig. 5.3. Also, note that
the time index, k, is advanced according to the reference sampling period Tr = 1/fref .
Let ω0 be the free running angular frequency of the oscillator output and Kdco be the
oscillator gain expressed in rad/s/step. Also, denote the loop filter output by υ[k] and
the instantaneous output frequency by ωout[k]. Then,
ωout[k] = ω0 +Kdco · υ[k] (5.2)
= ω0 +KdcoKp · ε[k] +KdcoKi · ψ[k] (5.3)
Based on Eq. 5.3 and Fig. 5.3, the change in the output phase can be expressed as the
following
φout[k]− φout[k − 1] = Trω0 + TrKdcoKp · ε[k] + TrKdcoKi · ψ[k] (5.4)
For simplicity, I will consider separately the phase contribution of the proportional path,
φp[k], and integral path, φi[k], as follows:
φp[k] = TrKdcoKp · ε[k] (5.5)
≈ TrKdcoKpKPD · sgn(φe[k]) (5.6)
Define φp ≡ TrKdcoKpKPD ⇒ φp[k] ≈ φp · sgn(φe[k]) (5.7)
where φp represents the phase correction provided by the proportional path in one reference
period, Tr. Similarly, the phase contribution due to the integral path can be
⁵ From now on, the gain of the phase detector, KPD, refers to the absolute average large signal gain of the phase detector during frequency acquisition.
expressed as
φi[k] = TrKdcoKi · ψ[k] (5.8)
but ψ[k] = ψ[k − 1] + ε[k] (5.9)
⇒ φi[k] = φi[k − 1] + TrKdcoKi · ε[k] ≈ φi[k − 1] + TrKdcoKiKPD · sgn(φe[k]) (5.10)
Define ωi ≡ KdcoKiKPD ⇒ φi[k] ≈ φi[k − 1] + Trωi · sgn(φe[k]) (5.11)
Hence ωi represents the angular frequency correction provided by the integral path in
one reference period. Finally, substitute Eq. 5.7 and Eq. 5.11 back into Eq. 5.4 to get:
φout[k]− φout[k − 1] = Trω0 + φp[k] + φi[k] (5.12)
The initial frequency offset, ωoff , is defined as the difference between the ideal locked
output frequency, Nωr, and the oscillator free running frequency, ω0.
ωoff = Nωr − ω0 (5.13)
⇒ Trωoff = TrNωr − Trω0 = 2πN − Trω0 (5.14)
If ωoff is within the pull in range, then the output frequency error will converge close to
zero. The phase error is defined as the difference between the reference phase and output
phase. The following expresses the phase error as well as the change in phase error
φe[k] = Nφr[k]− φout[k] (5.15)
⇒ φe[k]− φe[k − 1] = N(φr[k]− φr[k − 1])− (φout[k]− φout[k − 1]) (5.16)
For a fixed reference frequency, the change of reference phase in one reference period is
2π. It follows
φr[k]− φr[k − 1] = kTrωr − (k − 1)Trωr = Trωr = 2π (5.17)
Substituting Eq. 5.12 and Eq. 5.17 into Eq. 5.16, the change in phase error ∆φe[k] can
be expressed as the following
φe[k]− φe[k − 1] = ∆φe[k] = 2πN − Trω0 − φp[k]− φi[k] (5.18)
Using Eq. 5.14 ⇒ ∆φe[k] = Trωoff − φp[k]− φi[k] (5.19)
Eq. 5.19 has three terms: the first contributes phase slipping due to the frequency
error between the free-running and lock frequencies; the second and third account for the
phase corrections of the proportional and integral paths, respectively.
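The recursion formed by Eq. 5.11 and Eq. 5.19 is easy to iterate numerically. The following is a minimal behavioral sketch, not the simulation used in this thesis. It assumes a binary detector whose decision uses the previous phase-error sample, wraps the phase error into [−π, π), and counts each wrap as one cycle slip. The function name, argument names, and example values (Kp = 3, Ki = 0.08, a 130 kHz/step DCO, a 100 MHz reference) are assumptions made for illustration.

```python
import math


def simulate_phase_error(n_steps, tr_woff, phi_p, tr_wi, phi_e0=0.0):
    """Iterate Eq. 5.19 once per reference period.

    tr_woff -- Tr*w_off, phase drift per reference period (rad)
    phi_p   -- proportional-path phase correction per period (rad)
    tr_wi   -- Tr*w_i, integral-path phase step per period (rad)
    Returns (phase error history, final phi_i, number of cycle slips).
    """
    phi_e, phi_i, slips = phi_e0, 0.0, 0
    history = []
    for _ in range(n_steps):
        # Assumption: the detector decision uses the previous phi_e sample.
        sign = 1.0 if phi_e >= 0.0 else -1.0
        phi_i += tr_wi * sign                      # Eq. 5.11
        phi_e += tr_woff - phi_p * sign - phi_i    # Eq. 5.19
        # Wrap into [-pi, pi) and count each wrap as one cycle slip.
        while phi_e >= math.pi:
            phi_e -= 2.0 * math.pi
            slips += 1
        while phi_e < -math.pi:
            phi_e += 2.0 * math.pi
            slips += 1
        history.append(phi_e)
    return history, phi_i, slips


# Assumed example loop: Kp = 3, Ki = 0.08, Kdco = 130 kHz/step, fref = 100 MHz.
PHI_P = 2.0 * math.pi * 130e3 * 3 / 100e6
TR_WI = 2.0 * math.pi * 130e3 * 0.08 / 100e6

_, phi_i_small, slips_small = simulate_phase_error(2000, 0.03, PHI_P, TR_WI)
_, _, slips_large = simulate_phase_error(200, 2.0 * math.pi * 0.05, PHI_P, TR_WI)
print("small offset: slips =", slips_small, " residual =", 0.03 - phi_i_small)
print("large offset: slips in first 200 periods =", slips_large)
```

With the small offset the integral path absorbs the drift without any wrap, while the large offset drives the phase error past π within a few reference periods.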
5.3 Cycle Slipping Phenomena
A PLL is said to exhibit cycle slipping if the phase error at the input of the BBPD
exceeds the range ±π. This phenomenon slows down frequency acquisition and limits the
pull-in frequency range of a PLL [54].
Whether a PLL will exhibit cycle slipping or not is dependent on the relation be-
tween the phase shift caused by the initial frequency offset, ωoff and the available phase
correction by the proportional path, φp. If Trωoff < 2φp⁶, then a PLL will lock relatively
quickly without cycle slipping, assuming the loop is stable. On the other hand, if the
frequency offset is big enough such that Trωoff > 2φp, but still within the pull-in range
described below, then the PLL will exhibit cycle slipping during frequency acquisition
until the induced phase shift due to frequency error is reduced to below 2φp.
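The condition Trωoff > 2φp can be checked numerically for a given loop. The sketch below uses hypothetical helper names and assumed example values (Kp = 3, a 130 kHz/step DCO, a 100 MHz reference); Kdco is entered in Hz/step and converted to rad/s/step inside the helper.

```python
import math


def proportional_phase_step(f_ref_hz, kdco_hz_per_step, kp, k_pd=1.0):
    """phi_p = Tr*Kdco*Kp*K_PD, with Kdco converted to rad/s/step."""
    return 2.0 * math.pi * kdco_hz_per_step * kp * k_pd / f_ref_hz


def expects_cycle_slipping(f_off_hz, f_ref_hz, phi_p):
    """True when the per-period phase drift Tr*w_off exceeds 2*phi_p."""
    tr_woff = 2.0 * math.pi * abs(f_off_hz) / f_ref_hz
    return tr_woff > 2.0 * phi_p


F_REF = 100e6                                   # assumed reference frequency
phi_p = proportional_phase_step(F_REF, 130e3, kp=3)
for f_off in (0.5e6, 5e6):
    print(f_off, expects_cycle_slipping(f_off, F_REF, phi_p))
```

Under these assumptions the threshold works out to 2·Kp·Kdco = 780 kHz of frequency offset, i.e. the bang-bang frequency step.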
5.3.1 Analysis of Pull-In Frequency Range
Pull-in frequency range is defined as the maximum initial frequency offset, ωoff , for which
a PLL acquires lock, generally after experiencing many cycle slips. To guarantee locking,
the frequency error, Nωr − ωout[k], must reduce in each cycle slip in response to the
phase detector outputs [53]. The period of cycle slipping is inversely proportional to the
frequency error. Accordingly, and in order to guarantee locking, the cycle slip period
must increase with time. If the frequency error remains constant over successive cycle
slipping periods, or increases after a cycle slip period, the PLL will wander around an
intermediate metastable frequency [53].
⁶ The choice of factor 2 is based on an analogy between a BB-PLL and ∆Σ modulator which will be discussed later. Reference [54] predicts a similar factor.
Figure 5.4: Illustration of cycle slipping and speed of frequency acquisition for BBPD vs. MPBBD based DPLL. (a) BBPD output. (b) MPBBD output. (c) Phase error of BBPD-DPLL. (d) Phase error of MPBBD-DPLL. [Panels (a) and (b): phase detector output (−4 to 4) versus time (µs). Panels (c) and (d): phase error (degree, −180 to 180) versus time (µs), annotated with the intervals in which Nωr − ωout[k] is increasing or decreasing.]
To understand how the frequency error changes over time, a step-by-step analysis
will be presented using the time index k with a reference period, Tr. Assume that a
PLL has large enough positive frequency error which causes a phase drift from −π to +π
over several reference periods i.e. cycle slipping, as shown in Fig. 5.4(c) and Fig. 5.4(d).
When the phase error is positive, the phase detector output is positive, as shown in Fig.
5.4(a) and Fig. 5.4(b). These positive pulses will push the phase error toward zero.
But they will cause the frequency error to increase rather than decrease since they are
positive pulses. The phase error will then drift below zero and the BBPD will produce
negative pulses as a result. As long as the frequency error is large enough, the phase
error will keep decreasing, though at a slower rate compared with positive phase errors,
toward −π, where a cycle slip happens.
To quantify the pull-in range, denote the number of up pulses in a cycle slipping
period at the edge of pull-in range as Nup and the number of down pulses in a cycle
slipping period as Ndn. Assume that ω0 > Nωr which will cause a negative phase shift
in each reference period until it gets corrected by the integral path. Also assume the
initial phase error is just slightly below π, and so the initial output of the phase detector
is positive and stays positive for another Nup reference periods. Recall Eq. 5.11 to find
the phase shift due to the frequency correction of the integral path over the next Nup
periods:
φi[0] = 0
φi[1] = φi[0] + Trωi = 1Trωi
φi[2] = φi[1] + Trωi = 2Trωi
φi[3] = φi[2] + Trωi = 3Trωi
. . .
Also, recall Eq. 5.19 to express the change in phase error during these Nup periods:
∆φe[1] = −Trωoff − φp[1]− φi[1] = −Trωoff − φp − Trωi
∆φe[2] = −Trωoff − φp[2]− φi[2] = −Trωoff − φp − 2Trωi
∆φe[3] = −Trωoff − φp[3]− φi[3] = −Trωoff − φp − 3Trωi
. . .
∆φe[Nup] = −Trωoff − φp[Nup]− φi[Nup] = −Trωoff − φp −NupTrωi
At the end of the Nup-th period, the frequency error increases from ωoff to ωoff + Nupωi. Once
the phase error changes its sign from positive to negative, then the phase detector will
produce negative pulses that move the frequency error in the correct direction. In order to
cause sign inversion of the phase detector output, the sum of the phase shift contribution
of the positive pulses, $\sum_{k=1}^{N_{up}} \Delta\phi_e[k]$, shall be in the neighborhood of −π:
$$ \sum_{k=1}^{N_{up}} \Delta\phi_e[k] = -N_{up} T_r\omega_{off} - N_{up}\phi_p - \frac{N_{up}(N_{up}+1)}{2} T_r\omega_i \approx -\pi \tag{5.20} $$
$$ \Rightarrow N_{up}\left( T_r\omega_{off} + \phi_p + \frac{N_{up}+1}{2} T_r\omega_i \right) \approx \pi \tag{5.21} $$
The frequency/phase correction by the integral path can be ignored to approximate Nup.
This assumption is valid since Nup is small at the edge of pull-in range and since Ki is
usually ≪ Kp to maintain stability and to avoid strong ringing in the step response.
$$ N_{up} \approx \frac{\pi}{T_r\omega_{off} + \phi_p} \tag{5.22} $$
Now, when there is a negative phase error for the following Ndn periods, one can write
∆φe[Nup + 1] = −Trωoff − φp[Nup + 1]−φi[Nup + 1] = −Tr(ωoff +Nupωi) + φp + Trωi
∆φe[Nup + 2] = −Trωoff − φp[Nup + 2]−φi[Nup + 2] = −Tr(ωoff +Nupωi) + φp + 2Trωi
∆φe[Nup + 3] = −Trωoff − φp[Nup + 3]−φi[Nup + 3] = −Tr(ωoff +Nupωi) + φp + 3Trωi
. . .
∆φe[Nup +Ndn] = −Trωoff − φp[Nup +Ndn]− φi[Nup +Ndn]
= −Tr(ωoff +Nupωi) + φp +NdnTrωi
From the last equation, it is obvious that the frequency correction at the end of a cycle
slipping period is merely (Ndn − Nup)ωi. To guarantee frequency acquisition, Ndn ≥ Nup + 1 is required
to ensure the frequency error is reduced by at least ωi at the end of the cycle
slipping period. Otherwise, the frequency error will remain unchanged and not converge
to zero with time. Similar to the phase shift by up pulses, the summation of the phase
shift contribution by down pulses, $\sum_{k=N_{up}+1}^{N_{up}+N_{dn}} \Delta\phi_e[k]$, shall be in the neighborhood of −π
and it is equal to the following:
$$ \sum_{k=N_{up}+1}^{N_{up}+N_{dn}} \Delta\phi_e[k] = -N_{dn} T_r(\omega_{off} + N_{up}\omega_i) + N_{dn}\phi_p + \frac{N_{dn}(N_{dn}+1)}{2} T_r\omega_i \approx -\pi \tag{5.23} $$
Substituting Ndn = Nup + 1 in Eq. 5.23 above gives the following simplified form
$$ (N_{up}+1)\left( T_r\omega_{off} - \phi_p + \frac{N_{up}-2}{2} T_r\omega_i \right) = \pi \tag{5.24} $$
It is important to note that the total phase change in the first half of the cycle
slip, $\sum_{k=1}^{N_{up}} \Delta\phi_e[k]$, was approximated to −π. However, $\sum_{k=1}^{N_{up}} \Delta\phi_e[k]$ could be smaller or
larger than −π depending on the initial condition and the phase shift contributed by the
previous cycle slip. Accordingly, $\sum_{k=1}^{N_{up}} \Delta\phi_e[k]$ is constrained as follows
$$ -\pi - T_r\omega_{off} - \phi_p - N_{up}T_r\omega_i < \sum_{k=1}^{N_{up}} \Delta\phi_e[k] < -\pi + T_r\omega_{off} + \phi_p + T_r\omega_i \tag{5.25} $$
For example, if there is an initial 14% frequency offset, then the phase shift caused by
this offset is constrained by $\pi - 0.28\pi < \left| \sum_{k=1}^{N_{up}} \phi_{up}[k] \right| < \pi + 0.28\pi$ radian (while ignoring
the effect of φp and φi which are usually much smaller than Trωoff).
Similarly, the total phase change during the second half of cycle slip is constrained as
follows
$$ -\pi - T_r\omega_{off} - \phi_p - N_{dn}T_r\omega_i < \sum_{k=N_{up}+1}^{N_{up}+N_{dn}} \Delta\phi_e[k] < -\pi + T_r\omega_{off} + \phi_p + (N_{dn} - N_{up})T_r\omega_i \tag{5.26} $$
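However, the total phase change during a full cycle slip, $\sum_{k=1}^{N_{up}+N_{dn}} \Delta\phi_e[k] = \sum_{k=1}^{N_{up}} \Delta\phi_e[k] + \sum_{k=N_{up}+1}^{N_{up}+N_{dn}} \Delta\phi_e[k]$, is independent of the initial frequency error and is very close to −2π:
$$ -2\pi - (N_{up} - 1)T_r\omega_i < \sum_{k=1}^{N_{up}+N_{dn}} \Delta\phi_e[k] < -2\pi + N_{dn}T_r\omega_i \tag{5.27} $$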
Accordingly, $\sum_{k=1}^{N_{up}+N_{dn}} \Delta\phi_e[k]$ gives an accurate estimate of the pull-in range. So, add
Eq. 5.24 to Eq. 5.21 to get the total phase shift during the first cycle slip:
$$ (2N_{up} + 1)T_r\omega_{off} - \phi_p + (N_{up}^2 - 1)T_r\omega_i = 2\pi \tag{5.28} $$
Assume that $(N_{up}^2 - 1)T_r\omega_i \ll \phi_p$ (which is the case when Ki < Kp/64) such that the
effect of ωi can be ignored in Eq. 5.28. Now, substitute Eq. 5.22 into Eq. 5.28
$$ \left[ \frac{2\pi}{T_r\omega_{off} + \phi_p} + 1 \right] T_r\omega_{off} - \phi_p \approx 2\pi \tag{5.29} $$
Rearranging Eq. 5.29 will result in a quadratic equation in terms of φp that is easy to
solve
$$ T_r\omega_{off} \approx \sqrt{(2\pi + \phi_p)\phi_p} \tag{5.30} $$
From Eq. 5.30, the pull-in range can be expressed as
$$ f_{pull-in} = \frac{\omega_{off}}{2\pi} \approx f_{ref} \frac{\sqrt{(2\pi + \phi_p)\phi_p}}{2\pi} \tag{5.31} $$
The pull-in range formula can be simplified further without losing accuracy by recalling
that φp ≪ 2π (as required to achieve stability and low peak-to-peak jitter performance).
$$ f_{pull-in} \approx f_{ref} \sqrt{\frac{\phi_p}{2\pi}} = \sqrt{\frac{f_{ref} K_{dco} K_p K_{PD}}{2\pi}} \tag{5.32} $$
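Eq. 5.31 and Eq. 5.32 are simple enough to evaluate directly. The sketch below is illustrative: the function names are hypothetical, and the loop values (Kp = 3, a 130 kHz/step DCO, a 100 MHz reference) are assumed from the locking-time example in section 5.3.2. With these assumptions, Eq. 5.31 evaluates to roughly 6.26% of fref for a BBPD and 9.92% for a MPBBD with KPD = 2.5.

```python
import math

TWO_PI = 2.0 * math.pi


def proportional_phase_step(f_ref_hz, kdco_hz_per_step, kp, k_pd=1.0):
    """phi_p = Tr*Kdco*Kp*K_PD, with Kdco converted to rad/s/step."""
    return TWO_PI * kdco_hz_per_step * kp * k_pd / f_ref_hz


def pull_in_eq_5_31(f_ref_hz, phi_p):
    """Pull-in range in Hz from Eq. 5.31."""
    return f_ref_hz * math.sqrt((TWO_PI + phi_p) * phi_p) / TWO_PI


def pull_in_eq_5_32(f_ref_hz, phi_p):
    """Simplified pull-in range in Hz from Eq. 5.32 (phi_p << 2*pi)."""
    return f_ref_hz * math.sqrt(phi_p / TWO_PI)


F_REF = 100e6    # assumed reference frequency
KDCO = 130e3     # assumed DCO step in Hz/step
for name, k_pd in (("BBPD", 1.0), ("MPBBD", 2.5)):
    p = proportional_phase_step(F_REF, KDCO, kp=3, k_pd=k_pd)
    full = 100.0 * pull_in_eq_5_31(F_REF, p) / F_REF
    simple = 100.0 * pull_in_eq_5_32(F_REF, p) / F_REF
    print(f"{name}: Eq. 5.31 = {full:.2f}%  Eq. 5.32 = {simple:.2f}%")
```

The simplified form drops the φp term under the square root, so it always sits slightly below Eq. 5.31 and scales exactly with the square root of KPD.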
The pull-in frequency range as defined in Eq. 5.31 and Eq. 5.32 is dependent on the
proportional loop gain, which is similar to the conclusion presented in [55], but Eq. 5.32 is
more accurate and gives insight on the effect of phase detector gain on the pull-in range
during frequency acquisition. The improvement in the pull-in range due to the use of
MPBBD over BBPD is proportional to $\sqrt{K_{PD}}$. Hence, using a MPBBD with a static
transfer characteristic as in Fig. 5.2 has $\sqrt{2.5} = 1.58$ times larger pull-in range compared
with a BBPD. Based on simulation results, as shown in Table 5.1, the improvement is
58.4%, which is almost as predicted by Eq. 5.31. Finally, Table 5.1 presents a comparison
of pull-in range that was found using simulation and theoretical results presented in this
section as well as results developed by the authors of [55] and [54]. It clearly shows that
Eq. 5.31 predicts the pull-in range to within about 0.02 percentage points of the simulated values.

          Simulation   Equation 5.31   Postula [55]   Salama [54]
BBPD      6.25%        6.26%           4.22%          1.20%
MPBBD     9.90%        9.92%           6.51%          2.45%
Table 5.1: Comparison of the pull-in range of BBPD vs. MPBBD based DPLL using simulation and theoretical findings when Kp = 3 and Ki = 1/32.

Despite the simplicity of Eq. 5.31 and Eq. 5.32, neither captures the effect of Ki on the pull-in
range, so they can underestimate it, especially if Ki is not very small relative to Kp. To
include this effect, substitute Eq. 5.22 into Eq. 5.28 without ignoring ωi. This will lead
to a cubic equation that can be solved for ωoff
$$ (T_r\omega_{off})^3 + (\phi_p - T_r\omega_i)(T_r\omega_{off})^2 - (2\pi + \phi_p + 2T_r\omega_i)\,\phi_p\,T_r\omega_{off} + \left( \pi^2 T_r\omega_i - \phi_p^3 - T_r\omega_i\phi_p^2 - 2\pi\phi_p^2 \right) = 0 \tag{5.33} $$
The exact solution to Eq. 5.33 is not provided here but Fig. 5.5 shows the pull-in range
for various Kp and Ki combinations when a BBPD is used. Table 5.2 presents a numerical
summary of pull-in range for different combinations of Kp and Ki.
Figure 5.5: The pull-in range (normalized to reference frequency) of BBPD-DPLL for different values of Ki and Kp. Dashed lines obtained by Eq. 5.31, solid lines obtained by Eq. 5.33, and symbols are based on simulations. [Plot: pull-in range (% of fref, 6 to 15) versus normalized integral path step (KiKdco/fref × 100%, 0 to 0.1), with curves for Kp = 3, Kp = 6, and Kp = 12.]
Kp    Ki     Sims      Eq. 5.31   Eq. 5.33   [55]
3     0.04   6.75%     6.26%      6.60%      4.43%
3     0.68   8.15%     6.26%      9.46%      4.66%
12    0.04   12.35%    12.59%     12.68%     8.84%
12    0.68   13.75%    12.59%     13.94%     8.86%
Table 5.2: The pull-in range (normalized to reference frequency) of BBPD-DPLL based on simulations and presented theory as well as based on other references.
5.3.2 Locking Time
In this section, the frequency locking time is formulated by analyzing the reduction of
frequency offset (i.e. error) in each cycle slipping period. Based on the analysis provided
in the previous section, it is evident that the frequency offset is not constant during each
cycle slipping period. By contrast, it increases in the first half and then decreases in the
second half of each cycle slipping period, or vice versa, depending on the polarity of ωoff .
However, to simplify the analysis, the frequency offset can be approximated as constant
during each cycle slip and equal to its average value, denoted by ω̄off (for example,
ω̄off = ωoff + (Nup + 1)ωi/2 based on Eq. 5.21). Accordingly, the change in phase and
frequency error will be indexed using the cycle slipping period rather than the reference
period. Define the number of up pulses during the mth cycle slip as Nup[m] where m is
the cycle slip index. Similarly, define the number of down pulses during the mth cycle slip
as Ndn[m]. The period of each cycle slip, Tcycle−slip[m], changes as the PLL progresses
towards frequency locking and it can be defined as
Tcycle−slip[m] = Tr(Nup[m] +Ndn[m]), where Tr = 1/fref (5.34)
The frequency locking time tlock is simply the summation of all cycle slip periods
$$ t_{lock} = \sum_{m=0}^{n} T_{cycle-slip}[m] = \sum_{m=0}^{n} T_r (N_{up}[m] + N_{dn}[m]) \tag{5.35} $$
Note that n represents the number of cycle slips exhibited before frequency error becomes
smaller than 2φp.
The longest locking time happens when $\sum_{k=1}^{N_{up}} \Delta\phi_e[k]$ contributes the highest possible
phase shift while $\sum_{k=N_{up}+1}^{N_{up}+N_{dn}} \Delta\phi_e[k]$ has the smallest possible phase shift such that not
many down pulses are generated to correct the frequency error. Recall Eq. 5.20 to
Eq. 5.26 and express the phase shift during the longest locking time
Nup[m](Trωoff [m] + φp) < π + φp + PwTrωoff [m] (5.36)
Ndn[m](Trωoff [m]− φp) > π − φp − PwTrωoff [m] (5.37)
where ωoff [m] is the average frequency offset during the mth cycle slipping period and
Pw is the probability of having the worst locking time. I will assume that Pw = 0, as
was done in [55], and make some other assumptions to find a simple form for the locking
time. Later, an exact solution of the worst case locking time will be provided but at the
expense of a very complex equation. Rearrange Eq. 5.36 and Eq. 5.37 to express the
summation and the difference of Nup[m] and Ndn[m] pulses during each cycle slipping
period as follows:
$$ N_{dn}[m] + N_{up}[m] = \frac{2\pi T_r\omega_{off}[m] - 2\phi_p^2}{(T_r\omega_{off}[m])^2 - \phi_p^2} \tag{5.38} $$
$$ N_{dn}[m] - N_{up}[m] = \frac{2\pi\phi_p - 2T_r\omega_{off}[m]\phi_p}{(T_r\omega_{off}[m])^2 - \phi_p^2} \tag{5.39} $$
The frequency correction by the end of the mth cycle slip is approximately ωi(Ndn[m] − Nup[m]). Mathematically
$$ \omega_{off}[m+1] = \omega_{off}[m] + \omega_i (N_{dn}[m] - N_{up}[m]) \tag{5.40} $$
Substitute Eq. 5.39 into Eq. 5.40, then
$$ \omega_{off}[m+1] - \omega_{off}[m] = \omega_i \frac{2\pi\phi_p - 2T_r\omega_{off}[m]\phi_p}{(T_r\omega_{off}[m])^2 - \phi_p^2} \tag{5.41} $$
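Based on Eq. 5.41, the change of frequency offset with respect to the number of cycle
slips can be expressed using a continuous variable of ωoff as follows:
$$ \frac{\partial\omega_{off}}{\partial m} = \omega_i \frac{2\pi\phi_p - 2T_r\omega_{off}\phi_p}{(T_r\omega_{off})^2 - \phi_p^2} \tag{5.42} $$
$$ \approx \frac{2\pi\omega_i\phi_p}{(T_r\omega_{off})^2 - \phi_p^2}, \quad \text{assuming } T_r\omega_{off} \ll \pi \tag{5.43} $$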
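Accordingly, the number of cycle slips, given an initial frequency offset (i.e. error), can
be calculated as
$$ N_{cycle-slips} = \int_{\omega_0 = 2\phi_p/T_r}^{\omega_{off}} \frac{\partial m}{\partial\omega_{off}} \cdot \partial\omega_{off} = \frac{(T_r\omega_{off})^3 - 3T_r\omega_{off}\phi_p^2 - 2\phi_p^3}{6\pi T_r\omega_i\phi_p}, \tag{5.44} $$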
Figure 5.6: A plot of the number of cycle slips (Kp = 3 and Ki = 1/32) for BBPD versus MPBBD based DPLL. (a) BBPD: blue circles from simulation results and dashed red line from Eq. 5.44. (b) MPBBD: blue squares from simulation results and solid red line from Eq. 5.44. [Plot: number of cycle slips (0 to 300) versus frequency error (%, 0 to 10).]
where ω0 is the maximum frequency offset that does not cause any cycle slipping, and
so ω0 = 2φp/Tr. The number of cycle slips, Ncycle−slips, versus the initial frequency
error is drawn in Fig. 5.6 for both BBPD and MPBBD based PLLs. This was done using
Eq. 5.44 and using a time-step simulation of the PLL in MATLAB. Eq. 5.44 agrees
closely with the simulation.
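Eq. 5.44 can be evaluated with a few lines of code. The sketch below is an illustration under assumptions: the function name is hypothetical, and the loop values (Kp = 3, Ki = 1/32, a 130 kHz/step DCO, a 100 MHz reference, 5% initial frequency error) are chosen for the example and are not guaranteed to match the exact setup behind Fig. 5.6.

```python
import math


def cycle_slips_eq_5_44(tr_woff, phi_p, tr_wi):
    """Estimated number of cycle slips from Eq. 5.44.

    tr_woff -- Tr*w_off, initial phase drift per reference period (rad)
    phi_p   -- proportional-path phase correction per period (rad)
    tr_wi   -- Tr*w_i, integral-path phase step per period (rad)
    """
    numerator = tr_woff**3 - 3.0 * tr_woff * phi_p**2 - 2.0 * phi_p**3
    return numerator / (6.0 * math.pi * tr_wi * phi_p)


# Assumed example loop; K_PD scales both phi_p and Tr*w_i.
KDCO_NORM = 2.0 * math.pi * 130e3 / 100e6    # Tr*Kdco in rad/step
TR_WOFF = 2.0 * math.pi * 0.05               # 5% frequency error
for name, k_pd in (("BBPD", 1.0), ("MPBBD", 2.5)):
    phi_p = KDCO_NORM * 3 * k_pd
    tr_wi = KDCO_NORM * (1.0 / 32.0) * k_pd
    print(name, cycle_slips_eq_5_44(TR_WOFF, phi_p, tr_wi))
```

At the lower integration limit, Trωoff = 2φp, the numerator vanishes, which matches the definition of ω0 as the largest offset without cycle slipping.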
A cycle slip period shown in Eq. 5.34 can be expanded by using Eq. 5.38
$$ T_{cycle-slip}[m] = T_r \frac{2\pi T_r\omega_{off}[m] - 2\phi_p^2}{(T_r\omega_{off}[m])^2 - \phi_p^2} \tag{5.45} $$
Accordingly, the change of cycle slipping period with respect to time index m can be
expressed as
$$ \frac{\partial T_{cycle-slip}}{\partial m} = T_r \frac{2\pi T_r\omega_{off} - 2\phi_p^2}{(T_r\omega_{off})^2 - \phi_p^2} \tag{5.46} $$
$$ \approx \frac{2\pi T_r^2\omega_{off}}{(T_r\omega_{off})^2 - \phi_p^2}, \quad \text{assuming } \phi_p \text{ is very small} \tag{5.47} $$
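Using the chain rule and recalling Eq. 5.42 and Eq. 5.46, the change of
frequency error with respect to cycle slipping period is
$$ \frac{\partial\omega_{off}}{\partial T_{cycle-slip}} = \frac{\partial\omega_{off}}{\partial m} \cdot \frac{\partial m}{\partial T_{cycle-slip}} = \frac{\omega_i}{T_r} \cdot \frac{2\pi\phi_p - 2T_r\omega_{off}\phi_p}{2\pi T_r\omega_{off} - 2\phi_p^2} \tag{5.48} $$
$$ \Rightarrow \frac{\partial\omega_{off}}{\partial T_{cycle-slip}} \approx \frac{\omega_i\phi_p}{T_r^2\omega_{off}}, \quad \text{assuming } \pi \gg T_r\omega_{off} \gg \phi_p \tag{5.49} $$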
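And finally, the locking time can be found by recalling Eq. 5.35
$$ t_{lock} = \sum_{m=0}^{n} T_{cycle-slip}[m] \equiv \int_{\omega_0 = 2\phi_p/T_r}^{\omega_{off}} \frac{\partial T_{cycle-slip}}{\partial\omega_{off}} \cdot \partial\omega_{off} \tag{5.50} $$
Using Eq. 5.49,
$$ t_{lock} = \int_{2\phi_p/T_r}^{\omega_{off}} \frac{T_r^2\omega_{off}}{\omega_i\phi_p} \cdot \partial\omega_{off} = \frac{1}{2\omega_i\phi_p} \left[ (T_r\omega_{off})^2 - 4\phi_p^2 \right] \tag{5.51} $$
$$ \Rightarrow t_{lock} = T_r \left[ \frac{\omega_{off}^2}{2K_i K_p K_{dco}^2 K_{PD}^2} - \frac{2K_p}{K_i} \right], \quad \text{assuming } \pi \gg T_r\omega_{off} \gg \phi_p \tag{5.52} $$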
For example, if there is a 5% frequency error at the input of a BBPD-PLL running
from a 100 MHz reference clock with Kp = 3 and Ki = 0.08 while the DCO LSB is 130
kHz/step, then the locking time is estimated to be
$$ 10\,\text{ns} \times \left( \frac{(2\pi \cdot 100\text{e}6 \cdot 0.05)^2}{2 \cdot 3 \cdot 0.08 \cdot (2\pi \cdot 130\text{e}3)^2} - \frac{2 \cdot 3}{0.08} \right) = 30.07\ \mu\text{s}. $$
Simulations show a locking time of 32 µs.
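The arithmetic of this example can be reproduced with a short script. The function name `lock_time_eq_5_52` is hypothetical; the parameter values are the ones quoted in the example, with Kdco and the frequency error converted from Hz to rad/s inside the function.

```python
import math


def lock_time_eq_5_52(f_off_hz, f_ref_hz, kdco_hz_per_step, kp, ki, k_pd=1.0):
    """Approximate locking time in seconds from Eq. 5.52."""
    tr = 1.0 / f_ref_hz
    w_off = 2.0 * math.pi * f_off_hz            # rad/s
    kdco = 2.0 * math.pi * kdco_hz_per_step     # rad/s/step
    return tr * (w_off**2 / (2.0 * ki * kp * kdco**2 * k_pd**2) - 2.0 * kp / ki)


t_lock = lock_time_eq_5_52(
    f_off_hz=0.05 * 100e6,      # 5% of the 100 MHz reference
    f_ref_hz=100e6,
    kdco_hz_per_step=130e3,
    kp=3,
    ki=0.08,
)
print(f"t_lock = {t_lock * 1e6:.2f} us")
```

Passing `k_pd=2.5` instead of the default shows how the first term shrinks with the square of the detector gain.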
Eq. 5.52 shows that the frequency locking time depends on the square of the frequency
offset, i.e. $t_{lock} \propto \omega_{off}^2$, and so it takes a very long time to achieve frequency
acquisition for a large frequency error. Hence, using a binary BBPD to achieve frequency
locking is usually not an option. Furthermore, the locking time is inversely proportional
to the loop parameters, i.e. $t_{lock} \propto \frac{1}{K_i K_p}$. Improving the locking time by increasing Ki
and/or Kp is not desirable since it alters the steady-state behavior of the PLL. Alternatively,
using a phase detector with an automatic gain shifting mechanism is a better
option. Note that the locking time is inversely proportional to the square of the average
phase detector gain, i.e. $t_{lock} \propto \frac{1}{K_{PD}^2}$. For a MPBBD with an average gain of KPD = 2.5,
the improvement in locking time is 2.5² = 6.25 times compared with a BBPD with a gain
of 1. This shows the benefit of using a MPBBD to speed up the frequency and phase
locking performance while not hurting the small signal steady state performance since a
MPBBD will perform like a BBPD once locking is achieved.
By relaxing the assumption (π ≫ Trωoff ≫ φp) used to derive Eq. 5.52, a more
accurate formula for locking time can be found by integrating the inverse of Eq. 5.48
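$$t_{lock} = \frac{T_r}{\omega_i}\int_{2\phi_p/T_r}^{\omega_{\text{off}}} \frac{2\pi T_r\,\omega_{\text{off}} - 2\phi_p^2}{2\pi\phi_p - 2T_r\,\omega_{\text{off}}\,\phi_p}\cdot\partial\omega_{\text{off}} \tag{5.53}$$
$$= -\frac{\pi}{\omega_i\phi_p}\left[T_r\,\omega_{\text{off}} + \left(\frac{\pi^2 - \phi_p^2}{\pi}\right)\cdot\ln(\pi - T_r\,\omega_{\text{off}})\right]_{2\phi_p/T_r}^{\omega_{\text{off}}} \tag{5.54}$$
$$\Rightarrow t_{lock} = \frac{\pi}{\omega_i\phi_p}\left[\left(\frac{\pi^2 - \phi_p^2}{\pi}\right)\cdot\ln\left(\frac{\pi - 2\phi_p}{\pi - T_r\,\omega_{\text{off}}}\right) - (T_r\,\omega_{\text{off}} - 2\phi_p)\right] \tag{5.55}$$
$$= \frac{\pi}{T_rK_iK_pK_{dco}^2K_{PD}^2}\left[\left(\frac{\pi^2 - \phi_p^2}{\pi}\right)\cdot\ln\left(\frac{\pi - 2\phi_p}{\pi - T_r\,\omega_{\text{off}}}\right) - (T_r\,\omega_{\text{off}} - 2\phi_p)\right] \tag{5.56}$$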
Using the same example as above, the calculated locking time using Eq. 5.55 is 32.25µs
compared to 32µs from simulation. This is closer than 30.07µs from Eq. 5.51 but at
the expense of using a more complex expression. However, Eq. 5.55 does not provide an
intuitive understanding of the effect of offset frequency, ωoff , on locking time.
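As a numerical check, Eq. 5.55 can be evaluated for the same example. The sketch below assumes φp = Tr·Kp·Kdco·KPD and ωi = Ki·Kdco·KPD, which is what equating Eq. 5.55 with Eq. 5.56 implies; the function name and the rad/s unit for Kdco are illustrative choices.

```python
import math

def lock_time_exact(t_r, w_off, k_p, k_i, k_dco, k_pd=1.0):
    """Locking time from Eq. 5.55 (no small-offset approximation).

    Assumed relations (inferred from Eq. 5.55 and Eq. 5.56):
      phi_p = Tr * Kp * Kdco * KPD   (proportional phase step, rad)
      w_i   = Ki * Kdco * KPD        (integral frequency step, rad/s)
    """
    phi_p = t_r * k_p * k_dco * k_pd
    w_i = k_i * k_dco * k_pd
    x = t_r * w_off  # phase accumulated per reference cycle, rad
    if not (2.0 * phi_p < x < math.pi):
        raise ValueError("offset outside the range covered by Eq. 5.55")
    log_term = (math.pi ** 2 - phi_p ** 2) / math.pi * math.log(
        (math.pi - 2.0 * phi_p) / (math.pi - x)
    )
    return math.pi / (w_i * phi_p) * (log_term - (x - 2.0 * phi_p))

t_r = 1.0 / 100e6                   # 100 MHz reference
w_off = 2.0 * math.pi * 100e6 * 0.05  # 5% offset
k_dco = 2.0 * math.pi * 130e3       # assumed rad/s per step
t_exact = lock_time_exact(t_r, w_off, k_p=3.0, k_i=0.08, k_dco=k_dco)
print(f"Eq. 5.55 estimate: {t_exact * 1e6:.2f} us")
```

With these inputs the result lands near the 32.25 µs quoted above, which supports the assumed relations for φp and ωi.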
[Figure 5.7: plot of frequency locking time (µs) versus frequency error (%), with curves labeled Eq. 5.51, Eq. 5.58, and Eq. 5.55.]
Figure 5.7: Frequency locking time until cycle slips disappear (Kp = 3 and Ki = 1/32) for a BBPD (blue circles from simulation) and MPBBD (blue squares from simulation) based DPLL. Eq. 5.51 is shown as a solid red line, Eq. 5.55 as a small-dashed blue line, and Eq. 5.58 as a large-dashed red line.

Finally, the worst case locking time can be estimated by considering the maximum phase contribution when the PD is high and the lowest phase contribution when the PD output is low (recall Eq. 5.36 and Eq. 5.37). Following similar steps as shown before, the worst case locking time can be expressed as follows:
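$$t_{lock} = \frac{1}{\omega_i}\int_{2\phi_p}^{T_r\omega_{\text{off}}} \frac{(2\pi - 2\phi_p)\,T_r\,\omega_{\text{off}} - 2\phi_p^2}{2\pi\phi_p - 2T_r\,\omega_{\text{off}}\,\phi_p - 2P_w(T_r\,\omega_{\text{off}})^2}\cdot\partial(T_r\,\omega_{\text{off}}) \tag{5.57}$$
$$\Rightarrow t_{lock} = \frac{1}{2P_w\,\omega_i}\cdot(\phi_p - \pi)\Big[\ln\big(P_w\,\phi_{ferr}^2 + \phi_p\,\phi_{ferr} - \pi\phi_p\big)\Big]_{\phi_{ferr}=2\phi_p}^{\phi_{ferr}=T_r\omega_{\text{off}}} - \frac{1}{P_w\,\omega_i}\cdot\frac{\sqrt{\phi_p}\,\big(\phi_p(2P_w - 1) + \pi\big)}{\sqrt{\phi_p + 4\pi P_w}}\left[\tanh^{-1}\left(\frac{\phi_p + 2P_w\,\phi_{ferr}}{\sqrt{\phi_p(\phi_p + 4\pi P_w)}}\right)\right]_{\phi_{ferr}=2\phi_p}^{\phi_{ferr}=T_r\omega_{\text{off}}} \tag{5.58}$$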
Fig. 5.7 shows locking time vs. a given initial frequency error expressed as a percentage of the reference clock frequency. The locking time is plotted using Eq. 5.51, Eq. 5.55, and Eq. 5.58 for both BBPD and MPBBD based PLLs. Furthermore, time-step simulations of such PLLs are developed in MATLAB and Simulink to verify the theoretical results. The simulated locking time falls between Eq. 5.51 and Eq. 5.58. Furthermore, the simple Eq. 5.51 predicts the locking time fairly well and gives useful insights into how the loop parameters and initial conditions affect the locking time.
5.4 Fast Simulation Model of a DPLL with Quantized Phase Detector
Simulating PLLs is difficult and takes a considerable amount of time because of the vastly different time scales between the reference and the output clock. Furthermore, the time step of the simulator is usually 1/100 to 1/1000 of the smallest output period in the system. This section presents a compact and fast simulation model of DPLLs to find the locking time and pull in range quickly. The model updates the jitter and frequency correction provided by the integral path every reference cycle. Consequently, the presented model is three to four orders of magnitude faster than running a Verilog-A behavioral simulation, and even faster compared to a Spice circuit simulation.
5.4.1 Model Development
Let tref represent the timing of the rising edge of the reference clock and tdco represent
the timing of every N th rising edge of the output DCO clock. Accordingly, the absolute
jitter of the output clock with respect to the reference clock is simply jabs = tref − tdco.
The output clock is sub-sampled by the reference clock and so the resolvable phase
difference between them must be within one reference period. Accordingly, the absolute
timing jitter, jabs, is wrapped to (−Tref/2,+Tref/2) where Tref is the reference clock period
and assumed to be constant. The phase detector works as a quantizer of the wrapped
absolute jitter. Finally, the DCO can be modeled in the time-domain rather than in the frequency-domain:
$$T_{out} = T^0_{out} + K_T\cdot\nu \tag{5.59}$$
where $T_{out}$ is the instantaneous DCO period, $T^0_{out}$ is the free running period, ν is the DCO control input, and $K_T$ is the DCO period correction factor in seconds, approximated as $K_T \simeq K_{dco}T_{out}^2$ where $K_{dco}$ is the DCO frequency resolution.
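Based on this notation and recalling Fig. 5.3 from section 5.2, one can write the following set of state-equations using the reference time index k.
$$\text{Absolute jitter:}\quad j_{abs}[k] = t_{ref}[k] - t_{dco}[k] \tag{5.60}$$
$$\text{Wrapped jitter:}\quad \Delta t[k] = W(j_{abs}[k]) \equiv (j_{abs}[k] + T_{ref}/2)\,\%\,T_{ref} - T_{ref}/2 \tag{5.61}$$
$$\text{PD output:}\quad \varepsilon[k] = Q(\Delta t[k]) \equiv K_{PD}\cdot\text{sgn}(\Delta t[k]) \tag{5.62}$$
$$\text{Integrator:}\quad \psi[k] = \psi[k-1] + \varepsilon[k] \tag{5.63}$$
$$\text{Loop filter:}\quad \nu[k] = K_p\cdot\varepsilon[k] + K_i\cdot\psi[k] \tag{5.64}$$
$$\text{DCO:}\quad T_{out}[k] = T^0_{out} + K_T\cdot\nu[k] \tag{5.65}$$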
As the reference time index, k, progresses, the rising edge of the reference clock, tref [k],
increases by one reference period, Tref [k]. The jitter on the reference clock is usually very
small compared to other sources of jitter and so one can assume a jitter free reference to
simplify the following analysis. Also, the rising edge of output clock, tdco[k], increases by
N multiples of the output period, Tout[k] in each reference period as follows:
tref [k + 1] = tref [k] + Tref [k] ≈ tref [k] + Tref (5.66)
tdco[k + 1] = tdco[k] +NTout[k] (5.67)
Subtract Eq. 5.67 from Eq. 5.66 to find the absolute jitter
$$j_{abs}[k+1] = j_{abs}[k] + T_{ref} - N\,T_{out}[k] \tag{5.68}$$
$$\text{Use Eq. 5.65} \Rightarrow j_{abs}[k+1] = j_{abs}[k] + T_{ref} - N\,(T^0_{out} + K_T\cdot\nu[k]) \tag{5.69}$$
$$= j_{abs}[k] + N\,(T_{ref}/N - T^0_{out} - K_T\cdot\nu[k]) \tag{5.70}$$
The initial output period error, $T_e$, can be defined as the difference between the targeted output period, $T^f_{out}$, and the initial output period, $T^0_{out}$:
$$T_e = T^f_{out} - T^0_{out} = T_{ref}/N - T^0_{out} \tag{5.71}$$
The loop filter provides a correction signal, ν[k], that translates to a period correction, $T_e[k]$, by multiplying it with the DCO period gain, $K_T$:
$$T_e[k] = K_T\cdot\nu[k] \tag{5.72}$$
Note that the DCO period gain, KT , is a function of the operating region but it can
be assumed constant for small frequency error. Then, the period jitter or period error
can be defined as
$$j_{per}[k] = T_e - T_e[k] \tag{5.73}$$
$$= T_{ref}/N - T^0_{out} - K_T\cdot\nu[k] \tag{5.74}$$
Now, substitute Eq. 5.74 into Eq. 5.70 to get
jabs[k + 1] = jabs[k] +Njper[k] (5.75)
The above DPLL time model can be represented as a two-state system where ∆t[k] and
ψ[k] are the state variables. Rearranging the previous equations:
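$$\psi[k] = \psi[k-1] + Q(\Delta t[k]) \tag{5.76}$$
$$\Delta t[k] = W(j_{abs}[k]) \tag{5.77}$$
$$\text{where } j_{abs}[k] = j_{abs}[k-1] + N\Big(\overbrace{T_e - \underbrace{K_T\big(K_p\,Q(\Delta t[k-1]) + K_i\,\psi[k-1]\big)}_{T_e[k-1]}}^{j_{per}[k-1]}\Big) \tag{5.78}$$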
In the z-domain, this is equivalent to:
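$$\Psi = \frac{Q(\Phi)}{1 - z^{-1}} \tag{5.79}$$
$$\Phi = W(J_{abs}) = \mathcal{Z}(\Delta t[k]), \text{ the z-transform of the wrapped jitter} \tag{5.80}$$
$$J_{abs} = \frac{N}{1 - z^{-1}}\Big(T_e - z^{-1}K_T\big(K_p\,Q(\Phi) + K_i\,\Psi\big)\Big) \tag{5.81}$$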
Based on Eq. 5.79 to Eq. 5.81, one can construct a Simulink model as shown in Fig 5.8
to quickly simulate the locking time and pull in range of a DPLL.
[Figure 5.8: block diagram of the discrete-time model. The period error Te minus the fed-back output period correction gives jper, which is accumulated by N/(1 − z⁻¹) into jabs, wrapped by W() into ∆t, and quantized by Q() into ε. The loop filter (Kp path plus Ki integrator path with state ψ) produces ν, which is scaled by KT and delayed by z⁻¹ in the feedback path.]
Figure 5.8: Discrete-time model of phase error development for fast evaluation of DPLL.

As an example, assume a particular DPLL has the following loop parameters: Kp = 3, Ki = 0.08, and Kdco = 130 kHz while the reference frequency is 1 GHz and the initial frequency error is 3 MHz. Then, the integral path period correction, Kiψ, settles down
around 13.4µs when a BBPD is used. On the other hand, the DPLL takes only 2.6µs
to settle if the MPBBD is used, as shown in Fig. 5.9(a). Furthermore, the trajectory
of the wrapped absolute jitter, ∆t = W(jabs), vs. the integral path period correction,
Kiψ, gives a compact visualization of the cycle slipping process, as shown in Fig. 5.9(b).
It shows that the BBPD-DPLL went through 23 cycle slips before starting the phase
locking stage while the MPBBD-DPLL experienced only one cycle slip given the same
initial frequency error.
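The state equations (Eq. 5.60 to Eq. 5.65, together with Eq. 5.75) map directly onto a short loop that executes once per reference cycle. The following Python sketch is one possible realization and not the Simulink model used for the figures. The function names, the choice sgn(0) = +1, and the numerical values in the usage lines are assumptions made for illustration.

```python
def wrap(j_abs, t_ref):
    """Wrap absolute jitter into [-t_ref/2, +t_ref/2), as in Eq. 5.61."""
    return (j_abs + t_ref / 2.0) % t_ref - t_ref / 2.0

def bbpd(dt, k_pd=1.0):
    """Binary quantizer of Eq. 5.62; sgn(0) is taken as +1 (assumption)."""
    return k_pd if dt >= 0.0 else -k_pd

def simulate_dpll(t_ref, n_div, t_out0, k_t, k_p, k_i, n_cycles,
                  quantizer=bbpd, j_abs0=0.0):
    """Run the two-state model for n_cycles reference cycles.

    Returns the wrapped jitter history and the integral-path period
    correction Ki*KT*psi, both in seconds.
    """
    t_e = t_ref / n_div - t_out0            # initial period error, Eq. 5.71
    j_abs, psi = j_abs0, 0.0
    dt_hist, corr_hist = [], []
    for _ in range(n_cycles):
        dt = wrap(j_abs, t_ref)             # Eq. 5.61
        eps = quantizer(dt)                 # Eq. 5.62
        psi += eps                          # Eq. 5.63
        nu = k_p * eps + k_i * psi          # Eq. 5.64
        j_per = t_e - k_t * nu              # Eq. 5.73 and Eq. 5.74
        j_abs += n_div * j_per              # Eq. 5.75
        dt_hist.append(dt)
        corr_hist.append(k_i * k_t * psi)
    return dt_hist, corr_hist

# Illustrative values only (assumed, not the thesis setup)
T_REF, N_DIV, K_P, K_I = 10e-9, 10, 3.0, 0.08
K_T = 130e3 * (T_REF / N_DIV) ** 2          # KT ~ Kdco * Tout^2
T_OUT0 = T_REF / N_DIV - K_T * K_P          # period error Te = KT*Kp, i.e. u = 1
dt_hist, corr_hist = simulate_dpll(T_REF, N_DIV, T_OUT0, K_T, K_P, K_I, 5000)
print(f"final integral-path correction: {corr_hist[-1] * 1e12:.3f} ps")
```

A multi-level detector can be tried by passing a different `quantizer` callable, and plotting `dt_hist` against `corr_hist` gives the kind of trajectory shown in Fig. 5.9(b).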
5.4.2 Analogy between DPLL and ∆Σ modulator
It is well known that a ∆Σ modulator may suffer from limit cycles, where its output
bits exhibit a repeating pattern. Limit cycle prevention is typically achieved by adding
a random signal just prior to quantization. Many researchers like [59], [56], and [51]
studied the analogy between a BB-PLL and a ∆Σ modulator. They found that a BB-
PLL could also have spurs caused by limit cycles and it can be eliminated by applying
very small dithering or perturbation prior to the quantizer in a mechanism similar to a ∆Σ
modulator. This section will study this analogy to find the maximum frequency offset that
a BB-DPLL can compensate for without experiencing cycle slipping i.e. the acquisition
frequency range. This will be achieved by relating the cycle slipping phenomena in BB-
DPLL to overloading the quantizer in a ∆Σ modulator. Furthermore, the benefit of using
a MPBBD over a BBPD to extend the acquisition range will be shown.
[Figure 5.9(a): integral path period correction, KiKTΨ (ps), versus time (µs).]
(a) Integral path output (expressed by the amount of output period correction)
[Figure 5.9(b): phase error (degrees) versus integrator path period correction, KiKTΨ (ps).]
(b) A trajectory of phase error vs. period correction which emphasizes cycle slip occurrence as well as the speed of frequency acquisition
Figure 5.9: Integral path output and cycle-slip trajectory for BBPD (blue triangles) and MPBBD (red squares) based DPLL, when frequency offset is 3 MHz (3% frequency error while Kp = 3 and Ki = 1/32).
If the input of a ∆Σ modulator exceeds a specific limit, the quantizer will be over-
loaded. In this case, the ∆Σ modulator becomes unstable and loses its ability to push
the noise to high frequencies. For a PLL, it was shown in the previous section that it
has an implicit wrapping function of the phase error. In other words, once the phase
error exceeds certain limits (±π), it will be wrapped around and it will not overload the
quantizer but it will cause cycle slipping. Accordingly, there is an analogy between the maximum input to a ∆Σ modulator before overloading the quantizer and the maximum frequency offset before experiencing cycle slipping. To quantify this analogy and to draw conclusions from it, I will develop an equivalent ∆Σ representation of a DPLL with a quantized phase detector.

[Figure 5.10: block diagrams with input Te/(KpKT), loop gain NKpKT/(1 − z⁻¹), quantizer Q(), and z⁻¹ feedback. (a) First order ∆Σ modulator when Ki = 0. (b) Second order ∆Σ modulator when Ki > 0, with an additional integrator path of gain Ki/Kp.]
Figure 5.10: Equivalent ∆Σ representation of DPLL with a quantized phase detector.
When Ki = 0, Eq. 5.81 reduces to a state-space representation of a first order ∆Σ modulator with constant input Te/(KpKT) representing the initial output period error scaled to the proportional period correction. Fig. 5.10(a) shows this equivalent representation. Note that the wrapping function is not shown since the model is only valid for inputs that do not overload the quantizer (which could be binary or multi-level).
$$\Phi = \frac{N}{1 - z^{-1}}\Big(T_e - z^{-1}K_TK_p\,Q(\Phi)\Big) \tag{5.82}$$
$$= \frac{NK_TK_p}{1 - z^{-1}}\left(\frac{T_e}{K_TK_p} - z^{-1}Q(\Phi)\right) \tag{5.83}$$
When Ki > 0, it can be shown that Eq. 5.79 to Eq. 5.81 reduce to a state-space representation of a second order ∆Σ modulator with constant input Te/(KpKT). Note that the input is not exactly constant but varies over time according to the period jitter and operating region. However, this variation is very small compared to the clock period and is very helpful to get rid of limit cycles in case they appear. Finally, after rearranging the blocks, the equivalent model in Fig. 5.10(b) is obtained.
A binary BBPD-DPLL is analogous to a ∆Σ modulator with a binary quantizer while a MPBBD-DPLL is analogous to a ∆Σ modulator with a multi-bit quantizer, which accounts for its improved stability and locking behavior. The literature on ∆Σ modulators provides asymptotic limits on the maximum input before overloading the quantizer. For multi-bit ∆Σ modulators with an M-step quantizer, the modulator is guaranteed not to experience overloading for any input u such that [60]
$$\max|u| \le M + 2 - NTF(z = -1) \tag{5.84}$$
where NTF represents the modulator’s noise transfer function. For the equivalent ∆Σ model of a DPLL, shown in Fig. 5.10(b), the NTF can be found equal to
$$NTF(z) = \frac{(1 - z^{-1})^2}{(1 - z^{-1})^2 + NK_TK_i\,z^{-1}(1 + K_i/K_p - z^{-1})} \approx 1 \tag{5.85}$$
The criteria in Eq. 5.84 can be used to theoretically define the maximum initial period
error, or equivalently the maximum initial frequency error (acquisition frequency range),
such that a DPLL will not experience any cycle slips.
The input limit defined in Eq. 5.84 is the ratio of the initial period error to the instantaneous period correction provided by the proportional path, i.e. u = Te/(KTKp). Accordingly, max|u| ≤ 1 + 2 − 1 = 2 when a BBPD is used while max|u| ≤ 4 + 2 − 1 = 5 when a MPBBD with eight levels is used. Hence, a BBPD-DPLL will not experience cycle slipping as long as the initial output period error is less than twice the instantaneous period correction provided by the proportional path, i.e. Te ≤ 2KTKp (see footnote 7). For a MPBBD, Te ≤ 5KTKp, which is 2.5 times larger than the case of binary BBPD. This factor is equivalent to the average gain of the MPBBD, KPD (see footnote 8). Accordingly, employing a MPBBD rather than a BBPD will improve the frequency acquisition range by KPD.
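The limits above reduce to simple arithmetic once NTF(z = −1) ≈ 1 is accepted. The sketch below evaluates Eq. 5.84 for both detectors and converts the result into a frequency error using ωerr ≤ u·Kp·Kdco from footnote 7. The function names are illustrative, and M = 4 for the eight-level MPBBD follows the text.

```python
def max_stable_input(m_steps, ntf_at_minus_one=1.0):
    """Overload-free input bound of Eq. 5.84, with NTF(z=-1) ~ 1 as in Eq. 5.85."""
    return m_steps + 2 - ntf_at_minus_one

def acquisition_range_hz(m_steps, k_p, k_dco_hz):
    """Largest initial frequency error without cycle slips, using
    u = f_err / (Kp * Kdco) from footnote 7."""
    return max_stable_input(m_steps) * k_p * k_dco_hz

K_P, K_DCO_HZ = 3.0, 130e3              # loop values used throughout the section
u_bbpd = max_stable_input(1)            # binary detector
u_mpbbd = max_stable_input(4)           # eight-level MPBBD, M = 4 as in the text
f_bbpd = acquisition_range_hz(1, K_P, K_DCO_HZ)
f_mpbbd = acquisition_range_hz(4, K_P, K_DCO_HZ)
print(f"BBPD:  u <= {u_bbpd:.0f}, f_err <= {f_bbpd / 1e6:.2f} MHz")
print(f"MPBBD: u <= {u_mpbbd:.0f}, f_err <= {f_mpbbd / 1e6:.2f} MHz")
```

The ratio of the two bounds is 2.5, the average MPBBD gain quoted above.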
Footnote 7: Note that u = Te/(KpKT) ≡ ωerr/(KpKdco) ⇒ Te ≤ 2KTKp or ωerr ≤ 2KpKdco to avoid cycle slipping.
Footnote 8: To find the frequency locking time, the lower limit of integration in Eq. 5.50 and Eq. 5.53 is chosen to be twice the phase correction provided by the loop filter, i.e. 2φp/Tr = 2KpKdcoKPD. This was based on the analogy discussed in this section.

[Figure 5.11 panels: (a) BBPD output and (b) MPBBD output, phase detector output versus time (µs); (c) integral path output (BBPD) and (d) integral path output (MPBBD), integral path period correction (ps) versus time (µs); (e) trajectory of phase error vs. period correction (BBPD) and (f) trajectory of phase error vs. period correction (MPBBD), phase error (degrees) versus integrator path period correction (ps).]
Figure 5.11: Transient simulation comparison between BBPD and MPBBD based DPLL, when frequency offset is 2.5 MHz (2.5% frequency error while Kp = 3 and Ki = 1/32).

Simulations show a similar conclusion. A BBPD-DPLL with Kp = 3 and Kdco = 130 kHz will not experience cycle slipping as long as Te ≤ 2.95KTKp (equivalent to 1.15 MHz frequency error). Replacing the BBPD with a MPBBD ensures that the DPLL will not experience cycle slipping as long as Te ≤ 7.05KTKp (equivalent to 2.75 MHz frequency error). Based on these simulations, the improvement factor of using a MPBBD compared with a BBPD is 7.05/2.95 = 2.39, which is very close to the predicted improvement of KPD = 2.5. Fig. 5.11 compares the locking time and behavior of BBPD vs. MPBBD based DPLLs when the frequency error is 2.5 MHz. The BBPD-DPLL experienced 12 cycle slips before locking (u = 2.5 MHz/(3 × 130 kHz) = 6.41 > 2.95) while the MPBBD-DPLL does not slip at all, which greatly improves locking time (u = 6.41 < 7.05).
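The comparison for the 2.5 MHz case can be checked numerically. The sketch below computes the normalized offset u and tests it against the two simulated no-slip thresholds quoted in this paragraph (2.95 for the BBPD and 7.05 for the MPBBD). The constant and function names are illustrative.

```python
U_NO_SLIP_BBPD = 2.95    # simulated threshold quoted in the text
U_NO_SLIP_MPBBD = 7.05   # simulated threshold quoted in the text

def normalized_offset(f_err_hz, k_p, k_dco_hz):
    """u = f_err / (Kp * Kdco), the normalized initial frequency error."""
    return f_err_hz / (k_p * k_dco_hz)

u = normalized_offset(2.5e6, k_p=3.0, k_dco_hz=130e3)
bbpd_slips = u > U_NO_SLIP_BBPD      # offset exceeds the BBPD threshold
mpbbd_slips = u > U_NO_SLIP_MPBBD    # offset stays below the MPBBD threshold
improvement = U_NO_SLIP_MPBBD / U_NO_SLIP_BBPD
print(f"u = {u:.2f}, BBPD slips: {bbpd_slips}, MPBBD slips: {mpbbd_slips}")
print(f"simulated improvement factor: {improvement:.2f}")
```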
5.4.3 Improved MPBBD (IMPBBD) without cycle slipping to accelerate Frequency Acquisition
The previous sections demonstrated that the use of a PLL with a MPBBD instead of a BBPD greatly improves the locking time by K²PD while it only improves the pull in range by √KPD. In either case, a PLL still encounters cycle slipping which slows down the locking process and restricts the pull in range. During each cycle slip period, the frequency error increases during the first half of the cycle slip period and then decreases during the second half. Based on this understanding, providing a frequency correction in the right direction, every time step, will break the cycle slipping phenomenon.
Similar to the idea of a frequency rotator presented in [60] and [61], Fig. 5.12 shows
a modified transfer function for a MPBBD by merely reversing the sign of either half
of the original transfer function, according to the sign of the frequency error. Due to
the digital nature of DPLLs, the MPBBD transfer function is usually represented as a
look-up-table (LUT). Hence, their transfer function can be easily modified by changing
the associated LUT.
To implement that modification, a simple finite-state machine (FSM) is needed along
with a digital differentiator (i.e. 1 − z−1) to find the sign of the frequency error. The
improved MPBBD (IMPBBD) extends pull in range to ±fref and achieves much faster
locking time as shown in Fig. 5.13. In [62], a modified bang-bang algorithm with FSM
is presented to speed up the locking time for a high speed DPLL. However, the proposed
IMPBBD architecture enhances both locking time and pull in range without increasing
circuit complexity.
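The LUT modification can be illustrated with a small sketch. It assumes the eight-level MPBBD maps each 45 degree slice of phase error to ±1 through ±4, and it takes the sign of the frequency error as an input that the FSM and differentiator would supply. Which half of the transfer function is reversed for a given sign depends on the loop's sign convention, so the mapping below is an assumption for illustration and not the implemented logic.

```python
def mpbbd_lut(phase_deg):
    """Eight-level MPBBD transfer function: 45-degree bins map to +/-1..+/-4."""
    p = (phase_deg + 180.0) % 360.0 - 180.0      # wrap to [-180, 180)
    level = min(int(abs(p) // 45.0) + 1, 4)
    return level if p >= 0.0 else -level

def impbbd_lut(phase_deg, freq_err_sign):
    """Hypothetical IMPBBD: reverse the sign of one half of the LUT so the
    correction always points in the direction of the frequency error.

    freq_err_sign: +1 or -1 for a large frequency error of that sign,
                   0 when only phase or small frequency error is present.
    """
    out = mpbbd_lut(phase_deg)
    if freq_err_sign > 0:
        return abs(out)       # negative half reversed
    if freq_err_sign < 0:
        return -abs(out)      # positive half reversed
    return out                # unmodified MPBBD behavior

for phase in (-170.0, -100.0, -10.0, 10.0, 100.0, 170.0):
    print(phase, mpbbd_lut(phase), impbbd_lut(phase, +1), impbbd_lut(phase, -1))
```

With the sign input at zero the function falls back to the plain MPBBD, which matches the third transfer function described for Fig. 5.12.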
Fig. 5.14 shows the phase error in degrees vs. the integral path period correction
for BBPD, MPBBD, and IMPBBD based DPLL when the initial frequency error is
7.5 MHz. From Fig. 5.14(a) it is obvious that BBPD-DPLL does not converge to a stable
state but instead exhibits limit cycles. MPBBD-DPLL converges to the right frequency
correction but only after experiencing many cycle slips, as shown in Fig. 5.14(b). Finally,
the IMPBBD-DPLL converges to the right frequency correction after semi-slipping for
only 8 cycles, as shown in Fig. 5.14(c). Another set of plots (the same experiment and
conditions) of MPBBD and IMPBBD are shown in Fig. 5.15(a) and Fig. 5.15(b).
5.4.4 Verilog-A Simulation of DPLL
All the simulations presented in this section are based on the MATLAB/Simulink real-
ization of the state-space model shown in Fig. 5.8 where the state variables are updated
at the beginning of each reference period. To validate the accuracy of that model, a time-
step simulation in a Cadence environment is conducted using a Verilog implementation
of the DPLL along with a Verilog-A model of the DCO and reference clock to model
their jitter performance. The time step for the Verilog simulation is 1/100 of the output DCO period (which is equivalent to 1/1000 of the reference period in this case). In this case, the time-step simulation is three orders of magnitude slower than the MATLAB/Simulink simulation though both simulations show very similar locking time and pull in range. For example, Fig. 5.16 shows the locking behavior for a BBPD and a MPBBD based DPLL using Verilog-A simulation when the initial frequency error is 6.0 MHz.

[Figure 5.12: three LUT transfer functions (LUT output from -4 to +4 versus phase error φe, with thresholds at ±¼Y, ±½Y, ±¾Y, and ±Y), labeled "Large positive frequency error", "Large negative frequency error", and "Phase error and small frequency error".]
Figure 5.12: Modification of the MPBBD transfer function to extend pull-in range and reduce acquisition time. The improved MPBBD (IMPBBD) identifies the sign of the initial frequency error and changes its transfer function accordingly.
[Figure 5.13: plot of locking time (µs) versus offset frequency (%), from -100% to +100%.]
Figure 5.13: Pull in range and locking time of BBPD (blue ∗), MPBBD (red), and IMPBBD (green •) based DPLL. The lock-in range of the IMPBBD is extended to ±fref (fref is 100 MHz and fout is 1 GHz while Kp = 3 and Ki = 1/32).

[Figure 5.14 panels: phase error (degrees) versus integrator path period correction, KiKTΨ (ps).]
(a) BBPD-based DPLL; frequency offset is larger than the pull-in range and so the DPLL exhibits a limit cycle without converging to the right frequency (the inset marks the convergence point)
(b) MPBBD-based DPLL has larger pull-in range and faster acquisition time compared with BBPD-based DPLL
(c) IMPBBD-based DPLL has extended pull-in range and very fast acquisition time
Figure 5.14: Integral path output and cycle-slip trajectory for DPLL with three different phase detectors (frequency offset is 7.5 MHz and fref is 100 MHz while Kp = 3 and Ki = 1/32).

[Figure 5.15 panels: (a) MPBBD output and (b) IMPBBD output, phase detector output versus time (µs); (c) integral path output (MPBBD) and (d) integral path output (IMPBBD), integral path period correction (ps) versus time (µs); (e) period jitter (MPBBD) and (f) period jitter (IMPBBD), period jitter (ps) versus time (µs).]
Figure 5.15: Transient simulation comparison between MPBBD and IMPBBD based DPLL (frequency offset is 7.5 MHz and fref is 100 MHz while Kp = 3 and Ki = 1/32).

[Figure 5.16: plot of integral path output (LSB × 130 kHz) versus time (µs).]
Figure 5.16: Integral path output of BBPD (dark blue) vs. MPBBD (light red) based DPLL (frequency offset is 6.0 MHz). The simulation employs uniform time-step sampling (1/100 of DCO period) using a Verilog-A implementation of the DPLL.

5.5 Implemented Architecture
This section presents a silicon implementation of a DPLL with MPBBD to verify some of the theories and conclusions in this chapter. The implemented architecture, shown in Fig. 5.17, is based on an 8-level MPBBD that samples the outputs of a multi-phase oscillator. In [63], a multi-phase oscillator is used to synthesize simple fractional channels (like 1/2, 1/4, 1/8) by making use of the implicit TDC formed by the oscillator and a MPBBD. However, in the proposed architecture, a MPBBD is used for automatic gear shifting to accelerate the phase and frequency locking compared to a binary BBPD. Furthermore, the MPBBD gives the DPLL the ability to track large frequency disturbances after being in lock without involving the FLL loop. By contrast, a binary BBPD will slew and may take a long time to recover, if it ever does, under a large frequency disturbance. In [64], a multi-phase detector is employed where the clock phases are generated locally to allow in-loop modulation. Our proposed MPBBD has a wider lock range and is simpler to implement.
In the proposed architecture, an auxiliary frequency lock loop (FLL) is also employed
to extend the lock in range beyond ±fref. The FLL is enabled upon reset of the DPLL and
guarantees correct frequency operation without degrading the jitter performance during
lock. The FLL is composed of a re-timing circuit, a counter outputting the number of
output clocks within each reference cycle, an accumulator for the phase of the synthesized
channel based on a given frequency control word (FCW), and finally a digital subtractor.
The FLL controls a 7-bit coarse capacitor bank to bring the four-stage digitally controlled
ring oscillator (DCO) as close as possible to the required output frequency. The FLL
is then disabled after frequency locking is achieved. Accordingly, the proposed DPLL architecture saves the power of the high speed feedback counter as well as the power of the re-timing circuit during steady state operation (output clock is phase and frequency locked to the reference clock).

[Figure 5.17: block diagram showing the reference (Ref), the MPBBD sampling the DCO phases A, B, C, D and their complements, the LUT with gear shifting, the loop filter, and the frequency lock loop (FLL) with counter, FCW input, lock detect, and "Freeze FLL" control.]
Figure 5.17: The architecture of the implemented DPLL. The FLL and the high-speed counter, as well as the synchronization structure, are disabled by a lock detector once frequency lock occurs. The MPBBD locks the phase of the output clock (phase A) to the reference clock.
In a classical PLL, whether it is analog or digital, the feedback divider’s phase noise appears at the output amplified by a factor of N². On the other hand, using a sub-sampling phase detector [65] eliminates the amplification factor of input-referred noise and totally eliminates the divider noise. The reference clock phase noise is still multiplied by N² when transferred to the output.
The presented MPBBD is similar to the sub-sampling detector where the oscillator
output is sub-sampled by the reference clock and no divider is used during phase lock.
Accordingly, the phase noise of the proposed DPLL is independent of the frequency
control word (FCW) i.e. the multiplication factor. The main source of in-band noise is
the reference clock noise and the noise on power supplies.
A lock detector circuit is continuously checking the output patterns of MPBBD and
FLL to determine whether the DPLL is frequency and phase locked or not. During steady
state, the FLL loop is disabled. Compared to a regular binary BBPD, the MPBBD is
able to track larger phase or frequency error without reactivating the FLL and without
slewing for a long time. However, if the MPBBD output is slewing for a long time due
to a very large frequency error, the FLL gets enabled again until locking is achieved.
5.5.1 DCO
The DCO is a four-stage differential ring oscillator, as shown in Fig. 5.19, where each
stage has a 7-bit coarse capacitor bank and an 8-bit fine capacitor bank. The 4-MSBs of
the fine bank are binary encoded while the 4-LSBs are thermometer encoded to reduce
switching activity during locking. The tuning capacitors are implemented as switched
active MOS devices, as shown in Fig. 5.18. The layout of the DCO is highly regular and
so automated layout using, for example, TCL is possible if a highly automated design
flow is sought.
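The split of the 8-bit fine word into binary and thermometer parts can be sketched as follows. The helper names are hypothetical, and the weighting (one binary LSB equal to 16 thermometer unit capacitors) is read from the annotation "F<4> == 16 * T<i>" in Fig. 5.18, so it should be treated as an assumption about the implementation.

```python
def split_fine_code(code):
    """Split an 8-bit fine word into the 4 binary MSBs and a 15-line
    thermometer code for the 4 LSBs (hypothetical helper)."""
    if not 0 <= code <= 255:
        raise ValueError("fine code must fit in 8 bits")
    binary_msbs = (code >> 4) & 0xF
    lsbs = code & 0xF
    # Thermometer code: the first `lsbs` unit lines are on, the rest are off
    thermometer = [1 if i < lsbs else 0 for i in range(15)]
    return binary_msbs, thermometer

def unit_caps_enabled(code):
    """Total enabled capacitance in thermometer units, assuming one binary
    LSB equals 16 unit capacitors."""
    binary_msbs, thermometer = split_fine_code(code)
    return 16 * binary_msbs + sum(thermometer)

msbs, thermo = split_fine_code(0x35)
print(msbs, thermo, unit_caps_enabled(0x35))
```

Under this assumption a one-LSB change toggles a single thermometer line unless the 4-bit LSB field rolls over, which is consistent with the stated goal of reducing switching activity during locking.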
The DCO output is a rail-to-rail clock signal where different frequencies are achieved
by adding or removing MOS capacitors. Accordingly, the power consumption, which is
proportional to f·C·V², is quite consistent regardless of the DCO frequency. The power
consumption can be brought down by using a current steering programmable DCO.
In this case, the mismatches between the DCO phases are not large compared to one
quarter DCO cycle. In general, the effect of a phase mismatch or duty cycle distortion
could affect the speed of locking under a large frequency disturbance. However, it would not
affect the steady state jitter performance when the loop operates, effectively, as a BBPD.
5.5.2 MPBBD
The eight output phases of the DCO (four differential phases) are sampled by the refer-
ence clock. This results in a 4-bit output stream which carries information of the phase
error sign as well as its magnitude, as shown in Fig. 5.19 and Fig. 5.20. The raw output
of the MPBBD is semi-thermometer encoded and passed through a LUT to generate a
binary representation of the phase error magnitude and sign: -4, -3, -2, -1, +1, +2, +3
or +4 as shown in Fig. 5.20. The loop filter is programmable such that the gain of both
the proportional and integral paths can be altered to achieve a specific performance. For example, if the output of the MPBBD is [ACBD] = 0110, then the reference clock is leading the output clock by more than 90 degrees but less than 135 degrees. And so, the corresponding LUT output is -3, providing a larger phase correction signal to the DPLL loop filter.

[Figure 5.18: schematic of the differential delay unit (In+/In-, out+/out-) with switched MOS capacitors: the fine bank F<7:4> and T<15:1> set by the BB-PLL, the coarse bank C<6:0> set by the FLL, and the annotation F<4> == 16 * T<i>.]
Figure 5.18: The programmable delay unit used to form a four-stage DCO. Each unit has a 7-bit coarse cap configuration and an 8-bit fine cap implemented as a combination of 4-bit binary along with 15 thermometer caps.

During steady state operation, assuming the phase error remains in the range ±π/4 (±45 degrees), the MPBBD alternates between +1 and -1 with the same dynamics as a PLL employing a simple binary BBPD. However, during initial locking when the DCO frequency is not locked, the MPBBD output will span the full range from -4 to +4. Similarly, when the input clock is frequency modulated the MPBBD remains active.
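The LUT of Fig. 5.20 is small enough to write out as a dictionary. The sketch below decodes the sampled flip-flop pattern [ACBD] into the MPBBD level and the equivalent binary BBPD decision. The table entries are those listed in Fig. 5.20, while the names `MPBBD_LUT` and `decode_mpbbd` are illustrative.

```python
# Flip-flop pattern [A C B D] -> MPBBD LUT output, as tabulated in Fig. 5.20
MPBBD_LUT = {
    "0100": -4, "0110": -3, "0010": -2, "0011": -1,
    "1011": +1, "1001": +2, "1101": +3, "1100": +4,
}

def decode_mpbbd(acbd):
    """Return (mpbbd_level, bbpd_level) for a sampled 4-bit pattern.

    Patterns outside the table raise KeyError, since the source does not
    say how such codes are handled.
    """
    level = MPBBD_LUT[acbd]
    bbpd = 1 if level > 0 else -1   # a binary BBPD keeps only the sign
    return level, bbpd

print(decode_mpbbd("0110"))   # the example pattern discussed in the text
```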
5.5.3 Loop Filter
The loop filter has a 16-bit output where the 8-MSBs are directly driving the DCO fine
capacitor bank (which is a combination of a 4-bit binary ratioed bank and a 15-bit unary
thermometer bank). The 8-LSBs are considered fractional bits that represent a frequency
step smaller than one DCO LSB step. To realize finer DCO frequency resolution, the 8-
LSBs at the loop filter output are fed to a first order noise-shaping delta-sigma modulator
(DSM) that drives one LSB capacitor at one fourth of the output frequency i.e. the DSM
frequency ranges from 150-375 MHz.
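A first-order ∆Σ modulator for the fractional bits is commonly built as an accumulator whose carry-out toggles the LSB capacitor. The sketch below shows that structure for an 8-bit fractional word. Whether the chip uses exactly this accumulator form is an assumption, and the function name is illustrative.

```python
def first_order_dsm(frac_word, n_cycles, bits=8):
    """First-order delta-sigma modulator as an overflowing accumulator.

    frac_word : fractional control word, 0 .. 2**bits - 1
    Returns the 1-bit output stream (the accumulator carry per DSM clock).
    """
    modulus = 1 << bits
    if not 0 <= frac_word < modulus:
        raise ValueError("fractional word out of range")
    acc = 0
    stream = []
    for _ in range(n_cycles):
        acc += frac_word
        carry = acc >> bits          # 1 when the accumulator overflows
        acc &= modulus - 1           # keep the residue (quantization error)
        stream.append(carry)
    return stream

stream = first_order_dsm(frac_word=64, n_cycles=256)
print(sum(stream) / len(stream))     # average duty of the LSB capacitor
```

Over 256 DSM clocks the number of ones equals the fractional word, so the average DCO frequency step is frac_word/256 of one LSB.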
[Figure 5.19: timing diagram of the four differential DCO phases (A, B, C, D and their complements) sampled by the reference clock (Ref), with a phase circle mapping positive phase error (+ve Φe) to MPBBD outputs +1 to +4 and negative phase error (-ve Φe) to -1 to -4.]
Figure 5.19: Timing diagram of the multi-phase DCO sampled by a reference clock at some point. Based on the sequence of MPBBD outputs (01111000), the LUT provides an indication of the phase error magnitude between phase A and reference clock as shown in the circle on the right bottom side.

FF out (A C B D)    LUT: MPBBD    LUT: BBPD
0 1 0 0             -4            -1
0 1 1 0             -3            -1
0 0 1 0             -2            -1
0 0 1 1             -1            -1
1 0 1 1             +1            +1
1 0 0 1             +2            +1
1 1 0 1             +3            +1
1 1 0 0             +4            +1

[Figure 5.20 plot: LUT output (from -4 to +4) versus phase error φe, with thresholds at ±¼Z, ±½Z, ±¾Z, and ±Z.]
Figure 5.20: The transfer function of the MPBBD and its LUT (thick solid blue) vs. BBPD (thin dashed red).
5.5.4 Simulation Results
[Figure 5.21: plot of absolute phase error (ps) versus time (µs).]
Figure 5.21: Absolute phase error of the output clock of the DPLL with respect to an ideal clock with (a) binary BBPD (thin dashed blue) and (b) MPBBD (solid thick red). The initial frequency error is 300 MHz. Data is clipped below 6.5 µs as reset was applied at that moment after loading the right loop configurations. The FLL takes 15.5 µs (310 reference cycles) while the PLL takes 30 µs (600 cycles) when the BBPD (a) is used and 5 µs (100 cycles) when the MPBBD (b) is employed.
The DPLL settling time is inversely proportional to the loop bandwidth. When a
MPBBD is employed, the DPLL loop bandwidth is increased by factor of 2.5, on average,
during the initial frequency and phase locking operation (in comparison to a binary
BBPD offering the same jitter performance in lock). Accordingly, the DPLL is expected to lock
faster by approximately a factor of up to 2.5² = 6.25. Fig. 5.21 shows a behavioral
simulation of the absolute phase error of the output clock converging to very small value
(ideally zero) after phase lock. The speed of convergence (which is an indication of the
settling time) is dependent on the type of phase detector used and on the initial frequency
error. If a binary BBPD is used and the initial frequency error is around 300 MHz (34%
locking range), the phase lock operation takes around 30 µs. In that case, the BBPD
slews for a long time before finding the proper code to drive the DCO, as shown in Fig.
5.22(a). On the other hand, using the MPBBD only 5 µs is needed to achieve phase lock
given the same frequency error and same initial conditions.
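The quoted numbers can be cross-checked with a few lines of arithmetic. The sketch below only restates values from the text (bandwidth factor of 2.5, lock times of 30 µs and 5 µs); the variable names are illustrative.

```python
bandwidth_gain = 2.5                 # average loop-bandwidth increase with the MPBBD
predicted_speedup = bandwidth_gain ** 2

t_lock_bbpd_us = 30.0                # simulated phase-lock time with the binary BBPD
t_lock_mpbbd_us = 5.0                # simulated phase-lock time with the MPBBD
simulated_speedup = t_lock_bbpd_us / t_lock_mpbbd_us

print(predicted_speedup, simulated_speedup)  # prints: 6.25 6.0
```

The simulated speed-up of about 6 is consistent with the predicted upper bound of 6.25.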
[Plots: detector output (-4 to 4) versus time (µs), from 10 to 60.]
(a) Binary BBPD with maximum gain of ±1
(b) MPBBD with maximum gain of ±4
Figure 5.22: The mapped output of the bang-bang detector (from the LUT) during frequency and phase lock. The binary BBPD slews when the phase error is high and takes a lengthy time to recover. On the other hand, the MPBBD automatically gears its gain according to the phase error magnitude until lock is achieved.
5.5.5 Measurement Results
A prototype chip was designed and fabricated in the STM 28 nm CMOS LP process.
Fig. 5.25 shows a die photograph. The active area is less than 0.008 mm² including
the decoupling capacitors and output buffers. The reference clock is an off-chip, high-quality
20 MHz temperature-compensated crystal oscillator from Wenzel Associates, specified at
1 ppm and -143 dBc/Hz phase noise at 1 kHz.
The measured coarse DCO step is around 2.5 MHz/step while the fine DCO step
is 13 kHz/step on average. Fig. 5.23 shows the phase noise spectrum of a 1.2 GHz PLL
output simulated using Verilog-A and MATLAB, overlaid on the phase noise measured
using an Agilent spectrum analyzer. The in-band phase noise is -98.38 dBc/Hz at 50 kHz
offset and the out-of-band phase noise is -142 dBc/Hz at 500 MHz offset. Switching the FLL
on or off has a negligible effect on the spectrum. Also, Fig. 5.24 shows the phase noise
spectrum of a 1.4 GHz PLL output captured using an Agilent spectrum analyzer. The
in-band phase noise is -96.46 dBc/Hz at 100 kHz offset and out-of-band phase noise is
-143 dBc/Hz at 500 MHz offset.
Figure 5.23: DPLL output phase-noise spectrum at 1.20 GHz: simulation (blue) vs. measurement (black) captured by an Agilent E4448A spectrum analyzer. The in-band noise is -98.32 dBc/Hz while the loop bandwidth is around 1.7 MHz.
The CMOS stages in the DCO have inherently low power supply noise rejection and
must therefore generally be operated from a regulated supply voltage. No regulator
was integrated into the present design, resulting in higher-than-expected phase noise.
The DPLL locks to the reference over the range 880 MHz to 1.20 GHz using a 1.1 V
power supply. The in-band phase noise was almost the same for the whole locking range.
The power consumption of the DPLL was 502 µW (1.1 V × 456 µA) excluding the DCO.
Disabling the FLL after locking saved around 85 µW of power in lock. The power
savings would be larger if the DCO (and, hence, the frequency counter) was working at
a higher frequency. The DCO consumes between 2.9 and 3.1 mW depending on the frequency
of operation. Using a 0.7 V supply, the DPLL works at 440 MHz while the DPLL (excluding
the DCO) consumes only 64 µW (0.7 V × 91 µA).
Figure 5.24: DPLL output phase-noise spectrum at 1.40 GHz captured by an Agilent E4448A spectrum analyzer. The in-band noise is -96.48 dBc/Hz while the loop bandwidth is around 1.7 MHz.
Figure 5.25: Die photograph of the DPLL in 28 nm CMOS LP STMicroelectronics technology (active area is less than 0.008 mm²).
5.6 Conclusion
In summary, DPLLs with BBPDs are becoming more widely used compared to TDC-
based DPLLs, due to their simplicity and low power consumption. However, binary
BBPDs suffer from slewing, which limits their pull-in frequency range and slows the
locking process if the initial frequency error is large. A multi-phase bang-bang detector
(MPBBD) is proposed to achieve a fast locking time and to extend the pull-in range while
reducing slewing and cycle slipping. This is done by realizing automatic gear shifting
that produces a large gain for large frequency and phase errors and automatically shifts to
a low gain setting in steady state.
To quantify the advantages of using a MPBBD over a BBPD, an analysis was pre-
sented of the PLL's behavior given a large initial frequency error. The analysis provides
an accurate closed form for the locking time, number of cycle slips, and pull-in range for
a PLL with a quantized phase detector that has an absolute average large-signal gain of
$K_{PD}$. The locking time and the number of cycle slips are both improved by $K_{PD}^2$. However, the
pull-in frequency range is extended by only $\sqrt{K_{PD}}$.
Later, an improved MPBBD is proposed to extend the pull-in range beyond $\sqrt{K_{PD}}$ to
reach $\pm f_{ref}$ by modifying the transfer function of a MPBBD according to the frequency
error sign. This modification can enable the use of an improved MPBBD for high-speed
DPLLs without the need to implement a special frequency locking aid and without the
need for a frequency divider.
The analogy between a BB-DPLL and a ∆Σ modulator is presented, where the occur-
rence of a first cycle slip in the BB-DPLL is analogous to overloading the quantizer in
a ∆Σ modulator. Accordingly, the maximum initial frequency error that will not cause
any cycle slipping (i.e., the acquisition frequency range) is found by using limits well known
in the ∆Σ literature. The MPBBD-DPLL can handle a $K_{PD}$ times larger initial frequency
error without cycle slipping compared with the BBPD-DPLL.
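The scaling relations summarized above can be collected in a short helper. This is only a sketch of the stated proportionalities; the function name `mpbbd_scaling` and the dictionary keys are hypothetical, and the example gain of 4 is chosen purely for illustration.

```python
import math


def mpbbd_scaling(k_pd):
    """Improvement factors relative to a binary BBPD for a large-signal gain k_pd."""
    return {
        "lock_time_reduction": k_pd ** 2,             # locking time improves by K_PD^2
        "cycle_slip_reduction": k_pd ** 2,            # number of cycle slips improves by K_PD^2
        "pull_in_range_extension": math.sqrt(k_pd),   # pull-in range grows by sqrt(K_PD)
        "slip_free_acquisition_extension": k_pd,      # slip-free initial frequency error grows by K_PD
    }


for name, factor in mpbbd_scaling(4.0).items():
    print(f"{name}: x{factor:g}")
```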
The chapter concludes by presenting a silicon implementation of a MPBBD-DPLL
in an STM 28 nm LP CMOS process. The MPBBD-DPLL has a much faster locking time
and a wider pull-in range compared with the BBPD-DPLL. Furthermore, the MPBBD has the
same steady-state jitter performance as a BBPD-based DPLL with little design overhead.
The modification is minimal, requiring only three additional flip-flops and a small LUT.
The presented architecture disables the high-speed logic used for frequency counting
after achieving frequency lock to save power during steady-state operation.
Chapter 6
Conclusion
DPLLs have drawn a great deal of interest over the last decade. The main driver for
research in DPLLs is the continued transition from one CMOS technology node to another
due to the pressures of integrating ever more complex system-on-chip (SoC) designs. While
analog PLL design has become more complicated with technology scaling, DPLLs take
advantage of technology scaling to realize phase detection and frequency tuning with
better resolution. Furthermore, the DPLL design cycle can be automated to a great
extent by using well-developed software tools and methodologies for digital design and
verification.
This thesis investigates three problems related to phase and frequency detection in
DPLLs. The first part of the thesis deals with the reduction of the quantization noise
of TDCs to enable wide bandwidth operation and low fractional spurs. The second
part proposes different digital solutions to the dead-zone behavior during integer mode
operation where the DPLL dynamics could be unpredictable and dependent on initial
conditions. Finally, an analysis of the pull-in and locking behavior of a DPLL with a
quantized phase detector is presented, and an improvement is proposed.
The next section summarizes the thesis contributions and lists the publications that
arose over the course of the research. The thesis concludes with insights into future research
directions in the area of DPLLs.
6.1 Contributions
This thesis explores the analysis and design of high-performance DPLLs in sub-micron
CMOS processes. Chapter 3 presents a fractional DPLL that incorporates a novel low-
power two-step coarse-fine TDC to achieve low in-band phase noise. The
DPLL employs a stochastic TDC for the fine TDC stage while still achieving a wide locking
range using a coarse delay-line TDC. The thesis shows a highly sophisticated DPLL that
achieves performance competitive with state-of-the-art PLLs while preserving
the simplicity of the digital nature of the DPLL and the coarse-fine stochastic TDC. The
thesis brings DPLL research closer to a fully automated flow. Also, the thesis provides
a statistical analysis of the resolution achievable in practice for a given stochastic TDC,
which agrees closely with the presented measured results. The design incorporates an
on-chip calibration algorithm for the coarse TDC based on a balanced mean code density
test. By using a balanced mean code density test, the number of registers required for
the calibration algorithm is reduced by 30%. More importantly, a balanced mean code
density test removes the need for off-chip calibration or for an SoC to perform on-chip
calibration.
Measurement results of the DPLL show an in-band phase noise of -107 dBc/Hz,
which is equivalent to a 4 ps TDC resolution, approximately an order of magnitude better
than an inverter delay in this process technology. The integrated random jitter is 213 fs
rms for a 2 GHz output carrier frequency with a 700 kHz loop bandwidth. The calibration
reduces worst-case spurs by 16 dB. The proposed DPLL consumes only 15.2 mW in
0.13 µm CMOS, of which 4.4 mW is consumed in the TDC. Analysis, simulation, and
measurement results for the DPLL are summarized in the following publications:
Samarah, A., Chan Carusone, A. “A Digital Phase-Locked Loop With Calibrated
Coarse and Stochastic Fine TDC”; IEEE Custom Integrated Circuits Conference
(CICC), San Jose, California, September 2012. [16]
Samarah, A., Chan Carusone, A. “A Digital Phase-Locked Loop With Calibrated
Coarse and Stochastic Fine TDC”; IEEE Journal of Solid-State Circuits, vol. 48,
no. 8, pp. 1829-1841, Aug. 2013. [8]
Chapter 4 investigates the dead-zone problem during integer mode operation caused by
the TDC quantized response. The dead-zone behavior results in limit cycle behavior
causing higher than expected in-band phase noise and strong in-band spurious tones.
To alleviate this problem, a novel noise-shaped offset is added to the phase error, in
the digital domain, to keep the TDC active and away from the dead-zone. Extensive
simulations and measurements of a DPLL prototype in a 0.13 µm CMOS process verify
the effectiveness of the proposed digital solution. The work is published in the following
conference paper:
Samarah, A., Chan Carusone, A. “A Dead-Zone Free and Linearized Digital PLL”;
IEEE International Conference on Electronics, Circuits, and Systems (ICECS),
Seville, Spain, December 2012. [44]
Finally, Chapter 5 presents a rigorous mathematical analysis of a DPLL employing a
quantized phase detector during frequency acquisition, where the DPLL usually exhibits
cycle slipping. The analysis finds that the pull-in range is proportional to the square
root of the phase detector large-signal gain, $\sqrt{K_{PD}}$, while the locking time is inversely
proportional to $K_{PD}^2$. Based on the findings of this analysis, an MPBBD-DPLL is pro-
posed to accelerate frequency and phase locking and to increase the pull-in range
while maintaining the same steady-state performance as the BBPD-DPLL. The proposed
DPLL reduces power consumption by disabling the high-speed counter and re-timing cir-
cuit in the feedback loop after achieving frequency lock. Also, an improved version of
the MPBBD is suggested to extend the pull-in range up to the reference frequency,
which could eliminate the frequency lock loop and feedback counter for DPLLs and dig-
ital CDRs. Theoretical findings, as well as simulation and measurement results, are
documented in the following publications:
Samarah, A., Chan Carusone, A. “Multi-Phase Bang-Bang Digital Phase Lock
Loop with Accelerated Frequency Acquisition”; IEEE International Symposium on
Circuits and Systems (ISCAS), Lisbon, Portugal, May 2015. [58]
Samarah, A., Chan Carusone, A. “Cycle-Slipping Pull-In Range of Bang-Bang
PLLs”; IEEE International New Circuits and Systems Conference (NEWCAS), Grenoble,
France, June 2015. [66]
Samarah, A., Chan Carusone, A. “Discrete Time Analysis of Multi-Phase Bang-
Bang Phase Lock Loops”; IEEE Transactions on Circuits and Systems, to be
submitted, 2016.
6.2 Future Work
Most research in the field of DPLLs focuses on improving the resolution of phase detection
and frequency tuning to achieve competitive phase noise and jitter performance similar
to classical analog PLLs. However, this trend depends on designing power hungry phase
detectors, especially for high-speed operation. There is a need to rethink the DPLL
architecture to employ simple but smart phase detectors, since simple detectors are
naturally easy to calibrate and can operate at high speed.
Though the inverter delay gets smaller with CMOS technology scaling, the mismatch
and PVT variations become worse. Hence, the resolution of a typical inverter-delay-
line TDC improves over time, but the increasing mismatch variations limit the
achievable resolution and necessitate the use of a calibrated inverter with a wide
calibration range. The calibration complicates the design and further limits the achievable
resolution. Researchers must therefore take a different approach to minimize the TDC
quantization noise and to keep DPLL design simple and scalable. One approach is to
use a noise-shaped coarse TDC with redundant information for error correction, similar
to the error correction implemented in pipelined ADCs.
Another promising research direction is the design of a DPLL compiler similar to the memory
compilers used within modern digital design flows. A DPLL compiler may accept high-level
specifications and produce synthesized gate-level Verilog for distribution as intellectual
property (IP). The main obstacle to achieving this goal is the difficulty of design automa-
tion for the TDC and DCO blocks due to their analog nature. Though recent works, like
[67], target DCO design automation, there is a long way to go before building a robust
DCO and TDC compiler for different applications and specifications.
Appendix A
Schematics
CML output buffer
Figure A.1: Schematic of the four-stage, 50 Ω output driver used to send the DCO output off-chip. The last differential pair M4 is sized W = 176 µm / L = 120 nm, while the load resistor R4 = 62.5 Ω. The previous stages are sized according to the following: transistor sizes of M4 = 2 × M3 = 4 × M2 = 8 × M1 and resistor values of R4 = R3/2 = R2/4 = R1/8.
CML latch and divide by two
Figure A.2: (a) Schematic of the CML latch used in the divide-by-2 circuit. R = 2 kΩ, while M1 = M2 = M3 have W = 6 µm and L = 120 nm. (b) Schematic of the divide-by-2 circuit using two CML latches.
CML to CMOS conversion
Figure A.3: Schematic showing the CML to CMOS conversion employed after the CML divide-by-2. The CML signal is AC coupled through Cc = 150 fF and then passed to a CMOS inverter with feedback resistor Rf = 35 kΩ to define the input common mode. The small cross-coupled CMOS inverters (W = 160 nm and L = 120 nm) are used to ensure differential operation. Another stage of CMOS inversion follows, with a size similar to that of the first stage (W = 13.02 µm and L = 120 nm).
Package model
Figure A.4: (a) Illustrative diagram of the fabricated chip mounted on a QFN36 package and soldered on the PCB. (b) Lumped model of the output pads, bond wires, lead, and PCB trace capacitance. The chip dimensions are 1 mm × 1 mm while the QFN36 dimensions are 5 mm × 5 mm. Accordingly, the bond wire could be 2-2.5 mm long, and so Lbw = 2.5 nH. The extracted pad capacitance was 90 fF.
DFF used in the coarse TDC
(a) Clocked sense amplifier
(b) Set-reset latch
Figure A.5: Sense-amplifier flip flop with a narrow metastability window [15]
Appendix B
Noise Contribution to Timing Jitter
The purpose of this appendix is to explain and demonstrate the direct relationship be-
tween timing jitter and noise sources. The information presented below is based on
application note "APP 3631" from Maxim Integrated.¹
B.1 Noise Floor Contribution to Timing Jitter
There are a number of factors that contribute to random timing jitter, including broad-
band noise, phase noise, spurs, slew rate, and bandwidth. Both phase and broadband
noise are random, whereas spurs are deterministic responses caused by various identifi-
able interference signals, such as crosstalk and power supply coupling. Slew rate
and bandwidth also affect jitter.
Mathematically, one can represent a sinusoid containing broadband white noise with
the following equation:
$$V(t) = A \sin(2\pi f_o t) + v_n(t) \tag{B.1}$$
where $v_n(t)$ is the noise voltage at time $t$. The random noise $v_n(t)$ has a Gaussian
distribution with zero mean. The probability distribution $PDF(v_n)$ of the noise voltage is:
$$PDF(v_n) = \frac{1}{\sqrt{2\pi v_{nRMS}^2}} \, e^{-\frac{v_n^2}{2 v_{nRMS}^2}} \tag{B.2}$$
where $v_{nRMS}$ is the RMS noise voltage.
¹ https://www.maximintegrated.com/en/app-notes/index.mvp/id/3631
The broadband noise is a significant contributor to timing jitter. The total root-
mean-square (RMS) noise voltage is the square root of the noise-floor power density
integrated over the bandwidth. The RMS voltage noise is translated into timing jitter
through the slew rate:
$$t_n = \frac{v_n}{SR} \tag{B.3}$$
where $SR$ is the slew rate, given by
$$SR \approx \frac{\Delta V}{\Delta t} = \frac{A \sin(2\pi f_o \Delta t)}{\Delta t} \approx A \cdot 2\pi f_o \tag{B.4}$$
assuming that the timing jitter $\Delta t$ is very small compared with the fundamental period.
Accordingly, the squared RMS jitter is given by
$$\langle J_{noise\text{-}floor}^2 \rangle = \frac{v_{nRMS}^2}{(2\pi f_o A)^2} \tag{B.5}$$
It appears that a waveform with a faster slew rate results in lower jitter. However, a faster
slew rate requires a higher operating bandwidth, which increases the RMS noise of the
system. Because the integrated noise power is directly proportional to the bandwidth, system
designers must carefully choose the slew rate and bandwidth to minimize jitter.
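As a quick numerical illustration of Eq. B.5, the sketch below evaluates the noise-floor jitter for a sinusoid. The function name `noise_floor_jitter` and the example values (1 mV RMS noise, 0.5 V amplitude, 1 GHz carrier) are hypothetical and chosen only to show the orders of magnitude involved.

```python
import math


def noise_floor_jitter(v_n_rms, amplitude, f_o):
    """RMS jitter in seconds from Eq. B.5: v_nRMS / (2*pi*f_o*A)."""
    return v_n_rms / (2.0 * math.pi * f_o * amplitude)


# Hypothetical example: 1 mV RMS noise on a 0.5 V amplitude, 1 GHz sinusoid
j = noise_floor_jitter(1e-3, 0.5, 1e9)
print(f"{j * 1e15:.1f} fs")  # prints: 318.3 fs
```

Doubling the amplitude (and hence the slew rate) halves this jitter, provided the RMS noise voltage stays the same.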
B.2 Phase Noise and Spurs Contribution to Timing Jitter
To derive the necessary equations relating phase noise to jitter, consider the following
sinusoid containing phase noise:
$$V(t) = A \sin(2\pi f_o t + \Phi(t)) \tag{B.6}$$
where $A$ is the amplitude, $f_o$ is the nominal frequency, and $\Phi(t)$ is the phase noise.
Jitter is commonly measured at the 0 V crossing between two or more periods. Taking two
consecutive 0 V crossing instants $t_i$ and $t_{i+1}$, one can write:
$$2\pi f_o t_i + \Phi(t_i) = 2\pi i \tag{B.7}$$
$$2\pi f_o t_{i+1} + \Phi(t_{i+1}) = 2\pi (i+1) \tag{B.8}$$
such that $i = 0, 1, 2, \ldots$ Subtracting Eq. B.7 from Eq. B.8 gives
$$2\pi f_o [t_{i+1} - t_i] + [\Phi(t_{i+1}) - \Phi(t_i)] = 2\pi \tag{B.9}$$
But
$$t_{i+1} - t_i = T_o + \Delta t \tag{B.10}$$
where $T_o = 1/f_o$ and $\Delta t$ is the period jitter, i.e., the period variation over time. Substituting
Eq. B.10 into Eq. B.9, one can write:
$$2\pi f_o [T_o + \Delta t] + [\Phi(t_{i+1}) - \Phi(t_i)] = 2\pi \tag{B.11}$$
$$2\pi f_o \Delta t + [\Phi(t_{i+1}) - \Phi(t_i)] = 0 \tag{B.12}$$
$$\Delta t = \frac{T_o}{2\pi} [\Phi(t_i) - \Phi(t_{i+1})] \tag{B.13}$$
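Accordingly, the period jitter is the difference function of the absolute phase $\Phi(t_i)$. Now,
the squared RMS jitter is
$$\langle \Delta t^2 \rangle = \frac{T_o^2}{4\pi^2} \left[ \langle \Phi(t_i)^2 \rangle - 2 \langle \Phi(t_i)\Phi(t_{i+1}) \rangle + \langle \Phi(t_{i+1})^2 \rangle \right] \tag{B.14}$$
Because $\Phi(t)$ is a stationary process:
$$\langle \Phi(t_i)^2 \rangle = \langle \Phi(t_{i+1})^2 \rangle = \int_{-\infty}^{\infty} S_\Phi(f) \, df \tag{B.15}$$
where $S_\Phi(f)$ is the power spectral density of the phase noise $\Phi(t)$, and $f$ is the offset
frequency from the carrier frequency $f_o$.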
The middle term of Eq. B.14 can be written using the autocorrelation function of the
phase noise $\Phi(t)$:
$$\langle \Phi(t_i)\Phi(t_{i+1}) \rangle = R_\Phi(t_{i+1} - t_i) = R_\Phi(\tau) = \int_{-\infty}^{\infty} S_\Phi(f) \cos(2\pi f \tau) \, df \tag{B.16}$$
where $R_\Phi(\tau)$ is the autocorrelation function of $\Phi(t)$ and $\tau \cong T_o$.
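Now, one can write the squared RMS jitter in Eq. B.14 as follows:
$$\langle \Delta t^2 \rangle = \frac{T_o^2}{4\pi^2} \left[ 2 \int_{-\infty}^{\infty} S_\Phi(f) \, df - 2 \int_{-\infty}^{\infty} S_\Phi(f) \cos(2\pi f T_o) \, df \right] \tag{B.17}$$
$$\langle \Delta t^2 \rangle = \frac{2 T_o^2}{4\pi^2} \int_{-\infty}^{\infty} S_\Phi(f) \left[ 1 - \cos(2\pi f T_o) \right] df \tag{B.18}$$
Recalling the trigonometric identity $1 - \cos(2\pi f T_o) = 2 \sin^2(\pi f T_o)$ and assuming the
phase noise is symmetrical, one can write:
$$\langle \Delta t^2 \rangle = \frac{8 T_o^2}{4\pi^2} \int_{0}^{\infty} S_\Phi(f) \sin^2(\pi f T_o) \, df \tag{B.19}$$
$S_\Phi(f)$ is approximately equal to the phase noise $L(f)$ for close-in phase noise ($\Delta f < f_o$,
usually $\Delta f = f_o/2$), so
$$\langle J_{period}^2 \rangle = \langle \Delta t^2 \rangle = \frac{8 T_o^2}{4\pi^2} \int_{0}^{\Delta f} L(f) \sin^2(\pi f T_o) \, df \tag{B.20}$$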
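Eq. B.20 is straightforward to evaluate numerically. The following sketch uses midpoint integration with $\Delta f = f_o/2$; the function name `period_jitter_rms`, the number of integration points, and the flat -150 dBc/Hz example profile are assumptions for illustration. It also assumes that $L(f)$ is supplied in dBc/Hz and converted to a linear ratio before integration.

```python
import math


def period_jitter_rms(l_dbc_hz, f_o, n_points=20000):
    """RMS period jitter (seconds) from Eq. B.20, with delta_f = f_o / 2.

    l_dbc_hz is a callable returning the phase noise L(f) in dBc/Hz at offset f (Hz).
    """
    t_o = 1.0 / f_o
    delta_f = f_o / 2.0
    df = delta_f / n_points
    integral = 0.0
    for k in range(n_points):
        f = (k + 0.5) * df                        # midpoint of the k-th frequency bin
        l_lin = 10.0 ** (l_dbc_hz(f) / 10.0)      # dBc/Hz converted to a linear ratio
        integral += l_lin * math.sin(math.pi * f * t_o) ** 2 * df
    return math.sqrt(8.0 * t_o ** 2 / (4.0 * math.pi ** 2) * integral)


# Hypothetical flat phase-noise floor of -150 dBc/Hz on a 1 GHz clock
j_period = period_jitter_rms(lambda f: -150.0, 1e9)
print(f"{j_period * 1e15:.1f} fs")
```

For a flat profile $L_0$ the integral has the closed form $L_0 T_o / (2\pi^2)$ for the squared jitter, which is a convenient sanity check for the numerical result.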
Spurs also contribute to timing jitter, especially in oscillators. Spurs are caused by
phase-locked-loop reference spurs, supply coupling, crosstalk from nearby circuitry, and
other sources.
$$\langle J_{spurs}^2 \rangle = \frac{4 T_o^2}{4\pi^2} \sum_{m} L(f_m) \sin^2(\pi f_m T_o) \tag{B.21}$$
This assumes that the spurs are not symmetrical, so the spurs on both sides of the carrier
must be included in the jitter calculation. $L(f_m)$ is the spur amplitude relative to the
carrier, given in dBc. (If the spurs are symmetrical, one may use a factor of 8 instead of 4
and account for only one side of the spectrum's spurs.)
Broadband noise (white noise floor), phase noise, and spurs are the three contrib-
utors to timing jitter. Broadband noise is purely random and uncorrelated, thus the
jitter it produces does not accumulate. The latter two, however, generally do produce
accumulating jitter. The squared total timing jitter is the sum of the three squared
jitters.
$$\langle J_{total}^2 \rangle = \langle J_{noise\text{-}floor}^2 \rangle + \langle J_{period}^2 \rangle + \langle J_{spurs}^2 \rangle \tag{B.22}$$
And the total RMS period jitter is $J_{rms} = \sqrt{\langle J_{total}^2 \rangle}$. If calculating N-period jitter
(i.e., N-cycle jitter), replace $\tau$ by $N \cdot T_o$ rather than by $T_o$.
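Eqs. B.21 and B.22 can be sketched in a few lines. The function names `spur_jitter_sq` and `total_rms_jitter` and all example values are hypothetical; the sketch assumes that spur levels are given in dBc and must be converted to linear ratios, and that both sidebands are listed explicitly (factor of 4).

```python
import math


def spur_jitter_sq(spurs, f_o):
    """Squared spur jitter (s^2) from Eq. B.21.

    spurs is a list of (offset_hz, level_dbc) pairs covering both sides of the carrier.
    """
    t_o = 1.0 / f_o
    total = sum(10.0 ** (level / 10.0) * math.sin(math.pi * f_m * t_o) ** 2
                for f_m, level in spurs)
    return 4.0 * t_o ** 2 / (4.0 * math.pi ** 2) * total


def total_rms_jitter(j_noise_floor, j_period, j_spurs):
    """Total RMS jitter from Eq. B.22 (root-sum-square of the three RMS contributions)."""
    return math.sqrt(j_noise_floor ** 2 + j_period ** 2 + j_spurs ** 2)


# Hypothetical example: two -70 dBc reference spurs at 20 MHz offset on a 1 GHz clock
j_spurs = math.sqrt(spur_jitter_sq([(20e6, -70.0), (20e6, -70.0)], 1e9))
print(total_rms_jitter(300e-15, 400e-15, j_spurs))
```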
The cycle-to-cycle jitter ($J_{cc}$) is a measure of the variation of the difference between adjacent
periods. It is the first difference function of the period (or cycle) jitter.
$$\langle J_{cc}^2 \rangle = \langle \Delta t^2 \rangle = \frac{32 T_o^2}{4\pi^2} \int_{0}^{\Delta f} L(f) \sin^4(\pi f T_o) \, df \tag{B.23}$$
B.3 Approximation of RMS Timing Jitter from L(f)
One can define the phase-noise spectrum $L(f)$ using the power spectral density $S_P(f)$ ob-
tained from a spectrum analyzer.
$$L(f - f_c) = 10 \log\left[ \frac{S_P(f)}{S_P(f_c)} \right] \ \text{dBc} \tag{B.24}$$
Using the Fourier series expansion, it can be shown that a square-wave clock signal
has the same jitter behavior as its base harmonic sinusoid signal. This property makes
the jitter analysis of a clock signal much easier.
A sinusoid signal of a clock signal with phase noise can be written as:
$$V(t) = A \sin(2\pi f_c t + \Phi(t)) = A \sin\left( 2\pi f_c \left( t + \frac{\Phi(t)}{2\pi f_c} \right) \right) \tag{B.25}$$
From which, one can write an equation for the absolute timing jitter as:
$$J = \frac{\Phi(t)}{2\pi f_c} \tag{B.26}$$
The RMS absolute jitter can be calculated by integrating the phase noise spectrum
as follows:
$$J_{rms} = \frac{1}{2\pi f_c} \sqrt{\langle \Phi^2(t) \rangle} = \frac{1}{2\pi f_c} \sqrt{2 \int_{0}^{\infty} 10^{\frac{L(f)}{10}} \, df} \tag{B.27}$$
In some applications like SONET and 10GbE, engineers only monitor the jitter at a
certain frequency band. And so,
$$J_{rms} = \frac{1}{2\pi f_c} \sqrt{2 \int_{f_1}^{f_2} 10^{\frac{L(f)}{10}} \, df} \tag{B.28}$$
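The phase noise can usually be approximated by a piece-wise linear function when
the frequency axis of $L(f)$ is in log scale:
$$L(f) = \sum_{i=1}^{K-1} \left[ a_i (\log(f) - \log(f_i)) + L(f_i) \right] \left[ U(f - f_i) - U(f - f_{i+1}) \right] \tag{B.29}$$
where $K-1$ is the number of piece-wise line sections in the function and $U(f)$ is the
unit step function. Substituting $L(f)$ from Eq. B.29 into Eq. B.28 (remembering that
$10^{x \log(y)} = [10^{\log(y)}]^x = y^x$) gives:
$$J_{rms} = \frac{1}{2\pi f_c} \sqrt{2 \sum_{i=1}^{K-1} 10^{\frac{L(f_i)}{10}} f_i^{-\frac{a_i}{10}} \int_{f_i}^{f_{i+1}} f^{\frac{a_i}{10}} \, df} \tag{B.30}$$
$$= \frac{1}{2\pi f_c} \sqrt{2 \sum_{i=1}^{K-1} 10^{\frac{L(f_i)}{10}} f_i^{-\frac{a_i}{10}} \left( \frac{a_i}{10} + 1 \right)^{-1} \left[ f_{i+1}^{\frac{a_i}{10}+1} - f_i^{\frac{a_i}{10}+1} \right]} \tag{B.31}$$
where:
$$a_i = \frac{L(f_{i+1}) - L(f_i)}{\log(f_{i+1}) - \log(f_i)} \tag{B.32}$$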
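A minimal sketch of Eqs. B.31 and B.32 is given below. The function name `rms_jitter_piecewise` and the example phase-noise profile are hypothetical. One detail not covered by Eq. B.31 is a segment slope of exactly -10 dB/decade, for which the factor $(a_i/10 + 1)^{-1}$ is undefined; the sketch handles that case with the logarithmic integral of $1/f$.

```python
import math


def rms_jitter_piecewise(points, f_c):
    """RMS jitter (seconds) from a piece-wise linear phase-noise profile (Eqs. B.31, B.32).

    points is a list of (offset_hz, L_dbc_hz) pairs sorted by increasing offset.
    """
    total = 0.0
    for (f1, l1), (f2, l2) in zip(points, points[1:]):
        a = (l2 - l1) / (math.log10(f2) - math.log10(f1))   # slope in dB/decade (B.32)
        n = a / 10.0 + 1.0
        if abs(n) < 1e-12:
            seg = math.log(f2 / f1)          # slope of -10 dB/decade: integral of 1/f
        else:
            seg = (f2 ** n - f1 ** n) / n    # closed-form integral of f^(a/10)
        total += 10.0 ** (l1 / 10.0) * f1 ** (-a / 10.0) * seg
    return math.sqrt(2.0 * total) / (2.0 * math.pi * f_c)


# Hypothetical profile: flat -100 dBc/Hz in band, then -20 dB/decade roll-off
profile = [(1e4, -100.0), (1e6, -100.0), (1e8, -140.0)]
print(f"{rms_jitter_piecewise(profile, 1e9) * 1e15:.0f} fs")
```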
Appendix C
Modeling and Simulation of DCO
C.1 Noise Modeling of White Gaussian Noise
Accurate noise modeling is necessary for precise computer simulation. A mathemat-
ical treatment is needed to represent ideal white Gaussian noise in a discrete-time
simulation environment.
Suppose additive white Gaussian noise (AWGN) with spectral density $N_o/2$ W/Hz
is filtered by an ideal brick-wall continuous-time low-pass filter. Assuming that the input
Gaussian noise is a stationary random process, the autocorrelation function of the filter's
output noise is calculated using the Wiener-Khintchine theorem.
$$R(\tau) = \int_{-F_s/2}^{F_s/2} \frac{N_o}{2} e^{j 2\pi f \tau} \, df$$
$$R(\tau) = \frac{N_o F_s}{2} \, \frac{\sin(\pi F_s \tau)}{\pi F_s \tau}$$
Noise samples separated by nonzero integer multiples of $1/F_s$ are completely uncorrelated,
since $R(\tau)$ vanishes there. Accordingly, the ideal continuous AWGN noise source can be
modeled exactly in a discrete-time simulation by using a Gaussian random number generator
that provides a sample every $1/F_s$ with a variance of $N_o F_s/2$.
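A minimal sketch of such a generator is shown below. The function name `awgn_samples`, the fixed seed, and the example values of $N_o$ and $F_s$ are assumptions for illustration; they are chosen so that the sample variance $N_o F_s / 2$ equals 1.

```python
import math
import random


def awgn_samples(n_o, f_s, n_samples, seed=0):
    """Discrete-time samples of band-limited AWGN: zero mean, variance N_o * F_s / 2."""
    rng = random.Random(seed)
    sigma = math.sqrt(n_o * f_s / 2.0)
    return [rng.gauss(0.0, sigma) for _ in range(n_samples)]


# Hypothetical example: N_o = 2e-9 W/Hz sampled at F_s = 1 GHz gives unit variance
samples = awgn_samples(2e-9, 1e9, 50000)
variance = sum(x * x for x in samples) / len(samples)
print(f"measured variance = {variance:.3f}")
```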
C.1.1 Modeling Flicker Noise
Discrete-time random process modeling precision should be evaluated based on the au-
tocorrelation function behavior rather than the power spectral density in order to avoid
aliasing in the frequency domain. A zero-mean, discrete-time, Gaussian process is said to
simulate a continuous-time, Gaussian random process if the discrete-time autocorrelation
function precisely matches the sampled continuous-time autocorrelation function. [14]
There are several different methods to generate $1/f^\alpha$ noise that trade off accuracy
against computational resources, including:
Auto-Regressive (AR) Method: One of the most accurate methods but at the cost
of being computationally intensive.
Random Midpoint Displacement Method: Used extensively in computer graphics.
Fractional-Differencing Method: Using FIR or IIR filters with a large number of
filter coefficients to model 1/f noise over several decades.
Recursive Filtering Method: The 1/f noise is constructed by passing white Gaus-
sian noise through a cascade of first-order digital filters having appropriately se-
lected pole and zero frequencies.
Generating 1/f Noise by Recursive Filtering [14].
The 1/f noise is constructed by passing white Gaussian noise through a cascade of first-
order digital filters having appropriately selected pole and zero frequencies. The squared
transfer function for the filter is given by
$$|H(\omega)|^2 = \prod_{i=1}^{N_f} \left( \frac{\omega^2 + z_i^2}{\omega^2 + p_i^2} \right)$$
where $N_f$ is the number of cascaded filter sections, and $z_i$ and $p_i$ are the filter zeros and
poles, respectively. The poles must be located on a logarithmic grid across the frequency
span of interest $(\omega_{min}, \omega_{max})$ as
$$p_i = \omega_{min} \exp\left[ \frac{1}{2} \left( 1 - \frac{\alpha}{2} \right) \Delta p \right]$$
where
$$\Delta p = \frac{\log_e(\omega_{max}) - \log_e(\omega_{min})}{N_f}$$
for $i = 1$ to $N_f$.
In order to have a symmetrical error with respect to the ideal 1/f spectrum line, the
zeros are given by
$$z_i = p_i \exp\left( \frac{\alpha}{2} \Delta p \right)$$
Each filter section shapes a different region of the noise spectrum, to yield a composite
output spectrum with the desired 1/f shape. A minimum of one filter section per fre-
quency decade is recommended for reasonable accuracy. MATLAB code implementing
this recursive filter is shown below [14].
%===================== recursive_flicker_noise.m =====================
% Use recursive 1/f^2 filtering to create a good approximation
% to 1/f noise
%======================================================================
function [wsw, psw] = recursive_flicker_noise(wmin, wmax, Nf, alpha)
% wmin   Minimum radian frequency of interest
% wmax   Maximum radian frequency of interest
% Nf     Number of 1/f^2 filter sections to use
% alpha  Desired noise exponent 1/f^alpha

% Squared magnitude response of a single pole-zero section
Hz = @(ww, zz, pp) (ww.^2 + zz^2) ./ (ww.^2 + pp^2);
dp = (log(wmax) - log(wmin)) / Nf;   % logarithmic spacing between poles

% Place the poles on a logarithmic grid
hpoles = zeros(1, Nf);
hpoles(1) = wmin * exp(0.1 * (1 - 0.50 * alpha) * dp);
for ii = 2:Nf
    hpoles(ii) = hpoles(ii-1) * exp(dp);
end

% Place each zero above its pole by exp(alpha/2 * dp)
hzeros = zeros(1, Nf);
for nn = 1:Nf
    hzeros(nn) = hpoles(nn) * exp(0.50 * alpha * dp);
end

% Evaluate the composite spectrum (in dB) on a logarithmic frequency sweep
Npts = 1000;
wsw = logspace(floor(log10(wmin)), ceil(log10(wmax)), Npts);
psw = zeros(1, Npts);

for ii = 1:Npts
    for jj = 1:Nf
        psw(ii) = psw(ii) + 10 * log10(Hz(wsw(ii), hzeros(jj), hpoles(jj)));
    end
end

figure(1);
clf;
h1 = semilogx(wsw, psw, 'k');
set(h1, 'LineWidth', 2);
xlabel('Radian Frequency');
ylabel('Relative Spectrum Level, dB');
title('f^{-\alpha} Power Spectral Density');
grid on;
zoom on;
end
C.2 Simulation of the PLL
It can take days or weeks to run a circuit-level simulation that captures
PLL locking, making this method tedious and resource-intensive. Fast simulation is
highly desirable to minimize fabrication costs and to shorten time-to-market [49]. This
can be done by carefully abstracting the circuits and blocks into reasonable behavioral
models using MATLAB/Simulink and Verilog-A/Verilog-AMS, or using other modeling
environments.
Even at the system level of abstraction, simulating a PLL is time-consuming due
to the different time scales among the reference clock and the output clock. The time
step of the simulator must be much smaller than the smallest time constant of the PLL
loop. The PLL needs many reference cycles so that the oscillator locks its frequency to a
multiple of the reference clock frequency. This translates to a large number of simulator
time steps to capture the frequency and phase locking behavior of the PLL.
Time interval error (TIE) is the short-term variation of the significant instants of a
clock cycle (e.g., rising edges) from their ideal positions. TIE maintains a record of errors
versus time, which makes accumulated phase error measurement possible using an FFT.
[Block diagram: DCO model with thermal and 1/f² jitter sources (Jrms), an encoder, the capacitance terms Cmax, ∆Ccoarse, ∆Cfine, and Cop, the inductance L, and DCO time stamps at the output.]
(a) A discrete-time DCO model generates time steps of the rising and falling edges of the DCO clock
[Plot: phase noise (dBc), from -150 to -50, versus offset frequency (MHz), from 0.01 to 1000, with regions labeled 1/f noise up-conversion, thermal noise up-conversion, and thermal noise.]
(b) Simulation model (blue) vs. mathematical model (black)
Figure C.1: Verilog-A model of DCO phase noise.
TIE can be measured using a real-time sampling oscilloscope. First, a reference clock is
recovered from the incoming data or clock. Then, the instantaneous edges $t_i$ are compared with
the ideal edge locations $i \times T_o$. Ideally, TIE = 0; in practice, the RMS jitter is calculated as
$\sqrt{\overline{TIE^2}}$, together with the peak-to-peak TIE jitter.
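The TIE computation described above can be sketched as follows. The function name `tie_stats` and the example edge list are hypothetical; the sketch assumes that the ideal period $T_o$ is already known, whereas in a measurement it would come from the recovered reference clock.

```python
import math


def tie_stats(edges, t_o):
    """Return (tie, rms, peak_to_peak) for a list of edge time stamps.

    tie[i] is the deviation of the i-th edge from its ideal location i * t_o.
    """
    tie = [t - i * t_o for i, t in enumerate(edges)]
    rms = math.sqrt(sum(x * x for x in tie) / len(tie))
    return tie, rms, max(tie) - min(tie)


# Hypothetical edges of a clock with a 1 ns ideal period (all times in ns)
edges_ns = [0.0, 1.1, 1.9, 3.0]
tie, rms, pk_pk = tie_stats(edges_ns, 1.0)
print(f"RMS TIE = {rms:.4f} ns, peak-to-peak TIE = {pk_pk:.4f} ns")
```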
MATLAB code to find the phase noise of a DPLL based on time stamps of the DPLL
output clock edges is provided below.
function jitter_eval_mat2pn(data1, fileToRead, r_start, r_end, nfft, plot_format, storeFig)

% data1 --> MATLAB matrix containing the time stamps of the clock edges (no scaling)
l_start = floor(r_start * length(data1)) + 1;  % truncate the transient samples
l_end = floor(r_end * length(data1));          % truncate the transient samples
data = data1(l_start:l_end) * 1e12;            % convert the time stamps to ps

%% Calculate the period and jitter
n = (1:length(data))';
length(data)
p = polyfit(n, data, 1);
T = p(1)              % in ps
fsyn = (10^12)/T      % estimation of the synthesized frequency

%% Measure the difference between the ideal edge and the real rising edge
ph_e(n) = data(n) - polyval(p, n);

%% Plot the phase noise of the jitter
winNBW = 1.5;         % Noise bandwidth given in bins
phases = 2 * pi * ph_e / T;

% compute power spectral density of phase
[Sphi, f] = psd(phases, nfft, 1e12/T, nfft, nfft/2, 'linear');

% correct for scaling in PSD due to FFT and window
Sphi = winNBW * Sphi / nfft;
% plot the results (except at DC)
K = length(f);
rbw = winNBW / (T * 1e-12 * nfft);

db = 10 * log10(Sphi(2:K)) - 10 * log10(rbw);
f = f(2:K);
% calculate the integrated phase jitter from phase noise
PJ = sqrt(Pn2Jitter(f, db)) / (2 * pi / T)

asPlot(f, db, 'Offset Frequency', 'Phase Noise [dBc/Hz]');
set(gca, 'XScale', 'log');
set(gca, 'XLim', [1e+4 1.2e+9]);
set(gca, 'YLim', [-160 -60]);
asPrint(fileToRead, '_pn', 'eps', 1, 0)  % print eps without title
mytitle1 = sprintf(' \n The integrated phase jitter %d fs', floor(PJ/1e-3));
title(mytitle1)
asPrint(fileToRead, '_pn', plot_format, 1, storeFig)  % print .png and save .fig format
The Verilog code to implement thermal and 1/f flicker noise is shown below [14].
//Verilog HDL for "PLL2014", "DCO_nonInv" "verilog_2014"
`timescale 1ns/1fs // time unit / time resolution

`define dco_per_0 900 // ps - period of highest frequency (about 1.11 GHz)
`define dco_peroff_lim 250 // ps - maximum period deviation
`define dco_quant_a 2033 // fs - time resolution of acquisition caps
`define dco_quant_t 9.68 // fs - time resolution of tracking caps

`define duty 0.5 // 50 % duty cycle
`define NO_FILTER 1 // 1 for instant freq change
`define dco_init_dly 0 // initial oscillator delay
`define wander_rms 217.1 // fs - accumulative jitter
`define jitter_rms 530 // fs - non-accumulative jitter

`define wrms1 0.8039
`define wrms2 2.5421
`define wrms3 8.0389
`define wrms4 25.4213
`define wrms5 80.3893

`define fc1 0.1 // 0.1 kHz
`define fc2 1 // 1 kHz
`define fc3 10 // 10 kHz
`define fc4 100 // 100 kHz
`define fc5 1000 // 1000 kHz

`define noise_floor -150 // dBc/Hz
`define L_at_Foff -103 // dBc/Hz
`define Foff 1e6 // 1 MHz i.e. -103 dBc/Hz phase noise @ offset freq 1 MHz

`define DCO_1f 1

// Define some math constants
`define pi 3.14159
`define e 2.71828

module DCO_nonInv
(
    OutP,
    A, B, C, D,
    M,      // frequency acquisition input control bits
    Fcol,   // phase tracking input control bits
    Ftherm  // = where the MSB's are binary encoded while
);          // = the LSB's are thermometer encoded
output A, B, C, D;
output reg OutP;

input [15:1] Ftherm;
input [7:4] Fcol;
input [6:0] M;

// thermo-code to integer - tracking bits
integer i;
integer track_col, track;

always @(*) begin
    track = 0;
    for (i = 1; i < 16; i = i + 1)
        if (Ftherm[i] == 1'b1)
            track = track + 1;
end
////////////////////////////////////////////////////////////////
// Compute the DCO period
////////////////////////////////////////////////////////////////
real mat_quant_a;
real mat_quant_ti;
real mat_quant_tf;
real mat_quant_ls;

always @(*) begin
    mat_quant_a = M[6:0] * `dco_quant_a;
    mat_quant_ti = {Fcol[7:4], 4'b0000} * `dco_quant_t;
    mat_quant_tf = track * `dco_quant_t;
    mat_quant_ls = mat_quant_a + mat_quant_ti;
end

real mat_pdev, mat_pdev_var;
real mat_per = `dco_per_0 / 1e3; // in ns

always @(mat_quant_ls, mat_quant_tf) begin
    mat_pdev_var = mat_quant_ls + mat_quant_tf; // fs

    if (mat_pdev_var > `dco_peroff_lim * 1e3)
        mat_pdev_var = `dco_peroff_lim * 1e3; // fs
    else if (mat_pdev_var < -1 * `dco_peroff_lim * 1e3)
        mat_pdev_var = -1 * `dco_peroff_lim * 1e3; // fs

    mat_pdev = mat_pdev_var; // fs
    mat_per = (`dco_per_0/1e3) + (mat_pdev_var/1e6); // period in ns
end

real fc_ctrl = 0.020; // 20 MHz i.e. 50 ns
real tau_ctrl = 8e-9;
always @(*)
    tau_ctrl = 1 / (2.0 * `pi * fc_ctrl);

real jitter = 0;
real jitter_prev = 0;
real wander, wanderT;
real wander1, wander2, wander3, wander4, wander5;
real wander1f, wander2f, wander3f, wander4f, wander5f;
real period;
real period_prev;
real tref;
real t_diff;

integer seed1; // = $random($realtime);
integer seed2; // = $random(seed1);
integer s1, s2, s3, s4, s5;

real wrms = `wander_rms;
real jrms = `jitter_rms;

real tau_w1 = 1.0;
real tau_w2 = 1.0;
real tau_w3 = 1.0;
real tau_w4 = 1.0;
real tau_w5 = 1.0;

real w1_fc = `fc1 / 1e6; // convert to GHz
real w2_fc = `fc2 / 1e6;
real w3_fc = `fc3 / 1e6;
real w4_fc = `fc4 / 1e6;
real w5_fc = `fc5 / 1e6;

always @(*) begin
    tau_w1 = 1.0 / (2.0 * `pi * w1_fc);
    tau_w2 = 1.0 / (2.0 * `pi * w2_fc);
    tau_w3 = 1.0 / (2.0 * `pi * w3_fc);
    tau_w4 = 1.0 / (2.0 * `pi * w4_fc);
    tau_w5 = 1.0 / (2.0 * `pi * w5_fc);
end

initial begin
    seed2 = 3434;
    s1 = $random;
    s2 = $random;
    s3 = $random;
    s4 = $random;
    s5 = $random;

    period = `dco_per_0 / 1e3;
    period_prev = `dco_per_0 / 1e3;

    tref = `dco_init_dly;

    OutP <= #(`dco_init_dly) 1'b1;

    forever begin
        t_diff = $realtime - tref; // time difference between actual and ideal samples
        tref = tref + mat_per; // ideal next time step

        if (`NO_FILTER)
            period = mat_per; // adjust the next dco period instantaneously
        else
            period = mat_per + (period - mat_per) * (`e ** (-period_prev/tau_ctrl));

        if (jrms != 0) begin
            jitter = jrms * $dist_normal(seed1, 0, 1000) / 1e9; // in ns
            if (jitter >= period/2)
                jitter = 0;
            period = period + jitter - jitter_prev;
        end

        if (wrms != 0) begin
            wander = wrms * $dist_normal(seed2, 0, 1000) / 1e9; // in ns
            if (wander >= period/2)
                wander = 0;
            period = period + wander;
        end
        if (`DCO_1f) begin
            wander1 = `wrms1 * $dist_normal(s1, 0, 1000) / 1e9; // in ns
            wander2 = `wrms2 * $dist_normal(s2, 0, 1000) / 1e9; // in ns
            wander3 = `wrms3 * $dist_normal(s3, 0, 1000) / 1e9; // in ns
            wander4 = `wrms4 * $dist_normal(s4, 0, 1000) / 1e9; // in ns
            wander5 = `wrms5 * $dist_normal(s5, 0, 1000) / 1e9; // in ns

            wander1f = wander1 + ((wander1f - wander1) * (`e ** (-period_prev/tau_w1))); // filtered version of -20dBc/hz
            wander2f = wander2 + ((wander2f - wander2) * (`e ** (-period_prev/tau_w2))); // filtered version of -20dBc/hz
            wander3f = wander3 + ((wander3f - wander3) * (`e ** (-period_prev/tau_w3))); // filtered version of -20dBc/hz
            wander4f = wander4 + ((wander4f - wander4) * (`e ** (-period_prev/tau_w4))); // filtered version of -20dBc/hz
            wander5f = wander5 + ((wander5f - wander5) * (`e ** (-period_prev/tau_w5))); // filtered version of -20dBc/hz

            wanderT = wander1f + wander2f + wander3f + wander4f + wander5f;
            if (wanderT >= period/2)
                wanderT = 0;
            period = period + wanderT;
        end

        OutP <= 1'b1;
        #(period * `duty);

        jitter_prev = jitter;
        period_prev = period;

        OutP <= 1'b0;
        #(period * (1 - `duty));

    end
end
reg A, B, C, D;

always @*
begin
    A <= OutP;
    C <= #(period/4) OutP;
    B <= #(5 * period/8) OutP;
    D <= #(7 * period
end
endmodule //
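Two pieces of arithmetic in the DCO model above can be checked in isolation: the mapping from the control words to the oscillator period, and the first-order low-pass recursion applied to each 1/f wander source. The Python sketch below mirrors them under stated assumptions. The constants are copied from the `define values in the listing, the column bits are assumed to be concatenated with four zero bits (that is, weighted by 16), and the function names `dco_period_ns` and `lowpass_step` are hypothetical, introduced only for illustration.

```python
import math

# Values copied from the `define lines of the listing.
DCO_PER_0_PS = 900.0        # nominal (shortest) period, ps
DCO_PEROFF_LIM_PS = 250.0   # maximum period deviation, ps
DCO_QUANT_A_FS = 2033.0     # acquisition step, fs
DCO_QUANT_T_FS = 9.68       # tracking step, fs


def dco_period_ns(m, fcol, ftherm_bits):
    """Period in ns for acquisition word m, column word fcol, thermometer bits."""
    track = sum(1 for b in ftherm_bits if b)       # thermometer code -> integer
    dev_fs = (m * DCO_QUANT_A_FS
              + (fcol << 4) * DCO_QUANT_T_FS       # assumed {Fcol, 4'b0000} weighting
              + track * DCO_QUANT_T_FS)
    lim_fs = DCO_PEROFF_LIM_PS * 1e3
    dev_fs = max(-lim_fs, min(lim_fs, dev_fs))     # clamp the period deviation
    return DCO_PER_0_PS / 1e3 + dev_fs / 1e6       # ps -> ns, fs -> ns


def lowpass_step(prev_out, new_in, period_ns, fc_ghz):
    """One update of y = x + (y - x) * exp(-T / tau), with tau = 1 / (2*pi*fc)."""
    tau_ns = 1.0 / (2.0 * math.pi * fc_ghz)
    return new_in + (prev_out - new_in) * math.exp(-period_ns / tau_ns)
```

With all control inputs at zero the sketch returns the nominal 0.9 ns period, and a large acquisition word saturates at the nominal period plus the 250 ps limit. The recursion has unity DC gain, so a constant input is eventually reproduced at the output.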
Bibliography
[1] S.E. Collier. The emerging enernet: Convergence of the smart grid with the internet
of things. In Rural Electric Power Conference (REPC), IEEE, pages 65–68, April
2015.
[2] Xicheng Jiang, editor. Digitally-Assisted Analog and Analog-Assisted Digital IC
Design. Cambridge University Press, 2015.
[3] B. Murmann. Digitally assisted analog circuits. Micro, IEEE, 26(2):38–47, March
2006.
[4] A. Swaminathan, K.J. Wang, and I. Galton. A Wide-Bandwidth 2.4 GHz ISM Band
Fractional-N PLL With Adaptive Phase Noise Cancellation. Solid-State Circuits,
IEEE Journal of, 42(12):2639–2650, 2007.
[5] R.B. Staszewski, J.L. Wallberg, S. Rezeq, Chih-Ming Hung, O.E. Eliezer, S.K.
Vemulapalli, C. Fernando, K. Maggio, R. Staszewski, N. Barton, Meng-Chang Lee,
P. Cruise, M. Entezari, K. Muhammad, and D. Leipold. All-Digital PLL and Trans-
mitter for Mobile Phones. Solid-State Circuits, IEEE Journal of, 40(12):2469–2482,
2005.
[6] S. Pamarti. Digital Techniques for Integrated Frequency Synthesizers: A Tutorial.
Communications Magazine, IEEE, 47(4):126–133, April 2009.
[7] S.E. Meninger and M.H. Perrott. A Fractional-N Frequency Synthesizer Architecture
Utilizing a Mismatch Compensated PFD/DAC Structure for Reduced Quantization-
Induced Phase Noise. Circuits & System II, IEEE Transactions on, 50(11):839–849,
Nov. 2003.
[8] Amer Samarah and Anthony Chan Carusone. A Digital Phase-Locked Loop with
Calibrated Coarse and Stochastic Fine TDC. Solid-State Circuits, IEEE Journal of,
48(8):1829–1841, 2013.
[9] A. Goel, A. Rylyakov, H. Ainspan, and D. Friedman. A Compact 6 GHz to 12 GHz
Digital PLL with Coupled Dual-LC Tank DCO. In VLSI Circuits (VLSIC), IEEE
Symposium on, pages 141–142, June 2010.
[10] M. Ferriss and M.P. Flynn. A 14mW Fractional-N PLL Modulator with an En-
hanced Digital Phase Detector and Frequency Switching Scheme. In Proc. Digest
of Technical Papers. IEEE International Solid-State Circuits Conference ISSCC,
pages 352–608, Feb. 2007.
[11] R. Tonietto, E. Zuffetti, R. Castello, and I. Bietti. A 3MHz Bandwidth Low Noise
RF All Digital PLL with 12ps Resolution Time to Digital Converter. In Solid-State
Circuits Conference, 2006. ESSCIRC 2006. Proceedings of the 32nd European, pages
150–153, Sept. 2006.
[12] M.Z. Straayer and M.H. Perrott. A Multi-Path Gated Ring Oscillator TDC With
First-Order Noise Shaping. Solid-State Circuits, IEEE Journal of, 44(4):1089–1098,
2009.
[13] Practical manufacturing testing of bluetooth wireless devices, 2012.
[14] James A. Crawford. Advanced Phase-Lock Techniques. Artech House Publishers,
Nov. 2007.
[15] Robert B. Staszewski and Poras T. Balsara. All-Digital Frequency Synthesizer in
Deep-Submicron CMOS. Wiley-Interscience, Aug. 2006.
[16] Amer Samarah and Anthony Chan Carusone. A Digital Phase-Locked Loop with
Calibrated Coarse and Stochastic Fine TDC. In Custom Integrated Circuits Con-
ference (CICC), IEEE, pages 1–4, Sept. 2012.
[17] V. Gutnik and A. Chandrakasan. On-chip Picosecond Time Measurement. In VLSI
Circuits, Digest of Technical Papers. Symposium on, pages 52–53, 2000.
[18] Hsiang-Hui Chang, Ping-Ying Wang, J.-H.C. Zhan, and Bing-Yu Hsieh. A Fractional
Spur-Free ADPLL with Loop-Gain Calibration and Phase-Noise Cancellation for
GSM/GPRS/EDGE. In Proc. Digest of Technical Papers. IEEE International Solid-
State Circuits Conference ISSCC 2008, pages 200–206, Feb. 2008.
[19] Jianjun Yu, F.F. Dai, and R.C. Jaeger. A 12-Bit Vernier Ring Time-to-Digital
Converter in 0.13 µm CMOS Technology. Solid-State Circuits, IEEE Journal of,
45(4):830–842, April 2010.
[20] A. Ravi, S. Pellerano, C. Ornelas, H. Lakdawala, T. Tetzlaff, O. Degani, M. Sa-
jadieh, and K. Soumyanath. A 9.2-12GHz, 90nm Digital Fractional-N Synthesizer
with Stochastic TDC Calibration and -35/ -41dBc Integrated Phase Noise in the
5/2.5GHz Bands. In VLSI Circuits (VLSIC), IEEE Symposium on, pages 143–144,
June 2010.
[21] A. Liscidini, L. Vercesi, and R. Castello. Time to Digital Converter Based on a 2-
Dimensions Vernier Architecture. In Custom Integrated Circuits Conference (CICC),
IEEE, pages 45–48, 2009.
[22] L. Vercesi, L. Fanori, F. De Bernardinis, A. Liscidini, and R. Castello. A Dither-Less
All Digital PLL for Cellular Transmitters. Solid-State Circuits, IEEE Journal of,
47(8):1908–1920, Aug. 2012.
[23] S. Henzler, S. Koeppe, W. Kamp, H. Mulatz, and D. Schmitt-Landsiedel. 90nm
4.7ps-Resolution 0.7-LSB Single-Shot Precision and 19pJ-Per-Shot Local Passive
Interpolation Time-to-Digital Converter with On-Chip Characterization. In Solid-
State Circuits Conference (ISSCC), Digest of Technical Papers. IEEE International,
pages 548–635, 2008.
[24] Chorng-Sii Hwang, Poki Chen, and Hen-Wai Tsao. A High-Precision Time-to-Digital
Converter Using a Two-Level Conversion Scheme. Nuclear Science, IEEE Transac-
tions on, 51(4):1349–1352, Aug. 2004.
[25] Minjae Lee and A.A. Abidi. A 9 b, 1.25 ps Resolution Coarse-Fine Time-to-Digital
Converter in 90 nm CMOS that Amplifies a Time Residue. Solid-State Circuits,
IEEE Journal of, 43(4):769–777, 2008.
[26] V. Kratyuk, P.K. Hanumolu, K. Ok, Un-Ku Moon, and K. Mayaram. A Digital
PLL with a Stochastic Time-to-Digital Converter. Circuits and Systems I: Regular
Papers, IEEE Transactions on, 56(8):1612–1621, Aug. 2009.
[27] Tony Chan Carusone, David Johns, and Kenneth Martin. Analog Integrated Circuit
Design. Wiley, second edition, 2011.
[28] S. Weaver, B. Hershberg, and Un-Ku Moon. PDF Folding for Stochastic Flash
ADCs. In Electronics, Circuits, and Systems (ICECS), 17th IEEE International
Conference on, pages 770–773, Dec. 2010.
[29] Hyung Seok Kim, C. Ornelas, K. Chandrashekar, Pin en Su, P. Madoglio, Y.W.
Li, and A. Ravi. A Digital Fractional-N PLL with a 3mW 0.004mm2 6-bit PVT
and Mismatch Insensitive TDC. In ESSCIRC (ESSCIRC), 2012 Proceedings of the,
pages 193–196, Sept. 2012.
[30] J. Doernberg, H.-S. Lee, and D.A. Hodges. Full-Speed Testing of A/D Converters.
Solid-State Circuits, IEEE Journal of, 19(6):820–827, 1984.
[31] F. Baronti, L. Fanucci, D. Lunardini, R. Roncella, and R. Saletti. A Technique for
Nonlinearity Self-Calibration of DLLs. Instrumentation and Measurement, IEEE
Transactions on, 52(4):1255–1260, 2003.
[32] S. Beer and R. Ginosar. A New 65nm LP Metastability Measurement Test Circuit.
In Electrical Electronics Engineers in Israel, IEEE 27th Convention of, pages 1–4,
Nov. 2012.
[33] Z. Ye and M. P. Kennedy. Reduced Complexity MASH Delta-Sigma Modulator.
IEEE Transactions on Circuits and Systems II: Express Briefs, 54(8):725–729, Aug
2007.
[34] Kwyro Lee, I. Nam, Ickjin Kwon, J. Gil, Kwangseok Han, S. Park, and Bo-Ik Seo.
The impact of semiconductor technology scaling on CMOS RF and digital circuits
for wireless application. IEEE Transactions on Electron Devices, 52(7):1415–1422,
July 2005.
[35] T.A.D. Riley, N.M. Filiol, Qinghong Du, and J. Kostamovaara. Techniques for
In-Band Phase Noise Reduction in ΔΣ Synthesizers. Circuits and Systems II: Analog
and Digital Signal Processing, IEEE Transactions on, 50(11):794–803, Nov. 2003.
[36] Chun-Ming Hsu, M.Z. Straayer, and M.H. Perrott. A Low-Noise Wide-BW 3.6-GHz
Digital ∆Σ Fractional-N Frequency Synthesizer With a Noise-Shaping Time-to-
Digital Converter and Quantization Noise Cancellation. Solid-State Circuits, IEEE
Journal of, 43(12):2776–2786, Dec. 2008.
[37] Ping-Ying Wang, J.-H.C. Zhan, Hsiang-Hui Chang, and H.-M.S. Chang. A Digital
Intensive Fractional-N PLL and All-Digital Self-Calibration Schemes. Solid-State
Circuits, IEEE Journal of, 44(8):2182–2192, Aug. 2009.
[38] X. Gao, E. A. M. Klumperink, P. F. J. Geraedts, and B. Nauta. Jitter Analysis
and a Benchmarking Figure-of-Merit for Phase-Locked Loops. IEEE Transactions
on Circuits and Systems II: Express Briefs, 56(2):117–121, Feb. 2009.
[39] Jessica Lipsky. TSMC outlines 16nm, 10nm plans. EE-Times, April 2015.
[40] C. Weltin-Wu, E. Temporiti, D. Baldi, and F. Svelto. A 3GHz Fractional-N All-
Digital PLL with Precise Time-to-Digital Converter Calibration and Mismatch Cor-
rection. In Proc. Digest of Technical Papers. IEEE International Solid-State Circuits
Conference ISSCC 2008, pages 344–618, Feb. 2008.
[41] T. Tokairin, M. Okada, M. Kitsunezuka, T. Maeda, and M. Fukaishi. A 2.1-to-2.8-
GHz Low-Phase-Noise All-Digital Frequency Synthesizer with a Time-Windowed
Time-to-Digital Converter. Solid-State Circuits, IEEE Journal of, 45(12):2582–2590,
Dec. 2010.
[42] Ja-Yol Lee, Mi-Jeong Park, Byonghoon Mhin, Seong-Do Kim, Moon-Yang Park,
and Hyunku Yu. A 4-GHz All Digital Fractional-N PLL with Low-Power TDC and
Big Phase-Error Compensation. In Custom Integrated Circuits Conference (CICC),
IEEE, pages 1–4, Sept. 2011.
[43] E. Temporiti, C. Weltin-Wu, D. Baldi, M. Cusmai, and F. Svelto. A 3.5 GHz
Wideband ADPLL With Fractional Spur Suppression Through TDC Dithering and
Feedforward Compensation. Solid-State Circuits, IEEE Journal of, 45(12):2723–
2736, Dec. 2010.
[44] Amer Samarah and Anthony Chan Carusone. A Dead-Zone Free and Linearized
Digital PLL. In International Conference on Electronics, Circuits, and Systems
(ICECS), IEEE, Dec. 2012.
[45] Socrates D. Vamvakos, Robert Bogdan Staszewski, Mahbuba Sheba, and Khurram
Waheed. Noise Analysis of Time-to-Digital Converter in All-Digital PLLs. In Design,
Applications, Integration and Software, IEEE Dallas/CAS Workshop on, pages 87–
90, Oct. 2006.
[46] K. Waheed, R.B. Staszewski, F. Dulger, M.S. Ullah, and S.D. Vamvakos. Spurious
Free Time-to-Digital Conversion in an ADPLL Using Short Dithering Sequences.
Circuits and Systems I: Regular Papers, IEEE Transactions on, 58(9):2051–2060,
Sept. 2011.
[47] R.B. Staszewski, K. Waheed, F. Dulger, and O.E. Eliezer. Spur-Free Multirate All-
Digital PLL for Mobile Phones in 65 nm CMOS. Solid-State Circuits, IEEE Journal
of, 46(12):2904–2919, Dec. 2011.
[48] M. Zanuso, D. Tasca, S. Levantino, A. Donadel, C. Samori, and A.L. Lacaita. Noise
Analysis and Minimization in Bang-Bang Digital PLLs. Circuits and Systems II:
Express Briefs, IEEE Transactions on, 56(11):835–839, Nov. 2009.
[49] Ken Kundert. Predicting the Phase Noise and Jitter of PLL-based Frequency Syn-
thesizers. In Behzad Razavi, editor, Phase-Locking in High Performance Systems,
pages 46–69. IEEE Press, 2003.
[50] Behzad Razavi, editor. Monolithic Phase-Locked Loops and Clock Recovery Circuits:
Theory and Design. Wiley-IEEE Press, 1996.
[51] N. Da Dalt. Linearized Analysis of a Digital Bang-Bang PLL and its Validity Limits
Applied to Jitter Transfer and Jitter Generation. Circuits and Systems I: Regular
Papers, IEEE Transactions on, 55(11):3663–3675, Dec. 2008.
[52] G. Marucci, S. Levantino, P. Maffezzoni, and C. Samori. Analysis and Design of
Low-Jitter Digital Bang-Bang Phase-Locked Loops. Circuits and Systems I: Regular
Papers, IEEE Transactions on, 61(1):26–36, Jan. 2014.
[53] James F. Oberst. Pull-In Range of a Phase-Locked Loop with a Binary Phase
Comparator. Bell System Technical Journal, The, 49(9):2289–2302, Nov. 1970.
[54] M. Ramezani and C.A.T. Salama. Analysis of a Half-Rate Bang-Bang
Phase-Locked-Loop. Circuits and Systems II: Analog and Digital Signal Processing,
IEEE Transactions on, 49(7):505–509, July 2002.
[55] M. Chan and A. Postula. Transient Analysis of Bang-Bang Phase Locked Loops.
Circuits, Devices Systems, IET, 3(2):76–82, April 2009.
[56] Richard C. Walker. Designing Bang-Bang PLLs for Clock and Data Recovery in
Serial Data Transmission Systems, pages 34–45. Wiley-IEEE Press, 2003.
[57] M. Ramezani and C.A.T. Salama. An Improved Bang-Bang Phase Detector for
Clock and Data Recovery Applications. In Circuits and Systems (ISCAS), The
IEEE International Symposium on, volume 1, pages 715–718 vol. 1, May 2001.
[58] A. Samarah and A.C. Carusone. Multi-Phase Bang-Bang Digital Phase Lock Loop
with Accelerated Frequency Acquisition. In Circuits and Systems (ISCAS), IEEE
International Symposium on, pages 545–548, May 2015.
[59] Dan Liu, P. Basedau, M. Helfenstein, J. Wei, T. Burger, and Yangjian Chen. A
Frequency-Based Model for Limit Cycle and Spur Predictions in Bang-Bang All Digi-
tal PLL. Circuits and Systems I: Regular Papers, IEEE Transactions on, 59(6):1205–
1214, June 2012.
[60] A. Pottbacker, Ulrich Langmann, and H. Schreiber. A Si Bipolar Phase and Fre-
quency Detector IC for Clock Extraction up to 8 Gb/s. Solid-State Circuits, IEEE
Journal of, 27(12):1747–1751, Dec. 1992.
[61] Donald Richman. The DC Quadricorrelator: A Two-Mode Synchronization System.
Proceedings of the IRE, 42(1):288–299, Jan. 1954.
[62] Chao-Ching Hung and Shen-Iuan Liu. A 40-GHz Fast-Locked All-Digital Phase-
Locked Loop Using a Modified Bang-Bang Algorithm. Circuits and Systems II:
Express Briefs, IEEE Transactions on, 58(6):321–325, June 2011.
[63] Pyoungwon Park, Jaejin Park, Hojin Park, and SeongHwan Cho. An All-Digital
Clock Generator using a Fractionally Injection-Locked Oscillator in 65nm CMOS.
In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), IEEE In-
ternational, pages 336–337, Feb. 2012.
[64] R. Nonis, W. Grollitsch, T. Santa, D. Cherniak, and N. Da Dalt. digPLL-Lite:
A Low-Complexity, Low-Jitter Fractional-N Digital PLL Architecture. Solid-State
Circuits, IEEE Journal of, 48(12):3134–3145, Dec. 2013.
[65] Xiang Gao, E.A.M. Klumperink, M. Bohsali, and B. Nauta. A Low Noise Sub-
Sampling PLL in Which Divider Noise is Eliminated and PD/CP Noise is Not
Multiplied by N^2. Solid-State Circuits, IEEE Journal of, 44(12):3253–3263, Dec.
2009.
[66] Amer Samarah and Anthony Chan Carusone. Cycle-Slipping Pull-in Range of Bang-
Bang PLLs. In New Circuits and Systems Conference (NEWCAS), IEEE 13th In-
ternational, pages 1–4, June 2015.
[67] Ching-Che Chung, Duo Sheng, and Chen-Han Chen. An All-Digital Phase-Locked
Loop Compiler with Liberty Timing Files. In VLSI Design, Automation and Test
(VLSI-DAT), 2014 International Symposium on, pages 1–4, April 2014.