
Improved phase detection for Digital Phase-Locked Loops

by

Amer Samarah

A thesis submitted in conformity with the requirements

for the degree of Doctor of Philosophy

Graduate Department of Edward S. Rogers Sr. Department of Electrical and Computer Engineering

University of Toronto

© Copyright 2016 by Amer Samarah

Improved phase detection for Digital Phase-Locked Loops

Amer Samarah

Doctor of Philosophy

Graduate Department of Edward S. Rogers Sr. Department of Electrical and Computer

Engineering

University of Toronto

2016

Abstract

Digital PLLs (DPLLs) have emerged as reliable alternatives to analog PLLs because they are more robust to process variations and mismatch and do not need a large on-chip capacitor to realize the loop filter. However, a DPLL employs a time-to-digital converter (TDC) that resolves the phase error in quantized steps, and the resulting quantization error shows up as deterministic jitter in the output clock. Similarly, a DPLL tunes a digitally-controlled oscillator (DCO) in discrete frequency steps, which adds another source of jitter. Furthermore, the quantized response of the DPLL may cause chaotic limit cycles in some configurations.
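The link between TDC resolution and in-band noise can be made concrete with a minimal Python sketch. It models an ideal delay-line TDC that counts whole delay cells, and it evaluates the widely used estimate L = (2π)²/12 · (Δt/T_out)² / f_ref for the TDC-limited in-band phase noise. That estimate assumes the quantization error is white and uniformly distributed, an assumption that breaks down in the dead-zone and fractional-spur situations discussed in this thesis, so it is best read as a best-case floor. The function names are hypothetical, and the formula is a textbook approximation that is not necessarily identical to Eq. 2.13.

```python
import math


def tdc_code(time_error_s: float, lsb_s: float) -> int:
    """Ideal delay-line TDC: number of whole delay cells the edge has passed."""
    return math.floor(time_error_s / lsb_s)


def quantization_error_s(time_error_s: float, lsb_s: float) -> float:
    """Residual time the TDC cannot resolve; in [0, lsb_s) up to float rounding."""
    return time_error_s - tdc_code(time_error_s, lsb_s) * lsb_s


def inband_phase_noise_dbc_hz(lsb_s: float, f_out_hz: float, f_ref_hz: float) -> float:
    """Textbook estimate of the TDC-limited in-band floor (white, uniform error)."""
    t_out = 1.0 / f_out_hz                               # output (DCO) period
    phase_var_rad2 = (2 * math.pi) ** 2 / 12 * (lsb_s / t_out) ** 2
    return 10 * math.log10(phase_var_rad2 / f_ref_hz)    # variance spread over f_ref


if __name__ == "__main__":
    print(tdc_code(37e-12, 4e-12))                       # 9 whole 4 ps cells fit in 37 ps
    for lsb in (32e-12, 4e-12):
        floor_db = inband_phase_noise_dbc_hz(lsb, 2.4e9, 20e6)
        print(f"{lsb * 1e12:.0f} ps LSB -> {floor_db:.1f} dBc/Hz")
```

For the 2.4 GHz output and 20 MHz reference that appear elsewhere in this thesis, each halving of the TDC step lowers this estimate by about 6 dB.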

This thesis presents innovations that make DPLLs suitable for a wide range of applications. First, a novel low-power, highly digital coarse-fine TDC, consisting of a calibrated coarse stage followed by a fine stochastic stage, achieves 4 ps resolution, approximately an order of magnitude finer than an inverter delay in 0.13 µm CMOS technology. On power-up, an on-chip calibration algorithm based on a balanced mean code density test minimizes nonlinearities in the coarse TDC, reducing the worst-case spurs from -54.4 dBc to -70.55 dBc at 1.995 GHz.
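A code density test works because, when the input time error is uniformly distributed, each delay cell collects hits in proportion to its width. The Python sketch below shows only that plain histogram principle. It is not the balanced mean variant used in this thesis; it draws uniform random inputs instead of sampling a 333.333 MHz calibration clock with the 20 MHz reference, and the cell delays, function names, and signed 1 ps trim step are hypothetical.

```python
import bisect
import random


def code_density_test(cell_delays_ps, n_samples, seed=1):
    """Estimate each delay cell's width (ps) from a histogram of uniform inputs."""
    rng = random.Random(seed)
    edges, total = [], 0.0
    for d in cell_delays_ps:                    # cumulative tap times along the line
        total += d
        edges.append(total)
    hits = [0] * len(cell_delays_ps)
    for _ in range(n_samples):
        t = rng.uniform(0.0, total)             # uniformly distributed time error
        k = bisect.bisect_right(edges, t)       # cell whose interval contains t
        hits[min(k, len(hits) - 1)] += 1
    # A cell's share of the hits approximates its share of the total delay.
    return [h / n_samples * total for h in hits]


def trim_steps(estimates_ps, step_ps=1.0):
    """Hypothetical trim: signed correction steps pulling each cell toward the mean."""
    mean = sum(estimates_ps) / len(estimates_ps)
    return [round((mean - e) / step_ps) for e in estimates_ps]


if __name__ == "__main__":
    true_delays = [33.0, 35.0, 31.0, 34.0, 32.0]  # hypothetical mismatched cells
    est = code_density_test(true_delays, 400_000)
    print([f"{e:.2f}" for e in est])
    print(trim_steps(est))
```

Expressing the correction as signed 1 ps steps toward the mean cell delay loosely mirrors the 1 ps correction step of Fig. 3.20; how such steps map onto the 4-bit calibration capacitor bank of Fig. 3.16 is not modeled here.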

The thesis also investigates dead-zone behavior in DPLLs caused by the quantization effect of the TDC. The dead zone results in chaotic limit-cycle behavior, producing higher-than-expected in-band phase noise and strong spurious tones. To alleviate this problem, a noise-shaped offset is added to the phase error to keep the TDC active and away from the dead zone. The proposed solution is demonstrated in a 0.13 µm CMOS prototype that achieves consistently low in-band noise.
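Why a noise-shaped offset helps can be illustrated with a toy model. A phase error smaller than half an LSB always reads as zero, so the loop never sees it; adding a first-order shaped offset d[n] = u[n] - u[n-1], with u uniform on [0, 1) LSB, makes the average TDC output track the true error while pushing the offset's energy toward high frequencies. This Python sketch is an assumption-laden illustration rather than the dithering circuit of this thesis: the mid-tread quantizer, the uniform source, and the function names are all hypothetical.

```python
import math
import random


def quantize(x_lsb: float) -> int:
    """Mid-tread TDC model: errors in [-0.5, 0.5) LSB read as 0 (the dead zone)."""
    return math.floor(x_lsb + 0.5)


def average_tdc_output(phase_error_lsb: float, n: int, dither: bool, seed: int = 7) -> float:
    """Mean TDC code over n reference cycles for a constant phase error (in LSBs)."""
    rng = random.Random(seed)
    prev_u = rng.random()
    total = 0
    for _ in range(n):
        u = rng.random()                            # uniform in [0, 1) LSB
        # First-order noise shaping: d[n] = u[n] - u[n-1], i.e. (1 - z^-1) applied to u
        offset = (u - prev_u) if dither else 0.0
        prev_u = u
        total += quantize(phase_error_lsb + offset)
    return total / n


if __name__ == "__main__":
    for err in (0.3, -0.2):
        plain = average_tdc_output(err, 100_000, dither=False)
        shaped = average_tdc_output(err, 100_000, dither=True)
        print(f"error {err:+.1f} LSB: plain {plain:+.3f}, dithered {shaped:+.3f}")
```

Because (1 - z^-1) shaping places most of the offset's power at high frequencies, the low-pass loop response attenuates it, which is the intuition behind keeping the injected offset outside the loop bandwidth.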

A binary bang-bang phase detector (BBPD) is a commonly used alternative to the power-hungry TDC. However, BBPD-based DPLLs have limited frequency pull-in and capture ranges, which are traded off against steady-state jitter performance. The thesis proposes a multi-phase bang-bang detector (MPBBD) as an alternative to the BBPD. The thesis also presents a rigorous mathematical analysis of cycle-slipping behavior to quantify the pull-in and capture ranges, as well as the locking time, of a DPLL with either a BBPD or an MPBBD. The final formula gives useful insight into the effect of various loop parameters on cycle slipping. A DPLL architecture with an MPBBD is presented and implemented in 28 nm CMOS technology to improve the pull-in and capture ranges without affecting the steady-state jitter performance.
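The difference between the two detectors can be sketched as transfer functions. Following the eight 45-degree regions described for Fig. 5.2, the hypothetical Python model below maps a phase error to ±1 for the BBPD but to a signed region index (±1 to ±4) for the MPBBD, so the detector's effective gain grows with the error magnitude. The output levels are an assumption for illustration; the implemented MPBBD derives its output from a lookup table whose actual contents may differ.

```python
import math


def wrap_deg(phi_deg: float) -> float:
    """Wrap a phase error into [-180, 180) degrees."""
    return (phi_deg + 180.0) % 360.0 - 180.0


def bbpd(phi_deg: float) -> int:
    """Binary bang-bang detector: reports only the sign of the phase error."""
    return 1 if wrap_deg(phi_deg) >= 0.0 else -1


def mpbbd(phi_deg: float, regions: int = 8) -> int:
    """Hypothetical multi-phase detector: sign plus the region the error falls in."""
    span = 360.0 / regions                            # 45 degrees when regions == 8
    phi = wrap_deg(phi_deg)
    level = min(math.floor(abs(phi) / span) + 1, regions // 2)
    return level if phi >= 0.0 else -level


if __name__ == "__main__":
    for phi in (10, 60, 100, 170, -30, -170):
        print(f"{phi:5d} deg  BBPD {bbpd(phi):+d}  MPBBD {mpbbd(phi):+d}")
```

In this model both detectors output ±1 for errors within ±45 degrees, which is consistent with the claim that the MPBBD need not degrade steady-state jitter, while larger errors produce proportionally larger corrections that help during pull-in.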


Acknowledgements

First and foremost, I would like to express my sincere thanks and appreciation to my supervisor, Prof. Anthony Chan Carusone, for his guidance, support, and constant encouragement throughout the course of this thesis. Many thanks to my thesis committee members, Prof. Glen Gulak, Prof. Antonio Liscidini, and Prof. Wai Tung Ng, for reviewing my thesis and for their valuable feedback. I would like to thank my external examiner, Prof. Michael Peter Kennedy, for his detailed and valuable feedback.

To my colleagues and friends in BA5158 and BA5000 at the University of Toronto, thank you for your encouragement and the unforgettable moments. Special thanks to Karim Abdelhalim, Alireza Nilchi, Hamed Jafari, Kentaro Yamamoto, Yannis Sarkar, Derek Ho, Dustin Dunwell, Saber Amini, Javid Musaev, Hossein Kassiri, and Arshya Feyzi. Special thanks also to Marcel Lugthart and Greg Unruh for their help and fruitful discussions during my internship at Broadcom.

I would like to acknowledge CMC Microsystems for the provision of products and fabrication services that facilitated this research. I would also like to thank Semtech Inc. for facilitating testing in their laboratory and NSERC for funding support.

Finally, I wish to express my gratitude to my parents, brothers, and sister for their invaluable love and encouragement throughout my life. Last but not least, my thanks go to my lovely wife, Rana Qasass, for being by my side during the ups and downs. Thank you for the constant and unconditional support, and thank you for being a great wife and mother.


Contents

List of Tables vii

List of Figures viii

List of Abbreviations xviii

1 Introduction 1

1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Overview of PLLs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.3 Overview of DPLLs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.4 Thesis Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.5 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2 System Level Overview and Analysis of DPLL 9

2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.2 Time-domain model of DPLL . . . . . . . . . . . . . . . . . . . . . . . . 10

2.2.1 DPLL model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.2.2 TDC quantization noise . . . . . . . . . . . . . . . . . . . . . . . 15

2.2.3 DCO quantization noise . . . . . . . . . . . . . . . . . . . . . . . 17

2.2.4 TDC Fractional Spurs . . . . . . . . . . . . . . . . . . . . . . . . 17

2.3 System level design of DPLL . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.4 Basic TDC structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2.4.1 TDC Normalization Circuit . . . . . . . . . . . . . . . . . . . . . 24

3 A DPLL with Calibrated Coarse and Stochastic Fine TDC 26

3.1 Overview of the DPLL . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26


3.2 State-of-the-Art Implementations of TDC . . . . . . . . . . . . . . . . . . 27

3.2.1 Buffer delay and Inverter delay line TDC . . . . . . . . . . . . . . 28

3.2.2 Vernier delay line TDC . . . . . . . . . . . . . . . . . . . . . . . . 28

3.2.3 Gated ring oscillator (GRO) TDC . . . . . . . . . . . . . . . . . . 28

3.2.4 Interpolation-Based TDC . . . . . . . . . . . . . . . . . . . . . . 29

3.2.5 Two-step TDC . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

3.3 Coarse-Fine Stochastic TDC . . . . . . . . . . . . . . . . . . . . . . . . . 30

3.3.1 Coarse TDC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3.3.2 Fine Stochastic TDC . . . . . . . . . . . . . . . . . . . . . . . . . 32

3.4 TDC Output Normalization . . . . . . . . . . . . . . . . . . . . . . . . . 40

3.5 TDC Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

3.6 Clock domain synchronization . . . . . . . . . . . . . . . . . . . . . . . . 49

3.7 Implementation Details of the DPLL . . . . . . . . . . . . . . . . . . . . 50

3.7.1 Digital Loop Filter (DLF) . . . . . . . . . . . . . . . . . . . . . . 50

3.7.2 DCO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

3.8 Measurement Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

3.8.1 PCB and Test Setup . . . . . . . . . . . . . . . . . . . . . . . . . 55

3.8.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

3.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

4 Linearization of Digital PLL 72

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

4.2 TDC Dead-Zone Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . 75

4.2.1 Zero-Phase Restart (ZPR) Mechanism . . . . . . . . . . . . . . . 80

4.3 Noise-Shaped Dithering . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

4.3.1 Implemented Noise-Shaped Dithering . . . . . . . . . . . . . . . . 82

4.3.2 Improved Noise-Shaped Dithering . . . . . . . . . . . . . . . . . 85

4.4 Measurement Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

5 Cycle-Slipping and Pull-In Range of Bang-Bang PLLs 89

5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

5.2 Transient Analysis of MPBBD PLL When Far From Lock . . . . . . . . 92

5.3 Cycle Slipping Phenomena . . . . . . . . . . . . . . . . . . . . . . . . . . 95


5.3.1 Analysis of Pull-In Frequency Range . . . . . . . . . . . . . . . . 95

5.3.2 Locking Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

5.4 Fast Simulation Model of a DPLL with Quantized Phase Detector . . . . 108

5.4.1 Model Development . . . . . . . . . . . . . . . . . . . . . . . . . . 108

5.4.2 Analogy between DPLL and ∆Σ modulator . . . . . . . . . . . . 111

5.4.3 Improved MPBBD (IMPBBD) without cycle slipping to accelerate

Frequency Acquisition . . . . . . . . . . . . . . . . . . . . . . . . 116

5.4.4 Verilog-A Simulation of DPLL . . . . . . . . . . . . . . . . . . . . 117

5.5 Implemented Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 118

5.5.1 DCO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

5.5.2 MPBBD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

5.5.3 Loop Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

5.5.4 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . 126

5.5.5 Measurement Results . . . . . . . . . . . . . . . . . . . . . . . . . 127

5.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

6 Conclusion 131

6.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

A Schematics 135

B Noise Contribution to Timing Jitter 139

B.1 Noise Floor Contribution to Timing Jitter . . . . . . . . . . . . . . . . . 139

B.2 Phase Noise and Spurs Contribution to Timing Jitter . . . . . . . . . . . 140

B.3 Approximation of RMS Timing Jitter from L(f) . . . . . . . . . . . . . . 143

C Modeling and Simulation of DCO 145

C.1 Noise Modeling of White Gaussian Noise . . . . . . . . . . . . . . . . . . 145

C.1.1 Modeling Flicker Noise . . . . . . . . . . . . . . . . . . . . . . . . 146

C.2 Simulation of the PLL . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

Bibliography 156


List of Tables

3.1 State-of-the-art fine-resolution TDC . . . . . . . . . . . . . . . . . . . . . 69

3.2 Comparison Among Published Digital Synthesizers. . . . . . . . . . . . 70

4.1 Summary of TIE rms and peak-to-peak jitter for the 60 different simulations. 84

5.1 Comparison of the pull-in range of BBPD- vs. MPBBD-based DPLLs using

simulation and theoretical findings when Kp = 3 and Ki = 1/32. . . . . . . 100

5.2 The pull-in range (normalized to reference frequency) of BBPD-DPLL

based on simulations and presented theory as well as based on other ref-

erences. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101


List of Figures

1.1 Typical analog PLL Architecture with a multi-modulus divider and ∆Σ

modulator to synthesize fractional channels. . . . . . . . . . . . . . . . . 3

1.2 Typical analog PLL Architecture with DAC to cancel the deterministic

error caused by ∆Σ modulator [7]. . . . . . . . . . . . . . . . . . . . . . 3

1.3 Digital PLL Architecture where the loop filter is all digital, and the phase

and frequency error signals are fixed-point numbers. . . . . . . . . . . . . 4

1.4 Phase noise contributions for low- and high-bandwidth DPLLs. . . . . . . 5

2.1 Digital PLL Architecture where the loop filter is all digital, and the phase

and frequency error signals are fixed-point numbers. . . . . . . . . . . . . 10

2.2 DPLL model in discrete time. The DCO gain, Kdco, is expressed in Hz/LSB.

The phase detector gain, Ktdc, is unity for fractional mode and is

inversely proportional to the input phase error during integer mode. . . . 11

2.3 DPLL response using different sampling rates when Kp = 1, Ki = 1/64,

and the DCO gain Kdco = 726 kHz/LSB. The blue circles represent DPLL

responses when Fref = 20 MHz while the green triangles describe DPLL

responses when Fref = 40 MHz. . . . . . . . . . . . . . . . . . . . . . . . 13

2.4 DPLL behavior for different damping settings. The DCO gain Kdco = 726 kHz/LSB

and the proportional gain Kp = 1 for all settings. The blue circles rep-

resent DPLL step response with Ki = 1/64. The green triangles describe

DPLL step response when Ki = 4/64. The final setting is marked using

red squares where Ki = 16/64. . . . . . . . . . . . . . . . . . . . . . . . . 14


2.5 DPLL behavior for different bandwidth settings while damping ratio is

kept the same. The DCO gain Kdco = 726 kHz/LSB. The blue circles

represent DPLL step response with Kp = 1 and Ki = 1/64. The green

triangles describe DPLL step response when Kp = 2 and Ki = 4/64. The

final setting is marked using red squares where Kp = 4 and Ki = 16/64. . 14

2.6 TDC output during frequency acquisition (below 30 µs) and phase locking

with different TDC resolutions (FCW = 120.01709). . . . . . . . . . . . 16

2.7 Spectrum of TDC quantization noise, tQ, for different TDC resolutions

(FCW = 120.01709). Simulation results are marked in blue while theo-

retical expectations from Eq. 2.13 are marked in red. . . . . . . . . . . . 16

2.8 Phase noise spectrum of TDC Output for different fractional channels

(∆ttdc = 32 ps). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.9 Spectrum of the phase error computed by TDC when FCW = 120.01709:

(a) solid red line when ∆ttdc = 32 ps (b) dashed blue when ∆ttdc = 2 ps.

There is a 20 dB difference in the spectral density at low frequency. . . . 19

2.10 Histogram of absolute output jitter when FCW = 120.01709: (a) solid

red line when ∆ttdc = 32 ps where peak-to-peak jitter is 21 ps (b) plus

blue symbols when ∆ttdc = 2 ps where peak-to-peak jitter is 9 ps. . . . . 19

2.11 Timing diagram of phase error computation for DPLL with FCW = 21/4. 20

2.12 Phase noise of the output clock (2400.3418 MHz), based on MATLAB/Simulink

simulation, when ∆ttdc = 4 ps. . . . . . . . . . . . . . . . . . . 22

2.13 Buffer delay line implementation of TDC: simplified schematic view (left);

timing diagram (right). The raw Q[i] is a pseudo-thermometer code to be

converted into a normalized binary word representing the fractional phase

error. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2.14 Estimating phase error based on the TDC output. . . . . . . . . . . . . . 23

2.15 A typical circuit to normalize the phase error of a TDC. . . . . . . . . . 24

2.16 Estimate of TDC resolution as computed by the TDC normalization

circuit. The raw data (blue) is plotted along with its 128-point moving

average (red). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.1 A digital PLL architecture for fractional frequency synthesis [16]. The

shaded blocks are custom-designed but can be automatically generated

using a scripting language like TCL due to the regular structure. . . . . 27


3.2 Block and timing diagram of Gated Ring Oscillator (GRO) based TDC. . 29

3.3 Two-stage TDC: a coarse TDC followed by a time amplifier of the residue,

which feeds another coarse stage. . . . . . . . . . . . . . . . . . . . . . 30

3.4 The coarse TDC architecture of a two-step TDC. The delayed version of

Fref with phase closest to Fout is muxed to the second TDC stage. Path

delays for the selected reference phase Fref to D Fref and DCO clock Fout

to D Fout are matched. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

3.5 The fine stochastic TDC (STDC) architecture of the two-step TDC. The

STDC outputs are sampled on the rising edge of the delayed reference clock. 33

3.6 (a) The stochastic TDC arbiter input-output relationship without and

with random mismatch. Input-referred voltage offset due to mismatch

translates into time offset. (b) SR-Latch used in the stochastic TDC as

arbiter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.7 A Monte Carlo simulation of the stochastic TDC for a given negative

phase error. The sum of all stochastic TDC arbiter outputs translates

into a phase error within the linear region of the time-offset’s statistical

CDF. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.8 Spectre Monte-Carlo simulation of threshold voltage, Vth, for a minimum

size transistor. Accordingly, mean(Vth) = 345 mV and stdv(Vth) = 22.78 mV. 36

3.9 Transfer function of stochastic TDC output using Spectre Monte-Carlo

simulation. Note that Vth has a standard deviation of 22.78 mV. . . . . . 37

3.10 (a) Transfer function of one random example of a non-ideal stochastic

TDC when the number of arbiters M = 64. The associated DNL and INL

are shown in (b) and (c), respectively. . . . . . . . . . . . . . . . . . . . . 39

3.11 Normalized PDF of Gaussian-distributed random offset for a stochastic

TDC with different numbers of arbiters. Each plot is obtained from a

100-run Monte-Carlo simulation. . . . . . . . . . . . . . . . . . . . . . . 41

3.12 Normalized CDF of Gaussian-distributed random offset for a stochastic

TDC with different numbers of arbiters. Each plot is obtained from a

100-run Monte-Carlo simulation. . . . . . . . . . . . . . . . . . . . . . . . 42

3.13 The associated DNL and INL of an ideal stochastic TDC with random

offset when the number of arbiters M = 64 & 512. . . . . . . . . . . . . . 43


3.14 Phase error computation and normalization with respect to one DCO out-

put period, performed digitally. The phase error computed by the coarse

TDC is refined by the stochastic TDC. . . . . . . . . . . . . . . . . . . . 44

3.15 On-chip low-area calibration algorithm of the coarse TDC based on a code

density test. The dedicated calibration clock, fcalb, is sampled by the

coarse TDC during the calibration phase and once done, the coarse TDC

samples the DCO clock. . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.16 Delay cell with 4-bit calibration capacitor bank. . . . . . . . . . . . . . . 46

3.17 Spectre mismatch Monte-Carlo simulation of the inverter unit used in the

calibrated coarse TDC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

3.18 Generating uniformly distributed random time errors to calibrate

the coarse TDC. Sampling a 333.333 MHz calibration clock with the 20 MHz

reference produces time errors uniformly distributed within ±1500 ps. . . . . 47

3.19 The delay of TDC inverters before (blue triangle points) and after (red

square points) calibration using a balanced code density test with fine

delay correction (floating point precision). After calibration, the delay

mean = 33.345 ps, std = 0.335 ps, and peak-peak error = 1.203 ps. . . . 48

3.20 The delay of TDC inverters before (blue triangle points) and after (red

square points) calibration using balanced mean with 1 ps correction step

(fixed point precision). After calibration, the delay mean = 32.603 ps, std

= 0.559 ps, and peak-peak error = 1.854 ps. . . . . . . . . . . . . . . . . 48

3.21 Clock synchronization of the reference clock, fref, using the DCO clock,

fout, and a divided-down DCO clock, clk8. The DCO clock is divided by

two using custom-designed CML dividers. The subsequent synchronization

logic and the feedback phase counter are fully synthesized. . . . . . . . . 49

3.22 Implementation of the digital loop filter. The coarse filter uses only pro-

portional gain, Kc, to accomplish rough frequency lock. Then, it gets

disabled such that a first-order filter takes over to achieve phase lock.

Gear shifting is used to accelerate the phase locking time. Finally, the IIR

can be enabled to filter out high-frequency noise. . . . . . . . . . . . . . 51

3.23 LC-DCO with two banks of tuning. The coarse tuning is implemented

using MiM capacitors while the fine tuning is achieved by using MOS

varactors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52


3.24 Coarse frequency tuning using Metal-Insulator-Metal (MiM) capacitors. . 52

3.25 Fine frequency tuning using MOSFET varactors. The frequency tuning is

defined by the difference of MOS capacitance between the ON and OFF

state. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

3.26 Illustration of the DCO control bits and gains. There are six coarse bits

with an average gain of 8125 kHz/code. Also, there are 24 fine bits that are

further divided into 6 MSBs with an average gain of 726 kHz/code

and 18 LSBs representing the fractional part of the frequency control word.

The 18 fractional LSBs are split into 7 bits decoded into a thermometer-coded matrix

and 11 bits provided to the ∆Σ modulator to achieve a frequency resolution

finer than what a minimum-size MOS varactor can achieve in a

particular process. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

3.27 Implementation of the third-order reduced complexity ∆Σ modulator [33].

The first stage has higher computational resolution compared with the

following stages to reduce power and complexity and to meet timing

requirements during synthesis. . . . . . . . . . . . . . . . . . . . . . . . . . 54

3.28 Die photo of the DPLL chip in an IBM (now GF) 130 nm bulk CMOS

process (active area is 0.43 mm²). . . . . . . . . . . . . . . . . . . . . . 55

3.29 PCBs used for powering, biasing, and programming the DPLL chip. . . . 56

3.30 Block diagram of the test setup. The FPGA on the DE0 nano board, which

controls the DPLL chip, is programmed via a PC using Altera Quartus.

KE5FX software runs on the PC and can capture the spectrum of a par-

ticular clock using an HP 8565C spectrum analyzer. . . . . . . . . . . . 57

3.31 DCO gain measurements. . . . . . . . . . . . . . . . . . . . . . . . . . . 58

3.32 The differential output clock captured using Tektronix RSA6114A. The

differential peak-to-peak voltage is 370 mV for a 2 GHz clock. . . . . . . . 58

3.33 Spectrum of the output clock, captured by HP8565C spectrum analyzer

and KE5FX tool, when the reference clock is frequency modulated. . . . 59

3.34 Verilog-A Simulation vs. Measurement captured by HP8565C spectrum

analyzer and KE5FX tool. . . . . . . . . . . . . . . . . . . . . . . . . . 60

3.35 Spurs spectrum at 2012.5 MHz measured using Agilent E4448A spectrum

analyzer before calibration. . . . . . . . . . . . . . . . . . . . . . . . . . . 60


3.36 Spurs spectrum at 2003.125 MHz measured using Tektronix RSA 6114A

real-time spectrum analyzer before calibration. . . . . . . . . . . . . . . . 61


real-time spectrum analyzer before calibration. . . . . . . . . . . . . . . . 61


real-time spectrum analyzer. . . . . . . . . . . . . . . . . . . . . . . . . . 62


real-time spectrum analyzer. . . . . . . . . . . . . . . . . . . . . . . . . . 63

3.40 Phase noise measurement of a 2 GHz clock using an HP8565C analyzer with

(red) and without (blue) the fine TDC. The reference clock is a 20 MHz

temperature-controlled oscillator. . . . . . . . . . . . . . . . . . . . . . . 64

3.41 The random jitter measurement of the output clock when the fine stochas-

tic TDC is activated. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

3.42 DPLL output phase noise spectrum at 2.4 GHz captured by an Agilent

E4448A spectrum analyzer. The in-band noise is -107 dBc/Hz while the

integrated jitter is 500 fs rms (0.43 degree) from 1 kHz to 100 MHz for a

loop bandwidth of 1.42 MHz. . . . . . . . . . . . . . . . . . . . . . . . . 66

3.43 DPLL output phase noise spectrum at 1.995 GHz captured by an Agilent

E4448A spectrum analyzer. The in-band noise is -104 dBc/Hz while the

integrated jitter is 233 fs rms from 1 kHz to 100 MHz for a loop bandwidth

of 700 kHz. An IIR filter was used to attenuate high frequency spurs. . . 66

3.44 Fractional synthesis measurements using HP8565C analyzer with (a) a

21 MHz input reference at channel 95 + 67/256 and (b) a 20 MHz input

reference at channel 109 + 64/256 exhibiting less than 1 ppm frequency

error. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

4.1 A digital PLL architecture for integer and fractional mode synthesis. . . 73

4.2 Buffer delay line implementation of TDC: simplified schematic view (left);

timing diagram (right). The raw Q[i] is a pseudo-thermometer code to be

converted into a normalized binary word representing the fractional phase

error. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

4.3 TDC output during frequency acquisition (below 30 µs) and phase acquisition for

different DPLL operation modes. . . . . . . . . . . . . . . . . . . . . . . 74

4.4 The DPLL nonlinearity due to the quantized TDC response. . . . . . . . . . 75


4.5 Dead-zone behavior of Integer-N DPLL with a TDC resolution of 32 ps. . 76

4.6 Bang-Bang behavior of Integer-N DPLL. . . . . . . . . . . . . . . . . . . 76

4.7 Spectrum of TDC normalized output. . . . . . . . . . . . . . . . . . . . . 78

4.8 Phase noise of the same output clock for 60 different initial conditions for

uncompensated Integer-mode DPLL. . . . . . . . . . . . . . . . . . . . . 79

4.9 Zero-phase-restart (ZPR) triggered during the transition from coarse to

fine locking mode. ZPR ensures a smooth transition from coarse to fine

locking without disrupting DCO [15]. . . . . . . . . . . . . . . . . . . . . 80

4.10 Dithering the reference clock by using ∆Σ modulator to control the pro-

grammable delay of an input clock buffer. . . . . . . . . . . . . . . . . . 81

4.11 A typical circuit to estimate the phase error of a coarse TDC. . . . . . . 82

4.12 Digital dithering algorithm at the falling edge of the output clock (0.5 UI). 83

4.13 Phase noise of the same output clock for 60 different initial conditions after

applying a noise-shaped random offset and disabling the fractional part of

ZPR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

4.14 Time Interval Error (TIE) for the 60 simulations with different initial

conditions: No dithering, Random dithering, ∗ Noise-shaped dithering. 84

4.15 A generic proposed circuit to generate a dithered phase error, tr/Tout, which

can be applied to an integer and simple fractional channel synthesis. . . 85

4.16 Die photo of the DPLL chip in IBM 130 nm bulk process [8]. It is the same

chip used to demonstrate the DPLL with a coarse-fine TDC in Chapter 3. 86

4.17 Phase noise measurement using HP8565C analyzer showing different be-

haviors of integer-mode DPLL. . . . . . . . . . . . . . . . . . . . . . . . 87

4.18 The measured jitter histogram during dead-zone operation. The extracted

random jitter is 896 fs RMS while the deterministic jitter due to dead-zone

operation is 28.3 ps peak-to-peak. . . . . . . . . . . . . . . . . . . . . . . 87

5.1 A DPLL with a quantized phase detector and without a feedback divider. 90

5.2 Transfer function of the MPBBD (thick solid blue) vs. BBPD (thin dashed

red) when a DCO period is divided into eight regions, with each region

spanning 45 degrees. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

5.3 Phase domain model of DPLL with quantized phase detector. . . . . . . 92

5.4 Illustration of cycle slipping and speed of frequency acquisition for BBPD

vs. MPBBD based DPLL. . . . . . . . . . . . . . . . . . . . . . . . . . . 96


5.5 The pull-in range (normalized to reference frequency) of BBPD-DPLL for

different values of Ki and Kp. Dashed lines obtained by Eq. 5.31, solid

lines obtained by Eq. 5.33, and symbols are based on simulations. . . . . 101

5.6 A plot of the number of cycle slips (Kp = 3 and Ki = 1/32) for BBPD versus

MPBBD based DPLL. (a) BBPD: blue circles from simulation results and

dashed red line from Eq. 5.44 (b) MPBBD: blue squares from simulation

results and solid red line from Eq. 5.44. . . . . . . . . . . . . . . . . . . . 104

5.7 Frequency locking time until cycle slips disappear (Kp = 3 and Ki = 1/32)

for a BBPD (blue circles from simulation) and MPBBD (blue squares from

simulation) based DPLL. Eq. 5.51 is represented using solid red, Eq. 5.55

using small dashed blue line, while Eq. 5.58 using large dashed red. . . . 107

5.8 Discrete-time model of phase error development for fast evaluation of DPLL. . . 111

5.9 Integral path output and cycle-slip trajectory for BBPD (blue triangles)

and MPBBD (red squares) based DPLL, when frequency offset is 3 MHz

(3% frequency error while Kp = 3 and Ki = 1/32). . . . . . . . . . . . . . 112

5.10 Equivalent ∆Σ representation of DPLL with a quantized phase detector. 113

5.11 Transient simulation comparison between BBPD and MPBBD based DPLL,

when frequency offset is 2.5 MHz (2.5% frequency error while Kp = 3 and

Ki = 1/32). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

5.12 Modification of the MPBBD transfer function to extend pull-in range and

reduce acquisition time. The improved MPBBD (IMPBBD) identifies the

sign of the initial frequency error and accordingly changes its transfer function. . . 117

5.13 Pull-in range and locking time of BBPD (blue ∗), MPBBD (red ), and

IMPBBD (green •) based DPLL. The lock-in range of the IMPBBD is

extended to ±fref (fref is 100 MHz and fout is 1 GHz while Kp = 3 and

Ki = 1/32). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

5.14 Integral path output and cycle-slip trajectory for DPLL with three differ-

ent phase detectors (frequency offset is 7.5 MHz and fref is 100 MHz while

Kp = 3 and Ki = 1/32). . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

5.15 Transient simulation comparison between MPBBD and IMPBBD based

DPLL, (frequency offset is 7.5 MHz and fref is 100 MHz while Kp = 3 and

Ki = 1/32). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120


5.16 Integral path output of BBPD (dark blue) vs. MPBBD (light red) based

DPLL (frequency offset is 6.0 MHz). The simulation employs uniform

time-step sampling (1/100 of DCO period) using a Verilog-A implementa-

tion of the DPLL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

5.17 The architecture of the implemented DPLL. The FLL and the high-speed

counter, as well as the synchronization structure, are disabled by a lock

detector once frequency lock occurs. The MPBBD locks the phase of the

output clock (phase A) to the reference clock. . . . . . . . . . . . . . . . 122

5.18 The programmable delay unit used to form a four-stage DCO. Each unit

has a 7-bit coarse capacitor bank and an 8-bit fine capacitor bank implemented as a

combination of a 4-bit binary-weighted bank and 15 thermometer-coded caps. . . 124

5.19 Timing diagram of the multi-phase DCO sampled by a reference clock at

some point. Based on the sequence of MPBBD outputs (01111000), the

LUT provides an indication of the phase error magnitude between phase

A and reference clock as shown in the circle on the right bottom side. . . 125

5.20 The transfer function of the MPBBD and its LUT (thick solid blue) vs.

BBPD (thin dashed red). . . . . . . . . . . . . . . . . . . . . . . . . . . 125

5.21 Absolute phase error of the output clock of DPLL with respect to an ideal

clock with (a) binary BBPD (thin dashed blue) and (b) MPBBD (solid

thick red). The initial frequency error is 300 MHz. Data is clipped below

6.5 µs as reset was applied at that moment after loading the right loop

configurations. The FLL takes 15.5 µs (310 reference cycles) while the PLL

takes 30 µs (600 cycles) when the BBPD (a) is used and 5 µs (100 cycles) when

the MPBBD (b) is employed. . . . . . . . . . . . . . . . . . . . . . . . . 126

5.22 The mapped output of the bang-bang detector (from the LUT) during frequency and phase lock. The binary BBPD slews when the phase error is large and takes a long time to recover. The MPBBD, on the other hand, automatically adjusts its gain according to the phase error magnitude until lock is achieved.

5.23 DPLL output phase-noise spectrum at 1.20 GHz: simulation (blue) vs. measurement (black) captured by an Agilent E4448A spectrum analyzer. The in-band noise is -98.32 dBc/Hz, while the loop bandwidth is around 1.7 MHz.


5.24 DPLL output phase-noise spectrum at 1.40 GHz captured by an Agilent E4448A spectrum analyzer. The in-band noise is -96.48 dBc/Hz, while the loop bandwidth is around 1.7 MHz.

5.25 Die photograph of the DPLL in STMicroelectronics 28 nm CMOS LP technology (active area is less than 0.008 mm²).

A.1 Schematic of the four-stage, 50 Ω output driver used to send the DCO output off-chip. The last differential pair M4 is sized W/L = 176 µm/120 nm, while the load resistor R4 = 62.5 Ω. The previous stages are sized as follows: transistor sizes of M4 = 2·M3 = 4·M2 = 8·M1 and resistor values of R4 = R3/2 = R2/4 = R1/8.

A.2 (a) Schematic of the CML latch used in the divide-by-2 circuit. R = 2 kΩ, while M1 = M2 = M3 have W = 6 µm and L = 120 nm. (b) Schematic of the divide-by-2 using two CML latches.

A.3 Schematic showing the CML-to-CMOS conversion employed after the CML divide-by-2. The CML signal is AC-coupled through Cc = 150 fF and then passed to a CMOS inverter with feedback resistor Rf = 35 kΩ to define the input common mode. The small cross-coupled CMOS inverters (W = 160 nm, L = 120 nm) are used to ensure differential operation. Another CMOS inverter stage with the same size as the first stage (W = 13.02 µm, L = 120 nm) follows.

A.4 (a) Illustrative diagram of the fabricated chip mounted in a QFN36 package and soldered on a PCB. (b) Lumped model of the output pads, bond wires, lead, and PCB trace capacitance. The chip dimensions are 1 mm × 1 mm, while the QFN36 dimensions are 5 mm × 5 mm. Accordingly, the bond wire could be 2-2.5 mm long, and so Lbw = 2.5 nH. The extracted pad capacitance was 90 fF.

A.5 Sense-amplifier flip-flop with a narrow metastability window [15].

C.1 Verilog-A model of DCO phase noise.


List of Abbreviations

ADPLL All-Digital Phase-Locked Loop

ADC Analog to Digital Converter

BBPD Bang-Bang Phase Detector

CDF Cumulative Distribution Function

CDR Clock and Data Recovery

CML Current-Mode Logic

CMOS Complementary Metal Oxide Semiconductor

DAC Digital to Analog Converter

DCO Digitally Controlled Oscillator

DCW DCO Control Word

DLF Digital Loop Filter

DPLL Digital Phase-Locked Loop

FCW Frequency Control Word

FLL Frequency-Locked Loop

FSM Finite State Machine

GSM Global System for Mobile Communications

IC Integrated Circuit


IP Intellectual Property

MOS Metal Oxide Semiconductor

MPBBD Multi-Phase Bang-Bang Detector

PDF Probability Density Function

PHE Fixed-point Phase Error Signal

PHF Fixed-point Feedback Phase Signal

PHR Fixed-point Reference Phase Signal

PLL Phase-Locked Loop

PVT Process, Voltage, and Temperature

RF Radio Frequency

RTL Register-Transfer Level

SoC System on a Chip

STDC Stochastic Time to Digital Converter

TDC Time to Digital Converter

VCO Voltage Controlled Oscillator


Chapter 1

Introduction

1.1 Motivation

Electronics innovation over the last two decades has been fueled by the expansion of wireless communications and standards, the high demand for Internet speed and capacity, and the emergence of the Internet of Things (IoT). This trend still has considerable room for growth. For example, Cisco Inc. estimated that there will be nearly 50 billion IoT devices by 2020, which is 50 times larger than the installed base in 2009 [1].

The electronics industry has adopted a system-on-a-chip (SoC) design methodology to meet the ever-growing demand for higher performance and for complex computation, storage, and communication capabilities under pressure to shorten time-to-market. An SoC design integrates a broad range of reusable components called intellectual properties (IPs), most of which are digital IPs such as ARM processors, memory blocks, and communication protocols. However, SoC design still needs mixed-signal and analog/RF IPs to interface with the physical world, such as analog-to-digital converters (ADCs), digital-to-analog converters (DACs), and phase-locked loops (PLLs).

Digitally assisted analog circuits have emerged to enhance analog and mixed-signal circuit performance by exploiting the growing computational power of digital signal processing (DSP) on an SoC [2]. This approach has been used to correct the output of pipelined ADCs and in various foreground and background calibration algorithms [3][2]. PLL design is no exception. Many researchers propose using DSP to correct for deterministic errors caused by dithering in a PLL and to calibrate the PLL loop gain and bandwidth across process, voltage, and temperature (PVT) variations [4]. The pioneering work of [5] proposes an all-digital PLL (ADPLL) to take full advantage of the continued scaling of CMOS processes. However, this goal is not entirely achievable, since ADPLLs still contain analog blocks to resolve the phase error and to control the output clock. This thesis focuses on the design of digitally intensive PLLs (DPLLs) and explores their limitations and ways to improve them.

1.2 Overview of PLLs

PLLs are feedback circuits that lock the phase of an output clock to the phase of an input reference clock. PLLs are important, and often performance-limiting, IPs in modern SoCs. They are used in a wide array of electronics, including microprocessors, communications devices, digital television, and DC motor control.

In a wireline transceiver, a PLL can be used to recover a clock signal embedded in a serial data stream (e.g., USB, PCI Express, SATA) transmitted over a noisy channel. In wireless communications, PLLs provide the local oscillator for up-conversion (modulation) during transmission and down-conversion (demodulation) during reception. A PLL can also serve as a frequency synthesizer that generates a stable frequency at a multiple of an input frequency, providing a programmable clock for various standards in a software-defined radio (SDR). In large digital circuits such as microprocessors, PLLs are used to distribute precisely timed clocks. PLLs are also used for jitter filtering and reduction, among many other applications. Different applications place different requirements on PLL specifications. For example, HDTV requires a PLL with a very small bandwidth, whereas clock and data recovery (CDR) circuits target a relatively large bandwidth to track jitter on the incoming data stream.

A PLL consists of a phase detector, a low-pass filter, a controllable oscillator, and a divider. The earliest PLLs were purely analog, with the phase detector implemented as a mixer or multiplier. Modern analog PLLs use a mixed-signal design that combines a digital phase-frequency detector (PFD) with an analog charge pump, whose output is converted to an analog voltage by an analog loop filter to control a voltage-controlled oscillator (VCO). A divider and a noise-shaping ∆Σ modulator synthesize integer and fractional channels, as shown in Fig. 1.1.


Figure 1.1: Typical analog PLL architecture with a multi-modulus divider and a ∆Σ modulator to synthesize fractional channels (blocks: PFD, charge pump, analog loop filter, VCO, N/N+1 dual-modulus divider, and digital ∆Σ modulator driven by an M-bit FCW).

Modern wireless and wireline communication standards place challenging demands on the phase noise, spurious tones, jitter accumulation, and modulation bandwidth of PLLs [6]. Accordingly, state-of-the-art wide-bandwidth analog PLLs employ analog phase-noise-cancellation techniques that use a DAC to suppress the quantization noise caused by ∆Σ dithering, as shown in Fig. 1.2. This enables a PLL to operate with small fractional spurs and low phase noise at loop bandwidths of 700 kHz to 1 MHz [4]. However, matching the DAC cancellation signal to the phase error is a complicated and challenging analog circuit problem.

Figure 1.2: Typical analog PLL architecture with a DAC to cancel the deterministic error caused by the ∆Σ modulator [7]. A residual error remains due to imperfect matching between the DAC output and the phase error.


1.3 Overview of DPLLs

Research on DPLLs has actively sought to replace or complement traditional analog PLLs by taking advantage of aggressive CMOS scaling and lower supply voltages [8]. DPLLs offer several advantages over their analog counterparts. First, DPLLs are less sensitive to external noise, substrate noise, mismatch, and PVT variations, since many DPLL building blocks are realized with purely digital logic circuits. Second, DPLLs occupy much less area than analog PLLs, and programmability and testability are available at a very low area penalty, reducing die size and production cost [15].

In DPLLs, there is no charge pump and no analog loop filter. Instead, a digital filter is implemented using standard digital logic cells. The phase detector is replaced or augmented by a time-to-digital converter (TDC). Fig. 1.3 shows a high-level diagram of the DPLL architecture considered in this thesis. An integer counter provides an estimate of the number of output periods in one reference period. This estimate is corrected by the TDC, which resolves the phase difference between the reference and output clocks and produces a phase error ε normalized to the output clock period. Together, the counter and TDC generate a feedback phase count, PHF, which is digitally subtracted from the accumulated reference phase, PHR, to produce a phase error count, PHE. The PHE is then digitally filtered and applied to tune a digitally controlled oscillator (DCO).
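The phase bookkeeping described above can be illustrated with a minimal open-loop sketch. It assumes an ideal rounding TDC, a DCO whose phase is zero at the first reference edge, and a sign convention in which ε is the time since the last DCO edge normalized to the DCO period. The function name and these conventions are hypothetical and do not come from the thesis RTL. When the DCO is exactly on frequency, PHE stays within half a TDC step. A frequency offset makes PHE ramp, which is what the loop filter then acts on.

```python
import math

def phase_error_sequence(fcw, f_dco_hz, f_ref_hz, tdc_res_s, n_cycles):
    """PHE[n] = PHR[n] - PHF[n] for a free-running DCO (hypothetical open-loop model).

    PHR accumulates FCW on every reference edge. PHF combines an integer count of
    DCO edges with the TDC's fractional estimate eps. All phases are in DCO cycles.
    """
    t_dco = 1.0 / f_dco_hz
    phe = []
    for n in range(n_cycles):
        phr = n * fcw                                    # accumulated reference phase
        dco_phase = n * f_dco_hz / f_ref_hz              # true DCO phase at reference edge n
        count = math.floor(dco_phase)                    # integer counter output
        frac_s = (dco_phase - count) * t_dco             # time since last DCO edge (s)
        eps = round(frac_s / tdc_res_s) * tdc_res_s / t_dco  # quantized, normalized to Tout
        phf = count + eps                                # feedback phase
        phe.append(phr - phf)                            # phase error sent to the loop filter
    return phe

FCW, FREF = 120.01709, 20e6
locked = phase_error_sequence(FCW, FCW * FREF, FREF, 48e-12, 1000)
offset = phase_error_sequence(FCW, FCW * FREF + 100e3, FREF, 48e-12, 1000)
print(f"max |PHE| on frequency: {max(abs(e) for e in locked):.4f} cycles")
print(f"PHE after 1000 cycles with +100 kHz offset: {offset[-1]:.2f} cycles")
```

With a 48 ps TDC at about 2.4 GHz, the on-frequency PHE is bounded by roughly 0.058 cycles. The +100 kHz offset accumulates -0.005 cycles per reference period.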

Despite these advantages, DPLLs introduce new design challenges because frequency is quantized in the DCO and phase is quantized in the TDC, which introduces quantization error and, hence, jitter. The resolution of phase error detection is typically limited by the inverter delay of a particular fabrication process. For instance, one inverter delay is about 32 ps in 130 nm CMOS technology, whereas it drops to 16 ps in 65 nm CMOS technology.

Figure 1.3: Digital PLL architecture where the loop filter is all digital, and the phase and frequency error signals are fixed-point numbers.

TDC quantization noise and reference clock jitter are low-pass filtered by the DPLL dynamics and therefore dominate at low frequencies, within the DPLL loop bandwidth. DCO noise, on the other hand, is high-pass filtered and dominates at high frequencies, as shown in Fig. 1.4.

Combining a wide loop bandwidth with excellent in-band phase noise remains particularly challenging for DPLLs. The work in [5] demonstrates that a DPLL can meet even the stringent GSM specification. However, its 40 kHz loop bandwidth is an order of magnitude lower than that achieved by the analog techniques described above. In applications where only high-frequency phase noise matters, a DPLL can accommodate a wide loop bandwidth with a simple bang-bang phase detector (no TDC), as in [9]. More generally, however, DPLLs with a wide loop bandwidth need a very fine TDC resolution. At the same time, the TDC's input dynamic range should cover at least one DCO output period so that the DPLL can estimate the phase error across an entire DCO period. An even larger dynamic range, of at least two DCO periods, is needed if on-chip jitter measurement is to be performed. Two recent DPLLs extended the loop bandwidth to 142 kHz [10] and 3 MHz [11]. However, the former cannot achieve low in-band phase noise, and the latter sacrifices its out-of-band noise performance.

Figure 1.4: Phase noise contributions for low- and high-bandwidth DPLLs.

A fine-resolution TDC also prevents detrimental nonlinear dynamics from arising in DPLLs. When a DPLL operates as a fractional-N synthesizer, the phase relationship between the DCO output and the reference input is scrambled over time, and the TDC quantization error can be approximated as white noise [12]. When the DPLL is locked in integer-N mode, however, the phase relationship between the TDC inputs is fixed. The TDC may then exhibit either bang-bang behavior, associated with an unpredictable loop bandwidth, or dead-zone behavior, in which the dynamics depend strongly on the initial conditions of the loop. This thesis focuses on improving phase and frequency detection in general, and TDC resolution and linearity in particular, because doing so improves the noise performance of DPLLs in both integer and fractional synthesis modes.

1.4 Thesis Contribution

The thesis first presents a fractional-N DPLL that can operate from 1.99 to 2.5 GHz. Although the DPLL design is meant to be generic, it was designed with the unlicensed 2.4 GHz ISM (industrial, scientific, and medical) band in mind. Many applications in the ISM band could take advantage of a wide-bandwidth DPLL by using direct digital modulation. For example, a Bluetooth transmitter employs Gaussian frequency-shift keying (GFSK) with a 1 Mbps basic data rate, i.e., a 500 kHz bandwidth [13]. A Bluetooth transmitter typically uses either open-loop VCO modulation or an up-conversion transmitter architecture. Alternatively, a 2.4 GHz DPLL with a 500 kHz bandwidth can modulate the DCO frequency directly in the digital domain, without a DAC or an up-conversion mixer. However, direct digital modulation requires very low in-band noise. To achieve -114 dBc/Hz in-band phase noise at 2.4 GHz with a 20 MHz reference clock, a TDC with 2 ps resolution is needed. This is more than an order of magnitude finer than an inverter delay in 0.13 µm technology. A 9-bit TDC is needed to cover the maximum period of 503 ps with 2 ps resolution.

The minimum inverter delay in 0.13 µm technology varies over PVT from 32 to 48 ps. Accordingly, a coarse TDC with at least 16 stages is needed to cover the maximum period of 503 ps. An obvious way to achieve 2 ps resolution is to array 24 instances of the 16-stage coarse TDC in parallel, each with an extra 2 ps delay relative to the previous one. However, this arrangement would increase area and power by a factor of 24. Alternatively, a single 16-stage coarse TDC followed by a fine TDC with 2 ps resolution saves power and area. The fine TDC must cover a range larger than the maximum possible inverter delay in the coarse TDC. In this thesis, a fine TDC with 2 ps resolution and a 64 ps range is sought.

The thesis presents a fractional-N DPLL that incorporates a novel low-power two-step coarse-fine TDC to achieve low in-band phase noise. The DPLL uses a 6-bit stochastic TDC for the fine stage while still achieving a wide locking range using a 4-bit coarse delay-line TDC. On power-up, a calibration algorithm that minimizes nonlinearities in the coarse TDC is enabled. Using a balanced mean code density test reduces the number of registers required for the calibration algorithm by 30%. Based on the coarse TDC output, the appropriate clock signals are multiplexed into the stochastic fine TDC. Implemented in 0.13 µm CMOS, the DPLL consumes a total of 15.2 mW, of which 4.4 mW is consumed in the TDC. The integrated random jitter is 213 fs rms for a 2 GHz output carrier frequency with a 700 kHz loop bandwidth. The calibration and IIR filtering reduce the worst-case spurs from -54.4 dBc to -70.55 dBc at 1.995 GHz operation.
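The sizing argument above can be checked with a few lines of arithmetic. The sketch below inverts the TDC quantization-noise expression derived in Chapter 2 (Eq. 2.13) to find the resolution needed for -114 dBc/Hz. It then counts the coarse stages and brute-force parallel copies implied by the 32-48 ps inverter-delay range. The constant names are illustrative. The lowest output frequency of 1.99 GHz is taken as the source of the longest period.

```python
import math

F_OUT_MIN_HZ = 1.99e9          # lowest output frequency -> longest DCO period
F_OUT_TARGET_HZ = 2.4e9        # ISM-band carrier used for the noise target
F_REF_HZ = 20e6                # reference clock
TARGET_DBC_HZ = -114.0         # in-band phase-noise target
INV_DELAY_PS = (32.0, 48.0)    # min/max inverter delay over PVT in 0.13 um CMOS
FINE_STEP_PS = 2.0             # desired TDC resolution
FINE_RANGE_PS = 64.0           # fine-TDC range sought in the thesis

def required_tdc_resolution_s(l_dbc_hz, f_out_hz, f_ref_hz):
    """Invert Eq. 2.13: dt = sqrt(3 * L * f_ref / (pi^2 * f_out^2))."""
    l_lin = 10 ** (l_dbc_hz / 10)
    return math.sqrt(3 * l_lin * f_ref_hz / (math.pi ** 2 * f_out_hz ** 2))

dt_ps = required_tdc_resolution_s(TARGET_DBC_HZ, F_OUT_TARGET_HZ, F_REF_HZ) * 1e12
t_max_ps = 1e12 / F_OUT_MIN_HZ
coarse_stages = math.ceil(t_max_ps / INV_DELAY_PS[0])      # fastest corner needs most stages
parallel_copies = round(INV_DELAY_PS[1] / FINE_STEP_PS)    # brute-force 2 ps interpolation

print(f"required TDC resolution ~ {dt_ps:.2f} ps")
print(f"longest period ~ {t_max_ps:.1f} ps -> {coarse_stages} coarse stages")
print(f"brute-force interpolation needs {parallel_copies} coarse TDC copies")
print(f"fine range {FINE_RANGE_PS} ps exceeds max inverter delay: {FINE_RANGE_PS > INV_DELAY_PS[1]}")
```

The result of about 2.05 ps agrees with the 2 ps figure quoted in the text, and 16 stages and 24 copies follow directly from the delay range.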

The second part of the thesis presents a novel digital solution to the problem of dead-zone behavior in a DPLL caused by the quantization effect of the TDC. Dead-zone behavior produces limit cycles, which cause higher-than-expected in-band phase noise and strong in-band spurious tones. This behavior depends on the initial phase difference between the output and reference clocks, which makes the DPLL performance inconsistent and unpredictable. To alleviate this problem, a noise-shaped offset is added to the phase error in the digital domain to keep the TDC active and away from the dead zone. The proposed solution is verified by extensive simulation and with a DPLL prototype in a 0.13 µm CMOS process.
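A generic toy example shows why a dither can pull a quantizer out of a dead zone. This is not the noise-shaped offset algorithm of Chapter 4. It only assumes an ideal rounding TDC, a constant sub-LSB phase error, and a first-order high-pass dither d[n] = u[n] − u[n−1] built from uniform samples u. The function names are hypothetical. Without dither, the TDC reports zero forever. With dither, its average output tracks the true error, while the dither component itself is high-pass shaped.

```python
import random

def tdc(x_lsb):
    """Ideal mid-tread TDC: rounds a phase error (in TDC LSBs) to the nearest code."""
    return round(x_lsb)

def mean_tdc_output(phase_error_lsb, n=100_000, dither=False, seed=1):
    """Average TDC code for a constant phase error, optionally with shaped dither."""
    rng = random.Random(seed)
    prev_u = 0.0
    total = 0
    for _ in range(n):
        d = 0.0
        if dither:
            u = rng.uniform(-0.5, 0.5)
            d = u - prev_u          # first-order difference: high-pass (noise-shaped) dither
            prev_u = u
        total += tdc(phase_error_lsb + d)
    return total / n

print("no dither:  ", mean_tdc_output(0.3))
print("with dither:", round(mean_tdc_output(0.3, dither=True), 3))
```

Differencing two uniform samples gives a triangular dither, so the average TDC output is unbiased even though any single output is still an integer code.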

The third part of this thesis presents a rigorous mathematical analysis of a DPLL that uses a quantized phase detector during frequency acquisition, when DPLLs usually exhibit cycle slipping. The analysis finds that the pull-in range is proportional to the square root of the phase detector large-signal gain, √KPD, while the locking time is inversely proportional to its square, KPD². Based on these findings, a DPLL based on a multi-phase bang-bang detector (MPBBD) is proposed. It accelerates frequency and phase locking and increases the pull-in range while maintaining the same steady-state performance as a DPLL based on a bang-bang phase detector (BBPD). The proposed DPLL reduces power consumption by disabling the high-speed counter and the re-timing circuit in the feedback loop once frequency lock is achieved. An improved version of the MPBBD is also suggested that extends the pull-in range up to the reference frequency. This could eliminate the need for a frequency-locked loop and a feedback counter in DPLLs and digital CDRs.

1.5 Thesis Outline

This thesis is structured as follows. Chapter 2 introduces the DPLL structure, presents a discrete-time mathematical model for analysis, and discusses the effect of TDC quantization noise on phase noise and jitter performance. Chapter 3 starts with a review of state-of-the-art TDCs and then describes the proposed coarse-fine TDC and the TDC calibration loop. It also gives an overview of the DCO and the digital loop, and it concludes with the test setups and measurement results. Chapter 4 presents the nonlinear problem of dead-zone behavior, which manifests during integer-mode operation and affects the loop dynamics and phase noise performance. It concludes with the implemented dithering algorithm that alleviates the dead-zone problem, along with simulation and measurement results. Chapter 5 analyzes a DPLL that uses quantized phase detection during frequency locking and derives closed-form expressions for the locking time and frequency pull-in range of a DPLL. An analogy between a DPLL and a ∆Σ modulator is drawn to estimate the frequency capture range. The chapter finishes with a prototype MPBBD-based DPLL chip in 28 nm CMOS technology, along with simulation and measurement results. Finally, Chapter 6 concludes the thesis.


Chapter 2

System-Level Overview and Analysis of DPLL

This chapter presents a system-level overview of DPLLs. A discrete-time model of a DPLL is then given, along with an approximate continuous-time model used to derive the loop response and the key loop performance metrics. Afterwards, the chapter addresses the effect of TDC and DCO quantization noise on DPLL performance and discusses TDC-related problems during fractional-N frequency synthesis. Finally, it gives an overview of the basic structure and operation of TDCs and of the normalization circuit that follows them.

2.1 Introduction

Fig. 2.1 shows a general block diagram of a DPLL. The DCO changes its frequency in discrete steps and has two digital control inputs, each with a separate gain and range. One input has a coarse frequency step¹ but a wide frequency tuning range, and it is used during frequency locking at power-on or reset. The other input has a fine frequency step and a small tuning range, which must be larger than one coarse frequency step so that it covers the PVT variations of the coarse resolution. The DCO is implemented as a DAC followed by a VCO. The DAC can be a voltage, current, or capacitance DAC. The latter consists of switchable capacitor banks that increase or decrease the capacitive load in the DCO.

¹The thesis refers to the frequency step as the frequency resolution ∆fres, which represents the least-significant bit (LSB) of the DCO.

Figure 2.1: Digital PLL architecture where the loop filter is all digital, and the phase and frequency error signals are fixed-point numbers.

The TDC acts as a phase detector that finds a quantized phase error between the reference clock (fref) and the output clock (fout). The phase error is quantized with a resolution limited by the fabrication technology and the TDC implementation. The accumulated reference phase (PHR), feedback phase (PHF), and phase error (PHE) are all fixed-point digital signals. Similarly, the frequency control word (FCW or N), the coarse and fine DCO control words (DCW), and the TDC error signal (ε) are all fixed-point digital signals.

At startup, a frequency-locked loop (FLL) sets the coarse DCW close to the required output frequency by using a first-order DPLL with a high proportional gain, Kp. A finite-state machine (FSM) then takes control of the DPLL, lowers Kp, and enables third-order loop operation. This corrects the residual frequency error and locks the output clock phase to the input reference clock.

2.2 Time-domain model of DPLL

DPLLs are implemented with discrete-time DSP, so time-domain models are required to capture DPLL performance and limitations accurately. A DPLL designer is mainly interested in stability and transient performance, such as locking time and ringing, as well as steady-state phase noise and jitter performance.


Figure 2.2: DPLL model in discrete time: a TDC with gain Ktdc and delay z⁻ᵐ, a reference phase scaled by N, a digital loop filter with proportional gain Kp, a delayed integral path Ki·z⁻¹/(1 − z⁻¹), and an IIR stage k/(1 − (1 − k)z⁻¹), and a DCO modeled as Kdco·Tref·z⁻¹/(1 − z⁻¹). The DCO gain, Kdco, is expressed in Hz/LSB. The phase detector gain, Ktdc, is unity in fractional mode and is inversely proportional to the input phase error in integer mode.

2.2.1 DPLL model

The DPLL can be represented by a discrete-time (z-domain) model, as shown in Fig. 2.2. A digital loop filter (DLF) with a proportional path and a delayed integral path defines the loop dynamics. As in an analog PLL, an additional high-frequency pole provides extra filtering of high-frequency spurs and noise. This pole is implemented as an infinite impulse response (IIR) filter controlled by the value of k. The multiplication coefficients (Kp, Ki, and k) in the DLF are implemented with shift-and-add operations to reduce complexity and cost.
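The DLF structure described above can be realized with shifts and adds, as sketched below. The power-of-two coefficients (Kp = 1, Ki = 1/64, k = 1/4), the integer data path, and the class name are assumptions made for illustration. A practical implementation would also carry fractional bits to limit truncation error.

```python
class DigitalLoopFilter:
    """Shift-and-add DLF sketch: proportional path, delayed integral path
    Ki*z^-1/(1 - z^-1), and a one-pole IIR k/(1 - (1 - k)z^-1).
    Coefficients are restricted to powers of two (hypothetical choice)."""

    def __init__(self, kp_shift=0, ki_shift=6, k_shift=2):
        self.kp_shift = kp_shift  # Kp = 2**-kp_shift
        self.ki_shift = ki_shift  # Ki = 2**-ki_shift (e.g. 1/64)
        self.k_shift = k_shift    # k  = 2**-k_shift  (e.g. 1/4)
        self.acc = 0              # integral accumulator
        self.iir = 0              # IIR output register

    def step(self, phe):
        """Process one integer PHE sample per reference cycle; return the fine DCW."""
        # Read the accumulator before updating it -> integral path sees only past samples (z^-1).
        integral = self.acc >> self.ki_shift   # Python's >> floors, like an arithmetic shift
        self.acc += phe
        pi_out = (phe >> self.kp_shift) + integral
        # y[n] = y[n-1] + k*(x[n] - y[n-1]), i.e. (1 - k)*y[n-1] + k*x[n]
        self.iir += (pi_out - self.iir) >> self.k_shift
        return self.iir

dlf = DigitalLoopFilter()
response = [dlf.step(x) for x in [4096] + [0] * 199]   # impulse response
print(response[:4], response[-1])
```

Feeding an impulse shows the IIR stage smoothing the proportional kick. The response then settles to the integral-path level of 4096/64 = 64.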

In contrast to a conventional PLL, the divider ratio N does not appear in the feedback path of the DPLL model shown in Fig. 2.2. The TDC finds the normalized phase error φe, in DCO periods, between the reference clock fref and the output clock fout. It does so by using fref to sample delayed versions of fout directly, without dividing fout down. Accordingly, the reference phase, φref, must be multiplied by the frequency ratio between the output and input clocks, i.e., N or FCW. The TDC gain, Ktdc, equals one in fractional mode. During integer-mode operation, however, it can be very large or very small, when the DPLL exhibits "bang-bang" or "dead-zone" behavior, respectively, as explained in Chapter 4. The term z⁻ᵐ represents additional delay within the DPLL and depends on the implementation details of the particular TDC.

The DCO reacts to the filtered normalized phase error (i.e., the fine DCW) and changes its frequency according to its gain (i.e., its resolution, Kdco). Because oscillator phase is the integral of frequency over time, a discrete-time integrator, z⁻¹/(1 − z⁻¹), is needed to model the frequency-to-phase conversion in the DCO. In a discrete-time model, quantities are updated once every reference period. There is therefore an embedded zero-order hold at the DCO output, since the DCO holds its frequency for an entire reference period, and this hold can be approximated by Tref at the low frequencies of interest [14]. Based on the DPLL model shown in Fig. 2.2, the open-loop transfer function is given by

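$$H_{ol}(z^{-1}) = \left(K_{tdc}\,z^{-m}\right)\left(K_p + K_i\,\frac{z^{-1}}{1-z^{-1}}\right)\left(\frac{k}{1-(1-k)z^{-1}}\right)\left(K_{dco}\,\frac{T_{ref}\,z^{-1}}{1-z^{-1}}\right) \tag{2.1}$$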

where Tref = 1/fref is the reference (sampling) period. The equivalent continuous-time DPLL model can be found with a forward-rectangular discrete-to-continuous-time conversion, which approximates z by sTref + 1 while preserving the stability of the system [14]. Also, recall that $z = e^{sT_{ref}}$. Using a power-series expansion, $z^{-m} = e^{-mT_{ref}s}$ can then be approximated as $1 - mT_{ref}s$. The equivalent continuous-time DPLL model therefore has the following approximate open-loop response:

$$H_{ol}(s) = K_{tdc}\left(1 - mT_{ref}s\right)\left(K_p + \frac{K_i}{sT_{ref}}\right)\left(\frac{1 + sT_{ref}}{1 + sT_{ref}/k}\right)\left(\frac{K_{dco}}{s}\right) \tag{2.2}$$
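The forward-rectangular approximation can be checked numerically by evaluating Eq. 2.1 on the unit circle and Eq. 2.2 on the imaginary axis. The sketch below uses Kp = 1, Ki = 1/64, Kdco = 726 kHz/LSB, and fref = 20 MHz, the settings of Fig. 2.3. It also assumes m = 1 and k = 1, so the IIR stage drops out. These choices and the helper names are illustrative. The mismatch stays well below 1% at 10 kHz and grows as the frequency approaches a noticeable fraction of fref.

```python
import cmath
import math

def hol_discrete(f_hz, kp, ki, k, kdco_hz, fref_hz, m=1, ktdc=1.0):
    """Eq. 2.1 evaluated on the unit circle, z = exp(j*2*pi*f*Tref)."""
    tref = 1.0 / fref_hz
    zi = 1 / cmath.exp(1j * 2 * math.pi * f_hz * tref)     # z^-1
    return (ktdc * zi ** m
            * (kp + ki * zi / (1 - zi))
            * (k / (1 - (1 - k) * zi))
            * (kdco_hz * tref * zi / (1 - zi)))

def hol_continuous(f_hz, kp, ki, k, kdco_hz, fref_hz, m=1, ktdc=1.0):
    """Eq. 2.2 evaluated at s = j*2*pi*f."""
    tref = 1.0 / fref_hz
    s = 1j * 2 * math.pi * f_hz
    return (ktdc * (1 - m * tref * s)
            * (kp + ki / (s * tref))
            * ((1 + s * tref) / (1 + s * tref / k))
            * (kdco_hz / s))

def rel_error(f_hz, **params):
    """Relative mismatch |H(z) - H(s)| / |H(s)| at frequency f_hz."""
    d = hol_discrete(f_hz, **params)
    c = hol_continuous(f_hz, **params)
    return abs(d - c) / abs(c)

params = dict(kp=1, ki=1 / 64, k=1.0, kdco_hz=726e3, fref_hz=20e6, m=1)
for f in (1e4, 1e5, 1e6):
    print(f"f = {f:.0e} Hz: relative mismatch = {rel_error(f, **params):.3%}")
```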

If k ≈ 1, the IIR term is approximately unity and can be omitted from the open-loop response in Eq. 2.2. Also assuming that the TDC introduces no significant extra delay within the loop, i.e., m ≈ 0,

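$$H_{ol}(s) \approx K_{tdc}\left(K_p + \frac{K_i}{sT_{ref}}\right)\left(\frac{K_{dco}}{s}\right) = \frac{K_{tdc}K_{dco}K_i}{T_{ref}}\cdot\frac{1 + s\,\dfrac{K_pT_{ref}}{K_i}}{s^2} \tag{2.3}$$

$$\Rightarrow\; H_{ol}(s) = \frac{\omega_n^2}{s^2}\left(1 + \frac{s}{\omega_z}\right) \tag{2.4}$$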

Eq. 2.3 represents a second-order system with natural frequency ωn, damping factor ζ, and phase margin PM, given as follows:

Natural frequency:
$$\omega_n = \sqrt{\frac{K_{tdc}K_{dco}K_i}{T_{ref}}} \tag{2.5}$$

Damping factor:
$$\zeta = \frac{\omega_n}{2\omega_z} = \frac{K_p}{2}\sqrt{\frac{K_{tdc}K_{dco}T_{ref}}{K_i}} \tag{2.6}$$

Unity-gain bandwidth:
$$\omega_{UGB} = \omega_n\sqrt{2\zeta^2 + \sqrt{4\zeta^4 + 1}} \tag{2.7}$$

Zero frequency:
$$\omega_z = \frac{K_i}{K_pT_{ref}} \tag{2.8}$$

Phase margin:
$$PM = \tan^{-1}\left(\frac{\omega_{UGB}}{\omega_z}\right) = \tan^{-1}\left(2\zeta\sqrt{2\zeta^2 + \sqrt{4\zeta^4 + 1}}\right) \tag{2.9}$$

3-dB bandwidth:
$$\omega_{3dB} \approx 2\zeta\omega_n = K_{tdc}K_{dco}K_p \tag{2.10}$$

Figure 2.3: DPLL response for different sampling rates with Kp = 1, Ki = 1/64, and DCO gain Kdco = 726 kHz/LSB: (a) step response (normalized filter output vs. time in µs); (b) closed-loop magnitude response (dB vs. frequency in rad/s) when N = 1. The blue circles represent DPLL responses for fref = 20 MHz, and the green triangles represent DPLL responses for fref = 40 MHz.

From the above equations, the DPLL behavior is mainly determined by the DLF coefficients (Kp and Ki), the DCO gain (Kdco), and the reference frequency (fref = 1/Tref). Interestingly, the loop bandwidth, ω3dB, is set only by the DCO gain and the proportional path gain, Kp, together with Ktdc. In particular, it is not affected by the frequency division ratio, N. The TDC gain, Ktdc, is unity in the fractional mode of operation. Fig. 2.3 shows the DPLL response for two sampling rates (i.e., reference frequencies), with all other loop parameters kept the same. Since ζ is proportional to √Tref (Eq. 2.6), higher sampling rates make the DPLL response less damped, which causes ringing and peaking.
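Evaluating Eqs. 2.5-2.10 for the two settings of Fig. 2.3 makes the damping trend concrete. The sketch below applies the equations exactly as written, with Kdco in Hz/LSB and Ktdc = 1 (fractional mode), and reports angular frequencies in rad/s. The function name is illustrative.

```python
import math

def loop_metrics(kp, ki, kdco_hz, fref_hz, ktdc=1.0):
    """Evaluate Eqs. 2.5-2.10 for the linearized loop (angular frequencies in rad/s)."""
    tref = 1.0 / fref_hz
    wn = math.sqrt(ktdc * kdco_hz * ki / tref)                  # Eq. 2.5
    zeta = (kp / 2.0) * math.sqrt(ktdc * kdco_hz * tref / ki)   # Eq. 2.6
    root = math.sqrt(2 * zeta ** 2 + math.sqrt(4 * zeta ** 4 + 1))
    w_ugb = wn * root                                          # Eq. 2.7
    wz = ki / (kp * tref)                                      # Eq. 2.8
    pm_deg = math.degrees(math.atan(w_ugb / wz))               # Eq. 2.9
    w3db = 2 * zeta * wn                                       # Eq. 2.10
    return {"wn": wn, "zeta": zeta, "w_ugb": w_ugb, "wz": wz,
            "pm_deg": pm_deg, "w3db": w3db}

for fref in (20e6, 40e6):
    m = loop_metrics(kp=1, ki=1 / 64, kdco_hz=726e3, fref_hz=fref)
    print(f"fref = {fref / 1e6:.0f} MHz: zeta = {m['zeta']:.3f}, "
          f"PM = {m['pm_deg']:.1f} deg, w3dB = {m['w3db']:.3e} rad/s")
```

Doubling fref lowers ζ from about 0.76 to about 0.54 and reduces the phase margin. The bandwidth estimate 2ζωn = Kdco·Kp stays unchanged.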

Scaling up Ki while keeping all other parameters fixed increases the natural frequency (ωn) and reduces both the damping factor (ζ) and the phase margin. The settling time and loop bandwidth are barely affected, as shown in Fig. 2.4.



Figure 2.4: DPLL behavior for different damping settings: (a) step response (normalized filter output vs. time in µs); (b) closed-loop magnitude response (dB vs. frequency in rad/s). The DCO gain Kdco = 726 kHz/LSB and the proportional gain Kp = 1 for all settings. The blue circles represent the DPLL response with Ki = 1/64, the green triangles with Ki = 4/64, and the red squares with Ki = 16/64.

Figure 2.5: DPLL behavior for different bandwidth settings with the damping ratio kept the same: (a) step response (normalized filter output vs. time in µs); (b) closed-loop magnitude response (dB vs. frequency in rad/s). The DCO gain Kdco = 726 kHz/LSB. The blue circles represent the DPLL response with Kp = 1 and Ki = 1/64, the green triangles with Kp = 2 and Ki = 4/64, and the red squares with Kp = 4 and Ki = 16/64.

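To achieve programmable-bandwidth operation, Kp must be scaled by α and Ki by α². This preserves the loop phase margin and the peaking in the closed-loop frequency response, as shown in Fig. 2.5. The reason is that ζ ∝ Kp/√Ki (Eq. 2.6) and ω3dB ∝ Kp (Eq. 2.10), so this scaling keeps ζ constant while multiplying the bandwidth by α.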
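The two parameter sweeps of Figs. 2.4 and 2.5 can be reproduced from Eqs. 2.5, 2.6, and 2.10. The reference frequency for those figures is not stated, so the sketch below assumes fref = 20 MHz. The trends do not depend on this choice. The function name is illustrative.

```python
import math

def zeta_and_bandwidth(kp, ki, kdco_hz=726e3, fref_hz=20e6):
    """Eqs. 2.5, 2.6 and 2.10 with Ktdc = 1 (fractional mode); fref is an assumption."""
    tref = 1.0 / fref_hz
    wn = math.sqrt(kdco_hz * ki / tref)
    zeta = (kp / 2.0) * math.sqrt(kdco_hz * tref / ki)
    return zeta, 2 * zeta * wn

# Fig. 2.4-style sweep: only Ki changes -> damping drops, bandwidth stays put.
ki_sweep = [zeta_and_bandwidth(kp=1, ki=ki) for ki in (1 / 64, 4 / 64, 16 / 64)]
# Fig. 2.5-style sweep: Kp scaled by alpha, Ki by alpha**2 -> same damping, bandwidth x alpha.
alpha_sweep = [zeta_and_bandwidth(kp=a, ki=a ** 2 / 64) for a in (1, 2, 4)]

for label, sweep in (("Ki sweep", ki_sweep), ("alpha sweep", alpha_sweep)):
    for zeta, w3db in sweep:
        print(f"{label}: zeta = {zeta:.3f}, w3dB = {w3db:.3e} rad/s")
```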



Finally, the closed-loop response is given by

$$G(s) = \frac{N\,H_{ol}}{1 + H_{ol}} = N\,\frac{1 + s/\omega_z}{s^2/\omega_n^2 + s/\omega_z + 1} \tag{2.11}$$

The TDC quantization noise and the TDC intrinsic phase noise are already normalized to the DCO period, so they are low-pass filtered by G(s)/N = Hol/(1 + Hol). The DCO intrinsic phase noise, on the other hand, is high-pass filtered by [1 − G(s)/N]. Any frequency noise due to the quantization and dithering process is band-pass filtered by [2π/s][1 − G(s)/N]. The following sections investigate the effect of DCO and TDC quantization noise on the in-band phase noise.
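The complementary low-pass and high-pass shapes can be seen by evaluating Eq. 2.11 divided by N. The sketch below reuses the Fig. 2.3 settings with fref = 20 MHz, chosen here for illustration, and the constant names are hypothetical. Near DC, the TDC path passes with unity gain while the DCO path is strongly suppressed. Well above the loop bandwidth, the roles reverse.

```python
import math

KP, KI, KDCO_HZ, FREF_HZ = 1.0, 1 / 64, 726e3, 20e6   # Fig. 2.3 settings (fref = 20 MHz)
TREF = 1.0 / FREF_HZ
WN = math.sqrt(KDCO_HZ * KI / TREF)   # Eq. 2.5 with Ktdc = 1
WZ = KI / (KP * TREF)                 # Eq. 2.8

def noise_transfer(f_hz):
    """Return (|G/N|, |1 - G/N|) at s = j*2*pi*f using Eq. 2.11 normalized by N."""
    s = 1j * 2 * math.pi * f_hz
    g_over_n = (1 + s / WZ) / (s ** 2 / WN ** 2 + s / WZ + 1)
    return abs(g_over_n), abs(1 - g_over_n)

for f in (1e3, 1e5, 5e6):
    lp, hp = noise_transfer(f)
    print(f"f = {f:9.0f} Hz: |G/N| = {lp:.4f} (TDC path), |1 - G/N| = {hp:.4f} (DCO path)")
```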

2.2.2 TDC quantization noise

Assume the TDC uniformly quantizes the phase difference with a TDC resolution ∆ttdc (expressed in seconds). The variance of the timing uncertainty is then $\sigma_{tQ}^2 = \Delta t_{tdc}^2/12$. The phase noise (in rad) is obtained by normalizing the standard deviation of the timing error, $\sigma_{tQ}$, to the unit interval and multiplying by 2π radians: $\sigma_{\phi Q} = 2\pi\,\sigma_{tQ}/T_{out}$, where Tout = 1/fout is the DCO output period. The total noise power $\sigma_{\phi Q}^2$ is spread uniformly from DC to the Nyquist frequency, i.e., half the reference frequency, fref/2. The single-sided phase-noise spectral density is therefore $2\sigma_{\phi Q}^2/f_{ref}$. The single-sideband phase noise is half of that, $\sigma_{\phi Q}^2/f_{ref}$, where fref = 1/Tref is the reference frequency. In conclusion:

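$$\ell_{Q,TDC} = \frac{\sigma_{\phi Q}^2}{f_{ref}} = \left(2\pi\,\frac{\sigma_{tQ}}{T_{out}}\right)^2\cdot\frac{1}{f_{ref}} \tag{2.12}$$

$$\ell_{Q,TDC} = \frac{\pi^2}{3}\cdot\frac{f_{out}^2}{f_{ref}}\cdot\Delta t_{tdc}^2 \tag{2.13}$$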

From Eq. 2.13, it is clear that the TDC noise contribution can be minimized by improving the TDC timing resolution and/or by increasing the sampling rate of the TDC, i.e., by increasing the reference frequency. Reducing the TDC step, ∆ttdc, by a factor of 10 reduces the in-band phase noise by 20 dB. For example, for a 2.5 GHz DPLL with a 48 ps TDC resolution running from a 20 MHz reference clock, the in-band phase noise contribution is approximately -86.3 dBc/Hz. The phase noise drops to approximately -113.9 dBc/Hz if the TDC resolution is improved to 2 ps.
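Eq. 2.13 can be evaluated directly. The sketch below (function name illustrative) reproduces the 2.5 GHz / 20 MHz example above and confirms that going from 48 ps to 2 ps is worth 20·log10(24) ≈ 27.6 dB.

```python
import math

def tdc_inband_pn_dbc(f_out, f_ref, dt_tdc):
    """In-band phase noise floor from TDC quantization, Eq. 2.13, in dBc/Hz."""
    l_q = (math.pi**2 / 3) * (f_out**2 / f_ref) * dt_tdc**2
    return 10 * math.log10(l_q)

coarse = tdc_inband_pn_dbc(2.5e9, 20e6, 48e-12)
fine = tdc_inband_pn_dbc(2.5e9, 20e6, 2e-12)
print(f"48 ps: {coarse:.1f} dBc/Hz, 2 ps: {fine:.1f} dBc/Hz, "
      f"improvement: {coarse - fine:.1f} dB")
```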

The TDC quantization noise is low-pass filtered by the DPLL dynamics, i.e., G(s), and so TDC quantization noise is dominant within the loop bandwidth. Fig. 2.6 shows the normalized error, ε, estimated by TDCs with vastly different resolutions of ∆ttdc = 48 ps and 2 ps. When ∆ttdc = 48 ps, the quantization error is large compared with the output period (approximately 12% when fref = 20 MHz and FCW = 120.01709). As expected from Eq. 2.13 and shown in Fig. 2.7, bringing ∆ttdc down to 2 ps reduces the noise floor by approximately 20·log10(48/2) ≈ 28 dB. The white-noise assumption for the TDC quantization noise used to derive Eq. 2.13 is less valid for large values of ∆ttdc, where unwanted spurs may appear, as shown in Fig. 2.7(a). Accordingly, effort is devoted to achieving a fine-resolution TDC.

[Figure 2.6 plots: TDC output (scaled to DCO period) versus time (µs); panels (a) Coarse TDC: ∆ttdc = 48 ps, (b) Fine TDC: ∆ttdc = 2 ps]

Figure 2.6: TDC output during frequency acquisition (below 30 µs) and phase locking with different TDC resolutions (FCW = 120.01709).

[Figure 2.7 plots: spectrum density (dB) versus frequency (MHz), 0 to 10 MHz, one panel per TDC resolution, each comparing simulation with Eq. 2.13]

Figure 2.7: Spectrum of TDC quantization noise, tQ, for different TDC resolutions (FCW = 120.01709). Simulation results are marked in blue while theoretical expectations from Eq. 2.13 are marked in red.
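The "approximately 12%" figure quoted with Fig. 2.6 follows from comparing the 48 ps step with the DCO period at FCW = 120.01709. A short check (variable names illustrative):

```python
F_REF = 20e6          # reference frequency (Hz)
FCW = 120.01709       # frequency control word used in Figs. 2.6 and 2.7

f_out = FCW * F_REF   # DCO output frequency, about 2.4003 GHz
t_out = 1.0 / f_out   # DCO period (s), about 416.6 ps

step_fraction = {dt: dt / t_out for dt in (48e-12, 2e-12)}
for dt, frac in step_fraction.items():
    print(f"{dt * 1e12:.0f} ps TDC step = {100 * frac:.1f}% of Tout")
```

The 48 ps step is about 11.5% of the 416.6 ps period, while the 2 ps step is under 0.5%.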


2.2.3 DCO quantization noise

During fractional operation, the DCO tuning word spans multiple quantization levels. Hence, the DCO frequency quantization error can be assumed to have white-noise spectral characteristics. Mathematically, the frequency quantization error variance is $\sigma_{fQ}^2 = (2\pi\Delta f_{res})^2/12$, where $\Delta f_{res}$ [Hz] is the DCO frequency resolution, i.e., the smallest frequency step. Due to the white-noise assumption, the frequency noise power is spread uniformly from DC to the Nyquist frequency. The single-sided spectral density of the frequency quantization is $\sigma_{fQ}^2/f_{ref}$. Finally, since phase is the integral of frequency, the single-sided power spectral density due to DCO quantization noise at a frequency offset of $\omega$ can be given as

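$$\ell_{Q,DCO}(\omega) = \frac{\sigma_{fQ}^2}{f_{ref}}\cdot\frac{1}{\omega^2} = \frac{(2\pi\Delta f_{res})^2}{12}\cdot\frac{1}{f_{ref}}\cdot\frac{1}{\omega^2} \tag{2.14}$$

$$\Rightarrow \ell_{Q,DCO}(f) = \frac{1}{12}\cdot\frac{1}{f_{ref}}\cdot\frac{\Delta f_{res}^2}{f^2} \tag{2.15}$$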

For a 2.4 GHz DCO with 2.4 MHz frequency resolution and a 20 MHz reference, the noise contribution due to DCO quantization at 1 MHz offset from the 2.4 GHz carrier is -76 dBc/Hz, which is very high. Bringing the DCO resolution down to 240 kHz reduces the phase noise contribution to -96 dBc/Hz. This is still high and usually greater than the thermal phase noise contribution of an LC DCO at 1 MHz offset. Dithering or adding further fractional bits brings the noise contribution into a reasonable range. Targeting -110 dBc/Hz at 1 MHz requires a DCO resolution of roughly 48 kHz or finer. Note that the DCO noise, as well as the DCO quantization noise, is high-pass filtered by the DPLL loop dynamics [15], and is therefore more pronounced outside the loop bandwidth.
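Eq. 2.15 and its inverse give the numbers quoted above. A minimal sketch (function names illustrative):

```python
import math

def dco_quant_pn_dbc(df_res, f_offset, f_ref):
    """DCO frequency-quantization phase noise, Eq. 2.15, in dBc/Hz."""
    return 10 * math.log10((df_res / f_offset) ** 2 / (12 * f_ref))

def required_dco_resolution(target_dbc, f_offset, f_ref):
    """Invert Eq. 2.15: largest frequency step meeting target_dbc at f_offset."""
    return f_offset * math.sqrt(12 * f_ref * 10 ** (target_dbc / 10))

F_REF = 20e6
print(f"2.4 MHz step: {dco_quant_pn_dbc(2.4e6, 1e6, F_REF):.1f} dBc/Hz")
print(f"240 kHz step: {dco_quant_pn_dbc(240e3, 1e6, F_REF):.1f} dBc/Hz")
print(f"step for -110 dBc/Hz: {required_dco_resolution(-110, 1e6, F_REF) / 1e3:.1f} kHz")
```

The inverse gives about 49 kHz as the largest step that just meets -110 dBc/Hz at 1 MHz, consistent with the roughly 48 kHz figure above.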

2.2.4 TDC Fractional Spurs

The phase noise spectrum of a DPLL output may contain reference spurs as well as fractional spurs that appear at frequencies defined by the fractional part of the frequency control word, i.e., Nfrac. These spurs show up at multiples of Nfrac · fref, as shown in Fig. 2.8. The TDC fractional spurs are low-pass filtered by the DPLL. Therefore, they are particularly problematic for small values of Nfrac, since these spurs then appear at low frequencies within the loop bandwidth, as shown in Fig. 2.8(d), where the DPLL dynamics cannot filter them out.
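The spur locations for the four channels of Fig. 2.8 follow directly from Nfrac · fref. The sketch below uses exact rational arithmetic (function name illustrative) and, as a simplification, does not fold offsets above fref/2 back into the first Nyquist zone.

```python
import math
from fractions import Fraction

def fractional_spur_offsets(fcw, f_ref, count=3):
    """First `count` fractional-spur offsets, i.e. multiples of Nfrac * fref."""
    n_frac = fcw - math.floor(fcw)          # fractional part of the FCW
    return [k * n_frac * f_ref for k in range(1, count + 1)]

F_REF = 20_000_000                          # 20 MHz reference (Hz)
for fcw in ("120.5", "120.25", "120.125", "120.01709"):
    offsets = fractional_spur_offsets(Fraction(fcw), F_REF)
    print(fcw, [float(f) / 1e3 for f in offsets], "kHz")
```

For FCW = 120.01709 the first spur sits at 341.8 kHz, well inside a loop bandwidth of several hundred kHz, whereas Nfrac = 0.5 places it at 10 MHz.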

[Figure 2.8 plots: spectrum density (dB) versus frequency (kHz) on a logarithmic axis; panels (a) FCW = Nint + Nfrac = 120.500, (b) FCW = Nint + Nfrac = 120.250, (c) FCW = Nint + Nfrac = 120.125, (d) FCW = Nint + Nfrac = 120.01709]

Figure 2.8: Phase noise spectrum of the TDC output for different fractional channels (∆ttdc = 32 ps).

Fig. 2.9 displays the spectrum of the phase error computed by the TDC when FCW = 120.01709 for two TDCs with ∆ttdc = 32 ps and 2 ps. Both TDCs produce fractional spurs at frequencies that are multiples of 0.01709 · 20 MHz = 341.8 kHz. However, the 2 ps TDC shows about 20 dB lower phase noise at low frequencies than the 32 ps TDC (Eq. 2.13 predicts a 20·log10(32/2) ≈ 24 dB reduction). The probability density function (PDF) of the absolute output jitter, shown in Fig. 2.10, confirms the importance of designing a fine-resolution TDC to reduce jitter. When ∆ttdc = 32 ps, the peak-to-peak jitter² is 21 ps while the estimated random jitter is 3 ps RMS. If ∆ttdc is brought down to 2 ps, the peak-to-peak jitter becomes only 9 ps while the estimated random jitter is less than 1 ps RMS.

²The peak-to-peak jitter is observed over a large number of reference periods, Tref.

[Figure 2.9 plot: spectrum density (dB) versus frequency (kHz) on a logarithmic axis]

Figure 2.9: Spectrum of the phase error computed by the TDC when FCW = 120.01709: (a) solid red line when ∆ttdc = 32 ps; (b) dashed blue line when ∆ttdc = 2 ps. There is a 20 dB difference in the spectral density at low frequencies.

[Figure 2.10 plot: repetition count versus instantaneous jitter (ps), -15 to 15 ps, for ∆ttdc = 2 ps and ∆ttdc = 32 ps]

Figure 2.10: Histogram of absolute output jitter when FCW = 120.01709: (a) solid red line when ∆ttdc = 32 ps, where the peak-to-peak jitter is 21 ps; (b) blue plus symbols when ∆ttdc = 2 ps, where the peak-to-peak jitter is 9 ps.


[Figure 2.11 timing diagrams of fref, fout and the computed phase error; panels (a) No TDC, (b) Infinite TDC resolution and accuracy, (c) Infinite TDC resolution and accuracy with timing offset between integer and fractional part, (d) Finite TDC resolution and accuracy]

Figure 2.11: Timing diagram of phase error computation for a DPLL with FCW = 21/4.

Fig. 2.11 shows the timing diagram for a DPLL with FCW ≡ Nint + Nfrac = 21/4 under different scenarios. In Fig. 2.11(a), the DPLL uses only an integer counter, without a TDC, to estimate the number of DCO cycles within one reference cycle. Accordingly, the phase error, φe, is periodic with a frequency of 1/4 · 20 MHz = 5 MHz. If a TDC with infinite resolution and accuracy is employed to correct the integer counter estimate, the phase error converges to zero without any noticeable periodicity, as shown in Fig. 2.11(b). Spurs can still appear, even with an infinite-resolution TDC, when there is a timing offset between the integer counter and the TDC output, as shown in Fig. 2.11(c). Re-timing the integer counter and TDC outputs so that they are synchronous solves that problem. Finally, Fig. 2.11(d) shows the typical case of a DPLL employing a TDC with finite resolution and accuracy. The limited resolution and accuracy of the TDC cause the DPLL to lock with a continuously varying phase error, which results in higher-than-desired in-band phase noise and the appearance of some fractional spurs. Other sources of unwanted phase error are TDC nonlinearity and estimation error in the TDC output after the normalization circuit (i.e., tr/Tout), as explained in Section 2.4.1.
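The periodic error of Fig. 2.11(a) can be reproduced with exact rational arithmetic. The sketch assumes the variable phase is just the integer count of DCO edges, so the phase error is the fractional part of the accumulated FCW; the function name is hypothetical, and the sign convention may differ from the figure.

```python
import math
from fractions import Fraction

def counter_only_phase_error(fcw, cycles):
    """Phase error (in DCO cycles) when only an integer edge counter is used.

    Sketch assumption: the counter reports floor(reference phase), so the
    error is the fractional part that the counter cannot resolve.
    """
    errors = []
    ref_phase = Fraction(0)
    for _ in range(cycles):
        ref_phase += fcw                         # reference phase accumulator
        counted = math.floor(ref_phase)          # integer DCO edges counted
        errors.append(ref_phase - counted)       # unresolved fractional part
    return errors

err = counter_only_phase_error(Fraction(21, 4), 8)
print([str(e) for e in err])
```

The error repeats every four reference cycles, i.e., at 20 MHz / 4 = 5 MHz, which is the periodicity described for Fig. 2.11(a).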

2.3 System level design of DPLL

In a DPLL, the digital phase and control signals use a fixed-point representation with enough bits to span the full range of those quantities before saturation or wrap-around. MATLAB/Simulink uses a floating-point representation by default. To correctly capture the actual implementation of a digital PLL, a fixed-point representation is used in Simulink. The designer must run simulations to determine the number of bits required so as not to degrade the digital PLL performance. Then, a register-transfer level (RTL) model of the DPLL is written in Verilog and simulated along with a Verilog-A or transistor-level implementation of the DCO to capture the performance accurately. The Simulink model as well as the Verilog-A model capture the reference and DCO noise, including thermal noise, flicker noise, and their up-conversion to phase noise. Furthermore, the models capture the PVT variation of the DCO, which affects its center frequency and minimum capacitance step, i.e., the DCO gain. Similarly, TDC delay and range variations are also modeled. On the other hand, the digital algorithm uses a high-level description that is independent of PVT variation, since it will be synthesized to meet hold and setup times over PVT at a later stage. Appendix C has more details about jitter modeling and the simulation and verification flow. Fig. 2.12 shows phase noise simulations based on the variation of the rising edge of the output clock captured in Simulink.
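To illustrate why word lengths matter, the sketch below quantizes the FCW used in Fig. 2.6 into an unsigned fixed-point word with saturation and reports the resulting output-frequency offset. The 8-bit integer field, the fractional widths tried and the helper name are hypothetical; the actual word lengths in this design are set by the simulations described above.

```python
def to_unsigned_fixed(x, int_bits, frac_bits):
    """Quantize x to an unsigned fixed-point word with saturation.

    Returns (integer code, represented value). Word lengths are hypothetical.
    """
    scale = 1 << frac_bits
    max_code = (1 << (int_bits + frac_bits)) - 1
    code = min(max(round(x * scale), 0), max_code)   # round, then saturate
    return code, code / scale

F_REF = 20e6          # reference frequency (Hz)
FCW = 120.01709       # example frequency control word

for frac_bits in (8, 12, 16, 20):
    _, value = to_unsigned_fixed(FCW, int_bits=8, frac_bits=frac_bits)
    freq_error_hz = abs(value - FCW) * F_REF         # output-frequency offset
    print(f"{frac_bits:2d} fractional bits -> {freq_error_hz:9.1f} Hz error")
```

Each extra fractional bit halves the worst-case frequency offset, while the saturation clamp models what happens when a signal exceeds its integer range instead of wrapping around.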

Behavioral simulations were done in MATLAB to show the jitter contributed by various sources in the DPLL for a 2.4 GHz output frequency with a 20 MHz reference. In all cases, dithering of the DCO LSB inputs contributed negligibly to the RMS jitter. With a loop bandwidth of 700 kHz, the DCO intrinsic phase noise is approximately -110 dBc/Hz at 1 MHz offset and contributes 179 fs RMS jitter³; improving the TDC resolution from 40 ps to 4 ps reduces the TDC's jitter contribution from 3324 fs RMS down to 232 fs, reducing the total output RMS jitter from 3329 fs to 295 fs. With a loop bandwidth of 1400 kHz, the DCO intrinsic noise contributes 155 fs RMS jitter, while the jitter contributed by TDC quantization can be reduced from 5645 fs down to 394 fs RMS by improving the TDC resolution from 40 ps to 4 ps, resulting in a reduction in overall RMS jitter from 5647 fs down to 424 fs RMS [8].
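If the DCO and TDC contributions are uncorrelated, their RMS values should combine as a root-sum-square. The sketch below applies that assumption to the figures quoted above; the reported totals agree to within a few femtoseconds, the small residual presumably coming from the remaining minor sources.

```python
import math

def rss(*components_fs):
    """Root-sum-square of independent jitter contributions (fs RMS)."""
    return math.sqrt(sum(c * c for c in components_fs))

# (DCO contribution, TDC contribution, reported total) in fs RMS, from the text.
cases = {
    "700 kHz, 40 ps TDC": (179, 3324, 3329),
    "700 kHz, 4 ps TDC": (179, 232, 295),
    "1400 kHz, 40 ps TDC": (155, 5645, 5647),
    "1400 kHz, 4 ps TDC": (155, 394, 424),
}
for name, (dco, tdc, reported) in cases.items():
    print(f"{name}: RSS = {rss(dco, tdc):7.1f} fs, reported = {reported} fs")
```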

[Figure 2.12 plot: phase noise (dBc/Hz) versus offset frequency (Hz), 10 kHz to 1 GHz, showing the MATLAB simulation and the Eq. 2.13 level; plot annotation: Jpp = 9.07 ps, Jrms = 0.98 ps, Freq = 2400.340 MHz, integrated phase jitter 954 fs]

Figure 2.12: Phase noise of the output clock (2400.3418 MHz), based on MATLAB/Simulink simulation, when ∆ttdc = 4 ps.

³Details of estimating RMS jitter from a phase noise plot are given in Appendix B.3.

2.4 Basic TDC structure

Fig. 2.13 illustrates the principle of a basic TDC based on a digital delay line. The start signal (fout) is simply delayed by buffers or differential inverters to generate multiple delayed versions of fout. These delayed signals are sampled on the arrival of the rising edge of the stop signal (fref). The output of a sampling flip-flop is high if the delayed start signal, D[i], leads the stop sampling clock, fref; otherwise, the sampling process generates a low value. Consequently, the TDC generates a pseudo-thermometer code in which the position of the high-to-low transition estimates the time difference between the previous rising edge of the start signal, fout, and the rising edge of the stop signal, fref, in terms of the number of TDC delay stages, i.e., tr/∆ttdc, as shown in Fig. 2.14.
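A behavioral sketch of the pseudo-thermometer code and its transition detection is shown below. It assumes an ideal delay line with equal integer-picosecond stage delays and a 50% duty-cycle fout, and models Q[i] as fout delayed by i stages and sampled at the fref edge; the function names and the 100 ps / 416 ps / 4 ps values are hypothetical. The 1-to-0 position approximates tr/∆ttdc, and the 0-to-1 position marks the falling edge used for tf/∆ttdc.

```python
def sample_delay_line(t_r_ps, t_out_ps, dt_ps, n_stages):
    """Pseudo-thermometer code Q[i] of an ideal delay-line TDC.

    t_r_ps is the time from the last fout rising edge to the fref edge.
    Q[i] is high when fout, delayed by i stages, is high at the fref edge.
    Integer picoseconds keep the modulo arithmetic exact.
    """
    half = t_out_ps // 2
    return [1 if (t_r_ps - i * dt_ps) % t_out_ps < half else 0
            for i in range(n_stages)]

def find_transitions(q):
    """Return the first indices where the code goes 1->0 and 0->1."""
    fall = next(i for i in range(1, len(q)) if q[i - 1] == 1 and q[i] == 0)
    rise = next(i for i in range(1, len(q)) if q[i - 1] == 0 and q[i] == 1)
    return fall, rise

q = sample_delay_line(t_r_ps=100, t_out_ps=416, dt_ps=4, n_stages=150)
t_r_code, t_f_code = find_transitions(q)
print(t_r_code, t_f_code)   # prints: 26 78
```

Here 2·|78 − 26| = 104 matches Tout/∆ttdc = 416/4, which is the period estimate used by the normalization circuit of Section 2.4.1.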

[Figure 2.13 schematic and timing diagram: fout drives the delay chain D0 through Dn; flip-flops clocked by fref sample the taps to give Q0 through Qn (for example, the code 111110); the timing diagram marks ∆ttdc, tQ and tr]

Figure 2.13: Buffer delay line implementation of a TDC: simplified schematic view (left); timing diagram (right). The raw Q[i] is a pseudo-thermometer code to be converted into a normalized binary word representing the fractional phase error.

[Figure 2.14 timing diagram: fref and fout with the intervals tr and tf marked, annotated with the normalized phase error 1 − tr/Tout]

Figure 2.14: Estimating phase error based on the TDC output.


2.4.1 TDC Normalization Circuit

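[Figure 2.15 block diagram: the TDC output Q[0:M] feeds a 0-1 & 1-0 detector producing tr/∆ttdc and tf/∆ttdc; twice their difference estimates Tout/∆ttdc, which is averaged by a moving-average filter (1/n Σ xk, k = 1 to n), inverted (1/x) and multiplied by tr/∆ttdc to give tr/Tout]

Figure 2.15: A typical circuit to normalize the phase error of a TDC.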

The TDC pseudo-thermometer outputs must be normalized to Tout, like the other DPLL signals. An approximation of Tout/∆ttdc is calculated by doubling the absolute difference between tr/∆ttdc and tf/∆ttdc, as shown in Fig. 2.14. Variations are averaged over time using a moving-average filter, which generates Tout/∆ttdc. Inverting the output of the moving-average filter and multiplying it by the raw TDC output, tr/∆ttdc, generates the normalized phase error, i.e., tr/Tout, as shown in Fig. 2.15.

A moving-average filter must have enough points to filter out period-estimation errors, especially if a coarse TDC is used. However, a long moving-average filter slows down the TDC response and may cause serious phase estimation errors during frequency switching or modulation. Note that some error also arises in the estimate of ∆ttdc for a particular TDC. When ∆ttdc = 56 ps, the DPLL estimates the TDC resolution to be 56.24 ps with a 7.41 ps standard deviation, as shown in Fig. 2.16(a). However, if ∆ttdc becomes 16 ps, the DPLL estimates the TDC resolution to be 16.04 ps with a 0.17 ps standard deviation, as shown in Fig. 2.16(b). This shows the importance of designing a fine TDC, which is addressed in the next chapter.
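The normalization of Fig. 2.15 can be sketched as a small stateful object: each update appends 2·|tr/∆ttdc − tf/∆ttdc| to a moving-average window, and the averaged period estimate divides the raw tr/∆ttdc code. The class name is hypothetical, the window defaults to the 128 points used in Fig. 2.16, and a hardware implementation would use a fixed-point reciprocal rather than floating-point division.

```python
from collections import deque

class TdcNormalizer:
    """Sketch of Fig. 2.15: (tr/dt) / moving_average(2 * |tr/dt - tf/dt|)."""

    def __init__(self, window=128):
        self.history = deque(maxlen=window)   # recent Tout/dt estimates

    def update(self, t_r_code, t_f_code):
        self.history.append(2 * abs(t_r_code - t_f_code))     # Tout/dt sample
        period_in_steps = sum(self.history) / len(self.history)
        return t_r_code / period_in_steps                     # approx. tr/Tout

norm = TdcNormalizer(window=128)
for _ in range(200):
    phase = norm.update(26, 78)   # codes from a 416 ps period, 4 ps step
print(phase)                      # 26 / 104 = 0.25
```

With noisy transition codes, a longer window smooths the period estimate but, as noted above, responds more slowly when the output frequency changes.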

[Figure 2.16 plots: estimate of TDC resolution (ps) versus time (µs), 0 to 80 µs; first panel annotated "Error of ∆ttdc = 0.24 ps, STDV of ∆ttdc = 7.41 ps", second panel annotated "Error of ∆ttdc = 0.04 ps, STDV of ∆ttdc = 0.17 ps"]

Figure 2.16: Estimate of TDC resolution as computed by the TDC normalization circuit. The raw data (blue) is plotted along with the output of the 128-point moving-average filter (red).

Chapter 3

A DPLL with Calibrated Coarse and Stochastic Fine TDC

This chapter presents an overview of the implemented DPLL, focusing on the proposed calibrated coarse-fine TDC. Then, a summary of state-of-the-art TDC implementations, along with their pros and cons, is given. The next section provides an overview of the proposed low-power calibrated coarse and stochastic fine TDC. A detailed discussion of the stochastic behavior of the fine TDC is given. Later, an on-chip balanced-mean code-density test is presented to calibrate the coarse TDC. The chapter concludes with a summary of the testing setup and measurement results.

3.1 Overview of the DPLL

Fig. 3.1 shows the implemented DPLL architecture, which operates from 1.99 to 2.5 GHz. The shaded blocks in Fig. 3.1, i.e., the DCO, the TDC, and the divide-by-two, are custom-designed, while the other blocks are implemented in RTL Verilog. A retiming circuit synchronizes the reference clock, fref, with the DCO output clock, fout. The retimed reference clock, fref−D, is used to synchronize the operation of the DPLL, including phase detection, normalization, and filtering. The DCO is an LC oscillator with digitally switchable capacitors. The output clock of the DCO is divided by two using a CML static divider¹. The CML output is AC-coupled before passing through a pseudo-differential CMOS buffer. After the CML-to-CMOS stage, the half-rate clock is fed to a synthesized divide-by-four, the output phase accumulator, and the ∆Σ modulator. The ∆Σ modulator is used to reduce the effect of DCO quantization noise and to achieve fine frequency control. Detailed explanations of each block are given throughout this chapter. Also, Appendix A has details about the CML divide-by-two, the CML-to-CMOS conversion, the CML output buffer, and the flip-flop used in the coarse TDC.

Figure 3.1: A digital PLL architecture for fractional frequency synthesis [16]. The shaded blocks are custom-designed but, owing to their regular structure, can be generated automatically using a scripting language such as TCL.

¹The standard cell library in the 0.13 µm IBM CMOS bulk process does not support synthesizable logic beyond 1 GHz.
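The dithering role of the ∆Σ modulator can be illustrated with a first-order accumulator-overflow modulator. The order and word lengths of the modulator in this design are not stated here, so the first-order structure, `frac_word` and `frac_bits` are hypothetical; the sketch only shows that the 1-bit output toggles an extra DCO LSB so that its average equals the fractional tuning word.

```python
def first_order_delta_sigma(frac_word, frac_bits, n_cycles):
    """First-order (accumulator carry-out) delta-sigma modulator sketch.

    Returns the 1-bit stream that would dither the DCO fractional LSB.
    """
    modulus = 1 << frac_bits
    acc = 0
    bits = []
    for _ in range(n_cycles):
        acc += frac_word
        carry = 1 if acc >= modulus else 0    # overflow is the output bit
        acc -= carry * modulus                # residue carries the error state
        bits.append(carry)
    return bits

bits = first_order_delta_sigma(frac_word=0x3000, frac_bits=16, n_cycles=1 << 16)
print(sum(bits) / len(bits))   # 0x3000 / 2**16 = 0.1875
```

Because the accumulator residue is carried forward, the quantization error is first-order shaped toward high frequencies, where the loop's band-pass response to DCO frequency noise can attenuate it.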

3.2 State-of-the-Art Implementations of TDC

A TDC is widely used in many applications, such as nuclear experiments for timing single-shot events, laser range finders, and space science instruments [17]. In DPLLs, it has been employed to measure the phase difference between a reference clock and an output clock. The phase noise contributed by TDC quantization in, for example, [5, 10, 18] is unacceptable for many applications that require a wide loop bandwidth, such as LAN, WCDMA, HSPA, and LTE [6]. However, designing a fine-resolution, low-power TDC is a challenging task. The following presents state-of-the-art implementations of TDCs.

3.2.1 Buffer delay and Inverter delay line TDC

The simplest implementation of a TDC uses a buffer delay line [15]. Due to its simple structure, it can be implemented at the Verilog gate level using a predefined standard cell library. However, its time resolution is limited by the buffer delay, which is technology-dependent. Replacing the buffer delay line with an inverter delay line can improve the TDC resolution by a factor of two. However, the rise and fall times of the inverters must be matched so that two adjacent inverters have the same effective resolution. In addition, the resolution is still limited by the technology. In 0.13 µm CMOS, the inverter delay varies from 32 to 48 ps over PVT, while in 28 nm CMOS technology the inverter delay is around 10 to 12 ps.

3.2.2 Vernier delay line TDC

Vernier delay lines are a straightforward method to improve the TDC resolution, using two delay lines with slightly different stage delays, Ta and Tb, so that the TDC resolution is determined by the delay difference between the two stages, [Ta − Tb] [11][19]. Although the Vernier delay line improves TDC resolution, there is a dramatic area penalty and increased power consumption, especially if a large dynamic range is required. Furthermore, employing a Vernier TDC within a DPLL increases the loop delay and hurts stability, since the signals may propagate through a lengthy delay line before the phase error is resolved. For example, the Vernier delay line in [20] uses two delay lines consisting of 80 buffers, providing 5 ps resolution but resulting in a relatively high DPLL power consumption of 50 mW in a 90 nm CMOS process. A 2-dimensional Vernier TDC [21] was proposed to reduce the number of delay stages and the power consumption; it resolves 4.8 ps in 65 nm CMOS technology [21]. A DPLL employing a 2-dimensional Vernier TDC [22] shows very good noise performance in a 55 nm process.
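An idealized Vernier measurement can be sketched as a loop in which the stop edge, travelling through the faster line, gains Ta − Tb on the start edge at every stage. The 40 ps and 35 ps stage delays are hypothetical values chosen to give a 5 ps step like that of [20], and the 80-stage length mirrors [20]; mismatch and metastability are ignored.

```python
def vernier_measure(dt_in, t_a, t_b, n_stages):
    """Ideal Vernier TDC: stage index where the stop edge (fast line, t_b)
    catches the start edge (slow line, t_a). Times in ps.

    Returns (stage index, measured time = index * (t_a - t_b)).
    """
    resolution = t_a - t_b
    for k in range(n_stages + 1):
        # After k stages the stop edge has gained k * resolution.
        if k * resolution >= dt_in:
            return k, k * resolution
    raise ValueError("input exceeds the Vernier range n_stages * (t_a - t_b)")

stage, measured = vernier_measure(dt_in=23.0, t_a=40.0, t_b=35.0, n_stages=80)
print(stage, measured)   # prints: 5 25.0
```

The range is only n_stages · (Ta − Tb) = 400 ps here, and the result is available only after the edges have traversed k stages, which is the area and latency penalty described above.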

3.2.3 Gated ring oscillator (GRO) TDC

The gated ring-oscillator (GRO) TDC reported in [12] achieves an effective resolution of 6 ps in a 0.13 µm technology. It measures the phase error between two signals by enabling a ring oscillator only during a measurement window, as shown in Fig. 3.2. The gating action of the GRO preserves the oscillator state, i.e., the quantization error, at the end of the measurement interval Tin[k−1]. In the following measurement interval, Tin[k], the previous quantization error is carried over, as shown in Fig. 3.2(b), which results in first-order noise shaping of the quantization error. Furthermore, the GRO-based TDC can employ multi-phase coupled oscillators to average its delay without the need for code-density calibration [12]. However, a GRO TDC consumes up to 21 mW for large phase errors.

[Figure 3.2: (a) block diagram in which Enable gates the GRO, whose outputs drive counters that are summed (+) and stored in a register to give Out; (b) timing diagram [12]]

Figure 3.2: Block and timing diagram of a gated ring oscillator (GRO) based TDC.
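The noise-shaping argument can be sketched behaviorally. In this idealized single-counter model, window lengths are expressed in units of one ring-oscillator stage delay, the random lengths and seed are arbitrary, and the "reset" case is a hypothetical ungated oscillator included only for comparison.

```python
import math
import random

def gro_counts(windows, carry_over=True):
    """Counts from a ring-oscillator TDC over successive measurement windows.

    With carry_over=True the oscillator phase is frozen between windows, so
    each window's residue is carried into the next (first-order shaping).
    With carry_over=False every window restarts from zero phase.
    """
    phase = 0.0
    counts = []
    for w in windows:
        start = phase if carry_over else 0.0
        end = start + w
        counts.append(math.floor(end) - math.floor(start))
        phase = end
    return counts

rng = random.Random(7)
windows = [rng.uniform(0.0, 10.0) for _ in range(1000)]
total = sum(windows)
err_gated = total - sum(gro_counts(windows, carry_over=True))
err_reset = total - sum(gro_counts(windows, carry_over=False))
print(f"accumulated error: gated {err_gated:.2f}, reset {err_reset:.2f} stages")
```

With gating, the per-window errors telescope, so the accumulated error stays below one stage; without it, the discarded residues pile up, which is the low-frequency error that first-order noise shaping removes.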

3.2.4 Interpolation-Based TDC

An interpolation-based TDC is reported in [23]. It employs a differential delay line to obtain coarse delay steps and then interpolates between neighboring phases with a resistive voltage divider to achieve a small delay step of 4.7 ps in 90 nm technology. However, that TDC uses two auxiliary TDCs and an extra digital loop filter for correction and calibration, making it power-hungry.

[Figure 3.3 block diagram: Start and Stop feed a coarse TDC; the residue passes through timing amplifiers (TA) and a Mux into a further stage, followed by an Encoder]

Figure 3.3: Two-stage TDC: a coarse TDC followed by a timing amplifier of the residue, which feeds another coarse stage.

3.2.5 Two-step TDC

Two-step TDCs combine a coarse and a fine stage to provide fine resolution while still covering a wide dynamic range of input phase error. For example, the two-step TDC in [24] uses a delay-line TDC as the coarse TDC followed by a Vernier delay-line fine TDC. In [25], the residual phase error after a coarse TDC is time-amplified and applied to another TDC with relatively coarse resolution. Unfortunately, the time amplifier has high power consumption and a complex analog design, which conflicts with the goal of digitizing the PLL circuits.
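The time-amplifier scheme can be sketched with an ideal amplifier: the coarse stage leaves a residue smaller than one coarse step, the residue is multiplied by the gain A, and a second coarse-resolution stage quantizes it, giving an effective step of (second-stage step)/A. The 32 ps steps and A = 8 are hypothetical values, and amplifier nonlinearity and gain error are ignored.

```python
import math

def two_step_tdc(t_in, coarse_step, gain, fine_step):
    """Two-step TDC with an ideal time amplifier. Times in ps.

    Effective resolution is fine_step / gain. All parameters hypothetical.
    """
    coarse = math.floor(t_in / coarse_step)        # coarse code
    residue = t_in - coarse * coarse_step           # 0 <= residue < coarse_step
    fine = math.floor(residue * gain / fine_step)   # amplified, re-quantized
    return coarse * coarse_step + fine * fine_step / gain

estimates = {t: two_step_tdc(t, coarse_step=32.0, gain=8.0, fine_step=32.0)
             for t in range(0, 400)}
worst = max(t - est for t, est in estimates.items())
print(worst)   # prints: 3.0 (always below 32 / 8 = 4 ps)
```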

3.3 Coarse-Fine Stochastic TDC

This section reports on a low-power two-step coarse-fine TDC achieving 4 ps TDC resolution² in a 0.13 µm technology [16]. The proposed TDC architecture uses a coarse-resolution TDC, as shown in Fig. 3.4, to select a delayed version of the reference clock for further comparison with the output clock in a fine-resolution TDC. The fine-resolution TDC needs to resolve a 64 ps range down to 2 ps, i.e., a 6-bit fine TDC. The fine TDC could be realized as a Vernier structure, which requires 32 sampling flip-flops and 64 delay elements (inverters) in addition to the multiplexer. There are two disadvantages of using a Vernier structure as the fine TDC. One is the large dynamic power consumption due to the extra 64 inverters. The other is that a Vernier structure delays the phase error calculation by 32 DCO cycles, which affects loop stability. Similarly, a time-amplifier fine TDC consumes a large amount of power. Alternatively, a stochastic fine TDC, shown in Fig. 3.5, employs only 64 latches, without the need for inverters and without incurring a long delay. The proposed fine stochastic TDC uses the stochastic variation of latch offsets [17] to provide a resolution much better than the technology's inverter delay.

²The targeted TDC resolution was 2 ps, but the in-band phase noise could not be measured better than -108 dBc/Hz, which is equivalent to a 4 ps TDC resolution.

3.3.1 Coarse TDC

The coarse TDC shown in Fig. 3.4 generates 32 delayed versions of the low-frequency reference clock by passing it through a chain of pseudo-differential inverters with adjustable delay. The delayed reference clocks are then used to sample the high-frequency output clock using sense-amplifier flip-flops, shown in Appendix A, which have a narrow symmetric metastability window [15]. The coarse TDC must cover at least one DCO period at the slowest operating frequency of the DPLL³. Passing the low-frequency reference clock, rather than the high-frequency output clock, through the inverter chain provides two advantages: lower power consumption, and lower jitter induced by the power supply during the sharp rising and falling transitions of the clock signal through the inverters.

An encoder and a 32-to-1 multiplexer are used to select one of the delayed versions of the reference clock for further comparison with the output clock in the fine TDC. The encoder introduces a delay that makes it impossible to tap the output of the delay buffer where the 1-0 transition occurs, since by then the reference clock edge has propagated further [8]. To solve this problem, the mux selects the output of the second buffer after the 1-0 edge transition and passes it to the fine TDC. Moreover, the DCO clock is also delayed, to mimic the extra delay experienced by the selected reference clock phase, before the comparison in the fine TDC, as shown on the left of Fig. 3.4.

³Note that a 16-stage coarse TDC is sufficient to cover the maximum period of the DCO clock. However, a 32-stage coarse TDC was used to enable on-chip measurement of the period jitter.
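The claim in footnote 3 that 16 stages suffice can be checked with a one-line estimate. The sketch assumes the coarse line must span the period of a 1.99 GHz clock (the slowest DPLL output frequency) and takes the 32 ps fast-corner inverter delay from Section 3.2.1 as the stage delay; both are assumptions, since the stage delay in this design is adjustable.

```python
import math

def coarse_stages_needed(f_min_hz, stage_delay_s):
    """Stages needed for the coarse delay line to span one clock period."""
    return math.ceil((1 / f_min_hz) / stage_delay_s)

n = coarse_stages_needed(1.99e9, 32e-12)   # 502.5 ps / 32 ps = 15.7 stages
print(n)   # prints: 16
```

The doubled 32-stage line therefore leaves margin beyond one period, which is what allows the on-chip period-jitter measurement mentioned in the footnote.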


[Figure 3.4 schematic: FREF passes through a delay chain with slope/delay control whose taps sample Fout, giving Q1, Q2 to Qn; encoding logic finds the 1-0 and 0-1 transitions and drives a 32x1 CMOS MUX that outputs D_Fref; Fout is delayed to D_Fout]

Figure 3.4: The coarse TDC architecture of a two-step TDC. The delayed version of Fref with phase closest to Fout is muxed to the second TDC stage. The path delays from the selected reference phase Fref to D_Fref and from the DCO clock Fout to D_Fout are matched.

3.3.2 Fine Stochastic TDC

The stochastic TDC is composed of M identical arbiters evaluating, in parallel, the phase relationship between two incoming signals [17][26]. Ideally, each arbiter circuit instantly generates a logical '0' or '1' depending on which of the two input signals transitions first.

In reality, the arbiters exhibit several nonidealities. The output settling time increases when the time offset between the incoming signals, ∆t, is small. If the time offset is in the vicinity of zero, the arbiter exhibits metastability and can take a very long time to settle. Moreover, due to device mismatch, each arbiter exhibits a random input offset voltage, VOS, that creates a different voltage threshold for each arbiter, as shown in Fig. 3.6(a). Over a large number of arbiters, these voltage offsets are Gaussian-distributed with a standard deviation σV.

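[Figure 3.5 schematic: D_Fref and D_Fout drive an array of SR-latch arbiters; each arbiter output is sampled by a D flip-flop and the results are summed (+) to form the fine STDC output]

Figure 3.5: The fine stochastic TDC (STDC) architecture of the two-step TDC. The STDC outputs are sampled on the rising edge of the delayed reference clock.

[Figure 3.6: (a) arbiter input-output characteristic; (b) SR latch with transistors M1 to M4, inputs R and S, and outputs Q and Qb]

Figure 3.6: (a) The stochastic TDC arbiter input-output relationship without and with random mismatch. The input-referred voltage offset due to mismatch translates into a time offset. (b) SR latch used as the arbiter in the stochastic TDC.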


The voltage offsets translate into input-referred time offsets, TOS, which are also Gaussian-distributed with standard deviation σT. If the input clock signals have a long rise time, even a small voltage offset, VOS, translates into a significant time offset, TOS. Accordingly, the time offset of an arbiter can be related to its voltage offset by the slope of the input signals, Sin, so that TOS = VOS/Sin and σT = σV/Sin.

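The average stochastic TDC output follows the error function, which can be approximated using a Taylor series expansion as follows:

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt \approx \frac{2}{\sqrt{\pi}}\left(x - \frac{x^3}{3} + \frac{x^5}{10} - \cdots\right) \tag{3.1}$$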

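The cumulative distribution function (CDF) of a Gaussian-distributed variable is related to the error function as follows:

$$\mathrm{cdf}(t_d;\,\mu=0,\,\sigma=1) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t_d} e^{-t^2/2}\,dt = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{t_d}{\sqrt{2}}\right)\right] \tag{3.2}$$

$$\Rightarrow \mathrm{cdf}(t_d;\,\mu=0,\,\sigma=\sigma_T) = \frac{1}{\sqrt{2\pi}\,\sigma_T}\int_{-\infty}^{t_d} e^{-t^2/(2\sigma_T^2)}\,dt = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{t_d}{\sqrt{2}\,\sigma_T}\right)\right] \tag{3.3}$$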

The summed output of a population of M arbiters, with zero mean time offset and time-offset standard deviation σT, has the following approximate CDF (using the Taylor expansion of the error function given in Eq. 3.1):

$$\mathrm{cdf}(t_d;\,0,\,\sigma_T) \approx \frac{M}{2} + \frac{M}{\sqrt{2\pi}\,\sigma_T}\,t_d - \frac{M}{6\sqrt{2\pi}\,\sigma_T^3}\,t_d^3 \tag{3.4}$$

The CDF is approximately linear for td ∈ [−σT, σT], as shown in Fig. 3.7. The stochastic TDC resolution, ∆tstoch, can be estimated as the inverse of the slope of the CDF around the midpoint, ignoring the cubic term in Eq. 3.4:

$$\Delta t_{stoch} = \frac{\sqrt{2\pi}\,\sigma_T}{M} = \frac{\sqrt{2\pi}\,\sigma_V}{M\cdot S_{in}} \tag{3.5}$$

Including the cubic term in Eq. 3.4 increases the estimated ∆tstoch in Eq. 3.5 by 20%, as given here:

$$\Delta t_{stoch} = \frac{2\sigma_T}{\mathrm{cdf}(t_d=\sigma_T) - \mathrm{cdf}(t_d=-\sigma_T)} = 1.2\,\frac{\sqrt{2\pi}\,\sigma_T}{M} = 1.2\,\frac{\sqrt{2\pi}\,\sigma_V}{M\cdot S_{in}} \tag{3.6}$$

Figure 3.7: A Monte Carlo simulation of the stochastic TDC for a given negative phase error. The sum of all stochastic TDC arbiter outputs translates into a phase error within the linear region of the time-offset's statistical CDF.
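The 20% factor in Eq. 3.6 comes from truncating the series at the cubic term. The sketch below (function names illustrative) evaluates the Eq. 3.6 definition with both the cubic approximation of Eq. 3.4 and the exact Gaussian CDF of Eq. 3.3, relative to the linear estimate of Eq. 3.5.

```python
import math

def summed_cdf_exact(td, m, sigma):
    """Expected summed output of m arbiters: m times Eq. 3.3."""
    return m * 0.5 * (1.0 + math.erf(td / (math.sqrt(2.0) * sigma)))

def summed_cdf_cubic(td, m, sigma):
    """Cubic Taylor approximation of the summed output, Eq. 3.4."""
    k = math.sqrt(2.0 * math.pi)
    return m / 2 + m * td / (k * sigma) - m * td**3 / (6 * k * sigma**3)

M, SIGMA_T = 64, 1.0   # the ratios below do not depend on these values
linear = math.sqrt(2.0 * math.pi) * SIGMA_T / M                      # Eq. 3.5
cubic = 2 * SIGMA_T / (summed_cdf_cubic(SIGMA_T, M, SIGMA_T)
                       - summed_cdf_cubic(-SIGMA_T, M, SIGMA_T))     # Eq. 3.6
exact = 2 * SIGMA_T / (summed_cdf_exact(SIGMA_T, M, SIGMA_T)
                       - summed_cdf_exact(-SIGMA_T, M, SIGMA_T))
print(f"cubic / linear = {cubic / linear:.4f}, exact / linear = {exact / linear:.4f}")
```

The cubic truncation gives exactly 1.2, while the exact CDF gives about 1.17, so the 1.2 factor in Eq. 3.6 is slightly conservative.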

Hence, the stochastic TDC resolution, ∆tstoch, is determined by the number of arbiters used, the statistical properties of the transistors used to design those arbiters, and the slope of the input signals. A latch with inherently large mismatch can be obtained by using minimal transistor sizes. However, the slope of the incoming signal has an even greater effect on the stochastic TDC resolution and dynamic range, and it is therefore controlled using a programmable slope control circuit, implemented by modifying the PMOS load of a CMOS buffer. Although this may increase the short-term jitter, its impact on performance was deemed relatively insignificant for the targeted resolution.

The arbiters have been implemented as set-reset latches based on cross-coupled NAND gates, as shown in Fig. 3.6(b). The outputs of these arbiters are sampled on the rising edge of the delayed reference clock, as shown in Fig. 3.5. This ensures that the stochastic TDC captures the correct value of each arbiter before the arbiter can change its state.

The arbiter (shown in Fig. 3.6(b)) has an input-referred voltage offset, VOS, due to the random mismatch between its transistors. The mismatch is characterized by variations of the threshold voltage, Vth, and of β = µCoxW/L [27]. For a small overdrive voltage, VOS is affected mainly by Vth variations. However, the effect of Vth variations decreases for larger overdrive voltages and becomes comparable to the impact of β variations. A latch with positive feedback, as shown in Fig. 3.6(b), changes its output at a small overdrive voltage, so Vth variation is the main contributor to VOS variation. Furthermore, in contrast to the mismatch of the differential input pair, the mismatch of the transistors forming the positive feedback in the latch has a marginal effect on VOS.

Fig. 3.8 shows 512 Monte Carlo simulations of Vth variations for a minimum-size transistor (i.e., L = 0.12 µm and W = 0.20 µm). According to that simulation, Vth has a standard deviation of 22.78 mV. Assuming uncorrelated mismatch between the transistors in the input differential pair (M1 and M2 in Fig. 3.6(b)), the input-referred voltage offset VOS due to Vth mismatch has a standard deviation of √2·σVth = 32.22 mV.
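Combining the 32.22 mV offset with σT = σV/Sin and Eqs. 3.5 and 3.6 gives the expected fine-TDC step across the programmable slope range. The sketch uses M = 64 latches from the fine-TDC description and the 0.2 to 2.0 V/ns slope range quoted in this section; the function name is illustrative, and the values are idealized (no finite-population or systematic-offset effects).

```python
import math

SIGMA_VTH_MV = 22.78                          # Monte Carlo result of Fig. 3.8
sigma_vos_mv = math.sqrt(2) * SIGMA_VTH_MV    # uncorrelated M1/M2 mismatch

def stochastic_resolution_ps(sigma_v_mv, slope_v_per_ns, m):
    """Return (sigma_T, Eq. 3.5 step, Eq. 3.6 step), all in ps."""
    sigma_t_ps = sigma_v_mv / slope_v_per_ns  # mV / (V/ns) = ps
    linear = math.sqrt(2 * math.pi) * sigma_t_ps / m
    return sigma_t_ps, linear, 1.2 * linear

for slope in (0.2, 1.0, 2.0):                 # programmable slope in V/ns
    sigma_t, res35, res36 = stochastic_resolution_ps(sigma_vos_mv, slope, 64)
    print(f"S_in={slope} V/ns: sigma_T={sigma_t:.1f} ps, "
          f"Eq.3.5={res35:.2f} ps, Eq.3.6={res36:.2f} ps")
```

At 1 V/ns this gives σT ≈ 32 ps and an idealized step of roughly 1.3 to 1.5 ps, below the 2 ps target; the finite-population effects treated below erode that margin.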

Spectre simulations of the stochastic TDC transfer function, shown in Fig. 3.9(a), yield a comparable value and confirm that the random Vth variation of the differential input pair is the main contributor to VOS variations.

Recall also that the stochastic TDC resolution can be adjusted by changing the reference signal slope, Sin, as shown in Fig. 3.9(b). Accordingly, the reference signal is buffered such that the slope of its rising edge can be programmed from 0.2 to 2.0 V/ns through a serial bus. To achieve a time offset with a standard deviation of 32 ps, the slope of the incoming signal must be around 1.0 V/ns. This gives the fine stochastic TDC an approximately linear region of 64 ps, around 60% larger than the coarse TDC resolution. A wide linear range is desirable since any systematic mismatch (for example, caused by layout mismatch) shifts the CDF to the left or right, reducing both the useful linear range and the ability of the stochastic TDC to resolve time differences accurately [16].

Figure 3.8: Spectre Monte-Carlo simulation of threshold voltage, Vth, for a minimum-size transistor: (a) threshold voltage (mV) versus sample number for 512 Monte-Carlo runs; (b) estimated PDF versus ideal PDF (probability distribution versus threshold voltage in mV). Accordingly, mean(Vth) = 345 mV and stdv(Vth) = 22.78 mV.

Figure 3.9: Transfer function of the stochastic TDC output (summation of STDC outputs versus time difference in ps) using Spectre Monte-Carlo simulation: (a) CDF for two random stochastic TDCs with the same input slope of 1 V/ns; (b) CDF for the same stochastic TDC with different input slopes (red: 1 V/ns; blue: 1.5 V/ns). Note that Vth has a standard deviation of 22.78 mV.
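To first order, a voltage offset VOS on a ramp with slope Sin appears as a time offset VOS/Sin. The sketch below applies that mapping to the 32.22 mV offset estimated earlier over the programmable slope range; the function names are illustrative, and the linear-ramp model ignores any curvature of the buffered edge.

```python
SIGMA_VOS_MV = 32.22  # input-referred offset std from the Vth-mismatch estimate


def time_offset_sigma_ps(sigma_vos_mv: float, slope_v_per_ns: float) -> float:
    """First-order mapping of a voltage offset to a time offset on a linear ramp.

    mV divided by V/ns equals 1e-3 ns, i.e. picoseconds, so no extra scaling is needed.
    """
    return sigma_vos_mv / slope_v_per_ns


def slope_for_target_ps(sigma_vos_mv: float, target_sigma_ps: float) -> float:
    """Ramp slope (V/ns) that yields the requested time-offset std."""
    return sigma_vos_mv / target_sigma_ps


for slope in (0.2, 0.5, 1.0, 1.5, 2.0):  # within the programmable 0.2 to 2.0 V/ns range
    s = time_offset_sigma_ps(SIGMA_VOS_MV, slope)
    print(f"S_in = {slope:.1f} V/ns -> sigma_T = {s:6.2f} ps, +/-sigma window = {2 * s:6.2f} ps")

print(f"slope for sigma_T = 32 ps: {slope_for_target_ps(SIGMA_VOS_MV, 32.0):.3f} V/ns")
```

At 1.0 V/ns the ±σT window is about 64 ps, which is the approximately linear region mentioned above.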

Employing a large number of arbiters improves the stochastic TDC's resolution as well as its differential nonlinearity (DNL). However, this accuracy comes at the cost of area and power consumption. In the following, a statistical treatment of the stochastic TDC is presented so that the number of arbiters can be chosen to achieve a small resolution at reasonable power consumption.

Assume there is a random population of M arbiters whose time offsets follow a Gaussian distribution. For a very large population, one can assume that the mean of the arbiter time offset, µT, is zero and that the standard deviation is well defined by a single value, σT. In practice, a limited number of arbiters is used to resolve the time error. For such a finite population, the mean and standard deviation deviate from the ideal values depending on the population size, M, and on the desired confidence level. Define the time offset mean, µT, of M arbiters as a random variable. Then µT lies within the following range at a (1 − α) confidence level:

$$-t_{\alpha/2,M-1}\,\frac{\sigma_M}{\sqrt{M}} < \mu_T < +t_{\alpha/2,M-1}\,\frac{\sigma_M}{\sqrt{M}}$$

where σM is the sample standard deviation and tα/2,M−1 denotes the 100·(1 − α/2) percentile of the Student t-distribution with M − 1 degrees of freedom, M being the number of arbiters. Similarly, the population standard deviation is bounded by lower and upper limits:

$$\sigma_{T,\min} = \sigma_M \sqrt{\frac{M-1}{\chi^2_{1-\alpha/2,\,M-1}}} \qquad (3.7)$$

$$\sigma_{T,\max} = \sigma_M \sqrt{\frac{M-1}{\chi^2_{\alpha/2,\,M-1}}} \qquad (3.8)$$

where χ²p,M−1 denotes the inverse of the chi-squared cumulative distribution function with M − 1 degrees of freedom evaluated at probability p (i.e. its p-quantile).

For example, a stochastic TDC with only 64 arbiters ensures, with 90% confidence⁴, that µT lies within ±1.669·σM/√64. For 99% confidence, the possible range of µT is ±2.656·σM/√64. To first order, the possible range of µT is proportional to 1/√M for the same confidence level, so doubling the number of arbiters reduces the potential range of µT by about 29% (a factor of 1/√2). The uncertainty of µT shifts the transfer characteristic of the stochastic TDC and so reduces the linear region available to resolve the time error.
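The confidence interval on µT can be reproduced numerically. The sketch below uses Python's standard library only, so the Student t quantile is approximated with the classical Cornish-Fisher expansion in 1/(M − 1); in MATLAB the same values come from tinv, and in SciPy from scipy.stats.t.ppf. The helper names are hypothetical.

```python
import math
from statistics import NormalDist


def t_quantile(p: float, dof: int) -> float:
    """Approximate Student-t quantile via the Cornish-Fisher expansion in 1/dof.

    Accurate to roughly 1e-4 for dof around 60; use an exact routine for small dof.
    """
    z = NormalDist().inv_cdf(p)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    return z + g1 / dof + g2 / dof**2 + g3 / dof**3


def mean_halfwidth(m: int, confidence: float) -> float:
    """Half-width of the confidence interval on mu_T, in units of sigma_M."""
    alpha = 1.0 - confidence
    return t_quantile(1.0 - alpha / 2.0, m - 1) / math.sqrt(m)


for conf in (0.90, 0.99):
    t = t_quantile(1.0 - (1.0 - conf) / 2.0, 63)
    print(f"M = 64, {conf:.0%}: t = {t:.3f}, |mu_T| < {mean_halfwidth(64, conf):.4f} sigma_M")

ratio = mean_halfwidth(128, 0.90) / mean_halfwidth(64, 0.90)
print(f"doubling M scales the range by {ratio:.3f} (1/sqrt(2) = {1 / math.sqrt(2):.3f})")
```

The ratio comes out slightly below 1/√2 because the t quantile itself also shrinks a little as the degrees of freedom increase.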

Similarly, the uncertainty in the standard deviation of the time offset, σT, affects the stochastic TDC resolution. Using Eqs. 3.7 and 3.8 with a 99% confidence level, the standard deviation for a stochastic TDC with 64 arbiters lies within [0.81, 1.29]·σM. If only 32 arbiters are used, the standard deviation lies within [0.75, 1.46]·σM at the same confidence level. This means that a stochastic TDC's range could be 29% larger than expected when 64 arbiters are used, or up to 46% larger when 32 arbiters are used.
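Eqs. 3.7 and 3.8 can be evaluated the same way. The sketch below is a standard-library-only illustration: the chi-squared CDF is computed from the series of the regularized lower incomplete gamma function and inverted by bisection, standing in for MATLAB's chi2inv. It also scales the 1.50 ps nominal resolution used in the following paragraph by the worst-case factor. Function names are hypothetical.

```python
import math


def chi2_cdf(x: float, dof: int) -> float:
    """Chi-squared CDF via the series for the regularized lower incomplete gamma P(k/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, y = dof / 2.0, x / 2.0
    term, total, n = 1.0, 1.0, 0
    while term > 1e-16 * total:
        n += 1
        term *= y / (a + n)
        total += term
    return math.exp(-y + a * math.log(y) - math.lgamma(a + 1.0)) * total


def chi2_quantile(p: float, dof: int) -> float:
    """Invert chi2_cdf by bisection (the CDF is monotonic in x)."""
    lo, hi = 0.0, 10.0 * dof + 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sigma_bounds(m: int, confidence: float) -> tuple[float, float]:
    """(sigma_T,min, sigma_T,max) in units of sigma_M, following Eqs. 3.7 and 3.8."""
    alpha = 1.0 - confidence
    dof = m - 1
    lower = math.sqrt(dof / chi2_quantile(1.0 - alpha / 2.0, dof))
    upper = math.sqrt(dof / chi2_quantile(alpha / 2.0, dof))
    return lower, upper


for m in (32, 64):
    lo_b, hi_b = sigma_bounds(m, 0.99)
    print(f"M = {m}: sigma_T in [{lo_b:.2f}, {hi_b:.2f}] * sigma_M")

_, worst = sigma_bounds(64, 0.99)
print(f"1.50 ps nominal resolution could grow to {1.50 * worst:.2f} ps")
```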

Recall Eq. 3.6 to estimate the stochastic TDC resolution, ∆tstoch, with 64 arbiters when σT = 32 ps. Ideally, one expects a resolution of 1.50 ps. However, due to the small number of arbiters (64), σT,max, and hence ∆tstoch, could be 29% larger, i.e. ∆tstoch could be as large as 1.94 ps. Furthermore, the CDF of a stochastic TDC could suffer from large DNL and INL that limit its ability to resolve small time errors. An ideal stochastic TDC still has a maximum DNL of around 0.6 ps, as will be discussed later. However, for the non-ideal, realistic stochastic TDC with 64 arbiters shown in Fig. 3.10, the DNL ranges from -1.5 ps to 2.5 ps while the INL varies between -4.2 ps and 2.4 ps.

⁴In MATLAB, tinv(1 − 0.1/2, 63).

Figure 3.10: (a) Transfer function (summation of STDC outputs versus time difference) of one random example of a non-ideal stochastic TDC when the number of arbiters M = 64. The associated DNL and INL (in ps, versus STDC output code) are shown in (b) and (c), respectively.

Fig. 3.11 and Fig. 3.12 show normalized PDFs and CDFs, respectively, of the Gaussian-distributed random offsets of a stochastic TDC for different numbers of arbiters. Each plot overlays 100 different stochastic TDCs (i.e. Monte-Carlo runs). The PDFs and CDFs of the various stochastic TDCs clearly become more consistent and closer to the ideal case as the number of arbiters, M, increases, but at the cost of extra power consumption and area. When M is 16 or 32, the variations are very significant and not acceptable. Increasing M reduces the DNL of a particular stochastic TDC, as shown in Fig. 3.13. However, the integral nonlinearity (INL) does not improve correspondingly, so choosing M = 64 arbiters is reasonable. Fig. 3.13 shows the DNL and INL of an ideal stochastic TDC for M = 64 and M = 512. This nonlinearity is caused by the cubic term in the CDF of Eq. 3.4. Based on Fig. 3.13, the worst-case DNL is 0.60 ps when M = 64, compared with only 0.075 ps when M = 512. However, the INL is around 2 ps regardless of the number of arbiters.
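The observation that DNL shrinks with M while INL stays near 2 ps can be illustrated with an idealized model. The sketch assumes the arbiter offsets sit exactly at Gaussian quantiles, keeps only the central codes (roughly the 10 to 55 code window of Fig. 3.13, scaled with M), and takes the end-point fit as the LSB. These choices are assumptions, so the numbers land near, but not exactly on, those of Fig. 3.13.

```python
from statistics import NormalDist

SIGMA_T_PS = 32.0  # arbiter time-offset standard deviation used in the text


def ideal_nonlinearity(m: int, sigma_ps: float = SIGMA_T_PS, trim_frac: float = 10 / 64):
    """Return (LSB, max |DNL|, max |INL|), all in ps, for an idealized stochastic TDC.

    Assumptions: the m arbiter offsets sit exactly at the Gaussian quantiles k/(m+1),
    so output code k toggles when the input time crosses t_k; round(trim_frac * m)
    codes are discarded at each end; the LSB is the end-point fit over the kept codes.
    """
    nd = NormalDist(0.0, sigma_ps)
    t = [nd.inv_cdf(k / (m + 1)) for k in range(1, m + 1)]  # sorted transition times
    lo = round(trim_frac * m)
    hi = m - 1 - lo                                          # symmetric window
    lsb = (t[hi] - t[lo]) / (hi - lo)
    dnl = [(t[k + 1] - t[k]) - lsb for k in range(lo, hi)]
    inl = [(t[k] - t[lo]) - (k - lo) * lsb for k in range(lo, hi + 1)]
    return lsb, max(abs(d) for d in dnl), max(abs(v) for v in inl)


for m in (64, 512):
    lsb, dnl, inl = ideal_nonlinearity(m)
    print(f"M = {m:3d}: LSB = {lsb:.3f} ps, max|DNL| = {dnl:.3f} ps, max|INL| = {inl:.2f} ps")
```

Because the INL in picoseconds is the deviation of the Gaussian CDF from a straight line, it depends on the time window rather than on M, whereas the DNL in picoseconds scales with the LSB.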

Finally, it is possible to extend the linear range of the stochastic TDC by using methods similar to those used for stochastic ADCs. For example, the work in [28] demonstrates a stochastic ADC with two groups of arbiters whose PDFs are shifted left and right by applying a symmetric offset. This creates a nearly uniform distribution of the arbiters' offsets and improves the CDF linearity with fewer arbiters [8].
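The benefit of two offset-shifted arbiter groups can be seen by comparing CDF linearity over the same ±σ window. The sketch below illustrates the principle only, assuming two equal-size groups shifted by ±σ; it is not the circuit of [28], and the helper names are hypothetical.

```python
from statistics import NormalDist


def relative_cdf_nonlinearity(cdf, t_min: float, t_max: float, points: int = 2001) -> float:
    """Max deviation of a CDF from its end-point chord on [t_min, t_max],
    normalized to the CDF span over that window (0 means perfectly linear)."""
    c0, c1 = cdf(t_min), cdf(t_max)
    worst = 0.0
    for i in range(points):
        t = t_min + (t_max - t_min) * i / (points - 1)
        chord = c0 + (c1 - c0) * (t - t_min) / (t_max - t_min)
        worst = max(worst, abs(cdf(t) - chord))
    return worst / (c1 - c0)


SIGMA = 1.0  # normalized arbiter time-offset std
single = NormalDist(0.0, SIGMA)
left, right = NormalDist(-SIGMA, SIGMA), NormalDist(+SIGMA, SIGMA)  # assumed +/- sigma shift


def single_cdf(t: float) -> float:
    return single.cdf(t)


def split_cdf(t: float) -> float:
    # Half of the arbiters offset to the left, half to the right.
    return 0.5 * (left.cdf(t) + right.cdf(t))


for name, cdf in (("single group", single_cdf), ("two shifted groups", split_cdf)):
    err = relative_cdf_nonlinearity(cdf, -SIGMA, SIGMA)
    print(f"{name:>18}: {100 * err:.2f}% of span over +/-sigma")
```

With a ±σ shift the summed PDF is nearly flat at the center, so the cubic bending of a single Gaussian CDF largely disappears; the trade-off is a lower CDF slope, i.e. fewer codes per picosecond for the same number of arbiters.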

3.4 TDC Output Normalization

The proposed TDC is a two-step TDC with a coarse TDC followed by a fine stochastic TDC that refines the phase error estimate. The outputs of the coarse inverter-line TDC are a pseudo-thermometer representation of the time error between the input signals, normalized to the coarse TDC resolution, i.e. tr/∆ttdc. A normalization circuit that converts the coarse TDC outputs into a phase error normalized to the output period (Tout) was shown in Section 2.4.1; it is identical to the left half of the circuit in Fig. 3.14.

The fine stochastic TDC outputs are added together and must be normalized to Tout as well. An accurate normalization is hardware-expensive since it involves dividing the combined outputs of the stochastic TDC by the reference period Tref and then multiplying the result by FCW (equivalent to dividing by Tout = Tref/FCW). Instead, the proposed normalization circuit uses shift and add operations together with a digital normalizing “Scale” factor (shown on the right side of Fig. 3.14) to correct the fine TDC output for uncertainty in the clock signal slope, Sin, and in the time offset statistics, σT. In this implementation, the Scale factor can be adjusted with 2 bits of resolution. In a commercial product, the Scale factor could be calibrated using a technique similar to the one described for the coarse TDC. In this work, only calibration of the coarse TDC was implemented on-chip since inaccuracies there dominate.


3.5 TDC Calibration

To reap the full performance benefits of a fine-resolution TDC, it must have good linearity. In [29], the reference clock signal is recycled through a single delay cell to avoid the nonlinearity that arises from mismatch along a row of delay cells, and an auxiliary loop stabilizes the delay against PVT variations. More typically, however, calibration is used to avoid nonlinearity in a TDC.

Figure 3.11: Normalized PDF of Gaussian-distributed random offset for a stochastic TDC with different numbers of arbiters: (a) 16, (b) 32, (c) 64, (d) 128, (e) 256, (f) 512 (normalized PDF versus normalized input). Each plot is obtained from a 100-run Monte-Carlo simulation.

Figure 3.12: Normalized CDF of Gaussian-distributed random offset for a stochastic TDC with different numbers of arbiters: (a) 16, (b) 32, (c) 64, (d) 128, (e) 256, (f) 512 (normalized CDF versus normalized input). Each plot is obtained from a 100-run Monte-Carlo simulation.

Figure 3.13: The associated DNL and INL of an ideal stochastic TDC with random offset when the number of arbiters M = 64 and M = 512: (a) CDF for M = 64; (b) CDF for M = 512; (c) DNL for M = 64 (LSB = 1.49 ps); (d) DNL for M = 512 (LSB = 0.19 ps); (e) INL for M = 64 (LSB = 1.49 ps); (f) INL for M = 512 (LSB = 0.19 ps). DNL and INL are plotted in LSB versus STDC output code.

In a two-step TDC, linearity of the coarse TDC is of prime importance since nonlinearities there introduce more jitter than those in the fine-resolution TDC. In this work, the delay of each stage in the coarse TDC varied from 28 to 38 ps (before extraction) over 200 Monte-Carlo simulations of process and mismatch variations, following a Gaussian-like distribution with a mean of 33 ps and a standard deviation of 1.89 ps, as shown in Fig. 3.17. Hence, calibration is needed to prevent the coarse TDC mismatch from limiting the overall performance. Furthermore, calibration of the coarse TDC is crucial to ensure that the residual quantization error applied to the fine stochastic TDC is within its acceptable range.

Figure 3.14: Phase error computation and normalization with respect to one DCO output period, performed digitally. The phase error computed by the coarse TDC is refined by the stochastic TDC.

To permit calibration, the coarse TDC comprises independently programmable delay stages. Each differential delay stage consists of CMOS inverters whose outputs are cross-coupled by two more inverters and loaded by a 4-bit binary-weighted capacitor bank, as shown in Fig. 3.16. The capacitor bank is implemented with differential MOS capacitors and provides a programmable delay that can be varied over a 15 ps range, sufficient to cover the delay variations.


In this work, a statistical calibration method is used to measure the coarse TDC nonlinearity. The time-varying difference between an external calibration clock and the reference clock is used to generate a uniformly distributed time error that serves as the input to the coarse TDC under calibration. Fig. 3.18 shows the time error generated by sampling a 333.333 MHz external calibration clock, fcalb, with a 20 MHz reference clock. This produces time errors in the range of -1500 to +1500 ps with a fine resolution of 1 ps, which is small enough to allow calibration at that granularity. The density function of that timing error is uniform. A simplified block diagram of the calibration circuit that performs a code-density test [30] is shown in Fig. 3.15. A similar statistical method for measuring DNL is applied to a Vernier TDC in [20]. Unlike that work, however, here each cell delay is continuously adjusted according to the test results until a uniform code density is observed.

Code-density testing generally needs a large number of clock cycles to achieve accuracy. Accordingly, a wide register would be needed to store the number of hits observed in each delay bin during calibration. In this work, a balanced mean rather than an absolute mean is used to store the accumulated number of hits in each delay bin during calibration [31]. Using a balanced mean, the size of the storage registers can be significantly reduced.

Assume a TDC consists of R delay elements and a register is used to store the number of hits for each delay element (bin). Using a balanced mean, whenever a hit occurs in the ith bin, the controller increments the ith register by (R − 1) and decrements each of the other (R − 1) registers by one. Note that the sum, and hence the mean, of all registers remains zero, because each hit adds (R − 1) to one register and (R − 1) × (−1) to the others. Accordingly, there is no need to compute the average of the stored values as with an absolute mean method. A regular absolute mean code-density test requires the code-density data to be post-processed, including averaging, subtraction, and multiplication, to determine the proper calibration code for each delay. In contrast, the implemented balanced mean calibration employs a simple finite-state machine (FSM) that continuously calibrates the coarse TDC and only monitors a threshold level to determine whether a new calibration code is needed.

Figure 3.15: On-chip low-area calibration algorithm of the coarse TDC based on a code-density test. The dedicated calibration clock, fcalb, is sampled by the coarse TDC during the calibration phase; once calibration is done, the coarse TDC samples the DCO clock.

Figure 3.16: Delay cell with 4-bit calibration capacitor bank (binary weights m = 1, 2, 4, 8 selected by a0 to a3).

Figure 3.17: Spectre mismatch Monte-Carlo simulation of the inverter unit used in the calibrated coarse TDC.


Assume that the balanced code-density test uses H hits per iteration. Also, assume the ith register, corresponding to the ith TDC bin, stores a final value of Fi at the end of the calibration phase. Moreover, if pi is the probability that an event is collected in the ith bin, the number of hits for bin i is piH, so one can write:

$$F_i = p_i H (R-1) + (1-p_i) H \cdot (-1) \qquad (3.9)$$

$$\Rightarrow \frac{F_i}{H} = p_i R - 1 \qquad (3.10)$$

$$\Rightarrow p_i = \frac{1 + F_i/H}{R} \qquad (3.11)$$

Ideally, pi = 1/R, so the term Fi/H in Eq. 3.11 represents DNLi as a fraction of one LSB (0.02 corresponds to 2%), i.e. DNLi = Fi/H. Hence, at the end of balanced mean calibration each register stores the DNL of its corresponding TDC bin. To achieve a DNL of 2% with 99% confidence, a 16-bit register is used for each coarse TDC bin rather than the 23-bit register that would be required without the balanced mean method, saving 224 registers in total.
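The balanced mean bookkeeping and Eq. 3.10 can be checked with a small behavioral model. The 8-bin TDC and its bin widths below are hypothetical; the model increments and decrements integer registers exactly as described above and then reads DNLi = Fi/H. The on-chip version is an FSM with fixed-width registers, which this sketch does not model.

```python
import bisect
import random


class BalancedMeanCounters:
    """Balanced-mean code-density registers for an R-bin TDC (behavioral model).

    On every hit in bin i, register i gets +(R - 1) and each of the other R - 1
    registers gets -1, so the registers always sum to zero.
    """

    def __init__(self, num_bins: int):
        self.num_bins = num_bins
        self.regs = [0] * num_bins
        self.hits = 0

    def record_hit(self, bin_index: int) -> None:
        for j in range(self.num_bins):
            self.regs[j] += (self.num_bins - 1) if j == bin_index else -1
        self.hits += 1

    def dnl_estimates(self) -> list[float]:
        """DNL_i = F_i / H in LSB (Eq. 3.10 with p_i = (1 + DNL_i) / R)."""
        return [f / self.hits for f in self.regs]


# Toy 8-bin TDC with hypothetical bin widths (ps), exercised by a uniform time error.
widths = [33.0, 31.0, 35.0, 33.0, 30.0, 36.0, 33.0, 33.0]
edges = [0.0]
for w in widths:
    edges.append(edges[-1] + w)

rng = random.Random(7)
counters = BalancedMeanCounters(len(widths))
for _ in range(20_000):
    t = rng.uniform(0.0, edges[-1])
    i = min(bisect.bisect_right(edges, t) - 1, len(widths) - 1)  # bin containing t
    counters.record_hit(i)

mean_w = sum(widths) / len(widths)
for k, (w, d) in enumerate(zip(widths, counters.dnl_estimates())):
    print(f"bin {k}: true DNL = {w / mean_w - 1:+.3f} LSB, estimate = {d:+.3f} LSB")
```

With 20 000 hits over 8 bins, the statistical spread of each estimate is about 0.02 LSB, consistent with the binomial variance R²·pi(1 − pi)/H implied by Eq. 3.10.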

To ensure proper operation, a TDC with nonlinearity and the calibration algorithm are modeled in MATLAB. Later, a Verilog-AMS model of the TDC is used while the calibration algorithm is coded in Verilog HDL. The simulations show that the algorithm is effective whenever the DNL is in the range -7 ps to +8 ps; nonlinearities outside this range saturate the correction at the corresponding limit. Fig. 3.19 shows the delay of the coarse TDC inverters before and after calibration using the balanced mean code-density test. After calibration with infinite precision (i.e. using floating-point computation), the peak-to-peak error is 1.20 ps with a 0.34 ps standard deviation. Fig. 3.20 shows a realistic scenario where the correction step is assumed to be ±1 ps, in accordance with the transistor implementation in Fig. 3.16. In this case, the peak-to-peak error becomes 1.85 ps with a 0.56 ps standard deviation. Choosing a finer correction step would reduce the peak-to-peak error.

Figure 3.18: Generating uniformly distributed random time errors (time error in ps versus calibration cycle) to calibrate the coarse TDC. Sampling a 333.333 MHz calibration clock using a 20 MHz reference produces uniform time errors within ±1500 ps.

Figure 3.19: The delay of TDC inverters (buffer delay in ps versus TDC bin) before (blue triangle points) and after (red square points) calibration using a balanced code-density test with fine delay correction (floating-point precision). After calibration, the delay mean = 33.345 ps, std = 0.335 ps, and peak-peak error = 1.203 ps.

Figure 3.20: The delay of TDC inverters (buffer delay in ps versus TDC bin) before (blue triangle points) and after (red square points) calibration using balanced mean with 1 ps correction step (fixed-point precision). After calibration, the delay mean = 32.603 ps, std = 0.559 ps, and peak-peak error = 1.854 ps.
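The closed-loop behavior of the calibration can be sketched with an expected-value model in which each iteration uses the ideal code-density result instead of counting random hits. Assumptions: a hypothetical 32-stage line with intrinsic delays drawn from N(33 ps, 1.89 ps), a 4-bit bank with a 1 ps step, a reset code of 7 (giving roughly -7 to +8 ps of correction), and a threshold of half a step. None of these parameters is taken from the Verilog implementation.

```python
import random

STEP_PS = 1.0               # correction step per capacitor-bank code (assumed 1 ps)
CODE_MIN, CODE_MAX = 0, 15  # 4-bit bank
CODE_RESET = 7              # assumed reset code: about -7 ps to +8 ps of correction


def delays_with_codes(intrinsic_ps, codes):
    """Stage delay = intrinsic delay + capacitor-bank adjustment relative to the reset code."""
    return [d + (c - CODE_RESET) * STEP_PS for d, c in zip(intrinsic_ps, codes)]


def calibrate(intrinsic_ps, max_iterations=50):
    """Expected-value model of the balanced-mean loop: each iteration uses the ideal
    code-density result DNL_i = p_i * R - 1, with p_i = d_i / sum(d), instead of random hits."""
    codes = [CODE_RESET] * len(intrinsic_ps)
    for _ in range(max_iterations):
        delays = delays_with_codes(intrinsic_ps, codes)
        mean = sum(delays) / len(delays)
        threshold = 0.5 * STEP_PS / mean      # half a step, expressed in LSB
        changed = False
        for i, d in enumerate(delays):
            dnl = d / mean - 1.0              # identical to p_i * R - 1
            if dnl > threshold and codes[i] > CODE_MIN:
                codes[i] -= 1                 # bin too wide: remove load capacitance
                changed = True
            elif dnl < -threshold and codes[i] < CODE_MAX:
                codes[i] += 1                 # bin too narrow: add load capacitance
                changed = True
        if not changed:
            break
    return delays_with_codes(intrinsic_ps, codes)


def peak_to_peak(values):
    return max(values) - min(values)


rng = random.Random(3)
before = [rng.gauss(33.0, 1.89) for _ in range(32)]  # hypothetical 32-stage line
after = calibrate(before)
print(f"before: p-p = {peak_to_peak(before):.2f} ps, after: p-p = {peak_to_peak(after):.2f} ps")
```

In this model every unsaturated stage ends within half a step of the mean, so the residual peak-to-peak error is bounded by about one correction step, which matches the trend that a finer step reduces the peak-to-peak error.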

3.6 Clock domain synchronization

There are two asynchronous clock domains in the DPLL, fref and fout. During frequency acquisition their edge relationship is unknown, and during phase lock the edges exhibit rotation if the fractional part of FCW is nonzero [8]. This makes DPLL operation vulnerable to latch metastability, which can cause the DPLL to fail from time to time. This reliability issue is characterized by the mean time between failures (MTBF). Digital designers usually solve the problem of asynchronous clock domains by retiming the lower-frequency clock, fref, with the higher-frequency clock, fout.

Usually, one flip-flop (FF) is not enough to achieve this, and a series of FFs is used for the synchronizing process to increase the MTBF, which improves exponentially with the number of added FFs [15]. For example, a 2 GHz DPLL with a 20 MHz reference implemented in 0.25 µm technology has an MTBF of 4.3 seconds if only one retiming FF is used. Cascading two FFs increases the MTBF to approximately 12 years. Recent measurements of synchronizer metastability show a degradation of MTBF with technology scaling, especially at extreme PVT conditions. Designing the same DPLL in 65 nm technology reduces the MTBF to 50 ms when one retiming FF is used. The MTBF becomes 11 hours using two cascaded FFs and 110 thousand years using three cascaded FFs [32]. Accordingly, a minimum of two or three FFs must be used for reliable clock domain synchronization. Fig. 3.21 shows the implemented synchronization circuit, where the reference clock, fref, is first synchronized using the output clock, fout, and then by the divided-down output clock, clk8, which is used to dither the DCO through a ∆Σ modulator. Finally, note that the retiming process generates a fractional phase error, which is estimated and corrected by the TDC.

Figure 3.21: Clock synchronization of the reference clock, fref, using the DCO clock, fout, and a divided-down DCO clock, clk8. The DCO clock is divided by two using a custom-designed CML divider. The subsequent synchronization logic and the feedback phase counter are fully synthesized.
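The exponential dependence on the number of flip-flops follows from the textbook synchronizer model, in which each added stage contributes about one clock period of resolution time. The sketch below uses hypothetical values for the metastability time constant τ and the window Tw; it is not fitted to the figures quoted from [15] or [32] and only shows how the MTBF scales with the number of stages.

```python
import math


def synchronizer_mtbf_s(n_stages, f_clk_hz, f_data_hz, tau_s, t_w_s, t_overhead_s=0.0):
    """Textbook synchronizer MTBF: MTBF = exp(t_res / tau) / (T_w * f_clk * f_data).

    Each added flip-flop grants roughly one more clock period of resolution time,
    so t_res ~ n / f_clk minus fixed overhead (clock-to-Q, setup), lumped in t_overhead_s.
    """
    t_res = n_stages / f_clk_hz - t_overhead_s
    return math.exp(t_res / tau_s) / (t_w_s * f_clk_hz * f_data_hz)


# Hypothetical device parameters (not fitted to [15] or [32]).
F_CLK, F_DATA = 2e9, 20e6
TAU, T_W = 25e-12, 30e-12

for n in (1, 2, 3):
    print(f"{n} FF(s): MTBF ~ {synchronizer_mtbf_s(n, F_CLK, F_DATA, TAU, T_W):.3e} s")
```

Each added stage multiplies the MTBF by exp(Tclk/τ), which is why a larger τ in scaled technologies or at extreme PVT corners erodes the margin so quickly.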

3.7 Implementation Details of the DPLL

The DPLL has been realized using synthesized Verilog code for the loop filter, the normalization algorithm, the TDC calibration algorithm, a ∆Σ modulator (DSM), a high-speed counter, and the synchronization logic between the reference clock, output clock, and DSM. The other blocks, namely the CML divide-by-two, the DCO, and the TDC, were custom designed.

3.7.1 Digital Loop Filter (DLF)

After calibration and digital normalization of the TDC output, the digital phase and frequency errors are passed to the digital loop filter (DLF). The DLF is controlled by an FSM that initially enables a proportional filter with a high gain, Kc, that controls the coarse varactors of the DCO to allow fast frequency locking. The FSM then freezes the coarse DCO inputs and switches to phase locking, where the DLF consists of a proportional path with gain kp and a delaying integral path with gain ki. The gains kp, ki, and Kc are all programmable via a serial bus. The DLF is followed by an optional IIR filter with a programmable gain k. Fig. 3.22 shows a block diagram of the DLF. Finally, the digital output of the DLF is applied directly to an array of varactors in the DCO to control the output phase.

Figure 3.22: Implementation of the digital loop filter (coarse path with gain Kc; fine path with Kp, Ki, and gear shift; optional IIR selected by enIIR; 6 coarse and 24 fine output bits). The coarse filter uses only proportional gain, Kc, to accomplish rough frequency lock. It is then disabled so that a first-order filter takes over to achieve phase lock. Gear shifting is used to accelerate the phase locking time. Finally, the IIR filter can be enabled to filter out high-frequency noise.
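A behavioral sketch of the phase-locking part of the DLF follows. The gains are hypothetical floating-point values (a hardware implementation would more likely use shifts), the IIR form y[n] = y[n−1] + k·(x[n] − y[n−1]) is an assumed first-order structure, and the gear-shift method is simplified.

```python
class DigitalLoopFilter:
    """Behavioral sketch of the phase-locking DLF: proportional path, delaying integral
    path, and an optional first-order IIR, y[n] = y[n-1] + k * (x[n] - y[n-1])."""

    def __init__(self, kp: float, ki: float, k_iir: float = 1.0, en_iir: bool = False):
        self.kp, self.ki = kp, ki
        self.k_iir, self.en_iir = k_iir, en_iir
        self.integ = 0.0      # integral accumulator; updated after use (one-cycle delay)
        self.iir_state = 0.0

    def gear_shift(self, kp: float, ki: float) -> None:
        """Switch to new (typically smaller) gains while keeping the integrator state.
        Compensation of the resulting jump in the proportional term is omitted."""
        self.kp, self.ki = kp, ki

    def step(self, phase_error: float) -> float:
        out = self.kp * phase_error + self.integ
        self.integ += self.ki * phase_error
        if self.en_iir:
            self.iir_state += self.k_iir * (out - self.iir_state)
            out = self.iir_state
        return out


dlf = DigitalLoopFilter(kp=0.5, ki=0.125)
print([dlf.step(1.0) for _ in range(3)])  # constant phase error: 0.5, then +0.125 per cycle
```

Because the integrator is updated after its value is used, the integral contribution of an error sample appears one reference cycle later, which is the delaying integral path described above.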

3.7.2 DCO

The DCO is an LC oscillator with digitally controlled capacitors, as shown in Fig. 3.23. The LC tank includes two capacitor banks: coarse and fine. The coarse bank, shown in Fig. 3.24, uses binary-weighted metal-insulator-metal (MiM) capacitors in a common-centroid layout. It has 6 bits of resolution to cover the frequency range 1.99 to 2.50 GHz, resulting in a resolution of approximately 8.125 MHz. The fine capacitor banks are realized with MOS accumulation-mode varactors that digitally switch between low and high capacitance values, as shown in Fig. 3.25. The 24-bit fine control word is divided into a 6-bit integer part with 726 kHz resolution and an 18-bit fractional part. The 6 MSBs (the integer part) are binary weighted, while the next 7 bits are thermometer encoded to reduce switching activity and to ensure monotonic behavior of the DCO. The unit-sized accumulation-mode varactors provide a frequency resolution of 11 kHz. The 11 LSBs are fed to a ∆Σ modulator running at fout/8 (hence in the range of 250-312 MHz) to achieve very fine frequency resolution. Fig. 3.26 summarizes the high-level configuration of the DCO. Note that the fine capacitor bank is designed to have enough range to provide at least 50% overlap between adjacent coarse bank settings and to ensure that all frequencies are covered.

Figure 3.23: LC-DCO with two banks of tuning (coarse control dC[5:0] and fine control dF[23:0] through a decoder). The coarse tuning is implemented using MiM capacitors while the fine tuning is achieved by using MOS varactors.

Figure 3.24: Coarse frequency tuning using metal-insulator-metal (MiM) capacitors (61 fF; binary weights 1x, 2x, up to 32x selected by C[0] to C[5]).
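The resolution hierarchy of the DCO can be tallied from the average gains quoted above. This short Python calculation assumes that the ∆Σ modulator output dithers unit varactors, so its time-averaged step is the unit step divided by 2^11; it uses average gains only and ignores the nonuniformity visible in the measured characteristics.

```python
K_COARSE_HZ = 8.125e6   # average coarse-bank gain per code
K_FINE_INT_HZ = 726e3   # average integer fine-bank gain per code
K_UNIT_HZ = 11e3        # unit accumulation-mode varactor step
COARSE_BITS = 6
FINE_INT_BITS = 6
SD_BITS = 11            # LSBs handled by the delta-sigma modulator

coarse_span_hz = (2**COARSE_BITS - 1) * K_COARSE_HZ
fine_int_span_hz = (2**FINE_INT_BITS - 1) * K_FINE_INT_HZ
# Time-averaged step when an 11-bit modulator dithers unit varactors (assumed mapping).
dithered_step_hz = K_UNIT_HZ / 2**SD_BITS
# Span of the integer fine bank in coarse steps; values above 1 mean adjacent coarse
# settings overlap.
fine_span_in_coarse_steps = fine_int_span_hz / K_COARSE_HZ

print(f"coarse span       ~ {coarse_span_hz / 1e6:.1f} MHz")
print(f"integer fine span ~ {fine_int_span_hz / 1e6:.1f} MHz "
      f"({fine_span_in_coarse_steps:.1f} coarse steps)")
print(f"dithered step     ~ {dithered_step_hz:.2f} Hz")
```

The coarse span of about 512 MHz is consistent with the 1.99 to 2.50 GHz tuning range, and the integer fine bank alone spans several coarse steps.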

Figure 3.25: Fine frequency tuning using MOSFET varactors, showing the binary-weighted integer fine bits f[0] to f[5] and the fractional fine bits (row, column, and ∆Σ-driven varactor groups). The frequency tuning is defined by the difference in MOS capacitance between the ON and OFF states.

Figure 3.26: Illustration of the DCO control bits and gains. There are six coarse bits with an average gain of 8125 kHz/code. There are also 24 fine bits that are further divided into 6 MSBs with an average gain of 726 kHz/code and 18 LSBs representing the fractional part of the frequency control word. The 18 fractional LSBs are split into 7 bits decoded into a thermometric matrix (176 kHz/code and 11 kHz/code steps) and 11 bits provided to the ∆Σ modulator (clocked at fout/8, dithered 11 kHz/code), achieving a frequency resolution far finer than a minimum-size MOS varactor can achieve in a particular process.

The DCO introduces quantization error because it changes its output frequency in discrete steps, which produces spurious tones at offset frequencies beyond the loop bandwidth. The ∆Σ modulator shapes the quantization noise of the DCO to high offset frequencies and achieves fine frequency control. The ∆Σ modulator is implemented with a reduced-complexity MASH-1-1-1 architecture [33], with each succeeding stage having reduced accuracy and area, as shown in Fig. 3.27. The first stage of the ∆Σ modulator is the most important one, so 11-bit registers are used there. The second stage uses only 8-bit registers, while the last stage uses only 5-bit registers.

Figure 3.27: Implementation of the third-order reduced-complexity ∆Σ modulator [33] (input Frac[10:0]; 11-, 8-, and 5-bit stages; combiner output T[7:1]). The first stage has higher computational resolution than the following stages to reduce power and complexity and to meet timing requirements during synthesis.
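The MASH-1-1-1 structure with shrinking word lengths can be sketched behaviorally as below. It assumes that each stage's residue is truncated to the next stage's width (11, 8, then 5 bits) and that the combined output, which spans -3 to +4, is what the combiner maps onto T[7:1]; the exact datapath of [33] may differ. Truncation adds first-order-shaped error, but the long-run mean is still set by the first stage alone.

```python
def mash111(frac: int, n_samples: int, widths=(11, 8, 5)) -> list[int]:
    """Behavioral MASH-1-1-1 delta-sigma modulator with shrinking accumulator widths.

    Stage k is an accumulator of widths[k] bits; its carry is the stage output, and its
    residue feeds the next stage after truncation to the next width (assumed scheme).
    Noise cancellation: y = c1 + (1 - z^-1) c2 + (1 - z^-1)^2 c3, so y is in -3..+4.
    """
    w1, w2, w3 = widths
    if not 0 <= frac < (1 << w1):
        raise ValueError("frac must fit in the first-stage width")
    acc1 = acc2 = acc3 = 0
    c2_d = 0            # c2[n-1]
    c3_d = c3_dd = 0    # c3[n-1], c3[n-2]
    out = []
    for _ in range(n_samples):
        acc1 += frac
        c1, acc1 = acc1 >> w1, acc1 & ((1 << w1) - 1)
        acc2 += acc1 >> (w1 - w2)          # truncated residue of stage 1
        c2, acc2 = acc2 >> w2, acc2 & ((1 << w2) - 1)
        acc3 += acc2 >> (w2 - w3)          # truncated residue of stage 2
        c3, acc3 = acc3 >> w3, acc3 & ((1 << w3) - 1)
        out.append(c1 + (c2 - c2_d) + (c3 - 2 * c3_d + c3_dd))
        c2_d = c2
        c3_dd, c3_d = c3_d, c3
    return out


seq = mash111(300, 2048 * 20)
print(min(seq), max(seq), sum(seq) / len(seq), 300 / 2048)
```

The printed average converges to frac/2^11, so the time-averaged capacitance, and hence frequency, resolves steps far below one unit varactor.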

The output clock of the DCO is divided by two using a CML static divider. The CML output is AC-coupled before passing through a pseudo-differential CMOS buffer. After that CML-to-CMOS stage, the half-rate clock is fed to a synthesizable CMOS divider and a counter.

3.8 Measurement Results

This section presents the test setup and measurement results of the fabricated test chip. A prototype was fabricated in a 0.13 µm CMOS technology from IBM (whose semiconductor manufacturing business was acquired by GlobalFoundries in 2015). The process includes 8 metal layers and high-density MiM capacitors. Analog power and ground lines are separated from digital power and ground to minimize noise coupling from digital circuits to analog blocks. The active area of the proposed DPLL is 0.43 mm², of which 0.273 mm² is digital circuitry, including the DPLL core, the calibration algorithm, and the fine stochastic TDC. A die photo of the fabricated prototype is shown in Fig. 3.28. Note that the active area of the digital circuitry would be drastically reduced in newer processes, since digital area scales down by roughly a factor of 0.5 from one technology node to the next [34]. There are four and a half nodes from 130 nm to 28 nm (28 nm being a half node of 32 nm), which means that the digital active area can be reduced by a factor of 2^4.5 ≈ 22.6, to about 0.012 mm² in 28 nm rather than 0.273 mm² in 130 nm technology.
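The area arithmetic can be reproduced as follows. The block areas are read from the Fig. 3.28 labels, taking the calibration-logic label as 70 000 µm² (an assumption: this makes the blocks sum to the quoted 0.43 mm², whereas the extracted label reads 700000 µm²). Grouping the TDC, digital logic, MASH, and calibration logic then gives the 0.273 mm² digital area. The halving-per-node rule is the first-order assumption stated in the text.

```python
# Block areas read from the Fig. 3.28 labels, in um^2 (calibration logic assumed 70 000).
blocks_um2 = {"DCO": 157_500, "TDC": 27_500, "digital logic": 145_000,
              "MASH": 30_000, "calibration logic": 70_000}
digital_blocks = ("TDC", "digital logic", "MASH", "calibration logic")

total_mm2 = sum(blocks_um2.values()) / 1e6
digital_mm2 = sum(blocks_um2[b] for b in digital_blocks) / 1e6

nodes = 4.5                 # 130 -> 90 -> 65 -> 45 -> 32 nm, plus the 28 nm half node
shrink = 2 ** nodes         # area roughly halves per node (first-order rule of thumb)
print(f"total = {total_mm2:.3f} mm^2, digital = {digital_mm2:.4f} mm^2")
print(f"shrink factor = {shrink:.1f}, digital area at 28 nm ~ {digital_mm2 / shrink:.3f} mm^2")
```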

Figure 3.28: Die photo of the DPLL chip in an IBM (now GF) 130 nm bulk CMOS process (active area is 0.43 mm²). Labeled blocks: DCO 157500 µm², TDC 27500 µm², digital logic 145000 µm², MASH 30000 µm², calibration logic 70000 µm².

3.8.1 PCB and Test Setup

The DPLL runs from a temperature-controlled 20 MHz reference clock with a phase noise of approximately -143 dBc/Hz at 100 Hz and -158 dBc/Hz at 10 MHz offset (equivalent to 143 fs RMS or 1.03e-3 degree RMS), and with very small aging and temperature stability factors, on the order of 10⁻⁸. An external signal generator was also used when a reference frequency other than 20 MHz was required, including for calibrating the coarse TDC. The DPLL is mounted on a custom 4-layer printed circuit board (PCB), as shown in Fig. 3.29(c). The DPLL can be configured for different bandwidth and operation settings via serial shift registers. A small-form-factor FPGA board (DE0-Nano from Terasic, shown in Fig. 3.29(b)) is employed to shift the configuration data into the DPLL. Furthermore, the DPLL and FPGA boards are connected to another PCB, shown in Fig. 3.29(a). This main PCB provides biasing and regulated power to the DPLL board and translates the voltage levels of the FPGA control signals for the DPLL board. This modular PCB structure simplifies testing and shortens design time, since only the daughter board must be redesigned to test future designs such as the one presented in Chapter 5.
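For reference, the conversions behind the jitter figures quoted in this section can be written down directly. The sketch converts RMS time jitter to degrees and integrates a single-sideband phase-noise profile L(f) into RMS jitter with the trapezoidal rule; the flat profile in the usage line is hypothetical, and sparse log-spaced points would call for log-log interpolation instead.

```python
import math


def jitter_s_to_degrees(jitter_rms_s: float, f_hz: float) -> float:
    """RMS time jitter expressed as RMS phase jitter in degrees at carrier frequency f."""
    return jitter_rms_s * f_hz * 360.0


def integrated_jitter_s(points: list[tuple[float, float]], f_carrier_hz: float) -> float:
    """RMS jitter from single-sideband phase noise L(f) in dBc/Hz.

    points: (offset_hz, L_dbc_per_hz) pairs sorted by offset. The linear-power PSD is
    integrated with the trapezoidal rule (coarse for sparse points), then
    sigma_phi = sqrt(2 * integral 10^(L/10) df) and sigma_t = sigma_phi / (2 * pi * f).
    """
    area = 0.0
    for (f0, l0), (f1, l1) in zip(points, points[1:]):
        area += 0.5 * (10 ** (l0 / 10) + 10 ** (l1 / 10)) * (f1 - f0)
    return math.sqrt(2.0 * area) / (2.0 * math.pi * f_carrier_hz)


print(f"143 fs at 20 MHz = {jitter_s_to_degrees(143e-15, 20e6):.3e} deg")
flat = [(1e4, -130.0), (1e6, -130.0)]  # hypothetical flat -130 dBc/Hz profile
print(f"flat profile at 2 GHz: {integrated_jitter_s(flat, 2e9) * 1e15:.1f} fs RMS")
```

The first line reproduces the 1.03e-3 degree RMS equivalence stated for the 20 MHz reference.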

Many instruments and tools were used during testing. A PC ran Altera Quartus to program the FPGA and the DPLL, MATLAB to characterize the DCO, and the KE5FX software to capture the output of an HP 8565C spectrum analyzer. A Tektronix RSA6114A spectrum analyzer and an Agilent PSA high-performance E4448A spectrum analyzer were also used. An illustrative diagram of the test setup is shown in Fig. 3.30. Among the equipment used, the HP 8565C spectrum analyzer has the highest noise contribution, around 356 fs RMS jitter for a 2 GHz output clock. It has a noise sideband of -117 dBc/Hz at 100 kHz, -133 dBc/Hz at 1 MHz, and -142 dBc/Hz at 1 GHz offset. The Tektronix RSA6114A spectrum analyzer has an average noise level of -151 dBm at 1 GHz offset and contributes around 236 fs RMS for a 2 GHz clock, which is better than the HP analyzer. Finally, the Agilent E4448A high-performance spectrum analyzer contributes only 103 fs RMS for a 2 GHz output clock and was used for fine jitter measurements. Its average noise level is -137 dBm at 100 kHz, -145 dBm at 1 MHz, and -152 dBm at 1 GHz.

Figure 3.29: PCBs used for powering, biasing, and programming the DPLL chip: (a) main board for powering, biasing, and interfacing with the daughter board where the DUT is mounted; (b) FPGA DE0-Nano board for programming and controlling the DUT; (c) daughter board where the DPLL chip is mounted.

Figure 3.30: Block diagram of the test setup (labeled blocks: PC running Altera Quartus, the KE5FX tool, and MATLAB; USB Blaster circuit; Altera Cyclone IV FPGA with switches, LEDs, GPIO, and a 40-pin header on the DE0-Nano board; 3.3 V to 1.0 V level shifting, low-noise regulators, and biasing currents; the DPLL DUT on the daughter board with a 20 MHz reference; spectrum analyzers Agilent E4448A, HP 8565C, and Tektronix RSA6114A). The FPGA on the DE0-Nano board, which controls the DPLL chip, is programmed via a PC using Altera Quartus. KE5FX software runs on the PC and can capture the spectrum of a particular clock using an HP 8565C spectrum analyzer.


3.8.2 Results

Fig. 3.31 shows the open-loop test measurements using DCO control words serially shifted in from the on-board FPGA. The coarse DCO bank gain, Kcdco, is 8.125 MHz/code on average, while the fine DCO bank gain, Kfdco, is around 726 kHz/code on average. The fractional part of the fine DCO bank has a resolution of 11 kHz/code on average. The DPLL can lock to any frequency between 1.99 and 2.5 GHz from a nominal reference of 20 MHz.

[Figure 3.31 plot (a): DCO frequency (GHz), 1.9 to 2.5, versus coarse digital code, 0 to 60; curves: DCO characteristics (measurements) and DCO model with Kvco = 8.125 MHz/code.]

(a) Coarse DCO gain = 8125 kHz/code (sweeping only the coarse DCO control word)

(b) Fine DCO gain = 726 kHz/code (sweeping both the coarse and fine DCO control words)

Figure 3.31: DCO gain measurements.

Figure 3.32: The differential output clock captured using a Tektronix RSA6114A. The differential peak-to-peak voltage is 370 mV for a 2 GHz clock.



Figure 3.33: Spectrum of the output clock, captured by an HP 8565C spectrum analyzer and the KE5FX tool, when the reference clock is frequency modulated.

The DCO output clock is buffered through a four-stage differential CML buffer (see Appendix A for schematics) that consumes 34 mW. The buffered clock, captured by a Tektronix RSA6114A real-time spectrum analyzer, has a 370 mV peak-to-peak amplitude, as shown in Fig. 3.32. Fig. 3.33 shows the spectrum of the output clock, captured by an HP 8565C spectrum analyzer, when the reference clock is frequency modulated. Fig. 3.34 compares the spectrum of the DPLL output predicted by Verilog-A simulation with the measurement captured by the HP 8565C spectrum analyzer. The spectrum estimated from the Verilog-A simulation is comparable to the measured spectrum over the offset range from 100 kHz to 20 MHz. The measured noise floor, however, is higher than the level expected from the Spectre simulation results that were modeled in Verilog-A.

Fig. 3.35, Fig. 3.36, and Fig. 3.37 show the fractional spurs for various fractional values without TDC calibration. For the 2012.50 MHz clock (Fig. 3.35), there is an in-band spur of -53.42 dBc at 867 kHz offset, and the worst spur, -44.86 dBc, appears at 3.133 MHz offset. For the 2003.125 MHz clock (Fig. 3.36), there is a -34.05 dBc spur at 775 kHz and a -49.30 dBc spur at 1.575 MHz. For a larger fractional value, with the output clock at 2006.250 MHz (Fig. 3.37), the worst spur, -39.29 dBc, shows up at 1.575 MHz.



[Figure 3.34 plot: phase noise (dBc/Hz), -150 to -90, with x-axis ticks from 10^5 to 10^8; curves: Simulation and Measurement.]

Figure 3.34: Verilog-A simulation vs. measurement captured by the HP 8565C spectrum analyzer and the KE5FX tool.

Figure 3.35: Spurs spectrum at 2012.5 MHz measured using an Agilent E4448A spectrum analyzer before calibration.



Figure 3.36: Spurs spectrum at 2003.125 MHz measured using a Tektronix RSA6114A real-time spectrum analyzer before calibration.

Figure 3.37: Spurs spectrum at 2006.250 MHz measured using a Tektronix RSA6114A real-time spectrum analyzer before calibration.



(a) Before calibration and high frequency filtering

(b) After calibration and high frequency filtering

Figure 3.38: Spurs spectrum at 1995.0 MHz measured using Tektronix RSA 6114A real-time spectrum analyzer.



(a) Before calibration and high frequency filtering

(b) After calibration and high frequency filtering

Figure 3.39: Spurs spectrum at 2185.0 MHz measured using Tektronix RSA 6114A real-time spectrum analyzer.



Fig. 3.38 shows spectrum measurements in which calibration reduces the spur at 2.65 MHz offset from the 1995 MHz carrier from -54.27 dBc to -70.55 dBc. Spurs at the larger offset of 5.25 MHz were reduced by 34.2 dB by enabling the IIR filter in the digital loop filter (DLF) to filter high-frequency noise. Fig. 3.39 shows spectrum measurements in which calibration reduces the spurs at 2.65 MHz offset from the 2185 MHz carrier from -60 dBc to -70 dBc.

Fig. 3.40 shows phase noise measurements at a 2 GHz DPLL output frequency with a 20 MHz reference clock, using an HP 8565C spectrum analyzer and the KE5FX tool, with and without the fine TDC activated. With the fine stochastic TDC disabled, the in-band phase noise does not fall below -83.7 dBc/Hz. Once the fine TDC is enabled, the in-band phase noise drops to -104.3 dBc/Hz, which is 20.6 dB lower, at the cost of only 3 mW of additional power consumed by the fine stochastic TDC. With a loop bandwidth of approximately 1.42 MHz, the integrated random jitter is 697 fs rms (0.502 degree). When the loop bandwidth is reduced to 700 kHz, the random jitter integrated from 1 kHz to 100 MHz becomes 213 fs rms (0.153 degree).
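The jitter figures above come from integrating a measured single-sideband phase-noise profile and converting the result to time and degrees. The sketch below shows that standard calculation, assuming the profile is given as (offset, dBc/Hz) points, interpolated linearly in dB versus log-frequency and integrated with the trapezoidal rule. The function names and the interpolation choice are illustrative assumptions, not the analyzer's or the KE5FX tool's actual algorithm.

```python
import math


def jitter_to_degrees(sigma_t_s: float, f_carrier_hz: float) -> float:
    """Convert RMS timing jitter (s) to RMS phase jitter (degrees)."""
    return sigma_t_s * f_carrier_hz * 360.0


def integrated_jitter(offsets_hz, pn_dbc_hz, f_carrier_hz, points_per_decade=200):
    """Integrate a single-sideband phase-noise profile L(f) into RMS jitter.

    L(f) is interpolated linearly in dB versus log10(f) and integrated with
    the trapezoidal rule; sigma_phi^2 = 2 * integral of 10^(L/10) df counts
    both sidebands. Returns (sigma_t in seconds, sigma_phi in degrees).
    """
    log_f = [math.log10(f) for f in offsets_hz]
    span = log_f[-1] - log_f[0]
    n = max(2, int(span * points_per_decade) + 1)
    grid = [10 ** (log_f[0] + i * span / (n - 1)) for i in range(n)]

    def pn_at(f):
        # Clamp to the profile range to absorb floating-point round-off.
        x = min(max(math.log10(f), log_f[0]), log_f[-1])
        for i in range(len(log_f) - 1):
            if log_f[i] <= x <= log_f[i + 1]:
                t = (x - log_f[i]) / (log_f[i + 1] - log_f[i])
                return pn_dbc_hz[i] + t * (pn_dbc_hz[i + 1] - pn_dbc_hz[i])
        return pn_dbc_hz[-1]

    lin = [10 ** (pn_at(f) / 10) for f in grid]
    area = sum(0.5 * (lin[i] + lin[i + 1]) * (grid[i + 1] - grid[i])
               for i in range(n - 1))
    sigma_phi = math.sqrt(2.0 * area)                 # radians
    sigma_t = sigma_phi / (2 * math.pi * f_carrier_hz)
    return sigma_t, math.degrees(sigma_phi)
```

Applied to the numbers quoted above, `jitter_to_degrees` reproduces the stated conversions: 697 fs at 2 GHz is about 0.502 degree, and 213 fs is about 0.153 degree. The flat -100 dBc/Hz profile used in the check is a synthetic input, not measured data.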

A phase noise measurement using an Agilent E4448A spectrum analyzer, with the coarse TDC calibrated and the fine stochastic TDC activated, is shown in Fig. 3.42 for a 2.4 GHz output frequency. The in-band phase noise is -107 dBc/Hz, while the jitter integrated from 1 kHz to 100 MHz is 500 fs RMS (0.432 degree) for a loop bandwidth of 1.42 MHz. Furthermore, the phase noise is -116 dBc/Hz at 2 MHz offset and -137 dBc/Hz at 19 MHz offset. The random jitter reported by a 25 GS/s real-time oscilloscope was approximately 50% higher than that obtained from the phase noise analyzer, perhaps because the oscilloscope interprets some small fractional spurs as random jitter.

[Figure 3.40 plot: phase noise (dBc/Hz), -140 to -70, versus offset frequency, 10^3 to 10^8 Hz; curves: without fine TDC, with fine TDC, and a curve labelled Eq. 2.13.]

Figure 3.40: Phase noise measurement of a 2 GHz clock using an HP 8565C analyzer with (red) and without (blue) the fine TDC. The reference clock is a 20 MHz temperature-controlled oscillator.

[Figure 3.41 plot: histogram of TIE jitter, -4 ps to 4 ps, versus number of occurrences, 0 to 70.]

Figure 3.41: The random jitter measurement of the output clock when the fine stochastic TDC is activated.

A phase noise measurement with the coarse TDC calibrated, the fine stochastic TDC activated, and the IIR filter enabled is shown in Fig. 3.43 for a 1.995 GHz output frequency. The in-band phase noise is -104 dBc/Hz, while the jitter integrated from 1 kHz to 100 MHz is 233 fs rms (0.167 degree).

Fractional operation was also confirmed at several other frequencies. For example, with a reference frequency fref = 21 MHz, the synthesized channel (95 + 67/256)·fref = 95.26171875·fref = 2.000496094 GHz is shown in Fig. 3.44(a). In another example, with fref = 20 MHz, the synthesized channel (109 + 64/256)·fref = 109.25·fref = 2.185 GHz is shown in Fig. 3.44(b). Both results show a frequency error of less than 1 ppm. Moreover, with a 20 MHz reference and a 1.4 MHz loop bandwidth, jitter was measured at four fractional channels between 120 and 121; all exhibited random jitter within 20% of that observed at the integer channel 120.
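The channel arithmetic above can be checked with a short sketch. It assumes the 8-bit fractional resolution implied by the /256 notation; the FCW word in the chip may well be wider. The `ppm_error` helper simply expresses a measured-versus-expected frequency deviation, and both function names are illustrative.

```python
def channel_frequency_hz(f_ref_hz: float, n_int: int, frac_num: int,
                         frac_bits: int = 8) -> float:
    """Output frequency for FCW = n_int + frac_num / 2**frac_bits (assumed format)."""
    if not 0 <= frac_num < (1 << frac_bits):
        raise ValueError("fractional numerator out of range")
    fcw = n_int + frac_num / (1 << frac_bits)
    return fcw * f_ref_hz


def ppm_error(f_measured_hz: float, f_expected_hz: float) -> float:
    """Relative frequency error in parts per million."""
    return (f_measured_hz - f_expected_hz) / f_expected_hz * 1e6
```

For example, `channel_frequency_hz(21e6, 95, 67)` gives 2,000,496,093.75 Hz and `channel_frequency_hz(20e6, 109, 64)` gives exactly 2.185 GHz; a 1 kHz deviation at 2.185 GHz corresponds to about 0.46 ppm.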



Figure 3.42: DPLL output phase noise spectrum at 2.4 GHz captured by an Agilent E4448A spectrum analyzer. The in-band noise is -107 dBc/Hz, while the integrated jitter is 500 fs rms (0.43 degree) from 1 kHz to 100 MHz for a loop bandwidth of 1.42 MHz.

Figure 3.43: DPLL output phase noise spectrum at 1.995 GHz captured by an Agilent E4448A spectrum analyzer. The in-band noise is -104 dBc/Hz, while the integrated jitter is 233 fs rms from 1 kHz to 100 MHz for a loop bandwidth of 700 kHz. An IIR filter was used to attenuate high-frequency spurs.



(a)

(b)

Figure 3.44: Fractional synthesis measurements using an HP 8565C analyzer with (a) a 21 MHz input reference at channel 95 + 67/256 and (b) a 20 MHz input reference at channel 109 + 64/256, exhibiting less than 1 ppm frequency error.



Table 3.1 summarizes state-of-the-art TDC architectures and their performance, while Table 3.2 compares state-of-the-art published digital frequency synthesizers. For a fair comparison of DPLLs with different reference and carrier frequencies, all in-band phase noise values are normalized using Banerjee's figure of merit (BFM) [35], defined as

$$\mathrm{BFM} = \mathrm{PN} - 20\log_{10}(f_{out}) + 10\log_{10}(f_{ref}),$$

where BFM is the normalized phase noise and PN is the measured in-band phase noise at low offset frequencies. Note that the BFM accounts for neither the dissipated power nor the loop bandwidth. The DPLLs presented in [36][37][22] have large power consumption, mainly due to DCOs that achieve very low out-of-band phase noise. To account for this, Gao's figure of merit (GFM) [38] is used to compare DPLL performance based on the total integrated random jitter and the power dissipation:

$$\mathrm{GFM} = 10\log_{10}\left[\left(\frac{\sigma_{t,\mathrm{PLL}}}{1\,\mathrm{s}}\right)^{2}\cdot\frac{P_{\mathrm{PLL}}}{1\,\mathrm{mW}}\right].$$

The presented coarse-fine DPLL consumes 15.2 mW at 2.4 GHz. The DCO and CML divide-by-two consume 7.8 mW, the coarse TDC consumes 1.4 mW, the fine TDC consumes 3 mW, and the remaining standard-cell digital logic dissipates 3 mW. Accordingly, the total power consumption of the proposed coarse-fine TDC is 4.4 mW from a 1.2 V supply. This is quite low compared to other published fine-resolution TDC architectures. For example, the coarse-fine TDC based on a time amplifier in [25] consumes 70 mW, and the GRO in [12] consumes 2.2 to 21 mW depending upon the phase error.

Note also that the power consumption reported here is for a 130 nm CMOS technology and can be lowered significantly in newer processes. The supply voltage drops from 1.2 V in 130 nm to 0.9 V in 28 nm, while the parasitic capacitance shrinks by almost a factor of 5 due to scaling over four and a half nodes [34]. Recall that the dynamic power of a digital circuit is given by $P = C V_{DD}^{2} f$. Based on these estimates, the dynamic power consumption in 28 nm is $(0.9/1.2)^{2}/5 = 0.1125$ times that in the 130 nm process. The LC-DCO and CML logic are affected only by the supply reduction. Accordingly, if the presented DPLL were ported to 28 nm technology, the estimated power consumption would be $(0.9/1.2)\cdot 7.8 + 0.1125\cdot(1.4+3+3) = 6.68$ mW.
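The two figures of merit and the 28 nm power estimate are simple arithmetic, so they can be reproduced with a few helper functions. This is a sketch: the function names are illustrative, and the scaling model simply encodes the assumptions stated in the text (analog/CML power scales with VDD only, digital power scales with C·VDD²·f at a fixed clock frequency).

```python
import math


def banerjee_fom(pn_inband_dbc_hz: float, f_out_hz: float, f_ref_hz: float) -> float:
    """BFM = PN - 20 log10(f_out) + 10 log10(f_ref), in dBc/Hz."""
    return pn_inband_dbc_hz - 20 * math.log10(f_out_hz) + 10 * math.log10(f_ref_hz)


def gao_fom(sigma_t_s: float, power_mw: float) -> float:
    """GFM = 10 log10[(sigma_t / 1 s)^2 * (P / 1 mW)], in dB."""
    return 10 * math.log10(sigma_t_s ** 2 * power_mw)


def scaled_power_mw(p_analog_mw: float, p_digital_mw: float,
                    vdd_old: float = 1.2, vdd_new: float = 0.9,
                    cap_ratio: float = 5.0) -> float:
    """Port-to-new-node estimate under the assumptions stated in the text.

    Analog/CML blocks: power scales with VDD only (bias currents unchanged).
    Digital blocks: P = C * VDD^2 * f with f fixed and C reduced by cap_ratio.
    """
    k_digital = (vdd_new / vdd_old) ** 2 / cap_ratio
    return (vdd_new / vdd_old) * p_analog_mw + k_digital * p_digital_mw
```

With the values in the text, `banerjee_fom(-104, 1.995e9, 20e6)` is about -217 dBc/Hz, `gao_fom(574e-15, 15.2)` is about -233.0 dB, and `scaled_power_mw(7.8, 1.4 + 3 + 3)` returns 6.6825 mW. The Weltin-Wu entry of Table 3.2 (2173 fs, 9.5 mW) evaluates to about -223.5 dB, a negative value like the other entries.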

Further reductions in power consumption are expected, as 14 nm processes have been available from Intel and Samsung since 2014, and TSMC will release a 10 nm technology for volume production by the end of 2016 [39].


3.8. Measurement Results

Table 3.1: State-of-the-art fine-resolution TDC

| Reference | Interpolative Line [23] | Cyclic Vernier [19] | Periodic Vernier [11] | Time Amp [25] | GRO [12] | 2D Vernier [21] | This work |
|---|---|---|---|---|---|---|---|
| Number of bits | 7 | 12 | 6 | 9* | 11 | 7 | 7⋆ |
| Effective resolution [ps] | 4.7 | 8 | 12 | 1.25 | 6 | 4.8 | 4 |
| INL [LSB] | 2.4 | N/A | 1.15 | 2 | N/A | 3.3 | N/A |
| DNL [LSB] | 1.2 | N/A | 1 | 0.8 | N/A | < 1 | N/A |
| Bandwidth [MHz] | 180 | 15 | 40 | 10 | 1 | 50 | 20 |
| Area [mm²] | 0.02 | 0.26 | 0.04 | 0.06 | 0.04 | 0.08 | 0.028 |
| Supply voltage [V] | 1.2 | 1.5 | 1.2 | 1 | 1.5 | 1.2 | 1.2 |
| Power [mW] | 3.6 | 7.5 | 2.5 | 70 | 2.2 to 21 | 1.7 | 4.4 |
| Technology [nm] | 90 | 130 | 120 | 90 | 130 | 65 | 130 |

* Coarse-fine TDC with a 5-bit coarse TDC and a 6-bit fine TDC; the effective number of bits is 9.

⋆ Coarse-fine TDC with a 5-bit coarse TDC and a 5-bit fine TDC; the effective number of bits is 7.



Table 3.2: Comparison Among Published Digital Synthesizers.

| | Ravi VLSIC'10 [20] | Tonietto ESSCIRC'06 [11] | Weltin-Wu ISSCC'08 [40] | Hsu JSSC'08 [36] | Wang JSSC'09 [37] | Tokairin JSSC'10 [41] | Lee CICC'09 [42] | Temporiti JSSC'10 [43] | Vercesi JSSC'12 [22] | This work |
|---|---|---|---|---|---|---|---|---|---|---|
| Reference frequency [MHz] | 40 (2x) | 40 | 25 | 50 | 26 | 40 | 60 | 35 | 26 | 20 |
| Output frequency [GHz] | 5-6 | 2 | 3 | 3.67 | 3.6 | 2.5 | 3.96 | 3.5 | 1.8 | 1.99-2.5 |
| Bandwidth [MHz] | 0.5 | 3 | 1.2 | 0.5 | 0.10 | 0.5 | 0.3 | 3 | 1 | 0.7-1.42 |
| In-band phase noise [dBc/Hz] | -94 | -102 | -100 | -106 | -95 | -105 | -96 | -101 | -108 | -104 to -107 |
| Normalized phase noise [dBc/Hz] | -211 | -212 | -216 | -220 | -212 | -217 | -210 | -216 | -219 | -217 to -222 |
| In-band spurs [dBc] | -60 | -46 to -42 | -45 | -42 | -75 | N/A | -38 | -58 | -50 | -34 to -53.3* |
| Out-of-band phase noise [dBc/Hz] @ offset | -140 @ 30 MHz | -143 @ 20 MHz | N/A | -155 @ 20 MHz | -155 @ 40 MHz | -135 @ 10 MHz | -140 @ 20 MHz | -123 @ 3 MHz | -160 @ 20 MHz | -136 @ 30 MHz |
| RMS jitter [fs] | 597 | 1672 | 2173⋆ | 204 | 364 | 591 | 682 | 1166⋆ | 138 | 574⋆ |
| Power [mW] | 50 | 15 | 9.5 | 46.7 | 60 | 9.7 | 9.6 | 9 | 41.6 | 15.2 |
| Gao's FoM [dB] | -227.5 | -223.8 | -223.5 | -237.1 | -231.0 | -234.7 | -233.5 | -229.1 | -241 | -232.9 |
| Active area [mm²] | 1.2 | 0.8 | 0.4 | 0.95 | 0.85 | 0.37 | 0.34 | 0.44 | 0.7 | 0.43 |
| Technology [nm] | 90 | 130 | 65 | 130 | 130 | 90 | 90 | 65 | 55 | 130 |

⋆ Estimated based on the given phase noise.

* Out-of-bandwidth spurs reduced from -54 dBc to -70.55 dBc after calibration and high-frequency filtering.



3.9 Conclusion

In summary, DPLL performance still needs improvement, especially with respect to spurs and phase noise in wide-bandwidth applications. Specifically, TDC quantization noise and nonlinearity are major contributors to in-band phase noise and spurs, respectively. Improving the TDC resolution (quantization step) from 40 ps to 4 ps can ideally improve the in-band phase noise by 20 dB. However, achieving 4 ps resolution in 130 nm CMOS is not an easy task. In addition, enhancing the linearity of the TDC reduces the folding of high-frequency phase noise to low offset frequencies and lowers the spurious tone levels. Accordingly, efficient on-chip calibration algorithms are essential.
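The 20 dB figure follows because TDC-limited in-band phase noise scales with the square of the TDC step. A commonly used estimate (Staszewski's model) is L = (2π)²/12 · (Δt_tdc/T_DCO)² / f_ref; whether it is identical to Eq. 2.13 of this thesis is an assumption here, and the function name is illustrative.

```python
import math


def tdc_inband_pn_dbc_hz(t_res_s: float, f_out_hz: float, f_ref_hz: float) -> float:
    """Idealized TDC-limited in-band phase noise in dBc/Hz.

    Models the TDC as a uniform quantizer of step t_res_s (variance step^2/12),
    expressed in radians of the DCO period and spread over f_ref.
    """
    t_dco = 1.0 / f_out_hz
    linear = (2 * math.pi) ** 2 / 12 * (t_res_s / t_dco) ** 2 / f_ref_hz
    return 10 * math.log10(linear)
```

For a 4 ps step at 2 GHz with a 20 MHz reference this idealized model gives roughly -110 dBc/Hz, and a tenfold coarser step (40 ps) raises it by exactly 20 dB.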

A DPLL with a novel calibrated coarse-fine TDC was presented that is suitable for modern wireless and wireline standards. The proposed DPLL achieves -104 to -107 dBc/Hz in-band phase noise, which is equivalent to a 4 ps TDC resolution. The DPLL can lock to any frequency from 1.99 to 2.5 GHz using a 20 MHz reference, with a loop bandwidth of about 700 kHz to 1.42 MHz. The entire DPLL consumes 15.2 mW from a 1.2 V supply in IBM's 0.13 µm bulk CMOS technology. The random jitter integrated from 1 kHz to 100 MHz is 0.167 degree for a 1.995 GHz carrier with 700 kHz bandwidth, 0.153 degree for a 2 GHz carrier with 700 kHz bandwidth, 0.502 degree for a 2 GHz carrier with 1.42 MHz bandwidth, and 0.432 degree for a 2.4 GHz carrier with 1.42 MHz bandwidth.


Chapter 4

Linearization of Digital PLL

4.1 Introduction

The Digital PLL (DPLL) analysis in the previous chapters relies on a linear model, which fails to explain and predict nonlinear behavior such as frequency acquisition and limit cycles. Unlike that of a linear system, the steady-state response of a nonlinear system depends on the initial conditions. Hence, it is very important to examine the validity of using a linear model to analyze a DPLL with many sources of nonlinearity, including quantization and saturation.

Recall that a DPLL employs a time-to-digital converter (TDC) and a digitally-controlled oscillator (DCO), as shown in Fig. 4.1. There are many sources of quantization noise in DPLLs, and they affect the purity and settling behavior of the output clock differently. The DCO quantization mainly manifests as spurious tones outside the loop bandwidth, which are attenuated by the loop dynamics. The TDC quantization noise, however, limits the in-band phase noise and can generate in-band spurious tones related either to the fractional value of the frequency control word (FCW) or to the TDC nonlinearity [8]. Beyond in-band spurs, TDC quantization error can also make the DPLL bandwidth and settling behavior unpredictable, since both then depend on initial conditions such as the initial phase error.

The TDC measures the phase difference between the output clock, fout, and the reference clock, fref. The phase difference is quantized with a limited resolution, ∆ttdc, resulting in a quantization error, tQ, as shown in Fig. 4.2. The estimated phase difference is averaged, normalized to the instantaneous fout period, and expressed as a fixed-point number.
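To make the quantization of Fig. 4.2 concrete, the sketch below samples an ideal 50%-duty fout with a delay line at the fref edge and recovers a quantized tr/Tout from the code. In tap order, the fout rising edge appears as a 1-to-0 step and the falling edge as a 0-to-1 step; twice their separation gives the period in taps, in the spirit of the 0-1 & 1-0 detector of Fig. 4.11. The ideal square wave, uniform tap delays, and function names are illustrative assumptions; a real line has mismatched delays and metastability.

```python
def sample_delay_line(t_r, t_out, delta, n_taps):
    """Pseudo-thermometer code: Q[i] = fout sampled at (t_ref - i * delta).

    t_r is the time from the last fout rising edge to the fref edge; fout is
    modeled as an ideal 50 %-duty square wave of period t_out (same units).
    """
    return [1 if ((t_r - i * delta) % t_out) < t_out / 2 else 0
            for i in range(n_taps)]


def normalized_phase(q):
    """Quantized t_r / T_out from the code (delay line must span > 1 period)."""
    # Rising edge of fout: later samples (lower index) high, earlier low -> 1,0.
    rise = next(i for i in range(len(q) - 1) if q[i] == 1 and q[i + 1] == 0)
    # Falling edge of fout shows up as 0,1 in tap order.
    fall = next(i for i in range(len(q) - 1) if q[i] == 0 and q[i + 1] == 1)
    period_in_taps = 2 * abs(fall - rise)      # estimate of T_out / delta_t_tdc
    return rise / period_in_taps               # floor(t_r / delta) * delta / T_out
```

With a 500 ps period, a 25 ps step, and tr = 130 ps, the rising edge is found at tap 5 and the period at 20 taps, giving 0.25 instead of the true 0.26; the 0.01 difference is the quantization error tQ expressed in UI.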

If the DPLL is operating as a fractional-N synthesizer, the quantization error introduced by the TDC, tQ, may be approximated as white noise [12]. In other words, the TDC quantization noise is scrambled over time because the phase relationship between fout and fref changes continuously, as shown in Fig. 4.3(a). This scrambling lowers the chance of limit-cycle behavior due to TDC nonlinearities and makes a linear analysis of the DPLL more valid [44].
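The scrambling argument can be illustrated with an open-loop sketch that ignores the loop's corrections: an ideal DCO running at exactly FCW times fref advances its sampled phase by FCW UI per reference cycle, and the TDC quantizes the fractional part. The FCW values are those of Fig. 4.3, the 32 ps step at 2.4 GHz is taken from Section 4.2, and the initial phase is a hypothetical value.

```python
import math


def tdc_quantization_errors(fcw, n_cycles, t_res_ui, phase0_ui=0.0):
    """Open-loop TDC quantization error (in UI) at each fref edge.

    Assumes an ideal DCO at exactly FCW * fref, so the sampled DCO phase
    advances by FCW UI per reference cycle; the TDC step is t_res_ui.
    """
    errors = []
    for k in range(n_cycles):
        phase = (phase0_ui + k * fcw) % 1.0            # fractional phase in UI
        errors.append(phase - math.floor(phase / t_res_ui) * t_res_ui)
    return errors
```

With FCW = 120.01709 the quantization error sweeps across most of a TDC step, whereas with FCW = 120 it stays frozen at a single value determined by the initial phase, which is the unscrambled integer-mode case discussed below.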

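[Figure 4.1 block diagram. Labels: FCW, reference phase PHR, feedback phase PHF, phase error PHE, TDC with coarse and fine paths, Digital Loop Filter, DCO, input fref(t), output fout(t).]

Figure 4.1: A digital PLL architecture for integer and fractional mode synthesis.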

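[Figure 4.2: left, a chain of buffers D0 to Dn delaying fout, with flip-flops clocked by fref producing Q0 to Qn; right, a timing diagram of the delayed clocks D0 to D6 sampled by fref, showing the code Q[n], the resolution ∆ttdc, the quantization error tQ, and tr.]

Figure 4.2: Buffer delay line implementation of a TDC: simplified schematic view (left); timing diagram (right). The raw Q[i] is a pseudo-thermometer code to be converted into a normalized binary word representing the fractional phase error.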


[Figure 4.3 plots: TDC output (scaled to DCO period), -0.5 to 0.5, versus time (µs), 20 to 50, for (a) fractional mode, FCW = 120.01709, and (b) integer-N mode, FCW = 120.]

Figure 4.3: TDC output during frequency acquisition (below 30 µs) and phase acquisition for different DPLL operation modes.

However, the impact of TDC quantization error is much more pronounced during integer-mode operation [45]. In integer mode, fout aligns itself with fref such that their phase difference is small and, once in lock, does not span most of the TDC range, as shown in Fig. 4.3(b). Accordingly, there is no (or at least not enough) scrambling of the TDC quantization noise, and it becomes concentrated at low frequencies. Hence, it cannot be adequately filtered by the DPLL's dynamics, which causes strong low-frequency spurs [44][46]. Similar behavior can appear if the phase difference has a periodic pattern due to a simple fractional part of the FCW, such as 1/2 or 1/4. In this case, the limited resolution of the TDC has an effect similar to the classic dead-zone behavior observed in analog phase detectors. The dead zone periodically opens the loop and lets the phase drift, which shows up as deterministic jitter in the output clock [14].

This chapter elaborates on the dead-zone behavior of a DPLL caused by the finite TDC resolution, focusing on integer-N operation. It also presents a simple, purely digital, programmable solution to the dead-zone problem that achieves consistently low in-band phase noise regardless of the initial condition while maintaining a high loop bandwidth. This solution is scalable and is not affected by process, voltage, and temperature (PVT) variations, and it ensures phase locking with minimal phase offset.


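[Figure 4.4: staircase TDC output versus phase error (ps) with step ∆ttdc; an output phase on a flat part of a step gives dead-zone behavior, while an output phase at a step transition gives bang-bang behavior.]

Figure 4.4: The DPLL nonlinearity due to the quantized TDC response.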

4.2 TDC Dead-Zone Behavior

Recall that a typical TDC is comprised of a chain of buffers with a resolution that ranges

from approximately 32-to-48 ps over PVT in a 0.13-µm CMOS process to 10-to-12 ps

in a 28-nm process [44]. The output clock, fout, propagates through this chain such

that many delayed versions of fout are sampled at the rising edge of the reference clock,

fref , as shown in Fig. 4.2. The TDC reads out the normalized time difference, tr/∆ttdc,

between the rising edge of fref and the previous rising edge of fout. The DPLL reacts to

the time-varying values of the TDC readout to keep the DPLL locked [47].

Due to the TDC's staircase nonlinearity, different types of nonlinear behavior are observable depending upon the relationship between the reference phase and the DCO output phase in lock, as illustrated in Fig. 4.4. The DPLL tries to force the TDC output to track the reference phase provided by the digital phase accumulator on the left side of Fig. 4.1. In integer-N mode, the fractional part of the FCW is zero, while the accumulated reference phase, PHR,¹ might have an arbitrary fractional value depending upon the accumulator's initial condition. The phase difference estimated by the TDC, ε, tracks the fractional part of PHR. Since the fractional part of PHR depends on the initial conditions, the TDC output also depends on them. Furthermore, this dependency leads to phase lock with an arbitrary offset that depends on the initial condition.

¹ The accumulated reference phase in Fig. 4.1 is denoted as PHR or φref.



[Figure 4.5(a): explanation diagram showing the DCO edge sampled at the fref edge by taps Q[i] and Q[i+1], the initial position of the DCO phase, the drifting DCO edge, the dead-zone region, and the TDC output PHE taking the values +1, 0, -1. Figure 4.5(b): density distribution versus phase jitter (ps), -20 to 20; annotation: deterministic jitter is equal to the TDC resolution (32 ps).]

(a) Explanation diagram

(b) PDF of the phase jitter from a behavioral simulation

Figure 4.5: Dead-zone behavior of an integer-N DPLL with a TDC resolution of 32 ps.

[Figure 4.6(a): explanation diagram showing the DCO edges sampled at the fref edge by taps Q[i] and Q[i+1], the initial position of the DCO phase, the drifting DCO edge, the dead-zone region, and the TDC output PHE taking the values +1, 0, -1. Figure 4.6(b): density distribution versus phase jitter (ps), -8 to 8.]

(a) Explanation diagram

(b) PDF of the phase jitter from a behavioral simulation

Figure 4.6: Bang-bang behavior of an integer-N DPLL.



When the phase error coincides with a flat part of the TDC staircase, the DPLL tries to lock at a phase where the TDC has low effective gain and, hence, the DPLL has a low loop bandwidth. In this case, the phase error initially lies somewhere inside a TDC step, within the gray dead-zone region, as shown in Fig. 4.5(a). As long as it stays within this region, the TDC produces the same output for a long time,² so no correction is applied to the DCO phase. It takes a long time for the fout edges to drift toward the fref edges, so the DPLL behaves like an open loop during this phase drift within the dead zone.

Fig. 4.5 shows the worst possible dead-zone behavior, which occurs when fout locks to fref with a small frequency offset. Because the TDC cannot resolve phase errors smaller than one step during integer-N mode, fout could drift in the wrong direction for a while³ before an error is detected and corrected by forcing fout to drift in the opposite direction, as shown in Fig. 4.5(a). That is, fout needs to span one whole TDC step back and forth before a phase and frequency error is detected and a correction is applied. Therefore, the DPLL locks fout to fref with a high deterministic jitter⁴ equal to the TDC resolution. The probability density function (PDF) in this case is the convolution of the deterministic jitter and random jitter PDFs, as shown in Fig. 4.5(b). Consequently, a DPLL with dead-zone behavior exhibits low-frequency spurs, as shown in Fig. 4.7(a), similar to analog PLLs with a dead zone.
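The convolution in Fig. 4.5(b) can be summarized by its RMS value. Assuming the phase drifts at a roughly constant speed back and forth across one TDC step, the deterministic part is approximately uniform with a peak-to-peak width equal to the TDC resolution; since it is independent of the Gaussian random jitter, the variances add. The sketch below checks this analytically and by Monte Carlo; the 2 ps random-jitter value and the function names are hypothetical.

```python
import math
import random


def combined_rms_jitter(dj_pp: float, rj_sigma: float) -> float:
    """RMS of uniform deterministic jitter (width dj_pp) plus Gaussian jitter."""
    return math.sqrt(dj_pp ** 2 / 12 + rj_sigma ** 2)


def monte_carlo_rms(dj_pp: float, rj_sigma: float, n: int = 100_000,
                    seed: int = 1) -> float:
    """Sample the convolved PDF directly and return its RMS value."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n):
        x = rng.uniform(-dj_pp / 2, dj_pp / 2) + rng.gauss(0.0, rj_sigma)
        acc += x * x
    return math.sqrt(acc / n)
```

For a 32 ps step and 2 ps of random jitter, both methods give about 9.45 ps RMS, so in this dead-zone case the total is dominated by the TDC step rather than by the oscillator's random jitter.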

Note that the TDC resolution, together with the speed at which fout drifts toward fref, determines the frequency and amplitude of those spurs; the drift speed is directly related to the DCO jitter and quantization noise, i.e., the DCO frequency resolution. When they encounter dead-zone operation, a jittery coarse DCO produces higher-frequency spurs with lower amplitude than a low-jitter fine DCO. Furthermore, a low-jitter fine DCO has a higher chance of experiencing dead-zone operation than a high-jitter coarse DCO, since high jitter helps fout escape the dead-zone region faster during phase locking.
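A first-order estimate of the dead-zone cycle frequency follows from the drift argument. Assume fout sits at a constant frequency offset Δf while inside the dead zone, so its timing error drifts at Δf/fout seconds per second, and that the offset reverses sign each time the TDC output changes; one cycle is then a sweep across the TDC step and back. This is a simplified sketch under those assumptions (the function name is illustrative), and it does not model the spur amplitude.

```python
def dead_zone_cycle_hz(t_tdc_s: float, f_out_hz: float, delta_f_hz: float) -> float:
    """Estimated frequency of the dead-zone limit cycle.

    drift_rate: timing error accumulated per second for a constant offset.
    sweep_time: time to traverse one TDC step; one cycle is two sweeps.
    """
    drift_rate = delta_f_hz / f_out_hz
    sweep_time = t_tdc_s / drift_rate
    return 1.0 / (2.0 * sweep_time)
```

Using the 11 kHz fractional DCO step of Chapter 3 as an example offset at 2.4 GHz with a 32 ps TDC step, the estimate is roughly 72 kHz; a larger effective offset, as with a coarser or more jittery DCO, raises the cycle frequency, consistent with the trend described above.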

On the other hand, if the rising edge of the output clock, fout, coincides with a transition in the TDC staircase, the TDC operates like a bang-bang phase detector, as illustrated in Fig. 4.6(a): a TDC bin keeps toggling between 0 and 1, indicating a late or early phase without quantifying the phase difference. This happens when the initial phase difference between fout and fref is small compared to the DCO time resolution and jitter, so that the drift of the fout edges over time is quickly detected and corrected, as shown in Fig. 4.6(a). The TDC stays active, bouncing back and forth at high frequency, so its output is filtered by the loop dynamics. In this case, the PDF of the phase jitter follows a Gaussian distribution, as shown in Fig. 4.6(b).

² Depending on the length of the dead-zone region, which is directly related to the TDC resolution.

³ In case fout lies at the far edge of the dead-zone region.

⁴ Or fout is frequency modulated due to the dead-zone region and the high bandwidth of the DPLL.

[Figure 4.7 plots: spectrum (dB), -100 to -20, with x-axis ticks from 10^4 to 10^7, for panels (a) and (b).]

(a) Appearance of low-frequency spurs during dead-zone operation

(b) Noise-shaped spectrum of the TDC output during bang-bang operation

Figure 4.7: Spectrum of the normalized TDC output.

During the bang-bang mode of operation, the DPLL will exhibit a loop bandwidth

that depends upon the instantaneous phase error as well as other noise sources in the loop

[48]. It also has the potential for limit-cycle behavior, again resulting in spurs. Based

on a non-linear analysis of bang-bang PLLs, the smaller the jitter on fref and fout, the

higher the loop gain and bandwidth [47].
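The inverse dependence of loop gain on jitter can be seen from the standard linearization of a bang-bang detector. If a ±1 detector sees a static phase error e plus Gaussian jitter of standard deviation σ, its average output is erf(e/(σ√2)), whose slope at zero is √(2/π)/σ. This textbook model is offered only as an illustration of the trend reported in [47], not as the thesis's own analysis; phase units are arbitrary and the function names are illustrative.

```python
import math


def bbpd_average_output(phase_error: float, sigma: float) -> float:
    """Mean output of a +/-1 bang-bang detector with Gaussian jitter sigma."""
    return math.erf(phase_error / (sigma * math.sqrt(2)))


def bbpd_small_signal_gain(sigma: float) -> float:
    """Slope of the averaged characteristic at zero phase error."""
    return math.sqrt(2 / math.pi) / sigma
```

Halving the jitter doubles the effective detector gain, and hence the loop gain and bandwidth, which is why a cleaner reference and DCO make bang-bang operation more aggressive.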

The more serious of these two behaviors is the dead-zone behavior. It increases the spread of the phase jitter and reduces the loop bandwidth because the effective loop gain drops. The spread of the phase jitter is determined by the DCO jitter performance and frequency resolution and, more importantly, by the TDC resolution.

Fig. 4.8 shows the simulated phase noise of the same integer-mode DPLL for 60 different initial conditions, illustrating how very different loop bandwidths can result. The simulation environment employs a high-level model of the DCO phase noise and reference noise, as presented in [49]. The in-band phase noise varies from -60 dBc/Hz to -100 dBc/Hz, and the loop bandwidth changes by an order of magnitude.

[Figure 4.8 plot: phase noise (dBc/Hz), -140 to -60, with x-axis ticks from 10^5 to 10^8, overlaid for the 60 simulations.]

Figure 4.8: Phase noise of the same output clock for 60 different initial conditions for an uncompensated integer-mode DPLL.

4.2.1 Zero-Phase Restart (ZPR) Mechanism

The DPLL in Fig. 4.1 has a coarse control loop that works as a frequency-locked loop (FLL) to set the DCO frequency close to the desired frequency range. It also has a fine control loop to achieve accurate frequency and phase locking. To avoid discontinuities in the DCO control word when gear shifting from coarse to fine operation, a zero-phase restart (ZPR) mechanism [15] is used to zero out the phase detector output and avoid disturbing the DCO after frequency locking. The ZPR resets the reference phase accumulator, PHR, to an arbitrary value that depends on the initial frequency error, as shown in Fig. 4.9.

To avoid the worst-case TDC dead-zone operation (in which the deterministic jitter equals the TDC resolution) and to reduce the dependency of the DPLL response on the initial frequency error during integer-N mode, the fractional part of the reference phase, PHRfrac, can be set to zero. However, the ZPR still modifies the integer part of PHR after gear shifting from coarse to fine control mode.

[Figure 4.9 block diagram. Labels: PHR, N, PHF_int, PHF_frac, enFine, multiplexer inputs 0 and 1, adders.]

Figure 4.9: Zero-phase restart (ZPR) triggered during the transition from coarse to fine locking mode. ZPR ensures a smooth transition from coarse to fine locking without disrupting the DCO [15].
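A behavioral sketch of the ZPR choice discussed above is shown below. It assumes the phase error is formed as PHE = PHR - PHF and models the restart as loading PHR from PHF, either completely (so PHE starts at zero and PHRfrac inherits an arbitrary value) or with the fractional part forced to zero. The class, method, and sign convention are hypothetical; the chip's actual register-level behavior is given only by Fig. 4.9.

```python
import math
from dataclasses import dataclass


@dataclass
class ReferencePhaseAccumulator:
    """Hypothetical model of the PHR accumulator with a zero-phase restart."""
    fcw: float
    phr: float = 0.0

    def step(self) -> float:
        """Advance PHR by one reference cycle."""
        self.phr += self.fcw
        return self.phr

    def zero_phase_restart(self, phf: float, zero_fraction: bool = False) -> None:
        """Load PHR from the feedback phase PHF at the coarse-to-fine switch.

        zero_fraction=False: copy PHF entirely, so PHE = 0 at the switch and
        PHR_frac takes an arbitrary value set by the initial frequency error.
        zero_fraction=True: copy only the integer part, forcing PHR_frac = 0.
        """
        self.phr = math.floor(phf) if zero_fraction else phf


def phase_error(phr: float, phf: float) -> float:
    return phr - phf  # assumed sign convention
```

In integer mode the FCW has no fractional part, so once PHRfrac is zeroed it stays zero on every subsequent `step()`, which removes the dependence of the lock point on the initial frequency error.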

Even with the worst-case TDC dead-zone operation avoided by zeroing PHRfrac, bang-bang-like operation can still result in an inconsistent loop bandwidth and potentially spurs due to limit-cycle behavior. To alleviate the inconsistent behavior of an integer-mode DPLL, the fref edges can be randomized with respect to the fout edges so that the TDC is kept busy enough for the quantization noise to be scrambled over time.

4.3 Noise-Shaped Dithering

Recently published work [47] demonstrated an analog approach to avoiding the dead-zone behavior in low-bandwidth DPLLs by randomizing the phase of the reference clock, fref. In [47], the reference buffer is modified by adding 16 bias elements controlled by a dithering sequence from a noise-shaped ∆Σ modulator, as shown in Fig. 4.10. This requires a custom modification of the reference buffer and accurate sizing of the delay circuits. Furthermore, due to its analog nature, the effectiveness of this approach is affected by PVT variations, so calibration might be needed, and mismatches between the delay elements could reduce its usefulness. Moreover, this approach allows fout to lock to fref with an arbitrary phase offset. Another DPLL implementation, based on a GRO-TDC [12], intrinsically scrambles the quantization noise with first-order noise shaping. However, the GRO-TDC design is complex and power hungry, and a small dead zone was still measured in some special cases.

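[Figure 4.10 block diagram. Labels: crystal oscillator inputs XO+ and XO-, programmable delay buffer, ∆Σ modulator supplying the dithering code, TDC with inputs fref and fout.]

Figure 4.10: Dithering the reference clock by using a ∆Σ modulator to control the programmable delay of an input clock buffer.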



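[Figure 4.11 block diagram. Labels: TDC with inputs Fref and Fout and output Q[0:M]; 0-1 & 1-0 detector; outputs tr/∆ttdc and tf/∆ttdc; a ×2 block; 1/x and x (multiplier) blocks; intermediate signals Tout/∆ttdc and ∆ttdc/Tout; output tr/Tout.]

Figure 4.11: A typical circuit to estimate the phase error of a coarse TDC.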

Alternatively, this work proposes to dither the normalized phase error, ε = 1 − tr/Tout, which is estimated by a typical coarse TDC as shown in Fig. 4.11, using purely digital techniques. By observing the transient behavior of the normalized phase error, ε, as well as its spectrum, I found that ε changes slowly and continually from 0 to 1 during dead-zone operation. The spectrum of ε exhibiting such behavior is shown in Fig. 4.7(a), where a large spur of -33 dBc at 30 kHz offset is evident. Once a random digital offset generated by a 20-bit LFSR is added to ε, as shown in Fig. 4.12(a), the spurs disappear and the in-band spectrum drops to -60 dBc. This random-offset solution ensures consistent but sub-optimal DPLL performance.
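A sketch of the random-offset path of Fig. 4.12(a) is given below. The chip's LFSR polynomial and word formats are not stated, so two assumptions are made: the LFSR is a 20-bit Fibonacci register with feedback from bits 19 and 16 (a maximal-length tap choice from standard tap tables), and the normalized phase is treated as a 20-bit fraction, so LFSR >> 4 spans [0, 1/16) UI. The function names are hypothetical.

```python
def lfsr20(seed: int = 1):
    """20-bit Fibonacci LFSR generator, feedback from bits 19 and 16.

    This tap choice yields a maximal-length sequence of 2**20 - 1 states;
    it is an assumption, since the chip's polynomial is not given.
    """
    state = seed & 0xFFFFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        feedback = ((state >> 19) ^ (state >> 16)) & 1
        state = ((state << 1) | feedback) & 0xFFFFF
        yield state


def lfsr_dithered_phase(tr_over_tout: float, lfsr_state: int,
                        word_bits: int = 20) -> float:
    """0.5 UI offset plus (LFSR >> 4) read as a word_bits-bit fraction (assumed)."""
    offset = (lfsr_state >> 4) / (1 << word_bits)
    return (tr_over_tout + 0.5 + offset) % 1.0
```

Each fref cycle would draw a new state from the generator and add the shifted value to the normalized phase; the uniform offset breaks the slow dead-zone drift but adds broadband (unshaped) noise, which is why the text calls this solution sub-optimal.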

4.3.1 Implemented Noise-Shaped Dithering

Rather than dithering the phase error, ε, with a uniform random offset, it is better to use a 10-bit third-order noise-shaped ∆Σ modulator, which provides the required dithering and linearizes the TDC response while adding minimal in-band noise to the DPLL. A block diagram of the dithering circuit is shown in Fig. 4.12(b). It is clocked by fref and requires only 230 digital gates in a 0.13 µm IBM CMOS process. The offset is chosen to be 0.5 DCO period so that the falling edge of fout is always locked to the rising edge of fref at a phase difference near a step in the TDC response. A small random offset generated by an LFSR is added to the ∆Σ modulator input to ensure acceptable noise shaping and to remove unwanted reference spurs. Finally, recall that Fig. 4.8 shows various DPLL responses for the 60 simulations, each with a different initial condition. After the noise-shaped dithering is applied, the phase noise spectrum and the loop bandwidth become consistent, as shown in Fig. 4.13.
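The text specifies a 10-bit third-order noise-shaped ∆Σ modulator but not its topology. The sketch below assumes a MASH 1-1-1 structure, a common choice for digital ∆Σ modulators, and an illustrative scaling (`step_ui`) from the modulator output to UI. Its output averages to the input divided by 2^10, while the quantization error is shaped by (1 − z⁻¹)³ and pushed to high frequencies, where the loop filters it. The function names and the scaling are hypothetical.

```python
def mash111(x_seq, bits: int = 10):
    """Third-order MASH 1-1-1 delta-sigma modulator (assumed architecture).

    Three cascaded `bits`-wide accumulators; each stage integrates the
    residue of the previous one. The output
        y = c1 + (1 - z^-1) c2 + (1 - z^-1)^2 c3
    averages to x / 2**bits, its quantization error is shaped by
    (1 - z^-1)^3, and it takes integer values in [-3, 4].
    """
    m = 1 << bits
    a1 = a2 = a3 = 0
    c2_prev = 0
    c3_prev1 = c3_prev2 = 0
    for x in x_seq:
        if not 0 <= x < m:
            raise ValueError("input must lie in [0, 2**bits)")
        c1, a1 = divmod(a1 + x, m)      # carry and residue of stage 1
        c2, a2 = divmod(a2 + a1, m)     # stage 2 integrates stage-1 residue
        c3, a3 = divmod(a3 + a2, m)     # stage 3 integrates stage-2 residue
        yield c1 + (c2 - c2_prev) + (c3 - 2 * c3_prev1 + c3_prev2)
        c2_prev = c2
        c3_prev2, c3_prev1 = c3_prev1, c3


def dsm_dithered_phase(tr_over_tout: float, dsm_out: int,
                       step_ui: float = 1 / 64) -> float:
    """0.5 UI offset plus the scaled modulator output (step_ui is assumed)."""
    return (tr_over_tout + 0.5 + dsm_out * step_ui) % 1.0
```

In the spirit of Fig. 4.12(b), a small pseudo-random value could be added to the constant modulator input to break up idle tones; the word alignment of the LFSR >> 5 term is not given in the text, so it is left out of this sketch.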



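[Figure 4.12 block diagrams. (a) tr/Tout, a 0.5 offset, and LFSR >> 4 are summed to give the dithered t̃r/Tout. (b) tr/Tout, a 0.5 offset, and a ∆Σ modulator output are summed to give t̃r/Tout, with LFSR >> 5 applied at the ∆Σ modulator input.]

(a) Random offset; >> indicates a right-shift operation

(b) The implemented noise-shaped random offset in a 0.13 µm IBM (now GF) CMOS process

Figure 4.12: Digital dithering algorithm at the falling edge of the output clock (0.5 UI).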

[Figure 4.13 plot: phase noise (dBc/Hz), -150 to -60, with x-axis ticks from 10^4 to 10^8, overlaid for the 60 simulations.]

Figure 4.13: Phase noise of the same output clock for 60 different initial conditions after applying the noise-shaped random offset and disabling the fractional part of ZPR.



[Figure 4.14 plot: RMS TIE jitter (degree), 0 to 10, versus simulation number, 0 to 60.]

Figure 4.14: Time interval error (TIE) for the 60 simulations with different initial conditions: no dithering, random dithering, and (∗) noise-shaped dithering.

| Mode of operation | RMS jitter avg. (deg) | RMS jitter std. dev. (deg) | PP jitter avg. (deg) | PP jitter std. dev. (deg) |
|---|---|---|---|---|
| (1) Fig. 4.11: Normal operation | 2.86 | 3.39 | 13.54 | 7.79 |
| (2) Fig. 4.11: ZPR off | 1.59 | 0.67 | 12.22 | 5.51 |
| (3) Fig. 4.12(a): ZPR off + LFSR | 1.38 | 0.06 | 10.77 | 0.78 |
| (4) Fig. 4.12(b): ZPR off + DSM | 0.92 | 0.04 | 7.68 | 0.37 |

Table 4.1: Summary of TIE rms and peak-to-peak jitter for the 60 different simulations.

Fig. 4.14 shows the time-interval error (TIE) for those simulations before and after applying the dithering. After applying the proposed noise-shaped offset, the average RMS TIE over the 60 simulations is 0.92 degrees with a standard deviation of only 0.04 degrees. Without dithering, the average RMS TIE is 1.59 degrees with a standard deviation of 0.67 degrees. Table 4.1 summarizes the average RMS TIE and peak-to-peak jitter, together with their standard deviations, for the 60 simulations in four cases: (1) when the ZPR is enabled, (2) when the fractional part of ZPR is disabled without dithering, (3) when the fractional part of ZPR is disabled with random dithering, and (4) when the fractional part of ZPR is disabled with noise-shaped dithering. Note that disabling the fractional part of ZPR is not enough to obtain a consistently low-jitter clock; dithering is crucial to guarantee a consistent response.
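The statistics in Table 4.1 are RMS and peak-to-peak TIE values expressed in degrees of the output period. A minimal sketch of that bookkeeping follows. It assumes that the TIE of the nth output edge is measured against an ideal edge at n·Tout and that the RMS value is taken after removing the mean TIE; the exact post-processing used for the table is not stated, so both choices are assumptions. The helper `seconds_to_degrees` converts jitter in seconds to degrees of one output period.

```python
import math


def tie_stats_deg(edges, period):
    """RMS and peak-to-peak time-interval error (TIE) in degrees of the output period.

    edges[n] is the measured time of the nth output edge; the ideal edge is
    n * period (assumption). The RMS value is computed after removing the
    mean TIE (assumption).
    """
    tie = [(t - n * period) / period * 360.0 for n, t in enumerate(edges)]
    avg = sum(tie) / len(tie)
    rms = math.sqrt(sum((x - avg) ** 2 for x in tie) / len(tie))
    return rms, max(tie) - min(tie)


def seconds_to_degrees(jitter_s, f_out):
    """Convert a jitter value in seconds to degrees of one output period."""
    return jitter_s * f_out * 360.0
```

At a 2 GHz carrier, for instance, `seconds_to_degrees(1.25e-12, 2e9)` gives 0.9 degrees, and 28.3 ps corresponds to about 20.4 degrees.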

4.3.2 Improved Noise-Shaped Dithering

The implemented noise-shaped random dithering around the falling edge of the output clock, shown in Fig. 4.12(b), effectively linearizes the DPLL in integer mode, so that its response is consistent and independent of the initial conditions. However, it cannot be used when simple fractional channels such as 1/2 or 1/4 are needed. An alternative solution that linearizes a DPLL in both integer-N mode and simple fractional mode is shown in Fig. 4.15. The phase error is dithered by randomly adding or subtracting various fractions of the TDC resolution to or from the normalized phase error estimated by the coarse TDC, which ensures over time that the DPLL does not get stuck in a dead-zone.

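Mathematically,

$$\frac{\tilde{t}_r}{T_{out}} = \frac{t_r}{T_{out}} \pm \left\{\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}\right\}\cdot\frac{\Delta t_{tdc}}{T_{out}}$$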
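The dithering of Fig. 4.15 can be sketched as two steps: normalize the coarse-TDC edge positions to the DCO period, then add or subtract a randomly chosen 1/2, 1/4, 1/8, or 1/16 of the normalized TDC resolution. This is a hypothetical behavioral model. It assumes a 50% duty cycle, so that Tout ≈ 2|tf − tr| in TDC LSBs, and it uses Python's `random.Random` in place of the hardware LFSR for brevity; the function names are illustrative.

```python
import random


def normalize_tdc(tr_codes, tf_codes):
    """Normalize TDC edge positions to the DCO period.

    Assumes a 50 % duty cycle, so Tout = 2*|tf - tr| in TDC LSBs.
    Returns (tr/Tout, dt_tdc/Tout).
    """
    tout_codes = 2 * abs(tf_codes - tr_codes)
    inv = 1.0 / tout_codes  # the "1/x" block: dt_tdc / Tout
    return tr_codes * inv, inv


def dither_phase_error(tr_over_tout, lsb_over_tout, rng):
    """Randomly add or subtract 1/2, 1/4, 1/8, or 1/16 of the normalized TDC LSB."""
    shift = rng.choice((1, 2, 3, 4))  # right shift by 1, 2, 3, or 4
    sign = rng.choice((-1, 1))
    return tr_over_tout + sign * lsb_over_tout / (1 << shift)
```

For example, `normalize_tdc(10, 35)` returns tr/Tout = 0.2 and ∆t_tdc/Tout = 0.02, and each call to `dither_phase_error(0.2, 0.02, rng)` then moves the phase error by ±0.01, ±0.005, ±0.0025, or ±0.00125.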

[Figure 4.15 diagram labels: TDC with output Q[0:M], inputs Fref and Fout; 0-1 & 1-0 detector producing tr/∆t_tdc and tf/∆t_tdc; factor 2 and 1/x blocks producing Tout/∆t_tdc and ∆t_tdc/Tout; tr/Tout; ± right shift by 1, 2, 3, or 4; 0.5 offset; LFSR >> 5; ∆Σ modulator; output t̃r/Tout.]

Figure 4.15: A generic proposed circuit to generate a dithered phase error, tr/Tout, which can be applied to integer and simple fractional channel synthesis.



4.4 Measurement Results

The same chip presented in the previous chapter, shown again in Fig. 4.16, is used to demonstrate the dead-zone behavior and the ability of dithering to linearize the loop in integer mode. However, the fine stochastic TDC of the DPLL prototype presented in Chapter 3 is disabled, and the implemented third-order noise-shaped dithering algorithm of Fig. 4.12(b) is enabled instead. Fig. 4.17 shows phase noise measurements for a 2 GHz carrier and a 20 MHz reference, taken with an HP8565C spectrum analyzer. For the same frequency and loop settings, I captured different loop responses simply by resetting the DPLL many times. Dead-zone operation is drawn in blue, while the medium-activity TDC response is shown in green; large in-band spurs at 40 kHz and 80 kHz offset are readily visible. The optimal performance of the integer-mode DPLL after applying the noise-shaped dithering algorithm of Fig. 4.12(b) is drawn in red. After applying the proposed dithering algorithm, the average integrated RMS jitter is 1.25 ps over 10 different initial conditions, with a consistent DPLL loop bandwidth of 700 kHz. Fig. 4.18 shows the jitter histogram measured during dead-zone operation with a Tektronix RSA 6114A real-time spectrum analyzer. The extracted random jitter is only 896 fs RMS, while the deterministic jitter is 28.3 ps peak-to-peak, which is comparable to the coarse TDC resolution.

[Die photo labels: DCO 157,500 µm²; TDC 27,500 µm²; Digital Logic 145,000 µm²; MASH 30,000 µm²; Calibration Logic; 700,000 µm².]

Figure 4.16: Die photo of the DPLL chip in IBM 130 nm bulk process [8]. It is the same chip used to demonstrate the DPLL with a coarse-fine TDC in Chapter 3.



[Figure 4.17 plot: phase noise (dBc/Hz, −140 to −50) versus offset frequency (10^3 to 10^8). Annotations: "Big spurs due to slow TDC response caused by the dead-zone"; "DCO edges drift across full TDC bin"; "Consistent low in-band phase noise after applying dithering"; "DCO edges move slowly around TDC bins".]

Figure 4.17: Phase noise measurement using the HP8565C analyzer, showing different behaviors of the integer-mode DPLL.

Figure 4.18: The measured jitter histogram during dead-zone operation. The extracted random jitter is 896 fs RMS, while the deterministic jitter due to dead-zone operation is 28.3 ps peak-to-peak.


4.5 Conclusion

This chapter presents a detailed explanation of dead-zone behavior in DPLLs operated in integer mode. It elaborates on the effect of dead-zone behavior on the phase noise response and on the PDF of the output clock jitter. Based on that understanding, a simple, purely digital dithering solution is demonstrated that ensures the DPLL avoids its dead-zones. The solution employs a third-order noise-shaped phase offset to linearize the bang-bang behavior, and it ensures phase lock with minimum offset. Extensive simulation results, as well as a DPLL prototype, show consistently low in-band noise regardless of the initial condition while maintaining a high loop bandwidth. In contrast to the reference clock dithering presented in [47], the proposed dithering algorithm is scalable and purely digital, and it does not require modification of the input reference buffer. Furthermore, the proposed algorithm is not affected by impairments such as PVT variation, mismatch, and noise coupling on the power supply.


Chapter 5

Cycle-Slipping and Pull-In Range of Bang-Bang PLLs

5.1 Introduction

PLLs employing a binary "bang-bang" phase detector (BBPD) have recently become more common because they are simpler than PLLs with a linear phase detector, which allows them to operate at the highest possible speed. BBPDs are also preferred over TDCs in integer-mode digital PLLs because of their low power consumption [25].

However, bang-bang PLLs generally suffer from three major drawbacks. First, the loop gain and the associated loop characteristics are hard to define because the highly nonlinear phase detector makes the effective gain dependent on the input jitter [50]. Second, they have a limited pull-in frequency range that usually does not exceed 10% of the reference frequency. Third, there is a trade-off between the pull-in frequency range and the jitter performance of bang-bang PLLs, as explained below [50].

There is extensive literature on the small-signal steady-state behavior of bang-bang PLLs [51][52]; those works focus on the jitter performance and stability of such PLLs. The pull-in process during frequency acquisition of a bang-bang PLL, however, is highly nonlinear and requires a large-signal analysis. An early attempt to quantify the pull-in range of binary bang-bang PLLs was made in [53], where an asymptotic formula was derived for the pull-in range under certain constraints. That analysis is valid only for lag-lead loop filters, and it does not provide an intuitive understanding of the design trade-offs. In [54],


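[Figure 5.1 diagram: clkref samples clkout in the phase detector; the detector output ε drives a proportional path (Kp) and an integral path (Ki with a z^-1 accumulator, state ψ), whose sum ν tunes the DCO that produces clkout.]

(a) Using Binary Bang-Bang Phase Detector (BBPD): detector output ±1.

(b) Using Multi-Phase Bang-Bang Detector (MPBBD): 4 DCO phases and a LUT, detector output in [−4, +4].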

Figure 5.1: A DPLL with a quantized phase detector and without a feedback divider.

a step-by-step description of the oscillator phase and frequency during locking is presented. However, [54] provides a formula only for the frequency lock range before the first cycle slip occurs, which is much smaller than the pull-in range. More recently, [55] provided a closed-form formula for the pull-in frequency range, but it does not give an intuitive understanding of the design trade-offs, and it also underestimates the pull-in range.

Fig. 5.1(a) shows a block diagram of a bang-bang digital PLL. Because of the quantized nature of binary bang-bang PLLs, the instantaneous frequency error is also quantized and equals ±KpKdco, where Kp is the proportional-path gain¹ and Kdco is the DCO gain. Accordingly, the bang-bang frequency step, fbb, as defined in [56], is fbb = 2KpKdco. The corresponding bang-bang jitter is $J_{bb} = f_{bb}/f_{out} = 2K_pK_{dco}/f_{out}$ [UI]. Hence, a small proportional gain, Kp, is often required to minimize the bang-bang jitter. On the other hand, to increase the pull-in frequency range and to speed up locking, a bang-bang PLL requires a larger proportional gain [55], which deteriorates

¹ Some references refer to the proportional gain as the bang-bang gain.


jitter performance [57]. Accordingly, there are contradicting requirements on the PLL proportional gain, and a designer must ensure that a given loop dynamic is sufficient to lock the PLL under some initial frequency error while also achieving acceptable jitter performance.
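The bang-bang step and jitter expressions above can be evaluated directly to see the trade-off. In this sketch Kdco is given in Hz/step, so fbb comes out in Hz; the 130 kHz/step DCO LSB (borrowed from the locking-time example later in this chapter) and the 2 GHz output frequency are illustrative assumptions.

```python
def bang_bang_step_and_jitter(kp, kdco_hz, f_out):
    """Return (f_bb in Hz, J_bb in UI) with f_bb = 2*Kp*Kdco and J_bb = f_bb / f_out."""
    f_bb = 2 * kp * kdco_hz  # bang-bang frequency step
    j_bb = f_bb / f_out      # bang-bang jitter, as defined in the text
    return f_bb, j_bb


if __name__ == "__main__":
    # Larger Kp widens pull-in but raises bang-bang jitter proportionally.
    for kp in (1, 3, 12):
        print(kp, *bang_bang_step_and_jitter(kp, 130e3, 2e9))
```

With Kp = 3, Kdco = 130 kHz/step, and fout = 2 GHz, this gives fbb = 780 kHz and Jbb = 3.9 × 10^-4 UI; both scale linearly with Kp.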

| φe | MPBBD | BBPD |
|---|---|---|
| (−X, −¾X] | −4 | −1 |
| (−¾X, −½X] | −3 | −1 |
| (−½X, −¼X] | −2 | −1 |
| (−¼X, 0) | −1 | −1 |
| [0, ¼X) | +1 | +1 |
| [¼X, ½X) | +2 | +1 |
| [½X, ¾X) | +3 | +1 |
| [¾X, X) | +4 | +1 |

[Figure 5.2 plot: detector output ε (−4 to +4) versus phase error φe (−X to X, ticks at multiples of ¼X).]

Figure 5.2: Transfer function of the MPBBD (thick solid blue) vs. the BBPD (thin dashed red) when a DCO period is divided into eight regions, each spanning 45 degrees.

To relax the trade-off between pull-in frequency range and jitter performance, a PLL can dynamically scale its gain according to the initial frequency and phase error, increasing the pull-in frequency range while improving jitter performance in lock, as presented in [57] and [58]. Another technique is to design a fast frequency-acquisition aid, which frees the loop gain to control the jitter performance [50].

In [58], I propose an MPBBD-based PLL, as shown in Fig. 5.1(b); one possible MPBBD transfer function (of many) is shown in Fig. 5.2. The MPBBD acts as a phase detector with an automatic gear-shifting mechanism: the absolute large-signal gain of the phase detector² adjusts automatically based on the magnitude of the phase and frequency errors.
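The MPBBD characteristic of Fig. 5.2 can be written as a small lookup, together with the average-gain estimate KPD ≈ (1/n)Σ|Kr| used in Section 5.2. In this sketch X is taken as π (half a DCO period, so each of the eight regions spans 45 degrees), and the input phase error is assumed to already lie in (−X, X); the function names are illustrative.

```python
import math


def bbpd(phi_e):
    """Binary bang-bang detector: +1 for phi_e >= 0, -1 otherwise."""
    return 1 if phi_e >= 0 else -1


def mpbbd(phi_e, x=math.pi):
    """Eight-region MPBBD of Fig. 5.2; output in {-4, ..., -1, +1, ..., +4}.

    Each quarter of x is one region; [0, x/4) -> +1 and (-x/4, 0) -> -1.
    """
    level = min(4, 1 + math.floor(4 * abs(phi_e) / x))
    return level if phi_e >= 0 else -level


def average_gain(levels):
    """K_PD ~ (1/n) * sum of |K_r| over the n output steps of one polarity."""
    return sum(abs(k) for k in levels) / len(levels)
```

Sampling one phase from each of the four positive regions, `average_gain` returns (1 + 2 + 3 + 4)/4 = 2.5, the value of KPD used for this MPBBD.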

This chapter is organized as follows. First, Section 5.2 presents a mathematical analysis of the transient behavior of MPBBD-based PLLs³ when far from their lock point. Then, Section 5.3 analyzes the cycle-slipping behavior and the frequency pull-in process, as well as

² The absolute large-signal gain of the BBPD is KPD = 1, while the absolute large-signal gain, KPD, of the MPBBD ranges from 1 to 4. Note that the steady-state small-signal gain of both detectors depends on the input jitter and can be estimated as $k_{pd} \approx \frac{1}{\sqrt{2\pi}\,\sigma_{rms}}$ [51], where σrms is the rms jitter on the reference clock.

³ The analysis for a BBPD-based PLL is a special case of the MPBBD-based PLL with KPD = 1.


the locking time⁴. A closed-form expression is derived that predicts the PLL pull-in frequency range more accurately than the prior literature. Later, Section 5.4 develops a simplified mathematical model of the period and absolute jitter (updated once per reference cycle rather than once per output cycle) for fast simulation of the lock-in frequency range and locking time. Based on the developed understanding, Section 5.4.3 proposes an improved MPBBD with an extended pull-in range of ±fref that does not deteriorate the jitter performance in lock. Finally, Section 5.5 gives an overview of an implemented DPLL architecture that uses an MPBBD together with a frequency-locked loop. The chapter concludes with measurement results.

5.2 Transient Analysis of MPBBD PLL When Far From Lock

It is well known that type-II PLLs can track phase errors as well as frequency errors [50]. The integral path of a type-II PLL tracks frequency variations that exceed what the proportional path [56] can handle. As long as the phase error is small enough, the proportional path corrects it without engaging the integral path.

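[Figure 5.3 diagram labels: reference phase φr scaled by N; phase error φe = Nφr − φout into the quantized PD (gain KPD) producing ε; proportional path Kp and integral path Ki with a z^-1 accumulator (state ψ), summed into ν; DCO block with Kdco, Tr, ω0, and accumulator 1/(1 − z^-1) producing φout.]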

Figure 5.3: Phase domain model of DPLL with quantized phase detector.

Fig. 5.3 shows a discrete-time (z-domain) phase model of a second-order type-II DPLL with a quantized phase detector. The phase detector can be a binary BBPD or an MPBBD with a staircase characteristic. Its transfer function can be modeled by

ε[k] = KPD · sgn(φe[k]) (5.1)

⁴ Locking time and acquisition time are used interchangeably to mean the time needed for frequency acquisition until there is no further cycle slipping, as defined later.



where KPD is the absolute average large-signal gain of the phase detector⁵. For a binary BBPD with a ±1 output, the average gain is KPD = 1. The average MPBBD gain can be expressed as

$$K_{PD} \approx \frac{1}{n}\sum_{r=1}^{n} |K_r|,$$

where Kr is the output value of the rth step and n is the number of steps per polarity. For an MPBBD with the transfer function of Fig. 5.2, the average gain is

$$K_{PD} = \frac{4+3+2+1}{4} = 2.5.$$

Note that the output clock is sampled directly by the reference clock without being divided down. Accordingly, the reference phase must be multiplied by the frequency ratio between the output and input clocks, i.e., N, as shown in Fig. 5.3. Note also that the time index, k, advances with the reference sampling period Tr = 1/fref. Let ω0 be the free-running angular frequency of the oscillator output and Kdco the oscillator gain expressed in rad/s/step. Also, denote the loop filter output by υ[k] and the instantaneous output frequency by ωout[k]. Then,

ωout[k] = ω0 +Kdco · υ[k] (5.2)

= ω0 +KdcoKp · ε[k] +KdcoKi · ψ[k] (5.3)

Based on Eq. 5.3 and Fig. 5.3, the change in the output phase can be expressed as follows:

φout[k]− φout[k − 1] = Trω0 + TrKdcoKp · ε[k] + TrKdcoKi · ψ[k] (5.4)

For simplicity, I will consider separately the phase contribution of the proportional path,

φp[k], and integral path, φi[k], as follows:

φp[k] = TrKdcoKp · ε[k] (5.5)

≈ TrKdcoKpKPD · sgn(φe[k]) (5.6)

Define φp ≡ TrKdcoKpKPD ⇒ φp[k] ≈ φp · sgn(φe[k]) (5.7)

where φp represents the phase correction provided by the proportional path in one reference period, Tr. Similarly, the phase contribution due to the integral path can be

⁵ From now on, the phase detector gain, KPD, refers to the absolute average large-signal gain of the phase detector during frequency acquisition.



expressed as

φi[k] = TrKdcoKi · ψ[k] (5.8)

but ψ[k] = ψ[k − 1] + ε[k] (5.9)

⇒ φi[k] = φi[k − 1] + TrKdcoKi · ε[k] ≈ φi[k − 1] + TrKdcoKiKPD · sgn(φe[k]) (5.10)

Define ωi ≡ KdcoKiKPD ⇒ φi[k] ≈ φi[k − 1] + Trωi · sgn(φe[k]) (5.11)

Hence ωi represents the angular frequency correction provided by the integral path in

one reference period. Finally, substitute Eq. 5.7 and Eq. 5.11 back into Eq. 5.4 to get:

φout[k]− φout[k − 1] = Trω0 + φp[k] + φi[k] (5.12)

The initial frequency offset, ωoff, is defined as the difference between the ideal locked output frequency, Nωr, and the oscillator free-running frequency, ω0:

ωoff = Nωr − ω0 (5.13)

⇒ Trωoff = TrNωr − Trω0 = 2πN − Trω0 (5.14)

If ωoff is within the pull-in range, the output frequency error converges close to zero. The phase error is defined as the difference between the reference phase (scaled by N) and the output phase. The phase error and its change per reference period are

φe[k] = Nφr[k]− φout[k] (5.15)

⇒ φe[k]− φe[k − 1] = N(φr[k]− φr[k − 1])− (φout[k]− φout[k − 1]) (5.16)

For a fixed reference frequency, the reference phase changes by 2π in one reference period:

φr[k]− φr[k − 1] = kTrωr − (k − 1)Trωr = Trωr = 2π (5.17)

Substituting Eq. 5.12 and Eq. 5.17 into Eq. 5.16, the change in phase error ∆φe[k] can


be expressed as follows:

φe[k]− φe[k − 1] = ∆φe[k] = 2πN − Trω0 − φp[k]− φi[k] (5.18)

Using Eq. 5.14 ⇒ ∆φe[k] = Trωoff − φp[k]− φi[k] (5.19)

Eq. 5.19 has three terms: the first contributes phase slipping due to the frequency

error between the free-running and lock frequencies; the second and third account for the

phase corrections of the proportional and integral paths, respectively.
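Eq. 5.19 can be iterated directly, which is the kind of time-step simulation used in this chapter. The sketch below is a minimal model under stated assumptions: the detector output is computed from the wrapped phase error of the previous step, the exact detector output ε[k] is used rather than the KPD·sgn(φe[k]) approximation, Kdco is given in Hz/step and converted to rad/s, and the defaults (Kp = 3, Ki = 1/32, 130 kHz/step, 100 MHz reference) are illustrative. The lock test (residual below 2φp over the last 1000 steps) is a heuristic of this sketch, not a criterion from the text.

```python
import math


def wrap(phi):
    """Wrap a phase to [-pi, pi), the range the sampling phase detector resolves."""
    return (phi + math.pi) % (2 * math.pi) - math.pi


def bbpd(phi):
    """Binary bang-bang detector output (+1 for phi >= 0, else -1)."""
    return 1 if phi >= 0 else -1


def mpbbd(phi):
    """Eight-region MPBBD of Fig. 5.2 with X = pi (output in -4..-1, +1..+4)."""
    level = min(4, 1 + math.floor(4 * abs(phi) / math.pi))
    return level if phi >= 0 else -level


def simulate(offset_frac, kp=3.0, ki=1 / 32, kdco_hz=130e3, fref=100e6,
             detector=bbpd, steps=50_000):
    """Iterate Eq. 5.19 once per reference period and return the residual
    frequency error T_r*omega_off - phi_i[k] (rad per reference period).

    offset_frac is omega_off / omega_ref, e.g. 0.03 for an offset of 3 % of fref.
    """
    unit = 2 * math.pi * kdco_hz / fref   # T_r * K_dco with K_dco in rad/s/step
    tr_w_off = 2 * math.pi * offset_frac  # T_r * omega_off
    phi_e = 0.0                           # wrapped phase error
    phi_i = 0.0                           # integral-path phase per period (Eq. 5.8)
    residual = []
    for _ in range(steps):
        eps = detector(phi_e)                                     # quantized PD output
        phi_i += unit * ki * eps                                  # Eq. 5.10
        phi_e = wrap(phi_e + tr_w_off - unit * kp * eps - phi_i)  # Eq. 5.19
        residual.append(tr_w_off - phi_i)
    return residual


def is_locked(residual, kp=3.0, kdco_hz=130e3, fref=100e6, kpd=1.0, tail=1000):
    """Heuristic: the residual stays below 2*phi_p over the last `tail` steps."""
    phi_p = 2 * math.pi * kdco_hz / fref * kp * kpd
    return max(abs(r) for r in residual[-tail:]) < 2 * phi_p
```

For example, `is_locked(simulate(0.03))` checks whether a BBPD loop acquires lock from a 3% offset; passing `detector=mpbbd` to `simulate` and `kpd=2.5` to `is_locked` repeats the check for the MPBBD.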

5.3 Cycle Slipping Phenomena

A PLL is said to exhibit cycle slipping if the phase error at the input of the BBPD exceeds the range ±π. This phenomenon slows down frequency acquisition and limits the pull-in frequency range of a PLL [54].

Whether a PLL exhibits cycle slipping depends on the relation between the phase shift caused by the initial frequency offset, ωoff, and the phase correction available from the proportional path, φp. If Trωoff < 2φp,⁶ the PLL locks relatively quickly without cycle slipping, assuming the loop is stable. On the other hand, if the frequency offset is large enough that Trωoff > 2φp, but still within the pull-in range described below, the PLL exhibits cycle slipping during frequency acquisition until the phase shift induced by the frequency error is reduced below 2φp.

5.3.1 Analysis of Pull-In Frequency Range

The pull-in frequency range is defined as the maximum initial frequency offset, ωoff, for which a PLL acquires lock, generally after experiencing many cycle slips. To guarantee locking, the frequency error, Nωr − ωout[k], must decrease with each cycle slip in response to the phase detector outputs [53]. Because the cycle-slip period is inversely proportional to the frequency error, guaranteed locking also means that the cycle-slip period must increase with time. If the frequency error remains constant over successive cycle-slipping periods, or increases after a cycle-slip period, the PLL wanders around an intermediate metastable frequency [53].

⁶ The choice of the factor 2 is based on an analogy between a BB-PLL and a ∆Σ modulator, which is discussed later. Reference [54] predicts a similar factor.



[Figure 5.4 plots: (a) BBPD output and (b) MPBBD output (phase detector output, −4 to 4) versus time (µs); (c) phase error of the BBPD-DPLL and (d) phase error of the MPBBD-DPLL (degrees, −180 to 180) versus time (µs), annotated with the intervals in which Nωr − ωout[k] is increasing or decreasing.]

Figure 5.4: Illustration of cycle slipping and speed of frequency acquisition for BBPD- vs. MPBBD-based DPLLs.

To understand how the frequency error changes over time, a step-by-step analysis is presented using the time index k with reference period Tr. Assume that the PLL has a large enough frequency error to make the phase error drift across the full range between −π and +π over several reference periods, i.e., cycle slipping, as shown in Fig. 5.4(c) and Fig. 5.4(d). When the phase error is positive, the phase detector output is positive, as shown in Fig. 5.4(a) and Fig. 5.4(b). These positive pulses push the phase error toward zero, but, being positive, they cause the frequency error to increase rather than decrease. The phase error then drifts below zero, and the BBPD produces negative pulses as a result. As long as the frequency error is large enough, the phase error keeps decreasing toward −π, though at a slower rate than for positive phase errors, and a cycle slip happens there.

To quantify the pull-in range, denote the number of up pulses in a cycle-slipping period at the edge of the pull-in range by Nup and the number of down pulses in a cycle-slipping period by Ndn. Assume that ω0 > Nωr, which causes a negative phase shift in each reference period until it is corrected by the integral path; in the equations that follow, ωoff therefore denotes the magnitude of the offset. Also assume that the initial phase error is just below π, so the initial output of the phase detector is positive and stays positive for Nup reference periods. Recall Eq. 5.11 to find the phase shift due to the frequency correction of the integral path over these Nup periods:

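$$\begin{aligned}
\phi_i[0] &= 0\\
\phi_i[1] &= \phi_i[0] + T_r\omega_i = T_r\omega_i\\
\phi_i[2] &= \phi_i[1] + T_r\omega_i = 2T_r\omega_i\\
&\;\;\vdots\\
\phi_i[N_{up}] &= N_{up}T_r\omega_i
\end{aligned}$$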

Also, recall Eq. 5.19 to express the change in phase error during these Nup periods:

∆φe[1] = −Trωoff − φp[1]− φi[1] = −Trωoff − φp − Trωi

∆φe[2] = −Trωoff − φp[2]− φi[2] = −Trωoff − φp − 2Trωi

∆φe[3] = −Trωoff − φp[3]− φi[3] = −Trωoff − φp − 3Trωi

. . .

∆φe[Nup] = −Trωoff − φp[Nup]− φi[Nup] = −Trωoff − φp −NupTrωi

At the end of the Nup-th period, the frequency error has increased from ωoff to ωoff + Nupωi. Once the phase error changes sign from positive to negative, the phase detector produces negative pulses that move the frequency error in the correct direction. For the phase detector output to change sign, the total phase shift contributed during the positive pulses, $\sum_{k=1}^{N_{up}}\Delta\phi_e[k]$, must be in the neighborhood of −π:

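$$\sum_{k=1}^{N_{up}}\Delta\phi_e[k] = -N_{up}T_r\omega_{off} - N_{up}\phi_p - \frac{N_{up}(N_{up}+1)}{2}T_r\omega_i \approx -\pi \tag{5.20}$$

$$\Rightarrow N_{up}\left(T_r\omega_{off} + \phi_p + \frac{N_{up}+1}{2}T_r\omega_i\right) \approx \pi \tag{5.21}$$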



The frequency and phase correction by the integral path can be ignored when approximating Nup. This approximation is valid because Nup is small at the edge of the pull-in range and because Ki is usually much smaller than Kp, to maintain stability and avoid strong ringing in the step response:

$$N_{up} \approx \frac{\pi}{T_r\omega_{off} + \phi_p} \tag{5.22}$$

Now, when there is a negative phase error for the following Ndn periods, one can write

∆φe[Nup + 1] = −Trωoff − φp[Nup + 1]−φi[Nup + 1] = −Tr(ωoff +Nupωi) + φp + Trωi

∆φe[Nup + 2] = −Trωoff − φp[Nup + 2]−φi[Nup + 2] = −Tr(ωoff +Nupωi) + φp + 2Trωi

∆φe[Nup + 3] = −Trωoff − φp[Nup + 3]−φi[Nup + 3] = −Tr(ωoff +Nupωi) + φp + 3Trωi

. . .

∆φe[Nup +Ndn] = −Trωoff − φp[Nup +Ndn]− φi[Nup +Ndn]

= −Tr(ωoff +Nupωi) + φp +NdnTrωi

From the last equation, the net frequency correction at the end of a cycle-slipping period is only (Ndn − Nup)ωi. To guarantee frequency acquisition, Ndn ≥ Nup + 1, which ensures that the frequency error is reduced by at least ωi at the end of each cycle-slipping period; otherwise, the frequency error remains unchanged and does not converge to zero. As with the up pulses, the total phase shift contributed by the down pulses, $\sum_{k=N_{up}+1}^{N_{up}+N_{dn}}\Delta\phi_e[k]$, must be in the neighborhood of −π, and it equals

$$\sum_{k=N_{up}+1}^{N_{up}+N_{dn}}\Delta\phi_e[k] = -N_{dn}T_r(\omega_{off} + N_{up}\omega_i) + N_{dn}\phi_p + \frac{N_{dn}(N_{dn}+1)}{2}T_r\omega_i \approx -\pi \tag{5.23}$$

Substituting Ndn = Nup + 1 in Eq. 5.23 above gives the following simplified form

$$(N_{up}+1)\left(T_r\omega_{off} - \phi_p + \frac{N_{up}-2}{2}T_r\omega_i\right) = \pi \tag{5.24}$$

It is important to note that the total phase change in the first half of the cycle slip, $\sum_{k=1}^{N_{up}}\Delta\phi_e[k]$, was approximated as −π. However, this sum could be smaller or larger than −π, depending on the initial condition and on the phase shift contributed by the previous cycle slip. Accordingly, it is constrained as follows:

$$-\pi - T_r\omega_{off} - \phi_p - N_{up}T_r\omega_i < \sum_{k=1}^{N_{up}}\Delta\phi_e[k] < -\pi + T_r\omega_{off} + \phi_p + T_r\omega_i \tag{5.25}$$

For example, with an initial 14% frequency offset, Trωoff = 2π × 0.14 = 0.28π, so the phase shift caused by this offset during the first half of the cycle slip is constrained by $\pi - 0.28\pi < \left|\sum_{k=1}^{N_{up}}\Delta\phi_e[k]\right| < \pi + 0.28\pi$ rad (ignoring the effect of φp and φi, which are usually much smaller than Trωoff).

Similarly, the total phase change during the second half of the cycle slip is constrained as follows:

$$-\pi - T_r\omega_{off} - \phi_p - N_{dn}T_r\omega_i < \sum_{k=N_{up}+1}^{N_{up}+N_{dn}}\Delta\phi_e[k] < -\pi + T_r\omega_{off} + \phi_p + (N_{dn}-N_{up})T_r\omega_i \tag{5.26}$$

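However, the total phase change during a full cycle slip, $\sum_{k=1}^{N_{up}+N_{dn}}\Delta\phi_e[k] = \sum_{k=1}^{N_{up}}\Delta\phi_e[k] + \sum_{k=N_{up}+1}^{N_{up}+N_{dn}}\Delta\phi_e[k]$, is independent of the initial frequency error and is very close to −2π:

$$-2\pi - (N_{up}-1)T_r\omega_i < \sum_{k=1}^{N_{up}+N_{dn}}\Delta\phi_e[k] < -2\pi + N_{dn}T_r\omega_i \tag{5.27}$$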

Accordingly, the full-cycle sum $\sum_{k=1}^{N_{up}+N_{dn}}\Delta\phi_e[k]$ gives an accurate estimate of the pull-in range. Adding Eq. 5.24 to Eq. 5.21 gives the total phase shift during the first cycle slip:

$$(2N_{up}+1)T_r\omega_{off} - \phi_p + (N_{up}^2 - 1)T_r\omega_i = 2\pi \tag{5.28}$$

Assume that $(N_{up}^2 - 1)T_r\omega_i \ll \phi_p$ (which is the case when Ki < Kp/64), so that the effect of ωi can be ignored in Eq. 5.28. Substituting Eq. 5.22 into Eq. 5.28 then gives

$$\left[\frac{2\pi}{T_r\omega_{off} + \phi_p} + 1\right]T_r\omega_{off} - \phi_p \approx 2\pi \tag{5.29}$$

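Multiplying Eq. 5.29 by (Trωoff + φp) and rearranging gives the quadratic equation $(T_r\omega_{off})^2 = (2\pi + \phi_p)\phi_p$, which is easy to solve; its positive root is

$$T_r\omega_{off} \approx \sqrt{(2\pi + \phi_p)\phi_p} \tag{5.30}$$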

From Eq. 5.30, the pull-in range can be expressed as

$$f_{\text{pull-in}} = \frac{\omega_{off}}{2\pi} \approx f_{ref}\,\frac{\sqrt{(2\pi + \phi_p)\phi_p}}{2\pi} \tag{5.31}$$

The pull-in range formula can be simplified further without losing accuracy by recalling that φp ≪ 2π (as required for stability and low peak-to-peak jitter):

$$f_{\text{pull-in}} \approx f_{ref}\sqrt{\frac{\phi_p}{2\pi}} = \sqrt{\frac{f_{ref}K_{dco}K_pK_{PD}}{2\pi}} \tag{5.32}$$
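Eq. 5.31 and Eq. 5.32 can be evaluated directly. The sketch assumes Kdco is specified in Hz/step and converts it to rad/s/step, and it borrows the 100 MHz reference and 130 kHz/step DCO LSB from the locking-time example in Section 5.3.2 as defaults; Table 5.1 does not state these two values, so they are assumptions here.

```python
import math


def phi_p(kp, kpd=1.0, kdco_hz=130e3, fref=100e6):
    """Proportional-path phase step per reference period, T_r*K_dco*K_p*K_PD (Eq. 5.7)."""
    return 2 * math.pi * kdco_hz / fref * kp * kpd


def pull_in_eq531(kp, kpd=1.0, kdco_hz=130e3, fref=100e6):
    """Pull-in range normalized to fref, Eq. 5.31."""
    p = phi_p(kp, kpd, kdco_hz, fref)
    return math.sqrt((2 * math.pi + p) * p) / (2 * math.pi)


def pull_in_eq532(kp, kpd=1.0, kdco_hz=130e3, fref=100e6):
    """Small-phi_p approximation, Eq. 5.32: sqrt(phi_p / (2*pi))."""
    return math.sqrt(phi_p(kp, kpd, kdco_hz, fref) / (2 * math.pi))
```

With these defaults and Kp = 3, `pull_in_eq531` returns about 6.26% for the BBPD (KPD = 1) and 9.92% for the MPBBD (KPD = 2.5), the same values as the Eq. 5.31 column of Table 5.1, and the ratio of the two Eq. 5.32 results is exactly √2.5.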

The pull-in frequency range given by Eq. 5.31 and Eq. 5.32 depends on the proportional loop gain, similar to the conclusion presented in [55]; however, Eq. 5.32 is more accurate and gives insight into the effect of the phase detector gain on the pull-in range during frequency acquisition. The improvement in pull-in range from using an MPBBD instead of a BBPD is proportional to √KPD. Hence, an MPBBD with the static transfer characteristic of Fig. 5.2 has a √2.5 = 1.58 times larger pull-in range than a BBPD. Based on the simulation results in Table 5.1, the improvement is 58.4% (9.90%/6.25% ≈ 1.584), almost exactly as predicted by Eq. 5.31. Finally, Table 5.1 compares the pull-in range found by simulation with the theoretical results of this section and with the results developed by the authors of [55] and [54]. It shows that Eq. 5.31 predicts the pull-in range to within 0.02 percentage points of the simulated value. Despite the

| | Simulation | Equation 5.31 | Postula [55] | Salama [54] |
|---|---|---|---|---|
| BBPD | 6.25% | 6.26% | 4.22% | 1.20% |
| MPBBD | 9.90% | 9.92% | 6.51% | 2.45% |

Table 5.1: Comparison of the pull-in range of BBPD- vs. MPBBD-based DPLLs using simulation and theoretical findings when Kp = 3 and Ki = 1/32.

simplicity of Eq. 5.31 and Eq. 5.32, neither captures the effect of Ki on the pull-in range, so both could underestimate it, especially if Ki is not very small relative to Kp. To include this effect, substitute Eq. 5.22 into Eq. 5.28 without ignoring ωi. This leads to a cubic equation that can be solved for ωoff:



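$$(T_r\omega_{off})^3 + (\phi_p - T_r\omega_i)(T_r\omega_{off})^2 - (2\pi + \phi_p + 2T_r\omega_i)\,\phi_pT_r\omega_{off} + \left(\pi^2T_r\omega_i - \phi_p^3 - T_r\omega_i\phi_p^2 - 2\pi\phi_p^2\right) = 0 \tag{5.33}$$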

The exact solution to Eq. 5.33 is not given here, but Fig. 5.5 shows the pull-in range for various Kp and Ki combinations when a BBPD is used. Table 5.2 presents a numerical summary of the pull-in range for different combinations of Kp and Ki.

[Figure 5.5 plot: pull-in range (normalized to fref, %, about 6 to 15) versus normalized integral path step (KiKdco/fref × 100%, 0 to 0.1) for Kp = 3, 6, and 12.]

Figure 5.5: The pull-in range (normalized to the reference frequency) of the BBPD-DPLL for different values of Ki and Kp. Dashed lines are obtained from Eq. 5.31, solid lines from Eq. 5.33, and symbols from simulations.

| Kp | Ki | Sims | Eq. 5.31 | Eq. 5.33 | [55] |
|---|---|---|---|---|---|
| 3 | 0.04 | 6.75% | 6.26% | 6.60% | 4.43% |
| 3 | 0.68 | 8.15% | 6.26% | 9.46% | 4.66% |
| 12 | 0.04 | 12.35% | 12.59% | 12.68% | 8.84% |
| 12 | 0.68 | 13.75% | 12.59% | 13.94% | 8.86% |

Table 5.2: The pull-in range (normalized to the reference frequency) of the BBPD-DPLL based on simulations and the presented theory, as well as on other references.



5.3.2 Locking Time

In this section, the frequency locking time is formulated by analyzing the reduction of the frequency offset (i.e., error) in each cycle-slipping period. From the analysis in the previous section, the frequency offset is not constant during a cycle-slipping period; instead, it increases in the first half and then decreases in the second half of each cycle-slipping period, or vice versa, depending on the polarity of ωoff. To simplify the analysis, however, the frequency offset can be approximated as constant during each cycle slip and equal to its average value, denoted by $\bar{\omega}_{off}$ (for example, $\bar{\omega}_{off} = \omega_{off} + (N_{up}+1)\omega_i/2$ based on Eq. 5.21). Accordingly, the changes in phase and frequency error are indexed by the cycle-slipping period rather than by the reference period. Define the number of up pulses during the mth cycle slip as Nup[m], where m is the cycle-slip index. Similarly, define the number of down pulses during the mth cycle slip as Ndn[m]. The period of each cycle slip, Tcycle−slip[m], changes as the PLL progresses toward frequency lock and can be defined as

Tcycle−slip[m] = Tr(Nup[m] +Ndn[m]), where Tr = 1/fref (5.34)

The frequency locking time, tlock, is simply the sum of all cycle-slip periods:

$$t_{lock} = \sum_{m=0}^{n} T_{\text{cycle-slip}}[m] = \sum_{m=0}^{n} T_r\left(N_{up}[m] + N_{dn}[m]\right) \tag{5.35}$$

Note that n is the number of cycle slips that occur before the frequency error, expressed as Trωoff, becomes smaller than 2φp.

The longest locking time occurs when $\sum_{k=1}^{N_{up}}\Delta\phi_e[k]$ contributes the largest possible phase shift while $\sum_{k=N_{up}+1}^{N_{up}+N_{dn}}\Delta\phi_e[k]$ contributes the smallest possible phase shift, so that few down pulses are generated to correct the frequency error. Recalling Eq. 5.20 to Eq. 5.26, the phase shift during the longest locking time can be expressed as

Nup[m](Trωoff [m] + φp) < π + φp + PwTrωoff [m] (5.36)

Ndn[m](Trωoff [m]− φp) > π − φp − PwTrωoff [m] (5.37)

where ωoff[m] is the average frequency offset during the mth cycle-slipping period and Pw is the probability of having the worst locking time. As was done in [55], I assume that Pw = 0 and make some further assumptions to obtain a simple form for the locking time. An exact solution for the worst-case locking time is provided later, at the expense of a very complex equation. With Pw = 0, treating Eq. 5.36 and Eq. 5.37 as approximate equalities gives $N_{up}[m] \approx \frac{\pi + \phi_p}{T_r\omega_{off}[m] + \phi_p}$ and $N_{dn}[m] \approx \frac{\pi - \phi_p}{T_r\omega_{off}[m] - \phi_p}$. Their sum and difference during each cycle-slipping period are as follows:

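$$N_{dn}[m] + N_{up}[m] = \frac{2\pi T_r\omega_{off}[m] - 2\phi_p^2}{(T_r\omega_{off}[m])^2 - \phi_p^2} \tag{5.38}$$

$$N_{dn}[m] - N_{up}[m] = \frac{2\pi\phi_p - 2T_r\omega_{off}[m]\phi_p}{(T_r\omega_{off}[m])^2 - \phi_p^2} \tag{5.39}$$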

The frequency correction by the end of the mth cycle slip is approximately ωi(Ndn[m] − Nup[m]). Mathematically,

ωoff [m+ 1] = ωoff [m] + ωi(Ndn[m]−Nup[m]) (5.40)

Substitute Eq. 5.39 into Eq. 5.40, then

$$\omega_{off}[m+1] - \omega_{off}[m] = \omega_i\,\frac{2\pi\phi_p - 2T_r\omega_{off}[m]\phi_p}{(T_r\omega_{off}[m])^2 - \phi_p^2} \tag{5.41}$$

Based on Eq. 5.41, the change of frequency offset with respect to the number of cycle

slips can be expressed using a continuous variable of ωoff as follows:

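$$\frac{\partial\omega_{off}}{\partial m} = \omega_i\,\frac{2\pi\phi_p - 2T_r\omega_{off}\phi_p}{(T_r\omega_{off})^2 - \phi_p^2} \tag{5.42}$$

$$\approx \frac{2\pi\omega_i\phi_p}{(T_r\omega_{off})^2 - \phi_p^2}, \quad \text{assuming } T_r\omega_{off} \ll \pi \tag{5.43}$$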

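Accordingly, given an initial frequency offset (i.e., error), the number of cycle slips can be calculated as

$$N_{\text{cycle-slips}} = \int_{\omega_0 = 2\phi_p/T_r}^{\omega_{off}} \frac{\partial m}{\partial\omega_{off}}\,d\omega_{off} = \frac{(T_r\omega_{off})^3 - 3T_r\omega_{off}\phi_p^2 - 2\phi_p^3}{6\pi T_r\omega_i\phi_p} \tag{5.44}$$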



[Figure 5.6 plot: number of cycle slips (0 to 300) versus frequency error (0 to 10%).]

Figure 5.6: The number of cycle slips (Kp = 3 and Ki = 1/32) for BBPD- versus MPBBD-based DPLLs. (a) BBPD: blue circles from simulation results and dashed red line from Eq. 5.44. (b) MPBBD: blue squares from simulation results and solid red line from Eq. 5.44.

where the lower integration limit, ω0, is the maximum frequency offset that does not cause any cycle slipping, so ω0 = 2φp/Tr (this use of ω0 is distinct from the free-running frequency ω0 defined earlier). The number of cycle slips, Ncycle−slips, versus the initial frequency error is plotted in Fig. 5.6 for both BBPD- and MPBBD-based PLLs, using both Eq. 5.44 and a time-step MATLAB simulation of the PLL. Eq. 5.44 agrees closely with the simulation results.
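Eq. 5.44 can likewise be evaluated for a given offset. The sketch uses the same assumed unit conversion and defaults as for the pull-in range (Kdco in Hz/step, 100 MHz reference, 130 kHz/step), takes Trωi = TrKdcoKiKPD from Eq. 5.11, and returns zero below the no-slip threshold Trωoff = 2φp; the function name is illustrative.

```python
import math


def cycle_slips_eq544(offset_frac, kp=3.0, ki=1 / 32, kpd=1.0,
                      kdco_hz=130e3, fref=100e6):
    """Number of cycle slips before lock, Eq. 5.44 (offset as a fraction of fref)."""
    unit = 2 * math.pi * kdco_hz / fref  # T_r * K_dco with K_dco in rad/s/step
    phi_p = unit * kp * kpd              # T_r * K_dco * K_p * K_PD
    tr_wi = unit * ki * kpd              # T_r * omega_i = T_r * K_dco * K_i * K_PD
    x = 2 * math.pi * offset_frac        # T_r * omega_off
    if x <= 2 * phi_p:
        return 0.0                       # below 2*phi_p: no cycle slipping
    return (x ** 3 - 3 * x * phi_p ** 2 - 2 * phi_p ** 3) / (6 * math.pi * tr_wi * phi_p)
```

For a 3% offset with Kp = 3 and Ki = 1/32, it gives roughly 54 cycle slips for the BBPD and about 5.6 for the MPBBD (KPD = 2.5).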

The cycle-slip period in Eq. 5.34 can be expanded using Eq. 5.38:

$$T_{\text{cycle-slip}}[m] = T_r\,\frac{2\pi T_r\omega_{off}[m] - 2\phi_p^2}{(T_r\omega_{off}[m])^2 - \phi_p^2} \tag{5.45}$$

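Accordingly, treating the accumulated cycle-slipping time, Tcycle−slip, as a continuous function of the cycle-slip index m, its rate of change with respect to m (i.e., the duration of one cycle slip) can be expressed as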

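$$\frac{\partial T_{\text{cycle-slip}}}{\partial m} = T_r\,\frac{2\pi T_r\omega_{off} - 2\phi_p^2}{(T_r\omega_{off})^2 - \phi_p^2} \tag{5.46}$$

$$\approx \frac{2\pi T_r^2\omega_{off}}{(T_r\omega_{off})^2 - \phi_p^2}, \quad \text{assuming } \phi_p \text{ is very small} \tag{5.47}$$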

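Applying the chain rule and recalling Eq. 5.42 and Eq. 5.46, the change of frequency error with respect to the cycle-slipping period is

$$\frac{\partial\omega_{off}}{\partial T_{cycle-slip}} = \frac{\partial\omega_{off}}{\partial m}\cdot\frac{\partial m}{\partial T_{cycle-slip}} = \frac{\omega_i}{T_r}\left(\frac{\pi\phi_p - T_r\omega_{off}\phi_p}{\pi T_r\omega_{off} - \phi_p^2}\right) \tag{5.48}$$

$$\Rightarrow \frac{\partial\omega_{off}}{\partial T_{cycle-slip}} \approx \frac{\omega_i\phi_p}{T_r^2\omega_{off}}, \quad \text{assuming } \pi \gg T_r\omega_{off} \gg \phi_p \tag{5.49}$$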

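Finally, the locking time can be found by recalling Eq. 5.35:

$$t_{lock} = \sum_{m=0}^{n} T_{cycle-slip}[m] \equiv \int_{\omega_0 = 2\phi_p/T_r}^{\omega_{off}} \frac{\partial T_{cycle-slip}}{\partial\omega_{off}}\cdot\partial\omega_{off} \tag{5.50}$$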

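Using Eq. 5.49:

$$t_{lock} = \int_{2\phi_p/T_r}^{\omega_{off}} \frac{T_r^2\omega_{off}}{\omega_i\phi_p}\cdot\partial\omega_{off} = \frac{1}{2\omega_i\phi_p}\left[(T_r\omega_{off})^2 - 4\phi_p^2\right] \tag{5.51}$$

$$\Rightarrow t_{lock} = T_r\left[\frac{\omega_{off}^2}{2K_iK_pK_{dco}^2K_{PD}^2} - \frac{2K_p}{K_i}\right], \quad \text{assuming } \pi \gg T_r\omega_{off} \gg \phi_p \tag{5.52}$$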

For example, consider a BBPD-PLL (KPD = 1) running from a 100 MHz reference clock with Kp = 3, Ki = 0.08, and a DCO LSB of 130 kHz/step, with a 5% frequency error at its input. The locking time is then estimated to be

$$t_{lock} = 10\,\text{ns}\times\left(\frac{(2\pi\cdot 100\,\text{MHz}\cdot 0.05)^2}{2\cdot 3\cdot 0.08\cdot(2\pi\cdot 130\,\text{kHz})^2} - \frac{2\cdot 3}{0.08}\right) = 30.07\,\mu\text{s}.$$

Simulations show a locking time of 32 µs.

Eq. 5.52 shows that the frequency locking time depends on the square of the frequency offset, i.e., tlock ∝ ω²off, so frequency acquisition takes a very long time for a large frequency error. Hence, using a binary BBPD to achieve frequency locking is usually not an option. Furthermore, for large ωoff, where the first term of Eq. 5.52 dominates, the locking time is inversely proportional to the loop parameters, i.e., tlock ∝ 1/(KiKp). Improving the locking time by increasing Ki and/or Kp is not desirable, since it alters the steady-state behavior of the PLL. Using a phase detector with an automatic gain-shifting mechanism is a better option. Note that the locking time is inversely proportional to the square of the average phase detector gain, i.e., tlock ∝ 1/K²PD. For a MPBBD with an average gain of KPD = 2.5, the locking time improves by a factor of 2.5² = 6.25 compared with a BBPD with a gain of 1. This shows the benefit of using a MPBBD to speed up frequency and phase locking without hurting the small-signal steady-state performance, since a MPBBD behaves like a BBPD once lock is achieved.

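By relaxing the assumption (π ≫ Trωoff ≫ φp) used to derive Eq. 5.52, a more accurate formula for the locking time can be found by integrating the inverse of Eq. 5.48:

$$t_{lock} = \frac{T_r}{\omega_i}\int_{2\phi_p/T_r}^{\omega_{off}} \frac{\pi T_r\omega_{off} - \phi_p^2}{\pi\phi_p - T_r\omega_{off}\phi_p}\cdot\partial\omega_{off} \tag{5.53}$$

$$= -\frac{\pi}{\omega_i\phi_p}\left[T_r\omega_{off} + \left(\frac{\pi^2 - \phi_p^2}{\pi}\right)\ln(\pi - T_r\omega_{off})\right]_{2\phi_p/T_r}^{\omega_{off}} \Rightarrow \tag{5.54}$$

$$t_{lock} = \frac{\pi}{\omega_i\phi_p}\left[\left(\frac{\pi^2 - \phi_p^2}{\pi}\right)\ln\left(\frac{\pi - 2\phi_p}{\pi - T_r\omega_{off}}\right) - (T_r\omega_{off} - 2\phi_p)\right] \Rightarrow \tag{5.55}$$

$$= \frac{\pi}{T_rK_iK_pK_{dco}^2K_{PD}^2}\left[\left(\frac{\pi^2 - \phi_p^2}{\pi}\right)\ln\left(\frac{\pi - 2\phi_p}{\pi - T_r\omega_{off}}\right) - (T_r\omega_{off} - 2\phi_p)\right] \tag{5.56}$$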

For the same example, Eq. 5.55 gives a locking time of 32.25 µs, compared with 32 µs from simulation. This is closer than the 30.07 µs obtained from Eq. 5.51, but at the expense of a more complex expression. Moreover, Eq. 5.55 does not provide an intuitive understanding of the effect of the offset frequency, ωoff, on the locking time.
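Both locking-time estimates of the worked example can be reproduced with the Python sketch below. It assumes φp = KpKdcoKPD·Tr and ωi = KiKdcoKPD, with Kdco converted to rad/s, which is the mapping under which Eq. 5.55 becomes Eq. 5.56; the function names are hypothetical and the sketch is only meant for checking the numbers above.

```python
import math

def t_lock_simple(f_err_hz, f_ref_hz, kp, ki, kdco_hz, k_pd=1.0):
    """Eq. 5.52 (valid when pi >> Tr*w_off >> phi_p)."""
    t_r = 1.0 / f_ref_hz
    w_off = 2 * math.pi * f_err_hz
    kdco = 2 * math.pi * kdco_hz
    return t_r * (w_off**2 / (2 * ki * kp * kdco**2 * k_pd**2) - 2 * kp / ki)

def t_lock_exact(f_err_hz, f_ref_hz, kp, ki, kdco_hz, k_pd=1.0):
    """Eq. 5.55 (drops the pi >> Tr*w_off >> phi_p assumption; needs Tr*w_off < pi)."""
    t_r = 1.0 / f_ref_hz
    kdco = 2 * math.pi * kdco_hz
    phi_p = kp * kdco * k_pd * t_r        # assumed mapping, consistent with Eq. 5.56
    w_i = ki * kdco * k_pd
    x = t_r * 2 * math.pi * f_err_hz      # Tr * w_off
    log_term = (math.pi**2 - phi_p**2) / math.pi * math.log(
        (math.pi - 2 * phi_p) / (math.pi - x))
    return math.pi / (w_i * phi_p) * (log_term - (x - 2 * phi_p))

# Worked example from the text: 5% error, 100 MHz reference, Kp = 3, Ki = 0.08, 130 kHz/LSB
print(t_lock_simple(5e6, 100e6, 3, 0.08, 130e3) * 1e6)   # about 30.07 us
print(t_lock_exact(5e6, 100e6, 3, 0.08, 130e3) * 1e6)    # about 32.25 us
```

Passing `k_pd=2.5` reduces the first term of Eq. 5.52 by 2.5² = 6.25, while the −2Kp/Ki term does not depend on KPD, so at small offsets the overall improvement is somewhat below 6.25.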

[Figure 5.7 plot: Frequency Locking Time (µs) vs. Frequency Error (%), with curves labeled Eq. 5.51, Eq. 5.55, and Eq. 5.58]

Figure 5.7: Frequency locking time until cycle slips disappear (Kp = 3 and Ki = 1/32) for a BBPD (blue circles from simulation) and MPBBD (blue squares from simulation) based DPLL. Eq. 5.51 is represented using a solid red line, Eq. 5.55 using a small-dashed blue line, and Eq. 5.58 using a large-dashed red line.

Finally, the worst-case locking time can be estimated by considering the maximum phase contribution when the PD output is high and the lowest phase contribution when the PD output is low (recall Eq. 5.36 and Eq. 5.37). Following steps similar to those shown before, the worst-case locking time can be expressed as follows:

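$$t_{lock} = \frac{1}{\omega_i}\int_{2\phi_p}^{T_r\omega_{off}} \frac{(2\pi - 2\phi_p)T_r\omega_{off} - 2\phi_p^2}{2\pi\phi_p - 2T_r\omega_{off}\phi_p - 2P_w(T_r\omega_{off})^2}\cdot\partial(T_r\omega_{off}) \tag{5.57}$$

$$\begin{aligned}
\Rightarrow t_{lock} ={}& \frac{1}{2P_w\omega_i}(\phi_p - \pi)\Big[\ln\big(P_w\phi_{ferr}^2 + \phi_p\phi_{ferr} - \pi\phi_p\big)\Big]_{\phi_{ferr}=2\phi_p}^{\phi_{ferr}=T_r\omega_{off}} \\
&- \frac{1}{P_w\omega_i}\cdot\frac{\sqrt{\phi_p}\,\big(\phi_p(2P_w - 1) + \pi\big)}{\sqrt{\phi_p + 4\pi P_w}}\left[\tanh^{-1}\left(\frac{\phi_p + 2P_w\phi_{ferr}}{\sqrt{\phi_p(\phi_p + 4\pi P_w)}}\right)\right]_{\phi_{ferr}=2\phi_p}^{\phi_{ferr}=T_r\omega_{off}}
\end{aligned} \tag{5.58}$$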
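Because Eq. 5.58 is long and easy to mistype, a numerical cross-check against direct integration of Eq. 5.57 may be useful. The Python sketch below (standard library only) integrates Eq. 5.57 with composite Simpson's rule and evaluates the closed form, taking the logarithm of the magnitude of its argument, which is negative over the integration range. Pw is the parameter appearing in Eq. 5.57 (defined earlier in the chapter); the values chosen here for Pw, φp, and ωi are purely illustrative, and the function names are hypothetical.

```python
import math

def integrand_5_57(x, phi_p, p_w):
    """Integrand of Eq. 5.57 with x = Tr*w_off (rad)."""
    num = (2 * math.pi - 2 * phi_p) * x - 2 * phi_p**2
    den = 2 * math.pi * phi_p - 2 * x * phi_p - 2 * p_w * x**2
    return num / den

def t_lock_numeric(x_end, phi_p, w_i, p_w, n=2000):
    """Composite Simpson's rule over [2*phi_p, x_end]; n must be even."""
    a = 2 * phi_p
    h = (x_end - a) / n
    s = integrand_5_57(a, phi_p, p_w) + integrand_5_57(x_end, phi_p, p_w)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * integrand_5_57(a + k * h, phi_p, p_w)
    return s * h / 3 / w_i

def t_lock_closed(x_end, phi_p, w_i, p_w):
    """Closed form of Eq. 5.58 (log of the magnitude, since the argument is negative here)."""
    root = math.sqrt(phi_p * (phi_p + 4 * math.pi * p_w))
    coef = math.sqrt(phi_p) * (phi_p * (2 * p_w - 1) + math.pi) / (
        p_w * w_i * math.sqrt(phi_p + 4 * math.pi * p_w))

    def antiderivative(x):
        log_part = (phi_p - math.pi) / (2 * p_w * w_i) * math.log(
            abs(p_w * x**2 + phi_p * x - math.pi * phi_p))
        atanh_part = coef * math.atanh((phi_p + 2 * p_w * x) / root)
        return log_part - atanh_part

    return antiderivative(x_end) - antiderivative(2 * phi_p)

phi_p, w_i, p_w = 0.0245, 6.5e4, 0.1   # illustrative values only
x_end = 0.3                            # Tr*w_off, kept below the pole of the integrand
print(t_lock_numeric(x_end, phi_p, w_i, p_w), t_lock_closed(x_end, phi_p, w_i, p_w))
```

The two printed values should agree to many digits; a mismatch would point to a transcription error in one of the terms of Eq. 5.58.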

Fig. 5.7 shows the locking time vs. the initial frequency error, expressed as a percentage of the reference clock frequency, for both BBPD- and MPBBD-based PLLs, as predicted by Eq. 5.51, Eq. 5.55, and Eq. 5.58. Time-step simulations of these PLLs were also developed in MATLAB and Simulink to verify the theoretical results. The simulated locking time falls between the predictions of Eq. 5.51 and Eq. 5.58. Furthermore, the simple Eq. 5.51 predicts the locking time fairly well and gives useful insight into how the loop parameters and initial conditions affect the locking time.

5.4 Fast Simulation Model of a DPLL with Quantized Phase Detector

Simulating PLLs is difficult and time-consuming because of the vastly different time scales of the reference and output clocks. Furthermore, the simulator time step is usually 100 to 1000 times smaller than the shortest output period in the system. This section presents a compact, fast simulation model of DPLLs for quickly finding the locking time and pull-in range. The model updates the jitter and the frequency correction provided by the integral path once every reference cycle. Consequently, the presented model is three to four orders of magnitude faster than a Verilog-A behavioral simulation, and even faster compared with a Spice circuit simulation.

5.4.1 Model Development

Let tref represent the timing of the rising edge of the reference clock and tdco represent the timing of every Nth rising edge of the output DCO clock. Accordingly, the absolute jitter of the output clock with respect to the reference clock is simply jabs = tref − tdco. The output clock is sub-sampled by the reference clock, so the resolvable phase difference between them must lie within one reference period. Accordingly, the absolute timing jitter, jabs, is wrapped to (−Tref/2, +Tref/2), where Tref is the reference clock period and is assumed to be constant. The phase detector works as a quantizer of the wrapped absolute jitter. Finally, the DCO can be modeled in the time domain rather than in the frequency domain:

$$T_{out} = T_{out}^0 + K_T\cdot\nu \tag{5.59}$$

where Tout is the instantaneous DCO period, $T_{out}^0$ is the free-running period, ν is the DCO control input, and KT is the DCO period correction factor in seconds, approximated as $K_T \simeq K_{dco}T_{out}^2$, where Kdco is the DCO frequency resolution.


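Based on these definitions and recalling Fig. 5.3 from Section 5.2, one can write the following set of state equations using the reference time index k:

$$\text{Absolute jitter:}\quad j_{abs}[k] = t_{ref}[k] - t_{dco}[k] \tag{5.60}$$

$$\text{Wrapped jitter:}\quad \Delta t[k] = W(j_{abs}[k]) \equiv \big((j_{abs}[k] + T_{ref}/2) \bmod T_{ref}\big) - T_{ref}/2 \tag{5.61}$$

$$\text{PD output:}\quad \varepsilon[k] = Q(\Delta t[k]) \equiv K_{PD}\cdot\operatorname{sgn}(\Delta t[k]) \tag{5.62}$$

$$\text{Integrator:}\quad \psi[k] = \psi[k-1] + \varepsilon[k] \tag{5.63}$$

$$\text{Loop filter:}\quad \nu[k] = K_p\cdot\varepsilon[k] + K_i\cdot\psi[k] \tag{5.64}$$

$$\text{DCO:}\quad T_{out}[k] = T_{out}^0 + K_T\cdot\nu[k] \tag{5.65}$$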

As the reference time index, k, progresses, the rising edge of the reference clock, tref[k], advances by one reference period, Tref[k]. The jitter on the reference clock is usually very small compared with other sources of jitter, so a jitter-free reference can be assumed to simplify the following analysis. Similarly, the rising edge of the output clock, tdco[k], advances by N output periods, Tout[k], in each reference period:

$$t_{ref}[k+1] = t_{ref}[k] + T_{ref}[k] \approx t_{ref}[k] + T_{ref} \tag{5.66}$$

$$t_{dco}[k+1] = t_{dco}[k] + N T_{out}[k] \tag{5.67}$$

Subtracting Eq. 5.67 from Eq. 5.66 gives the absolute jitter:

$$j_{abs}[k+1] = j_{abs}[k] + T_{ref} - N T_{out}[k] \tag{5.68}$$

Using Eq. 5.65:

$$j_{abs}[k+1] = j_{abs}[k] + T_{ref} - N\left(T_{out}^0 + K_T\cdot\nu[k]\right) \tag{5.69}$$

$$= j_{abs}[k] + N\left(T_{ref}/N - T_{out}^0 - K_T\cdot\nu[k]\right) \tag{5.70}$$

The initial output period error, Te, can be defined as the difference between the targeted output period, $T_{out}^f$, and the initial output period, $T_{out}^0$:

$$T_e = T_{out}^f - T_{out}^0 = T_{ref}/N - T_{out}^0 \tag{5.71}$$

The loop filter provides a correction signal, ν[k], that translates to a period correction, Te[k], when multiplied by the DCO period gain, KT:

$$T_e[k] = K_T\cdot\nu[k] \tag{5.72}$$

Note that the DCO period gain, KT, is a function of the operating region, but it can be assumed constant for a small frequency error. The period jitter, or period error, can then be defined as

$$j_{per}[k] = T_e - T_e[k] \tag{5.73}$$

$$= T_{ref}/N - T_{out}^0 - K_T\cdot\nu[k] \tag{5.74}$$

Substituting Eq. 5.74 into Eq. 5.70 gives

$$j_{abs}[k+1] = j_{abs}[k] + N j_{per}[k] \tag{5.75}$$

The above DPLL time model can be represented as a two-state system where ∆t[k] and ψ[k] are the state variables. Rearranging the previous equations:

$$\psi[k] = \psi[k-1] + Q(\Delta t[k]) \tag{5.76}$$

$$\Delta t[k] = W(j_{abs}[k]) \tag{5.77}$$

$$\text{where } j_{abs}[k] = j_{abs}[k-1] + N\Big(\underbrace{T_e - \underbrace{K_T\big(K_pQ(\Delta t[k-1]) + K_i\psi[k-1]\big)}_{T_e[k-1]}}_{j_{per}[k-1]}\Big) \tag{5.78}$$

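In the z-domain, this is equivalent to:

$$\Psi = \frac{Q(\Phi)}{1 - z^{-1}} \tag{5.79}$$

$$\Phi = W(J_{abs}) = \mathcal{Z}(\Delta t[k]), \text{ the z-transform of the wrapped jitter} \tag{5.80}$$

$$J_{abs} = \frac{N}{1 - z^{-1}}\left(T_e - z^{-1}K_T\big(K_pQ(\Phi) + K_i\Psi\big)\right) \tag{5.81}$$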

Based on Eqs. 5.79 to 5.81, the Simulink model shown in Fig. 5.8 can be constructed to quickly simulate the locking time and pull-in range of a DPLL.
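For readers without Simulink, the same reference-rate state equations (Eqs. 5.60 to 5.65 together with Eq. 5.75) can be iterated directly. The Python/NumPy sketch below is an illustration under stated assumptions, not the model used for Fig. 5.9: it follows the text in wrapping jabs to ±Tref/2, maps ∆t = 0 to +KPD, and provides a hypothetical 8-level quantizer `q_mpbbd` whose bins of Tref/8 mirror the ±¼Z to ±Z breakpoints of Fig. 5.20. All parameter values are illustrative.

```python
import numpy as np

def wrap(j_abs, t_ref):
    """Eq. 5.61: wrap the absolute jitter into [-Tref/2, +Tref/2)."""
    return (j_abs + t_ref / 2) % t_ref - t_ref / 2

def q_bbpd(dt, k_pd=1.0):
    """Eq. 5.62: binary bang-bang detector (dt = 0 is mapped to +K_PD)."""
    return k_pd if dt >= 0 else -k_pd

def q_mpbbd(dt, t_ref):
    """Hypothetical 8-level detector: +/-1..+/-4 in bins of Tref/8 (cf. Fig. 5.20)."""
    level = min(4, int(abs(dt) // (t_ref / 8)) + 1)
    return level if dt >= 0 else -level

def simulate(te, n, t_ref, k_t, kp, ki, quantizer, steps):
    """Iterate Eqs. 5.60-5.65 and 5.75 once per reference cycle."""
    j_abs, psi = 0.0, 0.0
    te_k = np.empty(steps)   # period correction Te[k] = KT * nu[k] (Eq. 5.72)
    dt_k = np.empty(steps)   # wrapped jitter Delta t[k] (Eq. 5.61)
    for k in range(steps):
        dt_k[k] = wrap(j_abs, t_ref)
        eps = quantizer(dt_k[k])          # Eq. 5.62
        psi += eps                        # Eq. 5.63
        nu = kp * eps + ki * psi          # Eq. 5.64
        te_k[k] = k_t * nu
        j_abs += n * (te - te_k[k])       # Eqs. 5.73 and 5.75
    return te_k, dt_k

# Illustrative numbers: 100 MHz reference, N = 10, 130 kHz/LSB DCO
t_ref, n = 10e-9, 10
k_t = 130e3 * (t_ref / n) ** 2            # KT ~ Kdco * Tout^2, about 0.13 ps per LSB
te = 0.5e-12                              # hypothetical initial period error
te_k, dt_k = simulate(te, n, t_ref, k_t, 3, 1 / 32, q_bbpd, 20000)
print(te_k[-5000:].mean() / te)           # approaches 1 once the loop has locked
```

Since the sum of jper over any window equals the change in jabs divided by N, the mean period correction must approach Te once jabs stops growing, which gives a simple lock check independent of the detector type. Passing `lambda dt: q_mpbbd(dt, t_ref)` as `quantizer` swaps in the multi-level detector.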

[Figure 5.8 block diagram: Te minus the delayed feedback output period correction (KT·ν through z⁻¹) forms jper; N/(1 − z⁻¹) accumulates jabs; W() wraps it to ∆t; Q() gives ε; the integrator 1/(1 − z⁻¹) gives ψ; Kp·ε + Ki·ψ gives ν]

Figure 5.8: Discrete-time model of phase error development for fast evaluation of DPLL.

As an example, assume a particular DPLL has the following loop parameters: Kp = 3, Ki = 0.08, and Kdco = 130 kHz, while the reference frequency is 1 GHz and the initial frequency error is 3 MHz. The integral path period correction, KiKTψ, then settles after about 13.4 µs when a BBPD is used. In contrast, the DPLL takes only 2.6 µs to settle if the MPBBD is used, as shown in Fig. 5.9(a). Furthermore, the trajectory of the wrapped absolute jitter, ∆t = W(jabs), vs. the integral path period correction, KiKTψ, gives a compact visualization of the cycle-slipping process, as shown in Fig. 5.9(b). It shows that the BBPD-DPLL went through 23 cycle slips before starting the phase-locking stage, while the MPBBD-DPLL experienced only one cycle slip given the same initial frequency error.

5.4.2 Analogy between DPLL and ∆Σ modulator

It is well known that a ∆Σ modulator may suffer from limit cycles, where its output bits exhibit a repeating pattern. Limit cycles are typically prevented by adding a random signal just prior to quantization. Several researchers, e.g., [59], [56], and [51], have studied the analogy between a BB-PLL and a ∆Σ modulator. They found that a BB-PLL can also exhibit spurs caused by limit cycles, and that these can be eliminated by applying a very small dither or perturbation prior to the quantizer, as in a ∆Σ modulator. This section uses this analogy to find the maximum frequency offset that a BB-DPLL can compensate for without cycle slipping, i.e., the acquisition frequency range. This is achieved by relating the cycle-slipping phenomenon in a BB-DPLL to overloading the quantizer in a ∆Σ modulator. Furthermore, the benefit of using a MPBBD instead of a BBPD to extend the acquisition range is shown.

[Figure 5.9(a) plot: Integral Path Period Correction, KiKTΨ (ps), vs. Time (µs)]
(a) Integral path output (expressed by the amount of output period correction)

[Figure 5.9(b) plot: Phase Error (Degree) vs. Integrator Path Period Correction, KiKTΨ (ps)]
(b) A trajectory of phase error vs. period correction, which emphasizes cycle-slip occurrence as well as the speed of frequency acquisition

Figure 5.9: Integral path output and cycle-slip trajectory for BBPD (blue triangles) and MPBBD (red squares) based DPLL, when the frequency offset is 3 MHz (3% frequency error while Kp = 3 and Ki = 1/32).

If the input of a ∆Σ modulator exceeds a specific limit, the quantizer is overloaded. In this case, the ∆Σ modulator becomes unstable and loses its ability to push the quantization noise to high frequencies. For a PLL, the previous section showed that the phase error is implicitly wrapped. In other words, once the phase error exceeds certain limits (±π), it wraps around; rather than overloading the quantizer, it causes cycle slipping. Accordingly, there is an analogy between the maximum input to a ∆Σ modulator before the quantizer overloads and the maximum frequency offset before the DPLL experiences cycle slipping. To quantify this analogy and draw conclusions from it, I will develop an equivalent ∆Σ representation of a DPLL with a quantized phase detector.

[Figure 5.10 block diagrams: (a) input Te/(KpKT), accumulator NKpKT/(1 − z⁻¹), quantizer Q() producing ε, fed back through z⁻¹; (b) the same loop with an added integrator 1/(1 − z⁻¹) producing ψ, weighted by Ki/Kp]
(a) First order ∆Σ modulator when Ki = 0
(b) Second order ∆Σ modulator when Ki > 0

Figure 5.10: Equivalent ∆Σ representation of DPLL with a quantized phase detector.

When Ki = 0, Eq. 5.81 reduces to a state-space representation of a first order ∆Σ modulator with a constant input Te/(KpKT), representing the initial output period error scaled to the proportional period correction. Fig. 5.10(a) shows this equivalent representation. Note that the wrapping function is not shown, since the model is only valid for inputs that do not overload the quantizer (which could be binary or multi-level).

$$\Phi = \frac{N}{1 - z^{-1}}\left(T_e - z^{-1}K_TK_pQ(\Phi)\right) \tag{5.82}$$

$$= \frac{NK_TK_p}{1 - z^{-1}}\left(\frac{T_e}{K_TK_p} - z^{-1}Q(\Phi)\right) \tag{5.83}$$

When Ki > 0, it can be shown that Eqs. 5.79 to 5.81 reduce to a state-space representation of a second order ∆Σ modulator with constant input Te/(KpKT). Note that the input is not exactly constant but varies over time according to the period jitter and operating region. However, this variation is very small compared with the clock period, and it helps to break up limit cycles if they appear. Finally, rearranging the blocks yields the equivalent model in Fig. 5.10(b).

A binary BBPD-DPLL is analogous to a ∆Σ modulator with a binary quantizer, while a MPBBD-DPLL is analogous to a ∆Σ modulator with a multi-bit quantizer, which accounts for its improved stability and locking behavior. The ∆Σ modulator literature provides limits on the maximum input before the quantizer overloads. For a multi-bit ∆Σ modulator with an M-step quantizer, the modulator is guaranteed not to overload for any input u such that [60]

$$\max|u| \le M + 2 - \mathrm{NTF}(z = -1) \tag{5.84}$$

where NTF represents the modulator's noise transfer function. For the equivalent ∆Σ model of a DPLL shown in Fig. 5.10(b), the NTF is

$$\mathrm{NTF}(z) = \frac{(1 - z^{-1})^2}{(1 - z^{-1})^2 + NK_TK_pz^{-1}(1 + K_i/K_p - z^{-1})} \approx 1 \tag{5.85}$$

The criterion in Eq. 5.84 can be used to theoretically define the maximum initial period error, or equivalently the maximum initial frequency error (acquisition frequency range), for which a DPLL will not experience any cycle slips.

The input limit defined in Eq. 5.84 is the ratio of the initial period error to the instantaneous period correction provided by the proportional path, i.e., u = Te/(KTKp). Accordingly, max|u| ≤ 1 + 2 − 1 = 2 when a BBPD is used, while max|u| ≤ 4 + 2 − 1 = 5 when a MPBBD with eight levels (±1 to ±4) is used. Hence, a BBPD-DPLL will not experience cycle slipping as long as the initial output period error is less than twice the instantaneous period correction provided by the proportional path, i.e., Te ≤ 2KTKp.⁷ For a MPBBD, Te ≤ 5KTKp, which is 2.5 times larger than for a binary BBPD. This factor is equal to the average gain of the MPBBD, KPD.⁸ Accordingly, employing a MPBBD rather than a BBPD improves the frequency acquisition range by a factor of KPD.
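The bound of Eq. 5.84 and footnote 7 translate directly into a maximum no-slip frequency error. The minimal Python sketch below assumes NTF(z = −1) ≈ 1, as in Eq. 5.85, and expresses the 130 kHz/LSB DCO gain in Hz; the function names are hypothetical.

```python
def max_no_slip_input(m_steps, ntf_at_minus_one=1.0):
    """Eq. 5.84: largest |u| guaranteed not to overload an M-step quantizer."""
    return m_steps + 2 - ntf_at_minus_one

def max_no_slip_freq_error(m_steps, kp, kdco_hz):
    """Footnote 7: u = w_err / (Kp*Kdco), so f_err <= max|u| * Kp * Kdco (in Hz)."""
    return max_no_slip_input(m_steps) * kp * kdco_hz

bbpd = max_no_slip_freq_error(1, 3, 130e3)    # binary detector, M = 1
mpbbd = max_no_slip_freq_error(4, 3, 130e3)   # eight-level detector, M = 4
print(bbpd, mpbbd, mpbbd / bbpd)              # 780000.0 1950000.0 2.5

u = 2.5e6 / (3 * 130e3)                       # 2.5 MHz error scaled to Kp*Kdco, about 6.41
```

Eq. 5.84 is a sufficient condition and therefore conservative: the simulated limits of u ≈ 2.95 (BBPD) and 7.05 (MPBBD) reported in the next paragraph exceed 2 and 5, while their ratio stays close to KPD = 2.5.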

Simulations lead to a similar conclusion. A BBPD-DPLL with Kp = 3 and Kdco = 130 kHz does not experience cycle slipping as long as Te ≤ 2.95KTKp (equivalent to a 1.15 MHz frequency error). Replacing the BBPD with a MPBBD ensures that the DPLL does not experience cycle slipping as long as Te ≤ 7.05KTKp (equivalent to a 2.75 MHz frequency error). Based on these simulations, the improvement factor of using a MPBBD instead of a BBPD is 7.05/2.95 = 2.39, which is very close to the predicted improvement of KPD = 2.5. Fig. 5.11 compares the locking time and behavior of BBPD- and MPBBD-based DPLLs when the frequency error is 2.5 MHz. The BBPD-DPLL experienced 12 cycle slips before locking (u = 2.5 MHz/(3 × 130 kHz) = 6.41 > 2.95), while the MPBBD-DPLL does not slip at all (u = 6.41 < 7.05), which greatly improves the locking time.

⁷Note that u = Te/(KpKT) ≡ ωerr/(KpKdco) ⇒ Te ≤ 2KTKp, or ωerr ≤ 2KpKdco, to avoid cycle slipping.

⁸To find the frequency locking time, the lower limit of integration in Eq. 5.50 and Eq. 5.53 is chosen to be twice the phase correction provided by the loop filter, i.e., 2φp/Tr = 2KpKdcoKPD. This choice is based on the analogy discussed in this section.

[Figure 5.11 plots: (a) BBPD output and (b) MPBBD output, Phase Detector Output vs. Time (µs); (c) integral path output (BBPD) and (d) integral path output (MPBBD), Integral Path Period Correction (ps) vs. Time (µs); (e) trajectory of phase error vs. period correction (BBPD) and (f) trajectory of phase error vs. period correction (MPBBD), Phase Error (Degree)]

Figure 5.11: Transient simulation comparison between BBPD and MPBBD based DPLL, when the frequency offset is 2.5 MHz (2.5% frequency error while Kp = 3 and Ki = 1/32).

5.4.3 Improved MPBBD (IMPBBD) without cycle slipping to accelerate Frequency Acquisition

The previous sections demonstrated that using a MPBBD instead of a BBPD in a PLL greatly improves the locking time, by a factor of K²PD, while it improves the pull-in range only by a factor of KPD. In either case, the PLL still encounters cycle slipping, which slows down the locking process and restricts the pull-in range. During each cycle-slip period, the frequency error increases during the first half of the period and then decreases during the second half. Based on this understanding, providing a frequency correction in the right direction at every time step will break the cycle-slipping phenomenon.

Similar to the idea of a frequency rotator presented in [60] and [61], Fig. 5.12 shows a modified transfer function for a MPBBD, obtained by merely reversing the sign of one half of the original transfer function according to the sign of the frequency error. Owing to the digital nature of DPLLs, the MPBBD transfer function is usually implemented as a look-up table (LUT). Hence, the transfer function can easily be modified by changing the associated LUT.

To implement this modification, a simple finite-state machine (FSM) is needed along with a digital differentiator (i.e., 1 − z⁻¹) to find the sign of the frequency error. The improved MPBBD (IMPBBD) extends the pull-in range to ±fref and achieves a much faster locking time, as shown in Fig. 5.13. In [62], a modified bang-bang algorithm with an FSM is presented to speed up the locking of a high-speed DPLL. However, the proposed IMPBBD architecture enhances both the locking time and the pull-in range without increasing circuit complexity.
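The FSM itself is not detailed here, so the Python sketch below is only one hypothetical realization of the idea. It maps each MPBBD level to a circular 45-degree bin, applies a 1 − z⁻¹ difference to that bin index to see which way the phase error is rotating, and, after `hold` consecutive steps in the same direction (a hypothetical parameter), folds the LUT so every output takes the sign of the detected frequency error, in the spirit of the outer panels of Fig. 5.12. Any other step restores the original LUT.

```python
def level_to_bin(level):
    """Map an MPBBD level (+1..+4, -4..-1) to a circular 45-degree bin index 0..7."""
    return level - 1 if level > 0 else level + 8

def impbbd(levels, hold=3):
    """Hypothetical IMPBBD: fold the LUT toward the detected frequency-error sign.

    The (1 - z^-1) difference of the circular bin index tells in which direction
    the phase error is rotating. After `hold` consecutive steps in the same
    direction, every output is forced to that sign; any other step restores the
    original MPBBD LUT.
    """
    out, prev_bin, run, direction = [], None, 0, 0
    for level in levels:
        b = level_to_bin(level)
        if prev_bin is not None:
            step = (b - prev_bin + 4) % 8 - 4   # circular difference in -4..3
            sign = (step > 0) - (step < 0)
            if sign != 0 and sign == direction:
                run += 1
            else:
                direction, run = sign, (1 if sign != 0 else 0)
        prev_bin = b
        if run >= hold:
            out.append(direction * abs(level))  # folded LUT (large frequency error)
        else:
            out.append(level)                   # original LUT (phase error, small frequency error)
    return out

rotating = [1, 2, 3, 4, -4, -3, -2, -1] * 2    # phase error rotating: large frequency error
toggling = [1, -1] * 4                         # bang-bang toggling in lock
print(impbbd(rotating))
print(impbbd(toggling))
```

With a rotating phase error, the folded outputs stop alternating in sign, so the integral path keeps correcting in one direction instead of undoing half of its progress in every slip; in lock, the ±1 toggling never triggers the fold.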



Fig. 5.14 shows the phase error in degrees vs. the integral path period correction for BBPD-, MPBBD-, and IMPBBD-based DPLLs when the initial frequency error is 7.5 MHz. Fig. 5.14(a) shows that the BBPD-DPLL does not converge to a stable state but instead exhibits limit cycles. The MPBBD-DPLL converges to the right frequency correction, but only after many cycle slips, as shown in Fig. 5.14(b). Finally, the IMPBBD-DPLL converges to the right frequency correction after semi-slipping for only 8 cycles, as shown in Fig. 5.14(c). Another set of plots of the MPBBD and IMPBBD cases, for the same experiment and conditions, is shown in Fig. 5.15.

5.4.4 Verilog-A Simulation of DPLL

All the simulations presented in this section are based on the MATLAB/Simulink realization of the state-space model shown in Fig. 5.8, where the state variables are updated at the beginning of each reference period. To validate the accuracy of that model, a time-step simulation is conducted in a Cadence environment using a Verilog implementation of the DPLL, along with Verilog-A models of the DCO and reference clock that capture their jitter performance. The time step for the Verilog simulation is 1/100 of the output DCO period (equivalent to 1/1000 of the reference period in this case). As a result, the time-step simulation is three orders of magnitude slower than the MATLAB/Simulink simulation, though both simulations show very similar locking time and pull-in range. For example, Fig. 5.16 shows the locking behavior of BBPD- and MPBBD-based DPLLs using Verilog-A simulation when the initial frequency error is 6.0 MHz.

[Figure 5.12: three LUT transfer functions, LUT Output (−4 to +4) vs. φe with breakpoints at ±¼Y, ±½Y, ±¾Y, and ±Y, labeled "Large positive frequency error", "Phase error and small frequency error", and "Large negative frequency error"]

Figure 5.12: Modification of the MPBBD transfer function to extend pull-in range and reduce acquisition time. The improved MPBBD (IMPBBD) identifies the sign of the initial frequency error and accordingly changes its transfer function.

5.5 Implemented Architecture

This section presents a silicon implementation of a DPLL with a MPBBD to verify some of the theories and conclusions in this chapter. The implemented architecture, shown in Fig. 5.17, is based on an 8-level MPBBD that samples the outputs of a multi-phase oscillator. In [63], a multi-phase oscillator is used to synthesize simple fractional channels (like 1/2, 1/4, 1/8) by making use of the implicit TDC formed by the oscillator and a MPBBD. In the proposed architecture, however, a MPBBD is used for automatic gear shifting to accelerate phase and frequency locking compared with a binary BBPD. Furthermore, the MPBBD gives the DPLL the ability to track large frequency disturbances after lock without involving the FLL. By contrast, a binary BBPD slews and may take a long time to recover from a large frequency disturbance, if it recovers at all. In [64], a multi-phase detector is employed where the clock phases are generated locally to allow in-loop modulation. Our proposed MPBBD has a wider lock range and is simpler to implement.

[Figure 5.13 plot: Locking Time (µs) vs. Offset Frequency (%)]

Figure 5.13: Pull-in range and locking time of BBPD (blue ∗), MPBBD (red), and IMPBBD (green •) based DPLL. The lock-in range of the IMPBBD is extended to ±fref (fref is 100 MHz and fout is 1 GHz while Kp = 3 and Ki = 1/32).

[Figure 5.14 plots: Phase Error (Degree) vs. KiKTΨ for each detector; panel (a) marks the convergence point]
(a) BBPD-based DPLL; the frequency offset is larger than the pull-in range, so the DPLL exhibits a limit cycle without converging to the right frequency
(b) MPBBD-based DPLL has a larger pull-in range and faster acquisition time compared with the BBPD-based DPLL
(c) IMPBBD-based DPLL has an extended pull-in range and very fast acquisition time

Figure 5.14: Integral path output and cycle-slip trajectory for DPLL with three different phase detectors (frequency offset is 7.5 MHz and fref is 100 MHz while Kp = 3 and Ki = 1/32).

[Figure 5.15 plots: Phase Detector Output, Integral Path Period Correction (ps), and Period Jitter (ps) vs. Time (µs)]
(a) MPBBD output (b) IMPBBD output (c) Integral path output (MPBBD) (d) Integral path output (IMPBBD) (e) Period jitter (MPBBD) (f) Period jitter (IMPBBD)

Figure 5.15: Transient simulation comparison between MPBBD and IMPBBD based DPLL (frequency offset is 7.5 MHz and fref is 100 MHz while Kp = 3 and Ki = 1/32).

[Figure 5.16 plot: Integral Path Output (LSB × 130 kHz) vs. Time (µs)]

Figure 5.16: Integral path output of BBPD (dark blue) vs. MPBBD (light red) based DPLL (frequency offset is 6.0 MHz). The simulation employs uniform time-step sampling (1/100 of DCO period) using a Verilog-A implementation of the DPLL.

In the proposed architecture, an auxiliary frequency lock loop (FLL) is also employed to extend the lock-in range beyond ±fref. The FLL is enabled upon reset of the DPLL and guarantees correct frequency operation without degrading the jitter performance during lock. The FLL is composed of a re-timing circuit, a counter that outputs the number of output clocks within each reference cycle, an accumulator for the phase of the synthesized channel based on a given frequency control word (FCW), and finally a digital subtractor. The FLL controls a 7-bit coarse capacitor bank to bring the four-stage digitally controlled ring oscillator (DCO) as close as possible to the required output frequency. The FLL is then disabled once frequency locking is achieved. Accordingly, the proposed DPLL architecture saves the power of the high-speed feedback counter as well as the power of the re-timing circuit during steady-state operation (when the output clock is phase- and frequency-locked to the reference clock).

[Figure 5.17 block diagram: MPBBD sampling the four differential DCO phases (A, B, C, D) with Ref; LUT and gear shifting feeding the loop filter; FLL with counter, FCW accumulator, and subtractor driving the 7-bit coarse bank; lock detector freezing the FLL]

Figure 5.17: The architecture of the implemented DPLL. The FLL and the high-speed counter, as well as the synchronization structure, are disabled by a lock detector once frequency lock occurs. The MPBBD locks the phase of the output clock (phase A) to the reference clock.

In a classical PLL, whether analog or digital, the feedback divider's phase noise appears at the output amplified by a factor of N². On the other hand, using a sub-sampling phase detector [65] removes the N² amplification of the input-referred noise and eliminates the divider noise entirely. The reference clock phase noise, however, is still multiplied by N² when transferred to the output.
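In logarithmic units, multiplying the phase-noise power by N² adds 20·log10(N) dB. The tiny Python helper below makes this arithmetic explicit; the −150 dBc/Hz reference level is a hypothetical input, and N = 10 corresponds to the 100 MHz to 1 GHz case used earlier in this chapter.

```python
import math

def ref_noise_at_output(l_ref_dbc_hz, n):
    """In-band reference phase noise referred to the output: power scales by N^2."""
    return l_ref_dbc_hz + 20 * math.log10(n)

# Illustrative: 100 MHz reference multiplied to 1 GHz (N = 10)
print(ref_noise_at_output(-150.0, 10))   # -130.0 dBc/Hz
```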

The presented MPBBD is similar to the sub-sampling detector in that the oscillator output is sub-sampled by the reference clock and no divider is used during phase lock. Accordingly, the phase noise of the proposed DPLL is independent of the frequency control word (FCW), i.e., the multiplication factor. The main sources of in-band noise are the reference clock noise and the noise on the power supplies.


A lock detector circuit continuously checks the output patterns of the MPBBD and the FLL to determine whether the DPLL is frequency and phase locked. During steady state, the FLL is disabled. Compared with a regular binary BBPD, the MPBBD is able to track larger phase or frequency errors without reactivating the FLL and without slewing for a long time. However, if the MPBBD output slews for a long time due to a very large frequency error, the FLL is enabled again until locking is achieved.

5.5.1 DCO

The DCO is a four-stage differential ring oscillator, as shown in Fig. 5.19, where each stage has a 7-bit coarse capacitor bank and an 8-bit fine capacitor bank. The 4 MSBs of the fine bank are binary encoded, while the 4 LSBs are thermometer encoded to reduce switching activity during locking. The tuning capacitors are implemented as switched active MOS devices, as shown in Fig. 5.18. The layout of the DCO is highly regular, so automated layout using, for example, TCL scripts is possible if a highly automated design flow is sought.

The DCO output is a rail-to-rail clock signal, and different frequencies are achieved by adding or removing MOS capacitors. Accordingly, the power consumption, which is proportional to f·C·V², stays fairly constant regardless of the DCO frequency, since a larger switched capacitance lowers the frequency. The power consumption could be reduced by using a current-steering programmable DCO. In this design, the mismatches between the DCO phases are small compared with one quarter of a DCO cycle. In general, phase mismatch or duty-cycle distortion could affect the speed of locking under a large frequency disturbance, but it would not affect the steady-state jitter performance, since the loop then operates effectively as a BBPD.

5.5.2 MPBBD

The eight output phases of the DCO (four differential phases) are sampled by the reference clock. This results in a 4-bit output stream that carries information about both the sign and the magnitude of the phase error, as shown in Fig. 5.19 and Fig. 5.20. The raw output of the MPBBD is semi-thermometer encoded and is passed through a LUT to generate a binary representation of the phase error magnitude and sign: -4, -3, -2, -1, +1, +2, +3, or +4, as shown in Fig. 5.20. For example, if the output of the MPBBD is [ACBD] = 0110, then the reference clock is leading the output clock by more than 90 degrees but less than 135 degrees. The corresponding LUT output is therefore -3, providing a larger phase correction signal to the DPLL loop filter. The loop filter is programmable, so the gains of both the proportional and integral paths can be altered to achieve a specific performance.

[Figure 5.18 schematic: differential delay cell (In+/In−, out+/out−) with FLL-set coarse capacitors C<6:0> and BB-PLL-set fine capacitors F<7:4> and T<15:1>; one F<4> unit equals 16 T<i> units]

Figure 5.18: The programmable delay unit used to form a four-stage DCO. Each unit has a 7-bit coarse capacitor configuration and an 8-bit fine capacitor bank implemented as a combination of a 4-bit binary bank along with 15 thermometer-coded capacitors.
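The LUT of Fig. 5.20 amounts to a small table lookup. The Python sketch below encodes the eight codes listed in that figure (bit order [A C B D]); rejecting the other eight 4-bit codes, for example bubbles caused by metastability, is an assumption of this sketch rather than a documented behavior of the implemented LUT, and the names are hypothetical.

```python
# Fig. 5.20: sampled flip-flop outputs in [A C B D] order -> MPBBD level
MPBBD_LUT = {
    "0100": -4, "0110": -3, "0010": -2, "0011": -1,
    "1011": +1, "1001": +2, "1101": +3, "1100": +4,
}

def decode(acbd, binary=False):
    """Return the MPBBD level (or the BBPD sign when binary=True) for a 4-bit word."""
    level = MPBBD_LUT.get(acbd)
    if level is None:
        # Assumption: codes absent from Fig. 5.20 are treated as invalid samples
        raise ValueError(f"code {acbd!r} is not in the LUT")
    return (1 if level > 0 else -1) if binary else level

print(decode("0110"), decode("0110", binary=True))   # -3 -1
```

In this table the BBPD column is simply the sampled value of phase A (0 maps to −1, 1 maps to +1), which is why the MPBBD reduces to a binary BBPD when the phase error stays within ±45 degrees.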

During steady-state operation, assuming the phase error remains within ±π/4 (±45 degrees), the MPBBD alternates between +1 and -1 with the same dynamics as a PLL employing a simple binary BBPD. However, during initial locking, when the DCO frequency is not yet locked, the MPBBD output spans the full range from -4 to +4. Similarly, when the input clock is frequency modulated, the MPBBD remains active.

5.5.3 Loop Filter

The loop filter has a 16-bit output, where the 8 MSBs directly drive the DCO fine capacitor bank (a combination of a 4-bit binary-weighted bank and a 15-element unary thermometer bank). The 8 LSBs are treated as fractional bits that represent a frequency step smaller than one DCO LSB step. To realize a finer DCO frequency resolution, these 8 LSBs are fed to a first-order noise-shaping delta-sigma modulator (DSM) that drives one LSB capacitor at one fourth of the output frequency, i.e., the DSM clock ranges from 150 to 375 MHz.
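A first-order DSM of this kind is commonly built as an accumulator whose carry-out drives the extra LSB capacitor. The Python sketch below assumes that structure (the circuit-level implementation is not detailed here) and an 8-bit fractional word; `first_order_dsm` is a hypothetical name.

```python
def first_order_dsm(frac, n_cycles, bits=8):
    """First-order (accumulator carry-out) delta-sigma modulator.

    `frac` is the fractional word from the loop-filter LSBs (0..2**bits - 1); the
    returned 0/1 stream toggles one extra LSB capacitor so that its average is
    frac / 2**bits, with the quantization error first-order noise-shaped.
    """
    full = 1 << bits
    acc, stream = 0, []
    for _ in range(n_cycles):
        acc += frac
        if acc >= full:          # carry out: switch the extra LSB capacitor this cycle
            acc -= full
            stream.append(1)
        else:
            stream.append(0)
    return stream

print(first_order_dsm(128, 8))               # [0, 1, 0, 1, 0, 1, 0, 1]
print(sum(first_order_dsm(77, 256)) / 256)   # 0.30078125, i.e. 77/256 of one LSB
```

Because the accumulator residue is carried forward, the carries are spread as evenly as possible in time, which pushes the fractional-frequency quantization error toward high offset frequencies where the loop filters it.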

124


A A

B

C

D

B

C

D

A B C D

A B C D

+ -- +

+ -- +

+ -- +

+ -- +

Ref

A

B

C

D

A

B

C

D

0

0

1

0

1

1

0

1

Ref

+ve Φe

-ve Φe

MPBBD Outputs

+1

+2+3

+4

-1

-2-3

-4

Figure 5.19: Timing diagram of the multi-phase DCO sampled by a reference clock atsome point. Based on the sequence of MPBBD outputs (01111000), the LUT provides anindication of the phase error magnitude between phase A and reference clock as shownin the circle on the right bottom side.

FF outputs (A C B D)   MPBBD (LUT)   BBPD
0 1 0 0                -4            -1
0 1 1 0                -3            -1
0 0 1 0                -2            -1
0 0 1 1                -1            -1
1 0 1 1                +1            +1
1 0 0 1                +2            +1
1 1 0 1                +3            +1
1 1 0 0                +4            +1

[Plot: LUT output (-4 to +4) versus phase error φe (-π to +π in steps of π/4).]

Figure 5.20: The transfer function of the MPBBD and its LUT (thick solid blue) vs. BBPD (thin dashed red).


5.5.4 Simulation Results

[Plot: absolute phase error (ps) vs. time (µs).]

Figure 5.21: Absolute phase error of the DPLL output clock with respect to an ideal clock with (a) a binary BBPD (thin dashed blue) and (b) an MPBBD (thick solid red). The initial frequency error is 300 MHz. Data before 6.5 µs is clipped because a reset was applied at that moment, after loading the correct loop configuration. The FLL takes 15.5 µs (310 reference cycles), while the PLL takes 30 µs (600 cycles) when the BBPD (a) is used and 5 µs (100 cycles) when the MPBBD (b) is employed.

The DPLL settling time is inversely proportional to the loop bandwidth. When an MPBBD is employed, the DPLL loop bandwidth increases by a factor of 2.5, on average, during initial frequency and phase locking (compared with a binary BBPD offering the same jitter performance in lock). Since the locking time under a large initial frequency error improves with the square of the average phase-detector gain (as summarized in Section 5.6), the DPLL can be expected to lock faster by a factor of up to approximately $2.5^2 = 6.25$. Fig. 5.21 shows a behavioral simulation of the absolute phase error of the output clock converging to a very small value (ideally zero) after phase lock. The speed of convergence (an indication of the settling time) depends on the type of phase detector used and on the initial frequency error. If a binary BBPD is used and the initial frequency error is around 300 MHz (34% locking range), phase lock takes around 30 µs. In that case, the BBPD slews for a long time before finding the proper code to drive the DCO, as shown in Fig. 5.22(a). In contrast, with the MPBBD only 5 µs is needed to achieve phase lock given the same frequency error and the same initial conditions.
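The expected and simulated speedups can be checked with a few lines of arithmetic. The sketch below assumes, as stated above, that the locking time scales with the inverse square of the average detector gain, and it uses the 20 MHz reference of the prototype (Section 5.5.5) to convert times into reference cycles; the helper names are hypothetical.

```python
F_REF_HZ = 20e6  # reference frequency of the prototype (Section 5.5.5)


def expected_speedup(gain_ratio: float) -> float:
    """Locking-time improvement if lock time scales as 1 / K_PD**2."""
    return gain_ratio ** 2


def to_reference_cycles(t_seconds: float, f_ref: float = F_REF_HZ) -> float:
    """Convert a time interval into reference-clock cycles."""
    return t_seconds * f_ref


if __name__ == "__main__":
    t_bbpd, t_mpbbd = 30e-6, 5e-6    # simulated lock times (Fig. 5.21)
    print(expected_speedup(2.5))       # 6.25, predicted upper bound
    print(t_bbpd / t_mpbbd)            # simulated ratio, about 6
    print(round(to_reference_cycles(t_bbpd)), round(to_reference_cycles(t_mpbbd)))
```

The simulated ratio of 30 µs to 5 µs (a factor of 6) is close to the predicted upper bound of 6.25, and the cycle counts (600 and 100) match those quoted in the caption of Fig. 5.21.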

(a) Binary BBPD with maximum gain of ±1 [plot: BBPD output vs. time (µs)]

(b) MPBBD with maximum gain of ±4 [plot: MPBBD output vs. time (µs)]

Figure 5.22: The mapped output of the bang-bang detector (from the LUT) during frequency and phase lock. The binary BBPD slews when the phase error is large and takes a long time to recover, whereas the MPBBD automatically shifts its gain according to the phase-error magnitude until lock is achieved.

5.5.5 Measurement Results

A prototype chip was designed and fabricated in the STM 28 nm CMOS LP process; Fig. 5.25 shows a die photograph. The active area is less than 0.008 mm² including the decoupling capacitors and output buffers. The reference clock is a high-quality off-chip 20 MHz temperature-compensated crystal oscillator from Wenzel Associates, rated at 1 ppm, with -143 dBc/Hz phase noise at 1 kHz offset.

The measured coarse DCO step is around 2.5 MHz, while the fine DCO step is 13 kHz on average. Fig. 5.23 overlays the phase-noise spectrum of a 1.2 GHz PLL output simulated with Verilog-A and MATLAB on the spectrum measured with an Agilent spectrum analyzer. The in-band phase noise is -98.38 dBc/Hz at 50 kHz offset, and the out-of-band phase noise is -142 dBc/Hz at 500 MHz offset. Switching the FLL on or off has a negligible effect on the spectrum. Fig. 5.24 shows the phase-noise spectrum of a 1.4 GHz PLL output captured with an Agilent spectrum analyzer; the in-band phase noise is -96.46 dBc/Hz at 100 kHz offset, and the out-of-band phase noise is -143 dBc/Hz at 500 MHz offset.
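The measured step sizes imply two useful design numbers: the span of the 8-bit fine bank and the time-averaged resolution after DSM dithering. This Python sketch assumes that the fine step is uniform at its measured average of 13 kHz and that the DSM toggles exactly one fine LSB, as described in Section 5.5.3; the function names are hypothetical.

```python
COARSE_STEP_HZ = 2.5e6  # measured coarse DCO step
FINE_STEP_HZ = 13e3     # measured average fine DCO step
FINE_BITS = 8           # loop-filter MSBs driving the fine bank
FRAC_BITS = 8           # loop-filter LSBs fed to the DSM


def fine_bank_span_hz() -> float:
    """Tuning range of the 8-bit fine bank (codes 0 ... 255)."""
    return (2 ** FINE_BITS - 1) * FINE_STEP_HZ


def dithered_step_hz() -> float:
    """Time-averaged frequency resolution once the DSM dithers one fine LSB."""
    return FINE_STEP_HZ / 2 ** FRAC_BITS


if __name__ == "__main__":
    print(fine_bank_span_hz())  # 3315000.0 Hz, larger than one coarse step
    print(dithered_step_hz())   # 50.78125 Hz
```

Under these assumptions the fine bank spans about 3.3 MHz, more than one 2.5 MHz coarse step, so adjacent coarse codes can overlap; the DSM brings the average resolution to roughly 51 Hz, although the instantaneous steps remain 13 kHz and the dithering adds shaped quantization noise.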

Figure 5.23: DPLL output phase-noise spectrum at 1.20 GHz: simulation (blue) vs. measurement (black) captured by an Agilent E4448A spectrum analyzer. The in-band noise is -98.32 dBc/Hz, while the loop bandwidth is around 1.7 MHz.

The CMOS stages in the DCO inherently have poor power-supply noise rejection and therefore generally must be operated from a regulated supply voltage. No voltage regulator was integrated in the present design, which resulted in higher-than-expected phase noise.

The DPLL locks to the reference over the range 880 MHz to 1.20 GHz with a 1.1 V power supply, and the in-band phase noise is almost the same across the whole locking range. Excluding the DCO, the DPLL consumes 502 µW (1.1 V × 456 µA). Disabling the FLL after locking saves around 85 µW in lock; the savings would be larger if the DCO (and, hence, the frequency counter) operated at a higher frequency. The DCO consumes 2.9 to 3.1 mW depending on the frequency of operation. With a 0.7 V supply, the DPLL operates at 440 MHz, and the DPLL (excluding the DCO) consumes only 64 µW (0.7 V × 91 µA).

Figure 5.24: DPLL output phase-noise spectrum at 1.40 GHz captured by an Agilent E4448A spectrum analyzer. The in-band noise is -96.48 dBc/Hz, while the loop bandwidth is around 1.7 MHz.

Figure 5.25: Die photograph of the DPLL in 28 nm CMOS LP STMicroelectronics technology (active area is less than 0.008 mm²).


5.6 Conclusion

In summary, DPLLs with BBPDs are becoming more widely used than TDC-based DPLLs because of their simplicity and low power consumption. However, binary BBPDs suffer from slewing, which limits their pull-in frequency range and slows the locking process when the initial frequency error is large. A multi-phase bang-bang detector (MPBBD) is proposed to achieve a fast locking time and to extend the pull-in range while reducing slewing and cycle slipping. This is done by realizing automatic gear shifting: the detector produces a large gain for large frequency and phase errors and automatically shifts to a low-gain setting in steady state.

To quantify the advantages of an MPBBD over a BBPD, the PLL's behavior given a large initial frequency error was analyzed. The analysis provides accurate closed-form expressions for the locking time, the number of cycle slips, and the pull-in range of a PLL with a quantized phase detector whose average absolute large-signal gain is $K_{PD}$. The locking time and the number of cycle slips both improve by a factor of $K_{PD}^2$; however, the pull-in frequency range is extended by only $\sqrt{K_{PD}}$.
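The asymmetry between the two scaling laws is easy to see numerically. The sketch below simply evaluates $K_{PD}^2$ and $\sqrt{K_{PD}}$; using 2.5 (the average bandwidth increase quoted in Section 5.5.4) as a stand-in for $K_{PD}$ is an illustrative choice, and the function names are hypothetical.

```python
import math


def mpbbd_scaling(k_pd: float) -> dict:
    """Improvement factors relative to a binary BBPD (K_PD = 1)."""
    return {
        "locking_time": k_pd ** 2,        # also the reduction in cycle slips
        "pull_in_range": math.sqrt(k_pd),
    }


def k_pd_for_pull_in(factor: float) -> float:
    """Average gain needed to widen the pull-in range by `factor`."""
    return factor ** 2


if __name__ == "__main__":
    print(mpbbd_scaling(2.5))   # locking 6.25x faster, pull-in about 1.58x wider
    print(k_pd_for_pull_in(2))  # 4
```

Doubling the pull-in range therefore requires a four-fold average gain, whereas the same gain would shorten the locking time sixteen-fold.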

An improved MPBBD is then proposed that extends the pull-in range beyond this $\sqrt{K_{PD}}$ improvement, up to $\pm f_{ref}$, by modifying the MPBBD transfer function according to the sign of the frequency error. This modification can enable the use of the improved MPBBD in high-speed DPLLs without a dedicated frequency-locking aid and without a frequency divider.

An analogy between a BB-DPLL and a ΔΣ modulator is also presented, in which the first cycle slip in the BB-DPLL corresponds to overloading the quantizer in a ΔΣ modulator. Accordingly, the maximum initial frequency error that causes no cycle slipping (i.e., the acquisition frequency range) is found using limits well known in the ΔΣ literature. Compared with a BBPD-DPLL, the MPBBD-DPLL can handle a $K_{PD}$ times larger initial frequency error without cycle slipping.
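The ΔΣ analogy can be illustrated with a toy first-order ΔΣ loop. This sketch is not the thesis's derivation: it assumes a unit-delay integrator whose state stands in for the normalized phase error, a constant input standing in for the normalized frequency error, and quantizer thresholds at integer values of the state. It shows the familiar property that a first-order ΔΣ loop stays bounded only while |u| stays below the quantizer full scale, so raising the full scale from 1 (binary) to 4 (MPBBD-like levels) raises the overload-free input range by the same factor.

```python
import math


def dsm_peak_state(u: float, max_level: int, n_steps: int = 200) -> float:
    """Peak |state| of a toy first-order delta-sigma loop.

    u          constant input (stands in for the normalized frequency error)
    max_level  quantizer full scale: 1 ~ binary BBPD, 4 ~ MPBBD-like levels
    """
    v = 0.0      # integrator state (stands in for the normalized phase error)
    peak = 0.0
    for _ in range(n_steps):
        level = min(max_level, math.floor(abs(v)) + 1)  # levels 1..max_level
        q = level if v >= 0 else -level                 # quantizer output
        v += u - q                                      # integrate input minus feedback
        peak = max(peak, abs(v))
    return peak


if __name__ == "__main__":
    for u, m in [(0.9, 1), (1.5, 1), (3.5, 4), (4.5, 4)]:
        print(f"u = {u}, full scale {m}: peak state {dsm_peak_state(u, m):.1f}")
```

A bounded peak corresponds to acquisition without overload (no cycle slip in the PLL picture), while a peak that grows with the number of steps indicates quantizer overload.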

The chapter concludes by presenting a silicon implementation of an MPBBD-DPLL in the STM 28 nm LP CMOS process. The MPBBD-DPLL has a much faster locking time and a wider pull-in range than a BBPD-DPLL, while offering the same steady-state jitter performance as a BBPD-based DPLL with little design overhead: the modification requires only three additional flip-flops and a small LUT. The presented architecture also disables the high-speed logic used for frequency counting once frequency lock is achieved, saving power during steady-state operation.


Chapter 6

Conclusion

DPLLs have drawn a great deal of interest over the last decade. The main driver for research in DPLLs is the continued transition from one CMOS technology node to the next, due to the pressure of integrating ever more complex system-on-chip (SoC) designs. While analog PLL design has become more complicated with technology scaling, DPLLs take advantage of scaling to realize phase detection and frequency tuning with better resolution. Furthermore, the DPLL design cycle can be automated to a great extent by using well-developed software tools and methodologies for digital design and verification.

This thesis investigates three problems related to phase and frequency detection in

DPLLs. The first part of the thesis deals with the reduction of the quantization noise

of TDCs to enable wide bandwidth operation and low fractional spurs. The second

part proposes different digital solutions to the dead-zone behavior during integer mode

operation where the DPLL dynamics could be unpredictable and dependent on initial

conditions. Finally, an analysis of the pull-in and locking behavior of a DPLL with a

quantized phase detector is presented, and an improvement is proposed.

The next section summarizes the contributions of the thesis and lists the publications that arose in the course of this research. The thesis concludes with insights into future research directions in the area of DPLLs.


6.1 Contributions

This thesis explores the analysis and design of high-performance DPLLs in sub-micron CMOS processes. Chapter 3 presents a fractional DPLL that incorporates a novel low-power two-step coarse-fine TDC to achieve low in-band phase noise. The DPLL employs a stochastic TDC as the fine stage while still achieving a wide locking range using a coarse delay-line TDC. The resulting DPLL achieves performance competitive with state-of-the-art PLLs while preserving the simplicity of the digital DPLL and of the coarse-fine stochastic TDC, bringing DPLL design closer to a fully automated flow. The thesis also provides a statistical analysis of the resolution achievable in practice with a given stochastic TDC, which agrees closely with the measured results. The design incorporates an on-chip calibration algorithm for the coarse TDC based on a balanced mean code density test, which reduces the number of registers required for calibration by 30%. More importantly, the balanced mean code density test removes the need for off-chip calibration or for using the SoC to perform on-chip calibration.

Measurement results of the DPLL show an in-band phase noise of -107 dBc/Hz, which is equivalent to a 4 ps TDC resolution, approximately an order of magnitude better than an inverter delay in this process technology. The integrated random jitter is 213 fs rms for a 2 GHz output carrier frequency with a 700 kHz loop bandwidth. The calibration reduces worst-case spurs by 16 dB. The proposed DPLL consumes only 15.2 mW in 0.13 µm CMOS, of which 4.4 mW is consumed in the TDC. Analysis, simulation, and measurement results for the DPLL are summarized in the following publications:

Samarah, A., Chan Carusone, A. “A Digital Phase-Locked Loop With Calibrated

Coarse and Stochastic Fine TDC”; IEEE Custom Integrated Circuits Conference

(CICC), San Jose, California, September 2012. [16]

Samarah, A., Chan Carusone, A. “A Digital Phase-Locked Loop With Calibrated Coarse and Stochastic Fine TDC”; IEEE Journal of Solid-State Circuits, Volume 48, Issue 8, pp. 1829-1841, Aug. 2013. [8]

Chapter 4 investigates the dead-zone problem during integer-mode operation caused by the quantized response of the TDC. The dead zone results in limit-cycle behavior, causing higher-than-expected in-band phase noise and strong in-band spurious tones. To alleviate this problem, a novel noise-shaped offset is added to the phase error in the digital domain to keep the TDC active and away from the dead zone. Extensive simulations and measurements of a DPLL prototype in a 0.13 µm CMOS process verify the effectiveness of the proposed digital solution. The work is published in the following conference paper:

Samarah, A., Chan Carusone, A. “A Dead-Zone Free and Linearized Digital PLL”;

IEEE International Conference on Electronics, Circuits, and Systems (ICECS),

Seville, Spain, December 2012. [44]

Finally, Chapter 5 presents a rigorous mathematical analysis of a DPLL employing a quantized phase detector during frequency acquisition, when the DPLL usually exhibits cycle slipping. The analysis finds that the pull-in range is proportional to the square root of the phase-detector large-signal gain, $\sqrt{K_{PD}}$, while the locking time is inversely proportional to $K_{PD}^2$. Based on these findings, an MPBBD-DPLL is proposed to accelerate frequency and phase locking and to increase the pull-in range while maintaining the same steady-state performance as a BBPD-DPLL. The proposed DPLL reduces power consumption by disabling the high-speed counter and re-timing circuit in the feedback loop after frequency lock is achieved. An improved version of the MPBBD is also suggested to extend the pull-in range up to the reference frequency, which could eliminate the frequency-locked loop and the feedback counter in DPLLs and digital CDRs. Theoretical findings, as well as simulation and measurement results, are documented in the following publications:

Samarah, A., Chan Carusone, A. “Multi-Phase Bang-Bang Digital Phase Lock

Loop with Accelerated Frequency Acquisition”; IEEE International Symposium on

Circuits and Systems (ISCAS), Lisbon, Portugal, May 2015. [58]

Samarah, A., Chan Carusone, A. “Cycle-Slipping Pull-In Range of Bang-Bang PLLs”; IEEE International New Circuits and Systems Conference (NEWCAS), Grenoble, France, June 2015. [66]

Samarah, A., Chan Carusone, A. “Discrete Time Analysis of Multi-Phase Bang-Bang Phase Lock Loops”; IEEE Transactions on Circuits and Systems, to be submitted, 2016.


6.2 Future Work

Most research in the field of DPLLs focuses on improving the resolution of phase detection and frequency tuning to achieve phase noise and jitter performance competitive with classical analog PLLs. However, this trend relies on power-hungry phase detectors, especially for high-speed operation. There is a need to rethink the DPLL architecture around simple but smart phase detectors, since simple detectors are naturally easy to calibrate and can operate at high speed.

Though the inverter delay gets smaller with CMOS technology scaling, mismatch and PVT variations become worse. Hence, the resolution of a typical inverter-delay-line TDC improves with each node, but the increasing mismatch limits the achievable resolution and necessitates calibrated inverters with a wide calibration range. The calibration complicates the design and further limits the achievable resolution. Researchers may therefore need to take a different approach to minimize TDC quantization noise while keeping DPLL design simple and scalable. One approach is to use a noise-shaped coarse TDC with redundant information for error correction, similar to the error correction implemented in pipelined ADCs.

Another promising research direction is the design of a DPLL compiler, similar to the memory compilers used within modern digital design flows. A DPLL compiler could accept high-level specifications and produce synthesized gate-level Verilog for distribution as intellectual property (IP). The main obstacle to this goal is the difficulty of automating the design of the TDC and DCO blocks because of their analog nature. Although recent works, such as [67], target DCO design automation, there is a long way to go before a robust DCO and TDC compiler can be built for different applications and specifications.


Appendix A

Schematics

CML output buffer

Figure A.1: Schematic of the four-stage, 50 Ω output driver used to send the DCO output off-chip. The last differential pair M4 is sized W = 176 µm/L = 120 nm, while the load resistor R4 = 62.5 Ω. The previous stages are sized as follows: transistor sizes of M4 = 2×M3 = 4×M2 = 8×M1 and resistor values of R4 = R3/2 = R2/4 = R1/8.


CML latch and divide by two

Figure A.2: (a) Schematic of the CML latch used in the divide-by-2 circuit; R = 2 kΩ, and M1 = M2 = M3 have W = 6 µm and L = 120 nm. (b) Schematic of the divide-by-2 built from two CML latches.


CML to CMOS conversion

Figure A.3: Schematic of the CML-to-CMOS conversion employed after the CML divide-by-2. The CML signal is AC-coupled through Cc = 150 fF and then passed to a CMOS inverter with feedback resistor Rf = 35 kΩ to define the input common mode. Small cross-coupled CMOS inverters (W = 160 nm, L = 120 nm) are used to ensure differential operation. Another CMOS inverter stage follows, sized similarly to the first stage (W = 13.02 µm, L = 120 nm).

Package model

Figure A.4: (a) Illustrative diagram of the fabricated chip mounted in a QFN36 package and soldered on the PCB. (b) Lumped model of the output pads, bond wires, lead, and PCB trace capacitance. The chip dimensions are 1 mm × 1 mm, while the QFN36 dimensions are 5 mm × 5 mm. Accordingly, a bond wire could be 2-2.5 mm long, so Lbw = 2.5 nH. The extracted pad capacitance was 90 fF.


DFF used in the coarse TDC

(a) Clocked sense amplifier

(b) Set-reset latch

Figure A.5: Sense-amplifier flip-flop with a narrow metastability window [15]


Appendix B

Noise Contribution to Timing Jitter

The purpose of this appendix is to explain and demonstrate the direct relationship between timing jitter and noise sources. The information presented below is based on the application note “APP 3631” from Maxim Integrated¹.

B.1 Noise Floor Contribution to Timing Jitter

Several factors contribute to timing jitter, including broadband noise, phase noise, spurs, slew rate, and bandwidth. Phase noise and broadband noise are random, whereas spurs are deterministic responses caused by identifiable interference signals, such as crosstalk and power-supply coupling. Slew rate and bandwidth also affect jitter.

Mathematically, one can represent a sinusoid containing broadband white noise with

the following equation:

$$V(t) = A\sin(2\pi f_o t) + v_n(t) \tag{B.1}$$

where $v_n(t)$ is the noise voltage at time $t$. The random noise $v_n(t)$ has a zero-mean Gaussian distribution, with probability density function

$$\mathrm{pdf}(v_n) = \frac{1}{\sqrt{2\pi v_{nRMS}^2}}\, e^{-\frac{v_n^2}{2 v_{nRMS}^2}} \tag{B.2}$$

where $v_{nRMS}$ is the RMS noise voltage.

¹ https://www.maximintegrated.com/en/app-notes/index.mvp/id/3631


Broadband noise is a significant contributor to timing jitter. The total mean-square noise voltage is the integral of the noise floor over the bandwidth, and its square root is the root-mean-square (RMS) noise voltage. The RMS voltage noise is translated into timing jitter through the slew rate:

$$t_n = \frac{v_n}{SR} \tag{B.3}$$

where SR is the slew rate at the zero crossing, given by

$$SR \approx \frac{\Delta V}{\Delta t} = \frac{A\sin(2\pi f_o \Delta t)}{\Delta t} \approx 2\pi f_o A \tag{B.4}$$

assuming that the timing jitter $\Delta t$ is very small compared with the fundamental period. Accordingly, the squared RMS jitter is given by

$$\langle J_{noise-floor}^2 \rangle = \frac{v_{nRMS}^2}{(2\pi f_o A)^2} \tag{B.5}$$

It appears that a faster slew-rate waveform results in lower jitter. However, a faster slew rate requires a higher operating bandwidth, which increases the RMS noise of the system: for a white noise floor, the mean-square noise is directly proportional to the bandwidth. System designers must therefore carefully choose the slew rate and bandwidth to minimize jitter.
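Eqs. B.3 to B.5 reduce to a one-line calculation. This Python sketch evaluates Eq. B.5 and, to illustrate the bandwidth trade-off, assumes a flat (white) noise density whose mean-square value grows linearly with bandwidth; the function names and example numbers are illustrative.

```python
import math


def noise_floor_jitter_rms(vn_rms: float, amplitude: float, f0: float) -> float:
    """RMS jitter due to a white noise floor, Eq. (B.5)."""
    slew_rate = 2 * math.pi * f0 * amplitude  # zero-crossing slope, Eq. (B.4)
    return vn_rms / slew_rate


def white_noise_vrms(density_v2_per_hz: float, bandwidth_hz: float) -> float:
    """RMS voltage of a flat noise floor integrated over a bandwidth."""
    return math.sqrt(density_v2_per_hz * bandwidth_hz)


if __name__ == "__main__":
    # 1 mV RMS noise on a 1 V, 1 GHz sinusoid: about 159 fs RMS jitter.
    print(noise_floor_jitter_rms(1e-3, 1.0, 1e9))
```

Doubling the bandwidth of a flat noise floor raises $v_{nRMS}$ by only $\sqrt{2}$, so whether a faster edge helps depends on how much bandwidth it costs.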

B.2 Phase Noise and Spurs Contribution to Timing Jitter

To derive the equations relating phase noise to jitter, consider the following sinusoid containing phase noise:

$$V(t) = A\sin(2\pi f_o t + \Phi(t)) \tag{B.6}$$

where $A$ is the amplitude, $f_o$ is the nominal frequency, and $\Phi(t)$ is the phase noise. Jitter is commonly measured at the 0 V crossings over two or more periods. For two consecutive rising 0 V crossing instants $t_i$ and $t_{i+1}$, one can write:

$$2\pi f_o t_i + \Phi(t_i) = 2\pi i \tag{B.7}$$

$$2\pi f_o t_{i+1} + \Phi(t_{i+1}) = 2\pi(i+1) \tag{B.8}$$

where $i = 0, 1, 2, \ldots$ Subtracting Eq. B.7 from Eq. B.8 gives

$$2\pi f_o [t_{i+1} - t_i] + [\Phi(t_{i+1}) - \Phi(t_i)] = 2\pi \tag{B.9}$$

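Here,

$$t_{i+1} - t_i = T_o + \Delta t \tag{B.10}$$

where $T_o = 1/f_o$ and $\Delta t$ is the period jitter, i.e., the period variation over time. Substituting Eq. B.10 into Eq. B.9 gives

$$2\pi f_o [T_o + \Delta t] + [\Phi(t_{i+1}) - \Phi(t_i)] = 2\pi \tag{B.11}$$

$$2\pi f_o \Delta t + [\Phi(t_{i+1}) - \Phi(t_i)] = 0 \tag{B.12}$$

$$\Delta t = \frac{T_o}{2\pi}\left[\Phi(t_i) - \Phi(t_{i+1})\right] \tag{B.13}$$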

Accordingly, the period jitter is the first difference of the absolute phase $\Phi(t_i)$. The squared RMS jitter is then

$$\langle \Delta t^2 \rangle = \frac{T_o^2}{4\pi^2}\left[\langle \Phi(t_i)^2 \rangle - 2\langle \Phi(t_i)\Phi(t_{i+1}) \rangle + \langle \Phi(t_{i+1})^2 \rangle\right] \tag{B.14}$$

Because $\Phi(t)$ is a stationary process,

$$\langle \Phi(t_i)^2 \rangle = \langle \Phi(t_{i+1})^2 \rangle = \int_{-\infty}^{\infty} S_\Phi(f)\, df \tag{B.15}$$

where $S_\Phi(f)$ is the power spectral density of the phase noise $\Phi(t)$, and $f$ is the offset frequency from the carrier frequency $f_o$.

The middle term of Eq. B.14 can be written using the autocorrelation function of the phase noise $\Phi(t)$:

$$\langle \Phi(t_i)\Phi(t_{i+1}) \rangle = R_\Phi(t_{i+1} - t_i) = R_\Phi(\tau) = \int_{-\infty}^{\infty} S_\Phi(f)\cos(2\pi f\tau)\, df \tag{B.16}$$

where $R_\Phi(\tau)$ is the autocorrelation function of $\Phi(t)$ and $\tau \cong T_o$.

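Now, the squared RMS jitter in Eq. B.14 can be written as

$$\langle \Delta t^2 \rangle = \frac{T_o^2}{4\pi^2}\left[2\int_{-\infty}^{\infty} S_\Phi(f)\, df - 2\int_{-\infty}^{\infty} S_\Phi(f)\cos(2\pi f T_o)\, df\right] \tag{B.17}$$

$$\langle \Delta t^2 \rangle = \frac{2T_o^2}{4\pi^2}\int_{-\infty}^{\infty} S_\Phi(f)\left[1 - \cos(2\pi f T_o)\right] df \tag{B.18}$$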

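Recalling the identity $1 - \cos(2\pi f T_o) = 2\sin^2(\pi f T_o)$ and assuming that the phase noise is symmetrical about the carrier, one can write:

$$\langle \Delta t^2 \rangle = \frac{8T_o^2}{4\pi^2}\int_0^{\infty} S_\Phi(f)\sin^2(\pi f T_o)\, df \tag{B.19}$$

$S_\Phi(f)$ is approximately equal to the phase noise $L(f)$, expressed as a linear power ratio per hertz, for close-in phase noise ($\Delta f < f_o$, usually $\Delta f = f_o/2$), so

$$\langle J_{period}^2 \rangle = \langle \Delta t^2 \rangle = \frac{8T_o^2}{4\pi^2}\int_0^{\Delta f} L(f)\sin^2(\pi f T_o)\, df \tag{B.20}$$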
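Eq. B.20 can be evaluated numerically for any phase-noise profile. The sketch below uses a plain midpoint rule and expects L(f) as a linear power ratio per hertz (hence the `dbc_to_linear` helper); the names, the default integration limit $f_o/2$, and the number of points are assumptions. As a purely mathematical check, a flat profile $L_0$ over $[0, f_o/2]$ gives $\int \sin^2(\pi f T_o)\,df = f_o/4$ and therefore $\langle J_{period}^2 \rangle = T_o L_0/(2\pi^2)$.

```python
import math


def dbc_to_linear(level_dbc_per_hz: float) -> float:
    """Convert dBc/Hz to a linear power ratio per Hz."""
    return 10 ** (level_dbc_per_hz / 10)


def period_jitter_rms(l_lin, f0, delta_f=None, n=20000):
    """Numerically evaluate Eq. (B.20) with the midpoint rule.

    l_lin: callable returning L(f) at offset f as a linear ratio per Hz.
    """
    t0 = 1.0 / f0
    if delta_f is None:
        delta_f = f0 / 2          # upper limit suggested in the text
    h = delta_f / n
    integral = 0.0
    for k in range(n):
        f = (k + 0.5) * h         # midpoint of the k-th sub-interval
        integral += l_lin(f) * math.sin(math.pi * f * t0) ** 2
    integral *= h
    return math.sqrt(8 * t0 ** 2 / (4 * math.pi ** 2) * integral)


if __name__ == "__main__":
    l0 = dbc_to_linear(-140.0)
    print(period_jitter_rms(lambda f: l0, 1e9))  # seconds RMS
```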

Spurs also contribute to timing jitter, especially in oscillators. Spurs are caused by phase-locked-loop reference spurs, supply coupling, crosstalk from nearby circuitry, and other sources.

$$\langle J_{spurs}^2 \rangle = \frac{4T_o^2}{4\pi^2}\sum_m L(f_m)\sin^2(\pi f_m T_o) \tag{B.21}$$

Eq. B.21 assumes that the spurs are not symmetrical, so the spurs on both sides of the carrier must be included in the jitter calculation. $L(f_m)$ is the amplitude of the spur at offset $f_m$ relative to the carrier; it is usually given in dBc and must be converted to a linear power ratio, $10^{L(f_m)/10}$, before being used in Eq. B.21. (If the spurs are symmetrical, one may use a factor of 8 instead of 4 and account for only one side of the spectrum's spurs.)

Broadband noise (the white noise floor), phase noise, and spurs are the three contributors to timing jitter. Broadband noise is purely random and uncorrelated, so the jitter it produces does not accumulate. The latter two, however, generally do produce accumulating jitter. Assuming the three contributions are uncorrelated, the squared total timing jitter is the sum of the three squared jitters:

$$\langle J_{total}^2 \rangle = \langle J_{noise-floor}^2 \rangle + \langle J_{period}^2 \rangle + \langle J_{spurs}^2 \rangle \tag{B.22}$$

and the total RMS period jitter is $J_{rms} = \sqrt{\langle J_{total}^2 \rangle}$. To calculate N-period (i.e., N-cycle) jitter, replace $\tau$ by $N T_o$ rather than by $T_o$.
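Eqs. B.21 and B.22 can be combined in the same way. In this hypothetical sketch each spur is given as an (offset in Hz, level in dBc) pair with both sidebands listed as separate entries, matching the factor of 4 in Eq. B.21, and the components are combined in root-sum-square form under the uncorrelated assumption stated above.

```python
import math


def spur_jitter_rms(spurs_dbc, f0):
    """Eq. (B.21): spurs_dbc is a list of (offset_hz, level_dbc) pairs."""
    t0 = 1.0 / f0
    acc = 0.0
    for f_m, level_dbc in spurs_dbc:
        acc += 10 ** (level_dbc / 10) * math.sin(math.pi * f_m * t0) ** 2
    return math.sqrt(4 * t0 ** 2 / (4 * math.pi ** 2) * acc)


def total_jitter_rms(*components):
    """Eq. (B.22): root-sum-square of uncorrelated RMS jitter components."""
    return math.sqrt(sum(c * c for c in components))


if __name__ == "__main__":
    # A single -60 dBc spur at half the carrier frequency of a 1 GHz clock.
    j_spur = spur_jitter_rms([(0.5e9, -60.0)], 1e9)
    print(j_spur, total_jitter_rms(j_spur, 100e-15, 200e-15))
```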

The cycle-to-cycle jitter ($J_{cc}$) is a measure of the variation of the difference between adjacent periods; it is the first difference of the period (or cycle) jitter:

$$\langle J_{cc}^2 \rangle = \langle (\Delta t_{i+1} - \Delta t_i)^2 \rangle = \frac{32T_o^2}{4\pi^2}\int_0^{\Delta f} L(f)\sin^4(\pi f T_o)\, df \tag{B.23}$$

B.3 Approximation of RMS Timing Jitter from L(f)

One can define the phase-noise spectrum L(f) using the power spectral density $S_P(f)$ obtained from a spectrum analyzer:

$$L(f - f_c) = 10\log\left[\frac{S_P(f)}{S_P(f_c)}\right]\ \text{dBc} \tag{B.24}$$

Using a Fourier series expansion, it can be shown that a square-wave clock signal has the same jitter behavior as its fundamental sinusoidal component. This property makes the jitter analysis of a clock signal much easier.

A clock signal with phase noise, represented by its fundamental sinusoid, can be written as

$$V(t) = A\sin(2\pi f_c t + \Phi(t)) = A\sin\left(2\pi f_c\left(t + \frac{\Phi(t)}{2\pi f_c}\right)\right) \tag{B.25}$$

from which the absolute timing jitter is

$$J = \frac{\Phi(t)}{2\pi f_c} \tag{B.26}$$

The RMS absolute jitter can be calculated by integrating the phase-noise spectrum as follows:

$$J_{rms} = \frac{1}{2\pi f_c}\sqrt{\langle \Phi^2(t) \rangle} = \frac{1}{2\pi f_c}\sqrt{2\int_0^{\infty} 10^{\frac{L(f)}{10}}\, df} \tag{B.27}$$

In some applications, such as SONET and 10GbE, engineers only monitor the jitter within a certain frequency band, so

$$J_{rms} = \frac{1}{2\pi f_c}\sqrt{2\int_{f_1}^{f_2} 10^{\frac{L(f)}{10}}\, df} \tag{B.28}$$

The phase noise can usually be approximated by a piecewise-linear function when the frequency axis of L(f) is in log scale:

$$L(f) = \sum_{i=1}^{K-1}\left[a_i\left(\log(f) - \log(f_i)\right) + L(f_i)\right]\left[U(f - f_i) - U(f - f_{i+1})\right] \tag{B.29}$$

where K-1 is the number of piecewise-linear sections in the function and U(f) is the unit step function. Substituting L(f) from Eq. B.29 into Eq. B.28 (recalling that $10^{x\log(y)} = [10^{\log(y)}]^x = y^x$) gives:

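$$J_{rms} = \frac{1}{2\pi f_c}\sqrt{2\sum_{i=1}^{K-1} 10^{\frac{L(f_i)}{10}} f_i^{-\frac{a_i}{10}} \int_{f_i}^{f_{i+1}} f^{\frac{a_i}{10}}\, df} \tag{B.30}$$

$$= \frac{1}{2\pi f_c}\sqrt{2\sum_{i=1}^{K-1} 10^{\frac{L(f_i)}{10}} f_i^{-\frac{a_i}{10}} \left(\frac{a_i}{10} + 1\right)^{-1}\left[f_{i+1}^{\frac{a_i}{10}+1} - f_i^{\frac{a_i}{10}+1}\right]} \tag{B.31}$$

where

$$a_i = \frac{L(f_{i+1}) - L(f_i)}{\log(f_{i+1}) - \log(f_i)} \tag{B.32}$$

is the slope of section i in dB/decade. Eq. B.31 holds for $a_i \neq -10$ dB/decade; for a section with $a_i = -10$ dB/decade, the integral in Eq. B.30 evaluates to $\ln(f_{i+1}/f_i)$ instead.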
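The closed form of Eqs. B.29 to B.32 is convenient to implement directly from a list of (frequency, dBc/Hz) corner points. This Python sketch assumes base-10 logarithms (so $a_i$ is in dB/decade) and handles the $a_i = -10$ dB/decade case, where Eq. B.31 would divide by zero, with the logarithmic form of the integral; the function name is hypothetical.

```python
import math


def rms_jitter_piecewise(points, fc):
    """RMS absolute jitter, Eqs. (B.29)-(B.32).

    points: list of (f_i, L(f_i)) pairs sorted by frequency, L in dBc/Hz,
            with L(f) linear in log10(f) between consecutive points.
    fc:     carrier frequency in Hz. Returns jitter in seconds.
    """
    total = 0.0
    for (f1, l1), (f2, l2) in zip(points, points[1:]):
        a = (l2 - l1) / (math.log10(f2) - math.log10(f1))  # Eq. (B.32), dB/decade
        b = a / 10.0
        scale = 10 ** (l1 / 10) * f1 ** (-b)
        if abs(b + 1.0) < 1e-12:
            # a = -10 dB/decade: the integral of f**-1 is a logarithm
            total += scale * math.log(f2 / f1)
        else:
            total += scale * (f2 ** (b + 1) - f1 ** (b + 1)) / (b + 1)  # Eq. (B.31)
    return math.sqrt(2.0 * total) / (2 * math.pi * fc)


if __name__ == "__main__":
    profile = [(1e3, -70.0), (1e5, -110.0), (1e7, -150.0)]  # illustrative only
    print(rms_jitter_piecewise(profile, 1e9))
```

Because each section is integrated in closed form, the result does not depend on a frequency grid, which is convenient when the profile is read off a measured plot such as Fig. 5.23.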


Appendix C

Modeling and Simulation of DCO

C.1 Noise Modeling of White Gaussian Noise

Accurate noise modeling is necessary for precise computer simulation, and a mathematical treatment is needed to represent ideal white Gaussian noise in a discrete-time simulation environment.

Suppose additive white Gaussian noise (AWGN) with spectral density No/2 W/Hz is filtered by an ideal brick-wall continuous-time low-pass filter with cutoff frequency Fs/2. Assuming that the input Gaussian noise is a stationary random process, the autocorrelation function of the filter's output noise is calculated using the Wiener-Khintchine theorem:

$$R(\tau) = \int_{-F_s/2}^{F_s/2} \frac{N_o}{2} e^{j2\pi f\tau}\, df = \frac{N_o F_s}{2}\,\frac{\sin(\pi F_s \tau)}{\pi F_s \tau}$$

Noise samples separated by integer multiples of 1/Fs are therefore completely uncorrelated, since the sinc function is zero at those lags (and, being jointly Gaussian, they are also independent). Accordingly, the band-limited AWGN source can be modeled exactly in a discrete-time simulation by a Gaussian random-number generator that provides one sample every 1/Fs with a variance of NoFs/2.
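In code, this discrete-time AWGN model is a single Gaussian draw per time step. This sketch uses Python's `random.gauss`; the function name, seed handling, and example values are illustrative.

```python
import random


def awgn_samples(n0: float, fs: float, n: int, seed: int = 0) -> list:
    """Discrete-time model of band-limited AWGN with two-sided PSD n0/2.

    Produces one sample every 1/fs with variance n0 * fs / 2, which matches
    R(0) of the brick-wall-filtered noise; samples are mutually independent
    because R(k/fs) = 0 for every nonzero integer k.
    """
    rng = random.Random(seed)
    sigma = (n0 * fs / 2) ** 0.5
    return [rng.gauss(0.0, sigma) for _ in range(n)]


if __name__ == "__main__":
    # No/2 = 1e-18 V^2/Hz sampled at Fs = 10 GHz gives sigma = 100 uV.
    x = awgn_samples(2e-18, 1e10, 5)
    print(x)
```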


C.1.1 Modeling Flicker Noise

The precision of a discrete-time random-process model should be evaluated from the behavior of its autocorrelation function rather than its power spectral density, in order to avoid aliasing in the frequency domain. A zero-mean, discrete-time Gaussian process is said to simulate a continuous-time Gaussian random process if the discrete-time autocorrelation function precisely matches the sampled continuous-time autocorrelation function [14].

Several methods can generate $1/f^\alpha$ noise, trading off accuracy against computational resources, including:

Auto-Regressive (AR) Method: One of the most accurate methods, but computationally intensive.

Random Midpoint Displacement Method: Used extensively in computer graphics.

Fractional-Differencing Method: Uses FIR or IIR filters with a large number of filter coefficients to model 1/f noise over several decades.

Recursive Filtering Method: The 1/f noise is constructed by passing white Gaussian noise through a cascade of first-order digital filters having appropriately selected pole and zero frequencies.

Generating 1/f Noise by Recursive Filtering [14].

The 1/f noise is constructed by passing white Gaussian noise through a cascade of first-order digital filters having appropriately selected pole and zero frequencies. The squared magnitude of the filter transfer function is given by

$$|H(\omega)|^2 = \prod_{i=1}^{N_f} \frac{\omega^2 + z_i^2}{\omega^2 + p_i^2}$$

where $N_f$ is the number of cascaded filter sections, and $z_i$ and $p_i$ are the filter zeros and poles, respectively. The poles must be located on a logarithmic grid across the frequency span of interest $(\omega_{min}, \omega_{max})$ as

$$p_i = \omega_{min}\exp\left[\left((i-1) + \frac{1}{2}\left(1 - \frac{\alpha}{2}\right)\right)\Delta p\right]$$

where

$$\Delta p = \frac{\ln(\omega_{max}) - \ln(\omega_{min})}{N_f}$$

for i = 1 to $N_f$.

To obtain a symmetrical error with respect to the ideal 1/f spectrum line, the zeros are given by

$$z_i = p_i \exp\left(\frac{\alpha}{2}\Delta p\right)$$

Each filter section shapes a different region of the noise spectrum, yielding a composite output spectrum with the desired 1/f shape. At least one filter section per frequency decade is recommended for reasonable accuracy. MATLAB code that places the poles and zeros of this recursive filter and plots the resulting spectrum is shown below [14].

%===================== recursive_flicker_noise.m =====================
% Use recursive first-order filtering (sections with a 1/f^2 roll-off)
% to create a good approximation to 1/f^alpha noise
%=====================================================================
function [wsw, psw] = recursive_flicker_noise(wmin, wmax, Nf, alpha)
% wmin   Minimum radian frequency of interest
% wmax   Maximum radian frequency of interest
% Nf     Number of filter sections to use
% alpha  Desired noise exponent, 1/f^alpha

% Squared magnitude of one section: (w^2 + z^2) / (w^2 + p^2)
Hz = @(ww, zz, pp) (ww.^2 + zz^2) ./ (ww.^2 + pp^2);
dp = (log(wmax) - log(wmin))/Nf;   % pole spacing in ln(w)

% Poles on a logarithmic grid; the first pole is offset by
% 0.5*(1 - alpha/2)*dp, as in the pole-placement formula
hpoles = zeros(1, Nf);
hpoles(1) = wmin * exp(0.5 * (1 - 0.50*alpha) * dp);
for ii = 2:Nf
    hpoles(ii) = hpoles(ii-1) * exp(dp);
end

% Each zero lies alpha*dp/2 above its pole
hzeros = zeros(1, Nf);
for nn = 1:Nf
    hzeros(nn) = hpoles(nn) * exp(0.50 * alpha * dp);
end

% Composite spectrum in dB on a log-spaced frequency grid
Npts = 1000;
wsw = logspace(floor(log10(wmin)), ceil(log10(wmax)), Npts);
psw = zeros(1, Npts);
for ii = 1:Npts
    for jj = 1:Nf
        psw(ii) = psw(ii) + 10*log10(Hz(wsw(ii), hzeros(jj), hpoles(jj)));
    end
end

figure(1);
clf;
h1 = semilogx(wsw, psw, 'k');
set(h1, 'LineWidth', 2);
xlabel('Radian Frequency');
ylabel('Relative Spectrum Level, dB');
title('f^{-\alpha} Power Spectral Density');
grid on;
zoom on;
end
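The same cascade can be evaluated outside MATLAB. This Python sketch follows the text's pole-placement rule, $p_i = \omega_{min}\exp[((i-1) + \tfrac{1}{2}(1-\tfrac{\alpha}{2}))\Delta p]$ with $z_i = p_i e^{\alpha\Delta p/2}$, and evaluates the composite spectrum at single frequencies instead of plotting it. It does not generate time-domain noise samples, which would additionally require discretizing each section (for example with a bilinear transform); the function names are hypothetical.

```python
import math


def recursive_flicker_sections(wmin, wmax, nf, alpha):
    """Pole/zero placement of the recursive 1/f^alpha filter cascade."""
    dp = (math.log(wmax) - math.log(wmin)) / nf
    poles = [wmin * math.exp((i + 0.5 * (1 - 0.5 * alpha)) * dp) for i in range(nf)]
    zeros = [p * math.exp(0.5 * alpha * dp) for p in poles]
    return poles, zeros


def cascade_psd_db(w, poles, zeros):
    """10*log10 of |H(w)|^2 = prod (w^2 + z^2) / (w^2 + p^2)."""
    return sum(10 * math.log10((w * w + z * z) / (w * w + p * p))
               for p, z in zip(poles, zeros))


if __name__ == "__main__":
    poles, zeros = recursive_flicker_sections(1.0, 1e4, 8, 1.0)
    # For alpha = 1 the level should fall by about 10 dB per decade.
    drop = cascade_psd_db(1e3, poles, zeros) - cascade_psd_db(10.0, poles, zeros)
    print(round(drop, 2))
```

With two sections per decade the measured drop over the two interior decades is within a fraction of a decibel of the ideal -20 dB, consistent with the one-section-per-decade guideline above.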

C.2 Simulation of the PLL

A circuit-level simulation that captures PLL locking can take days or weeks to run, making this approach tedious and resource-intensive. Fast simulation is highly desirable to minimize fabrication costs and to shorten time-to-market [49]. This can be achieved by carefully abstracting the circuits and blocks into reasonable behavioral models using MATLAB/Simulink, Verilog-A/Verilog-AMS, or other modeling environments.

Even at the system level of abstraction, simulating a PLL is time-consuming because of the very different time scales of the reference clock and the output clock. The time step of the simulator must be much smaller than the smallest time constant in the PLL loop, while the PLL needs many reference cycles for the oscillator to lock its frequency to a multiple of the reference frequency. This translates to a large number of simulator time steps to capture the frequency and phase locking behavior of the PLL.

Time interval error (TIE) is the short-term variation of the significant instants of a clock (e.g., its rising edges) from their ideal positions. Because TIE keeps a record of the errors versus time, the accumulated phase error can be analyzed using an FFT.

(a) A discrete-time DCO model generates time stamps of the rising and falling edges of the DCO clock. [Block diagram labels: Jrms (thermal), Jrms (1/f²), encoder, ΔCcoarse, ΔCfine, Cop, Cmax, L, DCO time stamps.]

(b) Simulation model (blue) vs. mathematical model (black). [Plot: phase noise (dBc) vs. offset frequency (0.01 to 1000 MHz), showing the thermal noise floor and the up-converted thermal and 1/f noise regions.]

Figure C.1: Verilog-A model of DCO phase noise.
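The time-stamp approach of Fig. C.1(a) can be illustrated with a few lines of Python. This is a hypothetical, simplified model: it assumes the period grows linearly with the number of enabled acquisition and tracking capacitors, uses the 900 ps, 2033 fs, and 9.68 fs values from the Verilog defines at the end of this appendix, and models only white (thermal-like) period jitter, not the 1/f component.

```python
import random

DCO_PER_0_PS = 900.0      # period at the highest frequency
DCO_QUANT_A_PS = 2.033    # period step per acquisition cap (2033 fs)
DCO_QUANT_T_PS = 0.00968  # period step per tracking cap (9.68 fs)


def dco_edge_times_ps(n_acq, n_trk, n_edges, sigma_period_ps=0.0, seed=1):
    """Rising-edge time stamps (ps) of a hypothetical discrete-time DCO.

    Each period gets independent Gaussian jitter, so the absolute edge
    error accumulates as a random walk, like free-running thermal noise.
    """
    rng = random.Random(seed)
    period = DCO_PER_0_PS + n_acq * DCO_QUANT_A_PS + n_trk * DCO_QUANT_T_PS
    t = 0.0
    edges = []
    for _ in range(n_edges):
        t += period + rng.gauss(0.0, sigma_period_ps)
        edges.append(t)
    return edges


if __name__ == "__main__":
    edges = dco_edge_times_ps(10, 100, 5, sigma_period_ps=0.1)
    print([round(e, 3) for e in edges])
```

Generating edge times directly avoids the fixed simulator time step discussed above, since the cost per output cycle is constant regardless of the frequency ratio.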

TIE can be measured using a real-time sampling oscilloscope. First, a reference clock is recovered from the incoming data or clock. Then each instantaneous edge $t_i$ is compared with its ideal location $i \times T_o$, giving $\mathrm{TIE}_i = t_i - iT_o$. Ideally TIE = 0; in practice, the RMS jitter is calculated as $\sqrt{\overline{\mathrm{TIE}^2}}$, together with the peak-to-peak TIE jitter.
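The TIE procedure can be sketched in Python. Like the `polyfit`/`polyval` step of the MATLAB listing below, this hypothetical sketch estimates the ideal clock as a least-squares line through the edge times (an assumption that suits a simulated record without a separately recovered clock), then reports the RMS and peak-to-peak TIE.

```python
import math


def tie_from_edges(edges):
    """Return (period, TIE list) relative to a least-squares ideal clock."""
    n = len(edges)
    mean_i = (n - 1) / 2
    mean_t = sum(edges) / n
    sxx = sum((i - mean_i) ** 2 for i in range(n))
    sxt = sum((i - mean_i) * (t - mean_t) for i, t in enumerate(edges))
    period = sxt / sxx                 # fitted average period
    t0 = mean_t - period * mean_i      # fitted time of edge 0
    tie = [t - (t0 + period * i) for i, t in enumerate(edges)]
    return period, tie


def tie_stats(tie):
    """RMS and peak-to-peak TIE."""
    rms = math.sqrt(sum(x * x for x in tie) / len(tie))
    return rms, max(tie) - min(tie)


if __name__ == "__main__":
    # Edges every 1000 ps with an alternating +/-5 ps error.
    edges = [1000.0 * i + (5.0 if i % 2 else -5.0) for i in range(1000)]
    period, tie = tie_from_edges(edges)
    print(period, tie_stats(tie))  # about 1000 ps, about (5 ps, 10 ps)
```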

MATLAB code to find the phase noise of a DPLL based on time stamps of the DPLL

output clock edges is provided below.

```matlab
function jitter_eval_mat2pn(data1, fileToRead, r_start, r_end, ...
                            nfft, plot_format, storeFig)
% data1 --> MATLAB vector containing the time stamps of the clock
%           edges in seconds (no scaling)
% r_start, r_end --> fractions of the record that bound the samples kept

l_start = floor(r_start * length(data1)) + 1;  % truncate the transient samples
l_end   = floor(r_end * length(data1));        % truncate the transient samples
data    = data1(l_start:l_end) * 1e12;         % time stamps in ps
data    = data(:);                             % column vector, to match n below

%% Calculate the period and jitter
n = (1:length(data))';
length(data)                   % number of edges analyzed
p = polyfit(n, data, 1);       % least-squares line: edge time vs. edge index
T = p(1)                       % average period in ps

fsyn = (10^12)/T               % estimation of the synthesized frequency

%% Measure the difference between the ideal edge and the real rising edge
ph_e = data - polyval(p, n);   % TIE in ps

%% Plot the phase noise of the jitter
winNBW = 1.5;                  % noise bandwidth of the Hann window, in bins
phases = 2*pi*ph_e/T;          % TIE converted to phase in rad

% compute power spectral density of phase
% (psd is obsolete in recent MATLAB releases; pwelch replaces it)
[Sphi, f] = psd(phases, nfft, 1e12/T, nfft, nfft/2, 'linear');

% correct for scaling in PSD due to FFT and window
Sphi = winNBW*Sphi/nfft;

% plot the results (except at DC)
K   = length(f);
rbw = winNBW/(T*1e-12*nfft);   % resolution bandwidth in Hz

db = 10*log10(Sphi(2:K)) - 10*log10(rbw);
f  = f(2:K);

% calculate the integrated phase jitter from phase noise
% (Pn2Jitter, asPlot and asPrint are helper functions not listed here)
PJ = sqrt(Pn2Jitter(f, db))/(2*pi/T)   % integrated jitter in ps

asPlot(f, db, 'Offset Frequency', 'Phase Noise [dBc/Hz]');
set(gca, 'XScale', 'log');
set(gca, 'XLim', [1e+4 1.2e+9]);
set(gca, 'YLim', [-160 -60]);
asPrint(fileToRead, '_pn', 'eps', 1, 0)   % print .eps without title
mytitle1 = sprintf('\n The integrated phase jitter %d fs', floor(PJ/1e-3));
title(mytitle1)
asPrint(fileToRead, '_pn', plot_format, 1, storeFig)   % print .png and save .fig format
```
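For readers without MATLAB, the same flow can be sketched in Python with NumPy. This is a re-implementation under stated assumptions, not the thesis code: a Welch average (Hann window, 50 % overlap, linear detrend per segment) stands in for the legacy `psd` call, $\mathcal{L}(f)$ is taken as half of the one-sided phase PSD $S_\varphi(f)$, and `integrated_jitter` is a hypothetical stand-in for the unlisted `Pn2Jitter` helper that integrates over the whole computed offset range.

```python
import numpy as np


def phase_noise_from_edges(edges_s, nfft):
    """Estimate L(f) in dBc/Hz from clock-edge time stamps in seconds.

    A least-squares line gives the average period and the ideal edges, the
    residual is the TIE, and the TIE is scaled to phase. The PSD is a Welch
    average (Hann window, 50 % overlap, linear detrend). nfft must be even.
    """
    t = np.asarray(edges_s, dtype=float)
    if nfft % 2 or t.size < nfft:
        raise ValueError("nfft must be even and no longer than the record")
    n = np.arange(t.size)
    period, offset = np.polyfit(n, t, 1)        # average period T (s)
    tie = t - (period * n + offset)             # time interval error (s)
    phi = 2 * np.pi * tie / period              # excess phase (rad)
    fs = 1.0 / period                           # one phase sample per cycle

    win = np.hanning(nfft)
    k = np.arange(nfft)
    starts = range(0, t.size - nfft + 1, nfft // 2)
    acc = np.zeros(nfft // 2 + 1)
    for s in starts:
        seg = phi[s:s + nfft]
        seg = seg - np.polyval(np.polyfit(k, seg, 1), k)   # linear detrend
        acc += np.abs(np.fft.rfft(seg * win)) ** 2
    s_phi = acc / (len(starts) * fs * np.sum(win ** 2))    # two-sided, rad^2/Hz
    s_phi[1:-1] *= 2                                       # one-sided S_phi(f)
    f = np.fft.rfftfreq(nfft, d=period)
    return f[1:], 10 * np.log10(s_phi[1:] / 2), period     # L(f) = S_phi/2, DC dropped


def integrated_jitter(f, l_dbc, period):
    """RMS jitter (s): sigma_phi^2 = integral of 2 * 10^(L/10) df (trapezoid)."""
    s_phi = 2 * 10 ** (np.asarray(l_dbc) / 10)
    var_phi = np.sum(0.5 * (s_phi[1:] + s_phi[:-1]) * np.diff(f))
    return np.sqrt(var_phi) * period / (2 * np.pi)
```

On a synthetic record with 1 ps rms white TIE on a 1 GHz clock, the sketch should report a flat floor near $10\log_{10}\!\left((2\pi\cdot 10^{-3})^2/10^{9}\right) \approx -134$ dBc/Hz and an integrated jitter close to 1 ps, which is a quick sanity check of the scaling.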

The Verilog code below implements the thermal and 1/f flicker noise of the DCO [14].

```verilog
//Verilog HDL for "PLL2014", "DCO_nonInv" "verilog_2014"
`timescale 1ns/1fs  // time unit / time resolution

`define dco_per_0      900    // ps - period at the highest frequency (about 1.11 GHz)
`define dco_peroff_lim 250    // ps - maximum period deviation
`define dco_quant_a    2033   // fs - time resolution of acquisition caps
`define dco_quant_t    9.68   // fs - time resolution of tracking caps

`define duty         0.5    // 50 % duty cycle
`define NO_FILTER    1      // 1 for instant freq change
`define dco_init_dly 0      // initial oscillator delay
`define wander_rms   217.1  // fs - accumulative jitter
`define jitter_rms   530    // fs - non-accumulative jitter

`define wrms1 0.8039
// `define wrms2 ... `define wrms5 (values not shown in the source listing)

`define fc1 0.1     // 0.1 kHz
`define fc2 1       // 1 kHz
`define fc3 10      // 10 kHz
`define fc4 100     // 100 kHz
`define fc5 1000    // 1000 kHz

`define noise_floor -150  // dBc/Hz
`define L_at_Foff   -103  // dBc/Hz
`define Foff        1e6   // 1 MHz, i.e., -103 dBc/Hz phase noise at 1 MHz offset

`define DCO_1f 1

// Define some math constants
`define pi 3.14159
`define e  2.71828

module DCO_nonInv
(
  OutP,
  A, B, C, D,
  M,       // frequency acquisition input control bits
  Fcol,    // phase tracking input control bits: binary-encoded MSBs
  Ftherm   // phase tracking input control bits: thermometer-encoded LSBs
);
  output A, B, C, D;
  output reg OutP;

  input [15:1] Ftherm;
  input [7:4]  Fcol;
  input [6:0]  M;

  // thermometer code to integer - tracking bits
  integer i;
  integer track_col, track;

  always @(*) begin
    track = 0;
    for (i = 1; i < 16; i = i + 1)
      if (Ftherm[i] == 1'b1)
        track = track + 1;
  end
  ////////////////////////////////////////////////////////////////
  // Compute the DCO period
  ////////////////////////////////////////////////////////////////
  real mat_quant_a;
  real mat_quant_ti;
  real mat_quant_tf;
  real mat_quant_ls;

  always @(*) begin
    mat_quant_a  = M[6:0] * `dco_quant_a;
    mat_quant_ti = {Fcol[7:4], 4'b0000} * `dco_quant_t;
    mat_quant_tf = track * `dco_quant_t;
    mat_quant_ls = mat_quant_a + mat_quant_ti;
  end

  real mat_pdev, mat_pdev_var;
  real mat_per = `dco_per_0 / 1e3;  // in ns

  always @(mat_quant_ls, mat_quant_tf) begin
    mat_pdev_var = mat_quant_ls + mat_quant_tf;  // fs

    if (mat_pdev_var > `dco_peroff_lim * 1e3)
      mat_pdev_var = `dco_peroff_lim * 1e3;       // fs
    else if (mat_pdev_var < -1 * `dco_peroff_lim * 1e3)
      mat_pdev_var = -1 * `dco_peroff_lim * 1e3;  // fs

    mat_pdev = mat_pdev_var;                            // fs
    mat_per  = (`dco_per_0/1e3) + (mat_pdev_var/1e6);   // period in ns
  end

  real fc_ctrl  = 0.020;  // 20 MHz, i.e., 50 ns
  real tau_ctrl = 8e-9;
  always @(*)
    tau_ctrl = 1 / (2.0 * `pi * fc_ctrl);

  real jitter = 0;
  real jitter_prev = 0;
  real wander, wanderT;
  real wander1, wander2, wander3, wander4, wander5;
  real wander1f, wander2f, wander3f, wander4f, wander5f;
  real period;
  real period_prev;
  real tref;
  real t_diff;

  integer seed1;  // = $random($realtime);
  integer seed2;  // = $random(seed1);
  integer s1, s2, s3, s4, s5;

  real wrms = `wander_rms;
  real jrms = `jitter_rms;

  real tau_w1 = 1.0;
  real tau_w2 = 1.0;
  real tau_w3 = 1.0;
  real tau_w4 = 1.0;
  real tau_w5 = 1.0;

  real w1_fc = `fc1 / 1e6;  // convert kHz to GHz
  real w2_fc = `fc2 / 1e6;
  real w3_fc = `fc3 / 1e6;
  real w4_fc = `fc4 / 1e6;
  real w5_fc = `fc5 / 1e6;

  initial begin  // assumed: this line is missing from the source listing
    tau_w1 = 1.0 / (2.0 * `pi * w1_fc);
    tau_w2 = 1.0 / (2.0 * `pi * w2_fc);
    tau_w3 = 1.0 / (2.0 * `pi * w3_fc);
    tau_w4 = 1.0 / (2.0 * `pi * w4_fc);
    tau_w5 = 1.0 / (2.0 * `pi * w5_fc);
  end

  initial begin
    seed1 = 1;     // assumed value: seed1 is otherwise never initialized
    seed2 = 3434;
    s1 = $random;
    s2 = $random;
    s3 = $random;
    s4 = $random;
    s5 = $random;

    period      = `dco_per_0 / 1e3;
    period_prev = `dco_per_0 / 1e3;

    tref = `dco_init_dly;

    OutP <= #(`dco_init_dly) 1'b1;

    forever begin
      t_diff = $realtime - tref;  // time difference between actual and ideal samples
      tref   = tref + mat_per;    // ideal next time step

      if (`NO_FILTER)
        period = mat_per;         // adjust the next DCO period instantaneously
      else
        period = mat_per + (period - mat_per) * (`e ** (-period_prev/tau_ctrl));

      if (jrms != 0) begin
        jitter = jrms * $dist_normal(seed1, 0, 1000) / 1e9;  // in ns
        if (jitter >= period/2)
          jitter = 0;
        period = period + jitter - jitter_prev;
      end

      if (wrms != 0) begin
        wander = wrms * $dist_normal(seed2, 0, 1000) / 1e9;  // in ns
        if (wander >= period/2)
          wander = 0;
        period = period + wander;
      end
      if (`DCO_1f) begin
        wander1 = `wrms1 * $dist_normal(s1, 0, 1000) / 1e9;  // in ns
        wander2 = `wrms2 * $dist_normal(s2, 0, 1000) / 1e9;  // in ns
        wander3 = `wrms3 * $dist_normal(s3, 0, 1000) / 1e9;  // in ns
        wander4 = `wrms4 * $dist_normal(s4, 0, 1000) / 1e9;  // in ns
        wander5 = `wrms5 * $dist_normal(s5, 0, 1000) / 1e9;  // in ns

        // first-order low-pass filtered versions (-20 dB/dec roll-off)
        wander1f = wander1 + ((wander1f - wander1) * (`e ** (-period_prev/tau_w1)));
        wander2f = wander2 + ((wander2f - wander2) * (`e ** (-period_prev/tau_w2)));
        wander3f = wander3 + ((wander3f - wander3) * (`e ** (-period_prev/tau_w3)));
        wander4f = wander4 + ((wander4f - wander4) * (`e ** (-period_prev/tau_w4)));
        wander5f = wander5 + ((wander5f - wander5) * (`e ** (-period_prev/tau_w5)));

        wanderT = wander1f + wander2f + wander3f + wander4f + wander5f;
        if (wanderT >= period/2)
          wanderT = 0;
        period = period + wanderT;
      end

      OutP <= 1'b1;
      #(period * `duty);

      jitter_prev = jitter;
      period_prev = period;

      OutP <= 1'b0;
      #(period * (1 - `duty));

    end
  end
  reg A, B, C, D;

  always @* begin
    A <= OutP;
    C <= #(period/4) OutP;
    B <= #(5 * period/8) OutP;
    // D <= #(7 * period ...   (the rest of this statement is cut off in the source listing)
  end
endmodule
```
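The behavior of the listing can be cross-checked with a simplified Python re-expression. This is a sketch, not the author's model: it keeps the period computation and clipping, the non-accumulating jitter and accumulating wander terms, and the sum of first-order low-pass filtered sources used for 1/f noise, but it drops the `period/2` guards, the optional control filter and the quadrature outputs. Because the values of `wrms2` to `wrms5` are not shown, `flicker_period_noise` assumes every filtered source has the same rms; with decade-spaced corners this gives a roughly 1/f period-noise spectrum between the lowest and highest corners, which accumulates into 1/f³ phase noise. All function names are hypothetical.

```python
import math
import random

# Constants taken from the `define lines of the listing (same units).
DCO_PER_0_PS = 900.0    # `dco_per_0
PEROFF_LIM_PS = 250.0   # `dco_peroff_lim
QUANT_A_FS = 2033.0     # `dco_quant_a, fs per acquisition LSB
QUANT_T_FS = 9.68       # `dco_quant_t, fs per tracking LSB


def dco_period_ns(m, fcol, ftherm):
    """Nominal period (ns) for acquisition word m (0..127), binary column
    word fcol (0..15) and a 15-entry thermometer code ftherm of 0/1 values."""
    track = sum(ftherm)                          # thermometer code -> integer
    dev_fs = (m * QUANT_A_FS                     # mat_quant_a
              + (fcol << 4) * QUANT_T_FS         # {Fcol, 4'b0000} * quant_t
              + track * QUANT_T_FS)              # mat_quant_tf
    lim_fs = PEROFF_LIM_PS * 1e3
    dev_fs = max(-lim_fs, min(lim_fs, dev_fs))   # clip to +/-250 ps
    return DCO_PER_0_PS / 1e3 + dev_fs / 1e6


def dco_rising_edges(period_ns, n, jitter_rms_ns, wander_rms_ns, seed=1):
    """Rising-edge times (ns). Adding j - j_prev to each period keeps the
    white jitter from accumulating (thermal floor); wander is added to the
    period directly, so it accumulates as a random walk (1/f^2 region)."""
    rng = random.Random(seed)
    t, j_prev, edges = 0.0, 0.0, []
    for _ in range(n):
        j = rng.gauss(0.0, jitter_rms_ns)
        w = rng.gauss(0.0, wander_rms_ns)
        t += period_ns + (j - j_prev) + w
        edges.append(t)
        j_prev = j
    return edges


def flicker_period_noise(n, period_ns, corners_hz, rms_ns, seed=0):
    """Period deviations (ns): sum of first-order low-pass filtered white
    sources, one per corner frequency, each with steady-state rms rms_ns."""
    rng = random.Random(seed)
    alphas = [math.exp(-period_ns * 1e-9 * 2 * math.pi * fc) for fc in corners_hz]
    # input rms that yields output rms_ns for the update y = x + (y - x) * a
    in_rms = [rms_ns * math.sqrt((1 + a) / (1 - a)) for a in alphas]
    states = [rng.gauss(0.0, rms_ns) for _ in alphas]   # start in steady state
    out = []
    for _ in range(n):
        for k, a in enumerate(alphas):
            x = rng.gauss(0.0, in_rms[k])
            states[k] = x + (states[k] - x) * a   # same update as wander1f..wander5f
        out.append(sum(states))
    return out
```

Starting each filter state from its steady-state distribution avoids the very long start-up transient that the lowest corners (0.1 kHz against a roughly 1 GHz clock) would otherwise need.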

Bibliography

[1] S.E. Collier. The emerging enernet: Convergence of the smart grid with the internet of things. In Rural Electric Power Conference (REPC), IEEE, pages 65–68, April 2015.

[2] Xicheng Jiang, editor. Digitally-Assisted Analog and Analog-Assisted Digital IC Design. Cambridge University Press, 2015.

[3] B. Murmann. Digitally assisted analog circuits. Micro, IEEE, 26(2):38–47, March 2006.

[4] A. Swaminathan, K.J. Wang, and I. Galton. A Wide-Bandwidth 2.4 GHz ISM Band Fractional-N PLL With Adaptive Phase Noise Cancellation. Solid-State Circuits, IEEE Journal of, 42(12):2639–2650, 2007.

[5] R.B. Staszewski, J.L. Wallberg, S. Rezeq, Chih-Ming Hung, O.E. Eliezer, S.K. Vemulapalli, C. Fernando, K. Maggio, R. Staszewski, N. Barton, Meng-Chang Lee, P. Cruise, M. Entezari, K. Muhammad, and D. Leipold. All-Digital PLL and Transmitter for Mobile Phones. Solid-State Circuits, IEEE Journal of, 40(12):2469–2482, 2005.

[6] S. Pamarti. Digital Techniques for Integrated Frequency Synthesizers: A Tutorial. Communications Magazine, IEEE, 47(4):126–133, April 2009.

[7] S.E. Meninger and M.H. Perrott. A Fractional-N Frequency Synthesizer Architecture Utilizing a Mismatch Compensated PFD/DAC Structure for Reduced Quantization-Induced Phase Noise. Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on, 50(11):839–849, Nov. 2003.


[8] Amer Samarah and Anthony Chan Carusone. A Digital Phase-Locked Loop with Calibrated Coarse and Stochastic Fine TDC. Solid-State Circuits, IEEE Journal of, 48(8):1829–1841, 2013.

[9] A. Goel, A. Rylyakov, H. Ainspan, and D. Friedman. A Compact 6 GHz to 12 GHz Digital PLL with Coupled Dual-LC Tank DCO. In VLSI Circuits (VLSIC), IEEE Symposium on, pages 141–142, June 2010.

[10] M. Ferriss and M.P. Flynn. A 14mW Fractional-N PLL Modulator with an Enhanced Digital Phase Detector and Frequency Switching Scheme. In Proc. Digest of Technical Papers. IEEE International Solid-State Circuits Conference ISSCC, pages 352–608, Feb. 2007.

[11] R. Tonietto, E. Zuffetti, R. Castello, and I. Bietti. A 3MHz Bandwidth Low Noise RF All Digital PLL with 12ps Resolution Time to Digital Converter. In Solid-State Circuits Conference, 2006. ESSCIRC 2006. Proceedings of the 32nd European, pages 150–153, Sept. 2006.

[12] M.Z. Straayer and M.H. Perrott. A Multi-Path Gated Ring Oscillator TDC With First-Order Noise Shaping. Solid-State Circuits, IEEE Journal of, 44(4):1089–1098, 2009.

[13] Practical manufacturing testing of Bluetooth wireless devices, 2012.

[14] James A. Crawford. Advanced Phase-Lock Techniques. Artech House Publishers, Nov. 2007.

[15] Robert B. Staszewski and Poras T. Balsara. All-Digital Frequency Synthesizer in Deep-Submicron CMOS. Wiley-Interscience, Aug. 2006.

[16] Amer Samarah and Anthony Chan Carusone. A Digital Phase-Locked Loop with Calibrated Coarse and Stochastic Fine TDC. In Custom Integrated Circuits Conference (CICC), IEEE, pages 1–4, Sept. 2012.

[17] V. Gutnik and A. Chandrakasan. On-chip Picosecond Time Measurement. In VLSI Circuits, Digest of Technical Papers. Symposium on, pages 52–53, 2000.


[18] Hsiang-Hui Chang, Ping-Ying Wang, J.-H.C. Zhan, and Bing-Yu Hsieh. A Fractional Spur-Free ADPLL with Loop-Gain Calibration and Phase-Noise Cancellation for GSM/GPRS/EDGE. In Proc. Digest of Technical Papers. IEEE International Solid-State Circuits Conference ISSCC 2008, pages 200–206, Feb. 2008.

[19] Jianjun Yu, F.F. Dai, and R.C. Jaeger. A 12-Bit Vernier Ring Time-to-Digital Converter in 0.13 µm CMOS Technology. Solid-State Circuits, IEEE Journal of, 45(4):830–842, April 2010.

[20] A. Ravi, S. Pellerano, C. Ornelas, H. Lakdawala, T. Tetzlaff, O. Degani, M. Sajadieh, and K. Soumyanath. A 9.2-12GHz, 90nm Digital Fractional-N Synthesizer with Stochastic TDC Calibration and -35/-41dBc Integrated Phase Noise in the 5/2.5GHz Bands. In VLSI Circuits (VLSIC), IEEE Symposium on, pages 143–144, June 2010.

[21] A. Liscidini, L. Vercesi, and R. Castello. Time to Digital Converter Based on a 2-Dimensions Vernier Architecture. In Custom Integrated Circuits Conference (CICC), IEEE, pages 45–48, 2009.

[22] L. Vercesi, L. Fanori, F. De Bernardinis, A. Liscidini, and R. Castello. A Dither-Less All Digital PLL for Cellular Transmitters. Solid-State Circuits, IEEE Journal of, 47(8):1908–1920, Aug. 2012.

[23] S. Henzler, S. Koeppe, W. Kamp, H. Mulatz, and D. Schmitt-Landsiedel. 90nm 4.7ps-Resolution 0.7-LSB Single-Shot Precision and 19pJ-Per-Shot Local Passive Interpolation Time-to-Digital Converter with On-Chip Characterization. In Solid-State Circuits Conference (ISSCC), Digest of Technical Papers. IEEE International, pages 548–635, 2008.

[24] Chorng-Sii Hwang, Poki Chen, and Hen-Wai Tsao. A High-Precision Time-to-Digital Converter Using a Two-Level Conversion Scheme. Nuclear Science, IEEE Transactions on, 51(4):1349–1352, Aug. 2004.

[25] Minjae Lee and A.A. Abidi. A 9 b, 1.25 ps Resolution Coarse-Fine Time-to-Digital Converter in 90 nm CMOS that Amplifies a Time Residue. Solid-State Circuits, IEEE Journal of, 43(4):769–777, 2008.


[26] V. Kratyuk, P.K. Hanumolu, K. Ok, Un-Ku Moon, and K. Mayaram. A Digital PLL with a Stochastic Time-to-Digital Converter. Circuits and Systems I: Regular Papers, IEEE Transactions on, 56(8):1612–1621, Aug. 2009.

[27] Tony Chan Carusone, David Johns, and Kenneth Martin. Analog Integrated Circuit Design. Wiley, second edition, 2011.

[28] S. Weaver, B. Hershberg, and Un-Ku Moon. PDF Folding for Stochastic Flash ADCs. In Electronics, Circuits, and Systems (ICECS), 17th IEEE International Conference on, pages 770–773, Dec. 2010.

[29] Hyung Seok Kim, C. Ornelas, K. Chandrashekar, Pin en Su, P. Madoglio, Y.W. Li, and A. Ravi. A Digital Fractional-N PLL with a 3mW 0.004mm² 6-bit PVT and Mismatch Insensitive TDC. In ESSCIRC (ESSCIRC), 2012 Proceedings of the, pages 193–196, Sept. 2012.

[30] J. Doernberg, H.-S. Lee, and D.A. Hodges. Full-Speed Testing of A/D Converters. Solid-State Circuits, IEEE Journal of, 19(6):820–827, 1984.

[31] F. Baronti, L. Fanucci, D. Lunardini, R. Roncella, and R. Saletti. A Technique for Nonlinearity Self-Calibration of DLLs. Instrumentation and Measurement, IEEE Transactions on, 52(4):1255–1260, 2003.

[32] S. Beer and R. Ginosar. A New 65nm LP Metastability Measurement Test Circuit. In Electrical and Electronics Engineers in Israel, IEEE 27th Convention of, pages 1–4, Nov. 2012.

[33] Z. Ye and M. P. Kennedy. Reduced Complexity MASH Delta-Sigma Modulator. IEEE Transactions on Circuits and Systems II: Express Briefs, 54(8):725–729, Aug. 2007.

[34] Kwyro Lee, I. Nam, Ickjin Kwon, J. Gil, Kwangseok Han, S. Park, and Bo-Ik Seo. The impact of semiconductor technology scaling on CMOS RF and digital circuits for wireless application. IEEE Transactions on Electron Devices, 52(7):1415–1422, July 2005.


[35] T.A.D. Riley, N.M. Filiol, Qinghong Du, and J. Kostamovaara. Techniques for In-Band Phase Noise Reduction in ΔΣ Synthesizers. Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on, 50(11):794–803, Nov. 2003.

[36] Chun-Ming Hsu, M.Z. Straayer, and M.H. Perrott. A Low-Noise Wide-BW 3.6-GHz Digital ΔΣ Fractional-N Frequency Synthesizer With a Noise-Shaping Time-to-Digital Converter and Quantization Noise Cancellation. Solid-State Circuits, IEEE Journal of, 43(12):2776–2786, Dec. 2008.

[37] Ping-Ying Wang, J.-H.C. Zhan, Hsiang-Hui Chang, and H.-M.S. Chang. A Digital Intensive Fractional-N PLL and All-Digital Self-Calibration Schemes. Solid-State Circuits, IEEE Journal of, 44(8):2182–2192, Aug. 2009.

[38] X. Gao, E. A. M. Klumperink, P. F. J. Geraedts, and B. Nauta. Jitter Analysis and a Benchmarking Figure-of-Merit for Phase-Locked Loops. IEEE Transactions on Circuits and Systems II: Express Briefs, 56(2):117–121, Feb. 2009.

[39] Jessica Lipsky. TSMC outlines 16nm, 10nm plans. EE-Times, April 2015.

[40] C. Weltin-Wu, E. Temporiti, D. Baldi, and F. Svelto. A 3GHz Fractional-N All-Digital PLL with Precise Time-to-Digital Converter Calibration and Mismatch Correction. In Proc. Digest of Technical Papers. IEEE International Solid-State Circuits Conference ISSCC 2008, pages 344–618, Feb. 2008.

[41] T. Tokairin, M. Okada, M. Kitsunezuka, T. Maeda, and M. Fukaishi. A 2.1-to-2.8-GHz Low-Phase-Noise All-Digital Frequency Synthesizer with a Time-Windowed Time-to-Digital Converter. Solid-State Circuits, IEEE Journal of, 45(12):2582–2590, Dec. 2010.

[42] Ja-Yol Lee, Mi-Jeong Park, Byonghoon Mhin, Seong-Do Kim, Moon-Yang Park, and Hyunku Yu. A 4-GHz All Digital Fractional-N PLL with Low-Power TDC and Big Phase-Error Compensation. In Custom Integrated Circuits Conference (CICC), IEEE, pages 1–4, Sept. 2011.

[43] E. Temporiti, C. Weltin-Wu, D. Baldi, M. Cusmai, and F. Svelto. A 3.5 GHz Wideband ADPLL With Fractional Spur Suppression Through TDC Dithering and Feedforward Compensation. Solid-State Circuits, IEEE Journal of, 45(12):2723–2736, Dec. 2010.


[44] Amer Samarah and Anthony Chan Carusone. A Dead-Zone Free and Linearized Digital PLL. In International Conference on Electronics, Circuits, and Systems (ICECS), IEEE, Dec. 2012.

[45] Socrates D. Vamvakos, Robert Bogdan Staszewski, Mahbuba Sheba, and Khurram Waheed. Noise Analysis of Time-to-Digital Converter in All-Digital PLLs. In Design, Applications, Integration and Software, IEEE Dallas/CAS Workshop on, pages 87–90, Oct. 2006.

[46] K. Waheed, R.B. Staszewski, F. Dulger, M.S. Ullah, and S.D. Vamvakos. Spurious Free Time-to-Digital Conversion in an ADPLL Using Short Dithering Sequences. Circuits and Systems I: Regular Papers, IEEE Transactions on, 58(9):2051–2060, Sept. 2011.

[47] R.B. Staszewski, K. Waheed, F. Dulger, and O.E. Eliezer. Spur-Free Multirate All-Digital PLL for Mobile Phones in 65 nm CMOS. Solid-State Circuits, IEEE Journal of, 46(12):2904–2919, Dec. 2011.

[48] M. Zanuso, D. Tasca, S. Levantino, A. Donadel, C. Samori, and A.L. Lacaita. Noise Analysis and Minimization in Bang-Bang Digital PLLs. Circuits and Systems II: Express Briefs, IEEE Transactions on, 56(11):835–839, Nov. 2009.

[49] Ken Kundert. Predicting the Phase Noise and Jitter of PLL-based Frequency Synthesizers. In Behzad Razavi, editor, Phase-Locking in High Performance Systems, pages 46–69. IEEE Press, 2003.

[50] Behzad Razavi, editor. Monolithic Phase-Locked Loops and Clock Recovery Circuits: Theory and Design. Wiley-IEEE Press, 1996.

[51] N. Da Dalt. Linearized Analysis of a Digital Bang-Bang PLL and its Validity Limits Applied to Jitter Transfer and Jitter Generation. Circuits and Systems I: Regular Papers, IEEE Transactions on, 55(11):3663–3675, Dec. 2008.

[52] G. Marucci, S. Levantino, P. Maffezzoni, and C. Samori. Analysis and Design of Low-Jitter Digital Bang-Bang Phase-Locked Loops. Circuits and Systems I: Regular Papers, IEEE Transactions on, 61(1):26–36, Jan. 2014.


[53] James F. Oberst. Pull-In Range of a Phase-Locked Loop with a Binary Phase Comparator. Bell System Technical Journal, The, 49(9):2289–2302, Nov. 1970.

[54] M. Ramezani, C. Andre, and C.A.T. Salama. Analysis of a Half-Rate Bang-Bang Phase-Locked-Loop. Circuits and Systems II: Analog and Digital Signal Processing, IEEE Transactions on, 49(7):505–509, July 2002.

[55] M. Chan and A. Postula. Transient Analysis of Bang-Bang Phase Locked Loops. Circuits, Devices & Systems, IET, 3(2):76–82, April 2009.

[56] Richard C. Walker. Designing Bang-Bang PLLs for Clock and Data Recovery in Serial Data Transmission Systems, pages 34–45. Wiley-IEEE Press, 2003.

[57] M. Ramezani and C.A.T. Salama. An Improved Bang-Bang Phase Detector for Clock and Data Recovery Applications. In Circuits and Systems (ISCAS), The IEEE International Symposium on, volume 1, pages 715–718, May 2001.

[58] A. Samarah and A.C. Carusone. Multi-Phase Bang-Bang Digital Phase Lock Loop with Accelerated Frequency Acquisition. In Circuits and Systems (ISCAS), IEEE International Symposium on, pages 545–548, May 2015.

[59] Dan Liu, P. Basedau, M. Helfenstein, J. Wei, T. Burger, and Yangjian Chen. A Frequency-Based Model for Limit Cycle and Spur Predictions in Bang-Bang All Digital PLL. Circuits and Systems I: Regular Papers, IEEE Transactions on, 59(6):1205–1214, June 2012.

[60] A. Pottbacker, Ulrich Langmann, and H. Schreiber. A Si Bipolar Phase and Frequency Detector IC for Clock Extraction up to 8 Gb/s. Solid-State Circuits, IEEE Journal of, 27(12):1747–1751, Dec. 1992.

[61] Donald Richman. The DC Quadricorrelator: A Two-Mode Synchronization System. Proceedings of the IRE, 42(1):288–299, Jan. 1954.

[62] Chao-Ching Hung and Shen-Iuan Liu. A 40-GHz Fast-Locked All-Digital Phase-Locked Loop Using a Modified Bang-Bang Algorithm. Circuits and Systems II: Express Briefs, IEEE Transactions on, 58(6):321–325, June 2011.


[63] Pyoungwon Park, Jaejin Park, Hojin Park, and SeongHwan Cho. An All-Digital Clock Generator using a Fractionally Injection-Locked Oscillator in 65nm CMOS. In Solid-State Circuits Conference Digest of Technical Papers (ISSCC), IEEE International, pages 336–337, Feb. 2012.

[64] R. Nonis, W. Grollitsch, T. Santa, D. Cherniak, and N. Da Dalt. digPLL-Lite: A Low-Complexity, Low-Jitter Fractional-N Digital PLL Architecture. Solid-State Circuits, IEEE Journal of, 48(12):3134–3145, Dec. 2013.

[65] Xiang Gao, E.A.M. Klumperink, M. Bohsali, and B. Nauta. A Low Noise Sub-Sampling PLL in Which Divider Noise is Eliminated and PD/CP Noise is Not Multiplied by N². Solid-State Circuits, IEEE Journal of, 44(12):3253–3263, Dec. 2009.

[66] Amer Samarah and Anthony Chan Carusone. Cycle-Slipping Pull-in Range of Bang-Bang PLLs. In New Circuits and Systems Conference (NEWCAS), IEEE 13th International, pages 1–4, June 2015.

[67] Ching-Che Chung, Duo Sheng, and Chen-Han Chen. An All-Digital Phase-Locked Loop Compiler with Liberty Timing Files. In VLSI Design, Automation and Test (VLSI-DAT), 2014 International Symposium on, pages 1–4, April 2014.
